aboutsummaryrefslogtreecommitdiff
path: root/src/cmd/compile/internal/ssa/_gen/PPC64.rules
AgeCommit message (Collapse)Author
9 dayscmd/compile: improve uint8/uint16 logical immediates on PPC64Jayanth Krishnamurthy jayanth.krishnamurthy@ibm.com
Logical ops on uint8/uint16 (AND/OR/XOR) with constants sometimes materialized the mask via MOVD (often as a negative immediate), even when the value fit in the UI-immediate range. This prevented the backend from selecting andi. / ori / xori forms. This CL makes: UI-immediate truncation is performed only at the use-site of logical-immediate ops, and only when the constant does not fit in the 8- or 16-bit unsigned domain (m != uint8(m) / m != uint16(m)). This avoids negative-mask materialization and enables correct emission of UI-form logical instructions. Arithmetic SI-immediate instructions (addi, subfic, etc.) and other use-patterns are unchanged. Codegen tests are added to ensure the expected andi./ori/xori patterns appear and that MOVD is not emitted for valid 8/16-bit masks. Change-Id: I9fcdf4498c4e984c7587814fb9019a75865c4a0d Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10 Reviewed-on: https://go-review.googlesource.com/c/go/+/704015 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Paul Murphy <paumurph@redhat.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Mark Freeman <markfreeman@google.com>
2026-03-25cmd/compile: ppc64 fold (x+x)<<c into x<<(c+1)Jayanth Krishnamurthy jayanth.krishnamurthy@ibm.com
On ppc64/ppc64le, rewrite (x + x) << c to x << (c+1) for constant shifts. This removes an ADD, shortens the dependency chain, and reduces code size. Add rules for both 64-bit (SLDconst) and 32-bit (SLWconst), and extend test/codegen/shift.go with ppc64x checks to assert a single SLD/SLW and forbid ADD. Aligns ppc64 with other architectures that already assert similar codegen in shift.go. Change-Id: Ie564afbb029a5bd48887b82b0c455ca1dddd5508 Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10 Reviewed-on: https://go-review.googlesource.com/c/go/+/712000 Reviewed-by: Archana Ravindar <aravinda@redhat.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2026-03-06cmd/compile: use tail calls for wrappers for embedded interfacesKeith Randall
type I interface { foo() } type S struct { I } Because I is embedded in S, S needs a foo method. We generate a wrapper function to implement (*S).foo. It just loads the embedded field I out of S and calls foo on it. When the thing in S.I itself needs a wrapper, then we have a wrapper calling another wrapper. This can continue, leaving a potentially long sequence of wrappers on the stack. When we then call runtime.Callers or friends, we have to walk an unbounded number of frames to find a bounded number of non-wrapper frames. This really happens, for instance with I = context.Context, S = context.ValueCtx, and runtime.Callers = pprof sample (for any of context.Context's methods). To fix, make the interface call in the wrapper a tail call. That way, the number of wrapper frames on the stack does not increase when there are lots of wrappers happening. Fixes #75764 Fixes #77781 Change-Id: I03b1731159d9218c7f14f72ecbbac822d6a6bb87 Reviewed-on: https://go-review.googlesource.com/c/go/+/751465 Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2025-10-28cmd/compile: extend ppc64 MADDLD to match const ADDconst & MULLDconstJorropo
Fixes #76084 I was focused on restoring the old behavior and fixing the failing test/codegen/arithmetic.go:MergeMuls2 test. It is probable this same bug hides elsewhere in this file. Change-Id: I17f2ee6b97a1e33b8132648d9d750749d006f7e0 Reviewed-on: https://go-review.googlesource.com/c/go/+/715560 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Paul Murphy <paumurph@redhat.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
2025-08-05cmd/compile: move ppc64 over to new bounds check strategyKeith Randall
Change-Id: I25a9bbc247b2490e7e37ed843386f53a71822146 Reviewed-on: https://go-review.googlesource.com/c/go/+/682498 Reviewed-by: Paul Murphy <paumurph@redhat.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@google.com>
2025-02-27cmd/compile: simplify intrinsification of TrailingZeros16 and TrailingZeros8Joel Sing
Decompose Ctz16 and Ctz8 within the SSA rules for LOONG64, MIPS, PPC64 and S390X, rather than having a custom intrinsic. Note that for PPC64 this actually allows the existing Ctz16 and Ctz8 rules to be used. Change-Id: I27a5e978f852b9d75396d2a80f5d7dfcb5ef7dd4 Reviewed-on: https://go-review.googlesource.com/c/go/+/651816 Reviewed-by: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Joel Sing <joel@sing.id.au> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2025-02-26cmd/compile: simplify intrinsification of BitLen16 and BitLen8Joel Sing
Decompose BitLen16 and BitLen8 within the SSA rules for architectures that support BitLen32 or BitLen64, rather than having a custom intrinsic. Change-Id: Ie4188ce69d1021e63cec27a8e7418efb0714812b Reviewed-on: https://go-review.googlesource.com/c/go/+/651817 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Joel Sing <joel@sing.id.au> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2024-11-12cmd/compile/internal/ssa: improve carry addition rules on PPC64Paul E. Murphy
Fold constant int16 addends for usages of math/bits.Add64(x,const,0) on PPC64. This usage shows up in a few crypto implementations; notably the go wrapper for CL 626176. Change-Id: I6963163330487d04e0479b4fdac235f97bb96889 Reviewed-on: https://go-review.googlesource.com/c/go/+/625899 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2024-10-24cmd/compile/internal/ssa: fix PPC64 shift codegen regressionPaul E. Murphy
CL 621357 introduced new generic lowering rules which caused several shift related codegen test failures. Add new rules to fix the test regressions, and cleanup tests which are changed but not regressed. Some CLRLSLDI tests are removed as they are no test CLRLSLDI rules. Fixes #70003 Change-Id: I1ecc5a7e63ab709a4a0cebf11fa078d5cf164034 Reviewed-on: https://go-review.googlesource.com/c/go/+/622236 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-10-07cmd/compile: add internal/runtime/atomic.Xchg8 intrinsic for PPC64Paul E. Murphy
This is minor extension of the existing support for 32 and 64 bit types. For #69735 Change-Id: I6828ec223951d2b692e077dc507b000ac23c32a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/617496 Reviewed-by: Rhys Hiltner <rhys.hiltner@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Keith Randall <khr@google.com>
2024-09-24cmd/compile: small cleanups to rewrite rule helperskhr@golang.org
Change-Id: I50a19bd971176598bf8e4ef86ec98f008abe245c Reviewed-on: https://go-review.googlesource.com/c/go/+/615198 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@google.com>
2024-08-26cmd/compile: intrinsify math.MulUintptr on PPC64Paul E. Murphy
This can be done efficiently with few instructions. This also adds MULHDUCC for further codegen improvement. Change-Id: I06320ba4383a679341b911a237a360ef07b19168 Reviewed-on: https://go-review.googlesource.com/c/go/+/605975 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Archana Ravindar <aravinda@redhat.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-06-07cmd/compile/ssa: fix (MOVWZreg (RLWINM)) folding on PPC64Paul E. Murphy
RLIWNM does not clear the upper 32 bits of the target register if the mask wraps around (e.g 0xF000000F). Don't elide MOVWZreg for such masks. All other usage clears the upper 32 bits. Fixes #67844. Change-Id: I11b89f1da9ae077624369bfe2bf25e9b7c9b79bc Reviewed-on: https://go-review.googlesource.com/c/go/+/590896 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-22cmd/compile/internal/ssa: reintroduce ANDconst opcode on PPC64Paul E. Murphy
This allows more effective conversion of rotate and mask opcodes into their CC equivalents, while simplifying the first lowering pass. This was removed before the latelower pass was introduced to fold more cases of compare against zero. Add ANDconst to push the conversion of ANDconst to ANDCCconst into latelower with the other CC opcodes. This also requires introducing RLDICLCC to prevent regressions when ANDconst is converted to RLDICL then to RLDICLCC and back to ANDCCconst when possible. Change-Id: I9e5f9c99fbefa334db18c6c152c5f967f3ff2590 Reviewed-on: https://go-review.googlesource.com/c/go/+/586160 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-05-17cmd/compile/internal/ssa: cleanup ANDCCconst rewrite rules on PPC64Paul E. Murphy
Avoid creating duplicate usages of ANDCCconst. This is preparation for a patch to reintroduce ANDconst to simplify the lower pass while treating ANDCCconst like other *CC* ssa opcodes. Also, move many of the similar rules wich retarget ANDCCconst users to the flag result to a common rule for all compares against zero. Change-Id: Ida86efe17ff413cb82c349d8ef69d2899361f4c0 Reviewed-on: https://go-review.googlesource.com/c/go/+/585400 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2024-05-15cmd/compile/internal/ssa: combine more shift and masking on PPC64Paul E. Murphy
Investigating binaries, these patterns seem to show up frequently. Change-Id: I987251e4070e35c25e98da321e444ccaa1526912 Reviewed-on: https://go-review.googlesource.com/c/go/+/583302 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2024-05-03cmd/compile/internal/ssa: on PPC64, try combining CLRLSLDI and SRDconst into ↵Paul E. Murphy
RLWINM This provides a small performance bump to crc64 as measured on ppc64le/power10: name old time/op new time/op delta Crc64/ISO64KB 49.6µs ± 0% 46.6µs ± 0% -6.18% Crc64/ISO4KB 3.16µs ± 0% 2.97µs ± 0% -5.83% Crc64/ISO1KB 840ns ± 0% 794ns ± 0% -5.46% Crc64/ECMA64KB 49.6µs ± 0% 46.5µs ± 0% -6.20% Crc64/Random64KB 53.1µs ± 0% 49.9µs ± 0% -6.04% Crc64/Random16KB 15.9µs ± 1% 15.0µs ± 0% -5.73% Change-Id: I302b5431c7dc46dfd2d211545c483bdcdfe011f1 Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10 Reviewed-on: https://go-review.googlesource.com/c/go/+/581937 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Eli Bendersky <eliben@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2024-04-01cmd/compile: support float min/max instructions on PPC64Paul E. Murphy
This enables efficient use of the builtin min/max function for float64 and float32 types on GOPPC64 >= power9. Extend the assembler to support xsminjdp/xsmaxjdp and use them to implement float min/max. Simplify the VSX xx3 opcode rules to allow FPR arguments, if all arguments are an FPR. Change-Id: I15882a4ce5dc46eba71d683cf1d184dc4236a328 Reviewed-on: https://go-review.googlesource.com/c/go/+/574535 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Than McIntosh <thanm@google.com>
2024-03-15cmd/compile/internal: generate ADDZE on PPC64Paul E. Murphy
This usage shows up in quite a few places, and helps reduce register pressure in several complex cryto functions by removing a MOVD $0,... instruction. Change-Id: I9444ea8f9d19bfd68fb71ea8dc34e109681b3802 Reviewed-on: https://go-review.googlesource.com/c/go/+/571055 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Paul Murphy <murp@ibm.com>
2023-11-06cmd/compile: adding rule to eliminate ANDCCconstJayanth Krishnamurthy
For example, the Slicemask rule in PPC64 generates a sequence wherein there is andi operation, after an sradi, which can be replaced by srdi. This new rule eliminates ANDCCconst. Change-Id: I27aaadf76b9c749a60bcdc5e87b1ebb8167d2fd4 Reviewed-on: https://go-review.googlesource.com/c/go/+/539055 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Heschi Kreinick <heschi@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-10-18cmd/compile: update to rules on PPC64 folding bit reversal to loadJayanth Krishnamurthy
In the Power10 rule to fold bit reversal into load, the MOVWZreg or MOVHZreg (Zeroing out the upper bits of a word or halfword) becomes redundant since byte reverse (BR) load clears the upper bits. Hence removing for Power10. Similarly for < Power10 cases in the rule used to fold bit reversal into load (Bswap), the above redundant operation is removed. Change-Id: Idb027e8b6e79b6acfb81d48a9a6cc06f8e9cd2db Reviewed-on: https://go-review.googlesource.com/c/go/+/531377 Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-18cmd/compile: avoid ANDCCconst on PPC64 if condition not neededLynn Boger
In the PPC64 ISA, the instruction to do an 'and' operation using an immediate constant is only available in the form that also sets CR0 (i.e. clobbers the condition register.) This means CR0 is being clobbered unnecessarily in many cases. That affects some decisions made during some compiler passes that check for it. In those cases when the constant used by the ANDCC is a right justified consecutive set of bits, a shift instruction can be used which has the same effect if CR0 does not need to be set. The rule to do that has been added to the late rules file after other rules using ANDCCconst have been processed in the main rules file. Some codegen tests had to be updated since ANDCC is no longer generated for some cases. A new test case was added to verify the ANDCC is present if the results for both the AND and CR0 are used. Change-Id: I304f607c039a458e2d67d25351dd00aea72ba542 Reviewed-on: https://go-review.googlesource.com/c/go/+/531435 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Paul Murphy <murp@ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2023-09-25cmd/compile: add rules to avoid unnecessary MOVDaddr for PPC64Lynn Boger
This adds some rules to recognize MOVDaddr in those cases where it is just adding 0 to a ptr value. Instead the ptr value can just be used. Change-Id: I95188defc9701165c86bbea70d14d037a9e54853 Reviewed-on: https://go-review.googlesource.com/c/go/+/527698 Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Paul Murphy <murp@ibm.com> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Than McIntosh <thanm@google.com>
2023-09-22cmd/compile/internal/ssa: optimize (AND (MOVDconst [-1] x)) on PPC64Paul E. Murphy
This sequence can show up in the lowering pass on PPC64. If it makes it to the latelower pass, it will cause an error because it looks like it can be turned into RLDICL, but -1 isn't an accepted mask. Also, print more debug info if panic is called from encodePPC64RotateMask. Fixes #62698 Change-Id: I0f3322e2205357abe7fc28f96e05e3f7ad65567c Reviewed-on: https://go-review.googlesource.com/c/go/+/529195 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-07-12cmd/compile: on PPC64, fix sign/zero extension when maskingPaul E. Murphy
(ANDCCconst [y] (MOV.*reg x)) should only be merged when zero extending. Otherwise, sign bits are lost on negative values. (ANDCCconst [0xFF] (MOVBreg x)) should be simplified to a zero extension of x. Likewise for the MOVHreg variant. Fixes #61297 Change-Id: I04e4fd7dc6a826e870681f37506620d48393698b Reviewed-on: https://go-review.googlesource.com/c/go/+/508775 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Bryan Mills <bcmills@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-05-15cmd/compile: update rules to generate more prefixed instructionsLynn Boger
This modifies some existing rules to allow more prefixed instructions to be generated when using GOPPC64=power10. Some rules also check if PCRel is available, which is currently supported for linux/ppc64le and linux/ppc64 (internal linking only). Prior to p10, DS-offset loads and stores had a 16 bit size limit for the offset field. If the offset of the data for load or store was beyond this range then an indexed load or store would be selected by the rules. In p10 the assembler can generate prefixed instructions in this case, but does not if an indexed instruction was selected during the lowering pass. This allows many more cases to use prefixed loads or stores, reducing function sizes and improving performance in some cases where the code change happens in key loops. For example in strconv BenchmarkAppendQuoteRune before: 12c5e4: 15 00 10 06 pla r10,1425660 12c5e8: fc c0 40 39 12c5ec: 00 00 6a e8 ld r3,0(r10) 12c5f0: 10 00 aa e8 ld r5,16(r10) After this change: 12a828: 15 00 10 04 pld r3,1433272 12a82c: b8 de 60 e4 12a830: 15 00 10 04 pld r5,1433280 12a834: c0 de a0 e4 Performs better in the second case. A testcase was added to verify that the rules correctly select a load or store based on the offset and whether power10 or earlier. Change-Id: I4335fed0bd9b8aba8a4f84d69b89f819cc464846 Reviewed-on: https://go-review.googlesource.com/c/go/+/477398 Reviewed-by: Heschi Kreinick <heschi@google.com> Reviewed-by: Archana Ravindar <aravind5@in.ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Paul Murphy <murp@ibm.com>
2023-04-21cmd/compile: introduce separate memory op combining passKeith Randall
Memory op combining is currently done using arch-specific rewrite rules. Instead, do them as a arch-independent rewrite pass. This ensures that all architectures (with unaligned loads & stores) get equal treatment. This removes a lot of rewrite rules. The new pass is a bit more comprehensive. It handles things like out-of-order writes and is careful not to apply partial optimizations that then block further optimizations. Change-Id: I780ff3bb052475cd725a923309616882d25b8d9e Reviewed-on: https://go-review.googlesource.com/c/go/+/478475 Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com>
2023-04-10cmd/compile: replace isSigned(t) with t.IsSigned()Keith Randall
No change in semantics, just removing an unneeded helper. Also align rules a bit. Change-Id: Ie4dabb99392315a7700c645b3d0931eb8766a5fa Reviewed-on: https://go-review.googlesource.com/c/go/+/483439 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2023-04-10cmd/compile: clean up store rules to use store type, not argument typeKeith Randall
Argument type is dangerous because it may be thinner than the actual store being issued. Change-Id: Id19fbd8e6c41390a453994f897dd5048473136aa Reviewed-on: https://go-review.googlesource.com/c/go/+/483438 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com>
2023-02-23cmd/compile: rework unbounded shift lowering on PPC64Paul E. Murphy
This reduces unbounded shift latency by one cycle, and may generate less instructions in some cases. When there is a choice whether to use doubleword or word shifts, use doubleword shifts. Doubleword shifts have fewer hardware scheduling restrictions across P8/P9/P10. Likewise, rework the shift sequence to allow the compare/shift/overshift values to compute in parallel, then choose the correct value. Some ANDCCconst rules also need reworked to ensure they simplify when used for their flag value. This commonly occurs when prove fails to identify a bounded shift (e.g foo32<<uint(x&31)). Change-Id: Ifc6ff4a865d68675e57745056db414b0eb6f2d34 Reviewed-on: https://go-review.googlesource.com/c/go/+/442597 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Than McIntosh <thanm@google.com> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-02-17cmd/compile: ensure constant folding of pointer arithmetic remains a pointerKeith Randall
For c + nil, we want the result to still be of pointer type. Fixes ppc64le build failure with CL 468455, in issue33724.go. The problem in that test is that it requires a nil check to be scheduled before the corresponding load. This normally happens fine because we prioritize nil checks. If we have nilcheck(p) and load(p), once p is scheduled the nil check will always go before the load. The issue we saw in 33724 is that when p is a nil pointer, we ended up with two different p's, an int64(0) as the argument to the nil check and an (*Outer)(0) as the argument to the load. Those two zeroes don't get CSEd, so if the (*Outer)(0) happens to get scheduled first, the load can end up before the nilcheck. Fix this by always having constant arithmetic preserve the pointerness of the value, so that both zeroes are of type *Outer and get CSEd. Update #58482 Update #33724 Change-Id: Ib9b8c0446f1690b574e0f3c0afb9934efbaf3513 Reviewed-on: https://go-review.googlesource.com/c/go/+/468615 Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> TryBot-Bypass: Keith Randall <khr@golang.org>
2023-02-06cmd/compile: add rules to emit SETBC/R instructions on power10Archana R
This CL adds rules that replaces instances of ISEL that produce a boolean result based on a condition register by SETBC/SETBCR operations. On Power10 these are convereted to SETBC/SETBCR instructions that use one register instead of 3 registers conventionally used by ISEL and hence reduces register pressure. On loops written specifically to exercise such instances of ISEL extensively, a performance improvement of 2.5% is seen on Power10. Also added verification tests to verify correct generation of SETBC/SETBCR instructions on Power10. Change-Id: Ib719897f09d893de40324440a43052dca026e8fa Reviewed-on: https://go-review.googlesource.com/c/go/+/449795 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Archana Ravindar <aravind5@in.ibm.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-02-03cmd/compile: intrinsify math/bits/ReverseBytes{16|32|64} for ppc64/power10Archana R
This change intrinsifies ReverseBytes{16|32|64} by generating the corresponding new instructions in Power10: brh, brd and brw and adds a verification test for the same. On Power 9 and 8, the .go code performs optimally as it is. Performance improvement seen on Power10: ReverseBytes32 1.38ns ± 0% 1.18ns ± 0% -14.2 ReverseBytes64 1.52ns ± 0% 1.11ns ± 0% -26.87 ReverseBytes16 1.41ns ± 1% 1.18ns ± 0% -16.47 Change-Id: I88f127f3ab9ba24a772becc21ad90acfba324b37 Reviewed-on: https://go-review.googlesource.com/c/go/+/446675 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-01-19cmd/compile: add anchored version of SPKeith Randall
The SPanchored opcode is identical to SP, except that it takes a memory argument so that it (and more importantly, anything that uses it) must be scheduled at or after that memory argument. This opcode ensures that a LEAQ of a variable gets scheduled after the corresponding VARDEF for that variable. This may lead to less CSE of LEAQ operations. The effect is very small. The go binary is only 80 bytes bigger after this CL. Usually LEAQs get folded into load/store operations, so the effect is only for pointerful types, large enough to need a duffzero, and have their address passed somewhere. Even then, usually the CSEd LEAQs will be un-CSEd because the two uses are on different sides of a function call and the LEAQ ends up being rematerialized at the second use anyway. Change-Id: Ib893562cd05369b91dd563b48fb83f5250950293 Reviewed-on: https://go-review.googlesource.com/c/go/+/452916 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Martin Möhrmann <martin@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2022-12-11cmd/compile: fix conditional move rule on PPC64Keith Randall
Similar to CL 456556 but for ppc64 instead of arm64. Change docs about how booleans are stored in registers for ppc64. We now don't promise to keep the upper bits zeroed; they might be junk. To test, I changed the boolean generation instructions (MOVBZload* and ISEL* with boolean type) to OR in 0x100 to the result. all.bash still passed, so I think nothing else is depending on the upper bits of booleans. Update #57184 Change-Id: Ie66f8934a0dafa34d0a8c2a37324868d959a852c Reviewed-on: https://go-review.googlesource.com/c/go/+/456437 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: KAMPANAT THUMWONG (KONG PC) <1992kongpc.kth@gmail.com> Run-TryBot: Archana Ravindar <aravind5@in.ibm.com>
2022-11-03cmd/compile: add debug-hash flag for fused-multiply-addDavid Chase
This adds a -d debug flag "fmahash" for hashcode search for floating point architecture-dependent problems. This variable has no effect on architectures w/o fused-multiply-add. This was rebased onto the GOSSAHASH renovation so that this could have its own dedicated environment variable, and so that it would be cheap (a nil check) to check it in the normal case. Includes a basic test of the trigger plumbing. Sample use (on arm64, ppc64le, s390x): % GOCOMPILEDEBUG=fmahash=001110110 \ go build -o foo cmd/compile/internal/ssa/testdata/fma.go fmahash triggered main.main:24 101111101101111001110110 GOFMAHASH triggered main.main:20 010111010000101110111011 1.0000000000000002 1.0000000000000004 -2.220446049250313e-16 exit status 1 The intended use is in conjunction with github.com/dr2chase/gossahash, which will probably acquire a flag "-fma" to streamline its use. This tool+use was inspired by an ad hoc use of this technique "in anger" to debug this very problem. This is also a dry-run for using this same technique to identify code sensitive to loop variable lifetime/capture, should we make that change. Example intended use, with current search tool (using old environment variable), for a test example: gossahash -e GOFMAHASH GOMAGIC=GOFMAHASH go run fma.go Trying go args=[...], env=[GOFMAHASH=1 GOMAGIC=GOFMAHASH] go failed (81 distinct triggers): exit status 1 Trying go args=[...], env=[GOFMAHASH=11 GOMAGIC=GOFMAHASH] go failed (39 distinct triggers): exit status 1 Trying go args=[...], env=[GOFMAHASH=011 GOMAGIC=GOFMAHASH] go failed (18 distinct triggers): exit status 1 Trying go args=[...], env=[GOFMAHASH=0011 GOMAGIC=GOFMAHASH] Trying go args=[...], env=[GOFMAHASH=1011 GOMAGIC=GOFMAHASH] ... Trying go args=[...], env=[GOFMAHASH=0110111011 GOMAGIC=GOFMAHASH] Trying go args=[...], env=[GOFMAHASH=1110111011 GOMAGIC=GOFMAHASH] go failed (2 distinct triggers): exit status 1 Trigger string is 'GOFMAHASH triggered math.qzero:427 111111101010011110111011', repeated 6 times Trigger string is 'GOFMAHASH triggered main.main:20 010111010000101110111011', repeated 1 times Trying go args=[...], env=[GOFMAHASH=01110111011 GOMAGIC=GOFMAHASH] go failed (1 distinct triggers): exit status 1 Trigger string is 'GOFMAHASH triggered main.main:20 010111010000101110111011', repeated 1 times Review GSHS_LAST_FAIL.0.log for failing run FINISHED, suggest this command line for debugging: GOSSAFUNC='main.main:20 010111010000101110111011' \ GOFMAHASH=01110111011 GOMAGIC=GOFMAHASH go run fma.go Change-Id: Ifa22dd8f1c37c18fc8a4f7c396345a364bc367d5 Reviewed-on: https://go-review.googlesource.com/c/go/+/394754 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2022-10-07cmd/compile: leverage cc ops in more cases on ppc64xLynn Boger
This updates some rules to use ops with CC variations to set the condition code when the result of the operation is zero. This allows the following compare with zero to be removed since the equivalent condition code has already been set. In addition, a previous rule change to use ANDCCconst was modified to allow any constant value, not just 1 in some cases. Improvements in the reflect package benchmarks: DeepEqual/int8-4 23.9ns ± 1% 23.1ns ± 1% -3.57% (p=0.029 n=4+4) DeepEqual/[]int8-4 109ns ± 2% 102ns ± 1% -6.67% (p=0.029 n=4+4) DeepEqual/int16-4 23.8ns ± 1% 22.8ns ± 0% -3.97% (p=0.029 n=4+4) DeepEqual/[]int16-4 108ns ± 1% 102ns ± 0% -6.25% (p=0.029 n=4+4) DeepEqual/int32-4 24.9ns ± 3% 23.6ns ± 0% -5.09% (p=0.029 n=4+4) DeepEqual/[]int32-4 109ns ± 1% 103ns ± 0% -5.64% (p=0.029 n=4+4) DeepEqual/int64-4 25.5ns ± 1% 23.7ns ± 0% -7.03% (p=0.029 n=4+4) DeepEqual/[]int64-4 109ns ± 1% 102ns ± 0% -6.73% (p=0.029 n=4+4) DeepEqual/int-4 23.2ns ± 1% 22.7ns ± 0% -2.05% (p=0.029 n=4+4) DeepEqual/[]int-4 109ns ± 3% 101ns ± 0% -7.18% (p=0.029 n=4+4) DeepEqual/uint8-4 23.9ns ± 1% 23.5ns ± 0% -1.69% (p=0.029 n=4+4) DeepEqual/[]uint8-4 89.1ns ± 0% 85.6ns ± 1% -3.95% (p=0.029 n=4+4) DeepEqual/uint16-4 24.0ns ± 1% 23.8ns ± 0% -0.76% (p=0.343 n=4+4) DeepEqual/[]uint16-4 111ns ± 0% 106ns ± 4% -4.74% (p=0.029 n=4+4) DeepEqual/uint32-4 23.5ns ± 1% 23.0ns ± 0% -2.15% (p=0.029 n=4+4) DeepEqual/[]uint32-4 110ns ± 1% 104ns ± 0% -5.66% (p=0.029 n=4+4) DeepEqual/uint64-4 24.6ns ± 1% 24.3ns ± 0% -1.10% (p=0.143 n=4+4) DeepEqual/[]uint64-4 111ns ± 0% 105ns ± 1% -5.16% (p=0.029 n=4+4) DeepEqual/uint-4 23.6ns ± 0% 23.0ns ± 0% -2.70% (p=0.029 n=4+4) DeepEqual/[]uint-4 109ns ± 0% 103ns ± 1% -5.74% (p=0.029 n=4+4) DeepEqual/uintptr-4 25.1ns ± 1% 24.8ns ± 2% -1.11% (p=0.171 n=4+4) DeepEqual/[]uintptr-4 111ns ± 0% 106ns ± 1% -4.45% (p=0.029 n=4+4) DeepEqual/float32-4 22.5ns ± 0% 22.2ns ± 0% -1.29% (p=0.029 n=4+4) DeepEqual/[]float32-4 105ns ± 0% 101ns ± 1% -3.75% (p=0.029 n=4+4) DeepEqual/float64-4 22.7ns ± 2% 22.1ns ± 0% -2.52% (p=0.029 n=4+4) DeepEqual/[]float64-4 105ns ± 1% 103ns ± 1% -2.77% (p=0.029 n=4+4) DeepEqual/complex64-4 22.9ns ± 0% 22.8ns ± 0% -0.48% (p=0.029 n=4+4) DeepEqual/[]complex64-4 107ns ± 0% 101ns ± 0% -5.48% (p=0.029 n=4+4) DeepEqual/complex128-4 23.2ns ± 1% 22.6ns ± 0% -2.34% (p=0.029 n=4+4) DeepEqual/[]complex128-4 107ns ± 0% 101ns ± 0% -5.60% (p=0.029 n=4+4) DeepEqual/bool-4 22.0ns ± 1% 21.7ns ± 0% -1.44% (p=0.029 n=4+4) DeepEqual/[]bool-4 106ns ± 1% 100ns ± 0% -5.42% (p=0.029 n=4+4) DeepEqual/string-4 26.7ns ± 1% 24.7ns ± 0% -7.47% (p=0.029 n=4+4) DeepEqual/[]string-4 112ns ± 0% 107ns ± 0% -4.21% (p=0.029 n=4+4) DeepEqual/[]uint8#01-4 89.4ns ± 1% 85.5ns ± 1% -4.44% (p=0.029 n=4+4) DeepEqual/[][]uint8-4 177ns ± 0% 173ns ± 1% -2.22% (p=0.029 n=4+4) DeepEqual/[6]uint8-4 137ns ± 1% 137ns ± 0% -0.56% (p=0.057 n=4+4) DeepEqual/[][6]uint8-4 232ns ± 0% 230ns ± 1% -1.09% (p=0.029 n=4+4) Change-Id: I275624e21dc4d70001032be48897f1504cbfdd1c Reviewed-on: https://go-review.googlesource.com/c/go/+/427634 Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Archana Ravindar <aravind5@in.ibm.com> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-10-04cmd/compile: rename gen and builtin to _gen and _builtinRuss Cox
These two directories are full of //go:build ignore files. We can ignore them more easily by putting an underscore at the start of the name. That also works around a bug in Go 1.17 that was not fixed until Go 1.17.3. Change-Id: Ia5389b65c79b1e6d08e4fef374d335d776d44ead Reviewed-on: https://go-review.googlesource.com/c/go/+/435472 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>