path: root/src/cmd/compile/internal/ssa/rewritePPC64.go
Age Commit message (Author)
9 days cmd/compile: improve uint8/uint16 logical immediates on PPC64 (Jayanth Krishnamurthy jayanth.krishnamurthy@ibm.com)
Logical ops on uint8/uint16 (AND/OR/XOR) with constants sometimes materialized the mask via MOVD (often as a negative immediate), even when the value fit in the UI-immediate range. This prevented the backend from selecting the andi. / ori / xori forms. With this CL, UI-immediate truncation is performed only at the use site of logical-immediate ops, and only when the constant does not fit in the 8- or 16-bit unsigned domain (m != uint8(m) / m != uint16(m)). This avoids negative-mask materialization and enables correct emission of UI-form logical instructions. Arithmetic SI-immediate instructions (addi, subfic, etc.) and other use patterns are unchanged. Codegen tests are added to ensure the expected andi./ori/xori patterns appear and that MOVD is not emitted for valid 8/16-bit masks. Change-Id: I9fcdf4498c4e984c7587814fb9019a75865c4a0d Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10 Reviewed-on: https://go-review.googlesource.com/c/go/+/704015 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Paul Murphy <paumurph@redhat.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Mark Freeman <markfreeman@google.com>
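The fit check quoted in the message above (m != uint8(m) / m != uint16(m)) can be sketched in plain Go. This is an illustration only: the helper names uiImm8 and uiImm16 are hypothetical and are not the compiler's rule helpers.

```go
package main

import "fmt"

// uiImm8 returns the constant to use as an unsigned immediate for a
// uint8 logical op. It truncates only when m does not already fit in
// the unsigned 8-bit domain (the m != uint8(m) condition).
// Hypothetical helper for illustration; not the compiler's code.
func uiImm8(m int64) int64 {
	if m != int64(uint8(m)) {
		return int64(uint8(m))
	}
	return m
}

// uiImm16 is the 16-bit counterpart (m != uint16(m)).
func uiImm16(m int64) int64 {
	if m != int64(uint16(m)) {
		return int64(uint16(m))
	}
	return m
}

func main() {
	// A uint8 mask such as 0xF0 may show up sign-extended as -16.
	fmt.Printf("%#x\n", uiImm8(-16))     // 0xf0
	fmt.Printf("%#x\n", uiImm16(-256))   // 0xff00
	fmt.Printf("%#x\n", uiImm16(0x1234)) // 0x1234
}
```

Values that already fit pass through unchanged, so only the sign-extended forms are rewritten.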
2026-03-25 cmd/compile: ppc64 fold (x+x)<<c into x<<(c+1) (Jayanth Krishnamurthy jayanth.krishnamurthy@ibm.com)
On ppc64/ppc64le, rewrite (x + x) << c to x << (c+1) for constant shifts. This removes an ADD, shortens the dependency chain, and reduces code size. Add rules for both 64-bit (SLDconst) and 32-bit (SLWconst), and extend test/codegen/shift.go with ppc64x checks to assert a single SLD/SLW and forbid ADD. Aligns ppc64 with other architectures that already assert similar codegen in shift.go. Change-Id: Ie564afbb029a5bd48887b82b0c455ca1dddd5508 Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10 Reviewed-on: https://go-review.googlesource.com/c/go/+/712000 Reviewed-by: Archana Ravindar <aravinda@redhat.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
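The arithmetic identity behind this fold can be checked directly in Go. The sketch below only demonstrates that the two expressions agree; the function names are hypothetical, and the compiler rule itself applies to constant shift amounts, whereas this check uses a variable shift for convenience.

```go
package main

import "fmt"

// addThenShift is the source pattern: (x + x) << c.
func addThenShift(x uint64, c uint) uint64 { return (x + x) << c }

// shiftOnly is the folded form: x << (c + 1).
func shiftOnly(x uint64, c uint) uint64 { return x << (c + 1) }

func main() {
	samples := []uint64{0, 1, 3, 0x8000000000000001, ^uint64(0)}
	for _, x := range samples {
		for c := uint(0); c < 64; c++ {
			if addThenShift(x, c) != shiftOnly(x, c) {
				fmt.Println("mismatch", x, c)
				return
			}
		}
	}
	fmt.Println("identity holds for all sampled values")
}
```

Doubling x shifts every bit left by one, so shifting the sum by c is the same as shifting x by c+1, including the case where bits fall off the top.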
2026-03-06 cmd/compile: use tail calls for wrappers for embedded interfaces (Keith Randall)
type I interface { foo() } type S struct { I } Because I is embedded in S, S needs a foo method. We generate a wrapper function to implement (*S).foo. It just loads the embedded field I out of S and calls foo on it. When the thing in S.I itself needs a wrapper, then we have a wrapper calling another wrapper. This can continue, leaving a potentially long sequence of wrappers on the stack. When we then call runtime.Callers or friends, we have to walk an unbounded number of frames to find a bounded number of non-wrapper frames. This really happens, for instance with I = context.Context, S = context.ValueCtx, and runtime.Callers = pprof sample (for any of context.Context's methods). To fix, make the interface call in the wrapper a tail call. That way, the number of wrapper frames on the stack does not increase when there are lots of wrappers happening. Fixes #75764 Fixes #77781 Change-Id: I03b1731159d9218c7f14f72ecbbac822d6a6bb87 Reviewed-on: https://go-review.googlesource.com/c/go/+/751465 Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>
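A small program makes the wrapper chain described above concrete. The types I and S follow the message; leaf and nest are hypothetical names added for illustration, and foo returns an int here only so that the result can be observed.

```go
package main

import "fmt"

type I interface{ foo() int }

// S embeds I, so the compiler generates a wrapper method for foo on S
// that loads the embedded field and calls foo on it.
type S struct{ I }

// leaf is the only type with a hand-written foo.
type leaf struct{}

func (leaf) foo() int { return 42 }

// nest wraps base in depth layers of S. A call to foo on the result
// passes through depth generated wrappers before reaching leaf.foo.
func nest(base I, depth int) I {
	v := base
	for i := 0; i < depth; i++ {
		v = S{v}
	}
	return v
}

func main() {
	fmt.Println(nest(leaf{}, 1000).foo()) // 42
}
```

Without tail calls, each of those wrappers would hold a frame on the stack while leaf.foo runs, which is the situation the message describes for stack walkers such as runtime.Callers.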
2025-10-28 cmd/compile: extend ppc64 MADDLD to match const ADDconst & MULLDconst (Jorropo)
Fixes #76084 I was focused on restoring the old behavior and fixing the failing test/codegen/arithmetic.go:MergeMuls2 test. It is probable this same bug hides elsewhere in this file. Change-Id: I17f2ee6b97a1e33b8132648d9d750749d006f7e0 Reviewed-on: https://go-review.googlesource.com/c/go/+/715560 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Paul Murphy <paumurph@redhat.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
2025-08-05 cmd/compile: move ppc64 over to new bounds check strategy (Keith Randall)
Change-Id: I25a9bbc247b2490e7e37ed843386f53a71822146 Reviewed-on: https://go-review.googlesource.com/c/go/+/682498 Reviewed-by: Paul Murphy <paumurph@redhat.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@google.com>
2025-02-27 cmd/compile: simplify intrinsification of TrailingZeros16 and TrailingZeros8 (Joel Sing)
Decompose Ctz16 and Ctz8 within the SSA rules for LOONG64, MIPS, PPC64 and S390X, rather than having a custom intrinsic. Note that for PPC64 this actually allows the existing Ctz16 and Ctz8 rules to be used. Change-Id: I27a5e978f852b9d75396d2a80f5d7dfcb5ef7dd4 Reviewed-on: https://go-review.googlesource.com/c/go/+/651816 Reviewed-by: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Joel Sing <joel@sing.id.au> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2025-02-26 cmd/compile: simplify intrinsification of BitLen16 and BitLen8 (Joel Sing)
Decompose BitLen16 and BitLen8 within the SSA rules for architectures that support BitLen32 or BitLen64, rather than having a custom intrinsic. Change-Id: Ie4188ce69d1021e63cec27a8e7418efb0714812b Reviewed-on: https://go-review.googlesource.com/c/go/+/651817 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Joel Sing <joel@sing.id.au> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2024-11-12 cmd/compile/internal/ssa: improve carry addition rules on PPC64 (Paul E. Murphy)
Fold constant int16 addends for usages of math/bits.Add64(x,const,0) on PPC64. This usage shows up in a few crypto implementations; notably the go wrapper for CL 626176. Change-Id: I6963163330487d04e0479b4fdac235f97bb96889 Reviewed-on: https://go-review.googlesource.com/c/go/+/625899 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
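The usage the message refers to looks like the following in source form. The constant 19 and the name addConst are arbitrary choices for illustration; whether a given constant is folded depends on the rules in the CL.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addConst adds a small constant with a zero carry-in and returns the
// sum and the carry-out, i.e. the math/bits.Add64(x, const, 0) shape.
func addConst(x uint64) (sum, carry uint64) {
	return bits.Add64(x, 19, 0)
}

func main() {
	fmt.Println(addConst(1))          // 20 0
	fmt.Println(addConst(^uint64(0))) // 18 1
}
```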
2024-10-24 cmd/compile/internal/ssa: fix PPC64 shift codegen regression (Paul E. Murphy)
CL 621357 introduced new generic lowering rules which caused several shift-related codegen test failures. Add new rules to fix the test regressions, and clean up tests which are changed but not regressed. Some CLRLSLDI tests are removed as they no longer test CLRLSLDI rules. Fixes #70003 Change-Id: I1ecc5a7e63ab709a4a0cebf11fa078d5cf164034 Reviewed-on: https://go-review.googlesource.com/c/go/+/622236 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-10-07 cmd/compile: add internal/runtime/atomic.Xchg8 intrinsic for PPC64 (Paul E. Murphy)
This is a minor extension of the existing support for 32 and 64 bit types. For #69735 Change-Id: I6828ec223951d2b692e077dc507b000ac23c32a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/617496 Reviewed-by: Rhys Hiltner <rhys.hiltner@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Keith Randall <khr@google.com>
2024-09-24 cmd/compile: small cleanups to rewrite rule helpers (khr@golang.org)
Change-Id: I50a19bd971176598bf8e4ef86ec98f008abe245c Reviewed-on: https://go-review.googlesource.com/c/go/+/615198 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@google.com>
2024-08-26 cmd/compile: intrinsify math.MulUintptr on PPC64 (Paul E. Murphy)
This can be done efficiently with few instructions. This also adds MULHDUCC for further codegen improvement. Change-Id: I06320ba4383a679341b911a237a360ef07b19168 Reviewed-on: https://go-review.googlesource.com/c/go/+/605975 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Archana Ravindar <aravinda@redhat.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
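The operation being intrinsified is a multiply that also reports overflow. A sketch of the semantics on uint64 using math/bits.Mul64 follows; the runtime helper itself works on uintptr and is not importable, so mulOverflows is a hypothetical stand-in, and the mapping of the high half to a multiply-high instruction is an assumption about the generated code.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mulOverflows returns a*b truncated to 64 bits and whether the full
// product needed more than 64 bits. The high half of the product is
// non-zero exactly when the multiplication overflowed.
func mulOverflows(a, b uint64) (uint64, bool) {
	hi, lo := bits.Mul64(a, b)
	return lo, hi != 0
}

func main() {
	fmt.Println(mulOverflows(3, 4))         // 12 false
	fmt.Println(mulOverflows(1<<32, 1<<32)) // 0 true
}
```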
2024-06-07 cmd/compile/ssa: fix (MOVWZreg (RLWINM)) folding on PPC64 (Paul E. Murphy)
RLWINM does not clear the upper 32 bits of the target register if the mask wraps around (e.g. 0xF000000F). Don't elide MOVWZreg for such masks. All other usage clears the upper 32 bits. Fixes #67844. Change-Id: I11b89f1da9ae077624369bfe2bf25e9b7c9b79bc Reviewed-on: https://go-review.googlesource.com/c/go/+/590896 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
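A small software model shows why a wrapping mask leaves the upper word populated. The model below follows my reading of the Power ISA description of the rotate-left-word-then-AND-with-mask instruction (rotate the low word, replicate it into both halves, AND with MASK(MB+32, ME+32)); the function names are hypothetical and this is not compiler code.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mask64 returns a mask with ones in IBM-numbered bit positions x
// through y (bit 0 is the most significant), wrapping when x > y.
func mask64(x, y uint) uint64 {
	fromX := ^uint64(0) >> x        // ones in bits x..63
	upToY := ^uint64(0) << (63 - y) // ones in bits 0..y
	if x <= y {
		return fromX & upToY
	}
	return fromX | upToY
}

// rlwinm models the instruction on a 64-bit register: the low word is
// rotated left by sh, replicated into both halves, then masked.
func rlwinm(rs uint64, sh int, mb, me uint) uint64 {
	w := bits.RotateLeft32(uint32(rs), sh)
	r := uint64(w)<<32 | uint64(w)
	return r & mask64(mb+32, me+32)
}

func main() {
	x := uint64(0x12345678)
	fmt.Printf("%#x\n", rlwinm(x, 0, 8, 23)) // contiguous mask 0x00FFFF00: 0x345600
	fmt.Printf("%#x\n", rlwinm(x, 0, 28, 3)) // wrapping mask 0xF000000F: 0x1234567810000008
}
```

With the contiguous mask the upper word is zero, so a following zero extension is redundant; with the wrapping mask the replicated word survives in the upper half, so the zero extension must stay.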
2024-05-22 cmd/compile/internal/ssa: reintroduce ANDconst opcode on PPC64 (Paul E. Murphy)
This allows more effective conversion of rotate and mask opcodes into their CC equivalents, while simplifying the first lowering pass. This was removed before the latelower pass was introduced to fold more cases of compare against zero. Add ANDconst to push the conversion of ANDconst to ANDCCconst into latelower with the other CC opcodes. This also requires introducing RLDICLCC to prevent regressions when ANDconst is converted to RLDICL then to RLDICLCC and back to ANDCCconst when possible. Change-Id: I9e5f9c99fbefa334db18c6c152c5f967f3ff2590 Reviewed-on: https://go-review.googlesource.com/c/go/+/586160 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-05-17 cmd/compile/internal/ssa: cleanup ANDCCconst rewrite rules on PPC64 (Paul E. Murphy)
Avoid creating duplicate usages of ANDCCconst. This is preparation for a patch to reintroduce ANDconst to simplify the lower pass while treating ANDCCconst like other *CC* ssa opcodes. Also, move many of the similar rules which retarget ANDCCconst users to the flag result to a common rule for all compares against zero. Change-Id: Ida86efe17ff413cb82c349d8ef69d2899361f4c0 Reviewed-on: https://go-review.googlesource.com/c/go/+/585400 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2024-05-15 cmd/compile/internal/ssa: combine more shift and masking on PPC64 (Paul E. Murphy)
Investigating binaries, these patterns seem to show up frequently. Change-Id: I987251e4070e35c25e98da321e444ccaa1526912 Reviewed-on: https://go-review.googlesource.com/c/go/+/583302 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2024-05-03 cmd/compile/internal/ssa: on PPC64, try combining CLRLSLDI and SRDconst into RLWINM (Paul E. Murphy)
This provides a small performance bump to crc64 as measured on ppc64le/power10:
name              old time/op  new time/op  delta
Crc64/ISO64KB     49.6µs ± 0%  46.6µs ± 0%  -6.18%
Crc64/ISO4KB      3.16µs ± 0%  2.97µs ± 0%  -5.83%
Crc64/ISO1KB       840ns ± 0%   794ns ± 0%  -5.46%
Crc64/ECMA64KB    49.6µs ± 0%  46.5µs ± 0%  -6.20%
Crc64/Random64KB  53.1µs ± 0%  49.9µs ± 0%  -6.04%
Crc64/Random16KB  15.9µs ± 1%  15.0µs ± 0%  -5.73%
Change-Id: I302b5431c7dc46dfd2d211545c483bdcdfe011f1 Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10 Reviewed-on: https://go-review.googlesource.com/c/go/+/581937 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Eli Bendersky <eliben@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2024-04-01 cmd/compile: support float min/max instructions on PPC64 (Paul E. Murphy)
This enables efficient use of the builtin min/max function for float64 and float32 types on GOPPC64 >= power9. Extend the assembler to support xsminjdp/xsmaxjdp and use them to implement float min/max. Simplify the VSX xx3 opcode rules to allow FPR arguments, if all arguments are an FPR. Change-Id: I15882a4ce5dc46eba71d683cf1d184dc4236a328 Reviewed-on: https://go-review.googlesource.com/c/go/+/574535 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Than McIntosh <thanm@google.com>
2024-03-15 cmd/compile/internal: generate ADDZE on PPC64 (Paul E. Murphy)
This usage shows up in quite a few places, and helps reduce register pressure in several complex crypto functions by removing a MOVD $0,... instruction. Change-Id: I9444ea8f9d19bfd68fb71ea8dc34e109681b3802 Reviewed-on: https://go-review.googlesource.com/c/go/+/571055 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Paul Murphy <murp@ibm.com>
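A typical source pattern of this kind is propagating a carry into a high word by adding zero plus the carry. The sketch below is an assumed example of such code (sum128 is a hypothetical name); whether the compiler emits ADDZE for this exact function is not stated in the message.

```go
package main

import (
	"fmt"
	"math/bits"
)

// sum128 accumulates 64-bit values into a 128-bit total hi:lo.
func sum128(vals []uint64) (hi, lo uint64) {
	for _, v := range vals {
		var c uint64
		lo, c = bits.Add64(lo, v, 0)
		// hi + 0 + carry: an add whose second operand is the constant
		// zero, so no register is needed to hold that zero.
		hi, _ = bits.Add64(hi, 0, c)
	}
	return hi, lo
}

func main() {
	hi, lo := sum128([]uint64{^uint64(0), 1, ^uint64(0), 2})
	fmt.Println(hi, lo) // 2 1
}
```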
2023-11-06 cmd/compile: adding rule to eliminate ANDCCconst (Jayanth Krishnamurthy)
For example, the Slicemask rule in PPC64 generates a sequence wherein there is an andi operation after an sradi, which can be replaced by srdi. This new rule eliminates ANDCCconst. Change-Id: I27aaadf76b9c749a60bcdc5e87b1ebb8167d2fd4 Reviewed-on: https://go-review.googlesource.com/c/go/+/539055 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Heschi Kreinick <heschi@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-10-18 cmd/compile: update to rules on PPC64 folding bit reversal to load (Jayanth Krishnamurthy)
In the Power10 rule to fold bit reversal into load, the MOVWZreg or MOVHZreg (Zeroing out the upper bits of a word or halfword) becomes redundant since byte reverse (BR) load clears the upper bits. Hence removing for Power10. Similarly for < Power10 cases in the rule used to fold bit reversal into load (Bswap), the above redundant operation is removed. Change-Id: Idb027e8b6e79b6acfb81d48a9a6cc06f8e9cd2db Reviewed-on: https://go-review.googlesource.com/c/go/+/531377 Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-18 cmd/compile: avoid ANDCCconst on PPC64 if condition not needed (Lynn Boger)
In the PPC64 ISA, the instruction to do an 'and' operation using an immediate constant is only available in the form that also sets CR0 (i.e. clobbers the condition register.) This means CR0 is being clobbered unnecessarily in many cases. That affects some decisions made during some compiler passes that check for it. In those cases when the constant used by the ANDCC is a right justified consecutive set of bits, a shift instruction can be used which has the same effect if CR0 does not need to be set. The rule to do that has been added to the late rules file after other rules using ANDCCconst have been processed in the main rules file. Some codegen tests had to be updated since ANDCC is no longer generated for some cases. A new test case was added to verify the ANDCC is present if the results for both the AND and CR0 are used. Change-Id: I304f607c039a458e2d67d25351dd00aea72ba542 Reviewed-on: https://go-review.googlesource.com/c/go/+/531435 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Paul Murphy <murp@ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
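The equivalence this relies on can be shown in Go: when the mask is a right-justified run of ones, clearing the upper bits by shifting gives the same result as the AND. This sketch only demonstrates the arithmetic; isLowMask and andByShift are hypothetical names, and the instruction the rule actually selects is not spelled out in the message.

```go
package main

import (
	"fmt"
	"math/bits"
)

// isLowMask reports whether m has the form 2^n - 1 with n >= 1, i.e.
// a right-justified consecutive set of bits.
func isLowMask(m uint64) bool { return m != 0 && m&(m+1) == 0 }

// andByShift computes x & m for such a mask without an AND: shift the
// unwanted upper bits out, then shift back.
func andByShift(x, m uint64) uint64 {
	n := uint(bits.Len64(m)) // number of low bits to keep
	return x << (64 - n) >> (64 - n)
}

func main() {
	x := uint64(0xDEADBEEFCAFEF00D)
	fmt.Println(isLowMask(0xFFFF), isLowMask(0xF0))          // true false
	fmt.Printf("%#x %#x\n", x&0xFFFF, andByShift(x, 0xFFFF)) // 0xf00d 0xf00d
}
```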
2023-09-25 cmd/compile: add rules to avoid unnecessary MOVDaddr for PPC64 (Lynn Boger)
This adds some rules to recognize MOVDaddr in those cases where it is just adding 0 to a ptr value. Instead the ptr value can just be used. Change-Id: I95188defc9701165c86bbea70d14d037a9e54853 Reviewed-on: https://go-review.googlesource.com/c/go/+/527698 Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Paul Murphy <murp@ibm.com> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Than McIntosh <thanm@google.com>
2023-09-22 cmd/compile/internal/ssa: optimize (AND (MOVDconst [-1] x)) on PPC64 (Paul E. Murphy)
This sequence can show up in the lowering pass on PPC64. If it makes it to the latelower pass, it will cause an error because it looks like it can be turned into RLDICL, but -1 isn't an accepted mask. Also, print more debug info if panic is called from encodePPC64RotateMask. Fixes #62698 Change-Id: I0f3322e2205357abe7fc28f96e05e3f7ad65567c Reviewed-on: https://go-review.googlesource.com/c/go/+/529195 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-07-12 cmd/compile: on PPC64, fix sign/zero extension when masking (Paul E. Murphy)
(ANDCCconst [y] (MOV.*reg x)) should only be merged when zero extending. Otherwise, sign bits are lost on negative values. (ANDCCconst [0xFF] (MOVBreg x)) should be simplified to a zero extension of x. Likewise for the MOVHreg variant. Fixes #61297 Change-Id: I04e4fd7dc6a826e870681f37506620d48393698b Reviewed-on: https://go-review.googlesource.com/c/go/+/508775 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Bryan Mills <bcmills@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
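The difference between the two cases can be modelled with ordinary Go conversions. The function names below are hypothetical stand-ins for the SSA shapes named in the message, with a uint64 playing the role of the register.

```go
package main

import "fmt"

// signExtendThenMask models (ANDCCconst [mask] (MOVBreg x)).
func signExtendThenMask(x uint64, mask int64) int64 { return int64(int8(x)) & mask }

// maskOnly models what an incorrect merge would compute: the mask
// applied to the register with the sign extension dropped.
func maskOnly(x uint64, mask int64) int64 { return int64(x) & mask }

// zeroExtend models a byte zero extension of x.
func zeroExtend(x uint64) int64 { return int64(uint8(x)) }

func main() {
	x := uint64(0x80) // -128 as an int8
	// Sign bits above bit 7 are visible to a wide mask.
	fmt.Println(signExtendThenMask(x, 0x100), maskOnly(x, 0x100)) // 256 0
	// With mask 0xFF the result is just the zero-extended byte.
	fmt.Println(signExtendThenMask(x, 0xFF), zeroExtend(x)) // 128 128
}
```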
2023-05-15 cmd/compile: update rules to generate more prefixed instructions (Lynn Boger)
This modifies some existing rules to allow more prefixed instructions to be generated when using GOPPC64=power10. Some rules also check if PCRel is available, which is currently supported for linux/ppc64le and linux/ppc64 (internal linking only).
Prior to p10, DS-offset loads and stores had a 16 bit size limit for the offset field. If the offset of the data for load or store was beyond this range then an indexed load or store would be selected by the rules. In p10 the assembler can generate prefixed instructions in this case, but does not if an indexed instruction was selected during the lowering pass.
This allows many more cases to use prefixed loads or stores, reducing function sizes and improving performance in some cases where the code change happens in key loops.
For example in strconv BenchmarkAppendQuoteRune, before:
  12c5e4: 15 00 10 06 pla r10,1425660
  12c5e8: fc c0 40 39
  12c5ec: 00 00 6a e8 ld r3,0(r10)
  12c5f0: 10 00 aa e8 ld r5,16(r10)
After this change:
  12a828: 15 00 10 04 pld r3,1433272
  12a82c: b8 de 60 e4
  12a830: 15 00 10 04 pld r5,1433280
  12a834: c0 de a0 e4
The second sequence performs better.
A testcase was added to verify that the rules correctly select a load or store based on the offset and whether power10 or earlier. Change-Id: I4335fed0bd9b8aba8a4f84d69b89f819cc464846 Reviewed-on: https://go-review.googlesource.com/c/go/+/477398 Reviewed-by: Heschi Kreinick <heschi@google.com> Reviewed-by: Archana Ravindar <aravind5@in.ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Paul Murphy <murp@ibm.com>
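The offset decision can be sketched as two range checks. The limits below reflect my understanding of the ISA encodings (a DS-form displacement is a signed 16-bit value with the low two bits zero; a prefixed displacement is a signed 34-bit value) and are assumptions rather than text from the CL; fitsDS and fitsPrefixed are hypothetical names, not the helpers used in the rules.

```go
package main

import "fmt"

// fitsDS reports whether off can be encoded in a DS-form load/store:
// a signed 16-bit displacement that is a multiple of 4.
func fitsDS(off int64) bool {
	return off >= -1<<15 && off < 1<<15 && off%4 == 0
}

// fitsPrefixed reports whether off fits the signed 34-bit displacement
// of a prefixed load/store.
func fitsPrefixed(off int64) bool {
	return off >= -1<<33 && off < 1<<33
}

func main() {
	// 1433272 is the displacement from the pld example above.
	for _, off := range []int64{16, 18, 1433272} {
		fmt.Println(off, fitsDS(off), fitsPrefixed(off))
	}
}
```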
2023-04-21 cmd/compile: introduce separate memory op combining pass (Keith Randall)
Memory op combining is currently done using arch-specific rewrite rules. Instead, do them as an arch-independent rewrite pass. This ensures that all architectures (with unaligned loads & stores) get equal treatment. This removes a lot of rewrite rules. The new pass is a bit more comprehensive. It handles things like out-of-order writes and is careful not to apply partial optimizations that then block further optimizations. Change-Id: I780ff3bb052475cd725a923309616882d25b8d9e Reviewed-on: https://go-review.googlesource.com/c/go/+/478475 Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com>
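The kind of source code that benefits from memory op combining is a value assembled from individual byte loads. The function below is a common example of that shape (load32 is a hypothetical name); whether it is combined into a single load depends on the target and compiler version.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// load32 assembles a little-endian uint32 from four byte loads. On
// targets with unaligned loads, a combining pass can turn the four
// loads, shifts and ORs into one 32-bit load.
func load32(b []byte) uint32 {
	_ = b[3] // single bounds check for all four accesses
	return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}

func main() {
	b := []byte{0x78, 0x56, 0x34, 0x12}
	fmt.Printf("%#x %#x\n", load32(b), binary.LittleEndian.Uint32(b)) // 0x12345678 0x12345678
}
```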
2023-04-10 cmd/compile: replace isSigned(t) with t.IsSigned() (Keith Randall)
No change in semantics, just removing an unneeded helper. Also align rules a bit. Change-Id: Ie4dabb99392315a7700c645b3d0931eb8766a5fa Reviewed-on: https://go-review.googlesource.com/c/go/+/483439 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2023-04-10 cmd/compile: clean up store rules to use store type, not argument type (Keith Randall)
Argument type is dangerous because it may be thinner than the actual store being issued. Change-Id: Id19fbd8e6c41390a453994f897dd5048473136aa Reviewed-on: https://go-review.googlesource.com/c/go/+/483438 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com>
2023-02-23 cmd/compile: rework unbounded shift lowering on PPC64 (Paul E. Murphy)
This reduces unbounded shift latency by one cycle, and may generate fewer instructions in some cases. When there is a choice whether to use doubleword or word shifts, use doubleword shifts. Doubleword shifts have fewer hardware scheduling restrictions across P8/P9/P10. Likewise, rework the shift sequence to allow the compare/shift/overshift values to compute in parallel, then choose the correct value. Some ANDCCconst rules also need to be reworked to ensure they simplify when used for their flag value. This commonly occurs when prove fails to identify a bounded shift (e.g. foo32<<uint(x&31)). Change-Id: Ifc6ff4a865d68675e57745056db414b0eb6f2d34 Reviewed-on: https://go-review.googlesource.com/c/go/+/442597 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Than McIntosh <thanm@google.com> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
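The distinction between unbounded and bounded shifts comes from Go's shift semantics: a shift count at or beyond the operand width must produce zero, so the compiler needs extra work unless it can prove the count is in range. A short sketch, with hypothetical function names:

```go
package main

import "fmt"

// shl32 is an unbounded shift: s may be any value, and Go requires
// the result to be 0 once s >= 32.
func shl32(x uint32, s uint) uint32 { return x << s }

// shl32Masked is a bounded shift: s&31 is always in [0, 31], so no
// overshift handling is needed.
func shl32Masked(x uint32, s uint) uint32 { return x << (s & 31) }

func main() {
	fmt.Println(shl32(1, 31), shl32(1, 32), shl32(1, 40)) // 2147483648 0 0
	fmt.Println(shl32Masked(1, 32), shl32Masked(1, 33))   // 1 2
}
```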
2023-02-17 cmd/compile: ensure constant folding of pointer arithmetic remains a pointer (Keith Randall)
For c + nil, we want the result to still be of pointer type. Fixes ppc64le build failure with CL 468455, in issue33724.go. The problem in that test is that it requires a nil check to be scheduled before the corresponding load. This normally happens fine because we prioritize nil checks. If we have nilcheck(p) and load(p), once p is scheduled the nil check will always go before the load. The issue we saw in 33724 is that when p is a nil pointer, we ended up with two different p's, an int64(0) as the argument to the nil check and an (*Outer)(0) as the argument to the load. Those two zeroes don't get CSEd, so if the (*Outer)(0) happens to get scheduled first, the load can end up before the nilcheck. Fix this by always having constant arithmetic preserve the pointerness of the value, so that both zeroes are of type *Outer and get CSEd. Update #58482 Update #33724 Change-Id: Ib9b8c0446f1690b574e0f3c0afb9934efbaf3513 Reviewed-on: https://go-review.googlesource.com/c/go/+/468615 Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> TryBot-Bypass: Keith Randall <khr@golang.org>
2023-02-06 cmd/compile: add rules to emit SETBC/R instructions on power10 (Archana R)
This CL adds rules that replace instances of ISEL that produce a boolean result based on a condition register with SETBC/SETBCR operations. On Power10 these are converted to SETBC/SETBCR instructions that use one register instead of the 3 registers conventionally used by ISEL, and hence reduce register pressure. On loops written specifically to exercise such instances of ISEL extensively, a performance improvement of 2.5% is seen on Power10. Also added verification tests to verify correct generation of SETBC/SETBCR instructions on Power10. Change-Id: Ib719897f09d893de40324440a43052dca026e8fa Reviewed-on: https://go-review.googlesource.com/c/go/+/449795 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Archana Ravindar <aravind5@in.ibm.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-02-03 cmd/compile: intrinsify math/bits/ReverseBytes{16|32|64} for ppc64/power10 (Archana R)
This change intrinsifies ReverseBytes{16|32|64} by generating the corresponding new instructions in Power10: brh, brd and brw and adds a verification test for the same. On Power 9 and 8, the .go code performs optimally as it is. Performance improvement seen on Power10:
ReverseBytes32  1.38ns ± 0%  1.18ns ± 0%  -14.2
ReverseBytes64  1.52ns ± 0%  1.11ns ± 0%  -26.87
ReverseBytes16  1.41ns ± 1%  1.18ns ± 0%  -16.47
Change-Id: I88f127f3ab9ba24a772becc21ad90acfba324b37 Reviewed-on: https://go-review.googlesource.com/c/go/+/446675 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
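For reference, these are the math/bits functions being intrinsified, wrapped in small functions of the kind a codegen test might inspect. The wrapper names are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Each wrapper is a single call that the compiler may replace with
// one byte-reverse instruction on targets that have it.
func swap16(x uint16) uint16 { return bits.ReverseBytes16(x) }
func swap32(x uint32) uint32 { return bits.ReverseBytes32(x) }
func swap64(x uint64) uint64 { return bits.ReverseBytes64(x) }

func main() {
	fmt.Printf("%#x %#x %#x\n",
		swap16(0x1234), swap32(0x12345678), swap64(0x0102030405060708))
	// 0x3412 0x78563412 0x807060504030201
}
```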
2023-01-19 cmd/compile: add anchored version of SP (Keith Randall)
The SPanchored opcode is identical to SP, except that it takes a memory argument so that it (and more importantly, anything that uses it) must be scheduled at or after that memory argument. This opcode ensures that a LEAQ of a variable gets scheduled after the corresponding VARDEF for that variable. This may lead to less CSE of LEAQ operations. The effect is very small. The go binary is only 80 bytes bigger after this CL. Usually LEAQs get folded into load/store operations, so the effect is only for pointerful types, large enough to need a duffzero, and have their address passed somewhere. Even then, usually the CSEd LEAQs will be un-CSEd because the two uses are on different sides of a function call and the LEAQ ends up being rematerialized at the second use anyway. Change-Id: Ib893562cd05369b91dd563b48fb83f5250950293 Reviewed-on: https://go-review.googlesource.com/c/go/+/452916 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Martin Möhrmann <martin@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2023-01-19 cmd/compile/internal/ssa: generate code via a //go:generate directive (Dmitri Shuralyov)
The standard way to generate code in a Go package is via //go:generate directives, which are invoked by the developer explicitly running: go generate import/path/of/said/package Switch to using that approach here. This way, developers don't need to learn and remember a custom way that each particular Go package may choose to implement its code generation. It also enables conveniences such as 'go generate -n' to discover how code is generated without running anything (this works on all packages that rely on //go:generate directives), being able to generate multiple packages at once and from any directory, and so on. Change-Id: I0e5b6a1edeff670a8e588befeef0c445613803c7 Reviewed-on: https://go-review.googlesource.com/c/go/+/460135 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2022-12-11 cmd/compile: fix conditional move rule on PPC64 (Keith Randall)
Similar to CL 456556 but for ppc64 instead of arm64. Change docs about how booleans are stored in registers for ppc64. We now don't promise to keep the upper bits zeroed; they might be junk. To test, I changed the boolean generation instructions (MOVBZload* and ISEL* with boolean type) to OR in 0x100 to the result. all.bash still passed, so I think nothing else is depending on the upper bits of booleans. Update #57184 Change-Id: Ie66f8934a0dafa34d0a8c2a37324868d959a852c Reviewed-on: https://go-review.googlesource.com/c/go/+/456437 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: KAMPANAT THUMWONG (KONG PC) <1992kongpc.kth@gmail.com> Run-TryBot: Archana Ravindar <aravind5@in.ibm.com>
2022-11-03cmd/compile: add debug-hash flag for fused-multiply-addDavid Chase
This adds a -d debug flag "fmahash" for hashcode search for floating point architecture-dependent problems. This variable has no effect on architectures w/o fused-multiply-add. This was rebased onto the GOSSAHASH renovation so that this could have its own dedicated environment variable, and so that it would be cheap (a nil check) to check it in the normal case. Includes a basic test of the trigger plumbing.

Sample use (on arm64, ppc64le, s390x):

    % GOCOMPILEDEBUG=fmahash=001110110 \
      go build -o foo cmd/compile/internal/ssa/testdata/fma.go
    fmahash triggered main.main:24 101111101101111001110110
    GOFMAHASH triggered main.main:20 010111010000101110111011
    1.0000000000000002 1.0000000000000004 -2.220446049250313e-16
    exit status 1

The intended use is in conjunction with github.com/dr2chase/gossahash, which will probably acquire a flag "-fma" to streamline its use. This tool+use was inspired by an ad hoc use of this technique "in anger" to debug this very problem. This is also a dry-run for using this same technique to identify code sensitive to loop variable lifetime/capture, should we make that change.

Example intended use, with current search tool (using old environment variable), for a test example:

    gossahash -e GOFMAHASH GOMAGIC=GOFMAHASH go run fma.go
    Trying go args=[...], env=[GOFMAHASH=1 GOMAGIC=GOFMAHASH]
    go failed (81 distinct triggers): exit status 1
    Trying go args=[...], env=[GOFMAHASH=11 GOMAGIC=GOFMAHASH]
    go failed (39 distinct triggers): exit status 1
    Trying go args=[...], env=[GOFMAHASH=011 GOMAGIC=GOFMAHASH]
    go failed (18 distinct triggers): exit status 1
    Trying go args=[...], env=[GOFMAHASH=0011 GOMAGIC=GOFMAHASH]
    Trying go args=[...], env=[GOFMAHASH=1011 GOMAGIC=GOFMAHASH]
    ...
    Trying go args=[...], env=[GOFMAHASH=0110111011 GOMAGIC=GOFMAHASH]
    Trying go args=[...], env=[GOFMAHASH=1110111011 GOMAGIC=GOFMAHASH]
    go failed (2 distinct triggers): exit status 1
    Trigger string is 'GOFMAHASH triggered math.qzero:427 111111101010011110111011', repeated 6 times
    Trigger string is 'GOFMAHASH triggered main.main:20 010111010000101110111011', repeated 1 times
    Trying go args=[...], env=[GOFMAHASH=01110111011 GOMAGIC=GOFMAHASH]
    go failed (1 distinct triggers): exit status 1
    Trigger string is 'GOFMAHASH triggered main.main:20 010111010000101110111011', repeated 1 times
    Review GSHS_LAST_FAIL.0.log for failing run
    FINISHED, suggest this command line for debugging:
    GOSSAFUNC='main.main:20 010111010000101110111011' \
      GOFMAHASH=01110111011 GOMAGIC=GOFMAHASH go run fma.go

Change-Id: Ifa22dd8f1c37c18fc8a4f7c396345a364bc367d5 Reviewed-on: https://go-review.googlesource.com/c/go/+/394754 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
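For readers unfamiliar with why fused multiply-add can change results, the following is a minimal sketch of the kind of difference the fmahash search is meant to locate. It is not taken from fma.go; the function names `unfused` and `fused` and the chosen inputs are illustrative assumptions. The explicit float64 conversion forces the product to be rounded, while math.FMA rounds only once.

```go
package main

import (
	"fmt"
	"math"
)

// unfused computes x*y + z with the product rounded to float64 first.
// The explicit conversion prevents the compiler from fusing the
// multiply and the add, on any architecture.
func unfused(x, y, z float64) float64 {
	return float64(x*y) + z
}

// fused computes x*y + z with a single rounding at the end.
func fused(x, y, z float64) float64 {
	return math.FMA(x, y, z)
}

func main() {
	x := 1 + math.Ldexp(1, -30)    // 1 + 2^-30
	z := -(1 + math.Ldexp(1, -29)) // -(1 + 2^-29)

	// Exactly, x*x = 1 + 2^-29 + 2^-60. The 2^-60 term is lost when the
	// product is rounded to float64, so the two results differ.
	fmt.Println("unfused:", unfused(x, x, z))
	fmt.Println("fused:  ", fused(x, x, z))
}
```

A plain `x*y + z` (without the conversion) is the form the compiler may or may not fuse depending on the architecture, which is why such differences can be architecture-dependent.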
2022-10-23cmd/internal/ssa: correct references to _gen folderJohan Brandhorst-Satzkorn
The gen folder was renamed to _gen in CL 435472, but references in code and docs were not updated. This updates the references. Change-Id: Ibadc0cdcb5bed145c3257b58465a8df370487ae5 Reviewed-on: https://go-review.googlesource.com/c/go/+/444355 Reviewed-by: Bryan Mills <bcmills@google.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Johan Brandhorst-Satzkorn <johan.brandhorst@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-10-07cmd/compile: leverage cc ops in more cases on ppc64xLynn Boger
This updates some rules to use ops with CC variations to set the condition code when the result of the operation is zero. This allows the following compare with zero to be removed since the equivalent condition code has already been set. In addition, a previous rule change to use ANDCCconst was modified to allow any constant value, not just 1 in some cases.

Improvements in the reflect package benchmarks:

    DeepEqual/int8-4           23.9ns ± 1%  23.1ns ± 1%  -3.57%  (p=0.029 n=4+4)
    DeepEqual/[]int8-4          109ns ± 2%   102ns ± 1%  -6.67%  (p=0.029 n=4+4)
    DeepEqual/int16-4          23.8ns ± 1%  22.8ns ± 0%  -3.97%  (p=0.029 n=4+4)
    DeepEqual/[]int16-4         108ns ± 1%   102ns ± 0%  -6.25%  (p=0.029 n=4+4)
    DeepEqual/int32-4          24.9ns ± 3%  23.6ns ± 0%  -5.09%  (p=0.029 n=4+4)
    DeepEqual/[]int32-4         109ns ± 1%   103ns ± 0%  -5.64%  (p=0.029 n=4+4)
    DeepEqual/int64-4          25.5ns ± 1%  23.7ns ± 0%  -7.03%  (p=0.029 n=4+4)
    DeepEqual/[]int64-4         109ns ± 1%   102ns ± 0%  -6.73%  (p=0.029 n=4+4)
    DeepEqual/int-4            23.2ns ± 1%  22.7ns ± 0%  -2.05%  (p=0.029 n=4+4)
    DeepEqual/[]int-4           109ns ± 3%   101ns ± 0%  -7.18%  (p=0.029 n=4+4)
    DeepEqual/uint8-4          23.9ns ± 1%  23.5ns ± 0%  -1.69%  (p=0.029 n=4+4)
    DeepEqual/[]uint8-4        89.1ns ± 0%  85.6ns ± 1%  -3.95%  (p=0.029 n=4+4)
    DeepEqual/uint16-4         24.0ns ± 1%  23.8ns ± 0%  -0.76%  (p=0.343 n=4+4)
    DeepEqual/[]uint16-4        111ns ± 0%   106ns ± 4%  -4.74%  (p=0.029 n=4+4)
    DeepEqual/uint32-4         23.5ns ± 1%  23.0ns ± 0%  -2.15%  (p=0.029 n=4+4)
    DeepEqual/[]uint32-4        110ns ± 1%   104ns ± 0%  -5.66%  (p=0.029 n=4+4)
    DeepEqual/uint64-4         24.6ns ± 1%  24.3ns ± 0%  -1.10%  (p=0.143 n=4+4)
    DeepEqual/[]uint64-4        111ns ± 0%   105ns ± 1%  -5.16%  (p=0.029 n=4+4)
    DeepEqual/uint-4           23.6ns ± 0%  23.0ns ± 0%  -2.70%  (p=0.029 n=4+4)
    DeepEqual/[]uint-4          109ns ± 0%   103ns ± 1%  -5.74%  (p=0.029 n=4+4)
    DeepEqual/uintptr-4        25.1ns ± 1%  24.8ns ± 2%  -1.11%  (p=0.171 n=4+4)
    DeepEqual/[]uintptr-4       111ns ± 0%   106ns ± 1%  -4.45%  (p=0.029 n=4+4)
    DeepEqual/float32-4        22.5ns ± 0%  22.2ns ± 0%  -1.29%  (p=0.029 n=4+4)
    DeepEqual/[]float32-4       105ns ± 0%   101ns ± 1%  -3.75%  (p=0.029 n=4+4)
    DeepEqual/float64-4        22.7ns ± 2%  22.1ns ± 0%  -2.52%  (p=0.029 n=4+4)
    DeepEqual/[]float64-4       105ns ± 1%   103ns ± 1%  -2.77%  (p=0.029 n=4+4)
    DeepEqual/complex64-4      22.9ns ± 0%  22.8ns ± 0%  -0.48%  (p=0.029 n=4+4)
    DeepEqual/[]complex64-4     107ns ± 0%   101ns ± 0%  -5.48%  (p=0.029 n=4+4)
    DeepEqual/complex128-4     23.2ns ± 1%  22.6ns ± 0%  -2.34%  (p=0.029 n=4+4)
    DeepEqual/[]complex128-4    107ns ± 0%   101ns ± 0%  -5.60%  (p=0.029 n=4+4)
    DeepEqual/bool-4           22.0ns ± 1%  21.7ns ± 0%  -1.44%  (p=0.029 n=4+4)
    DeepEqual/[]bool-4          106ns ± 1%   100ns ± 0%  -5.42%  (p=0.029 n=4+4)
    DeepEqual/string-4         26.7ns ± 1%  24.7ns ± 0%  -7.47%  (p=0.029 n=4+4)
    DeepEqual/[]string-4        112ns ± 0%   107ns ± 0%  -4.21%  (p=0.029 n=4+4)
    DeepEqual/[]uint8#01-4     89.4ns ± 1%  85.5ns ± 1%  -4.44%  (p=0.029 n=4+4)
    DeepEqual/[][]uint8-4       177ns ± 0%   173ns ± 1%  -2.22%  (p=0.029 n=4+4)
    DeepEqual/[6]uint8-4        137ns ± 1%   137ns ± 0%  -0.56%  (p=0.057 n=4+4)
    DeepEqual/[][6]uint8-4      232ns ± 0%   230ns ± 1%  -1.09%  (p=0.029 n=4+4)

Change-Id: I275624e21dc4d70001032be48897f1504cbfdd1c Reviewed-on: https://go-review.googlesource.com/c/go/+/427634 Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Archana Ravindar <aravind5@in.ibm.com> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-09-28cmd/compile: remove some lines from PPC64.rulesLynn Boger
In CL 429035 Keith suggested removing some rules that were covered by generic rules. This follows up on that comment. Change-Id: I57b6c9ae0cd85f33a0eb2fef8356575d3d7820fb Reviewed-on: https://go-review.googlesource.com/c/go/+/430417 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2022-09-27cmd/compile: rework PPC64 Mul64uhilo lowering rulesPaul E. Murphy
Remove OpPPC64LoweredMuluhilo as this operation can be done more efficiently with MULHDU and MULLD directly. This has the benefit of not needing to use tuple select operations, and giving the scheduler more freedom to place these operations. The primary reason to avoid using tuples here is to avoid suboptimal scheduling when carry ops (e.g. ADDC/ADDE) are used in the same block as 64->128b multiplies. CL 432275 modifies the scheduling priorities, which may cause non-flag/non-carry generating tuple ops to interfere with carry opcodes, resulting in excess saving and restoring of the XER register. This allows CL 432275 to adjust the scheduling priorities without having to work around odd tuple scheduling behavior. Change-Id: Id04ef009ec4b86416e5436f2b44ae1474e73720e Reviewed-on: https://go-review.googlesource.com/c/go/+/434855 Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
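As a rough source-level illustration of the lowering described above, the two halves of a 64x64->128-bit product can be obtained independently. The helper names `mulHi` and `mulLo` are hypothetical and chosen for this sketch; the comments describe the instructions the commit message associates with each half, not verified compiler output.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mulHi returns the high 64 bits of the 128-bit product x*y.
// Per the commit message, this half corresponds to MULHDU.
func mulHi(x, y uint64) uint64 {
	hi, _ := bits.Mul64(x, y)
	return hi
}

// mulLo returns the low 64 bits of the product, which is just the
// ordinary wrapping multiply. This half corresponds to MULLD.
func mulLo(x, y uint64) uint64 {
	return x * y
}

func main() {
	x, y := uint64(1)<<63, uint64(6)
	hi, lo := bits.Mul64(x, y)
	// The two independently computed halves agree with bits.Mul64.
	fmt.Println(hi == mulHi(x, y), lo == mulLo(x, y))
}
```

Because neither half depends on the other, the scheduler is free to place them separately, which is the freedom the commit message refers to.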
2022-09-08cmd/compile/internal: merge rules in PPC64.rulesLynn Boger
This uses rulegen syntax which allows similar rules to be combined, saving lines in the rules file. The Lsh16x32 rule had an incorrect value and that was fixed. Change-Id: I637410e39d8554825076aca5ac24083ce05ab186 Reviewed-on: https://go-review.googlesource.com/c/go/+/429035 Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2022-08-23cmd/compile: move SSA rotate instruction detection to arch-independent rulesKeith Randall
Detect rotate instructions while still in architecture-independent form. It's easier to do here, and we don't need to repeat it in each architecture file. Change-Id: I9396954b3f3b3bfb96c160d064a02002309935bb Reviewed-on: https://go-review.googlesource.com/c/go/+/421195 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Eric Fang <eric.fang@arm.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Joedian Reid <joedian@golang.org> Reviewed-by: Ruinan Sun <Ruinan.Sun@arm.com> Run-TryBot: Keith Randall <khr@golang.org>
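The source pattern that rotate detection targets can be sketched as below. This is an illustrative example only; `rotl13` is a hypothetical name, and the claim is limited to the two forms computing the same value, not to what any particular compiler version emits.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotl13 spells a rotate-left by 13 as a shift/or pair, the idiom that
// rotate detection looks for.
func rotl13(x uint64) uint64 {
	return x<<13 | x>>(64-13)
}

func main() {
	x := uint64(0x8000000000000001)
	// Both forms compute the same value.
	fmt.Printf("%#x %#x\n", rotl13(x), bits.RotateLeft64(x, 13))
}
```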
2022-08-08cmd/compile: fix confusion with ANDCCconst in PPC64 rulesLynn Boger
Currently there is an ANDconst and an ANDCCconst op in PPC64, which is confusing since they map onto the same instruction. One of these ops sets the result of the AND operation, and the other sets the flag (condition register). This converts ANDCCconst into an op with the 2 expected results: the integer result of the AND and the flag setting. The ANDconst op has been removed. Note that in the PPC64 ISA the only variation of the 'and immediate' is the one that sets the condition bit, which probably led to the original (confusing) implementation. This also adds a few rules to improve the use of ANDCCconst with ISELB and some testcases to verify those improvements. Change-Id: I523703fa4da2098eb995dc3ba744d36fa28e41d4 Reviewed-on: https://go-review.googlesource.com/c/go/+/422015 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Paul Murphy <murp@ibm.com>
2022-05-10cmd/compile: lower Add64/Sub64 into ssa on PPC64Paul E. Murphy
math/bits.Add64 and math/bits.Sub64 now lower and optimize directly in SSA form. The optimization of carry chains focuses around eliding XER<->GPR transfers of the CA bit when used exclusively as an input to a single carry operation, or when the CA value is known. This also adds support for handling XER spills in the assembler which could happen if carry chains contain inter-dependencies on each other (which seems very unlikely with practical usage), or a clobber happens (SRAW/SRAD/SUBFC operations clobber CA).

With PPC64 Add64/Sub64 lowering into SSA and this patch, the net performance difference in crypto/elliptic benchmarks on P9/ppc64le is:

    name                                old time/op  new time/op  delta
    ScalarBaseMult/P256                 46.3µs ± 0%  46.9µs ± 0%   +1.34%
    ScalarBaseMult/P224                  356µs ± 0%   209µs ± 0%  -41.14%
    ScalarBaseMult/P384                 1.20ms ± 0%  0.57ms ± 0%  -52.14%
    ScalarBaseMult/P521                 3.38ms ± 0%  1.44ms ± 0%  -57.27%
    ScalarMult/P256                      199µs ± 0%   199µs ± 0%   -0.17%
    ScalarMult/P224                      357µs ± 0%   212µs ± 0%  -40.56%
    ScalarMult/P384                     1.20ms ± 0%  0.58ms ± 0%  -51.86%
    ScalarMult/P521                     3.37ms ± 0%  1.44ms ± 0%  -57.32%
    MarshalUnmarshal/P256/Uncompressed  2.59µs ± 0%  2.52µs ± 0%   -2.63%
    MarshalUnmarshal/P256/Compressed    2.58µs ± 0%  2.52µs ± 0%   -2.06%
    MarshalUnmarshal/P224/Uncompressed  1.54µs ± 0%  1.40µs ± 0%   -9.42%
    MarshalUnmarshal/P224/Compressed    1.54µs ± 0%  1.39µs ± 0%   -9.87%
    MarshalUnmarshal/P384/Uncompressed  2.40µs ± 0%  1.80µs ± 0%  -24.93%
    MarshalUnmarshal/P384/Compressed    2.35µs ± 0%  1.81µs ± 0%  -23.03%
    MarshalUnmarshal/P521/Uncompressed  3.79µs ± 0%  2.58µs ± 0%  -31.81%
    MarshalUnmarshal/P521/Compressed    3.80µs ± 0%  2.60µs ± 0%  -31.67%

Note: P256 uses an asm implementation, so little variation is expected.
Change-Id: I88a24f6bf0f4f285c649e40243b1ab69cc452b71 Reviewed-on: https://go-review.googlesource.com/c/go/+/346870 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com>
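A carry chain of the kind this change optimizes can be sketched at the source level as a 128-bit addition built from two math/bits.Add64 calls. The `uint128` type and `add128` function are hypothetical names for illustration; the point is only that the carry out of the first call feeds exactly one following carry operation, which is the case the commit message says avoids XER<->GPR transfers.

```go
package main

import (
	"fmt"
	"math/bits"
)

// uint128 is an illustrative two-word unsigned integer.
type uint128 struct{ hi, lo uint64 }

// add128 adds two 128-bit values and returns the sum and the final carry.
// The carry from the low-word add is consumed only by the high-word add.
func add128(a, b uint128) (sum uint128, carry uint64) {
	var c uint64
	sum.lo, c = bits.Add64(a.lo, b.lo, 0)
	sum.hi, carry = bits.Add64(a.hi, b.hi, c)
	return sum, carry
}

func main() {
	a := uint128{hi: 0, lo: ^uint64(0)}
	b := uint128{hi: 0, lo: 1}
	sum, carry := add128(a, b)
	fmt.Println(sum.hi, sum.lo, carry)
}
```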
2022-05-10cmd/compile: fix boolean comparison on PPC64Cherry Mui
Following CL 405114, for PPC64. Should fix PPC64 builds. Updates #52788. Change-Id: I193ac31cfba18b4f907dd2523b51368251fd6fad Reviewed-on: https://go-review.googlesource.com/c/go/+/405116 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com>
2022-05-08cmd/compile: teach prove about and operationWayne Zuo
For this code:

    z &= 63
    _ = x<<z | x>>(64-z)

prove can now show that 'x<<z' is in bounds. As a result, the ppc64 lowering pass no longer produces an extra '(ANDconst <typ.Int64> [63] z)', which caused codegen/rotate.go to fail. As a workaround, just remove the type check in the rewrite rules. Removes 32 bounds checks during make.bat. Fixes #52563. Change-Id: I14ed2c093ff5638dfea7de9bc7649c0f756dd71a Reviewed-on: https://go-review.googlesource.com/c/go/+/404315 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Keith Randall <khr@golang.org> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org>
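The snippet from the commit message can be wrapped in a small runnable function to show why the mask matters. `rotlVar` is a hypothetical name used only for this sketch. After `z &= 63` the left shift amount is known to be below 64, which is the fact the prove pass can now derive.

```go
package main

import "fmt"

// rotlVar rotates x left by z (mod 64) using the masked shift/or idiom.
// After the mask, z is in [0, 63], so x<<z never shifts out of range.
// When z is 0, x>>(64-z) shifts by 64, which Go defines to yield 0.
func rotlVar(x uint64, z uint) uint64 {
	z &= 63
	return x<<z | x>>(64-z)
}

func main() {
	x := uint64(0x8000000000000001)
	fmt.Printf("%#x %#x\n", rotlVar(x, 13), rotlVar(x, 64+13))
}
```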
2022-05-05cmd/compile/internal: intrinsify publicationBarrier on ppc64xLynn Boger
This enables publicationBarrier to be used as an intrinsic on ppc64le/ppc64. A call to this appears in test/bench/go1 BinaryTree17 Change-Id: If53528a82de99688270473cbe23472f37046ad65 Reviewed-on: https://go-review.googlesource.com/c/go/+/404056 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com>
2022-05-04cmd/compile: combine OR + NOT into ORN on PPC64Paul E. Murphy
This shows up in a few crypto functions, and other assorted places. Change-Id: I5a7f4c25ddd4a6499dc295ef693b9fe43d2448ab Reviewed-on: https://go-review.googlesource.com/c/go/+/404057 Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Russ Cox <rsc@golang.org>
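The source-level shape that this combination applies to is an OR with a complemented operand. A minimal sketch follows; `orNot` is a hypothetical name and the example makes no claim about the exact instructions emitted for it.

```go
package main

import "fmt"

// orNot computes x | ^y, an OR whose second operand is complemented.
// This is the OR + NOT shape that the commit folds into one ORN op.
func orNot(x, y uint64) uint64 {
	return x | ^y
}

func main() {
	fmt.Printf("%#x\n", orNot(0x0f, 0xff))
}
```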
2022-03-15cmd/compile: fix PrefetchStreamed builtin implementation on PPC64Archana R
This CL fixes the encoding of PrefetchStreamed on PPC64 to be consistent with what is implemented on the AMD64 and ARM64 platforms, which is prefetchNTA (prefetch non-temporal access). Judging from the definition of prefetchNTA, the closest corresponding Touch Hint (TH) value on PPC64 is 16, which states that the address is accessed in a transient manner. The current use of TH=8 may degrade performance. Change-Id: I393bf5a9b971a22f632b3cbfb4fa659062af9a27 Reviewed-on: https://go-review.googlesource.com/c/go/+/390316 Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>