path: root/src/cmd/compile/internal/ssa/_gen
Age | Commit message | Author
2 days | cmd/compile: add boolean absorption laws to SSA rewrite rules | Timo Friedl

The SSA generic rewrite rules implement DeMorgan's laws but are missing the closely related boolean absorption laws:

    x & (x | y) == x
    x | (x & y) == x

These are fundamental boolean algebra identities (see https://en.wikipedia.org/wiki/Absorption_law) that hold for all bit patterns, all widths, signed and unsigned. Both GCC and LLVM recognize and optimize these patterns at -O2.

Add two generic rules covering all four widths (8, 16, 32, 64). Commutativity of AND/OR is handled automatically by the rule engine, so all argument orderings are matched.

The rules eliminate two redundant ALU instructions per occurrence and fire on real code (defer bit-manipulation patterns in runtime, testing, go/parser, and third-party packages).

Fixes #78632

Change-Id: Ib59e839081302ad1635e823309d8aec768c25dcf
GitHub-Last-Rev: 23f8296ece08c77fcaeeaf59c2c2d8ce23d1202c
GitHub-Pull-Request: golang/go#78634
Reviewed-on: https://go-review.googlesource.com/c/go/+/765580
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
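The identities the new rules exploit can be exercised directly in Go (an illustrative sketch, not compiler code):

```go
package main

import "fmt"

// The two absorption laws: they hold for every bit pattern and width
// because each bit position satisfies the boolean identity independently.
func absorbAnd(x, y uint64) uint64 { return x & (x | y) } // rewritten to x
func absorbOr(x, y uint64) uint64  { return x | (x & y) } // rewritten to x

func main() {
	xs := []uint64{0, 1, 0xff, 1 << 63, ^uint64(0)}
	ys := []uint64{0, 3, 0xf0f0, 1 << 63, ^uint64(0)}
	for _, x := range xs {
		for _, y := range ys {
			if absorbAnd(x, y) != x || absorbOr(x, y) != x {
				panic("absorption law violated")
			}
		}
	}
	fmt.Println("absorption laws hold")
}
```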
9 days | cmd/compile: optimize CondSelect to math on arm64 with inline register shifts | Jorropo

Change-Id: I27696b1a5fa0593d9f36743efa3559a36d23ec4b
Reviewed-on: https://go-review.googlesource.com/c/go/+/760844
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
9 days | cmd/compile: improve Mul to Left Shift rules | Jorropo

- Fix a bug where it wouldn't recognize 1<<63 as a power of two.
- Remove the IsSigned check; there is no such thing as a signed Mul.

If the rule works for signed numbers it works for unsigned ones too. Even if the intermediary steps make no sense, it ends up wrapping the right way around in the end.

Change-Id: I86182762aec5eff784e2d9bc49ee028825fb9ea0
Reviewed-on: https://go-review.googlesource.com/c/go/+/760843
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
9 days | cmd/compile: cleanup rules by canonicalizing sext(int(bool)) → zext(int(bool)) | Jorropo

Change-Id: Ic97f661c68180ff7adb9976fcc61279e1e1f04a4
Reviewed-on: https://go-review.googlesource.com/c/go/+/760842
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
9 days | cmd/compile: extend condselect into math code to handle other constants than 1 | Jorropo

On amd64, for example:

    if b { x += 1 }  =>  x += b

We can also implement constants 2, 4, and 8:

    if b { x += 2 }  =>  x += b * 2

This compiles to a displacement LEA.

Change-Id: Ib00fcc5059acb0ebb346e056c4a656f164cc63df
Reviewed-on: https://go-review.googlesource.com/c/go/+/760841
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
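The source pattern the rule targets, next to the branchless math it conceptually rewrites into (the bool widened to 0/1, then scaled, which amd64 can do in one LEA):

```go
package main

import "fmt"

// condAdd is the branching form from the commit message.
func condAdd(x uint64, b bool) uint64 {
	if b {
		x += 2
	}
	return x
}

// condAddMath is the equivalent branchless form: x += b2i(b) * 2.
func condAddMath(x uint64, b bool) uint64 {
	var bi uint64
	if b {
		bi = 1
	}
	return x + bi*2
}

func main() {
	for _, b := range []bool{false, true} {
		if condAdd(40, b) != condAddMath(40, b) {
			panic("forms disagree")
		}
	}
	fmt.Println(condAdd(40, true)) // 42
}
```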
9 days | cmd/compile: improve uint8/uint16 logical immediates on PPC64 | Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>

Logical ops on uint8/uint16 (AND/OR/XOR) with constants sometimes materialized the mask via MOVD (often as a negative immediate), even when the value fit in the UI-immediate range. This prevented the backend from selecting andi. / ori / xori forms.

With this CL, UI-immediate truncation is performed only at the use site of logical-immediate ops, and only when the constant does not fit in the 8- or 16-bit unsigned domain (m != uint8(m) / m != uint16(m)). This avoids negative-mask materialization and enables correct emission of UI-form logical instructions. Arithmetic SI-immediate instructions (addi, subfic, etc.) and other use patterns are unchanged.

Codegen tests are added to ensure the expected andi./ori/xori patterns appear and that MOVD is not emitted for valid 8/16-bit masks.

Change-Id: I9fcdf4498c4e984c7587814fb9019a75865c4a0d
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/704015
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Paul Murphy <paumurph@redhat.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
12 days | cmd/compile: extend all the cmov into math generic rules with their contrary | Jorropo

If the bool comes from a local operation this is foldable into the comparison:

    if a == b { } else { x++ }

becomes:

    x += !(a == b)

becomes:

    x += a != b

If the bool is passed in or loaded rather than being locally computed, this adds an extra XOR ^1 to invert it. But at worst it should make the math equal to the compute + CMP + CMOV, which is a tie on modern CPUs that can execute CMOV on all int ALUs, and a win on the cheaper or older ones that can't.

Change-Id: Idd2566c7a3826ec432ebfbba7b3898aa0db4b812
Reviewed-on: https://go-review.googlesource.com/c/go/+/760922
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
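The transformation chain written out as running Go: an increment on the else branch is the same as adding the inverted comparison.

```go
package main

import "fmt"

// elseInc is the branching form: increment only on the else branch.
func elseInc(x uint64, a, b int) uint64 {
	if a == b {
		// nothing
	} else {
		x++
	}
	return x
}

// elseIncMath is the folded form: x += (a != b), bool widened to 0/1.
func elseIncMath(x uint64, a, b int) uint64 {
	var ne uint64
	if a != b {
		ne = 1
	}
	return x + ne
}

func main() {
	for _, p := range [][2]int{{1, 1}, {1, 2}, {-5, 5}} {
		if elseInc(10, p[0], p[1]) != elseIncMath(10, p[0], p[1]) {
			panic("fold changed the result")
		}
	}
	fmt.Println("ok")
}
```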
2026-03-31 | cmd/compile: convert some condmoves in XOR | Jorropo

Similar to CL 685676 but for XOR.

Change-Id: Ib5ffd4c13348f176a808b3218fdbbafc2c42794f
Reviewed-on: https://go-review.googlesource.com/c/go/+/760921
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2026-03-31 | cmd/compile: convert some condmoves in OR | Jorropo

Similar to CL 685676 but for OR.

Change-Id: I0ddfd457ed9e8888462306138a251ac48ad42084
Reviewed-on: https://go-review.googlesource.com/c/go/+/760920
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
2026-03-26 | cmd/compile: on ARM64 merge SRA into TBZ & TBNZ | Jorropo

Change-Id: I9596dbca8991c93c7543d10dc1b155056dfa7db3
Reviewed-on: https://go-review.googlesource.com/c/go/+/759500
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2026-03-26 | cmd/compile: on ARM64 merge ROR into TBZ & TBNZ | Jorropo

Change-Id: Ib37b35dfff6236c59c0242c3b7d979c95aefbb8b
Reviewed-on: https://go-review.googlesource.com/c/go/+/750321
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
2026-03-26 | cmd/compile: on ARM64 merge shifts into TBZ & TBNZ | Jorropo

Change-Id: I4dff3ba1462848f408257cbadedf202e62d1ea69
Reviewed-on: https://go-review.googlesource.com/c/go/+/750320
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2026-03-25 | cmd/compile: ppc64 fold (x+x)<<c into x<<(c+1) | Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>

On ppc64/ppc64le, rewrite (x + x) << c to x << (c+1) for constant shifts. This removes an ADD, shortens the dependency chain, and reduces code size.

Add rules for both 64-bit (SLDconst) and 32-bit (SLWconst), and extend test/codegen/shift.go with ppc64x checks to assert a single SLD/SLW and forbid ADD. Aligns ppc64 with other architectures that already assert similar codegen in shift.go.

Change-Id: Ie564afbb029a5bd48887b82b0c455ca1dddd5508
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/712000
Reviewed-by: Archana Ravindar <aravinda@redhat.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2026-03-20 | Revert "runtime, cmd/compile: use preemptible memclr for large pointer-free clears" | Michael Pratt

This reverts CL 750480.

Reason: Adding preemptible memclrNoHeapPointers exposes existing unsafe use of notInHeapSlice, causing crashes. Revert the memclr stack until the underlying issue is fixed.

We keep the test added in CL 755942, which is useful regardless.

For #78254.

Change-Id: I8be3f9a20292b7f294e98e74e5a86c6a204406ae
Reviewed-on: https://go-review.googlesource.com/c/go/+/757343
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2026-03-20 | cmd/compile: elide sign-extend after zero-extend for wasm | George Adams

Add rules to eliminate sign-extension of values that have already been zero-extended from fewer bits via an I64And mask:

    (I64Extend32S x:(I64And _ (I64Const [c]))) && c >= 0 && int64(int32(c)) == c => x
    (I64Extend16S x:(I64And _ (I64Const [c]))) && c >= 0 && int64(int16(c)) == c => x
    (I64Extend8S  x:(I64And _ (I64Const [c]))) && c >= 0 && int64(int8(c))  == c => x

When a value has been masked to fit within the non-negative range of the sign-extension width, the upper bits are already zero and sign-extending is a no-op. For example, (I64Extend32S (I64And x 0xff)) can be elided because 0xff fits in a signed int32, so bit 31 is guaranteed to be zero and sign-extending from 32 bits is identity.

Cq-Include-Trybots: luci.golang.try:gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero
Change-Id: Ia54d67358756e47ca7635a6a8ca4beadb003820a
Reviewed-on: https://go-review.googlesource.com/c/go/+/756320
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
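Why the elision is safe, shown in Go: after masking with a constant that fits the non-negative range of the extension width, the narrower type's sign bit is guaranteed zero, so sign-extension is the identity.

```go
package main

import "fmt"

// maskThenExtend mirrors (I64Extend32S (I64And x (I64Const [0xff]))):
// the mask clears everything above bit 7, so the int32 sign bit is zero
// and the sign-extend the rule removes cannot change the value.
func maskThenExtend(v uint64) int64 {
	m := v & 0xff
	return int64(int32(m))
}

func main() {
	for _, v := range []uint64{0, 0x7f, 0x80, 0xff, 0x1ff, ^uint64(0)} {
		if maskThenExtend(v) != int64(v&0xff) {
			panic("sign-extend was not a no-op")
		}
	}
	fmt.Println("ok")
}
```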
2026-03-20 | cmd/compile: (mips64x) optimize float32(abs|sqrt64(float64(x))) | Julian Zhu

Absorb unnecessary conversion between float32 and float64 if both src and dst are 32 bit.

Ref: CL 733621

Updates #75463

Change-Id: I439f92aa3d940fa4979e76845c0893e43bf584af
Reviewed-on: https://go-review.googlesource.com/c/go/+/739520
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
2026-03-18 | cmd/link: modify the register used in trampoline | limeidan

R30 is a callee-saved register; using it requires saving and then restoring it. Therefore, we replace it with a caller-saved register. R4~R19 are argument registers on loong64, and R20 is the only remaining usable caller-saved register. To use R20 in the trampoline, we modified the registers used by the LoweredMove/LoweredMoveLoop operations (originally R20 and R21, now R23 and R24).

Change-Id: Ie7bba0caa30a764a45bcb47635c35c829036c5a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/726140
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2026-03-17 | cmd/compile: use 128-bit arm64 vector ops for Move expansion | Alexander Musman

Update Move rewrite rules to use FMOVQload/store and FLDPQ/FSTPQ for medium-sized copies (16-64 bytes). This generates fewer and wider instructions than the previous approach using LDP/STP pairs.

    Executable    Base .text    go1       Change
    ---------------------------------------------
    asm              2112308    2105732   -0.31%
    cgo              1826132    1823172   -0.16%
    compile         10474868   10460644   -0.14%
    cover            1990036    1985748   -0.22%
    fix              3234116    3226340   -0.24%
    link             2702628    2695316   -0.27%
    preprofile        947652     947028   -0.07%
    vet              3140964    3133524   -0.24%

Performance effect on OrangePi 6 plus:

                │  orig.out   │  movq.out                            │
                │   sec/op    │   sec/op     vs base                 │
    CopyFat16     0.4711n ± 0%  0.3852n ± 0%  -18.23% (p=0.000 n=10)
    CopyFat17     0.7705n ± 0%  0.7705n ± 0%        ~ (p=0.984 n=10)
    CopyFat18     0.7703n ± 0%  0.7703n ± 0%        ~ (p=0.771 n=10)
    CopyFat19     0.7703n ± 0%  0.7703n ± 0%        ~ (p=0.637 n=10)
    CopyFat20     0.7703n ± 0%  0.7704n ± 0%        ~ (p=0.103 n=10)
    CopyFat21     0.7703n ± 0%  0.7708n ± 0%        ~ (p=0.505 n=10)
    CopyFat22     0.7704n ± 0%  0.7705n ± 0%        ~ (p=0.589 n=10)
    CopyFat23     0.7703n ± 0%  0.7703n ± 0%        ~ (p=0.347 n=10)
    CopyFat24     0.7704n ± 0%  0.7703n ± 0%        ~ (p=0.383 n=10)
    CopyFat25     0.8385n ± 0%  0.6589n ± 0%  -21.41% (p=0.000 n=10)
    CopyFat26     0.8386n ± 0%  0.6590n ± 0%  -21.42% (p=0.000 n=10)
    CopyFat27     0.8385n ± 0%  0.6590n ± 0%  -21.41% (p=0.000 n=10)
    CopyFat28     0.8386n ± 0%  0.6571n ± 0%  -21.65% (p=0.000 n=10)
    CopyFat29     0.8385n ± 0%  0.6590n ± 0%  -21.41% (p=0.000 n=10)
    CopyFat30     0.8387n ± 0%  0.6591n ± 0%  -21.42% (p=0.000 n=10)
    CopyFat31     0.8385n ± 0%  0.6589n ± 0%  -21.42% (p=0.000 n=10)
    CopyFat32     0.8318n ± 0%  0.4969n ± 0%  -40.26% (p=0.000 n=10)
    CopyFat33     1.1550n ± 0%  0.7705n ± 0%  -33.29% (p=0.000 n=10)
    CopyFat34     1.1560n ± 0%  0.7703n ± 0%  -33.37% (p=0.000 n=10)
    CopyFat35     1.1550n ± 0%  0.7705n ± 0%  -33.29% (p=0.000 n=10)
    CopyFat36     1.1550n ± 0%  0.7704n ± 0%  -33.30% (p=0.000 n=10)
    CopyFat37     1.1555n ± 0%  0.7704n ± 0%  -33.33% (p=0.000 n=10)
    CopyFat38     1.1550n ± 0%  0.7704n ± 0%  -33.30% (p=0.000 n=10)
    CopyFat39     1.1560n ± 0%  0.7703n ± 0%  -33.36% (p=0.000 n=10)
    CopyFat40     1.0020n ± 0%  0.7705n ± 0%  -23.10% (p=0.000 n=10)
    CopyFat41     1.2060n ± 0%  0.7703n ± 0%  -36.12% (p=0.000 n=10)
    CopyFat42     1.2060n ± 0%  0.7704n ± 0%  -36.12% (p=0.000 n=10)
    CopyFat43     1.2060n ± 0%  0.7705n ± 0%  -36.11% (p=0.000 n=10)
    CopyFat44     1.2060n ± 0%  0.7704n ± 0%  -36.12% (p=0.000 n=10)
    CopyFat45     1.2060n ± 0%  0.7704n ± 0%  -36.12% (p=0.000 n=10)
    CopyFat46     1.2060n ± 0%  0.7703n ± 0%  -36.13% (p=0.000 n=10)
    CopyFat47     1.2060n ± 0%  0.7703n ± 0%  -36.12% (p=0.000 n=10)
    CopyFat48     1.2060n ± 0%  0.7703n ± 0%  -36.13% (p=0.000 n=10)
    CopyFat49     1.3620n ± 0%  0.8622n ± 0%  -36.70% (p=0.000 n=10)
    CopyFat50     1.3620n ± 0%  0.8621n ± 0%  -36.70% (p=0.000 n=10)
    CopyFat51     1.3620n ± 0%  0.8622n ± 0%  -36.70% (p=0.000 n=10)
    CopyFat52     1.3620n ± 0%  0.8623n ± 0%  -36.69% (p=0.000 n=10)
    CopyFat53     1.3620n ± 0%  0.8621n ± 0%  -36.70% (p=0.000 n=10)
    CopyFat54     1.3620n ± 0%  0.8622n ± 0%  -36.70% (p=0.000 n=10)
    CopyFat55     1.3620n ± 0%  0.8620n ± 0%  -36.71% (p=0.000 n=10)
    CopyFat56     1.3120n ± 0%  0.8622n ± 0%  -34.28% (p=0.000 n=10)
    CopyFat57     1.5905n ± 0%  0.8621n ± 0%  -45.80% (p=0.000 n=10)
    CopyFat58     1.5830n ± 1%  0.8622n ± 0%  -45.53% (p=0.000 n=10)
    CopyFat59     1.5865n ± 1%  0.8621n ± 0%  -45.66% (p=0.000 n=10)
    CopyFat60     1.5720n ± 1%  0.8622n ± 0%  -45.15% (p=0.000 n=10)
    CopyFat61     1.5900n ± 1%  0.8621n ± 0%  -45.78% (p=0.000 n=10)
    CopyFat62     1.5890n ± 0%  0.8622n ± 0%  -45.74% (p=0.000 n=10)
    CopyFat63     1.5900n ± 1%  0.8620n ± 0%  -45.78% (p=0.000 n=10)
    CopyFat64     1.5440n ± 0%  0.8568n ± 0%  -44.51% (p=0.000 n=10)
    geomean       1.093n        0.7636n       -30.13%

Kunpeng 920C:

    goos: linux
    goarch: arm64
    pkg: runtime
                │  orig.out   │  movq.out                            │
                │   sec/op    │   sec/op     vs base                 │
    CopyFat16     0.4892n ± 1%  0.5072n ± 0%   +3.68% (p=0.000 n=10)
    CopyFat17     0.6394n ± 0%  0.4638n ± 0%  -27.47% (p=0.000 n=10)
    CopyFat18     0.6394n ± 0%  0.4638n ± 0%  -27.46% (p=0.000 n=10)
    CopyFat19     0.6395n ± 0%  0.4638n ± 0%  -27.48% (p=0.000 n=10)
    CopyFat20     0.6393n ± 0%  0.4638n ± 0%  -27.45% (p=0.000 n=10)
    CopyFat21     0.6394n ± 0%  0.4637n ± 0%  -27.48% (p=0.000 n=10)
    CopyFat22     0.6395n ± 0%  0.4638n ± 0%  -27.47% (p=0.000 n=10)
    CopyFat23     0.6395n ± 0%  0.4638n ± 0%  -27.47% (p=0.000 n=10)
    CopyFat24     0.6091n ± 0%  0.4639n ± 0%  -23.84% (p=0.000 n=10)
    CopyFat25     0.9109n ± 0%  0.4674n ± 0%  -48.69% (p=0.000 n=10)
    CopyFat26     0.9107n ± 0%  0.4674n ± 0%  -48.68% (p=0.000 n=10)
    CopyFat27     0.9108n ± 0%  0.4674n ± 0%  -48.69% (p=0.000 n=10)
    CopyFat28     0.9109n ± 0%  0.4674n ± 0%  -48.69% (p=0.000 n=10)
    CopyFat29     0.9110n ± 0%  0.4673n ± 0%  -48.70% (p=0.000 n=10)
    CopyFat30     0.9109n ± 0%  0.4673n ± 0%  -48.70% (p=0.000 n=10)
    CopyFat31     0.9110n ± 0%  0.4674n ± 0%  -48.69% (p=0.000 n=10)
    CopyFat32     0.6845n ± 0%  0.4845n ± 1%  -29.21% (p=0.000 n=10)
    CopyFat33     0.9130n ± 0%  0.9117n ± 0%   -0.14% (p=0.000 n=10)
    CopyFat34     0.9131n ± 0%  0.9118n ± 0%   -0.14% (p=0.001 n=10)
    CopyFat35     0.9131n ± 0%  0.9117n ± 0%   -0.15% (p=0.001 n=10)
    CopyFat36     0.9129n ± 0%  0.9117n ± 0%   -0.14% (p=0.003 n=10)
    CopyFat37     0.9129n ± 0%  0.9117n ± 0%   -0.14% (p=0.000 n=10)
    CopyFat38     0.9130n ± 0%  0.9118n ± 0%   -0.14% (p=0.000 n=10)
    CopyFat39     0.9131n ± 0%  0.9118n ± 0%   -0.15% (p=0.000 n=10)
    CopyFat40     0.9112n ± 0%  0.9118n ± 0%   +0.07% (p=0.027 n=10)
    CopyFat41     1.1390n ± 0%  0.9118n ± 0%  -19.95% (p=0.000 n=10)
    CopyFat42     1.1390n ± 0%  0.9118n ± 0%  -19.95% (p=0.000 n=10)
    CopyFat43     1.1390n ± 0%  0.9116n ± 0%  -19.96% (p=0.000 n=10)
    CopyFat44     1.1390n ± 0%  0.9119n ± 0%  -19.94% (p=0.000 n=10)
    CopyFat45     1.1390n ± 0%  0.9118n ± 0%  -19.95% (p=0.000 n=10)
    CopyFat46     1.1390n ± 0%  0.9118n ± 0%  -19.95% (p=0.000 n=10)
    CopyFat47     1.1390n ± 0%  0.9117n ± 0%  -19.96% (p=0.000 n=10)
    CopyFat48     0.9111n ± 0%  0.9116n ± 0%   +0.06% (p=0.002 n=10)
    CopyFat49     1.2160n ± 0%  0.9292n ± 0%  -23.59% (p=0.000 n=10)
    CopyFat50     1.2160n ± 0%  0.9302n ± 0%  -23.50% (p=0.000 n=10)
    CopyFat51     1.2160n ± 0%  0.9292n ± 0%  -23.59% (p=0.000 n=10)
    CopyFat52     1.2160n ± 0%  0.9302n ± 0%  -23.50% (p=0.000 n=10)
    CopyFat53     1.2160n ± 0%  0.9293n ± 0%  -23.58% (p=0.000 n=10)
    CopyFat54     1.2160n ± 0%  0.9302n ± 0%  -23.50% (p=0.000 n=10)
    CopyFat55     1.2160n ± 0%  0.9292n ± 0%  -23.59% (p=0.000 n=10)
    CopyFat56     1.1480n ± 0%  0.9303n ± 0%  -18.96% (p=0.000 n=10)
    CopyFat57     1.3690n ± 0%  0.9293n ± 0%  -32.12% (p=0.000 n=10)
    CopyFat58     1.3690n ± 0%  0.9303n ± 0%  -32.05% (p=0.000 n=10)
    CopyFat59     1.3690n ± 0%  0.9293n ± 0%  -32.12% (p=0.000 n=10)
    CopyFat60     1.3690n ± 0%  0.9303n ± 0%  -32.05% (p=0.000 n=10)
    CopyFat61     1.3690n ± 0%  0.9293n ± 0%  -32.12% (p=0.000 n=10)
    CopyFat62     1.3690n ± 0%  0.9303n ± 0%  -32.05% (p=0.000 n=10)
    CopyFat63     1.3690n ± 0%  0.9293n ± 0%  -32.12% (p=0.000 n=10)
    CopyFat64     1.1470n ± 0%  0.5742n ± 0%  -49.94% (p=0.000 n=10)
    geomean       0.9710n       0.7214n       -25.70%

Change-Id: Iecfe52fde1d431a1e4503cd848813a67f3896512
Reviewed-on: https://go-review.googlesource.com/c/go/+/738261
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2026-03-17 | cmd/compile: eliminate redundant sign-extensions for wasm | George Adams

Add rules to eliminate redundant I64Extend sign-extension operations in the wasm backend.

Idempotent (applying the same extend twice is redundant):

    (I64Extend32S (I64Extend32S x)) => (I64Extend32S x)
    (I64Extend16S (I64Extend16S x)) => (I64Extend16S x)
    (I64Extend8S  (I64Extend8S  x)) => (I64Extend8S  x)

Narrower-subsumes-wider (a narrower sign-extend already determines all the bits that a wider one would set):

    (I64Extend32S (I64Extend16S x)) => (I64Extend16S x)
    (I64Extend32S (I64Extend8S  x)) => (I64Extend8S  x)
    (I64Extend16S (I64Extend8S  x)) => (I64Extend8S  x)

These patterns arise from nested sub-word type conversions. For example, converting int8 -> int16 -> int32 -> int64 lowers to I64Extend8S -> I64Extend16S -> I64Extend32S, but the I64Extend8S alone is sufficient since it already sign-extends from 8 to 64 bits.

Cq-Include-Trybots: luci.golang.try:gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero
Change-Id: I1637687df31893b1ffa36915a3bd2e10d455f4ef
Reviewed-on: https://go-review.googlesource.com/c/go/+/754040
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
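The int8 -> int16 -> int32 -> int64 chain from the message, checked in Go: the wider extends add nothing once the 8-bit sign-extend has run.

```go
package main

import "fmt"

// chain mirrors the nested conversions that lower to
// I64Extend8S -> I64Extend16S -> I64Extend32S.
func chain(x int8) int64 {
	return int64(int32(int16(x)))
}

// direct is the single 8-to-64-bit sign-extend that subsumes the chain.
func direct(x int8) int64 {
	return int64(x)
}

func main() {
	for _, x := range []int8{-128, -1, 0, 1, 127} {
		if chain(x) != direct(x) {
			panic("wider extends changed the value")
		}
	}
	fmt.Println("ok")
}
```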
2026-03-16 | cmd/compile: (riscv64) optimize float32(abs|sqrt64(float64(x))) | Meng Zhuo

Absorb unnecessary conversion between float32 and float64 if both src and dst are 32 bit.

Updates #75463

Change-Id: Ia71941223b5cca3fea66b559da7b8f916e63feaf
Reviewed-on: https://go-review.googlesource.com/c/go/+/733621
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Julian Zhu <jz531210@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-03-10 | cmd/compile: add double-mask elimination rule for wasm | George Adams

Add a rule to collapse cascaded I64And operations with constant masks into a single mask:

    (I64And (I64And x (I64Const [c1])) (I64Const [c2])) => (I64And x (I64Const [c1 & c2]))

This pattern arises from sub-word comparisons. For example, (Eq32 x y) lowers to (I64Eq (ZeroExt32to64 x) (ZeroExt32to64 y)), which becomes (I64Eq (I64And x 0xffffffff) (I64And y 0xffffffff)). If x or y is the result of another sub-word operation that already inserted a mask, the masks cascade and this rule collapses them.

Cq-Include-Trybots: luci.golang.try:gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero
Change-Id: Id7856b391be3ac20f1bc9eee40995b52c0754aed
Reviewed-on: https://go-review.googlesource.com/c/go/+/753620
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
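The collapse is just associativity of AND: (x & c1) & c2 == x & (c1 & c2), so two I64And ops become one. A quick Go check:

```go
package main

import "fmt"

// cascaded is the pattern before the rule fires: a 32-bit zero-extend
// mask applied on top of an earlier 16-bit mask.
func cascaded(x uint64) uint64 { return (x & 0xffff) & 0xffffffff }

// collapsed is the single-mask form the rule produces: the constant
// 0xffff & 0xffffffff folds to 0xffff at rewrite time.
func collapsed(x uint64) uint64 { return x & (0xffff & 0xffffffff) }

func main() {
	for _, x := range []uint64{0, 0x12345678, ^uint64(0)} {
		if cascaded(x) != collapsed(x) {
			panic("mask collapse is wrong")
		}
	}
	fmt.Println("ok")
}
```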
2026-03-10 | cmd/compile: fix mips64 CALLtailinter argument count | Keith Randall

It should be 2, not 1.

Fixes #78013

Change-Id: If1c48c84c324a3fd50e9f4b43cca2ea62a995dc5
Reviewed-on: https://go-review.googlesource.com/c/go/+/752740
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2026-03-10 | cmd/compile: forward small Load through Move to avoid redundant copies | dorbmon

Fixes #77720

Add a generic SSA rewrite that forwards `Load` from a `Move` destination back to the `Move` source when it is provably safe, so field reads like `s.h.Value().typ` don't force a full struct copy.

- Add `Load <- Move` rewrite in `generic.rules` with safety guard: non-volatile source
- Tweak `fixedbugs/issue22200*` so that it can still trigger the "stack frame too large" error.
- Regenerate `rewritegeneric.go`.
- Add `test/codegen/moveload.go` to assert no `MOVUPS` and direct `MOVBLZX` in both direct and inlined forms.

Benchmark results (linux/amd64, i7-14700KF):

    $ go test cmd/compile/internal/test -run='^$' -bench='MoveLoad' -count=20
    Before:
    BenchmarkMoveLoadTypViaValue-20    ~76.9 ns/op
    BenchmarkMoveLoadTypViaPtr-20      ~1.97 ns/op
    After:
    BenchmarkMoveLoadTypViaValue-20    ~1.894 ns/op
    BenchmarkMoveLoadTypViaPtr-20      ~1.905 ns/op

The rewrite removes the redundant struct copy in `s.h.Value().typ`, bringing it in line with the direct pointer form.

Change-Id: Iddf2263e390030ba013e0642a695b87c75f899da
Reviewed-on: https://go-review.googlesource.com/c/go/+/748200
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Mark Freeman <markfreeman@google.com>
2026-03-10 | cmd/compile: remove loop variable capture workarounds | gojkovicmatija99

Since Go 1.22, loop variables have per-iteration scope, making the x := x pattern unnecessary for goroutine capture.

No issue required for this trivial cleanup.

Change-Id: I00d98522537fc2b9a6b4d598c8aa21b447628d41
Reviewed-on: https://go-review.googlesource.com/c/go/+/753400
Auto-Submit: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
2026-03-10 | cmd/compile: add identity and absorption rules for wasm | George Adams

Add post-lowering identity and absorption rules for I64And, I64Or, I64Xor, and I64Mul with constant operands:

    (I64And x (I64Const [-1])) => x
    (I64And x (I64Const [0]))  => (I64Const [0])
    (I64Or  x (I64Const [0]))  => x
    (I64Or  x (I64Const [-1])) => (I64Const [-1])
    (I64Xor x (I64Const [0]))  => x
    (I64Mul x (I64Const [0]))  => (I64Const [0])
    (I64Mul x (I64Const [1]))  => x

The generic SSA rules handle these patterns before lowering, but these rules catch cases where wasm-specific lowering or other post-lowering optimization passes produce new nodes with identity or absorbing constant operands. For example, the complement rule lowers Com64(x) to (I64Xor x (I64Const [-1])), and if x is later determined to be all-ones, the I64And absorption rule can fold the result to zero.

Cq-Include-Trybots: luci.golang.try:gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero
Change-Id: Ie9a40e075662d4828a70e30b258d92ee171d0bc2
Reviewed-on: https://go-review.googlesource.com/c/go/+/752861
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
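The identity and absorbing elements behind these rules, checked directly in Go; each condition mirrors one of the listed rewrites.

```go
package main

import "fmt"

func main() {
	x := uint64(0x1234_5678_9abc_def0)
	allOnes := ^uint64(0) // the I64Const [-1] bit pattern
	switch {
	case x&allOnes != x: // AND with -1 is identity
		panic("and identity")
	case x&0 != 0: // AND with 0 absorbs
		panic("and absorption")
	case x|0 != x: // OR with 0 is identity
		panic("or identity")
	case x|allOnes != allOnes: // OR with -1 absorbs
		panic("or absorption")
	case x^0 != x: // XOR with 0 is identity
		panic("xor identity")
	case x*0 != 0: // MUL with 0 absorbs
		panic("mul absorption")
	case x*1 != x: // MUL with 1 is identity
		panic("mul identity")
	}
	fmt.Println("ok")
}
```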
2026-03-06 | cmd/compile: arm64 add 128-bit vector load/store SSA ops | Alexander Musman

Add OpARM64FMOVQload, OpARM64FMOVQstore, OpARM64FLDPQ, and OpARM64FSTPQ for loading and storing Vec128 values. Includes offset folding and address combining rules. These ops will be used by subsequent CLs.

Change-Id: I4ac86a0a31f878411f49d390cb8df01f81cfc4d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/738260
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
2026-03-06 | cmd/compile: additional optimisation for CZEROEQZ/CZERONEZ on riscv64 | Joel Sing

Negation on a condition can be eliminated.

Change-Id: I94fab5f019cbaebb2ca589e1d8796a9cb72f3894
Reviewed-on: https://go-review.googlesource.com/c/go/+/748401
Reviewed-by: Xueqi Luo <1824368278@qq.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Julian Zhu <jz531210@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
2026-03-06 | cmd/compile: replace boolean simplification rule | Marvin Stenger

Replace one of the boolean simplification rules with two new rules in order to cover more cases.

This is a rebase of CL 42516 which slipped through the cracks.

Change-Id: I6da4cf30e5156174e8eac6bc2f0e2cebe95e555c
Reviewed-on: https://go-review.googlesource.com/c/go/+/750520
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
2026-03-06 | cmd/compile: fold boolean x == x & x != x | Jorropo

Change-Id: I0e0a5919536b643477a6f9278fcc60492ea5a759
Reviewed-on: https://go-review.googlesource.com/c/go/+/750540
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2026-03-06 | cmd/compile: add I64Sub constant folding rule for wasm | George Adams

Add the missing I64Sub constant folding rule to the wasm backend. Every other wasm arithmetic operation (I64Add, I64Mul, I64And, I64Or, I64Xor, I64Shl, I64ShrU, I64ShrS) already had a post-lowering constant folding rule, but I64Sub was missing. While the generic SSA pass folds Sub64(Const64, Const64) before lowering, this rule ensures consistency and handles any edge cases where later wasm-specific passes produce I64Sub with two constant operands.

Cq-Include-Trybots: luci.golang.try:gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero
Change-Id: Ie8bc044dd300dcc6d077feec34f9a65f4a310b13
Reviewed-on: https://go-review.googlesource.com/c/go/+/751441
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
Commit-Queue: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-03-06 | cmd/compile: use tail calls for wrappers for embedded interfaces | Keith Randall

    type I interface { foo() }
    type S struct { I }

Because I is embedded in S, S needs a foo method. We generate a wrapper function to implement (*S).foo. It just loads the embedded field I out of S and calls foo on it.

When the thing in S.I itself needs a wrapper, then we have a wrapper calling another wrapper. This can continue, leaving a potentially long sequence of wrappers on the stack. When we then call runtime.Callers or friends, we have to walk an unbounded number of frames to find a bounded number of non-wrapper frames.

This really happens, for instance with I = context.Context, S = context.ValueCtx, and runtime.Callers = pprof sample (for any of context.Context's methods).

To fix, make the interface call in the wrapper a tail call. That way, the number of wrapper frames on the stack does not increase when there are lots of wrappers happening.

Fixes #75764
Fixes #77781

Change-Id: I03b1731159d9218c7f14f72ecbbac822d6a6bb87
Reviewed-on: https://go-review.googlesource.com/c/go/+/751465
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
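A runnable sketch of the wrapper-stacking shape described above (names here are illustrative): each struct that embeds the interface gets a compiler-generated method that loads the field and forwards the call, and nesting layers stacks those wrappers.

```go
package main

import "fmt"

type I interface{ Foo() string }

type leaf struct{}

func (leaf) Foo() string { return "leaf" }

// (*S).Foo and (*T).Foo are compiler-generated wrappers that load the
// embedded I and call Foo on it; with the tail-call change these
// forwarding frames no longer accumulate on the stack.
type S struct{ I }
type T struct{ I }

func main() {
	var v I = leaf{}
	v = S{I: v} // one wrapper layer
	v = T{I: v} // two wrapper layers
	fmt.Println(v.Foo()) // still "leaf": each wrapper just forwards
}
```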
2026-03-04 | runtime, cmd/compile: use preemptible memclr for large pointer-free clears | Muhammad

Large memory clearing operations (via clear() or large slice allocation) currently use non-preemptible assembly loops. This blocks the Garbage Collector from performing a Stop The World (STW) event, leading to significant tail latency or even indefinite hangs in tight loops.

This change introduces memclrNoHeapPointersPreemptible, which chunks clears into 256KB blocks with preemption checks. The compiler's walk phase is updated to emit this call for large pointer-free clears. To prevent regressions, SSA rewrite rules are added to ensure that constant-size clears (which are common and small) continue to be inlined into OpZero assembly.

Benchmarks on darwin/arm64:

- STW with 50MB clear: Improved from 'Hung' to ~500µs max pause.
- Small clears (5-64B): No measurable regression.
- Large clears (1M-64M): No measurable regression.

Fixes #69327

Change-Id: Ide14d6bcdca1f60d6ac95443acb57da9a8822538
Reviewed-on: https://go-review.googlesource.com/c/go/+/750480
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
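A user-level sketch of the chunking idea (the function name and structure here are illustrative, not the runtime's implementation): clearing in 256KB blocks bounds the non-preemptible work done per step.

```go
package main

import "fmt"

// clearChunked clears b in fixed-size blocks. In the real runtime
// routine, a preemption check sits between chunks so the GC can stop
// the world mid-clear.
func clearChunked(b []byte) {
	const chunk = 256 << 10 // 256KB per block
	for len(b) > 0 {
		n := len(b)
		if n > chunk {
			n = chunk
		}
		clear(b[:n])
		b = b[n:]
	}
}

func main() {
	b := make([]byte, 1<<20) // 1MB: four chunks
	for i := range b {
		b[i] = 0xff
	}
	clearChunked(b)
	for _, v := range b {
		if v != 0 {
			panic("byte left unset")
		}
	}
	fmt.Println("ok")
}
```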
2026-03-02cmd/compile, simd/archsimd: add VPSRL immediate peepholesJunyang Shao
Before this CL, simdgen contained a sign check to selectively enable such rules for deduplication purposes. This left out `VPSRL`, as it is only available in unsigned form. This CL fixes that. It looks like the previous documentation fix to the SHA instruction might not have run go generate, so this CL also contains the generated code for that fix. There is also a weird phantom import in cmd/compile/internal/ssa/issue77582_test.go; this CL fixes that too. (The trybot didn't complain?) Change-Id: Ibbf9f789c1a67af1474f0285ab376bc07f17667e Reviewed-on: https://go-review.googlesource.com/c/go/+/748501 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2026-03-01cmd/compile: combine some generic AMD64 simplificationsJakub Ciolek
Saves a few lines. If applicable, we also directly rewrite to a 32-bit MOVLconst, skipping the redundant transformation. Change-Id: I4c2f5e2bb480e798cbe373de608e19a951d168ff Reviewed-on: https://go-review.googlesource.com/c/go/+/640215 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-03-01cmd/compile: (wasm) optimize float32(round64(float64(x)))David Chase
Not a fix because there are other architectures still to be done. Updates #75463. Change-Id: Ia5233c2b6c5f4439e269950efdd851e72e8e7ff6 Reviewed-on: https://go-review.googlesource.com/c/go/+/730160 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-03-01cmd/compile: (arm64) optimize float32(round64(float64(x)))David Chase
Not a fix because there are other architectures still to be done. Updates #75463. Change-Id: Ifca03975023e4e5d0ffa98d1f877314a1a291be0 Reviewed-on: https://go-review.googlesource.com/c/go/+/729161 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-03-01cmd/compile: (amd64) optimize float32(round64(float64(x)))David Chase
Not a fix because there are other architectures still to be done. Updates #75463. Change-Id: I3d7754ce4a26af0f5c4ef0be1254d164e68f8442 Reviewed-on: https://go-review.googlesource.com/c/go/+/729160 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2026-02-25cmd/compile: handle zero-sized values more generallykhr@golang.org
Introduce a new zero-arg op, Empty, which builds a zero-sized value. This is like ArrayMake0 but can make more general zero-sized values, like those of type [2][0]int. Needed for the subsequent CL. Update #77635 Change-Id: If928e9677be5d40a4e2d7501dada66e062319711 Reviewed-on: https://go-review.googlesource.com/c/go/+/747761 Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2026-02-25cmd/compile: ensure StructMake/ArrayMake1 of direct interfaces are unwrappedKeith Randall
Ensures that deeply nested structs that have the underlying shape of a pointer get unwrapped properly. Update #77534 Change-Id: I004f424d2c62ec7026281daded9b3d96c021e2e1 Reviewed-on: https://go-review.googlesource.com/c/go/+/747760 Reviewed-by: Mark Freeman <markfreeman@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2026-02-23cmd/compile/internal/ssa: add codegen for Zicond extension on riscv64Xueqi Luo
Zicond is a mandatory extension in rva23u64. This patch converts certain branches to CondSelect and optimizes them to Zicond instructions on RISC-V in appropriate cases, along with additional optimization rules. Zicond can provide performance benefits on unpredictable branches by avoiding branch misprediction penalties. However, on simple predictable branches, zicond uses 4 instructions vs 2 for traditional branches, which can cause performance regressions. To avoid regressions, we keep CondSelect globally disabled for riscv64 and only enable it for the ConstantTimeSelect intrinsic, which has been shown to benefit from zicond:

goos: linux
goarch: riscv64
pkg: crypto/subtle
CPU: SG2044
                        │ nozicond.txt │  zicond.txt  │
                        │    sec/op    │    sec/op      vs base    │
ConstantTimeSelect-44      2.325n ± 4%    1.750n ± 2%   -24.69% (p=0.000 n=10)

Future work can explore enabling zicond for other cases that can benefit from zicond. Follow-up to CL 631595 Updates #75350
Co-authored-by: wangpengcheng.pp@bytedance.com
Co-authored-by: mengzhuo1203@gmail.com
Change-Id: If5d9555980e0d1e26fa924974f88943eb86b050b GitHub-Last-Rev: 7a61508780953295f5507e5f927ab5be1d6afd91 GitHub-Pull-Request: golang/go#75577 Reviewed-on: https://go-review.googlesource.com/c/go/+/705996 Reviewed-by: Mark Freeman <markfreeman@google.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Joel Sing <joel@sing.id.au> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-02-14cmd/compile: avoid folding 64-bit integers into 32-bit constantsYoulin Feng
Folding a 64-bit integer into a 32-bit constant may result in a negative integer if the value exceeds math.MaxInt32 (the maximum value of a 32-bit signed integer). This negative value will be sign-extended to 64 bits at runtime, leading to unexpected results when used in bitwise AND/OR operations. Fixes #77613 Change-Id: Idb081a3c20c28bddddcc8eff1225d62123b37a2d Reviewed-on: https://go-review.googlesource.com/c/go/+/745581 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2026-02-12cmd/compile: fix slice bounds check elimination after function inliningYoulin Feng
When creating a dynamically-sized slice, the compiler attempts to use a stack-allocated buffer if the slice does not escape and its buffer size is ≤ 32 bytes. In this case, the SSA will contain a (OpPhi (OpSliceMake) (OpSliceMake)) value: one OpSliceMake uses the stack-allocated buffer, and the other uses the heap-allocated buffer. The len and cap arguments for these two OpSliceMake values are expected to be identical. This CL enables the prove pass to recognize this scenario and handle OpSliceLen and OpSliceCap as intended. Fixes #77375 Change-Id: Id77a2473caf66d366f5c94108aa6cb6d3df5b887 Reviewed-on: https://go-review.googlesource.com/c/go/+/740840 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2026-01-28cmd/compile, simd: capture VAES instructions and fix AVX512VAES featureJunyang Shao
The code previously filters out VAES-only instructions, this CL added them back. This CL added the VAES feature check following the Intel xed data: XED_ISA_SET_VAES: vaes.7.0.ecx.9 # avx.1.0.ecx.28 This CL also found out that the old AVX512VAES feature check is not checking the correct bits, it also fixes it: XED_ISA_SET_AVX512_VAES_128: vaes.7.0.ecx.9 aes.1.0.ecx.25 avx512f.7.0.ebx.16 avx512vl.7.0.ebx.31 XED_ISA_SET_AVX512_VAES_256: vaes.7.0.ecx.9 aes.1.0.ecx.25 avx512f.7.0.ebx.16 avx512vl.7.0.ebx.31 XED_ISA_SET_AVX512_VAES_512: vaes.7.0.ecx.9 aes.1.0.ecx.25 avx512f.7.0.ebx.16 It restricts to the most strict common set - includes avx512vl for even 512-bits although it doesn't requires it. Change-Id: I4e2f72b312fd2411589fbc12f9ee5c63c09c2e9a Reviewed-on: https://go-review.googlesource.com/c/go/+/738500 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-01-28cmd/compile: (loong64) optimize float32(abs|sqrt64(float64(x)))Xiaolin Zhao
Ref: #733621 Updates #75463 Change-Id: Idd8821d1713754097a2fe83a050c25d9ec5b17eb Reviewed-on: https://go-review.googlesource.com/c/go/+/735540 Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-01-28cmd/compile: remove the NORconst op on mips{,64}Xiaolin Zhao
In the mips{,64} instruction sets and their extensions, there is no NORI instruction. Change-Id: If008442c792297d011b3d0c1e8501e62e32ab175 Reviewed-on: https://go-review.googlesource.com/c/go/+/735900 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Cherry Mui <cherryyz@google.com>
2026-01-27cmd/compile, runtime: avoid improper control transfer instruction hints on riscv64wangboyao
On RISC-V the JAL and JALR instructions provide Return Address Stack (RAS) prediction hints based on the registers used (as per section 2.5.1 of the RISC-V ISA manual). When a JALR instruction uses X1 or X5 as the source register, it hints that a pop should occur. When making a function call, avoid the use of X5 as a source register, since this results in the RAS performing a pop-then-push instead of a push, breaking call/return pairing and significantly degrading front-end branch prediction performance. Based on test results of golang.org/x/benchmarks/json on SpacemiT K1, the fixed version shows a performance improvement of about 7%. Fixes #76654 Change-Id: I867c8d7cfb54f5decbe176f3ab3bb3d78af1cf64 Reviewed-on: https://go-review.googlesource.com/c/go/+/726760 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: Joel Sing <joel@sing.id.au>
2026-01-23cmd/compile: on amd64 use 32-bit copies for 64-bit copies of 32-bit valuesJorropo
Fixes #76449 This saves a single byte for the REX prefix per OpCopy it triggers on. Change-Id: I1eab364d07354555ba2f23ffd2f9c522d4a04bd0 Reviewed-on: https://go-review.googlesource.com/c/go/+/731640 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-01-23cmd/compile: cleanup isUnsignedPowerOfTwoJorropo
Merge the signed and unsigned generic functions. The only implementation difference between the two is the check: n > 0 vs n != 0. For unsigned numbers n > 0 is equivalent to n != 0, and in fact we already optimize the first into the second. Change-Id: Ia2f6c3e3d4eb098d98f85e06dc2e81baa60bad4e Reviewed-on: https://go-review.googlesource.com/c/go/+/726720 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-01-23cmd/compile: avoid extending when already sufficiently shifted on loong64Xiaolin Zhao
This reduces 744 instructions from the go toolchain binary on loong64.

file         before    after      Δ         %
asm          599282    599222    -60   -0.0100%
cgo          513606    513534    -72   -0.0140%
compile     2939250   2939146   -104   -0.0035%
cover        564136    564056    -80   -0.0142%
fix          895622    895546    -76   -0.0085%
link         759460    759376    -84   -0.0111%
preprofile   264960    264916    -44   -0.0166%
vet          869964    869888    -76   -0.0087%
go          1712990   1712890   -100   -0.0058%
gofmt        346416    346368    -48   -0.0139%
total       9465686   9464942   -744   -0.0079%

Change-Id: I32dfa7506d0458ca0b6de83b030c330cd2b82176 Reviewed-on: https://go-review.googlesource.com/c/go/+/725720 Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2026-01-22cmd/compile: ensure ops have the expected argument widthsKeith Randall
The generic SSA representation uses explicit extension and truncation operations to change widths of values. The map intrinsics were playing somewhat fast and loose with this requirement. Fix that, and add a check to make sure we don't regress. I don't think there is a triggerable bug here, but I ran into this with some prove pass modifications, where cmd/compile/internal/ssa/prove.go:isCleanExt (and/or its uses) is actually wrong when this invariant is not maintained. Change-Id: Idb7be6e691e2dbf6d7af6584641c3227c5c64bf5 Reviewed-on: https://go-review.googlesource.com/c/go/+/731300 Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>