path: root/src/cmd/compile/internal/ssa
Age  Commit message  Author
19 hoursall: prealloc slice with possible minimum capabilitiesShulhan
39 hourscmd/compile: handle min integer step in loopCuong Manh Le
Negating the minimum integer overflows back to itself, which causes a panic inside the subWillUnderflow check. Fixes #78641 Change-Id: Ibbf2fa3228b9890a1a76ac6f4ff504b7e125b29f Reviewed-on: https://go-review.googlesource.com/c/go/+/766260 Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Keith Randall <khr@google.com>
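As an illustration of the overflow mentioned above, here is a minimal sketch in plain Go (not compiler code). The helper name negate is hypothetical; it only shows that negating the minimum int64 wraps back to the same value.

```go
package main

import (
	"fmt"
	"math"
)

// negate returns -x using Go's wrapping two's-complement arithmetic.
// The name is hypothetical and is not taken from the compiler.
func negate(x int64) int64 { return -x }

func main() {
	step := int64(math.MinInt64)
	fmt.Println(negate(step) == step) // prints "true"
	fmt.Println(negate(5))            // prints "-5"
}
```

Any code that negates a step value before checking it therefore needs a separate case for the minimum integer.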
47 hourscmd/compile: add boolean absorption laws to SSA rewrite rulesTimo Friedl
The SSA generic rewrite rules implement DeMorgan's laws but are missing the closely related boolean absorption laws: x & (x | y) == x x | (x & y) == x These are fundamental boolean algebra identities (see https://en.wikipedia.org/wiki/Absorption_law) that hold for all bit patterns, all widths, signed and unsigned. Both GCC and LLVM recognize and optimize these patterns at -O2. Add two generic rules covering all four widths (8, 16, 32, 64). Commutativity of AND/OR is handled automatically by the rule engine, so all argument orderings are matched. The rules eliminate two redundant ALU instructions per occurrence and fire on real code (defer bit-manipulation patterns in runtime, testing, go/parser, and third-party packages). Fixes #78632 Change-Id: Ib59e839081302ad1635e823309d8aec768c25dcf GitHub-Last-Rev: 23f8296ece08c77fcaeeaf59c2c2d8ce23d1202c GitHub-Pull-Request: golang/go#78634 Reviewed-on: https://go-review.googlesource.com/c/go/+/765580 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
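The two identities can be checked exhaustively at 8 bits. The following is a small sketch in ordinary Go, with hypothetical function names, and is not the rewrite rule itself.

```go
package main

import "fmt"

// absorptionHolds reports whether both absorption laws hold for x and y.
func absorptionHolds(x, y uint8) bool {
	return x&(x|y) == x && x|(x&y) == x
}

// checkAll tries every pair of uint8 values (65536 combinations).
func checkAll() bool {
	for x := 0; x < 256; x++ {
		for y := 0; y < 256; y++ {
			if !absorptionHolds(uint8(x), uint8(y)) {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println(checkAll()) // prints "true"
}
```

Because the laws are bitwise, the 8-bit check carries over to wider types bit by bit.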
5 dayscmd/compile: fix typoWeixie Cui
Change-Id: Ia9ee618aa68aad5bab73ee62eea176084ee162da GitHub-Last-Rev: 4cc005d3cd1ae4e5eaa283b1799c7be26b2279f5 GitHub-Pull-Request: golang/go#78625 Reviewed-on: https://go-review.googlesource.com/c/go/+/765280 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Keith Randall <khr@google.com>
6 dayscmd/compile: run CondSelect into math rules on all archesJorropo
Fixes #78558 I've also added tests to make sure PPC still generates ISEL when the constant isn't 1. This is to make sure we aren't generating a sequence that wouldn't work right now. But it does not mean we couldn't try to optimize other constants on PPC64 if a fast sequence exists; for example, like arm64's inline register shifts. Change-Id: Ic241d593149b7a11533948f5d4c52db357cc134f Reviewed-on: https://go-review.googlesource.com/c/go/+/763340 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com> Reviewed-by: Paul Murphy <paumurph@redhat.com>
7 dayscmd/compile: improve stp merging for non-sequent casesMelnikov Denis
The original algorithm merges stores with the first mergeable store in the chain, but it misses some cases. Additionally, creating a list of STs that store data to adjacent memory cells allows merging them according to the direction in which their addresses increase. I already tried another algorithm in CL 698097, but it was reverted. This algorithm works differently and fixes the bug produced by the variant from that CL. Fixes #71987, #75365 Here are the results of the sweet benchmarks:
                       │  base.stat  │             opt.stat              │
                       │   sec/op    │   sec/op     vs base              │
ESBuildThreeJS-4         1.088 ± 2%    1.086 ± 1%        ~ (p=1.000 n=10)
ESBuildRomeTS-4          263.0m ± 2%   260.8m ± 1%       ~ (p=0.105 n=10)
EtcdPut-4                73.08m ± 1%   73.16m ± 1%       ~ (p=0.971 n=10)
EtcdSTM-4                414.9m ± 1%   415.4m ± 1%       ~ (p=0.393 n=10)
GoBuildKubelet-4         203.3 ± 0%    203.5 ± 0%        ~ (p=0.393 n=10)
GoBuildKubeletLink-4     19.06 ± 1%    19.05 ± 0%        ~ (p=0.280 n=10)
GoBuildIstioctl-4        156.6 ± 0%    156.6 ± 0%        ~ (p=0.796 n=10)
GoBuildIstioctlLink-4    14.16 ± 1%    14.18 ± 1%        ~ (p=0.853 n=10)
GoBuildFrontend-4        56.45 ± 1%    56.57 ± 0%        ~ (p=0.579 n=10)
GoBuildFrontendLink-4    3.635 ± 1%    3.646 ± 0%        ~ (p=0.436 n=10)
GoBuildTsgo-4            103.0 ± 1%    103.4 ± 1%        ~ (p=0.529 n=10)
GoBuildTsgoLink-4        1.865 ± 1%    1.860 ± 1%        ~ (p=0.684 n=10)
GopherLuaKNucleotide-4   33.55 ± 0%    33.58 ± 0%        ~ (p=0.075 n=10)
MarkdownRenderXHTML-4    281.0m ± 0%   280.3m ± 0%  -0.23% (p=0.019 n=10)
Tile38QueryLoad-4        970.0µ ± 1%   969.3µ ± 0%       ~ (p=0.436 n=10)
geomean                  3.128         3.128        -0.01%
Change-Id: Ia548b43601b1bdb1c1723d300a4b8b907ab0c040 Reviewed-on: https://go-review.googlesource.com/c/go/+/760100 Reviewed-by: Mark Freeman <markfreeman@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org>
7 dayscmd/compile: fix loopbce overflow check logicJunyang Shao
addWillOverflow and subWillOverflow have an implicit assumption that y is positive, so using them outside of addU and subU is incorrect. This CL replaces those incorrect uses with the correct logic. Thanks to Jakub Ciolek for reporting this issue. Fixes #78333 Fixes CVE-2026-27143 Change-Id: I263e8e7ac227e2a68109eb7bbd45f66569ed22ec Reviewed-on: https://go-internal-review.googlesource.com/c/go/+/3700 Reviewed-by: Damien Neil <dneil@google.com> Reviewed-by: Neal Patel <nealpatel@google.com> Reviewed-on: https://go-review.googlesource.com/c/go/+/763765 Reviewed-by: Jakub Ciolek <jakub@ciolek.dev> Reviewed-by: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: David Chase <drchase@google.com>
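To show why the sign of y matters, here is a sketch of overflow checks that handle both signs. The names addOverflows and subOverflows are hypothetical and are not the compiler's helpers; the actual fix may be structured differently.

```go
package main

import (
	"fmt"
	"math"
)

// addOverflows reports whether x+y overflows int64, for y of either sign.
func addOverflows(x, y int64) bool {
	if y >= 0 {
		return x > math.MaxInt64-y // MaxInt64-y cannot overflow when y >= 0
	}
	return x < math.MinInt64-y // MinInt64-y cannot overflow when y < 0
}

// subOverflows reports whether x-y overflows int64, for y of either sign.
func subOverflows(x, y int64) bool {
	if y >= 0 {
		return x < math.MinInt64+y
	}
	return x > math.MaxInt64+y
}

func main() {
	fmt.Println(addOverflows(math.MaxInt64, 1))  // prints "true"
	fmt.Println(addOverflows(-1, math.MinInt64)) // prints "true"
	fmt.Println(subOverflows(0, math.MinInt64))  // prints "true"
	fmt.Println(subOverflows(-1, math.MinInt64)) // prints "false"
}
```

A check written only for positive y would miss the second and third cases, which is the kind of gap the message describes.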
8 dayscmd/compile: optimize CondSelect to math on arm64 with inline register shiftsJorropo
Change-Id: I27696b1a5fa0593d9f36743efa3559a36d23ec4b Reviewed-on: https://go-review.googlesource.com/c/go/+/760844 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Keith Randall <khr@google.com>
8 dayscmd/compile: improve Mul to Left Shift rulesJorropo
- fix a bug where it wouldn't recognize 1<<63 as a power of two - remove the IsSigned check; there is no such thing as a signed Mul. If the rule works for signed numbers, it works for unsigned ones too. Even if the intermediate steps make no sense, the result wraps the right way around in the end. Change-Id: I86182762aec5eff784e2d9bc49ee028825fb9ea0 Reviewed-on: https://go-review.googlesource.com/c/go/+/760843 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Junyang Shao <shaojunyang@google.com>
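The 1<<63 case can be sketched in ordinary Go. The helpers isPowerOfTwo and mulAsShift are hypothetical names for illustration, not the functions used by the rewrite rules; the point is that treating the constant as a raw 64-bit pattern makes the sign irrelevant.

```go
package main

import (
	"fmt"
	"math/bits"
)

// isPowerOfTwo treats c as a raw 64-bit pattern, so the pattern of 1<<63
// (which is negative when viewed as int64) still qualifies.
func isPowerOfTwo(c int64) bool {
	u := uint64(c)
	return u != 0 && u&(u-1) == 0
}

// mulAsShift computes x*c as a left shift; c must satisfy isPowerOfTwo.
func mulAsShift(x, c int64) int64 {
	return x << uint(bits.TrailingZeros64(uint64(c)))
}

func main() {
	c := int64(-1 << 63) // same bit pattern as 1<<63
	fmt.Println(isPowerOfTwo(c)) // prints "true"
	for _, x := range []int64{0, 1, -1, 3, 12345} {
		fmt.Println(x*c == mulAsShift(x, c)) // prints "true" each time
	}
}
```

Both the multiply and the shift wrap modulo 2^64, which is why the signed and unsigned views agree.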
8 dayscmd/compile: add loop invariant code motionIlya Tocar
(I copied Ilya Tocar's CL 27656 and heavily modified it.) This adds an optimization that moves loop invariant computations out of the loop. For example: a:= ... for ... { b:= a + 15 // uses of b } Turns into a:= ... b:= a + 15 for ... { // uses of b } Change-Id: I36c8c7e2b3bc1c5e6b4b293bed3a76dc20d6c825 Reviewed-on: https://go-review.googlesource.com/c/go/+/697235 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
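The before and after shapes from the message can be written out as two equivalent Go functions. This is a hand-written sketch of the transformation's effect, with hypothetical function names; the compiler performs it on SSA form, not on source.

```go
package main

import "fmt"

// sumBefore recomputes the loop-invariant value b on every iteration.
func sumBefore(a int, xs []int) int {
	total := 0
	for _, x := range xs {
		b := a + 15 // does not depend on the loop
		total += x * b
	}
	return total
}

// sumAfter computes b once, before the loop, as the optimization would.
func sumAfter(a int, xs []int) int {
	b := a + 15
	total := 0
	for _, x := range xs {
		total += x * b
	}
	return total
}

func main() {
	xs := []int{1, 2, 3}
	fmt.Println(sumBefore(1, xs), sumAfter(1, xs)) // prints "96 96"
}
```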
9 dayscmd/compile: cleanup rules by canonicalizing sext(int(bool)) → zext(int(bool))Jorropo
Change-Id: Ic97f661c68180ff7adb9976fcc61279e1e1f04a4 Reviewed-on: https://go-review.googlesource.com/c/go/+/760842 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
9 dayscmd/compile: extend condselect into math code to handle other constants than 1Jorropo
On amd64, along with: if b { x += 1 } => x += b We can also implement constants 2, 4 and 8: if b { x += 2 } => x += b * 2 This compiles to a displacement LEA. Change-Id: Ib00fcc5059acb0ebb346e056c4a656f164cc63df Reviewed-on: https://go-review.googlesource.com/c/go/+/760841 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
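The equivalence behind this rule can be sketched at the source level. The helper b2i and the two function names are hypothetical; in the compiler the bool-to-int conversion is an SSA operation rather than a branch.

```go
package main

import "fmt"

// b2i converts a bool to 0 or 1 (hypothetical helper for illustration).
func b2i(b bool) int {
	if b {
		return 1
	}
	return 0
}

// addBranchy is the conditional form from the message.
func addBranchy(x int, b bool) int {
	if b {
		x += 2
	}
	return x
}

// addMath is the arithmetic form: x += b * 2.
func addMath(x int, b bool) int {
	return x + b2i(b)*2
}

func main() {
	fmt.Println(addBranchy(10, true), addMath(10, true))   // prints "12 12"
	fmt.Println(addBranchy(10, false), addMath(10, false)) // prints "10 10"
}
```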
9 dayscmd/compile: unify DWARF variable generation and remove encodingDerek Parker
Refactor the DWARF variable generation in the compiler: 1. Replace the intermediate []byte location list encoding with a structured LocListEntry type. The old code packed SSA block/value IDs into pointer-sized integers, wrote them alongside DWARF4-format length-prefixed expressions, then re-read and decoded everything during final encoding. The new approach stores entries as {StartBlock, StartValue, EndBlock, EndValue, Expr} structs that PutLocationListDwarf4/5 directly encode into the appropriate format. This eliminates encodeValue, decodeValue, appendPtr, writePtr, readPtr, and SetupLocList, and removes the DWARF4 re-encoding in PutLocationListDwarf5. 2. Unify createDwarfVars into a single processing loop. The old code had three mutually exclusive paths (createSimpleVars, createABIVars, createComplexVars) selected by build mode, followed by a separate conservative-var loop. The new code uses one loop that tries createComplexVar first (when SSA debug info is available), then falls back to createSimpleVar. This removes createSimpleVars, createABIVars, and createComplexVars. 3. Extract createConservativeVar and shouldEmitDwarfVar as named functions, consolidating inline code and scattered filtering logic. 4. Fix createHeapDerefLocationList to return []LocListEntry instead of raw bytes, consistent with the new representation. Change-Id: If6fb755c22e398d7615dccaf33b1367828e6c47e Reviewed-on: https://go-review.googlesource.com/c/go/+/750920 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>
9 dayscmd/compile: improve uint8/uint16 logical immediates on PPC64Jayanth Krishnamurthy jayanth.krishnamurthy@ibm.com
Logical ops on uint8/uint16 (AND/OR/XOR) with constants sometimes materialized the mask via MOVD (often as a negative immediate), even when the value fit in the UI-immediate range. This prevented the backend from selecting andi. / ori / xori forms. With this CL, UI-immediate truncation is performed only at the use-site of logical-immediate ops, and only when the constant does not fit in the 8- or 16-bit unsigned domain (m != uint8(m) / m != uint16(m)). This avoids negative-mask materialization and enables correct emission of UI-form logical instructions. Arithmetic SI-immediate instructions (addi, subfic, etc.) and other use-patterns are unchanged. Codegen tests are added to ensure the expected andi./ori/xori patterns appear and that MOVD is not emitted for valid 8/16-bit masks. Change-Id: I9fcdf4498c4e984c7587814fb9019a75865c4a0d Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10 Reviewed-on: https://go-review.googlesource.com/c/go/+/704015 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Paul Murphy <paumurph@redhat.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Mark Freeman <markfreeman@google.com>
12 dayscmd/compile: extend all the cmov into math generic rules with their contraryJorropo
If the bool comes from a local operation this is foldable into the comparison. if a == b { } else { x++ } becomes: x += !(a == b) becomes: x += a != b If the bool is passed in or loaded rather than being locally computed this adds an extra XOR ^1 to invert it. But at worst it should make the math equal to the compute + CMP + CMOV which is a tie on modern CPUs which can execute CMOV on all int ALUs and a win on the cheaper or older ones which can't. Change-Id: Idd2566c7a3826ec432ebfbba7b3898aa0db4b812 Reviewed-on: https://go-review.googlesource.com/c/go/+/760922 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Junyang Shao <shaojunyang@google.com>
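The else-branch case described above can be sketched the same way. The names b2i, incElse and incMath are hypothetical, and the example only checks the source-level equivalence, not the generated code.

```go
package main

import "fmt"

// b2i converts a bool to 0 or 1 (hypothetical helper for illustration).
func b2i(b bool) int {
	if b {
		return 1
	}
	return 0
}

// incElse increments x only on the else branch, as in the message.
func incElse(a, b, x int) int {
	if a == b {
	} else {
		x++
	}
	return x
}

// incMath folds the inverted condition into the comparison: x += a != b.
func incMath(a, b, x int) int {
	return x + b2i(a != b)
}

func main() {
	fmt.Println(incElse(1, 1, 5), incMath(1, 1, 5)) // prints "5 5"
	fmt.Println(incElse(1, 2, 5), incMath(1, 2, 5)) // prints "6 6"
}
```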
2026-03-31cmd/compile: convert some condmoves in XORJorropo
Similar to CL 685676 but for XOR. Change-Id: Ib5ffd4c13348f176a808b3218fdbbafc2c42794f Reviewed-on: https://go-review.googlesource.com/c/go/+/760921 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2026-03-31cmd/compile: convert some condmoves in ORJorropo
Similar to CL 685676 but for OR. Change-Id: I0ddfd457ed9e8888462306138a251ac48ad42084 Reviewed-on: https://go-review.googlesource.com/c/go/+/760920 Auto-Submit: Jorropo <jorropo.pgm@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Junyang Shao <shaojunyang@google.com>
2026-03-27cmd/compile/internal/ssa: prove support induction variable pairYoulin Feng
We have two induction variables i and j in the following loop: for i, j := 0, len(s)-1; i < j; i, j = i+1, j-1 { // loop body } This CL enables the prove pass to handle cases where one if block uses two induction variables. Updates #45078 Change-Id: I8b8dc8b7b2d160a796dab1d1e29a00ef4e8e8157 Reviewed-on: https://go-review.googlesource.com/c/go/+/757700 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
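A typical use of this loop shape is in-place slice reversal. The sketch below is an assumed example of code that could benefit; whether the bounds checks on s[i] and s[j] are actually removed depends on the compiler version.

```go
package main

import "fmt"

// reverse swaps elements from both ends using two induction variables,
// i counting up and j counting down, until they meet.
func reverse(s []int) {
	for i, j := 0, len(s)-1; i < j; i, j = i+1, j-1 {
		s[i], s[j] = s[j], s[i]
	}
}

func main() {
	s := []int{1, 2, 3, 4, 5}
	reverse(s)
	fmt.Println(s) // prints "[5 4 3 2 1]"
}
```

Inside the body, i < j together with the start values implies 0 <= i and j < len(s), which is the kind of fact a pass that understands both variables could derive.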
2026-03-27cmd/compile: use prove to remove no-op OrsJorropo
This is hit 3 times (unique by LOC) when building the std. Change-Id: Ic1fc7b60a129e73470d9bc4f603f4be12d154b0f Reviewed-on: https://go-review.googlesource.com/c/go/+/750342 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2026-03-27cmd/compile: remove 68857 And flowLimit workaround in proveJorropo
Change-Id: Id8baeb89e6e11a01d53cd63c665f0b2966f50392 Reviewed-on: https://go-review.googlesource.com/c/go/+/750341 Reviewed-by: Mark Freeman <markfreeman@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-03-27cmd/compile: use prove to remove no-op AndsJorropo
This is hit 308 times (unique by LOC) when building the std. There are many hits in defer generated code. My original intent was to optimize cryptographic code that uses And to implement modulus by a power of two when the number is always smaller than the modulus. It also works there, but there are (unsurprisingly) far fewer hits. Change-Id: Ia7a9a57099b98de966673c6e8231ef09f7c80964 Reviewed-on: https://go-review.googlesource.com/c/go/+/750200 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
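A no-op And is one whose mask cannot clear any bit that might be set. The following sketch uses hypothetical function names and a deliberately simple case: after shifting a uint8 right by 4, the value is already in the range 0 to 15, so masking with 0x0f changes nothing.

```go
package main

import "fmt"

// highNibbleMasked shifts and then masks; the mask is redundant because
// x>>4 is already at most 15 for a uint8.
func highNibbleMasked(x uint8) uint8 {
	return (x >> 4) & 0x0f
}

// highNibble is the same computation with the no-op And removed.
func highNibble(x uint8) uint8 {
	return x >> 4
}

func main() {
	same := true
	for x := 0; x < 256; x++ {
		if highNibbleMasked(uint8(x)) != highNibble(uint8(x)) {
			same = false
		}
	}
	fmt.Println(same) // prints "true"
}
```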
2026-03-26cmd/compile: on ARM64 merge SRA into TBZ & TBNZJorropo
Change-Id: I9596dbca8991c93c7543d10dc1b155056dfa7db3 Reviewed-on: https://go-review.googlesource.com/c/go/+/759500 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@google.com>
2026-03-26cmd/compile: on ARM64 merge ROR into TBZ & TBNZJorropo
Change-Id: Ib37b35dfff6236c59c0242c3b7d979c95aefbb8b Reviewed-on: https://go-review.googlesource.com/c/go/+/750321 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Mark Freeman <markfreeman@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
2026-03-26cmd/compile: on ARM64 merge shifts into TBZ & TBNZJorropo
Change-Id: I4dff3ba1462848f408257cbadedf202e62d1ea69 Reviewed-on: https://go-review.googlesource.com/c/go/+/750320 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2026-03-26cmd/compile: do not invert loops that would overflow or underflowJorropo
On the final iteration we need space below start (which becomes end) such that i-step does not overflow or underflow. In other words, the code used to assume that the last time the loop header executes, `start < i - step` (or `<=`, `>`, `>=` depending on the loop) is always false. This seems correct, since by definition the only way for it to be the loop header's last execution is for the condition to become false. However, here is an example with uint (even though the code doesn't support them yet) to make things simpler: start = 1 i = 2 step = 100 We compute 2 - 100, which should give us 1 < -98 == false, breaking the loop; instead we get 18446744073709551518, which gives 1 < 18446744073709551518 == true, which keeps the loop going. This patch fixes this issue by ensuring that in the last execution of a loop header the induction variable does not underflow or overflow. Fixes #78303 Change-Id: I64e8e8592b023d79fdbc7f1598d584726ed601f5 Reviewed-on: https://go-review.googlesource.com/c/go/+/758801 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Jakub Ciolek <jakub@ciolek.dev> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
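The numbers in the example above can be reproduced directly. This is a minimal sketch with hypothetical helper names (stepDown, wouldUnderflow); it demonstrates the wraparound and one possible guard, not the compiler's actual check.

```go
package main

import "fmt"

// stepDown computes i - step with uint64 wraparound.
func stepDown(i, step uint64) uint64 { return i - step }

// wouldUnderflow reports whether i - step wraps below zero.
func wouldUnderflow(i, step uint64) bool { return i < step }

func main() {
	start, i, step := uint64(1), uint64(2), uint64(100)
	next := stepDown(i, step)
	fmt.Println(next)                    // prints "18446744073709551518"
	fmt.Println(start < next)            // prints "true"
	fmt.Println(wouldUnderflow(i, step)) // prints "true"
}
```

With the wrapped value the comparison stays true, so a loop rewritten around `start < i - step` would keep running unless the underflow is ruled out first.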
2026-03-25cmd/compile: ppc64 fold (x+x)<<c into x<<(c+1)Jayanth Krishnamurthy jayanth.krishnamurthy@ibm.com
On ppc64/ppc64le, rewrite (x + x) << c to x << (c+1) for constant shifts. This removes an ADD, shortens the dependency chain, and reduces code size. Add rules for both 64-bit (SLDconst) and 32-bit (SLWconst), and extend test/codegen/shift.go with ppc64x checks to assert a single SLD/SLW and forbid ADD. Aligns ppc64 with other architectures that already assert similar codegen in shift.go. Change-Id: Ie564afbb029a5bd48887b82b0c455ca1dddd5508 Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10 Reviewed-on: https://go-review.googlesource.com/c/go/+/712000 Reviewed-by: Archana Ravindar <aravinda@redhat.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
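The identity behind the rewrite can be checked in ordinary Go. The function names below are hypothetical; the sketch only verifies that the two expressions agree for uint64 across all constant shift amounts from 0 to 63.

```go
package main

import "fmt"

// doubleThenShift is the original form: (x + x) << c.
func doubleThenShift(x uint64, c uint) uint64 { return (x + x) << c }

// shiftOneMore is the rewritten form: x << (c + 1).
func shiftOneMore(x uint64, c uint) uint64 { return x << (c + 1) }

func main() {
	ok := true
	for _, x := range []uint64{0, 1, 3, 0x8000000000000001, ^uint64(0)} {
		for c := uint(0); c < 64; c++ {
			if doubleThenShift(x, c) != shiftOneMore(x, c) {
				ok = false
			}
		}
	}
	fmt.Println(ok) // prints "true"
}
```

In Go, shifting a uint64 by 64 or more yields 0, which matches (x + x) << 63 because x + x always has a zero low bit.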
2026-03-20Revert "runtime, cmd/compile: use preemptible memclr for large pointer-free clears"Michael Pratt
This reverts CL 750480. Reason: Adding preemptible memclrNoHeapPointers exposes existing unsafe use of notInHeapSlice, causing crashes. Revert the memclr stack until the underlying issue is fixed. We keep the test added in CL 755942, which is useful regardless. For #78254. Change-Id: I8be3f9a20292b7f294e98e74e5a86c6a204406ae Reviewed-on: https://go-review.googlesource.com/c/go/+/757343 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>
2026-03-20cmd/compile: on arm64 pair a load with a load in a subsequent blockKeith Randall
Look into the following block(s) for a load that can be paired with the load we're trying to pair up. This particularly helps the generated equality functions. Instead of doing MOVD x(R0), R2 MOVD x(R1), R3 CMP R2, R3 BNE noteq MOVD x+8(R0), R2 MOVD x+8(R1), R3 CMP R2, R3 BNE noteq we do LDP x(R0), (R2, R4) LDP x(R1), (R3, R5) CMP R2, R3 BNE noteq CMP R4, R5 BNE noteq Removes 5296 bytes of code from cmd/go. Change-Id: I6368686892ac944783c8b07ed7252126d1ef4031 Reviewed-on: https://go-review.googlesource.com/c/go/+/740741 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-03-20cmd/compile: elide sign-extend after zero-extend for wasmGeorge Adams
Add rules to eliminate sign-extension of values that have already been zero-extended from fewer bits via an I64And mask: (I64Extend32S x:(I64And _ (I64Const [c]))) && c >= 0 && int64(int32(c)) == c => x (I64Extend16S x:(I64And _ (I64Const [c]))) && c >= 0 && int64(int16(c)) == c => x (I64Extend8S x:(I64And _ (I64Const [c]))) && c >= 0 && int64(int8(c)) == c => x When a value has been masked to fit within the non-negative range of the sign-extension width, the upper bits are already zero and sign- extending is a no-op. For example, (I64Extend32S (I64And x 0xff)) can be elided because 0xff fits in a signed int32, so bit 31 is guaranteed to be zero and sign-extending from 32 bits is identity. Cq-Include-Trybots: luci.golang.try:gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero Change-Id: Ia54d67358756e47ca7635a6a8ca4beadb003820a Reviewed-on: https://go-review.googlesource.com/c/go/+/756320 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org>
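The side condition `c >= 0 && int64(int32(c)) == c` can be explored in plain Go. The names eligible32, maskThenExtend and maskOnly are hypothetical; Go's int32 conversion stands in for I64Extend32S here as an assumption about equivalent semantics.

```go
package main

import "fmt"

// eligible32 mirrors the rule's condition for the 32-bit extend.
func eligible32(c int64) bool {
	return c >= 0 && int64(int32(c)) == c
}

// maskThenExtend masks with 0xff, then sign-extends from 32 bits.
func maskThenExtend(x int64) int64 { return int64(int32(x & 0xff)) }

// maskOnly drops the sign-extension, as the rule would.
func maskOnly(x int64) int64 { return x & 0xff }

func main() {
	fmt.Println(eligible32(0xff))                  // prints "true"
	fmt.Println(eligible32(0xffffffff))            // prints "false"
	fmt.Println(maskThenExtend(-1) == maskOnly(-1)) // prints "true"
}
```

The mask 0xffffffff fails the condition because it leaves bit 31 possibly set, in which case sign-extending from 32 bits would change the value.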
2026-03-20cmd/compile: (mips64x) optimize float32(abs|sqrt64(float64(x)))Julian Zhu
Absorb unnecessary conversion between float32 and float64 if both src and dst are 32 bit. Ref: CL 733621 Updates #75463 Change-Id: I439f92aa3d940fa4979e76845c0893e43bf584af Reviewed-on: https://go-review.googlesource.com/c/go/+/739520 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Joel Sing <joel@sing.id.au>
2026-03-18cmd/link: modify the register used in trampolinelimeidan
R30 is a callee-saved register; using it requires saving and then restoring. Therefore, we replace it with a caller-saved register. R4~R19 are argument registers on loong64, and R20 is the only remaining usable caller-saved register. To use R20 in the trampoline, we modified the registers used by the LoweredMove/LoweredMoveLoop operations (originally R20 and R21, now R23 and R24). Change-Id: Ie7bba0caa30a764a45bcb47635c35c829036c5a2 Reviewed-on: https://go-review.googlesource.com/c/go/+/726140 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Cherry Mui <cherryyz@google.com>
2026-03-18cmd/compile: allow multiple induction variables in one block in proveYoulin Feng
In this CL, the restriction that each block can only have one induction variable has been removed. This reduces missed optimizations. Fixes #76269 Change-Id: I14043182a40cc7887c5b6d9c1a5df8ea3a1bfedc Reviewed-on: https://go-review.googlesource.com/c/go/+/719881 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com>
2026-03-17cmd/compile: use 128-bit arm64 vector ops for Move expansionAlexander Musman
Update Move rewrite rules to use FMOVQload/store and FLDPQ/FSTPQ for medium-sized copies (16-64 bytes). This generates fewer and wider instructions than the previous approach using LDP/STP pairs. Executable Base .text go1 Change ---------------------------------------------------- asm 2112308 2105732 -0.31% cgo 1826132 1823172 -0.16% compile 10474868 10460644 -0.14% cover 1990036 1985748 -0.22% fix 3234116 3226340 -0.24% link 2702628 2695316 -0.27% preprofile 947652 947028 -0.07% vet 3140964 3133524 -0.24% Performance effect on OrangePi 6 plus: │ orig.out │ movq.out │ │ sec/op │ sec/op vs base │ CopyFat16 0.4711n ± 0% 0.3852n ± 0% -18.23% (p=0.000 n=10) CopyFat17 0.7705n ± 0% 0.7705n ± 0% ~ (p=0.984 n=10) CopyFat18 0.7703n ± 0% 0.7703n ± 0% ~ (p=0.771 n=10) CopyFat19 0.7703n ± 0% 0.7703n ± 0% ~ (p=0.637 n=10) CopyFat20 0.7703n ± 0% 0.7704n ± 0% ~ (p=0.103 n=10) CopyFat21 0.7703n ± 0% 0.7708n ± 0% ~ (p=0.505 n=10) CopyFat22 0.7704n ± 0% 0.7705n ± 0% ~ (p=0.589 n=10) CopyFat23 0.7703n ± 0% 0.7703n ± 0% ~ (p=0.347 n=10) CopyFat24 0.7704n ± 0% 0.7703n ± 0% ~ (p=0.383 n=10) CopyFat25 0.8385n ± 0% 0.6589n ± 0% -21.41% (p=0.000 n=10) CopyFat26 0.8386n ± 0% 0.6590n ± 0% -21.42% (p=0.000 n=10) CopyFat27 0.8385n ± 0% 0.6590n ± 0% -21.41% (p=0.000 n=10) CopyFat28 0.8386n ± 0% 0.6571n ± 0% -21.65% (p=0.000 n=10) CopyFat29 0.8385n ± 0% 0.6590n ± 0% -21.41% (p=0.000 n=10) CopyFat30 0.8387n ± 0% 0.6591n ± 0% -21.42% (p=0.000 n=10) CopyFat31 0.8385n ± 0% 0.6589n ± 0% -21.42% (p=0.000 n=10) CopyFat32 0.8318n ± 0% 0.4969n ± 0% -40.26% (p=0.000 n=10) CopyFat33 1.1550n ± 0% 0.7705n ± 0% -33.29% (p=0.000 n=10) CopyFat34 1.1560n ± 0% 0.7703n ± 0% -33.37% (p=0.000 n=10) CopyFat35 1.1550n ± 0% 0.7705n ± 0% -33.29% (p=0.000 n=10) CopyFat36 1.1550n ± 0% 0.7704n ± 0% -33.30% (p=0.000 n=10) CopyFat37 1.1555n ± 0% 0.7704n ± 0% -33.33% (p=0.000 n=10) CopyFat38 1.1550n ± 0% 0.7704n ± 0% -33.30% (p=0.000 n=10) CopyFat39 1.1560n ± 0% 0.7703n ± 0% -33.36% (p=0.000 n=10) CopyFat40 1.0020n ± 0% 
0.7705n ± 0% -23.10% (p=0.000 n=10) CopyFat41 1.2060n ± 0% 0.7703n ± 0% -36.12% (p=0.000 n=10) CopyFat42 1.2060n ± 0% 0.7704n ± 0% -36.12% (p=0.000 n=10) CopyFat43 1.2060n ± 0% 0.7705n ± 0% -36.11% (p=0.000 n=10) CopyFat44 1.2060n ± 0% 0.7704n ± 0% -36.12% (p=0.000 n=10) CopyFat45 1.2060n ± 0% 0.7704n ± 0% -36.12% (p=0.000 n=10) CopyFat46 1.2060n ± 0% 0.7703n ± 0% -36.13% (p=0.000 n=10) CopyFat47 1.2060n ± 0% 0.7703n ± 0% -36.12% (p=0.000 n=10) CopyFat48 1.2060n ± 0% 0.7703n ± 0% -36.13% (p=0.000 n=10) CopyFat49 1.3620n ± 0% 0.8622n ± 0% -36.70% (p=0.000 n=10) CopyFat50 1.3620n ± 0% 0.8621n ± 0% -36.70% (p=0.000 n=10) CopyFat51 1.3620n ± 0% 0.8622n ± 0% -36.70% (p=0.000 n=10) CopyFat52 1.3620n ± 0% 0.8623n ± 0% -36.69% (p=0.000 n=10) CopyFat53 1.3620n ± 0% 0.8621n ± 0% -36.70% (p=0.000 n=10) CopyFat54 1.3620n ± 0% 0.8622n ± 0% -36.70% (p=0.000 n=10) CopyFat55 1.3620n ± 0% 0.8620n ± 0% -36.71% (p=0.000 n=10) CopyFat56 1.3120n ± 0% 0.8622n ± 0% -34.28% (p=0.000 n=10) CopyFat57 1.5905n ± 0% 0.8621n ± 0% -45.80% (p=0.000 n=10) CopyFat58 1.5830n ± 1% 0.8622n ± 0% -45.53% (p=0.000 n=10) CopyFat59 1.5865n ± 1% 0.8621n ± 0% -45.66% (p=0.000 n=10) CopyFat60 1.5720n ± 1% 0.8622n ± 0% -45.15% (p=0.000 n=10) CopyFat61 1.5900n ± 1% 0.8621n ± 0% -45.78% (p=0.000 n=10) CopyFat62 1.5890n ± 0% 0.8622n ± 0% -45.74% (p=0.000 n=10) CopyFat63 1.5900n ± 1% 0.8620n ± 0% -45.78% (p=0.000 n=10) CopyFat64 1.5440n ± 0% 0.8568n ± 0% -44.51% (p=0.000 n=10) geomean 1.093n 0.7636n -30.13% Kunpeng 920C: goos: linux goarch: arm64 pkg: runtime │ orig.out │ movq.out │ │ sec/op │ sec/op vs base │ CopyFat16 0.4892n ± 1% 0.5072n ± 0% +3.68% (p=0.000 n=10) CopyFat17 0.6394n ± 0% 0.4638n ± 0% -27.47% (p=0.000 n=10) CopyFat18 0.6394n ± 0% 0.4638n ± 0% -27.46% (p=0.000 n=10) CopyFat19 0.6395n ± 0% 0.4638n ± 0% -27.48% (p=0.000 n=10) CopyFat20 0.6393n ± 0% 0.4638n ± 0% -27.45% (p=0.000 n=10) CopyFat21 0.6394n ± 0% 0.4637n ± 0% -27.48% (p=0.000 n=10) CopyFat22 0.6395n ± 0% 0.4638n ± 0% -27.47% (p=0.000 n=10) 
CopyFat23 0.6395n ± 0% 0.4638n ± 0% -27.47% (p=0.000 n=10) CopyFat24 0.6091n ± 0% 0.4639n ± 0% -23.84% (p=0.000 n=10) CopyFat25 0.9109n ± 0% 0.4674n ± 0% -48.69% (p=0.000 n=10) CopyFat26 0.9107n ± 0% 0.4674n ± 0% -48.68% (p=0.000 n=10) CopyFat27 0.9108n ± 0% 0.4674n ± 0% -48.69% (p=0.000 n=10) CopyFat28 0.9109n ± 0% 0.4674n ± 0% -48.69% (p=0.000 n=10) CopyFat29 0.9110n ± 0% 0.4673n ± 0% -48.70% (p=0.000 n=10) CopyFat30 0.9109n ± 0% 0.4673n ± 0% -48.70% (p=0.000 n=10) CopyFat31 0.9110n ± 0% 0.4674n ± 0% -48.69% (p=0.000 n=10) CopyFat32 0.6845n ± 0% 0.4845n ± 1% -29.21% (p=0.000 n=10) CopyFat33 0.9130n ± 0% 0.9117n ± 0% -0.14% (p=0.000 n=10) CopyFat34 0.9131n ± 0% 0.9118n ± 0% -0.14% (p=0.001 n=10) CopyFat35 0.9131n ± 0% 0.9117n ± 0% -0.15% (p=0.001 n=10) CopyFat36 0.9129n ± 0% 0.9117n ± 0% -0.14% (p=0.003 n=10) CopyFat37 0.9129n ± 0% 0.9117n ± 0% -0.14% (p=0.000 n=10) CopyFat38 0.9130n ± 0% 0.9118n ± 0% -0.14% (p=0.000 n=10) CopyFat39 0.9131n ± 0% 0.9118n ± 0% -0.15% (p=0.000 n=10) CopyFat40 0.9112n ± 0% 0.9118n ± 0% +0.07% (p=0.027 n=10) CopyFat41 1.1390n ± 0% 0.9118n ± 0% -19.95% (p=0.000 n=10) CopyFat42 1.1390n ± 0% 0.9118n ± 0% -19.95% (p=0.000 n=10) CopyFat43 1.1390n ± 0% 0.9116n ± 0% -19.96% (p=0.000 n=10) CopyFat44 1.1390n ± 0% 0.9119n ± 0% -19.94% (p=0.000 n=10) CopyFat45 1.1390n ± 0% 0.9118n ± 0% -19.95% (p=0.000 n=10) CopyFat46 1.1390n ± 0% 0.9118n ± 0% -19.95% (p=0.000 n=10) CopyFat47 1.1390n ± 0% 0.9117n ± 0% -19.96% (p=0.000 n=10) CopyFat48 0.9111n ± 0% 0.9116n ± 0% +0.06% (p=0.002 n=10) CopyFat49 1.2160n ± 0% 0.9292n ± 0% -23.59% (p=0.000 n=10) CopyFat50 1.2160n ± 0% 0.9302n ± 0% -23.50% (p=0.000 n=10) CopyFat51 1.2160n ± 0% 0.9292n ± 0% -23.59% (p=0.000 n=10) CopyFat52 1.2160n ± 0% 0.9302n ± 0% -23.50% (p=0.000 n=10) CopyFat53 1.2160n ± 0% 0.9293n ± 0% -23.58% (p=0.000 n=10) CopyFat54 1.2160n ± 0% 0.9302n ± 0% -23.50% (p=0.000 n=10) CopyFat55 1.2160n ± 0% 0.9292n ± 0% -23.59% (p=0.000 n=10) CopyFat56 1.1480n ± 0% 0.9303n ± 0% -18.96% (p=0.000 n=10) 
CopyFat57 1.3690n ± 0% 0.9293n ± 0% -32.12% (p=0.000 n=10) CopyFat58 1.3690n ± 0% 0.9303n ± 0% -32.05% (p=0.000 n=10) CopyFat59 1.3690n ± 0% 0.9293n ± 0% -32.12% (p=0.000 n=10) CopyFat60 1.3690n ± 0% 0.9303n ± 0% -32.05% (p=0.000 n=10) CopyFat61 1.3690n ± 0% 0.9293n ± 0% -32.12% (p=0.000 n=10) CopyFat62 1.3690n ± 0% 0.9303n ± 0% -32.05% (p=0.000 n=10) CopyFat63 1.3690n ± 0% 0.9293n ± 0% -32.12% (p=0.000 n=10) CopyFat64 1.1470n ± 0% 0.5742n ± 0% -49.94% (p=0.000 n=10) geomean 0.9710n 0.7214n -25.70% Change-Id: Iecfe52fde1d431a1e4503cd848813a67f3896512 Reviewed-on: https://go-review.googlesource.com/c/go/+/738261 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2026-03-17cmd/compile: eliminate redundant sign-extensions for wasmGeorge Adams
Add rules to eliminate redundant I64Extend sign-extension operations in the wasm backend:

Idempotent (applying the same extend twice is redundant):

    (I64Extend32S (I64Extend32S x)) => (I64Extend32S x)
    (I64Extend16S (I64Extend16S x)) => (I64Extend16S x)
    (I64Extend8S (I64Extend8S x)) => (I64Extend8S x)

Narrower-subsumes-wider (a narrower sign-extend already determines all the bits that a wider one would set):

    (I64Extend32S (I64Extend16S x)) => (I64Extend16S x)
    (I64Extend32S (I64Extend8S x)) => (I64Extend8S x)
    (I64Extend16S (I64Extend8S x)) => (I64Extend8S x)

These patterns arise from nested sub-word type conversions. For example, converting int8 -> int16 -> int32 -> int64 lowers to I64Extend8S -> I64Extend16S -> I64Extend32S, but the I64Extend8S alone is sufficient since it already sign-extends from 8 to 64 bits.

Cq-Include-Trybots: luci.golang.try:gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero
Change-Id: I1637687df31893b1ffa36915a3bd2e10d455f4ef
Reviewed-on: https://go-review.googlesource.com/c/go/+/754040
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
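The subsumption argument can be checked in plain Go. The sketch below is not compiler code: the helpers extend8S, extend16S, extend32S, and chain are hypothetical names that model the wasm sign-extension instructions with ordinary Go conversions, on the assumption that a Go conversion such as int64(int8(x)) behaves like I64Extend8S.

```go
package main

import "fmt"

// extend8S, extend16S, and extend32S are illustrative models of the wasm
// sign-extension ops: each sign-extends the low N bits of x to 64 bits.
func extend8S(x int64) int64  { return int64(int8(x)) }
func extend16S(x int64) int64 { return int64(int16(x)) }
func extend32S(x int64) int64 { return int64(int32(x)) }

// chain models the int8 -> int16 -> int32 -> int64 conversion sequence
// described in the commit message.
func chain(x int64) int64 { return extend32S(extend16S(extend8S(x))) }

func main() {
	samples := []int64{0, 1, 0x7f, 0x80, 0xff, 0x1234, -1, 0x7fffffffffffff80}
	for _, x := range samples {
		// The narrowest extend alone should match the whole chain.
		fmt.Printf("%#x: chain=%d extend8S=%d equal=%t\n",
			uint64(x), chain(x), extend8S(x), chain(x) == extend8S(x))
	}
}
```

Once the 8-bit extend has run, bits 7 through 63 are all copies of the sign bit, so the wider extends have nothing left to change.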
2026-03-16cmd/compile: (riscv64) optimize float32(abs|sqrt64(float64(x)))Meng Zhuo
Absorb unnecessary conversion between float32 and float64 if both src and dst are 32 bit. Updates #75463 Change-Id: Ia71941223b5cca3fea66b559da7b8f916e63feaf Reviewed-on: https://go-review.googlesource.com/c/go/+/733621 Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Julian Zhu <jz531210@gmail.com> Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-03-10cmd/compile: add double-mask elimination rule for wasmGeorge Adams
Add a rule to collapse cascaded I64And operations with constant masks into a single mask:

    (I64And (I64And x (I64Const [c1])) (I64Const [c2])) => (I64And x (I64Const [c1 & c2]))

This pattern arises from sub-word comparisons. For example, (Eq32 x y) lowers to (I64Eq (ZeroExt32to64 x) (ZeroExt32to64 y)), which becomes (I64Eq (I64And x 0xffffffff) (I64And y 0xffffffff)). If x or y is the result of another sub-word operation that already inserted a mask, the masks cascade and this rule collapses them.

Cq-Include-Trybots: luci.golang.try:gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero
Change-Id: Id7856b391be3ac20f1bc9eee40995b52c0754aed
Reviewed-on: https://go-review.googlesource.com/c/go/+/753620
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
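The rule relies on AND being associative. A small Go sketch of the before and after forms follows; maskTwice and maskOnce are hypothetical helper names for illustration and are not part of the compiler.

```go
package main

import "fmt"

// maskTwice applies two constant masks in sequence, as the cascaded
// I64And nodes would before the rewrite.
func maskTwice(x, c1, c2 uint64) uint64 { return (x & c1) & c2 }

// maskOnce applies the single combined mask that the rule produces.
func maskOnce(x, c1, c2 uint64) uint64 { return x & (c1 & c2) }

func main() {
	x := uint64(0x123456789abcdef0)
	// A 32-bit mask followed by a 16-bit mask collapses to the 16-bit mask.
	fmt.Printf("%#x %#x\n", maskTwice(x, 0xffffffff, 0xffff), maskOnce(x, 0xffffffff, 0xffff))
}
```

Because c1 and c2 are constants, c1 & c2 can be computed at compile time, leaving one I64And where there were two.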
2026-03-10cmd/compile: fix mips64 CALLtailinter argument countKeith Randall
It should be 2, not 1. Fixes #78013 Change-Id: If1c48c84c324a3fd50e9f4b43cca2ea62a995dc5 Reviewed-on: https://go-review.googlesource.com/c/go/+/752740 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Mark Freeman <markfreeman@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2026-03-10cmd/compile: forward small Load through Move to avoid redundant copiesdorbmon
Fixes #77720

Add a generic SSA rewrite that forwards `Load` from a `Move` destination back to the `Move` source when it is provably safe, so field reads like `s.h.Value().typ` don’t force a full struct copy.

- Add `Load <- Move` rewrite in `generic.rules` with safety guard: non-volatile source
- Tweak `fixedbugs/issue22200*` so that it can still trigger the "stack frame too large" error.
- Regenerate `rewritegeneric.go`.
- Add `test/codegen/moveload.go` to assert no `MOVUPS` and direct `MOVBLZX` in both direct and inlined forms.

Benchmark results (linux/amd64, i7-14700KF):

    $ go test cmd/compile/internal/test -run='^$' -bench='MoveLoad' -count=20

    Before:
    BenchmarkMoveLoadTypViaValue-20   ~76.9 ns/op
    BenchmarkMoveLoadTypViaPtr-20     ~1.97 ns/op

    After:
    BenchmarkMoveLoadTypViaValue-20   ~1.894 ns/op
    BenchmarkMoveLoadTypViaPtr-20     ~1.905 ns/op

The rewrite removes the redundant struct copy in `s.h.Value().typ`, bringing it in line with the direct pointer form.

Change-Id: Iddf2263e390030ba013e0642a695b87c75f899da
Reviewed-on: https://go-review.googlesource.com/c/go/+/748200
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Mark Freeman <markfreeman@google.com>
2026-03-10cmd/compile: remove loop variable capture workaroundsgojkovicmatija99
Since Go 1.22, loop variables have per-iteration scope, making the x := x pattern unnecessary for goroutine capture. No issue required for this trivial cleanup. Change-Id: I00d98522537fc2b9a6b4d598c8aa21b447628d41 Reviewed-on: https://go-review.googlesource.com/c/go/+/753400 Auto-Submit: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Robert Griesemer <gri@google.com>
2026-03-10cmd/compile: add identity and absorption rules for wasmGeorge Adams
Add post-lowering identity and absorption rules for I64And, I64Or, I64Xor, and I64Mul with constant operands:

    (I64And x (I64Const [-1])) => x
    (I64And x (I64Const [0])) => (I64Const [0])
    (I64Or x (I64Const [0])) => x
    (I64Or x (I64Const [-1])) => (I64Const [-1])
    (I64Xor x (I64Const [0])) => x
    (I64Mul x (I64Const [0])) => (I64Const [0])
    (I64Mul x (I64Const [1])) => x

The generic SSA rules handle these patterns before lowering, but these rules catch cases where wasm-specific lowering or other post-lowering optimization passes produce new nodes with identity or absorbing constant operands. For example, the complement rule lowers Com64(x) to (I64Xor x (I64Const [-1])), and if x is later determined to be all-ones, the I64And absorption rule can fold the result to zero.

Cq-Include-Trybots: luci.golang.try:gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero
Change-Id: Ie9a40e075662d4828a70e30b258d92ee171d0bc2
Reviewed-on: https://go-review.googlesource.com/c/go/+/752861
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
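As a sanity check on the seven rewrites, the sketch below evaluates both sides of each one on int64 values in Go. identitiesHold is a hypothetical helper written for this illustration only; it assumes Go's int64 operators match the corresponding wasm i64 operations, including wraparound multiplication.

```go
package main

import "fmt"

// identitiesHold reports whether all seven rewrites listed in the commit
// message preserve the value of the expression for a given x.
func identitiesHold(x int64) bool {
	// Variables rather than literals, so the operations are spelled out.
	zero, one, allOnes := int64(0), int64(1), int64(-1)
	return x&allOnes == x && // (I64And x -1) => x
		x&zero == 0 && // (I64And x 0) => 0
		x|zero == x && // (I64Or x 0) => x
		x|allOnes == -1 && // (I64Or x -1) => -1
		x^zero == x && // (I64Xor x 0) => x
		x*zero == 0 && // (I64Mul x 0) => 0
		x*one == x // (I64Mul x 1) => x
}

func main() {
	for _, x := range []int64{0, 1, -1, 42, -1 << 63, 1<<63 - 1} {
		fmt.Println(x, identitiesHold(x))
	}
}
```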
2026-03-06cmd/compile: arm64 add 128-bit vector load/store SSA opsAlexander Musman
Add OpARM64FMOVQload, OpARM64FMOVQstore, OpARM64FLDPQ, and OpARM64FSTPQ for loading and storing Vec128 values. Includes offset folding and address combining rules. These ops will be used by subsequent CLs. Change-Id: I4ac86a0a31f878411f49d390cb8df01f81cfc4d6 Reviewed-on: https://go-review.googlesource.com/c/go/+/738260 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Keith Randall <khr@golang.org>
2026-03-06cmd/compile: additional optimisation for CZEROEQZ/CZERONEZ on riscv64Joel Sing
Negation on a condition can be eliminated. Change-Id: I94fab5f019cbaebb2ca589e1d8796a9cb72f3894 Reviewed-on: https://go-review.googlesource.com/c/go/+/748401 Reviewed-by: Xueqi Luo <1824368278@qq.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Julian Zhu <jz531210@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org>
2026-03-06cmd/compile: replace boolean simplification ruleMarvin Stenger
Replace one of the boolean simplification rules with two new rules in order to cover more cases. This is a rebase of CL 42516 which slipped through the cracks. Change-Id: I6da4cf30e5156174e8eac6bc2f0e2cebe95e555c Reviewed-on: https://go-review.googlesource.com/c/go/+/750520 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Jorropo <jorropo.pgm@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
2026-03-06cmd/compile: fold boolean x == x & x != xJorropo
Change-Id: I0e0a5919536b643477a6f9278fcc60492ea5a759 Reviewed-on: https://go-review.googlesource.com/c/go/+/750540 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Keith Randall <khr@golang.org>
2026-03-06cmd/compile: add I64Sub constant folding rule for wasmGeorge Adams
Add the missing I64Sub constant folding rule to the wasm backend. Every other wasm arithmetic operation (I64Add, I64Mul, I64And, I64Or, I64Xor, I64Shl, I64ShrU, I64ShrS) already had a post-lowering constant folding rule, but I64Sub was missing. While the generic SSA pass folds Sub64(Const64, Const64) before lowering, this rule ensures consistency and handles any edge cases where later wasm-specific passes produce I64Sub with two constant operands. Cq-Include-Trybots: luci.golang.try:gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero Change-Id: Ie8bc044dd300dcc6d077feec34f9a65f4a310b13 Reviewed-on: https://go-review.googlesource.com/c/go/+/751441 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> Commit-Queue: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-03-06cmd/compile: use tail calls for wrappers for embedded interfacesKeith Randall
    type I interface {
        foo()
    }
    type S struct {
        I
    }

Because I is embedded in S, S needs a foo method. We generate a wrapper function to implement (*S).foo. It just loads the embedded field I out of S and calls foo on it.

When the thing in S.I itself needs a wrapper, then we have a wrapper calling another wrapper. This can continue, leaving a potentially long sequence of wrappers on the stack. When we then call runtime.Callers or friends, we have to walk an unbounded number of frames to find a bounded number of non-wrapper frames.

This really happens, for instance with I = context.Context, S = context.ValueCtx, and runtime.Callers = pprof sample (for any of context.Context's methods).

To fix, make the interface call in the wrapper a tail call. That way, the number of wrapper frames on the stack does not increase when there are lots of wrappers happening.

Fixes #75764
Fixes #77781

Change-Id: I03b1731159d9218c7f14f72ecbbac822d6a6bb87
Reviewed-on: https://go-review.googlesource.com/c/go/+/751465
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
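A wrapper chain like the one described can be built in a few lines of Go. This is only an illustration of the shape of the problem: leaf and nest are hypothetical names, and foo is given an int result (unlike the declaration in the commit message) so the call has something observable to return. The program behaves the same with or without the tail-call change; only the number of wrapper frames live during the innermost call would differ.

```go
package main

import "fmt"

// I and S mirror the declarations in the commit message, except that foo
// returns an int so the result can be printed.
type I interface{ foo() int }

// S embeds I, so the compiler generates a wrapper method for foo that
// loads the embedded field and calls foo on it.
type S struct{ I }

// leaf is the only type with a hand-written foo.
type leaf struct{}

func (leaf) foo() int { return 42 }

// nest wraps v in depth layers of S, so a call to foo on the result passes
// through depth compiler-generated wrappers before reaching leaf.foo.
func nest(v I, depth int) I {
	for i := 0; i < depth; i++ {
		v = S{I: v}
	}
	return v
}

func main() {
	v := nest(leaf{}, 100)
	fmt.Println(v.foo()) // prints 42
}
```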
2026-03-04runtime, cmd/compile: use preemptible memclr for large pointer-free clears“Muhammad
Large memory clearing operations (via clear() or large slice allocation) currently use non-preemptible assembly loops. This blocks the Garbage Collector from performing a Stop The World (STW) event, leading to significant tail latency or even indefinite hangs in tight loops.

This change introduces memclrNoHeapPointersPreemptible, which chunks clears into 256KB blocks with preemption checks. The compiler's walk phase is updated to emit this call for large pointer-free clears. To prevent regressions, SSA rewrite rules are added to ensure that constant-size clears (which are common and small) continue to be inlined into OpZero assembly.

Benchmarks on darwin/arm64:

- STW with 50MB clear: Improved from 'Hung' to ~500µs max pause.
- Small clears (5-64B): No measurable regression.
- Large clears (1M-64M): No measurable regression.

Fixes #69327

Change-Id: Ide14d6bcdca1f60d6ac95443acb57da9a8822538
Reviewed-on: https://go-review.googlesource.com/c/go/+/750480
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
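The chunking idea can be sketched in user-level Go. This is not the runtime implementation: clearChunked, zero, and the check callback are hypothetical stand-ins, with check playing the role of the preemption check between blocks. Only the 256KB block size is taken from the commit message.

```go
package main

import "fmt"

// chunkSize is the 256KB block size named in the commit message.
const chunkSize = 256 << 10

// zero clears b with a plain loop.
func zero(b []byte) {
	for i := range b {
		b[i] = 0
	}
}

// clearChunked zeroes b one chunk at a time and calls check after each
// full chunk. check is a hypothetical stand-in for a preemption point.
func clearChunked(b []byte, check func()) {
	for len(b) > chunkSize {
		zero(b[:chunkSize])
		b = b[chunkSize:]
		check()
	}
	zero(b) // final partial (or only) chunk
}

func main() {
	buf := make([]byte, 1<<20+123)
	for i := range buf {
		buf[i] = 0xff
	}
	checks := 0
	clearChunked(buf, func() { checks++ })
	fmt.Println("preemption checks:", checks)
}
```

Small clears never enter the loop and so never pay for a check, which is in the same spirit as keeping constant-size clears inlined.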
2026-03-02cmd/compile: optimize sccp for faster convergenceYi Yang
While investigating other optimizations, I found several opportunities to accelerate sccp convergence:

- Avoid adding duplicate uses to the re-visit worklist
- Prevent queueing uses of values that have already reached Bottom
- Add an early exit when processing a value that is already Bottom

These changes provide an overall speedup of ~9% for the sccp phase during a full make.bash run. They do not change the number of constants found or the amount of dead code eliminated.

Updates #77325

Change-Id: Iaf83f6ea355eed366c3d09fc38f85561634a5a16
GitHub-Last-Rev: 078c3d309cc1f1e4b7f7a40635ffc4506f2ac1c6
GitHub-Pull-Request: golang/go#77399
Reviewed-on: https://go-review.googlesource.com/c/go/+/740980
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
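The three changes can be illustrated with a toy worklist. This is a minimal sketch under stated assumptions, not the code in the sccp pass: the lattice, worklist, addUses, and process names are hypothetical, values are represented by integer IDs, and the actual lattice evaluation is left to a caller-supplied visit function.

```go
package main

import "fmt"

// lattice is a three-level constant-propagation lattice.
type lattice int

const (
	top      lattice = iota // nothing known yet
	constant                // known to be a single constant
	bottom                  // known not to be constant; cannot change again
)

// worklist is a FIFO queue that refuses duplicate entries.
type worklist struct {
	queue  []int
	queued map[int]bool
}

func newWorklist() *worklist { return &worklist{queued: map[int]bool{}} }

// push adds v unless it is already queued, and reports whether it was added.
func (w *worklist) push(v int) bool {
	if w.queued[v] {
		return false
	}
	w.queued[v] = true
	w.queue = append(w.queue, v)
	return true
}

// addUses queues the uses of a value whose lattice cell just changed,
// skipping uses that are already Bottom. It returns the number queued.
func addUses(w *worklist, state []lattice, uses []int) int {
	added := 0
	for _, u := range uses {
		if state[u] == bottom {
			continue // a Bottom value can never be lowered further
		}
		if w.push(u) {
			added++
		}
	}
	return added
}

// process drains the worklist, exiting early for values already at Bottom.
func process(w *worklist, state []lattice, visit func(v int)) {
	for len(w.queue) > 0 {
		v := w.queue[0]
		w.queue = w.queue[1:]
		delete(w.queued, v)
		if state[v] == bottom {
			continue // early exit: re-evaluating cannot change anything
		}
		visit(v)
	}
}

func main() {
	state := []lattice{top, constant, bottom, top}
	w := newWorklist()
	fmt.Println("queued:", addUses(w, state, []int{1, 1, 2, 3, 3}))
	state[3] = bottom // value 3 reaches Bottom while still queued
	process(w, state, func(v int) { fmt.Println("visit", v) })
}
```

None of the three checks changes which values end up constant; they only avoid work that cannot affect the fixed point.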
2026-03-02cmd/compile, simd/archsimd: add VPSRL immediate peepholesJunyang Shao
Before this CL, simdgen contained a sign check to selectively enable such rules for deduplication purposes. This left out `VPSRL`, as it is only available in unsigned form. This CL fixes that. It looks like the previous documentation fix to the SHA instructions might not have run go generate, so this CL also contains the generated code for that fix. There is also a weird phantom import in cmd/compile/internal/ssa/issue77582_test.go. This CL also fixes that. The trybot didn't complain? Change-Id: Ibbf9f789c1a67af1474f0285ab376bc07f17667e Reviewed-on: https://go-review.googlesource.com/c/go/+/748501 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2026-03-01cmd/compile: combine some generic AMD64 simplificationsJakub Ciolek
Saves a few lines. If applicable, we also directly rewrite to 32 bit MOVLconst, skipping the redundant transformation. Change-Id: I4c2f5e2bb480e798cbe373de608e19a951d168ff Reviewed-on: https://go-review.googlesource.com/c/go/+/640215 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>