path: root/src/cmd/compile/internal/ssa/_gen/generic.rules
Age | Commit message | Author
2026-01-23 | cmd/compile: cleanup isUnsignedPowerOfTwo | Jorropo
Merge the signed and unsigned generic functions. The only implementation difference between the two is the check: n > 0 vs n != 0. For unsigned numbers, n > 0 is equivalent to n != 0, and we in fact optimize the first to the second.
Change-Id: Ia2f6c3e3d4eb098d98f85e06dc2e81baa60bad4e
Reviewed-on: https://go-review.googlesource.com/c/go/+/726720
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
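The merge described in the commit above can be pictured with a single generic helper. This is a minimal sketch, not the compiler's actual code: the `integer` constraint and the `isPowerOfTwo` name are assumptions chosen for illustration. It relies only on the fact stated in the message, that `n > 0` behaves like `n != 0` for unsigned types while still rejecting negative signed values.

```go
package main

import "fmt"

// integer is a hypothetical constraint covering signed and unsigned integer types.
type integer interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64 |
		~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uintptr
}

// isPowerOfTwo reports whether n is a positive power of two.
// For unsigned T the n > 0 test is the same as n != 0; for signed T it
// additionally rejects negative values such as the minimum int8.
func isPowerOfTwo[T integer](n T) bool {
	return n > 0 && n&(n-1) == 0
}

func main() {
	fmt.Println(isPowerOfTwo(int8(64)))   // signed power of two
	fmt.Println(isPowerOfTwo(int8(-128))) // only the sign bit set, rejected by n > 0
	fmt.Println(isPowerOfTwo(uint8(128))) // same bit pattern, unsigned
	fmt.Println(isPowerOfTwo(uint8(0)))   // zero is not a power of two
}
```

One function body serves both signedness cases, which is the reason a single generic version can replace two near-identical ones.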
2026-01-22 | cmd/compile: ensure ops have the expected argument widths | Keith Randall
The generic SSA representation uses explicit extension and truncation operations to change widths of values. The map intrinsics were playing somewhat fast and loose with this requirement. Fix that, and add a check to make sure we don't regress.
I don't think there is a triggerable bug here, but I ran into this with some prove pass modifications, where cmd/compile/internal/ssa/prove.go:isCleanExt (and/or its uses) is actually wrong when this invariant is not maintained.
Change-Id: Idb7be6e691e2dbf6d7af6584641c3227c5c64bf5
Reviewed-on: https://go-review.googlesource.com/c/go/+/731300
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2026-01-22 | cmd/compile: optimize small constant-sized MemEq | Alexander Musman
Add optimization patterns for MemEq with small constant sizes (3-32 bytes). These patterns help to avoid runtime calls for small sizes. For sizes 3-16, combine the loads and comparisons of two chunks. For sizes 17-32, combine a 16-byte comparison with a comparison of the remaining bytes.
This change may increase binary size slightly due to inline expansion, but improves performance for code with many small memequals, e.g. the DecodehealingTracker benchmark on arm64:
    shortname: minio
    pkg: github.com/minio/minio/cmd
                                   │  Orig.res   │              Uexp.res               │
                                   │   sec/op    │   sec/op     vs base                │
    DecodehealingTracker-4           842.5n ± 1%   794.0n ± 3%  -5.75% (p=0.000 n=10)
    AppendMsgResyncTargetsInfo-4     8.472n ± 0%   8.472n ± 0%       ~ (p=0.582 n=10)
    DataUpdateTracker-4              2.856µ ± 2%   2.804µ ± 3%       ~ (p=0.210 n=10)
    MarshalMsgdataUsageCacheInfo-4   131.2n ± 1%   131.6n ± 2%       ~ (p=0.494 n=10)
    geomean                          227.4n        223.2n       -1.86%
                                   │  Orig.res   │              Uexp.res               │
                                   │    B/s      │    B/s       vs base                │
    DecodehealingTracker-4           352.0Mi ± 1%  373.5Mi ± 3%  +6.10% (p=0.000 n=10)
    AppendMsgResyncTargetsInfo-4     1.099Gi ± 0%  1.099Gi ± 0%       ~ (p=0.183 n=10)
    DataUpdateTracker-4              341.8Ki ± 3%  351.6Ki ± 3%       ~ (p=0.286 n=10)
    geomean                          50.95Mi       52.46Mi       +2.96%
Change-Id: If3d7e7395656d5f36e3ab303a71044293d17bc3e
Reviewed-on: https://go-review.googlesource.com/c/go/+/688195
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
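To make the "two chunks" idea in the commit above concrete, here is a hedged sketch in ordinary Go rather than SSA rules. It assumes the two chunks are allowed to overlap, which is one common way to cover an odd size with fixed-width loads; the commit message does not say whether the real patterns overlap. The helper name `equalUpTo8` and the 4-byte chunk width are illustrative choices.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// equalUpTo8 compares the first n bytes of a and b for 4 <= n <= 8 using
// two 4-byte loads per operand. The second load ends at byte n, so for
// n < 8 it overlaps the first; overlapping bytes are simply compared twice.
// The caller must ensure len(a) >= n and len(b) >= n.
func equalUpTo8(a, b []byte, n int) bool {
	// XOR is zero exactly when the two loaded chunks are identical.
	head := binary.LittleEndian.Uint32(a[0:4]) ^ binary.LittleEndian.Uint32(b[0:4])
	tail := binary.LittleEndian.Uint32(a[n-4:n]) ^ binary.LittleEndian.Uint32(b[n-4:n])
	// Both differences must be zero; OR-ing them needs only one final test.
	return head|tail == 0
}

func main() {
	fmt.Println(equalUpTo8([]byte("gopher!"), []byte("gopher!"), 7))
	fmt.Println(equalUpTo8([]byte("gopher!"), []byte("gopher?"), 7))
}
```

The point of the shape is that a fixed, small size needs no loop and no call: two loads per side, two XORs, one OR, one compare.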
2025-11-26 | cmd/compile: introduce alias analysis and automatically free non-aliased memory after growslice | thepudds
This CL is part of a set of CLs that attempt to reduce how much work the GC must do. See the design in https://go.dev/design/74299-runtime-freegc
This CL updates the compiler to examine append calls to prove whether or not the slice is aliased. If proven unaliased, the compiler automatically inserts a call to a new runtime function introduced with this CL, runtime.growsliceNoAlias, which frees the old backing memory immediately after slice growth is complete and the old storage is logically dead.
Two append benchmarks below show promising results, executing up to ~2x faster and with up to a factor of ~3 memory reduction with this CL.
The approach works with multiple append calls for the same slice, including inside loops, and the final slice memory can be escaping, such as in a classic pattern of returning a slice from a function after the slice is built. (The final slice memory is never freed with this CL, though we have other work that tackles that.)
An example target for this CL is that we automatically free the intermediate memory for the appends in the loop in this function:
    func f1(input []int) []int {
        var s []int
        for _, x := range input {
            s = append(s, g(x)) // s cannot be aliased here
            if h(x) {
                s = append(s, x) // s cannot be aliased here
            }
        }
        return s // slice escapes at end
    }
In this case, the compiler and the runtime collaborate so that the heap allocated backing memory for s is automatically freed after a successful grow. (For the first grow, there is nothing to free, but for the second and subsequent growths, the old heap memory is freed automatically.)
The new runtime.growsliceNoAlias is primarily implemented by calling runtime.freegc, which we introduced in CL 673695.
The high-level approach here is that we step through the IR starting from a slice declaration and look for any operations that either alias the slice or might do so, and treat any IR construct we don't specifically handle as a potential alias (and therefore conservatively fall back to treating the slice as aliased when encountering something not understood).
For loops, some additional care is required. We arrange the analysis so that an alias in the body of a loop causes all the appends in that same loop body to be marked aliased, even if the aliasing occurs after the append in the IR:
    func f2() {
        var s []int
        for i := range 10 {
            s = append(s, i) // aliased due to next line
            alias = s
        }
    }
For nested loops, we analyse the nesting appropriately so that for example this append is still proven as non-aliased in the inner loop even though it is aliased for the outer loop:
    func f3() {
        for range 10 {
            var s []int
            for i := range 10 {
                s = append(s, i) // append using non-aliased slice
            }
            alias = s
        }
    }
A good starting point is the beginning of the test/escape_alias.go file, which starts with ~10 introductory examples with brief comments that attempt to illustrate the high-level approach. For more details, see the new .../internal/escape/alias.go file, especially the (*aliasAnalysis).analyze method.
In the first benchmark, an append in a loop builds up a slice from nothing, where the slice elements are each 64 bytes. In the table below, 'count' is the number of appends. With 1 append, there is no opportunity for this CL to free memory. Once there are 2 appends, the growth from 1 element to 2 elements means the compiler-inserted growsliceNoAlias frees the 1-element array, and we see a ~33% reduction in memory use and a small reported speed improvement. As the number of appends increases, for example to 5, we are at a ~20% speed improvement and ~45% memory reduction, and so on until we reach ~40% faster and ~50% less memory allocated at the end of the table.
There can be variation in the reported numbers based on -randlayout, so this table is for 30 different values of -randlayout with a total n=150. (Even so, there is still some variation, so we probably should not read too much into small changes.) This is with GOAMD64=v3 on a VM that gcc reports is cascadelake.
    goos: linux
    goarch: amd64
    pkg: runtime
    cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
                              │ old-1bb1f2bf0c │       freegc-8ba7421-ps16        │
                              │     sec/op     │    sec/op     vs base            │
    Append64Bytes/count=1-4       31.09n ± 2%     31.69n ± 1%   +1.95% (n=150)
    Append64Bytes/count=2-4       73.31n ± 1%     70.27n ± 0%   -4.15% (n=150)
    Append64Bytes/count=3-4       142.7n ± 1%     124.6n ± 1%  -12.68% (n=150)
    Append64Bytes/count=4-4       149.6n ± 1%     127.7n ± 0%  -14.64% (n=150)
    Append64Bytes/count=5-4       277.1n ± 1%     213.6n ± 0%  -22.90% (n=150)
    Append64Bytes/count=6-4       280.7n ± 1%     216.5n ± 1%  -22.87% (n=150)
    Append64Bytes/count=10-4      544.3n ± 1%     386.6n ± 0%  -28.97% (n=150)
    Append64Bytes/count=20-4     1058.5n ± 1%     715.6n ± 1%  -32.39% (n=150)
    Append64Bytes/count=50-4      2.121µ ± 1%     1.404µ ± 1%  -33.83% (n=150)
    Append64Bytes/count=100-4     4.152µ ± 1%     2.736µ ± 1%  -34.11% (n=150)
    Append64Bytes/count=200-4     7.753µ ± 1%     4.882µ ± 1%  -37.03% (n=150)
    Append64Bytes/count=400-4    15.163µ ± 2%     9.273µ ± 1%  -38.84% (n=150)
    geomean                       601.8n          455.0n       -24.39%
                              │ old-1bb1f2bf0c │       freegc-8ba7421-ps16        │
                              │      B/op      │     B/op      vs base            │
    Append64Bytes/count=1-4        64.00 ± 0%      64.00 ± 0%        ~ (n=150)
    Append64Bytes/count=2-4        192.0 ± 0%      128.0 ± 0%  -33.33% (n=150)
    Append64Bytes/count=3-4        448.0 ± 0%      256.0 ± 0%  -42.86% (n=150)
    Append64Bytes/count=4-4        448.0 ± 0%      256.0 ± 0%  -42.86% (n=150)
    Append64Bytes/count=5-4        960.0 ± 0%      512.0 ± 0%  -46.67% (n=150)
    Append64Bytes/count=6-4        960.0 ± 0%      512.0 ± 0%  -46.67% (n=150)
    Append64Bytes/count=10-4     1.938Ki ± 0%    1.000Ki ± 0%  -48.39% (n=150)
    Append64Bytes/count=20-4     3.938Ki ± 0%    2.001Ki ± 0%  -49.18% (n=150)
    Append64Bytes/count=50-4     7.938Ki ± 0%    4.005Ki ± 0%  -49.54% (n=150)
    Append64Bytes/count=100-4   15.938Ki ± 0%    8.021Ki ± 0%  -49.67% (n=150)
    Append64Bytes/count=200-4    31.94Ki ± 0%    16.08Ki ± 0%  -49.64% (n=150)
    Append64Bytes/count=400-4    63.94Ki ± 0%    32.33Ki ± 0%  -49.44% (n=150)
    geomean                      1.991Ki         1.124Ki       -43.54%
                              │ old-1bb1f2bf0c │       freegc-8ba7421-ps16        │
                              │   allocs/op    │  allocs/op    vs base            │
    Append64Bytes/count=1-4        1.000 ± 0%      1.000 ± 0%        ~ (n=150)
    Append64Bytes/count=2-4        2.000 ± 0%      1.000 ± 0%  -50.00% (n=150)
    Append64Bytes/count=3-4        3.000 ± 0%      1.000 ± 0%  -66.67% (n=150)
    Append64Bytes/count=4-4        3.000 ± 0%      1.000 ± 0%  -66.67% (n=150)
    Append64Bytes/count=5-4        4.000 ± 0%      1.000 ± 0%  -75.00% (n=150)
    Append64Bytes/count=6-4        4.000 ± 0%      1.000 ± 0%  -75.00% (n=150)
    Append64Bytes/count=10-4       5.000 ± 0%      1.000 ± 0%  -80.00% (n=150)
    Append64Bytes/count=20-4       6.000 ± 0%      1.000 ± 0%  -83.33% (n=150)
    Append64Bytes/count=50-4       7.000 ± 0%      1.000 ± 0%  -85.71% (n=150)
    Append64Bytes/count=100-4      8.000 ± 0%      1.000 ± 0%  -87.50% (n=150)
    Append64Bytes/count=200-4      9.000 ± 0%      1.000 ± 0%  -88.89% (n=150)
    Append64Bytes/count=400-4     10.000 ± 0%      1.000 ± 0%  -90.00% (n=150)
    geomean                        4.331           1.000       -76.91%
The second benchmark is similar, but instead uses an 8-byte integer for the slice element. The first 4 appends in the loop never call into the runtime thanks to the excellent CL 664299 introduced by Keith in Go 1.25 that allows some <= 32 byte dynamically-sized slices to be on the stack, so this CL is neutral for <= 32 bytes. Once the 5th append occurs at count=5, a grow happens via the runtime and heap allocates as normal, but freegc does not yet have anything to free, so we see a small ~1.4ns penalty reported there. But once the second growth happens, the older heap memory is now automatically freed by freegc, so we start to see some benefit in memory reductions and speed improvements, starting at a tiny speed improvement (close to a wash, or maybe noise) by the second growth before count=10, and building up to ~2x faster with ~68% fewer allocated bytes reported.
    goos: linux
    goarch: amd64
    pkg: runtime
    cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
                           │ old-1bb1f2bf0c │          freegc-8ba7421-ps16           │
                           │     sec/op     │    sec/op     vs base                  │
    AppendInt/count=1-4        2.978n ± 0%     2.969n ± 0%   -0.30% (p=0.000 n=150)
    AppendInt/count=4-4        4.292n ± 3%     4.163n ± 3%        ~ (p=0.528 n=150)
    AppendInt/count=5-4        33.50n ± 0%     34.93n ± 0%   +4.25% (p=0.000 n=150)
    AppendInt/count=10-4       76.21n ± 1%     75.67n ± 0%   -0.72% (p=0.000 n=150)
    AppendInt/count=20-4       150.6n ± 1%     133.0n ± 0%  -11.65% (n=150)
    AppendInt/count=50-4       284.1n ± 1%     225.6n ± 0%  -20.59% (n=150)
    AppendInt/count=100-4      544.2n ± 1%     392.4n ± 1%  -27.89% (n=150)
    AppendInt/count=200-4     1051.5n ± 1%     702.3n ± 0%  -33.21% (n=150)
    AppendInt/count=400-4      2.041µ ± 1%     1.312µ ± 1%  -35.70% (n=150)
    AppendInt/count=1000-4     5.224µ ± 2%     2.851µ ± 1%  -45.43% (n=150)
    AppendInt/count=2000-4    11.770µ ± 1%     6.010µ ± 1%  -48.94% (n=150)
    AppendInt/count=3000-4    17.747µ ± 2%     8.264µ ± 1%  -53.44% (n=150)
    geomean                    331.8n          246.4n       -25.72%
                           │ old-1bb1f2bf0c │          freegc-8ba7421-ps16           │
                           │      B/op      │     B/op      vs base                  │
    AppendInt/count=1-4         0.000 ± 0%      0.000 ± 0%        ~ (p=1.000 n=150)
    AppendInt/count=4-4         0.000 ± 0%      0.000 ± 0%        ~ (p=1.000 n=150)
    AppendInt/count=5-4         64.00 ± 0%      64.00 ± 0%        ~ (p=1.000 n=150)
    AppendInt/count=10-4        192.0 ± 0%      128.0 ± 0%  -33.33% (n=150)
    AppendInt/count=20-4        448.0 ± 0%      256.0 ± 0%  -42.86% (n=150)
    AppendInt/count=50-4        960.0 ± 0%      512.0 ± 0%  -46.67% (n=150)
    AppendInt/count=100-4     1.938Ki ± 0%    1.000Ki ± 0%  -48.39% (n=150)
    AppendInt/count=200-4     3.938Ki ± 0%    2.001Ki ± 0%  -49.18% (n=150)
    AppendInt/count=400-4     7.938Ki ± 0%    4.005Ki ± 0%  -49.54% (n=150)
    AppendInt/count=1000-4    24.56Ki ± 0%    10.05Ki ± 0%  -59.07% (n=150)
    AppendInt/count=2000-4    58.56Ki ± 0%    20.31Ki ± 0%  -65.32% (n=150)
    AppendInt/count=3000-4    85.19Ki ± 0%    27.30Ki ± 0%  -67.95% (n=150)
    geomean                             ²                   -42.81%
                           │ old-1bb1f2bf0c │          freegc-8ba7421-ps16           │
                           │   allocs/op    │  allocs/op    vs base                  │
    AppendInt/count=1-4         0.000 ± 0%      0.000 ± 0%        ~ (p=1.000 n=150)
    AppendInt/count=4-4         0.000 ± 0%      0.000 ± 0%        ~ (p=1.000 n=150)
    AppendInt/count=5-4         1.000 ± 0%      1.000 ± 0%        ~ (p=1.000 n=150)
    AppendInt/count=10-4        2.000 ± 0%      1.000 ± 0%  -50.00% (n=150)
    AppendInt/count=20-4        3.000 ± 0%      1.000 ± 0%  -66.67% (n=150)
    AppendInt/count=50-4        4.000 ± 0%      1.000 ± 0%  -75.00% (n=150)
    AppendInt/count=100-4       5.000 ± 0%      1.000 ± 0%  -80.00% (n=150)
    AppendInt/count=200-4       6.000 ± 0%      1.000 ± 0%  -83.33% (n=150)
    AppendInt/count=400-4       7.000 ± 0%      1.000 ± 0%  -85.71% (n=150)
    AppendInt/count=1000-4      9.000 ± 0%      1.000 ± 0%  -88.89% (n=150)
    AppendInt/count=2000-4     11.000 ± 0%      1.000 ± 0%  -90.91% (n=150)
    AppendInt/count=3000-4     12.000 ± 0%      1.000 ± 0%  -91.67% (n=150)
    geomean                             ²                   -72.76%  ²
Of course, these are just microbenchmarks, but likely indicate there are some opportunities here.
The immediately following CL 712422 tackles inlining and is able to get runtime.freegc working automatically with iterators such as used by slices.Collect, which becomes able to automatically free the intermediate memory from its repeated appends (which earlier in this work required a temporary hand edit to the slices package).
For now, we only use the NoAlias version for element types without pointers while waiting on additional runtime support in CL 698515.
Updates #74299
Change-Id: I1b9d286aa97c170dcc2e203ec0f8ca72d84e8221
Reviewed-on: https://go-review.googlesource.com/c/go/+/710015
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
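The B/op columns in the Append64Bytes tables above are consistent with a simple accounting: the old figure looks like the sum of every backing array the slice passed through (for count=3, 64 + 128 + 256 = 448), while the new figure matches the last backing array alone (256). The sketch below measures both quantities for a 64-byte element by watching `cap` change across appends. It is an illustration of that arithmetic only; the helper name `growthBytes` is hypothetical, it does not call or depend on runtime.freegc, and the exact capacities are a runtime implementation detail that may differ between Go versions.

```go
package main

import "fmt"

// elemSize is the element size used by the first benchmark in the commit message.
const elemSize = 64

// elem is a 64-byte element.
type elem [elemSize]byte

// growthBytes appends count elements one at a time. It returns the bytes in
// every backing array the slice passed through (total) and the bytes in the
// last backing array only (final). A change in cap(s) is taken as the sign
// that append moved the slice to a new backing array.
func growthBytes(count int) (total, final int) {
	var s []elem
	prevCap := 0
	for i := 0; i < count; i++ {
		s = append(s, elem{})
		if cap(s) != prevCap {
			prevCap = cap(s)
			final = prevCap * elemSize
			total += final
		}
	}
	return total, final
}

func main() {
	for _, count := range []int{1, 2, 3, 5, 10} {
		total, final := growthBytes(count)
		fmt.Printf("count=%d total=%d final=%d\n", count, total, final)
	}
}
```

The gap between `total` and `final` is the intermediate memory that, per the commit message, becomes eligible for immediate freeing once the slice is proven unaliased.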
2025-11-26 | cmd/compile,internal/bytealg: add MemEq intrinsic for runtime.memequal | Alexander Musman
Introduce a new MemEq SSA operation for runtime.memequal. The operation is initially implemented for arm64. The change adds opt rules (following existing rules for calls to runtime.memequal), working with MemEq, and a later op version LoweredMemEq which may be lowered differently for more constant size cases in future (for other targets as well as for arm64). The new MemEq SSA operation does not have a memory result, allowing CSE of load operations around it.
Code size difference (for arm64 linux):
    Executable    Old .text    New .text    Change
    -------------------------------------------------------
    asm             1970420      1969668    -0.04%
    cgo             1741220      1740212    -0.06%
    compile         8956756      8959428    +0.03%
    cover           1879332      1878772    -0.03%
    link            2574116      2572660    -0.06%
    preprofile       867124       866820    -0.04%
    vet             2890404      2888596    -0.06%
Change-Id: I6ab507929b861884d17d5818cfbd152cf7879751
Reviewed-on: https://go-review.googlesource.com/c/go/+/686655
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2025-11-24 | [dev.simd] all: merge master (8dd5b13) into dev.simd | Cherry Mui
Merge List: + 2025-11-24 8dd5b13abc cmd/compile: relax stmtline_test on amd64 + 2025-11-23 feae743bdb cmd/compile: use 32x32->64 multiplies on loong64 + 2025-11-23 e88be8a128 runtime: fix stale comment for mheap/malloc + 2025-11-23 a318843a2a cmd/internal/obj/loong64: optimize duplicate optab entries + 2025-11-23 a18294bb6a cmd/internal/obj/arm64, image/gif, runtime, sort: use math/bits to calculate log2 + 2025-11-23 437323ef7b slices: fix incorrect comment in slices.Insert function documentation + 2025-11-23 1993dca400 doc/next: pre-announce end of support for macOS 12 in Go 1.27 + 2025-11-22 337f7b1f5d cmd/go: update default go directive in mod or work init + 2025-11-21 3c26aef8fb cmd/internal/obj/riscv: improve large branch/call/jump tests + 2025-11-21 31aa9f800b crypto/tls: use inner hello for earlyData when using QUIC and ECH + 2025-11-21 d68aec8db1 runtime: replace trace seqlock with write flag + 2025-11-21 8d9906cd34 runtime/trace: add Log benchmark + 2025-11-21 6aeacdff38 cmd/go: support sha1 repos when git default is sha256 + 2025-11-21 9570036ca5 crypto/sha3: make the zero value of SHAKE useable + 2025-11-21 155efbbeeb crypto/sha3: make the zero value of SHA3 useable + 2025-11-21 6f16669e34 database/sql: don't ignore ColumnConverter for unknown input count + 2025-11-21 121bc3e464 runtime/pprof: remove hard-coded sleep in CPU profile reader + 2025-11-21 b604148c4e runtime: fix double wakeup in CPU profile buffer + 2025-11-21 22f24f90b5 cmd/compile: change testing.B.Loop keep alive semantic + 2025-11-21 cfb9d2eb73 net: remove unused linknames + 2025-11-21 65ef314f89 net/http: remove unused linknames + 2025-11-21 0f32fbc631 net/http: populate Response.Request when using NewFileTransport + 2025-11-21 3e0a8e7867 net/http: preserve original path encoding in redirects + 2025-11-21 831af61120 net/http: use HTTP 307 redirects in ServeMux + 2025-11-21 87269224cb net/http: update Response.Request.URL after redirects on GOOS=js + 2025-11-21 7aa9ca729f 
net/http/cookiejar: treat localhost as secure origin + 2025-11-21 f870a1d398 net/url: warn that JoinPath arguments should be escaped + 2025-11-21 9962d95fed crypto/internal/fips140/mldsa: unroll NTT and inverseNTT + 2025-11-21 f821fc46c5 crypto/internal/fisp140test: update acvptool, test data + 2025-11-21 b59efc38a0 crypto/internal/fips140/mldsa: new package + 2025-11-21 62741480b8 runtime: remove linkname for gopanic + 2025-11-21 7db2f0bb9a crypto/internal/hpke: separate KEM and PublicKey/PrivateKey interfaces + 2025-11-21 e15800c0ec crypto/internal/hpke: add ML-KEM and hybrid KEMs, and SHAKE KDFs + 2025-11-21 7c985a2df4 crypto/internal/hpke: modularize API and support more ciphersuites + 2025-11-21 e7d47ac33d cmd/compile: simplify negative on multiplication + 2025-11-21 35d2712b32 net/http: fix typo in Transport docs + 2025-11-21 90c970cd0f net: remove unnecessary loop variable copies in tests + 2025-11-21 9772d3a690 cmd/cgo: strip top-level const qualifier from argument frame struct + 2025-11-21 1903782ade errors: add examples for custom Is/As matching + 2025-11-21 ec92bc6d63 cmd/compile: rewrite Rsh to RshU if arguments are proved positive + 2025-11-21 3820f94c1d cmd/compile: propagate unsigned relations for Rsh if arguments are positive + 2025-11-21 d474f1fd21 cmd/compile: make dse track multiple shadowed ranges + 2025-11-21 d0d0a72980 cmd/compile/internal/ssa: correct type of ARM64 conditional instructions + 2025-11-21 a9704f89ea internal/runtime/gc/scan: add AVX512 impl of filterNil. 
+ 2025-11-21 ccd389036a cmd/internal/objabi: remove -V=goexperiment internal special case + 2025-11-21 e7787b9eca runtime: go fmt + 2025-11-21 17b3b98796 internal/strconv: go fmt + 2025-11-21 c851827c68 internal/trace: go fmt + 2025-11-21 f87aaec53d cmd/compile: fix integer overflow in prove pass + 2025-11-21 dbd2ab9992 cmd/compile/internal: fix typos + 2025-11-21 b9d86baae3 cmd/compile/internal/devirtualize: fix typos + 2025-11-20 4b0e3cc1d6 cmd/link: support loading R_LARCH_PCREL20_S2 and R_LARCH_CALL36 relocs + 2025-11-20 cdba82c7d6 cmd/internal/obj/loong64: add {,X}VSLT.{B/H/W/V}{,U} instructions support + 2025-11-20 bd2b117c2c crypto/tls: add QUICErrorEvent + 2025-11-20 3ad2e113fc net/http/httputil: wrap ReverseProxy's outbound request body so Close is a noop + 2025-11-20 d58b733646 runtime: track goroutine location until actual STW + 2025-11-20 1bc54868d4 cmd/vendor: update to x/tools@68724af + 2025-11-20 8c3195973b runtime: disable stack allocation tests on sanitizers + 2025-11-20 ff654ea100 net/url: permit colons in the host of postgresql:// URLs + 2025-11-20 a662badab9 encoding/json: remove linknames + 2025-11-20 5afe237d65 mime: add missing path for mime types in godoc + 2025-11-20 c1b7112af8 os/signal: make NotifyContext cancel the context with a cause Change-Id: Ib93ef643be610dfbdd83ff45095a7b1ca2537b8b
2025-11-21 | cmd/compile: simplify negative on multiplication | Meng Zhuo
    goos: linux
    goarch: amd64
    pkg: cmd/compile/internal/test
    cpu: AMD EPYC 7532 32-Core Processor
                    │ simplify_base │            simplify_new             │
                    │    sec/op     │   sec/op     vs base                │
    SimplifyNegMul     623.0n ± 0%    319.3n ± 1%  -48.75% (p=0.000 n=10)
    goos: linux
    goarch: riscv64
    pkg: cmd/compile/internal/test
    cpu: Spacemit(R) X60
                    │ simplify.base │            simplify.new             │
                    │    sec/op     │   sec/op     vs base                │
    SimplifyNegMul    10.928µ ± 0%    6.432µ ± 0%  -41.14% (p=0.000 n=10)
Change-Id: I1d9393cd19a0b948a5d3a512d627cdc0cf0b38be
Reviewed-on: https://go-review.googlesource.com/c/go/+/721520
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2025-11-20 | [dev.simd] all: merge master (ca37d24) into dev.simd | Cherry Mui
Conflicts: - src/cmd/compile/internal/typecheck/builtin.go Merge List: + 2025-11-20 ca37d24e0b net/http: drop unused "broken" field from persistConn + 2025-11-20 4b740af56a cmd/internal/obj/x86: handle global reference in From3 in dynlink mode + 2025-11-20 790384c6c2 spec: adjust rule for type parameter on RHS of alias declaration + 2025-11-20 a49b0302d0 net/http: correctly close fake net.Conns + 2025-11-20 32f5aadd2f cmd/compile: stack allocate backing stores during append + 2025-11-20 a18aff8057 runtime: select GC mark workers during start-the-world + 2025-11-20 829779f4fe runtime: split findRunnableGCWorker in two + 2025-11-20 ab59569099 go/version: use "custom" as an example of a version suffix + 2025-11-19 c4bb9653ba cmd/compile: Implement LoweredZeroLoop with LSX Instruction on loong64 + 2025-11-19 7f2ae21fb4 cmd/internal/obj/loong64: add MULW.D.W[U] instructions + 2025-11-19 a2946f2385 crypto: add Encapsulator and Decapsulator interfaces + 2025-11-19 6b83bd7146 crypto/ecdh: add KeyExchanger interface + 2025-11-19 4fef9f8b55 go/types, types2: fix object path for grouped declaration statements + 2025-11-19 33529db142 spec: escape double-ampersands + 2025-11-19 dc42565a20 cmd/compile: fix control flow for unsigned divisions proof relations + 2025-11-19 e64023dcbf cmd/compile: cleanup useless if statement in prove + 2025-11-19 2239520d1c test: go fmt prove.go tests + 2025-11-19 489d3dafb7 math: switch s390x math.Pow to generic implementation + 2025-11-18 8c41a482f9 runtime: add dlog.hexdump + 2025-11-18 e912618bd2 runtime: add hexdumper + 2025-11-18 2cf9d4b62f Revert "net/http: do not discard body content when closing it within request handlers" + 2025-11-18 4d0658bb08 cmd/compile: prefer fixed registers for values + 2025-11-18 ba634ca5c7 cmd/compile: fold boolean NOT into branches + 2025-11-18 8806d53c10 cmd/link: align sections, not symbols after DWARF compress + 2025-11-18 c93766007d runtime: do not print recovered when double panic with the same value + 
2025-11-18 9859b43643 cmd/asm,cmd/compile,cmd/internal/obj/riscv: use compressed instructions on riscv64 + 2025-11-17 b9ef0633f6 cmd/internal/sys,internal/goarch,runtime: enable the use of compressed instructions on riscv64 + 2025-11-17 a087dea869 debug/elf: sync new loong64 relocation types up to LoongArch ELF psABI v20250521 + 2025-11-17 e1a12c781f cmd/compile: use 32x32->64 multiplies on arm64 + 2025-11-17 6caab99026 runtime: relax TestMemoryLimit on darwin a bit more + 2025-11-17 eda2e8c683 runtime: clear frame pointer at thread entry points + 2025-11-17 6919858338 runtime: rename findrunnable references to findRunnable + 2025-11-17 8e734ec954 go/ast: fix BasicLit.End position for raw strings containing \r + 2025-11-17 592775ec7d crypto/mlkem: avoid a few unnecessary inverse NTT calls + 2025-11-17 590cf18daf crypto/mlkem/mlkemtest: add derandomized Encapsulate768/1024 + 2025-11-17 c12c337099 cmd/compile: teach prove about subtract idioms + 2025-11-17 bc15963813 cmd/compile: clean up prove pass + 2025-11-17 1297fae708 go/token: add (*File).End method + 2025-11-17 65c09eafdf runtime: hoist invariant code out of heapBitsSmallForAddrInline + 2025-11-17 594129b80c internal/runtime/maps: update doc for table.Clear + 2025-11-15 c58d075e9a crypto/rsa: deprecate PKCS#1 v1.5 encryption + 2025-11-14 d55ecea9e5 runtime: usleep before stealing runnext only if not in syscall + 2025-11-14 410ef44f00 cmd: update x/tools to 59ff18c + 2025-11-14 50128a2154 runtime: support runtime.freegc in size-specialized mallocs for noscan objects + 2025-11-14 c3708350a4 cmd/go: tests: rename git-min-vers->git-sha256 + 2025-11-14 aea881230d std: fix printf("%q", int) mistakes + 2025-11-14 120f1874ef runtime: add more precise test of assist credit handling for runtime.freegc + 2025-11-14 fecfcaa4f6 runtime: add runtime.freegc to reduce GC work + 2025-11-14 5a347b775e runtime: set GOEXPERIMENT=runtimefreegc to disabled by default + 2025-11-14 1a03d0db3f runtime: skip tests for 
GOEXPERIMENT=arenas that do not handle clobberfree=1 + 2025-11-14 cb0d9980f5 net/http: do not discard body content when closing it within request handlers + 2025-11-14 03ed43988f cmd/compile: allow multi-field structs to be stored directly in interfaces + 2025-11-14 1bb1f2bf0c runtime: put AddCleanup cleanup arguments in their own allocation + 2025-11-14 9fd2e44439 runtime: add AddCleanup benchmark + 2025-11-14 80c91eedbb runtime: ensure weak handles end up in their own allocation + 2025-11-14 7a8d0b5d53 runtime: add debug mode to extend _Grunning-without-P windows + 2025-11-14 710abf74da internal/runtime/cgobench: add Go function call benchmark for comparison + 2025-11-14 b24aec598b doc, cmd/internal/obj/riscv: document the riscv64 assembler + 2025-11-14 a0e738c657 cmd/compile/internal: remove incorrect riscv64 SLTI rule + 2025-11-14 2cdcc4150b cmd/compile: fold negation into multiplication + 2025-11-14 b57962b7c7 bytes: fix panic in bytes.Buffer.Peek + 2025-11-14 0a569528ea cmd/compile: optimize comparisons with single bit difference + 2025-11-14 1e5e6663e9 cmd/compile: remove unnecessary casts and types from riscv64 rules + 2025-11-14 ddd8558e61 go/types, types2: swap object.color for Checker.objPathIdx + 2025-11-14 9daaab305c cmd/link/internal/ld: make runtime.buildVersion with experiments valid + 2025-11-13 d50a571ddf test: fix tests to work with sizespecializedmalloc turned off + 2025-11-13 704f841eab cmd/trace: annotation proc start/stop with thread and proc always + 2025-11-13 17a02b9106 net/http: remove unused isLitOrSingle and isNotToken + 2025-11-13 ff61991aed cmd/go: fix flaky TestScript/mod_get_direct + 2025-11-13 129d0cb543 net/http/cgi: accept INCLUDED as protocol for server side includes + 2025-11-13 77c5130100 go/types: minor simplification + 2025-11-13 7601cd3880 go/types: generate cycles.go + 2025-11-13 7a372affd9 go/types, types2: rename definedType to declaredType and clarify docs Change-Id: Ibaa9bdb982364892f80e511c1bb12661fcd5fb86
2025-11-14 | cmd/compile: allow multi-field structs to be stored directly in interfaces | Keith Randall
This applies if the struct is a bunch of 0-sized fields and one pointer field.
Merged revert-of-revert for 4 CLs.
    original  revert
    681937    695016
    693415    694996
    693615    695015
    694195    694995
Fixes #74092
Update #74888
Update #74908
Update #74935
(updated issues are bugs in the last attempt at this)
Change-Id: I32246d49b8bac3bb080972dc06ab432a5480d560
Reviewed-on: https://go-review.googlesource.com/c/go/+/714421
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
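As an illustration of the kind of type the commit above describes, the sketch below declares a struct made of zero-sized fields plus one pointer field and checks that it occupies exactly one pointer-sized word. The type name `tagged` and its fields are hypothetical. Whether the compiler actually stores such a value directly in the interface data word is an internal detail that this program cannot observe; it only shows that the value fits in a word and round-trips through an interface unchanged.

```go
package main

import (
	"fmt"
	"unsafe"
)

// tagged is a hypothetical struct with two zero-sized fields followed by a
// single pointer field. The zero-sized fields come first so that no padding
// is added after a trailing zero-sized field.
type tagged struct {
	_ struct{}
	_ [0]int
	p *int
}

// isPointerSized reports whether tagged occupies exactly one pointer-sized word.
func isPointerSized() bool {
	return unsafe.Sizeof(tagged{}) == unsafe.Sizeof(uintptr(0))
}

// roundTrip stores a tagged value in an interface and reads the pointee back.
func roundTrip(v int) int {
	var i any = tagged{p: &v}
	return *i.(tagged).p
}

func main() {
	fmt.Println(isPointerSized())
	fmt.Println(roundTrip(42))
}
```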
2025-11-14 | cmd/compile: fold negation into multiplication | Meng Zhuo
    goos: linux
    goarch: riscv64
    pkg: cmd/compile/internal/test
    cpu: Spacemit(R) X60
             │ /root/mul.base.log │          /root/mul.new.log          │
             │       sec/op       │   sec/op     vs base                │
    MulNeg        6.426µ ± 0%       4.501µ ± 0%  -29.96% (p=0.000 n=10)
    Mul2Neg       9.000µ ± 0%       6.431µ ± 0%  -28.54% (p=0.000 n=10)
    Mul2          1.263µ ± 0%       1.263µ ± 0%        ~ (p=1.000 n=10)
    MulNeg2       1.577µ ± 0%       1.577µ ± 0%        ~ (p=0.211 n=10)
    geomean       3.276µ            2.756µ       -15.89%
    goos: linux
    goarch: amd64
    pkg: cmd/compile/internal/test
    cpu: AMD EPYC 7532 32-Core Processor
             │ /root/base  │              /root/new              │
             │   sec/op    │   sec/op     vs base                │
    MulNeg     691.9n ± 1%   319.4n ± 0%  -53.83% (p=0.000 n=10)
    Mul2Neg    630.0n ± 0%   629.6n ± 0%   -0.07% (p=0.000 n=10)
    Mul2       438.1n ± 0%   438.1n ± 0%        ~ (p=0.728 n=10)
    MulNeg2    439.3n ± 0%   439.4n ± 0%        ~ (p=0.656 n=10)
    geomean    538.2n        443.6n       -17.58%
Change-Id: Ice8e6c8d1e8e3009ba8a0b1b689205174e199019
Reviewed-on: https://go-review.googlesource.com/c/go/+/720180
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-11-14 | cmd/compile: optimize comparisons with single bit difference | Michael Munday
Optimize comparisons with constants that only differ by 1 bit (i.e. a power of 2). For example:
    x == 4 || x == 6 -> x|2 == 6
    x != 1 && x != 5 -> x|4 != 5
Change-Id: Ic61719e5118446d21cf15652d9da22f7d95b2a15
Reviewed-on: https://go-review.googlesource.com/c/go/+/719420
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
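The rewrite in the commit above can be checked exhaustively for 8-bit values. The sketch below is a plain Go model of the identity, not the rewrite rule itself; the function names are illustrative. It compares the two-comparison form with the merged form for every x and every pair of constants that differ in exactly one bit.

```go
package main

import "fmt"

// eqEither is the original form: x equals one of two constants.
func eqEither(x, c, d uint8) bool { return x == c || x == d }

// eqMerged is the single-comparison form. It is only valid when c and d
// differ in exactly one bit: forcing that bit on in x makes both accepted
// values collapse to the same pattern, c|bit.
func eqMerged(x, c, d uint8) bool {
	bit := c ^ d
	return x|bit == c|bit
}

// differByOneBit reports whether c and d differ in exactly one bit.
func differByOneBit(c, d uint8) bool {
	diff := c ^ d
	return diff != 0 && diff&(diff-1) == 0
}

// checkAll reports whether both forms agree for every 8-bit x and every
// pair of 8-bit constants that differ in exactly one bit.
func checkAll() bool {
	for c := 0; c < 256; c++ {
		for d := 0; d < 256; d++ {
			if !differByOneBit(uint8(c), uint8(d)) {
				continue
			}
			for x := 0; x < 256; x++ {
				if eqEither(uint8(x), uint8(c), uint8(d)) != eqMerged(uint8(x), uint8(c), uint8(d)) {
					return false
				}
			}
		}
	}
	return true
}

func main() {
	fmt.Println(checkAll())
}
```

The `!=` / `&&` example from the commit message is the logical negation of the same identity, so the same check covers it.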
2025-11-13 | [dev.simd] all: merge master (57362e9) into dev.simd | Cherry Mui
Conflicts: - src/cmd/compile/internal/ir/symtab.go - src/cmd/compile/internal/ssa/prove.go - src/cmd/compile/internal/ssa/rewriteAMD64.go - src/cmd/compile/internal/ssagen/intrinsics.go - src/cmd/compile/internal/typecheck/builtin.go - src/internal/buildcfg/exp.go - src/internal/strconv/ftoa.go - test/codegen/stack.go Manually resolved some conflicts: - Use internal/strconv for simd.String, remove internal/ftoa - prove.go is just copied from the one on the main branch. We have cherry-picked the changes to prove.go to main branch, so our copy is identical to an old version of the one on the main branch. There are CLs landed after our cherry-picks. Just copy it over to adopt the new code. Merge List: + 2025-11-13 57362e9814 go/types, types2: check for direct cycles as a separate phase + 2025-11-13 099e0027bd cmd/go/internal/modfetch: consolidate global vars + 2025-11-13 028375323f cmd/go/internal/modfetch/codehost: fix flaky TestReadZip + 2025-11-13 4ebf295b0b runtime: prefer to restart Ps on the same M after STW + 2025-11-13 625d8e9b9c runtime/pprof: fix goroutine leak profile tests for noopt + 2025-11-13 4684a26c26 spec: remove cycle restriction for type parameters + 2025-11-13 0f9c8fb29d cmd/asm,cmd/internal/obj/riscv: add support for riscv compressed instructions + 2025-11-13 a15d036ce2 cmd/internal/obj/riscv: implement better bit pattern encoding + 2025-11-12 abb241a789 cmd/internal/obj/loong64: add {,X}VS{ADD,SUB}.{B/H/W/V}{,U} instructions support + 2025-11-12 0929d21978 cmd/go: keep objects alive while stopping cleanups + 2025-11-12 f03d06ec1a runtime: fix list test memory management for mayMoreStack + 2025-11-12 48127f656b crypto/internal/fips140/sha3: remove outdated TODO + 2025-11-12 c3d1d42764 sync/atomic: amend comments for Value.{Swap,CompareAndSwap} + 2025-11-12 e0807ba470 cmd/compile: don't clear ptrmask in fillptrmask + 2025-11-12 66318d2b4b internal/abi: correctly describe result in Name.Name doc comment + 2025-11-12 34aef89366 cmd/compile: use 
FCLASSD for subnormal checks on riscv64 + 2025-11-12 0c28789bd7 net/url: disallow raw IPv6 addresses in host + 2025-11-12 4e761b9a18 cmd/compile: optimize liveness in stackalloc + 2025-11-12 956909ff84 crypto/x509: move BetterTLS suite from crypto/tls + 2025-11-12 6525f46707 cmd/link: change shdr and phdr from arrays to slices + 2025-11-12 d3aeba1670 runtime: switch p.gcFractionalMarkTime to atomic.Int64 + 2025-11-12 8873e8bea2 runtime,runtime/pprof: clean up goroutine leak profile writing + 2025-11-12 b8b84b789e cmd/go: clarify the -o testflag is only for copying the binary + 2025-11-12 c761b26b56 mime: parse media types that contain braces + 2025-11-12 65858a146e os/exec: include Cmd.Start in the list of methods that run Cmd + 2025-11-11 4bfc3a9d14 std,cmd: go fix -any std cmd + 2025-11-11 2263d4aabd runtime: doubly-linked sched.midle list + 2025-11-11 046dce0e54 runtime: use new list type for spanSPMCs + 2025-11-11 5f11275457 runtime: reusable intrusive doubly-linked list + 2025-11-11 951cf0501b internal/trace/testtrace: fix flag name typos + 2025-11-11 2750f95291 cmd/go: implement accurate pseudo-versions for Mercurial + 2025-11-11 b709a3e8b4 cmd/go/internal/vcweb: cache hg servers + 2025-11-11 426ef30ecf cmd/go: implement -reuse for Mercurial repos + 2025-11-10 5241d114f5 spec: more precise prose for special case of append + 2025-11-10 cdf64106f6 go/types, types2: first argument to append must never be be nil + 2025-11-10 a0eb4548cf .gitignore: ignore go test artifacts + 2025-11-10 bf58e7845e internal/trace: add "command" to convert text traces to raw + 2025-11-10 052c192a4c runtime: fix lock rank for work.spanSPMCs.lock + 2025-11-10 bc5ffe5c79 internal/runtime/sys,math/bits: eliminate bounds checks on len8tab + 2025-11-10 32f8d6486f runtime: document that tracefpunwindoff applies to some profilers + 2025-11-10 1c1c1942ba cmd/go: remove redundant AVX regex in security flag checks + 2025-11-10 3b3d6b9e5d cmd/internal/obj/arm64: shorten constant integer loads + 
2025-11-10 5f4b5f1a19 runtime/msan: use different msan routine for copying + 2025-11-10 0fe6c8e8c8 runtime: tweak wording for comment of mcache.flushGen + 2025-11-10 95a0e5adc1 sync: don't call Done when f() panics in WaitGroup.Go + 2025-11-08 e8ed85d6c2 cmd/go: update goSum if necessary + 2025-11-08 b76103c08e cmd/go: output missing GoDebug entries + 2025-11-07 47a63a331d cmd/go: rewrite hgrepo1 test repo to be deterministic + 2025-11-07 7995751d3a cmd/go: copy git reuse and support repos to hg + 2025-11-07 66c7ca7fb3 cmd/go: improve TestScript/reuse_git + 2025-11-07 de84ac55c6 cmd/link: clean up some comments to Go standards + 2025-11-07 5cd1b73772 runtime: correctly print panics before fatal-ing on defer + 2025-11-07 91ca80f970 runtime/cgo: improve error messages after pointer panic + 2025-11-07 d36e88f21f runtime: tweak wording for doc + 2025-11-06 ad3ccd92e4 cmd/link: move pclntab out of relro section + 2025-11-06 43b91e7abd iter: fix a tiny doc comment bug + 2025-11-06 48c7fa13c6 Revert "runtime: remove the pc field of _defer struct" + 2025-11-05 8111104a21 cmd/internal/obj/loong64: add {,X}VSHUF.{B/H/W/V} instructions support + 2025-11-05 2e2072561c cmd/internal/obj/loong64: add {,X}VEXTRINS.{B,H,W,V} instruction support + 2025-11-05 01c29d1f0b internal/chacha8rand: replace VORV with instruction VMOVQ on loong64 + 2025-11-05 f01a1841fd cmd/compile: fix error message on loong64 + 2025-11-05 8cf7a0b4c9 cmd/link: support weak binding on darwin + 2025-11-05 2dd7e94e16 cmd/go: use go.dev instead of golang.org in flag errors + 2025-11-05 28f1ad5782 cmd/go: fix TestScript/govcs + 2025-11-05 daa220a1c9 cmd/go: silence TLS handshake errors during test + 2025-11-05 3ae9e95002 cmd/go: fix TestCgoPkgConfig on darwin with pkg-config installed + 2025-11-05 a494a26bc2 cmd/go: fix TestScript/vet_flags + 2025-11-05 a8fb94969c cmd/go: fix TestScript/tool_build_as_needed + 2025-11-05 04f05219c4 cmd/cgo: skip escape checks if call site has no argument + 2025-11-04 9f3a108ee0 
os: ignore O_TRUNC errors on named pipes and terminal devices on Windows + 2025-11-04 0e1bd8b5f1 cmd/link, runtime: don't store text start in pcHeader + 2025-11-04 7347b54727 cmd/link: don't generate .gosymtab section + 2025-11-04 6914dd11c0 cmd/link: add and use new SymKind SFirstUnallocated + 2025-11-04 f5f14262d0 cmd/link: remove misleading comment + 2025-11-04 61de3a9dae cmd/link: remove unused SFILEPATH symbol kind + 2025-11-04 8e2bd267b5 cmd/link: add comments for SymKind values + 2025-11-04 16705b962e cmd/compile: faster liveness analysis in regalloc + 2025-11-04 a5fe6791d7 internal/syscall/windows: fix ReOpenFile sentinel error value + 2025-11-04 a7d174ccaa cmd/compile/internal/ssa: simplify riscv64 FCLASSD rewrite rules + 2025-11-04 856238615d runtime: amend doc for setPinned + 2025-11-04 c7ccbddf22 cmd/compile/internal/ssa: more aggressive on dead auto elim + 2025-11-04 75b2bb1d1a cmd/cgo: drop pre-1.18 support + 2025-11-04 dd839f1d00 internal/strconv: handle %f with fixedFtoa when possible + 2025-11-04 6e165b4d17 cmd/compile: implement Avg64u, Hmul64, Hmul64u for wasm + 2025-11-04 9f6590f333 encoding/pem: don't reslice in failure modes + 2025-11-03 34fec512ce internal/strconv: extract fixed-precision ftoa from ftoaryu.go + 2025-11-03 162ba6cc40 internal/strconv: add tests and benchmarks for ftoaFixed + 2025-11-03 9795c7ba22 internal/strconv: fix pow10 off-by-one in exponent result + 2025-11-03 ad5e941a45 cmd/internal/obj/loong64: using {xv,v}slli.d to perform copying between vector registers + 2025-11-03 dadbac0c9e cmd/internal/obj/loong64: add VPERMI.W, XVPERMI.{W,V,Q} instruction support + 2025-11-03 e2c6a2024c runtime: avoid append in printint, printuint + 2025-11-03 c93cc603cd runtime: allow Stack to traceback goroutines in syscall _Grunning window + 2025-11-03 b5353fd90a runtime: don't panic in castogscanstatus + 2025-11-03 43491f8d52 cmd/cgo: use the export'ed file/line in error messages + 2025-11-03 aa94fdf0cc cmd/go: link to go.dev/doc/godebug 
for removed GODEBUG settings + 2025-11-03 4d2b03d2fc crypto/tls: add BetterTLS test coverage + 2025-11-03 0c4444e13d cmd/internal/obj: support arm64 FMOVQ large offset encoding + 2025-11-03 85bec791a0 cmd/go/testdata/script: loosen list_empty_importpath for freebsd + 2025-11-03 17b57078ab internal/runtime/cgobench: add cgo callback benchmark + 2025-11-03 5f8fdb720c cmd/go: move functions to methods + 2025-11-03 0a95856b95 cmd/go: eliminate additional global variable + 2025-11-03 f93186fb44 cmd/go/internal/telemetrystats: count cgo usage + 2025-11-03 eaf28a27fd runtime: update outdated comments for deferprocStack + 2025-11-03 e12d8a90bf all: remove extra space in the comments + 2025-11-03 c5559344ac internal/profile: optimize Parse allocs + 2025-11-03 5132158ac2 bytes: add Buffer.Peek + 2025-11-03 361d51a6b5 runtime: remove the pc field of _defer struct + 2025-11-03 00ee1860ce crypto/internal/constanttime: expose intrinsics to the FIPS 140-3 packages + 2025-11-02 388c41c412 cmd/go: skip git sha256 tests if git < 2.29 + 2025-11-01 385dc33250 runtime: prevent time.Timer.Reset(0) from deadlocking testing/synctest tests + 2025-10-31 99b724f454 cmd/go: document purego convention + 2025-10-31 27937289dc runtime: avoid zeroing scavenged memory + 2025-10-30 89dee70484 runtime: prioritize panic output over racefini + 2025-10-30 8683bb846d runtime: optimistically CAS atomicstatus directly in enter/exitsyscall + 2025-10-30 5b8e850340 runtime: don't track scheduling latency for _Grunning <-> _Gsyscall + 2025-10-30 251814e580 runtime: document tracer invariants explicitly + 2025-10-30 7244e9221f runtime: eliminate _Psyscall + 2025-10-30 5ef19c0d0c strconv: delete divmod1e9 + 2025-10-30 d32b1f02c3 runtime: delete timediv + 2025-10-30 cbbd385cb8 strconv: remove arch-specific decision in formatBase10 + 2025-10-30 6aca04a73a reflect: correct internal docs for uncommonType + 2025-10-30 235b4e729d cmd/compile/internal/ssa: model right shift more precisely + 2025-10-30 d44db293f9 
go/token: fix a typo in a comment + 2025-10-30 cdc6b559ca strconv: remove hand-written divide on 32-bit systems + 2025-10-30 1e5bb416d8 cmd/compile: implement bits.Mul64 on 32-bit systems + 2025-10-30 38317c44e7 crypto/internal/fips140/aes: fix CTR generator + 2025-10-29 3be9a0e014 go/types, types: proceed with correct (invalid) type in case of a selector error + 2025-10-29 d2c5fa0814 strconv: remove &0xFF trick in formatBase10 + 2025-10-29 9bbda7c99d cmd/compile: make prove understand div, mod better + 2025-10-29 915c1839fe test/codegen: simplify asmcheck pattern matching + 2025-10-29 32ee3f3f73 runtime: tweak example code for gorecover + 2025-10-29 da3fb90b23 crypto/internal/fips140/bigmod: fix extendedGCD comment + 2025-10-29 9035f7aea5 runtime: use internal/strconv + 2025-10-29 49c1da474d internal/itoa, internal/runtime/strconv: delete + 2025-10-29 b2a346bbd1 strconv: move all but Quote to internal/strconv + 2025-10-28 041f564b3e internal/runtime/gc/scan: avoid memory destination on VPCOMPRESSQ + 2025-10-28 81afd3a59b cmd/compile: extend ppc64 MADDLD to match const ADDconst & MULLDconst + 2025-10-28 ea50d61b66 cmd/compile: name change isDirect -> isDirectAndComparable + 2025-10-28 bd4dc413cd cmd/compile: don't optimize away a panicing interface comparison + 2025-10-28 30c047d0d0 cmd/compile: extend loong MOV*idx rules to match ADDshiftLLV + 2025-10-28 46e5e2b09a runtime: define PanicBounds in funcdata.h + 2025-10-28 3da0356685 crypto/internal/fips140test: collect 300M entropy samples for ESV + 2025-10-28 d5953185d5 runtime: amend comments a bit + 2025-10-28 12c8d14d94 errors: document that the target of Is must be comparable + 2025-10-28 1f4d14e493 go/types, types2: pull up package-level object sort to a separate phase + 2025-10-28 b8aa1ee442 go/types, types2: reduce locks held at once in resolveUnderlying + 2025-10-28 24af441437 cmd/compile: rewrite proved multiplies by 0 or 1 into CondSelect + 2025-10-28 2d33a456c6 cmd/compile: move branchelim supported 
arches to Config + 2025-10-27 2c91c33e88 crypto/subtle,cmd/compile: add intrinsics for ConstantTimeSelect and *Eq + 2025-10-27 73d7635fae cmd/compile: add generic rules to remove bool → int → bool roundtrips + 2025-10-27 1662d55247 cmd/compile: do not Zext bools to 64bits in amd64 CMOV generation rules + 2025-10-27 b8468d8c4e cmd/compile: introduce bytesizeToConst to cleanup switches in prove + 2025-10-27 9e25c2f6de cmd/link: internal linking support for windows/arm64 + 2025-10-27 ff2ebf69c4 internal/runtime/gc/scan: correct size class size check + 2025-10-27 9a77aa4f08 cmd/compile: add position info to sccp debug messages + 2025-10-27 77dc138030 cmd/compile: teach prove about unsigned rounding-up divide + 2025-10-27 a0f33b2887 cmd/compile: change !l.nonzero() into l.maybezero() + 2025-10-27 5453b788fd cmd/compile: optimize Add64carry with unused carries into plain Add64 + 2025-10-27 2ce5aab79e cmd/compile: remove 68857 ModU flowLimit workaround in prove + 2025-10-27 a50de4bda7 cmd/compile: remove 68857 min & max flowLimit workaround in prove + 2025-10-27 53be78630a cmd/compile: use topo-sort in prove to correctly learn facts while walking once + 2025-10-27 dec2b4c83d runtime: avoid bound check in freebsd binuptime + 2025-10-27 916e682d51 cmd/internal/obj, cmd/asm: reclassify the offset of memory access operations on loong64 + 2025-10-27 2835b994fb cmd/go: remove global loader state variable + 2025-10-27 139f89226f cmd/go: use local state for telemetry + 2025-10-27 8239156571 cmd/go: use tagged switch + 2025-10-27 d741483a1f cmd/go: increase stmt threshold on amd64 + 2025-10-27 a6929cf4a7 cmd/go: removed unused code in toolchain.Exec + 2025-10-27 180c07e2c1 go/types, types2: clarify docs for resolveUnderlying + 2025-10-27 d8a32f3d4b go/types, types2: wrap Named.fromRHS into Named.rhs + 2025-10-27 b2af92270f go/types, types2: verify stateMask transitions in debug mode + 2025-10-27 92decdcbaa net/url: further speed up escape and unescape + 2025-10-27 5f4ec3541f 
runtime: remove unused cgoCheckUsingType function + 2025-10-27 189f2c08cc time: rewrite IsZero method to use wall and ext fields + 2025-10-27 f619b4a00d cmd/go: reorder parameters so that context is first + 2025-10-27 f527994c61 sync: update comments for Once.done + 2025-10-26 5dcaf9a01b runtime: add GOEXPERIMENT=runtimefree + 2025-10-26 d7a52f9369 cmd/compile: use MOV(D|F) with const for Const(64|32)F on riscv64 + 2025-10-26 6f04a92be3 internal/chacha8rand: provide vector implementation for riscv64 + 2025-10-26 54e3adc533 cmd/go: use local loader state in test + 2025-10-26 ca379b1c56 cmd/go: remove loaderstate dependency + 2025-10-26 83a44bde64 cmd/go: remove unused loader state + 2025-10-26 7e7cd9de68 cmd/go: remove temporary rf cleanup script + 2025-10-26 53ad68de4b cmd/compile: allow unaligned load/store on Wasm + 2025-10-25 12ec09f434 cmd/go: use local state object in work.runBuild and work.runInstall + 2025-10-24 643f80a11f runtime: add ppc and s390 to 32 build constraints for gccgo + 2025-10-24 0afbeb5102 runtime: add ppc and s390 to linux 32 bits syscall build constraints for gccgo + 2025-10-24 7b506d106f cmd/go: use local state object in `generate.runGenerate` + 2025-10-24 26a8a21d7f cmd/go: use local state object in `env.runEnv` + 2025-10-24 f2dd3d7e31 cmd/go: use local state object in `vet.runVet` + 2025-10-24 784700439a cmd/go: use local state object in pkg `workcmd` + 2025-10-24 69673e9be2 cmd/go: use local state object in `tool.runTool` + 2025-10-24 2e12c5db11 cmd/go: use local state object in `test.runTest` + 2025-10-24 fe345ff2ae cmd/go: use local state object in `modget.runGet` + 2025-10-24 d312e27e8b cmd/go: use local state object in pkg `modcmd` + 2025-10-24 ea9cf26aa1 cmd/go: use local state object in `list.runList` + 2025-10-24 9926e1124e cmd/go: use local state object in `bug.runBug` + 2025-10-24 2c4fd7b2cd cmd/go: use local state object in `run.runRun` + 2025-10-24 ade9f33e1f cmd/go: add loaderstate as field on `mvsReqs` + 2025-10-24 
ccf4192a31 cmd/go: make ImportMissingError work with local state + 2025-10-24 f5403f15f0 debug/pe: check for zdebug_gdb_scripts section in testDWARF + 2025-10-24 a26f860fa4 runtime: use 32-bit hash for maps on Wasm + 2025-10-24 747fe2efed encoding/json/v2: fix typo in documentation about errors.AsType + 2025-10-24 94f47fc03f cmd/link: remove pointless assignment in SetSymAlign + 2025-10-24 e6cff69051 crypto/x509: move constraint checking after chain building + 2025-10-24 f5f69a3de9 encoding/json/jsontext: avoid pinning application data in pools + 2025-10-24 a6a59f0762 encoding/json/v2: use slices.Sort directly + 2025-10-24 0d3dab9b1d crypto/x509: simplify candidate chain filtering + 2025-10-24 29046398bb cmd/go: refactor injection of modload.LoaderState + 2025-10-24 c18fa69e52 cmd/go: make ErrNoModRoot work with local state + 2025-10-24 296ecc918d cmd/go: add modload.State parameter to AllowedFunc + 2025-10-24 c445a61e52 cmd/go: add loaderstate as field on `QueryMatchesMainModulesError` + 2025-10-24 6ac40051d3 cmd/go: remove module loader state from ccompile + 2025-10-24 6a5a452528 cmd/go: inject vendor dir into builder struct + 2025-10-23 dfac972233 crypto/pbkdf2: add missing error return value in example + 2025-10-23 47bf8f073e unique: fix inconsistent panic prefix in canonmap cleanup path + 2025-10-23 03bd43e8bb go/types, types2: rename Named.resolve to unpack + 2025-10-23 9fcdc814b2 go/types, types2: rename loaded namedState to lazyLoaded + 2025-10-23 8401512a9b go/types, types2: rename complete namedState to hasMethods + 2025-10-23 cf826bfcb4 go/types, types2: set t.underlying exactly once in resolveUnderlying + 2025-10-23 c4e910895b net/url: speed up escape and unescape + 2025-10-23 3f6ac3a10f go/build: use slices.Equal + 2025-10-23 839da71f89 encoding/pem: properly calculate end indexes + 2025-10-23 39ed968832 cmd: update golang.org/x/arch for riscv64 disassembler + 2025-10-23 ca448191c9 all: replace Split in loops with more efficient SplitSeq + 2025-10-23 
107fcb70de internal/goroot: replace HasPrefix+TrimPrefix with CutPrefix + 2025-10-23 8378276d66 strconv: optimize int-to-decimal and use consistently + 2025-10-23 e5688d0bdd cmd/internal/obj/riscv: simplify validation and encoding of raw instructions + 2025-10-22 77fc27972a doc/next: improve new(expr) release note + 2025-10-22 d94a8c56ad runtime: cleanup pagetrace + 2025-10-22 02728a2846 crypto/internal/fips140test: add entropy SHA2-384 testing + 2025-10-22 f92e01c117 runtime/cgo: fix cgoCheckArg description + 2025-10-22 50586182ab runtime: use backoff and ISB instruction to reduce contention in (*lfstack).pop and (*spanSet).pop on arm64 + 2025-10-22 1ff59f3dd3 strconv: clean up powers-of-10 table, tests + 2025-10-22 7c9fa4d5e9 cmd/go: check if build output should overwrite files with renames + 2025-10-22 557b4d6e0f comment: change slice to string in function comment/help + 2025-10-22 d09a8c8ef4 go/types, types2: simplify locking in Named.resolveUnderlying + 2025-10-22 5a42af7f6c go/types, types2: in resolveUnderlying, only compute path when needed + 2025-10-22 4bdb55b5b8 go/types, types2: rename Named.under to Named.resolveUnderlying + 2025-10-21 29d43df8ab go/build, cmd/go: use ast.ParseDirective for go:embed + 2025-10-21 4e695dd634 go/ast: add ParseDirective for parsing directive comments + 2025-10-21 06e57e60a7 go/types, types2: only report version errors if new(expr) is ok otherwise + 2025-10-21 6c3d0d259f path/filepath: reword documentation for Rel + 2025-10-21 39fd61ddb0 go/types, types2: guard Named.underlying with Named.mu + 2025-10-21 4a0115c886 runtime,syscall: implement and use syscalln on darwin + 2025-10-21 261c561f5a all: gofmt -w + 2025-10-21 c9c78c06ef strconv: embed testdata in test + 2025-10-21 8f74f9daf4 sync: re-enable race even when panicking + 2025-10-21 8a6c64f4fe syscall: use rawSyscall6 to call ptrace in forkAndExecInChild + 2025-10-21 4620db72d2 runtime: use timer_settime64 on 32-bit Linux + 2025-10-21 b31dc77cea os: support deleting 
read-only files in RemoveAll on older Windows versions + 2025-10-21 46cc532900 cmd/compile/internal/ssa: fix typo in comment + 2025-10-21 2163a58021 crypto/internal/fips140/entropy: increase AllocsPerRun iterations + 2025-10-21 306eacbc11 cmd/go/testdata/script: disable list_empty_importpath test on Windows + 2025-10-21 a5a249d6a6 all: eliminate unnecessary type conversions + 2025-10-21 694182d77b cmd/internal/obj/ppc64: improve large prologue generation + 2025-10-21 b0dcb95542 cmd/compile: leave the horses alone + 2025-10-21 9a5a1202f4 runtime: clean dead architectures from go:build constraint + 2025-10-21 8539691d0c crypto/internal/fips140/entropy: move to crypto/internal/entropy/v1.0.0 + 2025-10-20 99cf4d671c runtime: save lasx and lsx registers in loong64 async preemption + 2025-10-20 79ae97fe9b runtime: make procyieldAsm no longer loop infinitely if passed 0 + 2025-10-20 f838faffe2 runtime: wrap procyield assembly and check for 0 + 2025-10-20 ee4d2c312d runtime/trace: dump test traces on validation failure + 2025-10-20 7b81a1e107 net/url: reduce allocs in Encode + 2025-10-20 e425176843 cmd/asm: fix typo in comment + 2025-10-20 dc9a3e2a65 runtime: fix generation skew with trace reentrancy + 2025-10-20 df33c17091 runtime: add _Gdeadextra status + 2025-10-20 7503856d40 cmd/go: inject loaderstate into matcher function + 2025-10-20 d57c3fd743 cmd/go: inject State parameter into `work.runInstall` + 2025-10-20 e94a5008f6 cmd/go: inject State parameter into `work.runBuild` + 2025-10-20 d9e6f95450 cmd/go: inject State parameter into `workcmd.runSync` + 2025-10-20 9769a61e64 cmd/go: inject State parameter into `modget.runGet` + 2025-10-20 f859799ccf cmd/go: inject State parameter into `modcmd.runVerify` + 2025-10-20 0f820aca29 cmd/go: inject State parameter into `modcmd.runVendor` + 2025-10-20 92aa3e9e98 cmd/go: inject State parameter into `modcmd.runInit` + 2025-10-20 e176dff41c cmd/go: inject State parameter into `modcmd.runDownload` + 2025-10-20 e7c66a58d5 cmd/go: 
inject State parameter into `toolchain.Select` + 2025-10-20 4dc3dd9a86 cmd/go: add loaderstate to Switcher + 2025-10-20 bcf7da1595 cmd/go: convert functions to methods + 2025-10-20 0d3044f965 cmd/go: make Reset work with any State instance + 2025-10-20 386d81151d cmd/go: make setState work with any State instance + 2025-10-20 a420aa221e cmd/go: inject State parameter into `tool.runTool` + 2025-10-20 441e7194a4 cmd/go: inject State parameter into `test.runTest` + 2025-10-20 35e8309be2 cmd/go: inject State parameter into `list.runList` + 2025-10-20 29a81624f7 cmd/go: inject state parameter into `fmtcmd.runFmt` + 2025-10-20 f7eaea02fd cmd/go: inject state parameter into `clean.runClean` + 2025-10-20 58a8fdb6cf cmd/go: inject State parameter into `bug.runBug` + 2025-10-20 8d0bef7ffe runtime: add linkname documentation and guidance + 2025-10-20 3e43f48cb6 encoding/asn1: use reflect.TypeAssert to improve performance + 2025-10-20 4ad5585c2c runtime: fix _rt0_ppc64x_lib on aix + 2025-10-17 a5f55a441e cmd/fix: add modernize and inline analyzers + 2025-10-17 80876f4b42 cmd/go/internal/vet: tweak help doc + 2025-10-17 b5aefe07e5 all: remove unnecessary loop variable copies in tests + 2025-10-17 5137c473b6 go/types, types2: remove references to under function in comments + 2025-10-17 dbbb1bfc91 all: correct name for comments + 2025-10-17 0983090171 encoding/pem: properly decode strange PEM data + 2025-10-17 36863d6194 runtime: unify riscv64 library entry point + 2025-10-16 0c14000f87 go/types, types2: remove under(Type) in favor of Type.Underlying() + 2025-10-16 1099436f1b go/types, types2: change and enforce lifecycle of Named.fromRHS and Named.underlying fields + 2025-10-16 41f5659347 go/types, types2: remove superfluous unalias call (minor cleanup) + 2025-10-16 e7351c03c8 runtime: use DC ZVA instead of its encoding in WORD in arm64 memclr + 2025-10-16 6cbe0920c4 cmd: update to x/tools@7d9453cc + 2025-10-15 45eee553e2 cmd/internal/obj: move ARM64RegisterExtension from 
cmd/asm/internal/arch + 2025-10-15 27f9a6705c runtime: increase repeat count for alloc test + 2025-10-15 b68cebd809 net/http/httptest: record failed ResponseWriter writes + 2025-10-15 f1fed742eb cmd: fix three printf problems reported by newest vet + 2025-10-15 0984dcd757 cmd/compile: fix an error in comments + 2025-10-15 31f82877e8 go/types, types2: fix misleading internal comment + 2025-10-15 6346349f56 cmd/compile: replace angle brackets with square + 2025-10-15 284379cdfc cmd/compile: remove rematerializable values from live set across calls + 2025-10-15 519ae514ab cmd/compile: eliminate bound check for slices of the same length + 2025-10-15 b5a29cca48 cmd/distpack: add fix tool to inventory + 2025-10-15 bb5eb51715 runtime/pprof: fix errors in pprof_test + 2025-10-15 5c9a26c7f8 cmd/compile: use arm64 neon in LoweredMemmove/LoweredMemmoveLoop + 2025-10-15 61d1ff61ad cmd/compile: use block starting position for phi line number + 2025-10-15 5b29875c8e cmd/go: inject State parameter into `run.runRun` + 2025-10-15 5113496805 runtime/pprof: skip flaky test TestProfilerStackDepth/heap for now + 2025-10-15 36086e85f8 cmd/go: create temporary cleanup script + 2025-10-14 7056c71d32 cmd/compile: disable use of new saturating float-to-int conversions + 2025-10-14 6d5b13793f Revert "cmd/compile: make 386 float-to-int conversions match amd64" + 2025-10-14 bb2a14252b Revert "runtime: adjust softfloat corner cases to match amd64/arm64" + 2025-10-14 3bc9d9fa83 Revert "cmd/compile: make wasm match other platforms for FP->int32/64 conversions" + 2025-10-14 ee5af46172 encoding/json: avoid misleading errors under goexperiment.jsonv2 + 2025-10-14 11d3d2f77d cmd/internal/obj/arm64: add support for PAC instructions + 2025-10-14 4dbf1a5a4c cmd/compile/internal/devirtualize: do not track assignments to non-PAUTO + 2025-10-14 0ddb5ed465 cmd/compile/internal/devirtualize: use FatalfAt instead of Fatalf where possible + 2025-10-14 0a239bcc99 Revert "net/url: disallow raw IPv6 addresses in 
host" + 2025-10-14 5a9ef44bc0 cmd/compile/internal/devirtualize: fix OCONVNOP assertion + 2025-10-14 3765758b96 go/types, types2: minor cleanup (remove TODO) + 2025-10-14 f6b9d56aff crypto/internal/fips140/entropy: fix benign race + 2025-10-14 60f6d2f623 crypto/internal/fips140/entropy: support SHA-384 sizes for ACVP tests + 2025-10-13 6fd8e88d07 encoding/json/v2: restrict presence of default options + 2025-10-13 1abc6b0204 go/types, types2: permit type cycles through type parameter lists + 2025-10-13 9fdd6904da strconv: add tests that Java once mishandled + 2025-10-13 9b8742f2e7 cmd/compile: don't depend on arch-dependent conversions in the compiler + 2025-10-13 0e64ee1286 encoding/json/v2: report EOF for top-level values in UnmarshalDecode + 2025-10-13 6bcd97d9f4 all: replace calls to errors.As with errors.AsType + 2025-10-11 1cd71689f2 crypto/x509: rework fix for CVE-2025-58187 + 2025-10-11 8aa1efa223 cmd/link: in TestFallocate, only check number of blocks on Darwin + 2025-10-10 b497a29d25 encoding/json: fix regression in quoted numbers under goexperiment.jsonv2 + 2025-10-10 48bb7a6114 cmd/compile: repair bisection behavior for float-to-unsigned conversion + 2025-10-10 e8a53538b4 runtime: fail TestGoroutineLeakProfile on data race + 2025-10-10 e3be2d1b2b net/url: disallow raw IPv6 addresses in host + 2025-10-10 aced4c79a2 net/http: strip request body headers on POST to GET redirects + 2025-10-10 584a89fe74 all: omit unnecessary reassignment + 2025-10-10 69e8279632 net/http: set cookie host to Request.Host when available + 2025-10-10 6f4c63ba63 cmd/go: unify "go fix" and "go vet" + 2025-10-10 955a5a0dc5 runtime: support arm64 Neon in async preemption + 2025-10-10 5368e77429 net/http: run TestRequestWriteTransport with fake time to avoid flakes + 2025-10-09 c53cb642de internal/buildcfg: enable greenteagc experiment for loong64 + 2025-10-09 954fdcc51a cmd/compile: declare no output register for loong64 LoweredAtomic{And,Or}32 ops + 2025-10-09 19a30ea3f2 
cmd/compile: call generated size-specialized malloc functions directly + 2025-10-09 80f3bb5516 reflect: remove timeout in TestChanOfGC + 2025-10-09 9db7e30bb4 net/url: allow IP-literals with IPv4-mapped IPv6 addresses + 2025-10-09 8d810286b3 cmd/compile: make wasm match other platforms for FP->int32/64 conversions + 2025-10-09 b9f3accdcf runtime: adjust softfloat corner cases to match amd64/arm64 + 2025-10-09 78d75b3799 cmd/compile: make 386 float-to-int conversions match amd64 + 2025-10-09 0e466a8d1d cmd/compile: modify float-to-[u]int so that amd64 and arm64 match + 2025-10-08 4837fbe414 net/http/httptest: check whether response bodies are allowed + 2025-10-08 ee163197a8 path/filepath: return cleaned path from Rel + 2025-10-08 de9da0de30 cmd/compile/internal/devirtualize: improve concrete type analysis + 2025-10-08 ae094a1397 crypto/internal/fips140test: make entropy file pair names match + 2025-10-08 941e5917c1 runtime: cleanup comments from asm_ppc64x.s improvements + 2025-10-08 d945600d06 cmd/gofmt: change -d to exit 1 if diffs exist + 2025-10-08 d4830c6130 cmd/internal/obj: fix Link.Diag printf errors + 2025-10-08 e1ca1de123 net/http: format pprof.go + 2025-10-08 e5d004c7a8 net/http: update HTTP/2 documentation to reference new config features + 2025-10-08 97fd6bdecc cmd/compile: fuse NaN checks with other comparisons + 2025-10-07 78b43037dc cmd/go: refactor usage of `workFilePath` + 2025-10-07 bb1ca7ae81 cmd/go, testing: add TB.ArtifactDir and -artifacts flag + 2025-10-07 1623927730 cmd/go: refactor usage of `requirements` + 2025-10-07 a1661e776f Revert "crypto/internal/fips140/subtle: add assembly implementation of xorBytes for mips64x" + 2025-10-07 cb81270113 Revert "crypto/internal/fips140/subtle: add assembly implementation of xorBytes for mipsx" + 2025-10-07 f2d0d05d28 cmd/go: refactor usage of `MainModules` + 2025-10-07 f7a68d3804 archive/tar: set a limit on the size of GNU sparse file 1.0 regions + 2025-10-07 463165699d net/mail: avoid quadratic 
behavior in mail address parsing + 2025-10-07 5ede095649 net/textproto: avoid quadratic complexity in Reader.ReadResponse + 2025-10-07 5ce8cd16f3 encoding/pem: make Decode complexity linear + 2025-10-07 f6f4e8b3ef net/url: enforce stricter parsing of bracketed IPv6 hostnames + 2025-10-07 7dd54e1fd7 runtime: make work.spanSPMCs.all doubly-linked + 2025-10-07 3ee761739b runtime: free spanQueue on P destroy + 2025-10-07 8709a41d5e encoding/asn1: prevent memory exhaustion when parsing using internal/saferio + 2025-10-07 9b9d02c5a0 net/http: add httpcookiemaxnum GODEBUG option to limit number of cookies parsed + 2025-10-07 3fc4c79fdb crypto/x509: improve domain name verification + 2025-10-07 6e4007e8cf crypto/x509: mitigate DoS vector when intermediate certificate contains DSA public key + 2025-10-07 6f7926589d cmd/go: refactor usage of `modRoots` + 2025-10-07 11d5484190 runtime: fix self-deadlock on sbrk platforms + 2025-10-07 2e52060084 cmd/go: refactor usage of `RootMode` + 2025-10-07 f86ddb54b5 cmd/go: refactor usage of `ForceUseModules` + 2025-10-07 c938051dd0 Revert "cmd/compile: redo arm64 LR/FP save and restore" + 2025-10-07 6469954203 runtime: assert p.destroy runs with GC not running + 2025-10-06 4c0fd3a2b4 internal/goexperiment: remove the synctest GOEXPERIMENT + 2025-10-06 c1e6e49d5d fmt: reduce Errorf("x") allocations to match errors.New("x") + 2025-10-06 7fbf54bfeb internal/buildcfg: enable greenteagc experiment by default + 2025-10-06 7bfeb43509 cmd/go: refactor usage of `initialized` + 2025-10-06 1d62e92567 test/codegen: make sure assignment results are used. 
+ 2025-10-06 4fca79833f runtime: delete redundant code in the page allocator + 2025-10-06 719dfcf8a8 cmd/compile: redo arm64 LR/FP save and restore + 2025-10-06 f3312124c2 runtime: remove batching from spanSPMC free + 2025-10-06 24416458c2 cmd/go: export type State + 2025-10-06 c2fb15164b testing/synctest: remove Run + 2025-10-06 ac2ec82172 runtime: bump thread count slack for TestReadMetricsSched + 2025-10-06 e74b224b7c crypto/tls: streamline BoGo testing w/ -bogo-local-dir + 2025-10-06 3a05e7b032 spec: close tag + 2025-10-03 2a71af11fc net/url: improve URL docs + 2025-10-03 ee5369b003 cmd/link: add LIBRARY statement only with -buildmode=cshared + 2025-10-03 1bca4c1673 cmd/compile: improve slicemask removal + 2025-10-03 38b26f29f1 cmd/compile: remove stores to unread parameters + 2025-10-03 003b5ce1bc cmd/compile: fix SIMD const rematerialization condition + 2025-10-03 d91148c7a8 cmd/compile: enhance prove to infer bounds in slice len/cap calculations + 2025-10-03 20c9377e47 cmd/compile: enhance the chunked indexing case to include reslicing + 2025-10-03 ad3db2562e cmd/compile: handle rematerialized op for incompatible reg constraint + 2025-10-03 18cd4a1fc7 cmd/compile: use the right type for spill slot + 2025-10-03 1caa95acfa cmd/compile: enhance prove to deal with double-offset IsInBounds checks + 2025-10-03 ec70d19023 cmd/compile: rewrite to elide Slicemask from len==c>0 slicing + 2025-10-03 10e7968849 cmd/compile: accounts rematerialize ops's output reginfo + 2025-10-03 ab043953cb cmd/compile: minor tweak for race detector + 2025-10-03 ebb72bef44 cmd/compile: don't treat devel compiler as a released compiler + 2025-10-03 c54dc1418b runtime: support valgrind (but not asan) in specialized malloc functions + 2025-10-03 a7917eed70 internal/buildcfg: enable specializedmalloc experiment + 2025-10-03 630799c6c9 crypto/tls: add flag to render HTML BoGo report Change-Id: I6bf904c523a77ee7d3dea9c8ae72292f8a5f2ba5
2025-10-30 cmd/compile: implement bits.Mul64 on 32-bit systems Russ Cox
This CL implements Mul64uhilo, Hmul64, Hmul64u, and Avg64u on 32-bit systems, with the effect that constant division of both int64s and uint64s can now be emitted directly in all cases, and also that bits.Mul64 can be intrinsified on 32-bit systems.

Previously, constant division of uint64s by values 0 ≤ c ≤ 0xFFFF was implemented as uint32 divisions by c and some fixup. After expanding those smaller constant divisions, the code for i/999 required:

  (386)  7 mul, 10 add, 2 sub, 3 rotate, 3 shift (104 bytes)
  (arm)  7 mul, 9 add, 3 sub, 2 shift (104 bytes)
  (mips) 7 mul, 10 add, 5 sub, 6 shift, 3 sgtu (176 bytes)

For that much code, we might as well use a full 64x64->128 multiply that can be used for all divisors, not just small ones. Having done that, the same i/999 now generates:

  (386)  4 mul, 9 add, 2 sub, 2 or, 6 shift (112 bytes)
  (arm)  4 mul, 8 add, 2 sub, 2 or, 3 shift (92 bytes)
  (mips) 4 mul, 11 add, 3 sub, 6 shift, 8 sgtu, 4 or (196 bytes)

The size increase on 386 is due to a few extra register spills. The size increase on mips is due to add-with-carry being hard.

The new approach is more general, letting us delete the old special case and guarantee that all int64 and uint64 divisions by constants are generated directly on 32-bit systems.

This especially speeds up code making heavy use of bits.Mul64 with a constant argument, which happens in strconv and various crypto packages. A few examples are benchmarked below.
pkg: cmd/compile/internal/test benchmark \ host local linux-amd64 s7 linux-386 s7:GOARCH=386 vs base vs base vs base vs base vs base DivconstI64 ~ ~ ~ -49.66% -21.02% ModconstI64 ~ ~ ~ -13.45% +14.52% DivisiblePow2constI64 ~ ~ ~ +0.97% -1.32% DivisibleconstI64 ~ ~ ~ -20.01% -48.28% DivisibleWDivconstI64 ~ ~ -1.76% -38.59% -42.74% DivconstU64/3 ~ ~ ~ -13.82% -4.09% DivconstU64/5 ~ ~ ~ -14.10% -3.54% DivconstU64/37 -2.07% -4.45% ~ -19.60% -9.55% DivconstU64/1234567 ~ ~ ~ -61.55% -56.93% ModconstU64 ~ ~ ~ -6.25% ~ DivisibleconstU64 ~ ~ ~ -2.78% -7.82% DivisibleWDivconstU64 ~ ~ ~ +4.23% +2.56% pkg: math/bits benchmark \ host s7 linux-amd64 linux-386 s7:GOARCH=386 vs base vs base vs base vs base Add ~ ~ ~ ~ Add32 +1.59% ~ ~ ~ Add64 ~ ~ ~ ~ Add64multiple ~ ~ ~ ~ Sub ~ ~ ~ ~ Sub32 ~ ~ ~ ~ Sub64 ~ ~ -9.20% ~ Sub64multiple ~ ~ ~ ~ Mul ~ ~ ~ ~ Mul32 ~ ~ ~ ~ Mul64 ~ ~ -41.58% -53.21% Div ~ ~ ~ ~ Div32 ~ ~ ~ ~ Div64 ~ ~ ~ ~ pkg: strconv benchmark \ host s7 linux-amd64 linux-386 s7:GOARCH=386 vs base vs base vs base vs base ParseInt/Pos/7bit ~ ~ -11.08% -6.75% ParseInt/Pos/26bit ~ ~ -13.65% -11.02% ParseInt/Pos/31bit ~ ~ -14.65% -9.71% ParseInt/Pos/56bit -1.80% ~ -17.97% -10.78% ParseInt/Pos/63bit ~ ~ -13.85% -9.63% ParseInt/Neg/7bit ~ ~ -12.14% -7.26% ParseInt/Neg/26bit ~ ~ -14.18% -9.81% ParseInt/Neg/31bit ~ ~ -14.51% -9.02% ParseInt/Neg/56bit ~ ~ -15.79% -9.79% ParseInt/Neg/63bit ~ ~ -15.68% -11.07% AppendFloat/Decimal ~ ~ -7.25% -12.26% AppendFloat/Float ~ ~ -15.96% -19.45% AppendFloat/Exp ~ ~ -13.96% -17.76% AppendFloat/NegExp ~ ~ -14.89% -20.27% AppendFloat/LongExp ~ ~ -12.68% -17.97% AppendFloat/Big ~ ~ -11.10% -16.64% AppendFloat/BinaryExp ~ ~ ~ ~ AppendFloat/32Integer ~ ~ -10.05% -10.91% AppendFloat/32ExactFraction ~ ~ -8.93% -13.00% AppendFloat/32Point ~ ~ -10.36% -14.89% AppendFloat/32Exp ~ ~ -9.88% -13.54% AppendFloat/32NegExp ~ ~ -10.16% -14.26% AppendFloat/32Shortest ~ ~ -11.39% -14.96% AppendFloat/32Fixed8Hard ~ ~ ~ -2.31% AppendFloat/32Fixed9Hard ~ ~ ~ -7.01% 
AppendFloat/64Fixed1 ~ ~ -2.83% -8.23% AppendFloat/64Fixed2 ~ ~ ~ -7.94% AppendFloat/64Fixed3 ~ ~ -4.07% -7.22% AppendFloat/64Fixed4 ~ ~ -7.24% -7.62% AppendFloat/64Fixed12 ~ ~ -6.57% -4.82% AppendFloat/64Fixed16 ~ ~ -4.00% -5.81% AppendFloat/64Fixed12Hard -2.22% ~ -4.07% -6.35% AppendFloat/64Fixed17Hard -2.12% ~ ~ -3.79% AppendFloat/64Fixed18Hard -1.89% ~ +2.48% ~ AppendFloat/Slowpath64 -1.85% ~ -14.49% -18.21% AppendFloat/SlowpathDenormal64 ~ ~ -13.08% -19.41% pkg: crypto/internal/fips140/nistec/fiat benchmark \ host s7 linux-amd64 linux-386 s7:GOARCH=386 vs base vs base vs base vs base Mul/P224 ~ ~ -29.95% -39.60% Mul/P384 ~ ~ -37.11% -63.33% Mul/P521 ~ ~ -26.62% -12.42% Square/P224 +1.46% ~ -40.62% -49.18% Square/P384 ~ ~ -45.51% -69.68% Square/P521 +90.37% ~ -25.26% -11.23% (The +90% is a separate problem and not real; that much variation can be seen on that system by running the same binary from two different files.) pkg: crypto/internal/fips140/edwards25519 benchmark \ host s7 linux-amd64 linux-386 s7:GOARCH=386 vs base vs base vs base vs base EncodingDecoding ~ ~ -34.67% -35.75% ScalarBaseMult ~ ~ -31.25% -30.29% ScalarMult ~ ~ -33.45% -32.54% VarTimeDoubleScalarBaseMult ~ ~ -33.78% -33.68% Change-Id: Id3c91d42cd01def6731b755e99f8f40c6ad1bb65 Reviewed-on: https://go-review.googlesource.com/c/go/+/716061 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2025-10-29cmd/compile: make prove understand div, mod betterRuss Cox
This CL introduces new divisible and divmod passes that rewrite divisibility checks and div, mod, and mul. These happen after prove, so that prove can make better sense of the code for deriving bounds, and they must run before decompose, so that 64-bit ops can be lowered to 32-bit ops on 32-bit systems. And then they need another generic pass as well, to optimize the generated code before decomposing. The three opt passes are "opt", "middle opt", and "late opt". (Perhaps instead they should be "generic", "opt", and "late opt"?) The "late opt" pass repeats the "middle opt" work on any new code that has been generated in the interim. There will not be new divs or mods, but there may be new muls. The x%c==0 rewrite rules are much simpler now, since they can match before divs have been rewritten. This has the effect of applying them more consistently and making the rewrite rules independent of the exact div rewrites. Prove is also now charged with marking signed div/mod as unsigned when the arguments call for it, allowing simpler code to be emitted in various cases. For example, t.Seconds()/2 and len(x)/2 are now recognized as unsigned, meaning they compile to a simple shift (unsigned division), avoiding the more complex fixup we need for signed values. https://gist.github.com/rsc/99d9d3bd99cde87b6a1a390e3d85aa32 shows a diff of 'go build -a -gcflags=-d=ssa/prove/debug=1 std' output before and after. "Proved Rsh64x64 shifts to zero" is replaced by the higher-level "Proved Div64 is unsigned" (the shift was in the signed expansion of div by constant), but otherwise prove is only finding more things to prove. 
One short example, in code that does x[i%len(x)]: < runtime/mfinal.go:131:34: Proved Rsh64x64 shifts to zero --- > runtime/mfinal.go:131:34: Proved Div64 is unsigned > runtime/mfinal.go:131:38: Proved IsInBounds A longer example: < crypto/internal/fips140/sha3/shake.go:28:30: Proved Rsh64x64 shifts to zero < crypto/internal/fips140/sha3/shake.go:38:27: Proved Rsh64x64 shifts to zero < crypto/internal/fips140/sha3/shake.go:53:46: Proved Rsh64x64 shifts to zero < crypto/internal/fips140/sha3/shake.go:55:46: Proved Rsh64x64 shifts to zero --- > crypto/internal/fips140/sha3/shake.go:28:30: Proved Div64 is unsigned > crypto/internal/fips140/sha3/shake.go:28:30: Proved IsInBounds > crypto/internal/fips140/sha3/shake.go:28:30: Proved IsSliceInBounds > crypto/internal/fips140/sha3/shake.go:38:27: Proved Div64 is unsigned > crypto/internal/fips140/sha3/shake.go:45:7: Proved IsSliceInBounds > crypto/internal/fips140/sha3/shake.go:46:4: Proved IsInBounds > crypto/internal/fips140/sha3/shake.go:53:46: Proved Div64 is unsigned > crypto/internal/fips140/sha3/shake.go:53:46: Proved IsInBounds > crypto/internal/fips140/sha3/shake.go:53:46: Proved IsSliceInBounds > crypto/internal/fips140/sha3/shake.go:55:46: Proved Div64 is unsigned > crypto/internal/fips140/sha3/shake.go:55:46: Proved IsInBounds > crypto/internal/fips140/sha3/shake.go:55:46: Proved IsSliceInBounds These diffs are due to the smaller opt being better and taking work away from prove: < image/jpeg/dct.go:307:5: Proved IsInBounds < image/jpeg/dct.go:308:5: Proved IsInBounds ... < image/jpeg/dct.go:442:5: Proved IsInBounds In the old opt, Mul by 8 was rewritten to Lsh by 3 early. This CL delays that rule to help prove recognize mods, but it also helps opt constant-fold the slice x[8*i:8*i+8:8*i+8]. 
Specifically, computing the length, opt can now do: (Sub64 (Add (Mul 8 i) 8) (Mul 8 i)) -> (Add 8 (Sub (Mul 8 i) (Mul 8 i))) -> (Add 8 (Mul 8 (Sub i i))) -> (Add 8 (Mul 8 0)) -> (Add 8 0) -> 8 The key step is (Sub (Mul x y) (Mul x z)) -> (Mul x (Sub y z)). Leaving the multiply as Mul enables using that step; the old rewrite to Lsh blocked it, leaving prove to figure out the length and then remove the bounds checks. But now opt can evaluate the length down to a constant 8 and then constant-fold away the bounds checks 0 < 8, 1 < 8, and so on. After that, the compiler has nothing left to prove. Benchmarks are noisy in general; I checked the assembly for the many large increases below, and the vast majority are unchanged and presumably hitting the caches differently in some way. The divisibility optimizations were not reliably triggering before. This leads to a very large improvement in some cases, like DivisiblePow2constI64, DivisibleconstI64 on 64-bit systems and DivisibleconstU64 on 32-bit systems. Another way the divisibility optimizations were unreliable before was incorrectly triggering for x/3, x%3 even though they are written not to do that. There is a real but small slowdown in the DivisibleWDivconst benchmarks on Mac because in the cases used in the benchmark, it is still faster (on Mac) to do the divisibility check than to remultiply. This may be worth further study. Perhaps when there is no rotate (meaning the divisor is odd), the divisibility optimization should be enabled always. In any event, this CL makes it possible to study that.
benchmark \ host s7 linux-amd64 mac linux-arm64 linux-ppc64le linux-386 s7:GOARCH=386 linux-arm vs base vs base vs base vs base vs base vs base vs base vs base LoadAdd ~ ~ ~ ~ ~ -1.59% ~ ~ ExtShift ~ ~ -42.14% +0.10% ~ +1.44% +5.66% +8.50% Modify ~ ~ ~ ~ ~ ~ ~ -1.53% MullImm ~ ~ ~ ~ ~ +37.90% -21.87% +3.05% ConstModify ~ ~ ~ ~ -49.14% ~ ~ ~ BitSet ~ ~ ~ ~ -15.86% -14.57% +6.44% +0.06% BitClear ~ ~ ~ ~ ~ +1.78% +3.50% +0.06% BitToggle ~ ~ ~ ~ ~ -16.09% +2.91% ~ BitSetConst ~ ~ ~ ~ ~ ~ ~ -0.49% BitClearConst ~ ~ ~ ~ -28.29% ~ ~ -0.40% BitToggleConst ~ ~ ~ +8.89% -31.19% ~ ~ -0.77% MulNeg ~ ~ ~ ~ ~ ~ ~ ~ Mul2Neg ~ ~ -4.83% ~ ~ -13.75% -5.92% ~ DivconstI64 ~ ~ ~ ~ ~ -30.12% ~ +0.50% ModconstI64 ~ ~ -9.94% -4.63% ~ +3.15% ~ +5.32% DivisiblePow2constI64 -34.49% -12.58% ~ ~ -12.25% ~ ~ ~ DivisibleconstI64 -24.69% -25.06% -0.40% -2.27% -42.61% -3.31% ~ +1.63% DivisibleWDivconstI64 ~ ~ ~ ~ ~ -17.55% ~ -0.60% DivconstU64/3 ~ ~ ~ ~ ~ +1.51% ~ ~ DivconstU64/5 ~ ~ ~ ~ ~ ~ ~ ~ DivconstU64/37 ~ ~ -0.18% ~ ~ +2.70% ~ ~ DivconstU64/1234567 ~ ~ ~ ~ ~ ~ ~ +0.12% ModconstU64 ~ ~ ~ -0.24% ~ -5.10% -1.07% -1.56% DivisibleconstU64 ~ ~ ~ ~ ~ -29.01% -59.13% -50.72% DivisibleWDivconstU64 ~ ~ -12.18% -18.88% ~ -5.50% -3.91% +5.17% DivconstI32 ~ ~ -0.48% ~ -34.69% +89.01% -6.01% -16.67% ModconstI32 ~ +2.95% -0.33% ~ ~ -2.98% -5.40% -8.30% DivisiblePow2constI32 ~ ~ ~ ~ ~ ~ ~ -16.22% DivisibleconstI32 ~ ~ ~ ~ ~ -37.27% -47.75% -25.03% DivisibleWDivconstI32 -11.59% +5.22% -12.99% -23.83% ~ +45.95% -7.03% -10.01% DivconstU32 ~ ~ ~ ~ ~ +74.71% +4.81% ~ ModconstU32 ~ ~ +0.53% +0.18% ~ +51.16% ~ ~ DivisibleconstU32 ~ ~ ~ -0.62% ~ -4.25% ~ ~ DivisibleWDivconstU32 -2.77% +5.56% +11.12% -5.15% ~ +48.70% +25.11% -4.07% DivconstI16 -6.06% ~ -0.33% +0.22% ~ ~ -9.68% +5.47% ModconstI16 ~ ~ +4.44% +2.82% ~ ~ ~ +5.06% DivisiblePow2constI16 ~ ~ ~ ~ ~ ~ ~ -0.17% DivisibleconstI16 ~ ~ -0.23% ~ ~ ~ +4.60% +6.64% DivisibleWDivconstI16 -1.44% -0.43% +13.48% -5.76% ~ +1.62% -23.15% -9.06% DivconstU16 +1.61% ~ 
-0.35% -0.47% ~ ~ +15.59% ~ ModconstU16 ~ ~ ~ ~ ~ -0.72% ~ +14.23% DivisibleconstU16 ~ ~ -0.05% +3.00% ~ ~ ~ +5.06% DivisibleWDivconstU16 +52.10% +0.75% +17.28% +4.79% ~ -37.39% +5.28% -9.06% DivconstI8 ~ ~ -0.34% -0.96% ~ ~ -9.20% ~ ModconstI8 +2.29% ~ +4.38% +2.96% ~ ~ ~ ~ DivisiblePow2constI8 ~ ~ ~ ~ ~ ~ ~ ~ DivisibleconstI8 ~ ~ ~ ~ ~ ~ +6.04% ~ DivisibleWDivconstI8 -26.44% +1.69% +17.03% +4.05% ~ +32.48% -24.90% ~ DivconstU8 -4.50% +14.06% -0.28% ~ ~ ~ +4.16% +0.88% ModconstU8 ~ ~ +25.84% -0.64% ~ ~ ~ ~ DivisibleconstU8 ~ ~ -5.70% ~ ~ ~ ~ ~ DivisibleWDivconstU8 +49.55% +9.07% ~ +4.03% +53.87% -40.03% +39.72% -3.01% Mul2 ~ ~ ~ ~ ~ ~ ~ ~ MulNeg2 ~ ~ ~ ~ -11.73% ~ ~ -0.02% EfaceInteger ~ ~ ~ ~ ~ +18.11% ~ +2.53% TypeAssert +33.90% +2.86% ~ ~ ~ -1.07% -5.29% -1.04% Div64UnsignedSmall ~ ~ ~ ~ ~ ~ ~ ~ Div64Small ~ ~ ~ ~ ~ -0.88% ~ +2.39% Div64SmallNegDivisor ~ ~ ~ ~ ~ ~ ~ +0.35% Div64SmallNegDividend ~ ~ ~ ~ ~ -0.84% ~ +3.57% Div64SmallNegBoth ~ ~ ~ ~ ~ -0.86% ~ +3.55% Div64Unsigned ~ ~ ~ ~ ~ ~ ~ -0.11% Div64 ~ ~ ~ ~ ~ ~ ~ +0.11% Div64NegDivisor ~ ~ ~ ~ ~ -1.29% ~ ~ Div64NegDividend ~ ~ ~ ~ ~ -1.44% ~ ~ Div64NegBoth ~ ~ ~ ~ ~ ~ ~ +0.28% Mod64UnsignedSmall ~ ~ ~ ~ ~ +0.48% ~ +0.93% Mod64Small ~ ~ ~ ~ ~ ~ ~ ~ Mod64SmallNegDivisor ~ ~ ~ ~ ~ ~ ~ +1.44% Mod64SmallNegDividend ~ ~ ~ ~ ~ +0.22% ~ +1.37% Mod64SmallNegBoth ~ ~ ~ ~ ~ ~ ~ -2.22% Mod64Unsigned ~ ~ ~ ~ ~ -0.95% ~ +0.11% Mod64 ~ ~ ~ ~ ~ ~ ~ ~ Mod64NegDivisor ~ ~ ~ ~ ~ ~ ~ -0.02% Mod64NegDividend ~ ~ ~ ~ ~ ~ ~ ~ Mod64NegBoth ~ ~ ~ ~ ~ ~ ~ -0.02% MulconstI32/3 ~ ~ ~ -25.00% ~ ~ ~ +47.37% MulconstI32/5 ~ ~ ~ +33.28% ~ ~ ~ +32.21% MulconstI32/12 ~ ~ ~ -2.13% ~ ~ ~ -0.02% MulconstI32/120 ~ ~ ~ +2.93% ~ ~ ~ -0.03% MulconstI32/-120 ~ ~ ~ -2.17% ~ ~ ~ -0.03% MulconstI32/65537 ~ ~ ~ ~ ~ ~ ~ +0.03% MulconstI32/65538 ~ ~ ~ ~ ~ -33.38% ~ +0.04% MulconstI64/3 ~ ~ ~ +33.35% ~ -0.37% ~ -0.13% MulconstI64/5 ~ ~ ~ -25.00% ~ -0.34% ~ ~ MulconstI64/12 ~ ~ ~ +2.13% ~ +11.62% ~ +2.30% MulconstI64/120 ~ ~ ~ -1.98% ~ ~ ~ ~ 
MulconstI64/-120 ~ ~ ~ +0.75% ~ ~ ~ ~ MulconstI64/65537 ~ ~ ~ ~ ~ +5.61% ~ ~ MulconstI64/65538 ~ ~ ~ ~ ~ +5.25% ~ ~ MulconstU32/3 ~ +0.81% ~ +33.39% ~ +77.92% ~ -32.31% MulconstU32/5 ~ ~ ~ -24.97% ~ +77.92% ~ -24.47% MulconstU32/12 ~ ~ ~ +2.06% ~ ~ ~ +0.03% MulconstU32/120 ~ ~ ~ -2.74% ~ ~ ~ +0.03% MulconstU32/65537 ~ ~ ~ ~ ~ ~ ~ +0.03% MulconstU32/65538 ~ ~ ~ ~ ~ -33.42% ~ -0.03% MulconstU64/3 ~ ~ ~ +33.33% ~ -0.28% ~ +1.22% MulconstU64/5 ~ ~ ~ -25.00% ~ ~ ~ -0.64% MulconstU64/12 ~ ~ ~ +2.30% ~ +11.59% ~ +0.14% MulconstU64/120 ~ ~ ~ -2.82% ~ ~ ~ +0.04% MulconstU64/65537 ~ +0.37% ~ ~ ~ +5.58% ~ ~ MulconstU64/65538 ~ ~ ~ ~ ~ +5.16% ~ ~ ShiftArithmeticRight ~ ~ ~ ~ ~ -10.81% ~ +0.31% Switch8Predictable +14.69% ~ ~ ~ ~ -24.85% ~ ~ Switch8Unpredictable ~ -0.58% -3.80% ~ ~ -11.78% ~ -0.79% Switch32Predictable -10.33% +17.89% ~ ~ ~ +5.76% ~ ~ Switch32Unpredictable -3.15% +1.19% +9.42% ~ ~ -10.30% -5.09% +0.44% SwitchStringPredictable +70.88% +20.48% ~ ~ ~ +2.39% ~ +0.31% SwitchStringUnpredictable ~ +3.91% -5.06% -0.98% ~ +0.61% +2.03% ~ SwitchTypePredictable +146.58% -1.10% ~ -12.45% ~ -0.46% -3.81% ~ SwitchTypeUnpredictable +0.46% -0.83% ~ +4.18% ~ +0.43% ~ +0.62% SwitchInterfaceTypePredictable -13.41% -10.13% +11.03% ~ ~ -4.38% ~ +0.75% SwitchInterfaceTypeUnpredictable -6.37% -2.14% ~ -3.21% ~ -4.20% ~ +1.08% Fixes #63110. Fixes #75954. Change-Id: I55a876f08c6c14f419ce1a8cbba2eaae6c6efbf0 Reviewed-on: https://go-review.googlesource.com/c/go/+/714160 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
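The prove commit above notes that a signed division such as len(x)/2 can compile to a plain shift once the dividend is known to be non-negative, while a general signed division needs a fixup. The sketch below shows the difference for division by 2. The function names are hypothetical, and the code mirrors the arithmetic only, not the exact SSA the compiler emits.

```go
package main

import "fmt"

// divSigned2 (hypothetical name) computes x/2 with Go's
// truncate-toward-zero semantics using only shifts and an add.
// An arithmetic right shift rounds toward negative infinity, so a
// negative x needs 1 added first; that is the "fixup".
func divSigned2(x int64) int64 {
	fix := int64(uint64(x) >> 63) // 1 if x is negative, else 0
	return (x + fix) >> 1
}

// divNonNeg2 (hypothetical name) is correct only when x >= 0, which is
// the fact that a pass like prove would have to establish first.
func divNonNeg2(x int64) int64 {
	return x >> 1
}

func main() {
	for _, x := range []int64{-7, -4, -1, 0, 1, 7, 8} {
		fmt.Printf("x=%d x/2=%d fixup=%d", x, x/2, divSigned2(x))
		if x >= 0 {
			fmt.Printf(" shift=%d", divNonNeg2(x))
		}
		fmt.Println()
	}
}
```

For a value such as a slice length, which can never be negative, the fixup term is always zero, so dropping it saves the extra shift and add.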
2025-10-28cmd/compile: name change isDirect -> isDirectAndComparableKeith Randall
The new name reflects that it now also restricts to comparable types. Follow-on to CL 713840. Change-Id: Idd975c3fd16fb51f55360f2fa0b89ab0bf1d00ed Reviewed-on: https://go-review.googlesource.com/c/go/+/715700 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Mateusz Poliwczak <mpoliwczak34@gmail.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-10-27cmd/compile: add generic rules to remove bool → int → bool roundtripsJorropo
Change-Id: I8b0a3b64c89fe167d304f901a5d38470f35400ab Reviewed-on: https://go-review.googlesource.com/c/go/+/715200 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@golang.org>
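The commit above has no description, so the following is only a guess at the kind of source pattern a bool → int → bool roundtrip refers to. The helper names are hypothetical, and the exact SSA shapes the rules match are not stated in the log.

```go
package main

import "fmt"

// b2i (hypothetical helper) converts a bool to 0 or 1. Go has no direct
// bool-to-int conversion, so code usually spells it as a branch.
func b2i(b bool) int {
	if b {
		return 1
	}
	return 0
}

// roundTrip converts to int and straight back to bool. Since b2i only
// ever yields 0 or 1, the result always equals b, so the whole
// roundtrip can in principle be replaced by b itself.
func roundTrip(b bool) bool {
	return b2i(b) != 0
}

func main() {
	for _, b := range []bool{false, true} {
		fmt.Println(b, roundTrip(b))
	}
}
```

Roundtrips like this tend to appear after inlining, when one function returns an int flag and its caller immediately compares that flag against zero.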
2025-10-27cmd/compile: optimize Add64carry with unused carries into plain Add64Jorropo
Change-Id: I8a63f567cfc574bb066ad6269eec6929760cb9c3 Reviewed-on: https://go-review.googlesource.com/c/go/+/656338 Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>
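As an illustration of the Add64carry commit above, the sketch below shows source code in which the carries are unused. It assumes that "unused carries" means a zero carry-in and a discarded carry-out; the commit message does not spell this out, and the function name is hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addLow (hypothetical name) calls bits.Add64 with a zero carry-in and
// ignores the carry-out. Under those two conditions the sum is exactly
// the wrapping 64-bit addition x + y.
func addLow(x, y uint64) uint64 {
	sum, _ := bits.Add64(x, y, 0)
	return sum
}

func main() {
	pairs := [][2]uint64{
		{1, 2},
		{0xFFFFFFFFFFFFFFFF, 1}, // wraps around to 0
		{1 << 63, 1 << 63},      // wraps around to 0
	}
	for _, p := range pairs {
		fmt.Println(addLow(p[0], p[1]) == p[0]+p[1])
	}
}
```

This shape shows up at the top limb of multi-word arithmetic, where the final carry-out is simply dropped.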
2025-10-13cmd/compile: don't depend on arch-dependent conversions in the compilerKeith Randall
Leave those constant foldings for runtime, similar to how we do it for NaN generation. These are the only instances I could find in cmd/compile/..., using objdump -d ../pkg/tool/darwin_arm64/compile| egrep "(fcvtz|>:)" | grep -B1 fcvt (There are instances in other places, like runtime and reflect, but I don't think those places would affect compiler output.) Change-Id: I4113fe4570115e4765825cf442cb1fde97cf2f27 Reviewed-on: https://go-review.googlesource.com/c/go/+/711281 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com>
2025-10-09cmd/compile: call generated size-specialized malloc functions directlyMichael Matloob
This change creates calls to size-specialized malloc functions instead of calls to newObject when we know the size of the allocation at compilation time. Most of it is a matter of calling the newObject function (which will create calls to the size-specialized functions) rather than the newObjectNonSpecialized function (which won't). In the newHeapaddr, small, non-pointer case, we'll create a non-specialized newObject and transform that into the appropriate size-specialized function when we produce the mallocgc in flushPendingHeapAllocations. We have to update some of the rewrites in generic.rules to also apply to the size-specialized functions when they apply to newObject. The messiest thing is we have to adjust the offset we use to save the memory profiler stack, because the depth of the call to profilealloc is two frames fewer in the size-specialized malloc functions compared to when newObject calls mallocgc. A bunch of tests have been adjusted to account for that. Change-Id: I6a6a6964c9037fb6719e392c4a498ed700b617d7 Reviewed-on: https://go-review.googlesource.com/c/go/+/707856 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Matloob <matloob@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>
2025-10-08cmd/compile: fuse NaN checks with other comparisonsMichael Munday
NaN checks can often be merged into other comparisons by inverting them. For example, `math.IsNaN(x) || x > 0` is equivalent to `!(x <= 0)`. goos: linux goarch: amd64 pkg: math cpu: 12th Gen Intel(R) Core(TM) i7-12700T │ sec/op │ sec/op vs base │ Acos 4.315n ± 0% 4.314n ± 0% ~ (p=0.642 n=10) Acosh 8.398n ± 0% 7.779n ± 0% -7.37% (p=0.000 n=10) Asin 4.203n ± 0% 4.211n ± 0% +0.20% (p=0.001 n=10) Asinh 10.150n ± 0% 9.562n ± 0% -5.79% (p=0.000 n=10) Atan 2.363n ± 0% 2.363n ± 0% ~ (p=0.801 n=10) Atanh 8.192n ± 2% 7.685n ± 0% -6.20% (p=0.000 n=10) Atan2 4.013n ± 0% 4.010n ± 0% ~ (p=0.073 n=10) Cbrt 4.858n ± 0% 4.755n ± 0% -2.12% (p=0.000 n=10) Cos 4.596n ± 0% 4.357n ± 0% -5.20% (p=0.000 n=10) Cosh 5.071n ± 0% 5.071n ± 0% ~ (p=0.585 n=10) Erf 2.802n ± 1% 2.788n ± 0% -0.54% (p=0.002 n=10) Erfc 3.087n ± 1% 3.071n ± 0% ~ (p=0.320 n=10) Erfinv 3.981n ± 0% 3.965n ± 0% -0.41% (p=0.000 n=10) Erfcinv 3.985n ± 0% 3.977n ± 0% -0.20% (p=0.000 n=10) ExpGo 8.721n ± 2% 8.252n ± 0% -5.38% (p=0.000 n=10) Expm1 4.378n ± 0% 4.228n ± 0% -3.43% (p=0.000 n=10) Exp2 8.313n ± 0% 7.855n ± 0% -5.52% (p=0.000 n=10) Exp2Go 8.498n ± 2% 7.921n ± 0% -6.79% (p=0.000 n=10) Mod 15.16n ± 4% 12.20n ± 1% -19.58% (p=0.000 n=10) Frexp 1.780n ± 2% 1.496n ± 0% -15.96% (p=0.000 n=10) Gamma 4.378n ± 1% 4.013n ± 0% -8.35% (p=0.000 n=10) HypotGo 2.655n ± 5% 2.427n ± 1% -8.57% (p=0.000 n=10) Ilogb 1.912n ± 5% 1.749n ± 0% -8.53% (p=0.000 n=10) J0 22.43n ± 9% 20.46n ± 0% -8.76% (p=0.000 n=10) J1 21.03n ± 4% 19.96n ± 0% -5.09% (p=0.000 n=10) Jn 45.40n ± 1% 42.59n ± 0% -6.20% (p=0.000 n=10) Ldexp 2.312n ± 1% 1.944n ± 0% -15.94% (p=0.000 n=10) Lgamma 4.617n ± 1% 4.584n ± 0% -0.73% (p=0.000 n=10) Log 4.226n ± 0% 4.213n ± 0% -0.31% (p=0.001 n=10) Logb 1.771n ± 0% 1.775n ± 0% ~ (p=0.097 n=10) Log1p 5.102n ± 2% 5.001n ± 0% -1.97% (p=0.000 n=10) Log10 4.407n ± 0% 4.408n ± 0% ~ (p=1.000 n=10) Log2 2.416n ± 1% 2.138n ± 0% -11.51% (p=0.000 n=10) Modf 1.669n ± 2% 1.611n ± 0% -3.50% (p=0.000 n=10) Nextafter32 2.186n ± 0% 2.185n ± 
0% ~ (p=0.051 n=10) Nextafter64 2.182n ± 0% 2.184n ± 0% +0.09% (p=0.016 n=10) PowInt 11.39n ± 6% 10.68n ± 2% -6.24% (p=0.000 n=10) PowFrac 26.60n ± 2% 26.12n ± 0% -1.80% (p=0.000 n=10) Pow10Pos 0.5067n ± 4% 0.5003n ± 1% -1.27% (p=0.001 n=10) Pow10Neg 0.8552n ± 0% 0.8552n ± 0% ~ (p=0.928 n=10) Round 1.181n ± 0% 1.182n ± 0% +0.08% (p=0.001 n=10) RoundToEven 1.709n ± 0% 1.710n ± 0% ~ (p=0.053 n=10) Remainder 12.54n ± 5% 11.99n ± 2% -4.46% (p=0.000 n=10) Sin 3.933n ± 5% 3.926n ± 0% -0.17% (p=0.000 n=10) Sincos 5.672n ± 0% 5.522n ± 0% -2.65% (p=0.000 n=10) Sinh 5.447n ± 1% 5.444n ± 0% -0.06% (p=0.029 n=10) Tan 4.061n ± 0% 4.058n ± 0% -0.07% (p=0.005 n=10) Tanh 5.599n ± 0% 5.595n ± 0% -0.06% (p=0.042 n=10) Y0 20.75n ± 5% 19.73n ± 1% -4.92% (p=0.000 n=10) Y1 20.87n ± 2% 19.78n ± 1% -5.20% (p=0.000 n=10) Yn 44.50n ± 2% 42.04n ± 2% -5.53% (p=0.000 n=10) geomean 4.989n 4.791n -3.96% goos: linux goarch: riscv64 pkg: math cpu: Spacemit(R) X60 │ sec/op │ sec/op vs base │ Acos 159.9n ± 0% 159.9n ± 0% ~ (p=0.269 n=10) Acosh 244.7n ± 0% 235.0n ± 0% -3.98% (p=0.000 n=10) Asin 159.9n ± 0% 159.9n ± 0% ~ (p=0.154 n=10) Asinh 270.8n ± 0% 261.1n ± 0% -3.60% (p=0.000 n=10) Atan 119.1n ± 0% 119.1n ± 0% ~ (p=0.347 n=10) Atanh 260.2n ± 0% 261.8n ± 4% ~ (p=0.459 n=10) Atan2 186.8n ± 0% 186.8n ± 0% ~ (p=0.487 n=10) Cbrt 203.5n ± 0% 198.2n ± 0% -2.60% (p=0.000 n=10) Ceil 31.82n ± 0% 31.81n ± 0% ~ (p=0.714 n=10) Copysign 4.894n ± 0% 4.893n ± 0% ~ (p=0.161 n=10) Cos 107.6n ± 0% 103.6n ± 0% -3.76% (p=0.000 n=10) Cosh 259.0n ± 0% 252.8n ± 0% -2.39% (p=0.000 n=10) Erf 133.7n ± 0% 133.7n ± 0% ~ (p=0.720 n=10) Erfc 137.9n ± 0% 137.8n ± 0% -0.04% (p=0.033 n=10) Erfinv 173.7n ± 0% 168.8n ± 0% -2.82% (p=0.000 n=10) Erfcinv 173.7n ± 0% 168.8n ± 0% -2.82% (p=0.000 n=10) Exp 215.3n ± 0% 208.1n ± 0% -3.34% (p=0.000 n=10) ExpGo 226.7n ± 0% 220.6n ± 0% -2.69% (p=0.000 n=10) Expm1 164.8n ± 0% 159.0n ± 0% -3.52% (p=0.000 n=10) Exp2 185.0n ± 0% 182.7n ± 0% -1.22% (p=0.000 n=10) Exp2Go 198.9n ± 0% 196.5n ± 0% 
-1.21% (p=0.000 n=10) Abs 4.894n ± 0% 4.893n ± 0% ~ (p=0.262 n=10) Dim 16.31n ± 0% 16.31n ± 0% ~ (p=1.000 n=10) Floor 31.81n ± 0% 31.81n ± 0% ~ (p=0.067 n=10) Max 26.11n ± 0% 26.10n ± 0% ~ (p=0.080 n=10) Min 26.10n ± 0% 26.10n ± 0% ~ (p=0.095 n=10) Mod 337.7n ± 0% 291.9n ± 0% -13.56% (p=0.000 n=10) Frexp 50.57n ± 0% 42.41n ± 0% -16.13% (p=0.000 n=10) Gamma 206.3n ± 0% 198.1n ± 0% -4.00% (p=0.000 n=10) Hypot 94.62n ± 0% 94.61n ± 0% ~ (p=0.437 n=10) HypotGo 109.3n ± 0% 109.3n ± 0% ~ (p=1.000 n=10) Ilogb 44.05n ± 0% 44.04n ± 0% -0.02% (p=0.025 n=10) J0 663.1n ± 0% 663.9n ± 0% +0.13% (p=0.002 n=10) J1 663.9n ± 0% 666.4n ± 0% +0.38% (p=0.000 n=10) Jn 1.404µ ± 0% 1.407µ ± 0% +0.21% (p=0.000 n=10) Ldexp 57.10n ± 0% 48.93n ± 0% -14.30% (p=0.000 n=10) Lgamma 185.1n ± 0% 187.6n ± 0% +1.32% (p=0.000 n=10) Log 182.7n ± 0% 170.1n ± 0% -6.87% (p=0.000 n=10) Logb 46.49n ± 0% 46.49n ± 0% ~ (p=0.675 n=10) Log1p 184.3n ± 0% 179.4n ± 0% -2.63% (p=0.000 n=10) Log10 184.3n ± 0% 171.2n ± 0% -7.08% (p=0.000 n=10) Log2 66.05n ± 0% 57.90n ± 0% -12.34% (p=0.000 n=10) Modf 34.25n ± 0% 34.24n ± 0% ~ (p=0.163 n=10) Nextafter32 49.33n ± 1% 48.93n ± 0% -0.81% (p=0.002 n=10) Nextafter64 43.64n ± 0% 43.23n ± 0% -0.93% (p=0.000 n=10) PowInt 267.6n ± 0% 251.2n ± 0% -6.11% (p=0.000 n=10) PowFrac 672.9n ± 0% 637.9n ± 0% -5.19% (p=0.000 n=10) Pow10Pos 13.87n ± 0% 13.87n ± 0% ~ (p=1.000 n=10) Pow10Neg 19.58n ± 62% 19.59n ± 62% ~ (p=0.355 n=10) Round 23.65n ± 0% 23.65n ± 0% ~ (p=1.000 n=10) RoundToEven 27.73n ± 0% 27.73n ± 0% ~ (p=0.635 n=10) Remainder 309.9n ± 0% 280.5n ± 0% -9.49% (p=0.000 n=10) Signbit 13.05n ± 0% 13.05n ± 0% ~ (p=1.000 n=10) ¹ Sin 120.7n ± 0% 120.7n ± 0% ~ (p=1.000 n=10) ¹ Sincos 148.4n ± 0% 143.5n ± 0% -3.30% (p=0.000 n=10) Sinh 275.6n ± 0% 267.5n ± 0% -2.94% (p=0.000 n=10) SqrtIndirect 3.262n ± 0% 3.262n ± 0% ~ (p=0.263 n=10) SqrtLatency 19.57n ± 0% 19.57n ± 0% ~ (p=0.582 n=10) SqrtIndirectLatency 19.57n ± 0% 19.57n ± 0% ~ (p=1.000 n=10) SqrtGoLatency 203.2n ± 0% 197.6n ± 0% -2.78% 
(p=0.000 n=10) SqrtPrime 4.952µ ± 0% 4.952µ ± 0% -0.01% (p=0.025 n=10) Tan 153.3n ± 0% 153.3n ± 0% ~ (p=1.000 n=10) Tanh 280.5n ± 0% 272.4n ± 0% -2.91% (p=0.000 n=10) Trunc 31.81n ± 0% 31.81n ± 0% ~ (p=1.000 n=10) Y0 680.1n ± 0% 664.8n ± 0% -2.25% (p=0.000 n=10) Y1 684.2n ± 0% 669.6n ± 0% -2.14% (p=0.000 n=10) Yn 1.444µ ± 0% 1.410µ ± 0% -2.35% (p=0.000 n=10) Float64bits 5.709n ± 0% 5.708n ± 0% ~ (p=0.573 n=10) Float64frombits 4.893n ± 0% 4.893n ± 0% ~ (p=0.734 n=10) Float32bits 12.23n ± 0% 12.23n ± 0% ~ (p=0.628 n=10) Float32frombits 4.893n ± 0% 4.893n ± 0% ~ (p=0.971 n=10) FMA 4.893n ± 0% 4.893n ± 0% ~ (p=0.736 n=10) geomean 88.96n 87.05n -2.15% ¹ all samples are equal Change-Id: I8db8ac7b7b3430b946b89e88dd6c1546804125c3 Reviewed-on: https://go-review.googlesource.com/c/go/+/697360 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Michael Munday <mikemndy@gmail.com>
2025-10-03cmd/compile: rewrite to elide Slicemask from len==c>0 slicingDavid Chase
This might have been something that prove could be educated into figuring out, but this also works, and it also helps prove downstream. Adjusted the prove test, because this change moved a message. Cherry-picked from the dev.simd branch. This CL is not necessarily SIMD specific. Apply early to reduce risk. Change-Id: I5eabe639eff5db9cd9766a6a8666fdb4973829cb Reviewed-on: https://go-review.googlesource.com/c/go/+/697715 Commit-Queue: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Bypass: David Chase <drchase@google.com> Reviewed-on: https://go-review.googlesource.com/c/go/+/708858 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Junyang Shao <shaojunyang@google.com>
2025-10-03[dev.simd] all: merge master (adce7f1) into dev.simdCherry Mui
Conflicts: - src/internal/goexperiment/flags.go - src/runtime/export_test.go Merge List: + 2025-10-03 adce7f196e cmd/link: support .def file with MSVC clang toolchain + 2025-10-03 d5b950399d cmd/cgo: fix unaligned arguments typedmemmove crash on iOS + 2025-10-02 53845004d6 net/http/httputil: deprecate ReverseProxy.Director + 2025-10-02 bbdff9e8e1 net/http: update bundled x/net/http2 and delete obsolete http2inTests + 2025-10-02 4008e07080 io/fs: move path name documentation up to the package doc comment + 2025-10-02 0e4e2e6832 runtime: skip TestGoroutineLeakProfile under mayMoreStackPreempt + 2025-10-02 f03c392295 runtime: fix aix/ppc64 library initialization + 2025-10-02 707454b41f cmd/go: update `go help mod edit` with the tool and ignore sections + 2025-10-02 8c68a1c1ab runtime,net/http/pprof: goroutine leak detection by using the garbage collector + 2025-10-02 84db201ae1 cmd/compile: propagate len([]T{}) to make builtin to allow stack allocation + 2025-10-02 5799c139a7 crypto/tls: rm marshalEncryptedClientHelloConfigList dead code + 2025-10-01 633dd1d475 encoding/json: fix Decoder.InputOffset regression in goexperiment.jsonv2 + 2025-10-01 8ad27fb656 doc/go_spec.html: update date + 2025-10-01 3f451f2c54 testing/synctest: fix inverted test failure message in TestContextAfterFunc + 2025-10-01 be0fed8a5f cmd/go/testdata/script/test_fuzz_fuzztime.txt: disable + 2025-09-30 eb1c7f6e69 runtime: move loong64 library entry point to os-agnostic file + 2025-09-30 c9257151e5 runtime: unify ppc64/ppc64le library entry point + 2025-09-30 4ff8a457db test/codegen: codify handling of floating point constants on arm64 + 2025-09-30 fcb893fc4b cmd/compile/internal/ssa: remove redundant "type:" prefix check + 2025-09-30 19cc1022ba mime: reduce allocs incurred by ParseMediaType + 2025-09-30 08afc50bea mime: extend "builtinTypes" to include a more complete list of common types + 2025-09-30 97da068774 cmd/compile: eliminate nil checks on .dict arg + 2025-09-30 300d9d2714 runtime: 
initialise debug settings much earlier in startup process + 2025-09-30 a846bb0aa5 errors: add AsType + 2025-09-30 7c8166d02d cmd/link/internal/arm64: support Mach-O ARM64_RELOC_SUBTRACTOR in internal linking + 2025-09-30 6e95748335 cmd/link/internal/arm64: support Mach-O ARM64_RELOC_POINTER_TO_GOT in internal linking + 2025-09-30 742f92063e cmd/compile, runtime: always enable Wasm signext and satconv features + 2025-09-30 db10db6be3 internal/poll: remove operation fields from FD + 2025-09-29 75c87df58e internal/poll: pass the I/O mode instead of an overlapped object in execIO + 2025-09-29 fc88e18b4a crypto/internal/fips140/entropy: add CPU jitter-based entropy source + 2025-09-29 db4fade759 crypto/internal/fips140/mlkem: make CAST conditional + 2025-09-29 db3cb3fd9a runtime: correct reference to getStackMap in comment + 2025-09-29 690fc2fb05 internal/poll: remove buf field from operation + 2025-09-29 eaf2345256 cmd/link: use a .def file to mark exported symbols on Windows + 2025-09-29 4b77733565 internal/syscall/windows: regenerate GetFileSizeEx + 2025-09-29 4e9006a716 crypto/tls: quote protocols in ALPN error message + 2025-09-29 047c2ab841 cmd/link: don't pass -Wl,-S on Solaris + 2025-09-29 ae8eba071b cmd/link: use correct length for pcln.cutab + 2025-09-29 fe3ba74b9e cmd/link: skip TestFlagW on platforms without DWARF symbol table + 2025-09-29 d42d56b764 encoding/xml: make use of reflect.TypeAssert + 2025-09-29 6d51f93257 runtime: jump instead of branch in netbsd/arm64 entry point + 2025-09-28 5500cbf0e4 debug/elf: prevent offset overflow + 2025-09-27 34e67623a8 all: fix typos + 2025-09-27 af6999e60d cmd/compile: implement jump table on loong64 + 2025-09-26 63cd912083 os/user: simplify go:build + 2025-09-26 53009b26dd runtime: use a smaller arena size on Wasm + 2025-09-26 3a5df9d2b2 net/http: add HTTP2Config.StrictMaxConcurrentRequests + 2025-09-26 16be34df02 net/http: add more tests of transport connection pool + 2025-09-26 3e4540b49d os/user: use getgrouplist 
on illumos && cgo + 2025-09-26 15fbe3480b internal/poll: simplify WriteMsg and ReadMsg on Windows + 2025-09-26 16ae11a9e1 runtime: move TestReadMetricsSched to testprog + 2025-09-26 459f3a3adc cmd/link: don't pass -Wl,-S on AIX + 2025-09-26 4631a2d3c6 cmd/link: skip TestFlagW on AIX + 2025-09-26 0f31d742cd cmd/compile: fix ICE with new(<untyped expr>) + 2025-09-26 7d7cd6e07b internal/poll: don't call SetFilePointerEx in Seek for overlapped handles + 2025-09-26 41cba31e66 mime/multipart: percent-encode CR and LF in header values to avoid CRLF injection + 2025-09-26 dd1d597c3a Revert "cmd/internal/obj/loong64: use the MOVVP instruction to optimize prologue" + 2025-09-26 45d6bc76af runtime: unify arm64 entry point code + 2025-09-25 fdea7da3e6 runtime: use common library entry point on windows amd64/386 + 2025-09-25 e8a4f508d1 lib/fips140: re-seal v1.0.0 + 2025-09-25 9b7a328089 crypto/internal/fips140: remove key import PCTs, make keygen PCTs fatal + 2025-09-25 7f9ab7203f crypto/internal/fips140: update frozen module version to "v1.0.0" + 2025-09-25 fb5719cbda crypto/internal/fips140/ecdsa: make TestingOnlyNewDRBG generic + 2025-09-25 56067e31f2 std: remove unused declarations Change-Id: Iecb28fd62c69fbed59da557f46d31bae55889e2c
2025-09-30cmd/compile: eliminate nil checks on .dict argJake Bailey
The first arg of a generic function is the dictionary. This dictionary is never nil, but it gets a nil check because the dict arg is treated as a slice during construction. cmp.Compare[go.shape.int] was: 00006 (+41) TESTB AX, (AX) 00007 (+52) CMPQ CX, BX 00008 (52) JGT 14 00009 (+55) JGE 12 00010 (+56) MOVL $1, AX 00011 (56) RET 00012 (+58) XORL AX, AX 00013 (58) RET 00014 (+53) MOVQ $-1, AX 00015 (53) RET Note how the function begins with a TESTB that loads the dict to perform the nil check. This CL eliminates that nil check. For most generic functions, this doesn't matter too much, but not infrequently are generic functions written which never actually use the dictionary (like cmp.Compare), so I suspect this might help in hot code to avoid repeatedly touching the dictionary in memory, and in cases where the generic function is not inlined (and thus the dict dropped). compilecmp shows these changes (deduped): cmp.Compare[go.shape.float64] 73 -> 72 (-1.37%) cmp.Compare[go.shape.int] 26 -> 24 (-7.69%) cmp.Compare[go.shape.int32] 25 -> 23 (-8.00%) cmp.Compare[go.shape.int64] 26 -> 24 (-7.69%) cmp.Compare[go.shape.string] 142 -> 141 (-0.70%) cmp.Compare[go.shape.uint16] 26 -> 24 (-7.69%) cmp.Compare[go.shape.uint] 26 -> 24 (-7.69%) cmp.Compare[go.shape.uint32] 25 -> 23 (-8.00%) cmp.Compare[go.shape.uint64] 26 -> 24 (-7.69%) cmp.Compare[go.shape.uint8] 25 -> 23 (-8.00%) cmp.Compare[go.shape.uintptr] 26 -> 24 (-7.69%) cmp.Less[go.shape.float64] 35 -> 34 (-2.86%) cmp.Less[go.shape.int32] 8 -> 6 (-25.00%) cmp.Less[go.shape.int64] 9 -> 7 (-22.22%) cmp.Less[go.shape.int] 9 -> 7 (-22.22%) cmp.Less[go.shape.string] 112 -> 110 (-1.79%) cmp.Less[go.shape.uint16] 9 -> 7 (-22.22%) cmp.Less[go.shape.uint32] 8 -> 6 (-25.00%) cmp.Less[go.shape.uint64] 9 -> 7 (-22.22%) internal/synctest.Associate[go.shape.struct 114 -> 113 (-0.88%) internal/trace.(*dataTable[go.shape.uint64,go.shape.string]).insert 805 -> 791 (-1.74%) internal/trace.(*dataTable[go.shape.uint64,go.shape.struct 858 -> 
852 (-0.70%) main.(*gState[go.shape.int64]).stop 2111 -> 2085 (-1.23%) main.(*gState[go.shape.int64]).unblock 941 -> 923 (-1.91%) runtime.fmax[go.shape.float32] 85 -> 83 (-2.35%) runtime.fmax[go.shape.float64] 89 -> 87 (-2.25%) runtime.fmin[go.shape.float32] 85 -> 83 (-2.35%) runtime.fmin[go.shape.float64] 89 -> 87 (-2.25%) slices.BinarySearch[go.shape.[]string,go.shape.string] 346 -> 337 (-2.60%) slices.Concat[go.shape.[]uint8,go.shape.uint8] 462 -> 453 (-1.95%) slices.ContainsFunc[go.shape.[]*cmd/vendor/github.com/google/pprof/profile.Sample,go.shape.*uint8] 170 -> 169 (-0.59%) slices.ContainsFunc[go.shape.[]*debug/dwarf.StructField,go.shape.*uint8] 170 -> 169 (-0.59%) slices.ContainsFunc[go.shape.[]*go/ast.Field,go.shape.*uint8] 170 -> 169 (-0.59%) slices.ContainsFunc[go.shape.[]string,go.shape.string] 186 -> 181 (-2.69%) slices.Contains[go.shape.[]*cmd/compile/internal/syntax.BranchStmt,go.shape.*cmd/compile/internal/syntax.BranchStmt] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]cmd/compile/internal/syntax.Type,go.shape.interface 223 -> 219 (-1.79%) slices.Contains[go.shape.[]crypto/tls.CurveID,go.shape.uint16] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]crypto/tls.SignatureScheme,go.shape.uint16] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]*go/ast.BranchStmt,go.shape.*go/ast.BranchStmt] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]go/types.Type,go.shape.interface 223 -> 219 (-1.79%) slices.Contains[go.shape.[]int,go.shape.int] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]string,go.shape.string] 223 -> 219 (-1.79%) slices.Contains[go.shape.[]uint16,go.shape.uint16] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]uint8,go.shape.uint8] 44 -> 42 (-4.55%) slices.Insert[go.shape.[]string,go.shape.string] 1189 -> 1170 (-1.60%) slices.medianCmpFunc[go.shape.struct 1118 -> 1113 (-0.45%) slices.medianCmpFunc[go.shape.struct 1214 -> 1209 (-0.41%) slices.medianCmpFunc[go.shape.struct 889 -> 887 (-0.22%) slices.medianCmpFunc[go.shape.struct 901 -> 874 (-3.00%) 
slices.order2Ordered[go.shape.float64] 89 -> 87 (-2.25%) slices.order2Ordered[go.shape.uint16] 75 -> 70 (-6.67%) slices.partialInsertionSortOrdered[go.shape.string] 1115 -> 1110 (-0.45%) slices.partialInsertionSortOrdered[go.shape.uint16] 358 -> 352 (-1.68%) slices.partitionEqualOrdered[go.shape.int] 208 -> 203 (-2.40%) slices.partitionEqualOrdered[go.shape.int32] 208 -> 198 (-4.81%) slices.partitionEqualOrdered[go.shape.int64] 208 -> 203 (-2.40%) slices.partitionEqualOrdered[go.shape.uint32] 208 -> 198 (-4.81%) slices.partitionEqualOrdered[go.shape.uint64] 208 -> 203 (-2.40%) slices.partitionOrdered[go.shape.float64] 538 -> 533 (-0.93%) slices.partitionOrdered[go.shape.int] 437 -> 427 (-2.29%) slices.partitionOrdered[go.shape.int64] 437 -> 427 (-2.29%) slices.partitionOrdered[go.shape.uint16] 447 -> 442 (-1.12%) slices.partitionOrdered[go.shape.uint64] 437 -> 427 (-2.29%) slices.rotateCmpFunc[go.shape.struct 1045 -> 1029 (-1.53%) slices.rotateCmpFunc[go.shape.struct 1205 -> 1163 (-3.49%) slices.rotateCmpFunc[go.shape.struct 1226 -> 1176 (-4.08%) slices.rotateCmpFunc[go.shape.struct 1322 -> 1272 (-3.78%) slices.rotateCmpFunc[go.shape.struct 1419 -> 1400 (-1.34%) slices.rotateCmpFunc[go.shape.*uint8] 549 -> 538 (-2.00%) slices.rotateLeft[go.shape.string] 603 -> 588 (-2.49%) slices.rotateLeft[go.shape.uint8] 255 -> 250 (-1.96%) slices.siftDownOrdered[go.shape.int] 181 -> 171 (-5.52%) slices.siftDownOrdered[go.shape.int32] 181 -> 171 (-5.52%) slices.siftDownOrdered[go.shape.int64] 181 -> 171 (-5.52%) slices.siftDownOrdered[go.shape.string] 614 -> 592 (-3.58%) slices.siftDownOrdered[go.shape.uint32] 181 -> 171 (-5.52%) slices.siftDownOrdered[go.shape.uint64] 181 -> 171 (-5.52%) time.parseRFC3339[go.shape.string] 1774 -> 1758 (-0.90%) unique.(*canonMap[go.shape.struct 280 -> 276 (-1.43%) unique.clone[go.shape.struct 311 -> 293 (-5.79%) weak.Make[go.shape.6880e4598856efac32416085c0172278cf0fb9e5050ce6518bd9b7f7d1662440] 136 -> 134 (-1.47%) weak.Make[go.shape.struct 136 
-> 134 (-1.47%) weak.Make[go.shape.uint8] 136 -> 134 (-1.47%) Change-Id: I43dcea5f2aa37372f773e5edc6a2ef1dee0a8db7 Reviewed-on: https://go-review.googlesource.com/c/go/+/706655 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org>
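As a rough illustration of the kind of call site the commit above is concerned with, the sketch below calls cmp.Compare from a loop. The helper name compareAll is hypothetical and not part of the CL; whether the dictionary nil check is actually elided depends on the compiler version and on whether the call is inlined.

```go
package main

import (
	"cmp"
	"fmt"
)

// compareAll is a hypothetical helper (not from the CL). It calls the
// generic function cmp.Compare once per pair. cmp.Compare never consults
// its dictionary argument, which is the situation the commit message
// describes.
func compareAll(pairs [][2]int) []int {
	out := make([]int, 0, len(pairs))
	for _, p := range pairs {
		out = append(out, cmp.Compare(p[0], p[1]))
	}
	return out
}

func main() {
	fmt.Println(compareAll([][2]int{{1, 2}, {2, 2}, {3, 2}}))
}
```

To see the generated code for a given toolchain, building with `go build -gcflags=-S` prints the assembly, which is one way to check whether the leading TESTB is still present.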
2025-09-11[dev.simd] all: merge master (cf5e993) into dev.simdCherry Mui
Merge List: + 2025-09-11 cf5e993177 cmd/link: allow one to specify the data section in the internal linker + 2025-09-11 cdb3d467fa encoding/gob: make use of reflect.TypeAssert + 2025-09-11 fef360964c archive/tar: fix typo in benchmark name + 2025-09-11 7d562b8460 syscall: actually remove unreachable code + 2025-09-11 c349582344 crypto/rsa: don't test CL 687836 against v1.0.0 FIPS 140-3 module + 2025-09-11 253dd08f5d debug/macho: filter non-external symbols when reading imported symbols without LC_DYSYMTAB + 2025-09-10 2009e6c596 internal/runtime/maps: remove redundant package docs + 2025-09-10 de5d7eccb9 runtime/internal/maps: only conditionally clear groups when sparse + 2025-09-10 8098b99547 internal/runtime/maps: speed up Clear + 2025-09-10 fe5420b054 cmd: delete some more windows/arm remnants + 2025-09-10 fad1dc608d runtime: don't artificially limit TestReadMetricsSched + 2025-09-10 b1f3e38e41 cmd/compile: when CSEing two values, prefer the statement marked one + 2025-09-10 00824f5ff5 types2: better documentation for resolve() + 2025-09-10 5cf8ca42e3 internal/trace/raw: use strings.Cut instead of strings.SplitN 2 + 2025-09-10 80a2aae922 Revert "cmd/compile: improve stp merging for non-sequent cases" + 2025-09-10 f327a05419 go/token, syscall: annotate if blocks that defeat vet's unreachable pass + 2025-09-10 9650c97d0f syscall: remove unreachable code + 2025-09-10 f1c4b860d4 Revert "crypto/internal/fips140: update frozen module version to "v1.0.0"" + 2025-09-10 30686c4cc8 encoding/json/v2: document context annotation with SemanticError + 2025-09-09 c5737dc21b runtime: when using cgo on 386, call C sigaction function + 2025-09-09 b9a4a09b0f runtime: remove duff support for riscv64 + 2025-09-09 4dac9e093f cmd/compile: use generated loops instead of DUFFCOPY on riscv64 + 2025-09-09 879ff736d3 cmd/compile: use generated loops instead of DUFFZERO on riscv64 + 2025-09-09 77643dc63f cmd/compile: simplify zerorange on riscv64 + 2025-09-09 e6605a1bcc encoding/json: use 
reflect.TypeAssert + 2025-09-09 4c20f7f15a cmd/cgo: run gcc to get errors and debug info in parallel + 2025-09-09 5dcedd6550 runtime: lock mheap_.speciallock when allocating synctest specials + 2025-09-09 d3be949ada runtime: don't negate eventfd errno + 2025-09-09 836fa74518 syscall: optimise cgo clearenv + 2025-09-09 ce39174482 crypto/rsa: check PrivateKey.D for consistency with Dp and Dq + 2025-09-09 5d9d0513dc crypto/rsa: check for post-Precompute changes in Validate + 2025-09-09 968a5107a9 crypto/internal/fips140: update frozen module version to "v1.0.0" + 2025-09-09 645ee44492 crypto/ecdsa: deprecate direct use of big.Int fields in keys + 2025-09-09 a67977da5e cmd/compile/internal/inline: ignore superfluous slicing + 2025-09-09 a5fa5ea51c cmd/compile/internal/ssa: expand runtime.memequal for length {3,5,6,7} + 2025-09-09 4c63d798cb cmd/compile: improve stp merging for non-sequent cases + 2025-09-09 bdd51e7855 cmd/compile: use constant zero register instead of specialized zero instructions on mips64x + 2025-09-09 10ac80de77 cmd/compile: introduce CCMP generation + 2025-09-09 3b3b16957c Revert "cmd/go: use os.Rename to move files on Windows" + 2025-09-09 e3223518b8 cmd/go: split generating cover files into its own action + 2025-09-09 af03343f93 cmd/compile: fix bounds check report + 2025-09-08 6447ff409a cmd/compile: fold constant in ADDshift op on loong64 + 2025-09-08 5b218461f9 cmd/compile: optimize loads from abi.Type.{Size_,PtrBytes,Kind_} + 2025-09-08 b915e14490 cmd/compile: consolidate logic for rewriting fixed loads + 2025-09-08 06e791c0cd cmd/compile: simplify zerorange on mips + 2025-09-08 cf42b785b7 cmd/cgo: run recordTypes for each of the debugs at the end of Translate + 2025-09-08 5e6296f3f8 archive/tar: optimize nanosecond parsing in parsePAXTime + 2025-09-08 ea00650784 debug/pe: permit symbols with no name + 2025-09-08 4cc7cc74c3 crypto: update Hash comments to point to crypto/sha3 + 2025-09-08 ff45d5d53c encoding/json/internal/jsonflags: fix 
comment with wrong field name + 2025-09-06 861c90c907 net/http: pool transport gzip readers + 2025-09-06 57769b5532 os: reject OpenDir of a non-directory file in Plan 9 + 2025-09-06 a6144613d3 crypto/tls: use context.AfterFunc in handshakeContext + 2025-09-05 e8126bce9e runtime/cgo: save and restore R31 for crosscall1 on loong64 + 2025-09-05 d767064170 cmd/compile: mark abi.PtrType.Elem sym as used + 2025-09-05 0b1eed09a3 vendor/golang.org/x/tools: update to a09a2fb + 2025-09-05 f5b20689e9 cmd/compile: optimize loads from readonly globals into constants on loong64 + 2025-09-05 3492e4262b cmd/compile: simplify specific addition operations using the ADDV16 instruction + 2025-09-05 459b85ccaa cmd/fix: remove all functionality except for buildtag + 2025-09-05 87e72769fa runtime: simplify openbsd check in usesLibcall and mStackIsSystemAllocated + 2025-09-05 bb48272e24 cmd/compile: simplify zerorange on mips64 + 2025-09-05 d52a56cce1 cmd/link/internal/ld: unconditionally use posix_fallocate on FreeBSD + 2025-09-04 9d0829963c net/http: fix cookie value of "" being interpreted as empty string. 
+ 2025-09-04 ddce0522be cmd/internal/obj/loong64: add ADDU16I.D instruction support + 2025-09-04 00b8474e47 cmd/trace: don't filter events for profile by whether they have stack + 2025-09-04 e36c5aead6 log/slog: add multiple handlers support for logger + 2025-09-04 150fae714e crypto/x509: don't force system roots load in SetFallbackRoots + 2025-09-04 4f7bbc62c7 runtime, cmd/compile, cmd/internal/obj: remove duff support for loong64 + 2025-09-04 b8cc907425 cmd/internal/obj/loong64: fix the usage of offset in the instructions [X]VLDREPL.{B/H/W/D} + 2025-09-04 8c27a80890 path{,/filepath}: speed up Match + 2025-09-04 b7c20413c5 runtime: remove obsolete osArchInit function + 2025-09-04 df29038486 cmd/compile/internal/ssa: load constant values from abi.PtrType.Elem + 2025-09-04 4373754bc9 cmd/compile: add store to load forwarding rules on riscv64 + 2025-09-03 80038586ed cmd/compile: export to DWARF types only referenced through interfaces + 2025-09-03 91e76a513b cmd/compile: use generated loops instead of DUFFCOPY on loong64 + 2025-09-03 c552ad913f cmd/compile: simplify memory load and store operations on loong64 + 2025-09-03 e8f9127d1f net/netip: export Prefix.Compare, fix ordering + 2025-09-03 731e546166 cmd/compile: simplify the support for 32bit high multiply on loong64 Change-Id: I2c124fb8071e2972d39804867cafb6806e601aba
2025-09-09cmd/compile/internal/ssa: expand runtime.memequal for length {3,5,6,7}Youlin Feng
This CL slightly speeds up strings.HasPrefix when testing constant prefixes of length {3,5,6,7}.

goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
                  │     old      │                 new                 │
                  │    sec/op    │   sec/op     vs base                │
StringPrefix3-8     11.125n ± 2%   8.539n ± 1%  -23.25% (p=0.000 n=20)
StringPrefix5-8     11.170n ± 2%   8.700n ± 1%  -22.11% (p=0.000 n=20)
StringPrefix6-8     11.190n ± 2%   8.655n ± 1%  -22.65% (p=0.000 n=20)
StringPrefix7-8     11.095n ± 1%   8.878n ± 1%  -19.98% (p=0.000 n=20)

Change-Id: I510a80d59cf78680b57d68780d35d212d24030e2
Reviewed-on: https://go-review.googlesource.com/c/go/+/700816
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
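For context, the source pattern this commit targets looks roughly like the sketch below: strings.HasPrefix called with constant prefixes of length 3, 5, 6, and 7. The function name scheme and the prefixes are illustrative assumptions, not taken from the CL or its benchmarks, and whether the memequal call is expanded inline depends on the toolchain and target architecture.

```go
package main

import (
	"fmt"
	"strings"
)

// scheme is a hypothetical classifier. Each case compares against a
// constant prefix whose length is one of the sizes named in the commit.
func scheme(url string) string {
	switch {
	case strings.HasPrefix(url, "ftp"): // length 3
		return "ftp"
	case strings.HasPrefix(url, "https"): // length 5
		return "https"
	case strings.HasPrefix(url, "ssh://"): // length 6
		return "ssh"
	case strings.HasPrefix(url, "http://"): // length 7
		return "http"
	}
	return "unknown"
}

func main() {
	fmt.Println(scheme("https://go.dev"))
}
```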
2025-09-08cmd/compile: consolidate logic for rewriting fixed loadsJake Bailey
Many CLs have worked with this bit of code, extending the cases more and more for various fixed addresses and constants. But I find that it's getting duplicative, and the current setup does not make it clear that something like isFixed32 _only_ works for a specific element within the type data. This CL rewrites these rules (pun unintended) into a single set of rewrite rules with shared logic, which stops hardcoding offsets and type compatibility checks. This should open the door to optimizing further type:... field loads, most of which can be done entirely statically, although today that is only done for Hash and Elem. Passes toolstash -cmp. Change-Id: I754138ce1785c6036eada9ed53f0ce2ad2a58b63 Reviewed-on: https://go-review.googlesource.com/c/go/+/701297 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Freeman <markfreeman@google.com> Reviewed-by: Florian Lehner <lehner.florian86@gmail.com>
2025-09-05cmd/compile: mark abi.PtrType.Elem sym as usedJake Bailey
CL 700336 let the compiler see into the abi.PtrType.Elem field, but forgot the MarkTypeSymUsedInInterface call that ensures the symbol is marked as referenced. I am not sure how to write a test for this, but I noticed it while working on further optimizations, where I "fixed" this issue and then confusingly failed toolstash -cmp, with diffs like:

@@ -70582,6 +70582,7 @@ reflect.groupAndSlotOf<1> STEXT size=696 args=0x20 locals=0x1e0 funcid=0x0 align
 rel 3+0 t=R_USEIFACE type:*reflect.rtype<0>+0
 rel 3+0 t=R_USEIFACE type:*reflect.rtype<0>+0
 rel 3+0 t=R_USEIFACE type:*uint64<0>+0
+rel 3+0 t=R_USEIFACE type:uint64<0>+0
 rel 71+0 t=R_CALLIND +0
 rel 92+4 t=R_PCREL go:itab.*reflect.rtype,reflect.Type<0>+0
 rel 114+4 t=R_CALL reflect.(*rtype).ptrTo<1>+0

Updates #75203

Change-Id: Ib8de8a32aeb8a7ea6fcf5d728a2e4944ef227ab2
Reviewed-on: https://go-review.googlesource.com/c/go/+/701296
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-09-04cmd/compile/internal/ssa: load constant values from abi.PtrType.ElemYoulin Feng
This CL makes the generated code for reflect.TypeFor as simple as an intrinsic function. Fixes #75203 Change-Id: I7bb48787101f07e77ab5c583292e834c28a028d6 Reviewed-on: https://go-review.googlesource.com/c/go/+/700336 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Keith Randall <khr@golang.org>
2025-09-03[dev.simd] all: merge master (4c4cefc) into dev.simdCherry Mui
Merge List: + 2025-09-03 4c4cefc19a cmd/gofmt: simplify logic to process arguments + 2025-09-03 925a3cdcd1 unicode/utf8: make DecodeRune{,InString} inlineable + 2025-09-03 3e596d448f math: rename Modf parameter int to integer + 2025-09-02 2a7f1d47b0 runtime: use one more address bit for tagged pointers + 2025-09-02 b09068041a cmd/dist: run racebench tests only in longtest mode + 2025-09-02 355370ac52 runtime: add comment for concatstring2 + 2025-09-02 1eec830f54 go/doc: linkify interface methods + 2025-08-31 7bba745820 cmd/compile: use generated loops instead of DUFFZERO on loong64 + 2025-08-31 882335e2cb cmd/internal/obj/loong64: add LDPTR.{W/D} and STPTR.{W/D} instructions support + 2025-08-31 d4b17f5869 internal/runtime/atomic: reset wrong jump target in Cas{,64} on loong64 + 2025-08-31 6a08e80399 net/http: skip redirecting in ServeMux when URL path for CONNECT is empty + 2025-08-29 8bcda6c79d runtime/race: add race detector support for linux/riscv64 + 2025-08-29 8377adafc5 cmd/cgo: split loadDWARF into two parts + 2025-08-29 a7d9d5a80a cmd/cgo: move typedefs and typedefList out of Package + 2025-08-29 1d459c4357 all: delete more windows/arm remnants + 2025-08-29 27ce6e4e26 cmd/compile: remove sign extension before MULW on riscv64 + 2025-08-29 84b070bfb1 cmd/compile/internal/ssa: make oneBit function generic + 2025-08-29 fe42628dae internal/cpu: inline DebugOptions + 2025-08-29 94b7d519bd net: update document on limitation of iprawsock on Windows + 2025-08-29 ba9e1ddccf testing: allow specify temp dir by GOTMPDIR environment variable + 2025-08-29 9f6936b8da cmd/link: disallow linkname of runtime.addmoduledata + 2025-08-29 89d41d254a bytes, strings: speed up TrimSpace + 2025-08-29 38204e0872 testing/synctest: call out common issues with tests + 2025-08-29 252c901125 os,syscall: pass file flags to CreateFile on Windows + 2025-08-29 53515fb0a9 crypto/tls: use hash.Cloner + 2025-08-28 13bb48e6fb go/constant: fix complex != unknown comparison + 2025-08-28 ba1109feb5 
net: remove redundant cgoLookupCNAME return parameter + 2025-08-28 f74ed44ed9 net/http/httputil: remove redundant pw.Close() call in DumpRequestOut + 2025-08-28 a9689d2e0b time: skip TestLongAdjustTimers in short mode on single CPU systems + 2025-08-28 ebc763f76d syscall: only get parent PID if SysProcAttr.Pdeathsig is set + 2025-08-28 7f1864b0a8 strings: remove redundant "runs" from string.Fields docstring + 2025-08-28 90c21fa5b6 net/textproto: eliminate some bounds checks + 2025-08-27 e47d88beae os: return nil slice when ReadDir is used with a file on file_windows + 2025-08-27 6b837a64db cmd/internal/obj/loong64: simplify buildop + 2025-08-27 765905e3bd debug/elf: don't panic if symtab too small + 2025-08-27 2ee4b31242 net/http: Ensure that CONNECT proxied requests respect MaxResponseHeaderBytes + 2025-08-27 b21867b1a2 net/http: require exact match for CrossSiteProtection bypass patterns + 2025-08-27 d19e377f6e cmd/cgo: make it safe to run gcc in parallel + 2025-08-27 49a2f3ed87 net: allow zero value destination address in WriteMsgUDPAddrPort + 2025-08-26 afc51ed007 internall/poll: remove bufs field from Windows' poll.operation + 2025-08-26 801b74eb95 internal/poll: remove rsa field from Windows' poll.operation + 2025-08-26 fa18c547cd syscall: sort Windows env block in StartProcess + 2025-08-26 bfd130db02 internal/poll: don't use stack-allocated WSAMsg parameters + 2025-08-26 dae9e456ae runtime: identify virtual memory layout for riscv64 + 2025-08-25 25c2d4109f math: use Trunc to implement Modf + 2025-08-25 4e05a070c4 math: implement IsInf using Abs + 2025-08-25 1eed4f32a0 math: optimize Signbit implementation slightly + 2025-08-25 bd71b94659 cmd/compile/internal: optimizing add+sll rule using ALSLV instruction on loong64 + 2025-08-25 ea55ca3600 runtime: skip doInit of plugins in runtime.main + 2025-08-25 9ae2f1fb57 internal/trace: skip async preempt off tests on low end systems + 2025-08-25 bbd5342a62 net: fix cgoResSearch + 2025-08-25 ed7f804775 os: set full 
name for Roots created with Root.OpenRoot + 2025-08-25 a21249436b internal/poll: use fdMutex to provide read/write locking on Windows + 2025-08-24 44c5956bf7 test/codegen: add Mul2 and DivPow2 test for loong64 + 2025-08-24 0aa8019e94 test/codegen: add Mul* test for loong64 + 2025-08-24 83420974b7 test/codegen: add sqrt* abs and copysign test for loong64 + 2025-08-23 f2db0dca0b net/http/httptest: redirect example.com requests to server + 2025-08-22 d86ec92499 internal/syscall/windows: increase internal Windows O_ flags values + 2025-08-22 9d3f7fda70 crypto/tls: fix quic comment typo + 2025-08-22 78a05c541f internal/poll: don't pass non-nil WSAMsg.Name with 0 namelen on windows + 2025-08-22 52c3f73fda runtime/metrics: improve doc + 2025-08-22 a076f49757 os: fix Root.MkdirAll to handle race of directory creation + 2025-08-22 98238fd495 all: delete remaining windows/arm code + 2025-08-21 1ad30844d9 cmd/asm: process forward jump to PCALIGN + 2025-08-21 13c082601d internal/poll: permit nil destination address in WriteMsg{Inet4,Inet6} + 2025-08-21 9b0a507735 runtime: remove remaining windows/arm files and comments + 2025-08-21 1843f1e9c0 cmd/compile: use zero register instead of specialized *zero instructions on loong64 + 2025-08-21 e0870a0a12 cmd/compile: simplify zerorange on loong64 + 2025-08-21 fb8bbe46d5 cmd/compile/internal/ssa: eliminate unnecessary extension operations + 2025-08-21 9632ba8160 cmd/compile: optimize some patterns into revb2h/revb4h instruction on loong64 + 2025-08-21 8dcab6f450 syscall: simplify execve handling on libc platforms + 2025-08-21 ba840c1bf9 cmd/compile: deduplication in the source code generated by mknode + 2025-08-21 fa706ea50f cmd/compile: optimize rule (x + x) << c to x << c+1 on loong64 + 2025-08-21 ffc85ee1f1 cmd/internal/objabi,cmd/link: add support for additional riscv64 relocations Change-Id: I3896f74b1a3cc0a52b29ca48767bb0ba84620f71
2025-08-29cmd/compile/internal/ssa: make oneBit function genericMichael Munday
Allows rewrite rules using oneBit to be made more compact. Change-Id: I986715f77db5b548759d809fe668e1893048f25c Reviewed-on: https://go-review.googlesource.com/c/go/+/699295 Reviewed-by: Youlin Feng <fengyoulin@live.com> Commit-Queue: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
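To illustrate what a generic power-of-two style predicate can look like, here is a minimal sketch. The type constraint and body are assumptions made for this example and are not copied from the CL; the point is only that a single generic function can stand in for separate per-width helpers.

```go
package main

import "fmt"

// oneBit reports whether exactly one bit is set in x. This is a sketch
// under assumed constraints, not the compiler's actual definition.
// x&(x-1) clears the lowest set bit, so the result is zero only when at
// most one bit was set; the x != 0 check excludes zero itself.
func oneBit[T int8 | int16 | int32 | int64](x T) bool {
	return x&(x-1) == 0 && x != 0
}

func main() {
	fmt.Println(oneBit(int64(8)), oneBit(int32(6)), oneBit(int8(0)))
}
```

Note that with signed types the minimum value (for example int8(-128)) has only its top bit set, so this sketch reports true for it.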
2025-08-20[dev.simd] cmd/compile: rewrite to elide Slicemask from len==c>0 slicingDavid Chase
This might have been something that prove could be educated into figuring out, but this also works, and it also helps prove downstream. Adjusted the prove test, because this change moved a message. Change-Id: I5eabe639eff5db9cd9766a6a8666fdb4973829cb Reviewed-on: https://go-review.googlesource.com/c/go/+/697715 Commit-Queue: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Bypass: David Chase <drchase@google.com>
2025-08-11Revert "cmd/compile: allow multi-field structs to be stored directly in interfaces"Keith Randall
This reverts commit cd55f86b8dcfc139ee5c17d32530ac9e758c8bc0 (CL 681937) Reason for revert: still causing compiler failures on Google test code Change-Id: I5cd482fd607fd060a523257082d48821b5f965d6 Reviewed-on: https://go-review.googlesource.com/c/go/+/695016 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-08-11Revert "cmd/compile: allow more args in StructMake folding rule"Keith Randall
This reverts commit 72e8237cc11569de2faf9885a1b83d06446533b5 (CL 693615) Reason for revert: still causing compiler failures on Google test code Change-Id: I4a7850c321d95ed7803d56866bb0c524c7a377d3 Reviewed-on: https://go-review.googlesource.com/c/go/+/695015 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@google.com>
2025-08-06cmd/compile: allow more args in StructMake folding ruleKeith Randall
imakeOfStructMake does the right thing, but we never call it when the StructMake has more than one argument. Fixes #74908 Change-Id: Ib4b1a025bfb1fa69a325207e47b74bd6217092bf Reviewed-on: https://go-review.googlesource.com/c/go/+/693615 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-08-05cmd/compile: allow multi-field structs to be stored directly in interfacesKeith Randall
This applies when the struct consists of any number of 0-sized fields and exactly one pointer field. Fixes #74092 Change-Id: I87c5d162c8c9fdba812420d7f9d21de97295b62c Reviewed-on: https://go-review.googlesource.com/c/go/+/681937 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com>
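The struct shape described here can be sketched as follows. The type and function names (marker, handle, pointerSized, roundTrip) are hypothetical, chosen only to show a struct made of zero-sized fields plus one pointer field; the example says nothing about how any particular compiler version stores such a value in an interface.

```go
package main

import (
	"fmt"
	"unsafe"
)

// marker is a zero-sized type used only to pad the struct with fields
// that occupy no storage.
type marker struct{}

// handle has two zero-sized fields followed by a single pointer field,
// so its in-memory representation is just that pointer.
type handle struct {
	_ marker
	_ [0]int
	p *int
}

// pointerSized reports whether handle occupies exactly one pointer word.
func pointerSized() bool {
	return unsafe.Sizeof(handle{}) == unsafe.Sizeof(uintptr(0))
}

// roundTrip stores a handle in an interface value and reads it back.
func roundTrip(v *int) *int {
	var i any = handle{p: v}
	return i.(handle).p
}

func main() {
	x := 42
	fmt.Println(pointerSized(), *roundTrip(&x))
}
```

The zero-sized fields are placed before the pointer on purpose: Go pads a struct whose final field is zero-sized, which would make it larger than one word.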
2025-08-04[dev.simd] all: merge master (7a1679d) into dev.simdCherry Mui
Conflicts: - src/cmd/compile/internal/amd64/ssa.go - src/cmd/compile/internal/ssa/rewriteAMD64.go - src/internal/buildcfg/exp.go - src/internal/cpu/cpu.go - src/internal/cpu/cpu_x86.go - src/internal/goexperiment/flags.go Merge List: + 2025-08-04 7a1679d7ae cmd/compile: move s390x over to new bounds check strategy + 2025-08-04 95693816a5 cmd/compile: move riscv64 over to new bounds check strategy + 2025-08-04 d7bd7773eb go/parser: remove safePos + 2025-08-04 4b6cbc377f cmd/cgo/internal/test: use (syntactic) constant for C array bound + 2025-08-03 b2960e3580 cmd/internal/obj/loong64: add {V,XV}{BITCLR/BITSET/BITREV}[I].{B/H/W/D} instructions support + 2025-08-03 abeeef1c08 cmd/compile/internal/test: fix typo in comments + 2025-08-03 d44749b65b cmd/internal/obj/loong64: add [X]VLDREPL.{B/H/W/D} instructions support + 2025-08-03 d6beda863e runtime: add reference to debugPinnerV1 + 2025-08-01 4ab1aec007 cmd/go: modload should use a read-write lock to improve concurrency + 2025-08-01 e666972a67 runtime: deduplicate Windows stdcall + 2025-08-01 ef40549786 runtime,syscall: move loadlibrary and getprocaddress to syscall + 2025-08-01 336931a4ca cmd/go: use os.Rename to move files on Windows + 2025-08-01 eef5f8d930 cmd/compile: enforce that locals are always accessed with SP base register + 2025-08-01 e071617222 cmd/compile: optimize multiplication rules on loong64 + 2025-07-31 eb7f515c4d cmd/compile: use generated loops instead of DUFFZERO on amd64 + 2025-07-31 c0ee2fd4e3 cmd/go: explicitly reject module paths "go" and "toolchain" + 2025-07-30 a4d99770c0 runtime/metrics: add cleanup and finalizer queue metrics + 2025-07-30 70a2ff7648 runtime: add cgo call benchmark + 2025-07-30 69338a335a cmd/go/internal/gover: fix ModIsPrerelease for toolchain versions + 2025-07-30 cedf63616a cmd/compile: add floating point min/max intrinsics on s390x + 2025-07-30 82a1921c3b all: remove redundant Swiss prefixes + 2025-07-30 2ae059ccaf all: remove GOEXPERIMENT=swissmap + 2025-07-30 
cc571dab91 cmd/compile: deduplicate instructions when rewrite func results + 2025-07-30 2174a7936c crypto/tls: use standard chacha20-poly1305 cipher suite names + 2025-07-30 8330fb48a6 cmd/compile: move mips32 over to new bounds check strategy + 2025-07-30 9f9d7b50e8 cmd/compile: move mips64 over to new bounds check strategy + 2025-07-30 5216fd570e cmd/compile: move loong64 over to new bounds check strategy + 2025-07-30 89a0af86b8 cmd/compile: allow ops to specify clobbering input registers + 2025-07-30 5e94d72158 cmd/compile: simplify zerorange on arm64 + 2025-07-30 8cd85e602a cmd/compile: check domination of loop return in both controls + 2025-07-30 cefaed0de0 reflect: fix noswiss builder + 2025-07-30 3aa1b00081 regexp: fix compiling alternate patterns of different fold case literals + 2025-07-30 b1e933d955 cmd/compile: avoid extending when already sufficiently masked on loong64 + 2025-07-29 880ca333d7 cmd/compile: removing log2uint32 function + 2025-07-29 1513661dc3 cmd/compile: simplify logX implementations + 2025-07-29 bd94ae8903 cmd/compile: use unsigned power-of-two detector for unsigned mod + 2025-07-29 f3582fc80e cmd/compile: add unsigned power-of-two detector + 2025-07-29 f7d167fe71 internal/abi: move direct/indirect flag from Kind to TFlag + 2025-07-29 e0b07dc22e os/exec: fix incorrect expansion of "", "." and ".." 
in LookPath + 2025-07-29 25816d401c internal/goexperiment: delete RangeFunc goexperiment + 2025-07-29 7961bf71f8 internal/goexperiment: delete CacheProg goexperiment + 2025-07-29 e15a14c4dd sync: remove synchashtriemap GOEXPERIMENT + 2025-07-29 7dccd6395c cmd/compile: move arm32 over to new bounds check strategy + 2025-07-29 d79405a344 runtime: only deduct assist credit for arenas during GC + 2025-07-29 19a086f716 cmd/go/internal/telemetrystats: count goexperiments + 2025-07-29 aa95ab8215 image: fix formatting of godoc link + 2025-07-29 4c854b7a3e crypto/elliptic: change a variable name that have the same name as keywords + 2025-07-28 b10eb1d042 cmd/compile: simplify zerorange on amd64 + 2025-07-28 f8eae7a3c3 os/user: fix tests to pass on non-english Windows + 2025-07-28 0984264471 internal/poll: remove msg field from Windows' poll.operation + 2025-07-28 d7b4114346 internal/poll: remove rsan field from Windows' poll.operation + 2025-07-28 361b1ab41f internal/poll: remove sa field from Windows' poll.operation + 2025-07-28 9b6bd64e46 internal/poll: remove qty and flags fields from Windows' poll.operation + 2025-07-28 cd3655a824 internal/runtime/maps: fix spelling errors in comments + 2025-07-28 d5dc36af45 runtime: remove openbsd/mips64 related code + 2025-07-28 64ba72474d errors: omit redundant nil check in type assertion for Join + 2025-07-28 e151db3e06 all: omit unnecessary type conversions + 2025-07-28 4569255f8c cmd/compile: cleanup SelectN rules by indexing into args + 2025-07-28 94645d2413 cmd/compile: rewrite cmov(x, x, cond) into x + 2025-07-28 10c5cf68d4 net/http: add proper panic message + 2025-07-28 46b5839231 test/codegen: fix failing condmove wasm tests + 2025-07-28 98f301cf68 runtime,syscall: move SyscallX implementations from runtime to syscall + 2025-07-28 c7ed3a1c5a internal/runtime/syscall/windows: factor out code from runtime + 2025-07-28 e81eac19d3 hash/crc32: fix incorrect checksums with avx512+race + 2025-07-25 6fbad4be75 cmd/compile: remove 
no-longer-necessary call to calculateDepths + 2025-07-25 5045fdd8ff cmd/compile: fix containsUnavoidableCall computation + 2025-07-25 d28b27cd8e go/types, types2: use nil to represent incomplete explicit aliases + 2025-07-25 7b53d8d06e cmd/compile/internal/types2: add loaded state between loader calls and constraint expansion + 2025-07-25 374e3be2eb os/user: user random name for the test user account + 2025-07-25 1aa154621d runtime: rename scanobject to scanObject + 2025-07-25 41b429881a runtime: duplicate scanobject in greentea and non-greentea files + 2025-07-25 aeb256e98a cmd/compile: remove unused arg from gorecover + 2025-07-25 08376e1a9c runtime: iterate through inlinings when processing recover() + 2025-07-25 c76c3abc54 encoding/json: fix truncated Token error regression in goexperiment.jsonv2 + 2025-07-25 ebdbfccd98 encoding/json/jsontext: preserve buffer capacity in Encoder.Reset + 2025-07-25 91c4f0ccd5 reflect: avoid a bounds check in stack-constrained code + 2025-07-24 3636ced112 encoding/json: fix extra data regression under goexperiment.jsonv2 + 2025-07-24 a6eec8bdc7 encoding/json: reduce error text regressions under goexperiment.jsonv2 + 2025-07-24 0fa88dec1e time: remove redundant uint32 conversion in split + 2025-07-24 ada30b8248 internal/buildcfg: add ability to get GORISCV64 variable in GOGOARCH + 2025-07-24 6f6c6c5782 cmd/internal/obj: rip out argp adjustment for wrapper frames + 2025-07-24 7b50024330 runtime: detect successful recovers differently + 2025-07-24 7b9de668bd unicode/utf8: skip ahead during ascii runs in Valid/ValidString + 2025-07-24 076eae436e cmd/compile: move amd64 and 386 over to new bounds check strategy + 2025-07-24 f703dc5bef cmd/compile: add missing StringLen rule in prove + 2025-07-24 394d0bee8d cmd/compile: move arm64 over to new bounds check strategy + 2025-07-24 3024785b92 cmd/compile,runtime: remember idx+len for bounds check failure with less code + 2025-07-24 741a19ab41 runtime: move bounds check constants to 
internal/abi + 2025-07-24 ce05ad448f cmd/compile: rewrite condselects into doublings and halvings + 2025-07-24 fcd28070fe cmd/compile: add opt branchelim to rewrite some CondSelect into math + 2025-07-24 f32cf8e4b0 cmd/compile: learn transitive proofs for safe unsigned subs + 2025-07-24 d574856482 cmd/compile: learn transitive proofs for safe negative signed adds + 2025-07-24 1a72920f09 cmd/compile: learn transitive proofs for safe positive signed adds + 2025-07-24 e5f202bb60 cmd/compile: learn transitive proofs for safe unsigned adds + 2025-07-24 bd80f74bc1 cmd/compile: fold shift through AND for slice operations + 2025-07-24 5c45fe1385 internal/runtime/syscall: rename to internal/runtime/syscall/linux + 2025-07-24 592c2db868 cmd/compile: improve loopRotate to handle nested loops + 2025-07-24 dcb479c2f9 cmd/compile: optimize slice bounds checking with SUB/SUBconst comparisons + 2025-07-24 f11599b0b9 internal/poll: remove handle field from Windows' poll.operation + 2025-07-24 f7432e0230 internal/poll: remove fd field from Windows' poll.operation + 2025-07-24 e84ed38641 runtime: add benchmark for small-size memmory operation + 2025-07-24 18dbe5b941 hash/crc32: add AVX512 IEEE CRC32 calculation + 2025-07-24 c641900f72 cmd/compile: prefer base.Fatalf to panic in dwarfgen + 2025-07-24 d71d8aeafd cmd/internal/obj/s390x: add MVCLE instruction + 2025-07-24 b6cf1d94dc runtime: optimize memclr on mips64x + 2025-07-24 a8edd99479 runtime: improvement in memclr for s390x + 2025-07-24 bd04f65511 internal/runtime/exithook: fix a typo + 2025-07-24 5c8624a396 cmd/internal/goobj: make error output clear + 2025-07-24 44d73dfb4e cmd/go/internal/doc: clean up after merge with cmd/internal/doc + 2025-07-24 bd446662dd cmd/internal/doc: merge with cmd/go/internal/doc + 2025-07-24 da8b50c830 cmd/doc: delete + 2025-07-24 6669aa3b14 runtime: randomize heap base address + 2025-07-24 26338a7f69 cmd/compile: use better fatal message for staticValue1 + 2025-07-24 8587ba272e cmd/cgo: compare 
malloc return value to NULL instead of literal 0 + 2025-07-24 cae45167b7 go/types, types2: better error messages for certain type mismatches + 2025-07-24 2ddf542e4c cmd/compile: use ,ok return idiom for sparsemap.get + 2025-07-24 6505fcbd0a cmd/compile: use generics for sparse map + 2025-07-24 14f5eb7812 cmd/api: rerun updategolden + 2025-07-24 52b6d7f67a runtime: drop NetBSD kernel bug sysmon workaround fixed in NetBSD 9.2 + 2025-07-24 1ebebf1cc1 cmd/go: clean should respect workspaces + 2025-07-24 6536a93547 encoding/json/jsontext: preserve buffer capacity in Decoder.Reset + 2025-07-24 efc37e97c0 cmd/go: always return the cached path from go tool -n + 2025-07-23 98a031193b runtime: check TestUsingVDSO ExitError type assertion + 2025-07-23 6bb42997c8 doc/next: initialize + 2025-07-23 2696a11a97 internal/goversion: update Version to 1.26 + 2025-07-23 489868f776 cmd/link: scope test to linux & net.sendFile + 2025-07-22 71c2bf5513 cmd/compile: fix loclist for heap return vars without optimizations + 2025-07-22 c74399e7f5 net: correct comment for ListenConfig.ListenPacket + 2025-07-22 4ed9943b26 all: go fmt + 2025-07-22 1aaf7422f1 cmd/internal/objabi: remove redundant word in comment + 2025-07-21 d5ec0815e6 runtime: relax TestMemoryLimitNoGCPercent a bit + 2025-07-21 f7cc61e7d7 cmd/compile: for arm64 epilog, do SP increment with a single instruction + 2025-07-21 5dac42363b runtime: fix asan wrapper for riscv64 + 2025-07-21 e5502e0959 cmd/go: check subcommand properties + 2025-07-19 2363897932 cmd/internal/obj: enable got pcrel itype in fips140 for riscv64 + 2025-07-19 e32255fcc0 cmd/compile/internal/ssa: restrict architectures for TestDebugLines_74576 + 2025-07-18 0451816430 os: revert the use of AddCleanup to close files and roots + 2025-07-18 34b70684ba go/types: infer correct type for y in append(bytes, y...) 
+ 2025-07-17 66536242fc cmd/compile/internal/escape: improve DWARF .debug_line numbering for literal rewriting optimizations + 2025-07-16 385000b004 runtime: fix idle time double-counting bug + 2025-07-16 f506ad2644 cmd/compile/internal/escape: speed up analyzing some functions with many closures + 2025-07-16 9c507e7942 cmd/link, runtime: on Wasm, put only function index in method table and func table + 2025-07-16 9782dcfd16 runtime: use 32-bit function index on Wasm + 2025-07-16 c876bf9346 cmd/internal/obj/wasm: use 64-bit instructions for indirect calls + 2025-07-15 b4309ece66 cmd/internal/doc: upgrade godoc pkgsite to 01b046e + 2025-07-15 75a19dbcd7 runtime: use memclrNoHeapPointers to clear inline mark bits + 2025-07-15 6d4a91c7a5 runtime: only clear inline mark bits on span alloc if necessary + 2025-07-15 0c6296ab12 runtime: have mergeInlineMarkBits also clear the inline mark bits + 2025-07-15 397d2117ec runtime: merge inline mark bits with gcmarkBits 8 bytes at a time + 2025-07-15 7dceabd3be runtime/maps: fix typo in group.go comment (instrinsified -> intrinsified) + 2025-07-15 d826bf4d74 os: remove useless error check + 2025-07-14 bb07e55aff runtime: expand GOMAXPROCS documentation + 2025-07-14 9159cd4ec6 encoding/json: decompose legacy options + 2025-07-14 c6556b8eb3 encoding/json/v2: add security section to doc + 2025-07-11 6ebb5f56d9 runtime: gofmt after CL 643897 and CL 662455 + 2025-07-11 1e48ca7020 encoding/json: remove legacy option to EscapeInvalidUTF8 + 2025-07-11 a0a99cb22b encoding/json/v2: report wrapped io.ErrUnexpectedEOF + 2025-07-11 9d04122d24 crypto/rsa: drop contradictory promise to keep PublicKey modulus secret + 2025-07-11 1ca23682dd crypto/rsa: fix documentation formatting + 2025-07-11 4bc3373c8e runtime: turn off large memmove tests under asan/msan Change-Id: I1e32d964eba770b85421efb86b305a2242f24466
2025-07-29  cmd/compile: use unsigned power-of-two detector for unsigned mod  Cuong Manh Le
Same as CL 689815, but for modulus instead of division. Updates #74485 Change-Id: I73000231c886a987a1093669ff207fd9117a8160 Reviewed-on: https://go-review.googlesource.com/c/go/+/689895 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
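As a rough illustration of what this kind of rule buys (a sketch, not the compiler's rewrite rule itself): for unsigned operands, a modulus by a power of two equals a bit mask. The helper name modPow2 is hypothetical. The divisor 1<<63 is the interesting case, since it is a power of two only when interpreted as unsigned.

```go
package main

import "fmt"

// modPow2 computes x % (1 << k) using a mask.
// Hypothetical helper for illustration; valid for unsigned x and k < 64.
func modPow2(x uint64, k uint) uint64 {
	return x & (uint64(1)<<k - 1)
}

func main() {
	x := uint64(1234567)
	// Both forms agree for an ordinary power-of-two divisor.
	fmt.Println(x%8 == modPow2(x, 3))
	// 1<<63 has only the top bit set, so it is negative as an int64
	// but still a power of two as a uint64.
	fmt.Println(x%(1<<63) == modPow2(x, 63))
}
```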
2025-07-29  cmd/compile: add unsigned power-of-two detector  Cuong Manh Le
Fixes #74485 Change-Id: Ia22a58ac43bdc36c8414d555672a3a3eafc749ca Reviewed-on: https://go-review.googlesource.com/c/go/+/689815 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2025-07-28  cmd/compile: cleanup SelectN rules by indexing into args  Jorropo
Change-Id: I7b8e8cd88c4d6d562aa25df91593d35d331ef63c Reviewed-on: https://go-review.googlesource.com/c/go/+/690595 Reviewed-by: Mark Freeman <mark@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-07-28  cmd/compile: rewrite cmov(x, x, cond) into x  Jorropo
I don't think branchelim will intentionally generate these. But at the time branchelim generates them the two arms might be different values, and through the opt process they become the same value. Change-Id: I4a19f1db14c08057b7e782a098f4c18ca36ab7fe Reviewed-on: https://go-review.googlesource.com/c/go/+/690519 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Mark Freeman <mark@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
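The identity behind this rule can be modeled in ordinary Go. This is only a source-level sketch: selectValue is a hypothetical stand-in for the SSA conditional select, not a compiler function.

```go
package main

import "fmt"

// selectValue models a conditional select: a when cond is true, else b.
func selectValue(a, b int, cond bool) int {
	if cond {
		return a
	}
	return b
}

func main() {
	// When both arms are the same value, the condition is irrelevant,
	// so the select can be replaced by that value.
	for _, cond := range []bool{false, true} {
		fmt.Println(selectValue(7, 7, cond) == 7)
	}
}
```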
2025-07-24  cmd/compile: rewrite condselects into doublings and halvings  Jorropo
For performance see CL 685676.

This allows something like:

    if y { x *= 2 }

To be compiled to:

    SHLXQ BX, AX, AX

Instead of:

    MOVQ AX, CX
    SHLQ $1, CX
    MOVBLZX BL, DX
    TESTQ DX, DX
    CMOVQNE CX, AX

While ./make.bash uniqued per LOC, there are 2 doublings and 4 halvings.

Change-Id: Ic0727cbf429528a2dbf17cbfc3b0121db8387444
Reviewed-on: https://go-review.googlesource.com/c/go/+/685695
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
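The equivalence this commit relies on can be checked at the source level. The sketch below assumes the branch-free form is a shift by the condition converted to 0 or 1. The helper names (b2i, doubleBranch, doubleShift, halveBranch, halveShift) are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// b2i converts a bool to 0 or 1, mirroring how a condition feeds a shift amount.
func b2i(b bool) uint {
	if b {
		return 1
	}
	return 0
}

// doubleBranch is the original branchy form.
func doubleBranch(x uint64, y bool) uint64 {
	if y {
		x *= 2
	}
	return x
}

// doubleShift is the branch-free form: shift left by 0 or 1.
func doubleShift(x uint64, y bool) uint64 { return x << b2i(y) }

// halveBranch and halveShift are the analogous halving pair.
func halveBranch(x uint64, y bool) uint64 {
	if y {
		x /= 2
	}
	return x
}

func halveShift(x uint64, y bool) uint64 { return x >> b2i(y) }

func main() {
	for _, y := range []bool{false, true} {
		fmt.Println(doubleBranch(21, y) == doubleShift(21, y),
			halveBranch(21, y) == halveShift(21, y))
	}
}
```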
2025-07-24  cmd/compile: add opt branchelim to rewrite some CondSelect into math  Jorropo
This allows something like:

    if y { x++ }

To be compiled to:

    MOVBLZX BX, CX
    ADDQ CX, AX

Instead of:

    LEAQ 1(AX), CX
    MOVBLZX BL, DX
    TESTQ DX, DX
    CMOVQNE CX, AX

While ./make.bash uniqued per LOC, there are 100 additions and 75 subtractions.

See benchmark here: https://go.dev/play/p/DJf5COjwhd_s

Either it's a performance no-op or it is faster:

    goos: linux
    goarch: amd64
    cpu: AMD Ryzen 5 3600 6-Core Processor
                                            │ /tmp/old.logs │            /tmp/new.logs            │
                                            │    sec/op     │    sec/op     vs base               │
    CmovInlineConditionAddLatency-12           0.5443n ± 5%   0.5339n ± 3%   -1.90% (p=0.004 n=10)
    CmovInlineConditionAddThroughputBy6-12      1.492n ± 1%    1.494n ± 1%        ~ (p=0.955 n=10)
    CmovInlineConditionSubLatency-12           0.5419n ± 3%   0.5282n ± 3%   -2.52% (p=0.019 n=10)
    CmovInlineConditionSubThroughputBy6-12      1.587n ± 1%    1.584n ± 2%        ~ (p=0.492 n=10)
    CmovOutlineConditionAddLatency-12          0.5223n ± 1%   0.2639n ± 4%  -49.47% (p=0.000 n=10)
    CmovOutlineConditionAddThroughputBy6-12     1.159n ± 1%    1.097n ± 2%   -5.35% (p=0.000 n=10)
    CmovOutlineConditionSubLatency-12          0.5271n ± 3%   0.2654n ± 2%  -49.66% (p=0.000 n=10)
    CmovOutlineConditionSubThroughputBy6-12     1.053n ± 1%    1.050n ± 1%        ~ (p=1.000 n=10)
    geomean

There are other benefits not tested by this benchmark:
- the math form is usually a couple bytes shorter (ICACHE)
- the math form is usually 0~2 uops shorter (UCACHE)
- the math form usually has less register pressure*
- the math form can sometimes be optimized further

*regalloc rarely finds how it can use fewer registers

As far as pass ordering goes there are many possible options. I've decided to reorder branchelim before late opt since:
- unlike running exclusively the CondSelect rules after branchelim, some extra optimizations might trigger on the adds or subs.
- I don't want to maintain a second generic.rules file of only the stuff that can trigger after branchelim.
- rerunning all of opt a third time increases compilation time for little gain.

By elimination, moving branchelim seems fine.

Change-Id: I869adf57e4d109948ee157cfc47144445146bafd Reviewed-on: https://go-review.googlesource.com/c/go/+/685676 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
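A source-level sketch of the "math form" described in this commit, assuming the condition is materialized as 0 or 1 and then added or subtracted. The helper names (b2i, incBranch, incMath, decBranch, decMath) are hypothetical; the compiler performs the rewrite on SSA values, not on Go source.

```go
package main

import "fmt"

// b2i converts a bool to 0 or 1.
func b2i(b bool) int64 {
	if b {
		return 1
	}
	return 0
}

// incBranch is the branchy form that would otherwise become a conditional move.
func incBranch(x int64, y bool) int64 {
	if y {
		x++
	}
	return x
}

// incMath is the equivalent arithmetic form: add the condition as 0 or 1.
func incMath(x int64, y bool) int64 { return x + b2i(y) }

// decBranch and decMath are the subtraction pair.
func decBranch(x int64, y bool) int64 {
	if y {
		x--
	}
	return x
}

func decMath(x int64, y bool) int64 { return x - b2i(y) }

func main() {
	for _, y := range []bool{false, true} {
		fmt.Println(incBranch(41, y) == incMath(41, y),
			decBranch(41, y) == decMath(41, y))
	}
}
```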
2025-05-28  [dev.simd] cmd/compile: adapters for simd  David Chase
This combines several CLs into a single patch of "glue" for the generated SIMD extensions. This glue includes GOEXPERIMENT checks that disable the creation of user-visible "simd" types and that disable the registration of "simd" intrinsics. The simd type checks were changed to work for either package "simd" or "internal/simd" so that moving that package won't be quite so fragile. cmd/compile, internal/simd: glue for adding SIMD extensions to Go cmd/compile: theft of Cherry's sample SIMD compilation Change-Id: Id44e2f4bafe74032c26de576a8691b6f7d977e01 Reviewed-on: https://go-review.googlesource.com/c/go/+/675598 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Junyang Shao <shaojunyang@google.com>
2025-05-21  cmd/compile/internal/ssa: eliminate string copies for calls to unique.Make  Jake Bailey
unique.Make always copies strings passed into it, so it's safe to not copy byte slices converted to strings either. Handle this just like map accesses with string(b) as keys. This CL only handles unique.Make(string(b)), not nested cases like unique.Make([2]string{string(b1), string(b2)}); this could be done in a followup CL but the map lookup code in walk is sufficiently different than the call handling code that I didn't attempt it. (SSA is much easier). Fixes #71926 Change-Id: Ic2f82f2f91963d563b4ddb1282bd49fc40da8b85 Reviewed-on: https://go-review.googlesource.com/c/go/+/672135 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-22  cmd/compile: constant fold 128-bit multiplies  Keith Randall
The full 64x64->128 multiply comes up when using bits.Mul64. The 64x64->64+overflow multiply comes up in unsafe.Slice when using a constant length. Change-Id: I298515162ca07d804b2d699d03bc957ca30a4ebc Reviewed-on: https://go-review.googlesource.com/c/go/+/667175 Reviewed-by: Junyang Shao <shaojunyang@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-04  cmd/compile: improve store-to-load forwarding with compatible types  Alexander Musman
Improve the compiler's store-to-load forwarding optimization by relaxing the type comparison condition. Instead of requiring exact type equality (CMPeq), we now use copyCompatibleType which allows forwarding between compatible types where safe.

Fix several size comparison bugs in the nested store patterns. Previously, we were comparing the size of the outer store with the load type, rather than comparing with the size of the actual store being forwarded from.

Skip OpConvert in dead store elimination to help get rid of dead stores such as zeroing slices. OpConvert, like OpInlMark, doesn't really use the memory.

This optimization is particularly beneficial for code that creates slices with computed pointers, such as the runtime's heapBitsSlice function, where intermediate calculations were previously causing the compiler to miss store-to-load forwarding opportunities.

Local sweet run result on an x86_64 laptop:

                           │  Orig.res   │              Hopt.res              │
                           │   sec/op    │   sec/op     vs base               │
    BiogoIgor-8               5.303 ± 1%    5.322 ± 1%        ~ (p=0.190 n=10)
    BiogoKrishna-8            7.894 ± 1%    7.828 ± 2%        ~ (p=0.190 n=10)
    BleveIndexBatch100-8      2.257 ± 1%    2.248 ± 2%        ~ (p=0.529 n=10)
    EtcdPut-8                30.12m ± 1%   30.03m ± 1%        ~ (p=0.796 n=10)
    EtcdSTM-8                127.1m ± 1%   126.2m ± 0%   -0.74% (p=0.023 n=10)
    GoBuildKubelet-8          52.21 ± 0%    52.05 ± 1%        ~ (p=0.063 n=10)
    GoBuildKubeletLink-8      4.342 ± 1%    4.305 ± 0%   -0.85% (p=0.000 n=10)
    GoBuildIstioctl-8         43.33 ± 0%    43.24 ± 0%   -0.22% (p=0.015 n=10)
    GoBuildIstioctlLink-8     4.604 ± 1%    4.598 ± 0%        ~ (p=0.063 n=10)
    GoBuildFrontend-8         15.33 ± 0%    15.29 ± 0%        ~ (p=0.143 n=10)
    GoBuildFrontendLink-8    740.0m ± 1%   737.7m ± 1%        ~ (p=0.912 n=10)
    GopherLuaKNucleotide-8    9.590 ± 1%    9.656 ± 1%        ~ (p=0.165 n=10)
    MarkdownRenderXHTML-8    96.97m ± 1%   97.26m ± 2%        ~ (p=0.105 n=10)
    Tile38QueryLoad-8        335.9µ ± 1%   335.6µ ± 1%        ~ (p=0.481 n=10)
    geomean                   1.336         1.333        -0.22%

Change-Id: I031552623e6d5a3b1b5be8325e6314706e45534f
Reviewed-on: https://go-review.googlesource.com/c/go/+/662075
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-03-11  cmd/compile: add constant folding for bits.Add64  Jorropo
Change-Id: I0ed4ebeaaa68e274e5902485ccc1165c039440bd Reviewed-on: https://go-review.googlesource.com/c/go/+/656275 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
2025-03-11  cmd/compile: add MakeTuple generic SSA op to remove duplicate Select[01] rules  Jorropo
Change-Id: Id94a5e503f02aa29dc1e334b521770107d4261db Reviewed-on: https://go-review.googlesource.com/c/go/+/656615 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
2025-03-11  cmd/compile: add constant folding for PopCount  Jorropo
Change-Id: I6ea3f75ddd5c7af114ef77bc48f28c7f8570997b Reviewed-on: https://go-review.googlesource.com/c/go/+/656156 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>
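At the source level, population counts come from the math/bits OnesCount functions. The sketch below shows a call on a constant, which is the kind of value such folding could compute ahead of time. The function name popcountOfMask is hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// popcountOfMask counts the set bits of a fixed mask.
// 0xF0F0 has two nibbles of four set bits each, so the count is 8.
func popcountOfMask() int {
	const mask = 0xF0F0
	return bits.OnesCount64(mask)
}

func main() {
	fmt.Println(popcountOfMask())
}
```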
2025-02-19  cmd/compile: load properly constant values from itabs  Mateusz Poliwczak
While looking at the SSA of the following code, I noticed that these rules do not work properly, and the types are loaded indirectly through an itab, instead of statically.

    type M interface{ M() }
    type A interface{ A() }

    type Impl struct{}

    func (*Impl) M() {}
    func (*Impl) A() {}

    func main() {
        var a M = &Impl{}
        a.(A).A()
    }

Change-Id: Ia275993f81a2e7302102d4ff87ac28586023d13c
GitHub-Last-Rev: 4bfc9019172929d0b0f1c8a1b7eb28cdbc9b87ef
GitHub-Pull-Request: golang/go#71784
Reviewed-on: https://go-review.googlesource.com/c/go/+/649500
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>