aboutsummaryrefslogtreecommitdiff
path: root/src/runtime
AgeCommit message (Collapse)Author
2025-11-26cmd/link, runtime, debug/gosym: move pclntab magic to internal/abiIan Lance Taylor
Change-Id: I2d3c41b0e61b994d7b04bd16a791fd226dc45269 Reviewed-on: https://go-review.googlesource.com/c/go/+/720302 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-26cmd/compile: introduce alias analysis and automatically free non-aliased ↵thepudds
memory after growslice This CL is part of a set of CLs that attempt to reduce how much work the GC must do. See the design in https://go.dev/design/74299-runtime-freegc This CL updates the compiler to examine append calls to prove whether or not the slice is aliased. If proven unaliased, the compiler automatically inserts a call to a new runtime function introduced with this CL, runtime.growsliceNoAlias, which frees the old backing memory immediately after slice growth is complete and the old storage is logically dead. Two append benchmarks below show promising results, executing up to ~2x faster and up to factor of ~3 memory reduction with this CL. The approach works with multiple append calls for the same slice, including inside loops, and the final slice memory can be escaping, such as in a classic pattern of returning a slice from a function after the slice is built. (The final slice memory is never freed with this CL, though we have other work that tackles that.) An example target for this CL is we automatically free the intermediate memory for the appends in the loop in this function: func f1(input []int) []int { var s []int for _, x := range input { s = append(s, g(x)) // s cannot be aliased here if h(x) { s = append(s, x) // s cannot be aliased here } } return s // slice escapes at end } In this case, the compiler and the runtime collaborate so that the heap allocated backing memory for s is automatically freed after a successful grow. (For the first grow, there is nothing to free, but for the second and subsequent growths, the old heap memory is freed automatically.) The new runtime.growsliceNoAlias is primarily implemented by calling runtime.freegc, which we introduced in CL 673695. The high-level approach here is we step through the IR starting from a slice declaration and look for any operations that either alias the slice or might do so, and treat any IR construct we don't specifically handle as a potential alias (and therefore conservatively fall back to treating the slice as aliased when encountering something not understood). For loops, some additional care is required. We arrange the analysis so that an alias in the body of a loop causes all the appends in that same loop body to be marked aliased, even if the aliasing occurs after the append in the IR: func f2() { var s []int for i := range 10 { s = append(s, i) // aliased due to next line alias = s } } For nested loops, we analyse the nesting appropriately so that for example this append is still proven as non-aliased in the inner loop even though it aliased for the outer loop: func f3() { for range 10 { var s []int for i := range 10 { s = append(s, i) // append using non-aliased slice } alias = s } } A good starting point is the beginning of the test/escape_alias.go file, which starts with ~10 introductory examples with brief comments that attempt to illustrate the high-level approach. For more details, see the new .../internal/escape/alias.go file, especially the (*aliasAnalysis).analyze method. In the first benchmark, an append in a loop builds up a slice from nothing, where the slice elements are each 64 bytes. In the table below, 'count' is the number of appends. With 1 append, there is no opportunity for this CL to free memory. Once there are 2 appends, the growth from 1 element to 2 elements means the compiler-inserted growsliceNoAlias frees the 1-element array, and we see a ~33% reduction in memory use and a small reported speed improvement. As the number of appends increases for example to 5, we are at a ~20% speed improvement and ~45% memory reduction, and so on until we reach ~40% faster and ~50% less memory allocated at the end of the table. There can be variation in the reported numbers based on -randlayout, so this table is for 30 different values of -randlayout with a total n=150. (Even so, there is still some variation, so we probably should not read too much into small changes.) This is with GOAMD64=v3 on a VM that gcc reports is cascadelake. goos: linux goarch: amd64 pkg: runtime cpu: Intel(R) Xeon(R) CPU @ 2.80GHz │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │ │ sec/op │ sec/op vs base │ Append64Bytes/count=1-4 31.09n ± 2% 31.69n ± 1% +1.95% (n=150) Append64Bytes/count=2-4 73.31n ± 1% 70.27n ± 0% -4.15% (n=150) Append64Bytes/count=3-4 142.7n ± 1% 124.6n ± 1% -12.68% (n=150) Append64Bytes/count=4-4 149.6n ± 1% 127.7n ± 0% -14.64% (n=150) Append64Bytes/count=5-4 277.1n ± 1% 213.6n ± 0% -22.90% (n=150) Append64Bytes/count=6-4 280.7n ± 1% 216.5n ± 1% -22.87% (n=150) Append64Bytes/count=10-4 544.3n ± 1% 386.6n ± 0% -28.97% (n=150) Append64Bytes/count=20-4 1058.5n ± 1% 715.6n ± 1% -32.39% (n=150) Append64Bytes/count=50-4 2.121µ ± 1% 1.404µ ± 1% -33.83% (n=150) Append64Bytes/count=100-4 4.152µ ± 1% 2.736µ ± 1% -34.11% (n=150) Append64Bytes/count=200-4 7.753µ ± 1% 4.882µ ± 1% -37.03% (n=150) Append64Bytes/count=400-4 15.163µ ± 2% 9.273µ ± 1% -38.84% (n=150) geomean 601.8n 455.0n -24.39% │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │ │ B/op │ B/op vs base │ Append64Bytes/count=1-4 64.00 ± 0% 64.00 ± 0% ~ (n=150) Append64Bytes/count=2-4 192.0 ± 0% 128.0 ± 0% -33.33% (n=150) Append64Bytes/count=3-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150) Append64Bytes/count=4-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150) Append64Bytes/count=5-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150) Append64Bytes/count=6-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150) Append64Bytes/count=10-4 1.938Ki ± 0% 1.000Ki ± 0% -48.39% (n=150) Append64Bytes/count=20-4 3.938Ki ± 0% 2.001Ki ± 0% -49.18% (n=150) Append64Bytes/count=50-4 7.938Ki ± 0% 4.005Ki ± 0% -49.54% (n=150) Append64Bytes/count=100-4 15.938Ki ± 0% 8.021Ki ± 0% -49.67% (n=150) Append64Bytes/count=200-4 31.94Ki ± 0% 16.08Ki ± 0% -49.64% (n=150) Append64Bytes/count=400-4 63.94Ki ± 0% 32.33Ki ± 0% -49.44% (n=150) geomean 1.991Ki 1.124Ki -43.54% │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │ │ allocs/op │ allocs/op vs base │ Append64Bytes/count=1-4 1.000 ± 0% 1.000 ± 0% ~ (n=150) Append64Bytes/count=2-4 2.000 ± 0% 1.000 ± 0% -50.00% (n=150) Append64Bytes/count=3-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150) Append64Bytes/count=4-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150) Append64Bytes/count=5-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150) Append64Bytes/count=6-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150) Append64Bytes/count=10-4 5.000 ± 0% 1.000 ± 0% -80.00% (n=150) Append64Bytes/count=20-4 6.000 ± 0% 1.000 ± 0% -83.33% (n=150) Append64Bytes/count=50-4 7.000 ± 0% 1.000 ± 0% -85.71% (n=150) Append64Bytes/count=100-4 8.000 ± 0% 1.000 ± 0% -87.50% (n=150) Append64Bytes/count=200-4 9.000 ± 0% 1.000 ± 0% -88.89% (n=150) Append64Bytes/count=400-4 10.000 ± 0% 1.000 ± 0% -90.00% (n=150) geomean 4.331 1.000 -76.91% The second benchmark is similar, but instead uses an 8-byte integer for the slice element. The first 4 appends in the loop never call into the runtime thanks to the excellent CL 664299 introduced by Keith in Go 1.25 that allows some <= 32 byte dynamically-sized slices to be on the stack, so this CL is neutral for <= 32 bytes. Once the 5th append occurs at count=5, a grow happens via the runtime and heap allocates as normal, but freegc does not yet have anything to free, so we see a small ~1.4ns penalty reported there. But once the second growth happens, the older heap memory is now automatically freed by freegc, so we start to see some benefit in memory reductions and speed improvements, starting at a tiny speed improvement (close to a wash, or maybe noise) by the second growth before count=10, and building up to ~2x faster with ~68% fewer allocated bytes reported. goos: linux goarch: amd64 pkg: runtime cpu: Intel(R) Xeon(R) CPU @ 2.80GHz │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │ │ sec/op │ sec/op vs base │ AppendInt/count=1-4 2.978n ± 0% 2.969n ± 0% -0.30% (p=0.000 n=150) AppendInt/count=4-4 4.292n ± 3% 4.163n ± 3% ~ (p=0.528 n=150) AppendInt/count=5-4 33.50n ± 0% 34.93n ± 0% +4.25% (p=0.000 n=150) AppendInt/count=10-4 76.21n ± 1% 75.67n ± 0% -0.72% (p=0.000 n=150) AppendInt/count=20-4 150.6n ± 1% 133.0n ± 0% -11.65% (n=150) AppendInt/count=50-4 284.1n ± 1% 225.6n ± 0% -20.59% (n=150) AppendInt/count=100-4 544.2n ± 1% 392.4n ± 1% -27.89% (n=150) AppendInt/count=200-4 1051.5n ± 1% 702.3n ± 0% -33.21% (n=150) AppendInt/count=400-4 2.041µ ± 1% 1.312µ ± 1% -35.70% (n=150) AppendInt/count=1000-4 5.224µ ± 2% 2.851µ ± 1% -45.43% (n=150) AppendInt/count=2000-4 11.770µ ± 1% 6.010µ ± 1% -48.94% (n=150) AppendInt/count=3000-4 17.747µ ± 2% 8.264µ ± 1% -53.44% (n=150) geomean 331.8n 246.4n -25.72% │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │ │ B/op │ B/op vs base │ AppendInt/count=1-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150) AppendInt/count=4-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150) AppendInt/count=5-4 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=150) AppendInt/count=10-4 192.0 ± 0% 128.0 ± 0% -33.33% (n=150) AppendInt/count=20-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150) AppendInt/count=50-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150) AppendInt/count=100-4 1.938Ki ± 0% 1.000Ki ± 0% -48.39% (n=150) AppendInt/count=200-4 3.938Ki ± 0% 2.001Ki ± 0% -49.18% (n=150) AppendInt/count=400-4 7.938Ki ± 0% 4.005Ki ± 0% -49.54% (n=150) AppendInt/count=1000-4 24.56Ki ± 0% 10.05Ki ± 0% -59.07% (n=150) AppendInt/count=2000-4 58.56Ki ± 0% 20.31Ki ± 0% -65.32% (n=150) AppendInt/count=3000-4 85.19Ki ± 0% 27.30Ki ± 0% -67.95% (n=150) geomean ² -42.81% │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │ │ allocs/op │ allocs/op vs base │ AppendInt/count=1-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150) AppendInt/count=4-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150) AppendInt/count=5-4 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=150) AppendInt/count=10-4 2.000 ± 0% 1.000 ± 0% -50.00% (n=150) AppendInt/count=20-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150) AppendInt/count=50-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150) AppendInt/count=100-4 5.000 ± 0% 1.000 ± 0% -80.00% (n=150) AppendInt/count=200-4 6.000 ± 0% 1.000 ± 0% -83.33% (n=150) AppendInt/count=400-4 7.000 ± 0% 1.000 ± 0% -85.71% (n=150) AppendInt/count=1000-4 9.000 ± 0% 1.000 ± 0% -88.89% (n=150) AppendInt/count=2000-4 11.000 ± 0% 1.000 ± 0% -90.91% (n=150) AppendInt/count=3000-4 12.000 ± 0% 1.000 ± 0% -91.67% (n=150) geomean ² -72.76% ² Of course, these are just microbenchmarks, but likely indicate there are some opportunities here. The immediately following CL 712422 tackles inlining and is able to get runtime.freegc working automatically with iterators such as used by slices.Collect, which becomes able to automatically free the intermediate memory from its repeated appends (which earlier in this work required a temporary hand edit to the slices package). For now, we only use the NoAlias version for element types without pointers while waiting on additional runtime support in CL 698515. Updates #74299 Change-Id: I1b9d286aa97c170dcc2e203ec0f8ca72d84e8221 Reviewed-on: https://go-review.googlesource.com/c/go/+/710015 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>
2025-11-26crypto,testing/cryptotest: ignore random io.Reader params, add SetGlobalRandomFilippo Valsorda
First, we centralize all random bytes generation through drbg.Read. The rest of the FIPS 140-3 module can't use external functions anyway, so drbg.Read needs to have all the logic. Then, make sure that the crypto/... tree uses drbg.Read (or the new crypto/internal/rand.Reader wrapper) instead of crypto/rand, so it is unaffected by applications setting crypto/rand.Reader. Next, pass all unspecified random io.Reader parameters through the new crypto/internal/rand.CustomReader, which just redirects to drbg.Read unless GODEBUG=cryptocustomrand=1 is set. Move all the calls to MaybeReadByte there, since it's only needed for these custom Readers. Finally, add testing/cryptotest.SetGlobalRandom which sets crypto/rand.Reader to a locked deterministic source and overrides drbg.Read. This way SetGlobalRandom should affect all cryptographic randomness in the standard library. Fixes #70942 Co-authored-by: qiulaidongfeng <2645477756@qq.com> Change-Id: I6a6a69641311d9fac318abcc6d79677f0e406100 Reviewed-on: https://go-review.googlesource.com/c/go/+/724480 Reviewed-by: Nicholas Husin <nsh@golang.org> Auto-Submit: Filippo Valsorda <filippo@golang.org> Reviewed-by: Nicholas Husin <husin@google.com> Reviewed-by: Roland Shoemaker <roland@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-26runtime: update mkmalloc to make generated code look nicermatloob
This cl adds a new operation that can remove an if statement or replace it with its body if its condition is var or !var for some variable var that's being replaced with a constant. Change-Id: I864abf1f023b2a66b2299ca65d4f837d6a6a6964 Reviewed-on: https://go-review.googlesource.com/c/go/+/724940 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Matloob <matloob@google.com> Auto-Submit: Michael Matloob <matloob@google.com>
2025-11-26runtime/secret: implement new secret packageDaniel Morsing
Implement secret.Do. - When secret.Do returns: - Clear stack that is used by the argument function. - Clear all the registers that might contain secrets. - On stack growth in secret mode, clear the old stack. - When objects are allocated in secret mode, mark them and then zero the marked objects immediately when they are freed. - If the argument function panics, raise that panic as if it originated from secret.Do. This removes anything about the secret function from tracebacks. For now, this is only implemented on linux for arm64 and amd64. This is a rebased version of Keith Randalls initial implementation at CL 600635. I have added arm64 support, signal handling, preemption handling and dealt with vDSOs spilling into system stacks. Fixes #21865 Change-Id: I6fbd5a233beeaceb160785e0c0199a5c94d8e520 Co-authored-by: Keith Randall <khr@golang.org> Reviewed-on: https://go-review.googlesource.com/c/go/+/704615 Reviewed-by: Roland Shoemaker <roland@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Filippo Valsorda <filippo@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-11-26cmd/compile, runtime: guard X15 zeroing with GOEXPERIMENT=simdCherry Mui
If simd experiment is not enabled, the compiler doesn't use the AVX part of the register. So only zero it with the SSE instruction. Change-Id: Ia3bdf34a9ed273128db2ee0f4f5db6f7cc76a975 Reviewed-on: https://go-review.googlesource.com/c/go/+/724720 Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2025-11-26crypto/fips140: add WithoutEnforcementDaniel Morsing
WithoutEnforcement lets programs running under GODEBUG=fips140=only selectively opt out of strict enforcement. This is especially helpful for non-critical uses of cryptography routines like SHA-1 for content addressable storage backends (E.g. git). Fixes #74630 Change-Id: Iabba1f5eb63498db98047aca45e09c5dccf2fbdf Reviewed-on: https://go-review.googlesource.com/c/go/+/723720 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Filippo Valsorda <filippo@golang.org> Auto-Submit: Filippo Valsorda <filippo@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Roland Shoemaker <roland@golang.org>
2025-11-26runtime: merge all the linux 32 and 64 bits files into one for eachJorropo
Change-Id: I57067c9724dad2fba518b900d6f6a049cc32099e Reviewed-on: https://go-review.googlesource.com/c/go/+/714081 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Junyang Shao <shaojunyang@google.com>
2025-11-25runtime: panic if cleanup function closes over cleanup pointerIan Lance Taylor
This would catch problems like https://go.dev/cl/696295. Benchmark effect with this CL plus CL 697535: goos: linux goarch: amd64 pkg: runtime cpu: 12th Gen Intel(R) Core(TM) i7-1260P │ /tmp/foo.1 │ /tmp/foo.2 │ │ sec/op │ sec/op vs base │ AddCleanupAndStop-16 81.93n ± 1% 82.87n ± 1% +1.14% (p=0.041 n=10) For #75066 Change-Id: Ia41d0d6b88baebf6055cb7e5d42bc8210b31630f Reviewed-on: https://go-review.googlesource.com/c/go/+/714000 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Ian Lance Taylor <iant@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-25runtime: panic on AddCleanup with self pointerIan Lance Taylor
For #75066 Change-Id: Ifd555586fb448e7510fed16372648bdd7ec0ab4a Reviewed-on: https://go-review.googlesource.com/c/go/+/697535 Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Ian Lance Taylor <iant@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-11-24runtime: add GODEBUG=tracebacklabels=1 to include pprof labels in tracebacksDavid Finkel
Copy LabelSet to an internal package as label.Set, and include (escaped) labels within goroutine stack dumps. Labels are added to the goroutine header as quoted key:value pairs, so the line may get long if there are a lot of labels. To handle escaping, we add a printescaped function to the runtime and hook it up to the print function in the compiler with a new runtime.quoted type that's a sibling to runtime.hex. (in fact, we leverage some of the machinery from printhex to generate escape sequences). The escaping can be improved for printable runes outside basic ASCII (particularly for languages using non-latin stripts). Additionally, invalid UTF-8 can be improved. So we can experiment with the output format make this opt-in via a a new tracebacklabels GODEBUG var. Updates #23458 Updates #76349 Change-Id: I08e78a40c55839a809236fff593ef2090c13c036 Reviewed-on: https://go-review.googlesource.com/c/go/+/694119 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Alan Donovan <adonovan@google.com>
2025-11-24runtime: use m.profStack in traceStackMichael Anthony Knyszek
Turns out we spend a few percent of the trace event writing path in just zero-initializing the stack space for pcBuf. We don't need zero initialization, since we're going to write over whatever we actually use. Use m.profStack instead, which is already sized correctly. A side-effect of this change is that trace stacks now obey the GODEBUG profstackdepth where they previously ignored it. The name clearly doesn't match, but this is a positive: there's no reason the maximum stack depth shouldn't apply to every diagnostic. Change-Id: Ia654d3d708f15cbb2e1d95af196ae10b07a65df2 Reviewed-on: https://go-review.googlesource.com/c/go/+/723062 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2025-11-24runtime: don't write unique string to trace if it's length zeroMichael Anthony Knyszek
While we're here, document that ID 0 is implicitly assigned to an empty set of data for both stacks and strings. Change-Id: Ic52ff3a1132abc5a8f6f6c4e4357e31e6e7799fc Reviewed-on: https://go-review.googlesource.com/c/go/+/723061 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2025-11-24[dev.simd] all: merge master (02d1f3a) into dev.simdCherry Mui
Merge List: + 2025-11-24 02d1f3a06b runtime: respect GOTRACEBACK for user-triggered runtime panics + 2025-11-24 a593ca9d65 runtime/cgo: add support for `any` param and return type + 2025-11-24 89552911b3 cmd/compile, internal/buildcfg: enable regABI on s390x, and add s390x + 2025-11-24 2fe0ba8d52 internal/bytealg: port bytealg functions to reg ABI on s390x + 2025-11-24 4529c8fba6 runtime: port memmove, memclr to register ABI on s390x + 2025-11-24 58a48a3e3b internal/runtime/syscall: Syscall changes for s390x regabi + 2025-11-24 2a185fae7e reflect, runtime: add reflect support for regabi on s390x + 2025-11-24 e92d2964fa runtime: mark race functions on s390x as ABIInternal + 2025-11-24 41af98eb83 runtime: add runtime changes for register ABI on s390x + 2025-11-24 85e6080089 cmd/internal/obj: set morestack arg spilling and regabi prologue on s390x + 2025-11-24 24697419c5 cmd/compile: update s390x CALL* ops + 2025-11-24 81242d034c cmd/compile/internal/s390x: add initial spill support + 2025-11-24 73b6aa0fec cmd/compile/internal: add register ABI information for s390x + 2025-11-24 1036f6f485 internal/abi: define s390x ABI constants + 2025-11-24 2e5d12a277 cmd/compile: document register-based ABI for s390x Change-Id: I57b4ae6f9b65d99958b9fe5974205770e18f7788
2025-11-24runtime: respect GOTRACEBACK for user-triggered runtime panicsJoe Tsai
The documentation for GOTRACEBACK says that "single" is the default where the stack trace for only a single routine is printed except that it prints all stack traces if: there is no current goroutine or the failure is internal to the run-time. In the runtime, there are two types of panics: throwTypeUser and throwTypeRuntime. The latter more clearly corresponds to a "failure [that] is internal to the run-time", while the former corresponds to a problem trigger due to a user mistake. Thus, a user-triggered panic (e.g., concurrent map access) should not result in a dump of all stack traces. Fixes #68019 Change-Id: I9b02f82535ddb9fd666f7158e2e4ee10f235646a Reviewed-on: https://go-review.googlesource.com/c/go/+/649535 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-11-24runtime/cgo: add support for `any` param and return typeAlexandre Daubois
When using `any` as param or return type of an exported function, we currently have the error `unrecognized Go type any`. `any` is an alias of `interface{}` which is already supported. This would avoid such change: https://github.com/php/frankenphp/pull/1976 Fixes #76340 Change-Id: I301838ff72e99ae78b035a8eff2405f6a145ed1a GitHub-Last-Rev: 7dfbccfa582bbc6e79ed29677391b9ae81a9b5bd GitHub-Pull-Request: golang/go#76325 Reviewed-on: https://go-review.googlesource.com/c/go/+/720960 Reviewed-by: Mark Freeman <markfreeman@google.com> Auto-Submit: Ian Lance Taylor <iant@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-24runtime: port memmove, memclr to register ABI on s390xSrinivas Pokala
This allows memmove and memclr to be invoked using the new register ABI on s390x. Update #40724 Change-Id: I2e799aac693ddd693266c156c525d6303060796f Reviewed-on: https://go-review.googlesource.com/c/go/+/719424 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Vishwanatha HD <vishwanatha.hd@ibm.com> Reviewed-by: Keith Randall <khr@google.com>
2025-11-24reflect, runtime: add reflect support for regabi on s390xSrinivas Pokala
This adds the regabi support needed for reflect calls makeFuncSub and methodValueCall. Also, It add's archFloat32FromReg and archFloat32ToReg. Update #40724 Change-Id: Ic4b9e30c82f292a24fd2c2b9796cd80a58cecf77 Reviewed-on: https://go-review.googlesource.com/c/go/+/719480 Reviewed-by: Vishwanatha HD <vishwanatha.hd@ibm.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-24runtime: mark race functions on s390x as ABIInternalSrinivas Pokala
This adds ABIInternal to the race function declarations. Update #40724 Change-Id: I827f94fa08240a17a4107a39bca6b4e279dc2530 Reviewed-on: https://go-review.googlesource.com/c/go/+/719422 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Vishwanatha HD <vishwanatha.hd@ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-24runtime: add runtime changes for register ABI on s390xSrinivas Pokala
This adds the changes for the register ABI in the runtime functions for s390x platform: - Add spill/unspill functions used by runtime - Add ABIInternal to functions Updates #40724 Change-Id: I6aaeec1d7293b6fec2aa489df90414937b80199e Reviewed-on: https://go-review.googlesource.com/c/go/+/719465 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Vishwanatha HD <vishwanatha.hd@ibm.com>
2025-11-24[dev.simd] all: merge master (8dd5b13) into dev.simdCherry Mui
Merge List: + 2025-11-24 8dd5b13abc cmd/compile: relax stmtline_test on amd64 + 2025-11-23 feae743bdb cmd/compile: use 32x32->64 multiplies on loong64 + 2025-11-23 e88be8a128 runtime: fix stale comment for mheap/malloc + 2025-11-23 a318843a2a cmd/internal/obj/loong64: optimize duplicate optab entries + 2025-11-23 a18294bb6a cmd/internal/obj/arm64, image/gif, runtime, sort: use math/bits to calculate log2 + 2025-11-23 437323ef7b slices: fix incorrect comment in slices.Insert function documentation + 2025-11-23 1993dca400 doc/next: pre-announce end of support for macOS 12 in Go 1.27 + 2025-11-22 337f7b1f5d cmd/go: update default go directive in mod or work init + 2025-11-21 3c26aef8fb cmd/internal/obj/riscv: improve large branch/call/jump tests + 2025-11-21 31aa9f800b crypto/tls: use inner hello for earlyData when using QUIC and ECH + 2025-11-21 d68aec8db1 runtime: replace trace seqlock with write flag + 2025-11-21 8d9906cd34 runtime/trace: add Log benchmark + 2025-11-21 6aeacdff38 cmd/go: support sha1 repos when git default is sha256 + 2025-11-21 9570036ca5 crypto/sha3: make the zero value of SHAKE useable + 2025-11-21 155efbbeeb crypto/sha3: make the zero value of SHA3 useable + 2025-11-21 6f16669e34 database/sql: don't ignore ColumnConverter for unknown input count + 2025-11-21 121bc3e464 runtime/pprof: remove hard-coded sleep in CPU profile reader + 2025-11-21 b604148c4e runtime: fix double wakeup in CPU profile buffer + 2025-11-21 22f24f90b5 cmd/compile: change testing.B.Loop keep alive semantic + 2025-11-21 cfb9d2eb73 net: remove unused linknames + 2025-11-21 65ef314f89 net/http: remove unused linknames + 2025-11-21 0f32fbc631 net/http: populate Response.Request when using NewFileTransport + 2025-11-21 3e0a8e7867 net/http: preserve original path encoding in redirects + 2025-11-21 831af61120 net/http: use HTTP 307 redirects in ServeMux + 2025-11-21 87269224cb net/http: update Response.Request.URL after redirects on GOOS=js + 2025-11-21 7aa9ca729f net/http/cookiejar: treat localhost as secure origin + 2025-11-21 f870a1d398 net/url: warn that JoinPath arguments should be escaped + 2025-11-21 9962d95fed crypto/internal/fips140/mldsa: unroll NTT and inverseNTT + 2025-11-21 f821fc46c5 crypto/internal/fisp140test: update acvptool, test data + 2025-11-21 b59efc38a0 crypto/internal/fips140/mldsa: new package + 2025-11-21 62741480b8 runtime: remove linkname for gopanic + 2025-11-21 7db2f0bb9a crypto/internal/hpke: separate KEM and PublicKey/PrivateKey interfaces + 2025-11-21 e15800c0ec crypto/internal/hpke: add ML-KEM and hybrid KEMs, and SHAKE KDFs + 2025-11-21 7c985a2df4 crypto/internal/hpke: modularize API and support more ciphersuites + 2025-11-21 e7d47ac33d cmd/compile: simplify negative on multiplication + 2025-11-21 35d2712b32 net/http: fix typo in Transport docs + 2025-11-21 90c970cd0f net: remove unnecessary loop variable copies in tests + 2025-11-21 9772d3a690 cmd/cgo: strip top-level const qualifier from argument frame struct + 2025-11-21 1903782ade errors: add examples for custom Is/As matching + 2025-11-21 ec92bc6d63 cmd/compile: rewrite Rsh to RshU if arguments are proved positive + 2025-11-21 3820f94c1d cmd/compile: propagate unsigned relations for Rsh if arguments are positive + 2025-11-21 d474f1fd21 cmd/compile: make dse track multiple shadowed ranges + 2025-11-21 d0d0a72980 cmd/compile/internal/ssa: correct type of ARM64 conditional instructions + 2025-11-21 a9704f89ea internal/runtime/gc/scan: add AVX512 impl of filterNil. + 2025-11-21 ccd389036a cmd/internal/objabi: remove -V=goexperiment internal special case + 2025-11-21 e7787b9eca runtime: go fmt + 2025-11-21 17b3b98796 internal/strconv: go fmt + 2025-11-21 c851827c68 internal/trace: go fmt + 2025-11-21 f87aaec53d cmd/compile: fix integer overflow in prove pass + 2025-11-21 dbd2ab9992 cmd/compile/internal: fix typos + 2025-11-21 b9d86baae3 cmd/compile/internal/devirtualize: fix typos + 2025-11-20 4b0e3cc1d6 cmd/link: support loading R_LARCH_PCREL20_S2 and R_LARCH_CALL36 relocs + 2025-11-20 cdba82c7d6 cmd/internal/obj/loong64: add {,X}VSLT.{B/H/W/V}{,U} instructions support + 2025-11-20 bd2b117c2c crypto/tls: add QUICErrorEvent + 2025-11-20 3ad2e113fc net/http/httputil: wrap ReverseProxy's outbound request body so Close is a noop + 2025-11-20 d58b733646 runtime: track goroutine location until actual STW + 2025-11-20 1bc54868d4 cmd/vendor: update to x/tools@68724af + 2025-11-20 8c3195973b runtime: disable stack allocation tests on sanitizers + 2025-11-20 ff654ea100 net/url: permit colons in the host of postgresql:// URLs + 2025-11-20 a662badab9 encoding/json: remove linknames + 2025-11-20 5afe237d65 mime: add missing path for mime types in godoc + 2025-11-20 c1b7112af8 os/signal: make NotifyContext cancel the context with a cause Change-Id: Ib93ef643be610dfbdd83ff45095a7b1ca2537b8b
2025-11-23runtime: fix stale comment for mheap/mallocLidong Yan
mheap use pageAlloc to manage free/scav address space instead of using free/scav treap. The comment on mheap states mheap uses treaps. Update the comment to reflect the use of pageAlloc. In the fallback code when sizeSpecializedMalloc is enabled, heapBitsInSpan is false. Update the comment to reflect that. Change-Id: I50d2993c84e2c0312a925ab0a33065cc4cd41c41 Reviewed-on: https://go-review.googlesource.com/c/go/+/722700 Reviewed-by: Lidong Yan <yldhome2d2@gmail.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Mark Freeman <markfreeman@google.com>
2025-11-23cmd/internal/obj/arm64, image/gif, runtime, sort: use math/bits to calculate ↵Axel Wagner
log2 In several places the integer log2 is calculated using loops or similar mechanisms. math/bits.Len* provide a simpler and more efficient mechanisms for this. Annoyingly, every usage has slightly different ideas of what "log2" means and how non-positive inputs should be handled. I verified the replacements in each case by comparing the result for inputs from 0 to 1<<16. Change-Id: Ie962a74674802da363e0038d34c06979ccb41cf3 Reviewed-on: https://go-review.googlesource.com/c/go/+/721880 Reviewed-by: Mark Freeman <markfreeman@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-11-21runtime: replace trace seqlock with write flagMichael Anthony Knyszek
The runtime tracer currently uses a per-M seqlock to indicate whether a thread is writing to a local trace buffer. The seqlock is updated with two atomic adds, read-modify-write operations. These are quite expensive, even though they're completely uncontended. We can make these operations slightly cheaper by using an atomic store. The key insight here is that only one thread ever writes to the value at a time, so only the "write" of the read-modify-write actually matters. At that point, it doesn't really matter that we have a monotonically increasing counter. This is made clearer by the fact that nothing other than basic checks make sure the counter is monotonically increasing: everything only depends on whether the counter is even or odd. At that point, all we really need is a flag: an atomic.Bool, which we can update with an atomic Store, a write-only instruction. Change-Id: I0cfe39b34c7634554c34c53c0f0e196d125bbc4a Reviewed-on: https://go-review.googlesource.com/c/go/+/721840 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-11-21runtime/trace: add Log benchmarkMichael Anthony Knyszek
This is a pretty decent benchmark of the baseline event cost. Change-Id: I22a7fa3411bd80be3bd8093d5933e29062cb1377 Reviewed-on: https://go-review.googlesource.com/c/go/+/723060 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-21runtime/pprof: remove hard-coded sleep in CPU profile readerNick Ripley
The CPU profiler reader goroutine has a hard-coded 100ms sleep between reads of the CPU profile sample buffer. This is done because waking up the CPU profile reader is not signal-safe on some platforms. As a consequence, stopping the profiler takes 200ms (one iteration to read the last samples and one to see the "eof"), and on many-core systems the reader does not wake up frequently enought to keep up with incoming data. This CL removes the sleep where it is safe to do so, following a suggestion by Austin Clements in the comments on CL 445375. We let the reader fully block, and wake up the reader when the buffer is over half-full. Fixes #63043 Updates #56029 Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-arm64-longtest,gotip-linux-386-longtest Change-Id: I9f7e7e9918a4a6f16e80f6aaf33103126568a81f Reviewed-on: https://go-review.googlesource.com/c/go/+/610815 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Mark Freeman <markfreeman@google.com>
2025-11-21runtime: fix double wakeup in CPU profile bufferNick Ripley
The profBuf.wakeupExtra method wakes up the profile reader if it's sleeping, either when the buffer is closed or when there is a pending overflow entry. Unlike in profBuf.write, profBuf.wakeupExtra does not clear the profReaderSleeping bit before doing the wakeup. As a result, if there are two writes to a full buffer before the sleeping reader has time to wake up, we'll see two consecutive calls to notewakeup, which is a fatal error. This CL updates profBuf.wakeupExtra to clear the sleeping bit before doing the wakeup. This CL adds a unit test that demonstrates the problem. This is theoretically possible to trigger for real programs as well, but it's more difficult. The profBufWordCount is large enough that it takes several CPU-seconds to fill up the buffer. So we'd need to run on a system with lots of cores to have a chance of running into this failure, and the reader would need to fully go to sleep before a large burst of CPU activity. Change-Id: I59b4fa86a12f6236890b82cd353a95706a6a6964 Reviewed-on: https://go-review.googlesource.com/c/go/+/722940 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Mark Freeman <markfreeman@google.com>
2025-11-21runtime: remove linkname for gopanicSean Liao
github.com/goplus/igop now renamed github.com/goplus/ixgo already requires checklinkname=0, so the special case can be removed. https://github.com/goplus/ixgo/tree/e0d0bfeb2de9cfbe0b6cd668f015a1ba35dfea76/ixgo go.undefinedlabs.com is no longer a resolvable domain, having been absorbed by Datadog. The original use in https://pkg.go.dev/go.undefinedlabs.com/scopeagent@v0.4.2/reflection probably shouldn't have qualified anyway with 0 known importers. For #67401 Change-Id: Ida6024e014f3304d4a4190f0bd9d12746a29b40b Reviewed-on: https://go-review.googlesource.com/c/go/+/721300 Reviewed-by: Mark Freeman <markfreeman@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-21runtime: go fmtMichael Pratt
Change-Id: I6a6a636cf38ddb1dc6f2170361eb4093b81acdfb Reviewed-on: https://go-review.googlesource.com/c/go/+/722521 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-11-20runtime: track goroutine location until actual STWMichael Pratt
TestTraceSTW / TestTraceGCSTW currently tracks the location (M/P) of the target goroutines until it reaches the "start" log message, assuming the actual STW comes immediately afterwards. On 386 with TestTraceGCSTW, it actually tends to take >10ms after the start log before the STW actually occurs. This is enough time for sysmon to preempt the target goroutines and migration them to another location. Fix this by continuing tracking all the way until the STW itself occurs. We still keep the start log message so we can ignore any STW (if any) before we expect. Cq-Include-Trybots: luci.golang.try:gotip-linux-386-longtest,gotip-linux-amd64-longtest Change-Id: I6a6a636cf2dcb18d8b33ac4ad88333cabff2eabb Reviewed-on: https://go-review.googlesource.com/c/go/+/722520 Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-20runtime: disable stack allocation tests on sanitizersKeith Randall
CL 707755 broke the asan/msan builders. Change-Id: Ic9738140999a9bcfc94cecfe0964a5f1bc243c56 Reviewed-on: https://go-review.googlesource.com/c/go/+/722480 Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com>
2025-11-20[dev.simd] all: merge master (ca37d24) into dev.simdCherry Mui
Conflicts: - src/cmd/compile/internal/typecheck/builtin.go Merge List: + 2025-11-20 ca37d24e0b net/http: drop unused "broken" field from persistConn + 2025-11-20 4b740af56a cmd/internal/obj/x86: handle global reference in From3 in dynlink mode + 2025-11-20 790384c6c2 spec: adjust rule for type parameter on RHS of alias declaration + 2025-11-20 a49b0302d0 net/http: correctly close fake net.Conns + 2025-11-20 32f5aadd2f cmd/compile: stack allocate backing stores during append + 2025-11-20 a18aff8057 runtime: select GC mark workers during start-the-world + 2025-11-20 829779f4fe runtime: split findRunnableGCWorker in two + 2025-11-20 ab59569099 go/version: use "custom" as an example of a version suffix + 2025-11-19 c4bb9653ba cmd/compile: Implement LoweredZeroLoop with LSX Instruction on loong64 + 2025-11-19 7f2ae21fb4 cmd/internal/obj/loong64: add MULW.D.W[U] instructions + 2025-11-19 a2946f2385 crypto: add Encapsulator and Decapsulator interfaces + 2025-11-19 6b83bd7146 crypto/ecdh: add KeyExchanger interface + 2025-11-19 4fef9f8b55 go/types, types2: fix object path for grouped declaration statements + 2025-11-19 33529db142 spec: escape double-ampersands + 2025-11-19 dc42565a20 cmd/compile: fix control flow for unsigned divisions proof relations + 2025-11-19 e64023dcbf cmd/compile: cleanup useless if statement in prove + 2025-11-19 2239520d1c test: go fmt prove.go tests + 2025-11-19 489d3dafb7 math: switch s390x math.Pow to generic implementation + 2025-11-18 8c41a482f9 runtime: add dlog.hexdump + 2025-11-18 e912618bd2 runtime: add hexdumper + 2025-11-18 2cf9d4b62f Revert "net/http: do not discard body content when closing it within request handlers" + 2025-11-18 4d0658bb08 cmd/compile: prefer fixed registers for values + 2025-11-18 ba634ca5c7 cmd/compile: fold boolean NOT into branches + 2025-11-18 8806d53c10 cmd/link: align sections, not symbols after DWARF compress + 2025-11-18 c93766007d runtime: do not print recovered when double panic with the same value + 2025-11-18 9859b43643 cmd/asm,cmd/compile,cmd/internal/obj/riscv: use compressed instructions on riscv64 + 2025-11-17 b9ef0633f6 cmd/internal/sys,internal/goarch,runtime: enable the use of compressed instructions on riscv64 + 2025-11-17 a087dea869 debug/elf: sync new loong64 relocation types up to LoongArch ELF psABI v20250521 + 2025-11-17 e1a12c781f cmd/compile: use 32x32->64 multiplies on arm64 + 2025-11-17 6caab99026 runtime: relax TestMemoryLimit on darwin a bit more + 2025-11-17 eda2e8c683 runtime: clear frame pointer at thread entry points + 2025-11-17 6919858338 runtime: rename findrunnable references to findRunnable + 2025-11-17 8e734ec954 go/ast: fix BasicLit.End position for raw strings containing \r + 2025-11-17 592775ec7d crypto/mlkem: avoid a few unnecessary inverse NTT calls + 2025-11-17 590cf18daf crypto/mlkem/mlkemtest: add derandomized Encapsulate768/1024 + 2025-11-17 c12c337099 cmd/compile: teach prove about subtract idioms + 2025-11-17 bc15963813 cmd/compile: clean up prove pass + 2025-11-17 1297fae708 go/token: add (*File).End method + 2025-11-17 65c09eafdf runtime: hoist invariant code out of heapBitsSmallForAddrInline + 2025-11-17 594129b80c internal/runtime/maps: update doc for table.Clear + 2025-11-15 c58d075e9a crypto/rsa: deprecate PKCS#1 v1.5 encryption + 2025-11-14 d55ecea9e5 runtime: usleep before stealing runnext only if not in syscall + 2025-11-14 410ef44f00 cmd: update x/tools to 59ff18c + 2025-11-14 50128a2154 runtime: support runtime.freegc in size-specialized mallocs for noscan objects + 2025-11-14 c3708350a4 cmd/go: tests: rename git-min-vers->git-sha256 + 2025-11-14 aea881230d std: fix printf("%q", int) mistakes + 2025-11-14 120f1874ef runtime: add more precise test of assist credit handling for runtime.freegc + 2025-11-14 fecfcaa4f6 runtime: add runtime.freegc to reduce GC work + 2025-11-14 5a347b775e runtime: set GOEXPERIMENT=runtimefreegc to disabled by default + 2025-11-14 1a03d0db3f runtime: skip tests for GOEXPERIMENT=arenas that do not handle clobberfree=1 + 2025-11-14 cb0d9980f5 net/http: do not discard body content when closing it within request handlers + 2025-11-14 03ed43988f cmd/compile: allow multi-field structs to be stored directly in interfaces + 2025-11-14 1bb1f2bf0c runtime: put AddCleanup cleanup arguments in their own allocation + 2025-11-14 9fd2e44439 runtime: add AddCleanup benchmark + 2025-11-14 80c91eedbb runtime: ensure weak handles end up in their own allocation + 2025-11-14 7a8d0b5d53 runtime: add debug mode to extend _Grunning-without-P windows + 2025-11-14 710abf74da internal/runtime/cgobench: add Go function call benchmark for comparison + 2025-11-14 b24aec598b doc, cmd/internal/obj/riscv: document the riscv64 assembler + 2025-11-14 a0e738c657 cmd/compile/internal: remove incorrect riscv64 SLTI rule + 2025-11-14 2cdcc4150b cmd/compile: fold negation into multiplication + 2025-11-14 b57962b7c7 bytes: fix panic in bytes.Buffer.Peek + 2025-11-14 0a569528ea cmd/compile: optimize comparisons with single bit difference + 2025-11-14 1e5e6663e9 cmd/compile: remove unnecessary casts and types from riscv64 rules + 2025-11-14 ddd8558e61 go/types, types2: swap object.color for Checker.objPathIdx + 2025-11-14 9daaab305c cmd/link/internal/ld: make runtime.buildVersion with experiments valid + 2025-11-13 d50a571ddf test: fix tests to work with sizespecializedmalloc turned off + 2025-11-13 704f841eab cmd/trace: annotation proc start/stop with thread and proc always + 2025-11-13 17a02b9106 net/http: remove unused isLitOrSingle and isNotToken + 2025-11-13 ff61991aed cmd/go: fix flaky TestScript/mod_get_direct + 2025-11-13 129d0cb543 net/http/cgi: accept INCLUDED as protocol for server side includes + 2025-11-13 77c5130100 go/types: minor simplification + 2025-11-13 7601cd3880 go/types: generate cycles.go + 2025-11-13 7a372affd9 go/types, types2: rename definedType to declaredType and clarify docs Change-Id: Ibaa9bdb982364892f80e511c1bb12661fcd5fb86
2025-11-20cmd/compile: stack allocate backing stores during appendkhr@golang.org
We can already stack allocate the backing store during append if the resulting backing store doesn't escape. See CL 664299. This CL enables us to often stack allocate the backing store during append *even if* the result escapes. Typically, for code like: func f(n int) []int { var r []int for i := range n { r = append(r, i) } return r } the backing store for r escapes, but only by returning it. Could we operate with r on the stack for most of its lifeime, and only move it to the heap at the return point? The current implementation of append will need to do an allocation each time it calls growslice. This will happen on the 1st, 2nd, 4th, 8th, etc. append calls. The allocations done by all but the last growslice call will then immediately be garbage. We'd like to avoid doing some of those intermediate allocations if possible. We rewrite the above code by introducing a move2heap operation: func f(n int) []int { var r []int for i := range n { r = append(r, i) } r = move2heap(r) return r } Using the move2heap runtime function, which does: move2heap(r): If r is already backed by heap storage, return r. Otherwise, copy r to the heap and return the copy. Now we can treat the backing store of r allocated at the append site as not escaping. Previous stack allocation optimizations now apply, which can use a fixed-size stack-allocated backing store for r when appending. See the description in cmd/compile/internal/slice/slice.go for how we ensure that this optimization is safe. Change-Id: I81f36e58bade2241d07f67967d8d547fff5302b8 Reviewed-on: https://go-review.googlesource.com/c/go/+/707755 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-20runtime: select GC mark workers during start-the-worldMichael Pratt
When the GC starts today, procresize and startTheWorldWithSema don't consider the additional Ps required to run the mark workers. procresize and startTheWorldWithSema resume only the Ps necessary to run the normal user goroutines. Once those Ps start, findRunnable and findRunnableGCWorker determine that a GC worker is necessary and run the worker instead, calling wakep to wake another P to run the original user goroutine. This is unfortunate because it disrupts the intentional placement of Ps on Ms that procresize does. It also has the unfortunate side effect of slightly delaying start-the-world time, as it takes several sequential wakeps to get all Ps started. To address this, procresize explicitly assigns GC mark workers to Ps before starting the world. The assignment occurs _after_ selecting runnable Ps, so that we prefer to select Ps that were previously idle. Note that if fewer than 25% of Ps are idle then we won't be able to assign all dedicated workers, and some of the Ps intended for user goroutines will convert to dedicated workers once they reach findRunnableGCWorker. Also note that stack scanning temporarily suspends the goroutine. Resume occurs through ready, which will move the goroutine to the local runq of the P that did the scan. Thus there is still a source of migration at some point during the GC. For #65694. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest Change-Id: I6a6a636c51f39f4f4bc716aa87de68f6ebe163a5 Reviewed-on: https://go-review.googlesource.com/c/go/+/721002 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-11-20runtime: split findRunnableGCWorker in twoMichael Pratt
The first part, assignWaitingGCWorker selects a mark worker (if any) and assigns it to the P. The second part, findRunnableGCWorker, is responsible for actually marking the worker as runnable and updating the CPU limiter. The advantage of this split is that assignWaitingGCWorker is safe to do during STW, which will allow the next CL to make selections during procresize. This change is a semantic no-op in preparation for the next CL. For #65694. Change-Id: I6a6a636c8beb212185829946cfa1e49f706ac31a Reviewed-on: https://go-review.googlesource.com/c/go/+/721001 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2025-11-18runtime: add dlog.hexdumpAustin Clements
Change-Id: I8149ab9314216bb8f9bf58da55633e2587d75851 Reviewed-on: https://go-review.googlesource.com/c/go/+/681376 Auto-Submit: Austin Clements <austin@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-18runtime: add hexdumperAustin Clements
Currently, we have a simple hexdumpWords facility for debugging. It's useful but pretty limited. This CL adds a much more configurable and capable "hexdumper". It can be configured for any word size (including bytes), handles unaligned data, includes an ASCII dump, and accepts data in multiple slices. It also has a much nicer "mark" facility for annotating the hexdump that isn't limited to a single character per word. We use this to improve our existing hexdumps, particularly the new mark facility. The next CL will integrate hexdumps into debuglog, which will make use of several other new capabilities. Also this adds an actual test. The output looks like: 7 6 5 4 3 2 1 0 f e d c b a 9 8 0123456789abcdef 000000c00006ef70: 03000000 00000000 ........ 000000c00006ef80: 00000000 0053da80 000000c0 000bc380 ..S............. ^ <testing.tRunner.func2+0x0> 000000c00006ef90: 00000000 0053dac0 000000c0 000bc380 ..S............. ^ <testing.tRunner.func1+0x0> 000000c00006efa0: 000000c0 0006ef90 000000c0 0006ef80 ................ 000000c00006efb0: 000000c0 0006efd0 00000000 0053eb65 ........e.S..... ^ <testing.(*T).Run.gowrap1+0x25> 000000c00006efc0: 000000c0 000bc380 00000000 009aaae8 ................ 000000c00006efd0: 00000000 00000000 00000000 00496b01 .........kI..... ^ <runtime.goexit+0x1> 000000c00006efe0: 00000000 00000000 00000000 00000000 ................ 000000c00006eff0: 00000000 00000000 ........ The header gives column labels, indicating the order of bytes within the following words. The addresses on the left are always 16-byte aligned so it's easy to combine that address with the column header to determine the full address of a byte. Annotations are no longer interleaved with the data, so the data stays in nicely aligned columns. The annotations are also now much more flexible, including support for multiple annotations on the same word (not shown). Change-Id: I27e83800a1f6a7bdd3cc2c59614661a810a57d4d Reviewed-on: https://go-review.googlesource.com/c/go/+/681375 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Austin Clements <austin@google.com>
2025-11-18runtime: do not print recovered when double panic with the same valueYoulin Feng
Show the "[recovered, repanicked]" message only when it is repanicked after recovered. For the duplicated panics that not recovered, do not show this message. Fixes #76099 Change-Id: I87282022ebe44c6f6efbe3239218be4a2a7b1104 Reviewed-on: https://go-review.googlesource.com/c/go/+/716020 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2025-11-17cmd/internal/sys,internal/goarch,runtime: enable the use of compressed ↵Joel Sing
instructions on riscv64 Enable the use of compressed instructions on riscv64 by reducing the PC quantum to two bytes and reducing the minimum instruction length to two bytes. Change gostartcall on riscv64 to land at two times the PC quantum into goexit, so that we retain four byte alignment and revise the NOP instructions in goexit to ensure that they are never compressed. Additionally, adjust PCALIGN so that it correctly handles two byte offsets. Fixes #47560 Updates #71105 Cq-Include-Trybots: luci.golang.try:gotip-linux-riscv64 Change-Id: I4329a8fbfcb4de636aadaeadabb826bc22698640 Reviewed-on: https://go-review.googlesource.com/c/go/+/523477 Reviewed-by: Junyang Shao <shaojunyang@google.com> Reviewed-by: Mark Freeman <markfreeman@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
2025-11-17runtime: relax TestMemoryLimit on darwin a bit moreKeith Randall
Add 8MB more. Covers most of the failures watchflakes has seen. Fixes #73136 Change-Id: I593c599a9519b8b31ed0f401d4157d27ac692587 Reviewed-on: https://go-review.googlesource.com/c/go/+/708617 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@google.com>
2025-11-17runtime: clear frame pointer at thread entry pointsNick Ripley
There are a few places in the runtime where new threads enter Go code with a possibly invalid frame pointer. mstart is the entry point for new Ms, and rt0_go is the entrypoint for the program. As we try to introduce frame pointer unwinding in more places (e.g. for heap profiling in CL 540476 or for execution trace events on the system stack in CL 593835), we see these functions on the stack. We need to ensure that they have valid frame pointers. These functions are both considered the "top" (first) frame frame of the call stack, so this CL sets the frame pointer register to 0 in these functions. Updates #63630 Change-Id: I6a6a6964a9ebc6f68ba23d2616e5fb6f19677f97 Reviewed-on: https://go-review.googlesource.com/c/go/+/721020 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2025-11-17runtime: rename findrunnable references to findRunnableMichael Pratt
These cases were missed by CL 393880. Change-Id: I6a6a636cf0d97a4efcf4b9df766002ecef48b4de Reviewed-on: https://go-review.googlesource.com/c/go/+/721120 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-11-17runtime: hoist invariant code out of heapBitsSmallForAddrInlineArchana Ravindar
The first two instructions in heapBitsSmallForAddrInline are invariant for a given span and object and are called in a loop within ScanObjectsSmall which figures as a hot routine in profiles of some benchmark runs within sweet benchmark suite (x/benchmarks/sweet), Ideally it would have been great if the compiler hoisted this code out of the loop, Moving it out of inner loop manually gives gains (moving it entirely out of nested loop does not improve performance, in some cases it even regresses it perhaps due to the early loop exit). Tested with AMD64, ARM64, PPC64LE and S390x Fixes #76212 Change-Id: I49c3c826b9d7bf3125ffc42c8c174cce0ecc4cbf Reviewed-on: https://go-review.googlesource.com/c/go/+/718680 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-14runtime: usleep before stealing runnext only if not in syscallMichael Anthony Knyszek
In the scheduler's steal path, we usleep(3) before stealing a _Prunning P's runnext slot. Before CL 646198, we would not call usleep(3) if the P was in _Psyscall. After CL 646198, Ps with Gs in syscalls stay in _Prunning until stolen, meaning we might unnecessarily usleep(3) where we didn't before. This probably isn't a huge deal in most cases, but can cause some apparent slowdowns in microbenchmarks that frequently take the steal path while there are syscalling goroutines. Change-Id: I5bf3df10fe61cf8d7f0e9fe9522102de66faf344 Reviewed-on: https://go-review.googlesource.com/c/go/+/720441 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-11-14runtime: support runtime.freegc in size-specialized mallocs for noscan objectsthepudds
This CL is part of a set of CLs that attempt to reduce how much work the GC must do. See the design in https://go.dev/design/74299-runtime-freegc This CL updates the smallNoScanStub stub in malloc_stubs.go to reuse heap objects that have been freed by runtime.freegc calls, and generates the corresponding size-specialized code in malloc_generated.go. This CL only adds support in the specialized mallocs for noscan heap objects (objects without pointers). A later CL handles objects with pointers. While we are here, we leave a couple of breadcrumbs in mkmalloc.go on how to do the generation. Updates #74299 Change-Id: I2657622601a27211554ee862fce057e101767a70 Reviewed-on: https://go-review.googlesource.com/c/go/+/715761 Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-11-14runtime: add more precise test of assist credit handling for runtime.freegcthepudds
This CL is part of a set of CLs that attempt to reduce how much work the GC must do. See the design in https://go.dev/design/74299-runtime-freegc This CL adds a better test of assist credit handling when heap objects are being reused after a runtime.freegc call. The main approach is bracketing alloc/free pairs with measurements of the assist credit before and after, and hoping to see a net zero change in the assist credit. However, validating the desired behavior is perhaps a bit subtle. To help stabilize the measurements, we do acquirem in the test code to avoid being preempted during the measurements to reduce other code's ability to adjust the assist credit while we are measuring, and we also reduce GOMAXPROCS to 1. This test currently does fail if we deliberately introduce bugs in the runtime.freegc implementation such as if we: - never adjust the assist credit when reusing an object, or - always adjust the assist credit when reusing an object, or - deliberately mishandle internal fragmentation. The two main cases of current interest for testing runtime.freegc are when over the course of our bracketed measurements gcBlackenEnable is either true or false. The test attempts to exercise both of those case by running the GC continually in the background (which we can see seems effective based on logging and by how our deliberate bugs fail). This passes ~10K test executions locally via stress. A small note to the future: a previous incarnation of this test (circa patchset 11 of this CL) did not do acquirem but had an approach of ignoring certain measurements, which also was able to pass ~10K runs via stress. The current version in this CL is simpler, but recording the existence of the prior version here in case it is useful in the future. (Hopefully not.) Updates #74299 Change-Id: I46c7e0295d125f5884fee0cc3d3d31aedc7e5ff4 Reviewed-on: https://go-review.googlesource.com/c/go/+/717520 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-14runtime: add runtime.freegc to reduce GC workthepudds
This CL is part of a set of CLs that attempt to reduce how much work the GC must do. See the design in https://go.dev/design/74299-runtime-freegc This CL adds runtime.freegc: func freegc(ptr unsafe.Pointer, uintptr size, noscan bool) Memory freed via runtime.freegc is made immediately reusable for the next allocation in the same size class, without waiting for a GC cycle, and hence can dramatically reduce pressure on the GC. A sample microbenchmark included below shows strings.Builder operating roughly 2x faster. An experimental modification to reflect to use runtime.freegc and then using that reflect with json/v2 gave reported memory allocation reductions of -43.7%, -32.9%, -21.9%, -22.0%, -1.0% for the 5 official real-world unmarshalling benchmarks from go-json-experiment/jsonbench by the authors of json/v2, covering the CanadaGeometry through TwitterStatus datasets. Note: there is no intent to modify the standard library to have explicit calls to runtime.freegc, and of course such an ability would never be exposed to end-user code. Later CLs in this stack teach the compiler how to automatically insert runtime.freegc calls when it can prove it is safe to do so. (The reflect modification and other experimental changes to the standard library were just that -- experiments. It was very helpful while initially developing runtime.freegc to see more complex uses and closer-to-real-world benchmark results prior to updating the compiler.) This CL only addresses noscan span classes (heap objects without pointers), such as the backing memory for a []byte or string. A follow-on CL adds support for heap objects with pointers. If we update strings.Builder to explicitly call runtime.freegc on its internal buf after a resize operation (but without freeing the usually final incarnation of buf that will be returned to the user as a string), we can see some nice benchmark results on the existing strings benchmarks that call Builder.Write N times and then call Builder.String. Here, the (uncommon) case of a single Builder.Write is not helped (given it never resizes after first alloc if there is only one Write), but the impact grows such that it is up to ~2x faster as there are more resize operations due to more strings.Builder.Write calls: │ disabled.out │ new-free-20.txt │ │ sec/op │ sec/op vs base │ BuildString_Builder/1Write_36Bytes_NoGrow-4 55.82n ± 2% 55.86n ± 2% ~ (p=0.794 n=20) BuildString_Builder/2Write_36Bytes_NoGrow-4 125.2n ± 2% 115.4n ± 1% -7.86% (p=0.000 n=20) BuildString_Builder/3Write_36Bytes_NoGrow-4 224.0n ± 1% 188.2n ± 2% -16.00% (p=0.000 n=20) BuildString_Builder/5Write_36Bytes_NoGrow-4 239.1n ± 9% 205.1n ± 1% -14.20% (p=0.000 n=20) BuildString_Builder/8Write_36Bytes_NoGrow-4 422.8n ± 3% 325.4n ± 1% -23.04% (p=0.000 n=20) BuildString_Builder/10Write_36Bytes_NoGrow-4 436.9n ± 2% 342.3n ± 1% -21.64% (p=0.000 n=20) BuildString_Builder/100Write_36Bytes_NoGrow-4 4.403µ ± 1% 2.381µ ± 2% -45.91% (p=0.000 n=20) BuildString_Builder/1000Write_36Bytes_NoGrow-4 48.28µ ± 2% 21.38µ ± 2% -55.71% (p=0.000 n=20) See the design document for more discussion of the strings.Builder case. For testing, we add tests that attempt to exercise different aspects of the underlying freegc and mallocgc behavior on the reuse path. Validating the assist credit manipulations turned out to be subtle, so a test for that is added in the next CL. There are also invariant checks added, controlled by consts (primarily the doubleCheckReusable const currently). This CL also adds support in runtime.freegc for GODEBUG=clobberfree=1 to immediately overwrite freed memory with 0xdeadbeef, which can help a higher-level test fail faster in the event of a bug, and also the GC specifically looks for that pattern and throws a fatal error if it unexpectedly finds it. A later CL (currently experimental) adds GODEBUG=clobberfree=2, which uses mprotect (or VirtualProtect on Windows) to set freed memory to fault if read or written, until the runtime later unprotects the memory on the mallocgc reuse path. For the cases where a normal allocation is happening without any reuse, some initial microbenchmarks suggest the impact of these changes could be small to negligible (at least with GOAMD64=v3): goos: linux goarch: amd64 pkg: runtime cpu: AMD EPYC 7B13 │ base-512M-v3.bench │ ps16-512M-goamd64-v3.bench │ │ sec/op │ sec/op vs base │ Malloc8-16 11.01n ± 1% 10.94n ± 1% -0.68% (p=0.038 n=20) Malloc16-16 17.15n ± 1% 17.05n ± 0% -0.55% (p=0.007 n=20) Malloc32-16 18.65n ± 1% 18.42n ± 0% -1.26% (p=0.000 n=20) MallocTypeInfo8-16 18.63n ± 0% 18.36n ± 0% -1.45% (p=0.000 n=20) MallocTypeInfo16-16 22.32n ± 0% 22.65n ± 0% +1.50% (p=0.000 n=20) MallocTypeInfo32-16 23.37n ± 0% 23.89n ± 0% +2.23% (p=0.000 n=20) geomean 18.02n 18.01n -0.05% These last benchmark results include the runtime updates to support span classes with pointers (which was originally part of this CL, but later split out for ease of review). Updates #74299 Change-Id: Icceaa0f79f85c70cd1a718f9a4e7f0cf3d77803c Reviewed-on: https://go-review.googlesource.com/c/go/+/673695 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Junyang Shao <shaojunyang@google.com>
2025-11-14runtime: skip tests for GOEXPERIMENT=arenas that do not handle clobberfree=1thepudds
When run with GODEBUG=clobberfree=1, three out of seven of the top-level tests in runtime/arena_test.go fail with a SIGSEGV inside the clobberfree function where it is overwriting freed memory with 0xdeadbeef. This is not a new problem. For example, this crashes in Go 1.20: GODEBUG=clobberfree=1 go test runtime -run=TestUserArena It would be nice for all.bash to pass with GODEBUG=clobberfree=1, including it is useful for testing the automatic reclaiming of dead memory via runtime.freegc in #74299. Given the GOEXPERIMENT=arenas in #51317 is not planned to move forward (and is hopefully slated to be replace by regions before too long), for now we just skip those three tests in order to get all.bash passing with GODEBUG=clobberfree=1. Updates #74299 Change-Id: I384d96791157b30c73457d582a45dd74c5607ee0 Reviewed-on: https://go-review.googlesource.com/c/go/+/715080 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Junyang Shao <shaojunyang@google.com>
2025-11-14runtime: put AddCleanup cleanup arguments in their own allocationMichael Anthony Knyszek
Currently, AddCleanup just creates a simple closure that calls `cleanup(arg)` as the actual cleanup function tracked internally. However, the argument ends up getting its own allocation. If it's tiny, then it can also end up sharing a tiny allocation slot with the object we're adding the cleanup to. Since the closure is a GC root, we can end up with cleanups that never fire. This change refactors the AddCleanup machinery to make the storage for the argument separate and explicit. With that in place, it explicitly allocates 16 bytes of storage for tiny arguments to side-step the tiny allocator. One would think this would cause an increase in memory use and more bytes allocated, but that's actually wrong! It turns out that the current "simple closure" actually creates _two_ closures. By making the argument passing explicit, we eliminate one layer of closures, so this actually results in a slightly faster AddCleanup overall and 16 bytes less memory allocated. goos: linux goarch: amd64 pkg: runtime cpu: AMD EPYC 7B13 │ before.bench │ after.bench │ │ sec/op │ sec/op vs base │ AddCleanupAndStop-64 124.5n ± 2% 103.7n ± 2% -16.71% (p=0.002 n=6) │ before.bench │ after.bench │ │ B/op │ B/op vs base │ AddCleanupAndStop-64 48.00 ± 0% 32.00 ± 0% -33.33% (p=0.002 n=6) │ before.bench │ after.bench │ │ allocs/op │ allocs/op vs base │ AddCleanupAndStop-64 3.000 ± 0% 2.000 ± 0% -33.33% (p=0.002 n=6) This change does, however, does add 16 bytes of overhead to the cleanup special itself, and makes each cleanup block entry 24 bytes instead of 8 bytes. This means the overall memory overhead delta with this change is neutral, and we just have a faster cleanup. (Cleanup block entries are transient, so I suspect any increase in memory overhead there is negligible.) Together with CL 719960, fixes #76007. Change-Id: I81bf3e44339e71c016c30d80bb4ee151c8263d5c Reviewed-on: https://go-review.googlesource.com/c/go/+/720321 Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-14runtime: add AddCleanup benchmarkMichael Anthony Knyszek
Change-Id: Ia463a9b3b5980670bcf9297b4bddb60980ebfde5 Reviewed-on: https://go-review.googlesource.com/c/go/+/720320 Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org>