path: root/src/runtime
Age  Commit message  Author
2025-05-20  runtime: print blocking status of bubbled goroutines in stacks  (Damien Neil)
For goroutines in a synctest bubble, include whether the goroutine is "durably blocked" or not in the goroutine status. Synctest categorizes goroutines in certain states as "durably" blocked, where the goroutine is not merely idle but can only be awoken by another goroutine in its bubble. To make it easier for users to understand why a bubble is or is not idle, print the state of each bubbled goroutine. For example:

goroutine 36 [chan receive, synctest bubble 34, not durably blocked]:
goroutine 37 [chan receive (synctest), synctest bubble 34, durably blocked]:

Goroutine 36 is receiving from a channel created outside its bubble. Goroutine 37 is receiving from a channel created inside its bubble.

For #67434

Change-Id: I006b656a9ce7eeb75b2be21e748440a5dd57ceb0
Reviewed-on: https://go-review.googlesource.com/c/go/+/670976
Auto-Submit: Damien Neil <dneil@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2025-05-20  runtime: add package doc for checkfinalizer mode  (Michael Anthony Knyszek)
Fixes #72949. Change-Id: I114eda73c57bc7d596eb1656e738b80c1cbe5254 Reviewed-on: https://go-review.googlesource.com/c/go/+/662039 Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-20  runtime: report finalizer and cleanup queue length with checkfinalizer>0  (Michael Anthony Knyszek)
This change adds tracking for approximate finalizer and cleanup queue lengths. These lengths are reported once every GC cycle as a single line printed to stderr when GODEBUG=checkfinalizer>0. This change lays the groundwork for runtime/metrics metrics to produce the same values. For #72948. For #72950. Change-Id: I081721238a0fc4c7e5bee2dbaba6cfb4120d1a33 Reviewed-on: https://go-review.googlesource.com/c/go/+/671437 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-20  runtime: skip testprogcgo tests in race mode on freebsd  (Michael Pratt)
These were just enabled by https://go.dev/cl/643897, but freebsd unfortunately doesn't seem to support cgo + race mode by default. For #73788. Cq-Include-Trybots: luci.golang.try:gotip-freebsd-amd64-race Change-Id: I6a6a636c06176ca746548d0588283b1429d7c6d5 Reviewed-on: https://go-review.googlesource.com/c/go/+/674160 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2025-05-20  runtime: add scan trace for checkfinalizers>1  (Michael Anthony Knyszek)
This change dumps a scan trace (each pointer marked and where it came from) for the partial GC cycle performed by checkfinalizers mode when checkfinalizers>1. This is useful for quickly understanding why certain values are reachable without having to pull out tools like viewcore. For #72949. Change-Id: Ic583f80e9558cdfe1c667d27a1d975008dd39a9c Reviewed-on: https://go-review.googlesource.com/c/go/+/662038 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-20  runtime: mark and identify tiny blocks in checkfinalizers mode  (Michael Anthony Knyszek)
This change adds support for identifying cleanups and finalizers attached to tiny blocks to checkfinalizers mode. It also notes a subtle pitfall, which is that the cleanup arg, if tiny-allocated, could end up co-located with the object with the cleanup attached! Oops... For #72949. Change-Id: Icbe0112f7dcfc63f35c66cf713216796a70121ce Reviewed-on: https://go-review.googlesource.com/c/go/+/662037 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-05-20  runtime: annotate checkfinalizers reports with source and type info  (Michael Anthony Knyszek)
This change adds a new special kind called CheckFinalizer which is used to annotate finalizers and cleanups with extra information about where that cleanup or finalizer came from. For #72949. Change-Id: I3c1ace7bd580293961b7f0ea30345a6ce956d340 Reviewed-on: https://go-review.googlesource.com/c/go/+/662135 Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-20  runtime: add new GODEBUG checkfinalizer  (Michael Anthony Knyszek)
This new debug mode detects cleanup/finalizer leaks using checkmark mode. It runs a partial GC using only specials as roots. If the GC can find a path from one of these roots back to the object the special is attached to, then the object might never be reclaimed. (The cycle could be broken in the future, but it's almost certainly a bug.) This debug mode is very barebones. It contains no type information and no stack location for where the finalizer or cleanup was created. For #72949. Change-Id: Ibffd64c1380b51f281950e4cfe61f677385d42a5 Reviewed-on: https://go-review.googlesource.com/c/go/+/634599 Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2025-05-20  runtime: prevent unnecessary zeroing of large objects with pointers  (Michael Anthony Knyszek)
CL 614257 refactored mallocgc but lost an optimization: if a span for a large object is already backed by memory fresh from the OS (and thus zeroed), we don't need to zero it. CL 614257 unconditionally zeroed spans for large objects that contain pointers. This change restores the optimization from before CL 614257, which seems to matter in some real-world programs. While we're here, let's also fix a hole where the conservative scanner could observe the uninitialized memory of a large object before the object is published. The gory details are in a comment in heapSetTypeLarge. In short, this change makes span.largeType an atomic variable, such that the GC can only observe initialized memory if span.largeType != nil.

Fixes #72991.

Change-Id: I2048aeb220ab363d252ffda7d980b8788e9674dc
Reviewed-on: https://go-review.googlesource.com/c/go/+/659956
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
2025-05-20  runtime: only update freeIndexForScan outside of the mark phase  (Michael Anthony Knyszek)
Currently, it's possible for asynchronous preemption to observe a partially initialized object. The sequence of events goes like this:

- The GC is in the mark phase.
- Thread T1 is allocating object O1.
- Thread T1 zeroes the allocation, runs the publication barrier, and updates freeIndexForScan. It has not yet updated the mark bit on O1.
- Thread T2 is conservatively scanning some stack frame. That stack frame has a dead pointer with the same address as O1.
- T2 picks up the pointer, checks isFree (which checks freeIndexForScan without an import barrier), and sees that O1 is allocated. It marks and queues O1.
- T2 then goes to scan O1, and observes uninitialized memory.

Although a publication barrier was executed, T2 did not have an import barrier. T2 may thus observe T1's writes to zero the object out-of-order with the write to freeIndexForScan. Normally this would be impossible if T2 got a pointer to O1 from somewhere written by T1. The publication barrier guarantees that if the read side is data-dependent on the write side then we'd necessarily observe all writes to O1 before T1 published it. However, T2 got the pointer 'out of thin air' by scanning a stack frame with a dead pointer on it.

One fix to this problem would be to add the import barrier in the conservative scanner. We would then also need to put freeIndexForScan behind the publication barrier, or make the write to freeIndexForScan exactly that barrier. However, there's a simpler way: we don't actually care if conservative scanning observes a stale freeIndexForScan during the mark phase. Newly-allocated memory is always marked at the point of allocation (the allocate-black policy part of the GC's design). So it doesn't actually matter whether the garbage collector scans that memory or not.

This change modifies the allocator to only update freeIndexForScan outside the mark phase. This means freeIndexForScan is essentially a snapshot of freeindex at the point the mark phase started.
Because there's no more race between conservative scanning and newly-allocated objects, the complicated scenario above is no longer a possibility. One thing we do have to be careful of is other callers of isFree. Previously freeIndexForScan would always track freeindex, now it no longer does. This change thus introduces isFreeOrNewlyAllocated which is used by the conservative scanner, and uses freeIndexForScan. Meanwhile isFree goes back to using freeindex like it used to. This change also documents the requirement on isFree that the caller must have obtained the pointer not 'out of thin air' but after the object was published. isFree is not currently used anywhere particularly sensitive (heap dump and checkmark mode, where the world is stopped in both cases) so using freeindex is both conceptually simple and also safe. Change-Id: If66b8c536b775971203fb4358c17d711c2944723 Reviewed-on: https://go-review.googlesource.com/c/go/+/672340 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-19  cmd/compile: allocate backing store for append on the stack  (Keith Randall)
When appending, if the backing store doesn't escape and a constant-sized backing store is big enough, use a constant-sized stack-allocated backing store instead of allocating it from the heap. cmd/go is <0.1% bigger. As an example of how this helps, if you edit strings/strings.go:FieldsFunc to replace

    spans := make([]span, 0, 32)

with

    var spans []span

then this CL removes the first 2 allocations that are part of the growth sequence:

                             │    base     │     exp     │
                             │  allocs/op  │  allocs/op  vs base │
FieldsFunc/ASCII/16-24        3.000 ± ∞ ¹   2.000 ± ∞ ¹  -33.33% (p=0.008 n=5)
FieldsFunc/ASCII/256-24       7.000 ± ∞ ¹   5.000 ± ∞ ¹  -28.57% (p=0.008 n=5)
FieldsFunc/ASCII/4096-24     11.000 ± ∞ ¹   9.000 ± ∞ ¹  -18.18% (p=0.008 n=5)
FieldsFunc/ASCII/65536-24     18.00 ± ∞ ¹   16.00 ± ∞ ¹  -11.11% (p=0.008 n=5)
FieldsFunc/ASCII/1048576-24   30.00 ± ∞ ¹   28.00 ± ∞ ¹   -6.67% (p=0.008 n=5)
FieldsFunc/Mixed/16-24        2.000 ± ∞ ¹   2.000 ± ∞ ¹        ~ (p=1.000 n=5)
FieldsFunc/Mixed/256-24       7.000 ± ∞ ¹   5.000 ± ∞ ¹  -28.57% (p=0.008 n=5)
FieldsFunc/Mixed/4096-24     11.000 ± ∞ ¹   9.000 ± ∞ ¹  -18.18% (p=0.008 n=5)
FieldsFunc/Mixed/65536-24     18.00 ± ∞ ¹   16.00 ± ∞ ¹  -11.11% (p=0.008 n=5)
FieldsFunc/Mixed/1048576-24   30.00 ± ∞ ¹   28.00 ± ∞ ¹   -6.67% (p=0.008 n=5)

(Of course, people have spotted and fixed a bunch of allocation sites like this, but now we're ~automatically doing it everywhere going forward.) No significant increases in frame sizes in cmd/go.

Change-Id: I301c4d9676667eacdae0058960321041d173751a
Reviewed-on: https://go-review.googlesource.com/c/go/+/664299
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-05-19  runtime: disable TestSegv in race mode  (Michael Pratt)
This was just enabled in CL 643897. It seems to work fine on Linux, but there are traceback issues on Darwin. We could disable just on Darwin, but I'm not sure SIGSEGV inside of TSAN is something we care to support. Fixes #73784. Cq-Include-Trybots: luci.golang.try:gotip-darwin-arm64-race Change-Id: I6a6a636cb15d7affaeb22c4c13d8f2a5c9bb31fd Reviewed-on: https://go-review.googlesource.com/c/go/+/674276 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2025-05-19  runtime: rename ncpu to numCPUStartup  (Michael Pratt)
ncpu is the total logical CPU count at startup. It is never updated. For #73193, we will start using updated CPU counts for updated GOMAXPROCS, making the ncpu name a bit ambiguous. Change to a less ambiguous name. While we're at it, give the OS specific lookup functions a common name, so it can be used outside of osinit later. For #73193. Change-Id: I6a6a636cf21cc60de36b211f3c374080849fc667 Reviewed-on: https://go-review.googlesource.com/c/go/+/672277 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2025-05-19  runtime: move atoi to internal/runtime/strconv  (Michael Pratt)
Moving to a smaller package allows its use in other internal/runtime packages. This isn't internal/strconvlite since it can't be used directly by strconv. For #73193. Change-Id: I6a6a636c9c8b3f06b5fd6c07fe9dd5a7a37d1429 Reviewed-on: https://go-review.googlesource.com/c/go/+/672697 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2025-05-19  runtime: check for gsignal in asancall/msancall/racecall  (Michael Pratt)
asancall and msancall are reachable from the signal handler, where we are running on gsignal. Currently, these calls will use the g0 stack in this case, but if the interrupted code was running on g0 this will corrupt the stack and likely cause a crash. As far as I know, racecall is not reachable from the signal handler, but I have updated it as well for consistency. This is the most straightforward fix, though it would be nice to eventually migrate these wrappers to asmcgocall, which already handled this case. Fixes #71395. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-asan-clang15,gotip-linux-amd64-msan-clang15,gotip-linux-amd64-race Change-Id: I6a6a636ccba826dd53e31c0e85b5d42fb1e98d12 Reviewed-on: https://go-review.googlesource.com/c/go/+/643875 Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-19  runtime: pass through -asan/-msan/-race to testprog tests  (Michael Pratt)
The tests using testprog / testprogcgo are currently not covered on the asan/msan/race builders because they don't build testprog with the sanitizer flag. Explicitly pass the flag if the test itself is built with the sanitizer. There were a few tests that explicitly passed -race (even on non-race builders). These tests will now only run on race builders. For #71395. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-asan-clang15,gotip-linux-amd64-msan-clang15,gotip-linux-amd64-race Change-Id: I6a6a636ce8271246316a80d426c0e4e2f6ab99c5 Reviewed-on: https://go-review.googlesource.com/c/go/+/643897 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2025-05-19  std: pass bytes.Buffer and strings.Builder by pointer  (Alan Donovan)
This CL fixes a number of (all true positive) findings of vet's copylock analyzer patched to treat the Bu{ff,uild}er types as non-copyable after first use. This does require imposing an additional indirection between noder.writer and Encoder since the field is embedded by value but its constructor now returns a pointer. Updates golang/go#25907 Updates golang/go#47276 Change-Id: I0b4d77ac12bcecadf06a91709e695365da10766c Reviewed-on: https://go-review.googlesource.com/c/go/+/635339 Reviewed-by: Robert Findley <rfindley@google.com> Commit-Queue: Alan Donovan <adonovan@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Alan Donovan <adonovan@google.com>
2025-05-16  runtime: prevent cleanup goroutines from missing work  (Michael Anthony Knyszek)
Currently, there's a window of time where each cleanup goroutine has committed to going to sleep (immediately after full.pop() == nil) but hasn't yet marked itself as asleep (state.sleep()). If new work arrives in this window, it might get missed. This is what we see in #73642, and I can reproduce it with stress2. Side-note: even if the work gets missed by the existing sleeping goroutines, needg is incremented. So in theory a new goroutine will handle the work. Right now that doesn't happen in tests like the one running in #73642, where there might never be another call to AddCleanup to create the additional goroutine. Also, if we've hit the maximum on cleanup goroutines and all of them are in this window simultaneously, we can still end up missing work, it's just more rare. So this is still a problem even if we choose to just be more aggressive about creating new cleanup goroutines. This change fixes the problem and also aims to make the cleanup wake/sleep code clearer. The way this change fixes this problem is to have cleanup goroutines re-check the work list before going to sleep, but after having already marked themselves as sleeping. This way, if new work comes in before the cleanup goroutine marks itself as going to sleep, we can rely on the re-check to pick up that work. If new work comes after the goroutine marks itself as going to sleep and after the re-check, we can rely on the scheduler noticing that the goroutine is asleep and waking it up. If work comes in between a goroutine marking itself as sleeping and the re-check, then the re-check will catch that piece of work. However, the scheduler might now get a false signal that the goroutine is asleep and try to wake it up. This is OK. The sleeping signal is now mutated and double-checked under the queue lock, so the scheduler will grab the lock, may notice there are no sleeping goroutines, and go on its way. This may cause spurious lock acquisitions but it should be very rare. 
The window between a cleanup goroutine marking itself as going to sleep and re-checking the work list is a handful of instructions at most. This seems subtle but overall it's a simplification of the code. We rely more on the lock, which is easier to reason about, and we track two separate atomic variables instead of the merged cleanupSleepState: the length of the full list, and the number of cleanup goroutines that are asleep. The former is now the primary way to acquire work. Cleanup goroutines must decrement the length successfully to obtain an item off the full list. The number of cleanup goroutines asleep, meanwhile, is now only updated with the queue lock held. It can be checked without the lock held, and the invariant to make that safe is simple: it must always be an overestimate of the number of sleeping cleanup goroutines. The changes here do change some other behaviors. First, since we're tracking the length of the full list instead of the abstract concept of a wake-up, the waker can't consume wake-ups anymore. This means that cleanup goroutines may be created more aggressively. If two threads in the scheduler see that there are goroutines that are asleep, only one will win the race, but the other will observe zero asleep goroutines but potentially many work units available. This will cause it to signal many goroutines to be created. This is OK since we have a cap on the number of cleanup goroutines, and the race should be relatively rare. Second, because cleanup goroutines can now fail to go to sleep if any units of work come in, they might spend more time contended on the lock. For example, if we have N cleanup goroutines and work comes in at *just* the wrong rate, in the worst case we'll have each of G goroutines loop N times for N blocks, resulting in O(G*N) thread time to handle each block in the worst case. 
To paint a picture, imagine each goroutine trying to go to sleep and failing because a new block of work came in, with only one goroutine actually getting that block. Then, once that goroutine is done, they all try again, fail because another new block of work came in, and so on and so forth. This case is unlikely, though, and probably not worth worrying about until it actually becomes a problem. (A similar problem exists with parking, and existed before this change too, but at least in that case each goroutine parks, so it doesn't block the thread.)

Fixes #73642.

Change-Id: I6bbe1b789e7eb7e8168e56da425a6450fbad9625
Reviewed-on: https://go-review.googlesource.com/c/go/+/671676
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2025-05-15  runtime/pprof: include PCs for deduplication in TestMutexBlockFullAggregation  (Nick Ripley)
TestMutexBlockFullAggregation aggregates stacks by function, file, and line number. But there can be multiple function calls on the same line, giving us different sequences of PCs. This causes the test to spuriously fail in some cases. Include PCs in the stacks for this test. Also pick up a small "range over int" modernize suggestion while we're looking at the test. Fixes #73641 Change-Id: I50489e19fcf920e27b9eebd9d4b35feb89981cbc Reviewed-on: https://go-review.googlesource.com/c/go/+/673115 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-14  runtime: help the race detector detect possible concurrent cleanups  (Michael Anthony Knyszek)
This change makes it so that cleanup goroutines, in race mode, create a fake race context and switch to it, emulating cleanups running on new goroutines. This helps in catching races between cleanups that might run concurrently. Change-Id: I4c4e33054313798d4ac4e5d91ff2487ea3eb4b16 Reviewed-on: https://go-review.googlesource.com/c/go/+/652635 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-05-14  runtime: improve scan inner loop  (Keith Randall)
On every arch except amd64, it is faster to do x&(x-1) than x^(1<<n). Most archs need 3 instructions for the latter: MOV $1, R; SLL n, R; ANDN R, x. Maybe 4 if there's no ANDN. Most archs need only 2 instructions to do x&(x-1). It takes 3 on x86/amd64 because NEG only works in place. Only amd64 can do x^(1<<n) in a single instruction. (We could on 386 also, but that's currently not implemented.) Change-Id: I3b74b7a466ab972b20a25dbb21b572baf95c3467 Reviewed-on: https://go-review.googlesource.com/c/go/+/672956 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-14  runtime: Improvement in perf of s390x memclr  (kmvijay)
The memclr routine for the s390x architecture is now implemented with vector operations, and loop unrolling is used for larger sizes.

goos: linux
goarch: s390x
pkg: runtime

                          | old.txt | new_final.txt |
                          | sec/op  | sec/op  vs base |
Memclr/5                  2.485n ±  5%   2.421n ± 0%   -2.54% (p=0.000 n=10)
Memclr/16                 3.037n ±  2%   2.969n ± 0%   -2.26% (p=0.001 n=10)
Memclr/64                 9.623n ±  0%   4.455n ± 1%  -53.70% (p=0.000 n=10)
Memclr/256                3.347n ±  3%   3.312n ± 4%        ~ (p=0.670 n=10)
Memclr/4096               15.53n ±  0%   15.54n ± 0%   +0.06% (p=0.000 n=10)
Memclr/65536              329.8n ±  2%   228.4n ± 0%  -30.74% (p=0.000 n=10)
Memclr/1M                 13.09µ ±  0%   12.78µ ± 0%   -2.34% (p=0.000 n=10)
Memclr/4M                 52.33µ ±  0%   51.16µ ± 0%   -2.24% (p=0.000 n=10)
Memclr/8M                 104.6µ ±  0%   102.3µ ± 0%   -2.20% (p=0.000 n=10)
Memclr/16M                209.4µ ±  0%   204.9µ ± 0%   -2.17% (p=0.000 n=10)
Memclr/64M                977.8µ ±  0%   967.8µ ± 0%   -1.02% (p=0.000 n=10)
MemclrUnaligned/0_5       3.398n ±  0%   3.657n ± 0%   +7.62% (p=0.000 n=10)
MemclrUnaligned/0_16      3.957n ±  0%   3.958n ± 0%        ~ (p=0.325 n=10)
MemclrUnaligned/0_64     11.550n ±  0%   5.139n ± 0%  -55.51% (p=0.000 n=10)
MemclrUnaligned/0_256     4.288n ±  0%   4.025n ± 4%   -6.14% (p=0.000 n=10)
MemclrUnaligned/0_4096    15.53n ±  0%   15.53n ± 0%        ~ (p=1.000 n=10)
MemclrUnaligned/0_65536   318.3n ±  1%   233.9n ± 0%  -26.52% (p=0.000 n=10)
MemclrUnaligned/1_5       3.398n ±  0%   3.657n ± 0%   +7.62% (p=0.000 n=10)
MemclrUnaligned/1_16      3.965n ±  0%   3.969n ± 0%   +0.10% (p=0.000 n=10)
MemclrUnaligned/1_64     11.550n ±  0%   5.109n ± 0%  -55.76% (p=0.000 n=10)
MemclrUnaligned/1_256     4.385n ±  0%   4.174n ± 1%   -4.80% (p=0.000 n=10)
MemclrUnaligned/1_4096    26.23n ±  0%   26.24n ± 0%   +0.04% (p=0.005 n=10)
MemclrUnaligned/1_65536   570.5n ±  0%   401.3n ± 0%  -29.66% (p=0.000 n=10)
MemclrUnaligned/4_5       3.398n ±  0%   3.657n ± 0%   +7.62% (p=0.000 n=10)
MemclrUnaligned/4_16      3.965n ±  0%   3.973n ± 1%   +0.19% (p=0.000 n=10)
MemclrUnaligned/4_64     11.550n ±  0%   5.131n ± 0%  -55.58% (p=0.000 n=10)
MemclrUnaligned/4_256     4.419n ±  0%   4.187n ± 1%   -5.25% (p=0.000 n=10)
MemclrUnaligned/4_4096    26.23n ±  0%   26.24n ± 0%   +0.04% (p=0.011 n=10)
MemclrUnaligned/4_65536   570.5n ±  0%   401.2n ± 0%  -29.67% (p=0.000 n=10)
MemclrUnaligned/7_5       3.397n ±  0%   3.657n ± 0%   +7.65% (p=0.000 n=10)
MemclrUnaligned/7_16      3.965n ±  0%   3.969n ± 0%   +0.10% (p=0.000 n=10)
MemclrUnaligned/7_64     11.550n ±  0%   5.120n ± 0%  -55.67% (p=0.000 n=10)
MemclrUnaligned/7_256     4.407n ±  0%   4.188n ± 2%   -4.99% (p=0.000 n=10)
MemclrUnaligned/7_4096    26.24n ±  0%   26.24n ± 0%        ~ (p=1.000 n=10)
MemclrUnaligned/7_65536   570.8n ±  0%   401.3n ± 0%  -29.69% (p=0.000 n=10)
MemclrUnaligned/0_1M      13.08µ ±  0%   12.81µ ± 0%   -2.06% (p=0.000 n=10)
MemclrUnaligned/0_4M      52.28µ ±  0%   51.13µ ± 0%   -2.21% (p=0.000 n=10)
MemclrUnaligned/0_8M      104.6µ ±  0%   102.3µ ± 0%   -2.18% (p=0.000 n=10)
MemclrUnaligned/0_16M     209.5µ ±  0%   204.8µ ± 0%   -2.24% (p=0.000 n=10)
MemclrUnaligned/0_64M     977.7µ ±  0%   969.1µ ± 0%   -0.88% (p=0.000 n=10)
MemclrUnaligned/1_1M      17.49µ ±  0%   16.04µ ± 0%   -8.32% (p=0.000 n=10)
MemclrUnaligned/1_4M      69.92µ ±  0%   64.13µ ± 0%   -8.28% (p=0.000 n=10)
MemclrUnaligned/1_8M      139.8µ ±  0%   128.2µ ± 0%   -8.32% (p=0.000 n=10)
MemclrUnaligned/1_16M     279.9µ ±  0%   256.1µ ± 0%   -8.50% (p=0.000 n=10)
MemclrUnaligned/1_64M     1.250m ±  0%   1.216m ± 0%   -2.73% (p=0.000 n=10)
MemclrUnaligned/4_1M      17.50µ ±  0%   16.04µ ± 0%   -8.33% (p=0.000 n=10)
MemclrUnaligned/4_4M      69.93µ ±  0%   64.12µ ± 0%   -8.30% (p=0.000 n=10)
MemclrUnaligned/4_8M      139.8µ ±  0%   128.2µ ± 0%   -8.32% (p=0.000 n=10)
MemclrUnaligned/4_16M     280.2µ ±  0%   256.2µ ± 0%   -8.55% (p=0.000 n=10)
MemclrUnaligned/4_64M     1.250m ±  0%   1.216m ± 0%   -2.73% (p=0.000 n=10)
MemclrUnaligned/7_1M      17.50µ ±  0%   16.04µ ± 0%   -8.35% (p=0.000 n=10)
MemclrUnaligned/7_4M      69.92µ ±  0%   64.13µ ± 0%   -8.28% (p=0.000 n=10)
MemclrUnaligned/7_8M      139.8µ ±  0%   128.2µ ± 0%   -8.34% (p=0.000 n=10)
MemclrUnaligned/7_16M     279.6µ ±  0%   256.2µ ± 0%   -8.35% (p=0.000 n=10)
MemclrUnaligned/7_64M     1.250m ±  0%   1.216m ± 0%   -2.73% (p=0.000 n=10)
MemclrRange/1K_2K         1.053µ ±  0%   1.020µ ± 1%   -3.09% (p=0.000 n=10)
MemclrRange/2K_8K         1.552µ ±  0%   1.570µ ± 12%       ~ (p=0.137 n=10)
MemclrRange/4K_16K        1.283µ ±  0%   1.250µ ± 0%   -2.61% (p=0.000 n=10)
MemclrRange/160K_228K     20.62µ ±  0%   19.86µ ± 0%   -3.70% (p=0.000 n=10)
MemclrKnownSize1          1.732n ±  0%   1.732n ± 0%        ~ (p=1.000 n=10)
MemclrKnownSize2          1.925n ± 34%   1.967n ± 8%        ~ (p=0.080 n=10)
MemclrKnownSize4          1.808n ±  3%   1.732n ± 0%   -4.20% (p=0.000 n=10)
MemclrKnownSize8          2.002n ±  9%   1.773n ± 5%  -11.46% (p=0.000 n=10)
MemclrKnownSize16         2.880n ±  5%   2.461n ± 5%  -14.53% (p=0.000 n=10)
MemclrKnownSize32         8.082n ±  0%   2.838n ± 5%  -64.88% (p=0.000 n=10)
MemclrKnownSize64         8.083n ±  0%   4.960n ± 4%  -38.63% (p=0.000 n=10)
MemclrKnownSize112        8.082n ±  0%   5.533n ± 1%  -31.53% (p=0.000 n=10)
MemclrKnownSize128        8.082n ±  0%   5.534n ± 1%  -31.54% (p=0.000 n=10)
MemclrKnownSize192        8.082n ±  0%   6.833n ± 2%  -15.45% (p=0.000 n=10)
MemclrKnownSize248        8.082n ±  0%   7.165n ± 1%  -11.34% (p=0.000 n=10)
MemclrKnownSize256        2.995n ±  6%   3.226n ± 4%   +7.70% (p=0.006 n=10)
MemclrKnownSize512        3.356n ±  8%   3.595n ± 3%   +7.14% (p=0.007 n=10)
MemclrKnownSize1024       4.664n ±  0%   4.665n ± 0%        ~ (p=0.426 n=10)
MemclrKnownSize4096       15.80n ±  4%   15.15n ± 0%        ~ (p=0.449 n=10)
MemclrKnownSize512KiB     6.543µ ±  0%   6.380µ ± 0%   -2.48% (p=0.000 n=10)
geomean                   327.2n         286.6n       -12.42%

Change-Id: I0f8450743e2f7e736c5ff96a316a8b5d98b27222
Reviewed-on: https://go-review.googlesource.com/c/go/+/662475
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-14  runtime: increase freebsd/amd64 pointer size from 48 to 57 bits  (khr@golang.org)
Because freebsd is now enabling la57 by default. Fixes #49405 Change-Id: I30f7bac8b8a9baa85e0c097e06072c19ad474e5a Reviewed-on: https://go-review.googlesource.com/c/go/+/670715 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-13  runtime/pprof: return errors from writing profiles  (Sean Liao)
Fixes #73107 Change-Id: I41f3e1bd1fdaca2f0e94151b2320bd569e258a51 Reviewed-on: https://go-review.googlesource.com/c/go/+/671576 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-05-13  cmd/go: support -json flag in go version  (xieyuschen)
It supports features described in the issue:

* add -json flag for 'go version -m' to print json encoding of runtime/debug.BuildSetting to standard output.
* report an error when specifying -json flag without -m.
* print build settings on a separate line for each binary

Fixes #69712

Change-Id: I79cba2109f80f7459252d197a74959694c4eea1f
Reviewed-on: https://go-review.googlesource.com/c/go/+/619955
Reviewed-by: Sam Thanawalla <samthanawalla@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-12  runtime: only poll network from one P at a time in findRunnable  (Carlos Amedee)
This change reintroduces CL 564197. It was reverted due to a failing benchmark. That failure has been resolved. For #65064 Change-Id: Ic88841d2bc24c2717ad324873f0f52699f21dc66 Reviewed-on: https://go-review.googlesource.com/c/go/+/669235 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-05-12  runtime: add goschedIfBusy to bgsweep to prevent livelock after inlining  (ArsenySamoylov)
gcMarkTermination() ensures that all caches are flushed before continuing the GC cycle, thus preempting all goroutines. However, inlining calls to lock() in bgsweep makes it non-preemptible for most of the time, leading to livelock. This change adds explicit preemption to avoid this. Fixes #73499. Change-Id: I4abf0d658f3d7a03ad588469cd013a0639de0c8a Reviewed-on: https://go-review.googlesource.com/c/go/+/668795 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-05-08runtime: avoid overflow in mutex delay calculationRhys Hiltner
If cputicks is in the top quarter of the int64's range, adding two values together will overflow and confuse the subsequent calculations, leading to zero-duration contention events in the profile. This fixes the TestRuntimeLockMetricsAndProfile failures on the linux-s390x builder. Change-Id: Icb814c39a8702379dfd71c06a53b2618e3589e07 Reviewed-on: https://go-review.googlesource.com/c/go/+/671115 Reviewed-by: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Rhys Hiltner <rhys.hiltner@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-05-08runtime: remove ptr/scalar bitmap metrickhr@golang.org
We don't use this mechanism any more, so the metric will always be zero. Since CL 616255. Update #73628 Change-Id: Ic179927a8bc24e6291876c218d88e8848b057c2a Reviewed-on: https://go-review.googlesource.com/c/go/+/671096 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-08runtime: schedule cleanups across multiple goroutinesMichael Anthony Knyszek
This change splits the finalizer and cleanup queues and implements a new lock-free blocking queue for cleanups. The basic design is as follows: The cleanup queue is organized in fixed-sized blocks. Individual cleanup functions are queued, but only whole blocks are dequeued. Enqueuing cleanups places them in P-local cleanup blocks. These are flushed to the full list as they get full. Cleanups can only be enqueued by an active sweeper. Dequeuing cleanups always dequeues entire blocks from the full list. Cleanup blocks can be dequeued and executed at any time. The very last active sweeper in the sweep phase is responsible for flushing all local cleanup blocks to the full list. It can do this without any synchronization because the next GC can't start yet, so we can be very certain that nobody else will be accessing the local blocks. Cleanup blocks are stored off-heap because they need to be allocated by the sweeper, which is called from heap allocation paths. As a result, the GC treats cleanup blocks as roots, just like finalizer blocks. Flushes to the full list signal to the scheduler that cleanup goroutines should be awoken. Every time the scheduler goes to wake up a cleanup goroutine and there were more signals than goroutines to wake, it then forwards this signal to runtime.AddCleanup, so that it creates another goroutine the next time it is called, up to gomaxprocs goroutines. The signals here are a little convoluted, but exist because the sweeper and the scheduler cannot safely create new goroutines. For #71772. For #71825. Change-Id: Ie839fde2b67e1b79ac1426be0ea29a8d923a62cc Reviewed-on: https://go-review.googlesource.com/c/go/+/650697 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2025-05-08runtime: fix condition to emit gcpacertrace end-of-sweep lineMichael Anthony Knyszek
It's the job of the last sweeper to emit the GC pacer trace. The last sweeper can identify themselves by reducing the count of sweepers, and also seeing that there's no more sweep work. Currently this identification is broken, however, because the last sweeper doesn't check the state they just transitioned sweeping into, but rather the state they transitioned from (one sweeper, no sweep work left). By design, it's impossible to transition *out* of this state, except for another GC to start, but that doesn't take this codepath. This means lines like pacer: sweep done at heap size ... were missing from the gcpacertrace output for a long time. This change fixes this problem by having the last sweeper check the state they just transitioned sweeping to, instead of the state they transitioned from. Change-Id: I44bcd32fe2c8ae6ac6c21ba6feb2e7b9e17f60cc Reviewed-on: https://go-review.googlesource.com/c/go/+/670735 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-05-07runtime: fix tag pointers on aix, take 2Keith Randall
Previous fix in CL 667715 wasn't correct for aix. Change-Id: I44042786079463967165507b15756cf24b9a213a Reviewed-on: https://go-review.googlesource.com/c/go/+/668036 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-07runtime: use "bubble" terminology for synctestDamien Neil
We've settled on calling the group of goroutines started by synctest.Run a "bubble". At the time the runtime implementation was written, I was still calling this a "group". Update the code to match the current terminology. Change-Id: I31b757f31d804b5d5f9564c182627030a9532f4a Reviewed-on: https://go-review.googlesource.com/c/go/+/670135 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Damien Neil <dneil@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-07runtime, testing/synctest: stop advancing time when main goroutine exitsDamien Neil
Once the goroutine started by synctest.Run exits, stop advancing the fake clock in its bubble. This avoids confusing situations where a bubble remains alive indefinitely while a background goroutine reads from a time.Ticker or otherwise advances the clock. For #67434 Change-Id: Id608ffe3c7d7b07747b56a21f365787fb9a057d7 Reviewed-on: https://go-review.googlesource.com/c/go/+/662155 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Damien Neil <dneil@google.com>
2025-05-07runtime: remove GODEBUG=runtimecontentionstacksRhys Hiltner
Go 1.22 promised to remove the setting in a future release once the semantics of runtime-internal lock contention matched that of sync.Mutex. That work is done, remove the setting. Previously reviewed as https://go.dev/cl/585639. For #66999 Change-Id: I9fe62558ba0ac12824874a0bb1b41efeb7c0853f Reviewed-on: https://go-review.googlesource.com/c/go/+/668995 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-05-07runtime: verify attribution of mutex delayRhys Hiltner
Have the test use the same clock (cputicks) as the profiler, and use the test's own measurements as hard bounds on the magnitude to expect in the profile. Compare the depiction of two users of the same lock: one where the critical section is fast, one where it is slow. Confirm that the profile shows the slow critical section as a large source of delay (with #66999 fixed), rather than showing the fast critical section as a large recipient of delay. Previously reviewed as https://go.dev/cl/586237. For #66999 Change-Id: Ic2d78cc29153d5322577d84abdc448e95ed8f594 Reviewed-on: https://go-review.googlesource.com/c/go/+/667616 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
2025-05-07runtime: blame unlocker for mutex delayRhys Hiltner
Correct how the mutex contention profile reports on runtime-internal mutex values, to match sync.Mutex's semantics. Decide at the start of unlock2 whether we'd like to collect a contention sample. If so: Opt in to a slightly slower unlock path which avoids accidentally accepting blame for delay caused by other Ms. Release the lock before doing an O(N) traversal of the stack of waiting Ms, to calculate the total delay to those Ms that our critical section caused. Report that, with the current callstack, in the mutex profile. Fixes #66999 Change-Id: I561ed8dc120669bd045d514cb0d1c6c99c2add04 Reviewed-on: https://go-review.googlesource.com/c/go/+/667615 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-05-07cmd,runtime: enable race detector on loong64Guoqi Chen
The race feature depends on llvm. And support for building the tsan library on linux/loong64 has been added in this patch [1], which has been merged into the branch main and has landed in llvm18. The support for linux/loong64 in racebuild has been implemented in CL 655775, now racebuild can successfully build race_linux_loong64.syso [2]. [1]: https://github.com/llvm/llvm-project/pull/72819 [2]: racebuild -platforms linux/loong64 -cherrypick 'refs/changes/16/543316/10' \ -rev 83fe85115da9dc25fa270d2ea8140113c8d49670 \ -goroot /home/golang/src/go Co-authored-by: Xiaolin Zhao <zhaoxiaolin@loongson.cn> Change-Id: If389318215476890295ed771297c6c088cfc84b3 Reviewed-on: https://go-review.googlesource.com/c/go/+/543316 Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn> Reviewed-by: Junyang Shao <shaojunyang@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Meidan Li <limeidan@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
2025-05-07runtime: print stack traces for bubbled goroutines on synctest deadlockDamien Neil
When synctest.Run panics due to every goroutine in the bubble being blocked, print a stack trace for every goroutine in the bubble. For #67434 Change-Id: Ie751c2ee6fa136930b18f4bee0277ff30da46905 Reviewed-on: https://go-review.googlesource.com/c/go/+/645719 Auto-Submit: Damien Neil <dneil@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-07runtime: improve Error documentationMichael Pratt
The current Error documentation is vacuous and doesn't say anything about what this interface is actually for. Expand to include its meaning and why it might be used. Change-Id: I6a6a636cbd5f5788cb9d1a88845de16b98f7424b Reviewed-on: https://go-review.googlesource.com/c/go/+/670635 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2025-05-06unique: use a bespoke canonicalization map and runtime.AddCleanupMichael Anthony Knyszek
This change moves the unique package away from using a concurrent map and instead toward a bespoke concurrent canonicalization map. The map holds all its keys weakly, though keys may be looked up by value. The result is the strong pointer for the canonical value. Entries in the map are automatically cleaned up once the canonical reference no longer exists. Why do this? There's a problem with the current implementation when it comes to chains of unique.Handle: because the unique map will have a unique.Handle stored in its keys, each nested handle must be cleaned up 1 GC at a time. It takes N GC cycles, at minimum, to clean up a nested chain of N handles. This implementation, where the *only* value in the set is weakly-held, does not have this problem. The entire chain is dropped at once. The canon map implementation is a stripped-down version of HashTrieMap. The weak set implementation also has lower memory overheads by virtue of the fact that keys are all stored weakly. Whereas the previous map had both a T and a weak.Pointer[T], this *only* has a weak.Pointer[T]. The canonicalization map is a better abstraction overall and dramatically simplifies the unique.Make code. While we're here, delete the background goroutine and switch to runtime.AddCleanup. This is a step toward fixing #71772. We still need some kind of back-pressure mechanism, which will be implemented in a follow-up CL. For #71772. Fixes #71846. Change-Id: I5b2ee04ebfc7f6dd24c2c4a959dd0f6a8af24ca4 Reviewed-on: https://go-review.googlesource.com/c/go/+/650256 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2025-05-06runtime: replace mentions of "raised" with "panicked"Mark Freeman
Fixes #73526 Change-Id: I4b801cf3e54b99559e6d5ca8fdb2fd0692a0d3a5 Reviewed-on: https://go-review.googlesource.com/c/go/+/669975 TryBot-Bypass: Mark Freeman <mark@golang.org> Reviewed-by: Robert Griesemer <gri@google.com> Auto-Submit: Mark Freeman <mark@golang.org> Reviewed-by: Mark Freeman <mark@golang.org>
2025-05-05Revert "cmd/compile: allow all of the preamble to be preemptible"Keith Randall
This reverts commits 3f3782feed6e0726ddb08afd32dad7d94fbb38c6 (CL 648518) b386b628521780c048af14a148f373c84e687b26 (CL 668475) Fixes #73542 Change-Id: I218851c5c0b62700281feb0b3f82b6b9b97b910d Reviewed-on: https://go-review.googlesource.com/c/go/+/670055 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-02runtime: clear frame pointer in morestackMichael Pratt
Corollary to CL 669615. morestack uses the frame pointer from g0.sched.bp. This doesn't really make any sense. morestack wasn't called by whatever used g0 last, so at best unwinding will get misleading results. For #63630. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-arm64-longtest Change-Id: I6a6a636c3a2994eb88f890c506c96fd899e993a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/669616 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Nick Ripley <nick.ripley@datadoghq.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-05-02runtime: don't restore from g0.sched in systemstack on arm64Michael Pratt
On arm64, systemstack restores the frame pointer from g0.sched to R29 prior to calling the callback. That doesn't really make any sense. The frame pointer value in g0.sched is some arbitrary BP from a prior context save, but that is not the caller of systemstack. amd64 does not do this. In fact, it leaves BP completely unmodified so frame pointer unwinders like gdb can walk through the systemstack frame and continue traceback on the caller's stack. Unlike mcall, systemstack always returns to the original goroutine, so that is safe. We should do the same on arm64. For #63630. Cq-Include-Trybots: luci.golang.try:gotip-linux-arm64-longtest Change-Id: I6a6a636c35d321dd5d7dc1c4d09e29b55b1ab621 Reviewed-on: https://go-review.googlesource.com/c/go/+/669236 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Nick Ripley <nick.ripley@datadoghq.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-02runtime: clear frame pointer in mcallMichael Pratt
On amd64, mcall leaves BP untouched, so the callback will push BP, connecting the g0 stack to the calling g stack. This seems OK (frame pointer unwinders like Linux perf can see what user code called into the scheduler), but the "scheduler" part is problematic. mcall is used when calling into the scheduler to deschedule the current goroutine (e.g., in goyield). Once the goroutine is descheduled, it may be picked up by another M and continue execution. The other thread is mutating the goroutine stack, but our M still has a frame pointer pointing to the goroutine stack. A frame pointer unwinder like Linux perf could get bogus values off of the mutating stack. Note that though the execution tracer uses framepointer unwinding, it never unwinds a g0, so it isn't affected. Clear the frame pointer in mcall so that unwinding always stops at mcall. On arm64, mcall stores the frame pointer from g0.sched.bp. This doesn't really make any sense. mcall wasn't called by whatever used g0 last, so at best unwinding will get misleading results (e.g., it might look like cgocallback calls mcall?). Also clear the frame pointer on arm64. Other architectures don't use frame pointers. For #63630. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-arm64-longtest Change-Id: I6a6a636cb6404f3c95ecabdb969c9b8184615cee Reviewed-on: https://go-review.googlesource.com/c/go/+/669615 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Nick Ripley <nick.ripley@datadoghq.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2025-05-02runtime: mark and scan small objects in whole spans [green tea]Michael Anthony Knyszek
Our current parallel mark algorithm suffers from frequent stalls on memory since its access pattern is essentially random. Small objects are the worst offenders, since each one forces pulling in at least one full cache line to access even when the amount to be scanned is far smaller than that. Each object also requires an independent access to per-object metadata. The purpose of this change is to improve garbage collector performance by scanning small objects in batches to obtain better cache locality than our current approach. The core idea behind this change is to defer marking and scanning small objects, and then scan them in batches localized to a span. This change adds scanned bits to each small object (<=512 bytes) span in addition to mark bits. The scanned bits indicate that the object has been scanned. (One way to think of them is "grey" bits and "black" bits in the tri-color mark-sweep abstraction.) Each of these spans is always 8 KiB and if they contain pointers, the pointer/scalar data is already packed together at the end of the span, allowing us to further optimize the mark algorithm for this specific case. When the GC encounters a pointer, it first checks if it points into a small object span. If so, it is first marked in the mark bits, and then the object is queued on a work-stealing P-local queue. This object represents the whole span, and we ensure that a span can only appear at most once in any queue by maintaining an atomic ownership bit for each span. Later, when the pointer is dequeued, we scan every object with a set mark that doesn't have a corresponding scanned bit. If it turns out that was the only object in the mark bits since the last time we scanned the span, we scan just that object directly, essentially falling back to the existing algorithm. noscan objects have no scan work, so they are never queued. Each span's mark and scanned bits are co-located together at the end of the span. 
Since the span is always 8 KiB in size, it can be found with simple pointer arithmetic. Next to the marks and scans we also store the size class, eliminating the need to access the span's mspan altogether. The work-stealing P-local queue is a new source of GC work. If this queue gets full, half of it is dumped to a global linked list of spans to scan. The regular scan queues are always prioritized over this queue to allow time for darts to accumulate. Stealing work from other Ps is a last resort. This change also adds a new debug mode under GODEBUG=gctrace=2 that dumps whole-span scanning statistics by size class on every GC cycle. A future extension to this CL is to use SIMD-accelerated scanning kernels for scanning spans with high mark bit density. For #19112. (Deadlock averted in GOEXPERIMENT.) For #73581. Change-Id: I4bbb4e36f376950a53e61aaaae157ce842c341bc Reviewed-on: https://go-review.googlesource.com/c/go/+/658036 Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-01cmd/compile,internal/cpu,runtime: intrinsify math/bits.OnesCount on riscv64Joel Sing
For riscv64/rva22u64 and above, we can intrinsify math/bits.OnesCount using the CPOP/CPOPW machine instructions. Since the native Go implementation of OnesCount is relatively expensive, it is also worth emitting a check for Zbb support when compiled for rva20u64. On a Banana Pi F3, with GORISCV64=rva22u64: │ oc.1 │ oc.2 │ │ sec/op │ sec/op vs base │ OnesCount-8 16.930n ± 0% 4.389n ± 0% -74.08% (p=0.000 n=10) OnesCount8-8 5.642n ± 0% 5.016n ± 0% -11.10% (p=0.000 n=10) OnesCount16-8 9.404n ± 0% 5.015n ± 0% -46.67% (p=0.000 n=10) OnesCount32-8 13.165n ± 0% 4.388n ± 0% -66.67% (p=0.000 n=10) OnesCount64-8 16.300n ± 0% 4.388n ± 0% -73.08% (p=0.000 n=10) geomean 11.40n 4.629n -59.40% On a Banana Pi F3, compiled with GORISCV64=rva20u64 and with Zbb detection enabled: │ oc.3 │ oc.4 │ │ sec/op │ sec/op vs base │ OnesCount-8 16.930n ± 0% 5.643n ± 0% -66.67% (p=0.000 n=10) OnesCount8-8 5.642n ± 0% 5.642n ± 0% ~ (p=0.447 n=10) OnesCount16-8 10.030n ± 0% 6.896n ± 0% -31.25% (p=0.000 n=10) OnesCount32-8 13.170n ± 0% 5.642n ± 0% -57.16% (p=0.000 n=10) OnesCount64-8 16.300n ± 0% 5.642n ± 0% -65.39% (p=0.000 n=10) geomean 11.55n 5.873n -49.16% On a Banana Pi F3, compiled with GORISCV64=rva20u64 but with Zbb detection disabled: │ oc.3 │ oc.5 │ │ sec/op │ sec/op vs base │ OnesCount-8 16.93n ± 0% 29.47n ± 0% +74.07% (p=0.000 n=10) OnesCount8-8 5.642n ± 0% 5.643n ± 0% ~ (p=0.191 n=10) OnesCount16-8 10.03n ± 0% 15.05n ± 0% +50.05% (p=0.000 n=10) OnesCount32-8 13.17n ± 0% 18.18n ± 0% +38.04% (p=0.000 n=10) OnesCount64-8 16.30n ± 0% 21.94n ± 0% +34.60% (p=0.000 n=10) geomean 11.55n 15.84n +37.16% For hardware without Zbb, this adds ~5ns overhead, while for hardware with Zbb we achieve a performance gain of up to 11ns. It is worth noting that OnesCount8 is cheap enough that it is preferable to stick with the generic version in this case.
Change-Id: Id657e40e0dd1b1ab8cc0fe0f8a68df4c9f2d7da5 Reviewed-on: https://go-review.googlesource.com/c/go/+/660856 Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-25runtime: don't read /dev/random on Plan 9David du Colombier
In CL 656755, the readRandom function was modified to read an integer from /dev/random. However, on Plan 9, /dev/random can only return a few hundred bits a second. The issue is that readRandom is called by randinit, which is called at the creation of Go processes. Consequently, it lead the Go programs to be very slow on Plan 9. This change reverts the change done in CL 656755 to make the readRandom function always returning 0 on Plan 9. Change-Id: Ibe1bf7e4c8cbc82998e4f5e1331f5e29a047c4fc Cq-Include-Trybots: luci.golang.try:gotip-plan9-arm Reviewed-on: https://go-review.googlesource.com/c/go/+/663195 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Richard Miller <millerresearch@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-04-25cmd/compile: allow all of the preamble to be preemptibleKeith Randall
We currently make some parts of the preamble unpreemptible because it confuses morestack. See comments in the code. Instead, have morestack handle those weird cases so we can remove unpreemptible marks from most places. This CL makes user functions preemptible everywhere if they have no write barriers (at least, on x86). In cmd/go the fraction of functions that need preemptible markings drops from 82% to 36%. Makes the cmd/go binary 0.3% smaller. Update #35470 Change-Id: Ic83d5eabfd0f6d239a92e65684bcce7e67ff30bb Reviewed-on: https://go-review.googlesource.com/c/go/+/648518 Auto-Submit: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>