path: root/src/runtime/mgc.go
Age | Commit message | Author
2025-06-18runtime: prevent mutual deadlock between GC stopTheWorld and suspendGMichael Anthony Knyszek
Almost everywhere we stop the world we casGToWaitingForGC to prevent mutual deadlock with the GC trying to scan our stack. This historically was only necessary if we weren't stopping the world to change the GC phase, because what we were worried about was mutual deadlock with mark workers' use of suspendG. And, they were the only users of suspendG. In Go 1.22 this changed. The execution tracer began using suspendG, too. This leads to the possibility of mutual deadlock between the execution tracer and a goroutine trying to start or end the GC mark phase. The fix is simple: make the stop-the-world calls for the GC also call casGToWaitingForGC. This way, suspendG is guaranteed to make progress in this circumstance, and once it completes, the stop-the-world can complete as well. We can take this a step further, though, and move casGToWaitingForGC into stopTheWorldWithSema, since there's no longer really a place we can afford to skip this detail. While we're here, rename casGToWaitingForGC to casGToWaitingForSuspendG, since the GC is now not the only potential source of mutual deadlock. Fixes #72740. Change-Id: I5e3739a463ef3e8173ad33c531e696e46260692f Reviewed-on: https://go-review.googlesource.com/c/go/+/681501 Reviewed-by: Carlos Amedee <carlos@golang.org> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-06-04runtime: reduce per-P memory footprint when greenteagc is disabledMichael Anthony Knyszek
There are two additional sources of memory overhead per P that come from greenteagc. One is for ptrBuf, but on platforms other than Windows it doesn't actually cost anything due to demand-paging (Windows also demand-pages, but the memory is 'committed' so it still counts against OS RSS metrics). The other is for per-sizeclass scan stats. However, when greenteagc is disabled, most of these scan stats are completely unused. The worst-case memory overhead from these two sources is relatively small (about 10 KiB per P), but for programs with a small memory footprint running on a machine with a lot of cores, this can be significant (single-digit percent). This change does two things. First, it puts ptrBuf initialization behind the greenteagc experiment, so now that memory is never allocated by default. Second, it abstracts the implementation details of scan stat collection and emission, such that we can have two different implementations depending on the build tag. This lets us remove all the unused stats when the greenteagc experiment is disabled, reducing the memory overhead of the stats from ~2.6 KiB per P to 536 bytes per P. This is enough to make the difference no longer noticeable in our benchmark suite. Fixes #73931. Change-Id: I4351f1cbb3f6743d8f5922d757d73442c6d6ad3f Reviewed-on: https://go-review.googlesource.com/c/go/+/678535 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-05-20runtime: report finalizer and cleanup queue length with checkfinalizer>0Michael Anthony Knyszek
This change adds tracking for approximate finalizer and cleanup queue lengths. These lengths are reported once every GC cycle as a single line printed to stderr when GODEBUG=checkfinalizer>0. This change lays the groundwork for runtime/metrics metrics to produce the same values. For #72948. For #72950. Change-Id: I081721238a0fc4c7e5bee2dbaba6cfb4120d1a33 Reviewed-on: https://go-review.googlesource.com/c/go/+/671437 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-20runtime: add new GODEBUG checkfinalizerMichael Anthony Knyszek
This new debug mode detects cleanup/finalizer leaks using checkmark mode. It runs a partial GC using only specials as roots. If the GC can find a path from one of these roots back to the object the special is attached to, then the object might never be reclaimed. (The cycle could be broken in the future, but it's almost certainly a bug.) This debug mode is very barebones. It contains no type information and no stack location for where the finalizer or cleanup was created. For #72949. Change-Id: Ibffd64c1380b51f281950e4cfe61f677385d42a5 Reviewed-on: https://go-review.googlesource.com/c/go/+/634599 Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2025-05-19runtime: rename ncpu to numCPUStartupMichael Pratt
ncpu is the total logical CPU count at startup. It is never updated. For #73193, we will start using updated CPU counts for updated GOMAXPROCS, making the ncpu name a bit ambiguous. Change to a less ambiguous name. While we're at it, give the OS specific lookup functions a common name, so it can be used outside of osinit later. For #73193. Change-Id: I6a6a636cf21cc60de36b211f3c374080849fc667 Reviewed-on: https://go-review.googlesource.com/c/go/+/672277 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2025-05-08runtime: schedule cleanups across multiple goroutinesMichael Anthony Knyszek
This change splits the finalizer and cleanup queues and implements a new lock-free blocking queue for cleanups. The basic design is as follows: The cleanup queue is organized in fixed-sized blocks. Individual cleanup functions are queued, but only whole blocks are dequeued. Enqueuing cleanups places them in P-local cleanup blocks. These are flushed to the full list as they get full. Cleanups can only be enqueued by an active sweeper. Dequeuing cleanups always dequeues entire blocks from the full list. Cleanup blocks can be dequeued and executed at any time. The very last active sweeper in the sweep phase is responsible for flushing all local cleanup blocks to the full list. It can do this without any synchronization because the next GC can't start yet, so we can be very certain that nobody else will be accessing the local blocks. Cleanup blocks are stored off-heap because they need to be allocated by the sweeper, which is called from heap allocation paths. As a result, the GC treats cleanup blocks as roots, just like finalizer blocks. Flushes to the full list signal to the scheduler that cleanup goroutines should be awoken. Every time the scheduler goes to wake up a cleanup goroutine and there were more signals than goroutines to wake, it then forwards this signal to runtime.AddCleanup, so that it creates another goroutine the next time it is called, up to gomaxprocs goroutines. The signals here are a little convoluted, but exist because the sweeper and the scheduler cannot safely create new goroutines. For #71772. For #71825. Change-Id: Ie839fde2b67e1b79ac1426be0ea29a8d923a62cc Reviewed-on: https://go-review.googlesource.com/c/go/+/650697 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2025-05-07runtime: use "bubble" terminology for synctestDamien Neil
We've settled on calling the group of goroutines started by synctest.Run a "bubble". At the time the runtime implementation was written, I was still calling this a "group". Update the code to match the current terminology. Change-Id: I31b757f31d804b5d5f9564c182627030a9532f4a Reviewed-on: https://go-review.googlesource.com/c/go/+/670135 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Damien Neil <dneil@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-06unique: use a bespoke canonicalization map and runtime.AddCleanupMichael Anthony Knyszek
This change moves the unique package away from using a concurrent map and instead toward a bespoke concurrent canonicalization map. The map holds all its keys weakly, though keys may be looked up by value. The result is the strong pointer for the canonical value. Entries in the map are automatically cleaned up once the canonical reference no longer exists. Why do this? There's a problem with the current implementation when it comes to chains of unique.Handle: because the unique map will have a unique.Handle stored in its keys, each nested handle must be cleaned up 1 GC at a time. It takes N GC cycles, at minimum, to clean up a nested chain of N handles. This implementation, where the *only* value in the set is weakly-held, does not have this problem. The entire chain is dropped at once. The canon map implementation is a stripped-down version of HashTrieMap. The weak set implementation also has lower memory overheads by virtue of the fact that keys are all stored weakly. Whereas the previous map had both a T and a weak.Pointer[T], this *only* has a weak.Pointer[T]. The canonicalization map is a better abstraction overall and dramatically simplifies the unique.Make code. While we're here, delete the background goroutine and switch to runtime.AddCleanup. This is a step toward fixing #71772. We still need some kind of back-pressure mechanism, which will be implemented in a follow-up CL. For #71772. Fixes #71846. Change-Id: I5b2ee04ebfc7f6dd24c2c4a959dd0f6a8af24ca4 Reviewed-on: https://go-review.googlesource.com/c/go/+/650256 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2025-05-02runtime: mark and scan small objects in whole spans [green tea]Michael Anthony Knyszek
Our current parallel mark algorithm suffers from frequent stalls on memory since its access pattern is essentially random. Small objects are the worst offenders, since each one forces pulling in at least one full cache line to access even when the amount to be scanned is far smaller than that. Each object also requires an independent access to per-object metadata. The purpose of this change is to improve garbage collector performance by scanning small objects in batches to obtain better cache locality than our current approach. The core idea behind this change is to defer marking and scanning small objects, and then scan them in batches localized to a span. This change adds scanned bits to each small object (<=512 bytes) span in addition to mark bits. The scanned bits indicate that the object has been scanned. (One way to think of them is "grey" bits and "black" bits in the tri-color mark-sweep abstraction.) Each of these spans is always 8 KiB and if they contain pointers, the pointer/scalar data is already packed together at the end of the span, allowing us to further optimize the mark algorithm for this specific case. When the GC encounters a pointer, it first checks if it points into a small object span. If so, it is first marked in the mark bits, and then the object is queued on a work-stealing P-local queue. This object represents the whole span, and we ensure that a span can only appear at most once in any queue by maintaining an atomic ownership bit for each span. Later, when the pointer is dequeued, we scan every object with a set mark that doesn't have a corresponding scanned bit. If it turns out that was the only object in the mark bits since the last time we scanned the span, we scan just that object directly, essentially falling back to the existing algorithm. noscan objects have no scan work, so they are never queued. Each span's mark and scanned bits are co-located together at the end of the span. 
Since the span is always 8 KiB in size, it can be found with simple pointer arithmetic. Next to the marks and scans we also store the size class, eliminating the need to access the span's mspan altogether. The work-stealing P-local queue is a new source of GC work. If this queue gets full, half of it is dumped to a global linked list of spans to scan. The regular scan queues are always prioritized over this queue to allow time for marks to accumulate. Stealing work from other Ps is a last resort. This change also adds a new debug mode under GODEBUG=gctrace=2 that dumps whole-span scanning statistics by size class on every GC cycle. A future extension to this CL is to use SIMD-accelerated scanning kernels for scanning spans with high mark bit density. For #19112. For #73581. Change-Id: I4bbb4e36f376950a53e61aaaae157ce842c341bc Reviewed-on: https://go-review.googlesource.com/c/go/+/658036 Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-23runtime: align taggable pointers more so we can use low bits for tagKeith Randall
Currently we assume alignment to 8 bytes, so we can steal the low 3 bits. This CL assumes alignment to 512 bytes, so we can steal the low 9 bits. That's 6 extra bits! Aligning to 512 bytes wastes a bit of space but it is not egregious. Most of the objects that we make tagged pointers to are pretty big. Update #49405 Change-Id: I66fc7784ac1be5f12f285de1d7851d5a6871fb75 Reviewed-on: https://go-review.googlesource.com/c/go/+/665815 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-15runtime: size field for gQueue and gListDmitrii Martynov
Before this CL, all instances of gQueue and gList stored the size of the structures in a separate variable. The size was updated manually and passed as a separate argument to different functions. This CL adds an additional field to the gQueue and gList structures to store the size, and moves the size calculation into the implementation of the API for these structures. This reduces possible errors by eliminating manual size bookkeeping and simplifies function signatures. Change-Id: I087da2dfaec4925e4254ad40fce5ccb4c175ec41 Reviewed-on: https://go-review.googlesource.com/c/go/+/664777 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org>
2025-02-19runtime: minor mfinal.go code cleanupMichael Anthony Knyszek
This change moves finBlockSize into mfinal.go and renames finblock to finBlock. Change-Id: I20a0bc3907e7b028a2caa5d2fe8cf3f76332c871 Reviewed-on: https://go-review.googlesource.com/c/go/+/650695 Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com>
2025-02-11runtime: use internal/trace/tracev2 definitionsMichael Anthony Knyszek
This change deduplicates trace wire format definitions between the runtime and the trace parser by making the internal/trace/tracev2 package the source of truth. Change-Id: Ia0721d3484a80417e40ac473ec32870bee73df09 Reviewed-on: https://go-review.googlesource.com/c/go/+/644221 Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-03runtime: fix GODEBUG=gccheckmark=1 and add smoke testMichael Anthony Knyszek
This change fixes GODEBUG=gccheckmark=1 which seems to have bit-rotted. Because the root jobs weren't being reset, it wasn't doing anything. Then, it turned out that checkmark mode would queue up noscan objects in workbufs, which caused it to fail. Then it turned out checkmark mode was broken with user arenas, since their heap arenas are not registered anywhere. Then, it turned out that checkmark mode could just not run properly if the goroutine's preemption flag was set (since sched.gcwaiting is true during the STW). And lastly, it turned out that async preemption could cause erroneous checkmark failures. This change fixes all these issues and adds a simple smoke test to dist to run the runtime tests under gccheckmark, which exercises all of these issues. Fixes #69074. Fixes #69377. Fixes #69376. Change-Id: Iaa0bb7b9e63ed4ba34d222b47510d6292ce168bc Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest Reviewed-on: https://go-review.googlesource.com/c/go/+/608915 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-11-19internal/synctest: new package for testing concurrent codeDamien Neil
Add an internal (for now) implementation of testing/synctest. The synctest.Run function executes a tree of goroutines in an isolated environment using a fake clock. The synctest.Wait function allows a test to wait for all other goroutines within the test to reach a blocking point. For #67434 For #69687 Change-Id: Icb39e54c54cece96517e58ef9cfb18bf68506cfc Reviewed-on: https://go-review.googlesource.com/c/go/+/591997 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-16runtime: implement AddCleanupCarlos Amedee
This change introduces AddCleanup to the runtime package. AddCleanup attaches a cleanup function to a pointer to an object. The Stop method on Cleanups will be implemented in a follow-up CL. AddCleanup is intended to be an incremental improvement over SetFinalizer and will result in SetFinalizer being deprecated. For #67535 Change-Id: I99645152e3fdcee85fcf42a4f312c6917e8aecb1 Reviewed-on: https://go-review.googlesource.com/c/go/+/627695 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2024-11-13runtime: prevent weak->strong conversions during mark terminationMichael Anthony Knyszek
Currently it's possible for weak->strong conversions to create more GC work during mark termination. When a weak->strong conversion happens during the mark phase, we need to mark the newly-strong pointer, since it may now be the only pointer to that object. In other words, the object could be white. But queueing new white objects creates GC work, and if this happens during mark termination, we could end up violating mark termination invariants. In the parlance of the mark termination algorithm, the weak->strong conversion is a non-monotonic source of GC work, unlike the write barriers (which will eventually only see black objects). This change fixes the problem by forcing weak->strong conversions to block during mark termination. We can do this efficiently by setting a global flag before the ragged barrier that is checked at each weak->strong conversion. If the flag is set, then the conversions block. The ragged barrier ensures that all Ps have observed the flag and that any weak->strong conversions which completed before the ragged barrier have their newly-minted strong pointers visible in GC work queues if necessary. We later unset the flag and wake all the blocked goroutines during the mark termination STW. There are a few subtleties that we need to account for. For one, it's possible that a goroutine which blocked in a weak->strong conversion wakes up only to find it's mark termination time again, so we need to recheck the global flag on wake. We should also stay non-preemptible while performing the check, so that if the check *does* appear as true, it cannot switch back to false while we're actively trying to block. If it switches to false while we try to block, then we'll be stuck in the queue until the following GC. All-in-all, this CL is more complicated than I would have liked, but it's the only idea so far that is clearly correct to me at a high level. 
This change adds a test which is somewhat invasive as it manipulates mark termination, but hopefully that infrastructure will be useful for debugging, fixing, and regression testing mark termination whenever we do fix it. Fixes #69803. Change-Id: Ie314e6fd357c9e2a07a9be21f217f75f7aba8c4a Reviewed-on: https://go-review.googlesource.com/c/go/+/623615 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-10-14runtime: clarify work.bytesMarked documentationAustin Clements
Change-Id: If5132400aac0ef00e467958beeaab5e64d053d10 Reviewed-on: https://go-review.googlesource.com/c/go/+/619099 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Austin Clements <austin@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-09-06runtime: remove cloudwego/frugal unused linkname from commentKyle Xiao
frugal no longer uses these methods from next Go version Fixes #69222 Change-Id: Ie71de0752cabef7d5584d3392d6e5920ba742350 Reviewed-on: https://go-review.googlesource.com/c/go/+/609918 Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-29all: document legacy //go:linkname for modules with ≥100 dependentsRuss Cox
For #67401. Change-Id: I015408a3f437c1733d97160ef2fb5da6d4efcc5c Reviewed-on: https://go-review.googlesource.com/c/go/+/587598 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org>
2024-05-23all: document legacy //go:linkname for modules with ≥200 dependentsRuss Cox
Ignored these linknames which have not worked for a while: github.com/xtls/xray-core: context.newCancelCtx removed in CL 463999 (Feb 2023) github.com/u-root/u-root: funcPC removed in CL 513837 (Jul 2023) tinygo.org/x/drivers: net.useNetdev never existed For #67401. Change-Id: I9293f4ef197bb5552b431de8939fa94988a060ce Reviewed-on: https://go-review.googlesource.com/c/go/+/587576 Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-23all: document legacy //go:linkname for modules with ≥500 dependentsRuss Cox
For #67401. Change-Id: I7dd28c3b01a1a647f84929d15412aa43ab0089ee Reviewed-on: https://go-review.googlesource.com/c/go/+/587575 Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-23all: document legacy //go:linkname for modules with ≥20,000 dependentsRuss Cox
For #67401. Change-Id: Icc10ede72547d8020c0ba45e89d954822a4b2455 Reviewed-on: https://go-review.googlesource.com/c/go/+/587218 Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-08runtime: remove allocfreetraceMichael Anthony Knyszek
allocfreetrace prints all allocations and frees to stderr. It's not terribly useful because it has a really huge overhead, making it not feasible to use except for the most trivial programs. A follow-up CL will replace it with something that is both more thorough and also lower overhead. Change-Id: I1d668fee8b6aaef5251a5aea3054ec2444d75eb6 Reviewed-on: https://go-review.googlesource.com/c/go/+/583376 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2024-04-22unique: add unique package and implement Make/HandleMichael Anthony Knyszek
This change adds the unique package for canonicalizing values, as described by the proposal in #62483. Fixes #62483. Change-Id: I1dc3d34ec12351cb4dc3838a8ea29a5368d59e99 Reviewed-on: https://go-review.googlesource.com/c/go/+/574355 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ingo Oeser <nightlyone@googlemail.com> Reviewed-by: David Chase <drchase@google.com>
2024-04-08runtime: account for _Pgcstop in GC CPU pause time in a fine-grained wayMichael Anthony Knyszek
The previous CL, CL 570257, made it so that STW time no longer overlapped with other CPU time tracking. However, what we lost was insight into the CPU time spent _stopping_ the world, which can be just as important. There's pretty much no easy way to measure this indirectly, so this CL implements a direct measurement: whenever a P enters _Pgcstop, it writes down what time it did so. stopTheWorld then accumulates all the time deltas between when it finished stopping the world and each P's stop time into a total additional pause time. The GC pause cases then accumulate this number into the metrics. This should cause minimal additional overhead in stopping the world. GC STWs already take on the order of 10s to 100s of microseconds. Even for 100 Ps, the extra `nanotime` call per P is only 1500ns of additional CPU time. This is likely to be much less in actual pause latency, since it all happens concurrently. Change-Id: Icf190ffea469cd35ebaf0b2587bf6358648c8554 Reviewed-on: https://go-review.googlesource.com/c/go/+/574215 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Nicolas Hillegeer <aktau@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2024-04-08runtime: remove overlap in the GC CPU pause time metricsMichael Anthony Knyszek
Currently the GC CPU pause time metrics start measuring before the STW is complete. This results in a slightly less accurate measurement and creates some overlap with other timings (for example, the idle time of idle Ps) that will cause double-counting. This CL adds a field to worldStop to track the point at which the world actually stopped and uses that as the basis for the GC CPU pause time metrics, basically eliminating this overlap. Note that this will cause Ps in _Pgcstop before the world is fully stopped to be counted as user time. A follow-up CL will fix this discrepancy. Change-Id: I287731f08415ffd97d327f582ddf7e5d2248a6f5 Reviewed-on: https://go-review.googlesource.com/c/go/+/570258 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Nicolas Hillegeer <aktau@google.com>
2024-04-08runtime: move GC pause time CPU metrics update into the STWMichael Anthony Knyszek
This change fixes a possible race with updating metrics and reading them. The update is intended to be protected by the world being stopped, but here, it clearly isn't. Fixing this lets us lower the thresholds in the metrics tests by an order of magnitude, because the only thing we have to worry about now is floating point error (the tests were previously written assuming the floating point error was much higher than it actually was; that turns out not to be the case, and this bug was the problem instead). However, this still isn't that tight of a bound; we still want to catch any and all problems of exactness. For this purpose, this CL adds a test to check the source-of-truth (in uint64 nanoseconds) that ensures the totals exactly match. This means we unfortunately have to take another time measurement, but for now let's prioritize correctness. A few additional nanoseconds of STW time won't be terribly noticeable. Fixes #66212. Change-Id: Id02c66e8a43c13b1f70e9b268b8a84cc72293bfd Reviewed-on: https://go-review.googlesource.com/c/go/+/570257 Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Nicolas Hillegeer <aktau@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-04-08runtime: use maxprocs instead of stwprocs for GC CPU pause time metricsMichael Anthony Knyszek
Currently we use stwprocs as the multiplier for the STW CPU time computation, but this isn't the same as GOMAXPROCS, which is used for the total time in the CPU metrics. The two numbers need to be comparable, so this change switches to using maxprocs to make it so. Change-Id: I423e3c441d05b1bd656353368cb323289661e302 Reviewed-on: https://go-review.googlesource.com/c/go/+/570256 Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Nicolas Hillegeer <aktau@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-04-08runtime: factor out GC pause time CPU stats updateMichael Anthony Knyszek
Currently this is done manually in two places. Replace these manual updates with a method that also forces the caller to be mindful that the number will be multiplied (and that it needs to be). This will make follow-up changes simpler too. Change-Id: I81ea844b47a40ff3470d23214b4b2fb5b71a4abe Reviewed-on: https://go-review.googlesource.com/c/go/+/570255 Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-04-05runtime: take a stack trace during tracing only when we own the stackMichael Anthony Knyszek
Currently, the execution tracer may attempt to take a stack trace of a goroutine whose stack it does not own. For example, if the goroutine is in _Grunnable or _Gwaiting. This is easily fixed in all cases by simply moving the emission of GoStop and GoBlock events to before the casgstatus happens. The goroutine status is what is used to signal stack ownership, and the GC may shrink a goroutine's stack if it can acquire the scan bit. Although this is easily fixed, the interaction here is very subtle, because stack ownership is only implicit in the goroutine's scan status. To make this invariant more maintainable and less error-prone in the future, this change adds a GODEBUG setting that checks, at the point of taking a stack trace, whether the caller owns the goroutine. This check is not quite perfect because there's no way for the stack tracing code to know that the _Gscan bit was acquired by the caller, so for simplicity it assumes that it was the caller that acquired the scan bit. In all other cases however, we can check for ownership precisely. At the very least, this check is sufficient to catch the issue this change is fixing. To make sure this debug check doesn't bitrot, it's always enabled during trace testing. This new mode has actually caught a few other issues already, so this change fixes them. One issue that this debug mode caught was that it's not safe to take a stack trace of a _Gwaiting goroutine that's being unparked. Another much bigger issue this debug mode caught was the fact that the execution tracer could try to take a stack trace of a G that was in _Gwaiting solely to avoid a deadlock in the GC. The execution tracer already has a partial list of these cases since they're modeled as the goroutine just executing as normal in the tracer, but this change takes the list and makes it more formal. In this specific case, we now prevent the GC from shrinking the stacks of goroutines in this state if tracing is enabled. 
The stack traces from these scenarios are too useful to discard, but there is indeed a race here between the tracer and any attempt to shrink the stack by the GC. Change-Id: I019850dabc8cede202fd6dcc0a4b1f16764209fb Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-amd64-longtest-race Reviewed-on: https://go-review.googlesource.com/c/go/+/573155 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2024-03-25runtime: migrate internal/atomic to internal/runtimeAndy Pan
For #65355 Change-Id: I65dd090fb99de9b231af2112c5ccb0eb635db2be Reviewed-on: https://go-review.googlesource.com/c/go/+/560155 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ibrahim Bazoka <ibrahimbazoka729@gmail.com> Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
2024-03-08runtime: use built-in clear to simplify codeapocelipes
Change-Id: Icb6d9ca996b4119d8636d9f7f6a56e510d74d059 GitHub-Last-Rev: 08178e8ff798f4a51860573788c9347a0fb6bc40 GitHub-Pull-Request: golang/go#66188 Reviewed-on: https://go-review.googlesource.com/c/go/+/569979 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: qiulaidongfeng <2645477756@qq.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2024-02-28all: run go fmtsivchari
I ran go fmt to fix format on the entire repository. Change-Id: I2f09166b6b8ba0ffb0ba27f6500efb0ea4cf21ff Reviewed-on: https://go-review.googlesource.com/c/go/+/566835 Reviewed-by: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Auto-Submit: Ian Lance Taylor <iant@golang.org>
2024-01-26runtime: use channels in gcBgMarkStartWorkersMichael Pratt
gcBgMarkStartWorkers currently starts workers one at a time, using a note to communicate readiness back from the worker. However, this is a pretty standard goroutine, so we can just use a channel to communicate between the goroutines. In addition to being conceptually simpler, using channels has the additional advantage of coordinating with the scheduler. Notes use OS locks and sleep the entire thread, requiring other threads to run the other goroutines. Waiting on a channel allows the scheduler to directly run another goroutine. When the worker sends to the channel, the scheduler can use runnext to run gcBgMarkStartWorker immediately, reducing latency. We could additionally batch start all workers and then wait only once; however, this would defeat runnext switching between the workers and gcBgMarkStartWorkers, so in a heavily loaded system, we expect the direct switches to reduce latency. Change-Id: Iedf0d2ad8ad796b43fd8d32ccb1e815cfe010cb4 Reviewed-on: https://go-review.googlesource.com/c/go/+/558535 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
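The note-to-channel change described above boils down to a common Go pattern. The sketch below is illustrative only (names like startWorkers are hypothetical, not the runtime's code): each worker goroutine signals readiness over a channel, and the starter waits on that channel before launching the next worker, letting the scheduler hand off directly between the two goroutines instead of sleeping the whole thread on an OS lock.

```go
package main

import "fmt"

// startWorkers starts n background worker goroutines one at a time, waiting
// for each to signal readiness over a channel before starting the next.
// It returns one stop channel per worker so the caller can release them.
func startWorkers(n int) []chan struct{} {
	ready := make(chan struct{})
	var stops []chan struct{}
	for i := 0; i < n; i++ {
		stop := make(chan struct{})
		stops = append(stops, stop)
		go func(id int) {
			// ... per-worker setup would happen here ...
			ready <- struct{}{} // signal readiness back to the starter
			<-stop              // park until released (stand-in for real work)
		}(i)
		<-ready // wait for this worker before starting the next
	}
	return stops
}

func main() {
	stops := startWorkers(4)
	fmt.Println("started", len(stops), "workers")
	for _, c := range stops {
		close(c) // release every parked worker
	}
}
```

The send/receive pair on `ready` is the channel analogue of the note: the receive parks the starter goroutine (not the thread), and the worker's send wakes it, which is what allows the runnext fast path the commit message describes.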
2023-11-17runtime: gofmt -w -sJes Cok
Change-Id: I2eac85b502df9851df294f8d46c7845f635dde9b GitHub-Last-Rev: 3c8382442a0fadb355be9e4656942c2e03db2391 GitHub-Pull-Request: golang/go#64198 Reviewed-on: https://go-review.googlesource.com/c/go/+/542697 Run-TryBot: Jes Cok <xigua67damn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2023-11-15runtime/metrics: add STW stopping and total time metricsMichael Pratt
This CL adds four new time histogram metrics: /sched/pauses/stopping/gc:seconds /sched/pauses/stopping/other:seconds /sched/pauses/total/gc:seconds /sched/pauses/total/other:seconds The "stopping" metrics measure the time taken to start a stop-the-world pause. i.e., how long it takes stopTheWorldWithSema to stop all Ps. This can be used to detect STW struggling to preempt Ps. The "total" metrics measure the total duration of a stop-the-world pause, from starting to stop-the-world until the world is started again. This includes the time spent in the "start" phase. The "gc" metrics are used for GC-related STW pauses. The "other" metrics are used for all other STW pauses. All of these metrics start timing in stopTheWorldWithSema only after successfully acquiring sched.lock, thus excluding lock contention on sched.lock. The reasoning behind this is that while waiting on sched.lock the world is not stopped at all (all other Ps can run), so the impact of this contention is primarily limited to the goroutine attempting to stop-the-world. Additionally, we already have some visibility into sched.lock contention via contention profiles (#57071). /sched/pauses/total/gc:seconds is conceptually equivalent to /gc/pauses:seconds, so the latter is marked as deprecated and returns the same histogram as the former. In the implementation, there are a few minor differences: * For both mark and sweep termination stops, /gc/pauses:seconds started timing prior to calling startTheWorldWithSema, thus including lock contention. These details are minor enough that I do not believe the slight change in reporting will matter. For mark termination stops, moving the timing stop into startTheWorldWithSema does have the side effect of requiring moving other GC metric calculations outside of the STW, as they depend on the same end time.
Fixes #63340 Change-Id: Iacd0bab11bedab85d3dcfb982361413a7d9c0d05 Reviewed-on: https://go-review.googlesource.com/c/go/+/534161 Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-13runtime: remove work.pauseStartMichael Pratt
Most of the uses of work.pauseStart are completely useless, it could simply be a local variable. One use passes a parameter from gcMarkDone to gcMarkTermination, but that could simply be an argument. Keeping this field in workType makes it seems more important than it really is, so just drop it. Change-Id: I2fdc0b21f8844e5e7be47148c3e10f13e49815c6 Reviewed-on: https://go-review.googlesource.com/c/go/+/542075 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-13runtime: call enableMetadataHugePages and its callees on the systemstackMichael Anthony Knyszek
These functions acquire the heap lock. If they're not called on the systemstack, a stack growth could cause a self-deadlock since stack growth may allocate memory from the page heap. This has been a problem for a while. If this is what's plaguing the ppc64 port right now, it's very surprising (and probably just coincidental) that it's showing up now. For #64050. For #64062. Fixes #64067. Change-Id: I2b95dc134d17be63b9fe8f7a3370fe5b5438682f Reviewed-on: https://go-review.googlesource.com/c/go/+/541635 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Paul Murphy <murp@ibm.com>
2023-11-10runtime: add execution tracer v2 behind GOEXPERIMENT=exectracer2Michael Anthony Knyszek
This change mostly implements the design described in #60773 and includes a new scalable parser for the new trace format, available in internal/trace/v2. I'll leave this commit message short because this is clearly an enormous CL with a lot of detail. This change does not hook up the new tracer into cmd/trace yet. A follow-up CL will handle that. For #60773. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-amd64-longtest-race Change-Id: I5d2aca2cc07580ed3c76a9813ac48ec96b157de0 Reviewed-on: https://go-review.googlesource.com/c/go/+/494187 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-09runtime: make it harder to introduce deadlocks with forEachPMichael Anthony Knyszek
Currently any thread that tries to get the attention of all Ps (e.g. stopTheWorldWithSema and forEachP) ends up in a non-preemptible state waiting to preempt another thread. Thing is, that other thread might also be in a non-preemptible state, trying to preempt the first thread, resulting in a deadlock. This is a general problem, but in practice it only boils down to one specific scenario: a thread in GC is blocked trying to preempt a goroutine to scan its stack while that goroutine is blocked in a non-preemptible state to get the attention of all Ps. There's currently a hack in a few places in the runtime to move the calling goroutine into _Gwaiting before it goes into a non-preemptible state to preempt other threads. This lets the GC scan its stack because the goroutine is trivially preemptible. The only restriction is that forEachP and stopTheWorldWithSema absolutely cannot reference the calling goroutine's stack. This is generally not necessary, so things are good. Anyway, to avoid exposing the details of this hack, this change creates a safer wrapper around forEachP (and then renames it to forEachP and the existing one to forEachPInternal) that performs the goroutine status change, just like stopTheWorld does. We're going to need to use this hack with forEachP in the new tracer, so this avoids propagating the hack further and leaves it as an implementation detail. Change-Id: I51f02e8d8e0a3172334d23787e31abefb8a129ab Reviewed-on: https://go-review.googlesource.com/c/go/+/533455 Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
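The wrapper shape described above can be sketched outside the runtime. This is only an illustration of the pattern, not the real forEachP (which operates on Gs and Ps inside the scheduler): the exported entry point moves the caller into a trivially-preemptible state before running the non-preemptible internal body, then restores the old state, and the internal body must not reference the caller's stack.

```go
package main

import "fmt"

type status int

const (
	running status = iota
	waiting
)

// forEachPExample wraps the internal body with the status dance: mark the
// caller "waiting" so a concurrent stack scanner would see it as preemptible,
// run the non-preemptible body, then restore the previous status.
func forEachPExample(g *status, fn func(p int)) {
	old := *g
	*g = waiting // caller now looks preemptible to a would-be stack scanner
	forEachPInternalExample(fn)
	*g = old
}

// forEachPInternalExample stands in for the non-preemptible body that visits
// every P; it must not touch the calling goroutine's stack state.
func forEachPInternalExample(fn func(p int)) {
	for p := 0; p < 4; p++ {
		fn(p)
	}
}

func main() {
	g := running
	count := 0
	forEachPExample(&g, func(p int) { count++ })
	fmt.Println("visited", count, "Ps; status restored:", g == running)
}
```

Keeping the status change inside the wrapper is the point of the CL: callers get the safe behavior by default instead of each call site repeating the hack.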
2023-11-09runtime: refactor runtime->tracer API to appear more like a lockMichael Anthony Knyszek
Currently the execution tracer synchronizes with itself using very heavyweight operations. As a result, it's totally fine for most of the tracer code to look like: if traceEnabled() { traceXXX(...) } However, if we want to make that synchronization more lightweight (as issue #60773 proposes), then this is insufficient. In particular, we need to make sure the tracer can't observe an inconsistency between g atomicstatus and the event that would be emitted for a particular g transition. This means making the g status change appear to happen atomically with the corresponding trace event being written out from the perspective of the tracer. This requires a change in API to something more like a lock. While we're here, we might as well make sure that trace events can *only* be emitted while this lock is held. This change introduces such an API: traceAcquire, which returns a value that can emit events, and traceRelease, which requires the value that was returned by traceAcquire. In practice, this won't be a real lock, it'll be more like a seqlock. For the current tracer, this API is completely overkill and the value returned by traceAcquire basically just checks trace.enabled. But it's necessary for the tracer described in #60773 and we can implement that more cleanly if we do this refactoring now instead of later. For #60773. Change-Id: Ibb9ff5958376339fafc2b5180aef65cf2ba18646 Reviewed-on: https://go-review.googlesource.com/c/go/+/515635 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
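The API shape the CL describes can be sketched as follows. The names traceAcquire/traceRelease come from the CL, but the body here is a simplified stand-in: for the current tracer the acquired value effectively just snapshots the enabled flag, while the real version would behave more like a seqlock so a status change and its event appear atomic to the tracer.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

var traceOn atomic.Bool // stand-in for trace.enabled

// traceLocker is a value that can emit events only while "held".
type traceLocker struct {
	ok bool
}

func traceAcquire() traceLocker {
	// Real implementation: enter a seqlock-like critical section.
	return traceLocker{ok: traceOn.Load()}
}

func (tl traceLocker) emit(event string) {
	if tl.ok {
		fmt.Println("trace event:", event)
	}
}

func traceRelease(tl traceLocker) {
	// Real implementation: end the critical section, making the event
	// and the g status transition appear atomic to the tracer.
}

func main() {
	traceOn.Store(true)
	tl := traceAcquire()
	tl.emit("GoStart") // events may only be emitted between acquire and release
	traceRelease(tl)
}
```

The key property is that emission is a method on the value returned by traceAcquire, so the type system makes it hard to write an event outside the critical section, which is exactly the guarantee the refactoring wants.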
2023-11-07cmd/compile,runtime: dedup writeBarrier neededMauri de Souza Meneguzzo
The writeBarrier "needed" struct member has the exact same value as "enabled", and used interchangeably. I'm not sure if we plan to make a distinction between the two at some point, but today they are effectively the same, so dedup it and keep only "enabled". Change-Id: I65e596f174e1e820dc471a45ff70c0ef4efbc386 GitHub-Last-Rev: f8c805a91606d42c8d5b178ddd7d0bec7aaf9f55 GitHub-Pull-Request: golang/go#63814 Reviewed-on: https://go-review.googlesource.com/c/go/+/538495 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Heschi Kreinick <heschi@google.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-10-11runtime: remove write-only sweepdata fieldsMichael Pratt
Change-Id: Ia238889a704812473b838b20efedfe9d24b1e26f Reviewed-on: https://go-review.googlesource.com/c/go/+/534160 Auto-Submit: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-10-11runtime: remove stale non-atomic access commentMichael Pratt
CL 397014 converted this into an atomic access. Change-Id: Ib97716cd19ecd7d6bf8601baf0391755a5baf378 Reviewed-on: https://go-review.googlesource.com/c/go/+/534159 Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-10runtime: fix some commentsJes Cok
Change-Id: I06403cf217a4d2645e13115e7ca358b7f3d3f2ef GitHub-Last-Rev: e2b4e5326a6c68d066b637c6add86723e0cefd3b GitHub-Pull-Request: golang/go#63474 Reviewed-on: https://go-review.googlesource.com/c/go/+/533875 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com>
2023-08-23runtime: create wrappers for gcDrain to identify time in profilesMichael Anthony Knyszek
Currently it's impossible to identify in profiles where gcDrain-related time is coming from. More specifically, what kind of worker. Create trivial wrappers for each worker so that the difference shows up in stack traces. Also, clarify why gcDrain disables write barriers. Change-Id: I966e3c0b1c583994e691f486bf0ed8cabb91dbbb Reviewed-on: https://go-review.googlesource.com/c/go/+/521815 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
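The trivial-wrapper trick is easy to demonstrate outside the runtime (function names here are illustrative, not the runtime's): each worker flavor gets its own one-line function around the shared implementation, so CPU profiles and stack traces attribute time to a distinct frame per flavor.

```go
package main

import "fmt"

// drain is the shared implementation whose time we want to attribute
// differently depending on which kind of worker called it.
func drain() int {
	sum := 0
	for i := 0; i < 1000; i++ {
		sum += i
	}
	return sum
}

// Trivial wrappers: identical behavior, but each appears as its own
// frame in stack traces and profiles.
func drainDedicatedWorker() int  { return drain() }
func drainFractionalWorker() int { return drain() }

func main() {
	fmt.Println(drainDedicatedWorker() == drainFractionalWorker())
}
```

Because Go's profiler samples full call stacks, the wrapper frame above `drain` is enough to split its cost by caller flavor without duplicating any logic.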
2023-07-21runtime: fix debug non-concurrent sweep mode after activeSweep changesMichael Anthony Knyszek
Currently the GC creates a sweepLocker before restarting the world at the end of the mark phase, so that it can safely flush mcaches without the runtime incorrectly concluding that sweeping is done before that happens. However, with GODEBUG=gcstoptheworld=2, where sweeping happens during that STW phase, creating that sweepLocker will fail, since the runtime will conclude that sweeping is in fact complete (all the queues will be drained). The problem, however, is that gcSweep, which does the non-concurrent sweeping, doesn't actually flush mcaches. In essence, this failure to create a sweepLocker is indicating a real issue: sweeping is marked as complete, but we haven't flushed the mcaches yet! The fix to this is to flush mcaches in gcSweep when in a non-concurrent sweep. Now that gcSweep actually completes a full sweep, it's safe to ignore a failure to create a sweepLocker (and in fact, it *must* fail). While we're here, let's also remove _ConcurrentSweep, the debug flag. There's already an alias for it called concurrentSweep, and there's only one use of it in gcSweep. Lastly, add a dist test for the GODEBUG=gcstoptheworld=2 mode. Fixes #53885. Change-Id: I8a1e5b8f362ed8abd03f76e4950d3211f145ab1f Reviewed-on: https://go-review.googlesource.com/c/go/+/479517 Run-TryBot: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-25runtime: change heapObjectsCanMove to a funcRuss Cox
A var is problematic because the zero value is already false, so if it goes away, it will appear to be false. I'm also not sure about go:linkname on vars, so switch to func for both reasons. Also add a test. Change-Id: I2318a5390d98577aec025152e65543491489defb Reviewed-on: https://go-review.googlesource.com/c/go/+/498261 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2023-05-25runtime: add heapObjectsCanMoveRuss Cox
heapObjectsCanMove is always false in the current garbage collector. It exists for go4.org/unsafe/assume-no-moving-gc, which is an unfortunate idea that had an even more unfortunate implementation. Every time a new Go release happened, the package stopped building, and the authors had to add a new file with a new //go:build line, and then the entire ecosystem of packages with that as a dependency had to explicitly update to the new version. Many packages depend on assume-no-moving-gc transitively, through paths like inet.af/netaddr -> go4.org/intern -> assume-no-moving-gc. This was causing a significant amount of friction around each new release, so we added this bool for the package to //go:linkname instead. The bool is still unfortunate, but it's not as bad as breaking the ecosystem on every new release. If the Go garbage collector ever does move heap objects, we can set this to true to break all the programs using assume-no-moving-gc. Change-Id: I06c32bf6ccc4601c8eef741d7382b678aada3508 Reviewed-on: https://go-review.googlesource.com/c/go/+/498121 Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>