path: root/src/runtime/mgcmark.go
Age        Commit message        Author
2026-02-11  runtime: conservatively scan extended register state  Alexander Musman
Conservatively scan the extended register state when GC scans asynchronously preempted goroutines. This ensures that any pointers that appear only in vector registers at preemption time are kept alive. Using vector registers for small memory moves may load pointers into these registers. If async preemption occurs mid-move, with no write barrier (e.g., heap-to-stack copies) and the source register clobbered or source memory modified by a racing goroutine, the pointer may exist only in the vector register. Without scanning this state, GC could miss live pointers. This addresses concerns raised in CL 738261 and enables safe use of vector registers for operations that may involve pointers. Change-Id: I5f5ce98d6ed6f7cde34b33da0aea1f880c2fcf41 Reviewed-on: https://go-review.googlesource.com/c/go/+/740681 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Austin Clements <austin@google.com>
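In rough outline, "conservative scanning" here means treating any saved register word that could plausibly be a heap pointer as if it were one, so its target stays live. A minimal sketch of that idea (illustrative code only; the real scanner also consults span metadata and object boundaries):

    package sketch

    // scanRegistersConservatively treats every saved register word that
    // falls inside the heap's address range as a potential pointer and
    // marks it. False positives only keep extra memory alive, which is
    // safe; missed pointers are what the CL above prevents.
    func scanRegistersConservatively(regs []uintptr, heapLo, heapHi uintptr, mark func(uintptr)) {
        for _, v := range regs {
            if v >= heapLo && v < heapHi {
                mark(v) // mark the object containing v
            }
        }
    }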
2025-11-18  runtime: add hexdumper  Austin Clements
Currently, we have a simple hexdumpWords facility for debugging. It's useful but pretty limited. This CL adds a much more configurable and capable "hexdumper". It can be configured for any word size (including bytes), handles unaligned data, includes an ASCII dump, and accepts data in multiple slices. It also has a much nicer "mark" facility for annotating the hexdump that isn't limited to a single character per word.

We use this to improve our existing hexdumps, particularly the new mark facility. The next CL will integrate hexdumps into debuglog, which will make use of several other new capabilities. Also this adds an actual test.

The output looks like:

                       7 6 5 4 3 2 1 0  f e d c b a 9 8   0123456789abcdef
    000000c00006ef70:                    03000000 00000000          ........
    000000c00006ef80:  00000000 0053da80 000000c0 000bc380  ..S.............
                       ^ <testing.tRunner.func2+0x0>
    000000c00006ef90:  00000000 0053dac0 000000c0 000bc380  ..S.............
                       ^ <testing.tRunner.func1+0x0>
    000000c00006efa0:  000000c0 0006ef90 000000c0 0006ef80  ................
    000000c00006efb0:  000000c0 0006efd0 00000000 0053eb65  ........e.S.....
                       ^ <testing.(*T).Run.gowrap1+0x25>
    000000c00006efc0:  000000c0 000bc380 00000000 009aaae8  ................
    000000c00006efd0:  00000000 00000000 00000000 00496b01  .........kI.....
                       ^ <runtime.goexit+0x1>
    000000c00006efe0:  00000000 00000000 00000000 00000000  ................
    000000c00006eff0:  00000000 00000000                     ........

The header gives column labels, indicating the order of bytes within the following words. The addresses on the left are always 16-byte aligned so it's easy to combine that address with the column header to determine the full address of a byte. Annotations are no longer interleaved with the data, so the data stays in nicely aligned columns. The annotations are also now much more flexible, including support for multiple annotations on the same word (not shown).

Change-Id: I27e83800a1f6a7bdd3cc2c59614661a810a57d4d
Reviewed-on: https://go-review.googlesource.com/c/go/+/681375
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Austin Clements <austin@google.com>
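For contrast with the flexible hexdumper described above, a stripped-down dumper with none of the new capabilities (fixed 4-byte groups, one input slice, no marks) can be written in a few lines; names here are illustrative and not the runtime's API:

    package main

    import "fmt"

    // dump prints data in rows of 16 bytes: an address column, 4-byte
    // hex groups, and an ASCII column. Purely illustrative; the runtime's
    // hexdumper is far more configurable.
    func dump(addr uintptr, data []byte) {
        for off := 0; off < len(data); off += 16 {
            fmt.Printf("%016x:", addr+uintptr(off))
            end := off + 16
            if end > len(data) {
                end = len(data)
            }
            for i := off; i < end; i++ {
                if i%4 == 0 {
                    fmt.Print(" ")
                }
                fmt.Printf("%02x", data[i])
            }
            fmt.Print("  ")
            for i := off; i < end; i++ {
                c := data[i]
                if c < 0x20 || c > 0x7e {
                    c = '.'
                }
                fmt.Printf("%c", c)
            }
            fmt.Println()
        }
    }

    func main() {
        dump(0x7f0000, []byte("Go runtime hexdump example\x00\x01\x02"))
    }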
2025-11-14  runtime: put AddCleanup cleanup arguments in their own allocation  Michael Anthony Knyszek
Currently, AddCleanup just creates a simple closure that calls `cleanup(arg)` as the actual cleanup function tracked internally. However, the argument ends up getting its own allocation. If it's tiny, then it can also end up sharing a tiny allocation slot with the object we're adding the cleanup to. Since the closure is a GC root, we can end up with cleanups that never fire.

This change refactors the AddCleanup machinery to make the storage for the argument separate and explicit. With that in place, it explicitly allocates 16 bytes of storage for tiny arguments to side-step the tiny allocator.

One would think this would cause an increase in memory use and more bytes allocated, but that's actually wrong! It turns out that the current "simple closure" actually creates _two_ closures. By making the argument passing explicit, we eliminate one layer of closures, so this actually results in a slightly faster AddCleanup overall and 16 bytes less memory allocated.

goos: linux
goarch: amd64
pkg: runtime
cpu: AMD EPYC 7B13
                       │ before.bench │           after.bench           │
                       │    sec/op    │    sec/op     vs base           │
AddCleanupAndStop-64      124.5n ± 2%    103.7n ± 2%  -16.71% (p=0.002 n=6)

                       │ before.bench │           after.bench           │
                       │     B/op     │     B/op      vs base           │
AddCleanupAndStop-64      48.00 ± 0%     32.00 ± 0%   -33.33% (p=0.002 n=6)

                       │ before.bench │           after.bench           │
                       │  allocs/op   │  allocs/op    vs base           │
AddCleanupAndStop-64      3.000 ± 0%     2.000 ± 0%   -33.33% (p=0.002 n=6)

This change does, however, add 16 bytes of overhead to the cleanup special itself, and makes each cleanup block entry 24 bytes instead of 8 bytes. This means the overall memory overhead delta with this change is neutral, and we just have a faster cleanup. (Cleanup block entries are transient, so I suspect any increase in memory overhead there is negligible.)

Together with CL 719960, fixes #76007.

Change-Id: I81bf3e44339e71c016c30d80bb4ee151c8263d5c
Reviewed-on: https://go-review.googlesource.com/c/go/+/720321
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
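A hedged illustration of the failure mode this fixes (not a guaranteed reproducer, since whether the two allocations actually share a tiny slot depends on allocation order and the state of the per-P tiny block):

    package main

    import (
        "fmt"
        "runtime"
        "time"
    )

    func main() {
        // Both allocations are tiny and pointer-free, so before this CL
        // they could land in the same 16-byte tiny-allocator slot. The
        // cleanup machinery keeps arg alive, which would then keep obj's
        // slot alive, so obj never dies and the cleanup never runs.
        obj := new(uint8)
        arg := new(uint8)
        *arg = 42

        runtime.AddCleanup(obj, func(a *uint8) {
            fmt.Println("cleanup ran with arg", *a)
        }, arg)

        // obj is unreachable after the AddCleanup call above.
        runtime.GC()
        runtime.GC()
        time.Sleep(100 * time.Millisecond) // give the cleanup goroutine a chance to run
    }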
2025-10-20  runtime: add _Gdeadextra status  Michael Anthony Knyszek
_Gdeadextra is almost the same as _Gdead but for goroutines attached to extra Ms. The primary difference is that it can be transitioned into a _Gscan status, unlike _Gdead. (Why not just use _Gdead? For safety, mostly. There's exactly one case where we're going to want to transition _Gdead to _Gscan|_Gdead, and it's for extra Ms. It's also a bit weird to call this state dead when it can still have a syscalling P attached to it.) This status is used in a follow-up change that changes entersyscall and exitsyscall. Change-Id: I169a4c8617aa3dc329574b829203f56c86b58169 Reviewed-on: https://go-review.googlesource.com/c/go/+/646197 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-10-02  runtime,net/http/pprof: goroutine leak detection by using the garbage collector  Vlad Saioc
Proposal #74609 Change-Id: I97a754b128aac1bc5b7b9ab607fcd5bb390058c8 GitHub-Last-Rev: 60f2a192badf415112246de8bc6c0084085314f6 GitHub-Pull-Request: golang/go#74622 Reviewed-on: https://go-review.googlesource.com/c/go/+/688335 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: thepudds <thepudds1460@gmail.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-09-26  runtime: use a smaller arena size on Wasm  Cherry Mui
On Wasm, some programs have a very small heap. Currently, we use a 4 MB arena size (like all other 32-bit platforms). Even a very small program needs to allocate one heap arena, 4 MB in size, at a 4 MB-aligned address. So we'll need 8 MB of linear memory, whereas only a small portion is actually used by the program. On Wasm, small programs are not uncommon (e.g. WASI plugins), and users are concerned about the memory usage. This CL switches to a smaller arena size, as well as a smaller page allocator chunk size (both are now 512 KB). So the heap will grow in 512 KB granularity. A helloworld program now uses less than 3 MB of linear memory, instead of 8 MB. Change-Id: Ibd66c1fa6e794a12c00906cbacc8f2e410f196c4 Reviewed-on: https://go-review.googlesource.com/c/go/+/683296 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-09-23  runtime: eliminate global span queue [green tea]  Michael Anthony Knyszek
This change removes the locked global span queue and replaces the fixed-size local span queue with a variable-sized local span queue. The variable-sized local span queue grows as needed to accommodate local work. With no global span queue either, GC workers balance work amongst themselves by stealing from each other. The new variable-sized local span queues are inspired by the P-local deque underlying sync.Pool. Unlike the sync.Pool deque, however, both the owning P and stealing Ps take spans from the tail, making this incarnation a strict queue, not a deque. This is intentional, since we want a queue-like order to encourage objects to accumulate on each span. These variable-sized local span queues are crucial to mark termination, just like the global span queue was. To avoid hitting the ragged barrier too often, we must check whether any Ps have any spans on their variable-sized local span queues. We maintain a per-P atomic bitmask (another pMask) that contains this state. We can also use this to speed up stealing by skipping Ps that don't have any local spans. The variable-sized local span queues are slower than the old fixed-size local span queues because of the additional indirection, so this change adds a non-atomic local fixed-size queue. This risks getting work stuck on it, so, similarly to how workbufs work, each worker will occasionally dump some spans onto its local variable-sized queue. This scales much more nicely than dumping to a global queue, but is still visible to all other Ps. For #73581. Change-Id: I814f54d9c3cc7fa7896167746e9823f50943ac22 Reviewed-on: https://go-review.googlesource.com/c/go/+/700496 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
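A toy, mutex-based model of the queue discipline described above (the real queue is per-P, lock-free, and far more careful); the point it captures is that everyone dequeues from the same end, so spans are handled FIFO and have time to accumulate marked objects before being scanned:

    package sketch

    import "sync"

    type spanRef struct{ id int } // stand-in for a real span reference

    type localSpanQueue struct {
        mu    sync.Mutex
        spans []spanRef // grows as needed; no fixed capacity
    }

    // put is used by the owning worker to enqueue a span.
    func (q *localSpanQueue) put(s spanRef) {
        q.mu.Lock()
        q.spans = append(q.spans, s)
        q.mu.Unlock()
    }

    // take is used both by the owning worker and by stealers: everyone
    // removes the oldest entry, preserving queue order.
    func (q *localSpanQueue) take() (spanRef, bool) {
        q.mu.Lock()
        defer q.mu.Unlock()
        if len(q.spans) == 0 {
            return spanRef{}, false
        }
        s := q.spans[0]
        q.spans = q.spans[1:]
        return s, true
    }

    // empty supports the per-P "has local spans" bitmask: stealers can
    // skip Ps whose queues report empty.
    func (q *localSpanQueue) empty() bool {
        q.mu.Lock()
        defer q.mu.Unlock()
        return len(q.spans) == 0
    }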
2025-09-23  runtime: split gcMarkWorkAvailable into two separate conditions  Michael Anthony Knyszek
Right now, gcMarkWorkAvailable is used in two scenarios. The first is critical: we use it to determine whether we're approaching mark termination, and it's crucial to reaching a fixed point across the ragged barrier in gcMarkDone. The second is a heuristic: should we spin up another GC worker? This change splits gcMarkWorkAvailable into these two separate conditions.

This change also deduplicates the logic for updating work.nwait into more abstract helpers "gcBeginWork" and "gcEndWork."

This change is intended to be almost entirely a refactoring and effectively a no-op. There are only two minor functional changes:

- work.nwait is incremented after setting pp.gcMarkWorkerMode in the background worker code. I don't believe this change is observable except if the code fails to update work.nwait (either it results in a non-sensical number, or the stack is corrupted) in which case the goroutine may not be labeled as a mark worker in the resulting stack trace (it should be obvious from the stack frames though).

- endCheckmarks also checks work.nwait == work.nproc, which should be fine since we never mutate work.nwait on that path. That extra check should be a no-op.

Splitting these two use-cases for gcMarkWorkAvailable is conceptually helpful, and the checks may also diverge from Green Tea once we get rid of the global span queue.

Change-Id: I0bec244a14ee82919c4deb7c1575589c0dca1089
Reviewed-on: https://go-review.googlesource.com/c/go/+/701176
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-07-28  all: omit unnecessary type conversions  Jes Cok
Found by github.com/mdempsky/unconvert Change-Id: Ib78cceb718146509d96dbb6da87b27dbaeba1306 GitHub-Last-Rev: dedf354811701ce8920c305b6f7aa78914a4171c GitHub-Pull-Request: golang/go#74771 Reviewed-on: https://go-review.googlesource.com/c/go/+/690735 Reviewed-by: Mark Freeman <mark@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2025-07-25  runtime: rename scanobject to scanObject  Michael Anthony Knyszek
This is long overdue. Change-Id: I891b114cb581e82b903c20d1c455bbbdad548fe8 Reviewed-on: https://go-review.googlesource.com/c/go/+/690535 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2025-07-25  runtime: duplicate scanobject in greentea and non-greentea files  Michael Anthony Knyszek
This change exists to help differentiate profile samples spent on Green Tea and non-Green-Tea GC time in mixed contexts. Change-Id: I8dea340d2d11ba4c410ae939fb5f37020d0b55d1 Reviewed-on: https://go-review.googlesource.com/c/go/+/689477 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-06-18  runtime: prevent mutual deadlock between GC stopTheWorld and suspendG  Michael Anthony Knyszek
Almost everywhere we stop the world we casGToWaitingForGC to prevent mutual deadlock with the GC trying to scan our stack. This historically was only necessary if we weren't stopping the world to change the GC phase, because what we were worried about was mutual deadlock with mark workers' use of suspendG. And, they were the only users of suspendG. In Go 1.22 this changed. The execution tracer began using suspendG, too. This leads to the possibility of mutual deadlock between the execution tracer and a goroutine trying to start or end the GC mark phase. The fix is simple: make the stop-the-world calls for the GC also call casGToWaitingForGC. This way, suspendG is guaranteed to make progress in this circumstance, and once it completes, the stop-the-world can complete as well. We can take this a step further, though, and move casGToWaitingForGC into stopTheWorldWithSema, since there's no longer really a place we can afford to skip this detail. While we're here, rename casGToWaitingForGC to casGToWaitingForSuspendG, since the GC is now not the only potential source of mutual deadlock. Fixes #72740. Change-Id: I5e3739a463ef3e8173ad33c531e696e46260692f Reviewed-on: https://go-review.googlesource.com/c/go/+/681501 Reviewed-by: Carlos Amedee <carlos@golang.org> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-05-21  runtime: add valgrind instrumentation  Roland Shoemaker
Add build tag gated Valgrind annotations to the runtime which let it understand how the runtime manages memory. This allows for Go binaries to be run under Valgrind without emitting spurious errors. Instead of adding the Valgrind headers to the tree, and using cgo to call the various Valgrind client request macros, we just add an assembly function which emits the necessary instructions to trigger client requests. In particular we add instrumentation of the memory allocator, using a two-level mempool structure (as described in the Valgrind manual [0]). We also add annotations which allow Valgrind to track which memory we use for stacks, which seems necessary to let it properly function. We describe the memory model to Valgrind as follows: we treat heap arenas as a "pool" created with VALGRIND_CREATE_MEMPOOL_EXT (so that we can use VALGRIND_MEMPOOL_METAPOOL and VALGRIND_MEMPOOL_AUTO_FREE). Within the pool we treat spans as "superblocks", annotated with VALGRIND_MEMPOOL_ALLOC. We then allocate individual objects within spans with VALGRIND_MALLOCLIKE_BLOCK. It should be noted that running binaries under Valgrind can be _quite slow_, and certain operations, such as running the GC, can be _very slow_. It is recommended to run programs with GOGC=off. Additionally, async preemption should be turned off, since it'll cause strange behavior (GODEBUG=asyncpreemptoff=1). Running Valgrind with --leak-check=yes will result in some errors resulting from some things not being marked fully free'd. These likely need more annotations to rectify, but for now it is recommended to run with --leak-check=off. Updates #73602 [0] https://valgrind.org/docs/manual/mc-manual.html#mc-manual.mempools Change-Id: I71b26c47d7084de71ef1e03947ef6b1cc6d38301 Reviewed-on: https://go-review.googlesource.com/c/go/+/674077 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-20  runtime: add scan trace for checkfinalizers>1  Michael Anthony Knyszek
This change dumps a scan trace (each pointer marked and where it came from) for the partial GC cycle performed by checkfinalizers mode when checkfinalizers>1. This is useful for quickly understanding why certain values are reachable without having to pull out tools like viewcore. For #72949. Change-Id: Ic583f80e9558cdfe1c667d27a1d975008dd39a9c Reviewed-on: https://go-review.googlesource.com/c/go/+/662038 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-20  runtime: add new GODEBUG checkfinalizer  Michael Anthony Knyszek
This new debug mode detects cleanup/finalizer leaks using checkmark mode. It runs a partial GC using only specials as roots. If the GC can find a path from one of these roots back to the object the special is attached to, then the object might never be reclaimed. (The cycle could be broken in the future, but it's almost certainly a bug.) This debug mode is very barebones. It contains no type information and no stack location for where the finalizer or cleanup was created. For #72949. Change-Id: Ibffd64c1380b51f281950e4cfe61f677385d42a5 Reviewed-on: https://go-review.googlesource.com/c/go/+/634599 Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2025-05-20  runtime: only update freeIndexForScan outside of the mark phase  Michael Anthony Knyszek
Currently, it's possible for asynchronous preemption to observe a partially initialized object. The sequence of events goes like this:

- The GC is in the mark phase.
- Thread T1 is allocating object O1.
- Thread T1 zeroes the allocation, runs the publication barrier, and updates freeIndexForScan. It has not yet updated the mark bit on O1.
- Thread T2 is conservatively scanning some stack frame. That stack frame has a dead pointer with the same address as O1.
- T2 picks up the pointer, checks isFree (which checks freeIndexForScan without an import barrier), and sees that O1 is allocated. It marks and queues O1.
- T2 then goes to scan O1, and observes uninitialized memory.

Although a publication barrier was executed, T2 did not have an import barrier. T2 may thus observe T1's writes to zero the object out-of-order with the write to freeIndexForScan. Normally this would be impossible if T2 got a pointer to O1 from somewhere written by T1. The publication barrier guarantees that if the read side is data-dependent on the write side then we'd necessarily observe all writes to O1 before T1 published it. However, T2 got the pointer 'out of thin air' by scanning a stack frame with a dead pointer on it.

One fix to this problem would be to add the import barrier in the conservative scanner. We would then also need to put freeIndexForScan behind the publication barrier, or make the write to freeIndexForScan exactly that barrier.

However, there's a simpler way. We don't actually care if conservative scanning observes a stale freeIndexForScan during the mark phase. Newly-allocated memory is always marked at the point of allocation (the allocate-black policy part of the GC's design). So it doesn't actually matter whether the garbage collector scans that memory or not.

This change modifies the allocator to only update freeIndexForScan outside the mark phase. This means freeIndexForScan is essentially a snapshot of freeindex at the point the mark phase started. Because there's no more race between conservative scanning and newly-allocated objects, the complicated scenario above is no longer a possibility.

One thing we do have to be careful of is other callers of isFree. Previously freeIndexForScan would always track freeindex, now it no longer does. This change thus introduces isFreeOrNewlyAllocated which is used by the conservative scanner, and uses freeIndexForScan. Meanwhile isFree goes back to using freeindex like it used to.

This change also documents the requirement on isFree that the caller must have obtained the pointer not 'out of thin air' but after the object was published. isFree is not currently used anywhere particularly sensitive (heap dump and checkmark mode, where the world is stopped in both cases) so using freeindex is both conceptually simple and also safe.

Change-Id: If66b8c536b775971203fb4358c17d711c2944723
Reviewed-on: https://go-review.googlesource.com/c/go/+/672340
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
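In outline, the new invariant can be pictured like this (hypothetical types; the real fields live on mspan and are updated with the allocator's own synchronization):

    package sketch

    // gcMarkPhase stands in for "the GC is currently marking".
    var gcMarkPhase bool

    type span struct {
        freeindex        uintptr // advances on every allocation
        freeIndexForScan uintptr // only advances while gcMarkPhase is false
    }

    // noteAlloc records that object index i was just allocated.
    func (s *span) noteAlloc(i uintptr) {
        s.freeindex = i + 1
        if !gcMarkPhase {
            // Outside the mark phase the two indexes track each other;
            // during marking, freeIndexForScan stays frozen at its
            // phase-start value.
            s.freeIndexForScan = s.freeindex
        }
    }

    // isFree is for callers that obtained their pointer after the object
    // was published.
    func (s *span) isFree(i uintptr) bool { return i >= s.freeindex }

    // isFreeOrNewlyAllocated is what conservative scanning uses: anything
    // at or beyond the phase-start snapshot is either free or was
    // allocated during the mark phase, and mark-phase allocations are
    // already marked (allocate-black), so the scanner can safely skip
    // them.
    func (s *span) isFreeOrNewlyAllocated(i uintptr) bool {
        return i >= s.freeIndexForScan
    }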
2025-05-08  runtime: schedule cleanups across multiple goroutines  Michael Anthony Knyszek
This change splits the finalizer and cleanup queues and implements a new lock-free blocking queue for cleanups. The basic design is as follows: The cleanup queue is organized in fixed-sized blocks. Individual cleanup functions are queued, but only whole blocks are dequeued. Enqueuing cleanups places them in P-local cleanup blocks. These are flushed to the full list as they get full. Cleanups can only be enqueued by an active sweeper. Dequeuing cleanups always dequeues entire blocks from the full list. Cleanup blocks can be dequeued and executed at any time. The very last active sweeper in the sweep phase is responsible for flushing all local cleanup blocks to the full list. It can do this without any synchronization because the next GC can't start yet, so we can be very certain that nobody else will be accessing the local blocks. Cleanup blocks are stored off-heap because they need to be allocated by the sweeper, which is called from heap allocation paths. As a result, the GC treats cleanup blocks as roots, just like finalizer blocks. Flushes to the full list signal to the scheduler that cleanup goroutines should be awoken. Every time the scheduler goes to wake up a cleanup goroutine and there were more signals than goroutines to wake, it then forwards this signal to runtime.AddCleanup, so that it creates another goroutine the next time it is called, up to gomaxprocs goroutines. The signals here are a little convoluted, but exist because the sweeper and the scheduler cannot safely create new goroutines. For #71772. For #71825. Change-Id: Ie839fde2b67e1b79ac1426be0ea29a8d923a62cc Reviewed-on: https://go-review.googlesource.com/c/go/+/650697 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
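A much-simplified, mutex-based model of the block-structured queue described above (the real one is lock-free and its flush rules are tied to sweeping):

    package sketch

    import "sync"

    const cleanupsPerBlock = 256 // illustrative capacity, not the real value

    type cleanupBlock struct {
        fns [cleanupsPerBlock]func()
        n   int
    }

    // cleanupQueue holds whole blocks that are ready to be executed.
    type cleanupQueue struct {
        mu   sync.Mutex
        full []*cleanupBlock
    }

    // enqueue adds fn to the caller's local block, publishing the block
    // to the full list once it fills up (the P-local path in the design).
    func (q *cleanupQueue) enqueue(local **cleanupBlock, fn func()) {
        b := *local
        if b == nil {
            b = new(cleanupBlock)
            *local = b
        }
        b.fns[b.n] = fn
        b.n++
        if b.n == len(b.fns) {
            q.flush(local)
        }
    }

    // flush publishes the local block, full or not (what the last active
    // sweeper does at the end of the sweep phase).
    func (q *cleanupQueue) flush(local **cleanupBlock) {
        if *local == nil {
            return
        }
        q.mu.Lock()
        q.full = append(q.full, *local)
        q.mu.Unlock()
        *local = nil
    }

    // dequeue hands an entire block to a cleanup goroutine.
    func (q *cleanupQueue) dequeue() *cleanupBlock {
        q.mu.Lock()
        defer q.mu.Unlock()
        if len(q.full) == 0 {
            return nil
        }
        b := q.full[len(q.full)-1]
        q.full = q.full[:len(q.full)-1]
        return b
    }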
2025-05-07  runtime: use "bubble" terminology for synctest  Damien Neil
We've settled on calling the group of goroutines started by synctest.Run a "bubble". At the time the runtime implementation was written, I was still calling this a "group". Update the code to match the current terminology. Change-Id: I31b757f31d804b5d5f9564c182627030a9532f4a Reviewed-on: https://go-review.googlesource.com/c/go/+/670135 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Damien Neil <dneil@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-02  runtime: mark and scan small objects in whole spans [green tea]  Michael Anthony Knyszek
Our current parallel mark algorithm suffers from frequent stalls on memory since its access pattern is essentially random. Small objects are the worst offenders, since each one forces pulling in at least one full cache line to access even when the amount to be scanned is far smaller than that. Each object also requires an independent access to per-object metadata. The purpose of this change is to improve garbage collector performance by scanning small objects in batches to obtain better cache locality than our current approach. The core idea behind this change is to defer marking and scanning small objects, and then scan them in batches localized to a span. This change adds scanned bits to each small object (<=512 bytes) span in addition to mark bits. The scanned bits indicate that the object has been scanned. (One way to think of them is "grey" bits and "black" bits in the tri-color mark-sweep abstraction.) Each of these spans is always 8 KiB and if they contain pointers, the pointer/scalar data is already packed together at the end of the span, allowing us to further optimize the mark algorithm for this specific case. When the GC encounters a pointer, it first checks if it points into a small object span. If so, it is first marked in the mark bits, and then the object is queued on a work-stealing P-local queue. This object represents the whole span, and we ensure that a span can only appear at most once in any queue by maintaining an atomic ownership bit for each span. Later, when the pointer is dequeued, we scan every object with a set mark that doesn't have a corresponding scanned bit. If it turns out that was the only object in the mark bits since the last time we scanned the span, we scan just that object directly, essentially falling back to the existing algorithm. noscan objects have no scan work, so they are never queued. Each span's mark and scanned bits are co-located together at the end of the span. Since the span is always 8 KiB in size, it can be found with simple pointer arithmetic. Next to the marks and scans we also store the size class, eliminating the need to access the span's mspan altogether. The work-stealing P-local queue is a new source of GC work. If this queue gets full, half of it is dumped to a global linked list of spans to scan. The regular scan queues are always prioritized over this queue to allow time for darts to accumulate. Stealing work from other Ps is a last resort. This change also adds a new debug mode under GODEBUG=gctrace=2 that dumps whole-span scanning statistics by size class on every GC cycle. A future extension to this CL is to use SIMD-accelerated scanning kernels for scanning spans with high mark bit density. For #19112. (Deadlock averted in GOEXPERIMENT.) For #73581. Change-Id: I4bbb4e36f376950a53e61aaaae157ce842c341bc Reviewed-on: https://go-review.googlesource.com/c/go/+/658036 Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
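A toy model of the per-span state described above (real spans keep packed mark/scan bitmaps at the end of the 8 KiB span and update them atomically); "marked" plays the role of the grey bit and "scanned" the black bit:

    package sketch

    type smallSpan struct {
        marked  [128]bool // set when a pointer to the object is found
        scanned [128]bool // set once the object's contents have been scanned
        queued  bool      // ownership bit: span is on some P's local queue
    }

    // noteMarked records a pointer to object i and reports whether the
    // span needs to be queued for scanning. The ownership bit ensures the
    // span appears in at most one queue at a time.
    func (s *smallSpan) noteMarked(i int) (enqueue bool) {
        s.marked[i] = true
        if !s.queued {
            s.queued = true
            return true
        }
        return false
    }

    // scanPending runs when the span is dequeued: scan every object that
    // was marked since the last visit, batching work across the whole span.
    func (s *smallSpan) scanPending(scanObject func(i int)) {
        s.queued = false
        for i := range s.marked {
            if s.marked[i] && !s.scanned[i] {
                s.scanned[i] = true
                scanObject(i)
            }
        }
    }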
2025-04-15  runtime: size field for gQueue and gList  Dmitrii Martynov
Before this CL, all users of gQueue and gList stored the size of these structures in a separate variable. The size was updated manually and passed as a separate argument to different functions. This CL adds a size field to the gQueue and gList structures and moves the size bookkeeping into the implementation of their API. This reduces the potential for errors by eliminating manual size tracking and simplifies function signatures. Change-Id: I087da2dfaec4925e4254ad40fce5ccb4c175ec41 Reviewed-on: https://go-review.googlesource.com/c/go/+/664777 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org>
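In sketch form, the change moves from "callers track the size" to "the queue tracks its own size" (simplified stand-in, not the runtime's gQueue):

    package sketch

    type g struct{ next *g }

    // gQueueSketch mirrors the shape of the change: an intrusive linked
    // list of g's whose size is maintained by push/pop instead of being
    // threaded around by callers.
    type gQueueSketch struct {
        head, tail *g
        size       int32
    }

    func (q *gQueueSketch) pushBack(gp *g) {
        gp.next = nil
        if q.tail == nil {
            q.head = gp
        } else {
            q.tail.next = gp
        }
        q.tail = gp
        q.size++
    }

    func (q *gQueueSketch) pop() *g {
        gp := q.head
        if gp == nil {
            return nil
        }
        q.head = gp.next
        if q.head == nil {
            q.tail = nil
        }
        q.size--
        return gp
    }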
2025-02-03  runtime: fix GODEBUG=gccheckmark=1 and add smoke test  Michael Anthony Knyszek
This change fixes GODEBUG=gccheckmark=1 which seems to have bit-rotted. Because the root jobs weren't being reset, it wasn't doing anything. Then, it turned out that checkmark mode would queue up noscan objects in workbufs, which caused it to fail. Then it turned out checkmark mode was broken with user arenas, since their heap arenas are not registered anywhere. Then, it turned out that checkmark mode could just not run properly if the goroutine's preemption flag was set (since sched.gcwaiting is true during the STW). And lastly, it turned out that async preemption could cause erroneous checkmark failures. This change fixes all these issues and adds a simple smoke test to dist to run the runtime tests under gccheckmark, which exercises all of these issues. Fixes #69074. Fixes #69377. Fixes #69376. Change-Id: Iaa0bb7b9e63ed4ba34d222b47510d6292ce168bc Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest Reviewed-on: https://go-review.googlesource.com/c/go/+/608915 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-11-19  internal/synctest: new package for testing concurrent code  Damien Neil
Add an internal (for now) implementation of testing/synctest. The synctest.Run function executes a tree of goroutines in an isolated environment using a fake clock. The synctest.Wait function allows a test to wait for all other goroutines within the test to reach a blocking point. For #67434 For #69687 Change-Id: Icb39e54c54cece96517e58ef9cfb18bf68506cfc Reviewed-on: https://go-review.googlesource.com/c/go/+/591997 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
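For a sense of how tests read on top of this machinery, here is a sketch using the public testing/synctest API that later wraps it (assumes Go 1.24+ with GOEXPERIMENT=synctest; inside the bubble, time is a fake clock):

    package example_test

    import (
        "testing"
        "testing/synctest"
        "time"
    )

    func TestTimerFires(t *testing.T) {
        synctest.Run(func() {
            fired := false
            time.AfterFunc(time.Hour, func() { fired = true })

            // The fake clock advances once every goroutine in the bubble
            // is durably blocked, so this returns "an hour later"
            // immediately in real time.
            time.Sleep(time.Hour)

            // Wait for the timer's goroutine to finish before checking.
            synctest.Wait()
            if !fired {
                t.Error("timer did not fire")
            }
        })
    }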
2024-11-18  cmd/compile: remove gc programs from stack frame objects  Keith Randall
This is a two-pronged approach. First, try to keep large objects off the stack frame. Second, if they do manage to appear anyway, use straight bitmasks instead of gc programs. Generally it is probably a good idea to keep large objects out of stack frames, but keeping gc programs off the stack in particular simplifies runtime code a bit. This CL sets the limit of most stack objects to 131072 bytes (on 64-bit archs). There can still be large objects if allocated by a late pass, like order, or they are required to be on the stack, like function arguments. But the size of the bitmasks for these objects isn't a huge deal, as we already have (probably several) bitmasks for the frame liveness map itself. Change-Id: I6d2bed0e9aa9ac7499955562c6154f9264061359 Reviewed-on: https://go-review.googlesource.com/c/go/+/542815 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-11-16  runtime: implement AddCleanup  Carlos Amedee
This change introduces AddCleanup to the runtime package. AddCleanup attaches a cleanup function to a pointer to an object. The Stop method on Cleanups will be implemented in a followup CL. AddCleanup is intended to be an incremental improvement over SetFinalizer and will result in SetFinalizer being deprecated. For #67535 Change-Id: I99645152e3fdcee85fcf42a4f312c6917e8aecb1 Reviewed-on: https://go-review.googlesource.com/c/go/+/627695 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
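A sketch of typical usage of the new API (the Stop method mentioned above lands in the follow-up CL); the cleanup argument must not reference the object itself, or the object can never become unreachable:

    package example

    import (
        "os"
        "runtime"
    )

    // File wraps an *os.File and arranges for the descriptor to be closed
    // if the wrapper becomes unreachable without an explicit Close.
    type File struct {
        f *os.File
    }

    func Open(name string) (*File, error) {
        f, err := os.Open(name)
        if err != nil {
            return nil, err
        }
        w := &File{f: f}
        // The cleanup receives the inner *os.File as its argument, not
        // the wrapper, so the wrapper itself can still be collected.
        runtime.AddCleanup(w, func(file *os.File) { file.Close() }, f)
        return w, nil
    }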
2024-10-21  runtime: refactor mallocgc into several independent codepaths  Michael Anthony Knyszek
Right now mallocgc is a monster of a function. In real programs, we see that a substantial amount of time in mallocgc is spent in mallocgc itself. It's very branch-y, holds a lot of state, and handles quite a few disparate cases, trying to merge them together. This change breaks apart mallocgc into separate, leaner functions. There's some duplication now, but there are a lot of branches that can be pruned as a result.

There's definitely still more we can do here. heapSetType can be inlined and broken down for each case, since its internals roughly map to each case anyway (done in a follow-up CL). We can probably also do more with the size class lookups, since we know more about the size of the object in each case than before.

Below are the savings for the full stack up until now.

                      │ after-3.out │             after-4.out             │
                      │   sec/op    │    sec/op      vs base              │
Malloc8-4               13.32n ± 2%   12.17n ±  1%   -8.63% (p=0.002 n=6)
Malloc16-4              21.64n ± 3%   19.38n ± 10%  -10.47% (p=0.002 n=6)
MallocTypeInfo8-4       23.15n ± 2%   19.91n ±  2%  -14.00% (p=0.002 n=6)
MallocTypeInfo16-4      25.86n ± 4%   22.48n ±  5%  -13.11% (p=0.002 n=6)
MallocLargeStruct-4     270.0n ± ∞ ¹
geomean                 20.38n        30.97n        -11.58%

Change-Id: I681029c0b442f9221c4429950626f06299a5cfe4
Reviewed-on: https://go-review.googlesource.com/c/go/+/614257
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
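The shape of the split, in caricature (names and thresholds are illustrative; the real paths and their exact conditions live in malloc.go):

    package sketch

    const (
        maxTinySize  = 16      // tiny allocator limit
        maxSmallSize = 32 << 10 // largest size-classed object
    )

    func mallocTiny(size uintptr) uintptr             { return 0 } // pointer-free, tiny
    func mallocSmallNoScan(size uintptr) uintptr      { return 0 } // pointer-free, size-classed
    func mallocSmallScan(size uintptr) uintptr        { return 0 } // has pointers, size-classed
    func mallocLarge(size uintptr, scan bool) uintptr { return 0 } // one span per object

    // alloc shows the dispatch: each case becomes its own lean function
    // carrying only the branches and state it actually needs, instead of
    // one branch-heavy mallocgc.
    func alloc(size uintptr, hasPointers bool) uintptr {
        switch {
        case !hasPointers && size < maxTinySize:
            return mallocTiny(size)
        case !hasPointers && size <= maxSmallSize:
            return mallocSmallNoScan(size)
        case size <= maxSmallSize:
            return mallocSmallScan(size)
        default:
            return mallocLarge(size, hasPointers)
        }
    }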
2024-10-21  runtime: move debug checks behind constant flag in mallocgc  Michael Anthony Knyszek
These debug checks are very occasionally helpful, but they do cost real time. The biggest issue seems to be the bloat of mallocgc due to the "throw" paths. Overall, after some follow-ups, this change cuts about 1ns off of the mallocgc fast path. This is a microoptimization that on its own changes very little, but together with other optimizations and a breaking up of the various malloc paths will matter all together ("death by a thousand cuts"). Change-Id: I07c4547ad724b9f94281320846677fb558957721 Reviewed-on: https://go-review.googlesource.com/c/go/+/617878 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2024-07-23  runtime,internal: move runtime/internal/sys to internal/runtime/sys  David Chase
Cleanup and friction reduction For #65355. Change-Id: Ia14c9dc584a529a35b97801dd3e95b9acc99a511 Reviewed-on: https://go-review.googlesource.com/c/go/+/600436 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>
2024-04-18  internal/weak: add package implementing weak pointers  Michael Anthony Knyszek
This change adds the internal/weak package, which exposes GC-supported weak pointers to the standard library. This is for the upcoming weak package, but may be useful for other future constructs. For #62483. Change-Id: I4aa8fa9400110ad5ea022a43c094051699ccab9d Reviewed-on: https://go-review.googlesource.com/c/go/+/576297 Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
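For a feel of the semantics, here is how the public weak package that later grew out of internal/weak is used (assumes Go 1.24+); a weak pointer does not keep its target alive, and Value returns nil once the target has been collected:

    package main

    import (
        "fmt"
        "runtime"
        "weak"
    )

    func main() {
        v := new(int)
        *v = 42
        wp := weak.Make(v)

        if p := wp.Value(); p != nil {
            fmt.Println("still alive:", *p)
        }
        runtime.KeepAlive(v) // v is live at least until this point

        // Once v is unreachable, a GC cycle may clear the weak pointer.
        runtime.GC()
        if wp.Value() == nil {
            fmt.Println("target was collected")
        }
    }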
2024-04-15  runtime, cmd/trace: remove code paths that include v1 tracer  Carlos Amedee
This change makes the new execution tracer described in #60773 the default tracer. This change attempts to make the smallest set of changes possible in a single CL. Updates #66703 For #60773 Change-Id: I3742f3419c54f07d7c020ae5e1c18d29d8bcae6d Reviewed-on: https://go-review.googlesource.com/c/go/+/576256 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-04-09  runtime: remove the allocheaders GOEXPERIMENT  Michael Anthony Knyszek
This change removes the allocheaders, deleting all the old code and merging mbitmap_allocheaders.go back into mbitmap.go. This change also deletes the SetType benchmarks which were already broken in the new GOEXPERIMENT (it's harder to set up than before). We weren't really watching these benchmarks at all, and they don't provide additional test coverage. Change-Id: I135497201c3259087c5cd3722ed3fbe24791d25d Reviewed-on: https://go-review.googlesource.com/c/go/+/567200 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2024-04-05  runtime: take a stack trace during tracing only when we own the stack  Michael Anthony Knyszek
Currently, the execution tracer may attempt to take a stack trace of a goroutine whose stack it does not own. For example, if the goroutine is in _Grunnable or _Gwaiting. This is easily fixed in all cases by simply moving the emission of GoStop and GoBlock events to before the casgstatus happens. The goroutine status is what is used to signal stack ownership, and the GC may shrink a goroutine's stack if it can acquire the scan bit. Although this is easily fixed, the interaction here is very subtle, because stack ownership is only implicit in the goroutine's scan status. To make this invariant more maintainable and less error-prone in the future, this change adds a GODEBUG setting that checks, at the point of taking a stack trace, whether the caller owns the goroutine. This check is not quite perfect because there's no way for the stack tracing code to know that the _Gscan bit was acquired by the caller, so for simplicity it assumes that it was the caller that acquired the scan bit. In all other cases however, we can check for ownership precisely. At the very least, this check is sufficient to catch the issue this change is fixing. To make sure this debug check doesn't bitrot, it's always enabled during trace testing. This new mode has actually caught a few other issues already, so this change fixes them. One issue that this debug mode caught was that it's not safe to take a stack trace of a _Gwaiting goroutine that's being unparked. Another much bigger issue this debug mode caught was the fact that the execution tracer could try to take a stack trace of a G that was in _Gwaiting solely to avoid a deadlock in the GC. The execution tracer already has a partial list of these cases since they're modeled as the goroutine just executing as normal in the tracer, but this change takes the list and makes it more formal. In this specific case, we now prevent the GC from shrinking the stacks of goroutines in this state if tracing is enabled. The stack traces from these scenarios are too useful to discard, but there is indeed a race here between the tracer and any attempt to shrink the stack by the GC. Change-Id: I019850dabc8cede202fd6dcc0a4b1f16764209fb Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-amd64-longtest-race Reviewed-on: https://go-review.googlesource.com/c/go/+/573155 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2024-03-25  runtime: migrate internal/atomic to internal/runtime  Andy Pan
For #65355 Change-Id: I65dd090fb99de9b231af2112c5ccb0eb635db2be Reviewed-on: https://go-review.googlesource.com/c/go/+/560155 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ibrahim Bazoka <ibrahimbazoka729@gmail.com> Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
2024-01-30  runtime: compute data and bss root work units in one loop  jeffery
Change-Id: Ia730ca244c83db925879de206809938aeb969cdd GitHub-Last-Rev: 711b8b8b935552eea6136242821444e83fc23d38 GitHub-Pull-Request: golang/go#64349 Reviewed-on: https://go-review.googlesource.com/c/go/+/544478 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-11-17  runtime: use span.elemsize for accounting in mallocgc  Michael Anthony Knyszek
Currently the final size computed for an object in mallocgc excludes the allocation header. This is correct in a number of cases, but definitely wrong for memory profiling because the "free" side accounts for the full allocation slot. This change makes an explicit distinction between the parts of mallocgc that care about the full allocation slot size ("the GC's accounting") and those that don't (pointer+len should always be valid). It then applies the appropriate size to the different forms of accounting in mallocgc. For #64153. Change-Id: I481b34b2bb9ff923b59e8408ab2b8fb9025ba944 Reviewed-on: https://go-review.googlesource.com/c/go/+/542735 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-11-10  runtime: add execution tracer v2 behind GOEXPERIMENT=exectracer2  Michael Anthony Knyszek
This change mostly implements the design described in #60773 and includes a new scalable parser for the new trace format, available in internal/trace/v2. I'll leave this commit message short because this is clearly an enormous CL with a lot of detail. This change does not hook up the new tracer into cmd/trace yet. A follow-up CL will handle that. For #60773. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-amd64-longtest-race Change-Id: I5d2aca2cc07580ed3c76a9813ac48ec96b157de0 Reviewed-on: https://go-review.googlesource.com/c/go/+/494187 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-09  runtime: make all GC mark workers yield for forEachP  Michael Anthony Knyszek
Currently dedicated GC mark workers really try to avoid getting preempted. The one exception is for a pending STW, indicated by sched.gcwaiting. This is currently fine because other kinds of preemptions don't matter to the mark workers: they're intentionally bound to their P. With the new execution tracer we're going to want to use forEachP to get the attention of all Ps. We may want to do this during a GC cycle. forEachP doesn't set sched.gcwaiting, so it may end up waiting the full GC mark phase, burning a thread and a P in the meantime. This can mean basically seconds of waiting and trying to preempt GC mark workers. This change makes all mark workers yield if (*p).runSafePointFn != 0 so that the workers actually yield somewhat promptly in response to a forEachP attempt. Change-Id: I7430baf326886b9f7a868704482a224dae7c9bba Reviewed-on: https://go-review.googlesource.com/c/go/+/537235 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2023-11-09  runtime: refactor runtime->tracer API to appear more like a lock  Michael Anthony Knyszek
Currently the execution tracer synchronizes with itself using very heavyweight operations. As a result, it's totally fine for most of the tracer code to look like: if traceEnabled() { traceXXX(...) } However, if we want to make that synchronization more lightweight (as issue #60773 proposes), then this is insufficient. In particular, we need to make sure the tracer can't observe an inconsistency between g atomicstatus and the event that would be emitted for a particular g transition. This means making the g status change appear to happen atomically with the corresponding trace event being written out from the perspective of the tracer. This requires a change in API to something more like a lock. While we're here, we might as well make sure that trace events can *only* be emitted while this lock is held. This change introduces such an API: traceAcquire, which returns a value that can emit events, and traceRelease, which requires the value that was returned by traceAcquire. In practice, this won't be a real lock, it'll be more like a seqlock. For the current tracer, this API is completely overkill and the value returned by traceAcquire basically just checks trace.enabled. But it's necessary for the tracer described in #60773 and we can implement that more cleanly if we do this refactoring now instead of later. For #60773. Change-Id: Ibb9ff5958376339fafc2b5180aef65cf2ba18646 Reviewed-on: https://go-review.googlesource.com/c/go/+/515635 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
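In sketch form, the pattern the refactoring moves toward looks like the following (the names mirror the description above, but these types are illustrative, not the runtime's):

    package sketch

    // traceLocker is the value returned by traceAcquire. Holding it is
    // the only way to emit events, so a status change and its event can
    // be made to appear atomic from the tracer's perspective.
    type traceLocker struct{ ok bool }

    func traceEnabled() bool { return false } // stand-in for the real check

    func traceAcquire() traceLocker   { return traceLocker{ok: traceEnabled()} }
    func traceRelease(tl traceLocker) {}

    // GoBlock emits the GoBlock event; only callable with the locker held.
    func (tl traceLocker) GoBlock() {}

    // blockCurrentG shows the usage: the event and the status transition
    // sit inside one acquire/release section.
    func blockCurrentG(setStatusBlocked func()) {
        tl := traceAcquire()
        if tl.ok {
            tl.GoBlock()
        }
        setStatusBlocked()
        traceRelease(tl)
    }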
2023-11-09  runtime: implement experiment to replace heap bitmap with alloc headers  Michael Anthony Knyszek
This change replaces the 1-bit-per-word heap bitmap for most size classes with allocation headers for objects that contain pointers. The header consists of a single pointer to a type. All allocations with headers are treated as implicitly containing one or more instances of the type in the header. As the name implies, headers are usually stored as the first word of an object. There are two additional exceptions to where headers are stored and how they're used. Objects smaller than 512 bytes do not have headers. Instead, a heap bitmap is reserved at the end of spans for objects of this size. A full word of overhead is too much for these small objects. The bitmap is of the same format of the old bitmap, minus the noMorePtrs bits which are unnecessary. All the objects <512 bytes have a bitmap less than a pointer-word in size, and that was the granularity at which noMorePtrs could stop scanning early anyway. Objects that are larger than 32 KiB (which have their own span) have their headers stored directly in the span, to allow power-of-two-sized allocations to not spill over into an extra page. The full implementation is behind GOEXPERIMENT=allocheaders. The purpose of this change is performance. First and foremost, with headers we no longer have to unroll pointer/scalar data at allocation time for most size classes. Small size classes still need some unrolling, but their bitmaps are small so we can optimize that case fairly well. Larger objects effectively have their pointer/scalar data unrolled on-demand from type data, which is much more compactly represented and results in less TLB pressure. Furthermore, since the headers are usually right next to the object and where we're about to start scanning, we get an additional temporal locality benefit in the data cache when looking up type metadata. The pointer/scalar data is now effectively unrolled on-demand, but it's also simpler to unroll than before; that unrolled data is never written anywhere, and for arrays we get the benefit of retreading the same data per element, as opposed to looking it up from scratch for each pointer-word of bitmap. Lastly, because we no longer have a heap bitmap that spans the entire heap, there's a flat 1.5% memory use reduction. This is balanced slightly by some objects possibly being bumped up a size class, but most objects are not tightly optimized to size class sizes so there's some memory to spare, making the header basically free in those cases. See the follow-up CL which turns on this experiment by default for benchmark results. (CL 538217.) Change-Id: I4c9034ee200650d06d8bdecd579d5f7c1bbf1fc5 Reviewed-on: https://go-review.googlesource.com/c/go/+/437955 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-07  cmd/compile,runtime: dedup writeBarrier needed  Mauri de Souza Meneguzzo
The writeBarrier "needed" struct member has the exact same value as "enabled", and is used interchangeably. I'm not sure if we plan to make a distinction between the two at some point, but today they are effectively the same, so dedup it and keep only "enabled". Change-Id: I65e596f174e1e820dc471a45ff70c0ef4efbc386 GitHub-Last-Rev: f8c805a91606d42c8d5b178ddd7d0bec7aaf9f55 GitHub-Pull-Request: golang/go#63814 Reviewed-on: https://go-review.googlesource.com/c/go/+/538495 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Heschi Kreinick <heschi@google.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-10-24  runtime: use max/min func  qiulaidongfeng
Change-Id: I3f0b7209621b39cee69566a5cc95e4343b4f1f20 GitHub-Last-Rev: af9dbbe69ad74e8c210254dafa260a886b690853 GitHub-Pull-Request: golang/go#63321 Reviewed-on: https://go-review.googlesource.com/c/go/+/531916 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2023-08-23  runtime: create wrappers for gcDrain to identify time in profiles  Michael Anthony Knyszek
Currently it's impossible to identify in profiles where gcDrain-related time is coming from. More specifically, what kind of worker. Create trivial wrappers for each worker so that the difference shows up in stack traces. Also, clarify why gcDrain disables write barriers. Change-Id: I966e3c0b1c583994e691f486bf0ed8cabb91dbbb Reviewed-on: https://go-review.googlesource.com/c/go/+/521815 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-08-21  runtime: drop stack-allocated pcvalueCaches  Austin Clements
Now that pcvalue keeps its cache on the M, we can drop all of the stack-allocated pcvalueCaches and stop carefully passing them around between lots of operations. This significantly simplifies a fair amount of code and makes several structures smaller. This series of changes has no statistically significant effect on any runtime Stack benchmarks. I also experimented with making the cache larger, now that the impact is limited to the M struct, but wasn't able to measure any improvements. This is a re-roll of CL 515277 Change-Id: Ia27529302f81c1c92fb9c3a7474739eca80bfca1 Reviewed-on: https://go-review.googlesource.com/c/go/+/520064 Auto-Submit: Austin Clements <austin@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Austin Clements <austin@google.com>
2023-08-07  Revert "runtime: drop stack-allocated pcvalueCaches"  Austin Clements
This reverts CL 515277 Change-Id: Ie10378eed4993cb69f4a9b43a38af32b9d743155 Reviewed-on: https://go-review.googlesource.com/c/go/+/516855 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Austin Clements <austin@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-08-07  runtime: drop stack-allocated pcvalueCaches  Austin Clements
Now that pcvalue keeps its cache on the M, we can drop all of the stack-allocated pcvalueCaches and stop carefully passing them around between lots of operations. This significantly simplifies a fair amount of code and makes several structures smaller. This series of changes has no statistically significant effect on any runtime Stack benchmarks. I also experimented with making the cache larger, now that the impact is limited to the M struct, but wasn't able to measure any improvements. Change-Id: I4719ebf347c7150a05e887e75a238e23647c20cd Reviewed-on: https://go-review.googlesource.com/c/go/+/515277 TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Austin Clements <austin@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2023-07-18  all: fix typos  Jes Cok
Change-Id: I510b0a4bf3472d937393800dd57472c30beef329 GitHub-Last-Rev: 8d289b73a37bd86080936423d981d21e152aaa33 GitHub-Pull-Request: golang/go#60960 Reviewed-on: https://go-review.googlesource.com/c/go/+/505398 Auto-Submit: Robert Findley <rfindley@google.com> Reviewed-by: Robert Findley <rfindley@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Robert Findley <rfindley@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-05-19  runtime: replace raw traceEv with traceBlockReason in gopark  Michael Anthony Knyszek
This change adds traceBlockReason which leaks fewer implementation details of the tracer to the runtime. Currently, gopark is called with an explicit trace event, but this leaks details about trace internals throughout the runtime. This change will make it easier to change out the trace implementation. Change-Id: Id633e1704d2c8838c6abd1214d9695537c4ac7db Reviewed-on: https://go-review.googlesource.com/c/go/+/494185 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com>
2023-05-11  runtime: replace trace.enabled with traceEnabled  Michael Anthony Knyszek
[git-generate]
cd src/runtime
grep -l 'trace\.enabled' *.go | grep -v "trace.go" | xargs sed -i 's/trace\.enabled/traceEnabled()/g'

Change-Id: I14c7821c1134690b18c8abc0edd27abcdabcad72
Reviewed-on: https://go-review.googlesource.com/c/go/+/494181
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-04-21  internal/abi, runtime, cmd: merge funcID_* consts into internal/abi  Austin Clements
For #59670. Change-Id: I517e97ea74cf232e5cfbb77b127fa8804f74d84b Reviewed-on: https://go-review.googlesource.com/c/go/+/485495 Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com>
2023-03-10  runtime: replace all callback uses of gentraceback with unwinder  Austin Clements
This is a really nice simplification for all of these call sites. It also achieves a nice performance improvement for stack copying:

goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
                         │   before    │               after                 │
                         │   sec/op    │   sec/op      vs base               │
StackCopyPtr-48            89.25m ± 1%   79.78m ± 1%  -10.62% (p=0.000 n=20)
StackCopy-48               83.48m ± 2%   71.88m ± 1%  -13.90% (p=0.000 n=20)
StackCopyNoCache-48        2.504m ± 2%   2.195m ± 1%  -12.32% (p=0.000 n=20)
StackCopyWithStkobj-48     21.66m ± 1%   21.02m ± 2%   -2.95% (p=0.000 n=20)
geomean                    25.21m        22.68m       -10.04%

Updates #54466.

Change-Id: I31715b7b6efd65726940041d3052bb1c0a1186f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/468297
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-02-16  runtime: reimplement GODEBUG=cgocheck=2 as a GOEXPERIMENT  Keith Randall
Move this knob from a binary-startup thing to a build-time thing. This will enable follow-on optimizations to the write barrier. Change-Id: Ic3323348621c76a7dc390c09ff55016b19c43018 Reviewed-on: https://go-review.googlesource.com/c/go/+/447778 Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>