path: root/src/runtime/runtime2.go
Age | Commit message | Author
2025-06-30 | runtime: stash allpSnapshot on the M | Michael Pratt
findRunnable takes a snapshot of allp prior to dropping the P because afterwards procresize may mutate allp without synchronization. procresize is careful to never mutate the contents up to cap(allp), so findRunnable can still safely access the Ps in the slice. Unfortunately, growing allp is problematic. If procresize grows the allp backing array, it drops the reference to the old array. allpSnapshot still refers to the old array, but allpSnapshot is on the system stack in findRunnable, which also likely no longer has a P at all. This means that a future GC will not find the reference and can free the array and use it for another allocation. This would corrupt later reads that findRunnable does from the array. The fix is simple: the M struct itself is reachable by the GC, so we can stash the snapshot in the M to ensure it is visible to the GC. The ugliest part of the CL is the cleanup when we are done with the snapshot because there are so many return/goto top sites. I am tempted to put mp.clearAllpSnapshot() in the caller and at top to make this less error prone, at the expense of extra unnecessary writes. Fixes #74414. Change-Id: I6a6a636c484e4f4b34794fd07910b3fffeca830b Reviewed-on: https://go-review.googlesource.com/c/go/+/684460 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2025-06-27 | runtime: account for missing frame pointer in preamble | Michael Anthony Knyszek
If a goroutine is synchronously preempted, then taking a frame-pointer-based stack trace at that preemption will skip PC of the caller of the function which called into morestack. This happens because the frame pointer is pushed to the stack after the preamble, leaving the stack in an odd state for frame pointer unwinding. Deal with this by marking a goroutine as synchronously preempted and using that signal to load the missing PC from the stack. On LR platforms this is available in gp.sched.lr. On non-LR platforms like x86, it's at gp.sched.sp, because there are no args, no locals, and no frame pointer pushed to the SP yet. For #68090. Change-Id: I73a1206d8b84eecb8a96dbe727195da30088f288 Reviewed-on: https://go-review.googlesource.com/c/go/+/684435 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Nick Ripley <nick.ripley@datadoghq.com>
2025-06-18 | runtime: prevent mutual deadlock between GC stopTheWorld and suspendG | Michael Anthony Knyszek
Almost everywhere we stop the world we casGToWaitingForGC to prevent mutual deadlock with the GC trying to scan our stack. This historically was only necessary if we weren't stopping the world to change the GC phase, because what we were worried about was mutual deadlock with mark workers' use of suspendG. And, they were the only users of suspendG. In Go 1.22 this changed. The execution tracer began using suspendG, too. This leads to the possibility of mutual deadlock between the execution tracer and a goroutine trying to start or end the GC mark phase. The fix is simple: make the stop-the-world calls for the GC also call casGToWaitingForGC. This way, suspendG is guaranteed to make progress in this circumstance, and once it completes, the stop-the-world can complete as well. We can take this a step further, though, and move casGToWaitingForGC into stopTheWorldWithSema, since there's no longer really a place we can afford to skip this detail. While we're here, rename casGToWaitingForGC to casGToWaitingForSuspendG, since the GC is now not the only potential source of mutual deadlock. Fixes #72740. Change-Id: I5e3739a463ef3e8173ad33c531e696e46260692f Reviewed-on: https://go-review.googlesource.com/c/go/+/681501 Reviewed-by: Carlos Amedee <carlos@golang.org> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-06-09 | runtime: clarify stack traces for bubbled goroutines | Damien Neil
Use the synctest bubble ID to identify bubbles in traces, rather than the goroutine ID of the bubble's root goroutine. Some waitReasons include a "(synctest)" suffix to distinguish a durably blocking state from a non-durable one. For example, "chan send" vs. "chan send (synctest)". Change this suffix to "(durable)". Always print a "(durable)" suffix for the state of durably blocked bubbled goroutines. For example, print "sleep (durable)". Drop the "[not] durably blocked" text from goroutine states, since this is now entirely redundant with the waitReason. Old: goroutine 8 [chan receive (synctest), synctest bubble 7, durably blocked]: goroutine 9 [select (no cases), synctest bubble 7, durably blocked]: New: goroutine 8 [chan receive (durable), synctest bubble 1]: goroutine 9 [select (no cases) (durable), synctest bubble 1]: Change-Id: I89112efb25150a98a2954f54d1910ccec52a5824 Reviewed-on: https://go-review.googlesource.com/c/go/+/679376 Auto-Submit: Damien Neil <dneil@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-05-29 | runtime: guarantee no GOMAXPROCS update syscalls after GOMAXPROCS call | Michael Pratt
We already guarantee that no automatic updates to GOMAXPROCS occur after a GOMAXPROCS call returns. This is easily achieved by having the update goroutine double-check that updates are still allowed during STW before committing the new value. However, it is possible for sysmon to concurrently run defaultGOMAXPROCS to compute a new GOMAXPROCS value after GOMAXPROCS returns. This new value will be discarded later, but we'll still perform the system calls necessary to compute the new value. Normally this distinction doesn't matter, but if you want to sandbox a Go program, then you may want to disable GOMAXPROCS updates to reduce the system call footprint. A call to GOMAXPROCS will disable updates, but without a guarantee on when sysmon will observe the change it is somewhat fragile. Add explicit synchronization between GOMAXPROCS and sysmon to guarantee that sysmon won't run defaultGOMAXPROCS after GOMAXPROCS returns. The synchronization is a bit complex because we can't hold a mutex across STW, nor take a semaphore from sysmon, but the result isn't too bad. One oddity is that sched.customGOMAXPROCS and gomaxprocs are no longer updated in lockstep (even though both are protected by sched.lock), but I don't believe anything should depend on that. For #73193. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-staticlockranking Change-Id: I6a6a636cff243a9b69ac1b5d2f98925648e60236 Reviewed-on: https://go-review.googlesource.com/c/go/+/677037 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
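From the user's side, the guarantee above hinges on an explicit call to runtime.GOMAXPROCS. The following is a minimal sketch of such a call; pinGOMAXPROCS is a hypothetical helper name, and the claim that the call disables later automatic updates comes from the commit message, not from anything the snippet itself can observe.

```go
package main

import (
	"fmt"
	"runtime"
)

// pinGOMAXPROCS is a hypothetical helper (not part of the runtime API).
// It sets GOMAXPROCS explicitly and returns the value now in effect.
// Per the commit message above, an explicit call like this also disables
// automatic GOMAXPROCS updates.
func pinGOMAXPROCS(n int) int {
	runtime.GOMAXPROCS(n)        // n > 0 sets the value
	return runtime.GOMAXPROCS(0) // n == 0 only queries the current value
}

func main() {
	fmt.Println("GOMAXPROCS is now", pinGOMAXPROCS(2))
}
```

A sandboxed program would presumably make this call early in main, before the sandbox is locked down.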
2025-05-29 | runtime, internal/synctest, sync: associate WaitGroups with bubbles | Damien Neil
Add support to internal/synctest for managing associations between arbitrary pointers and synctest bubbles. (Implemented internally to the runtime package by attaching a special to the pointer.) Associate WaitGroups with bubbles. Since WaitGroups don't have a constructor, perform the association when Add is called. All Add calls must be made from within the same bubble, or outside any bubble. When a bubbled goroutine calls WaitGroup.Wait, the wait is durably blocking iff the WaitGroup is associated with the current bubble. Change-Id: I77e2701e734ac2fa2b32b28d5b0c853b7b2825c9 Reviewed-on: https://go-review.googlesource.com/c/go/+/676656 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Damien Neil <dneil@google.com>
2025-05-21 | runtime: use cgroup CPU limit to set GOMAXPROCS | Michael Pratt
This CL adds two related features enabled by default via compatibility GODEBUGs containermaxprocs and updatemaxprocs. On Linux, containermaxprocs makes the Go runtime consider cgroup CPU bandwidth limits (quota/period) when setting GOMAXPROCS. If the cgroup limit is lower than the number of logical CPUs available, then the cgroup limit takes precedence. On all OSes, updatemaxprocs makes the Go runtime periodically recalculate the default GOMAXPROCS value and update GOMAXPROCS if it has changed. If GOMAXPROCS is set manually, this update does not occur. This is intended primarily to detect changes to cgroup limits, but it applies on all OSes because the CPU affinity mask can change as well. The runtime only considers the limit in the leaf cgroup (the one that actually contains the process), caching the CPU limit file descriptor(s), which are periodically reread for updates. This is a small departure from the original proposed design. It will not consider limits of parent cgroups (which may be lower than the leaf), and it will not detect cgroup migration after process start. We can consider changing this in the future, but the simpler approach is less invasive; less risk to packages that have some awareness of runtime internals. e.g., if the runtime periodically opens new files during execution, file descriptor leak detection is difficult to implement in a stable way. For #73193. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest Change-Id: I6a6a636c631c1ae577fb8254960377ba91c5dc98 Reviewed-on: https://go-review.googlesource.com/c/go/+/670497 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
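The quota/period comparison described above can be sketched as plain arithmetic. This is an illustration only: effectiveCPUs is a hypothetical function, rounding a fractional limit up is an assumption made here, and the runtime's real policy may differ in its details.

```go
package main

import (
	"fmt"
	"math"
)

// effectiveCPUs is a hypothetical helper, not the runtime's implementation.
// It turns a cgroup CPU bandwidth limit (quota/period) into a CPU count and
// lets it take precedence only when it is lower than the logical CPU count.
// Assumption: a fractional limit is rounded up, and a non-positive quota or
// period means "no limit".
func effectiveCPUs(logicalCPUs int, quota, period int64) int {
	if quota <= 0 || period <= 0 {
		return logicalCPUs
	}
	limit := int(math.Ceil(float64(quota) / float64(period)))
	if limit < 1 {
		limit = 1
	}
	if limit < logicalCPUs {
		return limit
	}
	return logicalCPUs
}

func main() {
	// 250ms of CPU time per 100ms period is a limit of 2.5 CPUs.
	fmt.Println(effectiveCPUs(8, 250000, 100000))
	// No quota: the logical CPU count is used unchanged.
	fmt.Println(effectiveCPUs(8, -1, 100000))
}
```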
2025-05-21 | runtime: add valgrind instrumentation | Roland Shoemaker
Add build tag gated Valgrind annotations to the runtime which let it understand how the runtime manages memory. This allows for Go binaries to be run under Valgrind without emitting spurious errors. Instead of adding the Valgrind headers to the tree, and using cgo to call the various Valgrind client request macros, we just add an assembly function which emits the necessary instructions to trigger client requests. In particular we add instrumentation of the memory allocator, using a two-level mempool structure (as described in the Valgrind manual [0]). We also add annotations which allow Valgrind to track which memory we use for stacks, which seems necessary to let it properly function. We describe the memory model to Valgrind as follows: we treat heap arenas as a "pool" created with VALGRIND_CREATE_MEMPOOL_EXT (so that we can use VALGRIND_MEMPOOL_METAPOOL and VALGRIND_MEMPOOL_AUTO_FREE). Within the pool we treat spans as "superblocks", annotated with VALGRIND_MEMPOOL_ALLOC. We then allocate individual objects within spans with VALGRIND_MALLOCLIKE_BLOCK. It should be noted that running binaries under Valgrind can be _quite slow_, and certain operations, such as running the GC, can be _very slow_. It is recommended to run programs with GOGC=off. Additionally, async preemption should be turned off, since it'll cause strange behavior (GODEBUG=asyncpreemptoff=1). Running Valgrind with --leak-check=yes will result in some errors resulting from some things not being marked fully free'd. These likely need more annotations to rectify, but for now it is recommended to run with --leak-check=off. Updates #73602 [0] https://valgrind.org/docs/manual/mc-manual.html#mc-manual.mempools Change-Id: I71b26c47d7084de71ef1e03947ef6b1cc6d38301 Reviewed-on: https://go-review.googlesource.com/c/go/+/674077 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-20 | runtime: report finalizer and cleanup queue length with checkfinalizer>0 | Michael Anthony Knyszek
This change adds tracking for approximate finalizer and cleanup queue lengths. These lengths are reported once every GC cycle as a single line printed to stderr when GODEBUG=checkfinalizer>0. This change lays the groundwork for runtime/metrics metrics to produce the same values. For #72948. For #72950. Change-Id: I081721238a0fc4c7e5bee2dbaba6cfb4120d1a33 Reviewed-on: https://go-review.googlesource.com/c/go/+/671437 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-19 | runtime: rename ncpu to numCPUStartup | Michael Pratt
ncpu is the total logical CPU count at startup. It is never updated. For #73193, we will start using updated CPU counts for updated GOMAXPROCS, making the ncpu name a bit ambiguous. Change to a less ambiguous name. While we're at it, give the OS specific lookup functions a common name, so it can be used outside of osinit later. For #73193. Change-Id: I6a6a636cf21cc60de36b211f3c374080849fc667 Reviewed-on: https://go-review.googlesource.com/c/go/+/672277 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2025-05-12 | runtime: only poll network from one P at a time in findRunnable | Carlos Amedee
This change reintroduces CL 564197. It was reverted due to a failing benchmark. That failure has been resolved. For #65064 Change-Id: Ic88841d2bc24c2717ad324873f0f52699f21dc66 Reviewed-on: https://go-review.googlesource.com/c/go/+/669235 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-05-08 | runtime: schedule cleanups across multiple goroutines | Michael Anthony Knyszek
This change splits the finalizer and cleanup queues and implements a new lock-free blocking queue for cleanups. The basic design is as follows: The cleanup queue is organized in fixed-sized blocks. Individual cleanup functions are queued, but only whole blocks are dequeued. Enqueuing cleanups places them in P-local cleanup blocks. These are flushed to the full list as they get full. Cleanups can only be enqueued by an active sweeper. Dequeuing cleanups always dequeues entire blocks from the full list. Cleanup blocks can be dequeued and executed at any time. The very last active sweeper in the sweep phase is responsible for flushing all local cleanup blocks to the full list. It can do this without any synchronization because the next GC can't start yet, so we can be very certain that nobody else will be accessing the local blocks. Cleanup blocks are stored off-heap because they need to be allocated by the sweeper, which is called from heap allocation paths. As a result, the GC treats cleanup blocks as roots, just like finalizer blocks. Flushes to the full list signal to the scheduler that cleanup goroutines should be awoken. Every time the scheduler goes to wake up a cleanup goroutine and there were more signals than goroutines to wake, it then forwards this signal to runtime.AddCleanup, so that it creates another goroutine the next time it is called, up to gomaxprocs goroutines. The signals here are a little convoluted, but exist because the sweeper and the scheduler cannot safely create new goroutines. For #71772. For #71825. Change-Id: Ie839fde2b67e1b79ac1426be0ea29a8d923a62cc Reviewed-on: https://go-review.googlesource.com/c/go/+/650697 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
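The block-based design described above (enqueue individual functions, dequeue only whole blocks) can be illustrated with a deliberately simplified model. Everything here is hypothetical: the names, the block size of 4, and above all the single-threaded structure. The real queue is lock-free, lives off-heap, and keeps one local block per P; none of that is modeled.

```go
package main

import "fmt"

// blockSize is a hypothetical, tiny block capacity chosen for illustration.
const blockSize = 4

// block holds up to blockSize queued cleanup functions.
type block struct {
	fns [blockSize]func()
	n   int
}

// run executes every cleanup function stored in the block.
func (b *block) run() {
	for i := 0; i < b.n; i++ {
		b.fns[i]()
	}
}

// cleanupQueue is a single-threaded stand-in: local plays the role of a
// P-local block, full plays the role of the shared full list.
type cleanupQueue struct {
	local *block
	full  []*block
}

// enqueue adds one function to the local block and flushes it when full.
func (q *cleanupQueue) enqueue(f func()) {
	if q.local == nil {
		q.local = &block{}
	}
	q.local.fns[q.local.n] = f
	q.local.n++
	if q.local.n == blockSize {
		q.flush()
	}
}

// flush moves a non-empty local block to the full list, as the last
// sweeper does for partially filled blocks.
func (q *cleanupQueue) flush() {
	if q.local != nil && q.local.n > 0 {
		q.full = append(q.full, q.local)
	}
	q.local = nil
}

// dequeue removes and returns one whole block, or nil if none is queued.
func (q *cleanupQueue) dequeue() *block {
	if len(q.full) == 0 {
		return nil
	}
	b := q.full[0]
	q.full = q.full[1:]
	return b
}

func main() {
	var q cleanupQueue
	ran := 0
	for i := 0; i < 6; i++ {
		q.enqueue(func() { ran++ })
	}
	fmt.Println("full blocks before final flush:", len(q.full))
	q.flush()
	for b := q.dequeue(); b != nil; b = q.dequeue() {
		b.run()
	}
	fmt.Println("cleanups run:", ran)
}
```

With six cleanups and a block size of four, one block fills and is flushed automatically, and the remaining two functions only become visible to consumers after the explicit flush.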
2025-05-07 | runtime: use "bubble" terminology for synctest | Damien Neil
We've settled on calling the group of goroutines started by synctest.Run a "bubble". At the time the runtime implementation was written, I was still calling this a "group". Update the code to match the current terminology. Change-Id: I31b757f31d804b5d5f9564c182627030a9532f4a Reviewed-on: https://go-review.googlesource.com/c/go/+/670135 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Damien Neil <dneil@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-06 | runtime: replace mentions of "raised" with "panicked" | Mark Freeman
Fixes #73526 Change-Id: I4b801cf3e54b99559e6d5ca8fdb2fd0692a0d3a5 Reviewed-on: https://go-review.googlesource.com/c/go/+/669975 TryBot-Bypass: Mark Freeman <mark@golang.org> Reviewed-by: Robert Griesemer <gri@google.com> Auto-Submit: Mark Freeman <mark@golang.org> Reviewed-by: Mark Freeman <mark@golang.org>
2025-05-05 | Revert "cmd/compile: allow all of the preamble to be preemptible" | Keith Randall
This reverts commits 3f3782feed6e0726ddb08afd32dad7d94fbb38c6 (CL 648518) b386b628521780c048af14a148f373c84e687b26 (CL 668475) Fixes #73542 Change-Id: I218851c5c0b62700281feb0b3f82b6b9b97b910d Reviewed-on: https://go-review.googlesource.com/c/go/+/670055 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-25 | cmd/compile: allow all of the preamble to be preemptible | Keith Randall
We currently make some parts of the preamble unpreemptible because it confuses morestack. See comments in the code. Instead, have morestack handle those weird cases so we can remove unpreemptible marks from most places. This CL makes user functions preemptible everywhere if they have no write barriers (at least, on x86). In cmd/go the fraction of functions that need preemptible markings drops from 82% to 36%. Makes the cmd/go binary 0.3% smaller. Update #35470 Change-Id: Ic83d5eabfd0f6d239a92e65684bcce7e67ff30bb Reviewed-on: https://go-review.googlesource.com/c/go/+/648518 Auto-Submit: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-04-22 | Revert "runtime: only poll network from one P at a time in findRunnable" | Carlos Amedee
This reverts commit 352dd2d932c1c1c6dbc3e112fcdfface07d4fffb. Reason for revert: cockroachdb benchmark failing. Likely due to CL 564197. For #73474 Change-Id: Id5d83cd8bb8fe9ee7fddb8dc01f1a01f2d40154e Reviewed-on: https://go-review.googlesource.com/c/go/+/667336 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Auto-Submit: Carlos Amedee <carlos@golang.org>
2025-04-22 | runtime: commit to spinbitmutex GOEXPERIMENT | Rhys Hiltner
Use the "spinbit" mutex implementation always (including on platforms that need to emulate atomic.Xchg8), and delete the prior "tristate" implementations. The exception is GOARCH=wasm, where the Go runtime does not use multiple threads. For #68578 Change-Id: Ifc29bbfa05071d776c23a19ae185891a03a82417 Reviewed-on: https://go-review.googlesource.com/c/go/+/658456 Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-22 | runtime: only poll network from one P at a time in findRunnable | Ian Lance Taylor
For #65064 Change-Id: Ifecd7e332d2cf251750752743befeda4ed396f33 Reviewed-on: https://go-review.googlesource.com/c/go/+/564197 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Artur M. Wolff <artur.m.wolff@gmail.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-04-15 | runtime: size field for gQueue and gList | Dmitrii Martynov
Before this CL, every user of gQueue and gList tracked the size of the structure in a separate variable. The size was updated manually and passed as a separate argument to various functions. This CL adds a field to the gQueue and gList structures to store the size, and moves the size bookkeeping into the implementation of the API for these structures. This reduces possible errors by eliminating manual calculation of the size and simplifies function signatures. Change-Id: I087da2dfaec4925e4254ad40fce5ccb4c175ec41 Reviewed-on: https://go-review.googlesource.com/c/go/+/664777 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org>
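The idea of keeping the size inside the structure can be sketched with an ordinary linked queue. The types below are hypothetical stand-ins, not the runtime's gQueue or gList; the point is only that push and pop maintain the count, so callers never compute or pass it themselves.

```go
package main

import "fmt"

// item is a hypothetical stand-in for a queued element.
type item struct {
	id   int
	next *item
}

// queue keeps its own size; callers never track it separately.
type queue struct {
	head, tail *item
	size       int
}

// pushBack appends it and updates the size in the same place.
func (q *queue) pushBack(it *item) {
	it.next = nil
	if q.tail != nil {
		q.tail.next = it
	} else {
		q.head = it
	}
	q.tail = it
	q.size++
}

// pop removes the head element, or returns nil if the queue is empty.
func (q *queue) pop() *item {
	it := q.head
	if it == nil {
		return nil
	}
	q.head = it.next
	if q.head == nil {
		q.tail = nil
	}
	it.next = nil
	q.size--
	return it
}

func main() {
	var q queue
	q.pushBack(&item{id: 1})
	q.pushBack(&item{id: 2})
	q.pop()
	fmt.Println("size:", q.size)
}
```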
2025-04-11 | runtime: handle m0 padding better | Russ Cox
The SpinbitMutex experiment requires m structs other than m0 to be allocated in 2048-byte size class, by adding padding. Do the calculation more explicitly, to avoid future CLs like CL 653335. Change-Id: I83ae1e86ef3711ab65441f4e487f94b9e1429029 Reviewed-on: https://go-review.googlesource.com/c/go/+/654595 Reviewed-by: Rhys Hiltner <rhys.hiltner@gmail.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2025-03-14 | runtime: only set isExtraInC if there are no Go frames left | Michael Pratt
mp.isExtraInC is intended to indicate that this M has no Go frames at all; it is entirely executing in C. If there was a cgocallback to Go and then a cgocall to C, such that the leaf frames are C, that is fine. e.g., traceback can handle this fine with SetCgoTraceback (or by simply skipping the C frames). However, we currently mismanage isExtraInC, unconditionally setting it on return from cgocallback. This means that if there are two levels of cgocallback, we end up running Go code with isExtraInC set.
1. C-created thread calls into Go function 1 (via cgocallback).
2. Go function 1 calls into C function 1 (via cgocall).
3. C function 1 calls into Go function 2 (via cgocallback).
4. Go function 2 returns back to C function 1 (returning via the remainder of cgocallback).
5. C function 1 returns back to Go function 1 (returning via the remainder of cgocall).
6. Go function 1 is now running with mp.isExtraInC == true.
The fix is simple; only set isExtraInC on return from cgocallback if there are no more Go frames. There can't be more Go frames unless there is an active cgocall out of the Go frames. Fixes #72870. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest Change-Id: I6a6a636c4e7ba75a29639d7036c5af3738033467 Reviewed-on: https://go-review.googlesource.com/c/go/+/658035 Reviewed-by: Cherry Mui <cherryyz@google.com> Commit-Queue: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
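The six-step sequence in this message can be replayed with a toy model. The thread type and its goDepth counter are hypothetical simplifications introduced here; the runtime does not necessarily track Go frames this way. The sketch only shows why setting the flag unconditionally on return from cgocallback goes wrong with two levels of callback.

```go
package main

import "fmt"

// thread is a toy stand-in for an M created by C. Both fields are
// hypothetical simplifications, not the runtime's actual bookkeeping.
type thread struct {
	goDepth    int  // cgocallback activations still on the stack
	isExtraInC bool // intended meaning: no Go frames at all
}

// enterGo models entering Go through cgocallback.
func (t *thread) enterGo() {
	t.goDepth++
	t.isExtraInC = false
}

// leaveGoBuggy models the old behavior: the flag is set unconditionally
// on return from cgocallback.
func (t *thread) leaveGoBuggy() {
	t.goDepth--
	t.isExtraInC = true
}

// leaveGoFixed sets the flag only when no Go frames remain.
func (t *thread) leaveGoFixed() {
	t.goDepth--
	if t.goDepth == 0 {
		t.isExtraInC = true
	}
}

// nested replays steps 1 to 6 and reports the flag seen by Go function 1
// once it is running again.
func nested(fixed bool) bool {
	t := &thread{isExtraInC: true}
	t.enterGo() // step 1: C-created thread calls Go function 1
	// step 2: Go function 1 calls C function 1 (no state change here)
	t.enterGo() // step 3: C function 1 calls Go function 2
	if fixed {  // step 4: Go function 2 returns to C function 1
		t.leaveGoFixed()
	} else {
		t.leaveGoBuggy()
	}
	// step 5: C function 1 returns to Go function 1
	return t.isExtraInC // step 6: flag while Go function 1 runs
}

func main() {
	fmt.Println("buggy:", nested(false)) // prints "buggy: true"
	fmt.Println("fixed:", nested(true))  // prints "fixed: false"
}
```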
2025-03-01 | runtime: add padding to m struct for 64 bit architectures | Joel Sing
CL 652276 reduced the m struct by 8 bytes, which has changed the allocation class on 64 bit OpenBSD platforms. This results in build failures due to: M structure uses sizeclass 1792/0x700 bytes; incompatible with mutex flag mask 0x3ff Add 128 bytes of padding when spinbitmutex is enabled on 64 bit architectures, moving the size to the half point between the 1792 and 2048 allocation size. Change-Id: I71623a1f75714543c302217e619d20cf0e717aeb Reviewed-on: https://go-review.googlesource.com/c/go/+/653335 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2025-02-26 | runtime: remove ret field from gobuf | Keith Randall
It's not used for anything. Change-Id: I031b3cdfe52b6b1cff4b3cb6713ffe588084542f Reviewed-on: https://go-review.googlesource.com/c/go/+/652276 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-06 | runtime: don't duplicate reraised panic values in printpanics | Damien Neil
Change the output printed when crashing with a reraised panic value to not duplicate that value. Changes output of panicking with "PANIC", recovering, and reraising from: panic: PANIC [recovered] panic: PANIC to: panic: PANIC [recovered, reraised] Fixes #71517 Change-Id: Id59938c4ea0df555b851ffc650fe6f94c0845499 Reviewed-on: https://go-review.googlesource.com/c/go/+/645916 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
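A program that hits this code path recovers a panic and panics again with the same value. The sketch below shows that shape; reraise and catch are hypothetical names. The outer recover in catch exists only so the example terminates normally. If reraise were called from main with nothing to catch it, the crash header would be the one this commit changes.

```go
package main

import "fmt"

// reraise panics, recovers the value in a deferred function, and panics
// again with the same value.
func reraise() {
	defer func() {
		if r := recover(); r != nil {
			panic(r) // re-panic with the recovered value
		}
	}()
	panic("PANIC")
}

// catch runs reraise and returns the value that finally escaped it, so
// that this example exits cleanly instead of crashing.
func catch() (v any) {
	defer func() { v = recover() }()
	reraise()
	return nil
}

func main() {
	fmt.Println(catch()) // prints "PANIC"
}
```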
2024-11-19 | internal/synctest: new package for testing concurrent code | Damien Neil
Add an internal (for now) implementation of testing/synctest. The synctest.Run function executes a tree of goroutines in an isolated environment using a fake clock. The synctest.Wait function allows a test to wait for all other goroutines within the test to reach a blocking point. For #67434 For #69687 Change-Id: Icb39e54c54cece96517e58ef9cfb18bf68506cfc Reviewed-on: https://go-review.googlesource.com/c/go/+/591997 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-15 | runtime: unify lock2, allow deeper sleep | Rhys Hiltner
The tri-state mutex implementation (unlocked, locked, sleeping) avoids sleep/wake syscalls when contention is low or absent, but its performance degrades when many threads are contending for a mutex to execute a fast critical section. A fast critical section means frequent unlock2 calls. Each of those finds the mutex in the "sleeping" state and so wakes a sleeping thread, even if many other threads are already awake and in the spin loop of lock2 attempting to acquire the mutex for themselves. Many spinning threads means wasting energy and CPU time that could be used by other processes on the machine. Many threads all spinning on the same cache line leads to performance collapse. Merge the futex- and semaphore-based mutex implementations by using a semaphore abstraction for futex platforms. Then, add a bit to the mutex state word that communicates whether one of the waiting threads is awake and spinning. When threads in lock2 see the new "spinning" bit, they can sleep immediately. In unlock2, the "spinning" bit means we can save a syscall and not wake a sleeping thread. This brings up the real possibility of starvation: waiting threads are able to enter a deeper sleep than before, since one of their peers can volunteer to be the sole "spinning" thread and thus cause unlock2 to skip the semawakeup call. Additionally, the waiting threads form a LIFO stack so any wakeups that do occur will target threads that have gone to sleep most recently. Counteract those effects by periodically waking the thread at the bottom of the stack and allowing it to spin. Exempt sched.lock from most of the new behaviors; it's often used by several threads in sequence to do thread-specific work, so low-latency handoff is a priority over improved throughput. Gate use of this implementation behind GOEXPERIMENT=spinbitmutex, so it's easy to disable. Enable it by default on supported platforms (the most efficient implementation requires atomic.Xchg8). 
Fixes #68578

goos: linux
goarch: amd64
pkg: runtime
cpu: 13th Gen Intel(R) Core(TM) i7-13700H
                              │      old       │                  new                   │
                              │     sec/op     │    sec/op      vs base                 │
MutexContention                  17.82n ±   0%   17.74n ±  0%    -0.42% (p=0.000 n=10)
MutexContention-2                22.17n ±   9%   19.85n ± 12%         ~ (p=0.089 n=10)
MutexContention-3                26.14n ±  14%   20.81n ± 13%   -20.41% (p=0.000 n=10)
MutexContention-4                29.28n ±   8%   21.19n ± 10%   -27.62% (p=0.000 n=10)
MutexContention-5                31.79n ±   2%   21.98n ± 10%   -30.83% (p=0.000 n=10)
MutexContention-6                34.63n ±   1%   22.58n ±  5%   -34.79% (p=0.000 n=10)
MutexContention-7                44.16n ±   2%   23.14n ±  7%   -47.59% (p=0.000 n=10)
MutexContention-8                53.81n ±   3%   23.66n ±  6%   -56.04% (p=0.000 n=10)
MutexContention-9                65.58n ±   4%   23.91n ±  9%   -63.54% (p=0.000 n=10)
MutexContention-10               77.35n ±   3%   26.06n ±  9%   -66.31% (p=0.000 n=10)
MutexContention-11               89.62n ±   1%   25.56n ±  9%   -71.47% (p=0.000 n=10)
MutexContention-12              102.45n ±   2%   25.57n ±  7%   -75.04% (p=0.000 n=10)
MutexContention-13              111.95n ±   1%   24.59n ±  8%   -78.04% (p=0.000 n=10)
MutexContention-14              123.95n ±   3%   24.42n ±  6%   -80.30% (p=0.000 n=10)
MutexContention-15              120.80n ±  10%   25.54n ±  6%   -78.86% (p=0.000 n=10)
MutexContention-16              128.10n ±  25%   26.95n ±  4%   -78.96% (p=0.000 n=10)
MutexContention-17              139.80n ±  18%   24.96n ±  5%   -82.14% (p=0.000 n=10)
MutexContention-18              141.35n ±   7%   25.05n ±  8%   -82.27% (p=0.000 n=10)
MutexContention-19              151.35n ±  18%   25.72n ±  6%   -83.00% (p=0.000 n=10)
MutexContention-20              153.30n ±  20%   24.75n ±  6%   -83.85% (p=0.000 n=10)
MutexHandoff/Solo-20             13.54n ±   1%   13.61n ±  4%         ~ (p=0.206 n=10)
MutexHandoff/FastPingPong-20     141.3n ± 209%   164.8n ± 49%         ~ (p=0.436 n=10)
MutexHandoff/SlowPingPong-20     1.572µ ±  16%   1.804µ ± 19%   +14.76% (p=0.015 n=10)
geomean                          74.34n          30.26n         -59.30%

goos: darwin
goarch: arm64
pkg: runtime
cpu: Apple M1
                              │      old       │                  new                   │
                              │     sec/op     │    sec/op      vs base                 │
MutexContention                  13.86n ±   3%   12.09n ±  3%   -12.73% (p=0.000 n=10)
MutexContention-2                15.88n ±   1%   16.50n ±  2%    +3.94% (p=0.001 n=10)
MutexContention-3                18.45n ±   2%   16.88n ±  2%    -8.54% (p=0.000 n=10)
MutexContention-4                20.01n ±   2%   18.94n ± 18%         ~ (p=0.469 n=10)
MutexContention-5                22.60n ±   1%   17.51n ±  9%   -22.50% (p=0.000 n=10)
MutexContention-6                23.93n ±   2%   17.35n ±  2%   -27.48% (p=0.000 n=10)
MutexContention-7                24.69n ±   1%   17.15n ±  3%   -30.54% (p=0.000 n=10)
MutexContention-8                25.01n ±   1%   17.33n ±  2%   -30.69% (p=0.000 n=10)
MutexHandoff/Solo-8              13.96n ±   4%   12.04n ±  4%   -13.78% (p=0.000 n=10)
MutexHandoff/FastPingPong-8      68.89n ±   4%   64.62n ±  2%    -6.20% (p=0.000 n=10)
MutexHandoff/SlowPingPong-8      9.698µ ±  22%   9.646µ ± 35%         ~ (p=0.912 n=10)
geomean                          38.20n          32.53n         -14.84%

Change-Id: I0058c75eadf282d08eea7fce0d426f0518039f7c Reviewed-on: https://go-review.googlesource.com/c/go/+/620435 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
2024-11-15 | runtime: allow futex OSes to use sema-based mutex | Rhys Hiltner
Implement sema{create,sleep,wakeup} in terms of the futex syscall when available. Split the lock2/unlock2 implementations out of lock_sema.go and lock_futex.go (which they shared with runtime.note) to allow swapping in new implementations of those. Let futex-based platforms use the semaphore-based mutex implementation. Control that via the new "spinbitmutex" GOEXPERIMENT value, disabled by default. This lays the groundwork for a "spinbit" mutex implementation; it does not include the new mutex implementation. For #68578. Change-Id: I091289c85124212a87abec7079ecbd9e610b4270 Reviewed-on: https://go-review.googlesource.com/c/go/+/622996 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-13 | runtime: prevent weak->strong conversions during mark termination | Michael Anthony Knyszek
Currently it's possible for weak->strong conversions to create more GC work during mark termination. When a weak->strong conversion happens during the mark phase, we need to mark the newly-strong pointer, since it may now be the only pointer to that object. In other words, the object could be white. But queueing new white objects creates GC work, and if this happens during mark termination, we could end up violating mark termination invariants. In the parlance of the mark termination algorithm, the weak->strong conversion is a non-monotonic source of GC work, unlike the write barriers (which will eventually only see black objects). This change fixes the problem by forcing weak->strong conversions to block during mark termination. We can do this efficiently by setting a global flag before the ragged barrier that is checked at each weak->strong conversion. If the flag is set, then the conversions block. The ragged barrier ensures that all Ps have observed the flag and that any weak->strong conversions which completed before the ragged barrier have their newly-minted strong pointers visible in GC work queues if necessary. We later unset the flag and wake all the blocked goroutines during the mark termination STW. There are a few subtleties that we need to account for. For one, it's possible that a goroutine which blocked in a weak->strong conversion wakes up only to find it's mark termination time again, so we need to recheck the global flag on wake. We should also stay non-preemptible while performing the check, so that if the check *does* appear as true, it cannot switch back to false while we're actively trying to block. If it switches to false while we try to block, then we'll be stuck in the queue until the following GC. All-in-all, this CL is more complicated than I would have liked, but it's the only idea so far that is clearly correct to me at a high level. 
This change adds a test which is somewhat invasive as it manipulates mark termination, but hopefully that infrastructure will be useful for debugging, fixing, and regression testing mark termination whenever we do fix it. Fixes #69803. Change-Id: Ie314e6fd357c9e2a07a9be21f217f75f7aba8c4a Reviewed-on: https://go-review.googlesource.com/c/go/+/623615 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
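The flag-and-recheck pattern described in this commit message can be sketched outside the runtime with ordinary sync primitives. This is only an illustration: the conversionGate type and its methods are hypothetical names, and the runtime uses its own global flag, ragged barrier, and wait queue rather than sync.Cond. The mutex here stands in for the "stay non-preemptible while checking" requirement, so the flag cannot flip between the check and the decision to block.

```go
package main

import (
	"fmt"
	"sync"
)

// conversionGate is a hypothetical stand-in for the global flag that
// blocks weak->strong conversions during mark termination.
type conversionGate struct {
	mu      sync.Mutex
	cond    *sync.Cond
	blocked bool
}

func newConversionGate() *conversionGate {
	g := &conversionGate{}
	g.cond = sync.NewCond(&g.mu)
	return g
}

// block sets the flag; conversions that start after this point will wait.
func (g *conversionGate) block() {
	g.mu.Lock()
	g.blocked = true
	g.mu.Unlock()
}

// unblock clears the flag and wakes every blocked waiter.
func (g *conversionGate) unblock() {
	g.mu.Lock()
	g.blocked = false
	g.mu.Unlock()
	g.cond.Broadcast()
}

// wait returns once the flag is clear. The loop rechecks the flag after
// every wakeup, because the flag may have been set again in the meantime.
func (g *conversionGate) wait() {
	g.mu.Lock()
	for g.blocked {
		g.cond.Wait()
	}
	g.mu.Unlock()
}

func main() {
	g := newConversionGate()
	g.block()

	done := make(chan struct{})
	go func() {
		g.wait() // a weak->strong conversion would proceed after this returns
		close(done)
	}()

	g.unblock()
	<-done
	fmt.Println("conversion proceeded after the gate reopened")
}
```

Checking the flag and going to sleep under the same lock is what prevents the stuck-in-the-queue case the message warns about.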
2024-10-30runtime: update and restore g0 stack bounds at cgocallbackCherry Mui
Currently, at a cgo callback where there is already a Go frame on the stack (i.e. C->Go->C->Go), we require that at the inner Go callback the SP is within the g0's stack bounds set by a previous callback. This is to prevent the C code from switching stacks while having a Go frame on the stack, which we don't really support. But this could also happen when we cannot get accurate stack bounds, e.g. when pthread_getattr_np is not available. Since the stack bounds are just estimates based on the current SP, if there are multiple C->Go callbacks with varying stack depths, it is possible that the SP of a later callback falls out of a previous call's estimate. This leads to a runtime throw in a seemingly reasonable program. This CL changes it to save the old g0 stack bounds at cgocallback, update the bounds, and restore the old bounds at return. So each callback will get its own stack bounds based on the current SP, and when it returns, the outer callback has its old stack bounds restored. Also, at a cgo callback when there is no Go frame on the stack, we currently always get new stack bounds. We do this because if we can only get estimated bounds based on the SP, and the stack depth varies a lot between two C->Go calls, the previous estimates may be off and we fall out or nearly fall out of the previous bounds. But this causes a performance problem: the pthread API to get accurate stack bounds (pthread_getattr_np) is very slow when called on the main thread. Getting the stack bounds every time significantly slows down repeated C->Go calls on the main thread. This CL fixes it by "caching" the stack bounds if they are accurate. I.e. at the second time Go calls into C, if the previous stack bounds are accurate, and the current SP is in bounds, we can be sure it is the same stack and we don't need to update the bounds. This avoids the repeated calls to pthread_getattr_np.
If we cannot get the accurate bounds, we continue to update the stack bounds based on the SP, and that operation is very cheap.

On a Linux/AMD64 machine with glibc:

    name                     old time/op  new time/op  delta
    CgoCallbackMainThread-8  96.4µs ± 3%  0.1µs ± 2%   -99.92%  (p=0.000 n=10+9)

Fixes #68285. Fixes #68587. Change-Id: I3422badd5ad8ff63e1a733152d05fb7a44d5d435 Reviewed-on: https://go-review.googlesource.com/c/go/+/600296 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2024-10-28crypto/internal/fips: add service indicator mechanismFilippo Valsorda
Placed the fipsIndicator field in some 64-bit alignment padding in the g struct to avoid growing per-goroutine memory requirements on 64-bit targets. Fixes #69911 Updates #69536 Change-Id: I176419d0e3814574758cb88a47340a944f405604 Reviewed-on: https://go-review.googlesource.com/c/go/+/620795 Reviewed-by: Roland Shoemaker <roland@golang.org> Reviewed-by: Daniel McCarney <daniel@binaryparadox.net> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Filippo Valsorda <filippo@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Derek Parker <parkerderek86@gmail.com>
2024-08-01runtime: don't use maps in js note implementationMichael Pratt
notes are used in sensitive locations in the runtime, such as those with write barriers forbidden. Maps aren't designed for this sort of internal use. Notably, newm -> notewakeup doesn't allow write barriers, but mapaccess1 -> panic contains write barriers. The js runtime only builds right now because the map access is optimized to mapaccess1_fast64, which happens to not have a panic call. The initial swisstable map implementation doesn't have a fast64 variant. While we could add one, it is a bad idea in general to use a map in such a fragile location. Simplify the implementation by storing the metadata directly in the note, and using a linked list for checkTimeouts. For #54766. Cq-Include-Trybots: luci.golang.try:gotip-js-wasm Change-Id: Ib9d39f064ae4ad32dcc873f799428717eb6c2d5a Reviewed-on: https://go-review.googlesource.com/c/go/+/595558 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-07-23runtime,internal: move runtime/internal/sys to internal/runtime/sysDavid Chase
Cleanup and friction reduction. For #65355. Change-Id: Ia14c9dc584a529a35b97801dd3e95b9acc99a511 Reviewed-on: https://go-review.googlesource.com/c/go/+/600436 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>
2024-05-30Revert "runtime: prepare for extensions to waiting M list"Rhys Hiltner
This reverts commit be0b569caa0eab1a7f30edf64e550bbf5f6ff235 (CL 585635). Reason for revert: This is part of a patch series that changed the handling of contended lock2/unlock2 calls, reducing the maximum throughput of contended runtime.mutex values, and causing a performance regression on applications where that is (or became) the bottleneck. Updates #66999 Updates #67585 Change-Id: I7843ccaecbd273b7ceacfa0f420dd993b4b15a0a Reviewed-on: https://go-review.googlesource.com/c/go/+/589117 Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2024-05-29all: document legacy //go:linkname for final round of modulesRuss Cox
Add linknames for most modules with ≥50 dependents. Add linknames for a few other modules that we know are important but are below 50. Remove linknames from badlinkname.go that do not merit inclusion (very small number of dependents). We can add them back later if the need arises. Fixes #67401. (For now.) Change-Id: I1e49fec0292265256044d64b1841d366c4106002 Reviewed-on: https://go-review.googlesource.com/c/go/+/587756 Auto-Submit: Russ Cox <rsc@golang.org> TryBot-Bypass: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-05-21runtime: prepare for extensions to waiting M listRhys Hiltner
Move the nextwaitm field into a small struct, in preparation for additional metadata to track how long Ms need to wait for locks. For #66999 Change-Id: Ib40e43c15cde22f7e35922641107973d99439ecd Reviewed-on: https://go-review.googlesource.com/c/go/+/585635 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2024-05-17internal/runtime/atomic: fix missing linknamesAustin Clements
CL 544455, which added atomic And/Or APIs, raced with CL 585556, which enabled stricter linkname checking. This caused linkname-related failures on ARM and MIPS. Fix this by adding the necessary linknames. We fix one other linkname that got overlooked in CL 585556. Updates #61395. Change-Id: I454f0767ce28188e550a61bc39b7e398239bc10e Reviewed-on: https://go-review.googlesource.com/c/go/+/586516 Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Austin Clements <austin@google.com>
2024-05-08runtime: delete pagetrace GOEXPERIMENTMichael Anthony Knyszek
The page tracer's functionality is now captured by the regular execution tracer as an experimental GODEBUG variable. This is a lot more usable and maintainable than the page tracer, which is likely to have bitrotted by this point. There's also no tooling available for the page tracer. Change-Id: I2408394555e01dde75a522e9a489b7e55cf12c8e Reviewed-on: https://go-review.googlesource.com/c/go/+/583379 Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-08runtime: move profiling pc buffers to mFelix Geisendörfer
Move profiling pc buffers from being stack allocated to an m field. This is motivated by the next patch, which will increase the default stack depth to 128, which might lead to undesirable stack growth for goroutines that produce profiling events. Additionally, this change paves the way to make the stack depth configurable via GODEBUG. Change-Id: Ifa407f899188e2c7c0a81de92194fdb627cb4b36 Reviewed-on: https://go-review.googlesource.com/c/go/+/574699 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2024-04-22runtime: reduced struct sizes found via paholeSabyrzhan Tasbolatov
While researching pahole with Go structs, I found a couple of structs in the runtime package whose sizes can be reduced. They were highlighted by the pahole tool, which detects byte holes and padding. Overall, 80 bytes are saved. Change-Id: I398e5ed6f5b199394307741981cb5ad5b875e98f Reviewed-on: https://go-review.googlesource.com/c/go/+/578795 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Joedian Reid <joedian@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-04-19runtime: track frame pointer while in syscallMichael Anthony Knyszek
Currently the runtime only tracks the PC and SP upon entering a syscall, but not the FP (BP). This is mainly for historical reasons, and because the tracer (which uses the frame pointer unwinder) does not need it. Until it did, of course, in CL 567076, where the tracer tries to take a stack trace of a goroutine that's in a syscall from afar. It tries to use gp.sched.bp and lots of things go wrong. It *really* should be using the equivalent of gp.syscallbp, which doesn't exist before this CL. This change introduces gp.syscallbp and tracks it. It also introduces getcallerfp which is nice for simplifying some code. Because we now have gp.syscallbp, we can also delete the frame skip count computation in traceLocker.GoSysCall, because it's now the same regardless of whether frame pointer unwinding is used. Fixes #66889. Change-Id: Ib6d761c9566055e0a037134138cb0f81be73ecf7 Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-nocgo Reviewed-on: https://go-review.googlesource.com/c/go/+/580255 Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-04-08runtime: account for _Pgcstop in GC CPU pause time in a fine-grained wayMichael Anthony Knyszek
The previous CL, CL 570257, made it so that STW time no longer overlapped with other CPU time tracking. However, what we lost was insight into the CPU time spent _stopping_ the world, which can be just as important. There's pretty much no easy way to measure this indirectly, so this CL implements a direct measurement: whenever a P enters _Pgcstop, it writes down what time it did so. stopTheWorld then accumulates all the time deltas between when it finished stopping the world and each P's stop time into a total additional pause time. The GC pause cases then accumulate this number into the metrics. This should cause minimal additional overhead in stopping the world. GC STWs already take on the order of 10s to 100s of microseconds. Even for 100 Ps, the extra `nanotime` call per P is only 1500ns of additional CPU time. This is likely to be much less in actual pause latency, since it all happens concurrently. Change-Id: Icf190ffea469cd35ebaf0b2587bf6358648c8554 Reviewed-on: https://go-review.googlesource.com/c/go/+/574215 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Nicolas Hillegeer <aktau@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
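The accumulation described in this commit message amounts to summing, over all Ps, the gap between each P's recorded stop time and the moment the world finished stopping. A minimal sketch with made-up timestamps follows; extraPause is a hypothetical helper, not a runtime function.

```go
package main

import "fmt"

// extraPause sums, for every P, the time between the moment that P
// entered its stopped state and the moment the world was fully stopped.
// Timestamps are in nanoseconds.
func extraPause(stopTimes []int64, worldStopped int64) int64 {
	var total int64
	for _, t := range stopTimes {
		total += worldStopped - t
	}
	return total
}

func main() {
	// Three Ps stop at different times; the last one to stop finishes the STW.
	stops := []int64{100, 250, 400}
	fmt.Println(extraPause(stops, 400)) // 300 + 150 + 0 = 450
}
```

The message's figure of 1500ns for 100 Ps implies roughly 15ns per nanotime call, spread across Ps that stop concurrently.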
2024-04-05runtime: take a stack trace during tracing only when we own the stackMichael Anthony Knyszek
Currently, the execution tracer may attempt to take a stack trace of a goroutine whose stack it does not own. For example, if the goroutine is in _Grunnable or _Gwaiting. This is easily fixed in all cases by simply moving the emission of GoStop and GoBlock events to before the casgstatus happens. The goroutine status is what is used to signal stack ownership, and the GC may shrink a goroutine's stack if it can acquire the scan bit. Although this is easily fixed, the interaction here is very subtle, because stack ownership is only implicit in the goroutine's scan status. To make this invariant more maintainable and less error-prone in the future, this change adds a GODEBUG setting that checks, at the point of taking a stack trace, whether the caller owns the goroutine. This check is not quite perfect because there's no way for the stack tracing code to know that the _Gscan bit was acquired by the caller, so for simplicity it assumes that it was the caller that acquired the scan bit. In all other cases however, we can check for ownership precisely. At the very least, this check is sufficient to catch the issue this change is fixing. To make sure this debug check doesn't bitrot, it's always enabled during trace testing. This new mode has actually caught a few other issues already, so this change fixes them. One issue that this debug mode caught was that it's not safe to take a stack trace of a _Gwaiting goroutine that's being unparked. Another much bigger issue this debug mode caught was the fact that the execution tracer could try to take a stack trace of a G that was in _Gwaiting solely to avoid a deadlock in the GC. The execution tracer already has a partial list of these cases since they're modeled as the goroutine just executing as normal in the tracer, but this change takes the list and makes it more formal. In this specific case, we now prevent the GC from shrinking the stacks of goroutines in this state if tracing is enabled. 
The stack traces from these scenarios are too useful to discard, but there is indeed a race here between the tracer and any attempt to shrink the stack by the GC. Change-Id: I019850dabc8cede202fd6dcc0a4b1f16764209fb Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-amd64-longtest-race Reviewed-on: https://go-review.googlesource.com/c/go/+/573155 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2024-03-26runtime: don't call lockOSThread for every syscall call on Windowsqmuntal
Windows syscall.SyscallN currently calls lockOSThread for every syscall. This can be expensive and produce unnecessary context switches, especially when the syscall is called frequently under high contention. The lockOSThread was necessary to ensure that cgocall wouldn't reschedule the goroutine to a different M, as the syscall return values are reported back in the M struct. This CL instructs cgocall to copy the syscall return values into the M that will see the caller on return, so the caller no longer needs to call lockOSThread. Updates #58336. Cq-Include-Trybots: luci.golang.try:gotip-windows-arm64,gotip-windows-amd64-longtest Change-Id: If6644fd111dbacab74e7dcee2afa18ca146735da Reviewed-on: https://go-review.googlesource.com/c/go/+/562915 Reviewed-by: Alex Brainman <alex.brainman@gmail.com> Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com> Reviewed-by: Than McIntosh <thanm@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Emmanuel Odeke <emmanuel@orijtech.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-03-25runtime: migrate internal/atomic to internal/runtimeAndy Pan
For #65355 Change-Id: I65dd090fb99de9b231af2112c5ccb0eb635db2be Reviewed-on: https://go-review.googlesource.com/c/go/+/560155 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ibrahim Bazoka <ibrahimbazoka729@gmail.com> Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
2024-03-13runtime: clean up timer stateRuss Cox
The timers had evolved to the point where the state was stored as follows:

    if timer in heap:
        state has timerHeaped set
        if heap timer is stale:
            heap deadline in t.when
            real deadline in t.nextWhen
            state has timerNextWhen set
        else:
            real deadline in t.when
            t.nextWhen unset
    else:
        real deadline in t.when
        t.nextWhen unset

That made it hard to find the real deadline and just hard to think about everything. The new state is:

    real deadline in t.when (always)
    if timer in heap:
        state has timerHeaped set
        heap deadline in t.whenHeap
        if heap timer is stale:
            state has timerModified set

Separately, the 'state' word itself was being used as a lock and state bits because the code started with CAS loops, which we abstracted into the lock/unlock methods step by step. At this point, we can switch to a real lock, making sure to publish the one boolean needed by timers fast paths at each unlock. All this simplifies various logic considerably. Change-Id: I35766204f7a26d999206bd56cc0db60ad1b17cbe Reviewed-on: https://go-review.googlesource.com/c/go/+/570335 Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Austin Clements <austin@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2024-03-08runtime: avoid pp.timers.lock in updateTimerPMaskRuss Cox
The comment in updateTimerPMask is wrong. It says:

    // Looks like there are no timers, however another P
    // may be adding one at this very moment.
    // Take the lock to synchronize.

This was my incorrect simplification of the original comment from CL 264477 when I was renaming all the things it mentioned:

    // Looks like there are no timers, however another P may transiently
    // decrement numTimers when handling a timerModified timer in
    // checkTimers. We must take timersLock to serialize with these changes.

updateTimerPMask is being called by pidleput, so the P in question is not in use. And other P's cannot add to this P. As the original comment more precisely noted, the problem was that other P's might be calling timers.check, which updates ts.len occasionally while ts is locked, and one of those updates might "leak" an ephemeral len==0 even when the heap is not going to be empty when the P is finally unlocked. The lock/unlock in updateTimerPMask synchronizes to avoid that. But this defeats most of the purpose of using ts.len in the first place. Instead of requiring that synchronization, we can arrange that ts.len only ever shows a "publishable" length, meaning the len(ts.heap) we leave behind during ts.unlock. Having done that, updateTimerPMask can be inlined into pidleput. The big comment on updateTimerPMask explaining how timerpMask works is better placed as the doc comment for timerpMask itself, so move it there. Change-Id: I5442c9bb7f1473b5fd37c43165429d087012e73f Reviewed-on: https://go-review.googlesource.com/c/go/+/568336 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org>
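The "publishable length" idea in this commit message can be sketched as follows: the atomically readable length is written only at unlock, so lock-free readers never observe a transient value from the middle of a critical section. The timers type below is a hypothetical model built on sync.Mutex and sync/atomic, with publishedLen standing in for ts.len; it is not the runtime's implementation.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// timers is a simplified model of a per-P timer heap.
type timers struct {
	mu           sync.Mutex
	heap         []int64
	publishedLen atomic.Int32 // len(heap) as of the most recent unlock
}

func (ts *timers) lock() { ts.mu.Lock() }

// unlock publishes the heap length that is being left behind, then
// releases the lock.
func (ts *timers) unlock() {
	ts.publishedLen.Store(int32(len(ts.heap)))
	ts.mu.Unlock()
}

func main() {
	ts := &timers{}

	ts.lock()
	ts.heap = append(ts.heap, 10, 20)
	ts.unlock()

	ts.lock()
	ts.heap = ts.heap[:0] // heap is transiently empty inside the critical section
	fmt.Println("during:", ts.publishedLen.Load()) // still 2
	ts.heap = append(ts.heap, 30)
	ts.unlock()

	fmt.Println("after:", ts.publishedLen.Load()) // 1
}
```

A reader that checks publishedLen without taking the lock therefore cannot mistake the transient empty heap for "no timers".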
2024-02-29runtime: move per-P timers state into its own structRuss Cox
Continuing conversion from C to Go, introduce type timers encapsulating all timer heap state, with methods for operations. This should at least be easier to think about, instead of having these fields strewn through the P struct. It should also be easier to test. I am skeptical about the pair of atomic int64 deadlines: I think there are missed wakeups lurking. Having the code in an abstracted API should make it easier to reason through and fix if needed. [This is one CL in a refactoring stack making very small changes in each step, so that any subtle bugs that we miss can be more easily pinpointed to a small change.] Change-Id: If5ea3e0b946ca14076f44c85cbb4feb9eddb4f95 Reviewed-on: https://go-review.googlesource.com/c/go/+/564132 Reviewed-by: Austin Clements <austin@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org>
2024-02-08cmd/compile: move runtime.itab to internal/abi.ITabKeith Randall
Change-Id: I44293452764dc4bc4de8d386153c6402a9cbe409 Reviewed-on: https://go-review.googlesource.com/c/go/+/549435 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Than McIntosh <thanm@google.com>
2024-01-03pagetrace: fix build when experiment is onJohn Howard
Due to a recent change, this experiment does not compile at all. This fixes the build by passing in the new required parameter. Change-Id: Idce0e72fa436a7acf4923717913deb3a37847fe2 Reviewed-on: https://go-review.googlesource.com/c/go/+/551415 Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>