aboutsummaryrefslogtreecommitdiff
path: root/src/runtime/runtime2.go
AgeCommit message (Collapse)Author
2026-01-23runtime: speed up cheaprand and cheaprand64Gavin Lam
The current cheaprand performs 128-bit multiplication on 64-bit numbers and truncate the result to 32 bits, which is inefficient. A 32-bit specific implementation is more performant because it performs 64-bit multiplication on 32-bit numbers instead. The current cheaprand64 involves two cheaprand calls. Implementing it as 64-bit wyrand is significantly faster. Since cheaprand64 discards one bit, I have preserved this behavior. The underlying uint64 function is made available as cheaprandu64. │ old │ new │ │ sec/op │ sec/op vs base │ Cheaprand-8 1.358n ± 0% 1.218n ± 0% -10.31% (n=100) Cheaprand64-8 2.424n ± 0% 1.391n ± 0% -42.62% (n=100) Blocksampled-8 8.347n ± 0% 2.022n ± 0% -75.78% (n=100) Fixes #77149 Change-Id: Ib0b5da4a642cd34d0401b03c1d343041f8230d11 GitHub-Last-Rev: 549d8d407e2bbcaecdee0b52cbf3a513dda637fb GitHub-Pull-Request: golang/go#77150 Reviewed-on: https://go-review.googlesource.com/c/go/+/735480 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-12-11runtime: make goroutines inherit DIT state, don't lock to OS threadRoland Shoemaker
When we first implemented DIT (crypto/subtle.WithDataIndependentTiming), we made it so that enabling DIT on a goroutine would lock that goroutine to its current OS thread. This was done to ensure that the DIT state (which is per-thread) would not leak to other goroutines. We also did not make goroutines inherit the DIT state. This change makes goroutines inherit the DIT state from their parent at creation time. It also removes the OS thread locking when enabling DIT on a goroutine. Instead, we now set the DIT state on the OS thread in the scheduler whenever we switch to a goroutine that has DIT enabled, and we unset it when switching to a goroutine that has DIT disabled. We add a new field to G and M, ditEnabled, to track whether the G wants DIT enabled, and whether the M currently has DIT enabled, respectively. When the scheduler executes a goroutine, it checks these fields and enables/disables DIT on the thread as needed. Additionally, cgocallbackg is updated to check if DIT is enabled when being called from C, and sets the G and M fields accordingly. This ensures that if DIT was enabled/disabled in C, the correct state will be reflected in the Go runtime. The behavior as it currently stands is as follows: - The function passed to crypto/subtle.WithDataIndependentTiming will have DIT enabled. - Any goroutine created within that function will inherit DIT enabled for its lifetime. Any goroutine created from subquent goroutines will also inherit DIT enabled for their lifetimes. - Calling into a C function within from a goroutine with DIT enabled will have DIT enabled. - If the C code disables DIT, the goroutine will have DIT re-enabled when returning to Go. - If the C code enables DIT, the goroutine will have DIT disabled when returning to Go if it was not previously enabled. - Calling back into Go code from C will have DIT enabled if it was enabled when calling into C, or if the C code enabled it. Change-Id: I8e91e6df13bb88e56e1036e0e0e5f04efd8eebd3 Reviewed-on: https://go-review.googlesource.com/c/go/+/726382 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2025-12-05runtime: don't count nGsyscallNoP for extra Ms in CMichael Anthony Knyszek
In #76435, it turns out that the new metric /sched/goroutines/not-in-go:goroutines counts C threads that have called into Go before (on Linux) as not-in-go goroutines. The reason for this is that the M is still attached to the C thread on Linux as an optimization, so we don't go through all the trouble of detaching the M and, of course, decrementing nGsyscallNoP. There's an easy fix to this accounting issue. The flag on the M, isExtraInC, says whether a thread with an extra M attached no longer has any Go on its (logical) stack. When we take the P from an M in this state, we simply just don't increment nGsyscallNoP. When it calls back into Go, we similarly skip the decrement to nGsyscallNoP. This is more efficient than alternatives, like always updating nGsyscallNoP in cgocallbackg, since that would add a new read-modify-write atomic onto that fast path. It does mean we count threads in C with a P still attached as not-in-go, but this transient in most real programs, assuming the thread indeed does not call back into Go any time soon. Fixes #76435. Change-Id: Id05563bacbe35d3fae17d67fb5ed45fa43fa0548 Reviewed-on: https://go-review.googlesource.com/c/go/+/726964 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-11-26runtime/secret: implement new secret packageDaniel Morsing
Implement secret.Do. - When secret.Do returns: - Clear stack that is used by the argument function. - Clear all the registers that might contain secrets. - On stack growth in secret mode, clear the old stack. - When objects are allocated in secret mode, mark them and then zero the marked objects immediately when they are freed. - If the argument function panics, raise that panic as if it originated from secret.Do. This removes anything about the secret function from tracebacks. For now, this is only implemented on linux for arm64 and amd64. This is a rebased version of Keith Randalls initial implementation at CL 600635. I have added arm64 support, signal handling, preemption handling and dealt with vDSOs spilling into system stacks. Fixes #21865 Change-Id: I6fbd5a233beeaceb160785e0c0199a5c94d8e520 Co-authored-by: Keith Randall <khr@golang.org> Reviewed-on: https://go-review.googlesource.com/c/go/+/704615 Reviewed-by: Roland Shoemaker <roland@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Filippo Valsorda <filippo@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-11-26crypto/fips140: add WithoutEnforcementDaniel Morsing
WithoutEnforcement lets programs running under GODEBUG=fips140=only selectively opt out of strict enforcement. This is especially helpful for non-critical uses of cryptography routines like SHA-1 for content addressable storage backends (E.g. git). Fixes #74630 Change-Id: Iabba1f5eb63498db98047aca45e09c5dccf2fbdf Reviewed-on: https://go-review.googlesource.com/c/go/+/723720 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Filippo Valsorda <filippo@golang.org> Auto-Submit: Filippo Valsorda <filippo@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Roland Shoemaker <roland@golang.org>
2025-11-21runtime: go fmtMichael Pratt
Change-Id: I6a6a636cf38ddb1dc6f2170361eb4093b81acdfb Reviewed-on: https://go-review.googlesource.com/c/go/+/722521 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-11-20runtime: split findRunnableGCWorker in twoMichael Pratt
The first part, assignWaitingGCWorker selects a mark worker (if any) and assigns it to the P. The second part, findRunnableGCWorker, is responsible for actually marking the worker as runnable and updating the CPU limiter. The advantage of this split is that assignWaitingGCWorker is safe to do during STW, which will allow the next CL to make selections during procresize. This change is a semantic no-op in preparation for the next CL. For #65694. Change-Id: I6a6a636c8beb212185829946cfa1e49f706ac31a Reviewed-on: https://go-review.googlesource.com/c/go/+/721001 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2025-11-17runtime: rename findrunnable references to findRunnableMichael Pratt
These cases were missed by CL 393880. Change-Id: I6a6a636cf0d97a4efcf4b9df766002ecef48b4de Reviewed-on: https://go-review.googlesource.com/c/go/+/721120 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-11-13runtime: prefer to restart Ps on the same M after STWMichael Pratt
Today, Ps jump around arbitrarily across STW. Instead, try to keep the P on the previous M it ran on. In the future, we'll likely want to try to expand this beyond STW to create a more general affinity for specific Ms. For this to be useful, the Ps need to have runnable Gs. Today, STW preemption goes through goschedImpl, which places the G on the global run queue. If that was the only G then the P won't have runnable goroutines anymore. It makes more sense to keep the G with its P across STW anyway, so add a special case to goschedImpl for that. On my machine, this CL reduces the error rate in TestTraceSTW from 99.8% to 1.9%. As a nearly 2% error rate shows, there are still cases where this best effort scheduling doesn't work. The most obvious is that while procresize assigns Ps back to their original M, startTheWorldWithSema calls wakep to start a spinning M. The spinning M may steal a goroutine from another P if that P is too slow to start. For #65694. Change-Id: I6a6a636c0969c587d039b68bc68ea16c74ff1fc9 Reviewed-on: https://go-review.googlesource.com/c/go/+/714801 Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-12runtime: switch p.gcFractionalMarkTime to atomic.Int64Michael Pratt
atomic.Int64 automatically maintains proper alignment, avoiding the need to manually adjust alignment back and forth as fields above change. Change-Id: I6a6a636c4c3c366353f6dc8ecac473c075dd5cd9 Reviewed-on: https://go-review.googlesource.com/c/go/+/716700 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2025-11-11runtime: doubly-linked sched.midle listMichael Pratt
This will be used by CL 714801 to remove Ms from the middle of the list. We could simply convert schedlink to the doubly-linked list, bringing along all other uses of schedlink. However, CL 714801 removes Ms from the middle of the midle list. It would be an easy mistake to make to accidentally remove an M from schedlink, assuming that it is on the midle list when it is actually on a completely different list. Using separate a list node makes this impossible. For #65694. Change-Id: I6a6a636c223d925fdc30c0c648460cbf5c2af4d6 Reviewed-on: https://go-review.googlesource.com/c/go/+/714800 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2025-11-06Revert "runtime: remove the pc field of _defer struct"Keith Randall
This reverts commit 361d51a6b58bccaab0559e06737c918018a7a5fa. Reason for revert: Breaks some tests inside Google (on arm64?) Change-Id: Iaea45fdcf9b4f9d36553687ca7f479750fe559da Reviewed-on: https://go-review.googlesource.com/c/go/+/718066 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Youlin Feng <fengyoulin@live.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2025-11-04cmd/link, runtime: don't store text start in pcHeaderIan Lance Taylor
The textStart field requires a relocation, the only relocation in pclntab. And nothing uses it. So remove it. Replace it with a zero, which can itself be removed at some point in coordination with Delve. For #76038 Change-Id: I35675c0868c5d957bb375e40b804c516ae0300ca Reviewed-on: https://go-review.googlesource.com/c/go/+/717240 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Ian Lance Taylor <iant@golang.org>
2025-11-03runtime: remove the pc field of _defer structYoulin Feng
Since we always can get the address of `CALL runtime.deferreturn(SB)` from the unwinder, so it is not necessary to record the caller's pc in the _defer struct. For the stack allocated _defer, this CL makes the frame smaller. Change-Id: I0fd347e4bc07cf8a9b954816323df30fc52552b6 Reviewed-on: https://go-review.googlesource.com/c/go/+/716720 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-10-30runtime: eliminate _PsyscallMichael Anthony Knyszek
This change eliminates the _Psyscall state by using synchronization on the G status _Gsyscall to make syscalls work instead. This removes an atomic Store and an atomic CAS on the syscall path, which reduces syscall and cgo overheads. It also simplifies the syscall paths quite a bit. The one danger with this change is that we have a new combination of states that was previously impossible. There are brief windows where it's possible to observe a goroutine in _Grunning but without a P. This change is careful to hide this detail from the execution tracer, but it may have unexpected effects in the rest of the runtime, making this change somewhat risky. goos: linux goarch: amd64 pkg: internal/runtime/cgobench cpu: AMD EPYC 7B13 │ before.out │ after.out │ │ sec/op │ sec/op vs base │ CgoCall-64 43.69n ± 1% 35.83n ± 1% -17.99% (p=0.002 n=6) CgoCallParallel-64 5.306n ± 1% 5.338n ± 1% ~ (p=0.132 n=6) Change-Id: I4551afc1eea0c1b67a0b2dd26b0d49aa47bf1fb8 Reviewed-on: https://go-review.googlesource.com/c/go/+/646198 Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-10-28runtime: amend comments a bitJes Cok
Change-Id: I3cabc57f6b8f803f966221f9583a5edb8828ca12 GitHub-Last-Rev: 57569ace50ab8ce3d39e17ddf25ad161dffcc19d GitHub-Pull-Request: golang/go#76086 Reviewed-on: https://go-review.googlesource.com/c/go/+/715600 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Jorropo <jorropo.pgm@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com>
2025-10-20runtime: add _Gdeadextra statusMichael Anthony Knyszek
_Gdeadextra is almost the same as _Gdead but for goroutines attached to extra Ms. The primary difference is that it can be transitioned into a _Gscan status, unlike _Gdead. (Why not just use _Gdead? For safety, mostly. There's exactly one case where we're going to want to transition _Gdead to _Gscan|_Gdead, and it's for extra Ms. It's also a bit weird to call this state dead when it can still have a syscalling P attached to it.) This status is used in a follow-up change that changes entersyscall and exitsyscall. Change-Id: I169a4c8617aa3dc329574b829203f56c86b58169 Reviewed-on: https://go-review.googlesource.com/c/go/+/646197 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-10-02runtime,net/http/pprof: goroutine leak detection by using the garbage collectorVlad Saioc
Proposal #74609 Change-Id: I97a754b128aac1bc5b7b9ab607fcd5bb390058c8 GitHub-Last-Rev: 60f2a192badf415112246de8bc6c0084085314f6 GitHub-Pull-Request: golang/go#74622 Reviewed-on: https://go-review.googlesource.com/c/go/+/688335 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: t hepudds <thepudds1460@gmail.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-08-15runtime/metrics: add metric for current Go-owned thread countMichael Anthony Knyszek
Fixes #15490. Change-Id: I6ce9edc46398030ff639e22d4ca4adebccdfe1b7 Reviewed-on: https://go-review.googlesource.com/c/go/+/690399 Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-08-15runtime/metrics: add metric for total goroutines createdMichael Anthony Knyszek
For #15490. Change-Id: Ic587dda1f42d613ea131a6b53ce6ba6e6cadf4c7 Reviewed-on: https://go-review.googlesource.com/c/go/+/690398 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-08-15runtime/metrics: add metrics for goroutine sched statesMichael Anthony Knyszek
This is largely a port of CL 38180. For #15490. Change-Id: I2726111e472e81e9f9f0f294df97872c2689f061 Reviewed-on: https://go-review.googlesource.com/c/go/+/690397 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-08-05runtime: save scalar registers off stack in amd64 async preemptionAustin Clements
Asynchronous preemption must save all registers that could be in use by Go code. Currently, it saves all of these to the goroutine stack. As a result, the stack frame requirements of asynchronous preemption can be rather high. On amd64, this requires 368 bytes of stack space, most of which is the XMM registers. Several RISC architectures are around 0.5 KiB. As we add support for SIMD instructions, this is going to become a problem. The AVX-512 register state is 2.5 KiB. This well exceeds the nosplit limit, and even if it didn't, could constrain when we can asynchronously preempt goroutines on small stacks. This CL fixes this by moving pure scalar state stored in non-GP registers off the stack and into an allocated "extended register state" object. To reduce space overhead, we only allocate these objects as needed. While in the theoretical limit, every G could need this register state, in practice very few do at a time. However, we can't allocate when we're in the middle of saving the register state during an asynchronous preemption, so we reserve scratch space on every P to temporarily store the register state, which can then be copied out to an allocated state object later by Go code. This commit only implements this for amd64, since that's where we're about to add much more vector state, but it lays the groundwork for doing this on any architecture that could benefit. This is a cherry-pick of CL 680898 plus bug fix CL 684836 from the dev.simd branch. Change-Id: I123a95e21c11d5c10942d70e27f84d2d99bbf735 Reviewed-on: https://go-review.googlesource.com/c/go/+/669195 Auto-Submit: Austin Clements <austin@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-07-28internal/runtime/syscall/windows: factor out code from runtimeqmuntal
Factor out the code related to doing calls using the Windows stdcall calling convention into a separate package. This will allow us to reuse it in other low-level packages that can't depend on syscall. Updates #51087. Cq-Include-Trybots: luci.golang.try:gotip-windows-arm64,gotip-windows-amd64-longtest,gotip-solaris-amd64 Change-Id: I68640b07091183b50da6bef17406c10a397896e9 Reviewed-on: https://go-review.googlesource.com/c/go/+/689156 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-07-24cmd/internal/obj: rip out argp adjustment for wrapper framesKeith Randall
The previous CL made this adjustment unnecessary. The argp field is no longer used by the runtime. Change-Id: I3491eeef4103c6653ec345d604c0acd290af9e8f Reviewed-on: https://go-review.googlesource.com/c/go/+/685356 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2025-07-24runtime: detect successful recovers differentlyKeith Randall
Use stack unwinding instead of keeping incremental track of the argp of defers that are allowed to recover. It's much simpler, and it lets us get rid of the incremental tracking by wrapper code. (Ripped out in a subsequent CL.) We only need to stack unwind a few frames to get the right answer, and only when recover()ing in a panic situation. It will be more expensive in that case, but cheaper in all others. Change-Id: Id095807db6864b7ac1e1baf09285b77a07c46d19 Reviewed-on: https://go-review.googlesource.com/c/go/+/685355 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2025-06-30runtime: stash allpSnapshot on the MMichael Pratt
findRunnable takes a snapshot of allp prior to dropping the P because afterwards procresize may mutate allp without synchronization. procresize is careful to never mutate the contents up to cap(allp), so findRunnable can still safely access the Ps in the slice. Unfortunately, growing allp is problematic. If procresize grows the allp backing array, it drops the reference to the old array. allpSnapshot still refers to the old array, but allpSnapshot is on the system stack in findRunnable, which also likely no longer has a P at all. This means that a future GC will not find the reference and can free the array and use it for another allocation. This would corrupt later reads that findRunnable does from the array. The fix is simple: the M struct itself is reachable by the GC, so we can stash the snapshot in the M to ensure it is visible to the GC. The ugliest part of the CL is the cleanup when we are done with the snapshot because there are so many return/goto top sites. I am tempted to put mp.clearAllpSnapshot() in the caller and at top to make this less error prone, at the expensive of extra unnecessary writes. Fixes #74414. Change-Id: I6a6a636c484e4f4b34794fd07910b3fffeca830b Reviewed-on: https://go-review.googlesource.com/c/go/+/684460 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2025-06-27runtime: account for missing frame pointer in preambleMichael Anthony Knyszek
If a goroutine is synchronously preempted, then taking a frame-pointer-based stack trace at that preemption will skip PC of the caller of the function which called into morestack. This happens because the frame pointer is pushed to the stack after the preamble, leaving the stack in an odd state for frame pointer unwinding. Deal with this by marking a goroutine as synchronously preempted and using that signal to load the missing PC from the stack. On LR platforms this is available in gp.sched.lr. On non-LR platforms like x86, it's at gp.sched.sp, because there are no args, no locals, and no frame pointer pushed to the SP yet. For #68090. Change-Id: I73a1206d8b84eecb8a96dbe727195da30088f288 Reviewed-on: https://go-review.googlesource.com/c/go/+/684435 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Nick Ripley <nick.ripley@datadoghq.com>
2025-06-18runtime: prevent mutual deadlock between GC stopTheWorld and suspendGMichael Anthony Knyszek
Almost everywhere we stop the world we casGToWaitingForGC to prevent mutual deadlock with the GC trying to scan our stack. This historically was only necessary if we weren't stopping the world to change the GC phase, because what we were worried about was mutual deadlock with mark workers' use of suspendG. And, they were the only users of suspendG. In Go 1.22 this changed. The execution tracer began using suspendG, too. This leads to the possibility of mutual deadlock between the execution tracer and a goroutine trying to start or end the GC mark phase. The fix is simple: make the stop-the-world calls for the GC also call casGToWaitingForGC. This way, suspendG is guaranteed to make progress in this circumstance, and once it completes, the stop-the-world can complete as well. We can take this a step further, though, and move casGToWaitingForGC into stopTheWorldWithSema, since there's no longer really a place we can afford to skip this detail. While we're here, rename casGToWaitingForGC to casGToWaitingForSuspendG, since the GC is now not the only potential source of mutual deadlock. Fixes #72740. Change-Id: I5e3739a463ef3e8173ad33c531e696e46260692f Reviewed-on: https://go-review.googlesource.com/c/go/+/681501 Reviewed-by: Carlos Amedee <carlos@golang.org> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-06-09runtime: clarify stack traces for bubbled goroutinesDamien Neil
Use the synctest bubble ID to identify bubbles in traces, rather than the goroutine ID of the bubble's root goroutine. Some waitReasons include a "(synctest)" suffix to distinguish a durably blocking state from a non-durable one. For example, "chan send" vs. "chan send (synctest)". Change this suffix to "(durable)". Always print a "(durable)" sufix for the state of durably blocked bubbled goroutines. For example, print "sleep (durable)". Drop the "[not] durably blocked" text from goroutine states, since this is now entirely redundant with the waitReason. Old: goroutine 8 [chan receive (synctest), synctest bubble 7, durably blocked]: goroutine 9 [select (no cases), synctest bubble 7, durably blocked]: New: goroutine 8 [chan receive (durable), synctest bubble 1]: goroutine 9 [select (no cases) (durable), synctest bubble 1]: Change-Id: I89112efb25150a98a2954f54d1910ccec52a5824 Reviewed-on: https://go-review.googlesource.com/c/go/+/679376 Auto-Submit: Damien Neil <dneil@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-05-29runtime: guarantee no GOMAXPROCS update syscalls after GOMAXPROCS callMichael Pratt
We already guarantee that no automatic updates to GOMAXPROCS occur after a GOMAXPROCS call returns. This is easily achieved by having the update goroutine double-check that updates are still allowed during STW before committing the new value. However, it is possible for sysmon to concurrently run defaultGOMAXPROCS to compute a new GOMAXPROCS value after GOMAXPROCS returns. This new value will be discarded later, but we'll still perform the system calls necessary to compute the new value. Normally this distinction doesn't matter, but if you want to sandbox a Go program, then you may want to disable GOMAXPROCS updates to reduce the system call footprint. A call to GOMAXPROCS will disable updates, but without a guarantee on when sysmon will observe the change it is somewhat fragile. Add explicit synchronization between GOMAXPROCS and sysmon to guarantee that sysmon won't run defaultGOMAXPROCS after GOMAXPROCS returns. The synchronization is a bit complex because we can't hold a mutex across STW, nor take a semaphore from sysmon, but the result isn't too bad. One oddity is that sched.customGOMAXPROCS and gomaxprocs are no longer updated in lockstep (even though both are protected by sched.lock), but I don't believe anything should depend on that. For #73193. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-staticlockranking Change-Id: I6a6a636cff243a9b69ac1b5d2f98925648e60236 Reviewed-on: https://go-review.googlesource.com/c/go/+/677037 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-29runtime, internal/synctest, sync: associate WaitGroups with bubblesDamien Neil
Add support to internal/synctest for managing associations between arbitrary pointers and synctest bubbles. (Implemented internally to the runtime package by attaching a special to the pointer.) Associate WaitGroups with bubbles. Since WaitGroups don't have a constructor, perform the association when Add is called. All Add calls must be made from within the same bubble, or outside any bubble. When a bubbled goroutine calls WaitGroup.Wait, the wait is durably blocking iff the WaitGroup is associated with the current bubble. Change-Id: I77e2701e734ac2fa2b32b28d5b0c853b7b2825c9 Reviewed-on: https://go-review.googlesource.com/c/go/+/676656 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Damien Neil <dneil@google.com>
2025-05-21runtime: use cgroup CPU limit to set GOMAXPROCSMichael Pratt
This CL adds two related features enabled by default via compatibility GODEBUGs containermaxprocs and updatemaxprocs. On Linux, containermaxprocs makes the Go runtime consider cgroup CPU bandwidth limits (quota/period) when setting GOMAXPROCS. If the cgroup limit is lower than the number of logical CPUs available, then the cgroup limit takes precedence. On all OSes, updatemaxprocs makes the Go runtime periodically recalculate the default GOMAXPROCS value and update GOMAXPROCS if it has changed. If GOMAXPROCS is set manually, this update does not occur. This is intended primarily to detect changes to cgroup limits, but it applies on all OSes because the CPU affinity mask can change as well. The runtime only considers the limit in the leaf cgroup (the one that actually contains the process), caching the CPU limit file descriptor(s), which are periodically reread for updates. This is a small departure from the original proposed design. It will not consider limits of parent cgroups (which may be lower than the leaf), and it will not detection cgroup migration after process start. We can consider changing this in the future, but the simpler approach is less invasive; less risk to packages that have some awareness of runtime internals. e.g., if the runtime periodically opens new files during execution, file descriptor leak detection is difficult to implement in a stable way. For #73193. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest Change-Id: I6a6a636c631c1ae577fb8254960377ba91c5dc98 Reviewed-on: https://go-review.googlesource.com/c/go/+/670497 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-21runtime: add valgrind instrumentationRoland Shoemaker
Add build tag gated Valgrind annotations to the runtime which let it understand how the runtime manages memory. This allows for Go binaries to be run under Valgrind without emitting spurious errors. Instead of adding the Valgrind headers to the tree, and using cgo to call the various Valgrind client request macros, we just add an assembly function which emits the necessary instructions to trigger client requests. In particular we add instrumentation of the memory allocator, using a two-level mempool structure (as described in the Valgrind manual [0]). We also add annotations which allow Valgrind to track which memory we use for stacks, which seems necessary to let it properly function. We describe the memory model to Valgrind as follows: we treat heap arenas as a "pool" created with VALGRIND_CREATE_MEMPOOL_EXT (so that we can use VALGRIND_MEMPOOL_METAPOOL and VALGRIND_MEMPOOL_AUTO_FREE). Within the pool we treat spans as "superblocks", annotated with VALGRIND_MEMPOOL_ALLOC. We then allocate individual objects within spans with VALGRIND_MALLOCLIKE_BLOCK. It should be noted that running binaries under Valgrind can be _quite slow_, and certain operations, such as running the GC, can be _very slow_. It is recommended to run programs with GOGC=off. Additionally, async preemption should be turned off, since it'll cause strange behavior (GODEBUG=asyncpreemptoff=1). Running Valgrind with --leak-check=yes will result in some errors resulting from some things not being marked fully free'd. These likely need more annotations to rectify, but for now it is recommended to run with --leak-check=off. Updates #73602 [0] https://valgrind.org/docs/manual/mc-manual.html#mc-manual.mempools Change-Id: I71b26c47d7084de71ef1e03947ef6b1cc6d38301 Reviewed-on: https://go-review.googlesource.com/c/go/+/674077 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-20runtime: report finalizer and cleanup queue length with checkfinalizer>0Michael Anthony Knyszek
This change adds tracking for approximate finalizer and cleanup queue lengths. These lengths are reported once every GC cycle as a single line printed to stderr when GODEBUG=checkfinalizer>0. This change lays the groundwork for runtime/metrics metrics to produce the same values. For #72948. For #72950. Change-Id: I081721238a0fc4c7e5bee2dbaba6cfb4120d1a33 Reviewed-on: https://go-review.googlesource.com/c/go/+/671437 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-19runtime: rename ncpu to numCPUStartupMichael Pratt
ncpu is the total logical CPU count at startup. It is never updated. For #73193, we will start using updated CPU counts for updated GOMAXPROCS, making the ncpu name a bit ambiguous. Change to a less ambiguous name. While we're at it, give the OS specific lookup functions a common name, so it can be used outside of osinit later. For #73193. Change-Id: I6a6a636cf21cc60de36b211f3c374080849fc667 Reviewed-on: https://go-review.googlesource.com/c/go/+/672277 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2025-05-12runtime: only poll network from one P at a time in findRunnableCarlos Amedee
This change reintroduces CL 564197. It was reverted due to a failing benchmark. That failure has been resolved. For #65064 Change-Id: Ic88841d2bc24c2717ad324873f0f52699f21dc66 Reviewed-on: https://go-review.googlesource.com/c/go/+/669235 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-05-08runtime: schedule cleanups across multiple goroutinesMichael Anthony Knyszek
This change splits the finalizer and cleanup queues and implements a new lock-free blocking queue for cleanups. The basic design is as follows: The cleanup queue is organized in fixed-sized blocks. Individual cleanup functions are queued, but only whole blocks are dequeued. Enqueuing cleanups places them in P-local cleanup blocks. These are flushed to the full list as they get full. Cleanups can only be enqueued by an active sweeper. Dequeuing cleanups always dequeues entire blocks from the full list. Cleanup blocks can be dequeued and executed at any time. The very last active sweeper in the sweep phase is responsible for flushing all local cleanup blocks to the full list. It can do this without any synchronization because the next GC can't start yet, so we can be very certain that nobody else will be accessing the local blocks. Cleanup blocks are stored off-heap because the need to be allocated by the sweeper, which is called from heap allocation paths. As a result, the GC treats cleanup blocks as roots, just like finalizer blocks. Flushes to the full list signal to the scheduler that cleanup goroutines should be awoken. Every time the scheduler goes to wake up a cleanup goroutine and there were more signals than goroutines to wake, it then forwards this signal to runtime.AddCleanup, so that it creates another goroutine the next time it is called, up to gomaxprocs goroutines. The signals here are a little convoluted, but exist because the sweeper and the scheduler cannot safely create new goroutines. For #71772. For #71825. Change-Id: Ie839fde2b67e1b79ac1426be0ea29a8d923a62cc Reviewed-on: https://go-review.googlesource.com/c/go/+/650697 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2025-05-07runtime: use "bubble" terminology for synctestDamien Neil
We've settled on calling the group of goroutines started by synctest.Run a "bubble". At the time the runtime implementation was written, I was still calling this a "group". Update the code to match the current terminology. Change-Id: I31b757f31d804b5d5f9564c182627030a9532f4a Reviewed-on: https://go-review.googlesource.com/c/go/+/670135 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Damien Neil <dneil@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-06runtime: replace mentions of "raised" with "panicked"Mark Freeman
Fixes #73526 Change-Id: I4b801cf3e54b99559e6d5ca8fdb2fd0692a0d3a5 Reviewed-on: https://go-review.googlesource.com/c/go/+/669975 TryBot-Bypass: Mark Freeman <mark@golang.org> Reviewed-by: Robert Griesemer <gri@google.com> Auto-Submit: Mark Freeman <mark@golang.org> Reviewed-by: Mark Freeman <mark@golang.org>
2025-05-05Revert "cmd/compile: allow all of the preamble to be preemptible"Keith Randall
This reverts commits 3f3782feed6e0726ddb08afd32dad7d94fbb38c6 (CL 648518) b386b628521780c048af14a148f373c84e687b26 (CL 668475) Fixes #73542 Change-Id: I218851c5c0b62700281feb0b3f82b6b9b97b910d Reviewed-on: https://go-review.googlesource.com/c/go/+/670055 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-25cmd/compile: allow all of the preamble to be preemptibleKeith Randall
We currently make some parts of the preamble unpreemptible because it confuses morestack. See comments in the code. Instead, have morestack handle those weird cases so we can remove unpreemptible marks from most places. This CL makes user functions preemptible everywhere if they have no write barriers (at least, on x86). In cmd/go the fraction of functions that need preemptible markings drops from 82% to 36%. Makes the cmd/go binary 0.3% smaller. Update #35470 Change-Id: Ic83d5eabfd0f6d239a92e65684bcce7e67ff30bb Reviewed-on: https://go-review.googlesource.com/c/go/+/648518 Auto-Submit: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-04-22Revert "runtime: only poll network from one P at a time in findRunnable"Carlos Amedee
This reverts commit 352dd2d932c1c1c6dbc3e112fcdfface07d4fffb. Reason for revert: cockroachdb benchmark failing. Likely due to CL 564197. For #73474 Change-Id: Id5d83cd8bb8fe9ee7fddb8dc01f1a01f2d40154e Reviewed-on: https://go-review.googlesource.com/c/go/+/667336 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Auto-Submit: Carlos Amedee <carlos@golang.org>
2025-04-22runtime: commit to spinbitmutex GOEXPERIMENTRhys Hiltner
Use the "spinbit" mutex implementation always (including on platforms that need to emulate atomic.Xchg8), and delete the prior "tristate" implementations. The exception is GOARCH=wasm, where the Go runtime does not use multiple threads. For #68578 Change-Id: Ifc29bbfa05071d776c23a19ae185891a03a82417 Reviewed-on: https://go-review.googlesource.com/c/go/+/658456 Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-22runtime: only poll network from one P at a time in findRunnableIan Lance Taylor
For #65064 Change-Id: Ifecd7e332d2cf251750752743befeda4ed396f33 Reviewed-on: https://go-review.googlesource.com/c/go/+/564197 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Artur M. Wolff <artur.m.wolff@gmail.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-04-15runtime: size field for gQueue and gListDmitrii Martynov
Before CL, all instances of gQueue and gList stored the size of structures in a separate variable. The size changed manually and passed as a separate argument to different functions. This CL added an additional field to gQueue and gList structures to store the size. Also, the calculation of size was moved into the implementation of API for these structures. This allows to reduce possible errors by eliminating manual calculation of the size and simplifying functions' signatures. Change-Id: I087da2dfaec4925e4254ad40fce5ccb4c175ec41 Reviewed-on: https://go-review.googlesource.com/c/go/+/664777 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org>
2025-04-11runtime: handle m0 padding betterRuss Cox
The SpinbitMutex experiment requires m structs other than m0 to be allocated in 2048-byte size class, by adding padding. Do the calculation more explicitly, to avoid future CLs like CL 653335. Change-Id: I83ae1e86ef3711ab65441f4e487f94b9e1429029 Reviewed-on: https://go-review.googlesource.com/c/go/+/654595 Reviewed-by: Rhys Hiltner <rhys.hiltner@gmail.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2025-03-14runtime: only set isExtraInC if there are no Go frames leftMichael Pratt
mp.isExtraInC is intended to indicate that this M has no Go frames at all; it is entirely executing in C. If there was a cgocallback to Go and then a cgocall to C, such that the leaf frames are C, that is fine. e.g., traceback can handle this fine with SetCgoTraceback (or by simply skipping the C frames). However, we currently mismanage isExtraInC, unconditionally setting it on return from cgocallback. This means that if there are two levels of cgocallback, we end up running Go code with isExtraInC set. 1. C-created thread calls into Go function 1 (via cgocallback). 2. Go function 1 calls into C function 1 (via cgocall). 3. C function 1 calls into Go function 2 (via cgocallback). 4. Go function 2 returns back to C function 1 (returning via the remainder of cgocallback). 5. C function 1 returns back to Go function 1 (returning via the remainder of cgocall). 6. Go function 1 is now running with mp.isExtraInC == true. The fix is simple; only set isExtraInC on return from cgocallback if there are no more Go frames. There can't be more Go frames unless there is an active cgocall out of the Go frames. Fixes #72870. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest Change-Id: I6a6a636c4e7ba75a29639d7036c5af3738033467 Reviewed-on: https://go-review.googlesource.com/c/go/+/658035 Reviewed-by: Cherry Mui <cherryyz@google.com> Commit-Queue: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-01runtime: add padding to m struct for 64 bit architecturesJoel Sing
CL 652276 reduced the m struct by 8 bytes, which has changed the allocation class on 64 bit OpenBSD platforms. This results in build failures due to: M structure uses sizeclass 1792/0x700 bytes; incompatible with mutex flag mask 0x3ff Add 128 bytes of padding when spinbitmutex is enabled on 64 bit architectures, moving the size to the half point between the 1792 and 2048 allocation size. Change-Id: I71623a1f75714543c302217e619d20cf0e717aeb Reviewed-on: https://go-review.googlesource.com/c/go/+/653335 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2025-02-26runtime: remove ret field from gobufKeith Randall
It's not used for anything. Change-Id: I031b3cdfe52b6b1cff4b3cb6713ffe588084542f Reviewed-on: https://go-review.googlesource.com/c/go/+/652276 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-06runtime: don't duplicate reraised panic values in printpanicsDamien Neil
Change the output printed when crashing with a reraised panic value to not duplicate that value. Changes output of panicking with "PANIC", recovering, and reraising from: panic: PANIC [recovered] panic: PANIC to: panic: PANIC [recovered, reraised] Fixes #71517 Change-Id: Id59938c4ea0df555b851ffc650fe6f94c0845499 Reviewed-on: https://go-review.googlesource.com/c/go/+/645916 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>