diff options
| author | Austin Clements <austin@google.com> | 2025-04-29 22:55:40 -0400 |
|---|---|---|
| committer | Gopher Robot <gobot@golang.org> | 2025-06-30 11:50:33 -0700 |
| commit | 426cf36b4d0c672dc88fc5cef9b0d5db0d2f4fe5 (patch) | |
| tree | 1ed5d81491cef46cc97b6de82490ed9c8be1961d /src/runtime/runtime2.go | |
| parent | ead249a2e2989c6775235058d38f0e33afdf752a (diff) | |
| download | go-426cf36b4d0c672dc88fc5cef9b0d5db0d2f4fe5.tar.xz | |
[dev.simd] runtime: save scalar registers off stack in amd64 async preemption
Asynchronous preemption must save all registers that could be in use
by Go code. Currently, it saves all of these to the goroutine stack.
As a result, the stack frame requirements of asynchronous preemption
can be rather high. On amd64, this requires 368 bytes of stack space,
most of which is the XMM registers. Several RISC architectures are
around 0.5 KiB.
As we add support for SIMD instructions, this is going to become a
problem. The AVX-512 register state is 2.5 KiB. This well exceeds the
nosplit limit, and even if it didn't, could constrain when we can
asynchronously preempt goroutines on small stacks.
This CL fixes this by moving pure scalar state stored in non-GP
registers off the stack and into an allocated "extended register
state" object. To reduce space overhead, we only allocate these
objects as needed. While in the theoretical limit, every G could need
this register state, in practice very few do at a time.
However, we can't allocate when we're in the middle of saving the
register state during an asynchronous preemption, so we reserve
scratch space on every P to temporarily store the register state,
which can then be copied out to an allocated state object later by Go
code.
This commit only implements this for amd64, since that's where we're
about to add much more vector state, but it lays the groundwork for
doing this on any architecture that could benefit.
Change-Id: I123a95e21c11d5c10942d70e27f84d2d99bbf735
Reviewed-on: https://go-review.googlesource.com/c/go/+/680898
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Austin Clements <austin@google.com>
Diffstat (limited to 'src/runtime/runtime2.go')
| -rw-r--r-- | src/runtime/runtime2.go | 9 |
1 files changed, 9 insertions, 0 deletions
diff --git a/src/runtime/runtime2.go b/src/runtime/runtime2.go index 96720846b2..789b68e54e 100644 --- a/src/runtime/runtime2.go +++ b/src/runtime/runtime2.go @@ -491,6 +491,10 @@ type g struct { coroarg *coro // argument during coroutine transfers bubble *synctestBubble + // xRegs stores the extended register state if this G has been + // asynchronously preempted. + xRegs xRegPerG + // Per-G tracer state. trace gTraceState @@ -760,6 +764,11 @@ type p struct { // gcStopTime is the nanotime timestamp that this P last entered _Pgcstop. gcStopTime int64 + // xRegs is the per-P extended register state used by asynchronous + // preemption. This is an empty struct on platforms that don't use extended + // register state. + xRegs xRegPerP + // Padding is no longer needed. False sharing is now not a worry because p is large enough // that its size class is an integer multiple of the cache line size (for any of our architectures). } |
