diff options
| author | Michael Anthony Knyszek <mknyszek@google.com> | 2024-03-21 18:49:05 +0000 |
|---|---|---|
| committer | Gopher Robot <gobot@golang.org> | 2024-04-05 20:50:21 +0000 |
| commit | d6a3d093c3f630e206abfc974a4a8b6c07884485 (patch) | |
| tree | 33234f7f646c1cc359b4df7e46a751bfcd76fc34 /src/runtime/trace2stack.go | |
| parent | 5879bf7e38ac49e2e0caddd11cd4ddd4a4782437 (diff) | |
| download | go-d6a3d093c3f630e206abfc974a4a8b6c07884485.tar.xz | |
runtime: take a stack trace during tracing only when we own the stack
Currently, the execution tracer may attempt to take a stack trace of a
goroutine whose stack it does not own. For example, if the goroutine is
in _Grunnable or _Gwaiting. This is easily fixed in all cases by simply
moving the emission of GoStop and GoBlock events to before the
casgstatus happens. The goroutine status is what is used to signal stack
ownership, and the GC may shrink a goroutine's stack if it can acquire
the scan bit.
Although this is easily fixed, the interaction here is very subtle,
because stack ownership is only implicit in the goroutine's scan status.
To make this invariant more maintainable and less error-prone in the
future, this change adds a GODEBUG setting that checks, at the point of
taking a stack trace, whether the caller owns the goroutine. This check
is not quite perfect because there's no way for the stack tracing code
to know that the _Gscan bit was acquired by the caller, so for
simplicity it assumes that it was the caller that acquired the scan bit.
In all other cases however, we can check for ownership precisely. At the
very least, this check is sufficient to catch the issue this change is
fixing.
To make sure this debug check doesn't bitrot, it's always enabled during
trace testing. This new mode has actually caught a few other issues
already, so this change fixes them.
One issue that this debug mode caught was that it's not safe to take a
stack trace of a _Gwaiting goroutine that's being unparked.
Another much bigger issue this debug mode caught was the fact that the
execution tracer could try to take a stack trace of a G that was in
_Gwaiting solely to avoid a deadlock in the GC. The execution tracer
already has a partial list of these cases since they're modeled as the
goroutine just executing as normal in the tracer, but this change takes
the list and makes it more formal. In this specific case, we now prevent
the GC from shrinking the stacks of goroutines in this state if tracing
is enabled. The stack traces from these scenarios are too useful to
discard, but there is indeed a race here between the tracer and any
attempt to shrink the stack by the GC.
Change-Id: I019850dabc8cede202fd6dcc0a4b1f16764209fb
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-amd64-longtest-race
Reviewed-on: https://go-review.googlesource.com/c/go/+/573155
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Diffstat (limited to 'src/runtime/trace2stack.go')
| -rw-r--r-- | src/runtime/trace2stack.go | 23 |
1 files changed, 23 insertions, 0 deletions
diff --git a/src/runtime/trace2stack.go b/src/runtime/trace2stack.go index 44588fa39e..4ee3b32b05 100644 --- a/src/runtime/trace2stack.go +++ b/src/runtime/trace2stack.go @@ -46,6 +46,29 @@ func traceStack(skip int, gp *g, gen uintptr) uint64 { mp = getg().m gp = mp.curg } + + // Double-check that we own the stack we're about to trace. + if debug.traceCheckStackOwnership != 0 && gp != nil { + status := readgstatus(gp) + // If the scan bit is set, assume we're the ones that acquired it. + if status&_Gscan == 0 { + // Use the trace status to check this. There are a number of cases + // where a running goroutine might be in _Gwaiting, and these cases + // are totally fine for taking a stack trace. They're captured + // correctly in goStatusToTraceGoStatus. + switch goStatusToTraceGoStatus(status, gp.waitreason) { + case traceGoRunning, traceGoSyscall: + if getg() == gp || mp.curg == gp { + break + } + fallthrough + default: + print("runtime: gp=", unsafe.Pointer(gp), " gp.goid=", gp.goid, " status=", gStatusStrings[status], "\n") + throw("attempted to trace stack of a goroutine this thread does not own") + } + } + } + if gp != nil && mp == nil { // We're getting the backtrace for a G that's not currently executing. // It may still have an M, if it's locked to some M. |
