aboutsummaryrefslogtreecommitdiff
path: root/src/runtime/stack.go
diff options
context:
space:
mode:
authorMichael Anthony Knyszek <mknyszek@google.com>2024-03-21 18:49:05 +0000
committerGopher Robot <gobot@golang.org>2024-04-05 20:50:21 +0000
commitd6a3d093c3f630e206abfc974a4a8b6c07884485 (patch)
tree33234f7f646c1cc359b4df7e46a751bfcd76fc34 /src/runtime/stack.go
parent5879bf7e38ac49e2e0caddd11cd4ddd4a4782437 (diff)
downloadgo-d6a3d093c3f630e206abfc974a4a8b6c07884485.tar.xz
runtime: take a stack trace during tracing only when we own the stack
Currently, the execution tracer may attempt to take a stack trace of a goroutine whose stack it does not own. For example, if the goroutine is in _Grunnable or _Gwaiting. This is easily fixed in all cases by simply moving the emission of GoStop and GoBlock events to before the casgstatus happens. The goroutine status is what is used to signal stack ownership, and the GC may shrink a goroutine's stack if it can acquire the scan bit. Although this is easily fixed, the interaction here is very subtle, because stack ownership is only implicit in the goroutine's scan status. To make this invariant more maintainable and less error-prone in the future, this change adds a GODEBUG setting that checks, at the point of taking a stack trace, whether the caller owns the goroutine. This check is not quite perfect because there's no way for the stack tracing code to know that the _Gscan bit was acquired by the caller, so for simplicity it assumes that it was the caller that acquired the scan bit. In all other cases however, we can check for ownership precisely. At the very least, this check is sufficient to catch the issue this change is fixing. To make sure this debug check doesn't bitrot, it's always enabled during trace testing. This new mode has actually caught a few other issues already, so this change fixes them. One issue that this debug mode caught was that it's not safe to take a stack trace of a _Gwaiting goroutine that's being unparked. Another much bigger issue this debug mode caught was the fact that the execution tracer could try to take a stack trace of a G that was in _Gwaiting solely to avoid a deadlock in the GC. The execution tracer already has a partial list of these cases since they're modeled as the goroutine just executing as normal in the tracer, but this change takes the list and makes it more formal. In this specific case, we now prevent the GC from shrinking the stacks of goroutines in this state if tracing is enabled. The stack traces from these scenarios are too useful to discard, but there is indeed a race here between the tracer and any attempt to shrink the stack by the GC. Change-Id: I019850dabc8cede202fd6dcc0a4b1f16764209fb Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-amd64-longtest-race Reviewed-on: https://go-review.googlesource.com/c/go/+/573155 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
Diffstat (limited to 'src/runtime/stack.go')
-rw-r--r--src/runtime/stack.go27
1 files changed, 23 insertions, 4 deletions
diff --git a/src/runtime/stack.go b/src/runtime/stack.go
index 8acc5e9f98..6679cd993d 100644
--- a/src/runtime/stack.go
+++ b/src/runtime/stack.go
@@ -1136,21 +1136,40 @@ func gostartcallfn(gobuf *gobuf, fv *funcval) {
// isShrinkStackSafe returns whether it's safe to attempt to shrink
// gp's stack. Shrinking the stack is only safe when we have precise
-// pointer maps for all frames on the stack.
+// pointer maps for all frames on the stack. The caller must hold the
+// _Gscan bit for gp or must be running gp itself.
func isShrinkStackSafe(gp *g) bool {
// We can't copy the stack if we're in a syscall.
// The syscall might have pointers into the stack and
// often we don't have precise pointer maps for the innermost
// frames.
- //
+ if gp.syscallsp != 0 {
+ return false
+ }
// We also can't copy the stack if we're at an asynchronous
// safe-point because we don't have precise pointer maps for
// all frames.
- //
+ if gp.asyncSafePoint {
+ return false
+ }
// We also can't *shrink* the stack in the window between the
// goroutine calling gopark to park on a channel and
// gp.activeStackChans being set.
- return gp.syscallsp == 0 && !gp.asyncSafePoint && !gp.parkingOnChan.Load()
+ if gp.parkingOnChan.Load() {
+ return false
+ }
+ // We also can't copy the stack while tracing is enabled, and
+ // gp is in _Gwaiting solely to make itself available to the GC.
+ // In these cases, the G is actually executing on the system
+ // stack, and the execution tracer may want to take a stack trace
+ // of the G's stack. Note: it's safe to access gp.waitreason here.
+ // We're only checking if this is true if we took ownership of the
+ // G with the _Gscan bit. This prevents the goroutine from transitioning,
+ // which prevents gp.waitreason from changing.
+ if traceEnabled() && readgstatus(gp)&^_Gscan == _Gwaiting && gp.waitreason.isWaitingForGC() {
+ return false
+ }
+ return true
}
// Maybe shrink the stack being used by gp.