diff options
| author | Michael Anthony Knyszek <mknyszek@google.com> | 2020-08-10 20:02:22 +0000 |
|---|---|---|
| committer | Dmitri Shuralyov <dmitshur@golang.org> | 2020-10-07 20:34:14 +0000 |
| commit | bf79f91d3dc424b2e61d5a48fc6864b8aea65ed3 (patch) | |
| tree | e1aa68cb0c947bcf9bab11264d1747009711dfdb /src/runtime/stack.go | |
| parent | 400c68a04d2d3a177899438948c7b712054521e6 (diff) | |
| download | go-bf79f91d3dc424b2e61d5a48fc6864b8aea65ed3.tar.xz | |
[release-branch.go1.15] runtime: disable stack shrinking in activeStackChans race window
Currently activeStackChans is set before a goroutine blocks on a channel
operation in an unlockf passed to gopark. The trouble is that the
unlockf is called *after* the G's status is changed, and the G's status
is what is used by a concurrent mark worker (calling suspendG) to
determine that a G has successfully been suspended. In this window
between the status change and unlockf, the mark worker could try to
shrink the G's stack, and in particular observe that activeStackChans is
false. This observation will cause the mark worker to *not* synchronize
with concurrent channel operations when it should, and so updating
pointers in the sudog for the blocked goroutine (which may point to the
goroutine's stack) races with channel operations which may also
manipulate the pointer (read it, dereference it, update it, etc.).
Fix the problem by adding a new atomically-updated flag to the g struct
called parkingOnChan, which is non-zero in the race window above. Then,
in isShrinkStackSafe, check if parkingOnChan is zero. The race is
resolved like so:
* Blocking G sets parkingOnChan, then changes status in gopark.
* Mark worker successfully suspends blocking G.
* If the mark worker observes parkingOnChan is non-zero when checking
isShrinkStackSafe, then it's not safe to shrink (we're in the race
window).
* If the mark worker observes parkingOnChan as zero, then because
the mark worker observed the G status change, it can be sure that
gopark's unlockf completed, and gp.activeStackChans will be correct.
The risk of this change is low, since although it reduces the number of
places that stack shrinking is allowed, the window here is incredibly
small. Essentially, every place that it might crash now is replaced with
no shrink.
This change adds a test, but the race window is so small that it's hard
to trigger without a well-placed sleep in park_m. Also, this change
fixes stackGrowRecursive in proc_test.go to actually allocate a 128-byte
stack frame. It turns out the compiler was destructuring the "pad" field
and only allocating one uint64 on the stack.
For #40641.
Fixes #40643.
Change-Id: I7dfbe7d460f6972b8956116b137bc13bc24464e8
Reviewed-on: https://go-review.googlesource.com/c/go/+/247050
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Trust: Michael Knyszek <mknyszek@google.com>
(cherry picked from commit eb3c6a93c3236bbde5dee6cc5bd4ca9f8ab1647a)
Reviewed-on: https://go-review.googlesource.com/c/go/+/256300
Reviewed-by: Austin Clements <austin@google.com>
Diffstat (limited to 'src/runtime/stack.go')
| -rw-r--r-- | src/runtime/stack.go | 13 |
1 files changed, 12 insertions, 1 deletions
diff --git a/src/runtime/stack.go b/src/runtime/stack.go index 0e930f60db..f38ba56c80 100644 --- a/src/runtime/stack.go +++ b/src/runtime/stack.go @@ -864,6 +864,13 @@ func copystack(gp *g, newsize uintptr) { // Adjust sudogs, synchronizing with channel ops if necessary. ncopy := used if !gp.activeStackChans { + if newsize < old.hi-old.lo && atomic.Load8(&gp.parkingOnChan) != 0 { + // It's not safe for someone to shrink this stack while we're actively + // parking on a channel, but it is safe to grow since we do that + // ourselves and explicitly don't want to synchronize with channels + // since we could self-deadlock. + throw("racy sudog adjustment due to parking on channel") + } adjustsudogs(gp, &adjinfo) } else { // sudogs may be pointing in to the stack and gp has @@ -1103,7 +1110,11 @@ func isShrinkStackSafe(gp *g) bool { // We also can't copy the stack if we're at an asynchronous // safe-point because we don't have precise pointer maps for // all frames. - return gp.syscallsp == 0 && !gp.asyncSafePoint + // + // We also can't *shrink* the stack in the window between the + // goroutine calling gopark to park on a channel and + // gp.activeStackChans being set. + return gp.syscallsp == 0 && !gp.asyncSafePoint && atomic.Load8(&gp.parkingOnChan) == 0 } // Maybe shrink the stack being used by gp. |
