| Age | Commit message | Author |
|
««« CL 179680043 / 752cd9199639
runtime: fix hang in GC due to shrinkstack vs netpoll race
During garbage collection, after scanning a stack, we think about
shrinking it to reclaim some memory. The shrinking code (called
while the world is stopped) checked that the status was Gwaiting
or Grunnable and then changed the state to Gcopystack, to essentially
lock the stack so that no other GC thread is scanning it.
The same locking happens for stack growth (and is more necessary there).
	oldstatus = runtime·readgstatus(gp);
	oldstatus &= ~Gscan;
	if(oldstatus == Gwaiting || oldstatus == Grunnable)
		runtime·casgstatus(gp, oldstatus, Gcopystack); // oldstatus is Gwaiting or Grunnable
	else
		runtime·throw("copystack: bad status, not Gwaiting or Grunnable");
Unfortunately, "stop the world" doesn't stop everything. It stops all
normal goroutine execution, but the network polling thread is still
blocked in epoll and may wake up. If it does, and it chooses a goroutine
to mark runnable, and that goroutine is the one whose stack is shrinking,
then it can happen that between readgstatus and casgstatus, the status
changes from Gwaiting to Grunnable.
casgstatus assumes that if the status is not what is expected, it is a
transient change (like from Gwaiting to Gscanwaiting and back, or like
from Gwaiting to Gcopystack and back), and it loops until the status
has been restored to the expected value. In this case, the status has
changed semi-permanently from Gwaiting to Grunnable - it won't
change again until the GC is done and the world can continue, but the
GC is waiting for the status to change back. This wedges the program.
To fix, call a special variant of casgstatus that accepts either Gwaiting
or Grunnable as valid statuses.
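As a rough illustration of that idea (not the runtime's actual code), the sketch below uses sync/atomic on a plain uint32. The helper name casFromEither and the numeric status values are assumptions made for this example. The point is that the loop reloads the status on every attempt, so a change from Gwaiting to Grunnable between the load and the CAS is picked up instead of being waited on forever.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Hypothetical status values: the names mirror the text above, but the
// numbers are made up for this sketch.
const (
	Grunnable  uint32 = 1
	Gwaiting   uint32 = 4
	Gcopystack uint32 = 8
)

// casFromEither is a hypothetical helper, not the runtime's function.
// It moves *status to newval if it currently holds either Gwaiting or
// Grunnable and reports which value it replaced. Any other value is
// reported as a failure instead of being waited on.
func casFromEither(status *uint32, newval uint32) (old uint32, ok bool) {
	for {
		old = atomic.LoadUint32(status)
		if old != Gwaiting && old != Grunnable {
			return old, false
		}
		if atomic.CompareAndSwapUint32(status, old, newval) {
			return old, true
		}
		// The status changed between the load and the CAS (for example
		// Gwaiting -> Grunnable); reload and try again.
	}
}

func main() {
	status := Gwaiting
	// Simulate the poller marking the goroutine runnable after a caller
	// has already observed Gwaiting.
	atomic.StoreUint32(&status, Grunnable)
	old, ok := casFromEither(&status, Gcopystack)
	fmt.Println(old == Grunnable, ok, status == Gcopystack)
}
```

A caller that needs to restore the previous status afterwards can use the returned old value.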
Without the fix, but with the extra check+throw in casgstatus, the
program below dies in a few seconds (2-10) with GOMAXPROCS=8
on a 2012 Retina MacBook Pro. With the fix, it runs for minutes
and minutes.
	package main

	import (
		"io"
		"log"
		"net"
		"runtime"
	)

	func main() {
		const N = 100
		for i := 0; i < N; i++ {
			l, err := net.Listen("tcp", "127.0.0.1:0")
			if err != nil {
				log.Fatal(err)
			}
			ch := make(chan net.Conn, 1)
			go func() {
				var err error
				c1, err := net.Dial("tcp", l.Addr().String())
				if err != nil {
					log.Fatal(err)
				}
				ch <- c1
			}()
			c2, err := l.Accept()
			if err != nil {
				log.Fatal(err)
			}
			c1 := <-ch
			l.Close()
			go netguy(c1, c2)
			go netguy(c2, c1)
			c1.Write(make([]byte, 100))
		}
		for {
			runtime.GC()
		}
	}

	func netguy(r, w net.Conn) {
		buf := make([]byte, 100)
		for {
			bigstack(1000)
			_, err := io.ReadFull(r, buf)
			if err != nil {
				log.Fatal(err)
			}
			w.Write(buf)
		}
	}

	var g int

	func bigstack(n int) {
		var buf [100]byte
		if n > 0 {
			bigstack(n - 1)
		}
		g = int(buf[0]) + int(buf[99])
	}
Fixes #9186.
LGTM=rlh
R=austin, rlh
CC=dvyukov, golang-codereviews, iant, khr, r
https://golang.org/cl/179680043
»»»
TBR=rlh
CC=golang-codereviews
https://golang.org/cl/184030043
|
|
Originally traceback was only used for printing the stack
when an unexpected signal came in. In that case, the
initial PC is taken from the signal and should be used
unaltered. For the callers, the PC is the return address,
which might be on the line after the call; we subtract 1
to get to the CALL instruction.
Traceback is now used for a variety of things, and for
almost all of those the initial PC is a return address,
whether from getcallerpc, or gp->sched.pc, or gp->syscallpc.
In those cases, we need to subtract 1 from this initial PC,
but the traceback code had a hard rule "never subtract 1
from the initial PC", left over from the signal handling days.
Change gentraceback to take a flag that specifies whether
we are tracing a trap.
Change traceback to default to "starting with a return PC",
which is the overwhelmingly common case.
Add tracebacktrap, like traceback but starting with a trap PC.
Use tracebacktrap in signal handlers.
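To see why the "subtract 1" rule matters, here is a small self-contained simulation. It does not use the runtime's tables: the lineEntry type, the lookupLine helper and the addresses are all invented for illustration. A return address points at the instruction after the CALL, which may already belong to the next source line, whereas a trap PC points at the faulting instruction itself.

```go
package main

import "fmt"

// lineEntry says that instructions starting at pc belong to the given
// source line. The table is a toy stand-in for a real PC-to-line table.
type lineEntry struct {
	pc   uintptr
	line int
}

// lineFor returns the line of the last entry whose start pc is <= pc.
// table must be sorted by pc; 0 is returned if pc precedes every entry.
func lineFor(table []lineEntry, pc uintptr) int {
	line := 0
	for _, e := range table {
		if e.pc > pc {
			break
		}
		line = e.line
	}
	return line
}

// lookupLine maps the initial PC of a traceback to a source line.
// If trap is true, pc is the faulting instruction and is used unaltered.
// Otherwise pc is a return address, so step back into the CALL first.
func lookupLine(table []lineEntry, pc uintptr, trap bool) int {
	if !trap {
		pc--
	}
	return lineFor(table, pc)
}

func main() {
	// Line 10 holds a CALL occupying 0x100..0x104; line 11 starts at 0x105.
	table := []lineEntry{{0x100, 10}, {0x105, 11}}
	ret := uintptr(0x105) // return address pushed by the CALL on line 10

	fmt.Println(lookupLine(table, ret, false)) // 10: the call site
	fmt.Println(lookupLine(table, ret, true))  // 11: wrong line if treated as a trap PC
}
```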
Fixes #7690.
LGTM=iant, r
R=r, iant
CC=golang-codereviews
https://golang.org/cl/167810044
|
|
Fixes #8861.
Fixes #8911.
LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/165780043
|
|
In channels, zeroing of gp.waiting is missed on a closed channel panic.
m.morebuf.g is not zeroed.
I don't expect the latter causes any problems, but just in case.
LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews
https://golang.org/cl/151610043
|
|
This change prevents confusion in the garbage collector.
The collector wants to make sure that every pointer it finds
isn't junk. One of its criteria for junk is that the pointer points
to a "free" span.
Because the stack shrinker modifies pointers in the heap,
there is a race condition between the GC scanner and the
shrinker. The GC scanner can see old pointers (pointers to
freed stacks). In particular this happens with SudoG.elem
pointers.
Normally this is not a problem, as pointers into stack spans
are ok. But if the freed stack is the last one in its span,
the span is marked as "free" instead of "contains stacks".
This change makes sure that even if the GC scanner sees
an old pointer, the span into which it points is still
marked as "contains stacks", and thus the GC doesn't
complain about it.
This change will make the GC pause a tiny bit slower, as
the stack freeing now happens in serial with the mark pause.
We could delay the freeing until the mutators start back up,
but this is the simplest change for now.
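One way to picture the property described above is the toy model below. It is only a sketch under assumptions: the span and allocator types, the gcRunning flag and the pending list are invented here and do not correspond to the runtime's data structures. Frees requested while the collector is running are queued, so the span keeps its "contains stacks" state until the collector is done.

```go
package main

import "fmt"

type spanState int

const (
	spanFree  spanState = iota // span holds nothing; pointers into it look like junk
	spanStack                  // span "contains stacks"; pointers into it are fine
)

// span is a toy span that counts the live stacks allocated from it.
type span struct {
	state  spanState
	stacks int
}

// allocator is a hypothetical stack allocator for this sketch.
type allocator struct {
	gcRunning bool
	pending   []*span // one entry per stack freed while the GC was running
}

// freeStack releases one stack from s, or queues the release if the
// collector is running so that s stays in the spanStack state.
func (a *allocator) freeStack(s *span) {
	if a.gcRunning {
		a.pending = append(a.pending, s)
		return
	}
	a.release(s)
}

// release drops one stack and marks the span free when it was the last one.
func (a *allocator) release(s *span) {
	s.stacks--
	if s.stacks == 0 {
		s.state = spanFree
	}
}

// finishGC ends the collection and performs the queued releases.
func (a *allocator) finishGC() {
	a.gcRunning = false
	for _, s := range a.pending {
		a.release(s)
	}
	a.pending = nil
}

func main() {
	s := &span{state: spanStack, stacks: 1}
	a := &allocator{gcRunning: true}

	a.freeStack(s)                    // last stack in the span, freed mid-GC
	fmt.Println(s.state == spanStack) // true: a stale pointer still looks valid
	a.finishGC()
	fmt.Println(s.state == spanFree) // true: freed once the scan is over
}
```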
TBR=dvyukov
CC=golang-codereviews
https://golang.org/cl/158750043
|
|
Newstack runs on g0, and g0->throwsplit is never set.
LGTM=rsc
R=rsc
CC=golang-codereviews, khr
https://golang.org/cl/147370043
|
|
During a cgo call, the stack can be copied. This copy invalidates
the pointer that cgo has into the return value area. To fix this
problem, pass the address of the location containing the stack
top value (which is in the G struct). For cgo functions which
return values, read the stktop before and after the cgo call to
compute the adjustment necessary to write the return value.
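The adjustment can be sketched with plain address arithmetic. The example below is a simulation, not the cgo implementation: the g struct, the returnSlotAfterCall helper and the addresses are hypothetical, and it assumes that a stack copy preserves every slot's offset from the stack top.

```go
package main

import "fmt"

// g is a stand-in for the goroutine struct; stktop is the highest
// address of its current stack.
type g struct {
	stktop uintptr
}

// returnSlotAfterCall takes the address of the return slot as it was
// before the call, runs the call (which may move the stack and update
// gp.stktop), and returns the address the slot has afterwards.
// uintptr arithmetic wraps, so this also works when the stack moves to
// a lower address.
func returnSlotAfterCall(gp *g, ret uintptr, call func()) uintptr {
	before := gp.stktop
	call()
	after := gp.stktop
	return ret + (after - before)
}

func main() {
	gp := &g{stktop: 0x2000}
	ret := uintptr(0x1ff0) // return slot, 16 bytes below the stack top

	// The call triggers a stack copy to a new region.
	moved := returnSlotAfterCall(gp, ret, func() { gp.stktop = 0x8000 })
	fmt.Printf("%#x\n", moved) // 0x7ff0: still 16 bytes below the new top
}
```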
Fixes #8771
LGTM=iant, rsc
R=iant, rsc, khr
CC=golang-codereviews
https://golang.org/cl/144130043
|
|
In linker, refuse to write conservative (array of pointers) as the
garbage collection type for any variable in the data/bss GC program.
In the linker, attach the Go type to an already-read C declaration
during dedup. This gives us Go types for C globals for free as long
as the cmd/dist-generated Go code contains the declaration.
(Most runtime C declarations have a corresponding Go declaration.
Both are bss declarations and so the linker dedups them.)
In cmd/dist, add a few more C files to the auto-Go-declaration list
in order to get Go type information for the C declarations into the linker.
In C compiler, mark all non-pointer-containing global declarations
and all string data as NOPTR. This allows them to exist in C files
without any corresponding Go declaration. Count C function pointers
as "non-pointer-containing", since we have no heap-allocated C functions.
In runtime, add NOPTR to the remaining pointer-containing declarations,
none of which refer to Go heap objects.
In runtime, also move os.Args and syscall.envs data into runtime-owned
variables. Otherwise, in programs that do not import os or syscall, the
runtime variables named os.Args and syscall.envs will be missing type
information.
I believe that this CL eliminates the final source of conservative GC scanning
in non-SWIG Go programs, and therefore...
Fixes #909.
LGTM=iant
R=iant
CC=golang-codereviews
https://golang.org/cl/149770043
|
|
Pure renaming. This will make an upcoming CL have smaller diffs.
LGTM=dvyukov, iant
R=iant, dvyukov
CC=golang-codereviews
https://golang.org/cl/142280043
|
|
We could probably free the G structures as well, were it not
for the allg list. Leaving that for another day.
Fixes #8287
LGTM=rsc
R=golang-codereviews, dvyukov, khr, rsc
CC=golang-codereviews
https://golang.org/cl/145010043
|
|
The logic here is copied from mgc0.c's scanframe.
It is mostly messages, although the minsize code is new
(and I believe necessary).
I am hoping to get more information about the current
arm build failures (or, if it's the minsize thing, fix them).
TBR=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/143180043
|
|
This makes the GC and the stack copying agree about how
to interpret the defer structures. Previously, only the stack
copying treated them precisely.
This removes an untyped memory allocation and fixes
at least three copystack bugs.
To make sure the GC can find the deferred argument
frame until it has been copied, keep a Defer on the defer list
during its execution.
In addition to making it possible to remove the untyped
memory allocation, keeping the Defer on the list fixes
two races between copystack and execution of defers
(in both gopanic and Goexit). The problem is that once
the defer has been taken off the list, a stack copy that
happens before the deferred arguments have been copied
back to the stack will not update the arguments correctly.
The new tests TestDeferPtrsPanic and TestDeferPtrsGoexit
(variations on the existing TestDeferPtrs) pass now but
failed before this CL.
In addition to those fixes, keeping the Defer on the list
helps correct a dangling pointer error during copystack.
The traceback routines walk the Defer chain to provide
information about where a panic may resume execution.
When the executing Defer was not on the Defer chain
but instead linked from the Panic chain, the traceback
had to walk the Panic chain too. But Panic structs are
on the stack and being updated by copystack.
Traceback's use of the Panic chain while copystack is
updating those structs means that it can follow an
updated pointer and find itself reading from the new stack.
The new stack is usually all zeros, so it sees an incorrect
early end to the chain. The new TestPanicUseStack makes
this happen at tip and dies when adjustdefers finds an
unexpected argp. The new StackCopyPoison mode
causes an earlier bad dereference instead.
By keeping the Defer on the list, traceback can avoid
walking the Panic chain at all, making it okay for copystack
to update the Panics.
We'd have the same problem for any Defers on the stack.
There was only one: gopanic's dabort. Since we are not
taking the executing Defer off the chain, we can use it
to do what dabort was doing, and then there are no
Defers on the stack ever, so it is okay for traceback to use
the Defer chain even while copystack is executing:
copystack cannot modify the Defer chain.
LGTM=khr
R=khr
CC=dvyukov, golang-codereviews, iant, rlh
https://golang.org/cl/141490043
|
|
Dmitriy changed all the execution to interpret the BitVector
as an array of bytes. Update the declaration and generation
of the bitmaps to match, to avoid problems on big-endian
machines.
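The mismatch is easy to reproduce outside the runtime. In the sketch below, the bitSet helper is an assumption for illustration: it reads a bitmap as an array of bytes, lowest bit first. A bitmap generated as a 32-bit word only agrees with that reading when the word is stored little-endian.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// bitSet reports whether bit i is set when data is read as an array of
// bytes, lowest bit first within each byte.
func bitSet(data []byte, i uint) bool {
	return data[i/8]>>(i%8)&1 != 0
}

func main() {
	// A bitmap generated as a single 32-bit word with only bit 0 set.
	const word uint32 = 1

	le := make([]byte, 4)
	be := make([]byte, 4)
	binary.LittleEndian.PutUint32(le, word)
	binary.BigEndian.PutUint32(be, word)

	fmt.Println(bitSet(le, 0), bitSet(be, 0)) // true false
	fmt.Println(bitSet(be, 24))               // true: the bit moved to the last byte
}
```

Declaring and generating the bitmap as bytes in the first place removes the dependence on the machine's byte order.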
LGTM=khr
R=khr
CC=dvyukov, golang-codereviews
https://golang.org/cl/140570044
|
|
makeFuncStub and methodValueStub are used by reflect as
generic function implementations. Each call might have
different arguments. Extract those arguments from the
closure data instead of assuming it is the same each time.
Because the argument map is now being extracted from the
function itself, we don't need the special cases in reflect.Call
anymore, so delete those.
Fixes an occasional crash seen when stack copying does
not update makeFuncStub's arguments correctly.
Will also help make it safe to require stack maps in the
garbage collector.
Derived from CL 142000044 by khr.
LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/143890044
|
|
Just go ahead and do it; if something is wrong we'll throw.
Also rip out cc-generated arg ptr maps; they are useless now.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/133690045
|
|
Commit to stack copying for stack growth.
We're carrying around a surprising amount of cruft from older schemes.
I am confident that precise stack scans and stack copying are here to stay.
Delete fallback code for when precise stack info is disabled.
Delete fallback code for when copying stacks is disabled.
Delete fallback code for when StackCopyAlways is disabled.
Delete Stktop chain - there is only one stack segment now.
Delete M.moreargp, M.moreargsize, M.moreframesize, M.cret.
Delete G.writenbuf (unrelated, just dead).
Delete runtime.lessstack, runtime.oldstack.
Delete many amd64 morestack variants.
Delete initialization of morestack frame/arg sizes (shortens split prologue!).
Replace G's stackguard/stackbase/stack0/stacksize/
syscallstack/syscallguard/forkstackguard with simple stack
bounds (lo, hi).
Update liblink, runtime/cgo for adjustments to G.
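A minimal sketch of what "simple stack bounds (lo, hi)" can look like follows. The method names and the choice of a half-open range [lo, hi) are assumptions made for this example rather than something stated in the message above.

```go
package main

import "fmt"

// stack describes one contiguous stack as the address range [lo, hi).
// Treating hi as exclusive is an assumption of this sketch.
type stack struct {
	lo uintptr
	hi uintptr
}

// size returns the number of bytes in the stack.
func (s stack) size() uintptr {
	return s.hi - s.lo
}

// contains reports whether address p lies within the stack.
func (s stack) contains(p uintptr) bool {
	return s.lo <= p && p < s.hi
}

func main() {
	s := stack{lo: 0x1000, hi: 0x3000}
	fmt.Println(s.size())           // 8192
	fmt.Println(s.contains(0x2ff8)) // true
	fmt.Println(s.contains(0x3000)) // false: hi is one past the end
}
```

With a single segment per goroutine, two words are enough to answer the questions that previously needed several separate fields.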
LGTM=khr
R=khr, bradfitz
CC=golang-codereviews, iant, r
https://golang.org/cl/137410043
|
|
It already is updating parts of them; we're just getting lucky
retraversing them and not finding much to do.
Change argp to a pointer so that it will be updated too.
Existing tests break if you apply the change to adjustpanics
without also updating the type of argp.
LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/139380043
|
|
It worked at CL 134660043 on the builders,
so I believe it will stick this time.
LGTM=bradfitz
R=khr, bradfitz
CC=golang-codereviews
https://golang.org/cl/141280043
|
|
windows/amd64 failure:
http://build.golang.org/log/1ded5e3ef4bd1226f976e3180772f87e6c918255
# ..\misc\cgo\testso
runtime: copystack: locals size info only for syscall.Syscall
fatal error: split stack not allowed
runtime stack:
runtime.throw(0xa64cc7)
	c:/go/src/runtime/panic.go:395 +0xad fp=0x6fde0 sp=0x6fdb0
runtime.newstack()
	c:/go/src/runtime/stack.c:1001 +0x750 fp=0x6ff20 sp=0x6fde0
runtime.morestack()
	c:/go/src/runtime/asm_amd64.s:306 +0x73 fp=0x6ff28 sp=0x6ff20
goroutine 1 [stack growth, locked to thread]:
runtime.freedefer(0xc0820ce120)
	c:/go/src/runtime/panic.go:162 fp=0xc08201b1a0 sp=0xc08201b198
runtime.deferreturn(0xa69420)
	c:/go/src/runtime/panic.go:211 +0xa8 fp=0xc08201b1e8 sp=0xc08201b1a0
runtime.cgocall_errno(0x498c00, 0xc08201b228, 0x0)
	c:/go/src/runtime/cgocall.go:134 +0x10e fp=0xc08201b210 sp=0xc08201b1e8
syscall.Syscall(0x7786b1d0, 0x2, 0xc0820c85b0, 0xc08201b2d8, 0x0, 0x0, 0x0, 0x0)
	c:/go/src/runtime/syscall_windows.c:74 +0x3c fp=0xc08201b260 sp=0xc08201b210
syscall.findFirstFile1(0xc0820c85b0, 0xc08201b2d8, 0x500000000000000, 0x0, 0x0)
	c:/go/src/syscall/zsyscall_windows.go:340 +0x76 fp=0xc08201b2b0 sp=0xc08201b260
syscall.FindFirstFile(0xc0820c85b0, 0xc08210c500, 0xc0820c85b0, 0x0, 0x0)
	c:/go/src/syscall/syscall_windows.go:907 +0x6a fp=0xc08201b530 sp=0xc08201b2b0
os.openDir(0xc0820b2e40, 0x33, 0x0, 0x0, 0x0)
	c:/go/src/os/file_windows.go:96 +0x110 fp=0xc08201b5e0 sp=0xc08201b530
os.OpenFile(0xc0820b2e40, 0x33, 0x0, 0x0, 0x41, 0x0, 0x0)
	c:/go/src/os/file_windows.go:143 +0x1e9 fp=0xc08201b650 sp=0xc08201b5e0
TBR=khr
CC=golang-codereviews
https://golang.org/cl/138230043
|
|
Let's see how close we are to this being ready.
Will roll back if it breaks any builds in non-trivial ways.
LGTM=r, khr
R=iant, khr, r
CC=golang-codereviews
https://golang.org/cl/138200043
|
|
This CL contains compiler+runtime changes that detect C code
running on Go (not g0, not gsignal) stacks, and it contains
corrections for what it detected.
The detection works by changing the C prologue to use a different
stack guard word in the G than Go prologue does. On the g0 and
gsignal stacks, that stack guard word is set to the usual
stack guard value. But on ordinary Go stacks, that stack
guard word is set to ^0, which will make any stack split
check fail. The C prologue then calls morestackc instead
of morestack, and morestackc aborts the program with
a message about running C code on a Go stack.
This check catches all C code running on the Go stack
except NOSPLIT code. The NOSPLIT code is allowed,
so the check is complete. Since it is a dynamic check,
the code must execute to be caught. But unlike the static
checks we've been using in cmd/ld, the dynamic check
works with function pointers and other indirect calls.
For example it caught sigpanic being pushed onto Go
stacks in the signal handlers.
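The detection trick can be modelled in a few lines. Everything named here is hypothetical (the g struct, the goGuard and cGuard fields, guardSize, newG); the sketch only shows why a guard word of ^0 makes every stack split check fail on an ordinary goroutine stack while leaving g0-style stacks unaffected.

```go
package main

import "fmt"

// guardSize is a hypothetical safety margin above the low end of a stack.
const guardSize = 512

// g is a toy goroutine with two guard words: one checked by Go function
// prologues and one checked by C function prologues.
type g struct {
	goGuard uintptr
	cGuard  uintptr
}

// newG sets up the guards for a stack whose low address is lo.
// On system stacks (g0, gsignal) C code is allowed, so the C guard is
// the usual value. On ordinary stacks it is ^0 so the check always fails.
func newG(lo uintptr, systemStack bool) *g {
	gp := &g{goGuard: lo + guardSize}
	if systemStack {
		gp.cGuard = gp.goGuard
	} else {
		gp.cGuard = ^uintptr(0)
	}
	return gp
}

// needsMoreStack mirrors the shape of a split-stack prologue check:
// stacks grow down, so the slow path is taken when sp is at or below
// the guard.
func needsMoreStack(sp, guard uintptr) bool {
	return sp <= guard
}

func main() {
	sp := uintptr(0x9000)
	user := newG(0x8000, false)
	g0 := newG(0x8000, true)

	fmt.Println(needsMoreStack(sp, user.goGuard)) // false: Go code has room
	fmt.Println(needsMoreStack(sp, user.cGuard))  // true: C code always hits the slow path
	fmt.Println(needsMoreStack(sp, g0.cGuard))    // false: C code is fine on g0
}
```

In this model, the slow path taken by the C prologue would be the place to abort with a message, which corresponds to the role the text assigns to morestackc.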
Fixes #8667.
LGTM=khr, iant
R=golang-codereviews, khr, iant
CC=golang-codereviews, r
https://golang.org/cl/133700043
|
|
Preparation was in CL 134570043.
This CL contains only the effect of 'hg mv src/pkg/* src'.
For more about the move, see golang.org/s/go14nopkg.
|
|
mark and sweep, stop the world garbage collector
(intermediate step on the way to ref counting).
can run pretty with an explicit gc after each file.
R=r
DELTA=502 (346 added, 143 deleted, 13 changed)
OCL=20630
CL=20635
|
|
run oldstack on g0's stack, just like newstack does,
so that oldstack can free the old stack.
R=r
DELTA=53 (44 added, 0 deleted, 9 changed)
OCL=20404
CL=20433
|