aboutsummaryrefslogtreecommitdiff
path: root/src/runtime/traceback.go
AgeCommit message (Collapse)Author
2018-02-13runtime: avoid bad unwinding from sigpanic in C codeAustin Clements
Currently, if a sigpanic call is injected into C code, it's possible for preparePanic to leave the stack in a state where traceback can't unwind correctly past the sigpanic. Specifically, shouldPushPanic sniffs the stack to decide where to put the PC from the signal context. In the cgo case, it will find that !findfunc(pc).valid() because pc is in C code, and then it will check if the top of the stack looks like a Go PC. However, this stack slot is just in a C frame, so it could be uninitialized and contain anything, including what looks like a valid Go PC. For example, in https://build.golang.org/log/c601a18e2af24794e6c0899e05dddbb08caefc17, it sees 1c02c23a <runtime.newproc1+682>. When this condition is met, it skips putting the signal PC on the stack at all. As a result, when we later unwind from the sigpanic, we'll "successfully" but incorrectly unwind to whatever PC was in this uninitialized slot and go who knows where from there. Fix this by making shouldPushPanic assume that the signal PC is always usable if we're running C code, so we always make it appear like sigpanic's caller. This lets us be pickier again about unexpected return PCs in gentraceback. Updates #23640. Change-Id: I1e8ade24b031bd905d48e92d5e60c982e8edf160 Reviewed-on: https://go-review.googlesource.com/91137 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-01-31runtime: suppress "unexpected return pc" any time we're in cgoAustin Clements
Currently, gentraceback suppresses the "unexpected return pc" error for sigpanic's caller if the M was running C code. However, there are various situations where a sigpanic is injected into C code that can cause traceback to unwind *past* the sigpanic before realizing that it's in trouble (the traceback beyond the sigpanic will be wrong). Rather than try to fix these issues for Go 1.10, this CL simply disables complaining about unexpected return PCs if we're in cgo regardless of whether or not they're from the sigpanic frame. Go 1.9 never complained about unexpected return PCs when printing, so this is simply a step closer to the old behavior. This should fix the openbsd-386 failures on the dashboard, though this issue could affect any architecture. Fixes #23640. Change-Id: I8c32c1ee86a70d2f280661ed1f8caf82549e324b Reviewed-on: https://go-review.googlesource.com/91136 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-01-31runtime: fail silently if we unwind over sigpanic into C codeAustin Clements
If we're running C code and the code panics, the runtime will inject a call to sigpanic into the C code just like it would into Go code. However, the return PC from this sigpanic will be in C code. We used to silently abort the traceback if we didn't recognize a return PC, so this went by quietly. Now we're much louder because in general this is a bad thing. However, in this one particular case, it's fine, so if we're in cgo and are looking at the return PC of sigpanic, silence the debug output. Fixes #23576. Change-Id: I03d0c14d4e4d25b29b1f5804f5e9ccc4f742f876 Reviewed-on: https://go-review.googlesource.com/90896 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-01-31runtime: don't unwind past asmcgocallAustin Clements
asmcgocall switches to the system stack and aligns the SP, so gentraceback both can't unwind over it when it appears on the system stack (it'll read some uninitialized stack slot as the return PC). There's also no point in unwinding over it, so don't. Updates #23576. Change-Id: Idfcc9599c7636b80dec5451cb65ae892b4611981 Reviewed-on: https://go-review.googlesource.com/90895 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-01-22runtime: print hexdump on traceback failureAustin Clements
Currently, if anything goes wrong when printing a traceback, we simply cut off the traceback without any further diagnostics. Unfortunately, right now, we have a few issues that are difficult to debug because the traceback simply cuts off (#21431, #23484). This is an attempt to improve the debuggability of traceback failure by printing a diagnostic message plus a hex dump around the failed traceback frame when something goes wrong. The failures look like: goroutine 5 [running]: runtime: unexpected return pc for main.badLR2 called from 0xbad stack: frame={sp:0xc42004dfa8, fp:0xc42004dfc8} stack=[0xc42004d800,0xc42004e000) 000000c42004dea8: 0000000000000001 0000000000000001 000000c42004deb8: 000000c42004ded8 000000c42004ded8 000000c42004dec8: 0000000000427eea <runtime.dopanic+74> 000000c42004ded8 000000c42004ded8: 000000000044df70 <runtime.dopanic.func1+0> 000000c420001080 000000c42004dee8: 0000000000427b21 <runtime.gopanic+961> 000000c42004df08 000000c42004def8: 000000c42004df98 0000000000427b21 <runtime.gopanic+961> 000000c42004df08: 0000000000000000 0000000000000000 000000c42004df18: 0000000000000000 0000000000000000 000000c42004df28: 0000000000000000 0000000000000000 000000c42004df38: 0000000000000000 000000c420001080 000000c42004df48: 0000000000000000 0000000000000000 000000c42004df58: 0000000000000000 0000000000000000 000000c42004df68: 000000c4200010a0 0000000000000000 000000c42004df78: 00000000004c6400 00000000005031d0 000000c42004df88: 0000000000000000 0000000000000000 000000c42004df98: 000000c42004dfb8 00000000004ae7d9 <main.badLR2+73> 000000c42004dfa8: <00000000004c6400 00000000005031d0 000000c42004dfb8: 000000c42004dfd0 !0000000000000bad 000000c42004dfc8: >0000000000000000 0000000000000000 000000c42004dfd8: 0000000000451821 <runtime.goexit+1> 0000000000000000 000000c42004dfe8: 0000000000000000 0000000000000000 000000c42004dff8: 0000000000000000 main.badLR2(0x0) /go/src/runtime/testdata/testprog/badtraceback.go:42 +0x49 For #21431, #23484. Change-Id: I8718fc76ced81adb0b4b0b4f2293f3219ca80786 Reviewed-on: https://go-review.googlesource.com/89016 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-11-13runtime: don't elide wrapper functions that call panic or at TOSAustin Clements
CL 45412 started hiding autogenerated wrapper functions from call stacks so that call stack semantics better matched language semantics. This is based on the theory that the wrapper function will call the "real" function and all the programmer knows about is the real function. However, this theory breaks down in two cases: 1. If the wrapper is at the top of the stack, then it didn't call anything. This can happen, for example, if the "stack" was actually synthesized by the user. 2. If the wrapper panics, for example by calling panicwrap or by dereferencing a nil pointer, then it didn't call the wrapped function and the user needs to see what panicked, even if we can't attribute it nicely. This commit modifies the traceback logic to include the wrapper function in both of these cases. Fixes #22231. Change-Id: I6e4339a652f73038bd8331884320f0b8edd86eb1 Reviewed-on: https://go-review.googlesource.com/76770 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-09-22runtime: hide <autogenerated> methods from call stackAustin Clements
The compiler generates wrapper methods to forward interface method calls (which are always pointer-based) to value methods. These wrappers appear in the call stack even though they are an implementation detail. This leaves ugly "<autogenerated>" functions in stack traces and can throw off skip counts for stack traces. Fix this by considering these runtime frames in printed stack traces so they will only be printed if runtime frames are being printed, and by eliding them from the call stack expansion used by CallersFrames and Caller. This removes the test for issue 4388 since that was checking that "<autogenerated>" appeared in the stack trace instead of something even weirder. We replace it with various runtime package tests. Fixes #16723. Change-Id: Ice3f118c66f254bb71478a664d62ab3fc7125819 Reviewed-on: https://go-review.googlesource.com/45412 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-22runtime: remove getcallerpc argumentAustin Clements
Now that getcallerpc is a compiler intrinsic on x86 and non-x86 platforms don't need the argument, we can drop it. Sadly, this doesn't let us remove any dummy arguments since all of those cases also use getcallersp, which still takes the argument pointer, but this is at least an improvement. Change-Id: I9c34a41cf2c18cba57f59938390bf9491efb22d2 Reviewed-on: https://go-review.googlesource.com/65474 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-09-15runtime: change lockedg/lockedm to guintptr/muintptrIan Lance Taylor
This change has no real effect in itself. This is to prepare for a followup change that will call lockOSThread during a cgo callback when there is no p assigned, and therefore when lockOSThread can not use a write barrier. Change-Id: Ia122d41acf54191864bcb68f393f2ed3b2f87abc Reviewed-on: https://go-review.googlesource.com/63630 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Crawshaw <crawshaw@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2017-06-09runtime: print pc with fp/sp in tracebackAustin Clements
If we're in a situation where printing the fp and sp in the traceback is useful, it's almost certainly also useful to print the PC. Change-Id: Ie48a0d5de8a54b5b90ab1d18638a897958e48f70 Reviewed-on: https://go-review.googlesource.com/45210 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>
2017-03-29runtime: include inlined calls in result of CallersFramesDavid Lazar
Change-Id: If1a3396175f2afa607d56efd1444181334a9ae3e Reviewed-on: https://go-review.googlesource.com/37862 Reviewed-by: Austin Clements <austin@google.com>
2017-03-29runtime: handle inlined calls in runtime.CallersDavid Lazar
The `skip` argument passed to runtime.Caller and runtime.Callers should be interpreted as the number of logical calls to skip (rather than the number of physical stack frames to skip). This changes runtime.Callers to skip inlined calls in addition to physical stack frames. The result value of runtime.Callers is a slice of program counters ([]uintptr) representing physical stack frames. If the `skip` parameter to runtime.Callers skips part-way into a physical frame, there is no convenient way to encode that in the resulting slice. To avoid changing the API in an incompatible way, our solution is to store the number of skipped logical calls of the first frame in the _second_ uintptr returned by runtime.Callers. Since this number is a small integer, we encode it as a valid PC value into a small symbol called: runtime.skipPleaseUseCallersFrames For example, if f() calls g(), g() calls `runtime.Callers(2, pcs)`, and g() is inlined into f, then the frame for f will be partially skipped, resulting in the following slice: pcs = []uintptr{pc_in_f, runtime.skipPleaseUseCallersFrames+1, ...} We store the skip PC in pcs[1] instead of pcs[0] so that `pcs[i:]` will truncate the captured stack trace rather than grow it for all i. Updates #19348. Change-Id: I1c56f89ac48c29e6f52a5d085567c6d77d499cf1 Reviewed-on: https://go-review.googlesource.com/37854 Run-TryBot: David Lazar <lazard@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2017-03-06runtime: avoid repeated findmoduledatap callsAustin Clements
Currently almost every function that deals with a *_func has to first look up the *moduledata for the module containing the function's entry point. This means we almost always do at least two identical module lookups whenever we deal with a *_func (one to get the *_func and another to get something from its module data) and sometimes several more. Fix this by making findfunc return a new funcInfo type that embeds *_func, but also includes the *moduledata, and making all of the functions that currently take a *_func instead take a funcInfo and use the already-found *moduledata. This transformation is trivial for the most part, since the *_func type is usually inferred. The annoying part is that we can no longer use nil to indicate failure, so this introduces a funcInfo.valid() method and replaces nil checks with calls to valid. Change-Id: I9b8075ef1c31185c1943596d96dec45c7ab5100f Reviewed-on: https://go-review.googlesource.com/37331 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
2017-03-03runtime: use inlining tables to generate accurate tracebacksDavid Lazar
The code in https://play.golang.org/p/aYQPrTtzoK now produces the following stack trace: goroutine 1 [running]: main.(*point).negate(...) /tmp/go/main.go:8 main.main() /tmp/go/main.go:14 +0x23 Previously the stack trace missed the inlined call: goroutine 1 [running]: main.main() /tmp/go/main.go:14 +0x23 Fixes #10152. Updates #19348. Change-Id: Ib43c67012f53da0ef1a1e69bcafb65b57d9cecb2 Reviewed-on: https://go-review.googlesource.com/37233 Run-TryBot: David Lazar <lazard@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2017-02-14runtime: remove stack barriersAustin Clements
Now that we don't rescan stacks, stack barriers are unnecessary. This removes all of the code and structures supporting them as well as tests that were specifically for stack barriers. Updates #17503. Change-Id: Ia29221730e0f2bbe7beab4fa757f31a032d9690c Reviewed-on: https://go-review.googlesource.com/36620 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-12-19runtime: clean up and improve reflect.methodValue commentsAustin Clements
The runtime no longer hard-codes the offset of reflect.methodValue.stack, so remove these obsolete comments. Also, reflect.methodValue and runtime.reflectMethodValue must also agree with reflect.makeFuncImpl, so update the comments on all three to mention this. This was pointed out by Minux on CL 31138. Change-Id: Ic5ed1beffb65db76aca2977958da35de902e8e58 Reviewed-on: https://go-review.googlesource.com/34590 Reviewed-by: Keith Randall <khr@golang.org>
2016-11-22runtime: do not print runtime panic frame at top of user stackRuss Cox
The expected default behavior (no explicit GOTRACEBACK setting) is for the stack trace to start in user code, eliding unnecessary runtime frames that led up to the actual trace printing code. The idea was that the first line number printed was the one that crashed. For #5832 we added code to show 'panic' frames so that if code panics and then starts running defers and then we trace from there, the panic frame can help explain why the code seems to have made a call not present in the code. But that's only needed for panics between two different call frames, not the panic at the very top of the stack trace. Fix the fix to again elide the runtime code at the very top of the stack trace. Simple panic: package main func main() { var x []int println(x[1]) } Before this CL: panic: runtime error: index out of range goroutine 1 [running]: panic(0x1056980, 0x1091bf0) /Users/rsc/go/src/runtime/panic.go:531 +0x1cf main.main() /tmp/x.go:5 +0x5 After this CL: panic: runtime error: index out of range goroutine 1 [running]: main.main() /tmp/x.go:5 +0x5 Panic inside defer triggered by panic: package main func main() { var x []int defer func() { println(x[1]) }() println(x[2]) } Before this CL: panic: runtime error: index out of range panic: runtime error: index out of range goroutine 1 [running]: panic(0x1056aa0, 0x1091bf0) /Users/rsc/go/src/runtime/panic.go:531 +0x1cf main.main.func1(0x0, 0x0, 0x0) /tmp/y.go:6 +0x62 panic(0x1056aa0, 0x1091bf0) /Users/rsc/go/src/runtime/panic.go:489 +0x2cf main.main() /tmp/y.go:8 +0x59 The middle panic is important: it explains why main.main ended up calling main.main.func1 on a line that looks like a call to println. The top panic is noise. After this CL: panic: runtime error: index out of range panic: runtime error: index out of range goroutine 1 [running]: main.main.func1(0x0, 0x0, 0x0) /tmp/y.go:6 +0x62 panic(0x1056ac0, 0x1091bf0) /Users/rsc/go/src/runtime/panic.go:489 +0x2cf main.main() /tmp/y.go:8 +0x59 Fixes #17901. Change-Id: Id6d7c76373f7a658a537a39ca32b7dc23e1e76aa Reviewed-on: https://go-review.googlesource.com/33165 Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-17runtime: fix getArgInfo for deferred reflection callsAustin Clements
getArgInfo for reflect.makeFuncStub and reflect.methodValueCall is necessarily special. These have dynamically determined argument maps that are stored in their context (that is, their *funcval). These functions are written to store this context at 0(SP) when called, and getArgInfo retrieves it from there. This technique works if getArgInfo is passed an active call frame for one of these functions. However, getArgInfo is also used in tracebackdefers, where the "call" is not a true call with an active stack frame, but a deferred call. In this situation, getArgInfo currently crashes because tracebackdefers passes a frame with sp set to 0. However, the entire approach used by getArgInfo is flawed in this situation because the wrapper has not actually executed, and hence hasn't saved this metadata to any stack frame. In the defer case, we know the *funcval from the _defer itself, so we can fix this by teaching getArgInfo to use the *funcval context directly when its available, and otherwise get it from the active call frame. While we're here, this commit simplifies getArgInfo a bit by making it play more nicely with the type system. Rather than decoding the *reflect.methodValue that is the wrapper's context as a *[2]uintptr, just write out a copy of the reflect.methodValue type in the runtime. Fixes #16331. Fixes #17471. Change-Id: I81db4d985179b4a81c68c490cceeccbfc675456a Reviewed-on: https://go-review.googlesource.com/31138 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-08-23runtime: add msan calls before calling traceback functionsIan Lance Taylor
Tell msan that the arguments to the traceback functions are initialized, in case the traceback functions are compiled with -fsanitize=memory. Change-Id: I3ab0816604906c6cd7086245e6ae2e7fa62fe354 Reviewed-on: https://go-review.googlesource.com/24856 Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-06-02runtime: only permit SetCgoTraceback to be called onceIan Lance Taylor
Accept a duplicate call, but nothing else. Change-Id: Iec24bf5ddc3b0f0c559ad2158339aca698601743 Reviewed-on: https://go-review.googlesource.com/23692 Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-06-02runtime: fix typo in commentDmitry Vyukov
Change-Id: I82e35770b45ccd1433dfae0af423073c312c0859 Reviewed-on: https://go-review.googlesource.com/23680 Reviewed-by: Andrew Gerrand <adg@golang.org>
2016-05-31runtime: pass signal context to cgo traceback functionIan Lance Taylor
When doing a backtrace from a signal that occurs in C code compiled without using -fasynchronous-unwind-tables, we have to rely on frame pointers. In order to do that, the traceback function needs the signal context to reliably pick up the frame pointer. Change-Id: I7b45930fced01685c337d108e0f146057928f876 Reviewed-on: https://go-review.googlesource.com/23494 Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-24runtime: update SP when jumping stacks in tracebackAustin Clements
When gentraceback starts on a system stack in sigprof, it is configured to jump to the user stack when it reaches the end of the system stack. Currently this updates the current frame's FP, but not its SP. This is okay on non-LR machines (x86) because frame.sp is only used to find defers, which the bottom-most frame of the user stack will never have. However, on LR machines, we use frame.sp to find the saved LR. We then use to resolve the function of the next frame, which is used to resolved the size of the next frame. Since we're not updating frame.sp on a stack jump, we read the saved LR from the system stack instead of the user stack and wind up resolving the wrong function and hence the wrong frame size for the next frame. This has had remarkably few ill effects (though the resulting profiles must be wrong). We noticed it because of a bad interaction with stack barriers. Specifically, once we get the next frame size wrong, we also get the location of its LR wrong. If we happen to get a stack slot that contains a stale stack barrier LR (for a stack barrier we already hit) and hasn't been overwritten with something else as we re-grew the stack, gentraceback will fail with a "found next stack barrier at ..." error, pointing at the slot that it thinks is an LR, but isn't. Fixes #15138. Updates #15313 (might fix it). Change-Id: I13cfa322b44c0c2f23ac2b3d03e12631e4a6406b Reviewed-on: https://go-review.googlesource.com/23291 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-05-18runtime: print PC, not the counter, for a cgo tracebackIan Lance Taylor
Change-Id: I54ed7a26a753afb2d6a72080e1f50ce9fba7c183 Reviewed-on: https://go-review.googlesource.com/23228 Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-06runtime: stop traceback at foreign functionRuss Cox
This can only happen when profiling and there is foreign code at the top of the g0 stack but we're not in cgo. That in turn only happens with the race detector. Fixes #13568. Change-Id: I23775132c9c1a3a3aaae191b318539f368adf25e Reviewed-on: https://go-review.googlesource.com/18322 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-29cmd/cgo, runtime, runtime/cgo: use cgo context functionIan Lance Taylor
Add support for the context function set by runtime.SetCgoTraceback. The context function was added in CL 17761, without support. This CL is the support. This CL has not been tested for real C code, as a working context function for C code requires unwind support that does not seem to exist. I wanted to get the CL out before the freeze. I apologize for the length of this CL. It's mostly plumbing, but unfortunately the plumbing is processor-specific. Change-Id: I8ce11a0de9b3dafcc29efd2649d776e93bff0e90 Reviewed-on: https://go-review.googlesource.com/22508 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-27runtime: fix SetCgoTraceback doc indentationBrad Fitzpatrick
It wasn't rendering as HTML nicely. Change-Id: I5408ec22932a05e85c210c0faa434bd19dce5650 Reviewed-on: https://go-review.googlesource.com/22532 Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-04-01runtime: support symbolic backtrace of C code in a cgo crashIan Lance Taylor
The new function runtime.SetCgoTraceback may be used to register stack traceback and symbolizer functions, written in C, to do a stack traceback from cgo code. There is a sample implementation of runtime.SetCgoSymbolizer at github.com/ianlancetaylor/cgosymbolizer. Just importing that package is sufficient to get symbolic C backtraces. Currently only supported on linux/amd64. Change-Id: If96ee2eb41c6c7379d407b9561b87557bfe47341 Reviewed-on: https://go-review.googlesource.com/17761 Reviewed-by: Austin Clements <austin@google.com>
2016-03-07runtime: eliminate unnecessary type conversionsMatthew Dempsky
Automated refactoring produced using github.com/mdempsky/unconvert. Change-Id: Iacf871a4f221ef17f48999a464ab2858b2bbaa90 Reviewed-on: https://go-review.googlesource.com/20071 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-02all: single space after period.Brad Fitzpatrick
The tree's pretty inconsistent about single space vs double space after a period in documentation. Make it consistently a single space, per earlier decisions. This means contributors won't be confused by misleading precedence. This CL doesn't use go/doc to parse. It only addresses // comments. It was generated with: $ perl -i -npe 's,^(\s*// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.) +([A-Z])') $ go test go/doc -update Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7 Reviewed-on: https://go-review.googlesource.com/20022 Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Dave Day <djd@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-25runtime: eliminate unused _Genqueue stateAustin Clements
_Genqueue and _Gscanenqueue were introduced as part of the GC quiesce code. The quiesce code was removed by 197aa9e, but these states and some associated code stuck around. Remove them. Change-Id: I69df81881602d4a431556513dac2959668d27c20 Reviewed-on: https://go-review.googlesource.com/19638 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-16runtime: show panics in tracebackAustin Clements
We used to include panic calls in tracebacks; however, when runtime.panic was renamed to runtime.gopanic in the conversion of the runtime to Go, we missed the special case in showframe that includes panic calls even though they're in package runtime. Fix the function name check in showframe (and, while we're here, fix the other check for "runtime.panic" in runtime/pprof). Since the "runtime.gopanic" name doesn't match what users call panic and hence isn't very user-friendly, make traceback rewrite it to just "panic". Updates #5832, #13857. Fixes #14315. Change-Id: I8059621b41ec043e63d5cfb4cbee479f47f64973 Reviewed-on: https://go-review.googlesource.com/19492 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>
2015-12-28runtime: fix, simplify, and improve scan state in goroutine headerAustin Clements
Currently goroutineheader goes through some convolutions to *almost* print the scan state of a G. However, the code path that would print the scan state of the G refers to gStatusStrings where it almost certainly meant to refer to gScanStatusStrings (which is unused), so it winds up printing the regular status string without the scan state either way. Furthermore, if the G is in _Gwaiting, we override the status string and lose where this would indicate the scan state if it worked. This commit fixes this so the runtime prints the scan state. However, rather than using a parallel list of status strings, this simply adds a conditional print if the scan bit is set. This lets us remove the string list, prints the scan state even in _Gwaiting, and lets us strip off the scan bit at the beginning of the function, which simplifies the rest of it. Change-Id: Ic0adbe5c05abf4adda93da59f93b578172b28e3d Reviewed-on: https://go-review.googlesource.com/18092 Reviewed-by: Keith Randall <khr@golang.org>
2015-11-23runtime: improve stack barrier debuggingAustin Clements
This improves stack barrier debugging messages in various ways: 1) Rather than printing only the remaining stack barriers (of which there may be none, which isn't very useful), print all of the G's stack barriers with a marker at the position the stack itself has unwound to and a marker at the problematic stack barrier (where applicable). 2) Rather than crashing if we encounter a stack barrier when there are no more stkbar entries, print the same debug message we would if we had encountered a stack barrier at an unexpected location. Hopefully this will help with debugging #12528. Change-Id: I2e6fe6a778e0d36dd8ef30afd4c33d5d94731262 Reviewed-on: https://go-review.googlesource.com/17147 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-20runtime: fix new stack barrier checkRuss Cox
During a crash showing goroutine stacks of all threads (with GOTRACEBACK=crash), it can be that f == nil. Only happens on Solaris; not sure why. Change-Id: Iee2c394a0cf19fa0a24f6befbc70776b9e42d25a Reviewed-on: https://go-review.googlesource.com/17110 Reviewed-by: Austin Clements <austin@google.com>
2015-11-19runtime: eliminate write barriers from gentracebackAustin Clements
gentraceback is used in many contexts where write barriers are disallowed. This currently works because the only write barrier is in assigning frame.argmap in setArgInfo and in practice frame is always on the stack, so this write barrier is a no-op. However, we can easily eliminate this write barrier, which will let us statically disallow write barriers (using go:nowritebarrierrec annotations) in many more situations. As a bonus, this makes the code a little more idiomatic. Updates #10600. Change-Id: I45ba5cece83697ff79f8537ee6e43eadf1c18c6d Reviewed-on: https://go-review.googlesource.com/17003 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-11-19runtime: handle sigprof in stackBarrierAustin Clements
Currently, if a profiling signal happens in the middle of stackBarrier, gentraceback may see inconsistencies between stkbar and the barriers on the stack and it will certainly get the wrong return PC for stackBarrier. In most cases, the return PC won't be a PC at all and this will immediately abort the traceback (which is considered okay for a sigprof), but if it happens to be a valid PC this may sent gentraceback down a rabbit hole. Fix this by detecting when the gentraceback starts in stackBarrier and simulating the completion of the barrier to get the correct initial frame. Change-Id: Ib11f705ac9194925f63fe5dfbfc84013a38333e6 Reviewed-on: https://go-review.googlesource.com/17035 Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-12runtime: break out system-specific constants into package sysMichael Matloob
runtime/internal/sys will hold system-, architecture- and config- specific constants. Updates #11647 Change-Id: I6db29c312556087a42e8d2bdd9af40d157c56b54 Reviewed-on: https://go-review.googlesource.com/16817 Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-05runtime: remove background GC goroutine and mark barriersAustin Clements
These are now unused. Updates #11970. Change-Id: I43e5c4e5bcda9581bacc63364f96bb4855ab779f Reviewed-on: https://go-review.googlesource.com/16393 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-30runtime: introduce GOTRACEBACK=single, now the defaultRuss Cox
Abandon (but still support) the old numbering system. GOTRACEBACK=none is old 0 GOTRACEBACK=single is the new behavior GOTRACEBACK=all is old 1 GOTRACEBACK=system is old 2 GOTRACEBACK=crash is unchanged See doc comment change in runtime1.go for details. Filed #13107 to decide whether to change default back to GOTRACEBACK=all for Go 1.6 release. If you run into programs where printing only the current goroutine omits needed information, please add details in a comment on that issue. Fixes #12366. Change-Id: I82ca8b99b5d86dceb3f7102d38d2659d45dbe0db Reviewed-on: https://go-review.googlesource.com/16512 Reviewed-by: Austin Clements <austin@google.com>
2015-10-22runtime: add pcvalue cache to improve stack scan speedAustin Clements
The cost of scanning large stacks is currently dominated by the time spent looking up and decoding the pcvalue table. However, large stacks are usually large not because they contain calls to many different functions, but because they contain many calls to the same, small set of recursive functions. Hence, walking large stacks tends to make the same pcvalue queries many times. Based on this observation, this commit adds a small, very simple, and fast cache in front of pcvalue lookup. We thread this cache down from operations that make many pcvalue calls, such as gentraceback, stack scanning, and stack adjusting. This simple cache works well because it has minimal overhead when it's not effective. I also tried a hashed direct-map cache, CLOCK-based replacement, round-robin replacement, and round-robin with lookups disabled until there had been at least 16 probes, but none of these approaches had obvious wins over the random replacement policy in this commit. This nearly doubles the overall performance of the deep stack test program from issue #10898: name old time/op new time/op delta Issue10898 16.5s ±12% 9.2s ±12% -44.37% (p=0.008 n=5+5) It's a very slight win on the garbage benchmark: name old time/op new time/op delta XBenchGarbage-12 4.92ms ± 1% 4.89ms ± 1% -0.75% (p=0.000 n=18+19) It's a wash (but doesn't harm performance) on the go1 benchmarks, which don't have particularly deep stacks: name old time/op new time/op delta BinaryTree17-12 3.11s ± 2% 3.20s ± 3% +2.83% (p=0.000 n=17+20) Fannkuch11-12 2.51s ± 1% 2.51s ± 1% -0.22% (p=0.034 n=19+18) FmtFprintfEmpty-12 50.8ns ± 3% 50.6ns ± 2% ~ (p=0.793 n=20+20) FmtFprintfString-12 174ns ± 0% 174ns ± 1% +0.17% (p=0.048 n=15+20) FmtFprintfInt-12 177ns ± 0% 165ns ± 1% -6.99% (p=0.000 n=17+19) FmtFprintfIntInt-12 283ns ± 1% 284ns ± 0% +0.22% (p=0.000 n=18+15) FmtFprintfPrefixedInt-12 243ns ± 1% 244ns ± 1% +0.40% (p=0.000 n=20+19) FmtFprintfFloat-12 318ns ± 0% 319ns ± 0% +0.27% (p=0.001 n=19+20) FmtManyArgs-12 1.12µs ± 0% 1.14µs ± 0% +1.74% (p=0.000 n=19+20) GobDecode-12 8.69ms ± 0% 8.73ms ± 1% +0.46% (p=0.000 n=18+18) GobEncode-12 6.64ms ± 1% 6.61ms ± 1% -0.46% (p=0.000 n=20+20) Gzip-12 323ms ± 2% 319ms ± 1% -1.11% (p=0.000 n=20+20) Gunzip-12 42.8ms ± 0% 42.9ms ± 0% ~ (p=0.158 n=18+20) HTTPClientServer-12 63.3µs ± 1% 63.1µs ± 1% -0.35% (p=0.011 n=20+20) JSONEncode-12 16.9ms ± 1% 17.3ms ± 1% +2.84% (p=0.000 n=19+20) JSONDecode-12 59.7ms ± 0% 58.5ms ± 0% -2.05% (p=0.000 n=19+17) Mandelbrot200-12 3.92ms ± 0% 3.91ms ± 0% -0.16% (p=0.003 n=19+19) GoParse-12 3.79ms ± 2% 3.75ms ± 2% -0.91% (p=0.005 n=20+20) RegexpMatchEasy0_32-12 102ns ± 1% 101ns ± 1% -0.80% (p=0.001 n=14+20) RegexpMatchEasy0_1K-12 337ns ± 1% 346ns ± 1% +2.90% (p=0.000 n=20+19) RegexpMatchEasy1_32-12 84.4ns ± 2% 84.3ns ± 2% ~ (p=0.743 n=20+20) RegexpMatchEasy1_1K-12 502ns ± 1% 505ns ± 0% +0.64% (p=0.000 n=20+20) RegexpMatchMedium_32-12 133ns ± 1% 132ns ± 1% -0.85% (p=0.000 n=20+19) RegexpMatchMedium_1K-12 40.1µs ± 1% 39.8µs ± 1% -0.77% (p=0.000 n=18+18) RegexpMatchHard_32-12 2.08µs ± 1% 2.07µs ± 1% -0.55% (p=0.001 n=18+19) RegexpMatchHard_1K-12 62.4µs ± 1% 62.0µs ± 1% -0.74% (p=0.000 n=19+19) Revcomp-12 545ms ± 2% 545ms ± 3% ~ (p=0.771 n=19+20) Template-12 73.7ms ± 1% 72.0ms ± 0% -2.33% (p=0.000 n=20+18) TimeParse-12 358ns ± 1% 351ns ± 1% -2.07% (p=0.000 n=20+20) TimeFormat-12 369ns ± 1% 356ns ± 0% -3.53% (p=0.000 n=20+18) [Geo mean] 63.5µs 63.2µs -0.41% name old speed new speed delta GobDecode-12 88.3MB/s ± 0% 87.9MB/s ± 0% -0.43% (p=0.000 n=18+17) GobEncode-12 116MB/s ± 1% 116MB/s ± 1% +0.47% (p=0.000 n=20+20) Gzip-12 60.2MB/s ± 2% 60.8MB/s ± 1% +1.13% (p=0.000 n=20+20) Gunzip-12 453MB/s ± 0% 453MB/s ± 0% ~ (p=0.160 n=18+20) JSONEncode-12 115MB/s ± 1% 112MB/s ± 1% -2.76% (p=0.000 n=19+20) JSONDecode-12 32.5MB/s ± 0% 33.2MB/s ± 0% +2.09% (p=0.000 n=19+17) GoParse-12 15.3MB/s ± 2% 15.4MB/s ± 2% +0.92% (p=0.004 n=20+20) RegexpMatchEasy0_32-12 311MB/s ± 1% 314MB/s ± 1% +0.78% (p=0.000 n=15+19) RegexpMatchEasy0_1K-12 3.04GB/s ± 1% 2.95GB/s ± 1% -2.90% (p=0.000 n=19+19) RegexpMatchEasy1_32-12 379MB/s ± 2% 380MB/s ± 2% ~ (p=0.779 n=20+20) RegexpMatchEasy1_1K-12 2.04GB/s ± 1% 2.02GB/s ± 0% -0.62% (p=0.000 n=20+20) RegexpMatchMedium_32-12 7.46MB/s ± 1% 7.53MB/s ± 1% +0.86% (p=0.000 n=20+19) RegexpMatchMedium_1K-12 25.5MB/s ± 1% 25.7MB/s ± 1% +0.78% (p=0.000 n=18+18) RegexpMatchHard_32-12 15.4MB/s ± 1% 15.5MB/s ± 1% +0.62% (p=0.000 n=19+19) RegexpMatchHard_1K-12 16.4MB/s ± 1% 16.5MB/s ± 1% +0.82% (p=0.000 n=20+19) Revcomp-12 466MB/s ± 2% 466MB/s ± 3% ~ (p=0.765 n=19+20) Template-12 26.3MB/s ± 1% 27.0MB/s ± 0% +2.38% (p=0.000 n=20+18) [Geo mean] 97.8MB/s 98.0MB/s +0.23% Change-Id: I281044ae0b24990ba46487cacbc1069493274bc4 Reviewed-on: https://go-review.googlesource.com/13614 Reviewed-by: Keith Randall <khr@golang.org>
2015-10-18runtime: add a constant for the smallest possible stack frameMichael Hudson-Doyle
Shared libraries on ppc64le will require a larger minimum stack frame (because the ABI mandates that the TOC pointer is available at 24(R1)). So to prepare for this, make a constant for the fixed part of a stack and use that where necessary. Change-Id: I447949f4d725003bb82e7d2cf7991c1bca5aa887 Reviewed-on: https://go-review.googlesource.com/15523 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-08-30runtime: don't install a stack barrier in cgocallback_gofunc's frameAustin Clements
Currently the runtime can install stack barriers in any frame. However, the frame of cgocallback_gofunc is special: it's the one function that switches from a regular G stack to the system stack on return. Hence, the return PC slot in its frame on the G stack is actually used to save getg().sched.pc (so tracebacks appear to unwind to the last Go function running on that G), and not as an actual return PC for cgocallback_gofunc. Because of this, if we install a stack barrier in cgocallback_gofunc's return PC slot, when cgocallback_gofunc does return, it will move the stack barrier stub PC in to getg().sched.pc and switch back to the system stack. The rest of the runtime doesn't know how to deal with a stack barrier stub in sched.pc: nothing knows how to match it up with the G's stack barrier array and, when the runtime removes stack barriers, it doesn't know to undo the one in sched.pc. Hence, if the C code later returns back in to Go code, it will attempt to return through the stack barrier saved in sched.pc, which may no longer have correct unwinding information. Fix this by blacklisting cgocallback_gofunc's frame so the runtime won't install a stack barrier in it's return PC slot. Fixes #12238. Change-Id: I46aa2155df2fd050dd50de3434b62987dc4947b8 Reviewed-on: https://go-review.googlesource.com/13944 Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-28runtime: check explicitly for short unwinding of stacksRuss Cox
Right now we find out implicitly if stack barriers are in place, or defers. This change makes sure we find out about short unwinds always. Change-Id: Ibdde1ba9c79eb792660dcb7aa6f186e4e4d559b3 Reviewed-on: https://go-review.googlesource.com/13966 Reviewed-by: Austin Clements <austin@google.com>
2015-06-02runtime: implement GC stack barriersAustin Clements
This commit implements stack barriers to minimize the amount of stack re-scanning that must be done during mark termination. Currently the GC scans stacks of active goroutines twice during every GC cycle: once at the beginning during root discovery and once at the end during mark termination. The second scan happens while the world is stopped and guarantees that we've seen all of the roots (since there are no write barriers on writes to local stack variables). However, this means pause time is proportional to stack size. In particularly recursive programs, this can drive pause time up past our 10ms goal (e.g., it takes about 150ms to scan a 50MB heap). Re-scanning the entire stack is rarely necessary, especially for large stacks, because usually most of the frames on the stack were not active between the first and second scans and hence any changes to these frames (via non-escaping pointers passed down the stack) were tracked by write barriers. To efficiently track how far a stack has been unwound since the first scan (and, hence, how much needs to be re-scanned), this commit introduces stack barriers. During the first scan, at exponentially spaced points in each stack, the scan overwrites return PCs with the PC of the stack barrier function. When "returned" to, the stack barrier function records how far the stack has unwound and jumps to the original return PC for that point in the stack. Then the second scan only needs to proceed as far as the lowest barrier that hasn't been hit. For deeply recursive programs, this substantially reduces mark termination time (and hence pause time). For the goscheme example linked in issue #10898, prior to this change, mark termination times were typically between 100 and 500ms; with this change, mark termination times are typically between 10 and 20ms. As a result of the reduced stack scanning work, this reduces overall execution time of the goscheme example by 20%. Fixes #10898. The effect of this on programs that are not deeply recursive is minimal: name old time/op new time/op delta BinaryTree17 3.16s ± 2% 3.26s ± 1% +3.31% (p=0.000 n=19+19) Fannkuch11 2.42s ± 1% 2.48s ± 1% +2.24% (p=0.000 n=17+19) FmtFprintfEmpty 50.0ns ± 3% 49.8ns ± 1% ~ (p=0.534 n=20+19) FmtFprintfString 173ns ± 0% 175ns ± 0% +1.49% (p=0.000 n=16+19) FmtFprintfInt 170ns ± 1% 175ns ± 1% +2.97% (p=0.000 n=20+19) FmtFprintfIntInt 288ns ± 0% 295ns ± 0% +2.73% (p=0.000 n=16+19) FmtFprintfPrefixedInt 242ns ± 1% 252ns ± 1% +4.13% (p=0.000 n=18+18) FmtFprintfFloat 324ns ± 0% 323ns ± 0% -0.36% (p=0.000 n=20+19) FmtManyArgs 1.14µs ± 0% 1.12µs ± 1% -1.01% (p=0.000 n=18+19) GobDecode 8.88ms ± 1% 8.87ms ± 0% ~ (p=0.480 n=19+18) GobEncode 6.80ms ± 1% 6.85ms ± 0% +0.82% (p=0.000 n=20+18) Gzip 363ms ± 1% 363ms ± 1% ~ (p=0.077 n=18+20) Gunzip 90.6ms ± 0% 90.0ms ± 1% -0.71% (p=0.000 n=17+18) HTTPClientServer 51.5µs ± 1% 50.8µs ± 1% -1.32% (p=0.000 n=18+18) JSONEncode 17.0ms ± 0% 17.1ms ± 0% +0.40% (p=0.000 n=18+17) JSONDecode 61.8ms ± 0% 63.8ms ± 1% +3.11% (p=0.000 n=18+17) Mandelbrot200 3.84ms ± 0% 3.84ms ± 1% ~ (p=0.583 n=19+19) GoParse 3.71ms ± 1% 3.72ms ± 1% ~ (p=0.159 n=18+19) RegexpMatchEasy0_32 100ns ± 0% 100ns ± 1% -0.19% (p=0.033 n=17+19) RegexpMatchEasy0_1K 342ns ± 1% 331ns ± 0% -3.41% (p=0.000 n=19+19) RegexpMatchEasy1_32 82.5ns ± 0% 81.7ns ± 0% -0.98% (p=0.000 n=18+18) RegexpMatchEasy1_1K 505ns ± 0% 494ns ± 1% -2.16% (p=0.000 n=18+18) RegexpMatchMedium_32 137ns ± 1% 137ns ± 1% -0.24% (p=0.048 n=20+18) RegexpMatchMedium_1K 41.6µs ± 0% 41.3µs ± 1% -0.57% (p=0.004 n=18+20) RegexpMatchHard_32 2.11µs ± 0% 2.11µs ± 1% +0.20% (p=0.037 n=17+19) RegexpMatchHard_1K 63.9µs ± 2% 63.3µs ± 0% -0.99% (p=0.000 n=20+17) Revcomp 560ms ± 1% 522ms ± 0% -6.87% (p=0.000 n=18+16) Template 75.0ms ± 0% 75.1ms ± 1% +0.18% (p=0.013 n=18+19) TimeParse 358ns ± 1% 364ns ± 0% +1.74% (p=0.000 n=20+15) TimeFormat 360ns ± 0% 372ns ± 0% +3.55% (p=0.000 n=20+18) Change-Id: If8a9bfae6c128d15a4f405e02bcfa50129df82a2 Reviewed-on: https://go-review.googlesource.com/10314 Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-05-21runtime: make runtime.callers walk calling G, not g0Austin Clements
Currently runtime.callers invokes gentraceback with the pc and sp of the G it is called from, but always passes g0 even if it was called from a regular g. Right now this has no ill effects because runtime.callers does not use either callback argument or the _TraceJumpStack flag, but it makes the code fragile and will break some upcoming changes. Fix this by lifting the getg() call outside of the systemstack in runtime.callers. Change-Id: I4e1e927961c0e0cd4dcf28693be47df7bae9e122 Reviewed-on: https://go-review.googlesource.com/10292 Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com> Reviewed-by: Rick Hudson <rlh@golang.org>
2015-05-16runtime: replace GC programs with simpler encoding, faster decoderRuss Cox
Small types record the location of pointers in their memory layout by using a simple bitmap. In Go 1.4 the bitmap held 4-bit entries, and in Go 1.5 the bitmap holds 1-bit entries, but in both cases using a bitmap for a large type containing arrays does not make sense: if someone refers to the type [1<<28]*byte in a program in such a way that the type information makes it into the binary, it would be a waste of space to write a 128 MB (for 4-bit entries) or even 32 MB (for 1-bit entries) bitmap full of 1s into the binary or even to keep one in memory during the execution of the program. For large types containing arrays, it is much more compact to describe the locations of pointers using a notation that can express repetition than to lay out a bitmap of pointers. Go 1.4 included such a notation, called ``GC programs'' but it was complex, required recursion during decoding, and was generally slow. Dmitriy measured the execution of these programs writing directly to the heap bitmap as being 7x slower than copying from a preunrolled 4-bit mask (and frankly that code was not terribly fast either). For some tests, unrollgcprog1 was seen costing as much as 3x more than the rest of malloc combined. This CL introduces a different form for the GC programs. They use a simple Lempel-Ziv-style encoding of the 1-bit pointer information, in which the only operations are (1) emit the following n bits and (2) repeat the last n bits c more times. This encoding can be generated directly from the Go type information (using repetition only for arrays or large runs of non-pointer data) and it can be decoded very efficiently. In particular the decoding requires little state and no recursion, so that the entire decoding can run without any memory accesses other than the reads of the encoding and the writes of the decoded form to the heap bitmap. For recursive types like arrays of arrays of arrays, the inner instructions are only executed once, not n times, so that large repetitions run at full speed. (In contrast, large repetitions in the old programs repeated the individual bit-level layout of the inner data over and over.) The result is as much as 25x faster decoding compared to the old form. Because the old decoder was so slow, Go 1.4 had three (or so) cases for how to set the heap bitmap bits for an allocation of a given type: (1) If the type had an even number of words up to 32 words, then the 4-bit pointer mask for the type fit in no more than 16 bytes; store the 4-bit pointer mask directly in the binary and copy from it. (1b) If the type had an odd number of words up to 15 words, then the 4-bit pointer mask for the type, doubled to end on a byte boundary, fit in no more than 16 bytes; store that doubled mask directly in the binary and copy from it. (2) If the type had an even number of words up to 128 words, or an odd number of words up to 63 words (again due to doubling), then the 4-bit pointer mask would fit in a 64-byte unrolled mask. Store a GC program in the binary, but leave space in the BSS for the unrolled mask. Execute the GC program to construct the mask the first time it is needed, and thereafter copy from the mask. (3) Otherwise, store a GC program and execute it to write directly to the heap bitmap each time an object of that type is allocated. (This is the case that was 7x slower than the other two.) Because the new pointer masks store 1-bit entries instead of 4-bit entries and because using the decoder no longer carries a significant overhead, after this CL (that is, for Go 1.5) there are only two cases: (1) If the type is 128 words or less (no condition about odd or even), store the 1-bit pointer mask directly in the binary and use it to initialize the heap bitmap during malloc. (Implemented in CL 9702.) (2) There is no case 2 anymore. (3) Otherwise, store a GC program and execute it to write directly to the heap bitmap each time an object of that type is allocated. Executing the GC program directly into the heap bitmap (case (3) above) was disabled for the Go 1.5 dev cycle, both to avoid needing to use GC programs for typedmemmove and to avoid updating that code as the heap bitmap format changed. Typedmemmove no longer uses this type information; as of CL 9886 it uses the heap bitmap directly. Now that the heap bitmap format is stable, we reintroduce GC programs and their space savings. Benchmarks for heapBitsSetType, before this CL vs this CL: name old mean new mean delta SetTypePtr 7.59ns × (0.99,1.02) 5.16ns × (1.00,1.00) -32.05% (p=0.000) SetTypePtr8 21.0ns × (0.98,1.05) 21.4ns × (1.00,1.00) ~ (p=0.179) SetTypePtr16 24.1ns × (0.99,1.01) 24.6ns × (1.00,1.00) +2.41% (p=0.001) SetTypePtr32 31.2ns × (0.99,1.01) 32.4ns × (0.99,1.02) +3.72% (p=0.001) SetTypePtr64 45.2ns × (1.00,1.00) 47.2ns × (1.00,1.00) +4.42% (p=0.000) SetTypePtr126 75.8ns × (0.99,1.01) 79.1ns × (1.00,1.00) +4.25% (p=0.000) SetTypePtr128 74.3ns × (0.99,1.01) 77.6ns × (1.00,1.01) +4.55% (p=0.000) SetTypePtrSlice 726ns × (1.00,1.01) 712ns × (1.00,1.00) -1.95% (p=0.001) SetTypeNode1 20.0ns × (0.99,1.01) 20.7ns × (1.00,1.00) +3.71% (p=0.000) SetTypeNode1Slice 112ns × (1.00,1.00) 113ns × (0.99,1.00) ~ (p=0.070) SetTypeNode8 23.9ns × (1.00,1.00) 24.7ns × (1.00,1.01) +3.18% (p=0.000) SetTypeNode8Slice 294ns × (0.99,1.02) 287ns × (0.99,1.01) -2.38% (p=0.015) SetTypeNode64 52.8ns × (0.99,1.03) 51.8ns × (0.99,1.01) ~ (p=0.069) SetTypeNode64Slice 1.13µs × (0.99,1.05) 1.14µs × (0.99,1.00) ~ (p=0.767) SetTypeNode64Dead 36.0ns × (1.00,1.01) 32.5ns × (0.99,1.00) -9.67% (p=0.000) SetTypeNode64DeadSlice 1.43µs × (0.99,1.01) 1.40µs × (1.00,1.00) -2.39% (p=0.001) SetTypeNode124 75.7ns × (1.00,1.01) 79.0ns × (1.00,1.00) +4.44% (p=0.000) SetTypeNode124Slice 1.94µs × (1.00,1.01) 2.04µs × (0.99,1.01) +4.98% (p=0.000) SetTypeNode126 75.4ns × (1.00,1.01) 77.7ns × (0.99,1.01) +3.11% (p=0.000) SetTypeNode126Slice 1.95µs × (0.99,1.01) 2.03µs × (1.00,1.00) +3.74% (p=0.000) SetTypeNode128 85.4ns × (0.99,1.01) 122.0ns × (1.00,1.00) +42.89% (p=0.000) SetTypeNode128Slice 2.20µs × (1.00,1.01) 2.36µs × (0.98,1.02) +7.48% (p=0.001) SetTypeNode130 83.3ns × (1.00,1.00) 123.0ns × (1.00,1.00) +47.61% (p=0.000) SetTypeNode130Slice 2.30µs × (0.99,1.01) 2.40µs × (0.98,1.01) +4.37% (p=0.000) SetTypeNode1024 498ns × (1.00,1.00) 537ns × (1.00,1.00) +7.96% (p=0.000) SetTypeNode1024Slice 15.5µs × (0.99,1.01) 17.8µs × (1.00,1.00) +15.27% (p=0.000) The above compares always using a cached pointer mask (and the corresponding waste of memory) against using the programs directly. Some slowdown is expected, in exchange for having a better general algorithm. The GC programs kick in for SetTypeNode128, SetTypeNode130, SetTypeNode1024, along with the slice variants of those. It is possible that the cutoff of 128 words (bits) should be raised in a followup CL, but even with this low cutoff the GC programs are faster than Go 1.4's "fast path" non-GC program case. Benchmarks for heapBitsSetType, Go 1.4 vs this CL: name old mean new mean delta SetTypePtr 6.89ns × (1.00,1.00) 5.17ns × (1.00,1.00) -25.02% (p=0.000) SetTypePtr8 25.8ns × (0.97,1.05) 21.5ns × (1.00,1.00) -16.70% (p=0.000) SetTypePtr16 39.8ns × (0.97,1.02) 24.7ns × (0.99,1.01) -37.81% (p=0.000) SetTypePtr32 68.8ns × (0.98,1.01) 32.2ns × (1.00,1.01) -53.18% (p=0.000) SetTypePtr64 130ns × (1.00,1.00) 47ns × (1.00,1.00) -63.67% (p=0.000) SetTypePtr126 241ns × (0.99,1.01) 79ns × (1.00,1.01) -67.25% (p=0.000) SetTypePtr128 2.07µs × (1.00,1.00) 0.08µs × (1.00,1.00) -96.27% (p=0.000) SetTypePtrSlice 1.05µs × (0.99,1.01) 0.72µs × (0.99,1.02) -31.70% (p=0.000) SetTypeNode1 16.0ns × (0.99,1.01) 20.8ns × (0.99,1.03) +29.91% (p=0.000) SetTypeNode1Slice 184ns × (0.99,1.01) 112ns × (0.99,1.01) -39.26% (p=0.000) SetTypeNode8 29.5ns × (0.97,1.02) 24.6ns × (1.00,1.00) -16.50% (p=0.000) SetTypeNode8Slice 624ns × (0.98,1.02) 285ns × (1.00,1.00) -54.31% (p=0.000) SetTypeNode64 135ns × (0.96,1.08) 52ns × (0.99,1.02) -61.32% (p=0.000) SetTypeNode64Slice 3.83µs × (1.00,1.00) 1.14µs × (0.99,1.01) -70.16% (p=0.000) SetTypeNode64Dead 134ns × (0.99,1.01) 32ns × (1.00,1.01) -75.74% (p=0.000) SetTypeNode64DeadSlice 3.83µs × (0.99,1.00) 1.40µs × (1.00,1.01) -63.42% (p=0.000) SetTypeNode124 240ns × (0.99,1.01) 79ns × (1.00,1.01) -67.05% (p=0.000) SetTypeNode124Slice 7.27µs × (1.00,1.00) 2.04µs × (1.00,1.00) -71.95% (p=0.000) SetTypeNode126 2.06µs × (0.99,1.01) 0.08µs × (0.99,1.01) -96.23% (p=0.000) SetTypeNode126Slice 64.4µs × (1.00,1.00) 2.0µs × (1.00,1.00) -96.85% (p=0.000) SetTypeNode128 2.09µs × (1.00,1.01) 0.12µs × (1.00,1.00) -94.15% (p=0.000) SetTypeNode128Slice 65.4µs × (1.00,1.00) 2.4µs × (0.99,1.03) -96.39% (p=0.000) SetTypeNode130 2.11µs × (1.00,1.00) 0.12µs × (1.00,1.00) -94.18% (p=0.000) SetTypeNode130Slice 66.3µs × (1.00,1.00) 2.4µs × (0.97,1.08) -96.34% (p=0.000) SetTypeNode1024 16.0µs × (1.00,1.01) 0.5µs × (1.00,1.00) -96.65% (p=0.000) SetTypeNode1024Slice 512µs × (1.00,1.00) 18µs × (0.98,1.04) -96.45% (p=0.000) SetTypeNode124 uses a 124 data + 2 ptr = 126-word allocation. Both Go 1.4 and this CL are using pointer bitmaps for this case, so that's an overall 3x speedup for using pointer bitmaps. SetTypeNode128 uses a 128 data + 2 ptr = 130-word allocation. Both Go 1.4 and this CL are running the GC program for this case, so that's an overall 17x speedup when using GC programs (and I've seen >20x on other systems). Comparing Go 1.4's SetTypeNode124 (pointer bitmap) against this CL's SetTypeNode128 (GC program), the slow path in the code in this CL is 2x faster than the fast path in Go 1.4. The Go 1 benchmarks are basically unaffected compared to just before this CL. Go 1 benchmarks, before this CL vs this CL: name old mean new mean delta BinaryTree17 5.87s × (0.97,1.04) 5.91s × (0.96,1.04) ~ (p=0.306) Fannkuch11 4.38s × (1.00,1.00) 4.37s × (1.00,1.01) -0.22% (p=0.006) FmtFprintfEmpty 90.7ns × (0.97,1.10) 89.3ns × (0.96,1.09) ~ (p=0.280) FmtFprintfString 282ns × (0.98,1.04) 287ns × (0.98,1.07) +1.72% (p=0.039) FmtFprintfInt 269ns × (0.99,1.03) 282ns × (0.97,1.04) +4.87% (p=0.000) FmtFprintfIntInt 478ns × (0.99,1.02) 481ns × (0.99,1.02) +0.61% (p=0.048) FmtFprintfPrefixedInt 399ns × (0.98,1.03) 400ns × (0.98,1.05) ~ (p=0.533) FmtFprintfFloat 563ns × (0.99,1.01) 570ns × (1.00,1.01) +1.37% (p=0.000) FmtManyArgs 1.89µs × (0.99,1.01) 1.92µs × (0.99,1.02) +1.88% (p=0.000) GobDecode 15.2ms × (0.99,1.01) 15.2ms × (0.98,1.05) ~ (p=0.609) GobEncode 11.6ms × (0.98,1.03) 11.9ms × (0.98,1.04) +2.17% (p=0.000) Gzip 648ms × (0.99,1.01) 648ms × (1.00,1.01) ~ (p=0.835) Gunzip 142ms × (1.00,1.00) 143ms × (1.00,1.01) ~ (p=0.169) HTTPClientServer 90.5µs × (0.98,1.03) 91.5µs × (0.98,1.04) +1.04% (p=0.045) JSONEncode 31.5ms × (0.98,1.03) 31.4ms × (0.98,1.03) ~ (p=0.549) JSONDecode 111ms × (0.99,1.01) 107ms × (0.99,1.01) -3.21% (p=0.000) Mandelbrot200 6.01ms × (1.00,1.00) 6.01ms × (1.00,1.00) ~ (p=0.878) GoParse 6.54ms × (0.99,1.02) 6.61ms × (0.99,1.03) +1.08% (p=0.004) RegexpMatchEasy0_32 160ns × (1.00,1.01) 161ns × (1.00,1.00) +0.40% (p=0.000) RegexpMatchEasy0_1K 560ns × (0.99,1.01) 559ns × (0.99,1.01) ~ (p=0.088) RegexpMatchEasy1_32 138ns × (0.99,1.01) 138ns × (1.00,1.00) ~ (p=0.380) RegexpMatchEasy1_1K 877ns × (1.00,1.00) 878ns × (1.00,1.00) ~ (p=0.157) RegexpMatchMedium_32 251ns × (0.99,1.00) 251ns × (1.00,1.01) +0.28% (p=0.021) RegexpMatchMedium_1K 72.6µs × (1.00,1.00) 72.6µs × (1.00,1.00) ~ (p=0.539) RegexpMatchHard_32 3.84µs × (1.00,1.00) 3.84µs × (1.00,1.00) ~ (p=0.378) RegexpMatchHard_1K 117µs × (1.00,1.00) 117µs × (1.00,1.00) ~ (p=0.067) Revcomp 904ms × (0.99,1.02) 904ms × (0.99,1.01) ~ (p=0.943) Template 125ms × (0.99,1.02) 127ms × (0.99,1.01) +1.79% (p=0.000) TimeParse 627ns × (0.99,1.01) 622ns × (0.99,1.01) -0.88% (p=0.000) TimeFormat 655ns × (0.99,1.02) 655ns × (0.99,1.02) ~ (p=0.976) For the record, Go 1 benchmarks, Go 1.4 vs this CL: name old mean new mean delta BinaryTree17 4.61s × (0.97,1.05) 5.91s × (0.98,1.03) +28.35% (p=0.000) Fannkuch11 4.40s × (0.99,1.03) 4.41s × (0.99,1.01) ~ (p=0.212) FmtFprintfEmpty 102ns × (0.99,1.01) 84ns × (0.99,1.02) -18.38% (p=0.000) FmtFprintfString 302ns × (0.98,1.01) 303ns × (0.99,1.02) ~ (p=0.203) FmtFprintfInt 313ns × (0.97,1.05) 270ns × (0.99,1.01) -13.69% (p=0.000) FmtFprintfIntInt 524ns × (0.98,1.02) 477ns × (0.99,1.00) -8.87% (p=0.000) FmtFprintfPrefixedInt 424ns × (0.98,1.02) 386ns × (0.99,1.01) -8.96% (p=0.000) FmtFprintfFloat 652ns × (0.98,1.02) 594ns × (0.97,1.05) -8.97% (p=0.000) FmtManyArgs 2.13µs × (0.99,1.02) 1.94µs × (0.99,1.01) -8.92% (p=0.000) GobDecode 17.1ms × (0.99,1.02) 14.9ms × (0.98,1.03) -13.07% (p=0.000) GobEncode 13.5ms × (0.98,1.03) 11.5ms × (0.98,1.03) -15.25% (p=0.000) Gzip 656ms × (0.99,1.02) 647ms × (0.99,1.01) -1.29% (p=0.000) Gunzip 143ms × (0.99,1.02) 144ms × (0.99,1.01) ~ (p=0.204) HTTPClientServer 88.2µs × (0.98,1.02) 90.8µs × (0.98,1.01) +2.93% (p=0.000) JSONEncode 32.2ms × (0.98,1.02) 30.9ms × (0.97,1.04) -4.06% (p=0.001) JSONDecode 121ms × (0.98,1.02) 110ms × (0.98,1.05) -8.95% (p=0.000) Mandelbrot200 6.06ms × (0.99,1.01) 6.11ms × (0.98,1.04) ~ (p=0.184) GoParse 6.76ms × (0.97,1.04) 6.58ms × (0.98,1.05) -2.63% (p=0.003) RegexpMatchEasy0_32 195ns × (1.00,1.01) 155ns × (0.99,1.01) -20.43% (p=0.000) RegexpMatchEasy0_1K 479ns × (0.98,1.03) 535ns × (0.99,1.02) +11.59% (p=0.000) RegexpMatchEasy1_32 169ns × (0.99,1.02) 131ns × (0.99,1.03) -22.44% (p=0.000) RegexpMatchEasy1_1K 1.53µs × (0.99,1.01) 0.87µs × (0.99,1.02) -43.07% (p=0.000) RegexpMatchMedium_32 334ns × (0.99,1.01) 242ns × (0.99,1.01) -27.53% (p=0.000) RegexpMatchMedium_1K 125µs × (1.00,1.01) 72µs × (0.99,1.03) -42.53% (p=0.000) RegexpMatchHard_32 6.03µs × (0.99,1.01) 3.79µs × (0.99,1.01) -37.12% (p=0.000) RegexpMatchHard_1K 189µs × (0.99,1.02) 115µs × (0.99,1.01) -39.20% (p=0.000) Revcomp 935ms × (0.96,1.03) 926ms × (0.98,1.02) ~ (p=0.083) Template 146ms × (0.97,1.05) 119ms × (0.99,1.01) -18.37% (p=0.000) TimeParse 660ns × (0.99,1.01) 624ns × (0.99,1.02) -5.43% (p=0.000) TimeFormat 670ns × (0.98,1.02) 710ns × (1.00,1.01) +5.97% (p=0.000) This CL is a bit larger than I would like, but the compiler, linker, runtime, and package reflect all need to be in sync about the format of these programs, so there is no easy way to split this into independent changes (at least while keeping the build working at each change). Fixes #9625. Fixes #10524. Change-Id: I9e3e20d6097099d0f8532d1cb5b1af528804989a Reviewed-on: https://go-review.googlesource.com/9888 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Russ Cox <rsc@golang.org>
2015-05-11runtime: enable profiling on g0Daniel Morsing
Since we now have stack information for code running on the systemstack, we can traceback over it. To make cpu profiles useful, add a case in gentraceback to jump over systemstack switches. Fixes #10609. Change-Id: I21f47fcc802c07c5d4a1ada56374314e388a6dc7 Reviewed-on: https://go-review.googlesource.com/9506 Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-04-29runtime: print stack of G during a signalKeith Randall
Sequence of operations: - Go code does a systemstack call - during the systemstack call, receive a signal - signal requests a traceback of all goroutines The orignal G is still marked as _Grunning, so the traceback code refuses to print its stack. Fix by allowing traceback of Gs whose caller is on the same M as G is. G can't be modifying its stack if that is the case. Fixes #10546 Change-Id: I2bcea48c0197fbf78ab6fa080027cd80181083ad Reviewed-on: https://go-review.googlesource.com/9435 Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-21runtime: multi-threaded, utilization-scheduled background markAustin Clements
Currently, the concurrent mark phase is performed by the main GC goroutine. Prior to the previous commit enabling preemption, this caused marking to always consume 1/GOMAXPROCS of the available CPU time. If GOMAXPROCS=1, this meant background GC would consume 100% of the CPU (effectively a STW). If GOMAXPROCS>4, background GC would use less than the goal of 25%. If GOMAXPROCS=4, background GC would use the goal 25%, but if the mutator wasn't using the remaining 75%, background marking wouldn't take advantage of the idle time. Enabling preemption in the previous commit made GC miss CPU targets in completely different ways, but set us up to bring everything back in line. This change replaces the fixed GC goroutine with per-P background mark goroutines. Once started, these goroutines don't go in the standard run queues; instead, they are scheduled specially such that the time spent in mutator assists and the background mark goroutines totals 25% of the CPU time available to the program. Furthermore, this lets background marking take advantage of idle Ps, which significantly boosts GC performance for applications that under-utilize the CPU. This requires also changing how time is reported for gctrace, so this change splits the concurrent mark CPU time into assist/background/idle scanning. This also requires increasing the size of the StackRecord slice used in a GoroutineProfile test. Change-Id: I0936ff907d2cee6cb687a208f2df47e8988e3157 Reviewed-on: https://go-review.googlesource.com/8850 Reviewed-by: Rick Hudson <rlh@golang.org>