path: root/src/runtime/runtime2.go
2015-10-22  runtime: prune some dead variables  (Matthew Dempsky)
Change-Id: I7a1c3079b433c4e30d72fb7d59f9594e0d5efe47 Reviewed-on: https://go-review.googlesource.com/16178 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-10-22  runtime: split plan9 and solaris's m fields into new embedded mOS type  (Matthew Dempsky)
Reduces the size of m by ~8% on linux/amd64 (1040 bytes -> 960 bytes). There are also windows-specific fields, but they're currently referenced in OS-independent source files (but only when GOOS=="windows"). Change-Id: I13e1471ff585ccced1271f74209f8ed6df14c202 Reviewed-on: https://go-review.googlesource.com/16173 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-21  runtime: make iface/eface handling more type safe  (Matthew Dempsky)
Change compiler-invoked interface functions to directly take iface/eface parameters instead of fInterface/interface{} to avoid needing to always convert. For the handful of functions that legitimately need to take an interface{} parameter, add efaceOf to type-safely convert *interface{} to *eface. Change-Id: I8928761a12fd3c771394f36adf93d3006a9fcf39 Reviewed-on: https://go-review.googlesource.com/16166 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
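As an illustration of the pattern this commit describes, here is a minimal standalone sketch of an efaceOf-style helper. The two-word eface layout below (type descriptor pointer, then data pointer) is written out here for illustration rather than copied from the runtime:

```go
package main

import (
	"fmt"
	"unsafe"
)

// eface mirrors the memory layout of an empty interface value:
// a type descriptor pointer followed by a data pointer.
type eface struct {
	_type unsafe.Pointer
	data  unsafe.Pointer
}

// efaceOf converts *interface{} to *eface in one place, so callers get
// a typed view instead of open-coding unsafe.Pointer conversions.
func efaceOf(ep *interface{}) *eface {
	return (*eface)(unsafe.Pointer(ep))
}

func main() {
	var i interface{} = 42
	e := efaceOf(&i)
	fmt.Printf("type=%p data=%p\n", e._type, e.data)
}
```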
2015-10-20  runtime: add stringStructOf helper function  (Matthew Dempsky)
Instead of open-coding conversions from *string to unsafe.Pointer then to *stringStruct, add a helper function to add some type safety. Bonus: This caught two **string values being converted to *stringStruct in heapdump.go. While here, get rid of the redundant _string type, but add in a stringStructDWARF type used for generating DWARF debug info. Change-Id: I8882f8cca66ac45190270f82019a5d85db023bd2 Reviewed-on: https://go-review.googlesource.com/16131 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
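A similar standalone sketch of a stringStructOf-style helper, assuming the conventional pointer-plus-length layout for strings. Because the parameter is typed *string, code that accidentally has a **string in hand no longer converts silently, which is the kind of bug the commit message says this caught:

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringStruct mirrors the runtime representation of a string:
// a pointer to the bytes and a length.
type stringStruct struct {
	str unsafe.Pointer
	len int
}

// stringStructOf gives a typed view of a *string, replacing the
// open-coded *string -> unsafe.Pointer -> *stringStruct chain.
func stringStructOf(sp *string) *stringStruct {
	return (*stringStruct)(unsafe.Pointer(sp))
}

func main() {
	s := "hello"
	ss := stringStructOf(&s)
	fmt.Println(ss.len) // 5
}
```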
2015-10-20  runtime: rename _func.frame to make it clear it's deprecated and unused.  (Aaron Jacobs)
When I saw that it was labelled "legacy", I went looking for users of it to see how it was still used. But there aren't any. Save the next person the trouble. Change-Id: I921dd6c57b60331c9816542272555153ac133c02 Reviewed-on: https://go-review.googlesource.com/16035 Reviewed-by: Dave Cheney <dave@cheney.net> Run-TryBot: Dave Cheney <dave@cheney.net> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-09  runtime: directly track GC assist balance  (Austin Clements)
Currently we track the per-G GC assist balance as two monotonically increasing values: the bytes allocated by the G this cycle (gcalloc) and the scan work performed by the G this cycle (gcscanwork). The assist balance is hence assistRatio*gcalloc - gcscanwork. This works, but has two important downsides:

1) It requires floating-point math to figure out if a G is in debt or not. This makes it inappropriate to check for assist debt in the hot path of mallocgc, so we only do this when a G allocates a new span. As a result, Gs can operate "in the red", leading to under-assist and extended GC cycle length.

2) Revising the assist ratio during a GC cycle can lead to an "assist burst". If you think of plotting the scan work performed versus heap size, the assist ratio controls the slope of this line. However, in the current system, the target line always passes through 0 at the heap size that triggered GC, so if the runtime increases the assist ratio, there has to be a potentially large assist to jump from the current amount of scan work up to the new target scan work for the current heap size.

This commit replaces this approach with directly tracking the GC assist balance in terms of allocation credit bytes. Allocating N bytes simply decreases this by N and assisting raises it by the amount of scan work performed divided by the assist ratio (to get back to bytes). This will make it cheap to figure out if a G is in debt, which will let us efficiently check if an assist is necessary *before* performing an allocation and hence keep Gs "in the black". This also fixes assist bursts because the assist ratio is now in terms of *remaining* work, rather than work from the beginning of the GC cycle. Hence, the plot of scan work versus heap size becomes continuous: we can revise the slope, but this slope always starts from where we are right now, rather than where we were at the beginning of the cycle.

Change-Id: Ia821c5f07f8a433e8da7f195b52adfedd58bdf2c Reviewed-on: https://go-review.googlesource.com/15408 Reviewed-by: Rick Hudson <rlh@golang.org>
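A toy model of the credit accounting described above; the names (gState, gcAssistBytes) and the bytes-per-scan-work ratio are illustrative, not the runtime's actual fields:

```go
package main

import "fmt"

// gState carries a per-goroutine allocation credit in bytes.
// Negative means the goroutine is in debt and must assist the GC
// before allocating more.
type gState struct {
	gcAssistBytes int64
}

// malloc charges an allocation of n bytes against the credit.
func (g *gState) malloc(n int64) {
	g.gcAssistBytes -= n
}

// assist converts completed scan work back into byte credit using a
// ratio of allocation bytes allowed per unit of scan work.
func (g *gState) assist(scanWork int64, bytesPerScanWork float64) {
	g.gcAssistBytes += int64(float64(scanWork) * bytesPerScanWork)
}

// inDebt is the cheap check that can run on every allocation:
// a single signed comparison, no floating point.
func (g *gState) inDebt() bool { return g.gcAssistBytes < 0 }

func main() {
	var g gState
	g.malloc(4096)
	fmt.Println(g.inDebt()) // true: must assist before allocating more
	g.assist(2048, 2.0)     // 2048 units of scan work at 2 bytes/unit
	fmt.Println(g.inDebt()) // false
}
```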
2015-09-14  runtime: remove unused g.readyg field  (Austin Clements)
Commit 0e6a6c5 removed readyExecute a long time ago, but left behind the g.readyg field that was used by readyExecute. Remove this now unused field. Change-Id: I41b87ad2b427974d256ec7a7f6d4bdc2ce8a13bb Reviewed-on: https://go-review.googlesource.com/13111 Reviewed-by: Rick Hudson <rlh@golang.org>
2015-08-28  runtime: check explicitly for short unwinding of stacks  (Russ Cox)
Right now we find out implicitly if stack barriers are in place, or defers. This change makes sure we find out about short unwinds always. Change-Id: Ibdde1ba9c79eb792660dcb7aa6f186e4e4d559b3 Reviewed-on: https://go-review.googlesource.com/13966 Reviewed-by: Austin Clements <austin@google.com>
2015-08-25  runtime: fix nmspinning comparison  (Todd Neal)
nmspinning has a value range of [0, 2^31-1]. Update the comment to indicate this and fix the comparison so it's not always false. Fixes #11280 Change-Id: Iedaf0654dcba5e2c800645f26b26a1a781ea1991 Reviewed-on: https://go-review.googlesource.com/13877 Reviewed-by: Minux Ma <minux@golang.org>
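The fix itself is runtime-specific, but the underlying pitfall is generic: a comparison that can never be true because the operand is unsigned. A minimal illustration of the pattern (not the runtime's actual code):

```go
package main

import "fmt"

func main() {
	// A counter documented to stay in [0, 2^31-1] but declared unsigned.
	var nmspinning uint32

	// Broken sanity check: an unsigned value can never be negative,
	// so this branch is dead code (many linters flag such comparisons).
	if nmspinning < 0 {
		fmt.Println("negative? impossible")
	}

	// One way to make the check meaningful: convert to a signed type
	// of the same width before comparing.
	if int32(nmspinning) < 0 {
		fmt.Println("counter wandered outside its documented range")
	}
}
```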
2015-08-07  runtime: run on GOARM=5 and GOARM=6 uniprocessor freebsd/arm systems  (Russ Cox)
Also, crash early on non-Linux SMP ARM systems when GOARM < 7; without the proper synchronization, SMP cannot work. Linux is okay because we call kernel-provided routines for synchronization and barriers, and the kernel takes care of providing the right routines for the current system. On non-Linux systems we are left to fend for ourselves. It is possible to use different synchronization on GOARM=6, but it's too late to do that in the Go 1.5 cycle. We don't believe there are any non-Linux SMP GOARM=6 systems anyway. Fixes #12067. Change-Id: I771a556e47893ed540ec2cd33d23c06720157ea3 Reviewed-on: https://go-review.googlesource.com/13363 Reviewed-by: Austin Clements <austin@google.com>
2015-07-30  runtime: change arm software div/mod call sequence not to modify stack  (Russ Cox)
Instead of pushing the denominator argument on the stack, the denominator is now passed in m. This fixes a variety of bugs related to trying to take stack traces backwards from the middle of the software div/mod routines. Some of those bugs have been kludged around in the past, but others have not. Instead of trying to patch up after breaking the stack, this CL stops breaking the stack. This is an update of https://golang.org/cl/19810043, which was rolled back in https://golang.org/cl/20350043. The problem in the original CL was that there were divisions at bad times, when m was not available. These were divisions by constant denominators, either in C code or in assembly. The Go compiler knows how to generate division by multiplication for constant denominators, but the C compiler did not. There is no longer any C code, so that's taken care of. There was one problematic DIV in runtime.usleep (assembly) but https://golang.org/cl/12898 took care of that one. So now this approach is safe. Reject DIV/MOD in NOSPLIT functions to keep them from coming back. Fixes #6681. Fixes #6699. Fixes #10486. Change-Id: I09a13c76ad08ba75b3bd5d46a3eb78e66a84ab38 Reviewed-on: https://go-review.googlesource.com/12899 Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-29  runtime/trace: record event sequence numbers explicitly  (Russ Cox)
Nearly all the flaky failures we've seen in trace tests have been due to the use of time stamps to determine relative event ordering. This is tricky for many reasons, including: - different cores might not have exactly synchronized clocks - VMs are worse than real hardware - non-x86 chips have different timer resolution than x86 chips - on fast systems two events can end up with the same time stamp Stop trying to make time reliable. It's clearly not going to be for Go 1.5. Instead, record an explicit event sequence number for ordering. Using our own counter solves all of the above problems. The trace still contains time stamps, of course. The sequence number is just used for ordering. Should alleviate #10554 somewhat. Then tickDiv can be chosen to be a useful time unit instead of having to be exact for ordering. Separating ordering and time stamps lets the trace parser diagnose systems where the time stamp order and actual order do not match for one reason or another. This CL adds that check to the end of trace.Parse, after all other sequence order-based checking. If that error is found, we skip the test instead of failing it. Putting the check in trace.Parse means that cmd/trace will pick up the same check, refusing to display a trace where the time stamps do not match actual ordering. Using net/http's BenchmarkClientServerParallel4 on various CPU counts, not tracing vs tracing: name old time/op new time/op delta ClientServerParallel4 50.4µs ± 4% 80.2µs ± 4% +59.06% (p=0.000 n=10+10) ClientServerParallel4-2 33.1µs ± 7% 57.8µs ± 5% +74.53% (p=0.000 n=10+10) ClientServerParallel4-4 18.5µs ± 4% 32.6µs ± 3% +75.77% (p=0.000 n=10+10) ClientServerParallel4-6 12.9µs ± 5% 24.4µs ± 2% +89.33% (p=0.000 n=10+10) ClientServerParallel4-8 11.4µs ± 6% 21.0µs ± 3% +83.40% (p=0.000 n=10+10) ClientServerParallel4-12 14.4µs ± 4% 23.8µs ± 4% +65.67% (p=0.000 n=10+10) Fixes #10512. Change-Id: I173eecf8191e86feefd728a5aad25bf1bc094b12 Reviewed-on: https://go-review.googlesource.com/12579 Reviewed-by: Austin Clements <austin@google.com>
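A sketch of ordering events by a shared atomic counter rather than timestamps; all names here are illustrative, not the tracer's:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// traceSeq hands out one globally ordered sequence number per event.
// Unlike timestamps, it is unaffected by unsynchronized clocks or
// coarse timer resolution.
var traceSeq uint64

type event struct {
	seq  uint64
	name string
}

func emit(name string) event {
	return event{seq: atomic.AddUint64(&traceSeq, 1), name: name}
}

func main() {
	var mu sync.Mutex
	var events []event
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			ev := emit(fmt.Sprintf("goroutine %d", i))
			mu.Lock()
			events = append(events, ev)
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	// Sorting events by seq recovers a total order that does not
	// depend on when each timestamp was taken.
	fmt.Println("recorded", len(events), "ordered events")
}
```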
2015-07-29  runtime: avoid race between SIGPROF traceback and stack barriers  (Austin Clements)
The following sequence of events can lead to the runtime attempting an out-of-bounds access on a stack barrier slice:

1. A SIGPROF comes in on a thread while the G on that thread is in _Gsyscall. The sigprof handler calls gentraceback, which saves a local copy of the G's stkbar slice. Currently the G has no stack barriers, so this slice is empty.

2. On another thread, the GC concurrently scans the stack of the goroutine being profiled (it considers it stopped because it's in _Gsyscall) and installs stack barriers.

3. Back on the sigprof thread, gentraceback comes across a stack barrier in the stack and attempts to look it up in its (zero length) copy of G's old stkbar slice, which causes an out-of-bounds access.

This commit fixes this by adding a simple cas spin to synchronize the SIGPROF handler with stack barrier insertion. In general I would prefer that this synchronization be done through the G status, since that's how stack scans are otherwise synchronized, but adding a new lock is a much smaller change and G statuses are full of subtlety.

Fixes #11863. Change-Id: Ie89614a6238bb9c6a5b1190499b0b48ec759eaf7 Reviewed-on: https://go-review.googlesource.com/12748 Reviewed-by: Russ Cox <rsc@golang.org>
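A minimal sketch of a cas spin of the kind described, written with sync/atomic instead of the runtime's internal atomics; the lock variable and function names are illustrative:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// stackLock is 0 when free and 1 when either the profiler traceback
// or stack-barrier installation holds it.
var stackLock uint32

// acquire spins until it flips 0 -> 1. Both critical sections are
// short and rare, so a simple spin is enough.
func acquire() {
	for !atomic.CompareAndSwapUint32(&stackLock, 0, 1) {
	}
}

// release hands the lock back with a plain atomic store.
func release() {
	atomic.StoreUint32(&stackLock, 0)
}

func main() {
	acquire()
	// ... read or rewrite the stack barrier bookkeeping ...
	release()
	fmt.Println("done")
}
```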
2015-07-11  all: link to https instead of http  (Brad Fitzpatrick)
The one in misc/makerelease/makerelease.go is particularly bad and probably warrants rotating our keys. I didn't update old weekly notes, and reverted some changes involving test code for now, since we're late in the Go 1.5 freeze. Otherwise, the rest are all auto-generated changes, and all manually reviewed. Change-Id: Ia2753576ab5d64826a167d259f48a2f50508792d Reviewed-on: https://go-review.googlesource.com/12048 Reviewed-by: Rob Pike <r@golang.org>
2015-06-29  runtime: store syscall parameters in m not on stack  (Alex Brainman)
Stack can move during callback, so libcall struct cannot be stored on stack. asmstdcall updates return values and errno in libcall struct parameter, but these could be at different location when callback returns. Store these in m, so they are not affected by GC. Fixes #10406 Change-Id: Id01c9d2b4b44530494e6d9e9e1c875261ce477cd Reviewed-on: https://go-review.googlesource.com/10370 Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-18  runtime: fix tracing of syscallexit  (Dmitry Vyukov)
There were two issues:

1. Delayed EvGoSysExit could have been emitted during TraceStart, while it had not yet emitted EvGoInSyscall.

2. Delayed EvGoSysExit could have been emitted during the next tracing session.

Fixes #10476 Fixes #11262 Change-Id: Iab68eb31cf38eb6eb6eee427f49c5ca0865a8c64 Reviewed-on: https://go-review.googlesource.com/9132 Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-17  runtime: fix races in stack scan  (Russ Cox)
This fixes a hang during runtime.TestTraceStress. It also fixes double-scan of stacks, which leads to stack barrier installation failures. Both of these have shown up as flaky failures on the dashboard. Fixes #10941. Change-Id: Ia2a5991ce2c9f43ba06ae1c7032f7c898dc990e0 Reviewed-on: https://go-review.googlesource.com/11089 Reviewed-by: Austin Clements <austin@google.com>
2015-06-11  all: fix misprints in comments  (Ainar Garipov)
These were found by grepping the comments from the go code and feeding the output to aspell. Change-Id: Id734d6c8d1938ec3c36bd94a4dbbad577e3ad395 Reviewed-on: https://go-review.googlesource.com/10941 Reviewed-by: Aamir Khan <syst3m.w0rm@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-06-10  runtime: correct a drifted comment in referencing m->locked.  (Yongjian Xu)
Change-Id: Ida4b98aa63e57594fa6fa0b8178106bac9b3cd19 Reviewed-on: https://go-review.googlesource.com/10837 Reviewed-by: Minux Ma <minux@golang.org>
2015-06-02  runtime: steal space for stack barrier tracking from stack  (Austin Clements)
The stack barrier code will need a bookkeeping structure to keep track of the overwritten return PCs. This commit introduces and allocates this structure, but does not yet use the structure. We don't want to allocate space for this structure during garbage collection, so this commit allocates it along with the allocation of the corresponding stack. However, we can't do a regular allocation in newstack because mallocgc may itself grow the stack (which would lead to a recursive allocation). Hence, this commit makes the bookkeeping structure part of the stack allocation itself by stealing the necessary space from the top of the stack allocation. Since the size of this bookkeeping structure is logarithmic in the size of the stack, this has minimal impact on stack behavior. Change-Id: Ia14408be06aafa9ca4867f4e70bddb3fe0e96665 Reviewed-on: https://go-review.googlesource.com/10313 Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-02  runtime: decouple stack bounds and stack allocation size  (Austin Clements)
Currently the runtime assumes that the allocation for the stack is exactly [stack.lo, stack.hi). We're about to steal a small part of this allocation for per-stack GC metadata. To prepare for this, this commit adds a field to the G for the allocated size of the stack. With this change, stack.lo and stack.hi continue to act as the true bounds on the stack, but are no longer also used as the bounds on the stack allocation. (I also tried this the other way around, where stack.lo and stack.hi remained the allocation bounds and I introduced a new top of stack. However, there are far more places that assume stack.hi is the true top of the stack than there are places that assume it's the top of the allocation.) Change-Id: Ifa9d956753be53d286d09cbc73d47fb34a18c0c6 Reviewed-on: https://go-review.googlesource.com/10312 Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-22  runtime: don't always unblock all signals  (Elias Naur)
Ian proposed an improved way of handling signal masks in Go, motivated by a problem where the Android java runtime expects certain signals to be blocked for all JVM threads. Discussion here: https://groups.google.com/forum/#!topic/golang-dev/_TSCkQHJt6g

Ian's text is used in the following: A Go program always needs to have the synchronous signals enabled. These are the signals for which _SigPanic is set in sigtable, namely SIGSEGV, SIGBUS, SIGFPE. A Go program that uses the os/signal package, and calls signal.Notify, needs to have at least one thread which is not blocking that signal, but it doesn't matter much which one.

Unix programs do not change signal mask across execve. They inherit signal masks across fork. The shell uses this fact to some extent; for example, the job control signals (SIGTTIN, SIGTTOU, SIGTSTP) are blocked for commands run due to backquote quoting or $().

Our current position on signal masks was not thought out. We wandered into it step by step, e.g., http://golang.org/cl/7323067 .

This CL does the following:

- Introduce a new platform hook, msigsave, that saves the signal mask of the current thread to m.sigsave.
- Call msigsave from needm and newm.
- In minit, set up the signal mask from m.sigsave and unblock the essential synchronous signals, and SIGILL, SIGTRAP, SIGPROF, SIGSTKFLT (for systems that have it).
- In unminit, restore the signal mask from m.sigsave.
- The first time that os/signal.Notify is called, start a new thread whose only purpose is to update its signal mask to make sure signals for signal.Notify are unblocked on at least one thread.

The effect on Go programs will be that if they are invoked with some non-synchronous signals blocked, those signals will normally be ignored. Previously, those signals would mostly be ignored. A change in behaviour will occur for programs started with any of these signals blocked, if they receive the signal: SIGHUP, SIGINT, SIGQUIT, SIGABRT, SIGTERM. Previously those signals would always cause a crash (unless using the os/signal package); with this change, they will be ignored if the program is started with the signal blocked (and does not use the os/signal package).

./all.bash completes successfully on linux/amd64. OpenBSD is missing the implementation.

Change-Id: I188098ba7eb85eae4c14861269cc466f2aa40e8c Reviewed-on: https://go-review.googlesource.com/10173 Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-05-18  runtime: use separate count and note for forEachP  (Austin Clements)
Currently, forEachP reuses the stopwait and stopnote fields from stopTheWorld to track how many Ps have not responded to the safe-point request and to sleep until all Ps have responded. It was assumed this was safe because both stopTheWorld and forEachP must occur under the worldsema and hence stopwait and stopnote cannot be used for both purposes simultaneously and callers could always determine the appropriate use based on sched.gcwaiting (which is only set by stopTheWorld).

However, this is not the case, since it's possible for there to be a window between when an M observes that gcwaiting is set and when it checks stopwait during which stopwait could have changed meanings. When this happens, the M decrements stopwait and may wake up stopnote, but does not otherwise participate in the forEachP protocol. As a result, stopwait is decremented too many times, so it may reach zero before all Ps have run the safe-point function, causing forEachP to wake up early. It will then either observe that some P has not run the safe-point function and panic with "P did not run fn", or the remaining P (or Ps) will run the safe-point function before it wakes up and it will observe that stopwait is negative and panic with "not stopped".

Fix this problem by giving forEachP its own safePointWait and safePointNote fields.

One known sequence of events that can cause this race is as follows. It involves three actors:

- G1 is running on M1 on P1. P1 has an empty run queue.
- G2/M2 is in a blocked syscall and has lost its P. (The details of this don't matter, it just needs to be in a position where it needs to grab an idle P.)
- GC just started on G3/M3/P3. (These aren't very involved, they just have to be separate from the other G's, M's, and P's.)

1. GC calls stopTheWorld(), which sets sched.gcwaiting to 1.

Now G1/M1 begins to enter a syscall:

2. G1/M1 invokes reentersyscall, which sets P1's status to _Psyscall.
3. G1/M1's reentersyscall observes gcwaiting != 0 and calls entersyscall_gcwait.
4. G1/M1's entersyscall_gcwait blocks acquiring sched.lock.

Back on GC:

5. stopTheWorld cas's P1's status to _Pgcstop, does other stuff, and returns.
6. GC does stuff and then calls startTheWorld().
7. startTheWorld() calls procresize(), which sets P1's status to _Pidle and puts P1 on the idle list.

Now G2/M2 returns from its syscall and takes over P1:

8. G2/M2 returns from its blocked syscall and gets P1 from the idle list.
9. G2/M2 acquires P1, which sets P1's status to _Prunning.
10. G2/M2 starts a new syscall and invokes reentersyscall, which sets P1's status to _Psyscall.

Back on G1/M1:

11. G1/M1 finally acquires sched.lock in entersyscall_gcwait.

At this point, G1/M1 still thinks it's running on P1. P1's status is _Psyscall, which is consistent with what G1/M1 is doing, but it's _Psyscall because *G2/M2* put it into _Psyscall, not G1/M1. This is basically an ABA race on P1's status.

Because forEachP currently shares stopwait with stopTheWorld, G1/M1's entersyscall_gcwait observes the non-zero stopwait set by forEachP, but mistakes it for a stopTheWorld. It cas's P1's status from _Psyscall (set by G2/M2) to _Pgcstop and proceeds to decrement stopwait one more time than forEachP was expecting.

Fixes #10618. (See the issue for details on why the above race is safe when forEachP is not involved.)

Prior to this commit, the command stress ./runtime.test -test.run TestFutexsleep\|TestGoroutineProfile would reliably fail after a few hundred runs. With this commit, it ran for over 2 million runs and never crashed.
Change-Id: I9a91ea20035b34b6e5f07ef135b144115f281f30 Reviewed-on: https://go-review.googlesource.com/10157 Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-17  runtime: eliminate runqvictims and a copy from runqsteal  (Austin Clements)
Currently, runqsteal steals Gs from another P into an intermediate buffer and then copies those Gs into the current P's run queue. This intermediate buffer itself was moved from the stack to the P in commit c4fe503 to eliminate the cost of zeroing it on every steal. This commit follows up c4fe503 by stealing directly into the current P's run queue, which eliminates the copy and the need for the intermediate buffer. The update to the tail pointer is only committed once the entire steal operation has succeeded, so the semantics of stealing do not change. Change-Id: Icdd7a0eb82668980bf42c0154b51eef6419fdd51 Reviewed-on: https://go-review.googlesource.com/9998 Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Austin Clements <austin@google.com>
2015-05-13  runtime: reduce thrashing of gs between ps  (Rick Hudson)
One important use case is a pipeline computation that passes values from one Goroutine to the next and then exits or is placed in a wait state. If GOMAXPROCS > 1 a Goroutine running on P1 will enable another Goroutine and then immediately make P1 available to execute it. We need to prevent other Ps from stealing the G that P1 is about to execute. Otherwise the Gs can thrash between Ps causing unneeded synchronization and slowing down throughput.

Fix this by changing the stealing logic so that when a P attempts to steal the only G on some other P's run queue, it will pause momentarily to allow the victim P to schedule the G.

As part of optimizing stealing we also use a per-P victim queue to move stolen Gs. This eliminates the zeroing of a stack local victim queue which turned out to be expensive.

This CL is a necessary but not sufficient prerequisite to changing the default value of GOMAXPROCS to something > 1 which is another CL/discussion.

For highly serialized programs, such as GoroutineRing below, this can make a large difference. For larger and more parallel programs such as the x/benchmarks there is no noticeable detriment.

~/work/code/src/rsc.io/benchstat/benchstat old.txt new.txt
name                  old mean                 new mean                 delta
GoroutineRing         30.2µs × (0.98,1.01)     30.1µs × (0.97,1.04)     ~ (p=0.941)
GoroutineRing-2       113µs × (0.91,1.07)      30µs × (0.98,1.03)       -73.17% (p=0.004)
GoroutineRing-4       144µs × (0.98,1.02)      32µs × (0.98,1.01)       -77.69% (p=0.000)
GoroutineRingBuf      32.7µs × (0.97,1.03)     32.5µs × (0.97,1.02)     ~ (p=0.795)
GoroutineRingBuf-2    120µs × (0.92,1.08)      33µs × (1.00,1.00)       -72.48% (p=0.004)
GoroutineRingBuf-4    138µs × (0.92,1.06)      33µs × (1.00,1.00)       -76.21% (p=0.003)

The bench benchmarks show little impact.
           old       new
garbage    7032879   7011696
httpold    25509     25301
splayold   1022073   1019499
jsonold    28230624  28081433

Change-Id: I228c48fed8d85c9bbef16a7edc53ab7898506f50 Reviewed-on: https://go-review.googlesource.com/9872 Reviewed-by: Austin Clements <austin@google.com>
2015-05-11  runtime: enable profiling on g0  (Daniel Morsing)
Since we now have stack information for code running on the systemstack, we can traceback over it. To make cpu profiles useful, add a case in gentraceback to jump over systemstack switches. Fixes #10609. Change-Id: I21f47fcc802c07c5d4a1ada56374314e388a6dc7 Reviewed-on: https://go-review.googlesource.com/9506 Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-05-07  runtime: fix comments that mention g status values  (Alex Brainman)
Makes searching in source code easier. Change-Id: Ie2e85934d23920ac0bc01d28168bcfbbdc465580 Reviewed-on: https://go-review.googlesource.com/9774 Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com> Reviewed-by: Minux Ma <minux@golang.org>
2015-04-28  runtime: replace needwb() with writeBarrierEnabled  (Russ Cox)
Reduce the write barrier check to a single load and compare so that it can be inlined into write barrier use sites. Makes the standard write barrier a little faster too.

name                              old                     new                     delta
BenchmarkBinaryTree17             17.9s × (0.99,1.01)     17.9s × (1.00,1.01)     ~
BenchmarkFannkuch11               4.35s × (1.00,1.00)     4.43s × (1.00,1.00)     +1.81%
BenchmarkFmtFprintfEmpty          120ns × (0.93,1.06)     110ns × (1.00,1.06)     -7.92%
BenchmarkFmtFprintfString         479ns × (0.99,1.00)     487ns × (0.99,1.00)     +1.67%
BenchmarkFmtFprintfInt            452ns × (0.99,1.02)     450ns × (0.99,1.00)     ~
BenchmarkFmtFprintfIntInt         766ns × (0.99,1.01)     762ns × (1.00,1.00)     ~
BenchmarkFmtFprintfPrefixedInt    576ns × (0.98,1.01)     584ns × (0.99,1.01)     ~
BenchmarkFmtFprintfFloat          730ns × (1.00,1.01)     738ns × (1.00,1.00)     +1.16%
BenchmarkFmtManyArgs              2.84µs × (0.99,1.00)    2.80µs × (1.00,1.01)    -1.22%
BenchmarkGobDecode                39.3ms × (0.98,1.01)    39.0ms × (0.99,1.00)    ~
BenchmarkGobEncode                39.5ms × (0.99,1.01)    37.8ms × (0.98,1.01)    -4.33%
BenchmarkGzip                     663ms × (1.00,1.01)     661ms × (0.99,1.01)     ~
BenchmarkGunzip                   143ms × (1.00,1.00)     142ms × (1.00,1.00)     ~
BenchmarkHTTPClientServer         132µs × (0.99,1.01)     132µs × (0.99,1.01)     ~
BenchmarkJSONEncode               57.4ms × (0.99,1.01)    56.3ms × (0.99,1.01)    -1.96%
BenchmarkJSONDecode               139ms × (0.99,1.00)     138ms × (0.99,1.01)     ~
BenchmarkMandelbrot200            6.03ms × (1.00,1.00)    6.01ms × (1.00,1.00)    ~
BenchmarkGoParse                  10.3ms × (0.89,1.14)    10.2ms × (0.87,1.05)    ~
BenchmarkRegexpMatchEasy0_32      209ns × (1.00,1.00)     208ns × (1.00,1.00)     ~
BenchmarkRegexpMatchEasy0_1K      591ns × (0.99,1.00)     588ns × (1.00,1.00)     ~
BenchmarkRegexpMatchEasy1_32      184ns × (0.99,1.02)     182ns × (0.99,1.01)     ~
BenchmarkRegexpMatchEasy1_1K      1.01µs × (1.00,1.00)    0.99µs × (1.00,1.01)    -2.33%
BenchmarkRegexpMatchMedium_32     330ns × (1.00,1.00)     323ns × (1.00,1.01)     -2.12%
BenchmarkRegexpMatchMedium_1K     92.6µs × (1.00,1.00)    89.9µs × (1.00,1.00)    -2.92%
BenchmarkRegexpMatchHard_32       4.80µs × (0.95,1.00)    4.72µs × (0.95,1.01)    ~
BenchmarkRegexpMatchHard_1K       136µs × (1.00,1.00)     133µs × (1.00,1.01)     -1.86%
BenchmarkRevcomp                  900ms × (0.99,1.04)     900ms × (1.00,1.05)     ~
BenchmarkTemplate                 172ms × (1.00,1.00)     168ms × (0.99,1.01)     -2.07%
BenchmarkTimeParse                637ns × (1.00,1.00)     637ns × (1.00,1.00)     ~
BenchmarkTimeFormat               744ns × (1.00,1.01)     738ns × (1.00,1.00)     -0.67%

Change-Id: I4ecc925805da1f5ee264377f1f7574f54ee575e7 Reviewed-on: https://go-review.googlesource.com/9321 Reviewed-by: Austin Clements <austin@google.com>
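The shape of the resulting check is easy to sketch outside the runtime: one package-level flag, one load, one branch. Everything below (the node type and the writebarrier stub) is illustrative, not the runtime's code:

```go
package main

import "fmt"

// writeBarrierEnabled is a single package-level flag; the check the
// compiler inserts at each pointer write is just one load and one
// branch, cheap enough to inline.
var writeBarrierEnabled bool

type node struct{ next *node }

// setNext shows the shape of an instrumented pointer store: plain
// store on the fast path, barrier call on the slow path.
func setNext(dst, src *node) {
	if writeBarrierEnabled {
		writebarrier(dst, src)
	} else {
		dst.next = src
	}
}

// writebarrier is a stub standing in for the real barrier, which
// would also record the pointers for the garbage collector.
func writebarrier(dst, src *node) {
	dst.next = src
}

func main() {
	a, b := &node{}, &node{}
	writeBarrierEnabled = true
	setNext(a, b)
	fmt.Println(a.next == b) // true
}
```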
2015-04-27  runtime: replace STW for enabling write barriers with ragged barrier  (Austin Clements)
Currently, we use a full stop-the-world around enabling write barriers. This is to ensure that all Gs have enabled write barriers before any blackening occurs (either in gcBgMarkWorker() or in gcAssistAlloc()). However, there's no need to bring the whole world to a synchronous stop to ensure this. This change replaces the STW with a ragged barrier that ensures each P has individually observed that write barriers should be enabled before GC performs any blackening. Change-Id: If2f129a6a55bd8bdd4308067af2b739f3fb41955 Reviewed-on: https://go-review.googlesource.com/8207 Reviewed-by: Russ Cox <rsc@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-27  runtime: add ragged global barrier function  (Austin Clements)
This adds forEachP, which performs a general-purpose ragged global barrier. forEachP takes a callback and invokes it for every P at a GC safe point. Ps that are idle or in a syscall are considered to be at a continuous safe point. forEachP ensures that these Ps do not change state by forcing all syscall Ps into idle and holding the sched.lock. To ensure that Ps do not enter syscall or idle without running the safe-point function, this adds checks for a pending callback every place there is currently a gcwaiting check. We'll use forEachP to replace the STW around enabling the write barrier and to replace the current asynchronous per-M wbuf cache with a cooperatively managed per-P gcWork cache. Change-Id: Ie944f8ce1fead7c79bf271d2f42fcd61a41bb3cc Reviewed-on: https://go-review.googlesource.com/8206 Reviewed-by: Russ Cox <rsc@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>
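A rough analogue of a ragged barrier in ordinary Go, using a channel per worker in place of Ps and scheduler notes; this illustrates the idea, it is not the runtime's forEachP:

```go
package main

import (
	"fmt"
	"sync"
)

// safePointReq asks a worker to run fn and report completion.
type safePointReq struct {
	fn   func(id int)
	done *sync.WaitGroup
}

// worker models a P: it runs a loop and picks up a pending safe-point
// callback at a convenient point instead of being stopped synchronously.
type worker struct {
	pending chan safePointReq
}

func (w *worker) run(id int, stop <-chan struct{}) {
	for {
		select {
		case req := <-w.pending:
			req.fn(id) // run the safe-point function at this worker's own pace
			req.done.Done()
		case <-stop:
			return
		}
	}
}

// forEach asks every worker to run fn at its next safe point and waits
// until all have; unlike a stop-the-world, the workers are never all
// paused at the same time.
func forEach(ws []*worker, fn func(id int)) {
	var wg sync.WaitGroup
	wg.Add(len(ws))
	for _, w := range ws {
		w.pending <- safePointReq{fn: fn, done: &wg}
	}
	wg.Wait()
}

func main() {
	stop := make(chan struct{})
	ws := make([]*worker, 4)
	for i := range ws {
		ws[i] = &worker{pending: make(chan safePointReq, 1)}
		go ws[i].run(i, stop)
	}
	forEach(ws, func(id int) { fmt.Println("safe point on worker", id) })
	close(stop)
}
```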
2015-04-24  runtime: replace per-M workbuf cache with per-P gcWork cache  (Austin Clements)
Currently, each M has a cache of the most recently used *workbuf. This is used primarily by the write barrier so it doesn't have to access the global workbuf lists on every write barrier. It's also used by stack scanning because it's convenient.

This cache is important for write barrier performance, but this particular approach has several downsides. It's faster than no cache, but far from optimal (as the benchmarks below show). It's complex: access to the cache is sprinkled through most of the workbuf list operations and it requires special care to transform into and back out of the gcWork cache that's actually used for scanning and marking. It requires atomic exchanges to take ownership of the cached workbuf and to return it to the M's cache even though it's almost always used by only the current M. Since it's per-M, flushing these caches is O(# of Ms), which may be high. And it has some significant subtleties: for example, in general the cache shouldn't be used after the harvestwbufs() in mark termination because it could hide work from mark termination, but stack scanning can happen after this and *will* use the cache (but it turns out this is okay because it will always be followed by a getfull(), which drains the cache).

This change replaces this cache with a per-P gcWork object. This gcWork cache can be used directly by scanning and marking (as long as preemption is disabled, which is a general requirement of gcWork). Since it's per-P, it doesn't require synchronization, which simplifies things and means the only atomic operations in the write barrier are occasionally fetching new work buffers and setting a mark bit if the object isn't already marked. This cache can be flushed in O(# of Ps), which is generally small. It follows a simple flushing rule: the cache can be used during any phase, but during mark termination it must be flushed before allowing preemption. This also makes the dispose during mutator assist no longer necessary, which eliminates the vast majority of gcWork dispose calls and reduces contention on the global workbuf lists.

And it's a lot faster on some benchmarks:

benchmark                          old ns/op     new ns/op     delta
BenchmarkBinaryTree17              11963668673   11206112763   -6.33%
BenchmarkFannkuch11                2643217136    2649182499    +0.23%
BenchmarkFmtFprintfEmpty           70.4          70.2          -0.28%
BenchmarkFmtFprintfString          364           307           -15.66%
BenchmarkFmtFprintfInt             317           282           -11.04%
BenchmarkFmtFprintfIntInt          512           483           -5.66%
BenchmarkFmtFprintfPrefixedInt     404           380           -5.94%
BenchmarkFmtFprintfFloat           521           479           -8.06%
BenchmarkFmtManyArgs               2164          1894          -12.48%
BenchmarkGobDecode                 30366146      22429593      -26.14%
BenchmarkGobEncode                 29867472      26663152      -10.73%
BenchmarkGzip                      391236616     396779490     +1.42%
BenchmarkGunzip                    96639491      96297024      -0.35%
BenchmarkHTTPClientServer          100110        70763         -29.31%
BenchmarkJSONEncode                51866051      52511382      +1.24%
BenchmarkJSONDecode                103813138     86094963      -17.07%
BenchmarkMandelbrot200             4121834       4120886       -0.02%
BenchmarkGoParse                   16472789      5879949       -64.31%
BenchmarkRegexpMatchEasy0_32       140           140           +0.00%
BenchmarkRegexpMatchEasy0_1K       394           394           +0.00%
BenchmarkRegexpMatchEasy1_32       120           120           +0.00%
BenchmarkRegexpMatchEasy1_1K       621           614           -1.13%
BenchmarkRegexpMatchMedium_32      209           202           -3.35%
BenchmarkRegexpMatchMedium_1K      54889         55175         +0.52%
BenchmarkRegexpMatchHard_32        2682          2675          -0.26%
BenchmarkRegexpMatchHard_1K        79383         79524         +0.18%
BenchmarkRevcomp                   584116718     584595320     +0.08%
BenchmarkTemplate                  125400565     109620196     -12.58%
BenchmarkTimeParse                 386           387           +0.26%
BenchmarkTimeFormat                580           447           -22.93%

(Best out of 10 runs. The delta of averages is similar.)
This also puts us in a good position to flush these caches when nearing the end of concurrent marking, which will let us increase the size of the work buffers while still controlling mark termination pause time. Change-Id: I2dd94c8517a19297a98ec280203cccaa58792522 Reviewed-on: https://go-review.googlesource.com/9178 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24  runtime: yield time slice to most recently readied G  (Austin Clements)
Currently, when the runtime ready()s a G, it adds it to the end of the current P's run queue and continues running. If there are many other things in the run queue, this can result in a significant delay before the ready()d G actually runs and can hurt fairness when other Gs in the run queue are CPU hogs. For example, if there are three Gs sharing a P, one of which is a CPU hog that never voluntarily gives up the P and the other two of which are doing small amounts of work and communicating back and forth on an unbuffered channel, the two communicating Gs will get very little CPU time.

Change this so that when G1 ready()s G2 and then blocks, the scheduler immediately hands off the remainder of G1's time slice to G2. In the above example, the two communicating Gs will now act as a unit and together get half of the CPU time, while the CPU hog gets the other half of the CPU time.

This fixes the problem demonstrated by the ping-pong benchmark added in the previous commit:

benchmark               old ns/op    new ns/op    delta
BenchmarkPingPongHog    684287       825          -99.88%

On the x/benchmarks suite, this change improves the performance of garbage by ~6% (for GOMAXPROCS=1 and 4), and json by 28% and 36% for GOMAXPROCS=1 and 4. It has negligible effect on heap size. This has no effect on the go1 benchmark suite since those benchmarks are mostly single-threaded.

Change-Id: I858a08eaa78f702ea98a5fac99d28a4ac91d339f Reviewed-on: https://go-review.googlesource.com/9289 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-21  runtime: fix background marking at 25% utilization  (Austin Clements)
Currently, in accordance with the GC pacing proposal, we schedule background marking with a goal of achieving 25% utilization *total* between mutator assists and background marking. This is stricter than was set out in the Go 1.5 proposal, which suggests that the garbage collector can use 25% just for itself and anything the mutator does to help out is on top of that. It also has several technical drawbacks. Because mutator assist time is constantly changing and we can't have instantaneous information on background marking time, it effectively requires hitting a moving target based on out-of-date information. This works out in the long run, but works poorly for short GC cycles and on short time scales. Also, this requires time-multiplexing all Ps between the mutator and background GC since the goal utilization of background GC constantly fluctuates. This results in a complicated scheduling algorithm, poor affinity, and extra overheads from context switching. This change modifies the way we schedule and run background marking so that background marking always consumes 25% of GOMAXPROCS and mutator assist is in addition to this. This enables a much more robust scheduling algorithm where we pre-determine the number of Ps we should dedicate to background marking as well as the utilization goal for a single floating "remainder" mark worker. Change-Id: I187fa4c03ab6fe78012a84d95975167299eb9168 Reviewed-on: https://go-review.googlesource.com/9013 Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21  runtime: multi-threaded, utilization-scheduled background mark  (Austin Clements)
Currently, the concurrent mark phase is performed by the main GC goroutine. Prior to the previous commit enabling preemption, this caused marking to always consume 1/GOMAXPROCS of the available CPU time. If GOMAXPROCS=1, this meant background GC would consume 100% of the CPU (effectively a STW). If GOMAXPROCS>4, background GC would use less than the goal of 25%. If GOMAXPROCS=4, background GC would use the goal 25%, but if the mutator wasn't using the remaining 75%, background marking wouldn't take advantage of the idle time. Enabling preemption in the previous commit made GC miss CPU targets in completely different ways, but set us up to bring everything back in line. This change replaces the fixed GC goroutine with per-P background mark goroutines. Once started, these goroutines don't go in the standard run queues; instead, they are scheduled specially such that the time spent in mutator assists and the background mark goroutines totals 25% of the CPU time available to the program. Furthermore, this lets background marking take advantage of idle Ps, which significantly boosts GC performance for applications that under-utilize the CPU. This requires also changing how time is reported for gctrace, so this change splits the concurrent mark CPU time into assist/background/idle scanning. This also requires increasing the size of the StackRecord slice used in a GoroutineProfile test. Change-Id: I0936ff907d2cee6cb687a208f2df47e8988e3157 Reviewed-on: https://go-review.googlesource.com/8850 Reviewed-by: Rick Hudson <rlh@golang.org>
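The scheduling arithmetic reduces to splitting 25% of GOMAXPROCS into whole "dedicated" workers plus one partially utilized "fractional" worker; a small worked example of that arithmetic (variable names are illustrative):

```go
package main

import "fmt"

func main() {
	const gcGoalUtilization = 0.25 // background mark's share of CPU time

	for _, gomaxprocs := range []int{1, 2, 4, 6, 8} {
		target := gcGoalUtilization * float64(gomaxprocs)
		dedicated := int(target)                  // whole Ps marking full time
		fractional := target - float64(dedicated) // remainder covered by one part-time worker
		if fractional > 0 {
			fmt.Printf("GOMAXPROCS=%d: %d dedicated + one fractional worker at %.0f%%\n",
				gomaxprocs, dedicated, fractional*100)
		} else {
			fmt.Printf("GOMAXPROCS=%d: %d dedicated worker(s), no fractional worker\n",
				gomaxprocs, dedicated)
		}
	}
}
```

For example, at GOMAXPROCS=6 the target is 1.5 Ps, so one P marks full time and a second worker aims for 50% utilization.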
2015-04-21  runtime: track time spent in mutator assists  (Austin Clements)
This time is tracked per P and periodically flushed to the global controller state. This will be used to compute mutator assist utilization in order to schedule background GC work. Change-Id: Ib94f90903d426a02cf488bf0e2ef67a068eb3eec Reviewed-on: https://go-review.googlesource.com/8837 Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21  runtime: proportional mutator assist  (Austin Clements)
Currently, mutator allocation periodically assists the garbage collector by performing a small, fixed amount of scanning work. However, to control heap growth, mutators need to perform scanning work *proportional* to their allocation rate. This change implements proportional mutator assists. This uses the scan work estimate computed by the garbage collector at the beginning of each cycle to compute how much scan work must be performed per allocation byte to complete the estimated scan work by the time the heap reaches the goal size. When allocation triggers an assist, it uses this ratio and the amount allocated since the last assist to compute the assist work, then attempts to steal as much of this work as possible from the background collector's credit, and then performs any remaining scan work itself. Change-Id: I98b2078147a60d01d6228b99afd414ef857e4fba Reviewed-on: https://go-review.googlesource.com/8836 Reviewed-by: Rick Hudson <rlh@golang.org>
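The proportionality comes down to one ratio fixed at the start of the cycle and one multiplication per assist; a sketch of that arithmetic with illustrative names and numbers:

```go
package main

import "fmt"

// assistRatio is scan work the mutator owes per byte it allocates.
// It is chosen so the estimated remaining scan work finishes by the
// time the heap reaches its goal size.
func assistRatio(scanWorkExpected, heapGoal, heapLive int64) float64 {
	headroom := heapGoal - heapLive
	if headroom <= 0 {
		headroom = 1 // heap already at goal: assist as hard as possible
	}
	return float64(scanWorkExpected) / float64(headroom)
}

func main() {
	ratio := assistRatio(
		8<<20,  // ~8M units of scan work still expected this cycle
		64<<20, // heap goal: 64 MB
		48<<20, // live heap right now: 48 MB
	)
	allocated := int64(1 << 20) // this goroutine allocated 1 MB since its last assist
	owedScanWork := ratio * float64(allocated)
	fmt.Printf("assist ratio %.2f, owes %.0f units of scan work\n", ratio, owedScanWork)
}
```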
2015-04-20  runtime: replace func-based write barrier skipping with type-based  (Russ Cox)
This CL revises CL 7504 to use explicitly uintptr types for the struct fields that are going to be updated sometimes without write barriers. The result is that the fields are now updated *always* without write barriers. This approach has two important properties: 1) Now the GC never looks at the field, so if the missing reference could cause a problem, it will do so all the time, not just when the write barrier is missed at just the right moment. 2) Now a write barrier never happens for the field, avoiding the (correct) detection of inconsistent write barriers when GODEBUG=wbshadow=1. Change-Id: Iebd3962c727c0046495cc08914a8dc0808460e0e Reviewed-on: https://go-review.googlesource.com/9019 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-17  runtime: fix dangling pointer in readyExecute  (Austin Clements)
readyExecute passes a closure to mcall that captures an argument to readyExecute. Since mcall is marked noescape, this closure lives on the stack of the calling goroutine. However, the closure puts the calling goroutine on the run queue (and switches to a new goroutine). If the calling goroutine gets scheduled before the mcall returns, this stack-allocated closure will become invalid while it's still executing. One consequence of this we've observed is that the captured gp variable can get overwritten before the call to execute(gp), causing execute(gp) to segfault. Fix this by passing the currently captured gp variable through a field in the calling goroutine's g struct so that the func is no longer a closure. To prevent problems like this in the future, this change also removes the go:noescape annotation from mcall. Due to a compiler bug, this will currently cause a func closure passed to mcall to be implicitly allocated rather than refusing the implicit allocation. However, this is okay because there are no other closures passed to mcall right now and the compiler bug will be fixed shortly. Fixes #10428. Change-Id: I49b48b85de5643323b89e9eaa4df63854e968c32 Reviewed-on: https://go-review.googlesource.com/8866 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-17  runtime: delete cgo_allocate  (Russ Cox)
This memory is untyped and can't be used anymore. The next version of SWIG won't need it. Change-Id: I592b287c5f5186975ee09a9b28d8efe3b57134e7 Reviewed-on: https://go-review.googlesource.com/8956 Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-15  runtime: merge slice and sliceStruct  (Michael Hudson-Doyle)
By removing type slice, renaming type sliceStruct to type slice and whacking until it compiles. Has a pleasing net reduction of conversions. Fixes #10188 Change-Id: I77202b8df637185b632fd7875a1fdd8d52c7a83c Reviewed-on: https://go-review.googlesource.com/8770 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-09  runtime: add isarchive, set by the linker  (David Crawshaw)
According to Go execution modes, a Go program compiled with -buildmode=c-archive has a main function, but it is ignored on run. This gives the runtime the information it needs not to run the main. I have this working with pending linker changes on darwin/amd64. Change-Id: I49bd7d65aa619ec847c464a872afa5deea7d4d30 Reviewed-on: https://go-review.googlesource.com/8701 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: David Crawshaw <crawshaw@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-03  runtime: initialize shared library at library-load time  (Srdjan Petrovic)
This is Part 2 of the change; see Part 1 here: https://go-review.googlesource.com/#/c/7692/

Suggested by iant@, we use the library initialization entry point to:

- create a new OS thread and run the "regular" runtime init stack on that thread
- return immediately from the main (i.e., loader) thread
- at the first CGO invocation, we wait for the runtime initialization to complete.

The above mechanism is implemented only on linux_amd64. Next step is to support it on linux_arm. Other platforms don't yet support shared library compiling/linking, but we intend to use the same strategy there as well.

Change-Id: Ib2c81b1b83bee837134084b75a3beecfb8de6bf4 Reviewed-on: https://go-review.googlesource.com/8094 Run-TryBot: Srdjan Petrovic <spetrovic@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-02  runtime: add cumulative GC CPU % to gctrace line  (Austin Clements)
This tracks both total CPU time used by GC and the total time available to all Ps since the beginning of the program and uses this to derive a cumulative CPU usage percent for the gctrace line. Change-Id: Ica85372b8dd45f7621909b325d5ac713a9b0d015 Reviewed-on: https://go-review.googlesource.com/8350 Reviewed-by: Russ Cox <rsc@golang.org>
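The percentage itself is a simple ratio of GC CPU time to the total CPU time available (elapsed wall-clock time times GOMAXPROCS); a sketch with illustrative numbers:

```go
package main

import (
	"fmt"
	"time"
)

// gcCPUPercent is total GC CPU time divided by the total CPU time
// available to the program since it started: wall-clock elapsed time
// multiplied by GOMAXPROCS.
func gcCPUPercent(totalGCCPU, elapsed time.Duration, gomaxprocs int) float64 {
	available := elapsed * time.Duration(gomaxprocs)
	return 100 * float64(totalGCCPU) / float64(available)
}

func main() {
	// E.g. 180ms of GC CPU over 10s of run time on 4 Ps -> 0.45%.
	fmt.Printf("%.2f%%\n", gcCPUPercent(180*time.Millisecond, 10*time.Second, 4))
}
```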
2015-03-26  runtime, runtime/cgo: make needextram a bool  (David Crawshaw)
Also invert it, which means it no longer needs to cross the cgo package boundary. Change-Id: I393cd073bda02b591a55d6bc6b8bb94970ea71cd Reviewed-on: https://go-review.googlesource.com/8082 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: David Crawshaw <crawshaw@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-03-20  runtime: add GODEBUG=sbrk=1 to bypass memory allocator (and GC)  (Russ Cox)
To reduce lock contention in this mode, makes persistent allocation state per-P, which means at most 64 kB overhead x $GOMAXPROCS, which should be completely tolerable. Change-Id: I34ca95e77d7e67130e30822e5a4aff6772b1a1c5 Reviewed-on: https://go-review.googlesource.com/7740 Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-17  runtime: fix writebarrier throw in lock_sema  (Russ Cox)
The value in question is really a bit pattern (a pointer with extra bits thrown in), so treat it as a uintptr instead, avoiding the generation of a write barrier when there might not be a p. Also add the obligatory //go:nowritebarrier. Change-Id: I4ea097945dd7093a140f4740bcadca3ce7191971 Reviewed-on: https://go-review.googlesource.com/7667 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2015-03-17  runtime: Remove write barriers during STW.  (Rick Hudson)
The GC assumes that there will be no asynchronous write barriers when the world is stopped. This keeps the synchronization between write barriers and the GC simple. However, currently, there are a few places in runtime code where this assumption does not hold. The GC stops the world by collecting all Ps, which stops all user Go code, but small parts of the runtime can run without a P. For example, the code that releases a P must still deschedule its G onto a runnable queue before stopping. Similarly, when a G returns from a long-running syscall, it must run code to reacquire a P. Currently, this code can contain write barriers. This can lead to the GC collecting reachable objects if something like the following sequence of events happens:

1. GC stops the world by collecting all Ps.
2. G #1 returns from a syscall (for example), tries to install a pointer to object X, and calls greyobject on X.
3. greyobject on G #1 marks X, but does not yet add it to a write buffer. At this point, X is effectively black, not grey, even though it may point to white objects.
4. GC reaches X through some other path and calls greyobject on X, but greyobject does nothing because X is already marked.
5. GC completes.
6. greyobject on G #1 adds X to a work buffer, but it's too late.
7. Objects that were reachable only through X are incorrectly collected.

To fix this, we check the invariant that no asynchronous write barriers happen when the world is stopped by checking that write barriers always have a P, and modify all currently known sources of these writes to disable the write barrier. In all modified cases this is safe because the object in question will always be reachable via some other path.

Some of the trace code was turned off, in particular the code that traces returning from a syscall. The GC assumes that as far as the heap is concerned the thread is stopped when it is in a syscall. Upon returning the trace code must not do any heap writes for the same reasons discussed above.

Fixes #10098 Fixes #9953 Fixes #9951 Fixes #9884 May relate to #9610 #9771

Change-Id: Ic2e70b7caffa053e56156838eb8d89503e3c0c8a Reviewed-on: https://go-review.googlesource.com/7504 Reviewed-by: Austin Clements <austin@google.com>
2015-03-11  runtime,reflect,cmd/internal/gc: Fix comments referring to .c/.h files  (Keith Randall)
Everything has moved to Go, but comments still refer to .c/.h files. Fix all of those up, at least for these three directories. Fixes #10138 Change-Id: Ie5efe89b247841e0b3f82aac5256b2c606ef67dc Reviewed-on: https://go-review.googlesource.com/7431 Reviewed-by: Russ Cox <rsc@golang.org>
2015-03-10  runtime: remove runtime frames from stacks in traces  (Dmitry Vyukov)
Strip uninteresting bottom and top frames from trace stacks. This makes both binary and json trace files smaller, and also makes stacks shorter and more readable in the viewer. Change-Id: Ib9c80ccc280504f0e235f867f53f1d2652c41583 Reviewed-on: https://go-review.googlesource.com/5523 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
2015-03-04  runtime: bound defer pools (try 2)  (Dmitry Vyukov)
The unbounded list-based defer pool can grow infinitely. This can happen if a goroutine routinely allocates a defer; then blocks on one P; and then unblocked, scheduled and frees the defer on another P. The scenario was reported on golang-nuts list. We've been here several times. Any unbounded local caches are bad and grow to infinite size. This change introduces central defer pool; local pools become fixed-size with the only purpose of amortizing accesses to the central pool. Freedefer now executes on system stack to not consume nosplit stack space. Change-Id: I1a27695838409259d1586a0adfa9f92bccf7ceba Reviewed-on: https://go-review.googlesource.com/3967 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
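A generic sketch of the two-level structure described: a fixed-capacity local cache that amortizes access to a central pool and cannot grow without bound. The runtime's version hangs the local cache off the P and runs freedefer on the system stack; everything below is illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

type deferRecord struct{ fn func() }

// central is the shared pool. It can hold any number of free records,
// but it is only touched when a local cache fills up or runs dry.
var central struct {
	sync.Mutex
	free []*deferRecord
}

// localPool is a fixed-size cache. Because its capacity is bounded, a
// worker that frees records it did not allocate cannot make the cache
// grow without limit; overflow is pushed to the central pool instead.
type localPool struct {
	buf [16]*deferRecord
	n   int
}

func (l *localPool) get() *deferRecord {
	if l.n == 0 {
		central.Lock()
		if last := len(central.free) - 1; last >= 0 {
			d := central.free[last]
			central.free = central.free[:last]
			central.Unlock()
			return d
		}
		central.Unlock()
		return new(deferRecord)
	}
	l.n--
	return l.buf[l.n]
}

func (l *localPool) put(d *deferRecord) {
	if l.n == len(l.buf) {
		// Local cache full: spill to the central pool instead of growing.
		central.Lock()
		central.free = append(central.free, d)
		central.Unlock()
		return
	}
	l.buf[l.n] = d
	l.n++
}

func main() {
	var p localPool
	d := p.get()
	p.put(d)
	fmt.Println("cached records:", p.n)
}
```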