path: root/src/runtime/runtime2.go
Age | Commit message | Author
2018-03-12 | runtime: convert g.waitreason from string to uint8 | Josh Bleecher Snyder
Every time I poke at #14921, the g.waitreason string pointer writes show up. They're not particularly important performance-wise, but it'd be nice to clear the noise away.

And it does open up a few extra bytes in the g struct for some future use.

Change-Id: I7ffbd52fbc2a286931a2218038fda52ed6473cc9
Reviewed-on: https://go-review.googlesource.com/99078
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
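Here is a minimal sketch of the idea, using a short illustrative list of reasons. The string field becomes a one-byte code, and the text is looked up in a table only when it is printed. The names waitReason and waitReasonStrings, the constants, and the toy goroutine struct are hypothetical here. Storing a uint8 involves no pointer, which is why the string-pointer writes disappear.

```go
package main

import "fmt"

// waitReason is a one-byte code that replaces a string field (hypothetical names).
type waitReason uint8

const (
	waitReasonZero        waitReason = iota // no reason recorded
	waitReasonChanReceive                   // "chan receive"
	waitReasonChanSend                      // "chan send"
	waitReasonSelect                        // "select"
	waitReasonSleep                         // "sleep"
)

// waitReasonStrings holds the text for each code. It is only read when a
// description is actually needed, for example when printing a traceback.
var waitReasonStrings = [...]string{
	waitReasonZero:        "",
	waitReasonChanReceive: "chan receive",
	waitReasonChanSend:    "chan send",
	waitReasonSelect:      "select",
	waitReasonSleep:       "sleep",
}

// String returns the description and guards against out-of-range codes.
func (w waitReason) String() string {
	if int(w) >= len(waitReasonStrings) {
		return "unknown wait reason"
	}
	return waitReasonStrings[w]
}

// goroutine is a toy stand-in for the runtime's g struct. Setting
// waitreason is a plain one-byte store: there is no pointer to write.
type goroutine struct {
	waitreason waitReason
}

func main() {
	g := goroutine{waitreason: waitReasonChanReceive}
	fmt.Println(g.waitreason) // chan receive
}
```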
2018-03-08 | runtime: explain and enforce that _panic values live on the stack | Austin Clements
It's a bit mysterious that _defer.sp is a uintptr that gets stack-adjusted explicitly while _panic.argp is an unsafe.Pointer that doesn't, but this turns out to be critically important when a deferred function grows the stack before doing a recover.

Add a comment explaining that this works because _panic values live on the stack. Enforce this by marking _panic go:notinheap.

Change-Id: I9ca49e84ee1f86d881552c55dccd0662b530836b
Reviewed-on: https://go-review.googlesource.com/99735
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-03-07 | runtime: get traceback from VDSO code | Ian Lance Taylor
Currently if a profiling signal arrives while executing within a VDSO the profiler will report _ExternalCode, which is needlessly confusing for a pure Go program. Change the VDSO calling code to record the caller's PC/SP, so that we can do a traceback from that point. If that fails for some reason, report _VDSO rather than _ExternalCode, which should at least point in the right direction.

This adds some instructions to the code that calls the VDSO, but the slowdown is reasonably negligible:

    name                                  old time/op  new time/op  delta
    ClockVDSOAndFallbackPaths/vDSO-8      40.5ns ± 2%  41.3ns ± 1%  +1.85%  (p=0.002 n=10+10)
    ClockVDSOAndFallbackPaths/Fallback-8  41.9ns ± 1%  43.5ns ± 1%  +3.84%  (p=0.000 n=9+9)
    TimeNow-8                             41.5ns ± 3%  41.5ns ± 2%    ~     (p=0.723 n=10+10)

Fixes #24142

Change-Id: Iacd935db3c4c782150b3809aaa675a71799b1c9c
Reviewed-on: https://go-review.googlesource.com/97315
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-03-04 | internal/bytealg: move Count to bytealg | Keith Randall
Move bytes.Count and strings.Count to bytealg.

Update #19792

Change-Id: I3e4e14b504a0b71758885bb131e5656e342cf8cb
Reviewed-on: https://go-review.googlesource.com/98495
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-12-01 | runtime: restore the Go-allocated signal stack in unminit | Austin Clements
Currently, when we minit on a thread that already has an alternate signal stack (e.g., because the M was an extram being used for a cgo callback, or to handle a signal on a C thread, or because the platform's libc always allocates a signal stack like on Android), we simply drop the Go-allocated gsignal stack on the floor.

This is a problem for Ms on the extram list because those Ms may later be reused for a different thread that may not have its own alternate signal stack. On tip, this manifests as a crash in sigaltstack because we clear the gsignal stack bounds in unminit and later try to use those cleared bounds when we re-minit that M. On 1.9 and earlier, we didn't clear the bounds, so this manifests as running more than one signal handler on the same signal stack, which could lead to arbitrary memory corruption.

This CL fixes this problem by saving the Go-allocated gsignal stack in a new field in the m struct when overwriting it with a system-provided signal stack, and then restoring the original gsignal stack in unminit.

This CL is designed to be easy to back-port to 1.9. It won't quite cherry-pick cleanly, but it should be sufficient to simply ignore the change in mexit (which didn't exist in 1.9).

Now that we always have a place to stash the original signal stack in the m struct, there are some simplifications we can make to the signal stack handling. We'll do those in a later CL.

Fixes #22930.

Change-Id: I55c5a6dd9d97532f131146afdef0b216e1433054
Reviewed-on: https://go-review.googlesource.com/81476
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-11-30 | runtime: don't block signals that will kill the program | Ian Lance Taylor
Otherwise we may delay the delivery of these signals for an arbitrary length of time. We are already careful to not block signals that the program has asked to see.

Also make sure that we don't miss a signal delivery if a thread decides to stop for a while while executing the signal handler.

Also clean up the TestAtomicStop output a little bit.

Fixes #21433

Change-Id: Ic0c1a4eaf7eba80d1abc1e9537570bf4687c2434
Reviewed-on: https://go-review.googlesource.com/79581
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-10-30 | runtime: buffered write barrier implementation | Austin Clements
This implements runtime support for buffered write barriers on amd64. The buffered write barrier has a fast path that simply enqueues pointers in a per-P buffer. Unlike the current write barrier, this fast path is *not* a normal Go call and does not require the compiler to spill general-purpose registers or put arguments on the stack. When the buffer fills up, the write barrier takes the slow path, which spills all general purpose registers and flushes the buffer. We don't allow safe-points or stack splits while this frame is active, so it doesn't matter that we have no type information for the spilled registers in this frame.

One minor complication is cgocheck=2 mode, which uses the write barrier to detect Go pointers being written to non-Go memory. We obviously can't buffer this, so instead we set the buffer to its minimum size, forcing the write barrier into the slow path on every call. For this specific case, we pass additional information as arguments to the flush function. This also requires enabling the cgo write barrier slightly later during runtime initialization, after Ps (and the per-P write barrier buffers) have been initialized.

The code in this CL is not yet active. The next CL will modify the compiler to generate calls to the new write barrier.

This reduces the average cost of the write barrier by roughly a factor of 4, which will pay for the cost of having it enabled more of the time after we make the GC pacer less aggressive. (Benchmarks will be in the next CL.)

Updates #14951.
Updates #22460.

Change-Id: I396b5b0e2c5e5c4acfd761a3235fd15abadc6cb1
Reviewed-on: https://go-review.googlesource.com/73711
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
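The fast-path/slow-path split described above can be modeled in ordinary Go. This is only a sketch. The real fast path is assembly that avoids a normal Go call, which plain Go cannot express. The wbBuf type, newWBBuf, and the 8-entry capacity are hypothetical. The forceSlowPath option mimics the cgocheck=2 approach of shrinking the buffer so that every call flushes.

```go
package main

import "fmt"

// wbBufEntries is deliberately tiny so the flushing is easy to follow.
const wbBufEntries = 8

// wbBuf is a toy per-P write barrier buffer. "Pointers" are plain uintptr
// values, and "shading" just appends them to a slice.
type wbBuf struct {
	buf     [wbBufEntries]uintptr
	next    int       // index of the next free slot
	end     int       // the fast path flushes once next reaches end
	flushes int       // number of slow-path calls
	shaded  []uintptr // stands in for handing pointers to the GC
}

// newWBBuf returns an empty buffer. forceSlowPath models the cgocheck=2
// trick: shrinking the usable size to one entry so every call flushes.
func newWBBuf(forceSlowPath bool) *wbBuf {
	b := &wbBuf{end: wbBufEntries}
	if forceSlowPath {
		b.end = 1
	}
	return b
}

// writeBarrier is the fast path: record the pointer and return, unless the
// buffer has just become full.
func (b *wbBuf) writeBarrier(ptr uintptr) {
	b.buf[b.next] = ptr
	b.next++
	if b.next == b.end {
		b.flush()
	}
}

// flush is the slow path: process every buffered pointer, then reset.
func (b *wbBuf) flush() {
	b.flushes++
	b.shaded = append(b.shaded, b.buf[:b.next]...)
	b.next = 0
}

func main() {
	b := newWBBuf(false)
	for p := uintptr(1); p <= 20; p++ {
		b.writeBarrier(p)
	}
	fmt.Println(b.flushes, b.next) // 2 4
}
```

With 20 writes and 8 slots, only two calls take the slow path. The remaining 4 pointers wait in the buffer until the next flush.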
2017-10-29 | runtime: remove write barriers from newstack, gogo | Austin Clements
Currently, newstack and gogo have write barriers for maintaining the context register saved in g.sched.ctxt. This is troublesome, because newstack can be called from go:nowritebarrierrec places that can't allow write barriers. It happens to be benign because g.sched.ctxt will always be nil on entry to newstack *and* it so happens the incoming ctxt will also always be nil in these contexts (I think/hope), but this is playing with fire. It's also desirable to mark newstack go:nowritebarrierrec to prevent any other, non-benign write barriers from creeping in, but we can't do that right now because of this one write barrier.

Fix all of this by observing that g.sched.ctxt is really just a saved live pointer register. Hence, we can shade it when we scan g's stack and otherwise move it back and forth between the actual context register and g.sched.ctxt without write barriers. This means we can save it in morestack along with all of the other g.sched, eliminate the save from newstack along with its troublesome write barrier, and eliminate the shenanigans in gogo to invoke the write barrier when restoring it.

Once we've done all of this, we can mark newstack go:nowritebarrierrec.

Fixes #22385.
For #22460.

Change-Id: I43c24958e3f6785b53c1350e1e83c2844e0d1522
Reviewed-on: https://go-review.googlesource.com/72553
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-13 | runtime: schedule fractional workers on all Ps | Austin Clements
Currently only a single P can run a fractional mark worker at a time. This doesn't let us spread out the load, so it gets concentrated on whatever unlucky P picks up the token to run a fractional worker. This can significantly delay goroutines on that P.

This commit changes this scheduling rule so each P separately schedules fractional workers. This can significantly reduce the load on any individual P and allows workers to self-preempt earlier. It does have the downside that it's possible for all Ps to be in fractional workers simultaneously (an effective STW).

Updates #21698.

Change-Id: Ia1e300c422043fa62bb4e3dd23c6232d81e4419c
Reviewed-on: https://go-review.googlesource.com/68574
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-10-13 | runtime: preempt fractional worker after reaching utilization goal | Austin Clements
Currently fractional workers run until preempted by the scheduler, which means they typically run for 20ms. During this time, all other goroutines on that P are blocked, which can introduce significant latency variance.

This modifies fractional workers to self-preempt shortly after achieving the fractional utilization goal. In practice this means they preempt much sooner, and the scale of their preemption is on the order of how often the user goroutines block (so, if the application is compute-bound, the fractional workers will also run for long times, but if the application blocks frequently, the fractional workers will also preempt quickly).

Fixes #21698.
Updates #18534.

Change-Id: I03a5ab195dae93154a46c32083c4bb52415d2017
Reviewed-on: https://go-review.googlesource.com/68573
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
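The self-preemption rule can be sketched as a utilization check. Compare the time this P has spent in fractional mark work against the elapsed mark time, and stop once the ratio exceeds the goal. The function name shouldPreempt and the 1.2 slack factor are assumptions for illustration, not the runtime's code.

```go
package main

import (
	"fmt"
	"time"
)

// slack lets a worker run slightly past its goal before yielding. The
// value 1.2 is an assumption for this sketch.
const slack = 1.2

// shouldPreempt reports whether a fractional mark worker should stop itself.
// elapsed is the time since the mark phase began, selfTime is the time this
// P has spent in fractional mark work during that period, and goal is the
// target fraction of one P's time (0.25 means 25%).
func shouldPreempt(elapsed, selfTime time.Duration, goal float64) bool {
	if elapsed <= 0 {
		return true
	}
	utilization := float64(selfTime) / float64(elapsed)
	return utilization > slack*goal
}

func main() {
	// 2ms of work in 10ms is 20%, under the 30% threshold: keep working.
	fmt.Println(shouldPreempt(10*time.Millisecond, 2*time.Millisecond, 0.25)) // false
	// 4ms of work in 10ms is 40%: yield so user goroutines on this P can run.
	fmt.Println(shouldPreempt(10*time.Millisecond, 4*time.Millisecond, 0.25)) // true
}
```

Because the check is driven by the worker's own share of time, a worker on a P whose goroutines block often reaches the threshold, and yields, quickly.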
2017-10-11 | runtime: make it possible to exit Go-created threads | Austin Clements
Currently, threads created by the runtime exist until the whole program exits. For #14592 and #20395, we want to be able to exit and clean up threads created by the runtime. This commit implements that mechanism.

The main difficulty is how to clean up the g0 stack. In cgo mode and on Solaris and Windows where the OS manages thread stacks, we simply arrange to return from mstart and let the system clean up the thread. If the runtime allocated the g0 stack, then we use a new exitThread syscall wrapper that arranges to clear a flag in the M once the stack can safely be reaped and call the thread termination syscall.

exitThread is based on the existing exit1 wrapper, which was always meant to terminate the calling thread. However, exit1 has never been used since it was introduced 9 years ago, so it was broken on several platforms. exitThread also has the additional complication of having to flag that the stack is unused, which requires some tricks on platforms that use the stack for syscalls.

This still leaves the problem of how to reap the unused g0 stacks. For this, we move the M from allm to a new freem list as part of the M exiting. Later, allocm scans the freem list, finds Ms that are marked as done with their stack, removes these from the list and frees their g0 stacks. This also allows these Ms to be garbage collected.

This CL does not yet use any of this functionality. Follow-up CLs will. Likewise, there are no new tests in this CL because we'll need follow-up functionality to test it.

Change-Id: Ic851ee74227b6d39c6fc1219fc71b45d3004bc63
Reviewed-on: https://go-review.googlesource.com/46037
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
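The reaping scheme can be sketched as a singly linked list walk. Exiting Ms are parked on freem until they clear a flag, and a later pass frees their stacks. Everything here is a hypothetical, single-threaded model. The m fields and the helpers mexit and reapFreeM are simplified for illustration. The real runtime would also hold the scheduler lock while touching the list.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// m is a toy thread descriptor. freeWait stays set while the exiting thread
// may still be running on its g0 stack. The thread clears it as its last
// action before the termination syscall.
type m struct {
	id       int64
	g0stack  []byte
	freeWait uint32
	freelink *m // next entry on the freem list
}

var freem *m // exited Ms whose stacks may not be reusable yet

// mexit moves an exiting M onto the freem list, still flagged as in use.
func mexit(mp *m) {
	atomic.StoreUint32(&mp.freeWait, 1)
	mp.freelink = freem
	freem = mp
}

// reapFreeM walks freem. It releases the g0 stack of every M that has
// finished with it, keeps the rest for a later pass, and returns the number
// it freed.
func reapFreeM() int {
	freed := 0
	var keep *m
	for mp := freem; mp != nil; {
		next := mp.freelink
		if atomic.LoadUint32(&mp.freeWait) != 0 {
			mp.freelink = keep // still in use: keep it on the list
			keep = mp
		} else {
			mp.g0stack = nil // "free" the stack; the M can now be collected
			freed++
		}
		mp = next
	}
	freem = keep
	return freed
}

func main() {
	a := &m{id: 1, g0stack: make([]byte, 8192)}
	b := &m{id: 2, g0stack: make([]byte, 8192)}
	mexit(a)
	mexit(b)
	atomic.StoreUint32(&a.freeWait, 0) // thread 1 has left its stack
	n := reapFreeM()
	fmt.Println(n, freem.id) // 1 2
}
```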
2017-10-11 | runtime: replace sched.mcount int32 with sched.mnext int64 | Austin Clements
Currently, since Ms never exit, the number of Ms, the number of Ms ever created, and the ID of the next M are all the same and must be small. That's about to change, so rename sched.mcount to sched.mnext to make it clear it's the number of Ms ever created (and the ID of the next M), change its type to int64, and use mcount() for the number of Ms. In the next commit, mcount() will become slightly less trivial.

For #20395.

Change-Id: I9af34d36bd72416b5656555d16e8085076f1b196
Reviewed-on: https://go-review.googlesource.com/68750
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-11 | runtime: make m.nextwaitm an muintptr | Austin Clements
This field is really a *m (modulo its bottom bit). Change it from uintptr to muintptr to document this fact.

Change-Id: I2d181a955ef1d2c1a268edf20091b440d85726c9
Reviewed-on: https://go-review.googlesource.com/46034
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-10-11 | runtime: don't start new threads from locked threads | Austin Clements
Applications that need to manipulate kernel thread state are currently on thin ice in Go: they can use LockOSThread to prevent other goroutines from running on the manipulated thread, but Go may clone this manipulated state into a new thread that's put into the runtime's thread pool along with other threads.

Fix this by never starting a new thread from a locked thread or a thread that may have been started by C. Instead, the runtime starts a "template thread" with a known-good state. If it then needs to start a new thread but doesn't know that the current thread is in a good state, it forwards the thread creation to the template thread.

Fixes #20676.

Change-Id: I798137a56e04b7723d55997e9c5c085d1d910643
Reviewed-on: https://go-review.googlesource.com/46033
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-05 | runtime: make LockOSThread/UnlockOSThread nested | Austin Clements
Currently, there is a single bit for LockOSThread, so two calls to LockOSThread followed by one call to UnlockOSThread will unlock the thread. There's evidence (#20458) that this is almost never what people want or expect and it makes these APIs very hard to use correctly or reliably.

Change this so LockOSThread/UnlockOSThread can be nested and the calling goroutine will not be unwired until UnlockOSThread has been called as many times as LockOSThread has. This should fix the vast majority of incorrect uses while having no effect on the vast majority of correct uses.

Fixes #20458.

Change-Id: I1464e5e9a0ea4208fbb83638ee9847f929a2bacb
Reviewed-on: https://go-review.googlesource.com/45752
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
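Nesting amounts to replacing a single bit with a counter. The lockState type below is a hypothetical model of that rule, not the runtime's representation. The main function also shows the balanced pattern with the real runtime.LockOSThread and runtime.UnlockOSThread. Their documentation states that calling UnlockOSThread without a matching lock is a no-op, and the model follows the same rule.

```go
package main

import (
	"fmt"
	"runtime"
)

// lockState models nested wiring: the goroutine stays wired to its thread
// until unlock has been called as many times as lock.
type lockState struct {
	count uint32
}

func (s *lockState) lock() {
	s.count++
}

// unlock undoes one lock. An unbalanced unlock is a no-op.
func (s *lockState) unlock() {
	if s.count == 0 {
		return
	}
	s.count--
}

// wired reports whether the goroutine is still locked to its thread.
func (s *lockState) wired() bool {
	return s.count > 0
}

func main() {
	var s lockState
	s.lock()
	s.lock()
	s.unlock()
	fmt.Println(s.wired()) // true: one lock is still outstanding

	// With the real API (Go 1.10 and later) the same balanced pattern works.
	runtime.LockOSThread()
	runtime.LockOSThread()
	runtime.UnlockOSThread()
	runtime.UnlockOSThread() // the thread is released only after this call
}
```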
2017-09-27 | runtime: eliminate GOMAXPROCS limit | Austin Clements
Now that allp is dynamically allocated, there's no need for a hard cap on GOMAXPROCS.

Fixes #15131.

Change-Id: I53eee8e228a711a818f7ebce8d9fd915b3865eed
Reviewed-on: https://go-review.googlesource.com/45574
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-09-27 | runtime: dynamically allocate allp | Austin Clements
This makes it possible to eliminate the hard cap on GOMAXPROCS.

Updates #15131.

Change-Id: I4c422b340791621584c118a6be1b38e8a44f8b70
Reviewed-on: https://go-review.googlesource.com/45573
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-09-22 | runtime: don't call lockOSThread for every cgo call | Ian Lance Taylor
For a trivial benchmark with a do-nothing cgo call:

    name    old time/op  new time/op  delta
    Call-4  64.5ns ± 7%  63.0ns ± 6%  -2.25%  (p=0.027 n=20+16)

Because Windows uses the cgocall mechanism to make system calls, and passes arguments in a struct held in the m, we need to do the lockOSThread/unlockOSThread in that code.

Because deferreturn was getting a nosplit stack overflow error, change it to avoid calling typedmemmove.

Updates #21827.

Change-Id: I9b1d61434c44faeb29805b46b409c812c9acadc2
Reviewed-on: https://go-review.googlesource.com/64070
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2017-09-16 | runtime: improve fastrand with a better generator | Giovanni Bajo
The current generator is a simple LFSR, which showed strong correlation in higher bits, as manifested by fastrandn(). Replace it with xorshift64+, which is slightly more complex and has a larger state, but has a period of 2^64-1 and is much better at statistical tests. The version used here is capable of passing Diehard and even SmallCrush.

Speed is slightly worse but is probably insignificant:

    name                old time/op  new time/op  delta
    Fastrand-4          0.77ns ±12%  0.91ns ±21%  +17.31%  (p=0.048 n=5+5)
    FastrandHashiter-4  13.6ns ±21%  15.2ns ±17%     ~     (p=0.160 n=6+5)
    Fastrandn/2-4       2.30ns ± 5%  2.45ns ±15%     ~     (p=0.222 n=5+5)
    Fastrandn/3-4       2.36ns ± 7%  2.45ns ± 6%     ~     (p=0.222 n=5+5)
    Fastrandn/4-4       2.33ns ± 8%  2.61ns ±30%     ~     (p=0.126 n=6+5)
    Fastrandn/5-4       2.33ns ± 5%  2.48ns ± 9%     ~     (p=0.052 n=6+5)

Fixes #21806

Change-Id: I013bb37b463fdfc229a7f324df8fe2da8d286f33
Reviewed-on: https://go-review.googlesource.com/62530
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
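The sketch below shows an xorshift+ step with 64 bits of state held as two 32-bit words. It also shows the multiply-and-shift reduction that fastrandn-style bounded sampling relies on. The rng type and its methods are hypothetical. The shift triplet (17, 7, 16) is an assumption modeled on common 32-bit xorshift+ variants, so treat this as illustrative rather than the runtime's exact code.

```go
package main

import "fmt"

// rng holds 64 bits of state as two 32-bit words. The state must not be all
// zero: the xorshift step maps zero to zero.
type rng struct{ s [2]uint32 }

// next returns the next 32-bit value. The two state words are mixed with
// shifts and XORs (the "xorshift" part), and their sum is returned (the "+"
// part). Addition mod 2^32 adds nonlinearity that plain XOR/shift lacks.
func (r *rng) next() uint32 {
	s1, s0 := r.s[0], r.s[1]
	s1 ^= s1 << 17
	s1 = s1 ^ s0 ^ s1>>7 ^ s0>>16
	r.s[0], r.s[1] = s0, s1
	return s0 + s1
}

// nextn returns a value in [0, n) by multiply-and-shift. The result is
// driven mostly by the high bits of next(), which is why correlation in
// those bits (the old generator's weakness) showed up here.
func (r *rng) nextn(n uint32) uint32 {
	return uint32(uint64(r.next()) * uint64(n) >> 32)
}

func main() {
	r := rng{s: [2]uint32{0x12345678, 0x9abcdef0}}
	for i := 0; i < 5; i++ {
		fmt.Print(r.nextn(6), " ")
	}
	fmt.Println()
}
```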
2017-09-15 | runtime: change lockedg/lockedm to guintptr/muintptr | Ian Lance Taylor
This change has no real effect in itself. This is to prepare for a followup change that will call lockOSThread during a cgo callback when there is no p assigned, and therefore when lockOSThread can not use a write barrier.

Change-Id: Ia122d41acf54191864bcb68f393f2ed3b2f87abc
Reviewed-on: https://go-review.googlesource.com/63630
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-08-29 | runtime: forward crashing signals to late handlers | Elias Naur
CL 49590 made it possible for external signal handlers to catch signals from a crashing Go process. This CL extends that support to handlers registered after the Go runtime has initialized.

Updates #20392 (and possibly fix it).

Change-Id: I18eccd5e958a505f4d1782a7fc51c16bd3a4ff9c
Reviewed-on: https://go-review.googlesource.com/57291
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-25 | runtime: unify sigTabT type across Unix systems | Ian Lance Taylor
Change-Id: I8e8a3a118b1216f191c9076b70a88f6f3f19f79f
Reviewed-on: https://go-review.googlesource.com/59150
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-08-15 | runtime: move selectdone into g | Daniel Morsing
Writing to selectdone on the stack of another goroutine meant a pretty subtle dance between the select code and the stack copying code. Instead move the selectdone variable into the g struct.

Change-Id: Id246aaf18077c625adef7ca2d62794afef1bdd1b
Reviewed-on: https://go-review.googlesource.com/53390
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-15 | runtime: remove link field from itab | Keith Randall
We don't use it any more, remove it.

Change-Id: I76ce1a4c2e7048fdd13a37d3718b5abf39ed9d26
Reviewed-on: https://go-review.googlesource.com/44474
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-08-15 | runtime: remove bad field from itab | Keith Randall
Just use fun[0]==0 to indicate a bad itab.

Change-Id: I28ecb2d2d857090c1ecc40b1d1866ac24a844848
Reviewed-on: https://go-review.googlesource.com/44473
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-08-15 | runtime: new itab lookup table | Keith Randall
Keep itabs in a growable hash table. Use a simple open-addressable hash table, quadratic probing, power of two sized.

Synchronization gets a bit more tricky. The common read path now has two atomic reads, one to get the table pointer and one to read the entry out of the table.

I set the max load factor to 75%, kind of arbitrarily. There's a space-speed tradeoff here, and I'm not sure where we should land.

Because we use open addressing the itab.link field is no longer needed. I'll remove it in a separate CL.

Fixes #20505

Change-Id: Ifb3d9a337512d6cf968c1fceb1eeaf89559afebf
Reviewed-on: https://go-review.googlesource.com/44472
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
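Here is a minimal single-threaded sketch of the table shape described above: open addressing, power-of-two size, triangular (quadratic) probing, and growth before the load factor passes 75%. It uses string keys and FNV-1a from hash/fnv purely for illustration. The runtime keys itabs by the (interface type, concrete type) pair and reads the table with atomic loads; both are omitted here.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

type entry struct {
	key   string
	value int
	used  bool
}

// table is an open-addressed hash table whose size is always a power of two.
type table struct {
	entries []entry
	count   int
}

func newTable() *table { return &table{entries: make([]entry, 8)} }

func hashKey(k string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(k))
	return h.Sum64()
}

// find returns the slot that holds key, or the empty slot where it belongs.
// The probe offsets are triangular numbers (h, h+1, h+3, h+6, ...), a form
// of quadratic probing that visits every slot when the size is a power of two.
func (t *table) find(key string) int {
	mask := uint64(len(t.entries) - 1)
	h := hashKey(key) & mask
	for i := uint64(1); ; i++ {
		e := &t.entries[h]
		if !e.used || e.key == key {
			return int(h)
		}
		h = (h + i) & mask
	}
}

func (t *table) get(key string) (int, bool) {
	e := t.entries[t.find(key)]
	return e.value, e.used
}

func (t *table) put(key string, value int) {
	// Grow before the load factor would exceed 75%. This keeps probe
	// sequences short and guarantees that find reaches an empty slot.
	if 4*(t.count+1) > 3*len(t.entries) {
		t.grow()
	}
	i := t.find(key)
	if !t.entries[i].used {
		t.count++
	}
	t.entries[i] = entry{key: key, value: value, used: true}
}

// grow doubles the table and reinserts every live entry.
func (t *table) grow() {
	old := t.entries
	t.entries = make([]entry, 2*len(old))
	t.count = 0
	for _, e := range old {
		if e.used {
			t.put(e.key, e.value)
		}
	}
}

func main() {
	tab := newTable()
	for i, k := range []string{"a", "b", "c", "d", "e", "f", "g"} {
		tab.put(k, i)
	}
	v, ok := tab.get("c")
	fmt.Println(v, ok, len(tab.entries)) // 2 true 16
}
```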
2017-08-15 | runtime: remove unused global variable emptystring | Ian Lance Taylor
Last runtime use was removed in https://golang.org/cl/133700043, September 2014.

Replace plan9 syscall uses with plan9-specific variable.

Change-Id: Ifb910c021c1419a7c782959f90b054ed600d9e19
Reviewed-on: https://go-review.googlesource.com/55450
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-06-14 | runtime: move pdesc into p | Austin Clements
There are currently two arrays indexed by P ID: allp and pdesc. Consolidate these by moving the pdesc fields into type p so they can be indexed off allp along with all other per-P state.

For #15131.

Change-Id: Ib6c4e6e7612281a1171ba4a0d62e52fd59e960b4
Reviewed-on: https://go-review.googlesource.com/45572
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-06-13 | runtime: increase MaxGomaxprocs to 1024 | Austin Clements
Currently MaxGomaxprocs is 256. The previous CL saved enough per-P static space that we can quadruple MaxGomaxprocs (and hence the static size of allp) and still come out ahead.

This is safe for Go 1.9. In Go 1.10 we'll eliminate the hard-coded limit entirely.

Updates #15131.

Change-Id: I919ea821c1ce64c27812541dccd7cd7db4122d16
Reviewed-on: https://go-review.googlesource.com/45673
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-06-05 | runtime: delay exiting while panic is running deferred functions | Ian Lance Taylor
Try to avoid a race between the main goroutine exiting and a panic occurring. Don't try too hard, to avoid hanging.

Updates #3934
Fixes #20018

Change-Id: I57a02b6d795d2a61f1cadd137ce097145280ece7
Reviewed-on: https://go-review.googlesource.com/41052
Reviewed-by: Austin Clements <austin@google.com>
2017-05-10 | runtime: remove unused cpuid_X variables | Martin Möhrmann
They are not exported and not used in the compiler or standard library.

Change-Id: Ie1d210464f826742d282f12258ed1792cbd2d188
Reviewed-on: https://go-review.googlesource.com/43135
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-01 | runtime: refactor cpu feature detection for 386 & amd64 | Martin Möhrmann
Changes all cpu features to be detected and stored in bools in rt0_go.

Updates: #15403

Change-Id: I5a9961cdec789b331d09c44d86beb53833d5dc3e
Reviewed-on: https://go-review.googlesource.com/41950
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-25 | runtime: simplify detection of preference to use AVX memmove | Martin Möhrmann
Reduces cmd/go by 4464 bytes on amd64.

Removes the duplicate detection of AVX support and presence of Intel processors.

Change-Id: I4670189951a63760fae217708f68d65e94a30dc5
Reviewed-on: https://go-review.googlesource.com/41570
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-19 | runtime: record swept and reclaimed bytes in sweep trace | Austin Clements
This extends the GCSweepDone event with counts of swept and reclaimed bytes. These are useful for understanding the duration and effectiveness of sweep events.

Change-Id: I3c97a4f0f3aad3adbd188adb264859775f54e2df
Reviewed-on: https://go-review.googlesource.com/40811
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2017-04-19 | runtime: make sweep trace events encompass entire sweep loop | Austin Clements
Currently, each individual span sweep emits a span to the trace. But sweeps are generally done in loops until some condition is satisfied, so this tracing is lower-level than anyone really wants and hides the fact that no other work is being accomplished between adjacent sweep events. This is also high overhead: enabling tracing significantly impacts sweep latency.

Replace this with instead tracing around the sweep loops used for allocation. This is slightly tricky because sweep loops don't generally know if any sweeping will happen in them. Hence, we make the tracing lazy by recording in the P that we would like to start tracing the sweep *if* one happens, and then only closing the sweep event if we started it.

This does mean we don't get tracing on every sweep path, which are legion. However, we get much more informative tracing on the paths that block allocation, which are the paths that matter.

Change-Id: I73e14fbb250acb0c9d92e3648bddaa5e7d7e271c
Reviewed-on: https://go-review.googlesource.com/40810
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
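The lazy start/stop protocol can be sketched as two flags on a P. One records that a sweep loop would like to be traced. The other records that a start event was actually emitted. The p struct, the method names, and the string events are hypothetical stand-ins for the runtime's trace buffer.

```go
package main

import "fmt"

// p is a toy per-P state for lazy sweep tracing.
type p struct {
	traceSweep   bool     // a sweep loop is active and would like to be traced
	sweepStarted bool     // a start event has actually been emitted
	swept        uintptr  // bytes swept in the current loop
	events       []string // stands in for the trace buffer
}

// sweepLoopBegin records the intent to trace but emits nothing yet.
func (pp *p) sweepLoopBegin() {
	pp.traceSweep = true
	pp.sweepStarted = false
	pp.swept = 0
}

// sweepSpan is called only when a span is really swept. The first call in a
// traced loop emits the deferred start event.
func (pp *p) sweepSpan(bytes uintptr) {
	if pp.traceSweep && !pp.sweepStarted {
		pp.events = append(pp.events, "GCSweepStart")
		pp.sweepStarted = true
	}
	pp.swept += bytes
}

// sweepLoopEnd closes the event only if one was opened.
func (pp *p) sweepLoopEnd() {
	if pp.sweepStarted {
		pp.events = append(pp.events, fmt.Sprintf("GCSweepDone swept=%d", pp.swept))
	}
	pp.traceSweep = false
	pp.sweepStarted = false
}

func main() {
	var pp p
	pp.sweepLoopBegin()
	pp.sweepLoopEnd() // nothing was swept, so no events are emitted
	pp.sweepLoopBegin()
	pp.sweepSpan(8192)
	pp.sweepSpan(8192)
	pp.sweepLoopEnd()
	fmt.Println(pp.events) // [GCSweepStart GCSweepDone swept=16384]
}
```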
2017-03-21 | bytes: add optimized countByte for amd64 | Josselin Costanzi
Use SSE/AVX2 when counting a single byte. Inspired by the runtime indexbyte implementation.

Benchmark against previous implementation, where 1 byte in every 8 is the one we are looking for:

* On a machine without AVX2

    name               old time/op  new time/op  delta
    CountSingle/10-4   61.8ns ±10%  15.6ns ±11%  -74.83%  (p=0.000 n=10+10)
    CountSingle/32-4    100ns ± 4%    17ns ±10%  -82.54%  (p=0.000 n=10+9)
    CountSingle/4K-4   9.66µs ± 3%  0.37µs ± 6%  -96.21%  (p=0.000 n=10+10)
    CountSingle/4M-4   11.0ms ± 6%   0.4ms ± 4%  -96.04%  (p=0.000 n=10+10)
    CountSingle/64M-4   194ms ± 8%     8ms ± 2%  -95.64%  (p=0.000 n=10+10)

    name               old speed      new speed       delta
    CountSingle/10-4    162MB/s ±10%    645MB/s ±10%   +297.00%  (p=0.000 n=10+10)
    CountSingle/32-4    321MB/s ± 5%   1844MB/s ± 9%   +474.79%  (p=0.000 n=10+9)
    CountSingle/4K-4    424MB/s ± 3%  11169MB/s ± 6%  +2533.10%  (p=0.000 n=10+10)
    CountSingle/4M-4    381MB/s ± 7%   9609MB/s ± 4%  +2421.88%  (p=0.000 n=10+10)
    CountSingle/64M-4   346MB/s ± 7%   7924MB/s ± 2%  +2188.78%  (p=0.000 n=10+10)

* On a machine with AVX2

    name               old time/op  new time/op  delta
    CountSingle/10-8   37.1ns ± 3%   8.2ns ± 1%  -77.80%  (p=0.000 n=10+10)
    CountSingle/32-8   66.1ns ± 3%   9.8ns ± 2%  -85.23%  (p=0.000 n=10+10)
    CountSingle/4K-8   7.36µs ± 3%  0.11µs ± 1%  -98.54%  (p=0.000 n=10+10)
    CountSingle/4M-8   7.46ms ± 2%  0.15ms ± 2%  -97.95%  (p=0.000 n=10+9)
    CountSingle/64M-8   124ms ± 2%     6ms ± 4%  -95.09%  (p=0.000 n=10+10)

    name               old speed      new speed       delta
    CountSingle/10-8    269MB/s ± 3%   1213MB/s ± 1%   +350.32%  (p=0.000 n=10+10)
    CountSingle/32-8    484MB/s ± 4%   3277MB/s ± 2%   +576.66%  (p=0.000 n=10+10)
    CountSingle/4K-8    556MB/s ± 3%  37933MB/s ± 1%  +6718.36%  (p=0.000 n=10+10)
    CountSingle/4M-8    562MB/s ± 2%  27444MB/s ± 3%  +4783.43%  (p=0.000 n=10+9)
    CountSingle/64M-8   543MB/s ± 2%  11054MB/s ± 3%  +1935.81%  (p=0.000 n=10+10)

Fixes #19411

Change-Id: Ieaf20b1fabccabe767c55c66e242e86f3617f883
Reviewed-on: https://go-review.googlesource.com/38258
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
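Portable Go cannot issue SSE/AVX2 instructions. The same idea can still be sketched with SWAR ("SIMD within a register"): compare a wide chunk against the target byte at once, then count the matches, processing 8 bytes per 64-bit word. This is a different technique from the CL's assembly. countByteSWAR is a hypothetical name, and the comparison with bytes.Count is only for illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"math/bits"
)

const (
	lo7  = 0x7f7f7f7f7f7f7f7f // low 7 bits of every byte lane
	ones = 0x0101010101010101 // 1 in every byte lane
)

// countByteSWAR counts occurrences of c in s, eight bytes per step.
func countByteSWAR(s []byte, c byte) int {
	pattern := uint64(c) * ones // c replicated into every byte lane
	n := 0
	for len(s) >= 8 {
		// XOR with the pattern turns every matching lane into 0x00.
		v := binary.LittleEndian.Uint64(s) ^ pattern
		// Each lane becomes 0x80 if it was zero and 0x00 otherwise. Adding
		// 0x7f to a lane's low 7 bits is at most 0xfe, so nothing carries
		// into the neighboring lane and the test is exact.
		t := ^(((v & lo7) + lo7) | v | lo7)
		n += bits.OnesCount64(t)
		s = s[8:]
	}
	for _, b := range s { // tail shorter than one word
		if b == c {
			n++
		}
	}
	return n
}

func main() {
	data := []byte("hello, gophers: go go go!")
	fmt.Println(countByteSWAR(data, 'o'), bytes.Count(data, []byte{'o'})) // 5 5
}
```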
2017-03-06 | runtime: avoid repeated findmoduledatap calls | Austin Clements
Currently almost every function that deals with a *_func has to first look up the *moduledata for the module containing the function's entry point. This means we almost always do at least two identical module lookups whenever we deal with a *_func (one to get the *_func and another to get something from its module data) and sometimes several more.

Fix this by making findfunc return a new funcInfo type that embeds *_func, but also includes the *moduledata, and making all of the functions that currently take a *_func instead take a funcInfo and use the already-found *moduledata.

This transformation is trivial for the most part, since the *_func type is usually inferred. The annoying part is that we can no longer use nil to indicate failure, so this introduces a funcInfo.valid() method and replaces nil checks with calls to valid.

Change-Id: I9b8075ef1c31185c1943596d96dec45c7ab5100f
Reviewed-on: https://go-review.googlesource.com/37331
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
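A sketch of the pattern follows. findfunc does the module lookup once and returns a value that carries both the function record and its module, and valid() replaces the nil check. The field layouts of _func and moduledata and the linear scans are simplified assumptions, not the runtime's real tables.

```go
package main

import "fmt"

// _func is a simplified per-function record (hypothetical fields).
type _func struct {
	entry   uintptr // function entry PC
	nameOff int32   // index into the owning module's name table
}

// moduledata is a simplified per-module record (hypothetical fields).
type moduledata struct {
	minpc, maxpc uintptr
	ftab         []_func // sorted by entry
	funcnames    []string
}

// funcInfo embeds *_func and remembers the module it came from.
type funcInfo struct {
	*_func
	datap *moduledata
}

// valid replaces the old "f == nil" check, since funcInfo is a struct.
func (f funcInfo) valid() bool { return f._func != nil }

// name uses the cached module; no second module lookup is needed.
func (f funcInfo) name() string {
	if !f.valid() {
		return ""
	}
	return f.datap.funcnames[f.nameOff]
}

func findmoduledatap(mods []*moduledata, pc uintptr) *moduledata {
	for _, m := range mods {
		if m.minpc <= pc && pc < m.maxpc {
			return m
		}
	}
	return nil
}

// findfunc performs the single module lookup and returns the last function
// whose entry is at or below pc.
func findfunc(mods []*moduledata, pc uintptr) funcInfo {
	datap := findmoduledatap(mods, pc)
	if datap == nil {
		return funcInfo{}
	}
	var f *_func
	for i := range datap.ftab {
		if datap.ftab[i].entry <= pc {
			f = &datap.ftab[i]
		}
	}
	if f == nil {
		return funcInfo{}
	}
	return funcInfo{f, datap}
}

func main() {
	mod := &moduledata{minpc: 0x1000, maxpc: 0x2000,
		ftab:      []_func{{entry: 0x1000, nameOff: 0}, {entry: 0x1800, nameOff: 1}},
		funcnames: []string{"main.a", "main.b"}}
	fmt.Println(findfunc([]*moduledata{mod}, 0x1900).name()) // main.b
}
```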
2017-03-04 | runtime: remove unused gcstats | Austin Clements
The gcstats structure is no longer consumed by anything and no longer tracks statistics that are particularly relevant to the concurrent garbage collector. Remove it. (Having statistics is probably a good idea, but these aren't the stats we need these days and we don't have a way to get them out of the runtime.)

In preparation for #13613.

Change-Id: Ib63e2f9067850668f9dcbfd4ed89aab4a6622c3f
Reviewed-on: https://go-review.googlesource.com/34936
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-02-24 | runtime: do not allocate on every time.Sleep | Russ Cox
It's common for some goroutines to loop calling time.Sleep. Allocate once per goroutine, not every time. This comes up in runtime/pprof's background reader.

Change-Id: I89d17dc7379dca266d2c9cd3aefc2382f5bdbade
Reviewed-on: https://go-review.googlesource.com/37162
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
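The fix amounts to caching one timer per goroutine and reusing it on every sleep. The goroutine type, the sleep method, and the allocation counter below are hypothetical, and nothing actually waits. The sketch only shows that repeated sleeps allocate once.

```go
package main

import "fmt"

// timer is a minimal stand-in for a runtime timer.
type timer struct {
	when int64 // absolute wake-up time in nanoseconds
}

// goroutine is a toy stand-in for the runtime's g.
type goroutine struct {
	timer *timer // allocated on the first sleep, reused afterwards
}

var allocs int // counts timer allocations, for demonstration only

// sleep arms the goroutine's cached timer, allocating it only once. A real
// implementation would add the timer to a timer heap and park here.
func (gp *goroutine) sleep(ns, now int64) {
	t := gp.timer
	if t == nil {
		t = new(timer)
		allocs++
		gp.timer = t
	}
	t.when = now + ns
}

func main() {
	var gp goroutine
	for i := int64(0); i < 100; i++ {
		gp.sleep(1000, i*1000)
	}
	fmt.Println(allocs) // 1
}
```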
2017-02-23 | runtime: new profile buffer implementation supporting label pointers | Russ Cox
The existing CPU profiling buffer is a slice of uintptr, but we want to start including profiling label data in the profiles, and those labels need to be pointers in order to let them describe rich information.

This CL implements a new profBuf type that holds both a slice of uint64 for data and a slice of unsafe.Pointer for profiling labels (aka tags).

Making the runtime use these buffers will happen in followup CLs.

Change-Id: I9ff16b532d8edaf4ce0cbba1098229a561834efc
Reviewed-on: https://go-review.googlesource.com/36713
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-02-16runtime: use balanced tree for addr lookup in semaphore implementationRuss Cox
CL 36792 fixed #17953, a linear scan caused by n goroutines piling into two different locks that hashed to the same bucket in the semaphore table. In that CL, n goroutines contending for 2 unfortunately chosen locks went from O(n²) to O(n). This CL fixes a different linear scan, when n goroutines are contending for n/2 different locks that all hash to the same bucket in the semaphore table. In this CL, n goroutines contending for n/2 unfortunately chosen locks go from O(n²) to O(n log n). This case is much less likely, but any linear scan eventually hurts, so we might as well fix it while the problem is fresh in our minds. The new test in this CL checks for both linear scans. The effect of this CL on the sync benchmarks is negligible (but it fixes the new test):

name                     old time/op  new time/op    delta
Cond1-48                  576ns ±10%   575ns ±13%        ~  (p=0.679 n=71+71)
Cond2-48                 1.59µs ± 8%  1.61µs ± 9%        ~  (p=0.107 n=73+69)
Cond4-48                 4.56µs ± 7%  4.55µs ± 7%        ~  (p=0.670 n=74+72)
Cond8-48                 9.87µs ± 9%  9.90µs ± 7%        ~  (p=0.507 n=69+73)
Cond16-48                20.4µs ± 7%  20.4µs ±10%        ~  (p=0.588 n=69+71)
Cond32-48                45.4µs ±10%  45.4µs ±14%        ~  (p=0.944 n=73+73)
UncontendedSemaphore-48  19.7ns ±12%  19.7ns ± 8%        ~  (p=0.589 n=65+63)
ContendedSemaphore-48    55.4ns ±26%  54.9ns ±32%        ~  (p=0.441 n=75+75)
MutexUncontended-48      0.63ns ± 0%  0.63ns ± 0%        ~  (all equal)
Mutex-48                  210ns ± 6%   213ns ±10%   +1.30%  (p=0.035 n=70+74)
MutexSlack-48             210ns ± 7%   211ns ± 9%        ~  (p=0.184 n=71+72)
MutexWork-48              299ns ± 5%   300ns ± 5%        ~  (p=0.678 n=73+75)
MutexWorkSlack-48         302ns ± 6%   300ns ± 5%        ~  (p=0.149 n=74+72)
MutexNoSpin-48            135ns ± 6%   135ns ±10%        ~  (p=0.788 n=67+75)
MutexSpin-48              693ns ± 5%   689ns ± 6%        ~  (p=0.092 n=65+74)
Once-48                  0.22ns ±25%  0.22ns ±24%        ~  (p=0.882 n=74+73)
Pool-48                  5.88ns ±36%  5.79ns ±24%        ~  (p=0.655 n=69+69)
PoolOverflow-48          4.79µs ±18%  4.87µs ±20%        ~  (p=0.233 n=75+75)
SemaUncontended-48       0.80ns ± 1%  0.82ns ± 8%   +2.46%  (p=0.000 n=60+74)
SemaSyntNonblock-48       103ns ± 4%   102ns ± 5%   -1.11%  (p=0.003 n=75+75)
SemaSyntBlock-48          104ns ± 4%   104ns ± 5%        ~  (p=0.231 n=71+75)
SemaWorkNonblock-48       128ns ± 4%   129ns ± 6%   +1.51%  (p=0.000 n=63+75)
SemaWorkBlock-48          129ns ± 8%   130ns ± 7%        ~  (p=0.072 n=75+74)
RWMutexUncontended-48    2.35ns ± 1%  2.35ns ± 0%        ~  (p=0.144 n=70+55)
RWMutexWrite100-48        139ns ±18%   141ns ±21%        ~  (p=0.071 n=75+73)
RWMutexWrite10-48         145ns ± 9%   145ns ± 8%        ~  (p=0.553 n=75+75)
RWMutexWorkWrite100-48    297ns ±13%   297ns ±15%        ~  (p=0.519 n=75+74)
RWMutexWorkWrite10-48     588ns ± 7%   585ns ± 5%        ~  (p=0.173 n=73+70)
WaitGroupUncontended-48  0.87ns ± 0%  0.87ns ± 0%        ~  (all equal)
WaitGroupAddDone-48      63.2ns ± 4%  62.7ns ± 4%   -0.82%  (p=0.027 n=72+75)
WaitGroupAddDoneWork-48   109ns ± 5%   109ns ± 4%        ~  (p=0.233 n=75+75)
WaitGroupWait-48         0.17ns ± 0%  0.16ns ±16%   -8.55%  (p=0.000 n=56+75)
WaitGroupWaitWork-48     1.78ns ± 1%  2.08ns ± 5%  +16.92%  (p=0.000 n=74+70)
WaitGroupActuallyWait-48 52.0ns ± 3%  50.6ns ± 5%   -2.70%  (p=0.000 n=71+69)

https://perf.golang.org/search?q=upload:20170215.1 Change-Id: Ia29a8bd006c089e401ec4297c3038cca656bcd0a Reviewed-on: https://go-review.googlesource.com/37103 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
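A balanced search tree over the distinct addresses in a bucket is one way to reach the O(n log n) bound described above, because each lookup then costs O(log n) instead of a linear scan. The Go sketch below is hypothetical and is not the runtime's code: it assumes a treap (a randomized balanced binary search tree) keyed by address, keeps a plain FIFO of integer waiter IDs per address, and the names `semaTree`, `enqueue` and `dequeue` are invented for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
)

// node is one distinct address in the bucket plus the FIFO of waiters on it.
type node struct {
	addr        uintptr
	prio        uint32 // random heap priority; keeps the tree balanced in expectation
	waiters     []int  // hypothetical waiter IDs, oldest first
	left, right *node
}

// semaTree is a hypothetical model of one semaphore-table bucket.
type semaTree struct{ root *node }

func rotateRight(n *node) *node {
	l := n.left
	n.left, l.right = l.right, n
	return l
}

func rotateLeft(n *node) *node {
	r := n.right
	n.right, r.left = r.left, n
	return r
}

// insert appends w to addr's queue, adding a tree node for a new address.
func insert(n *node, addr uintptr, w int) *node {
	switch {
	case n == nil:
		return &node{addr: addr, prio: rand.Uint32(), waiters: []int{w}}
	case addr < n.addr:
		if n.left = insert(n.left, addr, w); n.left.prio > n.prio {
			n = rotateRight(n)
		}
	case addr > n.addr:
		if n.right = insert(n.right, addr, w); n.right.prio > n.prio {
			n = rotateLeft(n)
		}
	default:
		n.waiters = append(n.waiters, w) // same address: no tree change
	}
	return n
}

// remove deletes addr's node by rotating it down until it has at most one child.
func remove(n *node, addr uintptr) *node {
	switch {
	case n == nil:
		return nil
	case addr < n.addr:
		n.left = remove(n.left, addr)
	case addr > n.addr:
		n.right = remove(n.right, addr)
	case n.left == nil:
		return n.right
	case n.right == nil:
		return n.left
	case n.left.prio > n.right.prio:
		n = rotateRight(n)
		n.right = remove(n.right, addr)
	default:
		n = rotateLeft(n)
		n.left = remove(n.left, addr)
	}
	return n
}

func (t *semaTree) enqueue(addr uintptr, w int) { t.root = insert(t.root, addr, w) }

// dequeue pops the oldest waiter on addr in O(log #addresses) expected time.
func (t *semaTree) dequeue(addr uintptr) (int, bool) {
	n := t.root
	for n != nil && n.addr != addr {
		if addr < n.addr {
			n = n.left
		} else {
			n = n.right
		}
	}
	if n == nil {
		return 0, false
	}
	w := n.waiters[0]
	if n.waiters = n.waiters[1:]; len(n.waiters) == 0 {
		t.root = remove(t.root, addr) // last waiter gone: drop the address
	}
	return w, true
}

func main() {
	t := &semaTree{}
	for i := 0; i < 8; i++ { // 8 waiters spread over 4 addresses
		t.enqueue(uintptr(0x1000+8*(i%4)), i)
	}
	fmt.Println(t.dequeue(0x1008)) // 1 true
}
```

Waiters on the same address still share one node, so only the number of distinct addresses affects the tree depth.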
2017-02-14 runtime: remove g.stackAlloc (Austin Clements)
Since we're no longer stealing space for the stack barrier array from the stack allocation, the stack allocation is simply g.stack.hi-g.stack.lo. Updates #17503. Change-Id: Id9b450ae12c3df9ec59cfc4365481a0a16b7c601 Reviewed-on: https://go-review.googlesource.com/36621 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>
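With nothing carved out for a stack barrier array, the allocation size follows directly from the stack bounds. A minimal sketch, assuming a `stack` struct with `lo` and `hi` fields shaped like the `g.stack` field named above; the `size` method is hypothetical:

```go
package main

import "fmt"

// stack holds the bounds of a goroutine stack, [lo, hi).
type stack struct {
	lo, hi uintptr
}

// size is the whole allocation, since no part of it is reserved for other data.
func (s stack) size() uintptr { return s.hi - s.lo }

func main() {
	s := stack{lo: 0x10000, hi: 0x12000}
	fmt.Println(s.size()) // 8192
}
```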
2017-02-14 runtime: remove stack barriers (Austin Clements)
Now that we don't rescan stacks, stack barriers are unnecessary. This removes all of the code and structures supporting them as well as tests that were specifically for stack barriers. Updates #17503. Change-Id: Ia29221730e0f2bbe7beab4fa757f31a032d9690c Reviewed-on: https://go-review.googlesource.com/36620 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-02-14 runtime: remove rescan list (Austin Clements)
With the hybrid barrier, rescanning stacks is no longer necessary, so the rescan list is no longer necessary. Remove it. This leaves the gcrescanstacks GODEBUG variable, since it's useful for debugging, but changes it to simply walk all of the Gs to rescan stacks rather than using the rescan list. We could also remove g.gcscanvalid, which is effectively a distributed rescan list. However, it's still useful for gcrescanstacks mode and it adds little complexity, so we'll leave it in. Fixes #17099. Updates #17503. Change-Id: I776d43f0729567335ef1bfd145b75c74de2cc7a9 Reviewed-on: https://go-review.googlesource.com/36619 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>
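Instead of a centralized list, the debug mode described above walks every G and consults a per-G flag, which is why g.gcscanvalid can be called "effectively a distributed rescan list". A simplified, hypothetical sketch: the `g` type and the `rescanStacks` function are stand-ins, not runtime code.

```go
package main

import "fmt"

// g is a hypothetical, stripped-down goroutine descriptor.
type g struct {
	id          int
	gcscanvalid bool // true if this G's stack scan is still valid for the cycle
}

// rescanStacks walks every G (no separate list) and rescans only those whose
// scan was invalidated, returning how many stacks it rescanned.
func rescanStacks(allgs []*g, scan func(*g)) int {
	n := 0
	for _, gp := range allgs {
		if !gp.gcscanvalid {
			scan(gp)
			gp.gcscanvalid = true
			n++
		}
	}
	return n
}

func main() {
	allgs := []*g{{id: 1, gcscanvalid: true}, {id: 2}, {id: 3}}
	n := rescanStacks(allgs, func(gp *g) { fmt.Println("rescan g", gp.id) })
	fmt.Println(n, "stacks rescanned")
}
```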
2017-02-13 cmd/compile: optimize non-empty-interface type conversions (Keith Randall)
When doing i.(T) for non-empty-interface i and concrete type T, there's no need to read the type out of the itab. Just compare the itab to the itab we expect for that interface/type pair. Also optimize type switches by putting the type hash of the concrete type in the itab. That way we don't need to load the type pointer out of the itab. Update #18492 Change-Id: I49e280a21e5687e771db5b8a56b685291ac168ce Reviewed-on: https://go-review.googlesource.com/34810 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: David Chase <drchase@google.com>
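The assertion fast path relies on there being one itab per (interface, concrete type) pair, so comparing itab pointers is equivalent to comparing types. The Go model below is hypothetical: it simulates the idea in ordinary code rather than showing compiler output, `itab`, `getitab`, `iface`, `assert` and `typeSwitch` are illustrative names, and the linear case scan stands in for whatever search the compiler actually emits.

```go
package main

import "fmt"

// itab is a hypothetical model with one canonical value per
// (interface type, concrete type) pair.
type itab struct {
	inter, typ string
	hash       uint32 // copy of the concrete type's hash, read by type switches
}

var itabs = map[[2]string]*itab{}

// getitab returns the canonical itab for a pair, creating it on first use.
func getitab(inter, typ string, hash uint32) *itab {
	k := [2]string{inter, typ}
	if tab, ok := itabs[k]; ok {
		return tab
	}
	tab := &itab{inter: inter, typ: typ, hash: hash}
	itabs[k] = tab
	return tab
}

// iface models a non-empty interface value as an (itab, data) pair.
type iface struct {
	tab  *itab
	data interface{}
}

// assert models i.(T) for concrete T: because itabs are canonical, comparing
// against the expected itab replaces loading tab.typ and comparing types.
func assert(i iface, want *itab) bool { return i.tab == want }

// typeSwitch models a switch over concrete types: it checks the hash cached in
// the itab against each case, then confirms with the itab pointer.
func typeSwitch(i iface, cases []*itab) int {
	for idx, c := range cases {
		if i.tab.hash == c.hash && i.tab == c {
			return idx
		}
	}
	return -1
}

func main() {
	intStringer := getitab("fmt.Stringer", "main.MyInt", 0xabc)
	strStringer := getitab("fmt.Stringer", "main.MyStr", 0xdef)
	v := iface{tab: intStringer, data: 42}
	fmt.Println(assert(v, intStringer), assert(v, strStringer)) // true false
	fmt.Println(typeSwitch(v, []*itab{strStringer, intStringer})) // 1
}
```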
2017-02-12 runtime: use two-level list for semaphore address search in semaRoot (Russ Cox)
If there are many goroutines contending for two different locks and both locks hash to the same semaRoot, the scans to find the goroutines for a particular lock can end up being O(n), making n lock acquisitions quadratic. As long as only one actively-used lock hashes to each semaRoot there's no problem, since the list operations in that case are O(1). But when the second actively-used lock hits the same semaRoot, then scans for entries for a given lock have to scan over the entries for the other lock. Fix this problem by changing the semaRoot to hold only one sudog per unique address. In the running example, this drops the length of that list from O(n) to 2. Then attach other goroutines waiting on the same address to a separate list headed by the sudog in the semaRoot list. Those "same address list" operations are still O(1), so now the example from above works much better. There is still an assumption here that in real programs you don't have many many goroutines queueing up on many many distinct addresses. If we end up with that problem, we can replace the top-level list with a treap. Fixes #17953. Change-Id: I78c5b1a5053845275ab31686038aa4f6db5720b2 Reviewed-on: https://go-review.googlesource.com/36792 Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
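The two-level layout can be sketched as follows. This is a simplified, hypothetical model: `bucket` plays the role of a semaRoot, `addrQueue` the role of the per-address head sudog, and a slice of integer waiter IDs replaces the runtime's linked sudogs. The top-level scan visits one entry per distinct address (two in the running example) rather than one per waiting goroutine.

```go
package main

import "fmt"

// addrQueue is one top-level entry: a distinct address and its waiters.
type addrQueue struct {
	addr    uintptr
	waiters []int      // same-address list, oldest first
	next    *addrQueue // next distinct address that hashed to this bucket
}

// bucket models one semaRoot: a list with one entry per distinct address.
type bucket struct{ head *addrQueue }

// find scans distinct addresses only, so its cost no longer grows with the
// number of goroutines waiting on some other address.
func (b *bucket) find(addr uintptr) (prev, q *addrQueue) {
	for q = b.head; q != nil; prev, q = q, q.next {
		if q.addr == addr {
			return prev, q
		}
	}
	return nil, nil
}

func (b *bucket) enqueue(addr uintptr, id int) {
	if _, q := b.find(addr); q != nil {
		q.waiters = append(q.waiters, id) // join the same-address list
		return
	}
	b.head = &addrQueue{addr: addr, waiters: []int{id}, next: b.head}
}

func (b *bucket) dequeue(addr uintptr) (int, bool) {
	prev, q := b.find(addr)
	if q == nil {
		return 0, false
	}
	id := q.waiters[0]
	q.waiters = q.waiters[1:]
	if len(q.waiters) == 0 { // unlink the address once its queue drains
		if prev == nil {
			b.head = q.next
		} else {
			prev.next = q.next
		}
	}
	return id, true
}

func main() {
	var b bucket
	for i := 0; i < 6; i++ {
		b.enqueue(uintptr(0x100+8*(i%2)), i) // two locks sharing one bucket
	}
	fmt.Println(b.dequeue(0x108)) // 1 true
}
```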
2017-02-08 runtime: use atomic ops for fwdSig, make sigtable immutable (Ian Lance Taylor)
The fwdSig array is accessed by the signal handler, which may run in parallel with other threads manipulating it via the os/signal package. Use atomic accesses to ensure that there are no problems. Move the _SigHandling flag out of the sigtable array. This makes sigtable immutable and safe to read from the signal handler. Change-Id: Icfa407518c4ebe1da38580920ced764898dfc9ad Reviewed-on: https://go-review.googlesource.com/36321 Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
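The split between an immutable table and atomically accessed mutable state can be sketched as below. Everything here is an illustrative assumption: the array size, signal number 13, the `handlingSig` array standing in for the relocated _SigHandling flag, and the `handler`/`enable` functions. The point is only that the handler side uses plain loads on data that never changes and atomic loads on data that does.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const nsig = 65 // hypothetical number of signal slots

// sigtable is filled in before any handler runs and never written again,
// so the signal handler may read it with plain loads.
var sigtable = [nsig]uint32{13: 1}

// fwdSig and handlingSig change while the program runs (for example via
// os/signal), so both sides access them only with atomic operations.
var (
	fwdSig      [nsig]uintptr
	handlingSig [nsig]uint32
)

// handler models the signal-handler side: no locks, only loads.
func handler(sig int) (fwd uintptr, handling bool) {
	if sigtable[sig] == 0 { // immutable table: no atomic needed
		return 0, false
	}
	return atomic.LoadUintptr(&fwdSig[sig]), atomic.LoadUint32(&handlingSig[sig]) != 0
}

// enable models the os/signal side, which may run in parallel with handler.
func enable(sig int, oldHandler uintptr) {
	atomic.StoreUintptr(&fwdSig[sig], oldHandler)
	atomic.StoreUint32(&handlingSig[sig], 1)
}

func main() {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); enable(13, 0xdead) }()
	go func() { defer wg.Done(); handler(13) }() // each load sees old or new, no data race
	wg.Wait()
	fwd, handling := handler(13)
	fmt.Printf("%#x %v\n", fwd, handling) // 0xdead true
}
```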
2017-02-06 runtime: add definitions for SetGoroutineLabels and Do (Michael Matloob)
This change defines runtime/pprof.SetGoroutineLabels and runtime/pprof.Do, which are used to set profiler labels on goroutines. The change defines functions in the runtime for setting and getting profile labels, and sets and unsets profile labels when goroutines are created and deleted. The change also adds the package runtime/internal/proflabel, which defines the structure the runtime uses to store profile labels. Change-Id: I747a4400141f89b6e8160dab6aa94ca9f0d4c94d Reviewed-on: https://go-review.googlesource.com/34198 Run-TryBot: Michael Matloob <matloob@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org> Reviewed-on: https://go-review.googlesource.com/35010
2017-02-03 runtime: handle SIGPIPE in c-archive and c-shared programs (Elias Naur)
Before this CL, Go programs in c-archive or c-shared buildmodes would not handle SIGPIPE. That leads to surprising behaviour where writes on a closed pipe or socket would raise SIGPIPE and terminate the program. This CL changes the Go runtime to handle SIGPIPE regardless of buildmode. In addition, SIGPIPE from non-Go code is forwarded. This is a refinement of CL 32796 that fixes the case where a non-default handler for SIGPIPE is installed by the host C program. Fixes #17393 Change-Id: Ia41186e52c1ac209d0a594bae9904166ae7df7de Reviewed-on: https://go-review.googlesource.com/35960 Run-TryBot: Elias Naur <elias.naur@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-01-05 crypto: detect BMI usability on AMD64 for sha1 and sha256 (Lion Yang)
The existing implementations on AMD64 only detect AVX2 usability, even though they also contain BMI (bit-manipulation instructions). These instructions crash the running program as 'unknown instructions' on processors such as the i3-4000M, which support AVX2 but not BMI. This change adds detection of BMI1 and BMI2 to the AMD64 runtime, with two flags as the result, `support_bmi1` and `support_bmi2`, in runtime/runtime2.go. It also completes the condition for running the AVX2 versions in packages crypto/sha1 and crypto/sha256. Fixes #18512 Change-Id: I917bf0de365237740999de3e049d2e8f2a4385ad Reviewed-on: https://go-review.googlesource.com/34850 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
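The fix amounts to gating the AVX2 code path on every instruction set it uses, not just AVX2. A hypothetical sketch of that dispatch condition: `cpuFeatures` and `pickSHA256Impl` are invented names, the "generic" result stands in for whichever other implementation a package falls back to, and in real code the flags would come from runtime detection (for example the `support_bmi1`/`support_bmi2` flags mentioned above, or `cpu.X86.HasAVX2`, `HasBMI1` and `HasBMI2` in golang.org/x/sys/cpu).

```go
package main

import "fmt"

// cpuFeatures is a hypothetical stand-in for runtime-detected CPU flags.
type cpuFeatures struct {
	hasAVX2, hasBMI1, hasBMI2 bool
}

// pickSHA256Impl selects the AVX2 path only when every instruction set that
// path uses is available; checking AVX2 alone would fault on AVX2-only CPUs.
func pickSHA256Impl(f cpuFeatures) string {
	if f.hasAVX2 && f.hasBMI1 && f.hasBMI2 {
		return "avx2"
	}
	return "generic"
}

func main() {
	// AVX2 without BMI, as on the CPU cited in the message above.
	fmt.Println(pickSHA256Impl(cpuFeatures{hasAVX2: true}))    // generic
	fmt.Println(pickSHA256Impl(cpuFeatures{true, true, true})) // avx2
}
```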