aboutsummaryrefslogtreecommitdiff
path: root/src/runtime/lockrank.go
AgeCommit message (Collapse)Author
2026-01-27cmd/link, runtime: remove typelinksIan Lance Taylor
Instead of adding a typelinks section to a Go binary, mark the start and end of the typelinked type descriptors. The runtime can then step through the descriptors to find them all, rather than relying on the extra linker-generated offset list. The runtime steps through the type descriptors lazily, as many Go programs don't need the typelinks list at all. This reduces the size of cmd/go by 15K bytes, which isn't much but it's not nothing. A future CL will change the reflect package to use the type pointers directly rather than converting to offsets and then back to type pointers. For #6853 Change-Id: Id0af4ce81c5b1cea899fc92b6ff9d2db8ce4c267 Reviewed-on: https://go-review.googlesource.com/c/go/+/724261 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Ian Lance Taylor <iant@golang.org>
2025-11-10runtime: fix lock rank for work.spanSPMCs.lockMichael Anthony Knyszek
Currently this lock is treated like a leaf lock, but it's not one. It can acquire the globalAlloc lock via fixalloc and persistentalloc. This means we need to figure out where it can be acquired. In general, it can be acquired by any write barrier, which is already incredibly broad. AFAICT, it can't be acquired directly otherwise, except when destroying a spanSPMC during procresize, in which case we'll be holding sched.lock. Fixes #75916. Change-Id: I2da1f5b82c750bf4e8d6a8a562046b9a17fd44be Reviewed-on: https://go-review.googlesource.com/c/go/+/717500 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-08-05runtime: save scalar registers off stack in amd64 async preemptionAustin Clements
Asynchronous preemption must save all registers that could be in use by Go code. Currently, it saves all of these to the goroutine stack. As a result, the stack frame requirements of asynchronous preemption can be rather high. On amd64, this requires 368 bytes of stack space, most of which is the XMM registers. Several RISC architectures are around 0.5 KiB. As we add support for SIMD instructions, this is going to become a problem. The AVX-512 register state is 2.5 KiB. This well exceeds the nosplit limit, and even if it didn't, could constrain when we can asynchronously preempt goroutines on small stacks. This CL fixes this by moving pure scalar state stored in non-GP registers off the stack and into an allocated "extended register state" object. To reduce space overhead, we only allocate these objects as needed. While in the theoretical limit, every G could need this register state, in practice very few do at a time. However, we can't allocate when we're in the middle of saving the register state during an asynchronous preemption, so we reserve scratch space on every P to temporarily store the register state, which can then be copied out to an allocated state object later by Go code. This commit only implements this for amd64, since that's where we're about to add much more vector state, but it lays the groundwork for doing this on any architecture that could benefit. This is a cherry-pick of CL 680898 plus bug fix CL 684836 from the dev.simd branch. Change-Id: I123a95e21c11d5c10942d70e27f84d2d99bbf735 Reviewed-on: https://go-review.googlesource.com/c/go/+/669195 Auto-Submit: Austin Clements <austin@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-05-29runtime: add vgetrandom lock rankMichael Pratt
vgetrandomGetState can call malloc, so this is not a leaf lock. Our staticlockrank builder doesn't support vgetrandom, so it didn't catch this. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-staticlockranking Change-Id: I6a6a636c36c9172e4ebf9493c10cb23cac29a13f Reviewed-on: https://go-review.googlesource.com/c/go/+/677255 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-29runtime: guarantee no GOMAXPROCS update syscalls after GOMAXPROCS callMichael Pratt
We already guarantee that no automatic updates to GOMAXPROCS occur after a GOMAXPROCS call returns. This is easily achieved by having the update goroutine double-check that updates are still allowed during STW before committing the new value. However, it is possible for sysmon to concurrently run defaultGOMAXPROCS to compute a new GOMAXPROCS value after GOMAXPROCS returns. This new value will be discarded later, but we'll still perform the system calls necessary to compute the new value. Normally this distinction doesn't matter, but if you want to sandbox a Go program, then you may want to disable GOMAXPROCS updates to reduce the system call footprint. A call to GOMAXPROCS will disable updates, but without a guarantee on when sysmon will observe the change it is somewhat fragile. Add explicit synchronization between GOMAXPROCS and sysmon to guarantee that sysmon won't run defaultGOMAXPROCS after GOMAXPROCS returns. The synchronization is a bit complex because we can't hold a mutex across STW, nor take a semaphore from sysmon, but the result isn't too bad. One oddity is that sched.customGOMAXPROCS and gomaxprocs are no longer updated in lockstep (even though both are protected by sched.lock), but I don't believe anything should depend on that. For #73193. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-staticlockranking Change-Id: I6a6a636cff243a9b69ac1b5d2f98925648e60236 Reviewed-on: https://go-review.googlesource.com/c/go/+/677037 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-28runtime: rename updateGOMAXPROCS to updateMaxProcsGMichael Pratt
There are other parts to updating GOMAXPROCS than just the helper goroutine, so make the naming more specific. For #73193. Change-Id: I6a6a636c31ac80c8d76afe90c0bfc29d3086af4d Reviewed-on: https://go-review.googlesource.com/c/go/+/677035 Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-27runtime: define lock ranking between weak pointers and synctestDamien Neil
Fixes #73817 Change-Id: I0101bdc797237b4c7eb58b414c71b009b0b44447 Reviewed-on: https://go-review.googlesource.com/c/go/+/675176 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Damien Neil <dneil@google.com>
2025-05-21runtime: use cgroup CPU limit to set GOMAXPROCSMichael Pratt
This CL adds two related features enabled by default via compatibility GODEBUGs containermaxprocs and updatemaxprocs. On Linux, containermaxprocs makes the Go runtime consider cgroup CPU bandwidth limits (quota/period) when setting GOMAXPROCS. If the cgroup limit is lower than the number of logical CPUs available, then the cgroup limit takes precedence. On all OSes, updatemaxprocs makes the Go runtime periodically recalculate the default GOMAXPROCS value and update GOMAXPROCS if it has changed. If GOMAXPROCS is set manually, this update does not occur. This is intended primarily to detect changes to cgroup limits, but it applies on all OSes because the CPU affinity mask can change as well. The runtime only considers the limit in the leaf cgroup (the one that actually contains the process), caching the CPU limit file descriptor(s), which are periodically reread for updates. This is a small departure from the original proposed design. It will not consider limits of parent cgroups (which may be lower than the leaf), and it will not detection cgroup migration after process start. We can consider changing this in the future, but the simpler approach is less invasive; less risk to packages that have some awareness of runtime internals. e.g., if the runtime periodically opens new files during execution, file descriptor leak detection is difficult to implement in a stable way. For #73193. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest Change-Id: I6a6a636c631c1ae577fb8254960377ba91c5dc98 Reviewed-on: https://go-review.googlesource.com/c/go/+/670497 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-08runtime: schedule cleanups across multiple goroutinesMichael Anthony Knyszek
This change splits the finalizer and cleanup queues and implements a new lock-free blocking queue for cleanups. The basic design is as follows: The cleanup queue is organized in fixed-sized blocks. Individual cleanup functions are queued, but only whole blocks are dequeued. Enqueuing cleanups places them in P-local cleanup blocks. These are flushed to the full list as they get full. Cleanups can only be enqueued by an active sweeper. Dequeuing cleanups always dequeues entire blocks from the full list. Cleanup blocks can be dequeued and executed at any time. The very last active sweeper in the sweep phase is responsible for flushing all local cleanup blocks to the full list. It can do this without any synchronization because the next GC can't start yet, so we can be very certain that nobody else will be accessing the local blocks. Cleanup blocks are stored off-heap because the need to be allocated by the sweeper, which is called from heap allocation paths. As a result, the GC treats cleanup blocks as roots, just like finalizer blocks. Flushes to the full list signal to the scheduler that cleanup goroutines should be awoken. Every time the scheduler goes to wake up a cleanup goroutine and there were more signals than goroutines to wake, it then forwards this signal to runtime.AddCleanup, so that it creates another goroutine the next time it is called, up to gomaxprocs goroutines. The signals here are a little convoluted, but exist because the sweeper and the scheduler cannot safely create new goroutines. For #71772. For #71825. Change-Id: Ie839fde2b67e1b79ac1426be0ea29a8d923a62cc Reviewed-on: https://go-review.googlesource.com/c/go/+/650697 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2024-11-19internal/synctest: new package for testing concurrent codeDamien Neil
Add an internal (for now) implementation of testing/synctest. The synctest.Run function executes a tree of goroutines in an isolated environment using a fake clock. The synctest.Wait function allows a test to wait for all other goroutines within the test to reach a blocking point. For #67434 For #69687 Change-Id: Icb39e54c54cece96517e58ef9cfb18bf68506cfc Reviewed-on: https://go-review.googlesource.com/c/go/+/591997 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-13runtime: prevent weak->strong conversions during mark terminationMichael Anthony Knyszek
Currently it's possible for weak->strong conversions to create more GC work during mark termination. When a weak->strong conversion happens during the mark phase, we need to mark the newly-strong pointer, since it may now be the only pointer to that object. In other words, the object could be white. But queueing new white objects creates GC work, and if this happens during mark termination, we could end up violating mark termination invariants. In the parlance of the mark termination algorithm, the weak->strong conversion is a non-monotonic source of GC work, unlike the write barriers (which will eventually only see black objects). This change fixes the problem by forcing weak->strong conversions to block during mark termination. We can do this efficiently by setting a global flag before the ragged barrier that is checked at each weak->strong conversion. If the flag is set, then the conversions block. The ragged barrier ensures that all Ps have observed the flag and that any weak->strong conversions which completed before the ragged barrier have their newly-minted strong pointers visible in GC work queues if necessary. We later unset the flag and wake all the blocked goroutines during the mark termination STW. There are a few subtleties that we need to account for. For one, it's possible that a goroutine which blocked in a weak->strong conversion wakes up only to find it's mark termination time again, so we need to recheck the global flag on wake. We should also stay non-preemptible while performing the check, so that if the check *does* appear as true, it cannot switch back to false while we're actively trying to block. If it switches to false while we try to block, then we'll be stuck in the queue until the following GC. All-in-all, this CL is more complicated than I would have liked, but it's the only idea so far that is clearly correct to me at a high level. This change adds a test which is somewhat invasive as it manipulates mark termination, but hopefully that infrastructure will be useful for debugging, fixing, and regression testing mark termination whenever we do fix it. Fixes #69803. Change-Id: Ie314e6fd357c9e2a07a9be21f217f75f7aba8c4a Reviewed-on: https://go-review.googlesource.com/c/go/+/623615 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-05-08runtime: add traceallocfree GODEBUG for alloc/free events in tracesMichael Anthony Knyszek
This change adds expensive alloc/free events to traces, guarded by a GODEBUG that can be set at run time by mutating the GODEBUG environment variable. This supersedes the alloc/free trace deleted in a previous CL. There are two parts to this CL. The first part is adding a mechanism for exposing experimental events through the tracer and trace parser. This boils down to a new ExperimentalEvent event type in the parser API which simply reveals the raw event data for the event. Each experimental event can also be associated with "experimental data" which is associated with a particular generation. This experimental data is just exposed as a bag of bytes that supplements the experimental events. In the runtime, this CL organizes experimental events by experiment. An experiment is defined by a set of experimental events and a single special batch type. Batches of this special type are exposed through the parser's API as the aforementioned "experimental data". The second part of this CL is defining the AllocFree experiment, which defines 9 new experimental events covering heap object alloc/frees, span alloc/frees, and goroutine stack alloc/frees. It also generates special batches that contain a type table: a mapping of IDs to type information. Change-Id: I965c00e3dcfdf5570f365ff89d0f70d8aeca219c Reviewed-on: https://go-review.googlesource.com/c/go/+/583377 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-03-14time: avoid stale receives after Timer/Ticker Stop/Reset returnRuss Cox
A proposal discussion in mid-2020 on #37196 decided to change time.Timer and time.Ticker so that their Stop and Reset methods guarantee that no old value (corresponding to the previous configuration of the Timer or Ticker) will be received after the method returns. The trivial way to do this is to make the Timer/Ticker channels unbuffered, create a goroutine per Timer/Ticker feeding the channel, and then coordinate with that goroutine during Stop/Reset. Since Stop/Reset coordinate with the goroutine and the channel is unbuffered, there is no possibility of a stale value being sent after Stop/Reset returns. Of course, we do not want an extra goroutine per Timer/Ticker, but that's still a good semantic model: behave like the channels are unbuffered and fed by a coordinating goroutine. The actual implementation is more effort but behaves like the model. Specifically, the timer channel has a 1-element buffer like it always has, but len(t.C) and cap(t.C) are special-cased to return 0 anyway, so user code cannot see what's in the buffer except with a receive. Stop/Reset lock out any stale sends and then clear any pending send from the buffer. Some programs will change behavior. For example: package main import "time" func main() { t := time.NewTimer(2 * time.Second) time.Sleep(3 * time.Second) if t.Reset(2*time.Second) != false { panic("expected timer to have fired") } <-t.C <-t.C } This program (from #11513) sleeps 3s after setting a 2s timer, resets the timer, and expects Reset to return false: the Reset is too late and the send has already occurred. It then expects to receive two values: the one from before the Reset, and the one from after the Reset. With an unbuffered timer channel, it should be clear that no value can be sent during the time.Sleep, so the time.Reset returns true, indicating that the Reset stopped the timer from going off. Then there is only one value to receive from t.C: the one from after the Reset. In 2015, I used the above example as an argument against this change. Note that a correct version of the program would be: func main() { t := time.NewTimer(2 * time.Second) time.Sleep(3 * time.Second) if !t.Reset(2*time.Second) { <-t.C } <-t.C } This works with either semantics, by heeding t.Reset's result. The change should not affect correct programs. However, one way that the change would be visible is when programs use len(t.C) (instead of a non-blocking receive) to poll whether the timer has triggered already. We might legitimately worry about breaking such programs. In 2020, discussing #37196, Bryan Mills and I surveyed programs using len on timer channels. These are exceedingly rare to start with; nearly all the uses are buggy; and all the buggy programs would be fixed by the new semantics. The details are at [1]. To further reduce the impact of this change, this CL adds a temporary GODEBUG setting, which we didn't know about yet in 2015 and 2020. Specifically, asynctimerchan=1 disables the change and is the default for main programs in modules that use a Go version before 1.23. We hope to be able to retire this setting after the minimum 2-year window. Setting asynctimerchan=1 also disables the garbage collection change from CL 568341, although users shouldn't need to know that since it is not a semantically visible change (unless we have bugs!). As an undocumented bonus that we do not officially support, asynctimerchan=2 disables the channel buffer change but keeps the garbage collection change. This may help while we are shaking out bugs in either of them. Fixes #37196. [1] https://github.com/golang/go/issues/37196#issuecomment-641698749 Change-Id: I8925d3fb2b86b2ae87fd2acd055011cbf7bd5916 Reviewed-on: https://go-review.googlesource.com/c/go/+/568341 Reviewed-by: Austin Clements <austin@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-03-13time: garbage collect unstopped Tickers and TimersRuss Cox
From the beginning of Go, the time package has had a gotcha: if you use a select on <-time.After(1*time.Minute), even if the select finishes immediately because some other case is ready, the underlying timer from time.After keeps running until the minute is over. This pins the timer in the timer heap, which keeps it from being garbage collected and in extreme cases also slows down timer operations. The lack of garbage collection is the more important problem. The docs for After warn against this scenario and suggest using NewTimer with a call to Stop after the select instead, purely to work around this garbage collection problem. Oddly, the docs for NewTimer and NewTicker do not mention this problem, but they have the same issue: they cannot be collected until either they are Stopped or, in the case of Timer, the timer expires. (Tickers repeat, so they never expire.) People have built up a shared knowledge that timers and tickers need to defer t.Stop even though the docs do not mention this (it is somewhat implied by the After docs). This CL fixes the garbage collection problem, so that a timer that is unreferenced can be GC'ed immediately, even if it is still running. The approach is to only insert the timer into the heap when some channel operation is blocked on it; the last channel operation to stop using the timer takes it back out of the heap. When a timer's channel is no longer referenced, there are no channel operations blocked on it, so it's not in the heap, so it can be GC'ed immediately. This CL adds an undocumented GODEBUG asynctimerchan=1 that will disable the change. The documentation happens in the CL 568341. Fixes #8898. Fixes #61542. Change-Id: Ieb303b6de1fb3527d3256135151a9e983f3c27e6 Reviewed-on: https://go-review.googlesource.com/c/go/+/512355 Reviewed-by: Austin Clements <austin@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org>
2024-03-13runtime: fix another lock ordering problemRuss Cox
https://logs.chromium.org/logs/golang/buildbucket/cr-buildbucket/8753622336585847105/+/u/step/11/log/2 shows a staticlockranking crash with pollcache (defaulted to LEAF) being held during a write barrier, which got unlucky and acquired wbufSpans, triggering a lock ordering throw. My change in https://go-review.googlesource.com/c/go/+/570335/13/src/runtime/netpoll.go around line 700 caused batching of many write barriers on the first call rather than having just a few write barriers on each call, making the crash much more likely, but the ordering problem appears to have always existed. We just never allocated enough pollDescs to trigger it. Change-Id: Icb5e8340a5027dd4f7535a5ef02b2868476539e5 Reviewed-on: https://go-review.googlesource.com/c/go/+/571195 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2024-03-08runtime: remove sched, allp < timers lockrank ruleRuss Cox
allp < timers has not been necessary since CL 258303. sched < timers was implied by allp < timers, and that was still necessary, but only when the world is stopped. Rewrite the code to avoid that lock since the world is stopped. Now timers and timer are independent of the scheduler, so they could call into the scheduler (for example to ready a goroutine) if we wanted them to. Change-Id: I12a93013c98e51c9e2f2148175b02afce8384a59 Reviewed-on: https://go-review.googlesource.com/c/go/+/568337 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Russ Cox <rsc@golang.org>
2024-03-08runtime: fix timers.wakeTime inaccuracy raceRuss Cox
timers.wakeTime, which is called concurrently by P's trying to decide how long they should sleep, can return inaccurate values while timers.adjust is running. (Before the refactoring, this was still true but the code did not have good names and was spread across more files, making the race harder to see.) The runtime thread sleeping code is complex enough that I am not confident that the inaccuracy can cause delayed timer wakeups, but I am also not confident that it can't, nor that it won't in the future. There are two parts to the fix: 1. A simple logic change in timers.adjust. 2. The introduction of t.maybeAdd to avoid having a t that is marked as belonging to a specific timers ts but not present in ts.heap. That was okay before when everything was racy but needs to be eliminated to make timers.adjust fully consistent. The cost of the change is an extra CAS-lock operation on a timer add (close to free since the CAS-lock was just unlocked) and a change in the static lock ranking to allow malloc while holding a timer lock. Change-Id: I1249e6e24ae9ef74a69837f453e15b513f0d75c0 Reviewed-on: https://go-review.googlesource.com/c/go/+/564977 Reviewed-by: Austin Clements <austin@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org>
2024-02-28runtime: add timer lock to lock rankingRuss Cox
No deadlocks yet! Change-Id: I87fb3742a386d682fbcc8cb98e98771b54bc3fec Reviewed-on: https://go-review.googlesource.com/c/go/+/564133 Reviewed-by: Austin Clements <austin@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-01-09runtime: replace rwmutexR/W with per-rwmutex lock rankMichael Pratt
CL 549536 intended to decouple the internal implementation of rwmutex from the semantic meaning of an rwmutex read/write lock in the static lock ranking. Unfortunately, it was not thought through well enough. The internals were represented with the rwmutexR and rwmutexW lock ranks. The idea was that the internal lock ranks need not model the higher-level ordering, since those have separate rankings. That is incorrect; rwmutexW is held for the duration of a write lock, so it must be ranked before any lock taken while any write lock is held, which is precisely what we were trying to avoid. This is visible in violations like: 0 : execW 11 0x0 1 : rwmutexW 51 0x111d9c8 2 : fin 30 0x111d3a0 fatal error: lock ordering problem execW < fin is modeled, but rwmutexW < fin is missing. Fix this by eliminating the rwmutexR/W lock ranks shared across different types of rwmutex. Instead require users to define an additional "internal" lock rank to represent the implementation details of rwmutex.rLock. We can avoid an additional "internal" lock rank for rwmutex.wLock because the existing writeRank has the same semantics for semantic and internal locking. i.e., writeRank is held for the duration of a write lock, which is exactly how rwmutex.wLock is used, so we can use writeRank directly on wLock. For #64722. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-staticlockranking Change-Id: Ia572de188a46ba8fe054ae28537648beaa16b12c Reviewed-on: https://go-review.googlesource.com/c/go/+/555055 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-12-15runtime: properly model rwmutex in lock rankingMichael Pratt
Currently, lock ranking doesn't really try to model rwmutex. It records the internal locks rLock and wLock, but in a subpar fashion: 1. wLock is held from lock to unlock, so it works OK, but it conflates write locks of all rwmutexes as rwmutexW, rather than allowing different rwmutexes to have different rankings. 2. rLock is an internal implementation detail that is only taken when there is contention in rlock. As as result, the reader lock path is almost never checked. Add proper modeling. rwmutexR and rwmutexW remain as the ranks of the internal locks, which have their own ordering. The new init method is passed the ranks of the higher level lock that this represents, just like lockInit for mutex. execW ordered before MALLOC captures the case from #64722. i.e., there can be allocation between BeforeFork and AfterFork. For #64722. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-staticlockranking Change-Id: I23335b28faa42fb04f1bc9da02fdf54d1616cd28 Reviewed-on: https://go-review.googlesource.com/c/go/+/549536 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-22runtime: move the wakeableSleep lock under sched in the lock rankMichael Anthony Knyszek
Currently the wakeableSleep lock is placed just after timers in the ranking, but it turns out the timers lock can never be held over a timer func, so that's wrong. Meanwhile, wakeableSleep can acquire sched.lock. wakeableSleep, as it turns out, doesn't have any dependencies -- it's always acquired in a (mostly) regular goroutine context. Change-Id: Icc8ea76a8b309fbaf0f02215f16e5f706d49cd95 Reviewed-on: https://go-review.googlesource.com/c/go/+/544395 TryBot-Bypass: Michael Knyszek <mknyszek@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-11-21runtime: add lock partial order edge between wakeableSleep and hchanMichael Anthony Knyszek
This is totally valid and always was, but the staticlockranking builder started failing when the new execution tracer was enabled by default. Change-Id: I011e7d86bd968b1251bcc4d74395633036753b00 Reviewed-on: https://go-review.googlesource.com/c/go/+/544319 Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Run-TryBot: Michael Knyszek <mknyszek@google.com>
2023-11-14runtime: prevent send on closed channel in wakeableSleepMichael Anthony Knyszek
Currently wakeableSleep has a race where, although stopTimer is called, the timer could be queued already and fire *after* the wakeup channel is closed. Fix this by protecting wakeup with a lock used on the close and wake paths and assigning the wakeup to nil on close. The wake path then ignores a nil wakeup channel. This fixes the problem by ensuring that a failure to stop the timer only results in the timer doing nothing, rather than trying to send on a closed channel. The addition of this lock requires some changes to the static lock ranking system. Thiere's also a second problem here: the timer could be delayed far enough into the future that when it fires, it observes a non-nil wakeup if the wakeableSleep has been re-initialized and reset. Fix this problem too by allocating the wakeableSleep on the heap and creating a new one instead of reinitializing the old one. The GC will make sure that the reference to the old one stays alive for the timer to fire, but that timer firing won't cause a spurious wakeup in the new one. Change-Id: I2b979304e755c015d4466991f135396f6a271069 Reviewed-on: https://go-review.googlesource.com/c/go/+/542335 Reviewed-by: Michael Pratt <mpratt@google.com> Commit-Queue: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Michael Knyszek <mknyszek@google.com>
2023-05-23runtime: add partial lock order between mspanSpecial and gcBitsArenasMichael Anthony Knyszek
CL 493275 started using gcBits for pinner bits. This means gcBits can be allocated while holding the mspanSpecial lock. This is safe because these were just parallel in the partial order, but now they need an explicit edge between them. For #58277. Change-Id: I37917730e12d59cf0580f198d732198413a56424 Reviewed-on: https://go-review.googlesource.com/c/go/+/497475 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2023-05-19runtime: fix lockrank ordering for pinner implementationMichael Anthony Knyszek
The new Pinner API's implementation imposes some partial-orders that are safe but previously did not exist between a mspanSpecial, mheapSpecial, and mheap. Fix that up in the lock ranking. For #46787. Change-Id: I51cc8f7f069240caeb44d749bed43515634f4814 Reviewed-on: https://go-review.googlesource.com/c/go/+/496193 Run-TryBot: Michael Knyszek <mknyszek@google.com> Reviewed-by: David Chase <drchase@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2023-04-24runtime: add raceFiniLock to lock rankingIan Lance Taylor
Also preserve the PC/SP in reentersyscall when doing lock ranking. The test is TestDestructorCallbackRace with the staticlockranking experiment enabled. For #59711 Change-Id: I87ac1d121ec0d399de369666834891ab9e7d11b0 Reviewed-on: https://go-review.googlesource.com/c/go/+/487955 Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com>
2022-10-12runtime: add safe arena support to the runtimeMichael Anthony Knyszek
This change adds an API to the runtime for arenas. A later CL can potentially export it as an experimental API, but for now, just the runtime implementation will suffice. The purpose of arenas is to improve efficiency, primarily by allowing for an application to manually free memory, thereby delaying garbage collection. It comes with other potential performance benefits, such as better locality, a better allocation strategy, and better handling of interior pointers by the GC. This implementation is based on one by danscales@google.com with a few significant differences: * The implementation lives entirely in the runtime (all layers). * Arena chunks are the minimum of 8 MiB or the heap arena size. This choice is made because in practice 64 MiB appears to be way too large of an area for most real-world use-cases. * Arena chunks are not unmapped, instead they're placed on an evacuation list and when there are no pointers left pointing into them, they're allowed to be reused. * Reusing partially-used arena chunks no longer tries to find one used by the same P first; it just takes the first one available. * In order to ensure worst-case fragmentation is never worse than 25%, only types and slice backing stores whose sizes are 1/4th the size of a chunk or less may be used. Previously larger sizes, up to the size of the chunk, were allowed. * ASAN, MSAN, and the race detector are fully supported. * Sets arena chunks to fault that were deferred at the end of mark termination (a non-public patch once did this; I don't see a reason not to continue that). For #51317. Change-Id: I83b1693a17302554cb36b6daa4e9249a81b1644f Reviewed-on: https://go-review.googlesource.com/c/go/+/423359 Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-08-11runtime: add mayAcquire annotation for trace.lockAustin Clements
Now that we've moved the trace locks to the leaf of the lock graph, we can safely annotate that any trace event may acquire trace.lock even if dynamically it turns out a particular event doesn't need to flush and acquire this lock. This reveals a new edge where we can trace while holding the mheap lock, so we add this to the lock graph. For #53789. Updates #53979. Change-Id: I13e2f6cd1b621cca4bed0cc13ef12e64d05c89a7 Reviewed-on: https://go-review.googlesource.com/c/go/+/418720 Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-11runtime: move trace locks to the leaf of the lock graphAustin Clements
Now that trace.lock cannot be held over a stack split, we can move that lock and traceStackTab to the leaf of the lock graph. We add a couple edges to STACKGROW that were previously passing through trace. Fixes #53979. Change-Id: Ie664ff7bb33973745f991f7516dc6106e60f5892 Reviewed-on: https://go-review.googlesource.com/c/go/+/418957 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-04runtime: clean up panic and deadlock lock ranksAustin Clements
I'm not entirely sure why these locks are currently ranked "deadlock < panic" since we drop panic before acquiring deadlock, and we actually want deadlock to be below panic because panic is implicitly below everything else and we want deadlock to be, too. My best guess is that we had this edge because we intentionally acquire deadlock twice to deadlock, and that causes the lock rank checking to panic on the second acquire. Fix this in a more sensible way by capturing that deadlock can be acquired in a self-cycle and flipping the rank to "panic < deadlock" to express that deadlock needs to be under all other locks, just like panic. For #53789. Change-Id: I8809e5d102ce473bd3ace0ba07bf2200ef60263f Reviewed-on: https://go-review.googlesource.com/c/go/+/418719 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Austin Clements <austin@google.com>
2022-08-04runtime: make the lock rank DAG make more senseAustin Clements
This groups, comments, and generally reorganizes the lock rank graph description by subsystem. It also introduces several pseudo-nodes that more cleanly describe the inherent layering of lock ranks by subsystem. I believe this doesn't actually change the graph, but haven't verified this. For #53789. Change-Id: I72f332f5a23b8217c7dc1b21411631ad48cee4b0 Reviewed-on: https://go-review.googlesource.com/c/go/+/418718 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2022-08-04runtime: add mayAcquire annotation for finlockAustin Clements
We're missing lock edges to finlock that happen only rarely. Anything that calls mallocgc can potentially trigger sweeping, which can potentially queue a finalizer, which acquires finlock. While this can happen on any malloc, it happens relatively rarely, so we simply haven't seen some of the lock edges that could happen. Add a mayAcquire annotation to mallocgc to capture the possibility of acquiring finlock. With this change, we add "fin" to the set of "malloc" locks. Several of these edges were already there, but not quite all of them. This was found by inspecting the rank graph for things that didn't make sense. For #53789. Change-Id: Idc10ce6f250596b0c07ba07ac93f2198fb38c22b Reviewed-on: https://go-review.googlesource.com/c/go/+/418717 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-04runtime: add missing trace lock edgesAustin Clements
We're missing lock edges to trace.lock that happen only rarely. Any trace event can potentially fill up a trace buffer and acquire trace.lock in order to flush the buffer, but this happens relatively rarely, so we simply haven't seen some of these lock edges that could happen. With this change, we promote "fin, notifyList < traceStackTab" to "fin, notifyList < trace" and now everything that emits trace events with a P enters the tracer lock ranks via "trace", rather than some things entering at "trace" and others at "traceStackTab". This was found by inspecting the rank graph for things that didn't make sense. Ideally we would add a mayAcquire annotation that any trace event can potentially acquire trace.lock, but there are actually cases that violate this ranking right now. This is #53979. The chance of a lock cycle is extremely low given the number of conditions that have to happen simultaneously. For #53789. Change-Id: Ic65947d27dee88d2daf639b21b2c9d37552f0ac0 Reviewed-on: https://go-review.googlesource.com/c/go/+/418716 Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-04runtime: generate the lock ranking from a DAG descriptionAustin Clements
Currently, the runtime lock rank graph is maintained manually in a large set of arrays that give the partial order and a manual topological sort of this partial order. Any changes to the rank graph are difficult to reason about and hard to review, as well as likely to cause merge conflicts. Furthermore, because the partial order is manually maintained, it's not actually transitively closed (though it's close), meaning there are many cases where rank a can be acquired before b and b before c, but a cannot be acquired before c. While this isn't technically wrong, it's very strange in the context of lock ordering. Replace all of this with a much more compact, readable, and maintainable description of the rank graph written in the internal/dag graph language. We statically generate the runtime structures from this description, which has the advantage that the parser doesn't have to run during runtime initialization and the structures can live in static data where they can be accessed from any point during runtime init. The current description was automatically generated from the existing partial order, combined with a transitive reduction. This ensures it's correct, but it could use some manual messaging to call out the logical layers and add some structure. We do lose the ad hoc string names of the lock ranks in this translation, which could mostly be derived from the rank constant names, but not always. I may bring those back but in a more uniform way. We no longer need the tests in lockrank_test.go because they were checking that we manually maintained the structures correctly. Fixes #53789. Change-Id: I54451d561b22e61150aff7e9b8602ba9737e1b9b Reviewed-on: https://go-review.googlesource.com/c/go/+/418715 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-04runtime: delete unused lock ranksAustin Clements
For #53789. Change-Id: Ic7379afcfdcc47b541bac9b44b5bc6b43604fc0a Reviewed-on: https://go-review.googlesource.com/c/go/+/418714 Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Austin Clements <austin@google.com>
2022-05-10runtime: add lock partial order edge for trace and wbufSpans and mheapMichael Anthony Knyszek
This edge represents the case of executing a write barrier under the trace lock: we might use the wbufSpans lock to get a new trace buffer, or mheap to allocate a totally new one. Fixes #52794. Change-Id: Ia1ac2c744b8284ae29f4745373df3f9675ab1168 Reviewed-on: https://go-review.googlesource.com/c/go/+/405476 Run-TryBot: Michael Knyszek <mknyszek@google.com> Reviewed-by: Bryan Mills <bcmills@google.com>
2022-05-03runtime: split mprof locksRhys Hiltner
The profiles for memory allocations, sync.Mutex contention, and general blocking store their data in a shared hash table. The bookkeeping work at the end of a garbage collection cycle involves maintenance on each memory allocation record. Previously, a single lock guarded access to the hash table and the contents of all records. When a program has allocated memory at a large number of unique call stacks, the maintenance following every garbage collection can hold that lock for several milliseconds. That can prevent progress on all other goroutines by delaying acquirep's call to mcache.prepareForSweep, which needs the lock in mProf_Free to report when a profiled allocation is no longer in use. With no user goroutines making progress, it is in effect a multi-millisecond GC-related stop-the-world pause. Split the lock so the call to mProf_Flush no longer delays each P's call to mProf_Free: mProf_Free uses a lock on the memory records' N+1 cycle, and mProf_Flush uses locks on the memory records' accumulator and their N cycle. mProf_Malloc also no longer competes with mProf_Flush, as it uses a lock on the memory records' N+2 cycle. The profiles for sync.Mutex contention and general blocking now share a separate lock, and another lock guards insertions to the shared hash table (uncommon in the steady-state). Consumers of each type of profile take the matching accumulator lock, so will observe consistent count and magnitude values for each record. For #45894 Change-Id: I615ff80618d10e71025423daa64b0b7f9dc57daa Reviewed-on: https://go-review.googlesource.com/c/go/+/399956 Reviewed-by: Carlos Amedee <carlos@golang.org> Run-TryBot: Rhys Hiltner <rhys@justin.tv> Reviewed-by: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2021-11-05runtime: don't hold the heap lock while scavengingMichael Anthony Knyszek
This change modifies the scavenger to no longer hold the heap lock while actively scavenging pages. To achieve this, the change also: * Reverses the locking behavior of the (*pageAlloc).scavenge API, to only acquire the heap lock when necessary. * Introduces a new lock on the scavenger-related fields in a pageAlloc so that access to those fields doesn't require the heap lock. There are a few places in the scavenge path, notably reservation, that requires synchronization. The heap lock is far too heavy handed for this case. * Changes the scavenger to marks pages that are actively being scavenged as allocated, and "frees" them back to the page allocator the usual way. * Lifts the heap-growth scavenging code out of mheap.grow, where the heap lock is held, and into allocSpan, just after the lock is released. Releasing the lock during mheap.grow is not feasible if we want to ensure that allocation always makes progress (post-growth, another allocator could come in and take all that space, forcing the goroutine that just grew the heap to do so again). This change means that the scavenger now must do more work for each scavenge, but it is also now much more scalable. Although in theory it's not great by always taking the locked paths in the page allocator, it takes advantage of some properties of the allocator: * Most of the time, the scavenger will be working with one page at a time. The page allocator's locked path is optimized for this case. * On the allocation path, it doesn't need to do the find operation at all; it can go straight to setting bits for the range and updating the summary structure. Change-Id: Ie941d5e7c05dcc96476795c63fef74bcafc2a0f1 Reviewed-on: https://go-review.googlesource.com/c/go/+/353974 Trust: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2021-11-05runtime: fix a lock rank ordering and some edges.Dan Scales
The first stack-trace in #49361 shows that traceBuf must precede fin in lockrank ordering, since traceBuf is acquired in StartTrace(), which eventually leads to getting fin in queueFinalizer(). It is fine to move traceBuf above fin, since there are no other conflicting dependencies. The second stack trace shows that there is an edge bewtween reflectOffs and fin, since reflectOffs is acquired in addReflectOff, and map operations can lead to an allocation that eventually causes fin to be acquired in queueFinalizer(). Fixes #49361 Change-Id: I8e857ef9ecdff37fdd229e4dba22e15bc71d4ba5 Reviewed-on: https://go-review.googlesource.com/c/go/+/361407 Trust: Dan Scales <danscales@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2021-05-03cmd/compile: add edge from lock rank edge from forceGC to traceStackTabDan Scales
This edge can happen when forcegchelper() calls goparkunlock(&forcegc.lock, ...) while holding the forcegc lock. goparkunlock() eventually calls park_m(). In park_m(), traceGoPark() (which leads to (*traceStackTable).put() and acquires the traceStackTab lock) can be called before the forcegc lock is released. Fixes #45774 Change-Id: If0fceab596712eb9ec0b9b47326778bc0ff80913 Reviewed-on: https://go-review.googlesource.com/c/go/+/316029 Run-TryBot: Dan Scales <danscales@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Trust: Carlos Amedee <carlos@golang.org>
2021-03-10runtime: check partial lock ranking orderMichael Pratt
To ease readability we typically keep the partial order lists sorted by rank. This isn't required for correctness, it just makes it (slightly) easier to read the lists. Currently we must notice out-of-order entries during code review, which is an error-prone process. Add a test to enforce ordering, and fix the errors that have crept in. Most of the existing errors were misordered lockRankHchan or lockRankPollDesc. While we're here, I've moved the correctness check that the partial ordering satisfies the total ordering from init to a test case. This will allow us to catch these errors without even running staticlockranking. Change-Id: I9c11abe49ea26c556439822bb6a3183129600c3b Reviewed-on: https://go-review.googlesource.com/c/go/+/300171 Trust: Michael Pratt <mpratt@google.com> Trust: Dan Scales <danscales@google.com> Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Dan Scales <danscales@google.com>
2021-03-09runtime: add pollDesc partial edgesMichael Pratt
gscan is taken during stack growth, which may occur while pollDesc is held. mallocgc may also be called while pollDesc is held. mallocgc may take mheap or mheapSpecial. The former exists, but is out of order; the latter is missing. Fixes #44881 Change-Id: Ie25935d9d433e813c11a528ee47255b317a09f41 Reviewed-on: https://go-review.googlesource.com/c/go/+/300009 Trust: Michael Pratt <mpratt@google.com> Trust: Dan Scales <danscales@google.com> Reviewed-by: Dan Scales <danscales@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Pratt <mpratt@google.com>
2021-03-05runtime: update paniclk orderingMichael Pratt
Now that allglock is no longer taken in throw, paniclk can move to the bottom of the lock order where it belongs. There is no fundamental reason that we really need to skip checks on paniclk in lockWithRank (despite the recursive throws that could be caused by lock rank checking, startpanic_m would still allow the crash to complete). However, the partial order of lockRankPanic should be every single lock that may be held before a throw, nil dereference, out-of-bounds access, which our partial order doesn't cover. Updates #42669 Change-Id: Ic3efaea873dc2dd9fd5b0d6ccdd5319730b29a22 Reviewed-on: https://go-review.googlesource.com/c/go/+/270862 Trust: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2020-11-10runtime: add lock rank partial order edge pollDesc -> spanSetSpineMichael Anthony Knyszek
This change adds a missing partial order edge. This edge captures of the case of `wakep` getting called in `wakeNetPoller` which may then allocate. Fixes #42461. Change-Id: Ie67d868e9cd24ed3cc94381dbf8a691dd13f068d Reviewed-on: https://go-review.googlesource.com/c/go/+/268858 Trust: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>
2020-11-10runtime: add lock rank partial order edge sweep -> mspanSpecialMichael Anthony Knyszek
This change adds a missing partial order edge. The edge captures the case where the background sweeper handles some specials (i.e. finalizers or memory profile sampling) and is otherwise correct. Fixes #42472. Change-Id: Ic45f6cc1635fd3d6bc6c91ff6f64d436088cef33 Reviewed-on: https://go-review.googlesource.com/c/go/+/268857 Trust: Michael Knyszek <mknyszek@google.com> Trust: Dan Scales <danscales@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Dan Scales <danscales@google.com>
2020-10-28runtime: add edge lockRankSysmon -> lockRankRwmutexRDan Scales
Sysmon can actually get the RW lock execLock while holding the sysmon lock (if no M is available), so there is an edge from lockRankSysmon to lockRankRwmutexR. The stack trace is sysmon() [gets sched.sysmonlock] -> startm() -> newm() -> newm1() -> execLock.runlock() [gets execLock.rLock] Change-Id: I9658659ba3899afb5219114d66b989abd50540db Reviewed-on: https://go-review.googlesource.com/c/go/+/265721 Trust: Dan Scales <danscales@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2020-10-27runtime: reduce timer latencyChris Hines
Change the scheduler to treat expired timers with the same approach it uses to steal runnable G's. Previously the scheduler ignored timers on P's not marked for preemption. That had the downside that any G's waiting on those expired timers starved until the G running on their P completed or was preempted. That could take as long as 20ms if sysmon was in a 10ms wake up cycle. In addition, a spinning P that ignored an expired timer and found no other work would stop despite there being available work, missing the opportunity for greater parallelism. With this change the scheduler no longer ignores timers on non-preemptable P's or relies on sysmon as a backstop to start threads when timers expire. Instead it wakes an idle P, if needed, when creating a new timer because it cannot predict if the current P will have a scheduling opportunity before the new timer expires. The P it wakes will determine how long to sleep and block on the netpoller for the required time, potentially stealing the new timer when it wakes. This change also eliminates a race between a spinning P transitioning to idle concurrently with timer creation using the same pattern used for submission of new goroutines in the same window. Benchmark analysis: CL 232199, which was included in Go 1.15 improved timer latency over Go 1.14 by allowing P's to steal timers from P's not marked for preemption. The benchmarks added in this CL measure that improvement in the ParallelTimerLatency benchmark seen below. However, Go 1.15 still relies on sysmon to notice expired timers in some situations and sysmon can sleep for up to 10ms before waking to check timers. This CL fixes that shortcoming with modest regression on other benchmarks. name \ avg-late-ns go14.time.bench go15.time.bench fix.time.bench ParallelTimerLatency-8 17.3M ± 3% 7.9M ± 0% 0.2M ± 3% StaggeredTickerLatency/work-dur=300µs/tickers-per-P=1-8 53.4k ±23% 50.7k ±31% 252.4k ± 9% StaggeredTickerLatency/work-dur=300µs/tickers-per-P=2-8 204k ±14% 90k ±58% 188k ±12% StaggeredTickerLatency/work-dur=300µs/tickers-per-P=3-8 1.17M ± 0% 0.11M ± 5% 0.11M ± 2% StaggeredTickerLatency/work-dur=300µs/tickers-per-P=4-8 1.81M ±44% 0.10M ± 4% 0.10M ± 2% StaggeredTickerLatency/work-dur=300µs/tickers-per-P=5-8 2.28M ±66% 0.09M ±13% 0.08M ±21% StaggeredTickerLatency/work-dur=300µs/tickers-per-P=6-8 2.84M ±85% 0.07M ±15% 0.07M ±18% StaggeredTickerLatency/work-dur=300µs/tickers-per-P=7-8 2.13M ±27% 0.06M ± 4% 0.06M ± 9% StaggeredTickerLatency/work-dur=300µs/tickers-per-P=8-8 2.63M ± 6% 0.06M ±11% 0.06M ± 9% StaggeredTickerLatency/work-dur=300µs/tickers-per-P=9-8 3.32M ±17% 0.06M ±16% 0.07M ±14% StaggeredTickerLatency/work-dur=300µs/tickers-per-P=10-8 8.46M ±20% 4.37M ±21% 5.03M ±23% StaggeredTickerLatency/work-dur=2ms/tickers-per-P=1-8 1.02M ± 1% 0.20M ± 2% 0.20M ± 2% name \ max-late-ns go14.time.bench go15.time.bench fix.time.bench ParallelTimerLatency-8 18.3M ± 1% 8.2M ± 0% 0.5M ±12% StaggeredTickerLatency/work-dur=300µs/tickers-per-P=1-8 141k ±19% 127k ±19% 1129k ± 3% StaggeredTickerLatency/work-dur=300µs/tickers-per-P=2-8 2.78M ± 4% 1.23M ±15% 1.26M ± 5% StaggeredTickerLatency/work-dur=300µs/tickers-per-P=3-8 6.05M ± 5% 0.67M ±56% 0.81M ±33% StaggeredTickerLatency/work-dur=300µs/tickers-per-P=4-8 7.93M ±20% 0.71M ±46% 0.76M ±41% StaggeredTickerLatency/work-dur=300µs/tickers-per-P=5-8 9.41M ±30% 0.92M ±23% 0.81M ±44% StaggeredTickerLatency/work-dur=300µs/tickers-per-P=6-8 10.8M ±42% 0.8M ±41% 0.8M ±30% StaggeredTickerLatency/work-dur=300µs/tickers-per-P=7-8 9.62M ±24% 0.77M ±38% 0.88M ±27% StaggeredTickerLatency/work-dur=300µs/tickers-per-P=8-8 10.6M ±10% 0.8M ±32% 0.7M ±27% StaggeredTickerLatency/work-dur=300µs/tickers-per-P=9-8 11.9M ±36% 0.6M ±46% 0.8M ±38% StaggeredTickerLatency/work-dur=300µs/tickers-per-P=10-8 36.8M ±21% 24.7M ±21% 27.5M ±16% StaggeredTickerLatency/work-dur=2ms/tickers-per-P=1-8 2.12M ± 2% 1.02M ±11% 1.03M ± 7% Other time benchmarks: name \ time/op go14.time.bench go15.time.bench fix.time.bench AfterFunc-8 137µs ± 4% 123µs ± 4% 131µs ± 2% After-8 212µs ± 3% 195µs ± 4% 204µs ± 7% Stop-8 165µs ± 6% 156µs ± 2% 151µs ±12% SimultaneousAfterFunc-8 260µs ± 3% 248µs ± 3% 284µs ± 2% StartStop-8 65.8µs ± 9% 64.4µs ± 7% 67.3µs ±15% Reset-8 13.6µs ± 2% 9.6µs ± 2% 9.1µs ± 4% Sleep-8 307µs ± 4% 306µs ± 3% 320µs ± 2% Ticker-8 53.0µs ± 5% 54.5µs ± 5% 57.0µs ±11% TickerReset-8 9.24µs ± 2% 9.51µs ± 3% TickerResetNaive-8 149µs ± 5% 145µs ± 5% Fixes #38860 Updates #25471 Updates #27707 Change-Id: If52680509b0f3b66dbd1d0c13fa574bd2d0bbd57 Reviewed-on: https://go-review.googlesource.com/c/go/+/232298 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com> Trust: Ian Lance Taylor <iant@golang.org>
2020-10-19runtime: add lock rank partial-order edge between fin and mheapMichael Anthony Knyszek
finlock may be held across a write barrier, which could then acquire the mheap lock. Notably, this occurs in the mp.unlockf write in gopark where finlock is held by the finalizer goroutines and is going to sleep. Fixes #42062. Change-Id: Icf76637ae6fc12795436272633dca3d473780875 Reviewed-on: https://go-review.googlesource.com/c/go/+/263678 Trust: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Dan Scales <danscales@google.com>
2020-08-27runtime: add lock partial order edge (fin -> wbufSpans)Michael Pratt
runfinq may have write barriers, thus it may need to take wbufSpans on any write. Fixes #41021 Change-Id: Ib69e20994b5d7d1526ad53d6ddb5e2e83bf2ed00 Reviewed-on: https://go-review.googlesource.com/c/go/+/250464 Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Dan Scales <danscales@google.com>
2020-08-17runtime: clean up old mcentral codeMichael Anthony Knyszek
This change deletes the old mcentral implementation from the code base and the newMCentralImpl feature flag along with it. Updates #37487. Change-Id: Ibca8f722665f0865051f649ffe699cbdbfdcfcf2 Reviewed-on: https://go-review.googlesource.com/c/go/+/221184 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>