From 0ecc9c77faf22aeb0a8385b539d5ba827e351e4b Mon Sep 17 00:00:00 2001 From: Rhys Hiltner Date: Fri, 4 Oct 2024 14:57:12 -0700 Subject: design/68578-mutex-spinbit.md: describe protocol Add a design doc describing the general approach of the "spinbit" mutex protocol, and the details of the two drafts that implement it. Based on futex, for linux/amd64: https://go.dev/cl/601597 For all GOOS values and four architectures: https://go.dev/cl/620435 For golang/go#68578 Change-Id: Ie9665085c9b8cf1741deeb431acfa12fba550b63 Reviewed-on: https://go-review.googlesource.com/c/proposal/+/617618 Auto-Submit: Rhys Hiltner Reviewed-by: Rhys Hiltner Commit-Queue: Rhys Hiltner --- design/68578-mutex-spinbit.md | 258 ++++++++++++++++++++ design/68578/spinbit.cfg | 12 + design/68578/spinbit.tla | 533 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 803 insertions(+) create mode 100644 design/68578-mutex-spinbit.md create mode 100644 design/68578/spinbit.cfg create mode 100644 design/68578/spinbit.tla diff --git a/design/68578-mutex-spinbit.md b/design/68578-mutex-spinbit.md new file mode 100644 index 0000000..85f9e93 --- /dev/null +++ b/design/68578-mutex-spinbit.md @@ -0,0 +1,258 @@ +# Proposal: Improve scalability of runtime.lock2 + +Author(s): Rhys Hiltner + +Last updated: 2024-10-16 + +Discussion at https://go.dev/issue/68578. + +## Abstract + +Improve multi-core scalability of the runtime's internal mutex implementation +by minimizing wakeups of waiting threads. + +Avoiding wakeups of threads that are waiting for the lock allows those threads +to sleep for longer. +That reduces the number of concurrent threads that are attempting to read the +mutex's state word. +Fewer reads of that cache line mean less cache coherency traffic within the +processor when a thread needs to make an update. +Fast updates (to acquire and release the lock) even when many threads need the +lock mean better scalability.
+ +This is not an API change, so is not part of the formal proposal process. + +## Background + +One of the simplest mutex designs is a single bit that is "0" when unlocked or +"1" when locked. +To acquire the lock, a thread attempts to swap in a "1", +looping until the result it gets is "0". +To unlock, the thread swaps in a "0". + +The performance of such a spinlock is poor in at least two ways. +First, threads that are trying to acquire an already-held lock waste their own +on-CPU time. +Second, those software threads execute on hardware resources that need a local +copy of the mutex state word in cache. + +Having the state word in cache for read access requires it not be writeable by +any other processors. +Writing to that memory location requires the hardware to invalidate all cached +copies of that memory, one in each processor that had loaded it for reading. +The hardware-internal communication necessary to implement those guarantees +has a cost, which appears as a slowdown when writing to that memory location. + +Go's current mutex design is several steps more advanced than the simple +spinlock, but under certain conditions its performance can degrade in a similar way. +First, when `runtime.lock2` is unable to immediately obtain the mutex it will +pause for a moment before retrying, primarily using hardware-level delay +instructions (such as `PAUSE` on 386 and amd64). +Then, if it's unable to acquire the mutex after several retries it will ask +the OS to put it to sleep until another thread requests a wakeup. +On Linux, we use the `futex` syscall to sleep directly on the mutex address, +implemented in src/runtime/lock_futex.go. +On many other platforms (including Windows and macOS), the waiting threads +form a LIFO stack with the mutex state word as a pointer to the top of the +stack, implemented in src/runtime/lock_sema.go. + +When the `futex` syscall is available, +the OS maintains a list of waiting threads and will choose which it wakes.
+Otherwise, the Go runtime maintains that list and names a specific thread +when it asks the OS to do a wakeup. +To avoid a `futex` syscall when there's no contention, +we split the "locked" state into two variants: +1 meaning "locked with no contention" and +2 meaning "locked, and a thread may be asleep". +(With the semaphore-based implementation, +the Go runtime can--and must--know for itself whether a thread is asleep.) +Go's mutex implementation has those three logical states +(unlocked, locked, locked-with-sleepers) on all multi-threaded platforms. +For the purposes of the Go runtime, I'm calling this design "tristate". + +After releasing the mutex, +`runtime.unlock2` will wake a thread whenever one is sleeping. +It does not consider whether one of the waiting threads is already awake. +If a waiting thread is already awake, it's not necessary to wake another. + +Waking additional threads results in higher concurrent demand for the mutex +state word's cache line. +Every thread that is awake and spinning in a loop to reload the state word +leads to more cache coherency traffic within the processor, +and to slower writes to that cache line. + +Consider the case where many threads all need to use the same mutex many times +in a row. +Furthermore, consider that the critical section is short relative to the time +it takes a thread to give up on spinning and go (back) to sleep. +At the end of each critical section, the thread that is releasing the mutex +will see that a waiting thread is asleep, and will wake it. +It takes a relatively long time for a thread to decide to go to sleep, +and there's a relatively short time until the next `runtime.unlock2` call will +wake it. +Many threads will be awake, all reloading the state word in a loop, +all slowing down updates to its value. + +Without a limit on the number of threads that can spin on the state word, +higher demand for a mutex value degrades its performance. + +See also https://go.dev/issue/68578. 
+ +## Proposal + +Expand the mutex state word to include a new flag, "spinning". +Use the "spinning" bit to communicate whether one of the waiting threads is +awake and looping while trying to acquire the lock. +Threads mutually exclude each other from the "spinning" state, +but they won't block while attempting to acquire the bit. +Only the thread that owns the "spinning" bit is allowed to reload the state +word in a loop. +It releases the "spinning" bit before going to sleep. +The other waiting threads go directly to sleep. +The thread that unlocks a mutex can avoid waking a thread if it sees that one +is already awake and spinning. +For the purposes of the Go runtime, I'm calling this design "spinbit". + +### futex-based option, https://go.dev/cl/601597 + +I've prepared https://go.dev/cl/601597, +which implements the "spinbit" design for GOOS=linux and GOARCH=amd64. +I've prepared a matching [TLA+ model](./68578/spinbit.tla) +to check for lost wakeups. +(When relying on the `futex` syscall to maintain the list of sleeping Ms, +it's easy to write lost-wakeup bugs.) + +It uses an atomic `Xchg8` operation on two different bytes of the mutex state +word. +The low byte records whether the mutex is locked, +and whether one or more waiting Ms may be asleep. +The "spinning" flag is in a separate byte and so can be independently +manipulated with atomic `Xchg8` operations. +The two bytes are within a single uintptr field (`runtime.mutex.key`). +When the spinning M attempts to acquire the lock, +it can do a CAS on the entire state word, +setting the "locked" flag and clearing the "spinning" flag +in a single operation. + +### Cross-OS option, https://go.dev/cl/620435 + +I've also prepared https://go.dev/cl/620435 which unifies the lock_sema.go and +lock_futex.go implementations and so supports all GOOS values for which Go +supports multiple threads. +(It uses `futex` to implement the `runtime.sema{create,sleep,wakeup}` +functions for lock_futex.go platforms.) 
+Go's development branch now includes `Xchg8` support for +GOARCH=amd64,arm64,ppc64,ppc64le, +and so that CL supports all of those architectures. + +The fast paths for `runtime.lock2` and `runtime.unlock2` use `Xchg8` operations +to interact with the "locked" flag. +The lowest byte of the state word is dedicated to use with those `Xchg8` +operations. +Most of the upper bytes hold a partial pointer to an M. +(The `runtime.m` data structure is large enough to allow reconstructing the low +bits from the partial pointer, +with special handling for the non-heap-allocated `runtime.m0` value.) +Beyond the 8 bits needed for use with `Xchg8`, +a few more low bits are available for use as flags. +One of those bits holds the "spinning" flag, +which is manipulated with pointer-length `Load` and `CAS` operations. + +When Ms go to sleep they form a LIFO stack linked via `runtime.m.nextwaitm` +pointers, as lock_sema.go does today. +The list of waiting Ms is a multi-producer, single-consumer stack. +Each M can add itself, +but inspecting or removing Ms requires exclusive access. +Today, lock_sema.go's `runtime.unlock2` uses the mutex itself to control that +ownership. +That puts any use of the sleeping M list in the critical path of the mutex. + +My proposal uses another bit of the state word as a try-lock to control +inspecting and removing Ms from the list. +This allows additional list-management code without slowing the critical path +of a busy mutex, and use of efficient `Xchg8` operations in the fast paths. +We'll need access to the list in order to attribute contention delay to the +right critical section in the [mutex profile](https://go.dev/issue/66999). +Access to the list will also let us periodically wake an M even when it's not +strictly necessary, to combat tail latency that may be introduced by the +reduction in wakeups. + +Here's the full layout of the `runtime.mutex.key` state word: +Bit 0 holds the "locked" flag, the primary purpose of the mutex.
+Bit 1 is the "sleeping" flag, and is set when the upper bits point to an M. +Bits 2 through 7 are unused, since they're lost with every `Xchg8` operation. +Bit 8 holds the "spinning" try-lock, allowing the holder to reload the state +word in a loop. +Bit 9 holds the "stack" try-lock, allowing the holder to inspect and remove +sleeping Ms from the LIFO stack. +Bits 10 and higher of the state word hold bits 10 and higher of a pointer to +the M at the top of the LIFO stack of sleeping waiters. + +## Rationale + +The status quo is a `runtime.lock2` implementation that experiences congestion +collapse under high contention on machines with many hardware threads. +Addressing that will require fewer threads loading the same cache line in a +loop. + +The literature presents several options for scalable / non-collapsing mutexes. +Some require an additional memory footprint for each mutex in proportion to +the number of threads that may seek to acquire the lock. +Some require threads to store a reference to a value that they will use to +release each lock they hold. +Go includes a `runtime.mutex` as part of every `chan`, and in some +applications those values are the ones with the most contention. +Coupled with `select`, there's no limit to the number of mutexes that an M can +hold. +That means neither of those forms of increased memory footprint is acceptable. + +The performance of fully uncontended `runtime.lock2`/`runtime.unlock2` pairs +is also important to the project. +That limits the use of many of the literature's proposed locking algorithms, +if they include FIFO queue handoff behavior. 
+On my test hardware +(a linux/amd64 machine with i7-13700H, and a darwin/arm64 M1), +a `runtime.mutex` value with zero or moderate contention can support +50,000,000 uses per second on any threads, +or can move between active threads 10,000,000 times per second, +or can move between inactive threads (with sleep mediated by the OS) +about 40,000 to 500,000 times per second (depending on the OS). +Some amount of capture or barging, rather than queueing, is required to +maintain the level of throughput that Go users have come to expect. + +Keeping the size of `runtime.mutex` values as they are today but allowing +threads to sleep with fewer interruptions seems like fulfilling the goal of +the original design. +The main disadvantage I know of is the risk of increased tail latency: +A small set of threads may be able to capture a contended mutex, +passing it back and forth among themselves while the other threads sleep +indefinitely. +That's already a risk of the current lock_sema.go implementation, +but the high volume of wakeups means threads are unlikely to sleep for long, +and the list of sleeping threads may regularly dip to zero. + +The "cross-OS" option has an edge here: +with it, the Go runtime maintains an explicit list of sleeping Ms and so can do +targeted wakes or even direct handoffs to reduce starvation. + +## Compatibility + +There is no change in exported APIs. + +## Implementation + +I've prepared two options for the Go 1.24 release cycle. +One relies on the `futex` syscall and the `Xchg8` operation, and so initially +supports GOOS=linux and GOARCH=amd64: https://go.dev/cl/601597. +The other relies on only the `Xchg8` operation and works with any GOOS value +that supports threads: https://go.dev/cl/620435. +Both are controlled by `GOEXPERIMENT=spinbitmutex`, +enabled by default on supported platforms. 
+ +## Open issues (if applicable) + +I appreciate feedback on the balance between simplicity, +performance at zero or low contention, +performance under extreme contention, +both the performance and maintenance burden for non-first-class ports, +and the accuracy of contention profiles. diff --git a/design/68578/spinbit.cfg b/design/68578/spinbit.cfg new file mode 100644 index 0000000..d6dc6eb --- /dev/null +++ b/design/68578/spinbit.cfg @@ -0,0 +1,12 @@ +SPECIFICATION Spec + +INVARIANT TypeInvariant +PROPERTY NoLostWakeups +PROPERTY HaveAcquisitions + +CONSTANT + NULL = NULL + NumThreads = 3 + NumAcquires = 1 + NumSpins = 0 + WakeAny = TRUE diff --git a/design/68578/spinbit.tla b/design/68578/spinbit.tla new file mode 100644 index 0000000..3023b20 --- /dev/null +++ b/design/68578/spinbit.tla @@ -0,0 +1,533 @@ +------------------------------ MODULE spinbit ------------------------------ +EXTENDS TLC, Integers, Sequences, FiniteSets + +CONSTANT NumThreads +CONSTANT NumAcquires +CONSTANT NumSpins \* performance tunable "spin" +CONSTANT WakeAny \* reduce state space: wake any of the sleepers +CONSTANT NULL \* [ model value ] + +ASSUME NumThreads >= 1 +ASSUME NumAcquires >= 0 +ASSUME NumSpins >= 0 +ASSUME WakeAny \in BOOLEAN + +Threads == 1..NumThreads + +(* --algorithm spinbit + +variables + + \* logical owner + owner = NULL; + + \* lock state, stored in l.key word + l_key_locked = FALSE; \* bit 0 + l_key_sleeping = FALSE; \* bit 1 + l_key_spinning = FALSE; \* bit 63 + + \* local variables + v_locked = [t \in Threads |-> FALSE]; + v_sleeping = [t \in Threads |-> FALSE]; + v_spinning = [t \in Threads |-> FALSE]; + v8_locked = [t \in Threads |-> FALSE]; + v8_sleeping = [t \in Threads |-> FALSE]; + weSpin = [t \in Threads |-> FALSE]; + i = [t \in Threads |-> 0]; + + \* OS state + sleepers = {}; + + \* + acquisitions = 0; + + \* performance tunables + spin = NumSpins; + +define + + TypeInvariant == + /\ owner \in Threads \union {NULL} + /\ l_key_locked \in BOOLEAN + /\ 
l_key_sleeping \in BOOLEAN + /\ l_key_spinning \in BOOLEAN + /\ \A t \in Threads: v_locked[t] \in BOOLEAN + /\ \A t \in Threads: v_sleeping[t] \in BOOLEAN + /\ \A t \in Threads: v_spinning[t] \in BOOLEAN + /\ \A t \in Threads: v8_locked[t] \in BOOLEAN + /\ \A t \in Threads: v8_sleeping[t] \in BOOLEAN + /\ \A t \in Threads: weSpin[t] \in BOOLEAN + /\ \A t \in Threads: i[t] \in Nat + /\ sleepers \subseteq Threads + /\ acquisitions \in Nat + + NoLostWakeups == + (Cardinality(sleepers) > 0) ~> (Cardinality(sleepers) = 0) + + HaveAcquisitions == + <>[](acquisitions >= NumAcquires) + +end define; + +\* This is a lock, so if a process dies while holding it the rest of the simulation +\* won't be able to make progress. We need "fair process" so the threads don't die. +fair process thread \in Threads +begin + NonCriticalSection: + if acquisitions >= NumAcquires then + goto Done; + end if; + + SpeculativeGrab: + \* v8 = atomic.Xchg8(k8, mutexLocked) + v8_locked[self] := l_key_locked; + v8_sleeping[self] := l_key_sleeping; + l_key_locked := TRUE; + l_key_sleeping := FALSE; + SpeculativeGrabCheck: + if v8_locked[self] = FALSE then + if v8_sleeping[self] = TRUE then + \* atomic.Or8(k8, mutexSleeping) + l_key_sleeping := TRUE; + end if; + goto Locked; + else + \* var v uintptr = mutexLocked + v_locked[self] := TRUE; + v_sleeping[self] := FALSE; + v_spinning[self] := FALSE; + weSpin[self] := FALSE; + end if; + + EnterSlowPath: + i[self] := 0; + + TryAcquire: + if v_locked[self] = TRUE then + goto SetSleepBit + else + if weSpin[self] then + \* next := (v &^ mutexSpinning) | mutexLocked | mutexSleeping + \* if atomic.Casuintptr(&l.key, v, next) { + if v_locked[self] = l_key_locked + /\ v_sleeping[self] = l_key_sleeping + /\ v_spinning[self] = l_key_spinning then + l_key_locked := TRUE; + l_key_sleeping := TRUE; + l_key_spinning := FALSE; + goto Locked; + end if; + else + \* prev8 := atomic.Xchg8(k8, mutexLocked|mutexSleeping) + \* if prev8&mutexLocked == 0 { + if l_key_locked = 
FALSE then + l_key_locked := TRUE; + l_key_sleeping := TRUE; + goto Locked; + else + l_key_locked := TRUE; + l_key_sleeping := TRUE; + end if; + end if; + end if; + TryAcquireLoop: + \* v = atomic.Loaduintptr(&l.key) + v_locked[self] := l_key_locked; + v_sleeping[self] := l_key_sleeping; + v_spinning[self] := l_key_spinning; + goto TryAcquire; + + SetSleepBit: + \* atomic.Or8(k8, mutexSleeping) + if v_sleeping[self] = FALSE then + l_key_sleeping := TRUE; + v_sleeping[self] := TRUE; + end if; + + SetSpinBit: + \* if !weSpin && atomic.Xchg8(key8Upper(&l.key), mutexSpinning>>((goarch.PtrSize-1)*8)) == 0 { + if weSpin[self] = FALSE then + weSpin[self] := (l_key_spinning = FALSE); + l_key_spinning := TRUE; + end if; + + DoSpin: + if weSpin[self] = TRUE then + if i[self] < spin then + \* procyield(active_spin_cnt) + \* v = atomic.Loaduintptr(&l.key) + v_locked[self] := l_key_locked; + v_sleeping[self] := l_key_sleeping; + v_spinning[self] := l_key_spinning; + goto TryAcquire; + else + weSpin[self] := FALSE; + \* atomic.Xchg8(key8Upper(&l.key), 0) + l_key_spinning := FALSE; + end if; + end if; + + Sleep: + \* v = atomic.Loaduintptr(&l.key) + v_locked[self] := l_key_locked; + v_sleeping[self] := l_key_sleeping; + v_spinning[self] := l_key_spinning; + Futex: + if v_locked[self] = FALSE then + goto TryAcquire; + else + \* futexsleep(k32, uint32(v|mutexSleeping), -1) + if l_key_locked = v_locked[self] + /\ l_key_sleeping = TRUE then + sleepers := sleepers \union {self}; + end if; + end if; + + Sleeping: + await self \notin sleepers; + i[self] := 0; + \* v = atomic.Loaduintptr(&l.key) + v_locked[self] := l_key_locked; + v_sleeping[self] := l_key_sleeping; + v_spinning[self] := l_key_spinning; + goto TryAcquire; + + Locked: + owner := self; + CriticalSection: + acquisitions := acquisitions + 1; + Unlock: + owner := NULL; + \* prev8 := atomic.Xchg8(key8(&l.key), 0) + v8_locked[self] := l_key_locked; + v8_sleeping[self] := l_key_sleeping; + l_key_locked := FALSE; + l_key_sleeping 
:= FALSE; + if v8_sleeping[self] = TRUE then + goto Wakeup; + else + goto NonCriticalSection; + end if; + + Wakeup: + \* v := atomic.Loaduintptr(&l.key) + v_locked[self] := l_key_locked; + v_sleeping[self] := l_key_sleeping; + v_spinning[self] := l_key_spinning; + Wake2: + if v_spinning[self] = FALSE then + if Cardinality(sleepers) > 0 then + if WakeAny then + with other = CHOOSE other \in sleepers: TRUE do + sleepers := sleepers \ {other}; + end with; + else + with other \in sleepers do + sleepers := sleepers \ {other}; + end with; + end if; + end if; + end if; + goto NonCriticalSection; + +end process; + +end algorithm; *) +\* BEGIN TRANSLATION (chksum(pcal) = "a993096" /\ chksum(tla) = "41bbe278") +VARIABLES owner, l_key_locked, l_key_sleeping, l_key_spinning, v_locked, + v_sleeping, v_spinning, v8_locked, v8_sleeping, weSpin, i, sleepers, + acquisitions, spin, pc + +(* define statement *) +TypeInvariant == + /\ owner \in Threads \union {NULL} + /\ l_key_locked \in BOOLEAN + /\ l_key_sleeping \in BOOLEAN + /\ l_key_spinning \in BOOLEAN + /\ \A t \in Threads: v_locked[t] \in BOOLEAN + /\ \A t \in Threads: v_sleeping[t] \in BOOLEAN + /\ \A t \in Threads: v_spinning[t] \in BOOLEAN + /\ \A t \in Threads: v8_locked[t] \in BOOLEAN + /\ \A t \in Threads: v8_sleeping[t] \in BOOLEAN + /\ \A t \in Threads: weSpin[t] \in BOOLEAN + /\ \A t \in Threads: i[t] \in Nat + /\ sleepers \subseteq Threads + /\ acquisitions \in Nat + +NoLostWakeups == + (Cardinality(sleepers) > 0) ~> (Cardinality(sleepers) = 0) + +HaveAcquisitions == + <>[](acquisitions >= NumAcquires) + + +vars == << owner, l_key_locked, l_key_sleeping, l_key_spinning, v_locked, + v_sleeping, v_spinning, v8_locked, v8_sleeping, weSpin, i, + sleepers, acquisitions, spin, pc >> + +ProcSet == (Threads) + +Init == (* Global variables *) + /\ owner = NULL + /\ l_key_locked = FALSE + /\ l_key_sleeping = FALSE + /\ l_key_spinning = FALSE + /\ v_locked = [t \in Threads |-> FALSE] + /\ v_sleeping = [t \in Threads |-> FALSE] 
+ /\ v_spinning = [t \in Threads |-> FALSE] + /\ v8_locked = [t \in Threads |-> FALSE] + /\ v8_sleeping = [t \in Threads |-> FALSE] + /\ weSpin = [t \in Threads |-> FALSE] + /\ i = [t \in Threads |-> 0] + /\ sleepers = {} + /\ acquisitions = 0 + /\ spin = NumSpins + /\ pc = [self \in ProcSet |-> "NonCriticalSection"] + +NonCriticalSection(self) == /\ pc[self] = "NonCriticalSection" + /\ IF acquisitions >= NumAcquires + THEN /\ pc' = [pc EXCEPT ![self] = "Done"] + ELSE /\ pc' = [pc EXCEPT ![self] = "SpeculativeGrab"] + /\ UNCHANGED << owner, l_key_locked, + l_key_sleeping, l_key_spinning, + v_locked, v_sleeping, v_spinning, + v8_locked, v8_sleeping, weSpin, i, + sleepers, acquisitions, spin >> + +SpeculativeGrab(self) == /\ pc[self] = "SpeculativeGrab" + /\ v8_locked' = [v8_locked EXCEPT ![self] = l_key_locked] + /\ v8_sleeping' = [v8_sleeping EXCEPT ![self] = l_key_sleeping] + /\ l_key_locked' = TRUE + /\ l_key_sleeping' = FALSE + /\ pc' = [pc EXCEPT ![self] = "SpeculativeGrabCheck"] + /\ UNCHANGED << owner, l_key_spinning, v_locked, + v_sleeping, v_spinning, weSpin, i, + sleepers, acquisitions, spin >> + +SpeculativeGrabCheck(self) == /\ pc[self] = "SpeculativeGrabCheck" + /\ IF v8_locked[self] = FALSE + THEN /\ IF v8_sleeping[self] = TRUE + THEN /\ l_key_sleeping' = TRUE + ELSE /\ TRUE + /\ UNCHANGED l_key_sleeping + /\ pc' = [pc EXCEPT ![self] = "Locked"] + /\ UNCHANGED << v_locked, v_sleeping, + v_spinning, weSpin >> + ELSE /\ v_locked' = [v_locked EXCEPT ![self] = TRUE] + /\ v_sleeping' = [v_sleeping EXCEPT ![self] = FALSE] + /\ v_spinning' = [v_spinning EXCEPT ![self] = FALSE] + /\ weSpin' = [weSpin EXCEPT ![self] = FALSE] + /\ pc' = [pc EXCEPT ![self] = "EnterSlowPath"] + /\ UNCHANGED l_key_sleeping + /\ UNCHANGED << owner, l_key_locked, + l_key_spinning, v8_locked, + v8_sleeping, i, sleepers, + acquisitions, spin >> + +EnterSlowPath(self) == /\ pc[self] = "EnterSlowPath" + /\ i' = [i EXCEPT ![self] = 0] + /\ pc' = [pc EXCEPT ![self] = "TryAcquire"] + /\ 
UNCHANGED << owner, l_key_locked, l_key_sleeping, + l_key_spinning, v_locked, v_sleeping, + v_spinning, v8_locked, v8_sleeping, + weSpin, sleepers, acquisitions, spin >> + +TryAcquire(self) == /\ pc[self] = "TryAcquire" + /\ IF v_locked[self] = TRUE + THEN /\ pc' = [pc EXCEPT ![self] = "SetSleepBit"] + /\ UNCHANGED << l_key_locked, l_key_sleeping, + l_key_spinning >> + ELSE /\ IF weSpin[self] + THEN /\ IF v_locked[self] = l_key_locked + /\ v_sleeping[self] = l_key_sleeping + /\ v_spinning[self] = l_key_spinning + THEN /\ l_key_locked' = TRUE + /\ l_key_sleeping' = TRUE + /\ l_key_spinning' = FALSE + /\ pc' = [pc EXCEPT ![self] = "Locked"] + ELSE /\ pc' = [pc EXCEPT ![self] = "TryAcquireLoop"] + /\ UNCHANGED << l_key_locked, + l_key_sleeping, + l_key_spinning >> + ELSE /\ IF l_key_locked = FALSE + THEN /\ l_key_locked' = TRUE + /\ l_key_sleeping' = TRUE + /\ pc' = [pc EXCEPT ![self] = "Locked"] + ELSE /\ l_key_locked' = TRUE + /\ l_key_sleeping' = TRUE + /\ pc' = [pc EXCEPT ![self] = "TryAcquireLoop"] + /\ UNCHANGED l_key_spinning + /\ UNCHANGED << owner, v_locked, v_sleeping, v_spinning, + v8_locked, v8_sleeping, weSpin, i, + sleepers, acquisitions, spin >> + +TryAcquireLoop(self) == /\ pc[self] = "TryAcquireLoop" + /\ v_locked' = [v_locked EXCEPT ![self] = l_key_locked] + /\ v_sleeping' = [v_sleeping EXCEPT ![self] = l_key_sleeping] + /\ v_spinning' = [v_spinning EXCEPT ![self] = l_key_spinning] + /\ pc' = [pc EXCEPT ![self] = "TryAcquire"] + /\ UNCHANGED << owner, l_key_locked, l_key_sleeping, + l_key_spinning, v8_locked, v8_sleeping, + weSpin, i, sleepers, acquisitions, + spin >> + +SetSleepBit(self) == /\ pc[self] = "SetSleepBit" + /\ IF v_sleeping[self] = FALSE + THEN /\ l_key_sleeping' = TRUE + /\ v_sleeping' = [v_sleeping EXCEPT ![self] = TRUE] + ELSE /\ TRUE + /\ UNCHANGED << l_key_sleeping, v_sleeping >> + /\ pc' = [pc EXCEPT ![self] = "SetSpinBit"] + /\ UNCHANGED << owner, l_key_locked, l_key_spinning, + v_locked, v_spinning, v8_locked, + v8_sleeping, 
weSpin, i, sleepers, + acquisitions, spin >> + +SetSpinBit(self) == /\ pc[self] = "SetSpinBit" + /\ IF weSpin[self] = FALSE + THEN /\ weSpin' = [weSpin EXCEPT ![self] = (l_key_spinning = FALSE)] + /\ l_key_spinning' = TRUE + ELSE /\ TRUE + /\ UNCHANGED << l_key_spinning, weSpin >> + /\ pc' = [pc EXCEPT ![self] = "DoSpin"] + /\ UNCHANGED << owner, l_key_locked, l_key_sleeping, + v_locked, v_sleeping, v_spinning, + v8_locked, v8_sleeping, i, sleepers, + acquisitions, spin >> + +DoSpin(self) == /\ pc[self] = "DoSpin" + /\ IF weSpin[self] = TRUE + THEN /\ IF i[self] < spin + THEN /\ v_locked' = [v_locked EXCEPT ![self] = l_key_locked] + /\ v_sleeping' = [v_sleeping EXCEPT ![self] = l_key_sleeping] + /\ v_spinning' = [v_spinning EXCEPT ![self] = l_key_spinning] + /\ pc' = [pc EXCEPT ![self] = "TryAcquire"] + /\ UNCHANGED << l_key_spinning, weSpin >> + ELSE /\ weSpin' = [weSpin EXCEPT ![self] = FALSE] + /\ l_key_spinning' = FALSE + /\ pc' = [pc EXCEPT ![self] = "Sleep"] + /\ UNCHANGED << v_locked, v_sleeping, + v_spinning >> + ELSE /\ pc' = [pc EXCEPT ![self] = "Sleep"] + /\ UNCHANGED << l_key_spinning, v_locked, + v_sleeping, v_spinning, weSpin >> + /\ UNCHANGED << owner, l_key_locked, l_key_sleeping, v8_locked, + v8_sleeping, i, sleepers, acquisitions, spin >> + +Sleep(self) == /\ pc[self] = "Sleep" + /\ v_locked' = [v_locked EXCEPT ![self] = l_key_locked] + /\ v_sleeping' = [v_sleeping EXCEPT ![self] = l_key_sleeping] + /\ v_spinning' = [v_spinning EXCEPT ![self] = l_key_spinning] + /\ pc' = [pc EXCEPT ![self] = "Futex"] + /\ UNCHANGED << owner, l_key_locked, l_key_sleeping, + l_key_spinning, v8_locked, v8_sleeping, weSpin, + i, sleepers, acquisitions, spin >> + +Futex(self) == /\ pc[self] = "Futex" + /\ IF v_locked[self] = FALSE + THEN /\ pc' = [pc EXCEPT ![self] = "TryAcquire"] + /\ UNCHANGED sleepers + ELSE /\ IF l_key_locked = v_locked[self] + /\ l_key_sleeping = TRUE + THEN /\ sleepers' = (sleepers \union {self}) + ELSE /\ TRUE + /\ UNCHANGED sleepers + /\ pc' = 
[pc EXCEPT ![self] = "Sleeping"] + /\ UNCHANGED << owner, l_key_locked, l_key_sleeping, + l_key_spinning, v_locked, v_sleeping, + v_spinning, v8_locked, v8_sleeping, weSpin, i, + acquisitions, spin >> + +Sleeping(self) == /\ pc[self] = "Sleeping" + /\ self \notin sleepers + /\ i' = [i EXCEPT ![self] = 0] + /\ v_locked' = [v_locked EXCEPT ![self] = l_key_locked] + /\ v_sleeping' = [v_sleeping EXCEPT ![self] = l_key_sleeping] + /\ v_spinning' = [v_spinning EXCEPT ![self] = l_key_spinning] + /\ pc' = [pc EXCEPT ![self] = "TryAcquire"] + /\ UNCHANGED << owner, l_key_locked, l_key_sleeping, + l_key_spinning, v8_locked, v8_sleeping, + weSpin, sleepers, acquisitions, spin >> + +Locked(self) == /\ pc[self] = "Locked" + /\ owner' = self + /\ pc' = [pc EXCEPT ![self] = "CriticalSection"] + /\ UNCHANGED << l_key_locked, l_key_sleeping, l_key_spinning, + v_locked, v_sleeping, v_spinning, v8_locked, + v8_sleeping, weSpin, i, sleepers, acquisitions, + spin >> + +CriticalSection(self) == /\ pc[self] = "CriticalSection" + /\ acquisitions' = acquisitions + 1 + /\ pc' = [pc EXCEPT ![self] = "Unlock"] + /\ UNCHANGED << owner, l_key_locked, l_key_sleeping, + l_key_spinning, v_locked, v_sleeping, + v_spinning, v8_locked, v8_sleeping, + weSpin, i, sleepers, spin >> + +Unlock(self) == /\ pc[self] = "Unlock" + /\ owner' = NULL + /\ v8_locked' = [v8_locked EXCEPT ![self] = l_key_locked] + /\ v8_sleeping' = [v8_sleeping EXCEPT ![self] = l_key_sleeping] + /\ l_key_locked' = FALSE + /\ l_key_sleeping' = FALSE + /\ IF v8_sleeping'[self] = TRUE + THEN /\ pc' = [pc EXCEPT ![self] = "Wakeup"] + ELSE /\ pc' = [pc EXCEPT ![self] = "NonCriticalSection"] + /\ UNCHANGED << l_key_spinning, v_locked, v_sleeping, + v_spinning, weSpin, i, sleepers, acquisitions, + spin >> + +Wakeup(self) == /\ pc[self] = "Wakeup" + /\ v_locked' = [v_locked EXCEPT ![self] = l_key_locked] + /\ v_sleeping' = [v_sleeping EXCEPT ![self] = l_key_sleeping] + /\ v_spinning' = [v_spinning EXCEPT ![self] = l_key_spinning] + /\ pc' 
= [pc EXCEPT ![self] = "Wake2"] + /\ UNCHANGED << owner, l_key_locked, l_key_sleeping, + l_key_spinning, v8_locked, v8_sleeping, weSpin, + i, sleepers, acquisitions, spin >> + +Wake2(self) == /\ pc[self] = "Wake2" + /\ IF v_spinning[self] = FALSE + THEN /\ IF Cardinality(sleepers) > 0 + THEN /\ IF WakeAny + THEN /\ LET other == CHOOSE other \in sleepers: TRUE IN + sleepers' = sleepers \ {other} + ELSE /\ \E other \in sleepers: + sleepers' = sleepers \ {other} + ELSE /\ TRUE + /\ UNCHANGED sleepers + ELSE /\ TRUE + /\ UNCHANGED sleepers + /\ pc' = [pc EXCEPT ![self] = "NonCriticalSection"] + /\ UNCHANGED << owner, l_key_locked, l_key_sleeping, + l_key_spinning, v_locked, v_sleeping, + v_spinning, v8_locked, v8_sleeping, weSpin, i, + acquisitions, spin >> + +thread(self) == NonCriticalSection(self) \/ SpeculativeGrab(self) + \/ SpeculativeGrabCheck(self) \/ EnterSlowPath(self) + \/ TryAcquire(self) \/ TryAcquireLoop(self) + \/ SetSleepBit(self) \/ SetSpinBit(self) \/ DoSpin(self) + \/ Sleep(self) \/ Futex(self) \/ Sleeping(self) + \/ Locked(self) \/ CriticalSection(self) \/ Unlock(self) + \/ Wakeup(self) \/ Wake2(self) + +(* Allow infinite stuttering to prevent deadlock on termination. *) + Terminating == /\ \A self \in ProcSet: pc[self] = "Done" + /\ UNCHANGED vars + +Next == (\E self \in Threads: thread(self)) + \/ Terminating + +Spec == /\ Init /\ [][Next]_vars + /\ \A self \in Threads : WF_vars(thread(self)) + +Termination == <>(\A self \in ProcSet: pc[self] = "Done") + +\* END TRANSLATION + +============================================================================= +\* Modification History +\* Last modified Tue Oct 01 13:18:42 PDT 2024 by rhysh +\* Created Tue Sep 17 12:10:43 PDT 2024 by rhysh