| author | Vlad Saioc <vsaioc@uber.com> | 2025-09-08 13:48:04 +0000 |
|---|---|---|
| committer | Michael Knyszek <mknyszek@google.com> | 2025-09-08 08:11:43 -0700 |
| commit | f6e2e6b7f89c932d16aaa20b2aabddd599e5b468 (patch) | |
| tree | 79874bd8058bc0b812f9c83ed121a0c686d416d1 /design/74609-goroutine-leak-detection-gc.md | |
| parent | 2c02d6bab9c85b41afd730856a70286a687f85be (diff) | |
| download | go-x-proposal-f6e2e6b7f89c932d16aaa20b2aabddd599e5b468.tar.xz | |
design: add design/74609-goroutine-leak-detection-gc.md
For golang/go#74609
Changes at CL 688335
Change-Id: I605c0d4aa88cd44f42300ebe476496744d93f9ce
GitHub-Last-Rev: 49564e8f7727b67d56faf8b330f266cea67a7fcd
GitHub-Pull-Request: golang/proposal#58
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/689555
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Diffstat (limited to 'design/74609-goroutine-leak-detection-gc.md')
| -rw-r--r-- | design/74609-goroutine-leak-detection-gc.md | 154 |
1 files changed, 154 insertions, 0 deletions
diff --git a/design/74609-goroutine-leak-detection-gc.md b/design/74609-goroutine-leak-detection-gc.md
new file mode 100644
index 0000000..3e9dbc5
--- /dev/null
+++ b/design/74609-goroutine-leak-detection-gc.md
@@ -0,0 +1,154 @@
+# Proposal: Goroutine leak detection via garbage collection
+
+Author(s): Georgian-Vlad Saioc (vsaioc@uber.com), Milind Chabbi (milind@uber.com)
+
+Last updated: 14 Aug 2025
+
+Discussion at [issue #74609](https://go.dev/issue/74609).
+
+## Abstract
+
+This proposal outlines a dynamic technique for detecting goroutine
+leaks within Go programs. It leverages the existing marking phase
+of the Go garbage collector (GC) to find goroutines blocked over
+concurrency primitives that are not reachable in memory from goroutines
+that may still be runnable.
+
+## Background
+
+Due to its concurrency features (lightweight goroutines,
+message passing), Go is particularly susceptible to concurrency bugs
+known as _goroutine leaks_ (also known as _partial deadlocks_ in
+literature [1](https://dl.acm.org/doi/10.1145/3676641.3715990)).
+Unlike global deadlocks (wherein all goroutines are blocked) that halt
+an entire application, goroutine leaks occur whenever a goroutine is
+blocked indefinitely, e.g., by reading from a channel that no other
+goroutine has access to, but other running goroutines keep the
+program operational.
+This issue can lead to (_a_) severe memory leaks, and (_b_) performance
+penalties, by over-burdening the GC with the task to mark useless memory.
+Goroutine leaks may be notoriously difficult to debug; in some cases
+even their presence alone is difficult to discern, even with otherwise
+thorough diagnostic information, e.g., memory and goroutine profiles.
+This makes tooling capable of detecting their presence valuable
+to the Go ecosystem.
+
+## Proposal
+
+The change involves several modifications to key points during phases
+of the GC cycle, as follows:
+1. Mark root preparation: initially treat only _runnable_ goroutines
+as mark roots (the regular GC treats _all_ goroutines as roots)
+2. Proceed to mark memory from this set of roots.
+3. Once all reachable memory has been marked, check whether any
+unmarked goroutines are blocked at operations over any concurrency
+primitives that have been marked as a result of step 2.
+4. Any such goroutines are considered _eventually runnable_, and
+must be treated as mark roots. Resume marking from step 2 with
+the new roots.
+5. Once a fixed point over reachable memory is computed, report any
+goroutines that are not treated as roots as leaks; resume from
+step 2 one last time with leaked goroutines as mark roots to ensure
+that all reachable memory is marked, like in the regular GC.
+6. Sweeping proceeds as normal.
+
+For an additional in-depth description of the theoretical
+underpinnings, refer [here](https://dl.acm.org/doi/10.1145/3676641.3715990).
+
+## Rationale
+
+The proposal expands the developer toolset when it comes to identifying
+goroutine leaks, especially in long-running systems with complex
+non-deterministic behavior.
+The advantage of this approach over other goroutine leak detection
+techniques is that it can be leveraged, with a minimal performance
+cost, in regular Go systems, e.g., production services.
+It is also theoretically sound, i.e., there are no false positives.
+Its primary limitation is that its effectiveness is reduced the more
+heap resources are over-exposed in memory, i.e., pair-wise reachable.
+
+## Compatibility
+
+The feature is backwards-compatible with any Go program.
+Changes are strictly internal, and any extensions are only accessible
+on an opt-in basis via additional APIs, in this case by adding a
+new profile type.
+
+## Implementation
+
+A working prototype is available at [go.dev/cl/688335](https://go.dev/cl/688335).
+
+In this section we discuss various aspects of the implementation.
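Reviewer note (not part of the diff): the Compatibility section says the feature is reached through a new profile type. The following is a minimal sketch of how a program might request it, assuming the profile is registered under the name `goroutineleak` and is looked up through the existing `runtime/pprof` API. The helper `dumpLeaks` is hypothetical. On a toolchain without the feature, `pprof.Lookup` returns `nil` and the sketch reports that the profile is unavailable.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
	"time"
)

// dumpLeaks (hypothetical helper) writes the leak profile to stdout when
// the runtime provides it, and reports whether the profile was found.
func dumpLeaks(debug int) (bool, error) {
	// Assumption: the profile name is "goroutineleak", as in the design doc.
	p := pprof.Lookup("goroutineleak") // nil when the profile does not exist
	if p == nil {
		return false, nil
	}
	return true, p.WriteTo(os.Stdout, debug)
}

func main() {
	// Leak one goroutine: it receives from a channel nothing else references.
	go func() { <-make(chan struct{}) }()
	time.Sleep(10 * time.Millisecond) // give the goroutine time to block

	found, err := dumpLeaks(1)
	if err != nil {
		fmt.Fprintln(os.Stderr, "writing profile:", err)
		os.Exit(1)
	}
	if !found {
		fmt.Println("goroutineleak profile not available in this toolchain")
	}
}
```

Guarding on a `nil` lookup keeps the same binary usable whether or not the experiment is enabled.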
+
+### Opting in via profiling
+
+Goroutine leak detection behavior is
+triggered on-demand via profiling.
+An additional profile type, `"goroutineleak"`, is now available.
+Attempting to extract it will perform the following:
+
+1. Queue a leak-detecting GC cycle and wait for it to complete.
+2. Extract a goroutine profile.
+3. Filter for goroutines with a leaked status, if `debug < 2`;
+alternatively, get a full stack dump of all goroutines, if `debug >= 2`.
+4. Output the results.
+
+Otherwise, the GC preserves regular behavior, with a few exceptions
+described in the remainder of this section.
+
+### Temporary experimental flag
+In order to avoid most performance penalties,
+the proposal is currently only enabled via the
+experimental flag `goleakprofiler`.
+
+### Hiding pointers from the GC
+It is essential for the approach that certain pointers are only
+conditionally traced by the GC.
+In the current implementation, this is achieved via
+**maybe-traceable pointers**, expressed as type `maybeTraceablePtr`
+in the runtime.
+
+A maybe-traceable pointer value is a pair of an
+`unsafe.Pointer` and a `uintptr` value, stored at fields `.vp` and `.vu`,
+respectively, within the `maybeTraceablePtr` type.
+A maybe-traceable pointer has one of three states:
+
+1) **Unset:** both `.vp` and `.vu` are zero values.
+This is homologous to `nil`.
+2) **Traceable:** both `.vp` and `.vu` are set, where both point to the
+same address.
+3) **Untraceable:** `.vu` is set to the address that is referenced, but
+`.vp` is set
+to `nil`, such that the GC does not automatically trace it when
+scanning the object embedding the maybe-traceable pointer.
+
+Maybe-traceable pointers are then provided with a set of methods for
+setting and unsetting them, that guarantee certain invariants at
+runtime, e.g., that if `.vp` and `.vu` are set, they point to the
+same address.
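Reviewer note (not part of the diff): the three states described above can be illustrated with a small user-space sketch. Only the type name and the fields `.vp` and `.vu` come from the design doc. The method names `set`, `setUntraceable`, `setTraceable` and `state`, and the helper `demoStates`, are hypothetical and need not match the runtime's actual methods. Converting a `uintptr` back to `unsafe.Pointer` is generally unsafe outside the runtime, so the sketch keeps the target alive through a separate reference.

```go
package main

import (
	"fmt"
	"runtime"
	"unsafe"
)

// maybeTraceablePtr mirrors the pair described in the design doc.
type maybeTraceablePtr struct {
	vp unsafe.Pointer // scanned by the GC; nil while untraceable
	vu uintptr        // always holds the address; not scanned by the GC
}

// set stores v in both fields, giving the traceable state.
func (p *maybeTraceablePtr) set(v unsafe.Pointer) { p.vp, p.vu = v, uintptr(v) }

// setUntraceable hides the referent from the GC but keeps its address.
func (p *maybeTraceablePtr) setUntraceable() { p.vp = nil }

// setTraceable restores vp from vu, so both hold the same address again.
// Assumption: the referent is kept alive elsewhere while vp is nil.
func (p *maybeTraceablePtr) setTraceable() { p.vp = unsafe.Pointer(p.vu) }

// state names the current state of the pair.
func (p *maybeTraceablePtr) state() string {
	switch {
	case p.vu == 0:
		return "unset"
	case p.vp == nil:
		return "untraceable"
	default:
		return "traceable"
	}
}

// demoStates walks one pointer through every state and records each one.
func demoStates() []string {
	target := new(int)
	var p maybeTraceablePtr
	states := []string{p.state()}
	p.set(unsafe.Pointer(target))
	states = append(states, p.state())
	p.setUntraceable()
	states = append(states, p.state())
	p.setTraceable()
	states = append(states, p.state())
	runtime.KeepAlive(target) // target must outlive the untraceable window
	return states
}

func main() {
	fmt.Println(demoStates()) // prints "[unset traceable untraceable traceable]"
}
```

Keeping `.vu` as the authoritative copy lets the pointer be re-exposed to the GC without any external bookkeeping.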
+
+The use of maybe-traceable pointers is only required for `*sudog`
+objects, specifically for the `.elem` and `.hchan` fields.
+This prevents the GC from inadvertently marking channels that have
+not yet been deemed reachable in memory via eventually runnable
+goroutines.
+This may occur because `*sudog` objects are globally reachable: via
+the list of goroutine objects (`*g`) at `allgs`, and via the treap
+forest of semaphore-related `*sudog`s at `semtable`.
+
+All uses of these fields have been updated with the methods provided
+by the `maybeTraceablePtr` type.
+When a goroutine leak detection GC cycle starts, it sets all
+maybe-traceable pointers in `*sudog` objects as untraceable.
+Once the cycle concludes, it resets all the pointers to being traceable.
+
+### Soft dependency on [go.dev/issue/27993](https://go.dev/issue/27993)
+In the current implementation of the GC, there is a check for whether
+the marking phase must be restarted due to
+[go.dev/issue/27993](https://go.dev/issue/27993).
+We extend that checkpoint with additional logic: (1) to find
+additional eventually-runnable goroutines, or (2) to mark goroutines as
+leaked, both of which provide another reason to restart
+the marking phase.
+Even if #27993 is resolved, the checkpoint must be preserved
+for goroutine leak detection.
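Reviewer note (not part of the diff): the restart-until-stable behavior at this checkpoint corresponds to steps 1 to 5 of the Proposal section. It can be modelled as a fixed-point computation over a toy graph. This is a sketch under strong simplifying assumptions, not the runtime's algorithm: goroutines and primitives are plain strings, reachability is flattened to the primitives each goroutine references directly, and the type `goroutine` and function `findLeaks` are hypothetical names.

```go
package main

import "fmt"

// goroutine is a toy stand-in for a runtime goroutine (hypothetical model).
type goroutine struct {
	name      string
	runnable  bool     // false means blocked on a primitive
	blockedOn string   // primitive the goroutine waits on, if blocked
	refs      []string // primitives directly reachable from the goroutine
}

// findLeaks returns the names of goroutines never promoted to mark roots.
func findLeaks(gs []goroutine) []string {
	marked := map[string]bool{} // primitives reached so far
	rooted := make([]bool, len(gs))
	root := func(i int) {
		rooted[i] = true
		for _, r := range gs[i].refs {
			marked[r] = true
		}
	}
	// Step 1: only runnable goroutines start out as roots.
	for i, g := range gs {
		if g.runnable {
			root(i)
		}
	}
	// Steps 2-4: promote blocked goroutines whose primitive is marked,
	// and repeat until a pass promotes nothing (the fixed point).
	for changed := true; changed; {
		changed = false
		for i, g := range gs {
			if !rooted[i] && marked[g.blockedOn] {
				root(i)
				changed = true
			}
		}
	}
	// Step 5: goroutines that never became roots are reported as leaked.
	var leaks []string
	for i, g := range gs {
		if !rooted[i] {
			leaks = append(leaks, g.name)
		}
	}
	return leaks
}

func main() {
	gs := []goroutine{
		{name: "main", runnable: true, refs: []string{"chA"}},
		{name: "worker", blockedOn: "chA", refs: []string{"chB"}},
		{name: "helper", blockedOn: "chB"},
		{name: "orphan", blockedOn: "chC", refs: []string{"chC"}},
	}
	fmt.Println(findLeaks(gs)) // prints "[orphan]"
}
```

In this model `orphan` is reported because the only reference to `chC` is its own, while `helper` is spared because `worker` becomes eventually runnable and exposes `chB`.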
