| | | |
|---|---|---|
| author | Michael Anthony Knyszek <mknyszek@google.com> | 2022-08-04 20:08:53 +0000 |
| committer | Michael Knyszek <mknyszek@google.com> | 2022-08-04 21:01:59 +0000 |
| commit | 2e782e0005d36fe20e9f2fb16fc264cc08c97819 (patch) | |
| tree | ef5a3d1b8dfa377d7a5f390471f76d717015013f | |
| parent | 6963ca6ea3c5b47fcd3e27b71ffb9cbd6ea5ab1a (diff) | |
| download | go-x-proposal-2e782e0005d36fe20e9f2fb16fc264cc08c97819.tar.xz | |
design: update the soft memory limit design document
This change updates the design document with some additional prior art
and corrections that reflect the final implementation.
For #48409.
Change-Id: I8276a30b546c698ab1c9f946ccc0ae2a327399d0
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/421418
Reviewed-by: Michael Knyszek <mknyszek@google.com>
| mode | file | changes |
|---|---|---|
| -rw-r--r-- | design/48409-soft-memory-limit.md | 146 |
| -rw-r--r-- | design/48409/README.md | 5 |
| -rw-r--r-- | design/48409/eqn1.png | bin 1012 -> 922 bytes |
| -rw-r--r-- | design/48409/eqn2.png | bin 1656 -> 1656 bytes |
| -rw-r--r-- | design/48409/inl1.png | bin 399 -> 399 bytes |
| -rw-r--r-- | design/48409/inl10.png | bin 455 -> 449 bytes |
| -rw-r--r-- | design/48409/inl11.png | bin 368 -> 455 bytes |
| -rw-r--r-- | design/48409/inl12.png | bin 755 -> 368 bytes |
| -rw-r--r-- | design/48409/inl13.png | bin 0 -> 755 bytes |
| -rw-r--r-- | design/48409/inl2.png | bin 370 -> 370 bytes |
| -rw-r--r-- | design/48409/inl3.png | bin 486 -> 389 bytes |
| -rw-r--r-- | design/48409/inl4.png | bin 434 -> 400 bytes |
| -rw-r--r-- | design/48409/inl5.png | bin 400 -> 397 bytes |
| -rw-r--r-- | design/48409/inl6.png | bin 374 -> 695 bytes |
| -rw-r--r-- | design/48409/inl7.png | bin 462 -> 374 bytes |
| -rw-r--r-- | design/48409/inl8.png | bin 482 -> 462 bytes |
| -rw-r--r-- | design/48409/inl9.png | bin 449 -> 482 bytes |
| -rw-r--r-- | design/48409/soft-memory-limit.src.md | 135 |
18 files changed, 93 insertions, 193 deletions
diff --git a/design/48409-soft-memory-limit.md b/design/48409-soft-memory-limit.md
index 59c0ab1..3edc9ad 100644
--- a/design/48409-soft-memory-limit.md
+++ b/design/48409-soft-memory-limit.md
@@ -208,70 +208,51 @@ limit , I propose the following calculation:
 
 
-Where  is (per `runtime/metrics` memory names)
+ is the total amount of memory mapped by the Go runtime.
+ is the amount of free and unscavenged memory the Go
+runtime is holding.
+ is the number of bytes in allocated heap objects at the
+time  is computed.
 
-```
-/memory/classes/metadata/mcache/free:bytes +
-/memory/classes/metadata/mcache/inuse:bytes +
-/memory/classes/metadata/mspan/free:bytes +
-/memory/classes/metadata/mspan/inuse:bytes +
-/memory/classes/metadata/other:bytes +
-/memory/classes/os-stacks:bytes +
-/memory/classes/other:bytes +
-/memory/classes/profiling/buckets:bytes
-```
-
-and  is the maximum of
-`/memory/classes/heap/unused:bytes + /memory/classes/heap/free:bytes` over the
-last GC cycle.
-
-These terms (called , for "overheads") account for all
-memory that is not accounted for by the GC pacer (from the [new pacer
-proposal](https://github.com/golang/proposal/blob/329650d4723a558c2b76b81b4995fc5c267e6bc1/design/44167-gc-pacer-redesign.md#heap-goal)).
+The second term, , represents the sum of
+non-heap overheads.
+Free and unscavenged memory is specifically excluded because this is memory that
+the runtime might use in the near future, and the scavenger is specifically
+instructed to leave the memory up to the heap goal unscavenged.
+Failing to exclude free and unscavenged memory could lead to a very poor
+accounting of non-heap overheads.
 
 With  fully defined, our heap goal for cycle
- () is a straightforward extension
+ () is a straightforward extension
 of the existing one.
 
 Where
 
-*  is equal to bytes marked at the end of GC n's mark
+*  is equal to bytes marked at the end of GC n's mark
   phase
-*  is equal to stack bytes at the beginning of GC n's
+*  is equal to stack bytes at the beginning of GC n's
   mark phase
-*  is equal to bytes of globals at the beginning of GC
+*  is equal to bytes of globals at the beginning of GC
   n's mark phase
-*  is equal to
-  
+*  is equal to
+  
 
 then
 
 
 
-Over the course of a GC cycle  remains stable because it
-increases monotonically.
-There's only one situation where  can grow tremendously
-(relative to active heap objects) in a short period of time (< 1 GC cycle), and
-that's when `GOMAXPROCS` increases.
-So, I also propose recomputing this value at that time.
-
-Meanwhile  stays relatively stable (and doesn't have a
-sawtooth pattern, as one might expect from a sum of idle heap memory) because
-object sweeping occurs incrementally, specifically proportionally to how fast
-the application is allocating.
-Furthermore, this value is guaranteed to stay relatively stable across a single
-GC cycle, because the total size of the heap for one GC cycle is bounded by the
-heap goal.
-Taking the highwater mark of this value places a conservative upper bound on the
-total impact of this memory, so the heap goal stays safe from major changes.
+Over the course of a GC cycle, non-heap overheads remain stable because they
+mostly increase monotonically.
+However, the GC needs to be responsive to any change in non-heap overheads.
+Therefore, I propose a more heavy-weight recomputation of the heap goal every
+time it's needed, as opposed to computing it only once per cycle.
+This also means the GC trigger point needs to be dynamically recomputable.
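A note on the hunk above: the conversion it describes can be sketched in a few lines of Go. This is only an illustration of the formulas `L̂ = L - (T - F - A)` and `N_n = min(L̂, γ(M_{n-1}) + S_n + G_n)`; the function names are hypothetical, and `γ(M)` is read here as a plain scaling of last cycle's marked bytes rather than the runtime's actual pacer code.

```go
// Package memlimit is a toy model of the heap-limit conversion in this design.
package memlimit

// HeapLimit computes L̂ = L - (T - F - A): the soft limit minus non-heap
// overheads, where the overheads are total mapped memory (T) minus free and
// unscavenged memory (F) minus bytes in allocated heap objects (A).
func HeapLimit(softLimit, mapped, freeUnscavenged, allocated uint64) uint64 {
	overheads := mapped - freeUnscavenged - allocated
	if overheads >= softLimit {
		return 0 // overheads alone exceed the limit; clamp instead of underflowing
	}
	return softLimit - overheads
}

// HeapGoal caps the usual goal at the heap limit:
// N_n = min(L̂, γ(M_{n-1}) + S_n + G_n).
func HeapGoal(heapLimit, marked, stacks, globals uint64, gamma float64) uint64 {
	goal := uint64(gamma*float64(marked)) + stacks + globals
	if goal > heapLimit {
		goal = heapLimit
	}
	return goal
}
```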
+This check will create additional overheads, but they're likely to be low, as
+the GC's internal statistics are updated only on slow paths.
 
-One concern with the above definition of  is that it
-is fragile to changes to the Go GC.
-In the past, seemingly unrelated changes to the Go runtime have impacted the
-GC's pacer, usually due to an unforeseen influence on the accounting that the
-pacer relies on.
-To minimize the impact of these accidents on the conversion function, I propose
-centralizing and categorizing all the variables used in accounting, and writing
-tests to ensure that expected properties of the account remain in-tact.
+The nice thing about this definition of  is that
+it's fairly robust to changes to the Go GC, since total mapped memory, free and
+unscavenged memory, and bytes allocated in objects, are fairly fundamental
+properties (especially to any tracing GC design).
 
 #### Death spirals
 
@@ -322,7 +303,7 @@ large enough to accommodate worst-case pause times but not too large such that
 a more than about a second.
 1 CPU-second per `GOMAXPROCS` seems like a reasonable place to start.
 
-Unfortunately, 50% is not a reasonable choice for small values of `GOGC`.
+Unfortunately, 50% is not always a reasonable choice for small values of `GOGC`.
 Consider an application running with `GOGC=10`: an overall 50% GC CPU
 utilization limit for `GOGC=10` is likely going to be always active, leading to
 significant overshoot.
@@ -359,22 +340,13 @@ use approaches the limit.
 I propose it does so using a proportional-integral controller whose input is the
 difference between the memory limit and the memory used by Go, and whose output
 is the CPU utilization target of the background scavenger.
-The output will be clamped at a minimum of 1% and a maximum of 10% overall CPU
-utilization.
-Note that the 10% is chosen arbitrarily; in general, returning memory to the
-platform is nowhere near as costly as the GC, but the number must be chosen such
-that the mutator still has plenty of room to make progress (thus, I assert that
-40% of CPU time is enough).
-In order to make the scavenger scale to overall CPU utilization effectively, it
-requires some improvements to avoid the aforementioned locking issues it deals
-with today.
+This will make the background scavenger more reliable.
 
-Any CPU time spent in the scavenger should also be accounted for in the leaky
-bucket algorithm described in the [Death spirals](#death-spirals) section as GC
-time, however I don't think it should be throttled in the same way.
-The intuition behind that is that returning memory to the platform is generally
-going to be more immediately fruitful than spending more time in garbage
-collection.
+However, the background scavenger likely won't return memory to the OS promptly
+enough for the memory limit, so in addition, I propose having span allocations
+eagerly return memory to the OS to stay under the limit.
+The time a goroutine spends in this will also count toward the 50% GC CPU limit
+described in the [Death spirals](#death-spirals) section.
 
 #### Alternative approaches considered
 
@@ -418,38 +390,13 @@ go beyond the spans already in-use.
 
 ##### Returning memory to the platform
 
-A potential issue with the proposed design is that because the scavenger is
-running in the background, it may not react readily to spikes in memory use that
-exceed the limit.
-
-In contrast, [TCMalloc](#tcmalloc) searches for memory to return eagerly, if an
-allocation were to exceed the limit.
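To make the eager-return policy added in this hunk concrete, here is a rough sketch of a span-allocation path that scavenges before exceeding the limit. The `pageAlloc` type, its fields, and the bookkeeping are hypothetical stand-ins, not the runtime's page allocator.

```go
package memlimit

// pageAlloc is a toy stand-in for a page allocator that tracks how much memory
// it has retained from the OS and the memory limit it must respect.
type pageAlloc struct {
	retainedBytes uint64 // memory currently retained from the OS
	limit         uint64 // the soft memory limit
}

// allocSpan reserves `need` bytes, first returning free pages to the OS if the
// allocation would otherwise push retained memory over the limit. Time spent
// here would be charged against the GC CPU limit from the Death spirals section.
func (p *pageAlloc) allocSpan(need uint64) {
	if p.retainedBytes+need > p.limit {
		p.scavenge(p.retainedBytes + need - p.limit)
	}
	p.retainedBytes += need
}

// scavenge returns up to n bytes of free, unscavenged memory to the OS.
func (p *pageAlloc) scavenge(n uint64) {
	if n > p.retainedBytes {
		n = p.retainedBytes
	}
	p.retainedBytes -= n
}
```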
-In the Go 1.13 cycle, I attempted a similar policy when first implementing the
-scavenger, and found that it could cause unacceptable tail latency increases in
-some applications.
-While that policy certainly tried to return memory back to the platform
-significantly more often than it would be in this case, it still has a couple of
-downsides:
-1. It introduces latency.
-   The background scavenger can be more efficiently time-sliced in between other
-   work, so it generally should only impact throughput.
-1. It's much more complicated to bound the total amount of time spent searching
-   for and returning memory to the platform during an allocation.
-
-The key insight as to why this policy works just fine for TCMalloc and won't
-work for Go comes from a fundamental difference in design.
-Manual memory allocators are typically designed to have a LIFO-style memory
-reuse pattern.
-Once an allocation is freed, it is immediately available for reallocation.
-In contrast, most efficient tracing garbage collection algorithms require a
-FIFO-style memory reuse pattern, since allocations are freed in bulk.
-The result is that the page allocator in a garbage-collected memory allocator is
-accessed far more frequently than in manual memory allocator, so this path will
-be hit a lot harder.
-
-For the purposes of this design, I don't believe the benefits of eager return
-outweigh the costs, and I do believe that the proposed design is good enough for
-most cases.
+If returning memory to the OS eagerly becomes a significant performance issue, a
+reasonable alternative could be to crank up the background scavenger's CPU usage
+in response to growing memory pressure.
+This needs more thought, but given that it would now be controlled by a
+controller, its CPU usage will be more reliable, and this is an option we can
+keep in mind.
+One benefit of this option is that it may impact latency less prominently.
 
 ### Documentation
 
@@ -513,6 +460,11 @@ decides to shrink the heap space used; more recent implementations (e.g. G1) do
 so more rarely, except when [the application is
 idle](https://openjdk.java.net/jeps/346).
 
+Some JVMs are "container aware" and read the memory limits of their containers
+to stay under the limit.
+This behavior is closer to what is proposed in this document, but I do not
+believe the memory limit is directly configurable, like the one proposed here.
+
 ### SetMaxHeap
 
 For nearly 4 years, the Go project has been trialing an experimental API in the
diff --git a/design/48409/README.md b/design/48409/README.md
index 8994615..375a0f2 100644
--- a/design/48409/README.md
+++ b/design/48409/README.md
@@ -1,6 +1,6 @@
 # Design document
 
-The GC pacer design document is generated from the `.src.md` file in this
+The memory limit design document is generated from the `.src.md` file in this
 directory.
 It contains LaTeX formulas which Markdown cannot render, so they're rendered by
 an external open-source tool,
@@ -10,9 +10,6 @@ Then, because Gitiles' markdown viewer can't render SVGs, run
 
 ```
 ./svg2png.bash
-cd pacer-plots
-./svg2png.bash
-cd ..
 ```
 
 And go back and replace all instances of SVG with PNG in the final document.
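For readers coming to this change after the fact: the interface this design shipped as in Go 1.19 is the `GOMEMLIMIT` environment variable (for example `GOMEMLIMIT=2GiB`) and `runtime/debug.SetMemoryLimit`. A minimal usage example, with the limit value chosen arbitrarily:

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// Ask the runtime to keep its total memory use at roughly 2 GiB. The limit
	// is soft: the GC works harder and memory is returned to the OS as usage
	// approaches it, but the runtime may still exceed it rather than thrash.
	const limit int64 = 2 << 30
	prev := debug.SetMemoryLimit(limit)
	fmt.Printf("memory limit set to %d bytes (previously %d)\n", limit, prev)

	// Optionally turn off the GOGC-based target and let the memory limit alone
	// decide when to collect.
	debug.SetGCPercent(-1)
}
```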
diff --git a/design/48409/eqn1.png b/design/48409/eqn1.png
index f638a9a..96ce7b9 100644
--- a/design/48409/eqn1.png
+++ b/design/48409/eqn1.png
Binary files differ
diff --git a/design/48409/eqn2.png b/design/48409/eqn2.png
index b25a7dd..8174e00 100644
--- a/design/48409/eqn2.png
+++ b/design/48409/eqn2.png
Binary files differ
diff --git a/design/48409/inl1.png b/design/48409/inl1.png
index d5799ad..8636d42 100644
--- a/design/48409/inl1.png
+++ b/design/48409/inl1.png
Binary files differ
diff --git a/design/48409/inl10.png b/design/48409/inl10.png
index 0cc2d21..2d1b53e 100644
--- a/design/48409/inl10.png
+++ b/design/48409/inl10.png
Binary files differ
diff --git a/design/48409/inl11.png b/design/48409/inl11.png
index 038e874..b1966ed 100644
--- a/design/48409/inl11.png
+++ b/design/48409/inl11.png
Binary files differ
diff --git a/design/48409/inl12.png b/design/48409/inl12.png
index 48cd28f..1158f2d 100644
--- a/design/48409/inl12.png
+++ b/design/48409/inl12.png
Binary files differ
diff --git a/design/48409/inl13.png b/design/48409/inl13.png
new file mode 100644
index 0000000..1beadc3
--- /dev/null
+++ b/design/48409/inl13.png
Binary files differ
diff --git a/design/48409/inl2.png b/design/48409/inl2.png
index bb27598..7fc5478 100644
--- a/design/48409/inl2.png
+++ b/design/48409/inl2.png
Binary files differ
diff --git a/design/48409/inl3.png b/design/48409/inl3.png
index c152bf2..4afda7a 100644
--- a/design/48409/inl3.png
+++ b/design/48409/inl3.png
Binary files differ
diff --git a/design/48409/inl4.png b/design/48409/inl4.png
index cf4f20b..23b71e5 100644
--- a/design/48409/inl4.png
+++ b/design/48409/inl4.png
Binary files differ
diff --git a/design/48409/inl5.png b/design/48409/inl5.png
index 6386671..9fc70f6 100644
--- a/design/48409/inl5.png
+++ b/design/48409/inl5.png
Binary files differ
diff --git a/design/48409/inl6.png b/design/48409/inl6.png
index de20b8e..d72def7 100644
--- a/design/48409/inl6.png
+++ b/design/48409/inl6.png
Binary files differ
diff --git a/design/48409/inl7.png b/design/48409/inl7.png
index 4b7a746..4149a13 100644
--- a/design/48409/inl7.png
+++ b/design/48409/inl7.png
Binary files differ
diff --git a/design/48409/inl8.png b/design/48409/inl8.png
index e3dbc80..9e0c57c 100644
--- a/design/48409/inl8.png
+++ b/design/48409/inl8.png
Binary files differ
diff --git a/design/48409/inl9.png b/design/48409/inl9.png
index 01dff67..820bb71 100644
--- a/design/48409/inl9.png
+++ b/design/48409/inl9.png
Binary files differ
diff --git a/design/48409/soft-memory-limit.src.md b/design/48409/soft-memory-limit.src.md
index c8efb3b..a5d6e81 100644
--- a/design/48409/soft-memory-limit.src.md
+++ b/design/48409/soft-memory-limit.src.md
@@ -107,7 +107,7 @@ It's outside of the scope of the Go runtime to solve this problem for everyone,
 but I believe the Go runtime has an important role to play in supporting these
 worthwhile efforts.
 
-2. Eliminate out-of-memory errors in 100% of cases.
+1. Eliminate out-of-memory errors in 100% of cases.
 
    Whatever policy this API adheres to is going to fail for some use-case, and
   that's OK.
@@ -207,28 +207,20 @@ To compute the heap limit `$\hat{L}$` from the soft memory limit `$L$`, I
 propose the following calculation:
 
 ```render-latex
-\hat{L} = L - (O_M + O_I)
+\hat{L} = L - (T - F - A)
 ```
 
-Where `$O_M$` is (per `runtime/metrics` memory names)
+`$T$` is the total amount of memory mapped by the Go runtime.
+`$F$` is the amount of free and unscavenged memory the Go runtime is holding.
+`$A$` is the number of bytes in allocated heap objects at the time `$\hat{L}$`
+is computed.
 
-```
-/memory/classes/metadata/mcache/free:bytes +
-/memory/classes/metadata/mcache/inuse:bytes +
-/memory/classes/metadata/mspan/free:bytes +
-/memory/classes/metadata/mspan/inuse:bytes +
-/memory/classes/metadata/other:bytes +
-/memory/classes/os-stacks:bytes +
-/memory/classes/other:bytes +
-/memory/classes/profiling/buckets:bytes
-```
-
-and `$O_I$` is the maximum of `/memory/classes/heap/unused:bytes +
-/memory/classes/heap/free:bytes` over the last GC cycle.
-
-These terms (called `$O$`, for "overheads") account for all memory that is not
-accounted for by the GC pacer (from the [new pacer
-proposal](https://github.com/golang/proposal/blob/329650d4723a558c2b76b81b4995fc5c267e6bc1/design/44167-gc-pacer-redesign.md#heap-goal)).
+The second term, `$(T - F - A)$`, represents the sum of non-heap overheads.
+Free and unscavenged memory is specifically excluded because this is memory that
+the runtime might use in the near future, and the scavenger is specifically
+instructed to leave the memory up to the heap goal unscavenged.
+Failing to exclude free and unscavenged memory could lead to a very poor
+accounting of non-heap overheads.
 
 With `$\hat{L}$` fully defined, our heap goal for cycle `$n$` (`$N_n$`) is a
 straightforward extension of the existing one.
@@ -245,31 +237,19 @@ then
 
 ```render-latex
 N_n = min(\hat{L}, \gamma(M_{n-1})+S_n+G_n)
 ```
 
-Over the course of a GC cycle `$O_M$` remains stable because it increases
-monotonically.
-There's only one situation where `$O_M$` can grow tremendously (relative to
-active heap objects) in a short period of time (< 1 GC cycle), and that's when
-`GOMAXPROCS` increases.
-So, I also propose recomputing this value at that time.
-
-Meanwhile `$O_I$` stays relatively stable (and doesn't have a sawtooth pattern,
-as one might expect from a sum of idle heap memory) because object sweeping
-occurs incrementally, specifically proportionally to how fast the application is
-allocating.
-Furthermore, this value is guaranteed to stay relatively stable across a single
-GC cycle, because the total size of the heap for one GC cycle is bounded by the
-heap goal.
-Taking the highwater mark of this value places a conservative upper bound on the
-total impact of this memory, so the heap goal stays safe from major changes.
+Over the course of a GC cycle, non-heap overheads remain stable because they
+mostly increase monotonically.
+However, the GC needs to be responsive to any change in non-heap overheads.
+Therefore, I propose a more heavy-weight recomputation of the heap goal every
+time it's needed, as opposed to computing it only once per cycle.
+This also means the GC trigger point needs to be dynamically recomputable.
+This check will create additional overheads, but they're likely to be low, as
+the GC's internal statistics are updated only on slow paths.
 
-One concern with the above definition of `$\hat{L}$` is that it is fragile to
-changes to the Go GC.
-In the past, seemingly unrelated changes to the Go runtime have impacted the
-GC's pacer, usually due to an unforeseen influence on the accounting that the
-pacer relies on.
-To minimize the impact of these accidents on the conversion function, I propose
-centralizing and categorizing all the variables used in accounting, and writing
-tests to ensure that expected properties of the account remain in-tact.
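As a small illustration of the recomputation strategy described in the added lines above, the goal can be derived from live statistics at every use rather than cached once per cycle. The `gcStats` type and its field names below are hypothetical, not the runtime's internal state.

```go
package memlimit

// gcStats gathers the quantities named in the design; the GC trigger would be
// recomputed from the goal returned here in the same on-demand fashion.
type gcStats struct {
	softLimit                          uint64  // L
	mapped, freeUnscavenged, allocated uint64  // T, F, A
	markedLastCycle, stacks, globals   uint64  // M_{n-1}, S_n, G_n
	gamma                              float64 // derived from GOGC
}

// currentHeapGoal is evaluated wherever the goal is needed, so a change in
// non-heap overheads is reflected immediately.
func currentHeapGoal(s *gcStats) uint64 {
	overheads := s.mapped - s.freeUnscavenged - s.allocated // T - F - A
	limit := uint64(0)
	if overheads < s.softLimit {
		limit = s.softLimit - overheads // L̂ = L - (T - F - A)
	}
	goal := uint64(s.gamma*float64(s.markedLastCycle)) + s.stacks + s.globals
	if goal > limit {
		goal = limit // N_n = min(L̂, γ(M_{n-1}) + S_n + G_n)
	}
	return goal
}
```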
+The nice thing about this definition of `$\hat{L}$` is that it's fairly robust
+to changes to the Go GC, since total mapped memory, free and unscavenged memory,
+and bytes allocated in objects, are fairly fundamental properties (especially to
+any tracing GC design).
 
 #### Death spirals
 
@@ -320,7 +300,7 @@ large enough to accommodate worst-case pause times but not too large such that
 a more than about a second.
 1 CPU-second per `GOMAXPROCS` seems like a reasonable place to start.
 
-Unfortunately, 50% is not a reasonable choice for small values of `GOGC`.
+Unfortunately, 50% is not always a reasonable choice for small values of `GOGC`.
 Consider an application running with `GOGC=10`: an overall 50% GC CPU
 utilization limit for `GOGC=10` is likely going to be always active, leading to
 significant overshoot.
@@ -357,22 +337,13 @@ use approaches the limit.
 I propose it does so using a proportional-integral controller whose input is the
 difference between the memory limit and the memory used by Go, and whose output
 is the CPU utilization target of the background scavenger.
-The output will be clamped at a minimum of 1% and a maximum of 10% overall CPU
-utilization.
-Note that the 10% is chosen arbitrarily; in general, returning memory to the
-platform is nowhere near as costly as the GC, but the number must be chosen such
-that the mutator still has plenty of room to make progress (thus, I assert that
-40% of CPU time is enough).
-In order to make the scavenger scale to overall CPU utilization effectively, it
-requires some improvements to avoid the aforementioned locking issues it deals
-with today.
+This will make the background scavenger more reliable.
 
-Any CPU time spent in the scavenger should also be accounted for in the leaky
-bucket algorithm described in the [Death spirals](#death-spirals) section as GC
-time, however I don't think it should be throttled in the same way.
-The intuition behind that is that returning memory to the platform is generally
-going to be more immediately fruitful than spending more time in garbage
-collection.
+However, the background scavenger likely won't return memory to the OS promptly
+enough for the memory limit, so in addition, I propose having span allocations
+eagerly return memory to the OS to stay under the limit.
+The time a goroutine spends in this will also count toward the 50% GC CPU limit
+described in the [Death spirals](#death-spirals) section.
 
 #### Alternative approaches considered
 
@@ -416,38 +387,13 @@ go beyond the spans already in-use.
 
 ##### Returning memory to the platform
 
-A potential issue with the proposed design is that because the scavenger is
-running in the background, it may not react readily to spikes in memory use that
-exceed the limit.
-
-In contrast, [TCMalloc](#tcmalloc) searches for memory to return eagerly, if an
-allocation were to exceed the limit.
-In the Go 1.13 cycle, I attempted a similar policy when first implementing the
-scavenger, and found that it could cause unacceptable tail latency increases in
-some applications.
-While that policy certainly tried to return memory back to the platform
-significantly more often than it would be in this case, it still has a couple of
-downsides:
-1. It introduces latency.
-   The background scavenger can be more efficiently time-sliced in between other
-   work, so it generally should only impact throughput.
-1. It's much more complicated to bound the total amount of time spent searching
-   for and returning memory to the platform during an allocation.
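For the proportional-integral controller proposed a few lines up, a bare-bones sketch follows. The gains, units, and anti-windup handling are placeholders; only the shape of the computation (error in, utilization target out) reflects the text.

```go
package memlimit

// piController turns the gap between Go's memory use and the memory limit into
// a CPU utilization target for the background scavenger.
type piController struct {
	kp, ki   float64 // proportional and integral gains (placeholders)
	integral float64
	min, max float64 // clamp on the output utilization target
}

// next takes the error term (here: bytes over the limit, negative when under)
// and the elapsed time since the last update, and returns the new target.
func (c *piController) next(errBytes, dt float64) float64 {
	c.integral += errBytes * dt
	out := c.kp*errBytes + c.ki*c.integral
	if out < c.min || out > c.max {
		c.integral -= errBytes * dt // crude anti-windup while clamped
		if out < c.min {
			out = c.min
		} else {
			out = c.max
		}
	}
	return out
}
```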
-
-The key insight as to why this policy works just fine for TCMalloc and won't
-work for Go comes from a fundamental difference in design.
-Manual memory allocators are typically designed to have a LIFO-style memory
-reuse pattern.
-Once an allocation is freed, it is immediately available for reallocation.
-In contrast, most efficient tracing garbage collection algorithms require a
-FIFO-style memory reuse pattern, since allocations are freed in bulk.
-The result is that the page allocator in a garbage-collected memory allocator is
-accessed far more frequently than in manual memory allocator, so this path will
-be hit a lot harder.
-
-For the purposes of this design, I don't believe the benefits of eager return
-outweigh the costs, and I do believe that the proposed design is good enough for
-most cases.
+If returning memory to the OS eagerly becomes a significant performance issue, a
+reasonable alternative could be to crank up the background scavenger's CPU usage
+in response to growing memory pressure.
+This needs more thought, but given that it would now be controlled by a
+controller, its CPU usage will be more reliable, and this is an option we can
+keep in mind.
+One benefit of this option is that it may impact latency less prominently.
 
 ### Documentation
 
@@ -511,6 +457,11 @@ decides to shrink the heap space used; more recent implementations (e.g. G1) do
 so more rarely, except when [the application is
 idle](https://openjdk.java.net/jeps/346).
 
+Some JVMs are "container aware" and read the memory limits of their containers
+to stay under the limit.
+This behavior is closer to what is proposed in this document, but I do not
+believe the memory limit is directly configurable, like the one proposed here.
+
 ### SetMaxHeap
 
 For nearly 4 years, the Go project has been trialing an experimental API in the
