path: root/src/runtime/export_test.go
2022-11-17  runtime: fix conflict between lfstack and checkptr  (Keith Randall)
lfstack does very unsafe things. In particular, it will not work with nodes that live on the heap. In normal use by the runtime that is the case: lfstack is only used for GC work bufs, which live off-heap. But the lfstack test does use heap objects. It goes through some hoops to prevent premature deallocation, but those hoops are not enough to convince -d=checkptr that everything is ok.

Instead, allocate the test objects outside the heap, like the runtime does for all of its lfstack usage. Remove the lifetime workaround from the test.

Reported in https://groups.google.com/g/golang-nuts/c/psjrUV2ZKyI

Change-Id: If611105eab6c823a4d6c105938ce145ed731781d
Reviewed-on: https://go-review.googlesource.com/c/go/+/448899
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
2022-11-01  runtime: skip TestArenaCollision on failed reservation  (Austin Clements)
If TestArenaCollision cannot reserve the address range it expects to reserve, it currently fails somewhat mysteriously. Detect this case and skip the test. This could lead to test rot if we wind up always skipping this test, but it's not clear that there's a better answer.

If the test does fail, we now also log what it thinks it reserved, so the failure message is more useful in debugging any issues.

Fixes #49415
Fixes #54597

Change-Id: I05cf27258c1c0a7a3ac8d147f36bf8890820d59b
Reviewed-on: https://go-review.googlesource.com/c/go/+/446877
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2022-10-14  cmd/compile,cmd/link,runtime: add start line numbers to func metadata  (Michael Pratt)
This adds the function "start line number" to runtime._func and runtime.inlinedCall objects. The start line number is the line number of the func keyword, or of the TEXT directive for assembly. Subtracting the start line number from a PC's line number provides the relative line offset of that PC from the start of the function. This helps with source stability by allowing code above the function to move without invalidating samples within the function. Encoding the start line rather than relative lines directly is convenient because the pprof format already contains a start line field.

This CL uses a straightforward encoding that explicitly includes a start line field in every _func and inlinedCall. It is possible that we could compress this further in the future, e.g., functions with a prologue usually have <line of PC 0> == <start line>. In runtime.test, 95% of functions have <line of PC 0> == <start line>.

According to bent, this is geomean +0.83% binary size vs master and -0.31% binary size vs 1.19.

Note that //line directives can change the file and line numbers arbitrarily. The encoded start line is as adjusted by //line directives. Since this can change in the middle of a function, `line - start line` offset calculations may not be meaningful if //line directives are in use.

For #55022.

Change-Id: Iaabbc6dd4f85ffdda294266ef982ae838cc692f6
Reviewed-on: https://go-review.googlesource.com/c/go/+/429638
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
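The relative-line calculation this enables is simple; a minimal illustrative sketch (funcInfo and its fields here are stand-ins, not the runtime's actual types):

    // funcInfo stands in for runtime-internal function metadata.
    type funcInfo struct {
        startLine int32 // line of the func keyword (or TEXT directive)
    }

    // lineOffset returns the offset of a PC's line from the start of
    // its function, which stays stable when code above the function moves.
    func lineOffset(fn funcInfo, pcLine int32) int32 {
        return pcLine - fn.startLine
    }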
2022-10-12  arena: add experimental arena package  (Michael Anthony Knyszek)
This change adds the arena package, and a function to package reflect for allocating from an arena via reflection, but all the new API is placed behind a GOEXPERIMENT.

For #51317.

Change-Id: I026d46294e26ab386d74625108c19a0024fbcedc
Reviewed-on: https://go-review.googlesource.com/c/go/+/423361
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
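A usage sketch of the experimental API as shipped behind GOEXPERIMENT=arenas (the surface may change between releases, which is the point of the experiment):

    //go:build goexperiment.arenas

    package main

    import (
        "arena"
        "fmt"
    )

    type point struct{ x, y int }

    func main() {
        a := arena.NewArena()
        defer a.Free() // manually frees everything allocated from a

        p := arena.New[point](a)           // one value from the arena
        s := arena.MakeSlice[int](a, 0, 8) // slice backing store from the arena
        p.x, p.y = 1, 2
        s = append(s, p.x+p.y)
        fmt.Println(*p, s)
    }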
2022-10-12  runtime: add safe arena support to the runtime  (Michael Anthony Knyszek)
This change adds an API to the runtime for arenas. A later CL can potentially export it as an experimental API, but for now, just the runtime implementation will suffice. The purpose of arenas is to improve efficiency, primarily by allowing an application to manually free memory, thereby delaying garbage collection. It comes with other potential performance benefits, such as better locality, a better allocation strategy, and better handling of interior pointers by the GC.

This implementation is based on one by danscales@google.com with a few significant differences:

* The implementation lives entirely in the runtime (all layers).
* Arena chunks are the minimum of 8 MiB or the heap arena size. This choice is made because in practice 64 MiB appears to be far too large an area for most real-world use cases.
* Arena chunks are not unmapped; instead they're placed on an evacuation list, and when there are no pointers left pointing into them, they're allowed to be reused.
* Reusing partially-used arena chunks no longer tries to find one used by the same P first; it just takes the first one available.
* In order to ensure worst-case fragmentation is never worse than 25%, only types and slice backing stores whose sizes are 1/4th the size of a chunk or less may be used. Previously larger sizes, up to the size of the chunk, were allowed.
* ASAN, MSAN, and the race detector are fully supported.
* Arena chunks whose faulting was deferred at the end of mark termination are now set to fault (a non-public patch once did this; I don't see a reason not to continue that).

For #51317.

Change-Id: I83b1693a17302554cb36b6daa4e9249a81b1644f
Reviewed-on: https://go-review.googlesource.com/c/go/+/423359
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-09-16  runtime/metrics: add /sync/mutex/wait/total:seconds metric  (Michael Anthony Knyszek)
This change adds a metric to the runtime/metrics package which tracks total mutex wait time for sync.Mutex and sync.RWMutex. The purpose of this metric is to be able to quickly get an idea of the total mutex wait time.

The implementation of this metric piggybacks off of the existing G runnable tracking infrastructure, as well as the wait reason set on a G when it goes into _Gwaiting.

Fixes #49881.

Change-Id: I4691abf64ac3574bec69b4d7d4428b1573130517
Reviewed-on: https://go-review.googlesource.com/c/go/+/427618
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
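Reading the new metric from application code uses the standard runtime/metrics sampling API:

    package main

    import (
        "fmt"
        "runtime/metrics"
    )

    func main() {
        // Sample the cumulative mutex wait time added by this CL.
        samples := []metrics.Sample{{Name: "/sync/mutex/wait/total:seconds"}}
        metrics.Read(samples)
        if samples[0].Value.Kind() == metrics.KindFloat64 {
            fmt.Printf("total mutex wait: %fs\n", samples[0].Value.Float64())
        }
    }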
2022-09-16  runtime: shrink time histogram buckets  (Michael Anthony Knyszek)
There are lots of useless buckets with too much precision. Introduce a minimum level of precision with a minimum bucket bit. This cuts down on the size of a time histogram dramatically (~3x).

Also, pick a smaller sub-bucket count; we don't need 6% precision.

Also, rename super-buckets to buckets to more closely line up with HDR histogram literature.

Change-Id: I199449650e4b34f2a6dca3cf1d8edb071c6655c0
Reviewed-on: https://go-review.googlesource.com/c/go/+/427615
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2022-09-05  runtime: refactor finalizer goroutine status  (Leonard Wang)
Use an atomic.Uint32 to represent the state of the finalizer goroutine. fingStatus will only be changed to fingWake in a non-fingWait state, so it is safe to set fingRunningFinalizer status in runfinq.

    name            old time/op  new time/op  delta
    Finalizer-8     592µs ± 4%   561µs ± 1%   -5.22%  (p=0.000 n=10+10)
    FinalizerRun-8  694ns ± 6%   675ns ± 7%   ~       (p=0.059 n=9+8)

Change-Id: I7e4da30cec98ce99f7d8cf4c97f933a8a2d1cae1
Reviewed-on: https://go-review.googlesource.com/c/go/+/400134
Reviewed-by: Joedian Reid <joedian@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Michael Pratt <mpratt@google.com>
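The general shape of such an atomic status word, sketched with illustrative names (the runtime defines its own states and transition rules in mfinal.go, which differ in detail):

    import "sync/atomic"

    // Illustrative status values; not the runtime's actual set.
    const (
        fingWait uint32 = 1 << iota // finalizer goroutine is parked
        fingWake                    // a wakeup has been requested
    )

    var fingStatus atomic.Uint32

    // transition atomically moves the status from one state to another,
    // failing if a concurrent goroutine changed it first.
    func transition(from, to uint32) bool {
        return fingStatus.CompareAndSwap(from, to)
    }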
2022-08-31  runtime: convert semaRoot.nwait to atomic type  (cuiweixie)
For #53821

Change-Id: I686fe81268f70acc6a4c3e6b1d3ed0e07bb0d61c
Reviewed-on: https://go-review.googlesource.com/c/go/+/425775
Run-TryBot: xie cui <523516579@qq.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
2022-08-19  runtime: add and use runtime/internal/sys.NotInHeap  (Cuong Manh Le)
Updates #46731

Change-Id: Ic2208c8bb639aa1e390be0d62e2bd799ecf20654
Reviewed-on: https://go-review.googlesource.com/c/go/+/421878
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2022-08-12  runtime: convert timeHistogram to atomic types  (Michael Pratt)
I've dropped the note that sched.timeToRun is protected by sched.lock, as it does not seem to be true.

For #53821.

Change-Id: I03f8dc6ca0bcd4ccf3ec113010a0aa39c6f7d6ef
Reviewed-on: https://go-review.googlesource.com/c/go/+/419449
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
2022-08-08  runtime: convert gcController.globalsScan to atomic type  (Michael Pratt)
For #53821.

Change-Id: I92bd33e355c868ae229395fd9c98fdb10768d03d
Reviewed-on: https://go-review.googlesource.com/c/go/+/417779
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2022-08-08  runtime: convert gcController.maxStackScan to atomic type  (Michael Pratt)
For #53821.

Change-Id: I1bd23cdbc371011ec2331fb0a37482ecf99a063b
Reviewed-on: https://go-review.googlesource.com/c/go/+/417778
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-08  runtime: convert gcController.heapScan to atomic type  (Michael Pratt)
For #53821.

Change-Id: I64d3f53c89a579d93056906304e4c05fc35cd9b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/417776
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-08  runtime: convert gcController.heapLive to atomic type  (Michael Pratt)
Atomic operations are used even during STW for consistency.

For #53821.

Change-Id: Ibe7afe5cf893b1288ce24fc96b7691b1f81754ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/417775
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-02  runtime: trivial replacements of g in remaining files  (Michael Pratt)
Rename g variables to gp for consistency.

Change-Id: I09ecdc7e8439637bc0e32f9c5f96f515e6436362
Reviewed-on: https://go-review.googlesource.com/c/go/+/418591
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
2022-08-02  runtime: rename _p_ to pp  (Michael Pratt)
_g_, _p_, and _m_ are primarily vestiges of the C version of the runtime, while today we prefer Go-style variable names (generally gp, pp, and mp).

This change replaces all remaining uses of _p_ with pp. These are all trivial replacements (i.e., no conflicts). That said, there are several functions that refer to two different Ps at once. There the naming convention is generally that pp refers to the local P, and p2 refers to the other P we are accessing.

Change-Id: I205b801be839216972e7644b1fbeacdbf2612859
Reviewed-on: https://go-review.googlesource.com/c/go/+/306674
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-06-29  runtime: add race annotations to metricsSema  (Michael Pratt)
metricsSema protects the metrics map. The map implementation is race instrumented regardless of which package it is called from.

semacquire/semrelease are not automatically race instrumented, so we can trigger race false positives without manually annotating our lock acquire and release. See similar instrumentation on trace.shutdownSema and reflectOffs.lock.

Fixes #53542.

Change-Id: Ia3fd239ac860e037d09c7cb9c4ad267391e70705
Reviewed-on: https://go-review.googlesource.com/c/go/+/414517
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-06-16  runtime: write much more direct test for semaphore waiter scalability  (Michael Anthony Knyszek)
This test originally existed as two tests in test/locklinear.go, but this checked against actual locks and was flaky. The test was checking a property of a deep part of the runtime but from a much higher level, and it's easy for nondeterminism due to scheduling to completely mess that up, especially on an oversubscribed system.

That test was then moved to the sync package with a more rigorous testing methodology, but it could still flake pretty easily.

Finally, this CL makes semtable more testable, exports it in export_test.go, then writes a very direct scalability test for exactly the situation the original test described. As far as I can tell, this is much, much more stable, because it's single-threaded and is just checking exactly the algorithm we need to check.

Don't bother trying to bring in a test that checks for O(log n) behavior on the other kind of iteration. It'll be perpetually flaky because the underlying data structure is a treap, so it's only _expected_ to be O(log n), but it's very easy for it to get unlucky without a large number of iterations that's too much for a simple test.

Fixes #53381.

Change-Id: Ia1cd2d2b0e36d552d5a8ae137077260a16016602
Reviewed-on: https://go-review.googlesource.com/c/go/+/412875
Reviewed-by: Michael Pratt <mpratt@google.com>
2022-06-03  runtime: only use CPU time from the current window in the GC CPU limiter  (Michael Anthony Knyszek)
Currently the GC CPU limiter consumes CPU time from a few pools, but because the events that flush to those pools may overlap, rather than be strictly contained within, the update window for the GC CPU limiter, the limiter's accounting is ultimately sloppy. This sloppiness complicates accounting for idle time more completely, and makes reasoning about the transient behavior of the GC CPU limiter much more difficult.

To remedy this, this CL adds a field to the P struct that tracks the start time of any in-flight event the limiter might care about, along with information about the nature of that event. This timestamp is managed atomically so that the GC CPU limiter can come in and perform a read of the partial CPU time consumed by a given event. The limiter also updates the timestamp so that only what's left over is flushed by the event itself when it completes.

The end result of this change is that, since the GC CPU limiter is aware of all past completed events, and all in-flight events, it can much more accurately collect the CPU time of events since the last update. There's still the possibility for skew, but any leftover time will be captured in the following update, and the magnitude of this leftover time is effectively bounded by the update period of the GC CPU limiter, which is much easier to consider.

One caveat of managing this timestamp-type combo atomically is that they need to be packed in 64 bits. So, this CL gives up the top 3 bits of the timestamp and places the type information there. What this means is we effectively have only a 61-bit resolution timestamp. This is fine when the top 3 bits are the same between calls to nanotime, but becomes a problem on boundaries when those 3 bits change. These cases may cause hiccups in the GC CPU limiter by not accounting for some source of CPU time correctly, but with 61 bits of resolution this should be extremely rare. The rate of update is on the order of milliseconds, so at worst the runtime will be off of any given measurement by only a few CPU-milliseconds (and this is directly bounded by the rate of update). We're probably more inaccurate from the fact that we don't measure real CPU time but only approximate it.

For #52890.

Change-Id: I347f30ac9e2ba6061806c21dfe0193ef2ab3bbe9
Reviewed-on: https://go-review.googlesource.com/c/go/+/410120
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
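The packing scheme described here is straightforward to sketch (constants and names are illustrative, not the runtime's):

    const (
        typeBits  = 3
        typeShift = 64 - typeBits
        tsMask    = uint64(1)<<typeShift - 1
    )

    // pack stores a 3-bit event type in the top bits and keeps the low
    // 61 bits of a nanotime timestamp; unpack reverses it.
    func pack(typ uint8, ts int64) uint64 {
        return uint64(typ)<<typeShift | uint64(ts)&tsMask
    }

    func unpack(v uint64) (typ uint8, ts int64) {
        return uint8(v >> typeShift), int64(v & tsMask)
    }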
2022-05-27  runtime: cancel mark and scavenge assists if the limiter is enabled  (Michael Anthony Knyszek)
This change forces mark and scavenge assists to be cancelled early if the limiter is enabled. This avoids goroutines getting stuck in really long assists if the limiter happens to be disabled when they first come into the assist. This can get especially bad for mark assists, which, in dire situations, can end up "owing" the GC a really significant debt.

For #52890.

Change-Id: I4bfaa76b8de3e167d49d2ffd8bc2127b87ea566a
Reviewed-on: https://go-review.googlesource.com/c/go/+/408816
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-13  runtime: make CPU limiter assist time much less error-prone  (Michael Anthony Knyszek)
At the expense of performance (having to update another atomic counter), this change makes CPU limiter assist time much less error-prone to manage. There are currently a number of issues with respect to how scavenge assist time is treated, and this change resolves those by just having the limiter maintain its own internal pool that's drained on each update.

While we're here, clear the measured assist time each cycle, which was the impetus for the change.

Change-Id: I84c513a9f012b4007362a33cddb742c5779782b7
Reviewed-on: https://go-review.googlesource.com/c/go/+/404304
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-05-12  runtime: measure stack usage; start stacks larger if needed  (Keith Randall)
Measure the average stack size used by goroutines at every GC. When starting a new goroutine, allocate an initial goroutine stack of that average size. The intuition is that we'll waste at most 2x in stack space because only half the goroutines can be below average. In turn, we avoid some of the early stack growth / copying needed in the average case.

More details in the design doc at: https://docs.google.com/document/d/1YDlGIdVTPnmUiTAavlZxBI1d9pwGQgZT7IKFKlIXohQ/edit?usp=sharing

    name        old time/op  new time/op  delta
    Issue18138  95.3µs ± 0%  67.3µs ±13%  -29.35%  (p=0.000 n=9+10)

Fixes #18138

Change-Id: Iba34d22ed04279da7e718bbd569bbf2734922eaa
Reviewed-on: https://go-review.googlesource.com/c/go/+/345889
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2022-05-03  runtime: use Escape instead of escape in export_test.go  (Michael Anthony Knyszek)
I landed the bottom CL of my stack without rebasing or retrying trybots, but in the rebase "escape" was removed in favor of "Escape."

Change-Id: Icdc4d8de8b6ebc782215f2836cd191377cc211df
Reviewed-on: https://go-review.googlesource.com/c/go/+/403755
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-05-03  runtime: redesign scavenging algorithm  (Michael Anthony Knyszek)
Currently the runtime's scavenging algorithm involves running from the top of the heap address space to the bottom (or as far as it gets) once per GC cycle. Once it treads some ground, it doesn't tread it again until the next GC cycle.

This works just fine for the background scavenger, for heap-growth scavenging, and for debug.FreeOSMemory. However, it breaks down in the face of a memory limit for small heaps in the tens of MiB. Basically, because the scavenger never retreads old ground, it's completely oblivious to new memory it could scavenge, and that it really *should* in the face of a memory limit. Also, every time some thread goes to scavenge in the runtime, it reserves what could be a considerable amount of address space, hiding it from other scavengers.

This change modifies and simplifies the implementation overall. It's less code with complexities that are much better encapsulated. The current implementation iterates optimistically over the address space looking for memory to scavenge, keeping track of what it last saw. The new implementation does the same, but instead of directly iterating over pages, it iterates over chunks. It maintains an index of chunks (as a bitmap over the address space) that indicates which chunks may contain scavenge work. The page allocator populates this index, while scavengers consume it and iterate over it optimistically.

This has two key benefits:

1. Scavenging is much simpler: find a candidate chunk, and check it, essentially just using the scavengeOne fast path. There's no need for the complexity of iterating beyond one chunk, because the index is lock-free and already maintains that information.

2. If pages are freed to the page allocator (always guaranteed to be unscavenged), the page allocator immediately notifies all scavengers of the new source of work, avoiding the hiding issues of the old implementation.

One downside of the new implementation, however, is that it's potentially more expensive to find pages to scavenge. In the past, if a single page would become free high up in the address space, the runtime's scavengers would ignore it. Now that scavengers won't, one or more scavengers may need to iterate potentially across the whole heap to find the next source of work. For the background scavenger, this just means a potentially less reactive scavenger -- overall it should still use the same amount of CPU. It means worse overheads for memory limit scavenging, but that's not exactly something with a baseline yet.

In practice, this shouldn't be too bad, hopefully since the chunk index is extremely compact. For a 48-bit address space, the index is only 8 MiB in size at worst, but even just one physical page in the index is able to support up to 128 GiB heaps, provided they aren't terribly sparse. On 32-bit platforms, the index is only 128 bytes in size.

For #48409.

Change-Id: I72b7e74365046b18c64a6417224c5d85511194fb
Reviewed-on: https://go-review.googlesource.com/c/go/+/399474
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-03  runtime: set the heap goal from the memory limit  (Michael Anthony Knyszek)
This change makes the memory limit functional by including it in the heap goal calculation. Specifically, we derive a heap goal from the memory limit, and compare that to the GOGC-based goal. If the goal based on the memory limit is lower, we prefer that.

To derive the memory limit goal, the heap goal calculation now takes a few additional parameters as input. As a result, the heap goal, in the presence of a memory limit, may change dynamically. The consequences of this are that different parts of the runtime can have different views of the heap goal; this is OK. What's important is that all of the runtime is able to observe the correct heap goal for the moment it's doing something that affects it, like anything that should trigger a GC cycle.

On the topic of triggering a GC cycle, this change also allows any manually managed memory allocation from the page heap to trigger a GC. So, specifically workbufs, unrolled GC scan programs, and goroutine stacks. The reason for this is that now non-heap memory can affect the trigger or the heap goal. Most sources of non-heap memory only change slowly, like GC pointer bitmaps, or change in response to explicit function calls like GOMAXPROCS. Note also that unrolled GC scan programs and workbufs are really only relevant during a GC cycle anyway, so they won't actually ever trigger a GC. Our primary target here is goroutine stacks.

Goroutine stacks can increase quickly, and this is currently totally independent of the GC cycle. Thus, if for example a goroutine begins to recurse suddenly and deeply, then even though the heap goal and trigger react, we might not notice until it's too late. As a result, we need to trigger a GC cycle.

We do this trigger in allocManual instead of in stackalloc because it's far more general. We ultimately care about memory that's mapped read/write and not returned to the OS, which is much more the domain of the page heap than the stack allocator. Furthermore, there may be new sources of manual memory allocation in the future (e.g. arenas) that need to trigger a GC if necessary. As such, I'm inclined to leave the trigger in allocManual as an extra defensive measure.

It's worth noting that because goroutine stacks do not behave quite as predictably as other non-heap memory, there is the potential for the heap goal to swing wildly. Fortunately, goroutine stacks that haven't been set up to shrink by the last GC cycle will not shrink until after the next one. This reduces the amount of possible churn in the heap goal because it means that shrinkage only happens once per goroutine, per GC cycle. After all the goroutines that should shrink did, then goroutine stacks will only grow. The shrink mechanism is analogous to sweeping, which is incremental and thus tends toward a steady amount of heap memory used. As a result, in practice, I expect this to be a non-issue.

Note that if the memory limit is not set, this change should be a no-op.

For #48409.

Change-Id: Ie06d10175e5e36f9fb6450e26ed8acd3d30c681c
Reviewed-on: https://go-review.googlesource.com/c/go/+/394221
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
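Reduced to its essence, the goal selection is a minimum of two candidates; a simplified sketch (the real pacer calculation also folds in non-heap overheads):

    // effectiveHeapGoal is an illustrative simplification: take the
    // GOGC-derived goal unless the memory-limit-derived goal is lower.
    func effectiveHeapGoal(gogcGoal, memLimitGoal uint64) uint64 {
        if memLimitGoal < gogcGoal {
            return memLimitGoal
        }
        return gogcGoal
    }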
2022-05-03  runtime: check the heap goal and trigger dynamically  (Michael Anthony Knyszek)
As it stands, the heap goal and the trigger are set once by gcController.commit, and then read out of gcController. However, with the coming memory limit we need the GC to be able to respond to changes in non-heap memory. The simplest way of achieving this is to compute the heap goal and its associated trigger dynamically.

In order to make this easier to implement, the GC trigger is now based on the heap goal, as opposed to the status quo of computing both simultaneously. In many cases we just want the heap goal anyway, not both, but we definitely need the goal to compute the trigger, because the trigger's bounds are entirely based on the goal (the initial runway is not).

A consequence of this is that we can't rely on the trigger to enforce a minimum heap size anymore, and we need to lift that up directly to the goal. Specifically, we need to lift up any part of the calculation that *could* put the trigger ahead of the goal. Luckily this is just the heap minimum and minimum sweep distance. In the first case, the pacer may behave slightly differently, as the heap minimum is no longer the minimum trigger, but the actual minimum heap goal. In the second case it should be the same, as we ensure the additional runway for sweeping is added to both the goal *and* the trigger, as before, by computing that in gcControllerState.commit.

There's also another place we update the heap goal: if a GC starts and we triggered beyond the goal, we always ensure there's some runway. That calculation uses the current trigger, which violates the rule of keeping the goal based on the trigger. Notice, however, that using the precomputed trigger for this isn't even quite correct: due to a bug, or something else, we might trigger a GC beyond the precomputed trigger.

So this change also adds a "triggered" field to gcControllerState that tracks the point at which a GC actually triggered. This is independent of the precomputed trigger, so it's fine for the heap goal calculation to rely on it. It also turns out there's more than just that one place where we really should be using the actual trigger point, so this change fixes those up too.

Also, because the heap minimum is set by the goal and not the trigger, the maximum trigger calculation now happens *after* the goal is set, so the maximum trigger actually does what I originally intended (and what the comment says): at small heaps, the pacer picks 95% of the runway as the maximum trigger. Currently, the pacer picks a small trigger based on a not-yet-rounded-up heap goal, so the trigger gets rounded up to the goal, and as per the "ensure there's some runway" check, the runway ends up always being 64 KiB. That check is supposed to be for exceptional circumstances, not the status quo. There's a test introduced in the last CL that needs to be updated to accommodate this slight change in behavior.

So, this all sounds like a lot that changed, but what we're talking about here are really, really tight corner cases that arise from situations outside of our control, like pathologically bad behavior on the part of an OS or CPU. Even in these corner cases, it's very unlikely that users will notice any difference at all. What's more important, I think, is that the pacer behaves more closely to what all the comments describe, and what the original intent was.

Another note: at first, one might think that computing the heap goal and trigger dynamically introduces some raciness, but not in this CL: the heap goal and trigger are completely static.

Allocation outside of a GC cycle may now be a bit slower than before, as the GC trigger check is now significantly more complex. However, note that this executes basically just as often as gcController.revise, and that makes up for a vanishingly small part of any CPU profile. The next CL cleans up the floating point multiplications on this path nonetheless, just to be safe.

For #48409.

Change-Id: I280f5ad607a86756d33fb8449ad08555cbee93f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/397014
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-03  runtime: rewrite pacer max trigger calculation  (Michael Anthony Knyszek)
Currently the maximum trigger calculation is totally incorrect with respect to the comment above it and its intent. This change rectifies this mistake.

For #48409.

Change-Id: Ifef647040a8bdd304dd327695f5f315796a61a74
Reviewed-on: https://go-review.googlesource.com/c/go/+/398834
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-03  runtime: move inconsistent memstats into gcController  (Michael Anthony Knyszek)
Fundamentally, all of these memstats exist to serve the runtime in managing memory. For the sake of simpler testing, couple these stats more tightly with the GC.

This CL was mostly done automatically. The fields had to be moved manually, but the references to the fields were updated via:

    gofmt -w -r 'memstats.<field> -> gcController.<field>' *.go

For #48409.

Change-Id: Ic036e875c98138d9a11e1c35f8c61b784c376134
Reviewed-on: https://go-review.googlesource.com/c/go/+/397678
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-03  runtime: track how much memory is mapped in the Ready state  (Michael Anthony Knyszek)
This change adds a field to memstats called mappedReady that tracks how much memory is in the Ready state at any given time. In essence, it's the total memory usage by the Go runtime (with the exceptions documented below): all memory mapped read/write that has either been paged in or will soon be.

To make tracking this not involve the many different stats that track mapped memory, we track this statistic at a very low level. The downside of tracking this statistic at such a low level is that it managed to catch lots of situations where the runtime wasn't fully accounting for memory. This change rectifies these situations by always accounting for memory that's mapped in some way (i.e., always passing a sysMemStat to a mem.go function), with three exceptions. Rectifying these situations means also having the memory mapped during testing being accounted for, so that tests (i.e., ReadMemStats) that ultimately check mappedReady continue to work correctly without special exceptions. We choose to simply account for this memory in other_sys.

Let's talk about the exceptions. The first is the arenas array for finding heap arena metadata from an address, which is mapped as read/write in one large chunk. It's tens of MiB in size. On systems with demand paging, we assume that the whole thing isn't paged in at once (after all, it maps to the whole address space, and it's exceedingly difficult with today's technology to even broach having as much physical memory as the total address space). On systems where we have to commit memory manually, we use a two-level structure. Now, the reason why this is an exception is because we have no mechanism to track what memory is paged in, and we can't just account for the entire thing, because that would *look* like an enormous overhead. Furthermore, this structure is on a few really, really critical paths in the runtime, so doing more explicit tracking isn't really an option. So, we explicitly don't account for it, and call sysAllocOS to map this memory.

The second exception is that we call sysFree with no accounting to clean up address space reservations, or otherwise to throw out mappings we don't care about. In this case, we also drop down to a lower level and call sysFreeOS to explicitly avoid accounting.

The third exception is debuglog allocations. That is purely a debugging facility, and ideally we want it to have as small an impact on the runtime as possible. If we included it in mappedReady calculations, it could cause GC pacing shifts in future CLs, especially if one increases the debuglog buffer sizes as a one-off.

As of this CL, these are the only three places in the runtime that would pass nil for a stat to any of the functions in mem.go. As a result, this CL makes sysMemStats mandatory to facilitate better accounting in the future. It's now much easier to grep and find out where accounting is explicitly elided, because one doesn't have to follow the trail of sysMemStat nil pointer values, and can just look at the function name.

For #48409.

Change-Id: I274eb467fc2603881717482214fddc47c9eaf218
Reviewed-on: https://go-review.googlesource.com/c/go/+/393402
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-05-03  runtime: add a non-functional memory limit to the pacer  (Michael Anthony Knyszek)
Nothing much to see here, just some plumbing to make later CLs smaller and clearer.

For #48409.

Change-Id: Ide23812d5553e0b6eea5616c277d1a760afb4ed0
Reviewed-on: https://go-review.googlesource.com/c/go/+/393401
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2022-05-03  runtime: add byte count parser for GOMEMLIMIT  (Michael Anthony Knyszek)
This change adds a parser for the GOMEMLIMIT environment variable's input. This environment variable accepts a number followed by an optional suffix expressing the unit. Acceptable units include B, KiB, MiB, GiB, TiB, where *iB is a power-of-two byte unit.

For #48409.

Change-Id: I6a3b4c02b175bfcf9c4debee6118cf5dda93bb6f
Reviewed-on: https://go-review.googlesource.com/c/go/+/393400
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
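The accepted grammar is easy to sketch outside the runtime (the runtime's actual parser avoids importing strconv and differs in detail):

    import (
        "fmt"
        "strconv"
        "strings"
    )

    // parseByteCount parses inputs like "100", "64MiB", or "4GiB".
    // Units are powers of two: B, KiB, MiB, GiB, TiB. Longer suffixes
    // must be checked first, since they all end in "B".
    func parseByteCount(s string) (int64, error) {
        units := []struct {
            suffix string
            mult   int64
        }{
            {"TiB", 1 << 40}, {"GiB", 1 << 30}, {"MiB", 1 << 20},
            {"KiB", 1 << 10}, {"B", 1},
        }
        mult := int64(1)
        for _, u := range units {
            if strings.HasSuffix(s, u.suffix) {
                s, mult = strings.TrimSuffix(s, u.suffix), u.mult
                break
            }
        }
        n, err := strconv.ParseInt(s, 10, 64)
        if err != nil {
            return 0, fmt.Errorf("invalid byte count: %w", err)
        }
        return n * mult, nil
    }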
2022-05-03  runtime: add GC CPU utilization limiter  (Michael Knyszek)
This change adds a GC CPU utilization limiter to the GC. It disables assists to ensure GC CPU utilization remains under 50%. It uses a leaky bucket mechanism that will only fill if GC CPU utilization exceeds 50%. Once the bucket begins to overflow, GC assists are limited until the bucket empties, at the risk of GC overshoot.

The limiter is primarily updated by assists. The scheduler may also update it, but only if the GC is on and a few milliseconds have passed since the last update. This second case exists to ensure that if the limiter is on, and no assists are happening, we're still updating the limiter regularly.

The purpose of this limiter is to mitigate GC death spirals, opting to use more memory instead.

This change turns the limiter on always. In practice, 50% overall GC CPU utilization is very difficult to hit unless you're trying; even the most allocation-heavy applications with complex heaps still need to do something with that memory. Note that small GOGC values (i.e., single-digit, or low teens) are more likely to trigger the limiter, which means the GOGC tradeoff may no longer be respected. Even so, it should still be relatively rare.

This change also introduces the feature flag for code to support the memory limit feature.

For #48409.

Change-Id: Ia30f914e683e491a00900fd27868446c65e5d3c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/353989
Reviewed-by: Michael Pratt <mpratt@google.com>
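The leaky bucket can be modeled compactly; a simplified sketch (the real limiter tracks GC and mutator CPU time atomically and handles many more edge cases):

    // gcCPULimiter is an illustrative model. The bucket fills with GC
    // CPU time in excess of the 50% target and drains otherwise.
    type gcCPULimiter struct {
        fill, capacity int64 // nanoseconds of CPU time
    }

    // accumulate adds one window of GC and mutator CPU time. At the 50%
    // target, the GC may spend one nanosecond per mutator nanosecond.
    func (l *gcCPULimiter) accumulate(gcTime, mutatorTime int64) {
        l.fill += gcTime - mutatorTime
        if l.fill < 0 {
            l.fill = 0
        }
        if l.fill > l.capacity {
            l.fill = l.capacity // bucket overflowed: limit assists
        }
    }

    // limiting reports whether assists should currently be limited.
    func (l *gcCPULimiter) limiting() bool { return l.fill >= l.capacity }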
2022-04-28  runtime: clean up escaping in tests  (Austin Clements)
There are several tests in the runtime that need to force various things to escape to the heap. This CL centralizes this functionality into runtime.Escape, defined in export_test.

Change-Id: I2de2519661603ad46c372877a9c93efef8e7a857
Reviewed-on: https://go-review.googlesource.com/c/go/+/402178
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Austin Clements <austin@google.com>
Auto-Submit: Austin Clements <austin@google.com>
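The usual way to write such a helper is to funnel the value through a package-level sink, so escape analysis must assume it escapes; a minimal sketch of the pattern (the helper in export_test.go may differ in detail):

    var escapeSink any

    // Escape forces x (and anything it points to) to be heap allocated.
    func Escape[T any](x T) T {
        escapeSink = x   // assignment to a global makes x escape
        escapeSink = nil // don't keep the value alive longer than needed
        return x
    }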
2022-04-26  runtime: refactor the scavenger and make it testable  (Michael Anthony Knyszek)
This change refactors the scavenger into a type whose methods represent the actual function and scheduling of the scavenger. It also stubs out access to global state in order to make it testable. This change thus also adds a test for the scavenger.

In writing this test, I discovered the lack of a behavior I expected: if pageAlloc.scavenge returns < the bytes requested scavenged, that means the heap is exhausted. This has been true this whole time, but was not documented or explicitly relied upon. This change rectifies that. In theory this means the scavenger could spin in run() indefinitely (as happened in the test) if shouldStop never told it to stop. In practice, shouldStop fires long before the heap is exhausted, but for future changes it may be important. At the very least it's good to be intentional about these things.

While we're here, I also moved the call to stopTimer out of wake and into sleep. There's no reason to add more operations to a context that's already precarious (running without a P on sysmon).

Change-Id: Ib31b86379fd9df84f25ae282734437afc540da5c
Reviewed-on: https://go-review.googlesource.com/c/go/+/384734
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-26  runtime: reduce max idle mark workers during periodic GC cycles  (Michael Anthony Knyszek)
This change reduces the maximum number of idle mark workers during periodic (currently every 2 minutes) GC cycles to 1.

Idle mark workers soak up all available and unused Ps, up to GOMAXPROCS. While this provides some throughput and latency benefit in general, it can cause what appear to be massive CPU utilization spikes in otherwise idle applications. This is mostly an issue for *very* idle applications, ones idle enough to trigger periodic GC cycles. This spike also tends to interact poorly with auto-scaling systems, as the system might assume the load average is very low and suddenly see a massive burst in activity.

The result of this change is not to bring the 100% (of GOMAXPROCS) CPU utilization spike down to 0%, but rather to min(25% + 1/GOMAXPROCS*100%, 100%).

Idle mark workers also incur a small latency penalty, as they must be descheduled for other work that might pop up. Luckily the runtime is pretty good about getting idle mark workers off of Ps, so in general the latency benefit from shorter GC cycles outweighs this cost. But the cost is still non-zero, and may be more significant in idle applications that aren't invoking assists and write barriers quite as often.

We can't completely eliminate idle mark workers because they're currently necessary for GC progress in some circumstances. Namely, they're critical for progress when all we have is fractional workers. If a fractional worker meets its quota, and all user goroutines are blocked directly or indirectly on a GC cycle (via runtime.GOMAXPROCS, or runtime.GC), the program may deadlock without GC workers, since the fractional worker will go to sleep with nothing to wake it.

Fixes #37116.
For #44163.

Change-Id: Ib74793bb6b88d1765c52d445831310b0d11ef423
Reviewed-on: https://go-review.googlesource.com/c/go/+/393394
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-25  runtime: fix stack-move sensitivity in some tests  (Austin Clements)
There are a few tests of the scheduler run queue API that allocate a local []g and test using those G's. However, the run queue API frequently converts between *g and guintptr, which is safe for "real" Gs because they're heap-allocated and hence don't move, but if these tests get a stack movement while holding one of these local *g's as a guintptr, it won't get updated and the test will fail.

Updates #48297.

Change-Id: Ifd424147ce1a1b53732ff0cf55a81df1a9beeb3b
Reviewed-on: https://go-review.googlesource.com/c/go/+/402157
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-04-20  runtime: add fastrand64  (zhangyunhao)
Support fastrand64 in the runtime. Although fastrand uses wyrand, which generates a 64-bit random number, it still returns a uint32. In some cases we need to generate a 64-bit random number; the new API would be faster and easier to use. At the least, we can use the new function in these places:

    src/net/dnsclient.go:randInt()
    src/hash/maphash/maphash.go:MakeSeed()
    src/runtime/map.go:mapiterinit()

    name           time/op
    Fastrand-16    0.09ns ± 5%
    Fastrand64-16  0.09ns ± 6%

Change-Id: Ibb97378c7ca59bc7dc15535d4872fa58ea112e6a
Reviewed-on: https://go-review.googlesource.com/c/go/+/400734
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
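Before a native 64-bit helper existed, callers composed one value from two 32-bit draws; the pattern, using math/rand's Uint32 as a stand-in for the runtime's fastrand:

    package main

    import (
        "fmt"
        "math/rand"
    )

    // rand64 composes a 64-bit value from two 32-bit draws -- the
    // workaround that a native fastrand64 makes unnecessary.
    func rand64() uint64 {
        return uint64(rand.Uint32())<<32 | uint64(rand.Uint32())
    }

    func main() {
        fmt.Println(rand64())
    }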
2022-04-05  all: separate doc comment from //go: directives  (Russ Cox)
A future change to gofmt will rewrite

    // Doc comment.
    //go:foo

to

    // Doc comment.
    //
    //go:foo

Apply that change preemptively to all comments (not necessarily just doc comments).

For #51082.

Change-Id: Iffe0285418d1e79d34526af3520b415a12203ca9
Reviewed-on: https://go-review.googlesource.com/c/go/+/384260
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-03-31  runtime: remove old pacer and the PacerRedesign goexperiment  (Michael Anthony Knyszek)
Now that Go 1.18 has been released, remove the old pacer.

Change-Id: Ie7a7596d67f3fc25d3f375a08fc75eafac2eb834
Reviewed-on: https://go-review.googlesource.com/c/go/+/393396
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2022-02-10  runtime: make piController much more defensive about overflow  (Michael Anthony Knyszek)
If something goes horribly wrong with the assumptions surrounding a piController, its internal error state might accumulate in an unbounded manner. In practice this means unexpected Inf and NaN values. Avoid this by identifying cases where the error overflows and resetting controller state.

In the scavenger, this case is much more likely. All that has to happen is for the proportional relationship between sleep time and estimated CPU usage to break down. Unfortunately, because we're just measuring monotonic time for all this, there are lots of ways it could happen, especially in an oversubscribed system. In these cases, just fall back on a conservative pace for scavenging and try to wait out the issue.

In the pacer I'm pretty sure this is impossible. Because we wire the output of the controller to the input, the response is very directly correlated, so it's impossible for the controller's core assumption to break down.

While we're in the pacer, add more detail about why that controller is even there, as well as its purpose.

Finally, let's be proactive about other sources of overflow, namely overflow from a very large input value. This change adds a check after the first few operations to detect overflow issues from the input, specifically the multiplication.

No tests for the pacer, because I was unable to actually break the pacer's controller under a fuzzer, and no tests for the scavenger, because it is not really in a testable state. However:

* This change includes a fuzz test for the piController.
* I broke out the scavenger code locally and fuzz tested it, confirming that the patch eliminates the original failure mode.
* I tested that on a local heap-spike test, the scavenger continues operating as expected under normal conditions.

Fixes #51061.

Change-Id: I02a01d2dbf0eb9d2a8a8e7274d4165c2b6a3415a
Reviewed-on: https://go-review.googlesource.com/c/go/+/383954
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
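The defensive pattern amounts to checking for Inf/NaN right where overflow can first appear; a sketch with illustrative names (the runtime's piController differs in detail):

    import "math"

    type piController struct {
        kp, ki      float64
        errIntegral float64 // accumulated error state
    }

    func (c *piController) reset() { c.errIntegral = 0 }

    // next runs one controller step, resetting state and reporting
    // failure if either the input multiplication or the accumulated
    // error overflows to Inf or NaN.
    func (c *piController) next(input, setpoint float64) (float64, bool) {
        err := setpoint - input
        prop := c.kp * err // check right after the multiplication
        if math.IsInf(prop, 0) || math.IsNaN(prop) {
            c.reset()
            return 0, false
        }
        c.errIntegral += err
        if math.IsInf(c.errIntegral, 0) || math.IsNaN(c.errIntegral) {
            c.reset()
            return 0, false
        }
        return prop + c.ki*c.errIntegral, true
    }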
2022-02-10  runtime: simplify histogram buckets considerably  (Michael Anthony Knyszek)
There was an off-by-one error in the time histogram buckets calculation that caused the linear sub-bucket distances to be off by 2x.

The fix was trivial, but in writing tests I realized there was a much simpler way to express the calculation for the histogram buckets, and took the opportunity to do that here. The new bucket calculation also fixes the bug.

Fixes #50732.

Change-Id: Idae89986de1c415ee4e148f778e0e101ca003ade
Reviewed-on: https://go-review.googlesource.com/c/go/+/380094
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com>
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-01-24  runtime: replace TestFutexsleep with TestTimediv  (Michael Pratt)
TestFutexsleep was originally created in CL 7876043 as a regression test for buggy division logic in futexsleep. Several months later, CL 11575044 moved this logic to timediv (called by futexsleep).

This test calls runtime.Futexsleep, which temporarily disables asynchronous preemption. Unfortunately, TestFutexsleep calls this from multiple goroutines, creating a race condition that may result in asynchronous preemption remaining disabled for the remainder of the process lifetime.

We could fix this by moving the async preemption disable to the main test function; however, this test has had a history of flakiness. As an alternative, this CL replaces the test wholesale with a new test for timediv, covering the overflow logic without the difficulty of dealing with futex.

Fixes #50749.

Change-Id: If9e1dac63ef1535adb49f9a9ffcaff99b9135895
Reviewed-on: https://go-review.googlesource.com/c/go/+/380058
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
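For context, a shift-and-subtract sketch of what a timediv-style helper does: divide an int64 duration by a small int32 without 64-bit hardware division, saturating on overflow (this follows the commit's description, not necessarily the runtime's exact code):

    // timediv returns v/div and stores v%div in *rem, saturating the
    // quotient at 1<<31-1 if it doesn't fit in an int32.
    func timediv(v int64, div int32, rem *int32) int32 {
        res := int32(0)
        for bit := 30; bit >= 0; bit-- {
            if v >= int64(div)<<uint(bit) {
                v -= int64(div) << uint(bit)
                res |= 1 << uint(bit)
            }
        }
        if v >= int64(div) {
            // The quotient needs more than 31 bits: overflow.
            if rem != nil {
                *rem = 0
            }
            return 1<<31 - 1
        }
        if rem != nil {
            *rem = int32(v)
        }
        return res
    }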
2021-12-13  all: gofmt -w -r 'interface{} -> any' src  (Russ Cox)
And then revert the bootstrap cmd directories and certain testdata. And adjust tests as needed.

Not reverting the changes in std that are bootstrapped, because some of those changes would appear in API docs, and we want to use any consistently. Instead, rewrite 'any' to 'interface{}' in cmd/dist for those directories when preparing the bootstrap copy.

A few files changed as a result of running gofmt -w not because of interface{} -> any but because they hadn't been updated for the new //go:build lines.

Fixes #49884.

Change-Id: Ie8045cba995f65bd79c694ec77a1b3d1fe01bb09
Reviewed-on: https://go-review.googlesource.com/c/go/+/368254
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
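The rewrite rule mechanically substitutes the alias everywhere it appears; for example:

    // Before
    func Walk(fn func(key string, value interface{}) bool, m map[string]interface{})

    // After gofmt -w -r 'interface{} -> any'
    func Walk(fn func(key string, value any) bool, m map[string]any)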
2021-12-06  runtime: set iOS addr space to 40 bits with incremental pagealloc  (Michael Anthony Knyszek)
In iOS <14, the address space is strictly limited to 8 GiB, or 33 bits. As a result, the page allocator also assumes all heap memory lives in this region. This is especially necessary because the page allocator has a PROT_NONE mapping proportional to the size of the usable address space, so this keeps that mapping very small.

However, starting with iOS 14, this restriction is relaxed, and mmap may start returning addresses outside of the <14 range. Today this means that in iOS 14 and later, users experience an error in the page allocator when a heap arena is mapped outside of the old range.

This change increases the ios/arm64 heapAddrBits to 40 while simultaneously making ios/arm64 use the 64-bit pagealloc implementation (with reservations and incremental mapping) to accommodate both iOS versions <14 and 14+. Once iOS <14 is deprecated, we can remove these exceptions and treat ios/arm64 like any other arm64 platform.

This change also makes the BaseChunkIdx expression a little bit easier to read, while we're here.

Fixes #46860.

Change-Id: I13865f799777739109585f14f1cc49d6d57e096b
Reviewed-on: https://go-review.googlesource.com/c/go/+/344401
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2021-11-29  runtime: fix preemption sensitivity in TestTinyAllocIssue37262  (Austin Clements)
TestTinyAllocIssue37262 assumes that all of its allocations will come from the same tiny allocator (that is, the same P), and that nothing else will allocate from that tiny allocator while it's running. It can fail incorrectly if these assumptions aren't met. Fix this potential test flakiness by disabling preemption during this test.

As far as I know, this has never happened on the builders. It was found by mayMoreStackPreempt.

Change-Id: I59f993e0bdbf46a9add842d0e278415422c3f804
Reviewed-on: https://go-review.googlesource.com/c/go/+/366994
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2021-11-05  runtime: don't hold the heap lock while scavenging  (Michael Anthony Knyszek)
This change modifies the scavenger to no longer hold the heap lock while actively scavenging pages. To achieve this, the change also:

* Reverses the locking behavior of the (*pageAlloc).scavenge API, to only acquire the heap lock when necessary. There are a few places in the scavenge path, notably reservation, that require synchronization. The heap lock is far too heavy-handed for this case.
* Introduces a new lock on the scavenger-related fields in a pageAlloc so that access to those fields doesn't require the heap lock.
* Changes the scavenger to mark pages that are actively being scavenged as allocated, and "free" them back to the page allocator the usual way.
* Lifts the heap-growth scavenging code out of mheap.grow, where the heap lock is held, and into allocSpan, just after the lock is released. Releasing the lock during mheap.grow is not feasible if we want to ensure that allocation always makes progress (post-growth, another allocator could come in and take all that space, forcing the goroutine that just grew the heap to do so again).

This change means that the scavenger now must do more work for each scavenge, but it is also now much more scalable. Although in theory it's not great by always taking the locked paths in the page allocator, it takes advantage of some properties of the allocator:

* Most of the time, the scavenger will be working with one page at a time. The page allocator's locked path is optimized for this case.
* On the allocation path, it doesn't need to do the find operation at all; it can go straight to setting bits for the range and updating the summary structure.

Change-Id: Ie941d5e7c05dcc96476795c63fef74bcafc2a0f1
Reviewed-on: https://go-review.googlesource.com/c/go/+/353974
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2021-11-04  runtime: implement GC pacer redesign  (Michael Anthony Knyszek)
This change implements the GC pacer redesign outlined in #44167 and the accompanying design document, behind a GOEXPERIMENT flag that is on by default.

In addition to adding the new pacer, this CL also includes code to track and account for stack and globals scan work in the pacer and in the assist credit system.

The new pacer also deviates slightly from the document in that it increases the bound on the minimum trigger ratio from 0.6 (scaled by GOGC) to 0.7. The logic behind this change is that the new pacer much more consistently hits the goal (good!), leading to slightly less frequent GC cycles, but _longer_ ones (in this case, bad!). It turns out that the cost of having the GC on hurts throughput significantly (per byte of memory used), though tail latencies can improve by up to 10%! To be conservative, this change moves the value to 0.7, where there is a small improvement to both throughput and latency, given the memory use.

Because the new pacer accounts for the two most significant sources of scan work after heap objects, it is now also safer to reduce the minimum heap size without leading to very poor amortization. This change thus decreases the minimum heap size to 512 KiB, which corresponds to the fact that the runtime has around 200 KiB of scannable globals always there, up-front, providing a baseline.

Benchmark results: https://perf.golang.org/search?q=upload:20211001.6

tile38's KNearest benchmark shows a memory increase, but throughput (and latency) per byte of memory used is better.

gopher-lua showed an increase in both CPU time and memory usage, but subsequent attempts to reproduce this behavior are inconsistent. Sometimes the overall performance is better, sometimes it's worse. This suggests that the benchmark is fairly noisy in a way not captured by the benchmarking framework itself.

biogo-igor is the only benchmark to show a significant performance loss. This benchmark exhibits a very high GC rate, with relatively little work to do in each cycle. The idle mark workers are quite active. In the new pacer, mark phases are longer, mark assists are fewer, and some of that time in mark assists has shifted to idle workers. Linux perf indicates that the difference in CPU time can be mostly attributed to write-barrier slow path related calls, which in turn indicates that the write barrier being on for longer is the primary culprit. This also explains the memory increase, as a longer mark phase leads to more memory allocated black, surviving an extra cycle and contributing to the heap goal.

For #44167.

Change-Id: I8ac7cfef7d593e4a642c9b2be43fb3591a8ec9c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/309869
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2021-11-01  runtime: disable pacer lock held assertions in tests  (Michael Anthony Knyszek)
Fixes #49234.

Change-Id: I64c1eab0dce2bbe990343b43a32858a6c9f3dcda
Reviewed-on: https://go-review.googlesource.com/c/go/+/359878
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2021-10-29  runtime: add testing framework and basic tests for GC pacer  (Michael Knyszek)
This change creates a formal exported interface for the GC pacer and creates tests for it that simulate some series of GC cycles. The tests are completely driven by the real pacer implementation, except for assists, which are idealized (though revise is called repeatedly).

For #44167.

Change-Id: I0112242b07e7702595ca71001d781ad6c1fddd2d
Reviewed-on: https://go-review.googlesource.com/c/go/+/353354
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
This change creates a formal exported interface for the GC pacer and creates tests for it that simulate some series of GC cycles. The tests are completely driven by the real pacer implementation, except for assists, which are idealized (though revise is called repeatedly). For #44167. Change-Id: I0112242b07e7702595ca71001d781ad6c1fddd2d Reviewed-on: https://go-review.googlesource.com/c/go/+/353354 Trust: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Go Bot <gobot@golang.org>