path: root/src/runtime/mstats.go
Age | Commit message | Author
2025-10-30 | runtime: eliminate _Psyscall | Michael Anthony Knyszek
This change eliminates the _Psyscall state by using synchronization on the G status _Gsyscall to make syscalls work instead. This removes an atomic Store and an atomic CAS on the syscall path, which reduces syscall and cgo overheads. It also simplifies the syscall paths quite a bit. The one danger with this change is that we have a new combination of states that was previously impossible. There are brief windows where it's possible to observe a goroutine in _Grunning but without a P. This change is careful to hide this detail from the execution tracer, but it may have unexpected effects in the rest of the runtime, making this change somewhat risky.
goos: linux
goarch: amd64
pkg: internal/runtime/cgobench
cpu: AMD EPYC 7B13
                   │ before.out  │             after.out              │
                   │   sec/op    │   sec/op     vs base               │
CgoCall-64           43.69n ± 1%   35.83n ± 1%  -17.99% (p=0.002 n=6)
CgoCallParallel-64   5.306n ± 1%   5.338n ± 1%        ~ (p=0.132 n=6)
Change-Id: I4551afc1eea0c1b67a0b2dd26b0d49aa47bf1fb8 Reviewed-on: https://go-review.googlesource.com/c/go/+/646198 Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-06-04 | runtime: reduce per-P memory footprint when greenteagc is disabled | Michael Anthony Knyszek
There are two additional sources of memory overhead per P that come from greenteagc. One is for ptrBuf, but on platforms other than Windows it doesn't actually cost anything due to demand-paging (Windows also demand-pages, but the memory is 'committed' so it still counts against OS RSS metrics). The other is for per-sizeclass scan stats. However when greenteagc is disabled, most of these scan stats are completely unused. The worst-case memory overhead from these two sources is relatively small (about 10 KiB per P), but for programs with a small memory footprint running on a machine with a lot of cores, this can be significant (single-digit percent). This change does two things. First, it puts ptrBuf initialization behind the greenteagc experiment, so now that memory is never allocated by default. Second, it abstracts the implementation details of scan stat collection and emission, such that we can have two different implementations depending on the build tag. This lets us remove all the unused stats when the greenteagc experiment is disabled, reducing the memory overhead of the stats from ~2.6 KiB per P to 536 bytes per P. This is enough to make the difference no longer noticeable in our benchmark suite. Fixes #73931. Change-Id: I4351f1cbb3f6743d8f5922d757d73442c6d6ad3f Reviewed-on: https://go-review.googlesource.com/c/go/+/678535 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-05-08 | runtime: remove ptr/scalar bitmap metric | khr@golang.org
We have not used this mechanism since CL 616255, so the metric will always be zero. Update #73628 Change-Id: Ic179927a8bc24e6291876c218d88e8848b057c2a Reviewed-on: https://go-review.googlesource.com/c/go/+/671096 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-02 | runtime: mark and scan small objects in whole spans [green tea] | Michael Anthony Knyszek
Our current parallel mark algorithm suffers from frequent stalls on memory since its access pattern is essentially random. Small objects are the worst offenders, since each one forces pulling in at least one full cache line to access even when the amount to be scanned is far smaller than that. Each object also requires an independent access to per-object metadata. The purpose of this change is to improve garbage collector performance by scanning small objects in batches to obtain better cache locality than our current approach. The core idea behind this change is to defer marking and scanning small objects, and then scan them in batches localized to a span. This change adds scanned bits to each small object (<=512 bytes) span in addition to mark bits. The scanned bits indicate that the object has been scanned. (One way to think of them is "grey" bits and "black" bits in the tri-color mark-sweep abstraction.) Each of these spans is always 8 KiB and if they contain pointers, the pointer/scalar data is already packed together at the end of the span, allowing us to further optimize the mark algorithm for this specific case. When the GC encounters a pointer, it first checks if it points into a small object span. If so, it is first marked in the mark bits, and then the object is queued on a work-stealing P-local queue. This object represents the whole span, and we ensure that a span can only appear at most once in any queue by maintaining an atomic ownership bit for each span. Later, when the pointer is dequeued, we scan every object with a set mark that doesn't have a corresponding scanned bit. If it turns out that was the only object in the mark bits since the last time we scanned the span, we scan just that object directly, essentially falling back to the existing algorithm. noscan objects have no scan work, so they are never queued. Each span's mark and scanned bits are co-located together at the end of the span. 
Since the span is always 8 KiB in size, it can be found with simple pointer arithmetic. Next to the marks and scans we also store the size class, eliminating the need to access the span's mspan altogether. The work-stealing P-local queue is a new source of GC work. If this queue gets full, half of it is dumped to a global linked list of spans to scan. The regular scan queues are always prioritized over this queue to allow time for darts to accumulate. Stealing work from other Ps is a last resort. This change also adds a new debug mode under GODEBUG=gctrace=2 that dumps whole-span scanning statistics by size class on every GC cycle. A future extension to this CL is to use SIMD-accelerated scanning kernels for scanning spans with high mark bit density. For #19112. (Deadlock averted in GOEXPERIMENT.) For #73581. Change-Id: I4bbb4e36f376950a53e61aaaae157ce842c341bc Reviewed-on: https://go-review.googlesource.com/c/go/+/658036 Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
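The two mechanical ideas in this message, locating a span by pointer arithmetic and scanning only objects that are marked but not yet scanned, can be sketched as follows. This is an illustration under stated assumptions, not the runtime's code: `spanBase` and `toScan` are hypothetical helpers, spans are assumed to be aligned to their 8 KiB size, and the mark and scanned bits are modeled as a single 64-bit word per span.

```go
package main

import (
	"fmt"
	"math/bits"
)

// spanSize is the fixed small-object span size named in the commit message.
const spanSize = 8 << 10

// spanBase rounds an address down to the start of its span.
// Hypothetical helper; assumes spans are aligned to spanSize.
func spanBase(addr uintptr) uintptr {
	return addr &^ (spanSize - 1)
}

// toScan returns the objects whose mark bit is set but whose scanned bit
// is not, i.e. the work left for this span. One bit per object (assumed).
func toScan(marks, scanned uint64) uint64 {
	return marks &^ scanned
}

func main() {
	fmt.Printf("span base: %#x\n", spanBase(0x12345))

	pending := toScan(0b1011, 0b0001)
	fmt.Printf("pending objects: %d (bits %04b)\n", bits.OnesCount64(pending), pending)
}
```

When exactly one bit is pending, the message says the collector scans that object directly instead of walking the span; a population count like the one above is one plausible way to detect that case.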
2025-04-23 | runtime: move sizeclass defs to new package internal/runtime/gc | Michael Anthony Knyszek
We will want to reference these definitions from new generator programs, and this is a good opportunity to clean up all these old C-style names. Change-Id: Ifb06f0afc381e2697e7877f038eca786610c96de Reviewed-on: https://go-review.googlesource.com/c/go/+/655275 Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-04-18 | internal,runtime: use the builtin clear | apocelipes
To simplify the code. Change-Id: I023de705504c0b580718eec3c7c563b6cf2c8184 GitHub-Last-Rev: 026b32c799b13d0c7ded54f2e61429e6c5ed0aa8 GitHub-Pull-Request: golang/go#73412 Reviewed-on: https://go-review.googlesource.com/c/go/+/666118 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>
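For readers unfamiliar with the builtin, here is a small sketch of the kind of simplification involved. The `zeroCounts` helper and its data are made up for illustration and are not taken from the change itself; `clear` requires Go 1.21 or later.

```go
package main

import "fmt"

// zeroCounts resets every element of the slice to zero.
// Before the builtin existed this was typically written as:
//
//	for i := range counts {
//		counts[i] = 0
//	}
func zeroCounts(counts []uint64) {
	clear(counts)
}

func main() {
	counts := []uint64{3, 1, 4}
	zeroCounts(counts)
	fmt.Println(counts, len(counts)) // elements are zeroed, length is unchanged

	m := map[string]int{"a": 1, "b": 2}
	clear(m) // on a map, clear deletes all entries
	fmt.Println(len(m))
}
```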
2024-04-08 | runtime: move GC pause time CPU metrics update into the STW | Michael Anthony Knyszek
This change fixes a possible race with updating metrics and reading them. The update is intended to be protected by the world being stopped, but here, it clearly isn't. Fixing this lets us lower the thresholds in the metrics tests by an order of magnitude, because the only thing we have to worry about now is floating point error (the tests were previously written assuming the floating point error was much higher than it actually was; that turns out not to be the case, and this bug was the problem instead). However, this still isn't that tight of a bound; we still want to catch any and all problems of exactness. For this purpose, this CL adds a test to check the source-of-truth (in uint64 nanoseconds) that ensures the totals exactly match. This means we unfortunately have to take another time measurement, but for now let's prioritize correctness. A few additional nanoseconds of STW time won't be terribly noticeable. Fixes #66212. Change-Id: Id02c66e8a43c13b1f70e9b268b8a84cc72293bfd Reviewed-on: https://go-review.googlesource.com/c/go/+/570257 Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Nicolas Hillegeer <aktau@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-04-08 | runtime: use maxprocs instead of stwprocs for GC CPU pause time metrics | Michael Anthony Knyszek
Currently we use stwprocs as the multiplier for the STW CPU time computation, but this isn't the same as GOMAXPROCS, which is used for the total time in the CPU metrics. The two numbers need to be comparable, so this change switches to using maxprocs to make it so. Change-Id: I423e3c441d05b1bd656353368cb323289661e302 Reviewed-on: https://go-review.googlesource.com/c/go/+/570256 Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Nicolas Hillegeer <aktau@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
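The multiplier matters because a stop-the-world pause idles every P, so the CPU time charged to the pause is the wall-clock pause multiplied by a processor count. A minimal sketch, assuming a hypothetical helper named `pauseCPUNanos` (the runtime's actual method and field names are not shown in this log):

```go
package main

import "fmt"

// pauseCPUNanos converts a wall-clock STW pause into estimated CPU time by
// charging the pause to every P. Using the same processor count as the total
// CPU time metric keeps the two numbers comparable. Hypothetical helper.
func pauseCPUNanos(wallNanos int64, maxProcs int32) int64 {
	return wallNanos * int64(maxProcs)
}

func main() {
	// A 50 microsecond pause with GOMAXPROCS=8.
	fmt.Println(pauseCPUNanos(50_000, 8), "ns of CPU time charged to the pause")
}
```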
2024-04-08 | runtime: factor out GC pause time CPU stats update | Michael Anthony Knyszek
Currently this is done manually in two places. Replace these manual updates with a method that also forces the caller to be mindful that the number will be multiplied (and that it needs to be). This will make follow-up changes simpler too. Change-Id: I81ea844b47a40ff3470d23214b4b2fb5b71a4abe Reviewed-on: https://go-review.googlesource.com/c/go/+/570255 Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-03-25 | runtime: migrate internal/atomic to internal/runtime | Andy Pan
For #65355 Change-Id: I65dd090fb99de9b231af2112c5ccb0eb635db2be Reviewed-on: https://go-review.googlesource.com/c/go/+/560155 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ibrahim Bazoka <ibrahimbazoka729@gmail.com> Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
2023-11-28 | runtime: put ReadMemStats debug assertions behind a double-check mode | Michael Anthony Knyszek
ReadMemStats has a few assertions it makes about the consistency of the stats it's about to produce. Specifically, how those stats line up with runtime-internal stats. These checks are generally useful, but crashing just because some stats are wrong is a heavy price to pay. For a long time this wasn't a problem, but very recently it became a real problem. It turns out that there's real benign skew that can happen wherein sysmon (which doesn't synchronize with a STW) generates a trace event when tracing is enabled, and may mutate some stats while ReadMemStats is running its checks. Fix this by synchronizing with both sysmon and the tracer. This is a bit heavy-handed, but better that than false positives. Also, put the checks behind a debug mode. We want to reduce the risk of backporting this change, and again, it's not great to crash just because user-facing stats are off. Still, enable this debug mode during the runtime tests so we don't lose quite as much coverage from disabling these checks by default. Fixes #64401. Change-Id: I9adb3e5c7161d207648d07373a11da8a5f0fda9a Reviewed-on: https://go-review.googlesource.com/c/go/+/545277 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
2023-11-15 | runtime/metrics: add STW stopping and total time metrics | Michael Pratt
This CL adds four new time histogram metrics: /sched/pauses/stopping/gc:seconds, /sched/pauses/stopping/other:seconds, /sched/pauses/total/gc:seconds, and /sched/pauses/total/other:seconds. The "stopping" metrics measure the time taken to start a stop-the-world pause, i.e., how long it takes stopTheWorldWithSema to stop all Ps. This can be used to detect STW struggling to preempt Ps. The "total" metrics measure the total duration of a stop-the-world pause, from starting to stop-the-world until the world is started again. This includes the time spent in the "start" phase. The "gc" metrics are used for GC-related STW pauses. The "other" metrics are used for all other STW pauses. All of these metrics start timing in stopTheWorldWithSema only after successfully acquiring sched.lock, thus excluding lock contention on sched.lock. The reasoning behind this is that while waiting on sched.lock the world is not stopped at all (all other Ps can run), so the impact of this contention is primarily limited to the goroutine attempting to stop-the-world. Additionally, we already have some visibility into sched.lock contention via contention profiles (#57071). /sched/pauses/total/gc:seconds is conceptually equivalent to /gc/pauses:seconds, so the latter is marked as deprecated and returns the same histogram as the former. In the implementation, there are a few minor differences: * For both mark and sweep termination stops, /gc/pauses:seconds started timing prior to calling stopTheWorldWithSema, thus including lock contention. These details are minor enough that I do not believe the slight change in reporting will matter. For mark termination stops, moving timing stop into startTheWorldWithSema does have the side effect of requiring moving other GC metric calculations outside of the STW, as they depend on the same end time. 
Fixes #63340 Change-Id: Iacd0bab11bedab85d3dcfb982361413a7d9c0d05 Reviewed-on: https://go-review.googlesource.com/c/go/+/534161 Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-07-05 | runtime,runtime/metrics: clarify OS stack metrics | Michael Anthony Knyszek
There are some subtle details here about measuring OS stacks in cgo programs. There's also an expectation about magnitude in the MemStats docs that isn't in the runtime/metrics docs. Fix both. Fixes #54396. Change-Id: I6b60a62a4a304e6688e7ab4d511d66193fc25321 Reviewed-on: https://go-review.googlesource.com/c/go/+/502156 Run-TryBot: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>
2023-07-04 | runtime: have ReadMemStats do a nil check before switching stacks | Keith Randall
This gives the user a better stack trace experience. No need to expose them to runtime.systemstack and friends. Fixes #61158 Change-Id: I4f423f82e54b062773067c0ae64622e37cb3948b Reviewed-on: https://go-review.googlesource.com/c/go/+/507755 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2023-05-23 | runtime/metrics: refactor CPU stats accumulation | Michael Anthony Knyszek
Currently the CPU stats are only updated once every mark termination, but for writing robust tests, it's often useful to force this update. Refactor the CPU stats accumulation out of gcMarkTermination and into its own function. This is also a step toward real-time CPU stats. While we're here, fix some incorrect documentation about dedicated GC CPU time. For #59749. For #60276. Change-Id: I8c1a9aca45fcce6ce7999702ae4e082853a69711 Reviewed-on: https://go-review.googlesource.com/c/go/+/487215 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2023-05-19 | runtime: emit STW events for all pauses, not just those for the GC | Michael Anthony Knyszek
Currently STW events are only emitted for GC STWs. There's little reason why the trace can't contain events for every STW: they're rare so don't take up much space in the trace, yet being able to see when the world was stopped is often critical to debugging certain latency issues, especially when they stem from user-level APIs. This change adds new "kinds" to the EvGCSTWStart event, renames the GCSTW events to just "STW," and lets the parser deal with unknown STW kinds for future backwards compatibility. But, this change must break trace compatibility, so it bumps the trace version to Go 1.21. This change also includes a small cleanup in the trace command, which previously checked for STW events when deciding whether user tasks overlapped with a GC. Looking at the source, I don't see a way for STW events to ever enter the stream that that code looks at, so that condition has been deleted. Change-Id: I9a5dc144092c53e92eb6950e9a5504a790ac00cf Reviewed-on: https://go-review.googlesource.com/c/go/+/494495 Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-09-16 | runtime/metrics: add CPU stats | Michael Anthony Knyszek
This change adds a breakdown for estimated CPU usage by time. These estimates are not based on real on-CPU counters, so each metric has a disclaimer explaining so. They can, however, be more reasonably compared to a total CPU time metric that this change also adds. Fixes #47216. Change-Id: I125006526be9f8e0d609200e193da5a78d9935be Reviewed-on: https://go-review.googlesource.com/c/go/+/404307 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Josh MacDonald <jmacd@lightstep.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
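As an illustration of comparing one class against the total, the sketch below reads two of the CPU metrics exposed by runtime/metrics and computes the estimated GC share. The helpers `readFloat` and `fraction` are hypothetical names for this example; the values are estimates, as the message notes, and may be zero before the first GC cycle completes.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/metrics"
)

// fraction returns part/total, or 0 when total is not positive.
func fraction(part, total float64) float64 {
	if total <= 0 {
		return 0
	}
	return part / total
}

// readFloat reads one float64 metric. ok is false when this Go version
// does not support the metric or it has a different kind.
func readFloat(name string) (value float64, ok bool) {
	s := []metrics.Sample{{Name: name}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindFloat64 {
		return 0, false
	}
	return s[0].Value.Float64(), true
}

func main() {
	runtime.GC() // give the runtime a chance to update its CPU estimates

	gc, okGC := readFloat("/cpu/classes/gc/total:cpu-seconds")
	total, okTotal := readFloat("/cpu/classes/total:cpu-seconds")
	if !okGC || !okTotal {
		fmt.Println("CPU class metrics are not supported by this Go version")
		return
	}
	fmt.Printf("estimated GC share of CPU time: %.2f%%\n", 100*fraction(gc, total))
}
```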
2022-08-31 | runtime: convert consistentHeapStats.gen to atomic type | cuiweixie
For #53821 Change-Id: I9f57b84f6a2c29d750fb20420daef903a9311a83 Reviewed-on: https://go-review.googlesource.com/c/go/+/425781 Run-TryBot: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-26 | runtime: drop padding alignment field for timeHistogram | Cuong Manh Le
After CL 419449, timeHistogram always has 8-byte alignment. Change-Id: I93145502bcafa1712b811b1a6d62da5d54d0db42 Reviewed-on: https://go-review.googlesource.com/c/go/+/425777 Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2022-08-26 | runtime: convert p.statsSeq to internal atomic type | hopehook
For #53821. Change-Id: I1cab3671a29c218b8a927aba9064e63b65900173 Reviewed-on: https://go-review.googlesource.com/c/go/+/425416 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: hopehook <hopehook@qq.com>
2022-08-12 | runtime: convert timeHistogram to atomic types | Michael Pratt
I've dropped the note that sched.timeToRun is protected by sched.lock, as it does not seem to be true. For #53821. Change-Id: I03f8dc6ca0bcd4ccf3ec113010a0aa39c6f7d6ef Reviewed-on: https://go-review.googlesource.com/c/go/+/419449 Reviewed-by: Austin Clements <austin@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Michael Pratt <mpratt@google.com>
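The general pattern behind this kind of conversion can be sketched with the public sync/atomic package, which is an analogue of (not the same as) the runtime-internal atomic package the change uses. The `histogram` type below is hypothetical and far simpler than timeHistogram. Typed atomics make every access atomic by construction, and sync/atomic's 64-bit types are automatically 8-byte aligned even on 32-bit systems.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// histogram is a hypothetical, much-simplified stand-in for a bucketed
// counter set. Each bucket is a typed atomic, so no lock is needed and no
// caller can accidentally perform a plain, non-atomic read or write.
type histogram struct {
	counts [4]atomic.Uint64
}

// record adds one observation to the given bucket.
func (h *histogram) record(bucket int) {
	h.counts[bucket].Add(1)
}

// total sums all buckets using atomic loads.
func (h *histogram) total() uint64 {
	var n uint64
	for i := range h.counts {
		n += h.counts[i].Load()
	}
	return n
}

func main() {
	var h histogram
	var wg sync.WaitGroup
	for b := 0; b < len(h.counts); b++ {
		wg.Add(1)
		go func(bucket int) {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				h.record(bucket)
			}
		}(b)
	}
	wg.Wait()
	fmt.Println("observations:", h.total())
}
```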
2022-05-03 | runtime: store consistent total allocation stats as uint64 | Michael Anthony Knyszek
Currently the consistent total allocation stats are managed as uintptrs, which means they can easily overflow on 32-bit systems. Fix this by storing these stats as uint64s. This will cause some minor performance degradation on 32-bit systems, but there really isn't a way around this, and it affects the correctness of the metrics we export. Fixes #52680. Change-Id: I7e6ca44047d46b4bd91c6f87c2d29f730e0d6191 Reviewed-on: https://go-review.googlesource.com/c/go/+/403758 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Austin Clements <austin@google.com>
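The overflow is easy to reproduce in isolation. The sketch below uses uint32 to stand in for uintptr on a 32-bit system; `sum32` and `sum64` are hypothetical helpers, not runtime functions.

```go
package main

import "fmt"

const gib = 1 << 30

// sum32 accumulates sizes in a 32-bit counter, as a uintptr-based total
// would on a 32-bit system. It silently wraps once the total passes 4 GiB.
func sum32(sizes ...uint32) uint32 {
	var total uint32
	for _, s := range sizes {
		total += s
	}
	return total
}

// sum64 accumulates the same sizes in a 64-bit counter.
func sum64(sizes ...uint32) uint64 {
	var total uint64
	for _, s := range sizes {
		total += uint64(s)
	}
	return total
}

func main() {
	// 3 GiB + 2 GiB of cumulative allocation.
	fmt.Println("32-bit total:", sum32(3*gib, 2*gib)) // wraps to 1 GiB
	fmt.Println("64-bit total:", sum64(3*gib, 2*gib)) // the true 5 GiB
}
```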
2022-05-03 | runtime: check the heap goal and trigger dynamically | Michael Anthony Knyszek
As it stands, the heap goal and the trigger are set once by gcController.commit, and then read out of gcController. However with the coming memory limit we need the GC to be able to respond to changes in non-heap memory. The simplest way of achieving this is to compute the heap goal and its associated trigger dynamically. In order to make this easier to implement, the GC trigger is now based on the heap goal, as opposed to the status quo of computing both simultaneously. In many cases we just want the heap goal anyway, not both, but we definitely need the goal to compute the trigger, because the trigger's bounds are entirely based on the goal (the initial runway is not). A consequence of this is that we can't rely on the trigger to enforce a minimum heap size anymore, and we need to lift that up directly to the goal. Specifically, we need to lift up any part of the calculation that *could* put the trigger ahead of the goal. Luckily this is just the heap minimum and minimum sweep distance. In the first case, the pacer may behave slightly differently, as the heap minimum is no longer the minimum trigger, but the actual minimum heap goal. In the second case it should be the same, as we ensure the additional runway for sweeping is added to both the goal *and* the trigger, as before, by computing that in gcControllerState.commit. There's also another place we update the heap goal: if a GC starts and we triggered beyond the goal, we always ensure there's some runway. That calculation uses the current trigger, which violates the rule of keeping the trigger based on the goal. Notice, however, that using the precomputed trigger for this isn't even quite correct: due to a bug, or something else, we might trigger a GC beyond the precomputed trigger. So this change also adds a "triggered" field to gcControllerState that tracks the point at which a GC actually triggered. This is independent of the precomputed trigger, so it's fine for the heap goal calculation to rely on it. 
It also turns out, there's more than just that one place where we really should be using the actual trigger point, so this change fixes those up too. Also, because the heap minimum is set by the goal and not the trigger, the maximum trigger calculation now happens *after* the goal is set, so the maximum trigger actually does what I originally intended (and what the comment says): at small heaps, the pacer picks 95% of the runway as the maximum trigger. Currently, the pacer picks a small trigger based on a not-yet-rounded-up heap goal, so the trigger gets rounded up to the goal, and as per the "ensure there's some runway" check, the runway always ends up being 64 KiB. That check is supposed to be for exceptional circumstances, not the status quo. There's a test introduced in the last CL that needs to be updated to accommodate this slight change in behavior. So, this all sounds like a lot that changed, but what we're talking about here are really, really tight corner cases that arise from situations outside of our control, like pathologically bad behavior on the part of an OS or CPU. Even in these corner cases, it's very unlikely that users will notice any difference at all. What's more important, I think, is that the pacer behaves more closely to what all the comments describe, and what the original intent was. Another note: at first, one might think that computing the heap goal and trigger dynamically introduces some raciness, but not in this CL: the heap goal and trigger are completely static. Allocation outside of a GC cycle may now be a bit slower than before, as the GC trigger check is now significantly more complex. However, note that this executes basically just as often as gcController.revise, and that makes up for a vanishingly small part of any CPU profile. The next CL cleans up the floating point multiplications on this path nonetheless, just to be safe. For #48409. 
Change-Id: I280f5ad607a86756d33fb8449ad08555cbee93f9 Reviewed-on: https://go-review.googlesource.com/c/go/+/397014 Run-TryBot: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-03 | runtime: move inconsistent memstats into gcController | Michael Anthony Knyszek
Fundamentally, all of these memstats exist to serve the runtime in managing memory. For the sake of simpler testing, couple these stats more tightly with the GC. This CL was mostly done automatically. The fields had to be moved manually, but the references to the fields were updated via gofmt -w -r 'memstats.<field> -> gcController.<field>' *.go For #48409. Change-Id: Ic036e875c98138d9a11e1c35f8c61b784c376134 Reviewed-on: https://go-review.googlesource.com/c/go/+/397678 Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-03 | runtime: clean up inconsistent heap stats | Michael Anthony Knyszek
The inconsistent heap stats in memstats are a bit messy. Primarily, heap_sys is non-orthogonal with heap_released and heap_inuse. In later CLs, we're going to want heap_sys-heap_released-heap_inuse, so clean this up by replacing heap_sys with an orthogonal metric: heapFree. heapFree represents page heap memory that is free but not released. I think this change also simplifies a lot of reasoning about these stats; it's much clearer what they mean, and to obtain HeapSys for memstats, we no longer need to do the strange subtraction from heap_sys when allocating specifically non-heap memory from the page heap. Because we're removing heap_sys, we need to replace it with a sysMemStat for mem.go functions. In this case, heap_released is the most appropriate because we increase it anyway (again, non-orthogonality). In which case, it makes sense for heap_inuse, heap_released, and heapFree to become more uniform, and to just represent them all as sysMemStats. While we're here and messing with the types of heap_inuse and heap_released, let's also fix their names (and last_heap_inuse's name) up to the more modern Go convention of camelCase. For #48409. Change-Id: I87fcbf143b3e36b065c7faf9aa888d86bd11710b Reviewed-on: https://go-review.googlesource.com/c/go/+/397677 Run-TryBot: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
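The benefit of orthogonal counters is that derived totals become plain sums. A sketch, assuming a hypothetical `heapStats` struct that mirrors the three counters named in the message (the runtime's real fields are sysMemStats, not plain integers):

```go
package main

import (
	"fmt"
	"runtime"
)

// heapStats is a hypothetical stand-in for the three orthogonal counters.
// Every page of heap address space is counted in exactly one of them.
type heapStats struct {
	inUse    uint64 // pages backing live spans
	free     uint64 // free in the page heap but not released to the OS
	released uint64 // returned to the OS
}

// sys reconstructs the old heap_sys value as a sum, with no subtraction.
func (s heapStats) sys() uint64 {
	return s.inUse + s.free + s.released
}

func main() {
	s := heapStats{inUse: 6 << 20, free: 1 << 20, released: 1 << 20}
	fmt.Println("heap sys (MiB):", s.sys()>>20)

	// The public MemStats API exposes a related estimate: per its
	// documentation, HeapIdle minus HeapReleased approximates memory that
	// is idle but could still be returned to the OS.
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Println("idle but not released (bytes):", m.HeapIdle-m.HeapReleased)
}
```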
2022-05-03 | runtime: track how much memory is mapped in the Ready state | Michael Anthony Knyszek
This change adds a field to memstats called mappedReady that tracks how much memory is in the Ready state at any given time. In essence, it's the total memory usage by the Go runtime (with one exception which is documented). Essentially, all memory mapped read/write that has either been paged in or will soon. To make tracking this not involve the many different stats that track mapped memory, we track this statistic at a very low level. The downside of tracking this statistic at such a low level is that it managed to catch lots of situations where the runtime wasn't fully accounting for memory. This change rectifies these situations by always accounting for memory that's mapped in some way (i.e. always passing a sysMemStat to a mem.go function), with *three* exceptions. Rectifying these situations means also having the memory mapped during testing being accounted for, so that tests (i.e. ReadMemStats) that ultimately check mappedReady continue to work correctly without special exceptions. We choose to simply account for this memory in other_sys. Let's talk about the exceptions. The first is that the arenas array for finding heap arena metadata from an address is mapped as read/write in one large chunk. It's tens of MiB in size. On systems with demand paging, we assume that the whole thing isn't paged in at once (after all, it maps to the whole address space, and it's exceedingly difficult with today's technology to even broach having as much physical memory as the total address space). On systems where we have to commit memory manually, we use a two-level structure. Now, the reason why this is an exception is because we have no mechanism to track what memory is paged in, and we can't just account for the entire thing, because that would *look* like an enormous overhead. Furthermore, this structure is on a few really, really critical paths in the runtime, so doing more explicit tracking isn't really an option. So, we explicitly don't and call sysAllocOS to map this memory. 
The second exception is that we call sysFree with no accounting to clean up address space reservations, or otherwise to throw out mappings we don't care about. In this case, also drop down to a lower level and call sysFreeOS to explicitly avoid accounting. The third exception is debuglog allocations. That is purely a debugging facility and ideally we want it to have as small an impact on the runtime as possible. If we include it in mappedReady calculations, it could cause GC pacing shifts in future CLs, especially if one increases the debuglog buffer sizes as a one-off. As of this CL, these are the only three places in the runtime that would pass nil for a stat to any of the functions in mem.go. As a result, this CL makes sysMemStats mandatory to facilitate better accounting in the future. It's now much easier to grep and find out where accounting is explicitly elided, because one doesn't have to follow the trail of sysMemStat nil pointer values, and can just look at the function name. For #48409. Change-Id: I274eb467fc2603881717482214fddc47c9eaf218 Reviewed-on: https://go-review.googlesource.com/c/go/+/393402 Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-05-03 | runtime: maintain a direct count of total allocs and frees | Michael Anthony Knyszek
This will be used by the memory limit computation to determine overheads. For #48409. Change-Id: Iaa4e26e1e6e46f88d10ba8ebb6b001be876dc5cd Reviewed-on: https://go-review.googlesource.com/c/go/+/394220 Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-05 | all: separate doc comment from //go: directives | Russ Cox
A future change to gofmt will rewrite // Doc comment. //go:foo to // Doc comment. // //go:foo Apply that change preemptively to all comments (not necessarily just doc comments). For #51082. Change-Id: Iffe0285418d1e79d34526af3520b415a12203ca9 Reviewed-on: https://go-review.googlesource.com/c/go/+/384260 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>
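The rewrite described above is easier to see with line breaks restored. In this sketch the function `add` and the choice of the `//go:noinline` directive are made up for illustration; only the comment layout reflects the change.

```go
package main

import "fmt"

// Before the change, the directive directly followed the doc text:
//
//	// add returns the sum of a and b.
//	//go:noinline
//	func add(a, b int) int
//
// After the change, an empty comment line separates the two, as below.

// add returns the sum of a and b.
//
//go:noinline
func add(a, b int) int {
	return a + b
}

func main() {
	fmt.Println(add(2, 3))
}
```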
2022-03-31 | runtime: remove intermediate fields in memstats for ReadMemStats | Michael Anthony Knyszek
Currently, the ReadMemStats (really this is all happening in readmemstats_m, but that's just a direct call from ReadMemStats) call chain first populates some fields in memstats, then copies those into the final MemStats location. This used to make a lot of sense when memstats' structure aligned with MemStats, and the values were just copied from one to the other. Sometime in the last few releases, we switched to populating the MemStats manually because a lot of fields had diverged from their internal representation. Now, we're left with a lot of fields in memstats that pollute the structure: they only exist to be updated for the sake of ReadMemStats. Since we're going to be adding more fields to memstats in further CLs, this is a good opportunity to clean up. As a result of this change, updatememstats, which used to just update the aforementioned intermediate fields in memstats, is no longer necessary, so it is removed. Change-Id: Ifabfb3ac3002641105af62e9509a6351165dcd87 Reviewed-on: https://go-review.googlesource.com/c/go/+/393397 Trust: Michael Knyszek <mknyszek@google.com> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2021-11-05runtime: make consistentHeapStats acquire/release nosplitMichael Anthony Knyszek
consistentHeapStats is updated during a stack allocation, so a stack growth during an acquire or release could cause another acquire to happen before the operation completes fully. This may lead to an invalid sequence number. Fixes #49395. Change-Id: I41ce3393dff80201793e053d4d6394d7b211a5b7 Reviewed-on: https://go-review.googlesource.com/c/go/+/361158 Trust: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Austin Clements <austin@google.com>
2021-06-17[dev.typeparams] runtime: fix import sort order [generated]Michael Anthony Knyszek
[git-generate] cd src/runtime goimports -w *.go Change-Id: I1387af0f2fd1a213dc2f4c122e83a8db0fcb15f0 Reviewed-on: https://go-review.googlesource.com/c/go/+/329189 Trust: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Go Bot <gobot@golang.org>
2021-06-17[dev.typeparams] runtime: replace uses of runtime/internal/sys.PtrSize with internal/goarch.PtrSize [generated]Michael Anthony Knyszek
[git-generate] cd src/runtime/internal/math gofmt -w -r "sys.PtrSize -> goarch.PtrSize" . goimports -w *.go cd ../.. gofmt -w -r "sys.PtrSize -> goarch.PtrSize" . goimports -w *.go Change-Id: I43491cdd54d2e06d4d04152b3d213851b7d6d423 Reviewed-on: https://go-review.googlesource.com/c/go/+/328337 Trust: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2021-04-27runtime/metrics: add tiny allocs metricMichael Anthony Knyszek
Currently tiny allocations are not represented directly in either MemStats or runtime/metrics, though they are represented indirectly in MemStats via Mallocs. Add them to runtime/metrics by first merging memstats.tinyallocs into consistentHeapStats (just for simplicity; it's monotonic so metrics would still be self-consistent if we just read it atomically) and then adding /gc/heap/tiny/allocs:objects to the list of supported metrics. Change-Id: Ie478006ab942a3e877b4a79065ffa43569722f3d Reviewed-on: https://go-review.googlesource.com/c/go/+/312909 Trust: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>
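As a usage sketch, the metric named above can be read through the public runtime/metrics API. The helper name `readTinyAllocs` is hypothetical, and the code assumes a Go release that supports this metric; on a release that does not, the helper reports that instead of failing.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// readTinyAllocs is a hypothetical helper that samples the tiny
// allocation count. The second result is false when the running
// Go release does not support the metric.
func readTinyAllocs() (uint64, bool) {
	sample := []metrics.Sample{{Name: "/gc/heap/tiny/allocs:objects"}}
	metrics.Read(sample)
	if sample[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return sample[0].Value.Uint64(), true
}

func main() {
	if n, ok := readTinyAllocs(); ok {
		fmt.Println("tiny allocs:", n)
	} else {
		fmt.Println("tiny allocs metric not supported")
	}
}
```

Because the count is monotonic, two successive reads can be subtracted to get the number of tiny allocations over an interval.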
2021-04-14runtime: move next_gc and last_next_gc into gcControllerStateMichael Anthony Knyszek
This change moves next_gc and last_next_gc into gcControllerState under the names heapGoal and lastHeapGoal respectively. These are fundamentally GC pacer related values, and so it makes sense for them to live here. Partially generated by rf ' ex . { memstats.next_gc -> gcController.heapGoal memstats.last_next_gc -> gcController.lastHeapGoal } ' except for updates to comments and gcControllerState methods, where they're accessed through the receiver, and trace-related renames of NextGC -> HeapGoal, while we're here. For #44167. Change-Id: I1e871ad78a57b01be8d9f71bd662530c84853bed Reviewed-on: https://go-review.googlesource.com/c/go/+/306603 Trust: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>
2021-04-13runtime: move internal GC statistics from memstats to gcControllerMichael Anthony Knyszek
This change moves certain important but internal-only GC statistics from memstats into gcController. These statistics are mainly used in pacing the GC, so it makes sense to keep them in the pacer's state. This CL was mostly generated via rf ' ex . { memstats.gc_trigger -> gcController.trigger memstats.triggerRatio -> gcController.triggerRatio memstats.heap_marked -> gcController.heapMarked memstats.heap_live -> gcController.heapLive memstats.heap_scan -> gcController.heapScan } ' except for a few special cases, like updating names in comments and when these fields are used within gcControllerState methods (at which point they're accessed through the receiver). For #44167. Change-Id: I6bd1602585aeeb80818ded24c07d8e6fec992b93 Reviewed-on: https://go-review.googlesource.com/c/go/+/306598 Trust: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>
2020-11-02runtime: decouple consistent stats from mcache and allow P-less updateMichael Anthony Knyszek
This change modifies the consistent stats implementation to keep the per-P sequence counter on each P instead of each mcache. A valid mcache is not available everywhere that we want to call e.g. allocSpan, as per issue #42339. By decoupling these two, we can add a mechanism to allow contexts without a P to update stats consistently. In this CL, we achieve that with a mutex. In practice, it will be very rare for an M to update these stats without a P. Furthermore, the stats reader also only needs to hold the mutex across the update to "gen" since once that changes, writers are free to continue updating the new stats generation. Contention could thus only arise between writers without a P, and as mentioned earlier, those should be rare. A nice side-effect of this change is that the consistent stats acquire and release API becomes simpler. Fixes #42339. Change-Id: Ied74ab256f69abd54b550394c8ad7c4c40a5fe34 Reviewed-on: https://go-review.googlesource.com/c/go/+/267158 Run-TryBot: Michael Knyszek <mknyszek@google.com> Trust: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-30runtime: add world-stopped assertionsMichael Pratt
Stopping the world is an implicit lock for many operations, so we should assert the world is stopped in functions that require it. This is enabled along with the rest of lock ranking, though it is a bit orthogonal and likely cheap enough to enable all the time should we choose. Requiring a lock _or_ world stop is common, so that can be expressed as well. Updates #40677 Change-Id: If0a58544f4251d367f73c4120c9d39974c6cd091 Reviewed-on: https://go-review.googlesource.com/c/go/+/248577 Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com> Trust: Michael Pratt <mpratt@google.com>
2020-10-26runtime,runtime/metrics: add metric for distribution of GC pausesMichael Anthony Knyszek
For #37112. Change-Id: Ibb0425c9c582ae3da3b2662d5bbe830d7df9079c Reviewed-on: https://go-review.googlesource.com/c/go/+/247047 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-26runtime: make sysMemStats' methods nosplitMichael Anthony Knyszek
sysMemStats are updated early on in runtime initialization, so triggering a stack growth would be bad. Mark them nosplit. Thank you so much to cherryyz@google.com for finding this fix! Fixes #42218. Change-Id: Ic62db76e6a4f829355d7eaabed1727c51adfbd0f Reviewed-on: https://go-review.googlesource.com/c/go/+/265157 Trust: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Austin Clements <austin@google.com> TryBot-Result: Go Bot <gobot@golang.org>
2020-10-26runtime,runtime/metrics: add memory metricsMichael Anthony Knyszek
This change adds support for a variety of runtime memory metrics and contains the base implementation of Read for the runtime/metrics package, which lives in the runtime. It also adds testing infrastructure for the metrics package, and a bunch of format and documentation tests. For #37112. Change-Id: I16a2c4781eeeb2de0abcb045c15105f1210e2d8a Reviewed-on: https://go-review.googlesource.com/c/go/+/247041 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Trust: Michael Knyszek <mknyszek@google.com>
2020-10-26runtime: move malloc stats into consistentHeapStatsMichael Anthony Knyszek
This change moves the mcache-local malloc stats into the consistentHeapStats structure so the malloc stats can be managed consistently with the memory stats. The one exception here is tinyAllocs for which moving that into the global stats would incur several atomic writes on the fast path. Microbenchmarks for just one CPU core have shown a 50% loss in throughput. Since tiny allocation count isn't exposed anyway and is always blindly added to both allocs and frees, let that stay inconsistent and flush the tiny allocation count every so often. Change-Id: I2a4b75f209c0e659b9c0db081a3287bf227c10ca Reviewed-on: https://go-review.googlesource.com/c/go/+/247039 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-26runtime: replace some memstats with consistent statsMichael Anthony Knyszek
This change replaces stacks_inuse, gcWorkBufInUse and gcProgPtrScalarBitsInUse with their corresponding consistent stats. It also adds checks to make sure the rest of the sharded stats line up with existing stats in updatememstats. Change-Id: I17d0bd181aedb5c55e09c8dff18cef5b2a3a14e3 Reviewed-on: https://go-review.googlesource.com/c/go/+/247038 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-26runtime: add consistent heap statisticsMichael Anthony Knyszek
This change adds a global set of heap statistics which are similar to existing memory statistics. The purpose of these new statistics is to be able to read them and get a consistent result without stopping the world. The goal is to eventually replace as many of the existing memstats statistics with the sharded ones as possible. The consistent memory statistics use a tailor-made synchronization mechanism to allow writers (allocators) to proceed with minimal synchronization by using a sequence counter and a global generation counter to determine which set of statistics to update. Readers increment the global generation counter to effectively grab a snapshot of the statistics, and then iterate over all Ps using the sequence counter to ensure that they may safely read the snapshotted statistics. To keep statistics fresh, the reader also has a responsibility to merge sets of statistics. These consistent statistics are computed, but otherwise unused for now. Upcoming changes will integrate them with the rest of the codebase and will begin to phase out existing statistics. Change-Id: I637a11f2439e2049d7dccb8650c5d82500733ca5 Reviewed-on: https://go-review.googlesource.com/c/go/+/247037 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
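The synchronization scheme described above can be sketched in ordinary Go. This is a simplified illustration under stated assumptions, not the runtime's implementation: the types `delta`, `writer`, and `consistentStats`, the two example statistics, the reader mutex, and the use of `runtime.Gosched` while spinning are all choices made for this sketch. Each writer stands in for a P and must be used by one goroutine at a time.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// delta is one generation of statistics (two example counters).
type delta struct{ allocs, frees atomic.Int64 }

// writer stands in for a P; seq is odd while an update is in progress.
type writer struct{ seq atomic.Uint32 }

type consistentStats struct {
	gen     atomic.Uint32 // generation that writers currently update
	stats   [3]delta      // ring buffer of generations
	writers []writer
	readMu  sync.Mutex // readers are serialized in this sketch
}

func newConsistentStats(n int) *consistentStats {
	return &consistentStats{writers: make([]writer, n)}
}

// acquire marks writer id as active and returns the delta to update.
func (c *consistentStats) acquire(id int) *delta {
	c.writers[id].seq.Add(1) // now odd: update in progress
	return &c.stats[c.gen.Load()]
}

// release marks writer id as done.
func (c *consistentStats) release(id int) { c.writers[id].seq.Add(1) }

// read rotates the generation, waits for in-flight writers, merges
// the previous generation into the snapshot, and returns the totals.
func (c *consistentStats) read() (allocs, frees int64) {
	c.readMu.Lock()
	defer c.readMu.Unlock()
	curr := c.gen.Load()
	prev := (curr + 2) % 3
	c.gen.Store((curr + 1) % 3) // new writers move to the next generation
	for i := range c.writers {
		for c.writers[i].seq.Load()%2 != 0 {
			runtime.Gosched() // wait for an in-flight update to finish
		}
	}
	// No writer can touch curr or prev now, so merging is safe.
	c.stats[curr].allocs.Add(c.stats[prev].allocs.Load())
	c.stats[curr].frees.Add(c.stats[prev].frees.Load())
	c.stats[prev].allocs.Store(0)
	c.stats[prev].frees.Store(0)
	return c.stats[curr].allocs.Load(), c.stats[curr].frees.Load()
}

func main() {
	c := newConsistentStats(1)
	d := c.acquire(0)
	d.allocs.Add(3)
	d.frees.Add(1)
	c.release(0)
	a, f := c.read()
	fmt.Println(a, f)
}
```

Three generations are used so that, at any time, one slot receives new updates, one holds the snapshot being read, and one holds the merged history that the next read folds forward.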
2020-10-26runtime: remove memstats.heap_allocMichael Anthony Knyszek
memstats.heap_alloc is 100% a duplicate and unnecessary copy of memstats.alloc which exists because MemStats used to be populated from memstats via a memmove. Change-Id: I995489f61be39786e573b8494a8ab6d4ea8bed9c Reviewed-on: https://go-review.googlesource.com/c/go/+/246975 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-26runtime: remove memstats.heap_idleMichael Anthony Knyszek
This statistic is updated in many places but for MemStats may be computed from existing statistics. Specifically by definition heap_idle = heap_sys - heap_inuse since heap_sys is all memory allocated from the OS for use in the heap minus memory used for non-heap purposes. heap_idle is almost the same (since it explicitly includes memory that *could* be used for non-heap purposes) but also doesn't include memory that's actually used to hold heap objects. Although it has some utility as a sanity check, it complicates accounting and we want fewer, orthogonal statistics for upcoming metrics changes, so just drop it. Change-Id: I40af54a38e335f43249f6e218f35088bfd4380d1 Reviewed-on: https://go-review.googlesource.com/c/go/+/246974 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
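The identity stated above can be written as a one-line derivation. The sketch below uses a hypothetical helper `heapIdle` and compares its result against the public `runtime.MemStats` fields; whether the public fields satisfy the identity exactly on a given Go release is an assumption, so the program only prints the comparison.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapIdle is a hypothetical helper that derives the idle heap size
// from the two statistics it is defined in terms of.
func heapIdle(heapSys, heapInuse uint64) uint64 {
	return heapSys - heapInuse
}

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	derived := heapIdle(m.HeapSys, m.HeapInuse)
	fmt.Println("derived:", derived, "reported:", m.HeapIdle)
}
```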
2020-10-26runtime: break down memstats.gc_sysMichael Anthony Knyszek
This change breaks apart gc_sys into three distinct pieces. Two of those pieces are pieces which come from heap_sys since they're allocated from the page heap. The rest comes from memory mapped from e.g. persistentalloc which better fits the purpose of a sysMemStat. Also, rename gc_sys to gcMiscSys. Change-Id: I098789170052511e7b31edbcdc9a53e5c24573f7 Reviewed-on: https://go-review.googlesource.com/c/go/+/246973 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-26runtime: copy in MemStats fields explicitlyMichael Anthony Knyszek
Currently MemStats is populated via an unsafe memmove from memstats, but this places unnecessary structural restrictions on memstats, is annoying to reason about, and tightly couples the two. Instead, just populate the fields of MemStats explicitly. Change-Id: I96f6a64326b1a91d4084e7b30169a4bbe6a331f9 Reviewed-on: https://go-review.googlesource.com/c/go/+/246972 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
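To illustrate the design choice, here is a minimal sketch of explicit field-by-field population. The types `internalStats` and `PublicStats`, their fields, and the function `publish` are hypothetical stand-ins, not the runtime's actual structures.

```go
package main

import "fmt"

// internalStats is a hypothetical internal representation. Its layout
// is free to differ from the public type.
type internalStats struct {
	gcCycles  uint32
	heapInUse uint64
	heapFree  uint64
}

// PublicStats is a hypothetical public view of the statistics.
type PublicStats struct {
	HeapInuse uint64
	HeapIdle  uint64
	NumGC     uint32
}

// publish copies each field explicitly, so field order, field types,
// and derived values no longer have to match between the two structs.
func publish(in *internalStats) PublicStats {
	return PublicStats{
		HeapInuse: in.heapInUse,
		HeapIdle:  in.heapFree,
		NumGC:     in.gcCycles,
	}
}

func main() {
	in := internalStats{gcCycles: 4, heapInUse: 1 << 20, heapFree: 1 << 18}
	fmt.Printf("%+v\n", publish(&in))
}
```

With a raw memory copy, reordering or retyping a single internal field would silently corrupt the public view; with explicit assignment the compiler checks each field.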
2020-10-26runtime: delineate which memstats are system stats with a typeMichael Anthony Knyszek
This change modifies the type of several mstats fields to be a new type: sysMemStat. This type has the same structure as the fields used to have. The purpose of this change is to make it very clear which stats may be used in various functions for accounting (usually the platform-specific sys* functions, but there are others). Currently there's an implicit understanding that the *uint64 value passed to these functions is some kind of statistic whose value is atomically managed. This understanding isn't inherently problematic, but we're about to change how some stats (which currently use mSysStatInc and mSysStatDec) work, so we want to make it very clear what the various requirements are around "sysStat". This change also removes mSysStatInc and mSysStatDec in favor of a method on sysMemStat. Note that those two functions were originally written the way they were because atomic 64-bit adds required a valid G on ARM, but this hasn't been the case for a very long time (since golang.org/cl/14204, but even before then it wasn't clear if mutexes required a valid G anymore). Today we implement 64-bit adds on ARM with a spinlock table. Change-Id: I4e9b37cf14afc2ae20cf736e874eb0064af086d7 Reviewed-on: https://go-review.googlesource.com/c/go/+/246971 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
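The idea of a dedicated type with methods replacing free functions can be sketched outside the runtime as follows. This is an illustration using sync/atomic, not the runtime's code: the overflow check, the panic message, and the helper `recordMapped` are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// sysMemStat is a statistic that is only ever accessed atomically.
// Using a named type makes that requirement visible in signatures.
type sysMemStat uint64

// load reads the current value atomically.
func (s *sysMemStat) load() uint64 {
	return atomic.LoadUint64((*uint64)(s))
}

// add atomically adds n, which may be negative, and panics if the
// value wraps around (an assumption of this sketch).
func (s *sysMemStat) add(n int64) {
	val := atomic.AddUint64((*uint64)(s), uint64(n))
	if (n > 0 && int64(val) < n) || (n < 0 && int64(val)+n < n) {
		panic("sysMemStat overflow")
	}
}

// recordMapped is a hypothetical accounting function. Taking a
// *sysMemStat instead of a *uint64 documents what kind of value
// the caller is expected to pass.
func recordMapped(bytes uintptr, stat *sysMemStat) {
	stat.add(int64(bytes))
}

func main() {
	var heapSys sysMemStat
	recordMapped(4096, &heapSys)
	heapSys.add(-1024)
	fmt.Println(heapSys.load())
}
```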
2020-10-26runtime: rename mcache fields to match Go styleMichael Anthony Knyszek
This change renames a bunch of malloc statistics stored in the mcache that are all named with the "local_" prefix. It also renames largeAlloc to allocLarge to prevent a naming conflict, and next_sample because it would be the last mcache field with the old C naming style. Change-Id: I29695cb83b397a435ede7e9ad5c3c9be72767ea3 Reviewed-on: https://go-review.googlesource.com/c/go/+/246969 Trust: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-26runtime: flush local_scan directly and more oftenMichael Anthony Knyszek
Now that local_scan is the last mcache-based statistic that is flushed by purgecachedstats, and heap_scan and gcController.revise may be interacted with concurrently, we don't need to flush heap_scan at arbitrary locations where the heap is locked, and we don't need purgecachedstats and cachestats anymore. Instead, we can flush local_scan at the same time we update heap_live in refill, so the two updates may share the same revise call. Clean up unused functions, remove code that would cause the heap to get locked in the allocSpan when it didn't need to (other than to flush local_scan), and flush local_scan explicitly in a few important places. Notably we need to flush local_scan whenever we flush the other stats, but it doesn't need to be donated anywhere, so have releaseAll do the flushing. Also, we need to flush local_scan before we set heap_scan at the end of a GC, which was previously handled by cachestats. Just do so explicitly -- it's not much code and it becomes a lot more clear why we need to do so. Change-Id: I35ac081784df7744d515479896a41d530653692d Reviewed-on: https://go-review.googlesource.com/c/go/+/246968 Run-TryBot: Michael Knyszek <mknyszek@google.com> Trust: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>