aboutsummaryrefslogtreecommitdiff
path: root/src/runtime/metrics_test.go
AgeCommit message (Collapse)Author
2025-12-08runtime: only run TestNotInGoMetricCallback when building with cgoMichael Anthony Knyszek
It seems like some platforms don't build with cgo at all, like linux/ppc64. Change-Id: Ibe5306e79899822e77d42cb3018f9f0c9ca2604c Reviewed-on: https://go-review.googlesource.com/c/go/+/728140 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2025-12-08runtime: disable TestNotInGoMetricCallback on FreeBSD + raceMichael Anthony Knyszek
cgo + race is not supported on FreeBSD. Change-Id: I38abeccaaabfcc104d1d5a077fb99646dc4be792 Reviewed-on: https://go-review.googlesource.com/c/go/+/728120 Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-12-05runtime: don't count nGsyscallNoP for extra Ms in CMichael Anthony Knyszek
In #76435, it turns out that the new metric /sched/goroutines/not-in-go:goroutines counts C threads that have called into Go before (on Linux) as not-in-go goroutines. The reason for this is that the M is still attached to the C thread on Linux as an optimization, so we don't go through all the trouble of detaching the M and, of course, decrementing nGsyscallNoP. There's an easy fix to this accounting issue. The flag on the M, isExtraInC, says whether a thread with an extra M attached no longer has any Go on its (logical) stack. When we take the P from an M in this state, we simply just don't increment nGsyscallNoP. When it calls back into Go, we similarly skip the decrement to nGsyscallNoP. This is more efficient than alternatives, like always updating nGsyscallNoP in cgocallbackg, since that would add a new read-modify-write atomic onto that fast path. It does mean we count threads in C with a P still attached as not-in-go, but this transient in most real programs, assuming the thread indeed does not call back into Go any time soon. Fixes #76435. Change-Id: Id05563bacbe35d3fae17d67fb5ed45fa43fa0548 Reviewed-on: https://go-review.googlesource.com/c/go/+/726964 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-11-11std,cmd: go fix -any std cmdAlan Donovan
This change mechanically replaces all occurrences of interface{} by 'any' (where deemed safe by the 'any' modernizer) throughout std and cmd, minus their vendor trees. Since this fix is relatively numerous, it gets its own CL. Also, 'go generate go/types'. Change-Id: I14a6b52856c3291c1d27935409bca8d5fd4242a2 Reviewed-on: https://go-review.googlesource.com/c/go/+/719702 Commit-Queue: Alan Donovan <adonovan@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> Auto-Submit: Alan Donovan <adonovan@google.com>
2025-09-26runtime: move TestReadMetricsSched to testprogMichael Anthony Knyszek
There are just too many flakes resulting from background pollution by the testing package and other tests. Run in a subprocess where at least the environment can be more tightly controlled. Fixes #75049. Change-Id: Iad59edaaf31268f1fcb77273f01317d963708fa6 Reviewed-on: https://go-review.googlesource.com/c/go/+/707155 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2025-09-22runtime: don't re-read metrics before check in TestReadMetricsSchedMichael Anthony Knyszek
The two remaining popular flakes in TestReadMetricsSched are with waiting and not-in-go goroutines. I believe the reason for both of these is pollution due to other goroutines in the test binary, and we can be a little bit more robust to them. In particular, both of these tests spin until there's a particular condition met in the metrics. Then they both immediately re-read the metrics. The metrics being checked are not guaranteed to stay that way in the re-read. For example, for the waiting case, it's possible for >1000 to be waiting, but then some leftover goroutines from a previous test wake up off of some condition, causing the number to dip again on the re-read. This is supported by the fact that in some of these failures, the test takes very little time to execute (the condition that there are at least 1000 waiting goroutines is almost immediately satisfied) but it still fails (the re-read causes us to notice that there are <1000 waiting goroutines). This change makes the test more robust by not performing this re-read. This means more possible false negatives, but the error cases we're looking for (bad metrics) should still show up with some reasonable probability when something *is* truly wrong. For #75049. Change-Id: Idc3a8030bed8da51f24322efe15f3ff13da05623 Reviewed-on: https://go-review.googlesource.com/c/go/+/705875 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-09-10runtime: don't artificially limit TestReadMetricsSchedqmuntal
TestReadMetricsSched/running can take some time to enter in steady state on busy systems. We currently only allow 1 second for that, we should let it run unlimitedly until success or the test time's out. Fixes #75049 Change-Id: I452059e1837caf12a2d2d9cae1f70a0ef2d4f518 Reviewed-on: https://go-review.googlesource.com/c/go/+/702295 Auto-Submit: Damien Neil <dneil@google.com> Reviewed-by: Damien Neil <dneil@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-08-15runtime: don't overwrite global stop channel in testsMichael Anthony Knyszek
This is used by TestStopTheWorldDeadlock, and the new TestReadMetricsSched test accidentally modifies this global variable instead of (as intended) defining a local one. Change-Id: I7aaece83f285d051ad8b56b7591c76613c39613a Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest Reviewed-on: https://go-review.googlesource.com/c/go/+/696556 Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Bypass: Michael Knyszek <mknyszek@google.com>
2025-08-15runtime/metrics: add metric for current Go-owned thread countMichael Anthony Knyszek
Fixes #15490. Change-Id: I6ce9edc46398030ff639e22d4ca4adebccdfe1b7 Reviewed-on: https://go-review.googlesource.com/c/go/+/690399 Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-08-15runtime/metrics: add metric for total goroutines createdMichael Anthony Knyszek
For #15490. Change-Id: Ic587dda1f42d613ea131a6b53ce6ba6e6cadf4c7 Reviewed-on: https://go-review.googlesource.com/c/go/+/690398 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-08-15runtime/metrics: add metrics for goroutine sched statesMichael Anthony Knyszek
This is largely a port of CL 38180. For #15490. Change-Id: I2726111e472e81e9f9f0f294df97872c2689f061 Reviewed-on: https://go-review.googlesource.com/c/go/+/690397 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-07-30runtime/metrics: add cleanup and finalizer queue metricsMichael Anthony Knyszek
These metrics are useful for identifying finalizer and cleanup problems, namely slow finalizers and/or cleanups holding up the queue, which can lead to a memory leak. Fixes #72948. Change-Id: I1bb64a9ca751fcb462c96d986d0346e0c2894c95 Reviewed-on: https://go-review.googlesource.com/c/go/+/690396 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2025-05-07runtime: remove GODEBUG=runtimecontentionstacksRhys Hiltner
Go 1.22 promised to remove the setting in a future release once the semantics of runtime-internal lock contention matched that of sync.Mutex. That work is done, remove the setting. Previously reviewed as https://go.dev/cl/585639. For #66999 Change-Id: I9fe62558ba0ac12824874a0bb1b41efeb7c0853f Reviewed-on: https://go-review.googlesource.com/c/go/+/668995 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-05-07runtime: verify attribution of mutex delayRhys Hiltner
Have the test use the same clock (cputicks) as the profiler, and use the test's own measurements as hard bounds on the magnitude to expect in the profile. Compare the depiction of two users of the same lock: one where the critical section is fast, one where it is slow. Confirm that the profile shows the slow critical section as a large source of delay (with #66999 fixed), rather than showing the fast critical section as a large recipient of delay. Previously reviewed as https://go.dev/cl/586237. For #66999 Change-Id: Ic2d78cc29153d5322577d84abdc448e95ed8f594 Reviewed-on: https://go-review.googlesource.com/c/go/+/667616 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
2025-03-15runtime: log profile when mutex profile test failsRhys Hiltner
For #70602 Change-Id: I3f723ebc17ef690d5be7f4f948c9dd1f890196fd Reviewed-on: https://go-review.googlesource.com/c/go/+/658095 Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2024-07-25runtime: use slices and maps to clean up testsapocelipes
Replace reflect.DeepEqual with slices.Equal/maps.Equal, which is much faster. Also remove some unecessary helper functions. Change-Id: I3e4fa2938fed1598278c9e556cd4fa3b9ed3ad6d GitHub-Last-Rev: 69bb43fc6e5c4a4a7d028528fe00b43db784464e GitHub-Pull-Request: golang/go#67603 Reviewed-on: https://go-review.googlesource.com/c/go/+/587815 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
2024-05-30Revert "runtime: remove GODEBUG=runtimecontentionstacks"Rhys Hiltner
This reverts commit 87e930f7289136fad1310d4b63dd4127e409bac5 (CL 585639) Reason for revert: This is part of a patch series that changed the handling of contended lock2/unlock2 calls, reducing the maximum throughput of contended runtime.mutex values, and causing a performance regression on applications where that is (or became) the bottleneck. Updates #66999 Updates #67585 Change-Id: I1e286d2a16d16e4af202cd5dc04b2d9c4ee71b32 Reviewed-on: https://go-review.googlesource.com/c/go/+/589097 Reviewed-by: Than McIntosh <thanm@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
2024-05-30Revert "runtime: improve runtime-internal mutex profile tests"Rhys Hiltner
This reverts commit f9ba2cff2286d378eca28c841bea8488c69fc30e (CL 586237) Reason for revert: This is part of a patch series that changed the handling of contended lock2/unlock2 calls, reducing the maximum throughput of contended runtime.mutex values, and causing a performance regression on applications where that is (or became) the bottleneck. This test verifies that the semantics of the mutex profile for runtime.mutex values matches that of sync.Mutex values. Without the rest of the patch series, this test would correctly identify that Go 1.22's semantics are incorrect (issue #66999). Updates #66999 Updates #67585 Change-Id: Id06ae01d7bc91c94054c80d273e6530cb2d59d10 Reviewed-on: https://go-review.googlesource.com/c/go/+/589096 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Than McIntosh <thanm@google.com> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-22runtime: lower mutex contention test expectationsRhys Hiltner
As of https://go.dev/cl/586796, the runtime/metrics view of internal mutex contention is sampled at 1 per gTrackingPeriod, rather than either 1 (immediately prior to CL 586796) or the more frequent of gTrackingPeriod or the mutex profiling rate (Go 1.22). Thus, we no longer have a real lower bound on the amount of contention that runtime/metrics will report. Relax the test's expectations again. For #64253 Change-Id: I94e1d92348a03599a819ec8ac785a0eb3c1ddd73 Reviewed-on: https://go-review.googlesource.com/c/go/+/587515 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
2024-05-21runtime: improve runtime-internal mutex profile testsRhys Hiltner
Have the test use the same clock (cputicks) as the profiler, and use the test's own measurements as hard bounds on the magnitude to expect in the profile. Compare the depiction of two users of the same lock: one where the critical section is fast, one where it is slow. Confirm that the profile shows the slow critical section as a large source of delay (with #66999 fixed), rather than showing the fast critical section as a large recipient of delay. For #64253 For #66999 Change-Id: I784c8beedc39de564dc8cee42060a5d5ce55de39 Reviewed-on: https://go-review.googlesource.com/c/go/+/586237 Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2024-05-21runtime: remove GODEBUG=runtimecontentionstacksRhys Hiltner
Go 1.22 promised to remove the setting in a future release once the semantics of runtime-internal lock contention matched that of sync.Mutex. That work is done, remove the setting. For #66999 Change-Id: I3c4894148385adf2756d8754e44d7317305ad758 Reviewed-on: https://go-review.googlesource.com/c/go/+/585639 Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2024-05-08runtime: update large object stats before freeSpan in sweepMichael Anthony Knyszek
Currently freeSpan is called before large object stats are updated when sweeping large objects. This means heapStats.inHeap might get subtracted before the large object is added to the largeFree field. The end result is that the /memory/classes/heap/unused:bytes metric, which subtracts live objects (alloc-free) from inHeap may overflow. Fix this by always updating the large object stats before calling freeSpan. Fixes #67019. Change-Id: Ib02bd8dcd1cf8cd1bc0110b6141e74f678c10445 Reviewed-on: https://go-review.googlesource.com/c/go/+/583380 Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2024-04-24runtime: test mutex contention stacks and countsRhys Hiltner
Fully testing the runtime's profiles and metrics for contention on its internal mutex values involves comparing two separate clocks (cputicks for the profile and nanotime for the metric), verifying its fractional sampling (when MutexProfileRate is greater than 1), and observing a very small critical section outside of the test's control (semrelease). Flakiness (#64253) from those parts of the test have led to skipping it entirely. But there are portions of the mutex profiling behavior that should have more consistent behavior: for a mutex under the test's control, the test and the runtime should be able to agree that the test successfully induced contention, and should agree on the call stack that caused the contention. Allow those more consistent parts to run. For #64253 Change-Id: I7f368d3265a5c003da2765164276fab616eb9959 Reviewed-on: https://go-review.googlesource.com/c/go/+/581296 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Joedian Reid <joedian@google.com> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
2024-04-08runtime: move GC pause time CPU metrics update into the STWMichael Anthony Knyszek
This change fixes a possible race with updating metrics and reading them. The update is intended to be protected by the world being stopped, but here, it clearly isn't. Fixing this lets us lower the thresholds in the metrics tests by an order of magnitude, because the only thing we have to worry about now is floating point error (the tests were previously written assuming the floating point error was much higher than it actually was; that turns out not to be the case, and this bug was the problem instead). However, this still isn't that tight of a bound; we still want to catch any and all problems of exactness. For this purpose, this CL adds a test to check the source-of-truth (in uint64 nanoseconds) that ensures the totals exactly match. This means we unfortunately have to take another time measurement, but for now let's prioritize correctness. A few additional nanoseconds of STW time won't be terribly noticable. Fixes #66212. Change-Id: Id02c66e8a43c13b1f70e9b268b8a84cc72293bfd Reviewed-on: https://go-review.googlesource.com/c/go/+/570257 Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Nicolas Hillegeer <aktau@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-12-18runtime: skip TestRuntimeLockMetricsAndProfile for flakinessMichael Anthony Knyszek
This test was added to check new mutex profile functionality. Specifically, it checks to make sure that the functionality behind GODEBUG=runtimecontentionstacks works. The runtime currently tracks contention from runtime-internal mutexes in mutex profiles, but it does not record stack traces for them, attributing the time to a dummy symbol. This GODEBUG enables collecting stacks. Just disable the test. Even if this functionality breaks, it won't affect Go users and it'll help keep the builders green. It's fine to leave the test because this will be revisited in the next dev cycle. For #64253. Change-Id: I7938fe0f036fc4e4a0764f030e691e312ec2c9b5 Reviewed-on: https://go-review.googlesource.com/c/go/+/550775 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Eli Bendersky <eliben@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2023-12-06runtime: rename GODEBUG=profileruntimelocks to runtimecontentionstacksMichael Pratt
profileruntimelocks is new in CL 544195, but the name is deceptive. Even with profileruntimelocks=0, runtime-internal locks are still profiled. The actual difference is that call stacks are not collected. Instead all contention is reported at runtime._LostContendedLock. Rename this setting to runtimecontentionstacks to make its name more aligned with its behavior. In addition, for this release the default is profileruntimelocks=0, meaning that users are fairly likely to encounter runtime._LostContendedLock. Rename it to runtime._LostContendedRuntimeLock in an attempt to make it more intuitive that these are runtime locks, not locks in application code. For #57071. Change-Id: I38aac28b2c0852db643d53b1eab3f3bc42a43393 Reviewed-on: https://go-review.googlesource.com/c/go/+/547055 Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Rhys Hiltner <rhys@justin.tv>
2023-11-28runtime: test for contention in both semaphore pathsRhys Hiltner
Most contention on the runtime locks inside semaphores is observed in runtime.semrelease1, but it can also appear in runtime.semacquire1. When examining contention profiles in TestRuntimeLockMetricsAndProfile, allow call stacks that include either. For #64253 Change-Id: Id4f16af5e9a28615ab5032a3197e8df90f7e382f Reviewed-on: https://go-review.googlesource.com/c/go/+/544375 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Auto-Submit: Rhys Hiltner <rhys@justin.tv> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-11-21runtime: profile contended lock callsRhys Hiltner
Add runtime-internal locks to the mutex contention profile. Store up to one call stack responsible for lock contention on the M, until it's safe to contribute its value to the mprof table. Try to use that limited local storage space for a relatively large source of contention, and attribute any contention in stacks we're not able to store to a sentinel _LostContendedLock function. Avoid ballooning lock contention while manipulating the mprof table by attributing to that sentinel function any lock contention experienced while reporting lock contention. Guard collecting real call stacks with GODEBUG=profileruntimelocks=1, since the available data has mixed semantics; we can easily capture an M's own wait time, but we'd prefer for the profile entry of each critical section to describe how long it made the other Ms wait. It's too late in the Go 1.22 cycle to make the required changes to futex-based locks. When not enabled, attribute the time to the sentinel function instead. Fixes #57071 This is a roll-forward of https://go.dev/cl/528657, which was reverted in https://go.dev/cl/543660 Reason for revert: de-flakes tests (reduces dependence on fine-grained timers, correctly identifies contention on big-endian futex locks, attempts to measure contention in the semaphore implementation but only uses that secondary measurement to finish the test early, skips tests on single-processor systems) Change-Id: I31389f24283d85e46ad9ba8d4f514cb9add8dfb0 Reviewed-on: https://go-review.googlesource.com/c/go/+/544195 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com> Auto-Submit: Rhys Hiltner <rhys@justin.tv> Run-TryBot: Rhys Hiltner <rhys@justin.tv>
2023-11-20src: a/an grammar fixesVille Skyttä
Change-Id: I179b50ae8e73677d4d408b83424afbbfe6aa17a1 GitHub-Last-Rev: 2e2d9c1e45556155d02db4df381b99f2d1bc5c0e GitHub-Pull-Request: golang/go#63478 Reviewed-on: https://go-review.googlesource.com/c/go/+/534015 Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-11-20Revert "runtime: profile contended lock calls"Matthew Dempsky
This reverts commit go.dev/cl/528657. Reason for revert: broke a lot of builders. Change-Id: I70c33062020e997c4df67b3eaa2e886cf0da961e Reviewed-on: https://go-review.googlesource.com/c/go/+/543660 Reviewed-by: Than McIntosh <thanm@google.com> Auto-Submit: Matthew Dempsky <mdempsky@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-17runtime: profile contended lock callsRhys Hiltner
Add runtime-internal locks to the mutex contention profile. Store up to one call stack responsible for lock contention on the M, until it's safe to contribute its value to the mprof table. Try to use that limited local storage space for a relatively large source of contention, and attribute any contention in stacks we're not able to store to a sentinel _LostContendedLock function. Avoid ballooning lock contention while manipulating the mprof table by attributing to that sentinel function any lock contention experienced while reporting lock contention. Guard collecting real call stacks with GODEBUG=profileruntimelocks=1, since the available data has mixed semantics; we can easily capture an M's own wait time, but we'd prefer for the profile entry of each critical section to describe how long it made the other Ms wait. It's too late in the Go 1.22 cycle to make the required changes to futex-based locks. When not enabled, attribute the time to the sentinel function instead. Fixes #57071 Change-Id: I3eee0ccbfc20f333b56f20d8725dfd7f3a526b41 Reviewed-on: https://go-review.googlesource.com/c/go/+/528657 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Rhys Hiltner <rhys@justin.tv> Reviewed-by: Than McIntosh <thanm@google.com>
2023-11-16runtime: disable automatic GC for STW metric testsRhys Hiltner
A follow-up to https://go.dev/cl/534161 -- calls to runtime/trace.Start and Stop synchronize with the GC, waiting for any in-progress mark phase to complete. Disable automatic GCs to quiet the system, so we can observe only the test's intentional pauses. Change-Id: I6f8106c42528f9bda9afec1c151119783bbc78dc Reviewed-on: https://go-review.googlesource.com/c/go/+/543075 Run-TryBot: Rhys Hiltner <rhys@justin.tv> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Rhys Hiltner <rhys@justin.tv> Reviewed-by: Bryan Mills <bcmills@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-15runtime/metrics: add STW stopping and total time metricsMichael Pratt
This CL adds four new time histogram metrics: /sched/pauses/stopping/gc:seconds /sched/pauses/stopping/other:seconds /sched/pauses/total/gc:seconds /sched/pauses/total/other:seconds The "stopping" metrics measure the time taken to start a stop-the-world pause. i.e., how long it takes stopTheWorldWithSema to stop all Ps. This can be used to detect STW struggling to preempt Ps. The "total" metrics measure the total duration of a stop-the-world pause, from starting to stop-the-world until the world is started again. This includes the time spent in the "start" phase. The "gc" metrics are used for GC-related STW pauses. The "other" metrics are used for all other STW pauses. All of these metrics start timing in stopTheWorldWithSema only after successfully acquiring sched.lock, thus excluding lock contention on sched.lock. The reasoning behind this is that while waiting on sched.lock the world is not stopped at all (all other Ps can run), so the impact of this contention is primarily limited to the goroutine attempting to stop-the-world. Additionally, we already have some visibility into sched.lock contention via contention profiles (#57071). /sched/pauses/total/gc:seconds is conceptually equivalent to /gc/pauses:seconds, so the latter is marked as deprecated and returns the same histogram as the former. In the implementation, there are a few minor differences: * For both mark and sweep termination stops, /gc/pauses:seconds started timing prior to calling startTheWorldWithSema, thus including lock contention. These details are minor enough, that I do not believe the slight change in reporting will matter. For mark termination stops, moving timing stop into startTheWorldWithSema does have the side effect of requiring moving other GC metric calculations outside of the STW, as they depend on the same end time. Fixes #63340 Change-Id: Iacd0bab11bedab85d3dcfb982361413a7d9c0d05 Reviewed-on: https://go-review.googlesource.com/c/go/+/534161 Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-09-06runtime/metrics: fix /gc/scan/* metricsNayef Ghattas
In the existing implementation, all /gc/scan/* metrics are always equal to 0 due to the dependency on gcStatDep not being set. This leads to gcStatAggregate always containing zeros, and always reporting 0 for those metrics. Also, add a test to ensure that /gc/scan/* metrics are not empty. Fixes #62477. Change-Id: I67497347d50ed5c3ce1719a18714c062ec938cab Reviewed-on: https://go-review.googlesource.com/c/go/+/525595 Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
2023-06-08runtime: apply looser bound to /gc/heap/live:bytes in testMichael Anthony Knyszek
/gc/heap/live:bytes may exceed MemStats.HeapAlloc, even when all data is flushed, becuase the GC may double-count objects when marking them. This is an intentional design choice that is largely inconsequential. The runtime is already robust to it, and the condition is rare. Fixes #60607. Change-Id: I4da402efc24327328d2d8780e4e49961b189f0ea Reviewed-on: https://go-review.googlesource.com/c/go/+/501858 Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-31runtime: make TestCPUMetricsSleep even more lenientMichael Anthony Knyszek
This test was introduced as a regression test for #60276. However, it was quite flaky on a number of different platforms because there are myriad ways the runtime can eat into time one might expect is completely idle. This change re-enables the test, but makes it much more resilient. Because the issue we're testing for is persistent, we now require 10 consecutive failures to count. Any single success counts as a test success. This change also makes the test's idle time bound more lenient, allowing for a little bit of time to be eaten up. The regression we're testing for results in nearly zero idle time being accounted for. If this is still not good enough to eliminate flakes, this test should just be deleted. For #60276. Fixes #60376. Change-Id: Icd81f0c9970821b7f386f6d27c8a566fee4d0ff7 Reviewed-on: https://go-review.googlesource.com/c/go/+/498274 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>
2023-05-24runtime: ensure consistency of /gc/scan/*Michael Anthony Knyszek
Currently /gc/scan/total:bytes is computed as a separate sum. Compute it using the same inputs so it's always consistent with the sum of everything else in /gc/scan/*. For #56857. Change-Id: I43d9148a23b1d2eb948ae990193dca1da85df8a3 Reviewed-on: https://go-review.googlesource.com/c/go/+/497880 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2023-05-24runtime: skip TestCPUMetricsSleep as flakyMichael Anthony Knyszek
This test is fundamentally flaky because of a mismatch between how internal idle time is calculated and how the test expects it to be calculated. It's unclear how to resolve this mismatch, given that it's perfectly valid for a goroutine to remain asleep while background goroutines (e.g. the scavenger) run. In practice, we might be able to set some generous lower-bound, but until we can confirm that on the affected platforms, skip the test as flaky unconditionally. For #60276. For #60376. Change-Id: Iffd5c4be10cf8ae8a6c285b61fcc9173235fbb2a Reviewed-on: https://go-review.googlesource.com/c/go/+/497876 Run-TryBot: Michael Knyszek <mknyszek@google.com> Reviewed-by: Bryan Mills <bcmills@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-23runtime: fix usage of stale "now" value for netpolling MsMichael Anthony Knyszek
Currently pidleget gets passed "now" from before the M goes into netpoll, resulting in incorrect accounting of idle CPU time. lastpoll is also stored with a stale "now": the mistake was added in the same CL it was added for pidleget. Recompute "now" after returning from netpoll. Also, start tracking idle time on js/wasm at all. Credit to Rhys Hiltner for the test case. Fixes #60276. Change-Id: I5dd677471f74c915dfcf3d01621430876c3ff307 Reviewed-on: https://go-review.googlesource.com/c/go/+/496183 Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Michael Knyszek <mknyszek@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-05-23runtime/metrics: add /gc/gogc:percentFelix Geisendörfer
For #56857 Change-Id: I7e7d2ea3e6ab59291a4cd867c680605ad75bd21f Reviewed-on: https://go-review.googlesource.com/c/go/+/497317 Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2023-05-23runtime/metrics: add /gc/gomemlimit:bytesFelix Geisendörfer
For #56857 Change-Id: I184d752cc615874ada3d0dbc6ed1bf72c8debd0f Reviewed-on: https://go-review.googlesource.com/c/go/+/497316 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: David Chase <drchase@google.com>
2023-05-23runtime/metrics: add /gc/heap/live:bytesFelix Geisendörfer
For #56857 Change-Id: I0622af974783ab435e91b9fb3c1ba43f256ee4ac Reviewed-on: https://go-review.googlesource.com/c/go/+/497315 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Michael Knyszek <mknyszek@google.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2022-10-02all: use time.Since instead of time.Now().Subhopehook
Change-Id: Ifaa73b64e5b6a1d37c753e2440b642478d7dfbce Reviewed-on: https://go-review.googlesource.com/c/go/+/436957 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: hopehook <hopehook@golangcn.org> Run-TryBot: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com>
2022-09-16runtime/metrics: add /sync/mutex/wait/total:seconds metricMichael Anthony Knyszek
This change adds a metric to the runtime/metrics package which tracks total mutex wait time for sync.Mutex and sync.RWMutex. The purpose of this metric is to be able to quickly get an idea of the total mutex wait time. The implementation of this metric piggybacks off of the existing G runnable tracking infrastructure, as well as the wait reason set on a G when it goes into _Gwaiting. Fixes #49881. Change-Id: I4691abf64ac3574bec69b4d7d4428b1573130517 Reviewed-on: https://go-review.googlesource.com/c/go/+/427618 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-09-16runtime/metrics: add CPU statsMichael Anthony Knyszek
This changes adds a breakdown for estimated CPU usage by time. These estimates are not based on real on-CPU counters, so each metric has a disclaimer explaining so. They can, however, be more reasonably compared to a total CPU time metric that this change also adds. Fixes #47216. Change-Id: I125006526be9f8e0d609200e193da5a78d9935be Reviewed-on: https://go-review.googlesource.com/c/go/+/404307 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Josh MacDonald <jmacd@lightstep.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-13runtime/metrics: add the number of Go-to-C callsMichael Anthony Knyszek
For #47216. Change-Id: I1c2cd518e6ff510cc3ac8d8f72fd52eadcabc16c Reviewed-on: https://go-review.googlesource.com/c/go/+/404306 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-05-13runtime/metrics: add gomaxprocs metricMichael Anthony Knyszek
For #47216. Change-Id: Ib2d48c4583570a2dae9510a52d4c6ffc20161b31 Reviewed-on: https://go-review.googlesource.com/c/go/+/404305 Reviewed-by: David Chase <drchase@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-04-26runtime: make alloc count metrics truly monotonicMichael Anthony Knyszek
Right now we export alloc count metrics via the runtime/metrics package and mark them as monotonic, but that's not actually true. As an optimization, the runtime assumes a span is always fully allocated before being uncached, and updates the accounting as such. In the rare case that it's wrong, the span has enough information to back out what did not get allocated. This change uses 16 bits of padding in the mspan to house another field that represents the amount of mspan slots filled just as the mspan is cached. This is information is enough to get an exact count, allowing us to make the metrics truly monotonic. Change-Id: Iaff3ca43f8745dc1bbb0232372423e014b89b920 Reviewed-on: https://go-review.googlesource.com/c/go/+/377516 Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2021-04-29runtime/metrics: add additional allocation metricsMichael Anthony Knyszek
This change adds four additional metrics to the runtime/metrics package to fill in a few gaps with runtime.MemStats that were overlooked. The biggest one is TotalAlloc, which is impossible to find with the runtime/metrics package, but also add a few others for convenience and clarity. For instance, the total number of objects allocated and freed are technically available via allocs-by-size and frees-by-size, but it's onerous to get them (one needs to sum the sample counts in the histograms). The four additional metrics are: - /gc/heap/allocs:bytes -- total bytes allocated (TotalAlloc) - /gc/heap/allocs:objects -- total objects allocated (Mallocs - [tiny]) - /gc/heap/frees:bytes -- total bytes frees (TotalAlloc-HeapAlloc) - /gc/heap/frees:objects -- total objects freed (Frees - [tiny]) This change also updates the descriptions of allocs-by-size and frees-by-size to be more precise. Change-Id: Iec8c1797a584491e3484b198f2e7f325b68954a7 Reviewed-on: https://go-review.googlesource.com/c/go/+/312431 Reviewed-by: Michael Pratt <mpratt@google.com> Trust: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org>
2021-04-27runtime/metrics: add tiny allocs metricMichael Anthony Knyszek
Currently tiny allocations are not represented in either MemStats or runtime/metrics, but they're represented in MemStats (indirectly) via Mallocs. Add them to runtime/metrics by first merging memstats.tinyallocs into consistentHeapStats (just for simplicity; it's monotonic so metrics would still be self-consistent if we just read it atomically) and then adding /gc/heap/tiny/allocs:objects to the list of supported metrics. Change-Id: Ie478006ab942a3e877b4a79065ffa43569722f3d Reviewed-on: https://go-review.googlesource.com/c/go/+/312909 Trust: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>