| author | Michael Pratt <mpratt@google.com> | 2023-10-10 15:28:32 -0400 |
|---|---|---|
| committer | Gopher Robot <gobot@golang.org> | 2023-11-15 16:49:45 +0000 |
| commit | 6ef98ac87c8a4185c0bace496d84cb3b68f069e3 (patch) | |
| tree | 055a940a2beb55ca09c411bdf9accb49f054f7aa /src/runtime/metrics/doc.go | |
| parent | a0df23888fb30c82d8c54c24212442bf56211769 (diff) | |
runtime/metrics: add STW stopping and total time metrics
This CL adds four new time histogram metrics:
/sched/pauses/stopping/gc:seconds
/sched/pauses/stopping/other:seconds
/sched/pauses/total/gc:seconds
/sched/pauses/total/other:seconds
The "stopping" metrics measure the time taken to start a stop-the-world
pause, i.e., how long it takes stopTheWorldWithSema to stop all Ps.
This can be used to detect STW struggling to preempt Ps.
The "total" metrics measure the total duration of a stop-the-world
pause, from starting to stop-the-world until the world is started again.
This includes the time spent in the "stopping" phase.
The "gc" metrics are used for GC-related STW pauses. The "other" metrics
are used for all other STW pauses.
All of these metrics start timing in stopTheWorldWithSema only after
successfully acquiring sched.lock, thus excluding lock contention on
sched.lock. The reasoning behind this is that while waiting on
sched.lock the world is not stopped at all (all other Ps can run), so
the impact of this contention is primarily limited to the goroutine
attempting to stop-the-world. Additionally, we already have some
visibility into sched.lock contention via contention profiles (#57071).
/sched/pauses/total/gc:seconds is conceptually equivalent to
/gc/pauses:seconds, so the latter is marked as deprecated and returns
the same histogram as the former.
In the implementation, there are a few minor differences:
* For both mark and sweep termination stops, /gc/pauses:seconds started
timing prior to calling stopTheWorldWithSema, thus including lock
contention.
These details are minor enough that I do not believe the slight change
in reporting will matter. For mark termination stops, moving the end of
timing into startTheWorldWithSema does have the side effect of requiring
other GC metric calculations to move outside of the STW, as they depend
on the same end time.
Fixes #63340
Change-Id: Iacd0bab11bedab85d3dcfb982361413a7d9c0d05
Reviewed-on: https://go-review.googlesource.com/c/go/+/534161
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Diffstat (limited to 'src/runtime/metrics/doc.go')
| -rw-r--r-- | src/runtime/metrics/doc.go | 43 |
1 files changed, 37 insertions, 6 deletions
diff --git a/src/runtime/metrics/doc.go b/src/runtime/metrics/doc.go
index 78b2e6c3bc..5be6c32bfa 100644
--- a/src/runtime/metrics/doc.go
+++ b/src/runtime/metrics/doc.go
@@ -83,10 +83,10 @@ Below is the full list of supported metrics, ordered lexicographically.
 		the GC. Even if only one thread is running during the pause,
 		this is computed as GOMAXPROCS times the pause latency because
 		nothing else can be executing. This is the exact sum of samples
-		in /gc/pause:seconds if each sample is multiplied by GOMAXPROCS
-		at the time it is taken. This metric is an overestimate,
-		and not directly comparable to system CPU time measurements.
-		Compare only with other /cpu/classes metrics.
+		in /sched/pauses/total/gc:seconds if each sample is multiplied
+		by GOMAXPROCS at the time it is taken. This metric is an
+		overestimate, and not directly comparable to system CPU time
+		measurements. Compare only with other /cpu/classes metrics.
 
 	/cpu/classes/gc/total:cpu-seconds
 		Estimated total CPU time spent performing GC tasks. This metric
@@ -211,8 +211,7 @@ Below is the full list of supported metrics, ordered lexicographically.
 		1, so a value of 0 indicates that it was never enabled.
 
 	/gc/pauses:seconds
-		Distribution of individual GC-related stop-the-world pause
-		latencies. Bucket counts increase monotonically.
+		Deprecated. Prefer the identical /sched/pauses/total/gc:seconds.
 
 	/gc/scan/globals:bytes
 		The total amount of global variable space that is scannable.
@@ -411,6 +410,38 @@ Below is the full list of supported metrics, ordered lexicographically.
 		in a runnable state before actually running. Bucket counts
 		increase monotonically.
 
+	/sched/pauses/stopping/gc:seconds
+		Distribution of individual GC-related stop-the-world stopping
+		latencies. This is the time it takes from deciding to stop the
+		world until all Ps are stopped. This is a subset of the total
+		GC-related stop-the-world time (/sched/pauses/total/gc:seconds).
+		During this time, some threads may be executing. Bucket counts
+		increase monotonically.
+
+	/sched/pauses/stopping/other:seconds
+		Distribution of individual non-GC-related stop-the-world
+		stopping latencies. This is the time it takes from deciding
+		to stop the world until all Ps are stopped. This is a
+		subset of the total non-GC-related stop-the-world time
+		(/sched/pauses/total/other:seconds). During this time, some
+		threads may be executing. Bucket counts increase monotonically.
+
+	/sched/pauses/total/gc:seconds
+		Distribution of individual GC-related stop-the-world pause
+		latencies. This is the time from deciding to stop the world
+		until the world is started again. Some of this time is spent
+		getting all threads to stop (this is measured directly in
+		/sched/pauses/stopping/gc:seconds), during which some threads
+		may still be running. Bucket counts increase monotonically.
+
+	/sched/pauses/total/other:seconds
+		Distribution of individual non-GC-related stop-the-world
+		pause latencies. This is the time from deciding to stop the
+		world until the world is started again. Some of this time
+		is spent getting all threads to stop (measured directly in
+		/sched/pauses/stopping/other:seconds). Bucket counts increase
+		monotonically.
+
 	/sync/mutex/wait/total:seconds
 		Approximate cumulative time goroutines have spent blocked on a
 		sync.Mutex or sync.RWMutex. This metric is useful for
