aboutsummaryrefslogtreecommitdiff
path: root/src/runtime/mprof.go
AgeCommit message (Collapse)Author
2024-01-15runtime: more godoc linksOlivier Mengué
Change-Id: I8fe66326994894b17ce0eda991bba942844d26b0 Reviewed-on: https://go-review.googlesource.com/c/go/+/541475 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-12-06runtime: rename GODEBUG=profileruntimelocks to runtimecontentionstacksMichael Pratt
profileruntimelocks is new in CL 544195, but the name is deceptive. Even with profileruntimelocks=0, runtime-internal locks are still profiled. The actual difference is that call stacks are not collected. Instead all contention is reported at runtime._LostContendedLock. Rename this setting to runtimecontentionstacks to make its name more aligned with its behavior. In addition, for this release the default is profileruntimelocks=0, meaning that users are fairly likely to encounter runtime._LostContendedLock. Rename it to runtime._LostContendedRuntimeLock in an attempt to make it more intuitive that these are runtime locks, not locks in application code. For #57071. Change-Id: I38aac28b2c0852db643d53b1eab3f3bc42a43393 Reviewed-on: https://go-review.googlesource.com/c/go/+/547055 Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Rhys Hiltner <rhys@justin.tv>
2023-12-05math/rand, math/rand/v2: use ChaCha8 for global randRuss Cox
Move ChaCha8 code into internal/chacha8rand and use it to implement runtime.rand, which is used for the unseeded global source for both math/rand and math/rand/v2. This also affects the calculation of the start point for iteration over very very large maps (when the 32-bit fastrand is not big enough). The benefit is that misuse of the global random number generators in math/rand and math/rand/v2 in contexts where non-predictable randomness is important for security reasons is no longer a security problem, removing a common mistake among programmers who are unaware of the different kinds of randomness. The cost is an extra 304 bytes per thread stored in the m struct plus 2-3ns more per random uint64 due to the more sophisticated algorithm. Using PCG looks like it would cost about the same, although I haven't benchmarked that. Before this, the math/rand and math/rand/v2 global generator was wyrand (https://github.com/wangyi-fudan/wyhash). For math/rand, using wyrand instead of the Mitchell/Reeds/Thompson ALFG was justifiable, since the latter was not any better. But for math/rand/v2, the global generator really should be at least as good as one of the well-studied, specific algorithms provided directly by the package, and it's not. (Wyrand is still reasonable for scheduling and cache decisions.) Good randomness does have a cost: about twice wyrand. Also rationalize the various runtime rand references. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ bbb48afeb7.amd64 │ 5cf807d1ea.amd64 │ │ sec/op │ sec/op vs base │ ChaCha8-32 1.862n ± 2% 1.861n ± 2% ~ (p=0.825 n=20) PCG_DXSM-32 1.471n ± 1% 1.460n ± 2% ~ (p=0.153 n=20) SourceUint64-32 1.636n ± 2% 1.582n ± 1% -3.30% (p=0.000 n=20) GlobalInt64-32 2.087n ± 1% 3.663n ± 1% +75.54% (p=0.000 n=20) GlobalInt64Parallel-32 0.1042n ± 1% 0.2026n ± 1% +94.48% (p=0.000 n=20) GlobalUint64-32 2.263n ± 2% 3.724n ± 1% +64.57% (p=0.000 n=20) GlobalUint64Parallel-32 0.1019n ± 1% 0.1973n ± 1% +93.67% (p=0.000 n=20) Int64-32 1.771n ± 1% 1.774n ± 1% ~ (p=0.449 n=20) Uint64-32 1.863n ± 2% 1.866n ± 1% ~ (p=0.364 n=20) GlobalIntN1000-32 3.134n ± 3% 4.730n ± 2% +50.95% (p=0.000 n=20) IntN1000-32 2.489n ± 1% 2.489n ± 1% ~ (p=0.683 n=20) Int64N1000-32 2.521n ± 1% 2.516n ± 1% ~ (p=0.394 n=20) Int64N1e8-32 2.479n ± 1% 2.478n ± 2% ~ (p=0.743 n=20) Int64N1e9-32 2.530n ± 2% 2.514n ± 2% ~ (p=0.193 n=20) Int64N2e9-32 2.501n ± 1% 2.494n ± 1% ~ (p=0.616 n=20) Int64N1e18-32 3.227n ± 1% 3.205n ± 1% ~ (p=0.101 n=20) Int64N2e18-32 3.647n ± 1% 3.599n ± 1% ~ (p=0.019 n=20) Int64N4e18-32 5.135n ± 1% 5.069n ± 2% ~ (p=0.034 n=20) Int32N1000-32 2.657n ± 1% 2.637n ± 1% ~ (p=0.180 n=20) Int32N1e8-32 2.636n ± 1% 2.636n ± 1% ~ (p=0.763 n=20) Int32N1e9-32 2.660n ± 2% 2.638n ± 1% ~ (p=0.358 n=20) Int32N2e9-32 2.662n ± 2% 2.618n ± 2% ~ (p=0.064 n=20) Float32-32 2.272n ± 2% 2.239n ± 2% ~ (p=0.194 n=20) Float64-32 2.272n ± 1% 2.286n ± 2% ~ (p=0.763 n=20) ExpFloat64-32 3.762n ± 1% 3.744n ± 1% ~ (p=0.171 n=20) NormFloat64-32 3.706n ± 1% 3.655n ± 2% ~ (p=0.066 n=20) Perm3-32 32.93n ± 3% 34.62n ± 1% +5.13% (p=0.000 n=20) Perm30-32 202.9n ± 1% 204.0n ± 1% ~ (p=0.482 n=20) Perm30ViaShuffle-32 115.0n ± 1% 114.9n ± 1% ~ (p=0.358 n=20) ShuffleOverhead-32 112.8n ± 1% 112.7n ± 1% ~ (p=0.692 n=20) Concurrent-32 2.107n ± 0% 3.725n ± 1% +76.75% (p=0.000 n=20) goos: darwin goarch: arm64 pkg: math/rand/v2 │ bbb48afeb7.arm64 │ 5cf807d1ea.arm64 │ │ sec/op │ sec/op vs base │ ChaCha8-8 2.480n ± 0% 2.429n ± 0% -2.04% (p=0.000 n=20) PCG_DXSM-8 2.531n ± 0% 2.530n ± 0% ~ (p=0.877 n=20) SourceUint64-8 2.534n ± 0% 2.533n ± 0% ~ (p=0.732 n=20) GlobalInt64-8 2.172n ± 1% 4.794n ± 0% +120.67% (p=0.000 n=20) GlobalInt64Parallel-8 0.4320n ± 0% 0.9605n ± 0% +122.32% (p=0.000 n=20) GlobalUint64-8 2.182n ± 0% 4.770n ± 0% +118.58% (p=0.000 n=20) GlobalUint64Parallel-8 0.4307n ± 0% 0.9583n ± 0% +122.51% (p=0.000 n=20) Int64-8 4.107n ± 0% 4.104n ± 0% ~ (p=0.416 n=20) Uint64-8 4.080n ± 0% 4.080n ± 0% ~ (p=0.052 n=20) GlobalIntN1000-8 2.814n ± 2% 5.643n ± 0% +100.50% (p=0.000 n=20) IntN1000-8 4.141n ± 0% 4.139n ± 0% ~ (p=0.140 n=20) Int64N1000-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.313 n=20) Int64N1e8-8 4.140n ± 0% 4.139n ± 0% ~ (p=0.103 n=20) Int64N1e9-8 4.139n ± 0% 4.140n ± 0% ~ (p=0.761 n=20) Int64N2e9-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.636 n=20) Int64N1e18-8 5.266n ± 0% 5.326n ± 1% +1.14% (p=0.001 n=20) Int64N2e18-8 6.052n ± 0% 6.167n ± 0% +1.90% (p=0.000 n=20) Int64N4e18-8 8.826n ± 0% 9.051n ± 0% +2.55% (p=0.000 n=20) Int32N1000-8 4.127n ± 0% 4.132n ± 0% +0.12% (p=0.000 n=20) Int32N1e8-8 4.126n ± 0% 4.131n ± 0% +0.12% (p=0.000 n=20) Int32N1e9-8 4.127n ± 0% 4.132n ± 0% +0.12% (p=0.000 n=20) Int32N2e9-8 4.132n ± 0% 4.131n ± 0% ~ (p=0.017 n=20) Float32-8 4.109n ± 0% 4.105n ± 0% ~ (p=0.379 n=20) Float64-8 4.107n ± 0% 4.106n ± 0% ~ (p=0.867 n=20) ExpFloat64-8 5.339n ± 0% 5.383n ± 0% +0.82% (p=0.000 n=20) NormFloat64-8 5.735n ± 0% 5.737n ± 1% ~ (p=0.856 n=20) Perm3-8 26.65n ± 0% 26.80n ± 1% +0.58% (p=0.000 n=20) Perm30-8 194.8n ± 1% 197.0n ± 0% +1.18% (p=0.000 n=20) Perm30ViaShuffle-8 156.6n ± 0% 157.6n ± 1% +0.61% (p=0.000 n=20) ShuffleOverhead-8 124.9n ± 0% 125.5n ± 0% +0.52% (p=0.000 n=20) Concurrent-8 2.434n ± 3% 5.066n ± 0% +108.09% (p=0.000 n=20) goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ bbb48afeb7.386 │ 5cf807d1ea.386 │ │ sec/op │ sec/op vs base │ ChaCha8-32 11.295n ± 1% 4.748n ± 2% -57.96% (p=0.000 n=20) PCG_DXSM-32 7.693n ± 1% 7.738n ± 2% ~ (p=0.542 n=20) SourceUint64-32 7.658n ± 2% 7.622n ± 2% ~ (p=0.344 n=20) GlobalInt64-32 3.473n ± 2% 7.526n ± 2% +116.73% (p=0.000 n=20) GlobalInt64Parallel-32 0.3198n ± 0% 0.5444n ± 0% +70.22% (p=0.000 n=20) GlobalUint64-32 3.612n ± 0% 7.575n ± 1% +109.69% (p=0.000 n=20) GlobalUint64Parallel-32 0.3168n ± 0% 0.5403n ± 0% +70.51% (p=0.000 n=20) Int64-32 7.673n ± 2% 7.789n ± 1% ~ (p=0.122 n=20) Uint64-32 7.773n ± 1% 7.827n ± 2% ~ (p=0.920 n=20) GlobalIntN1000-32 6.268n ± 1% 9.581n ± 1% +52.87% (p=0.000 n=20) IntN1000-32 10.33n ± 2% 10.45n ± 1% ~ (p=0.233 n=20) Int64N1000-32 10.98n ± 2% 11.01n ± 1% ~ (p=0.401 n=20) Int64N1e8-32 11.19n ± 2% 10.97n ± 1% ~ (p=0.033 n=20) Int64N1e9-32 11.06n ± 1% 11.08n ± 1% ~ (p=0.498 n=20) Int64N2e9-32 11.10n ± 1% 11.01n ± 2% ~ (p=0.995 n=20) Int64N1e18-32 15.23n ± 2% 15.04n ± 1% ~ (p=0.973 n=20) Int64N2e18-32 15.89n ± 1% 15.85n ± 1% ~ (p=0.409 n=20) Int64N4e18-32 18.96n ± 2% 19.34n ± 2% ~ (p=0.048 n=20) Int32N1000-32 10.46n ± 2% 10.44n ± 2% ~ (p=0.480 n=20) Int32N1e8-32 10.46n ± 2% 10.49n ± 2% ~ (p=0.951 n=20) Int32N1e9-32 10.28n ± 2% 10.26n ± 1% ~ (p=0.431 n=20) Int32N2e9-32 10.50n ± 2% 10.44n ± 2% ~ (p=0.249 n=20) Float32-32 13.80n ± 2% 13.80n ± 2% ~ (p=0.751 n=20) Float64-32 23.55n ± 2% 23.87n ± 0% ~ (p=0.408 n=20) ExpFloat64-32 15.36n ± 1% 15.29n ± 2% ~ (p=0.316 n=20) NormFloat64-32 13.57n ± 1% 13.79n ± 1% +1.66% (p=0.005 n=20) Perm3-32 45.70n ± 2% 46.99n ± 2% +2.81% (p=0.001 n=20) Perm30-32 399.0n ± 1% 403.8n ± 1% +1.19% (p=0.006 n=20) Perm30ViaShuffle-32 349.0n ± 1% 350.4n ± 1% ~ (p=0.909 n=20) ShuffleOverhead-32 322.3n ± 1% 323.8n ± 1% ~ (p=0.410 n=20) Concurrent-32 3.331n ± 1% 7.312n ± 1% +119.50% (p=0.000 n=20) For #61716. Change-Id: Ibdddeed85c34d9ae397289dc899e04d4845f9ed2 Reviewed-on: https://go-review.googlesource.com/c/go/+/516860 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Filippo Valsorda <filippo@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-21runtime: profile contended lock callsRhys Hiltner
Add runtime-internal locks to the mutex contention profile. Store up to one call stack responsible for lock contention on the M, until it's safe to contribute its value to the mprof table. Try to use that limited local storage space for a relatively large source of contention, and attribute any contention in stacks we're not able to store to a sentinel _LostContendedLock function. Avoid ballooning lock contention while manipulating the mprof table by attributing to that sentinel function any lock contention experienced while reporting lock contention. Guard collecting real call stacks with GODEBUG=profileruntimelocks=1, since the available data has mixed semantics; we can easily capture an M's own wait time, but we'd prefer for the profile entry of each critical section to describe how long it made the other Ms wait. It's too late in the Go 1.22 cycle to make the required changes to futex-based locks. When not enabled, attribute the time to the sentinel function instead. Fixes #57071 This is a roll-forward of https://go.dev/cl/528657, which was reverted in https://go.dev/cl/543660 Reason for revert: de-flakes tests (reduces dependence on fine-grained timers, correctly identifies contention on big-endian futex locks, attempts to measure contention in the semaphore implementation but only uses that secondary measurement to finish the test early, skips tests on single-processor systems) Change-Id: I31389f24283d85e46ad9ba8d4f514cb9add8dfb0 Reviewed-on: https://go-review.googlesource.com/c/go/+/544195 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com> Auto-Submit: Rhys Hiltner <rhys@justin.tv> Run-TryBot: Rhys Hiltner <rhys@justin.tv>
2023-11-20Revert "runtime: profile contended lock calls"Matthew Dempsky
This reverts commit go.dev/cl/528657. Reason for revert: broke a lot of builders. Change-Id: I70c33062020e997c4df67b3eaa2e886cf0da961e Reviewed-on: https://go-review.googlesource.com/c/go/+/543660 Reviewed-by: Than McIntosh <thanm@google.com> Auto-Submit: Matthew Dempsky <mdempsky@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-17runtime: profile contended lock callsRhys Hiltner
Add runtime-internal locks to the mutex contention profile. Store up to one call stack responsible for lock contention on the M, until it's safe to contribute its value to the mprof table. Try to use that limited local storage space for a relatively large source of contention, and attribute any contention in stacks we're not able to store to a sentinel _LostContendedLock function. Avoid ballooning lock contention while manipulating the mprof table by attributing to that sentinel function any lock contention experienced while reporting lock contention. Guard collecting real call stacks with GODEBUG=profileruntimelocks=1, since the available data has mixed semantics; we can easily capture an M's own wait time, but we'd prefer for the profile entry of each critical section to describe how long it made the other Ms wait. It's too late in the Go 1.22 cycle to make the required changes to futex-based locks. When not enabled, attribute the time to the sentinel function instead. Fixes #57071 Change-Id: I3eee0ccbfc20f333b56f20d8725dfd7f3a526b41 Reviewed-on: https://go-review.googlesource.com/c/go/+/528657 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Rhys Hiltner <rhys@justin.tv> Reviewed-by: Than McIntosh <thanm@google.com>
2023-11-15runtime/metrics: add STW stopping and total time metricsMichael Pratt
This CL adds four new time histogram metrics: /sched/pauses/stopping/gc:seconds /sched/pauses/stopping/other:seconds /sched/pauses/total/gc:seconds /sched/pauses/total/other:seconds The "stopping" metrics measure the time taken to start a stop-the-world pause. i.e., how long it takes stopTheWorldWithSema to stop all Ps. This can be used to detect STW struggling to preempt Ps. The "total" metrics measure the total duration of a stop-the-world pause, from starting to stop-the-world until the world is started again. This includes the time spent in the "start" phase. The "gc" metrics are used for GC-related STW pauses. The "other" metrics are used for all other STW pauses. All of these metrics start timing in stopTheWorldWithSema only after successfully acquiring sched.lock, thus excluding lock contention on sched.lock. The reasoning behind this is that while waiting on sched.lock the world is not stopped at all (all other Ps can run), so the impact of this contention is primarily limited to the goroutine attempting to stop-the-world. Additionally, we already have some visibility into sched.lock contention via contention profiles (#57071). /sched/pauses/total/gc:seconds is conceptually equivalent to /gc/pauses:seconds, so the latter is marked as deprecated and returns the same histogram as the former. In the implementation, there are a few minor differences: * For both mark and sweep termination stops, /gc/pauses:seconds started timing prior to calling startTheWorldWithSema, thus including lock contention. These details are minor enough, that I do not believe the slight change in reporting will matter. For mark termination stops, moving timing stop into startTheWorldWithSema does have the side effect of requiring moving other GC metric calculations outside of the STW, as they depend on the same end time. Fixes #63340 Change-Id: Iacd0bab11bedab85d3dcfb982361413a7d9c0d05 Reviewed-on: https://go-review.googlesource.com/c/go/+/534161 Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-09runtime: improve tickspersecondMichael Anthony Knyszek
Currently tickspersecond forces a 100 millisecond sleep the first time it's called. This isn't great for profiling short-lived programs, since both CPU profiling and block profiling might call into it. 100 milliseconds is a long time, but it's chosen to try and capture a decent estimate of the conversion on platform with course-granularity clocks. If the granularity is 15 ms, it'll only be 15% off at worst. Let's try a different strategy. First, let's require 5 milliseconds of time to have elapsed at a minimum. This should be plenty on platforms with nanosecond time granularity from the system clock, provided the caller of tickspersecond intends to use it for calculating durations, not timestamps. Next, grab a timestamp as close to process start as possible, so that we can cover some of that 5 millisecond just during runtime start. Finally, this function is only ever called from normal goroutine contexts. Let's do a regular goroutine sleep instead of a thread-level sleep under a runtime lock, which has all sorts of nasty effects on preemption. While we're here, let's also rename tickspersecond to ticksPerSecond. Also, let's write down some explicit rules of thumb on when to use this function. Clocks are hard, and using this for timestamp conversion is likely to make lining up those timestamps with other clocks on the system difficult if not impossible. Note that while this improves ticksPerSecond on platforms with good clocks, we still end up with a pretty coarse sleep on platforms with coarse clocks, and a pretty coarse result. On these platforms, keep the minimum required elapsed time at 100 ms. There's not much we can do about these platforms except spin and try to catch the clock boundary, but at 10+ ms of granularity, that might be a lot of spinning. Fixes #63103. Fixes #63078. Change-Id: Ic32a4ba70a03bdf5c13cb80c2669c4064aa4cca2 Reviewed-on: https://go-review.googlesource.com/c/go/+/538898 Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2023-11-08runtime: add available godoc linkcui fliter
Change-Id: Ifb4844efddcb0369b0302eeab72394eeaf5c8072 Reviewed-on: https://go-review.googlesource.com/c/go/+/540022 Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Heschi Kreinick <heschi@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: shuang cui <imcusg@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-25runtime/pprof: include labels for caller of goroutine profileRhys Hiltner
The goroutine profile has close to three code paths for adding a goroutine record to the goroutine profile: one for the goroutine that requested the profile, one for every other goroutine, plus some special handling for the finalizer goroutine. The first of those captured the goroutine stack, but neglected to include that goroutine's labels. Update the tests to check for the inclusion of labels for all three types of goroutines, and include labels for the creator of the goroutine profile. For #63712 Change-Id: Id5387a5f536d3c37268c240e0b6db3d329a3d632 Reviewed-on: https://go-review.googlesource.com/c/go/+/537515 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Rhys Hiltner <rhys@justin.tv> Reviewed-by: David Chase <drchase@google.com>
2023-08-21runtime: document maxStack and m.createstack in more detailsAndy Pan
Change-Id: If93b6cfa5a598a5f4101c879a0cd88a194e4a6aa Reviewed-on: https://go-review.googlesource.com/c/go/+/518116 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Andy Pan <panjf2000@gmail.com>
2023-08-17runtime: change mutex profile to count every blocked goroutineRuss Cox
The pprof mutex profile was meant to match the Google C++ (now Abseil) mutex profiler, originally designed and implemented by Mike Burrows. When we worked on the Go version, pjw and I missed that C++ counts the time each thread is blocked, even if multiple threads are blocked on a mutex. That is, if 100 threads are blocked on the same mutex for the same 10ms, that still counts as 1000ms of contention in C++. In Go, to date, /debug/pprof/mutex has counted that as only 10ms of contention. If 100 goroutines are blocked on one mutex and only 1 goroutine is blocked on another mutex, we probably do want to see the first mutex as being more contended, so the Abseil approach is the more useful one. This CL adopts "contention scales with number of goroutines blocked", to better match Abseil [1]. However, it still makes sure to attribute the time to the unlock that caused the backup, not subsequent innocent unlocks that were affected by the congestion. In this way it still gives more accurate profiles than Abseil does. [1] https://github.com/abseil/abseil-cpp/blob/lts_2023_01_25/absl/synchronization/mutex.cc#L2390 Fixes #61015. Change-Id: I7eb9e706867ffa8c0abb5b26a1b448f6eba49331 Reviewed-on: https://go-review.googlesource.com/c/go/+/506415 Run-TryBot: Russ Cox <rsc@golang.org> Auto-Submit: Russ Cox <rsc@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-05-19runtime: emit STW events for all pauses, not just those for the GCMichael Anthony Knyszek
Currently STW events are only emitted for GC STWs. There's little reason why the trace can't contain events for every STW: they're rare so don't take up much space in the trace, yet being able to see when the world was stopped is often critical to debugging certain latency issues, especially when they stem from user-level APIs. This change adds new "kinds" to the EvGCSTWStart event, renames the GCSTW events to just "STW," and lets the parser deal with unknown STW kinds for future backwards compatibility. But, this change must break trace compatibility, so it bumps the trace version to Go 1.21. This change also includes a small cleanup in the trace command, which previously checked for STW events when deciding whether user tasks overlapped with a GC. Looking at the source, I don't see a way for STW events to ever enter the stream that that code looks at, so that condition has been deleted. Change-Id: I9a5dc144092c53e92eb6950e9a5504a790ac00cf Reviewed-on: https://go-review.googlesource.com/c/go/+/494495 Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Michael Knyszek <mknyszek@google.com>
2023-05-11runtime: redefine _type to abi.Type; add rtype for methods.David Chase
Change-Id: I1c478b704d84811caa209006c657dda82d9c4cf9 Reviewed-on: https://go-review.googlesource.com/c/go/+/488435 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2023-03-10runtime: new API for filling PC traceback buffersAustin Clements
Currently, filling PC traceback buffers is one of the jobs of gentraceback. This moves it into a new function, tracebackPCs, with a simple API built around unwinder, and changes all callers to use this new API. Updates #54466. Change-Id: Id2038bded81bf533a5a4e71178a7c014904d938c Reviewed-on: https://go-review.googlesource.com/c/go/+/468300 Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Austin Clements <austin@google.com>
2023-01-28runtime: remove go119ConcurrentGoroutineProfile flagKeith Randall
Change-Id: If7a248374dcb2c276d2d85a4863eb2ed1bc246a4 Reviewed-on: https://go-review.googlesource.com/c/go/+/463226 Run-TryBot: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Rhys Hiltner <rhys@justin.tv>
2022-11-15runtime/pprof: scale mutex profile samples when they are recordedNick Ripley
Samples in the mutex profile have their count and duration scaled according to the probability they were sampled. This is done when the profile is actually requested. The adjustment is done using to the current configured sampling rate. However, if the sample rate is changed after a specific sample is recorded, then the sample will be scaled incorrectly. In particular, if the sampling rate is changed to 0, all of the samples in the encoded profile will have 0 count and duration. This means the profile will be "empty", even if it should have had samples. This CL scales the samples in the profile when they are recorded, rather than when the profile is requested. This matches what is currently done for the block profile. With this change, neither the block profile nor mutex profile are scaled when they are encoded, so the logic for scaling the samples can be removed. Change-Id: If228cf39284385aa8fb9a2d62492d839e02f027f Reviewed-on: https://go-review.googlesource.com/c/go/+/443056 Auto-Submit: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Michael Knyszek <mknyszek@google.com> Reviewed-by: Joedian Reid <joedian@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2022-10-05runtime: avoid initializing MemProfileRate in init functionWang Deyu
Fixes #55100 Change-Id: Ibbff921e74c3a416fd8bb019d20410273961c015 Reviewed-on: https://go-review.googlesource.com/c/go/+/431315 Auto-Submit: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2022-09-05runtime: refactor finalizer goroutine statusLeonard Wang
Use an atomic.Uint32 to represent the state of finalizer goroutine. fingStatus will only be changed to fingWake in non fingWait state, so it is safe to set fingRunningFinalizer status in runfinq. name old time/op new time/op delta Finalizer-8 592µs ± 4% 561µs ± 1% -5.22% (p=0.000 n=10+10) FinalizerRun-8 694ns ± 6% 675ns ± 7% ~ (p=0.059 n=9+8) Change-Id: I7e4da30cec98ce99f7d8cf4c97f933a8a2d1cae1 Reviewed-on: https://go-review.googlesource.com/c/go/+/400134 Reviewed-by: Joedian Reid <joedian@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Michael Pratt <mpratt@google.com>
2022-08-19runtime: add and use runtime/internal/sys.NotInHeapCuong Manh Le
Updates #46731 Change-Id: Ic2208c8bb639aa1e390be0d62e2bd799ecf20654 Reviewed-on: https://go-review.googlesource.com/c/go/+/421878 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2022-07-18runtime: fix typosJun10ng
"finializer" => "finalizer" Change-Id: Ia3c12bff8556b6a8d33b700c59357f47502757b1 GitHub-Last-Rev: c64cf47974020c8480039ba61d0890bdc07a3b0f GitHub-Pull-Request: golang/go#53917 Reviewed-on: https://go-review.googlesource.com/c/go/+/417915 Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Keith Randall <khr@google.com>
2022-05-17all: fix spellingJohn Bampton
Change-Id: Iee18987c495d1d4bde9da888d454eea8079d3ebc GitHub-Last-Rev: ff5e01599ddf7deb3ab6ce190ba92eb02ae2cb15 GitHub-Pull-Request: golang/go#52949 Reviewed-on: https://go-review.googlesource.com/c/go/+/406915 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Robert Griesemer <gri@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com>
2022-05-10runtime: profile finalizer G more carefully in goroutine profileMichael Anthony Knyszek
If SetFinalizer is never called, we might readgstatus on a nil fing variable, resulting in a crash. This change guards code that accesses fing by a nil check. Fixes #52821. Change-Id: I3e8e7004f97f073dc622e801a1d37003ea153a29 Reviewed-on: https://go-review.googlesource.com/c/go/+/405475 Run-TryBot: Michael Knyszek <mknyszek@google.com> Reviewed-by: Bryan Mills <bcmills@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Rhys Hiltner <rhys@justin.tv>
2022-05-03runtime: split mprof locksRhys Hiltner
The profiles for memory allocations, sync.Mutex contention, and general blocking store their data in a shared hash table. The bookkeeping work at the end of a garbage collection cycle involves maintenance on each memory allocation record. Previously, a single lock guarded access to the hash table and the contents of all records. When a program has allocated memory at a large number of unique call stacks, the maintenance following every garbage collection can hold that lock for several milliseconds. That can prevent progress on all other goroutines by delaying acquirep's call to mcache.prepareForSweep, which needs the lock in mProf_Free to report when a profiled allocation is no longer in use. With no user goroutines making progress, it is in effect a multi-millisecond GC-related stop-the-world pause. Split the lock so the call to mProf_Flush no longer delays each P's call to mProf_Free: mProf_Free uses a lock on the memory records' N+1 cycle, and mProf_Flush uses locks on the memory records' accumulator and their N cycle. mProf_Malloc also no longer competes with mProf_Flush, as it uses a lock on the memory records' N+2 cycle. The profiles for sync.Mutex contention and general blocking now share a separate lock, and another lock guards insertions to the shared hash table (uncommon in the steady-state). Consumers of each type of profile take the matching accumulator lock, so will observe consistent count and magnitude values for each record. For #45894 Change-Id: I615ff80618d10e71025423daa64b0b7f9dc57daa Reviewed-on: https://go-review.googlesource.com/c/go/+/399956 Reviewed-by: Carlos Amedee <carlos@golang.org> Run-TryBot: Rhys Hiltner <rhys@justin.tv> Reviewed-by: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-03runtime: decrease STW pause for goroutine profileRhys Hiltner
The goroutine profile needs to stop the world to get a consistent snapshot of all goroutines in the app. Leaving the world stopped while iterating over allgs leads to a pause proportional to the number of goroutines in the app (or its high-water mark). Instead, do only a fixed amount of bookkeeping while the world is stopped. Install a barrier so the scheduler confirms that a goroutine appears in the profile, with its stack recorded exactly as it was during the stop-the-world pause, before it allows that goroutine to execute. Iterate over allgs while the app resumes normal operations, adding each to the profile unless they've been scheduled in the meantime (and so have profiled themselves). Stop the world a second time to remove the barrier and do a fixed amount of cleanup work. This increases both the fixed overhead and per-goroutine CPU-time cost of GoroutineProfile. It also increases the wall-clock latency of the call to GoroutineProfile, since the scheduler may interrupt it to execute other goroutines. name old time/op new time/op delta GoroutineProfile/small/loaded-8 1.05ms ± 5% 4.99ms ±31% +376.85% (p=0.000 n=10+9) GoroutineProfile/sparse/loaded-8 1.04ms ± 4% 3.61ms ±27% +246.61% (p=0.000 n=10+10) GoroutineProfile/large/loaded-8 7.69ms ±17% 20.35ms ± 4% +164.50% (p=0.000 n=10+10) GoroutineProfile/small/idle 958µs ± 0% 1820µs ±23% +89.91% (p=0.000 n=10+10) GoroutineProfile/sparse/idle-8 1.00ms ± 3% 1.52ms ±17% +51.18% (p=0.000 n=10+10) GoroutineProfile/small/idle-8 1.01ms ± 4% 1.47ms ± 7% +45.28% (p=0.000 n=9+9) GoroutineProfile/sparse/idle 980µs ± 1% 1403µs ± 2% +43.22% (p=0.000 n=9+10) GoroutineProfile/large/idle-8 7.19ms ± 8% 8.43ms ±21% +17.22% (p=0.011 n=10+10) PingPongHog 511ns ± 8% 585ns ± 9% +14.39% (p=0.000 n=10+10) GoroutineProfile/large/idle 6.71ms ± 0% 7.58ms ± 3% +13.08% (p=0.000 n=8+10) PingPongHog-8 469ns ± 8% 509ns ±12% +8.62% (p=0.010 n=9+10) WakeupParallelSyscall/5µs 216µs ± 4% 229µs ± 3% +6.06% (p=0.000 n=10+9) WakeupParallelSyscall/5µs-8 147µs ± 1% 149µs ± 2% +1.12% (p=0.009 n=10+10) WakeupParallelSyscall/2µs-8 140µs ± 0% 142µs ± 1% +1.11% (p=0.001 n=10+9) WakeupParallelSyscall/50µs-8 236µs ± 0% 238µs ± 1% +1.08% (p=0.000 n=9+10) WakeupParallelSyscall/1µs-8 138µs ± 0% 140µs ± 1% +1.05% (p=0.013 n=10+9) Matmult 8.52ns ± 1% 8.61ns ± 0% +0.98% (p=0.002 n=10+8) WakeupParallelSyscall/10µs-8 157µs ± 1% 158µs ± 1% +0.58% (p=0.003 n=10+8) CreateGoroutinesSingle-8 328ns ± 0% 330ns ± 1% +0.57% (p=0.000 n=9+9) WakeupParallelSpinning/100µs-8 343µs ± 0% 344µs ± 1% +0.30% (p=0.015 n=8+8) WakeupParallelSyscall/20µs-8 178µs ± 0% 178µs ± 0% +0.18% (p=0.043 n=10+9) StackGrowthDeep-8 22.8µs ± 0% 22.9µs ± 0% +0.12% (p=0.006 n=10+10) StackGrowth 1.06µs ± 0% 1.06µs ± 0% +0.09% (p=0.000 n=8+9) WakeupParallelSpinning/0s 10.7µs ± 0% 10.7µs ± 0% +0.08% (p=0.000 n=9+9) WakeupParallelSpinning/5µs 30.7µs ± 0% 30.7µs ± 0% +0.04% (p=0.000 n=10+10) WakeupParallelSpinning/100µs 411µs ± 0% 411µs ± 0% +0.03% (p=0.000 n=10+9) WakeupParallelSpinning/2µs 18.7µs ± 0% 18.7µs ± 0% +0.02% (p=0.026 n=10+10) WakeupParallelSpinning/20µs-8 93.0µs ± 0% 93.0µs ± 0% +0.01% (p=0.021 n=9+10) StackGrowth-8 216ns ± 0% 216ns ± 0% ~ (p=0.209 n=10+10) CreateGoroutinesParallel-8 49.5ns ± 2% 49.3ns ± 1% ~ (p=0.591 n=10+10) CreateGoroutinesSingle 699ns ±20% 748ns ±19% ~ (p=0.353 n=10+10) WakeupParallelSpinning/0s-8 15.9µs ± 2% 16.0µs ± 3% ~ (p=0.315 n=10+10) WakeupParallelSpinning/1µs 14.6µs ± 0% 14.6µs ± 0% ~ (p=0.513 n=10+10) WakeupParallelSpinning/2µs-8 24.2µs ± 3% 24.1µs ± 2% ~ (p=0.971 n=10+10) WakeupParallelSpinning/10µs 50.7µs ± 0% 50.7µs ± 0% ~ (p=0.101 n=10+10) WakeupParallelSpinning/20µs 90.7µs ± 0% 90.7µs ± 0% ~ (p=0.898 n=10+10) WakeupParallelSpinning/50µs 211µs ± 0% 211µs ± 0% ~ (p=0.382 n=10+10) WakeupParallelSyscall/0s-8 137µs ± 1% 138µs ± 0% ~ (p=0.075 n=10+10) WakeupParallelSyscall/1µs 216µs ± 1% 219µs ± 3% ~ (p=0.065 n=10+9) WakeupParallelSyscall/2µs 214µs ± 7% 219µs ± 1% ~ (p=0.101 n=10+8) WakeupParallelSyscall/50µs 317µs ± 5% 326µs ± 4% ~ (p=0.123 n=10+10) WakeupParallelSyscall/100µs 450µs ± 9% 459µs ± 8% ~ (p=0.247 n=10+10) WakeupParallelSyscall/100µs-8 337µs ± 0% 338µs ± 1% ~ (p=0.089 n=10+10) WakeupParallelSpinning/5µs-8 32.2µs ± 0% 32.2µs ± 0% -0.05% (p=0.026 n=9+10) WakeupParallelSpinning/50µs-8 216µs ± 0% 216µs ± 0% -0.12% (p=0.004 n=10+10) WakeupParallelSpinning/1µs-8 20.6µs ± 0% 20.5µs ± 0% -0.22% (p=0.014 n=10+10) WakeupParallelSpinning/10µs-8 54.5µs ± 0% 54.2µs ± 1% -0.57% (p=0.000 n=10+10) CreateGoroutines-8 213ns ± 1% 211ns ± 1% -0.86% (p=0.002 n=10+10) CreateGoroutinesCapture 1.03µs ± 0% 1.02µs ± 0% -0.91% (p=0.000 n=10+10) CreateGoroutinesCapture-8 1.32µs ± 1% 1.31µs ± 1% -1.06% (p=0.001 n=10+9) CreateGoroutines 188ns ± 0% 186ns ± 0% -1.06% (p=0.000 n=9+10) CreateGoroutinesParallel 188ns ± 0% 186ns ± 0% -1.27% (p=0.000 n=8+10) WakeupParallelSyscall/0s 210µs ± 3% 207µs ± 3% -1.60% (p=0.043 n=10+10) StackGrowthDeep 121µs ± 1% 119µs ± 1% -1.70% (p=0.000 n=9+10) Matmult-8 1.82ns ± 3% 1.78ns ± 3% -2.16% (p=0.020 n=10+10) WakeupParallelSyscall/20µs 281µs ± 3% 269µs ± 4% -4.44% (p=0.000 n=10+10) WakeupParallelSyscall/10µs 239µs ± 3% 228µs ± 9% -4.70% (p=0.001 n=10+10) GoroutineProfile/sparse-nil/idle-8 485µs ± 2% 12µs ± 4% -97.56% (p=0.000 n=10+10) GoroutineProfile/small-nil/idle-8 484µs ± 2% 12µs ± 1% -97.60% (p=0.000 n=10+7) GoroutineProfile/small-nil/loaded-8 487µs ± 2% 11µs ± 3% -97.68% (p=0.000 n=10+10) GoroutineProfile/sparse-nil/loaded-8 507µs ± 4% 11µs ± 6% -97.78% (p=0.000 n=10+10) GoroutineProfile/large-nil/idle-8 709µs ± 2% 11µs ± 4% -98.38% (p=0.000 n=10+10) GoroutineProfile/large-nil/loaded-8 717µs ± 2% 11µs ± 3% -98.43% (p=0.000 n=10+10) GoroutineProfile/sparse-nil/idle 465µs ± 3% 1µs ± 1% -99.84% (p=0.000 n=10+10) GoroutineProfile/small-nil/idle 493µs ± 3% 1µs ± 0% -99.85% (p=0.000 n=10+9) GoroutineProfile/large-nil/idle 716µs ± 1% 1µs ± 2% -99.89% (p=0.000 n=7+10) name old alloc/op new alloc/op delta CreateGoroutinesCapture 144B ± 0% 144B ± 0% ~ (all equal) CreateGoroutinesCapture-8 144B ± 0% 144B ± 0% ~ (all equal) name old allocs/op new allocs/op delta CreateGoroutinesCapture 5.00 ± 0% 5.00 ± 0% ~ (all equal) CreateGoroutinesCapture-8 5.00 ± 0% 5.00 ± 0% ~ (all equal) name old p50-ns new p50-ns delta GoroutineProfile/small/loaded-8 1.01M ± 3% 3.87M ±45% +282.15% (p=0.000 n=10+10) GoroutineProfile/sparse/loaded-8 1.02M ± 3% 2.43M ±41% +138.42% (p=0.000 n=10+10) GoroutineProfile/large/loaded-8 7.43M ±16% 17.28M ± 2% +132.43% (p=0.000 n=10+10) GoroutineProfile/small/idle 956k ± 0% 1559k ±16% +63.03% (p=0.000 n=10+10) GoroutineProfile/small/idle-8 1.01M ± 3% 1.45M ± 7% +44.31% (p=0.000 n=10+9) GoroutineProfile/sparse/idle 977k ± 1% 1399k ± 2% +43.20% (p=0.000 n=10+10) GoroutineProfile/sparse/idle-8 1.00M ± 3% 1.41M ± 3% +40.47% (p=0.000 n=10+10) GoroutineProfile/large/idle-8 6.97M ± 1% 8.41M ±25% +20.54% (p=0.003 n=8+10) GoroutineProfile/large/idle 6.71M ± 1% 7.46M ± 4% +11.15% (p=0.000 n=10+10) GoroutineProfile/sparse-nil/idle-8 483k ± 3% 13k ± 3% -97.41% (p=0.000 n=10+9) GoroutineProfile/small-nil/idle-8 483k ± 2% 12k ± 1% -97.43% (p=0.000 n=10+8) GoroutineProfile/small-nil/loaded-8 484k ± 3% 10k ± 2% -97.93% (p=0.000 n=10+8) GoroutineProfile/sparse-nil/loaded-8 492k ± 2% 10k ± 4% -97.97% (p=0.000 n=10+8) GoroutineProfile/large-nil/idle-8 708k ± 2% 12k ±15% -98.36% (p=0.000 n=10+10) GoroutineProfile/large-nil/loaded-8 714k ± 2% 10k ± 2% -98.60% (p=0.000 n=10+8) GoroutineProfile/sparse-nil/idle 459k ± 1% 1k ± 1% -99.85% (p=0.000 n=10+10) GoroutineProfile/small-nil/idle 477k ± 1% 1k ± 0% -99.85% (p=0.000 n=10+9) GoroutineProfile/large-nil/idle 712k ± 1% 1k ± 1% -99.90% (p=0.000 n=7+10) name old p90-ns new p90-ns delta GoroutineProfile/small/loaded-8 1.13M ±10% 7.49M ±35% +562.07% (p=0.000 n=10+10) GoroutineProfile/sparse/loaded-8 1.10M ±12% 4.58M ±31% +318.02% (p=0.000 n=10+9) GoroutineProfile/large/loaded-8 8.78M ±24% 27.83M ± 2% +217.00% (p=0.000 n=10+10) GoroutineProfile/small/idle 967k ± 0% 2909k ±50% +200.91% (p=0.000 n=10+10) GoroutineProfile/sparse/idle-8 1.02M ± 3% 1.96M ±76% +92.99% (p=0.000 n=10+10) GoroutineProfile/small/idle-8 1.07M ±17% 1.55M ±12% +45.23% (p=0.000 n=10+10) GoroutineProfile/sparse/idle 992k ± 1% 1417k ± 3% +42.79% (p=0.000 n=9+10) GoroutineProfile/large/idle 6.73M ± 0% 7.99M ± 8% +18.80% (p=0.000 n=8+10) GoroutineProfile/large/idle-8 8.20M ±25% 9.18M ±25% ~ (p=0.315 n=10+10) GoroutineProfile/sparse-nil/idle-8 495k ± 3% 13k ± 1% -97.36% (p=0.000 n=10+9) GoroutineProfile/small-nil/idle-8 494k ± 2% 13k ± 3% -97.36% (p=0.000 n=10+10) GoroutineProfile/small-nil/loaded-8 496k ± 2% 13k ± 1% -97.41% (p=0.000 n=10+10) GoroutineProfile/sparse-nil/loaded-8 544k ±11% 13k ± 1% -97.62% (p=0.000 n=10+9) GoroutineProfile/large-nil/idle-8 724k ± 1% 13k ± 3% -98.20% (p=0.000 n=10+10) GoroutineProfile/large-nil/loaded-8 729k ± 3% 13k ± 2% -98.23% (p=0.000 n=10+10) GoroutineProfile/sparse-nil/idle 476k ± 4% 1k ± 1% -99.85% (p=0.000 n=9+10) GoroutineProfile/small-nil/idle 537k ±10% 1k ± 0% -99.87% (p=0.000 n=10+9) GoroutineProfile/large-nil/idle 729k ± 0% 1k ± 1% -99.90% (p=0.000 n=7+10) name old p99-ns new p99-ns delta GoroutineProfile/sparse/loaded-8 1.27M ±33% 20.49M ±17% +1514.61% (p=0.000 n=10+10) GoroutineProfile/small/loaded-8 1.37M ±29% 20.48M ±23% +1399.35% (p=0.000 n=10+10) GoroutineProfile/large/loaded-8 9.76M ±23% 39.98M ±22% +309.52% (p=0.000 n=10+8) GoroutineProfile/small/idle 976k ± 1% 3367k ±55% +244.94% (p=0.000 n=10+10) GoroutineProfile/sparse/idle-8 1.03M ± 3% 2.50M ±65% +142.30% (p=0.000 n=10+10) GoroutineProfile/small/idle-8 1.17M ±34% 1.70M ±14% +45.15% (p=0.000 n=10+10) GoroutineProfile/sparse/idle 1.02M ± 3% 1.45M ± 4% +42.64% (p=0.000 n=9+10) GoroutineProfile/large/idle 6.92M ± 2% 9.00M ± 7% +29.98% (p=0.000 n=8+9) GoroutineProfile/large/idle-8 8.74M ±23% 9.90M ±24% ~ (p=0.190 n=10+10) GoroutineProfile/sparse-nil/idle-8 508k ± 4% 16k ± 2% -96.90% (p=0.000 n=10+9) GoroutineProfile/small-nil/idle-8 508k ± 4% 16k ± 3% -96.91% (p=0.000 n=10+9) GoroutineProfile/small-nil/loaded-8 542k ± 5% 15k ±15% -97.15% (p=0.000 n=10+10) GoroutineProfile/sparse-nil/loaded-8 649k ±16% 15k ±18% -97.67% (p=0.000 n=10+10) GoroutineProfile/large-nil/idle-8 738k ± 2% 16k ± 2% -97.86% (p=0.000 n=10+10) GoroutineProfile/large-nil/loaded-8 765k ± 4% 15k ±17% -98.03% (p=0.000 n=10+10) GoroutineProfile/sparse-nil/idle 539k ±26% 1k ±17% -99.84% (p=0.000 n=10+10) GoroutineProfile/small-nil/idle 659k ±25% 1k ± 0% -99.84% (p=0.000 n=10+8) GoroutineProfile/large-nil/idle 760k ± 2% 1k ±22% -99.88% (p=0.000 n=9+10) Fixes #33250 For #50794 Change-Id: I862a2bc4e991cec485f21a6fce4fca84f2c6435b Reviewed-on: https://go-review.googlesource.com/c/go/+/387415 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Than McIntosh <thanm@google.com> Run-TryBot: Rhys Hiltner <rhys@justin.tv> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-03runtime/pprof: add race annotations for goroutine profilesRhys Hiltner
The race annotations for goroutine label maps covered the special type of read necessary to create CPU profiles. Extend that to include goroutine profiles. Annotate the copy involved in creating new goroutines. Fixes #50292 Change-Id: I10f69314e4f4eba85c506590fe4781f4d6b8ec2d Reviewed-on: https://go-review.googlesource.com/c/go/+/385660 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Austin Clements <austin@google.com>
2021-11-11runtime: bypass scheduler when doing traceback for goroutine profileCherry Mui
When acquire a goroutine profile, we stop the world then acquire a stack trace for each goroutine. When cgo traceback is used, the traceback code may call the cgo traceback function using cgocall. As the world is stopped, cgocall will be blocked at exitsyscall, causing a deadlock. Bypass the scheduler (using asmcgocall) to fix this. Change-Id: Ic4e596adc3711310b6a983d73786d697ef15dd72 Reviewed-on: https://go-review.googlesource.com/c/go/+/362757 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2021-11-02runtime, syscall: add calls to asan functionsfanzha02
Add explicit address sanitizer instrumentation to the runtime and syscall packages. The compiler does not instrument the runtime package. It does instrument the syscall package, but we need to add a couple of cases that it can't see. Refer to the implementation of the asan malloc runtime library, this patch also allocates extra memory as the redzone, around the returned memory region, and marks the redzone as unaddressable to detect the overflows or underflows. Updates #44853. Change-Id: I2753d1cc1296935a66bf521e31ce91e35fcdf798 Reviewed-on: https://go-review.googlesource.com/c/go/+/298614 Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> Trust: fannie zhang <Fannie.Zhang@arm.com>
2021-08-22runtime: fix buckHashSize duplicationDmitry Vyukov
We have a constant for 179999, don't duplicate it. Change-Id: Iefb9c4746f6dda2e08b42e3c978963198469ee8c Reviewed-on: https://go-review.googlesource.com/c/go/+/277375 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Michael Pratt <mpratt@google.com>
2021-05-21[dev.typeparams] runtime: replace funcPC with internal/abi.FuncPCABIInternalCherry Mui
At this point all funcPC references are ABIInternal functions. Replace with the intrinsics. Change-Id: I3ba7e485c83017408749b53f92877d3727a75e27 Reviewed-on: https://go-review.googlesource.com/c/go/+/321954 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2021-04-27runtime/pprof: fix block profile biasFelix Geisendörfer
Block profiles were biased towards infrequent long events over frequent short events. This fix corrects the bias by aggregating shorter events as longer but less frequent in the profiles. As a result their cumulative duration will be accurately represented in the profile without skewing their sample mean (duration/count). Credit to @dvyukov for suggesting to adjust the count in the saveblockevent function. Fixes #44192. Change-Id: I71a99d7f6ebdb2d484d44890a2517863cceb4004 Reviewed-on: https://go-review.googlesource.com/c/go/+/299991 Trust: Michael Pratt <mpratt@google.com> Trust: Ian Lance Taylor <iant@golang.org> Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>
2021-03-09runtime, cmd/link/internal/ld: disable memory profiling when data unreachableBrad Fitzpatrick
If runtime.MemProfile is unreachable, default to not collecting any memory profiling samples, to save memory on the hash table. Fixes #42347 Change-Id: I9a4894a5fc77035fe59b1842e1ec77a1182e70c1 Reviewed-on: https://go-review.googlesource.com/c/go/+/299671 Reviewed-by: Cherry Zhang <cherryyz@google.com> Trust: Keith Randall <khr@golang.org>
2021-03-05runtime: encapsulate access to allgsMichael Pratt
Correctly accessing allgs is a bit hairy. Some paths need to lock allglock, some don't. Those that don't are safest using atomicAllG, but usage is not consistent. Rather than doing this ad-hoc, move all access* through forEachG / forEachGRace, the locking and atomic versions, respectively. This will make it easier to ensure safe access. * markroot is the only exception, as it has a far-removed guarantee of safe access via an atomic load of allglen far before actual use. Change-Id: Ie1c7a8243e155ae2b4bc3143577380c695680e89 Reviewed-on: https://go-review.googlesource.com/c/go/+/279994 Trust: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2020-04-22runtime/pprof: plumb labels for goroutine profilesDavid Finkel
Goroutines are directly associated with labels. It's relatively easy to plumb those through without creating goroutine-locals in the wild. This is accomplished by splitting out most of the code from the public `runtime.GoroutineProfile` into a new unexported `runtime.goroutineProfileWithLabels`, which then has a thin wrapper linked into the `runtime/pprof` package as `runtime_goroutineProfileWithLabels`. (mirroring the way labels get associated with the `g` for a goroutine in the first place) Per-#6104, OS-thread creation profiles are a bit useless, as `M`s tend to be created be created by a background goroutine. As such, I decided not to add support for capturing the labels at `M`-creation-time, since the stack-traces seem to always come out `nil` for my simple test binaries. This change currently provides labels for debug=0 and debug=1, as debug=2 is currently entirely generated by the runtime package and I don't see a clean way of getting the `labelMap` type handled properly within the `runtime` package. Update the comment added in cl/131275 to mention goroutine support for labels. Updates #23458 Change-Id: Ia4b558893d7d10156b77121cd9b70c4ccd9e1889 Reviewed-on: https://go-review.googlesource.com/c/go/+/189318 Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2018-10-02runtime: add a more stable isSystemGoroutine modeAustin Clements
Currently, isSystemGoroutine varies on whether it considers the finalizer goroutine a user goroutine or a system goroutine. For the next CL, we're going to want to always consider the finalier goroutine a user goroutine, so add a flag that indicates that. Updates #26903. This is preparation for unifying STW GC and concurrent GC. Change-Id: Iafc92e519c13d9f8d879332cb5f0d12164104c33 Reviewed-on: https://go-review.googlesource.com/c/134778 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>
2018-04-26runtime: remove the dummy arg of getcallerspCherry Zhang
getcallersp is intrinsified, and so the dummy arg is no longer needed. Remove it, as well as a few dummy args that are solely to feed getcallersp. Change-Id: Ibb6c948ff9c56537042b380ac3be3a91b247aaa6 Reviewed-on: https://go-review.googlesource.com/109596 Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-19runtime: specify behavior of SetMutexProfileFraction for negative valuesAditya Mukerjee
Change-Id: Ie4da1a515d5405140d742bdcd55f54a73a7f71fe Reviewed-on: https://go-review.googlesource.com/108175 Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-22runtime: remove getcallerpc argumentAustin Clements
Now that getcallerpc is a compiler intrinsic on x86 and non-x86 platforms don't need the argument, we can drop it. Sadly, this doesn't let us remove any dummy arguments since all of those cases also use getcallersp, which still takes the argument pointer, but this is at least an improvement. Change-Id: I9c34a41cf2c18cba57f59938390bf9491efb22d2 Reviewed-on: https://go-review.googlesource.com/65474 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-03-31runtime: make runtime.GC() trigger a concurrent GCAustin Clements
Currently runtime.GC() triggers a STW GC. For common uses in tests and benchmarks, it doesn't matter whether it's STW or concurrent, but for uses in servers for things like collecting heap profiles and controlling memory footprint, this pause can be a bit problem for latency. This changes runtime.GC() to trigger a concurrent GC. In order to remain as close as possible to its current meaning, we define it to always perform a full mark/sweep GC cycle before returning (even if that means it has to finish up a cycle we're in the middle of first) and to publish the heap profile as of the triggered mark termination. While it must perform a full cycle, simultaneous runtime.GC() calls can be consolidated into a single full cycle. Fixes #18216. Change-Id: I9088cc5deef4ab6bcf0245ed1982a852a01c44b5 Reviewed-on: https://go-review.googlesource.com/37520 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31runtime: snapshot heap profile during mark terminationAustin Clements
Currently we snapshot the heap profile just *after* mark termination starts the world because it's a relatively expensive operation. However, this means any alloc or free events that happen between starting the world and snapshotting the heap profile can be accounted to the wrong cycle. In the worst case, a free can be accounted to the cycle before the alloc; if the heap is small, this can result temporarily in a negative "in use" count in the profile. Fix this without making STW more expensive by using a global heap profile cycle counter. This lets us split up the operation into a two parts: 1) a super-cheap snapshot operation that simply increments the global cycle counter during STW, and 2) a more expensive cleanup operation we can do after starting the world that frees up a slot in all buckets for use by the next heap profile cycle. Fixes #19311. Change-Id: I6bdafabf111c48b3d26fe2d91267f7bef0bd4270 Reviewed-on: https://go-review.googlesource.com/37714 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31runtime: pull heap profile cycle into a typeAustin Clements
Currently memRecord has the same set of four fields repeated three times. Pull these into a type and use this type three times. This cleans up and simplifies the code a bit and will make it easier to switch to a globally tracked heap profile cycle for #19311. Change-Id: I414d15673feaa406a8366b48784437c642997cf2 Reviewed-on: https://go-review.googlesource.com/37713 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31runtime: diagram flow of stats through heap profileAustin Clements
Every time I modify heap profiling, I find myself redrawing this diagram, so add it to the comments. This shows how allocations and frees are accounted, how we arrive at consistent profile snapshots, and when those snapshots are published to the user. Change-Id: I106aba1200af3c773b46e24e5f50205e808e2c69 Reviewed-on: https://go-review.googlesource.com/37514 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-10runtime: remove unused ratep parameterDaniel Martí
Found by github.com/mvdan/unparam. Change-Id: Iabcdfec2ae42c735aa23210b7183080d750682ca Reviewed-on: https://go-review.googlesource.com/38030 Reviewed-by: Peter Weinberger <pjw@google.com> Run-TryBot: Peter Weinberger <pjw@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-28runtime: ensure elapsed cycles are not negativePeter Weinberger
On solaris/amd64 sometimes the reported cycle count is negative. Replace with 0. Change-Id: I364eea5ca072281245c7ab3afb0bf69adc3a8eae Reviewed-on: https://go-review.googlesource.com/32258 Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-10-28runtime: Profile goroutines holding contended mutexes.Peter Weinberger
runtime.SetMutexProfileFraction(n int) will capture 1/n-th of stack traces of goroutines holding contended mutexes if n > 0. From runtime/pprof, pprot.Lookup("mutex").WriteTo writes the accumulated stack traces to w (in essentially the same format that blocking profiling uses). Change-Id: Ie0b54fa4226853d99aa42c14cb529ae586a8335a Reviewed-on: https://go-review.googlesource.com/29650 Reviewed-by: Austin Clements <austin@google.com>
2016-10-15runtime: mark several types go:notinheapAustin Clements
This covers basically all sysAlloc'd, persistentalloc'd, and fixalloc'd types. Change-Id: I0487c887c2a0ade5e33d4c4c12d837e97468e66b Reviewed-on: https://go-review.googlesource.com/30941 Reviewed-by: Rick Hudson <rlh@golang.org>
2016-09-22cmd/compile: don't instrument copy and append in runtimeIan Lance Taylor
Instrumenting copy and append for the race detector changes them to call different functions. In the runtime package the alternate functions are not marked as nosplit. This caused a crash in the SIGPROF handler when invoked on a non-Go thread in a program built with the race detector. In some cases the handler can call copy, the race detector changed that to a call to a non-nosplit function, the function tried to check the stack guard, and crashed because it was running on a non-Go thread. The SIGPROF handler is written carefully to avoid such problems, but hidden function calls are difficult to avoid. Fix this by changing the compiler to not instrument copy and append when compiling the runtime package. Change the runtime package to add explicit race checks for the only code I could find where copy is used to write to user data (append is never used). Change-Id: I11078a66c0aaa459a7d2b827b49f4147922050af Reviewed-on: https://go-review.googlesource.com/29472 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2016-08-30runtime: rename fastrand1 to fastrandJosh Bleecher Snyder
Change-Id: I37706ff0a3486827c5b072c95ad890ea87ede847 Reviewed-on: https://go-review.googlesource.com/28210 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-22cmd/compile, etc: use nameOff for rtype stringDavid Crawshaw
linux/amd64: cmd/go: -8KB (basically nothing) linux/amd64 PIE: cmd/go: -191KB (1.6%) jujud: -1.5MB (1.9%) Updates #6853 Fixes #15064 Change-Id: I0adbb95685e28be92e8548741df0e11daa0a9b5f Reviewed-on: https://go-review.googlesource.com/21777 Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-07runtime: eliminate unnecessary type conversionsMatthew Dempsky
Automated refactoring produced using github.com/mdempsky/unconvert. Change-Id: Iacf871a4f221ef17f48999a464ab2858b2bbaa90 Reviewed-on: https://go-review.googlesource.com/20071 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>