Change-Id: I8fe66326994894b17ce0eda991bba942844d26b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/541475
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
profileruntimelocks is new in CL 544195, but the name is deceptive. Even
with profileruntimelocks=0, runtime-internal locks are still profiled.
The actual difference is that call stacks are not collected. Instead all
contention is reported at runtime._LostContendedLock.
Rename this setting to runtimecontentionstacks to make its name more
aligned with its behavior.
In addition, for this release the default is runtimecontentionstacks=0,
meaning that users are fairly likely to encounter
runtime._LostContendedLock. Rename it to
runtime._LostContendedRuntimeLock in an attempt to make it more
intuitive that these are runtime locks, not locks in application code.
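A usage sketch (not part of this CL; the GODEBUG name and sentinel are
as described above, everything else is a plain use of the existing
runtime/pprof API). Run with GODEBUG=runtimecontentionstacks=1 to
collect real runtime call stacks; without it, runtime-lock contention
is attributed to runtime._LostContendedRuntimeLock:

    package main

    import (
        "os"
        "runtime"
        "runtime/pprof"
    )

    func main() {
        runtime.SetMutexProfileFraction(1) // sample every contention event

        // ... run the workload of interest ...

        // debug=1 prints a human-readable profile; runtime-internal
        // locks now show up alongside sync.Mutex contention.
        pprof.Lookup("mutex").WriteTo(os.Stdout, 1)
    }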
For #57071.
Change-Id: I38aac28b2c0852db643d53b1eab3f3bc42a43393
Reviewed-on: https://go-review.googlesource.com/c/go/+/547055
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Rhys Hiltner <rhys@justin.tv>
|
|
Move ChaCha8 code into internal/chacha8rand and use it to implement
runtime.rand, which is used for the unseeded global source for
both math/rand and math/rand/v2. This also affects the calculation of
the start point for iteration over very very large maps (when the
32-bit fastrand is not big enough).
The benefit is that misuse of the global random number generators
in math/rand and math/rand/v2 in contexts where non-predictable
randomness is important for security reasons is no longer a
security problem, removing a common mistake among programmers
who are unaware of the different kinds of randomness.
The cost is an extra 304 bytes per thread stored in the m struct
plus 2-3ns more per random uint64 due to the more sophisticated
algorithm. Using PCG looks like it would cost about the same,
although I haven't benchmarked that.
Before this, the math/rand and math/rand/v2 global generator
was wyrand (https://github.com/wangyi-fudan/wyhash).
For math/rand, using wyrand instead of the Mitchell/Reeds/Thompson
ALFG was justifiable, since the latter was not any better.
But for math/rand/v2, the global generator really should be
at least as good as one of the well-studied, specific algorithms
provided directly by the package, and it's not.
(Wyrand is still reasonable for scheduling and cache decisions.)
Good randomness does have a cost: about twice wyrand.
Also rationalize the various runtime rand references.
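For illustration, a minimal sketch of the affected user-visible API
(assumes the Go 1.22 math/rand/v2 package described above):

    package main

    import (
        "fmt"
        "math/rand/v2"
    )

    func main() {
        // The global functions now draw from the runtime's per-thread
        // ChaCha8 state via runtime.rand.
        fmt.Println(rand.Uint64())

        // An explicitly seeded ChaCha8 source is also available.
        var seed [32]byte
        r := rand.New(rand.NewChaCha8(seed))
        fmt.Println(r.Uint64())
    }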
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ bbb48afeb7.amd64 │ 5cf807d1ea.amd64 │
│ sec/op │ sec/op vs base │
ChaCha8-32 1.862n ± 2% 1.861n ± 2% ~ (p=0.825 n=20)
PCG_DXSM-32 1.471n ± 1% 1.460n ± 2% ~ (p=0.153 n=20)
SourceUint64-32 1.636n ± 2% 1.582n ± 1% -3.30% (p=0.000 n=20)
GlobalInt64-32 2.087n ± 1% 3.663n ± 1% +75.54% (p=0.000 n=20)
GlobalInt64Parallel-32 0.1042n ± 1% 0.2026n ± 1% +94.48% (p=0.000 n=20)
GlobalUint64-32 2.263n ± 2% 3.724n ± 1% +64.57% (p=0.000 n=20)
GlobalUint64Parallel-32 0.1019n ± 1% 0.1973n ± 1% +93.67% (p=0.000 n=20)
Int64-32 1.771n ± 1% 1.774n ± 1% ~ (p=0.449 n=20)
Uint64-32 1.863n ± 2% 1.866n ± 1% ~ (p=0.364 n=20)
GlobalIntN1000-32 3.134n ± 3% 4.730n ± 2% +50.95% (p=0.000 n=20)
IntN1000-32 2.489n ± 1% 2.489n ± 1% ~ (p=0.683 n=20)
Int64N1000-32 2.521n ± 1% 2.516n ± 1% ~ (p=0.394 n=20)
Int64N1e8-32 2.479n ± 1% 2.478n ± 2% ~ (p=0.743 n=20)
Int64N1e9-32 2.530n ± 2% 2.514n ± 2% ~ (p=0.193 n=20)
Int64N2e9-32 2.501n ± 1% 2.494n ± 1% ~ (p=0.616 n=20)
Int64N1e18-32 3.227n ± 1% 3.205n ± 1% ~ (p=0.101 n=20)
Int64N2e18-32 3.647n ± 1% 3.599n ± 1% ~ (p=0.019 n=20)
Int64N4e18-32 5.135n ± 1% 5.069n ± 2% ~ (p=0.034 n=20)
Int32N1000-32 2.657n ± 1% 2.637n ± 1% ~ (p=0.180 n=20)
Int32N1e8-32 2.636n ± 1% 2.636n ± 1% ~ (p=0.763 n=20)
Int32N1e9-32 2.660n ± 2% 2.638n ± 1% ~ (p=0.358 n=20)
Int32N2e9-32 2.662n ± 2% 2.618n ± 2% ~ (p=0.064 n=20)
Float32-32 2.272n ± 2% 2.239n ± 2% ~ (p=0.194 n=20)
Float64-32 2.272n ± 1% 2.286n ± 2% ~ (p=0.763 n=20)
ExpFloat64-32 3.762n ± 1% 3.744n ± 1% ~ (p=0.171 n=20)
NormFloat64-32 3.706n ± 1% 3.655n ± 2% ~ (p=0.066 n=20)
Perm3-32 32.93n ± 3% 34.62n ± 1% +5.13% (p=0.000 n=20)
Perm30-32 202.9n ± 1% 204.0n ± 1% ~ (p=0.482 n=20)
Perm30ViaShuffle-32 115.0n ± 1% 114.9n ± 1% ~ (p=0.358 n=20)
ShuffleOverhead-32 112.8n ± 1% 112.7n ± 1% ~ (p=0.692 n=20)
Concurrent-32 2.107n ± 0% 3.725n ± 1% +76.75% (p=0.000 n=20)
goos: darwin
goarch: arm64
pkg: math/rand/v2
│ bbb48afeb7.arm64 │ 5cf807d1ea.arm64 │
│ sec/op │ sec/op vs base │
ChaCha8-8 2.480n ± 0% 2.429n ± 0% -2.04% (p=0.000 n=20)
PCG_DXSM-8 2.531n ± 0% 2.530n ± 0% ~ (p=0.877 n=20)
SourceUint64-8 2.534n ± 0% 2.533n ± 0% ~ (p=0.732 n=20)
GlobalInt64-8 2.172n ± 1% 4.794n ± 0% +120.67% (p=0.000 n=20)
GlobalInt64Parallel-8 0.4320n ± 0% 0.9605n ± 0% +122.32% (p=0.000 n=20)
GlobalUint64-8 2.182n ± 0% 4.770n ± 0% +118.58% (p=0.000 n=20)
GlobalUint64Parallel-8 0.4307n ± 0% 0.9583n ± 0% +122.51% (p=0.000 n=20)
Int64-8 4.107n ± 0% 4.104n ± 0% ~ (p=0.416 n=20)
Uint64-8 4.080n ± 0% 4.080n ± 0% ~ (p=0.052 n=20)
GlobalIntN1000-8 2.814n ± 2% 5.643n ± 0% +100.50% (p=0.000 n=20)
IntN1000-8 4.141n ± 0% 4.139n ± 0% ~ (p=0.140 n=20)
Int64N1000-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.313 n=20)
Int64N1e8-8 4.140n ± 0% 4.139n ± 0% ~ (p=0.103 n=20)
Int64N1e9-8 4.139n ± 0% 4.140n ± 0% ~ (p=0.761 n=20)
Int64N2e9-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.636 n=20)
Int64N1e18-8 5.266n ± 0% 5.326n ± 1% +1.14% (p=0.001 n=20)
Int64N2e18-8 6.052n ± 0% 6.167n ± 0% +1.90% (p=0.000 n=20)
Int64N4e18-8 8.826n ± 0% 9.051n ± 0% +2.55% (p=0.000 n=20)
Int32N1000-8 4.127n ± 0% 4.132n ± 0% +0.12% (p=0.000 n=20)
Int32N1e8-8 4.126n ± 0% 4.131n ± 0% +0.12% (p=0.000 n=20)
Int32N1e9-8 4.127n ± 0% 4.132n ± 0% +0.12% (p=0.000 n=20)
Int32N2e9-8 4.132n ± 0% 4.131n ± 0% ~ (p=0.017 n=20)
Float32-8 4.109n ± 0% 4.105n ± 0% ~ (p=0.379 n=20)
Float64-8 4.107n ± 0% 4.106n ± 0% ~ (p=0.867 n=20)
ExpFloat64-8 5.339n ± 0% 5.383n ± 0% +0.82% (p=0.000 n=20)
NormFloat64-8 5.735n ± 0% 5.737n ± 1% ~ (p=0.856 n=20)
Perm3-8 26.65n ± 0% 26.80n ± 1% +0.58% (p=0.000 n=20)
Perm30-8 194.8n ± 1% 197.0n ± 0% +1.18% (p=0.000 n=20)
Perm30ViaShuffle-8 156.6n ± 0% 157.6n ± 1% +0.61% (p=0.000 n=20)
ShuffleOverhead-8 124.9n ± 0% 125.5n ± 0% +0.52% (p=0.000 n=20)
Concurrent-8 2.434n ± 3% 5.066n ± 0% +108.09% (p=0.000 n=20)
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ bbb48afeb7.386 │ 5cf807d1ea.386 │
│ sec/op │ sec/op vs base │
ChaCha8-32 11.295n ± 1% 4.748n ± 2% -57.96% (p=0.000 n=20)
PCG_DXSM-32 7.693n ± 1% 7.738n ± 2% ~ (p=0.542 n=20)
SourceUint64-32 7.658n ± 2% 7.622n ± 2% ~ (p=0.344 n=20)
GlobalInt64-32 3.473n ± 2% 7.526n ± 2% +116.73% (p=0.000 n=20)
GlobalInt64Parallel-32 0.3198n ± 0% 0.5444n ± 0% +70.22% (p=0.000 n=20)
GlobalUint64-32 3.612n ± 0% 7.575n ± 1% +109.69% (p=0.000 n=20)
GlobalUint64Parallel-32 0.3168n ± 0% 0.5403n ± 0% +70.51% (p=0.000 n=20)
Int64-32 7.673n ± 2% 7.789n ± 1% ~ (p=0.122 n=20)
Uint64-32 7.773n ± 1% 7.827n ± 2% ~ (p=0.920 n=20)
GlobalIntN1000-32 6.268n ± 1% 9.581n ± 1% +52.87% (p=0.000 n=20)
IntN1000-32 10.33n ± 2% 10.45n ± 1% ~ (p=0.233 n=20)
Int64N1000-32 10.98n ± 2% 11.01n ± 1% ~ (p=0.401 n=20)
Int64N1e8-32 11.19n ± 2% 10.97n ± 1% ~ (p=0.033 n=20)
Int64N1e9-32 11.06n ± 1% 11.08n ± 1% ~ (p=0.498 n=20)
Int64N2e9-32 11.10n ± 1% 11.01n ± 2% ~ (p=0.995 n=20)
Int64N1e18-32 15.23n ± 2% 15.04n ± 1% ~ (p=0.973 n=20)
Int64N2e18-32 15.89n ± 1% 15.85n ± 1% ~ (p=0.409 n=20)
Int64N4e18-32 18.96n ± 2% 19.34n ± 2% ~ (p=0.048 n=20)
Int32N1000-32 10.46n ± 2% 10.44n ± 2% ~ (p=0.480 n=20)
Int32N1e8-32 10.46n ± 2% 10.49n ± 2% ~ (p=0.951 n=20)
Int32N1e9-32 10.28n ± 2% 10.26n ± 1% ~ (p=0.431 n=20)
Int32N2e9-32 10.50n ± 2% 10.44n ± 2% ~ (p=0.249 n=20)
Float32-32 13.80n ± 2% 13.80n ± 2% ~ (p=0.751 n=20)
Float64-32 23.55n ± 2% 23.87n ± 0% ~ (p=0.408 n=20)
ExpFloat64-32 15.36n ± 1% 15.29n ± 2% ~ (p=0.316 n=20)
NormFloat64-32 13.57n ± 1% 13.79n ± 1% +1.66% (p=0.005 n=20)
Perm3-32 45.70n ± 2% 46.99n ± 2% +2.81% (p=0.001 n=20)
Perm30-32 399.0n ± 1% 403.8n ± 1% +1.19% (p=0.006 n=20)
Perm30ViaShuffle-32 349.0n ± 1% 350.4n ± 1% ~ (p=0.909 n=20)
ShuffleOverhead-32 322.3n ± 1% 323.8n ± 1% ~ (p=0.410 n=20)
Concurrent-32 3.331n ± 1% 7.312n ± 1% +119.50% (p=0.000 n=20)
For #61716.
Change-Id: Ibdddeed85c34d9ae397289dc899e04d4845f9ed2
Reviewed-on: https://go-review.googlesource.com/c/go/+/516860
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add runtime-internal locks to the mutex contention profile.
Store up to one call stack responsible for lock contention on the M,
until it's safe to contribute its value to the mprof table. Try to use
that limited local storage space for a relatively large source of
contention, and attribute any contention in stacks we're not able to
store to a sentinel _LostContendedLock function.
Avoid ballooning lock contention while manipulating the mprof table by
attributing to that sentinel function any lock contention experienced
while reporting lock contention.
Guard collecting real call stacks with GODEBUG=profileruntimelocks=1,
since the available data has mixed semantics; we can easily capture an
M's own wait time, but we'd prefer for the profile entry of each
critical section to describe how long it made the other Ms wait. It's
too late in the Go 1.22 cycle to make the required changes to
futex-based locks. When not enabled, attribute the time to the sentinel
function instead.
Fixes #57071
This is a roll-forward of https://go.dev/cl/528657, which was reverted
in https://go.dev/cl/543660
Reason for revert: de-flakes tests (reduces dependence on fine-grained
timers, correctly identifies contention on big-endian futex locks,
attempts to measure contention in the semaphore implementation but only
uses that secondary measurement to finish the test early, skips tests on
single-processor systems)
Change-Id: I31389f24283d85e46ad9ba8d4f514cb9add8dfb0
Reviewed-on: https://go-review.googlesource.com/c/go/+/544195
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Auto-Submit: Rhys Hiltner <rhys@justin.tv>
Run-TryBot: Rhys Hiltner <rhys@justin.tv>
|
|
This reverts commit go.dev/cl/528657.
Reason for revert: broke a lot of builders.
Change-Id: I70c33062020e997c4df67b3eaa2e886cf0da961e
Reviewed-on: https://go-review.googlesource.com/c/go/+/543660
Reviewed-by: Than McIntosh <thanm@google.com>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add runtime-internal locks to the mutex contention profile.
Store up to one call stack responsible for lock contention on the M,
until it's safe to contribute its value to the mprof table. Try to use
that limited local storage space for a relatively large source of
contention, and attribute any contention in stacks we're not able to
store to a sentinel _LostContendedLock function.
Avoid ballooning lock contention while manipulating the mprof table by
attributing to that sentinel function any lock contention experienced
while reporting lock contention.
Guard collecting real call stacks with GODEBUG=profileruntimelocks=1,
since the available data has mixed semantics; we can easily capture an
M's own wait time, but we'd prefer for the profile entry of each
critical section to describe how long it made the other Ms wait. It's
too late in the Go 1.22 cycle to make the required changes to
futex-based locks. When not enabled, attribute the time to the sentinel
function instead.
Fixes #57071
Change-Id: I3eee0ccbfc20f333b56f20d8725dfd7f3a526b41
Reviewed-on: https://go-review.googlesource.com/c/go/+/528657
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Rhys Hiltner <rhys@justin.tv>
Reviewed-by: Than McIntosh <thanm@google.com>
|
|
This CL adds four new time histogram metrics:
/sched/pauses/stopping/gc:seconds
/sched/pauses/stopping/other:seconds
/sched/pauses/total/gc:seconds
/sched/pauses/total/other:seconds
The "stopping" metrics measure the time taken to start a stop-the-world
pause. i.e., how long it takes stopTheWorldWithSema to stop all Ps.
This can be used to detect STW struggling to preempt Ps.
The "total" metrics measure the total duration of a stop-the-world
pause, from starting to stop-the-world until the world is started again.
This includes the time spent in the "start" phase.
The "gc" metrics are used for GC-related STW pauses. The "other" metrics
are used for all other STW pauses.
All of these metrics start timing in stopTheWorldWithSema only after
successfully acquiring sched.lock, thus excluding lock contention on
sched.lock. The reasoning behind this is that while waiting on
sched.lock the world is not stopped at all (all other Ps can run), so
the impact of this contention is primarily limited to the goroutine
attempting to stop-the-world. Additionally, we already have some
visibility into sched.lock contention via contention profiles (#57071).
/sched/pauses/total/gc:seconds is conceptually equivalent to
/gc/pauses:seconds, so the latter is marked as deprecated and returns
the same histogram as the former.
In the implementation, there are a few minor differences:
* For both mark and sweep termination stops, /gc/pauses:seconds started
timing prior to calling stopTheWorldWithSema, thus including lock
contention.
These details are minor enough that I do not believe the slight change
in reporting will matter. For mark termination stops, moving timing stop
into startTheWorldWithSema does have the side effect of requiring moving
other GC metric calculations outside of the STW, as they depend on the
same end time.
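For reference, a minimal reader for the new metrics (a sketch using the
standard runtime/metrics package; the names are the ones listed above):

    package main

    import (
        "fmt"
        "runtime/metrics"
    )

    func main() {
        samples := []metrics.Sample{
            {Name: "/sched/pauses/stopping/gc:seconds"},
            {Name: "/sched/pauses/stopping/other:seconds"},
            {Name: "/sched/pauses/total/gc:seconds"},
            {Name: "/sched/pauses/total/other:seconds"},
        }
        metrics.Read(samples)
        for _, s := range samples {
            h := s.Value.Float64Histogram()
            fmt.Printf("%s: %d buckets\n", s.Name, len(h.Buckets))
        }
    }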
Fixes #63340
Change-Id: Iacd0bab11bedab85d3dcfb982361413a7d9c0d05
Reviewed-on: https://go-review.googlesource.com/c/go/+/534161
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Currently tickspersecond forces a 100 millisecond sleep the first time
it's called. This isn't great for profiling short-lived programs, since
both CPU profiling and block profiling might call into it.
100 milliseconds is a long time, but it's chosen to try and capture a
decent estimate of the conversion on platforms with coarse-granularity
clocks. If the granularity is 15 ms, it'll only be 15% off at worst.
Let's try a different strategy. First, let's require 5 milliseconds of
time to have elapsed at a minimum. This should be plenty on platforms
with nanosecond time granularity from the system clock, provided the
caller of tickspersecond intends to use it for calculating durations,
not timestamps. Next, grab a timestamp as close to process start as
possible, so that we can cover some of those 5 milliseconds just during
runtime start.
Finally, this function is only ever called from normal goroutine
contexts. Let's do a regular goroutine sleep instead of a thread-level
sleep under a runtime lock, which has all sorts of nasty effects on
preemption.
While we're here, let's also rename tickspersecond to ticksPerSecond.
Also, let's write down some explicit rules of thumb on when to use this
function. Clocks are hard, and using this for timestamp conversion is
likely to make lining up those timestamps with other clocks on the
system difficult if not impossible.
Note that while this improves ticksPerSecond on platforms with good
clocks, we still end up with a pretty coarse sleep on platforms with
coarse clocks, and a pretty coarse result. On these platforms, keep the
minimum required elapsed time at 100 ms. There's not much we can do
about these platforms except spin and try to catch the clock boundary,
but at 10+ ms of granularity, that might be a lot of spinning.
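The strategy reads roughly like the following user-level sketch.
ticks() is a hypothetical stand-in for the runtime-internal cputicks;
the real code lives inside the runtime and is not expressible as an
ordinary package:

    package main

    import "time"

    func ticks() int64 { return time.Now().UnixNano() } // hypothetical stand-in

    // Grabbed as close to process start as possible, so runtime start
    // already covers part of the minimum elapsed window.
    var startTicks = ticks()
    var startTime = time.Now()

    // ticksPerSecond waits (a goroutine sleep, not a thread-level one)
    // until at least 5ms have elapsed, then converts. Suitable for
    // computing durations, not timestamps.
    func ticksPerSecond() int64 {
        const minElapsed = 5 * time.Millisecond
        for {
            elapsed := time.Since(startTime)
            if elapsed >= minElapsed {
                return int64(float64(ticks()-startTicks) / elapsed.Seconds())
            }
            time.Sleep(minElapsed - elapsed)
        }
    }

    func main() { _ = ticksPerSecond() }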
Fixes #63103.
Fixes #63078.
Change-Id: Ic32a4ba70a03bdf5c13cb80c2669c4064aa4cca2
Reviewed-on: https://go-review.googlesource.com/c/go/+/538898
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Change-Id: Ifb4844efddcb0369b0302eeab72394eeaf5c8072
Reviewed-on: https://go-review.googlesource.com/c/go/+/540022
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: shuang cui <imcusg@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The goroutine profile has close to three code paths for adding a
goroutine record to the goroutine profile: one for the goroutine that
requested the profile, one for every other goroutine, plus some special
handling for the finalizer goroutine. The first of those captured the
goroutine stack, but neglected to include that goroutine's labels.
Update the tests to check for the inclusion of labels for all three
types of goroutines, and include labels for the creator of the goroutine
profile.
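A sketch of the now-covered case (plain runtime/pprof API; nothing here
is new to this CL except that the labels actually appear):

    package main

    import (
        "context"
        "os"
        "runtime/pprof"
    )

    func main() {
        labels := pprof.Labels("worker", "demo")
        pprof.Do(context.Background(), labels, func(ctx context.Context) {
            // The goroutine requesting the profile now carries its own
            // labels in the output, like every other goroutine.
            pprof.Lookup("goroutine").WriteTo(os.Stdout, 1)
        })
    }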
For #63712
Change-Id: Id5387a5f536d3c37268c240e0b6db3d329a3d632
Reviewed-on: https://go-review.googlesource.com/c/go/+/537515
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Rhys Hiltner <rhys@justin.tv>
Reviewed-by: David Chase <drchase@google.com>
|
|
Change-Id: If93b6cfa5a598a5f4101c879a0cd88a194e4a6aa
Reviewed-on: https://go-review.googlesource.com/c/go/+/518116
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Andy Pan <panjf2000@gmail.com>
|
|
The pprof mutex profile was meant to match the Google C++ (now Abseil)
mutex profiler, originally designed and implemented by Mike Burrows.
When we worked on the Go version, pjw and I missed that C++ counts the
time each thread is blocked, even if multiple threads are blocked on a
mutex. That is, if 100 threads are blocked on the same mutex for the
same 10ms, that still counts as 1000ms of contention in C++. In Go, to
date, /debug/pprof/mutex has counted that as only 10ms of contention.
If 100 goroutines are blocked on one mutex and only 1 goroutine is
blocked on another mutex, we probably do want to see the first mutex
as being more contended, so the Abseil approach is the more useful one.
This CL adopts "contention scales with number of goroutines blocked",
to better match Abseil [1]. However, it still makes sure to attribute the
time to the unlock that caused the backup, not subsequent innocent
unlocks that were affected by the congestion. In this way it still gives
more accurate profiles than Abseil does.
[1] https://github.com/abseil/abseil-cpp/blob/lts_2023_01_25/absl/synchronization/mutex.cc#L2390
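A toy repro of the semantics change (a sketch; timings approximate):
100 goroutines blocked for ~10ms each now contribute ~1000ms of
contention rather than ~10ms, attributed to the unlock that caused the
backup.

    package main

    import (
        "os"
        "runtime"
        "runtime/pprof"
        "sync"
        "time"
    )

    func main() {
        runtime.SetMutexProfileFraction(1)

        var mu sync.Mutex
        mu.Lock()
        var wg sync.WaitGroup
        for i := 0; i < 100; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                mu.Lock() // all 100 goroutines block here
                mu.Unlock()
            }()
        }
        time.Sleep(10 * time.Millisecond)
        mu.Unlock() // contention is attributed to this unlock
        wg.Wait()

        pprof.Lookup("mutex").WriteTo(os.Stdout, 1)
    }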
Fixes #61015.
Change-Id: I7eb9e706867ffa8c0abb5b26a1b448f6eba49331
Reviewed-on: https://go-review.googlesource.com/c/go/+/506415
Run-TryBot: Russ Cox <rsc@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Currently STW events are only emitted for GC STWs. There's little reason
why the trace can't contain events for every STW: they're rare so don't
take up much space in the trace, yet being able to see when the world
was stopped is often critical to debugging certain latency issues,
especially when they stem from user-level APIs.
This change adds new "kinds" to the EvGCSTWStart event, renames the
GCSTW events to just "STW," and lets the parser deal with unknown STW
kinds for future backwards compatibility.
But, this change must break trace compatibility, so it bumps the trace
version to Go 1.21.
This change also includes a small cleanup in the trace command, which
previously checked for STW events when deciding whether user tasks
overlapped with a GC. Looking at the source, I don't see a way for STW
events to ever enter the stream that that code looks at, so that
condition has been deleted.
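A sketch of capturing both kinds of STW in one trace (runtime.GC for a
GC pause, runtime.ReadMemStats for a non-GC pause; both are ordinary
public APIs):

    package main

    import (
        "os"
        "runtime"
        "runtime/trace"
    )

    func main() {
        f, err := os.Create("stw.trace")
        if err != nil {
            panic(err)
        }
        defer f.Close()

        if err := trace.Start(f); err != nil {
            panic(err)
        }
        runtime.GC() // GC-related STW pauses
        var ms runtime.MemStats
        runtime.ReadMemStats(&ms) // a non-GC STW pause
        trace.Stop()
        // Inspect with: go tool trace stw.trace
    }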
Change-Id: I9a5dc144092c53e92eb6950e9a5504a790ac00cf
Reviewed-on: https://go-review.googlesource.com/c/go/+/494495
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
|
|
Change-Id: I1c478b704d84811caa209006c657dda82d9c4cf9
Reviewed-on: https://go-review.googlesource.com/c/go/+/488435
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Currently, filling PC traceback buffers is one of the jobs of
gentraceback. This moves it into a new function, tracebackPCs, with a
simple API built around unwinder, and changes all callers to use this
new API.
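tracebackPCs itself is runtime-internal, but the closest user-level
analog of the fill-a-PC-buffer-then-expand pattern is runtime.Callers
plus runtime.CallersFrames:

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        pcs := make([]uintptr, 32)
        n := runtime.Callers(0, pcs) // fill the PC traceback buffer
        frames := runtime.CallersFrames(pcs[:n])
        for {
            frame, more := frames.Next()
            fmt.Println(frame.Function)
            if !more {
                break
            }
        }
    }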
Updates #54466.
Change-Id: Id2038bded81bf533a5a4e71178a7c014904d938c
Reviewed-on: https://go-review.googlesource.com/c/go/+/468300
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
|
|
Change-Id: If7a248374dcb2c276d2d85a4863eb2ed1bc246a4
Reviewed-on: https://go-review.googlesource.com/c/go/+/463226
Run-TryBot: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Rhys Hiltner <rhys@justin.tv>
|
|
Samples in the mutex profile have their count and duration scaled
according to the probability they were sampled. This is done when the
profile is actually requested. The adjustment is done using the
currently configured sampling rate. However, if the sample rate is changed
after a specific sample is recorded, then the sample will be scaled
incorrectly. In particular, if the sampling rate is changed to 0, all of
the samples in the encoded profile will have 0 count and duration. This
means the profile will be "empty", even if it should have had samples.
This CL scales the samples in the profile when they are recorded, rather
than when the profile is requested. This matches what is currently done
for the block profile.
With this change, neither the block profile nor the mutex profile is scaled
when they are encoded, so the logic for scaling the samples can be
removed.
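A sketch of the failure mode this fixes (ordinary public APIs; exact
output depends on scheduling): record contention at one rate, then drop
the rate to 0 before encoding. Previously the encoded profile scaled by
the current rate and came out empty; now samples are scaled when
recorded.

    package main

    import (
        "os"
        "runtime"
        "runtime/pprof"
        "sync"
        "time"
    )

    func main() {
        runtime.SetMutexProfileFraction(1)

        var mu sync.Mutex
        mu.Lock()
        go func() {
            time.Sleep(10 * time.Millisecond)
            mu.Unlock() // records the contention below
        }()
        mu.Lock() // blocks ~10ms; counted as contention
        mu.Unlock()

        runtime.SetMutexProfileFraction(0) // stop sampling; old samples stay

        pprof.Lookup("mutex").WriteTo(os.Stdout, 1)
    }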
Change-Id: If228cf39284385aa8fb9a2d62492d839e02f027f
Reviewed-on: https://go-review.googlesource.com/c/go/+/443056
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Fixes #55100
Change-Id: Ibbff921e74c3a416fd8bb019d20410273961c015
Reviewed-on: https://go-review.googlesource.com/c/go/+/431315
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Use an atomic.Uint32 to represent the state of the finalizer goroutine.
fingStatus will only be changed to fingWake in a non-fingWait state,
so it is safe to set fingRunningFinalizer status in runfinq.
name old time/op new time/op delta
Finalizer-8 592µs ± 4% 561µs ± 1% -5.22% (p=0.000 n=10+10)
FinalizerRun-8 694ns ± 6% 675ns ± 7% ~ (p=0.059 n=9+8)
Change-Id: I7e4da30cec98ce99f7d8cf4c97f933a8a2d1cae1
Reviewed-on: https://go-review.googlesource.com/c/go/+/400134
Reviewed-by: Joedian Reid <joedian@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Updates #46731
Change-Id: Ic2208c8bb639aa1e390be0d62e2bd799ecf20654
Reviewed-on: https://go-review.googlesource.com/c/go/+/421878
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
|
|
"finializer" => "finalizer"
Change-Id: Ia3c12bff8556b6a8d33b700c59357f47502757b1
GitHub-Last-Rev: c64cf47974020c8480039ba61d0890bdc07a3b0f
GitHub-Pull-Request: golang/go#53917
Reviewed-on: https://go-review.googlesource.com/c/go/+/417915
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Change-Id: Iee18987c495d1d4bde9da888d454eea8079d3ebc
GitHub-Last-Rev: ff5e01599ddf7deb3ab6ce190ba92eb02ae2cb15
GitHub-Pull-Request: golang/go#52949
Reviewed-on: https://go-review.googlesource.com/c/go/+/406915
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
|
|
If SetFinalizer is never called, we might call readgstatus on a nil fing
variable, resulting in a crash. This change guards code that accesses
fing with a nil check.
Fixes #52821.
Change-Id: I3e8e7004f97f073dc622e801a1d37003ea153a29
Reviewed-on: https://go-review.googlesource.com/c/go/+/405475
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Rhys Hiltner <rhys@justin.tv>
|
|
The profiles for memory allocations, sync.Mutex contention, and general
blocking store their data in a shared hash table. The bookkeeping work
at the end of a garbage collection cycle involves maintenance on each
memory allocation record. Previously, a single lock guarded access to
the hash table and the contents of all records. When a program has
allocated memory at a large number of unique call stacks, the
maintenance following every garbage collection can hold that lock for
several milliseconds. That can prevent progress on all other goroutines
by delaying acquirep's call to mcache.prepareForSweep, which needs the
lock in mProf_Free to report when a profiled allocation is no longer in
use. With no user goroutines making progress, it is in effect a
multi-millisecond GC-related stop-the-world pause.
Split the lock so the call to mProf_Flush no longer delays each P's call
to mProf_Free: mProf_Free uses a lock on the memory records' N+1 cycle,
and mProf_Flush uses locks on the memory records' accumulator and their
N cycle. mProf_Malloc also no longer competes with mProf_Flush, as it
uses a lock on the memory records' N+2 cycle. The profiles for
sync.Mutex contention and general blocking now share a separate lock,
and another lock guards insertions to the shared hash table (uncommon in
the steady-state). Consumers of each type of profile take the matching
accumulator lock, so will observe consistent count and magnitude values
for each record.
For #45894
Change-Id: I615ff80618d10e71025423daa64b0b7f9dc57daa
Reviewed-on: https://go-review.googlesource.com/c/go/+/399956
Reviewed-by: Carlos Amedee <carlos@golang.org>
Run-TryBot: Rhys Hiltner <rhys@justin.tv>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
The goroutine profile needs to stop the world to get a consistent
snapshot of all goroutines in the app. Leaving the world stopped while
iterating over allgs leads to a pause proportional to the number of
goroutines in the app (or its high-water mark).
Instead, do only a fixed amount of bookkeeping while the world is
stopped. Install a barrier so the scheduler confirms that a goroutine
appears in the profile, with its stack recorded exactly as it was during
the stop-the-world pause, before it allows that goroutine to execute.
Iterate over allgs while the app resumes normal operations, adding each
to the profile unless they've been scheduled in the meantime (and so
have profiled themselves). Stop the world a second time to remove the
barrier and do a fixed amount of cleanup work.
This increases both the fixed overhead and per-goroutine CPU-time cost
of GoroutineProfile. It also increases the wall-clock latency of the
call to GoroutineProfile, since the scheduler may interrupt it to
execute other goroutines.
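For context, the public API whose cost profile changes here, with the
usual size-then-fill calling pattern (a sketch):

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        n, _ := runtime.GoroutineProfile(nil) // ask for the record count
        records := make([]runtime.StackRecord, n+10)
        n, ok := runtime.GoroutineProfile(records)
        if !ok {
            return // raced with goroutine creation; real code would retry
        }
        fmt.Println("captured", n, "goroutines")
    }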
name old time/op new time/op delta
GoroutineProfile/small/loaded-8 1.05ms ± 5% 4.99ms ±31% +376.85% (p=0.000 n=10+9)
GoroutineProfile/sparse/loaded-8 1.04ms ± 4% 3.61ms ±27% +246.61% (p=0.000 n=10+10)
GoroutineProfile/large/loaded-8 7.69ms ±17% 20.35ms ± 4% +164.50% (p=0.000 n=10+10)
GoroutineProfile/small/idle 958µs ± 0% 1820µs ±23% +89.91% (p=0.000 n=10+10)
GoroutineProfile/sparse/idle-8 1.00ms ± 3% 1.52ms ±17% +51.18% (p=0.000 n=10+10)
GoroutineProfile/small/idle-8 1.01ms ± 4% 1.47ms ± 7% +45.28% (p=0.000 n=9+9)
GoroutineProfile/sparse/idle 980µs ± 1% 1403µs ± 2% +43.22% (p=0.000 n=9+10)
GoroutineProfile/large/idle-8 7.19ms ± 8% 8.43ms ±21% +17.22% (p=0.011 n=10+10)
PingPongHog 511ns ± 8% 585ns ± 9% +14.39% (p=0.000 n=10+10)
GoroutineProfile/large/idle 6.71ms ± 0% 7.58ms ± 3% +13.08% (p=0.000 n=8+10)
PingPongHog-8 469ns ± 8% 509ns ±12% +8.62% (p=0.010 n=9+10)
WakeupParallelSyscall/5µs 216µs ± 4% 229µs ± 3% +6.06% (p=0.000 n=10+9)
WakeupParallelSyscall/5µs-8 147µs ± 1% 149µs ± 2% +1.12% (p=0.009 n=10+10)
WakeupParallelSyscall/2µs-8 140µs ± 0% 142µs ± 1% +1.11% (p=0.001 n=10+9)
WakeupParallelSyscall/50µs-8 236µs ± 0% 238µs ± 1% +1.08% (p=0.000 n=9+10)
WakeupParallelSyscall/1µs-8 138µs ± 0% 140µs ± 1% +1.05% (p=0.013 n=10+9)
Matmult 8.52ns ± 1% 8.61ns ± 0% +0.98% (p=0.002 n=10+8)
WakeupParallelSyscall/10µs-8 157µs ± 1% 158µs ± 1% +0.58% (p=0.003 n=10+8)
CreateGoroutinesSingle-8 328ns ± 0% 330ns ± 1% +0.57% (p=0.000 n=9+9)
WakeupParallelSpinning/100µs-8 343µs ± 0% 344µs ± 1% +0.30% (p=0.015 n=8+8)
WakeupParallelSyscall/20µs-8 178µs ± 0% 178µs ± 0% +0.18% (p=0.043 n=10+9)
StackGrowthDeep-8 22.8µs ± 0% 22.9µs ± 0% +0.12% (p=0.006 n=10+10)
StackGrowth 1.06µs ± 0% 1.06µs ± 0% +0.09% (p=0.000 n=8+9)
WakeupParallelSpinning/0s 10.7µs ± 0% 10.7µs ± 0% +0.08% (p=0.000 n=9+9)
WakeupParallelSpinning/5µs 30.7µs ± 0% 30.7µs ± 0% +0.04% (p=0.000 n=10+10)
WakeupParallelSpinning/100µs 411µs ± 0% 411µs ± 0% +0.03% (p=0.000 n=10+9)
WakeupParallelSpinning/2µs 18.7µs ± 0% 18.7µs ± 0% +0.02% (p=0.026 n=10+10)
WakeupParallelSpinning/20µs-8 93.0µs ± 0% 93.0µs ± 0% +0.01% (p=0.021 n=9+10)
StackGrowth-8 216ns ± 0% 216ns ± 0% ~ (p=0.209 n=10+10)
CreateGoroutinesParallel-8 49.5ns ± 2% 49.3ns ± 1% ~ (p=0.591 n=10+10)
CreateGoroutinesSingle 699ns ±20% 748ns ±19% ~ (p=0.353 n=10+10)
WakeupParallelSpinning/0s-8 15.9µs ± 2% 16.0µs ± 3% ~ (p=0.315 n=10+10)
WakeupParallelSpinning/1µs 14.6µs ± 0% 14.6µs ± 0% ~ (p=0.513 n=10+10)
WakeupParallelSpinning/2µs-8 24.2µs ± 3% 24.1µs ± 2% ~ (p=0.971 n=10+10)
WakeupParallelSpinning/10µs 50.7µs ± 0% 50.7µs ± 0% ~ (p=0.101 n=10+10)
WakeupParallelSpinning/20µs 90.7µs ± 0% 90.7µs ± 0% ~ (p=0.898 n=10+10)
WakeupParallelSpinning/50µs 211µs ± 0% 211µs ± 0% ~ (p=0.382 n=10+10)
WakeupParallelSyscall/0s-8 137µs ± 1% 138µs ± 0% ~ (p=0.075 n=10+10)
WakeupParallelSyscall/1µs 216µs ± 1% 219µs ± 3% ~ (p=0.065 n=10+9)
WakeupParallelSyscall/2µs 214µs ± 7% 219µs ± 1% ~ (p=0.101 n=10+8)
WakeupParallelSyscall/50µs 317µs ± 5% 326µs ± 4% ~ (p=0.123 n=10+10)
WakeupParallelSyscall/100µs 450µs ± 9% 459µs ± 8% ~ (p=0.247 n=10+10)
WakeupParallelSyscall/100µs-8 337µs ± 0% 338µs ± 1% ~ (p=0.089 n=10+10)
WakeupParallelSpinning/5µs-8 32.2µs ± 0% 32.2µs ± 0% -0.05% (p=0.026 n=9+10)
WakeupParallelSpinning/50µs-8 216µs ± 0% 216µs ± 0% -0.12% (p=0.004 n=10+10)
WakeupParallelSpinning/1µs-8 20.6µs ± 0% 20.5µs ± 0% -0.22% (p=0.014 n=10+10)
WakeupParallelSpinning/10µs-8 54.5µs ± 0% 54.2µs ± 1% -0.57% (p=0.000 n=10+10)
CreateGoroutines-8 213ns ± 1% 211ns ± 1% -0.86% (p=0.002 n=10+10)
CreateGoroutinesCapture 1.03µs ± 0% 1.02µs ± 0% -0.91% (p=0.000 n=10+10)
CreateGoroutinesCapture-8 1.32µs ± 1% 1.31µs ± 1% -1.06% (p=0.001 n=10+9)
CreateGoroutines 188ns ± 0% 186ns ± 0% -1.06% (p=0.000 n=9+10)
CreateGoroutinesParallel 188ns ± 0% 186ns ± 0% -1.27% (p=0.000 n=8+10)
WakeupParallelSyscall/0s 210µs ± 3% 207µs ± 3% -1.60% (p=0.043 n=10+10)
StackGrowthDeep 121µs ± 1% 119µs ± 1% -1.70% (p=0.000 n=9+10)
Matmult-8 1.82ns ± 3% 1.78ns ± 3% -2.16% (p=0.020 n=10+10)
WakeupParallelSyscall/20µs 281µs ± 3% 269µs ± 4% -4.44% (p=0.000 n=10+10)
WakeupParallelSyscall/10µs 239µs ± 3% 228µs ± 9% -4.70% (p=0.001 n=10+10)
GoroutineProfile/sparse-nil/idle-8 485µs ± 2% 12µs ± 4% -97.56% (p=0.000 n=10+10)
GoroutineProfile/small-nil/idle-8 484µs ± 2% 12µs ± 1% -97.60% (p=0.000 n=10+7)
GoroutineProfile/small-nil/loaded-8 487µs ± 2% 11µs ± 3% -97.68% (p=0.000 n=10+10)
GoroutineProfile/sparse-nil/loaded-8 507µs ± 4% 11µs ± 6% -97.78% (p=0.000 n=10+10)
GoroutineProfile/large-nil/idle-8 709µs ± 2% 11µs ± 4% -98.38% (p=0.000 n=10+10)
GoroutineProfile/large-nil/loaded-8 717µs ± 2% 11µs ± 3% -98.43% (p=0.000 n=10+10)
GoroutineProfile/sparse-nil/idle 465µs ± 3% 1µs ± 1% -99.84% (p=0.000 n=10+10)
GoroutineProfile/small-nil/idle 493µs ± 3% 1µs ± 0% -99.85% (p=0.000 n=10+9)
GoroutineProfile/large-nil/idle 716µs ± 1% 1µs ± 2% -99.89% (p=0.000 n=7+10)
name old alloc/op new alloc/op delta
CreateGoroutinesCapture 144B ± 0% 144B ± 0% ~ (all equal)
CreateGoroutinesCapture-8 144B ± 0% 144B ± 0% ~ (all equal)
name old allocs/op new allocs/op delta
CreateGoroutinesCapture 5.00 ± 0% 5.00 ± 0% ~ (all equal)
CreateGoroutinesCapture-8 5.00 ± 0% 5.00 ± 0% ~ (all equal)
name old p50-ns new p50-ns delta
GoroutineProfile/small/loaded-8 1.01M ± 3% 3.87M ±45% +282.15% (p=0.000 n=10+10)
GoroutineProfile/sparse/loaded-8 1.02M ± 3% 2.43M ±41% +138.42% (p=0.000 n=10+10)
GoroutineProfile/large/loaded-8 7.43M ±16% 17.28M ± 2% +132.43% (p=0.000 n=10+10)
GoroutineProfile/small/idle 956k ± 0% 1559k ±16% +63.03% (p=0.000 n=10+10)
GoroutineProfile/small/idle-8 1.01M ± 3% 1.45M ± 7% +44.31% (p=0.000 n=10+9)
GoroutineProfile/sparse/idle 977k ± 1% 1399k ± 2% +43.20% (p=0.000 n=10+10)
GoroutineProfile/sparse/idle-8 1.00M ± 3% 1.41M ± 3% +40.47% (p=0.000 n=10+10)
GoroutineProfile/large/idle-8 6.97M ± 1% 8.41M ±25% +20.54% (p=0.003 n=8+10)
GoroutineProfile/large/idle 6.71M ± 1% 7.46M ± 4% +11.15% (p=0.000 n=10+10)
GoroutineProfile/sparse-nil/idle-8 483k ± 3% 13k ± 3% -97.41% (p=0.000 n=10+9)
GoroutineProfile/small-nil/idle-8 483k ± 2% 12k ± 1% -97.43% (p=0.000 n=10+8)
GoroutineProfile/small-nil/loaded-8 484k ± 3% 10k ± 2% -97.93% (p=0.000 n=10+8)
GoroutineProfile/sparse-nil/loaded-8 492k ± 2% 10k ± 4% -97.97% (p=0.000 n=10+8)
GoroutineProfile/large-nil/idle-8 708k ± 2% 12k ±15% -98.36% (p=0.000 n=10+10)
GoroutineProfile/large-nil/loaded-8 714k ± 2% 10k ± 2% -98.60% (p=0.000 n=10+8)
GoroutineProfile/sparse-nil/idle 459k ± 1% 1k ± 1% -99.85% (p=0.000 n=10+10)
GoroutineProfile/small-nil/idle 477k ± 1% 1k ± 0% -99.85% (p=0.000 n=10+9)
GoroutineProfile/large-nil/idle 712k ± 1% 1k ± 1% -99.90% (p=0.000 n=7+10)
name old p90-ns new p90-ns delta
GoroutineProfile/small/loaded-8 1.13M ±10% 7.49M ±35% +562.07% (p=0.000 n=10+10)
GoroutineProfile/sparse/loaded-8 1.10M ±12% 4.58M ±31% +318.02% (p=0.000 n=10+9)
GoroutineProfile/large/loaded-8 8.78M ±24% 27.83M ± 2% +217.00% (p=0.000 n=10+10)
GoroutineProfile/small/idle 967k ± 0% 2909k ±50% +200.91% (p=0.000 n=10+10)
GoroutineProfile/sparse/idle-8 1.02M ± 3% 1.96M ±76% +92.99% (p=0.000 n=10+10)
GoroutineProfile/small/idle-8 1.07M ±17% 1.55M ±12% +45.23% (p=0.000 n=10+10)
GoroutineProfile/sparse/idle 992k ± 1% 1417k ± 3% +42.79% (p=0.000 n=9+10)
GoroutineProfile/large/idle 6.73M ± 0% 7.99M ± 8% +18.80% (p=0.000 n=8+10)
GoroutineProfile/large/idle-8 8.20M ±25% 9.18M ±25% ~ (p=0.315 n=10+10)
GoroutineProfile/sparse-nil/idle-8 495k ± 3% 13k ± 1% -97.36% (p=0.000 n=10+9)
GoroutineProfile/small-nil/idle-8 494k ± 2% 13k ± 3% -97.36% (p=0.000 n=10+10)
GoroutineProfile/small-nil/loaded-8 496k ± 2% 13k ± 1% -97.41% (p=0.000 n=10+10)
GoroutineProfile/sparse-nil/loaded-8 544k ±11% 13k ± 1% -97.62% (p=0.000 n=10+9)
GoroutineProfile/large-nil/idle-8 724k ± 1% 13k ± 3% -98.20% (p=0.000 n=10+10)
GoroutineProfile/large-nil/loaded-8 729k ± 3% 13k ± 2% -98.23% (p=0.000 n=10+10)
GoroutineProfile/sparse-nil/idle 476k ± 4% 1k ± 1% -99.85% (p=0.000 n=9+10)
GoroutineProfile/small-nil/idle 537k ±10% 1k ± 0% -99.87% (p=0.000 n=10+9)
GoroutineProfile/large-nil/idle 729k ± 0% 1k ± 1% -99.90% (p=0.000 n=7+10)
name old p99-ns new p99-ns delta
GoroutineProfile/sparse/loaded-8 1.27M ±33% 20.49M ±17% +1514.61% (p=0.000 n=10+10)
GoroutineProfile/small/loaded-8 1.37M ±29% 20.48M ±23% +1399.35% (p=0.000 n=10+10)
GoroutineProfile/large/loaded-8 9.76M ±23% 39.98M ±22% +309.52% (p=0.000 n=10+8)
GoroutineProfile/small/idle 976k ± 1% 3367k ±55% +244.94% (p=0.000 n=10+10)
GoroutineProfile/sparse/idle-8 1.03M ± 3% 2.50M ±65% +142.30% (p=0.000 n=10+10)
GoroutineProfile/small/idle-8 1.17M ±34% 1.70M ±14% +45.15% (p=0.000 n=10+10)
GoroutineProfile/sparse/idle 1.02M ± 3% 1.45M ± 4% +42.64% (p=0.000 n=9+10)
GoroutineProfile/large/idle 6.92M ± 2% 9.00M ± 7% +29.98% (p=0.000 n=8+9)
GoroutineProfile/large/idle-8 8.74M ±23% 9.90M ±24% ~ (p=0.190 n=10+10)
GoroutineProfile/sparse-nil/idle-8 508k ± 4% 16k ± 2% -96.90% (p=0.000 n=10+9)
GoroutineProfile/small-nil/idle-8 508k ± 4% 16k ± 3% -96.91% (p=0.000 n=10+9)
GoroutineProfile/small-nil/loaded-8 542k ± 5% 15k ±15% -97.15% (p=0.000 n=10+10)
GoroutineProfile/sparse-nil/loaded-8 649k ±16% 15k ±18% -97.67% (p=0.000 n=10+10)
GoroutineProfile/large-nil/idle-8 738k ± 2% 16k ± 2% -97.86% (p=0.000 n=10+10)
GoroutineProfile/large-nil/loaded-8 765k ± 4% 15k ±17% -98.03% (p=0.000 n=10+10)
GoroutineProfile/sparse-nil/idle 539k ±26% 1k ±17% -99.84% (p=0.000 n=10+10)
GoroutineProfile/small-nil/idle 659k ±25% 1k ± 0% -99.84% (p=0.000 n=10+8)
GoroutineProfile/large-nil/idle 760k ± 2% 1k ±22% -99.88% (p=0.000 n=9+10)
Fixes #33250
For #50794
Change-Id: I862a2bc4e991cec485f21a6fce4fca84f2c6435b
Reviewed-on: https://go-review.googlesource.com/c/go/+/387415
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Rhys Hiltner <rhys@justin.tv>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
The race annotations for goroutine label maps covered the special type
of read necessary to create CPU profiles. Extend that to include
goroutine profiles. Annotate the copy involved in creating new
goroutines.
Fixes #50292
Change-Id: I10f69314e4f4eba85c506590fe4781f4d6b8ec2d
Reviewed-on: https://go-review.googlesource.com/c/go/+/385660
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
|
|
When acquiring a goroutine profile, we stop the world and then acquire a
stack trace for each goroutine. When cgo traceback is used, the
traceback code may call the cgo traceback function using cgocall.
As the world is stopped, cgocall will be blocked at exitsyscall,
causing a deadlock. Bypass the scheduler (using asmcgocall) to fix
this.
Change-Id: Ic4e596adc3711310b6a983d73786d697ef15dd72
Reviewed-on: https://go-review.googlesource.com/c/go/+/362757
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
Add explicit address sanitizer instrumentation to the runtime and
syscall packages. The compiler does not instrument the runtime
package. It does instrument the syscall package, but we need to add
a couple of cases that it can't see.
Following the implementation of the asan malloc runtime library,
this patch also allocates extra memory as a redzone around each
returned memory region and marks the redzone as unaddressable to
detect overflows and underflows.
Updates #44853.
Change-Id: I2753d1cc1296935a66bf521e31ce91e35fcdf798
Reviewed-on: https://go-review.googlesource.com/c/go/+/298614
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Trust: fannie zhang <Fannie.Zhang@arm.com>
|
|
We have a constant for 179999; don't duplicate it.
Change-Id: Iefb9c4746f6dda2e08b42e3c978963198469ee8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/277375
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Pratt <mpratt@google.com>
|
|
At this point all funcPC references are ABIInternal functions.
Replace with the intrinsics.
Change-Id: I3ba7e485c83017408749b53f92877d3727a75e27
Reviewed-on: https://go-review.googlesource.com/c/go/+/321954
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Block profiles were biased towards infrequent long events over frequent
short events. This fix corrects the bias by aggregating shorter events
as longer but less frequent in the profiles. As a result their
cumulative duration will be accurately represented in the profile
without skewing their sample mean (duration/count).
Credit to @dvyukov for suggesting to adjust the count in the
saveblockevent function.
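The adjustment, sketched as standalone arithmetic (the real change is
inside the runtime's saveblockevent; names here are illustrative):

    package main

    import "fmt"

    // recordBlockEvent sketches the bias correction: an event shorter
    // than the sampling rate is recorded with a proportionally larger
    // count and its duration rounded up to the rate, so expected totals
    // come out right without skewing duration/count.
    func recordBlockEvent(count *float64, totalCycles *int64, cycles, rate int64) {
        if cycles < rate {
            *count += float64(rate) / float64(cycles) // weight up short events
            *totalCycles += rate
        } else {
            *count++
            *totalCycles += cycles
        }
    }

    func main() {
        var count float64
        var total int64
        recordBlockEvent(&count, &total, 100, 10000) // 100-cycle event, rate 10000
        fmt.Println(count, total)                    // 100 10000
    }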
Fixes #44192.
Change-Id: I71a99d7f6ebdb2d484d44890a2517863cceb4004
Reviewed-on: https://go-review.googlesource.com/c/go/+/299991
Trust: Michael Pratt <mpratt@google.com>
Trust: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
If runtime.MemProfile is unreachable, default to not collecting any
memory profiling samples, to save memory on the hash table.
Fixes #42347
Change-Id: I9a4894a5fc77035fe59b1842e1ec77a1182e70c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/299671
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: Keith Randall <khr@golang.org>
|
|
Correctly accessing allgs is a bit hairy. Some paths need to lock
allglock, some don't. Those that don't are safest using atomicAllG, but
usage is not consistent.
Rather than doing this ad-hoc, move all access* through forEachG /
forEachGRace, the locking and atomic versions, respectively. This will
make it easier to ensure safe access.
* markroot is the only exception, as it has a far-removed guarantee of
safe access via an atomic load of allglen far before actual use.
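The shape of the two helpers, transplanted out of the runtime so the
sketch compiles standalone (g, allglock, and the atomic snapshot are
the real runtime names; the types here are stand-ins):

    package main

    import "sync"

    type g struct{ id int }

    var (
        allglock sync.Mutex
        allgs    []*g
    )

    // forEachG holds allglock, excluding concurrent addition of new Gs.
    func forEachG(fn func(gp *g)) {
        allglock.Lock()
        defer allglock.Unlock()
        for _, gp := range allgs {
            fn(gp)
        }
    }

    // forEachGRace walks a racy snapshot: Gs added during the walk may
    // be missed, but every pointer observed is valid.
    func forEachGRace(fn func(gp *g)) {
        snapshot := allgs // the runtime does an atomic load of (ptr, len)
        for _, gp := range snapshot {
            fn(gp)
        }
    }

    func main() {
        allgs = []*g{{1}, {2}}
        forEachG(func(gp *g) { _ = gp.id })
        forEachGRace(func(gp *g) { _ = gp.id })
    }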
Change-Id: Ie1c7a8243e155ae2b4bc3143577380c695680e89
Reviewed-on: https://go-review.googlesource.com/c/go/+/279994
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Goroutines are directly associated with labels. It's relatively easy to
plumb those through without creating goroutine-locals in the wild.
This is accomplished by splitting out most of the code from the public
`runtime.GoroutineProfile` into a new unexported
`runtime.goroutineProfileWithLabels`, which then has a thin wrapper
linked into the `runtime/pprof` package as
`runtime_goroutineProfileWithLabels`. (mirroring the way labels get
associated with the `g` for a goroutine in the first place)
Per #6104, OS-thread creation profiles are a bit useless, as `M`s tend
to be created by a background goroutine. As such, I decided
not to add support for capturing the labels at `M`-creation-time, since
the stack-traces seem to always come out `nil` for my simple test
binaries.
This change currently provides labels for debug=0 and debug=1, as
debug=2 is currently entirely generated by the runtime package and I
don't see a clean way of getting the `labelMap` type handled properly
within the `runtime` package.
Update the comment added in cl/131275 to mention goroutine support for
labels.
Updates #23458
Change-Id: Ia4b558893d7d10156b77121cd9b70c4ccd9e1889
Reviewed-on: https://go-review.googlesource.com/c/go/+/189318
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
|
|
Currently, isSystemGoroutine varies on whether it considers the
finalizer goroutine a user goroutine or a system goroutine. For the
next CL, we're going to want to always consider the finalizer goroutine
a user goroutine, so add a flag that indicates that.
Updates #26903. This is preparation for unifying STW GC and concurrent
GC.
Change-Id: Iafc92e519c13d9f8d879332cb5f0d12164104c33
Reviewed-on: https://go-review.googlesource.com/c/134778
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
getcallersp is intrinsified, and so the dummy arg is no longer
needed. Remove it, as well as a few dummy args that are solely
to feed getcallersp.
Change-Id: Ibb6c948ff9c56537042b380ac3be3a91b247aaa6
Reviewed-on: https://go-review.googlesource.com/109596
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Change-Id: Ie4da1a515d5405140d742bdcd55f54a73a7f71fe
Reviewed-on: https://go-review.googlesource.com/108175
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
Now that getcallerpc is a compiler intrinsic on x86 and non-x86
platforms don't need the argument, we can drop it.
Sadly, this doesn't let us remove any dummy arguments since all of
those cases also use getcallersp, which still takes the argument
pointer, but this is at least an improvement.
Change-Id: I9c34a41cf2c18cba57f59938390bf9491efb22d2
Reviewed-on: https://go-review.googlesource.com/65474
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Currently runtime.GC() triggers a STW GC. For common uses in tests and
benchmarks, it doesn't matter whether it's STW or concurrent, but for
uses in servers for things like collecting heap profiles and
controlling memory footprint, this pause can be a problem for
latency.
This changes runtime.GC() to trigger a concurrent GC. In order to
remain as close as possible to its current meaning, we define it to
always perform a full mark/sweep GC cycle before returning (even if
that means it has to finish up a cycle we're in the middle of first)
and to publish the heap profile as of the triggered mark termination.
While it must perform a full cycle, simultaneous runtime.GC() calls
can be consolidated into a single full cycle.
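This is what makes the common profiling pattern below safe without a
long pause (a sketch; both calls are ordinary public APIs):

    package main

    import (
        "os"
        "runtime"
        "runtime/pprof"
    )

    func main() {
        f, err := os.Create("heap.pprof")
        if err != nil {
            panic(err)
        }
        defer f.Close()

        runtime.GC() // full concurrent cycle; publishes a fresh heap profile
        if err := pprof.WriteHeapProfile(f); err != nil {
            panic(err)
        }
    }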
Fixes #18216.
Change-Id: I9088cc5deef4ab6bcf0245ed1982a852a01c44b5
Reviewed-on: https://go-review.googlesource.com/37520
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
Currently we snapshot the heap profile just *after* mark termination
starts the world because it's a relatively expensive operation.
However, this means any alloc or free events that happen between
starting the world and snapshotting the heap profile can be accounted
to the wrong cycle. In the worst case, a free can be accounted to the
cycle before the alloc; if the heap is small, this can result
temporarily in a negative "in use" count in the profile.
Fix this without making STW more expensive by using a global heap
profile cycle counter. This lets us split up the operation into a two
parts: 1) a super-cheap snapshot operation that simply increments the
global cycle counter during STW, and 2) a more expensive cleanup
operation we can do after starting the world that frees up a slot in
all buckets for use by the next heap profile cycle.
Fixes #19311.
Change-Id: I6bdafabf111c48b3d26fe2d91267f7bef0bd4270
Reviewed-on: https://go-review.googlesource.com/37714
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
Currently memRecord has the same set of four fields repeated three
times. Pull these into a type and use this type three times. This
cleans up and simplifies the code a bit and will make it easier to
switch to a globally tracked heap profile cycle for #19311.
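The shape of the refactor (a sketch; memRecordCycle's fields match
mprof.go, the surrounding memRecord is abridged):

    package main

    // memRecordCycle pulls the four repeated counters into one type.
    type memRecordCycle struct {
        allocs, frees           uintptr
        alloc_bytes, free_bytes uintptr
    }

    // memRecord then holds three cycles instead of twelve scalar fields.
    type memRecord struct {
        active, prev, recent memRecordCycle
    }

    func main() {
        var r memRecord
        r.recent.allocs++
        _ = r
    }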
Change-Id: I414d15673feaa406a8366b48784437c642997cf2
Reviewed-on: https://go-review.googlesource.com/37713
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
Every time I modify heap profiling, I find myself redrawing this
diagram, so add it to the comments. This shows how allocations and
frees are accounted, how we arrive at consistent profile snapshots,
and when those snapshots are published to the user.
Change-Id: I106aba1200af3c773b46e24e5f50205e808e2c69
Reviewed-on: https://go-review.googlesource.com/37514
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
Found by github.com/mvdan/unparam.
Change-Id: Iabcdfec2ae42c735aa23210b7183080d750682ca
Reviewed-on: https://go-review.googlesource.com/38030
Reviewed-by: Peter Weinberger <pjw@google.com>
Run-TryBot: Peter Weinberger <pjw@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
On solaris/amd64 sometimes the reported cycle count is negative. Replace
with 0.
Change-Id: I364eea5ca072281245c7ab3afb0bf69adc3a8eae
Reviewed-on: https://go-review.googlesource.com/32258
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
runtime.SetMutexProfileFraction(n int) will capture 1/n-th of stack
traces of goroutines holding contended mutexes if n > 0. From runtime/pprof,
pprof.Lookup("mutex").WriteTo writes the accumulated
stack traces to w (in essentially the same format that blocking
profiling uses).
Change-Id: Ie0b54fa4226853d99aa42c14cb529ae586a8335a
Reviewed-on: https://go-review.googlesource.com/29650
Reviewed-by: Austin Clements <austin@google.com>
|
|
This covers basically all sysAlloc'd, persistentalloc'd, and
fixalloc'd types.
Change-Id: I0487c887c2a0ade5e33d4c4c12d837e97468e66b
Reviewed-on: https://go-review.googlesource.com/30941
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
Instrumenting copy and append for the race detector changes them to call
different functions. In the runtime package the alternate functions are
not marked as nosplit. This caused a crash in the SIGPROF handler when
invoked on a non-Go thread in a program built with the race detector. In
some cases the handler can call copy, the race detector changed that to
a call to a non-nosplit function, the function tried to check the stack
guard, and crashed because it was running on a non-Go thread. The
SIGPROF handler is written carefully to avoid such problems, but hidden
function calls are difficult to avoid.
Fix this by changing the compiler to not instrument copy and append when
compiling the runtime package. Change the runtime package to add
explicit race checks for the only code I could find where copy is used
to write to user data (append is never used).
Change-Id: I11078a66c0aaa459a7d2b827b49f4147922050af
Reviewed-on: https://go-review.googlesource.com/29472
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
|
|
Change-Id: I37706ff0a3486827c5b072c95ad890ea87ede847
Reviewed-on: https://go-review.googlesource.com/28210
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
linux/amd64:
cmd/go: -8KB (basically nothing)
linux/amd64 PIE:
cmd/go: -191KB (1.6%)
jujud: -1.5MB (1.9%)
Updates #6853
Fixes #15064
Change-Id: I0adbb95685e28be92e8548741df0e11daa0a9b5f
Reviewed-on: https://go-review.googlesource.com/21777
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
Automated refactoring produced using github.com/mdempsky/unconvert.
Change-Id: Iacf871a4f221ef17f48999a464ab2858b2bbaa90
Reviewed-on: https://go-review.googlesource.com/20071
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|