| Age | Commit message (Collapse) | Author |
|
This reverts commit f9ba2cff2286d378eca28c841bea8488c69fc30e (CL 586237)
Reason for revert: This is part of a patch series that changed the
handling of contended lock2/unlock2 calls, reducing the maximum
throughput of contended runtime.mutex values, and causing a performance
regression on applications where that is (or became) the bottleneck.
This test verifies that the semantics of the mutex profile for
runtime.mutex values matches that of sync.Mutex values. Without the rest
of the patch series, this test would correctly identify that Go 1.22's
semantics are incorrect (issue #66999).
Updates #66999
Updates #67585
Change-Id: Id06ae01d7bc91c94054c80d273e6530cb2d59d10
Reviewed-on: https://go-review.googlesource.com/c/go/+/589096
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Have the test use the same clock (cputicks) as the profiler, and use the
test's own measurements as hard bounds on the magnitude to expect in the
profile.
Compare the depiction of two users of the same lock: one where the
critical section is fast, one where it is slow. Confirm that the profile
shows the slow critical section as a large source of delay (with #66999
fixed), rather than showing the fast critical section as a large
recipient of delay.
For #64253
For #66999
Change-Id: I784c8beedc39de564dc8cee42060a5d5ce55de39
Reviewed-on: https://go-review.googlesource.com/c/go/+/586237
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
The existing implementation of traceMap is a hash map with a fixed
bucket table size which scales poorly with the number of elements added
to the map. After a few thousands elements are in the map, it tends to
fall over.
Furthermore, cleaning up the trace map is currently non-preemptible,
without very good reason.
This change replaces the traceMap implementation with a simple
append-only concurrent hash-trie. The data structure is incredibly
simple and does not suffer at all from the same scaling issues.
Because the traceMap no longer has a lock, and the traceRegionAlloc it
embeds is not thread-safe, we have to push that lock down. While we're
here, this change also makes the fast path for the traceRegionAlloc
lock-free. This may not be inherently faster due to contention on the
atomic add, but it creates an easy path to sharding the main allocation
buffer to reduce contention in the future. (We might want to also
consider a fully thread-local allocator that covers both string and
stack tables. The only reason a thread-local allocator isn't feasible
right now is because each of these has their own region, but we could
certainly group all them together.)
Change-Id: I8c06d42825c326061a1b8569e322afc4bc2a513a
Reviewed-on: https://go-review.googlesource.com/c/go/+/570035
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Bypass: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
This change removes the allocheaders, deleting all the old code and
merging mbitmap_allocheaders.go back into mbitmap.go.
This change also deletes the SetType benchmarks which were already
broken in the new GOEXPERIMENT (it's harder to set up than before). We
weren't really watching these benchmarks at all, and they don't provide
additional test coverage.
Change-Id: I135497201c3259087c5cd3722ed3fbe24791d25d
Reviewed-on: https://go-review.googlesource.com/c/go/+/567200
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
This change fixes a possible race with updating metrics and reading
them. The update is intended to be protected by the world being stopped,
but here, it clearly isn't.
Fixing this lets us lower the thresholds in the metrics tests by an
order of magnitude, because the only thing we have to worry about now is
floating point error (the tests were previously written assuming the
floating point error was much higher than it actually was; that turns
out not to be the case, and this bug was the problem instead). However,
this still isn't that tight of a bound; we still want to catch any and
all problems of exactness. For this purpose, this CL adds a test to
check the source-of-truth (in uint64 nanoseconds) that ensures the
totals exactly match.
This means we unfortunately have to take another time measurement, but
for now let's prioritize correctness. A few additional nanoseconds of
STW time won't be terribly noticable.
Fixes #66212.
Change-Id: Id02c66e8a43c13b1f70e9b268b8a84cc72293bfd
Reviewed-on: https://go-review.googlesource.com/c/go/+/570257
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Nicolas Hillegeer <aktau@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
For #59670
Change-Id: Id66e102f13e529dd041b68ce869026a56f0a1b9b
GitHub-Last-Rev: 43aa9376f72bc02a9d86518cdc99494a6b2f8573
GitHub-Pull-Request: golang/go#65564
Reviewed-on: https://go-review.googlesource.com/c/go/+/562298
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Austin Clements <austin@google.com>
|
|
For #65355
Change-Id: I65dd090fb99de9b231af2112c5ccb0eb635db2be
Reviewed-on: https://go-review.googlesource.com/c/go/+/560155
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ibrahim Bazoka <ibrahimbazoka729@gmail.com>
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
|
|
If user code has two timers t1 and t2 and does *t1 = *t2
(or *t1 = Timer{}), it creeps me out that we would be
corrupting the runtime data structures inlined in the
Timer struct. Replace that field with a pointer to the
runtime data structure instead, so that the corruption
cannot happen, even in a badly behaved program.
In fact, remove the struct definition entirely and linkname
a constructor instead. Now the runtime can evolve the struct
however it likes without needing to keep package time in sync.
Also move the workaround logic for #21874 out of
runtime and into package time.
Change-Id: Ia30f7802ee7b3a11f5d8a78dd30fd9c8633dc787
Reviewed-on: https://go-review.googlesource.com/c/go/+/568339
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Change-Id: Ib78c1513616089f4942297cd17212b1b11871fd5
GitHub-Last-Rev: f97fe5b5bffffe25dc31de7964588640cb70ec41
GitHub-Pull-Request: golang/go#65819
Reviewed-on: https://go-review.googlesource.com/c/go/+/565515
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
For #59670
Change-Id: I9265e033bf3a84c3dc7b4a5d52c0df9672435f0d
GitHub-Last-Rev: 8e4099095cec9e774abe9b72212a1eab6f6f9fdb
GitHub-Pull-Request: golang/go#64774
Reviewed-on: https://go-review.googlesource.com/c/go/+/550117
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
CL 549536 intended to decouple the internal implementation of rwmutex
from the semantic meaning of an rwmutex read/write lock in the static
lock ranking.
Unfortunately, it was not thought through well enough. The internals
were represented with the rwmutexR and rwmutexW lock ranks. The idea was
that the internal lock ranks need not model the higher-level ordering,
since those have separate rankings. That is incorrect; rwmutexW is held
for the duration of a write lock, so it must be ranked before any lock
taken while any write lock is held, which is precisely what we were
trying to avoid.
This is visible in violations like:
0 : execW 11 0x0
1 : rwmutexW 51 0x111d9c8
2 : fin 30 0x111d3a0
fatal error: lock ordering problem
execW < fin is modeled, but rwmutexW < fin is missing.
Fix this by eliminating the rwmutexR/W lock ranks shared across
different types of rwmutex. Instead require users to define an
additional "internal" lock rank to represent the implementation details
of rwmutex.rLock. We can avoid an additional "internal" lock rank for
rwmutex.wLock because the existing writeRank has the same semantics for
semantic and internal locking. i.e., writeRank is held for the duration
of a write lock, which is exactly how rwmutex.wLock is used, so we can
use writeRank directly on wLock.
For #64722.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-staticlockranking
Change-Id: Ia572de188a46ba8fe054ae28537648beaa16b12c
Reviewed-on: https://go-review.googlesource.com/c/go/+/555055
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Currently, lock ranking doesn't really try to model rwmutex. It records
the internal locks rLock and wLock, but in a subpar fashion:
1. wLock is held from lock to unlock, so it works OK, but it conflates
write locks of all rwmutexes as rwmutexW, rather than allowing
different rwmutexes to have different rankings.
2. rLock is an internal implementation detail that is only taken when
there is contention in rlock. As as result, the reader lock path is
almost never checked.
Add proper modeling. rwmutexR and rwmutexW remain as the ranks of the
internal locks, which have their own ordering. The new init method is
passed the ranks of the higher level lock that this represents, just
like lockInit for mutex.
execW ordered before MALLOC captures the case from #64722. i.e., there
can be allocation between BeforeFork and AfterFork.
For #64722.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-staticlockranking
Change-Id: I23335b28faa42fb04f1bc9da02fdf54d1616cd28
Reviewed-on: https://go-review.googlesource.com/c/go/+/549536
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Move ChaCha8 code into internal/chacha8rand and use it to implement
runtime.rand, which is used for the unseeded global source for
both math/rand and math/rand/v2. This also affects the calculation of
the start point for iteration over very very large maps (when the
32-bit fastrand is not big enough).
The benefit is that misuse of the global random number generators
in math/rand and math/rand/v2 in contexts where non-predictable
randomness is important for security reasons is no longer a
security problem, removing a common mistake among programmers
who are unaware of the different kinds of randomness.
The cost is an extra 304 bytes per thread stored in the m struct
plus 2-3ns more per random uint64 due to the more sophisticated
algorithm. Using PCG looks like it would cost about the same,
although I haven't benchmarked that.
Before this, the math/rand and math/rand/v2 global generator
was wyrand (https://github.com/wangyi-fudan/wyhash).
For math/rand, using wyrand instead of the Mitchell/Reeds/Thompson
ALFG was justifiable, since the latter was not any better.
But for math/rand/v2, the global generator really should be
at least as good as one of the well-studied, specific algorithms
provided directly by the package, and it's not.
(Wyrand is still reasonable for scheduling and cache decisions.)
Good randomness does have a cost: about twice wyrand.
Also rationalize the various runtime rand references.
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ bbb48afeb7.amd64 │ 5cf807d1ea.amd64 │
│ sec/op │ sec/op vs base │
ChaCha8-32 1.862n ± 2% 1.861n ± 2% ~ (p=0.825 n=20)
PCG_DXSM-32 1.471n ± 1% 1.460n ± 2% ~ (p=0.153 n=20)
SourceUint64-32 1.636n ± 2% 1.582n ± 1% -3.30% (p=0.000 n=20)
GlobalInt64-32 2.087n ± 1% 3.663n ± 1% +75.54% (p=0.000 n=20)
GlobalInt64Parallel-32 0.1042n ± 1% 0.2026n ± 1% +94.48% (p=0.000 n=20)
GlobalUint64-32 2.263n ± 2% 3.724n ± 1% +64.57% (p=0.000 n=20)
GlobalUint64Parallel-32 0.1019n ± 1% 0.1973n ± 1% +93.67% (p=0.000 n=20)
Int64-32 1.771n ± 1% 1.774n ± 1% ~ (p=0.449 n=20)
Uint64-32 1.863n ± 2% 1.866n ± 1% ~ (p=0.364 n=20)
GlobalIntN1000-32 3.134n ± 3% 4.730n ± 2% +50.95% (p=0.000 n=20)
IntN1000-32 2.489n ± 1% 2.489n ± 1% ~ (p=0.683 n=20)
Int64N1000-32 2.521n ± 1% 2.516n ± 1% ~ (p=0.394 n=20)
Int64N1e8-32 2.479n ± 1% 2.478n ± 2% ~ (p=0.743 n=20)
Int64N1e9-32 2.530n ± 2% 2.514n ± 2% ~ (p=0.193 n=20)
Int64N2e9-32 2.501n ± 1% 2.494n ± 1% ~ (p=0.616 n=20)
Int64N1e18-32 3.227n ± 1% 3.205n ± 1% ~ (p=0.101 n=20)
Int64N2e18-32 3.647n ± 1% 3.599n ± 1% ~ (p=0.019 n=20)
Int64N4e18-32 5.135n ± 1% 5.069n ± 2% ~ (p=0.034 n=20)
Int32N1000-32 2.657n ± 1% 2.637n ± 1% ~ (p=0.180 n=20)
Int32N1e8-32 2.636n ± 1% 2.636n ± 1% ~ (p=0.763 n=20)
Int32N1e9-32 2.660n ± 2% 2.638n ± 1% ~ (p=0.358 n=20)
Int32N2e9-32 2.662n ± 2% 2.618n ± 2% ~ (p=0.064 n=20)
Float32-32 2.272n ± 2% 2.239n ± 2% ~ (p=0.194 n=20)
Float64-32 2.272n ± 1% 2.286n ± 2% ~ (p=0.763 n=20)
ExpFloat64-32 3.762n ± 1% 3.744n ± 1% ~ (p=0.171 n=20)
NormFloat64-32 3.706n ± 1% 3.655n ± 2% ~ (p=0.066 n=20)
Perm3-32 32.93n ± 3% 34.62n ± 1% +5.13% (p=0.000 n=20)
Perm30-32 202.9n ± 1% 204.0n ± 1% ~ (p=0.482 n=20)
Perm30ViaShuffle-32 115.0n ± 1% 114.9n ± 1% ~ (p=0.358 n=20)
ShuffleOverhead-32 112.8n ± 1% 112.7n ± 1% ~ (p=0.692 n=20)
Concurrent-32 2.107n ± 0% 3.725n ± 1% +76.75% (p=0.000 n=20)
goos: darwin
goarch: arm64
pkg: math/rand/v2
│ bbb48afeb7.arm64 │ 5cf807d1ea.arm64 │
│ sec/op │ sec/op vs base │
ChaCha8-8 2.480n ± 0% 2.429n ± 0% -2.04% (p=0.000 n=20)
PCG_DXSM-8 2.531n ± 0% 2.530n ± 0% ~ (p=0.877 n=20)
SourceUint64-8 2.534n ± 0% 2.533n ± 0% ~ (p=0.732 n=20)
GlobalInt64-8 2.172n ± 1% 4.794n ± 0% +120.67% (p=0.000 n=20)
GlobalInt64Parallel-8 0.4320n ± 0% 0.9605n ± 0% +122.32% (p=0.000 n=20)
GlobalUint64-8 2.182n ± 0% 4.770n ± 0% +118.58% (p=0.000 n=20)
GlobalUint64Parallel-8 0.4307n ± 0% 0.9583n ± 0% +122.51% (p=0.000 n=20)
Int64-8 4.107n ± 0% 4.104n ± 0% ~ (p=0.416 n=20)
Uint64-8 4.080n ± 0% 4.080n ± 0% ~ (p=0.052 n=20)
GlobalIntN1000-8 2.814n ± 2% 5.643n ± 0% +100.50% (p=0.000 n=20)
IntN1000-8 4.141n ± 0% 4.139n ± 0% ~ (p=0.140 n=20)
Int64N1000-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.313 n=20)
Int64N1e8-8 4.140n ± 0% 4.139n ± 0% ~ (p=0.103 n=20)
Int64N1e9-8 4.139n ± 0% 4.140n ± 0% ~ (p=0.761 n=20)
Int64N2e9-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.636 n=20)
Int64N1e18-8 5.266n ± 0% 5.326n ± 1% +1.14% (p=0.001 n=20)
Int64N2e18-8 6.052n ± 0% 6.167n ± 0% +1.90% (p=0.000 n=20)
Int64N4e18-8 8.826n ± 0% 9.051n ± 0% +2.55% (p=0.000 n=20)
Int32N1000-8 4.127n ± 0% 4.132n ± 0% +0.12% (p=0.000 n=20)
Int32N1e8-8 4.126n ± 0% 4.131n ± 0% +0.12% (p=0.000 n=20)
Int32N1e9-8 4.127n ± 0% 4.132n ± 0% +0.12% (p=0.000 n=20)
Int32N2e9-8 4.132n ± 0% 4.131n ± 0% ~ (p=0.017 n=20)
Float32-8 4.109n ± 0% 4.105n ± 0% ~ (p=0.379 n=20)
Float64-8 4.107n ± 0% 4.106n ± 0% ~ (p=0.867 n=20)
ExpFloat64-8 5.339n ± 0% 5.383n ± 0% +0.82% (p=0.000 n=20)
NormFloat64-8 5.735n ± 0% 5.737n ± 1% ~ (p=0.856 n=20)
Perm3-8 26.65n ± 0% 26.80n ± 1% +0.58% (p=0.000 n=20)
Perm30-8 194.8n ± 1% 197.0n ± 0% +1.18% (p=0.000 n=20)
Perm30ViaShuffle-8 156.6n ± 0% 157.6n ± 1% +0.61% (p=0.000 n=20)
ShuffleOverhead-8 124.9n ± 0% 125.5n ± 0% +0.52% (p=0.000 n=20)
Concurrent-8 2.434n ± 3% 5.066n ± 0% +108.09% (p=0.000 n=20)
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ bbb48afeb7.386 │ 5cf807d1ea.386 │
│ sec/op │ sec/op vs base │
ChaCha8-32 11.295n ± 1% 4.748n ± 2% -57.96% (p=0.000 n=20)
PCG_DXSM-32 7.693n ± 1% 7.738n ± 2% ~ (p=0.542 n=20)
SourceUint64-32 7.658n ± 2% 7.622n ± 2% ~ (p=0.344 n=20)
GlobalInt64-32 3.473n ± 2% 7.526n ± 2% +116.73% (p=0.000 n=20)
GlobalInt64Parallel-32 0.3198n ± 0% 0.5444n ± 0% +70.22% (p=0.000 n=20)
GlobalUint64-32 3.612n ± 0% 7.575n ± 1% +109.69% (p=0.000 n=20)
GlobalUint64Parallel-32 0.3168n ± 0% 0.5403n ± 0% +70.51% (p=0.000 n=20)
Int64-32 7.673n ± 2% 7.789n ± 1% ~ (p=0.122 n=20)
Uint64-32 7.773n ± 1% 7.827n ± 2% ~ (p=0.920 n=20)
GlobalIntN1000-32 6.268n ± 1% 9.581n ± 1% +52.87% (p=0.000 n=20)
IntN1000-32 10.33n ± 2% 10.45n ± 1% ~ (p=0.233 n=20)
Int64N1000-32 10.98n ± 2% 11.01n ± 1% ~ (p=0.401 n=20)
Int64N1e8-32 11.19n ± 2% 10.97n ± 1% ~ (p=0.033 n=20)
Int64N1e9-32 11.06n ± 1% 11.08n ± 1% ~ (p=0.498 n=20)
Int64N2e9-32 11.10n ± 1% 11.01n ± 2% ~ (p=0.995 n=20)
Int64N1e18-32 15.23n ± 2% 15.04n ± 1% ~ (p=0.973 n=20)
Int64N2e18-32 15.89n ± 1% 15.85n ± 1% ~ (p=0.409 n=20)
Int64N4e18-32 18.96n ± 2% 19.34n ± 2% ~ (p=0.048 n=20)
Int32N1000-32 10.46n ± 2% 10.44n ± 2% ~ (p=0.480 n=20)
Int32N1e8-32 10.46n ± 2% 10.49n ± 2% ~ (p=0.951 n=20)
Int32N1e9-32 10.28n ± 2% 10.26n ± 1% ~ (p=0.431 n=20)
Int32N2e9-32 10.50n ± 2% 10.44n ± 2% ~ (p=0.249 n=20)
Float32-32 13.80n ± 2% 13.80n ± 2% ~ (p=0.751 n=20)
Float64-32 23.55n ± 2% 23.87n ± 0% ~ (p=0.408 n=20)
ExpFloat64-32 15.36n ± 1% 15.29n ± 2% ~ (p=0.316 n=20)
NormFloat64-32 13.57n ± 1% 13.79n ± 1% +1.66% (p=0.005 n=20)
Perm3-32 45.70n ± 2% 46.99n ± 2% +2.81% (p=0.001 n=20)
Perm30-32 399.0n ± 1% 403.8n ± 1% +1.19% (p=0.006 n=20)
Perm30ViaShuffle-32 349.0n ± 1% 350.4n ± 1% ~ (p=0.909 n=20)
ShuffleOverhead-32 322.3n ± 1% 323.8n ± 1% ~ (p=0.410 n=20)
Concurrent-32 3.331n ± 1% 7.312n ± 1% +119.50% (p=0.000 n=20)
For #61716.
Change-Id: Ibdddeed85c34d9ae397289dc899e04d4845f9ed2
Reviewed-on: https://go-review.googlesource.com/c/go/+/516860
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
ReadMemStats has a few assertions it makes about the consistency of the
stats it's about to produce. Specifically, how those stats line up with
runtime-internal stats. These checks are generally useful, but crashing
just because some stats are wrong is a heavy price to pay.
For a long time this wasn't a problem, but very recently it became a
real problem. It turns out that there's real benign skew that can happen
wherein sysmon (which doesn't synchronize with a STW) generates a trace
event when tracing is enabled, and may mutate some stats while
ReadMemStats is running its checks.
Fix this by synchronizing with both sysmon and the tracer. This is a bit
heavy-handed, but better that than false positives.
Also, put the checks behind a debug mode. We want to reduce the risk of
backporting this change, and again, it's not great to crash just because
user-facing stats are off. Still, enable this debug mode during the
runtime tests so we don't lose quite as much coverage from disabling
these checks by default.
Fixes #64401.
Change-Id: I9adb3e5c7161d207648d07373a11da8a5f0fda9a
Reviewed-on: https://go-review.googlesource.com/c/go/+/545277
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
|
|
Add runtime-internal locks to the mutex contention profile.
Store up to one call stack responsible for lock contention on the M,
until it's safe to contribute its value to the mprof table. Try to use
that limited local storage space for a relatively large source of
contention, and attribute any contention in stacks we're not able to
store to a sentinel _LostContendedLock function.
Avoid ballooning lock contention while manipulating the mprof table by
attributing to that sentinel function any lock contention experienced
while reporting lock contention.
Guard collecting real call stacks with GODEBUG=profileruntimelocks=1,
since the available data has mixed semantics; we can easily capture an
M's own wait time, but we'd prefer for the profile entry of each
critical section to describe how long it made the other Ms wait. It's
too late in the Go 1.22 cycle to make the required changes to
futex-based locks. When not enabled, attribute the time to the sentinel
function instead.
Fixes #57071
This is a roll-forward of https://go.dev/cl/528657, which was reverted
in https://go.dev/cl/543660
Reason for revert: de-flakes tests (reduces dependence on fine-grained
timers, correctly identifies contention on big-endian futex locks,
attempts to measure contention in the semaphore implementation but only
uses that secondary measurement to finish the test early, skips tests on
single-processor systems)
Change-Id: I31389f24283d85e46ad9ba8d4f514cb9add8dfb0
Reviewed-on: https://go-review.googlesource.com/c/go/+/544195
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Auto-Submit: Rhys Hiltner <rhys@justin.tv>
Run-TryBot: Rhys Hiltner <rhys@justin.tv>
|
|
Once(Func|Value|Values)
The function passed to OnceFunc/OnceValue/OnceValues may transitively
keep more allocations alive. As the passed function is guaranteed to be
called at most once, it is safe to drop it after the first call is
complete. This avoids keeping the passed function (and anything it
transitively references) alive until the returned function is GCed.
Change-Id: I2faf397b481d2f693ab3aea8e2981b02adbc7a21
Reviewed-on: https://go-review.googlesource.com/c/go/+/481515
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: qiulaidongfeng <2645477756@qq.com>
|
|
This reverts commit go.dev/cl/528657.
Reason for revert: broke a lot of builders.
Change-Id: I70c33062020e997c4df67b3eaa2e886cf0da961e
Reviewed-on: https://go-review.googlesource.com/c/go/+/543660
Reviewed-by: Than McIntosh <thanm@google.com>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add runtime-internal locks to the mutex contention profile.
Store up to one call stack responsible for lock contention on the M,
until it's safe to contribute its value to the mprof table. Try to use
that limited local storage space for a relatively large source of
contention, and attribute any contention in stacks we're not able to
store to a sentinel _LostContendedLock function.
Avoid ballooning lock contention while manipulating the mprof table by
attributing to that sentinel function any lock contention experienced
while reporting lock contention.
Guard collecting real call stacks with GODEBUG=profileruntimelocks=1,
since the available data has mixed semantics; we can easily capture an
M's own wait time, but we'd prefer for the profile entry of each
critical section to describe how long it made the other Ms wait. It's
too late in the Go 1.22 cycle to make the required changes to
futex-based locks. When not enabled, attribute the time to the sentinel
function instead.
Fixes #57071
Change-Id: I3eee0ccbfc20f333b56f20d8725dfd7f3a526b41
Reviewed-on: https://go-review.googlesource.com/c/go/+/528657
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Rhys Hiltner <rhys@justin.tv>
Reviewed-by: Than McIntosh <thanm@google.com>
|
|
This CL adds four new time histogram metrics:
/sched/pauses/stopping/gc:seconds
/sched/pauses/stopping/other:seconds
/sched/pauses/total/gc:seconds
/sched/pauses/total/other:seconds
The "stopping" metrics measure the time taken to start a stop-the-world
pause. i.e., how long it takes stopTheWorldWithSema to stop all Ps.
This can be used to detect STW struggling to preempt Ps.
The "total" metrics measure the total duration of a stop-the-world
pause, from starting to stop-the-world until the world is started again.
This includes the time spent in the "start" phase.
The "gc" metrics are used for GC-related STW pauses. The "other" metrics
are used for all other STW pauses.
All of these metrics start timing in stopTheWorldWithSema only after
successfully acquiring sched.lock, thus excluding lock contention on
sched.lock. The reasoning behind this is that while waiting on
sched.lock the world is not stopped at all (all other Ps can run), so
the impact of this contention is primarily limited to the goroutine
attempting to stop-the-world. Additionally, we already have some
visibility into sched.lock contention via contention profiles (#57071).
/sched/pauses/total/gc:seconds is conceptually equivalent to
/gc/pauses:seconds, so the latter is marked as deprecated and returns
the same histogram as the former.
In the implementation, there are a few minor differences:
* For both mark and sweep termination stops, /gc/pauses:seconds started
timing prior to calling startTheWorldWithSema, thus including lock
contention.
These details are minor enough, that I do not believe the slight change
in reporting will matter. For mark termination stops, moving timing stop
into startTheWorldWithSema does have the side effect of requiring moving
other GC metric calculations outside of the STW, as they depend on the
same end time.
Fixes #63340
Change-Id: Iacd0bab11bedab85d3dcfb982361413a7d9c0d05
Reviewed-on: https://go-review.googlesource.com/c/go/+/534161
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This change replaces the 1-bit-per-word heap bitmap for most size
classes with allocation headers for objects that contain pointers. The
header consists of a single pointer to a type. All allocations with
headers are treated as implicitly containing one or more instances of
the type in the header.
As the name implies, headers are usually stored as the first word of an
object. There are two additional exceptions to where headers are stored
and how they're used.
Objects smaller than 512 bytes do not have headers. Instead, a heap
bitmap is reserved at the end of spans for objects of this size. A full
word of overhead is too much for these small objects. The bitmap is of
the same format of the old bitmap, minus the noMorePtrs bits which are
unnecessary. All the objects <512 bytes have a bitmap less than a
pointer-word in size, and that was the granularity at which noMorePtrs
could stop scanning early anyway.
Objects that are larger than 32 KiB (which have their own span) have
their headers stored directly in the span, to allow power-of-two-sized
allocations to not spill over into an extra page.
The full implementation is behind GOEXPERIMENT=allocheaders.
The purpose of this change is performance. First and foremost, with
headers we no longer have to unroll pointer/scalar data at allocation
time for most size classes. Small size classes still need some
unrolling, but their bitmaps are small so we can optimize that case
fairly well. Larger objects effectively have their pointer/scalar data
unrolled on-demand from type data, which is much more compactly
represented and results in less TLB pressure. Furthermore, since the
headers are usually right next to the object and where we're about to
start scanning, we get an additional temporal locality benefit in the
data cache when looking up type metadata. The pointer/scalar data is
now effectively unrolled on-demand, but it's also simpler to unroll than
before; that unrolled data is never written anywhere, and for arrays we
get the benefit of retreading the same data per element, as opposed to
looking it up from scratch for each pointer-word of bitmap. Lastly,
because we no longer have a heap bitmap that spans the entire heap,
there's a flat 1.5% memory use reduction. This is balanced slightly by
some objects possibly being bumped up a size class, but most objects are
not tightly optimized to size class sizes so there's some memory to
spare, making the header basically free in those cases.
See the follow-up CL which turns on this experiment by default for
benchmark results. (CL 538217.)
Change-Id: I4c9034ee200650d06d8bdecd579d5f7c1bbf1fc5
Reviewed-on: https://go-review.googlesource.com/c/go/+/437955
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
ReadMetricsSlow was updated to call the core of readMetrics on the
systemstack to prevent issues with stat skew if the stack gets moved
between readmemstats_m and readMetrics. However, readMetrics calls into
the map implementation, which has race instrumentation. The system stack
typically has no racectx set, resulting in crashes.
Donate racectx to g0 like the tracer does, so that these accesses don't
crash.
For #60607.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-race
Change-Id: Ic0251af2d9b60361f071fe97084508223109480c
Reviewed-on: https://go-review.googlesource.com/c/go/+/539695
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
Currently it's possible (and even probable, with mayMoreStackMove mode)
for a stack allocation to occur between readmemstats_m and readMetrics
in ReadMetricsSlow. This can cause tests to fail by producing metrics
that are inconsistent between the two sources.
Fix this by breaking out the critical section of readMetrics and calling
that from ReadMetricsSlow on the systemstack. Our main constraint in
calling readMetrics on the system stack is the fact that we can't
acquire the metrics semaphore from the system stack. But if we break out
the critical section, then we can acquire that semaphore before we go on
the system stack.
While we're here, add another readMetrics call before readmemstats_m.
Since we're being paranoid about ways that metrics could get skewed
between the two calls, let's eliminate all uncertainty. It's possible
for readMetrics to allocate new memory, for example for histograms, and
fail while it's reading metrics. I believe we're just getting lucky
today with the order in which the metrics are produced. Another call to
readMetrics will preallocate this data in the samples slice. One nice
thing about this second read is that now we effectively have a way to
check if readMetrics really will allocate if called a second time on the
same samples slice.
Fixes #60607.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest
Change-Id: If6ce666530903239ef9f02dbbc3f1cb6be71e425
Reviewed-on: https://go-review.googlesource.com/c/go/+/539117
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Change-Id: Ib89a71e20f9c6b86c97814c75cb427e9bd7075e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/538735
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
|
|
Error like "morestack on g0" is one of the errors that is very
hard to debug, because often it doesn't print a useful stack trace.
The runtime doesn't directly print a stack trace because it is
a bad stack state to call print. Sometimes the SIGABRT may trigger
a traceback, but sometimes not especially in a cgo binary. Even if
it triggers a traceback it often does not include the stack trace
of the bad stack.
This CL makes it explicitly print a stack trace and throw. The
idea is to have some space as an "emergency" crash stack. When the
stack is in a really bad state, we switch to the crash stack and
do a traceback.
Currently only implemented on AMD64 and ARM64.
TODO: also handle errors like "morestack on gsignal" and bad
systemstack. Also handle other architectures.
Change-Id: Ibfc397202f2bb0737c5cbe99f2763de83301c1c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/419435
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Change-Id: I3f0b7209621b39cee69566a5cc95e4343b4f1f20
GitHub-Last-Rev: af9dbbe69ad74e8c210254dafa260a886b690853
GitHub-Pull-Request: golang/go#63321
Reviewed-on: https://go-review.googlesource.com/c/go/+/531916
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
We were using the size stored in the map, which is the smaller
of the real type size and 128.
As of CL 61538 we don't use these functions, but we expect to
use them again in the future after #61626 is resolved.
Change-Id: I7bfb4af5f0e3a56361d4019a8ed7c1ec59ff31fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/535215
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
The correct load factor is 6.5, not 6.
This got broken by accident in CL 462115.
Fixes #63438
Change-Id: Ib07bb6ab6103aec87cb775bc06bd04362a64e489
Reviewed-on: https://go-review.googlesource.com/c/go/+/533279
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
mspan.freeindex and nelems can fit into uint16 for all possible
values. Use uint16 instead of uintptr.
Change-Id: Ifce20751e81d5022be1f6b5cbb5fbe4fd1728b1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/451359
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
After the previous CL, this is now all dead code. This change is
separated out to make the previous one easy to backport.
For #63334.
Related to #61718 and #59960.
Change-Id: I109673ed97c62c472bbe2717dfeeb5aa4fc883ea
Reviewed-on: https://go-review.googlesource.com/c/go/+/532117
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This is a follow up of CL 528696.
Change-Id: I5b71eabedb12567c4b1b36f7182a3d2b0ed662a5
GitHub-Last-Rev: acaf3ac11c38042ad27b99e1c70a3c9f1a554a15
GitHub-Pull-Request: golang/go#62713
Reviewed-on: https://go-review.googlesource.com/c/go/+/529197
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The stack bounds from pthread are not always accurate, and could
cause seg fault if we run out of the actual stack space before
reaching the bounds. Here we use an artificially small stack bounds
to check overflow without actually running out of the system stack.
Change-Id: I8067c5e1297307103b315d9d0c60120293b57aab
Reviewed-on: https://go-review.googlesource.com/c/go/+/523695
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Currently the runtime marks all new memory as MADV_HUGEPAGE on Linux and
manages its hugepage eligibility status. Unfortunately, the default
THP behavior on most Linux distros is that MADV_HUGEPAGE blocks while
the kernel eagerly reclaims and compacts memory to allocate a hugepage.
This direct reclaim and compaction is unbounded, and may result in
significant application thread stalls. In really bad cases, this can
exceed 100s of ms or even seconds.
Really all we want is to undo MADV_NOHUGEPAGE marks and let the default
Linux paging behavior take over, but the only way to unmark a region as
MADV_NOHUGEPAGE is to also mark it MADV_HUGEPAGE.
The overall strategy of trying to keep hugepages for the heap unbroken
however is sound. So instead let's use the new shiny MADV_COLLAPSE if it
exists.
MADV_COLLAPSE makes a best-effort synchronous attempt at collapsing the
physical memory backing a memory region into a hugepage. We'll use
MADV_COLLAPSE where we would've used MADV_HUGEPAGE, and stop using
MADV_NOHUGEPAGE altogether.
Because MADV_COLLAPSE is synchronous, it's also important to not
re-collapse huge pages if the huge pages are likely part of some large
allocation. Although in many cases it's advantageous to back these
allocations with hugepages because they're contiguous, eagerly
collapsing every hugepage means having to page in at least part of the
large allocation.
However, because we won't use MADV_NOHUGEPAGE anymore, we'll no longer
handle the fact that khugepaged might come in and back some memory we
returned to the OS with a hugepage. I've come to the conclusion that
this is basically unavoidable without a new madvise flag and that it's
just not a good default. If this change lands, advice about Linux huge
page settings will be added to the GC guide.
Verified that this change doesn't regress Sweet, at least not on my
machine with:
/sys/kernel/mm/transparent_hugepage/enabled [always or madvise]
/sys/kernel/mm/transparent_hugepage/defrag [madvise]
/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none [0 or 511]
Unfortunately, this workaround means that we only get forced hugepages
on Linux 6.1+.
Fixes #61718.
Change-Id: I7f4a7ba397847de29f800a99f9cb66cb2720a533
Reviewed-on: https://go-review.googlesource.com/c/go/+/516795
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
Now that pcvalue keeps its cache on the M, we can drop all of the
stack-allocated pcvalueCaches and stop carefully passing them around
between lots of operations. This significantly simplifies a fair
amount of code and makes several structures smaller.
This series of changes has no statistically significant effect on any
runtime Stack benchmarks.
I also experimented with making the cache larger, now that the impact
is limited to the M struct, but wasn't able to measure any
improvements.
This is a re-roll of CL 515277
Change-Id: Ia27529302f81c1c92fb9c3a7474739eca80bfca1
Reviewed-on: https://go-review.googlesource.com/c/go/+/520064
Auto-Submit: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
|
|
The pprof mutex profile was meant to match the Google C++ (now Abseil)
mutex profiler, originally designed and implemented by Mike Burrows.
When we worked on the Go version, pjw and I missed that C++ counts the
time each thread is blocked, even if multiple threads are blocked on a
mutex. That is, if 100 threads are blocked on the same mutex for the
same 10ms, that still counts as 1000ms of contention in C++. In Go, to
date, /debug/pprof/mutex has counted that as only 10ms of contention.
If 100 goroutines are blocked on one mutex and only 1 goroutine is
blocked on another mutex, we probably do want to see the first mutex
as being more contended, so the Abseil approach is the more useful one.
This CL adopts "contention scales with number of goroutines blocked",
to better match Abseil [1]. However, it still makes sure to attribute the
time to the unlock that caused the backup, not subsequent innocent
unlocks that were affected by the congestion. In this way it still gives
more accurate profiles than Abseil does.
[1] https://github.com/abseil/abseil-cpp/blob/lts_2023_01_25/absl/synchronization/mutex.cc#L2390
Fixes #61015.
Change-Id: I7eb9e706867ffa8c0abb5b26a1b448f6eba49331
Reviewed-on: https://go-review.googlesource.com/c/go/+/506415
Run-TryBot: Russ Cox <rsc@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
When recovering from a panic, restore the caller's frame pointer before
returning control to the caller. Otherwise, if the function proceeds to
run more deferred calls before returning, the deferred functions will
get invalid frame pointers pointing to an address lower in the stack.
This can cause frame pointer unwinding to crash, such as if an execution
trace event is recorded during the deferred call on architectures which
support frame pointer unwinding.
Fixes #61766
Change-Id: I45f41aedcc397133560164ab520ca638bbd93c4e
Reviewed-on: https://go-review.googlesource.com/c/go/+/516157
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
|
|
Followon to CL 518055.
Change-Id: I05c4b429f49feb7012070e467fefbf3392260915
Reviewed-on: https://go-review.googlesource.com/c/go/+/518538
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Fixes #60439
Change-Id: I09fcd2d3deb7f80ed012a769fdb6f53b09c0290b
Reviewed-on: https://go-review.googlesource.com/c/go/+/502895
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
Currently the BenchmarkSetType* benchmarks are racy: they call
heapBitsSetType on an allocation that might be in a span in-use for
allocation on another P. Because heap bits are bits but are written
byte-wise non-atomically (because a P assumes it has total ownership of
a span's bits), two threads can race writing the same heap bitmap byte
creating incorrect metadata.
Fix this by forcing every value we're writing heap bits for into a large
object. Large object spans will never be written to concurrently unless
they're freed first.
Also, while we're here, refactor the benchmarks a bit. Use generics to
eliminate the reflect nastiness in gc_test.go, and pass b.ResetTimer
down into the test to get slightly more accurate results.
Fixes #60050.
Change-Id: Ib7d6249b321963367c8c8ca88385386c8ae9af1c
Reviewed-on: https://go-review.googlesource.com/c/go/+/497215
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
The previous name was wrong due to the mistaken assumption that calling
f->g->getcallerpc and f->g->getcallersp would respectively return the
pc/sp at g. However, they are actually referring to their caller's
caller, i.e. f.
Rename getcallerfp to getfp in order to stay consistent with this
naming convention.
Also see discussion on CL 463835.
For #16638
This is a redo of CL 481617 that became necessary because CL 461738
added another call site for getcallerfp().
Change-Id: If0b536e85a6c26061b65e7b5c2859fc31385d025
Reviewed-on: https://go-review.googlesource.com/c/go/+/494857
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
|
|
Currently STW events are only emitted for GC STWs. There's little reason
why the trace can't contain events for every STW: they're rare so don't
take up much space in the trace, yet being able to see when the world
was stopped is often critical to debugging certain latency issues,
especially when they stem from user-level APIs.
This change adds new "kinds" to the EvGCSTWStart event, renames the
GCSTW events to just "STW," and lets the parser deal with unknown STW
kinds for future backwards compatibility.
But, this change must break trace compatibility, so it bumps the trace
version to Go 1.21.
This change also includes a small cleanup in the trace command, which
previously checked for STW events when deciding whether user tasks
overlapped with a GC. Looking at the source, I don't see a way for STW
events to ever enter the stream that that code looks at, so that
condition has been deleted.
Change-Id: I9a5dc144092c53e92eb6950e9a5504a790ac00cf
Reviewed-on: https://go-review.googlesource.com/c/go/+/494495
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
|
|
Some C APIs require the use or structures that contain pointers to
buffers (iovec, io_uring, ...). The pointer passing rules would
require that these buffers are allocated in C memory and to process
this data with Go libraries it would need to be copied.
In order to provide a zero-copy way to use these C APIs, this CL
implements a Pinner API that allows to pin Go objects, which
guarantees that the garbage collector does not move these objects
while pinned. This allows to relax the pointer passing rules so that
pinned pointers can be stored in C allocated memory or can be
contained in Go memory that is passed to C functions.
The Pin() method accepts pointers to objects of any type and
unsafe.Pointer. Slices and arrays can be pinned by calling Pin()
with the pointer to the first element. Pinning of maps is not
supported.
If the GC collects unreachable Pinner holding pinned objects it
panics. If Pin() is called with the other non-pointer types it
panics as well.
Performance considerations: This change has no impact on execution
time on existing code, because checks are only done in code paths,
that would panic otherwise. The memory footprint on existing code is
one pointer per memory span.
Fixes: #46787
Signed-off-by: Sven Anderson <sven@anderson.de>
Change-Id: I110031fe789b92277ae45a9455624687bd1c54f2
Reviewed-on: https://go-review.googlesource.com/c/go/+/367296
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
|
|
Currently if GOGC=off and GOMEMLIMIT is set, then the synchronous
scavenger is likely to work fairly often to maintain the limit, since
the heap goal goes right up to the edge of the memory limit (minus a
fixed 1 MiB of headroom).
If the application's allocation rate is high, and page-level
fragmentation is high, then most allocations will scavenge.
This change mitigates this problem by adding a proportional component
to constant headroom added to the memory-limit-based heap goal. This
means the runtime will have much more headroom before fragmentation
forces memory to be eagerly scavenged.
The proportional headroom in this case is 3%, or ~30 MiB for a 1 GiB
heap. This technically will increase GC frequency in the GOGC=off case
by a tiny amount, but will likely have a positive impact on both
allocation throughput and latency that outweighs this difference.
I wrote a small program to reproduce this issue and confirmed that the
issue is resolved by this patch:
https://github.com/golang/go/issues/57069#issuecomment-1551746565
This value of 3% is chosen as it seems to be a inflection point in this
particular small program. 2% still resulted in quite a bit of eager
scavenging work. I confirmed this results in a GC frequency increase of
about 3%.
This choice is still somewhat arbitrary because the program is
arbitrary, so perhaps worth revisiting in the future. Still, it should
help a good number of programs.
Fixes #57069.
Change-Id: Icb9829db0dfefb4fe42a0cabc5aa8d35970dd7d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/460375
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Types are either static (for compiler-created types) or heap
allocated and always reachable (for reflection-created types, held
in the central map). So there is no need to escape types.
With CL 408826 reflect.Value does not always escape. Some functions
that escapes Value.typ would make the Value escape without this CL.
Had to add a special case for the inliner to keep (*Value).Type
still inlineable.
Change-Id: I7c14d35fd26328347b509a06eb5bd1534d40775f
Reviewed-on: https://go-review.googlesource.com/c/go/+/413474
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This was accidentally left behind when moving the logic to set the skip
sentinel in pcBuf to the caller.
Change-Id: Id7565f6ea4df6b32cf18b99c700bca322998d182
Reviewed-on: https://go-review.googlesource.com/c/go/+/489095
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
|
|
This reverts CL 481617.
Reason for revert: breaks test build on Windows
Change-Id: Ifc1a323b0cc521e7a5a1f7de7b3da667f5fee375
Reviewed-on: https://go-review.googlesource.com/c/go/+/494377
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
The previous name was wrong due to the mistaken assumption that calling
f->g->getcallerpc and f->g->getcallersp would respectively return the
pc/sp at g. However, they are actually referring to their caller's
caller, i.e. f.
Rename getcallerfp to getfp in order to stay consistent with this
naming convention.
Also see discussion on CL 463835.
For #16638
Change-Id: I07990645da78819efd3db92f643326652ee516f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/481617
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Change-Id: I1f031f0f83a94bebe41d3978a91a903dc5bcda66
Reviewed-on: https://go-review.googlesource.com/c/go/+/489276
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This touches a lot of files, which is bad, but it is also good,
since there's N copies of this information commoned into 1.
The new files in internal/abi are copied from the end of the stack;
ultimately this will all end up being used.
Change-Id: Ia252c0055aaa72ca569411ef9f9e96e3d610889e
Reviewed-on: https://go-review.googlesource.com/c/go/+/462995
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Add TestSystemstackFramePointerAdjust as a regression test for CL
489015.
By turning stackPoisonCopy into a var instead of const and introducing
the ShrinkStackAndVerifyFramePointers() helper function, we are able to
trigger the exact combination of events that can crash traceEvent() if
systemstack restores a frame pointer that is pointing into the old
stack.
Updates #59692
Change-Id: I60fc6940638077e3b60a81d923b5f5b4f6d8a44c
Reviewed-on: https://go-review.googlesource.com/c/go/+/489115
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
|
|
The scavenge index currently doesn't guard against overflow, and CL
436395 removed the minHeapIdx optimization that allows the chunk scan to
skip scanning chunks that haven't been mapped for the heap, and are only
available as a consequence of chunks' mapped region being rounded out to
a page on both ends.
Because the 0'th chunk is never mapped, minHeapIdx effectively prevents
overflow, fixing the iOS breakage.
This change also refactors growth and initialization a little bit to
decouple it from pageAlloc a bit and share code across platforms.
Change-Id: If7fc3245aa81cf99451bf8468458da31986a9b0a
Reviewed-on: https://go-review.googlesource.com/c/go/+/486695
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
|