| Age | Commit message | Author |
|
For goroutines in a synctest bubble, include whether the goroutine
is "durably blocked" or not in the goroutine status.
Synctest categorizes goroutines in certain states as "durably"
blocked, where the goroutine is not merely idle but can only
be awoken by another goroutine in its bubble. To make it easier
for users to understand why a bubble is or is not idle,
print the state of each bubbled goroutine.
For example:
goroutine 36 [chan receive, synctest bubble 34, not durably blocked]:
goroutine 37 [chan receive (synctest), synctest bubble 34, durably blocked]:
Goroutine 36 is receiving from a channel created outside its bubble.
Goroutine 37 is receiving from a channel created inside its bubble.
For #67434
Change-Id: I006b656a9ce7eeb75b2be21e748440a5dd57ceb0
Reviewed-on: https://go-review.googlesource.com/c/go/+/670976
Auto-Submit: Damien Neil <dneil@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Fixes #72949.
Change-Id: I114eda73c57bc7d596eb1656e738b80c1cbe5254
Reviewed-on: https://go-review.googlesource.com/c/go/+/662039
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This change adds tracking for approximate finalizer and cleanup queue
lengths. These lengths are reported once every GC cycle as a single line
printed to stderr when GODEBUG=checkfinalizers>0.
This change lays the groundwork for runtime/metrics metrics to produce
the same values.
For #72948.
For #72950.
Change-Id: I081721238a0fc4c7e5bee2dbaba6cfb4120d1a33
Reviewed-on: https://go-review.googlesource.com/c/go/+/671437
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
These were just enabled by https://go.dev/cl/643897, but freebsd
unfortunately doesn't seem to support cgo + race mode by default.
For #73788.
Cq-Include-Trybots: luci.golang.try:gotip-freebsd-amd64-race
Change-Id: I6a6a636c06176ca746548d0588283b1429d7c6d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/674160
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
|
|
This change dumps a scan trace (each pointer marked and where it came
from) for the partial GC cycle performed by checkfinalizers mode when
checkfinalizers>1. This is useful for quickly understanding why certain
values are reachable without having to pull out tools like viewcore.
For #72949.
Change-Id: Ic583f80e9558cdfe1c667d27a1d975008dd39a9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/662038
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This change adds support for identifying cleanups and finalizers
attached to tiny blocks to checkfinalizers mode. It also notes a subtle
pitfall, which is that the cleanup arg, if tiny-allocated, could end up
co-located with the object with the cleanup attached! Oops...
For #72949.
Change-Id: Icbe0112f7dcfc63f35c66cf713216796a70121ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/662037
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
This change adds a new special kind called CheckFinalizer which is used
to annotate finalizers and cleanups with extra information about where
that cleanup or finalizer came from.
For #72949.
Change-Id: I3c1ace7bd580293961b7f0ea30345a6ce956d340
Reviewed-on: https://go-review.googlesource.com/c/go/+/662135
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This new debug mode detects cleanup/finalizer leaks using checkmark
mode. It runs a partial GC using only specials as roots. If the GC can
find a path from one of these roots back to the object the special is
attached to, then the object might never be reclaimed. (The cycle could
be broken in the future, but it's almost certainly a bug.)
This debug mode is very barebones. It contains no type information and
no stack location for where the finalizer or cleanup was created.
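The leak check described above amounts to a reachability question. The sketch below is a toy model, not the runtime's checkmark code: `object` and `reachable` are hypothetical names, and a plain depth-first walk stands in for the partial GC.

```go
package main

import "fmt"

// object is a toy heap object with outgoing pointers (hypothetical model).
type object struct {
	name string
	refs []*object
}

// reachable reports whether target can be reached from root by following
// refs. root plays the role of a special (for example, a cleanup argument).
func reachable(root, target *object) bool {
	seen := map[*object]bool{}
	stack := []*object{root}
	for len(stack) > 0 {
		o := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if o == target {
			return true
		}
		if seen[o] {
			continue
		}
		seen[o] = true
		stack = append(stack, o.refs...)
	}
	return false
}

func main() {
	obj := &object{name: "obj"}
	// The cleanup argument points back at the object it is attached to,
	// so the object might never be reclaimed.
	leakyArg := &object{name: "arg", refs: []*object{obj}}
	safeArg := &object{name: "arg2"}
	fmt.Println("leaky:", reachable(leakyArg, obj))
	fmt.Println("safe:", reachable(safeArg, obj))
}
```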
For #72949.
Change-Id: Ibffd64c1380b51f281950e4cfe61f677385d42a5
Reviewed-on: https://go-review.googlesource.com/c/go/+/634599
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
CL 614257 refactored mallocgc but lost an optimization: if a span for a
large object is already backed by memory fresh from the OS (and thus
zeroed), we don't need to zero it. CL 614257 unconditionally zeroed
spans for large objects that contain pointers.
This change restores the optimization from before CL 614257, which seems
to matter in some real-world programs.
While we're here, let's also fix a hole where the garbage collector
could observe uninitialized memory if the large object is observed
by the conservative scanner before being published. The gory details are
in a comment in heapSetTypeLarge. In short, this change makes
span.largeType an atomic variable, such that the GC can only observe
initialized memory if span.largeType != nil.
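The publication pattern can be sketched roughly as follows. This is a simplified model rather than the runtime's code: `span`, `typeInfo`, `publish`, and `scan` are hypothetical stand-ins, and `sync/atomic` stands in for the runtime's internal atomics.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// typeInfo is a hypothetical stand-in for the runtime's type metadata.
type typeInfo struct{ name string }

// span is a toy model of a large-object span.
type span struct {
	mem       []byte                   // object memory, possibly stale
	largeType atomic.Pointer[typeInfo] // nil until mem is initialized
}

// publish zeroes the memory first and only then stores the type, so a
// reader that loads a non-nil largeType also sees initialized memory.
func (s *span) publish(t *typeInfo) {
	for i := range s.mem {
		s.mem[i] = 0
	}
	s.largeType.Store(t)
}

// scan models a conservative scanner: it refuses to read memory from a
// span whose type has not been published yet.
func (s *span) scan() (sum int, ok bool) {
	if s.largeType.Load() == nil {
		return 0, false
	}
	for _, b := range s.mem {
		sum += int(b)
	}
	return sum, true
}

func main() {
	s := &span{mem: []byte{0xff, 0xff}}
	_, ok := s.scan()
	fmt.Println("scanned before publish:", ok)
	s.publish(&typeInfo{name: "T"})
	sum, ok := s.scan()
	fmt.Println("scanned after publish:", ok, "sum:", sum)
}
```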
Fixes #72991.
Change-Id: I2048aeb220ab363d252ffda7d980b8788e9674dc
Reviewed-on: https://go-review.googlesource.com/c/go/+/659956
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
|
|
Currently, it's possible for asynchronous preemption to observe a
partially initialized object. The sequence of events goes like this:
- The GC is in the mark phase.
- Thread T1 is allocating object O1.
- Thread T1 zeroes the allocation, runs the publication barrier, and
updates freeIndexForScan. It has not yet updated the mark bit on O1.
- Thread T2 is conservatively scanning some stack frame.
That stack frame has a dead pointer with the same address as O1.
- T2 picks up the pointer, checks isFree (which checks
freeIndexForScan without an import barrier), and sees that O1 is
allocated. It marks and queues O1.
- T2 then goes to scan O1, and observes uninitialized memory.
Although a publication barrier was executed, T2 did not have an import
barrier. T2 may thus observe T1's writes to zero the object out-of-order
with the write to freeIndexForScan.
Normally this would be impossible if T2 got a pointer to O1 from
somewhere written by T1. The publication barrier guarantees that if the
read side is data-dependent on the write side then we'd necessarily
observe all writes to O1 before T1 published it. However, T2 got the
pointer 'out of thin air' by scanning a stack frame with a dead pointer
on it.
One fix to this problem would be to add the import barrier in the
conservative scanner. We would then also need to put freeIndexForScan
behind the publication barrier, or make the write to freeIndexForScan
exactly that barrier.
However, there's a simpler way. We don't actually care if conservative
scanning observes a stale freeIndexForScan during the mark phase.
Newly-allocated memory is always marked at the point of allocation (the
allocate-black policy part of the GC's design). So it doesn't actually
matter whether the garbage collector scans that memory or not.
This change modifies the allocator to only update freeIndexForScan
outside the mark phase. This means freeIndexForScan is essentially
a snapshot of freeindex at the point the mark phase started. Because
there's no more race between conservative scanning and newly-allocated
objects, the complicated scenario above is no longer a possibility.
One thing we do have to be careful of is other callers of isFree.
Previously freeIndexForScan would always track freeindex; now it no
longer does. This change thus introduces isFreeOrNewlyAllocated which is
used by the conservative scanner, and uses freeIndexForScan. Meanwhile
isFree goes back to using freeindex like it used to. This change also
documents the requirement on isFree that the caller must have obtained
the pointer not 'out of thin air' but after the object was published.
isFree is not currently used anywhere particularly sensitive (heap dump
and checkmark mode, where the world is stopped in both cases) so using
freeindex is both conceptually simple and also safe.
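A toy model of the two predicates may help. The names follow the message above, but the logic is heavily simplified (no allocation bitmap, no atomics), and the `marking` parameter is a hypothetical stand-in for the GC phase check.

```go
package main

import "fmt"

// span is a toy model of an allocation span.
type span struct {
	freeindex        int // next slot to allocate
	freeIndexForScan int // snapshot consulted by the conservative scanner
}

// alloc hands out the next slot. freeIndexForScan only follows freeindex
// outside the mark phase, so during marking it stays a snapshot.
func (s *span) alloc(marking bool) int {
	i := s.freeindex
	s.freeindex++
	if !marking {
		s.freeIndexForScan = s.freeindex
	}
	return i
}

// isFree is for callers that obtained slot i after it was published.
func (s *span) isFree(i int) bool { return i >= s.freeindex }

// isFreeOrNewlyAllocated is for the conservative scanner, which may find
// slot i "out of thin air" on a stack.
func (s *span) isFreeOrNewlyAllocated(i int) bool { return i >= s.freeIndexForScan }

func main() {
	s := &span{}
	before := s.alloc(false) // allocated before the mark phase
	during := s.alloc(true)  // allocated during the mark phase
	fmt.Println(s.isFree(during), s.isFreeOrNewlyAllocated(during))
	fmt.Println(s.isFree(before), s.isFreeOrNewlyAllocated(before))
}
```

In this model the scanner simply skips slots allocated during marking, which is harmless because such objects are already marked at allocation time.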
Change-Id: If66b8c536b775971203fb4358c17d711c2944723
Reviewed-on: https://go-review.googlesource.com/c/go/+/672340
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
When appending, if the backing store doesn't escape and a
constant-sized backing store is big enough, use a constant-sized
stack-allocated backing store instead of allocating it from the heap.
cmd/go is <0.1% bigger.
As an example of how this helps, if you edit strings/strings.go:FieldsFunc
to replace
spans := make([]span, 0, 32)
with
var spans []span
then this CL removes the first 2 allocations that are part of the growth sequence:
│ base │ exp │
│ allocs/op │ allocs/op vs base │
FieldsFunc/ASCII/16-24 3.000 ± ∞ ¹ 2.000 ± ∞ ¹ -33.33% (p=0.008 n=5)
FieldsFunc/ASCII/256-24 7.000 ± ∞ ¹ 5.000 ± ∞ ¹ -28.57% (p=0.008 n=5)
FieldsFunc/ASCII/4096-24 11.000 ± ∞ ¹ 9.000 ± ∞ ¹ -18.18% (p=0.008 n=5)
FieldsFunc/ASCII/65536-24 18.00 ± ∞ ¹ 16.00 ± ∞ ¹ -11.11% (p=0.008 n=5)
FieldsFunc/ASCII/1048576-24 30.00 ± ∞ ¹ 28.00 ± ∞ ¹ -6.67% (p=0.008 n=5)
FieldsFunc/Mixed/16-24 2.000 ± ∞ ¹ 2.000 ± ∞ ¹ ~ (p=1.000 n=5)
FieldsFunc/Mixed/256-24 7.000 ± ∞ ¹ 5.000 ± ∞ ¹ -28.57% (p=0.008 n=5)
FieldsFunc/Mixed/4096-24 11.000 ± ∞ ¹ 9.000 ± ∞ ¹ -18.18% (p=0.008 n=5)
FieldsFunc/Mixed/65536-24 18.00 ± ∞ ¹ 16.00 ± ∞ ¹ -11.11% (p=0.008 n=5)
FieldsFunc/Mixed/1048576-24 30.00 ± ∞ ¹ 28.00 ± ∞ ¹ -6.67% (p=0.008 n=5)
(Of course, people have spotted and fixed a bunch of allocation sites
like this, but now we're ~automatically doing it everywhere going forward.)
No significant increases in frame sizes in cmd/go.
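As an illustration of the kind of code that could benefit, here is a simplified sketch loosely modeled on the FieldsFunc pattern. The `fields` function is hypothetical, and whether the backing store really ends up on the stack depends on the compiler version and its escape analysis.

```go
package main

import "fmt"

type span struct{ start, end int }

// fields splits s on single spaces. spans starts as a nil slice with no
// capacity hint and never escapes this function.
func fields(s string) []string {
	var spans []span
	start := -1
	for i, r := range s {
		if r == ' ' {
			if start >= 0 {
				spans = append(spans, span{start, i})
				start = -1
			}
		} else if start < 0 {
			start = i
		}
	}
	if start >= 0 {
		spans = append(spans, span{start, len(s)})
	}
	// Only the result slice is handed to the caller.
	out := make([]string, len(spans))
	for i, sp := range spans {
		out[i] = s[sp.start:sp.end]
	}
	return out
}

func main() {
	fmt.Println(fields("a bc  d"))
}
```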
Change-Id: I301c4d9676667eacdae0058960321041d173751a
Reviewed-on: https://go-review.googlesource.com/c/go/+/664299
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
This was just enabled in CL 643897. It seems to work fine on Linux, but
there are traceback issues on Darwin. We could disable just on Darwin,
but I'm not sure SIGSEGV inside of TSAN is something we care to support.
Fixes #73784.
Cq-Include-Trybots: luci.golang.try:gotip-darwin-arm64-race
Change-Id: I6a6a636cb15d7affaeb22c4c13d8f2a5c9bb31fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/674276
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
|
|
ncpu is the total logical CPU count at startup. It is never updated. For
#73193, we will start using updated CPU counts for updated GOMAXPROCS,
making the ncpu name a bit ambiguous. Change to a less ambiguous name.
While we're at it, give the OS specific lookup functions a common name,
so it can be used outside of osinit later.
For #73193.
Change-Id: I6a6a636cf21cc60de36b211f3c374080849fc667
Reviewed-on: https://go-review.googlesource.com/c/go/+/672277
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
|
|
Moving to a smaller package allows its use in other internal/runtime
packages.
This isn't internal/strconvlite since it can't be used directly by
strconv.
For #73193.
Change-Id: I6a6a636c9c8b3f06b5fd6c07fe9dd5a7a37d1429
Reviewed-on: https://go-review.googlesource.com/c/go/+/672697
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
|
|
asancall and msancall are reachable from the signal handler, where we
are running on gsignal. Currently, these calls will use the g0 stack in
this case, but if the interrupted code was running on g0 this will
corrupt the stack and likely cause a crash.
As far as I know, racecall is not reachable from the signal handler, but
I have updated it as well for consistency.
This is the most straightforward fix, though it would be nice to
eventually migrate these wrappers to asmcgocall, which already handles
this case.
Fixes #71395.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-asan-clang15,gotip-linux-amd64-msan-clang15,gotip-linux-amd64-race
Change-Id: I6a6a636ccba826dd53e31c0e85b5d42fb1e98d12
Reviewed-on: https://go-review.googlesource.com/c/go/+/643875
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The tests using testprog / testprogcgo are currently not covered on the
asan/msan/race builders because they don't build testprog with the
sanitizer flag.
Explicitly pass the flag if the test itself is built with the sanitizer.
There were a few tests that explicitly passed -race (even on non-race
builders). These tests will now only run on race builders.
For #71395.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-asan-clang15,gotip-linux-amd64-msan-clang15,gotip-linux-amd64-race
Change-Id: I6a6a636ce8271246316a80d426c0e4e2f6ab99c5
Reviewed-on: https://go-review.googlesource.com/c/go/+/643897
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
|
|
This CL fixes a number of (all true positive) findings of vet's
copylock analyzer patched to treat the Bu{ff,ild}er types
as non-copyable after first use.
This does require imposing an additional indirection
between noder.writer and Encoder since the field is
embedded by value but its constructor now returns a pointer.
Updates golang/go#25907
Updates golang/go#47276
Change-Id: I0b4d77ac12bcecadf06a91709e695365da10766c
Reviewed-on: https://go-review.googlesource.com/c/go/+/635339
Reviewed-by: Robert Findley <rfindley@google.com>
Commit-Queue: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Alan Donovan <adonovan@google.com>
|
|
Currently, there's a window of time where each cleanup goroutine has
committed to going to sleep (immediately after full.pop() == nil) but
hasn't yet marked itself as asleep (state.sleep()). If new work arrives
in this window, it might get missed. This is what we see in #73642, and
I can reproduce it with stress2.
Side-note: even if the work gets missed by the existing sleeping
goroutines, needg is incremented. So in theory a new goroutine will
handle the work. Right now that doesn't happen in tests like the one
running in #73642, where there might never be another call to AddCleanup
to create the additional goroutine. Also, if we've hit the maximum on
cleanup goroutines and all of them are in this window simultaneously, we
can still end up missing work, it's just more rare. So this is still a
problem even if we choose to just be more aggressive about creating new
cleanup goroutines.
This change fixes the problem and also aims to make the cleanup
wake/sleep code clearer. The way this change fixes this problem is to
have cleanup goroutines re-check the work list before going to sleep,
but after having already marked themselves as sleeping. This way, if new
work comes in before the cleanup goroutine marks itself as going to
sleep, we can rely on the re-check to pick up that work. If new work
comes after the goroutine marks itself as going to sleep and after the
re-check, we can rely on the scheduler noticing that the goroutine is
asleep and waking it up. If work comes in between a goroutine marking
itself as sleeping and the re-check, then the re-check will catch that
piece of work. However, the scheduler might now get a false signal that
the goroutine is asleep and try to wake it up. This is OK. The sleeping
signal is now mutated and double-checked under the queue lock, so the
scheduler will grab the lock, may notice there are no sleeping
goroutines, and go on its way. This may cause spurious lock acquisitions
but it should be very rare. The window between a cleanup goroutine
marking itself as going to sleep and re-checking the work list is a
handful of instructions at most.
This seems subtle but overall it's a simplification of the code. We
rely more on the lock, which is easier to reason about, and we track two
separate atomic variables instead of the merged cleanupSleepState: the
length of the full list, and the number of cleanup goroutines that are
asleep. The former is now the primary way to acquire work. Cleanup
goroutines must decrement the length successfully to obtain an item off
the full list. The number of cleanup goroutines asleep, meanwhile, is
now only updated with the queue lock held. It can be checked without the
lock held, and the invariant to make that safe is simple: it must always
be an overestimate of the number of sleeping cleanup goroutines.
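The protocol above can be sketched as a small model. This is not the runtime's implementation: `queue`, `tryTake`, `trySleep`, and `wake` are hypothetical names, and `sync.Mutex` stands in for the runtime's queue lock.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// queue is a toy model of the cleanup work queue.
type queue struct {
	mu      sync.Mutex
	fullLen atomic.Int64 // number of blocks on the full list
	asleep  atomic.Int64 // overestimate of sleepers; written only under mu
}

// tryTake claims one block by successfully decrementing fullLen.
func (q *queue) tryTake() bool {
	for {
		n := q.fullLen.Load()
		if n == 0 {
			return false
		}
		if q.fullLen.CompareAndSwap(n, n-1) {
			return true
		}
	}
}

// trySleep marks the caller as asleep and then re-checks for work. It
// reports whether the caller should actually go to sleep.
func (q *queue) trySleep() bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.asleep.Add(1)
	if q.fullLen.Load() > 0 {
		q.asleep.Add(-1) // work arrived in the window: stay awake
		return false
	}
	return true
}

// wake checks asleep without the lock, then double-checks under it.
func (q *queue) wake() bool {
	if q.asleep.Load() == 0 {
		return false
	}
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.asleep.Load() == 0 {
		return false
	}
	q.asleep.Add(-1)
	return true
}

func main() {
	q := &queue{}
	fmt.Println("sleeps with no work:", q.trySleep())
	q.fullLen.Add(1)
	fmt.Println("waker finds a sleeper:", q.wake())
	fmt.Println("worker takes the block:", q.tryTake())
}
```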
The changes here do change some other behaviors.
First, since we're tracking the length of the full list instead of the
abstract concept of a wake-up, the waker can't consume wake-ups anymore.
This means that cleanup goroutines may be created more aggressively. If
two threads in the scheduler see that there are goroutines that are
asleep, only one will win the race, but the other will observe zero
asleep goroutines but potentially many work units available. This will
cause it to signal many goroutines to be created. This is OK since we
have a cap on the number of cleanup goroutines, and the race should be
relatively rare.
Second, because cleanup goroutines can now fail to go to sleep if any
units of work come in, they might spend more time contended on the lock.
For example, if we have G cleanup goroutines and work comes in at *just*
the wrong rate, in the worst case we'll have each of G goroutines loop
N times for N blocks, resulting in O(G*N) thread time to handle each
block in the worst case. To paint a picture, imagine each goroutine
trying to go to sleep and failing because a new block of work came in,
while only one goroutine gets that block. Then once that goroutine is
done, we all try again, fail because a new block of work came in, and so
on and so forth. This case is unlikely, though, and probably not worth
worrying about until it actually becomes a problem. (A similar problem
exists with parking (and exists before this change, too) but at least in
that case each goroutine parks, so it doesn't block the thread.)
Fixes #73642.
Change-Id: I6bbe1b789e7eb7e8168e56da425a6450fbad9625
Reviewed-on: https://go-review.googlesource.com/c/go/+/671676
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
TestMutexBlockFullAggregation aggregates stacks by function, file, and
line number. But there can be multiple function calls on the same line,
giving us different sequences of PCs. This causes the test to spuriously
fail in some cases. Include PCs in the stacks for this test.
Also pick up a small "range over int" modernize suggestion while we're
looking at the test.
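The underlying effect can be reproduced in a few lines: two calls on one source line report the same line number but different PCs. This is a standalone sketch, not the test's code; `site`, `here`, and `twoCallsOneLine` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"runtime"
)

// site records where here was called from.
type site struct {
	pc   uintptr
	line int
}

// here reports the PC and line of its call site. It is kept out of line so
// that each call has its own return address.
//
//go:noinline
func here() site {
	pc, _, line, _ := runtime.Caller(1)
	return site{pc: pc, line: line}
}

// twoCallsOneLine makes two calls that share a single source line.
func twoCallsOneLine() (site, site) {
	a, b := here(), here()
	return a, b
}

func main() {
	a, b := twoCallsOneLine()
	fmt.Println("same line:", a.line == b.line)
	fmt.Println("same PC:", a.pc == b.pc)
}
```

Aggregating such stacks by function, file, and line alone would merge the two call sites, while their PC sequences remain distinct.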
Fixes #73641
Change-Id: I50489e19fcf920e27b9eebd9d4b35feb89981cbc
Reviewed-on: https://go-review.googlesource.com/c/go/+/673115
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This change makes it so that cleanup goroutines, in race mode, create a
fake race context and switch to it, emulating cleanups running on new
goroutines. This helps in catching races between cleanups that might run
concurrently.
Change-Id: I4c4e33054313798d4ac4e5d91ff2487ea3eb4b16
Reviewed-on: https://go-review.googlesource.com/c/go/+/652635
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
On every arch except amd64, it is faster to do x&(x-1) than x^(1<<n).
Most archs need 3 instructions for the latter: MOV $1, R; SLL n, R;
ANDN R, x. Maybe 4 if there's no ANDN.
Most archs need only 2 instructions to do x&(x-1). It takes 3 on
x86/amd64 because NEG only works in place.
Only amd64 can do x^(1<<n) in a single instruction.
(We could on 386 also, but that's currently not implemented.)
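For reference, the two forms agree when n is the index of the lowest set bit, which is assumed here to be the situation the rewrite targets (for example, a loop over set bits). The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// clearLowestAnd clears the lowest set bit without needing its index.
func clearLowestAnd(x uint64) uint64 { return x & (x - 1) }

// clearLowestXor clears the lowest set bit by toggling bit n, where n is
// the index of that bit.
func clearLowestXor(x uint64) uint64 {
	n := bits.TrailingZeros64(x)
	return x ^ (1 << n)
}

func main() {
	x := uint64(0b10110)
	fmt.Printf("%b %b\n", clearLowestAnd(x), clearLowestXor(x))
}
```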
Change-Id: I3b74b7a466ab972b20a25dbb21b572baf95c3467
Reviewed-on: https://go-review.googlesource.com/c/go/+/672956
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The memclr routine for the s390x architecture is now implemented with
vector operations, and loop unrolling is used for larger sizes.
goos: linux
goarch: s390x
pkg: runtime
| old.txt | new_final.txt |
| sec/op | sec/op vs base |
Memclr/5 2.485n ± 5% 2.421n ± 0% -2.54% (p=0.000 n=10)
Memclr/16 3.037n ± 2% 2.969n ± 0% -2.26% (p=0.001 n=10)
Memclr/64 9.623n ± 0% 4.455n ± 1% -53.70% (p=0.000 n=10)
Memclr/256 3.347n ± 3% 3.312n ± 4% ~ (p=0.670 n=10)
Memclr/4096 15.53n ± 0% 15.54n ± 0% +0.06% (p=0.000 n=10)
Memclr/65536 329.8n ± 2% 228.4n ± 0% -30.74% (p=0.000 n=10)
Memclr/1M 13.09µ ± 0% 12.78µ ± 0% -2.34% (p=0.000 n=10)
Memclr/4M 52.33µ ± 0% 51.16µ ± 0% -2.24% (p=0.000 n=10)
Memclr/8M 104.6µ ± 0% 102.3µ ± 0% -2.20% (p=0.000 n=10)
Memclr/16M 209.4µ ± 0% 204.9µ ± 0% -2.17% (p=0.000 n=10)
Memclr/64M 977.8µ ± 0% 967.8µ ± 0% -1.02% (p=0.000 n=10)
MemclrUnaligned/0_5 3.398n ± 0% 3.657n ± 0% +7.62% (p=0.000 n=10)
MemclrUnaligned/0_16 3.957n ± 0% 3.958n ± 0% ~ (p=0.325 n=10)
MemclrUnaligned/0_64 11.550n ± 0% 5.139n ± 0% -55.51% (p=0.000 n=10)
MemclrUnaligned/0_256 4.288n ± 0% 4.025n ± 4% -6.14% (p=0.000 n=10)
MemclrUnaligned/0_4096 15.53n ± 0% 15.53n ± 0% ~ (p=1.000 n=10)
MemclrUnaligned/0_65536 318.3n ± 1% 233.9n ± 0% -26.52% (p=0.000 n=10)
MemclrUnaligned/1_5 3.398n ± 0% 3.657n ± 0% +7.62% (p=0.000 n=10)
MemclrUnaligned/1_16 3.965n ± 0% 3.969n ± 0% +0.10% (p=0.000 n=10)
MemclrUnaligned/1_64 11.550n ± 0% 5.109n ± 0% -55.76% (p=0.000 n=10)
MemclrUnaligned/1_256 4.385n ± 0% 4.174n ± 1% -4.80% (p=0.000 n=10)
MemclrUnaligned/1_4096 26.23n ± 0% 26.24n ± 0% +0.04% (p=0.005 n=10)
MemclrUnaligned/1_65536 570.5n ± 0% 401.3n ± 0% -29.66% (p=0.000 n=10)
MemclrUnaligned/4_5 3.398n ± 0% 3.657n ± 0% +7.62% (p=0.000 n=10)
MemclrUnaligned/4_16 3.965n ± 0% 3.973n ± 1% +0.19% (p=0.000 n=10)
MemclrUnaligned/4_64 11.550n ± 0% 5.131n ± 0% -55.58% (p=0.000 n=10)
MemclrUnaligned/4_256 4.419n ± 0% 4.187n ± 1% -5.25% (p=0.000 n=10)
MemclrUnaligned/4_4096 26.23n ± 0% 26.24n ± 0% +0.04% (p=0.011 n=10)
MemclrUnaligned/4_65536 570.5n ± 0% 401.2n ± 0% -29.67% (p=0.000 n=10)
MemclrUnaligned/7_5 3.397n ± 0% 3.657n ± 0% +7.65% (p=0.000 n=10)
MemclrUnaligned/7_16 3.965n ± 0% 3.969n ± 0% +0.10% (p=0.000 n=10)
MemclrUnaligned/7_64 11.550n ± 0% 5.120n ± 0% -55.67% (p=0.000 n=10)
MemclrUnaligned/7_256 4.407n ± 0% 4.188n ± 2% -4.99% (p=0.000 n=10)
MemclrUnaligned/7_4096 26.24n ± 0% 26.24n ± 0% ~ (p=1.000 n=10)
MemclrUnaligned/7_65536 570.8n ± 0% 401.3n ± 0% -29.69% (p=0.000 n=10)
MemclrUnaligned/0_1M 13.08µ ± 0% 12.81µ ± 0% -2.06% (p=0.000 n=10)
MemclrUnaligned/0_4M 52.28µ ± 0% 51.13µ ± 0% -2.21% (p=0.000 n=10)
MemclrUnaligned/0_8M 104.6µ ± 0% 102.3µ ± 0% -2.18% (p=0.000 n=10)
MemclrUnaligned/0_16M 209.5µ ± 0% 204.8µ ± 0% -2.24% (p=0.000 n=10)
MemclrUnaligned/0_64M 977.7µ ± 0% 969.1µ ± 0% -0.88% (p=0.000 n=10)
MemclrUnaligned/1_1M 17.49µ ± 0% 16.04µ ± 0% -8.32% (p=0.000 n=10)
MemclrUnaligned/1_4M 69.92µ ± 0% 64.13µ ± 0% -8.28% (p=0.000 n=10)
MemclrUnaligned/1_8M 139.8µ ± 0% 128.2µ ± 0% -8.32% (p=0.000 n=10)
MemclrUnaligned/1_16M 279.9µ ± 0% 256.1µ ± 0% -8.50% (p=0.000 n=10)
MemclrUnaligned/1_64M 1.250m ± 0% 1.216m ± 0% -2.73% (p=0.000 n=10)
MemclrUnaligned/4_1M 17.50µ ± 0% 16.04µ ± 0% -8.33% (p=0.000 n=10)
MemclrUnaligned/4_4M 69.93µ ± 0% 64.12µ ± 0% -8.30% (p=0.000 n=10)
MemclrUnaligned/4_8M 139.8µ ± 0% 128.2µ ± 0% -8.32% (p=0.000 n=10)
MemclrUnaligned/4_16M 280.2µ ± 0% 256.2µ ± 0% -8.55% (p=0.000 n=10)
MemclrUnaligned/4_64M 1.250m ± 0% 1.216m ± 0% -2.73% (p=0.000 n=10)
MemclrUnaligned/7_1M 17.50µ ± 0% 16.04µ ± 0% -8.35% (p=0.000 n=10)
MemclrUnaligned/7_4M 69.92µ ± 0% 64.13µ ± 0% -8.28% (p=0.000 n=10)
MemclrUnaligned/7_8M 139.8µ ± 0% 128.2µ ± 0% -8.34% (p=0.000 n=10)
MemclrUnaligned/7_16M 279.6µ ± 0% 256.2µ ± 0% -8.35% (p=0.000 n=10)
MemclrUnaligned/7_64M 1.250m ± 0% 1.216m ± 0% -2.73% (p=0.000 n=10)
MemclrRange/1K_2K 1.053µ ± 0% 1.020µ ± 1% -3.09% (p=0.000 n=10)
MemclrRange/2K_8K 1.552µ ± 0% 1.570µ ± 12% ~ (p=0.137 n=10)
MemclrRange/4K_16K 1.283µ ± 0% 1.250µ ± 0% -2.61% (p=0.000 n=10)
MemclrRange/160K_228K 20.62µ ± 0% 19.86µ ± 0% -3.70% (p=0.000 n=10)
MemclrKnownSize1 1.732n ± 0% 1.732n ± 0% ~ (p=1.000 n=10)
MemclrKnownSize2 1.925n ± 34% 1.967n ± 8% ~ (p=0.080 n=10)
MemclrKnownSize4 1.808n ± 3% 1.732n ± 0% -4.20% (p=0.000 n=10)
MemclrKnownSize8 2.002n ± 9% 1.773n ± 5% -11.46% (p=0.000 n=10)
MemclrKnownSize16 2.880n ± 5% 2.461n ± 5% -14.53% (p=0.000 n=10)
MemclrKnownSize32 8.082n ± 0% 2.838n ± 5% -64.88% (p=0.000 n=10)
MemclrKnownSize64 8.083n ± 0% 4.960n ± 4% -38.63% (p=0.000 n=10)
MemclrKnownSize112 8.082n ± 0% 5.533n ± 1% -31.53% (p=0.000 n=10)
MemclrKnownSize128 8.082n ± 0% 5.534n ± 1% -31.54% (p=0.000 n=10)
MemclrKnownSize192 8.082n ± 0% 6.833n ± 2% -15.45% (p=0.000 n=10)
MemclrKnownSize248 8.082n ± 0% 7.165n ± 1% -11.34% (p=0.000 n=10)
MemclrKnownSize256 2.995n ± 6% 3.226n ± 4% +7.70% (p=0.006 n=10)
MemclrKnownSize512 3.356n ± 8% 3.595n ± 3% +7.14% (p=0.007 n=10)
MemclrKnownSize1024 4.664n ± 0% 4.665n ± 0% ~ (p=0.426 n=10)
MemclrKnownSize4096 15.80n ± 4% 15.15n ± 0% ~ (p=0.449 n=10)
MemclrKnownSize512KiB 6.543µ ± 0% 6.380µ ± 0% -2.48% (p=0.000 n=10)
geomean 327.2n 286.6n -12.42%
Change-Id: I0f8450743e2f7e736c5ff96a316a8b5d98b27222
Reviewed-on: https://go-review.googlesource.com/c/go/+/662475
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Because freebsd is now enabling la57 by default.
Fixes #49405
Change-Id: I30f7bac8b8a9baa85e0c097e06072c19ad474e5a
Reviewed-on: https://go-review.googlesource.com/c/go/+/670715
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Fixes #73107
Change-Id: I41f3e1bd1fdaca2f0e94151b2320bd569e258a51
Reviewed-on: https://go-review.googlesource.com/c/go/+/671576
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
It supports features described in the issue:
* add -json flag for 'go version -m' to print json encoding of
runtime/debug.BuildSetting to standard output.
* report an error when specifying -json flag without -m.
* print build settings on a separate line for each binary.
Fixes #69712
Change-Id: I79cba2109f80f7459252d197a74959694c4eea1f
Reviewed-on: https://go-review.googlesource.com/c/go/+/619955
Reviewed-by: Sam Thanawalla <samthanawalla@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This change reintroduces CL 564197. It was reverted due to a failing
benchmark. That failure has been resolved.
For #65064
Change-Id: Ic88841d2bc24c2717ad324873f0f52699f21dc66
Reviewed-on: https://go-review.googlesource.com/c/go/+/669235
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
gcMarkTermination() ensures that all caches are flushed before continuing the GC cycle, thus preempting all goroutines.
However, inlining calls to lock() in bgsweep makes it non-preemptible for most of the time, leading to livelock.
This change adds explicit preemption to avoid this.
Fixes #73499.
Change-Id: I4abf0d658f3d7a03ad588469cd013a0639de0c8a
Reviewed-on: https://go-review.googlesource.com/c/go/+/668795
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
If cputicks is in the top quarter of the int64's range, adding two
values together will overflow and confuse the subsequent calculations,
leading to zero-duration contention events in the profile.
This fixes the TestRuntimeLockMetricsAndProfile failures on the
linux-s390x builder.
Change-Id: Icb814c39a8702379dfd71c06a53b2618e3589e07
Reviewed-on: https://go-review.googlesource.com/c/go/+/671115
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Rhys Hiltner <rhys.hiltner@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
We haven't used this mechanism since CL 616255, so the metric will
always be zero.
Update #73628
Change-Id: Ic179927a8bc24e6291876c218d88e8848b057c2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/671096
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This change splits the finalizer and cleanup queues and implements a new
lock-free blocking queue for cleanups. The basic design is as follows:
The cleanup queue is organized in fixed-sized blocks. Individual cleanup
functions are queued, but only whole blocks are dequeued.
Enqueuing cleanups places them in P-local cleanup blocks. These are
flushed to the full list as they get full. Cleanups can only be enqueued
by an active sweeper.
Dequeuing cleanups always dequeues entire blocks from the full list.
Cleanup blocks can be dequeued and executed at any time.
The very last active sweeper in the sweep phase is responsible for
flushing all local cleanup blocks to the full list. It can do this
without any synchronization because the next GC can't start yet, so we
can be very certain that nobody else will be accessing the local blocks.
Cleanup blocks are stored off-heap because they need to be allocated by
the sweeper, which is called from heap allocation paths. As a result,
the GC treats cleanup blocks as roots, just like finalizer blocks.
Flushes to the full list signal to the scheduler that cleanup goroutines
should be awoken. Every time the scheduler goes to wake up a cleanup
goroutine and there were more signals than goroutines to wake, it then
forwards this signal to runtime.AddCleanup, so that it creates another
goroutine the next time it is called, up to gomaxprocs goroutines.
The signals here are a little convoluted, but exist because the sweeper
and the scheduler cannot safely create new goroutines.
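The block structure described above can be sketched roughly as follows. This is a simplified illustration, not the runtime's implementation: it uses a mutex-protected slice where the real queue is lock-free, and the names block, local, queue, and blockSize are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

const blockSize = 4 // hypothetical; the real block size is not stated here

// block holds a fixed number of cleanup functions.
type block struct {
	fns [blockSize]func()
	n   int
}

// queue is the shared "full list". A mutex stands in for the
// lock-free structure described in the commit message.
type queue struct {
	mu   sync.Mutex
	full []*block
}

// local models the per-P block that individual cleanups are enqueued into.
type local struct {
	cur *block
}

// enqueue adds one cleanup to the local block and flushes it once full.
func (l *local) enqueue(q *queue, fn func()) {
	if l.cur == nil {
		l.cur = new(block)
	}
	l.cur.fns[l.cur.n] = fn
	l.cur.n++
	if l.cur.n == blockSize {
		l.flush(q)
	}
}

// flush moves the local block, full or partial, to the full list.
func (l *local) flush(q *queue) {
	if l.cur == nil || l.cur.n == 0 {
		return
	}
	q.mu.Lock()
	q.full = append(q.full, l.cur)
	q.mu.Unlock()
	l.cur = nil
}

// dequeue removes one whole block from the full list, or returns nil.
func (q *queue) dequeue() *block {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.full) == 0 {
		return nil
	}
	b := q.full[0]
	q.full = q.full[1:]
	return b
}

func main() {
	var q queue
	var l local
	ran := 0
	for i := 0; i < blockSize+2; i++ {
		l.enqueue(&q, func() { ran++ })
	}
	l.flush(&q) // plays the role of the last sweeper's final flush
	for b := q.dequeue(); b != nil; b = q.dequeue() {
		for i := 0; i < b.n; i++ {
			b.fns[i]()
		}
	}
	fmt.Println(ran == blockSize+2)
}
```

Enqueuing per function but dequeuing per block keeps the shared list touched only once per blockSize cleanups, which is the property the design above relies on.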
For #71772.
For #71825.
Change-Id: Ie839fde2b67e1b79ac1426be0ea29a8d923a62cc
Reviewed-on: https://go-review.googlesource.com/c/go/+/650697
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
It's the job of the last sweeper to emit the GC pacer trace. The last
sweeper can identify themselves by reducing the count of sweepers, and
also seeing that there's no more sweep work.
Currently this identification is broken, however, because the last
sweeper doesn't check the state they just transitioned sweeping into,
but rather the state they transitioned from (one sweeper, no sweep work
left). By design, it's impossible to transition *out* of this state,
except for another GC to start, but that doesn't take this codepath.
This means lines like
pacer: sweep done at heap size ...
were missing from the gcpacertrace output for a long time.
This change fixes this problem by having the last sweeper check the
state they just transitioned sweeping to, instead of the state they
transitioned from.
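The difference between the two checks can be sketched with a small atomic state word. This is an illustration under assumptions, not the runtime's code: sweepState, drainedMask, end, and endBuggy are hypothetical names, with the low bits counting active sweepers and one high bit meaning "no sweep work left".

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// drainedMask is a hypothetical flag bit meaning "no sweep work left".
// The remaining bits count active sweepers.
const drainedMask = 1 << 31

type sweepState struct{ v atomic.Uint32 }

// begin registers one more active sweeper.
func (s *sweepState) begin() { s.v.Add(1) }

// markDrained records that there is no more sweep work.
func (s *sweepState) markDrained() {
	for {
		old := s.v.Load()
		if s.v.CompareAndSwap(old, old|drainedMask) {
			return
		}
	}
}

// endBuggy checks the state it transitioned from. That state always
// has at least one sweeper, so it can never equal drainedMask.
func (s *sweepState) endBuggy() bool {
	for {
		old := s.v.Load()
		if s.v.CompareAndSwap(old, old-1) {
			return old == drainedMask
		}
	}
}

// end checks the state it transitioned to: zero sweepers, drained.
func (s *sweepState) end() bool {
	for {
		old := s.v.Load()
		if s.v.CompareAndSwap(old, old-1) {
			return old-1 == drainedMask
		}
	}
}

func main() {
	var buggy, fixed sweepState
	buggy.begin()
	buggy.markDrained()
	fmt.Println(buggy.endBuggy()) // never identifies the last sweeper

	fixed.begin()
	fixed.markDrained()
	fmt.Println(fixed.end()) // identifies the last sweeper
}
```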
Change-Id: I44bcd32fe2c8ae6ac6c21ba6feb2e7b9e17f60cc
Reviewed-on: https://go-review.googlesource.com/c/go/+/670735
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Previous fix in CL 667715 wasn't correct for aix.
Change-Id: I44042786079463967165507b15756cf24b9a213a
Reviewed-on: https://go-review.googlesource.com/c/go/+/668036
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
We've settled on calling the group of goroutines started by
synctest.Run a "bubble". At the time the runtime implementation
was written, I was still calling this a "group". Update the code
to match the current terminology.
Change-Id: I31b757f31d804b5d5f9564c182627030a9532f4a
Reviewed-on: https://go-review.googlesource.com/c/go/+/670135
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Damien Neil <dneil@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Once the goroutine started by synctest.Run exits, stop advancing
the fake clock in its bubble. This avoids confusing situations
where a bubble remains alive indefinitely while a background
goroutine reads from a time.Ticker or otherwise advances the clock.
For #67434
Change-Id: Id608ffe3c7d7b07747b56a21f365787fb9a057d7
Reviewed-on: https://go-review.googlesource.com/c/go/+/662155
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Damien Neil <dneil@google.com>
|
|
Go 1.22 promised to remove the setting in a future release once the
semantics of runtime-internal lock contention matched that of
sync.Mutex. That work is done, so remove the setting.
Previously reviewed as https://go.dev/cl/585639.
For #66999
Change-Id: I9fe62558ba0ac12824874a0bb1b41efeb7c0853f
Reviewed-on: https://go-review.googlesource.com/c/go/+/668995
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
Have the test use the same clock (cputicks) as the profiler, and use the
test's own measurements as hard bounds on the magnitude to expect in the
profile.
Compare the depiction of two users of the same lock: one where the
critical section is fast, one where it is slow. Confirm that the profile
shows the slow critical section as a large source of delay (with #66999
fixed), rather than showing the fast critical section as a large
recipient of delay.
Previously reviewed as https://go.dev/cl/586237.
For #66999
Change-Id: Ic2d78cc29153d5322577d84abdc448e95ed8f594
Reviewed-on: https://go-review.googlesource.com/c/go/+/667616
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
|
|
Correct how the mutex contention profile reports on runtime-internal
mutex values, to match sync.Mutex's semantics.
Decide at the start of unlock2 whether we'd like to collect a contention
sample. If so: Opt in to a slightly slower unlock path which avoids
accidentally accepting blame for delay caused by other Ms. Release the
lock before doing an O(N) traversal of the stack of waiting Ms, to
calculate the total delay to those Ms that our critical section caused.
Report that, with the current callstack, in the mutex profile.
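The O(N) traversal mentioned above can be pictured as a walk over a linked list of waiters that sums how long each has been blocked. This is a sketch only: the waiter type, its fields, and totalDelay are hypothetical, and the runtime's actual accounting may attribute delay differently.

```go
package main

import "fmt"

// waiter is a hypothetical stand-in for a blocked M.
type waiter struct {
	waitStart int64 // tick at which this waiter began blocking
	next      *waiter
}

// totalDelay walks the list of waiters and sums how long each one has
// been waiting as of now. Waiters that started after now contribute zero.
func totalDelay(head *waiter, now int64) int64 {
	var sum int64
	for w := head; w != nil; w = w.next {
		if d := now - w.waitStart; d > 0 {
			sum += d
		}
	}
	return sum
}

func main() {
	ws := &waiter{waitStart: 100, next: &waiter{waitStart: 140, next: &waiter{waitStart: 190}}}
	fmt.Println(totalDelay(ws, 200)) // 100 + 60 + 10
}
```

Doing this walk after the lock has been released, as the message describes, keeps the traversal cost out of the critical section being measured.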
Fixes #66999
Change-Id: I561ed8dc120669bd045d514cb0d1c6c99c2add04
Reviewed-on: https://go-review.googlesource.com/c/go/+/667615
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
The race feature depends on LLVM. Support for building the tsan library on
linux/loong64 was added in patch [1], which has been merged into the main
branch and has landed in LLVM 18.
Support for linux/loong64 in racebuild was implemented in CL 655775, so
racebuild can now successfully build race_linux_loong64.syso [2].
[1]: https://github.com/llvm/llvm-project/pull/72819
[2]: racebuild -platforms linux/loong64 -cherrypick 'refs/changes/16/543316/10' \
-rev 83fe85115da9dc25fa270d2ea8140113c8d49670 \
-goroot /home/golang/src/go
Co-authored-by: Xiaolin Zhao <zhaoxiaolin@loongson.cn>
Change-Id: If389318215476890295ed771297c6c088cfc84b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/543316
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
|
|
When synctest.Run panics due to every goroutine in the bubble being
blocked, print a stack trace for every goroutine in the bubble.
For #67434
Change-Id: Ie751c2ee6fa136930b18f4bee0277ff30da46905
Reviewed-on: https://go-review.googlesource.com/c/go/+/645719
Auto-Submit: Damien Neil <dneil@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The current Error documentation is vacuous and doesn't say anything
about what this interface is actually for. Expand to include its meaning
and why it might be used.
Change-Id: I6a6a636cbd5f5788cb9d1a88845de16b98f7424b
Reviewed-on: https://go-review.googlesource.com/c/go/+/670635
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
|
|
This change moves the unique package away from using a concurrent map
and instead toward a bespoke concurrent canonicalization map. The map
holds all its keys weakly, though keys may be looked up by value. The
result is the strong pointer for the canonical value. Entries in the map
are automatically cleaned up once the canonical reference no longer
exists.
Why do this? There's a problem with the current implementation when it
comes to chains of unique.Handle: because the unique map will have a
unique.Handle stored in its keys, each nested handle must be cleaned up
1 GC at a time. It takes N GC cycles, at minimum, to clean up a nested
chain of N handles. This implementation, where the *only* value in the
set is weakly-held, does not have this problem. The entire chain is
dropped at once.
The canon map implementation is a stripped-down version of HashTrieMap.
The weak set implementation also has lower memory overheads by virtue of
the fact that keys are all stored weakly. Whereas the previous map had
both a T and a weak.Pointer[T], this *only* has a weak.Pointer[T].
The canonicalization map is a better abstraction overall and
dramatically simplifies the unique.Make code.
While we're here, delete the background goroutine and switch to
runtime.AddCleanup. This is a step toward fixing #71772. We still need
some kind of back-pressure mechanism, which will be implemented in a
follow-up CL.
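The idea of a map that holds its entries only weakly can be sketched with the weak and hash/maphash packages (both uses require Go 1.24 or later). This is a much-simplified illustration, not the canon map from this CL: canonMap and its methods are hypothetical, a mutex-guarded Go map replaces the hash-trie, and dead entries are pruned lazily on lookup rather than through runtime.AddCleanup.

```go
package main

import (
	"fmt"
	"hash/maphash"
	"sync"
	"weak"
)

// canonMap stores only weak pointers to canonical values, bucketed by hash.
// No strong copy of the value is kept in the map itself.
type canonMap[T comparable] struct {
	mu      sync.Mutex
	seed    maphash.Seed
	buckets map[uint64][]weak.Pointer[T]
}

func newCanonMap[T comparable]() *canonMap[T] {
	return &canonMap[T]{
		seed:    maphash.MakeSeed(),
		buckets: make(map[uint64][]weak.Pointer[T]),
	}
}

// canonical returns the strong pointer for the canonical copy of v,
// creating one if no live entry with an equal value exists.
func (m *canonMap[T]) canonical(v T) *T {
	h := maphash.Comparable(m.seed, v)
	m.mu.Lock()
	defer m.mu.Unlock()

	live := m.buckets[h][:0] // filter in place, dropping dead entries
	var found *T
	for _, wp := range m.buckets[h] {
		p := wp.Value()
		if p == nil {
			continue // referent was collected
		}
		live = append(live, wp)
		if *p == v {
			found = p
		}
	}
	if found == nil {
		found = new(T)
		*found = v
		live = append(live, weak.Make(found))
	}
	m.buckets[h] = live
	return found
}

func main() {
	m := newCanonMap[string]()
	a := m.canonical("hello")
	b := m.canonical("hello")
	c := m.canonical("world")
	fmt.Println(a == b, a == c)
}
```

Because the map holds no strong reference, an entry becomes collectable as soon as the last returned pointer is dropped, regardless of what the value itself refers to.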
For #71772.
Fixes #71846.
Change-Id: I5b2ee04ebfc7f6dd24c2c4a959dd0f6a8af24ca4
Reviewed-on: https://go-review.googlesource.com/c/go/+/650256
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
Fixes #73526
Change-Id: I4b801cf3e54b99559e6d5ca8fdb2fd0692a0d3a5
Reviewed-on: https://go-review.googlesource.com/c/go/+/669975
TryBot-Bypass: Mark Freeman <mark@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Mark Freeman <mark@golang.org>
Reviewed-by: Mark Freeman <mark@golang.org>
|
|
This reverts commits
3f3782feed6e0726ddb08afd32dad7d94fbb38c6 (CL 648518)
b386b628521780c048af14a148f373c84e687b26 (CL 668475)
Fixes #73542
Change-Id: I218851c5c0b62700281feb0b3f82b6b9b97b910d
Reviewed-on: https://go-review.googlesource.com/c/go/+/670055
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Corollary to CL 669615.
morestack uses the frame pointer from g0.sched.bp. This doesn't really
make any sense. morestack wasn't called by whatever used g0 last, so at
best unwinding will get misleading results.
For #63630.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-arm64-longtest
Change-Id: I6a6a636c3a2994eb88f890c506c96fd899e993a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/669616
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Nick Ripley <nick.ripley@datadoghq.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
On arm64, systemstack restores the frame pointer from g0.sched to R29
prior to calling the callback. That doesn't really make any sense. The
frame pointer value in g0.sched is some arbitrary BP from a prior
context save, but that is not the caller of systemstack.
amd64 does not do this. In fact, it leaves BP completely unmodified so
frame pointer unwinders like gdb can walk through the systemstack frame
and continue traceback on the caller's stack. Unlike mcall, systemstack
always returns to the original goroutine, so that is safe.
We should do the same on arm64.
For #63630.
Cq-Include-Trybots: luci.golang.try:gotip-linux-arm64-longtest
Change-Id: I6a6a636c35d321dd5d7dc1c4d09e29b55b1ab621
Reviewed-on: https://go-review.googlesource.com/c/go/+/669236
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Nick Ripley <nick.ripley@datadoghq.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
On amd64, mcall leaves BP untouched, so the callback will push BP,
connecting the g0 stack to the calling g stack. This seems OK (frame
pointer unwinders like Linux perf can see what user code called into the
scheduler), but the "scheduler" part is problematic.
mcall is used when calling into the scheduler to deschedule the current
goroutine (e.g., in goyield). Once the goroutine is descheduled, it may
be picked up by another M and continue execution. The other thread is
mutating the goroutine stack, but our M still has a frame pointer
pointing to the goroutine stack.
A frame pointer unwinder like Linux perf could get bogus values off of
the mutating stack. Note that though the execution tracer uses
framepointer unwinding, it never unwinds a g0, so it isn't affected.
Clear the frame pointer in mcall so that unwinding always stops at
mcall.
On arm64, mcall stores the frame pointer from g0.sched.bp. This doesn't
really make any sense. mcall wasn't called by whatever used g0 last, so
at best unwinding will get misleading results (e.g., it might look like
cgocallback calls mcall?).
Also clear the frame pointer on arm64.
Other architectures don't use frame pointers.
For #63630.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-arm64-longtest
Change-Id: I6a6a636cb6404f3c95ecabdb969c9b8184615cee
Reviewed-on: https://go-review.googlesource.com/c/go/+/669615
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Nick Ripley <nick.ripley@datadoghq.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
|
|
Our current parallel mark algorithm suffers from frequent stalls on
memory since its access pattern is essentially random. Small objects
are the worst offenders, since each one forces pulling in at least one
full cache line to access even when the amount to be scanned is far
smaller than that. Each object also requires an independent access to
per-object metadata.
The purpose of this change is to improve garbage collector performance
by scanning small objects in batches to obtain better cache locality
than our current approach. The core idea behind this change is to defer
marking and scanning small objects, and then scan them in batches
localized to a span.
This change adds scanned bits to each small object (<=512 bytes) span in
addition to mark bits. The scanned bits indicate that the object has
been scanned. (One way to think of them is "grey" bits and "black" bits
in the tri-color mark-sweep abstraction.) Each of these spans is always
8 KiB and if they contain pointers, the pointer/scalar data is already
packed together at the end of the span, allowing us to further optimize
the mark algorithm for this specific case.
When the GC encounters a pointer, it first checks if it points into a
small object span. If so, it is first marked in the mark bits, and then
the object is queued on a work-stealing P-local queue. This object
represents the whole span, and we ensure that a span can only appear at
most once in any queue by maintaining an atomic ownership bit for each
span. Later, when the pointer is dequeued, we scan every object with a
set mark that doesn't have a corresponding scanned bit. If it turns out
that was the only object in the mark bits since the last time we scanned
the span, we scan just that object directly, essentially falling back to
the existing algorithm. noscan objects have no scan work, so they are
never queued.
Each span's mark and scanned bits are co-located together at the end of
the span. Since the span is always 8 KiB in size, it can be found with
simple pointer arithmetic. Next to the marks and scans we also store the
size class, eliminating the need to access the span's mspan altogether.
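Two pieces of the scheme above reduce to simple bit manipulation, sketched below. This assumes spans are aligned to their 8 KiB size and collapses each bitmap into a single 64-bit word for brevity; spanBase and pending are hypothetical helpers, not runtime functions.

```go
package main

import (
	"fmt"
	"math/bits"
)

const spanSize = 8 << 10 // 8 KiB, as stated above

// spanBase finds the start of the span containing addr, assuming
// spans are aligned to spanSize.
func spanBase(addr uintptr) uintptr {
	return addr &^ (spanSize - 1)
}

// pending returns the indices of objects whose mark bit is set but
// whose scanned bit is not, i.e. the objects still to be scanned.
func pending(marks, scanned uint64) []int {
	todo := marks &^ scanned
	var idx []int
	for todo != 0 {
		idx = append(idx, bits.TrailingZeros64(todo))
		todo &= todo - 1 // clear the lowest set bit
	}
	return idx
}

func main() {
	fmt.Printf("%#x\n", spanBase(0x12345))
	fmt.Println(pending(0b1011, 0b0010))
}
```

After scanning the returned objects, a real implementation would fold the newly scanned bits back into the scanned bitmap so later visits to the span skip them.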
The work-stealing P-local queue is a new source of GC work. If this
queue gets full, half of it is dumped to a global linked list of spans
to scan. The regular scan queues are always prioritized over this queue
to allow time for darts to accumulate. Stealing work from other Ps is a
last resort.
This change also adds a new debug mode under GODEBUG=gctrace=2 that
dumps whole-span scanning statistics by size class on every GC cycle.
A future extension to this CL is to use SIMD-accelerated scanning
kernels for scanning spans with high mark bit density.
For #19112. (Deadlock averted in GOEXPERIMENT.)
For #73581.
Change-Id: I4bbb4e36f376950a53e61aaaae157ce842c341bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/658036
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
For riscv64/rva22u64 and above, we can intrinsify math/bits.OnesCount
using the CPOP/CPOPW machine instructions. Since the native Go
implementation of OnesCount is relatively expensive, it is also
worth emitting a check for Zbb support when compiled for rva20u64.
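For context on why the pure Go path is comparatively expensive, here is a portable bit-parallel population count next to a call to math/bits.OnesCount64. The popcountGeneric function is a textbook formulation written for illustration, not a copy of the standard library's code.

```go
package main

import (
	"fmt"
	"math/bits"
)

// popcountGeneric counts set bits with shifts, masks, and a multiply.
// A single CPOP instruction replaces this whole sequence on hardware
// that supports it.
func popcountGeneric(x uint64) int {
	x -= (x >> 1) & 0x5555555555555555
	x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
	x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0f
	return int((x * 0x0101010101010101) >> 56)
}

func main() {
	for _, v := range []uint64{0, 0xff, 1 << 63, ^uint64(0)} {
		fmt.Println(popcountGeneric(v) == bits.OnesCount64(v))
	}
}
```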
On a Banana Pi F3, with GORISCV64=rva22u64:
│ oc.1 │ oc.2 │
│ sec/op │ sec/op vs base │
OnesCount-8 16.930n ± 0% 4.389n ± 0% -74.08% (p=0.000 n=10)
OnesCount8-8 5.642n ± 0% 5.016n ± 0% -11.10% (p=0.000 n=10)
OnesCount16-8 9.404n ± 0% 5.015n ± 0% -46.67% (p=0.000 n=10)
OnesCount32-8 13.165n ± 0% 4.388n ± 0% -66.67% (p=0.000 n=10)
OnesCount64-8 16.300n ± 0% 4.388n ± 0% -73.08% (p=0.000 n=10)
geomean 11.40n 4.629n -59.40%
On a Banana Pi F3, compiled with GORISCV64=rva20u64 and with Zbb
detection enabled:
│ oc.3 │ oc.4 │
│ sec/op │ sec/op vs base │
OnesCount-8 16.930n ± 0% 5.643n ± 0% -66.67% (p=0.000 n=10)
OnesCount8-8 5.642n ± 0% 5.642n ± 0% ~ (p=0.447 n=10)
OnesCount16-8 10.030n ± 0% 6.896n ± 0% -31.25% (p=0.000 n=10)
OnesCount32-8 13.170n ± 0% 5.642n ± 0% -57.16% (p=0.000 n=10)
OnesCount64-8 16.300n ± 0% 5.642n ± 0% -65.39% (p=0.000 n=10)
geomean 11.55n 5.873n -49.16%
On a Banana Pi F3, compiled with GORISCV64=rva20u64 but with Zbb
detection disabled:
│ oc.3 │ oc.5 │
│ sec/op │ sec/op vs base │
OnesCount-8 16.93n ± 0% 29.47n ± 0% +74.07% (p=0.000 n=10)
OnesCount8-8 5.642n ± 0% 5.643n ± 0% ~ (p=0.191 n=10)
OnesCount16-8 10.03n ± 0% 15.05n ± 0% +50.05% (p=0.000 n=10)
OnesCount32-8 13.17n ± 0% 18.18n ± 0% +38.04% (p=0.000 n=10)
OnesCount64-8 16.30n ± 0% 21.94n ± 0% +34.60% (p=0.000 n=10)
geomean 11.55n 15.84n +37.16%
For hardware without Zbb, this adds ~5ns overhead, while for hardware
with Zbb we achieve a performance gain of up to 11ns. It is worth
noting that OnesCount8 is cheap enough that it is preferable to stick
with the generic version in this case.
Change-Id: Id657e40e0dd1b1ab8cc0fe0f8a68df4c9f2d7da5
Reviewed-on: https://go-review.googlesource.com/c/go/+/660856
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
In CL 656755, the readRandom function was modified
to read an integer from /dev/random.
However, on Plan 9, /dev/random can only return
a few hundred bits a second.
The issue is that readRandom is called by randinit,
which is called at the creation of Go processes.
Consequently, it made Go programs very
slow on Plan 9.
This change reverts the change made in CL 656755
so that the readRandom function always returns 0
on Plan 9.
Change-Id: Ibe1bf7e4c8cbc82998e4f5e1331f5e29a047c4fc
Cq-Include-Trybots: luci.golang.try:gotip-plan9-arm
Reviewed-on: https://go-review.googlesource.com/c/go/+/663195
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Richard Miller <millerresearch@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
We currently make some parts of the preamble unpreemptible because
it confuses morestack. See comments in the code.
Instead, have morestack handle those weird cases so we can
remove unpreemptible marks from most places.
This CL makes user functions preemptible everywhere if they have no
write barriers (at least, on x86). In cmd/go the fraction of functions
that need unpreemptible markings drops from 82% to 36%. This makes the cmd/go
binary 0.3% smaller.
Update #35470
Change-Id: Ic83d5eabfd0f6d239a92e65684bcce7e67ff30bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/648518
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|