| Age | Commit message | Author |
|
Real programs can call os.Exit concurrently from multiple goroutines.
Make internal/runtime/exithook not crash in that case.
The throw on panic also now runs in the deferred context,
so that we will see the full stack trace that led to the panic.
That should give us more visibility into the flaky failures on
bugs #55167 and #56197 as well.
Fixes #67631.
Change-Id: Iefdf71b3a3b52a793ca88d89a9c270eb50ece094
Reviewed-on: https://go-review.googlesource.com/c/go/+/588235
Reviewed-by: Than McIntosh <thanm@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
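
To illustrate the concurrency problem the commit above describes, here is a minimal sketch of an exit-hook registry that tolerates several goroutines "exiting" at once. The `exitHooks` type and `countHookRuns` helper are hypothetical names invented for this example. This is not the code in internal/runtime/exithook, only one way to get the same property with a mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// exitHooks is a hypothetical registry used only for illustration; it is
// not the runtime's internal/runtime/exithook package.
type exitHooks struct {
	mu    sync.Mutex
	hooks []func()
	ran   bool
}

// Add registers f to run at exit.
func (e *exitHooks) Add(f func()) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.hooks = append(e.hooks, f)
}

// Run executes the registered hooks at most once. Concurrent callers
// serialize on mu, and every caller after the first returns without
// running anything.
func (e *exitHooks) Run() {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.ran {
		return
	}
	e.ran = true
	for _, f := range e.hooks {
		f()
	}
}

// countHookRuns simulates n goroutines exiting at the same time and
// reports how many times the single registered hook actually ran.
func countHookRuns(n int) int {
	var e exitHooks
	runs := 0
	e.Add(func() { runs++ }) // only ever called with e.mu held
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			e.Run()
		}()
	}
	wg.Wait()
	return runs
}

func main() {
	fmt.Println(countHookRuns(8))
}
```

However many goroutines race into `Run`, the hook body executes once and no caller crashes, which is the behavior the commit asks for.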
|
|
This removes a //go:linkname usage in the coverage implementation.
For #67401.
Change-Id: I0602172c7e372a84465160dbf46d9fa371582fff
Reviewed-on: https://go-review.googlesource.com/c/go/+/586259
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Ignored these linknames which have not worked for a while:
github.com/xtls/xray-core:
context.newCancelCtx removed in CL 463999 (Feb 2023)
github.com/u-root/u-root:
funcPC removed in CL 513837 (Jul 2023)
tinygo.org/x/drivers:
net.useNetdev never existed
For #67401.
Change-Id: I9293f4ef197bb5552b431de8939fa94988a060ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/587576
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
For #67401.
Change-Id: I7dd28c3b01a1a647f84929d15412aa43ab0089ee
Reviewed-on: https://go-review.googlesource.com/c/go/+/587575
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
For #67401.
Change-Id: Icc10ede72547d8020c0ba45e89d954822a4b2455
Reviewed-on: https://go-review.googlesource.com/c/go/+/587218
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Allow users to decrease the profiling stack depth back to 32 in case
they experience any problems with the new default of 128.
Users may also use this option to increase the depth up to 1024.
Change-Id: Ieaab2513024915a223239278dd97a6e161dde1cf
Reviewed-on: https://go-review.googlesource.com/c/go/+/581917
Reviewed-by: Austin Clements <austin@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
The current stack depth limit for alloc, mutex, block, threadcreate and
goroutine profiles of 32 frequently leads to truncated stack traces in
production applications. Increase the limit to 128 which is the same
size used by the execution tracer.
Create internal/profilerecord to define variants of the runtime's
StackRecord, MemProfileRecord and BlockProfileRecord types that can hold
arbitrarily big stack traces. Implement internal profiling APIs based on
these new types and use them for creating protobuf profiles and to act
as shims for the public profiling APIs using the old types.
This will lead to an increase in memory usage for applications that
use the impacted profile types and have stack traces exceeding the
current limit of 32. Those applications will also experience a slight
increase in CPU usage, but this will hopefully soon be mitigated via CL
540476 and 533258 which introduce frame pointer unwinding for the
relevant profile types.
For #43669.
Change-Id: Ie53762e65d0f6295f5d4c7d3c87172d5a052164e
Reviewed-on: https://go-review.googlesource.com/c/go/+/572396
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
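
The limit of 32 mentioned in the commit above is visible in the public API: the runtime's record types store stacks in a fixed-size `Stack0` array, which is why larger traces need separate internal types. The sketch below only inspects those array lengths. The helper name `publicStackCaps` is made up for this example.

```go
package main

import (
	"fmt"
	"runtime"
)

// publicStackCaps reports how many frames fit in the fixed-size Stack0
// arrays of the public runtime profiling record types.
func publicStackCaps() (stack, mem, block int) {
	var s runtime.StackRecord
	var m runtime.MemProfileRecord
	var b runtime.BlockProfileRecord // embeds StackRecord
	return len(s.Stack0), len(m.Stack0), len(b.Stack0)
}

func main() {
	s, m, b := publicStackCaps()
	fmt.Println(s, m, b)
}
```

Because these array sizes are part of the exported types, they cannot grow without breaking compatibility, so deeper stacks have to travel through other paths such as the protobuf profiles the commit mentions.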
|
|
This change fixes problems with thread-locked goroutines using
newcoro/coroswitch/etc. Currently, the coro paths do not consider
thread-locked goroutines at all and can quickly result in broken
scheduler state or lost/leaked goroutines.
One possible fix to these issues is to fall back on goroutine+channel
semantics, but that turns out to be fairly complicated to implement and
results in significant performance cliffs. More complex thread-lock
state donation tricks also result in some fairly complicated state
tracking that doesn't seem worth it given the use-cases of iter.Pull
(and even then, there will be performance cliffs).
This change implements a much simpler, but more restrictive semantics.
In particular, thread-lock state is tied to the coro at the first call
to newcoro (i.e. iter.Pull). From then on, the invariant is that if the
coro has any thread-lock state *or* a goroutine calling into coroswitch
has any thread-lock state, the full gamut of thread-lock state must
remain the same as it was when newcoro was called (the full gamut
meaning internal and external lock counts as well as the identity of the
thread that was locked to).
This semantics allows the common cases to be always fast, but comes with
a non-orthogonality caveat. Specifically, when iter.Pull is used in
conjunction with thread-locked goroutines, complex cases (passing next
between goroutines or passing yield between goroutines) are likely to
fail. Simple cases, where any number of iter.Pull iterators are used in
a straightforward way (nested, in series, etc.) from the same
goroutine, will work and will be guaranteed to be fast regardless of
thread-lock state.
This is a compromise for the near-term and we may consider lifting the
restrictions imposed by this CL in the future.
Fixes #65889.
Fixes #65946.
Change-Id: I3fb5791e36a61f5ded50226a229a79d28739b24e
Reviewed-on: https://go-review.googlesource.com/c/go/+/583675
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Austin Clements <austin@google.com>
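
As a sketch of the "simple case" the commit above says keeps working, the following uses one `iter.Pull` iterator from a single goroutine that is locked to its thread for the whole time. It assumes a Go version that ships the `iter` package (Go 1.23 or later). `countTo` and `sumWhileLocked` are names invented for this example.

```go
package main

import (
	"fmt"
	"iter"
	"runtime"
)

// countTo yields 1..n as a push-style iterator.
func countTo(n int) iter.Seq[int] {
	return func(yield func(int) bool) {
		for i := 1; i <= n; i++ {
			if !yield(i) {
				return
			}
		}
	}
}

// sumWhileLocked creates and drains a pull iterator on one goroutine.
// The thread-lock state is the same when iter.Pull is called and at
// every call to next, which is the situation the commit describes as
// supported.
func sumWhileLocked(n int) int {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	next, stop := iter.Pull(countTo(n))
	defer stop()

	sum := 0
	for {
		v, ok := next()
		if !ok {
			return sum
		}
		sum += v
	}
}

func main() {
	fmt.Println(sumWhileLocked(10))
}
```

Passing `next` to a different goroutine while a thread lock is held is the kind of complex case the commit warns is likely to fail, so this sketch deliberately avoids it.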
|
|
Change-Id: I7461a892e1591e3bad876f0a718a99e6de2c4659
Reviewed-on: https://go-review.googlesource.com/c/go/+/585435
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
This patch removes redundant guards for the crash stack feature, as all
supported Go architectures now have a crash stack implementation.
Change-Id: I7ffb61c57955778d687073418130b2aaab0ff183
GitHub-Last-Rev: 6ff56d6420093ee8f95c75019557dc8bcec32ed7
GitHub-Pull-Request: golang/go#67202
Reviewed-on: https://go-review.googlesource.com/c/go/+/583416
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com>
|
|
Change-Id: Ifb9864704f55e27adfa5c21452fed5a243468d13
GitHub-Last-Rev: 6b000e7314ea47eea6fead60988d6d432bc381f7
GitHub-Pull-Request: golang/go#67013
Reviewed-on: https://go-review.googlesource.com/c/go/+/581376
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Change-Id: I3150e2d0b9f5590c6da95392b0b51df94b8c20eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/584338
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The page tracer's functionality is now captured by the regular execution
tracer as an experimental GODEBUG variable. This is a lot more usable
and maintainable than the page tracer, which is likely to have bitrotted
by this point. There's also no tooling available for the page tracer.
Change-Id: I2408394555e01dde75a522e9a489b7e55cf12c8e
Reviewed-on: https://go-review.googlesource.com/c/go/+/583379
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Move profiling pc buffers from being stack allocated to an m field.
This is motivated by the next patch, which will increase the default
stack depth to 128, which might lead to undesirable stack growth for
goroutines that produce profiling events.
Additionally, this change paves the way to make the stack depth
configurable via GODEBUG.
Change-Id: Ifa407f899188e2c7c0a81de92194fdb627cb4b36
Reviewed-on: https://go-review.googlesource.com/c/go/+/574699
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
This code was just missed during the cleanup. There's maybe some merit
to keeping OneNewExtraM, but it would still be fairly optimistic. It's
trivial to bring back, so delete it for now.
Change-Id: I2d033c6daae787e0e8d6b92524f3e59610e2599f
Reviewed-on: https://go-review.googlesource.com/c/go/+/583375
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Change-Id: Ic6fcdfb6a9a912a9b1dd268836d2e5ab44d80440
GitHub-Last-Rev: dab6ecc0660d3e0f8e23944286f965a3bb15b4cb
GitHub-Pull-Request: golang/go#65305
Reviewed-on: https://go-review.googlesource.com/c/go/+/558698
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com>
|
|
CL 580255 increased the frame size of entersyscall and reentersyscall,
which is causing the x/sys repository to fail to build for
windows/arm64 because of an overflow of the nosplit stack reservation.
Fix this by wrapping the other call to throw in casgstatus in a system
stack switch. This is a fatal throw anyway indicating a core runtime
invariant is broken, so this path is basically never taken. This cuts
off the nosplit frame chain and allows x/sys to build.
Change-Id: I00b16c9db3a7467413ed48953c7f8a9a750f000a
Reviewed-on: https://go-review.googlesource.com/c/go/+/580775
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
While experimenting with pahole on Go structs, I found several structs
in the runtime package whose sizes can be reduced. They were
highlighted by the pahole tool, which detects byte holes and padding.
Overall, this saves 80 bytes.
Change-Id: I398e5ed6f5b199394307741981cb5ad5b875e98f
Reviewed-on: https://go-review.googlesource.com/c/go/+/578795
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Joedian Reid <joedian@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
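
The kind of saving the commit above describes comes from reordering fields so that alignment padding disappears. The structs below are made up for illustration and are not taken from the runtime. The sketch only shows the general effect using `unsafe.Sizeof`.

```go
package main

import (
	"fmt"
	"unsafe"
)

// padded places a wide field between two narrow ones, so the compiler
// has to insert padding both before b and after c.
type padded struct {
	a bool
	b int64
	c bool
}

// packed holds the same fields with the widest one first, which lets
// the two bools share the trailing bytes.
type packed struct {
	b int64
	a bool
	c bool
}

// sizes returns the size in bytes of each layout on the current platform.
func sizes() (before, after uintptr) {
	return unsafe.Sizeof(padded{}), unsafe.Sizeof(packed{})
}

func main() {
	before, after := sizes()
	fmt.Println(before, after)
}
```

The exact numbers depend on the platform's alignment rules, but the reordered layout should come out smaller on the usual 32-bit and 64-bit targets.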
|
|
This was an oversight and is causing a few failures, most notably on
Solaris and Illumos, but also occasionally on the Linux builders.
Change-Id: I38bd28537ad01d955675f61f9b1d42b9ecdd1ef0
Reviewed-on: https://go-review.googlesource.com/c/go/+/580875
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Profiling of runtime-internal locks checks gp.m.locks to see if it's
safe to add a new record to the profile, but direct use of
acquireLockRank can change the list of the M's active lock ranks without
updating gp.m.locks to match. The runtime's internal rwmutex
implementation makes a point of calling acquirem/releasem when
manipulating the lock rank list, but the other user of acquireLockRank
(the GC's Gscan bit) relied on the GC's invariants to avoid deadlocks.
Codify the rwmutex approach by renaming acquireLockRank to
acquireLockRankAndM and having it include a call to acquirem. Do the same
for release.
Fixes #64706
Fixes #66004
Change-Id: Ib76eaa0cc1c45b64861d03345e17e1e843c19713
GitHub-Last-Rev: 160577bdb2bb2a4e869c6fd7e53e3be8fb819182
GitHub-Pull-Request: golang/go#66276
Reviewed-on: https://go-review.googlesource.com/c/go/+/571056
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Currently the runtime only tracks the PC and SP upon entering a syscall,
but not the FP (BP). This is mainly for historical reasons, and because
the tracer (which uses the frame pointer unwinder) does not need it.
Until it did, of course, in CL 567076, where the tracer tries to take a
stack trace of a goroutine that's in a syscall from afar. It tries to
use gp.sched.bp and lots of things go wrong. It *really* should be using
the equivalent of gp.syscallbp, which doesn't exist before this CL.
This change introduces gp.syscallbp and tracks it. It also introduces
getcallerfp which is nice for simplifying some code. Because we now have
gp.syscallbp, we can also delete the frame skip count computation in
traceLocker.GoSysCall, because it's now the same regardless of whether
frame pointer unwinding is used.
Fixes #66889.
Change-Id: Ib6d761c9566055e0a037134138cb0f81be73ecf7
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-nocgo
Reviewed-on: https://go-review.googlesource.com/c/go/+/580255
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This change adds the internal/weak package, which exposes GC-supported
weak pointers to the standard library. This is for the upcoming weak
package, but may be useful for other future constructs.
For #62483.
Change-Id: I4aa8fa9400110ad5ea022a43c094051699ccab9d
Reviewed-on: https://go-review.googlesource.com/c/go/+/576297
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This change makes the new execution tracer, described in #60773, the
default tracer. It attempts to make the smallest number of changes
possible in a single CL.
Updates #66703
For #60773
Change-Id: I3742f3419c54f07d7c020ae5e1c18d29d8bcae6d
Reviewed-on: https://go-review.googlesource.com/c/go/+/576256
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The previous CL, CL 570257, made it so that STW time no longer
overlapped with other CPU time tracking. However, what we lost was
insight into the CPU time spent _stopping_ the world, which can be just
as important. There's pretty much no easy way to measure this
indirectly, so this CL implements a direct measurement: whenever a P
enters _Pgcstop, it writes down what time it did so. stopTheWorld then
accumulates all the time deltas between when it finished stopping the
world and each P's stop time into a total additional pause time. The GC
pause cases then accumulate this number into the metrics.
This should cause minimal additional overhead in stopping the world. GC
STWs already take on the order of 10s to 100s of microseconds. Even for
100 Ps, the extra `nanotime` call per P is only 1500ns of additional CPU
time. This is likely to be much less in actual pause latency, since it
all happens concurrently.
Change-Id: Icf190ffea469cd35ebaf0b2587bf6358648c8554
Reviewed-on: https://go-review.googlesource.com/c/go/+/574215
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Nicolas Hillegeer <aktau@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
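
The accumulation described in the commit above can be sketched as a simple sum of deltas. The function and variable names here are hypothetical and the timestamps are made-up nanosecond values. The real runtime reads these from `nanotime` and per-P state.

```go
package main

import "fmt"

// stoppingTime adds up, for every P, the time between that P entering
// its stopped state and the moment the world was fully stopped.
// All values are nanosecond timestamps.
func stoppingTime(pStopTimes []int64, worldStopped int64) int64 {
	var total int64
	for _, t := range pStopTimes {
		total += worldStopped - t
	}
	return total
}

func main() {
	// Three Ps stop at t=100, t=250 and t=400; the last one to stop
	// is what completes the stop-the-world at t=400.
	pStopTimes := []int64{100, 250, 400}
	fmt.Println(stoppingTime(pStopTimes, 400)) // 300 + 150 + 0 = 450
}
```

The overhead estimate in the message follows the same shape: 1500ns across 100 Ps works out to roughly 15ns per extra `nanotime` call.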
|
|
Currently the GC CPU pause time metrics start measuring before the STW
is complete. This results in a slightly less accurate measurement and
creates some overlap with other timings (for example, the idle time of
idle Ps) that will cause double-counting.
This CL adds a field to worldStop to track the point at which the world
actually stopped and uses that as the basis for the GC CPU pause time
metrics, basically eliminating this overlap.
Note that this will cause Ps in _Pgcstop before the world is fully
stopped to be counted as user time. A follow-up CL will fix this
discrepancy.
Change-Id: I287731f08415ffd97d327f582ddf7e5d2248a6f5
Reviewed-on: https://go-review.googlesource.com/c/go/+/570258
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Nicolas Hillegeer <aktau@google.com>
|
|
Currently, the execution tracer may attempt to take a stack trace of a
goroutine whose stack it does not own, for example if the goroutine is
in _Grunnable or _Gwaiting. This is easily fixed in all cases by simply
moving the emission of GoStop and GoBlock events to before the
casgstatus happens. The goroutine status is what is used to signal stack
ownership, and the GC may shrink a goroutine's stack if it can acquire
the scan bit.
Although this is easily fixed, the interaction here is very subtle,
because stack ownership is only implicit in the goroutine's scan status.
To make this invariant more maintainable and less error-prone in the
future, this change adds a GODEBUG setting that checks, at the point of
taking a stack trace, whether the caller owns the goroutine. This check
is not quite perfect because there's no way for the stack tracing code
to know that the _Gscan bit was acquired by the caller, so for
simplicity it assumes that it was the caller that acquired the scan bit.
In all other cases however, we can check for ownership precisely. At the
very least, this check is sufficient to catch the issue this change is
fixing.
To make sure this debug check doesn't bitrot, it's always enabled during
trace testing. This new mode has actually caught a few other issues
already, so this change fixes them.
One issue that this debug mode caught was that it's not safe to take a
stack trace of a _Gwaiting goroutine that's being unparked.
Another much bigger issue this debug mode caught was the fact that the
execution tracer could try to take a stack trace of a G that was in
_Gwaiting solely to avoid a deadlock in the GC. The execution tracer
already has a partial list of these cases since they're modeled as the
goroutine just executing as normal in the tracer, but this change takes
the list and makes it more formal. In this specific case, we now prevent
the GC from shrinking the stacks of goroutines in this state if tracing
is enabled. The stack traces from these scenarios are too useful to
discard, but there is indeed a race here between the tracer and any
attempt to shrink the stack by the GC.
Change-Id: I019850dabc8cede202fd6dcc0a4b1f16764209fb
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-amd64-longtest-race
Reviewed-on: https://go-review.googlesource.com/c/go/+/573155
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
For #65355
Change-Id: I65dd090fb99de9b231af2112c5ccb0eb635db2be
Reviewed-on: https://go-review.googlesource.com/c/go/+/560155
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ibrahim Bazoka <ibrahimbazoka729@gmail.com>
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
|
|
This change resolves a TODO in the coroutine switch implementation (used
exclusively by iter.Pull at the moment) to enable tracing. This was
blocked on eliminating the atomic load in the tracer's "off" path
(completed in the previous CL in this series) and the addition of new
tracer events to minimize the overhead of tracing in this circumstance.
This change introduces 3 new event types to support coroutine switches:
GoCreateBlocked, GoSwitch, and GoSwitchDestroy.
GoCreateBlocked needs to be introduced because the goroutine created for
the coroutine starts out in a blocked state. There's no way to represent
this in the tracer right now, so we need a new event for it.
GoSwitch represents the actual coroutine switch, which conceptually
consists of a GoUnblock, a GoBlock, and a GoStart event in series
(unblocking the next goroutine to run, blocking the current goroutine,
and then starting the next goroutine to run).
GoSwitchDestroy is closely related to GoSwitch, implementing the same
semantics except that GoBlock is replaced with GoDestroy. This is used
when exiting the coroutine.
The implementation of all this is fairly straightforward, and the trace
parser simply translates GoSwitch* into the three constituent events.
Because GoSwitch and GoSwitchDestroy imply a GoUnblock and a GoStart,
they need to synchronize with other past and future GoStart events to
create a correct partial ordering in the trace. Therefore, these events
need a sequence number for the goroutine that will be unblocked and
started.
Also, while implementing this, I noticed that the coroutine
implementation is actually buggy with respect to LockOSThread. In fact,
it blatantly disregards its invariants without an explicit panic. While
such a case is likely to be rare (and inefficient!) we should decide how
iter.Pull behaves with respect to runtime.LockOSThread.
Lastly, this change also bumps the trace version from Go 1.22 to Go
1.23. We're adding events that are incompatible with a Go 1.22 parser,
but Go 1.22 traces are all valid Go 1.23 traces, so the newer parser
supports both (and the CL otherwise updates the Go 1.22 definitions of
events and such). We may want to reconsider the structure and naming of
some of these packages though; it could quickly get confusing.
For #61897.
Change-Id: I96897a46d5852c02691cde9f957dc6c13ef4d8e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/565937
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
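
The translation the commit above attributes to the trace parser can be sketched as a small expansion function. The `event` type and `expandSwitch` are hypothetical and far simpler than the real parser. They only model the order of the three constituent events given in the message.

```go
package main

import "fmt"

// event is a simplified stand-in for a parsed trace event.
type event struct {
	kind string
	g    int // goroutine the event applies to
}

// expandSwitch turns one coroutine switch from goroutine cur to
// goroutine next into its three constituent events. With destroy set,
// it models GoSwitchDestroy, where GoBlock is replaced by GoDestroy.
func expandSwitch(cur, next int, destroy bool) []event {
	second := "GoBlock"
	if destroy {
		second = "GoDestroy"
	}
	return []event{
		{kind: "GoUnblock", g: next}, // unblock the goroutine that runs next
		{kind: second, g: cur},       // block or destroy the current goroutine
		{kind: "GoStart", g: next},   // start the next goroutine
	}
}

func main() {
	for _, e := range expandSwitch(1, 2, false) {
		fmt.Println(e.kind, e.g)
	}
}
```

Because the first and last events both refer to the goroutine being switched to, it is that goroutine's sequence number that the combined event has to carry, as the message notes.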
|
|
Change-Id: Ib787b27670ad0f10bcc94b3ce76e86746997af00
GitHub-Last-Rev: e5abb9a556ae709d30d6ffb5a13805923c215254
GitHub-Pull-Request: golang/go#65934
Reviewed-on: https://go-review.googlesource.com/c/go/+/566715
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
The comment in updateTimerPMask is wrong. It says:
// Looks like there are no timers, however another P
// may be adding one at this very moment.
// Take the lock to synchronize.
This was my incorrect simplification of the original comment
from CL 264477 when I was renaming all the things it mentioned:
// Looks like there are no timers, however another P may transiently
// decrement numTimers when handling a timerModified timer in
// checkTimers. We must take timersLock to serialize with these changes.
updateTimerPMask is being called by pidleput, so the P in question
is not in use. And other P's cannot add to this P.
As the original comment more precisely noted, the problem was
that other P's might be calling timers.check, which updates ts.len
occasionally while ts is locked, and one of those updates might
"leak" an ephemeral len==0 even when the heap is not going to
be empty when the P is finally unlocked. The lock/unlock in
updateTimerPMask synchronizes to avoid that. But this defeats
most of the purpose of using ts.len in the first place.
Instead of requiring that synchronization, we can arrange that
ts.len only ever shows a "publishable" length, meaning the len(ts.heap)
we leave behind during ts.unlock.
Having done that, updateTimerPMask can be inlined into pidleput.
The big comment on updateTimerPMask explaining how timerpMask
works is better placed as the doc comment for timerpMask itself,
so move it there.
Change-Id: I5442c9bb7f1473b5fd37c43165429d087012e73f
Reviewed-on: https://go-review.googlesource.com/c/go/+/568336
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
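
The "publishable length" idea in the commit above can be sketched with a lock-protected slice plus an atomic counter that is written only on unlock. The `timerSet` type below is a hypothetical stand-in for the runtime's timers type, and `replaceOnly` is an invented operation that makes the heap transiently empty.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// timerSet is a hypothetical, simplified model of a per-P timer heap.
type timerSet struct {
	mu           sync.Mutex
	heap         []int64      // protected by mu
	publishedLen atomic.Int32 // copy of len(heap), written only in unlock
}

func (ts *timerSet) lock() { ts.mu.Lock() }

// unlock publishes the final heap length before releasing the lock, so
// lock-free readers never observe an intermediate value.
func (ts *timerSet) unlock() {
	ts.publishedLen.Store(int32(len(ts.heap)))
	ts.mu.Unlock()
}

// replaceOnly swaps the heap's contents for a single new entry. The
// heap is empty for a moment, and the return value is what a lock-free
// reader would have seen at that moment.
func (ts *timerSet) replaceOnly(when int64) (seenDuring int32) {
	ts.lock()
	ts.heap = ts.heap[:0] // transiently empty while locked
	seenDuring = ts.publishedLen.Load()
	ts.heap = append(ts.heap, when)
	ts.unlock()
	return seenDuring
}

func main() {
	ts := &timerSet{}
	ts.lock()
	ts.heap = append(ts.heap, 10)
	ts.unlock()
	fmt.Println(ts.replaceOnly(20), ts.publishedLen.Load())
}
```

A reader that checks only `publishedLen` never sees the transient zero, which is what removes the need for the extra lock and unlock described in the message.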
|
|
No semantic changes here.
Cleaning up for next change.
Change-Id: I9706009739677ff9eb893bcc007d805f7877511e
Reviewed-on: https://go-review.googlesource.com/c/go/+/568335
Reviewed-by: Austin Clements <austin@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Continuing conversion from C to Go, introduce type timers
encapsulating all timer heap state, with methods for operations.
This should at least be easier to think about, instead of having
these fields strewn through the P struct. It should also be easier
to test.
I am skeptical about the pair of atomic int64 deadlines:
I think there are missed wakeups lurking.
Having the code in an abstracted API should make it easier
to reason through and fix if needed.
[This is one CL in a refactoring stack making very small changes
in each step, so that any subtle bugs that we miss can be more
easily pinpointed to a small change.]
Change-Id: If5ea3e0b946ca14076f44c85cbb4feb9eddb4f95
Reviewed-on: https://go-review.googlesource.com/c/go/+/564132
Reviewed-by: Austin Clements <austin@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
|
|
No code changes, only code moves here.
Move all code that locks pp.timersLock into time.go
so that it is all in one place, for easier abstraction.
[This is one CL in a refactoring stack making very small changes
in each step, so that any subtle bugs that we miss can be more
easily pinpointed to a small change.]
Change-Id: I1b59af7780431ec6479440534579deb1a3d9d7a3
Reviewed-on: https://go-review.googlesource.com/c/go/+/564117
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
adjusttimers already contains the same logic. Use it instead.
This avoids having two copies of the code and is faster.
adjusttimers was formerly O(n log n) but is now O(n).
clearDeletedTimers was formerly O(n² log n) and is now gone!
[This is one CL in a refactoring stack making very small changes
in each step, so that any subtle bugs that we miss can be more
easily pinpointed to a small change.]
Change-Id: I32bf24817a589033dc304b359f8df10ea21f48fc
Reviewed-on: https://go-review.googlesource.com/c/go/+/564116
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
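
One way to reach the O(n) bound mentioned in the commit above is to drop dead entries in a single pass and then rebuild heap order in bulk rather than fixing up one element at a time. The sketch below assumes that approach and uses `container/heap` with invented `timer` and `timerHeap` types. The runtime has its own heap code, so this is an analogy, not its implementation.

```go
package main

import (
	"container/heap"
	"fmt"
)

// timer is a hypothetical, simplified timer entry.
type timer struct {
	when    int64
	deleted bool
}

// timerHeap is a min-heap of timers ordered by when.
type timerHeap []timer

func (h timerHeap) Len() int           { return len(h) }
func (h timerHeap) Less(i, j int) bool { return h[i].when < h[j].when }
func (h timerHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *timerHeap) Push(x any)        { *h = append(*h, x.(timer)) }
func (h *timerHeap) Pop() any {
	old := *h
	n := len(old)
	t := old[n-1]
	*h = old[:n-1]
	return t
}

// compact removes deleted timers in one linear pass and then restores
// the heap invariant with heap.Init, which is documented as O(n).
func compact(h *timerHeap) {
	kept := (*h)[:0]
	for _, t := range *h {
		if !t.deleted {
			kept = append(kept, t)
		}
	}
	*h = kept
	heap.Init(h)
}

func main() {
	h := &timerHeap{{when: 5, deleted: true}, {when: 3}, {when: 9, deleted: true}, {when: 1}}
	compact(h)
	fmt.Println(h.Len(), (*h)[0].when)
}
```

Removing entries individually costs O(log n) each, which is where an O(n log n) total comes from. Filtering first and heapifying once avoids that.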
|
|
Fixes #63555
Change-Id: I26e7baf58fcb78e66e0feed5725e371e25d657cc
Reviewed-on: https://go-review.googlesource.com/c/go/+/550175
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
When readying a goroutine, the scheduler typically places the readied
goroutine in pp.runnext, which will typically be the next goroutine to
run in the schedule.
In order to prevent a set of ping-pong goroutines from simply switching
back and forth via runnext and starving the rest of the run queue, a
goroutine scheduled via runnext shares a time slice (pp.schedtick) with
the previous goroutine.
sysmon detects "long-running goroutines", which really means Ps using
the same pp.schedtick for too long, and preempts them to allow the rest
of the run queue to run. Thus this avoids starvation via runnext.
However, wasm has no threads, and thus no sysmon. Without sysmon to
preempt, the possibility for starvation returns. Avoid this by disabling
runnext entirely on wasm. This means that readied goroutines always go
on the end of the run queue and thus cannot starve via runnext.
Note that this CL doesn't do anything about single long-running
goroutines. Without sysmon to preempt them, a single goroutine that
fails to yield will starve the run queue indefinitely.
For #65178.
Change-Id: I10859d088776125a2af8c9cd862b6e071da628b5
Cq-Include-Trybots: luci.golang.try:gotip-js-wasm,gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero
Reviewed-on: https://go-review.googlesource.com/c/go/+/559798
Auto-Submit: Bryan Mills <bcmills@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
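
The starvation scenario in the commit above can be reproduced in a toy model: one P, a FIFO run queue, an optional runnext slot, and no preemption at all, as on wasm. Everything here is hypothetical, including the `simulate` function and the goroutine names. A and B wake each other in turn, while C is ordinary runnable work.

```go
package main

import "fmt"

// simulate runs a toy single-P scheduler for the given number of steps
// and returns the order in which goroutines ran.
func simulate(useRunnext bool, steps int) []string {
	queue := []string{"A", "C"} // B starts out blocked, waiting for A
	runnext := ""
	var ran []string

	// ready makes g runnable, either in the runnext slot or at the
	// tail of the run queue.
	ready := func(g string) {
		if useRunnext {
			runnext = g
		} else {
			queue = append(queue, g)
		}
	}

	for i := 0; i < steps; i++ {
		var g string
		switch {
		case runnext != "":
			g, runnext = runnext, ""
		case len(queue) > 0:
			g, queue = queue[0], queue[1:]
		default:
			return ran
		}
		ran = append(ran, g)
		switch g {
		case "A":
			ready("B") // A wakes B, then blocks
		case "B":
			ready("A") // B wakes A, then blocks
		case "C":
			queue = append(queue, "C") // C yields and stays runnable
		}
	}
	return ran
}

func main() {
	fmt.Println(simulate(true, 6))  // [A B A B A B]
	fmt.Println(simulate(false, 6)) // [A C B C A C]
}
```

With the runnext slot enabled and nothing to preempt the pair, C never gets a turn. With readied goroutines appended to the queue tail instead, C runs on every other step.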
|
|
Change-Id: Ide4002d1cf82f2daaf7261b367c391dedbbf7719
GitHub-Last-Rev: 80ee248c3e34529e7a522acc97db9fb69c82dffb
GitHub-Pull-Request: golang/go#65308
Reviewed-on: https://go-review.googlesource.com/c/go/+/558699
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
crashFD defaults to the zero value of (surprise!) zero. Zero is a valid
FD, so on the first call to SetCrashOutput we actually close FD 0 since
it is a "valid" FD.
Initialize crashFD to -1, the sentinel for "no FD".
Change-Id: I3b108c60603f2b83b867cbe079f035c159b6a6ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/560776
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
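
The bug in the commit above is a general hazard whenever the zero value of a field is also a valid value. The sketch below models it with a hypothetical `crashOutput` type that records which descriptors it would close instead of closing anything. It is not the runtime's code.

```go
package main

import "fmt"

// noFD is the sentinel meaning "no file descriptor is set".
const noFD = -1

// crashOutput is a hypothetical holder for the current crash output FD.
type crashOutput struct {
	fd     int
	closed []int // FDs this sketch would have closed
}

// newCrashOutput initializes fd to the sentinel rather than relying on
// the zero value, which would be the valid descriptor 0.
func newCrashOutput() *crashOutput {
	return &crashOutput{fd: noFD}
}

// set installs a new FD and "closes" the previous one if there was one.
func (c *crashOutput) set(fd int) {
	if c.fd != noFD {
		c.closed = append(c.closed, c.fd)
	}
	c.fd = fd
}

func main() {
	good := newCrashOutput()
	good.set(7)
	fmt.Println(good.closed) // nothing closed

	buggy := &crashOutput{} // fd defaults to 0
	buggy.set(7)
	fmt.Println(buggy.closed) // FD 0 closed by mistake
}
```

The zero-valued holder closes descriptor 0 on its first call, which mirrors the behavior the commit fixes by initializing the field to -1.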
|
|
Currently there are a few places where a P can get stolen where the
runtime doesn't traceAcquire and traceRelease across the steal itself.
What can happen then is the following scenario:
- Thread 1 enters a syscall and writes an event about it.
- Thread 2 steals Thread 1's P.
- Thread 1 exits the syscall and writes one or more events about it.
- Tracing ends (trace.gen is set to 0).
- Thread 2 checks to see if it should write an event for the P it just
stole, sees that tracing is disabled, and doesn't.
This results in broken traces, because there's a missing ProcSteal
event. The parser always waits for a ProcSteal to advance a
GoSyscallEndBlocked event, and in this case, it never comes.
Fixes #65181.
Change-Id: I437629499bb7669bf7fe2fc6fc4f64c53002916b
Reviewed-on: https://go-review.googlesource.com/c/go/+/560235
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This reverts CL 557437.
Reason for revert: Appears to have broken wasip1 builders.
For #65178.
Change-Id: I59c1a310eb56589c768536fe444c1efaf862f8b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/559237
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
When readying a goroutine, the scheduler typically places the readied
goroutine in pp.runnext, which will typically be the next goroutine to
run in the schedule.
In order to prevent a set of ping-pong goroutines from simply switching
back and forth via runnext and starving the rest of the run queue, a
goroutine scheduled via runnext shares a time slice (pp.schedtick) with
the previous goroutine.
sysmon detects "long-running goroutines", which really means Ps using
the same pp.schedtick for too long, and preempts them to allow the rest
of the run queue to run. Thus this avoids starvation via runnext.
However, wasm has no threads, and thus no sysmon. Without sysmon to
preempt, the possibility for starvation returns. Avoid this by disabling
runnext entirely on wasm. This means that readied goroutines always go
on the end of the run queue and thus cannot starve via runnext.
Note that this CL doesn't do anything about single long-running
goroutines. Without sysmon to preempt them, a single goroutine that
fails to yield will starve the run queue indefinitely.
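The starvation described above can be sketched with a toy model of a single P and no preemption. The toyP type, its fields, and the simulate helper are hypothetical and much simpler than the real scheduler; the sketch only shows how two goroutines that keep readying each other through runnext can keep a third goroutine from ever running.

```go
package main

import "fmt"

// toyP is a hypothetical, heavily simplified model of a P:
// one runnext slot plus a FIFO run queue of goroutine names.
type toyP struct {
	useRunnext bool
	runnext    string
	runq       []string
}

// ready makes g runnable, preferring the runnext slot when it is enabled.
func (p *toyP) ready(g string) {
	if !p.useRunnext {
		p.runq = append(p.runq, g)
		return
	}
	if p.runnext != "" {
		p.runq = append(p.runq, p.runnext) // kick out the old runnext
	}
	p.runnext = g
}

// next picks the goroutine to run: runnext first, then the queue head.
func (p *toyP) next() string {
	if p.runnext != "" {
		g := p.runnext
		p.runnext = ""
		return g
	}
	if len(p.runq) == 0 {
		return ""
	}
	g := p.runq[0]
	p.runq = p.runq[1:]
	return g
}

// simulate makes the given number of scheduling decisions with no
// preemption and reports how often the "worker" goroutine ran.
func simulate(useRunnext bool, steps int) int {
	p := &toyP{useRunnext: useRunnext, runq: []string{"ping", "worker"}}
	workerRuns := 0
	for i := 0; i < steps; i++ {
		switch p.next() {
		case "ping":
			p.ready("pong") // ping wakes pong, then blocks
		case "pong":
			p.ready("ping") // pong wakes ping, then blocks
		case "worker":
			workerRuns++
			p.runq = append(p.runq, "worker") // yield to the back of the queue
		}
	}
	return workerRuns
}

func main() {
	fmt.Println("runnext enabled: ", simulate(true, 100))
	fmt.Println("runnext disabled:", simulate(false, 100))
}
```

In this model the worker never runs while runnext is enabled, and runs on every other step once readied goroutines go to the end of the queue.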
For #65178.
Cq-Include-Trybots: luci.golang.try:gotip-js-wasm,gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero
Change-Id: I7dffe1e72c6586474186b72f8068cff77b661eae
Reviewed-on: https://go-review.googlesource.com/c/go/+/557437
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
|
|
Change-Id: Icc2641b888440cc27444b5dfb2b8ff286e6a595d
GitHub-Last-Rev: f5772e32e9190ab1eed94fcf2c9e58d6bc0d74d6
GitHub-Pull-Request: golang/go#63923
Reviewed-on: https://go-review.googlesource.com/c/go/+/539536
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Change-Id: Ie923f7bbe5ef22e381ae4f421387fbd570622a28
GitHub-Last-Rev: f8f21635025eb6e26c6994679995ade501e870cf
GitHub-Pull-Request: golang/go#63908
Reviewed-on: https://go-review.googlesource.com/c/go/+/539296
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
|
|
Change-Id: I8fe66326994894b17ce0eda991bba942844d26b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/541475
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
CL 549536 intended to decouple the internal implementation of rwmutex
from the semantic meaning of an rwmutex read/write lock in the static
lock ranking.
Unfortunately, it was not thought through well enough. The internals
were represented with the rwmutexR and rwmutexW lock ranks. The idea was
that the internal lock ranks need not model the higher-level ordering,
since those have separate rankings. That is incorrect; rwmutexW is held
for the duration of a write lock, so it must be ranked before any lock
taken while any write lock is held, which is precisely what we were
trying to avoid.
This is visible in violations like:
0 : execW 11 0x0
1 : rwmutexW 51 0x111d9c8
2 : fin 30 0x111d3a0
fatal error: lock ordering problem
execW < fin is modeled, but rwmutexW < fin is missing.
Fix this by eliminating the rwmutexR/W lock ranks shared across
different types of rwmutex. Instead require users to define an
additional "internal" lock rank to represent the implementation details
of rwmutex.rLock. We can avoid an additional "internal" lock rank for
rwmutex.wLock because the existing writeRank has the same semantics for
semantic and internal locking: writeRank is held for the duration
of a write lock, which is exactly how rwmutex.wLock is used, so we can
use writeRank directly on wLock.
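The violation and the fix can be illustrated with a toy rank checker. The rankChecker type and its methods are hypothetical and are not the runtime's lock ranking code; the sketch uses only direct edges, and takes the rank names from the violation shown above.

```go
package main

import "fmt"

// rankChecker is a toy model of static lock ranking (hypothetical type).
// order[a][b] records that a lock of rank b may be acquired while a lock
// of rank a is held, written "a < b" in the text above.
type rankChecker struct {
	order map[string]map[string]bool
	held  []string
}

// allow records the edge a < b.
func (c *rankChecker) allow(a, b string) {
	if c.order == nil {
		c.order = map[string]map[string]bool{}
	}
	if c.order[a] == nil {
		c.order[a] = map[string]bool{}
	}
	c.order[a][b] = true
}

// acquire checks the new rank against every rank already held.
func (c *rankChecker) acquire(rank string) error {
	for _, h := range c.held {
		if !c.order[h][rank] {
			return fmt.Errorf("lock ordering problem: %s < %s is not modeled", h, rank)
		}
	}
	c.held = append(c.held, rank)
	return nil
}

// acquireAll takes the ranks in order and returns the first violation.
func acquireAll(c *rankChecker, ranks ...string) error {
	for _, r := range ranks {
		if err := c.acquire(r); err != nil {
			return err
		}
	}
	return nil
}

// newModel returns a checker that knows execW < rwmutexW and execW < fin.
func newModel() *rankChecker {
	c := &rankChecker{}
	c.allow("execW", "rwmutexW")
	c.allow("execW", "fin")
	return c
}

func main() {
	// Before: the shared internal rank rwmutexW is held during the write lock.
	fmt.Println(acquireAll(newModel(), "execW", "rwmutexW", "fin"))
	// After: wLock carries writeRank (execW here), so no extra rank is held.
	fmt.Println(acquireAll(newModel(), "execW", "fin"))
}
```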
For #64722.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-staticlockranking
Change-Id: Ia572de188a46ba8fe054ae28537648beaa16b12c
Reviewed-on: https://go-review.googlesource.com/c/go/+/555055
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Currently, lock ranking doesn't really try to model rwmutex. It records
the internal locks rLock and wLock, but in a subpar fashion:
1. wLock is held from lock to unlock, so it works OK, but it conflates
   write locks of all rwmutexes as rwmutexW, rather than allowing
   different rwmutexes to have different rankings.
2. rLock is an internal implementation detail that is only taken when
   there is contention in rlock. As a result, the reader lock path is
   almost never checked.
Add proper modeling. rwmutexR and rwmutexW remain as the ranks of the
internal locks, which have their own ordering. The new init method is
passed the ranks of the higher level lock that this represents, just
like lockInit for mutex.
Ordering execW before MALLOC captures the case from #64722: there
can be allocation between BeforeFork and AfterFork.
For #64722.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-staticlockranking
Change-Id: I23335b28faa42fb04f1bc9da02fdf54d1616cd28
Reviewed-on: https://go-review.googlesource.com/c/go/+/549536
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The exported API is only available with GOEXPERIMENT=rangefunc.
This will let Go 1.22 users who want to experiment with rangefuncs
access an efficient implementation of iter.Pull and iter.Pull2.
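For readers unfamiliar with the pull style, the sketch below shows the general shape of turning a push-style range function into next and stop functions. It is not the runtime's implementation: the seq type, pull, and countTo are hypothetical names, and the adapter uses a goroutine and channels, which is the slower approach that an efficient implementation avoids.

```go
package main

import "fmt"

// seq has the shape of a push-style range function over ints.
type seq func(yield func(int) bool)

// pull is a hypothetical, channel-based stand-in for a Pull-style adapter.
// It starts the sequence eagerly in a goroutine and is not safe for
// concurrent use.
func pull(s seq) (next func() (int, bool), stop func()) {
	vals := make(chan int)
	done := make(chan struct{})
	go func() {
		defer close(vals)
		s(func(v int) bool {
			select {
			case vals <- v:
				return true
			case <-done:
				return false // the consumer called stop
			}
		})
	}()
	stopped := false
	next = func() (int, bool) {
		if stopped {
			return 0, false
		}
		v, ok := <-vals
		return v, ok
	}
	stop = func() {
		if !stopped {
			stopped = true
			close(done)
		}
	}
	return next, stop
}

// countTo yields 0, 1, ..., n-1 until the consumer asks it to stop.
func countTo(n int) seq {
	return func(yield func(int) bool) {
		for i := 0; i < n; i++ {
			if !yield(i) {
				return
			}
		}
	}
}

func main() {
	next, stop := pull(countTo(3))
	defer stop()
	for {
		v, ok := next()
		if !ok {
			break
		}
		fmt.Println(v)
	}
}
```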
For #61897.
Change-Id: I6ef5fa8f117567efe4029b7b8b0f4d9b85697fb7
Reviewed-on: https://go-review.googlesource.com/c/go/+/543319
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
profileruntimelocks is new in CL 544195, but the name is deceptive. Even
with profileruntimelocks=0, runtime-internal locks are still profiled.
The actual difference is that call stacks are not collected. Instead all
contention is reported at runtime._LostContendedLock.
Rename this setting to runtimecontentionstacks to make its name more
aligned with its behavior.
In addition, for this release the default is profileruntimelocks=0,
meaning that users are fairly likely to encounter
runtime._LostContendedLock. Rename it to
runtime._LostContendedRuntimeLock in an attempt to make it more
intuitive that these are runtime locks, not locks in application code.
For #57071.
Change-Id: I38aac28b2c0852db643d53b1eab3f3bc42a43393
Reviewed-on: https://go-review.googlesource.com/c/go/+/547055
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Rhys Hiltner <rhys@justin.tv>
|
|
Move ChaCha8 code into internal/chacha8rand and use it to implement
runtime.rand, which is used for the unseeded global source for
both math/rand and math/rand/v2. This also affects the calculation of
the start point for iteration over very very large maps (when the
32-bit fastrand is not big enough).
The benefit is that misuse of the global random number generators
in math/rand and math/rand/v2 in contexts where non-predictable
randomness is important for security reasons is no longer a
security problem, removing a common mistake among programmers
who are unaware of the different kinds of randomness.
The cost is an extra 304 bytes per thread stored in the m struct
plus 2-3ns more per random uint64 due to the more sophisticated
algorithm. Using PCG looks like it would cost about the same,
although I haven't benchmarked that.
Before this, the math/rand and math/rand/v2 global generator
was wyrand (https://github.com/wangyi-fudan/wyhash).
For math/rand, using wyrand instead of the Mitchell/Reeds/Thompson
ALFG was justifiable, since the latter was not any better.
But for math/rand/v2, the global generator really should be
at least as good as one of the well-studied, specific algorithms
provided directly by the package, and it's not.
(Wyrand is still reasonable for scheduling and cache decisions.)
Good randomness does have a cost: about twice wyrand.
Also rationalize the various runtime rand references.
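From a user's point of view, the change affects only the unseeded top-level functions. A minimal sketch using math/rand/v2 follows; the sameStream helper is a hypothetical name introduced here to show that an explicitly seeded ChaCha8 source stays reproducible while the global functions need no seed.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// sameStream reports whether two ChaCha8 generators built from the same
// seed produce the same first n values.
func sameStream(seed [32]byte, n int) bool {
	a := rand.New(rand.NewChaCha8(seed))
	b := rand.New(rand.NewChaCha8(seed))
	for i := 0; i < n; i++ {
		if a.Uint64() != b.Uint64() {
			return false
		}
	}
	return true
}

func main() {
	// Global generator: unseeded, so the value differs from run to run.
	fmt.Println(rand.Uint64())

	// Explicit generator: reproducible for a fixed 32-byte seed.
	seed := [32]byte{1, 2, 3}
	fmt.Println(sameStream(seed, 10))
}
```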
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ bbb48afeb7.amd64 │ 5cf807d1ea.amd64 │
│ sec/op │ sec/op vs base │
ChaCha8-32 1.862n ± 2% 1.861n ± 2% ~ (p=0.825 n=20)
PCG_DXSM-32 1.471n ± 1% 1.460n ± 2% ~ (p=0.153 n=20)
SourceUint64-32 1.636n ± 2% 1.582n ± 1% -3.30% (p=0.000 n=20)
GlobalInt64-32 2.087n ± 1% 3.663n ± 1% +75.54% (p=0.000 n=20)
GlobalInt64Parallel-32 0.1042n ± 1% 0.2026n ± 1% +94.48% (p=0.000 n=20)
GlobalUint64-32 2.263n ± 2% 3.724n ± 1% +64.57% (p=0.000 n=20)
GlobalUint64Parallel-32 0.1019n ± 1% 0.1973n ± 1% +93.67% (p=0.000 n=20)
Int64-32 1.771n ± 1% 1.774n ± 1% ~ (p=0.449 n=20)
Uint64-32 1.863n ± 2% 1.866n ± 1% ~ (p=0.364 n=20)
GlobalIntN1000-32 3.134n ± 3% 4.730n ± 2% +50.95% (p=0.000 n=20)
IntN1000-32 2.489n ± 1% 2.489n ± 1% ~ (p=0.683 n=20)
Int64N1000-32 2.521n ± 1% 2.516n ± 1% ~ (p=0.394 n=20)
Int64N1e8-32 2.479n ± 1% 2.478n ± 2% ~ (p=0.743 n=20)
Int64N1e9-32 2.530n ± 2% 2.514n ± 2% ~ (p=0.193 n=20)
Int64N2e9-32 2.501n ± 1% 2.494n ± 1% ~ (p=0.616 n=20)
Int64N1e18-32 3.227n ± 1% 3.205n ± 1% ~ (p=0.101 n=20)
Int64N2e18-32 3.647n ± 1% 3.599n ± 1% ~ (p=0.019 n=20)
Int64N4e18-32 5.135n ± 1% 5.069n ± 2% ~ (p=0.034 n=20)
Int32N1000-32 2.657n ± 1% 2.637n ± 1% ~ (p=0.180 n=20)
Int32N1e8-32 2.636n ± 1% 2.636n ± 1% ~ (p=0.763 n=20)
Int32N1e9-32 2.660n ± 2% 2.638n ± 1% ~ (p=0.358 n=20)
Int32N2e9-32 2.662n ± 2% 2.618n ± 2% ~ (p=0.064 n=20)
Float32-32 2.272n ± 2% 2.239n ± 2% ~ (p=0.194 n=20)
Float64-32 2.272n ± 1% 2.286n ± 2% ~ (p=0.763 n=20)
ExpFloat64-32 3.762n ± 1% 3.744n ± 1% ~ (p=0.171 n=20)
NormFloat64-32 3.706n ± 1% 3.655n ± 2% ~ (p=0.066 n=20)
Perm3-32 32.93n ± 3% 34.62n ± 1% +5.13% (p=0.000 n=20)
Perm30-32 202.9n ± 1% 204.0n ± 1% ~ (p=0.482 n=20)
Perm30ViaShuffle-32 115.0n ± 1% 114.9n ± 1% ~ (p=0.358 n=20)
ShuffleOverhead-32 112.8n ± 1% 112.7n ± 1% ~ (p=0.692 n=20)
Concurrent-32 2.107n ± 0% 3.725n ± 1% +76.75% (p=0.000 n=20)
goos: darwin
goarch: arm64
pkg: math/rand/v2
│ bbb48afeb7.arm64 │ 5cf807d1ea.arm64 │
│ sec/op │ sec/op vs base │
ChaCha8-8 2.480n ± 0% 2.429n ± 0% -2.04% (p=0.000 n=20)
PCG_DXSM-8 2.531n ± 0% 2.530n ± 0% ~ (p=0.877 n=20)
SourceUint64-8 2.534n ± 0% 2.533n ± 0% ~ (p=0.732 n=20)
GlobalInt64-8 2.172n ± 1% 4.794n ± 0% +120.67% (p=0.000 n=20)
GlobalInt64Parallel-8 0.4320n ± 0% 0.9605n ± 0% +122.32% (p=0.000 n=20)
GlobalUint64-8 2.182n ± 0% 4.770n ± 0% +118.58% (p=0.000 n=20)
GlobalUint64Parallel-8 0.4307n ± 0% 0.9583n ± 0% +122.51% (p=0.000 n=20)
Int64-8 4.107n ± 0% 4.104n ± 0% ~ (p=0.416 n=20)
Uint64-8 4.080n ± 0% 4.080n ± 0% ~ (p=0.052 n=20)
GlobalIntN1000-8 2.814n ± 2% 5.643n ± 0% +100.50% (p=0.000 n=20)
IntN1000-8 4.141n ± 0% 4.139n ± 0% ~ (p=0.140 n=20)
Int64N1000-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.313 n=20)
Int64N1e8-8 4.140n ± 0% 4.139n ± 0% ~ (p=0.103 n=20)
Int64N1e9-8 4.139n ± 0% 4.140n ± 0% ~ (p=0.761 n=20)
Int64N2e9-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.636 n=20)
Int64N1e18-8 5.266n ± 0% 5.326n ± 1% +1.14% (p=0.001 n=20)
Int64N2e18-8 6.052n ± 0% 6.167n ± 0% +1.90% (p=0.000 n=20)
Int64N4e18-8 8.826n ± 0% 9.051n ± 0% +2.55% (p=0.000 n=20)
Int32N1000-8 4.127n ± 0% 4.132n ± 0% +0.12% (p=0.000 n=20)
Int32N1e8-8 4.126n ± 0% 4.131n ± 0% +0.12% (p=0.000 n=20)
Int32N1e9-8 4.127n ± 0% 4.132n ± 0% +0.12% (p=0.000 n=20)
Int32N2e9-8 4.132n ± 0% 4.131n ± 0% ~ (p=0.017 n=20)
Float32-8 4.109n ± 0% 4.105n ± 0% ~ (p=0.379 n=20)
Float64-8 4.107n ± 0% 4.106n ± 0% ~ (p=0.867 n=20)
ExpFloat64-8 5.339n ± 0% 5.383n ± 0% +0.82% (p=0.000 n=20)
NormFloat64-8 5.735n ± 0% 5.737n ± 1% ~ (p=0.856 n=20)
Perm3-8 26.65n ± 0% 26.80n ± 1% +0.58% (p=0.000 n=20)
Perm30-8 194.8n ± 1% 197.0n ± 0% +1.18% (p=0.000 n=20)
Perm30ViaShuffle-8 156.6n ± 0% 157.6n ± 1% +0.61% (p=0.000 n=20)
ShuffleOverhead-8 124.9n ± 0% 125.5n ± 0% +0.52% (p=0.000 n=20)
Concurrent-8 2.434n ± 3% 5.066n ± 0% +108.09% (p=0.000 n=20)
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ bbb48afeb7.386 │ 5cf807d1ea.386 │
│ sec/op │ sec/op vs base │
ChaCha8-32 11.295n ± 1% 4.748n ± 2% -57.96% (p=0.000 n=20)
PCG_DXSM-32 7.693n ± 1% 7.738n ± 2% ~ (p=0.542 n=20)
SourceUint64-32 7.658n ± 2% 7.622n ± 2% ~ (p=0.344 n=20)
GlobalInt64-32 3.473n ± 2% 7.526n ± 2% +116.73% (p=0.000 n=20)
GlobalInt64Parallel-32 0.3198n ± 0% 0.5444n ± 0% +70.22% (p=0.000 n=20)
GlobalUint64-32 3.612n ± 0% 7.575n ± 1% +109.69% (p=0.000 n=20)
GlobalUint64Parallel-32 0.3168n ± 0% 0.5403n ± 0% +70.51% (p=0.000 n=20)
Int64-32 7.673n ± 2% 7.789n ± 1% ~ (p=0.122 n=20)
Uint64-32 7.773n ± 1% 7.827n ± 2% ~ (p=0.920 n=20)
GlobalIntN1000-32 6.268n ± 1% 9.581n ± 1% +52.87% (p=0.000 n=20)
IntN1000-32 10.33n ± 2% 10.45n ± 1% ~ (p=0.233 n=20)
Int64N1000-32 10.98n ± 2% 11.01n ± 1% ~ (p=0.401 n=20)
Int64N1e8-32 11.19n ± 2% 10.97n ± 1% ~ (p=0.033 n=20)
Int64N1e9-32 11.06n ± 1% 11.08n ± 1% ~ (p=0.498 n=20)
Int64N2e9-32 11.10n ± 1% 11.01n ± 2% ~ (p=0.995 n=20)
Int64N1e18-32 15.23n ± 2% 15.04n ± 1% ~ (p=0.973 n=20)
Int64N2e18-32 15.89n ± 1% 15.85n ± 1% ~ (p=0.409 n=20)
Int64N4e18-32 18.96n ± 2% 19.34n ± 2% ~ (p=0.048 n=20)
Int32N1000-32 10.46n ± 2% 10.44n ± 2% ~ (p=0.480 n=20)
Int32N1e8-32 10.46n ± 2% 10.49n ± 2% ~ (p=0.951 n=20)
Int32N1e9-32 10.28n ± 2% 10.26n ± 1% ~ (p=0.431 n=20)
Int32N2e9-32 10.50n ± 2% 10.44n ± 2% ~ (p=0.249 n=20)
Float32-32 13.80n ± 2% 13.80n ± 2% ~ (p=0.751 n=20)
Float64-32 23.55n ± 2% 23.87n ± 0% ~ (p=0.408 n=20)
ExpFloat64-32 15.36n ± 1% 15.29n ± 2% ~ (p=0.316 n=20)
NormFloat64-32 13.57n ± 1% 13.79n ± 1% +1.66% (p=0.005 n=20)
Perm3-32 45.70n ± 2% 46.99n ± 2% +2.81% (p=0.001 n=20)
Perm30-32 399.0n ± 1% 403.8n ± 1% +1.19% (p=0.006 n=20)
Perm30ViaShuffle-32 349.0n ± 1% 350.4n ± 1% ~ (p=0.909 n=20)
ShuffleOverhead-32 322.3n ± 1% 323.8n ± 1% ~ (p=0.410 n=20)
Concurrent-32 3.331n ± 1% 7.312n ± 1% +119.50% (p=0.000 n=20)
For #61716.
Change-Id: Ibdddeed85c34d9ae397289dc899e04d4845f9ed2
Reviewed-on: https://go-review.googlesource.com/c/go/+/516860
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The exitsyscall path, since the introduction of the new execution
tracer, stores just a little bit more data in the exitsyscall stack
frame, causing a build failure from exceeding the nosplit limit with
'-N -l' set on all packages (like Delve does).
One of the paths through which this fails is "throw" from wirep, called
by a callee of exitsyscall. By switching to the systemstack on this
path, we can avoid hitting the nosplit limit, fixing the build. It's
also not totally unreasonable to switch to the systemstack for the
throws in this function, since the function has to be nosplit anyway. It
gives the throw path a bit more wiggle room to dump information than it
otherwise would have.
Fixes #64113.
Change-Id: I56e94e40614a202b8ac2fdc8b8b731493b74e5d0
Reviewed-on: https://go-review.googlesource.com/c/go/+/544535
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|