| Age | Commit message (Collapse) | Author |
|
This change renames the v2 execution tracer files created as part of
Updates #66703
For #60773
Change-Id: I91bfdc08fec4ec68ff3a6e8b5c86f6f8bcae6e6d
Reviewed-on: https://go-review.googlesource.com/c/go/+/576257
Auto-Submit: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
This change makes the new execution tracer described in #60773, the
default tracer. This change attempts to make the smallest amount of
changes for a single CL.
Updates #66703
For #60773
Change-Id: I3742f3419c54f07d7c020ae5e1c18d29d8bcae6d
Reviewed-on: https://go-review.googlesource.com/c/go/+/576256
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The existing implementation of traceMap is a hash map with a fixed
bucket table size which scales poorly with the number of elements added
to the map. After a few thousands elements are in the map, it tends to
fall over.
Furthermore, cleaning up the trace map is currently non-preemptible,
without very good reason.
This change replaces the traceMap implementation with a simple
append-only concurrent hash-trie. The data structure is incredibly
simple and does not suffer at all from the same scaling issues.
Because the traceMap no longer has a lock, and the traceRegionAlloc it
embeds is not thread-safe, we have to push that lock down. While we're
here, this change also makes the fast path for the traceRegionAlloc
lock-free. This may not be inherently faster due to contention on the
atomic add, but it creates an easy path to sharding the main allocation
buffer to reduce contention in the future. (We might want to also
consider a fully thread-local allocator that covers both string and
stack tables. The only reason a thread-local allocator isn't feasible
right now is because each of these has their own region, but we could
certainly group all them together.)
Change-Id: I8c06d42825c326061a1b8569e322afc4bc2a513a
Reviewed-on: https://go-review.googlesource.com/c/go/+/570035
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Bypass: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
This change adds a new event, GoStatusStack, which is like GoStatus but
also carries a stack ID. The purpose of this event is to emit stacks in
more places, in particular for goroutines that may never emit a
stack-bearing event in a whole generation.
This CL targets one specific case: goroutines that were blocked or in a
syscall the entire generation. This particular case is handled at the
point that we scribble down the goroutine's status before the generation
transition. That way, when we're finishing up the generation and
emitting events for any goroutines we scribbled down, we have an
accurate stack for those goroutines ready to go, and we emit a
GoStatusStack instead of a GoStatus event. There's a small drawback with
the way we scribble down the stack though: we immediately register it in
the stack table instead of tracking the PCs. This means that if a
goroutine does run and emit a trace event in between when we scribbled
down its stack and the end of the generation, we will have recorded a
stack that never actually gets referenced in the trace. This case should
be rare.
There are two remaining cases where we could emit stacks for goroutines
but we don't.
One is goroutines that get unblocked but either never run, or run and
never block within a generation. We could take a stack trace at the
point of unblocking the goroutine, if we're emitting a GoStatus event
for it, but unfortunately we don't own the stack at that point. We could
obtain ownership by grabbing its _Gscan bit, but that seems a little
risky, since we could hold up the goroutine emitting the event for a
while. Something to consider for the future.
The other remaining case is a goroutine that was runnable when tracing
started and began running, but then ran until the end of the generation
without getting preempted or blocking. The main issue here is that
although the goroutine will have a GoStatus event, it'll only have a
GoStart event for it which doesn't emit a stack trace. This case is
rare, but still certainly possible. I believe the only way to resolve it
is to emit a GoStatusStack event instead of a GoStatus event for a
goroutine that we're emitting GoStart for. This case is a bit easier
than the last one because at the point of emitting GoStart, we have
ownership of the goroutine's stack.
We may consider dealing with these in the future, but for now, this CL
captures a fairly large class of goroutines, so is worth it on its own.
Fixes #65634.
Change-Id: Ief3b6df5848b426e7ee6794e98dc7ef5f37ab2d0
Reviewed-on: https://go-review.googlesource.com/c/go/+/567076
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
For #65355
Change-Id: I65dd090fb99de9b231af2112c5ccb0eb635db2be
Reviewed-on: https://go-review.googlesource.com/c/go/+/560155
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ibrahim Bazoka <ibrahimbazoka729@gmail.com>
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
|
|
This change resolves a TODO in the coroutine switch implementation (used
exclusively by iter.Pull at the moment) to enable tracing. This was
blocked on eliminating the atomic load in the tracer's "off" path
(completed in the previous CL in this series) and the addition of new
tracer events to minimize the overhead of tracing in this circumstance.
This change introduces 3 new event types to support coroutine switches:
GoCreateBlocked, GoSwitch, and GoSwitchDestroy.
GoCreateBlocked needs to be introduced because the goroutine created for
the coroutine starts out in a blocked state. There's no way to represent
this in the tracer right now, so we need a new event for it.
GoSwitch represents the actual coroutine switch, which conceptually
consists of a GoUnblock, a GoBlock, and a GoStart event in series
(unblocking the next goroutine to run, blocking the current goroutine,
and then starting the next goroutine to run).
GoSwitchDestroy is closely related to GoSwitch, implementing the same
semantics except that GoBlock is replaced with GoDestroy. This is used
when exiting the coroutine.
The implementation of all this is fairly straightforward, and the trace
parser simply translates GoSwitch* into the three constituent events.
Because GoSwitch and GoSwitchDestroy imply a GoUnblock and a GoStart,
they need to synchronize with other past and future GoStart events to
create a correct partial ordering in the trace. Therefore, these events
need a sequence number for the goroutine that will be unblocked and
started.
Also, while implementing this, I noticed that the coroutine
implementation is actually buggy with respect to LockOSThread. In fact,
it blatantly disregards its invariants without an explicit panic. While
such a case is likely to be rare (and inefficient!) we should decide how
iter.Pull behaves with respect to runtime.LockOSThread.
Lastly, this change also bumps the trace version from Go 1.22 to Go
1.23. We're adding events that are incompatible with a Go 1.22 parser,
but Go 1.22 traces are all valid Go 1.23 traces, so the newer parser
supports both (and the CL otherwise updates the Go 1.22 definitions of
events and such). We may want to reconsider the structure and naming of
some of these packages though; it could quickly get confusing.
For #61897.
Change-Id: I96897a46d5852c02691cde9f957dc6c13ef4d8e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/565937
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
Tracing is currently broken when using iter.Pull from the rangefunc
experiment partly because the "tracing is off" fast path in traceAcquire
was deemed too expensive to check (an atomic load) during the coroutine
switch.
This change adds trace.enabled, a non-atomic indicator of whether
tracing is enabled. It doubles trace.gen, which is the source of truth
on whether tracing is enabled. The semantics around trace.enabled are
subtle.
When tracing is enabled, we need to be careful to make sure that if gen
!= 0, goroutines enter the tracer on traceAcquire. This is enforced by
making sure trace.enabled is published atomically with trace.gen. The
STW takes care of synchronization with most Ms, but there's still sysmon
and goroutines exiting syscalls. We need to synchronize with those
explicitly anyway, which luckily takes care of trace.enabled as well.
When tracing is disabled, it's always OK for trace.enabled to be stale,
since traceAcquire will always double-check gen before proceeding.
For #61897.
Change-Id: I47c2a530fb5339c15e419312fbb1e22d782cd453
Reviewed-on: https://go-review.googlesource.com/c/go/+/565935
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Currently the stop reason for runtime.Gosched is labeled
"runtime.GoSched" which doesn't actually match the function name. Fix
the label to match the function name.
This change doesn't regenerate the internal/trace/v2 tests, because
regenerating the tests breaks summarization tests in internal/trace that
rely on very specific details in the example traces that aren't
guaranteed. Also, go122-gc-trace.test isn't generated at all, as it
turns out. I'll fix this all up in a follow-up CL. For now, just replace
runtime.GoSched with runtime.Gosched in the traces so we don't have a
problem later if a test wants to look for that string.
This change does regenerate the cmd/trace/v2 test, but it turns out the
cmd/trace/v2 tests are way too strict about network unblock events, and
3 usually pop up instead of 1 or 2, which is what the test expects.
AFAICT this looks plausible to me, so just lift the restriction on
"up to 2" events entirely.
Change-Id: Id7350132be19119c743c259f2f5250903bf41a04
Reviewed-on: https://go-review.googlesource.com/c/go/+/552275
TryBot-Bypass: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Currently the new execution tracer's handling of CPU profile samples is
very best-effort. The same CPU profile buffer is used across
generations, leading to a high probability that CPU samples will bleed
across generations. Also, because the CPU profile buffer (not the trace
buffer the samples get written into) isn't guaranteed to be flushed when
we close out a generation, nor when tracing stops. This has led to test
failures, but can more generally just lead to lost samples.
In general, lost samples are considered OK. The CPU profile buffer is
only read from every 100 ms, so if it fills up too much before then, old
samples will get overwritten. The tests already account for this, and in
that sense the CPU profile samples are already best-effort. But with
actual CPU profiles, this is really the only condition under which
samples are dropped.
This CL aims to align CPU profiles better with traces by eliminating
all best-effort parts of the implementation aside from the possibility
of dropped samples from a full buffer.
To achieve this, this CL adds a second CPU profile buffer and has the
SIGPROF handler pick which CPU profile buffer to use based on the
generation, much like every other part of the tracer. The SIGPROF
handler then reads the trace generation, but not before ensuring it
can't change: it grabs its own thread's trace seqlock. It's possible
that a SIGPROF signal lands while this seqlock is already held by the
thread. Luckily this is detectable and the SIGPROF handler can simply
elide the locking if this happens (the tracer will already wait until
all threads exit their seqlock critical section).
Now that there are two CPU profile buffers written to, the read side
needs to change. Instead of calling traceAcquire/traceRelease for every
single CPU sample event, the trace CPU profile reader goroutine holds
this conceptual lock over the entirety of flushing a buffer. This means
it can pick the CPU profile buffer for the current generation to flush.
With all this machinery in place, we're now at a point where all CPU
profile samples get divided into either the previous generation or the
current generation. This is good, since it means that we're able to
emit profile samples into the correct generation, avoiding surprises in
the final trace. All that's missing is to flush the CPU profile buffer
from the previous generation, once the runtime has moved on from that
generation. That is, when the generation counter updates, there may yet
be CPU profile samples sitting in the last generation's buffer. So,
traceCPUFlush now first flushes the CPU profile buffer, followed by any
trace buffers containing CPU profile samples.
The end result of all this is that no sample gets left behind unless it
gets overwritten in the CPU profile buffer in the first place. CPU
profile samples in the trace will now also get attributed to the right
generation, since the SIGPROF handler now participates in the tracer's
synchronization across trace generations.
Fixes #55317.
Change-Id: I47719fad164c544eef0bb12f99c8f3c15358e344
Reviewed-on: https://go-review.googlesource.com/c/go/+/555495
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Currently entersyscall_gcwait always emits a ProcStop event. Most of the
time, this is correct, since the thread that just put the P into
_Psyscall is the same one that is putting it into _Pgcstop. However it's
possible for another thread to steal the P, start running a goroutine,
and then enter another syscall, putting the P back into _Psyscall. In
this case ProcStop is incorrect; the P is getting stolen. This leads to
broken traces.
Fix this by always emitting a ProcSteal event from entersyscall_gcwait.
This means that most of the time a thread will be 'stealing' the proc
from itself when it enters this function, but that's theoretically fine.
A ProcSteal is really just a fancy ProcStop.
Well, it would be if the parser correctly handled a self-steal. This is
a minor bug that just never came up before, but it's an update order
error (the mState is looked up and modified, but then it's modified
again at the end of the function to match newCtx). There's really no
reason a self-steal shouldn't be allowed, so fix that up and add a test.
Change-Id: Iec3d7639d331e3f2d127f92ce50c2c4a7818fcd3
Reviewed-on: https://go-review.googlesource.com/c/go/+/544215
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
After landing the new execution tracer, the Windows builders failed with
some new errors.
Currently the GoSyscallBegin event has no indicator that its the target
of a ProcSteal event. This can lead to an ambiguous situation that is
unresolvable if timestamps are broken. For instance, if the tracer sees
the ProcSteal event while a goroutine has been observed to be in a
syscall (one that, for instance, did not actually lose its P), it will
proceed with the ProcSteal incorrectly.
This is a little abstract. For a more concrete example, see the
go122-syscall-steal-proc-ambiguous test.
This change resolves this ambiguity by interleaving GoSyscallBegin
events into how Ps are sequenced. Because a ProcSteal has a sequence
number (it has to, it's stopping a P from a distance) it necessarily
has to synchronize with a precise ProcStart event. This change basically
just extends this synchronization to GoSyscallBegin, so the ProcSteal
can't advance until _exactly the right_ syscall has been entered.
This change removes the test skip, since it and CL 541695 fix the two
main issues observed on Windows platforms.
For #60773.
Fixes #64061.
Change-Id: I069389cd7fe1ea903edf42d79912f6e2bcc23f62
Reviewed-on: https://go-review.googlesource.com/c/go/+/541696
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This change mostly implements the design described in #60773 and
includes a new scalable parser for the new trace format, available in
internal/trace/v2. I'll leave this commit message short because this is
clearly an enormous CL with a lot of detail.
This change does not hook up the new tracer into cmd/trace yet. A
follow-up CL will handle that.
For #60773.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-amd64-longtest-race
Change-Id: I5d2aca2cc07580ed3c76a9813ac48ec96b157de0
Reviewed-on: https://go-review.googlesource.com/c/go/+/494187
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|