path: root/src/runtime/trace2stack.go
2024-04-15  runtime: rename v2 execution tracer files  (Carlos Amedee)

    This change renames the v2 execution tracer files created as part of
    the new tracer work.

    Updates #66703
    For #60773

    Change-Id: I91bfdc08fec4ec68ff3a6e8b5c86f6f8bcae6e6d
    Reviewed-on: https://go-review.googlesource.com/c/go/+/576257
    Auto-Submit: Carlos Amedee <carlos@golang.org>
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
    Reviewed-by: Michael Knyszek <mknyszek@google.com>
2024-04-15  runtime, cmd/trace: remove code paths that include v1 tracer  (Carlos Amedee)

    This change makes the new execution tracer described in #60773 the
    default tracer. It attempts to make the smallest number of changes
    needed for a single CL.

    Updates #66703
    For #60773

    Change-Id: I3742f3419c54f07d7c020ae5e1c18d29d8bcae6d
    Reviewed-on: https://go-review.googlesource.com/c/go/+/576256
    Reviewed-by: Michael Knyszek <mknyszek@google.com>
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-04-10  runtime: rewrite traceMap to scale better  (Michael Anthony Knyszek)

    The existing implementation of traceMap is a hash map with a fixed
    bucket table size, which scales poorly with the number of elements
    added to the map. After a few thousand elements are in the map, it
    tends to fall over. Furthermore, cleaning up the trace map is
    currently non-preemptible, without very good reason.

    This change replaces the traceMap implementation with a simple
    append-only concurrent hash-trie. The data structure is incredibly
    simple and does not suffer at all from the same scaling issues.

    Because the traceMap no longer has a lock, and the traceRegionAlloc
    it embeds is not thread-safe, we have to push that lock down. While
    we're here, this change also makes the fast path for the
    traceRegionAlloc lock-free. This may not be inherently faster due to
    contention on the atomic add, but it creates an easy path to sharding
    the main allocation buffer to reduce contention in the future.

    (We might want to also consider a fully thread-local allocator that
    covers both string and stack tables. The only reason a thread-local
    allocator isn't feasible right now is because each of these has its
    own region, but we could certainly group them all together.)

    Change-Id: I8c06d42825c326061a1b8569e322afc4bc2a513a
    Reviewed-on: https://go-review.googlesource.com/c/go/+/570035
    Reviewed-by: Carlos Amedee <carlos@golang.org>
    Auto-Submit: Michael Knyszek <mknyszek@google.com>
    TryBot-Bypass: Michael Knyszek <mknyszek@google.com>
    Reviewed-by: David Chase <drchase@google.com>
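The append-only concurrent hash-trie described in this commit can be sketched roughly as follows. This is an illustrative model under stated assumptions, not the runtime's actual traceMap code: the type names, the 4-way branching (2 hash bits per level), and the ID-assignment scheme are all choices made for this sketch.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// node is either an interior node (children in use) or a leaf
// carrying a key and its assigned ID.
type node struct {
	children [4]atomic.Pointer[node] // interior: branch on 2 hash bits per level
	key      uint64                  // leaf: the inserted hash
	id       uint64                  // leaf: stable ID assigned at insertion
	leaf     bool
}

// hashTrie is append-only: nodes are published with CAS and never
// mutated or removed afterwards, so readers need no locks.
type hashTrie struct {
	root   node
	nextID atomic.Uint64
}

// put returns the existing ID for key, or inserts key with a fresh ID.
func (t *hashTrie) put(key uint64) uint64 {
	n := &t.root
	for shift := 0; ; shift += 2 {
		i := (key >> shift) & 3
		for {
			child := n.children[i].Load()
			if child == nil {
				// Empty slot: try to publish a new leaf. (Losing the
				// race wastes an ID, which is harmless in this sketch.)
				leaf := &node{key: key, id: t.nextID.Add(1), leaf: true}
				if n.children[i].CompareAndSwap(nil, leaf) {
					return leaf.id
				}
				continue // lost the race; re-read the slot
			}
			if child.leaf {
				if child.key == key {
					return child.id // already present
				}
				// A different key occupies this path: push the old
				// leaf down one level, then retry at this slot.
				interior := &node{}
				j := (child.key >> (shift + 2)) & 3
				interior.children[j].Store(child)
				n.children[i].CompareAndSwap(child, interior)
				continue
			}
			n = child // descend into the interior node
			break
		}
	}
}

func main() {
	var t hashTrie
	a := t.put(0x12345)
	b := t.put(0x12345) // same key: same ID
	c := t.put(0x54321) // shares the low bits, so the leaf is split
	fmt.Println(a == b, a != c)
}
```

Because inserted nodes are never moved or freed, there is no cleanup to serialize, which is what removes the non-preemptible teardown path the commit complains about.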
2024-04-10  runtime: push down systemstack requirement for tracer where possible  (Michael Anthony Knyszek)

    Currently lots of functions require systemstack because the trace
    buffer might get flushed, but that will already switch to the
    systemstack for the most critical bits (grabbing trace.lock). That
    means a lot of this code is non-preemptible when it doesn't need to
    be. We've seen this cause problems at scale, when dumping very large
    numbers of stacks at once, for example.

    This is a re-land of CL 572095, which was reverted in CL 577376. This
    re-land includes a fix of the test that broke on the longtest
    builders.

    Change-Id: Ia8d7cbe3aaa8398cf4a1818bac66c3415a399348
    Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest
    Reviewed-on: https://go-review.googlesource.com/c/go/+/577377
    Reviewed-by: David Chase <drchase@google.com>
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
    Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-04-08  Revert "runtime: push down systemstack requirement for tracer where possible"  (Michael Knyszek)

    This reverts CL 572095.

    Reason for revert: broke longtest builders.

    Change-Id: Iac3a8159d3afb4156a49c7d6819cdd15fe9d4bbb
    Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest
    Reviewed-on: https://go-review.googlesource.com/c/go/+/577376
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
    Auto-Submit: Michael Knyszek <mknyszek@google.com>
    Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-04-05  runtime: push down systemstack requirement for tracer where possible  (Michael Anthony Knyszek)

    Currently lots of functions require systemstack because the trace
    buffer might get flushed, but that will already switch to the
    systemstack for the most critical bits (grabbing trace.lock). That
    means a lot of this code is non-preemptible when it doesn't need to
    be. We've seen this cause problems at scale, when dumping very large
    numbers of stacks at once, for example.

    Change-Id: I88340091a3c43f0513b5601ef5199c946aa56ed7
    Reviewed-on: https://go-review.googlesource.com/c/go/+/572095
    Auto-Submit: Michael Knyszek <mknyszek@google.com>
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
    Reviewed-by: David Chase <drchase@google.com>
2024-04-05  runtime: take a stack trace during tracing only when we own the stack  (Michael Anthony Knyszek)

    Currently, the execution tracer may attempt to take a stack trace of
    a goroutine whose stack it does not own, for example if the goroutine
    is in _Grunnable or _Gwaiting. This is easily fixed in all cases by
    simply moving the emission of GoStop and GoBlock events to before the
    casgstatus happens. The goroutine status is what is used to signal
    stack ownership, and the GC may shrink a goroutine's stack if it can
    acquire the scan bit.

    Although this is easily fixed, the interaction here is very subtle,
    because stack ownership is only implicit in the goroutine's scan
    status. To make this invariant more maintainable and less error-prone
    in the future, this change adds a GODEBUG setting that checks, at the
    point of taking a stack trace, whether the caller owns the goroutine.
    This check is not quite perfect because there's no way for the stack
    tracing code to know that the _Gscan bit was acquired by the caller,
    so for simplicity it assumes that it was the caller that acquired the
    scan bit. In all other cases, however, we can check for ownership
    precisely. At the very least, this check is sufficient to catch the
    issue this change is fixing.

    To make sure this debug check doesn't bitrot, it's always enabled
    during trace testing. This new mode has actually caught a few other
    issues already, so this change fixes them.

    One issue that this debug mode caught was that it's not safe to take
    a stack trace of a _Gwaiting goroutine that's being unparked.

    Another, much bigger issue this debug mode caught was the fact that
    the execution tracer could try to take a stack trace of a G that was
    in _Gwaiting solely to avoid a deadlock in the GC. The execution
    tracer already has a partial list of these cases, since they're
    modeled as the goroutine just executing as normal in the tracer, but
    this change takes the list and makes it more formal. In this specific
    case, we now prevent the GC from shrinking the stacks of goroutines
    in this state if tracing is enabled. The stack traces from these
    scenarios are too useful to discard, but there is indeed a race here
    between the tracer and any attempt to shrink the stack by the GC.

    Change-Id: I019850dabc8cede202fd6dcc0a4b1f16764209fb
    Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-amd64-longtest-race
    Reviewed-on: https://go-review.googlesource.com/c/go/+/573155
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
    Reviewed-by: Cherry Mui <cherryyz@google.com>
    Auto-Submit: Michael Knyszek <mknyszek@google.com>
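The ownership rule this commit enforces (status plus scan bit signal who may touch a stack) can be modeled in miniature. This is a hedged sketch, not the runtime's actual status machinery: the constants, the `g` struct, and both function names are invented for illustration, and it mirrors the simplifying assumption noted above that the scan-bit holder is the caller.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Illustrative goroutine statuses, loosely modeled on the runtime's.
const (
	gRunnable uint32 = iota + 1
	gRunning
	gWaiting
	gScan uint32 = 0x1000 // scan bit: holder owns the stack
)

// g models a goroutine with an atomically updated status word.
type g struct {
	status atomic.Uint32
}

// acquireScan tries to take stack ownership of a stopped goroutine by
// setting the scan bit, as the GC does before shrinking a stack.
func acquireScan(gp *g, old uint32) bool {
	return gp.status.CompareAndSwap(old, old|gScan)
}

// mayTakeStackTrace sketches the debug check: a stack trace is allowed
// only if the goroutine is running and is the caller itself, or the
// scan bit is held (assumed, for simplicity, to be held by the caller).
func mayTakeStackTrace(gp *g, callerIsGp bool) bool {
	s := gp.status.Load()
	if s&gScan != 0 {
		return true
	}
	return callerIsGp && s == gRunning
}

func main() {
	gp := &g{}
	gp.status.Store(gWaiting)
	fmt.Println(mayTakeStackTrace(gp, false)) // false: nobody owns the stack
	if acquireScan(gp, gWaiting) {
		fmt.Println(mayTakeStackTrace(gp, false)) // true: scan bit held
	}
}
```

The subtlety the commit describes falls out of this model: emitting GoBlock after the casgstatus to _Gwaiting would mean tracing a stack the tracer no longer owns, which is exactly what moving the event emission earlier avoids.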
2024-04-05  runtime: emit trace stacks for more goroutines in each generation  (Michael Anthony Knyszek)

    This change adds a new event, GoStatusStack, which is like GoStatus
    but also carries a stack ID. The purpose of this event is to emit
    stacks in more places, in particular for goroutines that may never
    emit a stack-bearing event in a whole generation.

    This CL targets one specific case: goroutines that were blocked or in
    a syscall the entire generation. This particular case is handled at
    the point that we scribble down the goroutine's status before the
    generation transition. That way, when we're finishing up the
    generation and emitting events for any goroutines we scribbled down,
    we have an accurate stack for those goroutines ready to go, and we
    emit a GoStatusStack instead of a GoStatus event.

    There's a small drawback with the way we scribble down the stack,
    though: we immediately register it in the stack table instead of
    tracking the PCs. This means that if a goroutine does run and emit a
    trace event in between when we scribbled down its stack and the end
    of the generation, we will have recorded a stack that never actually
    gets referenced in the trace. This case should be rare.

    There are two remaining cases where we could emit stacks for
    goroutines but we don't.

    One is goroutines that get unblocked but either never run, or run and
    never block within a generation. We could take a stack trace at the
    point of unblocking the goroutine, if we're emitting a GoStatus event
    for it, but unfortunately we don't own the stack at that point. We
    could obtain ownership by grabbing its _Gscan bit, but that seems a
    little risky, since we could hold up the goroutine emitting the event
    for a while. Something to consider for the future.

    The other remaining case is a goroutine that was runnable when
    tracing started and began running, but then ran until the end of the
    generation without getting preempted or blocking. The main issue here
    is that although the goroutine will have a GoStatus event, it'll only
    have a GoStart event, which doesn't emit a stack trace. This case is
    rare, but still certainly possible. I believe the only way to resolve
    it is to emit a GoStatusStack event instead of a GoStatus event for a
    goroutine that we're emitting GoStart for. This case is a bit easier
    than the last one because at the point of emitting GoStart, we have
    ownership of the goroutine's stack.

    We may consider dealing with these in the future, but for now, this
    CL captures a fairly large class of goroutines, so it is worth it on
    its own.

    Fixes #65634.

    Change-Id: Ief3b6df5848b426e7ee6794e98dc7ef5f37ab2d0
    Reviewed-on: https://go-review.googlesource.com/c/go/+/567076
    Auto-Submit: Michael Knyszek <mknyszek@google.com>
    Reviewed-by: David Chase <drchase@google.com>
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
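The GoStatus/GoStatusStack distinction amounts to the same status event with one extra argument referencing the stack table. The sketch below illustrates that shape with a toy encoding; the event numbers, argument layout, and varint framing are invented for this example and are not the actual trace wire format.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Illustrative event types; the real trace format's numbering differs.
const (
	evGoStatus      = 0x14 // args: goroutine ID, status
	evGoStatusStack = 0x15 // args: goroutine ID, status, stack ID
)

// emit appends an event as a type byte followed by varint-encoded
// arguments, a simplified stand-in for the trace batch encoding.
func emit(buf []byte, ev byte, args ...uint64) []byte {
	buf = append(buf, ev)
	for _, a := range args {
		buf = binary.AppendUvarint(buf, a)
	}
	return buf
}

func main() {
	var buf []byte
	// A goroutine blocked for the whole generation: the status scribbled
	// down at the generation boundary can carry a stack table ID.
	buf = emit(buf, evGoStatusStack, 42 /*goid*/, 4 /*waiting*/, 7 /*stack ID*/)
	// Without a usable stack, only the plain status event is emitted.
	buf = emit(buf, evGoStatus, 43 /*goid*/, 1 /*runnable*/)
	fmt.Printf("% x\n", buf)
}
```

The extra stack-ID argument is cheap precisely because the stack itself lives in the deduplicating stack table, which is also why eagerly registering a scribbled-down stack can leave an unreferenced table entry, as the commit notes.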
2023-11-22  runtime: don't hold the table lock in (*traceStackTable).dump  (Michael Anthony Knyszek)

    There's a conceptual cycle between traceStackTable.lock and
    allocation-related locks, but it can't happen in practice because the
    caller guarantees that there are no more writers to the table at the
    point that dump is called. But if that's true, then the lock isn't
    necessary at all. It would be difficult to model this quiescence in
    the lockrank mode, so just don't hold the lock, and expand the
    documentation of the dump method.

    Change-Id: Id4db61363f075b7574135529915e8bd4f4f4c082
    Reviewed-on: https://go-review.googlesource.com/c/go/+/544177
    Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
    Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-11-10  runtime: add execution tracer v2 behind GOEXPERIMENT=exectracer2  (Michael Anthony Knyszek)

    This change mostly implements the design described in #60773 and
    includes a new scalable parser for the new trace format, available in
    internal/trace/v2. I'll leave this commit message short because this
    is clearly an enormous CL with a lot of detail.

    This change does not hook up the new tracer into cmd/trace yet. A
    follow-up CL will handle that.

    For #60773.

    Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-amd64-longtest-race
    Change-Id: I5d2aca2cc07580ed3c76a9813ac48ec96b157de0
    Reviewed-on: https://go-review.googlesource.com/c/go/+/494187
    Reviewed-by: Michael Pratt <mpratt@google.com>
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>