The current cheaprand performs a 128-bit multiplication on 64-bit
numbers and truncates the result to 32 bits, which is inefficient.
A 32-bit-specific implementation is faster because it only needs a
64-bit multiplication on 32-bit numbers.
The current cheaprand64 involves two cheaprand calls.
Implementing it as 64-bit wyrand is significantly faster.
Since cheaprand64 discards one bit, I have preserved this behavior.
The underlying uint64 function is made available as cheaprandu64.
                │     old     │                new                │
                │   sec/op    │   sec/op     vs base              │
Cheaprand-8        1.358n ± 0%   1.218n ± 0%  -10.31% (n=100)
Cheaprand64-8      2.424n ± 0%   1.391n ± 0%  -42.62% (n=100)
Blocksampled-8     8.347n ± 0%   2.022n ± 0%  -75.78% (n=100)
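As illustration, here is a minimal user-level sketch of the two
generators. The 64-bit path is the published wyrand; the 32-bit mixing
constants and the seed handling are placeholders (the runtime keeps
per-m state), not the actual implementation:

package rngsketch

import "math/bits"

// rand64 is 64-bit wyrand: one 64-bit add, one 128-bit multiply, and
// an xor-fold of the two halves.
func rand64(seed *uint64) uint64 {
    *seed += 0xa0761d6478bd642f
    hi, lo := bits.Mul64(*seed, *seed^0xe7037ed1a0b428db)
    return hi ^ lo
}

// rand32 is the 32-bit analogue: a 64-bit multiply of 32-bit values,
// which is cheaper than computing rand64 and truncating. These
// constants are illustrative, not the runtime's.
func rand32(seed *uint32) uint32 {
    *seed += 0x9e3779b9
    x := uint64(*seed) * uint64(*seed^0x85ebca6b)
    return uint32(x>>32) ^ uint32(x)
}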
Fixes #77149
Change-Id: Ib0b5da4a642cd34d0401b03c1d343041f8230d11
GitHub-Last-Rev: 549d8d407e2bbcaecdee0b52cbf3a513dda637fb
GitHub-Pull-Request: golang/go#77150
Reviewed-on: https://go-review.googlesource.com/c/go/+/735480
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
When CL 533258 added frame pointer support for the block and mutex
profiles, it deferred skipping/inline expansion, similar to the
execution tracer. The first frame indicated whether this expansion was
needed by holding either the number of frames to skip (implying inline
expansion) or the logicalStackSentinel placeholder, indicating that no
extra handling was needed. However, CL 598515 switched to doing the
skipping/inline expansion at sample time. After this, the sentinel value
is unused. This CL removes the sentinel from runtime lock profiling,
correcting an oversight from that CL. This didn't cause any problems
since post-processing would just ignore the bogus sentinel value; this
is just a cleanup.
Change-Id: Idbec11f57ac7a57426cd8a064782c4fe6a6a6964
Reviewed-on: https://go-review.googlesource.com/c/go/+/726040
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
CL 29650 added a sync.event linkname for mutexevent, but that does not
appear to have ever been used. This linkname isn't in the "hall of
shame", so we can just remove it.
Change-Id: I778b53ca089b5afda6c6074be9e43e3a6a6a6964
Reviewed-on: https://go-review.googlesource.com/c/go/+/736220
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Cleaned up goroutine leak profile extraction:
- Removed the acquisition of the goroutineProfile semaphore.
- Inlined the call to saveg when recording stacks instead of using
  doRecordGoroutineProfile, which had side effects on goroutineProfile
  fields.
Added regression tests for the goroutine leak profiling frontend, for
both the binary and debug=1 profile formats.
Added stress tests for concurrent goroutine and goroutine leak profile
requests.
Change-Id: I55c1bcef11e9a7fb7699b4c5a2353e594d3e7173
GitHub-Last-Rev: 5e9eb3b1d80c4d2d9b668a01f6b39a7b42f7bb45
GitHub-Pull-Request: golang/go#76045
Reviewed-on: https://go-review.googlesource.com/c/go/+/714580
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
|
|
This change eliminates the _Psyscall state by using synchronization on
the G status _Gsyscall to make syscalls work instead. This removes an
atomic Store and an atomic CAS on the syscall path, which reduces
syscall and cgo overheads. It also simplifies the syscall paths quite a
bit.
The one danger with this change is that we have a new combination of
states that was previously impossible. There are brief windows where
it's possible to observe a goroutine in _Grunning but without a P. This
change is careful to hide this detail from the execution tracer, but it
may have unexpected effects in the rest of the runtime, making this
change somewhat risky.
goos: linux
goarch: amd64
pkg: internal/runtime/cgobench
cpu: AMD EPYC 7B13
                    │ before.out  │              after.out              │
                    │   sec/op    │   sec/op     vs base                │
CgoCall-64            43.69n ± 1%   35.83n ± 1%  -17.99% (p=0.002 n=6)
CgoCallParallel-64    5.306n ± 1%   5.338n ± 1%        ~ (p=0.132 n=6)
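As a rough illustration of the new synchronization (a toy model with
illustrative names, not the runtime's actual code): entering a syscall
becomes a single status store on the G, and stealing the P from a
syscalling G means locking the G's status first:

package gstatus

import "sync/atomic"

const (
    gRunning uint32 = 1
    gSyscall uint32 = 2
    gLockBit uint32 = 0x1000 // stand-in for the _Gscan-style lock bit
)

type p struct{}

type g struct {
    status atomic.Uint32
    p      *p
}

// entersyscall: one store on the G; no second atomic op on the P.
func entersyscall(gp *g) {
    gp.status.Store(gSyscall)
}

// stealP is the sysmon/retake side: lock the status with a CAS before
// detaching the P. If the G already left the syscall, back off.
func stealP(gp *g) (stolen *p) {
    if !gp.status.CompareAndSwap(gSyscall, gSyscall|gLockBit) {
        return nil
    }
    stolen, gp.p = gp.p, nil
    gp.status.Store(gSyscall)
    return stolen
}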
Change-Id: I4551afc1eea0c1b67a0b2dd26b0d49aa47bf1fb8
Reviewed-on: https://go-review.googlesource.com/c/go/+/646198
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
_Gdeadextra is almost the same as _Gdead but for goroutines attached to
extra Ms. The primary difference is that it can be transitioned into a
_Gscan status, unlike _Gdead. (Why not just use _Gdead? For safety,
mostly. There's exactly one case where we're going to want to transition
_Gdead to _Gscan|_Gdead, and it's for extra Ms. It's also a bit weird to
call this state dead when it can still have a syscalling P attached to
it.)
This status is used in a follow-up change that changes entersyscall and
exitsyscall.
Change-Id: I169a4c8617aa3dc329574b829203f56c86b58169
Reviewed-on: https://go-review.googlesource.com/c/go/+/646197
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
I think the original depth-1 argument to allocDeep was correct.
Reverted that, and also the change to maxSkip in mprof.go, which was
also incorrect. I think that, before this fix, we were usually passing
accidentally in the loop over matched stacks when we really should
have been passing in the previous loop.
Change-Id: I6a6a696463e2baf045b66f418d7afbfcb49258e4
Reviewed-on: https://go-review.googlesource.com/c/go/+/712100
Reviewed-by: Michael Matloob <matloob@google.com>
TryBot-Bypass: Michael Matloob <matloob@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
This change creates calls to size-specialized malloc functions instead
of calls to newObject when we know the size of the allocation at
compilation time. Most of it is a matter of calling the newObject
function (which will create calls to the size-specialized functions)
rather than the newObjectNonSpecialized function (which won't). In the
newHeapaddr, small, non-pointer case, we'll create a non-specialized
newObject and transform that into the appropriate size-specialized
function when we produce the mallocgc in flushPendingHeapAllocations.
We have to update some of the rewrites in generic.rules to also apply to
the size-specialized functions when they apply to newObject.
The messiest thing is we have to adjust the offset we use to save the
memory profiler stack, because the depth of the call to profilealloc is
two frames fewer in the size-specialized malloc functions compared to
when newObject calls mallocgc. A bunch of tests have been adjusted to
account for that.
Change-Id: I6a6a6964c9037fb6719e392c4a498ed700b617d7
Reviewed-on: https://go-review.googlesource.com/c/go/+/707856
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Matloob <matloob@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Proposal #74609
Change-Id: I97a754b128aac1bc5b7b9ab607fcd5bb390058c8
GitHub-Last-Rev: 60f2a192badf415112246de8bc6c0084085314f6
GitHub-Pull-Request: golang/go#74622
Reviewed-on: https://go-review.googlesource.com/c/go/+/688335
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: t hepudds <thepudds1460@gmail.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
This is largely a port of CL 38180.
For #15490.
Change-Id: I2726111e472e81e9f9f0f294df97872c2689f061
Reviewed-on: https://go-review.googlesource.com/c/go/+/690397
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Before CL 650697, there was only one system goroutine that could
dynamically change between being a user goroutine and a system
goroutine, and that was the finalizer/cleanup goroutine. In goroutine
profiles, it was handled explicitly. Its status would be checked during
the first STW, and its stack would be recorded. This let the goroutine
profiler completely ignore system goroutines once the world was started
again.
CL 650697 added dedicated cleanup goroutines (there may be more than
one), and with this, the logic for finalizer goroutines no longer
scaled. In that CL, I let the isSystemGoroutine check be dynamic and
dropped the special case, but this was based on incorrect assumptions.
Namely, it's possible for the scheduler to observe, for example, the
finalizer goroutine as a system goroutine and ignore it, but then later
the goroutine profiler itself sees it as a user goroutine. At that
point it's too late: the goroutine is already running. This violates
the invariant of the
goroutine profile that all goroutines are handled by the profiler before
they start executing. In practice, the result is that the goroutine
profiler can crash when it checks this invariant (not checking the
invariant means racily reading goroutine stack memory).
The root cause of the problem is that these system goroutines do not
participate in the goroutine profiler's state machine. Normally, when
profiling, goroutines transition from 'absent' to 'in-progress' to
'satisfied'. However, with system goroutines, the state machine is
ignored entirely. They always stay in the 'absent' state. This means
that if a goroutine transitions from system to user, it is eligible for
a profile record when it shouldn't be. That transition shouldn't be
allowed to occur with respect to the goroutine profiler, because the
goroutine profiler is trying to snapshot the state of every goroutine.
The fix to this problem is simple: don't ignore system goroutines. Let
them participate in the goroutine profile state machine. Instead, decide
whether or not to record the stack after the goroutine has been acquired
for goroutine profiling. This means if the scheduler observes the
finalizer goroutine as a system goroutine, it will get promoted in the
goroutine profiler's state machine, and no other part of the goroutine
profiler will observe the goroutine again. Simultaneously, the
stack record for the goroutine will be correctly skipped.
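A sketch of the resulting state machine (the state names mirror the
runtime's goroutineProfileAbsent and friends; the surrounding locking
and stack-capture machinery is elided):

package profstate

import "sync/atomic"

const (
    absent     uint32 = iota // not yet handled by this profile
    inProgress               // claimed; stack being recorded
    satisfied                // handled; never looked at again
)

type g struct {
    profState atomic.Uint32
    system    bool // is this currently a system goroutine?
}

// tryAcquire claims gp for the profile. Every goroutine, system or
// not, goes through the state machine; whether to emit a record is
// decided only after the claim succeeds, so a later system->user
// transition cannot make the goroutine eligible a second time.
func tryAcquire(gp *g, emit func(*g)) {
    if !gp.profState.CompareAndSwap(absent, inProgress) {
        return // already handled elsewhere
    }
    if !gp.system {
        emit(gp) // user goroutine: record its stack
    }
    gp.profState.Store(satisfied)
}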
Fixes #74090.
Change-Id: Icb9a164a033be22aaa942d19e828e895f700ca74
Reviewed-on: https://go-review.googlesource.com/c/go/+/680477
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This change splits the finalizer and cleanup queues and implements a new
lock-free blocking queue for cleanups. The basic design is as follows:
The cleanup queue is organized in fixed-sized blocks. Individual cleanup
functions are queued, but only whole blocks are dequeued.
Enqueuing cleanups places them in P-local cleanup blocks. These are
flushed to the full list as they get full. Cleanups can only be enqueued
by an active sweeper.
Dequeuing cleanups always dequeues entire blocks from the full list.
Cleanup blocks can be dequeued and executed at any time.
The very last active sweeper in the sweep phase is responsible for
flushing all local cleanup blocks to the full list. It can do this
without any synchronization because the next GC can't start yet, so we
can be very certain that nobody else will be accessing the local blocks.
Cleanup blocks are stored off-heap because they need to be allocated by
the sweeper, which is called from heap allocation paths. As a result,
the GC treats cleanup blocks as roots, just like finalizer blocks.
Flushes to the full list signal to the scheduler that cleanup goroutines
should be awoken. Each time the scheduler goes to wake a cleanup
goroutine and finds more signals than goroutines to wake, it forwards
the excess signal to runtime.AddCleanup, which creates another
goroutine the next time it is called, up to gomaxprocs goroutines.
The signals here are a little convoluted, but exist because the sweeper
and the scheduler cannot safely create new goroutines.
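The shape of the queue, as a toy model (the real queue is lock-free
and its blocks live off-heap; a mutex and ordinary allocation stand in
for that here, and the capacity is made up):

package cleanupq

import "sync"

const blockCap = 512 // hypothetical block capacity

type block struct {
    next *block
    n    int
    fns  [blockCap]func()
}

type queue struct {
    mu   sync.Mutex
    full *block // list of filled blocks, ready to run
}

// enqueue adds one cleanup to a P-local block (local), flushing the
// block to the shared full list once it fills up.
func (q *queue) enqueue(local **block, fn func()) {
    b := *local
    if b == nil || b.n == blockCap {
        if b != nil {
            q.flush(b)
        }
        b = new(block)
        *local = b
    }
    b.fns[b.n] = fn
    b.n++
}

// flush publishes a block; the last active sweeper does this for every
// remaining local block at the end of the sweep phase.
func (q *queue) flush(b *block) {
    q.mu.Lock()
    b.next = q.full
    q.full = b
    q.mu.Unlock()
}

// dequeue hands a whole block to a cleanup goroutine; individual
// cleanups are never dequeued one at a time.
func (q *queue) dequeue() *block {
    q.mu.Lock()
    defer q.mu.Unlock()
    b := q.full
    if b != nil {
        q.full = b.next
    }
    return b
}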
For #71772.
For #71825.
Change-Id: Ie839fde2b67e1b79ac1426be0ea29a8d923a62cc
Reviewed-on: https://go-review.googlesource.com/c/go/+/650697
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
Go 1.22 promised to remove the setting in a future release once the
semantics of runtime-internal lock contention matched that of
sync.Mutex. That work is done, remove the setting.
Previously reviewed as https://go.dev/cl/585639.
For #66999
Change-Id: I9fe62558ba0ac12824874a0bb1b41efeb7c0853f
Reviewed-on: https://go-review.googlesource.com/c/go/+/668995
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
Correct how the mutex contention profile reports on runtime-internal
mutex values, to match sync.Mutex's semantics.
Decide at the start of unlock2 whether we'd like to collect a contention
sample. If so: Opt in to a slightly slower unlock path which avoids
accidentally accepting blame for delay caused by other Ms. Release the
lock before doing an O(N) traversal of the stack of waiting Ms, to
calculate the total delay to those Ms that our critical section caused.
Report that, with the current callstack, in the mutex profile.
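The blame calculation amounts to something like the following sketch
(field names are illustrative; the real version works in cputicks and
handles sampling):

package blame

// waitingM models one M parked on the mutex.
type waitingM struct {
    next      *waitingM
    startWait int64 // when this M began waiting, in nanoseconds
}

// totalDelay sums the waiting time our critical section caused.
// acquired is when we took the lock; now is when we released it.
// This is the O(N) walk done after the lock has been released.
func totalDelay(head *waitingM, acquired, now int64) int64 {
    var sum int64
    for w := head; w != nil; w = w.next {
        start := w.startWait
        if start < acquired {
            start = acquired // blame only the overlap with our hold
        }
        sum += now - start
    }
    return sum
}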
Fixes #66999
Change-Id: I561ed8dc120669bd045d514cb0d1c6c99c2add04
Reviewed-on: https://go-review.googlesource.com/c/go/+/667615
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Leverage the prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ...) API to name
the anonymous memory areas.
This API was introduced in Linux 5.17 to decorate the anonymous
memory areas shown in /proc/<pid>/maps.
This is already used by glibc. See:
* https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c;h=27dfd1eb907f4615b70c70237c42c552bb4f26a8;hb=HEAD#l2434
* https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/setvmaname.c;h=ea93a5ffbebc9e5a7e32a297138f465724b4725f;hb=HEAD#l63
This can be useful when investigating the memory consumption of a
multi-language program. For a 100% Go program, the pprof profiler can
be used to profile the program's memory consumption, but pprof is only
aware of what happens within the Go world. In a multi-language
program, there can be doubt about whether suspicious extra memory
consumption comes from the Go part or the native part.
With this change, the following Go program:
package main

import (
    "fmt"
    "log"
    "os"
)

/*
#include <stdlib.h>

void f(void)
{
    (void)malloc(1024*1024*1024);
}
*/
import "C"

func main() {
    C.f()
    data, err := os.ReadFile("/proc/self/maps")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(string(data))
}
produces this output:
$ GLIBC_TUNABLES=glibc.mem.decorate_maps=1 ~/doc/devel/open-source/go/bin/go run .
00400000-00402000 r--p 00000000 00:21 28451768 /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name
00402000-004a4000 r-xp 00002000 00:21 28451768 /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name
004a4000-00574000 r--p 000a4000 00:21 28451768 /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name
00574000-00575000 r--p 00173000 00:21 28451768 /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name
00575000-00580000 rw-p 00174000 00:21 28451768 /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name
00580000-005a4000 rw-p 00000000 00:00 0
2e075000-2e096000 rw-p 00000000 00:00 0 [heap]
c000000000-c000400000 rw-p 00000000 00:00 0 [anon: Go: heap]
c000400000-c004000000 ---p 00000000 00:00 0 [anon: Go: heap reservation]
777f40000000-777f40021000 rw-p 00000000 00:00 0 [anon: glibc: malloc arena]
777f40021000-777f44000000 ---p 00000000 00:00 0
777f44000000-777f44021000 rw-p 00000000 00:00 0 [anon: glibc: malloc arena]
777f44021000-777f48000000 ---p 00000000 00:00 0
777f48000000-777f48021000 rw-p 00000000 00:00 0 [anon: glibc: malloc arena]
777f48021000-777f4c000000 ---p 00000000 00:00 0
777f4c000000-777f4c021000 rw-p 00000000 00:00 0 [anon: glibc: malloc arena]
777f4c021000-777f50000000 ---p 00000000 00:00 0
777f50000000-777f50021000 rw-p 00000000 00:00 0 [anon: glibc: malloc arena]
777f50021000-777f54000000 ---p 00000000 00:00 0
777f55afb000-777f55afc000 ---p 00000000 00:00 0
777f55afc000-777f562fc000 rw-p 00000000 00:00 0 [anon: glibc: pthread stack: 216378]
777f562fc000-777f562fd000 ---p 00000000 00:00 0
777f562fd000-777f56afd000 rw-p 00000000 00:00 0 [anon: glibc: pthread stack: 216377]
777f56afd000-777f56afe000 ---p 00000000 00:00 0
777f56afe000-777f572fe000 rw-p 00000000 00:00 0 [anon: glibc: pthread stack: 216376]
777f572fe000-777f572ff000 ---p 00000000 00:00 0
777f572ff000-777f57aff000 rw-p 00000000 00:00 0 [anon: glibc: pthread stack: 216375]
777f57aff000-777f57b00000 ---p 00000000 00:00 0
777f57b00000-777f58300000 rw-p 00000000 00:00 0 [anon: glibc: pthread stack: 216374]
777f58300000-777f58400000 rw-p 00000000 00:00 0 [anon: Go: page alloc index]
777f58400000-777f5a400000 rw-p 00000000 00:00 0 [anon: Go: heap index]
777f5a400000-777f6a580000 ---p 00000000 00:00 0 [anon: Go: scavenge index]
777f6a580000-777f6a581000 rw-p 00000000 00:00 0 [anon: Go: scavenge index]
777f6a581000-777f7a400000 ---p 00000000 00:00 0 [anon: Go: scavenge index]
777f7a400000-777f8a580000 ---p 00000000 00:00 0 [anon: Go: page summary]
777f8a580000-777f8a581000 rw-p 00000000 00:00 0 [anon: Go: page alloc]
777f8a581000-777f9c430000 ---p 00000000 00:00 0 [anon: Go: page summary]
777f9c430000-777f9c431000 rw-p 00000000 00:00 0 [anon: Go: page alloc]
777f9c431000-777f9e806000 ---p 00000000 00:00 0 [anon: Go: page summary]
777f9e806000-777f9e807000 rw-p 00000000 00:00 0 [anon: Go: page alloc]
777f9e807000-777f9ec00000 ---p 00000000 00:00 0 [anon: Go: page summary]
777f9ec36000-777f9ecb6000 rw-p 00000000 00:00 0 [anon: Go: immortal metadata]
777f9ecb6000-777f9ecc6000 rw-p 00000000 00:00 0 [anon: Go: gc bits]
777f9ecc6000-777f9ecd6000 rw-p 00000000 00:00 0 [anon: Go: allspans array]
777f9ecd6000-777f9ece7000 rw-p 00000000 00:00 0 [anon: Go: immortal metadata]
777f9ece7000-777f9ed67000 ---p 00000000 00:00 0 [anon: Go: page summary]
777f9ed67000-777f9ed68000 rw-p 00000000 00:00 0 [anon: Go: page alloc]
777f9ed68000-777f9ede7000 ---p 00000000 00:00 0 [anon: Go: page summary]
777f9ede7000-777f9ee07000 rw-p 00000000 00:00 0 [anon: Go: page alloc]
777f9ee07000-777f9ee0a000 rw-p 00000000 00:00 0 [anon: glibc: loader malloc]
777f9ee0a000-777f9ee2e000 r--p 00000000 00:21 48158213 /usr/lib/libc.so.6
777f9ee2e000-777f9ef9f000 r-xp 00024000 00:21 48158213 /usr/lib/libc.so.6
777f9ef9f000-777f9efee000 r--p 00195000 00:21 48158213 /usr/lib/libc.so.6
777f9efee000-777f9eff2000 r--p 001e3000 00:21 48158213 /usr/lib/libc.so.6
777f9eff2000-777f9eff4000 rw-p 001e7000 00:21 48158213 /usr/lib/libc.so.6
777f9eff4000-777f9effc000 rw-p 00000000 00:00 0
777f9effc000-777f9effe000 rw-p 00000000 00:00 0 [anon: glibc: loader malloc]
777f9f00a000-777f9f04a000 rw-p 00000000 00:00 0 [anon: Go: immortal metadata]
777f9f04a000-777f9f04c000 r--p 00000000 00:00 0 [vvar]
777f9f04c000-777f9f04e000 r--p 00000000 00:00 0 [vvar_vclock]
777f9f04e000-777f9f050000 r-xp 00000000 00:00 0 [vdso]
777f9f050000-777f9f051000 r--p 00000000 00:21 48158204 /usr/lib/ld-linux-x86-64.so.2
777f9f051000-777f9f07a000 r-xp 00001000 00:21 48158204 /usr/lib/ld-linux-x86-64.so.2
777f9f07a000-777f9f085000 r--p 0002a000 00:21 48158204 /usr/lib/ld-linux-x86-64.so.2
777f9f085000-777f9f087000 r--p 00034000 00:21 48158204 /usr/lib/ld-linux-x86-64.so.2
777f9f087000-777f9f088000 rw-p 00036000 00:21 48158204 /usr/lib/ld-linux-x86-64.so.2
777f9f088000-777f9f089000 rw-p 00000000 00:00 0
7ffc7bfa7000-7ffc7bfc8000 rw-p 00000000 00:00 0 [stack]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]
The anonymous memory areas are now labelled so that we can see which
ones have been allocated by the Go runtime and which ones have been
allocated by glibc.
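For reference, naming an anonymous mapping from user code looks roughly
like this sketch via golang.org/x/sys/unix (the prctl constants come
from <linux/prctl.h>; the call fails on kernels before 5.17 or without
CONFIG_ANON_VMA_NAME):

package main

import (
    "fmt"
    "unsafe"

    "golang.org/x/sys/unix"
)

const (
    prSetVMA         = 0x53564d41 // PR_SET_VMA
    prSetVMAAnonName = 0          // PR_SET_VMA_ANON_NAME
)

// nameAnonRegion labels the mapping [addr, addr+n) in /proc/self/maps.
func nameAnonRegion(addr unsafe.Pointer, n uintptr, name string) error {
    cname, err := unix.BytePtrFromString(name)
    if err != nil {
        return err
    }
    return unix.Prctl(prSetVMA, prSetVMAAnonName,
        uintptr(addr), n, uintptr(unsafe.Pointer(cname)))
}

func main() {
    const size = 1 << 20
    mem, err := unix.Mmap(-1, 0, size,
        unix.PROT_READ|unix.PROT_WRITE,
        unix.MAP_PRIVATE|unix.MAP_ANONYMOUS)
    if err != nil {
        panic(err)
    }
    if err := nameAnonRegion(unsafe.Pointer(&mem[0]), size, "demo: arena"); err != nil {
        fmt.Println("naming failed:", err)
    }
}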
Fixes #71546
Change-Id: I304e8b4dd7f2477a6da794fd44e9a7a5354e4bf4
Reviewed-on: https://go-review.googlesource.com/c/go/+/646095
Auto-Submit: Alan Donovan <adonovan@google.com>
Commit-Queue: Alan Donovan <adonovan@google.com>
Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
This CL refactors sync.Mutex such that its implementation lives in the
new internal/sync package. The purpose of this change is to eventually
reverse the dependency edge between internal/concurrent and sync, such
that sync can depend on internal/concurrent (or really, its contents,
which will likely end up in internal/sync).
The only change made to the sync.Mutex code is the frame skip count for
mutex profiling, so that the internal/sync frames are omitted in the
profile.
Change-Id: Ib3603d30e8e71508c4ea883a584ae2e51ce40c3f
Reviewed-on: https://go-review.googlesource.com/c/go/+/594056
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
Right now mallocgc is a monster of a function. In real programs, we see
that a substantial amount of time in mallocgc is spent in mallocgc
itself. It's very branch-y, holds a lot of state, and handles quite a few
disparate cases, trying to merge them together.
This change breaks apart mallocgc into separate, leaner functions.
There's some duplication now, but there are a lot of branches that can
be pruned as a result.
There's definitely still more we can do here. heapSetType can be inlined
and broken down for each case, since its internals roughly map to each
case anyway (done in a follow-up CL). We can probably also do more with
the size class lookups, since we know more about the size of the object
in each case than before.
Below are the savings for the full stack up until now.
                      │ after-3.out │              after-4.out              │
                      │   sec/op    │    sec/op     vs base                 │
Malloc8-4               13.32n ± 2%   12.17n ±  1%   -8.63% (p=0.002 n=6)
Malloc16-4              21.64n ± 3%   19.38n ± 10%  -10.47% (p=0.002 n=6)
MallocTypeInfo8-4       23.15n ± 2%   19.91n ±  2%  -14.00% (p=0.002 n=6)
MallocTypeInfo16-4      25.86n ± 4%   22.48n ±  5%  -13.11% (p=0.002 n=6)
MallocLargeStruct-4     270.0n ± ∞ ¹
geomean                 20.38n        30.97n        -11.58%
Change-Id: I681029c0b442f9221c4429950626f06299a5cfe4
Reviewed-on: https://go-review.googlesource.com/c/go/+/614257
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Fix a regression introduced in CL 572396 that caused goroutine stacks
to no longer be null-terminated.
This bug impacts callers that reuse the []StackRecord slice for multiple
calls to GoroutineProfile. See https://github.com/felixge/fgprof/issues/33
for an example of the problem.
Add a test case to prevent similar regressions in the future. Use null
padding instead of null termination to be consistent with other profile
types and because it's less code to implement. Also fix the
ThreadCreateProfile code path.
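The padding itself is trivial; a sketch of the idea
(runtime.StackRecord's Stack0 really is a [32]uintptr):

package stackpad

// saveStack copies pcs into a fixed-size record and zero-pads the
// unused tail, so stale PCs from a previous, deeper stack can never
// leak through when the caller reuses the record.
func saveStack(dst *[32]uintptr, pcs []uintptr) {
    n := copy(dst[:], pcs)
    for i := n; i < len(dst); i++ {
        dst[i] = 0
    }
}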
Fixes #69243
Change-Id: I0b9414f6c694c304bc03a5682586f619e9bf0588
Reviewed-on: https://go-review.googlesource.com/c/go/+/609815
Reviewed-by: Tim King <taking@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Fix a regression introduced in CL 598515 causing runtime.MutexProfile
stack traces to omit their root frames.
In most cases this was merely causing the `runtime.goexit` frame to go
missing. But in the case of runtime._LostContendedRuntimeLock, an empty
stack trace was being produced.
Add a test that catches this regression by checking for a stack trace
with the `runtime.goexit` frame.
Also fix a separate problem in expandFrame that could cause
out-of-bounds panics when profstackdepth is set to a value below 32.
There is no test for this fix because profstackdepth can't be changed at
runtime right now.
Fixes #69335
Change-Id: I1600fe62548ea84981df0916d25072c3ddf1ea1a
Reviewed-on: https://go-review.googlesource.com/c/go/+/611615
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Nick Ripley <nick.ripley@datadoghq.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Moving these intrinsics to a base package enables other internal/runtime
packages to use them.
For #54766.
Change-Id: I45a530422207dd94b5ad4eee51216c9410a84040
Reviewed-on: https://go-review.googlesource.com/c/go/+/613261
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Moving these intrinsics to a base package enables other internal/runtime
packages to use them.
For #54766.
Change-Id: I0b3eded3bb45af53e3eb5bab93e3792e6a8beb46
Reviewed-on: https://go-review.googlesource.com/c/go/+/613260
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Fix typos in ~30 files
Change-Id: Ie433aea01e7d15944c1e9e103691784876d5c1f9
GitHub-Last-Rev: bbaeb3d1f88a5fa6bbb69607b1bd075f496a7894
GitHub-Pull-Request: golang/go#68964
Reviewed-on: https://go-review.googlesource.com/c/go/+/606955
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
Mutex contention events with delay of 0 need more than CL 604355 added:
When deciding which event to store in the M's single available slot,
always choose to drop the zero-delay event. Store an explicit flag for
whether we have an event to store, rather than relying on a non-zero
delay.
And, fix a test of sync.Mutex contention that expects those events to
have non-zero delay. The reporting of non-runtime contention like this
has long allowed zero-delay events, which we see when cputicks has low
resolution.
Fixes #68892
Fixes #68906
Change-Id: Id412141e4eb09724f3ce195899a20d59c92d7b78
Reviewed-on: https://go-review.googlesource.com/c/go/+/606115
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
The block and mutex profiles have slightly different behaviors when a
sampled event has a negative (or zero) duration. The block profile
enforces a minimum duration for each event of "1" in the cputicks unit.
It does so by clamping the duration to 1 if it was originally reported
as being smaller. The mutex profile for app-level contention enforces a
minimum duration of 0 in a similar way: by reporting any negative values
as 0 instead.
The mutex profile for runtime-internal contention had a different
behavior: to enforce a minimum event duration of "1" by dropping any
non-conforming samples.
Stop dropping samples, and use the same minimum (0) that's in place for
the other mutex profile events.
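The three behaviors, stated as code (a sketch of the clamping, not the
runtime's literal functions):

package clamp

// Block profile: each event has a minimum duration of 1 cputick.
func blockSampleCycles(cycles int64) int64 {
    if cycles < 1 {
        cycles = 1
    }
    return cycles
}

// Mutex profile, app-level and (after this change) runtime-internal:
// negative durations are reported as 0 instead of being dropped.
func mutexSampleCycles(cycles int64) int64 {
    if cycles < 0 {
        cycles = 0
    }
    return cycles
}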
Fixes #64253
Fixes #68453
Fixes #68781
Change-Id: I4c5d23a2675501226eef5b9bc1ada2efc1a55b9e
Reviewed-on: https://go-review.googlesource.com/c/go/+/604355
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
|
|
Cleanup and friction reduction
For #65355.
Change-Id: Ia14c9dc584a529a35b97801dd3e95b9acc99a511
Reviewed-on: https://go-review.googlesource.com/c/go/+/600436
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
When using frame pointer unwinding, we defer frame skipping and inline
expansion for call stacks until profile reporting time. We can end up
with records which have different stacks if no frames are skipped, but
identical stacks once skipping is taken into account. Returning multiple
records with the same stack (but different values) has broken programs
which rely on the records already being fully aggregated by call stack
when returned from runtime.MutexProfile. This CL addresses the problem
by handling skipping at recording time. We do full inline expansion to
correctly skip the desired number of frames when recording the call
stack, and then handle the rest of inline expansion when reporting the
profile.
The regression test in this CL is adapted from the reproducer in
https://github.com/grafana/pyroscope-go/issues/103, authored by Tolya
Korniltsev.
Fixes #67548
This reapplies CL 595966.
The original version of this CL introduced a bounds error in
MutexProfile and failed to correctly expand inlined frames from that
call. This CL applies the original CL, fixing the bounds error and
adding a test for the output of MutexProfile to ensure the frames are
expanded properly.
Change-Id: I5ef8aafb9f88152a704034065c0742ba767c4dbb
Reviewed-on: https://go-review.googlesource.com/c/go/+/598515
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This reverts CL 595966.
Reason for revert: This CL contains a bug. See the comment in https://go-review.googlesource.com/c/go/+/595966/8#message-57f4c1f9570b5fe912e06f4ae3b52817962533c0
Change-Id: I48030907ded173ae20a8965bf1b41a713dd06059
Reviewed-on: https://go-review.googlesource.com/c/go/+/598219
Reviewed-by: Than McIntosh <thanm@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
When using frame pointer unwinding, we defer frame skipping and inline
expansion for call stacks until profile reporting time. We can end up
with records which have different stacks if no frames are skipped, but
identical stacks once skipping is taken into account. Returning multiple
records with the same stack (but different values) has broken programs
which rely on the records already being fully aggregated by call stack
when returned from runtime.MutexProfile. This CL addresses the problem
by handling skipping at recording time. We do full inline expansion to
correctly skip the desired number of frames when recording the call
stack, and then handle the rest of inline expansion when reporting the
profile.
The regression test in this CL is adapted from the reproducer in
https://github.com/grafana/pyroscope-go/issues/103, authored by Tolya
Korniltsev.
Fixes #67548
Co-Authored-By: Tolya Korniltsev <korniltsev.anatoly@gmail.com>
Change-Id: I6a42ce612377f235b2c8c0cec9ba8e9331224b84
Reviewed-on: https://go-review.googlesource.com/c/go/+/595966
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Carlos Amedee <carlos@golang.org>
Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
|
|
We have an optimization that if the memory profile is not consumed
anywhere, we set the memory profiling rate to 0 to disable the
"background" low-rate profiling. We detect whether the memory
profile is used by checking whether the runtime.MemProfile function
is reachable at link time. Previously, all APIs that access the
memory profile go through runtime.MemProfile. But the code was
refactored in CL 572396, and now the legacy entry point
WriteHeapProfile uses pprof_memProfileInternal without going
through runtime.MemProfile. In fact, even with the recommended
runtime/pprof.Profile API (pprof.Lookup or pprof.Profiles),
runtime.MemProfile is only incidentally reachable, through countHeap.
Change the linker to check runtime.memProfileInternal instead,
which is on all code paths that retrieve the memory profile. Add
a test case for WriteHeapProfile, so we cover all entry points.
Fixes #68136.
Change-Id: I075c8d45c95c81825a1822f032e23107aea4303c
Reviewed-on: https://go-review.googlesource.com/c/go/+/596538
Reviewed-by: Than McIntosh <thanm@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This reverts commit be0b569caa0eab1a7f30edf64e550bbf5f6ff235 (CL 585635).
Reason for revert: This is part of a patch series that changed the
handling of contended lock2/unlock2 calls, reducing the maximum
throughput of contended runtime.mutex values, and causing a performance
regression on applications where that is (or became) the bottleneck.
Updates #66999
Updates #67585
Change-Id: I7843ccaecbd273b7ceacfa0f420dd993b4b15a0a
Reviewed-on: https://go-review.googlesource.com/c/go/+/589117
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
This reverts commit d881ed6384ae58154d99682f1e20160c64e7c3c2 (CL 585637).
Reason for revert: This is part of a patch series that changed the
handling of contended lock2/unlock2 calls, reducing the maximum
throughput of contended runtime.mutex values, and causing a performance
regression on applications where that is (or became) the bottleneck.
Updates #66999
Updates #67585
Change-Id: I70d8d0b74f73be95c43d664f584e8d98519aba26
Reviewed-on: https://go-review.googlesource.com/c/go/+/589116
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
This reverts commit ba1c5b2c4573e10f3b5f0e0f25a27f17fba67eb0 (CL 585638).
Reason for revert: This is part of a patch series that changed the
handling of contended lock2/unlock2 calls, reducing the maximum
throughput of contended runtime.mutex values, and causing a performance
regression on applications where that is (or became) the bottleneck.
Updates #66999
Updates #67585
Change-Id: Ibeec5d8deb17e87966cf352fefc7efe2267839d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/589115
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This reverts commit 87e930f7289136fad1310d4b63dd4127e409bac5 (CL 585639).
Reason for revert: This is part of a patch series that changed the
handling of contended lock2/unlock2 calls, reducing the maximum
throughput of contended runtime.mutex values, and causing a performance
regression on applications where that is (or became) the bottleneck.
Updates #66999
Updates #67585
Change-Id: I1e286d2a16d16e4af202cd5dc04b2d9c4ee71b32
Reviewed-on: https://go-review.googlesource.com/c/go/+/589097
Reviewed-by: Than McIntosh <thanm@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
|
|
This reverts commit 8ab131fb1256a4a795c610e145c022e22e2d1567 (CL 586796).
Reason for revert: This is part of a patch series that changed the
handling of contended lock2/unlock2 calls, reducing the maximum
throughput of contended runtime.mutex values, and causing a performance
regression on applications where that is (or became) the bottleneck.
Updates #66999
Updates #67585
Change-Id: I54711691e86e072081482102019d168292b5150a
Reviewed-on: https://go-review.googlesource.com/c/go/+/589095
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
|
|
Mutex contention measurements work with two clocks: nanotime for use in
runtime/metrics, and cputicks for the runtime/pprof profile. They're
subject to different sampling rates: the runtime/metrics view is always
enabled, but the profile is adjustable and is turned off by default.
They have different levels of overhead: it can take as little as one
instruction to read cputicks while nanotime calls are more elaborate
(although some platforms implement cputicks as a nanotime call). The use
of the timestamps is also different: the profile's view needs to attach
the delay in some Ms' lock2 calls to another M's unlock2 call stack, but
the metric's view is only an int64.
Treat them differently. Don't bother threading the nanotime clock
through to the unlock2 call; instead, measure and report it directly
within lock2. Sample nanotime at a constant gTrackingPeriod.
Don't consult any clocks unless the mutex is actually contended.
Continue liberal use of cputicks for now.
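The always-on metric side then looks roughly like this self-contained
sketch: sample the expensive clock on a fixed fraction of contended
acquisitions and scale the result (the period value is a stand-in for
gTrackingPeriod):

package lockmetrics

import (
    "math/rand/v2"
    "sync/atomic"
    "time"
)

const trackingPeriod = 8 // stand-in; sample ~1/8 of contended waits

var totalWaitNanos atomic.Int64 // scaled estimate for the metric

// measureContendedWait wraps the blocking part of a contended lock
// acquisition. Unsampled acquisitions touch no clocks at all.
func measureContendedWait(wait func()) {
    if rand.Uint32()%trackingPeriod != 0 {
        wait()
        return
    }
    start := time.Now()
    wait()
    totalWaitNanos.Add(time.Since(start).Nanoseconds() * trackingPeriod)
}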
For #66999
Change-Id: I1c2085ea0e695bfa90c30fadedc99ced9eb1f69e
Reviewed-on: https://go-review.googlesource.com/c/go/+/586796
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Run-TryBot: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
Go 1.22 promised to remove the setting in a future release once the
semantics of runtime-internal lock contention matched that of
sync.Mutex. That work is done, remove the setting.
For #66999
Change-Id: I3c4894148385adf2756d8754e44d7317305ad758
Reviewed-on: https://go-review.googlesource.com/c/go/+/585639
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
When an M's use of a lock causes delays in other Ms, capture the stack
of the unlock call that caused the delay. This makes the meaning of the
mutex profile for runtime-internal mutexes match the behavior for
sync.Mutex: the profile points to the end of the critical section that
is responsible for delaying other work.
Fixes #66999
Change-Id: I4abc4a1df00a48765d29c07776481a1cbd539ff8
Reviewed-on: https://go-review.googlesource.com/c/go/+/585638
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
When an M unlocks a contended mutex, it needs to consult a list of the
Ms that had to wait during its critical section. This allows the M to
attribute the appropriate amount of blame to the unlocking call stack.
Mirroring the implementation for users' sync.Mutex contention (via
sudog), we can (in a future commit) use the time that the head and tail
of the wait list started waiting, and the number of waiters, to estimate
the sum of the Ms' delays.
When an M acquires the mutex, it needs to remove itself from the list of
waiters. Since the futex-based lock implementation leaves the OS in
control of the order of M wakeups, we need to be prepared to remove
any M from the list quickly (in constant time).
First, have each M add itself to a singly-linked wait list when it finds
that its lock call will need to sleep. This case is safe against
live-lock, since any delay to one M adding itself to the list would be
due to another M making durable progress.
Second, have the M that holds the lock (either right before releasing,
or right after acquiring) update metadata on the list of waiting Ms to
double-link the list and maintain a tail pointer and waiter count. That
work is amortized-constant: we'll avoid contended locks becoming
proportionally more contended and undergoing performance collapse.
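A sketch of the two halves under these assumptions (a "linked" flag
marks waiters the holder has already accounted for; the real runtime
encodes this differently):

package waitlist

import "sync/atomic"

type waiter struct {
    next   atomic.Pointer[waiter]
    prev   *waiter // maintained only by the lock holder
    linked bool    // set once the holder has accounted for this waiter
}

type list struct {
    head  atomic.Pointer[waiter]
    tail  *waiter
    count int
}

// push is the waiter side: a lock-free CAS onto the head. Any delay
// here implies another M just made progress on the same head.
func (l *list) push(w *waiter) {
    for {
        h := l.head.Load()
        w.next.Store(h)
        if l.head.CompareAndSwap(h, w) {
            return
        }
    }
}

// fixup is the holder side, run while holding the mutex: walk the
// newly pushed prefix, double-link it, and update tail and count.
// Amortized constant: each waiter is processed exactly once.
func (l *list) fixup() {
    var prev *waiter
    w := l.head.Load()
    for w != nil && !w.linked {
        w.linked = true
        w.prev = prev
        l.count++
        prev = w
        w = w.next.Load()
    }
    if w != nil {
        w.prev = prev // reconnect the previously processed suffix
    } else if prev != nil {
        l.tail = prev // the new prefix reached the end of the list
    }
}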
For #66999
Change-Id: If75cdea915afb59ccec47294e0b52c466aac8736
Reviewed-on: https://go-review.googlesource.com/c/go/+/585637
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
|
|
Move the nextwaitm field into a small struct, in preparation for
additional metadata to track how long Ms need to wait for locks.
For #66999
Change-Id: Ib40e43c15cde22f7e35922641107973d99439ecd
Reviewed-on: https://go-review.googlesource.com/c/go/+/585635
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Allow users to decrease the profiling stack depth back to 32 in case
they experience any problems with the new default of 128.
Users may also use this option to increase the depth up to 1024.
Change-Id: Ieaab2513024915a223239278dd97a6e161dde1cf
Reviewed-on: https://go-review.googlesource.com/c/go/+/581917
Reviewed-by: Austin Clements <austin@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
The current stack depth limit of 32 for the alloc, mutex, block,
threadcreate and goroutine profiles frequently leads to truncated
stack traces in production applications. Increase the limit to 128,
which is the same size used by the execution tracer.
Create internal/profilerecord to define variants of the runtime's
StackRecord, MemProfileRecord and BlockProfileRecord types that can hold
arbitrarily big stack traces. Implement internal profiling APIs based on
these new types and use them for creating protobuf profiles and to act
as shims for the public profiling APIs using the old types.
This will lead to an increase in memory usage for applications that
use the impacted profile types and have stack traces exceeding the
current limit of 32. Those applications will also experience a slight
increase in CPU usage, but this will hopefully soon be mitigated via CL
540476 and 533258 which introduce frame pointer unwinding for the
relevant profile types.
For #43669.
Change-Id: Ie53762e65d0f6295f5d4c7d3c87172d5a052164e
Reviewed-on: https://go-review.googlesource.com/c/go/+/572396
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Previously it was possible for mutex and block profile stack traces to
contain up to 32 frames in Stack0 or the resulting pprof profiles.
CL 533258 changed this behavior by using some of the space to
record skipped frames that are discarded when performing delayed inline
expansion. This has lowered the effective maximum stack size from 32 to
27 (the max skip value is 5), which can be seen as a small regression.
Add TestProfilerStackDepth to demonstrate the issue and protect all
profile types from similar regressions in the future. Fix the issue by
increasing the internal maxStack limit to take the maxSkip value into
account. Assert that the maxSkip value is never exceeded when recording
mutex and block profile stack traces.
Three alternative solutions to the problem were considered and
discarded:
1) Revert CL 533258 and give up on frame pointer unwinding. This seems
unappealing as we would lose the performance benefits of frame
pointer unwinding.
2) Discard skipped frames when recording the initial stack trace. This
would require eager inline expansion for up to maxSkip frames and
partially negate the performance benefits of frame pointer
unwinding.
3) Accept and document the new behavior. This would simplify the
implementation, but seems more confusing from a user perspective. It
also complicates the creation of test cases that make assertions
about the maximum profiling stack depth.
The execution tracer still has the same issue due to CL 463835. This
should be addressed in a follow-up CL.
Co-authored-by: Nick Ripley <nick.ripley@datadoghq.com>
Change-Id: Ibf4dbf08a5166c9cb32470068c69f58bc5f98d2c
Reviewed-on: https://go-review.googlesource.com/c/go/+/586657
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Use frame pointer unwinding, where supported, to collect call stacks
for the block and mutex profilers. This method of collecting call
stacks is
typically an order of magnitude faster than callers/tracebackPCs. The
marginal benefit for these profile types is likely small compared to
using frame pointer unwinding for the execution tracer. However, the
block profiler can have noticeable overhead unless the sampling rate is
very high. Additionally, using frame pointer unwinding in more places
helps ensure more testing/support, which benefits systems like the
execution tracer which rely on frame pointer unwinding to be practical
to use.
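The core of frame pointer unwinding is a short loop; a sketch for
amd64/arm64-style frame layouts, where each frame stores the caller's
frame pointer at *fp with the return address one word above (the real
runtime adds bounds and sanity checks that this illustration omits):

package fpunwind

import "unsafe"

const ptrSize = unsafe.Sizeof(uintptr(0))

// fpTracebackPCs walks the frame pointer chain starting at fp and
// fills pcs with return addresses, returning the number collected.
// Dereferencing raw frame pointers like this is only safe inside a
// runtime that controls its own frame layout.
func fpTracebackPCs(fp unsafe.Pointer, pcs []uintptr) int {
    n := 0
    for fp != nil && n < len(pcs) {
        // The return address sits one word above the saved fp.
        pcs[n] = *(*uintptr)(unsafe.Pointer(uintptr(fp) + ptrSize))
        n++
        // Follow the saved fp to the caller's frame.
        fp = unsafe.Pointer(*(*uintptr)(fp))
    }
    return n
}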
Change-Id: I4b36c90cd2df844645fd275a41b247352d635727
Reviewed-on: https://go-review.googlesource.com/c/go/+/533258
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Move profiling pc buffers from being stack allocated to an m field.
This is motivated by the next patch, which will increase the default
stack depth to 128, which might lead to undesirable stack growth for
goroutines that produce profiling events.
Additionally, this change paves the way to make the stack depth
configurable via GODEBUG.
Change-Id: Ifa407f899188e2c7c0a81de92194fdb627cb4b36
Reviewed-on: https://go-review.googlesource.com/c/go/+/574699
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
allocfreetrace prints all allocations and frees to stderr. It's not
terribly useful because its overhead is so high that it's infeasible
for all but the most trivial programs. A follow-up CL will replace it
with something that is both more thorough and lower overhead.
Change-Id: I1d668fee8b6aaef5251a5aea3054ec2444d75eb6
Reviewed-on: https://go-review.googlesource.com/c/go/+/583376
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
For #65355
Change-Id: I65dd090fb99de9b231af2112c5ccb0eb635db2be
Reviewed-on: https://go-review.googlesource.com/c/go/+/560155
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ibrahim Bazoka <ibrahimbazoka729@gmail.com>
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
|
|
Change-Id: Icb6d9ca996b4119d8636d9f7f6a56e510d74d059
GitHub-Last-Rev: 08178e8ff798f4a51860573788c9347a0fb6bc40
GitHub-Pull-Request: golang/go#66188
Reviewed-on: https://go-review.googlesource.com/c/go/+/569979
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
CL 464349 added a new linkname to provide gcount to runtime/pprof to
avoid a STW when estimating the goroutine profile allocation size.
However, adding a linkname here isn't necessary for a few reasons:
1. We already export gcount via NumGoroutine. I completely forgot about
this during review.
2. aktau suggested that goroutineProfileWithLabelsConcurrent return
gcount as a fast path estimate when the input is empty.
The second point keeps the code cleaner overall, so I've done that.
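The calling pattern that benefits is the standard size-then-snapshot
loop (ordinary user code, using the public API):

package main

import (
    "fmt"
    "runtime"
)

// snapshotGoroutines sizes the record slice from the cheap goroutine
// count (the same gcount the runtime now returns for an empty input),
// then retries if goroutines were created in the meantime.
func snapshotGoroutines() []runtime.StackRecord {
    for {
        n := runtime.NumGoroutine()
        records := make([]runtime.StackRecord, n+10) // headroom
        if n, ok := runtime.GoroutineProfile(records); ok {
            return records[:n]
        }
    }
}

func main() {
    fmt.Println("goroutines:", len(snapshotGoroutines()))
}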
For #54014.
Change-Id: I6cb0811a769c805e269b55774cdd43509854078e
Reviewed-on: https://go-review.googlesource.com/c/go/+/559515
Auto-Submit: Michael Pratt <mpratt@google.com>
Auto-Submit: Nicolas Hillegeer <aktau@google.com>
Reviewed-by: Nicolas Hillegeer <aktau@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Fixes #54014
Change-Id: If4ee2752008729e1ed4b767cfda52effdcec4959
GitHub-Last-Rev: 5ce300bf5128f842604d85d5f8749027c8e091c2
GitHub-Pull-Request: golang/go#58239
Reviewed-on: https://go-review.googlesource.com/c/go/+/464349
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: qiulaidongfeng <2645477756@qq.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Change-Id: I8fe66326994894b17ce0eda991bba942844d26b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/541475
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|