aboutsummaryrefslogtreecommitdiff
path: root/src/runtime/mprof.go
AgeCommit message (Collapse)Author
2026-01-23runtime: speed up cheaprand and cheaprand64Gavin Lam
The current cheaprand performs 128-bit multiplication on 64-bit numbers and truncate the result to 32 bits, which is inefficient. A 32-bit specific implementation is more performant because it performs 64-bit multiplication on 32-bit numbers instead. The current cheaprand64 involves two cheaprand calls. Implementing it as 64-bit wyrand is significantly faster. Since cheaprand64 discards one bit, I have preserved this behavior. The underlying uint64 function is made available as cheaprandu64. │ old │ new │ │ sec/op │ sec/op vs base │ Cheaprand-8 1.358n ± 0% 1.218n ± 0% -10.31% (n=100) Cheaprand64-8 2.424n ± 0% 1.391n ± 0% -42.62% (n=100) Blocksampled-8 8.347n ± 0% 2.022n ± 0% -75.78% (n=100) Fixes #77149 Change-Id: Ib0b5da4a642cd34d0401b03c1d343041f8230d11 GitHub-Last-Rev: 549d8d407e2bbcaecdee0b52cbf3a513dda637fb GitHub-Pull-Request: golang/go#77150 Reviewed-on: https://go-review.googlesource.com/c/go/+/735480 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-01-23runtime: remove logical stack sentinel for runtime lock stacksNick Ripley
When CL 533258 added frame pointer support for the block and mutex profiles, it deferred skipping/inline expansion, similar to the execution tracer. The first frame indicated whether this expansion was needed by holding either the number of frames to skip (implying inline expansion) or the logicalStackSentinel placeholder to indicate that no extra handling is needed. However, CL 598515 switched to doing the skipping/inline expansion at sample time. After this, the sentinel value is unused. This CL removes the sentinel from runtime lock profiling, correcting an oversight from that CL. This didn't cause any problems since post-processing would just ignore the bogus sentinel value; this is just a cleanup. Change-Id: Idbec11f57ac7a57426cd8a064782c4fe6a6a6964 Reviewed-on: https://go-review.googlesource.com/c/go/+/726040 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2026-01-23runtime: remove unused mutexevent linknameNick Ripley
CL 29650 added a sync.event linkname for mutexevent, but that does not appear to have ever been used. This linkname isn't in the "hall of shame", so we can just remove it. Change-Id: I778b53ca089b5afda6c6074be9e43e3a6a6a6964 Reviewed-on: https://go-review.googlesource.com/c/go/+/736220 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-11-12runtime,runtime/pprof: clean up goroutine leak profile writingVlad Saioc
Cleaned up goroutine leak profile extraction: - removed the acquisition of goroutineProfile semaphore - inlined the call to saveg when recording stacks instead of using doRecordGoroutineProfile, which had side-effects over goroutineProfile fields. Added regression tests for goroutine leak profiling frontend for binary and debug=1 profile formats. Added stress tests for concurrent goroutine and goroutine leak profile requests. Change-Id: I55c1bcef11e9a7fb7699b4c5a2353e594d3e7173 GitHub-Last-Rev: 5e9eb3b1d80c4d2d9b668a01f6b39a7b42f7bb45 GitHub-Pull-Request: golang/go#76045 Reviewed-on: https://go-review.googlesource.com/c/go/+/714580 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Junyang Shao <shaojunyang@google.com>
2025-10-30runtime: eliminate _PsyscallMichael Anthony Knyszek
This change eliminates the _Psyscall state by using synchronization on the G status _Gsyscall to make syscalls work instead. This removes an atomic Store and an atomic CAS on the syscall path, which reduces syscall and cgo overheads. It also simplifies the syscall paths quite a bit. The one danger with this change is that we have a new combination of states that was previously impossible. There are brief windows where it's possible to observe a goroutine in _Grunning but without a P. This change is careful to hide this detail from the execution tracer, but it may have unexpected effects in the rest of the runtime, making this change somewhat risky. goos: linux goarch: amd64 pkg: internal/runtime/cgobench cpu: AMD EPYC 7B13 │ before.out │ after.out │ │ sec/op │ sec/op vs base │ CgoCall-64 43.69n ± 1% 35.83n ± 1% -17.99% (p=0.002 n=6) CgoCallParallel-64 5.306n ± 1% 5.338n ± 1% ~ (p=0.132 n=6) Change-Id: I4551afc1eea0c1b67a0b2dd26b0d49aa47bf1fb8 Reviewed-on: https://go-review.googlesource.com/c/go/+/646198 Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-10-20runtime: add _Gdeadextra statusMichael Anthony Knyszek
_Gdeadextra is almost the same as _Gdead but for goroutines attached to extra Ms. The primary difference is that it can be transitioned into a _Gscan status, unlike _Gdead. (Why not just use _Gdead? For safety, mostly. There's exactly one case where we're going to want to transition _Gdead to _Gscan|_Gdead, and it's for extra Ms. It's also a bit weird to call this state dead when it can still have a syscalling P attached to it.) This status is used in a follow-up change that changes entersyscall and exitsyscall. Change-Id: I169a4c8617aa3dc329574b829203f56c86b58169 Reviewed-on: https://go-review.googlesource.com/c/go/+/646197 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-10-15runtime/pprof: fix errors in pprof_testmatloob
I think the original depth-1 argument to allocDeep was correct. Reverted that, and also the change to maxSkip in mprof.go, which was also incorrect. I think before we were usually passing accidentally in the loop over matched stacks when we really should usually have been passing in the previous loop. Change-Id: I6a6a696463e2baf045b66f418d7afbfcb49258e4 Reviewed-on: https://go-review.googlesource.com/c/go/+/712100 Reviewed-by: Michael Matloob <matloob@google.com> TryBot-Bypass: Michael Matloob <matloob@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-10-09cmd/compile: call generated size-specialized malloc functions directlyMichael Matloob
This change creates calls to size-specialized malloc functions instead of calls to newObject when we know the size of the allocation at compilation time. Most of it is a matter of calling the newObject function (which will create calls to the size-specialized functions) rather then the newObjectNonSpecialized function (which won't). In the newHeapaddr, small, non-pointer case, we'll create a non specialized newObject and transform that into the appropriate size-specialized function when we produce the mallocgc in flushPendingHeapAllocations. We have to update some of the rewrites in generic.rules to also apply to the size-specialized functions when they apply to newObject. The messiest thing is we have to adjust the offset we use to save the memory profiler stack, because the depth of the call to profilealloc is two frames fewer in the size-specialized malloc functions compared to when newObject calls mallocgc. A bunch of tests have been adjusted to account for that. Change-Id: I6a6a6964c9037fb6719e392c4a498ed700b617d7 Reviewed-on: https://go-review.googlesource.com/c/go/+/707856 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Matloob <matloob@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>
2025-10-02runtime,net/http/pprof: goroutine leak detection by using the garbage collectorVlad Saioc
Proposal #74609 Change-Id: I97a754b128aac1bc5b7b9ab607fcd5bb390058c8 GitHub-Last-Rev: 60f2a192badf415112246de8bc6c0084085314f6 GitHub-Pull-Request: golang/go#74622 Reviewed-on: https://go-review.googlesource.com/c/go/+/688335 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: t hepudds <thepudds1460@gmail.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-08-15runtime/metrics: add metrics for goroutine sched statesMichael Anthony Knyszek
This is largely a port of CL 38180. For #15490. Change-Id: I2726111e472e81e9f9f0f294df97872c2689f061 Reviewed-on: https://go-review.googlesource.com/c/go/+/690397 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-06-10runtime: handle system goroutines later in goroutine profilingMichael Anthony Knyszek
Before CL 650697, there was only one system goroutine that could dynamically change between being a user goroutine and a system goroutine, and that was the finalizer/cleanup goroutine. In goroutine profiles, it was handled explicitly. It's status would be checked during the first STW, and its stack would be recorded. This let the goroutine profiler completely ignore system goroutines once the world was started again. CL 650697 added dedicated cleanup goroutines (there may be more than one), and with this, the logic for finalizer goroutines no longer scaled. In that CL, I let the isSystemGoroutine check be dynamic and dropped the special case, but this was based on incorrect assumptions. Namely, it's possible for the scheduler to observe, for example, the finalizer goroutine as a system goroutine and ignore it, but then later the goroutine profiler itself sees it as a user goroutine. At that point it's too late and already running. This violates the invariant of the goroutine profile that all goroutines are handled by the profiler before they start executing. In practice, the result is that the goroutine profiler can crash when it checks this invariant (not checking the invariant means racily reading goroutine stack memory). The root cause of the problem is that these system goroutines do not participate in the goroutine profiler's state machine. Normally, when profiling, goroutines transition from 'absent' to 'in-progress' to 'satisfied'. However with system goroutines, the state machine is ignored entirely. They always stay in the 'absent' state. This means that if a goroutine transitions from system to user, it is eligible for a profile record when it shouldn't be. That transition shouldn't be allowed to occur with respect to the goroutine profiler, because the goroutine profiler is trying to snapshot the state of every goroutine. The fix to this problem is simple: don't ignore system goroutines. Let them participate in the goroutine profile state machine. Instead, decide whether or not to record the stack after the goroutine has been acquired for goroutine profiling. This means if the scheduler observes the finalizer goroutine as a system goroutine, it will get promoted in the goroutine profiler's state machine, and no other part of the goroutine profiler will observe the goroutine again. Simultaneously, the stack record for the goroutine will be correctly skipped. Fixes #74090. Change-Id: Icb9a164a033be22aaa942d19e828e895f700ca74 Reviewed-on: https://go-review.googlesource.com/c/go/+/680477 Reviewed-by: Carlos Amedee <carlos@golang.org> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-08runtime: schedule cleanups across multiple goroutinesMichael Anthony Knyszek
This change splits the finalizer and cleanup queues and implements a new lock-free blocking queue for cleanups. The basic design is as follows: The cleanup queue is organized in fixed-sized blocks. Individual cleanup functions are queued, but only whole blocks are dequeued. Enqueuing cleanups places them in P-local cleanup blocks. These are flushed to the full list as they get full. Cleanups can only be enqueued by an active sweeper. Dequeuing cleanups always dequeues entire blocks from the full list. Cleanup blocks can be dequeued and executed at any time. The very last active sweeper in the sweep phase is responsible for flushing all local cleanup blocks to the full list. It can do this without any synchronization because the next GC can't start yet, so we can be very certain that nobody else will be accessing the local blocks. Cleanup blocks are stored off-heap because the need to be allocated by the sweeper, which is called from heap allocation paths. As a result, the GC treats cleanup blocks as roots, just like finalizer blocks. Flushes to the full list signal to the scheduler that cleanup goroutines should be awoken. Every time the scheduler goes to wake up a cleanup goroutine and there were more signals than goroutines to wake, it then forwards this signal to runtime.AddCleanup, so that it creates another goroutine the next time it is called, up to gomaxprocs goroutines. The signals here are a little convoluted, but exist because the sweeper and the scheduler cannot safely create new goroutines. For #71772. For #71825. Change-Id: Ie839fde2b67e1b79ac1426be0ea29a8d923a62cc Reviewed-on: https://go-review.googlesource.com/c/go/+/650697 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2025-05-07runtime: remove GODEBUG=runtimecontentionstacksRhys Hiltner
Go 1.22 promised to remove the setting in a future release once the semantics of runtime-internal lock contention matched that of sync.Mutex. That work is done, remove the setting. Previously reviewed as https://go.dev/cl/585639. For #66999 Change-Id: I9fe62558ba0ac12824874a0bb1b41efeb7c0853f Reviewed-on: https://go-review.googlesource.com/c/go/+/668995 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-05-07runtime: blame unlocker for mutex delayRhys Hiltner
Correct how the mutex contention profile reports on runtime-internal mutex values, to match sync.Mutex's semantics. Decide at the start of unlock2 whether we'd like to collect a contention sample. If so: Opt in to a slightly slower unlock path which avoids accidentally accepting blame for delay caused by other Ms. Release the lock before doing an O(N) traversal of the stack of waiting Ms, to calculate the total delay to those Ms that our critical section caused. Report that, with the current callstack, in the mutex profile. Fixes #66999 Change-Id: I561ed8dc120669bd045d514cb0d1c6c99c2add04 Reviewed-on: https://go-review.googlesource.com/c/go/+/667615 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-03-04runtime: decorate anonymous memory mappingsLénaïc Huard
Leverage the prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ...) API to name the anonymous memory areas. This API has been introduced in Linux 5.17 to decorate the anonymous memory areas shown in /proc/<pid>/maps. This is already used by glibc. See: * https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c;h=27dfd1eb907f4615b70c70237c42c552bb4f26a8;hb=HEAD#l2434 * https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/setvmaname.c;h=ea93a5ffbebc9e5a7e32a297138f465724b4725f;hb=HEAD#l63 This can be useful when investigating the memory consumption of a multi-language program. On a 100% Go program, pprof profiler can be used to profile the memory consumption of the program. But pprof is only aware of what happens within the Go world. On a multi-language program, there could be a doubt about whether the suspicious extra-memory consumption comes from the Go part or the native part. With this change, the following Go program: package main import ( "fmt" "log" "os" ) /* #include <stdlib.h> void f(void) { (void)malloc(1024*1024*1024); } */ import "C" func main() { C.f() data, err := os.ReadFile("/proc/self/maps") if err != nil { log.Fatal(err) } fmt.Println(string(data)) } produces this output: $ GLIBC_TUNABLES=glibc.mem.decorate_maps=1 ~/doc/devel/open-source/go/bin/go run . 00400000-00402000 r--p 00000000 00:21 28451768 /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name 00402000-004a4000 r-xp 00002000 00:21 28451768 /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name 004a4000-00574000 r--p 000a4000 00:21 28451768 /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name 00574000-00575000 r--p 00173000 00:21 28451768 /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name 00575000-00580000 rw-p 00174000 00:21 28451768 /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name 00580000-005a4000 rw-p 00000000 00:00 0 2e075000-2e096000 rw-p 00000000 00:00 0 [heap] c000000000-c000400000 rw-p 00000000 00:00 0 [anon: Go: heap] c000400000-c004000000 ---p 00000000 00:00 0 [anon: Go: heap reservation] 777f40000000-777f40021000 rw-p 00000000 00:00 0 [anon: glibc: malloc arena] 777f40021000-777f44000000 ---p 00000000 00:00 0 777f44000000-777f44021000 rw-p 00000000 00:00 0 [anon: glibc: malloc arena] 777f44021000-777f48000000 ---p 00000000 00:00 0 777f48000000-777f48021000 rw-p 00000000 00:00 0 [anon: glibc: malloc arena] 777f48021000-777f4c000000 ---p 00000000 00:00 0 777f4c000000-777f4c021000 rw-p 00000000 00:00 0 [anon: glibc: malloc arena] 777f4c021000-777f50000000 ---p 00000000 00:00 0 777f50000000-777f50021000 rw-p 00000000 00:00 0 [anon: glibc: malloc arena] 777f50021000-777f54000000 ---p 00000000 00:00 0 777f55afb000-777f55afc000 ---p 00000000 00:00 0 777f55afc000-777f562fc000 rw-p 00000000 00:00 0 [anon: glibc: pthread stack: 216378] 777f562fc000-777f562fd000 ---p 00000000 00:00 0 777f562fd000-777f56afd000 rw-p 00000000 00:00 0 [anon: glibc: pthread stack: 216377] 777f56afd000-777f56afe000 ---p 00000000 00:00 0 777f56afe000-777f572fe000 rw-p 00000000 00:00 0 [anon: glibc: pthread stack: 216376] 777f572fe000-777f572ff000 ---p 00000000 00:00 0 777f572ff000-777f57aff000 rw-p 00000000 00:00 0 [anon: glibc: pthread stack: 216375] 777f57aff000-777f57b00000 ---p 00000000 00:00 0 777f57b00000-777f58300000 rw-p 00000000 00:00 0 [anon: glibc: pthread stack: 216374] 777f58300000-777f58400000 rw-p 00000000 00:00 0 [anon: Go: page alloc index] 777f58400000-777f5a400000 rw-p 00000000 00:00 0 [anon: Go: heap index] 777f5a400000-777f6a580000 ---p 00000000 00:00 0 [anon: Go: scavenge index] 777f6a580000-777f6a581000 rw-p 00000000 00:00 0 [anon: Go: scavenge index] 777f6a581000-777f7a400000 ---p 00000000 00:00 0 [anon: Go: scavenge index] 777f7a400000-777f8a580000 ---p 00000000 00:00 0 [anon: Go: page summary] 777f8a580000-777f8a581000 rw-p 00000000 00:00 0 [anon: Go: page alloc] 777f8a581000-777f9c430000 ---p 00000000 00:00 0 [anon: Go: page summary] 777f9c430000-777f9c431000 rw-p 00000000 00:00 0 [anon: Go: page alloc] 777f9c431000-777f9e806000 ---p 00000000 00:00 0 [anon: Go: page summary] 777f9e806000-777f9e807000 rw-p 00000000 00:00 0 [anon: Go: page alloc] 777f9e807000-777f9ec00000 ---p 00000000 00:00 0 [anon: Go: page summary] 777f9ec36000-777f9ecb6000 rw-p 00000000 00:00 0 [anon: Go: immortal metadata] 777f9ecb6000-777f9ecc6000 rw-p 00000000 00:00 0 [anon: Go: gc bits] 777f9ecc6000-777f9ecd6000 rw-p 00000000 00:00 0 [anon: Go: allspans array] 777f9ecd6000-777f9ece7000 rw-p 00000000 00:00 0 [anon: Go: immortal metadata] 777f9ece7000-777f9ed67000 ---p 00000000 00:00 0 [anon: Go: page summary] 777f9ed67000-777f9ed68000 rw-p 00000000 00:00 0 [anon: Go: page alloc] 777f9ed68000-777f9ede7000 ---p 00000000 00:00 0 [anon: Go: page summary] 777f9ede7000-777f9ee07000 rw-p 00000000 00:00 0 [anon: Go: page alloc] 777f9ee07000-777f9ee0a000 rw-p 00000000 00:00 0 [anon: glibc: loader malloc] 777f9ee0a000-777f9ee2e000 r--p 00000000 00:21 48158213 /usr/lib/libc.so.6 777f9ee2e000-777f9ef9f000 r-xp 00024000 00:21 48158213 /usr/lib/libc.so.6 777f9ef9f000-777f9efee000 r--p 00195000 00:21 48158213 /usr/lib/libc.so.6 777f9efee000-777f9eff2000 r--p 001e3000 00:21 48158213 /usr/lib/libc.so.6 777f9eff2000-777f9eff4000 rw-p 001e7000 00:21 48158213 /usr/lib/libc.so.6 777f9eff4000-777f9effc000 rw-p 00000000 00:00 0 777f9effc000-777f9effe000 rw-p 00000000 00:00 0 [anon: glibc: loader malloc] 777f9f00a000-777f9f04a000 rw-p 00000000 00:00 0 [anon: Go: immortal metadata] 777f9f04a000-777f9f04c000 r--p 00000000 00:00 0 [vvar] 777f9f04c000-777f9f04e000 r--p 00000000 00:00 0 [vvar_vclock] 777f9f04e000-777f9f050000 r-xp 00000000 00:00 0 [vdso] 777f9f050000-777f9f051000 r--p 00000000 00:21 48158204 /usr/lib/ld-linux-x86-64.so.2 777f9f051000-777f9f07a000 r-xp 00001000 00:21 48158204 /usr/lib/ld-linux-x86-64.so.2 777f9f07a000-777f9f085000 r--p 0002a000 00:21 48158204 /usr/lib/ld-linux-x86-64.so.2 777f9f085000-777f9f087000 r--p 00034000 00:21 48158204 /usr/lib/ld-linux-x86-64.so.2 777f9f087000-777f9f088000 rw-p 00036000 00:21 48158204 /usr/lib/ld-linux-x86-64.so.2 777f9f088000-777f9f089000 rw-p 00000000 00:00 0 7ffc7bfa7000-7ffc7bfc8000 rw-p 00000000 00:00 0 [stack] ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall] The anonymous memory areas are now labelled so that we can see which ones have been allocated by the Go runtime versus which ones have been allocated by the glibc. Fixes #71546 Change-Id: I304e8b4dd7f2477a6da794fd44e9a7a5354e4bf4 Reviewed-on: https://go-review.googlesource.com/c/go/+/646095 Auto-Submit: Alan Donovan <adonovan@google.com> Commit-Queue: Alan Donovan <adonovan@google.com> Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2024-11-18internal/sync: move sync.Mutex implementation into new packageMichael Anthony Knyszek
This CL refactors sync.Mutex such that its implementation lives in the new internal/sync package. The purpose of this change is to eventually reverse the dependency edge between internal/concurrent and sync, such that sync can depend on internal/concurrent (or really, its contents, which will likely end up in internal/sync). The only change made to the sync.Mutex code is the frame skip count for mutex profiling, so that the internal/sync frames are omitted in the profile. Change-Id: Ib3603d30e8e71508c4ea883a584ae2e51ce40c3f Reviewed-on: https://go-review.googlesource.com/c/go/+/594056 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2024-10-21runtime: refactor mallocgc into several independent codepathsMichael Anthony Knyszek
Right now mallocgc is a monster of a function. In real programs, we see that a substantial amount of time in mallocgc is spent in mallocgc itself. It's very branch-y, holds a lot of state, and handles quite a few disparate cases, trying to merge them together. This change breaks apart mallocgc into separate, leaner functions. There's some duplication now, but there are a lot of branches that can be pruned as a result. There's definitely still more we can do here. heapSetType can be inlined and broken down for each case, since its internals roughly map to each case anyway (done in a follow-up CL). We can probably also do more with the size class lookups, since we know more about the size of the object in each case than before. Below are the savings for the full stack up until now. │ after-3.out │ after-4.out │ │ sec/op │ sec/op vs base │ Malloc8-4 13.32n ± 2% 12.17n ± 1% -8.63% (p=0.002 n=6) Malloc16-4 21.64n ± 3% 19.38n ± 10% -10.47% (p=0.002 n=6) MallocTypeInfo8-4 23.15n ± 2% 19.91n ± 2% -14.00% (p=0.002 n=6) MallocTypeInfo16-4 25.86n ± 4% 22.48n ± 5% -13.11% (p=0.002 n=6) MallocLargeStruct-4 270.0n ± ∞ ¹ geomean 20.38n 30.97n -11.58% Change-Id: I681029c0b442f9221c4429950626f06299a5cfe4 Reviewed-on: https://go-review.googlesource.com/c/go/+/614257 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2024-09-25runtime: fix GoroutineProfile stacks not getting null terminatedFelix Geisendörfer
Fix a regression introduced in CL 572396 causing goroutine stacks not getting null terminated. This bug impacts callers that reuse the []StackRecord slice for multiple calls to GoroutineProfile. See https://github.com/felixge/fgprof/issues/33 for an example of the problem. Add a test case to prevent similar regressions in the future. Use null padding instead of null termination to be consistent with other profile types and because it's less code to implement. Also fix the ThreadCreateProfile code path. Fixes #69243 Change-Id: I0b9414f6c694c304bc03a5682586f619e9bf0588 Reviewed-on: https://go-review.googlesource.com/c/go/+/609815 Reviewed-by: Tim King <taking@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2024-09-25runtime: fix MutexProfile missing root framesFelix Geisendörfer
Fix a regression introduced in CL 598515 causing runtime.MutexProfile stack traces to omit their root frames. In most cases this was merely causing the `runtime.goexit` frame to go missing. But in the case of runtime._LostContendedRuntimeLock, an empty stack trace was being produced. Add a test that catches this regression by checking for a stack trace with the `runtime.goexit` frame. Also fix a separate problem in expandFrame that could cause out-of-bounds panics when profstackdepth is set to a value below 32. There is no test for this fix because profstackdepth can't be changed at runtime right now. Fixes #69335 Change-Id: I1600fe62548ea84981df0916d25072c3ddf1ea1a Reviewed-on: https://go-review.googlesource.com/c/go/+/611615 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Nick Ripley <nick.ripley@datadoghq.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-09-17runtime: move getcallersp to internal/runtime/sysMichael Pratt
Moving these intrinsics to a base package enables other internal/runtime packages to use them. For #54766. Change-Id: I45a530422207dd94b5ad4eee51216c9410a84040 Reviewed-on: https://go-review.googlesource.com/c/go/+/613261 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-09-17runtime: move getcallerpc to internal/runtime/sysMichael Pratt
Moving these intrinsics to a base package enables other internal/runtime packages to use them. For #54766. Change-Id: I0b3eded3bb45af53e3eb5bab93e3792e6a8beb46 Reviewed-on: https://go-review.googlesource.com/c/go/+/613260 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-08-20src: fix typosAlexander Cyon
Fix typos in ~30 files Change-Id: Ie433aea01e7d15944c1e9e103691784876d5c1f9 GitHub-Last-Rev: bbaeb3d1f88a5fa6bbb69607b1bd075f496a7894 GitHub-Pull-Request: golang/go#68964 Reviewed-on: https://go-review.googlesource.com/c/go/+/606955 Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
2024-08-19runtime: store zero-delay mutex contention eventsRhys Hiltner
Mutex contention events with delay of 0 need more than CL 604355 added: When deciding which event to store in the M's single available slot, always choose to drop the zero-delay event. Store an explicit flag for whether we have an event to store, rather than relying on a non-zero delay. And, fix a test of sync.Mutex contention that expects those events to have non-zero delay. The reporting of non-runtime contention like this has long allowed zero-delay events, which we see when cputicks has low resolution. Fixes #68892 Fixes #68906 Change-Id: Id412141e4eb09724f3ce195899a20d59c92d7b78 Reviewed-on: https://go-review.googlesource.com/c/go/+/606115 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2024-08-14runtime: record all sampled mutex profile eventsRhys Hiltner
The block and mutex profiles have slightly different behaviors when a sampled event has a negative (or zero) duration. The block profile enforces a minimum duration for each event of "1" in the cputicks unit. It does so by clamping the duration to 1 if it was originally reported as being smaller. The mutex profile for app-level contention enforces a minimum duration of 0 in a similar way: by reporting any negative values as 0 instead. The mutex profile for runtime-internal contention had a different behavior: to enforce a minimum event duration of "1" by dropping any non-conforming samples. Stop dropping samples, and use the same minimum (0) that's in place for the other mutex profile events. Fixes #64253 Fixes #68453 Fixes #68781 Change-Id: I4c5d23a2675501226eef5b9bc1ada2efc1a55b9e Reviewed-on: https://go-review.googlesource.com/c/go/+/604355 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
2024-07-23runtime,internal: move runtime/internal/sys to internal/runtime/sysDavid Chase
Cleanup and friction reduction For #65355. Change-Id: Ia14c9dc584a529a35b97801dd3e95b9acc99a511 Reviewed-on: https://go-review.googlesource.com/c/go/+/600436 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>
2024-07-17runtime: avoid multiple records with identical stacks from MutexProfileNick Ripley
When using frame pointer unwinding, we defer frame skipping and inline expansion for call stacks until profile reporting time. We can end up with records which have different stacks if no frames are skipped, but identical stacks once skipping is taken into account. Returning multiple records with the same stack (but different values) has broken programs which rely on the records already being fully aggregated by call stack when returned from runtime.MutexProfile. This CL addresses the problem by handling skipping at recording time. We do full inline expansion to correctly skip the desired number of frames when recording the call stack, and then handle the rest of inline expansion when reporting the profile. The regression test in this CL is adapted from the reproducer in https://github.com/grafana/pyroscope-go/issues/103, authored by Tolya Korniltsev. Fixes #67548 This reapplies CL 595966. The original version of this CL introduced a bounds error in MutexProfile and failed to correctly expand inlined frames from that call. This CL applies the original CL, fixing the bounds error and adding a test for the output of MutexProfile to ensure the frames are expanded properly. Change-Id: I5ef8aafb9f88152a704034065c0742ba767c4dbb Reviewed-on: https://go-review.googlesource.com/c/go/+/598515 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-07-15Revert "runtime: avoid multiple records with identical stacks from MutexProfile"Cherry Mui
This reverts CL 595966. Reason for revert: This CL contains a bug. See the comment in https://go-review.googlesource.com/c/go/+/595966/8#message-57f4c1f9570b5fe912e06f4ae3b52817962533c0 Change-Id: I48030907ded173ae20a8965bf1b41a713dd06059 Reviewed-on: https://go-review.googlesource.com/c/go/+/598219 Reviewed-by: Than McIntosh <thanm@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-07-09runtime: avoid multiple records with identical stacks from MutexProfileNick Ripley
When using frame pointer unwinding, we defer frame skipping and inline expansion for call stacks until profile reporting time. We can end up with records which have different stacks if no frames are skipped, but identical stacks once skipping is taken into account. Returning multiple records with the same stack (but different values) has broken programs which rely on the records already being fully aggregated by call stack when returned from runtime.MutexProfile. This CL addresses the problem by handling skipping at recording time. We do full inline expansion to correctly skip the desired number of frames when recording the call stack, and then handle the rest of inline expansion when reporting the profile. The regression test in this CL is adapted from the reproducer in https://github.com/grafana/pyroscope-go/issues/103, authored by Tolya Korniltsev. Fixes #67548 Co-Authored-By: Tolya Korniltsev <korniltsev.anatoly@gmail.com> Change-Id: I6a42ce612377f235b2c8c0cec9ba8e9331224b84 Reviewed-on: https://go-review.googlesource.com/c/go/+/595966 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Carlos Amedee <carlos@golang.org> Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
2024-07-03cmd/link: don't disable memory profiling when pprof.WriteHeapProfile is usedCherry Mui
We have an optimization that if the memory profile is not consumed anywhere, we set the memory profiling rate to 0 to disable the "background" low-rate profiling. We detect whether the memory profile is used by checking whether the runtime.MemProfile function is reachable at link time. Previously, all APIs that access the memory profile go through runtime.MemProfile. But the code was refactored in CL 572396, and now the legacy entry point WriteHeapProfile uses pprof_memProfileInternal without going through runtime.MemProfile. In fact, even with the recommended runtime/pprof.Profile API (pprof.Lookup or pprof.Profiles), runtime.MemProfile is only (happen to be) reachable through countHeap. Change the linker to check runtime.memProfileInternal instead, which is on all code paths that retrieve the memory profile. Add a test case for WriteHeapProfile, so we cover all entry points. Fixes #68136. Change-Id: I075c8d45c95c81825a1822f032e23107aea4303c Reviewed-on: https://go-review.googlesource.com/c/go/+/596538 Reviewed-by: Than McIntosh <thanm@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-30Revert "runtime: prepare for extensions to waiting M list"Rhys Hiltner
This reverts commit be0b569caa0eab1a7f30edf64e550bbf5f6ff235 (CL 585635). Reason for revert: This is part of a patch series that changed the handling of contended lock2/unlock2 calls, reducing the maximum throughput of contended runtime.mutex values, and causing a performance regression on applications where that is (or became) the bottleneck. Updates #66999 Updates #67585 Change-Id: I7843ccaecbd273b7ceacfa0f420dd993b4b15a0a Reviewed-on: https://go-review.googlesource.com/c/go/+/589117 Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2024-05-30Revert "runtime: double-link list of waiting Ms"Rhys Hiltner
This reverts commit d881ed6384ae58154d99682f1e20160c64e7c3c2 (CL 585637). Reason for revert: This is part of a patch series that changed the handling of contended lock2/unlock2 calls, reducing the maximum throughput of contended runtime.mutex values, and causing a performance regression on applications where that is (or became) the bottleneck. Updates #66999 Updates #67585 Change-Id: I70d8d0b74f73be95c43d664f584e8d98519aba26 Reviewed-on: https://go-review.googlesource.com/c/go/+/589116 Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2024-05-30Revert "runtime: profile mutex contention during unlock"Rhys Hiltner
This reverts commit ba1c5b2c4573e10f3b5f0e0f25a27f17fba67eb0 (CL 585638). Reason for revert: This is part of a patch series that changed the handling of contended lock2/unlock2 calls, reducing the maximum throughput of contended runtime.mutex values, and causing a performance regression on applications where that is (or became) the bottleneck. Updates #66999 Updates #67585 Change-Id: Ibeec5d8deb17e87966cf352fefc7efe2267839d6 Reviewed-on: https://go-review.googlesource.com/c/go/+/589115 Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Than McIntosh <thanm@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-30Revert "runtime: remove GODEBUG=runtimecontentionstacks"Rhys Hiltner
This reverts commit 87e930f7289136fad1310d4b63dd4127e409bac5 (CL 585639) Reason for revert: This is part of a patch series that changed the handling of contended lock2/unlock2 calls, reducing the maximum throughput of contended runtime.mutex values, and causing a performance regression on applications where that is (or became) the bottleneck. Updates #66999 Updates #67585 Change-Id: I1e286d2a16d16e4af202cd5dc04b2d9c4ee71b32 Reviewed-on: https://go-review.googlesource.com/c/go/+/589097 Reviewed-by: Than McIntosh <thanm@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
2024-05-30Revert "runtime: split mutex profile clocks"Rhys Hiltner
This reverts commit 8ab131fb1256a4a795c610e145c022e22e2d1567 (CL 586796) Reason for revert: This is part of a patch series that changed the handling of contended lock2/unlock2 calls, reducing the maximum throughput of contended runtime.mutex values, and causing a performance regression on applications where that is (or became) the bottleneck. Updates #66999 Updates #67585 Change-Id: I54711691e86e072081482102019d168292b5150a Reviewed-on: https://go-review.googlesource.com/c/go/+/589095 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Than McIntosh <thanm@google.com>
2024-05-22runtime: split mutex profile clocksRhys Hiltner
Mutex contention measurements work with two clocks: nanotime for use in runtime/metrics, and cputicks for the runtime/pprof profile. They're subject to different sampling rates: the runtime/metrics view is always enabled, but the profile is adjustable and is turned off by default. They have different levels of overhead: it can take as little as one instruction to read cputicks while nanotime calls are more elaborate (although some platforms implement cputicks as a nanotime call). The use of the timestamps is also different: the profile's view needs to attach the delay in some Ms' lock2 calls to another M's unlock2 call stack, but the metric's view is only an int64. Treat them differently. Don't bother threading the nanotime clock through to the unlock2 call, measure and report it directly within lock2. Sample nanotime at a constant gTrackingPeriod. Don't consult any clocks unless the mutex is actually contended. Continue liberal use of cputicks for now. For #66999 Change-Id: I1c2085ea0e695bfa90c30fadedc99ced9eb1f69e Reviewed-on: https://go-review.googlesource.com/c/go/+/586796 TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Run-TryBot: Rhys Hiltner <rhys.hiltner@gmail.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-05-21runtime: remove GODEBUG=runtimecontentionstacksRhys Hiltner
Go 1.22 promised to remove the setting in a future release once the semantics of runtime-internal lock contention matched that of sync.Mutex. That work is done, remove the setting. For #66999 Change-Id: I3c4894148385adf2756d8754e44d7317305ad758 Reviewed-on: https://go-review.googlesource.com/c/go/+/585639 Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2024-05-21runtime: profile mutex contention during unlockRhys Hiltner
When an M's use of a lock causes delays in other Ms, capture the stack of the unlock call that caused the delay. This makes the meaning of the mutex profile for runtime-internal mutexes match the behavior for sync.Mutex: the profile points to the end of the critical section that is responsible for delaying other work. Fixes #66999 Change-Id: I4abc4a1df00a48765d29c07776481a1cbd539ff8 Reviewed-on: https://go-review.googlesource.com/c/go/+/585638 Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>
2024-05-21runtime: double-link list of waiting MsRhys Hiltner
When an M unlocks a contended mutex, it needs to consult a list of the Ms that had to wait during its critical section. This allows the M to attribute the appropriate amount of blame to the unlocking call stack. Mirroring the implementation for users' sync.Mutex contention (via sudog), we can (in a future commit) use the time that the head and tail of the wait list started waiting, and the number of waiters, to estimate the sum of the Ms' delays. When an M acquires the mutex, it needs to remove itself from the list of waiters. Since the futex-based lock implementation leaves the OS in control of the order of M wakeups, we need to be prepared for quickly (constant time) removing any M from the list. First, have each M add itself to a singly-linked wait list when it finds that its lock call will need to sleep. This case is safe against live-lock, since any delay to one M adding itself to the list would be due to another M making durable progress. Second, have the M that holds the lock (either right before releasing, or right after acquiring) update metadata on the list of waiting Ms to double-link the list and maintain a tail pointer and waiter count. That work is amortized-constant: we'll avoid contended locks becoming proportionally more contended and undergoing performance collapse. For #66999 Change-Id: If75cdea915afb59ccec47294e0b52c466aac8736 Reviewed-on: https://go-review.googlesource.com/c/go/+/585637 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
2024-05-21runtime: prepare for extensions to waiting M listRhys Hiltner
Move the nextwaitm field into a small struct, in preparation for additional metadata to track how long Ms need to wait for locks. For #66999 Change-Id: Ib40e43c15cde22f7e35922641107973d99439ecd Reviewed-on: https://go-review.googlesource.com/c/go/+/585635 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2024-05-21runtime: make profstackdepth a GODEBUG optionFelix Geisendörfer
Allow users to decrease the profiling stack depth back to 32 in case they experience any problems with the new default of 128. Users may also use this option to increase the depth up to 1024. Change-Id: Ieaab2513024915a223239278dd97a6e161dde1cf Reviewed-on: https://go-review.googlesource.com/c/go/+/581917 Reviewed-by: Austin Clements <austin@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2024-05-21runtime: increase profiling stack depth to 128Felix Geisendörfer
The current stack depth limit for alloc, mutex, block, threadcreate and goroutine profiles of 32 frequently leads to truncated stack traces in production applications. Increase the limit to 128 which is the same size used by the execution tracer. Create internal/profilerecord to define variants of the runtime's StackRecord, MemProfileRecord and BlockProfileRecord types that can hold arbitrarily big stack traces. Implement internal profiling APIs based on these new types and use them for creating protobuf profiles and to act as shims for the public profiling APIs using the old types. This will lead to an increase in memory usage for applications that use the impacted profile types and have stack traces exceeding the current limit of 32. Those applications will also experience a slight increase in CPU usage, but this will hopefully soon be mitigated via CL 540476 and 533258 which introduce frame pointer unwinding for the relevant profile types. For #43669. Change-Id: Ie53762e65d0f6295f5d4c7d3c87172d5a052164e Reviewed-on: https://go-review.googlesource.com/c/go/+/572396 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-05-21runtime: fix profile stack trace depth regressionFelix Geisendörfer
Previously it was possible for mutex and block profile stack traces to contain up to 32 frames in Stack0 or the resulting pprof profiles. CL 533258 changed this behavior by using some of the space to record skipped frames that are discarded when performing delayed inline expansion. This has lowered the effective maximum stack size from 32 to 27 (the max skip value is 5), which can be seen as a small regression. Add TestProfilerStackDepth to demonstrate the issue and protect all profile types from similar regressions in the future. Fix the issue by increasing the internal maxStack limit to take the maxSkip value into account. Assert that the maxSkip value is never exceeded when recording mutex and block profile stack traces. Three alternative solutions to the problem were considered and discarded: 1) Revert CL 533258 and give up on frame pointer unwinding. This seems unappealing as we would lose the performance benefits of frame pointer unwinding. 2) Discard skipped frames when recording the initial stack trace. This would require eager inline expansion for up to maxSkip frames and partially negate the performance benefits of frame pointer unwinding. 3) Accept and document the new behavior. This would simplify the implementation, but seems more confusing from a user perspective. It also complicates the creation of test cases that make assertions about the maximum profiling stack depth. The execution tracer still has the same issue due to CL 463835. This should be addressed in a follow-up CL. Co-authored-by: Nick Ripley <nick.ripley@datadoghq.com> Change-Id: Ibf4dbf08a5166c9cb32470068c69f58bc5f98d2c Reviewed-on: https://go-review.googlesource.com/c/go/+/586657 Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-13runtime: use frame pointer unwinding for block and mutex profilersNick Ripley
Use frame pointer unwinding, where supported, to collect call stacks for the block, and mutex profilers. This method of collecting call stacks is typically an order of magnitude faster than callers/tracebackPCs. The marginal benefit for these profile types is likely small compared to using frame pointer unwinding for the execution tracer. However, the block profiler can have noticeable overhead unless the sampling rate is very high. Additionally, using frame pointer unwinding in more places helps ensure more testing/support, which benefits systems like the execution tracer which rely on frame pointer unwinding to be practical to use. Change-Id: I4b36c90cd2df844645fd275a41b247352d635727 Reviewed-on: https://go-review.googlesource.com/c/go/+/533258 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2024-05-08runtime: move profiling pc buffers to mFelix Geisendörfer
Move profiling pc buffers from being stack allocated to an m field. This is motivated by the next patch, which will increase the default stack depth to 128, which might lead to undesirable stack growth for goroutines that produce profiling events. Additionally, this change paves the way to make the stack depth configurable via GODEBUG. Change-Id: Ifa407f899188e2c7c0a81de92194fdb627cb4b36 Reviewed-on: https://go-review.googlesource.com/c/go/+/574699 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2024-05-08runtime: remove allocfreetraceMichael Anthony Knyszek
allocfreetrace prints all allocations and frees to stderr. It's not terribly useful because it has a really huge overhead, making it not feasible to use except for the most trivial programs. A follow-up CL will replace it with something that is both more thorough and also lower overhead. Change-Id: I1d668fee8b6aaef5251a5aea3054ec2444d75eb6 Reviewed-on: https://go-review.googlesource.com/c/go/+/583376 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2024-03-25runtime: migrate internal/atomic to internal/runtimeAndy Pan
For #65355 Change-Id: I65dd090fb99de9b231af2112c5ccb0eb635db2be Reviewed-on: https://go-review.googlesource.com/c/go/+/560155 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ibrahim Bazoka <ibrahimbazoka729@gmail.com> Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
2024-03-08runtime: use built-in clear to simplify codeapocelipes
Change-Id: Icb6d9ca996b4119d8636d9f7f6a56e510d74d059 GitHub-Last-Rev: 08178e8ff798f4a51860573788c9347a0fb6bc40 GitHub-Pull-Request: golang/go#66188 Reviewed-on: https://go-review.googlesource.com/c/go/+/569979 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: qiulaidongfeng <2645477756@qq.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2024-01-30runtime: avoid new linkname for goroutine profilesMichael Pratt
CL 464349 added a new linkname to provide gcount to runtime/pprof to avoid a STW when estimating the goroutine profile allocation size. However, adding a linkname here isn't necessary for a few reasons: 1. We already export gcount via NumGoroutines. I completely forgot about this during review. 2. aktau suggested that goroutineProfileWithLabelsConcurrent return gcount as a fast path estimate when the input is empty. The second point keeps the code cleaner overall, so I've done that. For #54014. Change-Id: I6cb0811a769c805e269b55774cdd43509854078e Reviewed-on: https://go-review.googlesource.com/c/go/+/559515 Auto-Submit: Michael Pratt <mpratt@google.com> Auto-Submit: Nicolas Hillegeer <aktau@google.com> Reviewed-by: Nicolas Hillegeer <aktau@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-01-30runtime: reduce one STW when obtaining goroutine configuration fileJun10ng
Fixes #54014 Change-Id: If4ee2752008729e1ed4b767cfda52effdcec4959 GitHub-Last-Rev: 5ce300bf5128f842604d85d5f8749027c8e091c2 GitHub-Pull-Request: golang/go#58239 Reviewed-on: https://go-review.googlesource.com/c/go/+/464349 Reviewed-by: qiulaidongfeng <2645477756@qq.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: qiulaidongfeng <2645477756@qq.com> Auto-Submit: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2024-01-15runtime: more godoc linksOlivier Mengué
Change-Id: I8fe66326994894b17ce0eda991bba942844d26b0 Reviewed-on: https://go-review.googlesource.com/c/go/+/541475 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>