aboutsummaryrefslogtreecommitdiff
path: root/src/runtime/malloc.go
AgeCommit message (Collapse)Author
2025-05-21runtime: add valgrind instrumentationRoland Shoemaker
Add build tag gated Valgrind annotations to the runtime which let it understand how the runtime manages memory. This allows for Go binaries to be run under Valgrind without emitting spurious errors. Instead of adding the Valgrind headers to the tree, and using cgo to call the various Valgrind client request macros, we just add an assembly function which emits the necessary instructions to trigger client requests. In particular we add instrumentation of the memory allocator, using a two-level mempool structure (as described in the Valgrind manual [0]). We also add annotations which allow Valgrind to track which memory we use for stacks, which seems necessary to let it properly function. We describe the memory model to Valgrind as follows: we treat heap arenas as a "pool" created with VALGRIND_CREATE_MEMPOOL_EXT (so that we can use VALGRIND_MEMPOOL_METAPOOL and VALGRIND_MEMPOOL_AUTO_FREE). Within the pool we treat spans as "superblocks", annotated with VALGRIND_MEMPOOL_ALLOC. We then allocate individual objects within spans with VALGRIND_MALLOCLIKE_BLOCK. It should be noted that running binaries under Valgrind can be _quite slow_, and certain operations, such as running the GC, can be _very slow_. It is recommended to run programs with GOGC=off. Additionally, async preemption should be turned off, since it'll cause strange behavior (GODEBUG=asyncpreemptoff=1). Running Valgrind with --leak-check=yes will result in some errors resulting from some things not being marked fully free'd. These likely need more annotations to rectify, but for now it is recommended to run with --leak-check=off. Updates #73602 [0] https://valgrind.org/docs/manual/mc-manual.html#mc-manual.mempools Change-Id: I71b26c47d7084de71ef1e03947ef6b1cc6d38301 Reviewed-on: https://go-review.googlesource.com/c/go/+/674077 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-21runtime: guarantee checkfinalizers test allocates in a shared tiny blockMichael Anthony Knyszek
Currently the checkfinalizers test (TestDetectCleanupOrFinalizerLeak) only *tries* to ensure the tiny alloc with a cleanup attached shares a block with other objects. However, what it does is insufficient, because it could get unlucky and have the last object allocated be the first object of a new block. This change changes the test to guarantee that a tiny object is not at the start of a fresh block by looking at the alignment of the object's pointer. If the object's pointer is odd, then that's good enough to know that it shares a block with something else, since the blocks themselves are aligned to a much higher power of two. This fixes a failure I've seen on the builders. Fixes #73810. Change-Id: Ieafdbb9cccb0d2dc3659a9a5d9d9233718461635 Reviewed-on: https://go-review.googlesource.com/c/go/+/674655 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-05-20runtime: mark and identify tiny blocks in checkfinalizers modeMichael Anthony Knyszek
This change adds support for identifying cleanups and finalizers attached to tiny blocks to checkfinalizers mode. It also notes a subtle pitfall, which is that the cleanup arg, if tiny-allocated, could end up co-located with the object with the cleanup attached! Oops... For #72949. Change-Id: Icbe0112f7dcfc63f35c66cf713216796a70121ce Reviewed-on: https://go-review.googlesource.com/c/go/+/662037 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-05-20runtime: prevent unnecessary zeroing of large objects with pointersMichael Anthony Knyszek
CL 614257 refactored mallocgc but lost an optimization: if a span for a large object is already backed by memory fresh from the OS (and thus zeroed), we don't need to zero it. CL 614257 unconditionally zeroed spans for large objects that contain pointers. This change restores the optimization from before CL 614257, which seems to matter in some real-world programs. While we're here, let's also fix a hole with the garbage collector being able to observe uninitialized memory of the large object is observed by the conservative scanner before being published. The gory details are in a comment in heapSetTypeLarge. In short, this change makes span.largeType an atomic variable, such that the GC can only observe initialized memory if span.largeType != nil. Fixes #72991. Change-Id: I2048aeb220ab363d252ffda7d980b8788e9674dc Reviewed-on: https://go-review.googlesource.com/c/go/+/659956 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
2025-05-20runtime: only update freeIndexForScan outside of the mark phaseMichael Anthony Knyszek
Currently, it's possible for asynchronous preemption to observe a partially initialized object. The sequence of events goes like this: - The GC is in the mark phase. - Thread T1 is allocating object O1. - Thread T1 zeroes the allocation, runs the publication barrier, and updates freeIndexForScan. It has not yet updated the mark bit on O1. - Thread T2 is conservatively scanning some stack frame. That stack frame has a dead pointer with the same address as O1. - T2 picks up the pointer, checks isFree (which checks freeIndexForScan without an import barrier), and sees that O1 is allocated. It marks and queues O1. - T2 then goes to scan O1, and observes uninitialized memory. Although a publication barrier was executed, T2 did not have an import barrier. T2 may thus observe T1's writes to zero the object out-of-order with the write to freeIndexForScan. Normally this would be impossible if T2 got a pointer to O1 from somewhere written by T1. The publication barrier guarantees that if the read side is data-dependent on the write side then we'd necessarily observe all writes to O1 before T1 published it. However, T2 got the pointer 'out of thin air' by scanning a stack frame with a dead pointer on it. One fix to this problem would be to add the import barrier in the conservative scanner. We would then also need to put freeIndexForScan behind the publication barrier, or make the write to freeIndexForScan exactly that barrier. However, there's a simpler way. We don't actually care if conservative scanning observes a stale freeIndexForScan during the mark phase. Newly-allocated memory is always marked at the point of allocation (the allocate-black policy part of the GC's design). So it doesn't actually matter that if the garbage collector scans that memory or not. This change modifies the allocator to only update freeIndexForScan outside the mark phase. This means freeIndexForScan is essentially a snapshot of freeindex at the point the mark phase started. Because there's no more race between conservative scanning and newly-allocated objects, the complicated scenario above is no longer a possibility. One thing we do have to be careful of is other callers of isFree. Previously freeIndexForScan would always track freeindex, now it no longer does. This change thus introduces isFreeOrNewlyAllocated which is used by the conservative scanner, and uses freeIndexForScan. Meanwhile isFree goes back to using freeindex like it used to. This change also documents the requirement on isFree that the caller must have obtained the pointer not 'out of thin air' but after the object was published. isFree is not currently used anywhere particularly sensitive (heap dump and checkmark mode, where the world is stopped in both cases) so using freeindex is both conceptually simple and also safe. Change-Id: If66b8c536b775971203fb4358c17d711c2944723 Reviewed-on: https://go-review.googlesource.com/c/go/+/672340 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-19runtime: move atoi to internal/runtime/strconvMichael Pratt
Moving to a smaller package allows its use in other internal/runtime packages. This isn't internal/strconvlite since it can't be used directly by strconv. For #73193. Change-Id: I6a6a636c9c8b3f06b5fd6c07fe9dd5a7a37d1429 Reviewed-on: https://go-review.googlesource.com/c/go/+/672697 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2025-04-24runtime: fix tag pointers on aixKeith Randall
Clean up tagged pointers a bit. I got the shifts wrong for the weird aix case. Change-Id: I21449fd5973f4651fd1103d3b8be9c2b9b93a490 Reviewed-on: https://go-review.googlesource.com/c/go/+/667715 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-23runtime: move some malloc constants to internal/runtime/gcMichael Anthony Knyszek
These constants are needed by some future generator programs. Change-Id: I5dccd009cbb3b2f321523bc0d8eaeb4c82e5df81 Reviewed-on: https://go-review.googlesource.com/c/go/+/655276 Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-23runtime: move sizeclass defs to new package internal/runtime/gcMichael Anthony Knyszek
We will want to reference these definitions from new generator programs, and this is a good opportunity to cleanup all these old C-style names. Change-Id: Ifb06f0afc381e2697e7877f038eca786610c96de Reviewed-on: https://go-review.googlesource.com/c/go/+/655275 Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-04-02runtime: simplify needzero logickhr@golang.org
We always need to zero allocations with pointers in them. So we don't need some of the mallocs to take a needzero argument. Change-Id: Ideaa7b738873ba6a93addb5169791b42e2baad7c Reviewed-on: https://go-review.googlesource.com/c/go/+/660455 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-03-12runtime: remove nextSampleNoFP from plan9Russ Cox
Plan 9 can use floating point now. Change-Id: If721b243daa31853609cb3d2c535d86c106a1ee1 Reviewed-on: https://go-review.googlesource.com/c/go/+/655879 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Russ Cox <rsc@golang.org>
2025-03-04runtime: decorate anonymous memory mappingsLénaïc Huard
Leverage the prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ...) API to name the anonymous memory areas. This API has been introduced in Linux 5.17 to decorate the anonymous memory areas shown in /proc/<pid>/maps. This is already used by glibc. See: * https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c;h=27dfd1eb907f4615b70c70237c42c552bb4f26a8;hb=HEAD#l2434 * https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/setvmaname.c;h=ea93a5ffbebc9e5a7e32a297138f465724b4725f;hb=HEAD#l63 This can be useful when investigating the memory consumption of a multi-language program. On a 100% Go program, pprof profiler can be used to profile the memory consumption of the program. But pprof is only aware of what happens within the Go world. On a multi-language program, there could be a doubt about whether the suspicious extra-memory consumption comes from the Go part or the native part. With this change, the following Go program: package main import ( "fmt" "log" "os" ) /* #include <stdlib.h> void f(void) { (void)malloc(1024*1024*1024); } */ import "C" func main() { C.f() data, err := os.ReadFile("/proc/self/maps") if err != nil { log.Fatal(err) } fmt.Println(string(data)) } produces this output: $ GLIBC_TUNABLES=glibc.mem.decorate_maps=1 ~/doc/devel/open-source/go/bin/go run . 00400000-00402000 r--p 00000000 00:21 28451768 /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name 00402000-004a4000 r-xp 00002000 00:21 28451768 /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name 004a4000-00574000 r--p 000a4000 00:21 28451768 /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name 00574000-00575000 r--p 00173000 00:21 28451768 /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name 00575000-00580000 rw-p 00174000 00:21 28451768 /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name 00580000-005a4000 rw-p 00000000 00:00 0 2e075000-2e096000 rw-p 00000000 00:00 0 [heap] c000000000-c000400000 rw-p 00000000 00:00 0 [anon: Go: heap] c000400000-c004000000 ---p 00000000 00:00 0 [anon: Go: heap reservation] 777f40000000-777f40021000 rw-p 00000000 00:00 0 [anon: glibc: malloc arena] 777f40021000-777f44000000 ---p 00000000 00:00 0 777f44000000-777f44021000 rw-p 00000000 00:00 0 [anon: glibc: malloc arena] 777f44021000-777f48000000 ---p 00000000 00:00 0 777f48000000-777f48021000 rw-p 00000000 00:00 0 [anon: glibc: malloc arena] 777f48021000-777f4c000000 ---p 00000000 00:00 0 777f4c000000-777f4c021000 rw-p 00000000 00:00 0 [anon: glibc: malloc arena] 777f4c021000-777f50000000 ---p 00000000 00:00 0 777f50000000-777f50021000 rw-p 00000000 00:00 0 [anon: glibc: malloc arena] 777f50021000-777f54000000 ---p 00000000 00:00 0 777f55afb000-777f55afc000 ---p 00000000 00:00 0 777f55afc000-777f562fc000 rw-p 00000000 00:00 0 [anon: glibc: pthread stack: 216378] 777f562fc000-777f562fd000 ---p 00000000 00:00 0 777f562fd000-777f56afd000 rw-p 00000000 00:00 0 [anon: glibc: pthread stack: 216377] 777f56afd000-777f56afe000 ---p 00000000 00:00 0 777f56afe000-777f572fe000 rw-p 00000000 00:00 0 [anon: glibc: pthread stack: 216376] 777f572fe000-777f572ff000 ---p 00000000 00:00 0 777f572ff000-777f57aff000 rw-p 00000000 00:00 0 [anon: glibc: pthread stack: 216375] 777f57aff000-777f57b00000 ---p 00000000 00:00 0 777f57b00000-777f58300000 rw-p 00000000 00:00 0 [anon: glibc: pthread stack: 216374] 777f58300000-777f58400000 rw-p 00000000 00:00 0 [anon: Go: page alloc index] 777f58400000-777f5a400000 rw-p 00000000 00:00 0 [anon: Go: heap index] 777f5a400000-777f6a580000 ---p 00000000 00:00 0 [anon: Go: scavenge index] 777f6a580000-777f6a581000 rw-p 00000000 00:00 0 [anon: Go: scavenge index] 777f6a581000-777f7a400000 ---p 00000000 00:00 0 [anon: Go: scavenge index] 777f7a400000-777f8a580000 ---p 00000000 00:00 0 [anon: Go: page summary] 777f8a580000-777f8a581000 rw-p 00000000 00:00 0 [anon: Go: page alloc] 777f8a581000-777f9c430000 ---p 00000000 00:00 0 [anon: Go: page summary] 777f9c430000-777f9c431000 rw-p 00000000 00:00 0 [anon: Go: page alloc] 777f9c431000-777f9e806000 ---p 00000000 00:00 0 [anon: Go: page summary] 777f9e806000-777f9e807000 rw-p 00000000 00:00 0 [anon: Go: page alloc] 777f9e807000-777f9ec00000 ---p 00000000 00:00 0 [anon: Go: page summary] 777f9ec36000-777f9ecb6000 rw-p 00000000 00:00 0 [anon: Go: immortal metadata] 777f9ecb6000-777f9ecc6000 rw-p 00000000 00:00 0 [anon: Go: gc bits] 777f9ecc6000-777f9ecd6000 rw-p 00000000 00:00 0 [anon: Go: allspans array] 777f9ecd6000-777f9ece7000 rw-p 00000000 00:00 0 [anon: Go: immortal metadata] 777f9ece7000-777f9ed67000 ---p 00000000 00:00 0 [anon: Go: page summary] 777f9ed67000-777f9ed68000 rw-p 00000000 00:00 0 [anon: Go: page alloc] 777f9ed68000-777f9ede7000 ---p 00000000 00:00 0 [anon: Go: page summary] 777f9ede7000-777f9ee07000 rw-p 00000000 00:00 0 [anon: Go: page alloc] 777f9ee07000-777f9ee0a000 rw-p 00000000 00:00 0 [anon: glibc: loader malloc] 777f9ee0a000-777f9ee2e000 r--p 00000000 00:21 48158213 /usr/lib/libc.so.6 777f9ee2e000-777f9ef9f000 r-xp 00024000 00:21 48158213 /usr/lib/libc.so.6 777f9ef9f000-777f9efee000 r--p 00195000 00:21 48158213 /usr/lib/libc.so.6 777f9efee000-777f9eff2000 r--p 001e3000 00:21 48158213 /usr/lib/libc.so.6 777f9eff2000-777f9eff4000 rw-p 001e7000 00:21 48158213 /usr/lib/libc.so.6 777f9eff4000-777f9effc000 rw-p 00000000 00:00 0 777f9effc000-777f9effe000 rw-p 00000000 00:00 0 [anon: glibc: loader malloc] 777f9f00a000-777f9f04a000 rw-p 00000000 00:00 0 [anon: Go: immortal metadata] 777f9f04a000-777f9f04c000 r--p 00000000 00:00 0 [vvar] 777f9f04c000-777f9f04e000 r--p 00000000 00:00 0 [vvar_vclock] 777f9f04e000-777f9f050000 r-xp 00000000 00:00 0 [vdso] 777f9f050000-777f9f051000 r--p 00000000 00:21 48158204 /usr/lib/ld-linux-x86-64.so.2 777f9f051000-777f9f07a000 r-xp 00001000 00:21 48158204 /usr/lib/ld-linux-x86-64.so.2 777f9f07a000-777f9f085000 r--p 0002a000 00:21 48158204 /usr/lib/ld-linux-x86-64.so.2 777f9f085000-777f9f087000 r--p 00034000 00:21 48158204 /usr/lib/ld-linux-x86-64.so.2 777f9f087000-777f9f088000 rw-p 00036000 00:21 48158204 /usr/lib/ld-linux-x86-64.so.2 777f9f088000-777f9f089000 rw-p 00000000 00:00 0 7ffc7bfa7000-7ffc7bfc8000 rw-p 00000000 00:00 0 [stack] ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall] The anonymous memory areas are now labelled so that we can see which ones have been allocated by the Go runtime versus which ones have been allocated by the glibc. Fixes #71546 Change-Id: I304e8b4dd7f2477a6da794fd44e9a7a5354e4bf4 Reviewed-on: https://go-review.googlesource.com/c/go/+/646095 Auto-Submit: Alan Donovan <adonovan@google.com> Commit-Queue: Alan Donovan <adonovan@google.com> Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-02-03runtime: fix GODEBUG=gccheckmark=1 and add smoke testMichael Anthony Knyszek
This change fixes GODEBUG=gccheckmark=1 which seems to have bit-rotted. Because the root jobs weren't being reset, it wasn't doing anything. Then, it turned out that checkmark mode would queue up noscan objects in workbufs, which caused it to fail. Then it turned out checkmark mode was broken with user arenas, since their heap arenas are not registered anywhere. Then, it turned out that checkmark mode could just not run properly if the goroutine's preemption flag was set (since sched.gcwaiting is true during the STW). And lastly, it turned out that async preemption could cause erroneous checkmark failures. This change fixes all these issues and adds a simple smoke test to dist to run the runtime tests under gccheckmark, which exercises all of these issues. Fixes #69074. Fixes #69377. Fixes #69376. Change-Id: Iaa0bb7b9e63ed4ba34d222b47510d6292ce168bc Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest Reviewed-on: https://go-review.googlesource.com/c/go/+/608915 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-11-18runtime: get rid of gc programs for typesKeith Randall
Instead, have the runtime build the gc bitmaps on demand at runtime. Change-Id: If7a245bc62e4bce3ce80972410b0ed307d921abe Reviewed-on: https://go-review.googlesource.com/c/go/+/616255 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com>
2024-10-30cmd/compile,runtime: add indirect key/elem to swissmapMichael Pratt
We use the same heuristics as existing maps. For #54766. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap Change-Id: I44bb51483cae2c1714717f1b501850fb9e55a39a Reviewed-on: https://go-review.googlesource.com/c/go/+/616461 Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-10-25runtime: fix mallocgc for asanMichael Anthony Knyszek
This change finally fully fixes mallocgc for asan after the recent refactoring. Here is everything that changed: Fix the accounting for the alloc header; large objects don't have them. Mask out extra bits set from unrolling the bitmap for slice backing stores in writeHeapBitsSmall. The redzone in asan mode makes it so that dataSize is no longer an exact multiple of typ.Size_ in this case (a new assumption I have recently discovered) but we didn't mask out any extra bits, so we'd accidentally set bits in other allocations. Oops. Move the initHeapBits optimization for the 8-byte scan sizeclass on 64-bit platforms up to mallocgc, out from writeHeapBitsSmall. So, this actually caused a problem with asan when the optimization first landed, but we missed it. The issue was then masked once we started passing the redzone down into writeHeapBitsSmall, since the optimization would no longer erroneously fire on asan. What happened was that dataSize would be 8 (because that was the user-provided alloc size) so we'd skip writing heap bits, but it would turn out the redzone bumped the size class, so we'd actually *have* to write the heap bits for that size class. This is not really a problem now *but* it caused problems for me when debugging, since I would try to remove the red zone from dataSize and this would trigger this bug again. Ultimately, this whole situation is confusing because the check in writeHeapBitsSmall is *not* the same as the check in initHeapBits. By moving this check up to mallocgc, we can make the checks align better by matching on the sizeclass, so this should be less error-prone in the future. Change-Id: I1e9819223be23f722f3bf21e63e812f5fb557194 Reviewed-on: https://go-review.googlesource.com/c/go/+/622041 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-10-25runtime: reserve fewer memory for aligned reservation on sbrk systemsCherry Mui
Sometimes the runtime needs to reserve some memory with a large alignment, which the OS usually won't directly satisfy. So, it asks size+align bytes instead, and frees the unaligned portions. On sbrk systems, this doesn't work that well, as freeing the tail portion doesn't really free the memory to the OS. Instead, we could simply round the current break up, then reserve the given size, without wasting the tail portion. Also, don't create heap arena hints on sbrk systems. We can only grow the break sequentially, and reserving specific addresses would not succeed anyway. For #69018. Change-Id: Iadc2c54d62b00ad7befa5bbf71146523483a8c47 Reviewed-on: https://go-review.googlesource.com/c/go/+/621715 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2024-10-24runtime: fix ASAN poison calculation in mallocgcMichael Anthony Knyszek
A previous CL broke the ASAN poisoning calculation in mallocgc by not taking into account a possible allocation header, so the beginning of the following allocation could have been poisoned. This mostly isn't a problem, actually, since the following slot would usually just have an allocation header in it that programs shouldn't be touching anyway, but if we're going a word-past-the-end at the end of a span, we could be poisoning a valid heap allocation. Change-Id: I76a4f59bcef01af513a1640c4c212c0eb6be85b3 Reviewed-on: https://go-review.googlesource.com/c/go/+/622295 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2024-10-23runtime: fix typo in error messagechangwang ma
Change-Id: I27bf98e84545746d90948dd06c4a7bd70782c49d Reviewed-on: https://go-review.googlesource.com/c/go/+/621895 Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com>
2024-10-21runtime: execute publicationBarrier in noscan case for delayed zeroingMichael Anthony Knyszek
This is a peace-of-mind change to make sure that delayed-zeroed memory (in the large alloc case) is globally visible from the moment the allocation is published back to the caller. The way it's written right now is good enough for the garbage collector (we already have a publication barrier for a nil span.largeType, so the GC will ignore the noscan span) but this might matter for user code on weak memory architectures. Change-Id: I06ac9b95863074e5f09382629083b19bfa87fdb8 Reviewed-on: https://go-review.googlesource.com/c/go/+/619036 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>
2024-10-21runtime: specialize heapSetTypeMichael Anthony Knyszek
Last CL we separated mallocgc into several specialized paths. Let's split up heapSetType too. This will make the specialized heapSetType functions inlineable and cut out some branches as well as a function call. Microbenchmark results at this point in the stack: │ before.out │ after-5.out │ │ sec/op │ sec/op vs base │ Malloc8-4 13.52n ± 3% 12.15n ± 2% -10.13% (p=0.002 n=6) Malloc16-4 21.49n ± 2% 18.32n ± 4% -14.75% (p=0.002 n=6) MallocTypeInfo8-4 27.12n ± 1% 18.64n ± 2% -31.30% (p=0.002 n=6) MallocTypeInfo16-4 28.71n ± 3% 21.63n ± 5% -24.65% (p=0.002 n=6) geomean 21.81n 17.31n -20.64% Change-Id: I5de9ac5089b9eb49bf563af2a74e6dc564420e05 Reviewed-on: https://go-review.googlesource.com/c/go/+/614795 Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com>
2024-10-21runtime: refactor mallocgc into several independent codepathsMichael Anthony Knyszek
Right now mallocgc is a monster of a function. In real programs, we see that a substantial amount of time in mallocgc is spent in mallocgc itself. It's very branch-y, holds a lot of state, and handles quite a few disparate cases, trying to merge them together. This change breaks apart mallocgc into separate, leaner functions. There's some duplication now, but there are a lot of branches that can be pruned as a result. There's definitely still more we can do here. heapSetType can be inlined and broken down for each case, since its internals roughly map to each case anyway (done in a follow-up CL). We can probably also do more with the size class lookups, since we know more about the size of the object in each case than before. Below are the savings for the full stack up until now. │ after-3.out │ after-4.out │ │ sec/op │ sec/op vs base │ Malloc8-4 13.32n ± 2% 12.17n ± 1% -8.63% (p=0.002 n=6) Malloc16-4 21.64n ± 3% 19.38n ± 10% -10.47% (p=0.002 n=6) MallocTypeInfo8-4 23.15n ± 2% 19.91n ± 2% -14.00% (p=0.002 n=6) MallocTypeInfo16-4 25.86n ± 4% 22.48n ± 5% -13.11% (p=0.002 n=6) MallocLargeStruct-4 270.0n ± ∞ ¹ geomean 20.38n 30.97n -11.58% Change-Id: I681029c0b442f9221c4429950626f06299a5cfe4 Reviewed-on: https://go-review.googlesource.com/c/go/+/614257 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2024-10-21runtime: break out the debug.malloc codepaths into functionsMichael Anthony Knyszek
This change breaks out the debug.malloc codepaths into dedicated functions, both for making mallocgc easier to read, and to reduce the function's size (currently all that code is inlined and really doesn't need to be). This is a microoptimization that on its own changes very little, but together with other optimizations and a breaking up of the various malloc paths will matter all together ("death by a thousand cuts"). Change-Id: I30b3ab4a1f349ba85b4a1b5b2c399abcdfe4844f Reviewed-on: https://go-review.googlesource.com/c/go/+/617879 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-10-21runtime: move debug checks behind constant flag in mallocgcMichael Anthony Knyszek
These debug checks are very occasionally helpful, but they do cost real time. The biggest issue seems to be the bloat of mallocgc due to the "throw" paths. Overall, after some follow-ups, this change cuts about 1ns off of the mallocgc fast path. This is a microoptimization that on its own changes very little, but together with other optimizations and a breaking up of the various malloc paths will matter all together ("death by a thousand cuts"). Change-Id: I07c4547ad724b9f94281320846677fb558957721 Reviewed-on: https://go-review.googlesource.com/c/go/+/617878 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2024-10-21runtime: rename shouldhelpgc to checkGCTrigger in mallocgcMichael Anthony Knyszek
shouldhelpgc is a very unhelpful name, because it has nothing to do with assists and solely to do with GC triggering. Name it checkGCTrigger instead, which is much clearer. Change-Id: Id38debd424ddb397376c0cea6e74b3fe94002f71 Reviewed-on: https://go-review.googlesource.com/c/go/+/617877 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>
2024-10-21runtime: recompute assistG before and after mallocMichael Anthony Knyszek
This change stops tracking assistG across malloc to reduce number of slots the compiler must keep track of in mallocgc, which adds to register pressure. It also makes the call to deductAssistCredit only happen if the GC is running. This is a microoptimization that on its own changes very little, but together with other optimizations and a breaking up of the various malloc paths will matter all together ("death by a thousand cuts"). Change-Id: I4cfac7f3e8e873ba66ff3b553072737a4707e2c2 Reviewed-on: https://go-review.googlesource.com/c/go/+/617876 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2024-10-21runtime: use wb flag instead of gcphase for allocate-black checkMichael Anthony Knyszek
This is an allocator microoptimization. There's no reason to check gcphase in general, since it's mostly for debugging anyway. writeBarrier.enabled is set in all the same cases here, and we force one fewer cache line (probably) to be touched during malloc. Conceptually, it also makes a bit more sense. The allocate-black policy is partly informed by the write barrier design. Change-Id: Ia5ff593d64c29cf7f4d1bced3204056566444a98 Reviewed-on: https://go-review.googlesource.com/c/go/+/617875 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>
2024-10-21runtime: simplify mem profile checking in mallocgcMichael Anthony Knyszek
Checking whether the current allocation needs to be profiled is currently branch-y and weirdly a lot of code. The branches are generally predictable, but it's a surprising number of instructions. Part of the problem is that MemProfileRate is just a global that can be set at any time, so we need to load it and check certain settings explicitly. In an ideal world, we would just always subtract from nextSample and have a single branch to take the slow path if we subtract below zero. If MemProfileRate were a function, we could trash all the nextSample values intentionally in each mcache. This would be slow, but MemProfileRate changes rarely while the malloc hot path is well, hot. Unfortunate... Although this ideal world is, AFAICT, impossible, we can still get close. If we cache the value of MemProfileRate in each mcache, then we can force malloc to take the slow path whenever MemProfileRate changes. This does require two additional loads, but crucially, these loads are independent of everything else in mallocgc. Furthermore, the branch dependent on those loads is incredibly predictable in practice. This CL on its own has little-to-no impact on mallocgc. But this codepath is going to be duplicated in several places in the next CL, so it'll pay to simplify it. Also, we're very much trying to remedy a death-by-a-thousand-cuts situation, and malloc is currently still kind of a monster -- it will not help if mallocgc isn't really streamlined itself. Lastly, there's a nice property now that all nextSample values get immediately re-sampled when MemProfileRate changes. Change-Id: I6443d0cf9bd7861595584442b675ac1be8ea3455 Reviewed-on: https://go-review.googlesource.com/c/go/+/615815 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2024-10-21runtime: don't call span.heapBits in writeHeapBitsSmallMichael Anthony Knyszek
For whatever reason, span.heapBits is kind of slow. It accounts for about a quarter of the cost of writeHeapBitsSmall, which is absurd. We get a nice speed improvement for small allocations by eliminating this call. │ before │ after │ │ sec/op │ sec/op vs base │ MallocTypeInfo16-4 29.47n ± 1% 27.02n ± 1% -8.31% (p=0.002 n=6) Change-Id: I6270e26902e5a9254cf1503fac81c3c799c59d6a Reviewed-on: https://go-review.googlesource.com/c/go/+/614255 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2024-10-08internal/runtime/maps: initial swiss table map implementationMichael Pratt
Add a new package that will contain a new "Swiss Table" (https://abseil.io/about/design/swisstables) map implementation, which is intended to eventually replace the existing runtime map implementation. This implementation is based on the fabulous github.com/cockroachdb/swiss package contributed by Peter Mattis. This CL adds an hash map implementation. It supports all the core operations, but does not have incremental growth. For #54766. Change-Id: I52cf371448c3817d471ddb1f5a78f3513565db41 Reviewed-on: https://go-review.googlesource.com/c/go/+/582415 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2024-09-10runtime: update documentation for ios addr bitsWuGuangyao
After this merge: https://go-review.googlesource.com/c/go/+/344401, ios/arm64 was treated as a 64 bit system and the addr bits of ios/arm64 was set to 40 Change-Id: I32d72787d20a3cf952b036e3e887cf5bae2273d8 GitHub-Last-Rev: 8917029fddc4d187b24fad8245fd7eed2bd570ba GitHub-Pull-Request: golang/go#69343 Reviewed-on: https://go-review.googlesource.com/c/go/+/610856 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Tim King <taking@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-07-23runtime,internal: move runtime/internal/sys to internal/runtime/sysDavid Chase
Cleanup and friction reduction For #65355. Change-Id: Ia14c9dc584a529a35b97801dd3e95b9acc99a511 Reviewed-on: https://go-review.googlesource.com/c/go/+/600436 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>
2024-07-23runtime,internal: move runtime/internal/math to internal/runtime/mathDavid Chase
Cleanup and friction reduction. Updates #65355. Change-Id: I6c4fcd409d044c00d16561fe9ed2257877d73f5b Reviewed-on: https://go-review.googlesource.com/c/go/+/600435 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>
2024-05-29all: document legacy //go:linkname for final round of modulesRuss Cox
Add linknames for most modules with ≥50 dependents. Add linknames for a few other modules that we know are important but are below 50. Remove linknames from badlinkname.go that do not merit inclusion (very small number of dependents). We can add them back later if the need arises. Fixes #67401. (For now.) Change-Id: I1e49fec0292265256044d64b1841d366c4106002 Reviewed-on: https://go-review.googlesource.com/c/go/+/587756 Auto-Submit: Russ Cox <rsc@golang.org> TryBot-Bypass: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-05-29all: document legacy //go:linkname for modules with ≥100 dependentsRuss Cox
For #67401. Change-Id: I015408a3f437c1733d97160ef2fb5da6d4efcc5c Reviewed-on: https://go-review.googlesource.com/c/go/+/587598 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org>
2024-05-23all: document legacy //go:linkname for modules with ≥200 dependentsRuss Cox
Ignored these linknames which have not worked for a while: github.com/xtls/xray-core: context.newCancelCtx removed in CL 463999 (Feb 2023) github.com/u-root/u-root: funcPC removed in CL 513837 (Jul 2023) tinygo.org/x/drivers: net.useNetdev never existed For #67401. Change-Id: I9293f4ef197bb5552b431de8939fa94988a060ce Reviewed-on: https://go-review.googlesource.com/c/go/+/587576 Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-23all: document legacy //go:linkname for modules with ≥500 dependentsRuss Cox
For #67401. Change-Id: I7dd28c3b01a1a647f84929d15412aa43ab0089ee Reviewed-on: https://go-review.googlesource.com/c/go/+/587575 Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-23all: document legacy //go:linkname for modules with ≥1,000 dependentsRuss Cox
For #67401. Change-Id: If23a2c07e3dd042a3c439da7088437a330b9caa4 Reviewed-on: https://go-review.googlesource.com/c/go/+/587222 Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-05-23all: document legacy //go:linkname for modules with ≥2,000 dependentsRuss Cox
For #67401. Change-Id: I3ae93042dffd0683b7e6d6225536ae667749515b Reviewed-on: https://go-review.googlesource.com/c/go/+/587221 Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-23all: document legacy //go:linkname for modules with ≥20,000 dependentsRuss Cox
For #67401. Change-Id: Icc10ede72547d8020c0ba45e89d954822a4b2455 Reviewed-on: https://go-review.googlesource.com/c/go/+/587218 Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-22all: document legacy //go:linkname for modules with ≥50,000 dependentsRuss Cox
Note that this depends on the revert of CL 581395 to move zeroVal back. For #67401. Change-Id: I507c27c2404ad1348aabf1ffa3740e6b1957495b Reviewed-on: https://go-review.googlesource.com/c/go/+/587217 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-05-22all: document legacy //go:linkname for modules with ≥100,000 dependentsRuss Cox
For #67401. Change-Id: I51f5b561ee11eb242e3b1585d591281d0df4fc24 Reviewed-on: https://go-review.googlesource.com/c/go/+/587215 Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-08runtime: move profiling pc buffers to mFelix Geisendörfer
Move profiling pc buffers from being stack allocated to an m field. This is motivated by the next patch, which will increase the default stack depth to 128, which might lead to undesirable stack growth for goroutines that produce profiling events. Additionally, this change paves the way to make the stack depth configurable via GODEBUG. Change-Id: Ifa407f899188e2c7c0a81de92194fdb627cb4b36 Reviewed-on: https://go-review.googlesource.com/c/go/+/574699 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2024-05-08runtime: add traceallocfree GODEBUG for alloc/free events in tracesMichael Anthony Knyszek
This change adds expensive alloc/free events to traces, guarded by a GODEBUG that can be set at run time by mutating the GODEBUG environment variable. This supersedes the alloc/free trace deleted in a previous CL. There are two parts to this CL. The first part is adding a mechanism for exposing experimental events through the tracer and trace parser. This boils down to a new ExperimentalEvent event type in the parser API which simply reveals the raw event data for the event. Each experimental event can also be associated with "experimental data" which is associated with a particular generation. This experimental data is just exposed as a bag of bytes that supplements the experimental events. In the runtime, this CL organizes experimental events by experiment. An experiment is defined by a set of experimental events and a single special batch type. Batches of this special type are exposed through the parser's API as the aforementioned "experimental data". The second part of this CL is defining the AllocFree experiment, which defines 9 new experimental events covering heap object alloc/frees, span alloc/frees, and goroutine stack alloc/frees. It also generates special batches that contain a type table: a mapping of IDs to type information. Change-Id: I965c00e3dcfdf5570f365ff89d0f70d8aeca219c Reviewed-on: https://go-review.googlesource.com/c/go/+/583377 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-08runtime: remove allocfreetraceMichael Anthony Knyszek
allocfreetrace prints all allocations and frees to stderr. It's not terribly useful because it has a really huge overhead, making it not feasible to use except for the most trivial programs. A follow-up CL will replace it with something that is both more thorough and also lower overhead. Change-Id: I1d668fee8b6aaef5251a5aea3054ec2444d75eb6 Reviewed-on: https://go-review.googlesource.com/c/go/+/583376 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2024-04-09runtime: make zeroing of large objects containing pointers preemptibleMichael Anthony Knyszek
This change makes it possible for the runtime to preempt the zeroing of large objects that contain pointers. It turns out this is fairly straightforward with allocation headers, since we can just temporarily tell the GC that there's nothing to scan for a large object with a single pointer write (as opposed to trying to zero a whole bunch of bits, as we would've had to do once upon a time). Fixes #31222. Change-Id: I10d0dcfa3938c383282a3eb485a6f00070d07bd2 Reviewed-on: https://go-review.googlesource.com/c/go/+/577495 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-04-09runtime: remove the allocheaders GOEXPERIMENTMichael Anthony Knyszek
This change removes the allocheaders, deleting all the old code and merging mbitmap_allocheaders.go back into mbitmap.go. This change also deletes the SetType benchmarks which were already broken in the new GOEXPERIMENT (it's harder to set up than before). We weren't really watching these benchmarks at all, and they don't provide additional test coverage. Change-Id: I135497201c3259087c5cd3722ed3fbe24791d25d Reviewed-on: https://go-review.googlesource.com/c/go/+/567200 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2024-03-25runtime: migrate internal/atomic to internal/runtimeAndy Pan
For #65355 Change-Id: I65dd090fb99de9b231af2112c5ccb0eb635db2be Reviewed-on: https://go-review.googlesource.com/c/go/+/560155 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ibrahim Bazoka <ibrahimbazoka729@gmail.com> Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
2024-03-04runtime: use .Pointers() instead of manual checkingPouriya
Change-Id: Ib78c1513616089f4942297cd17212b1b11871fd5 GitHub-Last-Rev: f97fe5b5bffffe25dc31de7964588640cb70ec41 GitHub-Pull-Request: golang/go#65819 Reviewed-on: https://go-review.googlesource.com/c/go/+/565515 Reviewed-by: Jorropo <jorropo.pgm@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2023-12-05math/rand, math/rand/v2: use ChaCha8 for global randRuss Cox
Move ChaCha8 code into internal/chacha8rand and use it to implement runtime.rand, which is used for the unseeded global source for both math/rand and math/rand/v2. This also affects the calculation of the start point for iteration over very very large maps (when the 32-bit fastrand is not big enough). The benefit is that misuse of the global random number generators in math/rand and math/rand/v2 in contexts where non-predictable randomness is important for security reasons is no longer a security problem, removing a common mistake among programmers who are unaware of the different kinds of randomness. The cost is an extra 304 bytes per thread stored in the m struct plus 2-3ns more per random uint64 due to the more sophisticated algorithm. Using PCG looks like it would cost about the same, although I haven't benchmarked that. Before this, the math/rand and math/rand/v2 global generator was wyrand (https://github.com/wangyi-fudan/wyhash). For math/rand, using wyrand instead of the Mitchell/Reeds/Thompson ALFG was justifiable, since the latter was not any better. But for math/rand/v2, the global generator really should be at least as good as one of the well-studied, specific algorithms provided directly by the package, and it's not. (Wyrand is still reasonable for scheduling and cache decisions.) Good randomness does have a cost: about twice wyrand. Also rationalize the various runtime rand references. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ bbb48afeb7.amd64 │ 5cf807d1ea.amd64 │ │ sec/op │ sec/op vs base │ ChaCha8-32 1.862n ± 2% 1.861n ± 2% ~ (p=0.825 n=20) PCG_DXSM-32 1.471n ± 1% 1.460n ± 2% ~ (p=0.153 n=20) SourceUint64-32 1.636n ± 2% 1.582n ± 1% -3.30% (p=0.000 n=20) GlobalInt64-32 2.087n ± 1% 3.663n ± 1% +75.54% (p=0.000 n=20) GlobalInt64Parallel-32 0.1042n ± 1% 0.2026n ± 1% +94.48% (p=0.000 n=20) GlobalUint64-32 2.263n ± 2% 3.724n ± 1% +64.57% (p=0.000 n=20) GlobalUint64Parallel-32 0.1019n ± 1% 0.1973n ± 1% +93.67% (p=0.000 n=20) Int64-32 1.771n ± 1% 1.774n ± 1% ~ (p=0.449 n=20) Uint64-32 1.863n ± 2% 1.866n ± 1% ~ (p=0.364 n=20) GlobalIntN1000-32 3.134n ± 3% 4.730n ± 2% +50.95% (p=0.000 n=20) IntN1000-32 2.489n ± 1% 2.489n ± 1% ~ (p=0.683 n=20) Int64N1000-32 2.521n ± 1% 2.516n ± 1% ~ (p=0.394 n=20) Int64N1e8-32 2.479n ± 1% 2.478n ± 2% ~ (p=0.743 n=20) Int64N1e9-32 2.530n ± 2% 2.514n ± 2% ~ (p=0.193 n=20) Int64N2e9-32 2.501n ± 1% 2.494n ± 1% ~ (p=0.616 n=20) Int64N1e18-32 3.227n ± 1% 3.205n ± 1% ~ (p=0.101 n=20) Int64N2e18-32 3.647n ± 1% 3.599n ± 1% ~ (p=0.019 n=20) Int64N4e18-32 5.135n ± 1% 5.069n ± 2% ~ (p=0.034 n=20) Int32N1000-32 2.657n ± 1% 2.637n ± 1% ~ (p=0.180 n=20) Int32N1e8-32 2.636n ± 1% 2.636n ± 1% ~ (p=0.763 n=20) Int32N1e9-32 2.660n ± 2% 2.638n ± 1% ~ (p=0.358 n=20) Int32N2e9-32 2.662n ± 2% 2.618n ± 2% ~ (p=0.064 n=20) Float32-32 2.272n ± 2% 2.239n ± 2% ~ (p=0.194 n=20) Float64-32 2.272n ± 1% 2.286n ± 2% ~ (p=0.763 n=20) ExpFloat64-32 3.762n ± 1% 3.744n ± 1% ~ (p=0.171 n=20) NormFloat64-32 3.706n ± 1% 3.655n ± 2% ~ (p=0.066 n=20) Perm3-32 32.93n ± 3% 34.62n ± 1% +5.13% (p=0.000 n=20) Perm30-32 202.9n ± 1% 204.0n ± 1% ~ (p=0.482 n=20) Perm30ViaShuffle-32 115.0n ± 1% 114.9n ± 1% ~ (p=0.358 n=20) ShuffleOverhead-32 112.8n ± 1% 112.7n ± 1% ~ (p=0.692 n=20) Concurrent-32 2.107n ± 0% 3.725n ± 1% +76.75% (p=0.000 n=20) goos: darwin goarch: arm64 pkg: math/rand/v2 │ bbb48afeb7.arm64 │ 5cf807d1ea.arm64 │ │ sec/op │ sec/op vs base │ ChaCha8-8 2.480n ± 0% 2.429n ± 0% -2.04% (p=0.000 n=20) PCG_DXSM-8 2.531n ± 0% 2.530n ± 0% ~ (p=0.877 n=20) SourceUint64-8 2.534n ± 0% 2.533n ± 0% ~ (p=0.732 n=20) GlobalInt64-8 2.172n ± 1% 4.794n ± 0% +120.67% (p=0.000 n=20) GlobalInt64Parallel-8 0.4320n ± 0% 0.9605n ± 0% +122.32% (p=0.000 n=20) GlobalUint64-8 2.182n ± 0% 4.770n ± 0% +118.58% (p=0.000 n=20) GlobalUint64Parallel-8 0.4307n ± 0% 0.9583n ± 0% +122.51% (p=0.000 n=20) Int64-8 4.107n ± 0% 4.104n ± 0% ~ (p=0.416 n=20) Uint64-8 4.080n ± 0% 4.080n ± 0% ~ (p=0.052 n=20) GlobalIntN1000-8 2.814n ± 2% 5.643n ± 0% +100.50% (p=0.000 n=20) IntN1000-8 4.141n ± 0% 4.139n ± 0% ~ (p=0.140 n=20) Int64N1000-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.313 n=20) Int64N1e8-8 4.140n ± 0% 4.139n ± 0% ~ (p=0.103 n=20) Int64N1e9-8 4.139n ± 0% 4.140n ± 0% ~ (p=0.761 n=20) Int64N2e9-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.636 n=20) Int64N1e18-8 5.266n ± 0% 5.326n ± 1% +1.14% (p=0.001 n=20) Int64N2e18-8 6.052n ± 0% 6.167n ± 0% +1.90% (p=0.000 n=20) Int64N4e18-8 8.826n ± 0% 9.051n ± 0% +2.55% (p=0.000 n=20) Int32N1000-8 4.127n ± 0% 4.132n ± 0% +0.12% (p=0.000 n=20) Int32N1e8-8 4.126n ± 0% 4.131n ± 0% +0.12% (p=0.000 n=20) Int32N1e9-8 4.127n ± 0% 4.132n ± 0% +0.12% (p=0.000 n=20) Int32N2e9-8 4.132n ± 0% 4.131n ± 0% ~ (p=0.017 n=20) Float32-8 4.109n ± 0% 4.105n ± 0% ~ (p=0.379 n=20) Float64-8 4.107n ± 0% 4.106n ± 0% ~ (p=0.867 n=20) ExpFloat64-8 5.339n ± 0% 5.383n ± 0% +0.82% (p=0.000 n=20) NormFloat64-8 5.735n ± 0% 5.737n ± 1% ~ (p=0.856 n=20) Perm3-8 26.65n ± 0% 26.80n ± 1% +0.58% (p=0.000 n=20) Perm30-8 194.8n ± 1% 197.0n ± 0% +1.18% (p=0.000 n=20) Perm30ViaShuffle-8 156.6n ± 0% 157.6n ± 1% +0.61% (p=0.000 n=20) ShuffleOverhead-8 124.9n ± 0% 125.5n ± 0% +0.52% (p=0.000 n=20) Concurrent-8 2.434n ± 3% 5.066n ± 0% +108.09% (p=0.000 n=20) goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ bbb48afeb7.386 │ 5cf807d1ea.386 │ │ sec/op │ sec/op vs base │ ChaCha8-32 11.295n ± 1% 4.748n ± 2% -57.96% (p=0.000 n=20) PCG_DXSM-32 7.693n ± 1% 7.738n ± 2% ~ (p=0.542 n=20) SourceUint64-32 7.658n ± 2% 7.622n ± 2% ~ (p=0.344 n=20) GlobalInt64-32 3.473n ± 2% 7.526n ± 2% +116.73% (p=0.000 n=20) GlobalInt64Parallel-32 0.3198n ± 0% 0.5444n ± 0% +70.22% (p=0.000 n=20) GlobalUint64-32 3.612n ± 0% 7.575n ± 1% +109.69% (p=0.000 n=20) GlobalUint64Parallel-32 0.3168n ± 0% 0.5403n ± 0% +70.51% (p=0.000 n=20) Int64-32 7.673n ± 2% 7.789n ± 1% ~ (p=0.122 n=20) Uint64-32 7.773n ± 1% 7.827n ± 2% ~ (p=0.920 n=20) GlobalIntN1000-32 6.268n ± 1% 9.581n ± 1% +52.87% (p=0.000 n=20) IntN1000-32 10.33n ± 2% 10.45n ± 1% ~ (p=0.233 n=20) Int64N1000-32 10.98n ± 2% 11.01n ± 1% ~ (p=0.401 n=20) Int64N1e8-32 11.19n ± 2% 10.97n ± 1% ~ (p=0.033 n=20) Int64N1e9-32 11.06n ± 1% 11.08n ± 1% ~ (p=0.498 n=20) Int64N2e9-32 11.10n ± 1% 11.01n ± 2% ~ (p=0.995 n=20) Int64N1e18-32 15.23n ± 2% 15.04n ± 1% ~ (p=0.973 n=20) Int64N2e18-32 15.89n ± 1% 15.85n ± 1% ~ (p=0.409 n=20) Int64N4e18-32 18.96n ± 2% 19.34n ± 2% ~ (p=0.048 n=20) Int32N1000-32 10.46n ± 2% 10.44n ± 2% ~ (p=0.480 n=20) Int32N1e8-32 10.46n ± 2% 10.49n ± 2% ~ (p=0.951 n=20) Int32N1e9-32 10.28n ± 2% 10.26n ± 1% ~ (p=0.431 n=20) Int32N2e9-32 10.50n ± 2% 10.44n ± 2% ~ (p=0.249 n=20) Float32-32 13.80n ± 2% 13.80n ± 2% ~ (p=0.751 n=20) Float64-32 23.55n ± 2% 23.87n ± 0% ~ (p=0.408 n=20) ExpFloat64-32 15.36n ± 1% 15.29n ± 2% ~ (p=0.316 n=20) NormFloat64-32 13.57n ± 1% 13.79n ± 1% +1.66% (p=0.005 n=20) Perm3-32 45.70n ± 2% 46.99n ± 2% +2.81% (p=0.001 n=20) Perm30-32 399.0n ± 1% 403.8n ± 1% +1.19% (p=0.006 n=20) Perm30ViaShuffle-32 349.0n ± 1% 350.4n ± 1% ~ (p=0.909 n=20) ShuffleOverhead-32 322.3n ± 1% 323.8n ± 1% ~ (p=0.410 n=20) Concurrent-32 3.331n ± 1% 7.312n ± 1% +119.50% (p=0.000 n=20) For #61716. Change-Id: Ibdddeed85c34d9ae397289dc899e04d4845f9ed2 Reviewed-on: https://go-review.googlesource.com/c/go/+/516860 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Filippo Valsorda <filippo@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>