From fecfcaa4f68a220f47e2c7c8b65d55906dbf8d46 Mon Sep 17 00:00:00 2001
From: thepudds
Date: Tue, 4 Nov 2025 09:33:17 -0500
Subject: runtime: add runtime.freegc to reduce GC work
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This CL is part of a set of CLs that attempt to reduce how much work
the GC must do. See the design in
https://go.dev/design/74299-runtime-freegc

This CL adds runtime.freegc:

  func freegc(ptr unsafe.Pointer, size uintptr, noscan bool) bool

Memory freed via runtime.freegc is made immediately reusable for the
next allocation in the same size class, without waiting for a GC cycle,
and hence can dramatically reduce pressure on the GC. A sample
microbenchmark included below shows strings.Builder operating roughly
2x faster.

An experimental modification to reflect to use runtime.freegc, and then
using that modified reflect with json/v2, gave reported memory
allocation reductions of -43.7%, -32.9%, -21.9%, -22.0%, and -1.0% for
the 5 official real-world unmarshalling benchmarks from
go-json-experiment/jsonbench by the authors of json/v2, covering the
CanadaGeometry through TwitterStatus datasets.

Note: there is no intent to modify the standard library to have
explicit calls to runtime.freegc, and of course such an ability would
never be exposed to end-user code. Later CLs in this stack teach the
compiler how to automatically insert runtime.freegc calls when it can
prove it is safe to do so. (The reflect modification and other
experimental changes to the standard library were just that --
experiments. It was very helpful while initially developing
runtime.freegc to see more complex uses and closer-to-real-world
benchmark results prior to updating the compiler.)

This CL only addresses noscan span classes (heap objects without
pointers), such as the backing memory for a []byte or string. A
follow-on CL adds support for heap objects with pointers.

If we update strings.Builder to explicitly call runtime.freegc on its
internal buf after a resize operation (but without freeing the usually
final incarnation of buf that is returned to the user as a string), we
see some nice results on the existing strings benchmarks that call
Builder.Write N times and then call Builder.String. The (uncommon) case
of a single Builder.Write is not helped (buf never resizes after the
first alloc if there is only one Write), but the benefit grows with the
number of resize operations, reaching roughly 2x as more
strings.Builder.Write calls trigger more resizes:

                                                 │ disabled.out │ new-free-20.txt │
                                                 │    sec/op    │  sec/op    vs base           │
BuildString_Builder/1Write_36Bytes_NoGrow-4        55.82n ± 2%   55.86n ± 2%       ~ (p=0.794 n=20)
BuildString_Builder/2Write_36Bytes_NoGrow-4        125.2n ± 2%   115.4n ± 1%  -7.86% (p=0.000 n=20)
BuildString_Builder/3Write_36Bytes_NoGrow-4        224.0n ± 1%   188.2n ± 2% -16.00% (p=0.000 n=20)
BuildString_Builder/5Write_36Bytes_NoGrow-4        239.1n ± 9%   205.1n ± 1% -14.20% (p=0.000 n=20)
BuildString_Builder/8Write_36Bytes_NoGrow-4        422.8n ± 3%   325.4n ± 1% -23.04% (p=0.000 n=20)
BuildString_Builder/10Write_36Bytes_NoGrow-4       436.9n ± 2%   342.3n ± 1% -21.64% (p=0.000 n=20)
BuildString_Builder/100Write_36Bytes_NoGrow-4      4.403µ ± 1%   2.381µ ± 2% -45.91% (p=0.000 n=20)
BuildString_Builder/1000Write_36Bytes_NoGrow-4     48.28µ ± 2%   21.38µ ± 2% -55.71% (p=0.000 n=20)

See the design document for more discussion of the strings.Builder
case.

For testing, we add tests that attempt to exercise different aspects of
the underlying freegc and mallocgc behavior on the reuse path.
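To make the reuse path concrete, here is a minimal sketch using the
test-only exports from export_test.go (illustrative only, not part of
the patch; whether the same address comes back depends on the state of
the free list):

  p1 := Escape(new([64]byte))
  Freegc(unsafe.Pointer(p1), unsafe.Sizeof(*p1), true) // noscan
  p2 := Escape(new([64]byte)) // may observe p2 == p1: the freed object is reused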
Validating the assist credit manipulations turned out to be subtle, so
a test for that is added in the next CL. There are also invariant
checks added, controlled by consts (primarily the doubleCheckReusable
const currently).

This CL also adds support in runtime.freegc for GODEBUG=clobberfree=1
to immediately overwrite freed memory with 0xdeadbeef, which can help a
higher-level test fail faster in the event of a bug; the GC also
specifically looks for that pattern and throws a fatal error if it
unexpectedly finds it. A later CL (currently experimental) adds
GODEBUG=clobberfree=2, which uses mprotect (or VirtualProtect on
Windows) to make freed memory fault if read or written, until the
runtime later unprotects the memory on the mallocgc reuse path.

For the cases where a normal allocation happens without any reuse, some
initial microbenchmarks suggest the impact of these changes could be
small to negligible (at least with GOAMD64=v3):

goos: linux
goarch: amd64
pkg: runtime
cpu: AMD EPYC 7B13
                      │ base-512M-v3.bench │ ps16-512M-goamd64-v3.bench │
                      │       sec/op       │   sec/op      vs base      │
Malloc8-16                     11.01n ± 1%   10.94n ± 1%  -0.68% (p=0.038 n=20)
Malloc16-16                    17.15n ± 1%   17.05n ± 0%  -0.55% (p=0.007 n=20)
Malloc32-16                    18.65n ± 1%   18.42n ± 0%  -1.26% (p=0.000 n=20)
MallocTypeInfo8-16             18.63n ± 0%   18.36n ± 0%  -1.45% (p=0.000 n=20)
MallocTypeInfo16-16            22.32n ± 0%   22.65n ± 0%  +1.50% (p=0.000 n=20)
MallocTypeInfo32-16            23.37n ± 0%   23.89n ± 0%  +2.23% (p=0.000 n=20)
geomean                        18.02n        18.01n       -0.05%

These last benchmark results include the runtime updates to support
span classes with pointers (which were originally part of this CL, but
later split out for ease of review).

Updates #74299
Change-Id: Icceaa0f79f85c70cd1a718f9a4e7f0cf3d77803c
Reviewed-on: https://go-review.googlesource.com/c/go/+/673695
LUCI-TryBot-Result: Go LUCI
Reviewed-by: Michael Knyszek
Reviewed-by: Junyang Shao
---
 src/runtime/malloc_test.go | 286 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 286 insertions(+)
(limited to 'src/runtime/malloc_test.go')

diff --git a/src/runtime/malloc_test.go b/src/runtime/malloc_test.go
index bf58947bbc..6285cdaff7 100644
--- a/src/runtime/malloc_test.go
+++ b/src/runtime/malloc_test.go
@@ -16,6 +16,7 @@ import (
 	"runtime"
 	. "runtime"
 	"strings"
+	"sync"
 	"sync/atomic"
 	"testing"
 	"time"
@@ -234,6 +235,275 @@ func TestTinyAllocIssue37262(t *testing.T) {
 	runtime.Releasem()
 }

+// TestFreegc does basic testing of explicit frees.
+func TestFreegc(t *testing.T) {
+	tests := []struct {
+		size   string
+		f      func(noscan bool) func(*testing.T)
+		noscan bool
+	}{
+		// Types without pointers.
+		{"size=16", testFreegc[[16]byte], true}, // smallest we support currently
+		{"size=17", testFreegc[[17]byte], true},
+		{"size=64", testFreegc[[64]byte], true},
+		{"size=500", testFreegc[[500]byte], true},
+		{"size=512", testFreegc[[512]byte], true},
+		{"size=4096", testFreegc[[4096]byte], true},
+		{"size=32KiB-8", testFreegc[[1<<15 - 8]byte], true}, // max noscan small object for 64-bit
+	}
+
+	// Run the tests twice unless we are in -short mode or otherwise saving test time.
+	// First while manually calling runtime.GC to slightly increase isolation (perhaps making
+	// problems more reproducible).
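+	// (Each runtime.GC call below forces a complete GC cycle, so each subtest starts
+	// against a freshly swept heap -- an assumption about why the extra isolation helps.)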
+	for _, tt := range tests {
+		runtime.GC()
+		t.Run(fmt.Sprintf("gc=yes/ptrs=%v/%s", !tt.noscan, tt.size), tt.f(tt.noscan))
+	}
+	runtime.GC()
+
+	if testing.Short() || !RuntimeFreegcEnabled || runtime.Raceenabled {
+		return
+	}
+
+	// Again, but without manually calling runtime.GC in the loop (perhaps less isolation might
+	// trigger problems).
+	for _, tt := range tests {
+		t.Run(fmt.Sprintf("gc=no/ptrs=%v/%s", !tt.noscan, tt.size), tt.f(tt.noscan))
+	}
+	runtime.GC()
+}
+
+func testFreegc[T comparable](noscan bool) func(*testing.T) {
+	// We use stressMultiple to influence the duration of the tests.
+	// When testing freegc changes, stressMultiple can be increased locally
+	// to test longer or in some cases with more goroutines.
+	// It can also be helpful to test with GODEBUG=clobberfree=1 and
+	// with and without doubleCheckMalloc and doubleCheckReusable enabled.
+	stressMultiple := 10
+	if testing.Short() || !RuntimeFreegcEnabled || runtime.Raceenabled {
+		stressMultiple = 1
+	}
+
+	return func(t *testing.T) {
+		alloc := func() *T {
+			// Force heap alloc, plus some light validation of zeroed memory.
+			t.Helper()
+			p := Escape(new(T))
+			var zero T
+			if *p != zero {
+				t.Fatalf("allocator returned non-zero memory: %v", *p)
+			}
+			return p
+		}
+
+		free := func(p *T) {
+			t.Helper()
+			var zero T
+			if *p != zero {
+				t.Fatalf("found non-zero memory before freeing (tests do not modify memory): %v", *p)
+			}
+			runtime.Freegc(unsafe.Pointer(p), unsafe.Sizeof(*p), noscan)
+		}
+
+		t.Run("basic-free", func(t *testing.T) {
+			// Test that freeing a live heap object doesn't crash.
+			for range 100 {
+				p := alloc()
+				free(p)
+			}
+		})
+
+		t.Run("stack-free", func(t *testing.T) {
+			// Test that freeing a stack object doesn't crash.
+			for range 100 {
+				var x [32]byte
+				var y [32]*int
+				runtime.Freegc(unsafe.Pointer(&x), unsafe.Sizeof(x), true)  // noscan
+				runtime.Freegc(unsafe.Pointer(&y), unsafe.Sizeof(y), false) // !noscan
+			}
+		})
+
+		// Check our allocations. These tests rely on the
+		// current implementation treating a re-used object
+		// as not adding to the allocation counts seen
+		// by testing.AllocsPerRun. (This is not the desired
+		// long-term behavior, but it is the current behavior and
+		// makes these tests convenient).
+
+		t.Run("allocs-baseline", func(t *testing.T) {
+			// Baseline result without any explicit free.
+			allocs := testing.AllocsPerRun(100, func() {
+				for range 100 {
+					p := alloc()
+					_ = p
+				}
+			})
+			if allocs < 100 {
+				// TODO(thepudds): we get exactly 100 for almost all the tests, but investigate why
+				// ~101 allocs for TestFreegc/ptrs=true/size=32KiB-8.
+				t.Fatalf("expected >=100 allocations, got %v", allocs)
+			}
+		})
+
+		t.Run("allocs-with-free", func(t *testing.T) {
+			// Same allocations, but now using explicit free so that
+			// no allocs get reported. (Again, not the desired long-term behavior).
+			if SizeSpecializedMallocEnabled {
+				t.Skip("temporarily skipping alloc tests for GOEXPERIMENT=sizespecializedmalloc")
+			}
+			if !RuntimeFreegcEnabled {
+				t.Skip("skipping alloc tests with runtime.freegc disabled")
+			}
+			allocs := testing.AllocsPerRun(100, func() {
+				for range 100 {
+					p := alloc()
+					free(p)
+				}
+			})
+			if allocs != 0 {
+				t.Fatalf("expected 0 allocations, got %v", allocs)
+			}
+		})
+
+		t.Run("free-multiple", func(t *testing.T) {
+			// Multiple allocations outstanding before explicitly freeing,
+			// but still within the limit of our smallest free list size
+			// so that no allocs are reported. (Again, not long-term behavior).
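+			// (maxOutstanding below is sized to stay within the smallest free list;
+			// exceeding it would presumably surface here as nonzero reported allocs.)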
+			if SizeSpecializedMallocEnabled {
+				t.Skip("temporarily skipping alloc tests for GOEXPERIMENT=sizespecializedmalloc")
+			}
+			if !RuntimeFreegcEnabled {
+				t.Skip("skipping alloc tests with runtime.freegc disabled")
+			}
+			const maxOutstanding = 20
+			s := make([]*T, 0, maxOutstanding)
+			allocs := testing.AllocsPerRun(100*stressMultiple, func() {
+				s = s[:0]
+				for range maxOutstanding {
+					p := alloc()
+					s = append(s, p)
+				}
+				for _, p := range s {
+					free(p)
+				}
+			})
+			if allocs != 0 {
+				t.Fatalf("expected 0 allocations, got %v", allocs)
+			}
+		})
+
+		if runtime.GOARCH == "wasm" {
+			// TODO(thepudds): for wasm, double-check whether this is just slow, vs. some test
+			// logic problem, vs. something else. It might be that wasm was slowest with tests
+			// that spawn many goroutines, which might be expected for wasm. This skip might no
+			// longer be needed now that we have tuned test execution time more, or perhaps wasm
+			// should just always run in short mode, which might also let us remove this skip.
+			t.Skip("skipping remaining freegc tests, was timing out on wasm")
+		}
+
+		t.Run("free-many", func(t *testing.T) {
+			// Confirm we are graceful if we have more freed elements at once
+			// than the max free list size.
+			s := make([]*T, 0, 1000)
+			iterations := stressMultiple * stressMultiple // currently 1 or 100 depending on -short
+			for range iterations {
+				s = s[:0]
+				for range 1000 {
+					p := alloc()
+					s = append(s, p)
+				}
+				for _, p := range s {
+					free(p)
+				}
+			}
+		})
+
+		t.Run("duplicate-check", func(t *testing.T) {
+			// A simple duplicate allocation test. We track what should be the set
+			// of live pointers in a map across a series of allocs and frees,
+			// and fail if a live pointer value is returned by an allocation.
+			// TODO: maybe add randomness? allow more live pointers? do across goroutines?
+			live := make(map[uintptr]bool)
+			for i := range 100 * stressMultiple {
+				var s []*T
+				// Alloc 10 times, tracking the live pointer values.
+				for j := range 10 {
+					p := alloc()
+					uptr := uintptr(unsafe.Pointer(p))
+					if live[uptr] {
+						t.Fatalf("TestFreeLive: found duplicate pointer (0x%x). i: %d j: %d", uptr, i, j)
+					}
+					live[uptr] = true
+					s = append(s, p)
+				}
+				// Explicitly free those pointers, removing them from the live map.
+				for k := range s {
+					p := s[k]
+					s[k] = nil
+					uptr := uintptr(unsafe.Pointer(p))
+					free(p)
+					delete(live, uptr)
+				}
+			}
+		})
+
+		t.Run("free-other-goroutine", func(t *testing.T) {
+			// Use explicit free, but the free happens on a different goroutine than the alloc.
+			// This also lightly simulates how the free code sees P migration or flushing
+			// the mcache, assuming we have > 1 P. (Not using testing.AllocsPerRun here).
+			iterations := 10 * stressMultiple * stressMultiple // currently 10 or 1000 depending on -short
+			for _, capacity := range []int{2} {
+				for range iterations {
+					ch := make(chan *T, capacity)
+					var wg sync.WaitGroup
+					for range 2 {
+						wg.Add(1)
+						go func() {
+							defer wg.Done()
+							for p := range ch {
+								free(p)
+							}
+						}()
+					}
+					for range 100 {
+						p := alloc()
+						ch <- p
+					}
+					close(ch)
+					wg.Wait()
+				}
+			}
+		})
+
+		t.Run("many-goroutines", func(t *testing.T) {
+			// Allocate across multiple goroutines, freeing on the same goroutine.
+			// TODO: probably remove the duplicate checking here; not that useful.
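+			// (stressMultiple, defined above, scales both the goroutine counts and the
+			// per-goroutine iteration counts below.)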
+			counts := []int{1, 2, 4, 8, 10 * stressMultiple}
+			for _, goroutines := range counts {
+				var wg sync.WaitGroup
+				for range goroutines {
+					wg.Add(1)
+					go func() {
+						defer wg.Done()
+						live := make(map[uintptr]bool)
+						for range 100 * stressMultiple {
+							p := alloc()
+							uptr := uintptr(unsafe.Pointer(p))
+							if live[uptr] {
+								panic("TestFreeLive: found duplicate pointer")
+							}
+							live[uptr] = true
+							free(p)
+							delete(live, uptr)
+						}
+					}()
+				}
+				wg.Wait()
+			}
+		})
+	}
+}
+
 func TestPageCacheLeak(t *testing.T) {
 	defer GOMAXPROCS(GOMAXPROCS(1))
 	leaked := PageCachePagesLeaked()
@@ -337,6 +607,13 @@ func BenchmarkMalloc16(b *testing.B) {
 	}
 }

+func BenchmarkMalloc32(b *testing.B) {
+	for i := 0; i < b.N; i++ {
+		p := new([4]int64)
+		Escape(p)
+	}
+}
+
 func BenchmarkMallocTypeInfo8(b *testing.B) {
 	for i := 0; i < b.N; i++ {
 		p := new(struct {
@@ -355,6 +632,15 @@ func BenchmarkMallocTypeInfo16(b *testing.B) {
 	}
 }

+func BenchmarkMallocTypeInfo32(b *testing.B) {
+	for i := 0; i < b.N; i++ {
+		p := new(struct {
+			p [32 / unsafe.Sizeof(uintptr(0))]*int
+		})
+		Escape(p)
+	}
+}
+
 type LargeStruct struct {
 	x [16][]byte
 }
--
cgit v1.3

From 120f1874ef380362cf8b8c4775a327bcd417ff70 Mon Sep 17 00:00:00 2001
From: thepudds
Date: Mon, 3 Nov 2025 16:40:40 -0500
Subject: runtime: add more precise test of assist credit handling for
 runtime.freegc

This CL is part of a set of CLs that attempt to reduce how much work
the GC must do. See the design in
https://go.dev/design/74299-runtime-freegc

This CL adds a better test of assist credit handling when heap objects
are being reused after a runtime.freegc call. The main approach is
bracketing alloc/free pairs with measurements of the assist credit
before and after, hoping to see a net zero change in the assist credit.

However, validating the desired behavior is perhaps a bit subtle. To
help stabilize the measurements, we do acquirem in the test code to
avoid being preempted during the measurements, which reduces other
code's ability to adjust the assist credit while we are measuring, and
we also reduce GOMAXPROCS to 1.

This test currently does fail if we deliberately introduce bugs in the
runtime.freegc implementation, such as if we:

 - never adjust the assist credit when reusing an object, or
 - always adjust the assist credit when reusing an object, or
 - deliberately mishandle internal fragmentation.

The two main cases of current interest for testing runtime.freegc are
when gcBlackenEnabled is true vs. when it is false over the course of
our bracketed measurements. The test attempts to exercise both of those
cases by running the GC continually in the background (which appears to
be effective based on logging and on how our deliberate bugs fail).
This passes ~10K test executions locally via stress.

A small note to the future: a previous incarnation of this test (circa
patchset 11 of this CL) did not do acquirem but instead ignored certain
measurements, and it was also able to pass ~10K runs via stress. The
current version in this CL is simpler, but we record the existence of
the prior version here in case it is useful in the future. (Hopefully
not.)
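In outline, the bracketed measurement at the core of the test looks
roughly like the following sketch (simplified from the test code, using
its alloc/free helpers plus the test-only exports Acquirem, Releasem,
and AssistCredit; not the literal patch):

  Acquirem()                    // block preemption around the measurement
  p := alloc()                  // warm-up pair seeds the reuse list
  free(p)
  creditStart := AssistCredit()
  p = alloc()                   // measured alloc: expected to reuse the freed object
  free(p)
  creditEnd := AssistCredit()
  Releasem()
  // Expect creditEnd == creditStart: the assist debit charged by the
  // measured alloc should be exactly refunded by the matching free,
  // including any internal fragmentation (elemsize - size).

A nonzero delta indicates the credit was either never adjusted, doubly
adjusted, or adjusted by the wrong amount.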
Updates #74299
Change-Id: I46c7e0295d125f5884fee0cc3d3d31aedc7e5ff4
Reviewed-on: https://go-review.googlesource.com/c/go/+/717520
Reviewed-by: Michael Knyszek
Reviewed-by: Junyang Shao
LUCI-TryBot-Result: Go LUCI
---
 src/runtime/export_test.go | 28 ++++++++++++++
 src/runtime/malloc_test.go | 93 ++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 117 insertions(+), 4 deletions(-)
(limited to 'src/runtime/malloc_test.go')

diff --git a/src/runtime/export_test.go b/src/runtime/export_test.go
index 48dcf5aa39..8438603b9e 100644
--- a/src/runtime/export_test.go
+++ b/src/runtime/export_test.go
@@ -644,6 +644,25 @@ func Freegc(p unsafe.Pointer, size uintptr, noscan bool) {
 	freegc(p, size, noscan)
 }

+// Expose gcAssistBytes for the current g for testing.
+func AssistCredit() int64 {
+	assistG := getg()
+	if assistG.m.curg != nil {
+		assistG = assistG.m.curg
+	}
+	return assistG.gcAssistBytes
+}
+
+// Expose gcBlackenEnabled for testing.
+func GcBlackenEnable() bool {
+	// Note we do a non-atomic load here.
+	// Some checks against gcBlackenEnabled (e.g., in mallocgc)
+	// are currently done via non-atomic load for performance reasons,
+	// but other checks are done via atomic load (e.g., in mgcmark.go),
+	// so interpreting this value in a test may be subtle.
+	return gcBlackenEnabled != 0
+}
+
 const SizeSpecializedMallocEnabled = sizeSpecializedMallocEnabled

 const RuntimeFreegcEnabled = runtimeFreegcEnabled
@@ -1487,6 +1506,15 @@ func Releasem() {
 	releasem(getg().m)
 }

+// GoschedIfBusy is an explicit preemption check to call back
+// into the scheduler. This is useful for tests that run code
+// that spends most of its time non-preemptible, as the check
+// can be placed right after becoming preemptible again to ensure
+// that the scheduler gets a chance to preempt the goroutine.
+func GoschedIfBusy() {
+	goschedIfBusy()
+}
+
 type PIController struct {
 	piController
 }
diff --git a/src/runtime/malloc_test.go b/src/runtime/malloc_test.go
index 6285cdaff7..10c20e6c23 100644
--- a/src/runtime/malloc_test.go
+++ b/src/runtime/malloc_test.go
@@ -249,6 +249,7 @@
 		{"size=500", testFreegc[[500]byte], true},
 		{"size=512", testFreegc[[512]byte], true},
 		{"size=4096", testFreegc[[4096]byte], true},
+		{"size=20000", testFreegc[[20000]byte], true}, // not power of 2 or spc boundary
 		{"size=32KiB-8", testFreegc[[1<<15 - 8]byte], true}, // max noscan small object for 64-bit
 	}

@@ -300,7 +301,7 @@
 		t.Helper()
 		var zero T
 		if *p != zero {
-			t.Fatalf("found non-zero memory before freeing (tests do not modify memory): %v", *p)
+			t.Fatalf("found non-zero memory before freegc (tests do not modify memory): %v", *p)
 		}
 		runtime.Freegc(unsafe.Pointer(p), unsafe.Sizeof(*p), noscan)
 	}
@@ -405,7 +406,7 @@
 		// Confirm we are graceful if we have more freed elements at once
 		// than the max free list size.
 		s := make([]*T, 0, 1000)
-		iterations := stressMultiple * stressMultiple // currently 1 or 100 depending on -short
+		iterations := stressMultiple * stressMultiple // currently 1 (-short) or 100
 		for range iterations {
 			s = s[:0]
 			for range 1000 {
@@ -431,7 +432,7 @@
 			p := alloc()
 			uptr := uintptr(unsafe.Pointer(p))
 			if live[uptr] {
-				t.Fatalf("TestFreeLive: found duplicate pointer (0x%x). i: %d j: %d", uptr, i, j)
+				t.Fatalf("found duplicate pointer (0x%x). i: %d j: %d", uptr, i, j)
 			}
 			live[uptr] = true
 			s = append(s, p)
@@ -451,7 +452,7 @@
 		// Use explicit free, but the free happens on a different goroutine than the alloc.
 		// This also lightly simulates how the free code sees P migration or flushing
 		// the mcache, assuming we have > 1 P. (Not using testing.AllocsPerRun here).
-		iterations := 10 * stressMultiple * stressMultiple // currently 10 or 1000 depending on -short
+		iterations := 10 * stressMultiple * stressMultiple // currently 10 (-short) or 1000
 		for _, capacity := range []int{2} {
 			for range iterations {
 				ch := make(chan *T, capacity)
@@ -501,6 +502,90 @@
 			wg.Wait()
 		}
 	})
+
+	t.Run("assist-credit", func(t *testing.T) {
+		// Allocate and free using the same span class repeatedly while
+		// verifying it results in a net zero change in assist credit.
+		// This helps double-check our manipulation of the assist credit
+		// during mallocgc/freegc, including in cases where there is
+		// internal fragmentation because the requested mallocgc size is
+		// smaller than the size class.
+		//
+		// See https://go.dev/cl/717520 for some additional discussion,
+		// including how we can deliberately cause the test to fail currently
+		// if we purposefully introduce some assist credit bugs.
+		if SizeSpecializedMallocEnabled {
+			// TODO(thepudds): skip this test at this point in the stack; later CL has
+			// integration with sizespecializedmalloc.
+			t.Skip("temporarily skip assist credit test for GOEXPERIMENT=sizespecializedmalloc")
+		}
+		if !RuntimeFreegcEnabled {
+			t.Skip("skipping assist credit test with runtime.freegc disabled")
+		}
+
+		// Use a background goroutine to continuously run the GC.
+		done := make(chan struct{})
+		defer close(done)
+		go func() {
+			for {
+				select {
+				case <-done:
+					return
+				default:
+					runtime.GC()
+				}
+			}
+		}()
+
+		// If making changes related to this test, consider testing locally with
+		// larger counts, like 100K or 1M.
+		counts := []int{1, 2, 10, 100 * stressMultiple}
+		// Dropping down to GOMAXPROCS=1 might help reduce noise.
+		defer GOMAXPROCS(GOMAXPROCS(1))
+		size := int64(unsafe.Sizeof(*new(T)))
+		for _, count := range counts {
+			// Start by forcing a GC to reset this g's assist credit
+			// and perhaps help us get a cleaner measurement of GC cycle count.
+			runtime.GC()
+			for i := range count {
+				// We disable preemption to reduce other code's ability to adjust this g's
+				// assist credit or otherwise change things while we are measuring.
+				Acquirem()
+
+				// We do two allocations per loop, with the second allocation being
+				// the one we measure. The first allocation tries to ensure there is at least
+				// one reusable object on the mspan's free list when we do our measured allocation.
+				p := alloc()
+				free(p)
+
+				// Now do our primary allocation of interest, bracketed by measurements.
+				// We measure more than we strictly need (to log details in case of a failure).
+ creditStart := AssistCredit() + blackenStart := GcBlackenEnable() + p = alloc() + blackenAfterAlloc := GcBlackenEnable() + creditAfterAlloc := AssistCredit() + free(p) + blackenEnd := GcBlackenEnable() + creditEnd := AssistCredit() + + Releasem() + GoschedIfBusy() + + delta := creditEnd - creditStart + if delta != 0 { + t.Logf("assist credit non-zero delta: %d", delta) + t.Logf("\t| size: %d i: %d count: %d", size, i, count) + t.Logf("\t| credit before: %d credit after: %d", creditStart, creditEnd) + t.Logf("\t| alloc delta: %d free delta: %d", + creditAfterAlloc-creditStart, creditEnd-creditAfterAlloc) + t.Logf("\t| gcBlackenEnable (start / after alloc / end): %v/%v/%v", + blackenStart, blackenAfterAlloc, blackenEnd) + t.FailNow() + } + } + } + }) } } -- cgit v1.3 From 50128a21541e3fd712ad717a223aaa109cb86d43 Mon Sep 17 00:00:00 2001 From: thepudds Date: Sun, 9 Nov 2025 09:24:22 -0500 Subject: runtime: support runtime.freegc in size-specialized mallocs for noscan objects This CL is part of a set of CLs that attempt to reduce how much work the GC must do. See the design in https://go.dev/design/74299-runtime-freegc This CL updates the smallNoScanStub stub in malloc_stubs.go to reuse heap objects that have been freed by runtime.freegc calls, and generates the corresponding size-specialized code in malloc_generated.go. This CL only adds support in the specialized mallocs for noscan heap objects (objects without pointers). A later CL handles objects with pointers. While we are here, we leave a couple of breadcrumbs in mkmalloc.go on how to do the generation. Updates #74299 Change-Id: I2657622601a27211554ee862fce057e101767a70 Reviewed-on: https://go-review.googlesource.com/c/go/+/715761 Reviewed-by: Junyang Shao LUCI-TryBot-Result: Go LUCI Reviewed-by: Michael Knyszek --- src/runtime/_mkmalloc/mkmalloc.go | 3 +- src/runtime/malloc.go | 11 +- src/runtime/malloc_generated.go | 651 ++++++++++++++++++++++++++++++++++++++ src/runtime/malloc_stubs.go | 22 +- src/runtime/malloc_test.go | 16 +- 5 files changed, 693 insertions(+), 10 deletions(-) (limited to 'src/runtime/malloc_test.go') diff --git a/src/runtime/_mkmalloc/mkmalloc.go b/src/runtime/_mkmalloc/mkmalloc.go index 986b0aa9f8..1f040c8861 100644 --- a/src/runtime/_mkmalloc/mkmalloc.go +++ b/src/runtime/_mkmalloc/mkmalloc.go @@ -254,7 +254,8 @@ func inline(config generatorConfig) []byte { } // Write out the package and import declarations. - out.WriteString("// Code generated by mkmalloc.go; DO NOT EDIT.\n\n") + out.WriteString("// Code generated by mkmalloc.go; DO NOT EDIT.\n") + out.WriteString("// See overview in malloc_stubs.go.\n\n") out.WriteString("package " + f.Name.Name + "\n\n") for _, importDecl := range importDecls { out.Write(mustFormatNode(fset, importDecl)) diff --git a/src/runtime/malloc.go b/src/runtime/malloc.go index 13f5fc3081..d49dacaf68 100644 --- a/src/runtime/malloc.go +++ b/src/runtime/malloc.go @@ -1094,6 +1094,8 @@ const sizeSpecializedMallocEnabled = goexperiment.SizeSpecializedMalloc && GOOS // implementation and the corresponding allocation-related changes: the experiment must be // enabled, and none of the memory sanitizers should be enabled. We allow the race detector, // in contrast to sizeSpecializedMallocEnabled. +// TODO(thepudds): it would be nice to check Valgrind integration, though there are some hints +// there might not be any canned tests in tree for Go's integration with Valgrind. 
const runtimeFreegcEnabled = goexperiment.RuntimeFreegc && !asanenabled && !msanenabled && !valgrindenabled // Allocate an object of size bytes. @@ -1966,10 +1968,15 @@ const ( // or roughly when the liveness analysis of the compiler // would otherwise have determined ptr's object is reclaimable by the GC. func freegc(ptr unsafe.Pointer, size uintptr, noscan bool) bool { - if !runtimeFreegcEnabled || sizeSpecializedMallocEnabled || !reusableSize(size) { - // TODO(thepudds): temporarily disable freegc with SizeSpecializedMalloc until we finish integrating. + if !runtimeFreegcEnabled || !reusableSize(size) { return false } + if sizeSpecializedMallocEnabled && !noscan { + // TODO(thepudds): temporarily disable freegc with SizeSpecializedMalloc for pointer types + // until we finish integrating. + return false + } + if ptr == nil { throw("freegc nil") } diff --git a/src/runtime/malloc_generated.go b/src/runtime/malloc_generated.go index 2215dbaddb..5abb61257a 100644 --- a/src/runtime/malloc_generated.go +++ b/src/runtime/malloc_generated.go @@ -1,4 +1,5 @@ // Code generated by mkmalloc.go; DO NOT EDIT. +// See overview in malloc_stubs.go. package runtime @@ -6400,6 +6401,32 @@ func mallocgcSmallNoScanSC2(size uintptr, typ *_type, needzero bool) unsafe.Poin const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -6497,6 +6524,32 @@ func mallocgcSmallNoScanSC3(size uintptr, typ *_type, needzero bool) unsafe.Poin const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -6594,6 +6647,32 @@ func mallocgcSmallNoScanSC4(size uintptr, typ *_type, needzero bool) unsafe.Poin const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -6691,6 +6770,32 @@ func mallocgcSmallNoScanSC5(size uintptr, typ *_type, needzero bool) unsafe.Poin const spc = 
spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -6788,6 +6893,32 @@ func mallocgcSmallNoScanSC6(size uintptr, typ *_type, needzero bool) unsafe.Poin const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -6885,6 +7016,32 @@ func mallocgcSmallNoScanSC7(size uintptr, typ *_type, needzero bool) unsafe.Poin const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -6982,6 +7139,32 @@ func mallocgcSmallNoScanSC8(size uintptr, typ *_type, needzero bool) unsafe.Poin const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -7079,6 +7262,32 @@ func mallocgcSmallNoScanSC9(size uintptr, typ *_type, needzero bool) unsafe.Poin const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult 
gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -7176,6 +7385,32 @@ func mallocgcSmallNoScanSC10(size uintptr, typ *_type, needzero bool) unsafe.Poi const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -7273,6 +7508,32 @@ func mallocgcSmallNoScanSC11(size uintptr, typ *_type, needzero bool) unsafe.Poi const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -7370,6 +7631,32 @@ func mallocgcSmallNoScanSC12(size uintptr, typ *_type, needzero bool) unsafe.Poi const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -7467,6 +7754,32 @@ func mallocgcSmallNoScanSC13(size uintptr, typ *_type, needzero bool) unsafe.Poi const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -7564,6 +7877,32 @@ func mallocgcSmallNoScanSC14(size uintptr, typ *_type, needzero bool) unsafe.Poi const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := 
getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -7661,6 +8000,32 @@ func mallocgcSmallNoScanSC15(size uintptr, typ *_type, needzero bool) unsafe.Poi const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -7758,6 +8123,32 @@ func mallocgcSmallNoScanSC16(size uintptr, typ *_type, needzero bool) unsafe.Poi const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -7855,6 +8246,32 @@ func mallocgcSmallNoScanSC17(size uintptr, typ *_type, needzero bool) unsafe.Poi const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -7952,6 +8369,32 @@ func mallocgcSmallNoScanSC18(size uintptr, typ *_type, needzero bool) unsafe.Poi const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -8049,6 +8492,32 @@ func mallocgcSmallNoScanSC19(size uintptr, typ *_type, needzero bool) unsafe.Poi const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, 
spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -8146,6 +8615,32 @@ func mallocgcSmallNoScanSC20(size uintptr, typ *_type, needzero bool) unsafe.Poi const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -8243,6 +8738,32 @@ func mallocgcSmallNoScanSC21(size uintptr, typ *_type, needzero bool) unsafe.Poi const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -8340,6 +8861,32 @@ func mallocgcSmallNoScanSC22(size uintptr, typ *_type, needzero bool) unsafe.Poi const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -8437,6 +8984,32 @@ func mallocgcSmallNoScanSC23(size uintptr, typ *_type, needzero bool) unsafe.Poi const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -8534,6 +9107,32 @@ func mallocgcSmallNoScanSC24(size uintptr, typ *_type, 
needzero bool) unsafe.Poi const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -8631,6 +9230,32 @@ func mallocgcSmallNoScanSC25(size uintptr, typ *_type, needzero bool) unsafe.Poi const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) @@ -8728,6 +9353,32 @@ func mallocgcSmallNoScanSC26(size uintptr, typ *_type, needzero bool) unsafe.Poi const spc = spanClass(sizeclass<<1) | spanClass(1) span := c.alloc[spc] + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + + v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero) + mp.mallocing = 0 + releasem(mp) + x := v + { + + if valgrindenabled { + valgrindMalloc(x, size) + } + + if gcBlackenEnabled != 0 && elemsize != 0 { + if assistG := getg().m.curg; assistG != nil { + assistG.gcAssistBytes -= int64(elemsize - size) + } + } + + if debug.malloc { + postMallocgcDebug(x, elemsize, typ) + } + return x + } + + } + var nextFreeFastResult gclinkptr if span.allocCache != 0 { theBit := sys.TrailingZeros64(span.allocCache) diff --git a/src/runtime/malloc_stubs.go b/src/runtime/malloc_stubs.go index 224746f3d4..e9752956b8 100644 --- a/src/runtime/malloc_stubs.go +++ b/src/runtime/malloc_stubs.go @@ -7,6 +7,8 @@ // to produce a full mallocgc function that's specialized for a span class // or specific size in the case of the tiny allocator. // +// To generate the specialized mallocgc functions, do 'go run .' inside runtime/_mkmalloc. +// // To assemble a mallocgc function, the mallocStub function is cloned, and the call to // inlinedMalloc is replaced with the inlined body of smallScanNoHeaderStub, // smallNoScanStub or tinyStub, depending on the parameters being specialized. @@ -71,7 +73,8 @@ func mallocStub(size uintptr, typ *_type, needzero bool) unsafe.Pointer { } } - // Assist the GC if needed. + // Assist the GC if needed. (On the reuse path, we currently compensate for this; + // changes here might require changes there.) if gcBlackenEnabled != 0 { deductAssistCredit(size) } @@ -242,6 +245,23 @@ func smallNoScanStub(size uintptr, typ *_type, needzero bool) (unsafe.Pointer, u c := getMCache(mp) const spc = spanClass(sizeclass<<1) | spanClass(noscanint_) span := c.alloc[spc] + + // First, check for a reusable object. + if runtimeFreegcEnabled && c.hasReusableNoscan(spc) { + // We have a reusable object, use it. 
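+		// (The assist-credit adjustment for internal fragmentation (elemsize - size)
+		// is applied on the generated return path rather than here; see the
+		// corresponding block in the generated functions.)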
+		v := mallocgcSmallNoscanReuse(c, span, spc, elemsize, needzero)
+		mp.mallocing = 0
+		releasem(mp)
+
+		// TODO(thepudds): note that the generated return path is essentially duplicated
+		// by the generator. For example, see the two postMallocgcDebug calls and
+		// related duplicated code on the return path currently in the generated
+		// mallocgcSmallNoScanSC2 function. One set of those corresponds to this
+		// return here. We might be able to de-duplicate the generated return path
+		// by updating the generator, perhaps by jumping to a shared return or similar.
+		return v, elemsize
+	}
+
 	v := nextFreeFastStub(span)
 	if v == 0 {
 		v, span, checkGCTrigger = c.nextFree(spc)
diff --git a/src/runtime/malloc_test.go b/src/runtime/malloc_test.go
index 10c20e6c23..97cf0eed54 100644
--- a/src/runtime/malloc_test.go
+++ b/src/runtime/malloc_test.go
@@ -349,8 +349,10 @@ func testFreegc[T comparable](noscan bool) func(*testing.T) {
 	t.Run("allocs-with-free", func(t *testing.T) {
 		// Same allocations, but now using explicit free so that
 		// no allocs get reported. (Again, not the desired long-term behavior).
-		if SizeSpecializedMallocEnabled {
-			t.Skip("temporarily skipping alloc tests for GOEXPERIMENT=sizespecializedmalloc")
+		if SizeSpecializedMallocEnabled && !noscan {
+			// TODO(thepudds): skip at this point in the stack for size-specialized malloc
+			// with !noscan. Additional integration with sizespecializedmalloc is in a later CL.
+			t.Skip("temporarily skipping alloc tests for GOEXPERIMENT=sizespecializedmalloc for pointer types")
 		}
 		if !RuntimeFreegcEnabled {
 			t.Skip("skipping alloc tests with runtime.freegc disabled")
@@ -370,8 +372,10 @@ func testFreegc[T comparable](noscan bool) func(*testing.T) {
 		// Multiple allocations outstanding before explicitly freeing,
 		// but still within the limit of our smallest free list size
 		// so that no allocs are reported. (Again, not long-term behavior).
-		if SizeSpecializedMallocEnabled {
-			t.Skip("temporarily skipping alloc tests for GOEXPERIMENT=sizespecializedmalloc")
+		if SizeSpecializedMallocEnabled && !noscan {
+			// TODO(thepudds): skip at this point in the stack for size-specialized malloc
+			// with !noscan. Additional integration with sizespecializedmalloc is in a later CL.
+			t.Skip("temporarily skipping alloc tests for GOEXPERIMENT=sizespecializedmalloc for pointer types")
 		}
 		if !RuntimeFreegcEnabled {
 			t.Skip("skipping alloc tests with runtime.freegc disabled")
@@ -514,10 +518,10 @@ func testFreegc[T comparable](noscan bool) func(*testing.T) {
 		// See https://go.dev/cl/717520 for some additional discussion,
 		// including how we can deliberately cause the test to fail currently
 		// if we purposefully introduce some assist credit bugs.
-		if SizeSpecializedMallocEnabled {
+		if SizeSpecializedMallocEnabled && !noscan {
 			// TODO(thepudds): skip this test at this point in the stack; later CL has
 			// integration with sizespecializedmalloc.
-			t.Skip("temporarily skip assist credit test for GOEXPERIMENT=sizespecializedmalloc")
+			t.Skip("temporarily skip assist credit tests for GOEXPERIMENT=sizespecializedmalloc for pointer types")
 		}
 		if !RuntimeFreegcEnabled {
 			t.Skip("skipping assist credit test with runtime.freegc disabled")
--
cgit v1.3