diff options
| author | Michael Pratt <mpratt@google.com> | 2025-09-17 13:25:03 -0400 |
|---|---|---|
| committer | Gopher Robot <gobot@golang.org> | 2025-09-18 10:00:29 -0700 |
| commit | 3032894e045fd3628198061a44c56d4a1fb73d93 (patch) | |
| tree | 182fd1d044a21e7868048463de3196da66cae4fe /src/runtime | |
| parent | ef05b66d6115209361dd99ff8f3ab978695fd74a (diff) | |
| download | go-3032894e045fd3628198061a44c56d4a1fb73d93.tar.xz | |
runtime: make explicit nil check in heapSetTypeSmallHeader
This is another case very similar to CL 684015 and #74375.
In spans with type headers, mallocgc always writes to the page before
returning the allocated memory. This initial write is done by
runtime.heapSetTypeSmallHeader. Prior to the write, the compiler inserts
a nil check, implemented as a dummy instruction reading from memory.
On a freshly mapped page, this read triggers a page fault, mapping the
zero page read-only. Immediately afterwards, the write triggers another
page fault, copying to a writeable page and performing a TLB flush.
This problem is exacerbated as the process scales up. At GOMAXPROCS=6,
the tile38 sweet benchmark spends around 0.1% of cycles directly
handling these page faults. On the same machine at GOMAXPROCS=192, it
spends about 2.7% of cycles directly handling these page faults.
Replacing the read with an explicit nil check reduces the direct cost of
these page faults down to around 0.1% at GOMAXPROCS=192. There are
additional positive side-effects due to reduced contention, so the
overall time spent in page faults drops from around 12.8% to 6.8%. Most
of the remaining time in page faults is spent on automatic NUMA page
migration (completely unrelated to this issue).
Impact on the tile38 benchmark results:
│ baseline │ cl704755 │
│ sec/op │ sec/op vs base │
Tile38QueryLoad-192 1.638m ± 3% 1.494m ± 5% -8.79% (p=0.002 n=6)
│ baseline │ cl704755 │
│ average-RSS-bytes │ average-RSS-bytes vs base │
Tile38QueryLoad-192 5.384Gi ± 3% 5.399Gi ± 3% ~ (p=0.818 n=6)
│ baseline │ cl704755 │
│ peak-RSS-bytes │ peak-RSS-bytes vs base │
Tile38QueryLoad-192 5.818Gi ± 1% 5.864Gi ± 2% ~ (p=0.394 n=6)
│ baseline │ cl704755 │
│ peak-VM-bytes │ peak-VM-bytes vs base │
Tile38QueryLoad-192 7.121Gi ± 1% 7.180Gi ± 2% ~ (p=0.818 n=6)
│ baseline │ cl704755 │
│ p50-latency-sec │ p50-latency-sec vs base │
Tile38QueryLoad-192 343.2µ ± 1% 313.2µ ± 3% -8.73% (p=0.002 n=6)
│ baseline │ cl704755 │
│ p90-latency-sec │ p90-latency-sec vs base │
Tile38QueryLoad-192 1.662m ± 2% 1.603m ± 5% ~ (p=0.093 n=6)
│ baseline │ cl704755 │
│ p99-latency-sec │ p99-latency-sec vs base │
Tile38QueryLoad-192 41.56m ± 8% 35.26m ± 18% -15.17% (p=0.026 n=6)
│ baseline │ cl704755 │
│ ops/s │ ops/s vs base │
Tile38QueryLoad-192 87.89k ± 3% 96.36k ± 4% +9.64% (p=0.002 n=6)
Updates #74375.
Change-Id: I6a6a636c1a16261b6d5076f2e1b08524a6544d33
Reviewed-on: https://go-review.googlesource.com/c/go/+/704755
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Diffstat (limited to 'src/runtime')
| -rw-r--r-- | src/runtime/mbitmap.go | 20 |
1 files changed, 20 insertions, 0 deletions
diff --git a/src/runtime/mbitmap.go b/src/runtime/mbitmap.go index 9872e5297f..508de9a115 100644 --- a/src/runtime/mbitmap.go +++ b/src/runtime/mbitmap.go @@ -714,6 +714,26 @@ func heapSetTypeNoHeader(x, dataSize uintptr, typ *_type, span *mspan) uintptr { } func heapSetTypeSmallHeader(x, dataSize uintptr, typ *_type, header **_type, span *mspan) uintptr { + if header == nil { + // This nil check and throw is almost pointless. Normally we would + // expect header to never be nil. However, this is called on potentially + // freshly-allocated virtual memory. As of 2025, the compiler-inserted + // nil check is not a branch but a memory read that we expect to fault + // if the pointer really is nil. + // + // However, this causes a read of the page, and operating systems may + // take it as a hint to back the accessed memory with a read-only zero + // page. However, we immediately write to this memory, which can then + // force operating systems to have to update the page table and flush + // the TLB. + // + // This nil check is thus an explicit branch instead of what the compiler + // would insert circa 2025, which is a memory read instruction. + // + // See go.dev/issue/74375 for details of a similar issue in + // spanInlineMarkBits. + throw("runtime: pointer to heap type header nil?") + } *header = typ if doubleCheckHeapSetType { doubleCheckHeapType(x, dataSize, typ, header, span) |
