| author | Michael Anthony Knyszek <mknyszek@google.com> | 2023-11-14 22:05:53 +0000 |
|---|---|---|
| committer | Michael Knyszek <mknyszek@google.com> | 2023-11-16 05:53:55 +0000 |
| commit | d6ef98b8fa4851f025779ef4ade084d63290de2a (patch) | |
| tree | 75ff4789e138894df856523f34ef25bb2001edb4 /src/runtime/mbitmap_noallocheaders.go | |
| parent | 17eb0a2bac79eda8dc71d628474989d05d9755c5 (diff) | |
| download | go-d6ef98b8fa4851f025779ef4ade084d63290de2a.tar.xz | |
runtime: optimize bulkBarrierPreWrite with allocheaders
Currently bulkBarrierPreWrite follows a fairly slow path wherein it
calls typePointersOf, which ends up calling into fastForward. This does
some fairly heavy computation to move the iterator forward without
making any assumptions about where it will land. It needs to be
completely general to support splitting at arbitrary boundaries, for
example when scanning oblets.
This means that copying objects during the GC mark phase is fairly
expensive, and is a regression from before allocheaders.
However, in almost all cases bulkBarrierPreWrite and
bulkBarrierPreWriteSrcOnly have perfect type information. We can do a
lot better in these cases because we're starting on a type-size
boundary, which is exactly what the iterator is built around.
This change adds the typePointersOfType method which produces a
typePointers iterator from a pointer and a type. This change
significantly improves the performance of these bulk write barriers,
eliminating some performance regressions that were noticed on the perf
dashboard.
There are still just a couple of cases where we have to use the more
general typePointersOf calls, but they're fairly rare; most bulk
barriers have perfect type information.
This change is tested by the GCInfo tests in the runtime and the GCBits
tests in the reflect package via an additional check in getgcmask.
Results for tile38 before and after allocheaders. There was previously a
regression in the p90 latency; now it's gone. Also, the overall win has
been boosted slightly.
```
tile38 $ benchstat noallocheaders.results allocheaders.results
name             old time/op            new time/op            delta
Tile38QueryLoad  481µs ± 1%             468µs ± 1%             -2.71%  (p=0.000 n=10+10)

name             old average-RSS-bytes  new average-RSS-bytes  delta
Tile38QueryLoad  6.32GB ± 1%            6.23GB ± 0%            -1.38%  (p=0.000 n=9+8)

name             old peak-RSS-bytes     new peak-RSS-bytes     delta
Tile38QueryLoad  6.49GB ± 1%            6.40GB ± 1%            -1.38%  (p=0.002 n=10+10)

name             old peak-VM-bytes      new peak-VM-bytes      delta
Tile38QueryLoad  7.72GB ± 1%            7.64GB ± 1%            -1.07%  (p=0.007 n=10+10)

name             old p50-latency-ns     new p50-latency-ns     delta
Tile38QueryLoad  212k ± 1%              205k ± 0%              -3.02%  (p=0.000 n=10+9)

name             old p90-latency-ns     new p90-latency-ns     delta
Tile38QueryLoad  622k ± 1%              616k ± 1%              -1.03%  (p=0.005 n=10+10)

name             old p99-latency-ns     new p99-latency-ns     delta
Tile38QueryLoad  4.55M ± 2%             4.39M ± 2%             -3.51%  (p=0.000 n=10+10)

name             old ops/s              new ops/s              delta
Tile38QueryLoad  12.5k ± 1%             12.8k ± 1%             +2.78%  (p=0.000 n=10+10)
```
Change-Id: I0a48f848eae8777d0fd6769c3a1fe449f8d9d0a6
Reviewed-on: https://go-review.googlesource.com/c/go/+/542219
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Diffstat (limited to 'src/runtime/mbitmap_noallocheaders.go')
| -rw-r--r-- | src/runtime/mbitmap_noallocheaders.go | 11 |
1 file changed, 9 insertions, 2 deletions
```diff
diff --git a/src/runtime/mbitmap_noallocheaders.go b/src/runtime/mbitmap_noallocheaders.go
index dab15889a4..383993aa1e 100644
--- a/src/runtime/mbitmap_noallocheaders.go
+++ b/src/runtime/mbitmap_noallocheaders.go
@@ -42,6 +42,7 @@
 package runtime
 
 import (
+	"internal/abi"
 	"internal/goarch"
 	"runtime/internal/sys"
 	"unsafe"
@@ -233,10 +234,13 @@ func (h heapBits) nextFast() (heapBits, uintptr) {
 // make sure the underlying allocation contains pointers, usually
 // by checking typ.PtrBytes.
 //
+// The type of the space can be provided purely as an optimization,
+// however it is not used with GOEXPERIMENT=noallocheaders.
+//
 // Callers must perform cgo checks if goexperiment.CgoCheck2.
 //
 //go:nosplit
-func bulkBarrierPreWrite(dst, src, size uintptr) {
+func bulkBarrierPreWrite(dst, src, size uintptr, _ *abi.Type) {
 	if (dst|src|size)&(goarch.PtrSize-1) != 0 {
 		throw("bulkBarrierPreWrite: unaligned arguments")
 	}
@@ -305,8 +309,11 @@ func bulkBarrierPreWrite(dst, src, size uintptr) {
 // This is used for special cases where e.g. dst was just
 // created and zeroed with malloc.
 //
+// The type of the space can be provided purely as an optimization,
+// however it is not used with GOEXPERIMENT=noallocheaders.
+//
 //go:nosplit
-func bulkBarrierPreWriteSrcOnly(dst, src, size uintptr) {
+func bulkBarrierPreWriteSrcOnly(dst, src, size uintptr, _ *abi.Type) {
 	if (dst|src|size)&(goarch.PtrSize-1) != 0 {
 		throw("bulkBarrierPreWrite: unaligned arguments")
 	}
```
