|
This is long overdue.
Change-Id: I891b114cb581e82b903c20d1c455bbbdad548fe8
Reviewed-on: https://go-review.googlesource.com/c/go/+/690535
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
Our current parallel mark algorithm suffers from frequent memory
stalls because its access pattern is essentially random. Small objects
are the worst offenders: each one forces pulling in at least one full
cache line, even when the amount to be scanned is far smaller than
that, and each object also requires an independent access to
per-object metadata.
The purpose of this change is to improve garbage collector performance
by scanning small objects in batches to obtain better cache locality
than our current approach. The core idea behind this change is to defer
marking and scanning small objects, and then scan them in batches
localized to a span.
This change adds scanned bits, in addition to mark bits, to each span
of small objects (<=512 bytes). The scanned bits indicate that the
object has been scanned. (One way to think of them is as "grey" bits
and "black" bits in the tri-color mark-sweep abstraction.) Each of
these spans is always 8 KiB, and if it contains pointers, the
pointer/scalar data is already packed together at the end of the span,
allowing us to further optimize the mark algorithm for this specific
case.
When the GC encounters a pointer, it first checks whether it points
into a small object span. If so, the object is first marked in the
mark bits, and then queued on a P-local work-stealing queue. This
object represents the whole span, and we ensure that a span can appear
at most once in any queue by maintaining an atomic ownership bit for
each span. Later, when the span is dequeued, we scan every object that
has a set mark bit but no corresponding scanned bit. If it turns out
that only one object has been marked since the last time we scanned
the span, we scan just that object directly, essentially falling back
to the existing algorithm. Noscan objects have no scan work, so they
are never queued.
Each span's mark and scanned bits are co-located together at the end of
the span. Since the span is always 8 KiB in size, it can be found with
simple pointer arithmetic. Next to the marks and scans we also store the
size class, eliminating the need to access the span's mspan altogether.
The P-local work-stealing queue is a new source of GC work. If this
queue gets full, half of it is dumped to a global linked list of spans
to scan. The regular scan queues are always prioritized over this
queue to allow time for marks ("darts") to accumulate on spans.
Stealing work from other Ps is a last resort.
This change also adds a new debug mode under GODEBUG=gctrace=2 that
dumps whole-span scanning statistics by size class on every GC cycle.
A future extension to this CL is to use SIMD-accelerated scanning
kernels for scanning spans with high mark bit density.
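A rough, compilable sketch of the batched scan this describes; the
types, field names, and sizes are illustrative, not the runtime's
actual layout:

    package sketch

    import "math/bits"

    // Hypothetical metadata kept at the end of each 8 KiB span, found
    // with plain pointer arithmetic. Sizes and names are sketches.
    type spanScanState struct {
        sizeClass uint8
        elemSize  uintptr
        marks     [16]uint8 // one mark bit per object slot ("grey")
        scanned   [16]uint8 // one scanned bit per object slot ("black")
    }

    // scanSpanBatch scans, in one pass over the span, every object that
    // is marked but not yet scanned, then records those objects as
    // scanned.
    func scanSpanBatch(base uintptr, st *spanScanState, scanObject func(uintptr)) {
        for word := range st.marks {
            pending := st.marks[word] &^ st.scanned[word]
            st.scanned[word] |= pending
            for pending != 0 {
                bit := bits.TrailingZeros8(pending)
                pending &^= 1 << uint(bit)
                scanObject(base + uintptr(word*8+bit)*st.elemSize)
            }
        }
    }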
For #19112. (Gated behind a GOEXPERIMENT.)
For #73581.
Change-Id: I4bbb4e36f376950a53e61aaaae157ce842c341bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/658036
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
For #65355
Change-Id: I65dd090fb99de9b231af2112c5ccb0eb635db2be
Reviewed-on: https://go-review.googlesource.com/c/go/+/560155
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ibrahim Bazoka <ibrahimbazoka729@gmail.com>
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
|
|
Have the write barrier call return a pointer to a buffer into which
the generated code records pointers that need write barrier treatment.
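A minimal sketch of the resulting code shape; gcWriteBarrierBuf is a
hypothetical stand-in for the runtime call, and the real sequence is
emitted by the compiler rather than written by hand:

    package sketch

    import "unsafe"

    // gcWriteBarrierBuf stands in for the write barrier call: it
    // returns space for n pointers in the per-P buffer.
    func gcWriteBarrierBuf(n int) []unsafe.Pointer {
        return make([]unsafe.Pointer, n)
    }

    // storeWithBarrier shows the generated-code shape: ask the runtime
    // for buffer space, record the pointers inline, then do the store.
    func storeWithBarrier(slot *unsafe.Pointer, val unsafe.Pointer) {
        buf := gcWriteBarrierBuf(2)
        buf[0] = val   // the value being written
        buf[1] = *slot // the value being overwritten
        *slot = val
    }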
Change-Id: I7871764298e0aa1513de417010c8d46b296b199e
Reviewed-on: https://go-review.googlesource.com/c/go/+/447781
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Bypass: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Future CLs will remove the invariant that pointers are always put in
the write barrier in pairs.
The behavior of the assembly code changes a bit: instead of writing
the pointers unconditionally and then checking for overflow, it now
checks for overflow first and then writes the pointers.
This change also modifies the write barrier flush function so that it
no longer takes the src/dst as arguments.
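The reordering, sketched in Go rather than assembly; the wbBuf shape
and the helper here are illustrative:

    package sketch

    import "unsafe"

    const ptrSize = unsafe.Sizeof(uintptr(0))

    // wbBuf mirrors the shape of a per-P write barrier buffer: next
    // points at the first free slot, end just past the last one.
    type wbBuf struct {
        next uintptr
        end  uintptr
    }

    // wbBufFlush stands in for the real flush, which drains the buffer
    // and resets b.next; note it is no longer passed the src/dst.
    func wbBufFlush() {}

    // wbPut2 checks for space *before* writing, so the two pointers no
    // longer need to be written as an unconditional pair.
    func wbPut2(b *wbBuf, p1, p2 unsafe.Pointer) {
        if b.next+2*ptrSize > b.end {
            wbBufFlush()
        }
        *(*unsafe.Pointer)(unsafe.Pointer(b.next)) = p1
        *(*unsafe.Pointer)(unsafe.Pointer(b.next + ptrSize)) = p2
        b.next += 2 * ptrSize
    }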
Change-Id: I2ef708038367b7b82ea67cbaf505a1d5904c775c
Reviewed-on: https://go-review.googlesource.com/c/go/+/447779
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Bypass: Keith Randall <khr@golang.org>
|
|
Move this knob from a binary-startup thing to a build-time thing.
This will enable follow-on optimizations to the write barrier.
Change-Id: Ic3323348621c76a7dc390c09ff55016b19c43018
Reviewed-on: https://go-review.googlesource.com/c/go/+/447778
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
_g_, _p_, and _m_ are primarily vestiges of the C version of the
runtime, while today we prefer Go-style variable names (generally gp,
pp, and mp).
This change replaces all remaining uses of _p_ with pp. These are all
trivial replacements (i.e., no conflicts). That said, there are several
functions that refer to two different Ps at once. There the naming
convention is generally that pp refers to the local P, and p2 refers to
the other P we are accessing.
Change-Id: I205b801be839216972e7644b1fbeacdbf2612859
Reviewed-on: https://go-review.googlesource.com/c/go/+/306674
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
[This CL is part of a sequence implementing the proposal #51082.
The design doc is at https://go.dev/s/godocfmt-design.]
Run the updated gofmt, which reformats doc comments,
on the main repository. Vendored files are excluded.
For #51082.
Change-Id: I7332f099b60f716295fb34719c98c04eb1a85407
Reviewed-on: https://go-review.googlesource.com/c/go/+/384268
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
[git-generate]
cd src/runtime
goimports -w *.go
Change-Id: I1387af0f2fd1a213dc2f4c122e83a8db0fcb15f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/329189
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
|
|
internal/goarch.PtrSize [generated]
[git-generate]
cd src/runtime/internal/math
gofmt -w -r "sys.PtrSize -> goarch.PtrSize" .
goimports -w *.go
cd ../..
gofmt -w -r "sys.PtrSize -> goarch.PtrSize" .
goimports -w *.go
Change-Id: I43491cdd54d2e06d4d04152b3d213851b7d6d423
Reviewed-on: https://go-review.googlesource.com/c/go/+/328337
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
debugCachedWork and all of its dependent fields and code were added to
aid in debugging issue #27993. Now that the source of the problem is
known and mitigated (via the extra work check after STW in gcMarkDone),
these extra checks are no longer required and simply make the code more
difficult to follow.
Remove it all.
Updates #27993
Change-Id: I594beedd5ca61733ba9cc9eaad8f80ea92df1a0d
Reviewed-on: https://go-review.googlesource.com/c/go/+/262350
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
|
The page sweeper depends on spans being marked if any object in the
span is marked, but currently only greyobject does this.
gcmarknewobject and wbBufFlush1 also mark objects, but neither sets
span marks. As a result, if there are live objects on a span, but
they're all marked via allocation or write barriers, then the span
itself won't be marked and the page reclaimer will free the span,
ultimately leading to memory corruption when the memory for those live
allocations gets reused.
Fix this by making gcmarknewobject and wbBufFlush1 also mark pages.
No test because I have no idea how to reliably (or even unreliably)
trigger this.
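A sketch of the page-marking both paths now have to do; the arena
layout and index math are illustrative, not the runtime's:

    package sketch

    import "sync/atomic"

    // heapArenaSketch stands in for per-arena metadata, with one bit
    // per page meaning "this page contains marked objects".
    type heapArenaSketch struct {
        pageMarks [1024]uint32
    }

    // markSpanPage records that a span's page holds a marked object so
    // the page reclaimer won't free it. greyobject already did this;
    // gcmarknewobject and wbBufFlush1 must now do it too.
    func markSpanPage(arena *heapArenaSketch, pageIdx uintptr, pageMask uint32) {
        for {
            old := atomic.LoadUint32(&arena.pageMarks[pageIdx])
            if old&pageMask != 0 {
                return // already marked
            }
            if atomic.CompareAndSwapUint32(&arena.pageMarks[pageIdx], old, old|pageMask) {
                return
            }
        }
    }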
Fixes #39432.
Performance is a wash or very slightly worse. I benchmarked the
gcmarknewobject and wbBufFlush1 changes independently and both showed
a slight performance improvement, so I'm going to call this noise.
name old time/op new time/op delta
BiogoIgor 15.9s ± 2% 15.9s ± 2% ~ (p=0.758 n=25+25)
BiogoKrishna 15.7s ± 3% 15.7s ± 3% ~ (p=0.382 n=21+21)
BleveIndexBatch100 4.94s ± 3% 5.07s ± 4% +2.63% (p=0.000 n=25+25)
CompileTemplate 204ms ± 1% 205ms ± 1% +0.43% (p=0.000 n=21+23)
CompileUnicode 77.8ms ± 1% 78.1ms ± 1% ~ (p=0.130 n=23+23)
CompileGoTypes 731ms ± 1% 733ms ± 1% +0.30% (p=0.006 n=22+22)
CompileCompiler 3.64s ± 2% 3.65s ± 3% ~ (p=0.179 n=24+25)
CompileSSA 8.44s ± 1% 8.46s ± 1% +0.30% (p=0.003 n=22+23)
CompileFlate 132ms ± 1% 133ms ± 1% ~ (p=0.098 n=22+22)
CompileGoParser 164ms ± 1% 164ms ± 1% +0.37% (p=0.000 n=21+23)
CompileReflect 455ms ± 1% 457ms ± 2% +0.50% (p=0.002 n=20+22)
CompileTar 182ms ± 2% 182ms ± 1% ~ (p=0.382 n=22+22)
CompileXML 245ms ± 3% 245ms ± 1% ~ (p=0.070 n=21+23)
CompileStdCmd 16.5s ± 2% 16.5s ± 3% ~ (p=0.486 n=23+23)
FoglemanFauxGLRenderRotateBoat 12.9s ± 1% 13.0s ± 1% +0.97% (p=0.000 n=21+24)
FoglemanPathTraceRenderGopherIter1 18.6s ± 1% 18.7s ± 0% ~ (p=0.083 n=23+24)
GopherLuaKNucleotide 28.4s ± 1% 29.3s ± 1% +2.84% (p=0.000 n=25+25)
MarkdownRenderXHTML 252ms ± 0% 251ms ± 1% -0.50% (p=0.000 n=23+24)
Tile38WithinCircle100kmRequest 516µs ± 2% 516µs ± 2% ~ (p=0.763 n=24+25)
Tile38IntersectsCircle100kmRequest 689µs ± 2% 689µs ± 2% ~ (p=0.617 n=24+24)
Tile38KNearestLimit100Request 608µs ± 1% 606µs ± 2% -0.35% (p=0.030 n=19+22)
[Geo mean] 522ms 524ms +0.41%
https://perf.golang.org/search?q=upload:20200606.4
Change-Id: I8b331f310dbfaba0468035f207467c8403005bf5
Reviewed-on: https://go-review.googlesource.com/c/go/+/236817
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Currently, we flush the write barrier buffer on every write barrier
once throwOnGCWork is set, but not during the mark completion
algorithm itself. As seen in recent failures like
https://build.golang.org/log/317369853b803b4ee762b27653f367e1aa445ac1
by the time we actually catch a late gcWork put, the write barrier
buffer is full-size again.
As a result, we're probably not catching the actual problematic write
barrier, which is probably somewhere in the buffer.
Fix this by using the gcWork pause generation to also keep the write
barrier buffer small between when the mark completion algorithm
flushes it and when mark completion is done.
For #27993.
Change-Id: I77618169441d42a7d562fb2a998cfaa89891edb2
Reviewed-on: https://go-review.googlesource.com/c/154638
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Currently we only know the slot address and the value being written in
the throwOnGCWork crash tracebacks, and we have to infer the old value
from what's dumped by gcWork.checkPut. Sometimes these old values
don't make sense, like when we see a write of a nil pointer to a
freshly-allocated object, yet we observe marking a value (where did
that pointer come from?).
This CL adds the old value of the slot and the first two pointers in
the buffer to the traceback.
For #27993.
Change-Id: Ib70eead1afb9c06e8099e520172c3a2acaa45f80
Reviewed-on: https://go-review.googlesource.com/c/154597
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Currently we reset the write barrier buffer before processing the
pointers in it. As a result, if there were any write barriers in the
code that processes the buffer, it would corrupt the write barrier
buffer and cause us to mark objects without later scanning them.
As far as I can tell, this shouldn't be happening, but rather than
relying on hope (and incomplete static analysis), this CL changes
wbBufFlush1 to poison the write barrier buffer while processing it,
and only reset it once it's done.
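The ordering, in a simplified sketch; the buffer shape and poison
value are illustrative (the real buffer is a fixed array indexed by a
pointer):

    package sketch

    // wbBuf is a simplified write barrier buffer: a cursor plus storage.
    type wbBuf struct {
        next int // index of the next free slot; poisoned is invalid
        buf  []uintptr
    }

    const poisoned = -1

    // flush poisons the buffer while its contents are processed: any
    // write barrier that runs meanwhile indexes the buffer with -1 and
    // crashes loudly, instead of silently appending to a buffer that is
    // being consumed.
    func flush(b *wbBuf, process func([]uintptr)) {
        pending := b.buf[:b.next]
        b.next = poisoned
        process(pending)
        b.next = 0 // reset only once processing is done
    }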
Updates #27993. (Unlike many of the other changes for this issue,
there's no need to roll back this CL. It's a good change in its own
right.)
Change-Id: I6d2d9f1b69b89438438b9ee624f3fff9f009e29d
Reviewed-on: https://go-review.googlesource.com/c/154537
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
This adds several new checks to help debug #27993. It adds a mechanism
for freezing write barriers and gcWork puts during the mark completion
algorithm. This way, if we do detect mark completion, we can catch any
puts that happened during the completion algorithm. Based on build
dashboard failures, this seems to be the window of time when these are
happening.
This also double-checks that all work buffers are empty immediately
upon entering mark termination (much earlier than the current check).
This is unlikely to trigger based on the current failures, but is a
good safety net.
Change-Id: I03f56c48c4322069e28c50fbc3c15b2fee2130c2
Reviewed-on: https://go-review.googlesource.com/c/151797
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Go documentation style for boolean funcs is to say:
// Foo reports whether ...
func Foo() bool
(rather than "returns true if")
This CL also replaces 4 uses of "iff" with the same "reports whether"
wording, which doesn't lose any meaning, and will prevent people from
sending typo fixes when they don't realize it's "if and only if". In
the past I think we've had the typo CLs updated to just say "reports
whether". So do them all at once.
(Inspired by the addition of another "returns true if" in CL 146938
in fd_plan9.go)
Created with:
$ perl -i -npe 's/returns true iff/reports whether/' $(git grep -l "returns true iff" | grep -v vendor)
$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true if" | grep -v vendor)
Change-Id: Ided502237f5ab0d25cb625dbab12529c361a8b9f
Reviewed-on: https://go-review.googlesource.com/c/147037
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
Currently, we disable GC work caching during mark termination. This is
no longer necessary with the new mark completion detection because
1. There's no way for any of the GC mark termination helpers to have
any real work queued and,
2. Mark termination has to explicitly flush every P's buffers anyway
in order to flush Ps that didn't run a GC mark termination helper.
Hence, remove the code that disposes gcWork buffers during mark
termination.
Updates #26903. This is a follow-up to eliminating mark 2.
Change-Id: I81f002ee25d5c10f42afd39767774636519007f9
Reviewed-on: https://go-review.googlesource.com/c/134320
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
Now that there is no mark 2 phase, gcBlackenPromptly is no longer
used.
Updates #26903. This is a follow-up to eliminating mark 2.
Change-Id: Ib9c534f21b36b8416fcf3cab667f186167b827f8
Reviewed-on: https://go-review.googlesource.com/c/134319
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
The mark 2 phase was originally introduced as a way to reduce the
chance of entering STW mark termination while there was still marking
work to do. It works by flushing and disabling all local work caches
so that all enqueued work becomes immediately globally visible.
However, mark 2 is not only slow (disabling caches makes marking and
the write barrier both much more expensive) but also imperfect. There
is still a rare but possible race (~once per all.bash) that can cause
GC to enter mark termination while there is still marking work. This
race is detailed at
https://github.com/golang/proposal/blob/master/design/17503-eliminate-rescan.md#appendix-mark-completion-race
The effect of this is that mark termination must still cope with the
possibility that there may be work remaining after a concurrent mark
phase. Dealing with this increases STW pause time and increases the
complexity of mark termination.
Furthermore, a similar but far more likely race can cause an early
transition from mark 1 to mark 2. This is unfortunate because the cost
of mark 2 makes performance unstable.
This CL fixes this by replacing mark 2 with a distributed termination
detection algorithm. This algorithm is correct, so it eliminates the
mark termination race, and doesn't require disabling local caches. It
ensures that there are no grey objects upon entering mark termination.
With this change, we're one step closer to eliminating marking from
mark termination entirely (it's still used by STW GC and checkmarks
mode).
This CL does not eliminate the gcBlackenPromptly global flag, though
it is always set to false now. It will be removed in a cleanup CL.
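At a very high level, the completion check looks something like this
sketch; the helper names are illustrative, and the real code uses a
ragged barrier over all Ps:

    package sketch

    type p struct{} // per-P state elided

    // Stand-ins for runtime primitives, for illustration only.
    func forEachP(f func(*p))    {}
    func flushLocalWork(*p) bool { return false }
    func globalWorkEmpty() bool  { return true }

    // markCompletion repeatedly flushes every P's local work caches and
    // write barrier buffer. Only a full pass that flushes no new work,
    // with the global queues empty, proves no grey objects remain.
    func markCompletion() {
        for {
            moreWork := false
            forEachP(func(pp *p) {
                if flushLocalWork(pp) {
                    moreWork = true
                }
            })
            if !moreWork && globalWorkEmpty() {
                return // safe to enter mark termination
            }
        }
    }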
This led to only minor variations in the go1 benchmarks
(https://perf.golang.org/search?q=upload:20180909.1) and compilebench
benchmarks (https://perf.golang.org/search?q=upload:20180910.2).
This significantly improves performance of the garbage benchmark, with
no impact on STW times:
name old time/op new time/op delta
Garbage/benchmem-MB=64-12 2.21ms ± 1% 2.05ms ± 1% -7.38% (p=0.000 n=18+19)
Garbage/benchmem-MB=1024-12 2.30ms ±16% 2.20ms ± 7% -4.51% (p=0.001 n=20+20)
name old STW-ns/GC new STW-ns/GC delta
Garbage/benchmem-MB=64-12 138k ±44% 141k ±23% ~ (p=0.309 n=19+20)
Garbage/benchmem-MB=1024-12 159k ±25% 178k ±98% ~ (p=0.798 n=16+18)
name old STW-ns/op new STW-ns/op delta
Garbage/benchmem-MB=64-12 4.42k ±44% 4.24k ±23% ~ (p=0.531 n=19+20)
Garbage/benchmem-MB=1024-12 591 ±24% 636 ±111% ~ (p=0.309 n=16+18)
(https://perf.golang.org/search?q=upload:20180910.1)
Updates #26903.
Updates #17503.
Change-Id: Icbd1e12b7a12a76f423c9bf033b13cb363e4cd19
Reviewed-on: https://go-review.googlesource.com/c/134318
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
gcWriteBarrier and wbBufFlush assume that not writing to an argument
variable is sufficient to not clobber the corresponding argument slot.
This assumption lets us simplify the write barrier assembly code,
speed up the flush path, and reduce the stack usage of the write
barrier.
But it is an assumption, so this CL documents it to make this clear.
Alternatively, we could separate the register spill slots from the
argument slots in the write barrier, but that loses the advantages
above. On the other hand, it's extremely unlikely that we'll change
the behavior of the compiler to start clobbering argument slots (if
anything, we'd probably change it to *not* clobber argument slots even
if you wrote to the arguments).
Fixes #25512.
Change-Id: Ib2cf29c0d90956ca02b997ef6e7fa56fc8044efe
Reviewed-on: https://go-review.googlesource.com/127815
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This replaces all uses of the mheap_.arena_* fields outside of
mallocinit and sysAlloc. These fields fundamentally assume a
contiguous heap between two bounds, so eliminating these is necessary
for a sparse heap.
Many of these are replaced with checks for non-nil spans at the test
address (which in turn checks for a non-nil entry in the heap arena
array). Some of them are just for debugging and somewhat meaningless
with a sparse heap, so those we just delete.
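The shape of the replacement test, sketched; spanOf stands in for the
lookup through the heap arena array:

    package sketch

    type mspan struct{}

    // spanOf returns nil when no heap arena entry (and hence no span)
    // covers p. Stub for illustration.
    func spanOf(p uintptr) *mspan { return nil }

    // inHeap replaces the old "arena_start <= p && p < arena_used"
    // bounds test, which assumed a contiguous heap.
    func inHeap(p uintptr) bool {
        return spanOf(p) != nil
    }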
Updates #10460.
Change-Id: I8345b95ffc610aed694f08f74633b3c63506a41f
Reviewed-on: https://go-review.googlesource.com/85886
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
heapBitsForObject does two things: it finds the base of the object and
it creates the heapBits for the base of the object. There are several
places where we just care about the base of the object. Furthermore,
greyobject only needs the heapBits in the checkmark path and can
easily compute them only when needed. Once we eliminate passing the
heap bits to greyobject, almost all uses of heapBitsForObject don't
need the heap bits.
Hence, this splits heapBitsForObject into findObject and
heapBitsForAddr (the latter already exists), removes the hbits
argument to greyobject, and replaces all heapBitsForObject calls with
calls to findObject.
In addition to making things cleaner overall, heapBitsForAddr is going
to get more expensive shortly, so it's important that we don't do it
needlessly.
Note that there's an interesting performance pitfall here. I had
originally moved findObject to mheap.go, since it made more sense
there. However, that leads to a ~2% slow down and a whopping 11%
increase in L1 icache misses on both the x/garbage and compilebench
benchmarks. This suggests we may want to be more principled about
this, but, for now, let's just leave findObject in mbitmap.go.
(I tried to make findObject small enough to inline by splitting out
the error case, but, sadly, wasn't quite able to get it under the
inlining budget.)
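Roughly, the new split looks like this sketch; field names and the
exact signature are abridged for illustration:

    package sketch

    type mspan struct {
        startAddr uintptr
        elemSize  uintptr
    }

    func spanOf(p uintptr) *mspan { return nil } // arena lookup stub

    // findObject resolves a possibly-interior pointer to the base of
    // the object containing it, without touching the heap bitmap.
    func findObject(p uintptr) (base uintptr, s *mspan, objIndex uintptr) {
        s = spanOf(p)
        if s == nil {
            return 0, nil, 0
        }
        objIndex = (p - s.startAddr) / s.elemSize
        base = s.startAddr + objIndex*s.elemSize
        return base, s, objIndex
    }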
Change-Id: I7bcb92f383ade565d22a9f2494e4c66fd513fb10
Reviewed-on: https://go-review.googlesource.com/85878
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
Now that the buffered write barrier is implemented for all
architectures, we can remove the old eager write barrier
implementation. This CL removes the implementation from the runtime,
support in the compiler for calling it, and updates some compiler
tests that relied on the old eager barrier support. It also makes sure
that all of the useful comments from the old write barrier
implementation still have a place to live.
Fixes #22460.
Updates #21640 since this fixes the layering concerns of the write
barrier (but not the other things in that issue).
Change-Id: I580f93c152e89607e0a72fe43370237ba97bae74
Reviewed-on: https://go-review.googlesource.com/92705
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Currently, wbBufFlush does nothing if the goroutine is dying on the
assumption that the system is crashing anyway and running the write
barrier may crash it even more. However, it fails to reset the
buffer's "next" pointer. As a result, if there are later write
barriers on the same P, the write barrier will overflow the write
barrier buffer and start corrupting other fields in the P or other
heap objects. Often, this corrupts fields in the next allocated P
since they tend to be together in the heap.
Fix this by always resetting the buffer's "next" pointer, even if
we're not doing anything with the pointers in the buffer.
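The fix, in a simplified sketch with illustrative field and method
names:

    package sketch

    type wbBuf struct {
        next int // index of the next free slot
        buf  []uintptr
    }

    // discard resets the cursor without processing the buffered
    // pointers.
    func (b *wbBuf) discard() { b.next = 0 }

    // flushPointers marks the buffered pointers, then resets the
    // cursor.
    func (b *wbBuf) flushPointers() { b.next = 0 }

    // wbBufFlush must reset next even on the dying path; otherwise
    // later write barriers on this P run off the end of the buffer and
    // corrupt whatever follows it.
    func wbBufFlush(b *wbBuf, dying bool) {
        if dying {
            b.discard()
            return
        }
        b.flushPointers()
    }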
Updates #22987 and #22988. (May fix; it's hard to say.)
Change-Id: I82c11ea2d399e1658531c3e8065445a66b7282b2
Reviewed-on: https://go-review.googlesource.com/83016
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
This modifies bulkBarrierPreWrite to use the buffered write barrier
instead of the eager write barrier. This reduces the number of system
stack switches and sanity checks by a factor of the buffer size
(currently 256). This affects both typedmemmove and typedmemclr.
Since this is purely a runtime change, it applies to all arches
(unlike the pointer write barrier).
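In sketch form (buffer shape illustrative), the change amortizes the
flush over a buffer's worth of pointers instead of paying for a system
stack switch on each one:

    package sketch

    type wbBuf struct {
        next int
        buf  [512]uintptr
    }

    // flush drains the buffer; the system-stack switch happens here,
    // once per buffer's worth of pointers.
    func (b *wbBuf) flush() { b.next = 0 }

    // bulkBarrierPut enqueues old/new pointer pairs in the per-P
    // buffer, flushing only when it fills.
    func bulkBarrierPut(b *wbBuf, dst, src []uintptr) {
        for i := range dst {
            if b.next+2 > len(b.buf) {
                b.flush()
            }
            b.buf[b.next] = src[i]   // value being written
            b.buf[b.next+1] = dst[i] // value being overwritten
            b.next += 2
        }
    }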
name old time/op new time/op delta
BulkWriteBarrier-12 7.33ns ± 6% 4.46ns ± 9% -39.10% (p=0.000 n=20+19)
Updates #22460.
Change-Id: I6a686a63bbf08be02b9b97250e37163c5a90cdd8
Reviewed-on: https://go-review.googlesource.com/73832
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
This implements runtime support for buffered write barriers on amd64.
The buffered write barrier has a fast path that simply enqueues
pointers in a per-P buffer. Unlike the current write barrier, this
fast path is *not* a normal Go call and does not require the compiler
to spill general-purpose registers or put arguments on the stack. When
the buffer fills up, the write barrier takes the slow path, which
spills all general purpose registers and flushes the buffer. We don't
allow safe-points or stack splits while this frame is active, so it
doesn't matter that we have no type information for the spilled
registers in this frame.
One minor complication is cgocheck=2 mode, which uses the write
barrier to detect Go pointers being written to non-Go memory. We
obviously can't buffer this, so instead we set the buffer to its
minimum size, forcing the write barrier into the slow path on every
call. For this specific case, we pass additional information as
arguments to the flush function. This also requires enabling the cgo
write barrier slightly later during runtime initialization, after Ps
(and the per-P write barrier buffers) have been initialized.
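The fast path's logic, sketched in Go; the real fast path is
hand-written assembly with a special calling convention that spills no
registers, and the names here are illustrative:

    package sketch

    import "unsafe"

    // wbBuf is the per-P buffer: next points at the first free slot,
    // end just past the last. For cgocheck=2, end is set so the buffer
    // holds a single entry, forcing the slow path on every call.
    type wbBuf struct {
        next uintptr
        end  uintptr
        buf  [256]uintptr
    }

    // flushSlow stands in for the slow path, which spills all
    // general-purpose registers and drains the buffer.
    func flushSlow(b *wbBuf) {}

    // put enqueues one pointer, taking the slow path only when the
    // buffer fills.
    func put(b *wbBuf, p uintptr) {
        *(*uintptr)(unsafe.Pointer(b.next)) = p
        b.next += unsafe.Sizeof(p)
        if b.next == b.end {
            flushSlow(b)
        }
    }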
The code in this CL is not yet active. The next CL will modify the
compiler to generate calls to the new write barrier.
This reduces the average cost of the write barrier by roughly a factor
of 4, which will pay for the cost of having it enabled more of the
time after we make the GC pacer less aggressive. (Benchmarks will be
in the next CL.)
Updates #14951.
Updates #22460.
Change-Id: I396b5b0e2c5e5c4acfd761a3235fd15abadc6cb1
Reviewed-on: https://go-review.googlesource.com/73711
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
|