path: root/src/runtime/mwbbuf.go
Age  Commit message  Author
2025-07-25  runtime: rename scanobject to scanObject  (Michael Anthony Knyszek)

This is long overdue.

Change-Id: I891b114cb581e82b903c20d1c455bbbdad548fe8
Reviewed-on: https://go-review.googlesource.com/c/go/+/690535
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
2025-05-02  runtime: mark and scan small objects in whole spans [green tea]  (Michael Anthony Knyszek)

Our current parallel mark algorithm suffers from frequent stalls on
memory since its access pattern is essentially random. Small objects
are the worst offenders, since each one forces pulling in at least one
full cache line to access even when the amount to be scanned is far
smaller than that. Each object also requires an independent access to
per-object metadata.

The purpose of this change is to improve garbage collector performance
by scanning small objects in batches to obtain better cache locality
than our current approach.

The core idea behind this change is to defer marking and scanning
small objects, and then scan them in batches localized to a span.

This change adds scanned bits to each small object (<=512 bytes) span
in addition to mark bits. The scanned bits indicate that the object
has been scanned. (One way to think of them is "grey" bits and "black"
bits in the tri-color mark-sweep abstraction.) Each of these spans is
always 8 KiB and if they contain pointers, the pointer/scalar data is
already packed together at the end of the span, allowing us to further
optimize the mark algorithm for this specific case.

When the GC encounters a pointer, it first checks if it points into a
small object span. If so, it is first marked in the mark bits, and
then the object is queued on a work-stealing P-local queue. This
object represents the whole span, and we ensure that a span can only
appear at most once in any queue by maintaining an atomic ownership
bit for each span. Later, when the pointer is dequeued, we scan every
object with a set mark that doesn't have a corresponding scanned bit.
If it turns out that was the only object in the mark bits since the
last time we scanned the span, we scan just that object directly,
essentially falling back to the existing algorithm. noscan objects
have no scan work, so they are never queued.

Each span's mark and scanned bits are co-located together at the end
of the span. Since the span is always 8 KiB in size, it can be found
with simple pointer arithmetic. Next to the marks and scans we also
store the size class, eliminating the need to access the span's mspan
altogether.

The work-stealing P-local queue is a new source of GC work. If this
queue gets full, half of it is dumped to a global linked list of spans
to scan. The regular scan queues are always prioritized over this
queue to allow time for marks to accumulate. Stealing work from other
Ps is a last resort.

This change also adds a new debug mode under GODEBUG=gctrace=2 that
dumps whole-span scanning statistics by size class on every GC cycle.

A future extension to this CL is to use SIMD-accelerated scanning
kernels for scanning spans with high mark bit density.

For #19112.
For #73581.

Change-Id: I4bbb4e36f376950a53e61aaaae157ce842c341bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/658036
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
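A minimal sketch of the batched span scan described above, under
stated assumptions: spanMeta, scanSpan, and the 64-object span are
invented for illustration and are not the runtime's actual code.

```go
package main

import (
	"fmt"
	"math/bits"
)

// spanMeta stands in for the mark/scanned bitmaps co-located at the
// end of an 8 KiB span; here one uint64 covers a 64-object span.
type spanMeta struct {
	marks   uint64 // object reached: the "grey" bits
	scanned uint64 // object's pointers traced: the "black" bits
}

// scanSpan traces every marked-but-unscanned object in one pass over
// the span, the cache-friendly access pattern the CL is after.
func scanSpan(m *spanMeta, scanObj func(i int)) {
	pending := m.marks &^ m.scanned
	for pending != 0 {
		i := bits.TrailingZeros64(pending)
		scanObj(i)
		pending &^= 1 << i
	}
	m.scanned |= m.marks
}

func main() {
	m := &spanMeta{marks: 0b1011}
	scanSpan(m, func(i int) { fmt.Println("scan object", i) })
	scanSpan(m, func(i int) { fmt.Println("never reached", i) }) // nothing pending
}
```

If only one mark bit were new since the last visit, a real
implementation would scan that object directly, as the message notes.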
2024-03-25  runtime: migrate internal/atomic to internal/runtime  (Andy Pan)

For #65355

Change-Id: I65dd090fb99de9b231af2112c5ccb0eb635db2be
Reviewed-on: https://go-review.googlesource.com/c/go/+/560155
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ibrahim Bazoka <ibrahimbazoka729@gmail.com>
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
2023-02-24  cmd/compile: batch write barrier calls  (Keith Randall)

Have the write barrier call return a pointer to a buffer into which
the generated code records pointers that need write barrier treatment.

Change-Id: I7871764298e0aa1513de417010c8d46b296b199e
Reviewed-on: https://go-review.googlesource.com/c/go/+/447781
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Bypass: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
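A sketch of the calling convention this CL describes, under stated
assumptions (wbBuf, reserve, and flush are invented names): the
barrier call hands back space in the per-P buffer, and the code the
compiler generates stores the pointers itself.

```go
// wbBuf is a stand-in for the per-P write barrier buffer.
type wbBuf struct {
	buf  [256]uintptr
	next int
}

// reserve plays the role of the write barrier call: it returns n free
// slots for the generated code to fill, flushing first if necessary.
func (b *wbBuf) reserve(n int) []uintptr {
	if b.next+n > len(b.buf) {
		b.flush()
	}
	s := b.buf[b.next : b.next+n]
	b.next += n
	return s
}

func (b *wbBuf) flush() {
	// A real flush would shade every pointer in b.buf[:b.next] first.
	b.next = 0
}
```

Conceptually, the code generated for a pointer store `*slot = val`
might then look like `p := b.reserve(2); p[0] = val; p[1] = *slot`
followed by the store itself.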
2023-02-17  runtime: remove the restriction that write barrier ptrs come in pairs  (Keith Randall)

Future CLs will remove the invariant that pointers are always put in
the write barrier in pairs.

The behavior of the assembly code changes a bit, where instead of
writing the pointers unconditionally and then checking for overflow,
check for overflow first and then write the pointers.

Also changed the write barrier flush function to not take the src/dst
as arguments.

Change-Id: I2ef708038367b7b82ea67cbaf505a1d5904c775c
Reviewed-on: https://go-review.googlesource.com/c/go/+/447779
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Bypass: Keith Randall <khr@golang.org>
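A sketch of the reordering, extending the invented wbBuf above: test
for space before storing a single pointer, instead of storing a pair
unconditionally and checking for overflow afterwards.

```go
// put records one pointer; pointers no longer need to come in pairs.
func (b *wbBuf) put(p uintptr) {
	if b.next == len(b.buf) { // check first...
		b.flush() // flush no longer takes the src/dst as arguments
	}
	b.buf[b.next] = p // ...then write, knowing there is room
	b.next++
}
```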
2023-02-16  runtime: reimplement GODEBUG=cgocheck=2 as a GOEXPERIMENT  (Keith Randall)

Move this knob from a binary-startup thing to a build-time thing. This
will enable follow-on optimizations to the write barrier.

Change-Id: Ic3323348621c76a7dc390c09ff55016b19c43018
Reviewed-on: https://go-review.googlesource.com/c/go/+/447778
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-08-02  runtime: rename _p_ to pp  (Michael Pratt)

_g_, _p_, and _m_ are primarily vestiges of the C version of the
runtime, while today we prefer Go-style variable names (generally gp,
pp, and mp).

This change replaces all remaining uses of _p_ with pp. These are all
trivial replacements (i.e., no conflicts).

That said, there are several functions that refer to two different Ps
at once. There the naming convention is generally that pp refers to
the local P, and p2 refers to the other P we are accessing.

Change-Id: I205b801be839216972e7644b1fbeacdbf2612859
Reviewed-on: https://go-review.googlesource.com/c/go/+/306674
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-11  all: gofmt main repo  (Russ Cox)

[This CL is part of a sequence implementing the proposal #51082.
The design doc is at https://go.dev/s/godocfmt-design.]

Run the updated gofmt, which reformats doc comments, on the main
repository. Vendored files are excluded.

For #51082.

Change-Id: I7332f099b60f716295fb34719c98c04eb1a85407
Reviewed-on: https://go-review.googlesource.com/c/go/+/384268
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2021-06-17  [dev.typeparams] runtime: fix import sort order [generated]  (Michael Anthony Knyszek)

[git-generate]
cd src/runtime
goimports -w *.go

Change-Id: I1387af0f2fd1a213dc2f4c122e83a8db0fcb15f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/329189
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-06-17  [dev.typeparams] runtime: replace uses of runtime/internal/sys.PtrSize with internal/goarch.PtrSize [generated]  (Michael Anthony Knyszek)

[git-generate]
cd src/runtime/internal/math
gofmt -w -r "sys.PtrSize -> goarch.PtrSize" .
goimports -w *.go
cd ../..
gofmt -w -r "sys.PtrSize -> goarch.PtrSize" .
goimports -w *.go

Change-Id: I43491cdd54d2e06d4d04152b3d213851b7d6d423
Reviewed-on: https://go-review.googlesource.com/c/go/+/328337
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-10-15  runtime: remove debugCachedWork  (Michael Pratt)

debugCachedWork and all of its dependent fields and code were added to
aid in debugging issue #27993. Now that the source of the problem is
known and mitigated (via the extra work check after STW in
gcMarkDone), these extra checks are no longer required and simply make
the code more difficult to follow.

Remove it all.

Updates #27993

Change-Id: I594beedd5ca61733ba9cc9eaad8f80ea92df1a0d
Reviewed-on: https://go-review.googlesource.com/c/go/+/262350
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-06-08  runtime: always mark span when marking an object  (Austin Clements)

The page sweeper depends on spans being marked if any object in the
span is marked, but currently only greyobject does this.
gcmarknewobject and wbBufFlush1 also mark objects, but neither set
span marks. As a result, if there are live objects on a span, but
they're all marked via allocation or write barriers, then the span
itself won't be marked and the page reclaimer will free the span,
ultimately leading to memory corruption when the memory for those live
allocations gets reused.

Fix this by making gcmarknewobject and wbBufFlush1 also mark pages.

No test because I have no idea how to reliably (or even unreliably)
trigger this.

Fixes #39432.

Performance is a wash or very slightly worse. I benchmarked the
gcmarknewobject and wbBufFlush1 changes independently and both showed
a slight performance improvement, so I'm going to call this noise.

name                                old time/op    new time/op    delta
BiogoIgor                           15.9s ± 2%     15.9s ± 2%       ~     (p=0.758 n=25+25)
BiogoKrishna                        15.7s ± 3%     15.7s ± 3%       ~     (p=0.382 n=21+21)
BleveIndexBatch100                  4.94s ± 3%     5.07s ± 4%     +2.63%  (p=0.000 n=25+25)
CompileTemplate                     204ms ± 1%     205ms ± 1%     +0.43%  (p=0.000 n=21+23)
CompileUnicode                      77.8ms ± 1%    78.1ms ± 1%      ~     (p=0.130 n=23+23)
CompileGoTypes                      731ms ± 1%     733ms ± 1%     +0.30%  (p=0.006 n=22+22)
CompileCompiler                     3.64s ± 2%     3.65s ± 3%       ~     (p=0.179 n=24+25)
CompileSSA                          8.44s ± 1%     8.46s ± 1%     +0.30%  (p=0.003 n=22+23)
CompileFlate                        132ms ± 1%     133ms ± 1%       ~     (p=0.098 n=22+22)
CompileGoParser                     164ms ± 1%     164ms ± 1%     +0.37%  (p=0.000 n=21+23)
CompileReflect                      455ms ± 1%     457ms ± 2%     +0.50%  (p=0.002 n=20+22)
CompileTar                          182ms ± 2%     182ms ± 1%       ~     (p=0.382 n=22+22)
CompileXML                          245ms ± 3%     245ms ± 1%       ~     (p=0.070 n=21+23)
CompileStdCmd                       16.5s ± 2%     16.5s ± 3%       ~     (p=0.486 n=23+23)
FoglemanFauxGLRenderRotateBoat      12.9s ± 1%     13.0s ± 1%     +0.97%  (p=0.000 n=21+24)
FoglemanPathTraceRenderGopherIter1  18.6s ± 1%     18.7s ± 0%       ~     (p=0.083 n=23+24)
GopherLuaKNucleotide                28.4s ± 1%     29.3s ± 1%     +2.84%  (p=0.000 n=25+25)
MarkdownRenderXHTML                 252ms ± 0%     251ms ± 1%     -0.50%  (p=0.000 n=23+24)
Tile38WithinCircle100kmRequest      516µs ± 2%     516µs ± 2%       ~     (p=0.763 n=24+25)
Tile38IntersectsCircle100kmRequest  689µs ± 2%     689µs ± 2%       ~     (p=0.617 n=24+24)
Tile38KNearestLimit100Request       608µs ± 1%     606µs ± 2%     -0.35%  (p=0.030 n=19+22)
[Geo mean]                          522ms          524ms          +0.41%

https://perf.golang.org/search?q=upload:20200606.4

Change-Id: I8b331f310dbfaba0468035f207467c8403005bf5
Reviewed-on: https://go-review.googlesource.com/c/go/+/236817
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
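A sketch of the invariant this CL restores, with invented types and
field names: every path that sets an object mark must also mark the
span, or the page reclaimer may free a span whose only live objects
were marked by allocation or by a write barrier flush.

```go
// span is a stand-in for mspan; pageMarked is what the page
// sweeper/reclaimer consults when deciding whether to free the span.
type span struct {
	objMarks   [64]uint8
	pageMarked bool
}

// markObject is the shape greyobject already had; gcmarknewobject and
// wbBufFlush1 were missing the second line.
func markObject(s *span, objIndex uintptr) {
	s.objMarks[objIndex/8] |= 1 << (objIndex % 8)
	s.pageMarked = true // keep the whole span alive for the sweeper
}
```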
2018-12-18  runtime: flush on every write barrier while debugging  (Austin Clements)

Currently, we flush the write barrier buffer on every write barrier
once throwOnGCWork is set, but not during the mark completion
algorithm itself. As seen in recent failures like

https://build.golang.org/log/317369853b803b4ee762b27653f367e1aa445ac1

by the time we actually catch a late gcWork put, the write barrier
buffer is full-size again. As a result, we're probably not catching
the actual problematic write barrier, which is probably somewhere in
the buffer.

Fix this by using the gcWork pause generation to also keep the write
barrier buffer small between when mark completion flushes it and when
mark completion is done.

For #27993.

Change-Id: I77618169441d42a7d562fb2a998cfaa89891edb2
Reviewed-on: https://go-review.googlesource.com/c/154638
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
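A sketch of the idea, extending the invented wbBuf above: while the
pause generation is active, treat the buffer as one slot deep, so
every write barrier takes the flush path and is checked immediately.

```go
// bufCap reports how many slots of the buffer to use; put would
// compare b.next against this instead of len(b.buf). gen/pauseGen
// stand in for the gcWork pause generation mentioned above.
func (b *wbBuf) bufCap(gen, pauseGen uint32) int {
	if gen == pauseGen {
		return 1 // debugging: flush, and thus check, on every barrier
	}
	return len(b.buf) // normal operation: flush only when full
}
```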
2018-12-17  runtime: record extra information in throwOnGCWork crashes  (Austin Clements)

Currently we only know the slot address and the value being written in
the throwOnGCWork crash tracebacks, and we have to infer the old value
from what's dumped by gcWork.checkPut. Sometimes these old values
don't make sense, like when we see a write of a nil pointer to a
freshly-allocated object, yet we observe marking a value (where did
that pointer come from?).

This CL adds the old value of the slot and the first two pointers in
the buffer to the traceback.

For #27993.

Change-Id: Ib70eead1afb9c06e8099e520172c3a2acaa45f80
Reviewed-on: https://go-review.googlesource.com/c/154597
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2018-12-17  runtime: poison the write barrier buffer during flushing  (Austin Clements)

Currently we reset the write barrier buffer before processing the
pointers in it. As a result, if there were any write barriers in the
code that processes the buffer, it would corrupt the write barrier
buffer and cause us to mark objects without later scanning them.

As far as I can tell, this shouldn't be happening, but rather than
relying on hope (and incomplete static analysis), this CL changes
wbBufFlush1 to poison the write barrier buffer while processing it,
and only reset it once it's done.

Updates #27993. (Unlike many of the other changes for this issue,
there's no need to roll back this CL. It's a good change in its own
right.)

Change-Id: I6d2d9f1b69b89438438b9ee624f3fff9f009e29d
Reviewed-on: https://go-review.googlesource.com/c/154537
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
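A sketch of the poisoning, again with the invented wbBuf: make the
buffer unusable while its contents are processed and reset it only
afterwards, so a write barrier firing mid-flush is caught instead of
silently corrupting the buffer.

```go
// flush1 processes the buffered pointers; shade stands in for the
// marking work done on each one.
func (b *wbBuf) flush1(shade func(uintptr)) {
	n := b.next
	b.next = -1 // poison: a put during processing now faults loudly
	for _, p := range b.buf[:n] {
		shade(p) // must not enqueue through b itself
	}
	b.next = 0 // un-poison only once processing is done
}
```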
2018-11-29  runtime: check more work flushing races  (Austin Clements)

This adds several new checks to help debug #27993. It adds a mechanism
for freezing write barriers and gcWork puts during the mark completion
algorithm. This way, if we do detect mark completion, we can catch any
puts that happened during the completion algorithm. Based on build
dashboard failures, this seems to be the window of time when these are
happening.

This also double-checks that all work buffers are empty immediately
upon entering mark termination (much earlier than the current check).
This is unlikely to trigger based on the current failures, but is a
good safety net.

Change-Id: I03f56c48c4322069e28c50fbc3c15b2fee2130c2
Reviewed-on: https://go-review.googlesource.com/c/151797
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2018-11-02  all: use "reports whether" consistently in the few places that didn't  (Brad Fitzpatrick)

Go documentation style for boolean funcs is to say:

    // Foo reports whether ...
    func Foo() bool

(rather than "returns true if")

This CL also replaces 4 uses of "iff" with the same "reports whether"
wording, which doesn't lose any meaning, and will prevent people from
sending typo fixes when they don't realize it's "if and only if". In
the past I think we've had the typo CLs updated to just say "reports
whether". So do them all at once.

(Inspired by the addition of another "returns true if" in CL 146938
in fd_plan9.go)

Created with:

$ perl -i -npe 's/returns true iff/reports whether/' $(git grep -l "returns true iff" | grep -v vendor)
$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true if" | grep -v vendor)

Change-Id: Ided502237f5ab0d25cb625dbab12529c361a8b9f
Reviewed-on: https://go-review.googlesource.com/c/147037
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-10-02  runtime: don't disable GC work caching during mark termination  (Austin Clements)

Currently, we disable GC work caching during mark termination. This is
no longer necessary with the new mark completion detection because

1. There's no way for any of the GC mark termination helpers to have
   any real work queued and,

2. Mark termination has to explicitly flush every P's buffers anyway
   in order to flush Ps that didn't run a GC mark termination helper.

Hence, remove the code that disposes gcWork buffers during mark
termination.

Updates #26903. This is a follow-up to eliminating mark 2.

Change-Id: I81f002ee25d5c10f42afd39767774636519007f9
Reviewed-on: https://go-review.googlesource.com/c/134320
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02  runtime: eliminate gcBlackenPromptly mode  (Austin Clements)

Now that there is no mark 2 phase, gcBlackenPromptly is no longer
used.

Updates #26903. This is a follow-up to eliminating mark 2.

Change-Id: Ib9c534f21b36b8416fcf3cab667f186167b827f8
Reviewed-on: https://go-review.googlesource.com/c/134319
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02  runtime: eliminate mark 2 and fix mark termination race  (Austin Clements)

The mark 2 phase was originally introduced as a way to reduce the
chance of entering STW mark termination while there was still marking
work to do. It works by flushing and disabling all local work caches
so that all enqueued work becomes immediately globally visible.

However, mark 2 is not only slow (disabling caches makes marking and
the write barrier both much more expensive), but also imperfect. There
is still a rare but possible race (~once per all.bash) that can cause
GC to enter mark termination while there is still marking work. This
race is detailed at

https://github.com/golang/proposal/blob/master/design/17503-eliminate-rescan.md#appendix-mark-completion-race

The effect of this is that mark termination must still cope with the
possibility that there may be work remaining after a concurrent mark
phase. Dealing with this increases STW pause time and increases the
complexity of mark termination.

Furthermore, a similar but far more likely race can cause early
transition from mark 1 to mark 2. This is unfortunate because it
causes performance instability because of the cost of mark 2.

This CL fixes this by replacing mark 2 with a distributed termination
detection algorithm. This algorithm is correct, so it eliminates the
mark termination race, and doesn't require disabling local caches. It
ensures that there are no grey objects upon entering mark termination.

With this change, we're one step closer to eliminating marking from
mark termination entirely (it's still used by STW GC and checkmarks
mode).

This CL does not eliminate the gcBlackenPromptly global flag, though
it is always set to false now. It will be removed in a cleanup CL.

This led to only minor variations in the go1 benchmarks
(https://perf.golang.org/search?q=upload:20180909.1) and compilebench
benchmarks (https://perf.golang.org/search?q=upload:20180910.2). This
significantly improves performance of the garbage benchmark, with no
impact on STW times:

name                         old time/op    new time/op    delta
Garbage/benchmem-MB=64-12    2.21ms ± 1%    2.05ms ± 1%   -7.38%  (p=0.000 n=18+19)
Garbage/benchmem-MB=1024-12  2.30ms ±16%    2.20ms ± 7%   -4.51%  (p=0.001 n=20+20)

name                         old STW-ns/GC  new STW-ns/GC  delta
Garbage/benchmem-MB=64-12    138k ±44%      141k ±23%      ~      (p=0.309 n=19+20)
Garbage/benchmem-MB=1024-12  159k ±25%      178k ±98%      ~      (p=0.798 n=16+18)

name                         old STW-ns/op  new STW-ns/op  delta
Garbage/benchmem-MB=64-12    4.42k ±44%     4.24k ±23%     ~      (p=0.531 n=19+20)
Garbage/benchmem-MB=1024-12  591 ±24%       636 ±111%      ~      (p=0.309 n=16+18)

(https://perf.golang.org/search?q=upload:20180910.1)

Updates #26903.
Updates #17503.

Change-Id: Icbd1e12b7a12a76f423c9bf033b13cb363e4cd19
Reviewed-on: https://go-review.googlesource.com/c/134318
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
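A sketch of the distributed termination check under stated
assumptions: allPs, flushLocalWork, and globalWorkEmpty are invented
stand-ins for the runtime's machinery, and the stubs exist only to
make the fragment compile.

```go
type p struct{} // per-P state, elided

var allPs []*p

func flushLocalWork(pp *p) bool { return false } // stub: drain wbBuf and gcWork
func globalWorkEmpty() bool     { return true }  // stub

// tryMarkCompletion flushes every P's local caches and declares
// marking complete only if no flush produced work and the global
// queues are empty; otherwise concurrent marking simply resumes and
// completion is attempted again later.
func tryMarkCompletion() bool {
	flushed := false
	for _, pp := range allPs {
		if flushLocalWork(pp) {
			flushed = true
		}
	}
	return !flushed && globalWorkEmpty()
}
```

Because the check itself flushes the caches, no separate
cache-disabling phase (mark 2) is needed to make hidden work visible.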
2018-08-03  runtime: document assumption about wbBufFlush argument slots  (Austin Clements)

gcWriteBarrier and wbBufFlush assume that not writing to an argument
variable is sufficient to not clobber the corresponding argument slot.
This assumption lets us simplify the write barrier assembly code,
speed up the flush path, and reduce the stack usage of the write
barrier.

But it is an assumption, so this CL documents it to make this clear.

Alternatively, we could separate the register spill slots from the
argument slots in the write barrier, but that loses the advantages
above. On the other hand, it's extremely unlikely that we'll change
the behavior of the compiler to start clobbering argument slots (if
anything, we'd probably change it to *not* clobber argument slots even
if you wrote to the arguments).

Fixes #25512.

Change-Id: Ib2cf29c0d90956ca02b997ef6e7fa56fc8044efe
Reviewed-on: https://go-review.googlesource.com/127815
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-15  runtime: eliminate most uses of mheap_.arena_*  (Austin Clements)

This replaces all uses of the mheap_.arena_* fields outside of
mallocinit and sysAlloc. These fields fundamentally assume a
contiguous heap between two bounds, so eliminating these is necessary
for a sparse heap.

Many of these are replaced with checks for non-nil spans at the test
address (which in turn checks for a non-nil entry in the heap arena
array). Some of them are just for debugging and somewhat meaningless
with a sparse heap, so those we just delete.

Updates #10460.

Change-Id: I8345b95ffc610aed694f08f74633b3c63506a41f
Reviewed-on: https://go-review.googlesource.com/85886
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
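A sketch of the replacement check with invented names and sizes,
reusing the span stand-in from the earlier sketch: instead of
comparing an address against contiguous arena_start/arena_used bounds,
index a sparse arena table and ask whether a span actually exists at
that address.

```go
const (
	heapArenaBytes = 64 << 20 // hypothetical arena size
	pageSize       = 8 << 10
)

// heapArena holds per-arena metadata; spans stays nil-filled where no
// span is mapped.
type heapArena struct {
	spans [heapArenaBytes / pageSize]*span
}

var arenas [1 << 20]*heapArena // sparse: most entries remain nil

// spanOf returns the span containing addr, or nil: the "non-nil span
// at the test address" check the message describes.
func spanOf(addr uintptr) *span {
	ai := addr / heapArenaBytes
	if ai >= uintptr(len(arenas)) || arenas[ai] == nil {
		return nil // address is not in any mapped arena
	}
	return arenas[ai].spans[addr%heapArenaBytes/pageSize]
}
```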
2018-02-15  runtime: split object finding out of heapBitsForObject  (Austin Clements)

heapBitsForObject does two things: it finds the base of the object and
it creates the heapBits for the base of the object. There are several
places where we just care about the base of the object. Furthermore,
greyobject only needs the heapBits in the checkmark path and can
easily compute them only when needed. Once we eliminate passing the
heap bits to greyobject, almost all uses of heapBitsForObject don't
need the heap bits.

Hence, this splits heapBitsForObject into findObject and
heapBitsForAddr (the latter already exists), removes the hbits
argument to greyobject, and replaces all heapBitsForObject calls with
calls to findObject.

In addition to making things cleaner overall, heapBitsForAddr is going
to get more expensive shortly, so it's important that we don't do it
needlessly.

Note that there's an interesting performance pitfall here. I had
originally moved findObject to mheap.go, since it made more sense
there. However, that leads to a ~2% slow down and a whopping 11%
increase in L1 icache misses on both the x/garbage and compilebench
benchmarks. This suggests we may want to be more principled about
this, but, for now, let's just leave findObject in mbitmap.go.

(I tried to make findObject small enough to inline by splitting out
the error case, but, sadly, wasn't quite able to get it under the
inlining budget.)

Change-Id: I7bcb92f383ade565d22a9f2494e4c66fd513fb10
Reviewed-on: https://go-review.googlesource.com/85878
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
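A sketch of the split with simplified, invented signatures (the stubs
exist only so the fragment compiles): most callers now use findObject
alone, and greyobject computes the heap bits only on the checkmark
path where they are actually needed.

```go
type heapBits struct{} // stand-in for the real bitmap cursor

func findObject(p uintptr) (base uintptr, s *span, objIndex uintptr) { return } // stub

func heapBitsForAddr(addr uintptr) heapBits { return heapBits{} } // stub

func greyobject(obj uintptr, checkmark bool) {
	base, s, objIndex := findObject(obj)
	if base == 0 {
		return // not a heap object
	}
	if checkmark {
		_ = heapBitsForAddr(base) // only paid for when actually needed
	}
	_, _ = s, objIndex
	// ...set the mark bit and enqueue the object for scanning...
}
```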
2018-02-13  runtime: remove legacy eager write barrier  (Austin Clements)

Now that the buffered write barrier is implemented for all
architectures, we can remove the old eager write barrier
implementation. This CL removes the implementation from the runtime,
support in the compiler for calling it, and updates some compiler
tests that relied on the old eager barrier support. It also makes sure
that all of the useful comments from the old write barrier
implementation still have a place to live.

Fixes #22460.

Updates #21640 since this fixes the layering concerns of the write
barrier (but not the other things in that issue).

Change-Id: I580f93c152e89607e0a72fe43370237ba97bae74
Reviewed-on: https://go-review.googlesource.com/92705
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-12-11  runtime: reset write barrier buffer on all flush paths  (Austin Clements)

Currently, wbBufFlush does nothing if the goroutine is dying on the
assumption that the system is crashing anyway and running the write
barrier may crash it even more. However, it fails to reset the
buffer's "next" pointer. As a result, if there are later write
barriers on the same P, the write barrier will overflow the write
barrier buffer and start corrupting other fields in the P or other
heap objects. Often, this corrupts fields in the next allocated P
since they tend to be together in the heap.

Fix this by always resetting the buffer's "next" pointer, even if
we're not doing anything with the pointers in the buffer.

Updates #22987 and #22988. (May fix; it's hard to say.)

Change-Id: I82c11ea2d399e1658531c3e8065445a66b7282b2
Reviewed-on: https://go-review.googlesource.com/83016
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
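A sketch of the fix, extending the invented wbBuf sketches above: even
when the goroutine is dying and the pointers are deliberately ignored,
the buffer must still be reset, or later barriers on the same P walk
past its end.

```go
// flushOrDiscard is the shape of the flush entry point after the fix.
func (b *wbBuf) flushOrDiscard(dying bool, shade func(uintptr)) {
	if dying {
		b.next = 0 // discard the contents, but always reset next
		return
	}
	b.flush1(shade)
}
```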
2017-10-30  runtime: use buffered write barrier for bulkBarrierPreWrite  (Austin Clements)

This modifies bulkBarrierPreWrite to use the buffered write barrier
instead of the eager write barrier. This reduces the number of system
stack switches and sanity checks by a factor of the buffer size
(currently 256). This affects both typedmemmove and typedmemclr.

Since this is purely a runtime change, it applies to all arches
(unlike the pointer write barrier).

name                 old time/op  new time/op  delta
BulkWriteBarrier-12  7.33ns ± 6%  4.46ns ± 9%  -39.10%  (p=0.000 n=20+19)

Updates #22460.

Change-Id: I6a686a63bbf08be02b9b97250e37163c5a90cdd8
Reviewed-on: https://go-review.googlesource.com/73832
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
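A sketch of the change with the invented wbBuf and a shade stand-in:
each pointer goes through the buffer's fast path, so the system-stack
switch and sanity checks in the flush are amortized over a buffer's
worth of pointers rather than paid once per pointer.

```go
// bulkBarrier queues every pointer in slots through the buffer.
func (b *wbBuf) bulkBarrier(slots []uintptr, shade func(uintptr)) {
	for _, p := range slots {
		if b.next == len(b.buf) {
			b.flush1(shade) // rare: once per 256 pointers here
		}
		b.buf[b.next] = p // common case: a store and an increment
		b.next++
	}
}
```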
2017-10-30  runtime: buffered write barrier implementation  (Austin Clements)

This implements runtime support for buffered write barriers on amd64.
The buffered write barrier has a fast path that simply enqueues
pointers in a per-P buffer. Unlike the current write barrier, this
fast path is *not* a normal Go call and does not require the compiler
to spill general-purpose registers or put arguments on the stack. When
the buffer fills up, the write barrier takes the slow path, which
spills all general purpose registers and flushes the buffer. We don't
allow safe-points or stack splits while this frame is active, so it
doesn't matter that we have no type information for the spilled
registers in this frame.

One minor complication is cgocheck=2 mode, which uses the write
barrier to detect Go pointers being written to non-Go memory. We
obviously can't buffer this, so instead we set the buffer to its
minimum size, forcing the write barrier into the slow path on every
call. For this specific case, we pass additional information as
arguments to the flush function. This also requires enabling the cgo
write barrier slightly later during runtime initialization, after Ps
(and the per-P write barrier buffers) have been initialized.

The code in this CL is not yet active. The next CL will modify the
compiler to generate calls to the new write barrier.

This reduces the average cost of the write barrier by roughly a factor
of 4, which will pay for the cost of having it enabled more of the
time after we make the GC pacer less aggressive. (Benchmarks will be
in the next CL.)

Updates #14951.
Updates #22460.

Change-Id: I396b5b0e2c5e5c4acfd761a3235fd15abadc6cb1
Reviewed-on: https://go-review.googlesource.com/73711
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
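A sketch of the per-P buffer shape this message describes (field names
are illustrative, not necessarily mwbbuf.go's): holding next and end
as raw addresses keeps the assembly fast path down to a compare, a
store, and an increment.

```go
// wbBufAsm mirrors the structure the assembly fast path works with.
type wbBufAsm struct {
	next uintptr // address of the next free slot; bumped inline
	end  uintptr // address one past the last usable slot
	buf  [256]uintptr
	// Under cgocheck=2, end would be set so almost nothing fits,
	// forcing the checked slow path on every write barrier.
}
```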