Currently, gcController.scanWork is updated as lazily as possible
since it is only read at the end of the GC cycle. We're about to read
it during the GC cycle to improve the assist ratio revisions, so
modify gcDrain* to regularly flush to gcController.scanWork in much
the same way as we regularly flush to gcController.bgScanCredit.
One consequence of this is that it's difficult to keep gcw.scanWork
monotonic, so we give up on that and simply return the amount of scan
work done by gcDrainN rather than calculating it in the caller.
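A minimal, self-contained sketch of that flushing pattern (the package, names, and threshold below are illustrative stand-ins, not the runtime's code):

    package gcmodel

    import "sync/atomic"

    // globalScanWork stands in for gcController.scanWork.
    var globalScanWork int64

    const scanWorkFlushThreshold = 4096 // hypothetical flush granularity

    // drain models gcDrain*: it accumulates scan work locally and
    // flushes to the global counter at regular intervals rather than
    // once at the end of the cycle, so the counter is usable mid-cycle.
    func drain(objectSizes []int64) {
        var local int64
        for _, size := range objectSizes {
            local += size // "scan" the object, counting its work
            if local >= scanWorkFlushThreshold {
                atomic.AddInt64(&globalScanWork, local)
                local = 0
            }
        }
        atomic.AddInt64(&globalScanWork, local) // final flush: lose nothing
    }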
Change-Id: I7b50acdc39602f843eed0b5c6d2dacd7e762b81d
Reviewed-on: https://go-review.googlesource.com/15407
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
Change-Id: I3c21ffa80a5c14911e07238b1f64bec686ed7b72
Reviewed-on: https://go-review.googlesource.com/14980
Reviewed-by: Minux Ma <minux@golang.org>
|
|
The scheduler, the work buffer's dispose, and write barriers
can conspire to hide a pointer from the GC's concurrent
mark phase. If this pointer is the only path to a large
amount of marking, the STW mark termination phase may take
a long time.
Consider the following:
1) dispose places a work buffer on the partial queue
2) the GC is busy so it does not immediately remove and
process the work buffer
3) the scheduler runs a mutator whose write barrier dequeues the
work buffer from the partial queue so the GC won't see it
This repeats until the GC reaches the mark termination
phase where the GC finally discovers the pointer along
with a lot of work to do.
This CL fixes the problem by having the mutator
dispose of the buffer to the full queue instead of
the partial queue. Since the write barrier never asks for full
buffers, the conspiracy described above is not possible.
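A self-contained model of the fix (the real lists are lock-free; a mutex and slices stand in here, and all names are illustrative):

    package gcmodel

    import "sync"

    type workbuf struct{ obj []uintptr }

    var (
        mu      sync.Mutex
        full    []*workbuf // only the GC's drain loop consumes this list
        partial []*workbuf // the write barrier may also consume this list
    )

    // dispose models the fix: the mutator publishes its buffer on the
    // full list, where the write barrier will never dequeue it, so it
    // stays visible to the GC until the drain loop processes it.
    func dispose(b *workbuf) {
        mu.Lock()
        full = append(full, b) // previously: partial = append(partial, b)
        mu.Unlock()
    }

    // writeBarrierGetBuf models the write barrier's buffer request;
    // note that it never touches the full list.
    func writeBarrierGetBuf() *workbuf {
        mu.Lock()
        defer mu.Unlock()
        if n := len(partial); n > 0 {
            b := partial[n-1]
            partial = partial[:n-1]
            return b
        }
        return nil
    }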
Updates #11694.
Change-Id: I2ce832f9657a7570f800e8ce4459cd9e304ef43b
Reviewed-on: https://go-review.googlesource.com/12840
Reviewed-by: Austin Clements <austin@google.com>
|
|
Some latency regressions have crept into our system over the past few
weeks. This CL fixes those by having the mark phase more aggressively
blacken objects so that the mark termination phase, a STW phase, has less
work to do. Three approaches are taken when the mark phase believes
it has no more work to do, i.e., when all the work buffers are empty.
If things have gone well, the mark phase is correct and there is
in fact little or no work, so the following items take very little
time. If the mark phase is wrong, this CL ferrets that work out and
gives the mark phase a chance to deal with it concurrently before
mark termination begins.
When the mark phase first appears to be out of work, it does three things:
1) It switches from allocating white to allocating black to reduce the
number of unmarked objects reachable only from stacks.
2) It flushes and disables per-P GC work caches so all work must be in
globally visible work buffers.
3) It rescans the global roots---the BSS and data segments---so there
are fewer objects to blacken during mark termination. We do not rescan
stacks at this point, though that could be done in a later CL.
After these steps, it again drains the global work buffers.
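Sketched as code, assuming illustrative names for the runtime machinery (a model of the sequence above, not the runtime's implementation):

    package gcmodel

    // allocateBlack, when set, makes new allocations come out marked,
    // so they cannot grow the unmarked set reachable only from stacks.
    var allocateBlack bool

    // The helpers below are stubs standing in for runtime machinery.
    func flushAndDisablePerPCaches() {}
    func rescanGlobals()             {} // re-blackens the BSS and data segments
    func drainGlobalWorkBuffers()    {}

    // onWorkBuffersAppearEmpty models the three steps taken when the
    // mark phase first believes it is out of work.
    func onWorkBuffersAppearEmpty() {
        allocateBlack = true        // 1) allocate black instead of white
        flushAndDisablePerPCaches() // 2) all work must be globally visible
        rescanGlobals()             // 3) fewer objects left for mark termination
        drainGlobalWorkBuffers()    // then drain whatever these steps exposed
    }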
On a lightly loaded machine the garbage benchmark reduced the
number of GC cycles with latency > 10 ms from 83 out of 4083 cycles
down to 2 out of 3995 cycles. Maximum latency dropped from
60+ ms to 20 ms.
Change-Id: I152285b48a7e56c5083a02e8e4485dd39c990492
Reviewed-on: https://go-review.googlesource.com/10590
Reviewed-by: Austin Clements <austin@google.com>
|
|
These were found by grepping the comments from the Go code and feeding
the output to aspell.
Change-Id: Id734d6c8d1938ec3c36bd94a4dbbad577e3ad395
Reviewed-on: https://go-review.googlesource.com/10941
Reviewed-by: Aamir Khan <syst3m.w0rm@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
During development we ran with monitoring code turned
on by default. This CL turns the work buffer monitoring
off. Performance change on most go1 benchmarks is small
or insignificant.
name old mean new mean delta
BinaryTree17 3.35s × (0.99,1.01) 3.35s × (0.99,1.01) ~ (p=0.841 n=5+5)
Fannkuch11 2.59s × (1.00,1.01) 2.55s × (1.00,1.00) -1.65% (p=0.008 n=5+5)
FmtFprintfEmpty 52.5ns × (0.99,1.02) 53.2ns × (0.98,1.01) ~ (p=0.063 n=5+5)
FmtFprintfString 181ns × (1.00,1.00) 180ns × (1.00,1.00) -0.55% (p=0.029 n=4+4)
FmtFprintfInt 176ns × (1.00,1.01) 174ns × (1.00,1.00) -0.91% (p=0.000 n=5+4)
FmtFprintfIntInt 298ns × (1.00,1.00) 299ns × (1.00,1.00) ~ (p=0.143 n=4+4)
FmtFprintfPrefixedInt 250ns × (1.00,1.01) 246ns × (1.00,1.00) -1.68% (p=0.000 n=5+4)
FmtFprintfFloat 340ns × (1.00,1.00) 340ns × (1.00,1.01) ~ (p=0.643 n=5+5)
FmtManyArgs 1.16µs × (1.00,1.00) 1.15µs × (1.00,1.00) -0.47% (p=0.016 n=5+5)
GobDecode 9.22ms × (1.00,1.00) 9.23ms × (1.00,1.00) ~ (p=0.841 n=5+5)
GobEncode 7.00ms × (1.00,1.01) 7.09ms × (0.99,1.01) +1.26% (p=0.016 n=5+5)
Gzip 387ms × (1.00,1.00) 389ms × (0.99,1.02) ~ (p=1.000 n=5+5)
Gunzip 97.8ms × (1.00,1.00) 98.3ms × (1.00,1.00) +0.51% (p=0.016 n=5+4)
HTTPClientServer 52.6µs × (1.00,1.01) 52.7µs × (1.00,1.01) ~ (p=1.000 n=5+5)
JSONEncode 18.0ms × (0.99,1.02) 17.9ms × (1.00,1.00) ~ (p=0.310 n=5+5)
JSONDecode 64.8ms × (0.99,1.02) 63.6ms × (1.00,1.00) -1.94% (p=0.008 n=5+5)
Mandelbrot200 4.05ms × (1.00,1.00) 4.05ms × (1.00,1.00) ~ (p=0.421 n=5+5)
GoParse 3.86ms × (1.00,1.01) 3.84ms × (0.99,1.01) ~ (p=0.421 n=5+5)
RegexpMatchEasy0_32 101ns × (1.00,1.00) 102ns × (0.99,1.02) ~ (p=0.238 n=4+5)
RegexpMatchEasy0_1K 346ns × (1.00,1.01) 345ns × (1.00,1.00) ~ (p=0.333 n=5+4)
RegexpMatchEasy1_32 87.3ns × (0.99,1.02) 87.4ns × (1.00,1.00) ~ (p=0.190 n=5+4)
RegexpMatchEasy1_1K 520ns × (1.00,1.00) 520ns × (1.00,1.01) ~ (p=1.000 n=4+5)
RegexpMatchMedium_32 143ns × (1.00,1.00) 142ns × (1.00,1.00) -0.70% (p=0.029 n=4+4)
RegexpMatchMedium_1K 43.2µs × (1.00,1.01) 43.2µs × (1.00,1.00) ~ (p=0.841 n=5+5)
RegexpMatchHard_32 2.24µs × (1.00,1.01) 2.23µs × (1.00,1.01) -0.63% (p=0.048 n=5+5)
RegexpMatchHard_1K 68.7µs × (1.00,1.00) 68.3µs × (1.00,1.00) -0.56% (p=0.008 n=5+5)
Revcomp 577ms × (1.00,1.01) 579ms × (1.00,1.00) ~ (p=0.151 n=5+5)
Template 74.9ms × (1.00,1.00) 76.5ms × (1.00,1.00) +2.11% (p=0.008 n=5+5)
TimeParse 359ns × (1.00,1.00) 362ns × (1.00,1.00) +0.72% (p=0.008 n=5+5)
TimeFormat 369ns × (1.00,1.00) 371ns × (1.00,1.01) ~ (p=0.071 n=5+5)
Change-Id: I4206a3f77a3d1450966b7a62ea7597aec44cb72f
Reviewed-on: https://go-review.googlesource.com/10294
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
|
Prior to this CL, whenever GC marking was enabled and
a P was looking for work, we supplied a G to help
the GC do its marking tasks. Once this G finished all
the available marking, it would release the P to find another
available G. When there was no work, the P would drop
into findrunnable, which would execute the mark helper G; the G would
immediately return, and the P would drop into findrunnable again,
repeating the process. Since the P was always given a G to run, it never
blocked. This CL first checks whether the GC mark helper G has available
work; if not, the P immediately falls through to its blocking logic, as
sketched below.
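A simplified model of the new check (all names are illustrative stubs, not the runtime's findrunnable):

    package gcmodel

    type g struct{} // simplified goroutine descriptor

    var markHelperG = new(g)

    // Stubs standing in for the runtime's GC state and scheduler.
    func gcMarkActive() bool        { return false }
    func gcMarkWorkAvailable() bool { return false }
    func blockUntilWork() *g        { return nil }

    // findRunnable models the change: hand out the mark helper G only
    // when it actually has marking work; otherwise fall through to the
    // blocking logic instead of spinning.
    func findRunnable() *g {
        if gcMarkActive() && gcMarkWorkAvailable() {
            return markHelperG
        }
        return blockUntilWork()
    }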
Fixes #10901
Change-Id: I94ac9646866ba64b7892af358888bc9950de23b5
Reviewed-on: https://go-review.googlesource.com/10189
Reviewed-by: Austin Clements <austin@google.com>
|
|
scanobject with ptrmask!=nil is only ever called with the base
pointer of a heap object. Currently, scanobject calls
heapBitsForObject, which goes to a great deal of trouble to check
that the pointer points into the heap and to find the base of the
object it points to, both of which are completely unnecessary in
this case.
Replace this call to heapBitsForObject with much simpler logic to
fetch the span and compute the heap bits.
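A simplified model of the shortcut, with illustrative stubs for the span table and bitmap lookup:

    package gcmodel

    type mspan struct{ elemsize uintptr }

    type heapBits struct {
        bitp  *uint8
        shift uint32
    }

    // Stubs for the span table and the heap bitmap lookup.
    func spanOfUnchecked(addr uintptr) *mspan   { return &mspan{} }
    func heapBitsForAddr(addr uintptr) heapBits { return heapBits{} }

    // heapBitsForBase models the shortcut: b is already known to be the
    // base pointer of an in-heap object, so neither the in-heap check
    // nor the object-base search that heapBitsForObject performs is
    // needed.
    func heapBitsForBase(b uintptr) (*mspan, heapBits) {
        return spanOfUnchecked(b), heapBitsForAddr(b)
    }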
Benchmark results with five runs:
name old mean new mean delta
BenchmarkBinaryTree17 9.21s × (0.95,1.02) 8.55s × (0.91,1.03) -7.16% (p=0.022)
BenchmarkFannkuch11 2.65s × (1.00,1.00) 2.62s × (1.00,1.00) -1.10% (p=0.000)
BenchmarkFmtFprintfEmpty 73.2ns × (0.99,1.01) 71.7ns × (1.00,1.01) -1.99% (p=0.004)
BenchmarkFmtFprintfString 302ns × (0.99,1.00) 292ns × (0.98,1.02) -3.31% (p=0.020)
BenchmarkFmtFprintfInt 281ns × (0.98,1.01) 279ns × (0.96,1.02) ~ (p=0.596)
BenchmarkFmtFprintfIntInt 482ns × (0.98,1.01) 488ns × (0.95,1.02) ~ (p=0.419)
BenchmarkFmtFprintfPrefixedInt 382ns × (0.99,1.01) 365ns × (0.96,1.02) -4.35% (p=0.015)
BenchmarkFmtFprintfFloat 475ns × (0.99,1.01) 472ns × (1.00,1.00) ~ (p=0.108)
BenchmarkFmtManyArgs 1.89µs × (1.00,1.01) 1.90µs × (0.94,1.02) ~ (p=0.883)
BenchmarkGobDecode 22.4ms × (0.99,1.01) 21.9ms × (0.92,1.04) ~ (p=0.332)
BenchmarkGobEncode 24.7ms × (0.98,1.02) 23.9ms × (0.87,1.07) ~ (p=0.407)
BenchmarkGzip 397ms × (0.99,1.01) 398ms × (0.99,1.01) ~ (p=0.718)
BenchmarkGunzip 96.7ms × (1.00,1.00) 96.9ms × (1.00,1.00) ~ (p=0.230)
BenchmarkHTTPClientServer 71.5µs × (0.98,1.01) 68.5µs × (0.92,1.06) ~ (p=0.243)
BenchmarkJSONEncode 46.1ms × (0.98,1.01) 44.9ms × (0.98,1.03) -2.51% (p=0.040)
BenchmarkJSONDecode 86.1ms × (0.99,1.01) 86.5ms × (0.99,1.01) ~ (p=0.343)
BenchmarkMandelbrot200 4.12ms × (1.00,1.00) 4.13ms × (1.00,1.00) +0.23% (p=0.000)
BenchmarkGoParse 5.89ms × (0.96,1.03) 5.82ms × (0.96,1.04) ~ (p=0.522)
BenchmarkRegexpMatchEasy0_32 141ns × (0.99,1.01) 142ns × (1.00,1.00) ~ (p=0.178)
BenchmarkRegexpMatchEasy0_1K 408ns × (1.00,1.00) 392ns × (0.99,1.00) -3.83% (p=0.000)
BenchmarkRegexpMatchEasy1_32 122ns × (1.00,1.00) 122ns × (1.00,1.00) ~ (p=0.178)
BenchmarkRegexpMatchEasy1_1K 626ns × (1.00,1.01) 624ns × (0.99,1.00) ~ (p=0.122)
BenchmarkRegexpMatchMedium_32 202ns × (0.99,1.00) 205ns × (0.99,1.01) +1.58% (p=0.001)
BenchmarkRegexpMatchMedium_1K 54.4µs × (1.00,1.00) 55.5µs × (1.00,1.00) +1.86% (p=0.000)
BenchmarkRegexpMatchHard_32 2.68µs × (1.00,1.00) 2.71µs × (1.00,1.00) +0.97% (p=0.002)
BenchmarkRegexpMatchHard_1K 79.8µs × (1.00,1.01) 80.5µs × (1.00,1.01) +0.94% (p=0.003)
BenchmarkRevcomp 590ms × (0.99,1.01) 585ms × (1.00,1.00) ~ (p=0.066)
BenchmarkTemplate 111ms × (0.97,1.02) 112ms × (0.99,1.01) ~ (p=0.201)
BenchmarkTimeParse 392ns × (1.00,1.00) 385ns × (1.00,1.00) -1.69% (p=0.000)
BenchmarkTimeFormat 449ns × (0.98,1.01) 448ns × (0.99,1.01) ~ (p=0.550)
Change-Id: Ie7c3830c481d96c9043e7bf26853c6c1d05dc9f4
Reviewed-on: https://go-review.googlesource.com/9364
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
Currently, each M has a cache of the most recently used *workbuf. This
is used primarily by the write barrier so it doesn't have to access
the global workbuf lists on every write barrier. It's also used by
stack scanning because it's convenient.
This cache is important for write barrier performance, but this
particular approach has several downsides. It's faster than no cache,
but far from optimal (as the benchmarks below show). It's complex:
access to the cache is sprinkled through most of the workbuf list
operations and it requires special care to transform into and back out
of the gcWork cache that's actually used for scanning and marking. It
requires atomic exchanges to take ownership of the cached workbuf and
to return it to the M's cache even though it's almost always used by
only the current M. Since it's per-M, flushing these caches is O(# of
Ms), which may be high. And it has some significant subtleties: for
example, in general the cache shouldn't be used after the
harvestwbufs() in mark termination because it could hide work from
mark termination, but stack scanning can happen after this and *will*
use the cache (but it turns out this is okay because it will always be
followed by a getfull(), which drains the cache).
This change replaces this cache with a per-P gcWork object. This
gcWork cache can be used directly by scanning and marking (as long as
preemption is disabled, which is a general requirement of gcWork).
Since it's per-P, it doesn't require synchronization, which simplifies
things and means the only atomic operations in the write barrier are
occasionally fetching new work buffers and setting a mark bit if the
object isn't already marked. This cache can be flushed in O(# of Ps),
which is generally small. It follows a simple flushing rule: the cache
can be used during any phase, but during mark termination it must be
flushed before allowing preemption. This also makes the dispose during
mutator assist no longer necessary, which eliminates the vast majority
of gcWork dispose calls and reduces contention on the global workbuf
lists. And it's a lot faster on some benchmarks:
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 11963668673 11206112763 -6.33%
BenchmarkFannkuch11 2643217136 2649182499 +0.23%
BenchmarkFmtFprintfEmpty 70.4 70.2 -0.28%
BenchmarkFmtFprintfString 364 307 -15.66%
BenchmarkFmtFprintfInt 317 282 -11.04%
BenchmarkFmtFprintfIntInt 512 483 -5.66%
BenchmarkFmtFprintfPrefixedInt 404 380 -5.94%
BenchmarkFmtFprintfFloat 521 479 -8.06%
BenchmarkFmtManyArgs 2164 1894 -12.48%
BenchmarkGobDecode 30366146 22429593 -26.14%
BenchmarkGobEncode 29867472 26663152 -10.73%
BenchmarkGzip 391236616 396779490 +1.42%
BenchmarkGunzip 96639491 96297024 -0.35%
BenchmarkHTTPClientServer 100110 70763 -29.31%
BenchmarkJSONEncode 51866051 52511382 +1.24%
BenchmarkJSONDecode 103813138 86094963 -17.07%
BenchmarkMandelbrot200 4121834 4120886 -0.02%
BenchmarkGoParse 16472789 5879949 -64.31%
BenchmarkRegexpMatchEasy0_32 140 140 +0.00%
BenchmarkRegexpMatchEasy0_1K 394 394 +0.00%
BenchmarkRegexpMatchEasy1_32 120 120 +0.00%
BenchmarkRegexpMatchEasy1_1K 621 614 -1.13%
BenchmarkRegexpMatchMedium_32 209 202 -3.35%
BenchmarkRegexpMatchMedium_1K 54889 55175 +0.52%
BenchmarkRegexpMatchHard_32 2682 2675 -0.26%
BenchmarkRegexpMatchHard_1K 79383 79524 +0.18%
BenchmarkRevcomp 584116718 584595320 +0.08%
BenchmarkTemplate 125400565 109620196 -12.58%
BenchmarkTimeParse 386 387 +0.26%
BenchmarkTimeFormat 580 447 -22.93%
(Best out of 10 runs. The delta of averages is similar.)
This also puts us in a good position to flush these caches when
nearing the end of concurrent marking, which will let us increase the
size of the work buffers while still controlling mark termination
pause time.
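A minimal model of the per-P cache and its flushing rule (illustrative names; the gcWork API itself is sketched with the commit that introduced it, further down this log):

    package gcmodel

    type gcWork struct{ buf []uintptr }

    // dispose stands in for flushing the cached buffer to the global lists.
    func (w *gcWork) dispose() { w.buf = nil }

    type p struct {
        gcw gcWork // per-P cache; replaces the old per-M *workbuf cache
    }

    // flushGCCaches models the rule "flush before allowing preemption
    // during mark termination": an O(len(allp)) loop rather than
    // O(number of Ms).
    func flushGCCaches(allp []*p) {
        for _, pp := range allp {
            pp.gcw.dispose()
        }
    }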
Change-Id: I2dd94c8517a19297a98ec280203cccaa58792522
Reviewed-on: https://go-review.googlesource.com/9178
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
This tracks the amount of scan work in terms of scanned pointers
during the concurrent mark phase. We'll use this information to
estimate scan work for the next cycle.
Currently the work counter is accumulated in gcWork, and dispose
atomically folds it into a global work counter. dispose happens
relatively infrequently, so the contention on the global counter
should be low. If this turns out to be an issue, we can reduce the
number of disposes, and if it's still a problem, we can switch to
per-P counters.
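As a hedged sketch of that aggregation, assuming illustrative names:

    package gcmodel

    import "sync/atomic"

    // globalScanWork stands in for the global counter.
    var globalScanWork int64

    type gcWork struct {
        scanWork int64 // pointers scanned since the last dispose
    }

    // dispose folds the local counter into the global one with a single
    // atomic add; because dispose is infrequent, contention stays low.
    func (w *gcWork) dispose() {
        atomic.AddInt64(&globalScanWork, w.scanWork)
        w.scanWork = 0
    }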
Change-Id: Iac0364c466ee35fab781dbbbe7970a5f3c4e1fc1
Reviewed-on: https://go-review.googlesource.com/8832
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
This tracks the number of heap bytes marked by a GC cycle. We'll use
this information to precisely trigger the next GC cycle.
Currently the work counter is accumulated in gcWork, and dispose
atomically folds it into a global work counter. dispose happens
relatively infrequently, so the contention on the global counter
should be low. If this turns out to be an issue, we can reduce the
number of disposes, and if it's still a problem, we can switch to
per-P counters.
Change-Id: I1bc377cb2e802ef61c2968602b63146d52e7f5db
Reviewed-on: https://go-review.googlesource.com/8388
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
Currently, gcDrainN is documented saying that it must be run on the
system stack. In fact, the problem and solution here are somewhat
subtler. First, it doesn't have to happen on the system stack; it just
has to be non-stoppable (that is, non-preemptible). Second, this isn't
specific to gcDrainN (though gcDrainN is perhaps the most surprising
instance); it's general to anything that uses the gcWork structure.
Move the comment to gcWork and generalize it.
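The generalized rule, modeled as a self-contained sketch (disablePreemption/enablePreemption stand in for the runtime's acquirem/releasem; everything here is illustrative):

    package gcmodel

    type gcWork struct{ buf []uintptr }

    func (w *gcWork) put(obj uintptr) { w.buf = append(w.buf, obj) }
    func (w *gcWork) dispose()        { w.buf = nil } // flush to the global lists

    // Stubs for disabling and re-enabling preemption.
    func disablePreemption() {}
    func enablePreemption()  {}

    // scanGreyObjects models the rule: from the first put to the final
    // dispose, the goroutine must not be stopped, or the work held in w
    // would be hidden from mark termination.
    func scanGreyObjects(w *gcWork, objs []uintptr) {
        disablePreemption()
        for _, obj := range objs {
            w.put(obj)
        }
        w.dispose() // work is globally visible again
        enablePreemption()
    }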
Change-Id: I5277b5abb070e47f8d783bc15a310b379c6adc22
Reviewed-on: https://go-review.googlesource.com/8247
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
Currently, we only exit the getfull barrier if there is work on the
full list, even though the exit path will take work from either the
full or partial list. Change this to exit the barrier if there is work
on either the full or partial lists.
I believe it's currently safe to check only the full list, since
during mark termination there is no reason to put a workbuf on a
partial list. However, checking both is more robust.
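A tiny model of the corrected check, with illustrative counters standing in for the list heads:

    package gcmodel

    import "sync/atomic"

    // Illustrative length counters for the two global lists.
    var fullCount, partialCount int64

    // workAvailable models the barrier's exit check: leave the getfull
    // barrier if either list has work, matching the exit path, which
    // will take work from both.
    func workAvailable() bool {
        return atomic.LoadInt64(&fullCount) != 0 ||
            atomic.LoadInt64(&partialCount) != 0
    }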
Change-Id: Icf095b0945c7cad326a87ff2f1dc49b7699df373
Reviewed-on: https://go-review.googlesource.com/7840
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
The distinction between gcWorkProducer and gcWork (producer and
consumer) is not serving us as originally intended, so merge these
into just gcWork.
The original intent was to replace the currentwbuf cache with a
gcWorkProducer. However, with gchelpwork (aka mutator assists),
mutators can both produce and consume work, so it will make more sense
to cache a whole gcWork.
Change-Id: I6e633e96db7cb23a64fbadbfc4607e3ad32bcfb3
Reviewed-on: https://go-review.googlesource.com/7733
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
Until recently, struct workbuf had only lfnode and uintptr fields
before the obj array to make it convenient to compute the size of the
obj array. It slowly grew more fields until this became inconvenient
enough that it was restructured to make the size computation easy.
Now the size computation doesn't care what the field types are, so
switch to more natural types.
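A hedged reconstruction of the resulting layout (the total buffer size constant is hypothetical): with the header split into its own struct, the obj array sizes itself from the header's total size regardless of the header's field types.

    package gcmodel

    import "unsafe"

    const workbufSize = 2048 // hypothetical total size of a workbuf in bytes

    type lfnode struct {
        next    uint64
        pushcnt uintptr
    }

    // workbufhdr can now use natural field types: the size computation
    // below does not care what they are.
    type workbufhdr struct {
        node lfnode
        nobj int
    }

    type workbuf struct {
        workbufhdr
        // obj fills whatever space the header leaves over.
        obj [(workbufSize - unsafe.Sizeof(workbufhdr{})) / unsafe.Sizeof(uintptr(0))]uintptr
    }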
Change-Id: I966140ba7ebb4aeb41d5c66d9d2a3bdc17dd4bcf
Reviewed-on: https://go-review.googlesource.com/5262
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
This introduces a producer/consumer abstraction for GC work pointers
that internally handles the details of filling, draining, and
shuffling work buffers.
In addition to simplifying the GC code, this should make it easy for
us to change how we use work buffers, including cleaning up how we use
the work.partial queue, reintroducing a FIFO lookahead cache, adding
prefetching, and using dual buffers to avoid flapping.
This commit doesn't change any existing code. The following commit
will switch the garbage collector from explicit workbuf manipulation
to gcWork.
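A sketch of the abstraction's surface, simplified to a single buffer (the real type manages its work buffers and the global lists internally; only the method names match the description above):

    package gcmodel

    type gcWork struct {
        wbuf []uintptr // simplified: one in-flight buffer of grey objects
    }

    // put enqueues a pointer to a grey object for later scanning.
    func (w *gcWork) put(obj uintptr) { w.wbuf = append(w.wbuf, obj) }

    // tryGet dequeues a grey object without blocking; 0 means the
    // caller should fall back to the global lists or the barrier.
    func (w *gcWork) tryGet() uintptr {
        if n := len(w.wbuf); n > 0 {
            obj := w.wbuf[n-1]
            w.wbuf = w.wbuf[:n-1]
            return obj
        }
        return 0
    }

    // dispose flushes remaining work so other workers can see it.
    func (w *gcWork) dispose() { w.wbuf = nil } // stands in for a push to the global lists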
Change-Id: Ifbfe5fff45bf0362d6d7c3cecb061f0c9874077d
Reviewed-on: https://go-review.googlesource.com/5231
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
Change-Id: If7729b3c7df6dc7fcd41f293e2ef2472c769fe8b
Reviewed-on: https://go-review.googlesource.com/5261
Reviewed-by: Rick Hudson <rlh@golang.org>
|
|
All of the other memory-related source files start with "m". Keep up
the tradition.
Change-Id: Idd88fdbf2a1453374fa12109b949b1c4d149a4f8
Reviewed-on: https://go-review.googlesource.com/4853
Reviewed-by: Minux Ma <minux@golang.org>
|