This change adds an API to the runtime for arenas. A later CL can
potentially export it as an experimental API, but for now, just the
runtime implementation will suffice.
The purpose of arenas is to improve efficiency, primarily by allowing
an application to free memory manually, thereby delaying garbage
collection. Arenas come with other potential performance benefits, such as
better locality, a better allocation strategy, and better handling of
interior pointers by the GC.
This implementation is based on one by danscales@google.com with a few
significant differences:
* The implementation lives entirely in the runtime (all layers).
* Arena chunks are the minimum of 8 MiB and the heap arena size. This
choice is made because in practice 64 MiB appears to be far too large
an area for most real-world use cases.
* Arena chunks are not unmapped, instead they're placed on an evacuation
list and when there are no pointers left pointing into them, they're
allowed to be reused.
* Reusing partially-used arena chunks no longer tries to find one used
by the same P first; it just takes the first one available.
* In order to ensure worst-case fragmentation is never worse than 25%,
only types and slice backing stores whose size is at most 1/4th of a
chunk may be used (see the sketch after this list). Previously, larger
sizes, up to the size of the chunk, were allowed.
* ASAN, MSAN, and the race detector are fully supported.
* Arena chunks whose faulting was deferred are set to fault at the end
of mark termination (a non-public patch once did this; I don't see a
reason not to continue that).
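To make the 25% bound concrete: since no allocation may exceed 1/4 of a
chunk, the unusable tail left at the end of any chunk is smaller than
the allocation that failed to fit, hence under 1/4 of the chunk. Below
is a minimal sketch of the sizing rule (not the runtime's code; names
are illustrative).

    package main

    import "fmt"

    const (
        chunkSize    = 8 << 20 // 8 MiB, per the chunk size chosen above
        maxAllocSize = chunkSize / 4
    )

    // chunk is a toy bump allocator standing in for an arena chunk.
    type chunk struct {
        buf []byte
        off int
    }

    // alloc returns size bytes from the chunk, or false if the request
    // exceeds the 1/4-chunk cap or the chunk is exhausted.
    func (c *chunk) alloc(size int) ([]byte, bool) {
        if size > maxAllocSize {
            return nil, false // too large for an arena; use the regular heap
        }
        if c.off+size > len(c.buf) {
            return nil, false // exhausted; the wasted tail is < maxAllocSize
        }
        p := c.buf[c.off : c.off+size]
        c.off += size
        return p, true
    }

    func main() {
        c := &chunk{buf: make([]byte, chunkSize)}
        _, ok := c.alloc(maxAllocSize + 1)
        fmt.Println(ok) // false: the allocation exceeds 1/4 of a chunk
    }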
For #51317.
Change-Id: I83b1693a17302554cb36b6daa4e9249a81b1644f
Reviewed-on: https://go-review.googlesource.com/c/go/+/423359
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
|
Currently the runtime's scavenging algorithm involves running from the
top of the heap address space to the bottom (or as far as it gets) once
per GC cycle. Once it treads some ground, it doesn't tread it again
until the next GC cycle.
This works just fine for the background scavenger, for heap-growth
scavenging, and for debug.FreeOSMemory. However, it breaks down in the
face of a memory limit for small heaps in the tens of MiB. Basically,
because the scavenger never retreads old ground, it's completely
oblivious to new memory it could scavenge, memory that it really
*should* scavenge in the face of a memory limit.
Also, every time some thread goes to scavenge in the runtime, it
reserves what could be a considerable amount of address space, hiding it
from other scavengers.
This change modifies and simplifies the implementation overall. It's
less code, and its complexities are much better encapsulated. The
current implementation iterates optimistically over the address space
looking for memory to scavenge, keeping track of what it last saw. The
new implementation does the same, but instead of directly iterating over
pages, it iterates over chunks. It maintains an index of chunks (as a
bitmap over the address space) that indicates which chunks may contain
scavenge work. The page allocator populates this index, while scavengers
consume it and iterate over it optimistically.
This has two key benefits:
1. Scavenging is much simpler: find a candidate chunk, and check it,
essentially just using the scavengeOne fast path. There's no need for
the complexity of iterating beyond one chunk, because the index is
lock-free and already maintains that information.
2. If pages are freed to the page allocator (always guaranteed to be
unscavenged), the page allocator immediately notifies all scavengers
of the new source of work, avoiding the hiding issues of the old
implementation.
One downside of the new implementation, however, is that it's
potentially more expensive to find pages to scavenge. In the past, if
a single page would become free high up in the address space, the
runtime's scavengers would ignore it. Now that scavengers won't, one or
more scavengers may need to iterate potentially across the whole heap to
find the next source of work. For the background scavenger, this just
means a potentially less reactive scavenger -- overall it should still
use the same amount of CPU. It means higher overheads for memory-limit
scavenging, but there's no established baseline to compare against yet.
In practice, this shouldn't be too bad, since the chunk index
is extremely compact. For a 48-bit address space, the index is only 8
MiB in size at worst, but even just one physical page in the index is
able to support up to 128 GiB heaps, provided they aren't terribly
sparse. On 32-bit platforms, the index is only 128 bytes in size.
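As a rough illustration (a sketch under assumed constants, not the
runtime's implementation), the index behaves like a lock-free bitmap:
the page allocator marks chunks that may contain work, and scavengers
atomically claim them.

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    // 32768 chunk bits fit in one 4 KiB physical page of index.
    const nChunks = 1 << 15

    type chunkIndex [nChunks / 64]atomic.Uint64

    // mark records that chunk i may contain scavengable memory; the
    // page allocator calls this when pages are freed.
    func (ci *chunkIndex) mark(i int) {
        word := &ci[i/64]
        bit := uint64(1) << (i % 64)
        for {
            old := word.Load()
            if old&bit != 0 || word.CompareAndSwap(old, old|bit) {
                return
            }
        }
    }

    // take claims the lowest marked chunk at or after start, clearing
    // its bit, or returns -1. Losing a CAS race just means another
    // scavenger claimed that chunk first.
    func (ci *chunkIndex) take(start int) int {
        for i := start; i < nChunks; i++ {
            word := &ci[i/64]
            bit := uint64(1) << (i % 64)
            if old := word.Load(); old&bit != 0 && word.CompareAndSwap(old, old&^bit) {
                return i
            }
        }
        return -1
    }

    func main() {
        var ci chunkIndex
        ci.mark(42)
        fmt.Println(ci.take(0)) // 42
    }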
For #48409.
Change-Id: I72b7e74365046b18c64a6417224c5d85511194fb
Reviewed-on: https://go-review.googlesource.com/c/go/+/399474
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
internal/goarch.PtrSize [generated]
[git-generate]
cd src/runtime/internal/math
gofmt -w -r "sys.PtrSize -> goarch.PtrSize" .
goimports -w *.go
cd ../..
gofmt -w -r "sys.PtrSize -> goarch.PtrSize" .
goimports -w *.go
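# For reference, the rewrite is purely mechanical; a representative
# (hypothetical) line before and after:
#   before: size := n * sys.PtrSize    // uses "runtime/internal/sys"
#   after:  size := n * goarch.PtrSize // goimports swaps in "internal/goarch"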
Change-Id: I43491cdd54d2e06d4d04152b3d213851b7d6d423
Reviewed-on: https://go-review.googlesource.com/c/go/+/328337
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
This change modifies addrRanges.findSucc to find the successor range
more efficiently: a binary search first narrows down large addrRanges,
so that no more than 8 addrRanges need to be iterated over linearly.
This change makes the runtime more robust against systems that may
aggressively randomize the address space mappings it gives the runtime
(e.g. Fuchsia).
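A sketch of the hybrid search (types assumed from context; not the
runtime's exact code): a binary search shrinks the window to at most 8
ranges, then a linear scan finds the successor, i.e. the first range
whose base is greater than the address.

    package main

    import "fmt"

    type addrRange struct{ base, limit uintptr } // [base, limit)

    // findSucc returns the index of the first range whose base is
    // greater than addr, or len(ranges) if there is none.
    func findSucc(ranges []addrRange, addr uintptr) int {
        lo, hi := 0, len(ranges)
        const iterMax = 8
        // Binary-search until at most iterMax candidates remain...
        for hi-lo > iterMax {
            mid := (lo + hi) / 2
            if addr >= ranges[mid].base {
                lo = mid + 1
            } else {
                hi = mid
            }
        }
        // ...then scan the small window linearly.
        for i := lo; i < hi; i++ {
            if addr < ranges[i].base {
                return i
            }
        }
        return hi
    }

    func main() {
        rs := []addrRange{{0x1000, 0x2000}, {0x5000, 0x6000}}
        fmt.Println(findSucc(rs, 0x3000)) // 1
    }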
For #40191.
Change-Id: If529df2abd2edb1b1496d8690ddd284ecd7138c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/242679
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
This change modifies the type of several mstats fields to be a new type:
sysMemStat. Under the hood, this type is the same uint64 the fields used to be.
The purpose of this change is to make it very clear which stats may be
used in various functions for accounting (usually the platform-specific
sys* functions, but there are others). Currently there's an implicit
understanding that the *uint64 value passed to these functions is some
kind of statistic whose value is atomically managed. This understanding
isn't inherently problematic, but we're about to change how some stats
(which currently use mSysStatInc and mSysStatDec) work, so we want to
make it very clear what the various requirements are around "sysStat".
This change also removes mSysStatInc and mSysStatDec in favor of a
method on sysMemStat. Note that those two functions were originally
written the way they were because atomic 64-bit adds required a valid G
on ARM, but this hasn't been the case for a very long time (since
golang.org/cl/14204, but even before then it wasn't clear if mutexes
required a valid G anymore). Today we implement 64-bit adds on ARM with
a spinlock table.
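In sketch form (using sync/atomic in place of the runtime's internal
atomics), the type and its method look roughly like this:

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    // sysMemStat is an OS memory statistic, always updated atomically.
    type sysMemStat uint64

    func (s *sysMemStat) load() uint64 {
        return atomic.LoadUint64((*uint64)(s))
    }

    // add replaces the mSysStatInc/mSysStatDec pair; a negative n
    // decrements.
    func (s *sysMemStat) add(n int64) {
        atomic.AddUint64((*uint64)(s), uint64(n))
    }

    func main() {
        var stat sysMemStat
        stat.add(4096)
        stat.add(-2048)
        fmt.Println(stat.load()) // 2048
    }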
Change-Id: I4e9b37cf14afc2ae20cf736e874eb0064af086d7
Reviewed-on: https://go-review.googlesource.com/c/go/+/246971
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
addrRanges represents a set of addresses. Currently, passing in a
zero-sized range will cause that range to be added to the list, even
though it doesn't represent any address (addrRanges.contains will still
always return false, and findSucc will give surprising results).
We could ignore this input, but it's almost always a bug for the calling
code to pass in a zero-sized range, so just throw.
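The guard amounts to something like this sketch, with panic standing in
for the runtime's throw:

    package main

    type addrRange struct{ base, limit uintptr } // [base, limit)

    func (r addrRange) size() uintptr { return r.limit - r.base }

    type addrRanges struct{ ranges []addrRange }

    func (a *addrRanges) add(r addrRange) {
        if r.size() == 0 {
            panic("attempt to add zero-sized address range")
        }
        a.ranges = append(a.ranges, r) // simplified; the real add keeps sorted order
    }

    func main() {
        var a addrRanges
        a.add(addrRange{0x1000, 0x2000}) // ok
        a.add(addrRange{0x3000, 0x3000}) // throws
    }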
Change-Id: I8ed09e15b79a3a33e2d0cf5ed55f9e497388e7a5
Reviewed-on: https://go-review.googlesource.com/c/go/+/242817
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
|
Currently pageAlloc.find attempts to find a better estimate for the
first free page in the heap, even if the space it's looking for isn't
necessarily going to be the first free page in the heap (e.g. if npages
>= 2). However, in doing so it has the potential to return a searchAddr
candidate that doesn't actually correspond to mapped memory, but this
candidate might still be adopted. As a result, pageAlloc.alloc's fast
path may look at unmapped summary memory and segfault. This case is rare
on most operating systems since the heap is kept fairly contiguous, so
the chance that the candidate searchAddr discovered is unmapped is
fairly low. Even so, this is totally possible and outside the user's
control when it happens (in fact, it's likely to happen consistently for
a given user on a given system).
Fix this problem by ensuring that our candidate always points to mapped
memory. We do this by looking at mheap's arenas structure first. If it
turns out our candidate doesn't correspond to mapped memory, then we
look at inUse to round up the searchAddr to the next mapped address.
While we're here, clean up some documentation related to searchAddr.
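A sketch of the rounding step, with an assumed sorted in-use list (the
runtime checks mheap's arenas first; a simple range walk stands in
here):

    package main

    import "fmt"

    type addrRange struct{ base, limit uintptr } // [base, limit), sorted

    // fixupSearchAddr returns candidate if it lies in mapped memory,
    // or rounds it up to the base of the next in-use range.
    func fixupSearchAddr(candidate uintptr, inUse []addrRange) uintptr {
        for _, r := range inUse {
            if candidate < r.base {
                return r.base // round up to the next mapped address
            }
            if candidate < r.limit {
                return candidate // already points at mapped memory
            }
        }
        return candidate // above all mapped memory; callers handle this
    }

    func main() {
        inUse := []addrRange{{0x1000, 0x4000}, {0x9000, 0xc000}}
        fmt.Printf("%#x\n", fixupSearchAddr(0x5000, inUse)) // 0x9000
    }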
Fixes #40191.
Change-Id: I759efec78987e4a8fde466ae45aabbaa3d9d4214
Reviewed-on: https://go-review.googlesource.com/c/go/+/242680
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
Currently in (*addrRanges).removeGreaterEqual we use
(addrRange).subtract with a range from the specified address to
"infinity", which is supposed to be maxOffAddr. However, maxOffAddr is
necessarily
an inclusive bound on the address space, because on many platforms an
exclusive bound would overflow back to 0.
On some platforms like mips and mipsle, the address space is smaller
than what's representable in a pointer, so a range which hits the top of
the address space (such as in the pageAlloc tests) has a limit that
doesn't overflow. Because maxOffAddr is inclusive, any attempt to prune
such a range with (*addrRanges).removeGreaterEqual fails: the range
passed to subtract is contained within the address range which touches
the top of the address space.
Another problem with using subtract here is that addr and
maxOffAddr.addr() may not be in the same segment, which could cause
makeAddrRange to panic. While this is unlikely to happen, it is possible
on some platforms, such as Solaris.
Fix these issues by not using subtract at all: create a dedicated
implementation of (*addrRanges).removeGreaterEqual which side-steps the
problem by never having to reason about the top of the address space.
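In sketch form (types assumed; the runtime's version operates on its own
addrRanges representation):

    package main

    import "fmt"

    type addrRange struct{ base, limit uintptr } // [base, limit), sorted

    // removeGreaterEqual drops every address >= addr: ranges entirely
    // at or above the cutoff are removed, and a straddler is truncated.
    func removeGreaterEqual(ranges []addrRange, addr uintptr) []addrRange {
        out := ranges[:0]
        for _, r := range ranges {
            switch {
            case r.base >= addr:
                // Entirely at or above the cutoff: drop it.
            case r.limit > addr:
                // Straddles the cutoff: keep only the part below addr.
                out = append(out, addrRange{r.base, addr})
            default:
                out = append(out, r)
            }
        }
        return out
    }

    func main() {
        rs := []addrRange{{0x1000, 0x3000}, {0x4000, ^uintptr(0)}}
        fmt.Println(removeGreaterEqual(rs, 0x2000)) // [{4096 8192}]
    }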
Fixes #39128.
Change-Id: Icd5b587b1a3d32a5681fb76cec4c001401f5756f
Reviewed-on: https://go-review.googlesource.com/c/go/+/234457
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
Currently maxOffAddr is defined in terms of the whole 64-bit address
space, assuming that it's all supported, by using ^uintptr(0) as the
maximal address in the offset space. In reality, the maximal address in
the offset space is (1<<heapAddrBits)-1 because we don't have more than
that actually available to us on a given platform.
On most platforms this is fine, because arenaBaseOffset is just
connecting two segments of address space, but on AIX we use it as an
actual offset for the starting address of the available address space,
which is limited. This means using ^uintptr(0) as the maximal address in
the offset address space causes wrap-around, especially when we just
want to represent a range approximately like [addr, infinity), which
today we do by using maxOffAddr.
To fix this, we define maxOffAddr more appropriately, in terms of
(1<<heapAddrBits)-1.
This change also redefines arenaBaseOffset: rather than being the
negation of the virtual address corresponding to zero in the offset
address space, it is now directly that virtual address. This matches the
existing documentation more closely and makes the logic around
arenaBaseOffset decidedly simpler, especially when trying to reason
about its use on AIX.
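A small demonstration of the wrap-around; the constants are illustrative
stand-ins (real values are per-platform), and a 64-bit platform is
assumed:

    package main

    import "fmt"

    const (
        heapAddrBits    = 48
        arenaBaseOffset = 0x0a00000000000000 // a large AIX-style base
    )

    func main() {
        base := uintptr(arenaBaseOffset)
        // Old definition: ^uintptr(0) as the maximal offset address.
        // Translating it to a virtual address wraps past zero.
        old := ^uintptr(0) + base
        // New definition: the offset space spans only heapAddrBits.
        max := uintptr(1)<<heapAddrBits - 1 + base
        fmt.Printf("old (wrapped): %#x\nnew: %#x\n", old, max)
    }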
Fixes #38966.
Change-Id: I1336e5036a39de846f64cc2d253e8536dee57611
Reviewed-on: https://go-review.googlesource.com/c/go/+/233497
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
This change uses the new offAddr type in more parts of the runtime where
we've been implicitly switching from the default address space to a
contiguous view. The purpose of offAddr is to represent addresses in the
contiguous view of the address space, and to make direct computations
between real addresses and offset addresses impossible. This change thus
improves readability in the runtime.
Updates #35788.
Change-Id: I4e1c5fed3ed68aa12f49a42b82eb3f46aba82fc1
Reviewed-on: https://go-review.googlesource.com/c/go/+/230718
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
Currently addrRange and addrRanges operate on real addresses. That is,
the addresses they manipulate don't include arenaBaseOffset. When added
to an address, arenaBaseOffset makes the address space appear contiguous
on platforms where the address space is segmented. While this is
generally OK because even those platforms which have a segmented address
space usually don't give addresses in a different segment, today it
causes a mismatch between the scavenger and the rest of the page
allocator. The scavenger scavenges from the highest addresses first, but
only in terms of real addresses, whereas the page allocator allocates memory in
offset address order.
So this change makes addrRange and addrRanges, i.e. what the scavenger
operates on, use offset addresses. However, much of the page allocator
still relies on an addrRange containing real addresses.
To make this transition less error-prone, this change introduces a new
type, offAddr, whose purpose is to make offset addresses a distinct
type, so any attempt to trivially mix real and offset addresses will
trigger a compilation error.
This change doesn't attempt to use offAddr in all of the runtime; a
follow-up change will look for and catch remaining uses of an offset
address which doesn't use the type.
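A sketch of the distinct-type idea (field layout and helpers are
illustrative, and a 64-bit platform is assumed): arithmetic and
comparison are only defined between offAddrs, so accidentally mixing a
raw address with an offset address becomes a compile error.

    package main

    import "fmt"

    const arenaBaseOffset uintptr = 0x0a00000000000000 // illustrative

    // offAddr is an address in the contiguous, offset view of the
    // address space.
    type offAddr struct{ a uintptr }

    // offAddrOf translates a real address into the offset space.
    func offAddrOf(addr uintptr) offAddr { return offAddr{addr + arenaBaseOffset} }

    // addr translates back to a real address.
    func (l offAddr) addr() uintptr { return l.a - arenaBaseOffset }

    func (l offAddr) lessThan(m offAddr) bool { return l.a < m.a }

    func main() {
        x, y := offAddrOf(0x2000), offAddrOf(0x1000)
        fmt.Println(y.lessThan(x)) // true
        // y.lessThan(0x2000) // compile error: an integer is not an offAddr
        fmt.Printf("%#x\n", x.addr()) // 0x2000
    }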
Updates #35788.
Change-Id: I991d891ac8ace8339ca180daafdf6b261a4d43d1
Reviewed-on: https://go-review.googlesource.com/c/go/+/230717
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
This change removes the concept of s.scavAddr in favor of explicitly
reserving and unreserving address ranges. s.scavAddr has several
raciness problems: the scavenger can miss updates, or scavAddr can be
moved back needlessly, forcing future scavenge calls to iterate over
already-searched address space.
This change achieves this by replacing scavAddr with a second addrRanges
which is cloned from s.inUse at the end of each sweep phase. Ranges from
this second addrRanges are then reserved by scavengers (with the
reservation size proportional to the heap size) who are then able to
safely iterate over those ranges without worry of another scavenger
coming in.
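In sketch form (a mutex guarding the shared list; the size policy is
illustrative), reservation looks like this:

    package main

    import (
        "fmt"
        "sync"
    )

    type addrRange struct{ base, limit uintptr } // [base, limit), sorted

    // scavengeWork models the addrRanges cloned from inUse at the end
    // of each sweep phase.
    type scavengeWork struct {
        mu     sync.Mutex
        ranges []addrRange
    }

    // reserve removes up to size bytes of address space from the top,
    // so concurrent scavengers never walk the same ground.
    func (w *scavengeWork) reserve(size uintptr) (addrRange, bool) {
        w.mu.Lock()
        defer w.mu.Unlock()
        if len(w.ranges) == 0 {
            return addrRange{}, false
        }
        last := &w.ranges[len(w.ranges)-1]
        if last.limit-last.base <= size {
            r := *last
            w.ranges = w.ranges[:len(w.ranges)-1]
            return r, true
        }
        r := addrRange{last.limit - size, last.limit}
        last.limit -= size
        return r, true
    }

    func main() {
        w := &scavengeWork{ranges: []addrRange{{0x1000, 0x9000}}}
        r, _ := w.reserve(0x2000)
        fmt.Printf("reserved [%#x, %#x)\n", r.base, r.limit) // [0x7000, 0x9000)
    }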
Fixes #35788.
Change-Id: Ief01ae170384174875118742f6c26b2a41cbb66d
Reviewed-on: https://go-review.googlesource.com/c/go/+/208378
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Austin Clements <austin@google.com>
|
This change makes it so that we check whether scavAddr is actually
mapped before trying to look at the summary for the fast path, since we
may segfault if that part of the summary is not mapped in.
Previously this wasn't a problem because we would conservatively map
all memory for the summaries between the lowest mapped heap address and
the highest one.
This change also adds a test for this case.
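A minimal sketch of the check, with an assumed in-use list standing in
for the heap's mapped ranges:

    package main

    import "fmt"

    type addrRange struct{ base, limit uintptr } // [base, limit)

    func mapped(inUse []addrRange, addr uintptr) bool {
        for _, r := range inUse {
            if addr >= r.base && addr < r.limit {
                return true
            }
        }
        return false
    }

    func main() {
        inUse := []addrRange{{0x1000, 0x4000}}
        scavAddr := uintptr(0x8000)
        if !mapped(inUse, scavAddr) {
            fmt.Println("skip fast path: summary for scavAddr may be unmapped")
        }
    }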
Change-Id: I2b1d89b5e044dce81745964dfaba829f4becdc57
Reviewed-on: https://go-review.googlesource.com/c/go/+/212637
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
When growing the address ranges, the new length is the old length + 1.
Fixes #36113.
Change-Id: I1b425f78e473cfa3cbdfe6113e166663f41fc9f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/211157
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
Prior to this change, if the heap was very discontiguous (such as in
TestArenaCollision) it's possible we could map a large amount of memory
as R/W and commit it. We would use only the start and end to track what
should be mapped, and we would extend that mapping as needed to
accommodate a potentially fragmented address space.
After this change, we only map exactly the part of the summary arrays
that we need by using the inUse ranges from the previous change. This
reduces the GCSys footprint of TestArenaCollision from 300 MiB to 18
MiB.
Because summaries are no longer mapped contiguously, the scavenger can
no longer iterate over them directly. This change also updates the
scavenger to borrow ranges out of inUse and iterate over only the
parts of the heap which are actually currently in use. This is both an
optimization and necessary for correctness.
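As an illustration of the mapping computation (the constants are
stand-ins, not the runtime's level constants, and a 64-bit platform is
assumed):

    package main

    import "fmt"

    const (
        chunkBytes   = 4 << 20 // address space covered per summary entry
        entryBytes   = 8       // size of one summary entry
        physPageSize = 4096
    )

    type addrRange struct{ base, limit uintptr } // [base, limit)

    // summaryPages returns the [lo, hi) byte span of the summary array
    // that must be mapped to cover r, rounded out to physical pages.
    func summaryPages(r addrRange) (lo, hi uintptr) {
        lo = r.base / chunkBytes * entryBytes
        hi = (r.limit + chunkBytes - 1) / chunkBytes * entryBytes
        lo = lo / physPageSize * physPageSize
        hi = (hi + physPageSize - 1) / physPageSize * physPageSize
        return
    }

    func main() {
        // Two discontiguous in-use ranges need only two small mappings,
        // not one mapping spanning the whole gap between them.
        for _, r := range []addrRange{{0, 64 << 20}, {1 << 40, 1<<40 + 64<<20}} {
            lo, hi := summaryPages(r)
            fmt.Printf("map summary[%#x:%#x)\n", lo, hi)
        }
    }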
Fixes #35514.
Change-Id: I96bf0c73ed0d2d89a00202ece7b9d089a53bac90
Reviewed-on: https://go-review.googlesource.com/c/go/+/207758
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
This change adds a new inUse field to the allocator which tracks ranges
of addresses that are owned by the heap. It is updated on each heap
growth.
These ranges are tracked in an array which is kept sorted. In practice
this array shouldn't exceed its initial allocation except in rare cases
and thus should be small (ideally exactly 1 element in size).
In a hypothetical worst-case scenario wherein we have a 1 TiB heap and 4
MiB arenas (note that the address ranges will never be at a smaller
granularity than an arena, since arenas are always allocated
contiguously), inUse would use at most 4 MiB of memory if the heap
mappings were completely discontiguous (highly unlikely) with an
additional 2 MiB leaked from previous allocations. Furthermore, the
copies that are done to keep the inUse array sorted will copy at most 4
MiB of memory in such a scenario, which, assuming a conservative copying
rate of 5 GiB/s, amounts to about 800µs.
However, note that in practice:
1) Most 64-bit platforms have 64 MiB arenas.
2) The copies should incur few if any page faults, meaning a copy rate
closer to 25-50 GiB/s is expected.
3) Go heaps are almost always mostly contiguous.
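For illustration, here is a sketch of sorted insertion into such an
array, coalescing with abutting neighbors (the runtime's version differs
in detail):

    package main

    import "fmt"

    type addrRange struct{ base, limit uintptr } // [base, limit)

    // add inserts r into the sorted list, merging with any neighbor it
    // abuts; otherwise it shifts the tail up by one element (the copy
    // whose cost is estimated above).
    func add(ranges []addrRange, r addrRange) []addrRange {
        i := 0
        for i < len(ranges) && ranges[i].base < r.base {
            i++
        }
        if i > 0 && ranges[i-1].limit == r.base {
            ranges[i-1].limit = r.limit
            if i < len(ranges) && ranges[i-1].limit == ranges[i].base {
                ranges[i-1].limit = ranges[i].limit
                return append(ranges[:i], ranges[i+1:]...)
            }
            return ranges
        }
        if i < len(ranges) && r.limit == ranges[i].base {
            ranges[i].base = r.base
            return ranges
        }
        ranges = append(ranges, addrRange{})
        copy(ranges[i+1:], ranges[i:])
        ranges[i] = r
        return ranges
    }

    func main() {
        rs := []addrRange{{0x1000, 0x2000}}
        rs = add(rs, addrRange{0x2000, 0x3000}) // abuts: coalesces
        fmt.Println(rs)                         // [{4096 12288}]
    }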
Updates #35514.
Change-Id: I3ad07f1c2b5b9340acf59ecc3b9ae09e884814fe
Reviewed-on: https://go-review.googlesource.com/c/go/+/207757
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
|