|
The memProfCycle struct holds allocation counts and bytes allocated, and
frees and bytes freed. But the memory profile records are already
aggregated by allocation size, which is stored in the "size" field of
the bucket struct. We can derive the bytes allocated/freed using the
counts and the size we already store. Thus we can delete the bytes
fields from memProfCycle, saving 64 bytes per memRecord.
We can do something similar for the profilerecord.MemProfileRecord type.
We just need to know the object size and we can derive the allocated and
freed bytes accordingly.
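A minimal sketch of the derivation (the type and field names below are
illustrative stand-ins, not the runtime's actual definitions):
type memCycleCounts struct {
    allocs, frees uintptr // per-cycle counts, which we keep
}
// bytesFor recomputes the deleted fields: every object in a bucket
// has the same size, so bytes are just count * size.
func bytesFor(c memCycleCounts, size uintptr) (allocBytes, freeBytes uintptr) {
    return c.allocs * size, c.frees * size
}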
Change-Id: I103885c2f29471b25283e330674fc16d6a6a6964
Reviewed-on: https://go-review.googlesource.com/c/go/+/760140
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
This method appears to have been written for the old span memory
layout, and it is not suitable for the new implementation.
Change-Id: I30d274bc18f4af440b003ade6f9342038783fc8a
GitHub-Last-Rev: d8ac30e3ab32148ef7fd91df57a1590161124303
GitHub-Pull-Request: golang/go#77599
Reviewed-on: https://go-review.googlesource.com/c/go/+/745260
Reviewed-by: Tony Tang <jianfeng.tony@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
After CL 700496, mspan.scanIdx is never used; this CL removes it.
Change-Id: I41ce9902957c0cfa6fbf26b66a2a7787b179376a
Reviewed-on: https://go-review.googlesource.com/c/go/+/737220
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Carlos Amedee <carlos@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
During a naive attempt to test the new runtime/secret package, I tried
wrapping the entire handshake in a secret.Do call. This led to a panic
because some of the allocator logic had been previously untested.
freeSpecial takes p and size, but they can be misleading. They don't
refer to the pointer and size of the object with the special attached,
but to a pointer to the enclosing object and the size of the span
element. The previous code did not take this into account, so when it
passed that size to memclr it would overwrite nearby objects.
Fix by storing the size of the object being cleared inside the special.
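A rough sketch of the fix, with hypothetical stand-ins for the
runtime's actual identifiers:
package sketch

import "unsafe"

// specialSecret records the size of the secret object when the
// special is attached, since freeSpecial's size argument is the span
// element size, not the object size.
type specialSecret struct {
    objSize uintptr
}

// clearObject zeroes exactly the object's bytes; using the span
// element size here instead would clobber neighboring objects.
func clearObject(p unsafe.Pointer, s *specialSecret) {
    clear(unsafe.Slice((*byte)(p), s.objSize))
}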
Fixes #76865.
Change-Id: Ifae31f1c8d0609a562a37f37c45aec2f369dc6a5
Reviewed-on: https://go-review.googlesource.com/c/go/+/730361
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Implement secret.Do.
- When secret.Do returns:
- Clear stack that is used by the argument function.
- Clear all the registers that might contain secrets.
- On stack growth in secret mode, clear the old stack.
- When objects are allocated in secret mode, mark them and then zero
the marked objects immediately when they are freed.
- If the argument function panics, raise that panic as if it originated
from secret.Do. This removes anything about the secret function
from tracebacks.
For now, this is only implemented on linux for arm64 and amd64.
This is a rebased version of Keith Randall's initial implementation in
CL 600635. I have added arm64 support, signal handling, and preemption
handling, and dealt with vDSOs spilling into system stacks.
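A minimal usage sketch, assuming Do takes a plain func() argument as
described above (deriveKey and use are hypothetical helpers):
package main

import "runtime/secret"

func main() {
    secret.Do(func() {
        // Stack, spill slots, and registers used here, plus heap
        // objects allocated here, are zeroed once Do returns.
        key := deriveKey()
        use(key)
    })
    // A panic inside the callback would appear to originate from
    // secret.Do itself, hiding the callback from tracebacks.
}

func deriveKey() [32]byte { return [32]byte{} } // hypothetical helper
func use(k [32]byte) {} // hypothetical helper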
Fixes #21865
Change-Id: I6fbd5a233beeaceb160785e0c0199a5c94d8e520
Co-authored-by: Keith Randall <khr@golang.org>
Reviewed-on: https://go-review.googlesource.com/c/go/+/704615
Reviewed-by: Roland Shoemaker <roland@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
mheap uses pageAlloc to manage the free/scav address space instead of
the free/scav treaps, but the comment on mheap still states that it
uses treaps. Update the comment to reflect the use of pageAlloc.
Also, in the fallback code when sizeSpecializedMalloc is enabled,
heapBitsInSpan is false. Update that comment to reflect this as well.
Change-Id: I50d2993c84e2c0312a925ab0a33065cc4cd41c41
Reviewed-on: https://go-review.googlesource.com/c/go/+/722700
Reviewed-by: Lidong Yan <yldhome2d2@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Mark Freeman <markfreeman@google.com>
|
|
This CL is part of a set of CLs that attempt to reduce how much work the
GC must do. See the design in https://go.dev/design/74299-runtime-freegc
This CL adds runtime.freegc:
func freegc(ptr unsafe.Pointer, size uintptr, noscan bool)
Memory freed via runtime.freegc is made immediately reusable for
the next allocation in the same size class, without waiting for a
GC cycle, and hence can dramatically reduce pressure on the GC. A sample
microbenchmark included below shows strings.Builder operating roughly
2x faster.
An experimental modification to reflect to use runtime.freegc,
and then using that modified reflect with json/v2, gave reported memory
allocation reductions of -43.7%, -32.9%, -21.9%, -22.0%, -1.0%
for the 5 official real-world unmarshalling benchmarks from
go-json-experiment/jsonbench by the authors of json/v2, covering
the CanadaGeometry through TwitterStatus datasets.
Note: there is no intent to modify the standard library to have
explicit calls to runtime.freegc, and of course such an ability
would never be exposed to end-user code.
Later CLs in this stack teach the compiler how to automatically
insert runtime.freegc calls when it can prove it is safe to do so.
(The reflect modification and other experimental changes to
the standard library were just that -- experiments. It was
very helpful while initially developing runtime.freegc to see
more complex uses and closer-to-real-world benchmark results
prior to updating the compiler.)
This CL only addresses noscan span classes (heap objects without
pointers), such as the backing memory for a []byte or string. A
follow-on CL adds support for heap objects with pointers.
If we update strings.Builder to explicitly call runtime.freegc on its
internal buf after a resize operation (but without freeing the usually
final incarnation of buf that will be returned to the user as a string),
we can see some nice benchmark results on the existing strings
benchmarks that call Builder.Write N times and then call Builder.String.
Here, the (uncommon) case of a single Builder.Write is not helped (given
it never resizes after first alloc if there is only one Write), but the
impact grows such that it is up to ~2x faster as there are more resize
operations due to more strings.Builder.Write calls:
│ disabled.out │ new-free-20.txt │
│ sec/op │ sec/op vs base │
BuildString_Builder/1Write_36Bytes_NoGrow-4 55.82n ± 2% 55.86n ± 2% ~ (p=0.794 n=20)
BuildString_Builder/2Write_36Bytes_NoGrow-4 125.2n ± 2% 115.4n ± 1% -7.86% (p=0.000 n=20)
BuildString_Builder/3Write_36Bytes_NoGrow-4 224.0n ± 1% 188.2n ± 2% -16.00% (p=0.000 n=20)
BuildString_Builder/5Write_36Bytes_NoGrow-4 239.1n ± 9% 205.1n ± 1% -14.20% (p=0.000 n=20)
BuildString_Builder/8Write_36Bytes_NoGrow-4 422.8n ± 3% 325.4n ± 1% -23.04% (p=0.000 n=20)
BuildString_Builder/10Write_36Bytes_NoGrow-4 436.9n ± 2% 342.3n ± 1% -21.64% (p=0.000 n=20)
BuildString_Builder/100Write_36Bytes_NoGrow-4 4.403µ ± 1% 2.381µ ± 2% -45.91% (p=0.000 n=20)
BuildString_Builder/1000Write_36Bytes_NoGrow-4 48.28µ ± 2% 21.38µ ± 2% -55.71% (p=0.000 n=20)
See the design document for more discussion of the strings.Builder case.
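As a sketch of the grow path in that experiment (hypothetical:
runtime.freegc is internal and not callable from user code, so the
call appears only as a comment):
// grow copies buf into a larger backing array. After the copy,
// nothing references the old noscan backing memory, so it can be
// handed straight back to the allocator for reuse.
func grow(buf []byte, n int) []byte {
    newBuf := make([]byte, len(buf), 2*cap(buf)+n)
    copy(newBuf, buf)
    // Hypothetical call; freegc is runtime-internal:
    //   freegc(unsafe.Pointer(unsafe.SliceData(buf)), uintptr(cap(buf)), true)
    return newBuf
}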
For testing, we add tests that attempt to exercise different aspects
of the underlying freegc and mallocgc behavior on the reuse path.
Validating the assist credit manipulations turned out to be subtle,
so a test for that is added in the next CL. There are also
invariant checks added, controlled by consts (primarily the
doubleCheckReusable const currently).
This CL also adds support in runtime.freegc for GODEBUG=clobberfree=1
to immediately overwrite freed memory with 0xdeadbeef, which
can help a higher-level test fail faster in the event of a bug,
and also the GC specifically looks for that pattern and throws
a fatal error if it unexpectedly finds it.
A later CL (currently experimental) adds GODEBUG=clobberfree=2,
which uses mprotect (or VirtualProtect on Windows) to set
freed memory to fault if read or written, until the runtime
later unprotects the memory on the mallocgc reuse path.
For the cases where a normal allocation is happening without any reuse,
some initial microbenchmarks suggest the impact of these changes could
be small to negligible (at least with GOAMD64=v3):
goos: linux
goarch: amd64
pkg: runtime
cpu: AMD EPYC 7B13
│ base-512M-v3.bench │ ps16-512M-goamd64-v3.bench │
│ sec/op │ sec/op vs base │
Malloc8-16 11.01n ± 1% 10.94n ± 1% -0.68% (p=0.038 n=20)
Malloc16-16 17.15n ± 1% 17.05n ± 0% -0.55% (p=0.007 n=20)
Malloc32-16 18.65n ± 1% 18.42n ± 0% -1.26% (p=0.000 n=20)
MallocTypeInfo8-16 18.63n ± 0% 18.36n ± 0% -1.45% (p=0.000 n=20)
MallocTypeInfo16-16 22.32n ± 0% 22.65n ± 0% +1.50% (p=0.000 n=20)
MallocTypeInfo32-16 23.37n ± 0% 23.89n ± 0% +2.23% (p=0.000 n=20)
geomean 18.02n 18.01n -0.05%
These last benchmark results include the runtime updates to support
span classes with pointers (which was originally part of this CL,
but later split out for ease of review).
Updates #74299
Change-Id: Icceaa0f79f85c70cd1a718f9a4e7f0cf3d77803c
Reviewed-on: https://go-review.googlesource.com/c/go/+/673695
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
|
|
Currently, AddCleanup just creates a simple closure that calls
`cleanup(arg)` as the actual cleanup function tracked internally.
However, the argument ends up getting its own allocation. If it's tiny,
then it can also end up sharing a tiny allocation slot with the object
we're adding the cleanup to. Since the closure is a GC root, we can end
up with cleanups that never fire.
This change refactors the AddCleanup machinery to make the storage for
the argument separate and explicit. With that in place, it explicitly
allocates 16 bytes of storage for tiny arguments to side-step the tiny
allocator.
One would think this would cause an increase in memory use and more
bytes allocated, but that's actually wrong! It turns out that the
current "simple closure" actually creates _two_ closures. By making the
argument passing explicit, we eliminate one layer of closures, so this
actually results in a slightly faster AddCleanup overall and 16 bytes
less memory allocated.
goos: linux
goarch: amd64
pkg: runtime
cpu: AMD EPYC 7B13
│ before.bench │ after.bench │
│ sec/op │ sec/op vs base │
AddCleanupAndStop-64 124.5n ± 2% 103.7n ± 2% -16.71% (p=0.002 n=6)
│ before.bench │ after.bench │
│ B/op │ B/op vs base │
AddCleanupAndStop-64 48.00 ± 0% 32.00 ± 0% -33.33% (p=0.002 n=6)
│ before.bench │ after.bench │
│ allocs/op │ allocs/op vs base │
AddCleanupAndStop-64 3.000 ± 0% 2.000 ± 0% -33.33% (p=0.002 n=6)
This change does, however, add 16 bytes of overhead to the cleanup
special itself, and makes each cleanup block entry 24 bytes instead of 8
bytes. This means the overall memory overhead delta with this change is
neutral, and we just have a faster cleanup. (Cleanup block entries are
transient, so I suspect any increase in memory overhead there is
negligible.)
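For reference, a usage sketch of the API this lands under (closeFD is
a hypothetical helper); the cleanup argument r.fd is what now gets its
own explicit, padded storage:
package main

import "runtime"

type resource struct{ fd int }

func open() *resource {
    r := &resource{fd: 3}
    // The argument (r.fd) is stored separately from the closure;
    // tiny arguments are padded to 16 bytes so their storage can
    // never share a tiny-allocator slot with r itself.
    runtime.AddCleanup(r, func(fd int) { closeFD(fd) }, r.fd)
    return r
}

func closeFD(int) {}

func main() { _ = open() }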
Together with CL 719960, fixes #76007.
Change-Id: I81bf3e44339e71c016c30d80bb4ee151c8263d5c
Reviewed-on: https://go-review.googlesource.com/c/go/+/720321
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Currently weak handles are atomic.Uintptr values that may end up in
a tiny block, which can cause all sorts of surprising leaks. See #76007
for one example.
This change pads out the underlying allocation of the atomic.Uintptr to
16 bytes to ensure we bypass the tiny allocator, and it gets its own
block. This wastes 8 bytes per weak handle. We could potentially do
better by using the 8 byte noscan size class, but this can be a
follow-up change.
For #76007.
Change-Id: I3ab74dda9bf312ea0007f167093052de28134944
Reviewed-on: https://go-review.googlesource.com/c/go/+/719960
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
On Linux, memory returned to the kernel via MADV_DONTNEED is guaranteed
to be zero-filled on its next use.
This commit leverages this kernel behavior to avoid a redundant software
zeroing pass in the runtime, improving performance.
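A minimal sketch of the idea; span and its fields are illustrative
stand-ins for the runtime's mspan bookkeeping:
type span struct {
    scavenged bool // returned to the OS via MADV_DONTNEED
    needzero  bool // must be zeroed in software before reuse
}

func prepareForAlloc(s *span) {
    if s.scavenged {
        // Linux guarantees MADV_DONTNEED memory reads back as
        // zero, so the software zeroing pass is redundant.
        s.needzero = false
    }
    s.scavenged = false
}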
Change-Id: Ia14343b447a2cec7af87644fe8050e23e983c787
GitHub-Last-Rev: 6c8df322836e70922c69ca3c5aac36e4b8a0839a
GitHub-Pull-Request: golang/go#76063
Reviewed-on: https://go-review.googlesource.com/c/go/+/715160
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
On Wasm, some programs have very small heaps. Currently, we use a
4 MB arena size (like all other 32-bit platforms). Even a very small
program needs to allocate one heap arena, 4 MB in size, at a 4 MB
aligned address, so we may need up to 8 MB of linear memory of which
only a small portion is actually used by the program. On Wasm, small
programs are not uncommon (e.g. WASI plugins), and users are concerned
about the memory usage.
This CL switches to a smaller arena size, as well as a smaller page
allocator chunk size (both are now 512 KB), so the heap grows at
512 KB granularity. A helloworld program now uses less than 3 MB of
linear memory instead of 8 MB.
Change-Id: Ibd66c1fa6e794a12c00906cbacc8f2e410f196c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/683296
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This change removes the locked global span queue and replaces the
fixed-size local span queue with a variable-sized local span queue. The
variable-sized local span queue grows as needed to accommodate local
work. With no global span queue either, GC workers balance work amongst
themselves by stealing from each other.
The new variable-sized local span queues are inspired by the P-local
deque underlying sync.Pool. Unlike the sync.Pool deque, however, both
the owning P and stealing Ps take spans from the tail, making this
incarnation a strict queue, not a deque. This is intentional, since we
want a queue-like order to encourage objects to accumulate on each span.
These variable-sized local span queues are crucial to mark termination,
just like the global span queue was. To avoid hitting the ragged barrier
too often, we must check whether any Ps have any spans on their
variable-sized local span queues. We maintain a per-P atomic bitmask
(another pMask) that contains this state. We can also use this to speed
up stealing by skipping Ps that don't have any local spans.
The variable-sized local span queues are slower than the old fixed-size
local span queues because of the additional indirection, so this change
adds a non-atomic local fixed-size queue. This risks getting work stuck
on it, so, similarly to how workbufs work, each worker will occasionally
dump some spans onto its local variable-sized queue. This scales much
more nicely than dumping to a global queue, but is still visible to all
other Ps.
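A deliberately simplified, lock-based sketch of the queue discipline
(the real queues are per-P and lock-free):
package sketch

import "sync"

type span struct{}

// spanQueue grows as needed. The owning P and stealing Ps all pop
// from the same (oldest) end, so this is a strict FIFO queue, giving
// objects time to accumulate on a span before it is scanned.
type spanQueue struct {
    mu    sync.Mutex
    spans []*span
}

func (q *spanQueue) push(s *span) {
    q.mu.Lock()
    q.spans = append(q.spans, s)
    q.mu.Unlock()
}

func (q *spanQueue) pop() *span {
    q.mu.Lock()
    defer q.mu.Unlock()
    if len(q.spans) == 0 {
        return nil
    }
    s := q.spans[0]
    q.spans = q.spans[1:]
    return s
}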
For #73581.
Change-Id: I814f54d9c3cc7fa7896167746e9823f50943ac22
Reviewed-on: https://go-review.googlesource.com/c/go/+/700496
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Asynchronous preemption must save all registers that could be in use
by Go code. Currently, it saves all of these to the goroutine stack.
As a result, the stack frame requirements of asynchronous preemption
can be rather high. On amd64, this requires 368 bytes of stack space,
most of which is for the XMM registers. Several RISC architectures
need around 0.5 KiB.
As we add support for SIMD instructions, this is going to become a
problem. The AVX-512 register state is 2.5 KiB. This well exceeds the
nosplit limit, and even if it didn't, could constrain when we can
asynchronously preempt goroutines on small stacks.
This CL fixes this by moving pure scalar state stored in non-GP
registers off the stack and into an allocated "extended register
state" object. To reduce space overhead, we only allocate these
objects as needed. While in the theoretical limit every G could need
this register state, in practice very few do at a time.
However, we can't allocate when we're in the middle of saving the
register state during an asynchronous preemption, so we reserve
scratch space on every P to temporarily store the register state,
which can then be copied out to an allocated state object later by Go
code.
This commit only implements this for amd64, since that's where we're
about to add much more vector state, but it lays the groundwork for
doing this on any architecture that could benefit.
This is a cherry-pick of CL 680898 plus bug fix CL 684836 from the
dev.simd branch.
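In outline (the names are illustrative; the real definitions are
architecture-specific):
// xRegState holds the extended (non-GP) register file; with
// AVX-512 this is roughly 2.5 KiB, far too large for the stack.
type xRegState struct {
    regs [32][64]byte // e.g. 32 ZMM registers of 64 bytes each
}

// Each P reserves scratch space so the preemption path never
// allocates; Go code later copies it into a lazily allocated
// xRegState hung off the G.
type pSketch struct {
    xRegScratch xRegState
}

type gSketch struct {
    xRegs *xRegState // nil for the vast majority of goroutines
}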
Change-Id: I123a95e21c11d5c10942d70e27f84d2d99bbf735
Reviewed-on: https://go-review.googlesource.com/c/go/+/669195
Auto-Submit: Austin Clements <austin@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Found by github.com/mdempsky/unconvert
Change-Id: Ib78cceb718146509d96dbb6da87b27dbaeba1306
GitHub-Last-Rev: dedf354811701ce8920c305b6f7aa78914a4171c
GitHub-Pull-Request: golang/go#74771
Reviewed-on: https://go-review.googlesource.com/c/go/+/690735
Reviewed-by: Mark Freeman <mark@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This is long overdue.
Change-Id: I891b114cb581e82b903c20d1c455bbbdad548fe8
Reviewed-on: https://go-review.googlesource.com/c/go/+/690535
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
During initialization, allow randomizing the heap base address by
generating a random uint64 and using its bits to randomize various
portions of the heap base address.
We use the following method to randomize the base address:
* We first generate a random heapArenaBytes aligned address that we use
for generating the hints.
* On the first call to mheap.grow, we then generate a random
PallocChunkBytes aligned offset into the mmap'd heap region, which we
use as the base for the heap region.
* We then mark a random number of pages within the page allocator as
allocated.
Our final randomized "heap base address" becomes the first byte of
the first available page returned by the page allocator. This results
in an address with at least heapAddrBits-gc.PageShift-1 bits of
entropy.
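A sketch of the first step's arithmetic, assuming an amd64-like
constant and that lo is already arena-aligned; the runtime draws r
during early initialization:
const heapArenaBytes = 64 << 20 // 64 MiB arenas on amd64

// randomArenaHint picks a heapArenaBytes-aligned address in
// [lo, hi) to seed the allocation hints.
func randomArenaHint(r uint64, lo, hi uintptr) uintptr {
    arenas := uint64(hi-lo) / heapArenaBytes
    return lo + uintptr(r%arenas)*heapArenaBytes
}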
Fixes #27583
Change-Id: Ideb4450a5ff747a132f702d563d2a516dec91a88
Reviewed-on: https://go-review.googlesource.com/c/go/+/674835
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Change-Id: I0c33830b13c8a187ac82504c7653abb8f8cf7530
Reviewed-on: https://go-review.googlesource.com/c/go/+/681655
Reviewed-by: Sean Liao <sean@liao.dev>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Sean Liao <sean@liao.dev>
|
|
Currently the mspan limit field is set after allocSpan returns, *after*
the span has already been published to the GC (including the
conservative scanner). But the limit field is load-bearing, because it's
checked to filter out invalid pointers. A stale limit value could cause
a crash by having the conservative scanner access allocBits out of
bounds.
Fix this by setting the mspan limit field before publishing the span.
For large objects and arena chunks, we adjust the limit down after
allocSpan because we don't have access to the true object's size from
allocSpan. However this is safe, since we first initialize the limit to
something definitely safe (the actual span bounds) and only adjust it
down after. Adjusting it down has the benefit of more precise debug
output, but the window in which it's imprecise is also fine because a
single object (logically, with arena chunks) occupies the whole span, so
the 'invalid' part of the memory will just safely point back to that
object. We can't do this for smaller objects because the limit will
include space that does *not* contain any valid objects.
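In outline, the ordering fix looks like this (names are illustrative
stand-ins):
const pageSize = 8192

type spanSketch struct{ limit uintptr }

func publish(*spanSketch) {} // stand-in for making the span GC-visible

func allocSpanSketch(s *spanSketch, base, npages uintptr) {
    s.limit = base + npages*pageSize // definitely-safe bound, set first
    publish(s)                       // only now may the GC observe s
    // Large objects and arena chunks later adjust s.limit *down*
    // to the true object end, which is always safe.
}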
Fixes #74288.
Change-Id: I0a22e5b9bccc1bfdf51d2b73ea7130f1b99c0c7c
Reviewed-on: https://go-review.googlesource.com/c/go/+/682655
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Add support to internal/synctest for managing associations between
arbitrary pointers and synctest bubbles. (Implemented internally to
the runtime package by attaching a special to the pointer.)
Associate WaitGroups with bubbles.
Since WaitGroups don't have a constructor,
perform the association when Add is called.
All Add calls must be made from within the same bubble,
or outside any bubble.
When a bubbled goroutine calls WaitGroup.Wait,
the wait is durably blocking iff the WaitGroup is associated
with the current bubble.
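A usage sketch via the experimental testing/synctest package, which
wraps this internal mechanism (requires GOEXPERIMENT=synctest):
//go:build goexperiment.synctest

package example

import (
    "sync"
    "testing/synctest"
)

func example() {
    synctest.Run(func() { // a bubble
        var wg sync.WaitGroup
        wg.Add(1) // the first Add associates wg with this bubble
        go func() { defer wg.Done() }()
        wg.Wait() // durably blocking: wg belongs to this bubble
    })
}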
Change-Id: I77e2701e734ac2fa2b32b28d5b0c853b7b2825c9
Reviewed-on: https://go-review.googlesource.com/c/go/+/676656
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Damien Neil <dneil@google.com>
|
|
Currently weak pointers break in sbrk mode. We can just use the immortal
weak handle map for weak pointers in this case, since nothing is ever
freed.
Fixes #69729.
Change-Id: Ie9fa7e203c22776dc9eb3601c6480107d9ad0c99
Reviewed-on: https://go-review.googlesource.com/c/go/+/674656
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Bypass: Michael Knyszek <mknyszek@google.com>
|
|
Add build-tag-gated Valgrind annotations to the runtime which let
Valgrind understand how the runtime manages memory. This allows Go
binaries to be run under Valgrind without emitting spurious errors.
Instead of adding the Valgrind headers to the tree, and using cgo to
call the various Valgrind client request macros, we just add an assembly
function which emits the necessary instructions to trigger client
requests.
In particular we add instrumentation of the memory allocator, using a
two-level mempool structure (as described in the Valgrind manual [0]).
We also add annotations which allow Valgrind to track which memory we
use for stacks, which seems necessary to let it properly function.
We describe the memory model to Valgrind as follows: we treat heap
arenas as a "pool" created with VALGRIND_CREATE_MEMPOOL_EXT (so that we
can use VALGRIND_MEMPOOL_METAPOOL and VALGRIND_MEMPOOL_AUTO_FREE).
Within the pool we treat spans as "superblocks", annotated with
VALGRIND_MEMPOOL_ALLOC. We then allocate individual objects within spans
with VALGRIND_MALLOCLIKE_BLOCK.
It should be noted that running binaries under Valgrind can be _quite
slow_, and certain operations, such as running the GC, can be _very
slow_. It is recommended to run programs with GOGC=off. Additionally,
async preemption should be turned off, since it'll cause strange
behavior (GODEBUG=asyncpreemptoff=1).
Running Valgrind with --leak-check=yes will report some errors caused
by memory that is never marked as fully freed. These likely need more
annotations to rectify, but for now it is recommended to run with
--leak-check=off.
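Putting those recommendations together, an invocation would look
something like this (assuming a binary built with the gating build
tag):
$ GOGC=off GODEBUG=asyncpreemptoff=1 valgrind --leak-check=off ./myprog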
Updates #73602
[0] https://valgrind.org/docs/manual/mc-manual.html#mc-manual.mempools
Change-Id: I71b26c47d7084de71ef1e03947ef6b1cc6d38301
Reviewed-on: https://go-review.googlesource.com/c/go/+/674077
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
This change adds support for identifying cleanups and finalizers
attached to tiny blocks to checkfinalizers mode. It also notes a subtle
pitfall, which is that the cleanup arg, if tiny-allocated, could end up
co-located with the object with the cleanup attached! Oops...
For #72949.
Change-Id: Icbe0112f7dcfc63f35c66cf713216796a70121ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/662037
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
This change adds a new special kind called CheckFinalizer which is used
to annotate finalizers and cleanups with extra information about where
that cleanup or finalizer came from.
For #72949.
Change-Id: I3c1ace7bd580293961b7f0ea30345a6ce956d340
Reviewed-on: https://go-review.googlesource.com/c/go/+/662135
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This new debug mode detects cleanup/finalizer leaks using checkmark
mode. It runs a partial GC using only specials as roots. If the GC can
find a path from one of these roots back to the object the special is
attached to, then the object might never be reclaimed. (The cycle could
be broken in the future, but it's almost certainly a bug.)
This debug mode is very barebones. It contains no type information and
no stack location for where the finalizer or cleanup was created.
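A sketch of the kind of bug this mode flags (hypothetical types):
package main

import "runtime"

type conn struct{ fd int }
type box struct{ c *conn } // reaches back to the object

func leak() {
    c := &conn{fd: 3}
    b := &box{c: c}
    // The cleanup argument b points back to c, so the GC can trace
    // from the cleanup's roots to c: c is never reclaimed and the
    // cleanup never runs. This mode reports that path.
    runtime.AddCleanup(c, func(b *box) { closeFD(b.c.fd) }, b)
}

func closeFD(int) {}

func main() { leak() }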
For #72949.
Change-Id: Ibffd64c1380b51f281950e4cfe61f677385d42a5
Reviewed-on: https://go-review.googlesource.com/c/go/+/634599
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
Moving to a smaller package allows its use in other internal/runtime
packages.
This isn't internal/strconvlite since it can't be used directly by
strconv.
For #73193.
Change-Id: I6a6a636c9c8b3f06b5fd6c07fe9dd5a7a37d1429
Reviewed-on: https://go-review.googlesource.com/c/go/+/672697
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
|
|
We haven't used this mechanism since CL 616255, so the metric will
always be zero.
Updates #73628
Change-Id: Ic179927a8bc24e6291876c218d88e8848b057c2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/671096
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This change splits the finalizer and cleanup queues and implements a new
lock-free blocking queue for cleanups. The basic design is as follows:
The cleanup queue is organized in fixed-sized blocks. Individual cleanup
functions are queued, but only whole blocks are dequeued.
Enqueuing cleanups places them in P-local cleanup blocks. These are
flushed to the full list as they get full. Cleanups can only be enqueued
by an active sweeper.
Dequeuing cleanups always dequeues entire blocks from the full list.
Cleanup blocks can be dequeued and executed at any time.
The very last active sweeper in the sweep phase is responsible for
flushing all local cleanup blocks to the full list. It can do this
without any synchronization because the next GC can't start yet, so we
can be very certain that nobody else will be accessing the local blocks.
Cleanup blocks are stored off-heap because they need to be allocated
by the sweeper, which is called from heap allocation paths. As a
result, the GC treats cleanup blocks as roots, just like finalizer
blocks.
Flushes to the full list signal to the scheduler that cleanup
goroutines should be awoken. Every time the scheduler goes to wake up
a cleanup goroutine and finds more signals than goroutines to wake, it
forwards the signal to runtime.AddCleanup, which creates another
goroutine the next time it is called, up to gomaxprocs goroutines.
The signals here are a little convoluted, but exist because the sweeper
and the scheduler cannot safely create new goroutines.
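In outline (block size and names are illustrative):
const cleanupsPerBlock = 256

// cleanupBlock is the unit of dequeue; individual cleanups are
// only ever enqueued into a P-local block.
type cleanupBlock struct {
    next *cleanupBlock // link in the shared full list
    n    int
    fns  [cleanupsPerBlock]func()
}

// enqueue adds one cleanup; only full blocks are flushed to the
// shared list (flush also signals the scheduler).
func enqueue(cur **cleanupBlock, fn func(), flush func(*cleanupBlock)) {
    b := *cur
    b.fns[b.n] = fn
    b.n++
    if b.n == len(b.fns) {
        flush(b)
        *cur = new(cleanupBlock)
    }
}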
For #71772.
For #71825.
Change-Id: Ie839fde2b67e1b79ac1426be0ea29a8d923a62cc
Reviewed-on: https://go-review.googlesource.com/c/go/+/650697
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
Our current parallel mark algorithm suffers from frequent stalls on
memory since its access pattern is essentially random. Small objects
are the worst offenders, since each one forces pulling in at least one
full cache line to access even when the amount to be scanned is far
smaller than that. Each object also requires an independent access to
per-object metadata.
The purpose of this change is to improve garbage collector performance
by scanning small objects in batches to obtain better cache locality
than our current approach. The core idea behind this change is to defer
marking and scanning small objects, and then scan them in batches
localized to a span.
This change adds scanned bits to each small object (<=512 bytes) span in
addition to mark bits. The scanned bits indicate that the object has
been scanned. (One way to think of them is "grey" bits and "black" bits
in the tri-color mark-sweep abstraction.) Each of these spans is always
8 KiB and if they contain pointers, the pointer/scalar data is already
packed together at the end of the span, allowing us to further optimize
the mark algorithm for this specific case.
When the GC encounters a pointer, it first checks if it points into a
small object span. If so, it is first marked in the mark bits, and then
the object is queued on a work-stealing P-local queue. This object
represents the whole span, and we ensure that a span can only appear at
most once in any queue by maintaining an atomic ownership bit for each
span. Later, when the pointer is dequeued, we scan every object with a
set mark that doesn't have a corresponding scanned bit. If it turns out
that was the only object in the mark bits since the last time we scanned
the span, we scan just that object directly, essentially falling back to
the existing algorithm. noscan objects have no scan work, so they are
never queued.
Each span's mark and scanned bits are co-located together at the end of
the span. Since the span is always 8 KiB in size, it can be found with
simple pointer arithmetic. Next to the marks and scans we also store the
size class, eliminating the need to access the span's mspan altogether.
The work-stealing P-local queue is a new source of GC work. If this
queue gets full, half of it is dumped to a global linked list of spans
to scan. The regular scan queues are always prioritized over this queue
to allow time for marks to accumulate. Stealing work from other Ps is a
last resort.
This change also adds a new debug mode under GODEBUG=gctrace=2 that
dumps whole-span scanning statistics by size class on every GC cycle.
A future extension to this CL is to use SIMD-accelerated scanning
kernels for scanning spans with high mark bit density.
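The per-span scan step, in outline (simplified to bool slices instead
of the packed bitmaps):
// scanSpanSketch blackens every object whose mark bit is set but
// whose scanned bit is not, in one cache-friendly pass over the span.
func scanSpanSketch(marks, scanned []bool, scanObj func(i int)) {
    for i := range marks {
        if marks[i] && !scanned[i] {
            scanned[i] = true
            scanObj(i)
        }
    }
}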
For #19112. (Deadlock averted in GOEXPERIMENT.)
For #73581.
Change-Id: I4bbb4e36f376950a53e61aaaae157ce842c341bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/658036
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Change-Id: I85f518e36c18f4f0eda8b167750b43cd8c48ecff
Reviewed-on: https://go-review.googlesource.com/c/go/+/622675
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
We will want to reference these definitions from new generator programs,
and this is a good opportunity to cleanup all these old C-style names.
Change-Id: Ifb06f0afc381e2697e7877f038eca786610c96de
Reviewed-on: https://go-review.googlesource.com/c/go/+/655275
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Leverage the prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ...) API to name
the anonymous memory areas.
This API was introduced in Linux 5.17 to decorate the anonymous
memory areas shown in /proc/<pid>/maps.
This is already used by glibc. See:
* https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c;h=27dfd1eb907f4615b70c70237c42c552bb4f26a8;hb=HEAD#l2434
* https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/setvmaname.c;h=ea93a5ffbebc9e5a7e32a297138f465724b4725f;hb=HEAD#l63
This can be useful when investigating the memory consumption of a
multi-language program.
In a 100% Go program, the pprof profiler can be used to profile the
memory consumption of the program. But pprof is only aware of what
happens within the Go world.
In a multi-language program, there may be doubt about whether
suspicious extra memory consumption comes from the Go part or the
native part.
With this change, the following Go program:
package main

import (
    "fmt"
    "log"
    "os"
)

/*
#include <stdlib.h>
void f(void)
{
    (void)malloc(1024*1024*1024);
}
*/
import "C"

func main() {
    C.f()
    data, err := os.ReadFile("/proc/self/maps")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(string(data))
}
produces this output:
$ GLIBC_TUNABLES=glibc.mem.decorate_maps=1 ~/doc/devel/open-source/go/bin/go run .
00400000-00402000 r--p 00000000 00:21 28451768 /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name
00402000-004a4000 r-xp 00002000 00:21 28451768 /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name
004a4000-00574000 r--p 000a4000 00:21 28451768 /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name
00574000-00575000 r--p 00173000 00:21 28451768 /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name
00575000-00580000 rw-p 00174000 00:21 28451768 /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name
00580000-005a4000 rw-p 00000000 00:00 0
2e075000-2e096000 rw-p 00000000 00:00 0 [heap]
c000000000-c000400000 rw-p 00000000 00:00 0 [anon: Go: heap]
c000400000-c004000000 ---p 00000000 00:00 0 [anon: Go: heap reservation]
777f40000000-777f40021000 rw-p 00000000 00:00 0 [anon: glibc: malloc arena]
777f40021000-777f44000000 ---p 00000000 00:00 0
777f44000000-777f44021000 rw-p 00000000 00:00 0 [anon: glibc: malloc arena]
777f44021000-777f48000000 ---p 00000000 00:00 0
777f48000000-777f48021000 rw-p 00000000 00:00 0 [anon: glibc: malloc arena]
777f48021000-777f4c000000 ---p 00000000 00:00 0
777f4c000000-777f4c021000 rw-p 00000000 00:00 0 [anon: glibc: malloc arena]
777f4c021000-777f50000000 ---p 00000000 00:00 0
777f50000000-777f50021000 rw-p 00000000 00:00 0 [anon: glibc: malloc arena]
777f50021000-777f54000000 ---p 00000000 00:00 0
777f55afb000-777f55afc000 ---p 00000000 00:00 0
777f55afc000-777f562fc000 rw-p 00000000 00:00 0 [anon: glibc: pthread stack: 216378]
777f562fc000-777f562fd000 ---p 00000000 00:00 0
777f562fd000-777f56afd000 rw-p 00000000 00:00 0 [anon: glibc: pthread stack: 216377]
777f56afd000-777f56afe000 ---p 00000000 00:00 0
777f56afe000-777f572fe000 rw-p 00000000 00:00 0 [anon: glibc: pthread stack: 216376]
777f572fe000-777f572ff000 ---p 00000000 00:00 0
777f572ff000-777f57aff000 rw-p 00000000 00:00 0 [anon: glibc: pthread stack: 216375]
777f57aff000-777f57b00000 ---p 00000000 00:00 0
777f57b00000-777f58300000 rw-p 00000000 00:00 0 [anon: glibc: pthread stack: 216374]
777f58300000-777f58400000 rw-p 00000000 00:00 0 [anon: Go: page alloc index]
777f58400000-777f5a400000 rw-p 00000000 00:00 0 [anon: Go: heap index]
777f5a400000-777f6a580000 ---p 00000000 00:00 0 [anon: Go: scavenge index]
777f6a580000-777f6a581000 rw-p 00000000 00:00 0 [anon: Go: scavenge index]
777f6a581000-777f7a400000 ---p 00000000 00:00 0 [anon: Go: scavenge index]
777f7a400000-777f8a580000 ---p 00000000 00:00 0 [anon: Go: page summary]
777f8a580000-777f8a581000 rw-p 00000000 00:00 0 [anon: Go: page alloc]
777f8a581000-777f9c430000 ---p 00000000 00:00 0 [anon: Go: page summary]
777f9c430000-777f9c431000 rw-p 00000000 00:00 0 [anon: Go: page alloc]
777f9c431000-777f9e806000 ---p 00000000 00:00 0 [anon: Go: page summary]
777f9e806000-777f9e807000 rw-p 00000000 00:00 0 [anon: Go: page alloc]
777f9e807000-777f9ec00000 ---p 00000000 00:00 0 [anon: Go: page summary]
777f9ec36000-777f9ecb6000 rw-p 00000000 00:00 0 [anon: Go: immortal metadata]
777f9ecb6000-777f9ecc6000 rw-p 00000000 00:00 0 [anon: Go: gc bits]
777f9ecc6000-777f9ecd6000 rw-p 00000000 00:00 0 [anon: Go: allspans array]
777f9ecd6000-777f9ece7000 rw-p 00000000 00:00 0 [anon: Go: immortal metadata]
777f9ece7000-777f9ed67000 ---p 00000000 00:00 0 [anon: Go: page summary]
777f9ed67000-777f9ed68000 rw-p 00000000 00:00 0 [anon: Go: page alloc]
777f9ed68000-777f9ede7000 ---p 00000000 00:00 0 [anon: Go: page summary]
777f9ede7000-777f9ee07000 rw-p 00000000 00:00 0 [anon: Go: page alloc]
777f9ee07000-777f9ee0a000 rw-p 00000000 00:00 0 [anon: glibc: loader malloc]
777f9ee0a000-777f9ee2e000 r--p 00000000 00:21 48158213 /usr/lib/libc.so.6
777f9ee2e000-777f9ef9f000 r-xp 00024000 00:21 48158213 /usr/lib/libc.so.6
777f9ef9f000-777f9efee000 r--p 00195000 00:21 48158213 /usr/lib/libc.so.6
777f9efee000-777f9eff2000 r--p 001e3000 00:21 48158213 /usr/lib/libc.so.6
777f9eff2000-777f9eff4000 rw-p 001e7000 00:21 48158213 /usr/lib/libc.so.6
777f9eff4000-777f9effc000 rw-p 00000000 00:00 0
777f9effc000-777f9effe000 rw-p 00000000 00:00 0 [anon: glibc: loader malloc]
777f9f00a000-777f9f04a000 rw-p 00000000 00:00 0 [anon: Go: immortal metadata]
777f9f04a000-777f9f04c000 r--p 00000000 00:00 0 [vvar]
777f9f04c000-777f9f04e000 r--p 00000000 00:00 0 [vvar_vclock]
777f9f04e000-777f9f050000 r-xp 00000000 00:00 0 [vdso]
777f9f050000-777f9f051000 r--p 00000000 00:21 48158204 /usr/lib/ld-linux-x86-64.so.2
777f9f051000-777f9f07a000 r-xp 00001000 00:21 48158204 /usr/lib/ld-linux-x86-64.so.2
777f9f07a000-777f9f085000 r--p 0002a000 00:21 48158204 /usr/lib/ld-linux-x86-64.so.2
777f9f085000-777f9f087000 r--p 00034000 00:21 48158204 /usr/lib/ld-linux-x86-64.so.2
777f9f087000-777f9f088000 rw-p 00036000 00:21 48158204 /usr/lib/ld-linux-x86-64.so.2
777f9f088000-777f9f089000 rw-p 00000000 00:00 0
7ffc7bfa7000-7ffc7bfc8000 rw-p 00000000 00:00 0 [stack]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]
The anonymous memory areas are now labelled so that we can see which
ones have been allocated by the Go runtime and which ones have been
allocated by glibc.
Fixes #71546
Change-Id: I304e8b4dd7f2477a6da794fd44e9a7a5354e4bf4
Reviewed-on: https://go-review.googlesource.com/c/go/+/646095
Auto-Submit: Alan Donovan <adonovan@google.com>
Commit-Queue: Alan Donovan <adonovan@google.com>
Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Currently Make panics when passed a linker-allocated object. This is
inconsistent with both runtime.AddCleanup and runtime.SetFinalizer. Not
panicking in this case is important so that all pointers can be treated
equally by these APIs. Libraries should not have to worry where a
pointer came from to still make weak pointers.
Supporting this behavior is a bit complex for weak pointers versus
finalizers and cleanups. For the latter two, it means a function is
never called, so we can just drop everything on the floor. For weak
pointers, we still need to produce pointers that compare as per the API.
To do this, copy the tiny lock-free trace map implementation and use it
to store weak handles for "immortal" objects. These paths in the
runtime should be rare, so it's OK if it's not incredibly fast, but we
should keep the memory footprint relatively low (at least not have it be
any worse than specials), so this change tweaks the map implementation a
little bit to ensure that's the case.
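Illustrated with the public weak package, which wraps this mechanism:
package main

import (
    "fmt"
    "weak"
)

var global int // linker-allocated, never freed

func main() {
    // No panic, and equal pointers yield equal weak.Pointers, as
    // the API requires; the handle lives in the immortal map.
    a := weak.Make(&global)
    b := weak.Make(&global)
    fmt.Println(a == b, a.Value() == &global) // true true
}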
Fixes #71726.
Change-Id: I0c87c9d90656d81659ac8d70f511773d0093ce27
Reviewed-on: https://go-review.googlesource.com/c/go/+/649460
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This change fixes GODEBUG=gccheckmark=1 which seems to have bit-rotted.
Because the root jobs weren't being reset, it wasn't doing anything.
Then, it turned out that checkmark mode would queue up noscan objects in
workbufs, which caused it to fail. Then it turned out checkmark mode was
broken with user arenas, since their heap arenas are not registered
anywhere. Then, it turned out that checkmark mode could just not run
properly if the goroutine's preemption flag was set (since
sched.gcwaiting is true during the STW). And lastly, it turned out that
async preemption could cause erroneous checkmark failures.
This change fixes all these issues and adds a simple smoke test to dist
to run the runtime tests under gccheckmark, which exercises all of these
issues.
Fixes #69074.
Fixes #69377.
Fixes #69376.
Change-Id: Iaa0bb7b9e63ed4ba34d222b47510d6292ce168bc
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest
Reviewed-on: https://go-review.googlesource.com/c/go/+/608915
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
Currently specials try to save on space by only encoding the offset from
the base of the span in a uint16. This worked fine up until Go 1.24.
- Most specials have an offset of 0 (mem profile, finalizers, etc.)
- Cleanups do not care about the offset at all, so even if it's wrong,
it's OK.
- Weak pointers *do* care, but the unique package always makes a new
allocation, so the weak pointer handle offset it makes is always zero.
With Go 1.24 and general weak pointers now available, nothing is
stopping someone from just creating a weak pointer that is >64 KiB
offset from the start of an object, and this weak pointer must be
distinct from others.
Fix this problem by just increasing the size of a special and making the
offset a uintptr, to capture all possible offsets. Since we're in the
freeze, this is the safest thing to do. Specials aren't so common that I
expect a substantial memory increase from this change. In a future
release (or if there is a problem) we can almost certainly pack the
special's kind and offset together. There was already a bunch of wasted
space due to padding, so this would bring us back to the same memory
footprint before this change.
Also, add tests for equality of basic weak interior pointers. This
works, but we really should've had tests for it.
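For example, using the public weak package, two interior pointers more
than 64 KiB apart must produce distinct handles, which the old uint16
offset could not guarantee:
package main

import (
    "fmt"
    "weak"
)

func main() {
    big := new([1 << 20]byte) // 1 MiB object
    w1 := weak.Make(&big[0])
    w2 := weak.Make(&big[100_000]) // offset > 64 KiB from the base
    fmt.Println(w1 == w2)          // false: distinct weak pointers
}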
Fixes #70739.
Change-Id: Ib49a7f8f0f1ec3db4571a7afb0f4d94c8a93aa40
Reviewed-on: https://go-review.googlesource.com/c/go/+/634598
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Commit-Queue: Michael Knyszek <mknyszek@google.com>
|
|
This change modifies the logic which searches for existing cleanups.
The existing search logic sets the next node to the current node
in certain conditions. This would cause future searches to loop
endlessly. The existing loop could convert non-cleanup specials into
cleanups and cause data corruption.
This also changes where we release the m while we are adding a
cleanup. We are currently holding onto a P-specific gcwork after
releasing the m.
Change-Id: I0ac0b304f40910549c8df114e523c89d9f0d7a75
Reviewed-on: https://go-review.googlesource.com/c/go/+/630278
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Carlos Amedee <carlos@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
This is similar to the weak handle bug in #70455. In short, there's a
window where a heap-allocated value is only visible through a special
that has not been made visible to the GC yet.
For #70455.
Change-Id: Ic2bb2c60d422a5bc5dab8d971cfc26ff6d7622bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/630277
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
getOrAddWeakHandle is very careful about keeping its input alive across
the operation, but not very careful about keeping the heap-allocated
handle it creates alive. In fact, there's a window in this function
where it is *only* visible via the special. Specifically, the window of
time between when the handle is stored in the special and when the
special actually becomes visible to the GC.
(If we fail to add the special because it already exists, that case is
fine. We don't even use the same handle value, but the one we obtain
from the attached GC-visible special, *and* we return that value, so it
remains live.)
Fixes #70455.
Change-Id: Iadaff0cfb93bcaf61ba2b05be7fa0519c481de82
Reviewed-on: https://go-review.googlesource.com/c/go/+/630315
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
The updates are:
- API documentation changes.
- Removal of the old package documentation discouraging linkname.
- Addition of new package documentation with some advice.
- Renaming of weak.Pointer.Strong -> weak.Pointer.Value.
Fixes #67552.
Change-Id: Ifad7e629b6d339dacaf2ca37b459d7f903e31bf8
Reviewed-on: https://go-review.googlesource.com/c/go/+/628455
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
This change adds the implementation of AddCleanup.Stop, which allows
the caller to cancel execution of the cleanup. The cleanup will not be
stopped if it has already been queued for execution.
For #67535
Change-Id: I494b77d344e54d772c41489d172286773c3814e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/627975
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Carlos Amedee <carlos@golang.org>
|
|
This change introduces AddCleanup to the runtime package. AddCleanup
attaches a cleanup function to a pointer to an object.
The Stop method on Cleanups will be implemented in a followup CL.
AddCleanup is intended to be an incremental improvement over
SetFinalizer and will result in SetFinalizer being deprecated.
For #67535
Change-Id: I99645152e3fdcee85fcf42a4f312c6917e8aecb1
Reviewed-on: https://go-review.googlesource.com/c/go/+/627695
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Currently it's possible for weak->strong conversions to create more GC
work during mark termination. When a weak->strong conversion happens
during the mark phase, we need to mark the newly-strong pointer, since
it may now be the only pointer to that object. In other words, the
object could be white.
But queueing new white objects creates GC work, and if this happens
during mark termination, we could end up violating mark termination
invariants. In the parlance of the mark termination algorithm, the
weak->strong conversion is a non-monotonic source of GC work, unlike the
write barriers (which will eventually only see black objects).
This change fixes the problem by forcing weak->strong conversions to
block during mark termination. We can do this efficiently by setting a
global flag before the ragged barrier that is checked at each
weak->strong conversion. If the flag is set, then the conversions block.
The ragged barrier ensures that all Ps have observed the flag and that
any weak->strong conversions which completed before the ragged barrier
have their newly-minted strong pointers visible in GC work queues if
necessary. We later unset the flag and wake all the blocked goroutines
during the mark termination STW.
There are a few subtleties that we need to account for. For one, it's
possible that a goroutine which blocked in a weak->strong conversion
wakes up only to find it's mark termination time again, so we need to
recheck the global flag on wake. We should also stay non-preemptible
while performing the check, so that if the check *does* appear as true,
it cannot switch back to false while we're actively trying to block. If
it switches to false while we try to block, then we'll be stuck in the
queue until the following GC.
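A heavily simplified sketch of the flag-check-and-block discipline
(illustrative names; the real check runs non-preemptibly and the flag
is cleared during the mark termination STW):
package sketch

import "sync"

var (
    mu       sync.Mutex
    blocking bool // set before the ragged barrier
    cond     = sync.NewCond(&mu)
)

func strongFromWeak(resolve func() any) any {
    mu.Lock()
    for blocking { // recheck on wake: it may already be mark
        cond.Wait() // termination time again
    }
    v := resolve() // shade/queue the newly strong pointer here
    mu.Unlock()
    return v
}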
All-in-all, this CL is more complicated than I would have liked, but
it's the only idea so far that is clearly correct to me at a high level.
This change adds a test which is somewhat invasive as it manipulates
mark termination, but hopefully that infrastructure will be useful for
debugging, fixing, and regression testing mark termination whenever we
do fix it.
Fixes #69803.
Change-Id: Ie314e6fd357c9e2a07a9be21f217f75f7aba8c4a
Reviewed-on: https://go-review.googlesource.com/c/go/+/623615
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
There's a bug in the weak-to-strong conversion in that creating the
*only* strong pointer to some weakly-held object during the mark phase
may result in that object not being properly marked.
The exact mechanism for this is that the new strong pointer will always
point to a white object (because it was only weakly referenced up until
this point) and it can then be stored in a blackened stack, hiding it
from the garbage collector.
This "hide a white pointer in the stack" problem is pretty much exactly
what the Yuasa part of the hybrid write barrier is trying to catch, so
we need to do the same thing the write barrier would do: shade the
pointer.
Added a test and confirmed that it fails with high probability if the
pointer shading is missing.
Fixes #69210.
Change-Id: Iaae64ae95ea7e975c2f2c3d4d1960e74e1bd1c3f
Reviewed-on: https://go-review.googlesource.com/c/go/+/610396
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
This change allows the tracer to be reentrant by restructuring the
internals such that writing an event is atomic with respect to stack
growth. Essentially, a core set of functions that are involved in
acquiring a trace buffer and writing to it are all marked nosplit.
Stack growth is currently the only hidden place where the tracer may be
accidentally reentrant, preventing the tracer from being used
everywhere. It already lacks write barriers, lacks allocations, and is
non-preemptible. This change thus makes the tracer fully reentrant,
since the only reentry case it needs to handle is stack growth.
Since the invariants needed to attain this are subtle, this change also
extends the debugTraceReentrancy debug mode to check these invariants as
well. Specifically, the invariants are checked by setting the throwsplit
flag.
A side benefit of this change is it simplifies the trace event writing
API a good bit: there's no need to actually thread the event writer
through things, and most callsites look a bit simpler.
Change-Id: I7c329fb7a6cb936bd363c44cf882ea0a925132f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/587599
Reviewed-by: Austin Clements <austin@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Cleanup and friction reduction
For #65355.
Change-Id: Ia14c9dc584a529a35b97801dd3e95b9acc99a511
Reviewed-on: https://go-review.googlesource.com/c/go/+/600436
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Some of the new experimental events added have a problem in that they
might be emitted during stack growth. This is, to my knowledge, the only
restriction on the tracer, because the tracer otherwise prevents
preemption, avoids allocation, and avoids write barriers. However, the
stack can grow from within the tracer. This leads to
tracing-during-tracing which can result in lost buffers and broken event
streams. (There's a debug mode to get a nice error message, but it's
disabled by default.)
This change resolves the problem by skipping writing out these new
events. This results in the new events sometimes being broken (alloc
without a free, free without an alloc) but for now that's OK. Before the
freeze begins we just want to fix broken tests; tools interpreting these
events will be totally in-house to begin with, and if they have to be a
little bit smarter about missing information, that's OK. In the future
we'll have a more robust fix for this, but it appears that it's going to
require making the tracer fully reentrant. (This is not too hard; either
we force flushing all buffers when going reentrant (which is actually
somewhat subtle with respect to event ordering) or we isolate down just
the actual event writing to be atomic with respect to stack growth. Both
are just bigger changes on shared codepaths that are scary to land this
late in the release cycle.)
Fixes #67379.
Change-Id: I46bb7e470e61c64ff54ac5aec5554b828c1ca4be
Reviewed-on: https://go-review.googlesource.com/c/go/+/587597
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The page tracer's functionality is now captured by the regular execution
tracer as an experimental GODEBUG variable. This is a lot more usable
and maintainable than the page tracer, which is likely to have bitrotted
by this point. There's also no tooling available for the page tracer.
Change-Id: I2408394555e01dde75a522e9a489b7e55cf12c8e
Reviewed-on: https://go-review.googlesource.com/c/go/+/583379
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This change adds expensive alloc/free events to traces, guarded by a
GODEBUG that can be set at run time by mutating the GODEBUG environment
variable. This supersedes the alloc/free trace deleted in a previous CL.
There are two parts to this CL.
The first part is adding a mechanism for exposing experimental events
through the tracer and trace parser. This boils down to a new
ExperimentalEvent event type in the parser API which simply reveals the
raw event data for the event. Each experimental event can also be
associated with "experimental data" which is associated with a
particular generation. This experimental data is just exposed as a bag
of bytes that supplements the experimental events.
In the runtime, this CL organizes experimental events by experiment.
An experiment is defined by a set of experimental events and a single
special batch type. Batches of this special type are exposed through the
parser's API as the aforementioned "experimental data".
The second part of this CL is defining the AllocFree experiment, which
defines 9 new experimental events covering heap object alloc/frees, span
alloc/frees, and goroutine stack alloc/frees. It also generates special
batches that contain a type table: a mapping of IDs to type information.
Change-Id: I965c00e3dcfdf5570f365ff89d0f70d8aeca219c
Reviewed-on: https://go-review.googlesource.com/c/go/+/583377
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This change adds the internal/weak package, which exposes GC-supported
weak pointers to the standard library. This is for the upcoming weak
package, but may be useful for other future constructs.
For #62483.
Change-Id: I4aa8fa9400110ad5ea022a43c094051699ccab9d
Reviewed-on: https://go-review.googlesource.com/c/go/+/576297
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This change removes the allocheaders GOEXPERIMENT, deleting all the old code and
merging mbitmap_allocheaders.go back into mbitmap.go.
This change also deletes the SetType benchmarks which were already
broken in the new GOEXPERIMENT (it's harder to set up than before). We
weren't really watching these benchmarks at all, and they don't provide
additional test coverage.
Change-Id: I135497201c3259087c5cd3722ed3fbe24791d25d
Reviewed-on: https://go-review.googlesource.com/c/go/+/567200
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
For #65355
Change-Id: I65dd090fb99de9b231af2112c5ccb0eb635db2be
Reviewed-on: https://go-review.googlesource.com/c/go/+/560155
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ibrahim Bazoka <ibrahimbazoka729@gmail.com>
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
|