aboutsummaryrefslogtreecommitdiff
path: root/src/runtime/malloc.go
AgeCommit message (Collapse)Author
2024-05-29all: document legacy //go:linkname for modules with ≥100 dependentsRuss Cox
For #67401. Change-Id: I015408a3f437c1733d97160ef2fb5da6d4efcc5c Reviewed-on: https://go-review.googlesource.com/c/go/+/587598 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org>
2024-05-23all: document legacy //go:linkname for modules with ≥200 dependentsRuss Cox
Ignored these linknames which have not worked for a while: github.com/xtls/xray-core: context.newCancelCtx removed in CL 463999 (Feb 2023) github.com/u-root/u-root: funcPC removed in CL 513837 (Jul 2023) tinygo.org/x/drivers: net.useNetdev never existed For #67401. Change-Id: I9293f4ef197bb5552b431de8939fa94988a060ce Reviewed-on: https://go-review.googlesource.com/c/go/+/587576 Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-23all: document legacy //go:linkname for modules with ≥500 dependentsRuss Cox
For #67401. Change-Id: I7dd28c3b01a1a647f84929d15412aa43ab0089ee Reviewed-on: https://go-review.googlesource.com/c/go/+/587575 Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-23all: document legacy //go:linkname for modules with ≥1,000 dependentsRuss Cox
For #67401. Change-Id: If23a2c07e3dd042a3c439da7088437a330b9caa4 Reviewed-on: https://go-review.googlesource.com/c/go/+/587222 Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-05-23all: document legacy //go:linkname for modules with ≥2,000 dependentsRuss Cox
For #67401. Change-Id: I3ae93042dffd0683b7e6d6225536ae667749515b Reviewed-on: https://go-review.googlesource.com/c/go/+/587221 Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-23all: document legacy //go:linkname for modules with ≥20,000 dependentsRuss Cox
For #67401. Change-Id: Icc10ede72547d8020c0ba45e89d954822a4b2455 Reviewed-on: https://go-review.googlesource.com/c/go/+/587218 Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-22all: document legacy //go:linkname for modules with ≥50,000 dependentsRuss Cox
Note that this depends on the revert of CL 581395 to move zeroVal back. For #67401. Change-Id: I507c27c2404ad1348aabf1ffa3740e6b1957495b Reviewed-on: https://go-review.googlesource.com/c/go/+/587217 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-05-22all: document legacy //go:linkname for modules with ≥100,000 dependentsRuss Cox
For #67401. Change-Id: I51f5b561ee11eb242e3b1585d591281d0df4fc24 Reviewed-on: https://go-review.googlesource.com/c/go/+/587215 Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-08runtime: move profiling pc buffers to mFelix Geisendörfer
Move profiling pc buffers from being stack allocated to an m field. This is motivated by the next patch, which will increase the default stack depth to 128, which might lead to undesirable stack growth for goroutines that produce profiling events. Additionally, this change paves the way to make the stack depth configurable via GODEBUG. Change-Id: Ifa407f899188e2c7c0a81de92194fdb627cb4b36 Reviewed-on: https://go-review.googlesource.com/c/go/+/574699 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2024-05-08runtime: add traceallocfree GODEBUG for alloc/free events in tracesMichael Anthony Knyszek
This change adds expensive alloc/free events to traces, guarded by a GODEBUG that can be set at run time by mutating the GODEBUG environment variable. This supersedes the alloc/free trace deleted in a previous CL. There are two parts to this CL. The first part is adding a mechanism for exposing experimental events through the tracer and trace parser. This boils down to a new ExperimentalEvent event type in the parser API which simply reveals the raw event data for the event. Each experimental event can also be associated with "experimental data" which is associated with a particular generation. This experimental data is just exposed as a bag of bytes that supplements the experimental events. In the runtime, this CL organizes experimental events by experiment. An experiment is defined by a set of experimental events and a single special batch type. Batches of this special type are exposed through the parser's API as the aforementioned "experimental data". The second part of this CL is defining the AllocFree experiment, which defines 9 new experimental events covering heap object alloc/frees, span alloc/frees, and goroutine stack alloc/frees. It also generates special batches that contain a type table: a mapping of IDs to type information. Change-Id: I965c00e3dcfdf5570f365ff89d0f70d8aeca219c Reviewed-on: https://go-review.googlesource.com/c/go/+/583377 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-08runtime: remove allocfreetraceMichael Anthony Knyszek
allocfreetrace prints all allocations and frees to stderr. It's not terribly useful because it has a really huge overhead, making it not feasible to use except for the most trivial programs. A follow-up CL will replace it with something that is both more thorough and also lower overhead. Change-Id: I1d668fee8b6aaef5251a5aea3054ec2444d75eb6 Reviewed-on: https://go-review.googlesource.com/c/go/+/583376 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2024-04-09runtime: make zeroing of large objects containing pointers preemptibleMichael Anthony Knyszek
This change makes it possible for the runtime to preempt the zeroing of large objects that contain pointers. It turns out this is fairly straightforward with allocation headers, since we can just temporarily tell the GC that there's nothing to scan for a large object with a single pointer write (as opposed to trying to zero a whole bunch of bits, as we would've had to do once upon a time). Fixes #31222. Change-Id: I10d0dcfa3938c383282a3eb485a6f00070d07bd2 Reviewed-on: https://go-review.googlesource.com/c/go/+/577495 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-04-09runtime: remove the allocheaders GOEXPERIMENTMichael Anthony Knyszek
This change removes the allocheaders, deleting all the old code and merging mbitmap_allocheaders.go back into mbitmap.go. This change also deletes the SetType benchmarks which were already broken in the new GOEXPERIMENT (it's harder to set up than before). We weren't really watching these benchmarks at all, and they don't provide additional test coverage. Change-Id: I135497201c3259087c5cd3722ed3fbe24791d25d Reviewed-on: https://go-review.googlesource.com/c/go/+/567200 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2024-03-25runtime: migrate internal/atomic to internal/runtimeAndy Pan
For #65355 Change-Id: I65dd090fb99de9b231af2112c5ccb0eb635db2be Reviewed-on: https://go-review.googlesource.com/c/go/+/560155 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ibrahim Bazoka <ibrahimbazoka729@gmail.com> Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
2024-03-04runtime: use .Pointers() instead of manual checkingPouriya
Change-Id: Ib78c1513616089f4942297cd17212b1b11871fd5 GitHub-Last-Rev: f97fe5b5bffffe25dc31de7964588640cb70ec41 GitHub-Pull-Request: golang/go#65819 Reviewed-on: https://go-review.googlesource.com/c/go/+/565515 Reviewed-by: Jorropo <jorropo.pgm@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2023-12-05math/rand, math/rand/v2: use ChaCha8 for global randRuss Cox
Move ChaCha8 code into internal/chacha8rand and use it to implement runtime.rand, which is used for the unseeded global source for both math/rand and math/rand/v2. This also affects the calculation of the start point for iteration over very very large maps (when the 32-bit fastrand is not big enough). The benefit is that misuse of the global random number generators in math/rand and math/rand/v2 in contexts where non-predictable randomness is important for security reasons is no longer a security problem, removing a common mistake among programmers who are unaware of the different kinds of randomness. The cost is an extra 304 bytes per thread stored in the m struct plus 2-3ns more per random uint64 due to the more sophisticated algorithm. Using PCG looks like it would cost about the same, although I haven't benchmarked that. Before this, the math/rand and math/rand/v2 global generator was wyrand (https://github.com/wangyi-fudan/wyhash). For math/rand, using wyrand instead of the Mitchell/Reeds/Thompson ALFG was justifiable, since the latter was not any better. But for math/rand/v2, the global generator really should be at least as good as one of the well-studied, specific algorithms provided directly by the package, and it's not. (Wyrand is still reasonable for scheduling and cache decisions.) Good randomness does have a cost: about twice wyrand. Also rationalize the various runtime rand references. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ bbb48afeb7.amd64 │ 5cf807d1ea.amd64 │ │ sec/op │ sec/op vs base │ ChaCha8-32 1.862n ± 2% 1.861n ± 2% ~ (p=0.825 n=20) PCG_DXSM-32 1.471n ± 1% 1.460n ± 2% ~ (p=0.153 n=20) SourceUint64-32 1.636n ± 2% 1.582n ± 1% -3.30% (p=0.000 n=20) GlobalInt64-32 2.087n ± 1% 3.663n ± 1% +75.54% (p=0.000 n=20) GlobalInt64Parallel-32 0.1042n ± 1% 0.2026n ± 1% +94.48% (p=0.000 n=20) GlobalUint64-32 2.263n ± 2% 3.724n ± 1% +64.57% (p=0.000 n=20) GlobalUint64Parallel-32 0.1019n ± 1% 0.1973n ± 1% +93.67% (p=0.000 n=20) Int64-32 1.771n ± 1% 1.774n ± 1% ~ (p=0.449 n=20) Uint64-32 1.863n ± 2% 1.866n ± 1% ~ (p=0.364 n=20) GlobalIntN1000-32 3.134n ± 3% 4.730n ± 2% +50.95% (p=0.000 n=20) IntN1000-32 2.489n ± 1% 2.489n ± 1% ~ (p=0.683 n=20) Int64N1000-32 2.521n ± 1% 2.516n ± 1% ~ (p=0.394 n=20) Int64N1e8-32 2.479n ± 1% 2.478n ± 2% ~ (p=0.743 n=20) Int64N1e9-32 2.530n ± 2% 2.514n ± 2% ~ (p=0.193 n=20) Int64N2e9-32 2.501n ± 1% 2.494n ± 1% ~ (p=0.616 n=20) Int64N1e18-32 3.227n ± 1% 3.205n ± 1% ~ (p=0.101 n=20) Int64N2e18-32 3.647n ± 1% 3.599n ± 1% ~ (p=0.019 n=20) Int64N4e18-32 5.135n ± 1% 5.069n ± 2% ~ (p=0.034 n=20) Int32N1000-32 2.657n ± 1% 2.637n ± 1% ~ (p=0.180 n=20) Int32N1e8-32 2.636n ± 1% 2.636n ± 1% ~ (p=0.763 n=20) Int32N1e9-32 2.660n ± 2% 2.638n ± 1% ~ (p=0.358 n=20) Int32N2e9-32 2.662n ± 2% 2.618n ± 2% ~ (p=0.064 n=20) Float32-32 2.272n ± 2% 2.239n ± 2% ~ (p=0.194 n=20) Float64-32 2.272n ± 1% 2.286n ± 2% ~ (p=0.763 n=20) ExpFloat64-32 3.762n ± 1% 3.744n ± 1% ~ (p=0.171 n=20) NormFloat64-32 3.706n ± 1% 3.655n ± 2% ~ (p=0.066 n=20) Perm3-32 32.93n ± 3% 34.62n ± 1% +5.13% (p=0.000 n=20) Perm30-32 202.9n ± 1% 204.0n ± 1% ~ (p=0.482 n=20) Perm30ViaShuffle-32 115.0n ± 1% 114.9n ± 1% ~ (p=0.358 n=20) ShuffleOverhead-32 112.8n ± 1% 112.7n ± 1% ~ (p=0.692 n=20) Concurrent-32 2.107n ± 0% 3.725n ± 1% +76.75% (p=0.000 n=20) goos: darwin goarch: arm64 pkg: math/rand/v2 │ bbb48afeb7.arm64 │ 5cf807d1ea.arm64 │ │ sec/op │ sec/op vs base │ ChaCha8-8 2.480n ± 0% 2.429n ± 0% -2.04% (p=0.000 n=20) PCG_DXSM-8 2.531n ± 0% 2.530n ± 0% ~ (p=0.877 n=20) SourceUint64-8 2.534n ± 0% 2.533n ± 0% ~ (p=0.732 n=20) GlobalInt64-8 2.172n ± 1% 4.794n ± 0% +120.67% (p=0.000 n=20) GlobalInt64Parallel-8 0.4320n ± 0% 0.9605n ± 0% +122.32% (p=0.000 n=20) GlobalUint64-8 2.182n ± 0% 4.770n ± 0% +118.58% (p=0.000 n=20) GlobalUint64Parallel-8 0.4307n ± 0% 0.9583n ± 0% +122.51% (p=0.000 n=20) Int64-8 4.107n ± 0% 4.104n ± 0% ~ (p=0.416 n=20) Uint64-8 4.080n ± 0% 4.080n ± 0% ~ (p=0.052 n=20) GlobalIntN1000-8 2.814n ± 2% 5.643n ± 0% +100.50% (p=0.000 n=20) IntN1000-8 4.141n ± 0% 4.139n ± 0% ~ (p=0.140 n=20) Int64N1000-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.313 n=20) Int64N1e8-8 4.140n ± 0% 4.139n ± 0% ~ (p=0.103 n=20) Int64N1e9-8 4.139n ± 0% 4.140n ± 0% ~ (p=0.761 n=20) Int64N2e9-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.636 n=20) Int64N1e18-8 5.266n ± 0% 5.326n ± 1% +1.14% (p=0.001 n=20) Int64N2e18-8 6.052n ± 0% 6.167n ± 0% +1.90% (p=0.000 n=20) Int64N4e18-8 8.826n ± 0% 9.051n ± 0% +2.55% (p=0.000 n=20) Int32N1000-8 4.127n ± 0% 4.132n ± 0% +0.12% (p=0.000 n=20) Int32N1e8-8 4.126n ± 0% 4.131n ± 0% +0.12% (p=0.000 n=20) Int32N1e9-8 4.127n ± 0% 4.132n ± 0% +0.12% (p=0.000 n=20) Int32N2e9-8 4.132n ± 0% 4.131n ± 0% ~ (p=0.017 n=20) Float32-8 4.109n ± 0% 4.105n ± 0% ~ (p=0.379 n=20) Float64-8 4.107n ± 0% 4.106n ± 0% ~ (p=0.867 n=20) ExpFloat64-8 5.339n ± 0% 5.383n ± 0% +0.82% (p=0.000 n=20) NormFloat64-8 5.735n ± 0% 5.737n ± 1% ~ (p=0.856 n=20) Perm3-8 26.65n ± 0% 26.80n ± 1% +0.58% (p=0.000 n=20) Perm30-8 194.8n ± 1% 197.0n ± 0% +1.18% (p=0.000 n=20) Perm30ViaShuffle-8 156.6n ± 0% 157.6n ± 1% +0.61% (p=0.000 n=20) ShuffleOverhead-8 124.9n ± 0% 125.5n ± 0% +0.52% (p=0.000 n=20) Concurrent-8 2.434n ± 3% 5.066n ± 0% +108.09% (p=0.000 n=20) goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ bbb48afeb7.386 │ 5cf807d1ea.386 │ │ sec/op │ sec/op vs base │ ChaCha8-32 11.295n ± 1% 4.748n ± 2% -57.96% (p=0.000 n=20) PCG_DXSM-32 7.693n ± 1% 7.738n ± 2% ~ (p=0.542 n=20) SourceUint64-32 7.658n ± 2% 7.622n ± 2% ~ (p=0.344 n=20) GlobalInt64-32 3.473n ± 2% 7.526n ± 2% +116.73% (p=0.000 n=20) GlobalInt64Parallel-32 0.3198n ± 0% 0.5444n ± 0% +70.22% (p=0.000 n=20) GlobalUint64-32 3.612n ± 0% 7.575n ± 1% +109.69% (p=0.000 n=20) GlobalUint64Parallel-32 0.3168n ± 0% 0.5403n ± 0% +70.51% (p=0.000 n=20) Int64-32 7.673n ± 2% 7.789n ± 1% ~ (p=0.122 n=20) Uint64-32 7.773n ± 1% 7.827n ± 2% ~ (p=0.920 n=20) GlobalIntN1000-32 6.268n ± 1% 9.581n ± 1% +52.87% (p=0.000 n=20) IntN1000-32 10.33n ± 2% 10.45n ± 1% ~ (p=0.233 n=20) Int64N1000-32 10.98n ± 2% 11.01n ± 1% ~ (p=0.401 n=20) Int64N1e8-32 11.19n ± 2% 10.97n ± 1% ~ (p=0.033 n=20) Int64N1e9-32 11.06n ± 1% 11.08n ± 1% ~ (p=0.498 n=20) Int64N2e9-32 11.10n ± 1% 11.01n ± 2% ~ (p=0.995 n=20) Int64N1e18-32 15.23n ± 2% 15.04n ± 1% ~ (p=0.973 n=20) Int64N2e18-32 15.89n ± 1% 15.85n ± 1% ~ (p=0.409 n=20) Int64N4e18-32 18.96n ± 2% 19.34n ± 2% ~ (p=0.048 n=20) Int32N1000-32 10.46n ± 2% 10.44n ± 2% ~ (p=0.480 n=20) Int32N1e8-32 10.46n ± 2% 10.49n ± 2% ~ (p=0.951 n=20) Int32N1e9-32 10.28n ± 2% 10.26n ± 1% ~ (p=0.431 n=20) Int32N2e9-32 10.50n ± 2% 10.44n ± 2% ~ (p=0.249 n=20) Float32-32 13.80n ± 2% 13.80n ± 2% ~ (p=0.751 n=20) Float64-32 23.55n ± 2% 23.87n ± 0% ~ (p=0.408 n=20) ExpFloat64-32 15.36n ± 1% 15.29n ± 2% ~ (p=0.316 n=20) NormFloat64-32 13.57n ± 1% 13.79n ± 1% +1.66% (p=0.005 n=20) Perm3-32 45.70n ± 2% 46.99n ± 2% +2.81% (p=0.001 n=20) Perm30-32 399.0n ± 1% 403.8n ± 1% +1.19% (p=0.006 n=20) Perm30ViaShuffle-32 349.0n ± 1% 350.4n ± 1% ~ (p=0.909 n=20) ShuffleOverhead-32 322.3n ± 1% 323.8n ± 1% ~ (p=0.410 n=20) Concurrent-32 3.331n ± 1% 7.312n ± 1% +119.50% (p=0.000 n=20) For #61716. Change-Id: Ibdddeed85c34d9ae397289dc899e04d4845f9ed2 Reviewed-on: https://go-review.googlesource.com/c/go/+/516860 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Filippo Valsorda <filippo@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-17runtime: use span.elemsize for accounting in mallocgcMichael Anthony Knyszek
Currently the final size computed for an object in mallocgc excludes the allocation header. This is correct in a number of cases, but definitely wrong for memory profiling because the "free" side accounts for the full allocation slot. This change makes an explicit distinction between the parts of mallocgc that care about the full allocation slot size ("the GC's accounting") and those that don't (pointer+len should always be valid). It then applies the appropriate size to the different forms of accounting in mallocgc. For #64153. Change-Id: I481b34b2bb9ff923b59e8408ab2b8fb9025ba944 Reviewed-on: https://go-review.googlesource.com/c/go/+/542735 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-11-17runtime: put allocation headers back at the start the objectMichael Anthony Knyszek
A persistent performance regression was discovered on perf.golang.org/dashboard and this was narrowed down to the switch to footers. Using allocation headers instead resolves the issue. The benchmark results for allocation footers weren't realistic, because they were performed on a machine with enough L3 cache that it completely hid the additional cache miss introduced by allocation footers. This means that in some corner cases the Go runtime may no longer allocate 16-byte aligned memory. Note however that this property was *mostly* incidental and never guaranteed in any documentation. Allocation headers were tested widely within Google and no issues were found, so we're fairly confident that this will not affect very many users. Nonetheless, by Hyrum's Law some code might depend on it. A follow-up change will add a GODEBUG flag that ensures 16 byte alignment at the potential cost of some additional memory use. Users experiencing both a performance regression and an alignment issue can also disable the GOEXPERIMENT at build time. This reverts commit 1e250a219900651dad27f29eab0877eee4afd5b9. Change-Id: Ia7d62a9c60d1773c8b6d33322ee33a80ef814943 Reviewed-on: https://go-review.googlesource.com/c/go/+/543255 Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-11-13runtime: call enableMetadataHugePages and its callees on the systemstackMichael Anthony Knyszek
These functions acquire the heap lock. If they're not called on the systemstack, a stack growth could cause a self-deadlock since stack growth may allocate memory from the page heap. This has been a problem for a while. If this is what's plaguing the ppc64 port right now, it's very surprising (and probably just coincidental) that it's showing up now. For #64050. For #64062. Fixes #64067. Change-Id: I2b95dc134d17be63b9fe8f7a3370fe5b5438682f Reviewed-on: https://go-review.googlesource.com/c/go/+/541635 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Paul Murphy <murp@ibm.com>
2023-11-09runtime: make alloc headers footers insteadMichael Anthony Knyszek
The previous CL in this series (CL 437955) adds the allocation headers experiment. However, this experiment puts the headers at the beginning of each allocation, which decreases the default allocator alignment that users can rely upon. Historically, Go's memory allocator has implicitly provided 16-byte alignment (except for sizes where it doesn't make sense, like 8 or 24 bytes), so it's not unlikely that users are depending on it. It also complicates other changes that want higher alignment. For example, the sync/atomic.Uint64Pair proposal would (hypothetically; it's not yet accepted) introduce a type with 16-byte alignment. The malloc fast path will require extra code to consider alignment and will waste memory for any value containing such a type. This change moves the allocation header to the end of the span's allocation slot instead of the beginning. This means worse locality for the GC when scanning, but it's still an overall win. It also means that objects will still have the 16-byte alignment we've provided thus far. This is broken out in a separate change just becauase it ended up that way during development. But I've chosen to leave it this way in case we want to try and move allocation headers to the front of objects again. Below are the benchmark results of this CL series, comparing the performance of this CL with GOEXPERIMENT=allocheaders vs. without this CL series. name old time/op new time/op delta BiogoIgor 12.5s ± 0% 12.4s ± 2% ~ (p=0.079 n=9+10) BiogoKrishna 12.8s ±10% 12.4s ±10% ~ (p=0.182 n=9+10) BleveIndexBatch100 4.54s ± 3% 4.60s ± 3% ~ (p=0.050 n=9+9) EtcdPut 21.1ms ± 2% 21.3ms ± 4% ~ (p=0.669 n=7+10) EtcdSTM 107ms ± 3% 108ms ± 2% ~ (p=0.497 n=9+10) GoBuildKubelet 34.1s ± 3% 33.1s ± 2% -3.08% (p=0.000 n=10+10) GoBuildKubeletLink 7.94s ± 2% 7.95s ± 2% ~ (p=0.631 n=10+10) GoBuildIstioctl 33.2s ± 1% 31.7s ± 0% -4.37% (p=0.000 n=9+9) GoBuildIstioctlLink 8.07s ± 1% 8.05s ± 1% ~ (p=0.356 n=9+10) GoBuildFrontend 12.1s ± 0% 11.5s ± 1% -4.43% (p=0.000 n=8+10) GoBuildFrontendLink 1.20s ± 2% 1.20s ± 2% ~ (p=0.905 n=9+10) GopherLuaKNucleotide 19.9s ± 0% 19.5s ± 1% -1.95% (p=0.000 n=9+10) MarkdownRenderXHTML 194ms ± 5% 194ms ± 2% ~ (p=0.931 n=9+9) Tile38QueryLoad 518µs ± 1% 508µs ± 1% -1.93% (p=0.000 n=9+8) name old average-RSS-bytes new average-RSS-bytes delta BiogoIgor 66.2MB ± 3% 65.6MB ± 1% ~ (p=0.156 n=10+9) BiogoKrishna 4.34GB ± 2% 4.34GB ± 1% ~ (p=0.315 n=10+9) BleveIndexBatch100 189MB ± 3% 186MB ± 3% ~ (p=0.052 n=10+10) EtcdPut 105MB ± 5% 107MB ± 6% ~ (p=0.579 n=10+10) EtcdSTM 92.1MB ± 5% 93.2MB ± 4% ~ (p=0.353 n=10+10) GoBuildKubelet 2.07GB ± 1% 2.07GB ± 1% ~ (p=0.436 n=10+10) GoBuildIstioctl 1.44GB ± 1% 1.46GB ± 1% +0.96% (p=0.001 n=10+10) GoBuildFrontend 522MB ± 1% 512MB ± 2% -1.98% (p=0.000 n=10+10) GopherLuaKNucleotide 37.4MB ± 5% 36.4MB ± 4% -2.53% (p=0.035 n=10+10) MarkdownRenderXHTML 21.2MB ± 1% 20.9MB ± 3% -1.53% (p=0.003 n=8+10) Tile38QueryLoad 6.39GB ± 2% 6.24GB ± 2% -2.40% (p=0.000 n=10+10) name old peak-RSS-bytes new peak-RSS-bytes delta BiogoIgor 88.5MB ± 4% 88.4MB ± 3% ~ (p=0.971 n=10+10) BiogoKrishna 4.48GB ± 0% 4.42GB ± 0% -1.49% (p=0.000 n=10+10) BleveIndexBatch100 268MB ± 3% 265MB ± 4% ~ (p=0.315 n=9+10) EtcdPut 147MB ± 9% 146MB ± 5% ~ (p=0.853 n=10+10) EtcdSTM 119MB ± 6% 120MB ± 5% ~ (p=0.796 n=10+10) GopherLuaKNucleotide 43.1MB ±17% 40.7MB ±12% ~ (p=0.075 n=10+10) MarkdownRenderXHTML 21.2MB ± 1% 21.1MB ± 3% ~ (p=0.511 n=9+10) Tile38QueryLoad 6.65GB ± 4% 6.52GB ± 2% -1.93% (p=0.009 n=10+10) name old peak-VM-bytes new peak-VM-bytes delta BiogoIgor 1.33GB ± 0% 1.33GB ± 0% -0.16% (p=0.000 n=10+10) BiogoKrishna 5.77GB ± 0% 5.69GB ± 0% -1.23% (p=0.000 n=10+10) BleveIndexBatch100 2.62GB ± 0% 2.61GB ± 0% -0.13% (p=0.000 n=7+10) EtcdPut 12.1GB ± 0% 12.1GB ± 0% ~ (p=0.160 n=8+10) EtcdSTM 12.1GB ± 0% 12.1GB ± 0% -0.02% (p=0.000 n=10+10) GopherLuaKNucleotide 1.26GB ± 0% 1.26GB ± 0% -0.09% (p=0.000 n=10+10) MarkdownRenderXHTML 1.26GB ± 0% 1.26GB ± 0% -0.08% (p=0.000 n=10+10) Tile38QueryLoad 7.89GB ± 4% 7.76GB ± 1% -1.70% (p=0.008 n=10+8) name old p50-latency-ns new p50-latency-ns delta EtcdPut 20.1M ± 5% 20.2M ± 4% ~ (p=0.529 n=10+10) EtcdSTM 79.8M ± 4% 79.9M ± 4% ~ (p=0.971 n=10+10) Tile38QueryLoad 215k ± 1% 210k ± 3% -2.04% (p=0.021 n=8+10) name old p90-latency-ns new p90-latency-ns delta EtcdPut 31.9M ± 6% 32.0M ± 7% ~ (p=0.780 n=9+10) EtcdSTM 220M ± 6% 220M ± 2% ~ (p=1.000 n=10+10) Tile38QueryLoad 622k ± 2% 646k ± 2% +3.83% (p=0.000 n=10+10) name old p99-latency-ns new p99-latency-ns delta EtcdPut 47.6M ±32% 51.4M ±28% ~ (p=0.529 n=10+10) EtcdSTM 452M ± 2% 457M ± 2% ~ (p=0.182 n=9+10) Tile38QueryLoad 5.04M ± 2% 4.91M ± 3% -2.56% (p=0.001 n=9+9) name old ops/s new ops/s delta EtcdPut 46.1k ± 2% 45.7k ± 4% ~ (p=0.475 n=7+10) EtcdSTM 9.18k ± 5% 9.20k ± 3% ~ (p=0.971 n=10+10) Tile38QueryLoad 17.4k ± 1% 17.7k ± 1% +1.97% (p=0.000 n=9+8) Change-Id: I637f48fb9e8c181912db785ae9186d7f16769870 Reviewed-on: https://go-review.googlesource.com/c/go/+/537886 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-11-09runtime: implement experiment to replace heap bitmap with alloc headersMichael Anthony Knyszek
This change replaces the 1-bit-per-word heap bitmap for most size classes with allocation headers for objects that contain pointers. The header consists of a single pointer to a type. All allocations with headers are treated as implicitly containing one or more instances of the type in the header. As the name implies, headers are usually stored as the first word of an object. There are two additional exceptions to where headers are stored and how they're used. Objects smaller than 512 bytes do not have headers. Instead, a heap bitmap is reserved at the end of spans for objects of this size. A full word of overhead is too much for these small objects. The bitmap is of the same format of the old bitmap, minus the noMorePtrs bits which are unnecessary. All the objects <512 bytes have a bitmap less than a pointer-word in size, and that was the granularity at which noMorePtrs could stop scanning early anyway. Objects that are larger than 32 KiB (which have their own span) have their headers stored directly in the span, to allow power-of-two-sized allocations to not spill over into an extra page. The full implementation is behind GOEXPERIMENT=allocheaders. The purpose of this change is performance. First and foremost, with headers we no longer have to unroll pointer/scalar data at allocation time for most size classes. Small size classes still need some unrolling, but their bitmaps are small so we can optimize that case fairly well. Larger objects effectively have their pointer/scalar data unrolled on-demand from type data, which is much more compactly represented and results in less TLB pressure. Furthermore, since the headers are usually right next to the object and where we're about to start scanning, we get an additional temporal locality benefit in the data cache when looking up type metadata. The pointer/scalar data is now effectively unrolled on-demand, but it's also simpler to unroll than before; that unrolled data is never written anywhere, and for arrays we get the benefit of retreading the same data per element, as opposed to looking it up from scratch for each pointer-word of bitmap. Lastly, because we no longer have a heap bitmap that spans the entire heap, there's a flat 1.5% memory use reduction. This is balanced slightly by some objects possibly being bumped up a size class, but most objects are not tightly optimized to size class sizes so there's some memory to spare, making the header basically free in those cases. See the follow-up CL which turns on this experiment by default for benchmark results. (CL 538217.) Change-Id: I4c9034ee200650d06d8bdecd579d5f7c1bbf1fc5 Reviewed-on: https://go-review.googlesource.com/c/go/+/437955 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-02runtime: use smaller fields for mspan.freeindex and nelemsCherry Mui
mspan.freeindex and nelems can fit into uint16 for all possible values. Use uint16 instead of uintptr. Change-Id: Ifce20751e81d5022be1f6b5cbb5fbe4fd1728b1b Reviewed-on: https://go-review.googlesource.com/c/go/+/451359 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-07-21runtime: fix debug non-concurrent sweep mode after activeSweep changesMichael Anthony Knyszek
Currently the GC creates a sweepLocker before restarting the world at the end of the mark phase, so that it can safely flush mcaches without the runtime incorrectly concluding that sweeping is done before that happens. However, with GODEBUG=gcstoptheworld=2, where sweeping happens during that STW phase, creating that sweepLocker will fail, since the runtime will conclude that sweeping is in fact complete (all the queues will be drained). The problem however is that gcSweep, which does the non-concurrent sweeping, doesn't actually flush mcaches. In essence, this failure to create a sweepLocker is indicating a real issue: sweeping is marked as complete, but we haven't flush the mcaches yet! The fix to this is to flush mcaches in gcSweep when in a non-concurrent sweep. Now that gcSweep actually completes a full sweep, it's safe to ignore a failure to create a sweepLocker (and in fact, it *must* fail). While we're here, let's also remove _ConcurrentSweep, the debug flag. There's already an alias for it called concurrentSweep, and there's only one use of it in gcSweep. Lastly, add a dist test for the GODEBUG=gcstoptheworld=2 mode. Fixes #53885. Change-Id: I8a1e5b8f362ed8abd03f76e4950d3211f145ab1f Reviewed-on: https://go-review.googlesource.com/c/go/+/479517 Run-TryBot: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-06-14all: fix spelling errorsAlexander Yastrebov
Fix spelling errors discovered using https://github.com/codespell-project/codespell. Errors in data files and vendored packages are ignored. Change-Id: I83c7818222f2eea69afbd270c15b7897678131dc GitHub-Last-Rev: 3491615b1b82832cc0064f535786546e89aa6184 GitHub-Pull-Request: golang/go#60758 Reviewed-on: https://go-review.googlesource.com/c/go/+/502576 Auto-Submit: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Pratt <mpratt@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>
2023-05-05internal/abi: refactor (basic) type struct into one definitionDavid Chase
This touches a lot of files, which is bad, but it is also good, since there's N copies of this information commoned into 1. The new files in internal/abi are copied from the end of the stack; ultimately this will all end up being used. Change-Id: Ia252c0055aaa72ca569411ef9f9e96e3d610889e Reviewed-on: https://go-review.googlesource.com/c/go/+/462995 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Carlos Amedee <carlos@golang.org> Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2023-04-19runtime: disable huge pages for GC metadata for small heapsMichael Anthony Knyszek
For #55328. Change-Id: I8792161f09906c08d506cc0ace9d07e76ec6baa6 Reviewed-on: https://go-review.googlesource.com/c/go/+/460316 Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-04-18runtime: change lfstack support to taggedPointerIan Lance Taylor
This is a refactoring with no change in behavior, in preparation for future netpoll work. For #59545 Change-Id: I493c5fd0f49f31b75787f7b5b89c544bed73f64f Reviewed-on: https://go-review.googlesource.com/c/go/+/484836 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Bryan Mills <bcmills@google.com> Auto-Submit: Ian Lance Taylor <iant@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Orlando Labao <orlando.labao43@gmail.com>
2023-04-05runtime: initialize the memory limit in mallocinitMichael Anthony Knyszek
Currently the memory limit is left uninitialized before gcinit, and allocations may happen. The result is that the span allocation path might try to scavenge memory unnecessarily. Prevent this by setting the memory limit up early to its default value. Change-Id: I886d9a8fa645861e4f88e0d54af793418426f520 Reviewed-on: https://go-review.googlesource.com/c/go/+/450736 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2022-11-18all: add missing periods in commentscui fliter
Change-Id: I69065f8adf101fdb28682c55997f503013a50e29 Reviewed-on: https://go-review.googlesource.com/c/go/+/449757 Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Joedian Reid <joedian@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Joedian Reid <joedian@golang.org> Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
2022-11-15runtime: make GC see object as allocated after it is initializedCherry Mui
When the GC is scanning some memory (possibly conservatively), finding a pointer, while concurrently another goroutine is allocating an object at the same address as the found pointer, the GC may see the pointer before the object and/or the heap bits are initialized. This may cause the GC to see bad pointers and possibly crash. To prevent this, we make it that the scanner can only see the object as allocated after the object and the heap bits are initialized. Currently the allocator uses freeindex to find the next available slot, and that code is coupled with updating the free index to a new slot past it. The scanner also uses the freeindex to determine if an object is allocated. This is somewhat racy. This CL makes the scanner use a different field, which is only updated after the object initialization (and a memory barrier). Fixes #54596. Change-Id: I2a57a226369926e7192c253dd0d21d3faf22297c Reviewed-on: https://go-review.googlesource.com/c/go/+/449017 Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-11-14Revert "runtime: delay incrementing freeindex in malloc"Michael Knyszek
This reverts commit bed2b7cf41471e1521af5a83ae28bd643eb3e038. Reason for revert: I clicked submit by accident on the wrong CL. Change-Id: Iddf128cb62f289d472510eb30466e515068271b2 Reviewed-on: https://go-review.googlesource.com/c/go/+/449501 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-11-11runtime: delay incrementing freeindex in mallocCherry Mui
When the GC is scanning some memory (possibly conservatively), finding a pointer, while concurrently another goroutine is allocating an object at the same address as the found pointer, the GC may see the pointer before the object and/or the heap bits are initialized. This may cause the GC to see bad pointers and possibly crash. To prevent this, we make it that the scanner can only see the object as allocated after the object and the heap bits are initialized. As the scanner uses the freeindex to determine if an object is allocated, we delay the increment of freeindex after the initialization. As currently in some code path finding the next free index and updating the free index to a new slot past it is coupled, this needs a small refactoring. In the new code mspan.nextFreeIndex return the next free index but not update it (although allocCache is updated). mallocgc will update it at a later time. Fixes #54596. Change-Id: I6dd5ccf743f2d2c46a1ed67c6a8237fe09a71260 Reviewed-on: https://go-review.googlesource.com/c/go/+/427619 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2022-10-18runtime: replace all uses of CtzXX with TrailingZerosXXYoulin Feng
Replace all uses of Ctz64/32/8 with TrailingZeros64/32/8, because they are the same and maybe duplicated. Also renamed CtzXX functions in 386 assembly code. Change-Id: I19290204858083750f4be589bb0923393950ae6d Reviewed-on: https://go-review.googlesource.com/c/go/+/438935 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Bryan Mills <bcmills@google.com> Auto-Submit: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org>
2022-10-17runtime: remove redundant conversionDan Kortschak
This appears to have been left over from a C cast during the rewrite of malloc into Go in https://golang.org/cl/108840046. Change-Id: I88f212089c2bcf79d5881b3e8bf3f94f343331d8 Reviewed-on: https://go-review.googlesource.com/c/go/+/443235 Run-TryBot: Dan Kortschak <dan@kortschak.io> Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com>
2022-10-12runtime: add safe arena support to the runtimeMichael Anthony Knyszek
This change adds an API to the runtime for arenas. A later CL can potentially export it as an experimental API, but for now, just the runtime implementation will suffice. The purpose of arenas is to improve efficiency, primarily by allowing for an application to manually free memory, thereby delaying garbage collection. It comes with other potential performance benefits, such as better locality, a better allocation strategy, and better handling of interior pointers by the GC. This implementation is based on one by danscales@google.com with a few significant differences: * The implementation lives entirely in the runtime (all layers). * Arena chunks are the minimum of 8 MiB or the heap arena size. This choice is made because in practice 64 MiB appears to be way too large of an area for most real-world use-cases. * Arena chunks are not unmapped, instead they're placed on an evacuation list and when there are no pointers left pointing into them, they're allowed to be reused. * Reusing partially-used arena chunks no longer tries to find one used by the same P first; it just takes the first one available. * In order to ensure worst-case fragmentation is never worse than 25%, only types and slice backing stores whose sizes are 1/4th the size of a chunk or less may be used. Previously larger sizes, up to the size of the chunk, were allowed. * ASAN, MSAN, and the race detector are fully supported. * Sets arena chunks to fault that were deferred at the end of mark termination (a non-public patch once did this; I don't see a reason not to continue that). For #51317. Change-Id: I83b1693a17302554cb36b6daa4e9249a81b1644f Reviewed-on: https://go-review.googlesource.com/c/go/+/423359 Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-10-12runtime: make (*mheap).sysAlloc more generalMichael Anthony Knyszek
This change makes (*mheap).sysAlloc take an explicit list of hints and a boolean as to whether or not any newly-created heapArenas should be registered in the full arena list. This is a refactoring in preparation for arenas. For #51317. Change-Id: I0584a033fce3fcb60c5d0bc033d5fb8bd23b2378 Reviewed-on: https://go-review.googlesource.com/c/go/+/432078 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-10-12runtime: factor out GC assist credit accountingMichael Anthony Knyszek
No-op change in preparation for arenas. For #51317. Change-Id: I0777f21763fcd34957b7e709580cf2b7a962ba67 Reviewed-on: https://go-review.googlesource.com/c/go/+/423365 Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-09-08runtime: remove unused scanSize parameter to gcmarknewobjectMichael Anthony Knyszek
This was left over from the old pacer, and never removed when the old pacer was removed in Go 1.19. Change-Id: I79e5f0420c6100c66bd06129a68f5bbab7c1ea8f Reviewed-on: https://go-review.googlesource.com/c/go/+/429256 Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2022-08-19runtime: add and use runtime/internal/sys.NotInHeapCuong Manh Le
Updates #46731 Change-Id: Ic2208c8bb639aa1e390be0d62e2bd799ecf20654 Reviewed-on: https://go-review.googlesource.com/c/go/+/421878 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2022-08-16runtime: redo heap bitmapKeith Randall
[this is a retry of CL 407035 + its revert CL 422395. The content is unchanged] Use just 1 bit per word to record the ptr/nonptr bitmap. Use word-sized operations to manipulate the bitmap, so we can operate on up to 64 ptr/nonptr bits at a time. Use a separate bitmap, one bit per word of the ptr/nonptr bitmap, to encode a no-more-pointers signal. Since we can check 64 ptr/nonptr bits at once, knowing the exact last pointer location is not necessary. As a followon CL, we should make the gcdata bitmap an array of uintptr instead of an array of byte, so we can load 64 bits of it at once. Similarly for the processing of gc programs. Change-Id: Ica5eb622f5b87e647be64f471d67b02732ef8be6 Reviewed-on: https://go-review.googlesource.com/c/go/+/422634 Reviewed-by: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org>
2022-08-09Revert "runtime: redo heap bitmap"Keith Randall
This reverts commit b589208c8cc6e08239868f47e12c1449cd797bac. Reason for revert: Bug somewhere in this code, causing wasm and maybe linux/386 to fail. Change-Id: I5e1e501d839584e0219271bb937e94348f83c11f Reviewed-on: https://go-review.googlesource.com/c/go/+/422395 Reviewed-by: Than McIntosh <thanm@google.com> Run-TryBot: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-08runtime: redo heap bitmapKeith Randall
Use just 1 bit per word to record the ptr/nonptr bitmap. Use word-sized operations to manipulate the bitmap, so we can operate on up to 64 ptr/nonptr bits at a time. Use a separate bitmap, one bit per word of the ptr/nonptr bitmap, to encode a no-more-pointers signal. Since we can check 64 ptr/nonptr bits at once, knowing the exact last pointer location is not necessary. This cleans up the bitmap implementation significantly, which will hopefully make it faster. TODO: measure As a followon CL, we should make the gcdata bitmap an array of uintptr instead of an array of byte, so we can load 64 bits of it at once. Similarly for the processing of gc programs. Change-Id: I18151b1876d9543599800dec51e2a1b19df97d49 Reviewed-on: https://go-review.googlesource.com/c/go/+/407035 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@google.com>
2022-08-04runtime: add mayAcquire annotation for finlockAustin Clements
We're missing lock edges to finlock that happen only rarely. Anything that calls mallocgc can potentially trigger sweeping, which can potentially queue a finalizer, which acquires finlock. While this can happen on any malloc, it happens relatively rarely, so we simply haven't seen some of the lock edges that could happen. Add a mayAcquire annotation to mallocgc to capture the possibility of acquiring finlock. With this change, we add "fin" to the set of "malloc" locks. Several of these edges were already there, but not quite all of them. This was found by inspecting the rank graph for things that didn't make sense. For #53789. Change-Id: Idc10ce6f250596b0c07ba07ac93f2198fb38c22b Reviewed-on: https://go-review.googlesource.com/c/go/+/418717 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-02runtime: trivial replacements of g in remaining filesMichael Pratt
Rename g variables to gp for consistency. Change-Id: I09ecdc7e8439637bc0e32f9c5f96f515e6436362 Reviewed-on: https://go-review.googlesource.com/c/go/+/418591 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Michael Pratt <mpratt@google.com>
2022-05-18runtime: remove useless constant definition in malloc.goLeonard Wang
Change-Id: I060c867d89a06b5a44fbe77804c19299385802d9 Reviewed-on: https://go-review.googlesource.com/c/go/+/311250 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
2022-05-03runtime: split mprof locksRhys Hiltner
The profiles for memory allocations, sync.Mutex contention, and general blocking store their data in a shared hash table. The bookkeeping work at the end of a garbage collection cycle involves maintenance on each memory allocation record. Previously, a single lock guarded access to the hash table and the contents of all records. When a program has allocated memory at a large number of unique call stacks, the maintenance following every garbage collection can hold that lock for several milliseconds. That can prevent progress on all other goroutines by delaying acquirep's call to mcache.prepareForSweep, which needs the lock in mProf_Free to report when a profiled allocation is no longer in use. With no user goroutines making progress, it is in effect a multi-millisecond GC-related stop-the-world pause. Split the lock so the call to mProf_Flush no longer delays each P's call to mProf_Free: mProf_Free uses a lock on the memory records' N+1 cycle, and mProf_Flush uses locks on the memory records' accumulator and their N cycle. mProf_Malloc also no longer competes with mProf_Flush, as it uses a lock on the memory records' N+2 cycle. The profiles for sync.Mutex contention and general blocking now share a separate lock, and another lock guards insertions to the shared hash table (uncommon in the steady-state). Consumers of each type of profile take the matching accumulator lock, so will observe consistent count and magnitude values for each record. For #45894 Change-Id: I615ff80618d10e71025423daa64b0b7f9dc57daa Reviewed-on: https://go-review.googlesource.com/c/go/+/399956 Reviewed-by: Carlos Amedee <carlos@golang.org> Run-TryBot: Rhys Hiltner <rhys@justin.tv> Reviewed-by: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-03runtime: move inconsistent memstats into gcControllerMichael Anthony Knyszek
Fundamentally, all of these memstats exist to serve the runtime in managing memory. For the sake of simpler testing, couple these stats more tightly with the GC. This CL was mostly done automatically. The fields had to be moved manually, but the references to the fields were updated via gofmt -w -r 'memstats.<field> -> gcController.<field>' *.go For #48409. Change-Id: Ic036e875c98138d9a11e1c35f8c61b784c376134 Reviewed-on: https://go-review.googlesource.com/c/go/+/397678 Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-03runtime: clean up inconsistent heap statsMichael Anthony Knyszek
The inconsistent heaps stats in memstats are a bit messy. Primarily, heap_sys is non-orthogonal with heap_released and heap_inuse. In later CLs, we're going to want heap_sys-heap_released-heap_inuse, so clean this up by replacing heap_sys with an orthogonal metric: heapFree. heapFree represents page heap memory that is free but not released. I think this change also simplifies a lot of reasoning about these stats; it's much clearer what they mean, and to obtain HeapSys for memstats, we no longer need to do the strange subtraction from heap_sys when allocating specifically non-heap memory from the page heap. Because we're removing heap_sys, we need to replace it with a sysMemStat for mem.go functions. In this case, heap_released is the most appropriate because we increase it anyway (again, non-orthogonality). In which case, it makes sense for heap_inuse, heap_released, and heapFree to become more uniform, and to just represent them all as sysMemStats. While we're here and messing with the types of heap_inuse and heap_released, let's also fix their names (and last_heap_inuse's name) up to the more modern Go convention of camelCase. For #48409. Change-Id: I87fcbf143b3e36b065c7faf9aa888d86bd11710b Reviewed-on: https://go-review.googlesource.com/c/go/+/397677 Run-TryBot: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-03runtime: track how much memory is mapped in the Ready stateMichael Anthony Knyszek
This change adds a field to memstats called mappedReady that tracks how much memory is in the Ready state at any given time. In essence, it's the total memory usage by the Go runtime (with one exception which is documented). Essentially, all memory mapped read/write that has either been paged in or will soon. To make tracking this not involve the many different stats that track mapped memory, we track this statistic at a very low level. The downside of tracking this statistic at such a low level is that it managed to catch lots of situations where the runtime wasn't fully accounting for memory. This change rectifies these situations by always accounting for memory that's mapped in some way (i.e. always passing a sysMemStat to a mem.go function), with *two* exceptions. Rectifying these situations means also having the memory mapped during testing being accounted for, so that tests (i.e. ReadMemStats) that ultimately check mappedReady continue to work correctly without special exceptions. We choose to simply account for this memory in other_sys. Let's talk about the exceptions. The first is the arenas array for finding heap arena metadata from an address is mapped as read/write in one large chunk. It's tens of MiB in size. On systems with demand paging, we assume that the whole thing isn't paged in at once (after all, it maps to the whole address space, and it's exceedingly difficult with today's technology to even broach having as much physical memory as the total address space). On systems where we have to commit memory manually, we use a two-level structure. Now, the reason why this is an exception is because we have no mechanism to track what memory is paged in, and we can't just account for the entire thing, because that would *look* like an enormous overhead. Furthermore, this structure is on a few really, really critical paths in the runtime, so doing more explicit tracking isn't really an option. So, we explicitly don't and call sysAllocOS to map this memory. The second exception is that we call sysFree with no accounting to clean up address space reservations, or otherwise to throw out mappings we don't care about. In this case, also drop down to a lower level and call sysFreeOS to explicitly avoid accounting. The third exception is debuglog allocations. That is purely a debugging facility and ideally we want it to have as small an impact on the runtime as possible. If we include it in mappedReady calculations, it could cause GC pacing shifts in future CLs, especailly if one increases the debuglog buffer sizes as a one-off. As of this CL, these are the only three places in the runtime that would pass nil for a stat to any of the functions in mem.go. As a result, this CL makes sysMemStats mandatory to facilitate better accounting in the future. It's now much easier to grep and find out where accounting is explicitly elided, because one doesn't have to follow the trail of sysMemStat nil pointer values, and can just look at the function name. For #48409. Change-Id: I274eb467fc2603881717482214fddc47c9eaf218 Reviewed-on: https://go-review.googlesource.com/c/go/+/393402 Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-04-05all: separate doc comment from //go: directivesRuss Cox
A future change to gofmt will rewrite // Doc comment. //go:foo to // Doc comment. // //go:foo Apply that change preemptively to all comments (not necessarily just doc comments). For #51082. Change-Id: Iffe0285418d1e79d34526af3520b415a12203ca9 Reviewed-on: https://go-review.googlesource.com/c/go/+/384260 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>