path: root/src/runtime/slice.go
Age  Commit message  Author
2025-11-26  cmd/compile: introduce alias analysis and automatically free non-aliased memory after growslice  thepudds

    This CL is part of a set of CLs that attempt to reduce how much work the GC must do. See the design in https://go.dev/design/74299-runtime-freegc

    This CL updates the compiler to examine append calls to prove whether or not the slice is aliased. If proven unaliased, the compiler automatically inserts a call to a new runtime function introduced with this CL, runtime.growsliceNoAlias, which frees the old backing memory immediately after slice growth is complete and the old storage is logically dead.

    Two append benchmarks below show promising results, executing up to ~2x faster and up to a factor of ~3 memory reduction with this CL.

    The approach works with multiple append calls for the same slice, including inside loops, and the final slice memory can be escaping, such as in a classic pattern of returning a slice from a function after the slice is built. (The final slice memory is never freed with this CL, though we have other work that tackles that.)

    An example target for this CL is that we automatically free the intermediate memory for the appends in the loop in this function:

        func f1(input []int) []int {
            var s []int
            for _, x := range input {
                s = append(s, g(x)) // s cannot be aliased here
                if h(x) {
                    s = append(s, x) // s cannot be aliased here
                }
            }
            return s // slice escapes at end
        }

    In this case, the compiler and the runtime collaborate so that the heap allocated backing memory for s is automatically freed after a successful grow. (For the first grow, there is nothing to free, but for the second and subsequent growths, the old heap memory is freed automatically.)

    The new runtime.growsliceNoAlias is primarily implemented by calling runtime.freegc, which we introduced in CL 673695.

    The high-level approach here is that we step through the IR starting from a slice declaration and look for any operations that either alias the slice or might do so, and treat any IR construct we don't specifically handle as a potential alias (and therefore conservatively fall back to treating the slice as aliased when encountering something not understood).

    For loops, some additional care is required. We arrange the analysis so that an alias in the body of a loop causes all the appends in that same loop body to be marked aliased, even if the aliasing occurs after the append in the IR:

        func f2() {
            var s []int
            for i := range 10 {
                s = append(s, i) // aliased due to next line
                alias = s
            }
        }

    For nested loops, we analyse the nesting appropriately so that for example this append is still proven as non-aliased in the inner loop even though it is aliased for the outer loop:

        func f3() {
            for range 10 {
                var s []int
                for i := range 10 {
                    s = append(s, i) // append using non-aliased slice
                }
                alias = s
            }
        }

    A good starting point is the beginning of the test/escape_alias.go file, which starts with ~10 introductory examples with brief comments that attempt to illustrate the high-level approach.

    For more details, see the new .../internal/escape/alias.go file, especially the (*aliasAnalysis).analyze method.

    In the first benchmark, an append in a loop builds up a slice from nothing, where the slice elements are each 64 bytes. In the table below, 'count' is the number of appends. With 1 append, there is no opportunity for this CL to free memory. Once there are 2 appends, the growth from 1 element to 2 elements means the compiler-inserted growsliceNoAlias frees the 1-element array, and we see a ~33% reduction in memory use and a small reported speed improvement.

    As the number of appends increases for example to 5, we are at a ~20% speed improvement and ~45% memory reduction, and so on until we reach ~40% faster and ~50% less memory allocated at the end of the table.

    There can be variation in the reported numbers based on -randlayout, so this table is for 30 different values of -randlayout with a total n=150. (Even so, there is still some variation, so we probably should not read too much into small changes.) This is with GOAMD64=v3 on a VM that gcc reports is cascadelake.

    goos: linux
    goarch: amd64
    pkg: runtime
    cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
                               │ old-1bb1f2bf0c │      freegc-8ba7421-ps16       │
                               │     sec/op     │   sec/op      vs base          │
    Append64Bytes/count=1-4    31.09n ± 2%     31.69n ± 1%     +1.95% (n=150)
    Append64Bytes/count=2-4    73.31n ± 1%     70.27n ± 0%     -4.15% (n=150)
    Append64Bytes/count=3-4    142.7n ± 1%     124.6n ± 1%     -12.68% (n=150)
    Append64Bytes/count=4-4    149.6n ± 1%     127.7n ± 0%     -14.64% (n=150)
    Append64Bytes/count=5-4    277.1n ± 1%     213.6n ± 0%     -22.90% (n=150)
    Append64Bytes/count=6-4    280.7n ± 1%     216.5n ± 1%     -22.87% (n=150)
    Append64Bytes/count=10-4   544.3n ± 1%     386.6n ± 0%     -28.97% (n=150)
    Append64Bytes/count=20-4   1058.5n ± 1%    715.6n ± 1%     -32.39% (n=150)
    Append64Bytes/count=50-4   2.121µ ± 1%     1.404µ ± 1%     -33.83% (n=150)
    Append64Bytes/count=100-4  4.152µ ± 1%     2.736µ ± 1%     -34.11% (n=150)
    Append64Bytes/count=200-4  7.753µ ± 1%     4.882µ ± 1%     -37.03% (n=150)
    Append64Bytes/count=400-4  15.163µ ± 2%    9.273µ ± 1%     -38.84% (n=150)
    geomean                    601.8n          455.0n          -24.39%

                               │ old-1bb1f2bf0c │      freegc-8ba7421-ps16       │
                               │      B/op      │    B/op       vs base          │
    Append64Bytes/count=1-4    64.00 ± 0%      64.00 ± 0%      ~ (n=150)
    Append64Bytes/count=2-4    192.0 ± 0%      128.0 ± 0%      -33.33% (n=150)
    Append64Bytes/count=3-4    448.0 ± 0%      256.0 ± 0%      -42.86% (n=150)
    Append64Bytes/count=4-4    448.0 ± 0%      256.0 ± 0%      -42.86% (n=150)
    Append64Bytes/count=5-4    960.0 ± 0%      512.0 ± 0%      -46.67% (n=150)
    Append64Bytes/count=6-4    960.0 ± 0%      512.0 ± 0%      -46.67% (n=150)
    Append64Bytes/count=10-4   1.938Ki ± 0%    1.000Ki ± 0%    -48.39% (n=150)
    Append64Bytes/count=20-4   3.938Ki ± 0%    2.001Ki ± 0%    -49.18% (n=150)
    Append64Bytes/count=50-4   7.938Ki ± 0%    4.005Ki ± 0%    -49.54% (n=150)
    Append64Bytes/count=100-4  15.938Ki ± 0%   8.021Ki ± 0%    -49.67% (n=150)
    Append64Bytes/count=200-4  31.94Ki ± 0%    16.08Ki ± 0%    -49.64% (n=150)
    Append64Bytes/count=400-4  63.94Ki ± 0%    32.33Ki ± 0%    -49.44% (n=150)
    geomean                    1.991Ki         1.124Ki         -43.54%

                               │ old-1bb1f2bf0c │      freegc-8ba7421-ps16       │
                               │   allocs/op    │  allocs/op    vs base          │
    Append64Bytes/count=1-4    1.000 ± 0%      1.000 ± 0%      ~ (n=150)
    Append64Bytes/count=2-4    2.000 ± 0%      1.000 ± 0%      -50.00% (n=150)
    Append64Bytes/count=3-4    3.000 ± 0%      1.000 ± 0%      -66.67% (n=150)
    Append64Bytes/count=4-4    3.000 ± 0%      1.000 ± 0%      -66.67% (n=150)
    Append64Bytes/count=5-4    4.000 ± 0%      1.000 ± 0%      -75.00% (n=150)
    Append64Bytes/count=6-4    4.000 ± 0%      1.000 ± 0%      -75.00% (n=150)
    Append64Bytes/count=10-4   5.000 ± 0%      1.000 ± 0%      -80.00% (n=150)
    Append64Bytes/count=20-4   6.000 ± 0%      1.000 ± 0%      -83.33% (n=150)
    Append64Bytes/count=50-4   7.000 ± 0%      1.000 ± 0%      -85.71% (n=150)
    Append64Bytes/count=100-4  8.000 ± 0%      1.000 ± 0%      -87.50% (n=150)
    Append64Bytes/count=200-4  9.000 ± 0%      1.000 ± 0%      -88.89% (n=150)
    Append64Bytes/count=400-4  10.000 ± 0%     1.000 ± 0%      -90.00% (n=150)
    geomean                    4.331           1.000           -76.91%

    The second benchmark is similar, but instead uses an 8-byte integer for the slice element. The first 4 appends in the loop never call into the runtime thanks to the excellent CL 664299 introduced by Keith in Go 1.25 that allows some <= 32 byte dynamically-sized slices to be on the stack, so this CL is neutral for <= 32 bytes. Once the 5th append occurs at count=5, a grow happens via the runtime and heap allocates as normal, but freegc does not yet have anything to free, so we see a small ~1.4ns penalty reported there. But once the second growth happens, the older heap memory is now automatically freed by freegc, so we start to see some benefit in memory reductions and speed improvements, starting at a tiny speed improvement (close to a wash, or maybe noise) by the second growth before count=10, and building up to ~2x faster with ~68% fewer allocated bytes reported.

    goos: linux
    goarch: amd64
    pkg: runtime
    cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
                            │ old-1bb1f2bf0c │      freegc-8ba7421-ps16       │
                            │     sec/op     │   sec/op      vs base          │
    AppendInt/count=1-4     2.978n ± 0%     2.969n ± 0%     -0.30% (p=0.000 n=150)
    AppendInt/count=4-4     4.292n ± 3%     4.163n ± 3%     ~ (p=0.528 n=150)
    AppendInt/count=5-4     33.50n ± 0%     34.93n ± 0%     +4.25% (p=0.000 n=150)
    AppendInt/count=10-4    76.21n ± 1%     75.67n ± 0%     -0.72% (p=0.000 n=150)
    AppendInt/count=20-4    150.6n ± 1%     133.0n ± 0%     -11.65% (n=150)
    AppendInt/count=50-4    284.1n ± 1%     225.6n ± 0%     -20.59% (n=150)
    AppendInt/count=100-4   544.2n ± 1%     392.4n ± 1%     -27.89% (n=150)
    AppendInt/count=200-4   1051.5n ± 1%    702.3n ± 0%     -33.21% (n=150)
    AppendInt/count=400-4   2.041µ ± 1%     1.312µ ± 1%     -35.70% (n=150)
    AppendInt/count=1000-4  5.224µ ± 2%     2.851µ ± 1%     -45.43% (n=150)
    AppendInt/count=2000-4  11.770µ ± 1%    6.010µ ± 1%     -48.94% (n=150)
    AppendInt/count=3000-4  17.747µ ± 2%    8.264µ ± 1%     -53.44% (n=150)
    geomean                 331.8n          246.4n          -25.72%

                            │ old-1bb1f2bf0c │      freegc-8ba7421-ps16       │
                            │      B/op      │    B/op       vs base          │
    AppendInt/count=1-4     0.000 ± 0%      0.000 ± 0%      ~ (p=1.000 n=150)
    AppendInt/count=4-4     0.000 ± 0%      0.000 ± 0%      ~ (p=1.000 n=150)
    AppendInt/count=5-4     64.00 ± 0%      64.00 ± 0%      ~ (p=1.000 n=150)
    AppendInt/count=10-4    192.0 ± 0%      128.0 ± 0%      -33.33% (n=150)
    AppendInt/count=20-4    448.0 ± 0%      256.0 ± 0%      -42.86% (n=150)
    AppendInt/count=50-4    960.0 ± 0%      512.0 ± 0%      -46.67% (n=150)
    AppendInt/count=100-4   1.938Ki ± 0%    1.000Ki ± 0%    -48.39% (n=150)
    AppendInt/count=200-4   3.938Ki ± 0%    2.001Ki ± 0%    -49.18% (n=150)
    AppendInt/count=400-4   7.938Ki ± 0%    4.005Ki ± 0%    -49.54% (n=150)
    AppendInt/count=1000-4  24.56Ki ± 0%    10.05Ki ± 0%    -59.07% (n=150)
    AppendInt/count=2000-4  58.56Ki ± 0%    20.31Ki ± 0%    -65.32% (n=150)
    AppendInt/count=3000-4  85.19Ki ± 0%    27.30Ki ± 0%    -67.95% (n=150)
    geomean                 ²                               -42.81%

                            │ old-1bb1f2bf0c │      freegc-8ba7421-ps16       │
                            │   allocs/op    │  allocs/op    vs base          │
    AppendInt/count=1-4     0.000 ± 0%      0.000 ± 0%      ~ (p=1.000 n=150)
    AppendInt/count=4-4     0.000 ± 0%      0.000 ± 0%      ~ (p=1.000 n=150)
    AppendInt/count=5-4     1.000 ± 0%      1.000 ± 0%      ~ (p=1.000 n=150)
    AppendInt/count=10-4    2.000 ± 0%      1.000 ± 0%      -50.00% (n=150)
    AppendInt/count=20-4    3.000 ± 0%      1.000 ± 0%      -66.67% (n=150)
    AppendInt/count=50-4    4.000 ± 0%      1.000 ± 0%      -75.00% (n=150)
    AppendInt/count=100-4   5.000 ± 0%      1.000 ± 0%      -80.00% (n=150)
    AppendInt/count=200-4   6.000 ± 0%      1.000 ± 0%      -83.33% (n=150)
    AppendInt/count=400-4   7.000 ± 0%      1.000 ± 0%      -85.71% (n=150)
    AppendInt/count=1000-4  9.000 ± 0%      1.000 ± 0%      -88.89% (n=150)
    AppendInt/count=2000-4  11.000 ± 0%     1.000 ± 0%      -90.91% (n=150)
    AppendInt/count=3000-4  12.000 ± 0%     1.000 ± 0%      -91.67% (n=150)
    geomean                 ²                               -72.76% ²

    Of course, these are just microbenchmarks, but likely indicate there are some opportunities here.

    The immediately following CL 712422 tackles inlining and is able to get runtime.freegc working automatically with iterators such as used by slices.Collect, which becomes able to automatically free the intermediate memory from its repeated appends (which earlier in this work required a temporary hand edit to the slices package).

    For now, we only use the NoAlias version for element types without pointers while waiting on additional runtime support in CL 698515.

    Updates #74299

    Change-Id: I1b9d286aa97c170dcc2e203ec0f8ca72d84e8221
    Reviewed-on: https://go-review.googlesource.com/c/go/+/710015
    Reviewed-by: Keith Randall <khr@google.com>
    Auto-Submit: Keith Randall <khr@golang.org>
    Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
    Reviewed-by: Keith Randall <khr@golang.org>
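To make the aliasing condition in the commit above concrete, here is a small user-level sketch. It is not compiler or runtime code, and the names buildUnaliased and growWithAlias are made up for illustration. The first function has the shape the commit message describes as provably unaliased. The second keeps a second slice header pointing at the old backing array, which is the situation in which freeing the old storage right after growth would be unsafe.

```go
package main

import "fmt"

// buildUnaliased has the shape the commit message targets: no other
// variable ever refers to s's backing array while it is being built.
// (Hypothetical example function, not taken from the Go tree.)
func buildUnaliased(input []int) []int {
	var s []int
	for _, x := range input {
		s = append(s, x*2)
	}
	return s // the slice escapes only at the end
}

// growWithAlias keeps a second reference to the old backing array.
// After the append grows the slice, old still points at the original
// array, so that memory must stay valid.
func growWithAlias() (old, grown []int) {
	s := make([]int, 1, 1)
	s[0] = 1
	old = s              // alias of the current backing array
	grown = append(s, 2) // capacity exceeded: a new array is allocated
	grown[0] = 99        // does not affect old
	return old, grown
}

func main() {
	fmt.Println(buildUnaliased([]int{1, 2, 3})) // [2 4 6]
	old, grown := growWithAlias()
	fmt.Println(old, grown) // [1] [99 2]
}
```

In growWithAlias the old one-element array is still observable through old after the growth, which is why a conservative analysis has to treat such an append as aliased.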
2025-11-20  cmd/compile: stack allocate backing stores during append  khr@golang.org

    We can already stack allocate the backing store during append if the resulting backing store doesn't escape. See CL 664299.

    This CL enables us to often stack allocate the backing store during append *even if* the result escapes. Typically, for code like:

        func f(n int) []int {
            var r []int
            for i := range n {
                r = append(r, i)
            }
            return r
        }

    the backing store for r escapes, but only by returning it. Could we operate with r on the stack for most of its lifetime, and only move it to the heap at the return point?

    The current implementation of append will need to do an allocation each time it calls growslice. For this example that happens each time the length outgrows the current capacity (0, 1, 2, 4, 8, ...), that is, on the 1st, 2nd, 3rd, 5th, 9th, etc. append calls. The allocations done by all but the last growslice call will then immediately be garbage.

    We'd like to avoid doing some of those intermediate allocations if possible. We rewrite the above code by introducing a move2heap operation:

        func f(n int) []int {
            var r []int
            for i := range n {
                r = append(r, i)
            }
            r = move2heap(r)
            return r
        }

    The move2heap runtime function does the following:

        move2heap(r):
            If r is already backed by heap storage, return r.
            Otherwise, copy r to the heap and return the copy.

    Now we can treat the backing store of r allocated at the append site as not escaping. Previous stack allocation optimizations now apply, which can use a fixed-size stack-allocated backing store for r when appending.

    See the description in cmd/compile/internal/slice/slice.go for how we ensure that this optimization is safe.

    Change-Id: I81f36e58bade2241d07f67967d8d547fff5302b8
    Reviewed-on: https://go-review.googlesource.com/c/go/+/707755
    Reviewed-by: Keith Randall <khr@google.com>
    Reviewed-by: David Chase <drchase@google.com>
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
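The control flow of the move2heap idea can be imitated by hand in ordinary Go. The sketch below is only an analogy: collectViaStackBuf and the buffer size of 8 are assumptions for illustration, and user code cannot ask whether a slice is stack-backed the way the runtime can. Because r may be returned on one path, today's escape analysis may well place buf on the heap anyway, so this shows the logic ("copy out only if we never outgrew the fixed buffer"), not the allocation behavior the compiler achieves.

```go
package main

import "fmt"

// collectViaStackBuf is a hypothetical, hand-written analogue of the
// move2heap rewrite: append into a fixed-size buffer first and only
// copy to a fresh slice at the return point if the data is still in
// that buffer.
func collectViaStackBuf(n int) []int {
	var buf [8]int // fixed-size buffer (size chosen arbitrarily)
	r := buf[:0]
	for i := 0; i < n; i++ {
		// While len(r) < len(buf), append writes into buf and does
		// not allocate. Beyond that, append allocates a larger array.
		r = append(r, i)
	}
	if len(r) <= len(buf) {
		// r never outgrew buf, so it still points into buf.
		// Copy it out before returning (the role move2heap plays).
		out := make([]int, len(r))
		copy(out, r)
		return out
	}
	// append already moved the data to a separately allocated array.
	return r
}

func main() {
	fmt.Println(collectViaStackBuf(3))       // [0 1 2]
	fmt.Println(len(collectViaStackBuf(20))) // 20
}
```

The point of doing this in the compiler rather than by hand is that the source stays a plain append loop and the safety argument is checked mechanically.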
2025-07-28  all: omit unnecessary type conversions  Jes Cok

    Found by github.com/mdempsky/unconvert

    Change-Id: Ib78cceb718146509d96dbb6da87b27dbaeba1306
    GitHub-Last-Rev: dedf354811701ce8920c305b6f7aa78914a4171c
    GitHub-Pull-Request: golang/go#74771
    Reviewed-on: https://go-review.googlesource.com/c/go/+/690735
    Reviewed-by: Mark Freeman <mark@golang.org>
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
    Reviewed-by: Keith Randall <khr@google.com>
    Auto-Submit: Keith Randall <khr@golang.org>
    Reviewed-by: Keith Randall <khr@golang.org>
2024-09-17  runtime: move getcallerpc to internal/runtime/sys  Michael Pratt

    Moving these intrinsics to a base package enables other internal/runtime packages to use them.

    For #54766.

    Change-Id: I0b3eded3bb45af53e3eb5bab93e3792e6a8beb46
    Reviewed-on: https://go-review.googlesource.com/c/go/+/613260
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
    Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-07-23  runtime,internal: move runtime/internal/sys to internal/runtime/sys  David Chase

    Cleanup and friction reduction.

    For #65355.

    Change-Id: Ia14c9dc584a529a35b97801dd3e95b9acc99a511
    Reviewed-on: https://go-review.googlesource.com/c/go/+/600436
    Reviewed-by: Keith Randall <khr@google.com>
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
    Reviewed-by: Keith Randall <khr@golang.org>
2024-07-23  runtime,internal: move runtime/internal/math to internal/runtime/math  David Chase

    Cleanup and friction reduction.

    Updates #65355.

    Change-Id: I6c4fcd409d044c00d16561fe9ed2257877d73f5b
    Reviewed-on: https://go-review.googlesource.com/c/go/+/600435
    Reviewed-by: Keith Randall <khr@google.com>
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
    Reviewed-by: Keith Randall <khr@golang.org>
2024-05-29  all: document legacy //go:linkname for modules with ≥100 dependents  Russ Cox

    For #67401.

    Change-Id: I015408a3f437c1733d97160ef2fb5da6d4efcc5c
    Reviewed-on: https://go-review.googlesource.com/c/go/+/587598
    Reviewed-by: Cherry Mui <cherryyz@google.com>
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
    Auto-Submit: Russ Cox <rsc@golang.org>
2024-05-23  all: document legacy //go:linkname for modules with ≥200 dependents  Russ Cox

    Ignored these linknames which have not worked for a while:

        github.com/xtls/xray-core: context.newCancelCtx removed in CL 463999 (Feb 2023)
        github.com/u-root/u-root: funcPC removed in CL 513837 (Jul 2023)
        tinygo.org/x/drivers: net.useNetdev never existed

    For #67401.

    Change-Id: I9293f4ef197bb5552b431de8939fa94988a060ce
    Reviewed-on: https://go-review.googlesource.com/c/go/+/587576
    Auto-Submit: Russ Cox <rsc@golang.org>
    Reviewed-by: Cherry Mui <cherryyz@google.com>
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-23  all: document legacy //go:linkname for modules with ≥20,000 dependents  Russ Cox

    For #67401.

    Change-Id: Icc10ede72547d8020c0ba45e89d954822a4b2455
    Reviewed-on: https://go-review.googlesource.com/c/go/+/587218
    Auto-Submit: Russ Cox <rsc@golang.org>
    Reviewed-by: Cherry Mui <cherryyz@google.com>
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-22  all: document legacy //go:linkname for modules with ≥50,000 dependents  Russ Cox

    Note that this depends on the revert of CL 581395 to move zeroVal back.

    For #67401.

    Change-Id: I507c27c2404ad1348aabf1ffa3740e6b1957495b
    Reviewed-on: https://go-review.googlesource.com/c/go/+/587217
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
    Auto-Submit: Russ Cox <rsc@golang.org>
    Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-03-04  runtime: use .Pointers() instead of manual checking  Pouriya

    Change-Id: Ib78c1513616089f4942297cd17212b1b11871fd5
    GitHub-Last-Rev: f97fe5b5bffffe25dc31de7964588640cb70ec41
    GitHub-Pull-Request: golang/go#65819
    Reviewed-on: https://go-review.googlesource.com/c/go/+/565515
    Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
    Auto-Submit: Keith Randall <khr@golang.org>
    Reviewed-by: Michael Knyszek <mknyszek@google.com>
    Reviewed-by: Keith Randall <khr@google.com>
    Reviewed-by: Keith Randall <khr@golang.org>
2024-02-19  strings: make use of sizeclasses in (*Builder).Grow  Mateusz Poliwczak

    Fixes #64833

    Change-Id: Ice3f5dfab65f5525bc7a6f57ddeaabda8d64dfa3
    GitHub-Last-Rev: 38f1d6c19d8ec29ae5645ce677839a301f798df3
    GitHub-Pull-Request: golang/go#64835
    Reviewed-on: https://go-review.googlesource.com/c/go/+/552135
    Reviewed-by: Keith Randall <khr@google.com>
    Reviewed-by: Cherry Mui <cherryyz@google.com>
    Reviewed-by: Keith Randall <khr@golang.org>
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-16  runtime: optimize bulkBarrierPreWrite with allocheaders  Michael Anthony Knyszek

    Currently bulkBarrierPreWrite follows a fairly slow path wherein it calls typePointersOf, which ends up calling into fastForward. This does some fairly heavy computation to move the iterator forward without any assumptions about where it lands at all. It needs to be completely general to support splitting at arbitrary boundaries, for example for scanning oblets.

    This means that copying objects during the GC mark phase is fairly expensive, and is a regression from before allocheaders.

    However, in almost all cases bulkBarrierPreWrite and bulkBarrierPreWriteSrcOnly have perfect type information. We can do a lot better in these cases because we're starting on a type-size boundary, which is exactly what the iterator is built around.

    This change adds the typePointersOfType method which produces a typePointers iterator from a pointer and a type. This change significantly improves the performance of these bulk write barriers, eliminating some performance regressions that were noticed on the perf dashboard.

    There are still just a couple cases where we have to use the more general typePointersOf calls, but they're fairly rare; most bulk barriers have perfect type information.

    This change is tested by the GCInfo tests in the runtime and the GCBits tests in the reflect package via an additional check in getgcmask.

    Results for tile38 before and after allocheaders. There was previously a regression in the p90, now it's gone. Also, the overall win has been boosted slightly.

    tile38 $ benchstat noallocheaders.results allocheaders.results
    name             old time/op            new time/op            delta
    Tile38QueryLoad  481µs ± 1%             468µs ± 1%             -2.71%  (p=0.000 n=10+10)

    name             old average-RSS-bytes  new average-RSS-bytes  delta
    Tile38QueryLoad  6.32GB ± 1%            6.23GB ± 0%            -1.38%  (p=0.000 n=9+8)

    name             old peak-RSS-bytes     new peak-RSS-bytes     delta
    Tile38QueryLoad  6.49GB ± 1%            6.40GB ± 1%            -1.38%  (p=0.002 n=10+10)

    name             old peak-VM-bytes      new peak-VM-bytes      delta
    Tile38QueryLoad  7.72GB ± 1%            7.64GB ± 1%            -1.07%  (p=0.007 n=10+10)

    name             old p50-latency-ns     new p50-latency-ns     delta
    Tile38QueryLoad  212k ± 1%              205k ± 0%              -3.02%  (p=0.000 n=10+9)

    name             old p90-latency-ns     new p90-latency-ns     delta
    Tile38QueryLoad  622k ± 1%              616k ± 1%              -1.03%  (p=0.005 n=10+10)

    name             old p99-latency-ns     new p99-latency-ns     delta
    Tile38QueryLoad  4.55M ± 2%             4.39M ± 2%             -3.51%  (p=0.000 n=10+10)

    name             old ops/s              new ops/s              delta
    Tile38QueryLoad  12.5k ± 1%             12.8k ± 1%             +2.78%  (p=0.000 n=10+10)

    Change-Id: I0a48f848eae8777d0fd6769c3a1fe449f8d9d0a6
    Reviewed-on: https://go-review.googlesource.com/c/go/+/542219
    Reviewed-by: Cherry Mui <cherryyz@google.com>
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-09  runtime: implement experiment to replace heap bitmap with alloc headers  Michael Anthony Knyszek

    This change replaces the 1-bit-per-word heap bitmap for most size classes with allocation headers for objects that contain pointers. The header consists of a single pointer to a type. All allocations with headers are treated as implicitly containing one or more instances of the type in the header.

    As the name implies, headers are usually stored as the first word of an object. There are two additional exceptions to where headers are stored and how they're used.

    Objects smaller than 512 bytes do not have headers. Instead, a heap bitmap is reserved at the end of spans for objects of this size. A full word of overhead is too much for these small objects. The bitmap is of the same format as the old bitmap, minus the noMorePtrs bits which are unnecessary. All the objects <512 bytes have a bitmap less than a pointer-word in size, and that was the granularity at which noMorePtrs could stop scanning early anyway.

    Objects that are larger than 32 KiB (which have their own span) have their headers stored directly in the span, to allow power-of-two-sized allocations to not spill over into an extra page.

    The full implementation is behind GOEXPERIMENT=allocheaders.

    The purpose of this change is performance. First and foremost, with headers we no longer have to unroll pointer/scalar data at allocation time for most size classes. Small size classes still need some unrolling, but their bitmaps are small so we can optimize that case fairly well. Larger objects effectively have their pointer/scalar data unrolled on-demand from type data, which is much more compactly represented and results in less TLB pressure. Furthermore, since the headers are usually right next to the object and where we're about to start scanning, we get an additional temporal locality benefit in the data cache when looking up type metadata.

    The pointer/scalar data is now effectively unrolled on-demand, but it's also simpler to unroll than before; that unrolled data is never written anywhere, and for arrays we get the benefit of retreading the same data per element, as opposed to looking it up from scratch for each pointer-word of bitmap.

    Lastly, because we no longer have a heap bitmap that spans the entire heap, there's a flat 1.5% memory use reduction. This is balanced slightly by some objects possibly being bumped up a size class, but most objects are not tightly optimized to size class sizes so there's some memory to spare, making the header basically free in those cases.

    See the follow-up CL which turns on this experiment by default for benchmark results. (CL 538217.)

    Change-Id: I4c9034ee200650d06d8bdecd579d5f7c1bbf1fc5
    Reviewed-on: https://go-review.googlesource.com/c/go/+/437955
    Reviewed-by: Cherry Mui <cherryyz@google.com>
    Reviewed-by: Keith Randall <khr@golang.org>
    LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-03  cmd/compile,runtime: remove runtime.mulUintptr  qiulaidongfeng

    For #48798

    Change-Id: I3e928d3921cfd5a7bf35b23d0ae6442aa6d2d482
    GitHub-Last-Rev: b101a8a54f2cc9ea917f879a545f30c702508743
    GitHub-Pull-Request: golang/go#63349
    Reviewed-on: https://go-review.googlesource.com/c/go/+/532355
    TryBot-Result: Gopher Robot <gobot@golang.org>
    Commit-Queue: Keith Randall <khr@golang.org>
    Reviewed-by: Keith Randall <khr@google.com>
    Run-TryBot: Martin Möhrmann <moehrmann@google.com>
    Reviewed-by: Keith Randall <khr@golang.org>
    Auto-Submit: Keith Randall <khr@golang.org>
    Reviewed-by: Martin Möhrmann <moehrmann@google.com>
2023-09-04  runtime: introduce nextslicecap  Egon Elbre

    This allows reusing the slice cap computation across specialized growslice funcs.

    Updates #49480

    Change-Id: Ie075d9c3075659ea14c11d51a9cd4ed46aa0e961
    Reviewed-on: https://go-review.googlesource.com/c/go/+/495876
    Reviewed-by: Keith Randall <khr@google.com>
    Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    Run-TryBot: Egon Elbre <egonelbre@gmail.com>
    Reviewed-by: Keith Randall <khr@golang.org>
    TryBot-Result: Gopher Robot <gobot@golang.org>
    Auto-Submit: Ian Lance Taylor <iant@golang.org>
2023-09-04  runtime: optimize growslice  Egon Elbre

    This is a tiny optimization for growslice, which is probably too small to measure easily.

    Move the for loop to avoid multiple checks inside the loop. Also, use >> 2 instead of /4, which generates fewer instructions.

    Change-Id: I9ab09bdccb56f98ab22073f23d9e102c252238c7
    Reviewed-on: https://go-review.googlesource.com/c/go/+/493795
    Reviewed-by: Keith Randall <khr@golang.org>
    Reviewed-by: Matthew Dempsky <mdempsky@google.com>
    TryBot-Result: Gopher Robot <gobot@golang.org>
    Run-TryBot: Egon Elbre <egonelbre@gmail.com>
    Auto-Submit: Ian Lance Taylor <iant@golang.org>
    Reviewed-by: Keith Randall <khr@google.com>
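For readers unfamiliar with the capacity computation these two commits touch, the following is an illustrative approximation written from memory of its general shape. The function name sketchNextCap, the threshold of 256, and the 3*threshold term are assumptions for the purpose of the example and may not match the runtime at any given revision. The real growslice also rounds the resulting size up to an allocator size class, which this sketch omits.

```go
package main

import "fmt"

// sketchNextCap approximates how a new capacity could be chosen when a
// slice with capacity oldCap must hold newLen elements. Hypothetical
// helper for illustration only.
func sketchNextCap(newLen, oldCap int) int {
	doubleCap := oldCap + oldCap
	if newLen > doubleCap {
		// Doubling is not enough: use exactly what was asked for.
		return newLen
	}
	const threshold = 256 // assumed cutoff between doubling and slower growth
	if oldCap < threshold {
		return doubleCap
	}
	newCap := oldCap
	for {
		// Grow by about 1.25x plus a constant. The shift replaces a
		// division by 4, as mentioned in the commit message.
		newCap += (newCap + 3*threshold) >> 2
		// The unsigned comparison also ends the loop on overflow.
		if uint(newCap) >= uint(newLen) {
			break
		}
	}
	if newCap <= 0 {
		return newLen // overflow guard
	}
	return newCap
}

func main() {
	fmt.Println(sketchNextCap(1, 0), sketchNextCap(3, 2), sketchNextCap(257, 256)) // 1 4 512
}
```

Keeping this computation in one helper is what lets specialized growslice variants share it, which is the stated motivation of the nextslicecap commit.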
2023-05-05  internal/abi: refactor (basic) type struct into one definition  David Chase

    This touches a lot of files, which is bad, but it is also good, since there's N copies of this information commoned into 1.

    The new files in internal/abi are copied from the end of the stack; ultimately this will all end up being used.

    Change-Id: Ia252c0055aaa72ca569411ef9f9e96e3d610889e
    Reviewed-on: https://go-review.googlesource.com/c/go/+/462995
    TryBot-Result: Gopher Robot <gobot@golang.org>
    Reviewed-by: Carlos Amedee <carlos@golang.org>
    Run-TryBot: David Chase <drchase@google.com>
    Reviewed-by: Keith Randall <khr@golang.org>
2023-02-27  bytes, strings: avoid unnecessary zero initialization  Joe Tsai

    Add bytealg.MakeNoZero that specially allocates a []byte without zeroing it. It assumes the caller will populate every byte. From within the bytes and strings packages, we can use bytealg.MakeNoZero in a way where our logic ensures that the entire slice is overwritten such that uninitialized bytes are never leaked to the end user.

    We use bytealg.MakeNoZero from within the following functions:

    * bytes.Join
    * bytes.Repeat
    * bytes.ToUpper
    * bytes.ToLower
    * strings.Builder.Grow

    The optimization in strings.Builder transitively benefits the following:

    * strings.Join
    * strings.Map
    * strings.Repeat
    * strings.ToUpper
    * strings.ToLower
    * strings.ToValidUTF8
    * strings.Replace
    * any user logic that depends on strings.Builder

    This optimization is especially notable on large buffers that do not fit in the CPU cache, such that the cost of runtime.memclr and runtime.memmove are non-trivial since they are both limited by the relatively slow speed of physical RAM.

    Performance:

    RepeatLarge/256/1            66.0ns ± 3%  64.5ns ± 1%  ~        (p=0.095 n=5+5)
    RepeatLarge/256/16           55.4ns ± 5%  53.1ns ± 3%  -4.17%   (p=0.016 n=5+5)
    RepeatLarge/512/1            95.5ns ± 7%  87.1ns ± 2%  -8.78%   (p=0.008 n=5+5)
    RepeatLarge/512/16           84.4ns ± 9%  76.2ns ± 5%  -9.73%   (p=0.016 n=5+5)
    RepeatLarge/1024/1           161ns ± 4%   144ns ± 7%   -10.45%  (p=0.016 n=5+5)
    RepeatLarge/1024/16          148ns ± 3%   141ns ± 5%   ~        (p=0.095 n=5+5)
    RepeatLarge/2048/1           296ns ± 7%   288ns ± 5%   ~        (p=0.841 n=5+5)
    RepeatLarge/2048/16          298ns ± 8%   281ns ± 5%   ~        (p=0.151 n=5+5)
    RepeatLarge/4096/1           593ns ± 8%   539ns ± 8%   -8.99%   (p=0.032 n=5+5)
    RepeatLarge/4096/16          568ns ±12%   526ns ± 7%   ~        (p=0.056 n=5+5)
    RepeatLarge/8192/1           1.15µs ± 8%  1.08µs ±12%  ~        (p=0.095 n=5+5)
    RepeatLarge/8192/16          1.12µs ± 4%  1.07µs ± 7%  ~        (p=0.310 n=5+5)
    RepeatLarge/8192/4097        1.77ns ± 1%  1.76ns ± 2%  ~        (p=0.310 n=5+5)
    RepeatLarge/16384/1          2.06µs ± 7%  1.94µs ± 5%  ~        (p=0.222 n=5+5)
    RepeatLarge/16384/16         2.02µs ± 4%  1.92µs ± 6%  ~        (p=0.095 n=5+5)
    RepeatLarge/16384/4097       1.50µs ±15%  1.44µs ±11%  ~        (p=0.802 n=5+5)
    RepeatLarge/32768/1          3.90µs ± 8%  3.65µs ±11%  ~        (p=0.151 n=5+5)
    RepeatLarge/32768/16         3.92µs ±14%  3.68µs ±12%  ~        (p=0.222 n=5+5)
    RepeatLarge/32768/4097       3.71µs ± 5%  3.43µs ± 4%  -7.54%   (p=0.032 n=5+5)
    RepeatLarge/65536/1          7.47µs ± 8%  6.88µs ± 9%  ~        (p=0.056 n=5+5)
    RepeatLarge/65536/16         7.29µs ± 4%  6.74µs ± 6%  -7.60%   (p=0.016 n=5+5)
    RepeatLarge/65536/4097       7.90µs ±11%  6.34µs ± 5%  -19.81%  (p=0.008 n=5+5)
    RepeatLarge/131072/1         17.0µs ±18%  14.1µs ± 6%  -17.32%  (p=0.008 n=5+5)
    RepeatLarge/131072/16        15.2µs ± 2%  16.2µs ±17%  ~        (p=0.151 n=5+5)
    RepeatLarge/131072/4097      15.7µs ± 6%  14.8µs ±11%  ~        (p=0.095 n=5+5)
    RepeatLarge/262144/1         30.4µs ± 5%  31.4µs ±13%  ~        (p=0.548 n=5+5)
    RepeatLarge/262144/16        30.1µs ± 4%  30.7µs ±11%  ~        (p=1.000 n=5+5)
    RepeatLarge/262144/4097      31.2µs ± 7%  32.7µs ±13%  ~        (p=0.310 n=5+5)
    RepeatLarge/524288/1         67.5µs ± 9%  63.7µs ± 3%  ~        (p=0.095 n=5+5)
    RepeatLarge/524288/16        67.2µs ± 5%  62.9µs ± 6%  ~        (p=0.151 n=5+5)
    RepeatLarge/524288/4097      65.5µs ± 4%  65.2µs ±18%  ~        (p=0.548 n=5+5)
    RepeatLarge/1048576/1        141µs ± 6%   137µs ±14%   ~        (p=0.421 n=5+5)
    RepeatLarge/1048576/16       140µs ± 2%   134µs ±11%   ~        (p=0.222 n=5+5)
    RepeatLarge/1048576/4097     141µs ± 3%   134µs ±10%   ~        (p=0.151 n=5+5)
    RepeatLarge/2097152/1        258µs ± 2%   271µs ±10%   ~        (p=0.222 n=5+5)
    RepeatLarge/2097152/16       263µs ± 6%   273µs ± 9%   ~        (p=0.151 n=5+5)
    RepeatLarge/2097152/4097     270µs ± 2%   277µs ± 6%   ~        (p=0.690 n=5+5)
    RepeatLarge/4194304/1        684µs ± 3%   467µs ± 6%   -31.69%  (p=0.008 n=5+5)
    RepeatLarge/4194304/16       682µs ± 1%   471µs ± 7%   -30.91%  (p=0.008 n=5+5)
    RepeatLarge/4194304/4097     685µs ± 2%   465µs ±20%   -32.12%  (p=0.008 n=5+5)
    RepeatLarge/8388608/1        1.50ms ± 1%  1.16ms ± 8%  -22.63%  (p=0.008 n=5+5)
    RepeatLarge/8388608/16       1.50ms ± 2%  1.22ms ±17%  -18.49%  (p=0.008 n=5+5)
    RepeatLarge/8388608/4097     1.51ms ± 7%  1.33ms ±11%  -11.56%  (p=0.008 n=5+5)
    RepeatLarge/16777216/1       3.48ms ± 4%  2.66ms ±13%  -23.76%  (p=0.008 n=5+5)
    RepeatLarge/16777216/16      3.37ms ± 3%  2.57ms ±13%  -23.72%  (p=0.008 n=5+5)
    RepeatLarge/16777216/4097    3.38ms ± 9%  2.50ms ±11%  -26.16%  (p=0.008 n=5+5)
    RepeatLarge/33554432/1       7.74ms ± 1%  4.70ms ±19%  -39.31%  (p=0.016 n=4+5)
    RepeatLarge/33554432/16      7.90ms ± 4%  4.78ms ± 9%  -39.50%  (p=0.008 n=5+5)
    RepeatLarge/33554432/4097    7.80ms ± 2%  4.86ms ±11%  -37.60%  (p=0.008 n=5+5)
    RepeatLarge/67108864/1       16.4ms ± 3%  9.7ms ±15%   -41.29%  (p=0.008 n=5+5)
    RepeatLarge/67108864/16      16.5ms ± 1%  9.9ms ±15%   -39.83%  (p=0.008 n=5+5)
    RepeatLarge/67108864/4097    16.5ms ± 1%  11.0ms ±18%  -32.95%  (p=0.008 n=5+5)
    RepeatLarge/134217728/1      35.2ms ±12%  19.2ms ±10%  -45.58%  (p=0.008 n=5+5)
    RepeatLarge/134217728/16     34.6ms ± 6%  19.3ms ± 7%  -44.07%  (p=0.008 n=5+5)
    RepeatLarge/134217728/4097   33.2ms ± 2%  19.3ms ±14%  -41.79%  (p=0.008 n=5+5)
    RepeatLarge/268435456/1      70.9ms ± 2%  36.2ms ± 5%  -48.87%  (p=0.008 n=5+5)
    RepeatLarge/268435456/16     77.4ms ± 7%  36.1ms ± 8%  -53.33%  (p=0.008 n=5+5)
    RepeatLarge/268435456/4097   75.8ms ± 4%  37.0ms ± 4%  -51.15%  (p=0.008 n=5+5)
    RepeatLarge/536870912/1      163ms ±14%   77ms ± 9%    -52.94%  (p=0.008 n=5+5)
    RepeatLarge/536870912/16     156ms ± 4%   76ms ± 6%    -51.42%  (p=0.008 n=5+5)
    RepeatLarge/536870912/4097   151ms ± 2%   76ms ± 6%    -49.64%  (p=0.008 n=5+5)
    RepeatLarge/1073741824/1     293ms ± 5%   149ms ± 8%   -49.18%  (p=0.008 n=5+5)
    RepeatLarge/1073741824/16    308ms ± 9%   150ms ± 8%   -51.19%  (p=0.008 n=5+5)
    RepeatLarge/1073741824/4097  299ms ± 5%   151ms ± 6%   -49.51%  (p=0.008 n=5+5)

    Updates #57153

    Change-Id: I024553b7e676d6da6408278109ac1fa8def0a802
    Reviewed-on: https://go-review.googlesource.com/c/go/+/456336
    Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
    Reviewed-by: Ian Lance Taylor <iant@google.com>
    Run-TryBot: Joseph Tsai <joetsai@digital-static.net>
    TryBot-Result: Gopher Robot <gobot@golang.org>
    Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
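bytealg.MakeNoZero is internal to the standard library, so user code cannot call it directly. The public route to the same benefit named in the commit above is strings.Builder.Grow followed by writes that fill the reserved space. The sketch below shows that pattern; repeatWithBuilder is a hypothetical helper written for this example (strings.Repeat already exists and should be preferred in real code).

```go
package main

import (
	"fmt"
	"strings"
)

// repeatWithBuilder concatenates n copies of s, reserving the whole
// result up front so the builder allocates once. Hypothetical helper
// for illustration.
func repeatWithBuilder(s string, n int) string {
	if n <= 0 || len(s) == 0 {
		return ""
	}
	var b strings.Builder
	b.Grow(len(s) * n) // one allocation sized for the final string
	for i := 0; i < n; i++ {
		b.WriteString(s) // fills the reserved bytes; no further growth
	}
	return b.String()
}

func main() {
	fmt.Println(repeatWithBuilder("ab", 3)) // ababab
}
```

Whether the reserved bytes are zeroed first is an implementation detail of the standard library. The pattern is the same either way: reserve once, then overwrite every byte.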
2022-10-18  runtime: replace all uses of CtzXX with TrailingZerosXX  Youlin Feng

    Replace all uses of Ctz64/32/8 with TrailingZeros64/32/8, because they are the same and thus duplicated. Also renamed CtzXX functions in 386 assembly code.

    Change-Id: I19290204858083750f4be589bb0923393950ae6d
    Reviewed-on: https://go-review.googlesource.com/c/go/+/438935
    Reviewed-by: Keith Randall <khr@golang.org>
    Reviewed-by: Bryan Mills <bcmills@google.com>
    Auto-Submit: Keith Randall <khr@golang.org>
    TryBot-Result: Gopher Robot <gobot@golang.org>
    Reviewed-by: Keith Randall <khr@google.com>
    Run-TryBot: Keith Randall <khr@golang.org>
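The runtime uses its own internal package for these intrinsics. The public counterpart is math/bits, whose TrailingZeros64 counts trailing zero bits in the same way. As a small illustration of the operation (setBits is a made-up helper, not runtime code), this lists the positions of the set bits in a word:

```go
package main

import (
	"fmt"
	"math/bits"
)

// setBits returns the indexes of the 1 bits in x, lowest first.
// Hypothetical helper for illustration.
func setBits(x uint64) []int {
	var idx []int
	for x != 0 {
		idx = append(idx, bits.TrailingZeros64(x)) // index of lowest set bit
		x &= x - 1                                 // clear that bit
	}
	return idx
}

func main() {
	fmt.Println(setBits(0b101000)) // [3 5]
}
```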
2022-10-15reflect: add Value.GrowJoe Tsai
The Grow method is like the proposed slices.Grow function in that it ensures that the slice has enough capacity to append n elements without allocating. The implementation of Grow is a thin wrapper over runtime.growslice. This also changes Append and AppendSlice to use growslice under the hood. Fixes #48000 Change-Id: I992a58584a2ff1448c1c2bc0877fe76073609111 Reviewed-on: https://go-review.googlesource.com/c/go/+/389635 Run-TryBot: Joseph Tsai <joetsai@digital-static.net> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
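As an illustration of the Grow method described above, here is a minimal sketch of reserving capacity through reflection. The helper name growBy is hypothetical; reflect.Value.Grow itself requires a settable slice value, which is why the sketch goes through a pointer and Elem.

```go
package main

import (
	"fmt"
	"reflect"
)

// growBy is a hypothetical helper: it reserves room for n more elements
// in the slice that p points to, using reflect.Value.Grow.
func growBy(p any, n int) {
	v := reflect.ValueOf(p).Elem() // Elem yields a settable slice value
	v.Grow(n)
}

func main() {
	s := []int{1, 2, 3}
	growBy(&s, 10)
	// The length is unchanged; only the capacity is guaranteed to grow.
	fmt.Println(len(s), cap(s) >= len(s)+10)
}
```

After the call, appending up to 10 elements to s should not need another allocation.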
2022-09-01cmd/compile,runtime: redo growslice calling conventionKeith Randall
Instead of passing the original length and the new length, pass the new length and the length increment. Also use the new length in all the post-growslice calculations so that the original length is dead and does not need to be spilled/restored around the growslice.

old: growslice(typ, oldPtr, oldLen, oldCap, newLen) (newPtr, newLen, newCap)
new: growslice(oldPtr, newLen, oldCap, inc, typ) (newPtr, newLen, newCap)
where inc = # of elements added = newLen-oldLen

Also move the element type to the end of the call. This makes register allocation more efficient, as oldPtr and newPtr can often be in the same register (e.g. AX on amd64) and thus the phi takes no instructions. Makes the go binary 0.3% smaller. Change-Id: I7295a60227dbbeecec2bf039eeef2950a72df760 Reviewed-on: https://go-review.googlesource.com/c/go/+/418554 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2022-08-31cmd/compile: add support for unsafe.{String,StringData,SliceData}cuiweixie
For #53003 Change-Id: I13a761daca8b433b271a1feb711c103d9820772d Reviewed-on: https://go-review.googlesource.com/c/go/+/423774 Reviewed-by: Heschi Kreinick <heschi@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: hopehook <hopehook@golangcn.org> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-19runtime: add and use runtime/internal/sys.NotInHeapCuong Manh Le
Updates #46731 Change-Id: Ic2208c8bb639aa1e390be0d62e2bd799ecf20654 Reviewed-on: https://go-review.googlesource.com/c/go/+/421878 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2022-08-16runtime: redo heap bitmapKeith Randall
[this is a retry of CL 407035 + its revert CL 422395. The content is unchanged] Use just 1 bit per word to record the ptr/nonptr bitmap. Use word-sized operations to manipulate the bitmap, so we can operate on up to 64 ptr/nonptr bits at a time. Use a separate bitmap, one bit per word of the ptr/nonptr bitmap, to encode a no-more-pointers signal. Since we can check 64 ptr/nonptr bits at once, knowing the exact last pointer location is not necessary. As a follow-on CL, we should make the gcdata bitmap an array of uintptr instead of an array of byte, so we can load 64 bits of it at once. Similarly for the processing of gc programs. Change-Id: Ica5eb622f5b87e647be64f471d67b02732ef8be6 Reviewed-on: https://go-review.googlesource.com/c/go/+/422634 Reviewed-by: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org>
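The idea of one ptr/nonptr bit per word, processed 64 bits at a time, can be sketched outside the runtime. The ptrBitmap type and its methods below are hypothetical and are not the runtime's data structures; the sketch only shows how a word-sized operation finds every set bit without testing bits one by one.

```go
package main

import (
	"fmt"
	"math/bits"
)

// ptrBitmap is a hypothetical stand-in for a ptr/nonptr bitmap:
// bit i is set when word i of an object holds a pointer.
type ptrBitmap []uint64

// set marks word i as holding a pointer.
func (b ptrBitmap) set(i int) { b[i/64] |= 1 << (i % 64) }

// pointerWords returns the indices of all pointer words, consuming the
// bitmap one 64-bit chunk at a time.
func (b ptrBitmap) pointerWords() []int {
	var out []int
	for w, chunk := range b {
		for chunk != 0 {
			tz := bits.TrailingZeros64(chunk) // index of the lowest set bit
			out = append(out, w*64+tz)
			chunk &= chunk - 1 // clear that bit and continue
		}
	}
	return out
}

func main() {
	b := make(ptrBitmap, 2)
	for _, i := range []int{0, 3, 64, 100} {
		b.set(i)
	}
	fmt.Println(b.pointerWords()) // [0 3 64 100]
}
```

A chunk that is entirely zero is skipped with a single comparison, which is the kind of saving that word-sized operations make possible.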
2022-08-10runtime: fix gofmt errorKeith Randall
Introduced in https://go-review.googlesource.com/c/go/+/419755 Change-Id: I7ca353d495dd7e833e46b3eeb972eac38b3a7a24 Reviewed-on: https://go-review.googlesource.com/c/go/+/422474 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: xie cui <523516579@qq.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@google.com>
2022-08-09Revert "runtime: redo heap bitmap"Keith Randall
This reverts commit b589208c8cc6e08239868f47e12c1449cd797bac. Reason for revert: Bug somewhere in this code, causing wasm and maybe linux/386 to fail. Change-Id: I5e1e501d839584e0219271bb937e94348f83c11f Reviewed-on: https://go-review.googlesource.com/c/go/+/422395 Reviewed-by: Than McIntosh <thanm@google.com> Run-TryBot: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-08cmd/compile: avoid assignment conversion in append(a, b...)Keith Randall
There's no need for a and b to match types. The typechecker already ensured that a and b are both slices with the same base type, or a and b are (possibly named) []byte and string. The optimization to treat append(b, make([], ...)) as a zeroing slice extension doesn't fire when there's an OCONVNOP wrapping the make. Fixes #53888 Change-Id: Ied871ed0bbb8e4a4b35d280c71acbab8103691bc Reviewed-on: https://go-review.googlesource.com/c/go/+/418475 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org>
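The two source-level forms this message refers to look roughly like the sketch below. The type buf and the helper names are hypothetical; whether the compiler applies the zeroing-extension optimization to a given call is not observable from this code, only the result is.

```go
package main

import "fmt"

// buf is a hypothetical named byte-slice type.
type buf []byte

// extend grows b by n zero bytes using the append+make idiom.
func extend(b []byte, n int) []byte {
	return append(b, make([]byte, n)...)
}

// appendString appends a string to a named []byte type, the other
// append(a, b...) form the typechecker accepts.
func appendString(a buf, s string) buf {
	return append(a, s...)
}

func main() {
	b := extend([]byte("ab"), 3)
	fmt.Println(len(b), b) // 5 [97 98 0 0 0]

	n := appendString(buf("go"), "lang")
	fmt.Println(string(n)) // golang
}
```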
2022-08-08runtime: redo heap bitmapKeith Randall
Use just 1 bit per word to record the ptr/nonptr bitmap. Use word-sized operations to manipulate the bitmap, so we can operate on up to 64 ptr/nonptr bits at a time. Use a separate bitmap, one bit per word of the ptr/nonptr bitmap, to encode a no-more-pointers signal. Since we can check 64 ptr/nonptr bits at once, knowing the exact last pointer location is not necessary. This cleans up the bitmap implementation significantly, which will hopefully make it faster. TODO: measure As a follow-on CL, we should make the gcdata bitmap an array of uintptr instead of an array of byte, so we can load 64 bits of it at once. Similarly for the processing of gc programs. Change-Id: I18151b1876d9543599800dec51e2a1b19df97d49 Reviewed-on: https://go-review.googlesource.com/c/go/+/407035 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@google.com>
2022-08-08cmd/compile,runtime: panic when unsafe.Slice param is nil and > 0cuiweixie
Fixes #54092 Change-Id: Ib917922ed36ee5410e5515f812737203c44f46ae GitHub-Last-Rev: dfd0c3883cf8b10479d9c5b389baa1a04c52dd34 GitHub-Pull-Request: golang/go#54107 Reviewed-on: https://go-review.googlesource.com/c/go/+/419755 Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>
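The behavior named in the commit title can be observed from user code with a toolchain that includes this fix. The helper sliceOrPanic below is hypothetical; it wraps unsafe.Slice and reports whether the call panicked.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sliceOrPanic is a hypothetical wrapper: it calls unsafe.Slice and
// reports whether the call panicked instead of propagating the panic.
func sliceOrPanic(p *int, n int) (s []int, panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	return unsafe.Slice(p, n), false
}

func main() {
	x := 7
	s, bad := sliceOrPanic(&x, 1)
	fmt.Println(s, bad) // [7] false

	// A nil pointer combined with a non-zero length is a run-time panic.
	_, bad = sliceOrPanic(nil, 1)
	fmt.Println(bad) // true
}
```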
2022-05-11cmd/compile,runtime: open code unsafe.SliceCuong Manh Le
This avoids heavy runtime call overhead and gives the compiler a chance to optimize the bounds check. With this optimization, changing runtime/stack.go to use unsafe.Slice no longer negatively impacts stack copying performance:

name                   old time/op    new time/op    delta
StackCopyWithStkobj-8  16.3ms ± 6%    16.5ms ± 5%    ~  (p=0.382 n=8+8)

name                   old alloc/op   new alloc/op   delta
StackCopyWithStkobj-8  17.0B ± 0%     17.0B ± 0%     ~  (all equal)

name                   old allocs/op  new allocs/op  delta
StackCopyWithStkobj-8  1.00 ± 0%      1.00 ± 0%      ~  (all equal)

Fixes #48798 Change-Id: I731a9a4abd6dd6846f44eece7f86025b7bb1141b Reviewed-on: https://go-review.googlesource.com/c/go/+/362934 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2022-02-05runtime: change sys.PtrSize to goarch.PtrSize in commentsIan Lance Taylor
The code was updated, the comments were not. Change-Id: If387779f3abd5e8a1b487fe34c33dcf9ce5fa7ff Reviewed-on: https://go-review.googlesource.com/c/go/+/383495 Trust: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2021-11-02runtime, syscall: add calls to asan functionsfanzha02
Add explicit address sanitizer instrumentation to the runtime and syscall packages. The compiler does not instrument the runtime package. It does instrument the syscall package, but we need to add a couple of cases that it can't see. Following the implementation of the asan malloc runtime library, this patch also allocates extra memory as a redzone around the returned memory region and marks the redzone as unaddressable, so that overflows and underflows are detected. Updates #44853. Change-Id: I2753d1cc1296935a66bf521e31ce91e35fcdf798 Reviewed-on: https://go-review.googlesource.com/c/go/+/298614 Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> Trust: fannie zhang <Fannie.Zhang@arm.com>
2021-10-13unsafe: optimize Slice bounds checkingMatthew Dempsky
This reduces the number of branches to bounds check non-empty slices from 5 to 3. It does also increase the number of branches to handle empty slices from 1 to 3; but for non-panicking calls, they should all be predictable. Updates #48798. Change-Id: I3ffa66857096486f4dee417e1a66eb8fdf7a3777 Reviewed-on: https://go-review.googlesource.com/c/go/+/355490 Trust: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2021-10-13unsafe: allow unsafe.Slice up to end of address spaceMatthew Dempsky
Allow the user to construct slices that are larger than the Go heap as long as they don't overflow the address space. Updates #48798. Change-Id: I659c8334d04676e1f253b9c3cd499eab9b9f989a Reviewed-on: https://go-review.googlesource.com/c/go/+/355489 Trust: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2021-09-27runtime: make slice growth formula a bit smootherKeith Randall
Instead of growing 2x for < 1024 elements and 1.25x for >= 1024 elements, use a somewhat smoother formula for the growth factor. Start reducing the growth factor after 256 elements, but slowly.

starting cap    growth factor
256             2.0
512             1.63
1024            1.44
2048            1.35
4096            1.30

(Note that the real growth factor, both before and now, is somewhat larger because we round up to the next size class.) This CL also makes the growth monotonic (larger initial capacities make larger final capacities, which was not true before). See discussion at https://groups.google.com/g/golang-nuts/c/UaVlMQ8Nz3o 256 was chosen as the threshold to roughly match the total number of reallocations when appending to eventually make a very large slice. (We allocate smaller when appending to capacities [256,1024] and larger with capacities [1024,...]). Change-Id: I876df09fdc9ae911bb94e41cb62675229cb10512 Reviewed-on: https://go-review.googlesource.com/c/go/+/347917 Trust: Keith Randall <khr@golang.org> Trust: Martin Möhrmann <martin@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Martin Möhrmann <martin@golang.org>
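The commit message gives the growth factors but not the formula. The sketch below uses one rule that is consistent with the growth-factor figures quoted above, newcap = oldcap + (oldcap + 3*256)/4; treat it as an assumption rather than a copy of the runtime code. The function name nextCap is hypothetical, and the sketch ignores size-class rounding and requests larger than double the old capacity.

```go
package main

import "fmt"

// threshold is the capacity at which growth starts to slow down,
// matching the 256-element threshold named in the commit message.
const threshold = 256

// nextCap is a simplified, hypothetical sketch of the smoothed growth
// rule. It ignores size-class rounding and oversized append requests.
func nextCap(oldCap int) int {
	if oldCap < threshold {
		return oldCap * 2
	}
	// Blend from 2x growth toward 1.25x growth as oldCap gets larger.
	return oldCap + (oldCap+3*threshold)/4
}

func main() {
	for _, c := range []int{256, 512, 1024, 2048, 4096} {
		n := nextCap(c)
		fmt.Printf("%5d -> %5d (x%.2f)\n", c, n, float64(n)/float64(c))
	}
}
```

The printed factors agree with the figures in the commit message to within rounding, and they approach 1.25 as the starting capacity grows.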
2021-06-30[dev.typeparams] all: merge master (4711bf3) into dev.typeparamsMatthew Dempsky
Conflicts:

- src/cmd/compile/internal/walk/builtin.go
  On dev.typeparams, CL 330194 changed OCHECKNIL to not require manual SetTypecheck(1) anymore; while on master, CL 331070 got rid of the OCHECKNIL altogether by moving the check into the runtime support functions.
- src/internal/buildcfg/exp.go
  On master, CL 331109 refactored the logic for parsing the GOEXPERIMENT string, so that it could be more easily reused by cmd/go; while on dev.typeparams, several CLs tweaked the regabi experiment defaults.

Merge List:

+ 2021-06-30 4711bf30e5 doc/go1.17: linkify "language changes" in the runtime section
+ 2021-06-30 ed56ea73e8 path/filepath: deflake TestEvalSymlinksAboveRoot on darwin
+ 2021-06-30 c080d0323b cmd/dist: pass -Wno-unknown-warning-option in swig_callback_lto
+ 2021-06-30 7d0e9e6e74 image/gif: fix typo in the comment (io.ReadByte -> io.ByteReader)
+ 2021-06-30 0fa3265fe1 os: change example to avoid deprecated function
+ 2021-06-30 d19a53338f image: add Uniform.RGBA64At and Rectangle.RGBA64At
+ 2021-06-30 c45e800e0c crypto/x509: don't fail on optional auth key id fields
+ 2021-06-29 f9d50953b9 net: fix failure of TestCVE202133195
+ 2021-06-29 e294b8a49e doc/go1.17: fix typo "MacOS" -> "macOS"
+ 2021-06-29 3463852b76 math/big: fix typo of comment (`BytesScanner` to `ByteScanner`)
+ 2021-06-29 fd4b587da3 cmd/compile: suppress details error for invalid variadic argument type
+ 2021-06-29 e2e05af6e1 cmd/internal/obj/arm64: fix an encoding error of CMPW instruction
+ 2021-06-28 4bb0847b08 cmd/compile,runtime: change unsafe.Slice((*T)(nil), 0) to return []T(nil)
+ 2021-06-28 1519271a93 spec: change unsafe.Slice((*T)(nil), 0) to return []T(nil)
+ 2021-06-28 5385e2386b runtime/internal/atomic: drop Cas64 pointer indirection in comments
+ 2021-06-28 956c81bfe6 cmd/go: add GOEXPERIMENT to `go env` output
+ 2021-06-28 a1d27269d6 cmd/go: prep for 'go env' refactoring
+ 2021-06-28 901510ed4e cmd/link/internal/ld: skip the windows ASLR test when CGO_ENABLED=0
+ 2021-06-28 361159c055 cmd/cgo: fix 'see gmp.go' to 'see doc.go'
+ 2021-06-27 c95464f0ea internal/buildcfg: refactor GOEXPERIMENT parsing code somewhat
+ 2021-06-25 ed01ceaf48 runtime/race: use race build tag on syso_test.go
+ 2021-06-25 d1916e5e84 go/types: in TestCheck/issues.src, import regexp/syntax instead of cmd/compile/internal/syntax
+ 2021-06-25 5160896c69 go/types: in TestStdlib, import from source instead of export data
+ 2021-06-25 d01bc571f7 runtime: make ncgocall a global counter

Change-Id: I1ce4a3b3ff7c824c67ad66dd27d9d5f1d25c0023
2021-06-28cmd/compile,runtime: change unsafe.Slice((*T)(nil), 0) to return []T(nil)Matthew Dempsky
This CL removes the unconditional OCHECKNIL check added in walkUnsafeSlice by instead passing it as a pointer to runtime.unsafeslice, and hiding the check behind a `len == 0` check. While here, this CL also implements checkptr functionality for unsafe.Slice and disallows use of unsafe.Slice with //go:notinheap types. Updates #46742. Change-Id: I743a445ac124304a4d7322a7fe089c4a21b9a655 Reviewed-on: https://go-review.googlesource.com/c/go/+/331070 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Keith Randall <khr@golang.org>
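From the caller's side, the result of this change is that a nil pointer with a zero length yields a nil slice rather than a panic. A minimal sketch, with the hypothetical helper name emptyFromNil:

```go
package main

import (
	"fmt"
	"unsafe"
)

// emptyFromNil is a hypothetical helper that builds a zero-length slice
// from a nil pointer; per this CL the result is []int(nil).
func emptyFromNil() []int {
	return unsafe.Slice((*int)(nil), 0)
}

func main() {
	s := emptyFromNil()
	fmt.Println(s == nil, len(s), cap(s)) // true 0 0
}
```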
2021-06-17[dev.typeparams] runtime: fix import sort order [generated]Michael Anthony Knyszek
[git-generate] cd src/runtime goimports -w *.go Change-Id: I1387af0f2fd1a213dc2f4c122e83a8db0fcb15f0 Reviewed-on: https://go-review.googlesource.com/c/go/+/329189 Trust: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Go Bot <gobot@golang.org>
2021-06-17[dev.typeparams] runtime: replace uses of runtime/internal/sys.PtrSize with ↵Michael Anthony Knyszek
internal/goarch.PtrSize [generated] [git-generate] cd src/runtime/internal/math gofmt -w -r "sys.PtrSize -> goarch.PtrSize" . goimports -w *.go cd ../.. gofmt -w -r "sys.PtrSize -> goarch.PtrSize" . goimports -w *.go Change-Id: I43491cdd54d2e06d4d04152b3d213851b7d6d423 Reviewed-on: https://go-review.googlesource.com/c/go/+/328337 Trust: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2021-05-21[dev.typeparams] runtime: replace funcPC with internal/abi.FuncPCABIInternalCherry Mui
At this point all funcPC references are ABIInternal functions. Replace with the intrinsics. Change-Id: I3ba7e485c83017408749b53f92877d3727a75e27 Reviewed-on: https://go-review.googlesource.com/c/go/+/321954 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2021-05-02cmd/compile: implement unsafe.Add and unsafe.SliceMatthew Dempsky
Updates #19367. Updates #40481. Change-Id: Iabd2afdd0d520e5d68fd9e6dedd013335a4b3886 Reviewed-on: https://go-review.googlesource.com/c/go/+/312214 Run-TryBot: Matthew Dempsky <mdempsky@google.com> Trust: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@golang.org>
2020-09-25runtime: use old capacity to decide on append growth regimeKeith Randall
We grow the backing store on append by 2x for small sizes and 1.25x for large sizes. The threshold we use for picking the growth factor used to depend on the old length, not the old capacity. That's kind of unfortunate, because then doing append(s, 0, 0) and append(append(s, 0), 0) do different things. (If s has one more spot available, then the former expression chooses its growth based on len(s) and the latter on len(s)+1.) If we instead use the old capacity, we get more consistent behavior. (Both expressions use len(s)+1 == cap(s) to decide.) Fixes #41239 Change-Id: I40686471d256edd72ec92aef973a89b52e235d4b Reviewed-on: https://go-review.googlesource.com/c/go/+/257338 Trust: Keith Randall <khr@golang.org> Trust: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
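The two expressions compared above can be run side by side. In this sketch the helper names oneCall and twoCalls are hypothetical, and the starting slice has exactly one free slot (len 5, cap 6), the situation the message describes. With a toolchain that includes this change, both results are expected to report the same capacity; the exact capacity value is an implementation detail and is deliberately not printed.

```go
package main

import "fmt"

// oneCall appends two elements in a single append call.
func oneCall(s []int) []int { return append(s, 0, 0) }

// twoCalls appends the same two elements with nested append calls.
func twoCalls(s []int) []int { return append(append(s, 0), 0) }

func main() {
	// Each input has exactly one free slot: len 5, cap 6.
	a := oneCall(make([]int, 5, 6))
	b := twoCalls(make([]int, 5, 6))
	fmt.Println(len(a), len(b), cap(a) == cap(b))
}
```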
2020-09-16cmd/compile: unify reflect, string and slice copy runtime functionsMartin Möhrmann
Use a common runtime slicecopy function to copy strings or slices into slices. This deduplicates similar code previously used in reflect.slicecopy and runtime.stringslicecopy. Change-Id: I09572ff0647a9e12bb5c6989689ce1c43f16b7f1 Reviewed-on: https://go-review.googlesource.com/c/go/+/254658 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2020-05-07runtime: do not attempt bulkBarrierPreWrite when dst slice length is zeroMartin Möhrmann
If dst slice length is zero in makeslicecopy then the called mallocgc takes a fast path that only returns a pointer to runtime.zerobase. There may be no heapBits for that address readable by bulkBarrierPreWriteSrcOnly which will cause a panic. Protect against this by not calling bulkBarrierPreWriteSrcOnly if there is nothing to copy. This applies whenever the length of the destination slice is zero. runtime.growslice and runtime.typedslicecopy have fast paths that do not call bulkBarrierPreWrite for zero copy lengths either. Fixes #38929 Change-Id: I78ece600203a0a8d24de5b6c9eef56f605d44e99 Reviewed-on: https://go-review.googlesource.com/c/go/+/232800 Reviewed-by: Keith Randall <khr@golang.org>
2020-05-07cmd/compile: optimize make+copy pattern to avoid memclrMartin Möhrmann
match:
  m = make([]T, x); copy(m, s)
for pointer free T and x==len(s) rewrite to:
  m = mallocgc(x*elemsize(T), nil, false); memmove(&m, &s, x*elemsize(T))
otherwise rewrite to:
  m = makeslicecopy([]T, x, s)

This avoids memclear and shading of pointers in the newly created slice before the copy. With this CL "s" is only allowed to be a variable and not a more complex expression. This restriction could be lifted in future versions of this optimization when it can be proven that "s" is not referencing "m". Triggers 450 times during make.bash. Reduces go binary size by ~8 kbyte.

name                           old time/op  new time/op  delta
MakeSliceCopy/mallocmove/Byte  71.1ns ± 1%  65.8ns ± 0%  -7.49%  (p=0.000 n=10+9)
MakeSliceCopy/mallocmove/Int   71.2ns ± 1%  66.0ns ± 0%  -7.27%  (p=0.000 n=10+8)
MakeSliceCopy/mallocmove/Ptr    104ns ± 4%    99ns ± 1%  -5.13%  (p=0.000 n=10+10)
MakeSliceCopy/makecopy/Byte    70.3ns ± 0%  68.0ns ± 0%  -3.22%  (p=0.000 n=10+9)
MakeSliceCopy/makecopy/Int     70.3ns ± 0%  68.5ns ± 1%  -2.59%  (p=0.000 n=9+10)
MakeSliceCopy/makecopy/Ptr      102ns ± 0%    99ns ± 1%  -2.97%  (p=0.000 n=9+9)
MakeSliceCopy/nilappend/Byte   75.4ns ± 0%  74.9ns ± 2%  -0.63%  (p=0.015 n=9+9)
MakeSliceCopy/nilappend/Int    75.6ns ± 0%  76.4ns ± 3%    ~     (p=0.245 n=9+10)
MakeSliceCopy/nilappend/Ptr     107ns ± 0%   108ns ± 1%  +0.93%  (p=0.005 n=9+10)

Fixes #26252 Change-Id: Iec553dd1fef6ded16197216a472351c8799a8e71 Reviewed-on: https://go-review.googlesource.com/c/go/+/146719 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
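The source pattern this optimization matches is the ordinary make-then-copy clone, with a plain variable as the copy source. A minimal sketch follows; the function name clone is hypothetical, and whether the rewrite fires for a given call is a compiler decision that this code does not observe.

```go
package main

import "fmt"

// clone copies s using the make+copy pattern: a make immediately
// followed by a copy from a plain variable.
func clone(s []int) []int {
	m := make([]int, len(s))
	copy(m, s)
	return m
}

func main() {
	src := []int{1, 2, 3}
	dst := clone(src)
	dst[0] = 99 // the clone has its own backing array
	fmt.Println(src, dst) // [1 2 3] [99 2 3]
}
```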
2020-04-08cmd/compile,runtime: pass only ptr and len to some runtime callsJosh Bleecher Snyder
Some runtime calls accept a slice, but only use ptr and len. This change modifies most such routines to accept only ptr and len. After this change, the only runtime calls that accept an unnecessary cap arg are concatstrings and slicerunetostring. Neither is particularly common, and both are complicated to modify. Negligible compiler performance impact. Shrinks binaries a little. There are only a few regressions; the one I investigated was due to register allocation fluctuation. Passes 'go test -race std cmd', modulo #38265 and #38266. Wow, does that take a long time to run. Updates #36890

file       before     after      Δ       %
compile    19655024   19655152   +128    +0.001%
cover      5244840    5236648    -8192   -0.156%
dist       3662376    3658280    -4096   -0.112%
link       6680056    6675960    -4096   -0.061%
pprof      14789844   14777556   -12288  -0.083%
test2json  2824744    2820648    -4096   -0.145%
trace      11647876   11639684   -8192   -0.070%
vet        8260472    8256376    -4096   -0.050%
total      115163736  115118808  -44928  -0.039%

Change-Id: Idb29fa6a81d6a82bfd3b65740b98cf3275ca0a78 Reviewed-on: https://go-review.googlesource.com/c/go/+/227163 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-04-08runtime: only check for pointers up to ptrdata, not sizeIan Lance Taylor
Change-Id: I166cf253b7f2483d652c98d2fba36c380e2f3347 Reviewed-on: https://go-review.googlesource.com/c/go/+/227177 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2020-02-25runtime: reorder race detector calls in slicecopyKeith Randall
In rare circumstances, this helps report a race which would otherwise go undetected. Fixes #36794 Change-Id: I8a3c9bd6fc34efa51516393f7ee72531c34fb073 Reviewed-on: https://go-review.googlesource.com/c/go/+/220685 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2019-11-15all: fix a bunch of misspellingsVille Skyttä
Change-Id: I5b909df0fd048cd66c5a27fca1b06466d3bcaac7 GitHub-Last-Rev: 778c5d21311abee09a5fbda2e4005a5fd4cc3f9f GitHub-Pull-Request: golang/go#35624 Reviewed-on: https://go-review.googlesource.com/c/go/+/207421 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>