path: root/src/cmd/compile/internal/base/debug.go
Age  Commit message  Author
2026-03-02  cmd/compile: add concurrency-ok property to some compiler debug flags  David Chase

The property was missing from some flags that we potentially care about. This change only affects flags supplied on the command line (not explicitly changed within the compiler), but nonetheless it seems like a good idea to get them right: we might adjust them on the command line, or someone might read the code and wonder why they're unset.

Change-Id: I44812ddea640b71c078594317ef3506ab055a37b
Reviewed-on: https://go-review.googlesource.com/c/go/+/452876
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2026-02-03  cmd/compile: add astdump debug flag  David Chase

This was extraordinarily useful for inlining work. I have cleaned it up somewhat, and did some additional tweaks after working on changes to bloop.

    -gcflags=-d=astdump=SomeFunc
    -gcflags=-d=astdump=SomeSubPkg.SomeFunc
    -gcflags=-d=astdump=Some/Pkg.SomeFunc
    -gcflags=-d=astdump=~YourRegExpHere

Change-Id: I3f98601ca96c87d6b191d4b64b264cd236e6d8bf
Reviewed-on: https://go-review.googlesource.com/c/go/+/629775
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-11-26  cmd/compile: introduce alias analysis and automatically free non-aliased memory after growslice  thepudds

This CL is part of a set of CLs that attempt to reduce how much work the GC must do. See the design in https://go.dev/design/74299-runtime-freegc

This CL updates the compiler to examine append calls to prove whether or not the slice is aliased. If proven unaliased, the compiler automatically inserts a call to a new runtime function introduced with this CL, runtime.growsliceNoAlias, which frees the old backing memory immediately after slice growth is complete and the old storage is logically dead.

Two append benchmarks below show promising results, executing up to ~2x faster and with up to a factor of ~3 memory reduction with this CL.

The approach works with multiple append calls for the same slice, including inside loops, and the final slice memory can be escaping, such as in a classic pattern of returning a slice from a function after the slice is built. (The final slice memory is never freed with this CL, though we have other work that tackles that.)

An example target for this CL is that we automatically free the intermediate memory for the appends in the loop in this function:

    func f1(input []int) []int {
        var s []int
        for _, x := range input {
            s = append(s, g(x)) // s cannot be aliased here
            if h(x) {
                s = append(s, x) // s cannot be aliased here
            }
        }
        return s // slice escapes at end
    }

In this case, the compiler and the runtime collaborate so that the heap allocated backing memory for s is automatically freed after a successful grow. (For the first grow, there is nothing to free, but for the second and subsequent growths, the old heap memory is freed automatically.)

The new runtime.growsliceNoAlias is primarily implemented by calling runtime.freegc, which we introduced in CL 673695.

The high-level approach here is that we step through the IR starting from a slice declaration and look for any operations that either alias the slice or might do so, and treat any IR construct we don't specifically handle as a potential alias (and therefore conservatively fall back to treating the slice as aliased when encountering something not understood).

For loops, some additional care is required. We arrange the analysis so that an alias in the body of a loop causes all the appends in that same loop body to be marked aliased, even if the aliasing occurs after the append in the IR:

    func f2() {
        var s []int
        for i := range 10 {
            s = append(s, i) // aliased due to next line
            alias = s
        }
    }

For nested loops, we analyze the nesting appropriately so that, for example, this append is still proven as non-aliased in the inner loop even though it is aliased for the outer loop:

    func f3() {
        for range 10 {
            var s []int
            for i := range 10 {
                s = append(s, i) // append using non-aliased slice
            }
            alias = s
        }
    }

A good starting point is the beginning of the test/escape_alias.go file, which starts with ~10 introductory examples with brief comments that attempt to illustrate the high-level approach.

For more details, see the new .../internal/escape/alias.go file, especially the (*aliasAnalysis).analyze method.

In the first benchmark, an append in a loop builds up a slice from nothing, where the slice elements are each 64 bytes. In the table below, 'count' is the number of appends. With 1 append, there is no opportunity for this CL to free memory. Once there are 2 appends, the growth from 1 element to 2 elements means the compiler-inserted growsliceNoAlias frees the 1-element array, and we see a ~33% reduction in memory use and a small reported speed improvement.

As the number of appends increases, for example to 5, we are at a ~20% speed improvement and ~45% memory reduction, and so on until we reach ~40% faster and ~50% less memory allocated at the end of the table.
There can be variation in the reported numbers based on -randlayout, so this table is for 30 different values of -randlayout with a total n=150. (Even so, there is still some variation, so we probably should not read too much into small changes.) This is with GOAMD64=v3 on a VM that gcc reports is cascadelake.

goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
                          │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
                          │ sec/op │ sec/op vs base │
Append64Bytes/count=1-4      31.09n ± 2%    31.69n ± 1%   +1.95% (n=150)
Append64Bytes/count=2-4      73.31n ± 1%    70.27n ± 0%   -4.15% (n=150)
Append64Bytes/count=3-4      142.7n ± 1%    124.6n ± 1%  -12.68% (n=150)
Append64Bytes/count=4-4      149.6n ± 1%    127.7n ± 0%  -14.64% (n=150)
Append64Bytes/count=5-4      277.1n ± 1%    213.6n ± 0%  -22.90% (n=150)
Append64Bytes/count=6-4      280.7n ± 1%    216.5n ± 1%  -22.87% (n=150)
Append64Bytes/count=10-4     544.3n ± 1%    386.6n ± 0%  -28.97% (n=150)
Append64Bytes/count=20-4    1058.5n ± 1%    715.6n ± 1%  -32.39% (n=150)
Append64Bytes/count=50-4     2.121µ ± 1%    1.404µ ± 1%  -33.83% (n=150)
Append64Bytes/count=100-4    4.152µ ± 1%    2.736µ ± 1%  -34.11% (n=150)
Append64Bytes/count=200-4    7.753µ ± 1%    4.882µ ± 1%  -37.03% (n=150)
Append64Bytes/count=400-4   15.163µ ± 2%    9.273µ ± 1%  -38.84% (n=150)
geomean                      601.8n         455.0n       -24.39%

                          │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
                          │ B/op │ B/op vs base │
Append64Bytes/count=1-4       64.00 ± 0%     64.00 ± 0%        ~ (n=150)
Append64Bytes/count=2-4       192.0 ± 0%     128.0 ± 0%  -33.33% (n=150)
Append64Bytes/count=3-4       448.0 ± 0%     256.0 ± 0%  -42.86% (n=150)
Append64Bytes/count=4-4       448.0 ± 0%     256.0 ± 0%  -42.86% (n=150)
Append64Bytes/count=5-4       960.0 ± 0%     512.0 ± 0%  -46.67% (n=150)
Append64Bytes/count=6-4       960.0 ± 0%     512.0 ± 0%  -46.67% (n=150)
Append64Bytes/count=10-4    1.938Ki ± 0%   1.000Ki ± 0%  -48.39% (n=150)
Append64Bytes/count=20-4    3.938Ki ± 0%   2.001Ki ± 0%  -49.18% (n=150)
Append64Bytes/count=50-4    7.938Ki ± 0%   4.005Ki ± 0%  -49.54% (n=150)
Append64Bytes/count=100-4  15.938Ki ± 0%   8.021Ki ± 0%  -49.67% (n=150)
Append64Bytes/count=200-4   31.94Ki ± 0%   16.08Ki ± 0%  -49.64% (n=150)
Append64Bytes/count=400-4   63.94Ki ± 0%   32.33Ki ± 0%  -49.44% (n=150)
geomean                     1.991Ki        1.124Ki       -43.54%

                          │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
                          │ allocs/op │ allocs/op vs base │
Append64Bytes/count=1-4       1.000 ± 0%     1.000 ± 0%        ~ (n=150)
Append64Bytes/count=2-4       2.000 ± 0%     1.000 ± 0%  -50.00% (n=150)
Append64Bytes/count=3-4       3.000 ± 0%     1.000 ± 0%  -66.67% (n=150)
Append64Bytes/count=4-4       3.000 ± 0%     1.000 ± 0%  -66.67% (n=150)
Append64Bytes/count=5-4       4.000 ± 0%     1.000 ± 0%  -75.00% (n=150)
Append64Bytes/count=6-4       4.000 ± 0%     1.000 ± 0%  -75.00% (n=150)
Append64Bytes/count=10-4      5.000 ± 0%     1.000 ± 0%  -80.00% (n=150)
Append64Bytes/count=20-4      6.000 ± 0%     1.000 ± 0%  -83.33% (n=150)
Append64Bytes/count=50-4      7.000 ± 0%     1.000 ± 0%  -85.71% (n=150)
Append64Bytes/count=100-4     8.000 ± 0%     1.000 ± 0%  -87.50% (n=150)
Append64Bytes/count=200-4     9.000 ± 0%     1.000 ± 0%  -88.89% (n=150)
Append64Bytes/count=400-4    10.000 ± 0%     1.000 ± 0%  -90.00% (n=150)
geomean                       4.331          1.000       -76.91%

The second benchmark is similar, but instead uses an 8-byte integer for the slice element. The first 4 appends in the loop never call into the runtime thanks to the excellent CL 664299 introduced by Keith in Go 1.25 that allows some <= 32 byte dynamically-sized slices to be on the stack, so this CL is neutral for <= 32 bytes. Once the 5th append occurs at count=5, a grow happens via the runtime and heap allocates as normal, but freegc does not yet have anything to free, so we see a small ~1.4ns penalty reported there. But once the second growth happens, the older heap memory is now automatically freed by freegc, so we start to see some benefit in memory reductions and speed improvements, starting at a tiny speed improvement (close to a wash, or maybe noise) by the second growth before count=10, and building up to ~2x faster with ~68% fewer allocated bytes reported.

goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
                       │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
                       │ sec/op │ sec/op vs base │
AppendInt/count=1-4       2.978n ± 0%    2.969n ± 0%   -0.30% (p=0.000 n=150)
AppendInt/count=4-4       4.292n ± 3%    4.163n ± 3%        ~ (p=0.528 n=150)
AppendInt/count=5-4       33.50n ± 0%    34.93n ± 0%   +4.25% (p=0.000 n=150)
AppendInt/count=10-4      76.21n ± 1%    75.67n ± 0%   -0.72% (p=0.000 n=150)
AppendInt/count=20-4      150.6n ± 1%    133.0n ± 0%  -11.65% (n=150)
AppendInt/count=50-4      284.1n ± 1%    225.6n ± 0%  -20.59% (n=150)
AppendInt/count=100-4     544.2n ± 1%    392.4n ± 1%  -27.89% (n=150)
AppendInt/count=200-4    1051.5n ± 1%    702.3n ± 0%  -33.21% (n=150)
AppendInt/count=400-4     2.041µ ± 1%    1.312µ ± 1%  -35.70% (n=150)
AppendInt/count=1000-4    5.224µ ± 2%    2.851µ ± 1%  -45.43% (n=150)
AppendInt/count=2000-4   11.770µ ± 1%    6.010µ ± 1%  -48.94% (n=150)
AppendInt/count=3000-4   17.747µ ± 2%    8.264µ ± 1%  -53.44% (n=150)
geomean                   331.8n         246.4n       -25.72%

                       │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
                       │ B/op │ B/op vs base │
AppendInt/count=1-4       0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=150)
AppendInt/count=4-4       0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=150)
AppendInt/count=5-4       64.00 ± 0%     64.00 ± 0%        ~ (p=1.000 n=150)
AppendInt/count=10-4      192.0 ± 0%     128.0 ± 0%  -33.33% (n=150)
AppendInt/count=20-4      448.0 ± 0%     256.0 ± 0%  -42.86% (n=150)
AppendInt/count=50-4      960.0 ± 0%     512.0 ± 0%  -46.67% (n=150)
AppendInt/count=100-4   1.938Ki ± 0%   1.000Ki ± 0%  -48.39% (n=150)
AppendInt/count=200-4   3.938Ki ± 0%   2.001Ki ± 0%  -49.18% (n=150)
AppendInt/count=400-4   7.938Ki ± 0%   4.005Ki ± 0%  -49.54% (n=150)
AppendInt/count=1000-4  24.56Ki ± 0%   10.05Ki ± 0%  -59.07% (n=150)
AppendInt/count=2000-4  58.56Ki ± 0%   20.31Ki ± 0%  -65.32% (n=150)
AppendInt/count=3000-4  85.19Ki ± 0%   27.30Ki ± 0%  -67.95% (n=150)
geomean                         ²                    -42.81%

                       │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
                       │ allocs/op │ allocs/op vs base │
AppendInt/count=1-4       0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=150)
AppendInt/count=4-4       0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=150)
AppendInt/count=5-4       1.000 ± 0%     1.000 ± 0%        ~ (p=1.000 n=150)
AppendInt/count=10-4      2.000 ± 0%     1.000 ± 0%  -50.00% (n=150)
AppendInt/count=20-4      3.000 ± 0%     1.000 ± 0%  -66.67% (n=150)
AppendInt/count=50-4      4.000 ± 0%     1.000 ± 0%  -75.00% (n=150)
AppendInt/count=100-4     5.000 ± 0%     1.000 ± 0%  -80.00% (n=150)
AppendInt/count=200-4     6.000 ± 0%     1.000 ± 0%  -83.33% (n=150)
AppendInt/count=400-4     7.000 ± 0%     1.000 ± 0%  -85.71% (n=150)
AppendInt/count=1000-4    9.000 ± 0%     1.000 ± 0%  -88.89% (n=150)
AppendInt/count=2000-4   11.000 ± 0%     1.000 ± 0%  -90.91% (n=150)
AppendInt/count=3000-4   12.000 ± 0%     1.000 ± 0%  -91.67% (n=150)
geomean                         ²                    -72.76%
²

Of course, these are just microbenchmarks, but likely indicate there are some opportunities here.

The immediately following CL 712422 tackles inlining and is able to get runtime.freegc working automatically with iterators such as used by slices.Collect, which becomes able to automatically free the intermediate memory from its repeated appends (which earlier in this work required a temporary hand edit to the slices package).

For now, we only use the NoAlias version for element types without pointers while waiting on additional runtime support in CL 698515.

Updates #74299

Change-Id: I1b9d286aa97c170dcc2e203ec0f8ca72d84e8221
Reviewed-on: https://go-review.googlesource.com/c/go/+/710015
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
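The "old" B/op column for Append64Bytes in the commit above can be reproduced with simple arithmetic. The sketch below is not the runtime's growth policy; it assumes capacity simply doubles (1, 2, 4, 8, ...) and that every backing array stays counted. `simulateAppends` is a hypothetical helper written for this illustration. Under that assumption the total matches the old column (960 bytes at count=5), while the freegc column is close to the size of the final backing array alone (512 bytes at count=5).

```go
package main

import "fmt"

// simulateAppends models count appends of elemSize-byte elements to a slice
// that starts nil. Assumption: capacity grows 1, 2, 4, 8, ... (plain
// doubling), which is a simplification of the runtime's real growth policy.
// It returns the total bytes allocated across all backing arrays and the
// size in bytes of the final backing array.
func simulateAppends(count, elemSize int) (totalBytes, finalBytes int) {
	capacity := 0
	for n := 1; n <= count; n++ {
		if n > capacity {
			if capacity == 0 {
				capacity = 1
			} else {
				capacity *= 2
			}
			totalBytes += capacity * elemSize
		}
	}
	return totalBytes, capacity * elemSize
}

func main() {
	for _, count := range []int{1, 2, 3, 5, 10} {
		total, final := simulateAppends(count, 64)
		fmt.Printf("count=%d total=%d final=%d\n", count, total, final)
	}
}
```

For count=10 this gives 1984 and 1024 bytes, which correspond to the 1.938Ki and 1.000Ki entries in the table.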
2025-11-26  cmd/compile: adjust start heap size  David Chase

TLDR
- not-huge increase to default starting heap boost,
- small improvement in build performance,
- remove concurrency dependence of starting heap,
- aligns RSS behavior with GOMEMLIMIT,
- adds a gcflags=-d=gcstart=N (N -> N MiB) flag for people who want to trade a lot of memory for a little build performance improvement.

This removes concurrency (-c flag) sensitivity and increases the nominal default to 128MiB.

Refactored the startheap code into a separate file, to make it easier to extract and reuse.

Added sensitivity to concurrency=1 and GOMEMLIMIT!="" (in addition to existing GOGC!=""); those disable the default starting heap boost because the compiler-invoker has indicated either a desire to control the GC or a desire to run in minimum memory (or both).

Adds a -d flag gcstart=N (N is number of MiB) for tinkering/experiments. This always enables the starting heap. (`GOGC=XXX` and `-d=gcstart=YYY` will use `GOGC=XXX` after starting heap size is achieved.)

Derated the "boost" obtained by a factor of .70 so that `-d=gcstart=2000` yields the same RSS as `GOMEMLIMIT=2000MiB`. (Actually adjusts the boost with a high-low breakpoint.)

The parent, with concurrency sensitivity, provided 64MB of plain boost. Derating reduces the effects of boosting the starting heap slightly. The benchmark here shows that maintaining 64MB results in a minor regression, while increasing it to 128MB produces a slight improvement, and does not grow the RSS versus 64MB.
```
        │ parent │ sh64 │ sh128 │ sh1024 │
        │ sec/op │ sec/op vs base │ sec/op vs base │ sec/op vs base │
std       10.164 ± 1%   10.527 ± 1%  +3.57% (p=0.000 n=50)   10.084 ± 1%  -0.79% (p=0.000 n=50)   9.631 ± 1%  -5.24% (p=0.000 n=50)
compile    21.05 ± 1%    20.78 ± 0%  -1.28% (p=0.000 n=50)    20.74 ± 1%  -1.46% (p=0.000 n=50)   20.77 ± 0%  -1.32% (p=0.001 n=50)
ast        20.45 ± 1%    20.39 ± 1%       ~ (p=0.334 n=50)    20.44 ± 0%       ~ (p=0.818 n=50)   20.11 ± 1%  -1.65% (p=0.000 n=50)
geomean    16.35         16.46       +0.65%                   16.23       -0.76%                  15.90       -2.75%

        │ parent │ sh64 │ sh128 │ sh1024 │
        │ user-sec/op │ user-sec/op vs base │ user-sec/op vs base │ user-sec/op vs base │
std        66.06 ± 0%    69.74 ± 0%  +5.56% (p=0.000 n=50)    64.68 ± 0%  -2.09% (p=0.000 n=50)   59.51 ± 0%  -9.91% (p=0.000 n=50)
compile    84.69 ± 1%    82.54 ± 0%  -2.53% (p=0.000 n=50)    82.63 ± 0%  -2.43% (p=0.000 n=50)   82.66 ± 1%  -2.40% (p=0.000 n=50)
ast        59.41 ± 0%    58.84 ± 1%  -0.95% (p=0.011 n=50)    59.48 ± 1%       ~ (p=0.341 n=50)   57.13 ± 1%  -3.83% (p=0.000 n=50)
geomean    69.27         69.71       +0.63%                   68.25       -1.47%                  65.50       -5.44%

        │ parent │ sh64 │ sh128 │ sh1024 │
        │ sys-sec/op │ sys-sec/op vs base │ sys-sec/op vs base │ sys-sec/op vs base │
std        9.599 ± 1%   10.031 ± 1%  +4.50% (p=0.000 n=50)    9.513 ± 1%  -0.90% (p=0.014 n=50)   9.359 ± 1%  -2.50% (p=0.000 n=50)
compile    6.813 ± 1%    6.740 ± 1%  -1.08% (p=0.017 n=50)    6.716 ± 1%  -1.42% (p=0.006 n=50)   6.696 ± 1%  -1.72% (p=0.000 n=50)
ast        4.315 ± 1%    4.291 ± 1%       ~ (p=0.781 n=50)    4.296 ± 1%       ~ (p=0.792 n=50)   4.279 ± 2%       ~ (p=0.124 n=50)
geomean    6.559         6.620       +0.93%                   6.499       -0.92%                  6.449       -1.68%

        │ parent │ sh64 │ sh128 │ sh1024 │
        │ peak-RSS-bytes │ peak-RSS-bytes vs base │ peak-RSS-bytes vs base │ peak-RSS-bytes vs base │
std       257.1Mi ± 1%    257.2Mi ± 1%       ~ (p=0.754 n=50)    257.0Mi ± 0%       ~ (p=0.570 n=50)    605.6Mi ± 0%  +135.59% (p=0.000 n=50)
compile  1007.2Mi ± 1%   1004.3Mi ± 0%       ~ (p=0.064 n=50)   1007.4Mi ± 0%       ~ (p=0.348 n=50)   1009.4Mi ± 1%         ~ (p=0.598 n=50)
ast       1.848Gi ± 0%    1.842Gi ± 0%       ~ (p=0.079 n=50)    1.824Gi ± 0%  -1.25% (p=0.000 n=50)    1.856Gi ± 0%    +0.47% (p=0.000 n=50)
geomean   788.3Mi         786.8Mi       -0.19%                   785.0Mi       -0.41%                   1.027Gi        +33.37%
```

Updates #73044

Change-Id: I6359642a94b396e696dd57e64ed1f2c4cf178475
Reviewed-on: https://go-review.googlesource.com/c/go/+/724441
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
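The enable/disable rules described in the commit above can be summarized as a small decision function. This is only a sketch of the stated rules, not the compiler's code: `nominalStartHeapMiB` and its parameters are hypothetical names, and the 0.70 derating with its high-low breakpoint is deliberately left out because the message does not spell out the details.

```go
package main

import "fmt"

// defaultStartHeapMiB is the nominal default named in the commit message.
const defaultStartHeapMiB = 128

// nominalStartHeapMiB is a hypothetical helper that returns the nominal
// starting-heap boost in MiB before any derating, or 0 if the boost is off.
//   - gcstartMiB > 0 models -d=gcstart=N, which always enables the boost.
//   - concurrency == 1, or a non-empty GOGC or GOMEMLIMIT, disables the
//     default boost.
func nominalStartHeapMiB(gcstartMiB, concurrency int, gogc, gomemlimit string) int {
	if gcstartMiB > 0 {
		return gcstartMiB
	}
	if concurrency == 1 || gogc != "" || gomemlimit != "" {
		return 0
	}
	return defaultStartHeapMiB
}

func main() {
	fmt.Println(nominalStartHeapMiB(0, 4, "", ""))         // default boost
	fmt.Println(nominalStartHeapMiB(0, 1, "", ""))         // concurrency=1 disables it
	fmt.Println(nominalStartHeapMiB(0, 4, "", "2000MiB"))  // GOMEMLIMIT set disables it
	fmt.Println(nominalStartHeapMiB(2000, 1, "50", ""))    // explicit gcstart wins
}
```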
2025-11-18  cmd/asm,cmd/compile,cmd/internal/obj/riscv: use compressed instructions on riscv64  Joel Sing

Make use of compressed instructions on riscv64 - add a compress pass to the end of the assembler, which replaces non-compressed instructions with compressed alternatives if possible.

Provide a `compressinstructions` compiler and assembler debug flag, such that the compression pass can be disabled via `-asmflags=all=-d=compressinstructions=0` and `-gcflags=all=-d=compressinstructions=0`. Note that this does not prevent the explicit use of compressed instructions via assembly.

Note that this does not make use of compressed control transfer instructions - this will be implemented in later changes.

Reduces the text size of a hello world binary by ~121KB and reduces the text size of the go binary on riscv64 by ~1.21MB (between 8-10% in both cases).

Updates #71105

Cq-Include-Trybots: luci.golang.try:gotip-linux-riscv64
Change-Id: I24258353688554042c2a836deed4830cc673e985
Reviewed-on: https://go-review.googlesource.com/c/go/+/523478
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-10-09  cmd/compile: modify float-to-[u]int so that amd64 and arm64 match  David Chase

Eventual goal is that all the architectures agree, and are sensible. The test will be build-tagged to exclude not-yet-handled platforms.

This change also bisects the conversion change in case of bugs. (`bisect -compile=convert ...`)

Change-Id: I98528666b0a3fde17cbe8d69b612d01da18dce85
Reviewed-on: https://go-review.googlesource.com/c/go/+/691135
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-06-30  cmd/compile/internal/escape: add debug hash for literal allocation optimizations  thepudds

Several CLs earlier in this stack added optimizations to reduce user allocations by recognizing and taking advantage of literals, including CL 649555, CL 649079, and CL 649035.

This CL adds debug hashing of those changes, which enables use of the bisect tool, such as 'bisect -compile=literalalloc go test -run=Foo'.

This also allows these optimizations to be manually disabled via '-gcflags=all=-d=literalallochash=n'.

Updates #71359

Change-Id: I854f7742a6efa5b17d914932d61a32b2297f0c88
Reviewed-on: https://go-review.googlesource.com/c/go/+/675415
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-05-20  cmd/compile/internal/escape: additional constant and zero value tests and logging  thepudds

This adds additional logging for the work that walk does to reduce how often an interface conversion results in an allocation.

Also, as part of #71359, we will be updating how escape analysis and walk handle basic literals, composite literals, and zero values, so add some tests that use this new logging. By the end of our CL stack, we address all of these tests.

Updates #71359

Change-Id: I43fde8343d9aacaec1e05360417908014a86c8bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/649076
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-15  cmd/compile/internal/escape: add hash for bisecting stack allocation of variable-sized makeslice  thepudds

CL 653856 enabled stack allocation of variable-sized makeslice results.

This CL adds debug hashing of that change, plus a debug flag to control the byte threshold used.

The debug hashing machinery means we also now have a way to disable just the CL 653856 optimization by doing -gcflags='all=-d=variablemakehash=n' or similar, though the stderr output will then typically have many lines of debug hash output.

Using this CL plus the bisect command, I was able to retroactively find one of the lines of code responsible for #73199:

    $ bisect -compile=variablemake go test -skip TestListWireGuardDrivers
    [...]
    bisect: FOUND failing change set
    --- change set #1 (enabling changes causes failure)
    ./security_windows.go:1321:38 (variablemake)
    ./security_windows.go:1321:38 (variablemake)
    ---

Previously, I had tracked down those lines by diffing '-gcflags=-m=1' output and brief code inspection, but seeing the bisect was very nice.

This CL also adds a compiler debug flag to control the threshold for stack allocation of variably sized make results. This can help us identify more code that is relying on certain stack allocations. This might be a temporary flag that we delete prior to Go 1.25 (given we would not want people to rely on it), or maybe it might make sense to keep it for some period of time beyond the release of Go 1.25 to help the ecosystem shake out other bugs.

Using these two flags together (and picking a threshold of 64 rather than the default of 32), it looks for example like this x/sys/windows code might be relying on stack allocation of a byte slice:

    $ bisect -compile=variablemake go test -gcflags=-d=variablemakethreshold=64 -skip TestListWireGuardDrivers
    [...]
    bisect: FOUND failing change set
    --- change set #1 (enabling changes causes failure)
    ./syscall_windows_test.go:1178:16 (variablemake)

Updates #73199
Fixes #73253

Change-Id: I160179a0e3c148c3ea86be5c9b6cea8a52c3e5b7
Reviewed-on: https://go-review.googlesource.com/c/go/+/663795
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
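For readers unfamiliar with the pattern, the following is a minimal sketch of the kind of code the variable-sized makeslice optimization is about: a `make` whose length is only known at run time and whose result does not escape. `checksum` is a hypothetical function for illustration. Whether `buf` actually lands on the stack depends on the compiler version and the threshold, so the example makes no claim about allocation counts.

```go
package main

import "fmt"

// checksum builds a scratch buffer whose length is only known at run time.
// The buffer never escapes this function, so it is a candidate for the
// stack allocation described in the commit message. This code does not
// verify where the buffer is allocated.
func checksum(n int) int {
	buf := make([]byte, n)
	for i := range buf {
		buf[i] = byte(i)
	}
	sum := 0
	for _, b := range buf {
		sum += int(b)
	}
	return sum
}

func main() {
	fmt.Println(checksum(16)) // prints 120, the sum 0+1+...+15
}
```

To experiment, one could build such code with the flags quoted in the message, for example `-gcflags=-d=variablemakethreshold=64` or `-gcflags='all=-d=variablemakehash=n'`, and compare `-gcflags=-m=1` output.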
2024-11-13  cmd/compile, cmd/link: add FIPS verification support  Russ Cox

For FIPS init-time code+data verification, we need to arrange to put the FIPS symbols into contiguous regions of the executable and then record those sections along with the expected checksum.

The cmd/internal/obj changes identify the FIPS symbols and give them distinguished types, which the linker then places in contiguous regions. The linker also writes out information to use at run time to find the FIPS sections, along with the expected hash.

See cmd/internal/obj/fips.go and cmd/link/internal/ld/fips.go for more details.

The code is disabled in this commit. CL 625998 and 625999 add tests. CL 626000 enables the code.

For #69536.

Change-Id: I48da6db94bc0bea7428c43d4abcf999527bccfcd
Reviewed-on: https://go-review.googlesource.com/c/go/+/625997
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-09-09  cmd/compile: emit tail call wrappers when possible  amusman

Use OTAILCALL in wrapper if the receiver and method are both pointers and it is not going to be inlined, similar to how it is done in reflectdata.methodWrapper. Currently tail call may be used for functions with identical argument types.

This change updates wrappers where both wrapper and the wrapped method's receiver are pointers. In this case, we have the same signature for the wrapper and the wrapped method (modulo the receiver's pointed-to types), and do not need any local variables in the generated wrapper (on stack) because the arguments are immediately passed to the wrapped method in place (without need to move some value passed to other register or to change any argument/return passed through stack). Thus, the wrapper does not need its own stack frame.

This applies to promoted methods, e.g. when we have some struct type U with an embedded type *T and construct a wrapper like

    func (recv *U) M(arg int) bool {
        return recv.T.M(arg)
    }

See also test/abi/method_wrapper.go for a running example.

Code size difference measured with this change (tried for x86_64):

    etcd binary:
    .text section size: 21472251 -> 21432350 (0.2%)
    total binary size:  32226640 -> 32191136 (0.1%)

    compile binary:
    .text section size: 17419073 -> 17413929 (0.03%)
    total binary size:  26744743 -> 26737567 (0.03%)

Change-Id: I9bbe730568f6def21a8e61118a6b6f503d98049c
Reviewed-on: https://go-review.googlesource.com/c/go/+/578235
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
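As an illustration of the promoted-method situation described above, here is a small self-contained program in which a struct `U` embeds `*T` and is called through an interface, so the compiler has to provide a `(*U).M` wrapper. The type and interface names are hypothetical, and the program does not show whether the wrapper is actually emitted as a tail call; that is a compiler decision not observable from Go source.

```go
package main

import "fmt"

// T has a pointer-receiver method M.
type T struct{ limit int }

func (t *T) M(arg int) bool { return arg > t.limit }

// U embeds *T, so M is promoted to *U. The compiler generates a wrapper
// equivalent to: func (recv *U) M(arg int) bool { return recv.T.M(arg) }
type U struct{ *T }

// Mer is a hypothetical interface used to force a call through the wrapper.
type Mer interface{ M(arg int) bool }

func main() {
	var m Mer = &U{T: &T{limit: 3}}
	fmt.Println(m.M(5), m.M(1)) // prints: true false
}
```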
2024-07-30  cmd/compile: add "deadlocals" pass to remove unused locals  David Chase

This CL adds a "deadlocals" pass, which runs after inlining and before escape analysis, to prune any unneeded local variables and assignments. In particular, this helps avoid unnecessary Addrtaken markings from unreachable closures.

Deadlocals is sensitive to "_ = ..." as a signal of explicit use for testing. This signal occurs only if the entire left-hand-side is "_" targets; if it is `_, ok := someInlinedFunc(args)` then the first return value is eligible for dead code elimination.

Use this (`_ = x`) to fix tests broken by deadlocals elimination.

Includes a test, based on one of the tests that required modification.

Matthew Dempsky wrote this, changing ownership to allow rebases, commits, tweaks.

Fixes #65158.

Old-Change-Id: I723fb69ccd7baadaae04d415702ce6c8901eaf4e
Change-Id: I1f25f4293b19527f305c18c3680b214237a7714c
Reviewed-on: https://go-review.googlesource.com/c/go/+/600498
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: David Chase <drchase@google.com>
Commit-Queue: David Chase <drchase@google.com>
2024-05-15  cmd/compile, cmd/internal: fine-grained fiddling with loop alignment  David Chase

This appears to be useful only on amd64, and was specifically benchmarked on Apple Silicon and did not produce any benefit there.

This CL adds the assembly instruction `PCALIGNMAX align,amount` which aligns to `align` if that can be achieved with `amount` or fewer bytes of padding. (0 means never, but will align the enclosing function.)

Specifically, padding is emitted if low-order-address-bits + amount is greater than or equal to align; thus, `PCALIGNMAX 64,63` is the same as `PCALIGN 64` and `PCALIGNMAX 64,0` will never emit any alignment, but will still cause the function itself to be aligned to (at least) 64 bytes.

Change-Id: Id51a056f1672f8095e8f755e01f72836c9686aa3
Reviewed-on: https://go-review.googlesource.com/c/go/+/577935
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
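The PCALIGNMAX rule in the commit above is easy to check with a few lines of arithmetic. The sketch below is not the assembler's implementation; `pcAlignMaxPadding` is a hypothetical helper, and it assumes `align` is a power of two. It returns how many padding bytes would be emitted at a given program counter.

```go
package main

import "fmt"

// pcAlignMaxPadding returns the number of padding bytes needed to align pc
// to align, but only if that can be done with at most amount bytes;
// otherwise it returns 0 (no padding emitted).
// Assumption: align is a power of two.
func pcAlignMaxPadding(pc, align, amount int64) int64 {
	low := pc & (align - 1) // low-order address bits
	if low == 0 {
		return 0 // already aligned
	}
	// Padding needed is align-low; it fits when low+amount >= align.
	if low+amount >= align {
		return align - low
	}
	return 0
}

func main() {
	fmt.Println(pcAlignMaxPadding(65, 64, 63))  // behaves like PCALIGN 64
	fmt.Println(pcAlignMaxPadding(65, 64, 0))   // never pads
	fmt.Println(pcAlignMaxPadding(120, 64, 8))  // 8 bytes of padding is within budget
	fmt.Println(pcAlignMaxPadding(100, 64, 8))  // 28 bytes needed, over budget
}
```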
2024-04-09  cmd/compile/internal/liveness: enhance mergelocals for addr-taken candidates  Than McIntosh

It is possible to have situations where a given ir.Name is non-address-taken at the source level, but whose address is materialized in order to accommodate the needs of arch-dependent memory ops. The issue here is that the SymAddr op will show up as touching a variable of interest, but the subsequent memory op will not. This is generally not an issue for computing whether something is live across a call, but it is problematic for collecting the more fine-grained live interval info that drives stack slot merging.

As an example, consider this Go code:

    package p

    type T struct {
        x [10]int
        f float64
    }

    func ABC(i, j int) int {
        var t T
        t.x[i&3] = j
        return t.x[j&3]
    }

On amd64 the code sequences we'll see for accesses to "t" might look like

    v10 = VarDef <mem> {t} v1
    v5 = MOVOstoreconst <mem> {t} [val=0,off=0] v2 v10
    v23 = LEAQ <*T> {t} [8] v2 : DI
    v12 = DUFFZERO <mem> [80] v23 v5
    v14 = ANDQconst <int> [3] v7 : AX
    v19 = MOVQstoreidx8 <mem> {t} v2 v14 v8 v12
    v22 = ANDQconst <int> [3] v8 : BX
    v24 = MOVQloadidx8 <int> {t} v2 v22 v19 : AX
    v25 = MakeResult <int,mem> v24 v19 : <>

Note that the loads and stores (ex: v19, v24) all refer directly to "t", which means that regular live analysis will work fine for identifying variable lifetimes. The DUFFZERO is (in effect) an indirect write, but since there are accesses immediately after it we wind up with the same live intervals.

Now the same code with GOARCH=ppc64:

    v10 = VarDef <mem> {t} v1
    v20 = MOVDaddr <*T> {t} v2 : R20
    v12 = LoweredZero <mem> [88] v20 v10
    v3 = CLRLSLDI <int> [212543] v7 : R5
    v15 = MOVDaddr <*T> {t} v2 : R6
    v19 = MOVDstoreidx <mem> v15 v3 v8 v12
    v29 = CLRLSLDI <int> [212543] v8 : R4
    v24 = MOVDloadidx <int> v15 v29 v19 : R3
    v25 = MakeResult <int,mem> v24 v19 : <>

Here instead of memory ops that refer directly to the symbol, we take the address of "t" (ex: v15) and then pass the address to memory ops (where the ops themselves no longer refer to the symbol).

This patch enhances the stack slot merging liveness analysis to handle cases like the PPC64 one above. We add a new phase in candidate selection that collects more precise use information for merge candidates, and screens out candidates that are too difficult to analyze. The phase makes a forward pass over each basic block looking for instructions of the form vK := SymAddr(N) where N is a raw candidate. It then creates an entry in a map with key vK and value holding name and the vK use count. As the walk continues, we check for uses of vK: when we see one, record it in a side table as an upwards exposed use of N. At each vK use we also decrement the use count in the map entry, and if we hit zero, remove the map entry. If we hit the end of the basic block and we still have map entries, this implies that the address in question "escapes" the block -- at that point to be conservative we just evict the name in question from the candidate set.

Although this CL fixes the issues that forced a revert of the original merging CL, this CL doesn't enable stack slot merging by default; a subsequent CL will do that.

Updates #62737.
Updates #65532.
Updates #65495.

Change-Id: Id41d359a677767a8e7ac1e962ae23f7becb4031f
Reviewed-on: https://go-review.googlesource.com/c/go/+/576735
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
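The per-block walk described in the commit above can be sketched with a toy model. The code below is not the compiler's implementation: `instr`, `pending`, and `scanBlock` are hypothetical names, and instructions are reduced to a defined value, an optional address-taken variable, a use count, and a list of arguments. It follows the described steps: record a map entry at each address-taking instruction, record an exposed use and decrement the count at each use, and evict any variable whose address still has outstanding uses at the end of the block.

```go
package main

import (
	"fmt"
	"sort"
)

// instr is a toy stand-in for an SSA value; it is not the compiler's ssa.Value.
type instr struct {
	def     string   // name of the value this instruction defines, e.g. "v15"
	symAddr string   // non-empty if this instruction takes the address of that variable
	uses    int      // for address-taking instructions: total number of uses of def
	args    []string // values this instruction reads
}

// pending tracks an address value whose uses have not all been seen yet.
type pending struct {
	name      string
	remaining int
}

// scanBlock walks one basic block forward. It returns, per candidate
// variable, the instruction indexes that use its address, plus the sorted
// list of candidates evicted because an address outlives the block.
func scanBlock(block []instr, candidates map[string]bool) (exposed map[string][]int, evicted []string) {
	exposed = make(map[string][]int)
	live := make(map[string]*pending)
	for i, in := range block {
		for _, a := range in.args {
			p, ok := live[a]
			if !ok {
				continue
			}
			exposed[p.name] = append(exposed[p.name], i)
			p.remaining--
			if p.remaining == 0 {
				delete(live, a)
			}
		}
		if in.symAddr != "" && candidates[in.symAddr] && in.uses > 0 {
			live[in.def] = &pending{name: in.symAddr, remaining: in.uses}
		}
	}
	seen := make(map[string]bool)
	for _, p := range live {
		if !seen[p.name] {
			seen[p.name] = true
			evicted = append(evicted, p.name)
		}
	}
	sort.Strings(evicted)
	return exposed, evicted
}

func main() {
	// Loosely modeled on the ppc64 sequence quoted in the commit message.
	block := []instr{
		{def: "v20", symAddr: "t", uses: 1},
		{def: "v12", args: []string{"v20", "v10"}},
		{def: "v15", symAddr: "t", uses: 2},
		{def: "v19", args: []string{"v15", "v3", "v8", "v12"}},
		{def: "v24", args: []string{"v15", "v29", "v19"}},
	}
	exposed, evicted := scanBlock(block, map[string]bool{"t": true})
	fmt.Println(exposed["t"], evicted) // prints: [1 3 4] []
}
```

If `v15` were declared with three uses but only two appeared in the block, its map entry would survive the walk and "t" would be evicted from the candidate set.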
2024-04-09cmd/compile/internal: merge stack slots for selected local auto varsThan McIntosh
[This is a partial roll-forward of CL 553055, the main change here is that the stack slot overlap operation is flagged off by default (can be enabled by hand with -gcflags=-d=mergelocals=1) ] Preliminary compiler support for merging/overlapping stack slots of local variables whose access patterns are disjoint. This patch includes changes in AllocFrame to do the actual merging/overlapping based on information returned from a new liveness.MergeLocals helper. The MergeLocals helper identifies candidates by looking for sets of AUTO variables that either A) have the same size and GC shape (if types contain pointers), or B) have the same size (but potentially different types as long as those types have no pointers). Variables must be greater than (3*types.PtrSize) in size to be considered for merging. After forming candidates, MergeLocals collects variables into "can be overlapped" equivalence classes or partitions; this process is driven by an additional liveness analysis pass. Ideally it would be nice to move the existing stackmap liveness pass up before AllocFrame and "widen" it to include merge candidates so that we can do just a single liveness as opposed to two passes, however this may be difficult given that the merge-locals liveness has to take into account writes corresponding to dead stores. This patch also required a change to the way ssa.OpVarDef pseudo-ops are generated; prior to this point they would only be created for variables whose type included pointers; if stack slot merging is enabled then the ssagen code creates OpVarDef ops for all auto vars that are merge candidates. Note that some temporaries created late in the compilation process (e.g. during ssa backend) are difficult to reason about, especially in cases where we take the address of a temp and pass it to the runtime. For the time being we mark most of the vars created post-ssagen as "not a merge candidate". 
Stack slot merging for locals/autos is enabled by default if "-N" is not in effect, and can be disabled via "-gcflags=-d=mergelocals=0". Fixmes/todos/restrictions: - try lowering size restrictions - re-evaluate the various skips that happen in SSA-created autotmps Updates #62737. Updates #65532. Updates #65495. Change-Id: Ifda26bc48cde5667de245c8a9671b3f0a30bb45d Reviewed-on: https://go-review.googlesource.com/c/go/+/575415 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
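The candidate-forming rule described above (same size and GC shape for pointerful types, same size for pointer-free types, and a minimum size of more than 3*types.PtrSize) can be sketched roughly as follows. This is only an illustration: `autoVar`, `candidateKey`, `groupCandidates`, and the 8-byte pointer size are hypothetical stand-ins, not the compiler's actual data structures, and the liveness-driven partitioning step is not modeled.

```go
package main

import "fmt"

const ptrSize = 8 // assumption: a 64-bit target

// autoVar is a hypothetical stand-in for a compiler AUTO variable.
type autoVar struct {
	name    string
	size    int64
	gcShape string // "" means the type contains no pointers
}

// candidateKey returns the grouping key for v, or false if v is too small
// to be considered (size must exceed 3*ptrSize).
func candidateKey(v autoVar) (string, bool) {
	if v.size <= 3*ptrSize {
		return "", false
	}
	// Pointer-free variables share a key based on size alone; pointerful
	// variables must also agree on GC shape.
	return fmt.Sprintf("%d/%s", v.size, v.gcShape), true
}

// groupCandidates collects variable names into same-key candidate sets.
func groupCandidates(vars []autoVar) map[string][]string {
	groups := make(map[string][]string)
	for _, v := range vars {
		if k, ok := candidateKey(v); ok {
			groups[k] = append(groups[k], v.name)
		}
	}
	return groups
}

func main() {
	vars := []autoVar{
		{"a", 32, ""}, {"b", 32, ""}, // same size, no pointers
		{"c", 32, "ptr,scalar,scalar,scalar"}, // pointerful: separate set
		{"d", 16, ""}, // too small to be a candidate
	}
	for key, names := range groupCandidates(vars) {
		fmt.Println(key, names)
	}
}
```

Whether two variables in the same set can actually share a slot still depends on the additional liveness pass mentioned in the message.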
2024-03-30Revert "cmd/compile/internal: merge stack slots for selected local auto vars"Cuong Manh Le
This reverts CL 553055. Reason for revert: causes crypto/ecdsa failures on linux ppc64/s390x builders Change-Id: I9266b030693a5b6b1e667a009de89d613755b048 Reviewed-on: https://go-review.googlesource.com/c/go/+/575236 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Than McIntosh <thanm@google.com> Auto-Submit: Than McIntosh <thanm@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-03-29cmd/compile/internal: merge stack slots for selected local auto varsThan McIntosh
Preliminary compiler support for merging/overlapping stack slots of local variables whose access patterns are disjoint. This patch includes changes in AllocFrame to do the actual merging/overlapping based on information returned from a new liveness.MergeLocals helper. The MergeLocals helper identifies candidates by looking for sets of AUTO variables that either A) have the same size and GC shape (if types contain pointers), or B) have the same size (but potentially different types as long as those types have no pointers). Variables must be greater than (3*types.PtrSize) in size to be considered for merging. After forming candidates, MergeLocals collects variables into "can be overlapped" equivalence classes or partitions; this process is driven by an additional liveness analysis pass. Ideally it would be nice to move the existing stackmap liveness pass up before AllocFrame and "widen" it to include merge candidates so that we can do just a single liveness as opposed to two passes, however this may be difficult given that the merge-locals liveness has to take into account writes corresponding to dead stores. This patch also required a change to the way ssa.OpVarDef pseudo-ops are generated; prior to this point they would only be created for variables whose type included pointers; if stack slot merging is enabled then the ssagen code creates OpVarDef ops for all auto vars that are merge candidates. Note that some temporaries created late in the compilation process (e.g. during ssa backend) are difficult to reason about, especially in cases where we take the address of a temp and pass it to the runtime. For the time being we mark most of the vars created post-ssagen as "not a merge candidate". Stack slot merging for locals/autos is enabled by default if "-N" is not in effect, and can be disabled via "-gcflags=-d=mergelocals=0". Fixmes/todos/restrictions: - try lowering size restrictions - re-evaluate the various skips that happen in SSA-created autotmps Fixes #62737. 
Updates #65532. Updates #65495. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest Change-Id: Ibc22e8a76c87e47bc9fafe4959804d9ea923623d Reviewed-on: https://go-review.googlesource.com/c/go/+/553055 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-01-10cmd/compile: use hashed symbol name for go.shape types if too longThan McIntosh
Shape-based stenciling in the Go compiler's generic instantiation phase looks up shape types using the underlying type of a given target type. This has a beneficial effect in most cases (e.g. we can use the same shape type for two different named types whose underlying type is "int"), but causes some problems when the underlying type is a very large structure. The link string for the underlying type of a large imported struct can be extremely long, since the link string essentially enumerates the full package path for every field type; this can produce a "go.shape.struct { ... " symbol name that is absurdly long. This patch switches the compiler to use a hash of the underlying type link string instead of the string itself, which should continue to provide commoning but keep symbol name lengths reasonable for shape types based on large imported structs. Fixes #65030. Change-Id: I87d602626c43172beb99c186b8ef72327b8227a2 Reviewed-on: https://go-review.googlesource.com/c/go/+/554975 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Than McIntosh <thanm@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
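As a rough illustration of the idea, the sketch below keeps short link strings verbatim and substitutes a hash for long ones. The length cutoff, the choice of SHA-256, the 8-byte truncation, and the function name `shapeSymName` are all assumptions made for the example; the commit message does not specify them.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// maxLinkStringLen is a hypothetical cutoff; the real limit is not stated
// in the commit message.
const maxLinkStringLen = 64

// shapeSymName returns a symbol name for a shape type. Short link strings
// are used as-is; long ones are replaced by a hash so that identical
// underlying types still map to the same (short) name.
func shapeSymName(linkString string) string {
	if len(linkString) <= maxLinkStringLen {
		return "go.shape." + linkString
	}
	sum := sha256.Sum256([]byte(linkString))
	return fmt.Sprintf("go.shape.%x", sum[:8])
}

func main() {
	fmt.Println(shapeSymName("int"))
	long := "struct { A example.com/some/very/long/import/path.T; B example.com/some/very/long/import/path.U }"
	fmt.Println(shapeSymName(long))
}
```

Because the hash is a pure function of the link string, two instantiations over the same large struct still resolve to one shape symbol, which is the commoning property the message wants to preserve.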
2023-12-19cmd/compile: remove interfacecycles debug flagRobert Griesemer
Per the discussion on the issue, since no problems related to this appeared since Go 1.20, remove the ability to disable the check for anonymous interface cycles permanently. Adjust various tests accordingly. For #56103. Change-Id: Ica2b28752dca08934bbbc163a9b062ae1eb2a834 Reviewed-on: https://go-review.googlesource.com/c/go/+/550896 Run-TryBot: Robert Griesemer <gri@google.com> Auto-Submit: Robert Griesemer <gri@google.com> Reviewed-by: Robert Griesemer <gri@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2023-11-16cmd/compile: allow disable of PGO function value devirtualization with flagMichael Pratt
Extend the pgodevirtualize debug flag to distinguish interface and function devirtualization. Setting 1 keeps interface devirtualization enabled but disables function value devirtualization. For #64209. Change-Id: I33aa7eb95ca0bdb215256d8c7cc8f9dac53ae30e Reviewed-on: https://go-review.googlesource.com/c/go/+/543115 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-16cmd/compile/internal/inline: debug flag to alter score adjustmentsThan McIntosh
Add a debugging flag "-d=inlscoreadj" intended to support running experiments in which the inliner uses different score adjustment values for specific heuristics. The flag argument is a series of clauses separated by the "/" char where each clause takes the form "adjK:valK". For example, in this build go build -gcflags=-d=inlscoreadj=inLoopAdj:10/returnFeedsConstToIfAdj:-99 the "in loop" score adjustment would be reset to a value of 10 (effectively penalizing calls in loops) and the "return feeds constant to foldable if/switch" score adjustment would be boosted from -15 to -99. Change-Id: Ibd1ee334684af5992466556a69baa6dfefb246b3 Reviewed-on: https://go-review.googlesource.com/c/go/+/532116 Reviewed-by: Matthew Dempsky <mdempsky@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
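The clause syntax described above ("adjK:valK" clauses joined by "/") is simple enough to sketch a parser for. The function name `parseScoreAdj` and its error handling are hypothetical and are not taken from the compiler source.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseScoreAdj parses a string such as
// "inLoopAdj:10/returnFeedsConstToIfAdj:-99" into a name -> value map.
func parseScoreAdj(s string) (map[string]int, error) {
	adj := make(map[string]int)
	for _, clause := range strings.Split(s, "/") {
		name, val, ok := strings.Cut(clause, ":")
		if !ok || name == "" {
			return nil, fmt.Errorf("malformed clause %q", clause)
		}
		n, err := strconv.Atoi(val)
		if err != nil {
			return nil, fmt.Errorf("bad value in clause %q: %v", clause, err)
		}
		adj[name] = n
	}
	return adj, nil
}

func main() {
	adj, err := parseScoreAdj("inLoopAdj:10/returnFeedsConstToIfAdj:-99")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(adj["inLoopAdj"], adj["returnFeedsConstToIfAdj"])
}
```

A real implementation would presumably also reject adjustment names it does not recognize; that check is omitted here.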
2023-11-16cmd/compile: add compiler debug flag to disable range func iterator checkingDavid Chase
E.g. `GOEXPERIMENT=rangefunc go test -v -gcflags=-d=rangefunccheck=0 rangefunc_test.go` will turn off the checking and fail. The benchmarks, which do not use pathological iterators, run slightly faster. Change-Id: Ia3e175e86d67ef74bbae9bcc5d2def6a2cdf519d Reviewed-on: https://go-review.googlesource.com/c/go/+/541995 Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-09-19cmd/compile: add pgohash for debugging/bisecting PGO optimizationsCherry Mui
When a PGO build fails or produces an incorrect program, it is often unclear what the problem is. Add pgohash so we can bisect to individual optimization decisions, which often helps debugging. Related to #58153. Change-Id: I651ffd9c53bad60f2f28c8ec2a90a3f532982712 Reviewed-on: https://go-review.googlesource.com/c/go/+/528400 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2023-09-15cmd/compile/internal/inline: add callsite trace output debugging flagThan McIntosh
Add a new debug flag "-d=dumpinlcallsitescores" that dumps out a summary of all callsites in the package being compiled with info on inlining heuristics, for human consumption. Sample output lines:

    Score  Adjustment  Status    Callee                                                 CallerPos                  ScoreFlags
    ...
    115    40          DEMOTED   cmd/compile/internal/abi.(*ABIParamAssignment).Offset  expand_calls.go:1679:14|6  panicPathAdj
    ...
    76     -5          PROMOTED  runtime.persistentalloc                                mcheckmark.go:48:45|3      inLoopAdj
    ...
    201    0           --- PGO   unicode.DecodeRuneInString                             utf8.go:312:30|1
    ...
    7      -5          --- PGO   internal/abi.Name.DataChecked                          type.go:625:22|0           inLoopAdj

Here "Score" is the final score calculated for the callsite, "Adjustment" is the amount added to or subtracted from the original hairyness estimate to form the score. "Status" shows whether anything changed with the site -- did the adjustment bump it down just below the threshold ("PROMOTED") or instead bump it above the threshold ("DEMOTED") or did nothing happen as a result of the heuristics ("---"); "Status" also shows whether PGO was involved. "Callee" is the name of the function called, "CallerPos" is the position of the callsite, and "ScoreFlags" is a digest of the specific properties we used to make adjustments to callsite score via heuristics. Change-Id: Iea4b1cbfee038bc68df6ab81e9973f145636300b Reviewed-on: https://go-review.googlesource.com/c/go/+/513455 Reviewed-by: Matthew Dempsky <mdempsky@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
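One way to read the "Status" column is as a comparison of the callsite's cost before and after the adjustment against an inlining threshold. The sketch below encodes that reading. The function name `status`, the threshold of 80, and the exact boundary comparisons are assumptions chosen to be consistent with the sample rows; they are not taken from the compiler, and the "PGO" marker is ignored.

```go
package main

import "fmt"

// status classifies a callsite from its final score and the adjustment
// that produced it, relative to a threshold (assumed, not confirmed).
func status(score, adjustment, threshold int) string {
	original := score - adjustment // estimate before heuristics were applied
	switch {
	case original > threshold && score <= threshold:
		return "PROMOTED" // adjustment pulled the site under the threshold
	case original <= threshold && score > threshold:
		return "DEMOTED" // adjustment pushed the site over the threshold
	default:
		return "---" // heuristics did not change the outcome
	}
}

func main() {
	const threshold = 80 // hypothetical value used only for this example
	fmt.Println(status(115, 40, threshold))
	fmt.Println(status(76, -5, threshold))
	fmt.Println(status(201, 0, threshold))
	fmt.Println(status(7, -5, threshold))
}
```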
2023-09-11cmd/compile/internal/staticinit: make staticcopy safeMatthew Dempsky
Currently, cmd/compile optimizes `var a = true; var b = a` into `var a = true; var b = true`. But this may not be safe if we need to initialize any other global variables between `a` and `b`, and the initialization involves calling a user-defined function that reassigns `a`. This CL changes staticinit to keep track of the initialization expressions that we've seen so far, and to stop applying the staticcopy optimization once we've seen an initialization expression that might have modified another global variable within this package. To help identify affected initializers, this CL adds a -d=staticcopy flag to warn when a staticcopy is suppressed and turned into a dynamic copy. Currently, `go build -gcflags=all=-d=staticcopy std` reports only four instances: ``` encoding/xml/xml.go:1600:5: skipping static copy of HTMLEntity+0 with map[string]string{...} encoding/xml/xml.go:1869:5: skipping static copy of HTMLAutoClose+0 with []string{...} net/net.go:661:5: skipping static copy of .stmp_31+0 with poll.ErrNetClosing net/http/transport.go:2566:5: skipping static copy of errRequestCanceled+0 with ~R0 ``` Fixes #51913. Change-Id: Iab41cf6f84c44f7f960e4e62c28a8aeaade4fbcf Reviewed-on: https://go-review.googlesource.com/c/go/+/395541 Auto-Submit: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Heschi Kreinick <heschi@google.com>
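A small program makes the hazard concrete. In this sketch (the names `clobber` and `c` are invented for illustration), the initializer of `c` runs between `a` and `b` and reassigns `a`, so under the language's package initialization order `b` must observe the updated value rather than the constant `true`.

```go
package main

import "fmt"

var a = true
var c = clobber() // initialized after a and before b; reassigns a
var b = a         // must be a dynamic copy of a's current value

// clobber is a user-defined function that modifies another global.
func clobber() int {
	a = false
	return 0
}

func main() {
	// With the static copy suppressed, b reflects the reassignment made
	// by clobber instead of a's original initializer.
	fmt.Println(a, b, c)
}
```

Building a package like this with `-gcflags=-d=staticcopy` should, per the message above, be one way to see whether the compiler reports a suppressed static copy for `b`.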
2023-08-18cmd/compile: restore zero-copy string->[]byte optimizationMatthew Dempsky
This CL implements the remainder of the zero-copy string->[]byte conversion optimization initially attempted in go.dev/cl/520395, but fixes the tracking of mutations due to ODEREF/ODOTPTR assignments, and adds more comprehensive tests that I should have included originally. However, this CL also keeps it behind the -d=zerocopy flag. The next CL will enable it by default (for easier rollback). Updates #2205. Change-Id: Ic330260099ead27fc00e2680a59c6ff23cb63c2b Reviewed-on: https://go-review.googlesource.com/c/go/+/520599 Auto-Submit: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2023-08-10cmd/compile/internal/inline: add framework to compute func "properties"Than McIntosh
Add some machinery to support computing function "properties" for use in driving inlining heuristics, and a unit testing framework to check to see if the property computations are correct for a given set of canned Go source files. This CL is mainly the analysis skeleton and a testing framework; the code to compute the actual props will arrive in a later patch. Updates #61502. Change-Id: I7970b64f713d17d7fdd7e8e9ccc7d9b0490571bf Reviewed-on: https://go-review.googlesource.com/c/go/+/511557 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Than McIntosh <thanm@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-22cmd/compile: enable PGO-driven call devirtualizationMichael Pratt
This CL is originally based on CL 484838 from rajbarik@uber.com. Add a new PGO-based devirtualize pass. This pass conditionally devirtualizes interface calls for the hottest callee. That is, it performs a transformation like: type Iface interface { Foo() } type Concrete struct{} func (Concrete) Foo() {} func foo(i Iface) { i.Foo() } to: func foo(i Iface) { if c, ok := i.(Concrete); ok { c.Foo() } else { i.Foo() } } The primary benefit of this transformation is enabling inlining of the direct calls. Today this change has no impact on the escape behavior, as the fallback interface always forces an escape. But improving escape analysis to take advantage of this is an area of potential work. This CL is the bare minimum of a devirtualization implementation. There are still numerous limitations: * Callees not directly referenced in the current package can be missed (even if they are in the transitive dependences). * Callees not in the transitive dependencies of the current package are missed. * Only interface method calls are supported, not other indirect function calls. * Multiple calls to compatible interfaces on the same line cannot be distinguished and will use the same callee target. * Callees that only partially implement an interface (they are embedded in another type that completes the interface) cannot be devirtualized. * Others, mentioned in TODOs. Fixes #59959 Change-Id: I8bedb516139695ee4069650b099d05957b7ce5ee Reviewed-on: https://go-review.googlesource.com/c/go/+/492436 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
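The transformation above can be written out by hand to check that it preserves behavior. In this sketch `Foo` returns a string (unlike the example in the message) purely so the two versions can be compared; `Other`, `callIndirect`, and `callDevirtualized` are names invented for the illustration.

```go
package main

import "fmt"

type Iface interface{ Foo() string }

type Concrete struct{}

func (Concrete) Foo() string { return "Concrete" }

// Other is a second implementation that takes the fallback path.
type Other struct{}

func (Other) Foo() string { return "Other" }

// callIndirect is the original form: a plain interface method call.
func callIndirect(i Iface) string { return i.Foo() }

// callDevirtualized is the hand-written equivalent of the transformed code:
// a type test for the hottest callee, with the interface call as fallback.
func callDevirtualized(i Iface) string {
	if c, ok := i.(Concrete); ok {
		return c.Foo() // direct call, now a candidate for inlining
	}
	return i.Foo()
}

func main() {
	for _, i := range []Iface{Concrete{}, Other{}} {
		fmt.Println(callIndirect(i), callDevirtualized(i))
	}
}
```

Both paths produce the same result for every implementation; the benefit is only that the hot case becomes a direct call.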
2023-05-22cmd/compile: replace -d=pgoinline with -d=pgodebugMichael Pratt
We will soon have PGO specialization. It doesn't make sense for the debug flag to have inline in the name, so rename it to pgodebug. pgoinline is now a flag that can be used to disable PGO inlining. Devirtualization will have a similar debug flag. For #59959. Change-Id: I9770ff1f0d132dfa3cd417018a887a1bd5555bba Reviewed-on: https://go-review.googlesource.com/c/go/+/494716 Auto-Submit: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-05-11cmd/compile: remove debugging option InlineSCCOnePass from inlinerThan McIntosh
Delete the "InlineSCCOnePass" debugging flag and the inliner fallback code that kicks in if it is used. The change it was intended to guard has been working on tip for some time, no need for the fallback any more. Updates #58905. Change-Id: I2e1dbc7640902d9402213db5ad338be03deb96c5 Reviewed-on: https://go-review.googlesource.com/c/go/+/492015 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Than McIntosh <thanm@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-03-09cmd/compile: reorder operations in SCCs to enable more inliningThan McIntosh
This patch changes the relative order of "CanInline" and "InlineCalls" operations within the inliner for clumps of functions corresponding to strongly connected components in the call graph. This helps increase the amount of inlining within SCCs, particularly in Go's runtime package, which has a couple of very large SCCs. For a given SCC of the form { fn1, fn2, ... fnk }, the inliner would (prior to this point) walk through the list of functions and for each function first compute inlinability ("CanInline") and then perform inlining ("InlineCalls"). This meant that if there was an inlinable call from fn3 to fn4 (for example), this call would never be inlined, since at the point fn3 was visited, we would not have computed inlinability for fn4. We now do inlinability analysis for all functions in an SCC first, then do actual inlining for everything. This results in 47 additional inlines in the Go runtime package (a fairly modest increase percentage-wise of 0.6%). Updates #58905. Change-Id: I48dbb1ca16f0b12f256d9eeba8cf7f3e6dd853cd Reviewed-on: https://go-review.googlesource.com/c/go/+/474955 Run-TryBot: Than McIntosh <thanm@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
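The effect of the reordering can be shown with a toy model. Everything here (`fn`, `countInlines`, the boolean `inlinable` field) is a simplification invented for this example; the real CanInline and InlineCalls operate on compiler IR and budgets, not on a precomputed flag.

```go
package main

import "fmt"

// fn is a toy function node within one strongly connected component.
type fn struct {
	name      string
	inlinable bool     // what CanInline would conclude for this function
	calls     []string // callees within the same SCC
}

// countInlines walks the SCC either interleaved (old order) or in two
// phases (new order) and counts how many calls get inlined.
func countInlines(scc []fn, twoPhase bool) int {
	canInline := make(map[string]bool)
	inlined := 0
	inlineCalls := func(f fn) {
		for _, callee := range f.calls {
			if canInline[callee] {
				inlined++
			}
		}
	}
	if twoPhase {
		for _, f := range scc { // phase 1: inlinability for everything
			canInline[f.name] = f.inlinable
		}
		for _, f := range scc { // phase 2: perform inlining
			inlineCalls(f)
		}
		return inlined
	}
	for _, f := range scc { // old order: analyze and inline per function
		canInline[f.name] = f.inlinable
		inlineCalls(f)
	}
	return inlined
}

func main() {
	scc := []fn{
		{"fn3", true, []string{"fn4"}},
		{"fn4", true, []string{"fn3"}},
	}
	fmt.Println("interleaved:", countInlines(scc, false))
	fmt.Println("two-phase:  ", countInlines(scc, true))
}
```

In the interleaved order the call from fn3 to fn4 is missed because fn4 has not been analyzed yet when fn3 is visited, which is the case the message describes.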
2023-03-09cmd/compile: remove -wrapglobalmapinit flagThan McIntosh
Remove the compiler's "-wrapglobalmapinit" flag; it is potentially confusing for users and isn't appropriate as a top level flag. Move the enable/disable control to the "wrapglobalmapctl" debug flag (values: 0 on by default, 1 disabled, 2 stress mode). No other changes to compiler functionality. Change-Id: I0d120eaf90ee34e29d5032889e673d42fe99e5dc Reviewed-on: https://go-review.googlesource.com/c/go/+/475035 Run-TryBot: Than McIntosh <thanm@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-03-06cmd/compile: experimental loop iterator capture semantics changeDavid Chase
Adds: GOEXPERIMENT=loopvar (expected way of invoking) -d=loopvar={-1,0,1,2,11,12} (for per-package control and/or logging) -d=loopvarhash=... (for hash debugging) loopvar=11,12 are for testing, benchmarking, and debugging. If enabled, for loops of the form `for x,y := range thing`, if x and/or y are addressed or captured by a closure, are transformed by renaming x/y to a temporary and prepending an assignment to the body of the loop x := tmp_x. This changes the loop semantics by making each iteration's instance of x be distinct from the others (currently they are all aliased, and when this matters, it is almost always a bug). 3-clause for loops with captured iteration variables are also transformed, though it is a more complex transformation. "Optimized" to do a simpler transformation for 3-clause for loops where the increment is empty. (Prior optimization of address-taking under Return disabled, because it was incorrect; returns can have loops for children. Restored in a later CL.) Includes support for -d=loopvarhash=<binary string> intended for use with hash search and GOCOMPILEDEBUG=loopvarhash=<binary string> (use `gossahash -e loopvarhash command-that-fails`). Minor feature upgrades to hash-triggered features; clients can specify that file-position hashes use only the most-inlined position, and/or that they use only the basenames of source files (not the full directory path). Most-inlined is the right choice for debugging loop-iteration change once the semantics are linked to the package across inlining; basename-only makes it tractable to write tests (which, otherwise, depend on the full pathname of the source file and thus vary). Updates #57969. Change-Id: I180a51a3f8d4173f6210c861f10de23de8a1b1db Reviewed-on: https://go-review.googlesource.com/c/go/+/411904 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
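The range-loop rewrite can be imitated by hand. To keep the result independent of which loop semantics the toolchain uses, this sketch declares the loop variable outside the loop (so it is explicitly shared) and then applies the "rename to a temporary and prepend x := tmp_x" step manually; the function names are invented for the example.

```go
package main

import "fmt"

// sharedVar models the old semantics: a single x is shared by all
// iterations, so every closure sees the last value.
func sharedVar(xs []int) []int {
	var fns []func() int
	var x int
	for _, x = range xs {
		fns = append(fns, func() int { return x })
	}
	return callAll(fns)
}

// perIteration applies the transformation by hand: the loop variable is
// renamed to tmpX and a fresh x := tmpX is prepended to the body.
func perIteration(xs []int) []int {
	var fns []func() int
	var tmpX int
	for _, tmpX = range xs {
		x := tmpX // distinct instance of x for this iteration
		fns = append(fns, func() int { return x })
	}
	return callAll(fns)
}

func callAll(fns []func() int) []int {
	out := make([]int, 0, len(fns))
	for _, f := range fns {
		out = append(out, f())
	}
	return out
}

func main() {
	xs := []int{10, 20, 30}
	fmt.Println(sharedVar(xs))    // [30 30 30]
	fmt.Println(perIteration(xs)) // [10 20 30]
}
```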
2023-02-06cmd/compile: enable deadcode of unreferenced large global mapsThan McIntosh
This patch changes the compiler's pkg init machinery to pick out large initialization assignments to global maps (e.g. var mymap = map[string]int{"foo":1, "bar":2, ... }) and extract the map init code into a separate outlined function, which is then called from the main init function with a weak relocation: var mymap map[string]int // KEEP reloc -> map.init.0 func init() { map.init.0() // weak relocation } func map.init.0() { mymap = map[string]int{"foo":1, "bar":2} } The map init outlining is done selectively (only in the case where the RHS code exceeds a size limit of 20 IR nodes). In order to ensure that a given map.init.NNN function is included when its corresponding map is live, we add a dummy R_KEEP relocation from the map variable to the map init function. This first patch includes the main compiler changes, with the weak relocation addition disabled. A subsequent patch includes the required linker changes along with switching the call to the outlined routine to a weak relocation. See the later linker change for associated compile time performance numbers. Updates #2559. Updates #36021. Updates #14840. Change-Id: I1fd6fd6397772be1ebd3eb397caf68ae9a3147e9 Reviewed-on: https://go-review.googlesource.com/c/go/+/461315 Run-TryBot: Than McIntosh <thanm@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-01-26cmd/compile/internal/types: remove Markdcl/Pushdcl/PopdclMatthew Dempsky
Sym.Def used to be used for symbol resolution during the old (pre-types2) typechecker. But since moving to types2-based IR construction, we haven't really had a need for Sym.Def to ever refer to anything but the package-scope definition, because types2 handles symbol resolution for us. This CL finally removes the Markdcl/Pushdcl/Popdcl functions that have been a recurring source of issues in the past. Change-Id: I2b012a0f17203efdd724ebd1e9314bd128cc2d61 Reviewed-on: https://go-review.googlesource.com/c/go/+/458625 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Robert Griesemer <gri@google.com>
2023-01-26cmd/compile: remove -d=typecheckinl flagMatthew Dempsky
This flag forced the compiler to eagerly type check all available inline function bodies, which presumably was useful in the early days of implementing inlining support. However, it shouldn't have any significance with the unified frontend, since the same code paths are used for constructing normal function bodies as for inlining. Updates #57410. Change-Id: I6842cf86bcd0fbf22ac336f2fc0b7b8fe14bccca Reviewed-on: https://go-review.googlesource.com/c/go/+/458617 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-01-25cmd: remove GOEXPERIMENT=nounified knobMatthew Dempsky
This CL removes the GOEXPERIMENT=nounified knob, and any conditional statements that depend on that knob. Further CLs to remove unreachable code follow this one. Updates #57410. Change-Id: I39c147e1a83601c73f8316a001705778fee64a91 Reviewed-on: https://go-review.googlesource.com/c/go/+/458615 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-11-23cmd/compile: reenable inlstaticinitRuss Cox
This was disabled in CL 452676 out of an abundance of caution, but further analysis has shown that the failures were not being caused by this optimization. Instead the sequence of commits was: CL 450136 cmd/compile: handle simple inlined calls in staticinit ... CL 449937 archive/tar, archive/zip: return ErrInsecurePath for unsafe paths ... CL 451555 cmd/compile: fix static init for inlined calls The failures in question became compile failures in the first CL and started building again after the last CL. But in the interim the code had been broken by the middle CL. CL 451555 was just the first time that the tests could run and fail. For #30820. Change-Id: I65064032355b56fdb43d9731be2f9f32ef6ee600 Reviewed-on: https://go-review.googlesource.com/c/go/+/452817 Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Russ Cox <rsc@golang.org> Auto-Submit: Russ Cox <rsc@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-11-22cmd/compile: add -d=inlstaticinit debug flagMatthew Dempsky
This CL adds -d=inlstaticinit to control whether static initialization of inlined function calls (added in CL 450136) is allowed. We've needed to fix it once already (CL 451555) and Google-internal testing is hitting additional failure cases, so putting this optimization behind a feature flag seems appropriate regardless. While we diagnose and fix the remaining cases, this CL also disables the optimization to avoid miscompilations. Updates #56894. Change-Id: If52a358ad1e9d6aad1c74fac5a81ff9cfa5a3793 Reviewed-on: https://go-review.googlesource.com/c/go/+/452676 Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Matthew Dempsky <mdempsky@google.com>
2022-11-21cmd/compile: reject anonymous interface cyclesMatthew Dempsky
This CL changes cmd/compile to reject anonymous interface cycles like: type I interface { m() interface { I } } We don't anticipate any users to be affected by this change in practice. Nonetheless, this CL also adds a `-d=interfacecycles` compiler flag to suppress the error. And assuming no issue reports from users, we'll move the check into go/types and types2 instead. Updates #56103. Change-Id: I1f1dce2d7aa19fb388312cc020e99cc354afddcb Reviewed-on: https://go-review.googlesource.com/c/go/+/445598 Run-TryBot: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Robert Griesemer <gri@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Matthew Dempsky <mdempsky@google.com>
2022-11-15cmd/compile: add testing-flag guard to package-is-collected assertDavid Chase
On advice of the department of garbage collection, forcing a garbage collection generally does not improve performance. However, this-data-is-now-unreachable is a good property to be able to test, and that requires finalizers and a forced GC. So, to save build time, this test was removed from the compiler itself, but to verify the property, it was added to the fma_test (and the end-to-end dependence on the flag was tested with an inserted failure in testing the test). TODO: also turn on the new -d=gccheck=1 debug flag on the ssacheck builder. Benchmarking reveals that it is profitable to avoid this GC, with about 1.5% reduction in both user and wall time. (48 p) https://perf.golang.org/search?q=upload:20221103.3 (12 p) https://perf.golang.org/search?q=upload:20221103.5 Change-Id: I4c4816d619735838a32388acf0cc5eb1cd5f0db5 Reviewed-on: https://go-review.googlesource.com/c/go/+/447359 Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2022-11-10cmd/compile: adjust PGO inlining default parametersCherry Mui
Adjust PGO inlining default parameters to 99% CDF threshold and 2000 budget. Benchmark results (mostly from Sweet) show that this set of parameters performs reasonably well, with a few percent speedup at the cost of a few percent binary size increase. Also rename the debug flags to start with "pgo", to make it clear that they are related to PGO. Change-Id: I0749249b1298d1dc55a28993c37b3185f9d7639d Reviewed-on: https://go-review.googlesource.com/c/go/+/449477 Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-11-07cmd/compile: let compiler downgrade its own concurrencyDavid Chase
This gets the Go command out of the business of thinking it understands compiler debug flags, and allows the compiler to turn down its worker concurrency instead of failing and forcing the user to do the very same thing. Debug flags that are obviously safe for concurrency (at least to me) are tagged; probably there's more. Change-Id: I59bb19861d8a654a9cfd2364ee78c8628212f82e Reviewed-on: https://go-review.googlesource.com/c/go/+/448359 Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
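One plausible shape for "tagging" flags is a struct tag that marks a flag as safe for concurrent compilation, checked by reflection over whichever flags are set. The sketch below assumes a tag named `concurrent` with value "ok" and uses an invented `debugFlags` struct and `concurrencySafe` helper; it illustrates the idea and is not a copy of the compiler's code.

```go
package main

import (
	"fmt"
	"reflect"
)

// debugFlags is a hypothetical flag set. Only fields tagged
// `concurrent:"ok"` are considered safe with multiple workers.
type debugFlags struct {
	Checkptr int `concurrent:"ok"`
	Export   int // untagged: setting it forces single-worker mode
	Nil      int `concurrent:"ok"`
}

// concurrencySafe reports whether every flag that has a non-zero value
// carries the concurrent:"ok" tag.
func concurrencySafe(flags any) bool {
	v := reflect.ValueOf(flags)
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		if v.Field(i).IsZero() {
			continue // flag not set; irrelevant
		}
		if t.Field(i).Tag.Get("concurrent") != "ok" {
			return false
		}
	}
	return true
}

func main() {
	workers := 4
	flags := debugFlags{Export: 1}
	if !concurrencySafe(flags) {
		workers = 1 // downgrade instead of failing
	}
	fmt.Println("workers:", workers)
}
```

The design point from the message is the fallback: an unsafe flag lowers the worker count rather than producing an error the user has to work around.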
2022-11-03cmd/compile: if GOGC is not set, temporarily boost it for rapid starting heap growthDavid Chase
Benchmarking suggests about a 14-17% reduction in user build time, about 3.5-7.8% reduction for wall time. This helps most builds because small packages are common. Latest benchmarks (after the last round of improvement): (12 processors) https://perf.golang.org/search?q=upload:20221102.20 (GOMAXPROCS=2) https://perf.golang.org/search?q=upload:20221103.1 (48 processors) https://perf.golang.org/search?q=upload:20221102.19 (The number of compiler workers is capped at min(4, GOMAXPROCS)) An earlier, similar version of this CL at one point observed a 27% reduction in user build time (building 40+ benchmarks, 20 times), but the current form is judged to be the most reliable; it may be profitable to tweak the numbers slightly later, and/or to adjust the number of compiler workers. We've talked about doing this in the past, the "new"(ish) metrics package makes it a more tractable proposition. The method here is: 1. If os.Getenv("GOGC") is empty, then increase GOGC to a large value, calculated to grow the heap to 32 + 4 * compile_parallelism before a GC occurs (e.g., on a >= 4 processor box, 64M). In practice, sometimes GC occurs before that, but this still results in fewer GCs and saved time. This is "heap goal". 2. Use a finalizer to approximately detect when GC occurs, and use runtime metrics to track progress towards the goal heap size, readjusting GOGC to retarget it as necessary. Reset GOGC to 100 when the heap is "close enough" to the goal. One feared failure mode of doing this is that the finalizer will be slow to run and the heap will grow exceptionally large before GOGC is reset; I monitored the heap size at reset and exit across several boxes with a variety of processor counts and extra noise (including several builds in parallel, including a laptop with a busy many-tabs browser running) and overshoot effectively does not occur. 
In some cases the compiler's heap grows so rapidly that estimated live exceeds the GC goal, but this is not delayed-finalizer overshoot; the compiler is just using that much memory. In a small number of cases (3% of GCs in make.bash) the new goal is larger than predicted by as much as 38%, so check for that and redo the adjustment. I considered instead using the maximum heap size limit + GC-detecting-finalizer + reset instead, but to me that seemed like it might have a worse bad-case outcome; if the reset is delayed, it's possible the GC would start running frequently, making it harder to run the finalizer, reach 50% utilization, and the extra GCs would lose the advantage. This might also perform badly in the case that a rapidly growing heap outruns goal. In practice, this sort of overshoot hasn't been observed, and a goal of 64M is small enough to tolerate plenty of overshoot anyway. This version of the CL includes a comment urging anyone who sees the code and thinks it would work for them, to update a bug (to be created if the CL is approved) with information about their situation/experience, so that we may consider creating some more official and reliable way of obtaining the same result. Change-Id: I45df1c927c1a7d7503ade1abd1a3300e27516633 Reviewed-on: https://go-review.googlesource.com/c/go/+/436235 Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2022-11-03 cmd/compile: add debug-hash flag for fused-multiply-add  David Chase
This adds a -d debug flag "fmahash" for hashcode search for floating point architecture-dependent problems. This variable has no effect on architectures w/o fused-multiply-add.

This was rebased onto the GOSSAHASH renovation so that this could have its own dedicated environment variable, and so that it would be cheap (a nil check) to check it in the normal case.

Includes a basic test of the trigger plumbing.

Sample use (on arm64, ppc64le, s390x):

    % GOCOMPILEDEBUG=fmahash=001110110 \
      go build -o foo cmd/compile/internal/ssa/testdata/fma.go
    fmahash triggered main.main:24 101111101101111001110110
    GOFMAHASH triggered main.main:20 010111010000101110111011
    1.0000000000000002 1.0000000000000004 -2.220446049250313e-16
    exit status 1

The intended use is in conjunction with github.com/dr2chase/gossahash, which will probably acquire a flag "-fma" to streamline its use. This tool+use was inspired by an ad hoc use of this technique "in anger" to debug this very problem.

This is also a dry-run for using this same technique to identify code sensitive to loop variable lifetime/capture, should we make that change.

Example intended use, with current search tool (using old environment variable), for a test example:

    gossahash -e GOFMAHASH GOMAGIC=GOFMAHASH go run fma.go
    Trying go args=[...], env=[GOFMAHASH=1 GOMAGIC=GOFMAHASH]
    go failed (81 distinct triggers): exit status 1
    Trying go args=[...], env=[GOFMAHASH=11 GOMAGIC=GOFMAHASH]
    go failed (39 distinct triggers): exit status 1
    Trying go args=[...], env=[GOFMAHASH=011 GOMAGIC=GOFMAHASH]
    go failed (18 distinct triggers): exit status 1
    Trying go args=[...], env=[GOFMAHASH=0011 GOMAGIC=GOFMAHASH]
    Trying go args=[...], env=[GOFMAHASH=1011 GOMAGIC=GOFMAHASH]
    ...
    Trying go args=[...], env=[GOFMAHASH=0110111011 GOMAGIC=GOFMAHASH]
    Trying go args=[...], env=[GOFMAHASH=1110111011 GOMAGIC=GOFMAHASH]
    go failed (2 distinct triggers): exit status 1
    Trigger string is 'GOFMAHASH triggered math.qzero:427 111111101010011110111011', repeated 6 times
    Trigger string is 'GOFMAHASH triggered main.main:20 010111010000101110111011', repeated 1 times
    Trying go args=[...], env=[GOFMAHASH=01110111011 GOMAGIC=GOFMAHASH]
    go failed (1 distinct triggers): exit status 1
    Trigger string is 'GOFMAHASH triggered main.main:20 010111010000101110111011', repeated 1 times
    Review GSHS_LAST_FAIL.0.log for failing run
    FINISHED, suggest this command line for debugging:
    GOSSAFUNC='main.main:20 010111010000101110111011' \
      GOFMAHASH=01110111011 GOMAGIC=GOFMAHASH go run fma.go

Change-Id: Ifa22dd8f1c37c18fc8a4f7c396345a364bc367d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/394754
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
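The search output above narrows the failure by growing a binary pattern one bit at a time and checking which trigger hashes end with it. A minimal sketch of that suffix test follows, using the two hash strings printed in the log. The hashBits helper is purely illustrative: it uses FNV-1a and a "name:param" key as stand-ins, which is an assumption and not the hash the compiler actually computes.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// hashBits maps a function name plus an integer parameter (e.g. a line
// number) to a 24-character binary string. FNV-1a and the "name:param" key
// are assumptions made for this sketch only.
func hashBits(name string, param int) string {
	h := fnv.New32a()
	fmt.Fprintf(h, "%s:%d", name, param)
	return fmt.Sprintf("%024b", h.Sum32()&0xFFFFFF)
}

// matches reports whether a site with the given hash string is selected by
// pattern, i.e. whether the hash ends with the pattern's bits.
func matches(hash, pattern string) bool {
	return strings.HasSuffix(hash, pattern)
}

func main() {
	// Hash strings copied from the trigger lines in the log above.
	mainHash := "010111010000101110111011"  // main.main:20
	qzeroHash := "111111101010011110111011" // math.qzero:427

	for _, pattern := range []string{"1110111011", "01110111011"} {
		fmt.Printf("%s main.main=%v math.qzero=%v\n",
			pattern, matches(mainHash, pattern), matches(qzeroHash, pattern))
	}
	fmt.Println("hypothetical hash:", hashBits("main.main", 20))
}
```

With the 10-bit pattern both sites match, which corresponds to the "2 distinct triggers" step; adding one more bit leaves only main.main:20, which is how the search isolates a single site.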
2022-11-03 cmd/compile: use CDF to determine PGO inline threshold  Cherry Mui
Currently in PGO we use a percentage threshold to determine if a callsite is hot. This CL uses a different method -- treating the hottest callsites that make up cumulatively top X% of total edge weights as hot (X=95 for now). This default might work better for a wider range of profiles. (The absolute threshold can still be changed by a flag.) For #55022. Change-Id: I7e3b6f0c3cf23f9a89dd5994c10075b498bf14ee Reviewed-on: https://go-review.googlesource.com/c/go/+/447016 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
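One way to read the CDF method described above: sort the call edges from heaviest to lightest, accumulate their weights, and treat the weight of the edge at which the running total first reaches X% of the total as the hot threshold. The sketch below illustrates that reading only; the function name and the sample weights are made up for the example and are not taken from the compiler.

```go
package main

import (
	"fmt"
	"sort"
)

// hotThresholdFromCDF returns the smallest edge weight that still counts as
// hot. Edges are visited from heaviest to lightest, and the threshold is the
// weight of the edge at which the cumulative weight first reaches pct percent
// of the total weight.
func hotThresholdFromCDF(weights []int64, pct float64) int64 {
	if len(weights) == 0 {
		return 0
	}
	sorted := append([]int64(nil), weights...) // leave the caller's slice alone
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })

	var total int64
	for _, w := range sorted {
		total += w
	}
	target := float64(total) * pct / 100

	var cum int64
	for _, w := range sorted {
		cum += w
		if float64(cum) >= target {
			return w
		}
	}
	return sorted[len(sorted)-1]
}

func main() {
	// Made-up edge weights summing to 1000.
	weights := []int64{30, 500, 5, 300, 15, 100, 50}
	fmt.Println("hot threshold:", hotThresholdFromCDF(weights, 95))
}
```

With these weights the running totals are 500, 800, 900, 950, so the four heaviest edges cover 95% of the total and the threshold is 50. Unlike a fixed percentage cutoff, the result adapts to how concentrated the profile is.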
2022-11-02 cmd/compile: use edge weights to decide inlineability in PGO  Cherry Mui
Currently, with PGO, the inliner uses node weights to decide if a function is inlineable (with a larger budget). But the actual inlining is determined by the weight of the call edge. There is a discrepancy: if a callee node is hot but the call edge is not, it would not be inlined, and marking the callee inlineable would be of no use. Instead of using two kinds of weights, we just use the edge weights to decide inlineability. If a function is the callee of a hot call edge, its inlineability is determined with a larger threshold. A function that exceeds the regular inlining budget is still inlined only when the call edge is hot, as it would exceed the regular inlining cost for non-hot call sites, even if it is marked inlineable. For #55022. Change-Id: I93fa9919fc6bcbb394e6cfe54ec96a96eede08f7 Reviewed-on: https://go-review.googlesource.com/c/go/+/447015 Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
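The two decisions described above (whether a function may be marked inlineable, and whether a particular call site is inlined) can be sketched as a pair of predicates. The budget values and function names here are hypothetical placeholders chosen for illustration; they are not the compiler's actual constants or API.

```go
package main

import "fmt"

const (
	regularBudget = 80  // hypothetical regular inlining budget
	hotBudget     = 160 // hypothetical larger budget for callees of hot edges
)

// canMarkInlineable decides whether a function with the given cost may be
// marked inlineable. A function that is the callee of at least one hot call
// edge is judged against the larger budget.
func canMarkInlineable(cost int, calleeOfHotEdge bool) bool {
	if calleeOfHotEdge {
		return cost <= hotBudget
	}
	return cost <= regularBudget
}

// shouldInlineAt decides whether to inline at one specific call site. A
// callee over the regular budget is inlined only when this edge is hot.
func shouldInlineAt(cost int, edgeHot bool) bool {
	if cost <= regularBudget {
		return true
	}
	return edgeHot && cost <= hotBudget
}

func main() {
	const cost = 120 // over the regular budget, under the hot budget
	fmt.Println("marked inlineable:", canMarkInlineable(cost, true))
	fmt.Println("inlined at hot edge:", shouldInlineAt(cost, true))
	fmt.Println("inlined at cold edge:", shouldInlineAt(cost, false))
}
```

Both predicates key off the same edge property, so a function is never marked inlineable under the larger budget without at least one call site that can use it.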
2022-11-02 cmd/compile: renovate GOSSAHASH  David Chase
Randomized feature enable/disable might be something we use to help users debug any problems with changed loop variable capture, and there's another CL that would like to use it to help in locating places where "fused" multiply add instructions change program behavior.

This CL:
- adds the ability to include an integer parameter (e.g. line number)
- replumbs the environment variable into a flag to simplify go build cache management
- but adds an environment variable to allow flag setting through the environment
- which adds the possibility of switching on a different variable (if there's one built-in for variable capture, it shouldn't be GOSSAHASH)
- cleans up the checking code
- adds tests for all the intended behavior
- removes the case for GSHS_LOGFILE; TBD whether we'll need to put that back or if there is another way.

Change-Id: I8503e1bb3dbc4a743aea696e04411ea7ab884787
Reviewed-on: https://go-review.googlesource.com/c/go/+/443063
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: David Chase <drchase@google.com>
2022-10-31 cmd/compile: add ability to indicate 'concurrentOk' for debug flags  David Chase
Also removes no-longer-needed "Any" field from compiler's DebugFlags. Test/use case for this is the fmahash CL. Change-Id: I214f02c91f30fc2ce53caf75fa5e2b905dd33429 Reviewed-on: https://go-review.googlesource.com/c/go/+/445495 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: David Chase <drchase@google.com>
2022-10-28 cmd/compile: Enables PGO in Go and performs profile-guided inlining  Raj Barik
For #55022 Change-Id: I51f1ba166d5a66dcaf4b280756be4a6bf9545c5e Reviewed-on: https://go-review.googlesource.com/c/go/+/429863 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com>