aboutsummaryrefslogtreecommitdiff
path: root/src/cmd/compile/internal/base
AgeCommit message (Collapse)Author
2026-03-18cmd/compile: allow halt flag to run concurrentlyDaniel Morsing
During the investigation for #78081, I found it helpful to use the -h flag to get a compiler core dump. I don't see any reason why it would need to clamp concurrency down. We have pretty good support for not interleaving log lines and if people really need it, they can always pass -c=1. Change-Id: Ic6425fc0da63e7ac42e0821f3d40ec4c6a6a6964 Reviewed-on: https://go-review.googlesource.com/c/go/+/756321 Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Daniel Morsing <daniel.morsing@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Griesemer <gri@google.com>
2026-03-02cmd/compile: add concurrency-ok property to some compiler debug flagsDavid Chase
The property was missing from some flags that we potentially care about. This change only affects flags supplied in the command line (not explicitly changed within the compiler) but nonetheless it seems like a good idea to get them right -- we might adjust them on the command line, someone might read the code and wonder why they're unset. Change-Id: I44812ddea640b71c078594317ef3506ab055a37b Reviewed-on: https://go-review.googlesource.com/c/go/+/452876 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2026-02-21cmd/compile: add flag for emitting diagnostics unsortedDaniel Morsing
The prove pass uses warnings with locations that are sorted before they are emitted. When debugging on level 3, this sorting can obscure which log lines happened before each other, obscuring the flow of control. This gets doubly confusing when debugging issues that include use of the inliner, where 2 related log lines can be present at the far ends of the output. Make the sorting optional. This allows both the errorcheck tests to keep working and for developers to use the logs as printf-style debugging. Change-Id: I1c1d88696e830da71eeeca8e324ef9856a6a6964 Reviewed-on: https://go-review.googlesource.com/c/go/+/742341 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Griesemer <gri@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2026-02-03cmd/compile: add astdump debug flagDavid Chase
This was extraordinarily useful for inlining work. I have cleaned it up somewhat, and did some additional tweaks after working on changes to bloop. -gcflags=-d=astdump=SomeFunc -gcflags=-d=astdump=SomeSubPkg.SomeFunc -gcflags=-d=astdump=Some/Pkg.SomeFunc -gcflags=-d=astdump=~YourRegExpHere Change-Id: I3f98601ca96c87d6b191d4b64b264cd236e6d8bf Reviewed-on: https://go-review.googlesource.com/c/go/+/629775 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2025-11-26cmd/compile: introduce alias analysis and automatically free non-aliased ↵thepudds
memory after growslice This CL is part of a set of CLs that attempt to reduce how much work the GC must do. See the design in https://go.dev/design/74299-runtime-freegc This CL updates the compiler to examine append calls to prove whether or not the slice is aliased. If proven unaliased, the compiler automatically inserts a call to a new runtime function introduced with this CL, runtime.growsliceNoAlias, which frees the old backing memory immediately after slice growth is complete and the old storage is logically dead. Two append benchmarks below show promising results, executing up to ~2x faster and up to factor of ~3 memory reduction with this CL. The approach works with multiple append calls for the same slice, including inside loops, and the final slice memory can be escaping, such as in a classic pattern of returning a slice from a function after the slice is built. (The final slice memory is never freed with this CL, though we have other work that tackles that.) An example target for this CL is we automatically free the intermediate memory for the appends in the loop in this function: func f1(input []int) []int { var s []int for _, x := range input { s = append(s, g(x)) // s cannot be aliased here if h(x) { s = append(s, x) // s cannot be aliased here } } return s // slice escapes at end } In this case, the compiler and the runtime collaborate so that the heap allocated backing memory for s is automatically freed after a successful grow. (For the first grow, there is nothing to free, but for the second and subsequent growths, the old heap memory is freed automatically.) The new runtime.growsliceNoAlias is primarily implemented by calling runtime.freegc, which we introduced in CL 673695. The high-level approach here is we step through the IR starting from a slice declaration and look for any operations that either alias the slice or might do so, and treat any IR construct we don't specifically handle as a potential alias (and therefore conservatively fall back to treating the slice as aliased when encountering something not understood). For loops, some additional care is required. We arrange the analysis so that an alias in the body of a loop causes all the appends in that same loop body to be marked aliased, even if the aliasing occurs after the append in the IR: func f2() { var s []int for i := range 10 { s = append(s, i) // aliased due to next line alias = s } } For nested loops, we analyse the nesting appropriately so that for example this append is still proven as non-aliased in the inner loop even though it aliased for the outer loop: func f3() { for range 10 { var s []int for i := range 10 { s = append(s, i) // append using non-aliased slice } alias = s } } A good starting point is the beginning of the test/escape_alias.go file, which starts with ~10 introductory examples with brief comments that attempt to illustrate the high-level approach. For more details, see the new .../internal/escape/alias.go file, especially the (*aliasAnalysis).analyze method. In the first benchmark, an append in a loop builds up a slice from nothing, where the slice elements are each 64 bytes. In the table below, 'count' is the number of appends. With 1 append, there is no opportunity for this CL to free memory. Once there are 2 appends, the growth from 1 element to 2 elements means the compiler-inserted growsliceNoAlias frees the 1-element array, and we see a ~33% reduction in memory use and a small reported speed improvement. As the number of appends increases for example to 5, we are at a ~20% speed improvement and ~45% memory reduction, and so on until we reach ~40% faster and ~50% less memory allocated at the end of the table. There can be variation in the reported numbers based on -randlayout, so this table is for 30 different values of -randlayout with a total n=150. (Even so, there is still some variation, so we probably should not read too much into small changes.) This is with GOAMD64=v3 on a VM that gcc reports is cascadelake. goos: linux goarch: amd64 pkg: runtime cpu: Intel(R) Xeon(R) CPU @ 2.80GHz │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │ │ sec/op │ sec/op vs base │ Append64Bytes/count=1-4 31.09n ± 2% 31.69n ± 1% +1.95% (n=150) Append64Bytes/count=2-4 73.31n ± 1% 70.27n ± 0% -4.15% (n=150) Append64Bytes/count=3-4 142.7n ± 1% 124.6n ± 1% -12.68% (n=150) Append64Bytes/count=4-4 149.6n ± 1% 127.7n ± 0% -14.64% (n=150) Append64Bytes/count=5-4 277.1n ± 1% 213.6n ± 0% -22.90% (n=150) Append64Bytes/count=6-4 280.7n ± 1% 216.5n ± 1% -22.87% (n=150) Append64Bytes/count=10-4 544.3n ± 1% 386.6n ± 0% -28.97% (n=150) Append64Bytes/count=20-4 1058.5n ± 1% 715.6n ± 1% -32.39% (n=150) Append64Bytes/count=50-4 2.121µ ± 1% 1.404µ ± 1% -33.83% (n=150) Append64Bytes/count=100-4 4.152µ ± 1% 2.736µ ± 1% -34.11% (n=150) Append64Bytes/count=200-4 7.753µ ± 1% 4.882µ ± 1% -37.03% (n=150) Append64Bytes/count=400-4 15.163µ ± 2% 9.273µ ± 1% -38.84% (n=150) geomean 601.8n 455.0n -24.39% │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │ │ B/op │ B/op vs base │ Append64Bytes/count=1-4 64.00 ± 0% 64.00 ± 0% ~ (n=150) Append64Bytes/count=2-4 192.0 ± 0% 128.0 ± 0% -33.33% (n=150) Append64Bytes/count=3-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150) Append64Bytes/count=4-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150) Append64Bytes/count=5-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150) Append64Bytes/count=6-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150) Append64Bytes/count=10-4 1.938Ki ± 0% 1.000Ki ± 0% -48.39% (n=150) Append64Bytes/count=20-4 3.938Ki ± 0% 2.001Ki ± 0% -49.18% (n=150) Append64Bytes/count=50-4 7.938Ki ± 0% 4.005Ki ± 0% -49.54% (n=150) Append64Bytes/count=100-4 15.938Ki ± 0% 8.021Ki ± 0% -49.67% (n=150) Append64Bytes/count=200-4 31.94Ki ± 0% 16.08Ki ± 0% -49.64% (n=150) Append64Bytes/count=400-4 63.94Ki ± 0% 32.33Ki ± 0% -49.44% (n=150) geomean 1.991Ki 1.124Ki -43.54% │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │ │ allocs/op │ allocs/op vs base │ Append64Bytes/count=1-4 1.000 ± 0% 1.000 ± 0% ~ (n=150) Append64Bytes/count=2-4 2.000 ± 0% 1.000 ± 0% -50.00% (n=150) Append64Bytes/count=3-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150) Append64Bytes/count=4-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150) Append64Bytes/count=5-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150) Append64Bytes/count=6-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150) Append64Bytes/count=10-4 5.000 ± 0% 1.000 ± 0% -80.00% (n=150) Append64Bytes/count=20-4 6.000 ± 0% 1.000 ± 0% -83.33% (n=150) Append64Bytes/count=50-4 7.000 ± 0% 1.000 ± 0% -85.71% (n=150) Append64Bytes/count=100-4 8.000 ± 0% 1.000 ± 0% -87.50% (n=150) Append64Bytes/count=200-4 9.000 ± 0% 1.000 ± 0% -88.89% (n=150) Append64Bytes/count=400-4 10.000 ± 0% 1.000 ± 0% -90.00% (n=150) geomean 4.331 1.000 -76.91% The second benchmark is similar, but instead uses an 8-byte integer for the slice element. The first 4 appends in the loop never call into the runtime thanks to the excellent CL 664299 introduced by Keith in Go 1.25 that allows some <= 32 byte dynamically-sized slices to be on the stack, so this CL is neutral for <= 32 bytes. Once the 5th append occurs at count=5, a grow happens via the runtime and heap allocates as normal, but freegc does not yet have anything to free, so we see a small ~1.4ns penalty reported there. But once the second growth happens, the older heap memory is now automatically freed by freegc, so we start to see some benefit in memory reductions and speed improvements, starting at a tiny speed improvement (close to a wash, or maybe noise) by the second growth before count=10, and building up to ~2x faster with ~68% fewer allocated bytes reported. goos: linux goarch: amd64 pkg: runtime cpu: Intel(R) Xeon(R) CPU @ 2.80GHz │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │ │ sec/op │ sec/op vs base │ AppendInt/count=1-4 2.978n ± 0% 2.969n ± 0% -0.30% (p=0.000 n=150) AppendInt/count=4-4 4.292n ± 3% 4.163n ± 3% ~ (p=0.528 n=150) AppendInt/count=5-4 33.50n ± 0% 34.93n ± 0% +4.25% (p=0.000 n=150) AppendInt/count=10-4 76.21n ± 1% 75.67n ± 0% -0.72% (p=0.000 n=150) AppendInt/count=20-4 150.6n ± 1% 133.0n ± 0% -11.65% (n=150) AppendInt/count=50-4 284.1n ± 1% 225.6n ± 0% -20.59% (n=150) AppendInt/count=100-4 544.2n ± 1% 392.4n ± 1% -27.89% (n=150) AppendInt/count=200-4 1051.5n ± 1% 702.3n ± 0% -33.21% (n=150) AppendInt/count=400-4 2.041µ ± 1% 1.312µ ± 1% -35.70% (n=150) AppendInt/count=1000-4 5.224µ ± 2% 2.851µ ± 1% -45.43% (n=150) AppendInt/count=2000-4 11.770µ ± 1% 6.010µ ± 1% -48.94% (n=150) AppendInt/count=3000-4 17.747µ ± 2% 8.264µ ± 1% -53.44% (n=150) geomean 331.8n 246.4n -25.72% │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │ │ B/op │ B/op vs base │ AppendInt/count=1-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150) AppendInt/count=4-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150) AppendInt/count=5-4 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=150) AppendInt/count=10-4 192.0 ± 0% 128.0 ± 0% -33.33% (n=150) AppendInt/count=20-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150) AppendInt/count=50-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150) AppendInt/count=100-4 1.938Ki ± 0% 1.000Ki ± 0% -48.39% (n=150) AppendInt/count=200-4 3.938Ki ± 0% 2.001Ki ± 0% -49.18% (n=150) AppendInt/count=400-4 7.938Ki ± 0% 4.005Ki ± 0% -49.54% (n=150) AppendInt/count=1000-4 24.56Ki ± 0% 10.05Ki ± 0% -59.07% (n=150) AppendInt/count=2000-4 58.56Ki ± 0% 20.31Ki ± 0% -65.32% (n=150) AppendInt/count=3000-4 85.19Ki ± 0% 27.30Ki ± 0% -67.95% (n=150) geomean ² -42.81% │ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │ │ allocs/op │ allocs/op vs base │ AppendInt/count=1-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150) AppendInt/count=4-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150) AppendInt/count=5-4 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=150) AppendInt/count=10-4 2.000 ± 0% 1.000 ± 0% -50.00% (n=150) AppendInt/count=20-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150) AppendInt/count=50-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150) AppendInt/count=100-4 5.000 ± 0% 1.000 ± 0% -80.00% (n=150) AppendInt/count=200-4 6.000 ± 0% 1.000 ± 0% -83.33% (n=150) AppendInt/count=400-4 7.000 ± 0% 1.000 ± 0% -85.71% (n=150) AppendInt/count=1000-4 9.000 ± 0% 1.000 ± 0% -88.89% (n=150) AppendInt/count=2000-4 11.000 ± 0% 1.000 ± 0% -90.91% (n=150) AppendInt/count=3000-4 12.000 ± 0% 1.000 ± 0% -91.67% (n=150) geomean ² -72.76% ² Of course, these are just microbenchmarks, but likely indicate there are some opportunities here. The immediately following CL 712422 tackles inlining and is able to get runtime.freegc working automatically with iterators such as used by slices.Collect, which becomes able to automatically free the intermediate memory from its repeated appends (which earlier in this work required a temporary hand edit to the slices package). For now, we only use the NoAlias version for element types without pointers while waiting on additional runtime support in CL 698515. Updates #74299 Change-Id: I1b9d286aa97c170dcc2e203ec0f8ca72d84e8221 Reviewed-on: https://go-review.googlesource.com/c/go/+/710015 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>
2025-11-26cmd/compile: adjust start heap sizeDavid Chase
TLDR - not-huge increase to default starting heap boost, - small improvement in build performance, - remove concurrency dependence of starting heap, - aligns RSS behavior with GOMEMLIMIT, - adds a gcflags=-d=gcstart=N (N -> N MiB) flag for people who want to trade a lot of memory for a little build performance improvement. This removes concurrency (-c flag) sensitivity and increases the nominal default to 128MiB. Refactored the startheap code into a separate file, to make it easier to extract and reuse. Added sensitivity to concurrency=1 and GOMEMLIMIT!="" (in addition to existing GOGC!=""), those disable the default starting heap boost because the compiler-invoker has indicated either a desire to control the GC or a desire to run in minimum memory(or both). Adds a -d flag gcstart=N (N is number of MiB) for tinkering/experiments. This always enables the starting heap. (`GOGC=XXX` and `-d=gcstart=YYY` will use `GOGC=XXX` after starting heap size is achieved.) Derated the "boost" obtained by a factor of .70 so that `-d=gcstart=2000` yields the same RSS as `GOMEMLIMIT=2000MiB` (Actually adjusts the boost with a high-low breakpoint.) The parent, with concurrency sensitivity, provided 64MB of plain boost. Derating reduces the effects of boosting the starting heap slightly. The benchmark here shows that maintaining 64MB results in a minor regression, while increasing it to 128MB produces a slight improvement, and does not grow the RSS versus 64MB. ``` │ parent │ sh64 │ sh128 │ sh1024 │ │ sec/op │ sec/op vs base │ sec/op vs base │ sec/op vs base │ std 10.164 ± 1% 10.527 ± 1% +3.57% (p=0.000 n=50) 10.084 ± 1% -0.79% (p=0.000 n=50) 9.631 ± 1% -5.24% (p=0.000 n=50) compile 21.05 ± 1% 20.78 ± 0% -1.28% (p=0.000 n=50) 20.74 ± 1% -1.46% (p=0.000 n=50) 20.77 ± 0% -1.32% (p=0.001 n=50) ast 20.45 ± 1% 20.39 ± 1% ~ (p=0.334 n=50) 20.44 ± 0% ~ (p=0.818 n=50) 20.11 ± 1% -1.65% (p=0.000 n=50) geomean 16.35 16.46 +0.65% 16.23 -0.76% 15.90 -2.75% │ parent │ sh64 │ sh128 │ sh1024 │ │ user-sec/op │ user-sec/op vs base │ user-sec/op vs base │ user-sec/op vs base │ std 66.06 ± 0% 69.74 ± 0% +5.56% (p=0.000 n=50) 64.68 ± 0% -2.09% (p=0.000 n=50) 59.51 ± 0% -9.91% (p=0.000 n=50) compile 84.69 ± 1% 82.54 ± 0% -2.53% (p=0.000 n=50) 82.63 ± 0% -2.43% (p=0.000 n=50) 82.66 ± 1% -2.40% (p=0.000 n=50) ast 59.41 ± 0% 58.84 ± 1% -0.95% (p=0.011 n=50) 59.48 ± 1% ~ (p=0.341 n=50) 57.13 ± 1% -3.83% (p=0.000 n=50) geomean 69.27 69.71 +0.63% 68.25 -1.47% 65.50 -5.44% │ parent │ sh64 │ sh128 │ sh1024 │ │ sys-sec/op │ sys-sec/op vs base │ sys-sec/op vs base │ sys-sec/op vs base │ std 9.599 ± 1% 10.031 ± 1% +4.50% (p=0.000 n=50) 9.513 ± 1% -0.90% (p=0.014 n=50) 9.359 ± 1% -2.50% (p=0.000 n=50) compile 6.813 ± 1% 6.740 ± 1% -1.08% (p=0.017 n=50) 6.716 ± 1% -1.42% (p=0.006 n=50) 6.696 ± 1% -1.72% (p=0.000 n=50) ast 4.315 ± 1% 4.291 ± 1% ~ (p=0.781 n=50) 4.296 ± 1% ~ (p=0.792 n=50) 4.279 ± 2% ~ (p=0.124 n=50) geomean 6.559 6.620 +0.93% 6.499 -0.92% 6.449 -1.68% │ parent │ sh64 │ sh128 │ sh1024 │ │ peak-RSS-bytes │ peak-RSS-bytes vs base │ peak-RSS-bytes vs base │ peak-RSS-bytes vs base │ std 257.1Mi ± 1% 257.2Mi ± 1% ~ (p=0.754 n=50) 257.0Mi ± 0% ~ (p=0.570 n=50) 605.6Mi ± 0% +135.59% (p=0.000 n=50) compile 1007.2Mi ± 1% 1004.3Mi ± 0% ~ (p=0.064 n=50) 1007.4Mi ± 0% ~ (p=0.348 n=50) 1009.4Mi ± 1% ~ (p=0.598 n=50) ast 1.848Gi ± 0% 1.842Gi ± 0% ~ (p=0.079 n=50) 1.824Gi ± 0% -1.25% (p=0.000 n=50) 1.856Gi ± 0% +0.47% (p=0.000 n=50) geomean 788.3Mi 786.8Mi -0.19% 785.0Mi -0.41% 1.027Gi +33.37% ``` Updates #73044 Change-Id: I6359642a94b396e696dd57e64ed1f2c4cf178475 Reviewed-on: https://go-review.googlesource.com/c/go/+/724441 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-18cmd/asm,cmd/compile,cmd/internal/obj/riscv: use compressed instructions on ↵Joel Sing
riscv64 Make use of compressed instructions on riscv64 - add a compress pass to the end of the assembler, which replaces non-compressed instructions with compressed alternatives if possible. Provide a `compressinstructions` compiler and assembler debug flag, such that the compression pass can be disabled via `-asmflags=all=-d=compressinstructions=0` and `-gcflags=all=-d=compressinstructions=0`. Note that this does not prevent the explicit use of compressed instructions via assembly. Note that this does not make use of compressed control transfer instructions - this will be implemented in later changes. Reduces the text size of a hello world binary by ~121KB and reduces the text size of the go binary on riscv64 by ~1.21MB (between 8-10% in both cases). Updates #71105 Cq-Include-Trybots: luci.golang.try:gotip-linux-riscv64 Change-Id: I24258353688554042c2a836deed4830cc673e985 Reviewed-on: https://go-review.googlesource.com/c/go/+/523478 Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Mark Freeman <markfreeman@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-11-11std,cmd: go fix -any std cmdAlan Donovan
This change mechanically replaces all occurrences of interface{} by 'any' (where deemed safe by the 'any' modernizer) throughout std and cmd, minus their vendor trees. Since this fix is relatively numerous, it gets its own CL. Also, 'go generate go/types'. Change-Id: I14a6b52856c3291c1d27935409bca8d5fd4242a2 Reviewed-on: https://go-review.googlesource.com/c/go/+/719702 Commit-Queue: Alan Donovan <adonovan@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> Auto-Submit: Alan Donovan <adonovan@google.com>
2025-10-14cmd/compile: disable use of new saturating float-to-int conversionsDavid Chase
The new conversions can be activated (or bisected) with -gcflags=all=-d=converthash=PATTERN where PATTERN is either a hash string or n, qn, y, qy for no, quietly no, yes, quietly yes. This CL makes the default pattern be "qn" instead of the default-default which is an efficient encoding of "qy". Updates #75834 Change-Id: I88a9fd7880bc999132420c8d0a22a8fdc1e95a2a Reviewed-on: https://go-review.googlesource.com/c/go/+/711845 Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Bypass: David Chase <drchase@google.com>
2025-10-09cmd/compile: modify float-to-[u]int so that amd64 and arm64 matchDavid Chase
Eventual goal is that all the architectures agree, and are sensible. The test will be build-tagged to exclude not-yet-handled platforms. This change also bisects the conversion change in case of bugs. (`bisect -compile=convert ...`) Change-Id: I98528666b0a3fde17cbe8d69b612d01da18dce85 Reviewed-on: https://go-review.googlesource.com/c/go/+/691135 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2025-10-03cmd/compile: don't treat devel compiler as a released compilerCherry Mui
The compiler has a logic to print different messages on internal compiler error depending on whether this is a released version of Go. It hides the panic stack trace if it is a released version. It does this by checking the version and see if it has a "go" prefix. This includes all the released versions. However, for a non- released build, if there is no explicit version set, cmd/dist now sets the toolchain version as go1.X-devel_XXX, which makes it be treated as a released compiler, and causes the stack trace to be hidden. Change the logic to not match a devel compiler as a released compiler. Cherry-picked from the dev.simd branch. This CL is not necessarily SIMD specific. Apply early to reduce risk. Change-Id: I5d3b2101527212f825b6e4000b36030c4f83870b Reviewed-on: https://go-review.googlesource.com/c/go/+/682975 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-on: https://go-review.googlesource.com/c/go/+/708855 Reviewed-by: Junyang Shao <shaojunyang@google.com>
2025-09-15all: replace strings.Split with strings.SplitSeq1911860538
In Go 1.25+, strings.SplitSeq offers better performance. Here are the benchmark results comparing strings.Split and strings.SplitSeq in a for-loop, with the benchmark code located in src/strings/iter_test.go: goos: darwin goarch: amd64 pkg: cmd/go/internal/auth cpu: Intel(R) Core(TM) i7-8569U CPU @ 2.80GHz │ old.txt │ new.txt │ │ sec/op │ sec/op vs base │ ParseGitAuth/standard-8 281.4n ± 1% 218.0n ± 11% -22.54% (p=0.000 n=10) ParseGitAuth/with_url-8 549.1n ± 1% 480.5n ± 13% -12.48% (p=0.002 n=10) ParseGitAuth/minimal-8 235.4n ± 1% 197.3n ± 7% -16.20% (p=0.000 n=10) ParseGitAuth/complex-8 797.6n ± 2% 805.2n ± 4% ~ (p=0.481 n=10) ParseGitAuth/empty-8 87.48n ± 3% 63.25n ± 6% -27.71% (p=0.000 n=10) ParseGitAuth/malformed-8 228.8n ± 1% 171.2n ± 3% -25.17% (p=0.000 n=10) geomean 288.9n 237.7n -17.72% │ old.txt │ new.txt │ │ B/op │ B/op vs base │ ParseGitAuth/standard-8 192.00 ± 0% 96.00 ± 0% -50.00% (p=0.000 n=10) ParseGitAuth/with_url-8 400.0 ± 0% 288.0 ± 0% -28.00% (p=0.000 n=10) ParseGitAuth/minimal-8 144.00 ± 0% 80.00 ± 0% -44.44% (p=0.000 n=10) ParseGitAuth/complex-8 528.0 ± 0% 400.0 ± 0% -24.24% (p=0.000 n=10) ParseGitAuth/empty-8 32.00 ± 0% 16.00 ± 0% -50.00% (p=0.000 n=10) ParseGitAuth/malformed-8 176.00 ± 0% 80.00 ± 0% -54.55% (p=0.000 n=10) geomean 179.0 102.1 -42.96% │ old.txt │ new.txt │ │ allocs/op │ allocs/op vs base │ ParseGitAuth/standard-8 3.000 ± 0% 2.000 ± 0% -33.33% (p=0.000 n=10) ParseGitAuth/with_url-8 4.000 ± 0% 3.000 ± 0% -25.00% (p=0.000 n=10) ParseGitAuth/minimal-8 3.000 ± 0% 2.000 ± 0% -33.33% (p=0.000 n=10) ParseGitAuth/complex-8 4.000 ± 0% 3.000 ± 0% -25.00% (p=0.000 n=10) ParseGitAuth/empty-8 2.000 ± 0% 1.000 ± 0% -50.00% (p=0.000 n=10) ParseGitAuth/malformed-8 3.000 ± 0% 2.000 ± 0% -33.33% (p=0.000 n=10) geomean 3.086 2.040 -33.91% Updates #69315. Change-Id: Id0219edea45d9658d527b863162ebe917e7821d9 GitHub-Last-Rev: 392b315e122f2c9ef8703ca2dbce8f82ec198556 GitHub-Pull-Request: golang/go#75259 Reviewed-on: https://go-review.googlesource.com/c/go/+/701015 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
2025-09-15cmd/compile,cmd/gofmt: use reflect.TypeForapocelipes
Use "reflect.TypeFor" to simplify the code. Updates #60088 Change-Id: I93db6cbd4f02813d9a81f5d02996db8128cb81a9 GitHub-Last-Rev: 2aee64dac6e13ef869aa73f2abf236650e1c1757 GitHub-Pull-Request: golang/go#75349 Reviewed-on: https://go-review.googlesource.com/c/go/+/701676 Reviewed-by: Mark Freeman <markfreeman@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-06-30cmd/compile/internal/escape: add debug hash for literal allocation optimizationsthepudds
Several CLs earlier in this stack added optimizations to reduce user allocations by recognizing and taking advantage of literals, including CL 649555, CL 649079, and CL 649035. This CL adds debug hashing of those changes, which enables use of the bisect tool, such as 'bisect -compile=literalalloc go test -run=Foo'. This also allows these optimizations to be manually disabled via '-gcflags=all=-d=literalallochash=n'. Updates #71359 Change-Id: I854f7742a6efa5b17d914932d61a32b2297f0c88 Reviewed-on: https://go-review.googlesource.com/c/go/+/675415 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-05-20cmd/compile/internal/escape: additional constant and zero value tests and ↵thepudds
logging This adds additional logging for the work that walk does to reduce how often an interface conversion results in an allocation. Also, as part of #71359, we will be updating how escape analysis and walk handle basic literals, composite literals, and zero values, so add some tests that uses this new logging. By the end of our CL stack, we address all of these tests. Updates #71359 Change-Id: I43fde8343d9aacaec1e05360417908014a86c8bd Reviewed-on: https://go-review.googlesource.com/c/go/+/649076 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-15cmd/compile/internal/escape: add hash for bisecting stack allocation of ↵thepudds
variable-sized makeslice CL 653856 enabled stack allocation of variable-sized makeslice results. This CL adds debug hashing of that change, plus a debug flag to control the byte threshold used. The debug hashing machinery means we also now have a way to disable just the CL 653856 optimization by doing -gcflags='all=-d=variablemakehash=n' or similar, though the stderr output will then typically have many lines of debug hash output. Using this CL plus the bisect command, I was able to retroactively find one of the lines of code responsible for #73199: $ bisect -compile=variablemake go test -skip TestListWireGuardDrivers [...] bisect: FOUND failing change set --- change set #1 (enabling changes causes failure) ./security_windows.go:1321:38 (variablemake) ./security_windows.go:1321:38 (variablemake) --- Previously, I had tracked down those lines by diffing '-gcflags=-m=1' output and brief code inspection, but seeing the bisect was very nice. This CL also adds a compiler debug flag to control the threshold for stack allocation of variably sized make results. This can help us identify more code that is relying on certain stack allocations. This might be a temporary flag that we delete prior to Go 1.25 (given we would not want people to rely on it), or maybe it might make sense to keep it for some period of time beyond the release of Go 1.25 to help the ecosystem shake out other bugs. Using these two flags together (and picking a threshold of 64 rather than the default of 32), it looks for example like this x/sys/windows code might be relying on stack allocation of a byte slice: $ bisect -compile=variablemake go test -gcflags=-d=variablemakethreshold=64 -skip TestListWireGuardDrivers [...] bisect: FOUND failing change set --- change set #1 (enabling changes causes failure) ./syscall_windows_test.go:1178:16 (variablemake) Updates #73199 Fixes #73253 Change-Id: I160179a0e3c148c3ea86be5c9b6cea8a52c3e5b7 Reviewed-on: https://go-review.googlesource.com/c/go/+/663795 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-13cmd/compile, cmd/link: add FIPS verification supportRuss Cox
For FIPS init-time code+data verification, we need to arrange to put the FIPS symbols into contiguous regions of the executable and then record those sections along with the expected checksum. The cmd/internal/obj changes identify the FIPS symbols and give them distinguished types, which the linker then places in contiguous regions. The linker also writes out information to use at run time to find the FIPS sections, along with the expected hash. See cmd/internal/obj/fips.go and cmd/link/internal/ld/fips.go for more details. The code is disabled in this commit. CL 625998 and 625999 adds tests. CL 626000 enables the code. For #69536. Change-Id: I48da6db94bc0bea7428c43d4abcf999527bccfcd Reviewed-on: https://go-review.googlesource.com/c/go/+/625997 Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-09-09cmd/compile: emit tail call wrappers when possibleamusman
Use OTAILCALL in wrapper if the receiver and method are both pointers and it is not going to be inlined, similar to how it is done in reflectdata.methodWrapper. Currently tail call may be used for functions with identical argument types. This change updates wrappers where both wrapper and the wrapped method's receiver are pointers. In this case, we have the same signature for the wrapper and the wrapped method (modulo the receiver's pointed-to types), and do not need any local variables in the generated wrapper (on stack) because the arguments are immediately passed to the wrapped method in place (without need to move some value passed to other register or to change any argument/return passed through stack). Thus, the wrapper does not need its own stack frame. This applies to promoted methods, e.g. when we have some struct type U with an embedded type *T and construct a wrapper like func (recv *U) M(arg int) bool { return recv.T.M(i) } See also test/abi/method_wrapper.go for a running example. Code size difference measured with this change (tried for x86_64): etcd binary: .text section size: 21472251 -> 21432350 (0.2%) total binary size: 32226640 -> 32191136 (0.1%) compile binary: .text section size: 17419073 -> 17413929 (0.03%) total binary size: 26744743 -> 26737567 (0.03%) Change-Id: I9bbe730568f6def21a8e61118a6b6f503d98049c Reviewed-on: https://go-review.googlesource.com/c/go/+/578235 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2024-07-30cmd/compile: add "deadlocals" pass to remove unused localsDavid Chase
This CL adds a "deadlocals" pass, which runs after inlining and before escape analysis, to prune any unneeded local variables and assignments. In particular, this helps avoid unnecessary Addrtaken markings from unreachable closures. Deadlocals is sensitive to "_ = ..." as a signal of explicit use for testing. This signal occurs only if the entire left-hand-side is "_" targets; if it is `_, ok := someInlinedFunc(args)` then the first return value is eligible for dead code elimination. Use this (`_ = x`) to fix tests broken by deadlocals elimination. Includes a test, based on one of the tests that required modification. Matthew Dempsky wrote this, changing ownership to allow rebases, commits, tweaks. Fixes #65158. Old-Change-Id: I723fb69ccd7baadaae04d415702ce6c8901eaf4e Change-Id: I1f25f4293b19527f305c18c3680b214237a7714c Reviewed-on: https://go-review.googlesource.com/c/go/+/600498 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: David Chase <drchase@google.com> Commit-Queue: David Chase <drchase@google.com>
2024-06-21cmd/internal: separate counter package from telemetry packageMichael Matloob
Move the code that opens and increments counters out of the cmd/internal/telemetry package into cmd/internal/telemetry/counter. The telemetry package has dependencies on the upload code, which we do not want to pull into the rest of the go toolchain. For #68109 Change-Id: I463c106819b169177a783de4a7d93377e81f4e3e Reviewed-on: https://go-review.googlesource.com/c/go/+/593976 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Findley <rfindley@google.com>
2024-05-21cmd/compile: disable AlignHot on plan9/386David du Colombier
CL 577935 enabled hot block alignment on 386 and amd64 architecture. However, this change broke the plan9/386 build. This change disables AlignHot on plan9/386. Updates #67502. Change-Id: If73b066824c7218a9408c6e8f06aec5908b7a64f Reviewed-on: https://go-review.googlesource.com/c/go/+/586835 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: David du Colombier <0intro@gmail.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
2024-05-15cmd/link: disallow pull-only linknamesCherry Mui
As mentioned in CL 584598, linkname is a mechanism that, when abused, can break API integrity and even safety of Go programs. CL 584598 is a first step to restrict the use of linknames, by implementing a blocklist. This CL takes a step further, tightening up the restriction by allowing linkname references ("pull") only when the definition side explicitly opts into it, by having a linkname on the definition (possibly to itself). This way, it is at least clear on the definition side that the symbol, despite being unexported, is accessed outside of the package. Unexported symbols without linkname can now be actually private. This is similar to the symbol visibility rule used by gccgo for years (which defines unexported non-linknamed symbols as C static symbols). As there can be pull-only linknames in the wild that may be broken by this change, we currently only enforce this rule for symbols defined in the standard library. Push linknames are added in the standard library to allow things build. Linkname references to external (non-Go) symbols are still allowed, as their visibility is controlled by the C symbol visibility rules and enforced by the C (static or dynamic) linker. Assembly symbols are treated similar to linknamed symbols. This is controlled by -checklinkname linker flag, currently not enabled by default. A follow-up CL will enable it by default. Change-Id: I07344f5c7a02124dbbef0fbc8fec3b666a4b2b0e Reviewed-on: https://go-review.googlesource.com/c/go/+/585358 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Russ Cox <rsc@golang.org>
2024-05-15cmd/compile, cmd/internal: fine-grained fiddling with loop alignmentDavid Chase
This appears to be useful only on amd64, and was specifically benchmarked on Apple Silicon and did not produce any benefit there. This CL adds the assembly instruction `PCALIGNMAX align,amount` which aligns to `align` if that can be achieved with `amount` or fewer bytes of padding. (0 means never, but will align the enclosing function.) Specifically, if low-order-address-bits + amount are greater than or equal to align; thus, `PCALIGNMAX 64,63` is the same as `PCALIGN 64` and `PCALIGNMAX 64,0` will never emit any alignment, but will still cause the function itself to be aligned to (at least) 64 bytes. Change-Id: Id51a056f1672f8095e8f755e01f72836c9686aa3 Reviewed-on: https://go-review.googlesource.com/c/go/+/577935 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2024-05-09cmd/compile: add support for telemetryMichael Matloob
Add cmd/internal/telemetry to cmd/dist's bootstrapDirs so it's built when bootstrapping the compiler. cmd/internal/telemetry is a wrapper arount telemetry functions that stubs out the functions when built in bootstrap mode to avoid dependencies on x/telemetry in bootstrap mode. Call telemetry.Start with an empty config to open the counter file, and increment a counter for when the command is invoked. After flags are parsed, increment a counter for each of the names of the flags that were passed in. The counter names will be compile/flag:<name> so for example we'll have compile/flag:e and compile/flag:E. In FatalfAt, increment a stack counter for internal errors. For #58894 Change-Id: Ia5a8a63aa43b2276641181626cbfbea7e4647faa Reviewed-on: https://go-review.googlesource.com/c/go/+/570679 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-04-09cmd/compile/internal/base: enable stack slot merging by defaultThan McIntosh
Flag flip to enable stack slot merging by default when optimizing. Please see the earlier CL for details on what this is doing. Updates #62737. Updates #65532. Updates #65495. Change-Id: I8e30d553e74ace43d418f883199721f05320d3d7 Reviewed-on: https://go-review.googlesource.com/c/go/+/576681 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-04-09cmd/compile/internal/liveness: enhance mergelocals for addr-taken candidatesThan McIntosh
It is possible to have situations where a given ir.Name is non-address-taken at the source level, but whose address is materialized in order to accommodate the needs of arch-dependent memory ops. The issue here is that the SymAddr op will show up as touching a variable of interest, but the subsequent memory op will not. This is generally not an issue for computing whether something is live across a call, but it is problematic for collecting the more fine-grained live interval info that drives stack slot merging. As an example, consider this Go code: package p type T struct { x [10]int f float64 } func ABC(i, j int) int { var t T t.x[i&3] = j return t.x[j&3] } On amd64 the code sequences we'll see for accesses to "t" might look like v10 = VarDef <mem> {t} v1 v5 = MOVOstoreconst <mem> {t} [val=0,off=0] v2 v10 v23 = LEAQ <*T> {t} [8] v2 : DI v12 = DUFFZERO <mem> [80] v23 v5 v14 = ANDQconst <int> [3] v7 : AX v19 = MOVQstoreidx8 <mem> {t} v2 v14 v8 v12 v22 = ANDQconst <int> [3] v8 : BX v24 = MOVQloadidx8 <int> {t} v2 v22 v19 : AX v25 = MakeResult <int,mem> v24 v19 : <> Note that the the loads and stores (ex: v19, v24) all refer directly to "t", which means that regular live analysis will work fine for identifying variable lifetimes. The DUFFZERO is (in effect) an indirect write, but since there are accesses immediately after it we wind up with the same live intervals. Now the same code with GOARCH=ppc64: v10 = VarDef <mem> {t} v1 v20 = MOVDaddr <*T> {t} v2 : R20 v12 = LoweredZero <mem> [88] v20 v10 v3 = CLRLSLDI <int> [212543] v7 : R5 v15 = MOVDaddr <*T> {t} v2 : R6 v19 = MOVDstoreidx <mem> v15 v3 v8 v12 v29 = CLRLSLDI <int> [212543] v8 : R4 v24 = MOVDloadidx <int> v15 v29 v19 : R3 v25 = MakeResult <int,mem> v24 v19 : <> Here instead of memory ops that refer directly to the symbol, we take the address of "t" (ex: v15) and then pass the address to memory ops (where the ops themselves no longer refer to the symbol). This patch enhances the stack slot merging liveness analysis to handle cases like the PPC64 one above. We add a new phase in candidate selection that collects more precise use information for merge candidates, and screens out candidates that are too difficult to analyze. The phase make a forward pass over each basic block looking for instructions of the form vK := SymAddr(N) where N is a raw candidate. It then creates an entry in a map with key vK and value holding name and the vK use count. As the walk continues, we check for uses of of vK: when we see one, record it in a side table as an upwards exposed use of N. At each vK use we also decrement the use count in the map entry, and if we hit zero, remove the map entry. If we hit the end of the basic block and we still have map entries, this implies that the address in question "escapes" the block -- at that point to be conservative we just evict the name in question from the candidate set. Although this CL fixes the issues that forced a revert of the original merging CL, this CL doesn't enable stack slot merging by default; a subsequent CL will do that. Updates #62737. Updates #65532. Updates #65495. Change-Id: Id41d359a677767a8e7ac1e962ae23f7becb4031f Reviewed-on: https://go-review.googlesource.com/c/go/+/576735 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-04-09cmd/compile/internal: merge stack slots for selected local auto varsThan McIntosh
[This is a partial roll-forward of CL 553055, the main change here is that the stack slot overlap operation is flagged off by default (can be enabled by hand with -gcflags=-d=mergelocals=1) ] Preliminary compiler support for merging/overlapping stack slots of local variables whose access patterns are disjoint. This patch includes changes in AllocFrame to do the actual merging/overlapping based on information returned from a new liveness.MergeLocals helper. The MergeLocals helper identifies candidates by looking for sets of AUTO variables that either A) have the same size and GC shape (if types contain pointers), or B) have the same size (but potentially different types as long as those types have no pointers). Variables must be greater than (3*types.PtrSize) in size to be considered for merging. After forming candidates, MergeLocals collects variables into "can be overlapped" equivalence classes or partitions; this process is driven by an additional liveness analysis pass. Ideally it would be nice to move the existing stackmap liveness pass up before AllocFrame and "widen" it to include merge candidates so that we can do just a single liveness as opposed to two passes, however this may be difficult given that the merge-locals liveness has to take into account writes corresponding to dead stores. This patch also required a change to the way ssa.OpVarDef pseudo-ops are generated; prior to this point they would only be created for variables whose type included pointers; if stack slot merging is enabled then the ssagen code creates OpVarDef ops for all auto vars that are merge candidates. Note that some temporaries created late in the compilation process (e.g. during ssa backend) are difficult to reason about, especially in cases where we take the address of a temp and pass it to the runtime. For the time being we mark most of the vars created post-ssagen as "not a merge candidate". Stack slot merging for locals/autos is enabled by default if "-N" is not in effect, and can be disabled via "-gcflags=-d=mergelocals=0". Fixmes/todos/restrictions: - try lowering size restrictions - re-evaluate the various skips that happen in SSA-created autotmps Updates #62737. Updates #65532. Updates #65495. Change-Id: Ifda26bc48cde5667de245c8a9671b3f0a30bb45d Reviewed-on: https://go-review.googlesource.com/c/go/+/575415 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-03-30Revert "cmd/compile/internal: merge stack slots for selected local auto vars"Cuong Manh Le
This reverts CL 553055. Reason for revert: causes crypto/ecdsa failures on linux ppc64/s390x builders Change-Id: I9266b030693a5b6b1e667a009de89d613755b048 Reviewed-on: https://go-review.googlesource.com/c/go/+/575236 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Than McIntosh <thanm@google.com> Auto-Submit: Than McIntosh <thanm@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-03-29cmd/compile/internal: merge stack slots for selected local auto varsThan McIntosh
Preliminary compiler support for merging/overlapping stack slots of local variables whose access patterns are disjoint. This patch includes changes in AllocFrame to do the actual merging/overlapping based on information returned from a new liveness.MergeLocals helper. The MergeLocals helper identifies candidates by looking for sets of AUTO variables that either A) have the same size and GC shape (if types contain pointers), or B) have the same size (but potentially different types as long as those types have no pointers). Variables must be greater than (3*types.PtrSize) in size to be considered for merging. After forming candidates, MergeLocals collects variables into "can be overlapped" equivalence classes or partitions; this process is driven by an additional liveness analysis pass. Ideally it would be nice to move the existing stackmap liveness pass up before AllocFrame and "widen" it to include merge candidates so that we can do just a single liveness as opposed to two passes, however this may be difficult given that the merge-locals liveness has to take into account writes corresponding to dead stores. This patch also required a change to the way ssa.OpVarDef pseudo-ops are generated; prior to this point they would only be created for variables whose type included pointers; if stack slot merging is enabled then the ssagen code creates OpVarDef ops for all auto vars that are merge candidates. Note that some temporaries created late in the compilation process (e.g. during ssa backend) are difficult to reason about, especially in cases where we take the address of a temp and pass it to the runtime. For the time being we mark most of the vars created post-ssagen as "not a merge candidate". Stack slot merging for locals/autos is enabled by default if "-N" is not in effect, and can be disabled via "-gcflags=-d=mergelocals=0". Fixmes/todos/restrictions: - try lowering size restrictions - re-evaluate the various skips that happen in SSA-created autotmps Fixes #62737. Updates #65532. Updates #65495. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest Change-Id: Ibc22e8a76c87e47bc9fafe4959804d9ea923623d Reviewed-on: https://go-review.googlesource.com/c/go/+/553055 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-03-19all: use "unix" build tag where appropriateTobias Klauser
For #51572 Change-Id: I23bb25b8cf1ecb9be25eb6ab9e89cd397b58b3c8 Reviewed-on: https://go-review.googlesource.com/c/go/+/572535 Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
2024-03-01cmd/compile: remove unnecessary fmt.Sprintf callguoguangwu
Change-Id: I38ba7cb0179ec9226a68629c53ea2d81fa19c059 GitHub-Last-Rev: a3d4fe2ac2d2f40033ae0244a264074d45b3ad52 GitHub-Pull-Request: golang/go#66024 Reviewed-on: https://go-review.googlesource.com/c/go/+/568115 Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Commit-Queue: Ian Lance Taylor <iant@google.com> Reviewed-by: qiulaidongfeng <2645477756@qq.com> Auto-Submit: Robert Griesemer <gri@google.com>
2024-01-31cmd/preprofile: Implement a tool to preprocess the PGO profile.Jin Lin
It fixes the issue https://github.com/golang/go/issues/65220. It also includes https://go.dev/cl/557458 from Michael. Change-Id: Ic6109e1b6a9045459ff4a54dea11cbfe732b01e6 Reviewed-on: https://go-review.googlesource.com/c/go/+/557918 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-01-22Revert "cmd/preprofile: Add preprocess tool to pre-parse the profile file."Michael Pratt
This reverts CL 529738. Reason for revert: Breaking longtest builders For #58102. Fixes #65220. Change-Id: Id295e3249da9d82f6a9e4fc571760302a1362def Reviewed-on: https://go-review.googlesource.com/c/go/+/557460 Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Bryan Mills <bcmills@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-01-22cmd/preprofile: Add preprocess tool to pre-parse the profile file.Jin Lin
The pgo compilation time is very long if the profile file is large. We added a preprocess tool to pre-parse profile file in order to expedite the compile time. Change-Id: I6f50bbd01f242448e2463607a9b63483c6ca9a12 Reviewed-on: https://go-review.googlesource.com/c/go/+/529738 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-01-10cmd/compile: use hashed symbol name for go.shape types if too longThan McIntosh
Shape-based stenciling in the Go compiler's generic instantiation phase looks up shape types using the underlying type of a given target type. This has a beneficial effect in most cases (e.g. we can use the same shape type for two different named types whose underlying type is "int"), but causes some problems when the underlying type is a very large structure. The link string for the underlying type of a large imported struct can be extremely long, since the link string essentially enumerates the full package path for every field type; this can produce a "go.shape.struct { ... " symbol name that is absurdly long. This patch switches the compiler to use a hash of the underlying type link string instead of the string itself, which should continue to provide commoning but keep symbol name lengths reasonable for shape types based on large imported structs. Fixes #65030. Change-Id: I87d602626c43172beb99c186b8ef72327b8227a2 Reviewed-on: https://go-review.googlesource.com/c/go/+/554975 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Than McIntosh <thanm@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2023-12-19cmd/compile: remove interfacecycles debug flagRobert Griesemer
Per the discussion on the issue, since no problems related to this appeared since Go 1.20, remove the ability to disable the check for anonymous interface cycles permanently. Adjust various tests accordingly. For #56103. Change-Id: Ica2b28752dca08934bbbc163a9b062ae1eb2a834 Reviewed-on: https://go-review.googlesource.com/c/go/+/550896 Run-TryBot: Robert Griesemer <gri@google.com> Auto-Submit: Robert Griesemer <gri@google.com> Reviewed-by: Robert Griesemer <gri@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2023-12-14all: remove newline characters after return statementsDanil Timerbulatov
This commit is aimed at improving the readability and consistency of the code base. Extraneous newline characters were present after some return statements, creating unnecessary separation in the code. Fixes #64610 Change-Id: Ic1b05bf11761c4dff22691c2f1c3755f66d341f7 Reviewed-on: https://go-review.googlesource.com/c/go/+/548316 Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2023-11-16cmd/compile: allow disable of PGO function value devirtualization with flagMichael Pratt
Extend the pgodevirtualize debug flag to distinguish interface and function devirtualization. Setting 1 keeps interface devirtualization enabled but disables function value devirtualization. For #64209. Change-Id: I33aa7eb95ca0bdb215256d8c7cc8f9dac53ae30e Reviewed-on: https://go-review.googlesource.com/c/go/+/543115 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-16cmd/compile/internal/inline: debug flag to alter score adjustmentsThan McIntosh
Add a debugging flag "-d=inlscoreadj" intended to support running experiments in which the inliner uses different score adjustment values for specific heuristics. The flag argument is a series of clauses separated by the "/" char where each clause takes the form "adjK:valK". For example, in this build go build -gcflags=-d=inlscoreadj=inLoopAdj:10/returnFeedsConstToIfAdj:-99 the "in loop" score adjustments would be reset to a value of 15 (effectively penalizing calls in loops) adn the "return feeds constant to foldable if/switch" score adjustment would be boosted from -15 to -99. Change-Id: Ibd1ee334684af5992466556a69baa6dfefb246b3 Reviewed-on: https://go-review.googlesource.com/c/go/+/532116 Reviewed-by: Matthew Dempsky <mdempsky@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-16cmd/compile: add compiler debug flag to disable range func iterator checkingDavid Chase
E.g. `GOEXPERIMENT=rangefunc go test -v -gcflags=-d=rangefunccheck=0 rangefunc_test.go` will turn off the checking and fail. The benchmarks, which do not use pathological iterators, run slightly faster. Change-Id: Ia3e175e86d67ef74bbae9bcc5d2def6a2cdf519d Reviewed-on: https://go-review.googlesource.com/c/go/+/541995 Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-09-30internal,cmd/internal: relocate covcmd package from std to cmdThan McIntosh
Relocate the 'covcmd' package from .../internal/coverage to .../cmd/internal/cov, to reflect the fact that the definitions in this package are used only in cmd, not in std. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-windows-amd64-longtest Change-Id: I65bcc34736d1d0a23134a6c91c17ff138cd45431 Reviewed-on: https://go-review.googlesource.com/c/go/+/526595 Reviewed-by: Bryan Mills <bcmills@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-09-19cmd/compile: add pgohash for debugging/bisecting PGO optimizationsCherry Mui
When a PGO build fails or produces incorrect program, it is often unclear what the problem is. Add pgo hash so we can bisect to individual optimization decisions, which often helps debugging. Related to #58153. Change-Id: I651ffd9c53bad60f2f28c8ec2a90a3f532982712 Reviewed-on: https://go-review.googlesource.com/c/go/+/528400 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2023-09-15cmd/compile/internal/inline: add callsite trace output debugging flagThan McIntosh
Add a new debug flag "-d=dumpinlcallsitescores" that dumps out a summary of all callsites in the package being compiled with info on inlining heuristics, for human consumption. Sample output lines: Score Adjustment Status Callee CallerPos ScoreFlags ... 115 40 DEMOTED cmd/compile/internal/abi.(*ABIParamAssignment).Offset expand_calls.go:1679:14|6 panicPathAdj ... 76 -5 PROMOTED runtime.persistentalloc mcheckmark.go:48:45|3 inLoopAdj ... 201 0 --- PGO unicode.DecodeRuneInString utf8.go:312:30|1 ... 7 -5 --- PGO internal/abi.Name.DataChecked type.go:625:22|0 inLoopAdj Here "Score" is the final score calculated for the callsite, "Adjustment" is the amount added to or subtracted from the original hairyness estimate to form the score. "Status" shows whether anything changed with the site -- did the adjustment bump it down just below the threshold ("PROMOTED") or instead bump it above the threshold ("DEMOTED") or did nothing happen as a result of the heuristics ("---"); "Status" also shows whether PGO was involved. "Callee" is the name of the function called, "CallerPos" is the position of the callsite, and "ScoreFlags" is a digest of the specific properties we used to make adjustments to callsite score via heuristics. Change-Id: Iea4b1cbfee038bc68df6ab81e9973f145636300b Reviewed-on: https://go-review.googlesource.com/c/go/+/513455 Reviewed-by: Matthew Dempsky <mdempsky@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-09-12cmd/compile/internal/base: keep Ctxt.Flag_optimize in sync with Flag.NThan McIntosh
This patch fixes an inconsistency in compiler flag handling introduced accidentally in CL 521699. In the compiler we have both base.Flag.N (which records whether the user has supplied the "-N" flag to disable optimization) and base.Ctxt.Flag_optimize (which tracks whether optimization is turned on). In this case Flag.N was updated without a corresponding change to Ctxt.Flag_optimize, which led to problems with DWARF generation for the runtime. This CL doesn't include a regression test; a test will be added later in the x/debug repo in a subsequent CL. Updates #62523. Change-Id: I0c383bb43ec0a0e7c12e7e2852c0590731416d6f Reviewed-on: https://go-review.googlesource.com/c/go/+/527319 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-09-11cmd/compile/internal/staticinit: make staticopy safeMatthew Dempsky
Currently, cmd/compile optimizes `var a = true; var b = a` into `var a = true; var b = true`. But this may not be safe if we need to initialize any other global variables between `a` and `b`, and the initialization involves calling a user-defined function that reassigns `a`. This CL changes staticinit to keep track of the initialization expressions that we've seen so far, and to stop applying the staticcopy optimization once we've seen an initialization expression that might have modified another global variable within this package. To help identify affected initializers, this CL adds a -d=staticcopy flag to warn when a staticcopy is suppressed and turned into a dynamic copy. Currently, `go build -gcflags=all=-d=staticcopy std` reports only four instances: ``` encoding/xml/xml.go:1600:5: skipping static copy of HTMLEntity+0 with map[string]string{...} encoding/xml/xml.go:1869:5: skipping static copy of HTMLAutoClose+0 with []string{...} net/net.go:661:5: skipping static copy of .stmp_31+0 with poll.ErrNetClosing net/http/transport.go:2566:5: skipping static copy of errRequestCanceled+0 with ~R0 ``` Fixes #51913. Change-Id: Iab41cf6f84c44f7f960e4e62c28a8aeaade4fbcf Reviewed-on: https://go-review.googlesource.com/c/go/+/395541 Auto-Submit: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Heschi Kreinick <heschi@google.com>
2023-08-25cmd/compile: remove ErrorfVers and base.Flag.Lang hackMatthew Dempsky
ErrorfVers used to be used by typecheck to report when new language functionality was used, but the -lang flag (from go.mod) was set to an older version. However, all of the callers have been since removed, now that this is handled by types2. And for the same reason, we can stop changing base.Flag.Lang. This was previously a workaround so that the unified frontend could generate arbitrary IR without upsetting typecheck, at a time when typecheck was itself a real frontend. Now it's just a glorified desugaring pass. Change-Id: I1c0316dbfe2e08ba089acd50fdfe20b17176be25 Reviewed-on: https://go-review.googlesource.com/c/go/+/522877 Run-TryBot: Matthew Dempsky <mdempsky@google.com> Auto-Submit: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-08-22cmd/internal/objabi,cmd/compile: move NoInstrumentPkgs and NoRacePkgs to ↵Austin Clements
PkgSpecials This consolidates the NoInstrumentPkgs and NoRacePkgs lists into the objabi.LookupPkgSpecial mechanism. Change-Id: I411654afdd690fb01c412e7e8b57ddfbe85415e3 Reviewed-on: https://go-review.googlesource.com/c/go/+/521702 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Austin Clements <austin@google.com>
2023-08-22cmd/compile: add all runtime package dependencies to NoInstrumentPkgsAustin Clements
Currently, this list includes *almost* all runtime packages, but not quite all. We leave out internal/bytealg for reasons explained in the code. Compiling with or without race instrumentation has no effect on the other packages added to the list here, so this is a no-op change today, but makes this more robust. Change-Id: Iaec585b2efbc72983d8cb3929394524c42dd664d Reviewed-on: https://go-review.googlesource.com/c/go/+/521701 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-08-22cmd/compile,cmd/dist,cmd/go: compute -+ flag from package pathAustin Clements
As we did for the asm -compiling-runtime flag, this CL modifies the compiler to compute the -+ (compiling runtime) flag from the package path. Unlike for asm, some tests use -+ explicitly to opt in to runtime restrictions, so we leave the flag, but it's no longer passed by any build tools. This lets us eliminate cmd/go's list of "runtime packages" in favor of the unified objabi.LookupPkgSpecial. It also fixes an inconsistency with dist, which only passed -+ when compiling "runtime" itself. One consequence of this is that the compiler now ignores the -N flag when compiling runtime packages. Previously, cmd/go would strip -N when passing -+ and the compiler would fatal if it got both -N and -+, so the overall effect was that the compiler never saw -N when compiling a runtime package. Now we simply move that logic to disable -N down into the compiler. Change-Id: I4876047a1563210ed122a31b72d62798762cbcf5 Reviewed-on: https://go-review.googlesource.com/c/go/+/521699 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2023-08-21cmd/compile: move racewalk comment to walk/race.goAustin Clements
This comment got left behind in some refactoring and now refers to code "below" that is no longer below. Move it to be with the code it's referring to. Change-Id: I7f7bf0cf8b22c1f6e05ff12b8be71d18fb3359d5 Reviewed-on: https://go-review.googlesource.com/c/go/+/521177 Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Austin Clements <austin@google.com> TryBot-Bypass: Austin Clements <austin@google.com>