| Age | Commit message (Collapse) | Author |
|
The property was missing from some flags that we potentially care
about. This change only affects flags supplied in the command line
(not explicitly changed within the compiler) but nonetheless it seems
like a good idea to get them right -- we might adjust them on the
command line, someone might read the code and wonder why they're
unset.
Change-Id: I44812ddea640b71c078594317ef3506ab055a37b
Reviewed-on: https://go-review.googlesource.com/c/go/+/452876
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This was extraordinarily useful for inlining work.
I have cleaned it up somewhat, and did some additional tweaks
after working on changes to bloop.
-gcflags=-d=astdump=SomeFunc
-gcflags=-d=astdump=SomeSubPkg.SomeFunc
-gcflags=-d=astdump=Some/Pkg.SomeFunc
-gcflags=-d=astdump=~YourRegExpHere
Change-Id: I3f98601ca96c87d6b191d4b64b264cd236e6d8bf
Reviewed-on: https://go-review.googlesource.com/c/go/+/629775
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
memory after growslice
This CL is part of a set of CLs that attempt to reduce how much work the
GC must do. See the design in https://go.dev/design/74299-runtime-freegc
This CL updates the compiler to examine append calls to prove
whether or not the slice is aliased.
If proven unaliased, the compiler automatically inserts a call to a new
runtime function introduced with this CL, runtime.growsliceNoAlias,
which frees the old backing memory immediately after slice growth is
complete and the old storage is logically dead.
Two append benchmarks below show promising results, executing up to
~2x faster and up to factor of ~3 memory reduction with this CL.
The approach works with multiple append calls for the same slice,
including inside loops, and the final slice memory can be escaping,
such as in a classic pattern of returning a slice from a function
after the slice is built. (The final slice memory is never freed with
this CL, though we have other work that tackles that.)
An example target for this CL is we automatically free the
intermediate memory for the appends in the loop in this function:
func f1(input []int) []int {
var s []int
for _, x := range input {
s = append(s, g(x)) // s cannot be aliased here
if h(x) {
s = append(s, x) // s cannot be aliased here
}
}
return s // slice escapes at end
}
In this case, the compiler and the runtime collaborate so that
the heap allocated backing memory for s is automatically freed after
a successful grow. (For the first grow, there is nothing to free,
but for the second and subsequent growths, the old heap memory is
freed automatically.)
The new runtime.growsliceNoAlias is primarily implemented
by calling runtime.freegc, which we introduced in CL 673695.
The high-level approach here is we step through the IR starting
from a slice declaration and look for any operations that either
alias the slice or might do so, and treat any IR construct we
don't specifically handle as a potential alias (and therefore
conservatively fall back to treating the slice as aliased when
encountering something not understood).
For loops, some additional care is required. We arrange the analysis
so that an alias in the body of a loop causes all the appends in that
same loop body to be marked aliased, even if the aliasing occurs after
the append in the IR:
func f2() {
var s []int
for i := range 10 {
s = append(s, i) // aliased due to next line
alias = s
}
}
For nested loops, we analyse the nesting appropriately so that
for example this append is still proven as non-aliased in the
inner loop even though it aliased for the outer loop:
func f3() {
for range 10 {
var s []int
for i := range 10 {
s = append(s, i) // append using non-aliased slice
}
alias = s
}
}
A good starting point is the beginning of the test/escape_alias.go file,
which starts with ~10 introductory examples with brief comments that
attempt to illustrate the high-level approach.
For more details, see the new .../internal/escape/alias.go file,
especially the (*aliasAnalysis).analyze method.
In the first benchmark, an append in a loop builds up a slice from
nothing, where the slice elements are each 64 bytes. In the table below,
'count' is the number of appends. With 1 append, there is no opportunity
for this CL to free memory. Once there are 2 appends, the growth from
1 element to 2 elements means the compiler-inserted growsliceNoAlias
frees the 1-element array, and we see a ~33% reduction in memory use
and a small reported speed improvement.
As the number of appends increases for example to 5, we are at
a ~20% speed improvement and ~45% memory reduction, and so on until
we reach ~40% faster and ~50% less memory allocated at the end of
the table.
There can be variation in the reported numbers based on -randlayout, so
this table is for 30 different values of -randlayout with a total
n=150. (Even so, there is still some variation, so we probably should
not read too much into small changes.) This is with GOAMD64=v3 on
a VM that gcc reports is cascadelake.
goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ sec/op │ sec/op vs base │
Append64Bytes/count=1-4 31.09n ± 2% 31.69n ± 1% +1.95% (n=150)
Append64Bytes/count=2-4 73.31n ± 1% 70.27n ± 0% -4.15% (n=150)
Append64Bytes/count=3-4 142.7n ± 1% 124.6n ± 1% -12.68% (n=150)
Append64Bytes/count=4-4 149.6n ± 1% 127.7n ± 0% -14.64% (n=150)
Append64Bytes/count=5-4 277.1n ± 1% 213.6n ± 0% -22.90% (n=150)
Append64Bytes/count=6-4 280.7n ± 1% 216.5n ± 1% -22.87% (n=150)
Append64Bytes/count=10-4 544.3n ± 1% 386.6n ± 0% -28.97% (n=150)
Append64Bytes/count=20-4 1058.5n ± 1% 715.6n ± 1% -32.39% (n=150)
Append64Bytes/count=50-4 2.121µ ± 1% 1.404µ ± 1% -33.83% (n=150)
Append64Bytes/count=100-4 4.152µ ± 1% 2.736µ ± 1% -34.11% (n=150)
Append64Bytes/count=200-4 7.753µ ± 1% 4.882µ ± 1% -37.03% (n=150)
Append64Bytes/count=400-4 15.163µ ± 2% 9.273µ ± 1% -38.84% (n=150)
geomean 601.8n 455.0n -24.39%
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ B/op │ B/op vs base │
Append64Bytes/count=1-4 64.00 ± 0% 64.00 ± 0% ~ (n=150)
Append64Bytes/count=2-4 192.0 ± 0% 128.0 ± 0% -33.33% (n=150)
Append64Bytes/count=3-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150)
Append64Bytes/count=4-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150)
Append64Bytes/count=5-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150)
Append64Bytes/count=6-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150)
Append64Bytes/count=10-4 1.938Ki ± 0% 1.000Ki ± 0% -48.39% (n=150)
Append64Bytes/count=20-4 3.938Ki ± 0% 2.001Ki ± 0% -49.18% (n=150)
Append64Bytes/count=50-4 7.938Ki ± 0% 4.005Ki ± 0% -49.54% (n=150)
Append64Bytes/count=100-4 15.938Ki ± 0% 8.021Ki ± 0% -49.67% (n=150)
Append64Bytes/count=200-4 31.94Ki ± 0% 16.08Ki ± 0% -49.64% (n=150)
Append64Bytes/count=400-4 63.94Ki ± 0% 32.33Ki ± 0% -49.44% (n=150)
geomean 1.991Ki 1.124Ki -43.54%
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ allocs/op │ allocs/op vs base │
Append64Bytes/count=1-4 1.000 ± 0% 1.000 ± 0% ~ (n=150)
Append64Bytes/count=2-4 2.000 ± 0% 1.000 ± 0% -50.00% (n=150)
Append64Bytes/count=3-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150)
Append64Bytes/count=4-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150)
Append64Bytes/count=5-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150)
Append64Bytes/count=6-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150)
Append64Bytes/count=10-4 5.000 ± 0% 1.000 ± 0% -80.00% (n=150)
Append64Bytes/count=20-4 6.000 ± 0% 1.000 ± 0% -83.33% (n=150)
Append64Bytes/count=50-4 7.000 ± 0% 1.000 ± 0% -85.71% (n=150)
Append64Bytes/count=100-4 8.000 ± 0% 1.000 ± 0% -87.50% (n=150)
Append64Bytes/count=200-4 9.000 ± 0% 1.000 ± 0% -88.89% (n=150)
Append64Bytes/count=400-4 10.000 ± 0% 1.000 ± 0% -90.00% (n=150)
geomean 4.331 1.000 -76.91%
The second benchmark is similar, but instead uses an 8-byte integer
for the slice element. The first 4 appends in the loop never call into
the runtime thanks to the excellent CL 664299 introduced by Keith in
Go 1.25 that allows some <= 32 byte dynamically-sized slices to be on
the stack, so this CL is neutral for <= 32 bytes. Once the 5th append
occurs at count=5, a grow happens via the runtime and heap allocates
as normal, but freegc does not yet have anything to free, so we see
a small ~1.4ns penalty reported there. But once the second growth
happens, the older heap memory is now automatically freed by freegc,
so we start to see some benefit in memory reductions and speed
improvements, starting at a tiny speed improvement (close to a wash,
or maybe noise) by the second growth before count=10, and building up to
~2x faster with ~68% fewer allocated bytes reported.
goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ sec/op │ sec/op vs base │
AppendInt/count=1-4 2.978n ± 0% 2.969n ± 0% -0.30% (p=0.000 n=150)
AppendInt/count=4-4 4.292n ± 3% 4.163n ± 3% ~ (p=0.528 n=150)
AppendInt/count=5-4 33.50n ± 0% 34.93n ± 0% +4.25% (p=0.000 n=150)
AppendInt/count=10-4 76.21n ± 1% 75.67n ± 0% -0.72% (p=0.000 n=150)
AppendInt/count=20-4 150.6n ± 1% 133.0n ± 0% -11.65% (n=150)
AppendInt/count=50-4 284.1n ± 1% 225.6n ± 0% -20.59% (n=150)
AppendInt/count=100-4 544.2n ± 1% 392.4n ± 1% -27.89% (n=150)
AppendInt/count=200-4 1051.5n ± 1% 702.3n ± 0% -33.21% (n=150)
AppendInt/count=400-4 2.041µ ± 1% 1.312µ ± 1% -35.70% (n=150)
AppendInt/count=1000-4 5.224µ ± 2% 2.851µ ± 1% -45.43% (n=150)
AppendInt/count=2000-4 11.770µ ± 1% 6.010µ ± 1% -48.94% (n=150)
AppendInt/count=3000-4 17.747µ ± 2% 8.264µ ± 1% -53.44% (n=150)
geomean 331.8n 246.4n -25.72%
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ B/op │ B/op vs base │
AppendInt/count=1-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=4-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=5-4 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=150)
AppendInt/count=10-4 192.0 ± 0% 128.0 ± 0% -33.33% (n=150)
AppendInt/count=20-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150)
AppendInt/count=50-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150)
AppendInt/count=100-4 1.938Ki ± 0% 1.000Ki ± 0% -48.39% (n=150)
AppendInt/count=200-4 3.938Ki ± 0% 2.001Ki ± 0% -49.18% (n=150)
AppendInt/count=400-4 7.938Ki ± 0% 4.005Ki ± 0% -49.54% (n=150)
AppendInt/count=1000-4 24.56Ki ± 0% 10.05Ki ± 0% -59.07% (n=150)
AppendInt/count=2000-4 58.56Ki ± 0% 20.31Ki ± 0% -65.32% (n=150)
AppendInt/count=3000-4 85.19Ki ± 0% 27.30Ki ± 0% -67.95% (n=150)
geomean ² -42.81%
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ allocs/op │ allocs/op vs base │
AppendInt/count=1-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=4-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=5-4 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=10-4 2.000 ± 0% 1.000 ± 0% -50.00% (n=150)
AppendInt/count=20-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150)
AppendInt/count=50-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150)
AppendInt/count=100-4 5.000 ± 0% 1.000 ± 0% -80.00% (n=150)
AppendInt/count=200-4 6.000 ± 0% 1.000 ± 0% -83.33% (n=150)
AppendInt/count=400-4 7.000 ± 0% 1.000 ± 0% -85.71% (n=150)
AppendInt/count=1000-4 9.000 ± 0% 1.000 ± 0% -88.89% (n=150)
AppendInt/count=2000-4 11.000 ± 0% 1.000 ± 0% -90.91% (n=150)
AppendInt/count=3000-4 12.000 ± 0% 1.000 ± 0% -91.67% (n=150)
geomean ² -72.76% ²
Of course, these are just microbenchmarks, but likely indicate
there are some opportunities here.
The immediately following CL 712422 tackles inlining and is able to get
runtime.freegc working automatically with iterators such as used by
slices.Collect, which becomes able to automatically free the
intermediate memory from its repeated appends (which earlier
in this work required a temporary hand edit to the slices package).
For now, we only use the NoAlias version for element types without
pointers while waiting on additional runtime support in CL 698515.
Updates #74299
Change-Id: I1b9d286aa97c170dcc2e203ec0f8ca72d84e8221
Reviewed-on: https://go-review.googlesource.com/c/go/+/710015
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
TLDR
- not-huge increase to default starting heap boost,
- small improvement in build performance,
- remove concurrency dependence of starting heap,
- aligns RSS behavior with GOMEMLIMIT,
- adds a gcflags=-d=gcstart=N (N -> N MiB) flag for
people who want to trade a lot of memory for a
little build performance improvement.
This removes concurrency (-c flag) sensitivity and increases the
nominal default to 128MiB.
Refactored the startheap code into a separate file, to make it
easier to extract and reuse.
Added sensitivity to concurrency=1 and GOMEMLIMIT!="" (in addition
to existing GOGC!=""), those disable the default starting heap boost
because the compiler-invoker has indicated either a desire to control
the GC or a desire to run in minimum memory(or both).
Adds a -d flag gcstart=N (N is number of MiB) for
tinkering/experiments. This always enables the starting heap.
(`GOGC=XXX` and `-d=gcstart=YYY` will use `GOGC=XXX` after starting
heap size is achieved.)
Derated the "boost" obtained by a factor of .70 so that
`-d=gcstart=2000` yields the same RSS as `GOMEMLIMIT=2000MiB`
(Actually adjusts the boost with a high-low breakpoint.)
The parent, with concurrency sensitivity, provided 64MB of plain
boost. Derating reduces the effects of boosting the starting heap
slightly. The benchmark here shows that maintaining 64MB results in
a minor regression, while increasing it to 128MB produces a slight
improvement, and does not grow the RSS versus 64MB.
```
│ parent │ sh64 │ sh128 │ sh1024 │
│ sec/op │ sec/op vs base │ sec/op vs base │ sec/op vs base │
std 10.164 ± 1% 10.527 ± 1% +3.57% (p=0.000 n=50) 10.084 ± 1% -0.79% (p=0.000 n=50) 9.631 ± 1% -5.24% (p=0.000 n=50)
compile 21.05 ± 1% 20.78 ± 0% -1.28% (p=0.000 n=50) 20.74 ± 1% -1.46% (p=0.000 n=50) 20.77 ± 0% -1.32% (p=0.001 n=50)
ast 20.45 ± 1% 20.39 ± 1% ~ (p=0.334 n=50) 20.44 ± 0% ~ (p=0.818 n=50) 20.11 ± 1% -1.65% (p=0.000 n=50)
geomean 16.35 16.46 +0.65% 16.23 -0.76% 15.90 -2.75%
│ parent │ sh64 │ sh128 │ sh1024 │
│ user-sec/op │ user-sec/op vs base │ user-sec/op vs base │ user-sec/op vs base │
std 66.06 ± 0% 69.74 ± 0% +5.56% (p=0.000 n=50) 64.68 ± 0% -2.09% (p=0.000 n=50) 59.51 ± 0% -9.91% (p=0.000 n=50)
compile 84.69 ± 1% 82.54 ± 0% -2.53% (p=0.000 n=50) 82.63 ± 0% -2.43% (p=0.000 n=50) 82.66 ± 1% -2.40% (p=0.000 n=50)
ast 59.41 ± 0% 58.84 ± 1% -0.95% (p=0.011 n=50) 59.48 ± 1% ~ (p=0.341 n=50) 57.13 ± 1% -3.83% (p=0.000 n=50)
geomean 69.27 69.71 +0.63% 68.25 -1.47% 65.50 -5.44%
│ parent │ sh64 │ sh128 │ sh1024 │
│ sys-sec/op │ sys-sec/op vs base │ sys-sec/op vs base │ sys-sec/op vs base │
std 9.599 ± 1% 10.031 ± 1% +4.50% (p=0.000 n=50) 9.513 ± 1% -0.90% (p=0.014 n=50) 9.359 ± 1% -2.50% (p=0.000 n=50)
compile 6.813 ± 1% 6.740 ± 1% -1.08% (p=0.017 n=50) 6.716 ± 1% -1.42% (p=0.006 n=50) 6.696 ± 1% -1.72% (p=0.000 n=50)
ast 4.315 ± 1% 4.291 ± 1% ~ (p=0.781 n=50) 4.296 ± 1% ~ (p=0.792 n=50) 4.279 ± 2% ~ (p=0.124 n=50)
geomean 6.559 6.620 +0.93% 6.499 -0.92% 6.449 -1.68%
│ parent │ sh64 │ sh128 │ sh1024 │
│ peak-RSS-bytes │ peak-RSS-bytes vs base │ peak-RSS-bytes vs base │ peak-RSS-bytes vs base │
std 257.1Mi ± 1% 257.2Mi ± 1% ~ (p=0.754 n=50) 257.0Mi ± 0% ~ (p=0.570 n=50) 605.6Mi ± 0% +135.59% (p=0.000 n=50)
compile 1007.2Mi ± 1% 1004.3Mi ± 0% ~ (p=0.064 n=50) 1007.4Mi ± 0% ~ (p=0.348 n=50) 1009.4Mi ± 1% ~ (p=0.598 n=50)
ast 1.848Gi ± 0% 1.842Gi ± 0% ~ (p=0.079 n=50) 1.824Gi ± 0% -1.25% (p=0.000 n=50) 1.856Gi ± 0% +0.47% (p=0.000 n=50)
geomean 788.3Mi 786.8Mi -0.19% 785.0Mi -0.41% 1.027Gi +33.37%
```
Updates #73044
Change-Id: I6359642a94b396e696dd57e64ed1f2c4cf178475
Reviewed-on: https://go-review.googlesource.com/c/go/+/724441
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
riscv64
Make use of compressed instructions on riscv64 - add a compress
pass to the end of the assembler, which replaces non-compressed
instructions with compressed alternatives if possible.
Provide a `compressinstructions` compiler and assembler debug
flag, such that the compression pass can be disabled via
`-asmflags=all=-d=compressinstructions=0` and
`-gcflags=all=-d=compressinstructions=0`. Note that this does
not prevent the explicit use of compressed instructions via
assembly.
Note that this does not make use of compressed control transfer
instructions - this will be implemented in later changes.
Reduces the text size of a hello world binary by ~121KB
and reduces the text size of the go binary on riscv64 by ~1.21MB
(between 8-10% in both cases).
Updates #71105
Cq-Include-Trybots: luci.golang.try:gotip-linux-riscv64
Change-Id: I24258353688554042c2a836deed4830cc673e985
Reviewed-on: https://go-review.googlesource.com/c/go/+/523478
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Eventual goal is that all the architectures agree, and are
sensible. The test will be build-tagged to exclude
not-yet-handled platforms.
This change also bisects the conversion change in case of bugs.
(`bisect -compile=convert ...`)
Change-Id: I98528666b0a3fde17cbe8d69b612d01da18dce85
Reviewed-on: https://go-review.googlesource.com/c/go/+/691135
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Several CLs earlier in this stack added optimizations to reduce
user allocations by recognizing and taking advantage of literals,
including CL 649555, CL 649079, and CL 649035.
This CL adds debug hashing of those changes, which enables use of the
bisect tool, such as 'bisect -compile=literalalloc go test -run=Foo'.
This also allows these optimizations to be manually disabled via
'-gcflags=all=-d=literalallochash=n'.
Updates #71359
Change-Id: I854f7742a6efa5b17d914932d61a32b2297f0c88
Reviewed-on: https://go-review.googlesource.com/c/go/+/675415
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
logging
This adds additional logging for the work that walk does to reduce
how often an interface conversion results in an allocation.
Also, as part of #71359, we will be updating how escape analysis and
walk handle basic literals, composite literals, and zero values,
so add some tests that uses this new logging.
By the end of our CL stack, we address all of these tests.
Updates #71359
Change-Id: I43fde8343d9aacaec1e05360417908014a86c8bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/649076
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
variable-sized makeslice
CL 653856 enabled stack allocation of variable-sized makeslice results.
This CL adds debug hashing of that change, plus a debug flag
to control the byte threshold used.
The debug hashing machinery means we also now have a way to disable just
the CL 653856 optimization by doing -gcflags='all=-d=variablemakehash=n'
or similar, though the stderr output will then typically have many
lines of debug hash output.
Using this CL plus the bisect command, I was able to retroactively
find one of the lines of code responsible for #73199:
$ bisect -compile=variablemake go test -skip TestListWireGuardDrivers
[...]
bisect: FOUND failing change set
--- change set #1 (enabling changes causes failure)
./security_windows.go:1321:38 (variablemake)
./security_windows.go:1321:38 (variablemake)
---
Previously, I had tracked down those lines by diffing '-gcflags=-m=1'
output and brief code inspection, but seeing the bisect was very nice.
This CL also adds a compiler debug flag to control the threshold for
stack allocation of variably sized make results. This can help
us identify more code that is relying on certain stack allocations.
This might be a temporary flag that we delete prior to Go 1.25
(given we would not want people to rely on it), or maybe it
might make sense to keep it for some period of time beyond the release
of Go 1.25 to help the ecosystem shake out other bugs.
Using these two flags together (and picking a threshold of 64 rather
than the default of 32), it looks for example like this
x/sys/windows code might be relying on stack allocation of
a byte slice:
$ bisect -compile=variablemake go test -gcflags=-d=variablemakethreshold=64 -skip TestListWireGuardDrivers
[...]
bisect: FOUND failing change set
--- change set #1 (enabling changes causes failure)
./syscall_windows_test.go:1178:16 (variablemake)
Updates #73199
Fixes #73253
Change-Id: I160179a0e3c148c3ea86be5c9b6cea8a52c3e5b7
Reviewed-on: https://go-review.googlesource.com/c/go/+/663795
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
For FIPS init-time code+data verification, we need to arrange to
put the FIPS symbols into contiguous regions of the executable
and then record those sections along with the expected checksum.
The cmd/internal/obj changes identify the FIPS symbols and give
them distinguished types, which the linker then places in contiguous
regions. The linker also writes out information to use at run time
to find the FIPS sections, along with the expected hash.
See cmd/internal/obj/fips.go and cmd/link/internal/ld/fips.go
for more details.
The code is disabled in this commit.
CL 625998 and 625999 adds tests.
CL 626000 enables the code.
For #69536.
Change-Id: I48da6db94bc0bea7428c43d4abcf999527bccfcd
Reviewed-on: https://go-review.googlesource.com/c/go/+/625997
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Use OTAILCALL in wrapper if the receiver and method are both pointers and it is
not going to be inlined, similar to how it is done in reflectdata.methodWrapper.
Currently tail call may be used for functions with identical argument types.
This change updates wrappers where both wrapper and the wrapped method's
receiver are pointers. In this case, we have the same signature for the
wrapper and the wrapped method (modulo the receiver's pointed-to types),
and do not need any local variables in the generated wrapper (on stack)
because the arguments are immediately passed to the wrapped method in place
(without need to move some value passed to other register or to change any
argument/return passed through stack). Thus, the wrapper does not need its
own stack frame.
This applies to promoted methods, e.g. when we have some struct type U with
an embedded type *T and construct a wrapper like
func (recv *U) M(arg int) bool { return recv.T.M(i) }
See also test/abi/method_wrapper.go for a running example.
Code size difference measured with this change (tried for x86_64):
etcd binary:
.text section size: 21472251 -> 21432350 (0.2%)
total binary size: 32226640 -> 32191136 (0.1%)
compile binary:
.text section size: 17419073 -> 17413929 (0.03%)
total binary size: 26744743 -> 26737567 (0.03%)
Change-Id: I9bbe730568f6def21a8e61118a6b6f503d98049c
Reviewed-on: https://go-review.googlesource.com/c/go/+/578235
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
This CL adds a "deadlocals" pass, which runs after inlining and before
escape analysis, to prune any unneeded local variables and
assignments. In particular, this helps avoid unnecessary Addrtaken
markings from unreachable closures.
Deadlocals is sensitive to "_ = ..." as a signal of explicit
use for testing. This signal occurs only if the entire
left-hand-side is "_" targets; if it is
`_, ok := someInlinedFunc(args)`
then the first return value is eligible for dead code elimination.
Use this (`_ = x`) to fix tests broken by deadlocals elimination.
Includes a test, based on one of the tests that required modification.
Matthew Dempsky wrote this, changing ownership to allow rebases, commits, tweaks.
Fixes #65158.
Old-Change-Id: I723fb69ccd7baadaae04d415702ce6c8901eaf4e
Change-Id: I1f25f4293b19527f305c18c3680b214237a7714c
Reviewed-on: https://go-review.googlesource.com/c/go/+/600498
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: David Chase <drchase@google.com>
Commit-Queue: David Chase <drchase@google.com>
|
|
This appears to be useful only on amd64, and was specifically
benchmarked on Apple Silicon and did not produce any benefit there.
This CL adds the assembly instruction `PCALIGNMAX align,amount`
which aligns to `align` if that can be achieved with `amount`
or fewer bytes of padding. (0 means never, but will align the
enclosing function.)
Specifically, if low-order-address-bits + amount are
greater than or equal to align; thus, `PCALIGNMAX 64,63` is
the same as `PCALIGN 64` and `PCALIGNMAX 64,0` will never
emit any alignment, but will still cause the function itself
to be aligned to (at least) 64 bytes.
Change-Id: Id51a056f1672f8095e8f755e01f72836c9686aa3
Reviewed-on: https://go-review.googlesource.com/c/go/+/577935
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
It is possible to have situations where a given ir.Name is
non-address-taken at the source level, but whose address is
materialized in order to accommodate the needs of arch-dependent
memory ops. The issue here is that the SymAddr op will show up as
touching a variable of interest, but the subsequent memory op will
not. This is generally not an issue for computing whether something is
live across a call, but it is problematic for collecting the more
fine-grained live interval info that drives stack slot merging.
As an example, consider this Go code:
package p
type T struct {
x [10]int
f float64
}
func ABC(i, j int) int {
var t T
t.x[i&3] = j
return t.x[j&3]
}
On amd64 the code sequences we'll see for accesses to "t" might look like
v10 = VarDef <mem> {t} v1
v5 = MOVOstoreconst <mem> {t} [val=0,off=0] v2 v10
v23 = LEAQ <*T> {t} [8] v2 : DI
v12 = DUFFZERO <mem> [80] v23 v5
v14 = ANDQconst <int> [3] v7 : AX
v19 = MOVQstoreidx8 <mem> {t} v2 v14 v8 v12
v22 = ANDQconst <int> [3] v8 : BX
v24 = MOVQloadidx8 <int> {t} v2 v22 v19 : AX
v25 = MakeResult <int,mem> v24 v19 : <>
Note that the the loads and stores (ex: v19, v24) all refer directly
to "t", which means that regular live analysis will work fine for
identifying variable lifetimes. The DUFFZERO is (in effect) an
indirect write, but since there are accesses immediately after it we
wind up with the same live intervals.
Now the same code with GOARCH=ppc64:
v10 = VarDef <mem> {t} v1
v20 = MOVDaddr <*T> {t} v2 : R20
v12 = LoweredZero <mem> [88] v20 v10
v3 = CLRLSLDI <int> [212543] v7 : R5
v15 = MOVDaddr <*T> {t} v2 : R6
v19 = MOVDstoreidx <mem> v15 v3 v8 v12
v29 = CLRLSLDI <int> [212543] v8 : R4
v24 = MOVDloadidx <int> v15 v29 v19 : R3
v25 = MakeResult <int,mem> v24 v19 : <>
Here instead of memory ops that refer directly to the symbol, we take
the address of "t" (ex: v15) and then pass the address to memory ops
(where the ops themselves no longer refer to the symbol).
This patch enhances the stack slot merging liveness analysis to handle
cases like the PPC64 one above. We add a new phase in candidate
selection that collects more precise use information for merge
candidates, and screens out candidates that are too difficult to
analyze. The phase make a forward pass over each basic block looking
for instructions of the form vK := SymAddr(N) where N is a raw
candidate. It then creates an entry in a map with key vK and value
holding name and the vK use count. As the walk continues, we check for
uses of of vK: when we see one, record it in a side table as an
upwards exposed use of N. At each vK use we also decrement the use
count in the map entry, and if we hit zero, remove the map entry. If
we hit the end of the basic block and we still have map entries, this
implies that the address in question "escapes" the block -- at that
point to be conservative we just evict the name in question from the
candidate set.
Although this CL fixes the issues that forced a revert of the original
merging CL, this CL doesn't enable stack slot merging by default; a
subsequent CL will do that.
Updates #62737.
Updates #65532.
Updates #65495.
Change-Id: Id41d359a677767a8e7ac1e962ae23f7becb4031f
Reviewed-on: https://go-review.googlesource.com/c/go/+/576735
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
[This is a partial roll-forward of CL 553055, the main change here
is that the stack slot overlap operation is flagged off by default
(can be enabled by hand with -gcflags=-d=mergelocals=1) ]
Preliminary compiler support for merging/overlapping stack slots of
local variables whose access patterns are disjoint.
This patch includes changes in AllocFrame to do the actual
merging/overlapping based on information returned from a new
liveness.MergeLocals helper. The MergeLocals helper identifies
candidates by looking for sets of AUTO variables that either A) have
the same size and GC shape (if types contain pointers), or B) have the
same size (but potentially different types as long as those types have
no pointers). Variables must be greater than (3*types.PtrSize) in size
to be considered for merging.
After forming candidates, MergeLocals collects variables into "can be
overlapped" equivalence classes or partitions; this process is driven
by an additional liveness analysis pass. Ideally it would be nice to
move the existing stackmap liveness pass up before AllocFrame
and "widen" it to include merge candidates so that we can do just a
single liveness as opposed to two passes, however this may be difficult
given that the merge-locals liveness has to take into account
writes corresponding to dead stores.
This patch also required a change to the way ssa.OpVarDef pseudo-ops
are generated; prior to this point they would only be created for
variables whose type included pointers; if stack slot merging is
enabled then the ssagen code creates OpVarDef ops for all auto vars
that are merge candidates.
Note that some temporaries created late in the compilation process
(e.g. during ssa backend) are difficult to reason about, especially in
cases where we take the address of a temp and pass it to the runtime.
For the time being we mark most of the vars created post-ssagen as
"not a merge candidate".
Stack slot merging for locals/autos is enabled by default if "-N" is
not in effect, and can be disabled via "-gcflags=-d=mergelocals=0".
Fixmes/todos/restrictions:
- try lowering size restrictions
- re-evaluate the various skips that happen in SSA-created autotmps
Updates #62737.
Updates #65532.
Updates #65495.
Change-Id: Ifda26bc48cde5667de245c8a9671b3f0a30bb45d
Reviewed-on: https://go-review.googlesource.com/c/go/+/575415
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This reverts CL 553055.
Reason for revert: causes crypto/ecdsa failures on linux ppc64/s390x builders
Change-Id: I9266b030693a5b6b1e667a009de89d613755b048
Reviewed-on: https://go-review.googlesource.com/c/go/+/575236
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Auto-Submit: Than McIntosh <thanm@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Preliminary compiler support for merging/overlapping stack
slots of local variables whose access patterns are disjoint.
This patch includes changes in AllocFrame to do the actual
merging/overlapping based on information returned from a new
liveness.MergeLocals helper. The MergeLocals helper identifies
candidates by looking for sets of AUTO variables that either A) have
the same size and GC shape (if types contain pointers), or B) have the
same size (but potentially different types as long as those types have
no pointers). Variables must be greater than (3*types.PtrSize) in size
to be considered for merging.
After forming candidates, MergeLocals collects variables into "can be
overlapped" equivalence classes or partitions; this process is driven
by an additional liveness analysis pass. Ideally it would be nice to
move the existing stackmap liveness pass up before AllocFrame
and "widen" it to include merge candidates so that we can do just a
single liveness as opposed to two passes, however this may be difficult
given that the merge-locals liveness has to take into account
writes corresponding to dead stores.
This patch also required a change to the way ssa.OpVarDef pseudo-ops
are generated; prior to this point they would only be created for
variables whose type included pointers; if stack slot merging is
enabled then the ssagen code creates OpVarDef ops for all auto vars
that are merge candidates.
Note that some temporaries created late in the compilation process
(e.g. during ssa backend) are difficult to reason about, especially in
cases where we take the address of a temp and pass it to the runtime.
For the time being we mark most of the vars created post-ssagen as
"not a merge candidate".
Stack slot merging for locals/autos is enabled by default if "-N" is
not in effect, and can be disabled via "-gcflags=-d=mergelocals=0".
Fixmes/todos/restrictions:
- try lowering size restrictions
- re-evaluate the various skips that happen in SSA-created autotmps
Fixes #62737.
Updates #65532.
Updates #65495.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest
Change-Id: Ibc22e8a76c87e47bc9fafe4959804d9ea923623d
Reviewed-on: https://go-review.googlesource.com/c/go/+/553055
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Shape-based stenciling in the Go compiler's generic instantiation
phase looks up shape types using the underlying type of a given target
type. This has a beneficial effect in most cases (e.g. we can use the
same shape type for two different named types whose underlying type is
"int"), but causes some problems when the underlying type is a very
large structure. The link string for the underlying type of a large
imported struct can be extremely long, since the link string
essentially enumerates the full package path for every field type;
this can produce a "go.shape.struct { ... " symbol name that is
absurdly long.
This patch switches the compiler to use a hash of the underlying type
link string instead of the string itself, which should continue to
provide commoning but keep symbol name lengths reasonable for shape
types based on large imported structs.
Fixes #65030.
Change-Id: I87d602626c43172beb99c186b8ef72327b8227a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/554975
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Than McIntosh <thanm@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
Per the discussion on the issue, since no problems related to this
appeared since Go 1.20, remove the ability to disable the check for
anonymous interface cycles permanently.
Adjust various tests accordingly.
For #56103.
Change-Id: Ica2b28752dca08934bbbc163a9b062ae1eb2a834
Reviewed-on: https://go-review.googlesource.com/c/go/+/550896
Run-TryBot: Robert Griesemer <gri@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
Extend the pgodevirtualize debug flag to distinguish interface and
function devirtualization. Setting 1 keeps interface devirtualization
enabled but disables function value devirtualization.
For #64209.
Change-Id: I33aa7eb95ca0bdb215256d8c7cc8f9dac53ae30e
Reviewed-on: https://go-review.googlesource.com/c/go/+/543115
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add a debugging flag "-d=inlscoreadj" intended to support running
experiments in which the inliner uses different score adjustment
values for specific heuristics. The flag argument is a series of
clauses separated by the "/" char where each clause takes the form
"adjK:valK". For example, in this build
go build -gcflags=-d=inlscoreadj=inLoopAdj:10/returnFeedsConstToIfAdj:-99
the "in loop" score adjustments would be reset to a value of 15 (effectively
penalizing calls in loops) adn the "return feeds constant to foldable if/switch"
score adjustment would be boosted from -15 to -99.
Change-Id: Ibd1ee334684af5992466556a69baa6dfefb246b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/532116
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
E.g.
`GOEXPERIMENT=rangefunc go test -v -gcflags=-d=rangefunccheck=0 rangefunc_test.go`
will turn off the checking and fail.
The benchmarks, which do not use pathological iterators, run slightly faster.
Change-Id: Ia3e175e86d67ef74bbae9bcc5d2def6a2cdf519d
Reviewed-on: https://go-review.googlesource.com/c/go/+/541995
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
When a PGO build fails or produces incorrect program, it is often
unclear what the problem is. Add pgo hash so we can bisect to
individual optimization decisions, which often helps debugging.
Related to #58153.
Change-Id: I651ffd9c53bad60f2f28c8ec2a90a3f532982712
Reviewed-on: https://go-review.googlesource.com/c/go/+/528400
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Add a new debug flag "-d=dumpinlcallsitescores" that dumps out a
summary of all callsites in the package being compiled with info on
inlining heuristics, for human consumption. Sample output lines:
Score Adjustment Status Callee CallerPos ScoreFlags
...
115 40 DEMOTED cmd/compile/internal/abi.(*ABIParamAssignment).Offset expand_calls.go:1679:14|6 panicPathAdj
...
76 -5 PROMOTED runtime.persistentalloc mcheckmark.go:48:45|3 inLoopAdj
...
201 0 --- PGO unicode.DecodeRuneInString utf8.go:312:30|1
...
7 -5 --- PGO internal/abi.Name.DataChecked type.go:625:22|0 inLoopAdj
Here "Score" is the final score calculated for the callsite,
"Adjustment" is the amount added to or subtracted from the original
hairyness estimate to form the score. "Status" shows whether anything
changed with the site -- did the adjustment bump it down just below
the threshold ("PROMOTED") or instead bump it above the threshold
("DEMOTED") or did nothing happen as a result of the heuristics
("---"); "Status" also shows whether PGO was involved. "Callee" is the
name of the function called, "CallerPos" is the position of the
callsite, and "ScoreFlags" is a digest of the specific properties we
used to make adjustments to callsite score via heuristics.
Change-Id: Iea4b1cbfee038bc68df6ab81e9973f145636300b
Reviewed-on: https://go-review.googlesource.com/c/go/+/513455
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Currently, cmd/compile optimizes `var a = true; var b = a` into `var a
= true; var b = true`. But this may not be safe if we need to
initialize any other global variables between `a` and `b`, and the
initialization involves calling a user-defined function that reassigns
`a`.
This CL changes staticinit to keep track of the initialization
expressions that we've seen so far, and to stop applying the
staticcopy optimization once we've seen an initialization expression
that might have modified another global variable within this package.
To help identify affected initializers, this CL adds a -d=staticcopy
flag to warn when a staticcopy is suppressed and turned into a dynamic
copy.
Currently, `go build -gcflags=all=-d=staticcopy std` reports only four
instances:
```
encoding/xml/xml.go:1600:5: skipping static copy of HTMLEntity+0 with map[string]string{...}
encoding/xml/xml.go:1869:5: skipping static copy of HTMLAutoClose+0 with []string{...}
net/net.go:661:5: skipping static copy of .stmp_31+0 with poll.ErrNetClosing
net/http/transport.go:2566:5: skipping static copy of errRequestCanceled+0 with ~R0
```
Fixes #51913.
Change-Id: Iab41cf6f84c44f7f960e4e62c28a8aeaade4fbcf
Reviewed-on: https://go-review.googlesource.com/c/go/+/395541
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
|
|
This CL implements the remainder of the zero-copy string->[]byte
conversion optimization initially attempted in go.dev/cl/520395, but
fixes the tracking of mutations due to ODEREF/ODOTPTR assignments, and
adds more comprehensive tests that I should have included originally.
However, this CL also keeps it behind the -d=zerocopy flag. The next
CL will enable it by default (for easier rollback).
Updates #2205.
Change-Id: Ic330260099ead27fc00e2680a59c6ff23cb63c2b
Reviewed-on: https://go-review.googlesource.com/c/go/+/520599
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
|
|
Add some machinery to support computing function "properties" for use
in driving inlining heuristics, and a unit testing framework to check
to see if the property computations are correct for a given set of
canned Go source files. This CL is mainly the analysis skeleton and a
testing framework; the code to compute the actual props will arrive in
a later patch.
Updates #61502.
Change-Id: I7970b64f713d17d7fdd7e8e9ccc7d9b0490571bf
Reviewed-on: https://go-review.googlesource.com/c/go/+/511557
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This CL is originally based on CL 484838 from rajbarik@uber.com.
Add a new PGO-based devirtualize pass. This pass conditionally
devirtualizes interface calls for the hottest callee. That is, it
performs a transformation like:
type Iface interface {
Foo()
}
type Concrete struct{}
func (Concrete) Foo() {}
func foo(i Iface) {
i.Foo()
}
to:
func foo(i Iface) {
if c, ok := i.(Concrete); ok {
c.Foo()
} else {
i.Foo()
}
}
The primary benefit of this transformation is enabling inlining of the
direct calls.
Today this change has no impact on the escape behavior, as the fallback
interface always forces an escape. But improving escape analysis to take
advantage of this is an area of potential work.
This CL is the bare minimum of a devirtualization implementation. There
are still numerous limitations:
* Callees not directly referenced in the current package can be missed
(even if they are in the transitive dependences).
* Callees not in the transitive dependencies of the current package are
missed.
* Only interface method calls are supported, not other indirect function
calls.
* Multiple calls to compatible interfaces on the same line cannot be
distinguished and will use the same callee target.
* Callees that only partially implement an interface (they are embedded
in another type that completes the interface) cannot be devirtualized.
* Others, mentioned in TODOs.
Fixes #59959
Change-Id: I8bedb516139695ee4069650b099d05957b7ce5ee
Reviewed-on: https://go-review.googlesource.com/c/go/+/492436
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
We will soon have PGO specialization. It doesn't make sense for the
debug flag to have inline in the name, so rename it to pgodebug.
pgoinline is now a flag that can be used to disable PGO inlining.
Devirtualization will have a similar debug flag.
For #59959.
Change-Id: I9770ff1f0d132dfa3cd417018a887a1bd5555bba
Reviewed-on: https://go-review.googlesource.com/c/go/+/494716
Auto-Submit: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Delete the "InlineSCCOnePass" debugging flag and the inliner fallback
code that kicks in if it is used. The change it was intended to guard
has been working on tip for some time, no need for the fallback any
more.
Updates #58905.
Change-Id: I2e1dbc7640902d9402213db5ad338be03deb96c5
Reviewed-on: https://go-review.googlesource.com/c/go/+/492015
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This patch changes the relative order of "CanInline" and "InlineCalls"
operations within the inliner for clumps of functions corresponding to
strongly connected components in the call graph. This helps increase
the amount of inlining within SCCs, particularly in Go's runtime
package, which has a couple of very large SCCs.
For a given SCC of the form { fn1, fn2, ... fnk }, the inliner would
(prior to this point) walk through the list of functions and for each
function first compute inlinability ("CanInline") and then perform
inlining ("InlineCalls"). This meant that if there was an inlinable
call from fn3 to fn4 (for example), this call would never be inlined,
since at the point fn3 was visited, we would not have computed
inlinability for fn4.
We now do inlinability analysis for all functions in an SCC first,
then do actual inlining for everything. This results in 47 additional
inlines in the Go runtime package (a fairly modest increase
percentage-wise of 0.6%).
Updates #58905.
Change-Id: I48dbb1ca16f0b12f256d9eeba8cf7f3e6dd853cd
Reviewed-on: https://go-review.googlesource.com/c/go/+/474955
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
|
Remove the compiler's "-wrapglobalmapinit" flag; it is potentially
confusing for users and isn't appropriate as a top level flag. Move
the enable/disable control to the "wrapglobalmapctl" debug flag
(values: 0 on by default, 1 disabled, 2 stress mode). No other changes
to compiler functionality.
Change-Id: I0d120eaf90ee34e29d5032889e673d42fe99e5dc
Reviewed-on: https://go-review.googlesource.com/c/go/+/475035
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Adds:
GOEXPERIMENT=loopvar (expected way of invoking)
-d=loopvar={-1,0,1,2,11,12} (for per-package control and/or logging)
-d=loopvarhash=... (for hash debugging)
loopvar=11,12 are for testing, benchmarking, and debugging.
If enabled,for loops of the form `for x,y := range thing`, if x and/or
y are addressed or captured by a closure, are transformed by renaming
x/y to a temporary and prepending an assignment to the body of the
loop x := tmp_x. This changes the loop semantics by making each
iteration's instance of x be distinct from the others (currently they
are all aliased, and when this matters, it is almost always a bug).
3-range with captured iteration variables are also transformed,
though it is a more complex transformation.
"Optimized" to do a simpler transformation for
3-clause for where the increment is empty.
(Prior optimization of address-taking under Return disabled, because
it was incorrect; returns can have loops for children. Restored in
a later CL.)
Includes support for -d=loopvarhash=<binary string> intended for use
with hash search and GOCOMPILEDEBUG=loopvarhash=<binary string>
(use `gossahash -e loopvarhash command-that-fails`).
Minor feature upgrades to hash-triggered features; clients can specify
that file-position hashes use only the most-inline position, and/or that
they use only the basenames of source files (not the full directory path).
Most-inlined is the right choice for debugging loop-iteration change
once the semantics are linked to the package across inlining; basename-only
makes it tractable to write tests (which, otherwise, depend on the full
pathname of the source file and thus vary).
Updates #57969.
Change-Id: I180a51a3f8d4173f6210c861f10de23de8a1b1db
Reviewed-on: https://go-review.googlesource.com/c/go/+/411904
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This patch changes the compiler's pkg init machinery to pick out large
initialization assignments to global maps (e.g.
var mymap = map[string]int{"foo":1, "bar":2, ... }
and extract the map init code into a separate outlined function, which is
then called from the main init function with a weak relocation:
var mymap map[string]int // KEEP reloc -> map.init.0
func init() {
map.init.0() // weak relocation
}
func map.init.0() {
mymap = map[string]int{"foo":1, "bar":2}
}
The map init outlining is done selectively (only in the case where the
RHS code exceeds a size limit of 20 IR nodes).
In order to ensure that a given map.init.NNN function is included when
its corresponding map is live, we add dummy R_KEEP relocation from the
map variable to the map init function.
This first patch includes the main compiler compiler changes, and with
the weak relocation addition disabled. Subsequent patch includes the
requred linker changes along with switching to the call to the
outlined routine to a weak relocation. See the later linker change for
associated compile time performance numbers.
Updates #2559.
Updates #36021.
Updates #14840.
Change-Id: I1fd6fd6397772be1ebd3eb397caf68ae9a3147e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/461315
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Sym.Def used to be used for symbol resolution during the
old (pre-types2) typechecker. But since moving to types2-based IR
construction, we haven't really had a need for Sym.Def to ever refer
to anything but the package-scope definition, because types2 handles
symbol resolution for us.
This CL finally removes the Markdcl/Pushdcl/Popdcl functions that have
been a recurring source of issues in the past.
Change-Id: I2b012a0f17203efdd724ebd1e9314bd128cc2d61
Reviewed-on: https://go-review.googlesource.com/c/go/+/458625
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
|
|
This flag forced the compiler to eagerly type check all available
inline function bodies, which presumably was useful in the early days
of implementing inlining support. However, it shouldn't have any
significance with the unified frontend, since the same code paths are
used for constructing normal function bodies as for inlining.
Updates #57410.
Change-Id: I6842cf86bcd0fbf22ac336f2fc0b7b8fe14bccca
Reviewed-on: https://go-review.googlesource.com/c/go/+/458617
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This CL removes the GOEXPERIMENT=nounified knob, and any conditional
statements that depend on that knob. Further CLs to remove unreachable
code follow this one.
Updates #57410.
Change-Id: I39c147e1a83601c73f8316a001705778fee64a91
Reviewed-on: https://go-review.googlesource.com/c/go/+/458615
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This was disabled in CL 452676 out of an abundance of caution,
but further analysis has shown that the failures were not being
caused by this optimization. Instead the sequence of commits was:
CL 450136 cmd/compile: handle simple inlined calls in staticinit
...
CL 449937 archive/tar, archive/zip: return ErrInsecurePath for unsafe paths
...
CL 451555 cmd/compile: fix static init for inlined calls
The failures in question became compile failures in the first CL
and started building again after the last CL.
But in the interim the code had been broken by the middle CL.
CL 451555 was just the first time that the tests could run and fail.
For #30820.
Change-Id: I65064032355b56fdb43d9731be2f9f32ef6ee600
Reviewed-on: https://go-review.googlesource.com/c/go/+/452817
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This CL adds -d=inlstaticinit to control whether static initialization
of inlined function calls (added in CL 450136) is allowed.
We've needed to fix it once already (CL 451555) and Google-internal
testing is hitting additional failure cases, so putting this
optimization behind a feature flag seems appropriate regardless.
Also, while we diagnose and fix the remaining cases, this CL also
disables the optimization to avoid miscompilations.
Updates #56894.
Change-Id: If52a358ad1e9d6aad1c74fac5a81ff9cfa5a3793
Reviewed-on: https://go-review.googlesource.com/c/go/+/452676
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
|
|
This CL changes cmd/compile to reject anonymous interface cycles like:
type I interface { m() interface { I } }
We don't anticipate any users to be affected by this change in
practice. Nonetheless, this CL also adds a `-d=interfacecycles`
compiler flag to suppress the error. And assuming no issue reports
from users, we'll move the check into go/types and types2 instead.
Updates #56103.
Change-Id: I1f1dce2d7aa19fb388312cc020e99cc354afddcb
Reviewed-on: https://go-review.googlesource.com/c/go/+/445598
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
|
|
On advice of the department of garbage collection, forcing a garbage
collection generally does not improve performance. However,
this-data-is-now-unreachable is a good property to be able to test,
and that requires finalizers and a forced GC. So, to save build time,
this test was removed from the compiler itself, but to verify the
property, it was added to the fma_test (and the end-to-end dependence
on the flag was tested with an inserted failure in testing the
test).
TODO: also turn on the new -d=gccheck=1 debug flag on the ssacheck
builder.
Benchmarking reveals that it is profitable to avoid this GC,
with about 1.5% reduction in both user and wall time.
(48 p) https://perf.golang.org/search?q=upload:20221103.3
(12 p) https://perf.golang.org/search?q=upload:20221103.5
Change-Id: I4c4816d619735838a32388acf0cc5eb1cd5f0db5
Reviewed-on: https://go-review.googlesource.com/c/go/+/447359
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Adjust PGO inlining default parameters to 99% CDF threshold and
2000 budget. Benchmark results (mostly from Sweet) show that this
set of parameters performs reasonably well, with a few percent
speedup at the cost of a few percent binary size increase.
Also rename the debug flags to start with "pgo", to make it clear
that they are related to PGO.
Change-Id: I0749249b1298d1dc55a28993c37b3185f9d7639d
Reviewed-on: https://go-review.googlesource.com/c/go/+/449477
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This gets the Go command out of the business of thinking it understands
compiler debug flags, and allows the compiler to turn down its worker
concurrency instead of failing and forcing the user to do the very
same thing. Debug flags that are obviously safe for concurrency
(at least to me) are tagged; probably there's more.
Change-Id: I59bb19861d8a654a9cfd2364ee78c8628212f82e
Reviewed-on: https://go-review.googlesource.com/c/go/+/448359
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
heap growth
Benchmarking suggests about a 14-17% reduction in user build time,
about 3.5-7.8% reduction for wall time. This helps most builds
because small packages are common. Latest benchmarks (after the last
round of improvement):
(12 processors) https://perf.golang.org/search?q=upload:20221102.20
(GOMAXPROCS=2) https://perf.golang.org/search?q=upload:20221103.1
(48 processors) https://perf.golang.org/search?q=upload:20221102.19
(The number of compiler workers is capped at min(4, GOMAXPROCS))
An earlier, similar version of this CL at one point observed a 27%
reduction in user build time (building 40+ benchmarks, 20 times), but
the current form is judged to be the most reliable; it may be
profitable to tweak the numbers slightly later, and/or to adjust the
number of compiler workers.
We've talked about doing this in the past, the "new"(ish) metrics
package makes it a more tractable proposition.
The method here is:
1. If os.Getenv("GOGC") is empty, then increase GOGC to a large value,
calculated to grow the heap to 32 + 4 * compile_parallelism before a
GC occurs (e.g., on a >= 4 processor box, 64M). In practice,
sometimes GC occurs before that, but this still results in fewer GCs
and saved time. This is "heap goal".
2. Use a finalizer to approximately detect when GC occurs, and use
runtime metrics to track progress towards the goal heap size,
readjusting GOGC to retarget it as necessary. Reset GOGC to 100 when
the heap is "close enough" to the goal.
One feared failure mode of doing this is that the finalizer will be
slow to run and the heap will grow exceptionally large before GOGC is
reset; I monitored the heap size at reset and exit across several
boxes with a variety of processor counts and extra noise
(including several builds in parallel, including a laptop with a busy
many-tabs browser running) and overshoot effectively does not occur.
In some cases the compiler's heap grows so rapidly that estimated live
exceeds the GC goal, but this is not delayed-finalizer overshoot; the
compiler is just using that much memory. In a small number of cases
(3% of GCs in make.bash) the new goal is larger than predicted by as
much as 38%, so check for that and redo the adjustment.
I considered instead using the maximum heap size limit +
GC-detecting-finalizer + reset instead, but to me that seemed like it
might have a worse bad-case outcome; if the reset is delayed, it's
possible the GC would start running frequently, making it harder to
run the finalizer, reach 50% utilization, and the extra GCs would
lose the advantage. This might also perform badly in the case that a
rapidly growing heap outruns goal. In practice, this sort of
overshoot hasn't been observed, and a goal of 64M is small enough to
tolerate plenty of overshoot anyway.
This version of the CL includes a comment urging anyone who sees the
code and thinks it would work for them, to update a bug (to be
created if the CL is approved) with information about their
situation/experience, so that we may consider creating some more
official and reliable way of obtaining the same result.
Change-Id: I45df1c927c1a7d7503ade1abd1a3300e27516633
Reviewed-on: https://go-review.googlesource.com/c/go/+/436235
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This adds a -d debug flag "fmahash" for hashcode search for
floating point architecture-dependent problems. This variable has no
effect on architectures w/o fused-multiply-add.
This was rebased onto the GOSSAHASH renovation so that this could have
its own dedicated environment variable, and so that it would be
cheap (a nil check) to check it in the normal case.
Includes a basic test of the trigger plumbing.
Sample use (on arm64, ppc64le, s390x):
% GOCOMPILEDEBUG=fmahash=001110110 \
go build -o foo cmd/compile/internal/ssa/testdata/fma.go
fmahash triggered main.main:24 101111101101111001110110
GOFMAHASH triggered main.main:20 010111010000101110111011
1.0000000000000002 1.0000000000000004 -2.220446049250313e-16
exit status 1
The intended use is in conjunction with github.com/dr2chase/gossahash,
which will probably acquire a flag "-fma" to streamline its use. This
tool+use was inspired by an ad hoc use of this technique "in anger"
to debug this very problem. This is also a dry-run for using this
same technique to identify code sensitive to loop variable
lifetime/capture, should we make that change.
Example intended use, with current search tool (using old environment
variable), for a test example:
gossahash -e GOFMAHASH GOMAGIC=GOFMAHASH go run fma.go
Trying go args=[...], env=[GOFMAHASH=1 GOMAGIC=GOFMAHASH]
go failed (81 distinct triggers): exit status 1
Trying go args=[...], env=[GOFMAHASH=11 GOMAGIC=GOFMAHASH]
go failed (39 distinct triggers): exit status 1
Trying go args=[...], env=[GOFMAHASH=011 GOMAGIC=GOFMAHASH]
go failed (18 distinct triggers): exit status 1
Trying go args=[...], env=[GOFMAHASH=0011 GOMAGIC=GOFMAHASH]
Trying go args=[...], env=[GOFMAHASH=1011 GOMAGIC=GOFMAHASH]
...
Trying go args=[...], env=[GOFMAHASH=0110111011 GOMAGIC=GOFMAHASH]
Trying go args=[...], env=[GOFMAHASH=1110111011 GOMAGIC=GOFMAHASH]
go failed (2 distinct triggers): exit status 1
Trigger string is 'GOFMAHASH triggered math.qzero:427 111111101010011110111011', repeated 6 times
Trigger string is 'GOFMAHASH triggered main.main:20 010111010000101110111011', repeated 1 times
Trying go args=[...], env=[GOFMAHASH=01110111011 GOMAGIC=GOFMAHASH]
go failed (1 distinct triggers): exit status 1
Trigger string is 'GOFMAHASH triggered main.main:20 010111010000101110111011', repeated 1 times
Review GSHS_LAST_FAIL.0.log for failing run
FINISHED, suggest this command line for debugging:
GOSSAFUNC='main.main:20 010111010000101110111011' \
GOFMAHASH=01110111011 GOMAGIC=GOFMAHASH go run fma.go
Change-Id: Ifa22dd8f1c37c18fc8a4f7c396345a364bc367d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/394754
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
Currently in PGO we use a percentage threshold to determine if a
callsite is hot. This CL uses a different method -- treating the
hottest callsites that make up cumulatively top X% of total edge
weights as hot (X=95 for now). This default might work better for
a wider range of profiles. (The absolute threshold can still be
changed by a flag.)
For #55022.
Change-Id: I7e3b6f0c3cf23f9a89dd5994c10075b498bf14ee
Reviewed-on: https://go-review.googlesource.com/c/go/+/447016
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Currently, with PGO, the inliner uses node weights to decide if a
function is inlineable (with a larger budget). But the actual
inlining is determined by the weight of the call edge. There is a
discrepancy that, if a callee node is hot but the call edge is not,
it would not inlined, and marking the callee inlineable would of
no use.
Instead of using two kinds of weights, we just use the edge
weights to decide inlineability. If a function is the callee of a
hot call edge, its inlineability is determined with a larger
threshold. For a function that exceeds the regular inlining budget,
it is still inlined only when the call edge is hot, as it would
exceed the regular inlining cost for non-hot call sites, even if
it is marked inlineable.
For #55022.
Change-Id: I93fa9919fc6bcbb394e6cfe54ec96a96eede08f7
Reviewed-on: https://go-review.googlesource.com/c/go/+/447015
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Randomized feature enable/disable might be something we use to
help users debug any problems with changed loop variable capture,
and there's another CL that would like to use it to help in
locating places where "fused" multiply add instructions change
program behavior.
This CL:
- adds the ability to include an integer parameter (e.g. line number)
- replumbed the environment variable into a flag to simplify go build cache management
- but added an environment variable to allow flag setting through the environment
- which adds the possibility of switching on a different variable
(if there's one built-in for variable capture, it shouldn't be GOSSAHASH)
- cleaned up the checking code
- adds tests for all the intended behavior
- removes the case for GSHS_LOGFILE; TBD whether we'll need to put that back
or if there is another way.
Change-Id: I8503e1bb3dbc4a743aea696e04411ea7ab884787
Reviewed-on: https://go-review.googlesource.com/c/go/+/443063
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: David Chase <drchase@google.com>
|
|
Also removes no-longer-needed "Any" field from compiler's DebugFlags.
Test/use case for this is the fmahash CL.
Change-Id: I214f02c91f30fc2ce53caf75fa5e2b905dd33429
Reviewed-on: https://go-review.googlesource.com/c/go/+/445495
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
|
|
For #55022
Change-Id: I51f1ba166d5a66dcaf4b280756be4a6bf9545c5e
Reviewed-on: https://go-review.googlesource.com/c/go/+/429863
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
|