| Age | Commit message (Collapse) | Author |
|
During the investigation for #78081, I found it helpful to use the -h
flag to get a compiler core dump. I don't see any reason why it would
need to clamp concurrency down. We have pretty good support for not
interleaving log lines and if people really need it, they can always
pass -c=1.
Change-Id: Ic6425fc0da63e7ac42e0821f3d40ec4c6a6a6964
Reviewed-on: https://go-review.googlesource.com/c/go/+/756321
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Daniel Morsing <daniel.morsing@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
|
|
The property was missing from some flags that we potentially care
about. This change only affects flags supplied in the command line
(not explicitly changed within the compiler) but nonetheless it seems
like a good idea to get them right -- we might adjust them on the
command line, someone might read the code and wonder why they're
unset.
Change-Id: I44812ddea640b71c078594317ef3506ab055a37b
Reviewed-on: https://go-review.googlesource.com/c/go/+/452876
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
The prove pass uses warnings with locations that are sorted before they
are emitted. When debugging on level 3, this sorting can obscure which
log lines happened before each other, obscuring the flow of control.
This gets doubly confusing when debugging issues that include use of the
inliner, where 2 related log lines can be present at the far ends of the
output.
Make the sorting optional. This allows both the errorcheck tests to keep
working and for developers to use the logs as printf-style debugging.
Change-Id: I1c1d88696e830da71eeeca8e324ef9856a6a6964
Reviewed-on: https://go-review.googlesource.com/c/go/+/742341
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
This was extraordinarily useful for inlining work.
I have cleaned it up somewhat, and did some additional tweaks
after working on changes to bloop.
-gcflags=-d=astdump=SomeFunc
-gcflags=-d=astdump=SomeSubPkg.SomeFunc
-gcflags=-d=astdump=Some/Pkg.SomeFunc
-gcflags=-d=astdump=~YourRegExpHere
Change-Id: I3f98601ca96c87d6b191d4b64b264cd236e6d8bf
Reviewed-on: https://go-review.googlesource.com/c/go/+/629775
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
memory after growslice
This CL is part of a set of CLs that attempt to reduce how much work the
GC must do. See the design in https://go.dev/design/74299-runtime-freegc
This CL updates the compiler to examine append calls to prove
whether or not the slice is aliased.
If proven unaliased, the compiler automatically inserts a call to a new
runtime function introduced with this CL, runtime.growsliceNoAlias,
which frees the old backing memory immediately after slice growth is
complete and the old storage is logically dead.
Two append benchmarks below show promising results, executing up to
~2x faster and up to factor of ~3 memory reduction with this CL.
The approach works with multiple append calls for the same slice,
including inside loops, and the final slice memory can be escaping,
such as in a classic pattern of returning a slice from a function
after the slice is built. (The final slice memory is never freed with
this CL, though we have other work that tackles that.)
An example target for this CL is we automatically free the
intermediate memory for the appends in the loop in this function:
func f1(input []int) []int {
var s []int
for _, x := range input {
s = append(s, g(x)) // s cannot be aliased here
if h(x) {
s = append(s, x) // s cannot be aliased here
}
}
return s // slice escapes at end
}
In this case, the compiler and the runtime collaborate so that
the heap allocated backing memory for s is automatically freed after
a successful grow. (For the first grow, there is nothing to free,
but for the second and subsequent growths, the old heap memory is
freed automatically.)
The new runtime.growsliceNoAlias is primarily implemented
by calling runtime.freegc, which we introduced in CL 673695.
The high-level approach here is we step through the IR starting
from a slice declaration and look for any operations that either
alias the slice or might do so, and treat any IR construct we
don't specifically handle as a potential alias (and therefore
conservatively fall back to treating the slice as aliased when
encountering something not understood).
For loops, some additional care is required. We arrange the analysis
so that an alias in the body of a loop causes all the appends in that
same loop body to be marked aliased, even if the aliasing occurs after
the append in the IR:
func f2() {
var s []int
for i := range 10 {
s = append(s, i) // aliased due to next line
alias = s
}
}
For nested loops, we analyse the nesting appropriately so that
for example this append is still proven as non-aliased in the
inner loop even though it aliased for the outer loop:
func f3() {
for range 10 {
var s []int
for i := range 10 {
s = append(s, i) // append using non-aliased slice
}
alias = s
}
}
A good starting point is the beginning of the test/escape_alias.go file,
which starts with ~10 introductory examples with brief comments that
attempt to illustrate the high-level approach.
For more details, see the new .../internal/escape/alias.go file,
especially the (*aliasAnalysis).analyze method.
In the first benchmark, an append in a loop builds up a slice from
nothing, where the slice elements are each 64 bytes. In the table below,
'count' is the number of appends. With 1 append, there is no opportunity
for this CL to free memory. Once there are 2 appends, the growth from
1 element to 2 elements means the compiler-inserted growsliceNoAlias
frees the 1-element array, and we see a ~33% reduction in memory use
and a small reported speed improvement.
As the number of appends increases for example to 5, we are at
a ~20% speed improvement and ~45% memory reduction, and so on until
we reach ~40% faster and ~50% less memory allocated at the end of
the table.
There can be variation in the reported numbers based on -randlayout, so
this table is for 30 different values of -randlayout with a total
n=150. (Even so, there is still some variation, so we probably should
not read too much into small changes.) This is with GOAMD64=v3 on
a VM that gcc reports is cascadelake.
goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ sec/op │ sec/op vs base │
Append64Bytes/count=1-4 31.09n ± 2% 31.69n ± 1% +1.95% (n=150)
Append64Bytes/count=2-4 73.31n ± 1% 70.27n ± 0% -4.15% (n=150)
Append64Bytes/count=3-4 142.7n ± 1% 124.6n ± 1% -12.68% (n=150)
Append64Bytes/count=4-4 149.6n ± 1% 127.7n ± 0% -14.64% (n=150)
Append64Bytes/count=5-4 277.1n ± 1% 213.6n ± 0% -22.90% (n=150)
Append64Bytes/count=6-4 280.7n ± 1% 216.5n ± 1% -22.87% (n=150)
Append64Bytes/count=10-4 544.3n ± 1% 386.6n ± 0% -28.97% (n=150)
Append64Bytes/count=20-4 1058.5n ± 1% 715.6n ± 1% -32.39% (n=150)
Append64Bytes/count=50-4 2.121µ ± 1% 1.404µ ± 1% -33.83% (n=150)
Append64Bytes/count=100-4 4.152µ ± 1% 2.736µ ± 1% -34.11% (n=150)
Append64Bytes/count=200-4 7.753µ ± 1% 4.882µ ± 1% -37.03% (n=150)
Append64Bytes/count=400-4 15.163µ ± 2% 9.273µ ± 1% -38.84% (n=150)
geomean 601.8n 455.0n -24.39%
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ B/op │ B/op vs base │
Append64Bytes/count=1-4 64.00 ± 0% 64.00 ± 0% ~ (n=150)
Append64Bytes/count=2-4 192.0 ± 0% 128.0 ± 0% -33.33% (n=150)
Append64Bytes/count=3-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150)
Append64Bytes/count=4-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150)
Append64Bytes/count=5-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150)
Append64Bytes/count=6-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150)
Append64Bytes/count=10-4 1.938Ki ± 0% 1.000Ki ± 0% -48.39% (n=150)
Append64Bytes/count=20-4 3.938Ki ± 0% 2.001Ki ± 0% -49.18% (n=150)
Append64Bytes/count=50-4 7.938Ki ± 0% 4.005Ki ± 0% -49.54% (n=150)
Append64Bytes/count=100-4 15.938Ki ± 0% 8.021Ki ± 0% -49.67% (n=150)
Append64Bytes/count=200-4 31.94Ki ± 0% 16.08Ki ± 0% -49.64% (n=150)
Append64Bytes/count=400-4 63.94Ki ± 0% 32.33Ki ± 0% -49.44% (n=150)
geomean 1.991Ki 1.124Ki -43.54%
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ allocs/op │ allocs/op vs base │
Append64Bytes/count=1-4 1.000 ± 0% 1.000 ± 0% ~ (n=150)
Append64Bytes/count=2-4 2.000 ± 0% 1.000 ± 0% -50.00% (n=150)
Append64Bytes/count=3-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150)
Append64Bytes/count=4-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150)
Append64Bytes/count=5-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150)
Append64Bytes/count=6-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150)
Append64Bytes/count=10-4 5.000 ± 0% 1.000 ± 0% -80.00% (n=150)
Append64Bytes/count=20-4 6.000 ± 0% 1.000 ± 0% -83.33% (n=150)
Append64Bytes/count=50-4 7.000 ± 0% 1.000 ± 0% -85.71% (n=150)
Append64Bytes/count=100-4 8.000 ± 0% 1.000 ± 0% -87.50% (n=150)
Append64Bytes/count=200-4 9.000 ± 0% 1.000 ± 0% -88.89% (n=150)
Append64Bytes/count=400-4 10.000 ± 0% 1.000 ± 0% -90.00% (n=150)
geomean 4.331 1.000 -76.91%
The second benchmark is similar, but instead uses an 8-byte integer
for the slice element. The first 4 appends in the loop never call into
the runtime thanks to the excellent CL 664299 introduced by Keith in
Go 1.25 that allows some <= 32 byte dynamically-sized slices to be on
the stack, so this CL is neutral for <= 32 bytes. Once the 5th append
occurs at count=5, a grow happens via the runtime and heap allocates
as normal, but freegc does not yet have anything to free, so we see
a small ~1.4ns penalty reported there. But once the second growth
happens, the older heap memory is now automatically freed by freegc,
so we start to see some benefit in memory reductions and speed
improvements, starting at a tiny speed improvement (close to a wash,
or maybe noise) by the second growth before count=10, and building up to
~2x faster with ~68% fewer allocated bytes reported.
goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ sec/op │ sec/op vs base │
AppendInt/count=1-4 2.978n ± 0% 2.969n ± 0% -0.30% (p=0.000 n=150)
AppendInt/count=4-4 4.292n ± 3% 4.163n ± 3% ~ (p=0.528 n=150)
AppendInt/count=5-4 33.50n ± 0% 34.93n ± 0% +4.25% (p=0.000 n=150)
AppendInt/count=10-4 76.21n ± 1% 75.67n ± 0% -0.72% (p=0.000 n=150)
AppendInt/count=20-4 150.6n ± 1% 133.0n ± 0% -11.65% (n=150)
AppendInt/count=50-4 284.1n ± 1% 225.6n ± 0% -20.59% (n=150)
AppendInt/count=100-4 544.2n ± 1% 392.4n ± 1% -27.89% (n=150)
AppendInt/count=200-4 1051.5n ± 1% 702.3n ± 0% -33.21% (n=150)
AppendInt/count=400-4 2.041µ ± 1% 1.312µ ± 1% -35.70% (n=150)
AppendInt/count=1000-4 5.224µ ± 2% 2.851µ ± 1% -45.43% (n=150)
AppendInt/count=2000-4 11.770µ ± 1% 6.010µ ± 1% -48.94% (n=150)
AppendInt/count=3000-4 17.747µ ± 2% 8.264µ ± 1% -53.44% (n=150)
geomean 331.8n 246.4n -25.72%
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ B/op │ B/op vs base │
AppendInt/count=1-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=4-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=5-4 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=150)
AppendInt/count=10-4 192.0 ± 0% 128.0 ± 0% -33.33% (n=150)
AppendInt/count=20-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150)
AppendInt/count=50-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150)
AppendInt/count=100-4 1.938Ki ± 0% 1.000Ki ± 0% -48.39% (n=150)
AppendInt/count=200-4 3.938Ki ± 0% 2.001Ki ± 0% -49.18% (n=150)
AppendInt/count=400-4 7.938Ki ± 0% 4.005Ki ± 0% -49.54% (n=150)
AppendInt/count=1000-4 24.56Ki ± 0% 10.05Ki ± 0% -59.07% (n=150)
AppendInt/count=2000-4 58.56Ki ± 0% 20.31Ki ± 0% -65.32% (n=150)
AppendInt/count=3000-4 85.19Ki ± 0% 27.30Ki ± 0% -67.95% (n=150)
geomean ² -42.81%
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ allocs/op │ allocs/op vs base │
AppendInt/count=1-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=4-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=5-4 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=10-4 2.000 ± 0% 1.000 ± 0% -50.00% (n=150)
AppendInt/count=20-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150)
AppendInt/count=50-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150)
AppendInt/count=100-4 5.000 ± 0% 1.000 ± 0% -80.00% (n=150)
AppendInt/count=200-4 6.000 ± 0% 1.000 ± 0% -83.33% (n=150)
AppendInt/count=400-4 7.000 ± 0% 1.000 ± 0% -85.71% (n=150)
AppendInt/count=1000-4 9.000 ± 0% 1.000 ± 0% -88.89% (n=150)
AppendInt/count=2000-4 11.000 ± 0% 1.000 ± 0% -90.91% (n=150)
AppendInt/count=3000-4 12.000 ± 0% 1.000 ± 0% -91.67% (n=150)
geomean ² -72.76% ²
Of course, these are just microbenchmarks, but likely indicate
there are some opportunities here.
The immediately following CL 712422 tackles inlining and is able to get
runtime.freegc working automatically with iterators such as used by
slices.Collect, which becomes able to automatically free the
intermediate memory from its repeated appends (which earlier
in this work required a temporary hand edit to the slices package).
For now, we only use the NoAlias version for element types without
pointers while waiting on additional runtime support in CL 698515.
Updates #74299
Change-Id: I1b9d286aa97c170dcc2e203ec0f8ca72d84e8221
Reviewed-on: https://go-review.googlesource.com/c/go/+/710015
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
TLDR
- not-huge increase to default starting heap boost,
- small improvement in build performance,
- remove concurrency dependence of starting heap,
- aligns RSS behavior with GOMEMLIMIT,
- adds a gcflags=-d=gcstart=N (N -> N MiB) flag for
people who want to trade a lot of memory for a
little build performance improvement.
This removes concurrency (-c flag) sensitivity and increases the
nominal default to 128MiB.
Refactored the startheap code into a separate file, to make it
easier to extract and reuse.
Added sensitivity to concurrency=1 and GOMEMLIMIT!="" (in addition
to existing GOGC!=""), those disable the default starting heap boost
because the compiler-invoker has indicated either a desire to control
the GC or a desire to run in minimum memory(or both).
Adds a -d flag gcstart=N (N is number of MiB) for
tinkering/experiments. This always enables the starting heap.
(`GOGC=XXX` and `-d=gcstart=YYY` will use `GOGC=XXX` after starting
heap size is achieved.)
Derated the "boost" obtained by a factor of .70 so that
`-d=gcstart=2000` yields the same RSS as `GOMEMLIMIT=2000MiB`
(Actually adjusts the boost with a high-low breakpoint.)
The parent, with concurrency sensitivity, provided 64MB of plain
boost. Derating reduces the effects of boosting the starting heap
slightly. The benchmark here shows that maintaining 64MB results in
a minor regression, while increasing it to 128MB produces a slight
improvement, and does not grow the RSS versus 64MB.
```
│ parent │ sh64 │ sh128 │ sh1024 │
│ sec/op │ sec/op vs base │ sec/op vs base │ sec/op vs base │
std 10.164 ± 1% 10.527 ± 1% +3.57% (p=0.000 n=50) 10.084 ± 1% -0.79% (p=0.000 n=50) 9.631 ± 1% -5.24% (p=0.000 n=50)
compile 21.05 ± 1% 20.78 ± 0% -1.28% (p=0.000 n=50) 20.74 ± 1% -1.46% (p=0.000 n=50) 20.77 ± 0% -1.32% (p=0.001 n=50)
ast 20.45 ± 1% 20.39 ± 1% ~ (p=0.334 n=50) 20.44 ± 0% ~ (p=0.818 n=50) 20.11 ± 1% -1.65% (p=0.000 n=50)
geomean 16.35 16.46 +0.65% 16.23 -0.76% 15.90 -2.75%
│ parent │ sh64 │ sh128 │ sh1024 │
│ user-sec/op │ user-sec/op vs base │ user-sec/op vs base │ user-sec/op vs base │
std 66.06 ± 0% 69.74 ± 0% +5.56% (p=0.000 n=50) 64.68 ± 0% -2.09% (p=0.000 n=50) 59.51 ± 0% -9.91% (p=0.000 n=50)
compile 84.69 ± 1% 82.54 ± 0% -2.53% (p=0.000 n=50) 82.63 ± 0% -2.43% (p=0.000 n=50) 82.66 ± 1% -2.40% (p=0.000 n=50)
ast 59.41 ± 0% 58.84 ± 1% -0.95% (p=0.011 n=50) 59.48 ± 1% ~ (p=0.341 n=50) 57.13 ± 1% -3.83% (p=0.000 n=50)
geomean 69.27 69.71 +0.63% 68.25 -1.47% 65.50 -5.44%
│ parent │ sh64 │ sh128 │ sh1024 │
│ sys-sec/op │ sys-sec/op vs base │ sys-sec/op vs base │ sys-sec/op vs base │
std 9.599 ± 1% 10.031 ± 1% +4.50% (p=0.000 n=50) 9.513 ± 1% -0.90% (p=0.014 n=50) 9.359 ± 1% -2.50% (p=0.000 n=50)
compile 6.813 ± 1% 6.740 ± 1% -1.08% (p=0.017 n=50) 6.716 ± 1% -1.42% (p=0.006 n=50) 6.696 ± 1% -1.72% (p=0.000 n=50)
ast 4.315 ± 1% 4.291 ± 1% ~ (p=0.781 n=50) 4.296 ± 1% ~ (p=0.792 n=50) 4.279 ± 2% ~ (p=0.124 n=50)
geomean 6.559 6.620 +0.93% 6.499 -0.92% 6.449 -1.68%
│ parent │ sh64 │ sh128 │ sh1024 │
│ peak-RSS-bytes │ peak-RSS-bytes vs base │ peak-RSS-bytes vs base │ peak-RSS-bytes vs base │
std 257.1Mi ± 1% 257.2Mi ± 1% ~ (p=0.754 n=50) 257.0Mi ± 0% ~ (p=0.570 n=50) 605.6Mi ± 0% +135.59% (p=0.000 n=50)
compile 1007.2Mi ± 1% 1004.3Mi ± 0% ~ (p=0.064 n=50) 1007.4Mi ± 0% ~ (p=0.348 n=50) 1009.4Mi ± 1% ~ (p=0.598 n=50)
ast 1.848Gi ± 0% 1.842Gi ± 0% ~ (p=0.079 n=50) 1.824Gi ± 0% -1.25% (p=0.000 n=50) 1.856Gi ± 0% +0.47% (p=0.000 n=50)
geomean 788.3Mi 786.8Mi -0.19% 785.0Mi -0.41% 1.027Gi +33.37%
```
Updates #73044
Change-Id: I6359642a94b396e696dd57e64ed1f2c4cf178475
Reviewed-on: https://go-review.googlesource.com/c/go/+/724441
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
riscv64
Make use of compressed instructions on riscv64 - add a compress
pass to the end of the assembler, which replaces non-compressed
instructions with compressed alternatives if possible.
Provide a `compressinstructions` compiler and assembler debug
flag, such that the compression pass can be disabled via
`-asmflags=all=-d=compressinstructions=0` and
`-gcflags=all=-d=compressinstructions=0`. Note that this does
not prevent the explicit use of compressed instructions via
assembly.
Note that this does not make use of compressed control transfer
instructions - this will be implemented in later changes.
Reduces the text size of a hello world binary by ~121KB
and reduces the text size of the go binary on riscv64 by ~1.21MB
(between 8-10% in both cases).
Updates #71105
Cq-Include-Trybots: luci.golang.try:gotip-linux-riscv64
Change-Id: I24258353688554042c2a836deed4830cc673e985
Reviewed-on: https://go-review.googlesource.com/c/go/+/523478
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This change mechanically replaces all occurrences of interface{}
by 'any' (where deemed safe by the 'any' modernizer) throughout
std and cmd, minus their vendor trees.
Since this fix is relatively numerous, it gets its own CL.
Also, 'go generate go/types'.
Change-Id: I14a6b52856c3291c1d27935409bca8d5fd4242a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/719702
Commit-Queue: Alan Donovan <adonovan@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Auto-Submit: Alan Donovan <adonovan@google.com>
|
|
The new conversions can be activated (or bisected) with
-gcflags=all=-d=converthash=PATTERN
where PATTERN is either a hash string or n, qn, y, qy for
no, quietly no, yes, quietly yes.
This CL makes the default pattern be "qn" instead of the
default-default which is an efficient encoding of "qy".
Updates #75834
Change-Id: I88a9fd7880bc999132420c8d0a22a8fdc1e95a2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/711845
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Bypass: David Chase <drchase@google.com>
|
|
Eventual goal is that all the architectures agree, and are
sensible. The test will be build-tagged to exclude
not-yet-handled platforms.
This change also bisects the conversion change in case of bugs.
(`bisect -compile=convert ...`)
Change-Id: I98528666b0a3fde17cbe8d69b612d01da18dce85
Reviewed-on: https://go-review.googlesource.com/c/go/+/691135
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
The compiler has a logic to print different messages on internal
compiler error depending on whether this is a released version of
Go. It hides the panic stack trace if it is a released version. It
does this by checking the version and see if it has a "go" prefix.
This includes all the released versions. However, for a non-
released build, if there is no explicit version set, cmd/dist now
sets the toolchain version as go1.X-devel_XXX, which makes it be
treated as a released compiler, and causes the stack trace to be
hidden. Change the logic to not match a devel compiler as a
released compiler.
Cherry-picked from the dev.simd branch. This CL is not
necessarily SIMD specific. Apply early to reduce risk.
Change-Id: I5d3b2101527212f825b6e4000b36030c4f83870b
Reviewed-on: https://go-review.googlesource.com/c/go/+/682975
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/708855
Reviewed-by: Junyang Shao <shaojunyang@google.com>
|
|
In Go 1.25+, strings.SplitSeq offers better
performance. Here are the benchmark results comparing
strings.Split and strings.SplitSeq in a for-loop, with the
benchmark code located in src/strings/iter_test.go:
goos: darwin
goarch: amd64
pkg: cmd/go/internal/auth
cpu: Intel(R) Core(TM) i7-8569U CPU @ 2.80GHz
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
ParseGitAuth/standard-8 281.4n ± 1% 218.0n ± 11% -22.54% (p=0.000 n=10)
ParseGitAuth/with_url-8 549.1n ± 1% 480.5n ± 13% -12.48% (p=0.002 n=10)
ParseGitAuth/minimal-8 235.4n ± 1% 197.3n ± 7% -16.20% (p=0.000 n=10)
ParseGitAuth/complex-8 797.6n ± 2% 805.2n ± 4% ~ (p=0.481 n=10)
ParseGitAuth/empty-8 87.48n ± 3% 63.25n ± 6% -27.71% (p=0.000 n=10)
ParseGitAuth/malformed-8 228.8n ± 1% 171.2n ± 3% -25.17% (p=0.000 n=10)
geomean 288.9n 237.7n -17.72%
│ old.txt │ new.txt │
│ B/op │ B/op vs base │
ParseGitAuth/standard-8 192.00 ± 0% 96.00 ± 0% -50.00% (p=0.000 n=10)
ParseGitAuth/with_url-8 400.0 ± 0% 288.0 ± 0% -28.00% (p=0.000 n=10)
ParseGitAuth/minimal-8 144.00 ± 0% 80.00 ± 0% -44.44% (p=0.000 n=10)
ParseGitAuth/complex-8 528.0 ± 0% 400.0 ± 0% -24.24% (p=0.000 n=10)
ParseGitAuth/empty-8 32.00 ± 0% 16.00 ± 0% -50.00% (p=0.000 n=10)
ParseGitAuth/malformed-8 176.00 ± 0% 80.00 ± 0% -54.55% (p=0.000 n=10)
geomean 179.0 102.1 -42.96%
│ old.txt │ new.txt │
│ allocs/op │ allocs/op vs base │
ParseGitAuth/standard-8 3.000 ± 0% 2.000 ± 0% -33.33% (p=0.000 n=10)
ParseGitAuth/with_url-8 4.000 ± 0% 3.000 ± 0% -25.00% (p=0.000 n=10)
ParseGitAuth/minimal-8 3.000 ± 0% 2.000 ± 0% -33.33% (p=0.000 n=10)
ParseGitAuth/complex-8 4.000 ± 0% 3.000 ± 0% -25.00% (p=0.000 n=10)
ParseGitAuth/empty-8 2.000 ± 0% 1.000 ± 0% -50.00% (p=0.000 n=10)
ParseGitAuth/malformed-8 3.000 ± 0% 2.000 ± 0% -33.33% (p=0.000 n=10)
geomean 3.086 2.040 -33.91%
Updates #69315.
Change-Id: Id0219edea45d9658d527b863162ebe917e7821d9
GitHub-Last-Rev: 392b315e122f2c9ef8703ca2dbce8f82ec198556
GitHub-Pull-Request: golang/go#75259
Reviewed-on: https://go-review.googlesource.com/c/go/+/701015
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
|
|
Use "reflect.TypeFor" to simplify the code.
Updates #60088
Change-Id: I93db6cbd4f02813d9a81f5d02996db8128cb81a9
GitHub-Last-Rev: 2aee64dac6e13ef869aa73f2abf236650e1c1757
GitHub-Pull-Request: golang/go#75349
Reviewed-on: https://go-review.googlesource.com/c/go/+/701676
Reviewed-by: Mark Freeman <markfreeman@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Several CLs earlier in this stack added optimizations to reduce
user allocations by recognizing and taking advantage of literals,
including CL 649555, CL 649079, and CL 649035.
This CL adds debug hashing of those changes, which enables use of the
bisect tool, such as 'bisect -compile=literalalloc go test -run=Foo'.
This also allows these optimizations to be manually disabled via
'-gcflags=all=-d=literalallochash=n'.
Updates #71359
Change-Id: I854f7742a6efa5b17d914932d61a32b2297f0c88
Reviewed-on: https://go-review.googlesource.com/c/go/+/675415
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
logging
This adds additional logging for the work that walk does to reduce
how often an interface conversion results in an allocation.
Also, as part of #71359, we will be updating how escape analysis and
walk handle basic literals, composite literals, and zero values,
so add some tests that uses this new logging.
By the end of our CL stack, we address all of these tests.
Updates #71359
Change-Id: I43fde8343d9aacaec1e05360417908014a86c8bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/649076
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
variable-sized makeslice
CL 653856 enabled stack allocation of variable-sized makeslice results.
This CL adds debug hashing of that change, plus a debug flag
to control the byte threshold used.
The debug hashing machinery means we also now have a way to disable just
the CL 653856 optimization by doing -gcflags='all=-d=variablemakehash=n'
or similar, though the stderr output will then typically have many
lines of debug hash output.
Using this CL plus the bisect command, I was able to retroactively
find one of the lines of code responsible for #73199:
$ bisect -compile=variablemake go test -skip TestListWireGuardDrivers
[...]
bisect: FOUND failing change set
--- change set #1 (enabling changes causes failure)
./security_windows.go:1321:38 (variablemake)
./security_windows.go:1321:38 (variablemake)
---
Previously, I had tracked down those lines by diffing '-gcflags=-m=1'
output and brief code inspection, but seeing the bisect was very nice.
This CL also adds a compiler debug flag to control the threshold for
stack allocation of variably sized make results. This can help
us identify more code that is relying on certain stack allocations.
This might be a temporary flag that we delete prior to Go 1.25
(given we would not want people to rely on it), or maybe it
might make sense to keep it for some period of time beyond the release
of Go 1.25 to help the ecosystem shake out other bugs.
Using these two flags together (and picking a threshold of 64 rather
than the default of 32), it looks for example like this
x/sys/windows code might be relying on stack allocation of
a byte slice:
$ bisect -compile=variablemake go test -gcflags=-d=variablemakethreshold=64 -skip TestListWireGuardDrivers
[...]
bisect: FOUND failing change set
--- change set #1 (enabling changes causes failure)
./syscall_windows_test.go:1178:16 (variablemake)
Updates #73199
Fixes #73253
Change-Id: I160179a0e3c148c3ea86be5c9b6cea8a52c3e5b7
Reviewed-on: https://go-review.googlesource.com/c/go/+/663795
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
For FIPS init-time code+data verification, we need to arrange to
put the FIPS symbols into contiguous regions of the executable
and then record those sections along with the expected checksum.
The cmd/internal/obj changes identify the FIPS symbols and give
them distinguished types, which the linker then places in contiguous
regions. The linker also writes out information to use at run time
to find the FIPS sections, along with the expected hash.
See cmd/internal/obj/fips.go and cmd/link/internal/ld/fips.go
for more details.
The code is disabled in this commit.
CL 625998 and 625999 adds tests.
CL 626000 enables the code.
For #69536.
Change-Id: I48da6db94bc0bea7428c43d4abcf999527bccfcd
Reviewed-on: https://go-review.googlesource.com/c/go/+/625997
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Use OTAILCALL in wrapper if the receiver and method are both pointers and it is
not going to be inlined, similar to how it is done in reflectdata.methodWrapper.
Currently tail call may be used for functions with identical argument types.
This change updates wrappers where both wrapper and the wrapped method's
receiver are pointers. In this case, we have the same signature for the
wrapper and the wrapped method (modulo the receiver's pointed-to types),
and do not need any local variables in the generated wrapper (on stack)
because the arguments are immediately passed to the wrapped method in place
(without need to move some value passed to other register or to change any
argument/return passed through stack). Thus, the wrapper does not need its
own stack frame.
This applies to promoted methods, e.g. when we have some struct type U with
an embedded type *T and construct a wrapper like
func (recv *U) M(arg int) bool { return recv.T.M(i) }
See also test/abi/method_wrapper.go for a running example.
Code size difference measured with this change (tried for x86_64):
etcd binary:
.text section size: 21472251 -> 21432350 (0.2%)
total binary size: 32226640 -> 32191136 (0.1%)
compile binary:
.text section size: 17419073 -> 17413929 (0.03%)
total binary size: 26744743 -> 26737567 (0.03%)
Change-Id: I9bbe730568f6def21a8e61118a6b6f503d98049c
Reviewed-on: https://go-review.googlesource.com/c/go/+/578235
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
This CL adds a "deadlocals" pass, which runs after inlining and before
escape analysis, to prune any unneeded local variables and
assignments. In particular, this helps avoid unnecessary Addrtaken
markings from unreachable closures.
Deadlocals is sensitive to "_ = ..." as a signal of explicit
use for testing. This signal occurs only if the entire
left-hand-side is "_" targets; if it is
`_, ok := someInlinedFunc(args)`
then the first return value is eligible for dead code elimination.
Use this (`_ = x`) to fix tests broken by deadlocals elimination.
Includes a test, based on one of the tests that required modification.
Matthew Dempsky wrote this, changing ownership to allow rebases, commits, tweaks.
Fixes #65158.
Old-Change-Id: I723fb69ccd7baadaae04d415702ce6c8901eaf4e
Change-Id: I1f25f4293b19527f305c18c3680b214237a7714c
Reviewed-on: https://go-review.googlesource.com/c/go/+/600498
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: David Chase <drchase@google.com>
Commit-Queue: David Chase <drchase@google.com>
|
|
Move the code that opens and increments counters out of the
cmd/internal/telemetry package into cmd/internal/telemetry/counter. The
telemetry package has dependencies on the upload code, which we do not
want to pull into the rest of the go toolchain.
For #68109
Change-Id: I463c106819b169177a783de4a7d93377e81f4e3e
Reviewed-on: https://go-review.googlesource.com/c/go/+/593976
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Findley <rfindley@google.com>
|
|
CL 577935 enabled hot block alignment on 386
and amd64 architecture.
However, this change broke the plan9/386 build.
This change disables AlignHot on plan9/386.
Updates #67502.
Change-Id: If73b066824c7218a9408c6e8f06aec5908b7a64f
Reviewed-on: https://go-review.googlesource.com/c/go/+/586835
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: David du Colombier <0intro@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
|
|
As mentioned in CL 584598, linkname is a mechanism that, when
abused, can break API integrity and even safety of Go programs.
CL 584598 is a first step to restrict the use of linknames, by
implementing a blocklist. This CL takes a step further, tightening
up the restriction by allowing linkname references ("pull") only
when the definition side explicitly opts into it, by having a
linkname on the definition (possibly to itself). This way, it is at
least clear on the definition side that the symbol, despite being
unexported, is accessed outside of the package. Unexported symbols
without linkname can now be actually private. This is similar to
the symbol visibility rule used by gccgo for years (which defines
unexported non-linknamed symbols as C static symbols).
As there can be pull-only linknames in the wild that may be broken
by this change, we currently only enforce this rule for symbols
defined in the standard library. Push linknames are added in the
standard library to allow things build.
Linkname references to external (non-Go) symbols are still allowed,
as their visibility is controlled by the C symbol visibility rules
and enforced by the C (static or dynamic) linker.
Assembly symbols are treated similar to linknamed symbols.
This is controlled by -checklinkname linker flag, currently not
enabled by default. A follow-up CL will enable it by default.
Change-Id: I07344f5c7a02124dbbef0fbc8fec3b666a4b2b0e
Reviewed-on: https://go-review.googlesource.com/c/go/+/585358
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
This appears to be useful only on amd64, and was specifically
benchmarked on Apple Silicon and did not produce any benefit there.
This CL adds the assembly instruction `PCALIGNMAX align,amount`
which aligns to `align` if that can be achieved with `amount`
or fewer bytes of padding. (0 means never, but will align the
enclosing function.)
Specifically, if low-order-address-bits + amount are
greater than or equal to align; thus, `PCALIGNMAX 64,63` is
the same as `PCALIGN 64` and `PCALIGNMAX 64,0` will never
emit any alignment, but will still cause the function itself
to be aligned to (at least) 64 bytes.
Change-Id: Id51a056f1672f8095e8f755e01f72836c9686aa3
Reviewed-on: https://go-review.googlesource.com/c/go/+/577935
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Add cmd/internal/telemetry to cmd/dist's bootstrapDirs so it's built
when bootstrapping the compiler. cmd/internal/telemetry is a wrapper
arount telemetry functions that stubs out the functions when built in
bootstrap mode to avoid dependencies on x/telemetry in bootstrap mode.
Call telemetry.Start with an empty config to open the counter file, and
increment a counter for when the command is invoked.
After flags are parsed, increment a counter for each of the names of the
flags that were passed in. The counter names will be compile/flag:<name>
so for example we'll have compile/flag:e and compile/flag:E.
In FatalfAt, increment a stack counter for internal errors.
For #58894
Change-Id: Ia5a8a63aa43b2276641181626cbfbea7e4647faa
Reviewed-on: https://go-review.googlesource.com/c/go/+/570679
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Flag flip to enable stack slot merging by default when optimizing.
Please see the earlier CL for details on what this is doing.
Updates #62737.
Updates #65532.
Updates #65495.
Change-Id: I8e30d553e74ace43d418f883199721f05320d3d7
Reviewed-on: https://go-review.googlesource.com/c/go/+/576681
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
It is possible to have situations where a given ir.Name is
non-address-taken at the source level, but whose address is
materialized in order to accommodate the needs of arch-dependent
memory ops. The issue here is that the SymAddr op will show up as
touching a variable of interest, but the subsequent memory op will
not. This is generally not an issue for computing whether something is
live across a call, but it is problematic for collecting the more
fine-grained live interval info that drives stack slot merging.
As an example, consider this Go code:
package p
type T struct {
x [10]int
f float64
}
func ABC(i, j int) int {
var t T
t.x[i&3] = j
return t.x[j&3]
}
On amd64 the code sequences we'll see for accesses to "t" might look like
v10 = VarDef <mem> {t} v1
v5 = MOVOstoreconst <mem> {t} [val=0,off=0] v2 v10
v23 = LEAQ <*T> {t} [8] v2 : DI
v12 = DUFFZERO <mem> [80] v23 v5
v14 = ANDQconst <int> [3] v7 : AX
v19 = MOVQstoreidx8 <mem> {t} v2 v14 v8 v12
v22 = ANDQconst <int> [3] v8 : BX
v24 = MOVQloadidx8 <int> {t} v2 v22 v19 : AX
v25 = MakeResult <int,mem> v24 v19 : <>
Note that the the loads and stores (ex: v19, v24) all refer directly
to "t", which means that regular live analysis will work fine for
identifying variable lifetimes. The DUFFZERO is (in effect) an
indirect write, but since there are accesses immediately after it we
wind up with the same live intervals.
Now the same code with GOARCH=ppc64:
v10 = VarDef <mem> {t} v1
v20 = MOVDaddr <*T> {t} v2 : R20
v12 = LoweredZero <mem> [88] v20 v10
v3 = CLRLSLDI <int> [212543] v7 : R5
v15 = MOVDaddr <*T> {t} v2 : R6
v19 = MOVDstoreidx <mem> v15 v3 v8 v12
v29 = CLRLSLDI <int> [212543] v8 : R4
v24 = MOVDloadidx <int> v15 v29 v19 : R3
v25 = MakeResult <int,mem> v24 v19 : <>
Here instead of memory ops that refer directly to the symbol, we take
the address of "t" (ex: v15) and then pass the address to memory ops
(where the ops themselves no longer refer to the symbol).
This patch enhances the stack slot merging liveness analysis to handle
cases like the PPC64 one above. We add a new phase in candidate
selection that collects more precise use information for merge
candidates, and screens out candidates that are too difficult to
analyze. The phase make a forward pass over each basic block looking
for instructions of the form vK := SymAddr(N) where N is a raw
candidate. It then creates an entry in a map with key vK and value
holding name and the vK use count. As the walk continues, we check for
uses of of vK: when we see one, record it in a side table as an
upwards exposed use of N. At each vK use we also decrement the use
count in the map entry, and if we hit zero, remove the map entry. If
we hit the end of the basic block and we still have map entries, this
implies that the address in question "escapes" the block -- at that
point to be conservative we just evict the name in question from the
candidate set.
Although this CL fixes the issues that forced a revert of the original
merging CL, this CL doesn't enable stack slot merging by default; a
subsequent CL will do that.
Updates #62737.
Updates #65532.
Updates #65495.
Change-Id: Id41d359a677767a8e7ac1e962ae23f7becb4031f
Reviewed-on: https://go-review.googlesource.com/c/go/+/576735
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
[This is a partial roll-forward of CL 553055, the main change here
is that the stack slot overlap operation is flagged off by default
(can be enabled by hand with -gcflags=-d=mergelocals=1) ]
Preliminary compiler support for merging/overlapping stack slots of
local variables whose access patterns are disjoint.
This patch includes changes in AllocFrame to do the actual
merging/overlapping based on information returned from a new
liveness.MergeLocals helper. The MergeLocals helper identifies
candidates by looking for sets of AUTO variables that either A) have
the same size and GC shape (if types contain pointers), or B) have the
same size (but potentially different types as long as those types have
no pointers). Variables must be greater than (3*types.PtrSize) in size
to be considered for merging.
After forming candidates, MergeLocals collects variables into "can be
overlapped" equivalence classes or partitions; this process is driven
by an additional liveness analysis pass. Ideally it would be nice to
move the existing stackmap liveness pass up before AllocFrame
and "widen" it to include merge candidates so that we can do just a
single liveness as opposed to two passes, however this may be difficult
given that the merge-locals liveness has to take into account
writes corresponding to dead stores.
This patch also required a change to the way ssa.OpVarDef pseudo-ops
are generated; prior to this point they would only be created for
variables whose type included pointers; if stack slot merging is
enabled then the ssagen code creates OpVarDef ops for all auto vars
that are merge candidates.
Note that some temporaries created late in the compilation process
(e.g. during ssa backend) are difficult to reason about, especially in
cases where we take the address of a temp and pass it to the runtime.
For the time being we mark most of the vars created post-ssagen as
"not a merge candidate".
Stack slot merging for locals/autos is enabled by default if "-N" is
not in effect, and can be disabled via "-gcflags=-d=mergelocals=0".
Fixmes/todos/restrictions:
- try lowering size restrictions
- re-evaluate the various skips that happen in SSA-created autotmps
Updates #62737.
Updates #65532.
Updates #65495.
Change-Id: Ifda26bc48cde5667de245c8a9671b3f0a30bb45d
Reviewed-on: https://go-review.googlesource.com/c/go/+/575415
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This reverts CL 553055.
Reason for revert: causes crypto/ecdsa failures on linux ppc64/s390x builders
Change-Id: I9266b030693a5b6b1e667a009de89d613755b048
Reviewed-on: https://go-review.googlesource.com/c/go/+/575236
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Auto-Submit: Than McIntosh <thanm@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Preliminary compiler support for merging/overlapping stack
slots of local variables whose access patterns are disjoint.
This patch includes changes in AllocFrame to do the actual
merging/overlapping based on information returned from a new
liveness.MergeLocals helper. The MergeLocals helper identifies
candidates by looking for sets of AUTO variables that either A) have
the same size and GC shape (if types contain pointers), or B) have the
same size (but potentially different types as long as those types have
no pointers). Variables must be greater than (3*types.PtrSize) in size
to be considered for merging.
After forming candidates, MergeLocals collects variables into "can be
overlapped" equivalence classes or partitions; this process is driven
by an additional liveness analysis pass. Ideally it would be nice to
move the existing stackmap liveness pass up before AllocFrame
and "widen" it to include merge candidates so that we can do just a
single liveness as opposed to two passes, however this may be difficult
given that the merge-locals liveness has to take into account
writes corresponding to dead stores.
This patch also required a change to the way ssa.OpVarDef pseudo-ops
are generated; prior to this point they would only be created for
variables whose type included pointers; if stack slot merging is
enabled then the ssagen code creates OpVarDef ops for all auto vars
that are merge candidates.
Note that some temporaries created late in the compilation process
(e.g. during ssa backend) are difficult to reason about, especially in
cases where we take the address of a temp and pass it to the runtime.
For the time being we mark most of the vars created post-ssagen as
"not a merge candidate".
Stack slot merging for locals/autos is enabled by default if "-N" is
not in effect, and can be disabled via "-gcflags=-d=mergelocals=0".
Fixmes/todos/restrictions:
- try lowering size restrictions
- re-evaluate the various skips that happen in SSA-created autotmps
Fixes #62737.
Updates #65532.
Updates #65495.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest
Change-Id: Ibc22e8a76c87e47bc9fafe4959804d9ea923623d
Reviewed-on: https://go-review.googlesource.com/c/go/+/553055
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
For #51572
Change-Id: I23bb25b8cf1ecb9be25eb6ab9e89cd397b58b3c8
Reviewed-on: https://go-review.googlesource.com/c/go/+/572535
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
Change-Id: I38ba7cb0179ec9226a68629c53ea2d81fa19c059
GitHub-Last-Rev: a3d4fe2ac2d2f40033ae0244a264074d45b3ad52
GitHub-Pull-Request: golang/go#66024
Reviewed-on: https://go-review.googlesource.com/c/go/+/568115
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Commit-Queue: Ian Lance Taylor <iant@google.com>
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
Auto-Submit: Robert Griesemer <gri@google.com>
|
|
It fixes the issue https://github.com/golang/go/issues/65220.
It also includes https://go.dev/cl/557458 from Michael.
Change-Id: Ic6109e1b6a9045459ff4a54dea11cbfe732b01e6
Reviewed-on: https://go-review.googlesource.com/c/go/+/557918
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This reverts CL 529738.
Reason for revert: Breaking longtest builders
For #58102.
Fixes #65220.
Change-Id: Id295e3249da9d82f6a9e4fc571760302a1362def
Reviewed-on: https://go-review.googlesource.com/c/go/+/557460
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The pgo compilation time is very long if the profile file is large.
We added a preprocess tool to pre-parse profile file in order to
expedite the compile time.
Change-Id: I6f50bbd01f242448e2463607a9b63483c6ca9a12
Reviewed-on: https://go-review.googlesource.com/c/go/+/529738
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Shape-based stenciling in the Go compiler's generic instantiation
phase looks up shape types using the underlying type of a given target
type. This has a beneficial effect in most cases (e.g. we can use the
same shape type for two different named types whose underlying type is
"int"), but causes some problems when the underlying type is a very
large structure. The link string for the underlying type of a large
imported struct can be extremely long, since the link string
essentially enumerates the full package path for every field type;
this can produce a "go.shape.struct { ... " symbol name that is
absurdly long.
This patch switches the compiler to use a hash of the underlying type
link string instead of the string itself, which should continue to
provide commoning but keep symbol name lengths reasonable for shape
types based on large imported structs.
Fixes #65030.
Change-Id: I87d602626c43172beb99c186b8ef72327b8227a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/554975
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Than McIntosh <thanm@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
Per the discussion on the issue, since no problems related to this
appeared since Go 1.20, remove the ability to disable the check for
anonymous interface cycles permanently.
Adjust various tests accordingly.
For #56103.
Change-Id: Ica2b28752dca08934bbbc163a9b062ae1eb2a834
Reviewed-on: https://go-review.googlesource.com/c/go/+/550896
Run-TryBot: Robert Griesemer <gri@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
This commit is aimed at improving the readability and consistency
of the code base. Extraneous newline characters were present after
some return statements, creating unnecessary separation in the code.
Fixes #64610
Change-Id: Ic1b05bf11761c4dff22691c2f1c3755f66d341f7
Reviewed-on: https://go-review.googlesource.com/c/go/+/548316
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Extend the pgodevirtualize debug flag to distinguish interface and
function devirtualization. Setting 1 keeps interface devirtualization
enabled but disables function value devirtualization.
For #64209.
Change-Id: I33aa7eb95ca0bdb215256d8c7cc8f9dac53ae30e
Reviewed-on: https://go-review.googlesource.com/c/go/+/543115
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add a debugging flag "-d=inlscoreadj" intended to support running
experiments in which the inliner uses different score adjustment
values for specific heuristics. The flag argument is a series of
clauses separated by the "/" char where each clause takes the form
"adjK:valK". For example, in this build
go build -gcflags=-d=inlscoreadj=inLoopAdj:10/returnFeedsConstToIfAdj:-99
the "in loop" score adjustments would be reset to a value of 15 (effectively
penalizing calls in loops) adn the "return feeds constant to foldable if/switch"
score adjustment would be boosted from -15 to -99.
Change-Id: Ibd1ee334684af5992466556a69baa6dfefb246b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/532116
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
E.g.
`GOEXPERIMENT=rangefunc go test -v -gcflags=-d=rangefunccheck=0 rangefunc_test.go`
will turn off the checking and fail.
The benchmarks, which do not use pathological iterators, run slightly faster.
Change-Id: Ia3e175e86d67ef74bbae9bcc5d2def6a2cdf519d
Reviewed-on: https://go-review.googlesource.com/c/go/+/541995
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Relocate the 'covcmd' package from .../internal/coverage to
.../cmd/internal/cov, to reflect the fact that the definitions in this
package are used only in cmd, not in std.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-windows-amd64-longtest
Change-Id: I65bcc34736d1d0a23134a6c91c17ff138cd45431
Reviewed-on: https://go-review.googlesource.com/c/go/+/526595
Reviewed-by: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
When a PGO build fails or produces incorrect program, it is often
unclear what the problem is. Add pgo hash so we can bisect to
individual optimization decisions, which often helps debugging.
Related to #58153.
Change-Id: I651ffd9c53bad60f2f28c8ec2a90a3f532982712
Reviewed-on: https://go-review.googlesource.com/c/go/+/528400
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Add a new debug flag "-d=dumpinlcallsitescores" that dumps out a
summary of all callsites in the package being compiled with info on
inlining heuristics, for human consumption. Sample output lines:
Score Adjustment Status Callee CallerPos ScoreFlags
...
115 40 DEMOTED cmd/compile/internal/abi.(*ABIParamAssignment).Offset expand_calls.go:1679:14|6 panicPathAdj
...
76 -5 PROMOTED runtime.persistentalloc mcheckmark.go:48:45|3 inLoopAdj
...
201 0 --- PGO unicode.DecodeRuneInString utf8.go:312:30|1
...
7 -5 --- PGO internal/abi.Name.DataChecked type.go:625:22|0 inLoopAdj
Here "Score" is the final score calculated for the callsite,
"Adjustment" is the amount added to or subtracted from the original
hairyness estimate to form the score. "Status" shows whether anything
changed with the site -- did the adjustment bump it down just below
the threshold ("PROMOTED") or instead bump it above the threshold
("DEMOTED") or did nothing happen as a result of the heuristics
("---"); "Status" also shows whether PGO was involved. "Callee" is the
name of the function called, "CallerPos" is the position of the
callsite, and "ScoreFlags" is a digest of the specific properties we
used to make adjustments to callsite score via heuristics.
Change-Id: Iea4b1cbfee038bc68df6ab81e9973f145636300b
Reviewed-on: https://go-review.googlesource.com/c/go/+/513455
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This patch fixes an inconsistency in compiler flag handling introduced
accidentally in CL 521699. In the compiler we have both base.Flag.N
(which records whether the user has supplied the "-N" flag to disable
optimization) and base.Ctxt.Flag_optimize (which tracks whether
optimization is turned on). In this case Flag.N was updated without a
corresponding change to Ctxt.Flag_optimize, which led to problems with
DWARF generation for the runtime.
This CL doesn't include a regression test; a test will be added later
in the x/debug repo in a subsequent CL.
Updates #62523.
Change-Id: I0c383bb43ec0a0e7c12e7e2852c0590731416d6f
Reviewed-on: https://go-review.googlesource.com/c/go/+/527319
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Currently, cmd/compile optimizes `var a = true; var b = a` into `var a
= true; var b = true`. But this may not be safe if we need to
initialize any other global variables between `a` and `b`, and the
initialization involves calling a user-defined function that reassigns
`a`.
This CL changes staticinit to keep track of the initialization
expressions that we've seen so far, and to stop applying the
staticcopy optimization once we've seen an initialization expression
that might have modified another global variable within this package.
To help identify affected initializers, this CL adds a -d=staticcopy
flag to warn when a staticcopy is suppressed and turned into a dynamic
copy.
Currently, `go build -gcflags=all=-d=staticcopy std` reports only four
instances:
```
encoding/xml/xml.go:1600:5: skipping static copy of HTMLEntity+0 with map[string]string{...}
encoding/xml/xml.go:1869:5: skipping static copy of HTMLAutoClose+0 with []string{...}
net/net.go:661:5: skipping static copy of .stmp_31+0 with poll.ErrNetClosing
net/http/transport.go:2566:5: skipping static copy of errRequestCanceled+0 with ~R0
```
Fixes #51913.
Change-Id: Iab41cf6f84c44f7f960e4e62c28a8aeaade4fbcf
Reviewed-on: https://go-review.googlesource.com/c/go/+/395541
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
|
|
ErrorfVers used to be used by typecheck to report when new language
functionality was used, but the -lang flag (from go.mod) was set to an
older version. However, all of the callers have been since removed,
now that this is handled by types2.
And for the same reason, we can stop changing base.Flag.Lang. This was
previously a workaround so that the unified frontend could generate
arbitrary IR without upsetting typecheck, at a time when typecheck was
itself a real frontend. Now it's just a glorified desugaring pass.
Change-Id: I1c0316dbfe2e08ba089acd50fdfe20b17176be25
Reviewed-on: https://go-review.googlesource.com/c/go/+/522877
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
PkgSpecials
This consolidates the NoInstrumentPkgs and NoRacePkgs lists into the
objabi.LookupPkgSpecial mechanism.
Change-Id: I411654afdd690fb01c412e7e8b57ddfbe85415e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/521702
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Austin Clements <austin@google.com>
|
|
Currently, this list includes *almost* all runtime packages, but not
quite all.
We leave out internal/bytealg for reasons explained in the code.
Compiling with or without race instrumentation has no effect on the
other packages added to the list here, so this is a no-op change
today, but makes this more robust.
Change-Id: Iaec585b2efbc72983d8cb3929394524c42dd664d
Reviewed-on: https://go-review.googlesource.com/c/go/+/521701
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
As we did for the asm -compiling-runtime flag, this CL modifies the
compiler to compute the -+ (compiling runtime) flag from the package
path. Unlike for asm, some tests use -+ explicitly to opt in to
runtime restrictions, so we leave the flag, but it's no longer passed
by any build tools.
This lets us eliminate cmd/go's list of "runtime packages" in favor of
the unified objabi.LookupPkgSpecial. It also fixes an inconsistency
with dist, which only passed -+ when compiling "runtime" itself.
One consequence of this is that the compiler now ignores the -N flag
when compiling runtime packages. Previously, cmd/go would strip -N
when passing -+ and the compiler would fatal if it got both -N and -+,
so the overall effect was that the compiler never saw -N when
compiling a runtime package. Now we simply move that logic to disable
-N down into the compiler.
Change-Id: I4876047a1563210ed122a31b72d62798762cbcf5
Reviewed-on: https://go-review.googlesource.com/c/go/+/521699
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
This comment got left behind in some refactoring and now refers to
code "below" that is no longer below. Move it to be with the code it's
referring to.
Change-Id: I7f7bf0cf8b22c1f6e05ff12b8be71d18fb3359d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/521177
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Austin Clements <austin@google.com>
TryBot-Bypass: Austin Clements <austin@google.com>
|