| Age | Commit message (Collapse) | Author |
|
|
|
When a no-op interface conversion is wrapped around the rhs of an
assignment, the memory overlap detection logic in the compiler failed to
peel down conversion to see the actual pointer, causing an incorrect
no-overlapping determination.
Thanks to Jakub Ciolek for reporting this issue.
Fixes #78371
Fixes CVE-2026-27144
Change-Id: I55ff0806b099e1447bdbfba7fde6c6597db5d65c
Reviewed-on: https://go-internal-review.googlesource.com/c/go/+/3780
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Neal Patel <nealpatel@google.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/763764
Auto-Submit: David Chase <drchase@google.com>
TryBot-Bypass: David Chase <drchase@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Jakub Ciolek <jakub@ciolek.dev>
|
|
Switch statement containing integer constant cases and case bodies just
returning a constant should be optimizable to a simpler and faster table
lookup instead of a jump table.
That is, a switch like this:
switch x {
case 0: return 10
case 1: return 20
case 2: return 30
case 3: return 40
default: return -1
}
Could be optimized to this:
var table = [4]int{10, 20, 30, 40}
if uint(x) < 4 { return table[x] }
return -1
The resulting code is smaller and faster, especially on platforms where
jump tables are not supported.
goos: windows
goarch: arm64
pkg: cmd/compile/internal/test
│ .\old.txt │ .\new.txt │
│ sec/op │ sec/op vs base │
SwitchLookup8Predictable-12 2.708n ± 6% 2.249n ± 5% -16.97% (p=0.000 n=10)
SwitchLookup8Unpredictable-12 8.758n ± 7% 3.272n ± 4% -62.65% (p=0.000 n=10)
SwitchLookup32Predictable-12 2.672n ± 5% 2.373n ± 6% -11.21% (p=0.000 n=10)
SwitchLookup32Unpredictable-12 9.372n ± 7% 3.385n ± 6% -63.89% (p=0.000 n=10)
geomean 4.937n 2.772n -43.84%
Fixes #78203
Change-Id: I74fa3d77ef618412951b2e5c3cb6ebc760ce4ff1
Reviewed-on: https://go-review.googlesource.com/c/go/+/756340
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Fixes #78219
Updates #77597
Change-Id: I021df668bfc18081e71faaab2e4bad607873bf4d
Reviewed-on: https://go-review.googlesource.com/c/go/+/756780
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Robert Griesemer <gri@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Fixes #77720
Add a generic SSA rewrite that forwards `Load` from a `Move` destination
back to the `Move` source when it is provably safe, so field reads like
`s.h.Value().typ` don’t force a full struct copy.
- Add `Load <- Move` rewrite in `generic.rules` with safety guard:
non-volatile source
- Tweak `fixedbugs/issue22200*` so that it can still trigger the "stack frame too large" error.
- Regenerate `rewritegeneric.go`.
- Add `test/codegen/moveload.go` to assert no `MOVUPS` and direct `MOVBLZX`
in both direct and inlined forms.
Benchmark results (linux/amd64, i7-14700KF):
$ go test cmd/compile/internal/test -run='^$' -bench='MoveLoad' -count=20
Before:
BenchmarkMoveLoadTypViaValue-20 ~76.9 ns/op
BenchmarkMoveLoadTypViaPtr-20 ~1.97 ns/op
After:
BenchmarkMoveLoadTypViaValue-20 ~1.894 ns/op
BenchmarkMoveLoadTypViaPtr-20 ~1.905 ns/op
The rewrite removes the redundant struct copy in
`s.h.Value().typ`, bringing it in line with the direct pointer form.
Change-Id: Iddf2263e390030ba013e0642a695b87c75f899da
Reviewed-on: https://go-review.googlesource.com/c/go/+/748200
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Mark Freeman <markfreeman@google.com>
|
|
This CL follows from the abandoned CL 722961.
Alan Donovan asked if modernize would catch these changes; it does in part.
Here is how modernize was used:
1. Build the Go compiler from latest master
2. Clone x/tools and build modernize with the newest compiler
3. Then, do:
PATH=PATH_TO_BIN:$PATH go list ./... | xargs env PATH=PATH_TO_BIN:$PATH GOMOD=off MODERNIZE -forvar -fix -test -debug fpstv
4. For assurance, move into a package directory (i.e, cmd), and redo Step 3.
From the obtained result, it seems modernize did not remove the loop variable when:
- the range notation was not used
- the loop variable was not directly under the for directive.
Which does happens here:
# git ls-files | xargs -I {} perl -nE 'print "$ARGV:$.:$_" if /\s+(\w+) := \1$/' {} | grep _test
[...]
cmd/compile/internal/types2/api_test.go:2423: i := i
md/go/internal/modload/query_test.go:182: tt := tt
md/go/internal/web/url_test.go:17: tc := tc
cmd/go/internal/web/url_test.go:49: tc := tc
cmd/internal/par/queue_test.go:54: i := i
runtime/syscall_windows_test.go:781: arglen := arglen
Link: https://go-review.googlesource.com/c/go/+/722961/comments/e8d47866_fc399fa1
Co-authored-by: Plamerdi Makela <plamerdi447@gmail.com>
Fixes #76411
Change-Id: I0c191cdca70dbea6efaf6796dca9c60e2afcd9ea
GitHub-Last-Rev: 77c3e11fc21a4ede70b733b567a1690edb944dc1
GitHub-Pull-Request: golang/go#77694
Reviewed-on: https://go-review.googlesource.com/c/go/+/746502
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
|
|
This matters for encodings supporting time.Duration,
which cannot add an AppendText method due to the encoding package godoc:
Adding encoding/decoding methods to existing types may constitute
a breaking change, [...] The policy for packages maintained by
the Go project is to only allow the addition of marshaling functions
if no existing, reasonable marshaling exists.
And indeed, time.Duration does have an existing marshaling today:
as an integer, which encoding/json still uses to this day.
Thankfully, with the String method being inlineable, we can rely on
the following not allocating in the heap:
var td time.Duration = ...
b = append(b, td.String()...)
Lock this in so that we can rely on this pattern going forward.
For #77851.
Change-Id: I57ce0bc099f293b0e36aa4c74b5410f8bda347a5
Reviewed-on: https://go-review.googlesource.com/c/go/+/750100
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Add BenchmarkStringEqParamOrder to verify that the operand order of
string equality comparisons (strs[i] == str vs str == strs[i]) does not
affect performance when the comparand is a function parameter.
This benchmark was written while investigating issue #74471. Extensive
benchmarking on both amd64 (Intel Xeon Gold 6148, linux) and arm64
(Apple M1 Pro, darwin) with 30 samples each showed that on Go tip the
compiler already generates equivalent code for both orderings, so no
compiler change is needed. The benchmark is retained as a regression
guard.
Updates #74471
Change-Id: I257910bba0abb3dc738578d898e63ecfb261c3cc
Reviewed-on: https://go-review.googlesource.com/c/go/+/742141
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The NoRaceFunc flag is meant to suppress racefuncenter/racefuncexit
instrumentation for packages like internal/runtime/atomic. However,
instrumentEnterExit was set unconditionally when -race was enabled,
outside the NoRaceFunc guard. This caused generic functions from
NoRaceFunc packages (e.g. atomic.(*Pointer[T]).Store) to receive
racefuncenter calls when instantiated in other packages, leading to
a segfault during early runtime init before the race runtime is ready.
Move the instrumentEnterExit assignment inside the NoRaceFunc check
so both memory and enter/exit instrumentation are suppressed together.
Fixes #77597
Change-Id: Id03bb9c422d36e2e88ecdf165ad3b1a4700a935c
Reviewed-on: https://go-review.googlesource.com/c/go/+/748260
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This change introduces a new SSA pass that converts conditionals
with logical AND into CMP/CCMP instruction sequences on ARM64.
Fixes #71268
Change-Id: I3eba9c05b88ed6eb70350d30f6e805e6a4dddbf1
Reviewed-on: https://go-review.googlesource.com/c/go/+/698099
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
|
|
If a valid profile is scaled such that the samples/counts become 0,
an empty graph in which the compiler complain that we never saw a start line
since the graph has no nodes which is a misleading error message.
src/cmd/internal/pgo/pprof.go was modified to check if the graph is empty
return an empty profile.
GitHub-Pull-Request: golang/go#73640
Change-Id: If3f7ab8af6fcf77b2e29ae1df154f87bee377ab0
Reviewed-on: https://go-review.googlesource.com/c/go/+/725120
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
|
|
Change-Id: Id2c4152b43c7ee1a687e49da7dda5a690e554231
Reviewed-on: https://go-review.googlesource.com/c/go/+/727900
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
Change-Id: I05cd103ea8d8bbee0ad907ff3e594de273d49e86
Reviewed-on: https://go-review.googlesource.com/c/go/+/725600
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
|
|
memory after growslice
This CL is part of a set of CLs that attempt to reduce how much work the
GC must do. See the design in https://go.dev/design/74299-runtime-freegc
This CL updates the compiler to examine append calls to prove
whether or not the slice is aliased.
If proven unaliased, the compiler automatically inserts a call to a new
runtime function introduced with this CL, runtime.growsliceNoAlias,
which frees the old backing memory immediately after slice growth is
complete and the old storage is logically dead.
Two append benchmarks below show promising results, executing up to
~2x faster and up to factor of ~3 memory reduction with this CL.
The approach works with multiple append calls for the same slice,
including inside loops, and the final slice memory can be escaping,
such as in a classic pattern of returning a slice from a function
after the slice is built. (The final slice memory is never freed with
this CL, though we have other work that tackles that.)
An example target for this CL is we automatically free the
intermediate memory for the appends in the loop in this function:
func f1(input []int) []int {
var s []int
for _, x := range input {
s = append(s, g(x)) // s cannot be aliased here
if h(x) {
s = append(s, x) // s cannot be aliased here
}
}
return s // slice escapes at end
}
In this case, the compiler and the runtime collaborate so that
the heap allocated backing memory for s is automatically freed after
a successful grow. (For the first grow, there is nothing to free,
but for the second and subsequent growths, the old heap memory is
freed automatically.)
The new runtime.growsliceNoAlias is primarily implemented
by calling runtime.freegc, which we introduced in CL 673695.
The high-level approach here is we step through the IR starting
from a slice declaration and look for any operations that either
alias the slice or might do so, and treat any IR construct we
don't specifically handle as a potential alias (and therefore
conservatively fall back to treating the slice as aliased when
encountering something not understood).
For loops, some additional care is required. We arrange the analysis
so that an alias in the body of a loop causes all the appends in that
same loop body to be marked aliased, even if the aliasing occurs after
the append in the IR:
func f2() {
var s []int
for i := range 10 {
s = append(s, i) // aliased due to next line
alias = s
}
}
For nested loops, we analyse the nesting appropriately so that
for example this append is still proven as non-aliased in the
inner loop even though it aliased for the outer loop:
func f3() {
for range 10 {
var s []int
for i := range 10 {
s = append(s, i) // append using non-aliased slice
}
alias = s
}
}
A good starting point is the beginning of the test/escape_alias.go file,
which starts with ~10 introductory examples with brief comments that
attempt to illustrate the high-level approach.
For more details, see the new .../internal/escape/alias.go file,
especially the (*aliasAnalysis).analyze method.
In the first benchmark, an append in a loop builds up a slice from
nothing, where the slice elements are each 64 bytes. In the table below,
'count' is the number of appends. With 1 append, there is no opportunity
for this CL to free memory. Once there are 2 appends, the growth from
1 element to 2 elements means the compiler-inserted growsliceNoAlias
frees the 1-element array, and we see a ~33% reduction in memory use
and a small reported speed improvement.
As the number of appends increases for example to 5, we are at
a ~20% speed improvement and ~45% memory reduction, and so on until
we reach ~40% faster and ~50% less memory allocated at the end of
the table.
There can be variation in the reported numbers based on -randlayout, so
this table is for 30 different values of -randlayout with a total
n=150. (Even so, there is still some variation, so we probably should
not read too much into small changes.) This is with GOAMD64=v3 on
a VM that gcc reports is cascadelake.
goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ sec/op │ sec/op vs base │
Append64Bytes/count=1-4 31.09n ± 2% 31.69n ± 1% +1.95% (n=150)
Append64Bytes/count=2-4 73.31n ± 1% 70.27n ± 0% -4.15% (n=150)
Append64Bytes/count=3-4 142.7n ± 1% 124.6n ± 1% -12.68% (n=150)
Append64Bytes/count=4-4 149.6n ± 1% 127.7n ± 0% -14.64% (n=150)
Append64Bytes/count=5-4 277.1n ± 1% 213.6n ± 0% -22.90% (n=150)
Append64Bytes/count=6-4 280.7n ± 1% 216.5n ± 1% -22.87% (n=150)
Append64Bytes/count=10-4 544.3n ± 1% 386.6n ± 0% -28.97% (n=150)
Append64Bytes/count=20-4 1058.5n ± 1% 715.6n ± 1% -32.39% (n=150)
Append64Bytes/count=50-4 2.121µ ± 1% 1.404µ ± 1% -33.83% (n=150)
Append64Bytes/count=100-4 4.152µ ± 1% 2.736µ ± 1% -34.11% (n=150)
Append64Bytes/count=200-4 7.753µ ± 1% 4.882µ ± 1% -37.03% (n=150)
Append64Bytes/count=400-4 15.163µ ± 2% 9.273µ ± 1% -38.84% (n=150)
geomean 601.8n 455.0n -24.39%
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ B/op │ B/op vs base │
Append64Bytes/count=1-4 64.00 ± 0% 64.00 ± 0% ~ (n=150)
Append64Bytes/count=2-4 192.0 ± 0% 128.0 ± 0% -33.33% (n=150)
Append64Bytes/count=3-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150)
Append64Bytes/count=4-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150)
Append64Bytes/count=5-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150)
Append64Bytes/count=6-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150)
Append64Bytes/count=10-4 1.938Ki ± 0% 1.000Ki ± 0% -48.39% (n=150)
Append64Bytes/count=20-4 3.938Ki ± 0% 2.001Ki ± 0% -49.18% (n=150)
Append64Bytes/count=50-4 7.938Ki ± 0% 4.005Ki ± 0% -49.54% (n=150)
Append64Bytes/count=100-4 15.938Ki ± 0% 8.021Ki ± 0% -49.67% (n=150)
Append64Bytes/count=200-4 31.94Ki ± 0% 16.08Ki ± 0% -49.64% (n=150)
Append64Bytes/count=400-4 63.94Ki ± 0% 32.33Ki ± 0% -49.44% (n=150)
geomean 1.991Ki 1.124Ki -43.54%
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ allocs/op │ allocs/op vs base │
Append64Bytes/count=1-4 1.000 ± 0% 1.000 ± 0% ~ (n=150)
Append64Bytes/count=2-4 2.000 ± 0% 1.000 ± 0% -50.00% (n=150)
Append64Bytes/count=3-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150)
Append64Bytes/count=4-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150)
Append64Bytes/count=5-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150)
Append64Bytes/count=6-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150)
Append64Bytes/count=10-4 5.000 ± 0% 1.000 ± 0% -80.00% (n=150)
Append64Bytes/count=20-4 6.000 ± 0% 1.000 ± 0% -83.33% (n=150)
Append64Bytes/count=50-4 7.000 ± 0% 1.000 ± 0% -85.71% (n=150)
Append64Bytes/count=100-4 8.000 ± 0% 1.000 ± 0% -87.50% (n=150)
Append64Bytes/count=200-4 9.000 ± 0% 1.000 ± 0% -88.89% (n=150)
Append64Bytes/count=400-4 10.000 ± 0% 1.000 ± 0% -90.00% (n=150)
geomean 4.331 1.000 -76.91%
The second benchmark is similar, but instead uses an 8-byte integer
for the slice element. The first 4 appends in the loop never call into
the runtime thanks to the excellent CL 664299 introduced by Keith in
Go 1.25 that allows some <= 32 byte dynamically-sized slices to be on
the stack, so this CL is neutral for <= 32 bytes. Once the 5th append
occurs at count=5, a grow happens via the runtime and heap allocates
as normal, but freegc does not yet have anything to free, so we see
a small ~1.4ns penalty reported there. But once the second growth
happens, the older heap memory is now automatically freed by freegc,
so we start to see some benefit in memory reductions and speed
improvements, starting at a tiny speed improvement (close to a wash,
or maybe noise) by the second growth before count=10, and building up to
~2x faster with ~68% fewer allocated bytes reported.
goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ sec/op │ sec/op vs base │
AppendInt/count=1-4 2.978n ± 0% 2.969n ± 0% -0.30% (p=0.000 n=150)
AppendInt/count=4-4 4.292n ± 3% 4.163n ± 3% ~ (p=0.528 n=150)
AppendInt/count=5-4 33.50n ± 0% 34.93n ± 0% +4.25% (p=0.000 n=150)
AppendInt/count=10-4 76.21n ± 1% 75.67n ± 0% -0.72% (p=0.000 n=150)
AppendInt/count=20-4 150.6n ± 1% 133.0n ± 0% -11.65% (n=150)
AppendInt/count=50-4 284.1n ± 1% 225.6n ± 0% -20.59% (n=150)
AppendInt/count=100-4 544.2n ± 1% 392.4n ± 1% -27.89% (n=150)
AppendInt/count=200-4 1051.5n ± 1% 702.3n ± 0% -33.21% (n=150)
AppendInt/count=400-4 2.041µ ± 1% 1.312µ ± 1% -35.70% (n=150)
AppendInt/count=1000-4 5.224µ ± 2% 2.851µ ± 1% -45.43% (n=150)
AppendInt/count=2000-4 11.770µ ± 1% 6.010µ ± 1% -48.94% (n=150)
AppendInt/count=3000-4 17.747µ ± 2% 8.264µ ± 1% -53.44% (n=150)
geomean 331.8n 246.4n -25.72%
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ B/op │ B/op vs base │
AppendInt/count=1-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=4-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=5-4 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=150)
AppendInt/count=10-4 192.0 ± 0% 128.0 ± 0% -33.33% (n=150)
AppendInt/count=20-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150)
AppendInt/count=50-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150)
AppendInt/count=100-4 1.938Ki ± 0% 1.000Ki ± 0% -48.39% (n=150)
AppendInt/count=200-4 3.938Ki ± 0% 2.001Ki ± 0% -49.18% (n=150)
AppendInt/count=400-4 7.938Ki ± 0% 4.005Ki ± 0% -49.54% (n=150)
AppendInt/count=1000-4 24.56Ki ± 0% 10.05Ki ± 0% -59.07% (n=150)
AppendInt/count=2000-4 58.56Ki ± 0% 20.31Ki ± 0% -65.32% (n=150)
AppendInt/count=3000-4 85.19Ki ± 0% 27.30Ki ± 0% -67.95% (n=150)
geomean ² -42.81%
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ allocs/op │ allocs/op vs base │
AppendInt/count=1-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=4-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=5-4 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=10-4 2.000 ± 0% 1.000 ± 0% -50.00% (n=150)
AppendInt/count=20-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150)
AppendInt/count=50-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150)
AppendInt/count=100-4 5.000 ± 0% 1.000 ± 0% -80.00% (n=150)
AppendInt/count=200-4 6.000 ± 0% 1.000 ± 0% -83.33% (n=150)
AppendInt/count=400-4 7.000 ± 0% 1.000 ± 0% -85.71% (n=150)
AppendInt/count=1000-4 9.000 ± 0% 1.000 ± 0% -88.89% (n=150)
AppendInt/count=2000-4 11.000 ± 0% 1.000 ± 0% -90.91% (n=150)
AppendInt/count=3000-4 12.000 ± 0% 1.000 ± 0% -91.67% (n=150)
geomean ² -72.76% ²
Of course, these are just microbenchmarks, but likely indicate
there are some opportunities here.
The immediately following CL 712422 tackles inlining and is able to get
runtime.freegc working automatically with iterators such as used by
slices.Collect, which becomes able to automatically free the
intermediate memory from its repeated appends (which earlier
in this work required a temporary hand edit to the slices package).
For now, we only use the NoAlias version for element types without
pointers while waiting on additional runtime support in CL 698515.
Updates #74299
Change-Id: I1b9d286aa97c170dcc2e203ec0f8ca72d84e8221
Reviewed-on: https://go-review.googlesource.com/c/go/+/710015
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Merge List:
+ 2025-11-24 8dd5b13abc cmd/compile: relax stmtline_test on amd64
+ 2025-11-23 feae743bdb cmd/compile: use 32x32->64 multiplies on loong64
+ 2025-11-23 e88be8a128 runtime: fix stale comment for mheap/malloc
+ 2025-11-23 a318843a2a cmd/internal/obj/loong64: optimize duplicate optab entries
+ 2025-11-23 a18294bb6a cmd/internal/obj/arm64, image/gif, runtime, sort: use math/bits to calculate log2
+ 2025-11-23 437323ef7b slices: fix incorrect comment in slices.Insert function documentation
+ 2025-11-23 1993dca400 doc/next: pre-announce end of support for macOS 12 in Go 1.27
+ 2025-11-22 337f7b1f5d cmd/go: update default go directive in mod or work init
+ 2025-11-21 3c26aef8fb cmd/internal/obj/riscv: improve large branch/call/jump tests
+ 2025-11-21 31aa9f800b crypto/tls: use inner hello for earlyData when using QUIC and ECH
+ 2025-11-21 d68aec8db1 runtime: replace trace seqlock with write flag
+ 2025-11-21 8d9906cd34 runtime/trace: add Log benchmark
+ 2025-11-21 6aeacdff38 cmd/go: support sha1 repos when git default is sha256
+ 2025-11-21 9570036ca5 crypto/sha3: make the zero value of SHAKE useable
+ 2025-11-21 155efbbeeb crypto/sha3: make the zero value of SHA3 useable
+ 2025-11-21 6f16669e34 database/sql: don't ignore ColumnConverter for unknown input count
+ 2025-11-21 121bc3e464 runtime/pprof: remove hard-coded sleep in CPU profile reader
+ 2025-11-21 b604148c4e runtime: fix double wakeup in CPU profile buffer
+ 2025-11-21 22f24f90b5 cmd/compile: change testing.B.Loop keep alive semantic
+ 2025-11-21 cfb9d2eb73 net: remove unused linknames
+ 2025-11-21 65ef314f89 net/http: remove unused linknames
+ 2025-11-21 0f32fbc631 net/http: populate Response.Request when using NewFileTransport
+ 2025-11-21 3e0a8e7867 net/http: preserve original path encoding in redirects
+ 2025-11-21 831af61120 net/http: use HTTP 307 redirects in ServeMux
+ 2025-11-21 87269224cb net/http: update Response.Request.URL after redirects on GOOS=js
+ 2025-11-21 7aa9ca729f net/http/cookiejar: treat localhost as secure origin
+ 2025-11-21 f870a1d398 net/url: warn that JoinPath arguments should be escaped
+ 2025-11-21 9962d95fed crypto/internal/fips140/mldsa: unroll NTT and inverseNTT
+ 2025-11-21 f821fc46c5 crypto/internal/fisp140test: update acvptool, test data
+ 2025-11-21 b59efc38a0 crypto/internal/fips140/mldsa: new package
+ 2025-11-21 62741480b8 runtime: remove linkname for gopanic
+ 2025-11-21 7db2f0bb9a crypto/internal/hpke: separate KEM and PublicKey/PrivateKey interfaces
+ 2025-11-21 e15800c0ec crypto/internal/hpke: add ML-KEM and hybrid KEMs, and SHAKE KDFs
+ 2025-11-21 7c985a2df4 crypto/internal/hpke: modularize API and support more ciphersuites
+ 2025-11-21 e7d47ac33d cmd/compile: simplify negative on multiplication
+ 2025-11-21 35d2712b32 net/http: fix typo in Transport docs
+ 2025-11-21 90c970cd0f net: remove unnecessary loop variable copies in tests
+ 2025-11-21 9772d3a690 cmd/cgo: strip top-level const qualifier from argument frame struct
+ 2025-11-21 1903782ade errors: add examples for custom Is/As matching
+ 2025-11-21 ec92bc6d63 cmd/compile: rewrite Rsh to RshU if arguments are proved positive
+ 2025-11-21 3820f94c1d cmd/compile: propagate unsigned relations for Rsh if arguments are positive
+ 2025-11-21 d474f1fd21 cmd/compile: make dse track multiple shadowed ranges
+ 2025-11-21 d0d0a72980 cmd/compile/internal/ssa: correct type of ARM64 conditional instructions
+ 2025-11-21 a9704f89ea internal/runtime/gc/scan: add AVX512 impl of filterNil.
+ 2025-11-21 ccd389036a cmd/internal/objabi: remove -V=goexperiment internal special case
+ 2025-11-21 e7787b9eca runtime: go fmt
+ 2025-11-21 17b3b98796 internal/strconv: go fmt
+ 2025-11-21 c851827c68 internal/trace: go fmt
+ 2025-11-21 f87aaec53d cmd/compile: fix integer overflow in prove pass
+ 2025-11-21 dbd2ab9992 cmd/compile/internal: fix typos
+ 2025-11-21 b9d86baae3 cmd/compile/internal/devirtualize: fix typos
+ 2025-11-20 4b0e3cc1d6 cmd/link: support loading R_LARCH_PCREL20_S2 and R_LARCH_CALL36 relocs
+ 2025-11-20 cdba82c7d6 cmd/internal/obj/loong64: add {,X}VSLT.{B/H/W/V}{,U} instructions support
+ 2025-11-20 bd2b117c2c crypto/tls: add QUICErrorEvent
+ 2025-11-20 3ad2e113fc net/http/httputil: wrap ReverseProxy's outbound request body so Close is a noop
+ 2025-11-20 d58b733646 runtime: track goroutine location until actual STW
+ 2025-11-20 1bc54868d4 cmd/vendor: update to x/tools@68724af
+ 2025-11-20 8c3195973b runtime: disable stack allocation tests on sanitizers
+ 2025-11-20 ff654ea100 net/url: permit colons in the host of postgresql:// URLs
+ 2025-11-20 a662badab9 encoding/json: remove linknames
+ 2025-11-20 5afe237d65 mime: add missing path for mime types in godoc
+ 2025-11-20 c1b7112af8 os/signal: make NotifyContext cancel the context with a cause
Change-Id: Ib93ef643be610dfbdd83ff45095a7b1ca2537b8b
|
|
goos: linux
goarch: amd64
pkg: cmd/compile/internal/test
cpu: AMD EPYC 7532 32-Core Processor
│ simplify_base │ simplify_new │
│ sec/op │ sec/op vs base │
SimplifyNegMul 623.0n ± 0% 319.3n ± 1% -48.75% (p=0.000 n=10)
goos: linux
goarch: riscv64
pkg: cmd/compile/internal/test
cpu: Spacemit(R) X60
│ simplify.base │ simplify.new │
│ sec/op │ sec/op vs base │
SimplifyNegMul 10.928µ ± 0% 6.432µ ± 0% -41.14% (p=0.000 n=10)
Change-Id: I1d9393cd19a0b948a5d3a512d627cdc0cf0b38be
Reviewed-on: https://go-review.googlesource.com/c/go/+/721520
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Conflicts:
- src/cmd/compile/internal/typecheck/builtin.go
Merge List:
+ 2025-11-20 ca37d24e0b net/http: drop unused "broken" field from persistConn
+ 2025-11-20 4b740af56a cmd/internal/obj/x86: handle global reference in From3 in dynlink mode
+ 2025-11-20 790384c6c2 spec: adjust rule for type parameter on RHS of alias declaration
+ 2025-11-20 a49b0302d0 net/http: correctly close fake net.Conns
+ 2025-11-20 32f5aadd2f cmd/compile: stack allocate backing stores during append
+ 2025-11-20 a18aff8057 runtime: select GC mark workers during start-the-world
+ 2025-11-20 829779f4fe runtime: split findRunnableGCWorker in two
+ 2025-11-20 ab59569099 go/version: use "custom" as an example of a version suffix
+ 2025-11-19 c4bb9653ba cmd/compile: Implement LoweredZeroLoop with LSX Instruction on loong64
+ 2025-11-19 7f2ae21fb4 cmd/internal/obj/loong64: add MULW.D.W[U] instructions
+ 2025-11-19 a2946f2385 crypto: add Encapsulator and Decapsulator interfaces
+ 2025-11-19 6b83bd7146 crypto/ecdh: add KeyExchanger interface
+ 2025-11-19 4fef9f8b55 go/types, types2: fix object path for grouped declaration statements
+ 2025-11-19 33529db142 spec: escape double-ampersands
+ 2025-11-19 dc42565a20 cmd/compile: fix control flow for unsigned divisions proof relations
+ 2025-11-19 e64023dcbf cmd/compile: cleanup useless if statement in prove
+ 2025-11-19 2239520d1c test: go fmt prove.go tests
+ 2025-11-19 489d3dafb7 math: switch s390x math.Pow to generic implementation
+ 2025-11-18 8c41a482f9 runtime: add dlog.hexdump
+ 2025-11-18 e912618bd2 runtime: add hexdumper
+ 2025-11-18 2cf9d4b62f Revert "net/http: do not discard body content when closing it within request handlers"
+ 2025-11-18 4d0658bb08 cmd/compile: prefer fixed registers for values
+ 2025-11-18 ba634ca5c7 cmd/compile: fold boolean NOT into branches
+ 2025-11-18 8806d53c10 cmd/link: align sections, not symbols after DWARF compress
+ 2025-11-18 c93766007d runtime: do not print recovered when double panic with the same value
+ 2025-11-18 9859b43643 cmd/asm,cmd/compile,cmd/internal/obj/riscv: use compressed instructions on riscv64
+ 2025-11-17 b9ef0633f6 cmd/internal/sys,internal/goarch,runtime: enable the use of compressed instructions on riscv64
+ 2025-11-17 a087dea869 debug/elf: sync new loong64 relocation types up to LoongArch ELF psABI v20250521
+ 2025-11-17 e1a12c781f cmd/compile: use 32x32->64 multiplies on arm64
+ 2025-11-17 6caab99026 runtime: relax TestMemoryLimit on darwin a bit more
+ 2025-11-17 eda2e8c683 runtime: clear frame pointer at thread entry points
+ 2025-11-17 6919858338 runtime: rename findrunnable references to findRunnable
+ 2025-11-17 8e734ec954 go/ast: fix BasicLit.End position for raw strings containing \r
+ 2025-11-17 592775ec7d crypto/mlkem: avoid a few unnecessary inverse NTT calls
+ 2025-11-17 590cf18daf crypto/mlkem/mlkemtest: add derandomized Encapsulate768/1024
+ 2025-11-17 c12c337099 cmd/compile: teach prove about subtract idioms
+ 2025-11-17 bc15963813 cmd/compile: clean up prove pass
+ 2025-11-17 1297fae708 go/token: add (*File).End method
+ 2025-11-17 65c09eafdf runtime: hoist invariant code out of heapBitsSmallForAddrInline
+ 2025-11-17 594129b80c internal/runtime/maps: update doc for table.Clear
+ 2025-11-15 c58d075e9a crypto/rsa: deprecate PKCS#1 v1.5 encryption
+ 2025-11-14 d55ecea9e5 runtime: usleep before stealing runnext only if not in syscall
+ 2025-11-14 410ef44f00 cmd: update x/tools to 59ff18c
+ 2025-11-14 50128a2154 runtime: support runtime.freegc in size-specialized mallocs for noscan objects
+ 2025-11-14 c3708350a4 cmd/go: tests: rename git-min-vers->git-sha256
+ 2025-11-14 aea881230d std: fix printf("%q", int) mistakes
+ 2025-11-14 120f1874ef runtime: add more precise test of assist credit handling for runtime.freegc
+ 2025-11-14 fecfcaa4f6 runtime: add runtime.freegc to reduce GC work
+ 2025-11-14 5a347b775e runtime: set GOEXPERIMENT=runtimefreegc to disabled by default
+ 2025-11-14 1a03d0db3f runtime: skip tests for GOEXPERIMENT=arenas that do not handle clobberfree=1
+ 2025-11-14 cb0d9980f5 net/http: do not discard body content when closing it within request handlers
+ 2025-11-14 03ed43988f cmd/compile: allow multi-field structs to be stored directly in interfaces
+ 2025-11-14 1bb1f2bf0c runtime: put AddCleanup cleanup arguments in their own allocation
+ 2025-11-14 9fd2e44439 runtime: add AddCleanup benchmark
+ 2025-11-14 80c91eedbb runtime: ensure weak handles end up in their own allocation
+ 2025-11-14 7a8d0b5d53 runtime: add debug mode to extend _Grunning-without-P windows
+ 2025-11-14 710abf74da internal/runtime/cgobench: add Go function call benchmark for comparison
+ 2025-11-14 b24aec598b doc, cmd/internal/obj/riscv: document the riscv64 assembler
+ 2025-11-14 a0e738c657 cmd/compile/internal: remove incorrect riscv64 SLTI rule
+ 2025-11-14 2cdcc4150b cmd/compile: fold negation into multiplication
+ 2025-11-14 b57962b7c7 bytes: fix panic in bytes.Buffer.Peek
+ 2025-11-14 0a569528ea cmd/compile: optimize comparisons with single bit difference
+ 2025-11-14 1e5e6663e9 cmd/compile: remove unnecessary casts and types from riscv64 rules
+ 2025-11-14 ddd8558e61 go/types, types2: swap object.color for Checker.objPathIdx
+ 2025-11-14 9daaab305c cmd/link/internal/ld: make runtime.buildVersion with experiments valid
+ 2025-11-13 d50a571ddf test: fix tests to work with sizespecializedmalloc turned off
+ 2025-11-13 704f841eab cmd/trace: annotation proc start/stop with thread and proc always
+ 2025-11-13 17a02b9106 net/http: remove unused isLitOrSingle and isNotToken
+ 2025-11-13 ff61991aed cmd/go: fix flaky TestScript/mod_get_direct
+ 2025-11-13 129d0cb543 net/http/cgi: accept INCLUDED as protocol for server side includes
+ 2025-11-13 77c5130100 go/types: minor simplification
+ 2025-11-13 7601cd3880 go/types: generate cycles.go
+ 2025-11-13 7a372affd9 go/types, types2: rename definedType to declaredType and clarify docs
Change-Id: Ibaa9bdb982364892f80e511c1bb12661fcd5fb86
|
|
The rule
(SLTI [x] (ORI [y] _)) && y >= 0 && int64(y) >= int64(x) => (MOVDconst [0])
is incorrect as it only generates correct code if the unknown value
being compared is >= 0. If the unknown value is < 0 the rule will
incorrectly produce a constant value of 0, whereas the code optimized
away by the rule would have produced a value of 1.
A new test that causes the faulty rule to generate incorrect code
is also added to ensure that the error does not return.
Change-Id: I69224e0776596f1b9538acf9dacf9009d305f966
Reviewed-on: https://go-review.googlesource.com/c/go/+/720220
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Conflicts:
- src/cmd/compile/internal/ir/symtab.go
- src/cmd/compile/internal/ssa/prove.go
- src/cmd/compile/internal/ssa/rewriteAMD64.go
- src/cmd/compile/internal/ssagen/intrinsics.go
- src/cmd/compile/internal/typecheck/builtin.go
- src/internal/buildcfg/exp.go
- src/internal/strconv/ftoa.go
- test/codegen/stack.go
Manually resolved some conflicts:
- Use internal/strconv for simd.String, remove internal/ftoa
- prove.go is just copied from the one on the main branch. We
have cherry-picked the changes to prove.go to main branch, so
our copy is identical to an old version of the one on the main
branch. There are CLs landed after our cherry-picks. Just copy
it over to adopt the new code.
Merge List:
+ 2025-11-13 57362e9814 go/types, types2: check for direct cycles as a separate phase
+ 2025-11-13 099e0027bd cmd/go/internal/modfetch: consolidate global vars
+ 2025-11-13 028375323f cmd/go/internal/modfetch/codehost: fix flaky TestReadZip
+ 2025-11-13 4ebf295b0b runtime: prefer to restart Ps on the same M after STW
+ 2025-11-13 625d8e9b9c runtime/pprof: fix goroutine leak profile tests for noopt
+ 2025-11-13 4684a26c26 spec: remove cycle restriction for type parameters
+ 2025-11-13 0f9c8fb29d cmd/asm,cmd/internal/obj/riscv: add support for riscv compressed instructions
+ 2025-11-13 a15d036ce2 cmd/internal/obj/riscv: implement better bit pattern encoding
+ 2025-11-12 abb241a789 cmd/internal/obj/loong64: add {,X}VS{ADD,SUB}.{B/H/W/V}{,U} instructions support
+ 2025-11-12 0929d21978 cmd/go: keep objects alive while stopping cleanups
+ 2025-11-12 f03d06ec1a runtime: fix list test memory management for mayMoreStack
+ 2025-11-12 48127f656b crypto/internal/fips140/sha3: remove outdated TODO
+ 2025-11-12 c3d1d42764 sync/atomic: amend comments for Value.{Swap,CompareAndSwap}
+ 2025-11-12 e0807ba470 cmd/compile: don't clear ptrmask in fillptrmask
+ 2025-11-12 66318d2b4b internal/abi: correctly describe result in Name.Name doc comment
+ 2025-11-12 34aef89366 cmd/compile: use FCLASSD for subnormal checks on riscv64
+ 2025-11-12 0c28789bd7 net/url: disallow raw IPv6 addresses in host
+ 2025-11-12 4e761b9a18 cmd/compile: optimize liveness in stackalloc
+ 2025-11-12 956909ff84 crypto/x509: move BetterTLS suite from crypto/tls
+ 2025-11-12 6525f46707 cmd/link: change shdr and phdr from arrays to slices
+ 2025-11-12 d3aeba1670 runtime: switch p.gcFractionalMarkTime to atomic.Int64
+ 2025-11-12 8873e8bea2 runtime,runtime/pprof: clean up goroutine leak profile writing
+ 2025-11-12 b8b84b789e cmd/go: clarify the -o testflag is only for copying the binary
+ 2025-11-12 c761b26b56 mime: parse media types that contain braces
+ 2025-11-12 65858a146e os/exec: include Cmd.Start in the list of methods that run Cmd
+ 2025-11-11 4bfc3a9d14 std,cmd: go fix -any std cmd
+ 2025-11-11 2263d4aabd runtime: doubly-linked sched.midle list
+ 2025-11-11 046dce0e54 runtime: use new list type for spanSPMCs
+ 2025-11-11 5f11275457 runtime: reusable intrusive doubly-linked list
+ 2025-11-11 951cf0501b internal/trace/testtrace: fix flag name typos
+ 2025-11-11 2750f95291 cmd/go: implement accurate pseudo-versions for Mercurial
+ 2025-11-11 b709a3e8b4 cmd/go/internal/vcweb: cache hg servers
+ 2025-11-11 426ef30ecf cmd/go: implement -reuse for Mercurial repos
+ 2025-11-10 5241d114f5 spec: more precise prose for special case of append
+ 2025-11-10 cdf64106f6 go/types, types2: first argument to append must never be be nil
+ 2025-11-10 a0eb4548cf .gitignore: ignore go test artifacts
+ 2025-11-10 bf58e7845e internal/trace: add "command" to convert text traces to raw
+ 2025-11-10 052c192a4c runtime: fix lock rank for work.spanSPMCs.lock
+ 2025-11-10 bc5ffe5c79 internal/runtime/sys,math/bits: eliminate bounds checks on len8tab
+ 2025-11-10 32f8d6486f runtime: document that tracefpunwindoff applies to some profilers
+ 2025-11-10 1c1c1942ba cmd/go: remove redundant AVX regex in security flag checks
+ 2025-11-10 3b3d6b9e5d cmd/internal/obj/arm64: shorten constant integer loads
+ 2025-11-10 5f4b5f1a19 runtime/msan: use different msan routine for copying
+ 2025-11-10 0fe6c8e8c8 runtime: tweak wording for comment of mcache.flushGen
+ 2025-11-10 95a0e5adc1 sync: don't call Done when f() panics in WaitGroup.Go
+ 2025-11-08 e8ed85d6c2 cmd/go: update goSum if necessary
+ 2025-11-08 b76103c08e cmd/go: output missing GoDebug entries
+ 2025-11-07 47a63a331d cmd/go: rewrite hgrepo1 test repo to be deterministic
+ 2025-11-07 7995751d3a cmd/go: copy git reuse and support repos to hg
+ 2025-11-07 66c7ca7fb3 cmd/go: improve TestScript/reuse_git
+ 2025-11-07 de84ac55c6 cmd/link: clean up some comments to Go standards
+ 2025-11-07 5cd1b73772 runtime: correctly print panics before fatal-ing on defer
+ 2025-11-07 91ca80f970 runtime/cgo: improve error messages after pointer panic
+ 2025-11-07 d36e88f21f runtime: tweak wording for doc
+ 2025-11-06 ad3ccd92e4 cmd/link: move pclntab out of relro section
+ 2025-11-06 43b91e7abd iter: fix a tiny doc comment bug
+ 2025-11-06 48c7fa13c6 Revert "runtime: remove the pc field of _defer struct"
+ 2025-11-05 8111104a21 cmd/internal/obj/loong64: add {,X}VSHUF.{B/H/W/V} instructions support
+ 2025-11-05 2e2072561c cmd/internal/obj/loong64: add {,X}VEXTRINS.{B,H,W,V} instruction support
+ 2025-11-05 01c29d1f0b internal/chacha8rand: replace VORV with instruction VMOVQ on loong64
+ 2025-11-05 f01a1841fd cmd/compile: fix error message on loong64
+ 2025-11-05 8cf7a0b4c9 cmd/link: support weak binding on darwin
+ 2025-11-05 2dd7e94e16 cmd/go: use go.dev instead of golang.org in flag errors
+ 2025-11-05 28f1ad5782 cmd/go: fix TestScript/govcs
+ 2025-11-05 daa220a1c9 cmd/go: silence TLS handshake errors during test
+ 2025-11-05 3ae9e95002 cmd/go: fix TestCgoPkgConfig on darwin with pkg-config installed
+ 2025-11-05 a494a26bc2 cmd/go: fix TestScript/vet_flags
+ 2025-11-05 a8fb94969c cmd/go: fix TestScript/tool_build_as_needed
+ 2025-11-05 04f05219c4 cmd/cgo: skip escape checks if call site has no argument
+ 2025-11-04 9f3a108ee0 os: ignore O_TRUNC errors on named pipes and terminal devices on Windows
+ 2025-11-04 0e1bd8b5f1 cmd/link, runtime: don't store text start in pcHeader
+ 2025-11-04 7347b54727 cmd/link: don't generate .gosymtab section
+ 2025-11-04 6914dd11c0 cmd/link: add and use new SymKind SFirstUnallocated
+ 2025-11-04 f5f14262d0 cmd/link: remove misleading comment
+ 2025-11-04 61de3a9dae cmd/link: remove unused SFILEPATH symbol kind
+ 2025-11-04 8e2bd267b5 cmd/link: add comments for SymKind values
+ 2025-11-04 16705b962e cmd/compile: faster liveness analysis in regalloc
+ 2025-11-04 a5fe6791d7 internal/syscall/windows: fix ReOpenFile sentinel error value
+ 2025-11-04 a7d174ccaa cmd/compile/internal/ssa: simplify riscv64 FCLASSD rewrite rules
+ 2025-11-04 856238615d runtime: amend doc for setPinned
+ 2025-11-04 c7ccbddf22 cmd/compile/internal/ssa: more aggressive on dead auto elim
+ 2025-11-04 75b2bb1d1a cmd/cgo: drop pre-1.18 support
+ 2025-11-04 dd839f1d00 internal/strconv: handle %f with fixedFtoa when possible
+ 2025-11-04 6e165b4d17 cmd/compile: implement Avg64u, Hmul64, Hmul64u for wasm
+ 2025-11-04 9f6590f333 encoding/pem: don't reslice in failure modes
+ 2025-11-03 34fec512ce internal/strconv: extract fixed-precision ftoa from ftoaryu.go
+ 2025-11-03 162ba6cc40 internal/strconv: add tests and benchmarks for ftoaFixed
+ 2025-11-03 9795c7ba22 internal/strconv: fix pow10 off-by-one in exponent result
+ 2025-11-03 ad5e941a45 cmd/internal/obj/loong64: using {xv,v}slli.d to perform copying between vector registers
+ 2025-11-03 dadbac0c9e cmd/internal/obj/loong64: add VPERMI.W, XVPERMI.{W,V,Q} instruction support
+ 2025-11-03 e2c6a2024c runtime: avoid append in printint, printuint
+ 2025-11-03 c93cc603cd runtime: allow Stack to traceback goroutines in syscall _Grunning window
+ 2025-11-03 b5353fd90a runtime: don't panic in castogscanstatus
+ 2025-11-03 43491f8d52 cmd/cgo: use the export'ed file/line in error messages
+ 2025-11-03 aa94fdf0cc cmd/go: link to go.dev/doc/godebug for removed GODEBUG settings
+ 2025-11-03 4d2b03d2fc crypto/tls: add BetterTLS test coverage
+ 2025-11-03 0c4444e13d cmd/internal/obj: support arm64 FMOVQ large offset encoding
+ 2025-11-03 85bec791a0 cmd/go/testdata/script: loosen list_empty_importpath for freebsd
+ 2025-11-03 17b57078ab internal/runtime/cgobench: add cgo callback benchmark
+ 2025-11-03 5f8fdb720c cmd/go: move functions to methods
+ 2025-11-03 0a95856b95 cmd/go: eliminate additional global variable
+ 2025-11-03 f93186fb44 cmd/go/internal/telemetrystats: count cgo usage
+ 2025-11-03 eaf28a27fd runtime: update outdated comments for deferprocStack
+ 2025-11-03 e12d8a90bf all: remove extra space in the comments
+ 2025-11-03 c5559344ac internal/profile: optimize Parse allocs
+ 2025-11-03 5132158ac2 bytes: add Buffer.Peek
+ 2025-11-03 361d51a6b5 runtime: remove the pc field of _defer struct
+ 2025-11-03 00ee1860ce crypto/internal/constanttime: expose intrinsics to the FIPS 140-3 packages
+ 2025-11-02 388c41c412 cmd/go: skip git sha256 tests if git < 2.29
+ 2025-11-01 385dc33250 runtime: prevent time.Timer.Reset(0) from deadlocking testing/synctest tests
+ 2025-10-31 99b724f454 cmd/go: document purego convention
+ 2025-10-31 27937289dc runtime: avoid zeroing scavenged memory
+ 2025-10-30 89dee70484 runtime: prioritize panic output over racefini
+ 2025-10-30 8683bb846d runtime: optimistically CAS atomicstatus directly in enter/exitsyscall
+ 2025-10-30 5b8e850340 runtime: don't track scheduling latency for _Grunning <-> _Gsyscall
+ 2025-10-30 251814e580 runtime: document tracer invariants explicitly
+ 2025-10-30 7244e9221f runtime: eliminate _Psyscall
+ 2025-10-30 5ef19c0d0c strconv: delete divmod1e9
+ 2025-10-30 d32b1f02c3 runtime: delete timediv
+ 2025-10-30 cbbd385cb8 strconv: remove arch-specific decision in formatBase10
+ 2025-10-30 6aca04a73a reflect: correct internal docs for uncommonType
+ 2025-10-30 235b4e729d cmd/compile/internal/ssa: model right shift more precisely
+ 2025-10-30 d44db293f9 go/token: fix a typo in a comment
+ 2025-10-30 cdc6b559ca strconv: remove hand-written divide on 32-bit systems
+ 2025-10-30 1e5bb416d8 cmd/compile: implement bits.Mul64 on 32-bit systems
+ 2025-10-30 38317c44e7 crypto/internal/fips140/aes: fix CTR generator
+ 2025-10-29 3be9a0e014 go/types, types: proceed with correct (invalid) type in case of a selector error
+ 2025-10-29 d2c5fa0814 strconv: remove &0xFF trick in formatBase10
+ 2025-10-29 9bbda7c99d cmd/compile: make prove understand div, mod better
+ 2025-10-29 915c1839fe test/codegen: simplify asmcheck pattern matching
+ 2025-10-29 32ee3f3f73 runtime: tweak example code for gorecover
+ 2025-10-29 da3fb90b23 crypto/internal/fips140/bigmod: fix extendedGCD comment
+ 2025-10-29 9035f7aea5 runtime: use internal/strconv
+ 2025-10-29 49c1da474d internal/itoa, internal/runtime/strconv: delete
+ 2025-10-29 b2a346bbd1 strconv: move all but Quote to internal/strconv
+ 2025-10-28 041f564b3e internal/runtime/gc/scan: avoid memory destination on VPCOMPRESSQ
+ 2025-10-28 81afd3a59b cmd/compile: extend ppc64 MADDLD to match const ADDconst & MULLDconst
+ 2025-10-28 ea50d61b66 cmd/compile: name change isDirect -> isDirectAndComparable
+ 2025-10-28 bd4dc413cd cmd/compile: don't optimize away a panicing interface comparison
+ 2025-10-28 30c047d0d0 cmd/compile: extend loong MOV*idx rules to match ADDshiftLLV
+ 2025-10-28 46e5e2b09a runtime: define PanicBounds in funcdata.h
+ 2025-10-28 3da0356685 crypto/internal/fips140test: collect 300M entropy samples for ESV
+ 2025-10-28 d5953185d5 runtime: amend comments a bit
+ 2025-10-28 12c8d14d94 errors: document that the target of Is must be comparable
+ 2025-10-28 1f4d14e493 go/types, types2: pull up package-level object sort to a separate phase
+ 2025-10-28 b8aa1ee442 go/types, types2: reduce locks held at once in resolveUnderlying
+ 2025-10-28 24af441437 cmd/compile: rewrite proved multiplies by 0 or 1 into CondSelect
+ 2025-10-28 2d33a456c6 cmd/compile: move branchelim supported arches to Config
+ 2025-10-27 2c91c33e88 crypto/subtle,cmd/compile: add intrinsics for ConstantTimeSelect and *Eq
+ 2025-10-27 73d7635fae cmd/compile: add generic rules to remove bool → int → bool roundtrips
+ 2025-10-27 1662d55247 cmd/compile: do not Zext bools to 64bits in amd64 CMOV generation rules
+ 2025-10-27 b8468d8c4e cmd/compile: introduce bytesizeToConst to cleanup switches in prove
+ 2025-10-27 9e25c2f6de cmd/link: internal linking support for windows/arm64
+ 2025-10-27 ff2ebf69c4 internal/runtime/gc/scan: correct size class size check
+ 2025-10-27 9a77aa4f08 cmd/compile: add position info to sccp debug messages
+ 2025-10-27 77dc138030 cmd/compile: teach prove about unsigned rounding-up divide
+ 2025-10-27 a0f33b2887 cmd/compile: change !l.nonzero() into l.maybezero()
+ 2025-10-27 5453b788fd cmd/compile: optimize Add64carry with unused carries into plain Add64
+ 2025-10-27 2ce5aab79e cmd/compile: remove 68857 ModU flowLimit workaround in prove
+ 2025-10-27 a50de4bda7 cmd/compile: remove 68857 min & max flowLimit workaround in prove
+ 2025-10-27 53be78630a cmd/compile: use topo-sort in prove to correctly learn facts while walking once
+ 2025-10-27 dec2b4c83d runtime: avoid bound check in freebsd binuptime
+ 2025-10-27 916e682d51 cmd/internal/obj, cmd/asm: reclassify the offset of memory access operations on loong64
+ 2025-10-27 2835b994fb cmd/go: remove global loader state variable
+ 2025-10-27 139f89226f cmd/go: use local state for telemetry
+ 2025-10-27 8239156571 cmd/go: use tagged switch
+ 2025-10-27 d741483a1f cmd/go: increase stmt threshold on amd64
+ 2025-10-27 a6929cf4a7 cmd/go: removed unused code in toolchain.Exec
+ 2025-10-27 180c07e2c1 go/types, types2: clarify docs for resolveUnderlying
+ 2025-10-27 d8a32f3d4b go/types, types2: wrap Named.fromRHS into Named.rhs
+ 2025-10-27 b2af92270f go/types, types2: verify stateMask transitions in debug mode
+ 2025-10-27 92decdcbaa net/url: further speed up escape and unescape
+ 2025-10-27 5f4ec3541f runtime: remove unused cgoCheckUsingType function
+ 2025-10-27 189f2c08cc time: rewrite IsZero method to use wall and ext fields
+ 2025-10-27 f619b4a00d cmd/go: reorder parameters so that context is first
+ 2025-10-27 f527994c61 sync: update comments for Once.done
+ 2025-10-26 5dcaf9a01b runtime: add GOEXPERIMENT=runtimefree
+ 2025-10-26 d7a52f9369 cmd/compile: use MOV(D|F) with const for Const(64|32)F on riscv64
+ 2025-10-26 6f04a92be3 internal/chacha8rand: provide vector implementation for riscv64
+ 2025-10-26 54e3adc533 cmd/go: use local loader state in test
+ 2025-10-26 ca379b1c56 cmd/go: remove loaderstate dependency
+ 2025-10-26 83a44bde64 cmd/go: remove unused loader state
+ 2025-10-26 7e7cd9de68 cmd/go: remove temporary rf cleanup script
+ 2025-10-26 53ad68de4b cmd/compile: allow unaligned load/store on Wasm
+ 2025-10-25 12ec09f434 cmd/go: use local state object in work.runBuild and work.runInstall
+ 2025-10-24 643f80a11f runtime: add ppc and s390 to 32 build constraints for gccgo
+ 2025-10-24 0afbeb5102 runtime: add ppc and s390 to linux 32 bits syscall build constraints for gccgo
+ 2025-10-24 7b506d106f cmd/go: use local state object in `generate.runGenerate`
+ 2025-10-24 26a8a21d7f cmd/go: use local state object in `env.runEnv`
+ 2025-10-24 f2dd3d7e31 cmd/go: use local state object in `vet.runVet`
+ 2025-10-24 784700439a cmd/go: use local state object in pkg `workcmd`
+ 2025-10-24 69673e9be2 cmd/go: use local state object in `tool.runTool`
+ 2025-10-24 2e12c5db11 cmd/go: use local state object in `test.runTest`
+ 2025-10-24 fe345ff2ae cmd/go: use local state object in `modget.runGet`
+ 2025-10-24 d312e27e8b cmd/go: use local state object in pkg `modcmd`
+ 2025-10-24 ea9cf26aa1 cmd/go: use local state object in `list.runList`
+ 2025-10-24 9926e1124e cmd/go: use local state object in `bug.runBug`
+ 2025-10-24 2c4fd7b2cd cmd/go: use local state object in `run.runRun`
+ 2025-10-24 ade9f33e1f cmd/go: add loaderstate as field on `mvsReqs`
+ 2025-10-24 ccf4192a31 cmd/go: make ImportMissingError work with local state
+ 2025-10-24 f5403f15f0 debug/pe: check for zdebug_gdb_scripts section in testDWARF
+ 2025-10-24 a26f860fa4 runtime: use 32-bit hash for maps on Wasm
+ 2025-10-24 747fe2efed encoding/json/v2: fix typo in documentation about errors.AsType
+ 2025-10-24 94f47fc03f cmd/link: remove pointless assignment in SetSymAlign
+ 2025-10-24 e6cff69051 crypto/x509: move constraint checking after chain building
+ 2025-10-24 f5f69a3de9 encoding/json/jsontext: avoid pinning application data in pools
+ 2025-10-24 a6a59f0762 encoding/json/v2: use slices.Sort directly
+ 2025-10-24 0d3dab9b1d crypto/x509: simplify candidate chain filtering
+ 2025-10-24 29046398bb cmd/go: refactor injection of modload.LoaderState
+ 2025-10-24 c18fa69e52 cmd/go: make ErrNoModRoot work with local state
+ 2025-10-24 296ecc918d cmd/go: add modload.State parameter to AllowedFunc
+ 2025-10-24 c445a61e52 cmd/go: add loaderstate as field on `QueryMatchesMainModulesError`
+ 2025-10-24 6ac40051d3 cmd/go: remove module loader state from ccompile
+ 2025-10-24 6a5a452528 cmd/go: inject vendor dir into builder struct
+ 2025-10-23 dfac972233 crypto/pbkdf2: add missing error return value in example
+ 2025-10-23 47bf8f073e unique: fix inconsistent panic prefix in canonmap cleanup path
+ 2025-10-23 03bd43e8bb go/types, types2: rename Named.resolve to unpack
+ 2025-10-23 9fcdc814b2 go/types, types2: rename loaded namedState to lazyLoaded
+ 2025-10-23 8401512a9b go/types, types2: rename complete namedState to hasMethods
+ 2025-10-23 cf826bfcb4 go/types, types2: set t.underlying exactly once in resolveUnderlying
+ 2025-10-23 c4e910895b net/url: speed up escape and unescape
+ 2025-10-23 3f6ac3a10f go/build: use slices.Equal
+ 2025-10-23 839da71f89 encoding/pem: properly calculate end indexes
+ 2025-10-23 39ed968832 cmd: update golang.org/x/arch for riscv64 disassembler
+ 2025-10-23 ca448191c9 all: replace Split in loops with more efficient SplitSeq
+ 2025-10-23 107fcb70de internal/goroot: replace HasPrefix+TrimPrefix with CutPrefix
+ 2025-10-23 8378276d66 strconv: optimize int-to-decimal and use consistently
+ 2025-10-23 e5688d0bdd cmd/internal/obj/riscv: simplify validation and encoding of raw instructions
+ 2025-10-22 77fc27972a doc/next: improve new(expr) release note
+ 2025-10-22 d94a8c56ad runtime: cleanup pagetrace
+ 2025-10-22 02728a2846 crypto/internal/fips140test: add entropy SHA2-384 testing
+ 2025-10-22 f92e01c117 runtime/cgo: fix cgoCheckArg description
+ 2025-10-22 50586182ab runtime: use backoff and ISB instruction to reduce contention in (*lfstack).pop and (*spanSet).pop on arm64
+ 2025-10-22 1ff59f3dd3 strconv: clean up powers-of-10 table, tests
+ 2025-10-22 7c9fa4d5e9 cmd/go: check if build output should overwrite files with renames
+ 2025-10-22 557b4d6e0f comment: change slice to string in function comment/help
+ 2025-10-22 d09a8c8ef4 go/types, types2: simplify locking in Named.resolveUnderlying
+ 2025-10-22 5a42af7f6c go/types, types2: in resolveUnderlying, only compute path when needed
+ 2025-10-22 4bdb55b5b8 go/types, types2: rename Named.under to Named.resolveUnderlying
+ 2025-10-21 29d43df8ab go/build, cmd/go: use ast.ParseDirective for go:embed
+ 2025-10-21 4e695dd634 go/ast: add ParseDirective for parsing directive comments
+ 2025-10-21 06e57e60a7 go/types, types2: only report version errors if new(expr) is ok otherwise
+ 2025-10-21 6c3d0d259f path/filepath: reword documentation for Rel
+ 2025-10-21 39fd61ddb0 go/types, types2: guard Named.underlying with Named.mu
+ 2025-10-21 4a0115c886 runtime,syscall: implement and use syscalln on darwin
+ 2025-10-21 261c561f5a all: gofmt -w
+ 2025-10-21 c9c78c06ef strconv: embed testdata in test
+ 2025-10-21 8f74f9daf4 sync: re-enable race even when panicking
+ 2025-10-21 8a6c64f4fe syscall: use rawSyscall6 to call ptrace in forkAndExecInChild
+ 2025-10-21 4620db72d2 runtime: use timer_settime64 on 32-bit Linux
+ 2025-10-21 b31dc77cea os: support deleting read-only files in RemoveAll on older Windows versions
+ 2025-10-21 46cc532900 cmd/compile/internal/ssa: fix typo in comment
+ 2025-10-21 2163a58021 crypto/internal/fips140/entropy: increase AllocsPerRun iterations
+ 2025-10-21 306eacbc11 cmd/go/testdata/script: disable list_empty_importpath test on Windows
+ 2025-10-21 a5a249d6a6 all: eliminate unnecessary type conversions
+ 2025-10-21 694182d77b cmd/internal/obj/ppc64: improve large prologue generation
+ 2025-10-21 b0dcb95542 cmd/compile: leave the horses alone
+ 2025-10-21 9a5a1202f4 runtime: clean dead architectures from go:build constraint
+ 2025-10-21 8539691d0c crypto/internal/fips140/entropy: move to crypto/internal/entropy/v1.0.0
+ 2025-10-20 99cf4d671c runtime: save lasx and lsx registers in loong64 async preemption
+ 2025-10-20 79ae97fe9b runtime: make procyieldAsm no longer loop infinitely if passed 0
+ 2025-10-20 f838faffe2 runtime: wrap procyield assembly and check for 0
+ 2025-10-20 ee4d2c312d runtime/trace: dump test traces on validation failure
+ 2025-10-20 7b81a1e107 net/url: reduce allocs in Encode
+ 2025-10-20 e425176843 cmd/asm: fix typo in comment
+ 2025-10-20 dc9a3e2a65 runtime: fix generation skew with trace reentrancy
+ 2025-10-20 df33c17091 runtime: add _Gdeadextra status
+ 2025-10-20 7503856d40 cmd/go: inject loaderstate into matcher function
+ 2025-10-20 d57c3fd743 cmd/go: inject State parameter into `work.runInstall`
+ 2025-10-20 e94a5008f6 cmd/go: inject State parameter into `work.runBuild`
+ 2025-10-20 d9e6f95450 cmd/go: inject State parameter into `workcmd.runSync`
+ 2025-10-20 9769a61e64 cmd/go: inject State parameter into `modget.runGet`
+ 2025-10-20 f859799ccf cmd/go: inject State parameter into `modcmd.runVerify`
+ 2025-10-20 0f820aca29 cmd/go: inject State parameter into `modcmd.runVendor`
+ 2025-10-20 92aa3e9e98 cmd/go: inject State parameter into `modcmd.runInit`
+ 2025-10-20 e176dff41c cmd/go: inject State parameter into `modcmd.runDownload`
+ 2025-10-20 e7c66a58d5 cmd/go: inject State parameter into `toolchain.Select`
+ 2025-10-20 4dc3dd9a86 cmd/go: add loaderstate to Switcher
+ 2025-10-20 bcf7da1595 cmd/go: convert functions to methods
+ 2025-10-20 0d3044f965 cmd/go: make Reset work with any State instance
+ 2025-10-20 386d81151d cmd/go: make setState work with any State instance
+ 2025-10-20 a420aa221e cmd/go: inject State parameter into `tool.runTool`
+ 2025-10-20 441e7194a4 cmd/go: inject State parameter into `test.runTest`
+ 2025-10-20 35e8309be2 cmd/go: inject State parameter into `list.runList`
+ 2025-10-20 29a81624f7 cmd/go: inject state parameter into `fmtcmd.runFmt`
+ 2025-10-20 f7eaea02fd cmd/go: inject state parameter into `clean.runClean`
+ 2025-10-20 58a8fdb6cf cmd/go: inject State parameter into `bug.runBug`
+ 2025-10-20 8d0bef7ffe runtime: add linkname documentation and guidance
+ 2025-10-20 3e43f48cb6 encoding/asn1: use reflect.TypeAssert to improve performance
+ 2025-10-20 4ad5585c2c runtime: fix _rt0_ppc64x_lib on aix
+ 2025-10-17 a5f55a441e cmd/fix: add modernize and inline analyzers
+ 2025-10-17 80876f4b42 cmd/go/internal/vet: tweak help doc
+ 2025-10-17 b5aefe07e5 all: remove unnecessary loop variable copies in tests
+ 2025-10-17 5137c473b6 go/types, types2: remove references to under function in comments
+ 2025-10-17 dbbb1bfc91 all: correct name for comments
+ 2025-10-17 0983090171 encoding/pem: properly decode strange PEM data
+ 2025-10-17 36863d6194 runtime: unify riscv64 library entry point
+ 2025-10-16 0c14000f87 go/types, types2: remove under(Type) in favor of Type.Underlying()
+ 2025-10-16 1099436f1b go/types, types2: change and enforce lifecycle of Named.fromRHS and Named.underlying fields
+ 2025-10-16 41f5659347 go/types, types2: remove superfluous unalias call (minor cleanup)
+ 2025-10-16 e7351c03c8 runtime: use DC ZVA instead of its encoding in WORD in arm64 memclr
+ 2025-10-16 6cbe0920c4 cmd: update to x/tools@7d9453cc
+ 2025-10-15 45eee553e2 cmd/internal/obj: move ARM64RegisterExtension from cmd/asm/internal/arch
+ 2025-10-15 27f9a6705c runtime: increase repeat count for alloc test
+ 2025-10-15 b68cebd809 net/http/httptest: record failed ResponseWriter writes
+ 2025-10-15 f1fed742eb cmd: fix three printf problems reported by newest vet
+ 2025-10-15 0984dcd757 cmd/compile: fix an error in comments
+ 2025-10-15 31f82877e8 go/types, types2: fix misleading internal comment
+ 2025-10-15 6346349f56 cmd/compile: replace angle brackets with square
+ 2025-10-15 284379cdfc cmd/compile: remove rematerializable values from live set across calls
+ 2025-10-15 519ae514ab cmd/compile: eliminate bound check for slices of the same length
+ 2025-10-15 b5a29cca48 cmd/distpack: add fix tool to inventory
+ 2025-10-15 bb5eb51715 runtime/pprof: fix errors in pprof_test
+ 2025-10-15 5c9a26c7f8 cmd/compile: use arm64 neon in LoweredMemmove/LoweredMemmoveLoop
+ 2025-10-15 61d1ff61ad cmd/compile: use block starting position for phi line number
+ 2025-10-15 5b29875c8e cmd/go: inject State parameter into `run.runRun`
+ 2025-10-15 5113496805 runtime/pprof: skip flaky test TestProfilerStackDepth/heap for now
+ 2025-10-15 36086e85f8 cmd/go: create temporary cleanup script
+ 2025-10-14 7056c71d32 cmd/compile: disable use of new saturating float-to-int conversions
+ 2025-10-14 6d5b13793f Revert "cmd/compile: make 386 float-to-int conversions match amd64"
+ 2025-10-14 bb2a14252b Revert "runtime: adjust softfloat corner cases to match amd64/arm64"
+ 2025-10-14 3bc9d9fa83 Revert "cmd/compile: make wasm match other platforms for FP->int32/64 conversions"
+ 2025-10-14 ee5af46172 encoding/json: avoid misleading errors under goexperiment.jsonv2
+ 2025-10-14 11d3d2f77d cmd/internal/obj/arm64: add support for PAC instructions
+ 2025-10-14 4dbf1a5a4c cmd/compile/internal/devirtualize: do not track assignments to non-PAUTO
+ 2025-10-14 0ddb5ed465 cmd/compile/internal/devirtualize: use FatalfAt instead of Fatalf where possible
+ 2025-10-14 0a239bcc99 Revert "net/url: disallow raw IPv6 addresses in host"
+ 2025-10-14 5a9ef44bc0 cmd/compile/internal/devirtualize: fix OCONVNOP assertion
+ 2025-10-14 3765758b96 go/types, types2: minor cleanup (remove TODO)
+ 2025-10-14 f6b9d56aff crypto/internal/fips140/entropy: fix benign race
+ 2025-10-14 60f6d2f623 crypto/internal/fips140/entropy: support SHA-384 sizes for ACVP tests
+ 2025-10-13 6fd8e88d07 encoding/json/v2: restrict presence of default options
+ 2025-10-13 1abc6b0204 go/types, types2: permit type cycles through type parameter lists
+ 2025-10-13 9fdd6904da strconv: add tests that Java once mishandled
+ 2025-10-13 9b8742f2e7 cmd/compile: don't depend on arch-dependent conversions in the compiler
+ 2025-10-13 0e64ee1286 encoding/json/v2: report EOF for top-level values in UnmarshalDecode
+ 2025-10-13 6bcd97d9f4 all: replace calls to errors.As with errors.AsType
+ 2025-10-11 1cd71689f2 crypto/x509: rework fix for CVE-2025-58187
+ 2025-10-11 8aa1efa223 cmd/link: in TestFallocate, only check number of blocks on Darwin
+ 2025-10-10 b497a29d25 encoding/json: fix regression in quoted numbers under goexperiment.jsonv2
+ 2025-10-10 48bb7a6114 cmd/compile: repair bisection behavior for float-to-unsigned conversion
+ 2025-10-10 e8a53538b4 runtime: fail TestGoroutineLeakProfile on data race
+ 2025-10-10 e3be2d1b2b net/url: disallow raw IPv6 addresses in host
+ 2025-10-10 aced4c79a2 net/http: strip request body headers on POST to GET redirects
+ 2025-10-10 584a89fe74 all: omit unnecessary reassignment
+ 2025-10-10 69e8279632 net/http: set cookie host to Request.Host when available
+ 2025-10-10 6f4c63ba63 cmd/go: unify "go fix" and "go vet"
+ 2025-10-10 955a5a0dc5 runtime: support arm64 Neon in async preemption
+ 2025-10-10 5368e77429 net/http: run TestRequestWriteTransport with fake time to avoid flakes
+ 2025-10-09 c53cb642de internal/buildcfg: enable greenteagc experiment for loong64
+ 2025-10-09 954fdcc51a cmd/compile: declare no output register for loong64 LoweredAtomic{And,Or}32 ops
+ 2025-10-09 19a30ea3f2 cmd/compile: call generated size-specialized malloc functions directly
+ 2025-10-09 80f3bb5516 reflect: remove timeout in TestChanOfGC
+ 2025-10-09 9db7e30bb4 net/url: allow IP-literals with IPv4-mapped IPv6 addresses
+ 2025-10-09 8d810286b3 cmd/compile: make wasm match other platforms for FP->int32/64 conversions
+ 2025-10-09 b9f3accdcf runtime: adjust softfloat corner cases to match amd64/arm64
+ 2025-10-09 78d75b3799 cmd/compile: make 386 float-to-int conversions match amd64
+ 2025-10-09 0e466a8d1d cmd/compile: modify float-to-[u]int so that amd64 and arm64 match
+ 2025-10-08 4837fbe414 net/http/httptest: check whether response bodies are allowed
+ 2025-10-08 ee163197a8 path/filepath: return cleaned path from Rel
+ 2025-10-08 de9da0de30 cmd/compile/internal/devirtualize: improve concrete type analysis
+ 2025-10-08 ae094a1397 crypto/internal/fips140test: make entropy file pair names match
+ 2025-10-08 941e5917c1 runtime: cleanup comments from asm_ppc64x.s improvements
+ 2025-10-08 d945600d06 cmd/gofmt: change -d to exit 1 if diffs exist
+ 2025-10-08 d4830c6130 cmd/internal/obj: fix Link.Diag printf errors
+ 2025-10-08 e1ca1de123 net/http: format pprof.go
+ 2025-10-08 e5d004c7a8 net/http: update HTTP/2 documentation to reference new config features
+ 2025-10-08 97fd6bdecc cmd/compile: fuse NaN checks with other comparisons
+ 2025-10-07 78b43037dc cmd/go: refactor usage of `workFilePath`
+ 2025-10-07 bb1ca7ae81 cmd/go, testing: add TB.ArtifactDir and -artifacts flag
+ 2025-10-07 1623927730 cmd/go: refactor usage of `requirements`
+ 2025-10-07 a1661e776f Revert "crypto/internal/fips140/subtle: add assembly implementation of xorBytes for mips64x"
+ 2025-10-07 cb81270113 Revert "crypto/internal/fips140/subtle: add assembly implementation of xorBytes for mipsx"
+ 2025-10-07 f2d0d05d28 cmd/go: refactor usage of `MainModules`
+ 2025-10-07 f7a68d3804 archive/tar: set a limit on the size of GNU sparse file 1.0 regions
+ 2025-10-07 463165699d net/mail: avoid quadratic behavior in mail address parsing
+ 2025-10-07 5ede095649 net/textproto: avoid quadratic complexity in Reader.ReadResponse
+ 2025-10-07 5ce8cd16f3 encoding/pem: make Decode complexity linear
+ 2025-10-07 f6f4e8b3ef net/url: enforce stricter parsing of bracketed IPv6 hostnames
+ 2025-10-07 7dd54e1fd7 runtime: make work.spanSPMCs.all doubly-linked
+ 2025-10-07 3ee761739b runtime: free spanQueue on P destroy
+ 2025-10-07 8709a41d5e encoding/asn1: prevent memory exhaustion when parsing using internal/saferio
+ 2025-10-07 9b9d02c5a0 net/http: add httpcookiemaxnum GODEBUG option to limit number of cookies parsed
+ 2025-10-07 3fc4c79fdb crypto/x509: improve domain name verification
+ 2025-10-07 6e4007e8cf crypto/x509: mitigate DoS vector when intermediate certificate contains DSA public key
+ 2025-10-07 6f7926589d cmd/go: refactor usage of `modRoots`
+ 2025-10-07 11d5484190 runtime: fix self-deadlock on sbrk platforms
+ 2025-10-07 2e52060084 cmd/go: refactor usage of `RootMode`
+ 2025-10-07 f86ddb54b5 cmd/go: refactor usage of `ForceUseModules`
+ 2025-10-07 c938051dd0 Revert "cmd/compile: redo arm64 LR/FP save and restore"
+ 2025-10-07 6469954203 runtime: assert p.destroy runs with GC not running
+ 2025-10-06 4c0fd3a2b4 internal/goexperiment: remove the synctest GOEXPERIMENT
+ 2025-10-06 c1e6e49d5d fmt: reduce Errorf("x") allocations to match errors.New("x")
+ 2025-10-06 7fbf54bfeb internal/buildcfg: enable greenteagc experiment by default
+ 2025-10-06 7bfeb43509 cmd/go: refactor usage of `initialized`
+ 2025-10-06 1d62e92567 test/codegen: make sure assignment results are used.
+ 2025-10-06 4fca79833f runtime: delete redundant code in the page allocator
+ 2025-10-06 719dfcf8a8 cmd/compile: redo arm64 LR/FP save and restore
+ 2025-10-06 f3312124c2 runtime: remove batching from spanSPMC free
+ 2025-10-06 24416458c2 cmd/go: export type State
+ 2025-10-06 c2fb15164b testing/synctest: remove Run
+ 2025-10-06 ac2ec82172 runtime: bump thread count slack for TestReadMetricsSched
+ 2025-10-06 e74b224b7c crypto/tls: streamline BoGo testing w/ -bogo-local-dir
+ 2025-10-06 3a05e7b032 spec: close tag
+ 2025-10-03 2a71af11fc net/url: improve URL docs
+ 2025-10-03 ee5369b003 cmd/link: add LIBRARY statement only with -buildmode=cshared
+ 2025-10-03 1bca4c1673 cmd/compile: improve slicemask removal
+ 2025-10-03 38b26f29f1 cmd/compile: remove stores to unread parameters
+ 2025-10-03 003b5ce1bc cmd/compile: fix SIMD const rematerialization condition
+ 2025-10-03 d91148c7a8 cmd/compile: enhance prove to infer bounds in slice len/cap calculations
+ 2025-10-03 20c9377e47 cmd/compile: enhance the chunked indexing case to include reslicing
+ 2025-10-03 ad3db2562e cmd/compile: handle rematerialized op for incompatible reg constraint
+ 2025-10-03 18cd4a1fc7 cmd/compile: use the right type for spill slot
+ 2025-10-03 1caa95acfa cmd/compile: enhance prove to deal with double-offset IsInBounds checks
+ 2025-10-03 ec70d19023 cmd/compile: rewrite to elide Slicemask from len==c>0 slicing
+ 2025-10-03 10e7968849 cmd/compile: accounts rematerialize ops's output reginfo
+ 2025-10-03 ab043953cb cmd/compile: minor tweak for race detector
+ 2025-10-03 ebb72bef44 cmd/compile: don't treat devel compiler as a released compiler
+ 2025-10-03 c54dc1418b runtime: support valgrind (but not asan) in specialized malloc functions
+ 2025-10-03 a7917eed70 internal/buildcfg: enable specializedmalloc experiment
+ 2025-10-03 630799c6c9 crypto/tls: add flag to render HTML BoGo report
Change-Id: I6bf904c523a77ee7d3dea9c8ae72292f8a5f2ba5
|
|
Only implemented for 64 bit floating point operations for now.
goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
│ sec/op │ sec/op vs base │
Acos 154.1n ± 0% 154.1n ± 0% ~ (p=0.303 n=10)
Acosh 215.8n ± 6% 226.7n ± 0% ~ (p=0.439 n=10)
Asin 149.2n ± 1% 149.2n ± 0% ~ (p=0.700 n=10)
Asinh 262.1n ± 0% 258.5n ± 0% -1.37% (p=0.000 n=10)
Atan 99.48n ± 0% 99.49n ± 0% ~ (p=0.836 n=10)
Atanh 244.9n ± 0% 243.8n ± 0% -0.43% (p=0.002 n=10)
Atan2 158.2n ± 1% 153.3n ± 0% -3.10% (p=0.000 n=10)
Cbrt 186.8n ± 0% 181.1n ± 0% -3.03% (p=0.000 n=10)
Ceil 36.71n ± 1% 36.71n ± 0% ~ (p=0.434 n=10)
Copysign 6.531n ± 1% 6.526n ± 0% ~ (p=0.268 n=10)
Cos 98.19n ± 0% 95.40n ± 0% -2.84% (p=0.000 n=10)
Cosh 233.1n ± 0% 222.6n ± 0% -4.50% (p=0.000 n=10)
Erf 122.5n ± 0% 114.2n ± 0% -6.78% (p=0.000 n=10)
Erfc 126.0n ± 1% 116.6n ± 0% -7.46% (p=0.000 n=10)
Erfinv 138.8n ± 0% 138.6n ± 0% ~ (p=0.082 n=10)
Erfcinv 140.0n ± 0% 139.7n ± 0% ~ (p=0.359 n=10)
Exp 193.3n ± 0% 184.2n ± 0% -4.68% (p=0.000 n=10)
ExpGo 204.8n ± 0% 194.5n ± 0% -5.03% (p=0.000 n=10)
Expm1 152.5n ± 1% 145.0n ± 0% -4.92% (p=0.000 n=10)
Exp2 174.5n ± 0% 164.2n ± 0% -5.85% (p=0.000 n=10)
Exp2Go 184.4n ± 1% 175.4n ± 0% -4.88% (p=0.000 n=10)
Abs 4.912n ± 0% 4.914n ± 0% ~ (p=0.283 n=10)
Dim 15.50n ± 1% 15.52n ± 1% ~ (p=0.331 n=10)
Floor 36.89n ± 1% 36.76n ± 1% ~ (p=0.325 n=10)
Max 31.05n ± 1% 31.17n ± 1% ~ (p=0.628 n=10)
Min 31.01n ± 0% 31.06n ± 0% ~ (p=0.767 n=10)
Mod 294.1n ± 0% 245.6n ± 0% -16.52% (p=0.000 n=10)
Frexp 44.86n ± 1% 35.20n ± 0% -21.53% (p=0.000 n=10)
Gamma 195.8n ± 0% 185.4n ± 1% -5.29% (p=0.000 n=10)
Hypot 84.91n ± 0% 84.54n ± 1% -0.43% (p=0.006 n=10)
HypotGo 96.70n ± 0% 95.42n ± 1% -1.32% (p=0.000 n=10)
Ilogb 45.03n ± 0% 35.07n ± 1% -22.10% (p=0.000 n=10)
J0 634.5n ± 0% 627.2n ± 0% -1.16% (p=0.000 n=10)
J1 644.5n ± 0% 636.9n ± 0% -1.18% (p=0.000 n=10)
Jn 1.357µ ± 0% 1.344µ ± 0% -0.92% (p=0.000 n=10)
Ldexp 49.89n ± 0% 39.96n ± 0% -19.90% (p=0.000 n=10)
Lgamma 186.6n ± 0% 184.3n ± 0% -1.21% (p=0.000 n=10)
Log 150.4n ± 0% 141.1n ± 0% -6.15% (p=0.000 n=10)
Logb 46.70n ± 0% 35.89n ± 0% -23.15% (p=0.000 n=10)
Log1p 164.1n ± 0% 163.9n ± 0% ~ (p=0.122 n=10)
Log10 153.1n ± 0% 143.5n ± 0% -6.24% (p=0.000 n=10)
Log2 58.83n ± 0% 49.75n ± 0% -15.43% (p=0.000 n=10)
Modf 40.82n ± 1% 40.78n ± 0% ~ (p=0.239 n=10)
Nextafter32 49.15n ± 0% 48.93n ± 0% -0.44% (p=0.011 n=10)
Nextafter64 43.33n ± 0% 43.23n ± 0% ~ (p=0.228 n=10)
PowInt 269.4n ± 0% 243.8n ± 0% -9.49% (p=0.000 n=10)
PowFrac 618.0n ± 0% 571.7n ± 0% -7.48% (p=0.000 n=10)
Pow10Pos 13.09n ± 0% 13.05n ± 0% -0.31% (p=0.003 n=10)
Pow10Neg 30.99n ± 1% 30.99n ± 0% ~ (p=0.173 n=10)
Round 23.73n ± 0% 23.65n ± 0% -0.36% (p=0.011 n=10)
RoundToEven 27.87n ± 0% 27.73n ± 0% -0.48% (p=0.003 n=10)
Remainder 282.1n ± 0% 249.6n ± 0% -11.52% (p=0.000 n=10)
Signbit 11.46n ± 0% 11.42n ± 0% -0.39% (p=0.003 n=10)
Sin 115.2n ± 0% 113.2n ± 0% -1.74% (p=0.000 n=10)
Sincos 140.6n ± 0% 138.6n ± 0% -1.39% (p=0.000 n=10)
Sinh 252.0n ± 0% 241.4n ± 0% -4.21% (p=0.000 n=10)
SqrtIndirect 4.909n ± 0% 4.893n ± 0% -0.34% (p=0.021 n=10)
SqrtLatency 19.57n ± 1% 19.57n ± 0% ~ (p=0.087 n=10)
SqrtIndirectLatency 19.64n ± 0% 19.57n ± 0% -0.36% (p=0.025 n=10)
SqrtGoLatency 198.1n ± 0% 197.4n ± 0% -0.35% (p=0.014 n=10)
SqrtPrime 5.733µ ± 0% 5.725µ ± 0% ~ (p=0.116 n=10)
Tan 149.1n ± 0% 146.8n ± 0% -1.54% (p=0.000 n=10)
Tanh 248.2n ± 1% 238.1n ± 0% -4.05% (p=0.000 n=10)
Trunc 36.86n ± 0% 36.70n ± 0% -0.43% (p=0.029 n=10)
Y0 638.2n ± 0% 633.6n ± 0% -0.71% (p=0.000 n=10)
Y1 641.8n ± 0% 636.1n ± 0% -0.87% (p=0.000 n=10)
Yn 1.358µ ± 0% 1.345µ ± 0% -0.92% (p=0.000 n=10)
Float64bits 5.721n ± 0% 5.709n ± 0% -0.22% (p=0.044 n=10)
Float64frombits 4.905n ± 0% 4.893n ± 0% ~ (p=0.266 n=10)
Float32bits 12.27n ± 0% 12.23n ± 0% ~ (p=0.122 n=10)
Float32frombits 4.909n ± 0% 4.893n ± 0% -0.32% (p=0.024 n=10)
FMA 6.556n ± 0% 6.526n ± 0% ~ (p=0.283 n=10)
geomean 86.82n 83.75n -3.54%
Change-Id: I522297a79646d76543d516accce291f5a3cea337
Reviewed-on: https://go-review.googlesource.com/c/go/+/717560
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
|
|
This change mechanically replaces all occurrences of interface{}
by 'any' (where deemed safe by the 'any' modernizer) throughout
std and cmd, minus their vendor trees.
Since this fix is relatively numerous, it gets its own CL.
Also, 'go generate go/types'.
Change-Id: I14a6b52856c3291c1d27935409bca8d5fd4242a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/719702
Commit-Queue: Alan Donovan <adonovan@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Auto-Submit: Alan Donovan <adonovan@google.com>
|
|
Propagate "unread" across OpMoves. If the addr of this auto is only used
by an OpMove as its source arg, and the OpMove's target arg is the addr
of another auto. If the 2nd auto can be eliminated, this one can also be
eliminated.
This CL eliminates unnecessary memory copies and makes the frame smaller
in the following code snippet:
func contains(m map[string][16]int, k string) bool {
_, ok := m[k]
return ok
}
These are the benchmark results followed by the benchmark code:
goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
Map1Access2Ok-8 9.582n ± 2% 9.226n ± 0% -3.72% (p=0.000 n=20)
Map2Access2Ok-8 13.79n ± 1% 10.24n ± 1% -25.77% (p=0.000 n=20)
Map3Access2Ok-8 68.68n ± 1% 12.65n ± 1% -81.58% (p=0.000 n=20)
package main_test
import "testing"
var (
m1 = map[int]int{}
m2 = map[int][16]int{}
m3 = map[int][256]int{}
)
func init() {
for i := range 1000 {
m1[i] = i
m2[i] = [16]int{15:i}
m3[i] = [256]int{255:i}
}
}
func BenchmarkMap1Access2Ok(b *testing.B) {
for i := range b.N {
_, ok := m1[i%1000]
if !ok {
b.Errorf("%d not found", i)
}
}
}
func BenchmarkMap2Access2Ok(b *testing.B) {
for i := range b.N {
_, ok := m2[i%1000]
if !ok {
b.Errorf("%d not found", i)
}
}
}
func BenchmarkMap3Access2Ok(b *testing.B) {
for i := range b.N {
_, ok := m3[i%1000]
if !ok {
b.Errorf("%d not found", i)
}
}
}
Fixes #75398
Change-Id: If75e9caaa50d460efc31a94565b9ba28c8158771
Reviewed-on: https://go-review.googlesource.com/c/go/+/702875
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
This CL introduces new divisible and divmod passes that rewrite
divisibility checks and div, mod, and mul. These happen after
prove, so that prove can make better sense of the code for
deriving bounds, and they must run before decompose, so that
64-bit ops can be lowered to 32-bit ops on 32-bit systems.
And then they need another generic pass as well, to optimize
the generated code before decomposing.
The three opt passes are "opt", "middle opt", and "late opt".
(Perhaps instead they should be "generic", "opt", and "late opt"?)
The "late opt" pass repeats the "middle opt" work on any new code
that has been generated in the interim.
There will not be new divs or mods, but there may be new muls.
The x%c==0 rewrite rules are much simpler now, since they can
match before divs have been rewritten. This has the effect of
applying them more consistently and making the rewrite rules
independent of the exact div rewrites.
Prove is also now charged with marking signed div/mod as
unsigned when the arguments call for it, allowing simpler
code to be emitted in various cases. For example,
t.Seconds()/2 and len(x)/2 are now recognized as unsigned,
meaning they compile to a simple shift (unsigned division),
avoiding the more complex fixup we need for signed values.
https://gist.github.com/rsc/99d9d3bd99cde87b6a1a390e3d85aa32
shows a diff of 'go build -a -gcflags=-d=ssa/prove/debug=1 std'
output before and after. "Proved Rsh64x64 shifts to zero" is replaced
by the higher-level "Proved Div64 is unsigned" (the shift was in the
signed expansion of div by constant), but otherwise prove is only
finding more things to prove.
One short example, in code that does x[i%len(x)]:
< runtime/mfinal.go:131:34: Proved Rsh64x64 shifts to zero
---
> runtime/mfinal.go:131:34: Proved Div64 is unsigned
> runtime/mfinal.go:131:38: Proved IsInBounds
A longer example:
< crypto/internal/fips140/sha3/shake.go:28:30: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:38:27: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:53:46: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:55:46: Proved Rsh64x64 shifts to zero
---
> crypto/internal/fips140/sha3/shake.go:28:30: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:38:27: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:45:7: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:46:4: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsSliceInBounds
These diffs are due to the smaller opt being better
and taking work away from prove:
< image/jpeg/dct.go:307:5: Proved IsInBounds
< image/jpeg/dct.go:308:5: Proved IsInBounds
...
< image/jpeg/dct.go:442:5: Proved IsInBounds
In the old opt, Mul by 8 was rewritten to Lsh by 3 early.
This CL delays that rule to help prove recognize mods,
but it also helps opt constant-fold the slice x[8*i:8*i+8:8*i+8].
Specifically, computing the length, opt can now do:
(Sub64 (Add (Mul 8 i) 8) (Add (Mul 8 i) 8)) ->
(Add 8 (Sub (Mul 8 i) (Mul 8 i))) ->
(Add 8 (Mul 8 (Sub i i))) ->
(Add 8 (Mul 8 0)) ->
(Add 8 0) ->
8
The key step is (Sub (Mul x y) (Mul x z)) -> (Mul x (Sub y z)),
Leaving the multiply as Mul enables using that step; the old
rewrite to Lsh blocked it, leaving prove to figure out the length
and then remove the bounds checks. But now opt can evaluate
the length down to a constant 8 and then constant-fold away
the bounds checks 0 < 8, 1 < 8, and so on. After that,
the compiler has nothing left to prove.
Benchmarks are noisy in general; I checked the assembly for the many
large increases below, and the vast majority are unchanged and
presumably hitting the caches differently in some way.
The divisibility optimizations were not reliably triggering before.
This leads to a very large improvement in some cases, like
DivisiblePow2constI64, DivisibleconstI64 on 64-bit systems
and DivisbleconstU64 on 32-bit systems.
Another way the divisibility optimizations were unreliable before
was incorrectly triggering for x/3, x%3 even though they are
written not to do that. There is a real but small slowdown
in the DivisibleWDivconst benchmarks on Mac because in the cases
used in the benchmark, it is still faster (on Mac) to do the
divisibility check than to remultiply.
This may be worth further study. Perhaps when there is no rotate
(meaning the divisor is odd), the divisibility optimization
should be enabled always. In any event, this CL makes it possible
to study that.
benchmark \ host s7 linux-amd64 mac linux-arm64 linux-ppc64le linux-386 s7:GOARCH=386 linux-arm
vs base vs base vs base vs base vs base vs base vs base vs base
LoadAdd ~ ~ ~ ~ ~ -1.59% ~ ~
ExtShift ~ ~ -42.14% +0.10% ~ +1.44% +5.66% +8.50%
Modify ~ ~ ~ ~ ~ ~ ~ -1.53%
MullImm ~ ~ ~ ~ ~ +37.90% -21.87% +3.05%
ConstModify ~ ~ ~ ~ -49.14% ~ ~ ~
BitSet ~ ~ ~ ~ -15.86% -14.57% +6.44% +0.06%
BitClear ~ ~ ~ ~ ~ +1.78% +3.50% +0.06%
BitToggle ~ ~ ~ ~ ~ -16.09% +2.91% ~
BitSetConst ~ ~ ~ ~ ~ ~ ~ -0.49%
BitClearConst ~ ~ ~ ~ -28.29% ~ ~ -0.40%
BitToggleConst ~ ~ ~ +8.89% -31.19% ~ ~ -0.77%
MulNeg ~ ~ ~ ~ ~ ~ ~ ~
Mul2Neg ~ ~ -4.83% ~ ~ -13.75% -5.92% ~
DivconstI64 ~ ~ ~ ~ ~ -30.12% ~ +0.50%
ModconstI64 ~ ~ -9.94% -4.63% ~ +3.15% ~ +5.32%
DivisiblePow2constI64 -34.49% -12.58% ~ ~ -12.25% ~ ~ ~
DivisibleconstI64 -24.69% -25.06% -0.40% -2.27% -42.61% -3.31% ~ +1.63%
DivisibleWDivconstI64 ~ ~ ~ ~ ~ -17.55% ~ -0.60%
DivconstU64/3 ~ ~ ~ ~ ~ +1.51% ~ ~
DivconstU64/5 ~ ~ ~ ~ ~ ~ ~ ~
DivconstU64/37 ~ ~ -0.18% ~ ~ +2.70% ~ ~
DivconstU64/1234567 ~ ~ ~ ~ ~ ~ ~ +0.12%
ModconstU64 ~ ~ ~ -0.24% ~ -5.10% -1.07% -1.56%
DivisibleconstU64 ~ ~ ~ ~ ~ -29.01% -59.13% -50.72%
DivisibleWDivconstU64 ~ ~ -12.18% -18.88% ~ -5.50% -3.91% +5.17%
DivconstI32 ~ ~ -0.48% ~ -34.69% +89.01% -6.01% -16.67%
ModconstI32 ~ +2.95% -0.33% ~ ~ -2.98% -5.40% -8.30%
DivisiblePow2constI32 ~ ~ ~ ~ ~ ~ ~ -16.22%
DivisibleconstI32 ~ ~ ~ ~ ~ -37.27% -47.75% -25.03%
DivisibleWDivconstI32 -11.59% +5.22% -12.99% -23.83% ~ +45.95% -7.03% -10.01%
DivconstU32 ~ ~ ~ ~ ~ +74.71% +4.81% ~
ModconstU32 ~ ~ +0.53% +0.18% ~ +51.16% ~ ~
DivisibleconstU32 ~ ~ ~ -0.62% ~ -4.25% ~ ~
DivisibleWDivconstU32 -2.77% +5.56% +11.12% -5.15% ~ +48.70% +25.11% -4.07%
DivconstI16 -6.06% ~ -0.33% +0.22% ~ ~ -9.68% +5.47%
ModconstI16 ~ ~ +4.44% +2.82% ~ ~ ~ +5.06%
DivisiblePow2constI16 ~ ~ ~ ~ ~ ~ ~ -0.17%
DivisibleconstI16 ~ ~ -0.23% ~ ~ ~ +4.60% +6.64%
DivisibleWDivconstI16 -1.44% -0.43% +13.48% -5.76% ~ +1.62% -23.15% -9.06%
DivconstU16 +1.61% ~ -0.35% -0.47% ~ ~ +15.59% ~
ModconstU16 ~ ~ ~ ~ ~ -0.72% ~ +14.23%
DivisibleconstU16 ~ ~ -0.05% +3.00% ~ ~ ~ +5.06%
DivisibleWDivconstU16 +52.10% +0.75% +17.28% +4.79% ~ -37.39% +5.28% -9.06%
DivconstI8 ~ ~ -0.34% -0.96% ~ ~ -9.20% ~
ModconstI8 +2.29% ~ +4.38% +2.96% ~ ~ ~ ~
DivisiblePow2constI8 ~ ~ ~ ~ ~ ~ ~ ~
DivisibleconstI8 ~ ~ ~ ~ ~ ~ +6.04% ~
DivisibleWDivconstI8 -26.44% +1.69% +17.03% +4.05% ~ +32.48% -24.90% ~
DivconstU8 -4.50% +14.06% -0.28% ~ ~ ~ +4.16% +0.88%
ModconstU8 ~ ~ +25.84% -0.64% ~ ~ ~ ~
DivisibleconstU8 ~ ~ -5.70% ~ ~ ~ ~ ~
DivisibleWDivconstU8 +49.55% +9.07% ~ +4.03% +53.87% -40.03% +39.72% -3.01%
Mul2 ~ ~ ~ ~ ~ ~ ~ ~
MulNeg2 ~ ~ ~ ~ -11.73% ~ ~ -0.02%
EfaceInteger ~ ~ ~ ~ ~ +18.11% ~ +2.53%
TypeAssert +33.90% +2.86% ~ ~ ~ -1.07% -5.29% -1.04%
Div64UnsignedSmall ~ ~ ~ ~ ~ ~ ~ ~
Div64Small ~ ~ ~ ~ ~ -0.88% ~ +2.39%
Div64SmallNegDivisor ~ ~ ~ ~ ~ ~ ~ +0.35%
Div64SmallNegDividend ~ ~ ~ ~ ~ -0.84% ~ +3.57%
Div64SmallNegBoth ~ ~ ~ ~ ~ -0.86% ~ +3.55%
Div64Unsigned ~ ~ ~ ~ ~ ~ ~ -0.11%
Div64 ~ ~ ~ ~ ~ ~ ~ +0.11%
Div64NegDivisor ~ ~ ~ ~ ~ -1.29% ~ ~
Div64NegDividend ~ ~ ~ ~ ~ -1.44% ~ ~
Div64NegBoth ~ ~ ~ ~ ~ ~ ~ +0.28%
Mod64UnsignedSmall ~ ~ ~ ~ ~ +0.48% ~ +0.93%
Mod64Small ~ ~ ~ ~ ~ ~ ~ ~
Mod64SmallNegDivisor ~ ~ ~ ~ ~ ~ ~ +1.44%
Mod64SmallNegDividend ~ ~ ~ ~ ~ +0.22% ~ +1.37%
Mod64SmallNegBoth ~ ~ ~ ~ ~ ~ ~ -2.22%
Mod64Unsigned ~ ~ ~ ~ ~ -0.95% ~ +0.11%
Mod64 ~ ~ ~ ~ ~ ~ ~ ~
Mod64NegDivisor ~ ~ ~ ~ ~ ~ ~ -0.02%
Mod64NegDividend ~ ~ ~ ~ ~ ~ ~ ~
Mod64NegBoth ~ ~ ~ ~ ~ ~ ~ -0.02%
MulconstI32/3 ~ ~ ~ -25.00% ~ ~ ~ +47.37%
MulconstI32/5 ~ ~ ~ +33.28% ~ ~ ~ +32.21%
MulconstI32/12 ~ ~ ~ -2.13% ~ ~ ~ -0.02%
MulconstI32/120 ~ ~ ~ +2.93% ~ ~ ~ -0.03%
MulconstI32/-120 ~ ~ ~ -2.17% ~ ~ ~ -0.03%
MulconstI32/65537 ~ ~ ~ ~ ~ ~ ~ +0.03%
MulconstI32/65538 ~ ~ ~ ~ ~ -33.38% ~ +0.04%
MulconstI64/3 ~ ~ ~ +33.35% ~ -0.37% ~ -0.13%
MulconstI64/5 ~ ~ ~ -25.00% ~ -0.34% ~ ~
MulconstI64/12 ~ ~ ~ +2.13% ~ +11.62% ~ +2.30%
MulconstI64/120 ~ ~ ~ -1.98% ~ ~ ~ ~
MulconstI64/-120 ~ ~ ~ +0.75% ~ ~ ~ ~
MulconstI64/65537 ~ ~ ~ ~ ~ +5.61% ~ ~
MulconstI64/65538 ~ ~ ~ ~ ~ +5.25% ~ ~
MulconstU32/3 ~ +0.81% ~ +33.39% ~ +77.92% ~ -32.31%
MulconstU32/5 ~ ~ ~ -24.97% ~ +77.92% ~ -24.47%
MulconstU32/12 ~ ~ ~ +2.06% ~ ~ ~ +0.03%
MulconstU32/120 ~ ~ ~ -2.74% ~ ~ ~ +0.03%
MulconstU32/65537 ~ ~ ~ ~ ~ ~ ~ +0.03%
MulconstU32/65538 ~ ~ ~ ~ ~ -33.42% ~ -0.03%
MulconstU64/3 ~ ~ ~ +33.33% ~ -0.28% ~ +1.22%
MulconstU64/5 ~ ~ ~ -25.00% ~ ~ ~ -0.64%
MulconstU64/12 ~ ~ ~ +2.30% ~ +11.59% ~ +0.14%
MulconstU64/120 ~ ~ ~ -2.82% ~ ~ ~ +0.04%
MulconstU64/65537 ~ +0.37% ~ ~ ~ +5.58% ~ ~
MulconstU64/65538 ~ ~ ~ ~ ~ +5.16% ~ ~
ShiftArithmeticRight ~ ~ ~ ~ ~ -10.81% ~ +0.31%
Switch8Predictable +14.69% ~ ~ ~ ~ -24.85% ~ ~
Switch8Unpredictable ~ -0.58% -3.80% ~ ~ -11.78% ~ -0.79%
Switch32Predictable -10.33% +17.89% ~ ~ ~ +5.76% ~ ~
Switch32Unpredictable -3.15% +1.19% +9.42% ~ ~ -10.30% -5.09% +0.44%
SwitchStringPredictable +70.88% +20.48% ~ ~ ~ +2.39% ~ +0.31%
SwitchStringUnpredictable ~ +3.91% -5.06% -0.98% ~ +0.61% +2.03% ~
SwitchTypePredictable +146.58% -1.10% ~ -12.45% ~ -0.46% -3.81% ~
SwitchTypeUnpredictable +0.46% -0.83% ~ +4.18% ~ +0.43% ~ +0.62%
SwitchInterfaceTypePredictable -13.41% -10.13% +11.03% ~ ~ -4.38% ~ +0.75%
SwitchInterfaceTypeUnpredictable -6.37% -2.14% ~ -3.21% ~ -4.20% ~ +1.08%
Fixes #63110.
Fixes #75954.
Change-Id: I55a876f08c6c14f419ce1a8cbba2eaae6c6efbf0
Reviewed-on: https://go-review.googlesource.com/c/go/+/714160
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Found by github.com/mdempsky/unconvert
Change-Id: I88ce10390a49ba768a4deaa0df9057c93c1164de
GitHub-Last-Rev: 3b0f7e8f74f58340637f33287c238765856b2483
GitHub-Pull-Request: golang/go#75974
Reviewed-on: https://go-review.googlesource.com/c/go/+/712940
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
NaN checks can often be merged into other comparisons by inverting them.
For example, `math.IsNaN(x) || x > 0` is equivalent to `!(x <= 0)`.
goos: linux
goarch: amd64
pkg: math
cpu: 12th Gen Intel(R) Core(TM) i7-12700T
│ sec/op │ sec/op vs base │
Acos 4.315n ± 0% 4.314n ± 0% ~ (p=0.642 n=10)
Acosh 8.398n ± 0% 7.779n ± 0% -7.37% (p=0.000 n=10)
Asin 4.203n ± 0% 4.211n ± 0% +0.20% (p=0.001 n=10)
Asinh 10.150n ± 0% 9.562n ± 0% -5.79% (p=0.000 n=10)
Atan 2.363n ± 0% 2.363n ± 0% ~ (p=0.801 n=10)
Atanh 8.192n ± 2% 7.685n ± 0% -6.20% (p=0.000 n=10)
Atan2 4.013n ± 0% 4.010n ± 0% ~ (p=0.073 n=10)
Cbrt 4.858n ± 0% 4.755n ± 0% -2.12% (p=0.000 n=10)
Cos 4.596n ± 0% 4.357n ± 0% -5.20% (p=0.000 n=10)
Cosh 5.071n ± 0% 5.071n ± 0% ~ (p=0.585 n=10)
Erf 2.802n ± 1% 2.788n ± 0% -0.54% (p=0.002 n=10)
Erfc 3.087n ± 1% 3.071n ± 0% ~ (p=0.320 n=10)
Erfinv 3.981n ± 0% 3.965n ± 0% -0.41% (p=0.000 n=10)
Erfcinv 3.985n ± 0% 3.977n ± 0% -0.20% (p=0.000 n=10)
ExpGo 8.721n ± 2% 8.252n ± 0% -5.38% (p=0.000 n=10)
Expm1 4.378n ± 0% 4.228n ± 0% -3.43% (p=0.000 n=10)
Exp2 8.313n ± 0% 7.855n ± 0% -5.52% (p=0.000 n=10)
Exp2Go 8.498n ± 2% 7.921n ± 0% -6.79% (p=0.000 n=10)
Mod 15.16n ± 4% 12.20n ± 1% -19.58% (p=0.000 n=10)
Frexp 1.780n ± 2% 1.496n ± 0% -15.96% (p=0.000 n=10)
Gamma 4.378n ± 1% 4.013n ± 0% -8.35% (p=0.000 n=10)
HypotGo 2.655n ± 5% 2.427n ± 1% -8.57% (p=0.000 n=10)
Ilogb 1.912n ± 5% 1.749n ± 0% -8.53% (p=0.000 n=10)
J0 22.43n ± 9% 20.46n ± 0% -8.76% (p=0.000 n=10)
J1 21.03n ± 4% 19.96n ± 0% -5.09% (p=0.000 n=10)
Jn 45.40n ± 1% 42.59n ± 0% -6.20% (p=0.000 n=10)
Ldexp 2.312n ± 1% 1.944n ± 0% -15.94% (p=0.000 n=10)
Lgamma 4.617n ± 1% 4.584n ± 0% -0.73% (p=0.000 n=10)
Log 4.226n ± 0% 4.213n ± 0% -0.31% (p=0.001 n=10)
Logb 1.771n ± 0% 1.775n ± 0% ~ (p=0.097 n=10)
Log1p 5.102n ± 2% 5.001n ± 0% -1.97% (p=0.000 n=10)
Log10 4.407n ± 0% 4.408n ± 0% ~ (p=1.000 n=10)
Log2 2.416n ± 1% 2.138n ± 0% -11.51% (p=0.000 n=10)
Modf 1.669n ± 2% 1.611n ± 0% -3.50% (p=0.000 n=10)
Nextafter32 2.186n ± 0% 2.185n ± 0% ~ (p=0.051 n=10)
Nextafter64 2.182n ± 0% 2.184n ± 0% +0.09% (p=0.016 n=10)
PowInt 11.39n ± 6% 10.68n ± 2% -6.24% (p=0.000 n=10)
PowFrac 26.60n ± 2% 26.12n ± 0% -1.80% (p=0.000 n=10)
Pow10Pos 0.5067n ± 4% 0.5003n ± 1% -1.27% (p=0.001 n=10)
Pow10Neg 0.8552n ± 0% 0.8552n ± 0% ~ (p=0.928 n=10)
Round 1.181n ± 0% 1.182n ± 0% +0.08% (p=0.001 n=10)
RoundToEven 1.709n ± 0% 1.710n ± 0% ~ (p=0.053 n=10)
Remainder 12.54n ± 5% 11.99n ± 2% -4.46% (p=0.000 n=10)
Sin 3.933n ± 5% 3.926n ± 0% -0.17% (p=0.000 n=10)
Sincos 5.672n ± 0% 5.522n ± 0% -2.65% (p=0.000 n=10)
Sinh 5.447n ± 1% 5.444n ± 0% -0.06% (p=0.029 n=10)
Tan 4.061n ± 0% 4.058n ± 0% -0.07% (p=0.005 n=10)
Tanh 5.599n ± 0% 5.595n ± 0% -0.06% (p=0.042 n=10)
Y0 20.75n ± 5% 19.73n ± 1% -4.92% (p=0.000 n=10)
Y1 20.87n ± 2% 19.78n ± 1% -5.20% (p=0.000 n=10)
Yn 44.50n ± 2% 42.04n ± 2% -5.53% (p=0.000 n=10)
geomean 4.989n 4.791n -3.96%
goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
│ sec/op │ sec/op vs base │
Acos 159.9n ± 0% 159.9n ± 0% ~ (p=0.269 n=10)
Acosh 244.7n ± 0% 235.0n ± 0% -3.98% (p=0.000 n=10)
Asin 159.9n ± 0% 159.9n ± 0% ~ (p=0.154 n=10)
Asinh 270.8n ± 0% 261.1n ± 0% -3.60% (p=0.000 n=10)
Atan 119.1n ± 0% 119.1n ± 0% ~ (p=0.347 n=10)
Atanh 260.2n ± 0% 261.8n ± 4% ~ (p=0.459 n=10)
Atan2 186.8n ± 0% 186.8n ± 0% ~ (p=0.487 n=10)
Cbrt 203.5n ± 0% 198.2n ± 0% -2.60% (p=0.000 n=10)
Ceil 31.82n ± 0% 31.81n ± 0% ~ (p=0.714 n=10)
Copysign 4.894n ± 0% 4.893n ± 0% ~ (p=0.161 n=10)
Cos 107.6n ± 0% 103.6n ± 0% -3.76% (p=0.000 n=10)
Cosh 259.0n ± 0% 252.8n ± 0% -2.39% (p=0.000 n=10)
Erf 133.7n ± 0% 133.7n ± 0% ~ (p=0.720 n=10)
Erfc 137.9n ± 0% 137.8n ± 0% -0.04% (p=0.033 n=10)
Erfinv 173.7n ± 0% 168.8n ± 0% -2.82% (p=0.000 n=10)
Erfcinv 173.7n ± 0% 168.8n ± 0% -2.82% (p=0.000 n=10)
Exp 215.3n ± 0% 208.1n ± 0% -3.34% (p=0.000 n=10)
ExpGo 226.7n ± 0% 220.6n ± 0% -2.69% (p=0.000 n=10)
Expm1 164.8n ± 0% 159.0n ± 0% -3.52% (p=0.000 n=10)
Exp2 185.0n ± 0% 182.7n ± 0% -1.22% (p=0.000 n=10)
Exp2Go 198.9n ± 0% 196.5n ± 0% -1.21% (p=0.000 n=10)
Abs 4.894n ± 0% 4.893n ± 0% ~ (p=0.262 n=10)
Dim 16.31n ± 0% 16.31n ± 0% ~ (p=1.000 n=10)
Floor 31.81n ± 0% 31.81n ± 0% ~ (p=0.067 n=10)
Max 26.11n ± 0% 26.10n ± 0% ~ (p=0.080 n=10)
Min 26.10n ± 0% 26.10n ± 0% ~ (p=0.095 n=10)
Mod 337.7n ± 0% 291.9n ± 0% -13.56% (p=0.000 n=10)
Frexp 50.57n ± 0% 42.41n ± 0% -16.13% (p=0.000 n=10)
Gamma 206.3n ± 0% 198.1n ± 0% -4.00% (p=0.000 n=10)
Hypot 94.62n ± 0% 94.61n ± 0% ~ (p=0.437 n=10)
HypotGo 109.3n ± 0% 109.3n ± 0% ~ (p=1.000 n=10)
Ilogb 44.05n ± 0% 44.04n ± 0% -0.02% (p=0.025 n=10)
J0 663.1n ± 0% 663.9n ± 0% +0.13% (p=0.002 n=10)
J1 663.9n ± 0% 666.4n ± 0% +0.38% (p=0.000 n=10)
Jn 1.404µ ± 0% 1.407µ ± 0% +0.21% (p=0.000 n=10)
Ldexp 57.10n ± 0% 48.93n ± 0% -14.30% (p=0.000 n=10)
Lgamma 185.1n ± 0% 187.6n ± 0% +1.32% (p=0.000 n=10)
Log 182.7n ± 0% 170.1n ± 0% -6.87% (p=0.000 n=10)
Logb 46.49n ± 0% 46.49n ± 0% ~ (p=0.675 n=10)
Log1p 184.3n ± 0% 179.4n ± 0% -2.63% (p=0.000 n=10)
Log10 184.3n ± 0% 171.2n ± 0% -7.08% (p=0.000 n=10)
Log2 66.05n ± 0% 57.90n ± 0% -12.34% (p=0.000 n=10)
Modf 34.25n ± 0% 34.24n ± 0% ~ (p=0.163 n=10)
Nextafter32 49.33n ± 1% 48.93n ± 0% -0.81% (p=0.002 n=10)
Nextafter64 43.64n ± 0% 43.23n ± 0% -0.93% (p=0.000 n=10)
PowInt 267.6n ± 0% 251.2n ± 0% -6.11% (p=0.000 n=10)
PowFrac 672.9n ± 0% 637.9n ± 0% -5.19% (p=0.000 n=10)
Pow10Pos 13.87n ± 0% 13.87n ± 0% ~ (p=1.000 n=10)
Pow10Neg 19.58n ± 62% 19.59n ± 62% ~ (p=0.355 n=10)
Round 23.65n ± 0% 23.65n ± 0% ~ (p=1.000 n=10)
RoundToEven 27.73n ± 0% 27.73n ± 0% ~ (p=0.635 n=10)
Remainder 309.9n ± 0% 280.5n ± 0% -9.49% (p=0.000 n=10)
Signbit 13.05n ± 0% 13.05n ± 0% ~ (p=1.000 n=10) ¹
Sin 120.7n ± 0% 120.7n ± 0% ~ (p=1.000 n=10) ¹
Sincos 148.4n ± 0% 143.5n ± 0% -3.30% (p=0.000 n=10)
Sinh 275.6n ± 0% 267.5n ± 0% -2.94% (p=0.000 n=10)
SqrtIndirect 3.262n ± 0% 3.262n ± 0% ~ (p=0.263 n=10)
SqrtLatency 19.57n ± 0% 19.57n ± 0% ~ (p=0.582 n=10)
SqrtIndirectLatency 19.57n ± 0% 19.57n ± 0% ~ (p=1.000 n=10)
SqrtGoLatency 203.2n ± 0% 197.6n ± 0% -2.78% (p=0.000 n=10)
SqrtPrime 4.952µ ± 0% 4.952µ ± 0% -0.01% (p=0.025 n=10)
Tan 153.3n ± 0% 153.3n ± 0% ~ (p=1.000 n=10)
Tanh 280.5n ± 0% 272.4n ± 0% -2.91% (p=0.000 n=10)
Trunc 31.81n ± 0% 31.81n ± 0% ~ (p=1.000 n=10)
Y0 680.1n ± 0% 664.8n ± 0% -2.25% (p=0.000 n=10)
Y1 684.2n ± 0% 669.6n ± 0% -2.14% (p=0.000 n=10)
Yn 1.444µ ± 0% 1.410µ ± 0% -2.35% (p=0.000 n=10)
Float64bits 5.709n ± 0% 5.708n ± 0% ~ (p=0.573 n=10)
Float64frombits 4.893n ± 0% 4.893n ± 0% ~ (p=0.734 n=10)
Float32bits 12.23n ± 0% 12.23n ± 0% ~ (p=0.628 n=10)
Float32frombits 4.893n ± 0% 4.893n ± 0% ~ (p=0.971 n=10)
FMA 4.893n ± 0% 4.893n ± 0% ~ (p=0.736 n=10)
geomean 88.96n 87.05n -2.15%
¹ all samples are equal
Change-Id: I8db8ac7b7b3430b946b89e88dd6c1546804125c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/697360
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Munday <mikemndy@gmail.com>
|
|
This CL tires to improve a situation pointed out by
https://github.com/golang/go/issues/73787#issuecomment-3305494947.
Change-Id: Ic23c80fe71344fc25383ab238ad6631e0f0cd22e
Reviewed-on: https://go-review.googlesource.com/c/go/+/705416
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
When slicing, ignore expressions which could be elided, as in slicing
starting at 0 or ending at len(v).
Fixes #75278
Change-Id: I9c18e29c3d4da9bef89bd25bb261d3cb60e66392
Reviewed-on: https://go-review.googlesource.com/c/go/+/701216
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Mark Freeman <markfreeman@google.com>
|
|
This change adds benchmarks for Match and speeds it up by simplifying
scanChunk (to the point of making it inlineable) and eliminating some
branches in matchChunk.
Here are some benchmark results (no change to allocations):
goos: darwin
goarch: amd64
pkg: path
cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
│ old │ new2 │
│ sec/op │ sec/op vs base │
Match/"abc"_"abc"-8 24.64n ± 6% 21.30n ± 1% -13.53% (p=0.000 n=10)
Match/"*"_"abc"-8 9.117n ± 3% 8.345n ± 2% -8.47% (p=0.000 n=10)
Match/"*c"_"abc"-8 29.59n ± 3% 29.86n ± 3% ~ (p=0.055 n=10)
Match/"a*"_"a"-8 22.88n ± 2% 20.66n ± 1% -9.68% (p=0.000 n=10)
Match/"a*"_"abc"-8 23.57n ± 1% 21.59n ± 0% -8.40% (p=0.000 n=10)
Match/"a*"_"ab/c"-8 23.18n ± 1% 21.48n ± 1% -7.35% (p=0.000 n=10)
Match/"a*/b"_"abc/b"-8 54.96n ± 1% 52.06n ± 0% -5.29% (p=0.000 n=10)
Match/"a*/b"_"a/c/b"-8 34.50n ± 3% 31.42n ± 2% -8.91% (p=0.000 n=10)
Match/"a*b*c*d*e*/f"_"axbxcxdxe/f"-8 125.4n ± 1% 114.9n ± 1% -8.45% (p=0.000 n=10)
Match/"a*b*c*d*e*/f"_"axbxcxdxexxx/f"-8 151.1n ± 1% 149.4n ± 1% -1.09% (p=0.001 n=10)
Match/"a*b*c*d*e*/f"_"axbxcxdxe/xxx/f"-8 127.8n ± 4% 115.6n ± 3% -9.51% (p=0.000 n=10)
Match/"a*b*c*d*e*/f"_"axbxcxdxexxx/fff"-8 155.6n ± 1% 168.0n ± 11% ~ (p=0.143 n=10)
Match/"a*b?c*x"_"abxbbxdbxebxczzx"-8 189.2n ± 1% 193.5n ± 6% ~ (p=1.000 n=10)
Match/"a*b?c*x"_"abxbbxdbxebxczzy"-8 196.3n ± 1% 190.6n ± 2% -2.93% (p=0.001 n=10)
Match/"ab[c]"_"abc"-8 39.50n ± 1% 37.84n ± 1% -4.20% (p=0.000 n=10)
Match/"ab[b-d]"_"abc"-8 47.94n ± 1% 45.99n ± 1% -4.06% (p=0.000 n=10)
Match/"ab[e-g]"_"abc"-8 49.48n ± 1% 47.57n ± 1% -3.85% (p=0.000 n=10)
Match/"ab[^c]"_"abc"-8 41.55n ± 1% 38.01n ± 2% -8.51% (p=0.000 n=10)
Match/"ab[^b-d]"_"abc"-8 50.48n ± 1% 46.35n ± 0% -8.17% (p=0.000 n=10)
Match/"ab[^e-g]"_"abc"-8 50.12n ± 3% 47.37n ± 2% -5.47% (p=0.000 n=10)
Match/"a\\*b"_"a*b"-8 24.59n ± 0% 21.77n ± 1% -11.47% (p=0.000 n=10)
Match/"a\\*b"_"ab"-8 26.14n ± 1% 24.52n ± 3% -6.18% (p=0.000 n=10)
Match/"a?b"_"a☺b"-8 25.66n ± 2% 23.40n ± 1% -8.81% (p=0.000 n=10)
Match/"a[^a]b"_"a☺b"-8 41.14n ± 0% 38.97n ± 3% -5.29% (p=0.001 n=10)
Match/"a???b"_"a☺b"-8 39.66n ± 6% 36.63n ± 6% -7.63% (p=0.003 n=10)
Match/"a[^a][^a][^a]b"_"a☺b"-8 85.07n ± 8% 84.97n ± 1% ~ (p=0.971 n=10)
Match/"[a-ζ]*"_"α"-8 49.45n ± 4% 48.34n ± 2% -2.23% (p=0.001 n=10)
Match/"*[a-ζ]"_"A"-8 67.67n ± 3% 70.53n ± 2% +4.23% (p=0.000 n=10)
Match/"a?b"_"a/b"-8 26.61n ± 5% 25.92n ± 1% -2.59% (p=0.000 n=10)
Match/"a*b"_"a/b"-8 29.38n ± 3% 26.61n ± 2% -9.46% (p=0.000 n=10)
Match/"[\\]a]"_"]"-8 40.75n ± 3% 39.31n ± 2% -3.52% (p=0.000 n=10)
Match/"[\\-]"_"-"-8 30.58n ± 1% 30.53n ± 2% ~ (p=0.565 n=10)
Match/"[x\\-]"_"x"-8 41.02n ± 2% 39.32n ± 1% -4.13% (p=0.000 n=10)
Match/"[x\\-]"_"-"-8 40.74n ± 2% 39.82n ± 1% -2.25% (p=0.000 n=10)
Match/"[x\\-]"_"z"-8 42.19n ± 1% 40.47n ± 1% -4.08% (p=0.001 n=10)
Match/"[\\-x]"_"x"-8 40.77n ± 1% 39.61n ± 3% -2.86% (p=0.000 n=10)
Match/"[\\-x]"_"-"-8 40.94n ± 2% 39.39n ± 0% -3.79% (p=0.000 n=10)
Match/"[\\-x]"_"a"-8 42.12n ± 2% 42.11n ± 1% ~ (p=0.954 n=10)
Match/"[]a]"_"]"-8 23.73n ± 2% 21.67n ± 2% -8.72% (p=0.000 n=10)
Match/"[-]"_"-"-8 21.28n ± 1% 19.96n ± 1% -6.16% (p=0.000 n=10)
Match/"[x-]"_"x"-8 28.96n ± 1% 27.98n ± 3% -3.37% (p=0.000 n=10)
Match/"[x-]"_"-"-8 29.46n ± 2% 29.27n ± 7% ~ (p=0.796 n=10)
Match/"[x-]"_"z"-8 29.14n ± 0% 30.78n ± 9% ~ (p=0.468 n=10)
Match/"[-x]"_"x"-8 23.62n ± 3% 22.08n ± 1% -6.54% (p=0.000 n=10)
Match/"[-x]"_"-"-8 23.70n ± 2% 22.11n ± 2% -6.69% (p=0.000 n=10)
Match/"[-x]"_"a"-8 23.71n ± 0% 22.09n ± 3% -6.83% (p=0.000 n=10)
Match/"\\"_"a"-8 11.845n ± 2% 9.496n ± 3% -19.83% (p=0.000 n=10)
Match/"[a-b-c]"_"a"-8 39.85n ± 1% 40.69n ± 1% +2.10% (p=0.000 n=10)
Match/"["_"a"-8 18.57n ± 3% 16.47n ± 3% -11.33% (p=0.000 n=10)
Match/"[^"_"a"-8 20.30n ± 3% 17.96n ± 1% -11.53% (p=0.000 n=10)
Match/"[^bc"_"a"-8 36.81n ± 7% 32.87n ± 2% -10.72% (p=0.000 n=10)
Match/"a["_"a"-8 20.11n ± 1% 19.41n ± 1% -3.46% (p=0.000 n=10)
Match/"a["_"ab"-8 23.19n ± 2% 21.80n ± 1% -6.02% (p=0.000 n=10)
Match/"a["_"x"-8 20.79n ± 1% 19.51n ± 1% -6.13% (p=0.000 n=10)
Match/"a/b["_"x"-8 29.80n ± 0% 28.36n ± 0% -4.82% (p=0.000 n=10)
Match/"*x"_"xxx"-8 30.77n ± 2% 28.55n ± 2% -7.23% (p=0.000 n=10)
geomean 37.11n 35.15n -5.29%
pkg: path/filepath
│ old │ new2 │
│ sec/op │ sec/op vs base │
Match/"abc"_"abc"-8 22.79n ± 3% 21.62n ± 1% -5.11% (p=0.000 n=10)
Match/"*"_"abc"-8 11.410n ± 3% 9.595n ± 1% -15.91% (p=0.000 n=10)
Match/"*c"_"abc"-8 29.69n ± 2% 29.09n ± 0% -2.00% (p=0.001 n=10)
Match/"a*"_"a"-8 23.69n ± 1% 21.12n ± 1% -10.83% (p=0.000 n=10)
Match/"a*"_"abc"-8 25.38n ± 4% 22.41n ± 5% -11.70% (p=0.000 n=10)
Match/"a*"_"ab/c"-8 24.73n ± 1% 21.85n ± 3% -11.65% (p=0.000 n=10)
Match/"a*/b"_"abc/b"-8 53.73n ± 2% 53.30n ± 2% -0.82% (p=0.037 n=10)
Match/"a*/b"_"a/c/b"-8 32.97n ± 2% 31.64n ± 0% -4.02% (p=0.000 n=10)
Match/"a*b*c*d*e*/f"_"axbxcxdxe/f"-8 120.7n ± 0% 124.0n ± 1% +2.82% (p=0.000 n=10)
Match/"a*b*c*d*e*/f"_"axbxcxdxexxx/f"-8 149.4n ± 2% 160.0n ± 6% +7.13% (p=0.009 n=10)
Match/"a*b*c*d*e*/f"_"axbxcxdxe/xxx/f"-8 121.8n ± 2% 122.8n ± 9% ~ (p=0.898 n=10)
Match/"a*b*c*d*e*/f"_"axbxcxdxexxx/fff"-8 149.7n ± 1% 151.2n ± 2% +1.04% (p=0.005 n=10)
Match/"a*b?c*x"_"abxbbxdbxebxczzx"-8 187.8n ± 1% 184.8n ± 1% -1.57% (p=0.001 n=10)
Match/"a*b?c*x"_"abxbbxdbxebxczzy"-8 195.6n ± 1% 191.2n ± 2% -2.22% (p=0.018 n=10)
Match/"ab[c]"_"abc"-8 40.30n ± 4% 37.31n ± 3% -7.41% (p=0.000 n=10)
Match/"ab[b-d]"_"abc"-8 47.92n ± 4% 45.92n ± 4% -4.17% (p=0.002 n=10)
Match/"ab[e-g]"_"abc"-8 47.58n ± 4% 45.22n ± 0% -4.98% (p=0.000 n=10)
Match/"ab[^c]"_"abc"-8 40.33n ± 2% 36.95n ± 2% -8.38% (p=0.000 n=10)
Match/"ab[^b-d]"_"abc"-8 48.68n ± 2% 46.49n ± 3% -4.50% (p=0.001 n=10)
Match/"ab[^e-g]"_"abc"-8 49.11n ± 2% 48.42n ± 2% -1.42% (p=0.014 n=10)
Match/"a\\*b"_"a*b"-8 26.79n ± 9% 23.70n ± 1% -11.53% (p=0.000 n=10)
Match/"a\\*b"_"ab"-8 23.84n ± 11% 25.99n ± 1% +9.02% (p=0.015 n=10)
Match/"a?b"_"a☺b"-8 25.72n ± 2% 23.70n ± 1% -7.87% (p=0.000 n=10)
Match/"a[^a]b"_"a☺b"-8 41.72n ± 2% 39.24n ± 2% -5.94% (p=0.000 n=10)
Match/"a???b"_"a☺b"-8 35.94n ± 1% 35.30n ± 3% -1.78% (p=0.009 n=10)
Match/"a[^a][^a][^a]b"_"a☺b"-8 82.17n ± 0% 84.56n ± 3% +2.91% (p=0.000 n=10)
Match/"[a-ζ]*"_"α"-8 52.01n ± 3% 49.78n ± 0% -4.30% (p=0.000 n=10)
Match/"*[a-ζ]"_"A"-8 65.62n ± 1% 66.91n ± 3% +1.97% (p=0.000 n=10)
Match/"a?b"_"a/b"-8 24.60n ± 2% 25.60n ± 1% +4.07% (p=0.001 n=10)
Match/"a*b"_"a/b"-8 28.00n ± 1% 26.50n ± 1% -5.37% (p=0.000 n=10)
Match/"[\\]a]"_"]"-8 40.72n ± 1% 39.55n ± 2% -2.86% (p=0.002 n=10)
Match/"[\\-]"_"-"-8 30.52n ± 2% 30.15n ± 1% -1.18% (p=0.003 n=10)
Match/"[x\\-]"_"x"-8 40.69n ± 1% 40.41n ± 2% ~ (p=0.105 n=10)
Match/"[x\\-]"_"-"-8 40.50n ± 2% 40.69n ± 2% ~ (p=0.109 n=10)
Match/"[x\\-]"_"z"-8 40.80n ± 3% 39.92n ± 2% -2.17% (p=0.002 n=10)
Match/"[\\-x]"_"x"-8 40.73n ± 2% 43.78n ± 5% +7.49% (p=0.003 n=10)
Match/"[\\-x]"_"-"-8 40.86n ± 2% 39.58n ± 9% ~ (p=0.105 n=10)
Match/"[\\-x]"_"a"-8 40.67n ± 0% 40.85n ± 3% ~ (p=0.288 n=10)
Match/"[]a]"_"]"-8 22.71n ± 1% 20.35n ± 2% -10.37% (p=0.000 n=10)
Match/"[-]"_"-"-8 20.98n ± 2% 19.71n ± 2% -6.10% (p=0.000 n=10)
Match/"[x-]"_"x"-8 30.78n ± 1% 26.29n ± 1% -14.56% (p=0.000 n=10)
Match/"[x-]"_"-"-8 30.79n ± 2% 26.38n ± 3% -14.29% (p=0.000 n=10)
Match/"[x-]"_"z"-8 30.77n ± 0% 26.47n ± 1% -13.97% (p=0.000 n=10)
Match/"[-x]"_"x"-8 23.73n ± 1% 21.08n ± 1% -11.19% (p=0.000 n=10)
Match/"[-x]"_"-"-8 23.68n ± 1% 21.11n ± 3% -10.81% (p=0.000 n=10)
Match/"[-x]"_"a"-8 23.74n ± 3% 21.04n ± 0% -11.37% (p=0.000 n=10)
Match/"\\"_"a"-8 11.57n ± 0% 10.35n ± 0% -10.63% (p=0.000 n=10)
Match/"[a-b-c]"_"a"-8 42.46n ± 1% 38.97n ± 3% -8.23% (p=0.000 n=10)
Match/"["_"a"-8 18.03n ± 7% 15.95n ± 1% -11.51% (p=0.000 n=10)
Match/"[^"_"a"-8 19.20n ± 11% 17.96n ± 3% -6.41% (p=0.000 n=10)
Match/"[^bc"_"a"-8 32.82n ± 3% 32.34n ± 0% -1.45% (p=0.000 n=10)
Match/"a["_"a"-8 19.48n ± 2% 19.48n ± 2% ~ (p=0.670 n=10)
Match/"a["_"ab"-8 22.19n ± 3% 22.09n ± 1% ~ (p=0.148 n=10)
Match/"a["_"x"-8 20.32n ± 3% 19.66n ± 1% -3.27% (p=0.001 n=10)
Match/"a/b["_"x"-8 28.68n ± 1% 28.52n ± 1% ~ (p=0.315 n=10)
Match/"*x"_"xxx"-8 29.95n ± 3% 29.27n ± 3% -2.27% (p=0.006 n=10)
geomean 36.76n 35.10n -4.51%
Change-Id: I02d07b7a066e5789587035180fa15ae07a9579a6
GitHub-Last-Rev: 8b314b430920f9636088d0291adf8d518fc56b0a
GitHub-Pull-Request: golang/go#75211
Reviewed-on: https://go-review.googlesource.com/c/go/+/700315
Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Kirill Kolyshkin <kolyshkin@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This change makes the fast path for ASCII characters inlineable in
DecodeRune and DecodeRuneInString and removes most instances of manual
inlining at call sites.
Here are some benchmark results (no change to allocations):
goos: darwin
goarch: amd64
pkg: unicode/utf8
cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
│ old │ new │
│ sec/op │ sec/op vs base │
DecodeASCIIRune-8 2.4545n ± 2% 0.6253n ± 2% -74.52% (p=0.000 n=20)
DecodeJapaneseRune-8 3.988n ± 1% 4.023n ± 1% +0.86% (p=0.050 n=20)
DecodeASCIIRuneInString-8 2.4675n ± 1% 0.6264n ± 2% -74.61% (p=0.000 n=20)
DecodeJapaneseRuneInString-8 3.992n ± 1% 4.001n ± 1% ~ (p=0.625 n=20)
geomean 3.134n 1.585n -49.43%
Note: when #61502 gets resolved, DecodeRune and DecodeRuneInString should
be reverted to their idiomatic implementations.
Fixes #31666
Updates #48195
Change-Id: I4be25c4f52417dc28b3a7bd72f1b04018470f39d
GitHub-Last-Rev: 2e352a0045027e059be79cdb60241b5cf35fec71
GitHub-Pull-Request: golang/go#75181
Reviewed-on: https://go-review.googlesource.com/c/go/+/699675
Reviewed-by: Sean Liao <sean@liao.dev>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
The 'classify' instruction on RISC-V sets a bit in a mask to indicate
the class a floating point value belongs to (e.g. whether the value is
an infinity, a normal number, a subnormal number and so on). There are
other places this instruction is useful but for now I've just used it
for infinity tests.
The gains are relatively small (~1-2 instructions per IsInf call) but
using FCLASSD does potentially unlock further optimizations. It also
reduces the number of loads from memory and the number of moves
between general purpose and floating point register files.
goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
│ sec/op │ sec/op vs base │
Acos 159.9n ± 0% 173.7n ± 0% +8.66% (p=0.000 n=10)
Acosh 249.8n ± 0% 254.4n ± 0% +1.86% (p=0.000 n=10)
Asin 159.9n ± 0% 173.7n ± 0% +8.66% (p=0.000 n=10)
Asinh 292.2n ± 0% 283.0n ± 0% -3.15% (p=0.000 n=10)
Atan 119.1n ± 0% 119.0n ± 0% -0.08% (p=0.036 n=10)
Atanh 265.1n ± 0% 271.6n ± 0% +2.43% (p=0.000 n=10)
Atan2 194.9n ± 0% 186.7n ± 0% -4.23% (p=0.000 n=10)
Cbrt 216.3n ± 0% 203.1n ± 0% -6.10% (p=0.000 n=10)
Ceil 31.82n ± 0% 31.81n ± 0% ~ (p=0.063 n=10)
Copysign 4.897n ± 0% 4.893n ± 3% -0.08% (p=0.038 n=10)
Cos 123.9n ± 0% 107.7n ± 1% -13.03% (p=0.000 n=10)
Cosh 293.0n ± 0% 264.6n ± 0% -9.68% (p=0.000 n=10)
Erf 150.0n ± 0% 133.8n ± 0% -10.80% (p=0.000 n=10)
Erfc 151.8n ± 0% 137.9n ± 0% -9.16% (p=0.000 n=10)
Erfinv 173.8n ± 0% 173.8n ± 0% ~ (p=0.820 n=10)
Erfcinv 173.8n ± 0% 173.8n ± 0% ~ (p=1.000 n=10)
Exp 247.7n ± 0% 220.4n ± 0% -11.04% (p=0.000 n=10)
ExpGo 261.4n ± 0% 232.5n ± 0% -11.04% (p=0.000 n=10)
Expm1 176.2n ± 0% 164.9n ± 0% -6.41% (p=0.000 n=10)
Exp2 220.4n ± 0% 190.2n ± 0% -13.70% (p=0.000 n=10)
Exp2Go 232.5n ± 0% 204.0n ± 0% -12.22% (p=0.000 n=10)
Abs 4.897n ± 0% 4.897n ± 0% ~ (p=0.726 n=10)
Dim 16.32n ± 0% 16.31n ± 0% ~ (p=0.770 n=10)
Floor 31.84n ± 0% 31.83n ± 0% ~ (p=0.677 n=10)
Max 26.11n ± 0% 26.13n ± 0% ~ (p=0.290 n=10)
Min 26.10n ± 0% 26.11n ± 0% ~ (p=0.424 n=10)
Mod 416.2n ± 0% 337.8n ± 0% -18.83% (p=0.000 n=10)
Frexp 63.65n ± 0% 50.60n ± 0% -20.50% (p=0.000 n=10)
Gamma 218.8n ± 0% 206.4n ± 0% -5.62% (p=0.000 n=10)
Hypot 92.20n ± 0% 94.69n ± 0% +2.70% (p=0.000 n=10)
HypotGo 107.7n ± 0% 109.3n ± 0% +1.49% (p=0.000 n=10)
Ilogb 59.54n ± 0% 44.04n ± 0% -26.04% (p=0.000 n=10)
J0 708.9n ± 0% 674.5n ± 0% -4.86% (p=0.000 n=10)
J1 707.6n ± 0% 676.1n ± 0% -4.44% (p=0.000 n=10)
Jn 1.513µ ± 0% 1.427µ ± 0% -5.68% (p=0.000 n=10)
Ldexp 70.20n ± 0% 57.09n ± 0% -18.68% (p=0.000 n=10)
Lgamma 201.5n ± 0% 185.3n ± 1% -8.01% (p=0.000 n=10)
Log 201.5n ± 0% 182.7n ± 0% -9.35% (p=0.000 n=10)
Logb 59.54n ± 0% 46.53n ± 0% -21.86% (p=0.000 n=10)
Log1p 178.8n ± 0% 173.9n ± 6% -2.74% (p=0.021 n=10)
Log10 201.4n ± 0% 184.3n ± 0% -8.49% (p=0.000 n=10)
Log2 79.17n ± 0% 66.07n ± 0% -16.54% (p=0.000 n=10)
Modf 34.27n ± 0% 34.25n ± 0% ~ (p=0.559 n=10)
Nextafter32 49.34n ± 0% 49.37n ± 0% +0.05% (p=0.040 n=10)
Nextafter64 43.66n ± 0% 43.66n ± 0% ~ (p=0.869 n=10)
PowInt 309.1n ± 0% 267.4n ± 0% -13.49% (p=0.000 n=10)
PowFrac 769.6n ± 0% 677.3n ± 0% -11.98% (p=0.000 n=10)
Pow10Pos 13.88n ± 0% 13.88n ± 0% ~ (p=0.811 n=10)
Pow10Neg 19.58n ± 0% 19.57n ± 0% ~ (p=0.993 n=10)
Round 23.65n ± 0% 23.66n ± 0% ~ (p=0.354 n=10)
RoundToEven 27.75n ± 0% 27.75n ± 0% ~ (p=0.971 n=10)
Remainder 380.0n ± 0% 309.9n ± 0% -18.45% (p=0.000 n=10)
Signbit 13.06n ± 0% 13.06n ± 0% ~ (p=1.000 n=10)
Sin 133.8n ± 0% 120.8n ± 0% -9.75% (p=0.000 n=10)
Sincos 160.7n ± 0% 147.7n ± 0% -8.12% (p=0.000 n=10)
Sinh 305.9n ± 0% 277.9n ± 0% -9.17% (p=0.000 n=10)
SqrtIndirect 3.265n ± 0% 3.264n ± 0% ~ (p=0.546 n=10)
SqrtLatency 19.58n ± 0% 19.58n ± 0% ~ (p=0.973 n=10)
SqrtIndirectLatency 19.59n ± 0% 19.58n ± 0% ~ (p=0.370 n=10)
SqrtGoLatency 205.7n ± 0% 202.7n ± 0% -1.46% (p=0.000 n=10)
SqrtPrime 4.953µ ± 0% 4.954µ ± 0% ~ (p=0.477 n=10)
Tan 163.2n ± 0% 150.2n ± 0% -7.99% (p=0.000 n=10)
Tanh 312.4n ± 0% 284.2n ± 0% -9.01% (p=0.000 n=10)
Trunc 31.83n ± 0% 31.83n ± 0% ~ (p=0.663 n=10)
Y0 701.0n ± 0% 669.2n ± 0% -4.54% (p=0.000 n=10)
Y1 704.5n ± 0% 672.4n ± 0% -4.55% (p=0.000 n=10)
Yn 1.490µ ± 0% 1.422µ ± 0% -4.60% (p=0.000 n=10)
Float64bits 5.713n ± 0% 5.710n ± 0% ~ (p=0.926 n=10)
Float64frombits 4.896n ± 0% 4.896n ± 0% ~ (p=0.663 n=10)
Float32bits 12.25n ± 0% 12.25n ± 0% ~ (p=0.571 n=10)
Float32frombits 4.898n ± 0% 4.896n ± 0% ~ (p=0.754 n=10)
FMA 4.895n ± 0% 4.895n ± 0% ~ (p=0.745 n=10)
geomean 94.40n 89.43n -5.27%
Change-Id: I4fe0f2e9f609e38d79463f9ba2519a3f9427432e
Reviewed-on: https://go-review.googlesource.com/c/go/+/348389
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
This change also add corresponding benchmark tests and codegen tests.
The performance improvement on CPU Loongson-3A6000-HV is as follows:
goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
MulNeg 828.4n ± 0% 655.9n ± 0% -20.82% (p=0.000 n=10)
Mul2Neg 1062.0n ± 0% 826.8n ± 0% -22.15% (p=0.000 n=10)
geomean 938.0n 736.4n -21.49%
Change-Id: Ia999732880ec65be0c66cddc757a4868847e5b15
Reviewed-on: https://go-review.googlesource.com/c/go/+/682535
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
|
|
Change-Id: Iba6bb7f8252120f56d7e6ae49c9edc9382e8c7e0
Reviewed-on: https://go-review.googlesource.com/c/go/+/679855
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <mark@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
For #54766.
Change-Id: I6a6a636c40b5fe2e8b0d4a5e23933492bc8bb76e
Reviewed-on: https://go-review.googlesource.com/c/go/+/691595
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This info makes more sense in the flags instead of as a high
bit of the kind. This makes kind access simpler because we now
don't need to mask anything.
Cleaned up most direct field accesses to use methods instead.
(reflect making new types is the only remaining direct accessor.)
IfaceIndir -> !IsDirectIface everywhere.
gocore has been updated to handle the new location. So has delve.
TODO: any other tools need updating?
Change-Id: I123f97a4d4bdd0bff1641ee7e276d1cc0bd7e8eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/681936
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Change-Id: I17568a898ea8514c7b32d2f48c44365ae37cf898
Reviewed-on: https://go-review.googlesource.com/c/go/+/670195
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Our current parallel mark algorithm suffers from frequent stalls on
memory since its access pattern is essentially random. Small objects
are the worst offenders, since each one forces pulling in at least one
full cache line to access even when the amount to be scanned is far
smaller than that. Each object also requires an independent access to
per-object metadata.
The purpose of this change is to improve garbage collector performance
by scanning small objects in batches to obtain better cache locality
than our current approach. The core idea behind this change is to defer
marking and scanning small objects, and then scan them in batches
localized to a span.
This change adds scanned bits to each small object (<=512 bytes) span in
addition to mark bits. The scanned bits indicate that the object has
been scanned. (One way to think of them is "grey" bits and "black" bits
in the tri-color mark-sweep abstraction.) Each of these spans is always
8 KiB and if they contain pointers, the pointer/scalar data is already
packed together at the end of the span, allowing us to further optimize
the mark algorithm for this specific case.
When the GC encounters a pointer, it first checks if it points into a
small object span. If so, it is first marked in the mark bits, and then
the object is queued on a work-stealing P-local queue. This object
represents the whole span, and we ensure that a span can only appear at
most once in any queue by maintaining an atomic ownership bit for each
span. Later, when the pointer is dequeued, we scan every object with a
set mark that doesn't have a corresponding scanned bit. If it turns out
that was the only object in the mark bits since the last time we scanned
the span, we scan just that object directly, essentially falling back to
the existing algorithm. noscan objects have no scan work, so they are
never queued.
Each span's mark and scanned bits are co-located together at the end of
the span. Since the span is always 8 KiB in size, it can be found with
simple pointer arithmetic. Next to the marks and scans we also store the
size class, eliminating the need to access the span's mspan altogether.
The work-stealing P-local queue is a new source of GC work. If this
queue gets full, half of it is dumped to a global linked list of spans
to scan. The regular scan queues are always prioritized over this queue
to allow time for darts to accumulate. Stealing work from other Ps is a
last resort.
This change also adds a new debug mode under GODEBUG=gctrace=2 that
dumps whole-span scanning statistics by size class on every GC cycle.
A future extension to this CL is to use SIMD-accelerated scanning
kernels for scanning spans with high mark bit density.
For #19112. (Deadlock averted in GOEXPERIMENT.)
For #73581.
Change-Id: I4bbb4e36f376950a53e61aaaae157ce842c341bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/658036
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Use an automatic algorithm to generate strength reduction code.
You give it all the linear combination (a*x+b*y) instructions in your
architecture, it figures out the rest.
Just amd64 and arm64 for now.
Fixes #67575
Change-Id: I35c69382bebb1d2abf4bb4e7c43fd8548c6c59a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/626998
Reviewed-by: Jakub Ciolek <jakub@ciolek.dev>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The vast majority of the time, carry propagation is limited and
addVW/subVW only need to consider a single word for carry propagation.
As Josh Bleecher-Snyder pointed out in 2019 (CL 164968), once carrying
is done, the remaining words can be handled faster with copy (memmove).
In the benchmarks below, this is the data=random case.
Even more important, if the source and destination are the same,
the copy can be optimized away entirely, making a small in-place
addition to a big.Int O(1) instead of O(N). To date, only a few
systems (amd64, arm64, and pure Go, meaning wasm) make use of this
asymptotic improvement. This is the data=shortcut case.
This CL deletes the addVW/subVW assembly and replaces it with
an optimized pure Go version. Using Go makes it easy to call
the real copy builtin, which will use optimized memmove code,
instead of recreating a worse memmove in assembly (as arm64 does)
or omitting the copy optimization entirely (as most others do).
The worst case for the Go version versus assembly is the case
of incrementing 2^N-1 by 1, which has to propagate a carry
the entire length of the array. This is the data=carry case.
On balance, we believe this case is rare enough to be worth
taking a hit in that case, in exchange for significant wins
in the other cases and the deletion of significant amounts of
assembly of varying quality. (Remember that half the assembly has
the copy optimization and shortcut, while half does not.)
In the benchmarks, the systems are:
c2s16 GOARCH=amd64 c2s16 perf gomote (Intel, Google Cloud)
c3h88 GOARCH=amd64 c3h88 perf gomote (newer Intel, Google Cloud)
s7 GOARCH=amd64 rsc basement server (AMD Ryzen 9 7950X)
c4as16 GOARCH=arm64 c4as16 perf gomote (Google Cloud)
mac GOARCH=arm64 Apple M3 Pro in MacBook Pro
386 GOARCH=386 gotip-linux-386 gomote
arm GOARCH=arm gotip-linux-arm gomote
loong64 GOARCH=loong64 gotip-linux-loong64 gomote
ppc64le GOARCH=ppc64le gotip-linux-ppc64le gomote
riscv64 GOARCH=riscv64 gotip-linux-riscv64 gomote
benchmark \ system c2s16 c3h88 s7 c4as16 mac 386 arm loong64 ppc64le riscv64
AddVW/words=1/data=random -1.15% -1.74% -5.89% -9.80% -11.54% +23.71% -12.74% -14.25% +14.67% +10.27%
AddVW/words=2/data=random -2.59% ~ -4.38% -19.31% -15.41% +24.80% ~ -19.99% +13.73% +19.71%
AddVW/words=3/data=random -3.75% -19.10% -3.79% -23.15% -17.04% +20.04% -10.07% -23.20% ~ +15.39%
AddVW/words=4/data=random -2.84% +7.05% -8.77% -22.64% -15.77% +16.01% -7.36% -28.22% ~ +23.00%
AddVW/words=5/data=random -10.97% +2.16% -12.09% -20.89% -17.14% +9.42% -4.69% -32.60% ~ +10.07%
AddVW/words=6/data=random -9.87% ~ -7.54% -19.08% -6.46% ~ -3.44% -34.61% ~ +12.19%
AddVW/words=7/data=random -14.36% ~ -10.09% -19.10% -10.47% -6.20% -5.06% -38.14% -11.54% +6.79%
AddVW/words=8/data=random -17.50% ~ -11.06% -25.14% -12.88% -8.35% -5.11% -41.39% -14.04% +11.87%
AddVW/words=9/data=random -19.76% -4.05% -15.47% -24.08% -16.50% -12.34% -21.56% -44.25% -14.82% ~
AddVW/words=10/data=random -13.89% ~ -9.69% -23.06% -8.04% -12.58% -19.25% -32.80% -11.68% ~
AddVW/words=16/data=random -29.36% -15.35% -21.86% -25.04% -19.89% -32.26% -16.29% -42.66% -25.92% -3.01%
AddVW/words=32/data=random -39.02% -28.76% -39.87% -11.22% -2.85% -55.40% -31.17% -55.37% -37.92% -16.28%
AddVW/words=64/data=random -25.94% -19.09% -20.60% -6.90% +8.91% -51.00% -43.72% -62.27% -44.11% -28.74%
AddVW/words=100/data=random -22.79% -18.13% -18.25% ~ +33.89% -67.40% -51.77% -63.54% -53.75% -30.97%
AddVW/words=1000/data=random -8.98% -3.84% ~ -3.15% ~ -93.35% -63.92% -65.66% -68.67% -42.30%
AddVW/words=10000/data=random -1.38% -0.38% ~ ~ ~ -89.16% -65.18% -44.65% -70.35% -20.08%
AddVW/words=100000/data=random ~ ~ ~ ~ ~ -87.03% -64.51% -36.08% -61.40% -16.53%
SubVW/words=1/data=random -3.67% ~ -8.38% -10.26% -3.07% +45.78% -6.06% -11.17% ~ ~
SubVW/words=2/data=random -3.48% -10.07% -5.76% -20.14% -8.45% +44.28% ~ -19.09% ~ +16.98%
SubVW/words=3/data=random -7.11% -26.64% -4.48% -22.07% -9.21% +35.61% ~ -23.93% -18.20% ~
SubVW/words=4/data=random -4.23% +7.19% -8.95% -22.62% -13.89% +33.20% -8.96% -29.96% ~ +22.23%
SubVW/words=5/data=random -11.49% +1.92% -10.86% -22.27% -17.53% +24.48% -2.88% -35.19% -19.55% ~
SubVW/words=6/data=random -7.67% ~ -7.72% -18.44% -6.24% +12.03% -2.00% -39.68% -10.73% ~
SubVW/words=7/data=random -13.69% -18.32% -11.82% -18.92% -11.57% +6.63% ~ -43.54% -30.81% ~
SubVW/words=8/data=random -16.02% ~ -11.07% -24.50% -11.92% +4.32% -3.01% -46.95% -24.14% ~
SubVW/words=9/data=random -18.76% -3.34% -14.84% -23.79% -17.50% ~ -21.80% -49.98% -29.62% ~
SubVW/words=10/data=random -13.23% ~ -9.25% -21.26% -11.63% ~ -18.58% -39.19% -20.09% ~
SubVW/words=16/data=random -28.25% -13.24% -22.66% -27.18% -19.13% -23.38% -20.24% -51.01% -28.06% -3.05%
SubVW/words=32/data=random -38.41% -28.88% -40.12% -11.20% -2.80% -49.17% -34.67% -63.29% -39.25% -15.20%
SubVW/words=64/data=random -25.51% -19.24% -22.20% -6.57% +9.98% -48.52% -48.14% -69.50% -49.44% -27.92%
SubVW/words=100/data=random -21.69% -18.51% ~ +1.92% +34.42% -65.88% -54.67% -71.24% -58.88% -30.71%
SubVW/words=1000/data=random -9.81% -4.05% -2.14% -3.06% ~ -93.37% -67.33% -74.12% -68.36% -42.17%
SubVW/words=10000/data=random ~ -0.52% ~ ~ ~ -88.87% -68.54% -44.94% -70.63% -19.95%
SubVW/words=100000/data=random ~ ~ ~ ~ ~ -86.69% -68.09% -48.36% -62.42% -19.32%
AddVW/words=1/data=shortcut -29.38% -25.38% -27.37% -23.15% -25.41% +3.01% -33.60% -36.12% -15.76% ~
AddVW/words=2/data=shortcut -32.79% -34.72% -31.47% -24.47% -28.21% -3.75% -34.66% -43.89% -23.65% -21.56%
AddVW/words=3/data=shortcut -38.50% -46.83% -35.67% -26.38% -30.29% -10.41% -44.89% -47.68% -30.93% -26.85%
AddVW/words=4/data=shortcut -40.40% -28.85% -34.19% -29.83% -32.95% -16.09% -42.86% -51.02% -34.19% -26.69%
AddVW/words=5/data=shortcut -43.87% -35.42% -36.46% -32.59% -37.72% -20.82% -45.14% -54.01% -35.49% -30.48%
AddVW/words=6/data=shortcut -46.98% -39.34% -42.22% -35.43% -38.18% -27.46% -46.72% -56.61% -40.21% -34.07%
AddVW/words=7/data=shortcut -49.63% -47.97% -46.61% -35.28% -41.93% -31.14% -49.29% -58.89% -41.10% -37.01%
AddVW/words=8/data=shortcut -50.48% -42.33% -45.40% -40.24% -41.74% -32.92% -50.62% -60.98% -44.85% -38.10%
AddVW/words=9/data=shortcut -54.27% -43.52% -49.06% -42.16% -45.22% -37.57% -51.84% -62.91% -46.04% -40.82%
AddVW/words=10/data=shortcut -56.01% -45.40% -51.42% -43.29% -46.14% -38.65% -53.65% -64.62% -47.05% -43.21%
AddVW/words=16/data=shortcut -62.73% -55.66% -59.31% -56.38% -54.31% -53.16% -61.03% -72.29% -58.24% -52.57%
AddVW/words=32/data=shortcut -74.00% -69.42% -71.75% -33.65% -37.35% -71.73% -72.59% -82.44% -70.87% -67.69%
AddVW/words=64/data=shortcut -56.69% -52.72% -52.09% -35.48% -36.87% -84.24% -83.10% -90.37% -82.56% -80.81%
AddVW/words=100/data=shortcut -56.68% -53.18% -51.49% -33.49% -37.72% -89.95% -88.21% -93.37% -88.47% -86.52%
AddVW/words=1000/data=shortcut -56.68% -52.45% -51.66% -35.31% -36.65% -98.88% -98.62% -99.24% -98.78% -98.41%
AddVW/words=10000/data=shortcut -56.70% -52.40% -51.92% -33.49% -36.98% -99.89% -99.86% -99.92% -99.87% -99.91%
AddVW/words=100000/data=shortcut -56.67% -52.46% -52.38% -35.31% -37.20% -99.99% -99.99% -99.99% -99.99% -99.99%
SubVW/words=1/data=shortcut -29.80% -20.71% -26.94% -23.24% -25.33% +26.97% -32.02% -37.85% -40.20% -12.67%
SubVW/words=2/data=shortcut -35.47% -36.38% -31.93% -25.43% -30.18% +18.96% -33.48% -46.48% -39.38% -18.65%
SubVW/words=3/data=shortcut -39.22% -49.96% -36.90% -25.82% -30.96% +12.53% -40.67% -51.07% -43.71% -23.78%
SubVW/words=4/data=shortcut -40.46% -24.90% -34.66% -29.87% -33.97% +4.60% -42.32% -54.92% -42.83% -22.45%
SubVW/words=5/data=shortcut -43.84% -34.17% -38.00% -32.55% -37.27% -2.46% -43.09% -58.18% -45.70% -26.45%
SubVW/words=6/data=shortcut -47.69% -37.49% -42.73% -35.90% -37.73% -8.52% -46.55% -61.01% -44.00% -30.14%
SubVW/words=7/data=shortcut -49.45% -50.66% -46.88% -34.77% -41.64% -14.46% -48.92% -63.46% -50.47% -33.39%
SubVW/words=8/data=shortcut -50.45% -39.31% -47.14% -40.47% -41.70% -15.77% -50.21% -65.64% -47.71% -34.01%
SubVW/words=9/data=shortcut -54.28% -43.07% -49.42% -41.34% -44.99% -19.39% -51.55% -67.61% -56.92% -36.82%
SubVW/words=10/data=shortcut -56.85% -47.88% -50.92% -42.76% -45.67% -23.60% -53.04% -69.34% -60.18% -39.43%
SubVW/words=16/data=shortcut -62.36% -54.83% -58.80% -55.83% -53.74% -41.04% -60.16% -76.75% -60.56% -48.63%
SubVW/words=32/data=shortcut -73.68% -68.64% -71.57% -33.52% -37.34% -64.73% -72.67% -85.89% -71.87% -64.56%
SubVW/words=64/data=shortcut -56.68% -51.66% -52.56% -34.75% -37.54% -80.30% -83.58% -92.39% -83.41% -78.70%
SubVW/words=100/data=shortcut -56.68% -50.97% -51.57% -33.68% -36.78% -87.42% -88.53% -94.84% -88.87% -84.96%
SubVW/words=1000/data=shortcut -56.68% -50.89% -52.10% -34.94% -37.77% -98.59% -98.71% -99.43% -98.80% -98.20%
SubVW/words=10000/data=shortcut -56.68% -51.00% -52.44% -33.65% -37.27% -99.86% -99.87% -99.94% -99.88% -99.90%
SubVW/words=100000/data=shortcut -56.68% -50.80% -52.20% -34.79% -37.46% -99.99% -99.99% -99.99% -99.99% -99.99%
AddVW/words=1/data=carry -0.51% -5.29% -24.03% -26.48% ~ ~ -33.14% -30.23% ~ -20.74%
AddVW/words=2/data=carry -6.36% ~ -21.05% -39.40% ~ +10.72% -29.12% -31.34% ~ -17.29%
AddVW/words=3/data=carry ~ ~ -17.46% -19.53% +17.58% ~ -26.23% -23.61% +7.80% -14.34%
AddVW/words=4/data=carry +19.02% +16.80% ~ ~ +28.25% ~ -27.90% -20.31% +19.16% ~
AddVW/words=5/data=carry +3.97% +53.02% ~ ~ +11.31% ~ -19.05% -17.47% +16.81% ~
AddVW/words=6/data=carry +2.98% +19.83% ~ ~ +14.84% ~ -18.48% -14.92% +18.25% ~
AddVW/words=7/data=carry ~ ~ ~ ~ +27.17% ~ -15.50% -12.74% +13.00% ~
AddVW/words=8/data=carry +0.58% +22.32% ~ +6.10% +29.63% ~ -13.04% ~ +28.46% +2.95%
AddVW/words=9/data=carry ~ +31.53% ~ ~ +14.42% ~ -11.32% ~ +18.37% +3.28%
AddVW/words=10/data=carry +3.94% +22.36% ~ +6.29% +19.22% ~ -11.27% ~ +20.10% +3.91%
AddVW/words=16/data=carry +2.82% +14.23% ~ +10.06% +25.91% -16.12% ~ ~ +52.28% +10.40%
AddVW/words=32/data=carry ~ +25.35% +13.66% ~ +34.89% -34.39% +6.51% -18.71% +41.06% +19.42%
AddVW/words=64/data=carry -42.03% ~ -39.70% +6.65% +32.29% -39.94% +14.34% ~ +19.68% +20.86%
AddVW/words=100/data=carry -33.95% -34.28% -39.65% ~ +27.72% -26.80% +17.40% ~ +26.39% +23.32%
AddVW/words=1000/data=carry -42.49% -47.87% -47.44% +1.25% +4.25% -41.76% +23.40% ~ +25.48% +27.99%
AddVW/words=10000/data=carry -41.85% -48.49% -49.43% ~ ~ -42.09% +24.61% -10.32% +40.55% +18.35%
AddVW/words=100000/data=carry -28.18% -48.13% -48.24% +1.35% ~ -42.90% +24.73% -9.79% +22.55% +17.16%
SubVW/words=1/data=carry -10.32% -17.16% -24.14% -26.24% ~ +18.43% -34.10% -29.54% -9.57% ~
SubVW/words=2/data=carry -19.45% -23.31% -20.74% -39.73% ~ +15.74% -28.13% -30.21% ~ -18.74%
SubVW/words=3/data=carry ~ -16.18% -15.34% -19.54% +17.62% +12.39% -27.64% -27.09% ~ -14.97%
SubVW/words=4/data=carry +11.67% +24.42% ~ ~ +25.11% +14.07% -28.08% -26.18% ~ ~
SubVW/words=5/data=carry +8.08% +25.64% ~ ~ +10.35% +8.12% -21.75% -25.50% ~ -4.86%
SubVW/words=6/data=carry ~ +13.82% ~ ~ +12.92% +6.79% -20.25% -24.70% ~ -2.74%
SubVW/words=7/data=carry ~ ~ +8.29% +4.51% +26.59% +4.62% -18.01% -24.09% ~ -1.26%
SubVW/words=8/data=carry ~ +23.16% +16.19% +6.16% +25.46% +6.74% -15.57% -22.74% ~ +1.44%
SubVW/words=9/data=carry ~ +30.71% +20.81% ~ +12.36% ~ -12.99% ~ ~ +3.13%
SubVW/words=10/data=carry +5.03% +19.53% +14.84% +14.16% +16.12% ~ -11.64% -16.00% +15.45% +3.29%
SubVW/words=16/data=carry +14.42% +15.58% +33.07% +11.43% +24.65% ~ ~ -21.90% +25.59% +9.40%
SubVW/words=32/data=carry ~ +27.57% +46.58% ~ +35.35% -8.49% ~ -24.04% +11.86% +18.40%
SubVW/words=64/data=carry -24.34% -27.83% -20.90% +13.34% +37.17% -14.90% ~ -8.81% +12.88% +18.92%
SubVW/words=100/data=carry -25.19% -34.70% -27.45% +12.86% +28.42% -14.48% ~ ~ +25.71% +21.93%
SubVW/words=1000/data=carry -24.93% -47.86% -47.26% +2.66% ~ -23.88% ~ ~ +25.99% +27.81%
SubVW/words=10000/data=carry -24.17% -36.48% -49.41% +1.06% ~ -25.06% ~ -26.50% +27.94% +18.36%
SubVW/words=100000/data=carry -22.51% -35.86% -49.46% +3.96% ~ -25.18% ~ -22.15% +26.86% +15.44%
Change-Id: I8f252073040e674780ac6ec9912082fb205329dd
Reviewed-on: https://go-review.googlesource.com/c/go/+/664898
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
```
find . \
-not -path './.git/*' \
-not -path './test/*' \
-not -path './src/cmd/vendor/*' \
-not -wholename './src/strings/example_test.go' \
-type f \
-exec \
sed -i -E 's/strings\.Replace\((.+), -1\)/strings\.ReplaceAll\(\1\)/g' {} \;
```
Change-Id: I59e2e91b3654c41a32f17dd91ec56f250198f0d6
GitHub-Last-Rev: 0868b1eccc945ca62a5ed0e56a4054994d4bd659
GitHub-Pull-Request: golang/go#73370
Reviewed-on: https://go-review.googlesource.com/c/go/+/665395
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
|
|
On 32-bit systems, these need to be aligned to 8 bytes, even though the
typechecker doesn't tell us that.
The 64-bit allocations might be the target of atomic operations
that require 64-bit alignment.
Fixes 386 longtest builder.
Fixes #73173
Change-Id: I68f6a4f40c7051d29c57ecd560c8d920876a56a6
Reviewed-on: https://go-review.googlesource.com/c/go/+/663015
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
For variable-sized allocations.
Turns out that we already implement the correct escape semantics
for this case. Even when the result of the "make" does not escape,
everything assigned into it does.
Change-Id: Ia123c538d39f2f1e1581c24e4135a65af3821c5e
Reviewed-on: https://go-review.googlesource.com/c/go/+/657937
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
|
|
If multiple small scalars escape to the heap, allocate them together
with a single allocation. They are going to be aggregated together
in the tiny allocator anyway, might as well do just one runtime call.
Change-Id: I4317e29235af63de378a26436a18d7fb0c39e41f
Reviewed-on: https://go-review.googlesource.com/c/go/+/648536
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Instead of always allocating variable-sized "make" calls on the heap,
allocate a small, constant-sized array on the stack and use that array
as the backing store if it is big enough.
Requires the result of the "make" doesn't escape.
if cap <= K {
var arr [K]E
slice = arr[:len:cap]
} else {
slice = makeslice(E, len, cap)
}
Pretty conservatively for now, K = 32/sizeof(E). The slice header is
already 24 bytes, so wasting 32 bytes of stack if the requested size
is too big isn't that bad. Larger would waste more stack space but
maybe avoid more allocations.
This CL also requires the element type be pointer-free. Maybe we
could relax that at some point, but it is hard. If the element type
has pointers we can get heap->stack pointers (in the case where the
requested size is too big and the slice is heap allocated).
Note that this only handles the case of makeslice called directly from
compiler-generated code. It does not handle slices built in the
runtime on behalf of the program (e.g. in growslice). Some of those
are currently handled by passing in a tmpBuf (e.g. concatstrings),
but we could probably do more.
Change-Id: I8378efad527cd00d25948a80b82a68d88fbd93a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/653856
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
hottest one
When both a direct call and an interface call appear on the same line,
PGO devirtualization may make a suboptimal decision. In some cases,
the directly called function becomes a candidate for devirtualization
if no other relevant outgoing edges with non-zero weight exist for the
caller's IRNode in the WeightedCG. The edge to this candidate is
considered the hottest. Despite having zero weight, this edge still
causes the interface call to be devirtualized.
This CL prevents devirtualization when the weight of the hottest edge
is 0.
Fixes #72092
Change-Id: I06c0c5e080398d86f832e09244aceaa4aeb98721
Reviewed-on: https://go-review.googlesource.com/c/go/+/655475
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Currently, if the user stops the timer in a B.Loop benchmark loop, the
benchmark will run until it hits the timeout and fails.
Fix this by detecting that the timer is stopped and failing the
benchmark right away. We avoid making the fast path more expensive for
this check by "poisoning" the B.Loop iteration counter when the timer
is stopped so that it falls back to the slow path, which can check the
timer.
This causes b to escape from B.Loop, which is totally harmless because
it was already definitely heap-allocated. But it causes the
test/inline_testingbloop.go errorcheck test to fail. I don't think the
escape messages actually mattered to that test, they just had to be
matched. To fix this, we drop the debug level to -m=1, since -m=2
prints a lot of extra information for escaping parameters that we
don't want to deal with, and change one error check to allow b to
escape.
Fixes #72971.
Change-Id: I7d4abbb1ec1e096685514536f91ba0d581cca6b7
Reviewed-on: https://go-review.googlesource.com/c/go/+/659657
Auto-Submit: Austin Clements <austin@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Fixes the ComputePadding calculation to take into account the padding
added for the current offset. This fixes an issue where padding can be
added incorrectly for certain structs.
Related: https://github.com/go-delve/delve/issues/3923
Same as https://go-review.googlesource.com/c/go/+/656736 just without
the brittle test.
Fixes #72053
Change-Id: I67f157a42f5fc5d3a54d0e9be03488aa44752bcb
GitHub-Last-Rev: fabed69a31258fa8c1806f88d1cbcc745c881148
GitHub-Pull-Request: golang/go#72997
Reviewed-on: https://go-review.googlesource.com/c/go/+/659698
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This reverts CL 656736.
Reason for revert: breaks many builders (all flavors of
linux-amd64 builders).
Change-Id: Ie7190d4078fada227391804c5cf10b9ce9cc9115
Reviewed-on: https://go-review.googlesource.com/c/go/+/659955
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Fixes the ComputePadding calculation to take into account
the padding added for the current offset. This fixes an issue
where padding can be added incorrectly for certain structs.
Related: https://github.com/go-delve/delve/issues/3923
Fixes #72053
Change-Id: I277629799168c6b44bc9ed03df4345e0318064ce
GitHub-Last-Rev: 9478b29a137e20421ad348bb93a54406b1977008
GitHub-Pull-Request: golang/go#72805
Reviewed-on: https://go-review.googlesource.com/c/go/+/656736
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
The original issue62407_test also passes with versions prior to 1.23.
The improvement makes it fail with versions prior to 1.23.
Change-Id: I94bfb9d1ac695c8e07997d7029fc2101535e14f8
GitHub-Last-Rev: 44be2a610a1a79d04dc3d228af2b313200f4d900
GitHub-Pull-Request: golang/go#70938
Reviewed-on: https://go-review.googlesource.com/c/go/+/638036
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Update #72018
Change-Id: I3188019658c37da3c31f06472023b39e13170ebf
Reviewed-on: https://go-review.googlesource.com/c/go/+/654316
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|