| Age | Commit message | Author |
|
CL 622236 forgot to check that the mask was also a 32 bit rotate mask. Add
a modified version of isPPC64WordRotateMask which validates that the mask is
contiguous and fits inside a uint32.
I don't think this is possible when merging SRDconst; the first check should
always reject such combines. But be extra careful and do it there
too.
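As a rough illustration of the check described above, here is a sketch of testing that a mask is a single non-wrapping run of ones that fits in 32 bits. The name isContiguousMask32 is hypothetical and this is not the compiler's own helper.

```go
package main

import (
	"fmt"
	"math"
)

// isContiguousMask32 is a hypothetical helper (not the compiler's code).
// It reports whether m is one non-wrapping run of 1 bits that fits
// entirely in the low 32 bits.
func isContiguousMask32(m uint64) bool {
	if m == 0 || m > math.MaxUint32 {
		return false
	}
	v := uint32(m)
	lowest := v & -v // isolate the lowest set bit
	// Adding the lowest set bit carries through the lowest run of ones.
	// If any bit of v survives the AND, there was a second run.
	return v&(v+lowest) == 0
}

func main() {
	fmt.Println(isContiguousMask32(0x00FF0000))  // one run inside 32 bits
	fmt.Println(isContiguousMask32(0x0F0F))      // two separate runs
	fmt.Println(isContiguousMask32(0x1FFFFFFFF)) // does not fit in a uint32
}
```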
Fixes #73153
Change-Id: Ie95f74ec5e7d89dc761511126db814f886a7a435
Reviewed-on: https://go-review.googlesource.com/c/go/+/679775
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
unique.Make always copies strings passed into it, so it's safe to not
copy byte slices converted to strings either. Handle this just like map
accesses with string(b) as keys.
This CL only handles unique.Make(string(b)), not nested cases like
unique.Make([2]string{string(b1), string(b2)}); this could be done in a
followup CL but the map lookup code in walk is sufficiently different
than the call handling code that I didn't attempt it. (SSA is much
easier).
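For reference, a minimal sketch of the call shape this CL targets. The helper name internBytes is made up for illustration; whether the temporary string copy is actually elided depends on the compiler version.

```go
package main

import (
	"fmt"
	"unique"
)

// internBytes is a hypothetical helper showing the pattern from the
// commit message: a []byte converted to string directly inside the
// unique.Make call.
func internBytes(b []byte) unique.Handle[string] {
	return unique.Make(string(b))
}

func main() {
	buf := []byte("hello")
	h1 := internBytes(buf)
	buf[0] = 'j' // mutating buf afterwards must not change the interned value
	h2 := internBytes([]byte("hello"))
	fmt.Println(h1 == h2, h1.Value())
}
```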
Fixes #71926
Change-Id: Ic2f82f2f91963d563b4ddb1282bd49fc40da8b85
Reviewed-on: https://go-review.googlesource.com/c/go/+/672135
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
allocating
Today, this interface conversion causes the struct literal
to be heap allocated:
    var sink any
    func example1() {
        sink = S{1, 1}
    }
For basic literals like integers that are directly used in
an interface conversion that would otherwise allocate, the compiler
is able to use read-only global storage (see #18704).
This CL extends that to struct and array literals as well by creating
read-only global storage that is able to represent for example S{1, 1},
and then using a pointer to that storage in the interface
when the interface conversion happens.
A more challenging example is:
    func example2() {
        v := S{1, 1}
        sink = v
    }
In this case, the struct literal is not directly part of the
interface conversion, but is instead assigned to a local variable.
To still avoid heap allocation in cases like this, in walk we
construct a cache that maps from expressions used in interface
conversions to earlier expressions that can be used to represent the
same value (via ir.ReassignOracle.StaticValue). This is somewhat
analogous to how we avoided heap allocation for basic literals in
CL 649077 earlier in our stack, though here we also need to do a
little more work to create the read-only global.
CL 649076 (also earlier in our stack) added most of the tests
along with debug diagnostics in convert.go to make it easier
to test this change.
See the writeup in #71359 for details.
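The two examples above can be made runnable as follows. The definition of S is an assumption (the message does not show it), and the allocation counts printed depend on the compiler version, so none are asserted.

```go
package main

import (
	"fmt"
	"testing"
)

// S is an assumed definition; the commit message does not show the type.
type S struct{ a, b int }

var sink any

// example1 converts a struct literal directly to an interface.
func example1() {
	sink = S{1, 1}
}

// example2 goes through a local variable first.
func example2() {
	v := S{1, 1}
	sink = v
}

func main() {
	// AllocsPerRun reports the average number of heap allocations per call.
	fmt.Println("example1:", testing.AllocsPerRun(100, example1))
	fmt.Println("example2:", testing.AllocsPerRun(100, example2))
}
```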
Fixes #71359
Fixes #71323
Updates #62653
Updates #53465
Updates #8618
Change-Id: I8924f0c69ff738ea33439bd6af7b4066af493b90
Reviewed-on: https://go-review.googlesource.com/c/go/+/649555
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Fixes #73812
Change-Id: If7a6e103ae9e1442a2cf4a3c6b1270b6a1887196
Reviewed-on: https://go-review.googlesource.com/c/go/+/675175
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Reduce the number of go toolchain instructions on loong64 as follows.
file before after Δ %
addr2line 279880 279776 -104 -0.0372%
asm 556638 556410 -228 -0.0410%
buildid 272272 272072 -200 -0.0735%
cgo 481522 481318 -204 -0.0424%
compile 2457788 2457580 -208 -0.0085%
covdata 323384 323280 -104 -0.0322%
cover 518450 518234 -216 -0.0417%
dist 340790 340686 -104 -0.0305%
distpack 282456 282252 -204 -0.0722%
doc 789932 789688 -244 -0.0309%
fix 324332 324228 -104 -0.0321%
link 704622 704390 -232 -0.0329%
nm 277132 277028 -104 -0.0375%
objdump 507862 507758 -104 -0.0205%
pack 221774 221674 -100 -0.0451%
pprof 1469816 1469552 -264 -0.0180%
test2json 254836 254732 -104 -0.0408%
trace 1100002 1099738 -264 -0.0240%
vet 781078 780874 -204 -0.0261%
go 1529116 1528848 -268 -0.0175%
gofmt 318556 318448 -108 -0.0339%
total 13792238 13788566 -3672 -0.0266%
Change-Id: I23fb3ebd41309252c7075e57ea7094e79f8c4fef
Reviewed-on: https://go-review.googlesource.com/c/go/+/674335
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
|
|
In the loong64 instruction set, there is no NORI instruction,
so the immediate value in NORconst needs to be loaded into a register
and then used with the three-register NOR instruction.
Change-Id: I5ef697450619317218cb3ef47fc07e238bdc2139
Reviewed-on: https://go-review.googlesource.com/c/go/+/673836
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This CL implements the TODO in combineStores to allow combining
stores of different sizes, as long as the total size is
2, 4, or 8 bytes.
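A sketch of the kind of source this is aimed at: adjacent stores of different widths whose sizes add up to 4 bytes. The type and function names are hypothetical, and whether the stores are actually merged depends on the target architecture and compiler version.

```go
package main

import "fmt"

// mixed is a hypothetical 4-byte layout: one 2-byte field followed by
// two 1-byte fields.
type mixed struct {
	a uint16
	b uint8
	c uint8
}

// fill writes three adjacent fields of different sizes (2+1+1 bytes).
func fill(p *mixed, v uint32) {
	p.a = uint16(v)
	p.b = uint8(v >> 16)
	p.c = uint8(v >> 24)
}

func main() {
	var m mixed
	fill(&m, 0x04030201)
	fmt.Printf("%#x %#x %#x\n", m.a, m.b, m.c)
}
```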
Fixes #72832.
Change-Id: I6d1d471335da90d851ad8f3b5a0cf10bdcfa17c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/661855
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Fold negation into addition/subtraction and avoid double negation.
platform: linux/arm64
file before after Δ %
addr2line 3628108 3628116 +8 +0.000%
asm 6208353 6207857 -496 -0.008%
buildid 3460682 3460418 -264 -0.008%
cgo 5572988 5572492 -496 -0.009%
compile 26042159 26041039 -1120 -0.004%
cover 6304328 6303472 -856 -0.014%
dist 4139330 4139098 -232 -0.006%
doc 9429305 9428065 -1240 -0.013%
fix 3997189 3996733 -456 -0.011%
link 8212128 8210280 -1848 -0.023%
nm 3620056 3619696 -360 -0.010%
objdump 5920289 5919233 -1056 -0.018%
pack 2892250 2891778 -472 -0.016%
pprof 17094569 17092745 -1824 -0.011%
test2json 3335825 3335529 -296 -0.009%
trace 15842080 15841456 -624 -0.004%
vet 9472194 9471106 -1088 -0.011%
go 19081541 19081509 -32 -0.000%
total 154253374 154240622 -12752 -0.008%
platform: darwin/arm64
file before after Δ %
compile 27152002 27135490 -16512 -0.061%
link 8372914 8356402 -16512 -0.197%
go 19154802 19154778 -24 -0.000%
total 157734180 157701132 -33048 -0.021%
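For illustration, these are source-level forms where a negation can be folded into the neighbouring add or subtract. The function names are made up, and the message does not say exactly which patterns the rules match.

```go
package main

import "fmt"

// Each function computes the left-hand form; the trailing comment names
// the simpler equivalent form.
func addNeg(a, b int64) int64 { return a + (-b) } // a - b
func subNeg(a, b int64) int64 { return a - (-b) } // a + b
func negSub(a, b int64) int64 { return -(a - b) } // b - a
func negNeg(a int64) int64    { return -(-a) }    // a

func main() {
	fmt.Println(addNeg(7, 3), subNeg(7, 3), negSub(7, 3), negNeg(7))
}
```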
Change-Id: I15a349bfbaf7333ec3e4a62ae4d06f3f371dfb1d
Reviewed-on: https://go-review.googlesource.com/c/go/+/673715
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
-N+1 <= x % N <= N-1
This is useful for cases like:
    func setBit(b []byte, i int) {
        b[i/8] |= 1<<(i%8)
    }
The shift does not need protection against larger-than-7 cases.
(It does still need protection against <0 cases.)
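The bound can be checked directly. In this sketch setBit is taken from the message, while remInRange is a hypothetical helper added only to test the inequality for positive N.

```go
package main

import "fmt"

// setBit is the function from the commit message. i%8 is at most 7, so
// only a negative shift count still needs a runtime check.
func setBit(b []byte, i int) {
	b[i/8] |= 1 << (i % 8)
}

// remInRange checks -N+1 <= x % N <= N-1 for one pair, assuming n > 0.
func remInRange(x, n int) bool {
	r := x % n
	return -n+1 <= r && r <= n-1
}

func main() {
	b := make([]byte, 2)
	setBit(b, 9)
	fmt.Println(b, remInRange(-9, 8), remInRange(23, 8))
}
```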
Change-Id: Idf83101386af538548bfeb6e2928cea855610ce2
Reviewed-on: https://go-review.googlesource.com/c/go/+/672995
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Fold negation into addition/subtraction and avoid double negation.
file before after Δ %
addr2line 3742022 3741986 -36 -0.001%
asm 6668616 6668628 +12 +0.000%
buildid 3583786 3583630 -156 -0.004%
cgo 6020370 6019634 -736 -0.012%
compile 29416016 29417336 +1320 +0.004%
cover 6801903 6801675 -228 -0.003%
dist 4485916 4485816 -100 -0.002%
doc 10652787 10652251 -536 -0.005%
fix 4115988 4115560 -428 -0.010%
link 9002328 9001616 -712 -0.008%
nm 3733148 3732780 -368 -0.010%
objdump 6163292 6163068 -224 -0.004%
pack 2944768 2944604 -164 -0.006%
pprof 18909973 18908773 -1200 -0.006%
test2json 3394662 3394778 +116 +0.003%
trace 17350911 17349751 -1160 -0.007%
vet 10077727 10077527 -200 -0.002%
go 19118769 19118609 -160 -0.001%
total 166182982 166178022 -4960 -0.003%
Change-Id: Id55698800fd70f3cb2ff48393584456b87208921
Reviewed-on: https://go-review.googlesource.com/c/go/+/673556
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Fold negation into addition/subtraction and avoid double negation.
file before after Δ %
addr2line 4007310 4007470 +160 +0.004%
asm 7007636 7007436 -200 -0.003%
buildid 3839268 3838972 -296 -0.008%
cgo 6353466 6352738 -728 -0.011%
compile 30426920 30426896 -24 -0.000%
cover 7005408 7004744 -664 -0.009%
dist 4651192 4650872 -320 -0.007%
doc 10606050 10606034 -16 -0.000%
fix 4446414 4446390 -24 -0.001%
link 9237736 9237024 -712 -0.008%
nm 3999107 3999323 +216 +0.005%
objdump 6762424 6762144 -280 -0.004%
pack 3270757 3270493 -264 -0.008%
pprof 19428299 19361939 -66360 -0.342%
test2json 3717345 3717217 -128 -0.003%
trace 17382273 17381657 -616 -0.004%
vet 10689481 10688985 -496 -0.005%
go 19118769 19118609 -160 -0.001%
total 171949855 171878943 -70912 -0.041%
Change-Id: I35c1f264d216c214ea3f56252a9ddab8ea850fa6
Reviewed-on: https://go-review.googlesource.com/c/go/+/673555
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
x += *p
We want to do this with a single load+add operation on amd64.
The tricky part is that we don't want to combine if there are
other uses of x after this instruction.
Implement a simple detector that seems to capture a common situation -
x += *p is in a loop, and the other use of x is after loop exit.
In that case, it does not hurt to do the load+add combo.
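A minimal sketch of the situation described: x += *p inside a loop, with the only other use of x after loop exit. The function name is hypothetical.

```go
package main

import "fmt"

// sumPtrs accumulates through pointers. Inside the loop x is only used
// by the add; its other use (the return) comes after the loop exits.
func sumPtrs(ps []*int) int {
	x := 0
	for _, p := range ps {
		x += *p
	}
	return x
}

func main() {
	a, b, c := 1, 2, 3
	fmt.Println(sumPtrs([]*int{&a, &b, &c}))
}
```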
Change-Id: I466174cce212e78bde83f908cc1f2752b560c49c
Reviewed-on: https://go-review.googlesource.com/c/go/+/672957
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
    for ..; ..; i++ {
        ...
    }
We want to schedule the i++ late in the block, so that all other
uses of i in the block are scheduled first. That way, i++ can
happen in place in a register instead of requiring a temporary register.
Change-Id: Id777407c7e67a5ddbd8e58251099b0488138c0df
Reviewed-on: https://go-review.googlesource.com/c/go/+/672998
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
This change also avoids double negation, and adds loong64 codegen for arithmetic tests.
Reduce the number of go toolchain instructions on loong64 as follows.
file before after Δ %
addr2line 279972 279896 -76 -0.0271%
asm 556390 556310 -80 -0.0144%
buildid 272376 272300 -76 -0.0279%
cgo 481534 481550 +16 +0.0033%
compile 2457992 2457396 -596 -0.0242%
covdata 323488 323404 -84 -0.0260%
cover 518630 518490 -140 -0.0270%
dist 340894 340814 -80 -0.0235%
distpack 282568 282484 -84 -0.0297%
doc 790224 789984 -240 -0.0304%
fix 324408 324348 -60 -0.0185%
link 704910 704666 -244 -0.0346%
nm 277220 277144 -76 -0.0274%
objdump 508026 507878 -148 -0.0291%
pack 221810 221786 -24 -0.0108%
pprof 1470284 1469880 -404 -0.0275%
test2json 254896 254852 -44 -0.0173%
trace 1100390 1100074 -316 -0.0287%
vet 781398 781142 -256 -0.0328%
go 1529668 1529128 -540 -0.0353%
gofmt 318668 318568 -100 -0.0314%
total 13795746 13792094 -3652 -0.0265%
Change-Id: I88d1f12cfc4be0e92687c48e06a57213aa484aca
Reviewed-on: https://go-review.googlesource.com/c/go/+/672555
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Add 2 more cases:
if a { x = value } else { x = a } => x = a && value
if a { x = a } else { x = value } => x = a || value
AND case goes from:
00006 (8) TESTB AX, AX
00007 (8) JNE 9
00008 (13) MOVL AX, BX
00009 (13) MOVL BX, AX
00010 (13) RET
to:
00006 (13) ANDL BX, AX
00007 (13) RET
OR goes from:
00006 (19) TESTB AX, AX
00007 (19) JNE 9
00008 (24) MOVL BX, AX
00009 (24) RET
to:
00006 (24) ORL BX, AX
00007 (24) RET
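The two source shapes named above can be written out as follows, with hypothetical function names. The values are equal to a && value and a || value for every input.

```go
package main

import "fmt"

// andForm is the "if a { x = value } else { x = a }" shape.
func andForm(a, value bool) bool {
	var x bool
	if a {
		x = value
	} else {
		x = a
	}
	return x
}

// orForm is the "if a { x = a } else { x = value }" shape.
func orForm(a, value bool) bool {
	var x bool
	if a {
		x = a
	} else {
		x = value
	}
	return x
}

func main() {
	for _, a := range []bool{false, true} {
		for _, v := range []bool{false, true} {
			fmt.Println(a, v, andForm(a, v), orForm(a, v))
		}
	}
}
```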
compilecmp linux/amd64:
runtime
runtime.lock2 847 -> 869 (+2.60%)
runtime.addspecial 542 -> 517 (-4.61%)
runtime.tracebackPCs changed
runtime.scanstack changed
runtime.mallocinit changed
runtime.traceback2 2238 -> 2206 (-1.43%)
runtime [cmd/compile]
runtime.lock2 860 -> 882 (+2.56%)
runtime.scanstack changed
runtime.addspecial 542 -> 517 (-4.61%)
runtime.traceback2 2238 -> 2206 (-1.43%)
runtime.lockWithRank 870 -> 890 (+2.30%)
runtime.tracebackPCs changed
runtime.mallocinit changed
strconv
strconv.ryuFtoaFixed32 changed
strconv.ryuFtoaFixed64 639 -> 638 (-0.16%)
strconv.readFloat changed
strconv.ryuFtoaShortest changed
strings
strings.(*Replacer).build changed
strconv [cmd/compile]
strconv.readFloat changed
strconv.ryuFtoaFixed64 639 -> 638 (-0.16%)
strconv.ryuFtoaFixed32 changed
strconv.ryuFtoaShortest changed
strings [cmd/compile]
strings.(*Replacer).build changed
regexp
regexp.makeOnePass.func1 changed
regexp [cmd/compile]
regexp.makeOnePass.func1 changed
encoding/json
encoding/json.indirect changed
database/sql
database/sql.driverArgsConnLocked changed
vendor/golang.org/x/text/unicode/norm
vendor/golang.org/x/text/unicode/norm.Form.transform changed
go/doc/comment
go/doc/comment.parseSpans changed
internal/diff
internal/diff.tgs changed
log/slog
log/slog.(*handleState).appendNonBuiltIns 1898 -> 1877 (-1.11%)
testing/fstest
testing/fstest.(*fsTester).checkGlob changed
runtime/pprof
runtime/pprof.(*profileBuilder).build changed
cmd/internal/dwarf
cmd/internal/dwarf.isEmptyInlinedCall 254 -> 244 (-3.94%)
go/printer
go/printer.keepTypeColumn 302 -> 270 (-10.60%)
go/printer.(*printer).binaryExpr changed
cmd/compile/internal/syntax
cmd/compile/internal/syntax.(*scanner).rune changed
cmd/compile/internal/syntax.(*scanner).number 2137 -> 2153 (+0.75%)
Change-Id: I7f95f54b03a35d0b616c40f38b415a7feb71be73
Reviewed-on: https://go-review.googlesource.com/c/go/+/666835
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Jakub Ciolek <jakub@ciolek.dev>
TryBot-Bypass: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Use an automatic algorithm to generate strength reduction code.
You give it all the linear combination (a*x+b*y) instructions in your
architecture, it figures out the rest.
Just amd64 and arm64 for now.
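As an illustration only (this is not the generator's algorithm), a multiply by 45 can be built from two linear combinations of the form x + k*y. The helper lin and the choice of 45 are assumptions made for the example.

```go
package main

import "fmt"

// lin models one linear-combination instruction computing a*x + b*y.
// It is a hypothetical stand-in for whatever the architecture provides.
func lin(a, x, b, y int64) int64 {
	return a*x + b*y
}

// mul45 computes 45*x as two chained linear combinations:
// t = x + 8*x = 9x, then t + 4*t = 45x.
func mul45(x int64) int64 {
	t := lin(1, x, 8, x)
	return lin(1, t, 4, t)
}

func main() {
	fmt.Println(mul45(7), 45*7)
}
```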
Fixes #67575
Change-Id: I35c69382bebb1d2abf4bb4e7c43fd8548c6c59a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/626998
Reviewed-by: Jakub Ciolek <jakub@ciolek.dev>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
For riscv64/rva22u64 and above, we can intrinsify math/bits.OnesCount
using the CPOP/CPOPW machine instructions. Since the native Go
implementation of OnesCount is relatively expensive, it is also
worth emitting a check for Zbb support when compiled for rva20u64.
On a Banana Pi F3, with GORISCV64=rva22u64:
│ oc.1 │ oc.2 │
│ sec/op │ sec/op vs base │
OnesCount-8 16.930n ± 0% 4.389n ± 0% -74.08% (p=0.000 n=10)
OnesCount8-8 5.642n ± 0% 5.016n ± 0% -11.10% (p=0.000 n=10)
OnesCount16-8 9.404n ± 0% 5.015n ± 0% -46.67% (p=0.000 n=10)
OnesCount32-8 13.165n ± 0% 4.388n ± 0% -66.67% (p=0.000 n=10)
OnesCount64-8 16.300n ± 0% 4.388n ± 0% -73.08% (p=0.000 n=10)
geomean 11.40n 4.629n -59.40%
On a Banana Pi F3, compiled with GORISCV64=rva20u64 and with Zbb
detection enabled:
│ oc.3 │ oc.4 │
│ sec/op │ sec/op vs base │
OnesCount-8 16.930n ± 0% 5.643n ± 0% -66.67% (p=0.000 n=10)
OnesCount8-8 5.642n ± 0% 5.642n ± 0% ~ (p=0.447 n=10)
OnesCount16-8 10.030n ± 0% 6.896n ± 0% -31.25% (p=0.000 n=10)
OnesCount32-8 13.170n ± 0% 5.642n ± 0% -57.16% (p=0.000 n=10)
OnesCount64-8 16.300n ± 0% 5.642n ± 0% -65.39% (p=0.000 n=10)
geomean 11.55n 5.873n -49.16%
On a Banana Pi F3, compiled with GORISCV64=rva20u64 but with Zbb
detection disabled:
│ oc.3 │ oc.5 │
│ sec/op │ sec/op vs base │
OnesCount-8 16.93n ± 0% 29.47n ± 0% +74.07% (p=0.000 n=10)
OnesCount8-8 5.642n ± 0% 5.643n ± 0% ~ (p=0.191 n=10)
OnesCount16-8 10.03n ± 0% 15.05n ± 0% +50.05% (p=0.000 n=10)
OnesCount32-8 13.17n ± 0% 18.18n ± 0% +38.04% (p=0.000 n=10)
OnesCount64-8 16.30n ± 0% 21.94n ± 0% +34.60% (p=0.000 n=10)
geomean 11.55n 15.84n +37.16%
For hardware without Zbb, this adds ~5ns overhead, while for hardware
with Zbb we achieve a performance gain of up to 11ns. It is worth
noting that OnesCount8 is cheap enough that it is preferable to stick
with the generic version in this case.
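The runtime check can be pictured roughly like this. The hasZbb flag and both function names are hypothetical stand-ins; the compiler emits the check and the instruction itself rather than a Go-level branch.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hasZbb is a hypothetical flag standing in for CPU feature detection.
var hasZbb = false

// popcountGeneric is a simple portable bit count used as the fallback here.
func popcountGeneric(x uint64) int {
	n := 0
	for x != 0 {
		x &= x - 1 // clear the lowest set bit
		n++
	}
	return n
}

// onesCount picks a path based on the feature flag. bits.OnesCount64
// stands in for the intrinsified path.
func onesCount(x uint64) int {
	if hasZbb {
		return bits.OnesCount64(x)
	}
	return popcountGeneric(x)
}

func main() {
	fmt.Println(onesCount(0xF0F0), bits.OnesCount64(0xF0F0))
}
```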
Change-Id: Id657e40e0dd1b1ab8cc0fe0f8a68df4c9f2d7da5
Reviewed-on: https://go-review.googlesource.com/c/go/+/660856
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
For riscv64/rva22u64 and above, we can intrinsify math/bits.Bswap
using the REV8 machine instruction.
On a StarFive VisionFive 2 with GORISCV64=rva22u64:
│ rb.1 │ rb.2 │
│ sec/op │ sec/op vs base │
ReverseBytes-4 18.790n ± 0% 4.026n ± 0% -78.57% (p=0.000 n=10)
ReverseBytes16-4 6.710n ± 0% 5.368n ± 0% -20.00% (p=0.000 n=10)
ReverseBytes32-4 13.420n ± 0% 5.368n ± 0% -60.00% (p=0.000 n=10)
ReverseBytes64-4 17.450n ± 0% 4.026n ± 0% -76.93% (p=0.000 n=10)
geomean 13.11n 4.649n -64.54%
Change-Id: I26eee34270b1721f7304bb1cddb0fda129b20ece
Reviewed-on: https://go-review.googlesource.com/c/go/+/660855
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
|
|
The full 64x64->128 multiply comes up when using bits.Mul64.
The 64x64->64+overflow multiply comes up in unsafe.Slice when using
a constant length.
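For orientation, these are the two source-level shapes mentioned, using made-up function names. The example only shows where a multiply by a constant appears. It makes no claim about the exact rewrite the compiler applies.

```go
package main

import (
	"fmt"
	"math/bits"
	"unsafe"
)

// mulBy8 is a full 64x64->128 multiply by a constant. For a power of
// two the result equals a pair of shifts.
func mulBy8(x uint64) (hi, lo uint64) {
	return bits.Mul64(x, 8)
}

// firstFour uses unsafe.Slice with a constant length, which implies a
// size computation of element size times length.
func firstFour(p *int64) []int64 {
	return unsafe.Slice(p, 4)
}

func main() {
	hi, lo := mulBy8(0xF000000000000001)
	fmt.Printf("%#x %#x\n", hi, lo)
	arr := [4]int64{1, 2, 3, 4}
	fmt.Println(firstFour(&arr[0]))
}
```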
Change-Id: I298515162ca07d804b2d699d03bc957ca30a4ebc
Reviewed-on: https://go-review.googlesource.com/c/go/+/667175
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
If the thing we're ranging over is an array or ptr to array, and
it doesn't have a function call or channel receive in it, then we
shouldn't evaluate it.
Typecheck the ranged-over value as a constant in that case.
That makes the unified exporter replace the range expression
with a constant int.
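A small sketch of the language rule behind this: ranging over a pointer to an array with at most one iteration variable does not evaluate the operand when it has no call or receive, so even a nil pointer is fine. With a function call in the operand, the call still happens once. The names are hypothetical.

```go
package main

import "fmt"

var calls int

// getArr has a side effect so that evaluation of the range operand is visible.
func getArr() *[4]int {
	calls++
	return &[4]int{}
}

// countNil ranges over a nil array pointer; len is the constant 4 and
// the pointer is never dereferenced.
func countNil() int {
	var p *[4]int
	n := 0
	for range p {
		n++
	}
	return n
}

// countCall ranges over a call result, which must still be evaluated.
func countCall() int {
	n := 0
	for range getArr() {
		n++
	}
	return n
}

func main() {
	fmt.Println(countNil(), countCall(), calls)
}
```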
Change-Id: I0d4ea081de70d20cf6d1fa8d25ef6cb021975554
Reviewed-on: https://go-review.googlesource.com/c/go/+/659317
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
|
|
goos: linux
goarch: loong64
pkg: unicode/utf8
cpu: Loongson-3A6000-HV @ 2500.00MHz
│ old │ new │
│ sec/op │ sec/op vs base │
ValidTenASCIIChars 7.604n ± 0% 6.805n ± 0% -10.51% (p=0.000 n=10)
Valid100KASCIIChars 37.41µ ± 0% 16.58µ ± 0% -55.67% (p=0.000 n=10)
ValidTenJapaneseChars 60.84n ± 0% 58.62n ± 0% -3.64% (p=0.000 n=10)
ValidLongMostlyASCII 113.5µ ± 0% 113.5µ ± 0% ~ (p=0.303 n=10)
ValidLongJapanese 204.6µ ± 0% 206.8µ ± 0% +1.07% (p=0.000 n=10)
ValidStringTenASCIIChars 7.604n ± 0% 6.803n ± 0% -10.53% (p=0.000 n=10)
ValidString100KASCIIChars 38.05µ ± 0% 17.14µ ± 0% -54.97% (p=0.000 n=10)
ValidStringTenJapaneseChars 60.58n ± 0% 59.48n ± 0% -1.82% (p=0.000 n=10)
ValidStringLongMostlyASCII 113.5µ ± 0% 113.4µ ± 0% -0.10% (p=0.000 n=10)
ValidStringLongJapanese 205.9µ ± 0% 207.3µ ± 0% +0.67% (p=0.000 n=10)
geomean 3.324µ 2.756µ -17.08%
Change-Id: Id43b6e2e41907bd4b92f421dacde31f048db47d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/662495
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Improve the compiler's store-to-load forwarding optimization by relaxing the
type comparison condition. Instead of requiring exact type equality (CMPeq),
we now use copyCompatibleType which allows forwarding between compatible
types where safe.
Fix several size comparison bugs in the nested store patterns. Previously,
we were comparing the size of the outer store with the load type,
rather than comparing with the size of the actual store being forwarded
from.
Skip OpConvert in dead store elimination to help get rid of dead stores such
as zeroing slices. OpConvert, like OpInlMark, doesn't really use the memory.
This optimization is particularly beneficial for code that creates slices with
computed pointers, such as the runtime's heapBitsSlice function, where
intermediate calculations were previously causing the compiler to miss
store-to-load forwarding opportunities.
Local sweet run result on an x86_64 laptop:
│ Orig.res │ Hopt.res │
│ sec/op │ sec/op vs base │
BiogoIgor-8 5.303 ± 1% 5.322 ± 1% ~ (p=0.190 n=10)
BiogoKrishna-8 7.894 ± 1% 7.828 ± 2% ~ (p=0.190 n=10)
BleveIndexBatch100-8 2.257 ± 1% 2.248 ± 2% ~ (p=0.529 n=10)
EtcdPut-8 30.12m ± 1% 30.03m ± 1% ~ (p=0.796 n=10)
EtcdSTM-8 127.1m ± 1% 126.2m ± 0% -0.74% (p=0.023 n=10)
GoBuildKubelet-8 52.21 ± 0% 52.05 ± 1% ~ (p=0.063 n=10)
GoBuildKubeletLink-8 4.342 ± 1% 4.305 ± 0% -0.85% (p=0.000 n=10)
GoBuildIstioctl-8 43.33 ± 0% 43.24 ± 0% -0.22% (p=0.015 n=10)
GoBuildIstioctlLink-8 4.604 ± 1% 4.598 ± 0% ~ (p=0.063 n=10)
GoBuildFrontend-8 15.33 ± 0% 15.29 ± 0% ~ (p=0.143 n=10)
GoBuildFrontendLink-8 740.0m ± 1% 737.7m ± 1% ~ (p=0.912 n=10)
GopherLuaKNucleotide-8 9.590 ± 1% 9.656 ± 1% ~ (p=0.165 n=10)
MarkdownRenderXHTML-8 96.97m ± 1% 97.26m ± 2% ~ (p=0.105 n=10)
Tile38QueryLoad-8 335.9µ ± 1% 335.6µ ± 1% ~ (p=0.481 n=10)
geomean 1.336 1.333 -0.22%
Change-Id: I031552623e6d5a3b1b5be8325e6314706e45534f
Reviewed-on: https://go-review.googlesource.com/c/go/+/662075
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Optimise more branches with zero on riscv64. In particular, BLTU with
zero occurs with IsInBounds checks for index zero. This currently results
in two instructions and requires an additional register:
li t2, 0
bltu t2, t1, 0x174b4
This is equivalent to checking if the bounds is not equal to zero. With
this change:
bnez t1, 0x174c0
This removes more than 500 instructions from the Go binary on riscv64.
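A source-level example of where such a check arises: indexing element zero only requires that the length be non-zero. The function name is hypothetical.

```go
package main

import "fmt"

// first indexes element zero. The bounds check is 0 < len(s) as an
// unsigned comparison, which is the same as len(s) != 0.
func first(s []int) int {
	return s[0]
}

func main() {
	fmt.Println(first([]int{7, 8, 9}))
}
```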
Change-Id: I6cd861d853e3ef270bd46dacecdfaa205b1c4644
Reviewed-on: https://go-review.googlesource.com/c/go/+/606715
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
All other files here use the codegen package.
Change-Id: I714162941b9fa9051dacc29643e905fe60b9304b
Reviewed-on: https://go-review.googlesource.com/c/go/+/661135
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
This adds tests for type conversion and shifts, detailing various
instances of poor code generation that currently exist for riscv64. These
will be addressed in future CLs.
Change-Id: Ie1d366dfe878832df691600f8500ef383da92848
Reviewed-on: https://go-review.googlesource.com/c/go/+/615678
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
For riscv64/rva22u64 and above, we can intrinsify math/bits.Len using the
CLZ/CLZW machine instructions.
On a StarFive VisionFive 2 with GORISCV64=rva22u64:
│ clz.b.1 │ clz.b.2 │
│ sec/op │ sec/op vs base │
LeadingZeros-4 28.89n ± 0% 12.08n ± 0% -58.19% (p=0.000 n=10)
LeadingZeros8-4 18.79n ± 0% 14.76n ± 0% -21.45% (p=0.000 n=10)
LeadingZeros16-4 25.27n ± 0% 14.76n ± 0% -41.59% (p=0.000 n=10)
LeadingZeros32-4 25.12n ± 0% 12.08n ± 0% -51.92% (p=0.000 n=10)
LeadingZeros64-4 25.89n ± 0% 12.08n ± 0% -53.35% (p=0.000 n=10)
geomean 24.55n 13.09n -46.70%
Change-Id: I0dda684713dbdf5336af393f5ccbdae861c4f694
Reviewed-on: https://go-review.googlesource.com/c/go/+/652321
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
For riscv64/rva22u64 and above, we can intrinsify math/bits.TrailingZeros
using the CTZ/CTZW machine instructions.
On a StarFive VisionFive 2 with GORISCV64=rva22u64:
│ ctz.b.1 │ ctz.b.2 │
│ sec/op │ sec/op vs base │
TrailingZeros-4 25.500n ± 0% 8.052n ± 0% -68.42% (p=0.000 n=10)
TrailingZeros8-4 14.76n ± 0% 10.74n ± 0% -27.24% (p=0.000 n=10)
TrailingZeros16-4 26.84n ± 0% 10.74n ± 0% -59.99% (p=0.000 n=10)
TrailingZeros32-4 25.500n ± 0% 8.052n ± 0% -68.42% (p=0.000 n=10)
TrailingZeros64-4 25.500n ± 0% 8.052n ± 0% -68.42% (p=0.000 n=10)
geomean 23.09n 9.035n -60.88%
Change-Id: I71edf2b988acb7a68e797afda4ee66d7a57d587e
Reviewed-on: https://go-review.googlesource.com/c/go/+/652320
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
|
|
Use NEGW to produce a negated and sign extended word, rather than doing
the same via two instructions:
neg t0, t0
sext.w a0, t0
Becomes:
negw t0, t0
Change-Id: I824ab25001bd3304bdbd435e7b244fcc036ef212
Reviewed-on: https://go-review.googlesource.com/c/go/+/652319
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
On riscv64, subtraction from a constant is typically implemented as an
ADDI with the negative constant, followed by a negation. However this can
lead to multiple NEG/ADDI/NEG sequences that can be optimised out.
For example, runtime.(*_panic).nextDefer currently contains:
lbu t0, 0(t0)
addi t0, t0, -8
neg t0, t0
addi t0, t0, -7
neg t0, t0
Which is now optimised to:
lbu t0, 0(t0)
addi t0, t0, -1
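The arithmetic behind the sequence above is 7 - (8 - x) = x - 1. A hypothetical Go expression with that shape follows; the actual source in nextDefer is not shown in the message.

```go
package main

import "fmt"

// nested subtracts from a constant twice. Algebraically this is
// int(x) - 1, which is what the optimised sequence computes.
func nested(x uint8) int {
	return 7 - (8 - int(x))
}

func main() {
	fmt.Println(nested(10), int(10)-1)
}
```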
Change-Id: Idf5815e6db2e3705cc4a4811ca9130a064ae3d80
Reviewed-on: https://go-review.googlesource.com/c/go/+/652318
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Codify the current code generation used on riscv64 in this case.
Change-Id: If4152e3652fc19d0aa28b79dba08abee2486d5ae
Reviewed-on: https://go-review.googlesource.com/c/go/+/652317
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Codify the current riscv64 code generation for various subtract from
constant and addition/subtraction tests.
Change-Id: I54ad923280a0578a338bc4431fa5bdc0644c4729
Reviewed-on: https://go-review.googlesource.com/c/go/+/652316
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Tests that exist for riscv64/rva22u64 should also be applied to
riscv64/rva23u64.
Change-Id: Ia529fdf0ac55b8bcb3dcd24fa80efef2351f3842
Reviewed-on: https://go-review.googlesource.com/c/go/+/652315
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Make the TrailingZeros64 code generation check more specific for 386.
Just checking for BSFL will match both the generic 64 bit decomposition
and the custom 386 lowering.
Change-Id: I62076f1889af0ef1f29704cba01ab419cae0c6e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/656996
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
The compiler previously avoided the use of MOVUPS on plan9/amd64. This
was changed in CL 655875, however the codegen tests were not updated
and now fail (seemingly the full codegen tests do not run anywhere,
not even on the longtest builders).
Change-Id: I388b60e7b0911048d4949c5029347f9801c018a9
Reviewed-on: https://go-review.googlesource.com/c/go/+/656997
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Auto-Submit: Keith Randall <khr@google.com>
|
|
Use the shiftIsBounded function to generate more efficient shift instructions.
This change also optimizes shift ops when the shift value is v&63 or v&31.
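For illustration, these are shifts whose count is masked so that it is always in range for the operand width. The function names are hypothetical.

```go
package main

import "fmt"

// shl64 masks the count to 0..63, so the shift is always in range for
// a 64-bit operand.
func shl64(x uint64, s uint) uint64 {
	return x << (s & 63)
}

// shr32 masks the count to 0..31 for a 32-bit operand.
func shr32(x uint32, s uint) uint32 {
	return x >> (s & 31)
}

func main() {
	fmt.Println(shl64(1, 65), shr32(0x80000000, 33))
}
```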
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000-HV @ 2500.00MHz
| CL 627855 | this CL |
| sec/op | sec/op vs base |
LeadingZeros 1.1005n ± 0% 0.8425n ± 1% -23.44% (p=0.000 n=10)
LeadingZeros8 1.502n ± 0% 1.501n ± 0% -0.07% (p=0.001 n=10)
LeadingZeros16 1.502n ± 0% 1.501n ± 0% -0.07% (p=0.000 n=10)
LeadingZeros32 0.9511n ± 0% 0.8050n ± 0% -15.36% (p=0.000 n=10)
LeadingZeros64 1.1195n ± 0% 0.8423n ± 0% -24.76% (p=0.000 n=10)
TrailingZeros 0.8086n ± 0% 0.8005n ± 0% -1.00% (p=0.000 n=10)
TrailingZeros8 1.031n ± 1% 1.035n ± 1% ~ (p=0.136 n=10)
TrailingZeros16 0.8114n ± 0% 0.8254n ± 1% +1.73% (p=0.000 n=10)
TrailingZeros32 0.8090n ± 0% 0.8005n ± 0% -1.05% (p=0.000 n=10)
TrailingZeros64 0.8089n ± 1% 0.8005n ± 0% -1.04% (p=0.000 n=10)
OnesCount 0.8677n ± 0% 1.2010n ± 0% +38.41% (p=0.000 n=10)
OnesCount8 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.000 n=10)
OnesCount16 0.9344n ± 0% 1.2010n ± 0% +28.53% (p=0.000 n=10)
OnesCount32 0.8677n ± 0% 1.2010n ± 0% +38.41% (p=0.000 n=10)
OnesCount64 1.2010n ± 0% 0.8671n ± 0% -27.80% (p=0.000 n=10)
RotateLeft 0.8009n ± 0% 0.6671n ± 0% -16.71% (p=0.000 n=10)
RotateLeft8 1.202n ± 0% 1.327n ± 0% +10.40% (p=0.000 n=10)
RotateLeft16 0.8036n ± 0% 0.8218n ± 0% +2.26% (p=0.000 n=10)
RotateLeft32 0.6674n ± 0% 0.8004n ± 0% +19.94% (p=0.000 n=10)
RotateLeft64 0.6674n ± 0% 0.8004n ± 0% +19.94% (p=0.000 n=10)
Reverse 0.4067n ± 1% 0.4122n ± 1% +1.38% (p=0.001 n=10)
Reverse8 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.000 n=10)
Reverse16 0.8009n ± 0% 0.8005n ± 0% -0.05% (p=0.000 n=10)
Reverse32 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.001 n=10)
Reverse64 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.008 n=10)
ReverseBytes 0.4057n ± 1% 0.4133n ± 1% +1.90% (p=0.000 n=10)
ReverseBytes16 0.8009n ± 0% 0.8004n ± 0% -0.07% (p=0.000 n=10)
ReverseBytes32 0.8009n ± 0% 0.8005n ± 0% -0.05% (p=0.000 n=10)
ReverseBytes64 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.000 n=10)
Add 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Add32 1.201n ± 0% 1.201n ± 0% ~ (p=0.474 n=10)
Add64 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Add64multiple 1.832n ± 0% 1.828n ± 0% -0.22% (p=0.001 n=10)
Sub 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Sub32 1.602n ± 0% 1.601n ± 0% -0.06% (p=0.000 n=10)
Sub64 1.201n ± 0% 1.201n ± 0% ~ (p=0.474 n=10)
Sub64multiple 2.402n ± 0% 2.400n ± 0% -0.10% (p=0.000 n=10)
Mul 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.000 n=10)
Mul32 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.000 n=10)
Mul64 0.8008n ± 0% 0.8004n ± 0% -0.05% (p=0.000 n=10)
Div 9.083n ± 0% 7.638n ± 0% -15.91% (p=0.000 n=10)
Div32 4.011n ± 0% 4.009n ± 0% -0.05% (p=0.000 n=10)
Div64 9.711n ± 0% 8.204n ± 0% -15.51% (p=0.000 n=10)
geomean 1.083n 1.078n -0.40%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
| CL 627855 | this CL |
| sec/op | sec/op vs base |
LeadingZeros 1.341n ± 4% 1.331n ± 2% -0.71% (p=0.008 n=10)
LeadingZeros8 1.781n ± 0% 1.766n ± 1% -0.84% (p=0.011 n=10)
LeadingZeros16 1.782n ± 0% 1.767n ± 0% -0.79% (p=0.001 n=10)
LeadingZeros32 1.341n ± 1% 1.333n ± 0% -0.52% (p=0.001 n=10)
LeadingZeros64 1.338n ± 0% 1.333n ± 0% -0.37% (p=0.008 n=10)
TrailingZeros 0.9025n ± 0% 0.8077n ± 0% -10.50% (p=0.000 n=10)
TrailingZeros8 1.056n ± 0% 1.089n ± 1% +3.17% (p=0.001 n=10)
TrailingZeros16 1.101n ± 0% 1.102n ± 0% +0.09% (p=0.011 n=10)
TrailingZeros32 0.9024n ± 1% 0.8083n ± 0% -10.43% (p=0.000 n=10)
TrailingZeros64 0.9028n ± 1% 0.8087n ± 0% -10.43% (p=0.000 n=10)
OnesCount 1.482n ± 1% 1.302n ± 0% -12.15% (p=0.000 n=10)
OnesCount8 1.206n ± 0% 1.207n ± 2% +0.12% (p=0.000 n=10)
OnesCount16 1.534n ± 0% 1.402n ± 0% -8.58% (p=0.000 n=10)
OnesCount32 1.531n ± 1% 1.302n ± 0% -14.99% (p=0.000 n=10)
OnesCount64 1.302n ± 0% 1.538n ± 1% +18.16% (p=0.000 n=10)
RotateLeft 0.8083n ± 0% 0.8087n ± 1% ~ (p=0.579 n=10)
RotateLeft8 1.310n ± 0% 1.323n ± 0% +0.95% (p=0.001 n=10)
RotateLeft16 1.149n ± 0% 1.165n ± 1% +1.35% (p=0.001 n=10)
RotateLeft32 0.8093n ± 0% 0.8105n ± 0% ~ (p=0.393 n=10)
RotateLeft64 0.8088n ± 0% 0.8090n ± 0% ~ (p=0.739 n=10)
Reverse 0.5109n ± 0% 0.5172n ± 1% +1.25% (p=0.000 n=10)
Reverse8 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.000 n=10)
Reverse16 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.002 n=10)
Reverse32 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.000 n=10)
Reverse64 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.005 n=10)
ReverseBytes 0.5122n ± 2% 0.5182n ± 1% ~ (p=0.060 n=10)
ReverseBytes16 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.005 n=10)
ReverseBytes32 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.005 n=10)
ReverseBytes64 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.001 n=10)
Add 1.201n ± 4% 1.202n ± 0% +0.08% (p=0.028 n=10)
Add32 1.201n ± 0% 1.202n ± 2% +0.08% (p=0.014 n=10)
Add64 1.201n ± 1% 1.202n ± 0% +0.08% (p=0.025 n=10)
Add64multiple 1.902n ± 0% 1.913n ± 0% +0.55% (p=0.004 n=10)
Sub 1.201n ± 0% 1.202n ± 3% +0.08% (p=0.001 n=10)
Sub32 1.654n ± 0% 1.656n ± 1% ~ (p=0.117 n=10)
Sub64 1.201n ± 0% 1.202n ± 0% +0.08% (p=0.001 n=10)
Sub64multiple 2.180n ± 4% 2.159n ± 1% -0.96% (p=0.006 n=10)
Mul 0.9345n ± 0% 0.9346n ± 0% +0.01% (p=0.000 n=10)
Mul32 1.030n ± 0% 1.050n ± 1% +1.94% (p=0.000 n=10)
Mul64 0.9345n ± 0% 0.9346n ± 1% +0.01% (p=0.000 n=10)
Div 11.57n ± 1% 11.12n ± 0% -3.85% (p=0.000 n=10)
Div32 4.337n ± 1% 4.341n ± 1% ~ (p=0.286 n=10)
Div64 12.76n ± 0% 12.02n ± 3% -5.80% (p=0.000 n=10)
geomean 1.252n 1.235n -1.32%
Change-Id: Iec4cfd2b83bb0f946068c1d657369ff081d95b04
Reviewed-on: https://go-review.googlesource.com/c/go/+/628575
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000-HV @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
LeadingZeros 1.100n ± 1% 1.101n ± 0% ~ (p=0.566 n=10)
LeadingZeros8 1.501n ± 0% 1.502n ± 0% +0.07% (p=0.000 n=10)
LeadingZeros16 1.501n ± 0% 1.502n ± 0% +0.07% (p=0.000 n=10)
LeadingZeros32 1.2010n ± 0% 0.9511n ± 0% -20.81% (p=0.000 n=10)
LeadingZeros64 1.104n ± 1% 1.119n ± 0% +1.40% (p=0.000 n=10)
TrailingZeros 0.8137n ± 0% 0.8086n ± 0% -0.63% (p=0.001 n=10)
TrailingZeros8 1.031n ± 1% 1.031n ± 1% ~ (p=0.956 n=10)
TrailingZeros16 0.8204n ± 1% 0.8114n ± 0% -1.11% (p=0.000 n=10)
TrailingZeros32 0.8145n ± 0% 0.8090n ± 0% -0.68% (p=0.000 n=10)
TrailingZeros64 0.8159n ± 0% 0.8089n ± 1% -0.86% (p=0.000 n=10)
OnesCount 0.8672n ± 0% 0.8677n ± 0% +0.06% (p=0.000 n=10)
OnesCount8 0.8005n ± 0% 0.8009n ± 0% +0.06% (p=0.000 n=10)
OnesCount16 0.9339n ± 0% 0.9344n ± 0% +0.05% (p=0.000 n=10)
OnesCount32 0.8672n ± 0% 0.8677n ± 0% +0.06% (p=0.000 n=10)
OnesCount64 1.201n ± 0% 1.201n ± 0% ~ (p=0.474 n=10)
RotateLeft 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10)
RotateLeft8 1.202n ± 0% 1.202n ± 0% ~ (p=0.210 n=10)
RotateLeft16 0.8050n ± 0% 0.8036n ± 0% -0.17% (p=0.002 n=10)
RotateLeft32 0.6674n ± 0% 0.6674n ± 0% ~ (p=1.000 n=10)
RotateLeft64 0.6673n ± 0% 0.6674n ± 0% ~ (p=0.072 n=10)
Reverse 0.4123n ± 0% 0.4067n ± 1% -1.37% (p=0.000 n=10)
Reverse8 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10)
Reverse16 0.8004n ± 0% 0.8009n ± 0% +0.06% (p=0.000 n=10)
Reverse32 0.8004n ± 0% 0.8009n ± 0% +0.06% (p=0.000 n=10)
Reverse64 0.8004n ± 0% 0.8009n ± 0% +0.06% (p=0.001 n=10)
ReverseBytes 0.4100n ± 1% 0.4057n ± 1% -1.06% (p=0.002 n=10)
ReverseBytes16 0.8004n ± 0% 0.8009n ± 0% +0.07% (p=0.000 n=10)
ReverseBytes32 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10)
ReverseBytes64 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10)
Add 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Add32 1.201n ± 0% 1.201n ± 0% ~ (p=0.474 n=10)
Add64 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Add64multiple 1.831n ± 0% 1.832n ± 0% ~ (p=1.000 n=10)
Sub 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Sub32 1.601n ± 0% 1.602n ± 0% +0.06% (p=0.000 n=10)
Sub64 1.201n ± 0% 1.201n ± 0% ~ (p=0.474 n=10)
Sub64multiple 2.400n ± 0% 2.402n ± 0% +0.10% (p=0.000 n=10)
Mul 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10)
Mul32 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10)
Mul64 0.8004n ± 0% 0.8008n ± 0% +0.05% (p=0.000 n=10)
Div 9.107n ± 0% 9.083n ± 0% ~ (p=0.255 n=10)
Div32 4.009n ± 0% 4.011n ± 0% +0.05% (p=0.000 n=10)
Div64 9.705n ± 0% 9.711n ± 0% +0.06% (p=0.000 n=10)
geomean 1.089n 1.083n -0.62%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
LeadingZeros 1.352n ± 0% 1.341n ± 4% -0.81% (p=0.024 n=10)
LeadingZeros8 1.766n ± 0% 1.781n ± 0% +0.88% (p=0.000 n=10)
LeadingZeros16 1.766n ± 0% 1.782n ± 0% +0.88% (p=0.000 n=10)
LeadingZeros32 1.536n ± 0% 1.341n ± 1% -12.73% (p=0.000 n=10)
LeadingZeros64 1.351n ± 1% 1.338n ± 0% -0.96% (p=0.000 n=10)
TrailingZeros 0.9037n ± 0% 0.9025n ± 0% -0.12% (p=0.020 n=10)
TrailingZeros8 1.087n ± 3% 1.056n ± 0% ~ (p=0.060 n=10)
TrailingZeros16 1.101n ± 0% 1.101n ± 0% ~ (p=0.211 n=10)
TrailingZeros32 0.9040n ± 0% 0.9024n ± 1% -0.18% (p=0.017 n=10)
TrailingZeros64 0.9043n ± 0% 0.9028n ± 1% ~ (p=0.118 n=10)
OnesCount 1.503n ± 2% 1.482n ± 1% -1.43% (p=0.001 n=10)
OnesCount8 1.207n ± 0% 1.206n ± 0% -0.12% (p=0.000 n=10)
OnesCount16 1.501n ± 0% 1.534n ± 0% +2.13% (p=0.000 n=10)
OnesCount32 1.483n ± 1% 1.531n ± 1% +3.27% (p=0.000 n=10)
OnesCount64 1.301n ± 0% 1.302n ± 0% +0.08% (p=0.000 n=10)
RotateLeft 0.8136n ± 4% 0.8083n ± 0% -0.66% (p=0.002 n=10)
RotateLeft8 1.311n ± 0% 1.310n ± 0% ~ (p=0.786 n=10)
RotateLeft16 1.165n ± 0% 1.149n ± 0% -1.33% (p=0.001 n=10)
RotateLeft32 0.8138n ± 1% 0.8093n ± 0% -0.57% (p=0.017 n=10)
RotateLeft64 0.8149n ± 1% 0.8088n ± 0% -0.74% (p=0.000 n=10)
Reverse 0.5195n ± 1% 0.5109n ± 0% -1.67% (p=0.000 n=10)
Reverse8 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.000 n=10)
Reverse16 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.000 n=10)
Reverse32 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.012 n=10)
Reverse64 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.010 n=10)
ReverseBytes 0.5120n ± 1% 0.5122n ± 2% ~ (p=0.306 n=10)
ReverseBytes16 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.000 n=10)
ReverseBytes32 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.000 n=10)
ReverseBytes64 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.000 n=10)
Add 1.201n ± 0% 1.201n ± 4% ~ (p=0.334 n=10)
Add32 1.201n ± 0% 1.201n ± 0% ~ (p=0.563 n=10)
Add64 1.201n ± 0% 1.201n ± 1% ~ (p=0.652 n=10)
Add64multiple 1.909n ± 0% 1.902n ± 0% ~ (p=0.126 n=10)
Sub 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Sub32 1.655n ± 0% 1.654n ± 0% ~ (p=0.589 n=10)
Sub64 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Sub64multiple 2.150n ± 0% 2.180n ± 4% +1.37% (p=0.000 n=10)
Mul 0.9341n ± 0% 0.9345n ± 0% +0.04% (p=0.011 n=10)
Mul32 1.053n ± 0% 1.030n ± 0% -2.23% (p=0.000 n=10)
Mul64 0.9341n ± 0% 0.9345n ± 0% +0.04% (p=0.018 n=10)
Div 11.59n ± 0% 11.57n ± 1% ~ (p=0.091 n=10)
Div32 4.337n ± 0% 4.337n ± 1% ~ (p=0.783 n=10)
Div64 12.81n ± 0% 12.76n ± 0% -0.39% (p=0.001 n=10)
geomean 1.257n 1.252n -0.46%
Change-Id: I9e93ea49736760c19dc6b6463d2aa95878121b7b
Reviewed-on: https://go-review.googlesource.com/c/go/+/627855
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
|
|
Decompose Ctz16 and Ctz8 within the SSA rules for LOONG64, MIPS, PPC64
and S390X, rather than having a custom intrinsic. Note that for PPC64 this
actually allows the existing Ctz16 and Ctz8 rules to be used.
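One common way to decompose the narrow operations, shown here as an assumption rather than as the exact rules in the CL, is to zero-extend the value and set a guard bit just above its width, so that a zero input yields the narrow width. The helper names ctz16 and ctz8 are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// ctz16 computes TrailingZeros16 using only a 32-bit count. The guard bit
// at position 16 makes an all-zero input return 16.
func ctz16(x uint16) int {
	return bits.TrailingZeros32(uint32(x) | 1<<16)
}

// ctz8 is the same idea with a guard bit at position 8.
func ctz8(x uint8) int {
	return bits.TrailingZeros32(uint32(x) | 1<<8)
}

// decompositionHolds checks both helpers against math/bits for every input.
func decompositionHolds() bool {
	for i := 0; i < 1<<16; i++ {
		if ctz16(uint16(i)) != bits.TrailingZeros16(uint16(i)) {
			return false
		}
	}
	for i := 0; i < 1<<8; i++ {
		if ctz8(uint8(i)) != bits.TrailingZeros8(uint8(i)) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(ctz16(0), ctz8(0), decompositionHolds())
}
```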
Change-Id: I27a5e978f852b9d75396d2a80f5d7dfcb5ef7dd4
Reviewed-on: https://go-review.googlesource.com/c/go/+/651816
Reviewed-by: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
While looking at the SSA of the following code, I noticed
that these rules do not work properly, and the types
are loaded indirectly through an itab, instead of statically.
type M interface{ M() }
type A interface{ A() }
type Impl struct{}
func (*Impl) M() {}
func (*Impl) A() {}
func main() {
	var a M = &Impl{}
	a.(A).A()
}
Change-Id: Ia275993f81a2e7302102d4ff87ac28586023d13c
GitHub-Last-Rev: 4bfc9019172929d0b0f1c8a1b7eb28cdbc9b87ef
GitHub-Pull-Request: golang/go#71784
Reviewed-on: https://go-review.googlesource.com/c/go/+/649500
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
x < 128 -> x <= 127
x >= 128 -> x > 127
This allows for shorter encoding as 127 fits into
a single-byte immediate.
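A small sketch of why the rewrite is safe for integers (function names are hypothetical). On amd64 an 8-bit immediate is sign-extended and covers -128..127, so 127 fits while 128 needs a wider immediate.

```go
package main

import "fmt"

// The two forms in each pair are equivalent for integer operands.
func below128(x int32) bool   { return x < 128 }
func atMost127(x int32) bool  { return x <= 127 }
func atLeast128(x int32) bool { return x >= 128 }
func above127(x int32) bool   { return x > 127 }

// formsAgree checks both pairs over a range that straddles the boundary.
func formsAgree() bool {
	for x := int32(-1000); x <= 1000; x++ {
		if below128(x) != atMost127(x) || atLeast128(x) != above127(x) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(formsAgree())
}
```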
archive/tar benchmark (Alder Lake 12600K)
name old time/op new time/op delta
/Writer/USTAR-16 1.46µs ± 0% 1.32µs ± 0% -9.43% (p=0.008 n=5+5)
/Writer/GNU-16 1.85µs ± 1% 1.79µs ± 1% -3.47% (p=0.008 n=5+5)
/Writer/PAX-16 3.21µs ± 0% 3.11µs ± 2% -2.96% (p=0.008 n=5+5)
/Reader/USTAR-16 1.38µs ± 1% 1.37µs ± 0% ~ (p=0.127 n=5+4)
/Reader/GNU-16 798ns ± 1% 800ns ± 2% ~ (p=0.548 n=5+5)
/Reader/PAX-16 3.07µs ± 1% 3.00µs ± 0% -2.35% (p=0.008 n=5+5)
[Geo mean] 1.76µs 1.70µs -3.15%
compilecmp:
hash/maphash
hash/maphash.(*Hash).Write 517 -> 510 (-1.35%)
runtime
runtime.traceReadCPU 1626 -> 1615 (-0.68%)
runtime [cmd/compile]
runtime.traceReadCPU 1626 -> 1615 (-0.68%)
math/rand/v2
type:.eq.[128]float32 65 -> 59 (-9.23%)
bytes
bytes.trimLeftUnicode 378 -> 373 (-1.32%)
bytes.IndexAny 1189 -> 1157 (-2.69%)
bytes.LastIndexAny 1256 -> 1239 (-1.35%)
bytes.lastIndexFunc 263 -> 261 (-0.76%)
strings
strings.FieldsFuncSeq.func1 411 -> 399 (-2.92%)
strings.EqualFold 625 -> 624 (-0.16%)
strings.trimLeftUnicode 248 -> 231 (-6.85%)
math/rand
type:.eq.[128]float32 65 -> 59 (-9.23%)
bytes [cmd/compile]
bytes.LastIndexAny 1256 -> 1239 (-1.35%)
bytes.lastIndexFunc 263 -> 261 (-0.76%)
bytes.trimLeftUnicode 378 -> 373 (-1.32%)
bytes.IndexAny 1189 -> 1157 (-2.69%)
regexp/syntax
regexp/syntax.(*parser).parseEscape 1113 -> 1102 (-0.99%)
math/rand/v2 [cmd/compile]
type:.eq.[128]float32 65 -> 59 (-9.23%)
strings [cmd/compile]
strings.EqualFold 625 -> 624 (-0.16%)
strings.FieldsFuncSeq.func1 411 -> 399 (-2.92%)
strings.trimLeftUnicode 248 -> 231 (-6.85%)
math/rand [cmd/compile]
type:.eq.[128]float32 65 -> 59 (-9.23%)
regexp
regexp.(*inputString).context 198 -> 197 (-0.51%)
regexp.(*inputBytes).context 221 -> 212 (-4.07%)
image/jpeg
image/jpeg.(*decoder).processDQT 500 -> 491 (-1.80%)
regexp/syntax [cmd/compile]
regexp/syntax.(*parser).parseEscape 1113 -> 1102 (-0.99%)
regexp [cmd/compile]
regexp.(*inputString).context 198 -> 197 (-0.51%)
regexp.(*inputBytes).context 221 -> 212 (-4.07%)
encoding/csv
encoding/csv.(*Writer).fieldNeedsQuotes 269 -> 266 (-1.12%)
cmd/vendor/golang.org/x/sys/unix
type:.eq.[131]struct 855 -> 823 (-3.74%)
vendor/golang.org/x/text/unicode/norm
vendor/golang.org/x/text/unicode/norm.nextDecomposed 4831 -> 4826 (-0.10%)
vendor/golang.org/x/text/unicode/norm.(*Iter).returnSlice 281 -> 275 (-2.14%)
vendor/golang.org/x/text/secure/bidirule
vendor/golang.org/x/text/secure/bidirule.init.0 85 -> 83 (-2.35%)
go/scanner
go/scanner.isDigit 100 -> 98 (-2.00%)
go/scanner.(*Scanner).next 431 -> 422 (-2.09%)
go/scanner.isLetter 142 -> 124 (-12.68%)
encoding/asn1
encoding/asn1.parseTagAndLength 1189 -> 1182 (-0.59%)
encoding/asn1.makeField 3481 -> 3463 (-0.52%)
text/scanner
text/scanner.(*Scanner).next 1242 -> 1236 (-0.48%)
archive/tar
archive/tar.isASCII 133 -> 127 (-4.51%)
archive/tar.(*Writer).writeRawFile 1206 -> 1198 (-0.66%)
archive/tar.(*Reader).readHeader.func1 9 -> 7 (-22.22%)
archive/tar.toASCII 393 -> 383 (-2.54%)
archive/tar.splitUSTARPath 405 -> 396 (-2.22%)
archive/tar.(*Writer).writePAXHeader.func1 627 -> 620 (-1.12%)
text/template
text/template.jsIsSpecial 59 -> 57 (-3.39%)
go/doc
go/doc.assumedPackageName 714 -> 701 (-1.82%)
vendor/golang.org/x/net/http/httpguts
vendor/golang.org/x/net/http/httpguts.headerValueContainsToken 965 -> 952 (-1.35%)
vendor/golang.org/x/net/http/httpguts.tokenEqual 280 -> 269 (-3.93%)
vendor/golang.org/x/net/http/httpguts.IsTokenRune 28 -> 26 (-7.14%)
net/mail
net/mail.isVchar 26 -> 24 (-7.69%)
net/mail.isAtext 106 -> 104 (-1.89%)
net/mail.(*Address).String 1084 -> 1052 (-2.95%)
net/mail.isQtext 39 -> 37 (-5.13%)
net/mail.isMultibyte 9 -> 7 (-22.22%)
net/mail.isDtext 45 -> 43 (-4.44%)
net/mail.(*addrParser).consumeQuotedString 1050 -> 1029 (-2.00%)
net/mail.quoteString 741 -> 714 (-3.64%)
cmd/internal/obj/s390x
cmd/internal/obj/s390x.preprocess 6405 -> 6393 (-0.19%)
cmd/internal/obj/x86
cmd/internal/obj/x86.toDisp8 303 -> 301 (-0.66%)
fmt [cmd/compile]
fmt.Fprintf 4726 -> 4662 (-1.35%)
go/scanner [cmd/compile]
go/scanner.(*Scanner).next 431 -> 422 (-2.09%)
go/scanner.isLetter 142 -> 124 (-12.68%)
go/scanner.isDigit 100 -> 98 (-2.00%)
cmd/compile/internal/syntax
cmd/compile/internal/syntax.(*source).nextch 879 -> 847 (-3.64%)
cmd/vendor/golang.org/x/mod/module
cmd/vendor/golang.org/x/mod/module.checkElem 1253 -> 1235 (-1.44%)
cmd/vendor/golang.org/x/mod/module.escapeString 519 -> 517 (-0.39%)
go/doc [cmd/compile]
go/doc.assumedPackageName 714 -> 701 (-1.82%)
cmd/compile/internal/syntax [cmd/compile]
cmd/compile/internal/syntax.(*scanner).escape 1965 -> 1933 (-1.63%)
cmd/compile/internal/syntax.(*scanner).next 8975 -> 8847 (-1.43%)
cmd/internal/obj/s390x [cmd/compile]
cmd/internal/obj/s390x.preprocess 6405 -> 6393 (-0.19%)
cmd/internal/obj/x86 [cmd/compile]
cmd/internal/obj/x86.toDisp8 303 -> 301 (-0.66%)
cmd/internal/gcprog
cmd/internal/gcprog.(*Writer).Repeat 688 -> 677 (-1.60%)
cmd/internal/gcprog.(*Writer).varint 442 -> 439 (-0.68%)
cmd/compile/internal/ir
cmd/compile/internal/ir.splitPkg 331 -> 325 (-1.81%)
cmd/compile/internal/ir [cmd/compile]
cmd/compile/internal/ir.splitPkg 331 -> 325 (-1.81%)
net/http
net/http.containsDotDot.FieldsFuncSeq.func1 411 -> 399 (-2.92%)
net/http.isNotToken 33 -> 30 (-9.09%)
net/http.containsDotDot 606 -> 588 (-2.97%)
net/http.isCookieNameValid 197 -> 191 (-3.05%)
net/http.parsePattern 4330 -> 4317 (-0.30%)
net/http.ParseCookie 1099 -> 1096 (-0.27%)
net/http.validMethod 197 -> 187 (-5.08%)
cmd/vendor/golang.org/x/text/unicode/norm
cmd/vendor/golang.org/x/text/unicode/norm.(*Iter).returnSlice 281 -> 275 (-2.14%)
cmd/vendor/golang.org/x/text/unicode/norm.nextDecomposed 4831 -> 4826 (-0.10%)
net/http/cookiejar
net/http/cookiejar.encode 1936 -> 1918 (-0.93%)
expvar
expvar.appendJSONQuote 972 -> 965 (-0.72%)
cmd/cgo/internal/test
cmd/cgo/internal/test.stack128 116 -> 114 (-1.72%)
cmd/vendor/rsc.io/markdown
cmd/vendor/rsc.io/markdown.newATXHeading 1637 -> 1628 (-0.55%)
cmd/vendor/rsc.io/markdown.isUnicodePunct 197 -> 179 (-9.14%)
Change-Id: I578bdf42ef229d687d526e378d697ced51e1880c
Reviewed-on: https://go-review.googlesource.com/c/go/+/639935
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Fixes #71759
Change-Id: Iab05294ac933cc9972949158d3fe2bdc3073df5e
Reviewed-on: https://go-review.googlesource.com/c/go/+/649895
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
It currently isn't because it does load/store/load/store/...
Rework to do overwrite processing in pairs so it is instead
load/load/store/store/...
Change-Id: If7be629bc4048da5f2386dafb8f05759b79e9e2b
Reviewed-on: https://go-review.googlesource.com/c/go/+/631495
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Particularly with 2-word load instructions, this becomes important.
Classic example is:
func f(p *string) string {
	return *p
}
We want the two loads to put the return values directly into
the two ABI return registers.
At this point in the stack, cmd/go is 1.1% smaller.
Change-Id: I51fd1710238e81d15aab2bfb816d73c8e7c207b1
Reviewed-on: https://go-review.googlesource.com/c/go/+/631137
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Look for possible paired load/store operations on arm64.
I don't expect this to be much faster, but it will save
binary space and, indirectly through the icache, at least a bit
of time.
Change-Id: I4dd73b0e6329c4659b7453998f9b75320fcf380b
Reviewed-on: https://go-review.googlesource.com/c/go/+/629256
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
If we call slicebytetostring immediately (with no intervening writes)
before calling map access or delete functions with the resulting
string as the key, then we can just use the ptr/len of the
slicebytetostring argument as the key. This avoids an allocation.
Fixes #44898
Update #71132
There's old code in cmd/compile/internal/walk/order.go that handles
some of these cases.
1. m[string(b)]
2. s := string(b); m[s]
3. m[[2]string{string(b1),string(b2)}]
The old code handled cases 1&3. The new code handles cases 1&2.
We'll leave the old code around to keep 3 working, although it seems
not terribly common.
Case 2 happens particularly after inlining, so it is pretty common.
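The three cases can be written out as follows. This is only a sketch with hypothetical function names; whether a given lookup avoids the allocation depends on the toolchain version, so the printed allocation count is not asserted here.

```go
package main

import (
	"fmt"
	"testing"
)

// Case 1: the conversion appears directly in the index expression.
func lookupDirect(m map[string]int, b []byte) int {
	return m[string(b)]
}

// Case 2: the conversion is stored in a local first, with no intervening
// writes to b before the lookup. This shape often appears after inlining.
func lookupViaLocal(m map[string]int, b []byte) int {
	s := string(b)
	return m[s]
}

// Case 3: the converted strings are elements of a composite key.
func lookupArrayKey(m map[[2]string]int, b1, b2 []byte) int {
	return m[[2]string{string(b1), string(b2)}]
}

func main() {
	key := "a fairly long key used for the map lookup example"
	m := map[string]int{key: 7}
	b := []byte(key)
	fmt.Println(lookupDirect(m, b), lookupViaLocal(m, b))

	// Report allocations for case 2; the value varies by Go version.
	allocs := testing.AllocsPerRun(100, func() { _ = lookupViaLocal(m, b) })
	fmt.Println("allocs per lookup via local:", allocs)
}
```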
Change-Id: I8913226ca79d2c65f4e2bd69a38ac8c976a57e43
Reviewed-on: https://go-review.googlesource.com/c/go/+/640656
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
There is a generic opcode for FMA, but we don't use it in rewrite rules.
This may be because some architectures, like WASM and MIPS, don't have a
late lowering rule for it.
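For background, a fused multiply-add rounds once instead of twice, which is why fusing x*y + z can change results as well as speed. The sketch below assumes the change concerns expressions of that shape; it uses math.FMA for the fused form and an explicit float64 conversion, which the Go spec says prevents fusion, for the unfused form. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// fused computes x*y + z with a single rounding.
func fused(x, y, z float64) float64 {
	return math.FMA(x, y, z)
}

// unfused rounds the product first; the explicit conversion prevents the
// compiler from fusing the multiply and the add.
func unfused(x, y, z float64) float64 {
	return float64(x*y) + z
}

func main() {
	x := 1 + math.Ldexp(1, -30) // 1 + 2^-30
	y := 1 - math.Ldexp(1, -30) // 1 - 2^-30
	// The exact product is 1 - 2^-60, which rounds to 1 in float64.
	fmt.Println(fused(x, y, -1))   // keeps the -2^-60 term
	fmt.Println(unfused(x, y, -1)) // the term is lost to rounding: 0
}
```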
Fixes #71204
Intel Alder Lake 12600k (GOAMD64=v3):
math:
name old time/op new time/op delta
Acos-16 4.58ns ± 0% 3.36ns ± 0% -26.68% (p=0.008 n=5+5)
Acosh-16 8.04ns ± 1% 6.44ns ± 0% -19.95% (p=0.008 n=5+5)
Asin-16 4.28ns ± 0% 3.32ns ± 0% -22.24% (p=0.008 n=5+5)
Asinh-16 9.92ns ± 0% 8.62ns ± 0% -13.13% (p=0.008 n=5+5)
Atan-16 2.31ns ± 0% 1.84ns ± 0% -20.02% (p=0.008 n=5+5)
Atanh-16 7.79ns ± 0% 7.03ns ± 0% -9.67% (p=0.008 n=5+5)
Atan2-16 3.93ns ± 0% 3.52ns ± 0% -10.35% (p=0.000 n=5+4)
Cbrt-16 4.62ns ± 0% 4.41ns ± 0% -4.57% (p=0.016 n=4+5)
Ceil-16 0.14ns ± 1% 0.14ns ± 2% ~ (p=0.103 n=5+5)
Copysign-16 0.33ns ± 0% 0.33ns ± 0% +0.03% (p=0.029 n=4+4)
Cos-16 4.87ns ± 0% 4.75ns ± 0% -2.44% (p=0.016 n=5+4)
Cosh-16 4.86ns ± 0% 4.86ns ± 0% ~ (p=0.317 n=5+5)
Erf-16 2.71ns ± 0% 2.25ns ± 0% -16.69% (p=0.008 n=5+5)
Erfc-16 3.06ns ± 0% 2.67ns ± 0% -13.00% (p=0.016 n=5+4)
Erfinv-16 3.88ns ± 0% 2.84ns ± 3% -26.83% (p=0.008 n=5+5)
Erfcinv-16 4.08ns ± 0% 3.01ns ± 1% -26.27% (p=0.008 n=5+5)
Exp-16 3.29ns ± 0% 3.37ns ± 2% +2.64% (p=0.016 n=4+5)
ExpGo-16 8.44ns ± 0% 7.48ns ± 1% -11.37% (p=0.008 n=5+5)
Expm1-16 4.46ns ± 0% 3.69ns ± 2% -17.26% (p=0.016 n=4+5)
Exp2-16 8.20ns ± 0% 7.39ns ± 2% -9.94% (p=0.008 n=5+5)
Exp2Go-16 8.26ns ± 0% 7.23ns ± 0% -12.49% (p=0.016 n=4+5)
Abs-16 0.26ns ± 3% 0.22ns ± 1% -16.34% (p=0.008 n=5+5)
Dim-16 0.38ns ± 1% 0.40ns ± 2% +5.02% (p=0.008 n=5+5)
Floor-16 0.11ns ± 1% 0.17ns ± 4% +54.99% (p=0.008 n=5+5)
Max-16 1.24ns ± 0% 1.24ns ± 0% ~ (p=0.619 n=5+5)
Min-16 1.24ns ± 0% 1.24ns ± 0% ~ (p=0.484 n=5+5)
Mod-16 13.4ns ± 1% 12.8ns ± 0% -4.21% (p=0.016 n=5+4)
Frexp-16 1.70ns ± 0% 1.71ns ± 0% +0.46% (p=0.008 n=5+5)
Gamma-16 3.97ns ± 0% 3.97ns ± 0% ~ (p=0.643 n=5+5)
Hypot-16 2.11ns ± 0% 2.11ns ± 0% ~ (p=0.762 n=5+5)
HypotGo-16 2.48ns ± 4% 2.26ns ± 0% -8.94% (p=0.008 n=5+5)
Ilogb-16 1.67ns ± 0% 1.67ns ± 0% -0.07% (p=0.048 n=5+5)
J0-16 19.8ns ± 0% 19.3ns ± 0% ~ (p=0.079 n=4+5)
J1-16 19.4ns ± 0% 18.9ns ± 0% -2.63% (p=0.000 n=5+4)
Jn-16 41.5ns ± 0% 40.6ns ± 0% -2.32% (p=0.016 n=4+5)
Ldexp-16 2.26ns ± 0% 2.26ns ± 0% ~ (p=0.683 n=5+5)
Lgamma-16 4.40ns ± 0% 4.21ns ± 0% -4.21% (p=0.008 n=5+5)
Log-16 4.05ns ± 0% 4.05ns ± 0% ~ (all equal)
Logb-16 1.69ns ± 0% 1.69ns ± 0% ~ (p=0.429 n=5+5)
Log1p-16 5.00ns ± 0% 3.99ns ± 0% -20.14% (p=0.008 n=5+5)
Log10-16 4.22ns ± 0% 4.21ns ± 0% -0.15% (p=0.008 n=5+5)
Log2-16 2.27ns ± 0% 2.25ns ± 0% -0.94% (p=0.008 n=5+5)
Modf-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.492 n=5+5)
Nextafter32-16 2.09ns ± 0% 2.09ns ± 0% ~ (p=0.079 n=4+5)
Nextafter64-16 2.09ns ± 0% 2.09ns ± 0% ~ (p=0.095 n=4+5)
PowInt-16 10.8ns ± 0% 10.8ns ± 0% ~ (all equal)
PowFrac-16 25.3ns ± 0% 25.3ns ± 0% -0.09% (p=0.000 n=5+4)
Pow10Pos-16 0.52ns ± 1% 0.52ns ± 0% ~ (p=0.810 n=5+5)
Pow10Neg-16 0.82ns ± 0% 0.82ns ± 0% ~ (p=0.381 n=5+5)
Round-16 0.93ns ± 0% 0.93ns ± 0% ~ (p=0.056 n=5+5)
RoundToEven-16 1.64ns ± 0% 1.64ns ± 0% ~ (all equal)
Remainder-16 12.4ns ± 2% 12.0ns ± 0% -3.27% (p=0.008 n=5+5)
Signbit-16 0.37ns ± 0% 0.37ns ± 0% -0.19% (p=0.008 n=5+5)
Sin-16 4.04ns ± 0% 3.92ns ± 0% -3.13% (p=0.000 n=4+5)
Sincos-16 5.99ns ± 0% 5.80ns ± 0% -3.03% (p=0.008 n=5+5)
Sinh-16 5.22ns ± 0% 5.22ns ± 0% ~ (p=0.651 n=5+4)
SqrtIndirect-16 0.41ns ± 0% 0.41ns ± 0% ~ (p=0.333 n=4+5)
SqrtLatency-16 2.66ns ± 0% 2.66ns ± 0% ~ (p=0.079 n=4+5)
SqrtIndirectLatency-16 2.66ns ± 0% 2.66ns ± 0% ~ (p=1.000 n=5+5)
SqrtGoLatency-16 30.1ns ± 0% 28.6ns ± 1% -4.84% (p=0.008 n=5+5)
SqrtPrime-16 645ns ± 0% 645ns ± 0% ~ (p=0.095 n=5+4)
Tan-16 4.21ns ± 0% 4.09ns ± 0% -2.76% (p=0.029 n=4+4)
Tanh-16 5.36ns ± 0% 5.36ns ± 0% ~ (p=0.444 n=5+5)
Trunc-16 0.12ns ± 6% 0.11ns ± 1% -6.79% (p=0.008 n=5+5)
Y0-16 19.2ns ± 0% 18.7ns ± 0% -2.52% (p=0.000 n=5+4)
Y1-16 19.1ns ± 0% 18.4ns ± 0% ~ (p=0.079 n=4+5)
Yn-16 40.7ns ± 0% 39.5ns ± 0% -2.82% (p=0.008 n=5+5)
Float64bits-16 0.21ns ± 0% 0.21ns ± 0% ~ (p=0.603 n=5+5)
Float64frombits-16 0.21ns ± 0% 0.21ns ± 0% ~ (p=0.984 n=4+5)
Float32bits-16 0.21ns ± 0% 0.21ns ± 0% ~ (p=0.778 n=4+5)
Float32frombits-16 0.21ns ± 0% 0.20ns ± 0% ~ (p=0.397 n=5+5)
FMA-16 0.82ns ± 0% 0.82ns ± 0% +0.02% (p=0.029 n=4+4)
[Geo mean] 2.87ns 2.74ns -4.61%
math/cmplx:
name old time/op new time/op delta
Abs-16 2.07ns ± 0% 2.05ns ± 0% -0.70% (p=0.016 n=5+4)
Acos-16 36.5ns ± 0% 35.7ns ± 0% -2.33% (p=0.029 n=4+4)
Acosh-16 37.0ns ± 0% 36.2ns ± 0% -2.20% (p=0.008 n=5+5)
Asin-16 36.5ns ± 0% 35.7ns ± 0% -2.29% (p=0.008 n=5+5)
Asinh-16 33.5ns ± 0% 31.6ns ± 0% -5.51% (p=0.008 n=5+5)
Atan-16 15.5ns ± 0% 13.9ns ± 0% -10.61% (p=0.008 n=5+5)
Atanh-16 15.0ns ± 0% 13.6ns ± 0% -9.73% (p=0.008 n=5+5)
Conj-16 0.11ns ± 5% 0.11ns ± 1% ~ (p=0.421 n=5+5)
Cos-16 12.3ns ± 0% 12.2ns ± 0% -0.60% (p=0.000 n=4+5)
Cosh-16 12.1ns ± 0% 12.0ns ± 0% ~ (p=0.079 n=4+5)
Exp-16 10.0ns ± 0% 9.8ns ± 0% -1.77% (p=0.008 n=5+5)
Log-16 14.5ns ± 0% 13.7ns ± 0% -5.67% (p=0.008 n=5+5)
Log10-16 14.5ns ± 0% 13.7ns ± 0% -5.55% (p=0.000 n=5+4)
Phase-16 5.11ns ± 0% 4.25ns ± 0% -16.90% (p=0.008 n=5+5)
Polar-16 7.12ns ± 0% 6.35ns ± 0% -10.90% (p=0.008 n=5+5)
Pow-16 64.3ns ± 0% 63.7ns ± 0% -0.97% (p=0.008 n=5+5)
Rect-16 5.74ns ± 0% 5.58ns ± 0% -2.73% (p=0.016 n=4+5)
Sin-16 12.2ns ± 0% 12.2ns ± 0% -0.54% (p=0.000 n=4+5)
Sinh-16 12.1ns ± 0% 12.0ns ± 0% -0.58% (p=0.000 n=5+4)
Sqrt-16 5.30ns ± 0% 5.18ns ± 0% -2.36% (p=0.008 n=5+5)
Tan-16 22.7ns ± 0% 22.6ns ± 0% -0.33% (p=0.008 n=5+5)
Tanh-16 21.2ns ± 0% 20.9ns ± 0% -1.32% (p=0.008 n=5+5)
[Geo mean] 11.3ns 10.8ns -3.97%
Change-Id: Idcc4b357ba68477929c126289e5095b27a827b1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/646335
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
In addition to unsigned loads which already exist.
This helps code that does switches on strings to constant-fold
the switch away when the string being switched on is constant.
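A sketch of the kind of code this is meant to help, with hypothetical names. Once kind is inlined into alphaKind the switched-on string is a constant; whether the switch is then folded away entirely depends on the compiler version and its inlining decisions.

```go
package main

import "fmt"

// kind switches on a string value.
func kind(s string) int {
	switch s {
	case "alpha":
		return 1
	case "beta":
		return 2
	default:
		return 0
	}
}

// alphaKind passes a constant string, so after inlining the compiler can
// see both the switch cases and the value being compared.
func alphaKind() int {
	return kind("alpha")
}

func main() {
	fmt.Println(alphaKind(), kind("beta"), kind("other"))
}
```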
Fixes #71699
Change-Id: If3051af0f7255d2a573da6f96b153a987a7f159d
Reviewed-on: https://go-review.googlesource.com/c/go/+/649295
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@google.com>
|
|
We can just use == if the interface is direct.
Fixes #70738
Change-Id: Ia9a644791a370fec969c04c42d28a9b58f16911f
Reviewed-on: https://go-review.googlesource.com/c/go/+/635435
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
ADD(Q|L) generally has twice the throughput.
Came up in CL 626998.
Throughput by arch (reciprocal, in cycles per instruction; lower is better):
Zen 4:
SHLL (R64, 1): 0.5
ADD (R64, R64): 0.25
Intel Alder Lake:
SHLL (R64, 1): 0.5
ADD (R64, R64): 0.2
Intel Haswell:
SHLL (R64, 1): 0.5
ADD (R64, R64): 0.25
Also include a minor opt for:
(x + x) << c -> x << (c + 1)
Before this, the code:
func addShift(x int64) int64 {
	return (x + x) << 1
}
emitted two instructions:
ADDQ AX, AX
SHLQ $1, AX
but we can do it in a single shift:
SHLQ $2, AX
Add a codegen test for clearing the last bit.
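Both rewrites rely on identities that hold for wrapping integer arithmetic. A minimal check, using hypothetical helper names alongside the addShift function from the message:

```go
package main

import (
	"fmt"
	"math"
)

func double(x int64) int64      { return x << 1 }
func doubleAdd(x int64) int64   { return x + x }
func addShift(x int64) int64    { return (x + x) << 1 }
func singleShift(x int64) int64 { return x << 2 }

// identitiesHold checks x<<1 == x+x and (x+x)<<1 == x<<2, including
// values where the addition overflows and wraps.
func identitiesHold() bool {
	for _, x := range []int64{0, 1, -1, 12345, math.MaxInt64, math.MinInt64} {
		if double(x) != doubleAdd(x) || addShift(x) != singleShift(x) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(identitiesHold())
}
```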
compilecmp linux/amd64:
math
math.sqrt 243 -> 242 (-0.41%)
math [cmd/compile]
math.sqrt 243 -> 242 (-0.41%)
runtime
runtime.selectgo 5455 -> 5445 (-0.18%)
runtime.sysargs 665 -> 662 (-0.45%)
runtime.isPinned 145 -> 141 (-2.76%)
runtime.atoi64 198 -> 194 (-2.02%)
runtime.setPinned 714 -> 709 (-0.70%)
runtime [cmd/compile]
runtime.sysargs 665 -> 662 (-0.45%)
runtime.setPinned 714 -> 709 (-0.70%)
runtime.atoi64 198 -> 194 (-2.02%)
runtime.isPinned 145 -> 141 (-2.76%)
strconv
strconv.computeBounds 109 -> 107 (-1.83%)
strconv.FormatInt 201 -> 197 (-1.99%)
strconv.ryuFtoaShortest 1298 -> 1266 (-2.47%)
strconv.small 144 -> 134 (-6.94%)
strconv.AppendInt 357 -> 344 (-3.64%)
strconv.ryuDigits32 490 -> 488 (-0.41%)
strconv.AppendUint 342 -> 340 (-0.58%)
strconv [cmd/compile]
strconv.FormatInt 201 -> 197 (-1.99%)
strconv.ryuFtoaShortest 1298 -> 1266 (-2.47%)
strconv.ryuDigits32 490 -> 488 (-0.41%)
strconv.AppendUint 342 -> 340 (-0.58%)
strconv.computeBounds 109 -> 107 (-1.83%)
strconv.small 144 -> 134 (-6.94%)
strconv.AppendInt 357 -> 344 (-3.64%)
image
image.Rectangle.Inset 101 -> 97 (-3.96%)
regexp/syntax
regexp/syntax.inCharClass.func1 111 -> 110 (-0.90%)
regexp/syntax.(*compiler).quest 586 -> 573 (-2.22%)
regexp/syntax.ranges.Less 153 -> 150 (-1.96%)
regexp/syntax.(*compiler).loop 583 -> 568 (-2.57%)
time
time.Time.Before 179 -> 161 (-10.06%)
time.Time.Compare 189 -> 166 (-12.17%)
time.Time.Sub 444 -> 425 (-4.28%)
time.Time.UnixMicro 106 -> 95 (-10.38%)
time.div 592 -> 587 (-0.84%)
time.Time.UnixNano 85 -> 78 (-8.24%)
time.(*Time).UnixMilli 141 -> 140 (-0.71%)
time.Time.UnixMilli 106 -> 95 (-10.38%)
time.(*Time).UnixMicro 141 -> 140 (-0.71%)
time.Time.After 179 -> 161 (-10.06%)
time.Time.Equal 170 -> 150 (-11.76%)
time.Time.AppendBinary 766 -> 757 (-1.17%)
time.Time.IsZero 74 -> 66 (-10.81%)
time.(*Time).UnixNano 124 -> 113 (-8.87%)
time.(*Time).IsZero 113 -> 108 (-4.42%)
regexp
regexp.(*Regexp).FindAllStringSubmatch.func1 590 -> 569 (-3.56%)
regexp.QuoteMeta 485 -> 469 (-3.30%)
regexp/syntax [cmd/compile]
regexp/syntax.inCharClass.func1 111 -> 110 (-0.90%)
regexp/syntax.(*compiler).loop 583 -> 568 (-2.57%)
regexp/syntax.(*compiler).quest 586 -> 573 (-2.22%)
regexp/syntax.ranges.Less 153 -> 150 (-1.96%)
encoding/base64
encoding/base64.decodedLen 92 -> 90 (-2.17%)
encoding/base64.(*Encoding).DecodedLen 99 -> 97 (-2.02%)
time [cmd/compile]
time.(*Time).IsZero 113 -> 108 (-4.42%)
time.Time.IsZero 74 -> 66 (-10.81%)
time.(*Time).UnixNano 124 -> 113 (-8.87%)
time.Time.UnixMilli 106 -> 95 (-10.38%)
time.Time.Equal 170 -> 150 (-11.76%)
time.Time.UnixMicro 106 -> 95 (-10.38%)
time.(*Time).UnixMicro 141 -> 140 (-0.71%)
time.Time.Before 179 -> 161 (-10.06%)
time.Time.UnixNano 85 -> 78 (-8.24%)
time.Time.AppendBinary 766 -> 757 (-1.17%)
time.div 592 -> 587 (-0.84%)
time.Time.After 179 -> 161 (-10.06%)
time.Time.Compare 189 -> 166 (-12.17%)
time.(*Time).UnixMilli 141 -> 140 (-0.71%)
time.Time.Sub 444 -> 425 (-4.28%)
index/suffixarray
index/suffixarray.sais_8_32 1677 -> 1645 (-1.91%)
index/suffixarray.sais_32 1677 -> 1645 (-1.91%)
index/suffixarray.sais_64 1677 -> 1654 (-1.37%)
index/suffixarray.sais_8_64 1677 -> 1654 (-1.37%)
index/suffixarray.writeInt 249 -> 247 (-0.80%)
os
os.Expand 1070 -> 1051 (-1.78%)
os.Chtimes 787 -> 774 (-1.65%)
regexp [cmd/compile]
regexp.(*Regexp).FindAllStringSubmatch.func1 590 -> 569 (-3.56%)
regexp.QuoteMeta 485 -> 469 (-3.30%)
encoding/base64 [cmd/compile]
encoding/base64.decodedLen 92 -> 90 (-2.17%)
encoding/base64.(*Encoding).DecodedLen 99 -> 97 (-2.02%)
encoding/hex
encoding/hex.Encode 138 -> 136 (-1.45%)
encoding/hex.(*decoder).Read 830 -> 824 (-0.72%)
crypto/des
crypto/des.initFeistelBox 235 -> 229 (-2.55%)
crypto/des.cryptBlock 549 -> 538 (-2.00%)
os [cmd/compile]
os.Chtimes 787 -> 774 (-1.65%)
os.Expand 1070 -> 1051 (-1.78%)
math/big
math/big.newFloat 238 -> 223 (-6.30%)
math/big.nat.mul 2138 -> 2122 (-0.75%)
math/big.karatsubaSqr 1372 -> 1369 (-0.22%)
math/big.(*Float).sqrtInverse 895 -> 878 (-1.90%)
math/big.basicSqr 1032 -> 1017 (-1.45%)
cmd/vendor/golang.org/x/sys/unix
cmd/vendor/golang.org/x/sys/unix.TimeToTimespec 72 -> 66 (-8.33%)
encoding/json
encoding/json.Indent 404 -> 403 (-0.25%)
encoding/json.MarshalIndent 303 -> 297 (-1.98%)
testing
testing.(*T).Deadline 84 -> 82 (-2.38%)
testing.(*M).Run 3545 -> 3525 (-0.56%)
archive/zip
archive/zip.headerFileInfo.ModTime 229 -> 223 (-2.62%)
encoding/gob
encoding/gob.(*encoderState).encodeInt 474 -> 469 (-1.05%)
crypto/elliptic
crypto/elliptic.Marshal 728 -> 714 (-1.92%)
debug/buildinfo
debug/buildinfo.readString 325 -> 315 (-3.08%)
image/png
image/png.(*decoder).readImagePass 10866 -> 10834 (-0.29%)
archive/tar
archive/tar.Header.allowedFormats.func3 1768 -> 1736 (-1.81%)
archive/tar.formatPAXTime 389 -> 358 (-7.97%)
archive/tar.(*Writer).writeGNUHeader 741 -> 727 (-1.89%)
archive/tar.readGNUSparseMap0x1 709 -> 695 (-1.97%)
archive/tar.(*Writer).templateV7Plus 915 -> 909 (-0.66%)
crypto/internal/cryptotest
crypto/internal/cryptotest.TestHash.func4 890 -> 879 (-1.24%)
crypto/internal/cryptotest.TestStream.func6.1 646 -> 645 (-0.15%)
crypto/internal/cryptotest.testCipher.func3 1300 -> 1289 (-0.85%)
internal/pkgbits
internal/pkgbits.(*Encoder).Int64 113 -> 103 (-8.85%)
internal/pkgbits.(*Encoder).rawVarint 74 -> 72 (-2.70%)
testing/quick
testing/quick.(*Config).getRand 316 -> 315 (-0.32%)
log/slog
log/slog.TimeValue 489 -> 479 (-2.04%)
runtime/pprof
runtime/pprof.(*profileBuilder).build 2341 -> 2322 (-0.81%)
internal/coverage/cfile
internal/coverage/cfile.(*emitState).openMetaFile 824 -> 822 (-0.24%)
internal/coverage/cfile.(*emitState).openCounterFile 904 -> 892 (-1.33%)
cmd/internal/objabi
cmd/internal/objabi.expandArgs 1177 -> 1169 (-0.68%)
crypto/ecdsa
crypto/ecdsa.pointFromAffine 1162 -> 1144 (-1.55%)
net
net.minNonzeroTime 313 -> 308 (-1.60%)
net.cgoLookupAddrPTR 812 -> 797 (-1.85%)
net.(*IPNet).String 851 -> 827 (-2.82%)
net.IP.AppendText 488 -> 471 (-3.48%)
net.IPMask.String 281 -> 270 (-3.91%)
net.partialDeadline 374 -> 366 (-2.14%)
net.hexString 249 -> 240 (-3.61%)
net.IP.String 454 -> 453 (-0.22%)
internal/fuzz
internal/fuzz.newPcgRand 240 -> 234 (-2.50%)
crypto/x509
crypto/x509.(*Certificate).isValid 2642 -> 2611 (-1.17%)
cmd/internal/obj/s390x
cmd/internal/obj/s390x.buildop 33676 -> 33644 (-0.10%)
encoding/hex [cmd/compile]
encoding/hex.(*decoder).Read 830 -> 824 (-0.72%)
encoding/hex.Encode 138 -> 136 (-1.45%)
cmd/internal/objabi [cmd/compile]
cmd/internal/objabi.expandArgs 1177 -> 1169 (-0.68%)
math/big [cmd/compile]
math/big.(*Float).sqrtInverse 895 -> 878 (-1.90%)
math/big.nat.mul 2138 -> 2122 (-0.75%)
math/big.karatsubaSqr 1372 -> 1369 (-0.22%)
math/big.basicSqr 1032 -> 1017 (-1.45%)
math/big.newFloat 238 -> 223 (-6.30%)
encoding/json [cmd/compile]
encoding/json.MarshalIndent 303 -> 297 (-1.98%)
encoding/json.Indent 404 -> 403 (-0.25%)
cmd/covdata
main.(*metaMerge).emitCounters 985 -> 973 (-1.22%)
runtime/pprof [cmd/compile]
runtime/pprof.(*profileBuilder).build 2341 -> 2322 (-0.81%)
cmd/compile/internal/syntax
cmd/compile/internal/syntax.(*source).fill 722 -> 703 (-2.63%)
cmd/dist
main.runInstall 19081 -> 19049 (-0.17%)
crypto/tls
crypto/tls.extractPadding 176 -> 175 (-0.57%)
slices.Clone[[]crypto/tls.SignatureScheme,crypto/tls.SignatureScheme] 253 -> 247 (-2.37%)
slices.Clone[[]uint16,uint16] 253 -> 247 (-2.37%)
slices.Clone[[]crypto/tls.CurveID,crypto/tls.CurveID] 253 -> 247 (-2.37%)
crypto/tls.(*Config).cipherSuites 335 -> 326 (-2.69%)
slices.DeleteFunc[go.shape.[]crypto/tls.CurveID,go.shape.uint16] 437 -> 434 (-0.69%)
crypto/tls.dial 1349 -> 1339 (-0.74%)
slices.DeleteFunc[go.shape.[]uint16,go.shape.uint16] 437 -> 434 (-0.69%)
internal/pkgbits [cmd/compile]
internal/pkgbits.(*Encoder).Int64 113 -> 103 (-8.85%)
internal/pkgbits.(*Encoder).rawVarint 74 -> 72 (-2.70%)
cmd/compile/internal/syntax [cmd/compile]
cmd/compile/internal/syntax.(*source).fill 722 -> 703 (-2.63%)
cmd/internal/obj/s390x [cmd/compile]
cmd/internal/obj/s390x.buildop 33676 -> 33644 (-0.10%)
cmd/go/internal/trace
cmd/go/internal/trace.Flow 910 -> 886 (-2.64%)
cmd/go/internal/trace.(*Span).Done 311 -> 304 (-2.25%)
cmd/go/internal/trace.StartSpan 620 -> 615 (-0.81%)
cmd/internal/script
cmd/internal/script.(*Engine).Execute.func2 534 -> 528 (-1.12%)
cmd/link/internal/loader
cmd/link/internal/loader.(*Loader).SetSymSect 344 -> 338 (-1.74%)
net/http
net/http.(*Transport).queueForIdleConn 1797 -> 1766 (-1.73%)
net/http.(*Transport).getConn 2149 -> 2131 (-0.84%)
net/http.(*http2ClientConn).tooIdleLocked 207 -> 197 (-4.83%)
net/http.(*http2responseWriter).SetWriteDeadline.func1 520 -> 508 (-2.31%)
net/http.(*Cookie).Valid 837 -> 818 (-2.27%)
net/http.(*http2responseWriter).SetReadDeadline 373 -> 357 (-4.29%)
net/http.checkIfRange 701 -> 690 (-1.57%)
net/http.(*http2SettingsFrame).Value 325 -> 298 (-8.31%)
net/http.(*http2SettingsFrame).HasDuplicates 777 -> 767 (-1.29%)
net/http.(*Server).Serve 1746 -> 1739 (-0.40%)
net/http.http2traceGotConn 569 -> 556 (-2.28%)
net/http/pprof
net/http/pprof.collectProfile 242 -> 239 (-1.24%)
cmd/compile/internal/coverage
cmd/compile/internal/coverage.metaHashAndLen 439 -> 438 (-0.23%)
cmd/vendor/golang.org/x/telemetry/internal/upload
cmd/vendor/golang.org/x/telemetry/internal/upload.(*uploader).findWork 4570 -> 4540 (-0.66%)
cmd/vendor/golang.org/x/telemetry/internal/upload.(*uploader).reports 3604 -> 3572 (-0.89%)
cmd/compile/internal/coverage [cmd/compile]
cmd/compile/internal/coverage.metaHashAndLen 439 -> 438 (-0.23%)
cmd/vendor/golang.org/x/text/language
cmd/vendor/golang.org/x/text/language.regionGroupDist 287 -> 284 (-1.05%)
cmd/go/internal/vcweb
cmd/go/internal/vcweb.(*Server).overview.func1 1045 -> 1041 (-0.38%)
cmd/go/internal/vcs
cmd/go/internal/vcs.expand 761 -> 741 (-2.63%)
cmd/compile/internal/inline/inlheur
slices.stableCmpFunc[go.shape.struct 2300 -> 2284 (-0.70%)
cmd/compile/internal/inline/inlheur [cmd/compile]
slices.stableCmpFunc[go.shape.struct 2300 -> 2284 (-0.70%)
cmd/go/internal/modfetch/codehost
cmd/go/internal/modfetch/codehost.bzrParseStat 2217 -> 2213 (-0.18%)
cmd/link/internal/ld
cmd/link/internal/ld.decodetypeStructFieldCount 157 -> 152 (-3.18%)
cmd/link/internal/ld.(*Link).address 12559 -> 12495 (-0.51%)
cmd/link/internal/ld.(*dodataState).allocateDataSections 18345 -> 18205 (-0.76%)
cmd/link/internal/ld.elfshreloc 618 -> 616 (-0.32%)
cmd/link/internal/ld.(*deadcodePass).decodetypeMethods 794 -> 779 (-1.89%)
cmd/link/internal/ld.(*dodataState).assignDsymsToSection 668 -> 663 (-0.75%)
cmd/link/internal/ld.relocSectFn 285 -> 284 (-0.35%)
cmd/link/internal/ld.decodetypeIfaceMethodCount 146 -> 144 (-1.37%)
cmd/link/internal/ld.decodetypeArrayLen 157 -> 152 (-3.18%)
cmd/link/internal/arm64
cmd/link/internal/arm64.gensymlate.func1 895 -> 888 (-0.78%)
cmd/go/internal/modload
cmd/go/internal/modload.queryProxy.func3 1029 -> 1012 (-1.65%)
cmd/go/internal/load
cmd/go/internal/load.(*Package).setBuildInfo 8453 -> 8447 (-0.07%)
cmd/go/internal/clean
cmd/go/internal/clean.runClean 2120 -> 2104 (-0.75%)
cmd/compile/internal/ssa
cmd/compile/internal/ssa.(*poset).aliasnodes 2010 -> 1978 (-1.59%)
cmd/compile/internal/ssa.rewriteValueARM64_OpARM64MOVHstoreidx2 730 -> 719 (-1.51%)
cmd/compile/internal/ssa.(*debugState).buildLocationLists 3326 -> 3294 (-0.96%)
cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDLconst 3069 -> 2941 (-4.17%)
cmd/compile/internal/ssa.(*debugState).processValue 9756 -> 9724 (-0.33%)
cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDQconst 3069 -> 2941 (-4.17%)
cmd/compile/internal/ssa.(*poset).mergeroot 1079 -> 1054 (-2.32%)
cmd/compile/internal/ssa [cmd/compile]
cmd/compile/internal/ssa.rewriteValueARM64_OpARM64MOVHstoreidx2 730 -> 719 (-1.51%)
cmd/compile/internal/ssa.(*poset).aliasnodes 2010 -> 1978 (-1.59%)
cmd/compile/internal/ssa.(*poset).mergeroot 1079 -> 1054 (-2.32%)
cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDQconst 3069 -> 2941 (-4.17%)
cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDLconst 3069 -> 2941 (-4.17%)
file before after Δ %
math/bits.s 2352 2354 +2 +0.085%
math/bits [cmd/compile].s 2352 2354 +2 +0.085%
math.s 35675 35674 -1 -0.003%
math [cmd/compile].s 35675 35674 -1 -0.003%
runtime.s 577251 577245 -6 -0.001%
runtime [cmd/compile].s 642419 642438 +19 +0.003%
sort.s 37434 37435 +1 +0.003%
strconv.s 48391 48343 -48 -0.099%
sort [cmd/compile].s 37434 37435 +1 +0.003%
bufio.s 21386 21418 +32 +0.150%
strconv [cmd/compile].s 48391 48343 -48 -0.099%
image.s 34978 35022 +44 +0.126%
regexp/syntax.s 81719 81781 +62 +0.076%
time.s 94341 94184 -157 -0.166%
regexp.s 60411 60399 -12 -0.020%
bufio [cmd/compile].s 21512 21544 +32 +0.149%
encoding/binary.s 34062 34087 +25 +0.073%
regexp/syntax [cmd/compile].s 81719 81781 +62 +0.076%
encoding/base64.s 11907 11903 -4 -0.034%
time [cmd/compile].s 94341 94184 -157 -0.166%
index/suffixarray.s 41633 41527 -106 -0.255%
os.s 101770 101738 -32 -0.031%
regexp [cmd/compile].s 60411 60399 -12 -0.020%
encoding/binary [cmd/compile].s 37173 37198 +25 +0.067%
encoding/base64 [cmd/compile].s 11907 11903 -4 -0.034%
os/exec.s 23900 23907 +7 +0.029%
encoding/hex.s 6038 6030 -8 -0.132%
crypto/des.s 5073 5056 -17 -0.335%
os [cmd/compile].s 102030 101998 -32 -0.031%
vendor/golang.org/x/net/http2/hpack.s 22027 22033 +6 +0.027%
math/big.s 164808 164753 -55 -0.033%
cmd/vendor/golang.org/x/sys/unix.s 121450 121444 -6 -0.005%
encoding/json.s 110294 110287 -7 -0.006%
testing.s 115303 115281 -22 -0.019%
archive/zip.s 65329 65325 -4 -0.006%
os/user.s 10078 10080 +2 +0.020%
encoding/gob.s 143788 143783 -5 -0.003%
crypto/elliptic.s 30686 30704 +18 +0.059%
go/doc/comment.s 49401 49433 +32 +0.065%
debug/buildinfo.s 9095 9085 -10 -0.110%
image/png.s 36113 36081 -32 -0.089%
archive/tar.s 71994 71897 -97 -0.135%
crypto/internal/cryptotest.s 60872 60849 -23 -0.038%
internal/pkgbits.s 20441 20429 -12 -0.059%
testing/quick.s 8236 8235 -1 -0.012%
log/slog.s 77568 77558 -10 -0.013%
internal/trace/internal/oldtrace.s 52885 52896 +11 +0.021%
runtime/pprof.s 123978 123969 -9 -0.007%
internal/coverage/cfile.s 25198 25184 -14 -0.056%
cmd/internal/objabi.s 19954 19946 -8 -0.040%
crypto/ecdsa.s 29159 29141 -18 -0.062%
log/slog/internal/benchmarks.s 6694 6695 +1 +0.015%
net.s 299569 299503 -66 -0.022%
os/exec [cmd/compile].s 23888 23895 +7 +0.029%
internal/trace.s 179226 179240 +14 +0.008%
internal/fuzz.s 86190 86191 +1 +0.001%
crypto/x509.s 177195 177164 -31 -0.017%
cmd/internal/obj/s390x.s 121642 121610 -32 -0.026%
cmd/internal/obj/ppc64.s 140118 140122 +4 +0.003%
encoding/hex [cmd/compile].s 6149 6141 -8 -0.130%
cmd/internal/objabi [cmd/compile].s 19954 19946 -8 -0.040%
cmd/internal/obj/arm64.s 158523 158555 +32 +0.020%
go/doc/comment [cmd/compile].s 49512 49544 +32 +0.065%
math/big [cmd/compile].s 166394 166339 -55 -0.033%
encoding/json [cmd/compile].s 110712 110705 -7 -0.006%
cmd/covdata.s 39699 39687 -12 -0.030%
runtime/pprof [cmd/compile].s 125209 125200 -9 -0.007%
cmd/compile/internal/syntax.s 181755 181736 -19 -0.010%
cmd/dist.s 177893 177861 -32 -0.018%
crypto/tls.s 389157 389113 -44 -0.011%
internal/pkgbits [cmd/compile].s 41644 41632 -12 -0.029%
cmd/compile/internal/syntax [cmd/compile].s 196105 196086 -19 -0.010%
cmd/compile/internal/types.s 71315 71345 +30 +0.042%
cmd/internal/obj/s390x [cmd/compile].s 121733 121701 -32 -0.026%
cmd/go/internal/trace.s 4796 4760 -36 -0.751%
cmd/internal/obj/arm64 [cmd/compile].s 168120 168147 +27 +0.016%
cmd/internal/obj/ppc64 [cmd/compile].s 140219 140223 +4 +0.003%
cmd/internal/script.s 83442 83436 -6 -0.007%
cmd/link/internal/loader.s 93299 93294 -5 -0.005%
net/http.s 620639 620472 -167 -0.027%
net/http/pprof.s 35016 35013 -3 -0.009%
cmd/compile/internal/coverage.s 6668 6667 -1 -0.015%
cmd/vendor/golang.org/x/telemetry/internal/upload.s 34210 34148 -62 -0.181%
cmd/compile/internal/coverage [cmd/compile].s 6664 6663 -1 -0.015%
cmd/vendor/golang.org/x/text/language.s 48077 48074 -3 -0.006%
cmd/go/internal/vcweb.s 45193 45189 -4 -0.009%
cmd/go/internal/vcs.s 44749 44729 -20 -0.045%
cmd/compile/internal/inline/inlheur.s 83758 83742 -16 -0.019%
cmd/compile/internal/inline/inlheur [cmd/compile].s 84773 84757 -16 -0.019%
cmd/go/internal/modfetch/codehost.s 89098 89094 -4 -0.004%
cmd/trace.s 257550 257564 +14 +0.005%
cmd/link/internal/ld.s 641945 641706 -239 -0.037%
cmd/link/internal/arm64.s 34805 34798 -7 -0.020%
cmd/go/internal/modload.s 328971 328954 -17 -0.005%
cmd/go/internal/load.s 178877 178871 -6 -0.003%
cmd/go/internal/clean.s 11006 10990 -16 -0.145%
cmd/compile/internal/ssa.s 3552843 3553347 +504 +0.014%
cmd/compile/internal/ssa [cmd/compile].s 3752511 3753123 +612 +0.016%
total 36179015 36178687 -328 -0.001%
Change-Id: I251c2898ccf3c9931d162d87dabbd49cf4ec73a5
Reviewed-on: https://go-review.googlesource.com/c/go/+/641757
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Fixes #70409
Fixes #47107
Change-Id: I82a66c46f6b76c68e156b5d937273b0316975d44
Reviewed-on: https://go-review.googlesource.com/c/go/+/629016
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
It's shorter to encode. Additionally, XOR and AND generally
have higher throughput than BT/SET*.
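For illustration only, here is a minimal Go sketch of the kind of check a change like this could affect, assuming the pattern is a test of a single low bit (the rewrite rule itself is not shown in this message). The names isEvenMask and isEvenXor are hypothetical. The second form needs only an AND followed by an XOR, and both forms agree for every input.

```go
package main

import "fmt"

// isEvenMask is a hypothetical helper: the usual source-level way to
// test that bit 0 of x is clear.
func isEvenMask(x uint64) bool {
	return x&1 == 0
}

// isEvenXor computes the same answer as a 0/1 value using only AND and
// XOR: mask bit 0, then flip it. This illustrates the arithmetic
// identity under the stated assumption; it is not the compiler's rule.
func isEvenXor(x uint64) uint64 {
	return (x & 1) ^ 1
}

func main() {
	// Print both forms side by side for a few sample values.
	for _, x := range []uint64{0, 1, 2, 7, 1 << 63} {
		fmt.Println(x, isEvenMask(x), isEvenXor(x) == 1)
	}
}
```

Whether a given build emits BT/SET* or AND/XOR for such code depends on the compiler version and target, so inspect the generated assembly rather than relying on this sketch.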
compilecmp:
runtime
runtime.(*sweepClass).split 58 -> 56 (-3.45%)
runtime.sweepClass.split 14 -> 11 (-21.43%)
runtime [cmd/compile]
runtime.(*sweepClass).split 58 -> 56 (-3.45%)
runtime.sweepClass.split 14 -> 11 (-21.43%)
strconv
strconv.ryuFtoaShortest changed
strconv [cmd/compile]
strconv.ryuFtoaShortest changed
math/big
math/big.(*Int).MulRange 255 -> 252 (-1.18%)
testing/quick
testing/quick.sizedValue changed
internal/fuzz
internal/fuzz.(*pcgRand).bool 69 -> 70 (+1.45%)
cmd/internal/obj/x86
cmd/internal/obj/x86.(*AsmBuf).asmevex changed
math/big [cmd/compile]
math/big.(*Int).MulRange 255 -> 252 (-1.18%)
cmd/internal/obj/x86 [cmd/compile]
cmd/internal/obj/x86.(*AsmBuf).asmevex changed
net/http
net/http.(*http2stream).isPushed 11 -> 10 (-9.09%)
cmd/vendor/github.com/google/pprof/internal/binutils
cmd/vendor/github.com/google/pprof/internal/binutils.(*file).computeBase changed
Change-Id: I9cb2987eb263c85ee4e93d6f8455c91a55273173
Reviewed-on: https://go-review.googlesource.com/c/go/+/640975
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|