| Age | Commit message (Collapse) | Author |
|
This CL makes the generated code for reflect.TypeFor as simple as an
intrinsic function.
Fixes #75203
Change-Id: I7bb48787101f07e77ab5c583292e834c28a028d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/700336
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
Reduce the number of go toolchain instructions on loong64 as follows:
file        before    after     Δ        %
go          1573148   1571708   -1,440   -0.0915%
gofmt       320578    320090    -488     -0.1522%
asm         555066    554406    -660     -0.1189%
cgo         481566    480926    -640     -0.1329%
compile     2475962   2473880   -2,082   -0.0841%
cover       516536    515920    -616     -0.1193%
link        702172    701404    -768     -0.1094%
preprofile  238626    238274    -352     -0.1475%
vet         792928    792100    -828     -0.1044%
Change-Id: I61e462726835959c60e1b4e5256d4020202418ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/693877
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
|
|
Change-Id: I29ccd105c5418955146a3f4873162963da489a70
Reviewed-on: https://go-review.googlesource.com/c/go/+/697935
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
Change-Id: Ica285212e4884a96fe9738b53cdc789b223bf2e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/697895
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
|
|
Change-Id: I645396fc4b00242f36a06f01550906805c0c1f73
Reviewed-on: https://go-review.googlesource.com/c/go/+/697955
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
loong64
See CL 633075; loong64 has a zero register (R0) that can be used to do this.
Change-Id: I846c6bdfcfd6dbfa18338afc13e34e350580ead4
Reviewed-on: https://go-review.googlesource.com/c/go/+/693876
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
Pattern1: (the type of c is uint16)
c>>8 | c<<8
To:
revb2h c
Pattern2: (the type of c is uint32)
(c & 0xff00ff00)>>8 | (c & 0x00ff00ff)<<8
To:
revb2h c
Pattern3: (the type of c is uint64)
(c & 0xff00ff00ff00ff00)>>8 | (c & 0x00ff00ff00ff00ff)<<8
To:
revb4h c
Change-Id: Ic6231a3f476cbacbea4bd00e31193d107cb86cda
Reviewed-on: https://go-review.googlesource.com/c/go/+/696335
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Change-Id: I782f93510bba92ba60b298c1c1cde456c8bcec38
Reviewed-on: https://go-review.googlesource.com/c/go/+/697956
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
The 'classify' instruction on RISC-V sets a bit in a mask to indicate
the class a floating point value belongs to (e.g. whether the value is
an infinity, a normal number, a subnormal number and so on). There are
other places this instruction is useful but for now I've just used it
for infinity tests.
The gains are relatively small (~1-2 instructions per IsInf call) but
using FCLASSD does potentially unlock further optimizations. It also
reduces the number of loads from memory and the number of moves
between general purpose and floating point register files.
goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
│ sec/op │ sec/op vs base │
Acos 159.9n ± 0% 173.7n ± 0% +8.66% (p=0.000 n=10)
Acosh 249.8n ± 0% 254.4n ± 0% +1.86% (p=0.000 n=10)
Asin 159.9n ± 0% 173.7n ± 0% +8.66% (p=0.000 n=10)
Asinh 292.2n ± 0% 283.0n ± 0% -3.15% (p=0.000 n=10)
Atan 119.1n ± 0% 119.0n ± 0% -0.08% (p=0.036 n=10)
Atanh 265.1n ± 0% 271.6n ± 0% +2.43% (p=0.000 n=10)
Atan2 194.9n ± 0% 186.7n ± 0% -4.23% (p=0.000 n=10)
Cbrt 216.3n ± 0% 203.1n ± 0% -6.10% (p=0.000 n=10)
Ceil 31.82n ± 0% 31.81n ± 0% ~ (p=0.063 n=10)
Copysign 4.897n ± 0% 4.893n ± 3% -0.08% (p=0.038 n=10)
Cos 123.9n ± 0% 107.7n ± 1% -13.03% (p=0.000 n=10)
Cosh 293.0n ± 0% 264.6n ± 0% -9.68% (p=0.000 n=10)
Erf 150.0n ± 0% 133.8n ± 0% -10.80% (p=0.000 n=10)
Erfc 151.8n ± 0% 137.9n ± 0% -9.16% (p=0.000 n=10)
Erfinv 173.8n ± 0% 173.8n ± 0% ~ (p=0.820 n=10)
Erfcinv 173.8n ± 0% 173.8n ± 0% ~ (p=1.000 n=10)
Exp 247.7n ± 0% 220.4n ± 0% -11.04% (p=0.000 n=10)
ExpGo 261.4n ± 0% 232.5n ± 0% -11.04% (p=0.000 n=10)
Expm1 176.2n ± 0% 164.9n ± 0% -6.41% (p=0.000 n=10)
Exp2 220.4n ± 0% 190.2n ± 0% -13.70% (p=0.000 n=10)
Exp2Go 232.5n ± 0% 204.0n ± 0% -12.22% (p=0.000 n=10)
Abs 4.897n ± 0% 4.897n ± 0% ~ (p=0.726 n=10)
Dim 16.32n ± 0% 16.31n ± 0% ~ (p=0.770 n=10)
Floor 31.84n ± 0% 31.83n ± 0% ~ (p=0.677 n=10)
Max 26.11n ± 0% 26.13n ± 0% ~ (p=0.290 n=10)
Min 26.10n ± 0% 26.11n ± 0% ~ (p=0.424 n=10)
Mod 416.2n ± 0% 337.8n ± 0% -18.83% (p=0.000 n=10)
Frexp 63.65n ± 0% 50.60n ± 0% -20.50% (p=0.000 n=10)
Gamma 218.8n ± 0% 206.4n ± 0% -5.62% (p=0.000 n=10)
Hypot 92.20n ± 0% 94.69n ± 0% +2.70% (p=0.000 n=10)
HypotGo 107.7n ± 0% 109.3n ± 0% +1.49% (p=0.000 n=10)
Ilogb 59.54n ± 0% 44.04n ± 0% -26.04% (p=0.000 n=10)
J0 708.9n ± 0% 674.5n ± 0% -4.86% (p=0.000 n=10)
J1 707.6n ± 0% 676.1n ± 0% -4.44% (p=0.000 n=10)
Jn 1.513µ ± 0% 1.427µ ± 0% -5.68% (p=0.000 n=10)
Ldexp 70.20n ± 0% 57.09n ± 0% -18.68% (p=0.000 n=10)
Lgamma 201.5n ± 0% 185.3n ± 1% -8.01% (p=0.000 n=10)
Log 201.5n ± 0% 182.7n ± 0% -9.35% (p=0.000 n=10)
Logb 59.54n ± 0% 46.53n ± 0% -21.86% (p=0.000 n=10)
Log1p 178.8n ± 0% 173.9n ± 6% -2.74% (p=0.021 n=10)
Log10 201.4n ± 0% 184.3n ± 0% -8.49% (p=0.000 n=10)
Log2 79.17n ± 0% 66.07n ± 0% -16.54% (p=0.000 n=10)
Modf 34.27n ± 0% 34.25n ± 0% ~ (p=0.559 n=10)
Nextafter32 49.34n ± 0% 49.37n ± 0% +0.05% (p=0.040 n=10)
Nextafter64 43.66n ± 0% 43.66n ± 0% ~ (p=0.869 n=10)
PowInt 309.1n ± 0% 267.4n ± 0% -13.49% (p=0.000 n=10)
PowFrac 769.6n ± 0% 677.3n ± 0% -11.98% (p=0.000 n=10)
Pow10Pos 13.88n ± 0% 13.88n ± 0% ~ (p=0.811 n=10)
Pow10Neg 19.58n ± 0% 19.57n ± 0% ~ (p=0.993 n=10)
Round 23.65n ± 0% 23.66n ± 0% ~ (p=0.354 n=10)
RoundToEven 27.75n ± 0% 27.75n ± 0% ~ (p=0.971 n=10)
Remainder 380.0n ± 0% 309.9n ± 0% -18.45% (p=0.000 n=10)
Signbit 13.06n ± 0% 13.06n ± 0% ~ (p=1.000 n=10)
Sin 133.8n ± 0% 120.8n ± 0% -9.75% (p=0.000 n=10)
Sincos 160.7n ± 0% 147.7n ± 0% -8.12% (p=0.000 n=10)
Sinh 305.9n ± 0% 277.9n ± 0% -9.17% (p=0.000 n=10)
SqrtIndirect 3.265n ± 0% 3.264n ± 0% ~ (p=0.546 n=10)
SqrtLatency 19.58n ± 0% 19.58n ± 0% ~ (p=0.973 n=10)
SqrtIndirectLatency 19.59n ± 0% 19.58n ± 0% ~ (p=0.370 n=10)
SqrtGoLatency 205.7n ± 0% 202.7n ± 0% -1.46% (p=0.000 n=10)
SqrtPrime 4.953µ ± 0% 4.954µ ± 0% ~ (p=0.477 n=10)
Tan 163.2n ± 0% 150.2n ± 0% -7.99% (p=0.000 n=10)
Tanh 312.4n ± 0% 284.2n ± 0% -9.01% (p=0.000 n=10)
Trunc 31.83n ± 0% 31.83n ± 0% ~ (p=0.663 n=10)
Y0 701.0n ± 0% 669.2n ± 0% -4.54% (p=0.000 n=10)
Y1 704.5n ± 0% 672.4n ± 0% -4.55% (p=0.000 n=10)
Yn 1.490µ ± 0% 1.422µ ± 0% -4.60% (p=0.000 n=10)
Float64bits 5.713n ± 0% 5.710n ± 0% ~ (p=0.926 n=10)
Float64frombits 4.896n ± 0% 4.896n ± 0% ~ (p=0.663 n=10)
Float32bits 12.25n ± 0% 12.25n ± 0% ~ (p=0.571 n=10)
Float32frombits 4.898n ± 0% 4.896n ± 0% ~ (p=0.754 n=10)
FMA 4.895n ± 0% 4.895n ± 0% ~ (p=0.745 n=10)
geomean 94.40n 89.43n -5.27%
Change-Id: I4fe0f2e9f609e38d79463f9ba2519a3f9427432e
Reviewed-on: https://go-review.googlesource.com/c/go/+/348389
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
'ADDshiftLLV' on loong64
goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
│ old │ new │
│ sec/op │ sec/op vs base │
MulconstI32/3 0.8004n ± 0% 0.4247n ± 2% -46.94% (p=0.000 n=10)
MulconstI32/5 0.8005n ± 0% 0.4256n ± 1% -46.83% (p=0.000 n=10)
MulconstI32/12 1.2010n ± 0% 0.8005n ± 0% -33.35% (p=0.000 n=10)
MulconstI32/120 0.8090n ± 0% 0.8067n ± 0% -0.28% (p=0.007 n=10)
MulconstI32/-120 0.8109n ± 0% 0.8072n ± 0% -0.47% (p=0.000 n=10)
MulconstI32/65537 0.8004n ± 0% 0.8004n ± 0% ~ (p=1.000 n=10)
MulconstI32/65538 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.265 n=10)
MulconstI64/3 0.8005n ± 0% 0.4241n ± 1% -47.02% (p=0.000 n=10)
MulconstI64/5 0.8004n ± 0% 0.4249n ± 1% -46.91% (p=0.000 n=10)
MulconstI64/12 1.2010n ± 0% 0.8004n ± 0% -33.36% (p=0.000 n=10)
MulconstI64/120 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.635 n=10)
MulconstI64/-120 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.837 n=10)
MulconstI64/65537 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.837 n=10)
MulconstI64/65538 0.8096n ± 0% 0.8004n ± 0% -1.14% (p=0.000 n=10)
MulconstU32/3 0.8004n ± 0% 0.4263n ± 1% -46.75% (p=0.000 n=10)
MulconstU32/5 0.8005n ± 0% 0.4262n ± 1% -46.76% (p=0.000 n=10)
MulconstU32/12 1.2010n ± 0% 0.8005n ± 0% -33.35% (p=0.000 n=10)
MulconstU32/120 0.8105n ± 0% 0.8096n ± 0% ~ (p=0.183 n=10)
MulconstU32/65537 0.8004n ± 0% 0.8004n ± 0% ~ (p=1.000 n=10)
MulconstU32/65538 0.8005n ± 0% 0.8005n ± 0% ~ (p=1.000 n=10)
MulconstU64/3 0.8004n ± 0% 0.4265n ± 4% -46.71% (p=0.000 n=10)
MulconstU64/5 0.8004n ± 0% 0.4256n ± 0% -46.82% (p=0.000 n=10)
MulconstU64/12 1.2010n ± 0% 0.8004n ± 0% -33.36% (p=0.000 n=10)
MulconstU64/120 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.387 n=10)
MulconstU64/65537 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.265 n=10)
MulconstU64/65538 0.8080n ± 0% 0.8004n ± 0% -0.93% (p=0.000 n=10)
geomean 0.8539n 0.6597n -22.74%
Change-Id: Ie33e88985d7639f481bbba540bc917b9f185c357
Reviewed-on: https://go-review.googlesource.com/c/go/+/693855
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
We now (as of CL 678620) use float registers other than X0 for copying.
Change-Id: Ifdecd5df7519663742eed0f292c98453754d4b25
Reviewed-on: https://go-review.googlesource.com/c/go/+/695275
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
|
|
Although InlMark takes a memory argument it ultimately becomes a
NOP and therefore is safe to speculatively execute.
Fixes #74915
Change-Id: I64317dd433e300ac28de2bcf201845083ec2ac82
Reviewed-on: https://go-review.googlesource.com/c/go/+/693795
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
This change also adds corresponding benchmark tests and codegen tests.
The performance improvement on CPU Loongson-3A6000-HV is as follows:
goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
MulNeg 828.4n ± 0% 655.9n ± 0% -20.82% (p=0.000 n=10)
Mul2Neg 1062.0n ± 0% 826.8n ± 0% -22.15% (p=0.000 n=10)
geomean 938.0n 736.4n -21.49%
Change-Id: Ia999732880ec65be0c66cddc757a4868847e5b15
Reviewed-on: https://go-review.googlesource.com/c/go/+/682535
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
|
|
Use the FMV* instructions to move values between the floating point and
integer register files.
Note: I'm unsure why there is a slowdown in the Float32bits benchmark,
I've checked and an FMVXS instruction is being used as expected. There
are multiple loads and other instructions in the main loop.
goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
│ fmv-before.txt │ fmv-after.txt │
│ sec/op │ sec/op vs base │
Acos 122.7n ± 0% 122.7n ± 0% ~ (p=1.000 n=10)
Acosh 197.2n ± 0% 191.5n ± 0% -2.89% (p=0.000 n=10)
Asin 122.7n ± 0% 122.7n ± 0% ~ (p=0.474 n=10)
Asinh 231.0n ± 0% 224.1n ± 0% -2.99% (p=0.000 n=10)
Atan 91.39n ± 0% 91.41n ± 0% ~ (p=0.465 n=10)
Atanh 210.3n ± 0% 203.4n ± 0% -3.26% (p=0.000 n=10)
Atan2 149.6n ± 0% 149.6n ± 0% ~ (p=0.721 n=10)
Cbrt 176.5n ± 0% 165.9n ± 0% -6.01% (p=0.000 n=10)
Ceil 25.67n ± 0% 24.42n ± 0% -4.87% (p=0.000 n=10)
Copysign 3.756n ± 0% 3.756n ± 0% ~ (p=0.149 n=10)
Cos 95.15n ± 0% 95.15n ± 0% ~ (p=0.374 n=10)
Cosh 228.6n ± 0% 224.7n ± 0% -1.71% (p=0.000 n=10)
Erf 115.2n ± 0% 115.2n ± 0% ~ (p=0.474 n=10)
Erfc 116.4n ± 0% 116.4n ± 0% ~ (p=0.628 n=10)
Erfinv 133.3n ± 0% 133.3n ± 0% ~ (p=1.000 n=10)
Erfcinv 133.3n ± 0% 133.3n ± 0% ~ (p=1.000 n=10)
Exp 194.1n ± 0% 190.3n ± 0% -1.93% (p=0.000 n=10)
ExpGo 204.7n ± 0% 200.3n ± 0% -2.15% (p=0.000 n=10)
Expm1 137.7n ± 0% 135.2n ± 0% -1.82% (p=0.000 n=10)
Exp2 173.4n ± 0% 169.0n ± 0% -2.54% (p=0.000 n=10)
Exp2Go 182.8n ± 0% 178.4n ± 0% -2.41% (p=0.000 n=10)
Abs 3.756n ± 0% 3.756n ± 0% ~ (p=0.157 n=10)
Dim 12.52n ± 0% 12.52n ± 0% ~ (p=0.737 n=10)
Floor 25.67n ± 0% 24.42n ± 0% -4.87% (p=0.000 n=10)
Max 21.29n ± 0% 20.03n ± 0% -5.92% (p=0.000 n=10)
Min 21.28n ± 0% 20.04n ± 0% -5.85% (p=0.000 n=10)
Mod 344.9n ± 0% 319.2n ± 0% -7.45% (p=0.000 n=10)
Frexp 55.71n ± 0% 48.85n ± 0% -12.30% (p=0.000 n=10)
Gamma 165.9n ± 0% 167.8n ± 0% +1.15% (p=0.000 n=10)
Hypot 73.24n ± 0% 70.74n ± 0% -3.41% (p=0.000 n=10)
HypotGo 84.50n ± 0% 82.63n ± 0% -2.21% (p=0.000 n=10)
Ilogb 49.45n ± 0% 45.70n ± 0% -7.59% (p=0.000 n=10)
J0 556.5n ± 0% 544.0n ± 0% -2.25% (p=0.000 n=10)
J1 555.3n ± 0% 542.8n ± 0% -2.24% (p=0.000 n=10)
Jn 1.181µ ± 0% 1.156µ ± 0% -2.12% (p=0.000 n=10)
Ldexp 59.47n ± 0% 53.84n ± 0% -9.47% (p=0.000 n=10)
Lgamma 167.2n ± 0% 154.6n ± 0% -7.51% (p=0.000 n=10)
Log 160.9n ± 0% 154.6n ± 0% -3.92% (p=0.000 n=10)
Logb 49.45n ± 0% 45.70n ± 0% -7.58% (p=0.000 n=10)
Log1p 147.1n ± 0% 137.1n ± 0% -6.80% (p=0.000 n=10)
Log10 162.1n ± 1% 154.6n ± 0% -4.63% (p=0.000 n=10)
Log2 66.99n ± 0% 60.72n ± 0% -9.36% (p=0.000 n=10)
Modf 29.42n ± 0% 26.29n ± 0% -10.64% (p=0.000 n=10)
Nextafter32 41.95n ± 0% 37.88n ± 0% -9.70% (p=0.000 n=10)
Nextafter64 38.82n ± 0% 33.49n ± 0% -13.73% (p=0.000 n=10)
PowInt 252.3n ± 0% 237.3n ± 0% -5.95% (p=0.000 n=10)
PowFrac 615.5n ± 0% 589.7n ± 0% -4.19% (p=0.000 n=10)
Pow10Pos 10.64n ± 0% 10.64n ± 0% ~ (p=1.000 n=10)
Pow10Neg 24.42n ± 0% 15.02n ± 0% -38.49% (p=0.000 n=10)
Round 21.91n ± 0% 18.16n ± 0% -17.12% (p=0.000 n=10)
RoundToEven 24.42n ± 0% 21.29n ± 0% -12.84% (p=0.000 n=10)
Remainder 308.0n ± 0% 291.2n ± 0% -5.44% (p=0.000 n=10)
Signbit 10.02n ± 0% 10.02n ± 0% ~ (p=1.000 n=10)
Sin 102.7n ± 0% 102.7n ± 0% ~ (p=0.211 n=10)
Sincos 124.0n ± 1% 123.3n ± 0% -0.56% (p=0.002 n=10)
Sinh 239.1n ± 0% 234.7n ± 0% -1.84% (p=0.000 n=10)
SqrtIndirect 2.504n ± 0% 2.504n ± 0% ~ (p=0.303 n=10)
SqrtLatency 15.03n ± 0% 15.02n ± 0% ~ (p=0.598 n=10)
SqrtIndirectLatency 15.02n ± 0% 15.02n ± 0% ~ (p=0.907 n=10)
SqrtGoLatency 165.3n ± 0% 157.2n ± 0% -4.90% (p=0.000 n=10)
SqrtPrime 3.801µ ± 0% 3.802µ ± 0% ~ (p=1.000 n=10)
Tan 125.2n ± 0% 125.2n ± 0% ~ (p=0.458 n=10)
Tanh 244.2n ± 0% 239.9n ± 0% -1.76% (p=0.000 n=10)
Trunc 25.67n ± 0% 24.42n ± 0% -4.87% (p=0.000 n=10)
Y0 550.2n ± 0% 538.1n ± 0% -2.21% (p=0.000 n=10)
Y1 552.8n ± 0% 540.6n ± 0% -2.21% (p=0.000 n=10)
Yn 1.168µ ± 0% 1.143µ ± 0% -2.14% (p=0.000 n=10)
Float64bits 8.139n ± 0% 4.385n ± 0% -46.13% (p=0.000 n=10)
Float64frombits 7.512n ± 0% 3.759n ± 0% -49.96% (p=0.000 n=10)
Float32bits 8.138n ± 0% 9.393n ± 0% +15.42% (p=0.000 n=10)
Float32frombits 7.513n ± 0% 3.757n ± 0% -49.98% (p=0.000 n=10)
FMA 3.756n ± 0% 3.756n ± 0% ~ (p=0.246 n=10)
geomean 77.43n 72.42n -6.47%
Change-Id: I8dac69b1d17cb3d2af78d1c844d2b5d80000d667
Reviewed-on: https://go-review.googlesource.com/c/go/+/599235
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Munday <mikemndy@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Improve multiplication strength reduction (see CL 626998) by adding
3 additional linear combination instructions for loong64.
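
As a rough illustration of strength reduction, a multiplication by a small constant can be expressed as shifts and adds. The decompositions below are hand-written assumptions with hypothetical function names; the combinations the compiler actually chooses may differ.

```go
package main

import "fmt"

// mul3 computes x*3 as x + x<<1 (one shift-and-add).
func mul3(x int64) int64 { return x + x<<1 }

// mul5 computes x*5 as x + x<<2 (one shift-and-add).
func mul5(x int64) int64 { return x + x<<2 }

// mul12 computes x*12 as (x + x<<1) << 2 (a shift-and-add, then a shift).
func mul12(x int64) int64 { return (x + x<<1) << 2 }

func main() {
	for _, x := range []int64{0, 1, -7, 1 << 40} {
		fmt.Println(x, mul3(x) == 3*x, mul5(x) == 5*x, mul12(x) == 12*x)
	}
}
```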
goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
MulconstI32/3 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10)
MulconstI32/5 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10)
MulconstI32/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10)
MulconstI32/120 1.6010n ± 0% 0.8130n ± 0% -49.22% (p=0.000 n=10)
MulconstI32/-120 1.6010n ± 0% 0.8109n ± 0% -49.35% (p=0.000 n=10)
MulconstI32/65537 1.6275n ± 0% 0.8005n ± 0% -50.81% (p=0.000 n=10)
MulconstI32/65538 1.6290n ± 0% 0.8004n ± 0% -50.87% (p=0.000 n=10)
MulconstI64/3 1.6010n ± 0% 0.8004n ± 0% -50.01% (p=0.000 n=10)
MulconstI64/5 1.6010n ± 0% 0.8004n ± 0% -50.01% (p=0.000 n=10)
MulconstI64/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10)
MulconstI64/120 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10)
MulconstI64/-120 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10)
MulconstI64/65537 1.6270n ± 0% 0.8005n ± 0% -50.80% (p=0.000 n=10)
MulconstI64/65538 1.6290n ± 0% 0.8071n ± 1% -50.45% (p=0.000 n=10)
MulconstU32/3 1.6010n ± 0% 0.8004n ± 0% -50.01% (p=0.000 n=10)
MulconstU32/5 1.6010n ± 0% 0.8004n ± 0% -50.01% (p=0.000 n=10)
MulconstU32/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10)
MulconstU32/120 1.6010n ± 0% 0.8066n ± 0% -49.62% (p=0.000 n=10)
MulconstU32/65537 1.6290n ± 0% 0.8005n ± 0% -50.86% (p=0.000 n=10)
MulconstU32/65538 1.6280n ± 0% 0.8005n ± 0% -50.83% (p=0.000 n=10)
MulconstU64/3 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10)
MulconstU64/5 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10)
MulconstU64/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10)
MulconstU64/120 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10)
MulconstU64/65537 1.6290n ± 0% 0.8005n ± 0% -50.86% (p=0.000 n=10)
MulconstU64/65538 1.6300n ± 0% 0.8067n ± 0% -50.51% (p=0.000 n=10)
geomean 1.609n 0.8537n -46.95%
goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
MulconstI32/3 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstI32/5 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstI32/12 1.601n ± 0% 1.202n ± 0% -24.92% (p=0.000 n=10)
MulconstI32/120 1.6020n ± 0% 0.8012n ± 0% -49.99% (p=0.000 n=10)
MulconstI32/-120 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstI32/65537 1.6020n ± 0% 0.8007n ± 0% -50.02% (p=0.000 n=10)
MulconstI32/65538 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstI64/3 1.6015n ± 0% 0.8007n ± 0% -50.00% (p=0.000 n=10)
MulconstI64/5 1.6020n ± 0% 0.8007n ± 0% -50.02% (p=0.000 n=10)
MulconstI64/12 1.602n ± 0% 1.202n ± 0% -25.00% (p=0.000 n=10)
MulconstI64/120 1.6030n ± 0% 0.8011n ± 0% -50.02% (p=0.000 n=10)
MulconstI64/-120 1.6020n ± 0% 0.8007n ± 0% -50.02% (p=0.000 n=10)
MulconstI64/65537 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstI64/65538 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstU32/3 1.6010n ± 0% 0.8006n ± 0% -49.99% (p=0.000 n=10)
MulconstU32/5 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstU32/12 1.601n ± 0% 1.202n ± 0% -24.92% (p=0.000 n=10)
MulconstU32/120 1.6010n ± 0% 0.8006n ± 0% -49.99% (p=0.000 n=10)
MulconstU32/65537 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstU32/65538 1.6020n ± 0% 0.8009n ± 0% -50.01% (p=0.000 n=10)
MulconstU64/3 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstU64/5 1.6010n ± 0% 0.8007n ± 0% -49.98% (p=0.000 n=10)
MulconstU64/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10)
MulconstU64/120 1.6020n ± 0% 0.8007n ± 0% -50.02% (p=0.000 n=10)
MulconstU64/65537 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
MulconstU64/65538 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10)
geomean 1.601n 0.8523n -46.77%
Change-Id: I9fb0e47ca57875da171a347bf4828adfab41b875
Reviewed-on: https://go-review.googlesource.com/c/go/+/675455
Reviewed-by: Mark Freeman <mark@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
goarch: amd64
cpu: 12th Gen Intel(R) Core(TM) i7-12700
│ base │ exp │
│ sec/op │ sec/op vs base │
MemclrKnownSize112-20 1.270n ± 14% 1.006n ± 0% -20.72% (p=0.000 n=10)
MemclrKnownSize128-20 1.266n ± 0% 1.005n ± 0% -20.58% (p=0.000 n=10)
MemclrKnownSize192-20 1.771n ± 0% 1.579n ± 1% -10.84% (p=0.000 n=10)
MemclrKnownSize248-20 4.034n ± 0% 3.520n ± 0% -12.75% (p=0.000 n=10)
MemclrKnownSize256-20 2.269n ± 0% 2.014n ± 0% -11.26% (p=0.000 n=10)
MemclrKnownSize512-20 4.280n ± 0% 4.030n ± 0% -5.84% (p=0.000 n=10)
MemclrKnownSize1024-20 8.309n ± 1% 8.057n ± 0% -3.03% (p=0.000 n=10)
Change-Id: I8f1627e2a1e981ff351dc7178932b32a2627f765
Reviewed-on: https://go-review.googlesource.com/c/go/+/678937
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add the VECTOR FP (MINIMUM|MAXIMUM) instructions to the assembler and
use them in the compiler to implement min and max.
Note: I've allowed floating point registers to be used with the single
element instructions (those with the W instead of V prefix) to allow
easier integration into the compiler.
Change-Id: I5f80a510bd248cf483cce95f1979bf63fbae7de6
Reviewed-on: https://go-review.googlesource.com/c/go/+/684715
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <mark@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
After CL 628075, do not rely on the memory arg of an OpLocalAddr.
Fixes #74788
Change-Id: I4e893241e3949bb8f2d93c8b88cc102e155b725d
Reviewed-on: https://go-review.googlesource.com/c/go/+/691275
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Mark Freeman <mark@golang.org>
|
|
Same as CL 689815, but for modulus instead of division.
Updates #74485
Change-Id: I73000231c886a987a1093669ff207fd9117a8160
Reviewed-on: https://go-review.googlesource.com/c/go/+/689895
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Fixes #74485
Change-Id: Ia22a58ac43bdc36c8414d555672a3a3eafc749ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/689815
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
|
|
These recently added tests failed when using the -all_codegen flag.
Fixes #74770
Change-Id: Idea1ea02af2bd9f45c7d0a28d633c7442328e6df
Reviewed-on: https://go-review.googlesource.com/c/go/+/690715
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Run-TryBot: Michael Munday <mikemndy@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Mark Freeman <mark@golang.org>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
TryBot-Bypass: Michael Knyszek <mknyszek@google.com>
|
|
For performance see CL 685676.
This allows something like:
if y { x *= 2 }
To be compiled to:
SHLXQ BX, AX, AX
Instead of:
MOVQ AX, CX
SHLQ $1, CX
MOVBLZX BL, DX
TESTQ DX, DX
CMOVQNE CX, AX
During ./make.bash, uniqued per LOC, there are 2 doublings and 4 halvings.
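
The equivalence behind this rewrite can be sketched in Go: a conditional doubling is the same as shifting by the condition interpreted as 0 or 1. The names b2i, doubleIf and doubleIfShift are hypothetical.

```go
package main

import "fmt"

// b2i converts a bool to 0 or 1.
func b2i(b bool) uint {
	if b {
		return 1
	}
	return 0
}

// doubleIf is the branchy source form.
func doubleIf(x int64, y bool) int64 {
	if y {
		x *= 2
	}
	return x
}

// doubleIfShift is the branch-free "math form": shift left by 0 or 1.
func doubleIfShift(x int64, y bool) int64 {
	return x << b2i(y)
}

func main() {
	for _, x := range []int64{0, 3, -5, 1 << 62} {
		fmt.Println(x, doubleIf(x, true) == doubleIfShift(x, true),
			doubleIf(x, false) == doubleIfShift(x, false))
	}
}
```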
Change-Id: Ic0727cbf429528a2dbf17cbfc3b0121db8387444
Reviewed-on: https://go-review.googlesource.com/c/go/+/685695
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This allows something like:
if y { x++ }
To be compiled to:
MOVBLZX BX, CX
ADDQ CX, AX
Instead of:
LEAQ 1(AX), CX
MOVBLZX BL, DX
TESTQ DX, DX
CMOVQNE CX, AX
During ./make.bash, uniqued per LOC, there are 100 additions and 75 subtractions.
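
A sketch of the same idea at the source level: a conditional increment or decrement equals adding or subtracting the condition as 0 or 1. All function names here are hypothetical.

```go
package main

import "fmt"

// b2i converts a bool to 0 or 1.
func b2i(b bool) int64 {
	if b {
		return 1
	}
	return 0
}

// incIf is the branchy source form.
func incIf(x int64, y bool) int64 {
	if y {
		x++
	}
	return x
}

// incIfMath is the branch-free "math form".
func incIfMath(x int64, y bool) int64 { return x + b2i(y) }

// decIfMath is the matching form for a conditional decrement.
func decIfMath(x int64, y bool) int64 { return x - b2i(y) }

func main() {
	fmt.Println(incIf(41, true), incIfMath(41, true), incIfMath(41, false))
	fmt.Println(decIfMath(43, true), decIfMath(43, false))
}
```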
See benchmark here: https://go.dev/play/p/DJf5COjwhd_s
Either it's a performance no-op or it is faster:
goos: linux
goarch: amd64
cpu: AMD Ryzen 5 3600 6-Core Processor
│ /tmp/old.logs │ /tmp/new.logs │
│ sec/op │ sec/op vs base │
CmovInlineConditionAddLatency-12 0.5443n ± 5% 0.5339n ± 3% -1.90% (p=0.004 n=10)
CmovInlineConditionAddThroughputBy6-12 1.492n ± 1% 1.494n ± 1% ~ (p=0.955 n=10)
CmovInlineConditionSubLatency-12 0.5419n ± 3% 0.5282n ± 3% -2.52% (p=0.019 n=10)
CmovInlineConditionSubThroughputBy6-12 1.587n ± 1% 1.584n ± 2% ~ (p=0.492 n=10)
CmovOutlineConditionAddLatency-12 0.5223n ± 1% 0.2639n ± 4% -49.47% (p=0.000 n=10)
CmovOutlineConditionAddThroughputBy6-12 1.159n ± 1% 1.097n ± 2% -5.35% (p=0.000 n=10)
CmovOutlineConditionSubLatency-12 0.5271n ± 3% 0.2654n ± 2% -49.66% (p=0.000 n=10)
CmovOutlineConditionSubThroughputBy6-12 1.053n ± 1% 1.050n ± 1% ~ (p=1.000 n=10)
geomean
There are other benefits not tested by this benchmark:
- the math form is usually a couple bytes shorter (ICACHE)
- the math form is usually 0~2 uops shorter (UCACHE)
- the math form usually has less register pressure*
- the math form can sometimes be optimized further
*regalloc rarely finds a way to use fewer registers
As far as pass ordering goes, there are many possible options.
I've decided to reorder branchelim before late opt since:
- unlike running exclusively the CondSelect rules after branchelim,
some extra optimizations might trigger on the adds or subs.
- I don't want to maintain a second generic.rules file containing only
the rules that can trigger after branchelim.
- rerunning all of opt a third time increases compilation time for little gain.
By elimination, moving branchelim seems fine.
Change-Id: I869adf57e4d109948ee157cfc47144445146bafd
Reviewed-on: https://go-review.googlesource.com/c/go/+/685676
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Fold a shift through AND when the AND gets an all-zeros-or-all-ones
operand (e.g. from arithmetic shift by 63 of a 64-bit value) for a
common case with slice operations:
ASR $63, R2, R2
AND R3<<3, R2, R2
ADD R2, R0, R2
As the operands are 64-bit, we can transform it to:
AND R2->63, R3, R2
ADD R2<<3, R0, R2
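
Why the transformation is safe can be checked in Go: an arithmetic shift by 63 yields either 0 or all ones, and ANDing with such a mask commutes with a left shift of the other operand. The function names below are hypothetical and the sketch models the two instruction sequences, not the compiler rule itself.

```go
package main

import "fmt"

// before models: ASR $63; AND with (y<<3); ADD to base.
func before(x, y, base int64) int64 {
	return base + ((x >> 63) & (y << 3))
}

// after models: AND with (x>>63); ADD of the result shifted left by 3.
func after(x, y, base int64) int64 {
	return base + (((x >> 63) & y) << 3)
}

func main() {
	for _, x := range []int64{-1, 0, 7, -1 << 40} {
		fmt.Println(x, before(x, 5, 1000), after(x, 5, 1000))
	}
}
```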
Code size improvement:
compile: .text: 9088004 -> 9086292 (-0.02%)
etcd: .text: 10500276 -> 10498964 (-0.01%)
Change-Id: Ibcd5e67173da39b77ceff77ca67812fb8be5a7b5
Reviewed-on: https://go-review.googlesource.com/c/go/+/679895
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Mark Freeman <mark@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Optimize ARM64 code generation for slice bounds checking by recognizing
patterns where comparisons to zero involve SUB or SUBconst operations.
This change adds SSA opt rules to simplify:
(CMPconst [0] (SUB x y)) => (CMP x y)
The optimizations apply to EQ, NE, ULE, and UGT comparisons, enabling
more efficient bounds checking for slice operations.
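
The arithmetic fact behind these rules can be sketched in Go: with wrapping subtraction, x-y is zero exactly when x equals y, so for unsigned values "<= 0" means equal and "> 0" means not equal. The function names are hypothetical, and how each condition is rewritten in the rules file is not shown here.

```go
package main

import "fmt"

// eqViaSub tests x == y by comparing the difference to zero (EQ case).
func eqViaSub(x, y uint64) bool { return x-y == 0 }

// uleViaSub is the ULE case: an unsigned value is <= 0 only when it is 0.
func uleViaSub(x, y uint64) bool { return x-y <= 0 }

// ugtViaSub is the UGT case: an unsigned value is > 0 whenever it is non-zero.
func ugtViaSub(x, y uint64) bool { return x-y > 0 }

func main() {
	pairs := [][2]uint64{{3, 3}, {3, 4}, {4, 3}, {0, 1<<64 - 1}}
	for _, p := range pairs {
		x, y := p[0], p[1]
		fmt.Println(x, y, eqViaSub(x, y) == (x == y),
			uleViaSub(x, y) == (x == y), ugtViaSub(x, y) == (x != y))
	}
}
```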
Code size improvement:
compile: .text: 9088004 -> 9065988 (-0.24%)
etcd: .text: 10500276 -> 10497092 (-0.03%)
Change-Id: I467cb27674351652bcacc52b87e1f19677bd46a8
Reviewed-on: https://go-review.googlesource.com/c/go/+/679915
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
CL 622236 forgot to check the mask was also a 32 bit rotate mask. Add
a modified version of isPPC64WordRotateMask which validates that the
mask is contiguous and fits inside a uint32.
I don't think this is possible when merging SRDconst, the first check
should always reject such combines. But, be extra careful and do it
there too.
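
A minimal sketch of the kind of check described, assuming a hypothetical helper isContiguousMask32. It is not the helper added by this CL, and unlike isPPC64WordRotateMask it does not accept masks that wrap around.

```go
package main

import "fmt"

// isContiguousMask32 reports whether m is a single non-empty run of set
// bits that fits entirely within the low 32 bits.
func isContiguousMask32(m uint64) bool {
	if m == 0 || m>>32 != 0 {
		return false
	}
	// Adding the lowest set bit carries through the whole run of ones;
	// if the mask was one contiguous run, nothing of it survives the AND.
	lowest := m & -m
	return m&(m+lowest) == 0
}

func main() {
	for _, m := range []uint64{0x0ff0, 0x0f0f, 0xffffffff, 1 << 32, 0} {
		fmt.Printf("%#x %v\n", m, isContiguousMask32(m))
	}
}
```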
Fixes #73153
Change-Id: Ie95f74ec5e7d89dc761511126db814f886a7a435
Reviewed-on: https://go-review.googlesource.com/c/go/+/679775
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
unique.Make always copies strings passed into it, so it's safe to not
copy byte slices converted to strings either. Handle this just like map
accesses with string(b) as keys.
This CL only handles unique.Make(string(b)), not nested cases like
unique.Make([2]string{string(b1), string(b2)}); this could be done in a
followup CL but the map lookup code in walk is sufficiently different
than the call handling code that I didn't attempt it. (SSA is much
easier).
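
The call shape covered by this CL looks like the sketch below; intern is a hypothetical helper. The observable behavior is the same with or without the optimization, since the handle keeps its own copy of the string.

```go
package main

import (
	"fmt"
	"unique"
)

// intern canonicalizes the contents of b. This is the
// unique.Make(string(b)) form that the CL handles.
func intern(b []byte) unique.Handle[string] {
	return unique.Make(string(b))
}

func main() {
	buf := []byte("hello")
	h1 := intern(buf)
	buf[0] = 'j' // mutating the buffer must not affect the stored value
	h2 := intern([]byte("hello"))
	fmt.Println(h1 == h2, h1.Value(), string(buf))
}
```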
Fixes #71926
Change-Id: Ic2f82f2f91963d563b4ddb1282bd49fc40da8b85
Reviewed-on: https://go-review.googlesource.com/c/go/+/672135
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
allocating
Today, this interface conversion causes the struct literal
to be heap allocated:
	var sink any
	func example1() {
		sink = S{1, 1}
	}
For basic literals like integers that are directly used in
an interface conversion that would otherwise allocate, the compiler
is able to use read-only global storage (see #18704).
This CL extends that to struct and array literals as well by creating
read-only global storage that is able to represent for example S{1, 1},
and then using a pointer to that storage in the interface
when the interface conversion happens.
A more challenging example is:
	func example2() {
		v := S{1, 1}
		sink = v
	}
In this case, the struct literal is not directly part of the
interface conversion, but is instead assigned to a local variable.
To still avoid heap allocation in cases like this, in walk we
construct a cache that maps from expressions used in interface
conversions to earlier expressions that can be used to represent the
same value (via ir.ReassignOracle.StaticValue). This is somewhat
analogous to how we avoided heap allocation for basic literals in
CL 649077 earlier in our stack, though here we also need to do a
little more work to create the read-only global.
CL 649076 (also earlier in our stack) added most of the tests
along with debug diagnostics in convert.go to make it easier
to test this change.
See the writeup in #71359 for details.
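One way to observe the effect of the two examples above is to count allocations with testing.AllocsPerRun. This is only a sketch: the field layout of S is an assumption (the message shows just S{1, 1}), and the reported counts depend on the toolchain version, so none are asserted.

```go
package main

import (
	"fmt"
	"testing"
)

// S stands in for the struct type in the commit message; its fields
// are an assumption made for this sketch.
type S struct{ a, b int }

var sink any

// example1 converts the struct literal to an interface directly.
func example1() {
	sink = S{1, 1}
}

// example2 goes through a local variable before the conversion.
func example2() {
	v := S{1, 1}
	sink = v
}

func main() {
	// AllocsPerRun reports the average number of heap allocations per call.
	fmt.Println("example1 allocs:", testing.AllocsPerRun(1000, example1))
	fmt.Println("example2 allocs:", testing.AllocsPerRun(1000, example2))
}
```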
Fixes #71359
Fixes #71323
Updates #62653
Updates #53465
Updates #8618
Change-Id: I8924f0c69ff738ea33439bd6af7b4066af493b90
Reviewed-on: https://go-review.googlesource.com/c/go/+/649555
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Fixes #73812
Change-Id: If7a6e103ae9e1442a2cf4a3c6b1270b6a1887196
Reviewed-on: https://go-review.googlesource.com/c/go/+/675175
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Reduce the number of go toolchain instructions on loong64 as follows.
file before after Δ %
addr2line 279880 279776 -104 -0.0372%
asm 556638 556410 -228 -0.0410%
buildid 272272 272072 -200 -0.0735%
cgo 481522 481318 -204 -0.0424%
compile 2457788 2457580 -208 -0.0085%
covdata 323384 323280 -104 -0.0322%
cover 518450 518234 -216 -0.0417%
dist 340790 340686 -104 -0.0305%
distpack 282456 282252 -204 -0.0722%
doc 789932 789688 -244 -0.0309%
fix 324332 324228 -104 -0.0321%
link 704622 704390 -232 -0.0329%
nm 277132 277028 -104 -0.0375%
objdump 507862 507758 -104 -0.0205%
pack 221774 221674 -100 -0.0451%
pprof 1469816 1469552 -264 -0.0180%
test2json 254836 254732 -104 -0.0408%
trace 1100002 1099738 -264 -0.0240%
vet 781078 780874 -204 -0.0261%
go 1529116 1528848 -268 -0.0175%
gofmt 318556 318448 -108 -0.0339%
total 13792238 13788566 -3672 -0.0266%
Change-Id: I23fb3ebd41309252c7075e57ea7094e79f8c4fef
Reviewed-on: https://go-review.googlesource.com/c/go/+/674335
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
|
|
The loong64 instruction set has no NORI instruction, so the immediate
value in NORconst needs to be loaded into a register and then used
with the three-register NOR instruction.
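For illustration, a Go expression of the NOR-with-constant shape looks like the sketch below. The function name and the constant 0xff are hypothetical, and whether this exact source lowers to NORconst on loong64 is an assumption.

```go
package main

import "fmt"

// norConst computes NOR(x, 0xff), that is ^(x | 0xff).
// The constant operand is what would have to live in a register
// on an architecture without an immediate form of NOR.
func norConst(x uint64) uint64 {
	return ^(x | 0xff)
}

func main() {
	fmt.Printf("%#x\n", norConst(0x1200))
}
```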
Change-Id: I5ef697450619317218cb3ef47fc07e238bdc2139
Reviewed-on: https://go-review.googlesource.com/c/go/+/673836
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This CL implements the TODO in combineStores to allow combining
stores of different sizes, as long as the total size aligns to
2, 4, or 8.
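A sketch of the kind of code this targets, assuming a hypothetical header type: one 2-byte store followed by two 1-byte stores, all pieces of the same 32-bit value, adding up to 4 bytes. Whether the compiler actually merges these into a single store depends on the target and toolchain and is not asserted here.

```go
package main

import "fmt"

// header is a hypothetical 4-byte struct with fields of different sizes.
type header struct {
	kind  uint16
	flags uint8
	level uint8
}

// put writes pieces of x with stores of different sizes:
// 2 bytes, then 1 byte, then 1 byte, for a total of 4 bytes.
func put(h *header, x uint32) {
	h.kind = uint16(x)
	h.flags = uint8(x >> 16)
	h.level = uint8(x >> 24)
}

func main() {
	var h header
	put(&h, 0x04030201)
	fmt.Printf("%#x %#x %#x\n", h.kind, h.flags, h.level)
}
```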
Fixes #72832.
Change-Id: I6d1d471335da90d851ad8f3b5a0cf10bdcfa17c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/661855
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Fold negation into addition/subtraction and avoid double negation.
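The algebraic identities behind this kind of folding can be sketched in Go as below. The function names are hypothetical, and which of these shapes the rewrite rules actually match is an assumption; the identities themselves hold for wrapping integer arithmetic.

```go
package main

import "fmt"

// Each function is written in the "before" shape; the comment gives
// the equivalent expression without the extra negation.
func addNeg(a, b int64) int64 { return a + (-b) } // a - b
func subNeg(a, b int64) int64 { return a - (-b) } // a + b
func negSub(a, b int64) int64 { return -(a - b) } // b - a
func negNeg(a int64) int64    { return -(-a) }    // a

func main() {
	a, b := int64(7), int64(3)
	fmt.Println(addNeg(a, b) == a-b, subNeg(a, b) == a+b,
		negSub(a, b) == b-a, negNeg(a) == a)
}
```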
platform: linux/arm64
file before after Δ %
addr2line 3628108 3628116 +8 +0.000%
asm 6208353 6207857 -496 -0.008%
buildid 3460682 3460418 -264 -0.008%
cgo 5572988 5572492 -496 -0.009%
compile 26042159 26041039 -1120 -0.004%
cover 6304328 6303472 -856 -0.014%
dist 4139330 4139098 -232 -0.006%
doc 9429305 9428065 -1240 -0.013%
fix 3997189 3996733 -456 -0.011%
link 8212128 8210280 -1848 -0.023%
nm 3620056 3619696 -360 -0.010%
objdump 5920289 5919233 -1056 -0.018%
pack 2892250 2891778 -472 -0.016%
pprof 17094569 17092745 -1824 -0.011%
test2json 3335825 3335529 -296 -0.009%
trace 15842080 15841456 -624 -0.004%
vet 9472194 9471106 -1088 -0.011%
go 19081541 19081509 -32 -0.000%
total 154253374 154240622 -12752 -0.008%
platform: darwin/arm64
file before after Δ %
compile 27152002 27135490 -16512 -0.061%
link 8372914 8356402 -16512 -0.197%
go 19154802 19154778 -24 -0.000%
total 157734180 157701132 -33048 -0.021%
Change-Id: I15a349bfbaf7333ec3e4a62ae4d06f3f371dfb1d
Reviewed-on: https://go-review.googlesource.com/c/go/+/673715
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
-N+1 <= x % N <= N-1
This is useful for cases like:
func setBit(b []byte, i int) {
b[i/8] |= 1<<(i%8)
}
The shift does not need protection against larger-than-7 cases.
(It does still need protection against <0 cases.)
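The bound follows from Go's remainder truncating toward zero, so the result keeps the sign of the dividend. A small sketch that checks the range for N = 8 and exercises setBit with a non-negative index:

```go
package main

import "fmt"

// setBit is the function from the message above; i must be non-negative.
func setBit(b []byte, i int) {
	b[i/8] |= 1 << (i % 8)
}

// remRange returns the smallest and largest value of x % n
// for x in [-limit, limit].
func remRange(n, limit int) (lo, hi int) {
	for x := -limit; x <= limit; x++ {
		r := x % n
		if r < lo {
			lo = r
		}
		if r > hi {
			hi = r
		}
	}
	return lo, hi
}

func main() {
	fmt.Println(remRange(8, 100)) // prints "-7 7"

	b := make([]byte, 2)
	setBit(b, 9)
	fmt.Println(b)
}
```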
Change-Id: Idf83101386af538548bfeb6e2928cea855610ce2
Reviewed-on: https://go-review.googlesource.com/c/go/+/672995
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Fold negation into addition/subtraction and avoid double negation.
file before after Δ %
addr2line 3742022 3741986 -36 -0.001%
asm 6668616 6668628 +12 +0.000%
buildid 3583786 3583630 -156 -0.004%
cgo 6020370 6019634 -736 -0.012%
compile 29416016 29417336 +1320 +0.004%
cover 6801903 6801675 -228 -0.003%
dist 4485916 4485816 -100 -0.002%
doc 10652787 10652251 -536 -0.005%
fix 4115988 4115560 -428 -0.010%
link 9002328 9001616 -712 -0.008%
nm 3733148 3732780 -368 -0.010%
objdump 6163292 6163068 -224 -0.004%
pack 2944768 2944604 -164 -0.006%
pprof 18909973 18908773 -1200 -0.006%
test2json 3394662 3394778 +116 +0.003%
trace 17350911 17349751 -1160 -0.007%
vet 10077727 10077527 -200 -0.002%
go 19118769 19118609 -160 -0.001%
total 166182982 166178022 -4960 -0.003%
Change-Id: Id55698800fd70f3cb2ff48393584456b87208921
Reviewed-on: https://go-review.googlesource.com/c/go/+/673556
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Fold negation into addition/subtraction and avoid double negation.
file before after Δ %
addr2line 4007310 4007470 +160 +0.004%
asm 7007636 7007436 -200 -0.003%
buildid 3839268 3838972 -296 -0.008%
cgo 6353466 6352738 -728 -0.011%
compile 30426920 30426896 -24 -0.000%
cover 7005408 7004744 -664 -0.009%
dist 4651192 4650872 -320 -0.007%
doc 10606050 10606034 -16 -0.000%
fix 4446414 4446390 -24 -0.001%
link 9237736 9237024 -712 -0.008%
nm 3999107 3999323 +216 +0.005%
objdump 6762424 6762144 -280 -0.004%
pack 3270757 3270493 -264 -0.008%
pprof 19428299 19361939 -66360 -0.342%
test2json 3717345 3717217 -128 -0.003%
trace 17382273 17381657 -616 -0.004%
vet 10689481 10688985 -496 -0.005%
go 19118769 19118609 -160 -0.001%
total 171949855 171878943 -70912 -0.041%
Change-Id: I35c1f264d216c214ea3f56252a9ddab8ea850fa6
Reviewed-on: https://go-review.googlesource.com/c/go/+/673555
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
x += *p
We want to do this with a single load+add operation on amd64.
The tricky part is that we don't want to combine if there are
other uses of x after this instruction.
Implement a simple detector that seems to capture a common situation -
x += *p is in a loop, and the other use of x is after loop exit.
In that case, it does not hurt to do the load+add combo.
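A loop of the shape described above might look like this sketch: the accumulator is updated from memory inside the loop and its only other use is the return after loop exit. The function name is hypothetical, and whether a particular build emits a combined load+add here is not asserted.

```go
package main

import "fmt"

// sum accumulates values loaded from memory. Inside the loop x is only
// used by the add; the other use of x is the return after the loop.
func sum(p []int64) int64 {
	var x int64
	for i := range p {
		x += p[i]
	}
	return x
}

func main() {
	fmt.Println(sum([]int64{1, 2, 3}))
}
```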
Change-Id: I466174cce212e78bde83f908cc1f2752b560c49c
Reviewed-on: https://go-review.googlesource.com/c/go/+/672957
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
for ..; ..; i++ {
...
}
We want to schedule the i++ late in the block, so that all other
uses of i in the block are scheduled first. That way, i++ can
happen in place in a register instead of requiring a temporary register.
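As a concrete instance of the loop shape above, consider this sketch, where the body uses i twice before the increment. The function name is hypothetical; the scheduling itself is a compiler-internal decision and cannot be observed from the program's output.

```go
package main

import "fmt"

// fillDoubled uses i as an index and as a value in the loop body.
// If the increment is scheduled after both uses, it can update the
// register holding i in place.
func fillDoubled(dst []int) {
	for i := 0; i < len(dst); i++ {
		dst[i] = i * 2
	}
}

func main() {
	dst := make([]int, 4)
	fillDoubled(dst)
	fmt.Println(dst)
}
```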
Change-Id: Id777407c7e67a5ddbd8e58251099b0488138c0df
Reviewed-on: https://go-review.googlesource.com/c/go/+/672998
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
This change also avoids double negation and adds loong64 codegen for arithmetic tests.
Reduce the number of go toolchain instructions on loong64 as follows.
file before after Δ %
addr2line 279972 279896 -76 -0.0271%
asm 556390 556310 -80 -0.0144%
buildid 272376 272300 -76 -0.0279%
cgo 481534 481550 +16 +0.0033%
compile 2457992 2457396 -596 -0.0242%
covdata 323488 323404 -84 -0.0260%
cover 518630 518490 -140 -0.0270%
dist 340894 340814 -80 -0.0235%
distpack 282568 282484 -84 -0.0297%
doc 790224 789984 -240 -0.0304%
fix 324408 324348 -60 -0.0185%
link 704910 704666 -244 -0.0346%
nm 277220 277144 -76 -0.0274%
objdump 508026 507878 -148 -0.0291%
pack 221810 221786 -24 -0.0108%
pprof 1470284 1469880 -404 -0.0275%
test2json 254896 254852 -44 -0.0173%
trace 1100390 1100074 -316 -0.0287%
vet 781398 781142 -256 -0.0328%
go 1529668 1529128 -540 -0.0353%
gofmt 318668 318568 -100 -0.0314%
total 13795746 13792094 -3652 -0.0265%
Change-Id: I88d1f12cfc4be0e92687c48e06a57213aa484aca
Reviewed-on: https://go-review.googlesource.com/c/go/+/672555
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Add 2 more cases:
if a { x = value } else { x = a } => x = a && value
if a { x = a } else { x = value } => x = a || value
AND case goes from:
00006 (8) TESTB AX, AX
00007 (8) JNE 9
00008 (13) MOVL AX, BX
00009 (13) MOVL BX, AX
00010 (13) RET
to:
00006 (13) ANDL BX, AX
00007 (13) RET
OR goes from:
00006 (19) TESTB AX, AX
00007 (19) JNE 9
00008 (24) MOVL BX, AX
00009 (24) RET
to:
00006 (24) ORL BX, AX
00007 (24) RET
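The two source patterns can be written out in Go and checked against the logical operators over the full truth table. This is a sketch with hypothetical function names; it verifies the equivalence, not the generated code.

```go
package main

import "fmt"

// andCase is the first pattern: if a { x = value } else { x = a }.
func andCase(a, value bool) bool {
	var x bool
	if a {
		x = value
	} else {
		x = a
	}
	return x
}

// orCase is the second pattern: if a { x = a } else { x = value }.
func orCase(a, value bool) bool {
	var x bool
	if a {
		x = a
	} else {
		x = value
	}
	return x
}

func main() {
	for _, a := range []bool{false, true} {
		for _, v := range []bool{false, true} {
			fmt.Println(a, v, andCase(a, v) == (a && v), orCase(a, v) == (a || v))
		}
	}
}
```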
compilecmp linux/amd64:
runtime
runtime.lock2 847 -> 869 (+2.60%)
runtime.addspecial 542 -> 517 (-4.61%)
runtime.tracebackPCs changed
runtime.scanstack changed
runtime.mallocinit changed
runtime.traceback2 2238 -> 2206 (-1.43%)
runtime [cmd/compile]
runtime.lock2 860 -> 882 (+2.56%)
runtime.scanstack changed
runtime.addspecial 542 -> 517 (-4.61%)
runtime.traceback2 2238 -> 2206 (-1.43%)
runtime.lockWithRank 870 -> 890 (+2.30%)
runtime.tracebackPCs changed
runtime.mallocinit changed
strconv
strconv.ryuFtoaFixed32 changed
strconv.ryuFtoaFixed64 639 -> 638 (-0.16%)
strconv.readFloat changed
strconv.ryuFtoaShortest changed
strings
strings.(*Replacer).build changed
strconv [cmd/compile]
strconv.readFloat changed
strconv.ryuFtoaFixed64 639 -> 638 (-0.16%)
strconv.ryuFtoaFixed32 changed
strconv.ryuFtoaShortest changed
strings [cmd/compile]
strings.(*Replacer).build changed
regexp
regexp.makeOnePass.func1 changed
regexp [cmd/compile]
regexp.makeOnePass.func1 changed
encoding/json
encoding/json.indirect changed
database/sql
database/sql.driverArgsConnLocked changed
vendor/golang.org/x/text/unicode/norm
vendor/golang.org/x/text/unicode/norm.Form.transform changed
go/doc/comment
go/doc/comment.parseSpans changed
internal/diff
internal/diff.tgs changed
log/slog
log/slog.(*handleState).appendNonBuiltIns 1898 -> 1877 (-1.11%)
testing/fstest
testing/fstest.(*fsTester).checkGlob changed
runtime/pprof
runtime/pprof.(*profileBuilder).build changed
cmd/internal/dwarf
cmd/internal/dwarf.isEmptyInlinedCall 254 -> 244 (-3.94%)
go/printer
go/printer.keepTypeColumn 302 -> 270 (-10.60%)
go/printer.(*printer).binaryExpr changed
cmd/compile/internal/syntax
cmd/compile/internal/syntax.(*scanner).rune changed
cmd/compile/internal/syntax.(*scanner).number 2137 -> 2153 (+0.75%)
Change-Id: I7f95f54b03a35d0b616c40f38b415a7feb71be73
Reviewed-on: https://go-review.googlesource.com/c/go/+/666835
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Jakub Ciolek <jakub@ciolek.dev>
TryBot-Bypass: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Use an automatic algorithm to generate strength reduction code.
You give it all the linear combination (a*x+b*y) instructions in your
architecture, and it figures out the rest.
Just amd64 and arm64 for now.
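To illustrate the idea, a multiplication by a constant can be decomposed into a chain of linear combinations. The decompositions below are hand-picked for this sketch (9 = 1 + 8 and 45 = 9 * 5); which sequence the generated code actually selects is not stated in the message.

```go
package main

import "fmt"

// mul9 computes 9*x as one linear combination: x + 8*x.
func mul9(x int64) int64 {
	return x + 8*x
}

// mul45 computes 45*x as two chained linear combinations:
// t = x + 8*x (9x), then t + 4*t (45x).
func mul45(x int64) int64 {
	t := x + 8*x
	return t + 4*t
}

func main() {
	fmt.Println(mul9(7) == 9*7, mul45(7) == 45*7)
}
```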
Fixes #67575
Change-Id: I35c69382bebb1d2abf4bb4e7c43fd8548c6c59a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/626998
Reviewed-by: Jakub Ciolek <jakub@ciolek.dev>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
For riscv64/rva22u64 and above, we can intrinsify math/bits.OnesCount
using the CPOP/CPOPW machine instructions. Since the native Go
implementation of OnesCount is relatively expensive, it is also
worth emitting a check for Zbb support when compiled for rva20u64.
On a Banana Pi F3, with GORISCV64=rva22u64:
│ oc.1 │ oc.2 │
│ sec/op │ sec/op vs base │
OnesCount-8 16.930n ± 0% 4.389n ± 0% -74.08% (p=0.000 n=10)
OnesCount8-8 5.642n ± 0% 5.016n ± 0% -11.10% (p=0.000 n=10)
OnesCount16-8 9.404n ± 0% 5.015n ± 0% -46.67% (p=0.000 n=10)
OnesCount32-8 13.165n ± 0% 4.388n ± 0% -66.67% (p=0.000 n=10)
OnesCount64-8 16.300n ± 0% 4.388n ± 0% -73.08% (p=0.000 n=10)
geomean 11.40n 4.629n -59.40%
On a Banana Pi F3, compiled with GORISCV64=rva20u64 and with Zbb
detection enabled:
│ oc.3 │ oc.4 │
│ sec/op │ sec/op vs base │
OnesCount-8 16.930n ± 0% 5.643n ± 0% -66.67% (p=0.000 n=10)
OnesCount8-8 5.642n ± 0% 5.642n ± 0% ~ (p=0.447 n=10)
OnesCount16-8 10.030n ± 0% 6.896n ± 0% -31.25% (p=0.000 n=10)
OnesCount32-8 13.170n ± 0% 5.642n ± 0% -57.16% (p=0.000 n=10)
OnesCount64-8 16.300n ± 0% 5.642n ± 0% -65.39% (p=0.000 n=10)
geomean 11.55n 5.873n -49.16%
On a Banana Pi F3, compiled with GORISCV64=rva20u64 but with Zbb
detection disabled:
│ oc.3 │ oc.5 │
│ sec/op │ sec/op vs base │
OnesCount-8 16.93n ± 0% 29.47n ± 0% +74.07% (p=0.000 n=10)
OnesCount8-8 5.642n ± 0% 5.643n ± 0% ~ (p=0.191 n=10)
OnesCount16-8 10.03n ± 0% 15.05n ± 0% +50.05% (p=0.000 n=10)
OnesCount32-8 13.17n ± 0% 18.18n ± 0% +38.04% (p=0.000 n=10)
OnesCount64-8 16.30n ± 0% 21.94n ± 0% +34.60% (p=0.000 n=10)
geomean 11.55n 15.84n +37.16%
For hardware without Zbb, this adds ~5ns overhead, while for hardware
with Zbb we achieve a performance gain of up to 11ns. It is worth
noting that OnesCount8 is cheap enough that it is preferable to stick
with the generic version in this case.
Change-Id: Id657e40e0dd1b1ab8cc0fe0f8a68df4c9f2d7da5
Reviewed-on: https://go-review.googlesource.com/c/go/+/660856
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
For riscv64/rva22u64 and above, we can intrinsify math/bits.ReverseBytes
(Bswap) using the REV8 machine instruction.
On a StarFive VisionFive 2 with GORISCV64=rva22u64:
│ rb.1 │ rb.2 │
│ sec/op │ sec/op vs base │
ReverseBytes-4 18.790n ± 0% 4.026n ± 0% -78.57% (p=0.000 n=10)
ReverseBytes16-4 6.710n ± 0% 5.368n ± 0% -20.00% (p=0.000 n=10)
ReverseBytes32-4 13.420n ± 0% 5.368n ± 0% -60.00% (p=0.000 n=10)
ReverseBytes64-4 17.450n ± 0% 4.026n ± 0% -76.93% (p=0.000 n=10)
geomean 13.11n 4.649n -64.54%
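The functions measured above are the math/bits.ReverseBytes family. A short sketch with a hypothetical wrapper named swap; the instruction selection behind each call is toolchain-dependent.

```go
package main

import (
	"fmt"
	"math/bits"
)

// swap reverses the byte order of the low 16, low 32, and all 64 bits of x.
func swap(x uint64) (uint16, uint32, uint64) {
	return bits.ReverseBytes16(uint16(x)),
		bits.ReverseBytes32(uint32(x)),
		bits.ReverseBytes64(x)
}

func main() {
	a, b, c := swap(0x0102030405060708)
	fmt.Printf("%#x %#x %#x\n", a, b, c)
}
```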
Change-Id: I26eee34270b1721f7304bb1cddb0fda129b20ece
Reviewed-on: https://go-review.googlesource.com/c/go/+/660855
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
|
|
The full 64x64->128 multiply comes up when using bits.Mul64.
The 64x64->64+overflow multiply comes up in unsafe.Slice when using
a constant length.
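Both sources of these multiplies can be sketched as follows. The helper names are hypothetical, and the comment about unsafe.Slice reflects the message's description rather than anything observable from the program.

```go
package main

import (
	"fmt"
	"math/bits"
	"unsafe"
)

// fullMul returns the 128-bit product of x and y as (hi, lo).
func fullMul(x, y uint64) (hi, lo uint64) {
	return bits.Mul64(x, y)
}

// first8 builds a slice with a constant length. Per the message, the
// length-times-element-size overflow check is where the
// 64x64->64+overflow multiply comes from.
func first8(p *uint32) []uint32 {
	return unsafe.Slice(p, 8)
}

func main() {
	hi, lo := fullMul(1<<63, 4)
	fmt.Println(hi, lo)

	var arr [8]uint32
	s := first8(&arr[0])
	s[7] = 42
	fmt.Println(len(s), arr[7])
}
```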
Change-Id: I298515162ca07d804b2d699d03bc957ca30a4ebc
Reviewed-on: https://go-review.googlesource.com/c/go/+/667175
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
If the thing we're ranging over is an array or ptr to array, and
it doesn't have a function call or channel receive in it, then we
shouldn't evaluate it.
Typecheck the ranged-over value as a constant in that case.
That makes the unified exporter replace the range expression
with a constant int.
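The language rule involved can be seen with a nil pointer to an array: with at most one iteration variable and a constant len, the range expression is not evaluated, so the loop runs without dereferencing the pointer. A small sketch with a hypothetical function name:

```go
package main

import "fmt"

// sumIndexes ranges over a nil *[4]int using only the index variable.
// len(p) is the constant 4, so p is never dereferenced.
func sumIndexes() int {
	var p *[4]int // nil pointer to array
	n := 0
	for i := range p {
		n += i
	}
	return n
}

func main() {
	fmt.Println(sumIndexes()) // prints 6 (0+1+2+3)
}
```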
Change-Id: I0d4ea081de70d20cf6d1fa8d25ef6cb021975554
Reviewed-on: https://go-review.googlesource.com/c/go/+/659317
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
|
|
goos: linux
goarch: loong64
pkg: unicode/utf8
cpu: Loongson-3A6000-HV @ 2500.00MHz
│ old │ new │
│ sec/op │ sec/op vs base │
ValidTenASCIIChars 7.604n ± 0% 6.805n ± 0% -10.51% (p=0.000 n=10)
Valid100KASCIIChars 37.41µ ± 0% 16.58µ ± 0% -55.67% (p=0.000 n=10)
ValidTenJapaneseChars 60.84n ± 0% 58.62n ± 0% -3.64% (p=0.000 n=10)
ValidLongMostlyASCII 113.5µ ± 0% 113.5µ ± 0% ~ (p=0.303 n=10)
ValidLongJapanese 204.6µ ± 0% 206.8µ ± 0% +1.07% (p=0.000 n=10)
ValidStringTenASCIIChars 7.604n ± 0% 6.803n ± 0% -10.53% (p=0.000 n=10)
ValidString100KASCIIChars 38.05µ ± 0% 17.14µ ± 0% -54.97% (p=0.000 n=10)
ValidStringTenJapaneseChars 60.58n ± 0% 59.48n ± 0% -1.82% (p=0.000 n=10)
ValidStringLongMostlyASCII 113.5µ ± 0% 113.4µ ± 0% -0.10% (p=0.000 n=10)
ValidStringLongJapanese 205.9µ ± 0% 207.3µ ± 0% +0.67% (p=0.000 n=10)
geomean 3.324µ 2.756µ -17.08%
Change-Id: Id43b6e2e41907bd4b92f421dacde31f048db47d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/662495
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Improve the compiler's store-to-load forwarding optimization by relaxing the
type comparison condition. Instead of requiring exact type equality (CMPeq),
we now use copyCompatibleType which allows forwarding between compatible
types where safe.
Fix several size comparison bugs in the nested store patterns. Previously,
we were comparing the size of the outer store with the load type,
rather than comparing with the size of the actual store being forwarded
from.
Skip OpConvert in dead store elimination to help get rid of dead stores such
as zeroing slices. OpConvert, like OpInlMark, doesn't really use the memory.
This optimization is particularly beneficial for code that creates slices with
computed pointers, such as the runtime's heapBitsSlice function, where
intermediate calculations were previously causing the compiler to miss
store-to-load forwarding opportunities.
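As a rough illustration of forwarding between compatible types, the sketch below stores a uint64 and immediately loads the same memory as an int64. The function is hypothetical, and whether this exact pattern is forwarded by copyCompatibleType is an assumption; the program only demonstrates that the two views agree bit for bit.

```go
package main

import (
	"fmt"
	"unsafe"
)

// roundTrip stores v as a uint64 and reloads the same 8 bytes as an int64.
// The store and the load have different but same-sized integer types.
func roundTrip(v uint64) int64 {
	var slot uint64
	slot = v
	return *(*int64)(unsafe.Pointer(&slot))
}

func main() {
	fmt.Println(roundTrip(42), roundTrip(^uint64(0)))
}
```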
Local sweet run result on an x86_64 laptop:
│ Orig.res │ Hopt.res │
│ sec/op │ sec/op vs base │
BiogoIgor-8 5.303 ± 1% 5.322 ± 1% ~ (p=0.190 n=10)
BiogoKrishna-8 7.894 ± 1% 7.828 ± 2% ~ (p=0.190 n=10)
BleveIndexBatch100-8 2.257 ± 1% 2.248 ± 2% ~ (p=0.529 n=10)
EtcdPut-8 30.12m ± 1% 30.03m ± 1% ~ (p=0.796 n=10)
EtcdSTM-8 127.1m ± 1% 126.2m ± 0% -0.74% (p=0.023 n=10)
GoBuildKubelet-8 52.21 ± 0% 52.05 ± 1% ~ (p=0.063 n=10)
GoBuildKubeletLink-8 4.342 ± 1% 4.305 ± 0% -0.85% (p=0.000 n=10)
GoBuildIstioctl-8 43.33 ± 0% 43.24 ± 0% -0.22% (p=0.015 n=10)
GoBuildIstioctlLink-8 4.604 ± 1% 4.598 ± 0% ~ (p=0.063 n=10)
GoBuildFrontend-8 15.33 ± 0% 15.29 ± 0% ~ (p=0.143 n=10)
GoBuildFrontendLink-8 740.0m ± 1% 737.7m ± 1% ~ (p=0.912 n=10)
GopherLuaKNucleotide-8 9.590 ± 1% 9.656 ± 1% ~ (p=0.165 n=10)
MarkdownRenderXHTML-8 96.97m ± 1% 97.26m ± 2% ~ (p=0.105 n=10)
Tile38QueryLoad-8 335.9µ ± 1% 335.6µ ± 1% ~ (p=0.481 n=10)
geomean 1.336 1.333 -0.22%
Change-Id: I031552623e6d5a3b1b5be8325e6314706e45534f
Reviewed-on: https://go-review.googlesource.com/c/go/+/662075
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Optimise more branches with zero on riscv64. In particular, BLTU with
zero occurs with IsInBounds checks for index zero. This currently results
in two instructions and requires an additional register:
li t2, 0
bltu t2, t1, 0x174b4
This is equivalent to checking if the bounds is not equal to zero. With
this change:
bnez t1, 0x174c0
This removes more than 500 instructions from the Go binary on riscv64.
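A Go function that produces an IsInBounds check for index zero is as simple as the sketch below (the function name is hypothetical). The check 0 < len(s) is the unsigned comparison against zero that the message describes.

```go
package main

import "fmt"

// first indexes element zero, so the bounds check is 0 < len(s),
// which is equivalent to len(s) != 0.
func first(s []int) int {
	return s[0]
}

func main() {
	fmt.Println(first([]int{7, 8, 9}))
}
```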
Change-Id: I6cd861d853e3ef270bd46dacecdfaa205b1c4644
Reviewed-on: https://go-review.googlesource.com/c/go/+/606715
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
All other files here use the codegen package.
Change-Id: I714162941b9fa9051dacc29643e905fe60b9304b
Reviewed-on: https://go-review.googlesource.com/c/go/+/661135
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
This adds tests for type conversion and shifts, detailing various
instances of poor code generation that currently exist for riscv64.
These will be addressed in future CLs.
Change-Id: Ie1d366dfe878832df691600f8500ef383da92848
Reviewed-on: https://go-review.googlesource.com/c/go/+/615678
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|