aboutsummaryrefslogtreecommitdiff
path: root/test/codegen
AgeCommit message (Collapse)Author
2025-09-04cmd/compile/internal/ssa: load constant values from abi.PtrType.ElemYoulin Feng
This CL makes the generated code for reflect.TypeFor as simple as an intrinsic function. Fixes #75203 Change-Id: I7bb48787101f07e77ab5c583292e834c28a028d6 Reviewed-on: https://go-review.googlesource.com/c/go/+/700336 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Keith Randall <khr@golang.org>
2025-08-25cmd/compile/internal: optimizing add+sll rule using ALSLV instruction on loong64limeidan
Reduce the number of go toolchain instructions on loong64 as follows: file before after Δ % go 1573148 1571708 -1,440 -0.0915% gofmt 320578 320090 -488 -0.1522% asm 555066 554406 -660 -0.1189% cgo 481566 480926 -640 -0.1329% compile 2475962 2473880 -2,082 -0.0841% cover 516536 515920 -616 -0.1193% link 702172 701404 -768 -0.1094% preprofile 238626 238274 -352 -0.1475% vet 792928 792100 -828 -0.1044% Change-Id: I61e462726835959c60e1b4e5256d4020202418ab Reviewed-on: https://go-review.googlesource.com/c/go/+/693877 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2025-08-24test/codegen: add Mul2 and DivPow2 test for loong64Xiaolin Zhao
Change-Id: I29ccd105c5418955146a3f4873162963da489a70 Reviewed-on: https://go-review.googlesource.com/c/go/+/697935 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-08-24test/codegen: add Mul* test for loong64Xiaolin Zhao
Change-Id: Ica285212e4884a96fe9738b53cdc789b223bf2e3 Reviewed-on: https://go-review.googlesource.com/c/go/+/697895 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2025-08-24test/codegen: add sqrt* abs and copysign test for loong64Xiaolin Zhao
Change-Id: I645396fc4b00242f36a06f01550906805c0c1f73 Reviewed-on: https://go-review.googlesource.com/c/go/+/697955 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-08-21cmd/compile: use zero register instead of specialized *zero instructions on ↵limeidan
loong64 Refer to CL 633075, loong64 has a zero(R0) register that can be used to do this. Change-Id: I846c6bdfcfd6dbfa18338afc13e34e350580ead4 Reviewed-on: https://go-review.googlesource.com/c/go/+/693876 Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org>
2025-08-21cmd/compile: optimize some patterns into revb2h/revb4h instruction on loong64Xiaolin Zhao
Pattern1: (the type of c is uint16) c>>8 | c<<8 To: revb2h c Pattern2: (the type of c is uint32) (c & 0xff00ff00)>>8 | (c & 0x00ff00ff)<<8 To: revb2h c Pattern3: (the type of c is uint64) (c & 0xff00ff00ff00ff00)>>8 | (c & 0x00ff00ff00ff00ff)<<8 To: revb4h c Change-Id: Ic6231a3f476cbacbea4bd00e31193d107cb86cda Reviewed-on: https://go-review.googlesource.com/c/go/+/696335 Reviewed-by: Meidan Li <limeidan@loongson.cn> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-08-21cmd/compile: optimize rule (x + x) << c to x << c+1 on loong64Xiaolin Zhao
Change-Id: I782f93510bba92ba60b298c1c1cde456c8bcec38 Reviewed-on: https://go-review.googlesource.com/c/go/+/697956 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-08-13cmd/compile: emit classify instructions for infinity tests on riscv64Michael Munday
The 'classify' instruction on RISC-V sets a bit in a mask to indicate the class a floating point value belongs to (e.g. whether the value is an infinity, a normal number, a subnormal number and so on). There are other places this instruction is useful but for now I've just used it for infinity tests. The gains are relatively small (~1-2 instructions per IsInf call) but using FCLASSD does potentially unlock further optimizations. It also reduces the number of loads from memory and the number of moves between general purpose and floating point register files. goos: linux goarch: riscv64 pkg: math cpu: Spacemit(R) X60 │ sec/op │ sec/op vs base │ Acos 159.9n ± 0% 173.7n ± 0% +8.66% (p=0.000 n=10) Acosh 249.8n ± 0% 254.4n ± 0% +1.86% (p=0.000 n=10) Asin 159.9n ± 0% 173.7n ± 0% +8.66% (p=0.000 n=10) Asinh 292.2n ± 0% 283.0n ± 0% -3.15% (p=0.000 n=10) Atan 119.1n ± 0% 119.0n ± 0% -0.08% (p=0.036 n=10) Atanh 265.1n ± 0% 271.6n ± 0% +2.43% (p=0.000 n=10) Atan2 194.9n ± 0% 186.7n ± 0% -4.23% (p=0.000 n=10) Cbrt 216.3n ± 0% 203.1n ± 0% -6.10% (p=0.000 n=10) Ceil 31.82n ± 0% 31.81n ± 0% ~ (p=0.063 n=10) Copysign 4.897n ± 0% 4.893n ± 3% -0.08% (p=0.038 n=10) Cos 123.9n ± 0% 107.7n ± 1% -13.03% (p=0.000 n=10) Cosh 293.0n ± 0% 264.6n ± 0% -9.68% (p=0.000 n=10) Erf 150.0n ± 0% 133.8n ± 0% -10.80% (p=0.000 n=10) Erfc 151.8n ± 0% 137.9n ± 0% -9.16% (p=0.000 n=10) Erfinv 173.8n ± 0% 173.8n ± 0% ~ (p=0.820 n=10) Erfcinv 173.8n ± 0% 173.8n ± 0% ~ (p=1.000 n=10) Exp 247.7n ± 0% 220.4n ± 0% -11.04% (p=0.000 n=10) ExpGo 261.4n ± 0% 232.5n ± 0% -11.04% (p=0.000 n=10) Expm1 176.2n ± 0% 164.9n ± 0% -6.41% (p=0.000 n=10) Exp2 220.4n ± 0% 190.2n ± 0% -13.70% (p=0.000 n=10) Exp2Go 232.5n ± 0% 204.0n ± 0% -12.22% (p=0.000 n=10) Abs 4.897n ± 0% 4.897n ± 0% ~ (p=0.726 n=10) Dim 16.32n ± 0% 16.31n ± 0% ~ (p=0.770 n=10) Floor 31.84n ± 0% 31.83n ± 0% ~ (p=0.677 n=10) Max 26.11n ± 0% 26.13n ± 0% ~ (p=0.290 n=10) Min 26.10n ± 0% 26.11n ± 0% ~ (p=0.424 n=10) Mod 416.2n ± 0% 337.8n ± 0% -18.83% (p=0.000 n=10) Frexp 63.65n ± 0% 50.60n ± 0% -20.50% (p=0.000 n=10) Gamma 218.8n ± 0% 206.4n ± 0% -5.62% (p=0.000 n=10) Hypot 92.20n ± 0% 94.69n ± 0% +2.70% (p=0.000 n=10) HypotGo 107.7n ± 0% 109.3n ± 0% +1.49% (p=0.000 n=10) Ilogb 59.54n ± 0% 44.04n ± 0% -26.04% (p=0.000 n=10) J0 708.9n ± 0% 674.5n ± 0% -4.86% (p=0.000 n=10) J1 707.6n ± 0% 676.1n ± 0% -4.44% (p=0.000 n=10) Jn 1.513µ ± 0% 1.427µ ± 0% -5.68% (p=0.000 n=10) Ldexp 70.20n ± 0% 57.09n ± 0% -18.68% (p=0.000 n=10) Lgamma 201.5n ± 0% 185.3n ± 1% -8.01% (p=0.000 n=10) Log 201.5n ± 0% 182.7n ± 0% -9.35% (p=0.000 n=10) Logb 59.54n ± 0% 46.53n ± 0% -21.86% (p=0.000 n=10) Log1p 178.8n ± 0% 173.9n ± 6% -2.74% (p=0.021 n=10) Log10 201.4n ± 0% 184.3n ± 0% -8.49% (p=0.000 n=10) Log2 79.17n ± 0% 66.07n ± 0% -16.54% (p=0.000 n=10) Modf 34.27n ± 0% 34.25n ± 0% ~ (p=0.559 n=10) Nextafter32 49.34n ± 0% 49.37n ± 0% +0.05% (p=0.040 n=10) Nextafter64 43.66n ± 0% 43.66n ± 0% ~ (p=0.869 n=10) PowInt 309.1n ± 0% 267.4n ± 0% -13.49% (p=0.000 n=10) PowFrac 769.6n ± 0% 677.3n ± 0% -11.98% (p=0.000 n=10) Pow10Pos 13.88n ± 0% 13.88n ± 0% ~ (p=0.811 n=10) Pow10Neg 19.58n ± 0% 19.57n ± 0% ~ (p=0.993 n=10) Round 23.65n ± 0% 23.66n ± 0% ~ (p=0.354 n=10) RoundToEven 27.75n ± 0% 27.75n ± 0% ~ (p=0.971 n=10) Remainder 380.0n ± 0% 309.9n ± 0% -18.45% (p=0.000 n=10) Signbit 13.06n ± 0% 13.06n ± 0% ~ (p=1.000 n=10) Sin 133.8n ± 0% 120.8n ± 0% -9.75% (p=0.000 n=10) Sincos 160.7n ± 0% 147.7n ± 0% -8.12% (p=0.000 n=10) Sinh 305.9n ± 0% 277.9n ± 0% -9.17% (p=0.000 n=10) SqrtIndirect 3.265n ± 0% 3.264n ± 0% ~ (p=0.546 n=10) SqrtLatency 19.58n ± 0% 19.58n ± 0% ~ (p=0.973 n=10) SqrtIndirectLatency 19.59n ± 0% 19.58n ± 0% ~ (p=0.370 n=10) SqrtGoLatency 205.7n ± 0% 202.7n ± 0% -1.46% (p=0.000 n=10) SqrtPrime 4.953µ ± 0% 4.954µ ± 0% ~ (p=0.477 n=10) Tan 163.2n ± 0% 150.2n ± 0% -7.99% (p=0.000 n=10) Tanh 312.4n ± 0% 284.2n ± 0% -9.01% (p=0.000 n=10) Trunc 31.83n ± 0% 31.83n ± 0% ~ (p=0.663 n=10) Y0 701.0n ± 0% 669.2n ± 0% -4.54% (p=0.000 n=10) Y1 704.5n ± 0% 672.4n ± 0% -4.55% (p=0.000 n=10) Yn 1.490µ ± 0% 1.422µ ± 0% -4.60% (p=0.000 n=10) Float64bits 5.713n ± 0% 5.710n ± 0% ~ (p=0.926 n=10) Float64frombits 4.896n ± 0% 4.896n ± 0% ~ (p=0.663 n=10) Float32bits 12.25n ± 0% 12.25n ± 0% ~ (p=0.571 n=10) Float32frombits 4.898n ± 0% 4.896n ± 0% ~ (p=0.754 n=10) FMA 4.895n ± 0% 4.895n ± 0% ~ (p=0.745 n=10) geomean 94.40n 89.43n -5.27% Change-Id: I4fe0f2e9f609e38d79463f9ba2519a3f9427432e Reviewed-on: https://go-review.googlesource.com/c/go/+/348389 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com>
2025-08-12cmd/compile/internal: optimize multiplication use new operation ↵limeidan
'ADDshiftLLV' on loong64 goos: linux goarch: loong64 pkg: cmd/compile/internal/test cpu: Loongson-3A6000-HV @ 2500.00MHz │ old │ new │ │ sec/op │ sec/op vs base │ MulconstI32/3 0.8004n ± 0% 0.4247n ± 2% -46.94% (p=0.000 n=10) MulconstI32/5 0.8005n ± 0% 0.4256n ± 1% -46.83% (p=0.000 n=10) MulconstI32/12 1.2010n ± 0% 0.8005n ± 0% -33.35% (p=0.000 n=10) MulconstI32/120 0.8090n ± 0% 0.8067n ± 0% -0.28% (p=0.007 n=10) MulconstI32/-120 0.8109n ± 0% 0.8072n ± 0% -0.47% (p=0.000 n=10) MulconstI32/65537 0.8004n ± 0% 0.8004n ± 0% ~ (p=1.000 n=10) MulconstI32/65538 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.265 n=10) MulconstI64/3 0.8005n ± 0% 0.4241n ± 1% -47.02% (p=0.000 n=10) MulconstI64/5 0.8004n ± 0% 0.4249n ± 1% -46.91% (p=0.000 n=10) MulconstI64/12 1.2010n ± 0% 0.8004n ± 0% -33.36% (p=0.000 n=10) MulconstI64/120 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.635 n=10) MulconstI64/-120 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.837 n=10) MulconstI64/65537 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.837 n=10) MulconstI64/65538 0.8096n ± 0% 0.8004n ± 0% -1.14% (p=0.000 n=10) MulconstU32/3 0.8004n ± 0% 0.4263n ± 1% -46.75% (p=0.000 n=10) MulconstU32/5 0.8005n ± 0% 0.4262n ± 1% -46.76% (p=0.000 n=10) MulconstU32/12 1.2010n ± 0% 0.8005n ± 0% -33.35% (p=0.000 n=10) MulconstU32/120 0.8105n ± 0% 0.8096n ± 0% ~ (p=0.183 n=10) MulconstU32/65537 0.8004n ± 0% 0.8004n ± 0% ~ (p=1.000 n=10) MulconstU32/65538 0.8005n ± 0% 0.8005n ± 0% ~ (p=1.000 n=10) MulconstU64/3 0.8004n ± 0% 0.4265n ± 4% -46.71% (p=0.000 n=10) MulconstU64/5 0.8004n ± 0% 0.4256n ± 0% -46.82% (p=0.000 n=10) MulconstU64/12 1.2010n ± 0% 0.8004n ± 0% -33.36% (p=0.000 n=10) MulconstU64/120 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.387 n=10) MulconstU64/65537 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.265 n=10) MulconstU64/65538 0.8080n ± 0% 0.8004n ± 0% -0.93% (p=0.000 n=10) geomean 0.8539n 0.6597n -22.74% Change-Id: Ie33e88985d7639f481bbba540bc917b9f185c357 Reviewed-on: https://go-review.googlesource.com/c/go/+/693855 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn> Reviewed-by: abner chenc <chenguoqi@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-08-12cmd/compile: soften test for 74788Keith Randall
We now (as of CL 678620) use float registers other than X0 for copying. Change-Id: Ifdecd5df7519663742eed0f292c98453754d4b25 Reviewed-on: https://go-review.googlesource.com/c/go/+/695275 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
2025-08-11cmd/compile: allow InlMark operations to be speculatively executedMichael Munday
Although InlMark takes a memory argument it ultimately becomes a NOP and therefore is safe to speculatively execute. Fixes #74915 Change-Id: I64317dd433e300ac28de2bcf201845083ec2ac82 Reviewed-on: https://go-review.googlesource.com/c/go/+/693795 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-08-05cmd/compile: fold negation into multiplication on loong64Xiaolin Zhao
This change also add corresponding benchmark tests and codegen tests. The performance improvement on CPU Loongson-3A6000-HV is as follows: goos: linux goarch: loong64 pkg: cmd/compile/internal/test cpu: Loongson-3A6000-HV @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | MulNeg 828.4n ± 0% 655.9n ± 0% -20.82% (p=0.000 n=10) Mul2Neg 1062.0n ± 0% 826.8n ± 0% -22.15% (p=0.000 n=10) geomean 938.0n 736.4n -21.49% Change-Id: Ia999732880ec65be0c66cddc757a4868847e5b15 Reviewed-on: https://go-review.googlesource.com/c/go/+/682535 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Freeman <markfreeman@google.com>
2025-08-05cmd/compile: optimise float <-> int register moves on riscv64Michael Munday
Use the FMV* instructions to move values between the floating point and integer register files. Note: I'm unsure why there is a slowdown in the Float32bits benchmark, I've checked and an FMVXS instruction is being used as expected. There are multiple loads and other instructions in the main loop. goos: linux goarch: riscv64 pkg: math cpu: Spacemit(R) X60 │ fmv-before.txt │ fmv-after.txt │ │ sec/op │ sec/op vs base │ Acos 122.7n ± 0% 122.7n ± 0% ~ (p=1.000 n=10) Acosh 197.2n ± 0% 191.5n ± 0% -2.89% (p=0.000 n=10) Asin 122.7n ± 0% 122.7n ± 0% ~ (p=0.474 n=10) Asinh 231.0n ± 0% 224.1n ± 0% -2.99% (p=0.000 n=10) Atan 91.39n ± 0% 91.41n ± 0% ~ (p=0.465 n=10) Atanh 210.3n ± 0% 203.4n ± 0% -3.26% (p=0.000 n=10) Atan2 149.6n ± 0% 149.6n ± 0% ~ (p=0.721 n=10) Cbrt 176.5n ± 0% 165.9n ± 0% -6.01% (p=0.000 n=10) Ceil 25.67n ± 0% 24.42n ± 0% -4.87% (p=0.000 n=10) Copysign 3.756n ± 0% 3.756n ± 0% ~ (p=0.149 n=10) Cos 95.15n ± 0% 95.15n ± 0% ~ (p=0.374 n=10) Cosh 228.6n ± 0% 224.7n ± 0% -1.71% (p=0.000 n=10) Erf 115.2n ± 0% 115.2n ± 0% ~ (p=0.474 n=10) Erfc 116.4n ± 0% 116.4n ± 0% ~ (p=0.628 n=10) Erfinv 133.3n ± 0% 133.3n ± 0% ~ (p=1.000 n=10) Erfcinv 133.3n ± 0% 133.3n ± 0% ~ (p=1.000 n=10) Exp 194.1n ± 0% 190.3n ± 0% -1.93% (p=0.000 n=10) ExpGo 204.7n ± 0% 200.3n ± 0% -2.15% (p=0.000 n=10) Expm1 137.7n ± 0% 135.2n ± 0% -1.82% (p=0.000 n=10) Exp2 173.4n ± 0% 169.0n ± 0% -2.54% (p=0.000 n=10) Exp2Go 182.8n ± 0% 178.4n ± 0% -2.41% (p=0.000 n=10) Abs 3.756n ± 0% 3.756n ± 0% ~ (p=0.157 n=10) Dim 12.52n ± 0% 12.52n ± 0% ~ (p=0.737 n=10) Floor 25.67n ± 0% 24.42n ± 0% -4.87% (p=0.000 n=10) Max 21.29n ± 0% 20.03n ± 0% -5.92% (p=0.000 n=10) Min 21.28n ± 0% 20.04n ± 0% -5.85% (p=0.000 n=10) Mod 344.9n ± 0% 319.2n ± 0% -7.45% (p=0.000 n=10) Frexp 55.71n ± 0% 48.85n ± 0% -12.30% (p=0.000 n=10) Gamma 165.9n ± 0% 167.8n ± 0% +1.15% (p=0.000 n=10) Hypot 73.24n ± 0% 70.74n ± 0% -3.41% (p=0.000 n=10) HypotGo 84.50n ± 0% 82.63n ± 0% -2.21% (p=0.000 n=10) Ilogb 49.45n ± 0% 45.70n ± 0% -7.59% (p=0.000 n=10) J0 556.5n ± 0% 544.0n ± 0% -2.25% (p=0.000 n=10) J1 555.3n ± 0% 542.8n ± 0% -2.24% (p=0.000 n=10) Jn 1.181µ ± 0% 1.156µ ± 0% -2.12% (p=0.000 n=10) Ldexp 59.47n ± 0% 53.84n ± 0% -9.47% (p=0.000 n=10) Lgamma 167.2n ± 0% 154.6n ± 0% -7.51% (p=0.000 n=10) Log 160.9n ± 0% 154.6n ± 0% -3.92% (p=0.000 n=10) Logb 49.45n ± 0% 45.70n ± 0% -7.58% (p=0.000 n=10) Log1p 147.1n ± 0% 137.1n ± 0% -6.80% (p=0.000 n=10) Log10 162.1n ± 1% 154.6n ± 0% -4.63% (p=0.000 n=10) Log2 66.99n ± 0% 60.72n ± 0% -9.36% (p=0.000 n=10) Modf 29.42n ± 0% 26.29n ± 0% -10.64% (p=0.000 n=10) Nextafter32 41.95n ± 0% 37.88n ± 0% -9.70% (p=0.000 n=10) Nextafter64 38.82n ± 0% 33.49n ± 0% -13.73% (p=0.000 n=10) PowInt 252.3n ± 0% 237.3n ± 0% -5.95% (p=0.000 n=10) PowFrac 615.5n ± 0% 589.7n ± 0% -4.19% (p=0.000 n=10) Pow10Pos 10.64n ± 0% 10.64n ± 0% ~ (p=1.000 n=10) Pow10Neg 24.42n ± 0% 15.02n ± 0% -38.49% (p=0.000 n=10) Round 21.91n ± 0% 18.16n ± 0% -17.12% (p=0.000 n=10) RoundToEven 24.42n ± 0% 21.29n ± 0% -12.84% (p=0.000 n=10) Remainder 308.0n ± 0% 291.2n ± 0% -5.44% (p=0.000 n=10) Signbit 10.02n ± 0% 10.02n ± 0% ~ (p=1.000 n=10) Sin 102.7n ± 0% 102.7n ± 0% ~ (p=0.211 n=10) Sincos 124.0n ± 1% 123.3n ± 0% -0.56% (p=0.002 n=10) Sinh 239.1n ± 0% 234.7n ± 0% -1.84% (p=0.000 n=10) SqrtIndirect 2.504n ± 0% 2.504n ± 0% ~ (p=0.303 n=10) SqrtLatency 15.03n ± 0% 15.02n ± 0% ~ (p=0.598 n=10) SqrtIndirectLatency 15.02n ± 0% 15.02n ± 0% ~ (p=0.907 n=10) SqrtGoLatency 165.3n ± 0% 157.2n ± 0% -4.90% (p=0.000 n=10) SqrtPrime 3.801µ ± 0% 3.802µ ± 0% ~ (p=1.000 n=10) Tan 125.2n ± 0% 125.2n ± 0% ~ (p=0.458 n=10) Tanh 244.2n ± 0% 239.9n ± 0% -1.76% (p=0.000 n=10) Trunc 25.67n ± 0% 24.42n ± 0% -4.87% (p=0.000 n=10) Y0 550.2n ± 0% 538.1n ± 0% -2.21% (p=0.000 n=10) Y1 552.8n ± 0% 540.6n ± 0% -2.21% (p=0.000 n=10) Yn 1.168µ ± 0% 1.143µ ± 0% -2.14% (p=0.000 n=10) Float64bits 8.139n ± 0% 4.385n ± 0% -46.13% (p=0.000 n=10) Float64frombits 7.512n ± 0% 3.759n ± 0% -49.96% (p=0.000 n=10) Float32bits 8.138n ± 0% 9.393n ± 0% +15.42% (p=0.000 n=10) Float32frombits 7.513n ± 0% 3.757n ± 0% -49.98% (p=0.000 n=10) FMA 3.756n ± 0% 3.756n ± 0% ~ (p=0.246 n=10) geomean 77.43n 72.42n -6.47% Change-Id: I8dac69b1d17cb3d2af78d1c844d2b5d80000d667 Reviewed-on: https://go-review.googlesource.com/c/go/+/599235 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Michael Munday <mikemndy@gmail.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>
2025-08-01cmd/compile: optimize multiplication rules on loong64Xiaolin Zhao
Improve multiplication strength reduction, refer to CL 626998, add additional 3 linear combination instructions for loong64. goos: linux goarch: loong64 pkg: cmd/compile/internal/test cpu: Loongson-3A6000-HV @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | MulconstI32/3 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10) MulconstI32/5 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10) MulconstI32/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10) MulconstI32/120 1.6010n ± 0% 0.8130n ± 0% -49.22% (p=0.000 n=10) MulconstI32/-120 1.6010n ± 0% 0.8109n ± 0% -49.35% (p=0.000 n=10) MulconstI32/65537 1.6275n ± 0% 0.8005n ± 0% -50.81% (p=0.000 n=10) MulconstI32/65538 1.6290n ± 0% 0.8004n ± 0% -50.87% (p=0.000 n=10) MulconstI64/3 1.6010n ± 0% 0.8004n ± 0% -50.01% (p=0.000 n=10) MulconstI64/5 1.6010n ± 0% 0.8004n ± 0% -50.01% (p=0.000 n=10) MulconstI64/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10) MulconstI64/120 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10) MulconstI64/-120 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10) MulconstI64/65537 1.6270n ± 0% 0.8005n ± 0% -50.80% (p=0.000 n=10) MulconstI64/65538 1.6290n ± 0% 0.8071n ± 1% -50.45% (p=0.000 n=10) MulconstU32/3 1.6010n ± 0% 0.8004n ± 0% -50.01% (p=0.000 n=10) MulconstU32/5 1.6010n ± 0% 0.8004n ± 0% -50.01% (p=0.000 n=10) MulconstU32/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10) MulconstU32/120 1.6010n ± 0% 0.8066n ± 0% -49.62% (p=0.000 n=10) MulconstU32/65537 1.6290n ± 0% 0.8005n ± 0% -50.86% (p=0.000 n=10) MulconstU32/65538 1.6280n ± 0% 0.8005n ± 0% -50.83% (p=0.000 n=10) MulconstU64/3 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10) MulconstU64/5 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10) MulconstU64/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10) MulconstU64/120 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10) MulconstU64/65537 1.6290n ± 0% 0.8005n ± 0% -50.86% (p=0.000 n=10) MulconstU64/65538 1.6300n ± 0% 0.8067n ± 0% -50.51% (p=0.000 n=10) geomean 1.609n 0.8537n -46.95% goos: linux goarch: loong64 pkg: cmd/compile/internal/test cpu: Loongson-3A5000 @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | MulconstI32/3 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10) MulconstI32/5 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10) MulconstI32/12 1.601n ± 0% 1.202n ± 0% -24.92% (p=0.000 n=10) MulconstI32/120 1.6020n ± 0% 0.8012n ± 0% -49.99% (p=0.000 n=10) MulconstI32/-120 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10) MulconstI32/65537 1.6020n ± 0% 0.8007n ± 0% -50.02% (p=0.000 n=10) MulconstI32/65538 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10) MulconstI64/3 1.6015n ± 0% 0.8007n ± 0% -50.00% (p=0.000 n=10) MulconstI64/5 1.6020n ± 0% 0.8007n ± 0% -50.02% (p=0.000 n=10) MulconstI64/12 1.602n ± 0% 1.202n ± 0% -25.00% (p=0.000 n=10) MulconstI64/120 1.6030n ± 0% 0.8011n ± 0% -50.02% (p=0.000 n=10) MulconstI64/-120 1.6020n ± 0% 0.8007n ± 0% -50.02% (p=0.000 n=10) MulconstI64/65537 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10) MulconstI64/65538 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10) MulconstU32/3 1.6010n ± 0% 0.8006n ± 0% -49.99% (p=0.000 n=10) MulconstU32/5 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10) MulconstU32/12 1.601n ± 0% 1.202n ± 0% -24.92% (p=0.000 n=10) MulconstU32/120 1.6010n ± 0% 0.8006n ± 0% -49.99% (p=0.000 n=10) MulconstU32/65537 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10) MulconstU32/65538 1.6020n ± 0% 0.8009n ± 0% -50.01% (p=0.000 n=10) MulconstU64/3 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10) MulconstU64/5 1.6010n ± 0% 0.8007n ± 0% -49.98% (p=0.000 n=10) MulconstU64/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10) MulconstU64/120 1.6020n ± 0% 0.8007n ± 0% -50.02% (p=0.000 n=10) MulconstU64/65537 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10) MulconstU64/65538 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10) geomean 1.601n 0.8523n -46.77% Change-Id: I9fb0e47ca57875da171a347bf4828adfab41b875 Reviewed-on: https://go-review.googlesource.com/c/go/+/675455 Reviewed-by: Mark Freeman <mark@golang.org> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org>
2025-07-31cmd/compile: use generated loops instead of DUFFZERO on amd64Keith Randall
goarch: amd64 cpu: 12th Gen Intel(R) Core(TM) i7-12700 │ base │ exp │ │ sec/op │ sec/op vs base │ MemclrKnownSize112-20 1.270n ± 14% 1.006n ± 0% -20.72% (p=0.000 n=10) MemclrKnownSize128-20 1.266n ± 0% 1.005n ± 0% -20.58% (p=0.000 n=10) MemclrKnownSize192-20 1.771n ± 0% 1.579n ± 1% -10.84% (p=0.000 n=10) MemclrKnownSize248-20 4.034n ± 0% 3.520n ± 0% -12.75% (p=0.000 n=10) MemclrKnownSize256-20 2.269n ± 0% 2.014n ± 0% -11.26% (p=0.000 n=10) MemclrKnownSize512-20 4.280n ± 0% 4.030n ± 0% -5.84% (p=0.000 n=10) MemclrKnownSize1024-20 8.309n ± 1% 8.057n ± 0% -3.03% (p=0.000 n=10) Change-Id: I8f1627e2a1e981ff351dc7178932b32a2627f765 Reviewed-on: https://go-review.googlesource.com/c/go/+/678937 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-07-30cmd/compile: add floating point min/max intrinsics on s390xMichael Munday
Add the VECTOR FP (MINIMUM|MAXIMUM) instructions to the assembler and use them in the compiler to implement min and max. Note: I've allowed floating point registers to be used with the single element instructions (those with the W instead of V prefix) to allow easier integration into the compiler. Change-Id: I5f80a510bd248cf483cce95f1979bf63fbae7de6 Reviewed-on: https://go-review.googlesource.com/c/go/+/684715 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Freeman <mark@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2025-07-30cmd/compile: deduplicate instructions when rewrite func resultsYoulin Feng
After CL 628075, do not rely on the memory arg of an OpLocalAddr. Fixes #74788 Change-Id: I4e893241e3949bb8f2d93c8b88cc102e155b725d Reviewed-on: https://go-review.googlesource.com/c/go/+/691275 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Mark Freeman <mark@golang.org>
2025-07-29cmd/compile: use unsigned power-of-two detector for unsigned modCuong Manh Le
Same as CL 689815, but for modulus instead of division. Updates #74485 Change-Id: I73000231c886a987a1093669ff207fd9117a8160 Reviewed-on: https://go-review.googlesource.com/c/go/+/689895 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2025-07-29cmd/compile: add unsigned power-of-two detectorCuong Manh Le
Fixes #74485 Change-Id: Ia22a58ac43bdc36c8414d555672a3a3eafc749ca Reviewed-on: https://go-review.googlesource.com/c/go/+/689815 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2025-07-28test/codegen: fix failing condmove wasm testsMichael Munday
These recently added tests failed when using the -all_codgen flag. Fixes #74770 Change-Id: Idea1ea02af2bd9f45c7d0a28d633c7442328e6df Reviewed-on: https://go-review.googlesource.com/c/go/+/690715 Reviewed-by: Jorropo <jorropo.pgm@gmail.com> Run-TryBot: Michael Munday <mikemndy@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Mark Freeman <mark@golang.org> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> TryBot-Bypass: Michael Knyszek <mknyszek@google.com>
2025-07-24cmd/compile: rewrite condselects into doublings and halvingsJorropo
For performance see CL 685676. This allows something like: if y { x *= 2 } To be compiled to: SHLXQ BX, AX, AX Instead of: MOVQ AX, CX SHLQ $1, CX MOVBLZX BL, DX TESTQ DX, DX CMOVQNE CX, AX While ./make.bash uniqued per LOC, there is 2 doublings and 4 halvings. Change-Id: Ic0727cbf429528a2dbf17cbfc3b0121db8387444 Reviewed-on: https://go-review.googlesource.com/c/go/+/685695 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2025-07-24cmd/compile: add opt branchelim to rewrite some CondSelect into mathJorropo
This allows something like: if y { x++ } To be compiled to: MOVBLZX BX, CX ADDQ CX, AX Instead of: LEAQ 1(AX), CX MOVBLZX BL, DX TESTQ DX, DX CMOVQNE CX, AX While ./make.bash uniqued per LOC, there is 100 additions and 75 substractions. See benchmark here: https://go.dev/play/p/DJf5COjwhd_s Either it's a performance no-op or it is faster: goos: linux goarch: amd64 cpu: AMD Ryzen 5 3600 6-Core Processor │ /tmp/old.logs │ /tmp/new.logs │ │ sec/op │ sec/op vs base │ CmovInlineConditionAddLatency-12 0.5443n ± 5% 0.5339n ± 3% -1.90% (p=0.004 n=10) CmovInlineConditionAddThroughputBy6-12 1.492n ± 1% 1.494n ± 1% ~ (p=0.955 n=10) CmovInlineConditionSubLatency-12 0.5419n ± 3% 0.5282n ± 3% -2.52% (p=0.019 n=10) CmovInlineConditionSubThroughputBy6-12 1.587n ± 1% 1.584n ± 2% ~ (p=0.492 n=10) CmovOutlineConditionAddLatency-12 0.5223n ± 1% 0.2639n ± 4% -49.47% (p=0.000 n=10) CmovOutlineConditionAddThroughputBy6-12 1.159n ± 1% 1.097n ± 2% -5.35% (p=0.000 n=10) CmovOutlineConditionSubLatency-12 0.5271n ± 3% 0.2654n ± 2% -49.66% (p=0.000 n=10) CmovOutlineConditionSubThroughputBy6-12 1.053n ± 1% 1.050n ± 1% ~ (p=1.000 n=10) geomean There are other benefits not tested by this benchmark: - the math form is usually a couple bytes shorter (ICACHE) - the math form is usually 0~2 uops shorter (UCACHE) - the math form has usually less register pressure* - the math form can sometimes be optimized further *regalloc rarely find how it can use less registers As far as pass ordering goes there are many possible options, I've decided to reorder branchelim before late opt since: - unlike running exclusively the CondSelect rules after branchelim, some extra optimizations might trigger on the adds or subs. - I don't want to maintain a second generic.rules file of only the stuff, that can trigger after branchelim. - rerunning all of opt a third time increase compilation time for little gains. By elimination moving branchelim seems fine. Change-Id: I869adf57e4d109948ee157cfc47144445146bafd Reviewed-on: https://go-review.googlesource.com/c/go/+/685676 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-07-24cmd/compile: fold shift through AND for slice operationsAlexander Musman
Fold a shift through AND when the AND gets a zero-or-one operand (e.g. from arithmetic shift by 63 of a 64-bit value) for a common case with slice operations: ASR $63, R2, R2 AND R3<<3, R2, R2 ADD R2, R0, R2 As the operands are 64-bit, we can transform it to: AND R2->63, R3, R2 ADD R2<<3, R0, R2 Code size improvement: compile: .text: 9088004 -> 9086292 (-0.02%) etcd: .text: 10500276 -> 10498964 (-0.01%) Change-Id: Ibcd5e67173da39b77ceff77ca67812fb8be5a7b5 Reviewed-on: https://go-review.googlesource.com/c/go/+/679895 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Mark Freeman <mark@golang.org> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-07-24cmd/compile: optimize slice bounds checking with SUB/SUBconst comparisonsAlexander Musman
Optimize ARM64 code generation for slice bounds checking by recognizing patterns where comparisons to zero involve SUB or SUBconst operations. This change adds SSA opt rules to simplify: (CMPconst [0] (SUB x y)) => (CMP x y) The optimizations apply to EQ, NE, ULE, and UGT comparisons, enabling more efficient bounds checking for slice operations. Code size improvement: compile: .text: 9088004 -> 9065988 (-0.24%) etcd: .text: 10500276 -> 10497092 (-0.03%) Change-Id: I467cb27674351652bcacc52b87e1f19677bd46a8 Reviewed-on: https://go-review.googlesource.com/c/go/+/679915 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Keith Randall <khr@golang.org>
2025-06-09cmd/compile/internal/ssa: fix PPC64 merging of (AND (S[RL]Dconst ...)Paul Murphy
CL 622236 forgot to check the mask was also a 32 bit rotate mask. Add a modified version of isPPC64WordRotateMask which valids the mask is contiguous and fits inside a uint32. I don't this is possible when merging SRDconst, the first check should always reject such combines. But, be extra careful and do it there too. Fixes #73153 Change-Id: Ie95f74ec5e7d89dc761511126db814f886a7a435 Reviewed-on: https://go-review.googlesource.com/c/go/+/679775 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>
2025-05-21cmd/compile/internal/ssa: eliminate string copies for calls to unique.MakeJake Bailey
unique.Make always copies strings passed into it, so it's safe to not copy byte slices converted to strings either. Handle this just like map accesses with string(b) as keys. This CL only handles unique.Make(string(b)), not nested cases like unique.Make([2]string{string(b1), string(b2)}); this could be done in a followup CL but the map lookup code in walk is sufficiently different than the call handling code that I didn't attempt it. (SSA is much easier). Fixes #71926 Change-Id: Ic2f82f2f91963d563b4ddb1282bd49fc40da8b85 Reviewed-on: https://go-review.googlesource.com/c/go/+/672135 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-21cmd/compile/internal/walk: convert composite literals to interfaces without ↵thepudds
allocating Today, this interface conversion causes the struct literal to be heap allocated: var sink any func example1() { sink = S{1, 1} } For basic literals like integers that are directly used in an interface conversion that would otherwise allocate, the compiler is able to use read-only global storage (see #18704). This CL extends that to struct and array literals as well by creating read-only global storage that is able to represent for example S{1, 1}, and then using a pointer to that storage in the interface when the interface conversion happens. A more challenging example is: func example2() { v := S{1, 1} sink = v } In this case, the struct literal is not directly part of the interface conversion, but is instead assigned to a local variable. To still avoid heap allocation in cases like this, in walk we construct a cache that maps from expressions used in interface conversions to earlier expressions that can be used to represent the same value (via ir.ReassignOracle.StaticValue). This is somewhat analogous to how we avoided heap allocation for basic literals in CL 649077 earlier in our stack, though here we also need to do a little more work to create the read-only global. CL 649076 (also earlier in our stack) added most of the tests along with debug diagnostics in convert.go to make it easier to test this change. See the writeup in #71359 for details. Fixes #71359 Fixes #71323 Updates #62653 Updates #53465 Updates #8618 Change-Id: I8924f0c69ff738ea33439bd6af7b4066af493b90 Reviewed-on: https://go-review.googlesource.com/c/go/+/649555 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com>
2025-05-21cmd/compile: fix offset calculation error in memcombineJunyang Shao
Fixes #73812 Change-Id: If7a6e103ae9e1442a2cf4a3c6b1270b6a1887196 Reviewed-on: https://go-review.googlesource.com/c/go/+/675175 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Junyang Shao <shaojunyang@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-21cmd/compile: add rules about ORN and ANDNXiaolin Zhao
Reduce the number of go toolchain instructions on loong64 as follows. file before after Δ % addr2line 279880 279776 -104 -0.0372% asm 556638 556410 -228 -0.0410% buildid 272272 272072 -200 -0.0735% cgo 481522 481318 -204 -0.0424% compile 2457788 2457580 -208 -0.0085% covdata 323384 323280 -104 -0.0322% cover 518450 518234 -216 -0.0417% dist 340790 340686 -104 -0.0305% distpack 282456 282252 -204 -0.0722% doc 789932 789688 -244 -0.0309% fix 324332 324228 -104 -0.0321% link 704622 704390 -232 -0.0329% nm 277132 277028 -104 -0.0375% objdump 507862 507758 -104 -0.0205% pack 221774 221674 -100 -0.0451% pprof 1469816 1469552 -264 -0.0180% test2json 254836 254732 -104 -0.0408% trace 1100002 1099738 -264 -0.0240% vet 781078 780874 -204 -0.0261% go 1529116 1528848 -268 -0.0175% gofmt 318556 318448 -108 -0.0339% total 13792238 13788566 -3672 -0.0266% Change-Id: I23fb3ebd41309252c7075e57ea7094e79f8c4fef Reviewed-on: https://go-review.googlesource.com/c/go/+/674335 Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: abner chenc <chenguoqi@loongson.cn> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Meidan Li <limeidan@loongson.cn>
2025-05-20cmd/compile: fix the implementation of NORconst on loong64Xiaolin Zhao
In the loong64 instruction set, there is no NORI instruction, so the immediate value in NORconst need to be stored in register and then use the three-register NOR instruction. Change-Id: I5ef697450619317218cb3ef47fc07e238bdc2139 Reviewed-on: https://go-review.googlesource.com/c/go/+/673836 Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-20cmd/compile: memcombine different size storesJunyang Shao
This CL implements the TODO in combineStores to allow combining stores of different sizes, as long as the total size aligns to 2, 4, 8. Fixes #72832. Change-Id: I6d1d471335da90d851ad8f3b5a0cf10bdcfa17c4 Reviewed-on: https://go-review.googlesource.com/c/go/+/661855 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Junyang Shao <shaojunyang@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-20cmd/compile: fold negation into addition/subtraction on arm64Julian Zhu
Fold negation into addition/subtraction and avoid double negation. platform: linux/arm64 file before after Δ % addr2line 3628108 3628116 +8 +0.000% asm 6208353 6207857 -496 -0.008% buildid 3460682 3460418 -264 -0.008% cgo 5572988 5572492 -496 -0.009% compile 26042159 26041039 -1120 -0.004% cover 6304328 6303472 -856 -0.014% dist 4139330 4139098 -232 -0.006% doc 9429305 9428065 -1240 -0.013% fix 3997189 3996733 -456 -0.011% link 8212128 8210280 -1848 -0.023% nm 3620056 3619696 -360 -0.010% objdump 5920289 5919233 -1056 -0.018% pack 2892250 2891778 -472 -0.016% pprof 17094569 17092745 -1824 -0.011% test2json 3335825 3335529 -296 -0.009% trace 15842080 15841456 -624 -0.004% vet 9472194 9471106 -1088 -0.011% go 19081541 19081509 -32 -0.000% total 154253374 154240622 -12752 -0.008% platform: darwin/arm64 file before after Δ % compile 27152002 27135490 -16512 -0.061% link 8372914 8356402 -16512 -0.197% go 19154802 19154778 -24 -0.000% total 157734180 157701132 -33048 -0.021% Change-Id: I15a349bfbaf7333ec3e4a62ae4d06f3f371dfb1d Reviewed-on: https://go-review.googlesource.com/c/go/+/673715 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-19cmd/compile: derive bounds on signed %N for N a power of 2Keith Randall
-N+1 <= x % N <= N-1 This is useful for cases like: func setBit(b []byte, i int) { b[i/8] |= 1<<(i%8) } The shift does not need protection against larger-than-7 cases. (It does still need protection against <0 cases.) Change-Id: Idf83101386af538548bfeb6e2928cea855610ce2 Reviewed-on: https://go-review.googlesource.com/c/go/+/672995 Reviewed-by: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-19cmd/compile: fold negation into addition/subtraction on mipsxJulian Zhu
Fold negation into addition/subtraction and avoid double negation. file before after Δ % addr2line 3742022 3741986 -36 -0.001% asm 6668616 6668628 +12 +0.000% buildid 3583786 3583630 -156 -0.004% cgo 6020370 6019634 -736 -0.012% compile 29416016 29417336 +1320 +0.004% cover 6801903 6801675 -228 -0.003% dist 4485916 4485816 -100 -0.002% doc 10652787 10652251 -536 -0.005% fix 4115988 4115560 -428 -0.010% link 9002328 9001616 -712 -0.008% nm 3733148 3732780 -368 -0.010% objdump 6163292 6163068 -224 -0.004% pack 2944768 2944604 -164 -0.006% pprof 18909973 18908773 -1200 -0.006% test2json 3394662 3394778 +116 +0.003% trace 17350911 17349751 -1160 -0.007% vet 10077727 10077527 -200 -0.002% go 19118769 19118609 -160 -0.001% total 166182982 166178022 -4960 -0.003% Change-Id: Id55698800fd70f3cb2ff48393584456b87208921 Reviewed-on: https://go-review.googlesource.com/c/go/+/673556 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-05-16cmd/compile: fold negation into addition/subtraction on mips64xJulian Zhu
Fold negation into addition/subtraction and avoid double negation. file before after Δ % addr2line 4007310 4007470 +160 +0.004% asm 7007636 7007436 -200 -0.003% buildid 3839268 3838972 -296 -0.008% cgo 6353466 6352738 -728 -0.011% compile 30426920 30426896 -24 -0.000% cover 7005408 7004744 -664 -0.009% dist 4651192 4650872 -320 -0.007% doc 10606050 10606034 -16 -0.000% fix 4446414 4446390 -24 -0.001% link 9237736 9237024 -712 -0.008% nm 3999107 3999323 +216 +0.005% objdump 6762424 6762144 -280 -0.004% pack 3270757 3270493 -264 -0.008% pprof 19428299 19361939 -66360 -0.342% test2json 3717345 3717217 -128 -0.003% trace 17382273 17381657 -616 -0.004% vet 10689481 10688985 -496 -0.005% go 19118769 19118609 -160 -0.001% total 171949855 171878943 -70912 -0.041% Change-Id: I35c1f264d216c214ea3f56252a9ddab8ea850fa6 Reviewed-on: https://go-review.googlesource.com/c/go/+/673555 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-15cmd/compile: allow load-op merging in additional situationsKeith Randall
x += *p We want to do this with a single load+add operation on amd64. The tricky part is that we don't want to combine if there are other uses of x after this instruction. Implement a simple detector that seems to capture a common situation - x += *p is in a loop, and the other use of x is after loop exit. In that case, it does not hurt to do the load+add combo. Change-Id: I466174cce212e78bde83f908cc1f2752b560c49c Reviewed-on: https://go-review.googlesource.com/c/go/+/672957 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-15cmd/compile: schedule induction variable increments lateKeith Randall
for ..; ..; i++ { ... } We want to schedule the i++ late in the block, so that all other uses of i in the block are scheduled first. That way, i++ can happen in place in a register instead of requiring a temporary register. Change-Id: Id777407c7e67a5ddbd8e58251099b0488138c0df Reviewed-on: https://go-review.googlesource.com/c/go/+/672998 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com>
2025-05-14cmd/compile: fold negation into addition/subtraction on loong64Xiaolin Zhao
This change also avoid double negation, and add loong64 codegen for arithmetic tests. Reduce the number of go toolchain instructions on loong64 as follows. file before after Δ % addr2line 279972 279896 -76 -0.0271% asm 556390 556310 -80 -0.0144% buildid 272376 272300 -76 -0.0279% cgo 481534 481550 +16 +0.0033% compile 2457992 2457396 -596 -0.0242% covdata 323488 323404 -84 -0.0260% cover 518630 518490 -140 -0.0270% dist 340894 340814 -80 -0.0235% distpack 282568 282484 -84 -0.0297% doc 790224 789984 -240 -0.0304% fix 324408 324348 -60 -0.0185% link 704910 704666 -244 -0.0346% nm 277220 277144 -76 -0.0274% objdump 508026 507878 -148 -0.0291% pack 221810 221786 -24 -0.0108% pprof 1470284 1469880 -404 -0.0275% test2json 254896 254852 -44 -0.0173% trace 1100390 1100074 -316 -0.0287% vet 781398 781142 -256 -0.0328% go 1529668 1529128 -540 -0.0353% gofmt 318668 318568 -100 -0.0314% total 13795746 13792094 -3652 -0.0265% Change-Id: I88d1f12cfc4be0e92687c48e06a57213aa484aca Reviewed-on: https://go-review.googlesource.com/c/go/+/672555 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-08cmd/compile: add 2 phiopt casesJakub Ciolek
Add 2 more cases: if a { x = value } else { x = a } => x = a && value if a { x = a } else { x = value } => x = a || value AND case goes from: 00006 (8) TESTB AX, AX 00007 (8) JNE 9 00008 (13) MOVL AX, BX 00009 (13) MOVL BX, AX 00010 (13) RET to: 00006 (13) ANDL BX, AX 00007 (13) RET OR goes from: 00006 (19) TESTB AX, AX 00007 (19) JNE 9 00008 (24) MOVL BX, AX 00009 (24) RET to: 00006 (24) ORL BX, AX 00007 (24) RET compilecmp linux/amd64: runtime runtime.lock2 847 -> 869 (+2.60%) runtime.addspecial 542 -> 517 (-4.61%) runtime.tracebackPCs changed runtime.scanstack changed runtime.mallocinit changed runtime.traceback2 2238 -> 2206 (-1.43%) runtime [cmd/compile] runtime.lock2 860 -> 882 (+2.56%) runtime.scanstack changed runtime.addspecial 542 -> 517 (-4.61%) runtime.traceback2 2238 -> 2206 (-1.43%) runtime.lockWithRank 870 -> 890 (+2.30%) runtime.tracebackPCs changed runtime.mallocinit changed strconv strconv.ryuFtoaFixed32 changed strconv.ryuFtoaFixed64 639 -> 638 (-0.16%) strconv.readFloat changed strconv.ryuFtoaShortest changed strings strings.(*Replacer).build changed strconv [cmd/compile] strconv.readFloat changed strconv.ryuFtoaFixed64 639 -> 638 (-0.16%) strconv.ryuFtoaFixed32 changed strconv.ryuFtoaShortest changed strings [cmd/compile] strings.(*Replacer).build changed regexp regexp.makeOnePass.func1 changed regexp [cmd/compile] regexp.makeOnePass.func1 changed encoding/json encoding/json.indirect changed database/sql database/sql.driverArgsConnLocked changed vendor/golang.org/x/text/unicode/norm vendor/golang.org/x/text/unicode/norm.Form.transform changed go/doc/comment go/doc/comment.parseSpans changed internal/diff internal/diff.tgs changed log/slog log/slog.(*handleState).appendNonBuiltIns 1898 -> 1877 (-1.11%) testing/fstest testing/fstest.(*fsTester).checkGlob changed runtime/pprof runtime/pprof.(*profileBuilder).build changed cmd/internal/dwarf cmd/internal/dwarf.isEmptyInlinedCall 254 -> 244 (-3.94%) go/printer go/printer.keepTypeColumn 302 -> 270 (-10.60%) go/printer.(*printer).binaryExpr changed cmd/compile/internal/syntax cmd/compile/internal/syntax.(*scanner).rune changed cmd/compile/internal/syntax.(*scanner).number 2137 -> 2153 (+0.75%) Change-Id: I7f95f54b03a35d0b616c40f38b415a7feb71be73 Reviewed-on: https://go-review.googlesource.com/c/go/+/666835 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Jakub Ciolek <jakub@ciolek.dev> TryBot-Bypass: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-01cmd/compile: improve multiplication strength reductionKeith Randall
Use an automatic algorithm to generate strength reduction code. You give it all the linear combination (a*x+b*y) instructions in your architecture, it figures out the rest. Just amd64 and arm64 for now. Fixes #67575 Change-Id: I35c69382bebb1d2abf4bb4e7c43fd8548c6c59a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/626998 Reviewed-by: Jakub Ciolek <jakub@ciolek.dev> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-01cmd/compile,internal/cpu,runtime: intrinsify math/bits.OnesCount on riscv64Joel Sing
For riscv64/rva22u64 and above, we can intrinsify math/bits.OnesCount using the CPOP/CPOPW machine instructions. Since the native Go implementation of OnesCount is relatively expensive, it is also worth emitting a check for Zbb support when compiled for rva20u64. On a Banana Pi F3, with GORISCV64=rva22u64: │ oc.1 │ oc.2 │ │ sec/op │ sec/op vs base │ OnesCount-8 16.930n ± 0% 4.389n ± 0% -74.08% (p=0.000 n=10) OnesCount8-8 5.642n ± 0% 5.016n ± 0% -11.10% (p=0.000 n=10) OnesCount16-8 9.404n ± 0% 5.015n ± 0% -46.67% (p=0.000 n=10) OnesCount32-8 13.165n ± 0% 4.388n ± 0% -66.67% (p=0.000 n=10) OnesCount64-8 16.300n ± 0% 4.388n ± 0% -73.08% (p=0.000 n=10) geomean 11.40n 4.629n -59.40% On a Banana Pi F3, compiled with GORISCV64=rva20u64 and with Zbb detection enabled: │ oc.3 │ oc.4 │ │ sec/op │ sec/op vs base │ OnesCount-8 16.930n ± 0% 5.643n ± 0% -66.67% (p=0.000 n=10) OnesCount8-8 5.642n ± 0% 5.642n ± 0% ~ (p=0.447 n=10) OnesCount16-8 10.030n ± 0% 6.896n ± 0% -31.25% (p=0.000 n=10) OnesCount32-8 13.170n ± 0% 5.642n ± 0% -57.16% (p=0.000 n=10) OnesCount64-8 16.300n ± 0% 5.642n ± 0% -65.39% (p=0.000 n=10) geomean 11.55n 5.873n -49.16% On a Banana Pi F3, compiled with GORISCV64=rva20u64 but with Zbb detection disabled: │ oc.3 │ oc.5 │ │ sec/op │ sec/op vs base │ OnesCount-8 16.93n ± 0% 29.47n ± 0% +74.07% (p=0.000 n=10) OnesCount8-8 5.642n ± 0% 5.643n ± 0% ~ (p=0.191 n=10) OnesCount16-8 10.03n ± 0% 15.05n ± 0% +50.05% (p=0.000 n=10) OnesCount32-8 13.17n ± 0% 18.18n ± 0% +38.04% (p=0.000 n=10) OnesCount64-8 16.30n ± 0% 21.94n ± 0% +34.60% (p=0.000 n=10) geomean 11.55n 15.84n +37.16% For hardware without Zbb, this adds ~5ns overhead, while for hardware with Zbb we achieve a performance gain up of up to 11ns. It is worth noting that OnesCount8 is cheap enough that it is preferable to stick with the generic version in this case. Change-Id: Id657e40e0dd1b1ab8cc0fe0f8a68df4c9f2d7da5 Reviewed-on: https://go-review.googlesource.com/c/go/+/660856 Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-01cmd/compile: intrinsify math/bits.Bswap on riscv64Joel Sing
For riscv64/rva22u64 and above, we can intrinsify math/bits.Bswap using the REV8 machine instruction. On a StarFive VisionFive 2 with GORISCV64=rva22u64: │ rb.1 │ rb.2 │ │ sec/op │ sec/op vs base │ ReverseBytes-4 18.790n ± 0% 4.026n ± 0% -78.57% (p=0.000 n=10) ReverseBytes16-4 6.710n ± 0% 5.368n ± 0% -20.00% (p=0.000 n=10) ReverseBytes32-4 13.420n ± 0% 5.368n ± 0% -60.00% (p=0.000 n=10) ReverseBytes64-4 17.450n ± 0% 4.026n ± 0% -76.93% (p=0.000 n=10) geomean 13.11n 4.649n -64.54% Change-Id: I26eee34270b1721f7304bb1cddb0fda129b20ece Reviewed-on: https://go-review.googlesource.com/c/go/+/660855 Reviewed-by: Mark Ryan <markdryan@rivosinc.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Junyang Shao <shaojunyang@google.com>
2025-04-22cmd/compile: constant fold 128-bit multipliesKeith Randall
The full 64x64->128 multiply comes up when using bits.Mul64. The 64x64->64+overflow multiply comes up in unsafe.Slice when using a constant length. Change-Id: I298515162ca07d804b2d699d03bc957ca30a4ebc Reviewed-on: https://go-review.googlesource.com/c/go/+/667175 Reviewed-by: Junyang Shao <shaojunyang@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-21cmd/compile: don't evaluate side effects of range over arrayKeith Randall
If the thing we're ranging over is an array or ptr to array, and it doesn't have a function call or channel receive in it, then we shouldn't evaluate it. Typecheck the ranged-over value as a constant in that case. That makes the unified exporter replace the range expression with a constant int. Change-Id: I0d4ea081de70d20cf6d1fa8d25ef6cb021975554 Reviewed-on: https://go-review.googlesource.com/c/go/+/659317 Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Griesemer <gri@google.com>
2025-04-09cmd/compile: set unalignedOK to make memcombine work properly on loong64limeidan
goos: linux goarch: loong64 pkg: unicode/utf8 cpu: Loongson-3A6000-HV @ 2500.00MHz │ old │ new │ │ sec/op │ sec/op vs base │ ValidTenASCIIChars 7.604n ± 0% 6.805n ± 0% -10.51% (p=0.000 n=10) Valid100KASCIIChars 37.41µ ± 0% 16.58µ ± 0% -55.67% (p=0.000 n=10) ValidTenJapaneseChars 60.84n ± 0% 58.62n ± 0% -3.64% (p=0.000 n=10) ValidLongMostlyASCII 113.5µ ± 0% 113.5µ ± 0% ~ (p=0.303 n=10) ValidLongJapanese 204.6µ ± 0% 206.8µ ± 0% +1.07% (p=0.000 n=10) ValidStringTenASCIIChars 7.604n ± 0% 6.803n ± 0% -10.53% (p=0.000 n=10) ValidString100KASCIIChars 38.05µ ± 0% 17.14µ ± 0% -54.97% (p=0.000 n=10) ValidStringTenJapaneseChars 60.58n ± 0% 59.48n ± 0% -1.82% (p=0.000 n=10) ValidStringLongMostlyASCII 113.5µ ± 0% 113.4µ ± 0% -0.10% (p=0.000 n=10) ValidStringLongJapanese 205.9µ ± 0% 207.3µ ± 0% +0.67% (p=0.000 n=10) geomean 3.324µ 2.756µ -17.08% Change-Id: Id43b6e2e41907bd4b92f421dacde31f048db47d6 Reviewed-on: https://go-review.googlesource.com/c/go/+/662495 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Keith Randall <khr@google.com>
2025-04-04cmd/compile: improve store-to-load forwarding with compatible typesAlexander Musman
Improve the compiler's store-to-load forwarding optimization by relaxing the type comparison condition. Instead of requiring exact type equality (CMPeq), we now use copyCompatibleType which allows forwarding between compatible types where safe. Fix several size comparison bugs in the nested store patterns. Previously, we were comparing the size of the outer store with the load type, rather than comparing with the size of the actual store being forwarded from. Skip OpConvert in dead store elimination to help get rid of dead stores such as zeroing slices. OpConvert, like OpInlMark, doesn't really use the memory. This optimization is particularly beneficial for code that creates slices with computed pointers, such as the runtime's heapBitsSlice function, where intermediate calculations were previously causing the compiler to miss store-to-load forwarding opportunities. Local sweet run result on an x86_64 laptop: │ Orig.res │ Hopt.res │ │ sec/op │ sec/op vs base │ BiogoIgor-8 5.303 ± 1% 5.322 ± 1% ~ (p=0.190 n=10) BiogoKrishna-8 7.894 ± 1% 7.828 ± 2% ~ (p=0.190 n=10) BleveIndexBatch100-8 2.257 ± 1% 2.248 ± 2% ~ (p=0.529 n=10) EtcdPut-8 30.12m ± 1% 30.03m ± 1% ~ (p=0.796 n=10) EtcdSTM-8 127.1m ± 1% 126.2m ± 0% -0.74% (p=0.023 n=10) GoBuildKubelet-8 52.21 ± 0% 52.05 ± 1% ~ (p=0.063 n=10) GoBuildKubeletLink-8 4.342 ± 1% 4.305 ± 0% -0.85% (p=0.000 n=10) GoBuildIstioctl-8 43.33 ± 0% 43.24 ± 0% -0.22% (p=0.015 n=10) GoBuildIstioctlLink-8 4.604 ± 1% 4.598 ± 0% ~ (p=0.063 n=10) GoBuildFrontend-8 15.33 ± 0% 15.29 ± 0% ~ (p=0.143 n=10) GoBuildFrontendLink-8 740.0m ± 1% 737.7m ± 1% ~ (p=0.912 n=10) GopherLuaKNucleotide-8 9.590 ± 1% 9.656 ± 1% ~ (p=0.165 n=10) MarkdownRenderXHTML-8 96.97m ± 1% 97.26m ± 2% ~ (p=0.105 n=10) Tile38QueryLoad-8 335.9µ ± 1% 335.6µ ± 1% ~ (p=0.481 n=10) geomean 1.336 1.333 -0.22% Change-Id: I031552623e6d5a3b1b5be8325e6314706e45534f Reviewed-on: https://go-review.googlesource.com/c/go/+/662075 Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Carlos Amedee <carlos@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2025-03-28cmd/compile/internal/ssa: optimise more branches with zero on riscv64Joel Sing
Optimise more branches with zero on riscv64. In particular, BLTU with zero occurs with IsInBounds checks for index zero. This currently results in two instructions and requires an additional register: li t2, 0 bltu t2, t1, 0x174b4 This is equivalent to checking if the bounds is not equal to zero. With this change: bnez t1, 0x174c0 This removes more than 500 instructions from the Go binary on riscv64. Change-Id: I6cd861d853e3ef270bd46dacecdfaa205b1c4644 Reviewed-on: https://go-review.googlesource.com/c/go/+/606715 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-03-27cmd/compile: rename some test packages in codegenMark Freeman
All other files here use the codegen package. Change-Id: I714162941b9fa9051dacc29643e905fe60b9304b Reviewed-on: https://go-review.googlesource.com/c/go/+/661135 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org>
2025-03-25test/codegen: add combined conversion and shift testsJoel Sing
This adds tests for type conversion and shifts, detailing various poor bad code generation that currently exists for riscv64. This will be addressed in future CLs. Change-Id: Ie1d366dfe878832df691600f8500ef383da92848 Reviewed-on: https://go-review.googlesource.com/c/go/+/615678 Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>