aboutsummaryrefslogtreecommitdiff
path: root/test/codegen/floats.go
AgeCommit message (Collapse)Author
2026-03-31test/codegen: get rid of \sKeith Randall
Replace \s with a space in backtick-quoted strings Replace \\s with a space in double-quoted strings Change-Id: I0c8b249bb12c2c8ca69e683e4bc6f27544fd6094 Reviewed-on: https://go-review.googlesource.com/c/go/+/760680 Auto-Submit: Keith Randall <khr@google.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> Reviewed-by: Paul Murphy <paumurph@redhat.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-03-27test/codegen: check mips64 sqrt/abs code only for hardfloatKeith Randall
Softfloat doesn't use the hardware instructions. Followon to CL 739520 + CL 757300 Change-Id: Ic271cd5567c62933d2d0c01d8834f9bf07e31061 Reviewed-on: https://go-review.googlesource.com/c/go/+/760520 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Julian Zhu <jz531210@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2026-03-23test/codegen: add codegen checks for float32/float64 conversions optimizationsJulian Zhu
Updates #75463 Change-Id: Iec51bdedd5a29bbb81ac553ad7e22403e1715ee3 Reviewed-on: https://go-review.googlesource.com/c/go/+/757300 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2026-03-16cmd/compile: (riscv64) optimize float32(abs|sqrt64(float64(x)))Meng Zhuo
Absorb unnecessary conversion between float32 and float64 if both src and dst are 32 bit. Updates #75463 Change-Id: Ia71941223b5cca3fea66b559da7b8f916e63feaf Reviewed-on: https://go-review.googlesource.com/c/go/+/733621 Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Julian Zhu <jz531210@gmail.com> Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-03-01cmd/compile: (wasm) optimize float32(round64(float64(x)))David Chase
Not a fix because there are other architectures still to be done. Updates #75463. Change-Id: Ia5233c2b6c5f4439e269950efdd851e72e8e7ff6 Reviewed-on: https://go-review.googlesource.com/c/go/+/730160 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-03-01cmd/compile: (arm64) optimize float32(round64(float64(x)))David Chase
Not a fix because there are other architectures still to be done. Updates #75463. Change-Id: Ifca03975023e4e5d0ffa98d1f877314a1a291be0 Reviewed-on: https://go-review.googlesource.com/c/go/+/729161 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-03-01cmd/compile: (amd64) optimize float32(round64(float64(x)))David Chase
Not a fix because there are other architectures still to be done. Updates #75463. Change-Id: I3d7754ce4a26af0f5c4ef0be1254d164e68f8442 Reviewed-on: https://go-review.googlesource.com/c/go/+/729160 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2025-11-12cmd/compile: use FCLASSD for subnormal checks on riscv64Michael Munday
Only implemented for 64 bit floating point operations for now. goos: linux goarch: riscv64 pkg: math cpu: Spacemit(R) X60 │ sec/op │ sec/op vs base │ Acos 154.1n ± 0% 154.1n ± 0% ~ (p=0.303 n=10) Acosh 215.8n ± 6% 226.7n ± 0% ~ (p=0.439 n=10) Asin 149.2n ± 1% 149.2n ± 0% ~ (p=0.700 n=10) Asinh 262.1n ± 0% 258.5n ± 0% -1.37% (p=0.000 n=10) Atan 99.48n ± 0% 99.49n ± 0% ~ (p=0.836 n=10) Atanh 244.9n ± 0% 243.8n ± 0% -0.43% (p=0.002 n=10) Atan2 158.2n ± 1% 153.3n ± 0% -3.10% (p=0.000 n=10) Cbrt 186.8n ± 0% 181.1n ± 0% -3.03% (p=0.000 n=10) Ceil 36.71n ± 1% 36.71n ± 0% ~ (p=0.434 n=10) Copysign 6.531n ± 1% 6.526n ± 0% ~ (p=0.268 n=10) Cos 98.19n ± 0% 95.40n ± 0% -2.84% (p=0.000 n=10) Cosh 233.1n ± 0% 222.6n ± 0% -4.50% (p=0.000 n=10) Erf 122.5n ± 0% 114.2n ± 0% -6.78% (p=0.000 n=10) Erfc 126.0n ± 1% 116.6n ± 0% -7.46% (p=0.000 n=10) Erfinv 138.8n ± 0% 138.6n ± 0% ~ (p=0.082 n=10) Erfcinv 140.0n ± 0% 139.7n ± 0% ~ (p=0.359 n=10) Exp 193.3n ± 0% 184.2n ± 0% -4.68% (p=0.000 n=10) ExpGo 204.8n ± 0% 194.5n ± 0% -5.03% (p=0.000 n=10) Expm1 152.5n ± 1% 145.0n ± 0% -4.92% (p=0.000 n=10) Exp2 174.5n ± 0% 164.2n ± 0% -5.85% (p=0.000 n=10) Exp2Go 184.4n ± 1% 175.4n ± 0% -4.88% (p=0.000 n=10) Abs 4.912n ± 0% 4.914n ± 0% ~ (p=0.283 n=10) Dim 15.50n ± 1% 15.52n ± 1% ~ (p=0.331 n=10) Floor 36.89n ± 1% 36.76n ± 1% ~ (p=0.325 n=10) Max 31.05n ± 1% 31.17n ± 1% ~ (p=0.628 n=10) Min 31.01n ± 0% 31.06n ± 0% ~ (p=0.767 n=10) Mod 294.1n ± 0% 245.6n ± 0% -16.52% (p=0.000 n=10) Frexp 44.86n ± 1% 35.20n ± 0% -21.53% (p=0.000 n=10) Gamma 195.8n ± 0% 185.4n ± 1% -5.29% (p=0.000 n=10) Hypot 84.91n ± 0% 84.54n ± 1% -0.43% (p=0.006 n=10) HypotGo 96.70n ± 0% 95.42n ± 1% -1.32% (p=0.000 n=10) Ilogb 45.03n ± 0% 35.07n ± 1% -22.10% (p=0.000 n=10) J0 634.5n ± 0% 627.2n ± 0% -1.16% (p=0.000 n=10) J1 644.5n ± 0% 636.9n ± 0% -1.18% (p=0.000 n=10) Jn 1.357µ ± 0% 1.344µ ± 0% -0.92% (p=0.000 n=10) Ldexp 49.89n ± 0% 39.96n ± 0% -19.90% (p=0.000 n=10) Lgamma 186.6n ± 0% 184.3n ± 0% -1.21% (p=0.000 n=10) Log 150.4n ± 0% 141.1n ± 0% -6.15% (p=0.000 n=10) Logb 46.70n ± 0% 35.89n ± 0% -23.15% (p=0.000 n=10) Log1p 164.1n ± 0% 163.9n ± 0% ~ (p=0.122 n=10) Log10 153.1n ± 0% 143.5n ± 0% -6.24% (p=0.000 n=10) Log2 58.83n ± 0% 49.75n ± 0% -15.43% (p=0.000 n=10) Modf 40.82n ± 1% 40.78n ± 0% ~ (p=0.239 n=10) Nextafter32 49.15n ± 0% 48.93n ± 0% -0.44% (p=0.011 n=10) Nextafter64 43.33n ± 0% 43.23n ± 0% ~ (p=0.228 n=10) PowInt 269.4n ± 0% 243.8n ± 0% -9.49% (p=0.000 n=10) PowFrac 618.0n ± 0% 571.7n ± 0% -7.48% (p=0.000 n=10) Pow10Pos 13.09n ± 0% 13.05n ± 0% -0.31% (p=0.003 n=10) Pow10Neg 30.99n ± 1% 30.99n ± 0% ~ (p=0.173 n=10) Round 23.73n ± 0% 23.65n ± 0% -0.36% (p=0.011 n=10) RoundToEven 27.87n ± 0% 27.73n ± 0% -0.48% (p=0.003 n=10) Remainder 282.1n ± 0% 249.6n ± 0% -11.52% (p=0.000 n=10) Signbit 11.46n ± 0% 11.42n ± 0% -0.39% (p=0.003 n=10) Sin 115.2n ± 0% 113.2n ± 0% -1.74% (p=0.000 n=10) Sincos 140.6n ± 0% 138.6n ± 0% -1.39% (p=0.000 n=10) Sinh 252.0n ± 0% 241.4n ± 0% -4.21% (p=0.000 n=10) SqrtIndirect 4.909n ± 0% 4.893n ± 0% -0.34% (p=0.021 n=10) SqrtLatency 19.57n ± 1% 19.57n ± 0% ~ (p=0.087 n=10) SqrtIndirectLatency 19.64n ± 0% 19.57n ± 0% -0.36% (p=0.025 n=10) SqrtGoLatency 198.1n ± 0% 197.4n ± 0% -0.35% (p=0.014 n=10) SqrtPrime 5.733µ ± 0% 5.725µ ± 0% ~ (p=0.116 n=10) Tan 149.1n ± 0% 146.8n ± 0% -1.54% (p=0.000 n=10) Tanh 248.2n ± 1% 238.1n ± 0% -4.05% (p=0.000 n=10) Trunc 36.86n ± 0% 36.70n ± 0% -0.43% (p=0.029 n=10) Y0 638.2n ± 0% 633.6n ± 0% -0.71% (p=0.000 n=10) Y1 641.8n ± 0% 636.1n ± 0% -0.87% (p=0.000 n=10) Yn 1.358µ ± 0% 1.345µ ± 0% -0.92% (p=0.000 n=10) Float64bits 5.721n ± 0% 5.709n ± 0% -0.22% (p=0.044 n=10) Float64frombits 4.905n ± 0% 4.893n ± 0% ~ (p=0.266 n=10) Float32bits 12.27n ± 0% 12.23n ± 0% ~ (p=0.122 n=10) Float32frombits 4.909n ± 0% 4.893n ± 0% -0.32% (p=0.024 n=10) FMA 6.556n ± 0% 6.526n ± 0% ~ (p=0.283 n=10) geomean 86.82n 83.75n -3.54% Change-Id: I522297a79646d76543d516accce291f5a3cea337 Reviewed-on: https://go-review.googlesource.com/c/go/+/717560 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Junyang Shao <shaojunyang@google.com>
2025-10-29test/codegen: simplify asmcheck pattern matchingRuss Cox
Separate patterns in asmcheck by spaces instead of commas. Many patterns end in comma (like "MOV [$]123,") so separating patterns by comma is not great; they're already quoted, so spaces are fine. Also replace all tabs in the assembly lines with spaces before matching. Finally, replace \$ or \\$ with [$] as the matching idiom. The effect of all these is to make the patterns look like: // amd64:"BSFQ" "ORQ [$]256" instead of the old: // amd64:"BSFQ","ORQ\t\\$256" Update all tests as well. Change-Id: Ia39febe5d7f67ba115846422789e11b185d5c807 Reviewed-on: https://go-review.googlesource.com/c/go/+/716060 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com> Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
2025-10-26cmd/compile: use MOV(D|F) with const for Const(64|32)F on riscv64Meng Zhuo
The original Const64F using: AUIPC + LD + FMVDX to load float64 const, we can use AUIPC + FLD instead, same as Const32F. Change-Id: I8ca0a0e90d820a26e69b74cd25df3cc662132bf7 Reviewed-on: https://go-review.googlesource.com/c/go/+/703215 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-09-30test/codegen: codify handling of floating point constants on arm64Joel Sing
While here, reorder Float32ConstantStore/Float64ConstantStore for consistency. Change-Id: Ic1b3e9f9474965d15bc94518d78d1a4a7bda93f3 Reviewed-on: https://go-review.googlesource.com/c/go/+/703756 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Auto-Submit: Joel Sing <joel@sing.id.au> Reviewed-by: Keith Randall <khr@google.com>
2025-08-24test/codegen: add Mul2 and DivPow2 test for loong64Xiaolin Zhao
Change-Id: I29ccd105c5418955146a3f4873162963da489a70 Reviewed-on: https://go-review.googlesource.com/c/go/+/697935 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-07-30cmd/compile: add floating point min/max intrinsics on s390xMichael Munday
Add the VECTOR FP (MINIMUM|MAXIMUM) instructions to the assembler and use them in the compiler to implement min and max. Note: I've allowed floating point registers to be used with the single element instructions (those with the W instead of V prefix) to allow easier integration into the compiler. Change-Id: I5f80a510bd248cf483cce95f1979bf63fbae7de6 Reviewed-on: https://go-review.googlesource.com/c/go/+/684715 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Freeman <mark@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2025-03-13test/codegen: remove plan9/amd64 specific array zeroing/copying testsJoel Sing
The compiler previously avoided the use of MOVUPS on plan9/amd64. This was changed in CL 655875, however the codegen tests were not updated and now fail (seemingly the full codegen tests do not run anywhere, not even on the longtest builders). Change-Id: I388b60e7b0911048d4949c5029347f9801c018a9 Reviewed-on: https://go-review.googlesource.com/c/go/+/656997 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Auto-Submit: Keith Randall <khr@google.com>
2025-02-13cmd/compile: lower x*z + y to FMA if FMA enabledJakub Ciolek
There is a generic opcode for FMA, but we don't use it in rewrite rules. This is maybe because some archs, like WASM and MIPS don't have a late lowering rule for it. Fixes #71204 Intel Alder Lake 12600k (GOAMD64=v3): math: name old time/op new time/op delta Acos-16 4.58ns ± 0% 3.36ns ± 0% -26.68% (p=0.008 n=5+5) Acosh-16 8.04ns ± 1% 6.44ns ± 0% -19.95% (p=0.008 n=5+5) Asin-16 4.28ns ± 0% 3.32ns ± 0% -22.24% (p=0.008 n=5+5) Asinh-16 9.92ns ± 0% 8.62ns ± 0% -13.13% (p=0.008 n=5+5) Atan-16 2.31ns ± 0% 1.84ns ± 0% -20.02% (p=0.008 n=5+5) Atanh-16 7.79ns ± 0% 7.03ns ± 0% -9.67% (p=0.008 n=5+5) Atan2-16 3.93ns ± 0% 3.52ns ± 0% -10.35% (p=0.000 n=5+4) Cbrt-16 4.62ns ± 0% 4.41ns ± 0% -4.57% (p=0.016 n=4+5) Ceil-16 0.14ns ± 1% 0.14ns ± 2% ~ (p=0.103 n=5+5) Copysign-16 0.33ns ± 0% 0.33ns ± 0% +0.03% (p=0.029 n=4+4) Cos-16 4.87ns ± 0% 4.75ns ± 0% -2.44% (p=0.016 n=5+4) Cosh-16 4.86ns ± 0% 4.86ns ± 0% ~ (p=0.317 n=5+5) Erf-16 2.71ns ± 0% 2.25ns ± 0% -16.69% (p=0.008 n=5+5) Erfc-16 3.06ns ± 0% 2.67ns ± 0% -13.00% (p=0.016 n=5+4) Erfinv-16 3.88ns ± 0% 2.84ns ± 3% -26.83% (p=0.008 n=5+5) Erfcinv-16 4.08ns ± 0% 3.01ns ± 1% -26.27% (p=0.008 n=5+5) Exp-16 3.29ns ± 0% 3.37ns ± 2% +2.64% (p=0.016 n=4+5) ExpGo-16 8.44ns ± 0% 7.48ns ± 1% -11.37% (p=0.008 n=5+5) Expm1-16 4.46ns ± 0% 3.69ns ± 2% -17.26% (p=0.016 n=4+5) Exp2-16 8.20ns ± 0% 7.39ns ± 2% -9.94% (p=0.008 n=5+5) Exp2Go-16 8.26ns ± 0% 7.23ns ± 0% -12.49% (p=0.016 n=4+5) Abs-16 0.26ns ± 3% 0.22ns ± 1% -16.34% (p=0.008 n=5+5) Dim-16 0.38ns ± 1% 0.40ns ± 2% +5.02% (p=0.008 n=5+5) Floor-16 0.11ns ± 1% 0.17ns ± 4% +54.99% (p=0.008 n=5+5) Max-16 1.24ns ± 0% 1.24ns ± 0% ~ (p=0.619 n=5+5) Min-16 1.24ns ± 0% 1.24ns ± 0% ~ (p=0.484 n=5+5) Mod-16 13.4ns ± 1% 12.8ns ± 0% -4.21% (p=0.016 n=5+4) Frexp-16 1.70ns ± 0% 1.71ns ± 0% +0.46% (p=0.008 n=5+5) Gamma-16 3.97ns ± 0% 3.97ns ± 0% ~ (p=0.643 n=5+5) Hypot-16 2.11ns ± 0% 2.11ns ± 0% ~ (p=0.762 n=5+5) HypotGo-16 2.48ns ± 4% 2.26ns ± 0% -8.94% (p=0.008 n=5+5) Ilogb-16 1.67ns ± 0% 1.67ns ± 0% -0.07% (p=0.048 n=5+5) J0-16 19.8ns ± 0% 19.3ns ± 0% ~ (p=0.079 n=4+5) J1-16 19.4ns ± 0% 18.9ns ± 0% -2.63% (p=0.000 n=5+4) Jn-16 41.5ns ± 0% 40.6ns ± 0% -2.32% (p=0.016 n=4+5) Ldexp-16 2.26ns ± 0% 2.26ns ± 0% ~ (p=0.683 n=5+5) Lgamma-16 4.40ns ± 0% 4.21ns ± 0% -4.21% (p=0.008 n=5+5) Log-16 4.05ns ± 0% 4.05ns ± 0% ~ (all equal) Logb-16 1.69ns ± 0% 1.69ns ± 0% ~ (p=0.429 n=5+5) Log1p-16 5.00ns ± 0% 3.99ns ± 0% -20.14% (p=0.008 n=5+5) Log10-16 4.22ns ± 0% 4.21ns ± 0% -0.15% (p=0.008 n=5+5) Log2-16 2.27ns ± 0% 2.25ns ± 0% -0.94% (p=0.008 n=5+5) Modf-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.492 n=5+5) Nextafter32-16 2.09ns ± 0% 2.09ns ± 0% ~ (p=0.079 n=4+5) Nextafter64-16 2.09ns ± 0% 2.09ns ± 0% ~ (p=0.095 n=4+5) PowInt-16 10.8ns ± 0% 10.8ns ± 0% ~ (all equal) PowFrac-16 25.3ns ± 0% 25.3ns ± 0% -0.09% (p=0.000 n=5+4) Pow10Pos-16 0.52ns ± 1% 0.52ns ± 0% ~ (p=0.810 n=5+5) Pow10Neg-16 0.82ns ± 0% 0.82ns ± 0% ~ (p=0.381 n=5+5) Round-16 0.93ns ± 0% 0.93ns ± 0% ~ (p=0.056 n=5+5) RoundToEven-16 1.64ns ± 0% 1.64ns ± 0% ~ (all equal) Remainder-16 12.4ns ± 2% 12.0ns ± 0% -3.27% (p=0.008 n=5+5) Signbit-16 0.37ns ± 0% 0.37ns ± 0% -0.19% (p=0.008 n=5+5) Sin-16 4.04ns ± 0% 3.92ns ± 0% -3.13% (p=0.000 n=4+5) Sincos-16 5.99ns ± 0% 5.80ns ± 0% -3.03% (p=0.008 n=5+5) Sinh-16 5.22ns ± 0% 5.22ns ± 0% ~ (p=0.651 n=5+4) SqrtIndirect-16 0.41ns ± 0% 0.41ns ± 0% ~ (p=0.333 n=4+5) SqrtLatency-16 2.66ns ± 0% 2.66ns ± 0% ~ (p=0.079 n=4+5) SqrtIndirectLatency-16 2.66ns ± 0% 2.66ns ± 0% ~ (p=1.000 n=5+5) SqrtGoLatency-16 30.1ns ± 0% 28.6ns ± 1% -4.84% (p=0.008 n=5+5) SqrtPrime-16 645ns ± 0% 645ns ± 0% ~ (p=0.095 n=5+4) Tan-16 4.21ns ± 0% 4.09ns ± 0% -2.76% (p=0.029 n=4+4) Tanh-16 5.36ns ± 0% 5.36ns ± 0% ~ (p=0.444 n=5+5) Trunc-16 0.12ns ± 6% 0.11ns ± 1% -6.79% (p=0.008 n=5+5) Y0-16 19.2ns ± 0% 18.7ns ± 0% -2.52% (p=0.000 n=5+4) Y1-16 19.1ns ± 0% 18.4ns ± 0% ~ (p=0.079 n=4+5) Yn-16 40.7ns ± 0% 39.5ns ± 0% -2.82% (p=0.008 n=5+5) Float64bits-16 0.21ns ± 0% 0.21ns ± 0% ~ (p=0.603 n=5+5) Float64frombits-16 0.21ns ± 0% 0.21ns ± 0% ~ (p=0.984 n=4+5) Float32bits-16 0.21ns ± 0% 0.21ns ± 0% ~ (p=0.778 n=4+5) Float32frombits-16 0.21ns ± 0% 0.20ns ± 0% ~ (p=0.397 n=5+5) FMA-16 0.82ns ± 0% 0.82ns ± 0% +0.02% (p=0.029 n=4+4) [Geo mean] 2.87ns 2.74ns -4.61% math/cmplx: name old time/op new time/op delta Abs-16 2.07ns ± 0% 2.05ns ± 0% -0.70% (p=0.016 n=5+4) Acos-16 36.5ns ± 0% 35.7ns ± 0% -2.33% (p=0.029 n=4+4) Acosh-16 37.0ns ± 0% 36.2ns ± 0% -2.20% (p=0.008 n=5+5) Asin-16 36.5ns ± 0% 35.7ns ± 0% -2.29% (p=0.008 n=5+5) Asinh-16 33.5ns ± 0% 31.6ns ± 0% -5.51% (p=0.008 n=5+5) Atan-16 15.5ns ± 0% 13.9ns ± 0% -10.61% (p=0.008 n=5+5) Atanh-16 15.0ns ± 0% 13.6ns ± 0% -9.73% (p=0.008 n=5+5) Conj-16 0.11ns ± 5% 0.11ns ± 1% ~ (p=0.421 n=5+5) Cos-16 12.3ns ± 0% 12.2ns ± 0% -0.60% (p=0.000 n=4+5) Cosh-16 12.1ns ± 0% 12.0ns ± 0% ~ (p=0.079 n=4+5) Exp-16 10.0ns ± 0% 9.8ns ± 0% -1.77% (p=0.008 n=5+5) Log-16 14.5ns ± 0% 13.7ns ± 0% -5.67% (p=0.008 n=5+5) Log10-16 14.5ns ± 0% 13.7ns ± 0% -5.55% (p=0.000 n=5+4) Phase-16 5.11ns ± 0% 4.25ns ± 0% -16.90% (p=0.008 n=5+5) Polar-16 7.12ns ± 0% 6.35ns ± 0% -10.90% (p=0.008 n=5+5) Pow-16 64.3ns ± 0% 63.7ns ± 0% -0.97% (p=0.008 n=5+5) Rect-16 5.74ns ± 0% 5.58ns ± 0% -2.73% (p=0.016 n=4+5) Sin-16 12.2ns ± 0% 12.2ns ± 0% -0.54% (p=0.000 n=4+5) Sinh-16 12.1ns ± 0% 12.0ns ± 0% -0.58% (p=0.000 n=5+4) Sqrt-16 5.30ns ± 0% 5.18ns ± 0% -2.36% (p=0.008 n=5+5) Tan-16 22.7ns ± 0% 22.6ns ± 0% -0.33% (p=0.008 n=5+5) Tanh-16 21.2ns ± 0% 20.9ns ± 0% -1.32% (p=0.008 n=5+5) [Geo mean] 11.3ns 10.8ns -3.97% Change-Id: Idcc4b357ba68477929c126289e5095b27a827b1b Reviewed-on: https://go-review.googlesource.com/c/go/+/646335 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2024-11-08cmd/compile: implement FMA codegen for loong64Xiaolin Zhao
Benchmark results on Loongson 3A5000 and 3A6000: goos: linux goarch: loong64 pkg: math cpu: Loongson-3A6000 @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | FMA 25.930n ± 0% 2.002n ± 0% -92.28% (p=0.000 n=10) goos: linux goarch: loong64 pkg: math cpu: Loongson-3A5000 @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | FMA 32.840n ± 0% 2.002n ± 0% -93.90% (p=0.000 n=10) Updates #59120 This patch is a copy of CL 483355. Co-authored-by: WANG Xuerui <git@xen0n.name> Change-Id: I88b89d23f00864f9173a182a47ee135afec7ed6e Reviewed-on: https://go-review.googlesource.com/c/go/+/625335 Reviewed-by: abner chenc <chenguoqi@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-10-17cmd/compile: optimize loong64 with register indexed load/storeXiaolin Zhao
goos: linux goarch: loong64 pkg: test/bench/go1 cpu: Loongson-3A6000 @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | BinaryTree17 7.766 ± 1% 7.640 ± 2% -1.62% (p=0.000 n=20) Fannkuch11 2.649 ± 0% 2.358 ± 0% -10.96% (p=0.000 n=20) FmtFprintfEmpty 35.89n ± 0% 35.87n ± 0% -0.06% (p=0.000 n=20) FmtFprintfString 59.44n ± 0% 57.25n ± 2% -3.68% (p=0.000 n=20) FmtFprintfInt 62.07n ± 0% 60.04n ± 0% -3.27% (p=0.000 n=20) FmtFprintfIntInt 97.90n ± 0% 97.26n ± 0% -0.65% (p=0.000 n=20) FmtFprintfPrefixedInt 116.7n ± 0% 119.2n ± 0% +2.14% (p=0.000 n=20) FmtFprintfFloat 204.5n ± 0% 201.9n ± 0% -1.30% (p=0.000 n=20) FmtManyArgs 455.9n ± 0% 466.8n ± 0% +2.39% (p=0.000 n=20) GobDecode 7.458m ± 1% 7.138m ± 1% -4.28% (p=0.000 n=20) GobEncode 8.573m ± 1% 8.473m ± 1% ~ (p=0.091 n=20) Gzip 280.2m ± 0% 284.9m ± 0% +1.67% (p=0.000 n=20) Gunzip 32.68m ± 0% 32.67m ± 0% ~ (p=0.211 n=20) HTTPClientServer 54.22µ ± 0% 53.24µ ± 0% -1.80% (p=0.000 n=20) JSONEncode 9.427m ± 1% 9.152m ± 0% -2.92% (p=0.000 n=20) JSONDecode 47.08m ± 1% 46.85m ± 1% -0.49% (p=0.007 n=20) Mandelbrot200 4.601m ± 0% 4.605m ± 0% +0.08% (p=0.000 n=20) GoParse 4.776m ± 0% 4.655m ± 1% -2.52% (p=0.000 n=20) RegexpMatchEasy0_32 59.77n ± 0% 57.59n ± 0% -3.66% (p=0.000 n=20) RegexpMatchEasy0_1K 458.1n ± 0% 458.8n ± 0% +0.15% (p=0.000 n=20) RegexpMatchEasy1_32 59.36n ± 0% 59.24n ± 0% -0.20% (p=0.000 n=20) RegexpMatchEasy1_1K 557.7n ± 0% 560.2n ± 0% +0.46% (p=0.000 n=20) RegexpMatchMedium_32 803.1n ± 0% 772.8n ± 0% -3.77% (p=0.000 n=20) RegexpMatchMedium_1K 27.29µ ± 0% 25.88µ ± 0% -5.18% (p=0.000 n=20) RegexpMatchHard_32 1.385µ ± 0% 1.304µ ± 0% -5.85% (p=0.000 n=20) RegexpMatchHard_1K 40.92µ ± 0% 39.58µ ± 0% -3.27% (p=0.000 n=20) Revcomp 474.3m ± 0% 410.0m ± 0% -13.56% (p=0.000 n=20) Template 78.16m ± 0% 76.32m ± 1% -2.36% (p=0.000 n=20) TimeParse 271.8n ± 0% 272.1n ± 0% +0.11% (p=0.000 n=20) TimeFormat 292.3n ± 0% 294.8n ± 0% +0.86% (p=0.000 n=20) geomean 51.98µ 50.82µ -2.22% Change-Id: Ia78f1ddee8f1d9ec7192a4b8d2a4ec6058679956 Reviewed-on: https://go-review.googlesource.com/c/go/+/615918 Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2024-08-07cmd/compile, math: improve implementation of math.{Max,Min} on loong64Xiaolin Zhao
Make math.{Min,Max} intrinsics and implement math.{archMax,archMin} in hardware. goos: linux goarch: loong64 pkg: math cpu: Loongson-3A6000 @ 2500.00MHz │ old.bench │ new.bench │ │ sec/op │ sec/op vs base │ Max 7.606n ± 0% 3.087n ± 0% -59.41% (p=0.000 n=20) Min 7.205n ± 0% 2.904n ± 0% -59.69% (p=0.000 n=20) MinFloat 37.220n ± 0% 4.802n ± 0% -87.10% (p=0.000 n=20) MaxFloat 33.620n ± 0% 4.802n ± 0% -85.72% (p=0.000 n=20) geomean 16.18n 3.792n -76.57% goos: linux goarch: loong64 pkg: runtime cpu: Loongson-3A5000 @ 2500.00MHz │ old.bench │ new.bench │ │ sec/op │ sec/op vs base │ Max 10.010n ± 0% 7.196n ± 0% -28.11% (p=0.000 n=20) Min 8.806n ± 0% 7.155n ± 0% -18.75% (p=0.000 n=20) MinFloat 60.010n ± 0% 7.976n ± 0% -86.71% (p=0.000 n=20) MaxFloat 56.410n ± 0% 7.980n ± 0% -85.85% (p=0.000 n=20) geomean 23.37n 7.566n -67.63% Updates #59120. Change-Id: I6815d20bc304af3cbf5d6ca8fe0ca1c2ddebea2d Reviewed-on: https://go-review.googlesource.com/c/go/+/580283 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com>
2024-07-23cmd/compile: store constant floats using integer constantsKeith Randall
x86 is better at storing constant ints than constant floats. (It uses a constant directly in the instruction stream, instead of loading it from a constant global memory.) Noticed as part of #67957 Change-Id: I9b7b586ad8e0fe9ce245324f020e9526f82b209d Reviewed-on: https://go-review.googlesource.com/c/go/+/592596 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-04-04cmd/internal/obj/ppc64: on Power10, use xxspltidp for float constantsPaul E. Murphy
Any normal float32 constant can be generated by this instruction; use xxspltidp when possible. This prefixed instruction is much faster than the two instruction load sequence from the float32/float64 constant pool. Change-Id: Id751d9ffdae71463adbde66427b986f0b2ef74c2 Reviewed-on: https://go-review.googlesource.com/c/go/+/575555 Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Paul Murphy <murp@ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2024-04-01cmd/compile: support float min/max instructions on PPC64Paul E. Murphy
This enables efficient use of the builtin min/max function for float64 and float32 types on GOPPC64 >= power9. Extend the assembler to support xsminjdp/xsmaxjdp and use them to implement float min/max. Simplify the VSX xx3 opcode rules to allow FPR arguments, if all arguments are an FPR. Change-Id: I15882a4ce5dc46eba71d683cf1d184dc4236a328 Reviewed-on: https://go-review.googlesource.com/c/go/+/574535 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Than McIntosh <thanm@google.com>
2024-02-08test/codegen: add float max/min codegen testMeng Zhuo
As CL 514596 and CL 514775 adds hardware implement of float max/min, we should add codegen test for these two CL. Change-Id: I347331032fe9f67a2e6fdb5d3cfe20203296b81c Reviewed-on: https://go-review.googlesource.com/c/go/+/561295 Reviewed-by: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: M Zhuo <mengzhuo1203@gmail.com> Reviewed-by: David Chase <drchase@google.com>
2023-08-22cmd/compile: add single-precision FMA code generation for riscv64Meng Zhuo
This CL adds FMADDS,FMSUBS,FNMADDS,FNMSUBS SSA support for riscv Change-Id: I1e7dd322b46b9e0f4923dbba256303d69ed12066 Reviewed-on: https://go-review.googlesource.com/c/go/+/506616 Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: David Chase <drchase@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: M Zhuo <mzh@golangcn.org>
2023-08-22cmd/compile: improve FP FMA performance on riscv64Meng Zhuo
FMADD/FMSUB/FNSUB are an efficient FP FMA instructions, which can be used by the compiler to improve FP performance. Erf 188.0n ± 2% 139.5n ± 2% -25.82% (p=0.000 n=10) Erfc 193.6n ± 1% 143.2n ± 1% -26.01% (p=0.000 n=10) Erfinv 244.4n ± 2% 172.6n ± 0% -29.40% (p=0.000 n=10) Erfcinv 244.7n ± 2% 173.0n ± 1% -29.31% (p=0.000 n=10) geomean 216.0n 156.3n -27.65% Ref: The RISC-V Instruction Set Manual Volume I: Unprivileged ISA 11.6 Single-Precision Floating-Point Computational Instructions Change-Id: I89aa3a4df7576fdd47f4a6ee608ac16feafd093c Reviewed-on: https://go-review.googlesource.com/c/go/+/506036 Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: M Zhuo <mzh@golangcn.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-07-04test/codegen: enable Mul2 DivPow2 test for riscv64Meng Zhuo
Change-Id: Ice0bb7a665599b334e927a1b00d1a5b400c15e3d Reviewed-on: https://go-review.googlesource.com/c/go/+/506035 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org>
2023-01-27test/codegen: combine trivial PPC64 tests into ppc64xPaul E. Murphy
Use a small python script to consolidate duplicate ppc64/ppc64le tests into a single ppc64x codegen test. This makes small assumption that anytime two tests with for different arch/variant combos exists, those tests can be combined into a single ppc64x test. E.x: // ppc64le: foo // ppc64le/power9: foo into // ppc64x: foo or // ppc64: foo // ppc64le: foo into // ppc64x: foo import glob import re files = glob.glob("codegen/*.go") for file in files: with open(file) as f: text = [l for l in f] i = 0 while i < len(text): first = re.match("\s*// ?ppc64(le)?(/power[89])?:(.*)", text[i]) if first: j = i+1 while j < len(text): second = re.match("\s*// ?ppc64(le)?(/power[89])?:(.*)", text[j]) if not second: break if (not first.group(2) or first.group(2) == second.group(2)) and first.group(3) == second.group(3): text[i] = re.sub(" ?ppc64(le|x)?"," ppc64x",text[i]) text=text[:j] + (text[j+1:]) else: j += 1 i+=1 with open(file, 'w') as f: f.write("".join(text)) Change-Id: Ic6b009b54eacaadc5a23db9c5a3bf7331b595821 Reviewed-on: https://go-review.googlesource.com/c/go/+/463220 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Bryan Mills <bcmills@google.com> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2021-02-24cmd/compile: ARM64 optimize []float64 and []float32 accessEgon Elbre
Optimize load and store to []float64 and []float32. Previously it used LSL instead of shifted register indexed load/store. Before: LSL $3, R0, R0 FMOVD F0, (R1)(R0) After: FMOVD F0, (R1)(R0<<3) Fixes #42798 Change-Id: I0c0912140c3dce5aa6abc27097c0eb93833cc589 Reviewed-on: https://go-review.googlesource.com/c/go/+/273706 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Giovanni Bajo <rasky@develer.com>
2020-10-06all: implement GO386=softfloatKeith Randall
Backstop support for non-sse2 chips now that 387 is gone. RELNOTE=yes Change-Id: Ib10e69c4a3654c15a03568f93393437e1939e013 Reviewed-on: https://go-review.googlesource.com/c/go/+/260017 Trust: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-10-02all: drop 387 supportKeith Randall
My last 387 CL. So sad ... ... ... ... not! Fixes #40255 Change-Id: I8d4ddb744b234b8adc735db2f7c3c7b6d8bbdfa4 Reviewed-on: https://go-review.googlesource.com/c/go/+/258957 Trust: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-08cmd/compile: allow floating point Ops to produce flags on s390xRuixin(Peter) Bao
On s390x, some floating point arithmetic instructions (FSUB, FADD) generate flag. This patch allows those related SSA ops to return a tuple, where the second argument of the tuple is the generated flag. We can use the flag and remove the subsequent comparison instruction (e.g: LTDBR). This CL also reduces the .text section for math.test binary by 0.4KB. Benchmarks: name old time/op new time/op delta Acos-18 12.1ns ± 0% 12.1ns ± 0% ~ (all equal) Acosh-18 18.5ns ± 0% 18.5ns ± 0% ~ (all equal) Asin-18 13.1ns ± 0% 13.1ns ± 0% ~ (all equal) Asinh-18 19.4ns ± 0% 19.5ns ± 1% ~ (p=0.444 n=5+5) Atan-18 10.0ns ± 0% 10.0ns ± 0% ~ (all equal) Atanh-18 19.1ns ± 1% 19.2ns ± 2% ~ (p=0.841 n=5+5) Atan2-18 16.4ns ± 0% 16.4ns ± 0% ~ (all equal) Cbrt-18 14.8ns ± 0% 14.8ns ± 0% ~ (all equal) Ceil-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal) Copysign-18 0.80ns ± 0% 0.80ns ± 0% ~ (all equal) Cos-18 7.19ns ± 0% 7.19ns ± 0% ~ (p=0.556 n=4+5) Cosh-18 12.4ns ± 0% 12.4ns ± 0% ~ (all equal) Erf-18 10.8ns ± 0% 10.8ns ± 0% ~ (all equal) Erfc-18 11.0ns ± 0% 11.0ns ± 0% ~ (all equal) Erfinv-18 23.0ns ±16% 26.8ns ± 1% +16.90% (p=0.008 n=5+5) Erfcinv-18 23.3ns ±15% 26.1ns ± 7% ~ (p=0.087 n=5+5) Exp-18 8.67ns ± 0% 8.67ns ± 0% ~ (p=1.000 n=4+4) ExpGo-18 50.8ns ± 3% 52.4ns ± 2% ~ (p=0.063 n=5+5) Expm1-18 9.49ns ± 1% 9.47ns ± 0% ~ (p=1.000 n=5+5) Exp2-18 52.7ns ± 1% 50.5ns ± 3% -4.10% (p=0.024 n=5+5) Exp2Go-18 50.6ns ± 1% 48.4ns ± 3% -4.39% (p=0.008 n=5+5) Abs-18 0.67ns ± 0% 0.67ns ± 0% ~ (p=0.444 n=5+5) Dim-18 1.02ns ± 0% 1.03ns ± 0% +0.98% (p=0.008 n=5+5) Floor-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal) Max-18 3.09ns ± 1% 3.05ns ± 0% -1.42% (p=0.008 n=5+5) Min-18 3.32ns ± 1% 3.30ns ± 0% -0.72% (p=0.016 n=5+4) Mod-18 62.3ns ± 1% 65.8ns ± 3% +5.55% (p=0.008 n=5+5) Frexp-18 5.05ns ± 2% 4.98ns ± 0% ~ (p=0.683 n=5+5) Gamma-18 24.4ns ± 0% 24.1ns ± 0% -1.23% (p=0.008 n=5+5) Hypot-18 10.3ns ± 0% 10.3ns ± 0% ~ (all equal) HypotGo-18 10.2ns ± 0% 10.2ns ± 0% ~ (all equal) Ilogb-18 3.56ns ± 1% 3.54ns ± 0% ~ (p=0.595 n=5+5) J0-18 113ns ± 0% 108ns ± 1% -4.42% (p=0.016 n=4+5) J1-18 115ns ± 0% 109ns ± 1% -4.87% (p=0.016 n=4+5) Jn-18 240ns ± 0% 230ns ± 2% -4.41% (p=0.008 n=5+5) Ldexp-18 6.19ns ± 0% 6.19ns ± 0% ~ (p=0.444 n=5+5) Lgamma-18 32.2ns ± 0% 32.2ns ± 0% ~ (all equal) Log-18 13.1ns ± 0% 13.1ns ± 0% ~ (all equal) Logb-18 4.23ns ± 0% 4.22ns ± 0% ~ (p=0.444 n=5+5) Log1p-18 12.7ns ± 0% 12.7ns ± 0% ~ (all equal) Log10-18 18.1ns ± 0% 18.2ns ± 0% ~ (p=0.167 n=5+5) Log2-18 14.0ns ± 0% 14.0ns ± 0% ~ (all equal) Modf-18 10.4ns ± 0% 10.5ns ± 0% +0.96% (p=0.016 n=4+5) Nextafter32-18 11.3ns ± 0% 11.3ns ± 0% ~ (all equal) Nextafter64-18 4.01ns ± 1% 3.97ns ± 0% ~ (p=0.333 n=5+4) PowInt-18 32.7ns ± 0% 32.7ns ± 0% ~ (all equal) PowFrac-18 33.2ns ± 0% 33.1ns ± 0% ~ (p=0.095 n=4+5) Pow10Pos-18 1.58ns ± 0% 1.58ns ± 0% ~ (all equal) Pow10Neg-18 5.81ns ± 0% 5.81ns ± 0% ~ (all equal) Round-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal) RoundToEven-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal) Remainder-18 40.6ns ± 0% 40.7ns ± 0% ~ (p=0.238 n=5+4) Signbit-18 1.57ns ± 0% 1.57ns ± 0% ~ (all equal) Sin-18 6.75ns ± 0% 6.74ns ± 0% ~ (p=0.333 n=5+4) Sincos-18 29.5ns ± 0% 29.5ns ± 0% ~ (all equal) Sinh-18 14.4ns ± 0% 14.4ns ± 0% ~ (all equal) SqrtIndirect-18 3.97ns ± 0% 4.15ns ± 0% +4.59% (p=0.008 n=5+5) SqrtLatency-18 8.01ns ± 0% 8.01ns ± 0% ~ (all equal) SqrtIndirectLatency-18 11.6ns ± 0% 11.6ns ± 0% ~ (all equal) SqrtGoLatency-18 44.7ns ± 0% 45.0ns ± 0% +0.67% (p=0.008 n=5+5) SqrtPrime-18 1.26µs ± 0% 1.27µs ± 0% +0.63% (p=0.029 n=4+4) Tan-18 11.1ns ± 0% 11.1ns ± 0% ~ (all equal) Tanh-18 15.8ns ± 0% 15.8ns ± 0% ~ (all equal) Trunc-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal) Y0-18 113ns ± 2% 108ns ± 3% -5.11% (p=0.008 n=5+5) Y1-18 112ns ± 3% 107ns ± 0% -4.29% (p=0.000 n=5+4) Yn-18 229ns ± 0% 220ns ± 1% -3.76% (p=0.016 n=4+5) Float64bits-18 1.09ns ± 0% 1.09ns ± 0% ~ (all equal) Float64frombits-18 0.55ns ± 0% 0.55ns ± 0% ~ (all equal) Float32bits-18 0.96ns ±16% 0.86ns ± 0% ~ (p=0.563 n=5+5) Float32frombits-18 1.03ns ±28% 0.84ns ± 0% ~ (p=0.167 n=5+5) FMA-18 1.60ns ± 0% 1.60ns ± 0% ~ (all equal) [Geo mean] 10.0ns 9.9ns -0.41% Change-Id: Ief7e63ea5a8ba404b0a4696e12b9b7e0b05a9a03 Reviewed-on: https://go-review.googlesource.com/c/go/+/209160 Reviewed-by: Michael Munday <mike.munday@ibm.com> Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-07cmd/compile: delete the floating point Greater and Geq opsMichael Munday
Extend CL 220417 (which removed the integer Greater and Geq ops) to floating point comparisons. Greater and Geq can always be implemented using Less and Leq. Fixes #37316. Change-Id: Ieaddb4877dd0ff9037a1dd11d0a9a9e45ced71e7 Reviewed-on: https://go-review.googlesource.com/c/go/+/222397 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-03-25cmd/compile: use load and test instructions on s390xRuixin(Peter) Bao
The load and test instructions compare the given value against zero and will produce a condition code indicating one of the following scenarios: 0: Result is zero 1: Result is less than zero 2: Result is greater than zero 3: Result is not a number (NaN) The instruction can be used to simplify floating point comparisons against zero, which can enable further optimizations. This CL also reduces the size of .text section of math.test binary by around 0.7 KB (in hexadecimal, from 1358f0 to 135620). Change-Id: I33cb714f0c6feebac7a1c46dfcc735e7daceff9c Reviewed-on: https://go-review.googlesource.com/c/go/+/209159 Reviewed-by: Michael Munday <mike.munday@ibm.com> Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-05-16cmd/compile: fix the error of absorbing boolean tests into block(FGE, FGT)fanzha02
The CL 164718 mistyped the comparison flags. The rules for floating point comparison should be GreaterThanF and GreaterEqualF. Fortunately, the wrong optimizations were overwritten by other integer rules, so the issue won't cause failure but just some performance impact. The fixed CL optimizes the floating point test as follows. source code: func foo(f float64) bool { return f > 4 || f < -4} previous version: "FCMPD", "CSET\tGT", "CBZ" fixed version: "FCMPD", BLE" Add the test case. Change-Id: Iea954fdbb8272b2d642dae0f816dc77286e6e1fa Reviewed-on: https://go-review.googlesource.com/c/go/+/177121 Reviewed-by: Ben Shi <powerman1st@163.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-10-16test/codegen: enable more tests for ppc64/ppc64leLynn Boger
Adding cases for ppc64,ppc64le to the codegen tests where appropriate. Change-Id: Idf8cbe88a4ab4406a4ef1ea777bd15a58b68f3ed Reviewed-on: https://go-review.googlesource.com/c/142557 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-16test/codegen: fix confusing test casesBen Shi
ARMv7's MULAF/MULSF/MULAD/MULSD are not fused, this CL fixes the confusing test cases. Change-Id: I35022e207e2f0d24a23a7f6f188e41ba8eee9886 Reviewed-on: https://go-review.googlesource.com/c/142439 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Akhil Indurti <aindurti@gmail.com> Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-10-15test/codegen: add tests of FMA for arm/arm64Ben Shi
This CL adds tests of fused multiplication-accumulation on arm/arm64. Change-Id: Ic85d5277c0d6acb7e1e723653372dfaf96824a39 Reviewed-on: https://go-review.googlesource.com/c/141652 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-08-28cmd/compile: optimize arm64 with indexed FP load/storeBen Shi
The FP load/store on arm64 have register indexed forms. And this CL implements this optimization. 1. The total size of pkg/android_arm64 (excluding cmd/compile) decreases about 400 bytes. 2. There is no regression in the go1 benchmark, the test case GobEncode even gets slight improvement, excluding noise. name old time/op new time/op delta BinaryTree17-4 19.0s ± 0% 19.0s ± 1% ~ (p=0.817 n=29+29) Fannkuch11-4 9.94s ± 0% 9.95s ± 0% +0.03% (p=0.010 n=24+30) FmtFprintfEmpty-4 233ns ± 0% 233ns ± 0% ~ (all equal) FmtFprintfString-4 427ns ± 0% 427ns ± 0% ~ (p=0.649 n=30+30) FmtFprintfInt-4 471ns ± 0% 471ns ± 0% ~ (all equal) FmtFprintfIntInt-4 730ns ± 0% 730ns ± 0% ~ (all equal) FmtFprintfPrefixedInt-4 889ns ± 0% 889ns ± 0% ~ (all equal) FmtFprintfFloat-4 1.21µs ± 0% 1.21µs ± 0% +0.04% (p=0.012 n=20+30) FmtManyArgs-4 2.99µs ± 0% 2.99µs ± 0% ~ (p=0.651 n=29+29) GobDecode-4 42.4ms ± 1% 42.3ms ± 1% -0.27% (p=0.001 n=29+28) GobEncode-4 37.8ms ±11% 36.0ms ± 0% -4.67% (p=0.000 n=30+26) Gzip-4 1.98s ± 1% 1.96s ± 1% -1.26% (p=0.000 n=30+30) Gunzip-4 175ms ± 0% 175ms ± 0% ~ (p=0.988 n=29+29) HTTPClientServer-4 854µs ± 5% 860µs ± 5% ~ (p=0.236 n=28+29) JSONEncode-4 88.8ms ± 0% 87.9ms ± 0% -1.00% (p=0.000 n=24+26) JSONDecode-4 390ms ± 1% 392ms ± 2% +0.48% (p=0.025 n=30+30) Mandelbrot200-4 19.5ms ± 0% 19.5ms ± 0% ~ (p=0.894 n=24+29) GoParse-4 20.3ms ± 0% 20.1ms ± 1% -0.94% (p=0.000 n=27+26) RegexpMatchEasy0_32-4 451ns ± 0% 451ns ± 0% ~ (p=0.578 n=30+30) RegexpMatchEasy0_1K-4 1.63µs ± 0% 1.63µs ± 0% ~ (p=0.298 n=30+28) RegexpMatchEasy1_32-4 431ns ± 0% 434ns ± 0% +0.67% (p=0.000 n=30+29) RegexpMatchEasy1_1K-4 2.60µs ± 0% 2.64µs ± 0% +1.36% (p=0.000 n=28+26) RegexpMatchMedium_32-4 744ns ± 0% 744ns ± 0% ~ (p=0.474 n=29+29) RegexpMatchMedium_1K-4 223µs ± 0% 223µs ± 0% -0.08% (p=0.038 n=26+30) RegexpMatchHard_32-4 12.2µs ± 0% 12.3µs ± 0% +0.27% (p=0.000 n=29+30) RegexpMatchHard_1K-4 373µs ± 0% 373µs ± 0% ~ (p=0.219 n=29+28) Revcomp-4 2.84s ± 0% 2.84s ± 0% ~ (p=0.130 n=28+28) Template-4 394ms ± 1% 392ms ± 1% -0.52% (p=0.001 n=30+30) TimeParse-4 1.93µs ± 0% 1.93µs ± 0% ~ (p=0.587 n=29+30) TimeFormat-4 2.00µs ± 0% 2.00µs ± 0% +0.07% (p=0.001 n=28+27) [Geo mean] 306µs 305µs -0.17% name old speed new speed delta GobDecode-4 18.1MB/s ± 1% 18.2MB/s ± 1% +0.27% (p=0.001 n=29+28) GobEncode-4 20.3MB/s ±10% 21.3MB/s ± 0% +4.64% (p=0.000 n=30+26) Gzip-4 9.79MB/s ± 1% 9.91MB/s ± 1% +1.28% (p=0.000 n=30+30) Gunzip-4 111MB/s ± 0% 111MB/s ± 0% ~ (p=0.988 n=29+29) JSONEncode-4 21.8MB/s ± 0% 22.1MB/s ± 0% +1.02% (p=0.000 n=24+26) JSONDecode-4 4.97MB/s ± 1% 4.95MB/s ± 2% -0.45% (p=0.031 n=30+30) GoParse-4 2.85MB/s ± 1% 2.88MB/s ± 1% +1.03% (p=0.000 n=30+26) RegexpMatchEasy0_32-4 70.9MB/s ± 0% 70.9MB/s ± 0% ~ (p=0.904 n=29+28) RegexpMatchEasy0_1K-4 627MB/s ± 0% 627MB/s ± 0% ~ (p=0.156 n=30+30) RegexpMatchEasy1_32-4 74.2MB/s ± 0% 73.7MB/s ± 0% -0.67% (p=0.000 n=30+29) RegexpMatchEasy1_1K-4 393MB/s ± 0% 388MB/s ± 0% -1.34% (p=0.000 n=28+26) RegexpMatchMedium_32-4 1.34MB/s ± 0% 1.34MB/s ± 0% ~ (all equal) RegexpMatchMedium_1K-4 4.59MB/s ± 0% 4.59MB/s ± 0% +0.07% (p=0.035 n=25+30) RegexpMatchHard_32-4 2.61MB/s ± 0% 2.61MB/s ± 0% -0.11% (p=0.002 n=28+30) RegexpMatchHard_1K-4 2.75MB/s ± 0% 2.75MB/s ± 0% +0.15% (p=0.001 n=30+24) Revcomp-4 89.4MB/s ± 0% 89.4MB/s ± 0% ~ (p=0.140 n=28+28) Template-4 4.93MB/s ± 1% 4.95MB/s ± 1% +0.51% (p=0.001 n=30+30) [Geo mean] 18.4MB/s 18.4MB/s +0.37% Change-Id: I9a6b521a971b21cfb51064e8e9b853cef8a1d071 Reviewed-on: https://go-review.googlesource.com/124636 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-08-25cmd/compile: optimize 386 code with FLDPIBen Shi
FLDPI pushes the constant pi to 387's register stack, which is more efficient than MOVSSconst/MOVSDconst. 1. This optimization reduces 0.3KB of the total size of pkg/linux_386 (exlcuding cmd/compile). 2. There is little regression in the go1 benchmark. name old time/op new time/op delta BinaryTree17-4 3.30s ± 3% 3.30s ± 2% ~ (p=0.759 n=40+39) Fannkuch11-4 3.53s ± 1% 3.54s ± 1% ~ (p=0.168 n=40+40) FmtFprintfEmpty-4 45.5ns ± 3% 45.6ns ± 3% ~ (p=0.553 n=40+40) FmtFprintfString-4 78.4ns ± 3% 78.3ns ± 3% ~ (p=0.593 n=40+40) FmtFprintfInt-4 88.8ns ± 2% 89.9ns ± 2% ~ (p=0.083 n=40+33) FmtFprintfIntInt-4 140ns ± 4% 140ns ± 4% ~ (p=0.656 n=40+40) FmtFprintfPrefixedInt-4 180ns ± 2% 181ns ± 3% +0.53% (p=0.050 n=40+40) FmtFprintfFloat-4 408ns ± 4% 411ns ± 3% ~ (p=0.112 n=40+40) FmtManyArgs-4 599ns ± 3% 602ns ± 3% ~ (p=0.784 n=40+40) GobDecode-4 7.24ms ± 6% 7.30ms ± 5% ~ (p=0.171 n=40+40) GobEncode-4 6.98ms ± 5% 6.89ms ± 8% ~ (p=0.107 n=40+40) Gzip-4 396ms ± 4% 396ms ± 3% ~ (p=0.852 n=40+40) Gunzip-4 41.3ms ± 3% 41.5ms ± 4% ~ (p=0.221 n=40+40) HTTPClientServer-4 63.4µs ± 3% 63.4µs ± 2% ~ (p=0.895 n=39+40) JSONEncode-4 17.5ms ± 2% 17.5ms ± 3% ~ (p=0.090 n=40+40) JSONDecode-4 60.6ms ± 3% 60.1ms ± 4% ~ (p=0.184 n=40+40) Mandelbrot200-4 7.80ms ± 3% 7.78ms ± 2% ~ (p=0.512 n=40+40) GoParse-4 3.30ms ± 3% 3.28ms ± 2% -0.61% (p=0.034 n=40+40) RegexpMatchEasy0_32-4 104ns ± 4% 103ns ± 4% ~ (p=0.118 n=40+40) RegexpMatchEasy0_1K-4 850ns ± 2% 848ns ± 2% ~ (p=0.370 n=40+40) RegexpMatchEasy1_32-4 112ns ± 4% 112ns ± 4% ~ (p=0.848 n=40+40) RegexpMatchEasy1_1K-4 1.04µs ± 4% 1.03µs ± 4% ~ (p=0.333 n=40+40) RegexpMatchMedium_32-4 132ns ± 4% 131ns ± 3% ~ (p=0.527 n=40+40) RegexpMatchMedium_1K-4 43.4µs ± 3% 43.5µs ± 3% ~ (p=0.111 n=40+40) RegexpMatchHard_32-4 2.24µs ± 4% 2.24µs ± 4% ~ (p=0.441 n=40+40) RegexpMatchHard_1K-4 67.9µs ± 3% 68.0µs ± 3% ~ (p=0.095 n=40+40) Revcomp-4 1.84s ± 2% 1.84s ± 2% ~ (p=0.677 n=40+40) Template-4 68.4ms ± 3% 68.6ms ± 3% ~ (p=0.345 n=40+40) TimeParse-4 433ns ± 3% 433ns ± 3% ~ (p=0.403 n=40+40) TimeFormat-4 407ns ± 3% 406ns ± 3% ~ (p=0.900 n=40+40) [Geo mean] 67.1µs 67.2µs +0.04% name old speed new speed delta GobDecode-4 106MB/s ± 5% 105MB/s ± 5% ~ (p=0.173 n=40+40) GobEncode-4 110MB/s ± 5% 112MB/s ± 9% ~ (p=0.104 n=40+40) Gzip-4 49.0MB/s ± 4% 49.1MB/s ± 4% ~ (p=0.836 n=40+40) Gunzip-4 471MB/s ± 3% 468MB/s ± 4% ~ (p=0.218 n=40+40) JSONEncode-4 111MB/s ± 2% 111MB/s ± 3% ~ (p=0.090 n=40+40) JSONDecode-4 32.0MB/s ± 3% 32.3MB/s ± 4% ~ (p=0.194 n=40+40) GoParse-4 17.6MB/s ± 3% 17.7MB/s ± 2% +0.62% (p=0.035 n=40+40) RegexpMatchEasy0_32-4 307MB/s ± 4% 309MB/s ± 4% +0.70% (p=0.041 n=40+40) RegexpMatchEasy0_1K-4 1.20GB/s ± 3% 1.21GB/s ± 2% ~ (p=0.353 n=40+40) RegexpMatchEasy1_32-4 285MB/s ± 3% 284MB/s ± 4% ~ (p=0.384 n=40+40) RegexpMatchEasy1_1K-4 988MB/s ± 4% 992MB/s ± 3% ~ (p=0.335 n=40+40) RegexpMatchMedium_32-4 7.56MB/s ± 4% 7.57MB/s ± 4% ~ (p=0.314 n=40+40) RegexpMatchMedium_1K-4 23.6MB/s ± 3% 23.6MB/s ± 3% ~ (p=0.107 n=40+40) RegexpMatchHard_32-4 14.3MB/s ± 4% 14.3MB/s ± 4% ~ (p=0.429 n=40+40) RegexpMatchHard_1K-4 15.1MB/s ± 3% 15.1MB/s ± 3% ~ (p=0.099 n=40+40) Revcomp-4 138MB/s ± 2% 138MB/s ± 2% ~ (p=0.658 n=40+40) Template-4 28.4MB/s ± 3% 28.3MB/s ± 3% ~ (p=0.331 n=40+40) [Geo mean] 80.8MB/s 80.8MB/s +0.09% Change-Id: I0cb715eead68ade097a302e7fb80ccbd1d1b511e Reviewed-on: https://go-review.googlesource.com/130975 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-04-15test: run codegen tests on all supported architecture variantsGiovanni Bajo
This CL makes the codegen testsuite automatically test all architecture variants for architecture specified in tests. For instance, if a test file specifies a "arm" test, it will be automatically run on all GOARM variants (5,6,7), to increase the coverage. The CL also introduces a syntax to specify only a specific variant (eg: "arm/7") in case the test makes sense only there. The same syntax also allows to specify the operating system in case it matters (eg: "plan9/386/sse2"). Fixes #24658 Change-Id: I2eba8b918f51bb6a77a8431a309f8b71af07ea22 Reviewed-on: https://go-review.googlesource.com/107315 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-15test: migrate plan9 tests to codegenGiovanni Bajo
And remove it from asmtest. Next CL will remove the whole asmtest infrastructure. Change-Id: I5851bf7c617456d62a3c6cffacf70252df7b056b Reviewed-on: https://go-review.googlesource.com/107335 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-29test/codegen: match 387 ops too for GOARCH=386Alberto Donizetti
Change-Id: I99407e27e340689009af798989b33cef7cb92070 Reviewed-on: https://go-review.googlesource.com/103376 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-15test/codegen: port floats tests to codegenAlberto Donizetti
And delete them from asm_test. Change-Id: Ibdaca3496eefc73c731b511ddb9636a1f3dff68c Reviewed-on: https://go-review.googlesource.com/100915 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>