| Age | Commit message | Author |
|
Replace \s with a space in backtick-quoted strings
Replace \\s with a space in double-quoted strings
Change-Id: I0c8b249bb12c2c8ca69e683e4bc6f27544fd6094
Reviewed-on: https://go-review.googlesource.com/c/go/+/760680
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Paul Murphy <paumurph@redhat.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Softfloat doesn't use the hardware instructions.
Follow-on to CL 739520 and CL 757300.
Change-Id: Ic271cd5567c62933d2d0c01d8834f9bf07e31061
Reviewed-on: https://go-review.googlesource.com/c/go/+/760520
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Julian Zhu <jz531210@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Updates #75463
Change-Id: Iec51bdedd5a29bbb81ac553ad7e22403e1715ee3
Reviewed-on: https://go-review.googlesource.com/c/go/+/757300
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Absorb unnecessary conversion between float32 and float64
if both src and dst are 32 bit.
Updates #75463
Change-Id: Ia71941223b5cca3fea66b559da7b8f916e63feaf
Reviewed-on: https://go-review.googlesource.com/c/go/+/733621
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Julian Zhu <jz531210@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
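A minimal sketch of why a float32 -> float64 -> float32 round trip is redundant in the simplest case: widening is exact, so narrowing again gives back the same bits. The helper names (roundTrip, sameBits) are made up for illustration, and the actual rewrite rules added by the CL above are not reproduced here.

```go
package main

import (
	"fmt"
	"math"
)

// roundTrip widens a float32 to float64 and narrows it back.
// (Hypothetical helper, not taken from the CL.)
func roundTrip(x float32) float32 {
	return float32(float64(x))
}

// sameBits reports whether the round trip preserved the exact bit pattern.
func sameBits(x float32) bool {
	return math.Float32bits(roundTrip(x)) == math.Float32bits(x)
}

func main() {
	inputs := []float32{
		0,
		float32(math.Copysign(0, -1)), // negative zero
		1.5,
		-3.25,
		math.MaxFloat32,
		math.SmallestNonzeroFloat32, // subnormal
		float32(math.Inf(1)),
	}
	for _, x := range inputs {
		fmt.Println(x, sameBits(x))
	}
}
```

NaN inputs are left out on purpose, since payload handling during conversion is a separate question.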
|
|
Not a fix because there are other architectures
still to be done.
Updates #75463.
Change-Id: Ia5233c2b6c5f4439e269950efdd851e72e8e7ff6
Reviewed-on: https://go-review.googlesource.com/c/go/+/730160
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Not a fix because there are other architectures
still to be done.
Updates #75463.
Change-Id: Ifca03975023e4e5d0ffa98d1f877314a1a291be0
Reviewed-on: https://go-review.googlesource.com/c/go/+/729161
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Not a fix because there are other architectures
still to be done.
Updates #75463.
Change-Id: I3d7754ce4a26af0f5c4ef0be1254d164e68f8442
Reviewed-on: https://go-review.googlesource.com/c/go/+/729160
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Only implemented for 64 bit floating point operations for now.
goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
│ sec/op │ sec/op vs base │
Acos 154.1n ± 0% 154.1n ± 0% ~ (p=0.303 n=10)
Acosh 215.8n ± 6% 226.7n ± 0% ~ (p=0.439 n=10)
Asin 149.2n ± 1% 149.2n ± 0% ~ (p=0.700 n=10)
Asinh 262.1n ± 0% 258.5n ± 0% -1.37% (p=0.000 n=10)
Atan 99.48n ± 0% 99.49n ± 0% ~ (p=0.836 n=10)
Atanh 244.9n ± 0% 243.8n ± 0% -0.43% (p=0.002 n=10)
Atan2 158.2n ± 1% 153.3n ± 0% -3.10% (p=0.000 n=10)
Cbrt 186.8n ± 0% 181.1n ± 0% -3.03% (p=0.000 n=10)
Ceil 36.71n ± 1% 36.71n ± 0% ~ (p=0.434 n=10)
Copysign 6.531n ± 1% 6.526n ± 0% ~ (p=0.268 n=10)
Cos 98.19n ± 0% 95.40n ± 0% -2.84% (p=0.000 n=10)
Cosh 233.1n ± 0% 222.6n ± 0% -4.50% (p=0.000 n=10)
Erf 122.5n ± 0% 114.2n ± 0% -6.78% (p=0.000 n=10)
Erfc 126.0n ± 1% 116.6n ± 0% -7.46% (p=0.000 n=10)
Erfinv 138.8n ± 0% 138.6n ± 0% ~ (p=0.082 n=10)
Erfcinv 140.0n ± 0% 139.7n ± 0% ~ (p=0.359 n=10)
Exp 193.3n ± 0% 184.2n ± 0% -4.68% (p=0.000 n=10)
ExpGo 204.8n ± 0% 194.5n ± 0% -5.03% (p=0.000 n=10)
Expm1 152.5n ± 1% 145.0n ± 0% -4.92% (p=0.000 n=10)
Exp2 174.5n ± 0% 164.2n ± 0% -5.85% (p=0.000 n=10)
Exp2Go 184.4n ± 1% 175.4n ± 0% -4.88% (p=0.000 n=10)
Abs 4.912n ± 0% 4.914n ± 0% ~ (p=0.283 n=10)
Dim 15.50n ± 1% 15.52n ± 1% ~ (p=0.331 n=10)
Floor 36.89n ± 1% 36.76n ± 1% ~ (p=0.325 n=10)
Max 31.05n ± 1% 31.17n ± 1% ~ (p=0.628 n=10)
Min 31.01n ± 0% 31.06n ± 0% ~ (p=0.767 n=10)
Mod 294.1n ± 0% 245.6n ± 0% -16.52% (p=0.000 n=10)
Frexp 44.86n ± 1% 35.20n ± 0% -21.53% (p=0.000 n=10)
Gamma 195.8n ± 0% 185.4n ± 1% -5.29% (p=0.000 n=10)
Hypot 84.91n ± 0% 84.54n ± 1% -0.43% (p=0.006 n=10)
HypotGo 96.70n ± 0% 95.42n ± 1% -1.32% (p=0.000 n=10)
Ilogb 45.03n ± 0% 35.07n ± 1% -22.10% (p=0.000 n=10)
J0 634.5n ± 0% 627.2n ± 0% -1.16% (p=0.000 n=10)
J1 644.5n ± 0% 636.9n ± 0% -1.18% (p=0.000 n=10)
Jn 1.357µ ± 0% 1.344µ ± 0% -0.92% (p=0.000 n=10)
Ldexp 49.89n ± 0% 39.96n ± 0% -19.90% (p=0.000 n=10)
Lgamma 186.6n ± 0% 184.3n ± 0% -1.21% (p=0.000 n=10)
Log 150.4n ± 0% 141.1n ± 0% -6.15% (p=0.000 n=10)
Logb 46.70n ± 0% 35.89n ± 0% -23.15% (p=0.000 n=10)
Log1p 164.1n ± 0% 163.9n ± 0% ~ (p=0.122 n=10)
Log10 153.1n ± 0% 143.5n ± 0% -6.24% (p=0.000 n=10)
Log2 58.83n ± 0% 49.75n ± 0% -15.43% (p=0.000 n=10)
Modf 40.82n ± 1% 40.78n ± 0% ~ (p=0.239 n=10)
Nextafter32 49.15n ± 0% 48.93n ± 0% -0.44% (p=0.011 n=10)
Nextafter64 43.33n ± 0% 43.23n ± 0% ~ (p=0.228 n=10)
PowInt 269.4n ± 0% 243.8n ± 0% -9.49% (p=0.000 n=10)
PowFrac 618.0n ± 0% 571.7n ± 0% -7.48% (p=0.000 n=10)
Pow10Pos 13.09n ± 0% 13.05n ± 0% -0.31% (p=0.003 n=10)
Pow10Neg 30.99n ± 1% 30.99n ± 0% ~ (p=0.173 n=10)
Round 23.73n ± 0% 23.65n ± 0% -0.36% (p=0.011 n=10)
RoundToEven 27.87n ± 0% 27.73n ± 0% -0.48% (p=0.003 n=10)
Remainder 282.1n ± 0% 249.6n ± 0% -11.52% (p=0.000 n=10)
Signbit 11.46n ± 0% 11.42n ± 0% -0.39% (p=0.003 n=10)
Sin 115.2n ± 0% 113.2n ± 0% -1.74% (p=0.000 n=10)
Sincos 140.6n ± 0% 138.6n ± 0% -1.39% (p=0.000 n=10)
Sinh 252.0n ± 0% 241.4n ± 0% -4.21% (p=0.000 n=10)
SqrtIndirect 4.909n ± 0% 4.893n ± 0% -0.34% (p=0.021 n=10)
SqrtLatency 19.57n ± 1% 19.57n ± 0% ~ (p=0.087 n=10)
SqrtIndirectLatency 19.64n ± 0% 19.57n ± 0% -0.36% (p=0.025 n=10)
SqrtGoLatency 198.1n ± 0% 197.4n ± 0% -0.35% (p=0.014 n=10)
SqrtPrime 5.733µ ± 0% 5.725µ ± 0% ~ (p=0.116 n=10)
Tan 149.1n ± 0% 146.8n ± 0% -1.54% (p=0.000 n=10)
Tanh 248.2n ± 1% 238.1n ± 0% -4.05% (p=0.000 n=10)
Trunc 36.86n ± 0% 36.70n ± 0% -0.43% (p=0.029 n=10)
Y0 638.2n ± 0% 633.6n ± 0% -0.71% (p=0.000 n=10)
Y1 641.8n ± 0% 636.1n ± 0% -0.87% (p=0.000 n=10)
Yn 1.358µ ± 0% 1.345µ ± 0% -0.92% (p=0.000 n=10)
Float64bits 5.721n ± 0% 5.709n ± 0% -0.22% (p=0.044 n=10)
Float64frombits 4.905n ± 0% 4.893n ± 0% ~ (p=0.266 n=10)
Float32bits 12.27n ± 0% 12.23n ± 0% ~ (p=0.122 n=10)
Float32frombits 4.909n ± 0% 4.893n ± 0% -0.32% (p=0.024 n=10)
FMA 6.556n ± 0% 6.526n ± 0% ~ (p=0.283 n=10)
geomean 86.82n 83.75n -3.54%
Change-Id: I522297a79646d76543d516accce291f5a3cea337
Reviewed-on: https://go-review.googlesource.com/c/go/+/717560
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
|
|
Separate patterns in asmcheck by spaces instead of commas.
Many patterns end in comma (like "MOV [$]123,") so separating
patterns by comma is not great; they're already quoted, so spaces are fine.
Also replace all tabs in the assembly lines with spaces before matching.
Finally, replace \$ or \\$ with [$] as the matching idiom.
The effect of all these is to make the patterns look like:
// amd64:"BSFQ" "ORQ [$]256"
instead of the old:
// amd64:"BSFQ","ORQ\t\\$256"
Update all tests as well.
Change-Id: Ia39febe5d7f67ba115846422789e11b185d5c807
Reviewed-on: https://go-review.googlesource.com/c/go/+/716060
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
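The syntax change described in the commit above can be sketched as a string rewrite. This is a hypothetical converter written for illustration, not the tool used for the CL; it handles only the three substitutions the message names (comma separators, tab escapes, escaped dollar signs) and would misfire on a pattern that itself contains `","`.

```go
package main

import (
	"fmt"
	"strings"
)

// oldToNew is a hypothetical helper, not part of asmcheck. Earlier pairs
// take priority when several could match at the same position.
var oldToNew = strings.NewReplacer(
	`","`, `" "`, // comma-separated patterns become space-separated
	`\t`, ` `, // an escaped tab becomes a plain space
	`\\$`, `[$]`, // escaped dollar sign in a double-quoted pattern
	`\$`, `[$]`, // escaped dollar sign in a backtick-quoted pattern
)

// convertPattern rewrites an old-style check line into the new style.
func convertPattern(old string) string {
	return oldToNew.Replace(old)
}

func main() {
	fmt.Println(convertPattern(`"BSFQ","ORQ\t\\$256"`))
}
```

Applied to the old example from the commit message, this yields the new form quoted there.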
|
|
Const64F originally used AUIPC + LD + FMVDX to load a
float64 constant; use AUIPC + FLD instead, the same as Const32F.
Change-Id: I8ca0a0e90d820a26e69b74cd25df3cc662132bf7
Reviewed-on: https://go-review.googlesource.com/c/go/+/703215
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
While here, reorder Float32ConstantStore/Float64ConstantStore for
consistency.
Change-Id: Ic1b3e9f9474965d15bc94518d78d1a4a7bda93f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/703756
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Change-Id: I29ccd105c5418955146a3f4873162963da489a70
Reviewed-on: https://go-review.googlesource.com/c/go/+/697935
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
Add the VECTOR FP (MINIMUM|MAXIMUM) instructions to the assembler and
use them in the compiler to implement min and max.
Note: I've allowed floating point registers to be used with the single
element instructions (those with the W instead of V prefix) to allow
easier integration into the compiler.
Change-Id: I5f80a510bd248cf483cce95f1979bf63fbae7de6
Reviewed-on: https://go-review.googlesource.com/c/go/+/684715
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <mark@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
The compiler previously avoided the use of MOVUPS on plan9/amd64. This
was changed in CL 655875, however the codegen tests were not updated
and now fail (seemingly the full codegen tests do not run anywhere,
not even on the longtest builders).
Change-Id: I388b60e7b0911048d4949c5029347f9801c018a9
Reviewed-on: https://go-review.googlesource.com/c/go/+/656997
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Auto-Submit: Keith Randall <khr@google.com>
|
|
There is a generic opcode for FMA, but we don't use it in rewrite rules.
This may be because some architectures, such as WASM and MIPS, don't
have a late lowering rule for it.
Fixes #71204
Intel Alder Lake 12600k (GOAMD64=v3):
math:
name old time/op new time/op delta
Acos-16 4.58ns ± 0% 3.36ns ± 0% -26.68% (p=0.008 n=5+5)
Acosh-16 8.04ns ± 1% 6.44ns ± 0% -19.95% (p=0.008 n=5+5)
Asin-16 4.28ns ± 0% 3.32ns ± 0% -22.24% (p=0.008 n=5+5)
Asinh-16 9.92ns ± 0% 8.62ns ± 0% -13.13% (p=0.008 n=5+5)
Atan-16 2.31ns ± 0% 1.84ns ± 0% -20.02% (p=0.008 n=5+5)
Atanh-16 7.79ns ± 0% 7.03ns ± 0% -9.67% (p=0.008 n=5+5)
Atan2-16 3.93ns ± 0% 3.52ns ± 0% -10.35% (p=0.000 n=5+4)
Cbrt-16 4.62ns ± 0% 4.41ns ± 0% -4.57% (p=0.016 n=4+5)
Ceil-16 0.14ns ± 1% 0.14ns ± 2% ~ (p=0.103 n=5+5)
Copysign-16 0.33ns ± 0% 0.33ns ± 0% +0.03% (p=0.029 n=4+4)
Cos-16 4.87ns ± 0% 4.75ns ± 0% -2.44% (p=0.016 n=5+4)
Cosh-16 4.86ns ± 0% 4.86ns ± 0% ~ (p=0.317 n=5+5)
Erf-16 2.71ns ± 0% 2.25ns ± 0% -16.69% (p=0.008 n=5+5)
Erfc-16 3.06ns ± 0% 2.67ns ± 0% -13.00% (p=0.016 n=5+4)
Erfinv-16 3.88ns ± 0% 2.84ns ± 3% -26.83% (p=0.008 n=5+5)
Erfcinv-16 4.08ns ± 0% 3.01ns ± 1% -26.27% (p=0.008 n=5+5)
Exp-16 3.29ns ± 0% 3.37ns ± 2% +2.64% (p=0.016 n=4+5)
ExpGo-16 8.44ns ± 0% 7.48ns ± 1% -11.37% (p=0.008 n=5+5)
Expm1-16 4.46ns ± 0% 3.69ns ± 2% -17.26% (p=0.016 n=4+5)
Exp2-16 8.20ns ± 0% 7.39ns ± 2% -9.94% (p=0.008 n=5+5)
Exp2Go-16 8.26ns ± 0% 7.23ns ± 0% -12.49% (p=0.016 n=4+5)
Abs-16 0.26ns ± 3% 0.22ns ± 1% -16.34% (p=0.008 n=5+5)
Dim-16 0.38ns ± 1% 0.40ns ± 2% +5.02% (p=0.008 n=5+5)
Floor-16 0.11ns ± 1% 0.17ns ± 4% +54.99% (p=0.008 n=5+5)
Max-16 1.24ns ± 0% 1.24ns ± 0% ~ (p=0.619 n=5+5)
Min-16 1.24ns ± 0% 1.24ns ± 0% ~ (p=0.484 n=5+5)
Mod-16 13.4ns ± 1% 12.8ns ± 0% -4.21% (p=0.016 n=5+4)
Frexp-16 1.70ns ± 0% 1.71ns ± 0% +0.46% (p=0.008 n=5+5)
Gamma-16 3.97ns ± 0% 3.97ns ± 0% ~ (p=0.643 n=5+5)
Hypot-16 2.11ns ± 0% 2.11ns ± 0% ~ (p=0.762 n=5+5)
HypotGo-16 2.48ns ± 4% 2.26ns ± 0% -8.94% (p=0.008 n=5+5)
Ilogb-16 1.67ns ± 0% 1.67ns ± 0% -0.07% (p=0.048 n=5+5)
J0-16 19.8ns ± 0% 19.3ns ± 0% ~ (p=0.079 n=4+5)
J1-16 19.4ns ± 0% 18.9ns ± 0% -2.63% (p=0.000 n=5+4)
Jn-16 41.5ns ± 0% 40.6ns ± 0% -2.32% (p=0.016 n=4+5)
Ldexp-16 2.26ns ± 0% 2.26ns ± 0% ~ (p=0.683 n=5+5)
Lgamma-16 4.40ns ± 0% 4.21ns ± 0% -4.21% (p=0.008 n=5+5)
Log-16 4.05ns ± 0% 4.05ns ± 0% ~ (all equal)
Logb-16 1.69ns ± 0% 1.69ns ± 0% ~ (p=0.429 n=5+5)
Log1p-16 5.00ns ± 0% 3.99ns ± 0% -20.14% (p=0.008 n=5+5)
Log10-16 4.22ns ± 0% 4.21ns ± 0% -0.15% (p=0.008 n=5+5)
Log2-16 2.27ns ± 0% 2.25ns ± 0% -0.94% (p=0.008 n=5+5)
Modf-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.492 n=5+5)
Nextafter32-16 2.09ns ± 0% 2.09ns ± 0% ~ (p=0.079 n=4+5)
Nextafter64-16 2.09ns ± 0% 2.09ns ± 0% ~ (p=0.095 n=4+5)
PowInt-16 10.8ns ± 0% 10.8ns ± 0% ~ (all equal)
PowFrac-16 25.3ns ± 0% 25.3ns ± 0% -0.09% (p=0.000 n=5+4)
Pow10Pos-16 0.52ns ± 1% 0.52ns ± 0% ~ (p=0.810 n=5+5)
Pow10Neg-16 0.82ns ± 0% 0.82ns ± 0% ~ (p=0.381 n=5+5)
Round-16 0.93ns ± 0% 0.93ns ± 0% ~ (p=0.056 n=5+5)
RoundToEven-16 1.64ns ± 0% 1.64ns ± 0% ~ (all equal)
Remainder-16 12.4ns ± 2% 12.0ns ± 0% -3.27% (p=0.008 n=5+5)
Signbit-16 0.37ns ± 0% 0.37ns ± 0% -0.19% (p=0.008 n=5+5)
Sin-16 4.04ns ± 0% 3.92ns ± 0% -3.13% (p=0.000 n=4+5)
Sincos-16 5.99ns ± 0% 5.80ns ± 0% -3.03% (p=0.008 n=5+5)
Sinh-16 5.22ns ± 0% 5.22ns ± 0% ~ (p=0.651 n=5+4)
SqrtIndirect-16 0.41ns ± 0% 0.41ns ± 0% ~ (p=0.333 n=4+5)
SqrtLatency-16 2.66ns ± 0% 2.66ns ± 0% ~ (p=0.079 n=4+5)
SqrtIndirectLatency-16 2.66ns ± 0% 2.66ns ± 0% ~ (p=1.000 n=5+5)
SqrtGoLatency-16 30.1ns ± 0% 28.6ns ± 1% -4.84% (p=0.008 n=5+5)
SqrtPrime-16 645ns ± 0% 645ns ± 0% ~ (p=0.095 n=5+4)
Tan-16 4.21ns ± 0% 4.09ns ± 0% -2.76% (p=0.029 n=4+4)
Tanh-16 5.36ns ± 0% 5.36ns ± 0% ~ (p=0.444 n=5+5)
Trunc-16 0.12ns ± 6% 0.11ns ± 1% -6.79% (p=0.008 n=5+5)
Y0-16 19.2ns ± 0% 18.7ns ± 0% -2.52% (p=0.000 n=5+4)
Y1-16 19.1ns ± 0% 18.4ns ± 0% ~ (p=0.079 n=4+5)
Yn-16 40.7ns ± 0% 39.5ns ± 0% -2.82% (p=0.008 n=5+5)
Float64bits-16 0.21ns ± 0% 0.21ns ± 0% ~ (p=0.603 n=5+5)
Float64frombits-16 0.21ns ± 0% 0.21ns ± 0% ~ (p=0.984 n=4+5)
Float32bits-16 0.21ns ± 0% 0.21ns ± 0% ~ (p=0.778 n=4+5)
Float32frombits-16 0.21ns ± 0% 0.20ns ± 0% ~ (p=0.397 n=5+5)
FMA-16 0.82ns ± 0% 0.82ns ± 0% +0.02% (p=0.029 n=4+4)
[Geo mean] 2.87ns 2.74ns -4.61%
math/cmplx:
name old time/op new time/op delta
Abs-16 2.07ns ± 0% 2.05ns ± 0% -0.70% (p=0.016 n=5+4)
Acos-16 36.5ns ± 0% 35.7ns ± 0% -2.33% (p=0.029 n=4+4)
Acosh-16 37.0ns ± 0% 36.2ns ± 0% -2.20% (p=0.008 n=5+5)
Asin-16 36.5ns ± 0% 35.7ns ± 0% -2.29% (p=0.008 n=5+5)
Asinh-16 33.5ns ± 0% 31.6ns ± 0% -5.51% (p=0.008 n=5+5)
Atan-16 15.5ns ± 0% 13.9ns ± 0% -10.61% (p=0.008 n=5+5)
Atanh-16 15.0ns ± 0% 13.6ns ± 0% -9.73% (p=0.008 n=5+5)
Conj-16 0.11ns ± 5% 0.11ns ± 1% ~ (p=0.421 n=5+5)
Cos-16 12.3ns ± 0% 12.2ns ± 0% -0.60% (p=0.000 n=4+5)
Cosh-16 12.1ns ± 0% 12.0ns ± 0% ~ (p=0.079 n=4+5)
Exp-16 10.0ns ± 0% 9.8ns ± 0% -1.77% (p=0.008 n=5+5)
Log-16 14.5ns ± 0% 13.7ns ± 0% -5.67% (p=0.008 n=5+5)
Log10-16 14.5ns ± 0% 13.7ns ± 0% -5.55% (p=0.000 n=5+4)
Phase-16 5.11ns ± 0% 4.25ns ± 0% -16.90% (p=0.008 n=5+5)
Polar-16 7.12ns ± 0% 6.35ns ± 0% -10.90% (p=0.008 n=5+5)
Pow-16 64.3ns ± 0% 63.7ns ± 0% -0.97% (p=0.008 n=5+5)
Rect-16 5.74ns ± 0% 5.58ns ± 0% -2.73% (p=0.016 n=4+5)
Sin-16 12.2ns ± 0% 12.2ns ± 0% -0.54% (p=0.000 n=4+5)
Sinh-16 12.1ns ± 0% 12.0ns ± 0% -0.58% (p=0.000 n=5+4)
Sqrt-16 5.30ns ± 0% 5.18ns ± 0% -2.36% (p=0.008 n=5+5)
Tan-16 22.7ns ± 0% 22.6ns ± 0% -0.33% (p=0.008 n=5+5)
Tanh-16 21.2ns ± 0% 20.9ns ± 0% -1.32% (p=0.008 n=5+5)
[Geo mean] 11.3ns 10.8ns -3.97%
Change-Id: Idcc4b357ba68477929c126289e5095b27a827b1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/646335
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
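For readers unfamiliar with FMA, a small sketch of what fusing changes numerically: x*y+z is rounded once instead of twice. The function name demo and the chosen inputs are illustrative only and are not taken from the CL. The explicit float64 conversion is there because the Go spec lets the compiler fuse x*y+z unless the product is explicitly rounded.

```go
package main

import (
	"fmt"
	"math"
)

// demo returns x*y - 1 computed with two roundings and with one rounding.
func demo() (separate, fused float64) {
	x := 1 + math.Ldexp(1, -30) // 1 + 2^-30
	y := 1 - math.Ldexp(1, -30) // 1 - 2^-30

	// The exact product is 1 - 2^-60, which rounds to 1 in float64.
	// The explicit conversion forces that rounding and prevents fusion.
	separate = float64(x*y) - 1

	// math.FMA rounds only once, after the addition, so the 2^-60 term survives.
	fused = math.FMA(x, y, -1)
	return separate, fused
}

func main() {
	s, f := demo()
	fmt.Println("separate rounding:", s)
	fmt.Println("fused:            ", f)
}
```

Because the two results can differ, a compiler may only fuse where the language permits it, which is why the explicit conversion matters in the sketch.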
|
|
Benchmark results on Loongson 3A5000 and 3A6000:
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
FMA 25.930n ± 0% 2.002n ± 0% -92.28% (p=0.000 n=10)
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
FMA 32.840n ± 0% 2.002n ± 0% -93.90% (p=0.000 n=10)
Updates #59120
This patch is a copy of CL 483355.
Co-authored-by: WANG Xuerui <git@xen0n.name>
Change-Id: I88b89d23f00864f9173a182a47ee135afec7ed6e
Reviewed-on: https://go-review.googlesource.com/c/go/+/625335
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
goos: linux
goarch: loong64
pkg: test/bench/go1
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
BinaryTree17 7.766 ± 1% 7.640 ± 2% -1.62% (p=0.000 n=20)
Fannkuch11 2.649 ± 0% 2.358 ± 0% -10.96% (p=0.000 n=20)
FmtFprintfEmpty 35.89n ± 0% 35.87n ± 0% -0.06% (p=0.000 n=20)
FmtFprintfString 59.44n ± 0% 57.25n ± 2% -3.68% (p=0.000 n=20)
FmtFprintfInt 62.07n ± 0% 60.04n ± 0% -3.27% (p=0.000 n=20)
FmtFprintfIntInt 97.90n ± 0% 97.26n ± 0% -0.65% (p=0.000 n=20)
FmtFprintfPrefixedInt 116.7n ± 0% 119.2n ± 0% +2.14% (p=0.000 n=20)
FmtFprintfFloat 204.5n ± 0% 201.9n ± 0% -1.30% (p=0.000 n=20)
FmtManyArgs 455.9n ± 0% 466.8n ± 0% +2.39% (p=0.000 n=20)
GobDecode 7.458m ± 1% 7.138m ± 1% -4.28% (p=0.000 n=20)
GobEncode 8.573m ± 1% 8.473m ± 1% ~ (p=0.091 n=20)
Gzip 280.2m ± 0% 284.9m ± 0% +1.67% (p=0.000 n=20)
Gunzip 32.68m ± 0% 32.67m ± 0% ~ (p=0.211 n=20)
HTTPClientServer 54.22µ ± 0% 53.24µ ± 0% -1.80% (p=0.000 n=20)
JSONEncode 9.427m ± 1% 9.152m ± 0% -2.92% (p=0.000 n=20)
JSONDecode 47.08m ± 1% 46.85m ± 1% -0.49% (p=0.007 n=20)
Mandelbrot200 4.601m ± 0% 4.605m ± 0% +0.08% (p=0.000 n=20)
GoParse 4.776m ± 0% 4.655m ± 1% -2.52% (p=0.000 n=20)
RegexpMatchEasy0_32 59.77n ± 0% 57.59n ± 0% -3.66% (p=0.000 n=20)
RegexpMatchEasy0_1K 458.1n ± 0% 458.8n ± 0% +0.15% (p=0.000 n=20)
RegexpMatchEasy1_32 59.36n ± 0% 59.24n ± 0% -0.20% (p=0.000 n=20)
RegexpMatchEasy1_1K 557.7n ± 0% 560.2n ± 0% +0.46% (p=0.000 n=20)
RegexpMatchMedium_32 803.1n ± 0% 772.8n ± 0% -3.77% (p=0.000 n=20)
RegexpMatchMedium_1K 27.29µ ± 0% 25.88µ ± 0% -5.18% (p=0.000 n=20)
RegexpMatchHard_32 1.385µ ± 0% 1.304µ ± 0% -5.85% (p=0.000 n=20)
RegexpMatchHard_1K 40.92µ ± 0% 39.58µ ± 0% -3.27% (p=0.000 n=20)
Revcomp 474.3m ± 0% 410.0m ± 0% -13.56% (p=0.000 n=20)
Template 78.16m ± 0% 76.32m ± 1% -2.36% (p=0.000 n=20)
TimeParse 271.8n ± 0% 272.1n ± 0% +0.11% (p=0.000 n=20)
TimeFormat 292.3n ± 0% 294.8n ± 0% +0.86% (p=0.000 n=20)
geomean 51.98µ 50.82µ -2.22%
Change-Id: Ia78f1ddee8f1d9ec7192a4b8d2a4ec6058679956
Reviewed-on: https://go-review.googlesource.com/c/go/+/615918
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
|
|
Make math.{Min,Max} intrinsics and implement math.{archMax,archMin}
in hardware.
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
│ old.bench │ new.bench │
│ sec/op │ sec/op vs base │
Max 7.606n ± 0% 3.087n ± 0% -59.41% (p=0.000 n=20)
Min 7.205n ± 0% 2.904n ± 0% -59.69% (p=0.000 n=20)
MinFloat 37.220n ± 0% 4.802n ± 0% -87.10% (p=0.000 n=20)
MaxFloat 33.620n ± 0% 4.802n ± 0% -85.72% (p=0.000 n=20)
geomean 16.18n 3.792n -76.57%
goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A5000 @ 2500.00MHz
│ old.bench │ new.bench │
│ sec/op │ sec/op vs base │
Max 10.010n ± 0% 7.196n ± 0% -28.11% (p=0.000 n=20)
Min 8.806n ± 0% 7.155n ± 0% -18.75% (p=0.000 n=20)
MinFloat 60.010n ± 0% 7.976n ± 0% -86.71% (p=0.000 n=20)
MaxFloat 56.410n ± 0% 7.980n ± 0% -85.85% (p=0.000 n=20)
geomean 23.37n 7.566n -67.63%
Updates #59120.
Change-Id: I6815d20bc304af3cbf5d6ca8fe0ca1c2ddebea2d
Reviewed-on: https://go-review.googlesource.com/c/go/+/580283
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
x86 is better at storing constant ints than constant floats.
(It uses a constant directly in the instruction stream, instead of
loading it from a constant global memory.)
Noticed as part of #67957
Change-Id: I9b7b586ad8e0fe9ce245324f020e9526f82b209d
Reviewed-on: https://go-review.googlesource.com/c/go/+/592596
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
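A rough illustration of the idea in the commit above: storing a float constant writes the same bytes as storing its IEEE 754 bit pattern as an integer. The function names are hypothetical, and the sketch says nothing about which instructions the compiler actually emits.

```go
package main

import (
	"fmt"
	"math"
)

// storeFloat writes a floating-point constant through p.
func storeFloat(p *float64) { *p = 2.0 }

// storeBits writes the bit pattern of 2.0 as an integer constant.
func storeBits(p *uint64) { *p = 0x4000000000000000 }

// sameBytes reports whether both stores leave identical bits in memory.
func sameBytes() bool {
	var f float64
	var u uint64
	storeFloat(&f)
	storeBits(&u)
	return math.Float64bits(f) == u
}

func main() {
	fmt.Println(sameBytes())
}
```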
|
|
Any normal float32 constant can be generated by this instruction;
use xxspltidp when possible. This prefixed instruction is much
faster than the two instruction load sequence from the
float32/float64 constant pool.
Change-Id: Id751d9ffdae71463adbde66427b986f0b2ef74c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/575555
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Paul Murphy <murp@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This enables efficient use of the builtin min/max function
for float64 and float32 types on GOPPC64 >= power9.
Extend the assembler to support xsminjdp/xsmaxjdp and use
them to implement float min/max.
Simplify the VSX xx3 opcode rules to allow FPR arguments
if all arguments are FPRs.
Change-Id: I15882a4ce5dc46eba71d683cf1d184dc4236a328
Reviewed-on: https://go-review.googlesource.com/c/go/+/574535
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
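Any hardware lowering of the builtin min/max has to keep the language-level semantics for NaN and signed zeros. The sketch below (Go 1.21 or later; helper names are made up) exercises those cases. It does not show or depend on the ppc64 instructions named in the commit.

```go
package main

import (
	"fmt"
	"math"
)

// nanPropagates reports whether a NaN argument makes the result NaN.
func nanPropagates() bool {
	return math.IsNaN(min(1, math.NaN())) && math.IsNaN(max(math.NaN(), 1))
}

// signedZeroOrder reports whether -0 is treated as smaller than +0.
func signedZeroOrder() bool {
	negZero := math.Copysign(0, -1)
	return math.Signbit(min(0, negZero)) && !math.Signbit(max(0, negZero))
}

func main() {
	fmt.Println(min(1.5, 2.5), max(1.5, 2.5))
	fmt.Println("NaN propagates:", nanPropagates())
	fmt.Println("-0 < +0:", signedZeroOrder())
}
```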
|
|
CL 514596 and CL 514775 added hardware implementations of float
max/min, so add codegen tests for these two CLs.
Change-Id: I347331032fe9f67a2e6fdb5d3cfe20203296b81c
Reviewed-on: https://go-review.googlesource.com/c/go/+/561295
Reviewed-by: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: M Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
This CL adds FMADDS, FMSUBS, FNMADDS, and FNMSUBS SSA support for riscv.
Change-Id: I1e7dd322b46b9e0f4923dbba256303d69ed12066
Reviewed-on: https://go-review.googlesource.com/c/go/+/506616
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: M Zhuo <mzh@golangcn.org>
|
|
FMADD/FMSUB/FNSUB are efficient FP FMA instructions, which the
compiler can use to improve FP performance.
Erf 188.0n ± 2% 139.5n ± 2% -25.82% (p=0.000 n=10)
Erfc 193.6n ± 1% 143.2n ± 1% -26.01% (p=0.000 n=10)
Erfinv 244.4n ± 2% 172.6n ± 0% -29.40% (p=0.000 n=10)
Erfcinv 244.7n ± 2% 173.0n ± 1% -29.31% (p=0.000 n=10)
geomean 216.0n 156.3n -27.65%
Ref: The RISC-V Instruction Set Manual Volume I: Unprivileged ISA
11.6 Single-Precision Floating-Point Computational Instructions
Change-Id: I89aa3a4df7576fdd47f4a6ee608ac16feafd093c
Reviewed-on: https://go-review.googlesource.com/c/go/+/506036
Reviewed-by: Joel Sing <joel@sing.id.au>
Run-TryBot: M Zhuo <mzh@golangcn.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Change-Id: Ice0bb7a665599b334e927a1b00d1a5b400c15e3d
Reviewed-on: https://go-review.googlesource.com/c/go/+/506035
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
|
|
Use a small Python script to consolidate duplicate
ppc64/ppc64le tests into a single ppc64x codegen test.
This makes the small assumption that any time two tests
for different arch/variant combos exist, those tests
can be combined into a single ppc64x test.
E.g.:
// ppc64le: foo
// ppc64le/power9: foo
into
// ppc64x: foo
or
// ppc64: foo
// ppc64le: foo
into
// ppc64x: foo
import glob
import re
files = glob.glob("codegen/*.go")
for file in files:
    with open(file) as f:
        text = [l for l in f]
    i = 0
    while i < len(text):
        first = re.match(r"\s*// ?ppc64(le)?(/power[89])?:(.*)", text[i])
        if first:
            j = i+1
            while j < len(text):
                second = re.match(r"\s*// ?ppc64(le)?(/power[89])?:(.*)", text[j])
                if not second:
                    break
                # Merge when the checks match and the first line has no
                # variant or the same variant as the second.
                if (not first.group(2) or first.group(2) == second.group(2)) and first.group(3) == second.group(3):
                    text[i] = re.sub(" ?ppc64(le|x)?"," ppc64x",text[i])
                    text=text[:j] + (text[j+1:])
                else:
                    j += 1
        i+=1
    with open(file, 'w') as f:
        f.write("".join(text))
Change-Id: Ic6b009b54eacaadc5a23db9c5a3bf7331b595821
Reviewed-on: https://go-review.googlesource.com/c/go/+/463220
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Optimize load and store to []float64 and []float32.
Previously it used LSL instead of shifted register indexed load/store.
Before:
LSL $3, R0, R0
FMOVD F0, (R1)(R0)
After:
FMOVD F0, (R1)(R0<<3)
Fixes #42798
Change-Id: I0c0912140c3dce5aa6abc27097c0eb93833cc589
Reviewed-on: https://go-review.googlesource.com/c/go/+/273706
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Giovanni Bajo <rasky@develer.com>
|
|
Backstop support for non-sse2 chips now that 387 is gone.
RELNOTE=yes
Change-Id: Ib10e69c4a3654c15a03568f93393437e1939e013
Reviewed-on: https://go-review.googlesource.com/c/go/+/260017
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
My last 387 CL. So sad ... ... ... ... not!
Fixes #40255
Change-Id: I8d4ddb744b234b8adc735db2f7c3c7b6d8bbdfa4
Reviewed-on: https://go-review.googlesource.com/c/go/+/258957
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
On s390x, some floating point arithmetic instructions (FSUB, FADD) generate a flag.
This patch allows the related SSA ops to return a tuple whose second element is
the generated flag. We can use the flag and remove the
subsequent comparison instruction (e.g. LTDBR).
This CL also reduces the .text section for math.test binary by 0.4KB.
Benchmarks:
name old time/op new time/op delta
Acos-18 12.1ns ± 0% 12.1ns ± 0% ~ (all equal)
Acosh-18 18.5ns ± 0% 18.5ns ± 0% ~ (all equal)
Asin-18 13.1ns ± 0% 13.1ns ± 0% ~ (all equal)
Asinh-18 19.4ns ± 0% 19.5ns ± 1% ~ (p=0.444 n=5+5)
Atan-18 10.0ns ± 0% 10.0ns ± 0% ~ (all equal)
Atanh-18 19.1ns ± 1% 19.2ns ± 2% ~ (p=0.841 n=5+5)
Atan2-18 16.4ns ± 0% 16.4ns ± 0% ~ (all equal)
Cbrt-18 14.8ns ± 0% 14.8ns ± 0% ~ (all equal)
Ceil-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal)
Copysign-18 0.80ns ± 0% 0.80ns ± 0% ~ (all equal)
Cos-18 7.19ns ± 0% 7.19ns ± 0% ~ (p=0.556 n=4+5)
Cosh-18 12.4ns ± 0% 12.4ns ± 0% ~ (all equal)
Erf-18 10.8ns ± 0% 10.8ns ± 0% ~ (all equal)
Erfc-18 11.0ns ± 0% 11.0ns ± 0% ~ (all equal)
Erfinv-18 23.0ns ±16% 26.8ns ± 1% +16.90% (p=0.008 n=5+5)
Erfcinv-18 23.3ns ±15% 26.1ns ± 7% ~ (p=0.087 n=5+5)
Exp-18 8.67ns ± 0% 8.67ns ± 0% ~ (p=1.000 n=4+4)
ExpGo-18 50.8ns ± 3% 52.4ns ± 2% ~ (p=0.063 n=5+5)
Expm1-18 9.49ns ± 1% 9.47ns ± 0% ~ (p=1.000 n=5+5)
Exp2-18 52.7ns ± 1% 50.5ns ± 3% -4.10% (p=0.024 n=5+5)
Exp2Go-18 50.6ns ± 1% 48.4ns ± 3% -4.39% (p=0.008 n=5+5)
Abs-18 0.67ns ± 0% 0.67ns ± 0% ~ (p=0.444 n=5+5)
Dim-18 1.02ns ± 0% 1.03ns ± 0% +0.98% (p=0.008 n=5+5)
Floor-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal)
Max-18 3.09ns ± 1% 3.05ns ± 0% -1.42% (p=0.008 n=5+5)
Min-18 3.32ns ± 1% 3.30ns ± 0% -0.72% (p=0.016 n=5+4)
Mod-18 62.3ns ± 1% 65.8ns ± 3% +5.55% (p=0.008 n=5+5)
Frexp-18 5.05ns ± 2% 4.98ns ± 0% ~ (p=0.683 n=5+5)
Gamma-18 24.4ns ± 0% 24.1ns ± 0% -1.23% (p=0.008 n=5+5)
Hypot-18 10.3ns ± 0% 10.3ns ± 0% ~ (all equal)
HypotGo-18 10.2ns ± 0% 10.2ns ± 0% ~ (all equal)
Ilogb-18 3.56ns ± 1% 3.54ns ± 0% ~ (p=0.595 n=5+5)
J0-18 113ns ± 0% 108ns ± 1% -4.42% (p=0.016 n=4+5)
J1-18 115ns ± 0% 109ns ± 1% -4.87% (p=0.016 n=4+5)
Jn-18 240ns ± 0% 230ns ± 2% -4.41% (p=0.008 n=5+5)
Ldexp-18 6.19ns ± 0% 6.19ns ± 0% ~ (p=0.444 n=5+5)
Lgamma-18 32.2ns ± 0% 32.2ns ± 0% ~ (all equal)
Log-18 13.1ns ± 0% 13.1ns ± 0% ~ (all equal)
Logb-18 4.23ns ± 0% 4.22ns ± 0% ~ (p=0.444 n=5+5)
Log1p-18 12.7ns ± 0% 12.7ns ± 0% ~ (all equal)
Log10-18 18.1ns ± 0% 18.2ns ± 0% ~ (p=0.167 n=5+5)
Log2-18 14.0ns ± 0% 14.0ns ± 0% ~ (all equal)
Modf-18 10.4ns ± 0% 10.5ns ± 0% +0.96% (p=0.016 n=4+5)
Nextafter32-18 11.3ns ± 0% 11.3ns ± 0% ~ (all equal)
Nextafter64-18 4.01ns ± 1% 3.97ns ± 0% ~ (p=0.333 n=5+4)
PowInt-18 32.7ns ± 0% 32.7ns ± 0% ~ (all equal)
PowFrac-18 33.2ns ± 0% 33.1ns ± 0% ~ (p=0.095 n=4+5)
Pow10Pos-18 1.58ns ± 0% 1.58ns ± 0% ~ (all equal)
Pow10Neg-18 5.81ns ± 0% 5.81ns ± 0% ~ (all equal)
Round-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal)
RoundToEven-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal)
Remainder-18 40.6ns ± 0% 40.7ns ± 0% ~ (p=0.238 n=5+4)
Signbit-18 1.57ns ± 0% 1.57ns ± 0% ~ (all equal)
Sin-18 6.75ns ± 0% 6.74ns ± 0% ~ (p=0.333 n=5+4)
Sincos-18 29.5ns ± 0% 29.5ns ± 0% ~ (all equal)
Sinh-18 14.4ns ± 0% 14.4ns ± 0% ~ (all equal)
SqrtIndirect-18 3.97ns ± 0% 4.15ns ± 0% +4.59% (p=0.008 n=5+5)
SqrtLatency-18 8.01ns ± 0% 8.01ns ± 0% ~ (all equal)
SqrtIndirectLatency-18 11.6ns ± 0% 11.6ns ± 0% ~ (all equal)
SqrtGoLatency-18 44.7ns ± 0% 45.0ns ± 0% +0.67% (p=0.008 n=5+5)
SqrtPrime-18 1.26µs ± 0% 1.27µs ± 0% +0.63% (p=0.029 n=4+4)
Tan-18 11.1ns ± 0% 11.1ns ± 0% ~ (all equal)
Tanh-18 15.8ns ± 0% 15.8ns ± 0% ~ (all equal)
Trunc-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal)
Y0-18 113ns ± 2% 108ns ± 3% -5.11% (p=0.008 n=5+5)
Y1-18 112ns ± 3% 107ns ± 0% -4.29% (p=0.000 n=5+4)
Yn-18 229ns ± 0% 220ns ± 1% -3.76% (p=0.016 n=4+5)
Float64bits-18 1.09ns ± 0% 1.09ns ± 0% ~ (all equal)
Float64frombits-18 0.55ns ± 0% 0.55ns ± 0% ~ (all equal)
Float32bits-18 0.96ns ±16% 0.86ns ± 0% ~ (p=0.563 n=5+5)
Float32frombits-18 1.03ns ±28% 0.84ns ± 0% ~ (p=0.167 n=5+5)
FMA-18 1.60ns ± 0% 1.60ns ± 0% ~ (all equal)
[Geo mean] 10.0ns 9.9ns -0.41%
Change-Id: Ief7e63ea5a8ba404b0a4696e12b9b7e0b05a9a03
Reviewed-on: https://go-review.googlesource.com/c/go/+/209160
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Extend CL 220417 (which removed the integer Greater and Geq ops) to
floating point comparisons. Greater and Geq can always be
implemented using Less and Leq.
Fixes #37316.
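The operand swap described above can be checked in plain Go. This is only a minimal sketch of the identity the commit relies on, not the compiler's rewrite rules; the helper names greater, greaterEq and agreesEverywhere are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// greater reports a > b using only the less-than operator,
// by swapping the operands.
func greater(a, b float64) bool { return b < a }

// greaterEq reports a >= b using only less-than-or-equal.
func greaterEq(a, b float64) bool { return b <= a }

// agreesEverywhere compares the swapped forms against the built-in
// operators over a small set of values, including infinities and NaN.
func agreesEverywhere() bool {
	vals := []float64{-1, 0, 1, math.Inf(-1), math.Inf(1), math.NaN()}
	for _, a := range vals {
		for _, b := range vals {
			if greater(a, b) != (a > b) || greaterEq(a, b) != (a >= b) {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println("swapped forms agree:", agreesEverywhere())
}
```

Note that the swap, unlike the negation !(a <= b), stays correct for NaN operands, because every ordered comparison involving NaN is false.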
Change-Id: Ieaddb4877dd0ff9037a1dd11d0a9a9e45ced71e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/222397
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
The load and test instructions compare the given value
against zero and will produce a condition code indicating
one of the following scenarios:
0: Result is zero
1: Result is less than zero
2: Result is greater than zero
3: Result is not a number (NaN)
These instructions can be used to simplify floating point comparisons
against zero, which can enable further optimizations.
This CL also reduces the size of the .text section of the math.test binary
by around 0.7 KB (in hexadecimal, from 1358f0 to 135620).
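The four condition codes listed above can be modeled in software. The following is a hedged sketch of that classification only; conditionCode is a hypothetical helper, not something the compiler or runtime provides, and it says nothing about how the instruction is selected.

```go
package main

import (
	"fmt"
	"math"
)

// conditionCode models the condition code listed in the commit message
// for a floating point value compared against zero.
func conditionCode(x float64) int {
	switch {
	case math.IsNaN(x):
		return 3 // not a number
	case x == 0:
		return 0 // zero (in this model both +0 and -0 compare equal to zero)
	case x < 0:
		return 1 // less than zero
	default:
		return 2 // greater than zero
	}
}

func main() {
	for _, x := range []float64{0, -2.5, 7, math.NaN()} {
		fmt.Printf("%v -> condition code %d\n", x, conditionCode(x))
	}
}
```

A single test against zero yields all four outcomes at once, which is why a separate compare instruction can become unnecessary.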
Change-Id: I33cb714f0c6feebac7a1c46dfcc735e7daceff9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/209159
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
CL 164718 mistyped the comparison flags: the rules for floating
point comparison should use GreaterThanF and GreaterEqualF. Fortunately,
the wrong optimizations were overwritten by other integer rules, so the
issue causes no failures, only some performance impact.
With the fix, the floating point test is optimized as follows.
source code: func foo(f float64) bool { return f > 4 || f < -4 }
previous version: "FCMPD", "CSET\tGT", "CBZ"
fixed version: "FCMPD", "BLE"
Add the test case.
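For reference, the function from the message can be wrapped in a small runnable program. This is a sketch for experimentation; the instruction sequence actually emitted depends on the compiler version and target (arm64 is assumed here, as the FCMPD/CSET/CBZ mnemonics suggest), and the main function is added purely for illustration.

```go
package main

import "fmt"

// foo is the function quoted in the commit message: it reports whether
// f lies outside the closed interval [-4, 4].
func foo(f float64) bool { return f > 4 || f < -4 }

func main() {
	for _, f := range []float64{-5, -4, 0, 4, 4.5} {
		fmt.Println(f, foo(f))
	}
}
```

Building with GOARCH=arm64 and -gcflags=-S prints the generated assembly, which is one way to see which compare and branch instructions a given toolchain picks.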
Change-Id: Iea954fdbb8272b2d642dae0f816dc77286e6e1fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/177121
Reviewed-by: Ben Shi <powerman1st@163.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Adding cases for ppc64,ppc64le to the codegen tests
where appropriate.
Change-Id: Idf8cbe88a4ab4406a4ef1ea777bd15a58b68f3ed
Reviewed-on: https://go-review.googlesource.com/c/142557
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
ARMv7's MULAF/MULSF/MULAD/MULSD are not fused.
This CL fixes the confusing test cases.
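The difference between a fused and an unfused multiply-add is observable from Go. The sketch below is illustrative only and is unrelated to how the ARMv7 instructions are tested; the function names and the input values are chosen for the example. math.FMA rounds once, while the explicit float64 conversion forces the product to be rounded before the addition.

```go
package main

import (
	"fmt"
	"math"
)

// unfused rounds the product to float64 before adding, so two roundings
// happen. The explicit conversion prevents the compiler from fusing.
func unfused(x, y, z float64) float64 {
	p := float64(x * y)
	return p + z
}

// fused computes x*y + z with a single rounding.
func fused(x, y, z float64) float64 { return math.FMA(x, y, z) }

func main() {
	// x*y is exactly 1 - 2^-54, which rounds to 1 in float64.
	x := 1 + math.Ldexp(1, -27)
	y := 1 - math.Ldexp(1, -27)
	fmt.Println("unfused:", unfused(x, y, -1))
	fmt.Println("fused:  ", fused(x, y, -1))
}
```

With these inputs the unfused form loses the low-order bits of the product and returns zero, while the fused form keeps them.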
Change-Id: I35022e207e2f0d24a23a7f6f188e41ba8eee9886
Reviewed-on: https://go-review.googlesource.com/c/142439
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Akhil Indurti <aindurti@gmail.com>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
|
|
This CL adds tests of fused multiplication-accumulation
on arm/arm64.
Change-Id: Ic85d5277c0d6acb7e1e723653372dfaf96824a39
Reviewed-on: https://go-review.googlesource.com/c/141652
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
The FP load/store instructions on arm64 have register indexed forms.
This CL makes use of them.
1. The total size of pkg/android_arm64 (excluding cmd/compile)
decreases by about 400 bytes.
2. There is no regression in the go1 benchmark; the GobEncode test
case even improves slightly, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 19.0s ± 0% 19.0s ± 1% ~ (p=0.817 n=29+29)
Fannkuch11-4 9.94s ± 0% 9.95s ± 0% +0.03% (p=0.010 n=24+30)
FmtFprintfEmpty-4 233ns ± 0% 233ns ± 0% ~ (all equal)
FmtFprintfString-4 427ns ± 0% 427ns ± 0% ~ (p=0.649 n=30+30)
FmtFprintfInt-4 471ns ± 0% 471ns ± 0% ~ (all equal)
FmtFprintfIntInt-4 730ns ± 0% 730ns ± 0% ~ (all equal)
FmtFprintfPrefixedInt-4 889ns ± 0% 889ns ± 0% ~ (all equal)
FmtFprintfFloat-4 1.21µs ± 0% 1.21µs ± 0% +0.04% (p=0.012 n=20+30)
FmtManyArgs-4 2.99µs ± 0% 2.99µs ± 0% ~ (p=0.651 n=29+29)
GobDecode-4 42.4ms ± 1% 42.3ms ± 1% -0.27% (p=0.001 n=29+28)
GobEncode-4 37.8ms ±11% 36.0ms ± 0% -4.67% (p=0.000 n=30+26)
Gzip-4 1.98s ± 1% 1.96s ± 1% -1.26% (p=0.000 n=30+30)
Gunzip-4 175ms ± 0% 175ms ± 0% ~ (p=0.988 n=29+29)
HTTPClientServer-4 854µs ± 5% 860µs ± 5% ~ (p=0.236 n=28+29)
JSONEncode-4 88.8ms ± 0% 87.9ms ± 0% -1.00% (p=0.000 n=24+26)
JSONDecode-4 390ms ± 1% 392ms ± 2% +0.48% (p=0.025 n=30+30)
Mandelbrot200-4 19.5ms ± 0% 19.5ms ± 0% ~ (p=0.894 n=24+29)
GoParse-4 20.3ms ± 0% 20.1ms ± 1% -0.94% (p=0.000 n=27+26)
RegexpMatchEasy0_32-4 451ns ± 0% 451ns ± 0% ~ (p=0.578 n=30+30)
RegexpMatchEasy0_1K-4 1.63µs ± 0% 1.63µs ± 0% ~ (p=0.298 n=30+28)
RegexpMatchEasy1_32-4 431ns ± 0% 434ns ± 0% +0.67% (p=0.000 n=30+29)
RegexpMatchEasy1_1K-4 2.60µs ± 0% 2.64µs ± 0% +1.36% (p=0.000 n=28+26)
RegexpMatchMedium_32-4 744ns ± 0% 744ns ± 0% ~ (p=0.474 n=29+29)
RegexpMatchMedium_1K-4 223µs ± 0% 223µs ± 0% -0.08% (p=0.038 n=26+30)
RegexpMatchHard_32-4 12.2µs ± 0% 12.3µs ± 0% +0.27% (p=0.000 n=29+30)
RegexpMatchHard_1K-4 373µs ± 0% 373µs ± 0% ~ (p=0.219 n=29+28)
Revcomp-4 2.84s ± 0% 2.84s ± 0% ~ (p=0.130 n=28+28)
Template-4 394ms ± 1% 392ms ± 1% -0.52% (p=0.001 n=30+30)
TimeParse-4 1.93µs ± 0% 1.93µs ± 0% ~ (p=0.587 n=29+30)
TimeFormat-4 2.00µs ± 0% 2.00µs ± 0% +0.07% (p=0.001 n=28+27)
[Geo mean] 306µs 305µs -0.17%
name old speed new speed delta
GobDecode-4 18.1MB/s ± 1% 18.2MB/s ± 1% +0.27% (p=0.001 n=29+28)
GobEncode-4 20.3MB/s ±10% 21.3MB/s ± 0% +4.64% (p=0.000 n=30+26)
Gzip-4 9.79MB/s ± 1% 9.91MB/s ± 1% +1.28% (p=0.000 n=30+30)
Gunzip-4 111MB/s ± 0% 111MB/s ± 0% ~ (p=0.988 n=29+29)
JSONEncode-4 21.8MB/s ± 0% 22.1MB/s ± 0% +1.02% (p=0.000 n=24+26)
JSONDecode-4 4.97MB/s ± 1% 4.95MB/s ± 2% -0.45% (p=0.031 n=30+30)
GoParse-4 2.85MB/s ± 1% 2.88MB/s ± 1% +1.03% (p=0.000 n=30+26)
RegexpMatchEasy0_32-4 70.9MB/s ± 0% 70.9MB/s ± 0% ~ (p=0.904 n=29+28)
RegexpMatchEasy0_1K-4 627MB/s ± 0% 627MB/s ± 0% ~ (p=0.156 n=30+30)
RegexpMatchEasy1_32-4 74.2MB/s ± 0% 73.7MB/s ± 0% -0.67% (p=0.000 n=30+29)
RegexpMatchEasy1_1K-4 393MB/s ± 0% 388MB/s ± 0% -1.34% (p=0.000 n=28+26)
RegexpMatchMedium_32-4 1.34MB/s ± 0% 1.34MB/s ± 0% ~ (all equal)
RegexpMatchMedium_1K-4 4.59MB/s ± 0% 4.59MB/s ± 0% +0.07% (p=0.035 n=25+30)
RegexpMatchHard_32-4 2.61MB/s ± 0% 2.61MB/s ± 0% -0.11% (p=0.002 n=28+30)
RegexpMatchHard_1K-4 2.75MB/s ± 0% 2.75MB/s ± 0% +0.15% (p=0.001 n=30+24)
Revcomp-4 89.4MB/s ± 0% 89.4MB/s ± 0% ~ (p=0.140 n=28+28)
Template-4 4.93MB/s ± 1% 4.95MB/s ± 1% +0.51% (p=0.001 n=30+30)
[Geo mean] 18.4MB/s 18.4MB/s +0.37%
Change-Id: I9a6b521a971b21cfb51064e8e9b853cef8a1d071
Reviewed-on: https://go-review.googlesource.com/124636
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
FLDPI pushes the constant pi to 387's register stack, which is
more efficient than MOVSSconst/MOVSDconst.
1. This optimization reduces the total size of pkg/linux_386
(excluding cmd/compile) by 0.3KB.
2. There is little regression in the go1 benchmark.
name old time/op new time/op delta
BinaryTree17-4 3.30s ± 3% 3.30s ± 2% ~ (p=0.759 n=40+39)
Fannkuch11-4 3.53s ± 1% 3.54s ± 1% ~ (p=0.168 n=40+40)
FmtFprintfEmpty-4 45.5ns ± 3% 45.6ns ± 3% ~ (p=0.553 n=40+40)
FmtFprintfString-4 78.4ns ± 3% 78.3ns ± 3% ~ (p=0.593 n=40+40)
FmtFprintfInt-4 88.8ns ± 2% 89.9ns ± 2% ~ (p=0.083 n=40+33)
FmtFprintfIntInt-4 140ns ± 4% 140ns ± 4% ~ (p=0.656 n=40+40)
FmtFprintfPrefixedInt-4 180ns ± 2% 181ns ± 3% +0.53% (p=0.050 n=40+40)
FmtFprintfFloat-4 408ns ± 4% 411ns ± 3% ~ (p=0.112 n=40+40)
FmtManyArgs-4 599ns ± 3% 602ns ± 3% ~ (p=0.784 n=40+40)
GobDecode-4 7.24ms ± 6% 7.30ms ± 5% ~ (p=0.171 n=40+40)
GobEncode-4 6.98ms ± 5% 6.89ms ± 8% ~ (p=0.107 n=40+40)
Gzip-4 396ms ± 4% 396ms ± 3% ~ (p=0.852 n=40+40)
Gunzip-4 41.3ms ± 3% 41.5ms ± 4% ~ (p=0.221 n=40+40)
HTTPClientServer-4 63.4µs ± 3% 63.4µs ± 2% ~ (p=0.895 n=39+40)
JSONEncode-4 17.5ms ± 2% 17.5ms ± 3% ~ (p=0.090 n=40+40)
JSONDecode-4 60.6ms ± 3% 60.1ms ± 4% ~ (p=0.184 n=40+40)
Mandelbrot200-4 7.80ms ± 3% 7.78ms ± 2% ~ (p=0.512 n=40+40)
GoParse-4 3.30ms ± 3% 3.28ms ± 2% -0.61% (p=0.034 n=40+40)
RegexpMatchEasy0_32-4 104ns ± 4% 103ns ± 4% ~ (p=0.118 n=40+40)
RegexpMatchEasy0_1K-4 850ns ± 2% 848ns ± 2% ~ (p=0.370 n=40+40)
RegexpMatchEasy1_32-4 112ns ± 4% 112ns ± 4% ~ (p=0.848 n=40+40)
RegexpMatchEasy1_1K-4 1.04µs ± 4% 1.03µs ± 4% ~ (p=0.333 n=40+40)
RegexpMatchMedium_32-4 132ns ± 4% 131ns ± 3% ~ (p=0.527 n=40+40)
RegexpMatchMedium_1K-4 43.4µs ± 3% 43.5µs ± 3% ~ (p=0.111 n=40+40)
RegexpMatchHard_32-4 2.24µs ± 4% 2.24µs ± 4% ~ (p=0.441 n=40+40)
RegexpMatchHard_1K-4 67.9µs ± 3% 68.0µs ± 3% ~ (p=0.095 n=40+40)
Revcomp-4 1.84s ± 2% 1.84s ± 2% ~ (p=0.677 n=40+40)
Template-4 68.4ms ± 3% 68.6ms ± 3% ~ (p=0.345 n=40+40)
TimeParse-4 433ns ± 3% 433ns ± 3% ~ (p=0.403 n=40+40)
TimeFormat-4 407ns ± 3% 406ns ± 3% ~ (p=0.900 n=40+40)
[Geo mean] 67.1µs 67.2µs +0.04%
name old speed new speed delta
GobDecode-4 106MB/s ± 5% 105MB/s ± 5% ~ (p=0.173 n=40+40)
GobEncode-4 110MB/s ± 5% 112MB/s ± 9% ~ (p=0.104 n=40+40)
Gzip-4 49.0MB/s ± 4% 49.1MB/s ± 4% ~ (p=0.836 n=40+40)
Gunzip-4 471MB/s ± 3% 468MB/s ± 4% ~ (p=0.218 n=40+40)
JSONEncode-4 111MB/s ± 2% 111MB/s ± 3% ~ (p=0.090 n=40+40)
JSONDecode-4 32.0MB/s ± 3% 32.3MB/s ± 4% ~ (p=0.194 n=40+40)
GoParse-4 17.6MB/s ± 3% 17.7MB/s ± 2% +0.62% (p=0.035 n=40+40)
RegexpMatchEasy0_32-4 307MB/s ± 4% 309MB/s ± 4% +0.70% (p=0.041 n=40+40)
RegexpMatchEasy0_1K-4 1.20GB/s ± 3% 1.21GB/s ± 2% ~ (p=0.353 n=40+40)
RegexpMatchEasy1_32-4 285MB/s ± 3% 284MB/s ± 4% ~ (p=0.384 n=40+40)
RegexpMatchEasy1_1K-4 988MB/s ± 4% 992MB/s ± 3% ~ (p=0.335 n=40+40)
RegexpMatchMedium_32-4 7.56MB/s ± 4% 7.57MB/s ± 4% ~ (p=0.314 n=40+40)
RegexpMatchMedium_1K-4 23.6MB/s ± 3% 23.6MB/s ± 3% ~ (p=0.107 n=40+40)
RegexpMatchHard_32-4 14.3MB/s ± 4% 14.3MB/s ± 4% ~ (p=0.429 n=40+40)
RegexpMatchHard_1K-4 15.1MB/s ± 3% 15.1MB/s ± 3% ~ (p=0.099 n=40+40)
Revcomp-4 138MB/s ± 2% 138MB/s ± 2% ~ (p=0.658 n=40+40)
Template-4 28.4MB/s ± 3% 28.3MB/s ± 3% ~ (p=0.331 n=40+40)
[Geo mean] 80.8MB/s 80.8MB/s +0.09%
Change-Id: I0cb715eead68ade097a302e7fb80ccbd1d1b511e
Reviewed-on: https://go-review.googlesource.com/130975
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This CL makes the codegen testsuite automatically test all
architecture variants for each architecture specified in tests. For
instance, if a test file specifies an "arm" test, it will be
automatically run on all GOARM variants (5,6,7), to increase
the coverage.
The CL also introduces a syntax to select only a specific
variant (eg: "arm/7") in case the test makes sense only there.
The same syntax also allows specifying the operating system
in case it matters (eg: "plan9/386/sse2").
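The target strings quoted above ("arm", "arm/7", "plan9/386/sse2") can be split as in the following sketch. It is a hypothetical parser written for illustration, not the test runner's code: the target type, parseTarget, and the deliberately incomplete knownArch set are all assumptions, and accepting a two-part "os/arch" form is a side effect of this sketch rather than something the commit message states.

```go
package main

import (
	"fmt"
	"strings"
)

// target is a hypothetical representation of a parsed test target.
// An empty variant stands for "all variants of this architecture".
type target struct{ os, arch, variant string }

// knownArch is an illustrative, incomplete set of architecture names.
var knownArch = map[string]bool{"386": true, "amd64": true, "arm": true, "arm64": true}

// parseTarget splits "arch", "arch/variant" or "os/arch/variant".
func parseTarget(s string) (target, error) {
	parts := strings.Split(s, "/")
	var t target
	// A leading component that is not an architecture is taken as the OS.
	if len(parts) > 1 && !knownArch[parts[0]] {
		t.os, parts = parts[0], parts[1:]
	}
	if len(parts) > 2 || !knownArch[parts[0]] {
		return target{}, fmt.Errorf("bad target %q", s)
	}
	t.arch = parts[0]
	if len(parts) == 2 {
		t.variant = parts[1]
	}
	return t, nil
}

func main() {
	for _, s := range []string{"arm", "arm/7", "plan9/386/sse2"} {
		t, err := parseTarget(s)
		fmt.Printf("%q -> %+v (err: %v)\n", s, t, err)
	}
}
```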
Fixes #24658
Change-Id: I2eba8b918f51bb6a77a8431a309f8b71af07ea22
Reviewed-on: https://go-review.googlesource.com/107315
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
And remove it from asmtest. Next CL will remove the whole
asmtest infrastructure.
Change-Id: I5851bf7c617456d62a3c6cffacf70252df7b056b
Reviewed-on: https://go-review.googlesource.com/107335
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Change-Id: I99407e27e340689009af798989b33cef7cb92070
Reviewed-on: https://go-review.googlesource.com/103376
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
And delete them from asm_test.
Change-Id: Ibdaca3496eefc73c731b511ddb9636a1f3dff68c
Reviewed-on: https://go-review.googlesource.com/100915
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|