| Age | Commit message (Collapse) | Author |
|
goos: linux
goarch: riscv64
pkg: math
│ math-old │ math-new │
│ sec/op │ sec/op vs base │
Exp-64 41.21n ± 0% 32.03n ± 0% -22.28% (p=0.000 n=8)
Exp2-64 38.86n ± 1% 28.18n ± 0% -27.49% (p=0.000 n=8)
Exp2Go-64 40.36n ± 1% 40.51n ± 1% +0.36% (p=0.049 n=8)
Frexp-64 5.681n ± 1% 5.446n ± 0% -4.14% (p=0.000 n=8)
Ldexp-64 7.676n ± 1% 7.555n ± 0% -1.58% (p=0.001 n=8)
Change-Id: Ic122bf9598302f947c6dbf751db591f403c50373
Reviewed-on: https://go-review.googlesource.com/c/go/+/754687
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
Exp 26.30n ± 0% 12.93n ± 0% -50.85% (p=0.000 n=10)
ExpGo 26.86n ± 0% 26.92n ± 0% +0.22% (p=0.000 n=10)
Expm1 16.76n ± 0% 16.75n ± 0% ~ (p=0.060 n=10)
Exp2 23.05n ± 0% 12.12n ± 0% -47.42% (p=0.000 n=10)
Exp2Go 23.41n ± 0% 23.47n ± 0% +0.28% (p=0.000 n=10)
geomean 22.97n 17.54n -23.64%
goos: linux
goarch: loong64
pkg: math/cmplx
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
Exp 51.32n ± 0% 35.41n ± 0% -30.99% (p=0.000 n=10)
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
Exp 50.27n ± 0% 48.75n ± 1% -3.01% (p=0.000 n=10)
ExpGo 50.72n ± 0% 50.44n ± 0% -0.55% (p=0.000 n=10)
Expm1 28.40n ± 0% 28.32n ± 0% ~ (p=0.360 n=10)
Exp2 50.09n ± 0% 21.49n ± 1% -57.10% (p=0.000 n=10)
Exp2Go 50.05n ± 0% 49.69n ± 0% -0.72% (p=0.000 n=10)
geomean 44.85n 37.52n -16.35%
goos: linux
goarch: loong64
pkg: math/cmplx
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
Exp 88.56n ± 0% 67.29n ± 0% -24.03% (p=0.000 n=10)
Change-Id: I89e456d26fc075d83335ee4a31227d2aface5714
Reviewed-on: https://go-review.googlesource.com/c/go/+/653935
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
When these packages are released as part of Go 1.18,
Go 1.16 will no longer be supported, so we can remove
the +build tags in these files.
Ran go fix -fix=buildtag std cmd and then reverted the bootstrapDirs
as defined in src/cmd/dist/buildtool.go, which need to continue
to build with Go 1.4 for now.
Also reverted src/vendor and src/cmd/vendor, which will need
to be updated in their own repos first.
Manual changes in runtime/pprof/mprof_test.go to adjust line numbers.
For #41184.
Change-Id: Ic0f93f7091295b6abc76ed5cd6e6746e1280861e
Reviewed-on: https://go-review.googlesource.com/c/go/+/344955
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
|
|
Currently almost all math functions have the following pattern:
func Sin(x float64) float64
func sin(x float64) float64 {
// ... pure Go implementation ...
}
Architectures that implement a function in assembly provide the
assembly implementation directly as the exported function (e.g., Sin),
and architectures that don't implement it in assembly use a small stub
to jump back to the Go code, like:
TEXT ·Sin(SB), NOSPLIT, $0
JMP ·sin(SB)
However, most functions are not implemented in assembly on most
architectures, so this jump through assembly is a waste. It defeats
compiler optimizations like inlining. And, with regabi, it actually
adds a small but non-trivial overhead because the jump from assembly
back to Go must go through an ABI0->ABIInternal bridge function.
Hence, this CL reorganizes this structure across the entire package.
It now leans on inlining to achieve peak performance, but allows the
compiler to see all the way through the pure Go implementation.
Now, functions follow this pattern:
func Sin(x float64) float64 {
if haveArchSin {
return archSin(x)
}
return sin(x)
}
func sin(x float64) float64 {
// ... pure Go implementation ...
}
Architectures that have assembly implementations use build-tagged
files to set haveArchX to true an provide an archX implementation.
That implementation can also still call back into the Go
implementation (some of them do this).
Prior to this change, enabling ABI wrappers results in a geomean
slowdown of the math benchmarks of 8.77% (full results:
https://perf.golang.org/search?q=upload:20210415.6) and of the Tile38
benchmarks by ~4%. After this change, enabling ABI wrappers is
completely performance-neutral on Tile38 and all but one math
benchmark (full results:
https://perf.golang.org/search?q=upload:20210415.7). ABI wrappers slow
down SqrtIndirectLatency-12 by 2.09%, which makes sense because that
call must still go through an ABI wrapper.
With ABI wrappers disabled (which won't be an option on amd64 much
longer), on linux/amd64, this change is largely performance-neutral
and slightly improves the performance of a few benchmarks:
(Because there are so many benchmarks, I've applied the Šidák
correction to the alpha threshold. It makes relatively little
difference in which benchmarks are statistically significant.)
name old time/op new time/op delta
Acos-12 22.3ns ± 0% 18.8ns ± 1% -15.44% (p=0.000 n=18+16)
Acosh-12 28.2ns ± 0% 28.2ns ± 0% ~ (p=0.404 n=18+20)
Asin-12 18.1ns ± 0% 18.2ns ± 0% +0.20% (p=0.000 n=18+16)
Asinh-12 32.8ns ± 0% 32.9ns ± 1% ~ (p=0.891 n=18+20)
Atan-12 9.92ns ± 0% 9.90ns ± 1% -0.24% (p=0.000 n=17+16)
Atanh-12 27.7ns ± 0% 27.5ns ± 0% -0.72% (p=0.000 n=16+20)
Atan2-12 18.5ns ± 0% 18.4ns ± 0% -0.59% (p=0.000 n=19+19)
Cbrt-12 22.1ns ± 0% 22.1ns ± 0% ~ (p=0.804 n=16+17)
Ceil-12 0.84ns ± 0% 0.84ns ± 0% ~ (p=0.663 n=18+16)
Copysign-12 0.84ns ± 0% 0.84ns ± 0% ~ (p=0.762 n=16+19)
Cos-12 12.7ns ± 0% 12.7ns ± 1% ~ (p=0.145 n=19+18)
Cosh-12 22.2ns ± 0% 22.5ns ± 0% +1.60% (p=0.000 n=17+19)
Erf-12 11.1ns ± 1% 11.1ns ± 1% ~ (p=0.010 n=19+19)
Erfc-12 12.6ns ± 1% 12.7ns ± 0% ~ (p=0.066 n=19+15)
Erfinv-12 16.1ns ± 0% 16.1ns ± 0% ~ (p=0.462 n=17+20)
Erfcinv-12 16.0ns ± 1% 16.0ns ± 1% ~ (p=0.015 n=17+16)
Exp-12 16.3ns ± 0% 16.5ns ± 1% +1.25% (p=0.000 n=19+16)
ExpGo-12 36.2ns ± 1% 36.1ns ± 1% ~ (p=0.242 n=20+18)
Expm1-12 18.6ns ± 0% 18.7ns ± 0% +0.25% (p=0.000 n=16+19)
Exp2-12 34.7ns ± 0% 34.6ns ± 1% ~ (p=0.010 n=19+18)
Exp2Go-12 34.8ns ± 1% 34.8ns ± 1% ~ (p=0.372 n=19+19)
Abs-12 0.56ns ± 0% 0.56ns ± 0% ~ (p=0.766 n=18+16)
Dim-12 0.84ns ± 1% 0.84ns ± 1% ~ (p=0.167 n=17+19)
Floor-12 0.84ns ± 0% 0.84ns ± 0% ~ (p=0.993 n=18+16)
Max-12 3.35ns ± 0% 3.35ns ± 0% ~ (p=0.894 n=17+19)
Min-12 3.35ns ± 0% 3.36ns ± 1% ~ (p=0.214 n=18+18)
Mod-12 35.2ns ± 0% 34.7ns ± 0% -1.45% (p=0.000 n=18+17)
Frexp-12 5.31ns ± 0% 4.75ns ± 0% -10.51% (p=0.000 n=19+18)
Gamma-12 14.8ns ± 0% 16.2ns ± 1% +9.21% (p=0.000 n=20+19)
Hypot-12 6.16ns ± 0% 6.17ns ± 0% +0.26% (p=0.000 n=19+20)
HypotGo-12 7.79ns ± 1% 7.78ns ± 0% ~ (p=0.497 n=18+17)
Ilogb-12 4.47ns ± 0% 4.47ns ± 0% ~ (p=0.167 n=19+19)
J0-12 76.0ns ± 0% 76.3ns ± 0% +0.35% (p=0.000 n=19+18)
J1-12 76.8ns ± 1% 75.9ns ± 0% -1.14% (p=0.000 n=18+18)
Jn-12 167ns ± 1% 168ns ± 1% ~ (p=0.038 n=18+18)
Ldexp-12 6.98ns ± 0% 6.43ns ± 0% -7.97% (p=0.000 n=17+18)
Lgamma-12 15.9ns ± 0% 16.0ns ± 1% ~ (p=0.011 n=20+17)
Log-12 13.3ns ± 0% 13.4ns ± 1% +0.37% (p=0.000 n=15+18)
Logb-12 4.75ns ± 0% 4.75ns ± 0% ~ (p=0.831 n=16+18)
Log1p-12 19.5ns ± 0% 19.5ns ± 1% ~ (p=0.851 n=18+17)
Log10-12 15.9ns ± 0% 14.0ns ± 0% -11.92% (p=0.000 n=17+16)
Log2-12 7.88ns ± 1% 8.01ns ± 0% +1.72% (p=0.000 n=20+20)
Modf-12 4.75ns ± 0% 4.34ns ± 0% -8.66% (p=0.000 n=19+17)
Nextafter32-12 5.31ns ± 0% 5.31ns ± 0% ~ (p=0.389 n=17+18)
Nextafter64-12 5.03ns ± 1% 5.03ns ± 0% ~ (p=0.774 n=17+18)
PowInt-12 29.9ns ± 0% 28.5ns ± 0% -4.69% (p=0.000 n=18+19)
PowFrac-12 91.0ns ± 0% 91.1ns ± 0% ~ (p=0.029 n=19+19)
Pow10Pos-12 1.12ns ± 0% 1.12ns ± 0% ~ (p=0.363 n=20+20)
Pow10Neg-12 3.90ns ± 0% 3.90ns ± 0% ~ (p=0.921 n=17+18)
Round-12 2.31ns ± 0% 2.31ns ± 1% ~ (p=0.390 n=18+18)
RoundToEven-12 0.84ns ± 0% 0.84ns ± 0% ~ (p=0.280 n=18+19)
Remainder-12 31.6ns ± 0% 29.6ns ± 0% -6.16% (p=0.000 n=18+17)
Signbit-12 0.56ns ± 0% 0.56ns ± 0% ~ (p=0.385 n=19+18)
Sin-12 12.5ns ± 0% 12.5ns ± 0% ~ (p=0.080 n=18+18)
Sincos-12 16.4ns ± 2% 16.4ns ± 2% ~ (p=0.253 n=20+19)
Sinh-12 26.1ns ± 0% 26.1ns ± 0% +0.18% (p=0.000 n=17+19)
SqrtIndirect-12 3.91ns ± 0% 3.90ns ± 0% ~ (p=0.133 n=19+19)
SqrtLatency-12 2.79ns ± 0% 2.79ns ± 0% ~ (p=0.226 n=16+19)
SqrtIndirectLatency-12 6.68ns ± 0% 6.37ns ± 2% -4.66% (p=0.000 n=17+20)
SqrtGoLatency-12 49.4ns ± 0% 49.4ns ± 0% ~ (p=0.289 n=18+16)
SqrtPrime-12 3.18µs ± 0% 3.18µs ± 0% ~ (p=0.084 n=17+18)
Tan-12 13.8ns ± 0% 13.9ns ± 2% ~ (p=0.292 n=19+20)
Tanh-12 25.4ns ± 0% 25.4ns ± 0% ~ (p=0.101 n=17+17)
Trunc-12 0.84ns ± 0% 0.84ns ± 0% ~ (p=0.765 n=18+16)
Y0-12 75.8ns ± 0% 75.9ns ± 1% ~ (p=0.805 n=16+18)
Y1-12 76.3ns ± 0% 75.3ns ± 1% -1.34% (p=0.000 n=19+17)
Yn-12 164ns ± 0% 164ns ± 2% ~ (p=0.356 n=18+20)
Float64bits-12 0.56ns ± 0% 0.56ns ± 0% ~ (p=0.383 n=18+18)
Float64frombits-12 0.56ns ± 0% 0.56ns ± 0% ~ (p=0.066 n=18+19)
Float32bits-12 0.56ns ± 0% 0.56ns ± 0% ~ (p=0.889 n=16+19)
Float32frombits-12 0.56ns ± 0% 0.56ns ± 0% ~ (p=0.007 n=18+19)
FMA-12 23.9ns ± 0% 24.0ns ± 0% +0.31% (p=0.000 n=16+17)
[Geo mean] 9.86ns 9.77ns -0.87%
(https://perf.golang.org/search?q=upload:20210415.5)
For #40724.
Change-Id: I44fbba2a17be930ec9daeb0a8222f55cd50555a0
Reviewed-on: https://go-review.googlesource.com/c/go/+/310331
Trust: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Make all our package sources use Go 1.17 gofmt format
(adding //go:build lines).
Part of //go:build change (#41184).
See https://golang.org/design/draft-gobuild
Change-Id: Ia0534360e4957e58cd9a18429c39d0e32a6addb4
Reviewed-on: https://go-review.googlesource.com/c/go/+/294430
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
Part 1: CL 199499 (GOOS nacl)
Part 2: CL 200077 (amd64p32 files, toolchain)
Part 3: stuff that arguably should've been part of Part 2, but I forgot
one of my grep patterns when splitting the original CL up into
two parts.
This one might also have interesting stuff to resurrect for any future
x32 ABI support.
Updates #30439
Change-Id: I2b4143374a253a003666f3c69e776b7e456bdb9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/200318
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
- using FMA and AVX instructions if available to speed-up
Exp calculation on amd64
- using a data table instead of #define'ed constants because
these instructions do not support loading floating point immediates.
One has to use a memory operand / register.
- Benchmark results on Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz:
Original vs New (non-FMA path)
name old time/op new time/op delta
Exp 16.0ns ± 1% 16.1ns ± 3% ~ (p=0.308 n=9+10)
Original vs New (FMA path)
name old time/op new time/op delta
Exp 16.0ns ± 1% 13.7ns ± 2% -14.80% (p=0.000 n=9+10)
Change-Id: I3d8986925d82b39b95ee979ae06f59d7e591d02e
Reviewed-on: https://go-review.googlesource.com/62590
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|