| Age | Commit message | Author |
|
ChaCha8 provides a cryptographically strong generator
alongside PCG, for callers who want stronger randomness.
On systems with 128-bit vector math
assembly (amd64 and arm64), ChaCha8 runs at about the same
speed as PCG (25% slower on amd64, 2% faster on arm64).
Apart from the new ChaCha8 benchmark, all of the reported
benchmark variation is noise.
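As a rough illustration of how the two generators sit side by side, the following sketch assumes the math/rand/v2 API as released in Go 1.22 (rand.NewChaCha8, rand.NewPCG, rand.New). The helper names newChaCha8 and newPCG and the fixed seeds are hypothetical and exist only to keep the example repeatable.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// newChaCha8 wraps the ChaCha8 source in a *rand.Rand.
// (Hypothetical helper; real code would fill the seed from crypto/rand.)
func newChaCha8(seed [32]byte) *rand.Rand {
	return rand.New(rand.NewChaCha8(seed))
}

// newPCG wraps the PCG source in a *rand.Rand.
// (Hypothetical helper; the two words form the 128-bit seed.)
func newPCG(seed1, seed2 uint64) *rand.Rand {
	return rand.New(rand.NewPCG(seed1, seed2))
}

func main() {
	// Fixed seed for illustration only; it is not secret.
	var seed [32]byte
	copy(seed[:], "illustrative fixed seed")

	strong := newChaCha8(seed)
	fast := newPCG(1, 2)

	// Both generators expose the same methods through *rand.Rand.
	fmt.Println(strong.Uint64(), strong.IntN(1000))
	fmt.Println(fast.Uint64(), fast.IntN(1000))
}
```

Because both sources satisfy the same interface, switching between them is a one-line change at construction time.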
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ afa459a2f0.amd64 │ bbb48afeb7.amd64 │
│ sec/op │ sec/op vs base │
PCG_DXSM-32 1.488n ± 2% 1.492n ± 2% ~ (p=0.309 n=20)
ChaCha8-32 1.861n ± 2%
SourceUint64-32 1.450n ± 3% 1.590n ± 2% +9.69% (p=0.000 n=20)
GlobalInt64-32 2.067n ± 2% 2.061n ± 1% ~ (p=0.952 n=20)
GlobalInt64Parallel-32 0.1044n ± 2% 0.1041n ± 1% ~ (p=0.498 n=20)
GlobalUint64-32 2.085n ± 0% 2.256n ± 2% +8.23% (p=0.000 n=20)
GlobalUint64Parallel-32 0.1008n ± 1% 0.1018n ± 1% ~ (p=0.041 n=20)
Int64-32 1.779n ± 1% 1.779n ± 1% ~ (p=0.410 n=20)
Uint64-32 1.854n ± 2% 1.882n ± 1% ~ (p=0.044 n=20)
GlobalIntN1000-32 3.140n ± 3% 3.115n ± 3% ~ (p=0.673 n=20)
IntN1000-32 2.496n ± 1% 2.509n ± 1% ~ (p=0.171 n=20)
Int64N1000-32 2.510n ± 2% 2.493n ± 1% ~ (p=0.804 n=20)
Int64N1e8-32 2.471n ± 2% 2.521n ± 1% +1.98% (p=0.003 n=20)
Int64N1e9-32 2.488n ± 2% 2.506n ± 1% ~ (p=0.663 n=20)
Int64N2e9-32 2.478n ± 2% 2.482n ± 2% ~ (p=0.533 n=20)
Int64N1e18-32 3.088n ± 1% 3.216n ± 1% +4.15% (p=0.000 n=20)
Int64N2e18-32 3.493n ± 1% 3.635n ± 2% +4.05% (p=0.000 n=20)
Int64N4e18-32 5.060n ± 2% 5.122n ± 1% +1.22% (p=0.000 n=20)
Int32N1000-32 2.620n ± 1% 2.672n ± 1% +2.00% (p=0.002 n=20)
Int32N1e8-32 2.652n ± 0% 2.646n ± 1% ~ (p=0.743 n=20)
Int32N1e9-32 2.644n ± 1% 2.660n ± 2% ~ (p=0.163 n=20)
Int32N2e9-32 2.619n ± 2% 2.652n ± 1% ~ (p=0.132 n=20)
Float32-32 2.261n ± 1% 2.267n ± 1% ~ (p=0.516 n=20)
Float64-32 2.241n ± 2% 2.276n ± 1% ~ (p=0.080 n=20)
ExpFloat64-32 3.716n ± 1% 3.779n ± 1% +1.68% (p=0.007 n=20)
NormFloat64-32 3.718n ± 1% 3.747n ± 1% ~ (p=0.011 n=20)
Perm3-32 34.11n ± 2% 34.23n ± 2% ~ (p=0.779 n=20)
Perm30-32 200.6n ± 0% 202.3n ± 2% ~ (p=0.055 n=20)
Perm30ViaShuffle-32 109.7n ± 1% 115.5n ± 2% +5.34% (p=0.000 n=20)
ShuffleOverhead-32 107.2n ± 1% 113.3n ± 1% +5.74% (p=0.000 n=20)
Concurrent-32 2.108n ± 6% 2.107n ± 1% ~ (p=0.448 n=20)
goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
│ afa459a2f0.arm64 │ bbb48afeb7.arm64 │
│ sec/op │ sec/op vs base │
PCG_DXSM-8 2.531n ± 0% 2.529n ± 0% ~ (p=0.586 n=20)
ChaCha8-8 2.480n ± 0%
SourceUint64-8 2.531n ± 0% 2.534n ± 0% ~ (p=0.227 n=20)
GlobalInt64-8 2.177n ± 1% 2.173n ± 1% ~ (p=0.733 n=20)
GlobalInt64Parallel-8 0.4319n ± 0% 0.4304n ± 0% -0.32% (p=0.003 n=20)
GlobalUint64-8 2.185n ± 1% 2.185n ± 0% ~ (p=0.541 n=20)
GlobalUint64Parallel-8 0.4295n ± 1% 0.4294n ± 0% ~ (p=0.203 n=20)
Int64-8 4.104n ± 0% 4.107n ± 0% ~ (p=0.193 n=20)
Uint64-8 4.080n ± 0% 4.081n ± 0% ~ (p=0.053 n=20)
GlobalIntN1000-8 2.814n ± 1% 2.814n ± 0% ~ (p=0.879 n=20)
IntN1000-8 4.140n ± 0% 4.141n ± 0% ~ (p=0.428 n=20)
Int64N1000-8 4.139n ± 0% 4.140n ± 0% ~ (p=0.114 n=20)
Int64N1e8-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.898 n=20)
Int64N1e9-8 4.139n ± 0% 4.140n ± 0% ~ (p=0.593 n=20)
Int64N2e9-8 4.140n ± 0% 4.139n ± 0% ~ (p=0.158 n=20)
Int64N1e18-8 5.273n ± 0% 5.274n ± 0% ~ (p=0.308 n=20)
Int64N2e18-8 6.059n ± 0% 6.058n ± 0% ~ (p=0.053 n=20)
Int64N4e18-8 8.803n ± 0% 8.800n ± 0% ~ (p=0.673 n=20)
Int32N1000-8 4.131n ± 0% 4.131n ± 0% ~ (p=0.342 n=20)
Int32N1e8-8 4.131n ± 0% 4.131n ± 0% ~ (p=0.091 n=20)
Int32N1e9-8 4.131n ± 0% 4.131n ± 0% ~ (p=0.273 n=20)
Int32N2e9-8 4.131n ± 0% 4.131n ± 0% ~ (p=0.425 n=20)
Float32-8 4.110n ± 0% 4.112n ± 0% ~ (p=0.203 n=20)
Float64-8 4.104n ± 0% 4.106n ± 0% ~ (p=0.409 n=20)
ExpFloat64-8 5.338n ± 0% 5.339n ± 0% ~ (p=0.037 n=20)
NormFloat64-8 5.731n ± 0% 5.733n ± 0% ~ (p=0.692 n=20)
Perm3-8 26.62n ± 0% 26.65n ± 0% +0.09% (p=0.000 n=20)
Perm30-8 194.6n ± 2% 194.9n ± 0% ~ (p=0.141 n=20)
Perm30ViaShuffle-8 156.4n ± 0% 156.5n ± 0% +0.06% (p=0.000 n=20)
ShuffleOverhead-8 125.8n ± 0% 125.0n ± 0% -0.64% (p=0.000 n=20)
Concurrent-8 2.654n ± 6% 2.441n ± 6% -8.06% (p=0.009 n=20)
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ afa459a2f0.386 │ bbb48afeb7.386 │
│ sec/op │ sec/op vs base │
PCG_DXSM-32 7.793n ± 2% 7.647n ± 1% ~ (p=0.021 n=20)
ChaCha8-32 11.48n ± 2%
SourceUint64-32 7.680n ± 1% 7.714n ± 1% ~ (p=0.713 n=20)
GlobalInt64-32 3.474n ± 3% 3.491n ± 28% ~ (p=0.337 n=20)
GlobalInt64Parallel-32 0.3253n ± 0% 0.3194n ± 0% -1.81% (p=0.000 n=20)
GlobalUint64-32 3.433n ± 2% 3.610n ± 2% +5.14% (p=0.000 n=20)
GlobalUint64Parallel-32 0.3156n ± 0% 0.3164n ± 0% ~ (p=0.073 n=20)
Int64-32 7.707n ± 1% 7.824n ± 0% +1.52% (p=0.005 n=20)
Uint64-32 7.714n ± 1% 7.732n ± 2% ~ (p=0.441 n=20)
GlobalIntN1000-32 6.236n ± 1% 6.176n ± 2% ~ (p=0.499 n=20)
IntN1000-32 10.41n ± 1% 10.31n ± 2% ~ (p=0.782 n=20)
Int64N1000-32 10.97n ± 2% 11.22n ± 2% +2.19% (p=0.002 n=20)
Int64N1e8-32 10.98n ± 1% 11.07n ± 1% ~ (p=0.056 n=20)
Int64N1e9-32 10.95n ± 0% 11.15n ± 2% ~ (p=0.016 n=20)
Int64N2e9-32 11.11n ± 1% 11.00n ± 1% ~ (p=0.654 n=20)
Int64N1e18-32 15.18n ± 2% 14.97n ± 2% ~ (p=0.387 n=20)
Int64N2e18-32 15.61n ± 1% 15.91n ± 1% +1.92% (p=0.003 n=20)
Int64N4e18-32 19.23n ± 2% 18.98n ± 1% ~ (p=1.000 n=20)
Int32N1000-32 10.35n ± 1% 10.31n ± 2% ~ (p=0.081 n=20)
Int32N1e8-32 10.33n ± 1% 10.38n ± 1% ~ (p=0.335 n=20)
Int32N1e9-32 10.35n ± 1% 10.37n ± 1% ~ (p=0.497 n=20)
Int32N2e9-32 10.35n ± 1% 10.41n ± 1% ~ (p=0.605 n=20)
Float32-32 13.57n ± 1% 13.78n ± 2% ~ (p=0.047 n=20)
Float64-32 22.95n ± 4% 23.43n ± 3% ~ (p=0.218 n=20)
ExpFloat64-32 15.23n ± 2% 15.46n ± 1% ~ (p=0.095 n=20)
NormFloat64-32 13.78n ± 1% 13.73n ± 2% ~ (p=0.031 n=20)
Perm3-32 46.62n ± 2% 47.46n ± 2% +1.82% (p=0.004 n=20)
Perm30-32 400.7n ± 1% 403.5n ± 1% ~ (p=0.098 n=20)
Perm30ViaShuffle-32 350.5n ± 1% 348.1n ± 2% ~ (p=0.703 n=20)
ShuffleOverhead-32 326.0n ± 2% 326.2n ± 2% ~ (p=0.440 n=20)
Concurrent-32 3.290n ± 0% 3.297n ± 4% ~ (p=0.189 n=20)
For #61716.
Change-Id: Id2a7e1c1db0beb81f563faaefba65fe292497269
Reviewed-on: https://go-review.googlesource.com/c/go/+/516859
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
|
|
Based on observations by Cherry Mui (see comments in CL 539299).
Add new benchmark FloatPrecMixed.
For #50489.
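For context, a minimal sketch of the operation these benchmarks appear to measure, assuming they exercise (*big.Rat).FloatPrec as released in Go 1.22. The helper floatPrec is hypothetical and not part of math/big.

```go
package main

import (
	"fmt"
	"math/big"
)

// floatPrec reports how many non-repeating fractional decimal digits
// num/den has, and whether that many digits represent it exactly.
// (Hypothetical helper wrapping big.Rat.FloatPrec; den must be non-zero.)
func floatPrec(num, den int64) (int, bool) {
	return big.NewRat(num, den).FloatPrec()
}

func main() {
	for _, r := range [][2]int64{{1, 2}, {1, 4}, {1, 3}} {
		n, exact := floatPrec(r[0], r[1])
		fmt.Printf("%d/%d: n=%d exact=%v\n", r[0], r[1], n, exact)
	}
}
```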
name old time/op new time/op delta
FloatPrecExact/1-12 129ns ± 0% 105ns ±11% -18.51% (p=0.008 n=5+5)
FloatPrecExact/10-12 317ns ± 2% 283ns ± 1% -10.65% (p=0.008 n=5+5)
FloatPrecExact/100-12 1.80µs ±15% 1.35µs ± 0% -25.09% (p=0.008 n=5+5)
FloatPrecExact/1000-12 9.48µs ±14% 8.32µs ± 1% -12.25% (p=0.008 n=5+5)
FloatPrecExact/10000-12 195µs ± 1% 191µs ± 0% -1.73% (p=0.008 n=5+5)
FloatPrecExact/100000-12 7.31ms ± 1% 7.24ms ± 1% -0.99% (p=0.032 n=5+5)
FloatPrecExact/1000000-12 301ms ± 3% 302ms ± 2% ~ (p=0.841 n=5+5)
FloatPrecMixed/1-12 141ns ± 0% 110ns ± 3% -21.88% (p=0.008 n=5+5)
FloatPrecMixed/10-12 767ns ± 0% 739ns ± 5% ~ (p=0.151 n=5+5)
FloatPrecMixed/100-12 4.93µs ± 2% 3.73µs ± 1% -24.33% (p=0.008 n=5+5)
FloatPrecMixed/1000-12 90.9µs ±11% 70.3µs ± 2% -22.66% (p=0.008 n=5+5)
FloatPrecMixed/10000-12 2.30ms ± 0% 1.92ms ± 1% -16.41% (p=0.008 n=5+5)
FloatPrecMixed/100000-12 87.1ms ± 1% 68.5ms ± 1% -21.42% (p=0.008 n=5+5)
FloatPrecMixed/1000000-12 4.09s ± 1% 3.58s ± 1% -12.35% (p=0.008 n=5+5)
FloatPrecInexact/1-12 92.4ns ± 0% 66.1ns ± 5% -28.41% (p=0.008 n=5+5)
FloatPrecInexact/10-12 118ns ± 0% 91ns ± 1% -23.14% (p=0.016 n=5+4)
FloatPrecInexact/100-12 310ns ±10% 244ns ± 1% -21.32% (p=0.008 n=5+5)
FloatPrecInexact/1000-12 952ns ± 1% 828ns ± 1% -12.96% (p=0.016 n=4+5)
FloatPrecInexact/10000-12 6.71µs ± 1% 6.25µs ± 3% -6.83% (p=0.008 n=5+5)
FloatPrecInexact/100000-12 66.1µs ± 1% 61.2µs ± 1% -7.45% (p=0.008 n=5+5)
FloatPrecInexact/1000000-12 635µs ± 2% 584µs ± 1% -7.97% (p=0.008 n=5+5)
Change-Id: I3aa67b49a042814a3286ee8306fbed36709cbb6e
Reviewed-on: https://go-review.googlesource.com/c/go/+/542756
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Robert Griesemer <gri@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
|
|
Follow-up on CL 539299: the updated comment requested in the
feedback on that CL was not incorporated there.
For #50489.
Change-Id: Ib035400038b1d11532f62055b5cdb382ab75654c
Reviewed-on: https://go-review.googlesource.com/c/go/+/542115
Run-TryBot: Robert Griesemer <gri@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
goos: darwin
goarch: amd64
pkg: math/big
cpu: Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz
BenchmarkFloatPrecExact/1-12 9380685 125.0 ns/op
BenchmarkFloatPrecExact/10-12 3780493 321.2 ns/op
BenchmarkFloatPrecExact/100-12 698272 1679 ns/op
BenchmarkFloatPrecExact/1000-12 117975 9113 ns/op
BenchmarkFloatPrecExact/10000-12 5913 192768 ns/op
BenchmarkFloatPrecExact/100000-12 164 7401817 ns/op
BenchmarkFloatPrecExact/1000000-12 4 293568523 ns/op
BenchmarkFloatPrecInexact/1-12 12836612 91.26 ns/op
BenchmarkFloatPrecInexact/10-12 10144908 114.9 ns/op
BenchmarkFloatPrecInexact/100-12 4121931 297.3 ns/op
BenchmarkFloatPrecInexact/1000-12 1275886 927.7 ns/op
BenchmarkFloatPrecInexact/10000-12 170392 6546 ns/op
BenchmarkFloatPrecInexact/100000-12 18307 65232 ns/op
BenchmarkFloatPrecInexact/1000000-12 1701 621412 ns/op
Fixes #50489.
Change-Id: Ic952f00e35d42f2470ecab53df712721997eac94
Reviewed-on: https://go-review.googlesource.com/c/go/+/539299
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
Run-TryBot: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
|
|
These slowdowns are because we are now using PCG instead of the
Mitchell/Reeds LFSR for the benchmarks. PCG is in fact a bit slower
(but generates statistically far better random numbers).
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 01ff938549.amd64 │ afa459a2f0.amd64 │
│ sec/op │ sec/op vs base │
PCG_DXSM-32 1.490n ± 0% 1.488n ± 2% ~ (p=0.408 n=20)
SourceUint64-32 1.352n ± 1% 1.450n ± 3% +7.21% (p=0.000 n=20)
GlobalInt64-32 2.083n ± 0% 2.067n ± 2% ~ (p=0.223 n=20)
GlobalInt64Parallel-32 0.1035n ± 1% 0.1044n ± 2% ~ (p=0.010 n=20)
GlobalUint64-32 2.038n ± 1% 2.085n ± 0% +2.28% (p=0.000 n=20)
GlobalUint64Parallel-32 0.1006n ± 1% 0.1008n ± 1% ~ (p=0.733 n=20)
Int64-32 1.687n ± 2% 1.779n ± 1% +5.48% (p=0.000 n=20)
Uint64-32 1.674n ± 2% 1.854n ± 2% +10.69% (p=0.000 n=20)
GlobalIntN1000-32 3.135n ± 1% 3.140n ± 3% ~ (p=0.794 n=20)
IntN1000-32 2.478n ± 1% 2.496n ± 1% +0.73% (p=0.006 n=20)
Int64N1000-32 2.455n ± 1% 2.510n ± 2% +2.22% (p=0.000 n=20)
Int64N1e8-32 2.467n ± 2% 2.471n ± 2% ~ (p=0.050 n=20)
Int64N1e9-32 2.454n ± 1% 2.488n ± 2% +1.39% (p=0.000 n=20)
Int64N2e9-32 2.482n ± 1% 2.478n ± 2% ~ (p=0.066 n=20)
Int64N1e18-32 3.349n ± 2% 3.088n ± 1% -7.81% (p=0.000 n=20)
Int64N2e18-32 3.537n ± 1% 3.493n ± 1% -1.24% (p=0.002 n=20)
Int64N4e18-32 4.917n ± 0% 5.060n ± 2% +2.91% (p=0.000 n=20)
Int32N1000-32 2.386n ± 1% 2.620n ± 1% +9.76% (p=0.000 n=20)
Int32N1e8-32 2.366n ± 1% 2.652n ± 0% +12.11% (p=0.000 n=20)
Int32N1e9-32 2.355n ± 2% 2.644n ± 1% +12.32% (p=0.000 n=20)
Int32N2e9-32 2.371n ± 1% 2.619n ± 2% +10.48% (p=0.000 n=20)
Float32-32 2.245n ± 2% 2.261n ± 1% ~ (p=0.625 n=20)
Float64-32 2.235n ± 1% 2.241n ± 2% ~ (p=0.393 n=20)
ExpFloat64-32 3.813n ± 3% 3.716n ± 1% -2.53% (p=0.000 n=20)
NormFloat64-32 3.652n ± 2% 3.718n ± 1% +1.79% (p=0.006 n=20)
Perm3-32 33.12n ± 3% 34.11n ± 2% ~ (p=0.021 n=20)
Perm30-32 205.1n ± 1% 200.6n ± 0% -2.17% (p=0.000 n=20)
Perm30ViaShuffle-32 110.8n ± 1% 109.7n ± 1% -0.99% (p=0.002 n=20)
ShuffleOverhead-32 113.0n ± 1% 107.2n ± 1% -5.09% (p=0.000 n=20)
Concurrent-32 2.100n ± 0% 2.108n ± 6% ~ (p=0.103 n=20)
goos: darwin
goarch: arm64
pkg: math/rand/v2
│ 01ff938549.arm64 │ afa459a2f0.arm64 │
│ sec/op │ sec/op vs base │
PCG_DXSM-8 2.531n ± 0% 2.531n ± 0% ~ (p=0.763 n=20)
SourceUint64-8 2.258n ± 1% 2.531n ± 0% +12.09% (p=0.000 n=20)
GlobalInt64-8 2.167n ± 0% 2.177n ± 1% ~ (p=0.213 n=20)
GlobalInt64Parallel-8 0.4310n ± 0% 0.4319n ± 0% ~ (p=0.027 n=20)
GlobalUint64-8 2.182n ± 1% 2.185n ± 1% ~ (p=0.683 n=20)
GlobalUint64Parallel-8 0.4297n ± 0% 0.4295n ± 1% ~ (p=0.941 n=20)
Int64-8 2.472n ± 1% 4.104n ± 0% +66.00% (p=0.000 n=20)
Uint64-8 2.449n ± 1% 4.080n ± 0% +66.60% (p=0.000 n=20)
GlobalIntN1000-8 2.814n ± 2% 2.814n ± 1% ~ (p=0.972 n=20)
IntN1000-8 2.998n ± 2% 4.140n ± 0% +38.09% (p=0.000 n=20)
Int64N1000-8 2.949n ± 2% 4.139n ± 0% +40.35% (p=0.000 n=20)
Int64N1e8-8 2.953n ± 2% 4.140n ± 0% +40.22% (p=0.000 n=20)
Int64N1e9-8 2.950n ± 0% 4.139n ± 0% +40.32% (p=0.000 n=20)
Int64N2e9-8 2.946n ± 2% 4.140n ± 0% +40.53% (p=0.000 n=20)
Int64N1e18-8 3.779n ± 1% 5.273n ± 0% +39.52% (p=0.000 n=20)
Int64N2e18-8 4.370n ± 1% 6.059n ± 0% +38.65% (p=0.000 n=20)
Int64N4e18-8 6.544n ± 1% 8.803n ± 0% +34.52% (p=0.000 n=20)
Int32N1000-8 2.950n ± 0% 4.131n ± 0% +40.06% (p=0.000 n=20)
Int32N1e8-8 2.950n ± 2% 4.131n ± 0% +40.03% (p=0.000 n=20)
Int32N1e9-8 2.951n ± 2% 4.131n ± 0% +39.99% (p=0.000 n=20)
Int32N2e9-8 2.950n ± 2% 4.131n ± 0% +40.03% (p=0.000 n=20)
Float32-8 3.441n ± 0% 4.110n ± 0% +19.44% (p=0.000 n=20)
Float64-8 3.442n ± 0% 4.104n ± 0% +19.24% (p=0.000 n=20)
ExpFloat64-8 4.481n ± 0% 5.338n ± 0% +19.11% (p=0.000 n=20)
NormFloat64-8 4.725n ± 0% 5.731n ± 0% +21.28% (p=0.000 n=20)
Perm3-8 26.55n ± 0% 26.62n ± 0% +0.28% (p=0.000 n=20)
Perm30-8 181.9n ± 0% 194.6n ± 2% +6.98% (p=0.000 n=20)
Perm30ViaShuffle-8 142.9n ± 0% 156.4n ± 0% +9.45% (p=0.000 n=20)
ShuffleOverhead-8 120.8n ± 2% 125.8n ± 0% +4.10% (p=0.000 n=20)
Concurrent-8 2.421n ± 6% 2.654n ± 6% +9.67% (p=0.002 n=20)
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 01ff938549.386 │ afa459a2f0.386 │
│ sec/op │ sec/op vs base │
PCG_DXSM-32 7.613n ± 1% 7.793n ± 2% +2.38% (p=0.000 n=20)
SourceUint64-32 2.069n ± 0% 7.680n ± 1% +271.19% (p=0.000 n=20)
GlobalInt64-32 3.456n ± 1% 3.474n ± 3% ~ (p=0.654 n=20)
GlobalInt64Parallel-32 0.3252n ± 0% 0.3253n ± 0% ~ (p=0.952 n=20)
GlobalUint64-32 3.573n ± 1% 3.433n ± 2% -3.92% (p=0.000 n=20)
GlobalUint64Parallel-32 0.3159n ± 0% 0.3156n ± 0% ~ (p=0.223 n=20)
Int64-32 2.562n ± 2% 7.707n ± 1% +200.74% (p=0.000 n=20)
Uint64-32 2.592n ± 0% 7.714n ± 1% +197.65% (p=0.000 n=20)
GlobalIntN1000-32 6.266n ± 2% 6.236n ± 1% ~ (p=0.039 n=20)
IntN1000-32 4.724n ± 2% 10.410n ± 1% +120.39% (p=0.000 n=20)
Int64N1000-32 5.490n ± 2% 10.975n ± 2% +99.89% (p=0.000 n=20)
Int64N1e8-32 5.513n ± 2% 10.980n ± 1% +99.15% (p=0.000 n=20)
Int64N1e9-32 5.476n ± 1% 10.950n ± 0% +99.96% (p=0.000 n=20)
Int64N2e9-32 5.501n ± 2% 11.110n ± 1% +101.96% (p=0.000 n=20)
Int64N1e18-32 9.043n ± 2% 15.180n ± 2% +67.86% (p=0.000 n=20)
Int64N2e18-32 9.601n ± 2% 15.610n ± 1% +62.60% (p=0.000 n=20)
Int64N4e18-32 12.00n ± 1% 19.23n ± 2% +60.14% (p=0.000 n=20)
Int32N1000-32 4.829n ± 2% 10.345n ± 1% +114.25% (p=0.000 n=20)
Int32N1e8-32 4.825n ± 2% 10.330n ± 1% +114.09% (p=0.000 n=20)
Int32N1e9-32 4.830n ± 2% 10.350n ± 1% +114.26% (p=0.000 n=20)
Int32N2e9-32 4.750n ± 2% 10.345n ± 1% +117.81% (p=0.000 n=20)
Float32-32 10.89n ± 4% 13.57n ± 1% +24.61% (p=0.000 n=20)
Float64-32 19.60n ± 4% 22.95n ± 4% +17.12% (p=0.000 n=20)
ExpFloat64-32 12.96n ± 3% 15.23n ± 2% +17.47% (p=0.000 n=20)
NormFloat64-32 7.516n ± 1% 13.780n ± 1% +83.34% (p=0.000 n=20)
Perm3-32 36.78n ± 2% 46.62n ± 2% +26.72% (p=0.000 n=20)
Perm30-32 238.9n ± 2% 400.7n ± 1% +67.73% (p=0.000 n=20)
Perm30ViaShuffle-32 189.7n ± 2% 350.5n ± 1% +84.79% (p=0.000 n=20)
ShuffleOverhead-32 159.8n ± 1% 326.0n ± 2% +104.01% (p=0.000 n=20)
Concurrent-32 3.286n ± 1% 3.290n ± 0% ~ (p=0.743 n=20)
On the other hand, compared to the original "update benchmarks" CL,
the cleanups we've made more than compensate for PCG being a bit
slower than LFSR, at least on 64-bit x86. ARM64 (Apple M1) is a bit
slower: perhaps the 64x64→128 multiply is slower there for some reason.
386 is noticeably slower, but it's also a non-SSA backend.
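To make the 64x64→128 multiply concrete, here is a sketch of a 128-bit multiply-and-add state update built on math/bits. The type state128, the function mulAdd128, and the constants in main are hypothetical and are not the constants PCG uses. Whether the multiply accounts for the arm64 numbers is not established here; the sketch only shows where it appears in a 128-bit update.

```go
package main

import (
	"fmt"
	"math/bits"
)

// state128 holds a 128-bit value as two 64-bit words.
type state128 struct{ hi, lo uint64 }

// mulAdd128 returns s*m + inc modulo 2^128.
func mulAdd128(s, m, inc state128) state128 {
	// Full 64x64→128 product of the two low words.
	hi, lo := bits.Mul64(s.lo, m.lo)
	// The cross terms only contribute to the high word;
	// the hi*hi term falls entirely beyond 128 bits.
	hi += s.hi*m.lo + s.lo*m.hi
	// 128-bit addition with carry from the low word.
	lo, carry := bits.Add64(lo, inc.lo, 0)
	hi, _ = bits.Add64(hi, inc.hi, carry)
	return state128{hi, lo}
}

func main() {
	// Illustrative constants only; NOT the PCG multiplier or increment.
	mul := state128{hi: 0x1, lo: 0x5851f42d4c957f2d}
	inc := state128{hi: 0x0, lo: 0x14057b7ef767814f}

	s := state128{lo: 42}
	for i := 0; i < 3; i++ {
		s = mulAdd128(s, mul, inc)
		fmt.Printf("%016x%016x\n", s.hi, s.lo)
	}
}
```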
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 220860f76f.amd64 │ afa459a2f0.amd64 │
│ sec/op │ sec/op vs base │
SourceUint64-32 1.555n ± 1% 1.450n ± 3% -6.78% (p=0.000 n=20)
GlobalInt64-32 2.071n ± 1% 2.067n ± 2% ~ (p=0.673 n=20)
GlobalInt63Parallel-32 0.1023n ± 1%
GlobalInt64Parallel-32 0.1044n ± 2%
GlobalUint64-32 5.193n ± 1% 2.085n ± 0% -59.86% (p=0.000 n=20)
GlobalUint64Parallel-32 0.2341n ± 0% 0.1008n ± 1% -56.93% (p=0.000 n=20)
Int64-32 2.056n ± 2% 1.779n ± 1% -13.47% (p=0.000 n=20)
Uint64-32 2.077n ± 2% 1.854n ± 2% -10.74% (p=0.000 n=20)
GlobalIntN1000-32 4.077n ± 2% 3.140n ± 3% -22.98% (p=0.000 n=20)
IntN1000-32 3.476n ± 2% 2.496n ± 1% -28.19% (p=0.000 n=20)
Int64N1000-32 3.059n ± 1% 2.510n ± 2% -17.96% (p=0.000 n=20)
Int64N1e8-32 2.942n ± 1% 2.471n ± 2% -15.98% (p=0.000 n=20)
Int64N1e9-32 2.932n ± 1% 2.488n ± 2% -15.14% (p=0.000 n=20)
Int64N2e9-32 2.925n ± 1% 2.478n ± 2% -15.30% (p=0.000 n=20)
Int64N1e18-32 3.116n ± 1% 3.088n ± 1% ~ (p=0.013 n=20)
Int64N2e18-32 4.067n ± 1% 3.493n ± 1% -14.11% (p=0.000 n=20)
Int64N4e18-32 4.054n ± 1% 5.060n ± 2% +24.80% (p=0.000 n=20)
Int32N1000-32 2.951n ± 1% 2.620n ± 1% -11.22% (p=0.000 n=20)
Int32N1e8-32 3.102n ± 1% 2.652n ± 0% -14.50% (p=0.000 n=20)
Int32N1e9-32 3.535n ± 1% 2.644n ± 1% -25.20% (p=0.000 n=20)
Int32N2e9-32 3.514n ± 1% 2.619n ± 2% -25.47% (p=0.000 n=20)
Float32-32 2.760n ± 1% 2.261n ± 1% -18.06% (p=0.000 n=20)
Float64-32 2.284n ± 1% 2.241n ± 2% ~ (p=0.016 n=20)
ExpFloat64-32 3.757n ± 1% 3.716n ± 1% ~ (p=0.034 n=20)
NormFloat64-32 3.837n ± 1% 3.718n ± 1% -3.09% (p=0.000 n=20)
Perm3-32 35.23n ± 2% 34.11n ± 2% -3.19% (p=0.000 n=20)
Perm30-32 208.8n ± 1% 200.6n ± 0% -3.93% (p=0.000 n=20)
Perm30ViaShuffle-32 111.7n ± 1% 109.7n ± 1% -1.84% (p=0.000 n=20)
ShuffleOverhead-32 101.1n ± 1% 107.2n ± 1% +6.03% (p=0.000 n=20)
Concurrent-32 2.108n ± 7% 2.108n ± 6% ~ (p=0.644 n=20)
PCG_DXSM-32 1.488n ± 2%
goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
│ 220860f76f.arm64 │ afa459a2f0.arm64 │
│ sec/op │ sec/op vs base │
SourceUint64-8 2.316n ± 1% 2.531n ± 0% +9.33% (p=0.000 n=20)
GlobalInt64-8 2.183n ± 1% 2.177n ± 1% ~ (p=0.533 n=20)
GlobalInt63Parallel-8 0.4331n ± 0%
GlobalInt64Parallel-8 0.4319n ± 0%
GlobalUint64-8 4.377n ± 2% 2.185n ± 1% -50.07% (p=0.000 n=20)
GlobalUint64Parallel-8 0.9237n ± 0% 0.4295n ± 1% -53.50% (p=0.000 n=20)
Int64-8 2.538n ± 1% 4.104n ± 0% +61.68% (p=0.000 n=20)
Uint64-8 2.604n ± 1% 4.080n ± 0% +56.68% (p=0.000 n=20)
GlobalIntN1000-8 3.857n ± 2% 2.814n ± 1% -27.04% (p=0.000 n=20)
IntN1000-8 3.822n ± 2% 4.140n ± 0% +8.32% (p=0.000 n=20)
Int64N1000-8 3.318n ± 0% 4.139n ± 0% +24.74% (p=0.000 n=20)
Int64N1e8-8 3.349n ± 1% 4.140n ± 0% +23.64% (p=0.000 n=20)
Int64N1e9-8 3.317n ± 2% 4.139n ± 0% +24.80% (p=0.000 n=20)
Int64N2e9-8 3.317n ± 2% 4.140n ± 0% +24.81% (p=0.000 n=20)
Int64N1e18-8 3.542n ± 1% 5.273n ± 0% +48.85% (p=0.000 n=20)
Int64N2e18-8 5.087n ± 0% 6.059n ± 0% +19.12% (p=0.000 n=20)
Int64N4e18-8 5.084n ± 0% 8.803n ± 0% +73.16% (p=0.000 n=20)
Int32N1000-8 3.208n ± 2% 4.131n ± 0% +28.79% (p=0.000 n=20)
Int32N1e8-8 3.610n ± 1% 4.131n ± 0% +14.43% (p=0.000 n=20)
Int32N1e9-8 4.235n ± 0% 4.131n ± 0% -2.44% (p=0.000 n=20)
Int32N2e9-8 4.229n ± 1% 4.131n ± 0% -2.33% (p=0.000 n=20)
Float32-8 3.468n ± 0% 4.110n ± 0% +18.50% (p=0.000 n=20)
Float64-8 3.447n ± 0% 4.104n ± 0% +19.05% (p=0.000 n=20)
ExpFloat64-8 4.567n ± 0% 5.338n ± 0% +16.86% (p=0.000 n=20)
NormFloat64-8 4.821n ± 0% 5.731n ± 0% +18.89% (p=0.000 n=20)
Perm3-8 28.89n ± 0% 26.62n ± 0% -7.84% (p=0.000 n=20)
Perm30-8 175.7n ± 0% 194.6n ± 2% +10.76% (p=0.000 n=20)
Perm30ViaShuffle-8 153.5n ± 0% 156.4n ± 0% +1.86% (p=0.000 n=20)
ShuffleOverhead-8 119.8n ± 1% 125.8n ± 0% +4.97% (p=0.000 n=20)
Concurrent-8 2.433n ± 3% 2.654n ± 6% +9.13% (p=0.001 n=20)
PCG_DXSM-8 2.531n ± 0%
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 220860f76f.386 │ afa459a2f0.386 │
│ sec/op │ sec/op vs base │
SourceUint64-32 2.370n ± 1% 7.680n ± 1% +224.05% (p=0.000 n=20)
GlobalInt64-32 3.569n ± 1% 3.474n ± 3% -2.66% (p=0.001 n=20)
GlobalInt63Parallel-32 0.3221n ± 1%
GlobalInt64Parallel-32 0.3253n ± 0%
GlobalUint64-32 8.797n ± 10% 3.433n ± 2% -60.98% (p=0.000 n=20)
GlobalUint64Parallel-32 0.6351n ± 0% 0.3156n ± 0% -50.31% (p=0.000 n=20)
Int64-32 2.612n ± 2% 7.707n ± 1% +195.04% (p=0.000 n=20)
Uint64-32 3.350n ± 1% 7.714n ± 1% +130.25% (p=0.000 n=20)
GlobalIntN1000-32 5.892n ± 1% 6.236n ± 1% +5.82% (p=0.000 n=20)
IntN1000-32 4.546n ± 1% 10.410n ± 1% +128.97% (p=0.000 n=20)
Int64N1000-32 14.59n ± 1% 10.97n ± 2% -24.75% (p=0.000 n=20)
Int64N1e8-32 14.76n ± 2% 10.98n ± 1% -25.58% (p=0.000 n=20)
Int64N1e9-32 16.57n ± 1% 10.95n ± 0% -33.90% (p=0.000 n=20)
Int64N2e9-32 14.54n ± 1% 11.11n ± 1% -23.62% (p=0.000 n=20)
Int64N1e18-32 16.14n ± 1% 15.18n ± 2% -5.95% (p=0.000 n=20)
Int64N2e18-32 18.10n ± 1% 15.61n ± 1% -13.73% (p=0.000 n=20)
Int64N4e18-32 18.65n ± 1% 19.23n ± 2% +3.08% (p=0.000 n=20)
Int32N1000-32 3.560n ± 1% 10.345n ± 1% +190.55% (p=0.000 n=20)
Int32N1e8-32 3.770n ± 2% 10.330n ± 1% +174.01% (p=0.000 n=20)
Int32N1e9-32 4.098n ± 0% 10.350n ± 1% +152.53% (p=0.000 n=20)
Int32N2e9-32 4.179n ± 1% 10.345n ± 1% +147.52% (p=0.000 n=20)
Float32-32 21.18n ± 4% 13.57n ± 1% -35.93% (p=0.000 n=20)
Float64-32 20.60n ± 2% 22.95n ± 4% +11.41% (p=0.000 n=20)
ExpFloat64-32 13.07n ± 0% 15.23n ± 2% +16.48% (p=0.000 n=20)
NormFloat64-32 7.738n ± 2% 13.780n ± 1% +78.08% (p=0.000 n=20)
Perm3-32 36.73n ± 1% 46.62n ± 2% +26.91% (p=0.000 n=20)
Perm30-32 211.9n ± 1% 400.7n ± 1% +89.05% (p=0.000 n=20)
Perm30ViaShuffle-32 165.2n ± 1% 350.5n ± 1% +112.20% (p=0.000 n=20)
ShuffleOverhead-32 133.9n ± 1% 326.0n ± 2% +143.37% (p=0.000 n=20)
Concurrent-32 3.287n ± 2% 3.290n ± 0% ~ (p=0.365 n=20)
PCG_DXSM-32 7.793n ± 2%
For #61716.
Change-Id: I4e9c0525b5f84a2ac46f23da9e365495e2d05777
Reviewed-on: https://go-review.googlesource.com/c/go/+/502506
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
For the original math/rand, we ported Plan 9's random number
generator, which was a refinement by Ken Thompson of an algorithm
by Don Mitchell and Jim Reeds, which Mitchell in turn recalls as
having been derived from an algorithm by Marsaglia. At its core,
it is an additive lagged Fibonacci generator (ALFG).
Whatever the details of the history, this generator is nowhere
near the current state of the art for simple, pseudo-random
generators.
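For readers unfamiliar with the term, a toy additive lagged Fibonacci generator can be sketched as follows. This is not the Plan 9 or math/rand code: the short lag of 273 and the seeding routine are assumptions made for the sketch; only the 607-word state size comes from this message.

```go
package main

import "fmt"

const (
	lagLong  = 607 // number of state words
	lagShort = 273 // assumed second lag, for illustration
)

// alfg computes x[n] = x[n-607] + x[n-273] (mod 2^64) over a ring buffer.
type alfg struct {
	state [lagLong]uint64
	pos   int // index of the oldest word, x[n-607]
}

// newALFG fills the state with a simple 64-bit LCG.
// (Toy seeding, chosen only to make the sketch self-contained.)
func newALFG(seed uint64) *alfg {
	g := &alfg{}
	x := seed
	for i := range g.state {
		x = x*6364136223846793005 + 1442695040888963407
		g.state[i] = x
	}
	return g
}

// Uint64 adds the two lagged words, stores the sum over the oldest
// word, and advances the ring position.
func (g *alfg) Uint64() uint64 {
	tap := (g.pos + lagLong - lagShort) % lagLong // index of x[n-273]
	x := g.state[g.pos] + g.state[tap]
	g.state[g.pos] = x
	g.pos = (g.pos + 1) % lagLong
	return x
}

func main() {
	g := newALFG(1)
	fmt.Println(g.Uint64(), g.Uint64(), g.Uint64())
}
```

Each output costs one add, but the generator carries all 607 words of state.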
This CL adds an implementation of Melissa O'Neill's PCG, specifically
the variant PCG-DXSM, which she defined after writing the PCG paper
and which is now the default in NumPy. The state update is slightly slower
(a few multiplies and adds, instead of a few adds), but the state
is dramatically smaller (2 words instead of 607). The statistical
output properties are better too.
A followup CL will delete the old generator.
Adding PCG is the only change here, so no existing benchmarks should be
affected. The results are included anyway as further evidence for caution.
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 8993506f2f.amd64 │ 01ff938549.amd64 │
│ sec/op │ sec/op vs base │
SourceUint64-32 1.325n ± 1% 1.352n ± 1% +2.00% (p=0.000 n=20)
GlobalInt64-32 2.240n ± 1% 2.083n ± 0% -7.03% (p=0.000 n=20)
GlobalInt64Parallel-32 0.1041n ± 1% 0.1035n ± 1% ~ (p=0.064 n=20)
GlobalUint64-32 2.072n ± 3% 2.038n ± 1% ~ (p=0.089 n=20)
GlobalUint64Parallel-32 0.1008n ± 1% 0.1006n ± 1% ~ (p=0.804 n=20)
Int64-32 1.716n ± 1% 1.687n ± 2% ~ (p=0.045 n=20)
Uint64-32 1.665n ± 1% 1.674n ± 2% ~ (p=0.878 n=20)
GlobalIntN1000-32 3.335n ± 1% 3.135n ± 1% -6.00% (p=0.000 n=20)
IntN1000-32 2.484n ± 1% 2.478n ± 1% ~ (p=0.085 n=20)
Int64N1000-32 2.502n ± 2% 2.455n ± 1% -1.88% (p=0.002 n=20)
Int64N1e8-32 2.484n ± 2% 2.467n ± 2% ~ (p=0.048 n=20)
Int64N1e9-32 2.502n ± 0% 2.454n ± 1% -1.92% (p=0.000 n=20)
Int64N2e9-32 2.502n ± 0% 2.482n ± 1% -0.76% (p=0.000 n=20)
Int64N1e18-32 3.201n ± 1% 3.349n ± 2% +4.62% (p=0.000 n=20)
Int64N2e18-32 3.504n ± 1% 3.537n ± 1% ~ (p=0.185 n=20)
Int64N4e18-32 4.873n ± 1% 4.917n ± 0% +0.90% (p=0.000 n=20)
Int32N1000-32 2.639n ± 1% 2.386n ± 1% -9.57% (p=0.000 n=20)
Int32N1e8-32 2.686n ± 2% 2.366n ± 1% -11.91% (p=0.000 n=20)
Int32N1e9-32 2.636n ± 1% 2.355n ± 2% -10.70% (p=0.000 n=20)
Int32N2e9-32 2.660n ± 1% 2.371n ± 1% -10.88% (p=0.000 n=20)
Float32-32 2.261n ± 1% 2.245n ± 2% ~ (p=0.752 n=20)
Float64-32 2.280n ± 1% 2.235n ± 1% -1.97% (p=0.007 n=20)
ExpFloat64-32 3.891n ± 1% 3.813n ± 3% ~ (p=0.087 n=20)
NormFloat64-32 3.711n ± 1% 3.652n ± 2% ~ (p=0.021 n=20)
Perm3-32 32.60n ± 2% 33.12n ± 3% ~ (p=0.107 n=20)
Perm30-32 204.2n ± 0% 205.1n ± 1% ~ (p=0.358 n=20)
Perm30ViaShuffle-32 121.7n ± 2% 110.8n ± 1% -8.96% (p=0.000 n=20)
ShuffleOverhead-32 106.2n ± 2% 113.0n ± 1% +6.36% (p=0.000 n=20)
Concurrent-32 2.190n ± 5% 2.100n ± 0% -4.13% (p=0.001 n=20)
PCG_DXSM-32 1.490n ± 0%
goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
│ 8993506f2f.arm64 │ 01ff938549.arm64 │
│ sec/op │ sec/op vs base │
SourceUint64-8 2.271n ± 0% 2.258n ± 1% ~ (p=0.167 n=20)
GlobalInt64-8 2.161n ± 1% 2.167n ± 0% ~ (p=0.693 n=20)
GlobalInt64Parallel-8 0.4303n ± 0% 0.4310n ± 0% ~ (p=0.051 n=20)
GlobalUint64-8 2.164n ± 1% 2.182n ± 1% ~ (p=0.042 n=20)
GlobalUint64Parallel-8 0.4287n ± 0% 0.4297n ± 0% ~ (p=0.082 n=20)
Int64-8 2.478n ± 1% 2.472n ± 1% ~ (p=0.151 n=20)
Uint64-8 2.460n ± 1% 2.449n ± 1% ~ (p=0.013 n=20)
GlobalIntN1000-8 2.814n ± 2% 2.814n ± 2% ~ (p=0.821 n=20)
IntN1000-8 3.003n ± 2% 2.998n ± 2% ~ (p=0.024 n=20)
Int64N1000-8 2.954n ± 0% 2.949n ± 2% ~ (p=0.192 n=20)
Int64N1e8-8 2.956n ± 0% 2.953n ± 2% ~ (p=0.109 n=20)
Int64N1e9-8 3.325n ± 0% 2.950n ± 0% -11.26% (p=0.000 n=20)
Int64N2e9-8 2.956n ± 2% 2.946n ± 2% ~ (p=0.027 n=20)
Int64N1e18-8 3.780n ± 1% 3.779n ± 1% ~ (p=0.815 n=20)
Int64N2e18-8 4.385n ± 0% 4.370n ± 1% ~ (p=0.402 n=20)
Int64N4e18-8 6.527n ± 0% 6.544n ± 1% ~ (p=0.140 n=20)
Int32N1000-8 2.964n ± 1% 2.950n ± 0% -0.47% (p=0.002 n=20)
Int32N1e8-8 2.964n ± 1% 2.950n ± 2% ~ (p=0.013 n=20)
Int32N1e9-8 2.963n ± 2% 2.951n ± 2% ~ (p=0.062 n=20)
Int32N2e9-8 2.961n ± 2% 2.950n ± 2% -0.37% (p=0.002 n=20)
Float32-8 3.442n ± 0% 3.441n ± 0% ~ (p=0.211 n=20)
Float64-8 3.442n ± 0% 3.442n ± 0% ~ (p=0.067 n=20)
ExpFloat64-8 4.472n ± 0% 4.481n ± 0% +0.20% (p=0.000 n=20)
NormFloat64-8 4.734n ± 0% 4.725n ± 0% -0.19% (p=0.003 n=20)
Perm3-8 26.55n ± 0% 26.55n ± 0% ~ (p=0.833 n=20)
Perm30-8 181.9n ± 0% 181.9n ± 0% -0.03% (p=0.004 n=20)
Perm30ViaShuffle-8 143.1n ± 0% 142.9n ± 0% ~ (p=0.204 n=20)
ShuffleOverhead-8 120.6n ± 1% 120.8n ± 2% ~ (p=0.102 n=20)
Concurrent-8 2.357n ± 2% 2.421n ± 6% ~ (p=0.016 n=20)
PCG_DXSM-8 2.531n ± 0%
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 8993506f2f.386 │ 01ff938549.386 │
│ sec/op │ sec/op vs base │
SourceUint64-32 2.102n ± 2% 2.069n ± 0% ~ (p=0.021 n=20)
GlobalInt64-32 3.542n ± 2% 3.456n ± 1% -2.44% (p=0.001 n=20)
GlobalInt64Parallel-32 0.3202n ± 0% 0.3252n ± 0% +1.56% (p=0.000 n=20)
GlobalUint64-32 3.507n ± 1% 3.573n ± 1% +1.87% (p=0.000 n=20)
GlobalUint64Parallel-32 0.3170n ± 1% 0.3159n ± 0% ~ (p=0.167 n=20)
Int64-32 2.516n ± 1% 2.562n ± 2% ~ (p=0.016 n=20)
Uint64-32 2.544n ± 1% 2.592n ± 0% +1.85% (p=0.000 n=20)
GlobalIntN1000-32 6.237n ± 1% 6.266n ± 2% ~ (p=0.268 n=20)
IntN1000-32 4.670n ± 2% 4.724n ± 2% ~ (p=0.644 n=20)
Int64N1000-32 5.412n ± 1% 5.490n ± 2% ~ (p=0.159 n=20)
Int64N1e8-32 5.414n ± 2% 5.513n ± 2% ~ (p=0.129 n=20)
Int64N1e9-32 5.473n ± 1% 5.476n ± 1% ~ (p=0.723 n=20)
Int64N2e9-32 5.487n ± 1% 5.501n ± 2% ~ (p=0.481 n=20)
Int64N1e18-32 8.901n ± 2% 9.043n ± 2% ~ (p=0.330 n=20)
Int64N2e18-32 9.521n ± 1% 9.601n ± 2% ~ (p=0.703 n=20)
Int64N4e18-32 11.92n ± 1% 12.00n ± 1% ~ (p=0.489 n=20)
Int32N1000-32 4.785n ± 1% 4.829n ± 2% ~ (p=0.402 n=20)
Int32N1e8-32 4.748n ± 1% 4.825n ± 2% ~ (p=0.218 n=20)
Int32N1e9-32 4.810n ± 1% 4.830n ± 2% ~ (p=0.794 n=20)
Int32N2e9-32 4.812n ± 1% 4.750n ± 2% ~ (p=0.057 n=20)
Float32-32 10.48n ± 4% 10.89n ± 4% ~ (p=0.162 n=20)
Float64-32 19.79n ± 3% 19.60n ± 4% ~ (p=0.668 n=20)
ExpFloat64-32 12.91n ± 3% 12.96n ± 3% ~ (p=1.000 n=20)
NormFloat64-32 7.462n ± 1% 7.516n ± 1% ~ (p=0.051 n=20)
Perm3-32 35.98n ± 2% 36.78n ± 2% ~ (p=0.033 n=20)
Perm30-32 241.5n ± 1% 238.9n ± 2% ~ (p=0.126 n=20)
Perm30ViaShuffle-32 187.3n ± 2% 189.7n ± 2% ~ (p=0.387 n=20)
ShuffleOverhead-32 160.2n ± 1% 159.8n ± 1% ~ (p=0.256 n=20)
Concurrent-32 3.308n ± 3% 3.286n ± 1% ~ (p=0.038 n=20)
PCG_DXSM-32 7.613n ± 1%
For #61716.
Change-Id: Icb274ca1f782504d658305a40159b4ae6a2f3f1d
Reviewed-on: https://go-review.googlesource.com/c/go/+/502505
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Rob Pike <r@golang.org>
|
|
The compiler says Perm is being inlined into BenchmarkPerm,
and yet BenchmarkPerm30ViaShuffle, which you'd think is the
same code, still runs significantly faster.
The benchmarks are mystifying but this is clearly still a step in
the right direction, since BenchmarkPerm30ViaShuffle is still
the fastest and we avoid having two copies of that logic.
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ e1bbe739fb.amd64 │ 8993506f2f.amd64 │
│ sec/op │ sec/op vs base │
SourceUint64-32 1.316n ± 2% 1.325n ± 1% ~ (p=0.208 n=20)
GlobalInt64-32 2.048n ± 1% 2.240n ± 1% +9.38% (p=0.000 n=20)
GlobalInt64Parallel-32 0.1037n ± 1% 0.1041n ± 1% ~ (p=0.774 n=20)
GlobalUint64-32 2.039n ± 2% 2.072n ± 3% ~ (p=0.115 n=20)
GlobalUint64Parallel-32 0.1013n ± 1% 0.1008n ± 1% ~ (p=0.417 n=20)
Int64-32 1.692n ± 2% 1.716n ± 1% ~ (p=0.122 n=20)
Uint64-32 1.643n ± 2% 1.665n ± 1% ~ (p=0.062 n=20)
GlobalIntN1000-32 3.287n ± 1% 3.335n ± 1% ~ (p=0.147 n=20)
IntN1000-32 2.678n ± 2% 2.484n ± 1% -7.24% (p=0.000 n=20)
Int64N1000-32 2.684n ± 2% 2.502n ± 2% -6.80% (p=0.000 n=20)
Int64N1e8-32 2.663n ± 2% 2.484n ± 2% -6.76% (p=0.000 n=20)
Int64N1e9-32 2.633n ± 1% 2.502n ± 0% -4.98% (p=0.000 n=20)
Int64N2e9-32 2.657n ± 1% 2.502n ± 0% -5.87% (p=0.000 n=20)
Int64N1e18-32 3.125n ± 2% 3.201n ± 1% +2.43% (p=0.000 n=20)
Int64N2e18-32 3.476n ± 1% 3.504n ± 1% +0.83% (p=0.009 n=20)
Int64N4e18-32 4.795n ± 1% 4.873n ± 1% ~ (p=0.106 n=20)
Int32N1000-32 2.485n ± 2% 2.639n ± 1% +6.20% (p=0.000 n=20)
Int32N1e8-32 2.457n ± 1% 2.686n ± 2% +9.34% (p=0.000 n=20)
Int32N1e9-32 2.452n ± 1% 2.636n ± 1% +7.52% (p=0.000 n=20)
Int32N2e9-32 2.453n ± 1% 2.660n ± 1% +8.44% (p=0.000 n=20)
Float32-32 2.254n ± 1% 2.261n ± 1% ~ (p=0.888 n=20)
Float64-32 2.262n ± 1% 2.280n ± 1% ~ (p=0.040 n=20)
ExpFloat64-32 3.777n ± 2% 3.891n ± 1% +3.03% (p=0.000 n=20)
NormFloat64-32 3.606n ± 1% 3.711n ± 1% +2.91% (p=0.000 n=20)
Perm3-32 33.12n ± 2% 32.60n ± 2% ~ (p=0.045 n=20)
Perm30-32 176.1n ± 1% 204.2n ± 0% +15.96% (p=0.000 n=20)
Perm30ViaShuffle-32 109.3n ± 1% 121.7n ± 2% +11.30% (p=0.000 n=20)
ShuffleOverhead-32 112.5n ± 1% 106.2n ± 2% -5.56% (p=0.000 n=20)
Concurrent-32 2.099n ± 0% 2.190n ± 5% +4.36% (p=0.001 n=20)
goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
│ e1bbe739fb.arm64 │ 8993506f2f.arm64 │
│ sec/op │ sec/op vs base │
SourceUint64-8 2.290n ± 1% 2.271n ± 0% ~ (p=0.015 n=20)
GlobalInt64-8 2.180n ± 1% 2.161n ± 1% ~ (p=0.180 n=20)
GlobalInt64Parallel-8 0.4294n ± 0% 0.4303n ± 0% +0.19% (p=0.001 n=20)
GlobalUint64-8 2.170n ± 1% 2.164n ± 1% ~ (p=0.673 n=20)
GlobalUint64Parallel-8 0.4283n ± 0% 0.4287n ± 0% ~ (p=0.128 n=20)
Int64-8 2.481n ± 1% 2.478n ± 1% ~ (p=0.867 n=20)
Uint64-8 2.464n ± 1% 2.460n ± 1% ~ (p=0.763 n=20)
GlobalIntN1000-8 2.814n ± 0% 2.814n ± 2% ~ (p=0.969 n=20)
IntN1000-8 2.934n ± 2% 3.003n ± 2% +2.35% (p=0.000 n=20)
Int64N1000-8 2.957n ± 1% 2.954n ± 0% ~ (p=0.285 n=20)
Int64N1e8-8 2.935n ± 2% 2.956n ± 0% +0.73% (p=0.002 n=20)
Int64N1e9-8 2.935n ± 2% 3.325n ± 0% +13.29% (p=0.000 n=20)
Int64N2e9-8 2.933n ± 4% 2.956n ± 2% ~ (p=0.163 n=20)
Int64N1e18-8 3.781n ± 1% 3.780n ± 1% ~ (p=0.805 n=20)
Int64N2e18-8 4.362n ± 0% 4.385n ± 0% ~ (p=0.077 n=20)
Int64N4e18-8 6.576n ± 1% 6.527n ± 0% ~ (p=0.024 n=20)
Int32N1000-8 2.942n ± 2% 2.964n ± 1% ~ (p=0.073 n=20)
Int32N1e8-8 2.941n ± 1% 2.964n ± 1% ~ (p=0.058 n=20)
Int32N1e9-8 2.938n ± 2% 2.963n ± 2% +0.87% (p=0.003 n=20)
Int32N2e9-8 2.982n ± 2% 2.961n ± 2% ~ (p=0.056 n=20)
Float32-8 3.441n ± 0% 3.442n ± 0% ~ (p=0.030 n=20)
Float64-8 3.441n ± 0% 3.442n ± 0% +0.03% (p=0.001 n=20)
ExpFloat64-8 4.472n ± 0% 4.472n ± 0% ~ (p=0.877 n=20)
NormFloat64-8 4.716n ± 0% 4.734n ± 0% +0.38% (p=0.000 n=20)
Perm3-8 26.66n ± 0% 26.55n ± 0% -0.39% (p=0.000 n=20)
Perm30-8 143.3n ± 0% 181.9n ± 0% +26.97% (p=0.000 n=20)
Perm30ViaShuffle-8 142.9n ± 0% 143.1n ± 0% ~ (p=0.669 n=20)
ShuffleOverhead-8 121.1n ± 1% 120.6n ± 1% -0.41% (p=0.004 n=20)
Concurrent-8 2.379n ± 2% 2.357n ± 2% ~ (p=0.337 n=20)
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ e1bbe739fb.386 │ 8993506f2f.386 │
│ sec/op │ sec/op vs base │
SourceUint64-32 2.087n ± 1% 2.102n ± 2% ~ (p=0.507 n=20)
GlobalInt64-32 3.538n ± 2% 3.542n ± 2% ~ (p=0.425 n=20)
GlobalInt64Parallel-32 0.3207n ± 1% 0.3202n ± 0% ~ (p=0.963 n=20)
GlobalUint64-32 3.543n ± 1% 3.507n ± 1% ~ (p=0.034 n=20)
GlobalUint64Parallel-32 0.3170n ± 0% 0.3170n ± 1% ~ (p=0.920 n=20)
Int64-32 2.548n ± 1% 2.516n ± 1% ~ (p=0.139 n=20)
Uint64-32 2.565n ± 2% 2.544n ± 1% ~ (p=0.394 n=20)
GlobalIntN1000-32 6.300n ± 1% 6.237n ± 1% ~ (p=0.029 n=20)
IntN1000-32 4.750n ± 0% 4.670n ± 2% ~ (p=0.034 n=20)
Int64N1000-32 5.515n ± 2% 5.412n ± 1% -1.86% (p=0.009 n=20)
Int64N1e8-32 5.527n ± 0% 5.414n ± 2% -2.05% (p=0.002 n=20)
Int64N1e9-32 5.531n ± 2% 5.473n ± 1% ~ (p=0.047 n=20)
Int64N2e9-32 5.514n ± 2% 5.487n ± 1% ~ (p=0.298 n=20)
Int64N1e18-32 9.059n ± 1% 8.901n ± 2% ~ (p=0.037 n=20)
Int64N2e18-32 9.594n ± 1% 9.521n ± 1% ~ (p=0.051 n=20)
Int64N4e18-32 12.05n ± 2% 11.92n ± 1% ~ (p=0.357 n=20)
Int32N1000-32 4.840n ± 2% 4.785n ± 1% ~ (p=0.189 n=20)
Int32N1e8-32 4.832n ± 2% 4.748n ± 1% ~ (p=0.042 n=20)
Int32N1e9-32 4.815n ± 2% 4.810n ± 1% ~ (p=0.878 n=20)
Int32N2e9-32 4.813n ± 1% 4.812n ± 1% ~ (p=0.542 n=20)
Float32-32 10.90n ± 2% 10.48n ± 4% -3.85% (p=0.007 n=20)
Float64-32 20.32n ± 4% 19.79n ± 3% ~ (p=0.553 n=20)
ExpFloat64-32 12.95n ± 3% 12.91n ± 3% ~ (p=0.909 n=20)
NormFloat64-32 7.570n ± 1% 7.462n ± 1% -1.44% (p=0.004 n=20)
Perm3-32 37.80n ± 2% 35.98n ± 2% -4.79% (p=0.000 n=20)
Perm30-32 214.0n ± 1% 241.5n ± 1% +12.85% (p=0.000 n=20)
Perm30ViaShuffle-32 188.7n ± 2% 187.3n ± 2% ~ (p=0.029 n=20)
ShuffleOverhead-32 160.8n ± 1% 160.2n ± 1% ~ (p=0.180 n=20)
Concurrent-32 3.288n ± 0% 3.308n ± 3% ~ (p=0.037 n=20)
For #61716.
Change-Id: I342b611456c3569520d3c91c849d29eba325d87e
Reviewed-on: https://go-review.googlesource.com/c/go/+/502504
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Rob Pike <r@golang.org>
|
|
The original implementation of the ziggurat algorithm was designed for
32-bit random integer inputs. This necessitated reusing some low-order
bits for the slice selection and the random coordinate, which introduces
statistical bias. The result is that PractRand consistently fails the
math/rand normal and exponential sequences (transformed to uniform)
within 2 GB of variates.
This change adjusts the ziggurat procedures to use 63-bit random inputs,
so that there is no need to reuse bits between the slice and coordinate.
This is sufficient for the normal sequence to survive to 256 GB of
PractRand testing.
An alternative technique is to recalculate the ziggurats to use 1024
rather than 128 or 256 slices to make full use of 64-bit inputs. This
improves the survival of the normal sequence to far beyond 256 GB and
additionally provides a 6% performance improvement due to the improved
rejection procedure efficiency. However, doing so increases the total
size of the ziggurat tables from 4.5 kB to 48 kB.
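
The bit-reuse problem described above can be illustrated with a minimal Go sketch. The function names and the 7-bit slice index are assumptions chosen for illustration (the message mentions 128- and 256-slice tables); this is not the code from the change itself, and it omits the ziggurat tables and rejection step entirely.

```go
package main

import "fmt"

// splitReused mimics a 32-bit scheme: the low 7 bits pick one of 128 slices,
// and the same 32 bits (including those 7) also serve as the coordinate.
func splitReused(u uint32) (slice int, coord int32) {
	return int(u & 0x7F), int32(u)
}

// splitDisjoint takes a 63-bit input (top bit clear) and draws the slice
// index from bits 32-38, leaving the low 32 bits for the coordinate alone.
func splitDisjoint(u uint64) (slice int, coord int32) {
	return int(u >> 32 & 0x7F), int32(u)
}

func main() {
	s, c := splitReused(0x12345685)
	// With reuse, the slice index is always recoverable from the coordinate.
	fmt.Println(s, c, s == int(c&0x7F))

	s, c = splitDisjoint(0x0000004500000003)
	// With disjoint bits, the coordinate carries no information about the slice.
	fmt.Println(s, c)
}
```

In the reused layout the slice index and the coordinate are correlated by construction, which is the kind of dependence a statistical test suite can detect.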
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 2703446c2e.amd64 │ e1bbe739fb.amd64 │
│ sec/op │ sec/op vs base │
SourceUint64-32 1.337n ± 1% 1.316n ± 2% ~ (p=0.024 n=20)
GlobalInt64-32 2.225n ± 2% 2.048n ± 1% -7.93% (p=0.000 n=20)
GlobalInt64Parallel-32 0.1043n ± 2% 0.1037n ± 1% ~ (p=0.587 n=20)
GlobalUint64-32 2.058n ± 1% 2.039n ± 2% ~ (p=0.030 n=20)
GlobalUint64Parallel-32 0.1009n ± 1% 0.1013n ± 1% ~ (p=0.984 n=20)
Int64-32 1.719n ± 2% 1.692n ± 2% ~ (p=0.085 n=20)
Uint64-32 1.669n ± 1% 1.643n ± 2% ~ (p=0.049 n=20)
GlobalIntN1000-32 3.321n ± 2% 3.287n ± 1% ~ (p=0.298 n=20)
IntN1000-32 2.479n ± 1% 2.678n ± 2% +8.01% (p=0.000 n=20)
Int64N1000-32 2.477n ± 1% 2.684n ± 2% +8.38% (p=0.000 n=20)
Int64N1e8-32 2.490n ± 1% 2.663n ± 2% +6.99% (p=0.000 n=20)
Int64N1e9-32 2.458n ± 1% 2.633n ± 1% +7.12% (p=0.000 n=20)
Int64N2e9-32 2.486n ± 2% 2.657n ± 1% +6.90% (p=0.000 n=20)
Int64N1e18-32 3.215n ± 2% 3.125n ± 2% -2.78% (p=0.000 n=20)
Int64N2e18-32 3.588n ± 2% 3.476n ± 1% -3.15% (p=0.000 n=20)
Int64N4e18-32 4.938n ± 2% 4.795n ± 1% -2.91% (p=0.000 n=20)
Int32N1000-32 2.673n ± 2% 2.485n ± 2% -7.02% (p=0.000 n=20)
Int32N1e8-32 2.631n ± 2% 2.457n ± 1% -6.63% (p=0.000 n=20)
Int32N1e9-32 2.628n ± 2% 2.452n ± 1% -6.70% (p=0.000 n=20)
Int32N2e9-32 2.684n ± 2% 2.453n ± 1% -8.61% (p=0.000 n=20)
Float32-32 2.240n ± 2% 2.254n ± 1% ~ (p=0.878 n=20)
Float64-32 2.253n ± 1% 2.262n ± 1% ~ (p=0.963 n=20)
ExpFloat64-32 3.677n ± 1% 3.777n ± 2% +2.71% (p=0.004 n=20)
NormFloat64-32 3.761n ± 1% 3.606n ± 1% -4.15% (p=0.000 n=20)
Perm3-32 33.55n ± 2% 33.12n ± 2% ~ (p=0.402 n=20)
Perm30-32 173.2n ± 1% 176.1n ± 1% +1.67% (p=0.000 n=20)
Perm30ViaShuffle-32 115.9n ± 1% 109.3n ± 1% -5.69% (p=0.000 n=20)
ShuffleOverhead-32 101.9n ± 1% 112.5n ± 1% +10.35% (p=0.000 n=20)
Concurrent-32 2.107n ± 6% 2.099n ± 0% ~ (p=0.051 n=20)
goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
│ 2703446c2e.arm64 │ e1bbe739fb.arm64 │
│ sec/op │ sec/op vs base │
SourceUint64-8 2.275n ± 0% 2.290n ± 1% ~ (p=0.044 n=20)
GlobalInt64-8 2.154n ± 1% 2.180n ± 1% ~ (p=0.068 n=20)
GlobalInt64Parallel-8 0.4298n ± 0% 0.4294n ± 0% ~ (p=0.079 n=20)
GlobalUint64-8 2.160n ± 1% 2.170n ± 1% ~ (p=0.129 n=20)
GlobalUint64Parallel-8 0.4286n ± 0% 0.4283n ± 0% ~ (p=0.350 n=20)
Int64-8 2.491n ± 1% 2.481n ± 1% ~ (p=0.330 n=20)
Uint64-8 2.458n ± 0% 2.464n ± 1% ~ (p=0.351 n=20)
GlobalIntN1000-8 2.814n ± 2% 2.814n ± 0% ~ (p=0.325 n=20)
IntN1000-8 2.933n ± 0% 2.934n ± 2% ~ (p=0.079 n=20)
Int64N1000-8 2.962n ± 1% 2.957n ± 1% ~ (p=0.259 n=20)
Int64N1e8-8 2.960n ± 1% 2.935n ± 2% ~ (p=0.276 n=20)
Int64N1e9-8 2.935n ± 2% 2.935n ± 2% ~ (p=0.984 n=20)
Int64N2e9-8 2.934n ± 0% 2.933n ± 4% ~ (p=0.463 n=20)
Int64N1e18-8 3.777n ± 1% 3.781n ± 1% ~ (p=0.516 n=20)
Int64N2e18-8 4.359n ± 1% 4.362n ± 0% ~ (p=0.256 n=20)
Int64N4e18-8 6.536n ± 1% 6.576n ± 1% ~ (p=0.224 n=20)
Int32N1000-8 2.937n ± 0% 2.942n ± 2% ~ (p=0.312 n=20)
Int32N1e8-8 2.937n ± 1% 2.941n ± 1% ~ (p=0.463 n=20)
Int32N1e9-8 2.936n ± 0% 2.938n ± 2% ~ (p=0.044 n=20)
Int32N2e9-8 2.938n ± 2% 2.982n ± 2% ~ (p=0.174 n=20)
Float32-8 3.441n ± 0% 3.441n ± 0% ~ (p=0.064 n=20)
Float64-8 3.441n ± 0% 3.441n ± 0% ~ (p=0.826 n=20)
ExpFloat64-8 4.486n ± 0% 4.472n ± 0% -0.31% (p=0.000 n=20)
NormFloat64-8 4.721n ± 0% 4.716n ± 0% ~ (p=0.051 n=20)
Perm3-8 26.65n ± 0% 26.66n ± 0% ~ (p=0.080 n=20)
Perm30-8 143.2n ± 0% 143.3n ± 0% +0.10% (p=0.000 n=20)
Perm30ViaShuffle-8 143.0n ± 0% 142.9n ± 0% ~ (p=0.642 n=20)
ShuffleOverhead-8 120.6n ± 1% 121.1n ± 1% +0.41% (p=0.010 n=20)
Concurrent-8 2.399n ± 5% 2.379n ± 2% ~ (p=0.365 n=20)
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 2703446c2e.386 │ e1bbe739fb.386 │
│ sec/op │ sec/op vs base │
SourceUint64-32 2.072n ± 2% 2.087n ± 1% ~ (p=0.440 n=20)
GlobalInt64-32 3.546n ± 27% 3.538n ± 2% ~ (p=0.101 n=20)
GlobalInt64Parallel-32 0.3211n ± 0% 0.3207n ± 1% ~ (p=0.753 n=20)
GlobalUint64-32 3.522n ± 2% 3.543n ± 1% ~ (p=0.071 n=20)
GlobalUint64Parallel-32 0.3172n ± 0% 0.3170n ± 0% ~ (p=0.507 n=20)
Int64-32 2.520n ± 2% 2.548n ± 1% ~ (p=0.267 n=20)
Uint64-32 2.581n ± 1% 2.565n ± 2% ~ (p=0.143 n=20)
GlobalIntN1000-32 6.171n ± 1% 6.300n ± 1% ~ (p=0.037 n=20)
IntN1000-32 4.752n ± 2% 4.750n ± 0% ~ (p=0.984 n=20)
Int64N1000-32 5.429n ± 1% 5.515n ± 2% ~ (p=0.292 n=20)
Int64N1e8-32 5.469n ± 2% 5.527n ± 0% ~ (p=0.013 n=20)
Int64N1e9-32 5.489n ± 2% 5.531n ± 2% ~ (p=0.256 n=20)
Int64N2e9-32 5.492n ± 2% 5.514n ± 2% ~ (p=0.606 n=20)
Int64N1e18-32 8.927n ± 1% 9.059n ± 1% ~ (p=0.229 n=20)
Int64N2e18-32 9.622n ± 1% 9.594n ± 1% ~ (p=0.703 n=20)
Int64N4e18-32 12.03n ± 1% 12.05n ± 2% ~ (p=0.733 n=20)
Int32N1000-32 4.817n ± 1% 4.840n ± 2% ~ (p=0.941 n=20)
Int32N1e8-32 4.801n ± 1% 4.832n ± 2% ~ (p=0.228 n=20)
Int32N1e9-32 4.798n ± 1% 4.815n ± 2% ~ (p=0.560 n=20)
Int32N2e9-32 4.840n ± 1% 4.813n ± 1% ~ (p=0.015 n=20)
Float32-32 10.51n ± 4% 10.90n ± 2% +3.71% (p=0.007 n=20)
Float64-32 20.33n ± 3% 20.32n ± 4% ~ (p=0.566 n=20)
ExpFloat64-32 12.59n ± 2% 12.95n ± 3% +2.86% (p=0.002 n=20)
NormFloat64-32 7.350n ± 2% 7.570n ± 1% +2.99% (p=0.007 n=20)
Perm3-32 39.29n ± 2% 37.80n ± 2% -3.79% (p=0.000 n=20)
Perm30-32 219.1n ± 2% 214.0n ± 1% -2.33% (p=0.002 n=20)
Perm30ViaShuffle-32 189.8n ± 2% 188.7n ± 2% ~ (p=0.147 n=20)
ShuffleOverhead-32 158.9n ± 2% 160.8n ± 1% ~ (p=0.176 n=20)
Concurrent-32 3.306n ± 3% 3.288n ± 0% -0.54% (p=0.005 n=20)
For #61716.
Change-Id: I4c5fe710b310dc075ae21c97d1805bcc20db5050
Reviewed-on: https://go-review.googlesource.com/c/go/+/516275
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Rob Pike <r@golang.org>
|
|
We realized too late after Go 1 that float64(r.Uint64())/(1<<64)
is not a correct implementation: it occasionally rounds to 1.
The correct implementation is float64(r.Uint64()&(1<<53-1))/(1<<53)
but we couldn't change the implementation for compatibility, so we
changed it to retry only in the "round to 1" cases.
The change to v2 lets us update the algorithm to the simpler,
faster one.
Note that this implementation cannot generate 2⁻⁵⁴, nor 2⁻¹⁰⁰,
nor any of the other numbers between 0 and 2⁻⁵³. A different algorithm
could shift some of the probability of generating the two boundary
values 0 and 2⁻⁵³ over to the values in between, but that would be
much slower and not necessarily better. In particular, the current
implementation has the property that there are uniform gaps between
the possible returned floats, which might help stability. Also, the
result is often scaled and shifted, like Float64()*X+Y. Multiplying by
X>1 would open new gaps, and adding most Y would erase all the
distinctions that were introduced.
The only changes to benchmarks should be in Float32 and Float64.
The other changes remain a cautionary tale.
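
As a small illustration of the rounding issue described above, the following Go sketch compares the two expressions from the message on the largest 64-bit input. The function names are hypothetical and the sketch takes a raw uint64 rather than a Rand; it is not the package's implementation.

```go
package main

import "fmt"

// naiveFloat64 divides all 64 bits by 2^64. A float64 has only 53 bits of
// precision, so inputs near 2^64 round up to 2^64 and the result is 1.
func naiveFloat64(u uint64) float64 {
	return float64(u) / (1 << 64)
}

// maskedFloat64 keeps the low 53 bits, which convert to float64 exactly,
// so the result is always a multiple of 2^-53 in [0, 1).
func maskedFloat64(u uint64) float64 {
	return float64(u&(1<<53-1)) / (1 << 53)
}

func main() {
	max := ^uint64(0)
	fmt.Println(naiveFloat64(max) == 1) // the "round to 1" case
	fmt.Println(maskedFloat64(max) < 1) // largest output stays below 1
	fmt.Println(maskedFloat64(1))       // smallest nonzero output, 2^-53
}
```

The last line shows the uniform gap the message refers to: every output is an integer multiple of 2⁻⁵³.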
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 4d84a369d1.amd64 │ 2703446c2e.amd64 │
│ sec/op │ sec/op vs base │
SourceUint64-32 1.348n ± 2% 1.337n ± 1% ~ (p=0.662 n=20)
GlobalInt64-32 2.082n ± 2% 2.225n ± 2% +6.87% (p=0.000 n=20)
GlobalInt64Parallel-32 0.1036n ± 1% 0.1043n ± 2% ~ (p=0.171 n=20)
GlobalUint64-32 2.077n ± 2% 2.058n ± 1% ~ (p=0.560 n=20)
GlobalUint64Parallel-32 0.1012n ± 1% 0.1009n ± 1% ~ (p=0.995 n=20)
Int64-32 1.750n ± 0% 1.719n ± 2% -1.74% (p=0.000 n=20)
Uint64-32 1.707n ± 2% 1.669n ± 1% -2.20% (p=0.000 n=20)
GlobalIntN1000-32 3.192n ± 1% 3.321n ± 2% +4.04% (p=0.000 n=20)
IntN1000-32 2.462n ± 2% 2.479n ± 1% ~ (p=0.417 n=20)
Int64N1000-32 2.470n ± 1% 2.477n ± 1% ~ (p=0.664 n=20)
Int64N1e8-32 2.503n ± 2% 2.490n ± 1% ~ (p=0.245 n=20)
Int64N1e9-32 2.487n ± 1% 2.458n ± 1% ~ (p=0.032 n=20)
Int64N2e9-32 2.487n ± 1% 2.486n ± 2% ~ (p=0.507 n=20)
Int64N1e18-32 3.006n ± 2% 3.215n ± 2% +6.94% (p=0.000 n=20)
Int64N2e18-32 3.368n ± 1% 3.588n ± 2% +6.55% (p=0.000 n=20)
Int64N4e18-32 4.763n ± 1% 4.938n ± 2% +3.69% (p=0.000 n=20)
Int32N1000-32 2.403n ± 1% 2.673n ± 2% +11.19% (p=0.000 n=20)
Int32N1e8-32 2.405n ± 1% 2.631n ± 2% +9.42% (p=0.000 n=20)
Int32N1e9-32 2.402n ± 2% 2.628n ± 2% +9.41% (p=0.000 n=20)
Int32N2e9-32 2.384n ± 1% 2.684n ± 2% +12.56% (p=0.000 n=20)
Float32-32 2.641n ± 2% 2.240n ± 2% -15.18% (p=0.000 n=20)
Float64-32 2.483n ± 1% 2.253n ± 1% -9.26% (p=0.000 n=20)
ExpFloat64-32 3.486n ± 2% 3.677n ± 1% +5.49% (p=0.000 n=20)
NormFloat64-32 3.648n ± 1% 3.761n ± 1% +3.11% (p=0.000 n=20)
Perm3-32 33.04n ± 1% 33.55n ± 2% ~ (p=0.180 n=20)
Perm30-32 171.9n ± 1% 173.2n ± 1% ~ (p=0.050 n=20)
Perm30ViaShuffle-32 100.3n ± 1% 115.9n ± 1% +15.55% (p=0.000 n=20)
ShuffleOverhead-32 102.5n ± 1% 101.9n ± 1% ~ (p=0.266 n=20)
Concurrent-32 2.101n ± 0% 2.107n ± 6% ~ (p=0.212 n=20)
goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
│ 4d84a369d1.arm64 │ 2703446c2e.arm64 │
│ sec/op │ sec/op vs base │
SourceUint64-8 2.261n ± 1% 2.275n ± 0% ~ (p=0.082 n=20)
GlobalInt64-8 2.160n ± 1% 2.154n ± 1% ~ (p=0.490 n=20)
GlobalInt64Parallel-8 0.4299n ± 0% 0.4298n ± 0% ~ (p=0.663 n=20)
GlobalUint64-8 2.169n ± 1% 2.160n ± 1% ~ (p=0.292 n=20)
GlobalUint64Parallel-8 0.4293n ± 1% 0.4286n ± 0% ~ (p=0.155 n=20)
Int64-8 2.473n ± 1% 2.491n ± 1% ~ (p=0.317 n=20)
Uint64-8 2.453n ± 1% 2.458n ± 0% ~ (p=0.941 n=20)
GlobalIntN1000-8 2.814n ± 2% 2.814n ± 2% ~ (p=0.972 n=20)
IntN1000-8 2.933n ± 2% 2.933n ± 0% ~ (p=0.287 n=20)
Int64N1000-8 2.934n ± 2% 2.962n ± 1% ~ (p=0.062 n=20)
Int64N1e8-8 2.935n ± 2% 2.960n ± 1% ~ (p=0.183 n=20)
Int64N1e9-8 2.934n ± 2% 2.935n ± 2% ~ (p=0.367 n=20)
Int64N2e9-8 2.935n ± 2% 2.934n ± 0% ~ (p=0.455 n=20)
Int64N1e18-8 3.778n ± 1% 3.777n ± 1% ~ (p=0.995 n=20)
Int64N2e18-8 4.359n ± 1% 4.359n ± 1% ~ (p=0.122 n=20)
Int64N4e18-8 6.546n ± 1% 6.536n ± 1% ~ (p=0.920 n=20)
Int32N1000-8 2.940n ± 2% 2.937n ± 0% ~ (p=0.149 n=20)
Int32N1e8-8 2.937n ± 2% 2.937n ± 1% ~ (p=0.620 n=20)
Int32N1e9-8 2.938n ± 0% 2.936n ± 0% ~ (p=0.046 n=20)
Int32N2e9-8 2.938n ± 2% 2.938n ± 2% ~ (p=0.455 n=20)
Float32-8 3.486n ± 0% 3.441n ± 0% -1.28% (p=0.000 n=20)
Float64-8 3.480n ± 0% 3.441n ± 0% -1.13% (p=0.000 n=20)
ExpFloat64-8 4.533n ± 0% 4.486n ± 0% -1.03% (p=0.000 n=20)
NormFloat64-8 4.764n ± 0% 4.721n ± 0% -0.90% (p=0.000 n=20)
Perm3-8 26.66n ± 0% 26.65n ± 0% ~ (p=0.019 n=20)
Perm30-8 143.4n ± 0% 143.2n ± 0% -0.17% (p=0.000 n=20)
Perm30ViaShuffle-8 142.9n ± 0% 143.0n ± 0% ~ (p=0.522 n=20)
ShuffleOverhead-8 120.7n ± 0% 120.6n ± 1% ~ (p=0.488 n=20)
Concurrent-8 2.360n ± 2% 2.399n ± 5% ~ (p=0.062 n=20)
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 4d84a369d1.386 │ 2703446c2e.386 │
│ sec/op │ sec/op vs base │
SourceUint64-32 2.101n ± 2% 2.072n ± 2% ~ (p=0.273 n=20)
GlobalInt64-32 3.518n ± 2% 3.546n ± 27% +0.78% (p=0.007 n=20)
GlobalInt64Parallel-32 0.3206n ± 0% 0.3211n ± 0% ~ (p=0.386 n=20)
GlobalUint64-32 3.538n ± 1% 3.522n ± 2% ~ (p=0.331 n=20)
GlobalUint64Parallel-32 0.3231n ± 0% 0.3172n ± 0% -1.84% (p=0.000 n=20)
Int64-32 2.554n ± 2% 2.520n ± 2% ~ (p=0.465 n=20)
Uint64-32 2.575n ± 2% 2.581n ± 1% ~ (p=0.213 n=20)
GlobalIntN1000-32 6.292n ± 1% 6.171n ± 1% ~ (p=0.015 n=20)
IntN1000-32 4.735n ± 1% 4.752n ± 2% ~ (p=0.635 n=20)
Int64N1000-32 5.489n ± 2% 5.429n ± 1% ~ (p=0.324 n=20)
Int64N1e8-32 5.528n ± 2% 5.469n ± 2% ~ (p=0.013 n=20)
Int64N1e9-32 5.438n ± 2% 5.489n ± 2% ~ (p=0.984 n=20)
Int64N2e9-32 5.474n ± 1% 5.492n ± 2% ~ (p=0.616 n=20)
Int64N1e18-32 9.053n ± 1% 8.927n ± 1% ~ (p=0.037 n=20)
Int64N2e18-32 9.685n ± 2% 9.622n ± 1% ~ (p=0.449 n=20)
Int64N4e18-32 12.18n ± 1% 12.03n ± 1% ~ (p=0.013 n=20)
Int32N1000-32 4.862n ± 1% 4.817n ± 1% -0.94% (p=0.002 n=20)
Int32N1e8-32 4.758n ± 2% 4.801n ± 1% ~ (p=0.597 n=20)
Int32N1e9-32 4.772n ± 1% 4.798n ± 1% ~ (p=0.774 n=20)
Int32N2e9-32 4.847n ± 0% 4.840n ± 1% ~ (p=0.867 n=20)
Float32-32 22.18n ± 4% 10.51n ± 4% -52.61% (p=0.000 n=20)
Float64-32 21.21n ± 3% 20.33n ± 3% -4.17% (p=0.000 n=20)
ExpFloat64-32 12.39n ± 2% 12.59n ± 2% ~ (p=0.139 n=20)
NormFloat64-32 7.422n ± 1% 7.350n ± 2% ~ (p=0.208 n=20)
Perm3-32 38.00n ± 2% 39.29n ± 2% +3.38% (p=0.000 n=20)
Perm30-32 212.7n ± 1% 219.1n ± 2% +3.03% (p=0.001 n=20)
Perm30ViaShuffle-32 187.5n ± 2% 189.8n ± 2% ~ (p=0.457 n=20)
ShuffleOverhead-32 159.7n ± 1% 158.9n ± 2% ~ (p=0.920 n=20)
Concurrent-32 3.470n ± 0% 3.306n ± 3% -4.71% (p=0.000 n=20)
For #61716.
Change-Id: I1933f1f9efd7e6e832d83e7fa5d84398f67d41f5
Reviewed-on: https://go-review.googlesource.com/c/go/+/502503
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Now that we can break the value stream, we can take advantage
of better algorithms that have been suggested since the original
code was written.
Also optimizes IntN, Int32N, Int64N, Perm (indirectly).
All the N variants (IntN, Int32N, Int64N, UintN, N, etc) now
return the same values given a Source and parameter n, so that
for example uint(r.IntN(10)) and r.UintN(10) and r.N(uint(10))
are completely interchangeable.
Int64N4e18 gets slower but that is a near worst case for
the algorithm and is extremely unlikely in practice.
32-bit Int32N variants got slower too, by 15-30%, in exchange
for speeding up everything on 64-bit systems and consistency
across the N functions.
Also rename previously missed benchmark
GlobalInt63Parallel to GlobalInt64Parallel.
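
The message does not name the algorithms it adopts. One widely used technique for this problem is Lemire's multiply-and-reject method, sketched below in Go as an assumption about the general approach rather than a copy of the change. The name boundedUint64 and the next parameter are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
	"math/rand/v2"
)

// boundedUint64 returns a value in [0, n) from a stream of uniform 64-bit
// values supplied by next. It panics if n == 0.
func boundedUint64(next func() uint64, n uint64) uint64 {
	if n == 0 {
		panic("boundedUint64: n must be positive")
	}
	if n&(n-1) == 0 {
		// n is a power of two: masking is already unbiased.
		return next() & (n - 1)
	}
	// Treat next()*n as a 128-bit product; the high word is the candidate.
	hi, lo := bits.Mul64(next(), n)
	if lo < n {
		// Only a small band of low words is biased; reject and redraw there.
		thresh := -n % n // equals 2^64 mod n
		for lo < thresh {
			hi, lo = bits.Mul64(next(), n)
		}
	}
	return hi
}

func main() {
	for i := 0; i < 5; i++ {
		fmt.Println(boundedUint64(rand.Uint64, 10))
	}
}
```

Routing every N variant through one helper of this shape is one way to get the interchangeability the message describes, since each variant would then differ only in the final type conversion. The rejection loop runs longest when n is just above a power of two near 2⁶³, which is consistent with Int64N4e18 being described as a near worst case.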
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 11ad9fdddc.amd64 │ 4d84a369d1.amd64 │
│ sec/op │ sec/op vs base │
SourceUint64-32 1.335n ± 1% 1.348n ± 2% ~ (p=0.335 n=20)
GlobalInt64-32 2.046n ± 1% 2.082n ± 2% ~ (p=0.310 n=20)
GlobalInt63Parallel-32 0.1037n ± 1%
GlobalInt64Parallel-32 0.1036n ± 1%
GlobalUint64-32 2.075n ± 0% 2.077n ± 2% ~ (p=0.228 n=20)
GlobalUint64Parallel-32 0.1013n ± 1% 0.1012n ± 1% ~ (p=0.878 n=20)
Int64-32 1.726n ± 2% 1.750n ± 0% +1.39% (p=0.000 n=20)
Uint64-32 1.673n ± 1% 1.707n ± 2% +2.03% (p=0.002 n=20)
GlobalIntN1000-32 3.895n ± 2% 3.192n ± 1% -18.05% (p=0.000 n=20)
IntN1000-32 3.403n ± 1% 2.462n ± 2% -27.65% (p=0.000 n=20)
Int64N1000-32 3.053n ± 2% 2.470n ± 1% -19.11% (p=0.000 n=20)
Int64N1e8-32 2.718n ± 1% 2.503n ± 2% -7.91% (p=0.000 n=20)
Int64N1e9-32 2.712n ± 1% 2.487n ± 1% -8.31% (p=0.000 n=20)
Int64N2e9-32 2.690n ± 1% 2.487n ± 1% -7.57% (p=0.000 n=20)
Int64N1e18-32 3.084n ± 2% 3.006n ± 2% -2.53% (p=0.000 n=20)
Int64N2e18-32 4.026n ± 1% 3.368n ± 1% -16.33% (p=0.000 n=20)
Int64N4e18-32 4.049n ± 2% 4.763n ± 1% +17.62% (p=0.000 n=20)
Int32N1000-32 2.730n ± 0% 2.403n ± 1% -11.94% (p=0.000 n=20)
Int32N1e8-32 2.916n ± 2% 2.405n ± 1% -17.53% (p=0.000 n=20)
Int32N1e9-32 3.375n ± 1% 2.402n ± 2% -28.83% (p=0.000 n=20)
Int32N2e9-32 3.292n ± 1% 2.384n ± 1% -27.58% (p=0.000 n=20)
Float32-32 2.673n ± 1% 2.641n ± 2% ~ (p=0.147 n=20)
Float64-32 2.485n ± 1% 2.483n ± 1% ~ (p=0.804 n=20)
ExpFloat64-32 3.577n ± 2% 3.486n ± 2% -2.57% (p=0.000 n=20)
NormFloat64-32 3.797n ± 2% 3.648n ± 1% -3.92% (p=0.000 n=20)
Perm3-32 35.79n ± 2% 33.04n ± 1% -7.68% (p=0.000 n=20)
Perm30-32 205.1n ± 1% 171.9n ± 1% -16.14% (p=0.000 n=20)
Perm30ViaShuffle-32 111.2n ± 2% 100.3n ± 1% -9.76% (p=0.000 n=20)
ShuffleOverhead-32 100.5n ± 2% 102.5n ± 1% +1.99% (p=0.007 n=20)
Concurrent-32 2.188n ± 5% 2.101n ± 0% ~ (p=0.013 n=20)
goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
│ 11ad9fdddc.arm64 │ 4d84a369d1.arm64 │
│ sec/op │ sec/op vs base │
SourceUint64-8 2.272n ± 1% 2.261n ± 1% ~ (p=0.172 n=20)
GlobalInt64-8 2.155n ± 1% 2.160n ± 1% ~ (p=0.482 n=20)
GlobalInt63Parallel-8 0.4352n ± 0%
GlobalInt64Parallel-8 0.4299n ± 0%
GlobalUint64-8 2.173n ± 1% 2.169n ± 1% ~ (p=0.262 n=20)
GlobalUint64Parallel-8 0.4340n ± 0% 0.4293n ± 1% -1.08% (p=0.000 n=20)
Int64-8 2.544n ± 1% 2.473n ± 1% -2.83% (p=0.000 n=20)
Uint64-8 2.552n ± 1% 2.453n ± 1% -3.90% (p=0.000 n=20)
GlobalIntN1000-8 3.856n ± 0% 2.814n ± 2% -27.02% (p=0.000 n=20)
IntN1000-8 3.820n ± 0% 2.933n ± 2% -23.22% (p=0.000 n=20)
Int64N1000-8 3.219n ± 2% 2.934n ± 2% -8.85% (p=0.000 n=20)
Int64N1e8-8 3.221n ± 2% 2.935n ± 2% -8.91% (p=0.000 n=20)
Int64N1e9-8 3.276n ± 2% 2.934n ± 2% -10.44% (p=0.000 n=20)
Int64N2e9-8 3.217n ± 0% 2.935n ± 2% -8.78% (p=0.000 n=20)
Int64N1e18-8 3.502n ± 2% 3.778n ± 1% +7.91% (p=0.000 n=20)
Int64N2e18-8 4.968n ± 1% 4.359n ± 1% -12.26% (p=0.000 n=20)
Int64N4e18-8 4.963n ± 0% 6.546n ± 1% +31.92% (p=0.000 n=20)
Int32N1000-8 3.189n ± 1% 2.940n ± 2% -7.81% (p=0.000 n=20)
Int32N1e8-8 3.514n ± 1% 2.937n ± 2% -16.41% (p=0.000 n=20)
Int32N1e9-8 4.133n ± 0% 2.938n ± 0% -28.91% (p=0.000 n=20)
Int32N2e9-8 4.137n ± 0% 2.938n ± 2% -28.97% (p=0.000 n=20)
Float32-8 3.468n ± 1% 3.486n ± 0% +0.52% (p=0.000 n=20)
Float64-8 3.478n ± 0% 3.480n ± 0% ~ (p=0.063 n=20)
ExpFloat64-8 4.563n ± 0% 4.533n ± 0% -0.67% (p=0.000 n=20)
NormFloat64-8 4.768n ± 0% 4.764n ± 0% -0.07% (p=0.001 n=20)
Perm3-8 28.94n ± 0% 26.66n ± 0% -7.88% (p=0.000 n=20)
Perm30-8 175.9n ± 0% 143.4n ± 0% -18.50% (p=0.000 n=20)
Perm30ViaShuffle-8 152.6n ± 1% 142.9n ± 0% -6.29% (p=0.000 n=20)
ShuffleOverhead-8 119.6n ± 1% 120.7n ± 0% +0.96% (p=0.000 n=20)
Concurrent-8 2.452n ± 3% 2.360n ± 2% -3.73% (p=0.007 n=20)
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 11ad9fdddc.386 │ 4d84a369d1.386 │
│ sec/op │ sec/op vs base │
SourceUint64-32 2.091n ± 1% 2.101n ± 2% ~ (p=0.672 n=20)
GlobalInt64-32 3.514n ± 2% 3.518n ± 2% ~ (p=0.723 n=20)
GlobalInt63Parallel-32 0.3197n ± 0%
GlobalInt64Parallel-32 0.3206n ± 0%
GlobalUint64-32 3.542n ± 1% 3.538n ± 1% ~ (p=0.304 n=20)
GlobalUint64Parallel-32 0.3218n ± 0% 0.3231n ± 0% ~ (p=0.071 n=20)
Int64-32 2.552n ± 2% 2.554n ± 2% ~ (p=0.693 n=20)
Uint64-32 2.566n ± 1% 2.575n ± 2% ~ (p=0.606 n=20)
GlobalIntN1000-32 5.965n ± 2% 6.292n ± 1% +5.46% (p=0.000 n=20)
IntN1000-32 4.652n ± 1% 4.735n ± 1% +1.77% (p=0.000 n=20)
Int64N1000-32 14.485n ± 1% 5.489n ± 2% -62.11% (p=0.000 n=20)
Int64N1e8-32 14.675n ± 1% 5.528n ± 2% -62.33% (p=0.000 n=20)
Int64N1e9-32 16.805n ± 2% 5.438n ± 2% -67.64% (p=0.000 n=20)
Int64N2e9-32 14.515n ± 1% 5.474n ± 1% -62.28% (p=0.000 n=20)
Int64N1e18-32 16.165n ± 1% 9.053n ± 1% -44.00% (p=0.000 n=20)
Int64N2e18-32 17.945n ± 2% 9.685n ± 2% -46.03% (p=0.000 n=20)
Int64N4e18-32 18.35n ± 2% 12.18n ± 1% -33.62% (p=0.000 n=20)
Int32N1000-32 3.608n ± 1% 4.862n ± 1% +34.77% (p=0.000 n=20)
Int32N1e8-32 3.767n ± 1% 4.758n ± 2% +26.31% (p=0.000 n=20)
Int32N1e9-32 4.130n ± 2% 4.772n ± 1% +15.54% (p=0.000 n=20)
Int32N2e9-32 4.206n ± 1% 4.847n ± 0% +15.24% (p=0.000 n=20)
Float32-32 22.18n ± 4% 22.18n ± 4% ~ (p=0.195 n=20)
Float64-32 20.75n ± 4% 21.21n ± 3% ~ (p=0.394 n=20)
ExpFloat64-32 12.58n ± 3% 12.39n ± 2% ~ (p=0.032 n=20)
NormFloat64-32 7.920n ± 3% 7.422n ± 1% -6.29% (p=0.000 n=20)
Perm3-32 40.27n ± 1% 38.00n ± 2% -5.65% (p=0.000 n=20)
Perm30-32 213.2n ± 2% 212.7n ± 1% ~ (p=0.995 n=20)
Perm30ViaShuffle-32 164.2n ± 2% 187.5n ± 2% +14.22% (p=0.000 n=20)
ShuffleOverhead-32 134.7n ± 2% 159.7n ± 1% +18.52% (p=0.000 n=20)
Concurrent-32 3.301n ± 2% 3.470n ± 0% +5.10% (p=0.000 n=20)
For #61716.
Change-Id: Id1481b04202883cd0b23e21bb58d1bca4e482bd3
Reviewed-on: https://go-review.googlesource.com/c/go/+/502500
Reviewed-by: Rob Pike <r@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This should make Uint64-using functions faster and leave
other things alone. It is a mystery why so much got faster.
A good cautionary tale not to read too much into minor
jitter in the benchmarks.
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 220860f76f.amd64 │ 11ad9fdddc.amd64 │
│ sec/op │ sec/op vs base │
SourceUint64-32 1.555n ± 1% 1.335n ± 1% -14.15% (p=0.000 n=20)
GlobalInt64-32 2.071n ± 1% 2.046n ± 1% ~ (p=0.016 n=20)
GlobalInt63Parallel-32 0.1023n ± 1% 0.1037n ± 1% +1.37% (p=0.002 n=20)
GlobalUint64-32 5.193n ± 1% 2.075n ± 0% -60.06% (p=0.000 n=20)
GlobalUint64Parallel-32 0.2341n ± 0% 0.1013n ± 1% -56.74% (p=0.000 n=20)
Int64-32 2.056n ± 2% 1.726n ± 2% -16.10% (p=0.000 n=20)
Uint64-32 2.077n ± 2% 1.673n ± 1% -19.46% (p=0.000 n=20)
GlobalIntN1000-32 4.077n ± 2% 3.895n ± 2% -4.45% (p=0.000 n=20)
IntN1000-32 3.476n ± 2% 3.403n ± 1% -2.10% (p=0.000 n=20)
Int64N1000-32 3.059n ± 1% 3.053n ± 2% ~ (p=0.131 n=20)
Int64N1e8-32 2.942n ± 1% 2.718n ± 1% -7.60% (p=0.000 n=20)
Int64N1e9-32 2.932n ± 1% 2.712n ± 1% -7.50% (p=0.000 n=20)
Int64N2e9-32 2.925n ± 1% 2.690n ± 1% -8.03% (p=0.000 n=20)
Int64N1e18-32 3.116n ± 1% 3.084n ± 2% ~ (p=0.425 n=20)
Int64N2e18-32 4.067n ± 1% 4.026n ± 1% -1.02% (p=0.007 n=20)
Int64N4e18-32 4.054n ± 1% 4.049n ± 2% ~ (p=0.204 n=20)
Int32N1000-32 2.951n ± 1% 2.730n ± 0% -7.49% (p=0.000 n=20)
Int32N1e8-32 3.102n ± 1% 2.916n ± 2% -6.03% (p=0.000 n=20)
Int32N1e9-32 3.535n ± 1% 3.375n ± 1% -4.54% (p=0.000 n=20)
Int32N2e9-32 3.514n ± 1% 3.292n ± 1% -6.30% (p=0.000 n=20)
Float32-32 2.760n ± 1% 2.673n ± 1% -3.13% (p=0.000 n=20)
Float64-32 2.284n ± 1% 2.485n ± 1% +8.80% (p=0.000 n=20)
ExpFloat64-32 3.757n ± 1% 3.577n ± 2% -4.78% (p=0.000 n=20)
NormFloat64-32 3.837n ± 1% 3.797n ± 2% ~ (p=0.204 n=20)
Perm3-32 35.23n ± 2% 35.79n ± 2% ~ (p=0.298 n=20)
Perm30-32 208.8n ± 1% 205.1n ± 1% -1.82% (p=0.000 n=20)
Perm30ViaShuffle-32 111.7n ± 1% 111.2n ± 2% ~ (p=0.273 n=20)
ShuffleOverhead-32 101.1n ± 1% 100.5n ± 2% ~ (p=0.878 n=20)
Concurrent-32 2.108n ± 7% 2.188n ± 5% ~ (p=0.417 n=20)
goos: darwin
goarch: arm64
pkg: math/rand/v2
│ 220860f76f.arm64 │ 11ad9fdddc.arm64 │
│ sec/op │ sec/op vs base │
SourceUint64-8 2.316n ± 1% 2.272n ± 1% -1.86% (p=0.000 n=20)
GlobalInt64-8 2.183n ± 1% 2.155n ± 1% ~ (p=0.122 n=20)
GlobalInt63Parallel-8 0.4331n ± 0% 0.4352n ± 0% +0.48% (p=0.000 n=20)
GlobalUint64-8 4.377n ± 2% 2.173n ± 1% -50.35% (p=0.000 n=20)
GlobalUint64Parallel-8 0.9237n ± 0% 0.4340n ± 0% -53.02% (p=0.000 n=20)
Int64-8 2.538n ± 1% 2.544n ± 1% ~ (p=0.189 n=20)
Uint64-8 2.604n ± 1% 2.552n ± 1% -1.98% (p=0.000 n=20)
GlobalIntN1000-8 3.857n ± 2% 3.856n ± 0% ~ (p=0.051 n=20)
IntN1000-8 3.822n ± 2% 3.820n ± 0% -0.05% (p=0.001 n=20)
Int64N1000-8 3.318n ± 0% 3.219n ± 2% -2.98% (p=0.000 n=20)
Int64N1e8-8 3.349n ± 1% 3.221n ± 2% -3.79% (p=0.000 n=20)
Int64N1e9-8 3.317n ± 2% 3.276n ± 2% -1.24% (p=0.001 n=20)
Int64N2e9-8 3.317n ± 2% 3.217n ± 0% -3.01% (p=0.000 n=20)
Int64N1e18-8 3.542n ± 1% 3.502n ± 2% -1.16% (p=0.001 n=20)
Int64N2e18-8 5.087n ± 0% 4.968n ± 1% -2.33% (p=0.000 n=20)
Int64N4e18-8 5.084n ± 0% 4.963n ± 0% -2.39% (p=0.000 n=20)
Int32N1000-8 3.208n ± 2% 3.189n ± 1% -0.58% (p=0.001 n=20)
Int32N1e8-8 3.610n ± 1% 3.514n ± 1% -2.67% (p=0.000 n=20)
Int32N1e9-8 4.235n ± 0% 4.133n ± 0% -2.40% (p=0.000 n=20)
Int32N2e9-8 4.229n ± 1% 4.137n ± 0% -2.19% (p=0.000 n=20)
Float32-8 3.468n ± 0% 3.468n ± 1% ~ (p=0.350 n=20)
Float64-8 3.447n ± 0% 3.478n ± 0% +0.90% (p=0.000 n=20)
ExpFloat64-8 4.567n ± 0% 4.563n ± 0% -0.10% (p=0.002 n=20)
NormFloat64-8 4.821n ± 0% 4.768n ± 0% -1.09% (p=0.000 n=20)
Perm3-8 28.89n ± 0% 28.94n ± 0% +0.17% (p=0.000 n=20)
Perm30-8 175.7n ± 0% 175.9n ± 0% +0.14% (p=0.000 n=20)
Perm30ViaShuffle-8 153.5n ± 0% 152.6n ± 1% ~ (p=0.010 n=20)
ShuffleOverhead-8 119.8n ± 1% 119.6n ± 1% ~ (p=0.147 n=20)
Concurrent-8 2.433n ± 3% 2.452n ± 3% ~ (p=0.616 n=20)
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 220860f76f.386 │ 11ad9fdddc.386 │
│ sec/op │ sec/op vs base │
SourceUint64-32 2.370n ± 1% 2.091n ± 1% -11.75% (p=0.000 n=20)
GlobalInt64-32 3.569n ± 1% 3.514n ± 2% -1.56% (p=0.000 n=20)
GlobalInt63Parallel-32 0.3221n ± 1% 0.3197n ± 0% -0.76% (p=0.000 n=20)
GlobalUint64-32 8.797n ± 10% 3.542n ± 1% -59.74% (p=0.000 n=20)
GlobalUint64Parallel-32 0.6351n ± 0% 0.3218n ± 0% -49.33% (p=0.000 n=20)
Int64-32 2.612n ± 2% 2.552n ± 2% -2.30% (p=0.000 n=20)
Uint64-32 3.350n ± 1% 2.566n ± 1% -23.42% (p=0.000 n=20)
GlobalIntN1000-32 5.892n ± 1% 5.965n ± 2% ~ (p=0.082 n=20)
IntN1000-32 4.546n ± 1% 4.652n ± 1% +2.33% (p=0.000 n=20)
Int64N1000-32 14.59n ± 1% 14.48n ± 1% ~ (p=0.652 n=20)
Int64N1e8-32 14.76n ± 2% 14.67n ± 1% ~ (p=0.836 n=20)
Int64N1e9-32 16.57n ± 1% 16.80n ± 2% ~ (p=0.016 n=20)
Int64N2e9-32 14.54n ± 1% 14.52n ± 1% ~ (p=0.533 n=20)
Int64N1e18-32 16.14n ± 1% 16.16n ± 1% ~ (p=0.606 n=20)
Int64N2e18-32 18.10n ± 1% 17.95n ± 2% ~ (p=0.062 n=20)
Int64N4e18-32 18.65n ± 1% 18.35n ± 2% -1.61% (p=0.010 n=20)
Int32N1000-32 3.560n ± 1% 3.608n ± 1% +1.33% (p=0.001 n=20)
Int32N1e8-32 3.770n ± 2% 3.767n ± 1% ~ (p=0.155 n=20)
Int32N1e9-32 4.098n ± 0% 4.130n ± 2% ~ (p=0.016 n=20)
Int32N2e9-32 4.179n ± 1% 4.206n ± 1% ~ (p=0.011 n=20)
Float32-32 21.18n ± 4% 22.18n ± 4% +4.70% (p=0.003 n=20)
Float64-32 20.60n ± 2% 20.75n ± 4% +0.73% (p=0.000 n=20)
ExpFloat64-32 13.07n ± 0% 12.58n ± 3% -3.82% (p=0.000 n=20)
NormFloat64-32 7.738n ± 2% 7.920n ± 3% ~ (p=0.066 n=20)
Perm3-32 36.73n ± 1% 40.27n ± 1% +9.65% (p=0.000 n=20)
Perm30-32 211.9n ± 1% 213.2n ± 2% ~ (p=0.262 n=20)
Perm30ViaShuffle-32 165.2n ± 1% 164.2n ± 2% ~ (p=0.029 n=20)
ShuffleOverhead-32 133.9n ± 1% 134.7n ± 2% ~ (p=0.551 n=20)
Concurrent-32 3.287n ± 2% 3.301n ± 2% ~ (p=0.330 n=20)
For #61716.
Change-Id: I8d2f73f87dd3603a0c2ff069988938e0957b6904
Reviewed-on: https://go-review.googlesource.com/c/go/+/502499
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Rob Pike <r@golang.org>
|
|
Change the benchmarks to use the result of the calls,
as I found that in certain cases inlining resulted in
discarding part of the computation in the benchmark loop.
Add various benchmarks that will be relevant in future CLs.
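
The pattern of using the result of each call can be sketched as follows. This is a minimal, self-contained Go illustration with hypothetical function names, run through testing.Benchmark so that it works as an ordinary program; the benchmarks in the change itself may be structured differently.

```go
package main

import (
	"fmt"
	"math/rand/v2"
	"testing"
)

// Sink is a package-level variable. Storing into it makes the accumulated
// result observable, so the compiler cannot discard the work in the loop.
var Sink uint64

// benchmarkDiscard drops each result. After inlining, part of the
// computation may be eliminated, so the timing can be misleadingly low.
func benchmarkDiscard(b *testing.B) {
	r := rand.New(rand.NewPCG(1, 2))
	for i := 0; i < b.N; i++ {
		r.Uint64()
	}
}

// benchmarkUseResult accumulates every value and publishes the total.
func benchmarkUseResult(b *testing.B) {
	r := rand.New(rand.NewPCG(1, 2))
	var t uint64
	for i := 0; i < b.N; i++ {
		t += r.Uint64()
	}
	Sink = t
}

func main() {
	fmt.Println("discard:   ", testing.Benchmark(benchmarkDiscard))
	fmt.Println("use result:", testing.Benchmark(benchmarkUseResult))
}
```

Whether the two timings differ depends on the compiler version and architecture, so the numbers are best treated as indicative only.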
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 220860f76f.amd64 │
│ sec/op │
SourceUint64-32 1.555n ± 1%
GlobalInt64-32 2.071n ± 1%
GlobalInt63Parallel-32 0.1023n ± 1%
GlobalUint64-32 5.193n ± 1%
GlobalUint64Parallel-32 0.2341n ± 0%
Int64-32 2.056n ± 2%
Uint64-32 2.077n ± 2%
GlobalIntN1000-32 4.077n ± 2%
IntN1000-32 3.476n ± 2%
Int64N1000-32 3.059n ± 1%
Int64N1e8-32 2.942n ± 1%
Int64N1e9-32 2.932n ± 1%
Int64N2e9-32 2.925n ± 1%
Int64N1e18-32 3.116n ± 1%
Int64N2e18-32 4.067n ± 1%
Int64N4e18-32 4.054n ± 1%
Int32N1000-32 2.951n ± 1%
Int32N1e8-32 3.102n ± 1%
Int32N1e9-32 3.535n ± 1%
Int32N2e9-32 3.514n ± 1%
Float32-32 2.760n ± 1%
Float64-32 2.284n ± 1%
ExpFloat64-32 3.757n ± 1%
NormFloat64-32 3.837n ± 1%
Perm3-32 35.23n ± 2%
Perm30-32 208.8n ± 1%
Perm30ViaShuffle-32 111.7n ± 1%
ShuffleOverhead-32 101.1n ± 1%
Concurrent-32 2.108n ± 7%
goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
│ 220860f76f.arm64 │
│ sec/op │
SourceUint64-8 2.316n ± 1%
GlobalInt64-8 2.183n ± 1%
GlobalInt63Parallel-8 0.4331n ± 0%
GlobalUint64-8 4.377n ± 2%
GlobalUint64Parallel-8 0.9237n ± 0%
Int64-8 2.538n ± 1%
Uint64-8 2.604n ± 1%
GlobalIntN1000-8 3.857n ± 2%
IntN1000-8 3.822n ± 2%
Int64N1000-8 3.318n ± 0%
Int64N1e8-8 3.349n ± 1%
Int64N1e9-8 3.317n ± 2%
Int64N2e9-8 3.317n ± 2%
Int64N1e18-8 3.542n ± 1%
Int64N2e18-8 5.087n ± 0%
Int64N4e18-8 5.084n ± 0%
Int32N1000-8 3.208n ± 2%
Int32N1e8-8 3.610n ± 1%
Int32N1e9-8 4.235n ± 0%
Int32N2e9-8 4.229n ± 1%
Float32-8 3.468n ± 0%
Float64-8 3.447n ± 0%
ExpFloat64-8 4.567n ± 0%
NormFloat64-8 4.821n ± 0%
Perm3-8 28.89n ± 0%
Perm30-8 175.7n ± 0%
Perm30ViaShuffle-8 153.5n ± 0%
ShuffleOverhead-8 119.8n ± 1%
Concurrent-8 2.433n ± 3%
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 220860f76f.386 │
│ sec/op │
SourceUint64-32 2.370n ± 1%
GlobalInt64-32 3.569n ± 1%
GlobalInt63Parallel-32 0.3221n ± 1%
GlobalUint64-32 8.797n ± 10%
GlobalUint64Parallel-32 0.6351n ± 0%
Int64-32 2.612n ± 2%
Uint64-32 3.350n ± 1%
GlobalIntN1000-32 5.892n ± 1%
IntN1000-32 4.546n ± 1%
Int64N1000-32 14.59n ± 1%
Int64N1e8-32 14.76n ± 2%
Int64N1e9-32 16.57n ± 1%
Int64N2e9-32 14.54n ± 1%
Int64N1e18-32 16.14n ± 1%
Int64N2e18-32 18.10n ± 1%
Int64N4e18-32 18.65n ± 1%
Int32N1000-32 3.560n ± 1%
Int32N1e8-32 3.770n ± 2%
Int32N1e9-32 4.098n ± 0%
Int32N2e9-32 4.179n ± 1%
Float32-32 21.18n ± 4%
Float64-32 20.60n ± 2%
ExpFloat64-32 13.07n ± 0%
NormFloat64-32 7.738n ± 2%
Perm3-32 36.73n ± 1%
Perm30-32 211.9n ± 1%
Perm30ViaShuffle-32 165.2n ± 1%
ShuffleOverhead-32 133.9n ± 1%
Concurrent-32 3.287n ± 2%
For #61716.
Change-Id: I2f0938eae4b7bf736a8cd899a99783e731bf2179
Reviewed-on: https://go-review.googlesource.com/c/go/+/502496
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Rob Pike <r@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Removing Rand.Seed lets us remove lockedSource as well,
along with the ambiguity in globalRand about which source
to use.
For #61716.
Change-Id: Ibe150520dd1e7dd87165eacaebe9f0c2daeaedfd
Reviewed-on: https://go-review.googlesource.com/c/go/+/502498
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Rob Pike <r@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
|
|
Add more test cases.
Replace -printgolden with -update,
which rewrites the files for us.
For #61716.
Change-Id: I7c4c900ee896042429135a21971a56ebe16b6a66
Reviewed-on: https://go-review.googlesource.com/c/go/+/516858
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
In math/rand, Read is deprecated. Remove in v2.
People should use crypto/rand if they need long strings.
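As a rough illustration of the suggested replacement, the sketch below fills a byte slice from crypto/rand. The helper name randomBytes is hypothetical and not part of this change.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// randomBytes is a hypothetical helper: it returns n bytes read from
// the cryptographically secure generator exposed by crypto/rand.
func randomBytes(n int) ([]byte, error) {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		return nil, err
	}
	return b, nil
}

func main() {
	b, err := randomBytes(16)
	if err != nil {
		panic(err)
	}
	fmt.Println(hex.EncodeToString(b))
}
```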
For #61716.
Change-Id: Ib254b7e1844616e96db60a3a7abb572b0dcb1583
Reviewed-on: https://go-review.googlesource.com/c/go/+/502497
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Int31 -> Int32
Int31n -> Int32N
Int63 -> Int64
Int63n -> Int64N
Intn -> IntN
The 31 and 63 are pedantic and confusing: the functions should
be named for the type they return, same as all the others.
The lower-case n is inconsistent with Go's usual CamelCase
and especially problematic because we plan to add 'func N'.
Capitalize the n.
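A minimal sketch of what calling code looks like after the rename, assuming a Go release that ships math/rand/v2 (Go 1.22 or later). The rollDie helper is hypothetical.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// rollDie is a hypothetical helper returning a value in [1, 6].
// It uses IntN, which was spelled Intn in math/rand.
func rollDie() int {
	return rand.IntN(6) + 1
}

func main() {
	fmt.Println(rollDie())
	fmt.Println(rand.Int32N(100))  // was Int31n
	fmt.Println(rand.Int64N(1e18)) // was Int63n
	fmt.Println(rand.Int64())      // was Int63
}
```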
For #61716.
Change-Id: Idb1a005a82f353677450d47fb612ade7a41fde69
Reviewed-on: https://go-review.googlesource.com/c/go/+/516857
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This is the beginning of the math/rand/v2 package from proposal #61716.
Start by copying old API. This CL copies math/rand/* to math/rand/v2
and updates references to math/rand to add v2 throughout.
Later CLs will make the v2 changes.
For #61716.
Change-Id: I1624ccffae3dfa442d4ba2461942decbd076e11b
Reviewed-on: https://go-review.googlesource.com/c/go/+/502495
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
|
|
Running 'go fix' on the cmd+std packages handled much of this change.
Also update code generators to use only the new go:build lines,
not the old +build ones.
For #41184.
For #60268.
Change-Id: If35532abe3012e7357b02c79d5992ff5ac37ca23
Cq-Include-Trybots: luci.golang.try:gotip-linux-386-longtest,gotip-linux-amd64-longtest,gotip-windows-amd64-longtest
Reviewed-on: https://go-review.googlesource.com/c/go/+/536237
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Change-Id: I4a6c2ef6fd21355952ab7d8eaad883646a95d364
Reviewed-on: https://go-review.googlesource.com/c/go/+/535087
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
|
|
Change-Id: Id2079f7012392dea8dfe2386bb9fb1ea3f487a4a
Reviewed-on: https://go-review.googlesource.com/c/go/+/526015
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
|
|
Use ^ and $ in the -run flag regular expression value when the intention
is to invoke a single named test. This removes the reliance on there not
being another similarly named test to achieve the intended result.
In particular, package syscall has tests named TestUnshareMountNameSpace
and TestUnshareMountNameSpaceChroot that both trigger themselves setting
GO_WANT_HELPER_PROCESS=1 to run alternate code in a helper process. As a
consequence of overlap in their test names, the former was inadvertently
triggering one too many helpers.
Spotted while reviewing CL 525196. Apply the same change in other places
to make it easier for code readers to see that said tests aren't running
extraneous tests. The unlikely cases of -run=TestSomething intentionally
being used to run all tests that have the TestSomething substring in the
name can be better written as -run=^.*TestSomething.*$ or with a comment
so it is clear it wasn't an oversight.
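The overlap described above can be reproduced with the regexp package alone. The sketch below assumes simple patterns without slashes and only mimics the unanchored matching of -run; the matches helper is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports which test names a -run style pattern would select.
// Like -run, it matches anywhere in the name unless the pattern is anchored.
func matches(pattern string, names []string) []string {
	re := regexp.MustCompile(pattern)
	var out []string
	for _, n := range names {
		if re.MatchString(n) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	names := []string{
		"TestUnshareMountNameSpace",
		"TestUnshareMountNameSpaceChroot",
	}
	// Unanchored: both tests are selected.
	fmt.Println(matches("TestUnshareMountNameSpace", names))
	// Anchored: only the exact name is selected.
	fmt.Println(matches("^TestUnshareMountNameSpace$", names))
}
```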
Change-Id: Iba208aba3998acdbf8c6708e5d23ab88938bfc1e
Reviewed-on: https://go-review.googlesource.com/c/go/+/524948
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Kirill Kolyshkin <kolyshkin@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Change-Id: I71a38dd20bfaf2b1aed18892d54eeb017d3d7d66
GitHub-Last-Rev: 8da43b2cbd563ed123690709e519c9f84272b332
GitHub-Pull-Request: golang/go#61955
Reviewed-on: https://go-review.googlesource.com/c/go/+/518595
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
|
|
Change-Id: I9e95806116a8547ec782f66226d1b1382c6156de
GitHub-Last-Rev: 5b4ce994c162775e91aa00c942571bc0ac8b1eca
GitHub-Pull-Request: golang/go#61829
Reviewed-on: https://go-review.googlesource.com/c/go/+/516895
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
The new s390x assembly implementations of Sin/Cos/SinCos/Tan handle the
huge-argument tests.
Updates #29240
Change-Id: I9f22d9714528ef2af52c749079f3727250089baf
Reviewed-on: https://go-review.googlesource.com/c/go/+/509675
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Currently on s390x, the sin/cos assembly implementation does not handle huge
arguments. This change makes the assembly routine fall back to the native Go
implementation for huge arguments. Implementing the changes in assembly gives
better performance than making them in native Go, in terms of execution cycles.
name Go_changes Asm_changes
Sin/input_size_(0.5)-8 11.85ns ± 0% 5.32ns ± 1%
Sin/input_size_(1<<20)-8 15.32ns ± 0% 9.75ns ± 3%
Sin/input_size_(1<<_40)-8 17.9ns ± 0% 10.3ns ± 6%
Sin/input_size_(1<<50)-8 16.33ns ± 0% 9.75ns ± 6%
Sin/input_size_(1<<60)-8 33.0ns ± 1% 29.1ns ± 0%
Sin/input_size_(1<<80)-8 29.9ns ± 0% 27.2ns ± 2%
Sin/input_size_(1<<200)-8 31.5ns ± 1% 28.3ns ± 0%
Sin/input_size_(1<<480)-8 29.4ns ± 1% 28.0ns ± 1%
Sin/input_size_(1234567891234567_<<_180)-8 29.3ns ± 1% 28.0ns ± 0%
Cos/input_size_(0.5)-8 10.33ns ± 0% 5.69ns ± 1%
Cos/input_size_(1<<20)-8 16.67ns ± 0% 9.18ns ± 0%
Cos/input_size_(1<<_40)-8 18.50ns ± 0% 9.45ns ± 3%
Cos/input_size_(1<<50)-8 16.67ns ± 0% 9.18ns ± 1%
Cos/input_size_(1<<60)-8 31.6ns ± 1% 26.7ns ± 2%
Cos/input_size_(1<<80)-8 31.3ns ± 0% 25.5ns ± 1%
Cos/input_size_(1<<200)-8 30.0ns ± 0% 26.7ns ± 1%
Cos/input_size_(1<<480)-8 31.9ns ± 2% 27.0ns ± 0%
Cos/input_size_(1234567891234567_<<_180)-8 31.8ns ± 0% 26.9ns ± 0%
Fixes #29240
Change-Id: Id2ebcfa113926f27510d527e80daaddad925a707
Reviewed-on: https://go-review.googlesource.com/c/go/+/469635
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bill O'Farrell <billotosyr@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Currently on s390x, the tan assembly implementation does not handle huge arguments at all. This change checks for large arguments and falls back from the assembly code to the native Go implementation for huge arguments.
The changes are implemented in assembly code to get better performance than the native Go implementation.
Benchmark details for the tan function with table-driven inputs are available in the linked issue.
Fixes #37854
Change-Id: I4e5321e65c27b7ce8c497fc9d3991ca8604753d2
Reviewed-on: https://go-review.googlesource.com/c/go/+/470595
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
|
|
Sin/Tan are odd, Cos is even, so it is easy to compute the correct
result from the positive argument case.
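The symmetry argument can be sketched in Go as follows. Here math.Sin, math.Cos and math.Tan stand in for a routine that is only exercised on non-negative inputs; the wrapper names and the agrees helper are hypothetical and are not the code from this change.

```go
package main

import (
	"fmt"
	"math"
)

// sinFromPositive relies on sin being odd: sin(-x) = -sin(x).
func sinFromPositive(x float64) float64 {
	if x < 0 {
		return -math.Sin(-x)
	}
	return math.Sin(x)
}

// cosFromPositive relies on cos being even: cos(-x) = cos(x).
func cosFromPositive(x float64) float64 {
	return math.Cos(math.Abs(x))
}

// tanFromPositive relies on tan being odd: tan(-x) = -tan(x).
func tanFromPositive(x float64) float64 {
	if x < 0 {
		return -math.Tan(-x)
	}
	return math.Tan(x)
}

// agrees reports whether the wrappers match the library results within tol.
func agrees(x, tol float64) bool {
	return math.Abs(sinFromPositive(x)-math.Sin(x)) <= tol &&
		math.Abs(cosFromPositive(x)-math.Cos(x)) <= tol &&
		math.Abs(tanFromPositive(x)-math.Tan(x)) <= tol
}

func main() {
	for _, x := range []float64{-0.5, -2, -100} {
		fmt.Println(x, sinFromPositive(x), cosFromPositive(x), tanFromPositive(x))
	}
}
```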
Change-Id: If851d00fc7f515ece8199cf56d21186ced51e94f
Reviewed-on: https://go-review.googlesource.com/c/go/+/509815
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Srinivas Pokala <Pokala.Srinivas@ibm.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Adds a test that triggers the RISC-V fused multiply-add code
generation bug fixed by CL 506575.
Change-Id: Ia3a55a68b48c5cc6beac4e5235975dea31f3faf2
Reviewed-on: https://go-review.googlesource.com/c/go/+/507035
Auto-Submit: M Zhuo <mzh@golangcn.org>
Reviewed-by: M Zhuo <mzh@golangcn.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
When x*y == -z the portable implementation of FMA copied the sign
bit from x*y into the result. This meant that when x*y == -z and
x*y < 0 the result was -0 which is incorrect.
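A small check of the corrected behavior, assuming a Go release that includes this fix: when x*y is negative and exactly cancels z, the result should be +0 rather than -0. The cancelsToPositiveZero helper is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// cancelsToPositiveZero reports whether FMA(x, y, z) is exactly +0.
func cancelsToPositiveZero(x, y, z float64) bool {
	r := math.FMA(x, y, z)
	return r == 0 && !math.Signbit(r)
}

func main() {
	// x*y == -6 and z == 6, so x*y == -z with x*y < 0.
	fmt.Println(cancelsToPositiveZero(-2, 3, 6))
}
```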
Fixes #61130.
Change-Id: Ib93a568b7bdb9031e2aedfa1bdfa9bddde90851d
Reviewed-on: https://go-review.googlesource.com/c/go/+/507376
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joedian Reid <joedian@golang.org>
|
|
For #59488
Fixes #60616
Change-Id: Idf9f42d7d868999664652dd7b478684a474f1d96
Reviewed-on: https://go-review.googlesource.com/c/go/+/501355
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Rob Pike <r@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
|
|
Fix spelling errors discovered using https://github.com/codespell-project/codespell. Errors in data files and vendored packages are ignored.
Change-Id: I83c7818222f2eea69afbd270c15b7897678131dc
GitHub-Last-Rev: 3491615b1b82832cc0064f535786546e89aa6184
GitHub-Pull-Request: golang/go#60758
Reviewed-on: https://go-review.googlesource.com/c/go/+/502576
Auto-Submit: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
There are some symbol mismatches in the comments; this commit attempts to fix them.
Change-Id: I5c9075e5218defe9233c075744d243b26ff68496
Reviewed-on: https://go-review.googlesource.com/c/go/+/492996
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: shuang cui <imcusg@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
|
|
The "To" prefix was a relic of the first draft
that I failed to make consistent with the unprefixed
name used in the proposal. Fortunately iant spotted
it during the API audit.
Updates #56984
Updates #60560
Change-Id: Ifa6eeddf6dd5f0637c0568e383f9a4bef88b10f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/500116
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Alan Donovan <adonovan@google.com>
|
|
This reverts CL 467515. Now that we have cmp.Compare,
we don't need math.Compare or math.Compare32 after all.
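For reference, a short sketch of cmp.Compare applied to floating-point values, assuming Go 1.21 or later. Note that cmp.Compare treats a NaN as less than any non-NaN and treats -0 as equal to +0, which differs from the IEEE-754 total order used by the reverted functions. The sample and sortedFloats helpers are hypothetical.

```go
package main

import (
	"cmp"
	"fmt"
	"math"
	"slices"
)

// sample returns a few values including a NaN, for demonstration only.
func sample() []float64 {
	return []float64{3, math.NaN(), -1, 0}
}

// sortedFloats returns a sorted copy of xs using cmp.Compare as the ordering.
func sortedFloats(xs []float64) []float64 {
	out := slices.Clone(xs)
	slices.SortFunc(out, cmp.Compare[float64])
	return out
}

func main() {
	fmt.Println(cmp.Compare(1.0, 2.0))
	fmt.Println(cmp.Compare(math.NaN(), 1.0))
	fmt.Println(sortedFloats(sample()))
}
```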
For #56491
Fixes #60519
Change-Id: I8ed33464adfc6d69bd6b328edb26aa2ee3d234d9
Reviewed-on: https://go-review.googlesource.com/c/go/+/499416
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Eli Bendersky <eliben@google.com>
|
|
b.ResetTimer used to also stop the timer, however it does not anymore.
These benchmarks hadn't been fixed and as a result ended up measuring
some additional things.
Also, make some for loops more conventional.
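The distinction can be sketched as follows. The benchmark body is hypothetical and is not one of the benchmarks touched by this change; it only shows that setup has to be bracketed with StopTimer and StartTimer, because ResetTimer zeroes the measurements without pausing the timer.

```go
package main

import (
	"fmt"
	"testing"
)

var sink int // keeps the compiler from discarding the measured work

// benchSum is a hypothetical benchmark that excludes its setup cost.
func benchSum(b *testing.B) {
	b.StopTimer() // pause the timer during setup
	data := make([]int, 4096)
	for i := range data {
		data[i] = i
	}
	b.ResetTimer() // zero the measurements; does not start or stop the timer
	b.StartTimer() // resume timing for the measured loop
	for i := 0; i < b.N; i++ {
		s := 0
		for _, v := range data {
			s += v
		}
		sink = s
	}
}

func main() {
	res := testing.Benchmark(benchSum)
	fmt.Println(res.N, res.T)
}
```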
Change-Id: I76ca68456d85eec51722a80587e5b2c9f5d836a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/496996
Run-TryBot: Damien Neil <dneil@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Auto-Submit: Damien Neil <dneil@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Damien Neil <dneil@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
|
|
Fix comments, including duplicate is, wrong phrases and articles, misspellings, etc.
Change-Id: I8bfea53b9b275e649757cc4bee6a8a026ed9c7a4
Reviewed-on: https://go-review.googlesource.com/c/go/+/493035
Reviewed-by: Benny Siegert <bsiegert@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: shuang cui <imcusg@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
|
|
The initial purpose of PCALIGN was to identify code
where it would be beneficial to align code for performance,
but avoid cases where too many NOPs were added. On p10, it
is now necessary to enforce a certain alignment in some
cases, so the behavior of PCALIGN needs to be slightly
different. Code will now be aligned to the value specified
on the PCALIGN instruction regardless of number of NOPs added,
which is more intuitive and consistent with power assembler
alignment directives.
This also adds 64 as a possible alignment value.
The existing values used in PCALIGN were modified according to
the new behavior.
A testcase was updated and performance testing was done to
verify that this does not adversely affect performance.
Change-Id: Iad1cf5ff112e5bfc0514f0805be90e24095e932b
Reviewed-on: https://go-review.googlesource.com/c/go/+/485056
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
|
|
Re-run all go:generate stringer commands. This mostly adds checks
that the constant values did not change, but does add new strings
for the debug/dwarf and internal/pkgbits packages.
Change-Id: I5fc41f20da47338152c183d45d5ae65074e2fccf
Reviewed-on: https://go-review.googlesource.com/c/go/+/483717
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
|
|
Fixes #59331
Change-Id: I62156be2f2758c59349c3b02db6cf9140429c9e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/481915
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Bypass: Ian Lance Taylor <iant@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
Fixes the misuse of "a" vs "an", according to English grammatical
expectations and using https://www.a-or-an.com/
Change-Id: I53ac724070e3ff3d33c304483fe72c023c7cda47
Reviewed-on: https://go-review.googlesource.com/c/go/+/480536
Run-TryBot: shuang cui <imcusg@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
I noticed the one in path/filepath while reading the docs,
and the other ones were found via some quick grepping.
Change-Id: I386f2f74ef816a6d18aa2f58ee6b64dbd0147c9e
Reviewed-on: https://go-review.googlesource.com/c/go/+/478795
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
Most of these are one-off mistakes. Only one file was all spaces.
Change-Id: I277c3ce4a4811aa4248c90676f66bc775ae8d062
Reviewed-on: https://go-review.googlesource.com/c/go/+/478976
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This change introduces the Compare and Compare32 functions
based on the total-ordering predicate in IEEE-754, section 5.10.
In particular,
* -NaN is ordered before any other value
* +NaN is ordered after any other value
* -0 is ordered before +0
* All other values are ordered the usual way
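One way to realize this ordering, shown here only as a sketch and not necessarily the code in this change, is to map each float64 to an int64 key whose signed order matches the rules above. All function names below are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// totalOrderKey maps x to an int64 whose signed ordering follows the
// total-ordering rules listed above.
func totalOrderKey(x float64) int64 {
	bits := int64(math.Float64bits(x))
	// If the sign bit is set, flip the low 63 bits so that negative
	// values with larger magnitude produce smaller keys.
	bits ^= int64(uint64(bits>>63) >> 1)
	return bits
}

// compareTotal returns -1, 0, or +1 according to the total order.
func compareTotal(x, y float64) int {
	a, b := totalOrderKey(x), totalOrderKey(y)
	switch {
	case a < b:
		return -1
	case a > b:
		return +1
	}
	return 0
}

// sample lists values in the order the rules above prescribe.
func sample() []float64 {
	nan := math.NaN()
	return []float64{
		math.Copysign(nan, -1), math.Inf(-1), -1,
		math.Copysign(0, -1), 0, 1, math.Inf(1), nan,
	}
}

// strictlyAscending reports whether each element orders before the next.
func strictlyAscending(xs []float64) bool {
	for i := 0; i+1 < len(xs); i++ {
		if compareTotal(xs[i], xs[i+1]) >= 0 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(strictlyAscending(sample()))
}
```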
Compare-8 0.4537n ± 1%
Compare32-8 0.3752n ± 1%
geomean 0.4126n
Fixes #56491.
Change-Id: I5c9c77430a2872f380688c1b0a66f2105b77d5ac
Reviewed-on: https://go-review.googlesource.com/c/go/+/467515
Reviewed-by: WANG Xuerui <git@xen0n.name>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Change-Id: I16ec916b47de2f417b681c8abff5a1375ddf491b
Reviewed-on: https://go-review.googlesource.com/c/go/+/468055
Run-TryBot: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
This reverts CL 459435.
Reason for revert: Tests failing on MIPS.
Change-Id: I9017bf718ba938df6d6766041555034d55d90b8a
Reviewed-on: https://go-review.googlesource.com/c/go/+/467255
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Akhil Indurti <aindurti@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Change-Id: I37a9e4362953a711840087e1b7b8d7a25f1a83b7
Reviewed-on: https://go-review.googlesource.com/c/go/+/467275
Reviewed-by: Russ Cox <rsc@golang.org>
TryBot-Bypass: Russ Cox <rsc@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
|
|
It currently says only what it wasn't good for, which is not helpful.
Change-Id: I468c7f385c14eaca99788a94d53c30b729ed0944
Reviewed-on: https://go-review.googlesource.com/c/go/+/466276
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
|
|
This change introduces the Compare and Compare32 functions
based on the total-ordering predicate in IEEE-754, section 5.10.
In particular,
* -NaN is ordered before any other value
* +NaN is ordered after any other value
* -0 is ordered before +0
* All other values are ordered the usual way
name time/op
Compare-8 0.24ns ± 1%
Compare32-8 0.24ns ± 0%
Fixes #56491.
Change-Id: I9444fbfefe26741794c4436a26d403b8da97bdaf
Reviewed-on: https://go-review.googlesource.com/c/go/+/459435
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
Now that the top-level math/rand functions are auto-seeded by default
(issue #54880), use the runtime fastrand64 function when 1) Seed
has not been called and 2) GODEBUG randautoseed=0 is not used.
The benchmarks run quickly and are relatively noisy, but they show
significant improvements for parallel calls to the top-level functions.
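The usage pattern that stands to benefit looks roughly like the sketch below: many goroutines calling the top-level functions without ever calling Seed. The drawConcurrently helper is hypothetical and makes no claim about the timings reported here.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// drawConcurrently calls the top-level rand.Int63 from several goroutines
// and returns how many non-negative values were drawn in total.
func drawConcurrently(workers, perWorker int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			n := 0
			for i := 0; i < perWorker; i++ {
				if rand.Int63() >= 0 { // Int63 never returns a negative value
					n++
				}
			}
			mu.Lock()
			total += n
			mu.Unlock()
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(drawConcurrently(8, 1000))
}
```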
goos: linux
goarch: amd64
pkg: math/rand
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
│ /tmp/foo.1 │ /tmp/foo.2 │
│ sec/op │ sec/op vs base │
Int63Threadsafe-16 11.605n ± 1% 3.094n ± 1% -73.34% (p=0.000 n=10)
Int63ThreadsafeParallel-16 67.8350n ± 2% 0.4000n ± 1% -99.41% (p=0.000 n=10)
Int63Unthreadsafe-16 1.947n ± 3% 1.924n ± 2% ~ (p=0.189 n=10)
Intn1000-16 4.295n ± 2% 4.287n ± 3% ~ (p=0.517 n=10)
Int63n1000-16 4.379n ± 0% 4.192n ± 2% -4.27% (p=0.000 n=10)
Int31n1000-16 3.641n ± 3% 3.506n ± 0% -3.69% (p=0.000 n=10)
Float32-16 3.330n ± 7% 3.250n ± 2% -2.40% (p=0.017 n=10)
Float64-16 2.194n ± 6% 2.056n ± 4% -6.31% (p=0.004 n=10)
Perm3-16 43.39n ± 9% 38.28n ± 12% -11.77% (p=0.015 n=10)
Perm30-16 324.4n ± 6% 315.9n ± 19% ~ (p=0.315 n=10)
Perm30ViaShuffle-16 175.4n ± 1% 143.6n ± 2% -18.15% (p=0.000 n=10)
ShuffleOverhead-16 223.4n ± 2% 215.8n ± 1% -3.38% (p=0.000 n=10)
Read3-16 5.428n ± 3% 5.406n ± 2% ~ (p=0.780 n=10)
Read64-16 41.55n ± 5% 40.14n ± 3% -3.38% (p=0.000 n=10)
Read1000-16 622.9n ± 4% 594.9n ± 2% -4.50% (p=0.000 n=10)
Concurrent-16 136.300n ± 2% 4.647n ± 26% -96.59% (p=0.000 n=10)
geomean 23.40n 12.15n -48.08%
Fixes #49892
Change-Id: Iba75b326145512ab0b7ece233b98ac3d4e1fb504
Reviewed-on: https://go-review.googlesource.com/c/go/+/465037
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
|
|
Change-Id: I31bec5d2b4a79a085942c7d380678379d99cf07b
Reviewed-on: https://go-review.googlesource.com/c/go/+/455135
Auto-Submit: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Run-TryBot: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
|