| Age | Commit message | Author |
|
Copying the loop variable is no longer necessary since Go 1.22.
Change-Id: Iebb21dac44a20ec200567f1d786f105a4ee4999d
Reviewed-on: https://go-review.googlesource.com/c/go/+/711640
Reviewed-by: Florian Lehner <lehner.florian86@gmail.com>
Auto-Submit: Damien Neil <dneil@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Damien Neil <dneil@google.com>
Auto-Submit: Tobias Klauser <tobias.klauser@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
If the -test.run value is not surrounded by ^$ then any test that
matches the -test.run value will be run. This is normally not the
desired behavior, as it can lead to unexpected tests being run.
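
The effect of the missing anchors can be sketched with the regexp package directly. This is a simplification: the test names are hypothetical, and the real testing package additionally splits the pattern on "/" to select subtests.

```go
package main

import (
	"fmt"
	"regexp"
)

// matching returns the names that pattern matches. An unanchored
// regular expression matches anywhere inside a name.
func matching(pattern string, names []string) []string {
	re := regexp.MustCompile(pattern)
	var out []string
	for _, n := range names {
		if re.MatchString(n) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	names := []string{"TestUint", "TestUint32", "TestUint64"}
	fmt.Println(matching("TestUint", names))   // [TestUint TestUint32 TestUint64]
	fmt.Println(matching("^TestUint$", names)) // [TestUint]
}
```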
Change-Id: I3447aaebad5156bbef7f263cdb9f6b8c32331324
Reviewed-on: https://go-review.googlesource.com/c/go/+/651956
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This harmonizes the docs with the (*Rand).Uint* functions.
It also makes the behavior clearer: I wasn't sure whether it would try
to interpret the uint as a signed number somehow. It does not pull any
surprises, so make that clear.
Change-Id: I5a87a0a5563dbabfc31e536e40ee69b11f5cb6cf
Reviewed-on: https://go-review.googlesource.com/c/go/+/633535
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Commit-Queue: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
|
|
If Be and Le stand for big-endian and little-endian,
then they should be BE and LE.
Change-Id: I723e3962b8918da84791783d3c547638f1c9e8a9
Reviewed-on: https://go-review.googlesource.com/c/go/+/627376
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
As some callers don't have a testing context, modify testenv.Executable
to accept nil (similar to how testenv.GOROOT works).
Change-Id: I39112a7869933785a26b5cb6520055b3cc42b847
Reviewed-on: https://go-review.googlesource.com/c/go/+/609835
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Implement the encoding.(Binary|Text)Appender interfaces for "x509.OID".
Implement the encoding.BinaryAppender interface for "rand/v2.PCG" and "rand/v2.ChaCha8".
"rand/v2.ChaCha8.MarshalBinary" also gains some performance benefits:
│ old │ new │
│ sec/op │ sec/op vs base │
ChaCha8MarshalBinary-8 33.730n ± 2% 9.786n ± 1% -70.99% (p=0.000 n=10)
ChaCha8MarshalBinaryRead-8 99.86n ± 1% 17.79n ± 0% -82.18% (p=0.000 n=10)
geomean 58.04n 13.19n -77.27%
│ old │ new │
│ B/op │ B/op vs base │
ChaCha8MarshalBinary-8 48.00 ± 0% 0.00 ± 0% -100.00% (p=0.000 n=10)
ChaCha8MarshalBinaryRead-8 83.00 ± 0% 0.00 ± 0% -100.00% (p=0.000 n=10)
│ old │ new │
│ allocs/op │ allocs/op vs base │
ChaCha8MarshalBinary-8 1.000 ± 0% 0.000 ± 0% -100.00% (p=0.000 n=10)
ChaCha8MarshalBinaryRead-8 2.000 ± 0% 0.000 ± 0% -100.00% (p=0.000 n=10)
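
A minimal sketch of the appender pattern, assuming a hypothetical type named counter. Its AppendBinary method has the same shape as the encoding.BinaryAppender method, but this is not the code from the CL. It illustrates why appending into a caller-owned buffer can avoid a fresh allocation per call.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// counter is a hypothetical type used only for illustration.
type counter struct{ hi, lo uint64 }

// AppendBinary appends the encoded form of c to b and returns the
// extended slice, so the caller decides where the bytes live.
func (c counter) AppendBinary(b []byte) ([]byte, error) {
	b = binary.BigEndian.AppendUint64(b, c.hi)
	b = binary.BigEndian.AppendUint64(b, c.lo)
	return b, nil
}

// MarshalBinary is expressed in terms of AppendBinary.
func (c counter) MarshalBinary() ([]byte, error) {
	return c.AppendBinary(make([]byte, 0, 16))
}

func main() {
	buf := make([]byte, 0, 64) // caller-owned buffer, reusable across calls
	c := counter{hi: 1, lo: 2}
	buf, _ = c.AppendBinary(buf[:0])
	fmt.Println(len(buf))
}
```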
For #62384
Change-Id: I604bde6dad90a916012909c7260f4bb06dcf5c0a
GitHub-Last-Rev: 78abf9c5dfb74838985637798bcd5cb957541d20
GitHub-Pull-Request: golang/go#68987
Reviewed-on: https://go-review.googlesource.com/c/go/+/607079
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
Makes calls to the global Seed a no-op. The GODEBUG=randseednop=0
setting can be used to revert this behavior.
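
For code that relied on the global Seed to get a repeatable sequence, a local generator is the usual replacement. A minimal sketch follows; the helper name sequence and the seed value are illustrative only.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sequence returns n values from a local generator seeded with seed.
// It does not touch the global generator, so it is unaffected by
// whether the global Seed is a no-op.
func sequence(seed int64, n int) []int {
	r := rand.New(rand.NewSource(seed))
	out := make([]int, n)
	for i := range out {
		out[i] = r.Intn(100)
	}
	return out
}

func main() {
	fmt.Println(sequence(42, 5))
	fmt.Println(sequence(42, 5)) // same seed, same sequence
}
```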
Fixes #67273
Change-Id: I79c1b2b23f3bc472fbd6190cb916a9d7583250f4
Reviewed-on: https://go-review.googlesource.com/c/go/+/606055
Auto-Submit: Cherry Mui <cherryyz@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
In all cases the intent was not to interpret s as a format string.
In one case (go/types), this was a latent bug in production.
(These were uncovered by a new check in vet's printf analyzer.)
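
One common shape of this fix, sketched with a hypothetical helper named fail: when s is an arbitrary message rather than a format string, pass it to a function that does not interpret verbs.

```go
package main

import (
	"errors"
	"fmt"
)

// fail wraps an arbitrary message in an error without formatting it.
// The problematic form would be fmt.Errorf(s), which treats any '%'
// in s as the start of a formatting verb.
func fail(s string) error {
	return errors.New(s)
}

func main() {
	fmt.Println(fail("disk 100% full"))
}
```

Where a formatting function has to be used, passing the message as an argument to an explicit "%s" format has the same effect.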
Updates #60529
Change-Id: I3e17af7e589be9aec1580783a1b1011c52ec494b
Reviewed-on: https://go-review.googlesource.com/c/go/+/587855
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
Fixes #67059
Closes #67452
Closes #67498
Change-Id: I84eba2ed787a17e9d6aaad2a8a78596e3944909a
Reviewed-on: https://go-review.googlesource.com/c/go/+/587280
Reviewed-by: Roland Shoemaker <roland@golang.org>
Auto-Submit: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Just a cleanup.
Change-Id: Ibeb2c7d447c793086280e612fe5f0f7eeb863f71
Reviewed-on: https://go-review.googlesource.com/c/go/+/582875
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
|
|
Change-Id: I6d0050319c66fb62c817206e646e1a9449dc444c
Reviewed-on: https://go-review.googlesource.com/c/go/+/585715
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Robert Griesemer <gri@google.com>
|
|
Change-Id: Id07f16d14133ee539bc2880b39641c42418fa6e2
GitHub-Last-Rev: 7b327d508f677f2476d24f046d25921f4599dd9a
GitHub-Pull-Request: golang/go#67319
Reviewed-on: https://go-review.googlesource.com/c/go/+/585016
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
Uint was part of the approved proposal but was inadvertently left
out of Go 1.22. Add for Go 1.23.
Change-Id: Ifaf24447bd70c8524c2fd299eefdf4aa29e49e66
Reviewed-on: https://go-review.googlesource.com/c/go/+/583455
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
|
|
This follows the guidance at https://go.dev/wiki/CodeReviewComments#receiver-names.
Change-Id: Ib8bc57cf6a680e5c75d7346b74e77847945f6939
Reviewed-on: https://go-review.googlesource.com/c/go/+/568635
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
It's easier to go look at its documentation when there's a link.
Change-Id: Iad6c1aa1a3f4b9127dc526b4db473239329780d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/563255
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
|
|
Move ChaCha8 code into internal/chacha8rand and use it to implement
runtime.rand, which is used for the unseeded global source for
both math/rand and math/rand/v2. This also affects the calculation of
the start point for iteration over very very large maps (when the
32-bit fastrand is not big enough).
The benefit is that misuse of the global random number generators
in math/rand and math/rand/v2 in contexts where non-predictable
randomness is important for security reasons is no longer a
security problem, removing a common mistake among programmers
who are unaware of the different kinds of randomness.
The cost is an extra 304 bytes per thread stored in the m struct
plus 2-3ns more per random uint64 due to the more sophisticated
algorithm. Using PCG looks like it would cost about the same,
although I haven't benchmarked that.
Before this, the math/rand and math/rand/v2 global generator
was wyrand (https://github.com/wangyi-fudan/wyhash).
For math/rand, using wyrand instead of the Mitchell/Reeds/Thompson
ALFG was justifiable, since the latter was not any better.
But for math/rand/v2, the global generator really should be
at least as good as one of the well-studied, specific algorithms
provided directly by the package, and it's not.
(Wyrand is still reasonable for scheduling and cache decisions.)
Good randomness does have a cost: about twice wyrand.
Also rationalize the various runtime rand references.
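
From the caller's side nothing changes: the top-level functions in math/rand/v2 draw from the unseeded global source described above. A minimal usage sketch, with a hypothetical helper named roll:

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// roll returns a value in [1, 6] drawn from the global generator.
// No seeding call is needed or available for the global source.
func roll() int {
	return rand.IntN(6) + 1
}

func main() {
	fmt.Println(roll(), rand.Uint64())
}
```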
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ bbb48afeb7.amd64 │ 5cf807d1ea.amd64 │
│ sec/op │ sec/op vs base │
ChaCha8-32 1.862n ± 2% 1.861n ± 2% ~ (p=0.825 n=20)
PCG_DXSM-32 1.471n ± 1% 1.460n ± 2% ~ (p=0.153 n=20)
SourceUint64-32 1.636n ± 2% 1.582n ± 1% -3.30% (p=0.000 n=20)
GlobalInt64-32 2.087n ± 1% 3.663n ± 1% +75.54% (p=0.000 n=20)
GlobalInt64Parallel-32 0.1042n ± 1% 0.2026n ± 1% +94.48% (p=0.000 n=20)
GlobalUint64-32 2.263n ± 2% 3.724n ± 1% +64.57% (p=0.000 n=20)
GlobalUint64Parallel-32 0.1019n ± 1% 0.1973n ± 1% +93.67% (p=0.000 n=20)
Int64-32 1.771n ± 1% 1.774n ± 1% ~ (p=0.449 n=20)
Uint64-32 1.863n ± 2% 1.866n ± 1% ~ (p=0.364 n=20)
GlobalIntN1000-32 3.134n ± 3% 4.730n ± 2% +50.95% (p=0.000 n=20)
IntN1000-32 2.489n ± 1% 2.489n ± 1% ~ (p=0.683 n=20)
Int64N1000-32 2.521n ± 1% 2.516n ± 1% ~ (p=0.394 n=20)
Int64N1e8-32 2.479n ± 1% 2.478n ± 2% ~ (p=0.743 n=20)
Int64N1e9-32 2.530n ± 2% 2.514n ± 2% ~ (p=0.193 n=20)
Int64N2e9-32 2.501n ± 1% 2.494n ± 1% ~ (p=0.616 n=20)
Int64N1e18-32 3.227n ± 1% 3.205n ± 1% ~ (p=0.101 n=20)
Int64N2e18-32 3.647n ± 1% 3.599n ± 1% ~ (p=0.019 n=20)
Int64N4e18-32 5.135n ± 1% 5.069n ± 2% ~ (p=0.034 n=20)
Int32N1000-32 2.657n ± 1% 2.637n ± 1% ~ (p=0.180 n=20)
Int32N1e8-32 2.636n ± 1% 2.636n ± 1% ~ (p=0.763 n=20)
Int32N1e9-32 2.660n ± 2% 2.638n ± 1% ~ (p=0.358 n=20)
Int32N2e9-32 2.662n ± 2% 2.618n ± 2% ~ (p=0.064 n=20)
Float32-32 2.272n ± 2% 2.239n ± 2% ~ (p=0.194 n=20)
Float64-32 2.272n ± 1% 2.286n ± 2% ~ (p=0.763 n=20)
ExpFloat64-32 3.762n ± 1% 3.744n ± 1% ~ (p=0.171 n=20)
NormFloat64-32 3.706n ± 1% 3.655n ± 2% ~ (p=0.066 n=20)
Perm3-32 32.93n ± 3% 34.62n ± 1% +5.13% (p=0.000 n=20)
Perm30-32 202.9n ± 1% 204.0n ± 1% ~ (p=0.482 n=20)
Perm30ViaShuffle-32 115.0n ± 1% 114.9n ± 1% ~ (p=0.358 n=20)
ShuffleOverhead-32 112.8n ± 1% 112.7n ± 1% ~ (p=0.692 n=20)
Concurrent-32 2.107n ± 0% 3.725n ± 1% +76.75% (p=0.000 n=20)
goos: darwin
goarch: arm64
pkg: math/rand/v2
│ bbb48afeb7.arm64 │ 5cf807d1ea.arm64 │
│ sec/op │ sec/op vs base │
ChaCha8-8 2.480n ± 0% 2.429n ± 0% -2.04% (p=0.000 n=20)
PCG_DXSM-8 2.531n ± 0% 2.530n ± 0% ~ (p=0.877 n=20)
SourceUint64-8 2.534n ± 0% 2.533n ± 0% ~ (p=0.732 n=20)
GlobalInt64-8 2.172n ± 1% 4.794n ± 0% +120.67% (p=0.000 n=20)
GlobalInt64Parallel-8 0.4320n ± 0% 0.9605n ± 0% +122.32% (p=0.000 n=20)
GlobalUint64-8 2.182n ± 0% 4.770n ± 0% +118.58% (p=0.000 n=20)
GlobalUint64Parallel-8 0.4307n ± 0% 0.9583n ± 0% +122.51% (p=0.000 n=20)
Int64-8 4.107n ± 0% 4.104n ± 0% ~ (p=0.416 n=20)
Uint64-8 4.080n ± 0% 4.080n ± 0% ~ (p=0.052 n=20)
GlobalIntN1000-8 2.814n ± 2% 5.643n ± 0% +100.50% (p=0.000 n=20)
IntN1000-8 4.141n ± 0% 4.139n ± 0% ~ (p=0.140 n=20)
Int64N1000-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.313 n=20)
Int64N1e8-8 4.140n ± 0% 4.139n ± 0% ~ (p=0.103 n=20)
Int64N1e9-8 4.139n ± 0% 4.140n ± 0% ~ (p=0.761 n=20)
Int64N2e9-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.636 n=20)
Int64N1e18-8 5.266n ± 0% 5.326n ± 1% +1.14% (p=0.001 n=20)
Int64N2e18-8 6.052n ± 0% 6.167n ± 0% +1.90% (p=0.000 n=20)
Int64N4e18-8 8.826n ± 0% 9.051n ± 0% +2.55% (p=0.000 n=20)
Int32N1000-8 4.127n ± 0% 4.132n ± 0% +0.12% (p=0.000 n=20)
Int32N1e8-8 4.126n ± 0% 4.131n ± 0% +0.12% (p=0.000 n=20)
Int32N1e9-8 4.127n ± 0% 4.132n ± 0% +0.12% (p=0.000 n=20)
Int32N2e9-8 4.132n ± 0% 4.131n ± 0% ~ (p=0.017 n=20)
Float32-8 4.109n ± 0% 4.105n ± 0% ~ (p=0.379 n=20)
Float64-8 4.107n ± 0% 4.106n ± 0% ~ (p=0.867 n=20)
ExpFloat64-8 5.339n ± 0% 5.383n ± 0% +0.82% (p=0.000 n=20)
NormFloat64-8 5.735n ± 0% 5.737n ± 1% ~ (p=0.856 n=20)
Perm3-8 26.65n ± 0% 26.80n ± 1% +0.58% (p=0.000 n=20)
Perm30-8 194.8n ± 1% 197.0n ± 0% +1.18% (p=0.000 n=20)
Perm30ViaShuffle-8 156.6n ± 0% 157.6n ± 1% +0.61% (p=0.000 n=20)
ShuffleOverhead-8 124.9n ± 0% 125.5n ± 0% +0.52% (p=0.000 n=20)
Concurrent-8 2.434n ± 3% 5.066n ± 0% +108.09% (p=0.000 n=20)
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ bbb48afeb7.386 │ 5cf807d1ea.386 │
│ sec/op │ sec/op vs base │
ChaCha8-32 11.295n ± 1% 4.748n ± 2% -57.96% (p=0.000 n=20)
PCG_DXSM-32 7.693n ± 1% 7.738n ± 2% ~ (p=0.542 n=20)
SourceUint64-32 7.658n ± 2% 7.622n ± 2% ~ (p=0.344 n=20)
GlobalInt64-32 3.473n ± 2% 7.526n ± 2% +116.73% (p=0.000 n=20)
GlobalInt64Parallel-32 0.3198n ± 0% 0.5444n ± 0% +70.22% (p=0.000 n=20)
GlobalUint64-32 3.612n ± 0% 7.575n ± 1% +109.69% (p=0.000 n=20)
GlobalUint64Parallel-32 0.3168n ± 0% 0.5403n ± 0% +70.51% (p=0.000 n=20)
Int64-32 7.673n ± 2% 7.789n ± 1% ~ (p=0.122 n=20)
Uint64-32 7.773n ± 1% 7.827n ± 2% ~ (p=0.920 n=20)
GlobalIntN1000-32 6.268n ± 1% 9.581n ± 1% +52.87% (p=0.000 n=20)
IntN1000-32 10.33n ± 2% 10.45n ± 1% ~ (p=0.233 n=20)
Int64N1000-32 10.98n ± 2% 11.01n ± 1% ~ (p=0.401 n=20)
Int64N1e8-32 11.19n ± 2% 10.97n ± 1% ~ (p=0.033 n=20)
Int64N1e9-32 11.06n ± 1% 11.08n ± 1% ~ (p=0.498 n=20)
Int64N2e9-32 11.10n ± 1% 11.01n ± 2% ~ (p=0.995 n=20)
Int64N1e18-32 15.23n ± 2% 15.04n ± 1% ~ (p=0.973 n=20)
Int64N2e18-32 15.89n ± 1% 15.85n ± 1% ~ (p=0.409 n=20)
Int64N4e18-32 18.96n ± 2% 19.34n ± 2% ~ (p=0.048 n=20)
Int32N1000-32 10.46n ± 2% 10.44n ± 2% ~ (p=0.480 n=20)
Int32N1e8-32 10.46n ± 2% 10.49n ± 2% ~ (p=0.951 n=20)
Int32N1e9-32 10.28n ± 2% 10.26n ± 1% ~ (p=0.431 n=20)
Int32N2e9-32 10.50n ± 2% 10.44n ± 2% ~ (p=0.249 n=20)
Float32-32 13.80n ± 2% 13.80n ± 2% ~ (p=0.751 n=20)
Float64-32 23.55n ± 2% 23.87n ± 0% ~ (p=0.408 n=20)
ExpFloat64-32 15.36n ± 1% 15.29n ± 2% ~ (p=0.316 n=20)
NormFloat64-32 13.57n ± 1% 13.79n ± 1% +1.66% (p=0.005 n=20)
Perm3-32 45.70n ± 2% 46.99n ± 2% +2.81% (p=0.001 n=20)
Perm30-32 399.0n ± 1% 403.8n ± 1% +1.19% (p=0.006 n=20)
Perm30ViaShuffle-32 349.0n ± 1% 350.4n ± 1% ~ (p=0.909 n=20)
ShuffleOverhead-32 322.3n ± 1% 323.8n ± 1% ~ (p=0.410 n=20)
Concurrent-32 3.331n ± 1% 7.312n ± 1% +119.50% (p=0.000 n=20)
For #61716.
Change-Id: Ibdddeed85c34d9ae397289dc899e04d4845f9ed2
Reviewed-on: https://go-review.googlesource.com/c/go/+/516860
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This is a replay of CL 516859, after its rollback in CL 543895,
with big-endian systems fixed and the tests disabled on RISC-V
since the compiler is broken there (#64285).
ChaCha8 provides a cryptographically strong generator
alongside PCG, so that people who want stronger randomness
have access to that. On systems with 128-bit vector math
assembly (amd64 and arm64), ChaCha8 runs at about the same
speed as PCG (25% slower on amd64, 2% faster on arm64).
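
A minimal sketch of selecting either generator explicitly through math/rand/v2. The helper names and the fixed seeds are illustrative assumptions, chosen here only so that the output is reproducible.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// draw returns n values from a generator backed by src.
func draw(src rand.Source, n int) []uint64 {
	r := rand.New(src)
	out := make([]uint64, n)
	for i := range out {
		out[i] = r.Uint64()
	}
	return out
}

// chacha8Values draws n values from a ChaCha8 source with the given seed.
func chacha8Values(seed [32]byte, n int) []uint64 {
	return draw(rand.NewChaCha8(seed), n)
}

// pcgValues draws n values from a PCG source with the given seeds.
func pcgValues(seed1, seed2 uint64, n int) []uint64 {
	return draw(rand.NewPCG(seed1, seed2), n)
}

func main() {
	var seed [32]byte
	copy(seed[:], "an illustrative fixed seed value")
	fmt.Println(chacha8Values(seed, 3))
	fmt.Println(pcgValues(1, 2, 3))
}
```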
Fixes #64284.
Change-Id: I6290bb8ace28e1aff9a61f805dbe380ccdf25b94
Reviewed-on: https://go-review.googlesource.com/c/go/+/546020
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This reverts commit 6382893890b82bbc0439eade5b132d9d1b3fb3a7.
Reason for revert: Causes failures on big endian platforms and riscv64.
Possibly a bug in the generic implementation.
For #64284.
For #64285.
Change-Id: Ic1bb8533d9641fae28d0337b36d434b9a575cd7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/543895
Reviewed-by: Heschi Kreinick <heschi@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Bypass: Michael Knyszek <mknyszek@google.com>
|
|
This change introduces new options to set the floating point
mode on ARM targets. The GOARM version number can optionally be
followed by ',hardfloat' or ',softfloat' to select whether to
use hardware instructions or software emulation for floating
point computations, respectively. For example,
GOARM=7,softfloat.
Previously, software floating point support was limited to
GOARM=5. With these options, software floating point is now
extended to all ARM versions, including GOARM=6 and 7. This
change also extends hardware floating point to GOARM=5.
GOARM=5 defaults to softfloat and GOARM=6 and 7 default to
hardfloat.
For #61588
Change-Id: I23dc86fbd0733b262004a2ed001e1032cf371e94
Reviewed-on: https://go-review.googlesource.com/c/go/+/514907
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
ChaCha8 provides a cryptographically strong generator
alongside PCG, so that people who want stronger randomness
have access to that. On systems with 128-bit vector math
assembly (amd64 and arm64), ChaCha8 runs at about the same
speed as PCG (25% slower on amd64, 2% faster on arm64).
Obviously all the claimed benchmark variation other than the
new ChaCha8 benchmark is a lie.
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ afa459a2f0.amd64 │ bbb48afeb7.amd64 │
│ sec/op │ sec/op vs base │
PCG_DXSM-32 1.488n ± 2% 1.492n ± 2% ~ (p=0.309 n=20)
ChaCha8-32 1.861n ± 2%
SourceUint64-32 1.450n ± 3% 1.590n ± 2% +9.69% (p=0.000 n=20)
GlobalInt64-32 2.067n ± 2% 2.061n ± 1% ~ (p=0.952 n=20)
GlobalInt64Parallel-32 0.1044n ± 2% 0.1041n ± 1% ~ (p=0.498 n=20)
GlobalUint64-32 2.085n ± 0% 2.256n ± 2% +8.23% (p=0.000 n=20)
GlobalUint64Parallel-32 0.1008n ± 1% 0.1018n ± 1% ~ (p=0.041 n=20)
Int64-32 1.779n ± 1% 1.779n ± 1% ~ (p=0.410 n=20)
Uint64-32 1.854n ± 2% 1.882n ± 1% ~ (p=0.044 n=20)
GlobalIntN1000-32 3.140n ± 3% 3.115n ± 3% ~ (p=0.673 n=20)
IntN1000-32 2.496n ± 1% 2.509n ± 1% ~ (p=0.171 n=20)
Int64N1000-32 2.510n ± 2% 2.493n ± 1% ~ (p=0.804 n=20)
Int64N1e8-32 2.471n ± 2% 2.521n ± 1% +1.98% (p=0.003 n=20)
Int64N1e9-32 2.488n ± 2% 2.506n ± 1% ~ (p=0.663 n=20)
Int64N2e9-32 2.478n ± 2% 2.482n ± 2% ~ (p=0.533 n=20)
Int64N1e18-32 3.088n ± 1% 3.216n ± 1% +4.15% (p=0.000 n=20)
Int64N2e18-32 3.493n ± 1% 3.635n ± 2% +4.05% (p=0.000 n=20)
Int64N4e18-32 5.060n ± 2% 5.122n ± 1% +1.22% (p=0.000 n=20)
Int32N1000-32 2.620n ± 1% 2.672n ± 1% +2.00% (p=0.002 n=20)
Int32N1e8-32 2.652n ± 0% 2.646n ± 1% ~ (p=0.743 n=20)
Int32N1e9-32 2.644n ± 1% 2.660n ± 2% ~ (p=0.163 n=20)
Int32N2e9-32 2.619n ± 2% 2.652n ± 1% ~ (p=0.132 n=20)
Float32-32 2.261n ± 1% 2.267n ± 1% ~ (p=0.516 n=20)
Float64-32 2.241n ± 2% 2.276n ± 1% ~ (p=0.080 n=20)
ExpFloat64-32 3.716n ± 1% 3.779n ± 1% +1.68% (p=0.007 n=20)
NormFloat64-32 3.718n ± 1% 3.747n ± 1% ~ (p=0.011 n=20)
Perm3-32 34.11n ± 2% 34.23n ± 2% ~ (p=0.779 n=20)
Perm30-32 200.6n ± 0% 202.3n ± 2% ~ (p=0.055 n=20)
Perm30ViaShuffle-32 109.7n ± 1% 115.5n ± 2% +5.34% (p=0.000 n=20)
ShuffleOverhead-32 107.2n ± 1% 113.3n ± 1% +5.74% (p=0.000 n=20)
Concurrent-32 2.108n ± 6% 2.107n ± 1% ~ (p=0.448 n=20)
goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
│ afa459a2f0.arm64 │ bbb48afeb7.arm64 │
│ sec/op │ sec/op vs base │
PCG_DXSM-8 2.531n ± 0% 2.529n ± 0% ~ (p=0.586 n=20)
ChaCha8-8 2.480n ± 0%
SourceUint64-8 2.531n ± 0% 2.534n ± 0% ~ (p=0.227 n=20)
GlobalInt64-8 2.177n ± 1% 2.173n ± 1% ~ (p=0.733 n=20)
GlobalInt64Parallel-8 0.4319n ± 0% 0.4304n ± 0% -0.32% (p=0.003 n=20)
GlobalUint64-8 2.185n ± 1% 2.185n ± 0% ~ (p=0.541 n=20)
GlobalUint64Parallel-8 0.4295n ± 1% 0.4294n ± 0% ~ (p=0.203 n=20)
Int64-8 4.104n ± 0% 4.107n ± 0% ~ (p=0.193 n=20)
Uint64-8 4.080n ± 0% 4.081n ± 0% ~ (p=0.053 n=20)
GlobalIntN1000-8 2.814n ± 1% 2.814n ± 0% ~ (p=0.879 n=20)
IntN1000-8 4.140n ± 0% 4.141n ± 0% ~ (p=0.428 n=20)
Int64N1000-8 4.139n ± 0% 4.140n ± 0% ~ (p=0.114 n=20)
Int64N1e8-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.898 n=20)
Int64N1e9-8 4.139n ± 0% 4.140n ± 0% ~ (p=0.593 n=20)
Int64N2e9-8 4.140n ± 0% 4.139n ± 0% ~ (p=0.158 n=20)
Int64N1e18-8 5.273n ± 0% 5.274n ± 0% ~ (p=0.308 n=20)
Int64N2e18-8 6.059n ± 0% 6.058n ± 0% ~ (p=0.053 n=20)
Int64N4e18-8 8.803n ± 0% 8.800n ± 0% ~ (p=0.673 n=20)
Int32N1000-8 4.131n ± 0% 4.131n ± 0% ~ (p=0.342 n=20)
Int32N1e8-8 4.131n ± 0% 4.131n ± 0% ~ (p=0.091 n=20)
Int32N1e9-8 4.131n ± 0% 4.131n ± 0% ~ (p=0.273 n=20)
Int32N2e9-8 4.131n ± 0% 4.131n ± 0% ~ (p=0.425 n=20)
Float32-8 4.110n ± 0% 4.112n ± 0% ~ (p=0.203 n=20)
Float64-8 4.104n ± 0% 4.106n ± 0% ~ (p=0.409 n=20)
ExpFloat64-8 5.338n ± 0% 5.339n ± 0% ~ (p=0.037 n=20)
NormFloat64-8 5.731n ± 0% 5.733n ± 0% ~ (p=0.692 n=20)
Perm3-8 26.62n ± 0% 26.65n ± 0% +0.09% (p=0.000 n=20)
Perm30-8 194.6n ± 2% 194.9n ± 0% ~ (p=0.141 n=20)
Perm30ViaShuffle-8 156.4n ± 0% 156.5n ± 0% +0.06% (p=0.000 n=20)
ShuffleOverhead-8 125.8n ± 0% 125.0n ± 0% -0.64% (p=0.000 n=20)
Concurrent-8 2.654n ± 6% 2.441n ± 6% -8.06% (p=0.009 n=20)
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ afa459a2f0.386 │ bbb48afeb7.386 │
│ sec/op │ sec/op vs base │
PCG_DXSM-32 7.793n ± 2% 7.647n ± 1% ~ (p=0.021 n=20)
ChaCha8-32 11.48n ± 2%
SourceUint64-32 7.680n ± 1% 7.714n ± 1% ~ (p=0.713 n=20)
GlobalInt64-32 3.474n ± 3% 3.491n ± 28% ~ (p=0.337 n=20)
GlobalInt64Parallel-32 0.3253n ± 0% 0.3194n ± 0% -1.81% (p=0.000 n=20)
GlobalUint64-32 3.433n ± 2% 3.610n ± 2% +5.14% (p=0.000 n=20)
GlobalUint64Parallel-32 0.3156n ± 0% 0.3164n ± 0% ~ (p=0.073 n=20)
Int64-32 7.707n ± 1% 7.824n ± 0% +1.52% (p=0.005 n=20)
Uint64-32 7.714n ± 1% 7.732n ± 2% ~ (p=0.441 n=20)
GlobalIntN1000-32 6.236n ± 1% 6.176n ± 2% ~ (p=0.499 n=20)
IntN1000-32 10.41n ± 1% 10.31n ± 2% ~ (p=0.782 n=20)
Int64N1000-32 10.97n ± 2% 11.22n ± 2% +2.19% (p=0.002 n=20)
Int64N1e8-32 10.98n ± 1% 11.07n ± 1% ~ (p=0.056 n=20)
Int64N1e9-32 10.95n ± 0% 11.15n ± 2% ~ (p=0.016 n=20)
Int64N2e9-32 11.11n ± 1% 11.00n ± 1% ~ (p=0.654 n=20)
Int64N1e18-32 15.18n ± 2% 14.97n ± 2% ~ (p=0.387 n=20)
Int64N2e18-32 15.61n ± 1% 15.91n ± 1% +1.92% (p=0.003 n=20)
Int64N4e18-32 19.23n ± 2% 18.98n ± 1% ~ (p=1.000 n=20)
Int32N1000-32 10.35n ± 1% 10.31n ± 2% ~ (p=0.081 n=20)
Int32N1e8-32 10.33n ± 1% 10.38n ± 1% ~ (p=0.335 n=20)
Int32N1e9-32 10.35n ± 1% 10.37n ± 1% ~ (p=0.497 n=20)
Int32N2e9-32 10.35n ± 1% 10.41n ± 1% ~ (p=0.605 n=20)
Float32-32 13.57n ± 1% 13.78n ± 2% ~ (p=0.047 n=20)
Float64-32 22.95n ± 4% 23.43n ± 3% ~ (p=0.218 n=20)
ExpFloat64-32 15.23n ± 2% 15.46n ± 1% ~ (p=0.095 n=20)
NormFloat64-32 13.78n ± 1% 13.73n ± 2% ~ (p=0.031 n=20)
Perm3-32 46.62n ± 2% 47.46n ± 2% +1.82% (p=0.004 n=20)
Perm30-32 400.7n ± 1% 403.5n ± 1% ~ (p=0.098 n=20)
Perm30ViaShuffle-32 350.5n ± 1% 348.1n ± 2% ~ (p=0.703 n=20)
ShuffleOverhead-32 326.0n ± 2% 326.2n ± 2% ~ (p=0.440 n=20)
Concurrent-32 3.290n ± 0% 3.297n ± 4% ~ (p=0.189 n=20)
For #61716.
Change-Id: Id2a7e1c1db0beb81f563faaefba65fe292497269
Reviewed-on: https://go-review.googlesource.com/c/go/+/516859
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
|
|
These slowdowns are because we are now using PCG instead of the
Mitchell/Reeds LFSR for the benchmarks. PCG is in fact a bit slower
(but generates statistically far better random numbers).
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 01ff938549.amd64 │ afa459a2f0.amd64 │
│ sec/op │ sec/op vs base │
PCG_DXSM-32 1.490n ± 0% 1.488n ± 2% ~ (p=0.408 n=20)
SourceUint64-32 1.352n ± 1% 1.450n ± 3% +7.21% (p=0.000 n=20)
GlobalInt64-32 2.083n ± 0% 2.067n ± 2% ~ (p=0.223 n=20)
GlobalInt64Parallel-32 0.1035n ± 1% 0.1044n ± 2% ~ (p=0.010 n=20)
GlobalUint64-32 2.038n ± 1% 2.085n ± 0% +2.28% (p=0.000 n=20)
GlobalUint64Parallel-32 0.1006n ± 1% 0.1008n ± 1% ~ (p=0.733 n=20)
Int64-32 1.687n ± 2% 1.779n ± 1% +5.48% (p=0.000 n=20)
Uint64-32 1.674n ± 2% 1.854n ± 2% +10.69% (p=0.000 n=20)
GlobalIntN1000-32 3.135n ± 1% 3.140n ± 3% ~ (p=0.794 n=20)
IntN1000-32 2.478n ± 1% 2.496n ± 1% +0.73% (p=0.006 n=20)
Int64N1000-32 2.455n ± 1% 2.510n ± 2% +2.22% (p=0.000 n=20)
Int64N1e8-32 2.467n ± 2% 2.471n ± 2% ~ (p=0.050 n=20)
Int64N1e9-32 2.454n ± 1% 2.488n ± 2% +1.39% (p=0.000 n=20)
Int64N2e9-32 2.482n ± 1% 2.478n ± 2% ~ (p=0.066 n=20)
Int64N1e18-32 3.349n ± 2% 3.088n ± 1% -7.81% (p=0.000 n=20)
Int64N2e18-32 3.537n ± 1% 3.493n ± 1% -1.24% (p=0.002 n=20)
Int64N4e18-32 4.917n ± 0% 5.060n ± 2% +2.91% (p=0.000 n=20)
Int32N1000-32 2.386n ± 1% 2.620n ± 1% +9.76% (p=0.000 n=20)
Int32N1e8-32 2.366n ± 1% 2.652n ± 0% +12.11% (p=0.000 n=20)
Int32N1e9-32 2.355n ± 2% 2.644n ± 1% +12.32% (p=0.000 n=20)
Int32N2e9-32 2.371n ± 1% 2.619n ± 2% +10.48% (p=0.000 n=20)
Float32-32 2.245n ± 2% 2.261n ± 1% ~ (p=0.625 n=20)
Float64-32 2.235n ± 1% 2.241n ± 2% ~ (p=0.393 n=20)
ExpFloat64-32 3.813n ± 3% 3.716n ± 1% -2.53% (p=0.000 n=20)
NormFloat64-32 3.652n ± 2% 3.718n ± 1% +1.79% (p=0.006 n=20)
Perm3-32 33.12n ± 3% 34.11n ± 2% ~ (p=0.021 n=20)
Perm30-32 205.1n ± 1% 200.6n ± 0% -2.17% (p=0.000 n=20)
Perm30ViaShuffle-32 110.8n ± 1% 109.7n ± 1% -0.99% (p=0.002 n=20)
ShuffleOverhead-32 113.0n ± 1% 107.2n ± 1% -5.09% (p=0.000 n=20)
Concurrent-32 2.100n ± 0% 2.108n ± 6% ~ (p=0.103 n=20)
goos: darwin
goarch: arm64
pkg: math/rand/v2
│ 01ff938549.arm64 │ afa459a2f0.arm64 │
│ sec/op │ sec/op vs base │
PCG_DXSM-8 2.531n ± 0% 2.531n ± 0% ~ (p=0.763 n=20)
SourceUint64-8 2.258n ± 1% 2.531n ± 0% +12.09% (p=0.000 n=20)
GlobalInt64-8 2.167n ± 0% 2.177n ± 1% ~ (p=0.213 n=20)
GlobalInt64Parallel-8 0.4310n ± 0% 0.4319n ± 0% ~ (p=0.027 n=20)
GlobalUint64-8 2.182n ± 1% 2.185n ± 1% ~ (p=0.683 n=20)
GlobalUint64Parallel-8 0.4297n ± 0% 0.4295n ± 1% ~ (p=0.941 n=20)
Int64-8 2.472n ± 1% 4.104n ± 0% +66.00% (p=0.000 n=20)
Uint64-8 2.449n ± 1% 4.080n ± 0% +66.60% (p=0.000 n=20)
GlobalIntN1000-8 2.814n ± 2% 2.814n ± 1% ~ (p=0.972 n=20)
IntN1000-8 2.998n ± 2% 4.140n ± 0% +38.09% (p=0.000 n=20)
Int64N1000-8 2.949n ± 2% 4.139n ± 0% +40.35% (p=0.000 n=20)
Int64N1e8-8 2.953n ± 2% 4.140n ± 0% +40.22% (p=0.000 n=20)
Int64N1e9-8 2.950n ± 0% 4.139n ± 0% +40.32% (p=0.000 n=20)
Int64N2e9-8 2.946n ± 2% 4.140n ± 0% +40.53% (p=0.000 n=20)
Int64N1e18-8 3.779n ± 1% 5.273n ± 0% +39.52% (p=0.000 n=20)
Int64N2e18-8 4.370n ± 1% 6.059n ± 0% +38.65% (p=0.000 n=20)
Int64N4e18-8 6.544n ± 1% 8.803n ± 0% +34.52% (p=0.000 n=20)
Int32N1000-8 2.950n ± 0% 4.131n ± 0% +40.06% (p=0.000 n=20)
Int32N1e8-8 2.950n ± 2% 4.131n ± 0% +40.03% (p=0.000 n=20)
Int32N1e9-8 2.951n ± 2% 4.131n ± 0% +39.99% (p=0.000 n=20)
Int32N2e9-8 2.950n ± 2% 4.131n ± 0% +40.03% (p=0.000 n=20)
Float32-8 3.441n ± 0% 4.110n ± 0% +19.44% (p=0.000 n=20)
Float64-8 3.442n ± 0% 4.104n ± 0% +19.24% (p=0.000 n=20)
ExpFloat64-8 4.481n ± 0% 5.338n ± 0% +19.11% (p=0.000 n=20)
NormFloat64-8 4.725n ± 0% 5.731n ± 0% +21.28% (p=0.000 n=20)
Perm3-8 26.55n ± 0% 26.62n ± 0% +0.28% (p=0.000 n=20)
Perm30-8 181.9n ± 0% 194.6n ± 2% +6.98% (p=0.000 n=20)
Perm30ViaShuffle-8 142.9n ± 0% 156.4n ± 0% +9.45% (p=0.000 n=20)
ShuffleOverhead-8 120.8n ± 2% 125.8n ± 0% +4.10% (p=0.000 n=20)
Concurrent-8 2.421n ± 6% 2.654n ± 6% +9.67% (p=0.002 n=20)
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 01ff938549.386 │ afa459a2f0.386 │
│ sec/op │ sec/op vs base │
PCG_DXSM-32 7.613n ± 1% 7.793n ± 2% +2.38% (p=0.000 n=20)
SourceUint64-32 2.069n ± 0% 7.680n ± 1% +271.19% (p=0.000 n=20)
GlobalInt64-32 3.456n ± 1% 3.474n ± 3% ~ (p=0.654 n=20)
GlobalInt64Parallel-32 0.3252n ± 0% 0.3253n ± 0% ~ (p=0.952 n=20)
GlobalUint64-32 3.573n ± 1% 3.433n ± 2% -3.92% (p=0.000 n=20)
GlobalUint64Parallel-32 0.3159n ± 0% 0.3156n ± 0% ~ (p=0.223 n=20)
Int64-32 2.562n ± 2% 7.707n ± 1% +200.74% (p=0.000 n=20)
Uint64-32 2.592n ± 0% 7.714n ± 1% +197.65% (p=0.000 n=20)
GlobalIntN1000-32 6.266n ± 2% 6.236n ± 1% ~ (p=0.039 n=20)
IntN1000-32 4.724n ± 2% 10.410n ± 1% +120.39% (p=0.000 n=20)
Int64N1000-32 5.490n ± 2% 10.975n ± 2% +99.89% (p=0.000 n=20)
Int64N1e8-32 5.513n ± 2% 10.980n ± 1% +99.15% (p=0.000 n=20)
Int64N1e9-32 5.476n ± 1% 10.950n ± 0% +99.96% (p=0.000 n=20)
Int64N2e9-32 5.501n ± 2% 11.110n ± 1% +101.96% (p=0.000 n=20)
Int64N1e18-32 9.043n ± 2% 15.180n ± 2% +67.86% (p=0.000 n=20)
Int64N2e18-32 9.601n ± 2% 15.610n ± 1% +62.60% (p=0.000 n=20)
Int64N4e18-32 12.00n ± 1% 19.23n ± 2% +60.14% (p=0.000 n=20)
Int32N1000-32 4.829n ± 2% 10.345n ± 1% +114.25% (p=0.000 n=20)
Int32N1e8-32 4.825n ± 2% 10.330n ± 1% +114.09% (p=0.000 n=20)
Int32N1e9-32 4.830n ± 2% 10.350n ± 1% +114.26% (p=0.000 n=20)
Int32N2e9-32 4.750n ± 2% 10.345n ± 1% +117.81% (p=0.000 n=20)
Float32-32 10.89n ± 4% 13.57n ± 1% +24.61% (p=0.000 n=20)
Float64-32 19.60n ± 4% 22.95n ± 4% +17.12% (p=0.000 n=20)
ExpFloat64-32 12.96n ± 3% 15.23n ± 2% +17.47% (p=0.000 n=20)
NormFloat64-32 7.516n ± 1% 13.780n ± 1% +83.34% (p=0.000 n=20)
Perm3-32 36.78n ± 2% 46.62n ± 2% +26.72% (p=0.000 n=20)
Perm30-32 238.9n ± 2% 400.7n ± 1% +67.73% (p=0.000 n=20)
Perm30ViaShuffle-32 189.7n ± 2% 350.5n ± 1% +84.79% (p=0.000 n=20)
ShuffleOverhead-32 159.8n ± 1% 326.0n ± 2% +104.01% (p=0.000 n=20)
Concurrent-32 3.286n ± 1% 3.290n ± 0% ~ (p=0.743 n=20)
On the other hand, compared to the original "update benchmarks" CL,
the cleanups we've made more than compensate for PCG being a bit
slower than LFSR, at least on 64-bit x86. ARM64 (Apple M1) is a bit
slower: perhaps the 64x64→128 multiply is slower there for some reason.
386 is noticeably slower, but it's also a non-SSA backend.
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 220860f76f.amd64 │ afa459a2f0.amd64 │
│ sec/op │ sec/op vs base │
SourceUint64-32 1.555n ± 1% 1.450n ± 3% -6.78% (p=0.000 n=20)
GlobalInt64-32 2.071n ± 1% 2.067n ± 2% ~ (p=0.673 n=20)
GlobalInt63Parallel-32 0.1023n ± 1%
GlobalInt64Parallel-32 0.1044n ± 2%
GlobalUint64-32 5.193n ± 1% 2.085n ± 0% -59.86% (p=0.000 n=20)
GlobalUint64Parallel-32 0.2341n ± 0% 0.1008n ± 1% -56.93% (p=0.000 n=20)
Int64-32 2.056n ± 2% 1.779n ± 1% -13.47% (p=0.000 n=20)
Uint64-32 2.077n ± 2% 1.854n ± 2% -10.74% (p=0.000 n=20)
GlobalIntN1000-32 4.077n ± 2% 3.140n ± 3% -22.98% (p=0.000 n=20)
IntN1000-32 3.476n ± 2% 2.496n ± 1% -28.19% (p=0.000 n=20)
Int64N1000-32 3.059n ± 1% 2.510n ± 2% -17.96% (p=0.000 n=20)
Int64N1e8-32 2.942n ± 1% 2.471n ± 2% -15.98% (p=0.000 n=20)
Int64N1e9-32 2.932n ± 1% 2.488n ± 2% -15.14% (p=0.000 n=20)
Int64N2e9-32 2.925n ± 1% 2.478n ± 2% -15.30% (p=0.000 n=20)
Int64N1e18-32 3.116n ± 1% 3.088n ± 1% ~ (p=0.013 n=20)
Int64N2e18-32 4.067n ± 1% 3.493n ± 1% -14.11% (p=0.000 n=20)
Int64N4e18-32 4.054n ± 1% 5.060n ± 2% +24.80% (p=0.000 n=20)
Int32N1000-32 2.951n ± 1% 2.620n ± 1% -11.22% (p=0.000 n=20)
Int32N1e8-32 3.102n ± 1% 2.652n ± 0% -14.50% (p=0.000 n=20)
Int32N1e9-32 3.535n ± 1% 2.644n ± 1% -25.20% (p=0.000 n=20)
Int32N2e9-32 3.514n ± 1% 2.619n ± 2% -25.47% (p=0.000 n=20)
Float32-32 2.760n ± 1% 2.261n ± 1% -18.06% (p=0.000 n=20)
Float64-32 2.284n ± 1% 2.241n ± 2% ~ (p=0.016 n=20)
ExpFloat64-32 3.757n ± 1% 3.716n ± 1% ~ (p=0.034 n=20)
NormFloat64-32 3.837n ± 1% 3.718n ± 1% -3.09% (p=0.000 n=20)
Perm3-32 35.23n ± 2% 34.11n ± 2% -3.19% (p=0.000 n=20)
Perm30-32 208.8n ± 1% 200.6n ± 0% -3.93% (p=0.000 n=20)
Perm30ViaShuffle-32 111.7n ± 1% 109.7n ± 1% -1.84% (p=0.000 n=20)
ShuffleOverhead-32 101.1n ± 1% 107.2n ± 1% +6.03% (p=0.000 n=20)
Concurrent-32 2.108n ± 7% 2.108n ± 6% ~ (p=0.644 n=20)
PCG_DXSM-32 1.488n ± 2%
goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
│ 220860f76f.arm64 │ afa459a2f0.arm64 │
│ sec/op │ sec/op vs base │
SourceUint64-8 2.316n ± 1% 2.531n ± 0% +9.33% (p=0.000 n=20)
GlobalInt64-8 2.183n ± 1% 2.177n ± 1% ~ (p=0.533 n=20)
GlobalInt63Parallel-8 0.4331n ± 0%
GlobalInt64Parallel-8 0.4319n ± 0%
GlobalUint64-8 4.377n ± 2% 2.185n ± 1% -50.07% (p=0.000 n=20)
GlobalUint64Parallel-8 0.9237n ± 0% 0.4295n ± 1% -53.50% (p=0.000 n=20)
Int64-8 2.538n ± 1% 4.104n ± 0% +61.68% (p=0.000 n=20)
Uint64-8 2.604n ± 1% 4.080n ± 0% +56.68% (p=0.000 n=20)
GlobalIntN1000-8 3.857n ± 2% 2.814n ± 1% -27.04% (p=0.000 n=20)
IntN1000-8 3.822n ± 2% 4.140n ± 0% +8.32% (p=0.000 n=20)
Int64N1000-8 3.318n ± 0% 4.139n ± 0% +24.74% (p=0.000 n=20)
Int64N1e8-8 3.349n ± 1% 4.140n ± 0% +23.64% (p=0.000 n=20)
Int64N1e9-8 3.317n ± 2% 4.139n ± 0% +24.80% (p=0.000 n=20)
Int64N2e9-8 3.317n ± 2% 4.140n ± 0% +24.81% (p=0.000 n=20)
Int64N1e18-8 3.542n ± 1% 5.273n ± 0% +48.85% (p=0.000 n=20)
Int64N2e18-8 5.087n ± 0% 6.059n ± 0% +19.12% (p=0.000 n=20)
Int64N4e18-8 5.084n ± 0% 8.803n ± 0% +73.16% (p=0.000 n=20)
Int32N1000-8 3.208n ± 2% 4.131n ± 0% +28.79% (p=0.000 n=20)
Int32N1e8-8 3.610n ± 1% 4.131n ± 0% +14.43% (p=0.000 n=20)
Int32N1e9-8 4.235n ± 0% 4.131n ± 0% -2.44% (p=0.000 n=20)
Int32N2e9-8 4.229n ± 1% 4.131n ± 0% -2.33% (p=0.000 n=20)
Float32-8 3.468n ± 0% 4.110n ± 0% +18.50% (p=0.000 n=20)
Float64-8 3.447n ± 0% 4.104n ± 0% +19.05% (p=0.000 n=20)
ExpFloat64-8 4.567n ± 0% 5.338n ± 0% +16.86% (p=0.000 n=20)
NormFloat64-8 4.821n ± 0% 5.731n ± 0% +18.89% (p=0.000 n=20)
Perm3-8 28.89n ± 0% 26.62n ± 0% -7.84% (p=0.000 n=20)
Perm30-8 175.7n ± 0% 194.6n ± 2% +10.76% (p=0.000 n=20)
Perm30ViaShuffle-8 153.5n ± 0% 156.4n ± 0% +1.86% (p=0.000 n=20)
ShuffleOverhead-8 119.8n ± 1% 125.8n ± 0% +4.97% (p=0.000 n=20)
Concurrent-8 2.433n ± 3% 2.654n ± 6% +9.13% (p=0.001 n=20)
PCG_DXSM-8 2.531n ± 0%
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 220860f76f.386 │ afa459a2f0.386 │
│ sec/op │ sec/op vs base │
SourceUint64-32 2.370n ± 1% 7.680n ± 1% +224.05% (p=0.000 n=20)
GlobalInt64-32 3.569n ± 1% 3.474n ± 3% -2.66% (p=0.001 n=20)
GlobalInt63Parallel-32 0.3221n ± 1%
GlobalInt64Parallel-32 0.3253n ± 0%
GlobalUint64-32 8.797n ± 10% 3.433n ± 2% -60.98% (p=0.000 n=20)
GlobalUint64Parallel-32 0.6351n ± 0% 0.3156n ± 0% -50.31% (p=0.000 n=20)
Int64-32 2.612n ± 2% 7.707n ± 1% +195.04% (p=0.000 n=20)
Uint64-32 3.350n ± 1% 7.714n ± 1% +130.25% (p=0.000 n=20)
GlobalIntN1000-32 5.892n ± 1% 6.236n ± 1% +5.82% (p=0.000 n=20)
IntN1000-32 4.546n ± 1% 10.410n ± 1% +128.97% (p=0.000 n=20)
Int64N1000-32 14.59n ± 1% 10.97n ± 2% -24.75% (p=0.000 n=20)
Int64N1e8-32 14.76n ± 2% 10.98n ± 1% -25.58% (p=0.000 n=20)
Int64N1e9-32 16.57n ± 1% 10.95n ± 0% -33.90% (p=0.000 n=20)
Int64N2e9-32 14.54n ± 1% 11.11n ± 1% -23.62% (p=0.000 n=20)
Int64N1e18-32 16.14n ± 1% 15.18n ± 2% -5.95% (p=0.000 n=20)
Int64N2e18-32 18.10n ± 1% 15.61n ± 1% -13.73% (p=0.000 n=20)
Int64N4e18-32 18.65n ± 1% 19.23n ± 2% +3.08% (p=0.000 n=20)
Int32N1000-32 3.560n ± 1% 10.345n ± 1% +190.55% (p=0.000 n=20)
Int32N1e8-32 3.770n ± 2% 10.330n ± 1% +174.01% (p=0.000 n=20)
Int32N1e9-32 4.098n ± 0% 10.350n ± 1% +152.53% (p=0.000 n=20)
Int32N2e9-32 4.179n ± 1% 10.345n ± 1% +147.52% (p=0.000 n=20)
Float32-32 21.18n ± 4% 13.57n ± 1% -35.93% (p=0.000 n=20)
Float64-32 20.60n ± 2% 22.95n ± 4% +11.41% (p=0.000 n=20)
ExpFloat64-32 13.07n ± 0% 15.23n ± 2% +16.48% (p=0.000 n=20)
NormFloat64-32 7.738n ± 2% 13.780n ± 1% +78.08% (p=0.000 n=20)
Perm3-32 36.73n ± 1% 46.62n ± 2% +26.91% (p=0.000 n=20)
Perm30-32 211.9n ± 1% 400.7n ± 1% +89.05% (p=0.000 n=20)
Perm30ViaShuffle-32 165.2n ± 1% 350.5n ± 1% +112.20% (p=0.000 n=20)
ShuffleOverhead-32 133.9n ± 1% 326.0n ± 2% +143.37% (p=0.000 n=20)
Concurrent-32 3.287n ± 2% 3.290n ± 0% ~ (p=0.365 n=20)
PCG_DXSM-32 7.793n ± 2%
For #61716.
Change-Id: I4e9c0525b5f84a2ac46f23da9e365495e2d05777
Reviewed-on: https://go-review.googlesource.com/c/go/+/502506
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
For the original math/rand, we ported Plan 9's random number
generator, which was a refinement by Ken Thompson of an algorithm
by Don Mitchell and Jim Reeds, which Mitchell in turn recalls as
having been derived from an algorithm by Marsaglia. At its core,
it is an additive lagged Fibonacci generator (ALFG).
Whatever the details of the history, this generator is nowhere
near the current state of the art for simple pseudo-random
generators.
This CL adds an implementation of Melissa O'Neill's PCG, specifically
the variant PCG-DXSM, which she defined after writing the PCG paper
and which is now the default in NumPy. The state update is slightly
slower (a few multiplies and adds, instead of a few adds), but the
state is dramatically smaller (2 words instead of 607). The statistical
output properties are better too.
A followup CL will delete the old generator.
Adding PCG is the only change here, so no existing benchmarks should
be affected. They are included anyway as further evidence for caution.
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 8993506f2f.amd64 │ 01ff938549.amd64 │
│ sec/op │ sec/op vs base │
SourceUint64-32 1.325n ± 1% 1.352n ± 1% +2.00% (p=0.000 n=20)
GlobalInt64-32 2.240n ± 1% 2.083n ± 0% -7.03% (p=0.000 n=20)
GlobalInt64Parallel-32 0.1041n ± 1% 0.1035n ± 1% ~ (p=0.064 n=20)
GlobalUint64-32 2.072n ± 3% 2.038n ± 1% ~ (p=0.089 n=20)
GlobalUint64Parallel-32 0.1008n ± 1% 0.1006n ± 1% ~ (p=0.804 n=20)
Int64-32 1.716n ± 1% 1.687n ± 2% ~ (p=0.045 n=20)
Uint64-32 1.665n ± 1% 1.674n ± 2% ~ (p=0.878 n=20)
GlobalIntN1000-32 3.335n ± 1% 3.135n ± 1% -6.00% (p=0.000 n=20)
IntN1000-32 2.484n ± 1% 2.478n ± 1% ~ (p=0.085 n=20)
Int64N1000-32 2.502n ± 2% 2.455n ± 1% -1.88% (p=0.002 n=20)
Int64N1e8-32 2.484n ± 2% 2.467n ± 2% ~ (p=0.048 n=20)
Int64N1e9-32 2.502n ± 0% 2.454n ± 1% -1.92% (p=0.000 n=20)
Int64N2e9-32 2.502n ± 0% 2.482n ± 1% -0.76% (p=0.000 n=20)
Int64N1e18-32 3.201n ± 1% 3.349n ± 2% +4.62% (p=0.000 n=20)
Int64N2e18-32 3.504n ± 1% 3.537n ± 1% ~ (p=0.185 n=20)
Int64N4e18-32 4.873n ± 1% 4.917n ± 0% +0.90% (p=0.000 n=20)
Int32N1000-32 2.639n ± 1% 2.386n ± 1% -9.57% (p=0.000 n=20)
Int32N1e8-32 2.686n ± 2% 2.366n ± 1% -11.91% (p=0.000 n=20)
Int32N1e9-32 2.636n ± 1% 2.355n ± 2% -10.70% (p=0.000 n=20)
Int32N2e9-32 2.660n ± 1% 2.371n ± 1% -10.88% (p=0.000 n=20)
Float32-32 2.261n ± 1% 2.245n ± 2% ~ (p=0.752 n=20)
Float64-32 2.280n ± 1% 2.235n ± 1% -1.97% (p=0.007 n=20)
ExpFloat64-32 3.891n ± 1% 3.813n ± 3% ~ (p=0.087 n=20)
NormFloat64-32 3.711n ± 1% 3.652n ± 2% ~ (p=0.021 n=20)
Perm3-32 32.60n ± 2% 33.12n ± 3% ~ (p=0.107 n=20)
Perm30-32 204.2n ± 0% 205.1n ± 1% ~ (p=0.358 n=20)
Perm30ViaShuffle-32 121.7n ± 2% 110.8n ± 1% -8.96% (p=0.000 n=20)
ShuffleOverhead-32 106.2n ± 2% 113.0n ± 1% +6.36% (p=0.000 n=20)
Concurrent-32 2.190n ± 5% 2.100n ± 0% -4.13% (p=0.001 n=20)
PCG_DXSM-32 1.490n ± 0%
goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
│ 8993506f2f.arm64 │ 01ff938549.arm64 │
│ sec/op │ sec/op vs base │
SourceUint64-8 2.271n ± 0% 2.258n ± 1% ~ (p=0.167 n=20)
GlobalInt64-8 2.161n ± 1% 2.167n ± 0% ~ (p=0.693 n=20)
GlobalInt64Parallel-8 0.4303n ± 0% 0.4310n ± 0% ~ (p=0.051 n=20)
GlobalUint64-8 2.164n ± 1% 2.182n ± 1% ~ (p=0.042 n=20)
GlobalUint64Parallel-8 0.4287n ± 0% 0.4297n ± 0% ~ (p=0.082 n=20)
Int64-8 2.478n ± 1% 2.472n ± 1% ~ (p=0.151 n=20)
Uint64-8 2.460n ± 1% 2.449n ± 1% ~ (p=0.013 n=20)
GlobalIntN1000-8 2.814n ± 2% 2.814n ± 2% ~ (p=0.821 n=20)
IntN1000-8 3.003n ± 2% 2.998n ± 2% ~ (p=0.024 n=20)
Int64N1000-8 2.954n ± 0% 2.949n ± 2% ~ (p=0.192 n=20)
Int64N1e8-8 2.956n ± 0% 2.953n ± 2% ~ (p=0.109 n=20)
Int64N1e9-8 3.325n ± 0% 2.950n ± 0% -11.26% (p=0.000 n=20)
Int64N2e9-8 2.956n ± 2% 2.946n ± 2% ~ (p=0.027 n=20)
Int64N1e18-8 3.780n ± 1% 3.779n ± 1% ~ (p=0.815 n=20)
Int64N2e18-8 4.385n ± 0% 4.370n ± 1% ~ (p=0.402 n=20)
Int64N4e18-8 6.527n ± 0% 6.544n ± 1% ~ (p=0.140 n=20)
Int32N1000-8 2.964n ± 1% 2.950n ± 0% -0.47% (p=0.002 n=20)
Int32N1e8-8 2.964n ± 1% 2.950n ± 2% ~ (p=0.013 n=20)
Int32N1e9-8 2.963n ± 2% 2.951n ± 2% ~ (p=0.062 n=20)
Int32N2e9-8 2.961n ± 2% 2.950n ± 2% -0.37% (p=0.002 n=20)
Float32-8 3.442n ± 0% 3.441n ± 0% ~ (p=0.211 n=20)
Float64-8 3.442n ± 0% 3.442n ± 0% ~ (p=0.067 n=20)
ExpFloat64-8 4.472n ± 0% 4.481n ± 0% +0.20% (p=0.000 n=20)
NormFloat64-8 4.734n ± 0% 4.725n ± 0% -0.19% (p=0.003 n=20)
Perm3-8 26.55n ± 0% 26.55n ± 0% ~ (p=0.833 n=20)
Perm30-8 181.9n ± 0% 181.9n ± 0% -0.03% (p=0.004 n=20)
Perm30ViaShuffle-8 143.1n ± 0% 142.9n ± 0% ~ (p=0.204 n=20)
ShuffleOverhead-8 120.6n ± 1% 120.8n ± 2% ~ (p=0.102 n=20)
Concurrent-8 2.357n ± 2% 2.421n ± 6% ~ (p=0.016 n=20)
PCG_DXSM-8 2.531n ± 0%
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 8993506f2f.386 │ 01ff938549.386 │
│ sec/op │ sec/op vs base │
SourceUint64-32 2.102n ± 2% 2.069n ± 0% ~ (p=0.021 n=20)
GlobalInt64-32 3.542n ± 2% 3.456n ± 1% -2.44% (p=0.001 n=20)
GlobalInt64Parallel-32 0.3202n ± 0% 0.3252n ± 0% +1.56% (p=0.000 n=20)
GlobalUint64-32 3.507n ± 1% 3.573n ± 1% +1.87% (p=0.000 n=20)
GlobalUint64Parallel-32 0.3170n ± 1% 0.3159n ± 0% ~ (p=0.167 n=20)
Int64-32 2.516n ± 1% 2.562n ± 2% ~ (p=0.016 n=20)
Uint64-32 2.544n ± 1% 2.592n ± 0% +1.85% (p=0.000 n=20)
GlobalIntN1000-32 6.237n ± 1% 6.266n ± 2% ~ (p=0.268 n=20)
IntN1000-32 4.670n ± 2% 4.724n ± 2% ~ (p=0.644 n=20)
Int64N1000-32 5.412n ± 1% 5.490n ± 2% ~ (p=0.159 n=20)
Int64N1e8-32 5.414n ± 2% 5.513n ± 2% ~ (p=0.129 n=20)
Int64N1e9-32 5.473n ± 1% 5.476n ± 1% ~ (p=0.723 n=20)
Int64N2e9-32 5.487n ± 1% 5.501n ± 2% ~ (p=0.481 n=20)
Int64N1e18-32 8.901n ± 2% 9.043n ± 2% ~ (p=0.330 n=20)
Int64N2e18-32 9.521n ± 1% 9.601n ± 2% ~ (p=0.703 n=20)
Int64N4e18-32 11.92n ± 1% 12.00n ± 1% ~ (p=0.489 n=20)
Int32N1000-32 4.785n ± 1% 4.829n ± 2% ~ (p=0.402 n=20)
Int32N1e8-32 4.748n ± 1% 4.825n ± 2% ~ (p=0.218 n=20)
Int32N1e9-32 4.810n ± 1% 4.830n ± 2% ~ (p=0.794 n=20)
Int32N2e9-32 4.812n ± 1% 4.750n ± 2% ~ (p=0.057 n=20)
Float32-32 10.48n ± 4% 10.89n ± 4% ~ (p=0.162 n=20)
Float64-32 19.79n ± 3% 19.60n ± 4% ~ (p=0.668 n=20)
ExpFloat64-32 12.91n ± 3% 12.96n ± 3% ~ (p=1.000 n=20)
NormFloat64-32 7.462n ± 1% 7.516n ± 1% ~ (p=0.051 n=20)
Perm3-32 35.98n ± 2% 36.78n ± 2% ~ (p=0.033 n=20)
Perm30-32 241.5n ± 1% 238.9n ± 2% ~ (p=0.126 n=20)
Perm30ViaShuffle-32 187.3n ± 2% 189.7n ± 2% ~ (p=0.387 n=20)
ShuffleOverhead-32 160.2n ± 1% 159.8n ± 1% ~ (p=0.256 n=20)
Concurrent-32 3.308n ± 3% 3.286n ± 1% ~ (p=0.038 n=20)
PCG_DXSM-32 7.613n ± 1%
For #61716.
Change-Id: Icb274ca1f782504d658305a40159b4ae6a2f3f1d
Reviewed-on: https://go-review.googlesource.com/c/go/+/502505
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Rob Pike <r@golang.org>
|
|
The compiler says Perm is being inlined into BenchmarkPerm,
and yet BenchmarkPerm30ViaShuffle, which you'd think is the
same code, still runs significantly faster.
The benchmarks are mystifying, but this is clearly still a step in
the right direction, since BenchmarkPerm30ViaShuffle is still
the fastest and we avoid having two copies of that logic.
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ e1bbe739fb.amd64 │ 8993506f2f.amd64 │
│ sec/op │ sec/op vs base │
SourceUint64-32 1.316n ± 2% 1.325n ± 1% ~ (p=0.208 n=20)
GlobalInt64-32 2.048n ± 1% 2.240n ± 1% +9.38% (p=0.000 n=20)
GlobalInt64Parallel-32 0.1037n ± 1% 0.1041n ± 1% ~ (p=0.774 n=20)
GlobalUint64-32 2.039n ± 2% 2.072n ± 3% ~ (p=0.115 n=20)
GlobalUint64Parallel-32 0.1013n ± 1% 0.1008n ± 1% ~ (p=0.417 n=20)
Int64-32 1.692n ± 2% 1.716n ± 1% ~ (p=0.122 n=20)
Uint64-32 1.643n ± 2% 1.665n ± 1% ~ (p=0.062 n=20)
GlobalIntN1000-32 3.287n ± 1% 3.335n ± 1% ~ (p=0.147 n=20)
IntN1000-32 2.678n ± 2% 2.484n ± 1% -7.24% (p=0.000 n=20)
Int64N1000-32 2.684n ± 2% 2.502n ± 2% -6.80% (p=0.000 n=20)
Int64N1e8-32 2.663n ± 2% 2.484n ± 2% -6.76% (p=0.000 n=20)
Int64N1e9-32 2.633n ± 1% 2.502n ± 0% -4.98% (p=0.000 n=20)
Int64N2e9-32 2.657n ± 1% 2.502n ± 0% -5.87% (p=0.000 n=20)
Int64N1e18-32 3.125n ± 2% 3.201n ± 1% +2.43% (p=0.000 n=20)
Int64N2e18-32 3.476n ± 1% 3.504n ± 1% +0.83% (p=0.009 n=20)
Int64N4e18-32 4.795n ± 1% 4.873n ± 1% ~ (p=0.106 n=20)
Int32N1000-32 2.485n ± 2% 2.639n ± 1% +6.20% (p=0.000 n=20)
Int32N1e8-32 2.457n ± 1% 2.686n ± 2% +9.34% (p=0.000 n=20)
Int32N1e9-32 2.452n ± 1% 2.636n ± 1% +7.52% (p=0.000 n=20)
Int32N2e9-32 2.453n ± 1% 2.660n ± 1% +8.44% (p=0.000 n=20)
Float32-32 2.254n ± 1% 2.261n ± 1% ~ (p=0.888 n=20)
Float64-32 2.262n ± 1% 2.280n ± 1% ~ (p=0.040 n=20)
ExpFloat64-32 3.777n ± 2% 3.891n ± 1% +3.03% (p=0.000 n=20)
NormFloat64-32 3.606n ± 1% 3.711n ± 1% +2.91% (p=0.000 n=20)
Perm3-32 33.12n ± 2% 32.60n ± 2% ~ (p=0.045 n=20)
Perm30-32 176.1n ± 1% 204.2n ± 0% +15.96% (p=0.000 n=20)
Perm30ViaShuffle-32 109.3n ± 1% 121.7n ± 2% +11.30% (p=0.000 n=20)
ShuffleOverhead-32 112.5n ± 1% 106.2n ± 2% -5.56% (p=0.000 n=20)
Concurrent-32 2.099n ± 0% 2.190n ± 5% +4.36% (p=0.001 n=20)
goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
│ e1bbe739fb.arm64 │ 8993506f2f.arm64 │
│ sec/op │ sec/op vs base │
SourceUint64-8 2.290n ± 1% 2.271n ± 0% ~ (p=0.015 n=20)
GlobalInt64-8 2.180n ± 1% 2.161n ± 1% ~ (p=0.180 n=20)
GlobalInt64Parallel-8 0.4294n ± 0% 0.4303n ± 0% +0.19% (p=0.001 n=20)
GlobalUint64-8 2.170n ± 1% 2.164n ± 1% ~ (p=0.673 n=20)
GlobalUint64Parallel-8 0.4283n ± 0% 0.4287n ± 0% ~ (p=0.128 n=20)
Int64-8 2.481n ± 1% 2.478n ± 1% ~ (p=0.867 n=20)
Uint64-8 2.464n ± 1% 2.460n ± 1% ~ (p=0.763 n=20)
GlobalIntN1000-8 2.814n ± 0% 2.814n ± 2% ~ (p=0.969 n=20)
IntN1000-8 2.934n ± 2% 3.003n ± 2% +2.35% (p=0.000 n=20)
Int64N1000-8 2.957n ± 1% 2.954n ± 0% ~ (p=0.285 n=20)
Int64N1e8-8 2.935n ± 2% 2.956n ± 0% +0.73% (p=0.002 n=20)
Int64N1e9-8 2.935n ± 2% 3.325n ± 0% +13.29% (p=0.000 n=20)
Int64N2e9-8 2.933n ± 4% 2.956n ± 2% ~ (p=0.163 n=20)
Int64N1e18-8 3.781n ± 1% 3.780n ± 1% ~ (p=0.805 n=20)
Int64N2e18-8 4.362n ± 0% 4.385n ± 0% ~ (p=0.077 n=20)
Int64N4e18-8 6.576n ± 1% 6.527n ± 0% ~ (p=0.024 n=20)
Int32N1000-8 2.942n ± 2% 2.964n ± 1% ~ (p=0.073 n=20)
Int32N1e8-8 2.941n ± 1% 2.964n ± 1% ~ (p=0.058 n=20)
Int32N1e9-8 2.938n ± 2% 2.963n ± 2% +0.87% (p=0.003 n=20)
Int32N2e9-8 2.982n ± 2% 2.961n ± 2% ~ (p=0.056 n=20)
Float32-8 3.441n ± 0% 3.442n ± 0% ~ (p=0.030 n=20)
Float64-8 3.441n ± 0% 3.442n ± 0% +0.03% (p=0.001 n=20)
ExpFloat64-8 4.472n ± 0% 4.472n ± 0% ~ (p=0.877 n=20)
NormFloat64-8 4.716n ± 0% 4.734n ± 0% +0.38% (p=0.000 n=20)
Perm3-8 26.66n ± 0% 26.55n ± 0% -0.39% (p=0.000 n=20)
Perm30-8 143.3n ± 0% 181.9n ± 0% +26.97% (p=0.000 n=20)
Perm30ViaShuffle-8 142.9n ± 0% 143.1n ± 0% ~ (p=0.669 n=20)
ShuffleOverhead-8 121.1n ± 1% 120.6n ± 1% -0.41% (p=0.004 n=20)
Concurrent-8 2.379n ± 2% 2.357n ± 2% ~ (p=0.337 n=20)
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ e1bbe739fb.386 │ 8993506f2f.386 │
│ sec/op │ sec/op vs base │
SourceUint64-32 2.087n ± 1% 2.102n ± 2% ~ (p=0.507 n=20)
GlobalInt64-32 3.538n ± 2% 3.542n ± 2% ~ (p=0.425 n=20)
GlobalInt64Parallel-32 0.3207n ± 1% 0.3202n ± 0% ~ (p=0.963 n=20)
GlobalUint64-32 3.543n ± 1% 3.507n ± 1% ~ (p=0.034 n=20)
GlobalUint64Parallel-32 0.3170n ± 0% 0.3170n ± 1% ~ (p=0.920 n=20)
Int64-32 2.548n ± 1% 2.516n ± 1% ~ (p=0.139 n=20)
Uint64-32 2.565n ± 2% 2.544n ± 1% ~ (p=0.394 n=20)
GlobalIntN1000-32 6.300n ± 1% 6.237n ± 1% ~ (p=0.029 n=20)
IntN1000-32 4.750n ± 0% 4.670n ± 2% ~ (p=0.034 n=20)
Int64N1000-32 5.515n ± 2% 5.412n ± 1% -1.86% (p=0.009 n=20)
Int64N1e8-32 5.527n ± 0% 5.414n ± 2% -2.05% (p=0.002 n=20)
Int64N1e9-32 5.531n ± 2% 5.473n ± 1% ~ (p=0.047 n=20)
Int64N2e9-32 5.514n ± 2% 5.487n ± 1% ~ (p=0.298 n=20)
Int64N1e18-32 9.059n ± 1% 8.901n ± 2% ~ (p=0.037 n=20)
Int64N2e18-32 9.594n ± 1% 9.521n ± 1% ~ (p=0.051 n=20)
Int64N4e18-32 12.05n ± 2% 11.92n ± 1% ~ (p=0.357 n=20)
Int32N1000-32 4.840n ± 2% 4.785n ± 1% ~ (p=0.189 n=20)
Int32N1e8-32 4.832n ± 2% 4.748n ± 1% ~ (p=0.042 n=20)
Int32N1e9-32 4.815n ± 2% 4.810n ± 1% ~ (p=0.878 n=20)
Int32N2e9-32 4.813n ± 1% 4.812n ± 1% ~ (p=0.542 n=20)
Float32-32 10.90n ± 2% 10.48n ± 4% -3.85% (p=0.007 n=20)
Float64-32 20.32n ± 4% 19.79n ± 3% ~ (p=0.553 n=20)
ExpFloat64-32 12.95n ± 3% 12.91n ± 3% ~ (p=0.909 n=20)
NormFloat64-32 7.570n ± 1% 7.462n ± 1% -1.44% (p=0.004 n=20)
Perm3-32 37.80n ± 2% 35.98n ± 2% -4.79% (p=0.000 n=20)
Perm30-32 214.0n ± 1% 241.5n ± 1% +12.85% (p=0.000 n=20)
Perm30ViaShuffle-32 188.7n ± 2% 187.3n ± 2% ~ (p=0.029 n=20)
ShuffleOverhead-32 160.8n ± 1% 160.2n ± 1% ~ (p=0.180 n=20)
Concurrent-32 3.288n ± 0% 3.308n ± 3% ~ (p=0.037 n=20)
For #61716.
Change-Id: I342b611456c3569520d3c91c849d29eba325d87e
Reviewed-on: https://go-review.googlesource.com/c/go/+/502504
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Rob Pike <r@golang.org>
|
|
The original implementation of the ziggurat algorithm was designed for
32-bit random integer inputs. This necessitated reusing some low-order
bits for the slice selection and the random coordinate, which introduces
statistical bias. The result is that PractRand consistently fails the
math/rand normal and exponential sequences (transformed to uniform)
within 2 GB of variates.
This change adjusts the ziggurat procedures to use 63-bit random inputs,
so that there is no need to reuse bits between the slice and coordinate.
This is sufficient for the normal sequence to survive to 256 GB of
PractRand testing.
An alternative technique is to recalculate the ziggurats to use 1024
rather than 128 or 256 slices to make full use of 64-bit inputs. This
improves the survival of the normal sequence to far beyond 256 GB and
additionally provides a 6% performance improvement due to the improved
rejection procedure efficiency. However, doing so increases the total
size of the ziggurat tables from 4.5 kB to 48 kB.
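
The bit-reuse problem can be sketched as follows. The field layout here is
an assumption made only for illustration (7 low bits for a 128-slice index,
the remaining bits for the coordinate); reusedBits and disjointBits are
hypothetical names, and the real ziggurat code may carve up its input
differently.

```go
package main

import "fmt"

// reusedBits mimics a 32-bit scheme: the slice index comes from the low
// 7 bits of the same word that is also used, whole, as the coordinate,
// so the two values are correlated.
func reusedBits(u uint32) (slice, coord uint32) {
	return u & 0x7F, u
}

// disjointBits uses a 63-bit input: the low 7 bits select the slice and
// the remaining 56 bits form the coordinate, so no bit is used twice.
func disjointBits(u uint64) (slice, coord uint64) {
	u &= 1<<63 - 1 // keep 63 bits
	return u & 0x7F, u >> 7
}

func main() {
	// Two inputs that differ only in the slice-selection bits.
	s1, c1 := reusedBits(0xABCDEF01)
	s2, c2 := reusedBits(0xABCDEF7E)
	fmt.Println(s1 != s2, c1 != c2) // the coordinate changes with the slice

	t1, d1 := disjointBits(0x1234567890ABCD01)
	t2, d2 := disjointBits(0x1234567890ABCD7E)
	fmt.Println(t1 != t2, d1 == d2) // the coordinate is unaffected
}
```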
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 2703446c2e.amd64 │ e1bbe739fb.amd64 │
│ sec/op │ sec/op vs base │
SourceUint64-32 1.337n ± 1% 1.316n ± 2% ~ (p=0.024 n=20)
GlobalInt64-32 2.225n ± 2% 2.048n ± 1% -7.93% (p=0.000 n=20)
GlobalInt64Parallel-32 0.1043n ± 2% 0.1037n ± 1% ~ (p=0.587 n=20)
GlobalUint64-32 2.058n ± 1% 2.039n ± 2% ~ (p=0.030 n=20)
GlobalUint64Parallel-32 0.1009n ± 1% 0.1013n ± 1% ~ (p=0.984 n=20)
Int64-32 1.719n ± 2% 1.692n ± 2% ~ (p=0.085 n=20)
Uint64-32 1.669n ± 1% 1.643n ± 2% ~ (p=0.049 n=20)
GlobalIntN1000-32 3.321n ± 2% 3.287n ± 1% ~ (p=0.298 n=20)
IntN1000-32 2.479n ± 1% 2.678n ± 2% +8.01% (p=0.000 n=20)
Int64N1000-32 2.477n ± 1% 2.684n ± 2% +8.38% (p=0.000 n=20)
Int64N1e8-32 2.490n ± 1% 2.663n ± 2% +6.99% (p=0.000 n=20)
Int64N1e9-32 2.458n ± 1% 2.633n ± 1% +7.12% (p=0.000 n=20)
Int64N2e9-32 2.486n ± 2% 2.657n ± 1% +6.90% (p=0.000 n=20)
Int64N1e18-32 3.215n ± 2% 3.125n ± 2% -2.78% (p=0.000 n=20)
Int64N2e18-32 3.588n ± 2% 3.476n ± 1% -3.15% (p=0.000 n=20)
Int64N4e18-32 4.938n ± 2% 4.795n ± 1% -2.91% (p=0.000 n=20)
Int32N1000-32 2.673n ± 2% 2.485n ± 2% -7.02% (p=0.000 n=20)
Int32N1e8-32 2.631n ± 2% 2.457n ± 1% -6.63% (p=0.000 n=20)
Int32N1e9-32 2.628n ± 2% 2.452n ± 1% -6.70% (p=0.000 n=20)
Int32N2e9-32 2.684n ± 2% 2.453n ± 1% -8.61% (p=0.000 n=20)
Float32-32 2.240n ± 2% 2.254n ± 1% ~ (p=0.878 n=20)
Float64-32 2.253n ± 1% 2.262n ± 1% ~ (p=0.963 n=20)
ExpFloat64-32 3.677n ± 1% 3.777n ± 2% +2.71% (p=0.004 n=20)
NormFloat64-32 3.761n ± 1% 3.606n ± 1% -4.15% (p=0.000 n=20)
Perm3-32 33.55n ± 2% 33.12n ± 2% ~ (p=0.402 n=20)
Perm30-32 173.2n ± 1% 176.1n ± 1% +1.67% (p=0.000 n=20)
Perm30ViaShuffle-32 115.9n ± 1% 109.3n ± 1% -5.69% (p=0.000 n=20)
ShuffleOverhead-32 101.9n ± 1% 112.5n ± 1% +10.35% (p=0.000 n=20)
Concurrent-32 2.107n ± 6% 2.099n ± 0% ~ (p=0.051 n=20)
goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
│ 2703446c2e.arm64 │ e1bbe739fb.arm64 │
│ sec/op │ sec/op vs base │
SourceUint64-8 2.275n ± 0% 2.290n ± 1% ~ (p=0.044 n=20)
GlobalInt64-8 2.154n ± 1% 2.180n ± 1% ~ (p=0.068 n=20)
GlobalInt64Parallel-8 0.4298n ± 0% 0.4294n ± 0% ~ (p=0.079 n=20)
GlobalUint64-8 2.160n ± 1% 2.170n ± 1% ~ (p=0.129 n=20)
GlobalUint64Parallel-8 0.4286n ± 0% 0.4283n ± 0% ~ (p=0.350 n=20)
Int64-8 2.491n ± 1% 2.481n ± 1% ~ (p=0.330 n=20)
Uint64-8 2.458n ± 0% 2.464n ± 1% ~ (p=0.351 n=20)
GlobalIntN1000-8 2.814n ± 2% 2.814n ± 0% ~ (p=0.325 n=20)
IntN1000-8 2.933n ± 0% 2.934n ± 2% ~ (p=0.079 n=20)
Int64N1000-8 2.962n ± 1% 2.957n ± 1% ~ (p=0.259 n=20)
Int64N1e8-8 2.960n ± 1% 2.935n ± 2% ~ (p=0.276 n=20)
Int64N1e9-8 2.935n ± 2% 2.935n ± 2% ~ (p=0.984 n=20)
Int64N2e9-8 2.934n ± 0% 2.933n ± 4% ~ (p=0.463 n=20)
Int64N1e18-8 3.777n ± 1% 3.781n ± 1% ~ (p=0.516 n=20)
Int64N2e18-8 4.359n ± 1% 4.362n ± 0% ~ (p=0.256 n=20)
Int64N4e18-8 6.536n ± 1% 6.576n ± 1% ~ (p=0.224 n=20)
Int32N1000-8 2.937n ± 0% 2.942n ± 2% ~ (p=0.312 n=20)
Int32N1e8-8 2.937n ± 1% 2.941n ± 1% ~ (p=0.463 n=20)
Int32N1e9-8 2.936n ± 0% 2.938n ± 2% ~ (p=0.044 n=20)
Int32N2e9-8 2.938n ± 2% 2.982n ± 2% ~ (p=0.174 n=20)
Float32-8 3.441n ± 0% 3.441n ± 0% ~ (p=0.064 n=20)
Float64-8 3.441n ± 0% 3.441n ± 0% ~ (p=0.826 n=20)
ExpFloat64-8 4.486n ± 0% 4.472n ± 0% -0.31% (p=0.000 n=20)
NormFloat64-8 4.721n ± 0% 4.716n ± 0% ~ (p=0.051 n=20)
Perm3-8 26.65n ± 0% 26.66n ± 0% ~ (p=0.080 n=20)
Perm30-8 143.2n ± 0% 143.3n ± 0% +0.10% (p=0.000 n=20)
Perm30ViaShuffle-8 143.0n ± 0% 142.9n ± 0% ~ (p=0.642 n=20)
ShuffleOverhead-8 120.6n ± 1% 121.1n ± 1% +0.41% (p=0.010 n=20)
Concurrent-8 2.399n ± 5% 2.379n ± 2% ~ (p=0.365 n=20)
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 2703446c2e.386 │ e1bbe739fb.386 │
│ sec/op │ sec/op vs base │
SourceUint64-32 2.072n ± 2% 2.087n ± 1% ~ (p=0.440 n=20)
GlobalInt64-32 3.546n ± 27% 3.538n ± 2% ~ (p=0.101 n=20)
GlobalInt64Parallel-32 0.3211n ± 0% 0.3207n ± 1% ~ (p=0.753 n=20)
GlobalUint64-32 3.522n ± 2% 3.543n ± 1% ~ (p=0.071 n=20)
GlobalUint64Parallel-32 0.3172n ± 0% 0.3170n ± 0% ~ (p=0.507 n=20)
Int64-32 2.520n ± 2% 2.548n ± 1% ~ (p=0.267 n=20)
Uint64-32 2.581n ± 1% 2.565n ± 2% ~ (p=0.143 n=20)
GlobalIntN1000-32 6.171n ± 1% 6.300n ± 1% ~ (p=0.037 n=20)
IntN1000-32 4.752n ± 2% 4.750n ± 0% ~ (p=0.984 n=20)
Int64N1000-32 5.429n ± 1% 5.515n ± 2% ~ (p=0.292 n=20)
Int64N1e8-32 5.469n ± 2% 5.527n ± 0% ~ (p=0.013 n=20)
Int64N1e9-32 5.489n ± 2% 5.531n ± 2% ~ (p=0.256 n=20)
Int64N2e9-32 5.492n ± 2% 5.514n ± 2% ~ (p=0.606 n=20)
Int64N1e18-32 8.927n ± 1% 9.059n ± 1% ~ (p=0.229 n=20)
Int64N2e18-32 9.622n ± 1% 9.594n ± 1% ~ (p=0.703 n=20)
Int64N4e18-32 12.03n ± 1% 12.05n ± 2% ~ (p=0.733 n=20)
Int32N1000-32 4.817n ± 1% 4.840n ± 2% ~ (p=0.941 n=20)
Int32N1e8-32 4.801n ± 1% 4.832n ± 2% ~ (p=0.228 n=20)
Int32N1e9-32 4.798n ± 1% 4.815n ± 2% ~ (p=0.560 n=20)
Int32N2e9-32 4.840n ± 1% 4.813n ± 1% ~ (p=0.015 n=20)
Float32-32 10.51n ± 4% 10.90n ± 2% +3.71% (p=0.007 n=20)
Float64-32 20.33n ± 3% 20.32n ± 4% ~ (p=0.566 n=20)
ExpFloat64-32 12.59n ± 2% 12.95n ± 3% +2.86% (p=0.002 n=20)
NormFloat64-32 7.350n ± 2% 7.570n ± 1% +2.99% (p=0.007 n=20)
Perm3-32 39.29n ± 2% 37.80n ± 2% -3.79% (p=0.000 n=20)
Perm30-32 219.1n ± 2% 214.0n ± 1% -2.33% (p=0.002 n=20)
Perm30ViaShuffle-32 189.8n ± 2% 188.7n ± 2% ~ (p=0.147 n=20)
ShuffleOverhead-32 158.9n ± 2% 160.8n ± 1% ~ (p=0.176 n=20)
Concurrent-32 3.306n ± 3% 3.288n ± 0% -0.54% (p=0.005 n=20)
For #61716.
Change-Id: I4c5fe710b310dc075ae21c97d1805bcc20db5050
Reviewed-on: https://go-review.googlesource.com/c/go/+/516275
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Rob Pike <r@golang.org>
|
|
We realized too late after Go 1 that float64(r.Uint64())/(1<<64)
is not a correct implementation: it occasionally rounds to 1.
The correct implementation is float64(r.Uint64()&(1<<53-1))/(1<<53)
but we couldn't change the implementation for compatibility, so we
changed it to retry only in the "round to 1" cases.
The change to v2 lets us update the algorithm to the simpler,
faster one.
Note that this implementation cannot generate 2⁻⁵⁴, nor 2⁻¹⁰⁰,
nor any of the other numbers between 0 and 2⁻⁵³. A more elaborate
algorithm could shift some of the probability of generating the two
boundary values, 0 and 2⁻⁵³, over to the values in between, but that
would be much slower and not necessarily better. In particular, the current
implementation has the property that there are uniform gaps between
the possible returned floats, which might help stability. Also, the
result is often scaled and shifted, like Float64()*X+Y. Multiplying by
X>1 would open new gaps, and adding most Y would erase all the
distinctions that were introduced.
The only changes to benchmarks should be in Float32 and Float64.
The other changes remain a cautionary tale.
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 4d84a369d1.amd64 │ 2703446c2e.amd64 │
│ sec/op │ sec/op vs base │
SourceUint64-32 1.348n ± 2% 1.337n ± 1% ~ (p=0.662 n=20)
GlobalInt64-32 2.082n ± 2% 2.225n ± 2% +6.87% (p=0.000 n=20)
GlobalInt64Parallel-32 0.1036n ± 1% 0.1043n ± 2% ~ (p=0.171 n=20)
GlobalUint64-32 2.077n ± 2% 2.058n ± 1% ~ (p=0.560 n=20)
GlobalUint64Parallel-32 0.1012n ± 1% 0.1009n ± 1% ~ (p=0.995 n=20)
Int64-32 1.750n ± 0% 1.719n ± 2% -1.74% (p=0.000 n=20)
Uint64-32 1.707n ± 2% 1.669n ± 1% -2.20% (p=0.000 n=20)
GlobalIntN1000-32 3.192n ± 1% 3.321n ± 2% +4.04% (p=0.000 n=20)
IntN1000-32 2.462n ± 2% 2.479n ± 1% ~ (p=0.417 n=20)
Int64N1000-32 2.470n ± 1% 2.477n ± 1% ~ (p=0.664 n=20)
Int64N1e8-32 2.503n ± 2% 2.490n ± 1% ~ (p=0.245 n=20)
Int64N1e9-32 2.487n ± 1% 2.458n ± 1% ~ (p=0.032 n=20)
Int64N2e9-32 2.487n ± 1% 2.486n ± 2% ~ (p=0.507 n=20)
Int64N1e18-32 3.006n ± 2% 3.215n ± 2% +6.94% (p=0.000 n=20)
Int64N2e18-32 3.368n ± 1% 3.588n ± 2% +6.55% (p=0.000 n=20)
Int64N4e18-32 4.763n ± 1% 4.938n ± 2% +3.69% (p=0.000 n=20)
Int32N1000-32 2.403n ± 1% 2.673n ± 2% +11.19% (p=0.000 n=20)
Int32N1e8-32 2.405n ± 1% 2.631n ± 2% +9.42% (p=0.000 n=20)
Int32N1e9-32 2.402n ± 2% 2.628n ± 2% +9.41% (p=0.000 n=20)
Int32N2e9-32 2.384n ± 1% 2.684n ± 2% +12.56% (p=0.000 n=20)
Float32-32 2.641n ± 2% 2.240n ± 2% -15.18% (p=0.000 n=20)
Float64-32 2.483n ± 1% 2.253n ± 1% -9.26% (p=0.000 n=20)
ExpFloat64-32 3.486n ± 2% 3.677n ± 1% +5.49% (p=0.000 n=20)
NormFloat64-32 3.648n ± 1% 3.761n ± 1% +3.11% (p=0.000 n=20)
Perm3-32 33.04n ± 1% 33.55n ± 2% ~ (p=0.180 n=20)
Perm30-32 171.9n ± 1% 173.2n ± 1% ~ (p=0.050 n=20)
Perm30ViaShuffle-32 100.3n ± 1% 115.9n ± 1% +15.55% (p=0.000 n=20)
ShuffleOverhead-32 102.5n ± 1% 101.9n ± 1% ~ (p=0.266 n=20)
Concurrent-32 2.101n ± 0% 2.107n ± 6% ~ (p=0.212 n=20)
goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
│ 4d84a369d1.arm64 │ 2703446c2e.arm64 │
│ sec/op │ sec/op vs base │
SourceUint64-8 2.261n ± 1% 2.275n ± 0% ~ (p=0.082 n=20)
GlobalInt64-8 2.160n ± 1% 2.154n ± 1% ~ (p=0.490 n=20)
GlobalInt64Parallel-8 0.4299n ± 0% 0.4298n ± 0% ~ (p=0.663 n=20)
GlobalUint64-8 2.169n ± 1% 2.160n ± 1% ~ (p=0.292 n=20)
GlobalUint64Parallel-8 0.4293n ± 1% 0.4286n ± 0% ~ (p=0.155 n=20)
Int64-8 2.473n ± 1% 2.491n ± 1% ~ (p=0.317 n=20)
Uint64-8 2.453n ± 1% 2.458n ± 0% ~ (p=0.941 n=20)
GlobalIntN1000-8 2.814n ± 2% 2.814n ± 2% ~ (p=0.972 n=20)
IntN1000-8 2.933n ± 2% 2.933n ± 0% ~ (p=0.287 n=20)
Int64N1000-8 2.934n ± 2% 2.962n ± 1% ~ (p=0.062 n=20)
Int64N1e8-8 2.935n ± 2% 2.960n ± 1% ~ (p=0.183 n=20)
Int64N1e9-8 2.934n ± 2% 2.935n ± 2% ~ (p=0.367 n=20)
Int64N2e9-8 2.935n ± 2% 2.934n ± 0% ~ (p=0.455 n=20)
Int64N1e18-8 3.778n ± 1% 3.777n ± 1% ~ (p=0.995 n=20)
Int64N2e18-8 4.359n ± 1% 4.359n ± 1% ~ (p=0.122 n=20)
Int64N4e18-8 6.546n ± 1% 6.536n ± 1% ~ (p=0.920 n=20)
Int32N1000-8 2.940n ± 2% 2.937n ± 0% ~ (p=0.149 n=20)
Int32N1e8-8 2.937n ± 2% 2.937n ± 1% ~ (p=0.620 n=20)
Int32N1e9-8 2.938n ± 0% 2.936n ± 0% ~ (p=0.046 n=20)
Int32N2e9-8 2.938n ± 2% 2.938n ± 2% ~ (p=0.455 n=20)
Float32-8 3.486n ± 0% 3.441n ± 0% -1.28% (p=0.000 n=20)
Float64-8 3.480n ± 0% 3.441n ± 0% -1.13% (p=0.000 n=20)
ExpFloat64-8 4.533n ± 0% 4.486n ± 0% -1.03% (p=0.000 n=20)
NormFloat64-8 4.764n ± 0% 4.721n ± 0% -0.90% (p=0.000 n=20)
Perm3-8 26.66n ± 0% 26.65n ± 0% ~ (p=0.019 n=20)
Perm30-8 143.4n ± 0% 143.2n ± 0% -0.17% (p=0.000 n=20)
Perm30ViaShuffle-8 142.9n ± 0% 143.0n ± 0% ~ (p=0.522 n=20)
ShuffleOverhead-8 120.7n ± 0% 120.6n ± 1% ~ (p=0.488 n=20)
Concurrent-8 2.360n ± 2% 2.399n ± 5% ~ (p=0.062 n=20)
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 4d84a369d1.386 │ 2703446c2e.386 │
│ sec/op │ sec/op vs base │
SourceUint64-32 2.101n ± 2% 2.072n ± 2% ~ (p=0.273 n=20)
GlobalInt64-32 3.518n ± 2% 3.546n ± 27% +0.78% (p=0.007 n=20)
GlobalInt64Parallel-32 0.3206n ± 0% 0.3211n ± 0% ~ (p=0.386 n=20)
GlobalUint64-32 3.538n ± 1% 3.522n ± 2% ~ (p=0.331 n=20)
GlobalUint64Parallel-32 0.3231n ± 0% 0.3172n ± 0% -1.84% (p=0.000 n=20)
Int64-32 2.554n ± 2% 2.520n ± 2% ~ (p=0.465 n=20)
Uint64-32 2.575n ± 2% 2.581n ± 1% ~ (p=0.213 n=20)
GlobalIntN1000-32 6.292n ± 1% 6.171n ± 1% ~ (p=0.015 n=20)
IntN1000-32 4.735n ± 1% 4.752n ± 2% ~ (p=0.635 n=20)
Int64N1000-32 5.489n ± 2% 5.429n ± 1% ~ (p=0.324 n=20)
Int64N1e8-32 5.528n ± 2% 5.469n ± 2% ~ (p=0.013 n=20)
Int64N1e9-32 5.438n ± 2% 5.489n ± 2% ~ (p=0.984 n=20)
Int64N2e9-32 5.474n ± 1% 5.492n ± 2% ~ (p=0.616 n=20)
Int64N1e18-32 9.053n ± 1% 8.927n ± 1% ~ (p=0.037 n=20)
Int64N2e18-32 9.685n ± 2% 9.622n ± 1% ~ (p=0.449 n=20)
Int64N4e18-32 12.18n ± 1% 12.03n ± 1% ~ (p=0.013 n=20)
Int32N1000-32 4.862n ± 1% 4.817n ± 1% -0.94% (p=0.002 n=20)
Int32N1e8-32 4.758n ± 2% 4.801n ± 1% ~ (p=0.597 n=20)
Int32N1e9-32 4.772n ± 1% 4.798n ± 1% ~ (p=0.774 n=20)
Int32N2e9-32 4.847n ± 0% 4.840n ± 1% ~ (p=0.867 n=20)
Float32-32 22.18n ± 4% 10.51n ± 4% -52.61% (p=0.000 n=20)
Float64-32 21.21n ± 3% 20.33n ± 3% -4.17% (p=0.000 n=20)
ExpFloat64-32 12.39n ± 2% 12.59n ± 2% ~ (p=0.139 n=20)
NormFloat64-32 7.422n ± 1% 7.350n ± 2% ~ (p=0.208 n=20)
Perm3-32 38.00n ± 2% 39.29n ± 2% +3.38% (p=0.000 n=20)
Perm30-32 212.7n ± 1% 219.1n ± 2% +3.03% (p=0.001 n=20)
Perm30ViaShuffle-32 187.5n ± 2% 189.8n ± 2% ~ (p=0.457 n=20)
ShuffleOverhead-32 159.7n ± 1% 158.9n ± 2% ~ (p=0.920 n=20)
Concurrent-32 3.470n ± 0% 3.306n ± 3% -4.71% (p=0.000 n=20)
For #61716.
Change-Id: I1933f1f9efd7e6e832d83e7fa5d84398f67d41f5
Reviewed-on: https://go-review.googlesource.com/c/go/+/502503
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Now that we can break the value stream, we can take advantage
of better algorithms that have been suggested since the original
code was written.
Also optimizes IntN, Int32N, Int64N, Perm (indirectly).
All the N variants (IntN, Int32N, Int64N, UintN, N, etc.) now
return the same values given a Source and parameter n, so that
for example uint(r.IntN(10)) and r.UintN(10) and r.N(uint(10))
are completely interchangeable.
Int64N4e18 gets slower but that is a near worst case for
the algorithm and is extremely unlikely in practice.
32-bit Int32N variants got slower too, by 15-30%, in exchange
for speeding up everything on 64-bit systems and consistency
across the N functions.
Also rename the previously missed benchmark
GlobalInt63Parallel to GlobalInt64Parallel.
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 11ad9fdddc.amd64 │ 4d84a369d1.amd64 │
│ sec/op │ sec/op vs base │
SourceUint64-32 1.335n ± 1% 1.348n ± 2% ~ (p=0.335 n=20)
GlobalInt64-32 2.046n ± 1% 2.082n ± 2% ~ (p=0.310 n=20)
GlobalInt63Parallel-32 0.1037n ± 1%
GlobalInt64Parallel-32 0.1036n ± 1%
GlobalUint64-32 2.075n ± 0% 2.077n ± 2% ~ (p=0.228 n=20)
GlobalUint64Parallel-32 0.1013n ± 1% 0.1012n ± 1% ~ (p=0.878 n=20)
Int64-32 1.726n ± 2% 1.750n ± 0% +1.39% (p=0.000 n=20)
Uint64-32 1.673n ± 1% 1.707n ± 2% +2.03% (p=0.002 n=20)
GlobalIntN1000-32 3.895n ± 2% 3.192n ± 1% -18.05% (p=0.000 n=20)
IntN1000-32 3.403n ± 1% 2.462n ± 2% -27.65% (p=0.000 n=20)
Int64N1000-32 3.053n ± 2% 2.470n ± 1% -19.11% (p=0.000 n=20)
Int64N1e8-32 2.718n ± 1% 2.503n ± 2% -7.91% (p=0.000 n=20)
Int64N1e9-32 2.712n ± 1% 2.487n ± 1% -8.31% (p=0.000 n=20)
Int64N2e9-32 2.690n ± 1% 2.487n ± 1% -7.57% (p=0.000 n=20)
Int64N1e18-32 3.084n ± 2% 3.006n ± 2% -2.53% (p=0.000 n=20)
Int64N2e18-32 4.026n ± 1% 3.368n ± 1% -16.33% (p=0.000 n=20)
Int64N4e18-32 4.049n ± 2% 4.763n ± 1% +17.62% (p=0.000 n=20)
Int32N1000-32 2.730n ± 0% 2.403n ± 1% -11.94% (p=0.000 n=20)
Int32N1e8-32 2.916n ± 2% 2.405n ± 1% -17.53% (p=0.000 n=20)
Int32N1e9-32 3.375n ± 1% 2.402n ± 2% -28.83% (p=0.000 n=20)
Int32N2e9-32 3.292n ± 1% 2.384n ± 1% -27.58% (p=0.000 n=20)
Float32-32 2.673n ± 1% 2.641n ± 2% ~ (p=0.147 n=20)
Float64-32 2.485n ± 1% 2.483n ± 1% ~ (p=0.804 n=20)
ExpFloat64-32 3.577n ± 2% 3.486n ± 2% -2.57% (p=0.000 n=20)
NormFloat64-32 3.797n ± 2% 3.648n ± 1% -3.92% (p=0.000 n=20)
Perm3-32 35.79n ± 2% 33.04n ± 1% -7.68% (p=0.000 n=20)
Perm30-32 205.1n ± 1% 171.9n ± 1% -16.14% (p=0.000 n=20)
Perm30ViaShuffle-32 111.2n ± 2% 100.3n ± 1% -9.76% (p=0.000 n=20)
ShuffleOverhead-32 100.5n ± 2% 102.5n ± 1% +1.99% (p=0.007 n=20)
Concurrent-32 2.188n ± 5% 2.101n ± 0% ~ (p=0.013 n=20)
goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
│ 11ad9fdddc.arm64 │ 4d84a369d1.arm64 │
│ sec/op │ sec/op vs base │
SourceUint64-8 2.272n ± 1% 2.261n ± 1% ~ (p=0.172 n=20)
GlobalInt64-8 2.155n ± 1% 2.160n ± 1% ~ (p=0.482 n=20)
GlobalInt63Parallel-8 0.4352n ± 0%
GlobalInt64Parallel-8 0.4299n ± 0%
GlobalUint64-8 2.173n ± 1% 2.169n ± 1% ~ (p=0.262 n=20)
GlobalUint64Parallel-8 0.4340n ± 0% 0.4293n ± 1% -1.08% (p=0.000 n=20)
Int64-8 2.544n ± 1% 2.473n ± 1% -2.83% (p=0.000 n=20)
Uint64-8 2.552n ± 1% 2.453n ± 1% -3.90% (p=0.000 n=20)
GlobalIntN1000-8 3.856n ± 0% 2.814n ± 2% -27.02% (p=0.000 n=20)
IntN1000-8 3.820n ± 0% 2.933n ± 2% -23.22% (p=0.000 n=20)
Int64N1000-8 3.219n ± 2% 2.934n ± 2% -8.85% (p=0.000 n=20)
Int64N1e8-8 3.221n ± 2% 2.935n ± 2% -8.91% (p=0.000 n=20)
Int64N1e9-8 3.276n ± 2% 2.934n ± 2% -10.44% (p=0.000 n=20)
Int64N2e9-8 3.217n ± 0% 2.935n ± 2% -8.78% (p=0.000 n=20)
Int64N1e18-8 3.502n ± 2% 3.778n ± 1% +7.91% (p=0.000 n=20)
Int64N2e18-8 4.968n ± 1% 4.359n ± 1% -12.26% (p=0.000 n=20)
Int64N4e18-8 4.963n ± 0% 6.546n ± 1% +31.92% (p=0.000 n=20)
Int32N1000-8 3.189n ± 1% 2.940n ± 2% -7.81% (p=0.000 n=20)
Int32N1e8-8 3.514n ± 1% 2.937n ± 2% -16.41% (p=0.000 n=20)
Int32N1e9-8 4.133n ± 0% 2.938n ± 0% -28.91% (p=0.000 n=20)
Int32N2e9-8 4.137n ± 0% 2.938n ± 2% -28.97% (p=0.000 n=20)
Float32-8 3.468n ± 1% 3.486n ± 0% +0.52% (p=0.000 n=20)
Float64-8 3.478n ± 0% 3.480n ± 0% ~ (p=0.063 n=20)
ExpFloat64-8 4.563n ± 0% 4.533n ± 0% -0.67% (p=0.000 n=20)
NormFloat64-8 4.768n ± 0% 4.764n ± 0% -0.07% (p=0.001 n=20)
Perm3-8 28.94n ± 0% 26.66n ± 0% -7.88% (p=0.000 n=20)
Perm30-8 175.9n ± 0% 143.4n ± 0% -18.50% (p=0.000 n=20)
Perm30ViaShuffle-8 152.6n ± 1% 142.9n ± 0% -6.29% (p=0.000 n=20)
ShuffleOverhead-8 119.6n ± 1% 120.7n ± 0% +0.96% (p=0.000 n=20)
Concurrent-8 2.452n ± 3% 2.360n ± 2% -3.73% (p=0.007 n=20)
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 11ad9fdddc.386 │ 4d84a369d1.386 │
│ sec/op │ sec/op vs base │
SourceUint64-32 2.091n ± 1% 2.101n ± 2% ~ (p=0.672 n=20)
GlobalInt64-32 3.514n ± 2% 3.518n ± 2% ~ (p=0.723 n=20)
GlobalInt63Parallel-32 0.3197n ± 0%
GlobalInt64Parallel-32 0.3206n ± 0%
GlobalUint64-32 3.542n ± 1% 3.538n ± 1% ~ (p=0.304 n=20)
GlobalUint64Parallel-32 0.3218n ± 0% 0.3231n ± 0% ~ (p=0.071 n=20)
Int64-32 2.552n ± 2% 2.554n ± 2% ~ (p=0.693 n=20)
Uint64-32 2.566n ± 1% 2.575n ± 2% ~ (p=0.606 n=20)
GlobalIntN1000-32 5.965n ± 2% 6.292n ± 1% +5.46% (p=0.000 n=20)
IntN1000-32 4.652n ± 1% 4.735n ± 1% +1.77% (p=0.000 n=20)
Int64N1000-32 14.485n ± 1% 5.489n ± 2% -62.11% (p=0.000 n=20)
Int64N1e8-32 14.675n ± 1% 5.528n ± 2% -62.33% (p=0.000 n=20)
Int64N1e9-32 16.805n ± 2% 5.438n ± 2% -67.64% (p=0.000 n=20)
Int64N2e9-32 14.515n ± 1% 5.474n ± 1% -62.28% (p=0.000 n=20)
Int64N1e18-32 16.165n ± 1% 9.053n ± 1% -44.00% (p=0.000 n=20)
Int64N2e18-32 17.945n ± 2% 9.685n ± 2% -46.03% (p=0.000 n=20)
Int64N4e18-32 18.35n ± 2% 12.18n ± 1% -33.62% (p=0.000 n=20)
Int32N1000-32 3.608n ± 1% 4.862n ± 1% +34.77% (p=0.000 n=20)
Int32N1e8-32 3.767n ± 1% 4.758n ± 2% +26.31% (p=0.000 n=20)
Int32N1e9-32 4.130n ± 2% 4.772n ± 1% +15.54% (p=0.000 n=20)
Int32N2e9-32 4.206n ± 1% 4.847n ± 0% +15.24% (p=0.000 n=20)
Float32-32 22.18n ± 4% 22.18n ± 4% ~ (p=0.195 n=20)
Float64-32 20.75n ± 4% 21.21n ± 3% ~ (p=0.394 n=20)
ExpFloat64-32 12.58n ± 3% 12.39n ± 2% ~ (p=0.032 n=20)
NormFloat64-32 7.920n ± 3% 7.422n ± 1% -6.29% (p=0.000 n=20)
Perm3-32 40.27n ± 1% 38.00n ± 2% -5.65% (p=0.000 n=20)
Perm30-32 213.2n ± 2% 212.7n ± 1% ~ (p=0.995 n=20)
Perm30ViaShuffle-32 164.2n ± 2% 187.5n ± 2% +14.22% (p=0.000 n=20)
ShuffleOverhead-32 134.7n ± 2% 159.7n ± 1% +18.52% (p=0.000 n=20)
Concurrent-32 3.301n ± 2% 3.470n ± 0% +5.10% (p=0.000 n=20)
For #61716.
Change-Id: Id1481b04202883cd0b23e21bb58d1bca4e482bd3
Reviewed-on: https://go-review.googlesource.com/c/go/+/502500
Reviewed-by: Rob Pike <r@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This should make Uint64-using functions faster and leave
other things alone. It is a mystery why so much got faster.
A good cautionary tale not to read too much into minor
jitter in the benchmarks.
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 220860f76f.amd64 │ 11ad9fdddc.amd64 │
│ sec/op │ sec/op vs base │
SourceUint64-32 1.555n ± 1% 1.335n ± 1% -14.15% (p=0.000 n=20)
GlobalInt64-32 2.071n ± 1% 2.046n ± 1% ~ (p=0.016 n=20)
GlobalInt63Parallel-32 0.1023n ± 1% 0.1037n ± 1% +1.37% (p=0.002 n=20)
GlobalUint64-32 5.193n ± 1% 2.075n ± 0% -60.06% (p=0.000 n=20)
GlobalUint64Parallel-32 0.2341n ± 0% 0.1013n ± 1% -56.74% (p=0.000 n=20)
Int64-32 2.056n ± 2% 1.726n ± 2% -16.10% (p=0.000 n=20)
Uint64-32 2.077n ± 2% 1.673n ± 1% -19.46% (p=0.000 n=20)
GlobalIntN1000-32 4.077n ± 2% 3.895n ± 2% -4.45% (p=0.000 n=20)
IntN1000-32 3.476n ± 2% 3.403n ± 1% -2.10% (p=0.000 n=20)
Int64N1000-32 3.059n ± 1% 3.053n ± 2% ~ (p=0.131 n=20)
Int64N1e8-32 2.942n ± 1% 2.718n ± 1% -7.60% (p=0.000 n=20)
Int64N1e9-32 2.932n ± 1% 2.712n ± 1% -7.50% (p=0.000 n=20)
Int64N2e9-32 2.925n ± 1% 2.690n ± 1% -8.03% (p=0.000 n=20)
Int64N1e18-32 3.116n ± 1% 3.084n ± 2% ~ (p=0.425 n=20)
Int64N2e18-32 4.067n ± 1% 4.026n ± 1% -1.02% (p=0.007 n=20)
Int64N4e18-32 4.054n ± 1% 4.049n ± 2% ~ (p=0.204 n=20)
Int32N1000-32 2.951n ± 1% 2.730n ± 0% -7.49% (p=0.000 n=20)
Int32N1e8-32 3.102n ± 1% 2.916n ± 2% -6.03% (p=0.000 n=20)
Int32N1e9-32 3.535n ± 1% 3.375n ± 1% -4.54% (p=0.000 n=20)
Int32N2e9-32 3.514n ± 1% 3.292n ± 1% -6.30% (p=0.000 n=20)
Float32-32 2.760n ± 1% 2.673n ± 1% -3.13% (p=0.000 n=20)
Float64-32 2.284n ± 1% 2.485n ± 1% +8.80% (p=0.000 n=20)
ExpFloat64-32 3.757n ± 1% 3.577n ± 2% -4.78% (p=0.000 n=20)
NormFloat64-32 3.837n ± 1% 3.797n ± 2% ~ (p=0.204 n=20)
Perm3-32 35.23n ± 2% 35.79n ± 2% ~ (p=0.298 n=20)
Perm30-32 208.8n ± 1% 205.1n ± 1% -1.82% (p=0.000 n=20)
Perm30ViaShuffle-32 111.7n ± 1% 111.2n ± 2% ~ (p=0.273 n=20)
ShuffleOverhead-32 101.1n ± 1% 100.5n ± 2% ~ (p=0.878 n=20)
Concurrent-32 2.108n ± 7% 2.188n ± 5% ~ (p=0.417 n=20)
goos: darwin
goarch: arm64
pkg: math/rand/v2
│ 220860f76f.arm64 │ 11ad9fdddc.arm64 │
│ sec/op │ sec/op vs base │
SourceUint64-8 2.316n ± 1% 2.272n ± 1% -1.86% (p=0.000 n=20)
GlobalInt64-8 2.183n ± 1% 2.155n ± 1% ~ (p=0.122 n=20)
GlobalInt63Parallel-8 0.4331n ± 0% 0.4352n ± 0% +0.48% (p=0.000 n=20)
GlobalUint64-8 4.377n ± 2% 2.173n ± 1% -50.35% (p=0.000 n=20)
GlobalUint64Parallel-8 0.9237n ± 0% 0.4340n ± 0% -53.02% (p=0.000 n=20)
Int64-8 2.538n ± 1% 2.544n ± 1% ~ (p=0.189 n=20)
Uint64-8 2.604n ± 1% 2.552n ± 1% -1.98% (p=0.000 n=20)
GlobalIntN1000-8 3.857n ± 2% 3.856n ± 0% ~ (p=0.051 n=20)
IntN1000-8 3.822n ± 2% 3.820n ± 0% -0.05% (p=0.001 n=20)
Int64N1000-8 3.318n ± 0% 3.219n ± 2% -2.98% (p=0.000 n=20)
Int64N1e8-8 3.349n ± 1% 3.221n ± 2% -3.79% (p=0.000 n=20)
Int64N1e9-8 3.317n ± 2% 3.276n ± 2% -1.24% (p=0.001 n=20)
Int64N2e9-8 3.317n ± 2% 3.217n ± 0% -3.01% (p=0.000 n=20)
Int64N1e18-8 3.542n ± 1% 3.502n ± 2% -1.16% (p=0.001 n=20)
Int64N2e18-8 5.087n ± 0% 4.968n ± 1% -2.33% (p=0.000 n=20)
Int64N4e18-8 5.084n ± 0% 4.963n ± 0% -2.39% (p=0.000 n=20)
Int32N1000-8 3.208n ± 2% 3.189n ± 1% -0.58% (p=0.001 n=20)
Int32N1e8-8 3.610n ± 1% 3.514n ± 1% -2.67% (p=0.000 n=20)
Int32N1e9-8 4.235n ± 0% 4.133n ± 0% -2.40% (p=0.000 n=20)
Int32N2e9-8 4.229n ± 1% 4.137n ± 0% -2.19% (p=0.000 n=20)
Float32-8 3.468n ± 0% 3.468n ± 1% ~ (p=0.350 n=20)
Float64-8 3.447n ± 0% 3.478n ± 0% +0.90% (p=0.000 n=20)
ExpFloat64-8 4.567n ± 0% 4.563n ± 0% -0.10% (p=0.002 n=20)
NormFloat64-8 4.821n ± 0% 4.768n ± 0% -1.09% (p=0.000 n=20)
Perm3-8 28.89n ± 0% 28.94n ± 0% +0.17% (p=0.000 n=20)
Perm30-8 175.7n ± 0% 175.9n ± 0% +0.14% (p=0.000 n=20)
Perm30ViaShuffle-8 153.5n ± 0% 152.6n ± 1% ~ (p=0.010 n=20)
ShuffleOverhead-8 119.8n ± 1% 119.6n ± 1% ~ (p=0.147 n=20)
Concurrent-8 2.433n ± 3% 2.452n ± 3% ~ (p=0.616 n=20)
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 220860f76f.386 │ 11ad9fdddc.386 │
│ sec/op │ sec/op vs base │
SourceUint64-32 2.370n ± 1% 2.091n ± 1% -11.75% (p=0.000 n=20)
GlobalInt64-32 3.569n ± 1% 3.514n ± 2% -1.56% (p=0.000 n=20)
GlobalInt63Parallel-32 0.3221n ± 1% 0.3197n ± 0% -0.76% (p=0.000 n=20)
GlobalUint64-32 8.797n ± 10% 3.542n ± 1% -59.74% (p=0.000 n=20)
GlobalUint64Parallel-32 0.6351n ± 0% 0.3218n ± 0% -49.33% (p=0.000 n=20)
Int64-32 2.612n ± 2% 2.552n ± 2% -2.30% (p=0.000 n=20)
Uint64-32 3.350n ± 1% 2.566n ± 1% -23.42% (p=0.000 n=20)
GlobalIntN1000-32 5.892n ± 1% 5.965n ± 2% ~ (p=0.082 n=20)
IntN1000-32 4.546n ± 1% 4.652n ± 1% +2.33% (p=0.000 n=20)
Int64N1000-32 14.59n ± 1% 14.48n ± 1% ~ (p=0.652 n=20)
Int64N1e8-32 14.76n ± 2% 14.67n ± 1% ~ (p=0.836 n=20)
Int64N1e9-32 16.57n ± 1% 16.80n ± 2% ~ (p=0.016 n=20)
Int64N2e9-32 14.54n ± 1% 14.52n ± 1% ~ (p=0.533 n=20)
Int64N1e18-32 16.14n ± 1% 16.16n ± 1% ~ (p=0.606 n=20)
Int64N2e18-32 18.10n ± 1% 17.95n ± 2% ~ (p=0.062 n=20)
Int64N4e18-32 18.65n ± 1% 18.35n ± 2% -1.61% (p=0.010 n=20)
Int32N1000-32 3.560n ± 1% 3.608n ± 1% +1.33% (p=0.001 n=20)
Int32N1e8-32 3.770n ± 2% 3.767n ± 1% ~ (p=0.155 n=20)
Int32N1e9-32 4.098n ± 0% 4.130n ± 2% ~ (p=0.016 n=20)
Int32N2e9-32 4.179n ± 1% 4.206n ± 1% ~ (p=0.011 n=20)
Float32-32 21.18n ± 4% 22.18n ± 4% +4.70% (p=0.003 n=20)
Float64-32 20.60n ± 2% 20.75n ± 4% +0.73% (p=0.000 n=20)
ExpFloat64-32 13.07n ± 0% 12.58n ± 3% -3.82% (p=0.000 n=20)
NormFloat64-32 7.738n ± 2% 7.920n ± 3% ~ (p=0.066 n=20)
Perm3-32 36.73n ± 1% 40.27n ± 1% +9.65% (p=0.000 n=20)
Perm30-32 211.9n ± 1% 213.2n ± 2% ~ (p=0.262 n=20)
Perm30ViaShuffle-32 165.2n ± 1% 164.2n ± 2% ~ (p=0.029 n=20)
ShuffleOverhead-32 133.9n ± 1% 134.7n ± 2% ~ (p=0.551 n=20)
Concurrent-32 3.287n ± 2% 3.301n ± 2% ~ (p=0.330 n=20)
For #61716.
Change-Id: I8d2f73f87dd3603a0c2ff069988938e0957b6904
Reviewed-on: https://go-review.googlesource.com/c/go/+/502499
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Rob Pike <r@golang.org>
|
|
Change the benchmarks to use the result of the calls,
as I found that in certain cases inlining resulted in
discarding part of the computation in the benchmark loop.
Add various benchmarks that will be relevant in future CLs.
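A minimal sketch of the pattern described above, assuming the released math/rand/v2 API: the benchmark accumulates every result and stores the total in a package-level variable so the compiler cannot discard the calls. The names Sink, sumFromSeed, and BenchmarkUint64 are illustrative and may not match the benchmarks in the CL.

```go
package main

import (
	"fmt"
	"math/rand/v2"
	"testing"
)

// Sink is a package-level variable. Storing the accumulated result here
// keeps the benchmarked calls from being treated as dead code.
var Sink uint64

// sumFromSeed calls Uint64 n times on a generator built from seed and
// returns the wrapped sum, so every call contributes to a used value.
func sumFromSeed(seed uint64, n int) uint64 {
	r := rand.New(rand.NewPCG(seed, seed))
	var t uint64
	for i := 0; i < n; i++ {
		t += r.Uint64()
	}
	return t
}

// BenchmarkUint64 uses the result of the loop instead of dropping it.
// Generator setup is included in the timing to keep the sketch short.
func BenchmarkUint64(b *testing.B) {
	Sink = sumFromSeed(1, b.N)
}

func main() {
	// testing.Benchmark runs the function outside of "go test".
	fmt.Println(testing.Benchmark(BenchmarkUint64))
}
```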
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 220860f76f.amd64 │
│ sec/op │
SourceUint64-32 1.555n ± 1%
GlobalInt64-32 2.071n ± 1%
GlobalInt63Parallel-32 0.1023n ± 1%
GlobalUint64-32 5.193n ± 1%
GlobalUint64Parallel-32 0.2341n ± 0%
Int64-32 2.056n ± 2%
Uint64-32 2.077n ± 2%
GlobalIntN1000-32 4.077n ± 2%
IntN1000-32 3.476n ± 2%
Int64N1000-32 3.059n ± 1%
Int64N1e8-32 2.942n ± 1%
Int64N1e9-32 2.932n ± 1%
Int64N2e9-32 2.925n ± 1%
Int64N1e18-32 3.116n ± 1%
Int64N2e18-32 4.067n ± 1%
Int64N4e18-32 4.054n ± 1%
Int32N1000-32 2.951n ± 1%
Int32N1e8-32 3.102n ± 1%
Int32N1e9-32 3.535n ± 1%
Int32N2e9-32 3.514n ± 1%
Float32-32 2.760n ± 1%
Float64-32 2.284n ± 1%
ExpFloat64-32 3.757n ± 1%
NormFloat64-32 3.837n ± 1%
Perm3-32 35.23n ± 2%
Perm30-32 208.8n ± 1%
Perm30ViaShuffle-32 111.7n ± 1%
ShuffleOverhead-32 101.1n ± 1%
Concurrent-32 2.108n ± 7%
goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
│ 220860f76f.arm64 │
│ sec/op │
SourceUint64-8 2.316n ± 1%
GlobalInt64-8 2.183n ± 1%
GlobalInt63Parallel-8 0.4331n ± 0%
GlobalUint64-8 4.377n ± 2%
GlobalUint64Parallel-8 0.9237n ± 0%
Int64-8 2.538n ± 1%
Uint64-8 2.604n ± 1%
GlobalIntN1000-8 3.857n ± 2%
IntN1000-8 3.822n ± 2%
Int64N1000-8 3.318n ± 0%
Int64N1e8-8 3.349n ± 1%
Int64N1e9-8 3.317n ± 2%
Int64N2e9-8 3.317n ± 2%
Int64N1e18-8 3.542n ± 1%
Int64N2e18-8 5.087n ± 0%
Int64N4e18-8 5.084n ± 0%
Int32N1000-8 3.208n ± 2%
Int32N1e8-8 3.610n ± 1%
Int32N1e9-8 4.235n ± 0%
Int32N2e9-8 4.229n ± 1%
Float32-8 3.468n ± 0%
Float64-8 3.447n ± 0%
ExpFloat64-8 4.567n ± 0%
NormFloat64-8 4.821n ± 0%
Perm3-8 28.89n ± 0%
Perm30-8 175.7n ± 0%
Perm30ViaShuffle-8 153.5n ± 0%
ShuffleOverhead-8 119.8n ± 1%
Concurrent-8 2.433n ± 3%
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ 220860f76f.386 │
│ sec/op │
SourceUint64-32 2.370n ± 1%
GlobalInt64-32 3.569n ± 1%
GlobalInt63Parallel-32 0.3221n ± 1%
GlobalUint64-32 8.797n ± 10%
GlobalUint64Parallel-32 0.6351n ± 0%
Int64-32 2.612n ± 2%
Uint64-32 3.350n ± 1%
GlobalIntN1000-32 5.892n ± 1%
IntN1000-32 4.546n ± 1%
Int64N1000-32 14.59n ± 1%
Int64N1e8-32 14.76n ± 2%
Int64N1e9-32 16.57n ± 1%
Int64N2e9-32 14.54n ± 1%
Int64N1e18-32 16.14n ± 1%
Int64N2e18-32 18.10n ± 1%
Int64N4e18-32 18.65n ± 1%
Int32N1000-32 3.560n ± 1%
Int32N1e8-32 3.770n ± 2%
Int32N1e9-32 4.098n ± 0%
Int32N2e9-32 4.179n ± 1%
Float32-32 21.18n ± 4%
Float64-32 20.60n ± 2%
ExpFloat64-32 13.07n ± 0%
NormFloat64-32 7.738n ± 2%
Perm3-32 36.73n ± 1%
Perm30-32 211.9n ± 1%
Perm30ViaShuffle-32 165.2n ± 1%
ShuffleOverhead-32 133.9n ± 1%
Concurrent-32 3.287n ± 2%
For #61716.
Change-Id: I2f0938eae4b7bf736a8cd899a99783e731bf2179
Reviewed-on: https://go-review.googlesource.com/c/go/+/502496
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Rob Pike <r@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Removing Rand.Seed lets us remove lockedSource as well,
along with the ambiguity in globalRand about which source
to use.
For #61716.
Change-Id: Ibe150520dd1e7dd87165eacaebe9f0c2daeaedfd
Reviewed-on: https://go-review.googlesource.com/c/go/+/502498
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Rob Pike <r@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
|
|
Add more test cases.
Replace -printgolden with -update,
which rewrites the files for us.
For #61716.
Change-Id: I7c4c900ee896042429135a21971a56ebe16b6a66
Reviewed-on: https://go-review.googlesource.com/c/go/+/516858
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
In math/rand, Read is deprecated. Remove in v2.
People should use crypto/rand if they need long strings.
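For callers that relied on Read, a minimal sketch of the crypto/rand replacement suggested above follows. The helper name randomHex is hypothetical.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// randomHex returns 2*n hexadecimal characters built from n bytes read
// from the operating system's cryptographically secure random source.
func randomHex(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

func main() {
	s, err := randomHex(16)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```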
For #61716.
Change-Id: Ib254b7e1844616e96db60a3a7abb572b0dcb1583
Reviewed-on: https://go-review.googlesource.com/c/go/+/502497
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Int31 -> Int32
Int31n -> Int32N
Int63 -> Int64
Int63n -> Int64N
Intn -> IntN
The 31 and 63 are pedantic and confusing: the functions should
be named for the type they return, same as all the others.
The lower-case n is inconsistent with Go's usual CamelCase
and especially problematic because we plan to add 'func N'.
Capitalize the n.
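The renamed functions can be exercised as in the sketch below, which assumes the released math/rand/v2 API (Go 1.22 or later). The helper name draw and the rand.NewPCG source are illustrative and not part of this CL.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// draw returns one value from each renamed method, in the order
// Int32, Int32N(10), Int64, Int64N(10), IntN(10).
// The old math/rand names were Int31, Int31n, Int63, Int63n, Intn.
func draw(seed uint64) (int32, int32, int64, int64, int) {
	r := rand.New(rand.NewPCG(seed, 0))
	return r.Int32(), r.Int32N(10), r.Int64(), r.Int64N(10), r.IntN(10)
}

func main() {
	a, b, c, d, e := draw(1)
	fmt.Println(a, b, c, d, e)
}
```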
For #61716.
Change-Id: Idb1a005a82f353677450d47fb612ade7a41fde69
Reviewed-on: https://go-review.googlesource.com/c/go/+/516857
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This is the beginning of the math/rand/v2 package from proposal #61716.
Start by copying old API. This CL copies math/rand/* to math/rand/v2
and updates references to math/rand to add v2 throughout.
Later CLs will make the v2 changes.
For #61716.
Change-Id: I1624ccffae3dfa442d4ba2461942decbd076e11b
Reviewed-on: https://go-review.googlesource.com/c/go/+/502495
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
|
|
Change-Id: I4a6c2ef6fd21355952ab7d8eaad883646a95d364
Reviewed-on: https://go-review.googlesource.com/c/go/+/535087
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
|
|
Change-Id: I71a38dd20bfaf2b1aed18892d54eeb017d3d7d66
GitHub-Last-Rev: 8da43b2cbd563ed123690709e519c9f84272b332
GitHub-Pull-Request: golang/go#61955
Reviewed-on: https://go-review.googlesource.com/c/go/+/518595
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
|
|
Fixes #59331
Change-Id: I62156be2f2758c59349c3b02db6cf9140429c9e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/481915
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Bypass: Ian Lance Taylor <iant@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
Fixes the misuse of "a" vs "an", according to English grammatical
expectations and using https://www.a-or-an.com/
Change-Id: I53ac724070e3ff3d33c304483fe72c023c7cda47
Reviewed-on: https://go-review.googlesource.com/c/go/+/480536
Run-TryBot: shuang cui <imcusg@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Change-Id: I37a9e4362953a711840087e1b7b8d7a25f1a83b7
Reviewed-on: https://go-review.googlesource.com/c/go/+/467275
Reviewed-by: Russ Cox <rsc@golang.org>
TryBot-Bypass: Russ Cox <rsc@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
|
|
It currently says only what it isn't good for, which is not helpful.
Change-Id: I468c7f385c14eaca99788a94d53c30b729ed0944
Reviewed-on: https://go-review.googlesource.com/c/go/+/466276
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
|
|
Now that the top-level math/rand functions are auto-seeded by default
(issue #54880), use the runtime fastrand64 function when 1) Seed
has not been called and 2) GODEBUG randautoseed=0 is not set.
The benchmarks run quickly and are relatively noisy, but they show
significant improvements for parallel calls to the top-level functions.
goos: linux
goarch: amd64
pkg: math/rand
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
│ /tmp/foo.1 │ /tmp/foo.2 │
│ sec/op │ sec/op vs base │
Int63Threadsafe-16 11.605n ± 1% 3.094n ± 1% -73.34% (p=0.000 n=10)
Int63ThreadsafeParallel-16 67.8350n ± 2% 0.4000n ± 1% -99.41% (p=0.000 n=10)
Int63Unthreadsafe-16 1.947n ± 3% 1.924n ± 2% ~ (p=0.189 n=10)
Intn1000-16 4.295n ± 2% 4.287n ± 3% ~ (p=0.517 n=10)
Int63n1000-16 4.379n ± 0% 4.192n ± 2% -4.27% (p=0.000 n=10)
Int31n1000-16 3.641n ± 3% 3.506n ± 0% -3.69% (p=0.000 n=10)
Float32-16 3.330n ± 7% 3.250n ± 2% -2.40% (p=0.017 n=10)
Float64-16 2.194n ± 6% 2.056n ± 4% -6.31% (p=0.004 n=10)
Perm3-16 43.39n ± 9% 38.28n ± 12% -11.77% (p=0.015 n=10)
Perm30-16 324.4n ± 6% 315.9n ± 19% ~ (p=0.315 n=10)
Perm30ViaShuffle-16 175.4n ± 1% 143.6n ± 2% -18.15% (p=0.000 n=10)
ShuffleOverhead-16 223.4n ± 2% 215.8n ± 1% -3.38% (p=0.000 n=10)
Read3-16 5.428n ± 3% 5.406n ± 2% ~ (p=0.780 n=10)
Read64-16 41.55n ± 5% 40.14n ± 3% -3.38% (p=0.000 n=10)
Read1000-16 622.9n ± 4% 594.9n ± 2% -4.50% (p=0.000 n=10)
Concurrent-16 136.300n ± 2% 4.647n ± 26% -96.59% (p=0.000 n=10)
geomean 23.40n 12.15n -48.08%
Fixes #49892
Change-Id: Iba75b326145512ab0b7ece233b98ac3d4e1fb504
Reviewed-on: https://go-review.googlesource.com/c/go/+/465037
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
|
|
Allow GODEBUG users to report how many times a setting
resulted in non-default behavior.
Record non-default-behaviors for all existing GODEBUGs.
Also rework tests to ensure that runtime is in sync with runtime/metrics.All,
and generate docs mechanically from metrics.All.
For #56986.
Change-Id: Iefa1213e2a5c3f19ea16cd53298c487952ef05a4
Reviewed-on: https://go-review.googlesource.com/c/go/+/453618
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Programs that call Seed and then expect a specific sequence
of results from the global random source (using functions such as Int)
can be broken when a dependency changes how much it consumes
from the global random source. To avoid such breakages, programs
that need a specific result sequence should use New(NewSource(seed))
to obtain a random generator that other packages cannot access.
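A minimal sketch of the recommended pattern, using math/rand's New and NewSource: a private generator yields the same sequence for the same seed, no matter how much other packages draw from the global source. The helper name sequence is hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sequence returns n values in [0, 100) from a generator that is local
// to this call, so no other package can advance its state.
func sequence(seed int64, n int) []int {
	r := rand.New(rand.NewSource(seed))
	out := make([]int, n)
	for i := range out {
		out[i] = r.Intn(100)
	}
	return out
}

func main() {
	// Both lines print the same numbers because the seed is the same.
	fmt.Println(sequence(42, 5))
	fmt.Println(sequence(42, 5))
}
```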
Fixes #56319.
Change-Id: Idac33991b719d2c71f109f51dacb3467a649e01e
Reviewed-on: https://go-review.googlesource.com/c/go/+/451375
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
|
|
We have been expanding our use of GODEBUG for compatibility,
and the current implementation forces a tradeoff between
freshness and efficiency. It parses the environment variable
in full each time it is called, which is expensive. But if clients
cache the result, they won't respond to run-time GODEBUG
changes, as happened with x509sha1 (#56436).
This CL changes the GODEBUG API to provide efficient,
up-to-date results. Instead of a single Get function,
New returns a *godebug.Setting that itself has a Get method.
Clients can save the result of New, which is no more expensive
than errors.New, in a global variable, and then call that
variable's Get method to get the value. Get costs only two
atomic loads in the case where the variable hasn't changed
since the last call.
Unfortunately, these changes do require importing sync
from godebug, which will mean that sync itself will never
be able to use a GODEBUG setting. That doesn't seem like
such a hardship. If it was really necessary, the runtime could
pass a setting to package sync itself at startup, with the
caveat that that setting, like the ones used by runtime itself,
would not respond to run-time GODEBUG changes.
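The shape of the new API can be illustrated with a simplified stand-in. This is not the internal/godebug implementation: the setting type, newSetting, and update below are hypothetical names, and the explicit update call stands in for whatever mechanism notifies the real package of GODEBUG changes. The sketch only shows the idea of parsing once and serving later reads from an atomically published value.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"sync/atomic"
)

// setting caches the current value of one key. Get is a single atomic
// load in this sketch; the string is never re-parsed on the read path.
type setting struct {
	value atomic.Pointer[string]
}

var (
	mu       sync.Mutex
	settings = map[string]*setting{}
)

// newSetting returns the shared setting for name, creating it on first
// use. Callers are expected to keep the result in a global variable.
func newSetting(name string) *setting {
	mu.Lock()
	defer mu.Unlock()
	if s, ok := settings[name]; ok {
		return s
	}
	s := &setting{}
	empty := ""
	s.value.Store(&empty)
	settings[name] = s
	return s
}

// Get returns the most recently published value, or "" if unset.
func (s *setting) Get() string { return *s.value.Load() }

// update parses a comma-separated key=value list once and publishes the
// result to every registered setting. Keys missing from env reset to "".
func update(env string) {
	parsed := map[string]string{}
	for _, kv := range strings.Split(env, ",") {
		if k, v, ok := strings.Cut(kv, "="); ok {
			parsed[k] = v
		}
	}
	mu.Lock()
	defer mu.Unlock()
	for name, s := range settings {
		v := parsed[name]
		s.value.Store(&v)
	}
}

// randautoseed is saved in a global, as the commit message suggests.
var randautoseed = newSetting("randautoseed")

func main() {
	update("randautoseed=0,x509sha1=1")
	fmt.Printf("randautoseed=%q\n", randautoseed.Get())
	update("")
	fmt.Printf("randautoseed=%q\n", randautoseed.Get())
}
```

Unlike the real package, this sketch takes a lock in newSetting and only refreshes values when update is called.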
Change-Id: I99a3acfa24fb2a692610af26a5d14bbc62c966ac
Reviewed-on: https://go-review.googlesource.com/c/go/+/449504
Run-TryBot: Russ Cox <rsc@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
For #20661.
Change-Id: I1e638cb619e643eadc210d71f92bd1af7bafc912
Reviewed-on: https://go-review.googlesource.com/c/go/+/436955
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: hopehook <hopehook@golangcn.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
|
|
As of CL 443058, rand.Seed is not necessary to call,
nor is it a particularly good idea.
For #54880.
Change-Id: If9d70763622c09008599db8c97a90fcbe285c6f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/445395
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
Implement proposal #54880, to automatically seed the global source.
The justification for this not being a breaking change is that any
use of the global source in a package's init function or exported API
clearly must be valid - that is, if a package changes how much
randomness it consumes at init time or in an exported API, that
clearly isn't the kind of breaking change that requires issuing a v2
of that package. That kind of per-package change in the position
of the global source is indistinguishable from seeding the global
source differently. So if the per-package change is valid, so is auto-seeding.
And then, of course, auto-seeding means that packages will be
far less likely to depend on the specific results of the global source
and therefore not break when those kinds of per-package changes
happen in the future.
Seed(1) can be called in programs that need the old sequence from
the global source and want to restore the old behavior.
Of course, those programs will still be broken by the per-package
changes just described, and it would be better for them to allocate
local sources rather than continue to use the global one.
Fixes #54880.
Change-Id: Ib9dc3307b97f7a45587a9cc50d81f919d3edc7ae
Reviewed-on: https://go-review.googlesource.com/c/go/+/443058
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
|
|
This sets up for delaying the decision of which seed to use,
but this CL still keeps the original global Seed(1) semantics.
Preparation for #54880.
Change-Id: Ibfa9d50ec9023aa755a83852e55168fa7d24b115
Reviewed-on: https://go-review.googlesource.com/c/go/+/443057
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
|
|
Fixes #44488
Change-Id: I570950799788678b9dc6e9ddad894973b4611e09
Reviewed-on: https://go-review.googlesource.com/c/go/+/425974
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Joseph Tsai <joetsai@digital-static.net>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joseph Tsai <joetsai@digital-static.net>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
[This CL is part of a sequence implementing the proposal #51082.
The design doc is at https://go.dev/s/godocfmt-design.]
Run the updated gofmt, which reformats doc comments,
on the main repository. Vendored files are excluded.
For #51082.
Change-Id: I7332f099b60f716295fb34719c98c04eb1a85407
Reviewed-on: https://go-review.googlesource.com/c/go/+/384268
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
A future change to gofmt will rewrite
    // Doc comment.
    //
    func f()
to
    // Doc comment.
    func f()
Apply that change preemptively to all doc comments.
For #51082.
Change-Id: I4023e16cfb0729b64a8590f071cd92f17343081d
Reviewed-on: https://go-review.googlesource.com/c/go/+/384259
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
|