go - Fork of Go programming language with my patches.

Age	Commit message (Collapse)	Author
2023-11-19	math/rand/v2: add ChaCha8	Russ Cox
	ChaCha8 provides a cryptographically strong generator alongside PCG, so that people who want stronger randomness have access to that. On systems with 128-bit vector math assembly (amd64 and arm64), ChaCha8 runs at about the same speed as PCG (25% slower on amd64, 2% faster on arm64). Obviously all the claimed benchmark variation other than the new ChaCha8 benchmark is a lie. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ afa459a2f0.amd64 │ bbb48afeb7.amd64 │ │ sec/op │ sec/op vs base │ PCG_DXSM-32 1.488n ± 2% 1.492n ± 2% ~ (p=0.309 n=20) ChaCha8-32 1.861n ± 2% SourceUint64-32 1.450n ± 3% 1.590n ± 2% +9.69% (p=0.000 n=20) GlobalInt64-32 2.067n ± 2% 2.061n ± 1% ~ (p=0.952 n=20) GlobalInt64Parallel-32 0.1044n ± 2% 0.1041n ± 1% ~ (p=0.498 n=20) GlobalUint64-32 2.085n ± 0% 2.256n ± 2% +8.23% (p=0.000 n=20) GlobalUint64Parallel-32 0.1008n ± 1% 0.1018n ± 1% ~ (p=0.041 n=20) Int64-32 1.779n ± 1% 1.779n ± 1% ~ (p=0.410 n=20) Uint64-32 1.854n ± 2% 1.882n ± 1% ~ (p=0.044 n=20) GlobalIntN1000-32 3.140n ± 3% 3.115n ± 3% ~ (p=0.673 n=20) IntN1000-32 2.496n ± 1% 2.509n ± 1% ~ (p=0.171 n=20) Int64N1000-32 2.510n ± 2% 2.493n ± 1% ~ (p=0.804 n=20) Int64N1e8-32 2.471n ± 2% 2.521n ± 1% +1.98% (p=0.003 n=20) Int64N1e9-32 2.488n ± 2% 2.506n ± 1% ~ (p=0.663 n=20) Int64N2e9-32 2.478n ± 2% 2.482n ± 2% ~ (p=0.533 n=20) Int64N1e18-32 3.088n ± 1% 3.216n ± 1% +4.15% (p=0.000 n=20) Int64N2e18-32 3.493n ± 1% 3.635n ± 2% +4.05% (p=0.000 n=20) Int64N4e18-32 5.060n ± 2% 5.122n ± 1% +1.22% (p=0.000 n=20) Int32N1000-32 2.620n ± 1% 2.672n ± 1% +2.00% (p=0.002 n=20) Int32N1e8-32 2.652n ± 0% 2.646n ± 1% ~ (p=0.743 n=20) Int32N1e9-32 2.644n ± 1% 2.660n ± 2% ~ (p=0.163 n=20) Int32N2e9-32 2.619n ± 2% 2.652n ± 1% ~ (p=0.132 n=20) Float32-32 2.261n ± 1% 2.267n ± 1% ~ (p=0.516 n=20) Float64-32 2.241n ± 2% 2.276n ± 1% ~ (p=0.080 n=20) ExpFloat64-32 3.716n ± 1% 3.779n ± 1% +1.68% (p=0.007 n=20) NormFloat64-32 3.718n ± 1% 3.747n ± 1% ~ (p=0.011 n=20) Perm3-32 34.11n ± 2% 34.23n ± 2% ~ (p=0.779 n=20) Perm30-32 200.6n ± 0% 202.3n ± 2% ~ (p=0.055 n=20) Perm30ViaShuffle-32 109.7n ± 1% 115.5n ± 2% +5.34% (p=0.000 n=20) ShuffleOverhead-32 107.2n ± 1% 113.3n ± 1% +5.74% (p=0.000 n=20) Concurrent-32 2.108n ± 6% 2.107n ± 1% ~ (p=0.448 n=20) goos: darwin goarch: arm64 pkg: math/rand/v2 cpu: Apple M1 │ afa459a2f0.arm64 │ bbb48afeb7.arm64 │ │ sec/op │ sec/op vs base │ PCG_DXSM-8 2.531n ± 0% 2.529n ± 0% ~ (p=0.586 n=20) ChaCha8-8 2.480n ± 0% SourceUint64-8 2.531n ± 0% 2.534n ± 0% ~ (p=0.227 n=20) GlobalInt64-8 2.177n ± 1% 2.173n ± 1% ~ (p=0.733 n=20) GlobalInt64Parallel-8 0.4319n ± 0% 0.4304n ± 0% -0.32% (p=0.003 n=20) GlobalUint64-8 2.185n ± 1% 2.185n ± 0% ~ (p=0.541 n=20) GlobalUint64Parallel-8 0.4295n ± 1% 0.4294n ± 0% ~ (p=0.203 n=20) Int64-8 4.104n ± 0% 4.107n ± 0% ~ (p=0.193 n=20) Uint64-8 4.080n ± 0% 4.081n ± 0% ~ (p=0.053 n=20) GlobalIntN1000-8 2.814n ± 1% 2.814n ± 0% ~ (p=0.879 n=20) IntN1000-8 4.140n ± 0% 4.141n ± 0% ~ (p=0.428 n=20) Int64N1000-8 4.139n ± 0% 4.140n ± 0% ~ (p=0.114 n=20) Int64N1e8-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.898 n=20) Int64N1e9-8 4.139n ± 0% 4.140n ± 0% ~ (p=0.593 n=20) Int64N2e9-8 4.140n ± 0% 4.139n ± 0% ~ (p=0.158 n=20) Int64N1e18-8 5.273n ± 0% 5.274n ± 0% ~ (p=0.308 n=20) Int64N2e18-8 6.059n ± 0% 6.058n ± 0% ~ (p=0.053 n=20) Int64N4e18-8 8.803n ± 0% 8.800n ± 0% ~ (p=0.673 n=20) Int32N1000-8 4.131n ± 0% 4.131n ± 0% ~ (p=0.342 n=20) Int32N1e8-8 4.131n ± 0% 4.131n ± 0% ~ (p=0.091 n=20) Int32N1e9-8 4.131n ± 0% 4.131n ± 0% ~ (p=0.273 n=20) Int32N2e9-8 4.131n ± 0% 4.131n ± 0% ~ (p=0.425 n=20) Float32-8 4.110n ± 0% 4.112n ± 0% ~ (p=0.203 n=20) Float64-8 4.104n ± 0% 4.106n ± 0% ~ (p=0.409 n=20) ExpFloat64-8 5.338n ± 0% 5.339n ± 0% ~ (p=0.037 n=20) NormFloat64-8 5.731n ± 0% 5.733n ± 0% ~ (p=0.692 n=20) Perm3-8 26.62n ± 0% 26.65n ± 0% +0.09% (p=0.000 n=20) Perm30-8 194.6n ± 2% 194.9n ± 0% ~ (p=0.141 n=20) Perm30ViaShuffle-8 156.4n ± 0% 156.5n ± 0% +0.06% (p=0.000 n=20) ShuffleOverhead-8 125.8n ± 0% 125.0n ± 0% -0.64% (p=0.000 n=20) Concurrent-8 2.654n ± 6% 2.441n ± 6% -8.06% (p=0.009 n=20) goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ afa459a2f0.386 │ bbb48afeb7.386 │ │ sec/op │ sec/op vs base │ PCG_DXSM-32 7.793n ± 2% 7.647n ± 1% ~ (p=0.021 n=20) ChaCha8-32 11.48n ± 2% SourceUint64-32 7.680n ± 1% 7.714n ± 1% ~ (p=0.713 n=20) GlobalInt64-32 3.474n ± 3% 3.491n ± 28% ~ (p=0.337 n=20) GlobalInt64Parallel-32 0.3253n ± 0% 0.3194n ± 0% -1.81% (p=0.000 n=20) GlobalUint64-32 3.433n ± 2% 3.610n ± 2% +5.14% (p=0.000 n=20) GlobalUint64Parallel-32 0.3156n ± 0% 0.3164n ± 0% ~ (p=0.073 n=20) Int64-32 7.707n ± 1% 7.824n ± 0% +1.52% (p=0.005 n=20) Uint64-32 7.714n ± 1% 7.732n ± 2% ~ (p=0.441 n=20) GlobalIntN1000-32 6.236n ± 1% 6.176n ± 2% ~ (p=0.499 n=20) IntN1000-32 10.41n ± 1% 10.31n ± 2% ~ (p=0.782 n=20) Int64N1000-32 10.97n ± 2% 11.22n ± 2% +2.19% (p=0.002 n=20) Int64N1e8-32 10.98n ± 1% 11.07n ± 1% ~ (p=0.056 n=20) Int64N1e9-32 10.95n ± 0% 11.15n ± 2% ~ (p=0.016 n=20) Int64N2e9-32 11.11n ± 1% 11.00n ± 1% ~ (p=0.654 n=20) Int64N1e18-32 15.18n ± 2% 14.97n ± 2% ~ (p=0.387 n=20) Int64N2e18-32 15.61n ± 1% 15.91n ± 1% +1.92% (p=0.003 n=20) Int64N4e18-32 19.23n ± 2% 18.98n ± 1% ~ (p=1.000 n=20) Int32N1000-32 10.35n ± 1% 10.31n ± 2% ~ (p=0.081 n=20) Int32N1e8-32 10.33n ± 1% 10.38n ± 1% ~ (p=0.335 n=20) Int32N1e9-32 10.35n ± 1% 10.37n ± 1% ~ (p=0.497 n=20) Int32N2e9-32 10.35n ± 1% 10.41n ± 1% ~ (p=0.605 n=20) Float32-32 13.57n ± 1% 13.78n ± 2% ~ (p=0.047 n=20) Float64-32 22.95n ± 4% 23.43n ± 3% ~ (p=0.218 n=20) ExpFloat64-32 15.23n ± 2% 15.46n ± 1% ~ (p=0.095 n=20) NormFloat64-32 13.78n ± 1% 13.73n ± 2% ~ (p=0.031 n=20) Perm3-32 46.62n ± 2% 47.46n ± 2% +1.82% (p=0.004 n=20) Perm30-32 400.7n ± 1% 403.5n ± 1% ~ (p=0.098 n=20) Perm30ViaShuffle-32 350.5n ± 1% 348.1n ± 2% ~ (p=0.703 n=20) ShuffleOverhead-32 326.0n ± 2% 326.2n ± 2% ~ (p=0.440 n=20) Concurrent-32 3.290n ± 0% 3.297n ± 4% ~ (p=0.189 n=20) For #61716. Change-Id: Id2a7e1c1db0beb81f563faaefba65fe292497269 Reviewed-on: https://go-review.googlesource.com/c/go/+/516859 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Filippo Valsorda <filippo@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com>
2023-11-15	math/big: faster FloatPrec implementation	Robert Griesemer
	Based on observations by Cherry Mui (see comments in CL 539299). Add new benchmark FloatPrecMixed. For #50489. name old time/op new time/op delta FloatPrecExact/1-12 129ns ± 0% 105ns ±11% -18.51% (p=0.008 n=5+5) FloatPrecExact/10-12 317ns ± 2% 283ns ± 1% -10.65% (p=0.008 n=5+5) FloatPrecExact/100-12 1.80µs ±15% 1.35µs ± 0% -25.09% (p=0.008 n=5+5) FloatPrecExact/1000-12 9.48µs ±14% 8.32µs ± 1% -12.25% (p=0.008 n=5+5) FloatPrecExact/10000-12 195µs ± 1% 191µs ± 0% -1.73% (p=0.008 n=5+5) FloatPrecExact/100000-12 7.31ms ± 1% 7.24ms ± 1% -0.99% (p=0.032 n=5+5) FloatPrecExact/1000000-12 301ms ± 3% 302ms ± 2% ~ (p=0.841 n=5+5) FloatPrecMixed/1-12 141ns ± 0% 110ns ± 3% -21.88% (p=0.008 n=5+5) FloatPrecMixed/10-12 767ns ± 0% 739ns ± 5% ~ (p=0.151 n=5+5) FloatPrecMixed/100-12 4.93µs ± 2% 3.73µs ± 1% -24.33% (p=0.008 n=5+5) FloatPrecMixed/1000-12 90.9µs ±11% 70.3µs ± 2% -22.66% (p=0.008 n=5+5) FloatPrecMixed/10000-12 2.30ms ± 0% 1.92ms ± 1% -16.41% (p=0.008 n=5+5) FloatPrecMixed/100000-12 87.1ms ± 1% 68.5ms ± 1% -21.42% (p=0.008 n=5+5) FloatPrecMixed/1000000-12 4.09s ± 1% 3.58s ± 1% -12.35% (p=0.008 n=5+5) FloatPrecInexact/1-12 92.4ns ± 0% 66.1ns ± 5% -28.41% (p=0.008 n=5+5) FloatPrecInexact/10-12 118ns ± 0% 91ns ± 1% -23.14% (p=0.016 n=5+4) FloatPrecInexact/100-12 310ns ±10% 244ns ± 1% -21.32% (p=0.008 n=5+5) FloatPrecInexact/1000-12 952ns ± 1% 828ns ± 1% -12.96% (p=0.016 n=4+5) FloatPrecInexact/10000-12 6.71µs ± 1% 6.25µs ± 3% -6.83% (p=0.008 n=5+5) FloatPrecInexact/100000-12 66.1µs ± 1% 61.2µs ± 1% -7.45% (p=0.008 n=5+5) FloatPrecInexact/1000000-12 635µs ± 2% 584µs ± 1% -7.97% (p=0.008 n=5+5) Change-Id: I3aa67b49a042814a3286ee8306fbed36709cbb6e Reviewed-on: https://go-review.googlesource.com/c/go/+/542756 Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Robert Griesemer <gri@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@google.com> Auto-Submit: Robert Griesemer <gri@google.com>
2023-11-14	math/big: update comment in the implementation of FloatPrec	Robert Griesemer
	Follow-up on CL 539299: missed to incorporate the updated comment per feedback on that CL. For #50489. Change-Id: Ib035400038b1d11532f62055b5cdb382ab75654c Reviewed-on: https://go-review.googlesource.com/c/go/+/542115 Run-TryBot: Robert Griesemer <gri@google.com> Auto-Submit: Robert Griesemer <gri@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-11-14	math/big: implement Rat.FloatPrec	Robert Griesemer
	goos: darwin goarch: amd64 pkg: math/big cpu: Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz BenchmarkFloatPrecExact/1-12 9380685 125.0 ns/op BenchmarkFloatPrecExact/10-12 3780493 321.2 ns/op BenchmarkFloatPrecExact/100-12 698272 1679 ns/op BenchmarkFloatPrecExact/1000-12 117975 9113 ns/op BenchmarkFloatPrecExact/10000-12 5913 192768 ns/op BenchmarkFloatPrecExact/100000-12 164 7401817 ns/op BenchmarkFloatPrecExact/1000000-12 4 293568523 ns/op BenchmarkFloatPrecInexact/1-12 12836612 91.26 ns/op BenchmarkFloatPrecInexact/10-12 10144908 114.9 ns/op BenchmarkFloatPrecInexact/100-12 4121931 297.3 ns/op BenchmarkFloatPrecInexact/1000-12 1275886 927.7 ns/op BenchmarkFloatPrecInexact/10000-12 170392 6546 ns/op BenchmarkFloatPrecInexact/100000-12 18307 65232 ns/op BenchmarkFloatPrecInexact/1000000-12 1701 621412 ns/op Fixes #50489. Change-Id: Ic952f00e35d42f2470ecab53df712721997eac94 Reviewed-on: https://go-review.googlesource.com/c/go/+/539299 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Robert Griesemer <gri@google.com> Run-TryBot: Robert Griesemer <gri@google.com> Reviewed-by: Robert Griesemer <gri@google.com>
2023-10-30	math/rand/v2: delete Mitchell/Reeds source	Russ Cox
	These slowdowns are because we are now using PCG instead of the Mitchell/Reeds LFSR for the benchmarks. PCG is in fact a bit slower (but generates statically far better random numbers). goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 01ff938549.amd64 │ afa459a2f0.amd64 │ │ sec/op │ sec/op vs base │ PCG_DXSM-32 1.490n ± 0% 1.488n ± 2% ~ (p=0.408 n=20) SourceUint64-32 1.352n ± 1% 1.450n ± 3% +7.21% (p=0.000 n=20) GlobalInt64-32 2.083n ± 0% 2.067n ± 2% ~ (p=0.223 n=20) GlobalInt64Parallel-32 0.1035n ± 1% 0.1044n ± 2% ~ (p=0.010 n=20) GlobalUint64-32 2.038n ± 1% 2.085n ± 0% +2.28% (p=0.000 n=20) GlobalUint64Parallel-32 0.1006n ± 1% 0.1008n ± 1% ~ (p=0.733 n=20) Int64-32 1.687n ± 2% 1.779n ± 1% +5.48% (p=0.000 n=20) Uint64-32 1.674n ± 2% 1.854n ± 2% +10.69% (p=0.000 n=20) GlobalIntN1000-32 3.135n ± 1% 3.140n ± 3% ~ (p=0.794 n=20) IntN1000-32 2.478n ± 1% 2.496n ± 1% +0.73% (p=0.006 n=20) Int64N1000-32 2.455n ± 1% 2.510n ± 2% +2.22% (p=0.000 n=20) Int64N1e8-32 2.467n ± 2% 2.471n ± 2% ~ (p=0.050 n=20) Int64N1e9-32 2.454n ± 1% 2.488n ± 2% +1.39% (p=0.000 n=20) Int64N2e9-32 2.482n ± 1% 2.478n ± 2% ~ (p=0.066 n=20) Int64N1e18-32 3.349n ± 2% 3.088n ± 1% -7.81% (p=0.000 n=20) Int64N2e18-32 3.537n ± 1% 3.493n ± 1% -1.24% (p=0.002 n=20) Int64N4e18-32 4.917n ± 0% 5.060n ± 2% +2.91% (p=0.000 n=20) Int32N1000-32 2.386n ± 1% 2.620n ± 1% +9.76% (p=0.000 n=20) Int32N1e8-32 2.366n ± 1% 2.652n ± 0% +12.11% (p=0.000 n=20) Int32N1e9-32 2.355n ± 2% 2.644n ± 1% +12.32% (p=0.000 n=20) Int32N2e9-32 2.371n ± 1% 2.619n ± 2% +10.48% (p=0.000 n=20) Float32-32 2.245n ± 2% 2.261n ± 1% ~ (p=0.625 n=20) Float64-32 2.235n ± 1% 2.241n ± 2% ~ (p=0.393 n=20) ExpFloat64-32 3.813n ± 3% 3.716n ± 1% -2.53% (p=0.000 n=20) NormFloat64-32 3.652n ± 2% 3.718n ± 1% +1.79% (p=0.006 n=20) Perm3-32 33.12n ± 3% 34.11n ± 2% ~ (p=0.021 n=20) Perm30-32 205.1n ± 1% 200.6n ± 0% -2.17% (p=0.000 n=20) Perm30ViaShuffle-32 110.8n ± 1% 109.7n ± 1% -0.99% (p=0.002 n=20) ShuffleOverhead-32 113.0n ± 1% 107.2n ± 1% -5.09% (p=0.000 n=20) Concurrent-32 2.100n ± 0% 2.108n ± 6% ~ (p=0.103 n=20) goos: darwin goarch: arm64 pkg: math/rand/v2 │ 01ff938549.arm64 │ afa459a2f0.arm64 │ │ sec/op │ sec/op vs base │ PCG_DXSM-8 2.531n ± 0% 2.531n ± 0% ~ (p=0.763 n=20) SourceUint64-8 2.258n ± 1% 2.531n ± 0% +12.09% (p=0.000 n=20) GlobalInt64-8 2.167n ± 0% 2.177n ± 1% ~ (p=0.213 n=20) GlobalInt64Parallel-8 0.4310n ± 0% 0.4319n ± 0% ~ (p=0.027 n=20) GlobalUint64-8 2.182n ± 1% 2.185n ± 1% ~ (p=0.683 n=20) GlobalUint64Parallel-8 0.4297n ± 0% 0.4295n ± 1% ~ (p=0.941 n=20) Int64-8 2.472n ± 1% 4.104n ± 0% +66.00% (p=0.000 n=20) Uint64-8 2.449n ± 1% 4.080n ± 0% +66.60% (p=0.000 n=20) GlobalIntN1000-8 2.814n ± 2% 2.814n ± 1% ~ (p=0.972 n=20) IntN1000-8 2.998n ± 2% 4.140n ± 0% +38.09% (p=0.000 n=20) Int64N1000-8 2.949n ± 2% 4.139n ± 0% +40.35% (p=0.000 n=20) Int64N1e8-8 2.953n ± 2% 4.140n ± 0% +40.22% (p=0.000 n=20) Int64N1e9-8 2.950n ± 0% 4.139n ± 0% +40.32% (p=0.000 n=20) Int64N2e9-8 2.946n ± 2% 4.140n ± 0% +40.53% (p=0.000 n=20) Int64N1e18-8 3.779n ± 1% 5.273n ± 0% +39.52% (p=0.000 n=20) Int64N2e18-8 4.370n ± 1% 6.059n ± 0% +38.65% (p=0.000 n=20) Int64N4e18-8 6.544n ± 1% 8.803n ± 0% +34.52% (p=0.000 n=20) Int32N1000-8 2.950n ± 0% 4.131n ± 0% +40.06% (p=0.000 n=20) Int32N1e8-8 2.950n ± 2% 4.131n ± 0% +40.03% (p=0.000 n=20) Int32N1e9-8 2.951n ± 2% 4.131n ± 0% +39.99% (p=0.000 n=20) Int32N2e9-8 2.950n ± 2% 4.131n ± 0% +40.03% (p=0.000 n=20) Float32-8 3.441n ± 0% 4.110n ± 0% +19.44% (p=0.000 n=20) Float64-8 3.442n ± 0% 4.104n ± 0% +19.24% (p=0.000 n=20) ExpFloat64-8 4.481n ± 0% 5.338n ± 0% +19.11% (p=0.000 n=20) NormFloat64-8 4.725n ± 0% 5.731n ± 0% +21.28% (p=0.000 n=20) Perm3-8 26.55n ± 0% 26.62n ± 0% +0.28% (p=0.000 n=20) Perm30-8 181.9n ± 0% 194.6n ± 2% +6.98% (p=0.000 n=20) Perm30ViaShuffle-8 142.9n ± 0% 156.4n ± 0% +9.45% (p=0.000 n=20) ShuffleOverhead-8 120.8n ± 2% 125.8n ± 0% +4.10% (p=0.000 n=20) Concurrent-8 2.421n ± 6% 2.654n ± 6% +9.67% (p=0.002 n=20) goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 01ff938549.386 │ afa459a2f0.386 │ │ sec/op │ sec/op vs base │ PCG_DXSM-32 7.613n ± 1% 7.793n ± 2% +2.38% (p=0.000 n=20) SourceUint64-32 2.069n ± 0% 7.680n ± 1% +271.19% (p=0.000 n=20) GlobalInt64-32 3.456n ± 1% 3.474n ± 3% ~ (p=0.654 n=20) GlobalInt64Parallel-32 0.3252n ± 0% 0.3253n ± 0% ~ (p=0.952 n=20) GlobalUint64-32 3.573n ± 1% 3.433n ± 2% -3.92% (p=0.000 n=20) GlobalUint64Parallel-32 0.3159n ± 0% 0.3156n ± 0% ~ (p=0.223 n=20) Int64-32 2.562n ± 2% 7.707n ± 1% +200.74% (p=0.000 n=20) Uint64-32 2.592n ± 0% 7.714n ± 1% +197.65% (p=0.000 n=20) GlobalIntN1000-32 6.266n ± 2% 6.236n ± 1% ~ (p=0.039 n=20) IntN1000-32 4.724n ± 2% 10.410n ± 1% +120.39% (p=0.000 n=20) Int64N1000-32 5.490n ± 2% 10.975n ± 2% +99.89% (p=0.000 n=20) Int64N1e8-32 5.513n ± 2% 10.980n ± 1% +99.15% (p=0.000 n=20) Int64N1e9-32 5.476n ± 1% 10.950n ± 0% +99.96% (p=0.000 n=20) Int64N2e9-32 5.501n ± 2% 11.110n ± 1% +101.96% (p=0.000 n=20) Int64N1e18-32 9.043n ± 2% 15.180n ± 2% +67.86% (p=0.000 n=20) Int64N2e18-32 9.601n ± 2% 15.610n ± 1% +62.60% (p=0.000 n=20) Int64N4e18-32 12.00n ± 1% 19.23n ± 2% +60.14% (p=0.000 n=20) Int32N1000-32 4.829n ± 2% 10.345n ± 1% +114.25% (p=0.000 n=20) Int32N1e8-32 4.825n ± 2% 10.330n ± 1% +114.09% (p=0.000 n=20) Int32N1e9-32 4.830n ± 2% 10.350n ± 1% +114.26% (p=0.000 n=20) Int32N2e9-32 4.750n ± 2% 10.345n ± 1% +117.81% (p=0.000 n=20) Float32-32 10.89n ± 4% 13.57n ± 1% +24.61% (p=0.000 n=20) Float64-32 19.60n ± 4% 22.95n ± 4% +17.12% (p=0.000 n=20) ExpFloat64-32 12.96n ± 3% 15.23n ± 2% +17.47% (p=0.000 n=20) NormFloat64-32 7.516n ± 1% 13.780n ± 1% +83.34% (p=0.000 n=20) Perm3-32 36.78n ± 2% 46.62n ± 2% +26.72% (p=0.000 n=20) Perm30-32 238.9n ± 2% 400.7n ± 1% +67.73% (p=0.000 n=20) Perm30ViaShuffle-32 189.7n ± 2% 350.5n ± 1% +84.79% (p=0.000 n=20) ShuffleOverhead-32 159.8n ± 1% 326.0n ± 2% +104.01% (p=0.000 n=20) Concurrent-32 3.286n ± 1% 3.290n ± 0% ~ (p=0.743 n=20) On the other hand, compared to the original "update benchmarks" CL, the cleanups we've made more than compensate for PCG being a bit slower than LFSR, at least on 64-bit x86. ARM64 (Apple M1) is a bit slower: perhaps the 64x64→128 multiply is slower there for some reason. 386 is noticeably slower, but it's also a non-SSA backend. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 220860f76f.amd64 │ afa459a2f0.amd64 │ │ sec/op │ sec/op vs base │ SourceUint64-32 1.555n ± 1% 1.450n ± 3% -6.78% (p=0.000 n=20) GlobalInt64-32 2.071n ± 1% 2.067n ± 2% ~ (p=0.673 n=20) GlobalInt63Parallel-32 0.1023n ± 1% GlobalInt64Parallel-32 0.1044n ± 2% GlobalUint64-32 5.193n ± 1% 2.085n ± 0% -59.86% (p=0.000 n=20) GlobalUint64Parallel-32 0.2341n ± 0% 0.1008n ± 1% -56.93% (p=0.000 n=20) Int64-32 2.056n ± 2% 1.779n ± 1% -13.47% (p=0.000 n=20) Uint64-32 2.077n ± 2% 1.854n ± 2% -10.74% (p=0.000 n=20) GlobalIntN1000-32 4.077n ± 2% 3.140n ± 3% -22.98% (p=0.000 n=20) IntN1000-32 3.476n ± 2% 2.496n ± 1% -28.19% (p=0.000 n=20) Int64N1000-32 3.059n ± 1% 2.510n ± 2% -17.96% (p=0.000 n=20) Int64N1e8-32 2.942n ± 1% 2.471n ± 2% -15.98% (p=0.000 n=20) Int64N1e9-32 2.932n ± 1% 2.488n ± 2% -15.14% (p=0.000 n=20) Int64N2e9-32 2.925n ± 1% 2.478n ± 2% -15.30% (p=0.000 n=20) Int64N1e18-32 3.116n ± 1% 3.088n ± 1% ~ (p=0.013 n=20) Int64N2e18-32 4.067n ± 1% 3.493n ± 1% -14.11% (p=0.000 n=20) Int64N4e18-32 4.054n ± 1% 5.060n ± 2% +24.80% (p=0.000 n=20) Int32N1000-32 2.951n ± 1% 2.620n ± 1% -11.22% (p=0.000 n=20) Int32N1e8-32 3.102n ± 1% 2.652n ± 0% -14.50% (p=0.000 n=20) Int32N1e9-32 3.535n ± 1% 2.644n ± 1% -25.20% (p=0.000 n=20) Int32N2e9-32 3.514n ± 1% 2.619n ± 2% -25.47% (p=0.000 n=20) Float32-32 2.760n ± 1% 2.261n ± 1% -18.06% (p=0.000 n=20) Float64-32 2.284n ± 1% 2.241n ± 2% ~ (p=0.016 n=20) ExpFloat64-32 3.757n ± 1% 3.716n ± 1% ~ (p=0.034 n=20) NormFloat64-32 3.837n ± 1% 3.718n ± 1% -3.09% (p=0.000 n=20) Perm3-32 35.23n ± 2% 34.11n ± 2% -3.19% (p=0.000 n=20) Perm30-32 208.8n ± 1% 200.6n ± 0% -3.93% (p=0.000 n=20) Perm30ViaShuffle-32 111.7n ± 1% 109.7n ± 1% -1.84% (p=0.000 n=20) ShuffleOverhead-32 101.1n ± 1% 107.2n ± 1% +6.03% (p=0.000 n=20) Concurrent-32 2.108n ± 7% 2.108n ± 6% ~ (p=0.644 n=20) PCG_DXSM-32 1.488n ± 2% goos: darwin goarch: arm64 pkg: math/rand/v2 cpu: Apple M1 │ 220860f76f.arm64 │ afa459a2f0.arm64 │ │ sec/op │ sec/op vs base │ SourceUint64-8 2.316n ± 1% 2.531n ± 0% +9.33% (p=0.000 n=20) GlobalInt64-8 2.183n ± 1% 2.177n ± 1% ~ (p=0.533 n=20) GlobalInt63Parallel-8 0.4331n ± 0% GlobalInt64Parallel-8 0.4319n ± 0% GlobalUint64-8 4.377n ± 2% 2.185n ± 1% -50.07% (p=0.000 n=20) GlobalUint64Parallel-8 0.9237n ± 0% 0.4295n ± 1% -53.50% (p=0.000 n=20) Int64-8 2.538n ± 1% 4.104n ± 0% +61.68% (p=0.000 n=20) Uint64-8 2.604n ± 1% 4.080n ± 0% +56.68% (p=0.000 n=20) GlobalIntN1000-8 3.857n ± 2% 2.814n ± 1% -27.04% (p=0.000 n=20) IntN1000-8 3.822n ± 2% 4.140n ± 0% +8.32% (p=0.000 n=20) Int64N1000-8 3.318n ± 0% 4.139n ± 0% +24.74% (p=0.000 n=20) Int64N1e8-8 3.349n ± 1% 4.140n ± 0% +23.64% (p=0.000 n=20) Int64N1e9-8 3.317n ± 2% 4.139n ± 0% +24.80% (p=0.000 n=20) Int64N2e9-8 3.317n ± 2% 4.140n ± 0% +24.81% (p=0.000 n=20) Int64N1e18-8 3.542n ± 1% 5.273n ± 0% +48.85% (p=0.000 n=20) Int64N2e18-8 5.087n ± 0% 6.059n ± 0% +19.12% (p=0.000 n=20) Int64N4e18-8 5.084n ± 0% 8.803n ± 0% +73.16% (p=0.000 n=20) Int32N1000-8 3.208n ± 2% 4.131n ± 0% +28.79% (p=0.000 n=20) Int32N1e8-8 3.610n ± 1% 4.131n ± 0% +14.43% (p=0.000 n=20) Int32N1e9-8 4.235n ± 0% 4.131n ± 0% -2.44% (p=0.000 n=20) Int32N2e9-8 4.229n ± 1% 4.131n ± 0% -2.33% (p=0.000 n=20) Float32-8 3.468n ± 0% 4.110n ± 0% +18.50% (p=0.000 n=20) Float64-8 3.447n ± 0% 4.104n ± 0% +19.05% (p=0.000 n=20) ExpFloat64-8 4.567n ± 0% 5.338n ± 0% +16.86% (p=0.000 n=20) NormFloat64-8 4.821n ± 0% 5.731n ± 0% +18.89% (p=0.000 n=20) Perm3-8 28.89n ± 0% 26.62n ± 0% -7.84% (p=0.000 n=20) Perm30-8 175.7n ± 0% 194.6n ± 2% +10.76% (p=0.000 n=20) Perm30ViaShuffle-8 153.5n ± 0% 156.4n ± 0% +1.86% (p=0.000 n=20) ShuffleOverhead-8 119.8n ± 1% 125.8n ± 0% +4.97% (p=0.000 n=20) Concurrent-8 2.433n ± 3% 2.654n ± 6% +9.13% (p=0.001 n=20) PCG_DXSM-8 2.531n ± 0% goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 220860f76f.386 │ afa459a2f0.386 │ │ sec/op │ sec/op vs base │ SourceUint64-32 2.370n ± 1% 7.680n ± 1% +224.05% (p=0.000 n=20) GlobalInt64-32 3.569n ± 1% 3.474n ± 3% -2.66% (p=0.001 n=20) GlobalInt63Parallel-32 0.3221n ± 1% GlobalInt64Parallel-32 0.3253n ± 0% GlobalUint64-32 8.797n ± 10% 3.433n ± 2% -60.98% (p=0.000 n=20) GlobalUint64Parallel-32 0.6351n ± 0% 0.3156n ± 0% -50.31% (p=0.000 n=20) Int64-32 2.612n ± 2% 7.707n ± 1% +195.04% (p=0.000 n=20) Uint64-32 3.350n ± 1% 7.714n ± 1% +130.25% (p=0.000 n=20) GlobalIntN1000-32 5.892n ± 1% 6.236n ± 1% +5.82% (p=0.000 n=20) IntN1000-32 4.546n ± 1% 10.410n ± 1% +128.97% (p=0.000 n=20) Int64N1000-32 14.59n ± 1% 10.97n ± 2% -24.75% (p=0.000 n=20) Int64N1e8-32 14.76n ± 2% 10.98n ± 1% -25.58% (p=0.000 n=20) Int64N1e9-32 16.57n ± 1% 10.95n ± 0% -33.90% (p=0.000 n=20) Int64N2e9-32 14.54n ± 1% 11.11n ± 1% -23.62% (p=0.000 n=20) Int64N1e18-32 16.14n ± 1% 15.18n ± 2% -5.95% (p=0.000 n=20) Int64N2e18-32 18.10n ± 1% 15.61n ± 1% -13.73% (p=0.000 n=20) Int64N4e18-32 18.65n ± 1% 19.23n ± 2% +3.08% (p=0.000 n=20) Int32N1000-32 3.560n ± 1% 10.345n ± 1% +190.55% (p=0.000 n=20) Int32N1e8-32 3.770n ± 2% 10.330n ± 1% +174.01% (p=0.000 n=20) Int32N1e9-32 4.098n ± 0% 10.350n ± 1% +152.53% (p=0.000 n=20) Int32N2e9-32 4.179n ± 1% 10.345n ± 1% +147.52% (p=0.000 n=20) Float32-32 21.18n ± 4% 13.57n ± 1% -35.93% (p=0.000 n=20) Float64-32 20.60n ± 2% 22.95n ± 4% +11.41% (p=0.000 n=20) ExpFloat64-32 13.07n ± 0% 15.23n ± 2% +16.48% (p=0.000 n=20) NormFloat64-32 7.738n ± 2% 13.780n ± 1% +78.08% (p=0.000 n=20) Perm3-32 36.73n ± 1% 46.62n ± 2% +26.91% (p=0.000 n=20) Perm30-32 211.9n ± 1% 400.7n ± 1% +89.05% (p=0.000 n=20) Perm30ViaShuffle-32 165.2n ± 1% 350.5n ± 1% +112.20% (p=0.000 n=20) ShuffleOverhead-32 133.9n ± 1% 326.0n ± 2% +143.37% (p=0.000 n=20) Concurrent-32 3.287n ± 2% 3.290n ± 0% ~ (p=0.365 n=20) PCG_DXSM-32 7.793n ± 2% For #61716. Change-Id: I4e9c0525b5f84a2ac46f23da9e365495e2d05777 Reviewed-on: https://go-review.googlesource.com/c/go/+/502506 Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30	math/rand/v2: add PCG-DXSM	Russ Cox
	For the original math/rand, we ported Plan 9's random number generator, which was a refinement by Ken Thompson of an algorithm by Don Mitchell and Jim Reeds, which Mitchell in turn recalls as having been derived from an algorithm by Marsaglia. At its core, it is an additive lagged Fibonacci generator (ALFG). Whatever the details of the history, this generator is nowhere near the current state of the art for simple, pseudo-random generators. This CL adds an implementation of Melissa O'Neill's PCG, specifically the variant PCG-DXSM, which she defined after writing the PCG paper and which is now the default in Numpy. The update is slightly slower (a few multiplies and adds, instead of a few adds), but the state is dramatically smaller (2 words instead of 607). The statistical output properties are better too. A followup CL will delete the old generator. PCG is the only change here, so no benchmarks should be affected. Including them anyway as further evidence for caution. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 8993506f2f.amd64 │ 01ff938549.amd64 │ │ sec/op │ sec/op vs base │ SourceUint64-32 1.325n ± 1% 1.352n ± 1% +2.00% (p=0.000 n=20) GlobalInt64-32 2.240n ± 1% 2.083n ± 0% -7.03% (p=0.000 n=20) GlobalInt64Parallel-32 0.1041n ± 1% 0.1035n ± 1% ~ (p=0.064 n=20) GlobalUint64-32 2.072n ± 3% 2.038n ± 1% ~ (p=0.089 n=20) GlobalUint64Parallel-32 0.1008n ± 1% 0.1006n ± 1% ~ (p=0.804 n=20) Int64-32 1.716n ± 1% 1.687n ± 2% ~ (p=0.045 n=20) Uint64-32 1.665n ± 1% 1.674n ± 2% ~ (p=0.878 n=20) GlobalIntN1000-32 3.335n ± 1% 3.135n ± 1% -6.00% (p=0.000 n=20) IntN1000-32 2.484n ± 1% 2.478n ± 1% ~ (p=0.085 n=20) Int64N1000-32 2.502n ± 2% 2.455n ± 1% -1.88% (p=0.002 n=20) Int64N1e8-32 2.484n ± 2% 2.467n ± 2% ~ (p=0.048 n=20) Int64N1e9-32 2.502n ± 0% 2.454n ± 1% -1.92% (p=0.000 n=20) Int64N2e9-32 2.502n ± 0% 2.482n ± 1% -0.76% (p=0.000 n=20) Int64N1e18-32 3.201n ± 1% 3.349n ± 2% +4.62% (p=0.000 n=20) Int64N2e18-32 3.504n ± 1% 3.537n ± 1% ~ (p=0.185 n=20) Int64N4e18-32 4.873n ± 1% 4.917n ± 0% +0.90% (p=0.000 n=20) Int32N1000-32 2.639n ± 1% 2.386n ± 1% -9.57% (p=0.000 n=20) Int32N1e8-32 2.686n ± 2% 2.366n ± 1% -11.91% (p=0.000 n=20) Int32N1e9-32 2.636n ± 1% 2.355n ± 2% -10.70% (p=0.000 n=20) Int32N2e9-32 2.660n ± 1% 2.371n ± 1% -10.88% (p=0.000 n=20) Float32-32 2.261n ± 1% 2.245n ± 2% ~ (p=0.752 n=20) Float64-32 2.280n ± 1% 2.235n ± 1% -1.97% (p=0.007 n=20) ExpFloat64-32 3.891n ± 1% 3.813n ± 3% ~ (p=0.087 n=20) NormFloat64-32 3.711n ± 1% 3.652n ± 2% ~ (p=0.021 n=20) Perm3-32 32.60n ± 2% 33.12n ± 3% ~ (p=0.107 n=20) Perm30-32 204.2n ± 0% 205.1n ± 1% ~ (p=0.358 n=20) Perm30ViaShuffle-32 121.7n ± 2% 110.8n ± 1% -8.96% (p=0.000 n=20) ShuffleOverhead-32 106.2n ± 2% 113.0n ± 1% +6.36% (p=0.000 n=20) Concurrent-32 2.190n ± 5% 2.100n ± 0% -4.13% (p=0.001 n=20) PCG_DXSM-32 1.490n ± 0% goos: darwin goarch: arm64 pkg: math/rand/v2 cpu: Apple M1 │ 8993506f2f.arm64 │ 01ff938549.arm64 │ │ sec/op │ sec/op vs base │ SourceUint64-8 2.271n ± 0% 2.258n ± 1% ~ (p=0.167 n=20) GlobalInt64-8 2.161n ± 1% 2.167n ± 0% ~ (p=0.693 n=20) GlobalInt64Parallel-8 0.4303n ± 0% 0.4310n ± 0% ~ (p=0.051 n=20) GlobalUint64-8 2.164n ± 1% 2.182n ± 1% ~ (p=0.042 n=20) GlobalUint64Parallel-8 0.4287n ± 0% 0.4297n ± 0% ~ (p=0.082 n=20) Int64-8 2.478n ± 1% 2.472n ± 1% ~ (p=0.151 n=20) Uint64-8 2.460n ± 1% 2.449n ± 1% ~ (p=0.013 n=20) GlobalIntN1000-8 2.814n ± 2% 2.814n ± 2% ~ (p=0.821 n=20) IntN1000-8 3.003n ± 2% 2.998n ± 2% ~ (p=0.024 n=20) Int64N1000-8 2.954n ± 0% 2.949n ± 2% ~ (p=0.192 n=20) Int64N1e8-8 2.956n ± 0% 2.953n ± 2% ~ (p=0.109 n=20) Int64N1e9-8 3.325n ± 0% 2.950n ± 0% -11.26% (p=0.000 n=20) Int64N2e9-8 2.956n ± 2% 2.946n ± 2% ~ (p=0.027 n=20) Int64N1e18-8 3.780n ± 1% 3.779n ± 1% ~ (p=0.815 n=20) Int64N2e18-8 4.385n ± 0% 4.370n ± 1% ~ (p=0.402 n=20) Int64N4e18-8 6.527n ± 0% 6.544n ± 1% ~ (p=0.140 n=20) Int32N1000-8 2.964n ± 1% 2.950n ± 0% -0.47% (p=0.002 n=20) Int32N1e8-8 2.964n ± 1% 2.950n ± 2% ~ (p=0.013 n=20) Int32N1e9-8 2.963n ± 2% 2.951n ± 2% ~ (p=0.062 n=20) Int32N2e9-8 2.961n ± 2% 2.950n ± 2% -0.37% (p=0.002 n=20) Float32-8 3.442n ± 0% 3.441n ± 0% ~ (p=0.211 n=20) Float64-8 3.442n ± 0% 3.442n ± 0% ~ (p=0.067 n=20) ExpFloat64-8 4.472n ± 0% 4.481n ± 0% +0.20% (p=0.000 n=20) NormFloat64-8 4.734n ± 0% 4.725n ± 0% -0.19% (p=0.003 n=20) Perm3-8 26.55n ± 0% 26.55n ± 0% ~ (p=0.833 n=20) Perm30-8 181.9n ± 0% 181.9n ± 0% -0.03% (p=0.004 n=20) Perm30ViaShuffle-8 143.1n ± 0% 142.9n ± 0% ~ (p=0.204 n=20) ShuffleOverhead-8 120.6n ± 1% 120.8n ± 2% ~ (p=0.102 n=20) Concurrent-8 2.357n ± 2% 2.421n ± 6% ~ (p=0.016 n=20) PCG_DXSM-8 2.531n ± 0% goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 8993506f2f.386 │ 01ff938549.386 │ │ sec/op │ sec/op vs base │ SourceUint64-32 2.102n ± 2% 2.069n ± 0% ~ (p=0.021 n=20) GlobalInt64-32 3.542n ± 2% 3.456n ± 1% -2.44% (p=0.001 n=20) GlobalInt64Parallel-32 0.3202n ± 0% 0.3252n ± 0% +1.56% (p=0.000 n=20) GlobalUint64-32 3.507n ± 1% 3.573n ± 1% +1.87% (p=0.000 n=20) GlobalUint64Parallel-32 0.3170n ± 1% 0.3159n ± 0% ~ (p=0.167 n=20) Int64-32 2.516n ± 1% 2.562n ± 2% ~ (p=0.016 n=20) Uint64-32 2.544n ± 1% 2.592n ± 0% +1.85% (p=0.000 n=20) GlobalIntN1000-32 6.237n ± 1% 6.266n ± 2% ~ (p=0.268 n=20) IntN1000-32 4.670n ± 2% 4.724n ± 2% ~ (p=0.644 n=20) Int64N1000-32 5.412n ± 1% 5.490n ± 2% ~ (p=0.159 n=20) Int64N1e8-32 5.414n ± 2% 5.513n ± 2% ~ (p=0.129 n=20) Int64N1e9-32 5.473n ± 1% 5.476n ± 1% ~ (p=0.723 n=20) Int64N2e9-32 5.487n ± 1% 5.501n ± 2% ~ (p=0.481 n=20) Int64N1e18-32 8.901n ± 2% 9.043n ± 2% ~ (p=0.330 n=20) Int64N2e18-32 9.521n ± 1% 9.601n ± 2% ~ (p=0.703 n=20) Int64N4e18-32 11.92n ± 1% 12.00n ± 1% ~ (p=0.489 n=20) Int32N1000-32 4.785n ± 1% 4.829n ± 2% ~ (p=0.402 n=20) Int32N1e8-32 4.748n ± 1% 4.825n ± 2% ~ (p=0.218 n=20) Int32N1e9-32 4.810n ± 1% 4.830n ± 2% ~ (p=0.794 n=20) Int32N2e9-32 4.812n ± 1% 4.750n ± 2% ~ (p=0.057 n=20) Float32-32 10.48n ± 4% 10.89n ± 4% ~ (p=0.162 n=20) Float64-32 19.79n ± 3% 19.60n ± 4% ~ (p=0.668 n=20) ExpFloat64-32 12.91n ± 3% 12.96n ± 3% ~ (p=1.000 n=20) NormFloat64-32 7.462n ± 1% 7.516n ± 1% ~ (p=0.051 n=20) Perm3-32 35.98n ± 2% 36.78n ± 2% ~ (p=0.033 n=20) Perm30-32 241.5n ± 1% 238.9n ± 2% ~ (p=0.126 n=20) Perm30ViaShuffle-32 187.3n ± 2% 189.7n ± 2% ~ (p=0.387 n=20) ShuffleOverhead-32 160.2n ± 1% 159.8n ± 1% ~ (p=0.256 n=20) Concurrent-32 3.308n ± 3% 3.286n ± 1% ~ (p=0.038 n=20) PCG_DXSM-32 7.613n ± 1% For #61716. Change-Id: Icb274ca1f782504d658305a40159b4ae6a2f3f1d Reviewed-on: https://go-review.googlesource.com/c/go/+/502505 Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Rob Pike <r@golang.org>
2023-10-30	math/rand/v2: simplify Perm	Russ Cox
	The compiler says Perm is being inlined into BenchmarkPerm, and yet BenchmarkPerm30ViaShuffle, which you'd think is the same code, still runs significantly faster. The benchmarks are mystifying but this is clearly still a step in the right direction, since BenchmarkPerm30ViaShuffle is still the fastest and we avoid having two copies of that logic. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ e1bbe739fb.amd64 │ 8993506f2f.amd64 │ │ sec/op │ sec/op vs base │ SourceUint64-32 1.316n ± 2% 1.325n ± 1% ~ (p=0.208 n=20) GlobalInt64-32 2.048n ± 1% 2.240n ± 1% +9.38% (p=0.000 n=20) GlobalInt64Parallel-32 0.1037n ± 1% 0.1041n ± 1% ~ (p=0.774 n=20) GlobalUint64-32 2.039n ± 2% 2.072n ± 3% ~ (p=0.115 n=20) GlobalUint64Parallel-32 0.1013n ± 1% 0.1008n ± 1% ~ (p=0.417 n=20) Int64-32 1.692n ± 2% 1.716n ± 1% ~ (p=0.122 n=20) Uint64-32 1.643n ± 2% 1.665n ± 1% ~ (p=0.062 n=20) GlobalIntN1000-32 3.287n ± 1% 3.335n ± 1% ~ (p=0.147 n=20) IntN1000-32 2.678n ± 2% 2.484n ± 1% -7.24% (p=0.000 n=20) Int64N1000-32 2.684n ± 2% 2.502n ± 2% -6.80% (p=0.000 n=20) Int64N1e8-32 2.663n ± 2% 2.484n ± 2% -6.76% (p=0.000 n=20) Int64N1e9-32 2.633n ± 1% 2.502n ± 0% -4.98% (p=0.000 n=20) Int64N2e9-32 2.657n ± 1% 2.502n ± 0% -5.87% (p=0.000 n=20) Int64N1e18-32 3.125n ± 2% 3.201n ± 1% +2.43% (p=0.000 n=20) Int64N2e18-32 3.476n ± 1% 3.504n ± 1% +0.83% (p=0.009 n=20) Int64N4e18-32 4.795n ± 1% 4.873n ± 1% ~ (p=0.106 n=20) Int32N1000-32 2.485n ± 2% 2.639n ± 1% +6.20% (p=0.000 n=20) Int32N1e8-32 2.457n ± 1% 2.686n ± 2% +9.34% (p=0.000 n=20) Int32N1e9-32 2.452n ± 1% 2.636n ± 1% +7.52% (p=0.000 n=20) Int32N2e9-32 2.453n ± 1% 2.660n ± 1% +8.44% (p=0.000 n=20) Float32-32 2.254n ± 1% 2.261n ± 1% ~ (p=0.888 n=20) Float64-32 2.262n ± 1% 2.280n ± 1% ~ (p=0.040 n=20) ExpFloat64-32 3.777n ± 2% 3.891n ± 1% +3.03% (p=0.000 n=20) NormFloat64-32 3.606n ± 1% 3.711n ± 1% +2.91% (p=0.000 n=20) Perm3-32 33.12n ± 2% 32.60n ± 2% ~ (p=0.045 n=20) Perm30-32 176.1n ± 1% 204.2n ± 0% +15.96% (p=0.000 n=20) Perm30ViaShuffle-32 109.3n ± 1% 121.7n ± 2% +11.30% (p=0.000 n=20) ShuffleOverhead-32 112.5n ± 1% 106.2n ± 2% -5.56% (p=0.000 n=20) Concurrent-32 2.099n ± 0% 2.190n ± 5% +4.36% (p=0.001 n=20) goos: darwin goarch: arm64 pkg: math/rand/v2 cpu: Apple M1 │ e1bbe739fb.arm64 │ 8993506f2f.arm64 │ │ sec/op │ sec/op vs base │ SourceUint64-8 2.290n ± 1% 2.271n ± 0% ~ (p=0.015 n=20) GlobalInt64-8 2.180n ± 1% 2.161n ± 1% ~ (p=0.180 n=20) GlobalInt64Parallel-8 0.4294n ± 0% 0.4303n ± 0% +0.19% (p=0.001 n=20) GlobalUint64-8 2.170n ± 1% 2.164n ± 1% ~ (p=0.673 n=20) GlobalUint64Parallel-8 0.4283n ± 0% 0.4287n ± 0% ~ (p=0.128 n=20) Int64-8 2.481n ± 1% 2.478n ± 1% ~ (p=0.867 n=20) Uint64-8 2.464n ± 1% 2.460n ± 1% ~ (p=0.763 n=20) GlobalIntN1000-8 2.814n ± 0% 2.814n ± 2% ~ (p=0.969 n=20) IntN1000-8 2.934n ± 2% 3.003n ± 2% +2.35% (p=0.000 n=20) Int64N1000-8 2.957n ± 1% 2.954n ± 0% ~ (p=0.285 n=20) Int64N1e8-8 2.935n ± 2% 2.956n ± 0% +0.73% (p=0.002 n=20) Int64N1e9-8 2.935n ± 2% 3.325n ± 0% +13.29% (p=0.000 n=20) Int64N2e9-8 2.933n ± 4% 2.956n ± 2% ~ (p=0.163 n=20) Int64N1e18-8 3.781n ± 1% 3.780n ± 1% ~ (p=0.805 n=20) Int64N2e18-8 4.362n ± 0% 4.385n ± 0% ~ (p=0.077 n=20) Int64N4e18-8 6.576n ± 1% 6.527n ± 0% ~ (p=0.024 n=20) Int32N1000-8 2.942n ± 2% 2.964n ± 1% ~ (p=0.073 n=20) Int32N1e8-8 2.941n ± 1% 2.964n ± 1% ~ (p=0.058 n=20) Int32N1e9-8 2.938n ± 2% 2.963n ± 2% +0.87% (p=0.003 n=20) Int32N2e9-8 2.982n ± 2% 2.961n ± 2% ~ (p=0.056 n=20) Float32-8 3.441n ± 0% 3.442n ± 0% ~ (p=0.030 n=20) Float64-8 3.441n ± 0% 3.442n ± 0% +0.03% (p=0.001 n=20) ExpFloat64-8 4.472n ± 0% 4.472n ± 0% ~ (p=0.877 n=20) NormFloat64-8 4.716n ± 0% 4.734n ± 0% +0.38% (p=0.000 n=20) Perm3-8 26.66n ± 0% 26.55n ± 0% -0.39% (p=0.000 n=20) Perm30-8 143.3n ± 0% 181.9n ± 0% +26.97% (p=0.000 n=20) Perm30ViaShuffle-8 142.9n ± 0% 143.1n ± 0% ~ (p=0.669 n=20) ShuffleOverhead-8 121.1n ± 1% 120.6n ± 1% -0.41% (p=0.004 n=20) Concurrent-8 2.379n ± 2% 2.357n ± 2% ~ (p=0.337 n=20) goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ e1bbe739fb.386 │ 8993506f2f.386 │ │ sec/op │ sec/op vs base │ SourceUint64-32 2.087n ± 1% 2.102n ± 2% ~ (p=0.507 n=20) GlobalInt64-32 3.538n ± 2% 3.542n ± 2% ~ (p=0.425 n=20) GlobalInt64Parallel-32 0.3207n ± 1% 0.3202n ± 0% ~ (p=0.963 n=20) GlobalUint64-32 3.543n ± 1% 3.507n ± 1% ~ (p=0.034 n=20) GlobalUint64Parallel-32 0.3170n ± 0% 0.3170n ± 1% ~ (p=0.920 n=20) Int64-32 2.548n ± 1% 2.516n ± 1% ~ (p=0.139 n=20) Uint64-32 2.565n ± 2% 2.544n ± 1% ~ (p=0.394 n=20) GlobalIntN1000-32 6.300n ± 1% 6.237n ± 1% ~ (p=0.029 n=20) IntN1000-32 4.750n ± 0% 4.670n ± 2% ~ (p=0.034 n=20) Int64N1000-32 5.515n ± 2% 5.412n ± 1% -1.86% (p=0.009 n=20) Int64N1e8-32 5.527n ± 0% 5.414n ± 2% -2.05% (p=0.002 n=20) Int64N1e9-32 5.531n ± 2% 5.473n ± 1% ~ (p=0.047 n=20) Int64N2e9-32 5.514n ± 2% 5.487n ± 1% ~ (p=0.298 n=20) Int64N1e18-32 9.059n ± 1% 8.901n ± 2% ~ (p=0.037 n=20) Int64N2e18-32 9.594n ± 1% 9.521n ± 1% ~ (p=0.051 n=20) Int64N4e18-32 12.05n ± 2% 11.92n ± 1% ~ (p=0.357 n=20) Int32N1000-32 4.840n ± 2% 4.785n ± 1% ~ (p=0.189 n=20) Int32N1e8-32 4.832n ± 2% 4.748n ± 1% ~ (p=0.042 n=20) Int32N1e9-32 4.815n ± 2% 4.810n ± 1% ~ (p=0.878 n=20) Int32N2e9-32 4.813n ± 1% 4.812n ± 1% ~ (p=0.542 n=20) Float32-32 10.90n ± 2% 10.48n ± 4% -3.85% (p=0.007 n=20) Float64-32 20.32n ± 4% 19.79n ± 3% ~ (p=0.553 n=20) ExpFloat64-32 12.95n ± 3% 12.91n ± 3% ~ (p=0.909 n=20) NormFloat64-32 7.570n ± 1% 7.462n ± 1% -1.44% (p=0.004 n=20) Perm3-32 37.80n ± 2% 35.98n ± 2% -4.79% (p=0.000 n=20) Perm30-32 214.0n ± 1% 241.5n ± 1% +12.85% (p=0.000 n=20) Perm30ViaShuffle-32 188.7n ± 2% 187.3n ± 2% ~ (p=0.029 n=20) ShuffleOverhead-32 160.8n ± 1% 160.2n ± 1% ~ (p=0.180 n=20) Concurrent-32 3.288n ± 0% 3.308n ± 3% ~ (p=0.037 n=20) For #61716. Change-Id: I342b611456c3569520d3c91c849d29eba325d87e Reviewed-on: https://go-review.googlesource.com/c/go/+/502504 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Rob Pike <r@golang.org>
2023-10-30	math/rand/v2: remove bias in ExpFloat64 and NormFloat64	Branden Brown
	The original implementation of the ziggurat algorithm was designed for 32-bit random integer inputs. This necessitated reusing some low-order bits for the slice selection and the random coordinate, which introduces statistical bias. The result is that PractRand consistently fails the math/rand normal and exponential sequences (transformed to uniform) within 2 GB of variates. This change adjusts the ziggurat procedures to use 63-bit random inputs, so that there is no need to reuse bits between the slice and coordinate. This is sufficient for the normal sequence to survive to 256 GB of PractRand testing. An alternative technique is to recalculate the ziggurats to use 1024 rather than 128 or 256 slices to make full use of 64-bit inputs. This improves the survival of the normal sequence to far beyond 256 GB and additionally provides a 6% performance improvement due to the improved rejection procedure efficiency. However, doing so increases the total size of the ziggurat tables from 4.5 kB to 48 kB. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 2703446c2e.amd64 │ e1bbe739fb.amd64 │ │ sec/op │ sec/op vs base │ SourceUint64-32 1.337n ± 1% 1.316n ± 2% ~ (p=0.024 n=20) GlobalInt64-32 2.225n ± 2% 2.048n ± 1% -7.93% (p=0.000 n=20) GlobalInt64Parallel-32 0.1043n ± 2% 0.1037n ± 1% ~ (p=0.587 n=20) GlobalUint64-32 2.058n ± 1% 2.039n ± 2% ~ (p=0.030 n=20) GlobalUint64Parallel-32 0.1009n ± 1% 0.1013n ± 1% ~ (p=0.984 n=20) Int64-32 1.719n ± 2% 1.692n ± 2% ~ (p=0.085 n=20) Uint64-32 1.669n ± 1% 1.643n ± 2% ~ (p=0.049 n=20) GlobalIntN1000-32 3.321n ± 2% 3.287n ± 1% ~ (p=0.298 n=20) IntN1000-32 2.479n ± 1% 2.678n ± 2% +8.01% (p=0.000 n=20) Int64N1000-32 2.477n ± 1% 2.684n ± 2% +8.38% (p=0.000 n=20) Int64N1e8-32 2.490n ± 1% 2.663n ± 2% +6.99% (p=0.000 n=20) Int64N1e9-32 2.458n ± 1% 2.633n ± 1% +7.12% (p=0.000 n=20) Int64N2e9-32 2.486n ± 2% 2.657n ± 1% +6.90% (p=0.000 n=20) Int64N1e18-32 3.215n ± 2% 3.125n ± 2% -2.78% (p=0.000 n=20) Int64N2e18-32 3.588n ± 2% 3.476n ± 1% -3.15% (p=0.000 n=20) Int64N4e18-32 4.938n ± 2% 4.795n ± 1% -2.91% (p=0.000 n=20) Int32N1000-32 2.673n ± 2% 2.485n ± 2% -7.02% (p=0.000 n=20) Int32N1e8-32 2.631n ± 2% 2.457n ± 1% -6.63% (p=0.000 n=20) Int32N1e9-32 2.628n ± 2% 2.452n ± 1% -6.70% (p=0.000 n=20) Int32N2e9-32 2.684n ± 2% 2.453n ± 1% -8.61% (p=0.000 n=20) Float32-32 2.240n ± 2% 2.254n ± 1% ~ (p=0.878 n=20) Float64-32 2.253n ± 1% 2.262n ± 1% ~ (p=0.963 n=20) ExpFloat64-32 3.677n ± 1% 3.777n ± 2% +2.71% (p=0.004 n=20) NormFloat64-32 3.761n ± 1% 3.606n ± 1% -4.15% (p=0.000 n=20) Perm3-32 33.55n ± 2% 33.12n ± 2% ~ (p=0.402 n=20) Perm30-32 173.2n ± 1% 176.1n ± 1% +1.67% (p=0.000 n=20) Perm30ViaShuffle-32 115.9n ± 1% 109.3n ± 1% -5.69% (p=0.000 n=20) ShuffleOverhead-32 101.9n ± 1% 112.5n ± 1% +10.35% (p=0.000 n=20) Concurrent-32 2.107n ± 6% 2.099n ± 0% ~ (p=0.051 n=20) goos: darwin goarch: arm64 pkg: math/rand/v2 cpu: Apple M1 │ 2703446c2e.arm64 │ e1bbe739fb.arm64 │ │ sec/op │ sec/op vs base │ SourceUint64-8 2.275n ± 0% 2.290n ± 1% ~ (p=0.044 n=20) GlobalInt64-8 2.154n ± 1% 2.180n ± 1% ~ (p=0.068 n=20) GlobalInt64Parallel-8 0.4298n ± 0% 0.4294n ± 0% ~ (p=0.079 n=20) GlobalUint64-8 2.160n ± 1% 2.170n ± 1% ~ (p=0.129 n=20) GlobalUint64Parallel-8 0.4286n ± 0% 0.4283n ± 0% ~ (p=0.350 n=20) Int64-8 2.491n ± 1% 2.481n ± 1% ~ (p=0.330 n=20) Uint64-8 2.458n ± 0% 2.464n ± 1% ~ (p=0.351 n=20) GlobalIntN1000-8 2.814n ± 2% 2.814n ± 0% ~ (p=0.325 n=20) IntN1000-8 2.933n ± 0% 2.934n ± 2% ~ (p=0.079 n=20) Int64N1000-8 2.962n ± 1% 2.957n ± 1% ~ (p=0.259 n=20) Int64N1e8-8 2.960n ± 1% 2.935n ± 2% ~ (p=0.276 n=20) Int64N1e9-8 2.935n ± 2% 2.935n ± 2% ~ (p=0.984 n=20) Int64N2e9-8 2.934n ± 0% 2.933n ± 4% ~ (p=0.463 n=20) Int64N1e18-8 3.777n ± 1% 3.781n ± 1% ~ (p=0.516 n=20) Int64N2e18-8 4.359n ± 1% 4.362n ± 0% ~ (p=0.256 n=20) Int64N4e18-8 6.536n ± 1% 6.576n ± 1% ~ (p=0.224 n=20) Int32N1000-8 2.937n ± 0% 2.942n ± 2% ~ (p=0.312 n=20) Int32N1e8-8 2.937n ± 1% 2.941n ± 1% ~ (p=0.463 n=20) Int32N1e9-8 2.936n ± 0% 2.938n ± 2% ~ (p=0.044 n=20) Int32N2e9-8 2.938n ± 2% 2.982n ± 2% ~ (p=0.174 n=20) Float32-8 3.441n ± 0% 3.441n ± 0% ~ (p=0.064 n=20) Float64-8 3.441n ± 0% 3.441n ± 0% ~ (p=0.826 n=20) ExpFloat64-8 4.486n ± 0% 4.472n ± 0% -0.31% (p=0.000 n=20) NormFloat64-8 4.721n ± 0% 4.716n ± 0% ~ (p=0.051 n=20) Perm3-8 26.65n ± 0% 26.66n ± 0% ~ (p=0.080 n=20) Perm30-8 143.2n ± 0% 143.3n ± 0% +0.10% (p=0.000 n=20) Perm30ViaShuffle-8 143.0n ± 0% 142.9n ± 0% ~ (p=0.642 n=20) ShuffleOverhead-8 120.6n ± 1% 121.1n ± 1% +0.41% (p=0.010 n=20) Concurrent-8 2.399n ± 5% 2.379n ± 2% ~ (p=0.365 n=20) goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 2703446c2e.386 │ e1bbe739fb.386 │ │ sec/op │ sec/op vs base │ SourceUint64-32 2.072n ± 2% 2.087n ± 1% ~ (p=0.440 n=20) GlobalInt64-32 3.546n ± 27% 3.538n ± 2% ~ (p=0.101 n=20) GlobalInt64Parallel-32 0.3211n ± 0% 0.3207n ± 1% ~ (p=0.753 n=20) GlobalUint64-32 3.522n ± 2% 3.543n ± 1% ~ (p=0.071 n=20) GlobalUint64Parallel-32 0.3172n ± 0% 0.3170n ± 0% ~ (p=0.507 n=20) Int64-32 2.520n ± 2% 2.548n ± 1% ~ (p=0.267 n=20) Uint64-32 2.581n ± 1% 2.565n ± 2% ~ (p=0.143 n=20) GlobalIntN1000-32 6.171n ± 1% 6.300n ± 1% ~ (p=0.037 n=20) IntN1000-32 4.752n ± 2% 4.750n ± 0% ~ (p=0.984 n=20) Int64N1000-32 5.429n ± 1% 5.515n ± 2% ~ (p=0.292 n=20) Int64N1e8-32 5.469n ± 2% 5.527n ± 0% ~ (p=0.013 n=20) Int64N1e9-32 5.489n ± 2% 5.531n ± 2% ~ (p=0.256 n=20) Int64N2e9-32 5.492n ± 2% 5.514n ± 2% ~ (p=0.606 n=20) Int64N1e18-32 8.927n ± 1% 9.059n ± 1% ~ (p=0.229 n=20) Int64N2e18-32 9.622n ± 1% 9.594n ± 1% ~ (p=0.703 n=20) Int64N4e18-32 12.03n ± 1% 12.05n ± 2% ~ (p=0.733 n=20) Int32N1000-32 4.817n ± 1% 4.840n ± 2% ~ (p=0.941 n=20) Int32N1e8-32 4.801n ± 1% 4.832n ± 2% ~ (p=0.228 n=20) Int32N1e9-32 4.798n ± 1% 4.815n ± 2% ~ (p=0.560 n=20) Int32N2e9-32 4.840n ± 1% 4.813n ± 1% ~ (p=0.015 n=20) Float32-32 10.51n ± 4% 10.90n ± 2% +3.71% (p=0.007 n=20) Float64-32 20.33n ± 3% 20.32n ± 4% ~ (p=0.566 n=20) ExpFloat64-32 12.59n ± 2% 12.95n ± 3% +2.86% (p=0.002 n=20) NormFloat64-32 7.350n ± 2% 7.570n ± 1% +2.99% (p=0.007 n=20) Perm3-32 39.29n ± 2% 37.80n ± 2% -3.79% (p=0.000 n=20) Perm30-32 219.1n ± 2% 214.0n ± 1% -2.33% (p=0.002 n=20) Perm30ViaShuffle-32 189.8n ± 2% 188.7n ± 2% ~ (p=0.147 n=20) ShuffleOverhead-32 158.9n ± 2% 160.8n ± 1% ~ (p=0.176 n=20) Concurrent-32 3.306n ± 3% 3.288n ± 0% -0.54% (p=0.005 n=20) For #61716. Change-Id: I4c5fe710b310dc075ae21c97d1805bcc20db5050 Reviewed-on: https://go-review.googlesource.com/c/go/+/516275 Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Rob Pike <r@golang.org>
2023-10-30	math/rand/v2: optimize Float32, Float64	Russ Cox
	We realized too late after Go 1 that float64(r.Uint64())/(1<<64) is not a correct implementation: it occasionally rounds to 1. The correct implementation is float64(r.Uint64()&(1<<53-1))/(1<<53) but we couldn't change the implementation for compatibility, so we changed it to retry only in the "round to 1" cases. The change to v2 lets us update the algorithm to the simpler, faster one. Note that this implementation cannot generate 2⁻⁵⁴, nor 2⁻¹⁰⁰, nor any of the other numbers between 0 and 2⁻⁵³. A slower algorithm could shift some of the probability of generating these two boundary values over to the values in between, but that would be much slower and not necessarily be better. In particular, the current implementation has the property that there are uniform gaps between the possible returned floats, which might help stability. Also, the result is often scaled and shifted, like Float64()*X+Y. Multiplying by X>1 would open new gaps, and adding most Y would erase all the distinctions that were introduced. The only changes to benchmarks should be in Float32 and Float64. The other changes remain a cautionary tale. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 4d84a369d1.amd64 │ 2703446c2e.amd64 │ │ sec/op │ sec/op vs base │ SourceUint64-32 1.348n ± 2% 1.337n ± 1% ~ (p=0.662 n=20) GlobalInt64-32 2.082n ± 2% 2.225n ± 2% +6.87% (p=0.000 n=20) GlobalInt64Parallel-32 0.1036n ± 1% 0.1043n ± 2% ~ (p=0.171 n=20) GlobalUint64-32 2.077n ± 2% 2.058n ± 1% ~ (p=0.560 n=20) GlobalUint64Parallel-32 0.1012n ± 1% 0.1009n ± 1% ~ (p=0.995 n=20) Int64-32 1.750n ± 0% 1.719n ± 2% -1.74% (p=0.000 n=20) Uint64-32 1.707n ± 2% 1.669n ± 1% -2.20% (p=0.000 n=20) GlobalIntN1000-32 3.192n ± 1% 3.321n ± 2% +4.04% (p=0.000 n=20) IntN1000-32 2.462n ± 2% 2.479n ± 1% ~ (p=0.417 n=20) Int64N1000-32 2.470n ± 1% 2.477n ± 1% ~ (p=0.664 n=20) Int64N1e8-32 2.503n ± 2% 2.490n ± 1% ~ (p=0.245 n=20) Int64N1e9-32 2.487n ± 1% 2.458n ± 1% ~ (p=0.032 n=20) Int64N2e9-32 2.487n ± 1% 2.486n ± 2% ~ (p=0.507 n=20) Int64N1e18-32 3.006n ± 2% 3.215n ± 2% +6.94% (p=0.000 n=20) Int64N2e18-32 3.368n ± 1% 3.588n ± 2% +6.55% (p=0.000 n=20) Int64N4e18-32 4.763n ± 1% 4.938n ± 2% +3.69% (p=0.000 n=20) Int32N1000-32 2.403n ± 1% 2.673n ± 2% +11.19% (p=0.000 n=20) Int32N1e8-32 2.405n ± 1% 2.631n ± 2% +9.42% (p=0.000 n=20) Int32N1e9-32 2.402n ± 2% 2.628n ± 2% +9.41% (p=0.000 n=20) Int32N2e9-32 2.384n ± 1% 2.684n ± 2% +12.56% (p=0.000 n=20) Float32-32 2.641n ± 2% 2.240n ± 2% -15.18% (p=0.000 n=20) Float64-32 2.483n ± 1% 2.253n ± 1% -9.26% (p=0.000 n=20) ExpFloat64-32 3.486n ± 2% 3.677n ± 1% +5.49% (p=0.000 n=20) NormFloat64-32 3.648n ± 1% 3.761n ± 1% +3.11% (p=0.000 n=20) Perm3-32 33.04n ± 1% 33.55n ± 2% ~ (p=0.180 n=20) Perm30-32 171.9n ± 1% 173.2n ± 1% ~ (p=0.050 n=20) Perm30ViaShuffle-32 100.3n ± 1% 115.9n ± 1% +15.55% (p=0.000 n=20) ShuffleOverhead-32 102.5n ± 1% 101.9n ± 1% ~ (p=0.266 n=20) Concurrent-32 2.101n ± 0% 2.107n ± 6% ~ (p=0.212 n=20) goos: darwin goarch: arm64 pkg: math/rand/v2 cpu: Apple M1 │ 4d84a369d1.arm64 │ 2703446c2e.arm64 │ │ sec/op │ sec/op vs base │ SourceUint64-8 2.261n ± 1% 2.275n ± 0% ~ (p=0.082 n=20) GlobalInt64-8 2.160n ± 1% 2.154n ± 1% ~ (p=0.490 n=20) GlobalInt64Parallel-8 0.4299n ± 0% 0.4298n ± 0% ~ (p=0.663 n=20) GlobalUint64-8 2.169n ± 1% 2.160n ± 1% ~ (p=0.292 n=20) GlobalUint64Parallel-8 0.4293n ± 1% 0.4286n ± 0% ~ (p=0.155 n=20) Int64-8 2.473n ± 1% 2.491n ± 1% ~ (p=0.317 n=20) Uint64-8 2.453n ± 1% 2.458n ± 0% ~ (p=0.941 n=20) GlobalIntN1000-8 2.814n ± 2% 2.814n ± 2% ~ (p=0.972 n=20) IntN1000-8 2.933n ± 2% 2.933n ± 0% ~ (p=0.287 n=20) Int64N1000-8 2.934n ± 2% 2.962n ± 1% ~ (p=0.062 n=20) Int64N1e8-8 2.935n ± 2% 2.960n ± 1% ~ (p=0.183 n=20) Int64N1e9-8 2.934n ± 2% 2.935n ± 2% ~ (p=0.367 n=20) Int64N2e9-8 2.935n ± 2% 2.934n ± 0% ~ (p=0.455 n=20) Int64N1e18-8 3.778n ± 1% 3.777n ± 1% ~ (p=0.995 n=20) Int64N2e18-8 4.359n ± 1% 4.359n ± 1% ~ (p=0.122 n=20) Int64N4e18-8 6.546n ± 1% 6.536n ± 1% ~ (p=0.920 n=20) Int32N1000-8 2.940n ± 2% 2.937n ± 0% ~ (p=0.149 n=20) Int32N1e8-8 2.937n ± 2% 2.937n ± 1% ~ (p=0.620 n=20) Int32N1e9-8 2.938n ± 0% 2.936n ± 0% ~ (p=0.046 n=20) Int32N2e9-8 2.938n ± 2% 2.938n ± 2% ~ (p=0.455 n=20) Float32-8 3.486n ± 0% 3.441n ± 0% -1.28% (p=0.000 n=20) Float64-8 3.480n ± 0% 3.441n ± 0% -1.13% (p=0.000 n=20) ExpFloat64-8 4.533n ± 0% 4.486n ± 0% -1.03% (p=0.000 n=20) NormFloat64-8 4.764n ± 0% 4.721n ± 0% -0.90% (p=0.000 n=20) Perm3-8 26.66n ± 0% 26.65n ± 0% ~ (p=0.019 n=20) Perm30-8 143.4n ± 0% 143.2n ± 0% -0.17% (p=0.000 n=20) Perm30ViaShuffle-8 142.9n ± 0% 143.0n ± 0% ~ (p=0.522 n=20) ShuffleOverhead-8 120.7n ± 0% 120.6n ± 1% ~ (p=0.488 n=20) Concurrent-8 2.360n ± 2% 2.399n ± 5% ~ (p=0.062 n=20) goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 4d84a369d1.386 │ 2703446c2e.386 │ │ sec/op │ sec/op vs base │ SourceUint64-32 2.101n ± 2% 2.072n ± 2% ~ (p=0.273 n=20) GlobalInt64-32 3.518n ± 2% 3.546n ± 27% +0.78% (p=0.007 n=20) GlobalInt64Parallel-32 0.3206n ± 0% 0.3211n ± 0% ~ (p=0.386 n=20) GlobalUint64-32 3.538n ± 1% 3.522n ± 2% ~ (p=0.331 n=20) GlobalUint64Parallel-32 0.3231n ± 0% 0.3172n ± 0% -1.84% (p=0.000 n=20) Int64-32 2.554n ± 2% 2.520n ± 2% ~ (p=0.465 n=20) Uint64-32 2.575n ± 2% 2.581n ± 1% ~ (p=0.213 n=20) GlobalIntN1000-32 6.292n ± 1% 6.171n ± 1% ~ (p=0.015 n=20) IntN1000-32 4.735n ± 1% 4.752n ± 2% ~ (p=0.635 n=20) Int64N1000-32 5.489n ± 2% 5.429n ± 1% ~ (p=0.324 n=20) Int64N1e8-32 5.528n ± 2% 5.469n ± 2% ~ (p=0.013 n=20) Int64N1e9-32 5.438n ± 2% 5.489n ± 2% ~ (p=0.984 n=20) Int64N2e9-32 5.474n ± 1% 5.492n ± 2% ~ (p=0.616 n=20) Int64N1e18-32 9.053n ± 1% 8.927n ± 1% ~ (p=0.037 n=20) Int64N2e18-32 9.685n ± 2% 9.622n ± 1% ~ (p=0.449 n=20) Int64N4e18-32 12.18n ± 1% 12.03n ± 1% ~ (p=0.013 n=20) Int32N1000-32 4.862n ± 1% 4.817n ± 1% -0.94% (p=0.002 n=20) Int32N1e8-32 4.758n ± 2% 4.801n ± 1% ~ (p=0.597 n=20) Int32N1e9-32 4.772n ± 1% 4.798n ± 1% ~ (p=0.774 n=20) Int32N2e9-32 4.847n ± 0% 4.840n ± 1% ~ (p=0.867 n=20) Float32-32 22.18n ± 4% 10.51n ± 4% -52.61% (p=0.000 n=20) Float64-32 21.21n ± 3% 20.33n ± 3% -4.17% (p=0.000 n=20) ExpFloat64-32 12.39n ± 2% 12.59n ± 2% ~ (p=0.139 n=20) NormFloat64-32 7.422n ± 1% 7.350n ± 2% ~ (p=0.208 n=20) Perm3-32 38.00n ± 2% 39.29n ± 2% +3.38% (p=0.000 n=20) Perm30-32 212.7n ± 1% 219.1n ± 2% +3.03% (p=0.001 n=20) Perm30ViaShuffle-32 187.5n ± 2% 189.8n ± 2% ~ (p=0.457 n=20) ShuffleOverhead-32 159.7n ± 1% 158.9n ± 2% ~ (p=0.920 n=20) Concurrent-32 3.470n ± 0% 3.306n ± 3% -4.71% (p=0.000 n=20) For #61716. Change-Id: I1933f1f9efd7e6e832d83e7fa5d84398f67d41f5 Reviewed-on: https://go-review.googlesource.com/c/go/+/502503 Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30	math/rand/v2: add, optimize N, UintN, Uint32N, Uint64N	Russ Cox
	Now that we can break the value stream, we can take advantage of better algorithms that have been suggested since the original code was written. Also optimizes IntN, Int32N, Int64N, Perm (indirectly). All the N variants (IntN, Int32N, Int64N, UintN, N, etc) now return the same values given a Source and parameter n, so that for example uint(r.IntN(10)) and r.UintN(10) and r.N(uint(10)) are completely interchangeable. Int64N4e18 gets slower but that is a near worst case for the algorithm and is extremely unlikely in practice. 32-bit Int32N variants got slower too, by 15-30%, in exchange for speeding up everything on 64-bit systems and consistency across the N functions. Also rename previously missed benchmark GlobalInt63Parallel to GlobalInt64Parallel. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 11ad9fdddc.amd64 │ 4d84a369d1.amd64 │ │ sec/op │ sec/op vs base │ SourceUint64-32 1.335n ± 1% 1.348n ± 2% ~ (p=0.335 n=20) GlobalInt64-32 2.046n ± 1% 2.082n ± 2% ~ (p=0.310 n=20) GlobalInt63Parallel-32 0.1037n ± 1% GlobalInt64Parallel-32 0.1036n ± 1% GlobalUint64-32 2.075n ± 0% 2.077n ± 2% ~ (p=0.228 n=20) GlobalUint64Parallel-32 0.1013n ± 1% 0.1012n ± 1% ~ (p=0.878 n=20) Int64-32 1.726n ± 2% 1.750n ± 0% +1.39% (p=0.000 n=20) Uint64-32 1.673n ± 1% 1.707n ± 2% +2.03% (p=0.002 n=20) GlobalIntN1000-32 3.895n ± 2% 3.192n ± 1% -18.05% (p=0.000 n=20) IntN1000-32 3.403n ± 1% 2.462n ± 2% -27.65% (p=0.000 n=20) Int64N1000-32 3.053n ± 2% 2.470n ± 1% -19.11% (p=0.000 n=20) Int64N1e8-32 2.718n ± 1% 2.503n ± 2% -7.91% (p=0.000 n=20) Int64N1e9-32 2.712n ± 1% 2.487n ± 1% -8.31% (p=0.000 n=20) Int64N2e9-32 2.690n ± 1% 2.487n ± 1% -7.57% (p=0.000 n=20) Int64N1e18-32 3.084n ± 2% 3.006n ± 2% -2.53% (p=0.000 n=20) Int64N2e18-32 4.026n ± 1% 3.368n ± 1% -16.33% (p=0.000 n=20) Int64N4e18-32 4.049n ± 2% 4.763n ± 1% +17.62% (p=0.000 n=20) Int32N1000-32 2.730n ± 0% 2.403n ± 1% -11.94% (p=0.000 n=20) Int32N1e8-32 2.916n ± 2% 2.405n ± 1% -17.53% (p=0.000 n=20) Int32N1e9-32 3.375n ± 1% 2.402n ± 2% -28.83% (p=0.000 n=20) Int32N2e9-32 3.292n ± 1% 2.384n ± 1% -27.58% (p=0.000 n=20) Float32-32 2.673n ± 1% 2.641n ± 2% ~ (p=0.147 n=20) Float64-32 2.485n ± 1% 2.483n ± 1% ~ (p=0.804 n=20) ExpFloat64-32 3.577n ± 2% 3.486n ± 2% -2.57% (p=0.000 n=20) NormFloat64-32 3.797n ± 2% 3.648n ± 1% -3.92% (p=0.000 n=20) Perm3-32 35.79n ± 2% 33.04n ± 1% -7.68% (p=0.000 n=20) Perm30-32 205.1n ± 1% 171.9n ± 1% -16.14% (p=0.000 n=20) Perm30ViaShuffle-32 111.2n ± 2% 100.3n ± 1% -9.76% (p=0.000 n=20) ShuffleOverhead-32 100.5n ± 2% 102.5n ± 1% +1.99% (p=0.007 n=20) Concurrent-32 2.188n ± 5% 2.101n ± 0% ~ (p=0.013 n=20) goos: darwin goarch: arm64 pkg: math/rand/v2 cpu: Apple M1 │ 11ad9fdddc.arm64 │ 4d84a369d1.arm64 │ │ sec/op │ sec/op vs base │ SourceUint64-8 2.272n ± 1% 2.261n ± 1% ~ (p=0.172 n=20) GlobalInt64-8 2.155n ± 1% 2.160n ± 1% ~ (p=0.482 n=20) GlobalInt63Parallel-8 0.4352n ± 0% GlobalInt64Parallel-8 0.4299n ± 0% GlobalUint64-8 2.173n ± 1% 2.169n ± 1% ~ (p=0.262 n=20) GlobalUint64Parallel-8 0.4340n ± 0% 0.4293n ± 1% -1.08% (p=0.000 n=20) Int64-8 2.544n ± 1% 2.473n ± 1% -2.83% (p=0.000 n=20) Uint64-8 2.552n ± 1% 2.453n ± 1% -3.90% (p=0.000 n=20) GlobalIntN1000-8 3.856n ± 0% 2.814n ± 2% -27.02% (p=0.000 n=20) IntN1000-8 3.820n ± 0% 2.933n ± 2% -23.22% (p=0.000 n=20) Int64N1000-8 3.219n ± 2% 2.934n ± 2% -8.85% (p=0.000 n=20) Int64N1e8-8 3.221n ± 2% 2.935n ± 2% -8.91% (p=0.000 n=20) Int64N1e9-8 3.276n ± 2% 2.934n ± 2% -10.44% (p=0.000 n=20) Int64N2e9-8 3.217n ± 0% 2.935n ± 2% -8.78% (p=0.000 n=20) Int64N1e18-8 3.502n ± 2% 3.778n ± 1% +7.91% (p=0.000 n=20) Int64N2e18-8 4.968n ± 1% 4.359n ± 1% -12.26% (p=0.000 n=20) Int64N4e18-8 4.963n ± 0% 6.546n ± 1% +31.92% (p=0.000 n=20) Int32N1000-8 3.189n ± 1% 2.940n ± 2% -7.81% (p=0.000 n=20) Int32N1e8-8 3.514n ± 1% 2.937n ± 2% -16.41% (p=0.000 n=20) Int32N1e9-8 4.133n ± 0% 2.938n ± 0% -28.91% (p=0.000 n=20) Int32N2e9-8 4.137n ± 0% 2.938n ± 2% -28.97% (p=0.000 n=20) Float32-8 3.468n ± 1% 3.486n ± 0% +0.52% (p=0.000 n=20) Float64-8 3.478n ± 0% 3.480n ± 0% ~ (p=0.063 n=20) ExpFloat64-8 4.563n ± 0% 4.533n ± 0% -0.67% (p=0.000 n=20) NormFloat64-8 4.768n ± 0% 4.764n ± 0% -0.07% (p=0.001 n=20) Perm3-8 28.94n ± 0% 26.66n ± 0% -7.88% (p=0.000 n=20) Perm30-8 175.9n ± 0% 143.4n ± 0% -18.50% (p=0.000 n=20) Perm30ViaShuffle-8 152.6n ± 1% 142.9n ± 0% -6.29% (p=0.000 n=20) ShuffleOverhead-8 119.6n ± 1% 120.7n ± 0% +0.96% (p=0.000 n=20) Concurrent-8 2.452n ± 3% 2.360n ± 2% -3.73% (p=0.007 n=20) goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 11ad9fdddc.386 │ 4d84a369d1.386 │ │ sec/op │ sec/op vs base │ SourceUint64-32 2.091n ± 1% 2.101n ± 2% ~ (p=0.672 n=20) GlobalInt64-32 3.514n ± 2% 3.518n ± 2% ~ (p=0.723 n=20) GlobalInt63Parallel-32 0.3197n ± 0% GlobalInt64Parallel-32 0.3206n ± 0% GlobalUint64-32 3.542n ± 1% 3.538n ± 1% ~ (p=0.304 n=20) GlobalUint64Parallel-32 0.3218n ± 0% 0.3231n ± 0% ~ (p=0.071 n=20) Int64-32 2.552n ± 2% 2.554n ± 2% ~ (p=0.693 n=20) Uint64-32 2.566n ± 1% 2.575n ± 2% ~ (p=0.606 n=20) GlobalIntN1000-32 5.965n ± 2% 6.292n ± 1% +5.46% (p=0.000 n=20) IntN1000-32 4.652n ± 1% 4.735n ± 1% +1.77% (p=0.000 n=20) Int64N1000-32 14.485n ± 1% 5.489n ± 2% -62.11% (p=0.000 n=20) Int64N1e8-32 14.675n ± 1% 5.528n ± 2% -62.33% (p=0.000 n=20) Int64N1e9-32 16.805n ± 2% 5.438n ± 2% -67.64% (p=0.000 n=20) Int64N2e9-32 14.515n ± 1% 5.474n ± 1% -62.28% (p=0.000 n=20) Int64N1e18-32 16.165n ± 1% 9.053n ± 1% -44.00% (p=0.000 n=20) Int64N2e18-32 17.945n ± 2% 9.685n ± 2% -46.03% (p=0.000 n=20) Int64N4e18-32 18.35n ± 2% 12.18n ± 1% -33.62% (p=0.000 n=20) Int32N1000-32 3.608n ± 1% 4.862n ± 1% +34.77% (p=0.000 n=20) Int32N1e8-32 3.767n ± 1% 4.758n ± 2% +26.31% (p=0.000 n=20) Int32N1e9-32 4.130n ± 2% 4.772n ± 1% +15.54% (p=0.000 n=20) Int32N2e9-32 4.206n ± 1% 4.847n ± 0% +15.24% (p=0.000 n=20) Float32-32 22.18n ± 4% 22.18n ± 4% ~ (p=0.195 n=20) Float64-32 20.75n ± 4% 21.21n ± 3% ~ (p=0.394 n=20) ExpFloat64-32 12.58n ± 3% 12.39n ± 2% ~ (p=0.032 n=20) NormFloat64-32 7.920n ± 3% 7.422n ± 1% -6.29% (p=0.000 n=20) Perm3-32 40.27n ± 1% 38.00n ± 2% -5.65% (p=0.000 n=20) Perm30-32 213.2n ± 2% 212.7n ± 1% ~ (p=0.995 n=20) Perm30ViaShuffle-32 164.2n ± 2% 187.5n ± 2% +14.22% (p=0.000 n=20) ShuffleOverhead-32 134.7n ± 2% 159.7n ± 1% +18.52% (p=0.000 n=20) Concurrent-32 3.301n ± 2% 3.470n ± 0% +5.10% (p=0.000 n=20) For #61716. Change-Id: Id1481b04202883cd0b23e21bb58d1bca4e482bd3 Reviewed-on: https://go-review.googlesource.com/c/go/+/502500 Reviewed-by: Rob Pike <r@golang.org> Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30	math/rand/v2: change Source to use uint64	Russ Cox
	This should make Uint64-using functions faster and leave other things alone. It is a mystery why so much got faster. A good cautionary tale not to read too much into minor jitter in the benchmarks. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 220860f76f.amd64 │ 11ad9fdddc.amd64 │ │ sec/op │ sec/op vs base │ SourceUint64-32 1.555n ± 1% 1.335n ± 1% -14.15% (p=0.000 n=20) GlobalInt64-32 2.071n ± 1% 2.046n ± 1% ~ (p=0.016 n=20) GlobalInt63Parallel-32 0.1023n ± 1% 0.1037n ± 1% +1.37% (p=0.002 n=20) GlobalUint64-32 5.193n ± 1% 2.075n ± 0% -60.06% (p=0.000 n=20) GlobalUint64Parallel-32 0.2341n ± 0% 0.1013n ± 1% -56.74% (p=0.000 n=20) Int64-32 2.056n ± 2% 1.726n ± 2% -16.10% (p=0.000 n=20) Uint64-32 2.077n ± 2% 1.673n ± 1% -19.46% (p=0.000 n=20) GlobalIntN1000-32 4.077n ± 2% 3.895n ± 2% -4.45% (p=0.000 n=20) IntN1000-32 3.476n ± 2% 3.403n ± 1% -2.10% (p=0.000 n=20) Int64N1000-32 3.059n ± 1% 3.053n ± 2% ~ (p=0.131 n=20) Int64N1e8-32 2.942n ± 1% 2.718n ± 1% -7.60% (p=0.000 n=20) Int64N1e9-32 2.932n ± 1% 2.712n ± 1% -7.50% (p=0.000 n=20) Int64N2e9-32 2.925n ± 1% 2.690n ± 1% -8.03% (p=0.000 n=20) Int64N1e18-32 3.116n ± 1% 3.084n ± 2% ~ (p=0.425 n=20) Int64N2e18-32 4.067n ± 1% 4.026n ± 1% -1.02% (p=0.007 n=20) Int64N4e18-32 4.054n ± 1% 4.049n ± 2% ~ (p=0.204 n=20) Int32N1000-32 2.951n ± 1% 2.730n ± 0% -7.49% (p=0.000 n=20) Int32N1e8-32 3.102n ± 1% 2.916n ± 2% -6.03% (p=0.000 n=20) Int32N1e9-32 3.535n ± 1% 3.375n ± 1% -4.54% (p=0.000 n=20) Int32N2e9-32 3.514n ± 1% 3.292n ± 1% -6.30% (p=0.000 n=20) Float32-32 2.760n ± 1% 2.673n ± 1% -3.13% (p=0.000 n=20) Float64-32 2.284n ± 1% 2.485n ± 1% +8.80% (p=0.000 n=20) ExpFloat64-32 3.757n ± 1% 3.577n ± 2% -4.78% (p=0.000 n=20) NormFloat64-32 3.837n ± 1% 3.797n ± 2% ~ (p=0.204 n=20) Perm3-32 35.23n ± 2% 35.79n ± 2% ~ (p=0.298 n=20) Perm30-32 208.8n ± 1% 205.1n ± 1% -1.82% (p=0.000 n=20) Perm30ViaShuffle-32 111.7n ± 1% 111.2n ± 2% ~ (p=0.273 n=20) ShuffleOverhead-32 101.1n ± 1% 100.5n ± 2% ~ (p=0.878 n=20) Concurrent-32 2.108n ± 7% 2.188n ± 5% ~ (p=0.417 n=20) goos: darwin goarch: arm64 pkg: math/rand/v2 │ 220860f76f.arm64 │ 11ad9fdddc.arm64 │ │ sec/op │ sec/op vs base │ SourceUint64-8 2.316n ± 1% 2.272n ± 1% -1.86% (p=0.000 n=20) GlobalInt64-8 2.183n ± 1% 2.155n ± 1% ~ (p=0.122 n=20) GlobalInt63Parallel-8 0.4331n ± 0% 0.4352n ± 0% +0.48% (p=0.000 n=20) GlobalUint64-8 4.377n ± 2% 2.173n ± 1% -50.35% (p=0.000 n=20) GlobalUint64Parallel-8 0.9237n ± 0% 0.4340n ± 0% -53.02% (p=0.000 n=20) Int64-8 2.538n ± 1% 2.544n ± 1% ~ (p=0.189 n=20) Uint64-8 2.604n ± 1% 2.552n ± 1% -1.98% (p=0.000 n=20) GlobalIntN1000-8 3.857n ± 2% 3.856n ± 0% ~ (p=0.051 n=20) IntN1000-8 3.822n ± 2% 3.820n ± 0% -0.05% (p=0.001 n=20) Int64N1000-8 3.318n ± 0% 3.219n ± 2% -2.98% (p=0.000 n=20) Int64N1e8-8 3.349n ± 1% 3.221n ± 2% -3.79% (p=0.000 n=20) Int64N1e9-8 3.317n ± 2% 3.276n ± 2% -1.24% (p=0.001 n=20) Int64N2e9-8 3.317n ± 2% 3.217n ± 0% -3.01% (p=0.000 n=20) Int64N1e18-8 3.542n ± 1% 3.502n ± 2% -1.16% (p=0.001 n=20) Int64N2e18-8 5.087n ± 0% 4.968n ± 1% -2.33% (p=0.000 n=20) Int64N4e18-8 5.084n ± 0% 4.963n ± 0% -2.39% (p=0.000 n=20) Int32N1000-8 3.208n ± 2% 3.189n ± 1% -0.58% (p=0.001 n=20) Int32N1e8-8 3.610n ± 1% 3.514n ± 1% -2.67% (p=0.000 n=20) Int32N1e9-8 4.235n ± 0% 4.133n ± 0% -2.40% (p=0.000 n=20) Int32N2e9-8 4.229n ± 1% 4.137n ± 0% -2.19% (p=0.000 n=20) Float32-8 3.468n ± 0% 3.468n ± 1% ~ (p=0.350 n=20) Float64-8 3.447n ± 0% 3.478n ± 0% +0.90% (p=0.000 n=20) ExpFloat64-8 4.567n ± 0% 4.563n ± 0% -0.10% (p=0.002 n=20) NormFloat64-8 4.821n ± 0% 4.768n ± 0% -1.09% (p=0.000 n=20) Perm3-8 28.89n ± 0% 28.94n ± 0% +0.17% (p=0.000 n=20) Perm30-8 175.7n ± 0% 175.9n ± 0% +0.14% (p=0.000 n=20) Perm30ViaShuffle-8 153.5n ± 0% 152.6n ± 1% ~ (p=0.010 n=20) ShuffleOverhead-8 119.8n ± 1% 119.6n ± 1% ~ (p=0.147 n=20) Concurrent-8 2.433n ± 3% 2.452n ± 3% ~ (p=0.616 n=20) goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 220860f76f.386 │ 11ad9fdddc.386 │ │ sec/op │ sec/op vs base │ SourceUint64-32 2.370n ± 1% 2.091n ± 1% -11.75% (p=0.000 n=20) GlobalInt64-32 3.569n ± 1% 3.514n ± 2% -1.56% (p=0.000 n=20) GlobalInt63Parallel-32 0.3221n ± 1% 0.3197n ± 0% -0.76% (p=0.000 n=20) GlobalUint64-32 8.797n ± 10% 3.542n ± 1% -59.74% (p=0.000 n=20) GlobalUint64Parallel-32 0.6351n ± 0% 0.3218n ± 0% -49.33% (p=0.000 n=20) Int64-32 2.612n ± 2% 2.552n ± 2% -2.30% (p=0.000 n=20) Uint64-32 3.350n ± 1% 2.566n ± 1% -23.42% (p=0.000 n=20) GlobalIntN1000-32 5.892n ± 1% 5.965n ± 2% ~ (p=0.082 n=20) IntN1000-32 4.546n ± 1% 4.652n ± 1% +2.33% (p=0.000 n=20) Int64N1000-32 14.59n ± 1% 14.48n ± 1% ~ (p=0.652 n=20) Int64N1e8-32 14.76n ± 2% 14.67n ± 1% ~ (p=0.836 n=20) Int64N1e9-32 16.57n ± 1% 16.80n ± 2% ~ (p=0.016 n=20) Int64N2e9-32 14.54n ± 1% 14.52n ± 1% ~ (p=0.533 n=20) Int64N1e18-32 16.14n ± 1% 16.16n ± 1% ~ (p=0.606 n=20) Int64N2e18-32 18.10n ± 1% 17.95n ± 2% ~ (p=0.062 n=20) Int64N4e18-32 18.65n ± 1% 18.35n ± 2% -1.61% (p=0.010 n=20) Int32N1000-32 3.560n ± 1% 3.608n ± 1% +1.33% (p=0.001 n=20) Int32N1e8-32 3.770n ± 2% 3.767n ± 1% ~ (p=0.155 n=20) Int32N1e9-32 4.098n ± 0% 4.130n ± 2% ~ (p=0.016 n=20) Int32N2e9-32 4.179n ± 1% 4.206n ± 1% ~ (p=0.011 n=20) Float32-32 21.18n ± 4% 22.18n ± 4% +4.70% (p=0.003 n=20) Float64-32 20.60n ± 2% 20.75n ± 4% +0.73% (p=0.000 n=20) ExpFloat64-32 13.07n ± 0% 12.58n ± 3% -3.82% (p=0.000 n=20) NormFloat64-32 7.738n ± 2% 7.920n ± 3% ~ (p=0.066 n=20) Perm3-32 36.73n ± 1% 40.27n ± 1% +9.65% (p=0.000 n=20) Perm30-32 211.9n ± 1% 213.2n ± 2% ~ (p=0.262 n=20) Perm30ViaShuffle-32 165.2n ± 1% 164.2n ± 2% ~ (p=0.029 n=20) ShuffleOverhead-32 133.9n ± 1% 134.7n ± 2% ~ (p=0.551 n=20) Concurrent-32 3.287n ± 2% 3.301n ± 2% ~ (p=0.330 n=20) For #61716. Change-Id: I8d2f73f87dd3603a0c2ff069988938e0957b6904 Reviewed-on: https://go-review.googlesource.com/c/go/+/502499 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Rob Pike <r@golang.org>
2023-10-30	math/rand/v2: update benchmarks	Russ Cox
	Change the benchmarks to use the result of the calls, as I found that in certain cases inlining resulted in discarding part of the computation in the benchmark loop. Add various benchmarks that will be relevant in future CLs. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 220860f76f.amd64 │ │ sec/op │ SourceUint64-32 1.555n ± 1% GlobalInt64-32 2.071n ± 1% GlobalInt63Parallel-32 0.1023n ± 1% GlobalUint64-32 5.193n ± 1% GlobalUint64Parallel-32 0.2341n ± 0% Int64-32 2.056n ± 2% Uint64-32 2.077n ± 2% GlobalIntN1000-32 4.077n ± 2% IntN1000-32 3.476n ± 2% Int64N1000-32 3.059n ± 1% Int64N1e8-32 2.942n ± 1% Int64N1e9-32 2.932n ± 1% Int64N2e9-32 2.925n ± 1% Int64N1e18-32 3.116n ± 1% Int64N2e18-32 4.067n ± 1% Int64N4e18-32 4.054n ± 1% Int32N1000-32 2.951n ± 1% Int32N1e8-32 3.102n ± 1% Int32N1e9-32 3.535n ± 1% Int32N2e9-32 3.514n ± 1% Float32-32 2.760n ± 1% Float64-32 2.284n ± 1% ExpFloat64-32 3.757n ± 1% NormFloat64-32 3.837n ± 1% Perm3-32 35.23n ± 2% Perm30-32 208.8n ± 1% Perm30ViaShuffle-32 111.7n ± 1% ShuffleOverhead-32 101.1n ± 1% Concurrent-32 2.108n ± 7% goos: darwin goarch: arm64 pkg: math/rand/v2 cpu: Apple M1 │ 220860f76f.arm64 │ │ sec/op │ SourceUint64-8 2.316n ± 1% GlobalInt64-8 2.183n ± 1% GlobalInt63Parallel-8 0.4331n ± 0% GlobalUint64-8 4.377n ± 2% GlobalUint64Parallel-8 0.9237n ± 0% Int64-8 2.538n ± 1% Uint64-8 2.604n ± 1% GlobalIntN1000-8 3.857n ± 2% IntN1000-8 3.822n ± 2% Int64N1000-8 3.318n ± 0% Int64N1e8-8 3.349n ± 1% Int64N1e9-8 3.317n ± 2% Int64N2e9-8 3.317n ± 2% Int64N1e18-8 3.542n ± 1% Int64N2e18-8 5.087n ± 0% Int64N4e18-8 5.084n ± 0% Int32N1000-8 3.208n ± 2% Int32N1e8-8 3.610n ± 1% Int32N1e9-8 4.235n ± 0% Int32N2e9-8 4.229n ± 1% Float32-8 3.468n ± 0% Float64-8 3.447n ± 0% ExpFloat64-8 4.567n ± 0% NormFloat64-8 4.821n ± 0% Perm3-8 28.89n ± 0% Perm30-8 175.7n ± 0% Perm30ViaShuffle-8 153.5n ± 0% ShuffleOverhead-8 119.8n ± 1% Concurrent-8 2.433n ± 3% goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 220860f76f.386 │ │ sec/op │ SourceUint64-32 2.370n ± 1% GlobalInt64-32 3.569n ± 1% GlobalInt63Parallel-32 0.3221n ± 1% GlobalUint64-32 8.797n ± 10% GlobalUint64Parallel-32 0.6351n ± 0% Int64-32 2.612n ± 2% Uint64-32 3.350n ± 1% GlobalIntN1000-32 5.892n ± 1% IntN1000-32 4.546n ± 1% Int64N1000-32 14.59n ± 1% Int64N1e8-32 14.76n ± 2% Int64N1e9-32 16.57n ± 1% Int64N2e9-32 14.54n ± 1% Int64N1e18-32 16.14n ± 1% Int64N2e18-32 18.10n ± 1% Int64N4e18-32 18.65n ± 1% Int32N1000-32 3.560n ± 1% Int32N1e8-32 3.770n ± 2% Int32N1e9-32 4.098n ± 0% Int32N2e9-32 4.179n ± 1% Float32-32 21.18n ± 4% Float64-32 20.60n ± 2% ExpFloat64-32 13.07n ± 0% NormFloat64-32 7.738n ± 2% Perm3-32 36.73n ± 1% Perm30-32 211.9n ± 1% Perm30ViaShuffle-32 165.2n ± 1% ShuffleOverhead-32 133.9n ± 1% Concurrent-32 3.287n ± 2% For #61716. Change-Id: I2f0938eae4b7bf736a8cd899a99783e731bf2179 Reviewed-on: https://go-review.googlesource.com/c/go/+/502496 Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Rob Pike <r@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30	math/rand/v2: remove Rand.Seed	Russ Cox
	Removing Rand.Seed lets us remove lockedSource as well, along with the ambiguity in globalRand about which source to use. For #61716. Change-Id: Ibe150520dd1e7dd87165eacaebe9f0c2daeaedfd Reviewed-on: https://go-review.googlesource.com/c/go/+/502498 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Rob Pike <r@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org>
2023-10-30	math/rand/v2: clean up regression test	Russ Cox
	Add more test cases. Replace -printgolden with -update, which rewrites the files for us. For #61716. Change-Id: I7c4c900ee896042429135a21971a56ebe16b6a66 Reviewed-on: https://go-review.googlesource.com/c/go/+/516858 Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30	math/rand/v2: remove Read	Russ Cox
	In math/rand, Read is deprecated. Remove in v2. People should use crypto/rand if they need long strings. For #61716. Change-Id: Ib254b7e1844616e96db60a3a7abb572b0dcb1583 Reviewed-on: https://go-review.googlesource.com/c/go/+/502497 Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30	math/rand/v2: rename various functions	Russ Cox
	Int31 -> Int32 Int31n -> Int32N Int63 -> Int64 Int63n -> Int64N Intn -> IntN The 31 and 63 are pedantic and confusing: the functions should be named for the type they return, same as all the others. The lower-case n is inconsistent with Go's usual CamelCase and especially problematic because we plan to add 'func N'. Capitalize the n. For #61716. Change-Id: Idb1a005a82f353677450d47fb612ade7a41fde69 Reviewed-on: https://go-review.googlesource.com/c/go/+/516857 Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30	math/rand/v2: start of new API	Russ Cox
	This is the beginning of the math/rand/v2 package from proposal #61716. Start by copying old API. This CL copies math/rand/* to math/rand/v2 and updates references to math/rand to add v2 throughout. Later CLs will make the v2 changes. For #61716. Change-Id: I1624ccffae3dfa442d4ba2461942decbd076e11b Reviewed-on: https://go-review.googlesource.com/c/go/+/502495 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Rob Pike <r@golang.org>
2023-10-19	all: drop old +build lines	Dmitri Shuralyov
	Running 'go fix' on the cmd+std packages handled much of this change. Also update code generators to use only the new go:build lines, not the old +build ones. For #41184. For #60268. Change-Id: If35532abe3012e7357b02c79d5992ff5ac37ca23 Cq-Include-Trybots: luci.golang.try:gotip-linux-386-longtest,gotip-linux-amd64-longtest,gotip-windows-amd64-longtest Reviewed-on: https://go-review.googlesource.com/c/go/+/536237 Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-19	math: add available godoc link	cui fliter
	Change-Id: I4a6c2ef6fd21355952ab7d8eaad883646a95d364 Reviewed-on: https://go-review.googlesource.com/c/go/+/535087 Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Than McIntosh <thanm@google.com>
2023-09-20	all: simplify bool conditions	Oleksandr Redko
	Change-Id: Id2079f7012392dea8dfe2386bb9fb1ea3f487a4a Reviewed-on: https://go-review.googlesource.com/c/go/+/526015 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: qiulaidongfeng <2645477756@qq.com>
2023-09-05	all: use ^TestName$ regular pattern for invoking a single test	Dmitri Shuralyov
	Use ^ and $ in the -run flag regular expression value when the intention is to invoke a single named test. This removes the reliance on there not being another similarly named test to achieve the intended result. In particular, package syscall has tests named TestUnshareMountNameSpace and TestUnshareMountNameSpaceChroot that both trigger themselves setting GO_WANT_HELPER_PROCESS=1 to run alternate code in a helper process. As a consequence of overlap in their test names, the former was inadvertently triggering one too many helpers. Spotted while reviewing CL 525196. Apply the same change in other places to make it easier for code readers to see that said tests aren't running extraneous tests. The unlikely cases of -run=TestSomething intentionally being used to run all tests that have the TestSomething substring in the name can be better written as -run=^.TestSomething.$ or with a comment so it is clear it wasn't an oversight. Change-Id: Iba208aba3998acdbf8c6708e5d23ab88938bfc1e Reviewed-on: https://go-review.googlesource.com/c/go/+/524948 Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Kirill Kolyshkin <kolyshkin@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-08-17	math/big, math/rand: use the built-in max function	chanxuehong
	Change-Id: I71a38dd20bfaf2b1aed18892d54eeb017d3d7d66 GitHub-Last-Rev: 8da43b2cbd563ed123690709e519c9f84272b332 GitHub-Pull-Request: golang/go#61955 Reviewed-on: https://go-review.googlesource.com/c/go/+/518595 Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: qiulaidongfeng <2645477756@qq.com>
2023-08-11	math/big: using the min built-in function	qiulaidongfeng
	Change-Id: I9e95806116a8547ec782f66226d1b1382c6156de Change-Id: I9e95806116a8547ec782f66226d1b1382c6156de GitHub-Last-Rev: 5b4ce994c162775e91aa00c942571bc0ac8b1eca GitHub-Pull-Request: golang/go#61829 Reviewed-on: https://go-review.googlesource.com/c/go/+/516895 Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-07-31	math: enable huge argument tests on s390x	Srinivas Pokala
	new s390x assembly implementation of Sin/Cos/SinCos/Tan handle huge argument test's. Updates #29240 Change-Id: I9f22d9714528ef2af52c749079f3727250089baf Reviewed-on: https://go-review.googlesource.com/c/go/+/509675 Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-07-31	math: huge argument handling for sin/cos in s390x	Srinivas Pokala
	Currently s390x, sin/cos assembly implementation not handling huge arguments. This change reverts assembly routine to native go implementation for huge arguments. Implementing the changes in assembly giving better performance than native go changes in terms of execution/cycles. name Go_changes Asm_changes Sin/input_size_(0.5)-8 11.85ns ± 0% 5.32ns ± 1% Sin/input_size_(1<<20)-8 15.32ns ± 0% 9.75ns ± 3% Sin/input_size_(1<<_40)-8 17.9ns ± 0% 10.3ns ± 6% Sin/input_size_(1<<50)-8 16.33ns ± 0% 9.75ns ± 6% Sin/input_size_(1<<60)-8 33.0ns ± 1% 29.1ns ± 0% Sin/input_size_(1<<80)-8 29.9ns ± 0% 27.2ns ± 2% Sin/input_size_(1<<200)-8 31.5ns ± 1% 28.3ns ± 0% Sin/input_size_(1<<480)-8 29.4ns ± 1% 28.0ns ± 1% Sin/input_size_(1234567891234567_<<_180)-8 29.3ns ± 1% 28.0ns ± 0% Cos/input_size_(0.5)-8 10.33ns ± 0% 5.69ns ± 1% Cos/input_size_(1<<20)-8 16.67ns ± 0% 9.18ns ± 0% Cos/input_size_(1<<_40)-8 18.50ns ± 0% 9.45ns ± 3% Cos/input_size_(1<<50)-8 16.67ns ± 0% 9.18ns ± 1% Cos/input_size_(1<<60)-8 31.6ns ± 1% 26.7ns ± 2% Cos/input_size_(1<<80)-8 31.3ns ± 0% 25.5ns ± 1% Cos/input_size_(1<<200)-8 30.0ns ± 0% 26.7ns ± 1% Cos/input_size_(1<<480)-8 31.9ns ±2% 27.0ns ± 0% Cos/input_size_(1234567891234567_<<_180)-8 31.8ns ± 0% 26.9ns ± 0% Fixes #29240 Change-Id: Id2ebcfa113926f27510d527e80daaddad925a707 Reviewed-on: https://go-review.googlesource.com/c/go/+/469635 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Bill O'Farrell <billotosyr@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Heschi Kreinick <heschi@google.com> Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2023-07-27	math: support to handle huge arguments in tan function on s390x	root
	Currently on s390x, tan assembly implementation is not handling huge arguments at all. This change is to check for large arguments and revert back to native go implantation from assembly code in case of huge arguments. The changes are implemented in assembly code to get better performance over native go implementation. Benchmark details of tan function with table driven inputs are updated as part of the issue link. Fixes #37854 Change-Id: I4e5321e65c27b7ce8c497fc9d3991ca8604753d2 Reviewed-on: https://go-review.googlesource.com/c/go/+/470595 Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Ian Lance Taylor <iant@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Bryan Mills <bcmills@google.com> Run-TryBot: Keith Randall <khr@golang.org>
2023-07-17	math: test large negative values as args for trig functions	Keith Randall
	Sin/Tan are odd, Cos is even, so it is easy to compute the correct result from the positive argument case. Change-Id: If851d00fc7f515ece8199cf56d21186ced51e94f Reviewed-on: https://go-review.googlesource.com/c/go/+/509815 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Srinivas Pokala <Pokala.Srinivas@ibm.com> Reviewed-by: Robert Griesemer <gri@google.com> Reviewed-by: Keith Randall <khr@google.com>
2023-07-07	math: add test that covers riscv64 fnm{add,sub} codegen	Michael Munday
	Adds a test that triggers the RISC-V fused multiply-add code generation bug fixed by CL 506575. Change-Id: Ia3a55a68b48c5cc6beac4e5235975dea31f3faf2 Reviewed-on: https://go-review.googlesource.com/c/go/+/507035 Auto-Submit: M Zhuo <mzh@golangcn.org> Reviewed-by: M Zhuo <mzh@golangcn.org> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Joedian Reid <joedian@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2023-07-05	math: fix portable FMA when xy < 0 and xy == -z	Michael Munday
	When xy == -z the portable implementation of FMA copied the sign bit from xy into the result. This meant that when xy == -z and xy < 0 the result was -0 which is incorrect. Fixes #61130. Change-Id: Ib93a568b7bdb9031e2aedfa1bdfa9bddde90851d Reviewed-on: https://go-review.googlesource.com/c/go/+/507376 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joedian Reid <joedian@golang.org>
2023-06-15	math: document that Min/Max differ from min/max	Ian Lance Taylor
	For #59488 Fixes #60616 Change-Id: Idf9f42d7d868999664652dd7b478684a474f1d96 Reviewed-on: https://go-review.googlesource.com/c/go/+/501355 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Rob Pike <r@golang.org> Run-TryBot: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@golang.org>
2023-06-14	all: fix spelling errors	Alexander Yastrebov
	Fix spelling errors discovered using https://github.com/codespell-project/codespell. Errors in data files and vendored packages are ignored. Change-Id: I83c7818222f2eea69afbd270c15b7897678131dc GitHub-Last-Rev: 3491615b1b82832cc0064f535786546e89aa6184 GitHub-Pull-Request: golang/go#60758 Reviewed-on: https://go-review.googlesource.com/c/go/+/502576 Auto-Submit: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Pratt <mpratt@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>
2023-06-13	all: fix mismatched symbols	cui fliter
	There are some symbol mismatches in the comments, this commit attempts to fix them Change-Id: I5c9075e5218defe9233c075744d243b26ff68496 Reviewed-on: https://go-review.googlesource.com/c/go/+/492996 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: shuang cui <imcusg@gmail.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2023-06-02	math/big: rename Int.ToFloat64 to Float64	Alan Donovan
	The "To" prefix was a relic of the first draft that I failed to make consistent with the unprefixed name used in the proposal. Fortunately iant spotted it during the API audit. Updates #56984 Updates #60560 Change-Id: Ifa6eeddf6dd5f0637c0568e383f9a4bef88b10f9 Reviewed-on: https://go-review.googlesource.com/c/go/+/500116 Reviewed-by: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Alan Donovan <adonovan@google.com>
2023-05-31	Revert "math: add Compare and Compare32"	Ian Lance Taylor
	This reverts CL 467515. Now that we have cmp.Compare, we don't need math.Compare or math.Compare32 after all. For #56491 Fixes #60519 Change-Id: I8ed33464adfc6d69bd6b328edb26aa2ee3d234d9 Reviewed-on: https://go-review.googlesource.com/c/go/+/499416 Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Eli Bendersky <eliben@google.com>
2023-05-23	fmt,math/big,net/url: fixes to old Benchmarks	Egon Elbre
	b.ResetTimer used to also stop the timer, however it does not anymore. These benchmarks hadn't been fixed and as a result ended up measuring some additional things. Also, make some for loops more conventional. Change-Id: I76ca68456d85eec51722a80587e5b2c9f5d836a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/496996 Run-TryBot: Damien Neil <dneil@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Auto-Submit: Damien Neil <dneil@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Damien Neil <dneil@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com>
2023-05-10	all: fix a lot of comments	cui fliter
	Fix comments, including duplicate is, wrong phrases and articles, misspellings, etc. Change-Id: I8bfea53b9b275e649757cc4bee6a8a026ed9c7a4 Reviewed-on: https://go-review.googlesource.com/c/go/+/493035 Reviewed-by: Benny Siegert <bsiegert@gmail.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: shuang cui <imcusg@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com>
2023-04-21	cmd/internal/obj/ppc64: modify PCALIGN to ensure alignment	Lynn Boger
	The initial purpose of PCALIGN was to identify code where it would be beneficial to align code for performance, but avoid cases where too many NOPs were added. On p10, it is now necessary to enforce a certain alignment in some cases, so the behavior of PCALIGN needs to be slightly different. Code will now be aligned to the value specified on the PCALIGN instruction regardless of number of NOPs added, which is more intuitive and consistent with power assembler alignment directives. This also adds 64 as a possible alignment value. The existing values used in PCALIGN were modified according to the new behavior. A testcase was updated and performance testing was done to verify that this does not adversely affect performance. Change-Id: Iad1cf5ff112e5bfc0514f0805be90e24095e932b Reviewed-on: https://go-review.googlesource.com/c/go/+/485056 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Archana Ravindar <aravind5@in.ibm.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Bryan Mills <bcmills@google.com>
2023-04-11	all: re-run stringer	Ian Lance Taylor
	Re-run all go:generate stringer commands. This mostly adds checks that the constant values did not change, but does add new strings for the debug/dwarf and internal/pkgbits packages. Change-Id: I5fc41f20da47338152c183d45d5ae65074e2fccf Reviewed-on: https://go-review.googlesource.com/c/go/+/483717 Reviewed-by: Bryan Mills <bcmills@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org>
2023-04-04	math/rand: clarify Seed deprecation note	Ian Lance Taylor
	Fixes #59331 Change-Id: I62156be2f2758c59349c3b02db6cf9140429c9e3 Reviewed-on: https://go-review.googlesource.com/c/go/+/481915 Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> TryBot-Bypass: Ian Lance Taylor <iant@google.com> Reviewed-by: Russ Cox <rsc@golang.org>
2023-04-04	all: fix misuses of "a" vs "an"	cui fliter
	Fixes the misuse of "a" vs "an", according to English grammatical expectations and using https://www.a-or-an.com/ Change-Id: I53ac724070e3ff3d33c304483fe72c023c7cda47 Reviewed-on: https://go-review.googlesource.com/c/go/+/480536 Run-TryBot: shuang cui <imcusg@gmail.com> Reviewed-by: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-03-25	all: add a few links in package godocs	Daniel Martí
	I noticed the one in path/filepath while reading the docs, and the other ones were found via some quick grepping. Change-Id: I386f2f74ef816a6d18aa2f58ee6b64dbd0147c9e Reviewed-on: https://go-review.googlesource.com/c/go/+/478795 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Heschi Kreinick <heschi@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-03-23	all: replace leading spaces with tabs in assembly	Michael Pratt
	Most of these are one-off mistakes. Only one file was all spaces. Change-Id: I277c3ce4a4811aa4248c90676f66bc775ae8d062 Reviewed-on: https://go-review.googlesource.com/c/go/+/478976 Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-02-15	math: add Compare and Compare32	Akhil Indurti
	This change introduces the Compare and Compare32 functions based on the total-ordering predicate in IEEE-754, section 5.10. In particular, * -NaN is ordered before any other value * +NaN is ordered after any other value * -0 is ordered before +0 * All other values are ordered the usual way Compare-8 0.4537n ± 1% Compare32-8 0.3752n ± 1% geomean 0.4126n Fixes #56491. Change-Id: I5c9c77430a2872f380688c1b0a66f2105b77d5ac Reviewed-on: https://go-review.googlesource.com/c/go/+/467515 Reviewed-by: WANG Xuerui <git@xen0n.name> Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-02-14	all: fix some comments	cui fliter
	Change-Id: I16ec916b47de2f417b681c8abff5a1375ddf491b Reviewed-on: https://go-review.googlesource.com/c/go/+/468055 Run-TryBot: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2023-02-10	Revert "math: add Compare and Compare32"	Than McIntosh
	This reverts CL 459435. Reason for revert: Tests failing on MIPS. Change-Id: I9017bf718ba938df6d6766041555034d55d90b8a Reviewed-on: https://go-review.googlesource.com/c/go/+/467255 Run-TryBot: Than McIntosh <thanm@google.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Akhil Indurti <aindurti@gmail.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2023-02-10	math/rand: fix typo in Seed deprecation comment	Valentin Deleplace
	Change-Id: I37a9e4362953a711840087e1b7b8d7a25f1a83b7 Reviewed-on: https://go-review.googlesource.com/c/go/+/467275 Reviewed-by: Russ Cox <rsc@golang.org> TryBot-Bypass: Russ Cox <rsc@golang.org> Auto-Submit: Russ Cox <rsc@golang.org>
2023-02-09	math/rand: rewrite the math/rand package comment to say what it's good for	Rob Pike
	It currently says only what it wasn't good for, which is not helpful. Change-Id: I468c7f385c14eaca99788a94d53c30b729ed0944 Reviewed-on: https://go-review.googlesource.com/c/go/+/466276 Reviewed-by: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com>
2023-02-09	math: add Compare and Compare32	Akhil Indurti
	This change introduces the Compare and Compare32 functions based on the total-ordering predicate in IEEE-754, section 5.10. In particular, * -NaN is ordered before any other value * +NaN is ordered after any other value * -0 is ordered before +0 * All other values are ordered the usual way name time/op Compare-8 0.24ns ± 1% Compare32-8 0.24ns ± 0% Fixes #56491. Change-Id: I9444fbfefe26741794c4436a26d403b8da97bdaf Reviewed-on: https://go-review.googlesource.com/c/go/+/459435 Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-02-07	math/rand: use fastrand64 if possible	Ian Lance Taylor
	Now that the top-level math/rand functions are auto-seeded by default (issue #54880), use the runtime fastrand64 function when 1) Seed has not been called; 2) the GODEBUG randautoseed=0 is not used. The benchmarks run quickly and are relatively noisy, but they show significant improvements for parallel calls to the top-level functions. goos: linux goarch: amd64 pkg: math/rand cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz │ /tmp/foo.1 │ /tmp/foo.2 │ │ sec/op │ sec/op vs base │ Int63Threadsafe-16 11.605n ± 1% 3.094n ± 1% -73.34% (p=0.000 n=10) Int63ThreadsafeParallel-16 67.8350n ± 2% 0.4000n ± 1% -99.41% (p=0.000 n=10) Int63Unthreadsafe-16 1.947n ± 3% 1.924n ± 2% ~ (p=0.189 n=10) Intn1000-16 4.295n ± 2% 4.287n ± 3% ~ (p=0.517 n=10) Int63n1000-16 4.379n ± 0% 4.192n ± 2% -4.27% (p=0.000 n=10) Int31n1000-16 3.641n ± 3% 3.506n ± 0% -3.69% (p=0.000 n=10) Float32-16 3.330n ± 7% 3.250n ± 2% -2.40% (p=0.017 n=10) Float64-16 2.194n ± 6% 2.056n ± 4% -6.31% (p=0.004 n=10) Perm3-16 43.39n ± 9% 38.28n ± 12% -11.77% (p=0.015 n=10) Perm30-16 324.4n ± 6% 315.9n ± 19% ~ (p=0.315 n=10) Perm30ViaShuffle-16 175.4n ± 1% 143.6n ± 2% -18.15% (p=0.000 n=10) ShuffleOverhead-16 223.4n ± 2% 215.8n ± 1% -3.38% (p=0.000 n=10) Read3-16 5.428n ± 3% 5.406n ± 2% ~ (p=0.780 n=10) Read64-16 41.55n ± 5% 40.14n ± 3% -3.38% (p=0.000 n=10) Read1000-16 622.9n ± 4% 594.9n ± 2% -4.50% (p=0.000 n=10) Concurrent-16 136.300n ± 2% 4.647n ± 26% -96.59% (p=0.000 n=10) geomean 23.40n 12.15n -48.08% Fixes #49892 Change-Id: Iba75b326145512ab0b7ece233b98ac3d4e1fb504 Reviewed-on: https://go-review.googlesource.com/c/go/+/465037 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Russ Cox <rsc@golang.org> Auto-Submit: Ian Lance Taylor <iant@google.com>
2023-02-06	math/big: add warning about using Int for cryptography	Filippo Valsorda
	Change-Id: I31bec5d2b4a79a085942c7d380678379d99cf07b Reviewed-on: https://go-review.googlesource.com/c/go/+/455135 Auto-Submit: Filippo Valsorda <filippo@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Roland Shoemaker <roland@golang.org> Run-TryBot: Filippo Valsorda <filippo@golang.org> Reviewed-by: Bryan Mills <bcmills@google.com>

ChaCha8 provides a cryptographically strong generator alongside PCG, so that people who want stronger randomness have access to that. On systems with 128-bit vector math assembly (amd64 and arm64), ChaCha8 runs at about the same speed as PCG (25% slower on amd64, 2% faster on arm64). Obviously all the claimed benchmark variation other than the new ChaCha8 benchmark is a lie. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ afa459a2f0.amd64 │ bbb48afeb7.amd64 │ │ sec/op │ sec/op vs base │ PCG_DXSM-32 1.488n ± 2% 1.492n ± 2% ~ (p=0.309 n=20) ChaCha8-32 1.861n ± 2% SourceUint64-32 1.450n ± 3% 1.590n ± 2% +9.69% (p=0.000 n=20) GlobalInt64-32 2.067n ± 2% 2.061n ± 1% ~ (p=0.952 n=20) GlobalInt64Parallel-32 0.1044n ± 2% 0.1041n ± 1% ~ (p=0.498 n=20) GlobalUint64-32 2.085n ± 0% 2.256n ± 2% +8.23% (p=0.000 n=20) GlobalUint64Parallel-32 0.1008n ± 1% 0.1018n ± 1% ~ (p=0.041 n=20) Int64-32 1.779n ± 1% 1.779n ± 1% ~ (p=0.410 n=20) Uint64-32 1.854n ± 2% 1.882n ± 1% ~ (p=0.044 n=20) GlobalIntN1000-32 3.140n ± 3% 3.115n ± 3% ~ (p=0.673 n=20) IntN1000-32 2.496n ± 1% 2.509n ± 1% ~ (p=0.171 n=20) Int64N1000-32 2.510n ± 2% 2.493n ± 1% ~ (p=0.804 n=20) Int64N1e8-32 2.471n ± 2% 2.521n ± 1% +1.98% (p=0.003 n=20) Int64N1e9-32 2.488n ± 2% 2.506n ± 1% ~ (p=0.663 n=20) Int64N2e9-32 2.478n ± 2% 2.482n ± 2% ~ (p=0.533 n=20) Int64N1e18-32 3.088n ± 1% 3.216n ± 1% +4.15% (p=0.000 n=20) Int64N2e18-32 3.493n ± 1% 3.635n ± 2% +4.05% (p=0.000 n=20) Int64N4e18-32 5.060n ± 2% 5.122n ± 1% +1.22% (p=0.000 n=20) Int32N1000-32 2.620n ± 1% 2.672n ± 1% +2.00% (p=0.002 n=20) Int32N1e8-32 2.652n ± 0% 2.646n ± 1% ~ (p=0.743 n=20) Int32N1e9-32 2.644n ± 1% 2.660n ± 2% ~ (p=0.163 n=20) Int32N2e9-32 2.619n ± 2% 2.652n ± 1% ~ (p=0.132 n=20) Float32-32 2.261n ± 1% 2.267n ± 1% ~ (p=0.516 n=20) Float64-32 2.241n ± 2% 2.276n ± 1% ~ (p=0.080 n=20) ExpFloat64-32 3.716n ± 1% 3.779n ± 1% +1.68% (p=0.007 n=20) NormFloat64-32 3.718n ± 1% 3.747n ± 1% ~ (p=0.011 n=20) Perm3-32 34.11n ± 2% 34.23n ± 2% ~ (p=0.779 n=20) Perm30-32 200.6n ± 0% 202.3n ± 2% ~ (p=0.055 n=20) Perm30ViaShuffle-32 109.7n ± 1% 115.5n ± 2% +5.34% (p=0.000 n=20) ShuffleOverhead-32 107.2n ± 1% 113.3n ± 1% +5.74% (p=0.000 n=20) Concurrent-32 2.108n ± 6% 2.107n ± 1% ~ (p=0.448 n=20) goos: darwin goarch: arm64 pkg: math/rand/v2 cpu: Apple M1 │ afa459a2f0.arm64 │ bbb48afeb7.arm64 │ │ sec/op │ sec/op vs base │ PCG_DXSM-8 2.531n ± 0% 2.529n ± 0% ~ (p=0.586 n=20) ChaCha8-8 2.480n ± 0% SourceUint64-8 2.531n ± 0% 2.534n ± 0% ~ (p=0.227 n=20) GlobalInt64-8 2.177n ± 1% 2.173n ± 1% ~ (p=0.733 n=20) GlobalInt64Parallel-8 0.4319n ± 0% 0.4304n ± 0% -0.32% (p=0.003 n=20) GlobalUint64-8 2.185n ± 1% 2.185n ± 0% ~ (p=0.541 n=20) GlobalUint64Parallel-8 0.4295n ± 1% 0.4294n ± 0% ~ (p=0.203 n=20) Int64-8 4.104n ± 0% 4.107n ± 0% ~ (p=0.193 n=20) Uint64-8 4.080n ± 0% 4.081n ± 0% ~ (p=0.053 n=20) GlobalIntN1000-8 2.814n ± 1% 2.814n ± 0% ~ (p=0.879 n=20) IntN1000-8 4.140n ± 0% 4.141n ± 0% ~ (p=0.428 n=20) Int64N1000-8 4.139n ± 0% 4.140n ± 0% ~ (p=0.114 n=20) Int64N1e8-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.898 n=20) Int64N1e9-8 4.139n ± 0% 4.140n ± 0% ~ (p=0.593 n=20) Int64N2e9-8 4.140n ± 0% 4.139n ± 0% ~ (p=0.158 n=20) Int64N1e18-8 5.273n ± 0% 5.274n ± 0% ~ (p=0.308 n=20) Int64N2e18-8 6.059n ± 0% 6.058n ± 0% ~ (p=0.053 n=20) Int64N4e18-8 8.803n ± 0% 8.800n ± 0% ~ (p=0.673 n=20) Int32N1000-8 4.131n ± 0% 4.131n ± 0% ~ (p=0.342 n=20) Int32N1e8-8 4.131n ± 0% 4.131n ± 0% ~ (p=0.091 n=20) Int32N1e9-8 4.131n ± 0% 4.131n ± 0% ~ (p=0.273 n=20) Int32N2e9-8 4.131n ± 0% 4.131n ± 0% ~ (p=0.425 n=20) Float32-8 4.110n ± 0% 4.112n ± 0% ~ (p=0.203 n=20) Float64-8 4.104n ± 0% 4.106n ± 0% ~ (p=0.409 n=20) ExpFloat64-8 5.338n ± 0% 5.339n ± 0% ~ (p=0.037 n=20) NormFloat64-8 5.731n ± 0% 5.733n ± 0% ~ (p=0.692 n=20) Perm3-8 26.62n ± 0% 26.65n ± 0% +0.09% (p=0.000 n=20) Perm30-8 194.6n ± 2% 194.9n ± 0% ~ (p=0.141 n=20) Perm30ViaShuffle-8 156.4n ± 0% 156.5n ± 0% +0.06% (p=0.000 n=20) ShuffleOverhead-8 125.8n ± 0% 125.0n ± 0% -0.64% (p=0.000 n=20) Concurrent-8 2.654n ± 6% 2.441n ± 6% -8.06% (p=0.009 n=20) goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ afa459a2f0.386 │ bbb48afeb7.386 │ │ sec/op │ sec/op vs base │ PCG_DXSM-32 7.793n ± 2% 7.647n ± 1% ~ (p=0.021 n=20) ChaCha8-32 11.48n ± 2% SourceUint64-32 7.680n ± 1% 7.714n ± 1% ~ (p=0.713 n=20) GlobalInt64-32 3.474n ± 3% 3.491n ± 28% ~ (p=0.337 n=20) GlobalInt64Parallel-32 0.3253n ± 0% 0.3194n ± 0% -1.81% (p=0.000 n=20) GlobalUint64-32 3.433n ± 2% 3.610n ± 2% +5.14% (p=0.000 n=20) GlobalUint64Parallel-32 0.3156n ± 0% 0.3164n ± 0% ~ (p=0.073 n=20) Int64-32 7.707n ± 1% 7.824n ± 0% +1.52% (p=0.005 n=20) Uint64-32 7.714n ± 1% 7.732n ± 2% ~ (p=0.441 n=20) GlobalIntN1000-32 6.236n ± 1% 6.176n ± 2% ~ (p=0.499 n=20) IntN1000-32 10.41n ± 1% 10.31n ± 2% ~ (p=0.782 n=20) Int64N1000-32 10.97n ± 2% 11.22n ± 2% +2.19% (p=0.002 n=20) Int64N1e8-32 10.98n ± 1% 11.07n ± 1% ~ (p=0.056 n=20) Int64N1e9-32 10.95n ± 0% 11.15n ± 2% ~ (p=0.016 n=20) Int64N2e9-32 11.11n ± 1% 11.00n ± 1% ~ (p=0.654 n=20) Int64N1e18-32 15.18n ± 2% 14.97n ± 2% ~ (p=0.387 n=20) Int64N2e18-32 15.61n ± 1% 15.91n ± 1% +1.92% (p=0.003 n=20) Int64N4e18-32 19.23n ± 2% 18.98n ± 1% ~ (p=1.000 n=20) Int32N1000-32 10.35n ± 1% 10.31n ± 2% ~ (p=0.081 n=20) Int32N1e8-32 10.33n ± 1% 10.38n ± 1% ~ (p=0.335 n=20) Int32N1e9-32 10.35n ± 1% 10.37n ± 1% ~ (p=0.497 n=20) Int32N2e9-32 10.35n ± 1% 10.41n ± 1% ~ (p=0.605 n=20) Float32-32 13.57n ± 1% 13.78n ± 2% ~ (p=0.047 n=20) Float64-32 22.95n ± 4% 23.43n ± 3% ~ (p=0.218 n=20) ExpFloat64-32 15.23n ± 2% 15.46n ± 1% ~ (p=0.095 n=20) NormFloat64-32 13.78n ± 1% 13.73n ± 2% ~ (p=0.031 n=20) Perm3-32 46.62n ± 2% 47.46n ± 2% +1.82% (p=0.004 n=20) Perm30-32 400.7n ± 1% 403.5n ± 1% ~ (p=0.098 n=20) Perm30ViaShuffle-32 350.5n ± 1% 348.1n ± 2% ~ (p=0.703 n=20) ShuffleOverhead-32 326.0n ± 2% 326.2n ± 2% ~ (p=0.440 n=20) Concurrent-32 3.290n ± 0% 3.297n ± 4% ~ (p=0.189 n=20) For #61716. Change-Id: Id2a7e1c1db0beb81f563faaefba65fe292497269 Reviewed-on: https://go-review.googlesource.com/c/go/+/516859 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Filippo Valsorda <filippo@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com>

Based on observations by Cherry Mui (see comments in CL 539299). Add new benchmark FloatPrecMixed. For #50489. name old time/op new time/op delta FloatPrecExact/1-12 129ns ± 0% 105ns ±11% -18.51% (p=0.008 n=5+5) FloatPrecExact/10-12 317ns ± 2% 283ns ± 1% -10.65% (p=0.008 n=5+5) FloatPrecExact/100-12 1.80µs ±15% 1.35µs ± 0% -25.09% (p=0.008 n=5+5) FloatPrecExact/1000-12 9.48µs ±14% 8.32µs ± 1% -12.25% (p=0.008 n=5+5) FloatPrecExact/10000-12 195µs ± 1% 191µs ± 0% -1.73% (p=0.008 n=5+5) FloatPrecExact/100000-12 7.31ms ± 1% 7.24ms ± 1% -0.99% (p=0.032 n=5+5) FloatPrecExact/1000000-12 301ms ± 3% 302ms ± 2% ~ (p=0.841 n=5+5) FloatPrecMixed/1-12 141ns ± 0% 110ns ± 3% -21.88% (p=0.008 n=5+5) FloatPrecMixed/10-12 767ns ± 0% 739ns ± 5% ~ (p=0.151 n=5+5) FloatPrecMixed/100-12 4.93µs ± 2% 3.73µs ± 1% -24.33% (p=0.008 n=5+5) FloatPrecMixed/1000-12 90.9µs ±11% 70.3µs ± 2% -22.66% (p=0.008 n=5+5) FloatPrecMixed/10000-12 2.30ms ± 0% 1.92ms ± 1% -16.41% (p=0.008 n=5+5) FloatPrecMixed/100000-12 87.1ms ± 1% 68.5ms ± 1% -21.42% (p=0.008 n=5+5) FloatPrecMixed/1000000-12 4.09s ± 1% 3.58s ± 1% -12.35% (p=0.008 n=5+5) FloatPrecInexact/1-12 92.4ns ± 0% 66.1ns ± 5% -28.41% (p=0.008 n=5+5) FloatPrecInexact/10-12 118ns ± 0% 91ns ± 1% -23.14% (p=0.016 n=5+4) FloatPrecInexact/100-12 310ns ±10% 244ns ± 1% -21.32% (p=0.008 n=5+5) FloatPrecInexact/1000-12 952ns ± 1% 828ns ± 1% -12.96% (p=0.016 n=4+5) FloatPrecInexact/10000-12 6.71µs ± 1% 6.25µs ± 3% -6.83% (p=0.008 n=5+5) FloatPrecInexact/100000-12 66.1µs ± 1% 61.2µs ± 1% -7.45% (p=0.008 n=5+5) FloatPrecInexact/1000000-12 635µs ± 2% 584µs ± 1% -7.97% (p=0.008 n=5+5) Change-Id: I3aa67b49a042814a3286ee8306fbed36709cbb6e Reviewed-on: https://go-review.googlesource.com/c/go/+/542756 Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Robert Griesemer <gri@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@google.com> Auto-Submit: Robert Griesemer <gri@google.com>

Follow-up on CL 539299: missed to incorporate the updated comment per feedback on that CL. For #50489. Change-Id: Ib035400038b1d11532f62055b5cdb382ab75654c Reviewed-on: https://go-review.googlesource.com/c/go/+/542115 Run-TryBot: Robert Griesemer <gri@google.com> Auto-Submit: Robert Griesemer <gri@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>

goos: darwin goarch: amd64 pkg: math/big cpu: Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz BenchmarkFloatPrecExact/1-12 9380685 125.0 ns/op BenchmarkFloatPrecExact/10-12 3780493 321.2 ns/op BenchmarkFloatPrecExact/100-12 698272 1679 ns/op BenchmarkFloatPrecExact/1000-12 117975 9113 ns/op BenchmarkFloatPrecExact/10000-12 5913 192768 ns/op BenchmarkFloatPrecExact/100000-12 164 7401817 ns/op BenchmarkFloatPrecExact/1000000-12 4 293568523 ns/op BenchmarkFloatPrecInexact/1-12 12836612 91.26 ns/op BenchmarkFloatPrecInexact/10-12 10144908 114.9 ns/op BenchmarkFloatPrecInexact/100-12 4121931 297.3 ns/op BenchmarkFloatPrecInexact/1000-12 1275886 927.7 ns/op BenchmarkFloatPrecInexact/10000-12 170392 6546 ns/op BenchmarkFloatPrecInexact/100000-12 18307 65232 ns/op BenchmarkFloatPrecInexact/1000000-12 1701 621412 ns/op Fixes #50489. Change-Id: Ic952f00e35d42f2470ecab53df712721997eac94 Reviewed-on: https://go-review.googlesource.com/c/go/+/539299 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Robert Griesemer <gri@google.com> Run-TryBot: Robert Griesemer <gri@google.com> Reviewed-by: Robert Griesemer <gri@google.com>

These slowdowns are because we are now using PCG instead of the Mitchell/Reeds LFSR for the benchmarks. PCG is in fact a bit slower (but generates statically far better random numbers). goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 01ff938549.amd64 │ afa459a2f0.amd64 │ │ sec/op │ sec/op vs base │ PCG_DXSM-32 1.490n ± 0% 1.488n ± 2% ~ (p=0.408 n=20) SourceUint64-32 1.352n ± 1% 1.450n ± 3% +7.21% (p=0.000 n=20) GlobalInt64-32 2.083n ± 0% 2.067n ± 2% ~ (p=0.223 n=20) GlobalInt64Parallel-32 0.1035n ± 1% 0.1044n ± 2% ~ (p=0.010 n=20) GlobalUint64-32 2.038n ± 1% 2.085n ± 0% +2.28% (p=0.000 n=20) GlobalUint64Parallel-32 0.1006n ± 1% 0.1008n ± 1% ~ (p=0.733 n=20) Int64-32 1.687n ± 2% 1.779n ± 1% +5.48% (p=0.000 n=20) Uint64-32 1.674n ± 2% 1.854n ± 2% +10.69% (p=0.000 n=20) GlobalIntN1000-32 3.135n ± 1% 3.140n ± 3% ~ (p=0.794 n=20) IntN1000-32 2.478n ± 1% 2.496n ± 1% +0.73% (p=0.006 n=20) Int64N1000-32 2.455n ± 1% 2.510n ± 2% +2.22% (p=0.000 n=20) Int64N1e8-32 2.467n ± 2% 2.471n ± 2% ~ (p=0.050 n=20) Int64N1e9-32 2.454n ± 1% 2.488n ± 2% +1.39% (p=0.000 n=20) Int64N2e9-32 2.482n ± 1% 2.478n ± 2% ~ (p=0.066 n=20) Int64N1e18-32 3.349n ± 2% 3.088n ± 1% -7.81% (p=0.000 n=20) Int64N2e18-32 3.537n ± 1% 3.493n ± 1% -1.24% (p=0.002 n=20) Int64N4e18-32 4.917n ± 0% 5.060n ± 2% +2.91% (p=0.000 n=20) Int32N1000-32 2.386n ± 1% 2.620n ± 1% +9.76% (p=0.000 n=20) Int32N1e8-32 2.366n ± 1% 2.652n ± 0% +12.11% (p=0.000 n=20) Int32N1e9-32 2.355n ± 2% 2.644n ± 1% +12.32% (p=0.000 n=20) Int32N2e9-32 2.371n ± 1% 2.619n ± 2% +10.48% (p=0.000 n=20) Float32-32 2.245n ± 2% 2.261n ± 1% ~ (p=0.625 n=20) Float64-32 2.235n ± 1% 2.241n ± 2% ~ (p=0.393 n=20) ExpFloat64-32 3.813n ± 3% 3.716n ± 1% -2.53% (p=0.000 n=20) NormFloat64-32 3.652n ± 2% 3.718n ± 1% +1.79% (p=0.006 n=20) Perm3-32 33.12n ± 3% 34.11n ± 2% ~ (p=0.021 n=20) Perm30-32 205.1n ± 1% 200.6n ± 0% -2.17% (p=0.000 n=20) Perm30ViaShuffle-32 110.8n ± 1% 109.7n ± 1% -0.99% (p=0.002 n=20) ShuffleOverhead-32 113.0n ± 1% 107.2n ± 1% -5.09% (p=0.000 n=20) Concurrent-32 2.100n ± 0% 2.108n ± 6% ~ (p=0.103 n=20) goos: darwin goarch: arm64 pkg: math/rand/v2 │ 01ff938549.arm64 │ afa459a2f0.arm64 │ │ sec/op │ sec/op vs base │ PCG_DXSM-8 2.531n ± 0% 2.531n ± 0% ~ (p=0.763 n=20) SourceUint64-8 2.258n ± 1% 2.531n ± 0% +12.09% (p=0.000 n=20) GlobalInt64-8 2.167n ± 0% 2.177n ± 1% ~ (p=0.213 n=20) GlobalInt64Parallel-8 0.4310n ± 0% 0.4319n ± 0% ~ (p=0.027 n=20) GlobalUint64-8 2.182n ± 1% 2.185n ± 1% ~ (p=0.683 n=20) GlobalUint64Parallel-8 0.4297n ± 0% 0.4295n ± 1% ~ (p=0.941 n=20) Int64-8 2.472n ± 1% 4.104n ± 0% +66.00% (p=0.000 n=20) Uint64-8 2.449n ± 1% 4.080n ± 0% +66.60% (p=0.000 n=20) GlobalIntN1000-8 2.814n ± 2% 2.814n ± 1% ~ (p=0.972 n=20) IntN1000-8 2.998n ± 2% 4.140n ± 0% +38.09% (p=0.000 n=20) Int64N1000-8 2.949n ± 2% 4.139n ± 0% +40.35% (p=0.000 n=20) Int64N1e8-8 2.953n ± 2% 4.140n ± 0% +40.22% (p=0.000 n=20) Int64N1e9-8 2.950n ± 0% 4.139n ± 0% +40.32% (p=0.000 n=20) Int64N2e9-8 2.946n ± 2% 4.140n ± 0% +40.53% (p=0.000 n=20) Int64N1e18-8 3.779n ± 1% 5.273n ± 0% +39.52% (p=0.000 n=20) Int64N2e18-8 4.370n ± 1% 6.059n ± 0% +38.65% (p=0.000 n=20) Int64N4e18-8 6.544n ± 1% 8.803n ± 0% +34.52% (p=0.000 n=20) Int32N1000-8 2.950n ± 0% 4.131n ± 0% +40.06% (p=0.000 n=20) Int32N1e8-8 2.950n ± 2% 4.131n ± 0% +40.03% (p=0.000 n=20) Int32N1e9-8 2.951n ± 2% 4.131n ± 0% +39.99% (p=0.000 n=20) Int32N2e9-8 2.950n ± 2% 4.131n ± 0% +40.03% (p=0.000 n=20) Float32-8 3.441n ± 0% 4.110n ± 0% +19.44% (p=0.000 n=20) Float64-8 3.442n ± 0% 4.104n ± 0% +19.24% (p=0.000 n=20) ExpFloat64-8 4.481n ± 0% 5.338n ± 0% +19.11% (p=0.000 n=20) NormFloat64-8 4.725n ± 0% 5.731n ± 0% +21.28% (p=0.000 n=20) Perm3-8 26.55n ± 0% 26.62n ± 0% +0.28% (p=0.000 n=20) Perm30-8 181.9n ± 0% 194.6n ± 2% +6.98% (p=0.000 n=20) Perm30ViaShuffle-8 142.9n ± 0% 156.4n ± 0% +9.45% (p=0.000 n=20) ShuffleOverhead-8 120.8n ± 2% 125.8n ± 0% +4.10% (p=0.000 n=20) Concurrent-8 2.421n ± 6% 2.654n ± 6% +9.67% (p=0.002 n=20) goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 01ff938549.386 │ afa459a2f0.386 │ │ sec/op │ sec/op vs base │ PCG_DXSM-32 7.613n ± 1% 7.793n ± 2% +2.38% (p=0.000 n=20) SourceUint64-32 2.069n ± 0% 7.680n ± 1% +271.19% (p=0.000 n=20) GlobalInt64-32 3.456n ± 1% 3.474n ± 3% ~ (p=0.654 n=20) GlobalInt64Parallel-32 0.3252n ± 0% 0.3253n ± 0% ~ (p=0.952 n=20) GlobalUint64-32 3.573n ± 1% 3.433n ± 2% -3.92% (p=0.000 n=20) GlobalUint64Parallel-32 0.3159n ± 0% 0.3156n ± 0% ~ (p=0.223 n=20) Int64-32 2.562n ± 2% 7.707n ± 1% +200.74% (p=0.000 n=20) Uint64-32 2.592n ± 0% 7.714n ± 1% +197.65% (p=0.000 n=20) GlobalIntN1000-32 6.266n ± 2% 6.236n ± 1% ~ (p=0.039 n=20) IntN1000-32 4.724n ± 2% 10.410n ± 1% +120.39% (p=0.000 n=20) Int64N1000-32 5.490n ± 2% 10.975n ± 2% +99.89% (p=0.000 n=20) Int64N1e8-32 5.513n ± 2% 10.980n ± 1% +99.15% (p=0.000 n=20) Int64N1e9-32 5.476n ± 1% 10.950n ± 0% +99.96% (p=0.000 n=20) Int64N2e9-32 5.501n ± 2% 11.110n ± 1% +101.96% (p=0.000 n=20) Int64N1e18-32 9.043n ± 2% 15.180n ± 2% +67.86% (p=0.000 n=20) Int64N2e18-32 9.601n ± 2% 15.610n ± 1% +62.60% (p=0.000 n=20) Int64N4e18-32 12.00n ± 1% 19.23n ± 2% +60.14% (p=0.000 n=20) Int32N1000-32 4.829n ± 2% 10.345n ± 1% +114.25% (p=0.000 n=20) Int32N1e8-32 4.825n ± 2% 10.330n ± 1% +114.09% (p=0.000 n=20) Int32N1e9-32 4.830n ± 2% 10.350n ± 1% +114.26% (p=0.000 n=20) Int32N2e9-32 4.750n ± 2% 10.345n ± 1% +117.81% (p=0.000 n=20) Float32-32 10.89n ± 4% 13.57n ± 1% +24.61% (p=0.000 n=20) Float64-32 19.60n ± 4% 22.95n ± 4% +17.12% (p=0.000 n=20) ExpFloat64-32 12.96n ± 3% 15.23n ± 2% +17.47% (p=0.000 n=20) NormFloat64-32 7.516n ± 1% 13.780n ± 1% +83.34% (p=0.000 n=20) Perm3-32 36.78n ± 2% 46.62n ± 2% +26.72% (p=0.000 n=20) Perm30-32 238.9n ± 2% 400.7n ± 1% +67.73% (p=0.000 n=20) Perm30ViaShuffle-32 189.7n ± 2% 350.5n ± 1% +84.79% (p=0.000 n=20) ShuffleOverhead-32 159.8n ± 1% 326.0n ± 2% +104.01% (p=0.000 n=20) Concurrent-32 3.286n ± 1% 3.290n ± 0% ~ (p=0.743 n=20) On the other hand, compared to the original "update benchmarks" CL, the cleanups we've made more than compensate for PCG being a bit slower than LFSR, at least on 64-bit x86. ARM64 (Apple M1) is a bit slower: perhaps the 64x64→128 multiply is slower there for some reason. 386 is noticeably slower, but it's also a non-SSA backend. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 220860f76f.amd64 │ afa459a2f0.amd64 │ │ sec/op │ sec/op vs base │ SourceUint64-32 1.555n ± 1% 1.450n ± 3% -6.78% (p=0.000 n=20) GlobalInt64-32 2.071n ± 1% 2.067n ± 2% ~ (p=0.673 n=20) GlobalInt63Parallel-32 0.1023n ± 1% GlobalInt64Parallel-32 0.1044n ± 2% GlobalUint64-32 5.193n ± 1% 2.085n ± 0% -59.86% (p=0.000 n=20) GlobalUint64Parallel-32 0.2341n ± 0% 0.1008n ± 1% -56.93% (p=0.000 n=20) Int64-32 2.056n ± 2% 1.779n ± 1% -13.47% (p=0.000 n=20) Uint64-32 2.077n ± 2% 1.854n ± 2% -10.74% (p=0.000 n=20) GlobalIntN1000-32 4.077n ± 2% 3.140n ± 3% -22.98% (p=0.000 n=20) IntN1000-32 3.476n ± 2% 2.496n ± 1% -28.19% (p=0.000 n=20) Int64N1000-32 3.059n ± 1% 2.510n ± 2% -17.96% (p=0.000 n=20) Int64N1e8-32 2.942n ± 1% 2.471n ± 2% -15.98% (p=0.000 n=20) Int64N1e9-32 2.932n ± 1% 2.488n ± 2% -15.14% (p=0.000 n=20) Int64N2e9-32 2.925n ± 1% 2.478n ± 2% -15.30% (p=0.000 n=20) Int64N1e18-32 3.116n ± 1% 3.088n ± 1% ~ (p=0.013 n=20) Int64N2e18-32 4.067n ± 1% 3.493n ± 1% -14.11% (p=0.000 n=20) Int64N4e18-32 4.054n ± 1% 5.060n ± 2% +24.80% (p=0.000 n=20) Int32N1000-32 2.951n ± 1% 2.620n ± 1% -11.22% (p=0.000 n=20) Int32N1e8-32 3.102n ± 1% 2.652n ± 0% -14.50% (p=0.000 n=20) Int32N1e9-32 3.535n ± 1% 2.644n ± 1% -25.20% (p=0.000 n=20) Int32N2e9-32 3.514n ± 1% 2.619n ± 2% -25.47% (p=0.000 n=20) Float32-32 2.760n ± 1% 2.261n ± 1% -18.06% (p=0.000 n=20) Float64-32 2.284n ± 1% 2.241n ± 2% ~ (p=0.016 n=20) ExpFloat64-32 3.757n ± 1% 3.716n ± 1% ~ (p=0.034 n=20) NormFloat64-32 3.837n ± 1% 3.718n ± 1% -3.09% (p=0.000 n=20) Perm3-32 35.23n ± 2% 34.11n ± 2% -3.19% (p=0.000 n=20) Perm30-32 208.8n ± 1% 200.6n ± 0% -3.93% (p=0.000 n=20) Perm30ViaShuffle-32 111.7n ± 1% 109.7n ± 1% -1.84% (p=0.000 n=20) ShuffleOverhead-32 101.1n ± 1% 107.2n ± 1% +6.03% (p=0.000 n=20) Concurrent-32 2.108n ± 7% 2.108n ± 6% ~ (p=0.644 n=20) PCG_DXSM-32 1.488n ± 2% goos: darwin goarch: arm64 pkg: math/rand/v2 cpu: Apple M1 │ 220860f76f.arm64 │ afa459a2f0.arm64 │ │ sec/op │ sec/op vs base │ SourceUint64-8 2.316n ± 1% 2.531n ± 0% +9.33% (p=0.000 n=20) GlobalInt64-8 2.183n ± 1% 2.177n ± 1% ~ (p=0.533 n=20) GlobalInt63Parallel-8 0.4331n ± 0% GlobalInt64Parallel-8 0.4319n ± 0% GlobalUint64-8 4.377n ± 2% 2.185n ± 1% -50.07% (p=0.000 n=20) GlobalUint64Parallel-8 0.9237n ± 0% 0.4295n ± 1% -53.50% (p=0.000 n=20) Int64-8 2.538n ± 1% 4.104n ± 0% +61.68% (p=0.000 n=20) Uint64-8 2.604n ± 1% 4.080n ± 0% +56.68% (p=0.000 n=20) GlobalIntN1000-8 3.857n ± 2% 2.814n ± 1% -27.04% (p=0.000 n=20) IntN1000-8 3.822n ± 2% 4.140n ± 0% +8.32% (p=0.000 n=20) Int64N1000-8 3.318n ± 0% 4.139n ± 0% +24.74% (p=0.000 n=20) Int64N1e8-8 3.349n ± 1% 4.140n ± 0% +23.64% (p=0.000 n=20) Int64N1e9-8 3.317n ± 2% 4.139n ± 0% +24.80% (p=0.000 n=20) Int64N2e9-8 3.317n ± 2% 4.140n ± 0% +24.81% (p=0.000 n=20) Int64N1e18-8 3.542n ± 1% 5.273n ± 0% +48.85% (p=0.000 n=20) Int64N2e18-8 5.087n ± 0% 6.059n ± 0% +19.12% (p=0.000 n=20) Int64N4e18-8 5.084n ± 0% 8.803n ± 0% +73.16% (p=0.000 n=20) Int32N1000-8 3.208n ± 2% 4.131n ± 0% +28.79% (p=0.000 n=20) Int32N1e8-8 3.610n ± 1% 4.131n ± 0% +14.43% (p=0.000 n=20) Int32N1e9-8 4.235n ± 0% 4.131n ± 0% -2.44% (p=0.000 n=20) Int32N2e9-8 4.229n ± 1% 4.131n ± 0% -2.33% (p=0.000 n=20) Float32-8 3.468n ± 0% 4.110n ± 0% +18.50% (p=0.000 n=20) Float64-8 3.447n ± 0% 4.104n ± 0% +19.05% (p=0.000 n=20) ExpFloat64-8 4.567n ± 0% 5.338n ± 0% +16.86% (p=0.000 n=20) NormFloat64-8 4.821n ± 0% 5.731n ± 0% +18.89% (p=0.000 n=20) Perm3-8 28.89n ± 0% 26.62n ± 0% -7.84% (p=0.000 n=20) Perm30-8 175.7n ± 0% 194.6n ± 2% +10.76% (p=0.000 n=20) Perm30ViaShuffle-8 153.5n ± 0% 156.4n ± 0% +1.86% (p=0.000 n=20) ShuffleOverhead-8 119.8n ± 1% 125.8n ± 0% +4.97% (p=0.000 n=20) Concurrent-8 2.433n ± 3% 2.654n ± 6% +9.13% (p=0.001 n=20) PCG_DXSM-8 2.531n ± 0% goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 220860f76f.386 │ afa459a2f0.386 │ │ sec/op │ sec/op vs base │ SourceUint64-32 2.370n ± 1% 7.680n ± 1% +224.05% (p=0.000 n=20) GlobalInt64-32 3.569n ± 1% 3.474n ± 3% -2.66% (p=0.001 n=20) GlobalInt63Parallel-32 0.3221n ± 1% GlobalInt64Parallel-32 0.3253n ± 0% GlobalUint64-32 8.797n ± 10% 3.433n ± 2% -60.98% (p=0.000 n=20) GlobalUint64Parallel-32 0.6351n ± 0% 0.3156n ± 0% -50.31% (p=0.000 n=20) Int64-32 2.612n ± 2% 7.707n ± 1% +195.04% (p=0.000 n=20) Uint64-32 3.350n ± 1% 7.714n ± 1% +130.25% (p=0.000 n=20) GlobalIntN1000-32 5.892n ± 1% 6.236n ± 1% +5.82% (p=0.000 n=20) IntN1000-32 4.546n ± 1% 10.410n ± 1% +128.97% (p=0.000 n=20) Int64N1000-32 14.59n ± 1% 10.97n ± 2% -24.75% (p=0.000 n=20) Int64N1e8-32 14.76n ± 2% 10.98n ± 1% -25.58% (p=0.000 n=20) Int64N1e9-32 16.57n ± 1% 10.95n ± 0% -33.90% (p=0.000 n=20) Int64N2e9-32 14.54n ± 1% 11.11n ± 1% -23.62% (p=0.000 n=20) Int64N1e18-32 16.14n ± 1% 15.18n ± 2% -5.95% (p=0.000 n=20) Int64N2e18-32 18.10n ± 1% 15.61n ± 1% -13.73% (p=0.000 n=20) Int64N4e18-32 18.65n ± 1% 19.23n ± 2% +3.08% (p=0.000 n=20) Int32N1000-32 3.560n ± 1% 10.345n ± 1% +190.55% (p=0.000 n=20) Int32N1e8-32 3.770n ± 2% 10.330n ± 1% +174.01% (p=0.000 n=20) Int32N1e9-32 4.098n ± 0% 10.350n ± 1% +152.53% (p=0.000 n=20) Int32N2e9-32 4.179n ± 1% 10.345n ± 1% +147.52% (p=0.000 n=20) Float32-32 21.18n ± 4% 13.57n ± 1% -35.93% (p=0.000 n=20) Float64-32 20.60n ± 2% 22.95n ± 4% +11.41% (p=0.000 n=20) ExpFloat64-32 13.07n ± 0% 15.23n ± 2% +16.48% (p=0.000 n=20) NormFloat64-32 7.738n ± 2% 13.780n ± 1% +78.08% (p=0.000 n=20) Perm3-32 36.73n ± 1% 46.62n ± 2% +26.91% (p=0.000 n=20) Perm30-32 211.9n ± 1% 400.7n ± 1% +89.05% (p=0.000 n=20) Perm30ViaShuffle-32 165.2n ± 1% 350.5n ± 1% +112.20% (p=0.000 n=20) ShuffleOverhead-32 133.9n ± 1% 326.0n ± 2% +143.37% (p=0.000 n=20) Concurrent-32 3.287n ± 2% 3.290n ± 0% ~ (p=0.365 n=20) PCG_DXSM-32 7.793n ± 2% For #61716. Change-Id: I4e9c0525b5f84a2ac46f23da9e365495e2d05777 Reviewed-on: https://go-review.googlesource.com/c/go/+/502506 Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

For the original math/rand, we ported Plan 9's random number generator, which was a refinement by Ken Thompson of an algorithm by Don Mitchell and Jim Reeds, which Mitchell in turn recalls as having been derived from an algorithm by Marsaglia. At its core, it is an additive lagged Fibonacci generator (ALFG). Whatever the details of the history, this generator is nowhere near the current state of the art for simple, pseudo-random generators. This CL adds an implementation of Melissa O'Neill's PCG, specifically the variant PCG-DXSM, which she defined after writing the PCG paper and which is now the default in Numpy. The update is slightly slower (a few multiplies and adds, instead of a few adds), but the state is dramatically smaller (2 words instead of 607). The statistical output properties are better too. A followup CL will delete the old generator. PCG is the only change here, so no benchmarks should be affected. Including them anyway as further evidence for caution. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 8993506f2f.amd64 │ 01ff938549.amd64 │ │ sec/op │ sec/op vs base │ SourceUint64-32 1.325n ± 1% 1.352n ± 1% +2.00% (p=0.000 n=20) GlobalInt64-32 2.240n ± 1% 2.083n ± 0% -7.03% (p=0.000 n=20) GlobalInt64Parallel-32 0.1041n ± 1% 0.1035n ± 1% ~ (p=0.064 n=20) GlobalUint64-32 2.072n ± 3% 2.038n ± 1% ~ (p=0.089 n=20) GlobalUint64Parallel-32 0.1008n ± 1% 0.1006n ± 1% ~ (p=0.804 n=20) Int64-32 1.716n ± 1% 1.687n ± 2% ~ (p=0.045 n=20) Uint64-32 1.665n ± 1% 1.674n ± 2% ~ (p=0.878 n=20) GlobalIntN1000-32 3.335n ± 1% 3.135n ± 1% -6.00% (p=0.000 n=20) IntN1000-32 2.484n ± 1% 2.478n ± 1% ~ (p=0.085 n=20) Int64N1000-32 2.502n ± 2% 2.455n ± 1% -1.88% (p=0.002 n=20) Int64N1e8-32 2.484n ± 2% 2.467n ± 2% ~ (p=0.048 n=20) Int64N1e9-32 2.502n ± 0% 2.454n ± 1% -1.92% (p=0.000 n=20) Int64N2e9-32 2.502n ± 0% 2.482n ± 1% -0.76% (p=0.000 n=20) Int64N1e18-32 3.201n ± 1% 3.349n ± 2% +4.62% (p=0.000 n=20) Int64N2e18-32 3.504n ± 1% 3.537n ± 1% ~ (p=0.185 n=20) Int64N4e18-32 4.873n ± 1% 4.917n ± 0% +0.90% (p=0.000 n=20) Int32N1000-32 2.639n ± 1% 2.386n ± 1% -9.57% (p=0.000 n=20) Int32N1e8-32 2.686n ± 2% 2.366n ± 1% -11.91% (p=0.000 n=20) Int32N1e9-32 2.636n ± 1% 2.355n ± 2% -10.70% (p=0.000 n=20) Int32N2e9-32 2.660n ± 1% 2.371n ± 1% -10.88% (p=0.000 n=20) Float32-32 2.261n ± 1% 2.245n ± 2% ~ (p=0.752 n=20) Float64-32 2.280n ± 1% 2.235n ± 1% -1.97% (p=0.007 n=20) ExpFloat64-32 3.891n ± 1% 3.813n ± 3% ~ (p=0.087 n=20) NormFloat64-32 3.711n ± 1% 3.652n ± 2% ~ (p=0.021 n=20) Perm3-32 32.60n ± 2% 33.12n ± 3% ~ (p=0.107 n=20) Perm30-32 204.2n ± 0% 205.1n ± 1% ~ (p=0.358 n=20) Perm30ViaShuffle-32 121.7n ± 2% 110.8n ± 1% -8.96% (p=0.000 n=20) ShuffleOverhead-32 106.2n ± 2% 113.0n ± 1% +6.36% (p=0.000 n=20) Concurrent-32 2.190n ± 5% 2.100n ± 0% -4.13% (p=0.001 n=20) PCG_DXSM-32 1.490n ± 0% goos: darwin goarch: arm64 pkg: math/rand/v2 cpu: Apple M1 │ 8993506f2f.arm64 │ 01ff938549.arm64 │ │ sec/op │ sec/op vs base │ SourceUint64-8 2.271n ± 0% 2.258n ± 1% ~ (p=0.167 n=20) GlobalInt64-8 2.161n ± 1% 2.167n ± 0% ~ (p=0.693 n=20) GlobalInt64Parallel-8 0.4303n ± 0% 0.4310n ± 0% ~ (p=0.051 n=20) GlobalUint64-8 2.164n ± 1% 2.182n ± 1% ~ (p=0.042 n=20) GlobalUint64Parallel-8 0.4287n ± 0% 0.4297n ± 0% ~ (p=0.082 n=20) Int64-8 2.478n ± 1% 2.472n ± 1% ~ (p=0.151 n=20) Uint64-8 2.460n ± 1% 2.449n ± 1% ~ (p=0.013 n=20) GlobalIntN1000-8 2.814n ± 2% 2.814n ± 2% ~ (p=0.821 n=20) IntN1000-8 3.003n ± 2% 2.998n ± 2% ~ (p=0.024 n=20) Int64N1000-8 2.954n ± 0% 2.949n ± 2% ~ (p=0.192 n=20) Int64N1e8-8 2.956n ± 0% 2.953n ± 2% ~ (p=0.109 n=20) Int64N1e9-8 3.325n ± 0% 2.950n ± 0% -11.26% (p=0.000 n=20) Int64N2e9-8 2.956n ± 2% 2.946n ± 2% ~ (p=0.027 n=20) Int64N1e18-8 3.780n ± 1% 3.779n ± 1% ~ (p=0.815 n=20) Int64N2e18-8 4.385n ± 0% 4.370n ± 1% ~ (p=0.402 n=20) Int64N4e18-8 6.527n ± 0% 6.544n ± 1% ~ (p=0.140 n=20) Int32N1000-8 2.964n ± 1% 2.950n ± 0% -0.47% (p=0.002 n=20) Int32N1e8-8 2.964n ± 1% 2.950n ± 2% ~ (p=0.013 n=20) Int32N1e9-8 2.963n ± 2% 2.951n ± 2% ~ (p=0.062 n=20) Int32N2e9-8 2.961n ± 2% 2.950n ± 2% -0.37% (p=0.002 n=20) Float32-8 3.442n ± 0% 3.441n ± 0% ~ (p=0.211 n=20) Float64-8 3.442n ± 0% 3.442n ± 0% ~ (p=0.067 n=20) ExpFloat64-8 4.472n ± 0% 4.481n ± 0% +0.20% (p=0.000 n=20) NormFloat64-8 4.734n ± 0% 4.725n ± 0% -0.19% (p=0.003 n=20) Perm3-8 26.55n ± 0% 26.55n ± 0% ~ (p=0.833 n=20) Perm30-8 181.9n ± 0% 181.9n ± 0% -0.03% (p=0.004 n=20) Perm30ViaShuffle-8 143.1n ± 0% 142.9n ± 0% ~ (p=0.204 n=20) ShuffleOverhead-8 120.6n ± 1% 120.8n ± 2% ~ (p=0.102 n=20) Concurrent-8 2.357n ± 2% 2.421n ± 6% ~ (p=0.016 n=20) PCG_DXSM-8 2.531n ± 0% goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 8993506f2f.386 │ 01ff938549.386 │ │ sec/op │ sec/op vs base │ SourceUint64-32 2.102n ± 2% 2.069n ± 0% ~ (p=0.021 n=20) GlobalInt64-32 3.542n ± 2% 3.456n ± 1% -2.44% (p=0.001 n=20) GlobalInt64Parallel-32 0.3202n ± 0% 0.3252n ± 0% +1.56% (p=0.000 n=20) GlobalUint64-32 3.507n ± 1% 3.573n ± 1% +1.87% (p=0.000 n=20) GlobalUint64Parallel-32 0.3170n ± 1% 0.3159n ± 0% ~ (p=0.167 n=20) Int64-32 2.516n ± 1% 2.562n ± 2% ~ (p=0.016 n=20) Uint64-32 2.544n ± 1% 2.592n ± 0% +1.85% (p=0.000 n=20) GlobalIntN1000-32 6.237n ± 1% 6.266n ± 2% ~ (p=0.268 n=20) IntN1000-32 4.670n ± 2% 4.724n ± 2% ~ (p=0.644 n=20) Int64N1000-32 5.412n ± 1% 5.490n ± 2% ~ (p=0.159 n=20) Int64N1e8-32 5.414n ± 2% 5.513n ± 2% ~ (p=0.129 n=20) Int64N1e9-32 5.473n ± 1% 5.476n ± 1% ~ (p=0.723 n=20) Int64N2e9-32 5.487n ± 1% 5.501n ± 2% ~ (p=0.481 n=20) Int64N1e18-32 8.901n ± 2% 9.043n ± 2% ~ (p=0.330 n=20) Int64N2e18-32 9.521n ± 1% 9.601n ± 2% ~ (p=0.703 n=20) Int64N4e18-32 11.92n ± 1% 12.00n ± 1% ~ (p=0.489 n=20) Int32N1000-32 4.785n ± 1% 4.829n ± 2% ~ (p=0.402 n=20) Int32N1e8-32 4.748n ± 1% 4.825n ± 2% ~ (p=0.218 n=20) Int32N1e9-32 4.810n ± 1% 4.830n ± 2% ~ (p=0.794 n=20) Int32N2e9-32 4.812n ± 1% 4.750n ± 2% ~ (p=0.057 n=20) Float32-32 10.48n ± 4% 10.89n ± 4% ~ (p=0.162 n=20) Float64-32 19.79n ± 3% 19.60n ± 4% ~ (p=0.668 n=20) ExpFloat64-32 12.91n ± 3% 12.96n ± 3% ~ (p=1.000 n=20) NormFloat64-32 7.462n ± 1% 7.516n ± 1% ~ (p=0.051 n=20) Perm3-32 35.98n ± 2% 36.78n ± 2% ~ (p=0.033 n=20) Perm30-32 241.5n ± 1% 238.9n ± 2% ~ (p=0.126 n=20) Perm30ViaShuffle-32 187.3n ± 2% 189.7n ± 2% ~ (p=0.387 n=20) ShuffleOverhead-32 160.2n ± 1% 159.8n ± 1% ~ (p=0.256 n=20) Concurrent-32 3.308n ± 3% 3.286n ± 1% ~ (p=0.038 n=20) PCG_DXSM-32 7.613n ± 1% For #61716. Change-Id: Icb274ca1f782504d658305a40159b4ae6a2f3f1d Reviewed-on: https://go-review.googlesource.com/c/go/+/502505 Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Rob Pike <r@golang.org>

The compiler says Perm is being inlined into BenchmarkPerm, and yet BenchmarkPerm30ViaShuffle, which you'd think is the same code, still runs significantly faster. The benchmarks are mystifying but this is clearly still a step in the right direction, since BenchmarkPerm30ViaShuffle is still the fastest and we avoid having two copies of that logic. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ e1bbe739fb.amd64 │ 8993506f2f.amd64 │ │ sec/op │ sec/op vs base │ SourceUint64-32 1.316n ± 2% 1.325n ± 1% ~ (p=0.208 n=20) GlobalInt64-32 2.048n ± 1% 2.240n ± 1% +9.38% (p=0.000 n=20) GlobalInt64Parallel-32 0.1037n ± 1% 0.1041n ± 1% ~ (p=0.774 n=20) GlobalUint64-32 2.039n ± 2% 2.072n ± 3% ~ (p=0.115 n=20) GlobalUint64Parallel-32 0.1013n ± 1% 0.1008n ± 1% ~ (p=0.417 n=20) Int64-32 1.692n ± 2% 1.716n ± 1% ~ (p=0.122 n=20) Uint64-32 1.643n ± 2% 1.665n ± 1% ~ (p=0.062 n=20) GlobalIntN1000-32 3.287n ± 1% 3.335n ± 1% ~ (p=0.147 n=20) IntN1000-32 2.678n ± 2% 2.484n ± 1% -7.24% (p=0.000 n=20) Int64N1000-32 2.684n ± 2% 2.502n ± 2% -6.80% (p=0.000 n=20) Int64N1e8-32 2.663n ± 2% 2.484n ± 2% -6.76% (p=0.000 n=20) Int64N1e9-32 2.633n ± 1% 2.502n ± 0% -4.98% (p=0.000 n=20) Int64N2e9-32 2.657n ± 1% 2.502n ± 0% -5.87% (p=0.000 n=20) Int64N1e18-32 3.125n ± 2% 3.201n ± 1% +2.43% (p=0.000 n=20) Int64N2e18-32 3.476n ± 1% 3.504n ± 1% +0.83% (p=0.009 n=20) Int64N4e18-32 4.795n ± 1% 4.873n ± 1% ~ (p=0.106 n=20) Int32N1000-32 2.485n ± 2% 2.639n ± 1% +6.20% (p=0.000 n=20) Int32N1e8-32 2.457n ± 1% 2.686n ± 2% +9.34% (p=0.000 n=20) Int32N1e9-32 2.452n ± 1% 2.636n ± 1% +7.52% (p=0.000 n=20) Int32N2e9-32 2.453n ± 1% 2.660n ± 1% +8.44% (p=0.000 n=20) Float32-32 2.254n ± 1% 2.261n ± 1% ~ (p=0.888 n=20) Float64-32 2.262n ± 1% 2.280n ± 1% ~ (p=0.040 n=20) ExpFloat64-32 3.777n ± 2% 3.891n ± 1% +3.03% (p=0.000 n=20) NormFloat64-32 3.606n ± 1% 3.711n ± 1% +2.91% (p=0.000 n=20) Perm3-32 33.12n ± 2% 32.60n ± 2% ~ (p=0.045 n=20) Perm30-32 176.1n ± 1% 204.2n ± 0% +15.96% (p=0.000 n=20) Perm30ViaShuffle-32 109.3n ± 1% 121.7n ± 2% +11.30% (p=0.000 n=20) ShuffleOverhead-32 112.5n ± 1% 106.2n ± 2% -5.56% (p=0.000 n=20) Concurrent-32 2.099n ± 0% 2.190n ± 5% +4.36% (p=0.001 n=20) goos: darwin goarch: arm64 pkg: math/rand/v2 cpu: Apple M1 │ e1bbe739fb.arm64 │ 8993506f2f.arm64 │ │ sec/op │ sec/op vs base │ SourceUint64-8 2.290n ± 1% 2.271n ± 0% ~ (p=0.015 n=20) GlobalInt64-8 2.180n ± 1% 2.161n ± 1% ~ (p=0.180 n=20) GlobalInt64Parallel-8 0.4294n ± 0% 0.4303n ± 0% +0.19% (p=0.001 n=20) GlobalUint64-8 2.170n ± 1% 2.164n ± 1% ~ (p=0.673 n=20) GlobalUint64Parallel-8 0.4283n ± 0% 0.4287n ± 0% ~ (p=0.128 n=20) Int64-8 2.481n ± 1% 2.478n ± 1% ~ (p=0.867 n=20) Uint64-8 2.464n ± 1% 2.460n ± 1% ~ (p=0.763 n=20) GlobalIntN1000-8 2.814n ± 0% 2.814n ± 2% ~ (p=0.969 n=20) IntN1000-8 2.934n ± 2% 3.003n ± 2% +2.35% (p=0.000 n=20) Int64N1000-8 2.957n ± 1% 2.954n ± 0% ~ (p=0.285 n=20) Int64N1e8-8 2.935n ± 2% 2.956n ± 0% +0.73% (p=0.002 n=20) Int64N1e9-8 2.935n ± 2% 3.325n ± 0% +13.29% (p=0.000 n=20) Int64N2e9-8 2.933n ± 4% 2.956n ± 2% ~ (p=0.163 n=20) Int64N1e18-8 3.781n ± 1% 3.780n ± 1% ~ (p=0.805 n=20) Int64N2e18-8 4.362n ± 0% 4.385n ± 0% ~ (p=0.077 n=20) Int64N4e18-8 6.576n ± 1% 6.527n ± 0% ~ (p=0.024 n=20) Int32N1000-8 2.942n ± 2% 2.964n ± 1% ~ (p=0.073 n=20) Int32N1e8-8 2.941n ± 1% 2.964n ± 1% ~ (p=0.058 n=20) Int32N1e9-8 2.938n ± 2% 2.963n ± 2% +0.87% (p=0.003 n=20) Int32N2e9-8 2.982n ± 2% 2.961n ± 2% ~ (p=0.056 n=20) Float32-8 3.441n ± 0% 3.442n ± 0% ~ (p=0.030 n=20) Float64-8 3.441n ± 0% 3.442n ± 0% +0.03% (p=0.001 n=20) ExpFloat64-8 4.472n ± 0% 4.472n ± 0% ~ (p=0.877 n=20) NormFloat64-8 4.716n ± 0% 4.734n ± 0% +0.38% (p=0.000 n=20) Perm3-8 26.66n ± 0% 26.55n ± 0% -0.39% (p=0.000 n=20) Perm30-8 143.3n ± 0% 181.9n ± 0% +26.97% (p=0.000 n=20) Perm30ViaShuffle-8 142.9n ± 0% 143.1n ± 0% ~ (p=0.669 n=20) ShuffleOverhead-8 121.1n ± 1% 120.6n ± 1% -0.41% (p=0.004 n=20) Concurrent-8 2.379n ± 2% 2.357n ± 2% ~ (p=0.337 n=20) goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ e1bbe739fb.386 │ 8993506f2f.386 │ │ sec/op │ sec/op vs base │ SourceUint64-32 2.087n ± 1% 2.102n ± 2% ~ (p=0.507 n=20) GlobalInt64-32 3.538n ± 2% 3.542n ± 2% ~ (p=0.425 n=20) GlobalInt64Parallel-32 0.3207n ± 1% 0.3202n ± 0% ~ (p=0.963 n=20) GlobalUint64-32 3.543n ± 1% 3.507n ± 1% ~ (p=0.034 n=20) GlobalUint64Parallel-32 0.3170n ± 0% 0.3170n ± 1% ~ (p=0.920 n=20) Int64-32 2.548n ± 1% 2.516n ± 1% ~ (p=0.139 n=20) Uint64-32 2.565n ± 2% 2.544n ± 1% ~ (p=0.394 n=20) GlobalIntN1000-32 6.300n ± 1% 6.237n ± 1% ~ (p=0.029 n=20) IntN1000-32 4.750n ± 0% 4.670n ± 2% ~ (p=0.034 n=20) Int64N1000-32 5.515n ± 2% 5.412n ± 1% -1.86% (p=0.009 n=20) Int64N1e8-32 5.527n ± 0% 5.414n ± 2% -2.05% (p=0.002 n=20) Int64N1e9-32 5.531n ± 2% 5.473n ± 1% ~ (p=0.047 n=20) Int64N2e9-32 5.514n ± 2% 5.487n ± 1% ~ (p=0.298 n=20) Int64N1e18-32 9.059n ± 1% 8.901n ± 2% ~ (p=0.037 n=20) Int64N2e18-32 9.594n ± 1% 9.521n ± 1% ~ (p=0.051 n=20) Int64N4e18-32 12.05n ± 2% 11.92n ± 1% ~ (p=0.357 n=20) Int32N1000-32 4.840n ± 2% 4.785n ± 1% ~ (p=0.189 n=20) Int32N1e8-32 4.832n ± 2% 4.748n ± 1% ~ (p=0.042 n=20) Int32N1e9-32 4.815n ± 2% 4.810n ± 1% ~ (p=0.878 n=20) Int32N2e9-32 4.813n ± 1% 4.812n ± 1% ~ (p=0.542 n=20) Float32-32 10.90n ± 2% 10.48n ± 4% -3.85% (p=0.007 n=20) Float64-32 20.32n ± 4% 19.79n ± 3% ~ (p=0.553 n=20) ExpFloat64-32 12.95n ± 3% 12.91n ± 3% ~ (p=0.909 n=20) NormFloat64-32 7.570n ± 1% 7.462n ± 1% -1.44% (p=0.004 n=20) Perm3-32 37.80n ± 2% 35.98n ± 2% -4.79% (p=0.000 n=20) Perm30-32 214.0n ± 1% 241.5n ± 1% +12.85% (p=0.000 n=20) Perm30ViaShuffle-32 188.7n ± 2% 187.3n ± 2% ~ (p=0.029 n=20) ShuffleOverhead-32 160.8n ± 1% 160.2n ± 1% ~ (p=0.180 n=20) Concurrent-32 3.288n ± 0% 3.308n ± 3% ~ (p=0.037 n=20) For #61716. Change-Id: I342b611456c3569520d3c91c849d29eba325d87e Reviewed-on: https://go-review.googlesource.com/c/go/+/502504 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Rob Pike <r@golang.org>

The original implementation of the ziggurat algorithm was designed for 32-bit random integer inputs. This necessitated reusing some low-order bits for the slice selection and the random coordinate, which introduces statistical bias. The result is that PractRand consistently fails the math/rand normal and exponential sequences (transformed to uniform) within 2 GB of variates. This change adjusts the ziggurat procedures to use 63-bit random inputs, so that there is no need to reuse bits between the slice and coordinate. This is sufficient for the normal sequence to survive to 256 GB of PractRand testing. An alternative technique is to recalculate the ziggurats to use 1024 rather than 128 or 256 slices to make full use of 64-bit inputs. This improves the survival of the normal sequence to far beyond 256 GB and additionally provides a 6% performance improvement due to the improved rejection procedure efficiency. However, doing so increases the total size of the ziggurat tables from 4.5 kB to 48 kB. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 2703446c2e.amd64 │ e1bbe739fb.amd64 │ │ sec/op │ sec/op vs base │ SourceUint64-32 1.337n ± 1% 1.316n ± 2% ~ (p=0.024 n=20) GlobalInt64-32 2.225n ± 2% 2.048n ± 1% -7.93% (p=0.000 n=20) GlobalInt64Parallel-32 0.1043n ± 2% 0.1037n ± 1% ~ (p=0.587 n=20) GlobalUint64-32 2.058n ± 1% 2.039n ± 2% ~ (p=0.030 n=20) GlobalUint64Parallel-32 0.1009n ± 1% 0.1013n ± 1% ~ (p=0.984 n=20) Int64-32 1.719n ± 2% 1.692n ± 2% ~ (p=0.085 n=20) Uint64-32 1.669n ± 1% 1.643n ± 2% ~ (p=0.049 n=20) GlobalIntN1000-32 3.321n ± 2% 3.287n ± 1% ~ (p=0.298 n=20) IntN1000-32 2.479n ± 1% 2.678n ± 2% +8.01% (p=0.000 n=20) Int64N1000-32 2.477n ± 1% 2.684n ± 2% +8.38% (p=0.000 n=20) Int64N1e8-32 2.490n ± 1% 2.663n ± 2% +6.99% (p=0.000 n=20) Int64N1e9-32 2.458n ± 1% 2.633n ± 1% +7.12% (p=0.000 n=20) Int64N2e9-32 2.486n ± 2% 2.657n ± 1% +6.90% (p=0.000 n=20) Int64N1e18-32 3.215n ± 2% 3.125n ± 2% -2.78% (p=0.000 n=20) Int64N2e18-32 3.588n ± 2% 3.476n ± 1% -3.15% (p=0.000 n=20) Int64N4e18-32 4.938n ± 2% 4.795n ± 1% -2.91% (p=0.000 n=20) Int32N1000-32 2.673n ± 2% 2.485n ± 2% -7.02% (p=0.000 n=20) Int32N1e8-32 2.631n ± 2% 2.457n ± 1% -6.63% (p=0.000 n=20) Int32N1e9-32 2.628n ± 2% 2.452n ± 1% -6.70% (p=0.000 n=20) Int32N2e9-32 2.684n ± 2% 2.453n ± 1% -8.61% (p=0.000 n=20) Float32-32 2.240n ± 2% 2.254n ± 1% ~ (p=0.878 n=20) Float64-32 2.253n ± 1% 2.262n ± 1% ~ (p=0.963 n=20) ExpFloat64-32 3.677n ± 1% 3.777n ± 2% +2.71% (p=0.004 n=20) NormFloat64-32 3.761n ± 1% 3.606n ± 1% -4.15% (p=0.000 n=20) Perm3-32 33.55n ± 2% 33.12n ± 2% ~ (p=0.402 n=20) Perm30-32 173.2n ± 1% 176.1n ± 1% +1.67% (p=0.000 n=20) Perm30ViaShuffle-32 115.9n ± 1% 109.3n ± 1% -5.69% (p=0.000 n=20) ShuffleOverhead-32 101.9n ± 1% 112.5n ± 1% +10.35% (p=0.000 n=20) Concurrent-32 2.107n ± 6% 2.099n ± 0% ~ (p=0.051 n=20) goos: darwin goarch: arm64 pkg: math/rand/v2 cpu: Apple M1 │ 2703446c2e.arm64 │ e1bbe739fb.arm64 │ │ sec/op │ sec/op vs base │ SourceUint64-8 2.275n ± 0% 2.290n ± 1% ~ (p=0.044 n=20) GlobalInt64-8 2.154n ± 1% 2.180n ± 1% ~ (p=0.068 n=20) GlobalInt64Parallel-8 0.4298n ± 0% 0.4294n ± 0% ~ (p=0.079 n=20) GlobalUint64-8 2.160n ± 1% 2.170n ± 1% ~ (p=0.129 n=20) GlobalUint64Parallel-8 0.4286n ± 0% 0.4283n ± 0% ~ (p=0.350 n=20) Int64-8 2.491n ± 1% 2.481n ± 1% ~ (p=0.330 n=20) Uint64-8 2.458n ± 0% 2.464n ± 1% ~ (p=0.351 n=20) GlobalIntN1000-8 2.814n ± 2% 2.814n ± 0% ~ (p=0.325 n=20) IntN1000-8 2.933n ± 0% 2.934n ± 2% ~ (p=0.079 n=20) Int64N1000-8 2.962n ± 1% 2.957n ± 1% ~ (p=0.259 n=20) Int64N1e8-8 2.960n ± 1% 2.935n ± 2% ~ (p=0.276 n=20) Int64N1e9-8 2.935n ± 2% 2.935n ± 2% ~ (p=0.984 n=20) Int64N2e9-8 2.934n ± 0% 2.933n ± 4% ~ (p=0.463 n=20) Int64N1e18-8 3.777n ± 1% 3.781n ± 1% ~ (p=0.516 n=20) Int64N2e18-8 4.359n ± 1% 4.362n ± 0% ~ (p=0.256 n=20) Int64N4e18-8 6.536n ± 1% 6.576n ± 1% ~ (p=0.224 n=20) Int32N1000-8 2.937n ± 0% 2.942n ± 2% ~ (p=0.312 n=20) Int32N1e8-8 2.937n ± 1% 2.941n ± 1% ~ (p=0.463 n=20) Int32N1e9-8 2.936n ± 0% 2.938n ± 2% ~ (p=0.044 n=20) Int32N2e9-8 2.938n ± 2% 2.982n ± 2% ~ (p=0.174 n=20) Float32-8 3.441n ± 0% 3.441n ± 0% ~ (p=0.064 n=20) Float64-8 3.441n ± 0% 3.441n ± 0% ~ (p=0.826 n=20) ExpFloat64-8 4.486n ± 0% 4.472n ± 0% -0.31% (p=0.000 n=20) NormFloat64-8 4.721n ± 0% 4.716n ± 0% ~ (p=0.051 n=20) Perm3-8 26.65n ± 0% 26.66n ± 0% ~ (p=0.080 n=20) Perm30-8 143.2n ± 0% 143.3n ± 0% +0.10% (p=0.000 n=20) Perm30ViaShuffle-8 143.0n ± 0% 142.9n ± 0% ~ (p=0.642 n=20) ShuffleOverhead-8 120.6n ± 1% 121.1n ± 1% +0.41% (p=0.010 n=20) Concurrent-8 2.399n ± 5% 2.379n ± 2% ~ (p=0.365 n=20) goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 2703446c2e.386 │ e1bbe739fb.386 │ │ sec/op │ sec/op vs base │ SourceUint64-32 2.072n ± 2% 2.087n ± 1% ~ (p=0.440 n=20) GlobalInt64-32 3.546n ± 27% 3.538n ± 2% ~ (p=0.101 n=20) GlobalInt64Parallel-32 0.3211n ± 0% 0.3207n ± 1% ~ (p=0.753 n=20) GlobalUint64-32 3.522n ± 2% 3.543n ± 1% ~ (p=0.071 n=20) GlobalUint64Parallel-32 0.3172n ± 0% 0.3170n ± 0% ~ (p=0.507 n=20) Int64-32 2.520n ± 2% 2.548n ± 1% ~ (p=0.267 n=20) Uint64-32 2.581n ± 1% 2.565n ± 2% ~ (p=0.143 n=20) GlobalIntN1000-32 6.171n ± 1% 6.300n ± 1% ~ (p=0.037 n=20) IntN1000-32 4.752n ± 2% 4.750n ± 0% ~ (p=0.984 n=20) Int64N1000-32 5.429n ± 1% 5.515n ± 2% ~ (p=0.292 n=20) Int64N1e8-32 5.469n ± 2% 5.527n ± 0% ~ (p=0.013 n=20) Int64N1e9-32 5.489n ± 2% 5.531n ± 2% ~ (p=0.256 n=20) Int64N2e9-32 5.492n ± 2% 5.514n ± 2% ~ (p=0.606 n=20) Int64N1e18-32 8.927n ± 1% 9.059n ± 1% ~ (p=0.229 n=20) Int64N2e18-32 9.622n ± 1% 9.594n ± 1% ~ (p=0.703 n=20) Int64N4e18-32 12.03n ± 1% 12.05n ± 2% ~ (p=0.733 n=20) Int32N1000-32 4.817n ± 1% 4.840n ± 2% ~ (p=0.941 n=20) Int32N1e8-32 4.801n ± 1% 4.832n ± 2% ~ (p=0.228 n=20) Int32N1e9-32 4.798n ± 1% 4.815n ± 2% ~ (p=0.560 n=20) Int32N2e9-32 4.840n ± 1% 4.813n ± 1% ~ (p=0.015 n=20) Float32-32 10.51n ± 4% 10.90n ± 2% +3.71% (p=0.007 n=20) Float64-32 20.33n ± 3% 20.32n ± 4% ~ (p=0.566 n=20) ExpFloat64-32 12.59n ± 2% 12.95n ± 3% +2.86% (p=0.002 n=20) NormFloat64-32 7.350n ± 2% 7.570n ± 1% +2.99% (p=0.007 n=20) Perm3-32 39.29n ± 2% 37.80n ± 2% -3.79% (p=0.000 n=20) Perm30-32 219.1n ± 2% 214.0n ± 1% -2.33% (p=0.002 n=20) Perm30ViaShuffle-32 189.8n ± 2% 188.7n ± 2% ~ (p=0.147 n=20) ShuffleOverhead-32 158.9n ± 2% 160.8n ± 1% ~ (p=0.176 n=20) Concurrent-32 3.306n ± 3% 3.288n ± 0% -0.54% (p=0.005 n=20) For #61716. Change-Id: I4c5fe710b310dc075ae21c97d1805bcc20db5050 Reviewed-on: https://go-review.googlesource.com/c/go/+/516275 Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Rob Pike <r@golang.org>

We realized too late after Go 1 that float64(r.Uint64())/(1<<64) is not a correct implementation: it occasionally rounds to 1. The correct implementation is float64(r.Uint64()&(1<<53-1))/(1<<53) but we couldn't change the implementation for compatibility, so we changed it to retry only in the "round to 1" cases. The change to v2 lets us update the algorithm to the simpler, faster one. Note that this implementation cannot generate 2⁻⁵⁴, nor 2⁻¹⁰⁰, nor any of the other numbers between 0 and 2⁻⁵³. A slower algorithm could shift some of the probability of generating these two boundary values over to the values in between, but that would be much slower and not necessarily be better. In particular, the current implementation has the property that there are uniform gaps between the possible returned floats, which might help stability. Also, the result is often scaled and shifted, like Float64()*X+Y. Multiplying by X>1 would open new gaps, and adding most Y would erase all the distinctions that were introduced. The only changes to benchmarks should be in Float32 and Float64. The other changes remain a cautionary tale. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 4d84a369d1.amd64 │ 2703446c2e.amd64 │ │ sec/op │ sec/op vs base │ SourceUint64-32 1.348n ± 2% 1.337n ± 1% ~ (p=0.662 n=20) GlobalInt64-32 2.082n ± 2% 2.225n ± 2% +6.87% (p=0.000 n=20) GlobalInt64Parallel-32 0.1036n ± 1% 0.1043n ± 2% ~ (p=0.171 n=20) GlobalUint64-32 2.077n ± 2% 2.058n ± 1% ~ (p=0.560 n=20) GlobalUint64Parallel-32 0.1012n ± 1% 0.1009n ± 1% ~ (p=0.995 n=20) Int64-32 1.750n ± 0% 1.719n ± 2% -1.74% (p=0.000 n=20) Uint64-32 1.707n ± 2% 1.669n ± 1% -2.20% (p=0.000 n=20) GlobalIntN1000-32 3.192n ± 1% 3.321n ± 2% +4.04% (p=0.000 n=20) IntN1000-32 2.462n ± 2% 2.479n ± 1% ~ (p=0.417 n=20) Int64N1000-32 2.470n ± 1% 2.477n ± 1% ~ (p=0.664 n=20) Int64N1e8-32 2.503n ± 2% 2.490n ± 1% ~ (p=0.245 n=20) Int64N1e9-32 2.487n ± 1% 2.458n ± 1% ~ (p=0.032 n=20) Int64N2e9-32 2.487n ± 1% 2.486n ± 2% ~ (p=0.507 n=20) Int64N1e18-32 3.006n ± 2% 3.215n ± 2% +6.94% (p=0.000 n=20) Int64N2e18-32 3.368n ± 1% 3.588n ± 2% +6.55% (p=0.000 n=20) Int64N4e18-32 4.763n ± 1% 4.938n ± 2% +3.69% (p=0.000 n=20) Int32N1000-32 2.403n ± 1% 2.673n ± 2% +11.19% (p=0.000 n=20) Int32N1e8-32 2.405n ± 1% 2.631n ± 2% +9.42% (p=0.000 n=20) Int32N1e9-32 2.402n ± 2% 2.628n ± 2% +9.41% (p=0.000 n=20) Int32N2e9-32 2.384n ± 1% 2.684n ± 2% +12.56% (p=0.000 n=20) Float32-32 2.641n ± 2% 2.240n ± 2% -15.18% (p=0.000 n=20) Float64-32 2.483n ± 1% 2.253n ± 1% -9.26% (p=0.000 n=20) ExpFloat64-32 3.486n ± 2% 3.677n ± 1% +5.49% (p=0.000 n=20) NormFloat64-32 3.648n ± 1% 3.761n ± 1% +3.11% (p=0.000 n=20) Perm3-32 33.04n ± 1% 33.55n ± 2% ~ (p=0.180 n=20) Perm30-32 171.9n ± 1% 173.2n ± 1% ~ (p=0.050 n=20) Perm30ViaShuffle-32 100.3n ± 1% 115.9n ± 1% +15.55% (p=0.000 n=20) ShuffleOverhead-32 102.5n ± 1% 101.9n ± 1% ~ (p=0.266 n=20) Concurrent-32 2.101n ± 0% 2.107n ± 6% ~ (p=0.212 n=20) goos: darwin goarch: arm64 pkg: math/rand/v2 cpu: Apple M1 │ 4d84a369d1.arm64 │ 2703446c2e.arm64 │ │ sec/op │ sec/op vs base │ SourceUint64-8 2.261n ± 1% 2.275n ± 0% ~ (p=0.082 n=20) GlobalInt64-8 2.160n ± 1% 2.154n ± 1% ~ (p=0.490 n=20) GlobalInt64Parallel-8 0.4299n ± 0% 0.4298n ± 0% ~ (p=0.663 n=20) GlobalUint64-8 2.169n ± 1% 2.160n ± 1% ~ (p=0.292 n=20) GlobalUint64Parallel-8 0.4293n ± 1% 0.4286n ± 0% ~ (p=0.155 n=20) Int64-8 2.473n ± 1% 2.491n ± 1% ~ (p=0.317 n=20) Uint64-8 2.453n ± 1% 2.458n ± 0% ~ (p=0.941 n=20) GlobalIntN1000-8 2.814n ± 2% 2.814n ± 2% ~ (p=0.972 n=20) IntN1000-8 2.933n ± 2% 2.933n ± 0% ~ (p=0.287 n=20) Int64N1000-8 2.934n ± 2% 2.962n ± 1% ~ (p=0.062 n=20) Int64N1e8-8 2.935n ± 2% 2.960n ± 1% ~ (p=0.183 n=20) Int64N1e9-8 2.934n ± 2% 2.935n ± 2% ~ (p=0.367 n=20) Int64N2e9-8 2.935n ± 2% 2.934n ± 0% ~ (p=0.455 n=20) Int64N1e18-8 3.778n ± 1% 3.777n ± 1% ~ (p=0.995 n=20) Int64N2e18-8 4.359n ± 1% 4.359n ± 1% ~ (p=0.122 n=20) Int64N4e18-8 6.546n ± 1% 6.536n ± 1% ~ (p=0.920 n=20) Int32N1000-8 2.940n ± 2% 2.937n ± 0% ~ (p=0.149 n=20) Int32N1e8-8 2.937n ± 2% 2.937n ± 1% ~ (p=0.620 n=20) Int32N1e9-8 2.938n ± 0% 2.936n ± 0% ~ (p=0.046 n=20) Int32N2e9-8 2.938n ± 2% 2.938n ± 2% ~ (p=0.455 n=20) Float32-8 3.486n ± 0% 3.441n ± 0% -1.28% (p=0.000 n=20) Float64-8 3.480n ± 0% 3.441n ± 0% -1.13% (p=0.000 n=20) ExpFloat64-8 4.533n ± 0% 4.486n ± 0% -1.03% (p=0.000 n=20) NormFloat64-8 4.764n ± 0% 4.721n ± 0% -0.90% (p=0.000 n=20) Perm3-8 26.66n ± 0% 26.65n ± 0% ~ (p=0.019 n=20) Perm30-8 143.4n ± 0% 143.2n ± 0% -0.17% (p=0.000 n=20) Perm30ViaShuffle-8 142.9n ± 0% 143.0n ± 0% ~ (p=0.522 n=20) ShuffleOverhead-8 120.7n ± 0% 120.6n ± 1% ~ (p=0.488 n=20) Concurrent-8 2.360n ± 2% 2.399n ± 5% ~ (p=0.062 n=20) goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 4d84a369d1.386 │ 2703446c2e.386 │ │ sec/op │ sec/op vs base │ SourceUint64-32 2.101n ± 2% 2.072n ± 2% ~ (p=0.273 n=20) GlobalInt64-32 3.518n ± 2% 3.546n ± 27% +0.78% (p=0.007 n=20) GlobalInt64Parallel-32 0.3206n ± 0% 0.3211n ± 0% ~ (p=0.386 n=20) GlobalUint64-32 3.538n ± 1% 3.522n ± 2% ~ (p=0.331 n=20) GlobalUint64Parallel-32 0.3231n ± 0% 0.3172n ± 0% -1.84% (p=0.000 n=20) Int64-32 2.554n ± 2% 2.520n ± 2% ~ (p=0.465 n=20) Uint64-32 2.575n ± 2% 2.581n ± 1% ~ (p=0.213 n=20) GlobalIntN1000-32 6.292n ± 1% 6.171n ± 1% ~ (p=0.015 n=20) IntN1000-32 4.735n ± 1% 4.752n ± 2% ~ (p=0.635 n=20) Int64N1000-32 5.489n ± 2% 5.429n ± 1% ~ (p=0.324 n=20) Int64N1e8-32 5.528n ± 2% 5.469n ± 2% ~ (p=0.013 n=20) Int64N1e9-32 5.438n ± 2% 5.489n ± 2% ~ (p=0.984 n=20) Int64N2e9-32 5.474n ± 1% 5.492n ± 2% ~ (p=0.616 n=20) Int64N1e18-32 9.053n ± 1% 8.927n ± 1% ~ (p=0.037 n=20) Int64N2e18-32 9.685n ± 2% 9.622n ± 1% ~ (p=0.449 n=20) Int64N4e18-32 12.18n ± 1% 12.03n ± 1% ~ (p=0.013 n=20) Int32N1000-32 4.862n ± 1% 4.817n ± 1% -0.94% (p=0.002 n=20) Int32N1e8-32 4.758n ± 2% 4.801n ± 1% ~ (p=0.597 n=20) Int32N1e9-32 4.772n ± 1% 4.798n ± 1% ~ (p=0.774 n=20) Int32N2e9-32 4.847n ± 0% 4.840n ± 1% ~ (p=0.867 n=20) Float32-32 22.18n ± 4% 10.51n ± 4% -52.61% (p=0.000 n=20) Float64-32 21.21n ± 3% 20.33n ± 3% -4.17% (p=0.000 n=20) ExpFloat64-32 12.39n ± 2% 12.59n ± 2% ~ (p=0.139 n=20) NormFloat64-32 7.422n ± 1% 7.350n ± 2% ~ (p=0.208 n=20) Perm3-32 38.00n ± 2% 39.29n ± 2% +3.38% (p=0.000 n=20) Perm30-32 212.7n ± 1% 219.1n ± 2% +3.03% (p=0.001 n=20) Perm30ViaShuffle-32 187.5n ± 2% 189.8n ± 2% ~ (p=0.457 n=20) ShuffleOverhead-32 159.7n ± 1% 158.9n ± 2% ~ (p=0.920 n=20) Concurrent-32 3.470n ± 0% 3.306n ± 3% -4.71% (p=0.000 n=20) For #61716. Change-Id: I1933f1f9efd7e6e832d83e7fa5d84398f67d41f5 Reviewed-on: https://go-review.googlesource.com/c/go/+/502503 Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

Now that we can break the value stream, we can take advantage of better algorithms that have been suggested since the original code was written. Also optimizes IntN, Int32N, Int64N, Perm (indirectly). All the N variants (IntN, Int32N, Int64N, UintN, N, etc) now return the same values given a Source and parameter n, so that for example uint(r.IntN(10)) and r.UintN(10) and r.N(uint(10)) are completely interchangeable. Int64N4e18 gets slower but that is a near worst case for the algorithm and is extremely unlikely in practice. 32-bit Int32N variants got slower too, by 15-30%, in exchange for speeding up everything on 64-bit systems and consistency across the N functions. Also rename previously missed benchmark GlobalInt63Parallel to GlobalInt64Parallel. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 11ad9fdddc.amd64 │ 4d84a369d1.amd64 │ │ sec/op │ sec/op vs base │ SourceUint64-32 1.335n ± 1% 1.348n ± 2% ~ (p=0.335 n=20) GlobalInt64-32 2.046n ± 1% 2.082n ± 2% ~ (p=0.310 n=20) GlobalInt63Parallel-32 0.1037n ± 1% GlobalInt64Parallel-32 0.1036n ± 1% GlobalUint64-32 2.075n ± 0% 2.077n ± 2% ~ (p=0.228 n=20) GlobalUint64Parallel-32 0.1013n ± 1% 0.1012n ± 1% ~ (p=0.878 n=20) Int64-32 1.726n ± 2% 1.750n ± 0% +1.39% (p=0.000 n=20) Uint64-32 1.673n ± 1% 1.707n ± 2% +2.03% (p=0.002 n=20) GlobalIntN1000-32 3.895n ± 2% 3.192n ± 1% -18.05% (p=0.000 n=20) IntN1000-32 3.403n ± 1% 2.462n ± 2% -27.65% (p=0.000 n=20) Int64N1000-32 3.053n ± 2% 2.470n ± 1% -19.11% (p=0.000 n=20) Int64N1e8-32 2.718n ± 1% 2.503n ± 2% -7.91% (p=0.000 n=20) Int64N1e9-32 2.712n ± 1% 2.487n ± 1% -8.31% (p=0.000 n=20) Int64N2e9-32 2.690n ± 1% 2.487n ± 1% -7.57% (p=0.000 n=20) Int64N1e18-32 3.084n ± 2% 3.006n ± 2% -2.53% (p=0.000 n=20) Int64N2e18-32 4.026n ± 1% 3.368n ± 1% -16.33% (p=0.000 n=20) Int64N4e18-32 4.049n ± 2% 4.763n ± 1% +17.62% (p=0.000 n=20) Int32N1000-32 2.730n ± 0% 2.403n ± 1% -11.94% (p=0.000 n=20) Int32N1e8-32 2.916n ± 2% 2.405n ± 1% -17.53% (p=0.000 n=20) Int32N1e9-32 3.375n ± 1% 2.402n ± 2% -28.83% (p=0.000 n=20) Int32N2e9-32 3.292n ± 1% 2.384n ± 1% -27.58% (p=0.000 n=20) Float32-32 2.673n ± 1% 2.641n ± 2% ~ (p=0.147 n=20) Float64-32 2.485n ± 1% 2.483n ± 1% ~ (p=0.804 n=20) ExpFloat64-32 3.577n ± 2% 3.486n ± 2% -2.57% (p=0.000 n=20) NormFloat64-32 3.797n ± 2% 3.648n ± 1% -3.92% (p=0.000 n=20) Perm3-32 35.79n ± 2% 33.04n ± 1% -7.68% (p=0.000 n=20) Perm30-32 205.1n ± 1% 171.9n ± 1% -16.14% (p=0.000 n=20) Perm30ViaShuffle-32 111.2n ± 2% 100.3n ± 1% -9.76% (p=0.000 n=20) ShuffleOverhead-32 100.5n ± 2% 102.5n ± 1% +1.99% (p=0.007 n=20) Concurrent-32 2.188n ± 5% 2.101n ± 0% ~ (p=0.013 n=20) goos: darwin goarch: arm64 pkg: math/rand/v2 cpu: Apple M1 │ 11ad9fdddc.arm64 │ 4d84a369d1.arm64 │ │ sec/op │ sec/op vs base │ SourceUint64-8 2.272n ± 1% 2.261n ± 1% ~ (p=0.172 n=20) GlobalInt64-8 2.155n ± 1% 2.160n ± 1% ~ (p=0.482 n=20) GlobalInt63Parallel-8 0.4352n ± 0% GlobalInt64Parallel-8 0.4299n ± 0% GlobalUint64-8 2.173n ± 1% 2.169n ± 1% ~ (p=0.262 n=20) GlobalUint64Parallel-8 0.4340n ± 0% 0.4293n ± 1% -1.08% (p=0.000 n=20) Int64-8 2.544n ± 1% 2.473n ± 1% -2.83% (p=0.000 n=20) Uint64-8 2.552n ± 1% 2.453n ± 1% -3.90% (p=0.000 n=20) GlobalIntN1000-8 3.856n ± 0% 2.814n ± 2% -27.02% (p=0.000 n=20) IntN1000-8 3.820n ± 0% 2.933n ± 2% -23.22% (p=0.000 n=20) Int64N1000-8 3.219n ± 2% 2.934n ± 2% -8.85% (p=0.000 n=20) Int64N1e8-8 3.221n ± 2% 2.935n ± 2% -8.91% (p=0.000 n=20) Int64N1e9-8 3.276n ± 2% 2.934n ± 2% -10.44% (p=0.000 n=20) Int64N2e9-8 3.217n ± 0% 2.935n ± 2% -8.78% (p=0.000 n=20) Int64N1e18-8 3.502n ± 2% 3.778n ± 1% +7.91% (p=0.000 n=20) Int64N2e18-8 4.968n ± 1% 4.359n ± 1% -12.26% (p=0.000 n=20) Int64N4e18-8 4.963n ± 0% 6.546n ± 1% +31.92% (p=0.000 n=20) Int32N1000-8 3.189n ± 1% 2.940n ± 2% -7.81% (p=0.000 n=20) Int32N1e8-8 3.514n ± 1% 2.937n ± 2% -16.41% (p=0.000 n=20) Int32N1e9-8 4.133n ± 0% 2.938n ± 0% -28.91% (p=0.000 n=20) Int32N2e9-8 4.137n ± 0% 2.938n ± 2% -28.97% (p=0.000 n=20) Float32-8 3.468n ± 1% 3.486n ± 0% +0.52% (p=0.000 n=20) Float64-8 3.478n ± 0% 3.480n ± 0% ~ (p=0.063 n=20) ExpFloat64-8 4.563n ± 0% 4.533n ± 0% -0.67% (p=0.000 n=20) NormFloat64-8 4.768n ± 0% 4.764n ± 0% -0.07% (p=0.001 n=20) Perm3-8 28.94n ± 0% 26.66n ± 0% -7.88% (p=0.000 n=20) Perm30-8 175.9n ± 0% 143.4n ± 0% -18.50% (p=0.000 n=20) Perm30ViaShuffle-8 152.6n ± 1% 142.9n ± 0% -6.29% (p=0.000 n=20) ShuffleOverhead-8 119.6n ± 1% 120.7n ± 0% +0.96% (p=0.000 n=20) Concurrent-8 2.452n ± 3% 2.360n ± 2% -3.73% (p=0.007 n=20) goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 11ad9fdddc.386 │ 4d84a369d1.386 │ │ sec/op │ sec/op vs base │ SourceUint64-32 2.091n ± 1% 2.101n ± 2% ~ (p=0.672 n=20) GlobalInt64-32 3.514n ± 2% 3.518n ± 2% ~ (p=0.723 n=20) GlobalInt63Parallel-32 0.3197n ± 0% GlobalInt64Parallel-32 0.3206n ± 0% GlobalUint64-32 3.542n ± 1% 3.538n ± 1% ~ (p=0.304 n=20) GlobalUint64Parallel-32 0.3218n ± 0% 0.3231n ± 0% ~ (p=0.071 n=20) Int64-32 2.552n ± 2% 2.554n ± 2% ~ (p=0.693 n=20) Uint64-32 2.566n ± 1% 2.575n ± 2% ~ (p=0.606 n=20) GlobalIntN1000-32 5.965n ± 2% 6.292n ± 1% +5.46% (p=0.000 n=20) IntN1000-32 4.652n ± 1% 4.735n ± 1% +1.77% (p=0.000 n=20) Int64N1000-32 14.485n ± 1% 5.489n ± 2% -62.11% (p=0.000 n=20) Int64N1e8-32 14.675n ± 1% 5.528n ± 2% -62.33% (p=0.000 n=20) Int64N1e9-32 16.805n ± 2% 5.438n ± 2% -67.64% (p=0.000 n=20) Int64N2e9-32 14.515n ± 1% 5.474n ± 1% -62.28% (p=0.000 n=20) Int64N1e18-32 16.165n ± 1% 9.053n ± 1% -44.00% (p=0.000 n=20) Int64N2e18-32 17.945n ± 2% 9.685n ± 2% -46.03% (p=0.000 n=20) Int64N4e18-32 18.35n ± 2% 12.18n ± 1% -33.62% (p=0.000 n=20) Int32N1000-32 3.608n ± 1% 4.862n ± 1% +34.77% (p=0.000 n=20) Int32N1e8-32 3.767n ± 1% 4.758n ± 2% +26.31% (p=0.000 n=20) Int32N1e9-32 4.130n ± 2% 4.772n ± 1% +15.54% (p=0.000 n=20) Int32N2e9-32 4.206n ± 1% 4.847n ± 0% +15.24% (p=0.000 n=20) Float32-32 22.18n ± 4% 22.18n ± 4% ~ (p=0.195 n=20) Float64-32 20.75n ± 4% 21.21n ± 3% ~ (p=0.394 n=20) ExpFloat64-32 12.58n ± 3% 12.39n ± 2% ~ (p=0.032 n=20) NormFloat64-32 7.920n ± 3% 7.422n ± 1% -6.29% (p=0.000 n=20) Perm3-32 40.27n ± 1% 38.00n ± 2% -5.65% (p=0.000 n=20) Perm30-32 213.2n ± 2% 212.7n ± 1% ~ (p=0.995 n=20) Perm30ViaShuffle-32 164.2n ± 2% 187.5n ± 2% +14.22% (p=0.000 n=20) ShuffleOverhead-32 134.7n ± 2% 159.7n ± 1% +18.52% (p=0.000 n=20) Concurrent-32 3.301n ± 2% 3.470n ± 0% +5.10% (p=0.000 n=20) For #61716. Change-Id: Id1481b04202883cd0b23e21bb58d1bca4e482bd3 Reviewed-on: https://go-review.googlesource.com/c/go/+/502500 Reviewed-by: Rob Pike <r@golang.org> Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

This should make Uint64-using functions faster and leave other things alone. It is a mystery why so much got faster. A good cautionary tale not to read too much into minor jitter in the benchmarks. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 220860f76f.amd64 │ 11ad9fdddc.amd64 │ │ sec/op │ sec/op vs base │ SourceUint64-32 1.555n ± 1% 1.335n ± 1% -14.15% (p=0.000 n=20) GlobalInt64-32 2.071n ± 1% 2.046n ± 1% ~ (p=0.016 n=20) GlobalInt63Parallel-32 0.1023n ± 1% 0.1037n ± 1% +1.37% (p=0.002 n=20) GlobalUint64-32 5.193n ± 1% 2.075n ± 0% -60.06% (p=0.000 n=20) GlobalUint64Parallel-32 0.2341n ± 0% 0.1013n ± 1% -56.74% (p=0.000 n=20) Int64-32 2.056n ± 2% 1.726n ± 2% -16.10% (p=0.000 n=20) Uint64-32 2.077n ± 2% 1.673n ± 1% -19.46% (p=0.000 n=20) GlobalIntN1000-32 4.077n ± 2% 3.895n ± 2% -4.45% (p=0.000 n=20) IntN1000-32 3.476n ± 2% 3.403n ± 1% -2.10% (p=0.000 n=20) Int64N1000-32 3.059n ± 1% 3.053n ± 2% ~ (p=0.131 n=20) Int64N1e8-32 2.942n ± 1% 2.718n ± 1% -7.60% (p=0.000 n=20) Int64N1e9-32 2.932n ± 1% 2.712n ± 1% -7.50% (p=0.000 n=20) Int64N2e9-32 2.925n ± 1% 2.690n ± 1% -8.03% (p=0.000 n=20) Int64N1e18-32 3.116n ± 1% 3.084n ± 2% ~ (p=0.425 n=20) Int64N2e18-32 4.067n ± 1% 4.026n ± 1% -1.02% (p=0.007 n=20) Int64N4e18-32 4.054n ± 1% 4.049n ± 2% ~ (p=0.204 n=20) Int32N1000-32 2.951n ± 1% 2.730n ± 0% -7.49% (p=0.000 n=20) Int32N1e8-32 3.102n ± 1% 2.916n ± 2% -6.03% (p=0.000 n=20) Int32N1e9-32 3.535n ± 1% 3.375n ± 1% -4.54% (p=0.000 n=20) Int32N2e9-32 3.514n ± 1% 3.292n ± 1% -6.30% (p=0.000 n=20) Float32-32 2.760n ± 1% 2.673n ± 1% -3.13% (p=0.000 n=20) Float64-32 2.284n ± 1% 2.485n ± 1% +8.80% (p=0.000 n=20) ExpFloat64-32 3.757n ± 1% 3.577n ± 2% -4.78% (p=0.000 n=20) NormFloat64-32 3.837n ± 1% 3.797n ± 2% ~ (p=0.204 n=20) Perm3-32 35.23n ± 2% 35.79n ± 2% ~ (p=0.298 n=20) Perm30-32 208.8n ± 1% 205.1n ± 1% -1.82% (p=0.000 n=20) Perm30ViaShuffle-32 111.7n ± 1% 111.2n ± 2% ~ (p=0.273 n=20) ShuffleOverhead-32 101.1n ± 1% 100.5n ± 2% ~ (p=0.878 n=20) Concurrent-32 2.108n ± 7% 2.188n ± 5% ~ (p=0.417 n=20) goos: darwin goarch: arm64 pkg: math/rand/v2 │ 220860f76f.arm64 │ 11ad9fdddc.arm64 │ │ sec/op │ sec/op vs base │ SourceUint64-8 2.316n ± 1% 2.272n ± 1% -1.86% (p=0.000 n=20) GlobalInt64-8 2.183n ± 1% 2.155n ± 1% ~ (p=0.122 n=20) GlobalInt63Parallel-8 0.4331n ± 0% 0.4352n ± 0% +0.48% (p=0.000 n=20) GlobalUint64-8 4.377n ± 2% 2.173n ± 1% -50.35% (p=0.000 n=20) GlobalUint64Parallel-8 0.9237n ± 0% 0.4340n ± 0% -53.02% (p=0.000 n=20) Int64-8 2.538n ± 1% 2.544n ± 1% ~ (p=0.189 n=20) Uint64-8 2.604n ± 1% 2.552n ± 1% -1.98% (p=0.000 n=20) GlobalIntN1000-8 3.857n ± 2% 3.856n ± 0% ~ (p=0.051 n=20) IntN1000-8 3.822n ± 2% 3.820n ± 0% -0.05% (p=0.001 n=20) Int64N1000-8 3.318n ± 0% 3.219n ± 2% -2.98% (p=0.000 n=20) Int64N1e8-8 3.349n ± 1% 3.221n ± 2% -3.79% (p=0.000 n=20) Int64N1e9-8 3.317n ± 2% 3.276n ± 2% -1.24% (p=0.001 n=20) Int64N2e9-8 3.317n ± 2% 3.217n ± 0% -3.01% (p=0.000 n=20) Int64N1e18-8 3.542n ± 1% 3.502n ± 2% -1.16% (p=0.001 n=20) Int64N2e18-8 5.087n ± 0% 4.968n ± 1% -2.33% (p=0.000 n=20) Int64N4e18-8 5.084n ± 0% 4.963n ± 0% -2.39% (p=0.000 n=20) Int32N1000-8 3.208n ± 2% 3.189n ± 1% -0.58% (p=0.001 n=20) Int32N1e8-8 3.610n ± 1% 3.514n ± 1% -2.67% (p=0.000 n=20) Int32N1e9-8 4.235n ± 0% 4.133n ± 0% -2.40% (p=0.000 n=20) Int32N2e9-8 4.229n ± 1% 4.137n ± 0% -2.19% (p=0.000 n=20) Float32-8 3.468n ± 0% 3.468n ± 1% ~ (p=0.350 n=20) Float64-8 3.447n ± 0% 3.478n ± 0% +0.90% (p=0.000 n=20) ExpFloat64-8 4.567n ± 0% 4.563n ± 0% -0.10% (p=0.002 n=20) NormFloat64-8 4.821n ± 0% 4.768n ± 0% -1.09% (p=0.000 n=20) Perm3-8 28.89n ± 0% 28.94n ± 0% +0.17% (p=0.000 n=20) Perm30-8 175.7n ± 0% 175.9n ± 0% +0.14% (p=0.000 n=20) Perm30ViaShuffle-8 153.5n ± 0% 152.6n ± 1% ~ (p=0.010 n=20) ShuffleOverhead-8 119.8n ± 1% 119.6n ± 1% ~ (p=0.147 n=20) Concurrent-8 2.433n ± 3% 2.452n ± 3% ~ (p=0.616 n=20) goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 220860f76f.386 │ 11ad9fdddc.386 │ │ sec/op │ sec/op vs base │ SourceUint64-32 2.370n ± 1% 2.091n ± 1% -11.75% (p=0.000 n=20) GlobalInt64-32 3.569n ± 1% 3.514n ± 2% -1.56% (p=0.000 n=20) GlobalInt63Parallel-32 0.3221n ± 1% 0.3197n ± 0% -0.76% (p=0.000 n=20) GlobalUint64-32 8.797n ± 10% 3.542n ± 1% -59.74% (p=0.000 n=20) GlobalUint64Parallel-32 0.6351n ± 0% 0.3218n ± 0% -49.33% (p=0.000 n=20) Int64-32 2.612n ± 2% 2.552n ± 2% -2.30% (p=0.000 n=20) Uint64-32 3.350n ± 1% 2.566n ± 1% -23.42% (p=0.000 n=20) GlobalIntN1000-32 5.892n ± 1% 5.965n ± 2% ~ (p=0.082 n=20) IntN1000-32 4.546n ± 1% 4.652n ± 1% +2.33% (p=0.000 n=20) Int64N1000-32 14.59n ± 1% 14.48n ± 1% ~ (p=0.652 n=20) Int64N1e8-32 14.76n ± 2% 14.67n ± 1% ~ (p=0.836 n=20) Int64N1e9-32 16.57n ± 1% 16.80n ± 2% ~ (p=0.016 n=20) Int64N2e9-32 14.54n ± 1% 14.52n ± 1% ~ (p=0.533 n=20) Int64N1e18-32 16.14n ± 1% 16.16n ± 1% ~ (p=0.606 n=20) Int64N2e18-32 18.10n ± 1% 17.95n ± 2% ~ (p=0.062 n=20) Int64N4e18-32 18.65n ± 1% 18.35n ± 2% -1.61% (p=0.010 n=20) Int32N1000-32 3.560n ± 1% 3.608n ± 1% +1.33% (p=0.001 n=20) Int32N1e8-32 3.770n ± 2% 3.767n ± 1% ~ (p=0.155 n=20) Int32N1e9-32 4.098n ± 0% 4.130n ± 2% ~ (p=0.016 n=20) Int32N2e9-32 4.179n ± 1% 4.206n ± 1% ~ (p=0.011 n=20) Float32-32 21.18n ± 4% 22.18n ± 4% +4.70% (p=0.003 n=20) Float64-32 20.60n ± 2% 20.75n ± 4% +0.73% (p=0.000 n=20) ExpFloat64-32 13.07n ± 0% 12.58n ± 3% -3.82% (p=0.000 n=20) NormFloat64-32 7.738n ± 2% 7.920n ± 3% ~ (p=0.066 n=20) Perm3-32 36.73n ± 1% 40.27n ± 1% +9.65% (p=0.000 n=20) Perm30-32 211.9n ± 1% 213.2n ± 2% ~ (p=0.262 n=20) Perm30ViaShuffle-32 165.2n ± 1% 164.2n ± 2% ~ (p=0.029 n=20) ShuffleOverhead-32 133.9n ± 1% 134.7n ± 2% ~ (p=0.551 n=20) Concurrent-32 3.287n ± 2% 3.301n ± 2% ~ (p=0.330 n=20) For #61716. Change-Id: I8d2f73f87dd3603a0c2ff069988938e0957b6904 Reviewed-on: https://go-review.googlesource.com/c/go/+/502499 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Rob Pike <r@golang.org>

Change the benchmarks to use the result of the calls, as I found that in certain cases inlining resulted in discarding part of the computation in the benchmark loop. Add various benchmarks that will be relevant in future CLs. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 220860f76f.amd64 │ │ sec/op │ SourceUint64-32 1.555n ± 1% GlobalInt64-32 2.071n ± 1% GlobalInt63Parallel-32 0.1023n ± 1% GlobalUint64-32 5.193n ± 1% GlobalUint64Parallel-32 0.2341n ± 0% Int64-32 2.056n ± 2% Uint64-32 2.077n ± 2% GlobalIntN1000-32 4.077n ± 2% IntN1000-32 3.476n ± 2% Int64N1000-32 3.059n ± 1% Int64N1e8-32 2.942n ± 1% Int64N1e9-32 2.932n ± 1% Int64N2e9-32 2.925n ± 1% Int64N1e18-32 3.116n ± 1% Int64N2e18-32 4.067n ± 1% Int64N4e18-32 4.054n ± 1% Int32N1000-32 2.951n ± 1% Int32N1e8-32 3.102n ± 1% Int32N1e9-32 3.535n ± 1% Int32N2e9-32 3.514n ± 1% Float32-32 2.760n ± 1% Float64-32 2.284n ± 1% ExpFloat64-32 3.757n ± 1% NormFloat64-32 3.837n ± 1% Perm3-32 35.23n ± 2% Perm30-32 208.8n ± 1% Perm30ViaShuffle-32 111.7n ± 1% ShuffleOverhead-32 101.1n ± 1% Concurrent-32 2.108n ± 7% goos: darwin goarch: arm64 pkg: math/rand/v2 cpu: Apple M1 │ 220860f76f.arm64 │ │ sec/op │ SourceUint64-8 2.316n ± 1% GlobalInt64-8 2.183n ± 1% GlobalInt63Parallel-8 0.4331n ± 0% GlobalUint64-8 4.377n ± 2% GlobalUint64Parallel-8 0.9237n ± 0% Int64-8 2.538n ± 1% Uint64-8 2.604n ± 1% GlobalIntN1000-8 3.857n ± 2% IntN1000-8 3.822n ± 2% Int64N1000-8 3.318n ± 0% Int64N1e8-8 3.349n ± 1% Int64N1e9-8 3.317n ± 2% Int64N2e9-8 3.317n ± 2% Int64N1e18-8 3.542n ± 1% Int64N2e18-8 5.087n ± 0% Int64N4e18-8 5.084n ± 0% Int32N1000-8 3.208n ± 2% Int32N1e8-8 3.610n ± 1% Int32N1e9-8 4.235n ± 0% Int32N2e9-8 4.229n ± 1% Float32-8 3.468n ± 0% Float64-8 3.447n ± 0% ExpFloat64-8 4.567n ± 0% NormFloat64-8 4.821n ± 0% Perm3-8 28.89n ± 0% Perm30-8 175.7n ± 0% Perm30ViaShuffle-8 153.5n ± 0% ShuffleOverhead-8 119.8n ± 1% Concurrent-8 2.433n ± 3% goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ 220860f76f.386 │ │ sec/op │ SourceUint64-32 2.370n ± 1% GlobalInt64-32 3.569n ± 1% GlobalInt63Parallel-32 0.3221n ± 1% GlobalUint64-32 8.797n ± 10% GlobalUint64Parallel-32 0.6351n ± 0% Int64-32 2.612n ± 2% Uint64-32 3.350n ± 1% GlobalIntN1000-32 5.892n ± 1% IntN1000-32 4.546n ± 1% Int64N1000-32 14.59n ± 1% Int64N1e8-32 14.76n ± 2% Int64N1e9-32 16.57n ± 1% Int64N2e9-32 14.54n ± 1% Int64N1e18-32 16.14n ± 1% Int64N2e18-32 18.10n ± 1% Int64N4e18-32 18.65n ± 1% Int32N1000-32 3.560n ± 1% Int32N1e8-32 3.770n ± 2% Int32N1e9-32 4.098n ± 0% Int32N2e9-32 4.179n ± 1% Float32-32 21.18n ± 4% Float64-32 20.60n ± 2% ExpFloat64-32 13.07n ± 0% NormFloat64-32 7.738n ± 2% Perm3-32 36.73n ± 1% Perm30-32 211.9n ± 1% Perm30ViaShuffle-32 165.2n ± 1% ShuffleOverhead-32 133.9n ± 1% Concurrent-32 3.287n ± 2% For #61716. Change-Id: I2f0938eae4b7bf736a8cd899a99783e731bf2179 Reviewed-on: https://go-review.googlesource.com/c/go/+/502496 Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Rob Pike <r@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

Removing Rand.Seed lets us remove lockedSource as well, along with the ambiguity in globalRand about which source to use. For #61716. Change-Id: Ibe150520dd1e7dd87165eacaebe9f0c2daeaedfd Reviewed-on: https://go-review.googlesource.com/c/go/+/502498 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Rob Pike <r@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org>

Add more test cases. Replace -printgolden with -update, which rewrites the files for us. For #61716. Change-Id: I7c4c900ee896042429135a21971a56ebe16b6a66 Reviewed-on: https://go-review.googlesource.com/c/go/+/516858 Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

In math/rand, Read is deprecated. Remove in v2. People should use crypto/rand if they need long strings. For #61716. Change-Id: Ib254b7e1844616e96db60a3a7abb572b0dcb1583 Reviewed-on: https://go-review.googlesource.com/c/go/+/502497 Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

Int31 -> Int32 Int31n -> Int32N Int63 -> Int64 Int63n -> Int64N Intn -> IntN The 31 and 63 are pedantic and confusing: the functions should be named for the type they return, same as all the others. The lower-case n is inconsistent with Go's usual CamelCase and especially problematic because we plan to add 'func N'. Capitalize the n. For #61716. Change-Id: Idb1a005a82f353677450d47fb612ade7a41fde69 Reviewed-on: https://go-review.googlesource.com/c/go/+/516857 Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

This is the beginning of the math/rand/v2 package from proposal #61716. Start by copying old API. This CL copies math/rand/* to math/rand/v2 and updates references to math/rand to add v2 throughout. Later CLs will make the v2 changes. For #61716. Change-Id: I1624ccffae3dfa442d4ba2461942decbd076e11b Reviewed-on: https://go-review.googlesource.com/c/go/+/502495 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Rob Pike <r@golang.org>

Running 'go fix' on the cmd+std packages handled much of this change. Also update code generators to use only the new go:build lines, not the old +build ones. For #41184. For #60268. Change-Id: If35532abe3012e7357b02c79d5992ff5ac37ca23 Cq-Include-Trybots: luci.golang.try:gotip-linux-386-longtest,gotip-linux-amd64-longtest,gotip-windows-amd64-longtest Reviewed-on: https://go-review.googlesource.com/c/go/+/536237 Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

Change-Id: I4a6c2ef6fd21355952ab7d8eaad883646a95d364 Reviewed-on: https://go-review.googlesource.com/c/go/+/535087 Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Than McIntosh <thanm@google.com>

Change-Id: Id2079f7012392dea8dfe2386bb9fb1ea3f487a4a Reviewed-on: https://go-review.googlesource.com/c/go/+/526015 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: qiulaidongfeng <2645477756@qq.com>

Use ^ and $ in the -run flag regular expression value when the intention is to invoke a single named test. This removes the reliance on there not being another similarly named test to achieve the intended result. In particular, package syscall has tests named TestUnshareMountNameSpace and TestUnshareMountNameSpaceChroot that both trigger themselves setting GO_WANT_HELPER_PROCESS=1 to run alternate code in a helper process. As a consequence of overlap in their test names, the former was inadvertently triggering one too many helpers. Spotted while reviewing CL 525196. Apply the same change in other places to make it easier for code readers to see that said tests aren't running extraneous tests. The unlikely cases of -run=TestSomething intentionally being used to run all tests that have the TestSomething substring in the name can be better written as -run=^.*TestSomething.*$ or with a comment so it is clear it wasn't an oversight. Change-Id: Iba208aba3998acdbf8c6708e5d23ab88938bfc1e Reviewed-on: https://go-review.googlesource.com/c/go/+/524948 Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Kirill Kolyshkin <kolyshkin@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

Change-Id: I71a38dd20bfaf2b1aed18892d54eeb017d3d7d66 GitHub-Last-Rev: 8da43b2cbd563ed123690709e519c9f84272b332 GitHub-Pull-Request: golang/go#61955 Reviewed-on: https://go-review.googlesource.com/c/go/+/518595 Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: qiulaidongfeng <2645477756@qq.com>

Change-Id: I9e95806116a8547ec782f66226d1b1382c6156de Change-Id: I9e95806116a8547ec782f66226d1b1382c6156de GitHub-Last-Rev: 5b4ce994c162775e91aa00c942571bc0ac8b1eca GitHub-Pull-Request: golang/go#61829 Reviewed-on: https://go-review.googlesource.com/c/go/+/516895 Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com>

new s390x assembly implementation of Sin/Cos/SinCos/Tan handle huge argument test's. Updates #29240 Change-Id: I9f22d9714528ef2af52c749079f3727250089baf Reviewed-on: https://go-review.googlesource.com/c/go/+/509675 Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>

Currently s390x, sin/cos assembly implementation not handling huge arguments. This change reverts assembly routine to native go implementation for huge arguments. Implementing the changes in assembly giving better performance than native go changes in terms of execution/cycles. name Go_changes Asm_changes Sin/input_size_(0.5)-8 11.85ns ± 0% 5.32ns ± 1% Sin/input_size_(1<<20)-8 15.32ns ± 0% 9.75ns ± 3% Sin/input_size_(1<<_40)-8 17.9ns ± 0% 10.3ns ± 6% Sin/input_size_(1<<50)-8 16.33ns ± 0% 9.75ns ± 6% Sin/input_size_(1<<60)-8 33.0ns ± 1% 29.1ns ± 0% Sin/input_size_(1<<80)-8 29.9ns ± 0% 27.2ns ± 2% Sin/input_size_(1<<200)-8 31.5ns ± 1% 28.3ns ± 0% Sin/input_size_(1<<480)-8 29.4ns ± 1% 28.0ns ± 1% Sin/input_size_(1234567891234567_<<_180)-8 29.3ns ± 1% 28.0ns ± 0% Cos/input_size_(0.5)-8 10.33ns ± 0% 5.69ns ± 1% Cos/input_size_(1<<20)-8 16.67ns ± 0% 9.18ns ± 0% Cos/input_size_(1<<_40)-8 18.50ns ± 0% 9.45ns ± 3% Cos/input_size_(1<<50)-8 16.67ns ± 0% 9.18ns ± 1% Cos/input_size_(1<<60)-8 31.6ns ± 1% 26.7ns ± 2% Cos/input_size_(1<<80)-8 31.3ns ± 0% 25.5ns ± 1% Cos/input_size_(1<<200)-8 30.0ns ± 0% 26.7ns ± 1% Cos/input_size_(1<<480)-8 31.9ns ±2% 27.0ns ± 0% Cos/input_size_(1234567891234567_<<_180)-8 31.8ns ± 0% 26.9ns ± 0% Fixes #29240 Change-Id: Id2ebcfa113926f27510d527e80daaddad925a707 Reviewed-on: https://go-review.googlesource.com/c/go/+/469635 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Bill O'Farrell <billotosyr@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Heschi Kreinick <heschi@google.com> Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>

Currently on s390x, tan assembly implementation is not handling huge arguments at all. This change is to check for large arguments and revert back to native go implantation from assembly code in case of huge arguments. The changes are implemented in assembly code to get better performance over native go implementation. Benchmark details of tan function with table driven inputs are updated as part of the issue link. Fixes #37854 Change-Id: I4e5321e65c27b7ce8c497fc9d3991ca8604753d2 Reviewed-on: https://go-review.googlesource.com/c/go/+/470595 Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Ian Lance Taylor <iant@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Bryan Mills <bcmills@google.com> Run-TryBot: Keith Randall <khr@golang.org>

Sin/Tan are odd, Cos is even, so it is easy to compute the correct result from the positive argument case. Change-Id: If851d00fc7f515ece8199cf56d21186ced51e94f Reviewed-on: https://go-review.googlesource.com/c/go/+/509815 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Srinivas Pokala <Pokala.Srinivas@ibm.com> Reviewed-by: Robert Griesemer <gri@google.com> Reviewed-by: Keith Randall <khr@google.com>

Adds a test that triggers the RISC-V fused multiply-add code generation bug fixed by CL 506575. Change-Id: Ia3a55a68b48c5cc6beac4e5235975dea31f3faf2 Reviewed-on: https://go-review.googlesource.com/c/go/+/507035 Auto-Submit: M Zhuo <mzh@golangcn.org> Reviewed-by: M Zhuo <mzh@golangcn.org> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Joedian Reid <joedian@golang.org> Reviewed-by: Keith Randall <khr@golang.org>

When x*y == -z the portable implementation of FMA copied the sign bit from x*y into the result. This meant that when x*y == -z and x*y < 0 the result was -0 which is incorrect. Fixes #61130. Change-Id: Ib93a568b7bdb9031e2aedfa1bdfa9bddde90851d Reviewed-on: https://go-review.googlesource.com/c/go/+/507376 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joedian Reid <joedian@golang.org>

For #59488 Fixes #60616 Change-Id: Idf9f42d7d868999664652dd7b478684a474f1d96 Reviewed-on: https://go-review.googlesource.com/c/go/+/501355 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Rob Pike <r@golang.org> Run-TryBot: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@golang.org>

Fix spelling errors discovered using https://github.com/codespell-project/codespell. Errors in data files and vendored packages are ignored. Change-Id: I83c7818222f2eea69afbd270c15b7897678131dc GitHub-Last-Rev: 3491615b1b82832cc0064f535786546e89aa6184 GitHub-Pull-Request: golang/go#60758 Reviewed-on: https://go-review.googlesource.com/c/go/+/502576 Auto-Submit: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Pratt <mpratt@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>

There are some symbol mismatches in the comments, this commit attempts to fix them Change-Id: I5c9075e5218defe9233c075744d243b26ff68496 Reviewed-on: https://go-review.googlesource.com/c/go/+/492996 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: shuang cui <imcusg@gmail.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Pratt <mpratt@google.com>

The "To" prefix was a relic of the first draft that I failed to make consistent with the unprefixed name used in the proposal. Fortunately iant spotted it during the API audit. Updates #56984 Updates #60560 Change-Id: Ifa6eeddf6dd5f0637c0568e383f9a4bef88b10f9 Reviewed-on: https://go-review.googlesource.com/c/go/+/500116 Reviewed-by: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Alan Donovan <adonovan@google.com>

This reverts CL 467515. Now that we have cmp.Compare, we don't need math.Compare or math.Compare32 after all. For #56491 Fixes #60519 Change-Id: I8ed33464adfc6d69bd6b328edb26aa2ee3d234d9 Reviewed-on: https://go-review.googlesource.com/c/go/+/499416 Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Eli Bendersky <eliben@google.com>

b.ResetTimer used to also stop the timer, however it does not anymore. These benchmarks hadn't been fixed and as a result ended up measuring some additional things. Also, make some for loops more conventional. Change-Id: I76ca68456d85eec51722a80587e5b2c9f5d836a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/496996 Run-TryBot: Damien Neil <dneil@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Auto-Submit: Damien Neil <dneil@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Damien Neil <dneil@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com>

Fix comments, including duplicate is, wrong phrases and articles, misspellings, etc. Change-Id: I8bfea53b9b275e649757cc4bee6a8a026ed9c7a4 Reviewed-on: https://go-review.googlesource.com/c/go/+/493035 Reviewed-by: Benny Siegert <bsiegert@gmail.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: shuang cui <imcusg@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com>

The initial purpose of PCALIGN was to identify code where it would be beneficial to align code for performance, but avoid cases where too many NOPs were added. On p10, it is now necessary to enforce a certain alignment in some cases, so the behavior of PCALIGN needs to be slightly different. Code will now be aligned to the value specified on the PCALIGN instruction regardless of number of NOPs added, which is more intuitive and consistent with power assembler alignment directives. This also adds 64 as a possible alignment value. The existing values used in PCALIGN were modified according to the new behavior. A testcase was updated and performance testing was done to verify that this does not adversely affect performance. Change-Id: Iad1cf5ff112e5bfc0514f0805be90e24095e932b Reviewed-on: https://go-review.googlesource.com/c/go/+/485056 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Archana Ravindar <aravind5@in.ibm.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Bryan Mills <bcmills@google.com>

Re-run all go:generate stringer commands. This mostly adds checks that the constant values did not change, but does add new strings for the debug/dwarf and internal/pkgbits packages. Change-Id: I5fc41f20da47338152c183d45d5ae65074e2fccf Reviewed-on: https://go-review.googlesource.com/c/go/+/483717 Reviewed-by: Bryan Mills <bcmills@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org>

Fixes #59331 Change-Id: I62156be2f2758c59349c3b02db6cf9140429c9e3 Reviewed-on: https://go-review.googlesource.com/c/go/+/481915 Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> TryBot-Bypass: Ian Lance Taylor <iant@google.com> Reviewed-by: Russ Cox <rsc@golang.org>

Fixes the misuse of "a" vs "an", according to English grammatical expectations and using https://www.a-or-an.com/ Change-Id: I53ac724070e3ff3d33c304483fe72c023c7cda47 Reviewed-on: https://go-review.googlesource.com/c/go/+/480536 Run-TryBot: shuang cui <imcusg@gmail.com> Reviewed-by: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>

I noticed the one in path/filepath while reading the docs, and the other ones were found via some quick grepping. Change-Id: I386f2f74ef816a6d18aa2f58ee6b64dbd0147c9e Reviewed-on: https://go-review.googlesource.com/c/go/+/478795 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Heschi Kreinick <heschi@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com>

Most of these are one-off mistakes. Only one file was all spaces. Change-Id: I277c3ce4a4811aa4248c90676f66bc775ae8d062 Reviewed-on: https://go-review.googlesource.com/c/go/+/478976 Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>

This change introduces the Compare and Compare32 functions based on the total-ordering predicate in IEEE-754, section 5.10. In particular, * -NaN is ordered before any other value * +NaN is ordered after any other value * -0 is ordered before +0 * All other values are ordered the usual way Compare-8 0.4537n ± 1% Compare32-8 0.3752n ± 1% geomean 0.4126n Fixes #56491. Change-Id: I5c9c77430a2872f380688c1b0a66f2105b77d5ac Reviewed-on: https://go-review.googlesource.com/c/go/+/467515 Reviewed-by: WANG Xuerui <git@xen0n.name> Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>

Change-Id: I16ec916b47de2f417b681c8abff5a1375ddf491b Reviewed-on: https://go-review.googlesource.com/c/go/+/468055 Run-TryBot: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>

This reverts CL 459435. Reason for revert: Tests failing on MIPS. Change-Id: I9017bf718ba938df6d6766041555034d55d90b8a Reviewed-on: https://go-review.googlesource.com/c/go/+/467255 Run-TryBot: Than McIntosh <thanm@google.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Akhil Indurti <aindurti@gmail.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>

Change-Id: I37a9e4362953a711840087e1b7b8d7a25f1a83b7 Reviewed-on: https://go-review.googlesource.com/c/go/+/467275 Reviewed-by: Russ Cox <rsc@golang.org> TryBot-Bypass: Russ Cox <rsc@golang.org> Auto-Submit: Russ Cox <rsc@golang.org>

It currently says only what it wasn't good for, which is not helpful. Change-Id: I468c7f385c14eaca99788a94d53c30b729ed0944 Reviewed-on: https://go-review.googlesource.com/c/go/+/466276 Reviewed-by: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com>

This change introduces the Compare and Compare32 functions based on the total-ordering predicate in IEEE-754, section 5.10. In particular, * -NaN is ordered before any other value * +NaN is ordered after any other value * -0 is ordered before +0 * All other values are ordered the usual way name time/op Compare-8 0.24ns ± 1% Compare32-8 0.24ns ± 0% Fixes #56491. Change-Id: I9444fbfefe26741794c4436a26d403b8da97bdaf Reviewed-on: https://go-review.googlesource.com/c/go/+/459435 Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com>

Now that the top-level math/rand functions are auto-seeded by default (issue #54880), use the runtime fastrand64 function when 1) Seed has not been called; 2) the GODEBUG randautoseed=0 is not used. The benchmarks run quickly and are relatively noisy, but they show significant improvements for parallel calls to the top-level functions. goos: linux goarch: amd64 pkg: math/rand cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz │ /tmp/foo.1 │ /tmp/foo.2 │ │ sec/op │ sec/op vs base │ Int63Threadsafe-16 11.605n ± 1% 3.094n ± 1% -73.34% (p=0.000 n=10) Int63ThreadsafeParallel-16 67.8350n ± 2% 0.4000n ± 1% -99.41% (p=0.000 n=10) Int63Unthreadsafe-16 1.947n ± 3% 1.924n ± 2% ~ (p=0.189 n=10) Intn1000-16 4.295n ± 2% 4.287n ± 3% ~ (p=0.517 n=10) Int63n1000-16 4.379n ± 0% 4.192n ± 2% -4.27% (p=0.000 n=10) Int31n1000-16 3.641n ± 3% 3.506n ± 0% -3.69% (p=0.000 n=10) Float32-16 3.330n ± 7% 3.250n ± 2% -2.40% (p=0.017 n=10) Float64-16 2.194n ± 6% 2.056n ± 4% -6.31% (p=0.004 n=10) Perm3-16 43.39n ± 9% 38.28n ± 12% -11.77% (p=0.015 n=10) Perm30-16 324.4n ± 6% 315.9n ± 19% ~ (p=0.315 n=10) Perm30ViaShuffle-16 175.4n ± 1% 143.6n ± 2% -18.15% (p=0.000 n=10) ShuffleOverhead-16 223.4n ± 2% 215.8n ± 1% -3.38% (p=0.000 n=10) Read3-16 5.428n ± 3% 5.406n ± 2% ~ (p=0.780 n=10) Read64-16 41.55n ± 5% 40.14n ± 3% -3.38% (p=0.000 n=10) Read1000-16 622.9n ± 4% 594.9n ± 2% -4.50% (p=0.000 n=10) Concurrent-16 136.300n ± 2% 4.647n ± 26% -96.59% (p=0.000 n=10) geomean 23.40n 12.15n -48.08% Fixes #49892 Change-Id: Iba75b326145512ab0b7ece233b98ac3d4e1fb504 Reviewed-on: https://go-review.googlesource.com/c/go/+/465037 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Russ Cox <rsc@golang.org> Auto-Submit: Ian Lance Taylor <iant@google.com>

Change-Id: I31bec5d2b4a79a085942c7d380678379d99cf07b Reviewed-on: https://go-review.googlesource.com/c/go/+/455135 Auto-Submit: Filippo Valsorda <filippo@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Roland Shoemaker <roland@golang.org> Run-TryBot: Filippo Valsorda <filippo@golang.org> Reviewed-by: Bryan Mills <bcmills@google.com>