| Age | Commit message (Collapse) | Author |
|
Fixes #78541
Change-Id: I73ba10b6d34f9f189b5bdd356d6325d5a4a6985f
GitHub-Last-Rev: 0594d99f55c51f2f164d17a61c4eb1b2bbb8462e
GitHub-Pull-Request: golang/go#78542
Reviewed-on: https://go-review.googlesource.com/c/go/+/763000
Auto-Submit: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Neal Patel <nealpatel@google.com>
|
|
It is annoying that non-x86 implementations of shlVU and shrVU
have to go out of their way to handle the trivial case shift==0
with their own copy loops. Instead, arrange to never call them
with shift==0, so that the code can be removed.
Unfortunately, there are linknames of shlVU, so we cannot
change that function. But we can rename the functions and
then leave behind a shlVU wrapper, so do that.
Since the big.Int API calls the operations Lsh and Rsh, rename
shlVU/shrVU to lshVU/rshVU. Also rename various other shl/shr
methods and functions to lsh/rsh.
Change-Id: Ieaf54e0110a298730aa3e4566ce5be57ba7fc121
Reviewed-on: https://go-review.googlesource.com/c/go/+/664896
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The old Karatsuba implementation only operated on lengths that are
a power of two times a number smaller than karatsubaThreshold.
For example, when karatsubaThreshold = 40, multiplying a pair
of 99-word numbers runs karatsuba on the low 96 (= 39<<2) words
and then has to fix up the answer to include the high 3 words of each.
I suspect this requirement was needed to make the analysis of
how many temporary words to reserve easier, back when the
answer was 3*n and depended on exactly halving the size at
each Karatsuba step.
Now that we have the more flexible temporary allocation stack,
we can change Karatsuba to accept operands of odd length.
Doing so avoids most of the fixup that the old approach required.
For example, multiplying a pair of 99-word numbers runs
karatsuba on all 99 words now.
This is simpler and about the same speed or, for large cases, faster.
goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) CPU @ 3.10GHz
│ old │ new │
│ sec/op │ sec/op vs base │
GCD10x10/WithoutXY-16 99.62n ± 3% 99.10n ± 3% ~ (p=0.009 n=15)
GCD10x10/WithXY-16 243.4n ± 1% 245.2n ± 1% ~ (p=0.009 n=15)
GCD100x100/WithoutXY-16 921.9n ± 1% 919.2n ± 1% ~ (p=0.076 n=15)
GCD100x100/WithXY-16 1.527µ ± 1% 1.526µ ± 0% ~ (p=0.813 n=15)
GCD1000x1000/WithoutXY-16 9.704µ ± 1% 9.696µ ± 0% ~ (p=0.532 n=15)
GCD1000x1000/WithXY-16 14.03µ ± 1% 13.96µ ± 0% ~ (p=0.014 n=15)
GCD10000x10000/WithoutXY-16 206.5µ ± 2% 206.5µ ± 0% ~ (p=0.967 n=15)
GCD10000x10000/WithXY-16 398.0µ ± 1% 397.4µ ± 0% ~ (p=0.683 n=15)
Div/20/10-16 22.22n ± 0% 22.23n ± 0% ~ (p=0.105 n=15)
Div/40/20-16 22.23n ± 0% 22.23n ± 0% ~ (p=0.307 n=15)
Div/100/50-16 55.47n ± 0% 55.47n ± 0% ~ (p=0.573 n=15)
Div/200/100-16 174.9n ± 1% 174.6n ± 1% ~ (p=0.814 n=15)
Div/400/200-16 209.5n ± 1% 210.5n ± 1% ~ (p=0.454 n=15)
Div/1000/500-16 379.9n ± 0% 383.5n ± 2% ~ (p=0.123 n=15)
Div/2000/1000-16 780.1n ± 0% 784.6n ± 1% +0.58% (p=0.000 n=15)
Div/20000/10000-16 25.22µ ± 1% 25.15µ ± 0% ~ (p=0.213 n=15)
Div/200000/100000-16 921.8µ ± 1% 926.1µ ± 0% ~ (p=0.009 n=15)
Div/2000000/1000000-16 37.91m ± 0% 35.63m ± 0% -6.02% (p=0.000 n=15)
Div/20000000/10000000-16 1.378 ± 0% 1.336 ± 0% -3.03% (p=0.000 n=15)
NatMul/10-16 166.8n ± 4% 168.9n ± 3% ~ (p=0.008 n=15)
NatMul/100-16 5.519µ ± 2% 5.548µ ± 4% ~ (p=0.032 n=15)
NatMul/1000-16 230.4µ ± 1% 220.2µ ± 1% -4.43% (p=0.000 n=15)
NatMul/10000-16 8.569m ± 1% 8.640m ± 1% ~ (p=0.005 n=15)
NatMul/100000-16 376.5m ± 1% 334.1m ± 0% -11.26% (p=0.000 n=15)
NatSqr/1-16 27.85n ± 5% 28.60n ± 2% ~ (p=0.123 n=15)
NatSqr/2-16 47.99n ± 2% 48.84n ± 1% ~ (p=0.008 n=15)
NatSqr/3-16 59.41n ± 2% 60.87n ± 2% +2.46% (p=0.001 n=15)
NatSqr/5-16 87.27n ± 2% 89.31n ± 3% ~ (p=0.087 n=15)
NatSqr/8-16 124.6n ± 3% 128.9n ± 3% ~ (p=0.006 n=15)
NatSqr/10-16 166.3n ± 3% 172.7n ± 3% ~ (p=0.002 n=15)
NatSqr/20-16 385.2n ± 2% 394.7n ± 3% ~ (p=0.036 n=15)
NatSqr/30-16 622.7n ± 3% 642.9n ± 3% ~ (p=0.032 n=15)
NatSqr/50-16 1.274µ ± 3% 1.323µ ± 4% ~ (p=0.003 n=15)
NatSqr/80-16 2.606µ ± 4% 2.714µ ± 4% ~ (p=0.044 n=15)
NatSqr/100-16 3.731µ ± 4% 3.871µ ± 4% ~ (p=0.038 n=15)
NatSqr/200-16 12.99µ ± 2% 13.09µ ± 3% ~ (p=0.838 n=15)
NatSqr/300-16 22.87µ ± 2% 23.25µ ± 2% ~ (p=0.285 n=15)
NatSqr/500-16 58.43µ ± 1% 58.25µ ± 2% ~ (p=0.345 n=15)
NatSqr/800-16 115.3µ ± 3% 116.2µ ± 3% ~ (p=0.126 n=15)
NatSqr/1000-16 173.9µ ± 1% 174.3µ ± 1% ~ (p=0.935 n=15)
NatSqr/10000-16 6.133m ± 2% 6.034m ± 1% -1.62% (p=0.000 n=15)
NatSqr/100000-16 253.8m ± 1% 241.5m ± 0% -4.87% (p=0.000 n=15)
geomean 7.745µ 7.760µ +0.19%
goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz
│ old │ new │
│ sec/op │ sec/op vs base │
GCD10x10/WithoutXY-88 62.17n ± 4% 61.44n ± 0% -1.17% (p=0.000 n=15)
GCD10x10/WithXY-88 173.4n ± 2% 172.4n ± 4% ~ (p=0.615 n=15)
GCD100x100/WithoutXY-88 584.0n ± 1% 582.9n ± 0% ~ (p=0.009 n=15)
GCD100x100/WithXY-88 1.098µ ± 1% 1.091µ ± 2% ~ (p=0.002 n=15)
GCD1000x1000/WithoutXY-88 6.055µ ± 0% 6.049µ ± 0% ~ (p=0.007 n=15)
GCD1000x1000/WithXY-88 9.430µ ± 0% 9.417µ ± 1% ~ (p=0.123 n=15)
GCD10000x10000/WithoutXY-88 153.4µ ± 2% 149.0µ ± 2% -2.85% (p=0.000 n=15)
GCD10000x10000/WithXY-88 350.6µ ± 3% 349.0µ ± 2% ~ (p=0.126 n=15)
Div/20/10-88 13.12n ± 0% 13.12n ± 1% 0.00% (p=0.042 n=15)
Div/40/20-88 13.12n ± 0% 13.13n ± 0% ~ (p=0.004 n=15)
Div/100/50-88 25.49n ± 0% 25.49n ± 0% ~ (p=0.452 n=15)
Div/200/100-88 115.7n ± 2% 113.8n ± 2% ~ (p=0.212 n=15)
Div/400/200-88 135.0n ± 1% 136.1n ± 1% ~ (p=0.005 n=15)
Div/1000/500-88 257.5n ± 1% 259.9n ± 1% ~ (p=0.004 n=15)
Div/2000/1000-88 567.5n ± 1% 572.4n ± 2% ~ (p=0.616 n=15)
Div/20000/10000-88 25.65µ ± 0% 25.77µ ± 1% ~ (p=0.032 n=15)
Div/200000/100000-88 777.4µ ± 1% 754.3µ ± 1% -2.97% (p=0.000 n=15)
Div/2000000/1000000-88 33.66m ± 0% 31.37m ± 0% -6.81% (p=0.000 n=15)
Div/20000000/10000000-88 1.320 ± 0% 1.266 ± 0% -4.04% (p=0.000 n=15)
NatMul/10-88 151.9n ± 7% 143.3n ± 7% ~ (p=0.878 n=15)
NatMul/100-88 4.418µ ± 2% 4.337µ ± 3% ~ (p=0.512 n=15)
NatMul/1000-88 206.8µ ± 1% 189.8µ ± 1% -8.25% (p=0.000 n=15)
NatMul/10000-88 8.531m ± 1% 8.095m ± 0% -5.12% (p=0.000 n=15)
NatMul/100000-88 298.9m ± 0% 260.5m ± 1% -12.85% (p=0.000 n=15)
NatSqr/1-88 27.55n ± 6% 28.25n ± 7% ~ (p=0.024 n=15)
NatSqr/2-88 44.71n ± 6% 46.21n ± 9% ~ (p=0.024 n=15)
NatSqr/3-88 55.44n ± 4% 58.41n ± 10% ~ (p=0.126 n=15)
NatSqr/5-88 80.71n ± 5% 81.41n ± 5% ~ (p=0.032 n=15)
NatSqr/8-88 115.7n ± 4% 115.4n ± 5% ~ (p=0.814 n=15)
NatSqr/10-88 147.4n ± 4% 147.3n ± 4% ~ (p=0.505 n=15)
NatSqr/20-88 337.8n ± 3% 337.3n ± 4% ~ (p=0.814 n=15)
NatSqr/30-88 556.9n ± 3% 557.6n ± 4% ~ (p=0.814 n=15)
NatSqr/50-88 1.208µ ± 4% 1.208µ ± 3% ~ (p=0.910 n=15)
NatSqr/80-88 2.591µ ± 3% 2.581µ ± 3% ~ (p=0.705 n=15)
NatSqr/100-88 3.870µ ± 3% 3.858µ ± 3% ~ (p=0.846 n=15)
NatSqr/200-88 14.43µ ± 3% 14.28µ ± 2% ~ (p=0.383 n=15)
NatSqr/300-88 24.68µ ± 2% 24.49µ ± 2% ~ (p=0.624 n=15)
NatSqr/500-88 66.27µ ± 1% 66.18µ ± 1% ~ (p=0.735 n=15)
NatSqr/800-88 128.7µ ± 1% 127.4µ ± 1% ~ (p=0.050 n=15)
NatSqr/1000-88 198.7µ ± 1% 197.7µ ± 1% ~ (p=0.229 n=15)
NatSqr/10000-88 6.582m ± 1% 6.426m ± 1% -2.37% (p=0.000 n=15)
NatSqr/100000-88 274.3m ± 0% 267.3m ± 0% -2.57% (p=0.000 n=15)
geomean 6.518µ 6.438µ -1.22%
goos: linux
goarch: arm64
pkg: math/big
│ old │ new │
│ sec/op │ sec/op vs base │
GCD10x10/WithoutXY-16 61.70n ± 1% 61.32n ± 1% ~ (p=0.361 n=15)
GCD10x10/WithXY-16 217.3n ± 1% 217.0n ± 1% ~ (p=0.395 n=15)
GCD100x100/WithoutXY-16 569.7n ± 0% 572.6n ± 2% ~ (p=0.213 n=15)
GCD100x100/WithXY-16 1.241µ ± 1% 1.236µ ± 1% ~ (p=0.157 n=15)
GCD1000x1000/WithoutXY-16 5.558µ ± 0% 5.566µ ± 0% ~ (p=0.228 n=15)
GCD1000x1000/WithXY-16 9.319µ ± 0% 9.326µ ± 0% ~ (p=0.233 n=15)
GCD10000x10000/WithoutXY-16 126.4µ ± 2% 128.7µ ± 3% ~ (p=0.081 n=15)
GCD10000x10000/WithXY-16 279.3µ ± 0% 278.3µ ± 5% ~ (p=0.187 n=15)
Div/20/10-16 15.12n ± 1% 15.21n ± 1% ~ (p=0.490 n=15)
Div/40/20-16 15.11n ± 0% 15.23n ± 1% ~ (p=0.107 n=15)
Div/100/50-16 26.53n ± 0% 26.50n ± 0% ~ (p=0.299 n=15)
Div/200/100-16 123.7n ± 0% 124.0n ± 0% ~ (p=0.086 n=15)
Div/400/200-16 142.5n ± 0% 142.4n ± 0% ~ (p=0.039 n=15)
Div/1000/500-16 259.9n ± 1% 261.2n ± 1% ~ (p=0.044 n=15)
Div/2000/1000-16 539.4n ± 1% 532.3n ± 1% -1.32% (p=0.001 n=15)
Div/20000/10000-16 22.43µ ± 0% 22.32µ ± 0% -0.49% (p=0.000 n=15)
Div/200000/100000-16 898.3µ ± 0% 889.6µ ± 0% -0.96% (p=0.000 n=15)
Div/2000000/1000000-16 38.37m ± 0% 35.11m ± 0% -8.49% (p=0.000 n=15)
Div/20000000/10000000-16 1.449 ± 0% 1.384 ± 0% -4.48% (p=0.000 n=15)
NatMul/10-16 182.0n ± 1% 177.8n ± 1% -2.31% (p=0.000 n=15)
NatMul/100-16 5.537µ ± 0% 5.693µ ± 0% +2.82% (p=0.000 n=15)
NatMul/1000-16 229.9µ ± 0% 224.8µ ± 0% -2.24% (p=0.000 n=15)
NatMul/10000-16 8.985m ± 0% 8.751m ± 0% -2.61% (p=0.000 n=15)
NatMul/100000-16 371.1m ± 0% 331.5m ± 0% -10.66% (p=0.000 n=15)
NatSqr/1-16 46.77n ± 6% 42.76n ± 1% -8.57% (p=0.000 n=15)
NatSqr/2-16 66.99n ± 4% 63.62n ± 1% -5.03% (p=0.000 n=15)
NatSqr/3-16 76.79n ± 4% 73.42n ± 1% ~ (p=0.007 n=15)
NatSqr/5-16 99.00n ± 3% 95.35n ± 1% -3.69% (p=0.000 n=15)
NatSqr/8-16 160.0n ± 3% 155.1n ± 1% -3.06% (p=0.001 n=15)
NatSqr/10-16 178.4n ± 2% 175.9n ± 0% -1.40% (p=0.001 n=15)
NatSqr/20-16 361.9n ± 2% 361.3n ± 0% ~ (p=0.083 n=15)
NatSqr/30-16 584.7n ± 0% 586.8n ± 0% +0.36% (p=0.000 n=15)
NatSqr/50-16 1.327µ ± 0% 1.329µ ± 0% ~ (p=0.349 n=15)
NatSqr/80-16 2.893µ ± 1% 2.925µ ± 0% +1.11% (p=0.000 n=15)
NatSqr/100-16 4.330µ ± 1% 4.381µ ± 0% +1.18% (p=0.000 n=15)
NatSqr/200-16 16.25µ ± 1% 16.43µ ± 0% +1.07% (p=0.000 n=15)
NatSqr/300-16 27.85µ ± 1% 28.06µ ± 0% +0.77% (p=0.000 n=15)
NatSqr/500-16 76.01µ ± 0% 76.34µ ± 0% ~ (p=0.002 n=15)
NatSqr/800-16 146.8µ ± 0% 148.1µ ± 0% +0.83% (p=0.000 n=15)
NatSqr/1000-16 228.2µ ± 0% 228.6µ ± 0% ~ (p=0.123 n=15)
NatSqr/10000-16 7.524m ± 0% 7.426m ± 0% -1.31% (p=0.000 n=15)
NatSqr/100000-16 316.7m ± 0% 309.2m ± 0% -2.36% (p=0.000 n=15)
geomean 7.264µ 7.172µ -1.27%
goos: darwin
goarch: arm64
pkg: math/big
cpu: Apple M3 Pro
│ old │ new │
│ sec/op │ sec/op vs base │
GCD10x10/WithoutXY-12 32.61n ± 1% 32.42n ± 1% ~ (p=0.021 n=15)
GCD10x10/WithXY-12 87.70n ± 1% 88.42n ± 1% ~ (p=0.010 n=15)
GCD100x100/WithoutXY-12 305.9n ± 0% 306.4n ± 0% ~ (p=0.003 n=15)
GCD100x100/WithXY-12 560.3n ± 2% 556.6n ± 1% ~ (p=0.018 n=15)
GCD1000x1000/WithoutXY-12 3.509µ ± 2% 3.464µ ± 1% ~ (p=0.145 n=15)
GCD1000x1000/WithXY-12 5.347µ ± 2% 5.372µ ± 1% ~ (p=0.046 n=15)
GCD10000x10000/WithoutXY-12 73.75µ ± 1% 73.99µ ± 1% ~ (p=0.004 n=15)
GCD10000x10000/WithXY-12 148.4µ ± 0% 147.8µ ± 1% ~ (p=0.076 n=15)
Div/20/10-12 9.481n ± 0% 9.462n ± 1% ~ (p=0.631 n=15)
Div/40/20-12 9.457n ± 0% 9.462n ± 1% ~ (p=0.798 n=15)
Div/100/50-12 14.91n ± 0% 14.79n ± 1% -0.80% (p=0.000 n=15)
Div/200/100-12 84.56n ± 1% 84.60n ± 1% ~ (p=0.271 n=15)
Div/400/200-12 103.8n ± 0% 102.8n ± 0% -0.96% (p=0.000 n=15)
Div/1000/500-12 181.3n ± 1% 184.2n ± 2% ~ (p=0.091 n=15)
Div/2000/1000-12 397.5n ± 0% 397.4n ± 0% ~ (p=0.299 n=15)
Div/20000/10000-12 14.04µ ± 1% 13.99µ ± 0% ~ (p=0.221 n=15)
Div/200000/100000-12 523.1µ ± 0% 514.0µ ± 3% ~ (p=0.775 n=15)
Div/2000000/1000000-12 21.58m ± 0% 20.01m ± 1% -7.29% (p=0.000 n=15)
Div/20000000/10000000-12 813.5m ± 0% 796.2m ± 1% -2.13% (p=0.000 n=15)
NatMul/10-12 80.46n ± 1% 80.02n ± 1% ~ (p=0.063 n=15)
NatMul/100-12 2.904µ ± 0% 2.979µ ± 1% +2.58% (p=0.000 n=15)
NatMul/1000-12 127.8µ ± 0% 122.3µ ± 0% -4.28% (p=0.000 n=15)
NatMul/10000-12 5.141m ± 0% 4.975m ± 1% -3.23% (p=0.000 n=15)
NatMul/100000-12 208.8m ± 0% 189.6m ± 3% -9.21% (p=0.000 n=15)
NatSqr/1-12 11.90n ± 1% 11.76n ± 1% ~ (p=0.059 n=15)
NatSqr/2-12 21.33n ± 1% 21.12n ± 0% ~ (p=0.063 n=15)
NatSqr/3-12 26.05n ± 1% 25.79n ± 0% ~ (p=0.002 n=15)
NatSqr/5-12 37.31n ± 0% 36.98n ± 1% ~ (p=0.008 n=15)
NatSqr/8-12 63.07n ± 0% 62.75n ± 1% ~ (p=0.061 n=15)
NatSqr/10-12 79.48n ± 0% 79.59n ± 0% ~ (p=0.455 n=15)
NatSqr/20-12 173.1n ± 0% 173.2n ± 1% ~ (p=0.518 n=15)
NatSqr/30-12 288.6n ± 1% 289.2n ± 0% ~ (p=0.030 n=15)
NatSqr/50-12 653.3n ± 0% 653.3n ± 0% ~ (p=0.361 n=15)
NatSqr/80-12 1.492µ ± 0% 1.496µ ± 0% ~ (p=0.018 n=15)
NatSqr/100-12 2.270µ ± 1% 2.270µ ± 0% ~ (p=0.326 n=15)
NatSqr/200-12 8.776µ ± 1% 8.784µ ± 1% ~ (p=0.083 n=15)
NatSqr/300-12 15.07µ ± 0% 15.09µ ± 0% ~ (p=0.455 n=15)
NatSqr/500-12 41.71µ ± 0% 41.77µ ± 1% ~ (p=0.305 n=15)
NatSqr/800-12 80.77µ ± 1% 80.59µ ± 0% ~ (p=0.113 n=15)
NatSqr/1000-12 126.4µ ± 1% 126.5µ ± 0% ~ (p=0.683 n=15)
NatSqr/10000-12 4.204m ± 0% 4.119m ± 0% -2.02% (p=0.000 n=15)
NatSqr/100000-12 177.0m ± 0% 172.9m ± 0% -2.31% (p=0.000 n=15)
geomean 3.790µ 3.757µ -0.87%
Change-Id: Ifc7a9b61f678df216690511ac8bb9143189a795e
Reviewed-on: https://go-review.googlesource.com/c/go/+/652057
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
|
|
In the early days of math/big, algorithms that needed more space
grew the result larger than it needed to be and then used the
high words as extra space. This made results their own temporary
space caches, at the cost that saving a result in a data structure
might hold significantly more memory than necessary.
Specifically, new(big.Int).Mul(x, y) returned a big.Int with a
backing slice 3X as big as it strictly needed to be.
If you are storing many multiplication results, or even a single
large result, the 3X overhead can add up.
This approach to storage for temporaries also requires being able
to analyze the algorithms to predict the exact amount they need,
which can be difficult.
For both these reasons, the implementation of recursive long division,
which came later, introduced a “nat pool” where temporaries could be
stored and reused, or reclaimed by the GC when no longer used.
This avoids the storage and bookkeeping overheads but introduces a
per-temporary sync.Pool overhead. divRecursiveStep takes an array
of cached temporaries to remove some of that overhead.
The nat pool was better but is still not quite right.
This CL introduces something even better than the nat pool
(still probably not quite right, but the best I can see for now):
a sync.Pool holding stacks for allocating temporaries.
Now an operation can get one stack out of the pool and then
allocate as many temporaries as it needs during the operation,
eventually returning the stack back to the pool. The sync.Pool
operations are now per-exported-operation (like big.Int.Mul),
not per-temporary.
This CL converts both the pre-allocation in nat.mul and the
uses of the nat pool to use stack pools instead. This simplifies
some code and sets us up better for more complex algorithms
(such as Toom-Cook or FFT-based multiplication) that need
more temporaries. It is also a little bit faster.
goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) CPU @ 3.10GHz
│ old │ new │
│ sec/op │ sec/op vs base │
Div/20/10-16 23.68n ± 0% 22.21n ± 0% -6.21% (p=0.000 n=15)
Div/40/20-16 23.68n ± 0% 22.21n ± 0% -6.21% (p=0.000 n=15)
Div/100/50-16 56.65n ± 0% 55.53n ± 0% -1.98% (p=0.000 n=15)
Div/200/100-16 194.6n ± 1% 172.8n ± 0% -11.20% (p=0.000 n=15)
Div/400/200-16 232.1n ± 0% 206.7n ± 0% -10.94% (p=0.000 n=15)
Div/1000/500-16 405.3n ± 1% 383.8n ± 0% -5.30% (p=0.000 n=15)
Div/2000/1000-16 810.4n ± 1% 795.2n ± 0% -1.88% (p=0.000 n=15)
Div/20000/10000-16 25.88µ ± 0% 25.39µ ± 0% -1.89% (p=0.000 n=15)
Div/200000/100000-16 931.5µ ± 0% 924.3µ ± 0% -0.77% (p=0.000 n=15)
Div/2000000/1000000-16 37.77m ± 0% 37.75m ± 0% ~ (p=0.098 n=15)
Div/20000000/10000000-16 1.367 ± 0% 1.377 ± 0% +0.72% (p=0.003 n=15)
NatMul/10-16 168.5n ± 3% 164.0n ± 4% ~ (p=0.751 n=15)
NatMul/100-16 6.086µ ± 3% 5.380µ ± 3% -11.60% (p=0.000 n=15)
NatMul/1000-16 238.1µ ± 3% 228.3µ ± 1% -4.12% (p=0.000 n=15)
NatMul/10000-16 8.721m ± 2% 8.518m ± 1% -2.33% (p=0.000 n=15)
NatMul/100000-16 369.6m ± 0% 371.1m ± 0% +0.42% (p=0.000 n=15)
geomean 19.57µ 18.74µ -4.21%
│ old │ new │
│ B/op │ B/op vs base │
NatMul/10-16 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹
NatMul/100-16 4.750Ki ± 0% 1.751Ki ± 0% -63.14% (p=0.000 n=15)
NatMul/1000-16 48.16Ki ± 0% 16.02Ki ± 0% -66.73% (p=0.000 n=15)
NatMul/10000-16 482.9Ki ± 1% 165.4Ki ± 3% -65.75% (p=0.000 n=15)
NatMul/100000-16 5.747Mi ± 7% 4.197Mi ± 0% -26.97% (p=0.000 n=15)
geomean 41.42Ki 20.63Ki -50.18%
¹ all samples are equal
│ old │ new │
│ allocs/op │ allocs/op vs base │
NatMul/10-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/100-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/1000-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/10000-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/100000-16 7.000 ± 14% 7.000 ± 14% ~ (p=0.668 n=15)
geomean 1.476 1.476 +0.00%
¹ all samples are equal
goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz
│ old │ new │
│ sec/op │ sec/op vs base │
Div/20/10-88 15.84n ± 1% 13.12n ± 0% -17.17% (p=0.000 n=15)
Div/40/20-88 15.88n ± 1% 13.12n ± 0% -17.38% (p=0.000 n=15)
Div/100/50-88 26.42n ± 0% 25.47n ± 0% -3.60% (p=0.000 n=15)
Div/200/100-88 132.4n ± 0% 114.9n ± 0% -13.22% (p=0.000 n=15)
Div/400/200-88 150.1n ± 0% 135.6n ± 0% -9.66% (p=0.000 n=15)
Div/1000/500-88 275.5n ± 0% 264.1n ± 0% -4.14% (p=0.000 n=15)
Div/2000/1000-88 586.5n ± 0% 581.1n ± 0% -0.92% (p=0.000 n=15)
Div/20000/10000-88 25.87µ ± 0% 25.72µ ± 0% -0.59% (p=0.000 n=15)
Div/200000/100000-88 772.2µ ± 0% 779.0µ ± 0% +0.88% (p=0.000 n=15)
Div/2000000/1000000-88 33.36m ± 0% 33.63m ± 0% +0.80% (p=0.000 n=15)
Div/20000000/10000000-88 1.307 ± 0% 1.320 ± 0% +1.03% (p=0.000 n=15)
NatMul/10-88 140.4n ± 0% 148.8n ± 4% +5.98% (p=0.000 n=15)
NatMul/100-88 4.663µ ± 1% 4.388µ ± 1% -5.90% (p=0.000 n=15)
NatMul/1000-88 207.7µ ± 0% 205.8µ ± 0% -0.89% (p=0.000 n=15)
NatMul/10000-88 8.456m ± 0% 8.468m ± 0% +0.14% (p=0.021 n=15)
NatMul/100000-88 295.1m ± 0% 297.9m ± 0% +0.94% (p=0.000 n=15)
geomean 14.96µ 14.33µ -4.23%
│ old │ new │
│ B/op │ B/op vs base │
NatMul/10-88 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹
NatMul/100-88 4.750Ki ± 0% 1.758Ki ± 0% -62.99% (p=0.000 n=15)
NatMul/1000-88 48.44Ki ± 0% 16.08Ki ± 0% -66.80% (p=0.000 n=15)
NatMul/10000-88 489.7Ki ± 1% 166.1Ki ± 3% -66.08% (p=0.000 n=15)
NatMul/100000-88 5.546Mi ± 0% 3.819Mi ± 60% -31.15% (p=0.000 n=15)
geomean 41.29Ki 20.30Ki -50.85%
¹ all samples are equal
│ old │ new │
│ allocs/op │ allocs/op vs base │
NatMul/10-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/100-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/1000-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/10000-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/100000-88 5.000 ± 20% 6.000 ± 67% ~ (p=0.672 n=15)
geomean 1.380 1.431 +3.71%
¹ all samples are equal
goos: linux
goarch: arm64
pkg: math/big
│ old │ new │
│ sec/op │ sec/op vs base │
Div/20/10-16 15.85n ± 0% 15.23n ± 0% -3.91% (p=0.000 n=15)
Div/40/20-16 15.88n ± 0% 15.22n ± 0% -4.16% (p=0.000 n=15)
Div/100/50-16 29.69n ± 0% 26.39n ± 0% -11.11% (p=0.000 n=15)
Div/200/100-16 149.2n ± 0% 123.3n ± 0% -17.36% (p=0.000 n=15)
Div/400/200-16 160.3n ± 0% 139.2n ± 0% -13.16% (p=0.000 n=15)
Div/1000/500-16 271.0n ± 0% 256.1n ± 0% -5.50% (p=0.000 n=15)
Div/2000/1000-16 545.3n ± 0% 527.0n ± 0% -3.36% (p=0.000 n=15)
Div/20000/10000-16 22.60µ ± 0% 22.20µ ± 0% -1.77% (p=0.000 n=15)
Div/200000/100000-16 889.0µ ± 0% 892.2µ ± 0% +0.35% (p=0.000 n=15)
Div/2000000/1000000-16 38.01m ± 0% 38.12m ± 0% +0.30% (p=0.000 n=15)
Div/20000000/10000000-16 1.437 ± 0% 1.444 ± 0% +0.50% (p=0.000 n=15)
NatMul/10-16 166.4n ± 2% 169.5n ± 1% +1.86% (p=0.000 n=15)
NatMul/100-16 5.733µ ± 1% 5.570µ ± 1% -2.84% (p=0.000 n=15)
NatMul/1000-16 232.6µ ± 1% 229.8µ ± 0% -1.22% (p=0.000 n=15)
NatMul/10000-16 9.039m ± 1% 8.969m ± 0% -0.77% (p=0.000 n=15)
NatMul/100000-16 367.0m ± 0% 368.8m ± 0% +0.48% (p=0.000 n=15)
geomean 16.15µ 15.50µ -4.01%
│ old │ new │
│ B/op │ B/op vs base │
NatMul/10-16 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹
NatMul/100-16 4.750Ki ± 0% 1.751Ki ± 0% -63.14% (p=0.000 n=15)
NatMul/1000-16 48.33Ki ± 0% 16.02Ki ± 0% -66.85% (p=0.000 n=15)
NatMul/10000-16 536.5Ki ± 1% 165.7Ki ± 3% -69.12% (p=0.000 n=15)
NatMul/100000-16 6.078Mi ± 6% 4.197Mi ± 0% -30.94% (p=0.000 n=15)
geomean 42.81Ki 20.64Ki -51.78%
¹ all samples are equal
│ old │ new │
│ allocs/op │ allocs/op vs base │
NatMul/10-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/100-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/1000-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/10000-16 2.000 ± 50% 1.000 ± 0% -50.00% (p=0.001 n=15)
NatMul/100000-16 9.000 ± 11% 8.000 ± 12% -11.11% (p=0.001 n=15)
geomean 1.783 1.516 -14.97%
¹ all samples are equal
goos: darwin
goarch: arm64
pkg: math/big
cpu: Apple M3 Pro
│ old │ new │
│ sec/op │ sec/op vs base │
Div/20/10-12 9.850n ± 1% 9.405n ± 1% -4.52% (p=0.000 n=15)
Div/40/20-12 9.858n ± 0% 9.403n ± 1% -4.62% (p=0.000 n=15)
Div/100/50-12 16.40n ± 1% 14.81n ± 0% -9.70% (p=0.000 n=15)
Div/200/100-12 88.48n ± 2% 80.88n ± 0% -8.59% (p=0.000 n=15)
Div/400/200-12 107.90n ± 1% 99.28n ± 1% -7.99% (p=0.000 n=15)
Div/1000/500-12 188.8n ± 1% 178.6n ± 1% -5.40% (p=0.000 n=15)
Div/2000/1000-12 399.9n ± 0% 389.1n ± 0% -2.70% (p=0.000 n=15)
Div/20000/10000-12 13.94µ ± 2% 13.81µ ± 1% ~ (p=0.574 n=15)
Div/200000/100000-12 523.8µ ± 0% 521.7µ ± 0% -0.40% (p=0.000 n=15)
Div/2000000/1000000-12 21.46m ± 0% 21.48m ± 0% ~ (p=0.067 n=15)
Div/20000000/10000000-12 812.5m ± 0% 812.9m ± 0% ~ (p=0.061 n=15)
NatMul/10-12 77.14n ± 0% 78.35n ± 1% +1.57% (p=0.000 n=15)
NatMul/100-12 2.999µ ± 0% 2.871µ ± 1% -4.27% (p=0.000 n=15)
NatMul/1000-12 126.2µ ± 0% 126.8µ ± 0% +0.51% (p=0.011 n=15)
NatMul/10000-12 5.099m ± 0% 5.125m ± 0% +0.51% (p=0.000 n=15)
NatMul/100000-12 206.7m ± 0% 208.4m ± 0% +0.80% (p=0.000 n=15)
geomean 9.512µ 9.236µ -2.91%
│ old │ new │
│ B/op │ B/op vs base │
NatMul/10-12 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹
NatMul/100-12 4.750Ki ± 0% 1.750Ki ± 0% -63.16% (p=0.000 n=15)
NatMul/1000-12 48.13Ki ± 0% 16.01Ki ± 0% -66.73% (p=0.000 n=15)
NatMul/10000-12 483.5Ki ± 1% 163.2Ki ± 2% -66.24% (p=0.000 n=15)
NatMul/100000-12 5.480Mi ± 4% 1.532Mi ± 104% -72.05% (p=0.000 n=15)
geomean 41.03Ki 16.82Ki -59.01%
¹ all samples are equal
│ old │ new │
│ allocs/op │ allocs/op vs base │
NatMul/10-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/100-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/1000-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/10000-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/100000-12 5.000 ± 0% 1.000 ± 400% -80.00% (p=0.007 n=15)
geomean 1.380 1.000 -27.52%
¹ all samples are equal
Change-Id: I7efa6fe37971ed26ae120a32250fcb47ece0a011
Reviewed-on: https://go-review.googlesource.com/c/go/+/650638
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
|
|
The GCD code was setting one *Int to the value of another
by smashing one struct on top of the other, instead of using Set.
That was safe in this one case, but it's not idiomatic in math/big
nor safe in general, so rewrite the code not to do that.
(In one case, by swapping variables around; in another, by calling Set.)
The added Set call does slow down GCDs by a small amount,
since the answer has to be copied out. To compensate for that,
optimize a bit: remove the s, t temporaries entirely and handle
vector x word multiplication directly. The net result is that almost
all GCDs are faster, except for small ones, which are a few
nanoseconds slower.
goos: darwin
goarch: arm64
pkg: math/big
cpu: Apple M3 Pro
│ bench.before │ bench.after │
│ sec/op │ sec/op vs base │
GCD10x10/WithoutXY-12 23.80n ± 1% 31.71n ± 1% +33.24% (p=0.000 n=10)
GCD10x10/WithXY-12 100.40n ± 0% 92.14n ± 1% -8.22% (p=0.000 n=10)
GCD10x100/WithoutXY-12 63.70n ± 0% 70.73n ± 0% +11.05% (p=0.000 n=10)
GCD10x100/WithXY-12 278.6n ± 0% 233.1n ± 1% -16.35% (p=0.000 n=10)
GCD10x1000/WithoutXY-12 153.4n ± 0% 162.2n ± 1% +5.74% (p=0.000 n=10)
GCD10x1000/WithXY-12 456.0n ± 0% 411.8n ± 1% -9.69% (p=0.000 n=10)
GCD10x10000/WithoutXY-12 1.002µ ± 1% 1.036µ ± 0% +3.39% (p=0.000 n=10)
GCD10x10000/WithXY-12 2.330µ ± 1% 2.210µ ± 0% -5.13% (p=0.000 n=10)
GCD10x100000/WithoutXY-12 8.894µ ± 0% 8.889µ ± 1% ~ (p=0.754 n=10)
GCD10x100000/WithXY-12 20.84µ ± 0% 20.24µ ± 0% -2.84% (p=0.000 n=10)
GCD100x100/WithoutXY-12 373.3n ± 3% 314.4n ± 0% -15.76% (p=0.000 n=10)
GCD100x100/WithXY-12 662.5n ± 0% 572.4n ± 1% -13.59% (p=0.000 n=10)
GCD100x1000/WithoutXY-12 641.8n ± 0% 598.1n ± 1% -6.81% (p=0.000 n=10)
GCD100x1000/WithXY-12 1.123µ ± 0% 1.019µ ± 1% -9.26% (p=0.000 n=10)
GCD100x10000/WithoutXY-12 2.870µ ± 0% 2.831µ ± 0% -1.38% (p=0.000 n=10)
GCD100x10000/WithXY-12 4.930µ ± 1% 4.675µ ± 0% -5.16% (p=0.000 n=10)
GCD100x100000/WithoutXY-12 24.08µ ± 0% 23.97µ ± 0% -0.48% (p=0.007 n=10)
GCD100x100000/WithXY-12 43.66µ ± 0% 42.52µ ± 0% -2.61% (p=0.001 n=10)
GCD1000x1000/WithoutXY-12 3.999µ ± 0% 3.569µ ± 1% -10.75% (p=0.000 n=10)
GCD1000x1000/WithXY-12 6.397µ ± 0% 5.534µ ± 0% -13.49% (p=0.000 n=10)
GCD1000x10000/WithoutXY-12 6.875µ ± 0% 6.450µ ± 0% -6.18% (p=0.000 n=10)
GCD1000x10000/WithXY-12 20.75µ ± 1% 19.17µ ± 1% -7.64% (p=0.000 n=10)
GCD1000x100000/WithoutXY-12 36.38µ ± 0% 35.60µ ± 1% -2.13% (p=0.000 n=10)
GCD1000x100000/WithXY-12 172.1µ ± 0% 174.4µ ± 3% ~ (p=0.052 n=10)
GCD10000x10000/WithoutXY-12 79.89µ ± 1% 75.16µ ± 2% -5.92% (p=0.000 n=10)
GCD10000x10000/WithXY-12 160.1µ ± 0% 150.0µ ± 0% -6.33% (p=0.000 n=10)
GCD10000x100000/WithoutXY-12 213.2µ ± 1% 209.0µ ± 1% -1.98% (p=0.000 n=10)
GCD10000x100000/WithXY-12 1.399m ± 0% 1.342m ± 3% -4.08% (p=0.002 n=10)
GCD100000x100000/WithoutXY-12 5.463m ± 1% 5.504m ± 2% ~ (p=0.190 n=10)
GCD100000x100000/WithXY-12 11.36m ± 0% 11.46m ± 1% +0.86% (p=0.000 n=10)
geomean 6.953µ 6.695µ -3.71%
goos: linux
goarch: amd64
pkg: math/big
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ bench.before │ bench.after │
│ sec/op │ sec/op vs base │
GCD10x10/WithoutXY-32 39.66n ± 4% 44.34n ± 4% +11.77% (p=0.000 n=10)
GCD10x10/WithXY-32 156.7n ± 12% 130.8n ± 2% -16.53% (p=0.000 n=10)
GCD10x100/WithoutXY-32 115.8n ± 5% 120.2n ± 2% +3.89% (p=0.000 n=10)
GCD10x100/WithXY-32 465.3n ± 3% 368.1n ± 2% -20.91% (p=0.000 n=10)
GCD10x1000/WithoutXY-32 201.1n ± 1% 210.8n ± 2% +4.82% (p=0.000 n=10)
GCD10x1000/WithXY-32 652.9n ± 4% 605.0n ± 1% -7.32% (p=0.002 n=10)
GCD10x10000/WithoutXY-32 1.046µ ± 2% 1.143µ ± 1% +9.33% (p=0.000 n=10)
GCD10x10000/WithXY-32 3.360µ ± 1% 3.258µ ± 1% -3.04% (p=0.000 n=10)
GCD10x100000/WithoutXY-32 9.391µ ± 3% 9.997µ ± 1% +6.46% (p=0.000 n=10)
GCD10x100000/WithXY-32 27.92µ ± 1% 28.21µ ± 0% +1.04% (p=0.043 n=10)
GCD100x100/WithoutXY-32 443.7n ± 5% 320.0n ± 2% -27.88% (p=0.000 n=10)
GCD100x100/WithXY-32 789.9n ± 2% 690.4n ± 1% -12.60% (p=0.000 n=10)
GCD100x1000/WithoutXY-32 718.4n ± 3% 600.0n ± 1% -16.48% (p=0.000 n=10)
GCD100x1000/WithXY-32 1.388µ ± 4% 1.175µ ± 1% -15.28% (p=0.000 n=10)
GCD100x10000/WithoutXY-32 2.750µ ± 1% 2.668µ ± 1% -2.96% (p=0.000 n=10)
GCD100x10000/WithXY-32 6.016µ ± 1% 5.590µ ± 1% -7.09% (p=0.000 n=10)
GCD100x100000/WithoutXY-32 21.40µ ± 1% 22.30µ ± 1% +4.21% (p=0.000 n=10)
GCD100x100000/WithXY-32 47.02µ ± 4% 48.80µ ± 0% +3.78% (p=0.015 n=10)
GCD1000x1000/WithoutXY-32 3.417µ ± 4% 3.020µ ± 1% -11.65% (p=0.000 n=10)
GCD1000x1000/WithXY-32 5.752µ ± 0% 5.418µ ± 2% -5.81% (p=0.000 n=10)
GCD1000x10000/WithoutXY-32 6.150µ ± 0% 6.246µ ± 1% +1.55% (p=0.000 n=10)
GCD1000x10000/WithXY-32 24.68µ ± 3% 25.07µ ± 1% ~ (p=0.051 n=10)
GCD1000x100000/WithoutXY-32 34.60µ ± 2% 36.85µ ± 1% +6.51% (p=0.000 n=10)
GCD1000x100000/WithXY-32 209.5µ ± 4% 227.4µ ± 0% +8.56% (p=0.000 n=10)
GCD10000x10000/WithoutXY-32 90.69µ ± 0% 88.48µ ± 0% -2.44% (p=0.000 n=10)
GCD10000x10000/WithXY-32 197.1µ ± 0% 200.5µ ± 0% +1.73% (p=0.000 n=10)
GCD10000x100000/WithoutXY-32 239.1µ ± 0% 242.5µ ± 0% +1.42% (p=0.000 n=10)
GCD10000x100000/WithXY-32 1.963m ± 3% 2.028m ± 0% +3.28% (p=0.000 n=10)
GCD100000x100000/WithoutXY-32 7.466m ± 0% 7.412m ± 0% -0.71% (p=0.000 n=10)
GCD100000x100000/WithXY-32 16.10m ± 2% 16.47m ± 0% +2.25% (p=0.000 n=10)
geomean 8.388µ 8.127µ -3.12%
Change-Id: I161dc409bad11bcc553bc8116449905ae5b06742
Reviewed-on: https://go-review.googlesource.com/c/go/+/650636
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
|
|
Change-Id: Ie7649060db25f1573eeaadd534a600bb24d30572
GitHub-Last-Rev: c617848a4ec9f5c21820982efc95e0ec4ca2510c
GitHub-Pull-Request: golang/go#70134
Reviewed-on: https://go-review.googlesource.com/c/go/+/623757
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Robert Griesemer <gri@google.com>
|
|
This looks way better than the code formatting.
Similar to CL 597656.
Change-Id: I2c8809c1d6f8a8387941567213880662ff649a73
Reviewed-on: https://go-review.googlesource.com/c/go/+/597659
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Change-Id: I3541859bbf3ac4f9317b82a66d21be3d5c4c5a84
Reviewed-on: https://go-review.googlesource.com/c/go/+/597658
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
|
|
Change-Id: I07c3a498ce1e462c3d1703d77e7d7824e9334651
GitHub-Last-Rev: 2ba8c4c705eaeb0772109ece7296978b62467eb3
GitHub-Pull-Request: golang/go#66312
Reviewed-on: https://go-review.googlesource.com/c/go/+/571636
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Change-Id: I4a6c2ef6fd21355952ab7d8eaad883646a95d364
Reviewed-on: https://go-review.googlesource.com/c/go/+/535087
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
|
|
The "To" prefix was a relic of the first draft
that I failed to make consistent with the unprefixed
name used in the proposal. Fortunately iant spotted
it during the API audit.
Updates #56984
Updates #60560
Change-Id: Ifa6eeddf6dd5f0637c0568e383f9a4bef88b10f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/500116
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Alan Donovan <adonovan@google.com>
|
|
Change-Id: I16ec916b47de2f417b681c8abff5a1375ddf491b
Reviewed-on: https://go-review.googlesource.com/c/go/+/468055
Run-TryBot: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Change-Id: I31bec5d2b4a79a085942c7d380678379d99cf07b
Reviewed-on: https://go-review.googlesource.com/c/go/+/455135
Auto-Submit: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Run-TryBot: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
|
|
This operation converts a big.Int to float64,
reporting the accuracy of the result, with
a fast path in hardware.
Fixes #56984
Change-Id: I86d0fb0e105a06a4009986f2f5fd87a4d446f6f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/453115
Reviewed-by: Robert Griesemer <gri@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Alan Donovan <adonovan@google.com>
|
|
This will let us reuse it in crypto/ecdsa for the NIST scalar fields.
The main change in API is around encoding and decoding. The SetBytes +
ExpandFor sequence was hacky: SetBytes could produce a bigger size than
the modulus if leading zeroes in the top byte overflowed the limb
boundary, so ExpandFor had to check for and tolerate that. Also, the
caller was responsible for checking that the overflow was actually all
zeroes (which we weren't doing, exposing a crasher in decryption and
signature verification) and then for checking that the result was less
than the modulus. Instead, make SetBytes take a modulus and return an
error if the value overflows. Same with Bytes: we were always allocating
based on Size before FillBytes anyway, so now Bytes takes a modulus.
Finally, SetBig was almost only used for moduli, so replaced
NewModulusFromNat and SetBig with NewModulusFromBig.
Moved the constant-time bitLen to math/big.Int.BitLen. It's slower, but
BitLen is primarily used in cryptographic code, so it's safer this way.
Change-Id: Ibaf7f36d80695578cb80484167d82ce1aa83832f
Reviewed-on: https://go-review.googlesource.com/c/go/+/450055
Auto-Submit: Filippo Valsorda <filippo@golang.org>
Run-TryBot: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
|
|
Change-Id: I69065f8adf101fdb28682c55997f503013a50e29
Reviewed-on: https://go-review.googlesource.com/c/go/+/449757
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joedian Reid <joedian@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
Change-Id: I7605bcbbaa64bb4273ad458a157b1c6011467973
Reviewed-on: https://go-review.googlesource.com/c/go/+/447915
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Montgomery multiplication can be used for Exp mod even m
by splitting it into two steps - Exp mod an odd number and
Exp mod a power of two - and then combining the two results
using the Chinese Remainder Theorem.
For more details, see Ç. K. Koç, “Montgomery Reduction with Even Modulus”,
IEE Proceedings: Computers and Digital Techniques, 141(5) 314-316, September 1994.
http://www.people.vcu.edu/~jwang3/CMSC691/j34monex.pdf
Thanks to Guido Vranken for suggesting that we use a faster algorithm.
name old time/op new time/op delta
ExpMont/Odd-16 240µs ± 2% 239µs ± 2% ~ (p=0.853 n=10+10)
ExpMont/Even1-16 757µs ± 3% 249µs ± 6% -67.17% (p=0.000 n=10+10)
ExpMont/Even2-16 755µs ± 1% 244µs ± 4% -67.64% (p=0.000 n=8+10)
ExpMont/Even3-16 771µs ± 3% 240µs ± 2% -68.90% (p=0.000 n=10+10)
ExpMont/Even4-16 775µs ± 3% 241µs ± 2% -68.91% (p=0.000 n=10+10)
ExpMont/Even8-16 779µs ± 2% 241µs ± 3% -69.06% (p=0.000 n=9+10)
ExpMont/Even32-16 778µs ± 3% 240µs ± 4% -69.11% (p=0.000 n=9+9)
ExpMont/Even64-16 774µs ± 6% 186µs ± 2% -76.00% (p=0.000 n=10+10)
ExpMont/Even96-16 776µs ± 4% 186µs ± 6% -76.09% (p=0.000 n=9+10)
ExpMont/Even128-16 764µs ± 2% 143µs ± 3% -81.24% (p=0.000 n=10+10)
ExpMont/Even255-16 761µs ± 3% 109µs ± 2% -85.73% (p=0.000 n=10+10)
ExpMont/SmallEven1-16 45.6µs ± 1% 36.3µs ± 3% -20.49% (p=0.000 n=9+10)
ExpMont/SmallEven2-16 44.3µs ± 2% 37.5µs ± 2% -15.26% (p=0.000 n=9+10)
ExpMont/SmallEven3-16 44.1µs ± 5% 37.3µs ± 3% -15.32% (p=0.000 n=9+10)
ExpMont/SmallEven4-16 47.1µs ± 6% 38.0µs ± 5% -19.40% (p=0.000 n=10+9)
name old alloc/op new alloc/op delta
ExpMont/Odd-16 2.53kB ± 0% 2.53kB ± 0% ~ (p=0.137 n=8+10)
ExpMont/Even1-16 2.57kB ± 0% 3.31kB ± 0% +28.90% (p=0.000 n=8+10)
ExpMont/Even2-16 2.57kB ± 0% 3.37kB ± 0% +31.09% (p=0.000 n=9+10)
ExpMont/Even3-16 2.57kB ± 0% 3.37kB ± 0% +31.08% (p=0.000 n=8+8)
ExpMont/Even4-16 2.57kB ± 0% 3.37kB ± 0% +31.09% (p=0.000 n=9+10)
ExpMont/Even8-16 2.57kB ± 0% 3.37kB ± 0% +31.09% (p=0.000 n=9+10)
ExpMont/Even32-16 2.57kB ± 0% 3.37kB ± 0% +31.14% (p=0.000 n=10+10)
ExpMont/Even64-16 2.57kB ± 0% 3.16kB ± 0% +22.99% (p=0.000 n=9+9)
ExpMont/Even96-16 2.57kB ± 0% 3.44kB ± 0% +33.90% (p=0.000 n=10+8)
ExpMont/Even128-16 2.57kB ± 0% 2.88kB ± 0% +12.10% (p=0.000 n=10+10)
ExpMont/Even255-16 2.57kB ± 0% 2.54kB ± 0% -1.30% (p=0.000 n=9+10)
ExpMont/SmallEven1-16 872B ± 0% 1232B ± 0% +41.28% (p=0.000 n=10+10)
ExpMont/SmallEven2-16 872B ± 0% 1233B ± 0% +41.40% (p=0.000 n=10+9)
ExpMont/SmallEven3-16 872B ± 0% 1289B ± 0% +47.82% (p=0.000 n=10+10)
ExpMont/SmallEven4-16 872B ± 0% 1241B ± 0% +42.32% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
ExpMont/Odd-16 21.0 ± 0% 21.0 ± 0% ~ (all equal)
ExpMont/Even1-16 24.0 ± 0% 38.0 ± 0% +58.33% (p=0.000 n=10+10)
ExpMont/Even2-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10)
ExpMont/Even3-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10)
ExpMont/Even4-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10)
ExpMont/Even8-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10)
ExpMont/Even32-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10)
ExpMont/Even64-16 24.0 ± 0% 39.0 ± 0% +62.50% (p=0.000 n=10+10)
ExpMont/Even96-16 24.0 ± 0% 42.0 ± 0% +75.00% (p=0.000 n=10+10)
ExpMont/Even128-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10)
ExpMont/Even255-16 24.0 ± 0% 38.0 ± 0% +58.33% (p=0.000 n=10+10)
ExpMont/SmallEven1-16 16.0 ± 0% 35.0 ± 0% +118.75% (p=0.000 n=10+10)
ExpMont/SmallEven2-16 16.0 ± 0% 35.0 ± 0% +118.75% (p=0.000 n=10+10)
ExpMont/SmallEven3-16 16.0 ± 0% 37.0 ± 0% +131.25% (p=0.000 n=10+10)
ExpMont/SmallEven4-16 16.0 ± 0% 36.0 ± 0% +125.00% (p=0.000 n=10+10)
Change-Id: Ib7f70290f8f087b78805ec3120baf17dd7737ac3
Reviewed-on: https://go-review.googlesource.com/c/go/+/420897
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This change improves the performance of Binomial by implementing an
algorithm that produces smaller intermediate values at each step.
Working with smaller big.Int values has the advantage that fewer allocations
and computations are required for each mathematical operation.
The algorithm used is the Multiplicative Formula, which is a well known
way of calculating the Binomial coefficient and is described at:
https://en.wikipedia.org/wiki/Binomial_coefficient#Multiplicative_formula
https://en.wikipedia.org/wiki/Binomial_coefficient#In_programming_languages
In addition to that, an optimization has been made to remove a
redundant computation of (i+1) on each loop which has a measurable
impact when using big.Int.
Performance improvement measured on an M1 MacBook Pro
running the existing benchmark for Binomial:
name old time/op new time/op delta
Binomial-8 589ns ± 0% 435ns ± 0% -26.05% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
Binomial-8 1.02kB ± 0% 0.08kB ± 0% -92.19% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
Binomial-8 38.0 ± 0% 5.0 ± 0% -86.84% (p=0.000 n=10+10)
Change-Id: I5a830386dd42f062e17af88411dd74fcb110ded9
GitHub-Last-Rev: 6b2fca07de4096accb02f66c313dff47c2303462
GitHub-Pull-Request: golang/go#56339
Reviewed-on: https://go-review.googlesource.com/c/go/+/444315
Auto-Submit: Robert Griesemer <gri@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Run-TryBot: Robert Griesemer <gri@google.com>
|
|
Mark the assembly routines as not escaping their arguments.
Add a special case to NewInt that, when inlined, can do all
of its allocations (a big.Int and a [1]Word) on the stack.
Update #29951
Change-Id: I9bd38c262eb97df98c0ed9874da7daac381243ea
Reviewed-on: https://go-review.googlesource.com/c/go/+/411254
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
|
|
This CL updates big.Jacobi to avoid forcing its y argument to escape
to the heap. The argument was escaping because it was being passed
through an empty interface to fmt.Sprintf during an assertion failure.
As a result, callers of Jacobi and Int.ModSqrt (which calls Jacobi)
could not keep this value on the stack.
Noticed when working on https://github.com/cockroachdb/apd/pull/103.
Change-Id: I5db9ee2149bf13b921886929425861721b53b085
GitHub-Last-Rev: 3ee07b5dc3292553cc0cd0eb2d38ef036c341a9d
GitHub-Pull-Request: golang/go#50527
Reviewed-on: https://go-review.googlesource.com/c/go/+/377014
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
|
|
TestAlias checks systematically for aliasing issues, where passing the
same value for an argument and the receiver leads to incorrect results.
We had a number of issues like that over the years:
- #31084: Lsh on arm64
- #30217: GCD
- #22830: Exp due to divLarge
- #22265: ModSqrt
- #20490: Add and Sub
- #11284: GCD
This CL also fixes two new minor bugs that the test found. A wrong
result would be returned by
- Exp when the modulo and the receiver alias
- Rand when the limit is negative and it aliases the receiver
The test runs in ~0.05s with the default -quickchecks value.
Change-Id: I8354069ec9886e40c60f2642342ee08e604befb7
Reviewed-on: https://go-review.googlesource.com/c/go/+/168257
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Filippo Valsorda <valsorda@google.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Auto-Submit: Filippo Valsorda <filippo@golang.org>
Run-TryBot: Filippo Valsorda <filippo@golang.org>
|
|
[This CL is part of a sequence implementing the proposal #51082.
The design doc is at https://go.dev/s/godocfmt-design.]
Run the updated gofmt, which reformats doc comments,
on the main repository. Vendored files are excluded.
For #51082.
Change-Id: I7332f099b60f716295fb34719c98c04eb1a85407
Reviewed-on: https://go-review.googlesource.com/c/go/+/384268
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
go/doc in all its forms applies this replacement when rendering
the comments. We are considering formatting doc comments,
including doing this replacement as part of the formatting.
Apply it to our source files ahead of time.
For #51082.
Change-Id: Ifcc1f5861abb57c5d14e7d8c2102dfb31b7a3a19
Reviewed-on: https://go-review.googlesource.com/c/go/+/384262
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
A future change to gofmt will rewrite
// Doc comment.
//
func f()
to
// Doc comment.
func f()
Apply that change preemptively to all doc comments.
For #51082.
Change-Id: I4023e16cfb0729b64a8590f071cd92f17343081d
Reviewed-on: https://go-review.googlesource.com/c/go/+/384259
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Change-Id: I0c2d26d6ede1452008992efbea7392162da65014
Reviewed-on: https://go-review.googlesource.com/c/go/+/331651
Reviewed-by: Robert Griesemer <gri@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
Replaced almost every use of Bytes with FillBytes.
Note that the approved proposal was for
func (*Int) FillBytes(buf []byte)
while this implements
func (*Int) FillBytes(buf []byte) []byte
because the latter was far nicer to use in all callsites.
Fixes #35833
Change-Id: Ia912df123e5d79b763845312ea3d9a8051343c0a
Reviewed-on: https://go-review.googlesource.com/c/go/+/230397
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
Fixes #38304
Also change `If m > 0, y < 0, ...` to `If m != 0, y < 0, ...` since `Exp` will return `nil`
whatever `m`'s sign is.
Change-Id: I17d7337ccd1404318cea5d42a8de904ad185fd00
GitHub-Last-Rev: 23995103000505dbf35aa29a717470c4da638fda
GitHub-Pull-Request: golang/go#38390
Reviewed-on: https://go-review.googlesource.com/c/go/+/228000
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
It was removed in CL 217302 but was intentionally added in CL 217104.
Change-Id: I1a478d80ad1ec4f0a0184bfebf8f1a5e352cfe8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/217941
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
We don't usually document past behavior (like "As of Go 1.14 ...") and
in isolation the current docs made it sound like a and b could only be
negative or zero.
Change-Id: I0d3c2b8579a9c01159ce528a3128b1478e99042a
Reviewed-on: https://go-review.googlesource.com/c/go/+/217302
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
Per the suggestion https://golang.org/cl/216200/2/doc/go1.14.html#423.
Updates #28878.
Change-Id: I654d2d114409624219a0041916f0a4030efc7573
Reviewed-on: https://go-review.googlesource.com/c/go/+/217104
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
Allow the inputs a and b to be zero or negative to GCD
with the following definitions.
If x or y are not nil, GCD sets their value such that z = a*x + b*y.
Regardless of the signs of a and b, z is always >= 0.
If a == b == 0, GCD sets z = x = y = 0.
If a == 0 and b != 0, GCD sets z = |b|, x = 0, y = sign(b) * 1.
If a != 0 and b == 0, GCD sets z = |a|, x = sign(a) * 1, y = 0.
Fixes #28878
Change-Id: Ia83fce66912a96545c95cd8df0549bfd852652f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/164972
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
Use the following (suboptimal) script to obtain a list of possible
typos:
#!/usr/bin/env sh
set -x
git ls-files |\
grep -e '\.\(c\|cc\|go\)$' |\
xargs -n 1\
awk\
'/\/\// { gsub(/.*\/\//, ""); print; } /\/\*/, /\*\// { gsub(/.*\/\*/, ""); gsub(/\*\/.*/, ""); }' |\
hunspell -d en_US -l |\
grep '^[[:upper:]]\{0,1\}[[:lower:]]\{1,\}$' |\
grep -v -e '^.\{1,4\}$' -e '^.\{16,\}$' |\
sort -f |\
uniq -c |\
awk '$1 == 1 { print $2; }'
Then, go through the results manually and fix the most obvious typos in
the non-vendored code.
Change-Id: I3cb5830a176850e1a0584b8a40b47bde7b260eae
Reviewed-on: https://go-review.googlesource.com/c/go/+/193848
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
math/big.Int Cmp method does not have a fast path for the case if x and y are the same.
Fixes #30856
Change-Id: Ia9a5b5f72db9d73af1b13ed6ac39ecff87d10393
Reviewed-on: https://go-review.googlesource.com/c/go/+/178957
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
If x does not have an inverse modulo m, and a negative exponent is used,
return nil just like ModInverse does now.
Change-Id: I8fa72f7a851e8cf77c5fab529ede88408740626f
Reviewed-on: https://go-review.googlesource.com/c/go/+/170757
Run-TryBot: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
The primary change is in nat.scan which now accepts underscores for base 0.
While at it, streamlined error handling in that function as well.
Also, improved the corresponding test significantly by checking the
expected result values also in case of scan errors.
The second major change is in scanExponent which now accepts underscores when
the new sepOk argument is set. While at it, essentially rewrote that
function to match error and underscore handling of nat.scan more closely.
Added a new test for scanExponent which until now was only tested
indirectly.
Finally, updated the documentation for several functions and added many
new test cases to clients of nat.scan.
A major portion of this CL is due to much better test coverage.
Updates #28493.
Change-Id: I7f17b361b633fbe6c798619d891bd5e0a045b5c5
Reviewed-on: https://go-review.googlesource.com/c/go/+/166157
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
|
|
Implemented via the underlying nat.trailingZeroBits.
Fixes #29578
Change-Id: If9876c5a74b107cbabceb7547bef4e44501f6745
Reviewed-on: https://go-review.googlesource.com/c/go/+/160681
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
If the variables passed in to the cofactor arguments of GCD (x, y)
aliased the input arguments (a, b), the previous implementation would
result in incorrect results for y. This change reorganizes the calculation
so that the only case that need to be handled is when y aliases b, which
can be handled with a simple check.
Tests were added for all of the alias cases for input arguments and and
and irrelevant test case for a previous binary GCD calculation was dropped.
Fixes #30217
Change-Id: Ibe6137f09b3e1ae3c29e3c97aba85b67f33dc169
Reviewed-on: https://go-review.googlesource.com/c/162517
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Fixes #28423.
Change-Id: Ie57ade565d0407a4bffaa86fb4475ff083168e79
Reviewed-on: https://go-review.googlesource.com/c/145537
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
For modular exponentiation, negative exponents can be handled using
the following relation.
for y < 0: x**y mod m == (x**(-1))**|y| mod m
First compute ModInverse(x, m) and then compute the exponentiation
with the absolute value of the exponent. Non-modular exponentiation
with a negative exponent still returns 1.
Fixes #25865
Change-Id: I2a35986a24794b48e549c8de935ac662d217d8a0
Reviewed-on: https://go-review.googlesource.com/118562
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
For primes congruent to 5 mod 8 there is a simple deterministic
method for calculating the modular square root due to Atkin,
using one exponentiation and 4 multiplications.
A. Atkin. Probabilistic primality testing, summary by F. Morain.
Research Report 1779, INRIA, pages 159–163, 1992.
This increases the speed of modular square roots for these primes
considerably.
name old time/op new time/op delta
ModSqrt231_5Mod8-4 1.03ms ± 2% 0.36ms ± 5% -65.06% (p=0.008 n=5+5)
Change-Id: I024f6e514bbca8d634218983117db2afffe615fe
Reviewed-on: https://go-review.googlesource.com/99615
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
Updates #15833
The extended GCD algorithm can be implemented using
Lehmer's algorithm with additional updates for the
cosequences following Algorithm 10.45 from Cohen et al.
"Handbook of Elliptic and Hyperelliptic Curve Cryptography" pp 192.
This brings the speed of the extended GCD calculation within
~2x of the base GCD calculation. There is a slight degradation in
the non-extended GCD speed for small inputs (1-2 words) due to the
additional code to handle the extended updates.
name old time/op new time/op delta
GCD10x10/WithoutXY-4 262ns ± 1% 266ns ± 2% ~ (p=0.333 n=5+5)
GCD10x10/WithXY-4 1.42µs ± 2% 0.74µs ± 3% -47.90% (p=0.008 n=5+5)
GCD10x100/WithoutXY-4 520ns ± 2% 539ns ± 1% +3.81% (p=0.008 n=5+5)
GCD10x100/WithXY-4 2.32µs ± 1% 1.67µs ± 0% -27.80% (p=0.008 n=5+5)
GCD10x1000/WithoutXY-4 1.40µs ± 1% 1.45µs ± 2% +3.26% (p=0.016 n=4+5)
GCD10x1000/WithXY-4 4.78µs ± 1% 3.43µs ± 1% -28.37% (p=0.008 n=5+5)
GCD10x10000/WithoutXY-4 10.0µs ± 0% 10.2µs ± 3% +1.80% (p=0.008 n=5+5)
GCD10x10000/WithXY-4 20.9µs ± 3% 17.9µs ± 1% -14.20% (p=0.008 n=5+5)
GCD10x100000/WithoutXY-4 96.8µs ± 0% 96.3µs ± 1% ~ (p=0.310 n=5+5)
GCD10x100000/WithXY-4 196µs ± 3% 159µs ± 2% -18.61% (p=0.008 n=5+5)
GCD100x100/WithoutXY-4 2.53µs ±15% 2.34µs ± 0% -7.35% (p=0.008 n=5+5)
GCD100x100/WithXY-4 19.3µs ± 0% 3.9µs ± 1% -79.58% (p=0.008 n=5+5)
GCD100x1000/WithoutXY-4 4.23µs ± 0% 4.17µs ± 3% ~ (p=0.127 n=5+5)
GCD100x1000/WithXY-4 22.8µs ± 1% 7.5µs ±10% -67.00% (p=0.008 n=5+5)
GCD100x10000/WithoutXY-4 19.1µs ± 0% 19.0µs ± 0% ~ (p=0.095 n=5+5)
GCD100x10000/WithXY-4 75.1µs ± 2% 30.5µs ± 2% -59.38% (p=0.008 n=5+5)
GCD100x100000/WithoutXY-4 170µs ± 5% 167µs ± 1% ~ (p=1.000 n=5+5)
GCD100x100000/WithXY-4 542µs ± 2% 267µs ± 2% -50.79% (p=0.008 n=5+5)
GCD1000x1000/WithoutXY-4 28.0µs ± 0% 27.1µs ± 0% -3.29% (p=0.008 n=5+5)
GCD1000x1000/WithXY-4 329µs ± 0% 42µs ± 1% -87.12% (p=0.008 n=5+5)
GCD1000x10000/WithoutXY-4 47.2µs ± 0% 46.4µs ± 0% -1.65% (p=0.016 n=5+4)
GCD1000x10000/WithXY-4 607µs ± 9% 123µs ± 1% -79.70% (p=0.008 n=5+5)
GCD1000x100000/WithoutXY-4 260µs ±17% 245µs ± 0% ~ (p=0.056 n=5+5)
GCD1000x100000/WithXY-4 3.64ms ± 1% 0.93ms ± 1% -74.41% (p=0.016 n=4+5)
GCD10000x10000/WithoutXY-4 513µs ± 0% 507µs ± 0% -1.22% (p=0.008 n=5+5)
GCD10000x10000/WithXY-4 7.44ms ± 1% 1.00ms ± 0% -86.58% (p=0.008 n=5+5)
GCD10000x100000/WithoutXY-4 1.23ms ± 0% 1.23ms ± 1% ~ (p=0.056 n=5+5)
GCD10000x100000/WithXY-4 37.3ms ± 0% 7.3ms ± 1% -80.45% (p=0.008 n=5+5)
GCD100000x100000/WithoutXY-4 24.2ms ± 0% 24.2ms ± 0% ~ (p=0.841 n=5+5)
GCD100000x100000/WithXY-4 505ms ± 1% 56ms ± 1% -88.92% (p=0.008 n=5+5)
Change-Id: I25f42ab8c55033acb83cc32bb03c12c1963925e8
Reviewed-on: https://go-review.googlesource.com/78755
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Currently, there is no check for a negative modulus in ModInverse.
Negative moduli are passed internally to GCD, which returns 0 for
negative arguments. Mod is symmetric with respect to negative moduli,
so the calculation can be done by just negating the modulus before
passing the arguments to GCD.
Fixes #24949
Change-Id: Ifd1e64c9b2343f0489c04ab65504e73a623378c7
Reviewed-on: https://go-review.googlesource.com/108115
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
|
|
Currently, the behavior of z.ModInverse(g, n) is undefined
when g and n are not relatively prime. In that case, no
ModInverse exists which can be easily checked during the
computation of the ModInverse. Because the ModInverse does
not indicate whether the inverse exists, there are reimplementations
of a "checked" ModInverse in crypto/rsa. This change removes the
undefined behavior. If the ModInverse does not exist, the receiver z
is unchanged and the return value is nil. This matches the behavior of
ModSqrt for the case where the square root does not exist.
name old time/op new time/op delta
ModInverse-4 2.40µs ± 4% 2.22µs ± 0% -7.74% (p=0.016 n=5+4)
name old alloc/op new alloc/op delta
ModInverse-4 1.36kB ± 0% 1.17kB ± 0% -14.12% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
ModInverse-4 10.0 ± 0% 9.0 ± 0% -10.00% (p=0.008 n=5+5)
Fixes #24922
Change-Id: If7f9d491858450bdb00f1e317152f02493c9c8a8
Reviewed-on: https://go-review.googlesource.com/108996
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
GitHub-Last-Rev: 468df242d07419c228656985702325aa78952d99
GitHub-Pull-Request: golang/go#23935
Change-Id: If751ce3ffa3a4d5e00a3138211383d12cb6b23fc
Reviewed-on: https://go-review.googlesource.com/95577
Run-TryBot: Andrew Bonventre <andybons@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Andrew Bonventre <andybons@golang.org>
|
|
If a composite literal contains any comments on their own lines without
any elements, the printer would unindent the comments.
The comments in this edge case are written when the closing '}' is
written. Indent and outdent first so that the indentation is
interspersed before the comment is written.
Also note that the go/printer golden tests don't show the exact same
behaviour that gofmt does. Added a TODO to figure this out in a separate
CL.
While at it, ensure that the tree conforms to gofmt. The changes are
unrelated to this indentation fix, however.
Fixes #22355.
Change-Id: I5ac25ac6de95a236f1e123479127cc4dd71e93fe
Reviewed-on: https://go-review.googlesource.com/74232
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
A clarifying comment was added to indicate that overflow of a
single Word is not possible in the single digit calculation.
Lehmer's paper includes a proof of the bounds on the size of the
cosequences (u0, u1, u2, v0, v1, v2).
Change-Id: I98127a07aa8f8fe44814b74b2bc6ff720805194b
Reviewed-on: https://go-review.googlesource.com/77451
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
Change-Id: I22a67733aa2d07298e124077654c9b1473802100
Reviewed-on: https://go-review.googlesource.com/76012
Reviewed-by: Aliaksandr Valialkin <valyala@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Fixes #22473.
Change-Id: Ie886dfc8b5510970d6d63ca6472c73325f6f2276
Reviewed-on: https://go-review.googlesource.com/74971
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
|
|
Updates #15833
Lehmer's GCD algorithm uses single precision calculations
to simulate several steps of multiple precision calculations
in Euclid's GCD algorithm which leads to a considerable
speed up. This implementation uses Collins' simplified
testing condition on the single digit cosequences which
requires only one quotient and avoids any possibility of
overflow.
name old time/op new time/op delta
GCD10x10/WithoutXY-4 1.82µs ±24% 0.28µs ± 6% -84.40% (p=0.008 n=5+5)
GCD10x10/WithXY-4 1.69µs ± 6% 1.71µs ± 6% ~ (p=0.595 n=5+5)
GCD10x100/WithoutXY-4 1.87µs ± 2% 0.56µs ± 4% -70.13% (p=0.008 n=5+5)
GCD10x100/WithXY-4 2.61µs ± 2% 2.65µs ± 4% ~ (p=0.635 n=5+5)
GCD10x1000/WithoutXY-4 2.75µs ± 2% 1.48µs ± 1% -46.06% (p=0.008 n=5+5)
GCD10x1000/WithXY-4 5.29µs ± 2% 5.25µs ± 2% ~ (p=0.548 n=5+5)
GCD10x10000/WithoutXY-4 10.7µs ± 2% 10.3µs ± 0% -4.38% (p=0.008 n=5+5)
GCD10x10000/WithXY-4 22.3µs ± 6% 22.1µs ± 1% ~ (p=1.000 n=5+5)
GCD10x100000/WithoutXY-4 93.7µs ± 2% 99.4µs ± 2% +6.09% (p=0.008 n=5+5)
GCD10x100000/WithXY-4 196µs ± 2% 199µs ± 2% ~ (p=0.222 n=5+5)
GCD100x100/WithoutXY-4 10.1µs ± 2% 2.5µs ± 2% -74.84% (p=0.008 n=5+5)
GCD100x100/WithXY-4 21.4µs ± 2% 21.3µs ± 7% ~ (p=0.548 n=5+5)
GCD100x1000/WithoutXY-4 11.3µs ± 2% 4.4µs ± 4% -60.87% (p=0.008 n=5+5)
GCD100x1000/WithXY-4 24.7µs ± 3% 23.9µs ± 1% ~ (p=0.056 n=5+5)
GCD100x10000/WithoutXY-4 26.6µs ± 1% 20.0µs ± 2% -24.82% (p=0.008 n=5+5)
GCD100x10000/WithXY-4 78.7µs ± 2% 78.2µs ± 2% ~ (p=0.690 n=5+5)
GCD100x100000/WithoutXY-4 174µs ± 2% 171µs ± 1% ~ (p=0.056 n=5+5)
GCD100x100000/WithXY-4 563µs ± 4% 561µs ± 2% ~ (p=1.000 n=5+5)
GCD1000x1000/WithoutXY-4 120µs ± 5% 29µs ± 3% -75.71% (p=0.008 n=5+5)
GCD1000x1000/WithXY-4 355µs ± 4% 358µs ± 2% ~ (p=0.841 n=5+5)
GCD1000x10000/WithoutXY-4 140µs ± 2% 49µs ± 2% -65.07% (p=0.008 n=5+5)
GCD1000x10000/WithXY-4 626µs ± 3% 628µs ± 9% ~ (p=0.690 n=5+5)
GCD1000x100000/WithoutXY-4 340µs ± 4% 259µs ± 6% -23.79% (p=0.008 n=5+5)
GCD1000x100000/WithXY-4 3.76ms ± 4% 3.82ms ± 5% ~ (p=0.310 n=5+5)
GCD10000x10000/WithoutXY-4 3.11ms ± 3% 0.54ms ± 2% -82.74% (p=0.008 n=5+5)
GCD10000x10000/WithXY-4 7.96ms ± 3% 7.69ms ± 3% ~ (p=0.151 n=5+5)
GCD10000x100000/WithoutXY-4 3.88ms ± 1% 1.27ms ± 2% -67.21% (p=0.008 n=5+5)
GCD10000x100000/WithXY-4 38.1ms ± 2% 38.8ms ± 1% ~ (p=0.095 n=5+5)
GCD100000x100000/WithoutXY-4 208ms ± 1% 25ms ± 4% -88.07% (p=0.008 n=5+5)
GCD100000x100000/WithXY-4 533ms ± 5% 525ms ± 4% ~ (p=0.548 n=5+5)
Change-Id: Ic1e007eb807b93e75f4752e968e98c1f0cb90e43
Reviewed-on: https://go-review.googlesource.com/59450
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|