| Age | Commit message (Collapse) | Author |
|
Checking that the lengths are equal and panicking teaches the compiler
that it can assume “i in range for z” implies “i in range for x”, letting us
simplify the actual loops a bit.
It also turns up a few places in math/big that were playing maybe a little
too fast and loose with slice lengths. Update those to explicitly set all the
input slices to the same length.
These speedups are basically irrelevant, since they only happen
in real code if people are compiling with -tags math_big_pure_go.
But at least the code is clearer.
benchmark \ system c3h88 c2s16 s7 386 s7-386 c4as16 mac arm loong64 ppc64le riscv64 s390x
AddVV/words=1/impl=go ~ +11.20% +5.11% -7.67% -7.77% +1.90% +10.76% -33.22% ~ +10.98% ~ +6.60%
AddVV/words=10/impl=go -22.12% -13.48% -10.37% -17.95% -18.07% -24.58% -22.04% -29.95% -14.22% ~ -6.33% +3.66%
AddVV/words=16/impl=go -9.75% -13.73% ~ -21.90% -18.66% -30.03% -20.45% -28.09% -17.33% -7.15% -8.96% +12.55%
AddVV/words=100/impl=go -5.91% -1.02% ~ -29.23% -22.18% -25.62% -6.49% -23.59% -22.31% -1.88% -14.13% +9.23%
AddVV/words=1000/impl=go -0.52% -0.19% -3.58% -33.89% -23.46% -22.46% ~ -24.00% -24.73% +0.93% -15.79% +12.32%
AddVV/words=10000/impl=go ~ ~ ~ -33.79% -23.72% -23.79% -5.98% -23.92% ~ +0.78% -15.45% +8.59%
AddVV/words=100000/impl=go ~ ~ ~ -33.90% -24.25% -22.82% -4.09% -24.63% ~ +1.00% -13.56% ~
SubVV/words=1/impl=go ~ +11.64% +14.05% ~ -4.07% ~ +10.79% -33.69% ~ ~ +3.89% +12.33%
SubVV/words=10/impl=go -10.31% -14.09% -7.38% +13.76% -13.25% -18.05% -20.08% -24.97% -14.15% +10.13% -0.97% -2.51%
SubVV/words=16/impl=go -8.06% -13.73% -5.70% +17.00% -12.83% -23.76% -17.52% -25.25% -17.30% -2.80% -4.96% -18.25%
SubVV/words=100/impl=go -9.22% -1.30% -2.76% +20.88% -14.35% -15.29% -8.49% -19.64% -22.31% -0.68% -14.30% -9.04%
SubVV/words=1000/impl=go -0.60% ~ -3.43% +23.08% -16.14% -11.96% ~ -28.52% -24.73% ~ -15.95% -9.91%
SubVV/words=10000/impl=go ~ ~ ~ +26.01% -15.24% -11.92% ~ -28.26% +4.25% ~ -15.42% -5.95%
SubVV/words=100000/impl=go ~ ~ ~ +25.71% -15.83% -12.13% ~ -27.88% -1.27% ~ -13.57% -6.72%
LshVU/words=1/impl=go +0.56% +0.36% ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
LshVU/words=10/impl=go +13.37% +4.63% ~ ~ ~ ~ ~ -2.90% ~ ~ ~ ~
LshVU/words=16/impl=go +22.83% +6.47% ~ ~ ~ ~ ~ ~ +0.80% ~ ~ +5.88%
LshVU/words=100/impl=go +7.56% +13.95% ~ ~ ~ ~ ~ ~ +0.33% -2.50% ~ ~
LshVU/words=1000/impl=go +0.64% +17.92% ~ ~ ~ ~ ~ -6.52% ~ -2.58% ~ ~
LshVU/words=10000/impl=go ~ +17.60% ~ ~ ~ ~ ~ -6.64% -6.22% -1.40% ~ ~
LshVU/words=100000/impl=go ~ +14.57% ~ ~ ~ ~ ~ ~ -5.47% ~ ~ ~
RshVU/words=1/impl=go ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ +2.72%
RshVU/words=10/impl=go ~ ~ ~ ~ ~ ~ ~ +2.50% ~ ~ ~ ~
RshVU/words=16/impl=go ~ +0.53% ~ ~ ~ ~ ~ +3.82% ~ ~ ~ ~
RshVU/words=100/impl=go ~ ~ ~ ~ ~ ~ ~ +6.18% ~ ~ ~ ~
RshVU/words=1000/impl=go ~ ~ ~ ~ ~ ~ ~ +7.00% ~ ~ ~ ~
RshVU/words=10000/impl=go ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
RshVU/words=100000/impl=go ~ ~ ~ ~ ~ ~ ~ +7.05% ~ ~ ~ ~
MulAddVWW/words=1/impl=go -10.34% +4.43% +10.62% -1.62% -4.74% -2.86% +11.75% ~ -8.00% +8.89% +3.87% ~
MulAddVWW/words=10/impl=go -1.61% -5.87% ~ -8.30% -4.55% +0.87% ~ -5.28% -20.82% ~ ~ -2.32%
MulAddVWW/words=16/impl=go -2.96% -5.28% ~ -9.22% -5.28% ~ ~ -3.74% -19.52% -1.48% -2.53% -9.52%
MulAddVWW/words=100/impl=go -3.89% -7.53% +1.93% -10.49% -4.87% -8.27% ~ ~ -0.65% -0.61% -7.59% -20.61%
MulAddVWW/words=1000/impl=go -0.45% -3.91% +4.54% -11.46% -4.69% -8.53% ~ ~ -0.05% ~ -8.88% -19.77%
MulAddVWW/words=10000/impl=go ~ -3.30% +4.10% -11.34% -4.10% -9.43% ~ -0.61% ~ -0.55% -8.21% -18.48%
MulAddVWW/words=100000/impl=go -0.30% -3.03% +4.31% -11.55% -4.41% -9.74% ~ -0.75% +0.63% ~ -7.80% -19.82%
AddMulVVWW/words=1/impl=go ~ +13.09% +12.50% -7.05% -10.41% +2.53% +13.32% -3.49% ~ +15.56% +3.62% ~
AddMulVVWW/words=10/impl=go -15.96% -9.06% -5.06% -14.56% -11.83% -5.44% -26.30% -14.23% -11.44% -1.79% -5.93% -6.60%
AddMulVVWW/words=16/impl=go -19.05% -12.43% -6.19% -14.24% -12.67% -8.65% -18.64% -16.56% -10.64% -3.00% -7.61% -12.80%
AddMulVVWW/words=100/impl=go -22.13% -16.59% -13.04% -13.79% -11.46% -12.01% -6.46% -21.80% -5.08% -3.13% -13.60% -22.53%
AddMulVVWW/words=1000/impl=go -17.07% -17.05% -14.08% -13.59% -12.13% -11.21% ~ -22.81% -4.27% -1.27% -16.35% -23.47%
AddMulVVWW/words=10000/impl=go -15.03% -16.78% -14.23% -13.86% -11.84% -11.69% ~ -22.75% -13.39% -1.10% -14.37% -22.01%
AddMulVVWW/words=100000/impl=go -13.70% -14.90% -14.26% -13.55% -12.04% -11.63% ~ -22.61% ~ -2.53% -10.42% -23.16%
Change-Id: Ic6f64344484a762b818c7090d1396afceb638607
Reviewed-on: https://go-review.googlesource.com/c/go/+/665155
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
|
|
It is annoying that non-x86 implementations of shlVU and shrVU
have to go out of their way to handle the trivial case shift==0
with their own copy loops. Instead, arrange to never call them
with shift==0, so that the code can be removed.
Unfortunately, there are linknames of shlVU, so we cannot
change that function. But we can rename the functions and
then leave behind a shlVU wrapper, so do that.
Since the big.Int API calls the operations Lsh and Rsh, rename
shlVU/shrVU to lshVU/rshVU. Also rename various other shl/shr
methods and functions to lsh/rsh.
Change-Id: Ieaf54e0110a298730aa3e4566ce5be57ba7fc121
Reviewed-on: https://go-review.googlesource.com/c/go/+/664896
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
addMulVVW is an unnecessarily special case.
All other assembly routines taking []Word (V as in vector) arguments
take separate source and destination. For example:
addVV: z = x+y
mulAddVWW: z = x*m+a
addMulVVW uses the z parameter as both destination and source:
addMulVVW: z = z+x*m
Even looking at the signatures is confusing: all the VV routines take
two input vectors x and y, but addMulVVW takes only x: where is y?
(The answer is that the two inputs are z and x.)
It would be nice to fix this, both for understandability and regularity,
and to simplify a future assembly generator.
We cannot remove or redefine addMulVVW, because it has been used
in linknames. Instead, the CL adds a new final addend argument ‘a’
like in mulAddVWW, making the natural name addMulVVWW
(two input vectors, two input words):
addMulVVWW: z = x+y*m+a
This CL updates all the assembly implementations to rename the
inputs z, x, y -> x, y, m, and then introduces a separate destination z.
Change-Id: Ib76c80b53f6d1f4a901f663566e9c4764bb20488
Reviewed-on: https://go-review.googlesource.com/c/go/+/664895
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
|
|
The old Karatsuba implementation only operated on lengths that are
a power of two times a number smaller than karatsubaThreshold.
For example, when karatsubaThreshold = 40, multiplying a pair
of 99-word numbers runs karatsuba on the low 96 (= 39<<2) words
and then has to fix up the answer to include the high 3 words of each.
I suspect this requirement was needed to make the analysis of
how many temporary words to reserve easier, back when the
answer was 3*n and depended on exactly halving the size at
each Karatsuba step.
Now that we have the more flexible temporary allocation stack,
we can change Karatsuba to accept operands of odd length.
Doing so avoids most of the fixup that the old approach required.
For example, multiplying a pair of 99-word numbers runs
karatsuba on all 99 words now.
This is simpler and about the same speed or, for large cases, faster.
goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) CPU @ 3.10GHz
│ old │ new │
│ sec/op │ sec/op vs base │
GCD10x10/WithoutXY-16 99.62n ± 3% 99.10n ± 3% ~ (p=0.009 n=15)
GCD10x10/WithXY-16 243.4n ± 1% 245.2n ± 1% ~ (p=0.009 n=15)
GCD100x100/WithoutXY-16 921.9n ± 1% 919.2n ± 1% ~ (p=0.076 n=15)
GCD100x100/WithXY-16 1.527µ ± 1% 1.526µ ± 0% ~ (p=0.813 n=15)
GCD1000x1000/WithoutXY-16 9.704µ ± 1% 9.696µ ± 0% ~ (p=0.532 n=15)
GCD1000x1000/WithXY-16 14.03µ ± 1% 13.96µ ± 0% ~ (p=0.014 n=15)
GCD10000x10000/WithoutXY-16 206.5µ ± 2% 206.5µ ± 0% ~ (p=0.967 n=15)
GCD10000x10000/WithXY-16 398.0µ ± 1% 397.4µ ± 0% ~ (p=0.683 n=15)
Div/20/10-16 22.22n ± 0% 22.23n ± 0% ~ (p=0.105 n=15)
Div/40/20-16 22.23n ± 0% 22.23n ± 0% ~ (p=0.307 n=15)
Div/100/50-16 55.47n ± 0% 55.47n ± 0% ~ (p=0.573 n=15)
Div/200/100-16 174.9n ± 1% 174.6n ± 1% ~ (p=0.814 n=15)
Div/400/200-16 209.5n ± 1% 210.5n ± 1% ~ (p=0.454 n=15)
Div/1000/500-16 379.9n ± 0% 383.5n ± 2% ~ (p=0.123 n=15)
Div/2000/1000-16 780.1n ± 0% 784.6n ± 1% +0.58% (p=0.000 n=15)
Div/20000/10000-16 25.22µ ± 1% 25.15µ ± 0% ~ (p=0.213 n=15)
Div/200000/100000-16 921.8µ ± 1% 926.1µ ± 0% ~ (p=0.009 n=15)
Div/2000000/1000000-16 37.91m ± 0% 35.63m ± 0% -6.02% (p=0.000 n=15)
Div/20000000/10000000-16 1.378 ± 0% 1.336 ± 0% -3.03% (p=0.000 n=15)
NatMul/10-16 166.8n ± 4% 168.9n ± 3% ~ (p=0.008 n=15)
NatMul/100-16 5.519µ ± 2% 5.548µ ± 4% ~ (p=0.032 n=15)
NatMul/1000-16 230.4µ ± 1% 220.2µ ± 1% -4.43% (p=0.000 n=15)
NatMul/10000-16 8.569m ± 1% 8.640m ± 1% ~ (p=0.005 n=15)
NatMul/100000-16 376.5m ± 1% 334.1m ± 0% -11.26% (p=0.000 n=15)
NatSqr/1-16 27.85n ± 5% 28.60n ± 2% ~ (p=0.123 n=15)
NatSqr/2-16 47.99n ± 2% 48.84n ± 1% ~ (p=0.008 n=15)
NatSqr/3-16 59.41n ± 2% 60.87n ± 2% +2.46% (p=0.001 n=15)
NatSqr/5-16 87.27n ± 2% 89.31n ± 3% ~ (p=0.087 n=15)
NatSqr/8-16 124.6n ± 3% 128.9n ± 3% ~ (p=0.006 n=15)
NatSqr/10-16 166.3n ± 3% 172.7n ± 3% ~ (p=0.002 n=15)
NatSqr/20-16 385.2n ± 2% 394.7n ± 3% ~ (p=0.036 n=15)
NatSqr/30-16 622.7n ± 3% 642.9n ± 3% ~ (p=0.032 n=15)
NatSqr/50-16 1.274µ ± 3% 1.323µ ± 4% ~ (p=0.003 n=15)
NatSqr/80-16 2.606µ ± 4% 2.714µ ± 4% ~ (p=0.044 n=15)
NatSqr/100-16 3.731µ ± 4% 3.871µ ± 4% ~ (p=0.038 n=15)
NatSqr/200-16 12.99µ ± 2% 13.09µ ± 3% ~ (p=0.838 n=15)
NatSqr/300-16 22.87µ ± 2% 23.25µ ± 2% ~ (p=0.285 n=15)
NatSqr/500-16 58.43µ ± 1% 58.25µ ± 2% ~ (p=0.345 n=15)
NatSqr/800-16 115.3µ ± 3% 116.2µ ± 3% ~ (p=0.126 n=15)
NatSqr/1000-16 173.9µ ± 1% 174.3µ ± 1% ~ (p=0.935 n=15)
NatSqr/10000-16 6.133m ± 2% 6.034m ± 1% -1.62% (p=0.000 n=15)
NatSqr/100000-16 253.8m ± 1% 241.5m ± 0% -4.87% (p=0.000 n=15)
geomean 7.745µ 7.760µ +0.19%
goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz
│ old │ new │
│ sec/op │ sec/op vs base │
GCD10x10/WithoutXY-88 62.17n ± 4% 61.44n ± 0% -1.17% (p=0.000 n=15)
GCD10x10/WithXY-88 173.4n ± 2% 172.4n ± 4% ~ (p=0.615 n=15)
GCD100x100/WithoutXY-88 584.0n ± 1% 582.9n ± 0% ~ (p=0.009 n=15)
GCD100x100/WithXY-88 1.098µ ± 1% 1.091µ ± 2% ~ (p=0.002 n=15)
GCD1000x1000/WithoutXY-88 6.055µ ± 0% 6.049µ ± 0% ~ (p=0.007 n=15)
GCD1000x1000/WithXY-88 9.430µ ± 0% 9.417µ ± 1% ~ (p=0.123 n=15)
GCD10000x10000/WithoutXY-88 153.4µ ± 2% 149.0µ ± 2% -2.85% (p=0.000 n=15)
GCD10000x10000/WithXY-88 350.6µ ± 3% 349.0µ ± 2% ~ (p=0.126 n=15)
Div/20/10-88 13.12n ± 0% 13.12n ± 1% 0.00% (p=0.042 n=15)
Div/40/20-88 13.12n ± 0% 13.13n ± 0% ~ (p=0.004 n=15)
Div/100/50-88 25.49n ± 0% 25.49n ± 0% ~ (p=0.452 n=15)
Div/200/100-88 115.7n ± 2% 113.8n ± 2% ~ (p=0.212 n=15)
Div/400/200-88 135.0n ± 1% 136.1n ± 1% ~ (p=0.005 n=15)
Div/1000/500-88 257.5n ± 1% 259.9n ± 1% ~ (p=0.004 n=15)
Div/2000/1000-88 567.5n ± 1% 572.4n ± 2% ~ (p=0.616 n=15)
Div/20000/10000-88 25.65µ ± 0% 25.77µ ± 1% ~ (p=0.032 n=15)
Div/200000/100000-88 777.4µ ± 1% 754.3µ ± 1% -2.97% (p=0.000 n=15)
Div/2000000/1000000-88 33.66m ± 0% 31.37m ± 0% -6.81% (p=0.000 n=15)
Div/20000000/10000000-88 1.320 ± 0% 1.266 ± 0% -4.04% (p=0.000 n=15)
NatMul/10-88 151.9n ± 7% 143.3n ± 7% ~ (p=0.878 n=15)
NatMul/100-88 4.418µ ± 2% 4.337µ ± 3% ~ (p=0.512 n=15)
NatMul/1000-88 206.8µ ± 1% 189.8µ ± 1% -8.25% (p=0.000 n=15)
NatMul/10000-88 8.531m ± 1% 8.095m ± 0% -5.12% (p=0.000 n=15)
NatMul/100000-88 298.9m ± 0% 260.5m ± 1% -12.85% (p=0.000 n=15)
NatSqr/1-88 27.55n ± 6% 28.25n ± 7% ~ (p=0.024 n=15)
NatSqr/2-88 44.71n ± 6% 46.21n ± 9% ~ (p=0.024 n=15)
NatSqr/3-88 55.44n ± 4% 58.41n ± 10% ~ (p=0.126 n=15)
NatSqr/5-88 80.71n ± 5% 81.41n ± 5% ~ (p=0.032 n=15)
NatSqr/8-88 115.7n ± 4% 115.4n ± 5% ~ (p=0.814 n=15)
NatSqr/10-88 147.4n ± 4% 147.3n ± 4% ~ (p=0.505 n=15)
NatSqr/20-88 337.8n ± 3% 337.3n ± 4% ~ (p=0.814 n=15)
NatSqr/30-88 556.9n ± 3% 557.6n ± 4% ~ (p=0.814 n=15)
NatSqr/50-88 1.208µ ± 4% 1.208µ ± 3% ~ (p=0.910 n=15)
NatSqr/80-88 2.591µ ± 3% 2.581µ ± 3% ~ (p=0.705 n=15)
NatSqr/100-88 3.870µ ± 3% 3.858µ ± 3% ~ (p=0.846 n=15)
NatSqr/200-88 14.43µ ± 3% 14.28µ ± 2% ~ (p=0.383 n=15)
NatSqr/300-88 24.68µ ± 2% 24.49µ ± 2% ~ (p=0.624 n=15)
NatSqr/500-88 66.27µ ± 1% 66.18µ ± 1% ~ (p=0.735 n=15)
NatSqr/800-88 128.7µ ± 1% 127.4µ ± 1% ~ (p=0.050 n=15)
NatSqr/1000-88 198.7µ ± 1% 197.7µ ± 1% ~ (p=0.229 n=15)
NatSqr/10000-88 6.582m ± 1% 6.426m ± 1% -2.37% (p=0.000 n=15)
NatSqr/100000-88 274.3m ± 0% 267.3m ± 0% -2.57% (p=0.000 n=15)
geomean 6.518µ 6.438µ -1.22%
goos: linux
goarch: arm64
pkg: math/big
│ old │ new │
│ sec/op │ sec/op vs base │
GCD10x10/WithoutXY-16 61.70n ± 1% 61.32n ± 1% ~ (p=0.361 n=15)
GCD10x10/WithXY-16 217.3n ± 1% 217.0n ± 1% ~ (p=0.395 n=15)
GCD100x100/WithoutXY-16 569.7n ± 0% 572.6n ± 2% ~ (p=0.213 n=15)
GCD100x100/WithXY-16 1.241µ ± 1% 1.236µ ± 1% ~ (p=0.157 n=15)
GCD1000x1000/WithoutXY-16 5.558µ ± 0% 5.566µ ± 0% ~ (p=0.228 n=15)
GCD1000x1000/WithXY-16 9.319µ ± 0% 9.326µ ± 0% ~ (p=0.233 n=15)
GCD10000x10000/WithoutXY-16 126.4µ ± 2% 128.7µ ± 3% ~ (p=0.081 n=15)
GCD10000x10000/WithXY-16 279.3µ ± 0% 278.3µ ± 5% ~ (p=0.187 n=15)
Div/20/10-16 15.12n ± 1% 15.21n ± 1% ~ (p=0.490 n=15)
Div/40/20-16 15.11n ± 0% 15.23n ± 1% ~ (p=0.107 n=15)
Div/100/50-16 26.53n ± 0% 26.50n ± 0% ~ (p=0.299 n=15)
Div/200/100-16 123.7n ± 0% 124.0n ± 0% ~ (p=0.086 n=15)
Div/400/200-16 142.5n ± 0% 142.4n ± 0% ~ (p=0.039 n=15)
Div/1000/500-16 259.9n ± 1% 261.2n ± 1% ~ (p=0.044 n=15)
Div/2000/1000-16 539.4n ± 1% 532.3n ± 1% -1.32% (p=0.001 n=15)
Div/20000/10000-16 22.43µ ± 0% 22.32µ ± 0% -0.49% (p=0.000 n=15)
Div/200000/100000-16 898.3µ ± 0% 889.6µ ± 0% -0.96% (p=0.000 n=15)
Div/2000000/1000000-16 38.37m ± 0% 35.11m ± 0% -8.49% (p=0.000 n=15)
Div/20000000/10000000-16 1.449 ± 0% 1.384 ± 0% -4.48% (p=0.000 n=15)
NatMul/10-16 182.0n ± 1% 177.8n ± 1% -2.31% (p=0.000 n=15)
NatMul/100-16 5.537µ ± 0% 5.693µ ± 0% +2.82% (p=0.000 n=15)
NatMul/1000-16 229.9µ ± 0% 224.8µ ± 0% -2.24% (p=0.000 n=15)
NatMul/10000-16 8.985m ± 0% 8.751m ± 0% -2.61% (p=0.000 n=15)
NatMul/100000-16 371.1m ± 0% 331.5m ± 0% -10.66% (p=0.000 n=15)
NatSqr/1-16 46.77n ± 6% 42.76n ± 1% -8.57% (p=0.000 n=15)
NatSqr/2-16 66.99n ± 4% 63.62n ± 1% -5.03% (p=0.000 n=15)
NatSqr/3-16 76.79n ± 4% 73.42n ± 1% ~ (p=0.007 n=15)
NatSqr/5-16 99.00n ± 3% 95.35n ± 1% -3.69% (p=0.000 n=15)
NatSqr/8-16 160.0n ± 3% 155.1n ± 1% -3.06% (p=0.001 n=15)
NatSqr/10-16 178.4n ± 2% 175.9n ± 0% -1.40% (p=0.001 n=15)
NatSqr/20-16 361.9n ± 2% 361.3n ± 0% ~ (p=0.083 n=15)
NatSqr/30-16 584.7n ± 0% 586.8n ± 0% +0.36% (p=0.000 n=15)
NatSqr/50-16 1.327µ ± 0% 1.329µ ± 0% ~ (p=0.349 n=15)
NatSqr/80-16 2.893µ ± 1% 2.925µ ± 0% +1.11% (p=0.000 n=15)
NatSqr/100-16 4.330µ ± 1% 4.381µ ± 0% +1.18% (p=0.000 n=15)
NatSqr/200-16 16.25µ ± 1% 16.43µ ± 0% +1.07% (p=0.000 n=15)
NatSqr/300-16 27.85µ ± 1% 28.06µ ± 0% +0.77% (p=0.000 n=15)
NatSqr/500-16 76.01µ ± 0% 76.34µ ± 0% ~ (p=0.002 n=15)
NatSqr/800-16 146.8µ ± 0% 148.1µ ± 0% +0.83% (p=0.000 n=15)
NatSqr/1000-16 228.2µ ± 0% 228.6µ ± 0% ~ (p=0.123 n=15)
NatSqr/10000-16 7.524m ± 0% 7.426m ± 0% -1.31% (p=0.000 n=15)
NatSqr/100000-16 316.7m ± 0% 309.2m ± 0% -2.36% (p=0.000 n=15)
geomean 7.264µ 7.172µ -1.27%
goos: darwin
goarch: arm64
pkg: math/big
cpu: Apple M3 Pro
│ old │ new │
│ sec/op │ sec/op vs base │
GCD10x10/WithoutXY-12 32.61n ± 1% 32.42n ± 1% ~ (p=0.021 n=15)
GCD10x10/WithXY-12 87.70n ± 1% 88.42n ± 1% ~ (p=0.010 n=15)
GCD100x100/WithoutXY-12 305.9n ± 0% 306.4n ± 0% ~ (p=0.003 n=15)
GCD100x100/WithXY-12 560.3n ± 2% 556.6n ± 1% ~ (p=0.018 n=15)
GCD1000x1000/WithoutXY-12 3.509µ ± 2% 3.464µ ± 1% ~ (p=0.145 n=15)
GCD1000x1000/WithXY-12 5.347µ ± 2% 5.372µ ± 1% ~ (p=0.046 n=15)
GCD10000x10000/WithoutXY-12 73.75µ ± 1% 73.99µ ± 1% ~ (p=0.004 n=15)
GCD10000x10000/WithXY-12 148.4µ ± 0% 147.8µ ± 1% ~ (p=0.076 n=15)
Div/20/10-12 9.481n ± 0% 9.462n ± 1% ~ (p=0.631 n=15)
Div/40/20-12 9.457n ± 0% 9.462n ± 1% ~ (p=0.798 n=15)
Div/100/50-12 14.91n ± 0% 14.79n ± 1% -0.80% (p=0.000 n=15)
Div/200/100-12 84.56n ± 1% 84.60n ± 1% ~ (p=0.271 n=15)
Div/400/200-12 103.8n ± 0% 102.8n ± 0% -0.96% (p=0.000 n=15)
Div/1000/500-12 181.3n ± 1% 184.2n ± 2% ~ (p=0.091 n=15)
Div/2000/1000-12 397.5n ± 0% 397.4n ± 0% ~ (p=0.299 n=15)
Div/20000/10000-12 14.04µ ± 1% 13.99µ ± 0% ~ (p=0.221 n=15)
Div/200000/100000-12 523.1µ ± 0% 514.0µ ± 3% ~ (p=0.775 n=15)
Div/2000000/1000000-12 21.58m ± 0% 20.01m ± 1% -7.29% (p=0.000 n=15)
Div/20000000/10000000-12 813.5m ± 0% 796.2m ± 1% -2.13% (p=0.000 n=15)
NatMul/10-12 80.46n ± 1% 80.02n ± 1% ~ (p=0.063 n=15)
NatMul/100-12 2.904µ ± 0% 2.979µ ± 1% +2.58% (p=0.000 n=15)
NatMul/1000-12 127.8µ ± 0% 122.3µ ± 0% -4.28% (p=0.000 n=15)
NatMul/10000-12 5.141m ± 0% 4.975m ± 1% -3.23% (p=0.000 n=15)
NatMul/100000-12 208.8m ± 0% 189.6m ± 3% -9.21% (p=0.000 n=15)
NatSqr/1-12 11.90n ± 1% 11.76n ± 1% ~ (p=0.059 n=15)
NatSqr/2-12 21.33n ± 1% 21.12n ± 0% ~ (p=0.063 n=15)
NatSqr/3-12 26.05n ± 1% 25.79n ± 0% ~ (p=0.002 n=15)
NatSqr/5-12 37.31n ± 0% 36.98n ± 1% ~ (p=0.008 n=15)
NatSqr/8-12 63.07n ± 0% 62.75n ± 1% ~ (p=0.061 n=15)
NatSqr/10-12 79.48n ± 0% 79.59n ± 0% ~ (p=0.455 n=15)
NatSqr/20-12 173.1n ± 0% 173.2n ± 1% ~ (p=0.518 n=15)
NatSqr/30-12 288.6n ± 1% 289.2n ± 0% ~ (p=0.030 n=15)
NatSqr/50-12 653.3n ± 0% 653.3n ± 0% ~ (p=0.361 n=15)
NatSqr/80-12 1.492µ ± 0% 1.496µ ± 0% ~ (p=0.018 n=15)
NatSqr/100-12 2.270µ ± 1% 2.270µ ± 0% ~ (p=0.326 n=15)
NatSqr/200-12 8.776µ ± 1% 8.784µ ± 1% ~ (p=0.083 n=15)
NatSqr/300-12 15.07µ ± 0% 15.09µ ± 0% ~ (p=0.455 n=15)
NatSqr/500-12 41.71µ ± 0% 41.77µ ± 1% ~ (p=0.305 n=15)
NatSqr/800-12 80.77µ ± 1% 80.59µ ± 0% ~ (p=0.113 n=15)
NatSqr/1000-12 126.4µ ± 1% 126.5µ ± 0% ~ (p=0.683 n=15)
NatSqr/10000-12 4.204m ± 0% 4.119m ± 0% -2.02% (p=0.000 n=15)
NatSqr/100000-12 177.0m ± 0% 172.9m ± 0% -2.31% (p=0.000 n=15)
geomean 3.790µ 3.757µ -0.87%
Change-Id: Ifc7a9b61f678df216690511ac8bb9143189a795e
Reviewed-on: https://go-review.googlesource.com/c/go/+/652057
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
|
|
No code changes.
This CL moves the multiplication (and squaring) code into natmul.go,
in preparation for cleaning up Karatsuba and then adding Toom-Cook
and FFT-based multiplication.
Change-Id: I7f84328284cc4e1ca4da0ebb9f666a5535e8d7f2
Reviewed-on: https://go-review.googlesource.com/c/go/+/652055
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>
|
|
In the early days of math/big, algorithms that needed more space
grew the result larger than it needed to be and then used the
high words as extra space. This made results their own temporary
space caches, at the cost that saving a result in a data structure
might hold significantly more memory than necessary.
Specifically, new(big.Int).Mul(x, y) returned a big.Int with a
backing slice 3X as big as it strictly needed to be.
If you are storing many multiplication results, or even a single
large result, the 3X overhead can add up.
This approach to storage for temporaries also requires being able
to analyze the algorithms to predict the exact amount they need,
which can be difficult.
For both these reasons, the implementation of recursive long division,
which came later, introduced a “nat pool” where temporaries could be
stored and reused, or reclaimed by the GC when no longer used.
This avoids the storage and bookkeeping overheads but introduces a
per-temporary sync.Pool overhead. divRecursiveStep takes an array
of cached temporaries to remove some of that overhead.
The nat pool was better but is still not quite right.
This CL introduces something even better than the nat pool
(still probably not quite right, but the best I can see for now):
a sync.Pool holding stacks for allocating temporaries.
Now an operation can get one stack out of the pool and then
allocate as many temporaries as it needs during the operation,
eventually returning the stack back to the pool. The sync.Pool
operations are now per-exported-operation (like big.Int.Mul),
not per-temporary.
This CL converts both the pre-allocation in nat.mul and the
uses of the nat pool to use stack pools instead. This simplifies
some code and sets us up better for more complex algorithms
(such as Toom-Cook or FFT-based multiplication) that need
more temporaries. It is also a little bit faster.
goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) CPU @ 3.10GHz
│ old │ new │
│ sec/op │ sec/op vs base │
Div/20/10-16 23.68n ± 0% 22.21n ± 0% -6.21% (p=0.000 n=15)
Div/40/20-16 23.68n ± 0% 22.21n ± 0% -6.21% (p=0.000 n=15)
Div/100/50-16 56.65n ± 0% 55.53n ± 0% -1.98% (p=0.000 n=15)
Div/200/100-16 194.6n ± 1% 172.8n ± 0% -11.20% (p=0.000 n=15)
Div/400/200-16 232.1n ± 0% 206.7n ± 0% -10.94% (p=0.000 n=15)
Div/1000/500-16 405.3n ± 1% 383.8n ± 0% -5.30% (p=0.000 n=15)
Div/2000/1000-16 810.4n ± 1% 795.2n ± 0% -1.88% (p=0.000 n=15)
Div/20000/10000-16 25.88µ ± 0% 25.39µ ± 0% -1.89% (p=0.000 n=15)
Div/200000/100000-16 931.5µ ± 0% 924.3µ ± 0% -0.77% (p=0.000 n=15)
Div/2000000/1000000-16 37.77m ± 0% 37.75m ± 0% ~ (p=0.098 n=15)
Div/20000000/10000000-16 1.367 ± 0% 1.377 ± 0% +0.72% (p=0.003 n=15)
NatMul/10-16 168.5n ± 3% 164.0n ± 4% ~ (p=0.751 n=15)
NatMul/100-16 6.086µ ± 3% 5.380µ ± 3% -11.60% (p=0.000 n=15)
NatMul/1000-16 238.1µ ± 3% 228.3µ ± 1% -4.12% (p=0.000 n=15)
NatMul/10000-16 8.721m ± 2% 8.518m ± 1% -2.33% (p=0.000 n=15)
NatMul/100000-16 369.6m ± 0% 371.1m ± 0% +0.42% (p=0.000 n=15)
geomean 19.57µ 18.74µ -4.21%
│ old │ new │
│ B/op │ B/op vs base │
NatMul/10-16 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹
NatMul/100-16 4.750Ki ± 0% 1.751Ki ± 0% -63.14% (p=0.000 n=15)
NatMul/1000-16 48.16Ki ± 0% 16.02Ki ± 0% -66.73% (p=0.000 n=15)
NatMul/10000-16 482.9Ki ± 1% 165.4Ki ± 3% -65.75% (p=0.000 n=15)
NatMul/100000-16 5.747Mi ± 7% 4.197Mi ± 0% -26.97% (p=0.000 n=15)
geomean 41.42Ki 20.63Ki -50.18%
¹ all samples are equal
│ old │ new │
│ allocs/op │ allocs/op vs base │
NatMul/10-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/100-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/1000-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/10000-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/100000-16 7.000 ± 14% 7.000 ± 14% ~ (p=0.668 n=15)
geomean 1.476 1.476 +0.00%
¹ all samples are equal
goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz
│ old │ new │
│ sec/op │ sec/op vs base │
Div/20/10-88 15.84n ± 1% 13.12n ± 0% -17.17% (p=0.000 n=15)
Div/40/20-88 15.88n ± 1% 13.12n ± 0% -17.38% (p=0.000 n=15)
Div/100/50-88 26.42n ± 0% 25.47n ± 0% -3.60% (p=0.000 n=15)
Div/200/100-88 132.4n ± 0% 114.9n ± 0% -13.22% (p=0.000 n=15)
Div/400/200-88 150.1n ± 0% 135.6n ± 0% -9.66% (p=0.000 n=15)
Div/1000/500-88 275.5n ± 0% 264.1n ± 0% -4.14% (p=0.000 n=15)
Div/2000/1000-88 586.5n ± 0% 581.1n ± 0% -0.92% (p=0.000 n=15)
Div/20000/10000-88 25.87µ ± 0% 25.72µ ± 0% -0.59% (p=0.000 n=15)
Div/200000/100000-88 772.2µ ± 0% 779.0µ ± 0% +0.88% (p=0.000 n=15)
Div/2000000/1000000-88 33.36m ± 0% 33.63m ± 0% +0.80% (p=0.000 n=15)
Div/20000000/10000000-88 1.307 ± 0% 1.320 ± 0% +1.03% (p=0.000 n=15)
NatMul/10-88 140.4n ± 0% 148.8n ± 4% +5.98% (p=0.000 n=15)
NatMul/100-88 4.663µ ± 1% 4.388µ ± 1% -5.90% (p=0.000 n=15)
NatMul/1000-88 207.7µ ± 0% 205.8µ ± 0% -0.89% (p=0.000 n=15)
NatMul/10000-88 8.456m ± 0% 8.468m ± 0% +0.14% (p=0.021 n=15)
NatMul/100000-88 295.1m ± 0% 297.9m ± 0% +0.94% (p=0.000 n=15)
geomean 14.96µ 14.33µ -4.23%
│ old │ new │
│ B/op │ B/op vs base │
NatMul/10-88 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹
NatMul/100-88 4.750Ki ± 0% 1.758Ki ± 0% -62.99% (p=0.000 n=15)
NatMul/1000-88 48.44Ki ± 0% 16.08Ki ± 0% -66.80% (p=0.000 n=15)
NatMul/10000-88 489.7Ki ± 1% 166.1Ki ± 3% -66.08% (p=0.000 n=15)
NatMul/100000-88 5.546Mi ± 0% 3.819Mi ± 60% -31.15% (p=0.000 n=15)
geomean 41.29Ki 20.30Ki -50.85%
¹ all samples are equal
│ old │ new │
│ allocs/op │ allocs/op vs base │
NatMul/10-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/100-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/1000-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/10000-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/100000-88 5.000 ± 20% 6.000 ± 67% ~ (p=0.672 n=15)
geomean 1.380 1.431 +3.71%
¹ all samples are equal
goos: linux
goarch: arm64
pkg: math/big
│ old │ new │
│ sec/op │ sec/op vs base │
Div/20/10-16 15.85n ± 0% 15.23n ± 0% -3.91% (p=0.000 n=15)
Div/40/20-16 15.88n ± 0% 15.22n ± 0% -4.16% (p=0.000 n=15)
Div/100/50-16 29.69n ± 0% 26.39n ± 0% -11.11% (p=0.000 n=15)
Div/200/100-16 149.2n ± 0% 123.3n ± 0% -17.36% (p=0.000 n=15)
Div/400/200-16 160.3n ± 0% 139.2n ± 0% -13.16% (p=0.000 n=15)
Div/1000/500-16 271.0n ± 0% 256.1n ± 0% -5.50% (p=0.000 n=15)
Div/2000/1000-16 545.3n ± 0% 527.0n ± 0% -3.36% (p=0.000 n=15)
Div/20000/10000-16 22.60µ ± 0% 22.20µ ± 0% -1.77% (p=0.000 n=15)
Div/200000/100000-16 889.0µ ± 0% 892.2µ ± 0% +0.35% (p=0.000 n=15)
Div/2000000/1000000-16 38.01m ± 0% 38.12m ± 0% +0.30% (p=0.000 n=15)
Div/20000000/10000000-16 1.437 ± 0% 1.444 ± 0% +0.50% (p=0.000 n=15)
NatMul/10-16 166.4n ± 2% 169.5n ± 1% +1.86% (p=0.000 n=15)
NatMul/100-16 5.733µ ± 1% 5.570µ ± 1% -2.84% (p=0.000 n=15)
NatMul/1000-16 232.6µ ± 1% 229.8µ ± 0% -1.22% (p=0.000 n=15)
NatMul/10000-16 9.039m ± 1% 8.969m ± 0% -0.77% (p=0.000 n=15)
NatMul/100000-16 367.0m ± 0% 368.8m ± 0% +0.48% (p=0.000 n=15)
geomean 16.15µ 15.50µ -4.01%
│ old │ new │
│ B/op │ B/op vs base │
NatMul/10-16 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹
NatMul/100-16 4.750Ki ± 0% 1.751Ki ± 0% -63.14% (p=0.000 n=15)
NatMul/1000-16 48.33Ki ± 0% 16.02Ki ± 0% -66.85% (p=0.000 n=15)
NatMul/10000-16 536.5Ki ± 1% 165.7Ki ± 3% -69.12% (p=0.000 n=15)
NatMul/100000-16 6.078Mi ± 6% 4.197Mi ± 0% -30.94% (p=0.000 n=15)
geomean 42.81Ki 20.64Ki -51.78%
¹ all samples are equal
│ old │ new │
│ allocs/op │ allocs/op vs base │
NatMul/10-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/100-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/1000-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/10000-16 2.000 ± 50% 1.000 ± 0% -50.00% (p=0.001 n=15)
NatMul/100000-16 9.000 ± 11% 8.000 ± 12% -11.11% (p=0.001 n=15)
geomean 1.783 1.516 -14.97%
¹ all samples are equal
goos: darwin
goarch: arm64
pkg: math/big
cpu: Apple M3 Pro
│ old │ new │
│ sec/op │ sec/op vs base │
Div/20/10-12 9.850n ± 1% 9.405n ± 1% -4.52% (p=0.000 n=15)
Div/40/20-12 9.858n ± 0% 9.403n ± 1% -4.62% (p=0.000 n=15)
Div/100/50-12 16.40n ± 1% 14.81n ± 0% -9.70% (p=0.000 n=15)
Div/200/100-12 88.48n ± 2% 80.88n ± 0% -8.59% (p=0.000 n=15)
Div/400/200-12 107.90n ± 1% 99.28n ± 1% -7.99% (p=0.000 n=15)
Div/1000/500-12 188.8n ± 1% 178.6n ± 1% -5.40% (p=0.000 n=15)
Div/2000/1000-12 399.9n ± 0% 389.1n ± 0% -2.70% (p=0.000 n=15)
Div/20000/10000-12 13.94µ ± 2% 13.81µ ± 1% ~ (p=0.574 n=15)
Div/200000/100000-12 523.8µ ± 0% 521.7µ ± 0% -0.40% (p=0.000 n=15)
Div/2000000/1000000-12 21.46m ± 0% 21.48m ± 0% ~ (p=0.067 n=15)
Div/20000000/10000000-12 812.5m ± 0% 812.9m ± 0% ~ (p=0.061 n=15)
NatMul/10-12 77.14n ± 0% 78.35n ± 1% +1.57% (p=0.000 n=15)
NatMul/100-12 2.999µ ± 0% 2.871µ ± 1% -4.27% (p=0.000 n=15)
NatMul/1000-12 126.2µ ± 0% 126.8µ ± 0% +0.51% (p=0.011 n=15)
NatMul/10000-12 5.099m ± 0% 5.125m ± 0% +0.51% (p=0.000 n=15)
NatMul/100000-12 206.7m ± 0% 208.4m ± 0% +0.80% (p=0.000 n=15)
geomean 9.512µ 9.236µ -2.91%
│ old │ new │
│ B/op │ B/op vs base │
NatMul/10-12 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹
NatMul/100-12 4.750Ki ± 0% 1.750Ki ± 0% -63.16% (p=0.000 n=15)
NatMul/1000-12 48.13Ki ± 0% 16.01Ki ± 0% -66.73% (p=0.000 n=15)
NatMul/10000-12 483.5Ki ± 1% 163.2Ki ± 2% -66.24% (p=0.000 n=15)
NatMul/100000-12 5.480Mi ± 4% 1.532Mi ± 104% -72.05% (p=0.000 n=15)
geomean 41.03Ki 16.82Ki -59.01%
¹ all samples are equal
│ old │ new │
│ allocs/op │ allocs/op vs base │
NatMul/10-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/100-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/1000-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/10000-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹
NatMul/100000-12 5.000 ± 0% 1.000 ± 400% -80.00% (p=0.007 n=15)
geomean 1.380 1.000 -27.52%
¹ all samples are equal
Change-Id: I7efa6fe37971ed26ae120a32250fcb47ece0a011
Reviewed-on: https://go-review.googlesource.com/c/go/+/650638
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
|
|
If Be and Le stand for big-endian and little-endian,
then they should be BE and LE.
Change-Id: I723e3962b8918da84791783d3c547638f1c9e8a9
Reviewed-on: https://go-review.googlesource.com/c/go/+/627376
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Change-Id: Id07f16d14133ee539bc2880b39641c42418fa6e2
GitHub-Last-Rev: 7b327d508f677f2476d24f046d25921f4599dd9a
GitHub-Pull-Request: golang/go#67319
Reviewed-on: https://go-review.googlesource.com/c/go/+/585016
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
Change-Id: I07c3a498ce1e462c3d1703d77e7d7824e9334651
GitHub-Last-Rev: 2ba8c4c705eaeb0772109ece7296978b62467eb3
GitHub-Pull-Request: golang/go#66312
Reviewed-on: https://go-review.googlesource.com/c/go/+/571636
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Compute median as a + (b-a)/2 instead of (a + b)/2.
Add additional test cases.
Fixes #65025.
Change-Id: Ib716a1036c17f8f33f51e33cedab13512eb7e0be
Reviewed-on: https://go-review.googlesource.com/c/go/+/554617
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Robert Griesemer <gri@google.com>
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Change-Id: I71a38dd20bfaf2b1aed18892d54eeb017d3d7d66
GitHub-Last-Rev: 8da43b2cbd563ed123690709e519c9f84272b332
GitHub-Pull-Request: golang/go#61955
Reviewed-on: https://go-review.googlesource.com/c/go/+/518595
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
|
|
CL 450055 replaced BitLen with a slower constant-time implementation,
which caused a performance regression in some ecosystem benchmarks.
https://perf.golang.org/search?q=upload%3A20221130.13+pkg%3Agithub.com%2Fericlagergren%2Fdecimal%2Fbenchmarks
Current tip vs this CL
name old time/op new time/op delta
Pi/foo=ericlagergren_(Go)/prec=100-4 151µs ± 0% 129µs ± 0% -14.89% (p=0.000 n=10+9)
Pi/foo=ericlagergren_(GDA)/prec=100-4 305µs ± 0% 269µs ± 1% -11.88% (p=0.000 n=9+10)
Pi/foo=cockroachdb/apd/prec=100-4 5.74ms ± 0% 5.33ms ± 0% -7.02% (p=0.000 n=9+10)
Pi/foo=shopspring/prec=100-4 265µs ±16% 268µs ±11% ~ (p=0.796 n=10+10)
Pi/foo=apmckinlay/prec=100-4 3.10µs ± 0% 3.08µs ± 0% -0.60% (p=0.000 n=8+10)
Pi/foo=go-inf/prec=100-4 132µs ± 9% 137µs ± 9% ~ (p=0.182 n=10+9)
Pi/foo=float64/prec=100-4 4.97µs ± 0% 4.98µs ± 0% ~ (p=0.196 n=10+10)
CL 450055's parent vs this CL
name old time/op new time/op delta
Pi/foo=ericlagergren_(Go)/prec=100-4 129µs ± 1% 129µs ± 0% ~ (p=0.182 n=10+9)
Pi/foo=ericlagergren_(GDA)/prec=100-4 267µs ± 1% 269µs ± 1% +0.93% (p=0.001 n=9+10)
Pi/foo=shopspring/prec=100-4 252µs ± 9% 268µs ±11% ~ (p=0.052 n=10+10)
Pi/foo=apmckinlay/prec=100-4 3.10µs ± 1% 3.08µs ± 0% -0.66% (p=0.000 n=9+10)
Pi/foo=go-inf/prec=100-4 135µs ± 6% 137µs ± 9% ~ (p=0.605 n=9+9)
Pi/foo=float64/prec=100-4 4.97µs ± 0% 4.98µs ± 0% +0.23% (p=0.005 n=8+10)
goos: linux
goarch: amd64
pkg: github.com/ericlagergren/decimal_benchmarks
cpu: Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz
Fixes #57014
Change-Id: I08478bea122212320a592ad2652e33077807de09
Reviewed-on: https://go-review.googlesource.com/c/go/+/454617
Reviewed-by: Roland Shoemaker <roland@golang.org>
Run-TryBot: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Auto-Submit: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This will let us reuse it in crypto/ecdsa for the NIST scalar fields.
The main change in API is around encoding and decoding. The SetBytes +
ExpandFor sequence was hacky: SetBytes could produce a bigger size than
the modulus if leading zeroes in the top byte overflowed the limb
boundary, so ExpandFor had to check for and tolerate that. Also, the
caller was responsible for checking that the overflow was actually all
zeroes (which we weren't doing, exposing a crasher in decryption and
signature verification) and then for checking that the result was less
than the modulus. Instead, make SetBytes take a modulus and return an
error if the value overflows. Same with Bytes: we were always allocating
based on Size before FillBytes anyway, so now Bytes takes a modulus.
Finally, SetBig was almost only used for moduli, so replaced
NewModulusFromNat and SetBig with NewModulusFromBig.
Moved the constant-time bitLen to math/big.Int.BitLen. It's slower, but
BitLen is primarily used in cryptographic code, so it's safer this way.
Change-Id: Ibaf7f36d80695578cb80484167d82ce1aa83832f
Reviewed-on: https://go-review.googlesource.com/c/go/+/450055
Auto-Submit: Filippo Valsorda <filippo@golang.org>
Run-TryBot: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
|
|
Montgomery multiplication can be used for Exp mod even m
by splitting it into two steps - Exp mod an odd number and
Exp mod a power of two - and then combining the two results
using the Chinese Remainder Theorem.
For more details, see Ç. K. Koç, “Montgomery Reduction with Even Modulus”,
IEE Proceedings: Computers and Digital Techniques, 141(5) 314-316, September 1994.
http://www.people.vcu.edu/~jwang3/CMSC691/j34monex.pdf
Thanks to Guido Vranken for suggesting that we use a faster algorithm.
name old time/op new time/op delta
ExpMont/Odd-16 240µs ± 2% 239µs ± 2% ~ (p=0.853 n=10+10)
ExpMont/Even1-16 757µs ± 3% 249µs ± 6% -67.17% (p=0.000 n=10+10)
ExpMont/Even2-16 755µs ± 1% 244µs ± 4% -67.64% (p=0.000 n=8+10)
ExpMont/Even3-16 771µs ± 3% 240µs ± 2% -68.90% (p=0.000 n=10+10)
ExpMont/Even4-16 775µs ± 3% 241µs ± 2% -68.91% (p=0.000 n=10+10)
ExpMont/Even8-16 779µs ± 2% 241µs ± 3% -69.06% (p=0.000 n=9+10)
ExpMont/Even32-16 778µs ± 3% 240µs ± 4% -69.11% (p=0.000 n=9+9)
ExpMont/Even64-16 774µs ± 6% 186µs ± 2% -76.00% (p=0.000 n=10+10)
ExpMont/Even96-16 776µs ± 4% 186µs ± 6% -76.09% (p=0.000 n=9+10)
ExpMont/Even128-16 764µs ± 2% 143µs ± 3% -81.24% (p=0.000 n=10+10)
ExpMont/Even255-16 761µs ± 3% 109µs ± 2% -85.73% (p=0.000 n=10+10)
ExpMont/SmallEven1-16 45.6µs ± 1% 36.3µs ± 3% -20.49% (p=0.000 n=9+10)
ExpMont/SmallEven2-16 44.3µs ± 2% 37.5µs ± 2% -15.26% (p=0.000 n=9+10)
ExpMont/SmallEven3-16 44.1µs ± 5% 37.3µs ± 3% -15.32% (p=0.000 n=9+10)
ExpMont/SmallEven4-16 47.1µs ± 6% 38.0µs ± 5% -19.40% (p=0.000 n=10+9)
name old alloc/op new alloc/op delta
ExpMont/Odd-16 2.53kB ± 0% 2.53kB ± 0% ~ (p=0.137 n=8+10)
ExpMont/Even1-16 2.57kB ± 0% 3.31kB ± 0% +28.90% (p=0.000 n=8+10)
ExpMont/Even2-16 2.57kB ± 0% 3.37kB ± 0% +31.09% (p=0.000 n=9+10)
ExpMont/Even3-16 2.57kB ± 0% 3.37kB ± 0% +31.08% (p=0.000 n=8+8)
ExpMont/Even4-16 2.57kB ± 0% 3.37kB ± 0% +31.09% (p=0.000 n=9+10)
ExpMont/Even8-16 2.57kB ± 0% 3.37kB ± 0% +31.09% (p=0.000 n=9+10)
ExpMont/Even32-16 2.57kB ± 0% 3.37kB ± 0% +31.14% (p=0.000 n=10+10)
ExpMont/Even64-16 2.57kB ± 0% 3.16kB ± 0% +22.99% (p=0.000 n=9+9)
ExpMont/Even96-16 2.57kB ± 0% 3.44kB ± 0% +33.90% (p=0.000 n=10+8)
ExpMont/Even128-16 2.57kB ± 0% 2.88kB ± 0% +12.10% (p=0.000 n=10+10)
ExpMont/Even255-16 2.57kB ± 0% 2.54kB ± 0% -1.30% (p=0.000 n=9+10)
ExpMont/SmallEven1-16 872B ± 0% 1232B ± 0% +41.28% (p=0.000 n=10+10)
ExpMont/SmallEven2-16 872B ± 0% 1233B ± 0% +41.40% (p=0.000 n=10+9)
ExpMont/SmallEven3-16 872B ± 0% 1289B ± 0% +47.82% (p=0.000 n=10+10)
ExpMont/SmallEven4-16 872B ± 0% 1241B ± 0% +42.32% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
ExpMont/Odd-16 21.0 ± 0% 21.0 ± 0% ~ (all equal)
ExpMont/Even1-16 24.0 ± 0% 38.0 ± 0% +58.33% (p=0.000 n=10+10)
ExpMont/Even2-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10)
ExpMont/Even3-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10)
ExpMont/Even4-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10)
ExpMont/Even8-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10)
ExpMont/Even32-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10)
ExpMont/Even64-16 24.0 ± 0% 39.0 ± 0% +62.50% (p=0.000 n=10+10)
ExpMont/Even96-16 24.0 ± 0% 42.0 ± 0% +75.00% (p=0.000 n=10+10)
ExpMont/Even128-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10)
ExpMont/Even255-16 24.0 ± 0% 38.0 ± 0% +58.33% (p=0.000 n=10+10)
ExpMont/SmallEven1-16 16.0 ± 0% 35.0 ± 0% +118.75% (p=0.000 n=10+10)
ExpMont/SmallEven2-16 16.0 ± 0% 35.0 ± 0% +118.75% (p=0.000 n=10+10)
ExpMont/SmallEven3-16 16.0 ± 0% 37.0 ± 0% +131.25% (p=0.000 n=10+10)
ExpMont/SmallEven4-16 16.0 ± 0% 36.0 ± 0% +125.00% (p=0.000 n=10+10)
Change-Id: Ib7f70290f8f087b78805ec3120baf17dd7737ac3
Reviewed-on: https://go-review.googlesource.com/c/go/+/420897
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
[This CL is part of a sequence implementing the proposal #51082.
The design doc is at https://go.dev/s/godocfmt-design.]
Run the updated gofmt, which reformats doc comments,
on the main repository. Vendored files are excluded.
For #51082.
Change-Id: I7332f099b60f716295fb34719c98c04eb1a85407
Reviewed-on: https://go-review.googlesource.com/c/go/+/384268
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
A future change to gofmt will rewrite
// Doc comment.
//
func f()
to
// Doc comment.
func f()
Apply that change preemptively to all doc comments.
For #51082.
Change-Id: I4023e16cfb0729b64a8590f071cd92f17343081d
Reviewed-on: https://go-review.googlesource.com/c/go/+/384259
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
A run of lines that are indented with any number of spaces or tabs
format as a <pre> block. This commit fixes various doc comments
that format badly according to that (standard) rule.
For example, consider:
// - List item.
// Second line.
// - Another item.
Because the - lines are unindented, this is actually two paragraphs
separated by a one-line <pre> block. This CL rewrites it to:
// - List item.
// Second line.
// - Another item.
Today, that will format as a single <pre> block.
In a future release, we hope to format it as a bulleted list.
Various other minor fixes as well, all in preparation for reformatting.
For #51082.
Change-Id: I95cf06040d4186830e571cd50148be3bf8daf189
Reviewed-on: https://go-review.googlesource.com/c/go/+/384257
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Change-Id: I7dfae0fc91c2d70873ec7ec920be7c0a4888153a
Reviewed-on: https://go-review.googlesource.com/c/go/+/390175
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Trust: Daniel Martí <mvdan@mvdan.cc>
|
|
Code moved and functions reordered to be in a consistent
top-down dependency order, but otherwise unchanged.
First step toward commenting division algorithms.
Change-Id: Ib5e604fb5b2867edff3a228ba4e57b5cb32c4137
Reviewed-on: https://go-review.googlesource.com/c/go/+/321077
Trust: Russ Cox <rsc@golang.org>
Trust: Katie Hockman <katie@golang.org>
Trust: Robert Griesemer <gri@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Katie Hockman <katie@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
There appears to be a typo in the description of
the recursive division algorithm.
Two things seem suspicious with the original comment:
1. It is talking about choosing s, but s doesn't
appear anywhere in the equation.
2. The math in the equation is incorrect.
Where
B = len(v)/2
s = B - 1
Proof that it is incorrect:
len(v) - B >= B + 1
len(v) - len(v)/2 >= len(v)/2 + 1
This doesn't hold if len(v) is even, e.g. 10:
10 - 10/2 >= 10/2 + 1
10 - 5 >= 5 + 1
5 >= 6 // this is false
The new equation will be the following,
which will be mathematically correct:
len(v) - s >= B + 1
len(v) - (len(v)/2 - 1) >= len(v)/2 + 1
len(v) - len(v)/2 + 1 >= len(v)/2 + 1
len(v) - len(v)/2 >= len(v)/2
This holds if len(v) is even or odd.
e.g. 10
10 - 10/2 >= 10/2
10 - 5 >= 5
5 >= 5
e.g. 11
11 - 11/2 >= 11/2
11 - 5 >= 5
6 >= 5
Change-Id: If77ce09286cf7038637b5dfd0fb7d4f828023f56
Reviewed-on: https://go-review.googlesource.com/c/go/+/287372
Run-TryBot: Katie Hockman <katie@golang.org>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Trust: Katie Hockman <katie@golang.org>
|
|
The previous s value could cause a crash
for certain inputs.
Will check in tests and documentation improvements later.
Thanks to the Go Ethereum team and the OSS-Fuzz project for reporting this.
Thanks to Rémy Oudompheng and Robert Griesemer for their help
developing and validating the fix.
Fixes CVE-2020-28362
Change-Id: Ibbf455c4436bcdb07c84a34fa6551fb3422356d3
Reviewed-on: https://team-review.git.corp.google.com/c/golang/go-private/+/899974
Reviewed-by: Roland Shoemaker <bracewell@google.com>
Reviewed-by: Filippo Valsorda <valsorda@google.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/269657
Trust: Katie Hockman <katie@golang.org>
Trust: Roland Shoemaker <roland@golang.org>
Run-TryBot: Katie Hockman <katie@golang.org>
Reviewed-by: Roland Shoemaker <roland@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
|
|
Division is much slower than multiplication. And the method of using
multiplication by multiplying reciprocal and replacing division with it
can increase the speed of divWVW algorithm by three times,and at the
same time increase the speed of nats division.
The benchmark test on arm64 is as follows:
name old time/op new time/op delta
DivWVW/1-4 13.1ns ± 4% 13.3ns ± 4% ~ (p=0.444 n=5+5)
DivWVW/2-4 48.6ns ± 1% 51.2ns ± 2% +5.39% (p=0.008 n=5+5)
DivWVW/3-4 82.0ns ± 1% 69.7ns ± 1% -15.03% (p=0.008 n=5+5)
DivWVW/4-4 116ns ± 1% 71ns ± 2% -38.88% (p=0.008 n=5+5)
DivWVW/5-4 152ns ± 1% 84ns ± 4% -44.70% (p=0.008 n=5+5)
DivWVW/10-4 319ns ± 1% 155ns ± 4% -51.50% (p=0.008 n=5+5)
DivWVW/100-4 3.44µs ± 3% 1.30µs ± 8% -62.30% (p=0.008 n=5+5)
DivWVW/1000-4 33.8µs ± 0% 10.9µs ± 1% -67.74% (p=0.008 n=5+5)
DivWVW/10000-4 343µs ± 4% 111µs ± 5% -67.63% (p=0.008 n=5+5)
DivWVW/100000-4 3.35ms ± 1% 1.25ms ± 3% -62.79% (p=0.008 n=5+5)
QuoRem-4 3.08µs ± 2% 2.21µs ± 4% -28.40% (p=0.008 n=5+5)
ModSqrt225_Tonelli-4 444µs ± 2% 457µs ± 3% ~ (p=0.095 n=5+5)
ModSqrt225_3Mod4-4 136µs ± 1% 138µs ± 3% ~ (p=0.151 n=5+5)
ModSqrt231_Tonelli-4 473µs ± 3% 483µs ± 4% ~ (p=0.548 n=5+5)
ModSqrt231_5Mod8-4 164µs ± 9% 169µs ±12% ~ (p=0.421 n=5+5)
Sqrt-4 36.8µs ± 1% 28.6µs ± 0% -22.17% (p=0.016 n=5+4)
Div/20/10-4 50.0ns ± 3% 51.3ns ± 6% ~ (p=0.238 n=5+5)
Div/40/20-4 49.8ns ± 2% 51.3ns ± 6% ~ (p=0.222 n=5+5)
Div/100/50-4 85.8ns ± 4% 86.5ns ± 5% ~ (p=0.246 n=5+5)
Div/200/100-4 335ns ± 3% 296ns ± 2% -11.60% (p=0.008 n=5+5)
Div/400/200-4 442ns ± 2% 359ns ± 5% -18.81% (p=0.008 n=5+5)
Div/1000/500-4 858ns ± 3% 643ns ± 6% -25.06% (p=0.008 n=5+5)
Div/2000/1000-4 1.70µs ± 3% 1.28µs ± 4% -24.80% (p=0.008 n=5+5)
Div/20000/10000-4 45.0µs ± 5% 41.8µs ± 4% -7.17% (p=0.016 n=5+5)
Div/200000/100000-4 1.51ms ± 7% 1.43ms ± 3% -5.42% (p=0.016 n=5+5)
Div/2000000/1000000-4 57.6ms ± 4% 57.5ms ± 3% ~ (p=1.000 n=5+5)
Div/20000000/10000000-4 2.08s ± 3% 2.04s ± 1% ~ (p=0.095 n=5+5)
name old speed new speed delta
DivWVW/1-4 4.87GB/s ± 4% 4.80GB/s ± 4% ~ (p=0.310 n=5+5)
DivWVW/2-4 2.63GB/s ± 1% 2.50GB/s ± 2% -5.07% (p=0.008 n=5+5)
DivWVW/3-4 2.34GB/s ± 1% 2.76GB/s ± 1% +17.70% (p=0.008 n=5+5)
DivWVW/4-4 2.21GB/s ± 1% 3.61GB/s ± 2% +63.42% (p=0.008 n=5+5)
DivWVW/5-4 2.10GB/s ± 2% 3.81GB/s ± 4% +80.89% (p=0.008 n=5+5)
DivWVW/10-4 2.01GB/s ± 0% 4.13GB/s ± 4% +105.91% (p=0.008 n=5+5)
DivWVW/100-4 1.86GB/s ± 2% 4.95GB/s ± 7% +165.63% (p=0.008 n=5+5)
DivWVW/1000-4 1.89GB/s ± 0% 5.86GB/s ± 1% +209.96% (p=0.008 n=5+5)
DivWVW/10000-4 1.87GB/s ± 4% 5.76GB/s ± 5% +208.96% (p=0.008 n=5+5)
DivWVW/100000-4 1.91GB/s ± 1% 5.14GB/s ± 3% +168.85% (p=0.008 n=5+5)
Change-Id: I049f1196562b20800e6ef8a6493fd147f93ad830
Reviewed-on: https://go-review.googlesource.com/c/go/+/250417
Trust: Giovanni Bajo <rasky@develer.com>
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Replaced almost every use of Bytes with FillBytes.
Note that the approved proposal was for
func (*Int) FillBytes(buf []byte)
while this implements
func (*Int) FillBytes(buf []byte) []byte
because the latter was far nicer to use in all callsites.
Fixes #35833
Change-Id: Ia912df123e5d79b763845312ea3d9a8051343c0a
Reviewed-on: https://go-review.googlesource.com/c/go/+/230397
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
The divBasic function computes the quotient of big nats u/v word by word.
It estimates each word qhat by performing a long division (top 2 words of u
divided by top word of v), looks at the next word to correct the estimate,
then perform a full multiplication (qhat*v) to catch any inaccuracy in the
estimate.
In the latter case, "negative" values appear temporarily and carries
must be carefully managed, and the recursive division refactoring
introduced a case where qhat*v has the same length as v, triggering an
out-of-bounds write in the case it happens when computing the top word
of the quotient.
Fixes #37499
Change-Id: I15089da4a4027beda43af497bf6de261eb792f94
Reviewed-on: https://go-review.googlesource.com/c/go/+/221980
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
Change-Id: I5b909df0fd048cd66c5a27fca1b06466d3bcaac7
GitHub-Last-Rev: 778c5d21311abee09a5fbda2e4005a5fd4cc3f9f
GitHub-Pull-Request: golang/go#35624
Reviewed-on: https://go-review.googlesource.com/c/go/+/207421
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
The bounds in the last carry branch were wrong as there
is no reason for len(u) >= n+n/2 to always hold true.
We also adjust test to avoid using a remainder of 1
(in which case, the last step of the algorithm computes
(qhatv+1) - qhatv which rarely produces a carry).
Change-Id: I69fbab9c5e19d0db1c087fbfcd5b89352c2d26fb
Reviewed-on: https://go-review.googlesource.com/c/go/+/206839
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
The current division algorithm produces one word of result at a time,
using 2-word division to compute the top word and mulAddVWW to compute
the remainder. The top word may need to be adjusted by 1 or 2 units.
The recursive version, based on Burnikel, Ziegler, "Fast Recursive Division",
uses the same principles, but in a multi-word setting, so that
multiplication benefits from the Karatsuba algorithm (and possibly later
improvements).
benchmark old ns/op new ns/op delta
BenchmarkDiv/20/10-4 38.2 38.3 +0.26%
BenchmarkDiv/40/20-4 38.7 38.5 -0.52%
BenchmarkDiv/100/50-4 62.5 62.6 +0.16%
BenchmarkDiv/200/100-4 238 259 +8.82%
BenchmarkDiv/400/200-4 311 338 +8.68%
BenchmarkDiv/1000/500-4 604 649 +7.45%
BenchmarkDiv/2000/1000-4 1214 1278 +5.27%
BenchmarkDiv/20000/10000-4 38279 36510 -4.62%
BenchmarkDiv/200000/100000-4 3022057 1359615 -55.01%
BenchmarkDiv/2000000/1000000-4 310827664 54012939 -82.62%
BenchmarkDiv/20000000/10000000-4 33272829421 1965401359 -94.09%
BenchmarkString/10/Base10-4 158 156 -1.27%
BenchmarkString/100/Base10-4 797 792 -0.63%
BenchmarkString/1000/Base10-4 3677 3814 +3.73%
BenchmarkString/10000/Base10-4 16633 17116 +2.90%
BenchmarkString/100000/Base10-4 5779029 1793808 -68.96%
BenchmarkString/1000000/Base10-4 889840820 85524031 -90.39%
BenchmarkString/10000000/Base10-4 134338236860 4935657026 -96.33%
Fixes #21960
Updates #30943
Change-Id: I134c6f81a47870c688ca95b6081eb9211def15a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/172018
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This notably allows to reuse temporaries across
the karatsubaSqr recursion.
benchmark old ns/op new ns/op delta
BenchmarkNatMul/10-4 227 228 +0.44%
BenchmarkNatMul/100-4 8339 8589 +3.00%
BenchmarkNatMul/1000-4 313796 312272 -0.49%
BenchmarkNatMul/10000-4 11924720 11873589 -0.43%
BenchmarkNatMul/100000-4 503813354 503839058 +0.01%
BenchmarkNatSqr/20-4 549 513 -6.56%
BenchmarkNatSqr/30-4 945 874 -7.51%
BenchmarkNatSqr/50-4 1993 1832 -8.08%
BenchmarkNatSqr/80-4 4096 3874 -5.42%
BenchmarkNatSqr/100-4 6192 5712 -7.75%
BenchmarkNatSqr/200-4 20388 19543 -4.14%
BenchmarkNatSqr/300-4 38735 36715 -5.21%
BenchmarkNatSqr/500-4 99562 93542 -6.05%
BenchmarkNatSqr/800-4 195554 184907 -5.44%
BenchmarkNatSqr/1000-4 286302 275053 -3.93%
BenchmarkNatSqr/10000-4 9817057 9441641 -3.82%
BenchmarkNatSqr/100000-4 390713416 379696789 -2.82%
benchmark old allocs new allocs delta
BenchmarkNatMul/10-4 1 1 +0.00%
BenchmarkNatMul/100-4 1 1 +0.00%
BenchmarkNatMul/1000-4 2 1 -50.00%
BenchmarkNatMul/10000-4 2 1 -50.00%
BenchmarkNatMul/100000-4 9 11 +22.22%
BenchmarkNatSqr/20-4 2 1 -50.00%
BenchmarkNatSqr/30-4 2 1 -50.00%
BenchmarkNatSqr/50-4 2 1 -50.00%
BenchmarkNatSqr/80-4 2 1 -50.00%
BenchmarkNatSqr/100-4 2 1 -50.00%
BenchmarkNatSqr/200-4 2 1 -50.00%
BenchmarkNatSqr/300-4 4 1 -75.00%
BenchmarkNatSqr/500-4 4 1 -75.00%
BenchmarkNatSqr/800-4 10 1 -90.00%
BenchmarkNatSqr/1000-4 10 1 -90.00%
BenchmarkNatSqr/10000-4 731 1 -99.86%
BenchmarkNatSqr/100000-4 19687 6 -99.97%
benchmark old bytes new bytes delta
BenchmarkNatMul/10-4 192 192 +0.00%
BenchmarkNatMul/100-4 4864 4864 +0.00%
BenchmarkNatMul/1000-4 57344 49224 -14.16%
BenchmarkNatMul/10000-4 565248 498772 -11.76%
BenchmarkNatMul/100000-4 5749504 7263720 +26.34%
BenchmarkNatSqr/20-4 672 352 -47.62%
BenchmarkNatSqr/30-4 992 512 -48.39%
BenchmarkNatSqr/50-4 1792 896 -50.00%
BenchmarkNatSqr/80-4 2688 1408 -47.62%
BenchmarkNatSqr/100-4 3584 1792 -50.00%
BenchmarkNatSqr/200-4 6656 3456 -48.08%
BenchmarkNatSqr/300-4 24448 16387 -32.97%
BenchmarkNatSqr/500-4 36864 24591 -33.29%
BenchmarkNatSqr/800-4 69760 40981 -41.25%
BenchmarkNatSqr/1000-4 86016 49180 -42.82%
BenchmarkNatSqr/10000-4 2524800 487368 -80.70%
BenchmarkNatSqr/100000-4 68599808 5876581 -91.43%
Change-Id: I8e6e409ae1cb48be9d5aa9b5f428d6cbe487673a
Reviewed-on: https://go-review.googlesource.com/c/go/+/172017
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
This fixes an old oversight. Rat.SetString already permitted
fractions a/b where both a and b could independently specify
a base prefix. With this CL, it now also accepts non-decimal
floating-point numbers.
Fixes #29799.
Change-Id: I9cc65666a5cebb00f0202da2e4fc5654a02e3234
Reviewed-on: https://go-review.googlesource.com/c/go/+/168237
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
|
|
The proposed change introduces a better initial guess which is closer to the final value and therefore converges in fewer steps. Consider for example sqrt(8): previously the guess was 8, whereas now it is 4 (and the result is 2). All this change does is it computes the division by two more accurately while it keeps the guess ≥ √x.
Change-Id: I917248d734a7b0488d14a647a063f674e56c4e30
GitHub-Last-Rev: c06d9d4876c8e7d6739f0e4b687e370fe1e9aad7
GitHub-Pull-Request: golang/go#28981
Reviewed-on: https://go-review.googlesource.com/c/163866
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
For many uses of math/big, most numbers are small in practice.
Prior to this change, big.NewInt allocated a minimum of five Words:
one to hold the value, and four as extra capacity.
In most cases, this extra capacity is waste.
Worse, allocating a single Word uses a fast malloc path for tiny allocs;
allocating five Words is more expensive in CPU as well as memory.
This change is a simple fix: Treat a request for one Word at its word.
I experimented with more complicated fixes and did not find anything
that outperformed this easy fix.
On some real world programs, this is a clear win.
The compiler:
name old alloc/op new alloc/op delta
Template 37.1MB ± 0% 37.0MB ± 0% -0.23% (p=0.008 n=5+5)
Unicode 29.2MB ± 0% 28.5MB ± 0% -2.48% (p=0.008 n=5+5)
GoTypes 133MB ± 0% 133MB ± 0% -0.05% (p=0.008 n=5+5)
Compiler 628MB ± 0% 628MB ± 0% -0.06% (p=0.008 n=5+5)
SSA 2.04GB ± 0% 2.03GB ± 0% -0.14% (p=0.008 n=5+5)
Flate 24.7MB ± 0% 24.6MB ± 0% -0.23% (p=0.008 n=5+5)
GoParser 29.6MB ± 0% 29.6MB ± 0% -0.07% (p=0.008 n=5+5)
Reflect 82.3MB ± 0% 82.2MB ± 0% -0.05% (p=0.008 n=5+5)
Tar 36.2MB ± 0% 36.2MB ± 0% -0.12% (p=0.008 n=5+5)
XML 49.5MB ± 0% 49.4MB ± 0% -0.23% (p=0.008 n=5+5)
[Geo mean] 85.1MB 84.8MB -0.37%
name old allocs/op new allocs/op delta
Template 364k ± 0% 364k ± 0% ~ (p=0.476 n=5+5)
Unicode 341k ± 0% 341k ± 0% ~ (p=0.690 n=5+5)
GoTypes 1.37M ± 0% 1.37M ± 0% ~ (p=0.444 n=5+5)
Compiler 5.50M ± 0% 5.50M ± 0% +0.02% (p=0.008 n=5+5)
SSA 16.0M ± 0% 16.0M ± 0% +0.01% (p=0.008 n=5+5)
Flate 238k ± 0% 238k ± 0% ~ (p=0.222 n=5+5)
GoParser 305k ± 0% 305k ± 0% ~ (p=0.841 n=5+5)
Reflect 976k ± 0% 976k ± 0% ~ (p=0.222 n=5+5)
Tar 354k ± 0% 354k ± 0% ~ (p=0.103 n=5+5)
XML 450k ± 0% 450k ± 0% ~ (p=0.151 n=5+5)
[Geo mean] 837k 837k +0.01%
go.skylark.net (at ea6d2813de75ded8d157b9540bc3d3ad0b688623):
name old alloc/op new alloc/op delta
Hashtable-8 456kB ± 0% 299kB ± 0% -34.33% (p=0.000 n=9+9)
/bench_builtin_method-8 220kB ± 0% 190kB ± 0% -13.55% (p=0.000 n=9+10)
name old allocs/op new allocs/op delta
Hashtable-8 7.84k ± 0% 7.84k ± 0% ~ (all equal)
/bench_builtin_method-8 7.49k ± 0% 7.49k ± 0% ~ (all equal)
The math/big benchmarks are messy, which is predictable, since they
naturally exercise the bigger-than-one-word code more.
Also worth noting is that many of the benchmarks have very high variance.
I've omitted the opVV and opVW benchmarks, as they are unrelated.
name old time/op new time/op delta
DecimalConversion-8 92.5µs ± 1% 90.6µs ± 0% -2.12% (p=0.000 n=17+19)
FloatString/100-8 867ns ± 0% 871ns ± 0% +0.50% (p=0.000 n=18+18)
FloatString/1000-8 26.4µs ± 1% 26.5µs ± 1% ~ (p=0.396 n=20+19)
FloatString/10000-8 2.15ms ± 2% 2.16ms ± 2% ~ (p=0.089 n=19+20)
FloatString/100000-8 209ms ± 1% 209ms ± 1% ~ (p=0.583 n=19+19)
FloatAdd/10-8 63.5ns ± 2% 64.1ns ± 6% ~ (p=0.389 n=19+19)
FloatAdd/100-8 66.0ns ± 2% 65.8ns ± 2% ~ (p=0.825 n=20+20)
FloatAdd/1000-8 93.9ns ± 1% 94.3ns ± 1% ~ (p=0.273 n=19+20)
FloatAdd/10000-8 347ns ± 2% 342ns ± 1% -1.50% (p=0.000 n=18+18)
FloatAdd/100000-8 2.78µs ± 1% 2.78µs ± 2% ~ (p=0.961 n=20+19)
FloatSub/10-8 56.9ns ± 2% 57.8ns ± 3% +1.59% (p=0.001 n=19+19)
FloatSub/100-8 58.2ns ± 2% 58.9ns ± 2% +1.25% (p=0.004 n=20+20)
FloatSub/1000-8 74.9ns ± 1% 74.4ns ± 1% -0.76% (p=0.000 n=19+20)
FloatSub/10000-8 223ns ± 1% 220ns ± 2% -1.29% (p=0.000 n=16+20)
FloatSub/100000-8 1.66µs ± 1% 1.66µs ± 2% ~ (p=0.147 n=20+20)
ParseFloatSmallExp-8 8.38µs ± 0% 8.59µs ± 0% +2.48% (p=0.000 n=19+19)
ParseFloatLargeExp-8 31.1µs ± 0% 32.0µs ± 0% +3.04% (p=0.000 n=16+17)
GCD10x10/WithoutXY-8 115ns ± 1% 99ns ± 3% -14.07% (p=0.000 n=20+20)
GCD10x10/WithXY-8 322ns ± 0% 312ns ± 0% -3.11% (p=0.000 n=18+13)
GCD10x100/WithoutXY-8 233ns ± 1% 219ns ± 1% -5.73% (p=0.000 n=19+17)
GCD10x100/WithXY-8 709ns ± 0% 759ns ± 0% +7.04% (p=0.000 n=19+19)
GCD10x1000/WithoutXY-8 653ns ± 1% 642ns ± 1% -1.69% (p=0.000 n=17+20)
GCD10x1000/WithXY-8 1.35µs ± 0% 1.35µs ± 1% ~ (p=0.255 n=20+16)
GCD10x10000/WithoutXY-8 4.57µs ± 1% 4.61µs ± 1% +0.95% (p=0.000 n=18+17)
GCD10x10000/WithXY-8 6.82µs ± 0% 6.84µs ± 0% +0.27% (p=0.000 n=16+17)
GCD10x100000/WithoutXY-8 43.9µs ± 1% 44.0µs ± 1% +0.28% (p=0.000 n=18+17)
GCD10x100000/WithXY-8 60.6µs ± 0% 60.6µs ± 0% ~ (p=0.907 n=18+18)
GCD100x100/WithoutXY-8 1.13µs ± 0% 1.21µs ± 0% +6.39% (p=0.000 n=19+19)
GCD100x100/WithXY-8 1.82µs ± 0% 1.92µs ± 0% +5.24% (p=0.000 n=19+17)
GCD100x1000/WithoutXY-8 2.00µs ± 0% 2.03µs ± 1% +1.61% (p=0.000 n=18+16)
GCD100x1000/WithXY-8 3.22µs ± 0% 3.20µs ± 1% -0.83% (p=0.000 n=19+19)
GCD100x10000/WithoutXY-8 9.28µs ± 1% 9.17µs ± 1% -1.25% (p=0.000 n=18+19)
GCD100x10000/WithXY-8 13.5µs ± 0% 13.3µs ± 0% -1.12% (p=0.000 n=18+19)
GCD100x100000/WithoutXY-8 80.4µs ± 0% 78.6µs ± 0% -2.25% (p=0.000 n=19+19)
GCD100x100000/WithXY-8 114µs ± 0% 112µs ± 0% -1.46% (p=0.000 n=19+17)
GCD1000x1000/WithoutXY-8 12.9µs ± 1% 12.9µs ± 2% -0.50% (p=0.014 n=20+19)
GCD1000x1000/WithXY-8 19.6µs ± 1% 19.6µs ± 2% -0.28% (p=0.040 n=17+18)
GCD1000x10000/WithoutXY-8 22.4µs ± 0% 22.4µs ± 2% ~ (p=0.220 n=19+19)
GCD1000x10000/WithXY-8 57.0µs ± 0% 56.5µs ± 0% -0.87% (p=0.000 n=20+20)
GCD1000x100000/WithoutXY-8 116µs ± 0% 115µs ± 0% -0.49% (p=0.000 n=18+19)
GCD1000x100000/WithXY-8 410µs ± 0% 411µs ± 0% ~ (p=0.052 n=19+19)
GCD10000x10000/WithoutXY-8 247µs ± 1% 244µs ± 1% -0.92% (p=0.000 n=19+19)
GCD10000x10000/WithXY-8 476µs ± 1% 473µs ± 1% -0.48% (p=0.009 n=19+19)
GCD10000x100000/WithoutXY-8 573µs ± 1% 571µs ± 1% -0.45% (p=0.012 n=20+20)
GCD10000x100000/WithXY-8 3.35ms ± 1% 3.35ms ± 1% ~ (p=0.444 n=20+19)
GCD100000x100000/WithoutXY-8 12.0ms ± 2% 11.9ms ± 2% ~ (p=0.276 n=18+20)
GCD100000x100000/WithXY-8 27.3ms ± 1% 27.3ms ± 1% ~ (p=0.792 n=20+19)
Hilbert-8 672µs ± 0% 611µs ± 0% -9.02% (p=0.000 n=19+19)
Binomial-8 1.40µs ± 0% 1.18µs ± 0% -15.69% (p=0.000 n=16+14)
QuoRem-8 2.20µs ± 1% 2.17µs ± 1% -1.13% (p=0.000 n=19+19)
Exp-8 4.10ms ± 1% 4.11ms ± 1% ~ (p=0.296 n=20+19)
Exp2-8 4.11ms ± 1% 4.12ms ± 1% ~ (p=0.429 n=20+20)
Bitset-8 8.67ns ± 6% 8.74ns ± 4% ~ (p=0.139 n=19+17)
BitsetNeg-8 43.6ns ± 1% 43.8ns ± 2% +0.61% (p=0.036 n=20+20)
BitsetOrig-8 77.5ns ± 1% 68.4ns ± 1% -11.77% (p=0.000 n=19+20)
BitsetNegOrig-8 145ns ± 1% 141ns ± 1% -2.87% (p=0.000 n=19+20)
ModSqrt225_Tonelli-8 324µs ± 1% 324µs ± 1% ~ (p=0.409 n=18+20)
ModSqrt225_3Mod4-8 98.9µs ± 1% 99.1µs ± 1% ~ (p=0.298 n=19+18)
ModSqrt231_Tonelli-8 337µs ± 1% 337µs ± 1% ~ (p=0.718 n=20+18)
ModSqrt231_5Mod8-8 115µs ± 1% 114µs ± 1% -0.22% (p=0.050 n=20+20)
ModInverse-8 895ns ± 0% 869ns ± 1% -2.83% (p=0.000 n=17+17)
Sqrt-8 28.1µs ± 1% 28.1µs ± 0% -0.28% (p=0.000 n=16+20)
IntSqr/1-8 10.8ns ± 3% 10.5ns ± 3% -2.51% (p=0.000 n=19+17)
IntSqr/2-8 30.5ns ± 2% 30.3ns ± 4% -0.71% (p=0.035 n=18+18)
IntSqr/3-8 40.1ns ± 1% 40.1ns ± 1% ~ (p=0.710 n=20+17)
IntSqr/5-8 65.3ns ± 1% 65.4ns ± 2% ~ (p=0.744 n=19+19)
IntSqr/8-8 101ns ± 1% 102ns ± 0% ~ (p=0.234 n=19+20)
IntSqr/10-8 138ns ± 0% 138ns ± 2% ~ (p=0.827 n=18+18)
IntSqr/20-8 378ns ± 1% 378ns ± 1% ~ (p=0.479 n=18+18)
IntSqr/30-8 637ns ± 0% 638ns ± 1% ~ (p=0.051 n=18+20)
IntSqr/50-8 1.34µs ± 2% 1.34µs ± 1% ~ (p=0.970 n=18+19)
IntSqr/80-8 2.78µs ± 0% 2.78µs ± 1% -0.18% (p=0.006 n=19+17)
IntSqr/100-8 3.98µs ± 0% 3.98µs ± 0% ~ (p=0.057 n=17+19)
IntSqr/200-8 13.5µs ± 0% 13.5µs ± 1% -0.33% (p=0.000 n=19+17)
IntSqr/300-8 25.3µs ± 1% 25.3µs ± 1% ~ (p=0.361 n=19+20)
IntSqr/500-8 62.9µs ± 0% 62.9µs ± 1% ~ (p=0.899 n=17+17)
IntSqr/800-8 128µs ± 1% 127µs ± 1% -0.32% (p=0.016 n=18+20)
IntSqr/1000-8 192µs ± 0% 192µs ± 1% ~ (p=0.916 n=17+18)
Div/20/10-8 34.9ns ± 2% 35.6ns ± 1% +2.01% (p=0.000 n=20+20)
Div/200/100-8 218ns ± 1% 215ns ± 2% -1.43% (p=0.000 n=18+18)
Div/2000/1000-8 1.16µs ± 1% 1.15µs ± 1% -1.04% (p=0.000 n=19+20)
Div/20000/10000-8 35.7µs ± 1% 35.4µs ± 1% -0.69% (p=0.000 n=19+18)
Div/200000/100000-8 2.89ms ± 1% 2.88ms ± 1% -0.62% (p=0.007 n=19+20)
Mul-8 9.28ms ± 1% 9.27ms ± 1% ~ (p=0.563 n=18+18)
ZeroShifts/Shl-8 712ns ± 6% 716ns ± 7% ~ (p=0.597 n=20+20)
ZeroShifts/ShlSame-8 4.00ns ± 1% 4.06ns ± 5% ~ (p=0.162 n=18+20)
ZeroShifts/Shr-8 714ns ±10% 1285ns ±156% ~ (p=0.250 n=20+20)
ZeroShifts/ShrSame-8 4.00ns ± 1% 4.09ns ±10% +2.34% (p=0.048 n=16+19)
Exp3Power/0x10-8 154ns ± 0% 159ns ±13% ~ (p=0.197 n=14+20)
Exp3Power/0x40-8 171ns ± 1% 175ns ± 8% ~ (p=0.058 n=16+19)
Exp3Power/0x100-8 287ns ± 0% 316ns ± 4% +10.03% (p=0.000 n=17+19)
Exp3Power/0x400-8 698ns ± 1% 801ns ± 6% +14.75% (p=0.000 n=19+20)
Exp3Power/0x1000-8 2.87µs ± 0% 3.65µs ± 6% +27.24% (p=0.000 n=18+18)
Exp3Power/0x4000-8 21.9µs ± 1% 28.7µs ± 8% +31.09% (p=0.000 n=18+20)
Exp3Power/0x10000-8 204µs ± 0% 267µs ± 9% +30.81% (p=0.000 n=20+20)
Exp3Power/0x40000-8 1.86ms ± 0% 2.26ms ± 5% +21.68% (p=0.000 n=18+19)
Exp3Power/0x100000-8 17.5ms ± 1% 20.7ms ± 7% +18.39% (p=0.000 n=19+20)
Exp3Power/0x400000-8 156ms ± 0% 172ms ± 6% +10.54% (p=0.000 n=19+20)
Fibo-8 26.9ms ± 1% 27.5ms ± 3% +2.32% (p=0.000 n=19+19)
NatSqr/1-8 31.0ns ± 4% 39.5ns ±29% +27.25% (p=0.000 n=20+19)
NatSqr/2-8 54.1ns ± 1% 69.0ns ±28% +27.52% (p=0.000 n=20+20)
NatSqr/3-8 66.6ns ± 1% 83.0ns ±25% +24.59% (p=0.000 n=20+20)
NatSqr/5-8 97.1ns ± 1% 119.9ns ±12% +23.50% (p=0.000 n=16+20)
NatSqr/8-8 138ns ± 1% 171ns ± 9% +24.20% (p=0.000 n=19+20)
NatSqr/10-8 182ns ± 0% 225ns ± 9% +23.50% (p=0.000 n=16+20)
NatSqr/20-8 447ns ± 1% 624ns ± 6% +39.64% (p=0.000 n=19+19)
NatSqr/30-8 736ns ± 2% 986ns ± 9% +33.94% (p=0.000 n=19+20)
NatSqr/50-8 1.51µs ± 2% 1.97µs ± 9% +30.42% (p=0.000 n=20+20)
NatSqr/80-8 3.03µs ± 1% 3.67µs ± 7% +21.08% (p=0.000 n=20+20)
NatSqr/100-8 4.31µs ± 1% 5.20µs ± 7% +20.52% (p=0.000 n=19+20)
NatSqr/200-8 14.2µs ± 0% 16.3µs ± 4% +14.92% (p=0.000 n=19+20)
NatSqr/300-8 27.8µs ± 1% 33.2µs ± 7% +19.28% (p=0.000 n=20+18)
NatSqr/500-8 66.6µs ± 1% 74.5µs ± 3% +11.87% (p=0.000 n=18+18)
NatSqr/800-8 135µs ± 1% 165µs ± 7% +22.33% (p=0.000 n=20+20)
NatSqr/1000-8 200µs ± 0% 228µs ± 3% +14.39% (p=0.000 n=19+20)
NatSetBytes/8-8 8.87ns ± 4% 8.77ns ± 2% -1.17% (p=0.020 n=20+16)
NatSetBytes/24-8 38.6ns ± 3% 49.5ns ±29% +28.32% (p=0.000 n=18+19)
NatSetBytes/128-8 75.2ns ± 1% 120.7ns ±29% +60.60% (p=0.000 n=17+20)
NatSetBytes/7-8 16.2ns ± 2% 16.5ns ± 2% +1.76% (p=0.000 n=20+20)
NatSetBytes/23-8 46.5ns ± 1% 60.2ns ±24% +29.59% (p=0.000 n=20+20)
NatSetBytes/127-8 83.1ns ± 1% 118.2ns ±20% +42.33% (p=0.000 n=18+20)
ScanPi-8 89.1µs ± 1% 117.4µs ±12% +31.75% (p=0.000 n=18+20)
StringPiParallel-8 35.1µs ± 9% 40.2µs ±12% +14.53% (p=0.000 n=20+20)
Scan/10/Base2-8 410ns ±14% 429ns ±10% +4.47% (p=0.018 n=19+20)
Scan/100/Base2-8 3.05µs ±20% 2.97µs ±14% ~ (p=0.449 n=20+20)
Scan/1000/Base2-8 29.3µs ± 8% 30.1µs ±23% ~ (p=0.355 n=20+20)
Scan/10000/Base2-8 402µs ±13% 395µs ±14% ~ (p=0.355 n=20+20)
Scan/100000/Base2-8 11.8ms ±10% 11.6ms ± 1% ~ (p=0.245 n=17+18)
Scan/10/Base8-8 194ns ± 6% 196ns ±12% ~ (p=0.829 n=20+19)
Scan/100/Base8-8 1.11µs ±15% 1.11µs ±12% ~ (p=0.743 n=20+20)
Scan/1000/Base8-8 11.7µs ±10% 11.7µs ±12% ~ (p=0.904 n=20+20)
Scan/10000/Base8-8 209µs ± 7% 210µs ± 8% ~ (p=0.478 n=20+20)
Scan/100000/Base8-8 10.6ms ± 7% 10.4ms ± 6% ~ (p=0.112 n=20+18)
Scan/10/Base10-8 182ns ±12% 188ns ±11% +3.52% (p=0.044 n=20+20)
Scan/100/Base10-8 1.01µs ± 8% 1.00µs ±13% ~ (p=0.588 n=20+20)
Scan/1000/Base10-8 10.7µs ±20% 10.6µs ±14% ~ (p=0.560 n=20+20)
Scan/10000/Base10-8 195µs ±10% 194µs ± 9% ~ (p=0.883 n=20+20)
Scan/100000/Base10-8 10.6ms ± 2% 10.6ms ± 2% ~ (p=0.495 n=20+20)
Scan/10/Base16-8 166ns ±10% 174ns ±17% ~ (p=0.072 n=20+20)
Scan/100/Base16-8 836ns ±10% 826ns ±12% ~ (p=0.562 n=20+17)
Scan/1000/Base16-8 8.96µs ±13% 8.65µs ± 9% ~ (p=0.203 n=20+18)
Scan/10000/Base16-8 198µs ± 3% 198µs ± 5% ~ (p=0.718 n=20+20)
Scan/100000/Base16-8 11.1ms ± 3% 11.0ms ± 4% ~ (p=0.512 n=20+20)
String/10/Base2-8 88.1ns ± 7% 94.1ns ±11% +6.80% (p=0.000 n=19+20)
String/100/Base2-8 577ns ± 4% 598ns ± 5% +3.72% (p=0.000 n=20+20)
String/1000/Base2-8 5.25µs ± 2% 5.62µs ± 5% +7.04% (p=0.000 n=19+20)
String/10000/Base2-8 55.6µs ± 1% 60.1µs ± 2% +8.12% (p=0.000 n=19+19)
String/100000/Base2-8 519µs ± 2% 560µs ± 2% +7.91% (p=0.000 n=18+17)
String/10/Base8-8 52.2ns ± 8% 53.3ns ±12% ~ (p=0.188 n=20+18)
String/100/Base8-8 218ns ± 3% 232ns ±10% +6.66% (p=0.000 n=20+20)
String/1000/Base8-8 1.84µs ± 3% 1.94µs ± 4% +5.07% (p=0.000 n=20+18)
String/10000/Base8-8 18.1µs ± 2% 19.1µs ± 3% +5.84% (p=0.000 n=20+19)
String/100000/Base8-8 184µs ± 2% 197µs ± 1% +7.15% (p=0.000 n=19+19)
String/10/Base10-8 158ns ± 7% 146ns ± 6% -7.65% (p=0.000 n=20+19)
String/100/Base10-8 807ns ± 2% 845ns ± 4% +4.79% (p=0.000 n=20+19)
String/1000/Base10-8 3.99µs ± 3% 3.99µs ± 7% ~ (p=0.920 n=20+20)
String/10000/Base10-8 20.8µs ± 6% 22.1µs ±10% +6.11% (p=0.000 n=19+20)
String/100000/Base10-8 5.60ms ± 2% 5.59ms ± 2% ~ (p=0.749 n=20+19)
String/10/Base16-8 49.0ns ±13% 49.3ns ±16% ~ (p=0.581 n=19+20)
String/100/Base16-8 173ns ± 5% 185ns ± 6% +6.63% (p=0.000 n=20+18)
String/1000/Base16-8 1.38µs ± 3% 1.49µs ±10% +8.27% (p=0.000 n=19+20)
String/10000/Base16-8 13.5µs ± 2% 14.5µs ± 3% +7.08% (p=0.000 n=20+20)
String/100000/Base16-8 138µs ± 4% 148µs ± 4% +7.57% (p=0.000 n=19+20)
LeafSize/0-8 2.74ms ± 1% 2.79ms ± 2% +2.00% (p=0.000 n=19+19)
LeafSize/1-8 24.8µs ± 4% 26.1µs ± 8% +5.33% (p=0.000 n=18+19)
LeafSize/2-8 24.9µs ± 7% 25.0µs ± 8% ~ (p=0.989 n=20+19)
LeafSize/3-8 97.6µs ± 3% 100.2µs ± 5% +2.66% (p=0.001 n=20+19)
LeafSize/4-8 25.2µs ± 5% 25.4µs ± 5% ~ (p=0.173 n=19+20)
LeafSize/5-8 118µs ± 2% 119µs ± 5% ~ (p=0.478 n=20+20)
LeafSize/6-8 97.6µs ± 3% 100.1µs ± 8% +2.65% (p=0.021 n=20+19)
LeafSize/7-8 65.6µs ± 5% 67.5µs ± 6% +2.92% (p=0.003 n=20+19)
LeafSize/8-8 25.5µs ± 5% 25.6µs ± 6% ~ (p=0.461 n=19+20)
LeafSize/9-8 134µs ± 4% 136µs ± 5% ~ (p=0.194 n=19+20)
LeafSize/10-8 119µs ± 3% 122µs ± 3% +2.52% (p=0.000 n=20+19)
LeafSize/11-8 115µs ± 5% 116µs ± 5% ~ (p=0.158 n=20+19)
LeafSize/12-8 97.4µs ± 4% 100.3µs ± 5% +2.91% (p=0.003 n=19+20)
LeafSize/13-8 93.1µs ± 4% 93.0µs ± 6% ~ (p=0.698 n=20+20)
LeafSize/14-8 67.0µs ± 3% 69.7µs ± 6% +4.10% (p=0.000 n=20+20)
LeafSize/15-8 48.3µs ± 2% 49.3µs ± 6% +1.91% (p=0.014 n=19+20)
LeafSize/16-8 25.6µs ± 5% 25.6µs ± 6% ~ (p=0.947 n=20+20)
LeafSize/32-8 30.1µs ± 4% 30.3µs ± 5% ~ (p=0.685 n=18+19)
LeafSize/64-8 53.4µs ± 2% 54.0µs ± 3% ~ (p=0.053 n=19+19)
ProbablyPrime/n=0-8 3.59ms ± 1% 3.55ms ± 1% -1.12% (p=0.000 n=20+18)
ProbablyPrime/n=1-8 4.21ms ± 2% 4.17ms ± 2% -0.73% (p=0.018 n=20+19)
ProbablyPrime/n=5-8 6.74ms ± 1% 6.72ms ± 1% ~ (p=0.102 n=20+20)
ProbablyPrime/n=10-8 9.91ms ± 1% 9.89ms ± 2% ~ (p=0.322 n=19+20)
ProbablyPrime/n=20-8 16.2ms ± 1% 16.1ms ± 2% -0.52% (p=0.006 n=19+19)
ProbablyPrime/Lucas-8 2.94ms ± 1% 2.95ms ± 1% +0.52% (p=0.002 n=18+19)
ProbablyPrime/MillerRabinBase2-8 641µs ± 2% 640µs ± 2% ~ (p=0.607 n=19+20)
FloatSqrt/64-8 653ns ± 5% 704ns ± 5% +7.82% (p=0.000 n=19+20)
FloatSqrt/128-8 1.32µs ± 3% 1.42µs ± 5% +7.29% (p=0.000 n=18+20)
FloatSqrt/256-8 1.44µs ± 2% 1.45µs ± 4% ~ (p=0.089 n=19+19)
FloatSqrt/1000-8 3.36µs ± 3% 3.42µs ± 5% +1.82% (p=0.012 n=20+20)
FloatSqrt/10000-8 25.5µs ± 2% 27.5µs ± 7% +7.91% (p=0.000 n=18+19)
FloatSqrt/100000-8 629µs ± 6% 663µs ± 9% +5.32% (p=0.000 n=18+20)
FloatSqrt/1000000-8 46.4ms ± 2% 46.6ms ± 5% ~ (p=0.351 n=20+19)
[Geo mean] 9.60µs 10.01µs +4.28%
name old alloc/op new alloc/op delta
DecimalConversion-8 54.0kB ± 0% 43.6kB ± 0% -19.40% (p=0.000 n=20+20)
FloatString/100-8 400B ± 0% 400B ± 0% ~ (all equal)
FloatString/1000-8 3.10kB ± 0% 3.10kB ± 0% ~ (all equal)
FloatString/10000-8 52.1kB ± 0% 52.1kB ± 0% ~ (p=0.153 n=20+20)
FloatString/100000-8 582kB ± 0% 582kB ± 0% ~ (all equal)
FloatAdd/10-8 0.00B 0.00B ~ (all equal)
FloatAdd/100-8 0.00B 0.00B ~ (all equal)
FloatAdd/1000-8 0.00B 0.00B ~ (all equal)
FloatAdd/10000-8 0.00B 0.00B ~ (all equal)
FloatAdd/100000-8 0.00B 0.00B ~ (all equal)
FloatSub/10-8 0.00B 0.00B ~ (all equal)
FloatSub/100-8 0.00B 0.00B ~ (all equal)
FloatSub/1000-8 0.00B 0.00B ~ (all equal)
FloatSub/10000-8 0.00B 0.00B ~ (all equal)
FloatSub/100000-8 0.00B 0.00B ~ (all equal)
ParseFloatSmallExp-8 4.18kB ± 0% 3.60kB ± 0% -13.79% (p=0.000 n=20+20)
ParseFloatLargeExp-8 18.9kB ± 0% 19.3kB ± 0% +2.25% (p=0.000 n=20+20)
GCD10x10/WithoutXY-8 96.0B ± 0% 16.0B ± 0% -83.33% (p=0.000 n=20+20)
GCD10x10/WithXY-8 240B ± 0% 88B ± 0% -63.33% (p=0.000 n=20+20)
GCD10x100/WithoutXY-8 192B ± 0% 112B ± 0% -41.67% (p=0.000 n=20+20)
GCD10x100/WithXY-8 464B ± 0% 424B ± 0% -8.62% (p=0.000 n=20+20)
GCD10x1000/WithoutXY-8 416B ± 0% 336B ± 0% -19.23% (p=0.000 n=20+20)
GCD10x1000/WithXY-8 1.25kB ± 0% 1.10kB ± 0% -12.18% (p=0.000 n=20+20)
GCD10x10000/WithoutXY-8 2.91kB ± 0% 2.83kB ± 0% -2.75% (p=0.000 n=20+20)
GCD10x10000/WithXY-8 8.70kB ± 0% 8.55kB ± 0% -1.76% (p=0.000 n=16+16)
GCD10x100000/WithoutXY-8 27.2kB ± 0% 27.2kB ± 0% -0.29% (p=0.000 n=20+20)
GCD10x100000/WithXY-8 82.4kB ± 0% 82.3kB ± 0% -0.17% (p=0.000 n=20+19)
GCD100x100/WithoutXY-8 288B ± 0% 384B ± 0% +33.33% (p=0.000 n=20+20)
GCD100x100/WithXY-8 464B ± 0% 576B ± 0% +24.14% (p=0.000 n=20+20)
GCD100x1000/WithoutXY-8 640B ± 0% 688B ± 0% +7.50% (p=0.000 n=20+20)
GCD100x1000/WithXY-8 1.52kB ± 0% 1.46kB ± 0% -3.68% (p=0.000 n=20+20)
GCD100x10000/WithoutXY-8 4.24kB ± 0% 4.29kB ± 0% +1.13% (p=0.000 n=20+20)
GCD100x10000/WithXY-8 11.1kB ± 0% 11.0kB ± 0% -0.51% (p=0.000 n=15+20)
GCD100x100000/WithoutXY-8 40.9kB ± 0% 40.9kB ± 0% +0.12% (p=0.000 n=20+19)
GCD100x100000/WithXY-8 110kB ± 0% 109kB ± 0% -0.08% (p=0.000 n=20+20)
GCD1000x1000/WithoutXY-8 1.22kB ± 0% 1.06kB ± 0% -13.16% (p=0.000 n=20+20)
GCD1000x1000/WithXY-8 2.37kB ± 0% 2.11kB ± 0% -10.83% (p=0.000 n=20+20)
GCD1000x10000/WithoutXY-8 4.71kB ± 0% 4.63kB ± 0% -1.70% (p=0.000 n=20+19)
GCD1000x10000/WithXY-8 28.2kB ± 0% 28.0kB ± 0% -0.43% (p=0.000 n=20+15)
GCD1000x100000/WithoutXY-8 41.3kB ± 0% 41.2kB ± 0% -0.20% (p=0.000 n=20+16)
GCD1000x100000/WithXY-8 301kB ± 0% 301kB ± 0% -0.13% (p=0.000 n=20+20)
GCD10000x10000/WithoutXY-8 8.64kB ± 0% 8.48kB ± 0% -1.85% (p=0.000 n=20+20)
GCD10000x10000/WithXY-8 57.2kB ± 0% 57.7kB ± 0% +0.80% (p=0.000 n=20+20)
GCD10000x100000/WithoutXY-8 43.8kB ± 0% 43.7kB ± 0% -0.19% (p=0.000 n=20+18)
GCD10000x100000/WithXY-8 2.08MB ± 0% 2.08MB ± 0% -0.02% (p=0.000 n=15+19)
GCD100000x100000/WithoutXY-8 81.6kB ± 0% 81.4kB ± 0% -0.20% (p=0.000 n=20+20)
GCD100000x100000/WithXY-8 4.32MB ± 0% 4.33MB ± 0% +0.12% (p=0.000 n=20+20)
Hilbert-8 653kB ± 0% 313kB ± 0% -52.13% (p=0.000 n=19+20)
Binomial-8 1.82kB ± 0% 1.02kB ± 0% -43.86% (p=0.000 n=20+20)
QuoRem-8 0.00B 0.00B ~ (all equal)
Exp-8 11.1kB ± 0% 11.0kB ± 0% -0.34% (p=0.000 n=19+20)
Exp2-8 11.3kB ± 0% 11.3kB ± 0% -0.35% (p=0.000 n=19+20)
Bitset-8 0.00B 0.00B ~ (all equal)
BitsetNeg-8 0.00B 0.00B ~ (all equal)
BitsetOrig-8 103B ± 0% 63B ± 0% -38.83% (p=0.000 n=20+20)
BitsetNegOrig-8 215B ± 0% 175B ± 0% -18.60% (p=0.000 n=20+20)
ModSqrt225_Tonelli-8 11.3kB ± 0% 11.0kB ± 0% -2.76% (p=0.000 n=20+17)
ModSqrt225_3Mod4-8 3.57kB ± 0% 3.53kB ± 0% -1.12% (p=0.000 n=20+20)
ModSqrt231_Tonelli-8 11.0kB ± 0% 10.7kB ± 0% -2.55% (p=0.000 n=20+20)
ModSqrt231_5Mod8-8 4.21kB ± 0% 4.09kB ± 0% -2.85% (p=0.000 n=16+20)
ModInverse-8 1.44kB ± 0% 1.28kB ± 0% -11.11% (p=0.000 n=20+20)
Sqrt-8 6.00kB ± 0% 6.00kB ± 0% ~ (all equal)
IntSqr/1-8 0.00B 0.00B ~ (all equal)
IntSqr/2-8 0.00B 0.00B ~ (all equal)
IntSqr/3-8 0.00B 0.00B ~ (all equal)
IntSqr/5-8 0.00B 0.00B ~ (all equal)
IntSqr/8-8 0.00B 0.00B ~ (all equal)
IntSqr/10-8 0.00B 0.00B ~ (all equal)
IntSqr/20-8 320B ± 0% 320B ± 0% ~ (all equal)
IntSqr/30-8 480B ± 0% 480B ± 0% ~ (all equal)
IntSqr/50-8 896B ± 0% 896B ± 0% ~ (all equal)
IntSqr/80-8 1.28kB ± 0% 1.28kB ± 0% ~ (all equal)
IntSqr/100-8 1.79kB ± 0% 1.79kB ± 0% ~ (all equal)
IntSqr/200-8 3.20kB ± 0% 3.20kB ± 0% ~ (all equal)
IntSqr/300-8 8.06kB ± 0% 8.06kB ± 0% ~ (all equal)
IntSqr/500-8 12.3kB ± 0% 12.3kB ± 0% ~ (all equal)
IntSqr/800-8 28.8kB ± 0% 28.8kB ± 0% ~ (all equal)
IntSqr/1000-8 36.9kB ± 0% 36.9kB ± 0% ~ (all equal)
Div/20/10-8 0.00B 0.00B ~ (all equal)
Div/200/100-8 0.00B 0.00B ~ (all equal)
Div/2000/1000-8 0.00B 0.00B ~ (all equal)
Div/20000/10000-8 0.00B 0.00B ~ (all equal)
Div/200000/100000-8 690B ± 0% 690B ± 0% ~ (all equal)
Mul-8 565kB ± 0% 565kB ± 0% ~ (all equal)
ZeroShifts/Shl-8 6.53kB ± 0% 6.53kB ± 0% ~ (all equal)
ZeroShifts/ShlSame-8 0.00B 0.00B ~ (all equal)
ZeroShifts/Shr-8 6.53kB ± 0% 6.53kB ± 0% ~ (all equal)
ZeroShifts/ShrSame-8 0.00B 0.00B ~ (all equal)
Exp3Power/0x10-8 192B ± 0% 112B ± 0% -41.67% (p=0.000 n=20+20)
Exp3Power/0x40-8 192B ± 0% 112B ± 0% -41.67% (p=0.000 n=20+20)
Exp3Power/0x100-8 288B ± 0% 208B ± 0% -27.78% (p=0.000 n=20+20)
Exp3Power/0x400-8 672B ± 0% 592B ± 0% -11.90% (p=0.000 n=20+20)
Exp3Power/0x1000-8 3.33kB ± 0% 3.25kB ± 0% -2.40% (p=0.000 n=20+20)
Exp3Power/0x4000-8 13.8kB ± 0% 13.7kB ± 0% -0.58% (p=0.000 n=20+20)
Exp3Power/0x10000-8 117kB ± 0% 117kB ± 0% -0.07% (p=0.000 n=20+20)
Exp3Power/0x40000-8 755kB ± 0% 755kB ± 0% -0.01% (p=0.000 n=19+20)
Exp3Power/0x100000-8 5.22MB ± 0% 5.22MB ± 0% -0.00% (p=0.000 n=20+20)
Exp3Power/0x400000-8 39.8MB ± 0% 39.8MB ± 0% -0.00% (p=0.000 n=20+19)
Fibo-8 3.09MB ± 0% 3.08MB ± 0% -0.28% (p=0.000 n=20+16)
NatSqr/1-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
NatSqr/2-8 64.0B ± 0% 64.0B ± 0% ~ (all equal)
NatSqr/3-8 80.0B ± 0% 80.0B ± 0% ~ (all equal)
NatSqr/5-8 112B ± 0% 112B ± 0% ~ (all equal)
NatSqr/8-8 160B ± 0% 160B ± 0% ~ (all equal)
NatSqr/10-8 192B ± 0% 192B ± 0% ~ (all equal)
NatSqr/20-8 672B ± 0% 672B ± 0% ~ (all equal)
NatSqr/30-8 992B ± 0% 992B ± 0% ~ (all equal)
NatSqr/50-8 1.79kB ± 0% 1.79kB ± 0% ~ (all equal)
NatSqr/80-8 2.69kB ± 0% 2.69kB ± 0% ~ (all equal)
NatSqr/100-8 3.58kB ± 0% 3.58kB ± 0% ~ (all equal)
NatSqr/200-8 6.66kB ± 0% 6.66kB ± 0% ~ (all equal)
NatSqr/300-8 24.4kB ± 0% 24.4kB ± 0% ~ (all equal)
NatSqr/500-8 36.9kB ± 0% 36.9kB ± 0% ~ (all equal)
NatSqr/800-8 69.8kB ± 0% 69.8kB ± 0% ~ (all equal)
NatSqr/1000-8 86.0kB ± 0% 86.0kB ± 0% ~ (all equal)
NatSetBytes/8-8 0.00B 0.00B ~ (all equal)
NatSetBytes/24-8 64.0B ± 0% 64.0B ± 0% ~ (all equal)
NatSetBytes/128-8 160B ± 0% 160B ± 0% ~ (all equal)
NatSetBytes/7-8 0.00B 0.00B ~ (all equal)
NatSetBytes/23-8 64.0B ± 0% 64.0B ± 0% ~ (all equal)
NatSetBytes/127-8 160B ± 0% 160B ± 0% ~ (all equal)
ScanPi-8 75.4kB ± 0% 75.7kB ± 0% +0.41% (p=0.000 n=20+20)
StringPiParallel-8 20.4kB ± 0% 20.4kB ± 0% ~ (p=0.223 n=20+20)
Scan/10/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/100/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/1000/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/10000/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/100000/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/10/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/100/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/1000/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/10000/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/100000/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/10/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/100/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/1000/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/10000/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/100000/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/10/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/100/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/1000/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/10000/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/100000/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
String/10/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
String/100/Base2-8 352B ± 0% 352B ± 0% ~ (all equal)
String/1000/Base2-8 3.46kB ± 0% 3.46kB ± 0% ~ (all equal)
String/10000/Base2-8 41.0kB ± 0% 41.0kB ± 0% ~ (all equal)
String/100000/Base2-8 336kB ± 0% 336kB ± 0% ~ (all equal)
String/10/Base8-8 16.0B ± 0% 16.0B ± 0% ~ (all equal)
String/100/Base8-8 112B ± 0% 112B ± 0% ~ (all equal)
String/1000/Base8-8 1.15kB ± 0% 1.15kB ± 0% ~ (all equal)
String/10000/Base8-8 12.3kB ± 0% 12.3kB ± 0% ~ (all equal)
String/100000/Base8-8 115kB ± 0% 115kB ± 0% ~ (all equal)
String/10/Base10-8 64.0B ± 0% 24.0B ± 0% -62.50% (p=0.000 n=20+20)
String/100/Base10-8 192B ± 0% 192B ± 0% ~ (all equal)
String/1000/Base10-8 1.95kB ± 0% 1.95kB ± 0% ~ (all equal)
String/10000/Base10-8 20.0kB ± 0% 20.0kB ± 0% ~ (p=0.983 n=19+20)
String/100000/Base10-8 210kB ± 1% 211kB ± 1% +0.82% (p=0.000 n=19+20)
String/10/Base16-8 16.0B ± 0% 16.0B ± 0% ~ (all equal)
String/100/Base16-8 96.0B ± 0% 96.0B ± 0% ~ (all equal)
String/1000/Base16-8 896B ± 0% 896B ± 0% ~ (all equal)
String/10000/Base16-8 9.47kB ± 0% 9.47kB ± 0% ~ (all equal)
String/100000/Base16-8 90.1kB ± 0% 90.1kB ± 0% ~ (all equal)
LeafSize/0-8 16.9kB ± 0% 16.8kB ± 0% -0.44% (p=0.000 n=20+20)
LeafSize/1-8 22.4kB ± 0% 22.3kB ± 0% -0.34% (p=0.000 n=20+19)
LeafSize/2-8 22.4kB ± 0% 22.3kB ± 0% -0.34% (p=0.000 n=20+19)
LeafSize/3-8 22.4kB ± 0% 22.3kB ± 0% -0.34% (p=0.000 n=20+17)
LeafSize/4-8 22.4kB ± 0% 22.3kB ± 0% -0.34% (p=0.000 n=20+19)
LeafSize/5-8 22.4kB ± 0% 22.3kB ± 0% -0.33% (p=0.000 n=20+20)
LeafSize/6-8 22.3kB ± 0% 22.2kB ± 0% -0.34% (p=0.000 n=20+20)
LeafSize/7-8 22.3kB ± 0% 22.2kB ± 0% -0.35% (p=0.000 n=20+20)
LeafSize/8-8 22.3kB ± 0% 22.2kB ± 0% -0.34% (p=0.000 n=16+20)
LeafSize/9-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20)
LeafSize/10-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20)
LeafSize/11-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20)
LeafSize/12-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20)
LeafSize/13-8 22.3kB ± 0% 22.2kB ± 0% -0.34% (p=0.000 n=20+15)
LeafSize/14-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20)
LeafSize/15-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20)
LeafSize/16-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=19+20)
LeafSize/32-8 22.3kB ± 0% 22.2kB ± 0% -0.32% (p=0.000 n=20+20)
LeafSize/64-8 21.8kB ± 0% 21.7kB ± 0% -0.33% (p=0.000 n=18+19)
ProbablyPrime/n=0-8 15.3kB ± 0% 14.9kB ± 0% -2.35% (p=0.000 n=20+20)
ProbablyPrime/n=1-8 21.0kB ± 0% 20.7kB ± 0% -1.71% (p=0.000 n=20+20)
ProbablyPrime/n=5-8 43.4kB ± 0% 42.9kB ± 0% -1.20% (p=0.000 n=20+20)
ProbablyPrime/n=10-8 71.5kB ± 0% 70.7kB ± 0% -1.01% (p=0.000 n=19+20)
ProbablyPrime/n=20-8 127kB ± 0% 126kB ± 0% -0.88% (p=0.000 n=20+20)
ProbablyPrime/Lucas-8 3.07kB ± 0% 2.79kB ± 0% -9.12% (p=0.000 n=20+20)
ProbablyPrime/MillerRabinBase2-8 12.1kB ± 0% 12.0kB ± 0% -0.66% (p=0.000 n=20+20)
FloatSqrt/64-8 416B ± 0% 360B ± 0% -13.46% (p=0.000 n=20+20)
FloatSqrt/128-8 640B ± 0% 584B ± 0% -8.75% (p=0.000 n=20+20)
FloatSqrt/256-8 512B ± 0% 472B ± 0% -7.81% (p=0.000 n=20+20)
FloatSqrt/1000-8 1.47kB ± 0% 1.43kB ± 0% -2.72% (p=0.000 n=20+20)
FloatSqrt/10000-8 18.2kB ± 0% 18.1kB ± 0% -0.22% (p=0.000 n=20+20)
FloatSqrt/100000-8 204kB ± 0% 204kB ± 0% -0.02% (p=0.000 n=20+20)
FloatSqrt/1000000-8 6.37MB ± 0% 6.37MB ± 0% -0.00% (p=0.000 n=19+20)
[Geo mean] 3.42kB 3.24kB -5.33%
name old allocs/op new allocs/op delta
DecimalConversion-8 1.65k ± 0% 1.65k ± 0% ~ (all equal)
FloatString/100-8 8.00 ± 0% 8.00 ± 0% ~ (all equal)
FloatString/1000-8 9.00 ± 0% 9.00 ± 0% ~ (all equal)
FloatString/10000-8 22.0 ± 0% 22.0 ± 0% ~ (all equal)
FloatString/100000-8 136 ± 0% 136 ± 0% ~ (all equal)
FloatAdd/10-8 0.00 0.00 ~ (all equal)
FloatAdd/100-8 0.00 0.00 ~ (all equal)
FloatAdd/1000-8 0.00 0.00 ~ (all equal)
FloatAdd/10000-8 0.00 0.00 ~ (all equal)
FloatAdd/100000-8 0.00 0.00 ~ (all equal)
FloatSub/10-8 0.00 0.00 ~ (all equal)
FloatSub/100-8 0.00 0.00 ~ (all equal)
FloatSub/1000-8 0.00 0.00 ~ (all equal)
FloatSub/10000-8 0.00 0.00 ~ (all equal)
FloatSub/100000-8 0.00 0.00 ~ (all equal)
ParseFloatSmallExp-8 110 ± 0% 130 ± 0% +18.18% (p=0.000 n=20+20)
ParseFloatLargeExp-8 319 ± 0% 371 ± 0% +16.30% (p=0.000 n=20+20)
GCD10x10/WithoutXY-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
GCD10x10/WithXY-8 5.00 ± 0% 6.00 ± 0% +20.00% (p=0.000 n=20+20)
GCD10x100/WithoutXY-8 4.00 ± 0% 4.00 ± 0% ~ (all equal)
GCD10x100/WithXY-8 9.00 ± 0% 12.00 ± 0% +33.33% (p=0.000 n=20+20)
GCD10x1000/WithoutXY-8 4.00 ± 0% 4.00 ± 0% ~ (all equal)
GCD10x1000/WithXY-8 11.0 ± 0% 12.0 ± 0% +9.09% (p=0.000 n=20+20)
GCD10x10000/WithoutXY-8 4.00 ± 0% 4.00 ± 0% ~ (all equal)
GCD10x10000/WithXY-8 11.0 ± 0% 12.0 ± 0% +9.09% (p=0.000 n=20+20)
GCD10x100000/WithoutXY-8 4.00 ± 0% 4.00 ± 0% ~ (all equal)
GCD10x100000/WithXY-8 11.0 ± 0% 12.0 ± 0% +9.09% (p=0.000 n=20+20)
GCD100x100/WithoutXY-8 6.00 ± 0% 10.00 ± 0% +66.67% (p=0.000 n=20+20)
GCD100x100/WithXY-8 9.00 ± 0% 15.00 ± 0% +66.67% (p=0.000 n=20+20)
GCD100x1000/WithoutXY-8 6.00 ± 0% 8.00 ± 0% +33.33% (p=0.000 n=20+20)
GCD100x1000/WithXY-8 12.0 ± 0% 13.0 ± 0% +8.33% (p=0.000 n=20+20)
GCD100x10000/WithoutXY-8 6.00 ± 0% 8.00 ± 0% +33.33% (p=0.000 n=20+20)
GCD100x10000/WithXY-8 12.0 ± 0% 13.0 ± 0% +8.33% (p=0.000 n=20+20)
GCD100x100000/WithoutXY-8 6.00 ± 0% 8.00 ± 0% +33.33% (p=0.000 n=20+20)
GCD100x100000/WithXY-8 12.0 ± 0% 13.0 ± 0% +8.33% (p=0.000 n=20+20)
GCD1000x1000/WithoutXY-8 10.0 ± 0% 10.0 ± 0% ~ (all equal)
GCD1000x1000/WithXY-8 19.0 ± 0% 20.0 ± 0% +5.26% (p=0.000 n=20+20)
GCD1000x10000/WithoutXY-8 8.00 ± 0% 8.00 ± 0% ~ (all equal)
GCD1000x10000/WithXY-8 26.0 ± 0% 26.0 ± 0% ~ (all equal)
GCD1000x100000/WithoutXY-8 8.00 ± 0% 8.00 ± 0% ~ (all equal)
GCD1000x100000/WithXY-8 27.0 ± 0% 27.0 ± 0% ~ (all equal)
GCD10000x10000/WithoutXY-8 10.0 ± 0% 10.0 ± 0% ~ (all equal)
GCD10000x10000/WithXY-8 76.0 ± 0% 78.0 ± 0% +2.63% (p=0.000 n=20+20)
GCD10000x100000/WithoutXY-8 8.00 ± 0% 8.00 ± 0% ~ (all equal)
GCD10000x100000/WithXY-8 174 ± 0% 174 ± 0% ~ (all equal)
GCD100000x100000/WithoutXY-8 10.0 ± 0% 10.0 ± 0% ~ (all equal)
GCD100000x100000/WithXY-8 645 ± 0% 647 ± 0% +0.31% (p=0.000 n=20+20)
Hilbert-8 14.1k ± 0% 14.3k ± 0% +0.92% (p=0.000 n=20+20)
Binomial-8 38.0 ± 0% 38.0 ± 0% ~ (all equal)
QuoRem-8 0.00 0.00 ~ (all equal)
Exp-8 21.0 ± 0% 21.0 ± 0% ~ (all equal)
Exp2-8 22.0 ± 0% 22.0 ± 0% ~ (all equal)
Bitset-8 0.00 0.00 ~ (all equal)
BitsetNeg-8 0.00 0.00 ~ (all equal)
BitsetOrig-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
BitsetNegOrig-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
ModSqrt225_Tonelli-8 85.0 ± 0% 86.0 ± 0% +1.18% (p=0.000 n=20+20)
ModSqrt225_3Mod4-8 25.0 ± 0% 25.0 ± 0% ~ (all equal)
ModSqrt231_Tonelli-8 80.0 ± 0% 80.0 ± 0% ~ (all equal)
ModSqrt231_5Mod8-8 32.0 ± 0% 32.0 ± 0% ~ (all equal)
ModInverse-8 11.0 ± 0% 11.0 ± 0% ~ (all equal)
Sqrt-8 13.0 ± 0% 13.0 ± 0% ~ (all equal)
IntSqr/1-8 0.00 0.00 ~ (all equal)
IntSqr/2-8 0.00 0.00 ~ (all equal)
IntSqr/3-8 0.00 0.00 ~ (all equal)
IntSqr/5-8 0.00 0.00 ~ (all equal)
IntSqr/8-8 0.00 0.00 ~ (all equal)
IntSqr/10-8 0.00 0.00 ~ (all equal)
IntSqr/20-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
IntSqr/30-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
IntSqr/50-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
IntSqr/80-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
IntSqr/100-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
IntSqr/200-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
IntSqr/300-8 3.00 ± 0% 3.00 ± 0% ~ (all equal)
IntSqr/500-8 3.00 ± 0% 3.00 ± 0% ~ (all equal)
IntSqr/800-8 9.00 ± 0% 9.00 ± 0% ~ (all equal)
IntSqr/1000-8 9.00 ± 0% 9.00 ± 0% ~ (all equal)
Div/20/10-8 0.00 0.00 ~ (all equal)
Div/200/100-8 0.00 0.00 ~ (all equal)
Div/2000/1000-8 0.00 0.00 ~ (all equal)
Div/20000/10000-8 0.00 0.00 ~ (all equal)
Div/200000/100000-8 0.00 0.00 ~ (all equal)
Mul-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
ZeroShifts/Shl-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
ZeroShifts/ShlSame-8 0.00 0.00 ~ (all equal)
ZeroShifts/Shr-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
ZeroShifts/ShrSame-8 0.00 0.00 ~ (all equal)
Exp3Power/0x10-8 4.00 ± 0% 4.00 ± 0% ~ (all equal)
Exp3Power/0x40-8 4.00 ± 0% 4.00 ± 0% ~ (all equal)
Exp3Power/0x100-8 5.00 ± 0% 5.00 ± 0% ~ (all equal)
Exp3Power/0x400-8 7.00 ± 0% 7.00 ± 0% ~ (all equal)
Exp3Power/0x1000-8 11.0 ± 0% 11.0 ± 0% ~ (all equal)
Exp3Power/0x4000-8 15.0 ± 0% 15.0 ± 0% ~ (all equal)
Exp3Power/0x10000-8 29.0 ± 0% 29.0 ± 0% ~ (all equal)
Exp3Power/0x40000-8 140 ± 0% 140 ± 0% ~ (all equal)
Exp3Power/0x100000-8 1.12k ± 0% 1.12k ± 0% ~ (all equal)
Exp3Power/0x400000-8 9.88k ± 0% 9.88k ± 0% ~ (p=0.747 n=17+19)
Fibo-8 739 ± 0% 743 ± 0% +0.54% (p=0.000 n=20+20)
NatSqr/1-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
NatSqr/2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
NatSqr/3-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
NatSqr/5-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
NatSqr/8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
NatSqr/10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
NatSqr/20-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
NatSqr/30-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
NatSqr/50-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
NatSqr/80-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
NatSqr/100-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
NatSqr/200-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
NatSqr/300-8 4.00 ± 0% 4.00 ± 0% ~ (all equal)
NatSqr/500-8 4.00 ± 0% 4.00 ± 0% ~ (all equal)
NatSqr/800-8 10.0 ± 0% 10.0 ± 0% ~ (all equal)
NatSqr/1000-8 10.0 ± 0% 10.0 ± 0% ~ (all equal)
NatSetBytes/8-8 0.00 0.00 ~ (all equal)
NatSetBytes/24-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
NatSetBytes/128-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
NatSetBytes/7-8 0.00 0.00 ~ (all equal)
NatSetBytes/23-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
NatSetBytes/127-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
ScanPi-8 60.0 ± 0% 61.0 ± 0% +1.67% (p=0.000 n=20+20)
StringPiParallel-8 24.0 ± 0% 24.0 ± 0% ~ (all equal)
Scan/10/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/100/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/1000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/10000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/100000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/10/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/100/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/1000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/10000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/100000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/10/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/100/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/1000/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/10000/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/100000/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/10/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/100/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/1000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/10000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/100000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/10/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/100/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/1000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/10000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/100000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/10/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/100/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/1000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/10000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/100000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/10/Base10-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
String/100/Base10-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
String/1000/Base10-8 3.00 ± 0% 3.00 ± 0% ~ (all equal)
String/10000/Base10-8 3.00 ± 0% 3.00 ± 0% ~ (all equal)
String/100000/Base10-8 3.00 ± 0% 3.00 ± 0% ~ (all equal)
String/10/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/100/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/1000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/10000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/100000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
LeafSize/0-8 10.0 ± 0% 10.0 ± 0% ~ (all equal)
LeafSize/1-8 13.0 ± 0% 13.0 ± 0% ~ (all equal)
LeafSize/2-8 13.0 ± 0% 13.0 ± 0% ~ (all equal)
LeafSize/3-8 13.0 ± 0% 13.0 ± 0% ~ (all equal)
LeafSize/4-8 13.0 ± 0% 13.0 ± 0% ~ (all equal)
LeafSize/5-8 13.0 ± 0% 13.0 ± 0% ~ (all equal)
LeafSize/6-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/7-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/8-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/9-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/10-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/11-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/12-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/13-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/14-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/15-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/16-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/32-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/64-8 11.0 ± 0% 11.0 ± 0% ~ (all equal)
ProbablyPrime/n=0-8 52.0 ± 0% 52.0 ± 0% ~ (all equal)
ProbablyPrime/n=1-8 73.0 ± 0% 73.0 ± 0% ~ (all equal)
ProbablyPrime/n=5-8 157 ± 0% 157 ± 0% ~ (all equal)
ProbablyPrime/n=10-8 262 ± 0% 262 ± 0% ~ (all equal)
ProbablyPrime/n=20-8 472 ± 0% 472 ± 0% ~ (all equal)
ProbablyPrime/Lucas-8 22.0 ± 0% 22.0 ± 0% ~ (all equal)
ProbablyPrime/MillerRabinBase2-8 29.0 ± 0% 29.0 ± 0% ~ (all equal)
FloatSqrt/64-8 9.00 ± 0% 10.00 ± 0% +11.11% (p=0.000 n=20+20)
FloatSqrt/128-8 12.0 ± 0% 13.0 ± 0% +8.33% (p=0.000 n=20+20)
FloatSqrt/256-8 8.00 ± 0% 8.00 ± 0% ~ (all equal)
FloatSqrt/1000-8 9.00 ± 0% 9.00 ± 0% ~ (all equal)
FloatSqrt/10000-8 14.0 ± 0% 14.0 ± 0% ~ (all equal)
FloatSqrt/100000-8 33.0 ± 0% 33.0 ± 0% ~ (all equal)
FloatSqrt/1000000-8 1.16k ± 0% 1.16k ± 0% ~ (all equal)
[Geo mean] 6.62 6.76 +2.09%
Change-Id: Id9df4157cac1e07721e35cff7fcdefe60703873a
Reviewed-on: https://go-review.googlesource.com/c/150999
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
The divLarge code contained "todo"s about avoiding alias
and clear calls in the initialization of variables. By
rearranging the order of initialization and always using
an auxiliary variable for the shifted divisor, all of these
calls can be safely avoided. On average, normalizing
the divisor (shift>0) is required 31/32 or 63/64 of the
time. If one always performs the shift into an auxiliary
variable first, this avoids the need to check for aliasing of
vIn in the output variables u and z. The remainder u is
initialized via a left shift of uIn and thus needs no
alias check against uIn. Since uIn and vIn were both used,
z needs no alias checks except against u which is used for
storage of the remainder. This change has a minimal impact
on performance (see below), but cleans up the initialization
code and eliminates the "todo"s.
name old time/op new time/op delta
Div/20/10-4 86.7ns ± 6% 85.7ns ± 5% ~ (p=0.841 n=5+5)
Div/200/100-4 523ns ± 5% 502ns ± 3% -4.13% (p=0.024 n=5+5)
Div/2000/1000-4 2.55µs ± 3% 2.59µs ± 5% ~ (p=0.548 n=5+5)
Div/20000/10000-4 80.4µs ± 4% 80.0µs ± 2% ~ (p=1.000 n=5+5)
Div/200000/100000-4 6.43ms ± 6% 6.35ms ± 4% ~ (p=0.548 n=5+5)
Fixes #22928
Change-Id: I30d8498ef1cf8b69b0f827165c517bc25a5c32d7
Reviewed-on: https://go-review.googlesource.com/130775
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
For #23221.
Change-Id: If55dcf2e0706d6658f4a0863e3740437e008706c
Reviewed-on: https://go-review.googlesource.com/114335
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
Currently we use three different algorithms for squaring:
1. basic multiplication for small numbers
2. basic squaring for medium numbers
3. Karatsuba multiplication for large numbers
Change 3. to a version of Karatsuba multiplication specialized
for x == y.
Increasing the performance of 3. lets us lower the threshold
between 2. and 3.
Adapt TestCalibrate to the change that 3. isn't independent
of the threshold between 1. and 2. any more.
Fixes #23221.
benchstat old.txt new.txt
name old time/op new time/op delta
NatSqr/1-4 29.6ns ± 7% 29.5ns ± 5% ~ (p=0.103 n=50+50)
NatSqr/2-4 51.9ns ± 1% 51.9ns ± 1% ~ (p=0.693 n=42+49)
NatSqr/3-4 64.3ns ± 1% 64.1ns ± 0% -0.26% (p=0.000 n=46+43)
NatSqr/5-4 93.5ns ± 2% 93.1ns ± 1% -0.39% (p=0.000 n=48+49)
NatSqr/8-4 131ns ± 1% 131ns ± 1% ~ (p=0.870 n=46+49)
NatSqr/10-4 175ns ± 1% 175ns ± 1% +0.38% (p=0.000 n=49+47)
NatSqr/20-4 426ns ± 1% 429ns ± 1% +0.84% (p=0.000 n=46+48)
NatSqr/30-4 702ns ± 2% 699ns ± 1% -0.38% (p=0.011 n=46+44)
NatSqr/50-4 1.44µs ± 2% 1.43µs ± 1% -0.54% (p=0.010 n=48+48)
NatSqr/80-4 2.85µs ± 1% 2.87µs ± 1% +0.68% (p=0.000 n=47+47)
NatSqr/100-4 4.06µs ± 1% 4.07µs ± 1% +0.29% (p=0.000 n=46+45)
NatSqr/200-4 13.4µs ± 1% 13.5µs ± 1% +0.73% (p=0.000 n=48+48)
NatSqr/300-4 28.5µs ± 1% 28.2µs ± 1% -1.22% (p=0.000 n=46+48)
NatSqr/500-4 81.9µs ± 1% 67.0µs ± 1% -18.25% (p=0.000 n=48+48)
NatSqr/800-4 161µs ± 1% 140µs ± 1% -13.29% (p=0.000 n=47+48)
NatSqr/1000-4 245µs ± 1% 207µs ± 1% -15.17% (p=0.000 n=49+49)
go test -v -calibrate --run TestCalibrate
...
Calibrating threshold between basicSqr(x) and karatsubaSqr(x)
Looking for a timing difference for x between 200 - 500 words by 10 step
words = 200 deltaT = -980ns ( -7%) is karatsubaSqr(x) better: false
words = 210 deltaT = -773ns ( -5%) is karatsubaSqr(x) better: false
words = 220 deltaT = -695ns ( -4%) is karatsubaSqr(x) better: false
words = 230 deltaT = -570ns ( -3%) is karatsubaSqr(x) better: false
words = 240 deltaT = -458ns ( -2%) is karatsubaSqr(x) better: false
words = 250 deltaT = -63ns ( 0%) is karatsubaSqr(x) better: false
words = 260 deltaT = 118ns ( 0%) is karatsubaSqr(x) better: true threshold found
words = 270 deltaT = 377ns ( 1%) is karatsubaSqr(x) better: true
words = 280 deltaT = 765ns ( 3%) is karatsubaSqr(x) better: true
words = 290 deltaT = 673ns ( 2%) is karatsubaSqr(x) better: true
words = 300 deltaT = 502ns ( 1%) is karatsubaSqr(x) better: true
words = 310 deltaT = 629ns ( 2%) is karatsubaSqr(x) better: true
words = 320 deltaT = 1.011µs ( 3%) is karatsubaSqr(x) better: true
words = 330 deltaT = 1.36µs ( 4%) is karatsubaSqr(x) better: true
words = 340 deltaT = 3.001µs ( 8%) is karatsubaSqr(x) better: true
words = 350 deltaT = 3.178µs ( 8%) is karatsubaSqr(x) better: true
...
Change-Id: I6f13c23d94d042539ac28e77fd2618cdc37a429e
Reviewed-on: https://go-review.googlesource.com/105075
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
Updates #22830
Due to not checking if the output slices alias in divLarge,
calls of the form z.div(z, x, y) caused the slice z
to attempt to be used to store both the quotient and the
remainder of the division. CL 78995 applies an alias
check to correct that error. This CL cleans up the
additional div calls that attempt to supply the same slice
to hold both the quotient and remainder.
Note that the call in expNN was responsible for the reported
error in r.Exp(x, 1, m) when r was initialized to a non-zero value.
The second instance in expNNMontgomery did not result in an error
due to the size of the arguments.
// RR = 2**(2*_W*len(m)) mod m
RR := nat(nil).setWord(1)
zz := nat(nil).shl(RR, uint(2*numWords*_W))
_, RR = RR.div(RR, zz, m)
Specifically,
cap(RR) == 5 after setWord(1) due to const e = 4 in z.make(1)
len(zz) == 2*len(m) + 1 after shifting left, numWords = len(m)
Reusing the backing array for z and z2 in div was only triggered if
cap(RR) >= len(zz) + 1 and len(m) > 1 so that divLarge was called.
But, 5 < 2*len(m) + 2 if len(m) > 1, so new arrays were allocated
and the error was never triggered in this case.
Change-Id: Iedac80dbbde13216c94659e84d28f6f4be3aaf24
Reviewed-on: https://go-review.googlesource.com/81055
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
That "else" was needed due to gc DCE limitations.
Now it's not the case and we can avoid go lint complaints.
(See #23521 and https://golang.org/cl/91056.)
There is inlining test for bigEndianWord, so if test
is passing, no performance regression should occur.
Change-Id: Id84d63f361e5e51a52293904ff042966c83c16e9
Reviewed-on: https://go-review.googlesource.com/104555
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Instead shifting the accumulator every iteration of the loop, shift
once in the end. This significantly improves performance on arm64.
On arm64:
name old time/op new time/op delta
RSA2048Decrypt 3.33ms ± 0% 2.63ms ± 0% -20.94% (p=0.000 n=11+11)
RSA2048Sign 4.22ms ± 0% 3.55ms ± 0% -15.89% (p=0.000 n=11+11)
3PrimeRSA2048Decrypt 1.95ms ± 0% 1.59ms ± 0% -18.59% (p=0.000 n=11+11)
On Skylake:
name old time/op new time/op delta
RSA2048Decrypt-8 1.73ms ± 2% 1.55ms ± 2% -10.19% (p=0.000 n=10+10)
RSA2048Sign-8 2.17ms ± 2% 2.00ms ± 2% -7.93% (p=0.000 n=10+10)
3PrimeRSA2048Decrypt-8 1.10ms ± 2% 0.96ms ± 2% -13.03% (p=0.000 n=10+9)
Change-Id: I5786191a1a09e4217fdb1acfd90880d35c5855f7
Reviewed-on: https://go-review.googlesource.com/99838
Run-TryBot: Vlad Krasnov <vlad@cloudflare.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Adam Langley <agl@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
One could expect calls like
z.mant.shl(z.mant, shiftAmount)
(or higher-level-functions calls that use lhs/rhs) to be almost free
when shiftAmount = 0; and expect calls like
z.mant.shl(x.mant, 0)
to have the same cost of a x.mant -> z.mant copy. Neither of this
things are currently true.
For an 800 words nat, the first kind of calls cost ~800ns for rigth
shifts and ~3.5µs for left shift; while the second kind of calls are
doing more work than necessary by calling shlVU/shrVU.
This change makes the first kind of calls ({Shl,Shr}Same) almost free,
and the second kind of calls ({Shl,Shr}) about 30% faster.
name old time/op new time/op delta
ZeroShifts/Shl-4 3.64µs ± 3% 2.49µs ± 1% -31.55% (p=0.000 n=10+10)
ZeroShifts/ShlSame-4 3.65µs ± 1% 0.01µs ± 1% -99.85% (p=0.000 n=9+9)
ZeroShifts/Shr-4 3.65µs ± 1% 2.49µs ± 1% -31.91% (p=0.000 n=10+10)
ZeroShifts/ShrSame-4 825ns ± 0% 6ns ± 1% -99.33% (p=0.000 n=9+10)
During go test math/big, the shl zeroshift fastpath is triggered 1380
times; while the shr fastpath is triggered 153334 times(!).
Change-Id: I5f92b304a40638bd8453a86c87c58e54b337bcdf
Reviewed-on: https://go-review.googlesource.com/87660
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
Change-Id: I7ed40507a019c0bf521ba748fc22c03d74bb17b7
Reviewed-on: https://go-review.googlesource.com/100719
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
Set up to _S (number of bytes in Uint) bytes at time
by using BigEndian.Uint32 and BigEndian.Uint64.
The performance improves for slices bigger than _S bytes.
This is the case for 128/256bit arith that initializes
it's objects from bytes.
name old time/op new time/op delta
NatSetBytes/8-4 29.8ns ± 1% 11.4ns ± 0% -61.63% (p=0.000 n=9+8)
NatSetBytes/24-4 109ns ± 1% 56ns ± 0% -48.75% (p=0.000 n=9+8)
NatSetBytes/128-4 420ns ± 2% 110ns ± 1% -73.83% (p=0.000 n=10+10)
NatSetBytes/7-4 26.2ns ± 1% 21.3ns ± 2% -18.63% (p=0.000 n=8+9)
NatSetBytes/23-4 106ns ± 1% 67ns ± 1% -36.93% (p=0.000 n=9+10)
NatSetBytes/127-4 410ns ± 2% 121ns ± 0% -70.46% (p=0.000 n=9+8)
Found this optimization opportunity by looking at ethereum_corevm
community benchmark cpuprofile.
name old time/op new time/op delta
OpDiv256-4 715ns ± 1% 596ns ± 1% -16.57% (p=0.008 n=5+5)
OpDiv128-4 373ns ± 1% 314ns ± 1% -15.83% (p=0.008 n=5+5)
OpDiv64-4 301ns ± 0% 285ns ± 1% -5.12% (p=0.008 n=5+5)
Change-Id: I8e5a680ae6284c8233d8d7431d51253a8a740b57
Reviewed-on: https://go-review.googlesource.com/98775
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
In nat.divLarge (having signature (z nat).divLarge(u, uIn, v nat)),
we check whether z aliases uIn or v, but aliasing is currently not
checked for the u parameter.
Unfortunately, z and u aliasing each other can in some cases cause
errors in the computation.
The q return parameter (which will hold the result's quotient), is
unconditionally initialized as
q = z.make(m + 1)
When cap(z) ≥ m+1, z.make() will reuse z's backing array, causing q
and z to share the same backing array. If then z aliases u, setting q
during the quotient computation will then corrupt u, which at that
point already holds computation state.
To fix this, we add an alias(z, u) check at the beginning of the
function, taking care of aliasing the same way we already do for uIn
and v.
Fixes #22830
Change-Id: I3ab81120d5af6db7772a062bb1dfc011de91f7ad
Reviewed-on: https://go-review.googlesource.com/78995
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
Replace z.mul(x, x) calls on nats in internal code with z.sqr(x)
that employs optimized squaring routines. Benchmark results:
Exp-4 12.9ms ± 2% 12.8ms ± 3% ~ (p=0.165 n=10+10)
Exp2-4 13.0ms ± 4% 12.8ms ± 2% -2.14% (p=0.015 n=8+9)
ModSqrt225_Tonelli-4 987µs ± 4% 989µs ± 2% ~ (p=0.673 n=8+9)
ModSqrt224_3Mod4-4 300µs ± 2% 301µs ± 3% ~ (p=0.546 n=9+9)
ModSqrt5430_Tonelli-4 4.88s ± 6% 4.82s ± 5% ~ (p=0.247 n=10+10)
ModSqrt5430_3Mod4-4 1.62s ±10% 1.57s ± 1% ~ (p=0.094 n=9+9)
Exp3Power/0x10-4 496ns ± 7% 426ns ± 7% -14.21% (p=0.000 n=10+10)
Exp3Power/0x40-4 575ns ± 5% 470ns ± 7% -18.20% (p=0.000 n=9+10)
Exp3Power/0x100-4 929ns ±19% 770ns ±10% -17.13% (p=0.000 n=10+10)
Exp3Power/0x400-4 1.96µs ± 7% 1.79µs ± 5% -8.68% (p=0.000 n=10+10)
Exp3Power/0x1000-4 10.9µs ± 9% 7.9µs ± 5% -28.02% (p=0.000 n=10+10)
Exp3Power/0x4000-4 86.8µs ± 8% 67.3µs ± 8% -22.41% (p=0.000 n=10+10)
Exp3Power/0x10000-4 750µs ± 8% 731µs ± 1% ~ (p=0.074 n=9+8)
Exp3Power/0x40000-4 7.07ms ± 7% 7.05ms ± 4% ~ (p=0.931 n=9+9)
Exp3Power/0x100000-4 64.7ms ± 2% 65.6ms ± 6% ~ (p=0.661 n=9+10)
Exp3Power/0x400000-4 577ms ± 2% 580ms ± 3% ~ (p=0.931 n=9+9)
ProbablyPrime/n=0-4 9.08ms ±17% 9.09ms ±16% ~ (p=0.447 n=9+10)
ProbablyPrime/n=1-4 10.8ms ± 4% 10.7ms ± 2% ~ (p=0.243 n=10+9)
ProbablyPrime/n=5-4 18.5ms ± 3% 18.5ms ± 1% ~ (p=0.863 n=9+9)
ProbablyPrime/n=10-4 28.6ms ± 6% 28.2ms ± 1% ~ (p=0.050 n=9+9)
ProbablyPrime/n=20-4 48.4ms ± 4% 48.4ms ± 2% ~ (p=0.739 n=10+10)
ProbablyPrime/Lucas-4 6.75ms ± 4% 6.75ms ± 2% ~ (p=0.963 n=9+8)
ProbablyPrime/MillerRabinBase2-4 2.00ms ± 5% 2.00ms ± 7% ~ (p=0.931 n=9+9)
Change-Id: Ibe9f58d11dbad25eb369faedf480b666a0250a6b
Reviewed-on: https://go-review.googlesource.com/56773
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
updates #13745
Multiprecision squaring can be done in a straightforward manner
with about half the multiplications of a basic multiplication
due to the symmetry of the operands. This change implements
basic squaring for nat types and uses it for Int multiplication
when the same variable is supplied to both arguments of
z.Mul(x, x). This has some overhead to allocate a temporary
variable to hold the cross products, shift them to double and
add them to the diagonal terms. There is a speed benefit in
the intermediate range when the overhead is neglible and the
asymptotic performance of karatsuba multiplication has not been
reached.
basicSqrThreshold = 20
karatsubaSqrThreshold = 400
Were set by running calibrate_test.go to measure timing differences
between the algorithms. Benchmarks for squaring:
name old time/op new time/op delta
IntSqr/1-4 51.5ns ±25% 25.1ns ± 7% -51.38% (p=0.008 n=5+5)
IntSqr/2-4 79.1ns ± 4% 72.4ns ± 2% -8.47% (p=0.008 n=5+5)
IntSqr/3-4 102ns ± 4% 97ns ± 5% ~ (p=0.056 n=5+5)
IntSqr/5-4 161ns ± 4% 163ns ± 7% ~ (p=0.952 n=5+5)
IntSqr/8-4 277ns ± 5% 267ns ± 6% ~ (p=0.087 n=5+5)
IntSqr/10-4 358ns ± 3% 360ns ± 4% ~ (p=0.730 n=5+5)
IntSqr/20-4 1.07µs ± 3% 1.01µs ± 6% ~ (p=0.056 n=5+5)
IntSqr/30-4 2.36µs ± 4% 1.72µs ± 2% -27.03% (p=0.008 n=5+5)
IntSqr/50-4 5.19µs ± 3% 3.88µs ± 4% -25.37% (p=0.008 n=5+5)
IntSqr/80-4 11.3µs ± 4% 8.6µs ± 3% -23.78% (p=0.008 n=5+5)
IntSqr/100-4 16.2µs ± 4% 12.8µs ± 3% -21.49% (p=0.008 n=5+5)
IntSqr/200-4 50.1µs ± 5% 44.7µs ± 3% -10.65% (p=0.008 n=5+5)
IntSqr/300-4 105µs ±11% 95µs ± 3% -9.50% (p=0.008 n=5+5)
IntSqr/500-4 231µs ± 5% 227µs ± 2% ~ (p=0.310 n=5+5)
IntSqr/800-4 496µs ± 9% 459µs ± 3% -7.40% (p=0.016 n=5+5)
IntSqr/1000-4 700µs ± 3% 710µs ± 5% ~ (p=0.841 n=5+5)
Show a speed up of 10-25% in the range where basicSqr is optimal,
improved single word squaring and no significant difference when
the fallback to standard multiplication is used.
Change-Id: Iae2c82ca91cf890823f91e5c83bbe9a2c534b72b
Reviewed-on: https://go-review.googlesource.com/53638
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Verified that BenchmarkBitLen time went down from 2.25 ns/op to 0.65 ns/op
an a 2.3 GHz Intel Core i7, before removing that benchmark (now covered by
math/bits benchmarks).
Change-Id: I3890bb7d1889e95b9a94bd68f0bdf06f1885adeb
Reviewed-on: https://go-review.googlesource.com/38464
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
nat.setUint64 is nicely generic.
By assuming 32- or 64-bit words, however,
we can write simpler code,
and eliminate some shifts
in dead code that vet complains about.
Generated code for 64 bit systems is unaltered.
Generated code for 32 bit systems is much better.
For 386, the routine length drops from 325
bytes of code to 271 bytes of code, with fewer loops.
Change-Id: I1bc14c06272dee37a7fcb48d33dd1e621eba945d
Reviewed-on: https://go-review.googlesource.com/38070
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
This change adds math/bits as a new dependency of math/big.
- use bits.LeadingZeroes instead of local implementation
(they are identical, so there's no performance loss here)
- leave other functionality local (ntz, bitLen) since there's
faster implementations in math/big at the moment
Change-Id: I1218aa8a1df0cc9783583b090a4bb5a8a145c4a2
Reviewed-on: https://go-review.googlesource.com/37141
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
This is needed for some of the more complex primality tests
(to filter out exact squares), and while the code is simple the
boundary conditions are not obvious, so it seems worth having
in the library.
Change-Id: Ica994a6b6c1e412a6f6d9c3cf823f9b653c6bcbd
Reviewed-on: https://go-review.googlesource.com/30706
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
A later CL will be adding more code here.
It will help to keep it separate from the other code.
Change-Id: I971ba53de819cd10991b51fdec665984939a5f9b
Reviewed-on: https://go-review.googlesource.com/30709
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
The Montgomery multiply code is applicable to this case
but was being bypassed. Don't do that.
The old test len(x) > 1 was really just a bad approximation to x > 1.
name old time/op new time/op delta
Exp-8 5.56ms ± 4% 5.73ms ± 3% ~ (p=0.095 n=5+5)
Exp2-8 7.59ms ± 1% 5.66ms ± 1% -25.40% (p=0.008 n=5+5)
This comes up especially when doing Fermat (Miller-Rabin)
primality tests with base 2.
Change-Id: I4cc02978db6dfa93f7f3c8f32718e25eedb4f5ed
Reviewed-on: https://go-review.googlesource.com/30708
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
- Add new BenchmarkQuoRem.
- Eliminate allocation in divLarge nat pool
- Unroll mulAddVWW body 4x
- Remove some redundant slice loads in divLarge
name old time/op new time/op delta
QuoRem-8 2.18µs ± 1% 1.93µs ± 1% -11.38% (p=0.000 n=19+18)
The starting point in the comparison here is Cherry's
pending CL to turn mulWW and divWW into intrinsics.
The optimizations in divLarge work best because all
the function calls are gone. The effect of this CL is not
as large if you don't assume Cherry's CL.
Change-Id: Ia6138907489c5b9168497912e43705634e163b35
Reviewed-on: https://go-review.googlesource.com/30613
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|