go - Fork of Go programming language with my patches.

Age	Commit message (Collapse)	Author
2025-04-19	math/big: use clearer loop bounds check elimination	Russ Cox
	Checking that the lengths are equal and panicking teaches the compiler that it can assume “i in range for z” implies “i in range for x”, letting us simplify the actual loops a bit. It also turns up a few places in math/big that were playing maybe a little too fast and loose with slice lengths. Update those to explicitly set all the input slices to the same length. These speedups are basically irrelevant, since they only happen in real code if people are compiling with -tags math_big_pure_go. But at least the code is clearer. benchmark \ system c3h88 c2s16 s7 386 s7-386 c4as16 mac arm loong64 ppc64le riscv64 s390x AddVV/words=1/impl=go ~ +11.20% +5.11% -7.67% -7.77% +1.90% +10.76% -33.22% ~ +10.98% ~ +6.60% AddVV/words=10/impl=go -22.12% -13.48% -10.37% -17.95% -18.07% -24.58% -22.04% -29.95% -14.22% ~ -6.33% +3.66% AddVV/words=16/impl=go -9.75% -13.73% ~ -21.90% -18.66% -30.03% -20.45% -28.09% -17.33% -7.15% -8.96% +12.55% AddVV/words=100/impl=go -5.91% -1.02% ~ -29.23% -22.18% -25.62% -6.49% -23.59% -22.31% -1.88% -14.13% +9.23% AddVV/words=1000/impl=go -0.52% -0.19% -3.58% -33.89% -23.46% -22.46% ~ -24.00% -24.73% +0.93% -15.79% +12.32% AddVV/words=10000/impl=go ~ ~ ~ -33.79% -23.72% -23.79% -5.98% -23.92% ~ +0.78% -15.45% +8.59% AddVV/words=100000/impl=go ~ ~ ~ -33.90% -24.25% -22.82% -4.09% -24.63% ~ +1.00% -13.56% ~ SubVV/words=1/impl=go ~ +11.64% +14.05% ~ -4.07% ~ +10.79% -33.69% ~ ~ +3.89% +12.33% SubVV/words=10/impl=go -10.31% -14.09% -7.38% +13.76% -13.25% -18.05% -20.08% -24.97% -14.15% +10.13% -0.97% -2.51% SubVV/words=16/impl=go -8.06% -13.73% -5.70% +17.00% -12.83% -23.76% -17.52% -25.25% -17.30% -2.80% -4.96% -18.25% SubVV/words=100/impl=go -9.22% -1.30% -2.76% +20.88% -14.35% -15.29% -8.49% -19.64% -22.31% -0.68% -14.30% -9.04% SubVV/words=1000/impl=go -0.60% ~ -3.43% +23.08% -16.14% -11.96% ~ -28.52% -24.73% ~ -15.95% -9.91% SubVV/words=10000/impl=go ~ ~ ~ +26.01% -15.24% -11.92% ~ -28.26% +4.25% ~ -15.42% -5.95% SubVV/words=100000/impl=go ~ ~ ~ +25.71% -15.83% -12.13% ~ -27.88% -1.27% ~ -13.57% -6.72% LshVU/words=1/impl=go +0.56% +0.36% ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ LshVU/words=10/impl=go +13.37% +4.63% ~ ~ ~ ~ ~ -2.90% ~ ~ ~ ~ LshVU/words=16/impl=go +22.83% +6.47% ~ ~ ~ ~ ~ ~ +0.80% ~ ~ +5.88% LshVU/words=100/impl=go +7.56% +13.95% ~ ~ ~ ~ ~ ~ +0.33% -2.50% ~ ~ LshVU/words=1000/impl=go +0.64% +17.92% ~ ~ ~ ~ ~ -6.52% ~ -2.58% ~ ~ LshVU/words=10000/impl=go ~ +17.60% ~ ~ ~ ~ ~ -6.64% -6.22% -1.40% ~ ~ LshVU/words=100000/impl=go ~ +14.57% ~ ~ ~ ~ ~ ~ -5.47% ~ ~ ~ RshVU/words=1/impl=go ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ +2.72% RshVU/words=10/impl=go ~ ~ ~ ~ ~ ~ ~ +2.50% ~ ~ ~ ~ RshVU/words=16/impl=go ~ +0.53% ~ ~ ~ ~ ~ +3.82% ~ ~ ~ ~ RshVU/words=100/impl=go ~ ~ ~ ~ ~ ~ ~ +6.18% ~ ~ ~ ~ RshVU/words=1000/impl=go ~ ~ ~ ~ ~ ~ ~ +7.00% ~ ~ ~ ~ RshVU/words=10000/impl=go ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ RshVU/words=100000/impl=go ~ ~ ~ ~ ~ ~ ~ +7.05% ~ ~ ~ ~ MulAddVWW/words=1/impl=go -10.34% +4.43% +10.62% -1.62% -4.74% -2.86% +11.75% ~ -8.00% +8.89% +3.87% ~ MulAddVWW/words=10/impl=go -1.61% -5.87% ~ -8.30% -4.55% +0.87% ~ -5.28% -20.82% ~ ~ -2.32% MulAddVWW/words=16/impl=go -2.96% -5.28% ~ -9.22% -5.28% ~ ~ -3.74% -19.52% -1.48% -2.53% -9.52% MulAddVWW/words=100/impl=go -3.89% -7.53% +1.93% -10.49% -4.87% -8.27% ~ ~ -0.65% -0.61% -7.59% -20.61% MulAddVWW/words=1000/impl=go -0.45% -3.91% +4.54% -11.46% -4.69% -8.53% ~ ~ -0.05% ~ -8.88% -19.77% MulAddVWW/words=10000/impl=go ~ -3.30% +4.10% -11.34% -4.10% -9.43% ~ -0.61% ~ -0.55% -8.21% -18.48% MulAddVWW/words=100000/impl=go -0.30% -3.03% +4.31% -11.55% -4.41% -9.74% ~ -0.75% +0.63% ~ -7.80% -19.82% AddMulVVWW/words=1/impl=go ~ +13.09% +12.50% -7.05% -10.41% +2.53% +13.32% -3.49% ~ +15.56% +3.62% ~ AddMulVVWW/words=10/impl=go -15.96% -9.06% -5.06% -14.56% -11.83% -5.44% -26.30% -14.23% -11.44% -1.79% -5.93% -6.60% AddMulVVWW/words=16/impl=go -19.05% -12.43% -6.19% -14.24% -12.67% -8.65% -18.64% -16.56% -10.64% -3.00% -7.61% -12.80% AddMulVVWW/words=100/impl=go -22.13% -16.59% -13.04% -13.79% -11.46% -12.01% -6.46% -21.80% -5.08% -3.13% -13.60% -22.53% AddMulVVWW/words=1000/impl=go -17.07% -17.05% -14.08% -13.59% -12.13% -11.21% ~ -22.81% -4.27% -1.27% -16.35% -23.47% AddMulVVWW/words=10000/impl=go -15.03% -16.78% -14.23% -13.86% -11.84% -11.69% ~ -22.75% -13.39% -1.10% -14.37% -22.01% AddMulVVWW/words=100000/impl=go -13.70% -14.90% -14.26% -13.55% -12.04% -11.63% ~ -22.61% ~ -2.53% -10.42% -23.16% Change-Id: Ic6f64344484a762b818c7090d1396afceb638607 Reviewed-on: https://go-review.googlesource.com/c/go/+/665155 Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com>
2025-04-11	math/big: remove copy responsibility from, rename shlVU, shrVU	Russ Cox
	It is annoying that non-x86 implementations of shlVU and shrVU have to go out of their way to handle the trivial case shift==0 with their own copy loops. Instead, arrange to never call them with shift==0, so that the code can be removed. Unfortunately, there are linknames of shlVU, so we cannot change that function. But we can rename the functions and then leave behind a shlVU wrapper, so do that. Since the big.Int API calls the operations Lsh and Rsh, rename shlVU/shrVU to lshVU/rshVU. Also rename various other shl/shr methods and functions to lsh/rsh. Change-Id: Ieaf54e0110a298730aa3e4566ce5be57ba7fc121 Reviewed-on: https://go-review.googlesource.com/c/go/+/664896 Reviewed-by: Alan Donovan <adonovan@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-11	math/big: replace addMulVVW with addMulVVWW	Russ Cox
	addMulVVW is an unnecessarily special case. All other assembly routines taking []Word (V as in vector) arguments take separate source and destination. For example: addVV: z = x+y mulAddVWW: z = xm+a addMulVVW uses the z parameter as both destination and source: addMulVVW: z = z+xm Even looking at the signatures is confusing: all the VV routines take two input vectors x and y, but addMulVVW takes only x: where is y? (The answer is that the two inputs are z and x.) It would be nice to fix this, both for understandability and regularity, and to simplify a future assembly generator. We cannot remove or redefine addMulVVW, because it has been used in linknames. Instead, the CL adds a new final addend argument ‘a’ like in mulAddVWW, making the natural name addMulVVWW (two input vectors, two input words): addMulVVWW: z = x+y*m+a This CL updates all the assembly implementations to rename the inputs z, x, y -> x, y, m, and then introduces a separate destination z. Change-Id: Ib76c80b53f6d1f4a901f663566e9c4764bb20488 Reviewed-on: https://go-review.googlesource.com/c/go/+/664895 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com>
2025-03-12	math/big: simplify, speed up Karatsuba multiplication	Russ Cox
	The old Karatsuba implementation only operated on lengths that are a power of two times a number smaller than karatsubaThreshold. For example, when karatsubaThreshold = 40, multiplying a pair of 99-word numbers runs karatsuba on the low 96 (= 39<<2) words and then has to fix up the answer to include the high 3 words of each. I suspect this requirement was needed to make the analysis of how many temporary words to reserve easier, back when the answer was 3*n and depended on exactly halving the size at each Karatsuba step. Now that we have the more flexible temporary allocation stack, we can change Karatsuba to accept operands of odd length. Doing so avoids most of the fixup that the old approach required. For example, multiplying a pair of 99-word numbers runs karatsuba on all 99 words now. This is simpler and about the same speed or, for large cases, faster. goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) CPU @ 3.10GHz │ old │ new │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-16 99.62n ± 3% 99.10n ± 3% ~ (p=0.009 n=15) GCD10x10/WithXY-16 243.4n ± 1% 245.2n ± 1% ~ (p=0.009 n=15) GCD100x100/WithoutXY-16 921.9n ± 1% 919.2n ± 1% ~ (p=0.076 n=15) GCD100x100/WithXY-16 1.527µ ± 1% 1.526µ ± 0% ~ (p=0.813 n=15) GCD1000x1000/WithoutXY-16 9.704µ ± 1% 9.696µ ± 0% ~ (p=0.532 n=15) GCD1000x1000/WithXY-16 14.03µ ± 1% 13.96µ ± 0% ~ (p=0.014 n=15) GCD10000x10000/WithoutXY-16 206.5µ ± 2% 206.5µ ± 0% ~ (p=0.967 n=15) GCD10000x10000/WithXY-16 398.0µ ± 1% 397.4µ ± 0% ~ (p=0.683 n=15) Div/20/10-16 22.22n ± 0% 22.23n ± 0% ~ (p=0.105 n=15) Div/40/20-16 22.23n ± 0% 22.23n ± 0% ~ (p=0.307 n=15) Div/100/50-16 55.47n ± 0% 55.47n ± 0% ~ (p=0.573 n=15) Div/200/100-16 174.9n ± 1% 174.6n ± 1% ~ (p=0.814 n=15) Div/400/200-16 209.5n ± 1% 210.5n ± 1% ~ (p=0.454 n=15) Div/1000/500-16 379.9n ± 0% 383.5n ± 2% ~ (p=0.123 n=15) Div/2000/1000-16 780.1n ± 0% 784.6n ± 1% +0.58% (p=0.000 n=15) Div/20000/10000-16 25.22µ ± 1% 25.15µ ± 0% ~ (p=0.213 n=15) Div/200000/100000-16 921.8µ ± 1% 926.1µ ± 0% ~ (p=0.009 n=15) Div/2000000/1000000-16 37.91m ± 0% 35.63m ± 0% -6.02% (p=0.000 n=15) Div/20000000/10000000-16 1.378 ± 0% 1.336 ± 0% -3.03% (p=0.000 n=15) NatMul/10-16 166.8n ± 4% 168.9n ± 3% ~ (p=0.008 n=15) NatMul/100-16 5.519µ ± 2% 5.548µ ± 4% ~ (p=0.032 n=15) NatMul/1000-16 230.4µ ± 1% 220.2µ ± 1% -4.43% (p=0.000 n=15) NatMul/10000-16 8.569m ± 1% 8.640m ± 1% ~ (p=0.005 n=15) NatMul/100000-16 376.5m ± 1% 334.1m ± 0% -11.26% (p=0.000 n=15) NatSqr/1-16 27.85n ± 5% 28.60n ± 2% ~ (p=0.123 n=15) NatSqr/2-16 47.99n ± 2% 48.84n ± 1% ~ (p=0.008 n=15) NatSqr/3-16 59.41n ± 2% 60.87n ± 2% +2.46% (p=0.001 n=15) NatSqr/5-16 87.27n ± 2% 89.31n ± 3% ~ (p=0.087 n=15) NatSqr/8-16 124.6n ± 3% 128.9n ± 3% ~ (p=0.006 n=15) NatSqr/10-16 166.3n ± 3% 172.7n ± 3% ~ (p=0.002 n=15) NatSqr/20-16 385.2n ± 2% 394.7n ± 3% ~ (p=0.036 n=15) NatSqr/30-16 622.7n ± 3% 642.9n ± 3% ~ (p=0.032 n=15) NatSqr/50-16 1.274µ ± 3% 1.323µ ± 4% ~ (p=0.003 n=15) NatSqr/80-16 2.606µ ± 4% 2.714µ ± 4% ~ (p=0.044 n=15) NatSqr/100-16 3.731µ ± 4% 3.871µ ± 4% ~ (p=0.038 n=15) NatSqr/200-16 12.99µ ± 2% 13.09µ ± 3% ~ (p=0.838 n=15) NatSqr/300-16 22.87µ ± 2% 23.25µ ± 2% ~ (p=0.285 n=15) NatSqr/500-16 58.43µ ± 1% 58.25µ ± 2% ~ (p=0.345 n=15) NatSqr/800-16 115.3µ ± 3% 116.2µ ± 3% ~ (p=0.126 n=15) NatSqr/1000-16 173.9µ ± 1% 174.3µ ± 1% ~ (p=0.935 n=15) NatSqr/10000-16 6.133m ± 2% 6.034m ± 1% -1.62% (p=0.000 n=15) NatSqr/100000-16 253.8m ± 1% 241.5m ± 0% -4.87% (p=0.000 n=15) geomean 7.745µ 7.760µ +0.19% goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz │ old │ new │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-88 62.17n ± 4% 61.44n ± 0% -1.17% (p=0.000 n=15) GCD10x10/WithXY-88 173.4n ± 2% 172.4n ± 4% ~ (p=0.615 n=15) GCD100x100/WithoutXY-88 584.0n ± 1% 582.9n ± 0% ~ (p=0.009 n=15) GCD100x100/WithXY-88 1.098µ ± 1% 1.091µ ± 2% ~ (p=0.002 n=15) GCD1000x1000/WithoutXY-88 6.055µ ± 0% 6.049µ ± 0% ~ (p=0.007 n=15) GCD1000x1000/WithXY-88 9.430µ ± 0% 9.417µ ± 1% ~ (p=0.123 n=15) GCD10000x10000/WithoutXY-88 153.4µ ± 2% 149.0µ ± 2% -2.85% (p=0.000 n=15) GCD10000x10000/WithXY-88 350.6µ ± 3% 349.0µ ± 2% ~ (p=0.126 n=15) Div/20/10-88 13.12n ± 0% 13.12n ± 1% 0.00% (p=0.042 n=15) Div/40/20-88 13.12n ± 0% 13.13n ± 0% ~ (p=0.004 n=15) Div/100/50-88 25.49n ± 0% 25.49n ± 0% ~ (p=0.452 n=15) Div/200/100-88 115.7n ± 2% 113.8n ± 2% ~ (p=0.212 n=15) Div/400/200-88 135.0n ± 1% 136.1n ± 1% ~ (p=0.005 n=15) Div/1000/500-88 257.5n ± 1% 259.9n ± 1% ~ (p=0.004 n=15) Div/2000/1000-88 567.5n ± 1% 572.4n ± 2% ~ (p=0.616 n=15) Div/20000/10000-88 25.65µ ± 0% 25.77µ ± 1% ~ (p=0.032 n=15) Div/200000/100000-88 777.4µ ± 1% 754.3µ ± 1% -2.97% (p=0.000 n=15) Div/2000000/1000000-88 33.66m ± 0% 31.37m ± 0% -6.81% (p=0.000 n=15) Div/20000000/10000000-88 1.320 ± 0% 1.266 ± 0% -4.04% (p=0.000 n=15) NatMul/10-88 151.9n ± 7% 143.3n ± 7% ~ (p=0.878 n=15) NatMul/100-88 4.418µ ± 2% 4.337µ ± 3% ~ (p=0.512 n=15) NatMul/1000-88 206.8µ ± 1% 189.8µ ± 1% -8.25% (p=0.000 n=15) NatMul/10000-88 8.531m ± 1% 8.095m ± 0% -5.12% (p=0.000 n=15) NatMul/100000-88 298.9m ± 0% 260.5m ± 1% -12.85% (p=0.000 n=15) NatSqr/1-88 27.55n ± 6% 28.25n ± 7% ~ (p=0.024 n=15) NatSqr/2-88 44.71n ± 6% 46.21n ± 9% ~ (p=0.024 n=15) NatSqr/3-88 55.44n ± 4% 58.41n ± 10% ~ (p=0.126 n=15) NatSqr/5-88 80.71n ± 5% 81.41n ± 5% ~ (p=0.032 n=15) NatSqr/8-88 115.7n ± 4% 115.4n ± 5% ~ (p=0.814 n=15) NatSqr/10-88 147.4n ± 4% 147.3n ± 4% ~ (p=0.505 n=15) NatSqr/20-88 337.8n ± 3% 337.3n ± 4% ~ (p=0.814 n=15) NatSqr/30-88 556.9n ± 3% 557.6n ± 4% ~ (p=0.814 n=15) NatSqr/50-88 1.208µ ± 4% 1.208µ ± 3% ~ (p=0.910 n=15) NatSqr/80-88 2.591µ ± 3% 2.581µ ± 3% ~ (p=0.705 n=15) NatSqr/100-88 3.870µ ± 3% 3.858µ ± 3% ~ (p=0.846 n=15) NatSqr/200-88 14.43µ ± 3% 14.28µ ± 2% ~ (p=0.383 n=15) NatSqr/300-88 24.68µ ± 2% 24.49µ ± 2% ~ (p=0.624 n=15) NatSqr/500-88 66.27µ ± 1% 66.18µ ± 1% ~ (p=0.735 n=15) NatSqr/800-88 128.7µ ± 1% 127.4µ ± 1% ~ (p=0.050 n=15) NatSqr/1000-88 198.7µ ± 1% 197.7µ ± 1% ~ (p=0.229 n=15) NatSqr/10000-88 6.582m ± 1% 6.426m ± 1% -2.37% (p=0.000 n=15) NatSqr/100000-88 274.3m ± 0% 267.3m ± 0% -2.57% (p=0.000 n=15) geomean 6.518µ 6.438µ -1.22% goos: linux goarch: arm64 pkg: math/big │ old │ new │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-16 61.70n ± 1% 61.32n ± 1% ~ (p=0.361 n=15) GCD10x10/WithXY-16 217.3n ± 1% 217.0n ± 1% ~ (p=0.395 n=15) GCD100x100/WithoutXY-16 569.7n ± 0% 572.6n ± 2% ~ (p=0.213 n=15) GCD100x100/WithXY-16 1.241µ ± 1% 1.236µ ± 1% ~ (p=0.157 n=15) GCD1000x1000/WithoutXY-16 5.558µ ± 0% 5.566µ ± 0% ~ (p=0.228 n=15) GCD1000x1000/WithXY-16 9.319µ ± 0% 9.326µ ± 0% ~ (p=0.233 n=15) GCD10000x10000/WithoutXY-16 126.4µ ± 2% 128.7µ ± 3% ~ (p=0.081 n=15) GCD10000x10000/WithXY-16 279.3µ ± 0% 278.3µ ± 5% ~ (p=0.187 n=15) Div/20/10-16 15.12n ± 1% 15.21n ± 1% ~ (p=0.490 n=15) Div/40/20-16 15.11n ± 0% 15.23n ± 1% ~ (p=0.107 n=15) Div/100/50-16 26.53n ± 0% 26.50n ± 0% ~ (p=0.299 n=15) Div/200/100-16 123.7n ± 0% 124.0n ± 0% ~ (p=0.086 n=15) Div/400/200-16 142.5n ± 0% 142.4n ± 0% ~ (p=0.039 n=15) Div/1000/500-16 259.9n ± 1% 261.2n ± 1% ~ (p=0.044 n=15) Div/2000/1000-16 539.4n ± 1% 532.3n ± 1% -1.32% (p=0.001 n=15) Div/20000/10000-16 22.43µ ± 0% 22.32µ ± 0% -0.49% (p=0.000 n=15) Div/200000/100000-16 898.3µ ± 0% 889.6µ ± 0% -0.96% (p=0.000 n=15) Div/2000000/1000000-16 38.37m ± 0% 35.11m ± 0% -8.49% (p=0.000 n=15) Div/20000000/10000000-16 1.449 ± 0% 1.384 ± 0% -4.48% (p=0.000 n=15) NatMul/10-16 182.0n ± 1% 177.8n ± 1% -2.31% (p=0.000 n=15) NatMul/100-16 5.537µ ± 0% 5.693µ ± 0% +2.82% (p=0.000 n=15) NatMul/1000-16 229.9µ ± 0% 224.8µ ± 0% -2.24% (p=0.000 n=15) NatMul/10000-16 8.985m ± 0% 8.751m ± 0% -2.61% (p=0.000 n=15) NatMul/100000-16 371.1m ± 0% 331.5m ± 0% -10.66% (p=0.000 n=15) NatSqr/1-16 46.77n ± 6% 42.76n ± 1% -8.57% (p=0.000 n=15) NatSqr/2-16 66.99n ± 4% 63.62n ± 1% -5.03% (p=0.000 n=15) NatSqr/3-16 76.79n ± 4% 73.42n ± 1% ~ (p=0.007 n=15) NatSqr/5-16 99.00n ± 3% 95.35n ± 1% -3.69% (p=0.000 n=15) NatSqr/8-16 160.0n ± 3% 155.1n ± 1% -3.06% (p=0.001 n=15) NatSqr/10-16 178.4n ± 2% 175.9n ± 0% -1.40% (p=0.001 n=15) NatSqr/20-16 361.9n ± 2% 361.3n ± 0% ~ (p=0.083 n=15) NatSqr/30-16 584.7n ± 0% 586.8n ± 0% +0.36% (p=0.000 n=15) NatSqr/50-16 1.327µ ± 0% 1.329µ ± 0% ~ (p=0.349 n=15) NatSqr/80-16 2.893µ ± 1% 2.925µ ± 0% +1.11% (p=0.000 n=15) NatSqr/100-16 4.330µ ± 1% 4.381µ ± 0% +1.18% (p=0.000 n=15) NatSqr/200-16 16.25µ ± 1% 16.43µ ± 0% +1.07% (p=0.000 n=15) NatSqr/300-16 27.85µ ± 1% 28.06µ ± 0% +0.77% (p=0.000 n=15) NatSqr/500-16 76.01µ ± 0% 76.34µ ± 0% ~ (p=0.002 n=15) NatSqr/800-16 146.8µ ± 0% 148.1µ ± 0% +0.83% (p=0.000 n=15) NatSqr/1000-16 228.2µ ± 0% 228.6µ ± 0% ~ (p=0.123 n=15) NatSqr/10000-16 7.524m ± 0% 7.426m ± 0% -1.31% (p=0.000 n=15) NatSqr/100000-16 316.7m ± 0% 309.2m ± 0% -2.36% (p=0.000 n=15) geomean 7.264µ 7.172µ -1.27% goos: darwin goarch: arm64 pkg: math/big cpu: Apple M3 Pro │ old │ new │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-12 32.61n ± 1% 32.42n ± 1% ~ (p=0.021 n=15) GCD10x10/WithXY-12 87.70n ± 1% 88.42n ± 1% ~ (p=0.010 n=15) GCD100x100/WithoutXY-12 305.9n ± 0% 306.4n ± 0% ~ (p=0.003 n=15) GCD100x100/WithXY-12 560.3n ± 2% 556.6n ± 1% ~ (p=0.018 n=15) GCD1000x1000/WithoutXY-12 3.509µ ± 2% 3.464µ ± 1% ~ (p=0.145 n=15) GCD1000x1000/WithXY-12 5.347µ ± 2% 5.372µ ± 1% ~ (p=0.046 n=15) GCD10000x10000/WithoutXY-12 73.75µ ± 1% 73.99µ ± 1% ~ (p=0.004 n=15) GCD10000x10000/WithXY-12 148.4µ ± 0% 147.8µ ± 1% ~ (p=0.076 n=15) Div/20/10-12 9.481n ± 0% 9.462n ± 1% ~ (p=0.631 n=15) Div/40/20-12 9.457n ± 0% 9.462n ± 1% ~ (p=0.798 n=15) Div/100/50-12 14.91n ± 0% 14.79n ± 1% -0.80% (p=0.000 n=15) Div/200/100-12 84.56n ± 1% 84.60n ± 1% ~ (p=0.271 n=15) Div/400/200-12 103.8n ± 0% 102.8n ± 0% -0.96% (p=0.000 n=15) Div/1000/500-12 181.3n ± 1% 184.2n ± 2% ~ (p=0.091 n=15) Div/2000/1000-12 397.5n ± 0% 397.4n ± 0% ~ (p=0.299 n=15) Div/20000/10000-12 14.04µ ± 1% 13.99µ ± 0% ~ (p=0.221 n=15) Div/200000/100000-12 523.1µ ± 0% 514.0µ ± 3% ~ (p=0.775 n=15) Div/2000000/1000000-12 21.58m ± 0% 20.01m ± 1% -7.29% (p=0.000 n=15) Div/20000000/10000000-12 813.5m ± 0% 796.2m ± 1% -2.13% (p=0.000 n=15) NatMul/10-12 80.46n ± 1% 80.02n ± 1% ~ (p=0.063 n=15) NatMul/100-12 2.904µ ± 0% 2.979µ ± 1% +2.58% (p=0.000 n=15) NatMul/1000-12 127.8µ ± 0% 122.3µ ± 0% -4.28% (p=0.000 n=15) NatMul/10000-12 5.141m ± 0% 4.975m ± 1% -3.23% (p=0.000 n=15) NatMul/100000-12 208.8m ± 0% 189.6m ± 3% -9.21% (p=0.000 n=15) NatSqr/1-12 11.90n ± 1% 11.76n ± 1% ~ (p=0.059 n=15) NatSqr/2-12 21.33n ± 1% 21.12n ± 0% ~ (p=0.063 n=15) NatSqr/3-12 26.05n ± 1% 25.79n ± 0% ~ (p=0.002 n=15) NatSqr/5-12 37.31n ± 0% 36.98n ± 1% ~ (p=0.008 n=15) NatSqr/8-12 63.07n ± 0% 62.75n ± 1% ~ (p=0.061 n=15) NatSqr/10-12 79.48n ± 0% 79.59n ± 0% ~ (p=0.455 n=15) NatSqr/20-12 173.1n ± 0% 173.2n ± 1% ~ (p=0.518 n=15) NatSqr/30-12 288.6n ± 1% 289.2n ± 0% ~ (p=0.030 n=15) NatSqr/50-12 653.3n ± 0% 653.3n ± 0% ~ (p=0.361 n=15) NatSqr/80-12 1.492µ ± 0% 1.496µ ± 0% ~ (p=0.018 n=15) NatSqr/100-12 2.270µ ± 1% 2.270µ ± 0% ~ (p=0.326 n=15) NatSqr/200-12 8.776µ ± 1% 8.784µ ± 1% ~ (p=0.083 n=15) NatSqr/300-12 15.07µ ± 0% 15.09µ ± 0% ~ (p=0.455 n=15) NatSqr/500-12 41.71µ ± 0% 41.77µ ± 1% ~ (p=0.305 n=15) NatSqr/800-12 80.77µ ± 1% 80.59µ ± 0% ~ (p=0.113 n=15) NatSqr/1000-12 126.4µ ± 1% 126.5µ ± 0% ~ (p=0.683 n=15) NatSqr/10000-12 4.204m ± 0% 4.119m ± 0% -2.02% (p=0.000 n=15) NatSqr/100000-12 177.0m ± 0% 172.9m ± 0% -2.31% (p=0.000 n=15) geomean 3.790µ 3.757µ -0.87% Change-Id: Ifc7a9b61f678df216690511ac8bb9143189a795e Reviewed-on: https://go-review.googlesource.com/c/go/+/652057 Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Griesemer <gri@google.com>
2025-02-27	math/big: move multiplication to natmul.go	Russ Cox
	No code changes. This CL moves the multiplication (and squaring) code into natmul.go, in preparation for cleaning up Karatsuba and then adding Toom-Cook and FFT-based multiplication. Change-Id: I7f84328284cc4e1ca4da0ebb9f666a5535e8d7f2 Reviewed-on: https://go-review.googlesource.com/c/go/+/652055 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Alan Donovan <adonovan@google.com>
2025-02-27	math/big: replace nat pool with Word stack	Russ Cox
	In the early days of math/big, algorithms that needed more space grew the result larger than it needed to be and then used the high words as extra space. This made results their own temporary space caches, at the cost that saving a result in a data structure might hold significantly more memory than necessary. Specifically, new(big.Int).Mul(x, y) returned a big.Int with a backing slice 3X as big as it strictly needed to be. If you are storing many multiplication results, or even a single large result, the 3X overhead can add up. This approach to storage for temporaries also requires being able to analyze the algorithms to predict the exact amount they need, which can be difficult. For both these reasons, the implementation of recursive long division, which came later, introduced a “nat pool” where temporaries could be stored and reused, or reclaimed by the GC when no longer used. This avoids the storage and bookkeeping overheads but introduces a per-temporary sync.Pool overhead. divRecursiveStep takes an array of cached temporaries to remove some of that overhead. The nat pool was better but is still not quite right. This CL introduces something even better than the nat pool (still probably not quite right, but the best I can see for now): a sync.Pool holding stacks for allocating temporaries. Now an operation can get one stack out of the pool and then allocate as many temporaries as it needs during the operation, eventually returning the stack back to the pool. The sync.Pool operations are now per-exported-operation (like big.Int.Mul), not per-temporary. This CL converts both the pre-allocation in nat.mul and the uses of the nat pool to use stack pools instead. This simplifies some code and sets us up better for more complex algorithms (such as Toom-Cook or FFT-based multiplication) that need more temporaries. It is also a little bit faster. goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) CPU @ 3.10GHz │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-16 23.68n ± 0% 22.21n ± 0% -6.21% (p=0.000 n=15) Div/40/20-16 23.68n ± 0% 22.21n ± 0% -6.21% (p=0.000 n=15) Div/100/50-16 56.65n ± 0% 55.53n ± 0% -1.98% (p=0.000 n=15) Div/200/100-16 194.6n ± 1% 172.8n ± 0% -11.20% (p=0.000 n=15) Div/400/200-16 232.1n ± 0% 206.7n ± 0% -10.94% (p=0.000 n=15) Div/1000/500-16 405.3n ± 1% 383.8n ± 0% -5.30% (p=0.000 n=15) Div/2000/1000-16 810.4n ± 1% 795.2n ± 0% -1.88% (p=0.000 n=15) Div/20000/10000-16 25.88µ ± 0% 25.39µ ± 0% -1.89% (p=0.000 n=15) Div/200000/100000-16 931.5µ ± 0% 924.3µ ± 0% -0.77% (p=0.000 n=15) Div/2000000/1000000-16 37.77m ± 0% 37.75m ± 0% ~ (p=0.098 n=15) Div/20000000/10000000-16 1.367 ± 0% 1.377 ± 0% +0.72% (p=0.003 n=15) NatMul/10-16 168.5n ± 3% 164.0n ± 4% ~ (p=0.751 n=15) NatMul/100-16 6.086µ ± 3% 5.380µ ± 3% -11.60% (p=0.000 n=15) NatMul/1000-16 238.1µ ± 3% 228.3µ ± 1% -4.12% (p=0.000 n=15) NatMul/10000-16 8.721m ± 2% 8.518m ± 1% -2.33% (p=0.000 n=15) NatMul/100000-16 369.6m ± 0% 371.1m ± 0% +0.42% (p=0.000 n=15) geomean 19.57µ 18.74µ -4.21% │ old │ new │ │ B/op │ B/op vs base │ NatMul/10-16 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-16 4.750Ki ± 0% 1.751Ki ± 0% -63.14% (p=0.000 n=15) NatMul/1000-16 48.16Ki ± 0% 16.02Ki ± 0% -66.73% (p=0.000 n=15) NatMul/10000-16 482.9Ki ± 1% 165.4Ki ± 3% -65.75% (p=0.000 n=15) NatMul/100000-16 5.747Mi ± 7% 4.197Mi ± 0% -26.97% (p=0.000 n=15) geomean 41.42Ki 20.63Ki -50.18% ¹ all samples are equal │ old │ new │ │ allocs/op │ allocs/op vs base │ NatMul/10-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/1000-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/10000-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100000-16 7.000 ± 14% 7.000 ± 14% ~ (p=0.668 n=15) geomean 1.476 1.476 +0.00% ¹ all samples are equal goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-88 15.84n ± 1% 13.12n ± 0% -17.17% (p=0.000 n=15) Div/40/20-88 15.88n ± 1% 13.12n ± 0% -17.38% (p=0.000 n=15) Div/100/50-88 26.42n ± 0% 25.47n ± 0% -3.60% (p=0.000 n=15) Div/200/100-88 132.4n ± 0% 114.9n ± 0% -13.22% (p=0.000 n=15) Div/400/200-88 150.1n ± 0% 135.6n ± 0% -9.66% (p=0.000 n=15) Div/1000/500-88 275.5n ± 0% 264.1n ± 0% -4.14% (p=0.000 n=15) Div/2000/1000-88 586.5n ± 0% 581.1n ± 0% -0.92% (p=0.000 n=15) Div/20000/10000-88 25.87µ ± 0% 25.72µ ± 0% -0.59% (p=0.000 n=15) Div/200000/100000-88 772.2µ ± 0% 779.0µ ± 0% +0.88% (p=0.000 n=15) Div/2000000/1000000-88 33.36m ± 0% 33.63m ± 0% +0.80% (p=0.000 n=15) Div/20000000/10000000-88 1.307 ± 0% 1.320 ± 0% +1.03% (p=0.000 n=15) NatMul/10-88 140.4n ± 0% 148.8n ± 4% +5.98% (p=0.000 n=15) NatMul/100-88 4.663µ ± 1% 4.388µ ± 1% -5.90% (p=0.000 n=15) NatMul/1000-88 207.7µ ± 0% 205.8µ ± 0% -0.89% (p=0.000 n=15) NatMul/10000-88 8.456m ± 0% 8.468m ± 0% +0.14% (p=0.021 n=15) NatMul/100000-88 295.1m ± 0% 297.9m ± 0% +0.94% (p=0.000 n=15) geomean 14.96µ 14.33µ -4.23% │ old │ new │ │ B/op │ B/op vs base │ NatMul/10-88 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-88 4.750Ki ± 0% 1.758Ki ± 0% -62.99% (p=0.000 n=15) NatMul/1000-88 48.44Ki ± 0% 16.08Ki ± 0% -66.80% (p=0.000 n=15) NatMul/10000-88 489.7Ki ± 1% 166.1Ki ± 3% -66.08% (p=0.000 n=15) NatMul/100000-88 5.546Mi ± 0% 3.819Mi ± 60% -31.15% (p=0.000 n=15) geomean 41.29Ki 20.30Ki -50.85% ¹ all samples are equal │ old │ new │ │ allocs/op │ allocs/op vs base │ NatMul/10-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/1000-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/10000-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100000-88 5.000 ± 20% 6.000 ± 67% ~ (p=0.672 n=15) geomean 1.380 1.431 +3.71% ¹ all samples are equal goos: linux goarch: arm64 pkg: math/big │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-16 15.85n ± 0% 15.23n ± 0% -3.91% (p=0.000 n=15) Div/40/20-16 15.88n ± 0% 15.22n ± 0% -4.16% (p=0.000 n=15) Div/100/50-16 29.69n ± 0% 26.39n ± 0% -11.11% (p=0.000 n=15) Div/200/100-16 149.2n ± 0% 123.3n ± 0% -17.36% (p=0.000 n=15) Div/400/200-16 160.3n ± 0% 139.2n ± 0% -13.16% (p=0.000 n=15) Div/1000/500-16 271.0n ± 0% 256.1n ± 0% -5.50% (p=0.000 n=15) Div/2000/1000-16 545.3n ± 0% 527.0n ± 0% -3.36% (p=0.000 n=15) Div/20000/10000-16 22.60µ ± 0% 22.20µ ± 0% -1.77% (p=0.000 n=15) Div/200000/100000-16 889.0µ ± 0% 892.2µ ± 0% +0.35% (p=0.000 n=15) Div/2000000/1000000-16 38.01m ± 0% 38.12m ± 0% +0.30% (p=0.000 n=15) Div/20000000/10000000-16 1.437 ± 0% 1.444 ± 0% +0.50% (p=0.000 n=15) NatMul/10-16 166.4n ± 2% 169.5n ± 1% +1.86% (p=0.000 n=15) NatMul/100-16 5.733µ ± 1% 5.570µ ± 1% -2.84% (p=0.000 n=15) NatMul/1000-16 232.6µ ± 1% 229.8µ ± 0% -1.22% (p=0.000 n=15) NatMul/10000-16 9.039m ± 1% 8.969m ± 0% -0.77% (p=0.000 n=15) NatMul/100000-16 367.0m ± 0% 368.8m ± 0% +0.48% (p=0.000 n=15) geomean 16.15µ 15.50µ -4.01% │ old │ new │ │ B/op │ B/op vs base │ NatMul/10-16 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-16 4.750Ki ± 0% 1.751Ki ± 0% -63.14% (p=0.000 n=15) NatMul/1000-16 48.33Ki ± 0% 16.02Ki ± 0% -66.85% (p=0.000 n=15) NatMul/10000-16 536.5Ki ± 1% 165.7Ki ± 3% -69.12% (p=0.000 n=15) NatMul/100000-16 6.078Mi ± 6% 4.197Mi ± 0% -30.94% (p=0.000 n=15) geomean 42.81Ki 20.64Ki -51.78% ¹ all samples are equal │ old │ new │ │ allocs/op │ allocs/op vs base │ NatMul/10-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/1000-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/10000-16 2.000 ± 50% 1.000 ± 0% -50.00% (p=0.001 n=15) NatMul/100000-16 9.000 ± 11% 8.000 ± 12% -11.11% (p=0.001 n=15) geomean 1.783 1.516 -14.97% ¹ all samples are equal goos: darwin goarch: arm64 pkg: math/big cpu: Apple M3 Pro │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-12 9.850n ± 1% 9.405n ± 1% -4.52% (p=0.000 n=15) Div/40/20-12 9.858n ± 0% 9.403n ± 1% -4.62% (p=0.000 n=15) Div/100/50-12 16.40n ± 1% 14.81n ± 0% -9.70% (p=0.000 n=15) Div/200/100-12 88.48n ± 2% 80.88n ± 0% -8.59% (p=0.000 n=15) Div/400/200-12 107.90n ± 1% 99.28n ± 1% -7.99% (p=0.000 n=15) Div/1000/500-12 188.8n ± 1% 178.6n ± 1% -5.40% (p=0.000 n=15) Div/2000/1000-12 399.9n ± 0% 389.1n ± 0% -2.70% (p=0.000 n=15) Div/20000/10000-12 13.94µ ± 2% 13.81µ ± 1% ~ (p=0.574 n=15) Div/200000/100000-12 523.8µ ± 0% 521.7µ ± 0% -0.40% (p=0.000 n=15) Div/2000000/1000000-12 21.46m ± 0% 21.48m ± 0% ~ (p=0.067 n=15) Div/20000000/10000000-12 812.5m ± 0% 812.9m ± 0% ~ (p=0.061 n=15) NatMul/10-12 77.14n ± 0% 78.35n ± 1% +1.57% (p=0.000 n=15) NatMul/100-12 2.999µ ± 0% 2.871µ ± 1% -4.27% (p=0.000 n=15) NatMul/1000-12 126.2µ ± 0% 126.8µ ± 0% +0.51% (p=0.011 n=15) NatMul/10000-12 5.099m ± 0% 5.125m ± 0% +0.51% (p=0.000 n=15) NatMul/100000-12 206.7m ± 0% 208.4m ± 0% +0.80% (p=0.000 n=15) geomean 9.512µ 9.236µ -2.91% │ old │ new │ │ B/op │ B/op vs base │ NatMul/10-12 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-12 4.750Ki ± 0% 1.750Ki ± 0% -63.16% (p=0.000 n=15) NatMul/1000-12 48.13Ki ± 0% 16.01Ki ± 0% -66.73% (p=0.000 n=15) NatMul/10000-12 483.5Ki ± 1% 163.2Ki ± 2% -66.24% (p=0.000 n=15) NatMul/100000-12 5.480Mi ± 4% 1.532Mi ± 104% -72.05% (p=0.000 n=15) geomean 41.03Ki 16.82Ki -59.01% ¹ all samples are equal │ old │ new │ │ allocs/op │ allocs/op vs base │ NatMul/10-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/1000-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/10000-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100000-12 5.000 ± 0% 1.000 ± 400% -80.00% (p=0.007 n=15) geomean 1.380 1.000 -27.52% ¹ all samples are equal Change-Id: I7efa6fe37971ed26ae120a32250fcb47ece0a011 Reviewed-on: https://go-review.googlesource.com/c/go/+/650638 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Alan Donovan <adonovan@google.com>
2024-11-20	internal/byteorder: use canonical Go casing in names	Russ Cox
	If Be and Le stand for big-endian and little-endian, then they should be BE and LE. Change-Id: I723e3962b8918da84791783d3c547638f1c9e8a9 Reviewed-on: https://go-review.googlesource.com/c/go/+/627376 Reviewed-by: Robert Griesemer <gri@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-13	math/rand/v2, math/big: use internal/byteorder	Mateusz Poliwczak
	Change-Id: Id07f16d14133ee539bc2880b39641c42418fa6e2 GitHub-Last-Rev: 7b327d508f677f2476d24f046d25921f4599dd9a GitHub-Pull-Request: golang/go#67319 Reviewed-on: https://go-review.googlesource.com/c/go/+/585016 Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
2024-03-14	math/big: use built-in clear to simplify code	apocelipes
	Change-Id: I07c3a498ce1e462c3d1703d77e7d7824e9334651 GitHub-Last-Rev: 2ba8c4c705eaeb0772109ece7296978b62467eb3 GitHub-Pull-Request: golang/go#66312 Reviewed-on: https://go-review.googlesource.com/c/go/+/571636 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-01-09	math/big: fix uint64 overflow in nat.mulRange	Robert Griesemer
	Compute median as a + (b-a)/2 instead of (a + b)/2. Add additional test cases. Fixes #65025. Change-Id: Ib716a1036c17f8f33f51e33cedab13512eb7e0be Reviewed-on: https://go-review.googlesource.com/c/go/+/554617 Reviewed-by: Robert Griesemer <gri@google.com> Auto-Submit: Robert Griesemer <gri@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Robert Griesemer <gri@google.com> Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2023-08-17	math/big, math/rand: use the built-in max function	chanxuehong
	Change-Id: I71a38dd20bfaf2b1aed18892d54eeb017d3d7d66 GitHub-Last-Rev: 8da43b2cbd563ed123690709e519c9f84272b332 GitHub-Pull-Request: golang/go#61955 Reviewed-on: https://go-review.googlesource.com/c/go/+/518595 Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: qiulaidongfeng <2645477756@qq.com>
2022-12-02	math/big: fix BitLen performance regression	Filippo Valsorda
	CL 450055 replaced BitLen with a slower constant-time implementation, which caused a performance regression in some ecosystem benchmarks. https://perf.golang.org/search?q=upload%3A20221130.13+pkg%3Agithub.com%2Fericlagergren%2Fdecimal%2Fbenchmarks Current tip vs this CL name old time/op new time/op delta Pi/foo=ericlagergren_(Go)/prec=100-4 151µs ± 0% 129µs ± 0% -14.89% (p=0.000 n=10+9) Pi/foo=ericlagergren_(GDA)/prec=100-4 305µs ± 0% 269µs ± 1% -11.88% (p=0.000 n=9+10) Pi/foo=cockroachdb/apd/prec=100-4 5.74ms ± 0% 5.33ms ± 0% -7.02% (p=0.000 n=9+10) Pi/foo=shopspring/prec=100-4 265µs ±16% 268µs ±11% ~ (p=0.796 n=10+10) Pi/foo=apmckinlay/prec=100-4 3.10µs ± 0% 3.08µs ± 0% -0.60% (p=0.000 n=8+10) Pi/foo=go-inf/prec=100-4 132µs ± 9% 137µs ± 9% ~ (p=0.182 n=10+9) Pi/foo=float64/prec=100-4 4.97µs ± 0% 4.98µs ± 0% ~ (p=0.196 n=10+10) CL 450055's parent vs this CL name old time/op new time/op delta Pi/foo=ericlagergren_(Go)/prec=100-4 129µs ± 1% 129µs ± 0% ~ (p=0.182 n=10+9) Pi/foo=ericlagergren_(GDA)/prec=100-4 267µs ± 1% 269µs ± 1% +0.93% (p=0.001 n=9+10) Pi/foo=shopspring/prec=100-4 252µs ± 9% 268µs ±11% ~ (p=0.052 n=10+10) Pi/foo=apmckinlay/prec=100-4 3.10µs ± 1% 3.08µs ± 0% -0.66% (p=0.000 n=9+10) Pi/foo=go-inf/prec=100-4 135µs ± 6% 137µs ± 9% ~ (p=0.605 n=9+9) Pi/foo=float64/prec=100-4 4.97µs ± 0% 4.98µs ± 0% +0.23% (p=0.005 n=8+10) goos: linux goarch: amd64 pkg: github.com/ericlagergren/decimal_benchmarks cpu: Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz Fixes #57014 Change-Id: I08478bea122212320a592ad2652e33077807de09 Reviewed-on: https://go-review.googlesource.com/c/go/+/454617 Reviewed-by: Roland Shoemaker <roland@golang.org> Run-TryBot: Filippo Valsorda <filippo@golang.org> Reviewed-by: Russ Cox <rsc@golang.org> Auto-Submit: Filippo Valsorda <filippo@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-11-21	crypto/internal/bigmod: move nat implementation out of crypto/rsa	Filippo Valsorda
	This will let us reuse it in crypto/ecdsa for the NIST scalar fields. The main change in API is around encoding and decoding. The SetBytes + ExpandFor sequence was hacky: SetBytes could produce a bigger size than the modulus if leading zeroes in the top byte overflowed the limb boundary, so ExpandFor had to check for and tolerate that. Also, the caller was responsible for checking that the overflow was actually all zeroes (which we weren't doing, exposing a crasher in decryption and signature verification) and then for checking that the result was less than the modulus. Instead, make SetBytes take a modulus and return an error if the value overflows. Same with Bytes: we were always allocating based on Size before FillBytes anyway, so now Bytes takes a modulus. Finally, SetBig was almost only used for moduli, so replaced NewModulusFromNat and SetBig with NewModulusFromBig. Moved the constant-time bitLen to math/big.Int.BitLen. It's slower, but BitLen is primarily used in cryptographic code, so it's safer this way. Change-Id: Ibaf7f36d80695578cb80484167d82ce1aa83832f Reviewed-on: https://go-review.googlesource.com/c/go/+/450055 Auto-Submit: Filippo Valsorda <filippo@golang.org> Run-TryBot: Filippo Valsorda <filippo@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Roland Shoemaker <roland@golang.org>
2022-11-02	math/big: use Montgomery for z.Exp(x, y, m) even for even m	Russ Cox
	Montgomery multiplication can be used for Exp mod even m by splitting it into two steps - Exp mod an odd number and Exp mod a power of two - and then combining the two results using the Chinese Remainder Theorem. For more details, see Ç. K. Koç, “Montgomery Reduction with Even Modulus”, IEE Proceedings: Computers and Digital Techniques, 141(5) 314-316, September 1994. http://www.people.vcu.edu/~jwang3/CMSC691/j34monex.pdf Thanks to Guido Vranken for suggesting that we use a faster algorithm. name old time/op new time/op delta ExpMont/Odd-16 240µs ± 2% 239µs ± 2% ~ (p=0.853 n=10+10) ExpMont/Even1-16 757µs ± 3% 249µs ± 6% -67.17% (p=0.000 n=10+10) ExpMont/Even2-16 755µs ± 1% 244µs ± 4% -67.64% (p=0.000 n=8+10) ExpMont/Even3-16 771µs ± 3% 240µs ± 2% -68.90% (p=0.000 n=10+10) ExpMont/Even4-16 775µs ± 3% 241µs ± 2% -68.91% (p=0.000 n=10+10) ExpMont/Even8-16 779µs ± 2% 241µs ± 3% -69.06% (p=0.000 n=9+10) ExpMont/Even32-16 778µs ± 3% 240µs ± 4% -69.11% (p=0.000 n=9+9) ExpMont/Even64-16 774µs ± 6% 186µs ± 2% -76.00% (p=0.000 n=10+10) ExpMont/Even96-16 776µs ± 4% 186µs ± 6% -76.09% (p=0.000 n=9+10) ExpMont/Even128-16 764µs ± 2% 143µs ± 3% -81.24% (p=0.000 n=10+10) ExpMont/Even255-16 761µs ± 3% 109µs ± 2% -85.73% (p=0.000 n=10+10) ExpMont/SmallEven1-16 45.6µs ± 1% 36.3µs ± 3% -20.49% (p=0.000 n=9+10) ExpMont/SmallEven2-16 44.3µs ± 2% 37.5µs ± 2% -15.26% (p=0.000 n=9+10) ExpMont/SmallEven3-16 44.1µs ± 5% 37.3µs ± 3% -15.32% (p=0.000 n=9+10) ExpMont/SmallEven4-16 47.1µs ± 6% 38.0µs ± 5% -19.40% (p=0.000 n=10+9) name old alloc/op new alloc/op delta ExpMont/Odd-16 2.53kB ± 0% 2.53kB ± 0% ~ (p=0.137 n=8+10) ExpMont/Even1-16 2.57kB ± 0% 3.31kB ± 0% +28.90% (p=0.000 n=8+10) ExpMont/Even2-16 2.57kB ± 0% 3.37kB ± 0% +31.09% (p=0.000 n=9+10) ExpMont/Even3-16 2.57kB ± 0% 3.37kB ± 0% +31.08% (p=0.000 n=8+8) ExpMont/Even4-16 2.57kB ± 0% 3.37kB ± 0% +31.09% (p=0.000 n=9+10) ExpMont/Even8-16 2.57kB ± 0% 3.37kB ± 0% +31.09% (p=0.000 n=9+10) ExpMont/Even32-16 2.57kB ± 0% 3.37kB ± 0% +31.14% (p=0.000 n=10+10) ExpMont/Even64-16 2.57kB ± 0% 3.16kB ± 0% +22.99% (p=0.000 n=9+9) ExpMont/Even96-16 2.57kB ± 0% 3.44kB ± 0% +33.90% (p=0.000 n=10+8) ExpMont/Even128-16 2.57kB ± 0% 2.88kB ± 0% +12.10% (p=0.000 n=10+10) ExpMont/Even255-16 2.57kB ± 0% 2.54kB ± 0% -1.30% (p=0.000 n=9+10) ExpMont/SmallEven1-16 872B ± 0% 1232B ± 0% +41.28% (p=0.000 n=10+10) ExpMont/SmallEven2-16 872B ± 0% 1233B ± 0% +41.40% (p=0.000 n=10+9) ExpMont/SmallEven3-16 872B ± 0% 1289B ± 0% +47.82% (p=0.000 n=10+10) ExpMont/SmallEven4-16 872B ± 0% 1241B ± 0% +42.32% (p=0.000 n=10+10) name old allocs/op new allocs/op delta ExpMont/Odd-16 21.0 ± 0% 21.0 ± 0% ~ (all equal) ExpMont/Even1-16 24.0 ± 0% 38.0 ± 0% +58.33% (p=0.000 n=10+10) ExpMont/Even2-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10) ExpMont/Even3-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10) ExpMont/Even4-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10) ExpMont/Even8-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10) ExpMont/Even32-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10) ExpMont/Even64-16 24.0 ± 0% 39.0 ± 0% +62.50% (p=0.000 n=10+10) ExpMont/Even96-16 24.0 ± 0% 42.0 ± 0% +75.00% (p=0.000 n=10+10) ExpMont/Even128-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10) ExpMont/Even255-16 24.0 ± 0% 38.0 ± 0% +58.33% (p=0.000 n=10+10) ExpMont/SmallEven1-16 16.0 ± 0% 35.0 ± 0% +118.75% (p=0.000 n=10+10) ExpMont/SmallEven2-16 16.0 ± 0% 35.0 ± 0% +118.75% (p=0.000 n=10+10) ExpMont/SmallEven3-16 16.0 ± 0% 37.0 ± 0% +131.25% (p=0.000 n=10+10) ExpMont/SmallEven4-16 16.0 ± 0% 36.0 ± 0% +125.00% (p=0.000 n=10+10) Change-Id: Ib7f70290f8f087b78805ec3120baf17dd7737ac3 Reviewed-on: https://go-review.googlesource.com/c/go/+/420897 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Roland Shoemaker <roland@golang.org> Auto-Submit: Russ Cox <rsc@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-11	all: gofmt main repo	Russ Cox
	[This CL is part of a sequence implementing the proposal #51082. The design doc is at https://go.dev/s/godocfmt-design.] Run the updated gofmt, which reformats doc comments, on the main repository. Vendored files are excluded. For #51082. Change-Id: I7332f099b60f716295fb34719c98c04eb1a85407 Reviewed-on: https://go-review.googlesource.com/c/go/+/384268 Reviewed-by: Jonathan Amsterdam <jba@google.com> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2022-04-01	all: remove trailing blank doc comment lines	Russ Cox
	A future change to gofmt will rewrite // Doc comment. // func f() to // Doc comment. func f() Apply that change preemptively to all doc comments. For #51082. Change-Id: I4023e16cfb0729b64a8590f071cd92f17343081d Reviewed-on: https://go-review.googlesource.com/c/go/+/384259 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-01	all: fix various doc comment formatting nits	Russ Cox
	A run of lines that are indented with any number of spaces or tabs format as a <pre> block. This commit fixes various doc comments that format badly according to that (standard) rule. For example, consider: // - List item. // Second line. // - Another item. Because the - lines are unindented, this is actually two paragraphs separated by a one-line <pre> block. This CL rewrites it to: // - List item. // Second line. // - Another item. Today, that will format as a single <pre> block. In a future release, we hope to format it as a bulleted list. Various other minor fixes as well, all in preparation for reformatting. For #51082. Change-Id: I95cf06040d4186830e571cd50148be3bf8daf189 Reviewed-on: https://go-review.googlesource.com/c/go/+/384257 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-03-06	all: fix some typos	Dan Kortschak
	Change-Id: I7dfae0fc91c2d70873ec7ec920be7c0a4888153a Reviewed-on: https://go-review.googlesource.com/c/go/+/390175 Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Trust: Daniel Martí <mvdan@mvdan.cc>
2021-05-26	math/big: move division into natdiv.go	Russ Cox
	Code moved and functions reordered to be in a consistent top-down dependency order, but otherwise unchanged. First step toward commenting division algorithms. Change-Id: Ib5e604fb5b2867edff3a228ba4e57b5cb32c4137 Reviewed-on: https://go-review.googlesource.com/c/go/+/321077 Trust: Russ Cox <rsc@golang.org> Trust: Katie Hockman <katie@golang.org> Trust: Robert Griesemer <gri@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Katie Hockman <katie@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2021-02-03	math/big: fix comment in divRecursiveStep	Katie Hockman
	There appears to be a typo in the description of the recursive division algorithm. Two things seem suspicious with the original comment: 1. It is talking about choosing s, but s doesn't appear anywhere in the equation. 2. The math in the equation is incorrect. Where B = len(v)/2 s = B - 1 Proof that it is incorrect: len(v) - B >= B + 1 len(v) - len(v)/2 >= len(v)/2 + 1 This doesn't hold if len(v) is even, e.g. 10: 10 - 10/2 >= 10/2 + 1 10 - 5 >= 5 + 1 5 >= 6 // this is false The new equation will be the following, which will be mathematically correct: len(v) - s >= B + 1 len(v) - (len(v)/2 - 1) >= len(v)/2 + 1 len(v) - len(v)/2 + 1 >= len(v)/2 + 1 len(v) - len(v)/2 >= len(v)/2 This holds if len(v) is even or odd. e.g. 10 10 - 10/2 >= 10/2 10 - 5 >= 5 5 >= 5 e.g. 11 11 - 11/2 >= 11/2 11 - 5 >= 5 6 >= 5 Change-Id: If77ce09286cf7038637b5dfd0fb7d4f828023f56 Reviewed-on: https://go-review.googlesource.com/c/go/+/287372 Run-TryBot: Katie Hockman <katie@golang.org> Reviewed-by: Filippo Valsorda <filippo@golang.org> Trust: Katie Hockman <katie@golang.org>
2020-11-12	math/big: fix shift for recursive division	Katie Hockman
	The previous s value could cause a crash for certain inputs. Will check in tests and documentation improvements later. Thanks to the Go Ethereum team and the OSS-Fuzz project for reporting this. Thanks to Rémy Oudompheng and Robert Griesemer for their help developing and validating the fix. Fixes CVE-2020-28362 Change-Id: Ibbf455c4436bcdb07c84a34fa6551fb3422356d3 Reviewed-on: https://team-review.git.corp.google.com/c/golang/go-private/+/899974 Reviewed-by: Roland Shoemaker <bracewell@google.com> Reviewed-by: Filippo Valsorda <valsorda@google.com> Reviewed-on: https://go-review.googlesource.com/c/go/+/269657 Trust: Katie Hockman <katie@golang.org> Trust: Roland Shoemaker <roland@golang.org> Run-TryBot: Katie Hockman <katie@golang.org> Reviewed-by: Roland Shoemaker <roland@golang.org> TryBot-Result: Go Bot <gobot@golang.org>
2020-09-23	math/big: replace division with multiplication by reciprocal word	SparrowLii
	Division is much slower than multiplication. And the method of using multiplication by multiplying reciprocal and replacing division with it can increase the speed of divWVW algorithm by three times,and at the same time increase the speed of nats division. The benchmark test on arm64 is as follows: name old time/op new time/op delta DivWVW/1-4 13.1ns ± 4% 13.3ns ± 4% ~ (p=0.444 n=5+5) DivWVW/2-4 48.6ns ± 1% 51.2ns ± 2% +5.39% (p=0.008 n=5+5) DivWVW/3-4 82.0ns ± 1% 69.7ns ± 1% -15.03% (p=0.008 n=5+5) DivWVW/4-4 116ns ± 1% 71ns ± 2% -38.88% (p=0.008 n=5+5) DivWVW/5-4 152ns ± 1% 84ns ± 4% -44.70% (p=0.008 n=5+5) DivWVW/10-4 319ns ± 1% 155ns ± 4% -51.50% (p=0.008 n=5+5) DivWVW/100-4 3.44µs ± 3% 1.30µs ± 8% -62.30% (p=0.008 n=5+5) DivWVW/1000-4 33.8µs ± 0% 10.9µs ± 1% -67.74% (p=0.008 n=5+5) DivWVW/10000-4 343µs ± 4% 111µs ± 5% -67.63% (p=0.008 n=5+5) DivWVW/100000-4 3.35ms ± 1% 1.25ms ± 3% -62.79% (p=0.008 n=5+5) QuoRem-4 3.08µs ± 2% 2.21µs ± 4% -28.40% (p=0.008 n=5+5) ModSqrt225_Tonelli-4 444µs ± 2% 457µs ± 3% ~ (p=0.095 n=5+5) ModSqrt225_3Mod4-4 136µs ± 1% 138µs ± 3% ~ (p=0.151 n=5+5) ModSqrt231_Tonelli-4 473µs ± 3% 483µs ± 4% ~ (p=0.548 n=5+5) ModSqrt231_5Mod8-4 164µs ± 9% 169µs ±12% ~ (p=0.421 n=5+5) Sqrt-4 36.8µs ± 1% 28.6µs ± 0% -22.17% (p=0.016 n=5+4) Div/20/10-4 50.0ns ± 3% 51.3ns ± 6% ~ (p=0.238 n=5+5) Div/40/20-4 49.8ns ± 2% 51.3ns ± 6% ~ (p=0.222 n=5+5) Div/100/50-4 85.8ns ± 4% 86.5ns ± 5% ~ (p=0.246 n=5+5) Div/200/100-4 335ns ± 3% 296ns ± 2% -11.60% (p=0.008 n=5+5) Div/400/200-4 442ns ± 2% 359ns ± 5% -18.81% (p=0.008 n=5+5) Div/1000/500-4 858ns ± 3% 643ns ± 6% -25.06% (p=0.008 n=5+5) Div/2000/1000-4 1.70µs ± 3% 1.28µs ± 4% -24.80% (p=0.008 n=5+5) Div/20000/10000-4 45.0µs ± 5% 41.8µs ± 4% -7.17% (p=0.016 n=5+5) Div/200000/100000-4 1.51ms ± 7% 1.43ms ± 3% -5.42% (p=0.016 n=5+5) Div/2000000/1000000-4 57.6ms ± 4% 57.5ms ± 3% ~ (p=1.000 n=5+5) Div/20000000/10000000-4 2.08s ± 3% 2.04s ± 1% ~ (p=0.095 n=5+5) name old speed new speed delta DivWVW/1-4 4.87GB/s ± 4% 4.80GB/s ± 4% ~ (p=0.310 n=5+5) DivWVW/2-4 2.63GB/s ± 1% 2.50GB/s ± 2% -5.07% (p=0.008 n=5+5) DivWVW/3-4 2.34GB/s ± 1% 2.76GB/s ± 1% +17.70% (p=0.008 n=5+5) DivWVW/4-4 2.21GB/s ± 1% 3.61GB/s ± 2% +63.42% (p=0.008 n=5+5) DivWVW/5-4 2.10GB/s ± 2% 3.81GB/s ± 4% +80.89% (p=0.008 n=5+5) DivWVW/10-4 2.01GB/s ± 0% 4.13GB/s ± 4% +105.91% (p=0.008 n=5+5) DivWVW/100-4 1.86GB/s ± 2% 4.95GB/s ± 7% +165.63% (p=0.008 n=5+5) DivWVW/1000-4 1.89GB/s ± 0% 5.86GB/s ± 1% +209.96% (p=0.008 n=5+5) DivWVW/10000-4 1.87GB/s ± 4% 5.76GB/s ± 5% +208.96% (p=0.008 n=5+5) DivWVW/100000-4 1.91GB/s ± 1% 5.14GB/s ± 3% +168.85% (p=0.008 n=5+5) Change-Id: I049f1196562b20800e6ef8a6493fd147f93ad830 Reviewed-on: https://go-review.googlesource.com/c/go/+/250417 Trust: Giovanni Bajo <rasky@develer.com> Trust: Keith Randall <khr@golang.org> Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-05-05	math/big: add (*Int).FillBytes	Filippo Valsorda
	Replaced almost every use of Bytes with FillBytes. Note that the approved proposal was for func (Int) FillBytes(buf []byte) while this implements func (Int) FillBytes(buf []byte) []byte because the latter was far nicer to use in all callsites. Fixes #35833 Change-Id: Ia912df123e5d79b763845312ea3d9a8051343c0a Reviewed-on: https://go-review.googlesource.com/c/go/+/230397 Reviewed-by: Robert Griesemer <gri@golang.org>
2020-04-08	math/big: correct off-by-one access in divBasic	Rémy Oudompheng
	The divBasic function computes the quotient of big nats u/v word by word. It estimates each word qhat by performing a long division (top 2 words of u divided by top word of v), looks at the next word to correct the estimate, then perform a full multiplication (qhatv) to catch any inaccuracy in the estimate. In the latter case, "negative" values appear temporarily and carries must be carefully managed, and the recursive division refactoring introduced a case where qhatv has the same length as v, triggering an out-of-bounds write in the case it happens when computing the top word of the quotient. Fixes #37499 Change-Id: I15089da4a4027beda43af497bf6de261eb792f94 Reviewed-on: https://go-review.googlesource.com/c/go/+/221980 Reviewed-by: Robert Griesemer <gri@golang.org>
2019-11-15	all: fix a bunch of misspellings	Ville Skyttä
	Change-Id: I5b909df0fd048cd66c5a27fca1b06466d3bcaac7 GitHub-Last-Rev: 778c5d21311abee09a5fbda2e4005a5fd4cc3f9f GitHub-Pull-Request: golang/go#35624 Reviewed-on: https://go-review.googlesource.com/c/go/+/207421 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-11-13	math/big: fix out-of-bounds panic in divRecursive	Rémy Oudompheng
	The bounds in the last carry branch were wrong as there is no reason for len(u) >= n+n/2 to always hold true. We also adjust test to avoid using a remainder of 1 (in which case, the last step of the algorithm computes (qhatv+1) - qhatv which rarely produces a carry). Change-Id: I69fbab9c5e19d0db1c087fbfcd5b89352c2d26fb Reviewed-on: https://go-review.googlesource.com/c/go/+/206839 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2019-11-12	math/big: implement recursive algorithm for division	Rémy Oudompheng
	The current division algorithm produces one word of result at a time, using 2-word division to compute the top word and mulAddVWW to compute the remainder. The top word may need to be adjusted by 1 or 2 units. The recursive version, based on Burnikel, Ziegler, "Fast Recursive Division", uses the same principles, but in a multi-word setting, so that multiplication benefits from the Karatsuba algorithm (and possibly later improvements). benchmark old ns/op new ns/op delta BenchmarkDiv/20/10-4 38.2 38.3 +0.26% BenchmarkDiv/40/20-4 38.7 38.5 -0.52% BenchmarkDiv/100/50-4 62.5 62.6 +0.16% BenchmarkDiv/200/100-4 238 259 +8.82% BenchmarkDiv/400/200-4 311 338 +8.68% BenchmarkDiv/1000/500-4 604 649 +7.45% BenchmarkDiv/2000/1000-4 1214 1278 +5.27% BenchmarkDiv/20000/10000-4 38279 36510 -4.62% BenchmarkDiv/200000/100000-4 3022057 1359615 -55.01% BenchmarkDiv/2000000/1000000-4 310827664 54012939 -82.62% BenchmarkDiv/20000000/10000000-4 33272829421 1965401359 -94.09% BenchmarkString/10/Base10-4 158 156 -1.27% BenchmarkString/100/Base10-4 797 792 -0.63% BenchmarkString/1000/Base10-4 3677 3814 +3.73% BenchmarkString/10000/Base10-4 16633 17116 +2.90% BenchmarkString/100000/Base10-4 5779029 1793808 -68.96% BenchmarkString/1000000/Base10-4 889840820 85524031 -90.39% BenchmarkString/10000000/Base10-4 134338236860 4935657026 -96.33% Fixes #21960 Updates #30943 Change-Id: I134c6f81a47870c688ca95b6081eb9211def15a2 Reviewed-on: https://go-review.googlesource.com/c/go/+/172018 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-10-25	math/big: use nat pool to reduce allocations in mul and sqr	Rémy Oudompheng
	This notably allows to reuse temporaries across the karatsubaSqr recursion. benchmark old ns/op new ns/op delta BenchmarkNatMul/10-4 227 228 +0.44% BenchmarkNatMul/100-4 8339 8589 +3.00% BenchmarkNatMul/1000-4 313796 312272 -0.49% BenchmarkNatMul/10000-4 11924720 11873589 -0.43% BenchmarkNatMul/100000-4 503813354 503839058 +0.01% BenchmarkNatSqr/20-4 549 513 -6.56% BenchmarkNatSqr/30-4 945 874 -7.51% BenchmarkNatSqr/50-4 1993 1832 -8.08% BenchmarkNatSqr/80-4 4096 3874 -5.42% BenchmarkNatSqr/100-4 6192 5712 -7.75% BenchmarkNatSqr/200-4 20388 19543 -4.14% BenchmarkNatSqr/300-4 38735 36715 -5.21% BenchmarkNatSqr/500-4 99562 93542 -6.05% BenchmarkNatSqr/800-4 195554 184907 -5.44% BenchmarkNatSqr/1000-4 286302 275053 -3.93% BenchmarkNatSqr/10000-4 9817057 9441641 -3.82% BenchmarkNatSqr/100000-4 390713416 379696789 -2.82% benchmark old allocs new allocs delta BenchmarkNatMul/10-4 1 1 +0.00% BenchmarkNatMul/100-4 1 1 +0.00% BenchmarkNatMul/1000-4 2 1 -50.00% BenchmarkNatMul/10000-4 2 1 -50.00% BenchmarkNatMul/100000-4 9 11 +22.22% BenchmarkNatSqr/20-4 2 1 -50.00% BenchmarkNatSqr/30-4 2 1 -50.00% BenchmarkNatSqr/50-4 2 1 -50.00% BenchmarkNatSqr/80-4 2 1 -50.00% BenchmarkNatSqr/100-4 2 1 -50.00% BenchmarkNatSqr/200-4 2 1 -50.00% BenchmarkNatSqr/300-4 4 1 -75.00% BenchmarkNatSqr/500-4 4 1 -75.00% BenchmarkNatSqr/800-4 10 1 -90.00% BenchmarkNatSqr/1000-4 10 1 -90.00% BenchmarkNatSqr/10000-4 731 1 -99.86% BenchmarkNatSqr/100000-4 19687 6 -99.97% benchmark old bytes new bytes delta BenchmarkNatMul/10-4 192 192 +0.00% BenchmarkNatMul/100-4 4864 4864 +0.00% BenchmarkNatMul/1000-4 57344 49224 -14.16% BenchmarkNatMul/10000-4 565248 498772 -11.76% BenchmarkNatMul/100000-4 5749504 7263720 +26.34% BenchmarkNatSqr/20-4 672 352 -47.62% BenchmarkNatSqr/30-4 992 512 -48.39% BenchmarkNatSqr/50-4 1792 896 -50.00% BenchmarkNatSqr/80-4 2688 1408 -47.62% BenchmarkNatSqr/100-4 3584 1792 -50.00% BenchmarkNatSqr/200-4 6656 3456 -48.08% BenchmarkNatSqr/300-4 24448 16387 -32.97% BenchmarkNatSqr/500-4 36864 24591 -33.29% BenchmarkNatSqr/800-4 69760 40981 -41.25% BenchmarkNatSqr/1000-4 86016 49180 -42.82% BenchmarkNatSqr/10000-4 2524800 487368 -80.70% BenchmarkNatSqr/100000-4 68599808 5876581 -91.43% Change-Id: I8e6e409ae1cb48be9d5aa9b5f428d6cbe487673a Reviewed-on: https://go-review.googlesource.com/c/go/+/172017 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2019-03-25	math/big: accept non-decimal floats with Rat.SetString	Robert Griesemer
	This fixes an old oversight. Rat.SetString already permitted fractions a/b where both a and b could independently specify a base prefix. With this CL, it now also accepts non-decimal floating-point numbers. Fixes #29799. Change-Id: I9cc65666a5cebb00f0202da2e4fc5654a02e3234 Reviewed-on: https://go-review.googlesource.com/c/go/+/168237 Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
2019-02-27	math/big: better initial guess for nat.sqrt	Juraj Sukop
	The proposed change introduces a better initial guess which is closer to the final value and therefore converges in fewer steps. Consider for example sqrt(8): previously the guess was 8, whereas now it is 4 (and the result is 2). All this change does is it computes the division by two more accurately while it keeps the guess ≥ √x. Change-Id: I917248d734a7b0488d14a647a063f674e56c4e30 GitHub-Last-Rev: c06d9d4876c8e7d6739f0e4b687e370fe1e9aad7 GitHub-Pull-Request: golang/go#28981 Reviewed-on: https://go-review.googlesource.com/c/163866 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2018-11-28	math/big: allocate less for single-Word nats	Josh Bleecher Snyder
	For many uses of math/big, most numbers are small in practice. Prior to this change, big.NewInt allocated a minimum of five Words: one to hold the value, and four as extra capacity. In most cases, this extra capacity is waste. Worse, allocating a single Word uses a fast malloc path for tiny allocs; allocating five Words is more expensive in CPU as well as memory. This change is a simple fix: Treat a request for one Word at its word. I experimented with more complicated fixes and did not find anything that outperformed this easy fix. On some real world programs, this is a clear win. The compiler: name old alloc/op new alloc/op delta Template 37.1MB ± 0% 37.0MB ± 0% -0.23% (p=0.008 n=5+5) Unicode 29.2MB ± 0% 28.5MB ± 0% -2.48% (p=0.008 n=5+5) GoTypes 133MB ± 0% 133MB ± 0% -0.05% (p=0.008 n=5+5) Compiler 628MB ± 0% 628MB ± 0% -0.06% (p=0.008 n=5+5) SSA 2.04GB ± 0% 2.03GB ± 0% -0.14% (p=0.008 n=5+5) Flate 24.7MB ± 0% 24.6MB ± 0% -0.23% (p=0.008 n=5+5) GoParser 29.6MB ± 0% 29.6MB ± 0% -0.07% (p=0.008 n=5+5) Reflect 82.3MB ± 0% 82.2MB ± 0% -0.05% (p=0.008 n=5+5) Tar 36.2MB ± 0% 36.2MB ± 0% -0.12% (p=0.008 n=5+5) XML 49.5MB ± 0% 49.4MB ± 0% -0.23% (p=0.008 n=5+5) [Geo mean] 85.1MB 84.8MB -0.37% name old allocs/op new allocs/op delta Template 364k ± 0% 364k ± 0% ~ (p=0.476 n=5+5) Unicode 341k ± 0% 341k ± 0% ~ (p=0.690 n=5+5) GoTypes 1.37M ± 0% 1.37M ± 0% ~ (p=0.444 n=5+5) Compiler 5.50M ± 0% 5.50M ± 0% +0.02% (p=0.008 n=5+5) SSA 16.0M ± 0% 16.0M ± 0% +0.01% (p=0.008 n=5+5) Flate 238k ± 0% 238k ± 0% ~ (p=0.222 n=5+5) GoParser 305k ± 0% 305k ± 0% ~ (p=0.841 n=5+5) Reflect 976k ± 0% 976k ± 0% ~ (p=0.222 n=5+5) Tar 354k ± 0% 354k ± 0% ~ (p=0.103 n=5+5) XML 450k ± 0% 450k ± 0% ~ (p=0.151 n=5+5) [Geo mean] 837k 837k +0.01% go.skylark.net (at ea6d2813de75ded8d157b9540bc3d3ad0b688623): name old alloc/op new alloc/op delta Hashtable-8 456kB ± 0% 299kB ± 0% -34.33% (p=0.000 n=9+9) /bench_builtin_method-8 220kB ± 0% 190kB ± 0% -13.55% (p=0.000 n=9+10) name old allocs/op new allocs/op delta Hashtable-8 7.84k ± 0% 7.84k ± 0% ~ (all equal) /bench_builtin_method-8 7.49k ± 0% 7.49k ± 0% ~ (all equal) The math/big benchmarks are messy, which is predictable, since they naturally exercise the bigger-than-one-word code more. Also worth noting is that many of the benchmarks have very high variance. I've omitted the opVV and opVW benchmarks, as they are unrelated. name old time/op new time/op delta DecimalConversion-8 92.5µs ± 1% 90.6µs ± 0% -2.12% (p=0.000 n=17+19) FloatString/100-8 867ns ± 0% 871ns ± 0% +0.50% (p=0.000 n=18+18) FloatString/1000-8 26.4µs ± 1% 26.5µs ± 1% ~ (p=0.396 n=20+19) FloatString/10000-8 2.15ms ± 2% 2.16ms ± 2% ~ (p=0.089 n=19+20) FloatString/100000-8 209ms ± 1% 209ms ± 1% ~ (p=0.583 n=19+19) FloatAdd/10-8 63.5ns ± 2% 64.1ns ± 6% ~ (p=0.389 n=19+19) FloatAdd/100-8 66.0ns ± 2% 65.8ns ± 2% ~ (p=0.825 n=20+20) FloatAdd/1000-8 93.9ns ± 1% 94.3ns ± 1% ~ (p=0.273 n=19+20) FloatAdd/10000-8 347ns ± 2% 342ns ± 1% -1.50% (p=0.000 n=18+18) FloatAdd/100000-8 2.78µs ± 1% 2.78µs ± 2% ~ (p=0.961 n=20+19) FloatSub/10-8 56.9ns ± 2% 57.8ns ± 3% +1.59% (p=0.001 n=19+19) FloatSub/100-8 58.2ns ± 2% 58.9ns ± 2% +1.25% (p=0.004 n=20+20) FloatSub/1000-8 74.9ns ± 1% 74.4ns ± 1% -0.76% (p=0.000 n=19+20) FloatSub/10000-8 223ns ± 1% 220ns ± 2% -1.29% (p=0.000 n=16+20) FloatSub/100000-8 1.66µs ± 1% 1.66µs ± 2% ~ (p=0.147 n=20+20) ParseFloatSmallExp-8 8.38µs ± 0% 8.59µs ± 0% +2.48% (p=0.000 n=19+19) ParseFloatLargeExp-8 31.1µs ± 0% 32.0µs ± 0% +3.04% (p=0.000 n=16+17) GCD10x10/WithoutXY-8 115ns ± 1% 99ns ± 3% -14.07% (p=0.000 n=20+20) GCD10x10/WithXY-8 322ns ± 0% 312ns ± 0% -3.11% (p=0.000 n=18+13) GCD10x100/WithoutXY-8 233ns ± 1% 219ns ± 1% -5.73% (p=0.000 n=19+17) GCD10x100/WithXY-8 709ns ± 0% 759ns ± 0% +7.04% (p=0.000 n=19+19) GCD10x1000/WithoutXY-8 653ns ± 1% 642ns ± 1% -1.69% (p=0.000 n=17+20) GCD10x1000/WithXY-8 1.35µs ± 0% 1.35µs ± 1% ~ (p=0.255 n=20+16) GCD10x10000/WithoutXY-8 4.57µs ± 1% 4.61µs ± 1% +0.95% (p=0.000 n=18+17) GCD10x10000/WithXY-8 6.82µs ± 0% 6.84µs ± 0% +0.27% (p=0.000 n=16+17) GCD10x100000/WithoutXY-8 43.9µs ± 1% 44.0µs ± 1% +0.28% (p=0.000 n=18+17) GCD10x100000/WithXY-8 60.6µs ± 0% 60.6µs ± 0% ~ (p=0.907 n=18+18) GCD100x100/WithoutXY-8 1.13µs ± 0% 1.21µs ± 0% +6.39% (p=0.000 n=19+19) GCD100x100/WithXY-8 1.82µs ± 0% 1.92µs ± 0% +5.24% (p=0.000 n=19+17) GCD100x1000/WithoutXY-8 2.00µs ± 0% 2.03µs ± 1% +1.61% (p=0.000 n=18+16) GCD100x1000/WithXY-8 3.22µs ± 0% 3.20µs ± 1% -0.83% (p=0.000 n=19+19) GCD100x10000/WithoutXY-8 9.28µs ± 1% 9.17µs ± 1% -1.25% (p=0.000 n=18+19) GCD100x10000/WithXY-8 13.5µs ± 0% 13.3µs ± 0% -1.12% (p=0.000 n=18+19) GCD100x100000/WithoutXY-8 80.4µs ± 0% 78.6µs ± 0% -2.25% (p=0.000 n=19+19) GCD100x100000/WithXY-8 114µs ± 0% 112µs ± 0% -1.46% (p=0.000 n=19+17) GCD1000x1000/WithoutXY-8 12.9µs ± 1% 12.9µs ± 2% -0.50% (p=0.014 n=20+19) GCD1000x1000/WithXY-8 19.6µs ± 1% 19.6µs ± 2% -0.28% (p=0.040 n=17+18) GCD1000x10000/WithoutXY-8 22.4µs ± 0% 22.4µs ± 2% ~ (p=0.220 n=19+19) GCD1000x10000/WithXY-8 57.0µs ± 0% 56.5µs ± 0% -0.87% (p=0.000 n=20+20) GCD1000x100000/WithoutXY-8 116µs ± 0% 115µs ± 0% -0.49% (p=0.000 n=18+19) GCD1000x100000/WithXY-8 410µs ± 0% 411µs ± 0% ~ (p=0.052 n=19+19) GCD10000x10000/WithoutXY-8 247µs ± 1% 244µs ± 1% -0.92% (p=0.000 n=19+19) GCD10000x10000/WithXY-8 476µs ± 1% 473µs ± 1% -0.48% (p=0.009 n=19+19) GCD10000x100000/WithoutXY-8 573µs ± 1% 571µs ± 1% -0.45% (p=0.012 n=20+20) GCD10000x100000/WithXY-8 3.35ms ± 1% 3.35ms ± 1% ~ (p=0.444 n=20+19) GCD100000x100000/WithoutXY-8 12.0ms ± 2% 11.9ms ± 2% ~ (p=0.276 n=18+20) GCD100000x100000/WithXY-8 27.3ms ± 1% 27.3ms ± 1% ~ (p=0.792 n=20+19) Hilbert-8 672µs ± 0% 611µs ± 0% -9.02% (p=0.000 n=19+19) Binomial-8 1.40µs ± 0% 1.18µs ± 0% -15.69% (p=0.000 n=16+14) QuoRem-8 2.20µs ± 1% 2.17µs ± 1% -1.13% (p=0.000 n=19+19) Exp-8 4.10ms ± 1% 4.11ms ± 1% ~ (p=0.296 n=20+19) Exp2-8 4.11ms ± 1% 4.12ms ± 1% ~ (p=0.429 n=20+20) Bitset-8 8.67ns ± 6% 8.74ns ± 4% ~ (p=0.139 n=19+17) BitsetNeg-8 43.6ns ± 1% 43.8ns ± 2% +0.61% (p=0.036 n=20+20) BitsetOrig-8 77.5ns ± 1% 68.4ns ± 1% -11.77% (p=0.000 n=19+20) BitsetNegOrig-8 145ns ± 1% 141ns ± 1% -2.87% (p=0.000 n=19+20) ModSqrt225_Tonelli-8 324µs ± 1% 324µs ± 1% ~ (p=0.409 n=18+20) ModSqrt225_3Mod4-8 98.9µs ± 1% 99.1µs ± 1% ~ (p=0.298 n=19+18) ModSqrt231_Tonelli-8 337µs ± 1% 337µs ± 1% ~ (p=0.718 n=20+18) ModSqrt231_5Mod8-8 115µs ± 1% 114µs ± 1% -0.22% (p=0.050 n=20+20) ModInverse-8 895ns ± 0% 869ns ± 1% -2.83% (p=0.000 n=17+17) Sqrt-8 28.1µs ± 1% 28.1µs ± 0% -0.28% (p=0.000 n=16+20) IntSqr/1-8 10.8ns ± 3% 10.5ns ± 3% -2.51% (p=0.000 n=19+17) IntSqr/2-8 30.5ns ± 2% 30.3ns ± 4% -0.71% (p=0.035 n=18+18) IntSqr/3-8 40.1ns ± 1% 40.1ns ± 1% ~ (p=0.710 n=20+17) IntSqr/5-8 65.3ns ± 1% 65.4ns ± 2% ~ (p=0.744 n=19+19) IntSqr/8-8 101ns ± 1% 102ns ± 0% ~ (p=0.234 n=19+20) IntSqr/10-8 138ns ± 0% 138ns ± 2% ~ (p=0.827 n=18+18) IntSqr/20-8 378ns ± 1% 378ns ± 1% ~ (p=0.479 n=18+18) IntSqr/30-8 637ns ± 0% 638ns ± 1% ~ (p=0.051 n=18+20) IntSqr/50-8 1.34µs ± 2% 1.34µs ± 1% ~ (p=0.970 n=18+19) IntSqr/80-8 2.78µs ± 0% 2.78µs ± 1% -0.18% (p=0.006 n=19+17) IntSqr/100-8 3.98µs ± 0% 3.98µs ± 0% ~ (p=0.057 n=17+19) IntSqr/200-8 13.5µs ± 0% 13.5µs ± 1% -0.33% (p=0.000 n=19+17) IntSqr/300-8 25.3µs ± 1% 25.3µs ± 1% ~ (p=0.361 n=19+20) IntSqr/500-8 62.9µs ± 0% 62.9µs ± 1% ~ (p=0.899 n=17+17) IntSqr/800-8 128µs ± 1% 127µs ± 1% -0.32% (p=0.016 n=18+20) IntSqr/1000-8 192µs ± 0% 192µs ± 1% ~ (p=0.916 n=17+18) Div/20/10-8 34.9ns ± 2% 35.6ns ± 1% +2.01% (p=0.000 n=20+20) Div/200/100-8 218ns ± 1% 215ns ± 2% -1.43% (p=0.000 n=18+18) Div/2000/1000-8 1.16µs ± 1% 1.15µs ± 1% -1.04% (p=0.000 n=19+20) Div/20000/10000-8 35.7µs ± 1% 35.4µs ± 1% -0.69% (p=0.000 n=19+18) Div/200000/100000-8 2.89ms ± 1% 2.88ms ± 1% -0.62% (p=0.007 n=19+20) Mul-8 9.28ms ± 1% 9.27ms ± 1% ~ (p=0.563 n=18+18) ZeroShifts/Shl-8 712ns ± 6% 716ns ± 7% ~ (p=0.597 n=20+20) ZeroShifts/ShlSame-8 4.00ns ± 1% 4.06ns ± 5% ~ (p=0.162 n=18+20) ZeroShifts/Shr-8 714ns ±10% 1285ns ±156% ~ (p=0.250 n=20+20) ZeroShifts/ShrSame-8 4.00ns ± 1% 4.09ns ±10% +2.34% (p=0.048 n=16+19) Exp3Power/0x10-8 154ns ± 0% 159ns ±13% ~ (p=0.197 n=14+20) Exp3Power/0x40-8 171ns ± 1% 175ns ± 8% ~ (p=0.058 n=16+19) Exp3Power/0x100-8 287ns ± 0% 316ns ± 4% +10.03% (p=0.000 n=17+19) Exp3Power/0x400-8 698ns ± 1% 801ns ± 6% +14.75% (p=0.000 n=19+20) Exp3Power/0x1000-8 2.87µs ± 0% 3.65µs ± 6% +27.24% (p=0.000 n=18+18) Exp3Power/0x4000-8 21.9µs ± 1% 28.7µs ± 8% +31.09% (p=0.000 n=18+20) Exp3Power/0x10000-8 204µs ± 0% 267µs ± 9% +30.81% (p=0.000 n=20+20) Exp3Power/0x40000-8 1.86ms ± 0% 2.26ms ± 5% +21.68% (p=0.000 n=18+19) Exp3Power/0x100000-8 17.5ms ± 1% 20.7ms ± 7% +18.39% (p=0.000 n=19+20) Exp3Power/0x400000-8 156ms ± 0% 172ms ± 6% +10.54% (p=0.000 n=19+20) Fibo-8 26.9ms ± 1% 27.5ms ± 3% +2.32% (p=0.000 n=19+19) NatSqr/1-8 31.0ns ± 4% 39.5ns ±29% +27.25% (p=0.000 n=20+19) NatSqr/2-8 54.1ns ± 1% 69.0ns ±28% +27.52% (p=0.000 n=20+20) NatSqr/3-8 66.6ns ± 1% 83.0ns ±25% +24.59% (p=0.000 n=20+20) NatSqr/5-8 97.1ns ± 1% 119.9ns ±12% +23.50% (p=0.000 n=16+20) NatSqr/8-8 138ns ± 1% 171ns ± 9% +24.20% (p=0.000 n=19+20) NatSqr/10-8 182ns ± 0% 225ns ± 9% +23.50% (p=0.000 n=16+20) NatSqr/20-8 447ns ± 1% 624ns ± 6% +39.64% (p=0.000 n=19+19) NatSqr/30-8 736ns ± 2% 986ns ± 9% +33.94% (p=0.000 n=19+20) NatSqr/50-8 1.51µs ± 2% 1.97µs ± 9% +30.42% (p=0.000 n=20+20) NatSqr/80-8 3.03µs ± 1% 3.67µs ± 7% +21.08% (p=0.000 n=20+20) NatSqr/100-8 4.31µs ± 1% 5.20µs ± 7% +20.52% (p=0.000 n=19+20) NatSqr/200-8 14.2µs ± 0% 16.3µs ± 4% +14.92% (p=0.000 n=19+20) NatSqr/300-8 27.8µs ± 1% 33.2µs ± 7% +19.28% (p=0.000 n=20+18) NatSqr/500-8 66.6µs ± 1% 74.5µs ± 3% +11.87% (p=0.000 n=18+18) NatSqr/800-8 135µs ± 1% 165µs ± 7% +22.33% (p=0.000 n=20+20) NatSqr/1000-8 200µs ± 0% 228µs ± 3% +14.39% (p=0.000 n=19+20) NatSetBytes/8-8 8.87ns ± 4% 8.77ns ± 2% -1.17% (p=0.020 n=20+16) NatSetBytes/24-8 38.6ns ± 3% 49.5ns ±29% +28.32% (p=0.000 n=18+19) NatSetBytes/128-8 75.2ns ± 1% 120.7ns ±29% +60.60% (p=0.000 n=17+20) NatSetBytes/7-8 16.2ns ± 2% 16.5ns ± 2% +1.76% (p=0.000 n=20+20) NatSetBytes/23-8 46.5ns ± 1% 60.2ns ±24% +29.59% (p=0.000 n=20+20) NatSetBytes/127-8 83.1ns ± 1% 118.2ns ±20% +42.33% (p=0.000 n=18+20) ScanPi-8 89.1µs ± 1% 117.4µs ±12% +31.75% (p=0.000 n=18+20) StringPiParallel-8 35.1µs ± 9% 40.2µs ±12% +14.53% (p=0.000 n=20+20) Scan/10/Base2-8 410ns ±14% 429ns ±10% +4.47% (p=0.018 n=19+20) Scan/100/Base2-8 3.05µs ±20% 2.97µs ±14% ~ (p=0.449 n=20+20) Scan/1000/Base2-8 29.3µs ± 8% 30.1µs ±23% ~ (p=0.355 n=20+20) Scan/10000/Base2-8 402µs ±13% 395µs ±14% ~ (p=0.355 n=20+20) Scan/100000/Base2-8 11.8ms ±10% 11.6ms ± 1% ~ (p=0.245 n=17+18) Scan/10/Base8-8 194ns ± 6% 196ns ±12% ~ (p=0.829 n=20+19) Scan/100/Base8-8 1.11µs ±15% 1.11µs ±12% ~ (p=0.743 n=20+20) Scan/1000/Base8-8 11.7µs ±10% 11.7µs ±12% ~ (p=0.904 n=20+20) Scan/10000/Base8-8 209µs ± 7% 210µs ± 8% ~ (p=0.478 n=20+20) Scan/100000/Base8-8 10.6ms ± 7% 10.4ms ± 6% ~ (p=0.112 n=20+18) Scan/10/Base10-8 182ns ±12% 188ns ±11% +3.52% (p=0.044 n=20+20) Scan/100/Base10-8 1.01µs ± 8% 1.00µs ±13% ~ (p=0.588 n=20+20) Scan/1000/Base10-8 10.7µs ±20% 10.6µs ±14% ~ (p=0.560 n=20+20) Scan/10000/Base10-8 195µs ±10% 194µs ± 9% ~ (p=0.883 n=20+20) Scan/100000/Base10-8 10.6ms ± 2% 10.6ms ± 2% ~ (p=0.495 n=20+20) Scan/10/Base16-8 166ns ±10% 174ns ±17% ~ (p=0.072 n=20+20) Scan/100/Base16-8 836ns ±10% 826ns ±12% ~ (p=0.562 n=20+17) Scan/1000/Base16-8 8.96µs ±13% 8.65µs ± 9% ~ (p=0.203 n=20+18) Scan/10000/Base16-8 198µs ± 3% 198µs ± 5% ~ (p=0.718 n=20+20) Scan/100000/Base16-8 11.1ms ± 3% 11.0ms ± 4% ~ (p=0.512 n=20+20) String/10/Base2-8 88.1ns ± 7% 94.1ns ±11% +6.80% (p=0.000 n=19+20) String/100/Base2-8 577ns ± 4% 598ns ± 5% +3.72% (p=0.000 n=20+20) String/1000/Base2-8 5.25µs ± 2% 5.62µs ± 5% +7.04% (p=0.000 n=19+20) String/10000/Base2-8 55.6µs ± 1% 60.1µs ± 2% +8.12% (p=0.000 n=19+19) String/100000/Base2-8 519µs ± 2% 560µs ± 2% +7.91% (p=0.000 n=18+17) String/10/Base8-8 52.2ns ± 8% 53.3ns ±12% ~ (p=0.188 n=20+18) String/100/Base8-8 218ns ± 3% 232ns ±10% +6.66% (p=0.000 n=20+20) String/1000/Base8-8 1.84µs ± 3% 1.94µs ± 4% +5.07% (p=0.000 n=20+18) String/10000/Base8-8 18.1µs ± 2% 19.1µs ± 3% +5.84% (p=0.000 n=20+19) String/100000/Base8-8 184µs ± 2% 197µs ± 1% +7.15% (p=0.000 n=19+19) String/10/Base10-8 158ns ± 7% 146ns ± 6% -7.65% (p=0.000 n=20+19) String/100/Base10-8 807ns ± 2% 845ns ± 4% +4.79% (p=0.000 n=20+19) String/1000/Base10-8 3.99µs ± 3% 3.99µs ± 7% ~ (p=0.920 n=20+20) String/10000/Base10-8 20.8µs ± 6% 22.1µs ±10% +6.11% (p=0.000 n=19+20) String/100000/Base10-8 5.60ms ± 2% 5.59ms ± 2% ~ (p=0.749 n=20+19) String/10/Base16-8 49.0ns ±13% 49.3ns ±16% ~ (p=0.581 n=19+20) String/100/Base16-8 173ns ± 5% 185ns ± 6% +6.63% (p=0.000 n=20+18) String/1000/Base16-8 1.38µs ± 3% 1.49µs ±10% +8.27% (p=0.000 n=19+20) String/10000/Base16-8 13.5µs ± 2% 14.5µs ± 3% +7.08% (p=0.000 n=20+20) String/100000/Base16-8 138µs ± 4% 148µs ± 4% +7.57% (p=0.000 n=19+20) LeafSize/0-8 2.74ms ± 1% 2.79ms ± 2% +2.00% (p=0.000 n=19+19) LeafSize/1-8 24.8µs ± 4% 26.1µs ± 8% +5.33% (p=0.000 n=18+19) LeafSize/2-8 24.9µs ± 7% 25.0µs ± 8% ~ (p=0.989 n=20+19) LeafSize/3-8 97.6µs ± 3% 100.2µs ± 5% +2.66% (p=0.001 n=20+19) LeafSize/4-8 25.2µs ± 5% 25.4µs ± 5% ~ (p=0.173 n=19+20) LeafSize/5-8 118µs ± 2% 119µs ± 5% ~ (p=0.478 n=20+20) LeafSize/6-8 97.6µs ± 3% 100.1µs ± 8% +2.65% (p=0.021 n=20+19) LeafSize/7-8 65.6µs ± 5% 67.5µs ± 6% +2.92% (p=0.003 n=20+19) LeafSize/8-8 25.5µs ± 5% 25.6µs ± 6% ~ (p=0.461 n=19+20) LeafSize/9-8 134µs ± 4% 136µs ± 5% ~ (p=0.194 n=19+20) LeafSize/10-8 119µs ± 3% 122µs ± 3% +2.52% (p=0.000 n=20+19) LeafSize/11-8 115µs ± 5% 116µs ± 5% ~ (p=0.158 n=20+19) LeafSize/12-8 97.4µs ± 4% 100.3µs ± 5% +2.91% (p=0.003 n=19+20) LeafSize/13-8 93.1µs ± 4% 93.0µs ± 6% ~ (p=0.698 n=20+20) LeafSize/14-8 67.0µs ± 3% 69.7µs ± 6% +4.10% (p=0.000 n=20+20) LeafSize/15-8 48.3µs ± 2% 49.3µs ± 6% +1.91% (p=0.014 n=19+20) LeafSize/16-8 25.6µs ± 5% 25.6µs ± 6% ~ (p=0.947 n=20+20) LeafSize/32-8 30.1µs ± 4% 30.3µs ± 5% ~ (p=0.685 n=18+19) LeafSize/64-8 53.4µs ± 2% 54.0µs ± 3% ~ (p=0.053 n=19+19) ProbablyPrime/n=0-8 3.59ms ± 1% 3.55ms ± 1% -1.12% (p=0.000 n=20+18) ProbablyPrime/n=1-8 4.21ms ± 2% 4.17ms ± 2% -0.73% (p=0.018 n=20+19) ProbablyPrime/n=5-8 6.74ms ± 1% 6.72ms ± 1% ~ (p=0.102 n=20+20) ProbablyPrime/n=10-8 9.91ms ± 1% 9.89ms ± 2% ~ (p=0.322 n=19+20) ProbablyPrime/n=20-8 16.2ms ± 1% 16.1ms ± 2% -0.52% (p=0.006 n=19+19) ProbablyPrime/Lucas-8 2.94ms ± 1% 2.95ms ± 1% +0.52% (p=0.002 n=18+19) ProbablyPrime/MillerRabinBase2-8 641µs ± 2% 640µs ± 2% ~ (p=0.607 n=19+20) FloatSqrt/64-8 653ns ± 5% 704ns ± 5% +7.82% (p=0.000 n=19+20) FloatSqrt/128-8 1.32µs ± 3% 1.42µs ± 5% +7.29% (p=0.000 n=18+20) FloatSqrt/256-8 1.44µs ± 2% 1.45µs ± 4% ~ (p=0.089 n=19+19) FloatSqrt/1000-8 3.36µs ± 3% 3.42µs ± 5% +1.82% (p=0.012 n=20+20) FloatSqrt/10000-8 25.5µs ± 2% 27.5µs ± 7% +7.91% (p=0.000 n=18+19) FloatSqrt/100000-8 629µs ± 6% 663µs ± 9% +5.32% (p=0.000 n=18+20) FloatSqrt/1000000-8 46.4ms ± 2% 46.6ms ± 5% ~ (p=0.351 n=20+19) [Geo mean] 9.60µs 10.01µs +4.28% name old alloc/op new alloc/op delta DecimalConversion-8 54.0kB ± 0% 43.6kB ± 0% -19.40% (p=0.000 n=20+20) FloatString/100-8 400B ± 0% 400B ± 0% ~ (all equal) FloatString/1000-8 3.10kB ± 0% 3.10kB ± 0% ~ (all equal) FloatString/10000-8 52.1kB ± 0% 52.1kB ± 0% ~ (p=0.153 n=20+20) FloatString/100000-8 582kB ± 0% 582kB ± 0% ~ (all equal) FloatAdd/10-8 0.00B 0.00B ~ (all equal) FloatAdd/100-8 0.00B 0.00B ~ (all equal) FloatAdd/1000-8 0.00B 0.00B ~ (all equal) FloatAdd/10000-8 0.00B 0.00B ~ (all equal) FloatAdd/100000-8 0.00B 0.00B ~ (all equal) FloatSub/10-8 0.00B 0.00B ~ (all equal) FloatSub/100-8 0.00B 0.00B ~ (all equal) FloatSub/1000-8 0.00B 0.00B ~ (all equal) FloatSub/10000-8 0.00B 0.00B ~ (all equal) FloatSub/100000-8 0.00B 0.00B ~ (all equal) ParseFloatSmallExp-8 4.18kB ± 0% 3.60kB ± 0% -13.79% (p=0.000 n=20+20) ParseFloatLargeExp-8 18.9kB ± 0% 19.3kB ± 0% +2.25% (p=0.000 n=20+20) GCD10x10/WithoutXY-8 96.0B ± 0% 16.0B ± 0% -83.33% (p=0.000 n=20+20) GCD10x10/WithXY-8 240B ± 0% 88B ± 0% -63.33% (p=0.000 n=20+20) GCD10x100/WithoutXY-8 192B ± 0% 112B ± 0% -41.67% (p=0.000 n=20+20) GCD10x100/WithXY-8 464B ± 0% 424B ± 0% -8.62% (p=0.000 n=20+20) GCD10x1000/WithoutXY-8 416B ± 0% 336B ± 0% -19.23% (p=0.000 n=20+20) GCD10x1000/WithXY-8 1.25kB ± 0% 1.10kB ± 0% -12.18% (p=0.000 n=20+20) GCD10x10000/WithoutXY-8 2.91kB ± 0% 2.83kB ± 0% -2.75% (p=0.000 n=20+20) GCD10x10000/WithXY-8 8.70kB ± 0% 8.55kB ± 0% -1.76% (p=0.000 n=16+16) GCD10x100000/WithoutXY-8 27.2kB ± 0% 27.2kB ± 0% -0.29% (p=0.000 n=20+20) GCD10x100000/WithXY-8 82.4kB ± 0% 82.3kB ± 0% -0.17% (p=0.000 n=20+19) GCD100x100/WithoutXY-8 288B ± 0% 384B ± 0% +33.33% (p=0.000 n=20+20) GCD100x100/WithXY-8 464B ± 0% 576B ± 0% +24.14% (p=0.000 n=20+20) GCD100x1000/WithoutXY-8 640B ± 0% 688B ± 0% +7.50% (p=0.000 n=20+20) GCD100x1000/WithXY-8 1.52kB ± 0% 1.46kB ± 0% -3.68% (p=0.000 n=20+20) GCD100x10000/WithoutXY-8 4.24kB ± 0% 4.29kB ± 0% +1.13% (p=0.000 n=20+20) GCD100x10000/WithXY-8 11.1kB ± 0% 11.0kB ± 0% -0.51% (p=0.000 n=15+20) GCD100x100000/WithoutXY-8 40.9kB ± 0% 40.9kB ± 0% +0.12% (p=0.000 n=20+19) GCD100x100000/WithXY-8 110kB ± 0% 109kB ± 0% -0.08% (p=0.000 n=20+20) GCD1000x1000/WithoutXY-8 1.22kB ± 0% 1.06kB ± 0% -13.16% (p=0.000 n=20+20) GCD1000x1000/WithXY-8 2.37kB ± 0% 2.11kB ± 0% -10.83% (p=0.000 n=20+20) GCD1000x10000/WithoutXY-8 4.71kB ± 0% 4.63kB ± 0% -1.70% (p=0.000 n=20+19) GCD1000x10000/WithXY-8 28.2kB ± 0% 28.0kB ± 0% -0.43% (p=0.000 n=20+15) GCD1000x100000/WithoutXY-8 41.3kB ± 0% 41.2kB ± 0% -0.20% (p=0.000 n=20+16) GCD1000x100000/WithXY-8 301kB ± 0% 301kB ± 0% -0.13% (p=0.000 n=20+20) GCD10000x10000/WithoutXY-8 8.64kB ± 0% 8.48kB ± 0% -1.85% (p=0.000 n=20+20) GCD10000x10000/WithXY-8 57.2kB ± 0% 57.7kB ± 0% +0.80% (p=0.000 n=20+20) GCD10000x100000/WithoutXY-8 43.8kB ± 0% 43.7kB ± 0% -0.19% (p=0.000 n=20+18) GCD10000x100000/WithXY-8 2.08MB ± 0% 2.08MB ± 0% -0.02% (p=0.000 n=15+19) GCD100000x100000/WithoutXY-8 81.6kB ± 0% 81.4kB ± 0% -0.20% (p=0.000 n=20+20) GCD100000x100000/WithXY-8 4.32MB ± 0% 4.33MB ± 0% +0.12% (p=0.000 n=20+20) Hilbert-8 653kB ± 0% 313kB ± 0% -52.13% (p=0.000 n=19+20) Binomial-8 1.82kB ± 0% 1.02kB ± 0% -43.86% (p=0.000 n=20+20) QuoRem-8 0.00B 0.00B ~ (all equal) Exp-8 11.1kB ± 0% 11.0kB ± 0% -0.34% (p=0.000 n=19+20) Exp2-8 11.3kB ± 0% 11.3kB ± 0% -0.35% (p=0.000 n=19+20) Bitset-8 0.00B 0.00B ~ (all equal) BitsetNeg-8 0.00B 0.00B ~ (all equal) BitsetOrig-8 103B ± 0% 63B ± 0% -38.83% (p=0.000 n=20+20) BitsetNegOrig-8 215B ± 0% 175B ± 0% -18.60% (p=0.000 n=20+20) ModSqrt225_Tonelli-8 11.3kB ± 0% 11.0kB ± 0% -2.76% (p=0.000 n=20+17) ModSqrt225_3Mod4-8 3.57kB ± 0% 3.53kB ± 0% -1.12% (p=0.000 n=20+20) ModSqrt231_Tonelli-8 11.0kB ± 0% 10.7kB ± 0% -2.55% (p=0.000 n=20+20) ModSqrt231_5Mod8-8 4.21kB ± 0% 4.09kB ± 0% -2.85% (p=0.000 n=16+20) ModInverse-8 1.44kB ± 0% 1.28kB ± 0% -11.11% (p=0.000 n=20+20) Sqrt-8 6.00kB ± 0% 6.00kB ± 0% ~ (all equal) IntSqr/1-8 0.00B 0.00B ~ (all equal) IntSqr/2-8 0.00B 0.00B ~ (all equal) IntSqr/3-8 0.00B 0.00B ~ (all equal) IntSqr/5-8 0.00B 0.00B ~ (all equal) IntSqr/8-8 0.00B 0.00B ~ (all equal) IntSqr/10-8 0.00B 0.00B ~ (all equal) IntSqr/20-8 320B ± 0% 320B ± 0% ~ (all equal) IntSqr/30-8 480B ± 0% 480B ± 0% ~ (all equal) IntSqr/50-8 896B ± 0% 896B ± 0% ~ (all equal) IntSqr/80-8 1.28kB ± 0% 1.28kB ± 0% ~ (all equal) IntSqr/100-8 1.79kB ± 0% 1.79kB ± 0% ~ (all equal) IntSqr/200-8 3.20kB ± 0% 3.20kB ± 0% ~ (all equal) IntSqr/300-8 8.06kB ± 0% 8.06kB ± 0% ~ (all equal) IntSqr/500-8 12.3kB ± 0% 12.3kB ± 0% ~ (all equal) IntSqr/800-8 28.8kB ± 0% 28.8kB ± 0% ~ (all equal) IntSqr/1000-8 36.9kB ± 0% 36.9kB ± 0% ~ (all equal) Div/20/10-8 0.00B 0.00B ~ (all equal) Div/200/100-8 0.00B 0.00B ~ (all equal) Div/2000/1000-8 0.00B 0.00B ~ (all equal) Div/20000/10000-8 0.00B 0.00B ~ (all equal) Div/200000/100000-8 690B ± 0% 690B ± 0% ~ (all equal) Mul-8 565kB ± 0% 565kB ± 0% ~ (all equal) ZeroShifts/Shl-8 6.53kB ± 0% 6.53kB ± 0% ~ (all equal) ZeroShifts/ShlSame-8 0.00B 0.00B ~ (all equal) ZeroShifts/Shr-8 6.53kB ± 0% 6.53kB ± 0% ~ (all equal) ZeroShifts/ShrSame-8 0.00B 0.00B ~ (all equal) Exp3Power/0x10-8 192B ± 0% 112B ± 0% -41.67% (p=0.000 n=20+20) Exp3Power/0x40-8 192B ± 0% 112B ± 0% -41.67% (p=0.000 n=20+20) Exp3Power/0x100-8 288B ± 0% 208B ± 0% -27.78% (p=0.000 n=20+20) Exp3Power/0x400-8 672B ± 0% 592B ± 0% -11.90% (p=0.000 n=20+20) Exp3Power/0x1000-8 3.33kB ± 0% 3.25kB ± 0% -2.40% (p=0.000 n=20+20) Exp3Power/0x4000-8 13.8kB ± 0% 13.7kB ± 0% -0.58% (p=0.000 n=20+20) Exp3Power/0x10000-8 117kB ± 0% 117kB ± 0% -0.07% (p=0.000 n=20+20) Exp3Power/0x40000-8 755kB ± 0% 755kB ± 0% -0.01% (p=0.000 n=19+20) Exp3Power/0x100000-8 5.22MB ± 0% 5.22MB ± 0% -0.00% (p=0.000 n=20+20) Exp3Power/0x400000-8 39.8MB ± 0% 39.8MB ± 0% -0.00% (p=0.000 n=20+19) Fibo-8 3.09MB ± 0% 3.08MB ± 0% -0.28% (p=0.000 n=20+16) NatSqr/1-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) NatSqr/2-8 64.0B ± 0% 64.0B ± 0% ~ (all equal) NatSqr/3-8 80.0B ± 0% 80.0B ± 0% ~ (all equal) NatSqr/5-8 112B ± 0% 112B ± 0% ~ (all equal) NatSqr/8-8 160B ± 0% 160B ± 0% ~ (all equal) NatSqr/10-8 192B ± 0% 192B ± 0% ~ (all equal) NatSqr/20-8 672B ± 0% 672B ± 0% ~ (all equal) NatSqr/30-8 992B ± 0% 992B ± 0% ~ (all equal) NatSqr/50-8 1.79kB ± 0% 1.79kB ± 0% ~ (all equal) NatSqr/80-8 2.69kB ± 0% 2.69kB ± 0% ~ (all equal) NatSqr/100-8 3.58kB ± 0% 3.58kB ± 0% ~ (all equal) NatSqr/200-8 6.66kB ± 0% 6.66kB ± 0% ~ (all equal) NatSqr/300-8 24.4kB ± 0% 24.4kB ± 0% ~ (all equal) NatSqr/500-8 36.9kB ± 0% 36.9kB ± 0% ~ (all equal) NatSqr/800-8 69.8kB ± 0% 69.8kB ± 0% ~ (all equal) NatSqr/1000-8 86.0kB ± 0% 86.0kB ± 0% ~ (all equal) NatSetBytes/8-8 0.00B 0.00B ~ (all equal) NatSetBytes/24-8 64.0B ± 0% 64.0B ± 0% ~ (all equal) NatSetBytes/128-8 160B ± 0% 160B ± 0% ~ (all equal) NatSetBytes/7-8 0.00B 0.00B ~ (all equal) NatSetBytes/23-8 64.0B ± 0% 64.0B ± 0% ~ (all equal) NatSetBytes/127-8 160B ± 0% 160B ± 0% ~ (all equal) ScanPi-8 75.4kB ± 0% 75.7kB ± 0% +0.41% (p=0.000 n=20+20) StringPiParallel-8 20.4kB ± 0% 20.4kB ± 0% ~ (p=0.223 n=20+20) Scan/10/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/1000/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/10000/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100000/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/10/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/1000/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/10000/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100000/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/10/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/1000/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/10000/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100000/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/10/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/1000/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/10000/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100000/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) String/10/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) String/100/Base2-8 352B ± 0% 352B ± 0% ~ (all equal) String/1000/Base2-8 3.46kB ± 0% 3.46kB ± 0% ~ (all equal) String/10000/Base2-8 41.0kB ± 0% 41.0kB ± 0% ~ (all equal) String/100000/Base2-8 336kB ± 0% 336kB ± 0% ~ (all equal) String/10/Base8-8 16.0B ± 0% 16.0B ± 0% ~ (all equal) String/100/Base8-8 112B ± 0% 112B ± 0% ~ (all equal) String/1000/Base8-8 1.15kB ± 0% 1.15kB ± 0% ~ (all equal) String/10000/Base8-8 12.3kB ± 0% 12.3kB ± 0% ~ (all equal) String/100000/Base8-8 115kB ± 0% 115kB ± 0% ~ (all equal) String/10/Base10-8 64.0B ± 0% 24.0B ± 0% -62.50% (p=0.000 n=20+20) String/100/Base10-8 192B ± 0% 192B ± 0% ~ (all equal) String/1000/Base10-8 1.95kB ± 0% 1.95kB ± 0% ~ (all equal) String/10000/Base10-8 20.0kB ± 0% 20.0kB ± 0% ~ (p=0.983 n=19+20) String/100000/Base10-8 210kB ± 1% 211kB ± 1% +0.82% (p=0.000 n=19+20) String/10/Base16-8 16.0B ± 0% 16.0B ± 0% ~ (all equal) String/100/Base16-8 96.0B ± 0% 96.0B ± 0% ~ (all equal) String/1000/Base16-8 896B ± 0% 896B ± 0% ~ (all equal) String/10000/Base16-8 9.47kB ± 0% 9.47kB ± 0% ~ (all equal) String/100000/Base16-8 90.1kB ± 0% 90.1kB ± 0% ~ (all equal) LeafSize/0-8 16.9kB ± 0% 16.8kB ± 0% -0.44% (p=0.000 n=20+20) LeafSize/1-8 22.4kB ± 0% 22.3kB ± 0% -0.34% (p=0.000 n=20+19) LeafSize/2-8 22.4kB ± 0% 22.3kB ± 0% -0.34% (p=0.000 n=20+19) LeafSize/3-8 22.4kB ± 0% 22.3kB ± 0% -0.34% (p=0.000 n=20+17) LeafSize/4-8 22.4kB ± 0% 22.3kB ± 0% -0.34% (p=0.000 n=20+19) LeafSize/5-8 22.4kB ± 0% 22.3kB ± 0% -0.33% (p=0.000 n=20+20) LeafSize/6-8 22.3kB ± 0% 22.2kB ± 0% -0.34% (p=0.000 n=20+20) LeafSize/7-8 22.3kB ± 0% 22.2kB ± 0% -0.35% (p=0.000 n=20+20) LeafSize/8-8 22.3kB ± 0% 22.2kB ± 0% -0.34% (p=0.000 n=16+20) LeafSize/9-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20) LeafSize/10-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20) LeafSize/11-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20) LeafSize/12-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20) LeafSize/13-8 22.3kB ± 0% 22.2kB ± 0% -0.34% (p=0.000 n=20+15) LeafSize/14-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20) LeafSize/15-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20) LeafSize/16-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=19+20) LeafSize/32-8 22.3kB ± 0% 22.2kB ± 0% -0.32% (p=0.000 n=20+20) LeafSize/64-8 21.8kB ± 0% 21.7kB ± 0% -0.33% (p=0.000 n=18+19) ProbablyPrime/n=0-8 15.3kB ± 0% 14.9kB ± 0% -2.35% (p=0.000 n=20+20) ProbablyPrime/n=1-8 21.0kB ± 0% 20.7kB ± 0% -1.71% (p=0.000 n=20+20) ProbablyPrime/n=5-8 43.4kB ± 0% 42.9kB ± 0% -1.20% (p=0.000 n=20+20) ProbablyPrime/n=10-8 71.5kB ± 0% 70.7kB ± 0% -1.01% (p=0.000 n=19+20) ProbablyPrime/n=20-8 127kB ± 0% 126kB ± 0% -0.88% (p=0.000 n=20+20) ProbablyPrime/Lucas-8 3.07kB ± 0% 2.79kB ± 0% -9.12% (p=0.000 n=20+20) ProbablyPrime/MillerRabinBase2-8 12.1kB ± 0% 12.0kB ± 0% -0.66% (p=0.000 n=20+20) FloatSqrt/64-8 416B ± 0% 360B ± 0% -13.46% (p=0.000 n=20+20) FloatSqrt/128-8 640B ± 0% 584B ± 0% -8.75% (p=0.000 n=20+20) FloatSqrt/256-8 512B ± 0% 472B ± 0% -7.81% (p=0.000 n=20+20) FloatSqrt/1000-8 1.47kB ± 0% 1.43kB ± 0% -2.72% (p=0.000 n=20+20) FloatSqrt/10000-8 18.2kB ± 0% 18.1kB ± 0% -0.22% (p=0.000 n=20+20) FloatSqrt/100000-8 204kB ± 0% 204kB ± 0% -0.02% (p=0.000 n=20+20) FloatSqrt/1000000-8 6.37MB ± 0% 6.37MB ± 0% -0.00% (p=0.000 n=19+20) [Geo mean] 3.42kB 3.24kB -5.33% name old allocs/op new allocs/op delta DecimalConversion-8 1.65k ± 0% 1.65k ± 0% ~ (all equal) FloatString/100-8 8.00 ± 0% 8.00 ± 0% ~ (all equal) FloatString/1000-8 9.00 ± 0% 9.00 ± 0% ~ (all equal) FloatString/10000-8 22.0 ± 0% 22.0 ± 0% ~ (all equal) FloatString/100000-8 136 ± 0% 136 ± 0% ~ (all equal) FloatAdd/10-8 0.00 0.00 ~ (all equal) FloatAdd/100-8 0.00 0.00 ~ (all equal) FloatAdd/1000-8 0.00 0.00 ~ (all equal) FloatAdd/10000-8 0.00 0.00 ~ (all equal) FloatAdd/100000-8 0.00 0.00 ~ (all equal) FloatSub/10-8 0.00 0.00 ~ (all equal) FloatSub/100-8 0.00 0.00 ~ (all equal) FloatSub/1000-8 0.00 0.00 ~ (all equal) FloatSub/10000-8 0.00 0.00 ~ (all equal) FloatSub/100000-8 0.00 0.00 ~ (all equal) ParseFloatSmallExp-8 110 ± 0% 130 ± 0% +18.18% (p=0.000 n=20+20) ParseFloatLargeExp-8 319 ± 0% 371 ± 0% +16.30% (p=0.000 n=20+20) GCD10x10/WithoutXY-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) GCD10x10/WithXY-8 5.00 ± 0% 6.00 ± 0% +20.00% (p=0.000 n=20+20) GCD10x100/WithoutXY-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) GCD10x100/WithXY-8 9.00 ± 0% 12.00 ± 0% +33.33% (p=0.000 n=20+20) GCD10x1000/WithoutXY-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) GCD10x1000/WithXY-8 11.0 ± 0% 12.0 ± 0% +9.09% (p=0.000 n=20+20) GCD10x10000/WithoutXY-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) GCD10x10000/WithXY-8 11.0 ± 0% 12.0 ± 0% +9.09% (p=0.000 n=20+20) GCD10x100000/WithoutXY-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) GCD10x100000/WithXY-8 11.0 ± 0% 12.0 ± 0% +9.09% (p=0.000 n=20+20) GCD100x100/WithoutXY-8 6.00 ± 0% 10.00 ± 0% +66.67% (p=0.000 n=20+20) GCD100x100/WithXY-8 9.00 ± 0% 15.00 ± 0% +66.67% (p=0.000 n=20+20) GCD100x1000/WithoutXY-8 6.00 ± 0% 8.00 ± 0% +33.33% (p=0.000 n=20+20) GCD100x1000/WithXY-8 12.0 ± 0% 13.0 ± 0% +8.33% (p=0.000 n=20+20) GCD100x10000/WithoutXY-8 6.00 ± 0% 8.00 ± 0% +33.33% (p=0.000 n=20+20) GCD100x10000/WithXY-8 12.0 ± 0% 13.0 ± 0% +8.33% (p=0.000 n=20+20) GCD100x100000/WithoutXY-8 6.00 ± 0% 8.00 ± 0% +33.33% (p=0.000 n=20+20) GCD100x100000/WithXY-8 12.0 ± 0% 13.0 ± 0% +8.33% (p=0.000 n=20+20) GCD1000x1000/WithoutXY-8 10.0 ± 0% 10.0 ± 0% ~ (all equal) GCD1000x1000/WithXY-8 19.0 ± 0% 20.0 ± 0% +5.26% (p=0.000 n=20+20) GCD1000x10000/WithoutXY-8 8.00 ± 0% 8.00 ± 0% ~ (all equal) GCD1000x10000/WithXY-8 26.0 ± 0% 26.0 ± 0% ~ (all equal) GCD1000x100000/WithoutXY-8 8.00 ± 0% 8.00 ± 0% ~ (all equal) GCD1000x100000/WithXY-8 27.0 ± 0% 27.0 ± 0% ~ (all equal) GCD10000x10000/WithoutXY-8 10.0 ± 0% 10.0 ± 0% ~ (all equal) GCD10000x10000/WithXY-8 76.0 ± 0% 78.0 ± 0% +2.63% (p=0.000 n=20+20) GCD10000x100000/WithoutXY-8 8.00 ± 0% 8.00 ± 0% ~ (all equal) GCD10000x100000/WithXY-8 174 ± 0% 174 ± 0% ~ (all equal) GCD100000x100000/WithoutXY-8 10.0 ± 0% 10.0 ± 0% ~ (all equal) GCD100000x100000/WithXY-8 645 ± 0% 647 ± 0% +0.31% (p=0.000 n=20+20) Hilbert-8 14.1k ± 0% 14.3k ± 0% +0.92% (p=0.000 n=20+20) Binomial-8 38.0 ± 0% 38.0 ± 0% ~ (all equal) QuoRem-8 0.00 0.00 ~ (all equal) Exp-8 21.0 ± 0% 21.0 ± 0% ~ (all equal) Exp2-8 22.0 ± 0% 22.0 ± 0% ~ (all equal) Bitset-8 0.00 0.00 ~ (all equal) BitsetNeg-8 0.00 0.00 ~ (all equal) BitsetOrig-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) BitsetNegOrig-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) ModSqrt225_Tonelli-8 85.0 ± 0% 86.0 ± 0% +1.18% (p=0.000 n=20+20) ModSqrt225_3Mod4-8 25.0 ± 0% 25.0 ± 0% ~ (all equal) ModSqrt231_Tonelli-8 80.0 ± 0% 80.0 ± 0% ~ (all equal) ModSqrt231_5Mod8-8 32.0 ± 0% 32.0 ± 0% ~ (all equal) ModInverse-8 11.0 ± 0% 11.0 ± 0% ~ (all equal) Sqrt-8 13.0 ± 0% 13.0 ± 0% ~ (all equal) IntSqr/1-8 0.00 0.00 ~ (all equal) IntSqr/2-8 0.00 0.00 ~ (all equal) IntSqr/3-8 0.00 0.00 ~ (all equal) IntSqr/5-8 0.00 0.00 ~ (all equal) IntSqr/8-8 0.00 0.00 ~ (all equal) IntSqr/10-8 0.00 0.00 ~ (all equal) IntSqr/20-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IntSqr/30-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IntSqr/50-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IntSqr/80-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IntSqr/100-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IntSqr/200-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IntSqr/300-8 3.00 ± 0% 3.00 ± 0% ~ (all equal) IntSqr/500-8 3.00 ± 0% 3.00 ± 0% ~ (all equal) IntSqr/800-8 9.00 ± 0% 9.00 ± 0% ~ (all equal) IntSqr/1000-8 9.00 ± 0% 9.00 ± 0% ~ (all equal) Div/20/10-8 0.00 0.00 ~ (all equal) Div/200/100-8 0.00 0.00 ~ (all equal) Div/2000/1000-8 0.00 0.00 ~ (all equal) Div/20000/10000-8 0.00 0.00 ~ (all equal) Div/200000/100000-8 0.00 0.00 ~ (all equal) Mul-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) ZeroShifts/Shl-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) ZeroShifts/ShlSame-8 0.00 0.00 ~ (all equal) ZeroShifts/Shr-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) ZeroShifts/ShrSame-8 0.00 0.00 ~ (all equal) Exp3Power/0x10-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) Exp3Power/0x40-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) Exp3Power/0x100-8 5.00 ± 0% 5.00 ± 0% ~ (all equal) Exp3Power/0x400-8 7.00 ± 0% 7.00 ± 0% ~ (all equal) Exp3Power/0x1000-8 11.0 ± 0% 11.0 ± 0% ~ (all equal) Exp3Power/0x4000-8 15.0 ± 0% 15.0 ± 0% ~ (all equal) Exp3Power/0x10000-8 29.0 ± 0% 29.0 ± 0% ~ (all equal) Exp3Power/0x40000-8 140 ± 0% 140 ± 0% ~ (all equal) Exp3Power/0x100000-8 1.12k ± 0% 1.12k ± 0% ~ (all equal) Exp3Power/0x400000-8 9.88k ± 0% 9.88k ± 0% ~ (p=0.747 n=17+19) Fibo-8 739 ± 0% 743 ± 0% +0.54% (p=0.000 n=20+20) NatSqr/1-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSqr/2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSqr/3-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSqr/5-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSqr/8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSqr/10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSqr/20-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) NatSqr/30-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) NatSqr/50-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) NatSqr/80-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) NatSqr/100-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) NatSqr/200-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) NatSqr/300-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) NatSqr/500-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) NatSqr/800-8 10.0 ± 0% 10.0 ± 0% ~ (all equal) NatSqr/1000-8 10.0 ± 0% 10.0 ± 0% ~ (all equal) NatSetBytes/8-8 0.00 0.00 ~ (all equal) NatSetBytes/24-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSetBytes/128-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSetBytes/7-8 0.00 0.00 ~ (all equal) NatSetBytes/23-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSetBytes/127-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) ScanPi-8 60.0 ± 0% 61.0 ± 0% +1.67% (p=0.000 n=20+20) StringPiParallel-8 24.0 ± 0% 24.0 ± 0% ~ (all equal) Scan/10/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/1000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/10000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/10/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/1000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/10000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/10/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/1000/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/10000/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100000/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/10/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/1000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/10000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/10/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/100/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/1000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/10000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/100000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/10/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/100/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/1000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/10000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/100000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/10/Base10-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) String/100/Base10-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) String/1000/Base10-8 3.00 ± 0% 3.00 ± 0% ~ (all equal) String/10000/Base10-8 3.00 ± 0% 3.00 ± 0% ~ (all equal) String/100000/Base10-8 3.00 ± 0% 3.00 ± 0% ~ (all equal) String/10/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/100/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/1000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/10000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/100000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) LeafSize/0-8 10.0 ± 0% 10.0 ± 0% ~ (all equal) LeafSize/1-8 13.0 ± 0% 13.0 ± 0% ~ (all equal) LeafSize/2-8 13.0 ± 0% 13.0 ± 0% ~ (all equal) LeafSize/3-8 13.0 ± 0% 13.0 ± 0% ~ (all equal) LeafSize/4-8 13.0 ± 0% 13.0 ± 0% ~ (all equal) LeafSize/5-8 13.0 ± 0% 13.0 ± 0% ~ (all equal) LeafSize/6-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/7-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/8-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/9-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/10-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/11-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/12-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/13-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/14-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/15-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/16-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/32-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/64-8 11.0 ± 0% 11.0 ± 0% ~ (all equal) ProbablyPrime/n=0-8 52.0 ± 0% 52.0 ± 0% ~ (all equal) ProbablyPrime/n=1-8 73.0 ± 0% 73.0 ± 0% ~ (all equal) ProbablyPrime/n=5-8 157 ± 0% 157 ± 0% ~ (all equal) ProbablyPrime/n=10-8 262 ± 0% 262 ± 0% ~ (all equal) ProbablyPrime/n=20-8 472 ± 0% 472 ± 0% ~ (all equal) ProbablyPrime/Lucas-8 22.0 ± 0% 22.0 ± 0% ~ (all equal) ProbablyPrime/MillerRabinBase2-8 29.0 ± 0% 29.0 ± 0% ~ (all equal) FloatSqrt/64-8 9.00 ± 0% 10.00 ± 0% +11.11% (p=0.000 n=20+20) FloatSqrt/128-8 12.0 ± 0% 13.0 ± 0% +8.33% (p=0.000 n=20+20) FloatSqrt/256-8 8.00 ± 0% 8.00 ± 0% ~ (all equal) FloatSqrt/1000-8 9.00 ± 0% 9.00 ± 0% ~ (all equal) FloatSqrt/10000-8 14.0 ± 0% 14.0 ± 0% ~ (all equal) FloatSqrt/100000-8 33.0 ± 0% 33.0 ± 0% ~ (all equal) FloatSqrt/1000000-8 1.16k ± 0% 1.16k ± 0% ~ (all equal) [Geo mean] 6.62 6.76 +2.09% Change-Id: Id9df4157cac1e07721e35cff7fcdefe60703873a Reviewed-on: https://go-review.googlesource.com/c/150999 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Alan Donovan <adonovan@google.com> Reviewed-by: Robert Griesemer <gri@golang.org>
2018-08-22	math/big: streamline divLarge initialization	Brian Kessler
	The divLarge code contained "todo"s about avoiding alias and clear calls in the initialization of variables. By rearranging the order of initialization and always using an auxiliary variable for the shifted divisor, all of these calls can be safely avoided. On average, normalizing the divisor (shift>0) is required 31/32 or 63/64 of the time. If one always performs the shift into an auxiliary variable first, this avoids the need to check for aliasing of vIn in the output variables u and z. The remainder u is initialized via a left shift of uIn and thus needs no alias check against uIn. Since uIn and vIn were both used, z needs no alias checks except against u which is used for storage of the remainder. This change has a minimal impact on performance (see below), but cleans up the initialization code and eliminates the "todo"s. name old time/op new time/op delta Div/20/10-4 86.7ns ± 6% 85.7ns ± 5% ~ (p=0.841 n=5+5) Div/200/100-4 523ns ± 5% 502ns ± 3% -4.13% (p=0.024 n=5+5) Div/2000/1000-4 2.55µs ± 3% 2.59µs ± 5% ~ (p=0.548 n=5+5) Div/20000/10000-4 80.4µs ± 4% 80.0µs ± 2% ~ (p=1.000 n=5+5) Div/200000/100000-4 6.43ms ± 6% 6.35ms ± 4% ~ (p=0.548 n=5+5) Fixes #22928 Change-Id: I30d8498ef1cf8b69b0f827165c517bc25a5c32d7 Reviewed-on: https://go-review.googlesource.com/130775 Reviewed-by: Robert Griesemer <gri@golang.org>
2018-05-23	math/big: reduce allocations in Karatsuba case of sqr	Alexander Döring
	For #23221. Change-Id: If55dcf2e0706d6658f4a0863e3740437e008706c Reviewed-on: https://go-review.googlesource.com/114335 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2018-05-23	math/big: specialize Karatsuba implementation for squaring	Alexander Döring
	Currently we use three different algorithms for squaring: 1. basic multiplication for small numbers 2. basic squaring for medium numbers 3. Karatsuba multiplication for large numbers Change 3. to a version of Karatsuba multiplication specialized for x == y. Increasing the performance of 3. lets us lower the threshold between 2. and 3. Adapt TestCalibrate to the change that 3. isn't independent of the threshold between 1. and 2. any more. Fixes #23221. benchstat old.txt new.txt name old time/op new time/op delta NatSqr/1-4 29.6ns ± 7% 29.5ns ± 5% ~ (p=0.103 n=50+50) NatSqr/2-4 51.9ns ± 1% 51.9ns ± 1% ~ (p=0.693 n=42+49) NatSqr/3-4 64.3ns ± 1% 64.1ns ± 0% -0.26% (p=0.000 n=46+43) NatSqr/5-4 93.5ns ± 2% 93.1ns ± 1% -0.39% (p=0.000 n=48+49) NatSqr/8-4 131ns ± 1% 131ns ± 1% ~ (p=0.870 n=46+49) NatSqr/10-4 175ns ± 1% 175ns ± 1% +0.38% (p=0.000 n=49+47) NatSqr/20-4 426ns ± 1% 429ns ± 1% +0.84% (p=0.000 n=46+48) NatSqr/30-4 702ns ± 2% 699ns ± 1% -0.38% (p=0.011 n=46+44) NatSqr/50-4 1.44µs ± 2% 1.43µs ± 1% -0.54% (p=0.010 n=48+48) NatSqr/80-4 2.85µs ± 1% 2.87µs ± 1% +0.68% (p=0.000 n=47+47) NatSqr/100-4 4.06µs ± 1% 4.07µs ± 1% +0.29% (p=0.000 n=46+45) NatSqr/200-4 13.4µs ± 1% 13.5µs ± 1% +0.73% (p=0.000 n=48+48) NatSqr/300-4 28.5µs ± 1% 28.2µs ± 1% -1.22% (p=0.000 n=46+48) NatSqr/500-4 81.9µs ± 1% 67.0µs ± 1% -18.25% (p=0.000 n=48+48) NatSqr/800-4 161µs ± 1% 140µs ± 1% -13.29% (p=0.000 n=47+48) NatSqr/1000-4 245µs ± 1% 207µs ± 1% -15.17% (p=0.000 n=49+49) go test -v -calibrate --run TestCalibrate ... Calibrating threshold between basicSqr(x) and karatsubaSqr(x) Looking for a timing difference for x between 200 - 500 words by 10 step words = 200 deltaT = -980ns ( -7%) is karatsubaSqr(x) better: false words = 210 deltaT = -773ns ( -5%) is karatsubaSqr(x) better: false words = 220 deltaT = -695ns ( -4%) is karatsubaSqr(x) better: false words = 230 deltaT = -570ns ( -3%) is karatsubaSqr(x) better: false words = 240 deltaT = -458ns ( -2%) is karatsubaSqr(x) better: false words = 250 deltaT = -63ns ( 0%) is karatsubaSqr(x) better: false words = 260 deltaT = 118ns ( 0%) is karatsubaSqr(x) better: true threshold found words = 270 deltaT = 377ns ( 1%) is karatsubaSqr(x) better: true words = 280 deltaT = 765ns ( 3%) is karatsubaSqr(x) better: true words = 290 deltaT = 673ns ( 2%) is karatsubaSqr(x) better: true words = 300 deltaT = 502ns ( 1%) is karatsubaSqr(x) better: true words = 310 deltaT = 629ns ( 2%) is karatsubaSqr(x) better: true words = 320 deltaT = 1.011µs ( 3%) is karatsubaSqr(x) better: true words = 330 deltaT = 1.36µs ( 4%) is karatsubaSqr(x) better: true words = 340 deltaT = 3.001µs ( 8%) is karatsubaSqr(x) better: true words = 350 deltaT = 3.178µs ( 8%) is karatsubaSqr(x) better: true ... Change-Id: I6f13c23d94d042539ac28e77fd2618cdc37a429e Reviewed-on: https://go-review.googlesource.com/105075 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2018-04-05	math/big: clean up z.div(z, x, y) calls	Brian Kessler
	Updates #22830 Due to not checking if the output slices alias in divLarge, calls of the form z.div(z, x, y) caused the slice z to attempt to be used to store both the quotient and the remainder of the division. CL 78995 applies an alias check to correct that error. This CL cleans up the additional div calls that attempt to supply the same slice to hold both the quotient and remainder. Note that the call in expNN was responsible for the reported error in r.Exp(x, 1, m) when r was initialized to a non-zero value. The second instance in expNNMontgomery did not result in an error due to the size of the arguments. // RR = 2*(2_Wlen(m)) mod m RR := nat(nil).setWord(1) zz := nat(nil).shl(RR, uint(2numWords_W)) _, RR = RR.div(RR, zz, m) Specifically, cap(RR) == 5 after setWord(1) due to const e = 4 in z.make(1) len(zz) == 2len(m) + 1 after shifting left, numWords = len(m) Reusing the backing array for z and z2 in div was only triggered if cap(RR) >= len(zz) + 1 and len(m) > 1 so that divLarge was called. But, 5 < 2*len(m) + 2 if len(m) > 1, so new arrays were allocated and the error was never triggered in this case. Change-Id: Iedac80dbbde13216c94659e84d28f6f4be3aaf24 Reviewed-on: https://go-review.googlesource.com/81055 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2018-04-03	math/big: remove "else" from if with block that ends with return	isharipo
	That "else" was needed due to gc DCE limitations. Now it's not the case and we can avoid go lint complaints. (See #23521 and https://golang.org/cl/91056.) There is inlining test for bigEndianWord, so if test is passing, no performance regression should occur. Change-Id: Id84d63f361e5e51a52293904ff042966c83c16e9 Reviewed-on: https://go-review.googlesource.com/104555 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-19	math/big: reduce amount of copying in Montgomery multiplication	Vlad Krasnov
	Instead shifting the accumulator every iteration of the loop, shift once in the end. This significantly improves performance on arm64. On arm64: name old time/op new time/op delta RSA2048Decrypt 3.33ms ± 0% 2.63ms ± 0% -20.94% (p=0.000 n=11+11) RSA2048Sign 4.22ms ± 0% 3.55ms ± 0% -15.89% (p=0.000 n=11+11) 3PrimeRSA2048Decrypt 1.95ms ± 0% 1.59ms ± 0% -18.59% (p=0.000 n=11+11) On Skylake: name old time/op new time/op delta RSA2048Decrypt-8 1.73ms ± 2% 1.55ms ± 2% -10.19% (p=0.000 n=10+10) RSA2048Sign-8 2.17ms ± 2% 2.00ms ± 2% -7.93% (p=0.000 n=10+10) 3PrimeRSA2048Decrypt-8 1.10ms ± 2% 0.96ms ± 2% -13.03% (p=0.000 n=10+9) Change-Id: I5786191a1a09e4217fdb1acfd90880d35c5855f7 Reviewed-on: https://go-review.googlesource.com/99838 Run-TryBot: Vlad Krasnov <vlad@cloudflare.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Adam Langley <agl@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2018-03-19	math/big: add 0 shift fastpath to shl and shr	Alberto Donizetti
	One could expect calls like z.mant.shl(z.mant, shiftAmount) (or higher-level-functions calls that use lhs/rhs) to be almost free when shiftAmount = 0; and expect calls like z.mant.shl(x.mant, 0) to have the same cost of a x.mant -> z.mant copy. Neither of this things are currently true. For an 800 words nat, the first kind of calls cost ~800ns for rigth shifts and ~3.5µs for left shift; while the second kind of calls are doing more work than necessary by calling shlVU/shrVU. This change makes the first kind of calls ({Shl,Shr}Same) almost free, and the second kind of calls ({Shl,Shr}) about 30% faster. name old time/op new time/op delta ZeroShifts/Shl-4 3.64µs ± 3% 2.49µs ± 1% -31.55% (p=0.000 n=10+10) ZeroShifts/ShlSame-4 3.65µs ± 1% 0.01µs ± 1% -99.85% (p=0.000 n=9+9) ZeroShifts/Shr-4 3.65µs ± 1% 2.49µs ± 1% -31.91% (p=0.000 n=10+10) ZeroShifts/ShrSame-4 825ns ± 0% 6ns ± 1% -99.33% (p=0.000 n=9+10) During go test math/big, the shl zeroshift fastpath is triggered 1380 times; while the shr fastpath is triggered 153334 times(!). Change-Id: I5f92b304a40638bd8453a86c87c58e54b337bcdf Reviewed-on: https://go-review.googlesource.com/87660 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2018-03-14	math/big: add comment about internal assumptions on nat values	Robert Griesemer
	Change-Id: I7ed40507a019c0bf521ba748fc22c03d74bb17b7 Reviewed-on: https://go-review.googlesource.com/100719 Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-08	math/big: speedup nat.setBytes for bigger slices	isharipo
	Set up to _S (number of bytes in Uint) bytes at time by using BigEndian.Uint32 and BigEndian.Uint64. The performance improves for slices bigger than _S bytes. This is the case for 128/256bit arith that initializes it's objects from bytes. name old time/op new time/op delta NatSetBytes/8-4 29.8ns ± 1% 11.4ns ± 0% -61.63% (p=0.000 n=9+8) NatSetBytes/24-4 109ns ± 1% 56ns ± 0% -48.75% (p=0.000 n=9+8) NatSetBytes/128-4 420ns ± 2% 110ns ± 1% -73.83% (p=0.000 n=10+10) NatSetBytes/7-4 26.2ns ± 1% 21.3ns ± 2% -18.63% (p=0.000 n=8+9) NatSetBytes/23-4 106ns ± 1% 67ns ± 1% -36.93% (p=0.000 n=9+10) NatSetBytes/127-4 410ns ± 2% 121ns ± 0% -70.46% (p=0.000 n=9+8) Found this optimization opportunity by looking at ethereum_corevm community benchmark cpuprofile. name old time/op new time/op delta OpDiv256-4 715ns ± 1% 596ns ± 1% -16.57% (p=0.008 n=5+5) OpDiv128-4 373ns ± 1% 314ns ± 1% -15.83% (p=0.008 n=5+5) OpDiv64-4 301ns ± 0% 285ns ± 1% -5.12% (p=0.008 n=5+5) Change-Id: I8e5a680ae6284c8233d8d7431d51253a8a740b57 Reviewed-on: https://go-review.googlesource.com/98775 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> Reviewed-by: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-11-30	math/big: protect against aliasing in nat.divLarge	Alberto Donizetti
	In nat.divLarge (having signature (z nat).divLarge(u, uIn, v nat)), we check whether z aliases uIn or v, but aliasing is currently not checked for the u parameter. Unfortunately, z and u aliasing each other can in some cases cause errors in the computation. The q return parameter (which will hold the result's quotient), is unconditionally initialized as q = z.make(m + 1) When cap(z) ≥ m+1, z.make() will reuse z's backing array, causing q and z to share the same backing array. If then z aliases u, setting q during the quotient computation will then corrupt u, which at that point already holds computation state. To fix this, we add an alias(z, u) check at the beginning of the function, taking care of aliasing the same way we already do for uIn and v. Fixes #22830 Change-Id: I3ab81120d5af6db7772a062bb1dfc011de91f7ad Reviewed-on: https://go-review.googlesource.com/78995 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2017-08-22	math/big: use internal sqr on nats	Brian Kessler
	Replace z.mul(x, x) calls on nats in internal code with z.sqr(x) that employs optimized squaring routines. Benchmark results: Exp-4 12.9ms ± 2% 12.8ms ± 3% ~ (p=0.165 n=10+10) Exp2-4 13.0ms ± 4% 12.8ms ± 2% -2.14% (p=0.015 n=8+9) ModSqrt225_Tonelli-4 987µs ± 4% 989µs ± 2% ~ (p=0.673 n=8+9) ModSqrt224_3Mod4-4 300µs ± 2% 301µs ± 3% ~ (p=0.546 n=9+9) ModSqrt5430_Tonelli-4 4.88s ± 6% 4.82s ± 5% ~ (p=0.247 n=10+10) ModSqrt5430_3Mod4-4 1.62s ±10% 1.57s ± 1% ~ (p=0.094 n=9+9) Exp3Power/0x10-4 496ns ± 7% 426ns ± 7% -14.21% (p=0.000 n=10+10) Exp3Power/0x40-4 575ns ± 5% 470ns ± 7% -18.20% (p=0.000 n=9+10) Exp3Power/0x100-4 929ns ±19% 770ns ±10% -17.13% (p=0.000 n=10+10) Exp3Power/0x400-4 1.96µs ± 7% 1.79µs ± 5% -8.68% (p=0.000 n=10+10) Exp3Power/0x1000-4 10.9µs ± 9% 7.9µs ± 5% -28.02% (p=0.000 n=10+10) Exp3Power/0x4000-4 86.8µs ± 8% 67.3µs ± 8% -22.41% (p=0.000 n=10+10) Exp3Power/0x10000-4 750µs ± 8% 731µs ± 1% ~ (p=0.074 n=9+8) Exp3Power/0x40000-4 7.07ms ± 7% 7.05ms ± 4% ~ (p=0.931 n=9+9) Exp3Power/0x100000-4 64.7ms ± 2% 65.6ms ± 6% ~ (p=0.661 n=9+10) Exp3Power/0x400000-4 577ms ± 2% 580ms ± 3% ~ (p=0.931 n=9+9) ProbablyPrime/n=0-4 9.08ms ±17% 9.09ms ±16% ~ (p=0.447 n=9+10) ProbablyPrime/n=1-4 10.8ms ± 4% 10.7ms ± 2% ~ (p=0.243 n=10+9) ProbablyPrime/n=5-4 18.5ms ± 3% 18.5ms ± 1% ~ (p=0.863 n=9+9) ProbablyPrime/n=10-4 28.6ms ± 6% 28.2ms ± 1% ~ (p=0.050 n=9+9) ProbablyPrime/n=20-4 48.4ms ± 4% 48.4ms ± 2% ~ (p=0.739 n=10+10) ProbablyPrime/Lucas-4 6.75ms ± 4% 6.75ms ± 2% ~ (p=0.963 n=9+8) ProbablyPrime/MillerRabinBase2-4 2.00ms ± 5% 2.00ms ± 7% ~ (p=0.931 n=9+9) Change-Id: Ibe9f58d11dbad25eb369faedf480b666a0250a6b Reviewed-on: https://go-review.googlesource.com/56773 Reviewed-by: Robert Griesemer <gri@golang.org>
2017-08-16	math/big: recognize z.Mul(x, x) as squaring of x	Brian Kessler
	updates #13745 Multiprecision squaring can be done in a straightforward manner with about half the multiplications of a basic multiplication due to the symmetry of the operands. This change implements basic squaring for nat types and uses it for Int multiplication when the same variable is supplied to both arguments of z.Mul(x, x). This has some overhead to allocate a temporary variable to hold the cross products, shift them to double and add them to the diagonal terms. There is a speed benefit in the intermediate range when the overhead is neglible and the asymptotic performance of karatsuba multiplication has not been reached. basicSqrThreshold = 20 karatsubaSqrThreshold = 400 Were set by running calibrate_test.go to measure timing differences between the algorithms. Benchmarks for squaring: name old time/op new time/op delta IntSqr/1-4 51.5ns ±25% 25.1ns ± 7% -51.38% (p=0.008 n=5+5) IntSqr/2-4 79.1ns ± 4% 72.4ns ± 2% -8.47% (p=0.008 n=5+5) IntSqr/3-4 102ns ± 4% 97ns ± 5% ~ (p=0.056 n=5+5) IntSqr/5-4 161ns ± 4% 163ns ± 7% ~ (p=0.952 n=5+5) IntSqr/8-4 277ns ± 5% 267ns ± 6% ~ (p=0.087 n=5+5) IntSqr/10-4 358ns ± 3% 360ns ± 4% ~ (p=0.730 n=5+5) IntSqr/20-4 1.07µs ± 3% 1.01µs ± 6% ~ (p=0.056 n=5+5) IntSqr/30-4 2.36µs ± 4% 1.72µs ± 2% -27.03% (p=0.008 n=5+5) IntSqr/50-4 5.19µs ± 3% 3.88µs ± 4% -25.37% (p=0.008 n=5+5) IntSqr/80-4 11.3µs ± 4% 8.6µs ± 3% -23.78% (p=0.008 n=5+5) IntSqr/100-4 16.2µs ± 4% 12.8µs ± 3% -21.49% (p=0.008 n=5+5) IntSqr/200-4 50.1µs ± 5% 44.7µs ± 3% -10.65% (p=0.008 n=5+5) IntSqr/300-4 105µs ±11% 95µs ± 3% -9.50% (p=0.008 n=5+5) IntSqr/500-4 231µs ± 5% 227µs ± 2% ~ (p=0.310 n=5+5) IntSqr/800-4 496µs ± 9% 459µs ± 3% -7.40% (p=0.016 n=5+5) IntSqr/1000-4 700µs ± 3% 710µs ± 5% ~ (p=0.841 n=5+5) Show a speed up of 10-25% in the range where basicSqr is optimal, improved single word squaring and no significant difference when the fallback to standard multiplication is used. Change-Id: Iae2c82ca91cf890823f91e5c83bbe9a2c534b72b Reviewed-on: https://go-review.googlesource.com/53638 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-23	math/big: replace local versions of bitLen, nlz with math/bits versions	Robert Griesemer
	Verified that BenchmarkBitLen time went down from 2.25 ns/op to 0.65 ns/op an a 2.3 GHz Intel Core i7, before removing that benchmark (now covered by math/bits benchmarks). Change-Id: I3890bb7d1889e95b9a94bd68f0bdf06f1885adeb Reviewed-on: https://go-review.googlesource.com/38464 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-11	math/big: make nat.setUint64 vet-friendly	Josh Bleecher Snyder
	nat.setUint64 is nicely generic. By assuming 32- or 64-bit words, however, we can write simpler code, and eliminate some shifts in dead code that vet complains about. Generated code for 64 bit systems is unaltered. Generated code for 32 bit systems is much better. For 386, the routine length drops from 325 bytes of code to 271 bytes of code, with fewer loops. Change-Id: I1bc14c06272dee37a7fcb48d33dd1e621eba945d Reviewed-on: https://go-review.googlesource.com/38070 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2017-02-24	math/big: use math/bits where appropriate	Robert Griesemer
	This change adds math/bits as a new dependency of math/big. - use bits.LeadingZeroes instead of local implementation (they are identical, so there's no performance loss here) - leave other functionality local (ntz, bitLen) since there's faster implementations in math/big at the moment Change-Id: I1218aa8a1df0cc9783583b090a4bb5a8a145c4a2 Reviewed-on: https://go-review.googlesource.com/37141 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-17	math/big: add (*Int).Sqrt	Russ Cox
	This is needed for some of the more complex primality tests (to filter out exact squares), and while the code is simple the boundary conditions are not obvious, so it seems worth having in the library. Change-Id: Ica994a6b6c1e412a6f6d9c3cf823f9b653c6bcbd Reviewed-on: https://go-review.googlesource.com/30706 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2016-10-11	math/big: move ProbablyPrime into its own source file	Russ Cox
	A later CL will be adding more code here. It will help to keep it separate from the other code. Change-Id: I971ba53de819cd10991b51fdec665984939a5f9b Reviewed-on: https://go-review.googlesource.com/30709 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-11	math/big: test and optimize Exp(2, y, n) for large y, odd n	Russ Cox
	The Montgomery multiply code is applicable to this case but was being bypassed. Don't do that. The old test len(x) > 1 was really just a bad approximation to x > 1. name old time/op new time/op delta Exp-8 5.56ms ± 4% 5.73ms ± 3% ~ (p=0.095 n=5+5) Exp2-8 7.59ms ± 1% 5.66ms ± 1% -25.40% (p=0.008 n=5+5) This comes up especially when doing Fermat (Miller-Rabin) primality tests with base 2. Change-Id: I4cc02978db6dfa93f7f3c8f32718e25eedb4f5ed Reviewed-on: https://go-review.googlesource.com/30708 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-10	math/big: make division faster	Russ Cox
	- Add new BenchmarkQuoRem. - Eliminate allocation in divLarge nat pool - Unroll mulAddVWW body 4x - Remove some redundant slice loads in divLarge name old time/op new time/op delta QuoRem-8 2.18µs ± 1% 1.93µs ± 1% -11.38% (p=0.000 n=19+18) The starting point in the comparison here is Cherry's pending CL to turn mulWW and divWW into intrinsics. The optimizations in divLarge work best because all the function calls are gone. The effect of this CL is not as large if you don't assume Cherry's CL. Change-Id: Ia6138907489c5b9168497912e43705634e163b35 Reviewed-on: https://go-review.googlesource.com/30613 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>

Checking that the lengths are equal and panicking teaches the compiler that it can assume “i in range for z” implies “i in range for x”, letting us simplify the actual loops a bit. It also turns up a few places in math/big that were playing maybe a little too fast and loose with slice lengths. Update those to explicitly set all the input slices to the same length. These speedups are basically irrelevant, since they only happen in real code if people are compiling with -tags math_big_pure_go. But at least the code is clearer. benchmark \ system c3h88 c2s16 s7 386 s7-386 c4as16 mac arm loong64 ppc64le riscv64 s390x AddVV/words=1/impl=go ~ +11.20% +5.11% -7.67% -7.77% +1.90% +10.76% -33.22% ~ +10.98% ~ +6.60% AddVV/words=10/impl=go -22.12% -13.48% -10.37% -17.95% -18.07% -24.58% -22.04% -29.95% -14.22% ~ -6.33% +3.66% AddVV/words=16/impl=go -9.75% -13.73% ~ -21.90% -18.66% -30.03% -20.45% -28.09% -17.33% -7.15% -8.96% +12.55% AddVV/words=100/impl=go -5.91% -1.02% ~ -29.23% -22.18% -25.62% -6.49% -23.59% -22.31% -1.88% -14.13% +9.23% AddVV/words=1000/impl=go -0.52% -0.19% -3.58% -33.89% -23.46% -22.46% ~ -24.00% -24.73% +0.93% -15.79% +12.32% AddVV/words=10000/impl=go ~ ~ ~ -33.79% -23.72% -23.79% -5.98% -23.92% ~ +0.78% -15.45% +8.59% AddVV/words=100000/impl=go ~ ~ ~ -33.90% -24.25% -22.82% -4.09% -24.63% ~ +1.00% -13.56% ~ SubVV/words=1/impl=go ~ +11.64% +14.05% ~ -4.07% ~ +10.79% -33.69% ~ ~ +3.89% +12.33% SubVV/words=10/impl=go -10.31% -14.09% -7.38% +13.76% -13.25% -18.05% -20.08% -24.97% -14.15% +10.13% -0.97% -2.51% SubVV/words=16/impl=go -8.06% -13.73% -5.70% +17.00% -12.83% -23.76% -17.52% -25.25% -17.30% -2.80% -4.96% -18.25% SubVV/words=100/impl=go -9.22% -1.30% -2.76% +20.88% -14.35% -15.29% -8.49% -19.64% -22.31% -0.68% -14.30% -9.04% SubVV/words=1000/impl=go -0.60% ~ -3.43% +23.08% -16.14% -11.96% ~ -28.52% -24.73% ~ -15.95% -9.91% SubVV/words=10000/impl=go ~ ~ ~ +26.01% -15.24% -11.92% ~ -28.26% +4.25% ~ -15.42% -5.95% SubVV/words=100000/impl=go ~ ~ ~ +25.71% -15.83% -12.13% ~ -27.88% -1.27% ~ -13.57% -6.72% LshVU/words=1/impl=go +0.56% +0.36% ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ LshVU/words=10/impl=go +13.37% +4.63% ~ ~ ~ ~ ~ -2.90% ~ ~ ~ ~ LshVU/words=16/impl=go +22.83% +6.47% ~ ~ ~ ~ ~ ~ +0.80% ~ ~ +5.88% LshVU/words=100/impl=go +7.56% +13.95% ~ ~ ~ ~ ~ ~ +0.33% -2.50% ~ ~ LshVU/words=1000/impl=go +0.64% +17.92% ~ ~ ~ ~ ~ -6.52% ~ -2.58% ~ ~ LshVU/words=10000/impl=go ~ +17.60% ~ ~ ~ ~ ~ -6.64% -6.22% -1.40% ~ ~ LshVU/words=100000/impl=go ~ +14.57% ~ ~ ~ ~ ~ ~ -5.47% ~ ~ ~ RshVU/words=1/impl=go ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ +2.72% RshVU/words=10/impl=go ~ ~ ~ ~ ~ ~ ~ +2.50% ~ ~ ~ ~ RshVU/words=16/impl=go ~ +0.53% ~ ~ ~ ~ ~ +3.82% ~ ~ ~ ~ RshVU/words=100/impl=go ~ ~ ~ ~ ~ ~ ~ +6.18% ~ ~ ~ ~ RshVU/words=1000/impl=go ~ ~ ~ ~ ~ ~ ~ +7.00% ~ ~ ~ ~ RshVU/words=10000/impl=go ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ RshVU/words=100000/impl=go ~ ~ ~ ~ ~ ~ ~ +7.05% ~ ~ ~ ~ MulAddVWW/words=1/impl=go -10.34% +4.43% +10.62% -1.62% -4.74% -2.86% +11.75% ~ -8.00% +8.89% +3.87% ~ MulAddVWW/words=10/impl=go -1.61% -5.87% ~ -8.30% -4.55% +0.87% ~ -5.28% -20.82% ~ ~ -2.32% MulAddVWW/words=16/impl=go -2.96% -5.28% ~ -9.22% -5.28% ~ ~ -3.74% -19.52% -1.48% -2.53% -9.52% MulAddVWW/words=100/impl=go -3.89% -7.53% +1.93% -10.49% -4.87% -8.27% ~ ~ -0.65% -0.61% -7.59% -20.61% MulAddVWW/words=1000/impl=go -0.45% -3.91% +4.54% -11.46% -4.69% -8.53% ~ ~ -0.05% ~ -8.88% -19.77% MulAddVWW/words=10000/impl=go ~ -3.30% +4.10% -11.34% -4.10% -9.43% ~ -0.61% ~ -0.55% -8.21% -18.48% MulAddVWW/words=100000/impl=go -0.30% -3.03% +4.31% -11.55% -4.41% -9.74% ~ -0.75% +0.63% ~ -7.80% -19.82% AddMulVVWW/words=1/impl=go ~ +13.09% +12.50% -7.05% -10.41% +2.53% +13.32% -3.49% ~ +15.56% +3.62% ~ AddMulVVWW/words=10/impl=go -15.96% -9.06% -5.06% -14.56% -11.83% -5.44% -26.30% -14.23% -11.44% -1.79% -5.93% -6.60% AddMulVVWW/words=16/impl=go -19.05% -12.43% -6.19% -14.24% -12.67% -8.65% -18.64% -16.56% -10.64% -3.00% -7.61% -12.80% AddMulVVWW/words=100/impl=go -22.13% -16.59% -13.04% -13.79% -11.46% -12.01% -6.46% -21.80% -5.08% -3.13% -13.60% -22.53% AddMulVVWW/words=1000/impl=go -17.07% -17.05% -14.08% -13.59% -12.13% -11.21% ~ -22.81% -4.27% -1.27% -16.35% -23.47% AddMulVVWW/words=10000/impl=go -15.03% -16.78% -14.23% -13.86% -11.84% -11.69% ~ -22.75% -13.39% -1.10% -14.37% -22.01% AddMulVVWW/words=100000/impl=go -13.70% -14.90% -14.26% -13.55% -12.04% -11.63% ~ -22.61% ~ -2.53% -10.42% -23.16% Change-Id: Ic6f64344484a762b818c7090d1396afceb638607 Reviewed-on: https://go-review.googlesource.com/c/go/+/665155 Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com>

It is annoying that non-x86 implementations of shlVU and shrVU have to go out of their way to handle the trivial case shift==0 with their own copy loops. Instead, arrange to never call them with shift==0, so that the code can be removed. Unfortunately, there are linknames of shlVU, so we cannot change that function. But we can rename the functions and then leave behind a shlVU wrapper, so do that. Since the big.Int API calls the operations Lsh and Rsh, rename shlVU/shrVU to lshVU/rshVU. Also rename various other shl/shr methods and functions to lsh/rsh. Change-Id: Ieaf54e0110a298730aa3e4566ce5be57ba7fc121 Reviewed-on: https://go-review.googlesource.com/c/go/+/664896 Reviewed-by: Alan Donovan <adonovan@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

addMulVVW is an unnecessarily special case. All other assembly routines taking []Word (V as in vector) arguments take separate source and destination. For example: addVV: z = x+y mulAddVWW: z = x*m+a addMulVVW uses the z parameter as both destination and source: addMulVVW: z = z+x*m Even looking at the signatures is confusing: all the VV routines take two input vectors x and y, but addMulVVW takes only x: where is y? (The answer is that the two inputs are z and x.) It would be nice to fix this, both for understandability and regularity, and to simplify a future assembly generator. We cannot remove or redefine addMulVVW, because it has been used in linknames. Instead, the CL adds a new final addend argument ‘a’ like in mulAddVWW, making the natural name addMulVVWW (two input vectors, two input words): addMulVVWW: z = x+y*m+a This CL updates all the assembly implementations to rename the inputs z, x, y -> x, y, m, and then introduces a separate destination z. Change-Id: Ib76c80b53f6d1f4a901f663566e9c4764bb20488 Reviewed-on: https://go-review.googlesource.com/c/go/+/664895 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com>

The old Karatsuba implementation only operated on lengths that are a power of two times a number smaller than karatsubaThreshold. For example, when karatsubaThreshold = 40, multiplying a pair of 99-word numbers runs karatsuba on the low 96 (= 39<<2) words and then has to fix up the answer to include the high 3 words of each. I suspect this requirement was needed to make the analysis of how many temporary words to reserve easier, back when the answer was 3*n and depended on exactly halving the size at each Karatsuba step. Now that we have the more flexible temporary allocation stack, we can change Karatsuba to accept operands of odd length. Doing so avoids most of the fixup that the old approach required. For example, multiplying a pair of 99-word numbers runs karatsuba on all 99 words now. This is simpler and about the same speed or, for large cases, faster. goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) CPU @ 3.10GHz │ old │ new │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-16 99.62n ± 3% 99.10n ± 3% ~ (p=0.009 n=15) GCD10x10/WithXY-16 243.4n ± 1% 245.2n ± 1% ~ (p=0.009 n=15) GCD100x100/WithoutXY-16 921.9n ± 1% 919.2n ± 1% ~ (p=0.076 n=15) GCD100x100/WithXY-16 1.527µ ± 1% 1.526µ ± 0% ~ (p=0.813 n=15) GCD1000x1000/WithoutXY-16 9.704µ ± 1% 9.696µ ± 0% ~ (p=0.532 n=15) GCD1000x1000/WithXY-16 14.03µ ± 1% 13.96µ ± 0% ~ (p=0.014 n=15) GCD10000x10000/WithoutXY-16 206.5µ ± 2% 206.5µ ± 0% ~ (p=0.967 n=15) GCD10000x10000/WithXY-16 398.0µ ± 1% 397.4µ ± 0% ~ (p=0.683 n=15) Div/20/10-16 22.22n ± 0% 22.23n ± 0% ~ (p=0.105 n=15) Div/40/20-16 22.23n ± 0% 22.23n ± 0% ~ (p=0.307 n=15) Div/100/50-16 55.47n ± 0% 55.47n ± 0% ~ (p=0.573 n=15) Div/200/100-16 174.9n ± 1% 174.6n ± 1% ~ (p=0.814 n=15) Div/400/200-16 209.5n ± 1% 210.5n ± 1% ~ (p=0.454 n=15) Div/1000/500-16 379.9n ± 0% 383.5n ± 2% ~ (p=0.123 n=15) Div/2000/1000-16 780.1n ± 0% 784.6n ± 1% +0.58% (p=0.000 n=15) Div/20000/10000-16 25.22µ ± 1% 25.15µ ± 0% ~ (p=0.213 n=15) Div/200000/100000-16 921.8µ ± 1% 926.1µ ± 0% ~ (p=0.009 n=15) Div/2000000/1000000-16 37.91m ± 0% 35.63m ± 0% -6.02% (p=0.000 n=15) Div/20000000/10000000-16 1.378 ± 0% 1.336 ± 0% -3.03% (p=0.000 n=15) NatMul/10-16 166.8n ± 4% 168.9n ± 3% ~ (p=0.008 n=15) NatMul/100-16 5.519µ ± 2% 5.548µ ± 4% ~ (p=0.032 n=15) NatMul/1000-16 230.4µ ± 1% 220.2µ ± 1% -4.43% (p=0.000 n=15) NatMul/10000-16 8.569m ± 1% 8.640m ± 1% ~ (p=0.005 n=15) NatMul/100000-16 376.5m ± 1% 334.1m ± 0% -11.26% (p=0.000 n=15) NatSqr/1-16 27.85n ± 5% 28.60n ± 2% ~ (p=0.123 n=15) NatSqr/2-16 47.99n ± 2% 48.84n ± 1% ~ (p=0.008 n=15) NatSqr/3-16 59.41n ± 2% 60.87n ± 2% +2.46% (p=0.001 n=15) NatSqr/5-16 87.27n ± 2% 89.31n ± 3% ~ (p=0.087 n=15) NatSqr/8-16 124.6n ± 3% 128.9n ± 3% ~ (p=0.006 n=15) NatSqr/10-16 166.3n ± 3% 172.7n ± 3% ~ (p=0.002 n=15) NatSqr/20-16 385.2n ± 2% 394.7n ± 3% ~ (p=0.036 n=15) NatSqr/30-16 622.7n ± 3% 642.9n ± 3% ~ (p=0.032 n=15) NatSqr/50-16 1.274µ ± 3% 1.323µ ± 4% ~ (p=0.003 n=15) NatSqr/80-16 2.606µ ± 4% 2.714µ ± 4% ~ (p=0.044 n=15) NatSqr/100-16 3.731µ ± 4% 3.871µ ± 4% ~ (p=0.038 n=15) NatSqr/200-16 12.99µ ± 2% 13.09µ ± 3% ~ (p=0.838 n=15) NatSqr/300-16 22.87µ ± 2% 23.25µ ± 2% ~ (p=0.285 n=15) NatSqr/500-16 58.43µ ± 1% 58.25µ ± 2% ~ (p=0.345 n=15) NatSqr/800-16 115.3µ ± 3% 116.2µ ± 3% ~ (p=0.126 n=15) NatSqr/1000-16 173.9µ ± 1% 174.3µ ± 1% ~ (p=0.935 n=15) NatSqr/10000-16 6.133m ± 2% 6.034m ± 1% -1.62% (p=0.000 n=15) NatSqr/100000-16 253.8m ± 1% 241.5m ± 0% -4.87% (p=0.000 n=15) geomean 7.745µ 7.760µ +0.19% goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz │ old │ new │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-88 62.17n ± 4% 61.44n ± 0% -1.17% (p=0.000 n=15) GCD10x10/WithXY-88 173.4n ± 2% 172.4n ± 4% ~ (p=0.615 n=15) GCD100x100/WithoutXY-88 584.0n ± 1% 582.9n ± 0% ~ (p=0.009 n=15) GCD100x100/WithXY-88 1.098µ ± 1% 1.091µ ± 2% ~ (p=0.002 n=15) GCD1000x1000/WithoutXY-88 6.055µ ± 0% 6.049µ ± 0% ~ (p=0.007 n=15) GCD1000x1000/WithXY-88 9.430µ ± 0% 9.417µ ± 1% ~ (p=0.123 n=15) GCD10000x10000/WithoutXY-88 153.4µ ± 2% 149.0µ ± 2% -2.85% (p=0.000 n=15) GCD10000x10000/WithXY-88 350.6µ ± 3% 349.0µ ± 2% ~ (p=0.126 n=15) Div/20/10-88 13.12n ± 0% 13.12n ± 1% 0.00% (p=0.042 n=15) Div/40/20-88 13.12n ± 0% 13.13n ± 0% ~ (p=0.004 n=15) Div/100/50-88 25.49n ± 0% 25.49n ± 0% ~ (p=0.452 n=15) Div/200/100-88 115.7n ± 2% 113.8n ± 2% ~ (p=0.212 n=15) Div/400/200-88 135.0n ± 1% 136.1n ± 1% ~ (p=0.005 n=15) Div/1000/500-88 257.5n ± 1% 259.9n ± 1% ~ (p=0.004 n=15) Div/2000/1000-88 567.5n ± 1% 572.4n ± 2% ~ (p=0.616 n=15) Div/20000/10000-88 25.65µ ± 0% 25.77µ ± 1% ~ (p=0.032 n=15) Div/200000/100000-88 777.4µ ± 1% 754.3µ ± 1% -2.97% (p=0.000 n=15) Div/2000000/1000000-88 33.66m ± 0% 31.37m ± 0% -6.81% (p=0.000 n=15) Div/20000000/10000000-88 1.320 ± 0% 1.266 ± 0% -4.04% (p=0.000 n=15) NatMul/10-88 151.9n ± 7% 143.3n ± 7% ~ (p=0.878 n=15) NatMul/100-88 4.418µ ± 2% 4.337µ ± 3% ~ (p=0.512 n=15) NatMul/1000-88 206.8µ ± 1% 189.8µ ± 1% -8.25% (p=0.000 n=15) NatMul/10000-88 8.531m ± 1% 8.095m ± 0% -5.12% (p=0.000 n=15) NatMul/100000-88 298.9m ± 0% 260.5m ± 1% -12.85% (p=0.000 n=15) NatSqr/1-88 27.55n ± 6% 28.25n ± 7% ~ (p=0.024 n=15) NatSqr/2-88 44.71n ± 6% 46.21n ± 9% ~ (p=0.024 n=15) NatSqr/3-88 55.44n ± 4% 58.41n ± 10% ~ (p=0.126 n=15) NatSqr/5-88 80.71n ± 5% 81.41n ± 5% ~ (p=0.032 n=15) NatSqr/8-88 115.7n ± 4% 115.4n ± 5% ~ (p=0.814 n=15) NatSqr/10-88 147.4n ± 4% 147.3n ± 4% ~ (p=0.505 n=15) NatSqr/20-88 337.8n ± 3% 337.3n ± 4% ~ (p=0.814 n=15) NatSqr/30-88 556.9n ± 3% 557.6n ± 4% ~ (p=0.814 n=15) NatSqr/50-88 1.208µ ± 4% 1.208µ ± 3% ~ (p=0.910 n=15) NatSqr/80-88 2.591µ ± 3% 2.581µ ± 3% ~ (p=0.705 n=15) NatSqr/100-88 3.870µ ± 3% 3.858µ ± 3% ~ (p=0.846 n=15) NatSqr/200-88 14.43µ ± 3% 14.28µ ± 2% ~ (p=0.383 n=15) NatSqr/300-88 24.68µ ± 2% 24.49µ ± 2% ~ (p=0.624 n=15) NatSqr/500-88 66.27µ ± 1% 66.18µ ± 1% ~ (p=0.735 n=15) NatSqr/800-88 128.7µ ± 1% 127.4µ ± 1% ~ (p=0.050 n=15) NatSqr/1000-88 198.7µ ± 1% 197.7µ ± 1% ~ (p=0.229 n=15) NatSqr/10000-88 6.582m ± 1% 6.426m ± 1% -2.37% (p=0.000 n=15) NatSqr/100000-88 274.3m ± 0% 267.3m ± 0% -2.57% (p=0.000 n=15) geomean 6.518µ 6.438µ -1.22% goos: linux goarch: arm64 pkg: math/big │ old │ new │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-16 61.70n ± 1% 61.32n ± 1% ~ (p=0.361 n=15) GCD10x10/WithXY-16 217.3n ± 1% 217.0n ± 1% ~ (p=0.395 n=15) GCD100x100/WithoutXY-16 569.7n ± 0% 572.6n ± 2% ~ (p=0.213 n=15) GCD100x100/WithXY-16 1.241µ ± 1% 1.236µ ± 1% ~ (p=0.157 n=15) GCD1000x1000/WithoutXY-16 5.558µ ± 0% 5.566µ ± 0% ~ (p=0.228 n=15) GCD1000x1000/WithXY-16 9.319µ ± 0% 9.326µ ± 0% ~ (p=0.233 n=15) GCD10000x10000/WithoutXY-16 126.4µ ± 2% 128.7µ ± 3% ~ (p=0.081 n=15) GCD10000x10000/WithXY-16 279.3µ ± 0% 278.3µ ± 5% ~ (p=0.187 n=15) Div/20/10-16 15.12n ± 1% 15.21n ± 1% ~ (p=0.490 n=15) Div/40/20-16 15.11n ± 0% 15.23n ± 1% ~ (p=0.107 n=15) Div/100/50-16 26.53n ± 0% 26.50n ± 0% ~ (p=0.299 n=15) Div/200/100-16 123.7n ± 0% 124.0n ± 0% ~ (p=0.086 n=15) Div/400/200-16 142.5n ± 0% 142.4n ± 0% ~ (p=0.039 n=15) Div/1000/500-16 259.9n ± 1% 261.2n ± 1% ~ (p=0.044 n=15) Div/2000/1000-16 539.4n ± 1% 532.3n ± 1% -1.32% (p=0.001 n=15) Div/20000/10000-16 22.43µ ± 0% 22.32µ ± 0% -0.49% (p=0.000 n=15) Div/200000/100000-16 898.3µ ± 0% 889.6µ ± 0% -0.96% (p=0.000 n=15) Div/2000000/1000000-16 38.37m ± 0% 35.11m ± 0% -8.49% (p=0.000 n=15) Div/20000000/10000000-16 1.449 ± 0% 1.384 ± 0% -4.48% (p=0.000 n=15) NatMul/10-16 182.0n ± 1% 177.8n ± 1% -2.31% (p=0.000 n=15) NatMul/100-16 5.537µ ± 0% 5.693µ ± 0% +2.82% (p=0.000 n=15) NatMul/1000-16 229.9µ ± 0% 224.8µ ± 0% -2.24% (p=0.000 n=15) NatMul/10000-16 8.985m ± 0% 8.751m ± 0% -2.61% (p=0.000 n=15) NatMul/100000-16 371.1m ± 0% 331.5m ± 0% -10.66% (p=0.000 n=15) NatSqr/1-16 46.77n ± 6% 42.76n ± 1% -8.57% (p=0.000 n=15) NatSqr/2-16 66.99n ± 4% 63.62n ± 1% -5.03% (p=0.000 n=15) NatSqr/3-16 76.79n ± 4% 73.42n ± 1% ~ (p=0.007 n=15) NatSqr/5-16 99.00n ± 3% 95.35n ± 1% -3.69% (p=0.000 n=15) NatSqr/8-16 160.0n ± 3% 155.1n ± 1% -3.06% (p=0.001 n=15) NatSqr/10-16 178.4n ± 2% 175.9n ± 0% -1.40% (p=0.001 n=15) NatSqr/20-16 361.9n ± 2% 361.3n ± 0% ~ (p=0.083 n=15) NatSqr/30-16 584.7n ± 0% 586.8n ± 0% +0.36% (p=0.000 n=15) NatSqr/50-16 1.327µ ± 0% 1.329µ ± 0% ~ (p=0.349 n=15) NatSqr/80-16 2.893µ ± 1% 2.925µ ± 0% +1.11% (p=0.000 n=15) NatSqr/100-16 4.330µ ± 1% 4.381µ ± 0% +1.18% (p=0.000 n=15) NatSqr/200-16 16.25µ ± 1% 16.43µ ± 0% +1.07% (p=0.000 n=15) NatSqr/300-16 27.85µ ± 1% 28.06µ ± 0% +0.77% (p=0.000 n=15) NatSqr/500-16 76.01µ ± 0% 76.34µ ± 0% ~ (p=0.002 n=15) NatSqr/800-16 146.8µ ± 0% 148.1µ ± 0% +0.83% (p=0.000 n=15) NatSqr/1000-16 228.2µ ± 0% 228.6µ ± 0% ~ (p=0.123 n=15) NatSqr/10000-16 7.524m ± 0% 7.426m ± 0% -1.31% (p=0.000 n=15) NatSqr/100000-16 316.7m ± 0% 309.2m ± 0% -2.36% (p=0.000 n=15) geomean 7.264µ 7.172µ -1.27% goos: darwin goarch: arm64 pkg: math/big cpu: Apple M3 Pro │ old │ new │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-12 32.61n ± 1% 32.42n ± 1% ~ (p=0.021 n=15) GCD10x10/WithXY-12 87.70n ± 1% 88.42n ± 1% ~ (p=0.010 n=15) GCD100x100/WithoutXY-12 305.9n ± 0% 306.4n ± 0% ~ (p=0.003 n=15) GCD100x100/WithXY-12 560.3n ± 2% 556.6n ± 1% ~ (p=0.018 n=15) GCD1000x1000/WithoutXY-12 3.509µ ± 2% 3.464µ ± 1% ~ (p=0.145 n=15) GCD1000x1000/WithXY-12 5.347µ ± 2% 5.372µ ± 1% ~ (p=0.046 n=15) GCD10000x10000/WithoutXY-12 73.75µ ± 1% 73.99µ ± 1% ~ (p=0.004 n=15) GCD10000x10000/WithXY-12 148.4µ ± 0% 147.8µ ± 1% ~ (p=0.076 n=15) Div/20/10-12 9.481n ± 0% 9.462n ± 1% ~ (p=0.631 n=15) Div/40/20-12 9.457n ± 0% 9.462n ± 1% ~ (p=0.798 n=15) Div/100/50-12 14.91n ± 0% 14.79n ± 1% -0.80% (p=0.000 n=15) Div/200/100-12 84.56n ± 1% 84.60n ± 1% ~ (p=0.271 n=15) Div/400/200-12 103.8n ± 0% 102.8n ± 0% -0.96% (p=0.000 n=15) Div/1000/500-12 181.3n ± 1% 184.2n ± 2% ~ (p=0.091 n=15) Div/2000/1000-12 397.5n ± 0% 397.4n ± 0% ~ (p=0.299 n=15) Div/20000/10000-12 14.04µ ± 1% 13.99µ ± 0% ~ (p=0.221 n=15) Div/200000/100000-12 523.1µ ± 0% 514.0µ ± 3% ~ (p=0.775 n=15) Div/2000000/1000000-12 21.58m ± 0% 20.01m ± 1% -7.29% (p=0.000 n=15) Div/20000000/10000000-12 813.5m ± 0% 796.2m ± 1% -2.13% (p=0.000 n=15) NatMul/10-12 80.46n ± 1% 80.02n ± 1% ~ (p=0.063 n=15) NatMul/100-12 2.904µ ± 0% 2.979µ ± 1% +2.58% (p=0.000 n=15) NatMul/1000-12 127.8µ ± 0% 122.3µ ± 0% -4.28% (p=0.000 n=15) NatMul/10000-12 5.141m ± 0% 4.975m ± 1% -3.23% (p=0.000 n=15) NatMul/100000-12 208.8m ± 0% 189.6m ± 3% -9.21% (p=0.000 n=15) NatSqr/1-12 11.90n ± 1% 11.76n ± 1% ~ (p=0.059 n=15) NatSqr/2-12 21.33n ± 1% 21.12n ± 0% ~ (p=0.063 n=15) NatSqr/3-12 26.05n ± 1% 25.79n ± 0% ~ (p=0.002 n=15) NatSqr/5-12 37.31n ± 0% 36.98n ± 1% ~ (p=0.008 n=15) NatSqr/8-12 63.07n ± 0% 62.75n ± 1% ~ (p=0.061 n=15) NatSqr/10-12 79.48n ± 0% 79.59n ± 0% ~ (p=0.455 n=15) NatSqr/20-12 173.1n ± 0% 173.2n ± 1% ~ (p=0.518 n=15) NatSqr/30-12 288.6n ± 1% 289.2n ± 0% ~ (p=0.030 n=15) NatSqr/50-12 653.3n ± 0% 653.3n ± 0% ~ (p=0.361 n=15) NatSqr/80-12 1.492µ ± 0% 1.496µ ± 0% ~ (p=0.018 n=15) NatSqr/100-12 2.270µ ± 1% 2.270µ ± 0% ~ (p=0.326 n=15) NatSqr/200-12 8.776µ ± 1% 8.784µ ± 1% ~ (p=0.083 n=15) NatSqr/300-12 15.07µ ± 0% 15.09µ ± 0% ~ (p=0.455 n=15) NatSqr/500-12 41.71µ ± 0% 41.77µ ± 1% ~ (p=0.305 n=15) NatSqr/800-12 80.77µ ± 1% 80.59µ ± 0% ~ (p=0.113 n=15) NatSqr/1000-12 126.4µ ± 1% 126.5µ ± 0% ~ (p=0.683 n=15) NatSqr/10000-12 4.204m ± 0% 4.119m ± 0% -2.02% (p=0.000 n=15) NatSqr/100000-12 177.0m ± 0% 172.9m ± 0% -2.31% (p=0.000 n=15) geomean 3.790µ 3.757µ -0.87% Change-Id: Ifc7a9b61f678df216690511ac8bb9143189a795e Reviewed-on: https://go-review.googlesource.com/c/go/+/652057 Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Griesemer <gri@google.com>

No code changes. This CL moves the multiplication (and squaring) code into natmul.go, in preparation for cleaning up Karatsuba and then adding Toom-Cook and FFT-based multiplication. Change-Id: I7f84328284cc4e1ca4da0ebb9f666a5535e8d7f2 Reviewed-on: https://go-review.googlesource.com/c/go/+/652055 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Alan Donovan <adonovan@google.com>

In the early days of math/big, algorithms that needed more space grew the result larger than it needed to be and then used the high words as extra space. This made results their own temporary space caches, at the cost that saving a result in a data structure might hold significantly more memory than necessary. Specifically, new(big.Int).Mul(x, y) returned a big.Int with a backing slice 3X as big as it strictly needed to be. If you are storing many multiplication results, or even a single large result, the 3X overhead can add up. This approach to storage for temporaries also requires being able to analyze the algorithms to predict the exact amount they need, which can be difficult. For both these reasons, the implementation of recursive long division, which came later, introduced a “nat pool” where temporaries could be stored and reused, or reclaimed by the GC when no longer used. This avoids the storage and bookkeeping overheads but introduces a per-temporary sync.Pool overhead. divRecursiveStep takes an array of cached temporaries to remove some of that overhead. The nat pool was better but is still not quite right. This CL introduces something even better than the nat pool (still probably not quite right, but the best I can see for now): a sync.Pool holding stacks for allocating temporaries. Now an operation can get one stack out of the pool and then allocate as many temporaries as it needs during the operation, eventually returning the stack back to the pool. The sync.Pool operations are now per-exported-operation (like big.Int.Mul), not per-temporary. This CL converts both the pre-allocation in nat.mul and the uses of the nat pool to use stack pools instead. This simplifies some code and sets us up better for more complex algorithms (such as Toom-Cook or FFT-based multiplication) that need more temporaries. It is also a little bit faster. goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) CPU @ 3.10GHz │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-16 23.68n ± 0% 22.21n ± 0% -6.21% (p=0.000 n=15) Div/40/20-16 23.68n ± 0% 22.21n ± 0% -6.21% (p=0.000 n=15) Div/100/50-16 56.65n ± 0% 55.53n ± 0% -1.98% (p=0.000 n=15) Div/200/100-16 194.6n ± 1% 172.8n ± 0% -11.20% (p=0.000 n=15) Div/400/200-16 232.1n ± 0% 206.7n ± 0% -10.94% (p=0.000 n=15) Div/1000/500-16 405.3n ± 1% 383.8n ± 0% -5.30% (p=0.000 n=15) Div/2000/1000-16 810.4n ± 1% 795.2n ± 0% -1.88% (p=0.000 n=15) Div/20000/10000-16 25.88µ ± 0% 25.39µ ± 0% -1.89% (p=0.000 n=15) Div/200000/100000-16 931.5µ ± 0% 924.3µ ± 0% -0.77% (p=0.000 n=15) Div/2000000/1000000-16 37.77m ± 0% 37.75m ± 0% ~ (p=0.098 n=15) Div/20000000/10000000-16 1.367 ± 0% 1.377 ± 0% +0.72% (p=0.003 n=15) NatMul/10-16 168.5n ± 3% 164.0n ± 4% ~ (p=0.751 n=15) NatMul/100-16 6.086µ ± 3% 5.380µ ± 3% -11.60% (p=0.000 n=15) NatMul/1000-16 238.1µ ± 3% 228.3µ ± 1% -4.12% (p=0.000 n=15) NatMul/10000-16 8.721m ± 2% 8.518m ± 1% -2.33% (p=0.000 n=15) NatMul/100000-16 369.6m ± 0% 371.1m ± 0% +0.42% (p=0.000 n=15) geomean 19.57µ 18.74µ -4.21% │ old │ new │ │ B/op │ B/op vs base │ NatMul/10-16 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-16 4.750Ki ± 0% 1.751Ki ± 0% -63.14% (p=0.000 n=15) NatMul/1000-16 48.16Ki ± 0% 16.02Ki ± 0% -66.73% (p=0.000 n=15) NatMul/10000-16 482.9Ki ± 1% 165.4Ki ± 3% -65.75% (p=0.000 n=15) NatMul/100000-16 5.747Mi ± 7% 4.197Mi ± 0% -26.97% (p=0.000 n=15) geomean 41.42Ki 20.63Ki -50.18% ¹ all samples are equal │ old │ new │ │ allocs/op │ allocs/op vs base │ NatMul/10-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/1000-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/10000-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100000-16 7.000 ± 14% 7.000 ± 14% ~ (p=0.668 n=15) geomean 1.476 1.476 +0.00% ¹ all samples are equal goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-88 15.84n ± 1% 13.12n ± 0% -17.17% (p=0.000 n=15) Div/40/20-88 15.88n ± 1% 13.12n ± 0% -17.38% (p=0.000 n=15) Div/100/50-88 26.42n ± 0% 25.47n ± 0% -3.60% (p=0.000 n=15) Div/200/100-88 132.4n ± 0% 114.9n ± 0% -13.22% (p=0.000 n=15) Div/400/200-88 150.1n ± 0% 135.6n ± 0% -9.66% (p=0.000 n=15) Div/1000/500-88 275.5n ± 0% 264.1n ± 0% -4.14% (p=0.000 n=15) Div/2000/1000-88 586.5n ± 0% 581.1n ± 0% -0.92% (p=0.000 n=15) Div/20000/10000-88 25.87µ ± 0% 25.72µ ± 0% -0.59% (p=0.000 n=15) Div/200000/100000-88 772.2µ ± 0% 779.0µ ± 0% +0.88% (p=0.000 n=15) Div/2000000/1000000-88 33.36m ± 0% 33.63m ± 0% +0.80% (p=0.000 n=15) Div/20000000/10000000-88 1.307 ± 0% 1.320 ± 0% +1.03% (p=0.000 n=15) NatMul/10-88 140.4n ± 0% 148.8n ± 4% +5.98% (p=0.000 n=15) NatMul/100-88 4.663µ ± 1% 4.388µ ± 1% -5.90% (p=0.000 n=15) NatMul/1000-88 207.7µ ± 0% 205.8µ ± 0% -0.89% (p=0.000 n=15) NatMul/10000-88 8.456m ± 0% 8.468m ± 0% +0.14% (p=0.021 n=15) NatMul/100000-88 295.1m ± 0% 297.9m ± 0% +0.94% (p=0.000 n=15) geomean 14.96µ 14.33µ -4.23% │ old │ new │ │ B/op │ B/op vs base │ NatMul/10-88 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-88 4.750Ki ± 0% 1.758Ki ± 0% -62.99% (p=0.000 n=15) NatMul/1000-88 48.44Ki ± 0% 16.08Ki ± 0% -66.80% (p=0.000 n=15) NatMul/10000-88 489.7Ki ± 1% 166.1Ki ± 3% -66.08% (p=0.000 n=15) NatMul/100000-88 5.546Mi ± 0% 3.819Mi ± 60% -31.15% (p=0.000 n=15) geomean 41.29Ki 20.30Ki -50.85% ¹ all samples are equal │ old │ new │ │ allocs/op │ allocs/op vs base │ NatMul/10-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/1000-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/10000-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100000-88 5.000 ± 20% 6.000 ± 67% ~ (p=0.672 n=15) geomean 1.380 1.431 +3.71% ¹ all samples are equal goos: linux goarch: arm64 pkg: math/big │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-16 15.85n ± 0% 15.23n ± 0% -3.91% (p=0.000 n=15) Div/40/20-16 15.88n ± 0% 15.22n ± 0% -4.16% (p=0.000 n=15) Div/100/50-16 29.69n ± 0% 26.39n ± 0% -11.11% (p=0.000 n=15) Div/200/100-16 149.2n ± 0% 123.3n ± 0% -17.36% (p=0.000 n=15) Div/400/200-16 160.3n ± 0% 139.2n ± 0% -13.16% (p=0.000 n=15) Div/1000/500-16 271.0n ± 0% 256.1n ± 0% -5.50% (p=0.000 n=15) Div/2000/1000-16 545.3n ± 0% 527.0n ± 0% -3.36% (p=0.000 n=15) Div/20000/10000-16 22.60µ ± 0% 22.20µ ± 0% -1.77% (p=0.000 n=15) Div/200000/100000-16 889.0µ ± 0% 892.2µ ± 0% +0.35% (p=0.000 n=15) Div/2000000/1000000-16 38.01m ± 0% 38.12m ± 0% +0.30% (p=0.000 n=15) Div/20000000/10000000-16 1.437 ± 0% 1.444 ± 0% +0.50% (p=0.000 n=15) NatMul/10-16 166.4n ± 2% 169.5n ± 1% +1.86% (p=0.000 n=15) NatMul/100-16 5.733µ ± 1% 5.570µ ± 1% -2.84% (p=0.000 n=15) NatMul/1000-16 232.6µ ± 1% 229.8µ ± 0% -1.22% (p=0.000 n=15) NatMul/10000-16 9.039m ± 1% 8.969m ± 0% -0.77% (p=0.000 n=15) NatMul/100000-16 367.0m ± 0% 368.8m ± 0% +0.48% (p=0.000 n=15) geomean 16.15µ 15.50µ -4.01% │ old │ new │ │ B/op │ B/op vs base │ NatMul/10-16 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-16 4.750Ki ± 0% 1.751Ki ± 0% -63.14% (p=0.000 n=15) NatMul/1000-16 48.33Ki ± 0% 16.02Ki ± 0% -66.85% (p=0.000 n=15) NatMul/10000-16 536.5Ki ± 1% 165.7Ki ± 3% -69.12% (p=0.000 n=15) NatMul/100000-16 6.078Mi ± 6% 4.197Mi ± 0% -30.94% (p=0.000 n=15) geomean 42.81Ki 20.64Ki -51.78% ¹ all samples are equal │ old │ new │ │ allocs/op │ allocs/op vs base │ NatMul/10-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/1000-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/10000-16 2.000 ± 50% 1.000 ± 0% -50.00% (p=0.001 n=15) NatMul/100000-16 9.000 ± 11% 8.000 ± 12% -11.11% (p=0.001 n=15) geomean 1.783 1.516 -14.97% ¹ all samples are equal goos: darwin goarch: arm64 pkg: math/big cpu: Apple M3 Pro │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-12 9.850n ± 1% 9.405n ± 1% -4.52% (p=0.000 n=15) Div/40/20-12 9.858n ± 0% 9.403n ± 1% -4.62% (p=0.000 n=15) Div/100/50-12 16.40n ± 1% 14.81n ± 0% -9.70% (p=0.000 n=15) Div/200/100-12 88.48n ± 2% 80.88n ± 0% -8.59% (p=0.000 n=15) Div/400/200-12 107.90n ± 1% 99.28n ± 1% -7.99% (p=0.000 n=15) Div/1000/500-12 188.8n ± 1% 178.6n ± 1% -5.40% (p=0.000 n=15) Div/2000/1000-12 399.9n ± 0% 389.1n ± 0% -2.70% (p=0.000 n=15) Div/20000/10000-12 13.94µ ± 2% 13.81µ ± 1% ~ (p=0.574 n=15) Div/200000/100000-12 523.8µ ± 0% 521.7µ ± 0% -0.40% (p=0.000 n=15) Div/2000000/1000000-12 21.46m ± 0% 21.48m ± 0% ~ (p=0.067 n=15) Div/20000000/10000000-12 812.5m ± 0% 812.9m ± 0% ~ (p=0.061 n=15) NatMul/10-12 77.14n ± 0% 78.35n ± 1% +1.57% (p=0.000 n=15) NatMul/100-12 2.999µ ± 0% 2.871µ ± 1% -4.27% (p=0.000 n=15) NatMul/1000-12 126.2µ ± 0% 126.8µ ± 0% +0.51% (p=0.011 n=15) NatMul/10000-12 5.099m ± 0% 5.125m ± 0% +0.51% (p=0.000 n=15) NatMul/100000-12 206.7m ± 0% 208.4m ± 0% +0.80% (p=0.000 n=15) geomean 9.512µ 9.236µ -2.91% │ old │ new │ │ B/op │ B/op vs base │ NatMul/10-12 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-12 4.750Ki ± 0% 1.750Ki ± 0% -63.16% (p=0.000 n=15) NatMul/1000-12 48.13Ki ± 0% 16.01Ki ± 0% -66.73% (p=0.000 n=15) NatMul/10000-12 483.5Ki ± 1% 163.2Ki ± 2% -66.24% (p=0.000 n=15) NatMul/100000-12 5.480Mi ± 4% 1.532Mi ± 104% -72.05% (p=0.000 n=15) geomean 41.03Ki 16.82Ki -59.01% ¹ all samples are equal │ old │ new │ │ allocs/op │ allocs/op vs base │ NatMul/10-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/1000-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/10000-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100000-12 5.000 ± 0% 1.000 ± 400% -80.00% (p=0.007 n=15) geomean 1.380 1.000 -27.52% ¹ all samples are equal Change-Id: I7efa6fe37971ed26ae120a32250fcb47ece0a011 Reviewed-on: https://go-review.googlesource.com/c/go/+/650638 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Alan Donovan <adonovan@google.com>

If Be and Le stand for big-endian and little-endian, then they should be BE and LE. Change-Id: I723e3962b8918da84791783d3c547638f1c9e8a9 Reviewed-on: https://go-review.googlesource.com/c/go/+/627376 Reviewed-by: Robert Griesemer <gri@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

Change-Id: Id07f16d14133ee539bc2880b39641c42418fa6e2 GitHub-Last-Rev: 7b327d508f677f2476d24f046d25921f4599dd9a GitHub-Pull-Request: golang/go#67319 Reviewed-on: https://go-review.googlesource.com/c/go/+/585016 Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com>

Change-Id: I07c3a498ce1e462c3d1703d77e7d7824e9334651 GitHub-Last-Rev: 2ba8c4c705eaeb0772109ece7296978b62467eb3 GitHub-Pull-Request: golang/go#66312 Reviewed-on: https://go-review.googlesource.com/c/go/+/571636 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

Compute median as a + (b-a)/2 instead of (a + b)/2. Add additional test cases. Fixes #65025. Change-Id: Ib716a1036c17f8f33f51e33cedab13512eb7e0be Reviewed-on: https://go-review.googlesource.com/c/go/+/554617 Reviewed-by: Robert Griesemer <gri@google.com> Auto-Submit: Robert Griesemer <gri@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Robert Griesemer <gri@google.com> Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>

Change-Id: I71a38dd20bfaf2b1aed18892d54eeb017d3d7d66 GitHub-Last-Rev: 8da43b2cbd563ed123690709e519c9f84272b332 GitHub-Pull-Request: golang/go#61955 Reviewed-on: https://go-review.googlesource.com/c/go/+/518595 Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: qiulaidongfeng <2645477756@qq.com>

CL 450055 replaced BitLen with a slower constant-time implementation, which caused a performance regression in some ecosystem benchmarks. https://perf.golang.org/search?q=upload%3A20221130.13+pkg%3Agithub.com%2Fericlagergren%2Fdecimal%2Fbenchmarks Current tip vs this CL name old time/op new time/op delta Pi/foo=ericlagergren_(Go)/prec=100-4 151µs ± 0% 129µs ± 0% -14.89% (p=0.000 n=10+9) Pi/foo=ericlagergren_(GDA)/prec=100-4 305µs ± 0% 269µs ± 1% -11.88% (p=0.000 n=9+10) Pi/foo=cockroachdb/apd/prec=100-4 5.74ms ± 0% 5.33ms ± 0% -7.02% (p=0.000 n=9+10) Pi/foo=shopspring/prec=100-4 265µs ±16% 268µs ±11% ~ (p=0.796 n=10+10) Pi/foo=apmckinlay/prec=100-4 3.10µs ± 0% 3.08µs ± 0% -0.60% (p=0.000 n=8+10) Pi/foo=go-inf/prec=100-4 132µs ± 9% 137µs ± 9% ~ (p=0.182 n=10+9) Pi/foo=float64/prec=100-4 4.97µs ± 0% 4.98µs ± 0% ~ (p=0.196 n=10+10) CL 450055's parent vs this CL name old time/op new time/op delta Pi/foo=ericlagergren_(Go)/prec=100-4 129µs ± 1% 129µs ± 0% ~ (p=0.182 n=10+9) Pi/foo=ericlagergren_(GDA)/prec=100-4 267µs ± 1% 269µs ± 1% +0.93% (p=0.001 n=9+10) Pi/foo=shopspring/prec=100-4 252µs ± 9% 268µs ±11% ~ (p=0.052 n=10+10) Pi/foo=apmckinlay/prec=100-4 3.10µs ± 1% 3.08µs ± 0% -0.66% (p=0.000 n=9+10) Pi/foo=go-inf/prec=100-4 135µs ± 6% 137µs ± 9% ~ (p=0.605 n=9+9) Pi/foo=float64/prec=100-4 4.97µs ± 0% 4.98µs ± 0% +0.23% (p=0.005 n=8+10) goos: linux goarch: amd64 pkg: github.com/ericlagergren/decimal_benchmarks cpu: Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz Fixes #57014 Change-Id: I08478bea122212320a592ad2652e33077807de09 Reviewed-on: https://go-review.googlesource.com/c/go/+/454617 Reviewed-by: Roland Shoemaker <roland@golang.org> Run-TryBot: Filippo Valsorda <filippo@golang.org> Reviewed-by: Russ Cox <rsc@golang.org> Auto-Submit: Filippo Valsorda <filippo@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>

This will let us reuse it in crypto/ecdsa for the NIST scalar fields. The main change in API is around encoding and decoding. The SetBytes + ExpandFor sequence was hacky: SetBytes could produce a bigger size than the modulus if leading zeroes in the top byte overflowed the limb boundary, so ExpandFor had to check for and tolerate that. Also, the caller was responsible for checking that the overflow was actually all zeroes (which we weren't doing, exposing a crasher in decryption and signature verification) and then for checking that the result was less than the modulus. Instead, make SetBytes take a modulus and return an error if the value overflows. Same with Bytes: we were always allocating based on Size before FillBytes anyway, so now Bytes takes a modulus. Finally, SetBig was almost only used for moduli, so replaced NewModulusFromNat and SetBig with NewModulusFromBig. Moved the constant-time bitLen to math/big.Int.BitLen. It's slower, but BitLen is primarily used in cryptographic code, so it's safer this way. Change-Id: Ibaf7f36d80695578cb80484167d82ce1aa83832f Reviewed-on: https://go-review.googlesource.com/c/go/+/450055 Auto-Submit: Filippo Valsorda <filippo@golang.org> Run-TryBot: Filippo Valsorda <filippo@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Roland Shoemaker <roland@golang.org>

Montgomery multiplication can be used for Exp mod even m by splitting it into two steps - Exp mod an odd number and Exp mod a power of two - and then combining the two results using the Chinese Remainder Theorem. For more details, see Ç. K. Koç, “Montgomery Reduction with Even Modulus”, IEE Proceedings: Computers and Digital Techniques, 141(5) 314-316, September 1994. http://www.people.vcu.edu/~jwang3/CMSC691/j34monex.pdf Thanks to Guido Vranken for suggesting that we use a faster algorithm. name old time/op new time/op delta ExpMont/Odd-16 240µs ± 2% 239µs ± 2% ~ (p=0.853 n=10+10) ExpMont/Even1-16 757µs ± 3% 249µs ± 6% -67.17% (p=0.000 n=10+10) ExpMont/Even2-16 755µs ± 1% 244µs ± 4% -67.64% (p=0.000 n=8+10) ExpMont/Even3-16 771µs ± 3% 240µs ± 2% -68.90% (p=0.000 n=10+10) ExpMont/Even4-16 775µs ± 3% 241µs ± 2% -68.91% (p=0.000 n=10+10) ExpMont/Even8-16 779µs ± 2% 241µs ± 3% -69.06% (p=0.000 n=9+10) ExpMont/Even32-16 778µs ± 3% 240µs ± 4% -69.11% (p=0.000 n=9+9) ExpMont/Even64-16 774µs ± 6% 186µs ± 2% -76.00% (p=0.000 n=10+10) ExpMont/Even96-16 776µs ± 4% 186µs ± 6% -76.09% (p=0.000 n=9+10) ExpMont/Even128-16 764µs ± 2% 143µs ± 3% -81.24% (p=0.000 n=10+10) ExpMont/Even255-16 761µs ± 3% 109µs ± 2% -85.73% (p=0.000 n=10+10) ExpMont/SmallEven1-16 45.6µs ± 1% 36.3µs ± 3% -20.49% (p=0.000 n=9+10) ExpMont/SmallEven2-16 44.3µs ± 2% 37.5µs ± 2% -15.26% (p=0.000 n=9+10) ExpMont/SmallEven3-16 44.1µs ± 5% 37.3µs ± 3% -15.32% (p=0.000 n=9+10) ExpMont/SmallEven4-16 47.1µs ± 6% 38.0µs ± 5% -19.40% (p=0.000 n=10+9) name old alloc/op new alloc/op delta ExpMont/Odd-16 2.53kB ± 0% 2.53kB ± 0% ~ (p=0.137 n=8+10) ExpMont/Even1-16 2.57kB ± 0% 3.31kB ± 0% +28.90% (p=0.000 n=8+10) ExpMont/Even2-16 2.57kB ± 0% 3.37kB ± 0% +31.09% (p=0.000 n=9+10) ExpMont/Even3-16 2.57kB ± 0% 3.37kB ± 0% +31.08% (p=0.000 n=8+8) ExpMont/Even4-16 2.57kB ± 0% 3.37kB ± 0% +31.09% (p=0.000 n=9+10) ExpMont/Even8-16 2.57kB ± 0% 3.37kB ± 0% +31.09% (p=0.000 n=9+10) ExpMont/Even32-16 2.57kB ± 0% 3.37kB ± 0% +31.14% (p=0.000 n=10+10) ExpMont/Even64-16 2.57kB ± 0% 3.16kB ± 0% +22.99% (p=0.000 n=9+9) ExpMont/Even96-16 2.57kB ± 0% 3.44kB ± 0% +33.90% (p=0.000 n=10+8) ExpMont/Even128-16 2.57kB ± 0% 2.88kB ± 0% +12.10% (p=0.000 n=10+10) ExpMont/Even255-16 2.57kB ± 0% 2.54kB ± 0% -1.30% (p=0.000 n=9+10) ExpMont/SmallEven1-16 872B ± 0% 1232B ± 0% +41.28% (p=0.000 n=10+10) ExpMont/SmallEven2-16 872B ± 0% 1233B ± 0% +41.40% (p=0.000 n=10+9) ExpMont/SmallEven3-16 872B ± 0% 1289B ± 0% +47.82% (p=0.000 n=10+10) ExpMont/SmallEven4-16 872B ± 0% 1241B ± 0% +42.32% (p=0.000 n=10+10) name old allocs/op new allocs/op delta ExpMont/Odd-16 21.0 ± 0% 21.0 ± 0% ~ (all equal) ExpMont/Even1-16 24.0 ± 0% 38.0 ± 0% +58.33% (p=0.000 n=10+10) ExpMont/Even2-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10) ExpMont/Even3-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10) ExpMont/Even4-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10) ExpMont/Even8-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10) ExpMont/Even32-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10) ExpMont/Even64-16 24.0 ± 0% 39.0 ± 0% +62.50% (p=0.000 n=10+10) ExpMont/Even96-16 24.0 ± 0% 42.0 ± 0% +75.00% (p=0.000 n=10+10) ExpMont/Even128-16 24.0 ± 0% 40.0 ± 0% +66.67% (p=0.000 n=10+10) ExpMont/Even255-16 24.0 ± 0% 38.0 ± 0% +58.33% (p=0.000 n=10+10) ExpMont/SmallEven1-16 16.0 ± 0% 35.0 ± 0% +118.75% (p=0.000 n=10+10) ExpMont/SmallEven2-16 16.0 ± 0% 35.0 ± 0% +118.75% (p=0.000 n=10+10) ExpMont/SmallEven3-16 16.0 ± 0% 37.0 ± 0% +131.25% (p=0.000 n=10+10) ExpMont/SmallEven4-16 16.0 ± 0% 36.0 ± 0% +125.00% (p=0.000 n=10+10) Change-Id: Ib7f70290f8f087b78805ec3120baf17dd7737ac3 Reviewed-on: https://go-review.googlesource.com/c/go/+/420897 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Roland Shoemaker <roland@golang.org> Auto-Submit: Russ Cox <rsc@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>

[This CL is part of a sequence implementing the proposal #51082. The design doc is at https://go.dev/s/godocfmt-design.] Run the updated gofmt, which reformats doc comments, on the main repository. Vendored files are excluded. For #51082. Change-Id: I7332f099b60f716295fb34719c98c04eb1a85407 Reviewed-on: https://go-review.googlesource.com/c/go/+/384268 Reviewed-by: Jonathan Amsterdam <jba@google.com> Reviewed-by: Ian Lance Taylor <iant@golang.org>

A future change to gofmt will rewrite // Doc comment. // func f() to // Doc comment. func f() Apply that change preemptively to all doc comments. For #51082. Change-Id: I4023e16cfb0729b64a8590f071cd92f17343081d Reviewed-on: https://go-review.googlesource.com/c/go/+/384259 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>

A run of lines that are indented with any number of spaces or tabs format as a <pre> block. This commit fixes various doc comments that format badly according to that (standard) rule. For example, consider: // - List item. // Second line. // - Another item. Because the - lines are unindented, this is actually two paragraphs separated by a one-line <pre> block. This CL rewrites it to: // - List item. // Second line. // - Another item. Today, that will format as a single <pre> block. In a future release, we hope to format it as a bulleted list. Various other minor fixes as well, all in preparation for reformatting. For #51082. Change-Id: I95cf06040d4186830e571cd50148be3bf8daf189 Reviewed-on: https://go-review.googlesource.com/c/go/+/384257 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>

Change-Id: I7dfae0fc91c2d70873ec7ec920be7c0a4888153a Reviewed-on: https://go-review.googlesource.com/c/go/+/390175 Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Trust: Daniel Martí <mvdan@mvdan.cc>

Code moved and functions reordered to be in a consistent top-down dependency order, but otherwise unchanged. First step toward commenting division algorithms. Change-Id: Ib5e604fb5b2867edff3a228ba4e57b5cb32c4137 Reviewed-on: https://go-review.googlesource.com/c/go/+/321077 Trust: Russ Cox <rsc@golang.org> Trust: Katie Hockman <katie@golang.org> Trust: Robert Griesemer <gri@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Katie Hockman <katie@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>

There appears to be a typo in the description of the recursive division algorithm. Two things seem suspicious with the original comment: 1. It is talking about choosing s, but s doesn't appear anywhere in the equation. 2. The math in the equation is incorrect. Where B = len(v)/2 s = B - 1 Proof that it is incorrect: len(v) - B >= B + 1 len(v) - len(v)/2 >= len(v)/2 + 1 This doesn't hold if len(v) is even, e.g. 10: 10 - 10/2 >= 10/2 + 1 10 - 5 >= 5 + 1 5 >= 6 // this is false The new equation will be the following, which will be mathematically correct: len(v) - s >= B + 1 len(v) - (len(v)/2 - 1) >= len(v)/2 + 1 len(v) - len(v)/2 + 1 >= len(v)/2 + 1 len(v) - len(v)/2 >= len(v)/2 This holds if len(v) is even or odd. e.g. 10 10 - 10/2 >= 10/2 10 - 5 >= 5 5 >= 5 e.g. 11 11 - 11/2 >= 11/2 11 - 5 >= 5 6 >= 5 Change-Id: If77ce09286cf7038637b5dfd0fb7d4f828023f56 Reviewed-on: https://go-review.googlesource.com/c/go/+/287372 Run-TryBot: Katie Hockman <katie@golang.org> Reviewed-by: Filippo Valsorda <filippo@golang.org> Trust: Katie Hockman <katie@golang.org>

The previous s value could cause a crash for certain inputs. Will check in tests and documentation improvements later. Thanks to the Go Ethereum team and the OSS-Fuzz project for reporting this. Thanks to Rémy Oudompheng and Robert Griesemer for their help developing and validating the fix. Fixes CVE-2020-28362 Change-Id: Ibbf455c4436bcdb07c84a34fa6551fb3422356d3 Reviewed-on: https://team-review.git.corp.google.com/c/golang/go-private/+/899974 Reviewed-by: Roland Shoemaker <bracewell@google.com> Reviewed-by: Filippo Valsorda <valsorda@google.com> Reviewed-on: https://go-review.googlesource.com/c/go/+/269657 Trust: Katie Hockman <katie@golang.org> Trust: Roland Shoemaker <roland@golang.org> Run-TryBot: Katie Hockman <katie@golang.org> Reviewed-by: Roland Shoemaker <roland@golang.org> TryBot-Result: Go Bot <gobot@golang.org>

Division is much slower than multiplication. And the method of using multiplication by multiplying reciprocal and replacing division with it can increase the speed of divWVW algorithm by three times,and at the same time increase the speed of nats division. The benchmark test on arm64 is as follows: name old time/op new time/op delta DivWVW/1-4 13.1ns ± 4% 13.3ns ± 4% ~ (p=0.444 n=5+5) DivWVW/2-4 48.6ns ± 1% 51.2ns ± 2% +5.39% (p=0.008 n=5+5) DivWVW/3-4 82.0ns ± 1% 69.7ns ± 1% -15.03% (p=0.008 n=5+5) DivWVW/4-4 116ns ± 1% 71ns ± 2% -38.88% (p=0.008 n=5+5) DivWVW/5-4 152ns ± 1% 84ns ± 4% -44.70% (p=0.008 n=5+5) DivWVW/10-4 319ns ± 1% 155ns ± 4% -51.50% (p=0.008 n=5+5) DivWVW/100-4 3.44µs ± 3% 1.30µs ± 8% -62.30% (p=0.008 n=5+5) DivWVW/1000-4 33.8µs ± 0% 10.9µs ± 1% -67.74% (p=0.008 n=5+5) DivWVW/10000-4 343µs ± 4% 111µs ± 5% -67.63% (p=0.008 n=5+5) DivWVW/100000-4 3.35ms ± 1% 1.25ms ± 3% -62.79% (p=0.008 n=5+5) QuoRem-4 3.08µs ± 2% 2.21µs ± 4% -28.40% (p=0.008 n=5+5) ModSqrt225_Tonelli-4 444µs ± 2% 457µs ± 3% ~ (p=0.095 n=5+5) ModSqrt225_3Mod4-4 136µs ± 1% 138µs ± 3% ~ (p=0.151 n=5+5) ModSqrt231_Tonelli-4 473µs ± 3% 483µs ± 4% ~ (p=0.548 n=5+5) ModSqrt231_5Mod8-4 164µs ± 9% 169µs ±12% ~ (p=0.421 n=5+5) Sqrt-4 36.8µs ± 1% 28.6µs ± 0% -22.17% (p=0.016 n=5+4) Div/20/10-4 50.0ns ± 3% 51.3ns ± 6% ~ (p=0.238 n=5+5) Div/40/20-4 49.8ns ± 2% 51.3ns ± 6% ~ (p=0.222 n=5+5) Div/100/50-4 85.8ns ± 4% 86.5ns ± 5% ~ (p=0.246 n=5+5) Div/200/100-4 335ns ± 3% 296ns ± 2% -11.60% (p=0.008 n=5+5) Div/400/200-4 442ns ± 2% 359ns ± 5% -18.81% (p=0.008 n=5+5) Div/1000/500-4 858ns ± 3% 643ns ± 6% -25.06% (p=0.008 n=5+5) Div/2000/1000-4 1.70µs ± 3% 1.28µs ± 4% -24.80% (p=0.008 n=5+5) Div/20000/10000-4 45.0µs ± 5% 41.8µs ± 4% -7.17% (p=0.016 n=5+5) Div/200000/100000-4 1.51ms ± 7% 1.43ms ± 3% -5.42% (p=0.016 n=5+5) Div/2000000/1000000-4 57.6ms ± 4% 57.5ms ± 3% ~ (p=1.000 n=5+5) Div/20000000/10000000-4 2.08s ± 3% 2.04s ± 1% ~ (p=0.095 n=5+5) name old speed new speed delta DivWVW/1-4 4.87GB/s ± 4% 4.80GB/s ± 4% ~ (p=0.310 n=5+5) DivWVW/2-4 2.63GB/s ± 1% 2.50GB/s ± 2% -5.07% (p=0.008 n=5+5) DivWVW/3-4 2.34GB/s ± 1% 2.76GB/s ± 1% +17.70% (p=0.008 n=5+5) DivWVW/4-4 2.21GB/s ± 1% 3.61GB/s ± 2% +63.42% (p=0.008 n=5+5) DivWVW/5-4 2.10GB/s ± 2% 3.81GB/s ± 4% +80.89% (p=0.008 n=5+5) DivWVW/10-4 2.01GB/s ± 0% 4.13GB/s ± 4% +105.91% (p=0.008 n=5+5) DivWVW/100-4 1.86GB/s ± 2% 4.95GB/s ± 7% +165.63% (p=0.008 n=5+5) DivWVW/1000-4 1.89GB/s ± 0% 5.86GB/s ± 1% +209.96% (p=0.008 n=5+5) DivWVW/10000-4 1.87GB/s ± 4% 5.76GB/s ± 5% +208.96% (p=0.008 n=5+5) DivWVW/100000-4 1.91GB/s ± 1% 5.14GB/s ± 3% +168.85% (p=0.008 n=5+5) Change-Id: I049f1196562b20800e6ef8a6493fd147f93ad830 Reviewed-on: https://go-review.googlesource.com/c/go/+/250417 Trust: Giovanni Bajo <rasky@develer.com> Trust: Keith Randall <khr@golang.org> Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>

Replaced almost every use of Bytes with FillBytes. Note that the approved proposal was for func (*Int) FillBytes(buf []byte) while this implements func (*Int) FillBytes(buf []byte) []byte because the latter was far nicer to use in all callsites. Fixes #35833 Change-Id: Ia912df123e5d79b763845312ea3d9a8051343c0a Reviewed-on: https://go-review.googlesource.com/c/go/+/230397 Reviewed-by: Robert Griesemer <gri@golang.org>

The divBasic function computes the quotient of big nats u/v word by word. It estimates each word qhat by performing a long division (top 2 words of u divided by top word of v), looks at the next word to correct the estimate, then perform a full multiplication (qhat*v) to catch any inaccuracy in the estimate. In the latter case, "negative" values appear temporarily and carries must be carefully managed, and the recursive division refactoring introduced a case where qhat*v has the same length as v, triggering an out-of-bounds write in the case it happens when computing the top word of the quotient. Fixes #37499 Change-Id: I15089da4a4027beda43af497bf6de261eb792f94 Reviewed-on: https://go-review.googlesource.com/c/go/+/221980 Reviewed-by: Robert Griesemer <gri@golang.org>

Change-Id: I5b909df0fd048cd66c5a27fca1b06466d3bcaac7 GitHub-Last-Rev: 778c5d21311abee09a5fbda2e4005a5fd4cc3f9f GitHub-Pull-Request: golang/go#35624 Reviewed-on: https://go-review.googlesource.com/c/go/+/207421 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>

The bounds in the last carry branch were wrong as there is no reason for len(u) >= n+n/2 to always hold true. We also adjust test to avoid using a remainder of 1 (in which case, the last step of the algorithm computes (qhatv+1) - qhatv which rarely produces a carry). Change-Id: I69fbab9c5e19d0db1c087fbfcd5b89352c2d26fb Reviewed-on: https://go-review.googlesource.com/c/go/+/206839 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>

The current division algorithm produces one word of result at a time, using 2-word division to compute the top word and mulAddVWW to compute the remainder. The top word may need to be adjusted by 1 or 2 units. The recursive version, based on Burnikel, Ziegler, "Fast Recursive Division", uses the same principles, but in a multi-word setting, so that multiplication benefits from the Karatsuba algorithm (and possibly later improvements). benchmark old ns/op new ns/op delta BenchmarkDiv/20/10-4 38.2 38.3 +0.26% BenchmarkDiv/40/20-4 38.7 38.5 -0.52% BenchmarkDiv/100/50-4 62.5 62.6 +0.16% BenchmarkDiv/200/100-4 238 259 +8.82% BenchmarkDiv/400/200-4 311 338 +8.68% BenchmarkDiv/1000/500-4 604 649 +7.45% BenchmarkDiv/2000/1000-4 1214 1278 +5.27% BenchmarkDiv/20000/10000-4 38279 36510 -4.62% BenchmarkDiv/200000/100000-4 3022057 1359615 -55.01% BenchmarkDiv/2000000/1000000-4 310827664 54012939 -82.62% BenchmarkDiv/20000000/10000000-4 33272829421 1965401359 -94.09% BenchmarkString/10/Base10-4 158 156 -1.27% BenchmarkString/100/Base10-4 797 792 -0.63% BenchmarkString/1000/Base10-4 3677 3814 +3.73% BenchmarkString/10000/Base10-4 16633 17116 +2.90% BenchmarkString/100000/Base10-4 5779029 1793808 -68.96% BenchmarkString/1000000/Base10-4 889840820 85524031 -90.39% BenchmarkString/10000000/Base10-4 134338236860 4935657026 -96.33% Fixes #21960 Updates #30943 Change-Id: I134c6f81a47870c688ca95b6081eb9211def15a2 Reviewed-on: https://go-review.googlesource.com/c/go/+/172018 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>

This notably allows to reuse temporaries across the karatsubaSqr recursion. benchmark old ns/op new ns/op delta BenchmarkNatMul/10-4 227 228 +0.44% BenchmarkNatMul/100-4 8339 8589 +3.00% BenchmarkNatMul/1000-4 313796 312272 -0.49% BenchmarkNatMul/10000-4 11924720 11873589 -0.43% BenchmarkNatMul/100000-4 503813354 503839058 +0.01% BenchmarkNatSqr/20-4 549 513 -6.56% BenchmarkNatSqr/30-4 945 874 -7.51% BenchmarkNatSqr/50-4 1993 1832 -8.08% BenchmarkNatSqr/80-4 4096 3874 -5.42% BenchmarkNatSqr/100-4 6192 5712 -7.75% BenchmarkNatSqr/200-4 20388 19543 -4.14% BenchmarkNatSqr/300-4 38735 36715 -5.21% BenchmarkNatSqr/500-4 99562 93542 -6.05% BenchmarkNatSqr/800-4 195554 184907 -5.44% BenchmarkNatSqr/1000-4 286302 275053 -3.93% BenchmarkNatSqr/10000-4 9817057 9441641 -3.82% BenchmarkNatSqr/100000-4 390713416 379696789 -2.82% benchmark old allocs new allocs delta BenchmarkNatMul/10-4 1 1 +0.00% BenchmarkNatMul/100-4 1 1 +0.00% BenchmarkNatMul/1000-4 2 1 -50.00% BenchmarkNatMul/10000-4 2 1 -50.00% BenchmarkNatMul/100000-4 9 11 +22.22% BenchmarkNatSqr/20-4 2 1 -50.00% BenchmarkNatSqr/30-4 2 1 -50.00% BenchmarkNatSqr/50-4 2 1 -50.00% BenchmarkNatSqr/80-4 2 1 -50.00% BenchmarkNatSqr/100-4 2 1 -50.00% BenchmarkNatSqr/200-4 2 1 -50.00% BenchmarkNatSqr/300-4 4 1 -75.00% BenchmarkNatSqr/500-4 4 1 -75.00% BenchmarkNatSqr/800-4 10 1 -90.00% BenchmarkNatSqr/1000-4 10 1 -90.00% BenchmarkNatSqr/10000-4 731 1 -99.86% BenchmarkNatSqr/100000-4 19687 6 -99.97% benchmark old bytes new bytes delta BenchmarkNatMul/10-4 192 192 +0.00% BenchmarkNatMul/100-4 4864 4864 +0.00% BenchmarkNatMul/1000-4 57344 49224 -14.16% BenchmarkNatMul/10000-4 565248 498772 -11.76% BenchmarkNatMul/100000-4 5749504 7263720 +26.34% BenchmarkNatSqr/20-4 672 352 -47.62% BenchmarkNatSqr/30-4 992 512 -48.39% BenchmarkNatSqr/50-4 1792 896 -50.00% BenchmarkNatSqr/80-4 2688 1408 -47.62% BenchmarkNatSqr/100-4 3584 1792 -50.00% BenchmarkNatSqr/200-4 6656 3456 -48.08% BenchmarkNatSqr/300-4 24448 16387 -32.97% BenchmarkNatSqr/500-4 36864 24591 -33.29% BenchmarkNatSqr/800-4 69760 40981 -41.25% BenchmarkNatSqr/1000-4 86016 49180 -42.82% BenchmarkNatSqr/10000-4 2524800 487368 -80.70% BenchmarkNatSqr/100000-4 68599808 5876581 -91.43% Change-Id: I8e6e409ae1cb48be9d5aa9b5f428d6cbe487673a Reviewed-on: https://go-review.googlesource.com/c/go/+/172017 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>

This fixes an old oversight. Rat.SetString already permitted fractions a/b where both a and b could independently specify a base prefix. With this CL, it now also accepts non-decimal floating-point numbers. Fixes #29799. Change-Id: I9cc65666a5cebb00f0202da2e4fc5654a02e3234 Reviewed-on: https://go-review.googlesource.com/c/go/+/168237 Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>

The proposed change introduces a better initial guess which is closer to the final value and therefore converges in fewer steps. Consider for example sqrt(8): previously the guess was 8, whereas now it is 4 (and the result is 2). All this change does is it computes the division by two more accurately while it keeps the guess ≥ √x. Change-Id: I917248d734a7b0488d14a647a063f674e56c4e30 GitHub-Last-Rev: c06d9d4876c8e7d6739f0e4b687e370fe1e9aad7 GitHub-Pull-Request: golang/go#28981 Reviewed-on: https://go-review.googlesource.com/c/163866 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>

For many uses of math/big, most numbers are small in practice. Prior to this change, big.NewInt allocated a minimum of five Words: one to hold the value, and four as extra capacity. In most cases, this extra capacity is waste. Worse, allocating a single Word uses a fast malloc path for tiny allocs; allocating five Words is more expensive in CPU as well as memory. This change is a simple fix: Treat a request for one Word at its word. I experimented with more complicated fixes and did not find anything that outperformed this easy fix. On some real world programs, this is a clear win. The compiler: name old alloc/op new alloc/op delta Template 37.1MB ± 0% 37.0MB ± 0% -0.23% (p=0.008 n=5+5) Unicode 29.2MB ± 0% 28.5MB ± 0% -2.48% (p=0.008 n=5+5) GoTypes 133MB ± 0% 133MB ± 0% -0.05% (p=0.008 n=5+5) Compiler 628MB ± 0% 628MB ± 0% -0.06% (p=0.008 n=5+5) SSA 2.04GB ± 0% 2.03GB ± 0% -0.14% (p=0.008 n=5+5) Flate 24.7MB ± 0% 24.6MB ± 0% -0.23% (p=0.008 n=5+5) GoParser 29.6MB ± 0% 29.6MB ± 0% -0.07% (p=0.008 n=5+5) Reflect 82.3MB ± 0% 82.2MB ± 0% -0.05% (p=0.008 n=5+5) Tar 36.2MB ± 0% 36.2MB ± 0% -0.12% (p=0.008 n=5+5) XML 49.5MB ± 0% 49.4MB ± 0% -0.23% (p=0.008 n=5+5) [Geo mean] 85.1MB 84.8MB -0.37% name old allocs/op new allocs/op delta Template 364k ± 0% 364k ± 0% ~ (p=0.476 n=5+5) Unicode 341k ± 0% 341k ± 0% ~ (p=0.690 n=5+5) GoTypes 1.37M ± 0% 1.37M ± 0% ~ (p=0.444 n=5+5) Compiler 5.50M ± 0% 5.50M ± 0% +0.02% (p=0.008 n=5+5) SSA 16.0M ± 0% 16.0M ± 0% +0.01% (p=0.008 n=5+5) Flate 238k ± 0% 238k ± 0% ~ (p=0.222 n=5+5) GoParser 305k ± 0% 305k ± 0% ~ (p=0.841 n=5+5) Reflect 976k ± 0% 976k ± 0% ~ (p=0.222 n=5+5) Tar 354k ± 0% 354k ± 0% ~ (p=0.103 n=5+5) XML 450k ± 0% 450k ± 0% ~ (p=0.151 n=5+5) [Geo mean] 837k 837k +0.01% go.skylark.net (at ea6d2813de75ded8d157b9540bc3d3ad0b688623): name old alloc/op new alloc/op delta Hashtable-8 456kB ± 0% 299kB ± 0% -34.33% (p=0.000 n=9+9) /bench_builtin_method-8 220kB ± 0% 190kB ± 0% -13.55% (p=0.000 n=9+10) name old allocs/op new allocs/op delta Hashtable-8 7.84k ± 0% 7.84k ± 0% ~ (all equal) /bench_builtin_method-8 7.49k ± 0% 7.49k ± 0% ~ (all equal) The math/big benchmarks are messy, which is predictable, since they naturally exercise the bigger-than-one-word code more. Also worth noting is that many of the benchmarks have very high variance. I've omitted the opVV and opVW benchmarks, as they are unrelated. name old time/op new time/op delta DecimalConversion-8 92.5µs ± 1% 90.6µs ± 0% -2.12% (p=0.000 n=17+19) FloatString/100-8 867ns ± 0% 871ns ± 0% +0.50% (p=0.000 n=18+18) FloatString/1000-8 26.4µs ± 1% 26.5µs ± 1% ~ (p=0.396 n=20+19) FloatString/10000-8 2.15ms ± 2% 2.16ms ± 2% ~ (p=0.089 n=19+20) FloatString/100000-8 209ms ± 1% 209ms ± 1% ~ (p=0.583 n=19+19) FloatAdd/10-8 63.5ns ± 2% 64.1ns ± 6% ~ (p=0.389 n=19+19) FloatAdd/100-8 66.0ns ± 2% 65.8ns ± 2% ~ (p=0.825 n=20+20) FloatAdd/1000-8 93.9ns ± 1% 94.3ns ± 1% ~ (p=0.273 n=19+20) FloatAdd/10000-8 347ns ± 2% 342ns ± 1% -1.50% (p=0.000 n=18+18) FloatAdd/100000-8 2.78µs ± 1% 2.78µs ± 2% ~ (p=0.961 n=20+19) FloatSub/10-8 56.9ns ± 2% 57.8ns ± 3% +1.59% (p=0.001 n=19+19) FloatSub/100-8 58.2ns ± 2% 58.9ns ± 2% +1.25% (p=0.004 n=20+20) FloatSub/1000-8 74.9ns ± 1% 74.4ns ± 1% -0.76% (p=0.000 n=19+20) FloatSub/10000-8 223ns ± 1% 220ns ± 2% -1.29% (p=0.000 n=16+20) FloatSub/100000-8 1.66µs ± 1% 1.66µs ± 2% ~ (p=0.147 n=20+20) ParseFloatSmallExp-8 8.38µs ± 0% 8.59µs ± 0% +2.48% (p=0.000 n=19+19) ParseFloatLargeExp-8 31.1µs ± 0% 32.0µs ± 0% +3.04% (p=0.000 n=16+17) GCD10x10/WithoutXY-8 115ns ± 1% 99ns ± 3% -14.07% (p=0.000 n=20+20) GCD10x10/WithXY-8 322ns ± 0% 312ns ± 0% -3.11% (p=0.000 n=18+13) GCD10x100/WithoutXY-8 233ns ± 1% 219ns ± 1% -5.73% (p=0.000 n=19+17) GCD10x100/WithXY-8 709ns ± 0% 759ns ± 0% +7.04% (p=0.000 n=19+19) GCD10x1000/WithoutXY-8 653ns ± 1% 642ns ± 1% -1.69% (p=0.000 n=17+20) GCD10x1000/WithXY-8 1.35µs ± 0% 1.35µs ± 1% ~ (p=0.255 n=20+16) GCD10x10000/WithoutXY-8 4.57µs ± 1% 4.61µs ± 1% +0.95% (p=0.000 n=18+17) GCD10x10000/WithXY-8 6.82µs ± 0% 6.84µs ± 0% +0.27% (p=0.000 n=16+17) GCD10x100000/WithoutXY-8 43.9µs ± 1% 44.0µs ± 1% +0.28% (p=0.000 n=18+17) GCD10x100000/WithXY-8 60.6µs ± 0% 60.6µs ± 0% ~ (p=0.907 n=18+18) GCD100x100/WithoutXY-8 1.13µs ± 0% 1.21µs ± 0% +6.39% (p=0.000 n=19+19) GCD100x100/WithXY-8 1.82µs ± 0% 1.92µs ± 0% +5.24% (p=0.000 n=19+17) GCD100x1000/WithoutXY-8 2.00µs ± 0% 2.03µs ± 1% +1.61% (p=0.000 n=18+16) GCD100x1000/WithXY-8 3.22µs ± 0% 3.20µs ± 1% -0.83% (p=0.000 n=19+19) GCD100x10000/WithoutXY-8 9.28µs ± 1% 9.17µs ± 1% -1.25% (p=0.000 n=18+19) GCD100x10000/WithXY-8 13.5µs ± 0% 13.3µs ± 0% -1.12% (p=0.000 n=18+19) GCD100x100000/WithoutXY-8 80.4µs ± 0% 78.6µs ± 0% -2.25% (p=0.000 n=19+19) GCD100x100000/WithXY-8 114µs ± 0% 112µs ± 0% -1.46% (p=0.000 n=19+17) GCD1000x1000/WithoutXY-8 12.9µs ± 1% 12.9µs ± 2% -0.50% (p=0.014 n=20+19) GCD1000x1000/WithXY-8 19.6µs ± 1% 19.6µs ± 2% -0.28% (p=0.040 n=17+18) GCD1000x10000/WithoutXY-8 22.4µs ± 0% 22.4µs ± 2% ~ (p=0.220 n=19+19) GCD1000x10000/WithXY-8 57.0µs ± 0% 56.5µs ± 0% -0.87% (p=0.000 n=20+20) GCD1000x100000/WithoutXY-8 116µs ± 0% 115µs ± 0% -0.49% (p=0.000 n=18+19) GCD1000x100000/WithXY-8 410µs ± 0% 411µs ± 0% ~ (p=0.052 n=19+19) GCD10000x10000/WithoutXY-8 247µs ± 1% 244µs ± 1% -0.92% (p=0.000 n=19+19) GCD10000x10000/WithXY-8 476µs ± 1% 473µs ± 1% -0.48% (p=0.009 n=19+19) GCD10000x100000/WithoutXY-8 573µs ± 1% 571µs ± 1% -0.45% (p=0.012 n=20+20) GCD10000x100000/WithXY-8 3.35ms ± 1% 3.35ms ± 1% ~ (p=0.444 n=20+19) GCD100000x100000/WithoutXY-8 12.0ms ± 2% 11.9ms ± 2% ~ (p=0.276 n=18+20) GCD100000x100000/WithXY-8 27.3ms ± 1% 27.3ms ± 1% ~ (p=0.792 n=20+19) Hilbert-8 672µs ± 0% 611µs ± 0% -9.02% (p=0.000 n=19+19) Binomial-8 1.40µs ± 0% 1.18µs ± 0% -15.69% (p=0.000 n=16+14) QuoRem-8 2.20µs ± 1% 2.17µs ± 1% -1.13% (p=0.000 n=19+19) Exp-8 4.10ms ± 1% 4.11ms ± 1% ~ (p=0.296 n=20+19) Exp2-8 4.11ms ± 1% 4.12ms ± 1% ~ (p=0.429 n=20+20) Bitset-8 8.67ns ± 6% 8.74ns ± 4% ~ (p=0.139 n=19+17) BitsetNeg-8 43.6ns ± 1% 43.8ns ± 2% +0.61% (p=0.036 n=20+20) BitsetOrig-8 77.5ns ± 1% 68.4ns ± 1% -11.77% (p=0.000 n=19+20) BitsetNegOrig-8 145ns ± 1% 141ns ± 1% -2.87% (p=0.000 n=19+20) ModSqrt225_Tonelli-8 324µs ± 1% 324µs ± 1% ~ (p=0.409 n=18+20) ModSqrt225_3Mod4-8 98.9µs ± 1% 99.1µs ± 1% ~ (p=0.298 n=19+18) ModSqrt231_Tonelli-8 337µs ± 1% 337µs ± 1% ~ (p=0.718 n=20+18) ModSqrt231_5Mod8-8 115µs ± 1% 114µs ± 1% -0.22% (p=0.050 n=20+20) ModInverse-8 895ns ± 0% 869ns ± 1% -2.83% (p=0.000 n=17+17) Sqrt-8 28.1µs ± 1% 28.1µs ± 0% -0.28% (p=0.000 n=16+20) IntSqr/1-8 10.8ns ± 3% 10.5ns ± 3% -2.51% (p=0.000 n=19+17) IntSqr/2-8 30.5ns ± 2% 30.3ns ± 4% -0.71% (p=0.035 n=18+18) IntSqr/3-8 40.1ns ± 1% 40.1ns ± 1% ~ (p=0.710 n=20+17) IntSqr/5-8 65.3ns ± 1% 65.4ns ± 2% ~ (p=0.744 n=19+19) IntSqr/8-8 101ns ± 1% 102ns ± 0% ~ (p=0.234 n=19+20) IntSqr/10-8 138ns ± 0% 138ns ± 2% ~ (p=0.827 n=18+18) IntSqr/20-8 378ns ± 1% 378ns ± 1% ~ (p=0.479 n=18+18) IntSqr/30-8 637ns ± 0% 638ns ± 1% ~ (p=0.051 n=18+20) IntSqr/50-8 1.34µs ± 2% 1.34µs ± 1% ~ (p=0.970 n=18+19) IntSqr/80-8 2.78µs ± 0% 2.78µs ± 1% -0.18% (p=0.006 n=19+17) IntSqr/100-8 3.98µs ± 0% 3.98µs ± 0% ~ (p=0.057 n=17+19) IntSqr/200-8 13.5µs ± 0% 13.5µs ± 1% -0.33% (p=0.000 n=19+17) IntSqr/300-8 25.3µs ± 1% 25.3µs ± 1% ~ (p=0.361 n=19+20) IntSqr/500-8 62.9µs ± 0% 62.9µs ± 1% ~ (p=0.899 n=17+17) IntSqr/800-8 128µs ± 1% 127µs ± 1% -0.32% (p=0.016 n=18+20) IntSqr/1000-8 192µs ± 0% 192µs ± 1% ~ (p=0.916 n=17+18) Div/20/10-8 34.9ns ± 2% 35.6ns ± 1% +2.01% (p=0.000 n=20+20) Div/200/100-8 218ns ± 1% 215ns ± 2% -1.43% (p=0.000 n=18+18) Div/2000/1000-8 1.16µs ± 1% 1.15µs ± 1% -1.04% (p=0.000 n=19+20) Div/20000/10000-8 35.7µs ± 1% 35.4µs ± 1% -0.69% (p=0.000 n=19+18) Div/200000/100000-8 2.89ms ± 1% 2.88ms ± 1% -0.62% (p=0.007 n=19+20) Mul-8 9.28ms ± 1% 9.27ms ± 1% ~ (p=0.563 n=18+18) ZeroShifts/Shl-8 712ns ± 6% 716ns ± 7% ~ (p=0.597 n=20+20) ZeroShifts/ShlSame-8 4.00ns ± 1% 4.06ns ± 5% ~ (p=0.162 n=18+20) ZeroShifts/Shr-8 714ns ±10% 1285ns ±156% ~ (p=0.250 n=20+20) ZeroShifts/ShrSame-8 4.00ns ± 1% 4.09ns ±10% +2.34% (p=0.048 n=16+19) Exp3Power/0x10-8 154ns ± 0% 159ns ±13% ~ (p=0.197 n=14+20) Exp3Power/0x40-8 171ns ± 1% 175ns ± 8% ~ (p=0.058 n=16+19) Exp3Power/0x100-8 287ns ± 0% 316ns ± 4% +10.03% (p=0.000 n=17+19) Exp3Power/0x400-8 698ns ± 1% 801ns ± 6% +14.75% (p=0.000 n=19+20) Exp3Power/0x1000-8 2.87µs ± 0% 3.65µs ± 6% +27.24% (p=0.000 n=18+18) Exp3Power/0x4000-8 21.9µs ± 1% 28.7µs ± 8% +31.09% (p=0.000 n=18+20) Exp3Power/0x10000-8 204µs ± 0% 267µs ± 9% +30.81% (p=0.000 n=20+20) Exp3Power/0x40000-8 1.86ms ± 0% 2.26ms ± 5% +21.68% (p=0.000 n=18+19) Exp3Power/0x100000-8 17.5ms ± 1% 20.7ms ± 7% +18.39% (p=0.000 n=19+20) Exp3Power/0x400000-8 156ms ± 0% 172ms ± 6% +10.54% (p=0.000 n=19+20) Fibo-8 26.9ms ± 1% 27.5ms ± 3% +2.32% (p=0.000 n=19+19) NatSqr/1-8 31.0ns ± 4% 39.5ns ±29% +27.25% (p=0.000 n=20+19) NatSqr/2-8 54.1ns ± 1% 69.0ns ±28% +27.52% (p=0.000 n=20+20) NatSqr/3-8 66.6ns ± 1% 83.0ns ±25% +24.59% (p=0.000 n=20+20) NatSqr/5-8 97.1ns ± 1% 119.9ns ±12% +23.50% (p=0.000 n=16+20) NatSqr/8-8 138ns ± 1% 171ns ± 9% +24.20% (p=0.000 n=19+20) NatSqr/10-8 182ns ± 0% 225ns ± 9% +23.50% (p=0.000 n=16+20) NatSqr/20-8 447ns ± 1% 624ns ± 6% +39.64% (p=0.000 n=19+19) NatSqr/30-8 736ns ± 2% 986ns ± 9% +33.94% (p=0.000 n=19+20) NatSqr/50-8 1.51µs ± 2% 1.97µs ± 9% +30.42% (p=0.000 n=20+20) NatSqr/80-8 3.03µs ± 1% 3.67µs ± 7% +21.08% (p=0.000 n=20+20) NatSqr/100-8 4.31µs ± 1% 5.20µs ± 7% +20.52% (p=0.000 n=19+20) NatSqr/200-8 14.2µs ± 0% 16.3µs ± 4% +14.92% (p=0.000 n=19+20) NatSqr/300-8 27.8µs ± 1% 33.2µs ± 7% +19.28% (p=0.000 n=20+18) NatSqr/500-8 66.6µs ± 1% 74.5µs ± 3% +11.87% (p=0.000 n=18+18) NatSqr/800-8 135µs ± 1% 165µs ± 7% +22.33% (p=0.000 n=20+20) NatSqr/1000-8 200µs ± 0% 228µs ± 3% +14.39% (p=0.000 n=19+20) NatSetBytes/8-8 8.87ns ± 4% 8.77ns ± 2% -1.17% (p=0.020 n=20+16) NatSetBytes/24-8 38.6ns ± 3% 49.5ns ±29% +28.32% (p=0.000 n=18+19) NatSetBytes/128-8 75.2ns ± 1% 120.7ns ±29% +60.60% (p=0.000 n=17+20) NatSetBytes/7-8 16.2ns ± 2% 16.5ns ± 2% +1.76% (p=0.000 n=20+20) NatSetBytes/23-8 46.5ns ± 1% 60.2ns ±24% +29.59% (p=0.000 n=20+20) NatSetBytes/127-8 83.1ns ± 1% 118.2ns ±20% +42.33% (p=0.000 n=18+20) ScanPi-8 89.1µs ± 1% 117.4µs ±12% +31.75% (p=0.000 n=18+20) StringPiParallel-8 35.1µs ± 9% 40.2µs ±12% +14.53% (p=0.000 n=20+20) Scan/10/Base2-8 410ns ±14% 429ns ±10% +4.47% (p=0.018 n=19+20) Scan/100/Base2-8 3.05µs ±20% 2.97µs ±14% ~ (p=0.449 n=20+20) Scan/1000/Base2-8 29.3µs ± 8% 30.1µs ±23% ~ (p=0.355 n=20+20) Scan/10000/Base2-8 402µs ±13% 395µs ±14% ~ (p=0.355 n=20+20) Scan/100000/Base2-8 11.8ms ±10% 11.6ms ± 1% ~ (p=0.245 n=17+18) Scan/10/Base8-8 194ns ± 6% 196ns ±12% ~ (p=0.829 n=20+19) Scan/100/Base8-8 1.11µs ±15% 1.11µs ±12% ~ (p=0.743 n=20+20) Scan/1000/Base8-8 11.7µs ±10% 11.7µs ±12% ~ (p=0.904 n=20+20) Scan/10000/Base8-8 209µs ± 7% 210µs ± 8% ~ (p=0.478 n=20+20) Scan/100000/Base8-8 10.6ms ± 7% 10.4ms ± 6% ~ (p=0.112 n=20+18) Scan/10/Base10-8 182ns ±12% 188ns ±11% +3.52% (p=0.044 n=20+20) Scan/100/Base10-8 1.01µs ± 8% 1.00µs ±13% ~ (p=0.588 n=20+20) Scan/1000/Base10-8 10.7µs ±20% 10.6µs ±14% ~ (p=0.560 n=20+20) Scan/10000/Base10-8 195µs ±10% 194µs ± 9% ~ (p=0.883 n=20+20) Scan/100000/Base10-8 10.6ms ± 2% 10.6ms ± 2% ~ (p=0.495 n=20+20) Scan/10/Base16-8 166ns ±10% 174ns ±17% ~ (p=0.072 n=20+20) Scan/100/Base16-8 836ns ±10% 826ns ±12% ~ (p=0.562 n=20+17) Scan/1000/Base16-8 8.96µs ±13% 8.65µs ± 9% ~ (p=0.203 n=20+18) Scan/10000/Base16-8 198µs ± 3% 198µs ± 5% ~ (p=0.718 n=20+20) Scan/100000/Base16-8 11.1ms ± 3% 11.0ms ± 4% ~ (p=0.512 n=20+20) String/10/Base2-8 88.1ns ± 7% 94.1ns ±11% +6.80% (p=0.000 n=19+20) String/100/Base2-8 577ns ± 4% 598ns ± 5% +3.72% (p=0.000 n=20+20) String/1000/Base2-8 5.25µs ± 2% 5.62µs ± 5% +7.04% (p=0.000 n=19+20) String/10000/Base2-8 55.6µs ± 1% 60.1µs ± 2% +8.12% (p=0.000 n=19+19) String/100000/Base2-8 519µs ± 2% 560µs ± 2% +7.91% (p=0.000 n=18+17) String/10/Base8-8 52.2ns ± 8% 53.3ns ±12% ~ (p=0.188 n=20+18) String/100/Base8-8 218ns ± 3% 232ns ±10% +6.66% (p=0.000 n=20+20) String/1000/Base8-8 1.84µs ± 3% 1.94µs ± 4% +5.07% (p=0.000 n=20+18) String/10000/Base8-8 18.1µs ± 2% 19.1µs ± 3% +5.84% (p=0.000 n=20+19) String/100000/Base8-8 184µs ± 2% 197µs ± 1% +7.15% (p=0.000 n=19+19) String/10/Base10-8 158ns ± 7% 146ns ± 6% -7.65% (p=0.000 n=20+19) String/100/Base10-8 807ns ± 2% 845ns ± 4% +4.79% (p=0.000 n=20+19) String/1000/Base10-8 3.99µs ± 3% 3.99µs ± 7% ~ (p=0.920 n=20+20) String/10000/Base10-8 20.8µs ± 6% 22.1µs ±10% +6.11% (p=0.000 n=19+20) String/100000/Base10-8 5.60ms ± 2% 5.59ms ± 2% ~ (p=0.749 n=20+19) String/10/Base16-8 49.0ns ±13% 49.3ns ±16% ~ (p=0.581 n=19+20) String/100/Base16-8 173ns ± 5% 185ns ± 6% +6.63% (p=0.000 n=20+18) String/1000/Base16-8 1.38µs ± 3% 1.49µs ±10% +8.27% (p=0.000 n=19+20) String/10000/Base16-8 13.5µs ± 2% 14.5µs ± 3% +7.08% (p=0.000 n=20+20) String/100000/Base16-8 138µs ± 4% 148µs ± 4% +7.57% (p=0.000 n=19+20) LeafSize/0-8 2.74ms ± 1% 2.79ms ± 2% +2.00% (p=0.000 n=19+19) LeafSize/1-8 24.8µs ± 4% 26.1µs ± 8% +5.33% (p=0.000 n=18+19) LeafSize/2-8 24.9µs ± 7% 25.0µs ± 8% ~ (p=0.989 n=20+19) LeafSize/3-8 97.6µs ± 3% 100.2µs ± 5% +2.66% (p=0.001 n=20+19) LeafSize/4-8 25.2µs ± 5% 25.4µs ± 5% ~ (p=0.173 n=19+20) LeafSize/5-8 118µs ± 2% 119µs ± 5% ~ (p=0.478 n=20+20) LeafSize/6-8 97.6µs ± 3% 100.1µs ± 8% +2.65% (p=0.021 n=20+19) LeafSize/7-8 65.6µs ± 5% 67.5µs ± 6% +2.92% (p=0.003 n=20+19) LeafSize/8-8 25.5µs ± 5% 25.6µs ± 6% ~ (p=0.461 n=19+20) LeafSize/9-8 134µs ± 4% 136µs ± 5% ~ (p=0.194 n=19+20) LeafSize/10-8 119µs ± 3% 122µs ± 3% +2.52% (p=0.000 n=20+19) LeafSize/11-8 115µs ± 5% 116µs ± 5% ~ (p=0.158 n=20+19) LeafSize/12-8 97.4µs ± 4% 100.3µs ± 5% +2.91% (p=0.003 n=19+20) LeafSize/13-8 93.1µs ± 4% 93.0µs ± 6% ~ (p=0.698 n=20+20) LeafSize/14-8 67.0µs ± 3% 69.7µs ± 6% +4.10% (p=0.000 n=20+20) LeafSize/15-8 48.3µs ± 2% 49.3µs ± 6% +1.91% (p=0.014 n=19+20) LeafSize/16-8 25.6µs ± 5% 25.6µs ± 6% ~ (p=0.947 n=20+20) LeafSize/32-8 30.1µs ± 4% 30.3µs ± 5% ~ (p=0.685 n=18+19) LeafSize/64-8 53.4µs ± 2% 54.0µs ± 3% ~ (p=0.053 n=19+19) ProbablyPrime/n=0-8 3.59ms ± 1% 3.55ms ± 1% -1.12% (p=0.000 n=20+18) ProbablyPrime/n=1-8 4.21ms ± 2% 4.17ms ± 2% -0.73% (p=0.018 n=20+19) ProbablyPrime/n=5-8 6.74ms ± 1% 6.72ms ± 1% ~ (p=0.102 n=20+20) ProbablyPrime/n=10-8 9.91ms ± 1% 9.89ms ± 2% ~ (p=0.322 n=19+20) ProbablyPrime/n=20-8 16.2ms ± 1% 16.1ms ± 2% -0.52% (p=0.006 n=19+19) ProbablyPrime/Lucas-8 2.94ms ± 1% 2.95ms ± 1% +0.52% (p=0.002 n=18+19) ProbablyPrime/MillerRabinBase2-8 641µs ± 2% 640µs ± 2% ~ (p=0.607 n=19+20) FloatSqrt/64-8 653ns ± 5% 704ns ± 5% +7.82% (p=0.000 n=19+20) FloatSqrt/128-8 1.32µs ± 3% 1.42µs ± 5% +7.29% (p=0.000 n=18+20) FloatSqrt/256-8 1.44µs ± 2% 1.45µs ± 4% ~ (p=0.089 n=19+19) FloatSqrt/1000-8 3.36µs ± 3% 3.42µs ± 5% +1.82% (p=0.012 n=20+20) FloatSqrt/10000-8 25.5µs ± 2% 27.5µs ± 7% +7.91% (p=0.000 n=18+19) FloatSqrt/100000-8 629µs ± 6% 663µs ± 9% +5.32% (p=0.000 n=18+20) FloatSqrt/1000000-8 46.4ms ± 2% 46.6ms ± 5% ~ (p=0.351 n=20+19) [Geo mean] 9.60µs 10.01µs +4.28% name old alloc/op new alloc/op delta DecimalConversion-8 54.0kB ± 0% 43.6kB ± 0% -19.40% (p=0.000 n=20+20) FloatString/100-8 400B ± 0% 400B ± 0% ~ (all equal) FloatString/1000-8 3.10kB ± 0% 3.10kB ± 0% ~ (all equal) FloatString/10000-8 52.1kB ± 0% 52.1kB ± 0% ~ (p=0.153 n=20+20) FloatString/100000-8 582kB ± 0% 582kB ± 0% ~ (all equal) FloatAdd/10-8 0.00B 0.00B ~ (all equal) FloatAdd/100-8 0.00B 0.00B ~ (all equal) FloatAdd/1000-8 0.00B 0.00B ~ (all equal) FloatAdd/10000-8 0.00B 0.00B ~ (all equal) FloatAdd/100000-8 0.00B 0.00B ~ (all equal) FloatSub/10-8 0.00B 0.00B ~ (all equal) FloatSub/100-8 0.00B 0.00B ~ (all equal) FloatSub/1000-8 0.00B 0.00B ~ (all equal) FloatSub/10000-8 0.00B 0.00B ~ (all equal) FloatSub/100000-8 0.00B 0.00B ~ (all equal) ParseFloatSmallExp-8 4.18kB ± 0% 3.60kB ± 0% -13.79% (p=0.000 n=20+20) ParseFloatLargeExp-8 18.9kB ± 0% 19.3kB ± 0% +2.25% (p=0.000 n=20+20) GCD10x10/WithoutXY-8 96.0B ± 0% 16.0B ± 0% -83.33% (p=0.000 n=20+20) GCD10x10/WithXY-8 240B ± 0% 88B ± 0% -63.33% (p=0.000 n=20+20) GCD10x100/WithoutXY-8 192B ± 0% 112B ± 0% -41.67% (p=0.000 n=20+20) GCD10x100/WithXY-8 464B ± 0% 424B ± 0% -8.62% (p=0.000 n=20+20) GCD10x1000/WithoutXY-8 416B ± 0% 336B ± 0% -19.23% (p=0.000 n=20+20) GCD10x1000/WithXY-8 1.25kB ± 0% 1.10kB ± 0% -12.18% (p=0.000 n=20+20) GCD10x10000/WithoutXY-8 2.91kB ± 0% 2.83kB ± 0% -2.75% (p=0.000 n=20+20) GCD10x10000/WithXY-8 8.70kB ± 0% 8.55kB ± 0% -1.76% (p=0.000 n=16+16) GCD10x100000/WithoutXY-8 27.2kB ± 0% 27.2kB ± 0% -0.29% (p=0.000 n=20+20) GCD10x100000/WithXY-8 82.4kB ± 0% 82.3kB ± 0% -0.17% (p=0.000 n=20+19) GCD100x100/WithoutXY-8 288B ± 0% 384B ± 0% +33.33% (p=0.000 n=20+20) GCD100x100/WithXY-8 464B ± 0% 576B ± 0% +24.14% (p=0.000 n=20+20) GCD100x1000/WithoutXY-8 640B ± 0% 688B ± 0% +7.50% (p=0.000 n=20+20) GCD100x1000/WithXY-8 1.52kB ± 0% 1.46kB ± 0% -3.68% (p=0.000 n=20+20) GCD100x10000/WithoutXY-8 4.24kB ± 0% 4.29kB ± 0% +1.13% (p=0.000 n=20+20) GCD100x10000/WithXY-8 11.1kB ± 0% 11.0kB ± 0% -0.51% (p=0.000 n=15+20) GCD100x100000/WithoutXY-8 40.9kB ± 0% 40.9kB ± 0% +0.12% (p=0.000 n=20+19) GCD100x100000/WithXY-8 110kB ± 0% 109kB ± 0% -0.08% (p=0.000 n=20+20) GCD1000x1000/WithoutXY-8 1.22kB ± 0% 1.06kB ± 0% -13.16% (p=0.000 n=20+20) GCD1000x1000/WithXY-8 2.37kB ± 0% 2.11kB ± 0% -10.83% (p=0.000 n=20+20) GCD1000x10000/WithoutXY-8 4.71kB ± 0% 4.63kB ± 0% -1.70% (p=0.000 n=20+19) GCD1000x10000/WithXY-8 28.2kB ± 0% 28.0kB ± 0% -0.43% (p=0.000 n=20+15) GCD1000x100000/WithoutXY-8 41.3kB ± 0% 41.2kB ± 0% -0.20% (p=0.000 n=20+16) GCD1000x100000/WithXY-8 301kB ± 0% 301kB ± 0% -0.13% (p=0.000 n=20+20) GCD10000x10000/WithoutXY-8 8.64kB ± 0% 8.48kB ± 0% -1.85% (p=0.000 n=20+20) GCD10000x10000/WithXY-8 57.2kB ± 0% 57.7kB ± 0% +0.80% (p=0.000 n=20+20) GCD10000x100000/WithoutXY-8 43.8kB ± 0% 43.7kB ± 0% -0.19% (p=0.000 n=20+18) GCD10000x100000/WithXY-8 2.08MB ± 0% 2.08MB ± 0% -0.02% (p=0.000 n=15+19) GCD100000x100000/WithoutXY-8 81.6kB ± 0% 81.4kB ± 0% -0.20% (p=0.000 n=20+20) GCD100000x100000/WithXY-8 4.32MB ± 0% 4.33MB ± 0% +0.12% (p=0.000 n=20+20) Hilbert-8 653kB ± 0% 313kB ± 0% -52.13% (p=0.000 n=19+20) Binomial-8 1.82kB ± 0% 1.02kB ± 0% -43.86% (p=0.000 n=20+20) QuoRem-8 0.00B 0.00B ~ (all equal) Exp-8 11.1kB ± 0% 11.0kB ± 0% -0.34% (p=0.000 n=19+20) Exp2-8 11.3kB ± 0% 11.3kB ± 0% -0.35% (p=0.000 n=19+20) Bitset-8 0.00B 0.00B ~ (all equal) BitsetNeg-8 0.00B 0.00B ~ (all equal) BitsetOrig-8 103B ± 0% 63B ± 0% -38.83% (p=0.000 n=20+20) BitsetNegOrig-8 215B ± 0% 175B ± 0% -18.60% (p=0.000 n=20+20) ModSqrt225_Tonelli-8 11.3kB ± 0% 11.0kB ± 0% -2.76% (p=0.000 n=20+17) ModSqrt225_3Mod4-8 3.57kB ± 0% 3.53kB ± 0% -1.12% (p=0.000 n=20+20) ModSqrt231_Tonelli-8 11.0kB ± 0% 10.7kB ± 0% -2.55% (p=0.000 n=20+20) ModSqrt231_5Mod8-8 4.21kB ± 0% 4.09kB ± 0% -2.85% (p=0.000 n=16+20) ModInverse-8 1.44kB ± 0% 1.28kB ± 0% -11.11% (p=0.000 n=20+20) Sqrt-8 6.00kB ± 0% 6.00kB ± 0% ~ (all equal) IntSqr/1-8 0.00B 0.00B ~ (all equal) IntSqr/2-8 0.00B 0.00B ~ (all equal) IntSqr/3-8 0.00B 0.00B ~ (all equal) IntSqr/5-8 0.00B 0.00B ~ (all equal) IntSqr/8-8 0.00B 0.00B ~ (all equal) IntSqr/10-8 0.00B 0.00B ~ (all equal) IntSqr/20-8 320B ± 0% 320B ± 0% ~ (all equal) IntSqr/30-8 480B ± 0% 480B ± 0% ~ (all equal) IntSqr/50-8 896B ± 0% 896B ± 0% ~ (all equal) IntSqr/80-8 1.28kB ± 0% 1.28kB ± 0% ~ (all equal) IntSqr/100-8 1.79kB ± 0% 1.79kB ± 0% ~ (all equal) IntSqr/200-8 3.20kB ± 0% 3.20kB ± 0% ~ (all equal) IntSqr/300-8 8.06kB ± 0% 8.06kB ± 0% ~ (all equal) IntSqr/500-8 12.3kB ± 0% 12.3kB ± 0% ~ (all equal) IntSqr/800-8 28.8kB ± 0% 28.8kB ± 0% ~ (all equal) IntSqr/1000-8 36.9kB ± 0% 36.9kB ± 0% ~ (all equal) Div/20/10-8 0.00B 0.00B ~ (all equal) Div/200/100-8 0.00B 0.00B ~ (all equal) Div/2000/1000-8 0.00B 0.00B ~ (all equal) Div/20000/10000-8 0.00B 0.00B ~ (all equal) Div/200000/100000-8 690B ± 0% 690B ± 0% ~ (all equal) Mul-8 565kB ± 0% 565kB ± 0% ~ (all equal) ZeroShifts/Shl-8 6.53kB ± 0% 6.53kB ± 0% ~ (all equal) ZeroShifts/ShlSame-8 0.00B 0.00B ~ (all equal) ZeroShifts/Shr-8 6.53kB ± 0% 6.53kB ± 0% ~ (all equal) ZeroShifts/ShrSame-8 0.00B 0.00B ~ (all equal) Exp3Power/0x10-8 192B ± 0% 112B ± 0% -41.67% (p=0.000 n=20+20) Exp3Power/0x40-8 192B ± 0% 112B ± 0% -41.67% (p=0.000 n=20+20) Exp3Power/0x100-8 288B ± 0% 208B ± 0% -27.78% (p=0.000 n=20+20) Exp3Power/0x400-8 672B ± 0% 592B ± 0% -11.90% (p=0.000 n=20+20) Exp3Power/0x1000-8 3.33kB ± 0% 3.25kB ± 0% -2.40% (p=0.000 n=20+20) Exp3Power/0x4000-8 13.8kB ± 0% 13.7kB ± 0% -0.58% (p=0.000 n=20+20) Exp3Power/0x10000-8 117kB ± 0% 117kB ± 0% -0.07% (p=0.000 n=20+20) Exp3Power/0x40000-8 755kB ± 0% 755kB ± 0% -0.01% (p=0.000 n=19+20) Exp3Power/0x100000-8 5.22MB ± 0% 5.22MB ± 0% -0.00% (p=0.000 n=20+20) Exp3Power/0x400000-8 39.8MB ± 0% 39.8MB ± 0% -0.00% (p=0.000 n=20+19) Fibo-8 3.09MB ± 0% 3.08MB ± 0% -0.28% (p=0.000 n=20+16) NatSqr/1-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) NatSqr/2-8 64.0B ± 0% 64.0B ± 0% ~ (all equal) NatSqr/3-8 80.0B ± 0% 80.0B ± 0% ~ (all equal) NatSqr/5-8 112B ± 0% 112B ± 0% ~ (all equal) NatSqr/8-8 160B ± 0% 160B ± 0% ~ (all equal) NatSqr/10-8 192B ± 0% 192B ± 0% ~ (all equal) NatSqr/20-8 672B ± 0% 672B ± 0% ~ (all equal) NatSqr/30-8 992B ± 0% 992B ± 0% ~ (all equal) NatSqr/50-8 1.79kB ± 0% 1.79kB ± 0% ~ (all equal) NatSqr/80-8 2.69kB ± 0% 2.69kB ± 0% ~ (all equal) NatSqr/100-8 3.58kB ± 0% 3.58kB ± 0% ~ (all equal) NatSqr/200-8 6.66kB ± 0% 6.66kB ± 0% ~ (all equal) NatSqr/300-8 24.4kB ± 0% 24.4kB ± 0% ~ (all equal) NatSqr/500-8 36.9kB ± 0% 36.9kB ± 0% ~ (all equal) NatSqr/800-8 69.8kB ± 0% 69.8kB ± 0% ~ (all equal) NatSqr/1000-8 86.0kB ± 0% 86.0kB ± 0% ~ (all equal) NatSetBytes/8-8 0.00B 0.00B ~ (all equal) NatSetBytes/24-8 64.0B ± 0% 64.0B ± 0% ~ (all equal) NatSetBytes/128-8 160B ± 0% 160B ± 0% ~ (all equal) NatSetBytes/7-8 0.00B 0.00B ~ (all equal) NatSetBytes/23-8 64.0B ± 0% 64.0B ± 0% ~ (all equal) NatSetBytes/127-8 160B ± 0% 160B ± 0% ~ (all equal) ScanPi-8 75.4kB ± 0% 75.7kB ± 0% +0.41% (p=0.000 n=20+20) StringPiParallel-8 20.4kB ± 0% 20.4kB ± 0% ~ (p=0.223 n=20+20) Scan/10/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/1000/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/10000/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100000/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/10/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/1000/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/10000/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100000/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/10/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/1000/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/10000/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100000/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/10/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/1000/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/10000/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100000/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) String/10/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) String/100/Base2-8 352B ± 0% 352B ± 0% ~ (all equal) String/1000/Base2-8 3.46kB ± 0% 3.46kB ± 0% ~ (all equal) String/10000/Base2-8 41.0kB ± 0% 41.0kB ± 0% ~ (all equal) String/100000/Base2-8 336kB ± 0% 336kB ± 0% ~ (all equal) String/10/Base8-8 16.0B ± 0% 16.0B ± 0% ~ (all equal) String/100/Base8-8 112B ± 0% 112B ± 0% ~ (all equal) String/1000/Base8-8 1.15kB ± 0% 1.15kB ± 0% ~ (all equal) String/10000/Base8-8 12.3kB ± 0% 12.3kB ± 0% ~ (all equal) String/100000/Base8-8 115kB ± 0% 115kB ± 0% ~ (all equal) String/10/Base10-8 64.0B ± 0% 24.0B ± 0% -62.50% (p=0.000 n=20+20) String/100/Base10-8 192B ± 0% 192B ± 0% ~ (all equal) String/1000/Base10-8 1.95kB ± 0% 1.95kB ± 0% ~ (all equal) String/10000/Base10-8 20.0kB ± 0% 20.0kB ± 0% ~ (p=0.983 n=19+20) String/100000/Base10-8 210kB ± 1% 211kB ± 1% +0.82% (p=0.000 n=19+20) String/10/Base16-8 16.0B ± 0% 16.0B ± 0% ~ (all equal) String/100/Base16-8 96.0B ± 0% 96.0B ± 0% ~ (all equal) String/1000/Base16-8 896B ± 0% 896B ± 0% ~ (all equal) String/10000/Base16-8 9.47kB ± 0% 9.47kB ± 0% ~ (all equal) String/100000/Base16-8 90.1kB ± 0% 90.1kB ± 0% ~ (all equal) LeafSize/0-8 16.9kB ± 0% 16.8kB ± 0% -0.44% (p=0.000 n=20+20) LeafSize/1-8 22.4kB ± 0% 22.3kB ± 0% -0.34% (p=0.000 n=20+19) LeafSize/2-8 22.4kB ± 0% 22.3kB ± 0% -0.34% (p=0.000 n=20+19) LeafSize/3-8 22.4kB ± 0% 22.3kB ± 0% -0.34% (p=0.000 n=20+17) LeafSize/4-8 22.4kB ± 0% 22.3kB ± 0% -0.34% (p=0.000 n=20+19) LeafSize/5-8 22.4kB ± 0% 22.3kB ± 0% -0.33% (p=0.000 n=20+20) LeafSize/6-8 22.3kB ± 0% 22.2kB ± 0% -0.34% (p=0.000 n=20+20) LeafSize/7-8 22.3kB ± 0% 22.2kB ± 0% -0.35% (p=0.000 n=20+20) LeafSize/8-8 22.3kB ± 0% 22.2kB ± 0% -0.34% (p=0.000 n=16+20) LeafSize/9-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20) LeafSize/10-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20) LeafSize/11-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20) LeafSize/12-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20) LeafSize/13-8 22.3kB ± 0% 22.2kB ± 0% -0.34% (p=0.000 n=20+15) LeafSize/14-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20) LeafSize/15-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20) LeafSize/16-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=19+20) LeafSize/32-8 22.3kB ± 0% 22.2kB ± 0% -0.32% (p=0.000 n=20+20) LeafSize/64-8 21.8kB ± 0% 21.7kB ± 0% -0.33% (p=0.000 n=18+19) ProbablyPrime/n=0-8 15.3kB ± 0% 14.9kB ± 0% -2.35% (p=0.000 n=20+20) ProbablyPrime/n=1-8 21.0kB ± 0% 20.7kB ± 0% -1.71% (p=0.000 n=20+20) ProbablyPrime/n=5-8 43.4kB ± 0% 42.9kB ± 0% -1.20% (p=0.000 n=20+20) ProbablyPrime/n=10-8 71.5kB ± 0% 70.7kB ± 0% -1.01% (p=0.000 n=19+20) ProbablyPrime/n=20-8 127kB ± 0% 126kB ± 0% -0.88% (p=0.000 n=20+20) ProbablyPrime/Lucas-8 3.07kB ± 0% 2.79kB ± 0% -9.12% (p=0.000 n=20+20) ProbablyPrime/MillerRabinBase2-8 12.1kB ± 0% 12.0kB ± 0% -0.66% (p=0.000 n=20+20) FloatSqrt/64-8 416B ± 0% 360B ± 0% -13.46% (p=0.000 n=20+20) FloatSqrt/128-8 640B ± 0% 584B ± 0% -8.75% (p=0.000 n=20+20) FloatSqrt/256-8 512B ± 0% 472B ± 0% -7.81% (p=0.000 n=20+20) FloatSqrt/1000-8 1.47kB ± 0% 1.43kB ± 0% -2.72% (p=0.000 n=20+20) FloatSqrt/10000-8 18.2kB ± 0% 18.1kB ± 0% -0.22% (p=0.000 n=20+20) FloatSqrt/100000-8 204kB ± 0% 204kB ± 0% -0.02% (p=0.000 n=20+20) FloatSqrt/1000000-8 6.37MB ± 0% 6.37MB ± 0% -0.00% (p=0.000 n=19+20) [Geo mean] 3.42kB 3.24kB -5.33% name old allocs/op new allocs/op delta DecimalConversion-8 1.65k ± 0% 1.65k ± 0% ~ (all equal) FloatString/100-8 8.00 ± 0% 8.00 ± 0% ~ (all equal) FloatString/1000-8 9.00 ± 0% 9.00 ± 0% ~ (all equal) FloatString/10000-8 22.0 ± 0% 22.0 ± 0% ~ (all equal) FloatString/100000-8 136 ± 0% 136 ± 0% ~ (all equal) FloatAdd/10-8 0.00 0.00 ~ (all equal) FloatAdd/100-8 0.00 0.00 ~ (all equal) FloatAdd/1000-8 0.00 0.00 ~ (all equal) FloatAdd/10000-8 0.00 0.00 ~ (all equal) FloatAdd/100000-8 0.00 0.00 ~ (all equal) FloatSub/10-8 0.00 0.00 ~ (all equal) FloatSub/100-8 0.00 0.00 ~ (all equal) FloatSub/1000-8 0.00 0.00 ~ (all equal) FloatSub/10000-8 0.00 0.00 ~ (all equal) FloatSub/100000-8 0.00 0.00 ~ (all equal) ParseFloatSmallExp-8 110 ± 0% 130 ± 0% +18.18% (p=0.000 n=20+20) ParseFloatLargeExp-8 319 ± 0% 371 ± 0% +16.30% (p=0.000 n=20+20) GCD10x10/WithoutXY-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) GCD10x10/WithXY-8 5.00 ± 0% 6.00 ± 0% +20.00% (p=0.000 n=20+20) GCD10x100/WithoutXY-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) GCD10x100/WithXY-8 9.00 ± 0% 12.00 ± 0% +33.33% (p=0.000 n=20+20) GCD10x1000/WithoutXY-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) GCD10x1000/WithXY-8 11.0 ± 0% 12.0 ± 0% +9.09% (p=0.000 n=20+20) GCD10x10000/WithoutXY-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) GCD10x10000/WithXY-8 11.0 ± 0% 12.0 ± 0% +9.09% (p=0.000 n=20+20) GCD10x100000/WithoutXY-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) GCD10x100000/WithXY-8 11.0 ± 0% 12.0 ± 0% +9.09% (p=0.000 n=20+20) GCD100x100/WithoutXY-8 6.00 ± 0% 10.00 ± 0% +66.67% (p=0.000 n=20+20) GCD100x100/WithXY-8 9.00 ± 0% 15.00 ± 0% +66.67% (p=0.000 n=20+20) GCD100x1000/WithoutXY-8 6.00 ± 0% 8.00 ± 0% +33.33% (p=0.000 n=20+20) GCD100x1000/WithXY-8 12.0 ± 0% 13.0 ± 0% +8.33% (p=0.000 n=20+20) GCD100x10000/WithoutXY-8 6.00 ± 0% 8.00 ± 0% +33.33% (p=0.000 n=20+20) GCD100x10000/WithXY-8 12.0 ± 0% 13.0 ± 0% +8.33% (p=0.000 n=20+20) GCD100x100000/WithoutXY-8 6.00 ± 0% 8.00 ± 0% +33.33% (p=0.000 n=20+20) GCD100x100000/WithXY-8 12.0 ± 0% 13.0 ± 0% +8.33% (p=0.000 n=20+20) GCD1000x1000/WithoutXY-8 10.0 ± 0% 10.0 ± 0% ~ (all equal) GCD1000x1000/WithXY-8 19.0 ± 0% 20.0 ± 0% +5.26% (p=0.000 n=20+20) GCD1000x10000/WithoutXY-8 8.00 ± 0% 8.00 ± 0% ~ (all equal) GCD1000x10000/WithXY-8 26.0 ± 0% 26.0 ± 0% ~ (all equal) GCD1000x100000/WithoutXY-8 8.00 ± 0% 8.00 ± 0% ~ (all equal) GCD1000x100000/WithXY-8 27.0 ± 0% 27.0 ± 0% ~ (all equal) GCD10000x10000/WithoutXY-8 10.0 ± 0% 10.0 ± 0% ~ (all equal) GCD10000x10000/WithXY-8 76.0 ± 0% 78.0 ± 0% +2.63% (p=0.000 n=20+20) GCD10000x100000/WithoutXY-8 8.00 ± 0% 8.00 ± 0% ~ (all equal) GCD10000x100000/WithXY-8 174 ± 0% 174 ± 0% ~ (all equal) GCD100000x100000/WithoutXY-8 10.0 ± 0% 10.0 ± 0% ~ (all equal) GCD100000x100000/WithXY-8 645 ± 0% 647 ± 0% +0.31% (p=0.000 n=20+20) Hilbert-8 14.1k ± 0% 14.3k ± 0% +0.92% (p=0.000 n=20+20) Binomial-8 38.0 ± 0% 38.0 ± 0% ~ (all equal) QuoRem-8 0.00 0.00 ~ (all equal) Exp-8 21.0 ± 0% 21.0 ± 0% ~ (all equal) Exp2-8 22.0 ± 0% 22.0 ± 0% ~ (all equal) Bitset-8 0.00 0.00 ~ (all equal) BitsetNeg-8 0.00 0.00 ~ (all equal) BitsetOrig-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) BitsetNegOrig-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) ModSqrt225_Tonelli-8 85.0 ± 0% 86.0 ± 0% +1.18% (p=0.000 n=20+20) ModSqrt225_3Mod4-8 25.0 ± 0% 25.0 ± 0% ~ (all equal) ModSqrt231_Tonelli-8 80.0 ± 0% 80.0 ± 0% ~ (all equal) ModSqrt231_5Mod8-8 32.0 ± 0% 32.0 ± 0% ~ (all equal) ModInverse-8 11.0 ± 0% 11.0 ± 0% ~ (all equal) Sqrt-8 13.0 ± 0% 13.0 ± 0% ~ (all equal) IntSqr/1-8 0.00 0.00 ~ (all equal) IntSqr/2-8 0.00 0.00 ~ (all equal) IntSqr/3-8 0.00 0.00 ~ (all equal) IntSqr/5-8 0.00 0.00 ~ (all equal) IntSqr/8-8 0.00 0.00 ~ (all equal) IntSqr/10-8 0.00 0.00 ~ (all equal) IntSqr/20-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IntSqr/30-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IntSqr/50-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IntSqr/80-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IntSqr/100-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IntSqr/200-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IntSqr/300-8 3.00 ± 0% 3.00 ± 0% ~ (all equal) IntSqr/500-8 3.00 ± 0% 3.00 ± 0% ~ (all equal) IntSqr/800-8 9.00 ± 0% 9.00 ± 0% ~ (all equal) IntSqr/1000-8 9.00 ± 0% 9.00 ± 0% ~ (all equal) Div/20/10-8 0.00 0.00 ~ (all equal) Div/200/100-8 0.00 0.00 ~ (all equal) Div/2000/1000-8 0.00 0.00 ~ (all equal) Div/20000/10000-8 0.00 0.00 ~ (all equal) Div/200000/100000-8 0.00 0.00 ~ (all equal) Mul-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) ZeroShifts/Shl-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) ZeroShifts/ShlSame-8 0.00 0.00 ~ (all equal) ZeroShifts/Shr-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) ZeroShifts/ShrSame-8 0.00 0.00 ~ (all equal) Exp3Power/0x10-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) Exp3Power/0x40-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) Exp3Power/0x100-8 5.00 ± 0% 5.00 ± 0% ~ (all equal) Exp3Power/0x400-8 7.00 ± 0% 7.00 ± 0% ~ (all equal) Exp3Power/0x1000-8 11.0 ± 0% 11.0 ± 0% ~ (all equal) Exp3Power/0x4000-8 15.0 ± 0% 15.0 ± 0% ~ (all equal) Exp3Power/0x10000-8 29.0 ± 0% 29.0 ± 0% ~ (all equal) Exp3Power/0x40000-8 140 ± 0% 140 ± 0% ~ (all equal) Exp3Power/0x100000-8 1.12k ± 0% 1.12k ± 0% ~ (all equal) Exp3Power/0x400000-8 9.88k ± 0% 9.88k ± 0% ~ (p=0.747 n=17+19) Fibo-8 739 ± 0% 743 ± 0% +0.54% (p=0.000 n=20+20) NatSqr/1-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSqr/2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSqr/3-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSqr/5-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSqr/8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSqr/10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSqr/20-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) NatSqr/30-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) NatSqr/50-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) NatSqr/80-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) NatSqr/100-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) NatSqr/200-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) NatSqr/300-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) NatSqr/500-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) NatSqr/800-8 10.0 ± 0% 10.0 ± 0% ~ (all equal) NatSqr/1000-8 10.0 ± 0% 10.0 ± 0% ~ (all equal) NatSetBytes/8-8 0.00 0.00 ~ (all equal) NatSetBytes/24-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSetBytes/128-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSetBytes/7-8 0.00 0.00 ~ (all equal) NatSetBytes/23-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSetBytes/127-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) ScanPi-8 60.0 ± 0% 61.0 ± 0% +1.67% (p=0.000 n=20+20) StringPiParallel-8 24.0 ± 0% 24.0 ± 0% ~ (all equal) Scan/10/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/1000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/10000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/10/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/1000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/10000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/10/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/1000/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/10000/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100000/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/10/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/1000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/10000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/10/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/100/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/1000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/10000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/100000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/10/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/100/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/1000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/10000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/100000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/10/Base10-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) String/100/Base10-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) String/1000/Base10-8 3.00 ± 0% 3.00 ± 0% ~ (all equal) String/10000/Base10-8 3.00 ± 0% 3.00 ± 0% ~ (all equal) String/100000/Base10-8 3.00 ± 0% 3.00 ± 0% ~ (all equal) String/10/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/100/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/1000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/10000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/100000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) LeafSize/0-8 10.0 ± 0% 10.0 ± 0% ~ (all equal) LeafSize/1-8 13.0 ± 0% 13.0 ± 0% ~ (all equal) LeafSize/2-8 13.0 ± 0% 13.0 ± 0% ~ (all equal) LeafSize/3-8 13.0 ± 0% 13.0 ± 0% ~ (all equal) LeafSize/4-8 13.0 ± 0% 13.0 ± 0% ~ (all equal) LeafSize/5-8 13.0 ± 0% 13.0 ± 0% ~ (all equal) LeafSize/6-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/7-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/8-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/9-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/10-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/11-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/12-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/13-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/14-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/15-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/16-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/32-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/64-8 11.0 ± 0% 11.0 ± 0% ~ (all equal) ProbablyPrime/n=0-8 52.0 ± 0% 52.0 ± 0% ~ (all equal) ProbablyPrime/n=1-8 73.0 ± 0% 73.0 ± 0% ~ (all equal) ProbablyPrime/n=5-8 157 ± 0% 157 ± 0% ~ (all equal) ProbablyPrime/n=10-8 262 ± 0% 262 ± 0% ~ (all equal) ProbablyPrime/n=20-8 472 ± 0% 472 ± 0% ~ (all equal) ProbablyPrime/Lucas-8 22.0 ± 0% 22.0 ± 0% ~ (all equal) ProbablyPrime/MillerRabinBase2-8 29.0 ± 0% 29.0 ± 0% ~ (all equal) FloatSqrt/64-8 9.00 ± 0% 10.00 ± 0% +11.11% (p=0.000 n=20+20) FloatSqrt/128-8 12.0 ± 0% 13.0 ± 0% +8.33% (p=0.000 n=20+20) FloatSqrt/256-8 8.00 ± 0% 8.00 ± 0% ~ (all equal) FloatSqrt/1000-8 9.00 ± 0% 9.00 ± 0% ~ (all equal) FloatSqrt/10000-8 14.0 ± 0% 14.0 ± 0% ~ (all equal) FloatSqrt/100000-8 33.0 ± 0% 33.0 ± 0% ~ (all equal) FloatSqrt/1000000-8 1.16k ± 0% 1.16k ± 0% ~ (all equal) [Geo mean] 6.62 6.76 +2.09% Change-Id: Id9df4157cac1e07721e35cff7fcdefe60703873a Reviewed-on: https://go-review.googlesource.com/c/150999 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Alan Donovan <adonovan@google.com> Reviewed-by: Robert Griesemer <gri@golang.org>

The divLarge code contained "todo"s about avoiding alias and clear calls in the initialization of variables. By rearranging the order of initialization and always using an auxiliary variable for the shifted divisor, all of these calls can be safely avoided. On average, normalizing the divisor (shift>0) is required 31/32 or 63/64 of the time. If one always performs the shift into an auxiliary variable first, this avoids the need to check for aliasing of vIn in the output variables u and z. The remainder u is initialized via a left shift of uIn and thus needs no alias check against uIn. Since uIn and vIn were both used, z needs no alias checks except against u which is used for storage of the remainder. This change has a minimal impact on performance (see below), but cleans up the initialization code and eliminates the "todo"s. name old time/op new time/op delta Div/20/10-4 86.7ns ± 6% 85.7ns ± 5% ~ (p=0.841 n=5+5) Div/200/100-4 523ns ± 5% 502ns ± 3% -4.13% (p=0.024 n=5+5) Div/2000/1000-4 2.55µs ± 3% 2.59µs ± 5% ~ (p=0.548 n=5+5) Div/20000/10000-4 80.4µs ± 4% 80.0µs ± 2% ~ (p=1.000 n=5+5) Div/200000/100000-4 6.43ms ± 6% 6.35ms ± 4% ~ (p=0.548 n=5+5) Fixes #22928 Change-Id: I30d8498ef1cf8b69b0f827165c517bc25a5c32d7 Reviewed-on: https://go-review.googlesource.com/130775 Reviewed-by: Robert Griesemer <gri@golang.org>

For #23221. Change-Id: If55dcf2e0706d6658f4a0863e3740437e008706c Reviewed-on: https://go-review.googlesource.com/114335 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>

Currently we use three different algorithms for squaring: 1. basic multiplication for small numbers 2. basic squaring for medium numbers 3. Karatsuba multiplication for large numbers Change 3. to a version of Karatsuba multiplication specialized for x == y. Increasing the performance of 3. lets us lower the threshold between 2. and 3. Adapt TestCalibrate to the change that 3. isn't independent of the threshold between 1. and 2. any more. Fixes #23221. benchstat old.txt new.txt name old time/op new time/op delta NatSqr/1-4 29.6ns ± 7% 29.5ns ± 5% ~ (p=0.103 n=50+50) NatSqr/2-4 51.9ns ± 1% 51.9ns ± 1% ~ (p=0.693 n=42+49) NatSqr/3-4 64.3ns ± 1% 64.1ns ± 0% -0.26% (p=0.000 n=46+43) NatSqr/5-4 93.5ns ± 2% 93.1ns ± 1% -0.39% (p=0.000 n=48+49) NatSqr/8-4 131ns ± 1% 131ns ± 1% ~ (p=0.870 n=46+49) NatSqr/10-4 175ns ± 1% 175ns ± 1% +0.38% (p=0.000 n=49+47) NatSqr/20-4 426ns ± 1% 429ns ± 1% +0.84% (p=0.000 n=46+48) NatSqr/30-4 702ns ± 2% 699ns ± 1% -0.38% (p=0.011 n=46+44) NatSqr/50-4 1.44µs ± 2% 1.43µs ± 1% -0.54% (p=0.010 n=48+48) NatSqr/80-4 2.85µs ± 1% 2.87µs ± 1% +0.68% (p=0.000 n=47+47) NatSqr/100-4 4.06µs ± 1% 4.07µs ± 1% +0.29% (p=0.000 n=46+45) NatSqr/200-4 13.4µs ± 1% 13.5µs ± 1% +0.73% (p=0.000 n=48+48) NatSqr/300-4 28.5µs ± 1% 28.2µs ± 1% -1.22% (p=0.000 n=46+48) NatSqr/500-4 81.9µs ± 1% 67.0µs ± 1% -18.25% (p=0.000 n=48+48) NatSqr/800-4 161µs ± 1% 140µs ± 1% -13.29% (p=0.000 n=47+48) NatSqr/1000-4 245µs ± 1% 207µs ± 1% -15.17% (p=0.000 n=49+49) go test -v -calibrate --run TestCalibrate ... Calibrating threshold between basicSqr(x) and karatsubaSqr(x) Looking for a timing difference for x between 200 - 500 words by 10 step words = 200 deltaT = -980ns ( -7%) is karatsubaSqr(x) better: false words = 210 deltaT = -773ns ( -5%) is karatsubaSqr(x) better: false words = 220 deltaT = -695ns ( -4%) is karatsubaSqr(x) better: false words = 230 deltaT = -570ns ( -3%) is karatsubaSqr(x) better: false words = 240 deltaT = -458ns ( -2%) is karatsubaSqr(x) better: false words = 250 deltaT = -63ns ( 0%) is karatsubaSqr(x) better: false words = 260 deltaT = 118ns ( 0%) is karatsubaSqr(x) better: true threshold found words = 270 deltaT = 377ns ( 1%) is karatsubaSqr(x) better: true words = 280 deltaT = 765ns ( 3%) is karatsubaSqr(x) better: true words = 290 deltaT = 673ns ( 2%) is karatsubaSqr(x) better: true words = 300 deltaT = 502ns ( 1%) is karatsubaSqr(x) better: true words = 310 deltaT = 629ns ( 2%) is karatsubaSqr(x) better: true words = 320 deltaT = 1.011µs ( 3%) is karatsubaSqr(x) better: true words = 330 deltaT = 1.36µs ( 4%) is karatsubaSqr(x) better: true words = 340 deltaT = 3.001µs ( 8%) is karatsubaSqr(x) better: true words = 350 deltaT = 3.178µs ( 8%) is karatsubaSqr(x) better: true ... Change-Id: I6f13c23d94d042539ac28e77fd2618cdc37a429e Reviewed-on: https://go-review.googlesource.com/105075 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>

Updates #22830 Due to not checking if the output slices alias in divLarge, calls of the form z.div(z, x, y) caused the slice z to attempt to be used to store both the quotient and the remainder of the division. CL 78995 applies an alias check to correct that error. This CL cleans up the additional div calls that attempt to supply the same slice to hold both the quotient and remainder. Note that the call in expNN was responsible for the reported error in r.Exp(x, 1, m) when r was initialized to a non-zero value. The second instance in expNNMontgomery did not result in an error due to the size of the arguments. // RR = 2**(2*_W*len(m)) mod m RR := nat(nil).setWord(1) zz := nat(nil).shl(RR, uint(2*numWords*_W)) _, RR = RR.div(RR, zz, m) Specifically, cap(RR) == 5 after setWord(1) due to const e = 4 in z.make(1) len(zz) == 2*len(m) + 1 after shifting left, numWords = len(m) Reusing the backing array for z and z2 in div was only triggered if cap(RR) >= len(zz) + 1 and len(m) > 1 so that divLarge was called. But, 5 < 2*len(m) + 2 if len(m) > 1, so new arrays were allocated and the error was never triggered in this case. Change-Id: Iedac80dbbde13216c94659e84d28f6f4be3aaf24 Reviewed-on: https://go-review.googlesource.com/81055 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>

That "else" was needed due to gc DCE limitations. Now it's not the case and we can avoid go lint complaints. (See #23521 and https://golang.org/cl/91056.) There is inlining test for bigEndianWord, so if test is passing, no performance regression should occur. Change-Id: Id84d63f361e5e51a52293904ff042966c83c16e9 Reviewed-on: https://go-review.googlesource.com/104555 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>

Instead shifting the accumulator every iteration of the loop, shift once in the end. This significantly improves performance on arm64. On arm64: name old time/op new time/op delta RSA2048Decrypt 3.33ms ± 0% 2.63ms ± 0% -20.94% (p=0.000 n=11+11) RSA2048Sign 4.22ms ± 0% 3.55ms ± 0% -15.89% (p=0.000 n=11+11) 3PrimeRSA2048Decrypt 1.95ms ± 0% 1.59ms ± 0% -18.59% (p=0.000 n=11+11) On Skylake: name old time/op new time/op delta RSA2048Decrypt-8 1.73ms ± 2% 1.55ms ± 2% -10.19% (p=0.000 n=10+10) RSA2048Sign-8 2.17ms ± 2% 2.00ms ± 2% -7.93% (p=0.000 n=10+10) 3PrimeRSA2048Decrypt-8 1.10ms ± 2% 0.96ms ± 2% -13.03% (p=0.000 n=10+9) Change-Id: I5786191a1a09e4217fdb1acfd90880d35c5855f7 Reviewed-on: https://go-review.googlesource.com/99838 Run-TryBot: Vlad Krasnov <vlad@cloudflare.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Adam Langley <agl@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>

One could expect calls like z.mant.shl(z.mant, shiftAmount) (or higher-level-functions calls that use lhs/rhs) to be almost free when shiftAmount = 0; and expect calls like z.mant.shl(x.mant, 0) to have the same cost of a x.mant -> z.mant copy. Neither of this things are currently true. For an 800 words nat, the first kind of calls cost ~800ns for rigth shifts and ~3.5µs for left shift; while the second kind of calls are doing more work than necessary by calling shlVU/shrVU. This change makes the first kind of calls ({Shl,Shr}Same) almost free, and the second kind of calls ({Shl,Shr}) about 30% faster. name old time/op new time/op delta ZeroShifts/Shl-4 3.64µs ± 3% 2.49µs ± 1% -31.55% (p=0.000 n=10+10) ZeroShifts/ShlSame-4 3.65µs ± 1% 0.01µs ± 1% -99.85% (p=0.000 n=9+9) ZeroShifts/Shr-4 3.65µs ± 1% 2.49µs ± 1% -31.91% (p=0.000 n=10+10) ZeroShifts/ShrSame-4 825ns ± 0% 6ns ± 1% -99.33% (p=0.000 n=9+10) During go test math/big, the shl zeroshift fastpath is triggered 1380 times; while the shr fastpath is triggered 153334 times(!). Change-Id: I5f92b304a40638bd8453a86c87c58e54b337bcdf Reviewed-on: https://go-review.googlesource.com/87660 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>

Change-Id: I7ed40507a019c0bf521ba748fc22c03d74bb17b7 Reviewed-on: https://go-review.googlesource.com/100719 Reviewed-by: Ian Lance Taylor <iant@golang.org>

Set up to _S (number of bytes in Uint) bytes at time by using BigEndian.Uint32 and BigEndian.Uint64. The performance improves for slices bigger than _S bytes. This is the case for 128/256bit arith that initializes it's objects from bytes. name old time/op new time/op delta NatSetBytes/8-4 29.8ns ± 1% 11.4ns ± 0% -61.63% (p=0.000 n=9+8) NatSetBytes/24-4 109ns ± 1% 56ns ± 0% -48.75% (p=0.000 n=9+8) NatSetBytes/128-4 420ns ± 2% 110ns ± 1% -73.83% (p=0.000 n=10+10) NatSetBytes/7-4 26.2ns ± 1% 21.3ns ± 2% -18.63% (p=0.000 n=8+9) NatSetBytes/23-4 106ns ± 1% 67ns ± 1% -36.93% (p=0.000 n=9+10) NatSetBytes/127-4 410ns ± 2% 121ns ± 0% -70.46% (p=0.000 n=9+8) Found this optimization opportunity by looking at ethereum_corevm community benchmark cpuprofile. name old time/op new time/op delta OpDiv256-4 715ns ± 1% 596ns ± 1% -16.57% (p=0.008 n=5+5) OpDiv128-4 373ns ± 1% 314ns ± 1% -15.83% (p=0.008 n=5+5) OpDiv64-4 301ns ± 0% 285ns ± 1% -5.12% (p=0.008 n=5+5) Change-Id: I8e5a680ae6284c8233d8d7431d51253a8a740b57 Reviewed-on: https://go-review.googlesource.com/98775 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> Reviewed-by: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>

In nat.divLarge (having signature (z nat).divLarge(u, uIn, v nat)), we check whether z aliases uIn or v, but aliasing is currently not checked for the u parameter. Unfortunately, z and u aliasing each other can in some cases cause errors in the computation. The q return parameter (which will hold the result's quotient), is unconditionally initialized as q = z.make(m + 1) When cap(z) ≥ m+1, z.make() will reuse z's backing array, causing q and z to share the same backing array. If then z aliases u, setting q during the quotient computation will then corrupt u, which at that point already holds computation state. To fix this, we add an alias(z, u) check at the beginning of the function, taking care of aliasing the same way we already do for uIn and v. Fixes #22830 Change-Id: I3ab81120d5af6db7772a062bb1dfc011de91f7ad Reviewed-on: https://go-review.googlesource.com/78995 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>

Replace z.mul(x, x) calls on nats in internal code with z.sqr(x) that employs optimized squaring routines. Benchmark results: Exp-4 12.9ms ± 2% 12.8ms ± 3% ~ (p=0.165 n=10+10) Exp2-4 13.0ms ± 4% 12.8ms ± 2% -2.14% (p=0.015 n=8+9) ModSqrt225_Tonelli-4 987µs ± 4% 989µs ± 2% ~ (p=0.673 n=8+9) ModSqrt224_3Mod4-4 300µs ± 2% 301µs ± 3% ~ (p=0.546 n=9+9) ModSqrt5430_Tonelli-4 4.88s ± 6% 4.82s ± 5% ~ (p=0.247 n=10+10) ModSqrt5430_3Mod4-4 1.62s ±10% 1.57s ± 1% ~ (p=0.094 n=9+9) Exp3Power/0x10-4 496ns ± 7% 426ns ± 7% -14.21% (p=0.000 n=10+10) Exp3Power/0x40-4 575ns ± 5% 470ns ± 7% -18.20% (p=0.000 n=9+10) Exp3Power/0x100-4 929ns ±19% 770ns ±10% -17.13% (p=0.000 n=10+10) Exp3Power/0x400-4 1.96µs ± 7% 1.79µs ± 5% -8.68% (p=0.000 n=10+10) Exp3Power/0x1000-4 10.9µs ± 9% 7.9µs ± 5% -28.02% (p=0.000 n=10+10) Exp3Power/0x4000-4 86.8µs ± 8% 67.3µs ± 8% -22.41% (p=0.000 n=10+10) Exp3Power/0x10000-4 750µs ± 8% 731µs ± 1% ~ (p=0.074 n=9+8) Exp3Power/0x40000-4 7.07ms ± 7% 7.05ms ± 4% ~ (p=0.931 n=9+9) Exp3Power/0x100000-4 64.7ms ± 2% 65.6ms ± 6% ~ (p=0.661 n=9+10) Exp3Power/0x400000-4 577ms ± 2% 580ms ± 3% ~ (p=0.931 n=9+9) ProbablyPrime/n=0-4 9.08ms ±17% 9.09ms ±16% ~ (p=0.447 n=9+10) ProbablyPrime/n=1-4 10.8ms ± 4% 10.7ms ± 2% ~ (p=0.243 n=10+9) ProbablyPrime/n=5-4 18.5ms ± 3% 18.5ms ± 1% ~ (p=0.863 n=9+9) ProbablyPrime/n=10-4 28.6ms ± 6% 28.2ms ± 1% ~ (p=0.050 n=9+9) ProbablyPrime/n=20-4 48.4ms ± 4% 48.4ms ± 2% ~ (p=0.739 n=10+10) ProbablyPrime/Lucas-4 6.75ms ± 4% 6.75ms ± 2% ~ (p=0.963 n=9+8) ProbablyPrime/MillerRabinBase2-4 2.00ms ± 5% 2.00ms ± 7% ~ (p=0.931 n=9+9) Change-Id: Ibe9f58d11dbad25eb369faedf480b666a0250a6b Reviewed-on: https://go-review.googlesource.com/56773 Reviewed-by: Robert Griesemer <gri@golang.org>

updates #13745 Multiprecision squaring can be done in a straightforward manner with about half the multiplications of a basic multiplication due to the symmetry of the operands. This change implements basic squaring for nat types and uses it for Int multiplication when the same variable is supplied to both arguments of z.Mul(x, x). This has some overhead to allocate a temporary variable to hold the cross products, shift them to double and add them to the diagonal terms. There is a speed benefit in the intermediate range when the overhead is neglible and the asymptotic performance of karatsuba multiplication has not been reached. basicSqrThreshold = 20 karatsubaSqrThreshold = 400 Were set by running calibrate_test.go to measure timing differences between the algorithms. Benchmarks for squaring: name old time/op new time/op delta IntSqr/1-4 51.5ns ±25% 25.1ns ± 7% -51.38% (p=0.008 n=5+5) IntSqr/2-4 79.1ns ± 4% 72.4ns ± 2% -8.47% (p=0.008 n=5+5) IntSqr/3-4 102ns ± 4% 97ns ± 5% ~ (p=0.056 n=5+5) IntSqr/5-4 161ns ± 4% 163ns ± 7% ~ (p=0.952 n=5+5) IntSqr/8-4 277ns ± 5% 267ns ± 6% ~ (p=0.087 n=5+5) IntSqr/10-4 358ns ± 3% 360ns ± 4% ~ (p=0.730 n=5+5) IntSqr/20-4 1.07µs ± 3% 1.01µs ± 6% ~ (p=0.056 n=5+5) IntSqr/30-4 2.36µs ± 4% 1.72µs ± 2% -27.03% (p=0.008 n=5+5) IntSqr/50-4 5.19µs ± 3% 3.88µs ± 4% -25.37% (p=0.008 n=5+5) IntSqr/80-4 11.3µs ± 4% 8.6µs ± 3% -23.78% (p=0.008 n=5+5) IntSqr/100-4 16.2µs ± 4% 12.8µs ± 3% -21.49% (p=0.008 n=5+5) IntSqr/200-4 50.1µs ± 5% 44.7µs ± 3% -10.65% (p=0.008 n=5+5) IntSqr/300-4 105µs ±11% 95µs ± 3% -9.50% (p=0.008 n=5+5) IntSqr/500-4 231µs ± 5% 227µs ± 2% ~ (p=0.310 n=5+5) IntSqr/800-4 496µs ± 9% 459µs ± 3% -7.40% (p=0.016 n=5+5) IntSqr/1000-4 700µs ± 3% 710µs ± 5% ~ (p=0.841 n=5+5) Show a speed up of 10-25% in the range where basicSqr is optimal, improved single word squaring and no significant difference when the fallback to standard multiplication is used. Change-Id: Iae2c82ca91cf890823f91e5c83bbe9a2c534b72b Reviewed-on: https://go-review.googlesource.com/53638 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>

Verified that BenchmarkBitLen time went down from 2.25 ns/op to 0.65 ns/op an a 2.3 GHz Intel Core i7, before removing that benchmark (now covered by math/bits benchmarks). Change-Id: I3890bb7d1889e95b9a94bd68f0bdf06f1885adeb Reviewed-on: https://go-review.googlesource.com/38464 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

nat.setUint64 is nicely generic. By assuming 32- or 64-bit words, however, we can write simpler code, and eliminate some shifts in dead code that vet complains about. Generated code for 64 bit systems is unaltered. Generated code for 32 bit systems is much better. For 386, the routine length drops from 325 bytes of code to 271 bytes of code, with fewer loops. Change-Id: I1bc14c06272dee37a7fcb48d33dd1e621eba945d Reviewed-on: https://go-review.googlesource.com/38070 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>

This change adds math/bits as a new dependency of math/big. - use bits.LeadingZeroes instead of local implementation (they are identical, so there's no performance loss here) - leave other functionality local (ntz, bitLen) since there's faster implementations in math/big at the moment Change-Id: I1218aa8a1df0cc9783583b090a4bb5a8a145c4a2 Reviewed-on: https://go-review.googlesource.com/37141 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

This is needed for some of the more complex primality tests (to filter out exact squares), and while the code is simple the boundary conditions are not obvious, so it seems worth having in the library. Change-Id: Ica994a6b6c1e412a6f6d9c3cf823f9b653c6bcbd Reviewed-on: https://go-review.googlesource.com/30706 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>

A later CL will be adding more code here. It will help to keep it separate from the other code. Change-Id: I971ba53de819cd10991b51fdec665984939a5f9b Reviewed-on: https://go-review.googlesource.com/30709 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

The Montgomery multiply code is applicable to this case but was being bypassed. Don't do that. The old test len(x) > 1 was really just a bad approximation to x > 1. name old time/op new time/op delta Exp-8 5.56ms ± 4% 5.73ms ± 3% ~ (p=0.095 n=5+5) Exp2-8 7.59ms ± 1% 5.66ms ± 1% -25.40% (p=0.008 n=5+5) This comes up especially when doing Fermat (Miller-Rabin) primality tests with base 2. Change-Id: I4cc02978db6dfa93f7f3c8f32718e25eedb4f5ed Reviewed-on: https://go-review.googlesource.com/30708 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

- Add new BenchmarkQuoRem. - Eliminate allocation in divLarge nat pool - Unroll mulAddVWW body 4x - Remove some redundant slice loads in divLarge name old time/op new time/op delta QuoRem-8 2.18µs ± 1% 1.93µs ± 1% -11.38% (p=0.000 n=19+18) The starting point in the comparison here is Cherry's pending CL to turn mulWW and divWW into intrinsics. The optimizations in divLarge work best because all the function calls are gone. The effect of this CL is not as large if you don't assume Cherry's CL. Change-Id: Ia6138907489c5b9168497912e43705634e163b35 Reviewed-on: https://go-review.googlesource.com/30613 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>