go - Fork of Go programming language with my patches.

Age	Commit message	Author
2025-05-19	math: fix portable FMA implementation when xy ~ 0, xy < 0 and z = 0	ICHINOSE Shogo
	Adding zero usually does not change the original value. However, there is an exception with negative zero. (e.g. (-0) + (+0) = (+0)) This applies when x * y is negative and underflows. Fixes #73757 Change-Id: Ib7b54bdacd1dcfe3d392802ea35cdb4e989f9371 GitHub-Last-Rev: 30d74883b21667fc9439d9d14932b7edb3e72cd5 GitHub-Pull-Request: golang/go#73759 Reviewed-on: https://go-review.googlesource.com/c/go/+/673856 Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Robert Griesemer <gri@google.com>
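The signed-zero rule this fix relies on can be checked directly. The following is a minimal sketch, assuming the default IEEE 754 round-to-nearest mode; the helper names are hypothetical and are not part of the math package. The explicit float64 conversion keeps the compiler from fusing the multiply and add, so the two-step rounding is visible.

```go
package main

import (
	"fmt"
	"math"
)

// negZeroPlusZeroIsNegative reports the sign bit of (-0) + (+0).
func negZeroPlusZeroIsNegative() bool {
	negZero := math.Copysign(0, -1)
	posZero := 0.0
	return math.Signbit(negZero + posZero)
}

// tinyCase multiplies the smallest positive float64 by its negation,
// which underflows to -0, and then adds +0 as a separate rounded step.
// It returns the sign bits of the product and of the sum.
func tinyCase() (productNegative, sumNegative bool) {
	tiny := math.SmallestNonzeroFloat64
	product := float64(tiny * -tiny) // conversion prevents fusion
	sum := product + 0.0
	return math.Signbit(product), math.Signbit(sum)
}

func main() {
	fmt.Println(negZeroPlusZeroIsNegative()) // sign bit of (-0) + (+0)
	p, s := tinyCase()
	fmt.Println(p, s) // sign bits of the underflowed product and of product + 0

	// The exact value of x*y + z here is a tiny negative number, so a
	// correctly rounded FMA should return -0 (sign bit set).
	tiny := math.SmallestNonzeroFloat64
	fmt.Println(math.Signbit(math.FMA(tiny, -tiny, 0)))
}
```

The two-step computation loses the sign because the product is already -0 before the addition, which is why a shortcut of the form x*y + z is not safe when z is zero.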
2025-05-01	math/big: fix incorrect register allocation for mipsx/mips64x	Julian Zhu
	According to the MIPS ABI, R26/R27 are reserved for OS kernel, and may be clobbered by it. They must not be used by user mode. See Figure 3-18 of MIPS ELF ABI specification: https://refspecs.linuxfoundation.org/elf/mipsabi.pdf Fixes #73472 Change-Id: Ifda692a803176bfaab2c70d6623636c5d135f42e Reviewed-on: https://go-review.googlesource.com/c/go/+/667816 Reviewed-by: Alan Donovan <adonovan@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@google.com>
2025-04-19	math/big: use clearer loop bounds check elimination	Russ Cox
	Checking that the lengths are equal and panicking teaches the compiler that it can assume “i in range for z” implies “i in range for x”, letting us simplify the actual loops a bit. It also turns up a few places in math/big that were playing maybe a little too fast and loose with slice lengths. Update those to explicitly set all the input slices to the same length. These speedups are basically irrelevant, since they only happen in real code if people are compiling with -tags math_big_pure_go. But at least the code is clearer. benchmark \ system c3h88 c2s16 s7 386 s7-386 c4as16 mac arm loong64 ppc64le riscv64 s390x AddVV/words=1/impl=go ~ +11.20% +5.11% -7.67% -7.77% +1.90% +10.76% -33.22% ~ +10.98% ~ +6.60% AddVV/words=10/impl=go -22.12% -13.48% -10.37% -17.95% -18.07% -24.58% -22.04% -29.95% -14.22% ~ -6.33% +3.66% AddVV/words=16/impl=go -9.75% -13.73% ~ -21.90% -18.66% -30.03% -20.45% -28.09% -17.33% -7.15% -8.96% +12.55% AddVV/words=100/impl=go -5.91% -1.02% ~ -29.23% -22.18% -25.62% -6.49% -23.59% -22.31% -1.88% -14.13% +9.23% AddVV/words=1000/impl=go -0.52% -0.19% -3.58% -33.89% -23.46% -22.46% ~ -24.00% -24.73% +0.93% -15.79% +12.32% AddVV/words=10000/impl=go ~ ~ ~ -33.79% -23.72% -23.79% -5.98% -23.92% ~ +0.78% -15.45% +8.59% AddVV/words=100000/impl=go ~ ~ ~ -33.90% -24.25% -22.82% -4.09% -24.63% ~ +1.00% -13.56% ~ SubVV/words=1/impl=go ~ +11.64% +14.05% ~ -4.07% ~ +10.79% -33.69% ~ ~ +3.89% +12.33% SubVV/words=10/impl=go -10.31% -14.09% -7.38% +13.76% -13.25% -18.05% -20.08% -24.97% -14.15% +10.13% -0.97% -2.51% SubVV/words=16/impl=go -8.06% -13.73% -5.70% +17.00% -12.83% -23.76% -17.52% -25.25% -17.30% -2.80% -4.96% -18.25% SubVV/words=100/impl=go -9.22% -1.30% -2.76% +20.88% -14.35% -15.29% -8.49% -19.64% -22.31% -0.68% -14.30% -9.04% SubVV/words=1000/impl=go -0.60% ~ -3.43% +23.08% -16.14% -11.96% ~ -28.52% -24.73% ~ -15.95% -9.91% SubVV/words=10000/impl=go ~ ~ ~ +26.01% -15.24% -11.92% ~ -28.26% +4.25% ~ -15.42% -5.95% SubVV/words=100000/impl=go ~ ~ ~ 
+25.71% -15.83% -12.13% ~ -27.88% -1.27% ~ -13.57% -6.72% LshVU/words=1/impl=go +0.56% +0.36% ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ LshVU/words=10/impl=go +13.37% +4.63% ~ ~ ~ ~ ~ -2.90% ~ ~ ~ ~ LshVU/words=16/impl=go +22.83% +6.47% ~ ~ ~ ~ ~ ~ +0.80% ~ ~ +5.88% LshVU/words=100/impl=go +7.56% +13.95% ~ ~ ~ ~ ~ ~ +0.33% -2.50% ~ ~ LshVU/words=1000/impl=go +0.64% +17.92% ~ ~ ~ ~ ~ -6.52% ~ -2.58% ~ ~ LshVU/words=10000/impl=go ~ +17.60% ~ ~ ~ ~ ~ -6.64% -6.22% -1.40% ~ ~ LshVU/words=100000/impl=go ~ +14.57% ~ ~ ~ ~ ~ ~ -5.47% ~ ~ ~ RshVU/words=1/impl=go ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ +2.72% RshVU/words=10/impl=go ~ ~ ~ ~ ~ ~ ~ +2.50% ~ ~ ~ ~ RshVU/words=16/impl=go ~ +0.53% ~ ~ ~ ~ ~ +3.82% ~ ~ ~ ~ RshVU/words=100/impl=go ~ ~ ~ ~ ~ ~ ~ +6.18% ~ ~ ~ ~ RshVU/words=1000/impl=go ~ ~ ~ ~ ~ ~ ~ +7.00% ~ ~ ~ ~ RshVU/words=10000/impl=go ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ RshVU/words=100000/impl=go ~ ~ ~ ~ ~ ~ ~ +7.05% ~ ~ ~ ~ MulAddVWW/words=1/impl=go -10.34% +4.43% +10.62% -1.62% -4.74% -2.86% +11.75% ~ -8.00% +8.89% +3.87% ~ MulAddVWW/words=10/impl=go -1.61% -5.87% ~ -8.30% -4.55% +0.87% ~ -5.28% -20.82% ~ ~ -2.32% MulAddVWW/words=16/impl=go -2.96% -5.28% ~ -9.22% -5.28% ~ ~ -3.74% -19.52% -1.48% -2.53% -9.52% MulAddVWW/words=100/impl=go -3.89% -7.53% +1.93% -10.49% -4.87% -8.27% ~ ~ -0.65% -0.61% -7.59% -20.61% MulAddVWW/words=1000/impl=go -0.45% -3.91% +4.54% -11.46% -4.69% -8.53% ~ ~ -0.05% ~ -8.88% -19.77% MulAddVWW/words=10000/impl=go ~ -3.30% +4.10% -11.34% -4.10% -9.43% ~ -0.61% ~ -0.55% -8.21% -18.48% MulAddVWW/words=100000/impl=go -0.30% -3.03% +4.31% -11.55% -4.41% -9.74% ~ -0.75% +0.63% ~ -7.80% -19.82% AddMulVVWW/words=1/impl=go ~ +13.09% +12.50% -7.05% -10.41% +2.53% +13.32% -3.49% ~ +15.56% +3.62% ~ AddMulVVWW/words=10/impl=go -15.96% -9.06% -5.06% -14.56% -11.83% -5.44% -26.30% -14.23% -11.44% -1.79% -5.93% -6.60% AddMulVVWW/words=16/impl=go -19.05% -12.43% -6.19% -14.24% -12.67% -8.65% -18.64% -16.56% -10.64% -3.00% -7.61% -12.80% AddMulVVWW/words=100/impl=go -22.13% -16.59% -13.04% -13.79% 
-11.46% -12.01% -6.46% -21.80% -5.08% -3.13% -13.60% -22.53% AddMulVVWW/words=1000/impl=go -17.07% -17.05% -14.08% -13.59% -12.13% -11.21% ~ -22.81% -4.27% -1.27% -16.35% -23.47% AddMulVVWW/words=10000/impl=go -15.03% -16.78% -14.23% -13.86% -11.84% -11.69% ~ -22.75% -13.39% -1.10% -14.37% -22.01% AddMulVVWW/words=100000/impl=go -13.70% -14.90% -14.26% -13.55% -12.04% -11.63% ~ -22.61% ~ -2.53% -10.42% -23.16% Change-Id: Ic6f64344484a762b818c7090d1396afceb638607 Reviewed-on: https://go-review.googlesource.com/c/go/+/665155 Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com>
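The length check described in this commit can be sketched as follows. This is a simplified illustration using plain uint words instead of big.Word, not the actual math/big source: once the function has panicked on mismatched lengths, an index that is in range for z is known to be in range for x and y as well, so the loop can simply range over z.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addVV sets z = x + y word by word and returns the final carry.
// All three slices must have the same length.
func addVV(z, x, y []uint) (c uint) {
	// After this check the compiler can assume that any i valid for z
	// is also valid for x and y, so the loop needs no extra bounds checks.
	if len(x) != len(z) || len(y) != len(z) {
		panic("addVV: slice lengths differ")
	}
	for i := range z {
		z[i], c = bits.Add(x[i], y[i], c)
	}
	return c
}

func main() {
	x := []uint{^uint(0), 1}
	y := []uint{1, 2}
	z := make([]uint, len(x))
	c := addVV(z, x, y)
	fmt.Println(z, c)
}
```

One way to see which bounds checks remain is to build with -gcflags=-d=ssa/check_bce/debug=1; the exact output depends on the compiler version.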
2025-04-19	math/big: replace assembly with mini-compiler output	Russ Cox
	Step 4 of the mini-compiler: switch to the new generated assembly. No systematic performance regressions, and many many improvements. In the benchmarks, the systems are: c3h88 GOARCH=amd64 c3h88 perf gomote (newer Intel, Google Cloud) c2s16 GOARCH=amd64 c2s16 perf gomote (Intel, Google Cloud) s7 GOARCH=amd64 rsc basement server (AMD Ryzen 9 7950X) 386 GOARCH=386 gotip-linux-386 gomote (Intel, Google Cloud) s7-386 GOARCH=386 rsc basement server (AMD Ryzen 9 7950X) c4as16 GOARCH=arm64 c4as16 perf gomote (Google Cloud) mac GOARCH=arm64 Apple M3 Pro in MacBook Pro arm GOARCH=arm gotip-linux-arm gomote loong64 GOARCH=loong64 gotip-linux-loong64 gomote ppc64le GOARCH=ppc64le gotip-linux-ppc64le gomote riscv64 GOARCH=riscv64 gotip-linux-riscv64 gomote s390x GOARCH=s390x linux-s390x-ibm old gomote benchmark \ system c3h88 c2s16 s7 386 s7-386 c4as16 mac arm loong64 ppc64le riscv64 s390x AddVV/words=1 -4.03% +5.21% -4.04% +4.94% ~ ~ ~ ~ -19.51% ~ ~ ~ AddVV/words=10 -10.20% +0.34% -3.46% -11.50% -7.46% +7.66% +5.97% ~ -17.90% ~ ~ ~ AddVV/words=16 -10.91% -6.45% -8.45% -21.86% -17.90% +2.73% -1.61% ~ -22.47% -3.54% ~ ~ AddVV/words=100 -3.77% -4.30% -3.17% -47.27% -45.34% -0.78% ~ -8.74% -27.19% ~ ~ ~ AddVV/words=1000 -0.08% -0.71% ~ -49.21% -48.07% ~ ~ -16.80% -24.74% ~ ~ ~ AddVV/words=10000 ~ ~ ~ -48.73% -48.56% -0.06% ~ -17.08% ~ ~ -4.81% ~ AddVV/words=100000 ~ ~ ~ -47.80% -48.38% ~ ~ -15.10% -25.06% ~ -5.34% ~ SubVV/words=1 -0.84% +3.43% -3.62% +1.34% ~ -0.76% ~ ~ -18.18% +5.58% ~ ~ SubVV/words=10 -9.99% +0.34% ~ -11.23% -8.24% +7.53% +6.15% ~ -17.55% +2.77% -2.08% ~ SubVV/words=16 -11.94% -6.45% -6.81% -21.82% -18.11% +1.58% -1.21% ~ -20.36% ~ ~ ~ SubVV/words=100 -3.38% -4.32% -1.80% -46.14% -46.43% +0.41% ~ -7.20% -26.17% ~ -0.42% ~ SubVV/words=1000 -0.38% -0.80% ~ -49.22% -48.90% ~ ~ -15.86% -24.73% ~ ~ ~ SubVV/words=10000 ~ ~ ~ -49.57% -49.64% -0.03% ~ -15.85% -26.52% ~ -5.05% ~ SubVV/words=100000 ~ ~ ~ -46.88% -49.66% ~ ~ -15.45% -16.11% ~ -4.99% ~ LshVU/words=1 ~ 
+5.78% ~ ~ -2.48% +1.61% +2.18% +2.70% -18.16% -34.16% -21.29% ~ LshVU/words=10 -18.34% -3.78% +2.21% ~ ~ -2.81% -12.54% ~ -25.02% -24.78% -38.11% -66.98% LshVU/words=16 -23.15% +1.03% +7.74% +0.73% ~ +8.88% +1.56% ~ -25.37% -28.46% -41.27% ~ LshVU/words=100 -32.85% -8.86% -2.58% ~ +2.69% +1.24% ~ -20.63% -44.14% -42.68% -53.09% ~ LshVU/words=1000 -37.30% -0.20% +5.67% ~ ~ +1.44% ~ -27.83% -45.01% -37.07% -57.02% -46.57% LshVU/words=10000 -36.84% -2.30% +3.82% ~ +1.86% +1.57% -66.81% -28.00% -13.15% -35.40% -41.97% ~ LshVU/words=100000 -40.30% ~ +3.96% ~ ~ ~ ~ -24.91% -19.06% -36.14% -40.99% -66.03% RshVU/words=1 -3.17% +4.76% -4.06% +4.31% +4.55% ~ ~ ~ -20.61% ~ -26.20% -51.33% RshVU/words=10 -22.08% -4.41% -17.99% +3.64% -11.87% ~ -16.30% ~ -30.01% ~ -40.37% -63.05% RshVU/words=16 -26.03% -8.50% -18.09% ~ -17.52% +6.50% ~ -2.85% -30.24% ~ -42.93% -63.13% RshVU/words=100 -20.87% -28.83% -29.45% ~ -26.25% +1.46% -1.14% -16.20% -45.65% -16.20% -53.66% -77.27% RshVU/words=1000 -24.03% -21.37% -26.71% ~ -28.95% +0.98% ~ -18.82% -45.21% -23.55% -57.09% -71.18% RshVU/words=10000 -24.56% -22.44% -27.01% ~ -28.88% +0.78% -5.35% -17.47% -16.87% -20.67% -41.97% ~ RshVU/words=100000 -23.36% -15.65% -27.54% ~ -29.26% +1.73% -6.67% -13.68% -21.40% -23.02% -40.37% -66.31% MulAddVWW/words=1 +2.37% +8.14% ~ +4.10% +3.71% ~ ~ ~ -21.62% ~ +1.12% ~ MulAddVWW/words=10 ~ -2.72% -15.15% +8.04% ~ ~ ~ -2.52% -19.48% ~ -6.18% ~ MulAddVWW/words=16 ~ +1.49% ~ +4.49% +6.58% -8.70% -7.16% -12.08% -21.43% -6.59% -9.05% ~ MulAddVWW/words=100 +0.37% +1.11% -4.51% -13.59% ~ -11.10% -3.63% -21.40% -22.27% -2.92% -14.41% ~ MulAddVWW/words=1000 ~ +0.90% -7.13% -18.94% ~ -14.02% -9.97% -28.31% -18.72% -2.32% -15.80% ~ MulAddVWW/words=10000 ~ +1.08% -6.75% -19.10% ~ -14.61% -9.04% -28.48% -14.29% -2.25% -9.40% ~ MulAddVWW/words=100000 ~ ~ -6.93% -18.09% ~ -14.33% -9.66% -28.92% -16.63% -2.43% -8.23% ~ AddMulVVWW/words=1 +2.30% +4.83% -11.37% +4.58% ~ -3.14% ~ ~ -10.58% +30.35% ~ ~ AddMulVVWW/words=10 
-3.27% ~ +8.96% +5.74% ~ +2.67% -1.44% -7.64% -13.41% ~ ~ ~ AddMulVVWW/words=16 -6.12% ~ ~ ~ +1.91% -7.90% -16.22% -14.07% -14.26% -4.15% -7.30% ~ AddMulVVWW/words=100 -5.48% -2.14% ~ -9.40% +9.98% -1.43% -12.35% -18.56% -21.94% ~ -9.84% ~ AddMulVVWW/words=1000 -11.35% -3.40% -3.64% -11.04% +12.82% -1.33% -15.63% -20.50% -20.95% ~ -11.06% -51.97% AddMulVVWW/words=10000 -10.31% -1.61% -8.41% -12.15% +13.10% -1.03% -16.34% -22.46% -1.00% ~ -10.33% -49.80% AddMulVVWW/words=100000 -13.71% ~ -8.31% -12.18% +12.98% -1.35% -15.20% -21.89% ~ ~ -9.38% -48.30% Change-Id: I0a33c33602c0d053c84d9946e662500cfa048e2d Reviewed-on: https://go-review.googlesource.com/c/go/+/664938 Reviewed-by: Alan Donovan <adonovan@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-19	math/big: add shift and mul to mini-compiler	Russ Cox
	Step 3 of the mini-compiler: add the generators for the shift and mul routines. Change-Id: I981d5b7086262c740036f5db768d3e63083984e2 Reviewed-on: https://go-review.googlesource.com/c/go/+/664937 Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Alan Donovan <adonovan@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-19	math/big: add all architectures to mini-compiler	Russ Cox
	Step 2 of the mini-compiler: add all the remaining architectures. Change-Id: I8c5283aa8baa497785a5c15f2248528fa9ae886e Reviewed-on: https://go-review.googlesource.com/c/go/+/664936 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com> Auto-Submit: Russ Cox <rsc@golang.org>
2025-04-19	math/big: new mini-compiler for arith assembly	Russ Cox
	The arith assembly is big enough, and the details that you have to keep in mind are complex enough and varied enough, that it is worth using a Go program to generate the assembly. That way, all the architectures can use the same algorithms, and porting to new architectures will be easier. This is the first of a sequence of CLs to introduce a new mini-compiler for generating the arith assembly, in math/big/internal/asmgen. This CL has the basics of the compiler as well as a couple simple architectures and the generator for addVV/subVV. It does not check in the generated assembly yet. That will happen in a followup CL after the other architectures and generators have been added. Change-Id: Ib704c60fd972fc5690ac04d8fae3712ee2c1a80a Reviewed-on: https://go-review.googlesource.com/c/go/+/664935 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com> Auto-Submit: Russ Cox <rsc@golang.org>
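The idea of generating per-architecture assembly from one shared description can be illustrated with a toy generator. Everything below is hypothetical and far simpler than math/big/internal/asmgen: the arch type, the genAdd2 function, and the operand names are invented for illustration, and the emitted text is not valid assembler input. Only the mnemonics (ADDQ/ADCQ on amd64, ADDS/ADCS on arm64) are real.

```go
package main

import (
	"fmt"
	"strings"
)

// arch lists the instructions one target uses for multiword addition.
// Hypothetical type, not the asmgen API.
type arch struct {
	name     string
	add      string // add, setting the carry flag
	addCarry string // add with carry in, setting the carry flag
}

var arches = []arch{
	{"amd64", "ADDQ", "ADCQ"},
	{"arm64", "ADDS", "ADCS"},
}

// genAdd2 emits a two-word addition for a. The same algorithm is
// written once here and specialized by the per-architecture table.
func genAdd2(a arch) string {
	var b strings.Builder
	fmt.Fprintf(&b, "// %s: (x1,x0) += (y1,y0)\n", a.name)
	fmt.Fprintf(&b, "\t%s y0, x0\n", a.add)
	fmt.Fprintf(&b, "\t%s y1, x1\n", a.addCarry)
	return b.String()
}

func main() {
	for _, a := range arches {
		fmt.Print(genAdd2(a))
	}
}
```

Porting to a new architecture in this toy model means adding one table entry, which is the property the commit message is after.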
2025-04-18	math/big: replace addVW/subVW assembly with fast pure Go	Russ Cox
	The vast majority of the time, carry propagation is limited and addVW/subVW only need to consider a single word for carry propagation. As Josh Bleecher-Snyder pointed out in 2019 (CL 164968), once carrying is done, the remaining words can be handled faster with copy (memmove). In the benchmarks below, this is the data=random case. Even more important, if the source and destination are the same, the copy can be optimized away entirely, making a small in-place addition to a big.Int O(1) instead of O(N). To date, only a few systems (amd64, arm64, and pure Go, meaning wasm) make use of this asymptotic improvement. This is the data=shortcut case. This CL deletes the addVW/subVW assembly and replaces it with an optimized pure Go version. Using Go makes it easy to call the real copy builtin, which will use optimized memmove code, instead of recreating a worse memmove in assembly (as arm64 does) or omitting the copy optimization entirely (as most others do). The worst case for the Go version versus assembly is the case of incrementing 2^N-1 by 1, which has to propagate a carry the entire length of the array. This is the data=carry case. On balance, we believe this case is rare enough to be worth taking a hit in that case, in exchange for significant wins in the other cases and the deletion of significant amounts of assembly of varying quality. (Remember that half the assembly has the copy optimization and shortcut, while half does not.) 
In the benchmarks, the systems are: c2s16 GOARCH=amd64 c2s16 perf gomote (Intel, Google Cloud) c3h88 GOARCH=amd64 c3h88 perf gomote (newer Intel, Google Cloud) s7 GOARCH=amd64 rsc basement server (AMD Ryzen 9 7950X) c4as16 GOARCH=arm64 c4as16 perf gomote (Google Cloud) mac GOARCH=arm64 Apple M3 Pro in MacBook Pro 386 GOARCH=386 gotip-linux-386 gomote arm GOARCH=arm gotip-linux-arm gomote loong64 GOARCH=loong64 gotip-linux-loong64 gomote ppc64le GOARCH=ppc64le gotip-linux-ppc64le gomote riscv64 GOARCH=riscv64 gotip-linux-riscv64 gomote benchmark \ system c2s16 c3h88 s7 c4as16 mac 386 arm loong64 ppc64le riscv64 AddVW/words=1/data=random -1.15% -1.74% -5.89% -9.80% -11.54% +23.71% -12.74% -14.25% +14.67% +10.27% AddVW/words=2/data=random -2.59% ~ -4.38% -19.31% -15.41% +24.80% ~ -19.99% +13.73% +19.71% AddVW/words=3/data=random -3.75% -19.10% -3.79% -23.15% -17.04% +20.04% -10.07% -23.20% ~ +15.39% AddVW/words=4/data=random -2.84% +7.05% -8.77% -22.64% -15.77% +16.01% -7.36% -28.22% ~ +23.00% AddVW/words=5/data=random -10.97% +2.16% -12.09% -20.89% -17.14% +9.42% -4.69% -32.60% ~ +10.07% AddVW/words=6/data=random -9.87% ~ -7.54% -19.08% -6.46% ~ -3.44% -34.61% ~ +12.19% AddVW/words=7/data=random -14.36% ~ -10.09% -19.10% -10.47% -6.20% -5.06% -38.14% -11.54% +6.79% AddVW/words=8/data=random -17.50% ~ -11.06% -25.14% -12.88% -8.35% -5.11% -41.39% -14.04% +11.87% AddVW/words=9/data=random -19.76% -4.05% -15.47% -24.08% -16.50% -12.34% -21.56% -44.25% -14.82% ~ AddVW/words=10/data=random -13.89% ~ -9.69% -23.06% -8.04% -12.58% -19.25% -32.80% -11.68% ~ AddVW/words=16/data=random -29.36% -15.35% -21.86% -25.04% -19.89% -32.26% -16.29% -42.66% -25.92% -3.01% AddVW/words=32/data=random -39.02% -28.76% -39.87% -11.22% -2.85% -55.40% -31.17% -55.37% -37.92% -16.28% AddVW/words=64/data=random -25.94% -19.09% -20.60% -6.90% +8.91% -51.00% -43.72% -62.27% -44.11% -28.74% AddVW/words=100/data=random -22.79% -18.13% -18.25% ~ +33.89% -67.40% -51.77% -63.54% -53.75% -30.97% 
AddVW/words=1000/data=random -8.98% -3.84% ~ -3.15% ~ -93.35% -63.92% -65.66% -68.67% -42.30% AddVW/words=10000/data=random -1.38% -0.38% ~ ~ ~ -89.16% -65.18% -44.65% -70.35% -20.08% AddVW/words=100000/data=random ~ ~ ~ ~ ~ -87.03% -64.51% -36.08% -61.40% -16.53% SubVW/words=1/data=random -3.67% ~ -8.38% -10.26% -3.07% +45.78% -6.06% -11.17% ~ ~ SubVW/words=2/data=random -3.48% -10.07% -5.76% -20.14% -8.45% +44.28% ~ -19.09% ~ +16.98% SubVW/words=3/data=random -7.11% -26.64% -4.48% -22.07% -9.21% +35.61% ~ -23.93% -18.20% ~ SubVW/words=4/data=random -4.23% +7.19% -8.95% -22.62% -13.89% +33.20% -8.96% -29.96% ~ +22.23% SubVW/words=5/data=random -11.49% +1.92% -10.86% -22.27% -17.53% +24.48% -2.88% -35.19% -19.55% ~ SubVW/words=6/data=random -7.67% ~ -7.72% -18.44% -6.24% +12.03% -2.00% -39.68% -10.73% ~ SubVW/words=7/data=random -13.69% -18.32% -11.82% -18.92% -11.57% +6.63% ~ -43.54% -30.81% ~ SubVW/words=8/data=random -16.02% ~ -11.07% -24.50% -11.92% +4.32% -3.01% -46.95% -24.14% ~ SubVW/words=9/data=random -18.76% -3.34% -14.84% -23.79% -17.50% ~ -21.80% -49.98% -29.62% ~ SubVW/words=10/data=random -13.23% ~ -9.25% -21.26% -11.63% ~ -18.58% -39.19% -20.09% ~ SubVW/words=16/data=random -28.25% -13.24% -22.66% -27.18% -19.13% -23.38% -20.24% -51.01% -28.06% -3.05% SubVW/words=32/data=random -38.41% -28.88% -40.12% -11.20% -2.80% -49.17% -34.67% -63.29% -39.25% -15.20% SubVW/words=64/data=random -25.51% -19.24% -22.20% -6.57% +9.98% -48.52% -48.14% -69.50% -49.44% -27.92% SubVW/words=100/data=random -21.69% -18.51% ~ +1.92% +34.42% -65.88% -54.67% -71.24% -58.88% -30.71% SubVW/words=1000/data=random -9.81% -4.05% -2.14% -3.06% ~ -93.37% -67.33% -74.12% -68.36% -42.17% SubVW/words=10000/data=random ~ -0.52% ~ ~ ~ -88.87% -68.54% -44.94% -70.63% -19.95% SubVW/words=100000/data=random ~ ~ ~ ~ ~ -86.69% -68.09% -48.36% -62.42% -19.32% AddVW/words=1/data=shortcut -29.38% -25.38% -27.37% -23.15% -25.41% +3.01% -33.60% -36.12% -15.76% ~ AddVW/words=2/data=shortcut 
-32.79% -34.72% -31.47% -24.47% -28.21% -3.75% -34.66% -43.89% -23.65% -21.56% AddVW/words=3/data=shortcut -38.50% -46.83% -35.67% -26.38% -30.29% -10.41% -44.89% -47.68% -30.93% -26.85% AddVW/words=4/data=shortcut -40.40% -28.85% -34.19% -29.83% -32.95% -16.09% -42.86% -51.02% -34.19% -26.69% AddVW/words=5/data=shortcut -43.87% -35.42% -36.46% -32.59% -37.72% -20.82% -45.14% -54.01% -35.49% -30.48% AddVW/words=6/data=shortcut -46.98% -39.34% -42.22% -35.43% -38.18% -27.46% -46.72% -56.61% -40.21% -34.07% AddVW/words=7/data=shortcut -49.63% -47.97% -46.61% -35.28% -41.93% -31.14% -49.29% -58.89% -41.10% -37.01% AddVW/words=8/data=shortcut -50.48% -42.33% -45.40% -40.24% -41.74% -32.92% -50.62% -60.98% -44.85% -38.10% AddVW/words=9/data=shortcut -54.27% -43.52% -49.06% -42.16% -45.22% -37.57% -51.84% -62.91% -46.04% -40.82% AddVW/words=10/data=shortcut -56.01% -45.40% -51.42% -43.29% -46.14% -38.65% -53.65% -64.62% -47.05% -43.21% AddVW/words=16/data=shortcut -62.73% -55.66% -59.31% -56.38% -54.31% -53.16% -61.03% -72.29% -58.24% -52.57% AddVW/words=32/data=shortcut -74.00% -69.42% -71.75% -33.65% -37.35% -71.73% -72.59% -82.44% -70.87% -67.69% AddVW/words=64/data=shortcut -56.69% -52.72% -52.09% -35.48% -36.87% -84.24% -83.10% -90.37% -82.56% -80.81% AddVW/words=100/data=shortcut -56.68% -53.18% -51.49% -33.49% -37.72% -89.95% -88.21% -93.37% -88.47% -86.52% AddVW/words=1000/data=shortcut -56.68% -52.45% -51.66% -35.31% -36.65% -98.88% -98.62% -99.24% -98.78% -98.41% AddVW/words=10000/data=shortcut -56.70% -52.40% -51.92% -33.49% -36.98% -99.89% -99.86% -99.92% -99.87% -99.91% AddVW/words=100000/data=shortcut -56.67% -52.46% -52.38% -35.31% -37.20% -99.99% -99.99% -99.99% -99.99% -99.99% SubVW/words=1/data=shortcut -29.80% -20.71% -26.94% -23.24% -25.33% +26.97% -32.02% -37.85% -40.20% -12.67% SubVW/words=2/data=shortcut -35.47% -36.38% -31.93% -25.43% -30.18% +18.96% -33.48% -46.48% -39.38% -18.65% SubVW/words=3/data=shortcut -39.22% -49.96% -36.90% -25.82% 
-30.96% +12.53% -40.67% -51.07% -43.71% -23.78% SubVW/words=4/data=shortcut -40.46% -24.90% -34.66% -29.87% -33.97% +4.60% -42.32% -54.92% -42.83% -22.45% SubVW/words=5/data=shortcut -43.84% -34.17% -38.00% -32.55% -37.27% -2.46% -43.09% -58.18% -45.70% -26.45% SubVW/words=6/data=shortcut -47.69% -37.49% -42.73% -35.90% -37.73% -8.52% -46.55% -61.01% -44.00% -30.14% SubVW/words=7/data=shortcut -49.45% -50.66% -46.88% -34.77% -41.64% -14.46% -48.92% -63.46% -50.47% -33.39% SubVW/words=8/data=shortcut -50.45% -39.31% -47.14% -40.47% -41.70% -15.77% -50.21% -65.64% -47.71% -34.01% SubVW/words=9/data=shortcut -54.28% -43.07% -49.42% -41.34% -44.99% -19.39% -51.55% -67.61% -56.92% -36.82% SubVW/words=10/data=shortcut -56.85% -47.88% -50.92% -42.76% -45.67% -23.60% -53.04% -69.34% -60.18% -39.43% SubVW/words=16/data=shortcut -62.36% -54.83% -58.80% -55.83% -53.74% -41.04% -60.16% -76.75% -60.56% -48.63% SubVW/words=32/data=shortcut -73.68% -68.64% -71.57% -33.52% -37.34% -64.73% -72.67% -85.89% -71.87% -64.56% SubVW/words=64/data=shortcut -56.68% -51.66% -52.56% -34.75% -37.54% -80.30% -83.58% -92.39% -83.41% -78.70% SubVW/words=100/data=shortcut -56.68% -50.97% -51.57% -33.68% -36.78% -87.42% -88.53% -94.84% -88.87% -84.96% SubVW/words=1000/data=shortcut -56.68% -50.89% -52.10% -34.94% -37.77% -98.59% -98.71% -99.43% -98.80% -98.20% SubVW/words=10000/data=shortcut -56.68% -51.00% -52.44% -33.65% -37.27% -99.86% -99.87% -99.94% -99.88% -99.90% SubVW/words=100000/data=shortcut -56.68% -50.80% -52.20% -34.79% -37.46% -99.99% -99.99% -99.99% -99.99% -99.99% AddVW/words=1/data=carry -0.51% -5.29% -24.03% -26.48% ~ ~ -33.14% -30.23% ~ -20.74% AddVW/words=2/data=carry -6.36% ~ -21.05% -39.40% ~ +10.72% -29.12% -31.34% ~ -17.29% AddVW/words=3/data=carry ~ ~ -17.46% -19.53% +17.58% ~ -26.23% -23.61% +7.80% -14.34% AddVW/words=4/data=carry +19.02% +16.80% ~ ~ +28.25% ~ -27.90% -20.31% +19.16% ~ AddVW/words=5/data=carry +3.97% +53.02% ~ ~ +11.31% ~ -19.05% -17.47% +16.81% ~ 
AddVW/words=6/data=carry +2.98% +19.83% ~ ~ +14.84% ~ -18.48% -14.92% +18.25% ~ AddVW/words=7/data=carry ~ ~ ~ ~ +27.17% ~ -15.50% -12.74% +13.00% ~ AddVW/words=8/data=carry +0.58% +22.32% ~ +6.10% +29.63% ~ -13.04% ~ +28.46% +2.95% AddVW/words=9/data=carry ~ +31.53% ~ ~ +14.42% ~ -11.32% ~ +18.37% +3.28% AddVW/words=10/data=carry +3.94% +22.36% ~ +6.29% +19.22% ~ -11.27% ~ +20.10% +3.91% AddVW/words=16/data=carry +2.82% +14.23% ~ +10.06% +25.91% -16.12% ~ ~ +52.28% +10.40% AddVW/words=32/data=carry ~ +25.35% +13.66% ~ +34.89% -34.39% +6.51% -18.71% +41.06% +19.42% AddVW/words=64/data=carry -42.03% ~ -39.70% +6.65% +32.29% -39.94% +14.34% ~ +19.68% +20.86% AddVW/words=100/data=carry -33.95% -34.28% -39.65% ~ +27.72% -26.80% +17.40% ~ +26.39% +23.32% AddVW/words=1000/data=carry -42.49% -47.87% -47.44% +1.25% +4.25% -41.76% +23.40% ~ +25.48% +27.99% AddVW/words=10000/data=carry -41.85% -48.49% -49.43% ~ ~ -42.09% +24.61% -10.32% +40.55% +18.35% AddVW/words=100000/data=carry -28.18% -48.13% -48.24% +1.35% ~ -42.90% +24.73% -9.79% +22.55% +17.16% SubVW/words=1/data=carry -10.32% -17.16% -24.14% -26.24% ~ +18.43% -34.10% -29.54% -9.57% ~ SubVW/words=2/data=carry -19.45% -23.31% -20.74% -39.73% ~ +15.74% -28.13% -30.21% ~ -18.74% SubVW/words=3/data=carry ~ -16.18% -15.34% -19.54% +17.62% +12.39% -27.64% -27.09% ~ -14.97% SubVW/words=4/data=carry +11.67% +24.42% ~ ~ +25.11% +14.07% -28.08% -26.18% ~ ~ SubVW/words=5/data=carry +8.08% +25.64% ~ ~ +10.35% +8.12% -21.75% -25.50% ~ -4.86% SubVW/words=6/data=carry ~ +13.82% ~ ~ +12.92% +6.79% -20.25% -24.70% ~ -2.74% SubVW/words=7/data=carry ~ ~ +8.29% +4.51% +26.59% +4.62% -18.01% -24.09% ~ -1.26% SubVW/words=8/data=carry ~ +23.16% +16.19% +6.16% +25.46% +6.74% -15.57% -22.74% ~ +1.44% SubVW/words=9/data=carry ~ +30.71% +20.81% ~ +12.36% ~ -12.99% ~ ~ +3.13% SubVW/words=10/data=carry +5.03% +19.53% +14.84% +14.16% +16.12% ~ -11.64% -16.00% +15.45% +3.29% SubVW/words=16/data=carry +14.42% +15.58% +33.07% +11.43% +24.65% ~ ~ 
-21.90% +25.59% +9.40% SubVW/words=32/data=carry ~ +27.57% +46.58% ~ +35.35% -8.49% ~ -24.04% +11.86% +18.40% SubVW/words=64/data=carry -24.34% -27.83% -20.90% +13.34% +37.17% -14.90% ~ -8.81% +12.88% +18.92% SubVW/words=100/data=carry -25.19% -34.70% -27.45% +12.86% +28.42% -14.48% ~ ~ +25.71% +21.93% SubVW/words=1000/data=carry -24.93% -47.86% -47.26% +2.66% ~ -23.88% ~ ~ +25.99% +27.81% SubVW/words=10000/data=carry -24.17% -36.48% -49.41% +1.06% ~ -25.06% ~ -26.50% +27.94% +18.36% SubVW/words=100000/data=carry -22.51% -35.86% -49.46% +3.96% ~ -25.18% ~ -22.15% +26.86% +15.44% Change-Id: I8f252073040e674780ac6ec9912082fb205329dd Reviewed-on: https://go-review.googlesource.com/c/go/+/664898 Reviewed-by: Alan Donovan <adonovan@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
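The approach described above (stop as soon as the carry is gone, then hand the rest to copy, or skip the copy entirely for in-place updates) can be sketched like this. It is a simplified illustration on []uint, not the code from the CL; the real routine operates on big.Word and is structured differently.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addVW sets z = x + y, where x is a vector and y a single word,
// and returns the carry out of the top word.
func addVW(z, x []uint, y uint) (c uint) {
	if len(x) != len(z) {
		panic("addVW: slice lengths differ")
	}
	c = y
	for i := range z {
		if c == 0 {
			// Carry propagation is finished (data=random case):
			// the remaining words are unchanged, so use copy.
			// If z and x are the same slice (data=shortcut case),
			// there is nothing left to do at all.
			if &z[0] != &x[0] {
				copy(z[i:], x[i:])
			}
			return 0
		}
		z[i], c = bits.Add(x[i], c, 0)
	}
	// Reached only when the carry ran through every word (data=carry case)
	// or when z is empty.
	return c
}

func main() {
	x := []uint{^uint(0), ^uint(0), 5, 7}
	z := make([]uint, len(x))
	fmt.Println(addVW(z, x, 1), z)

	inPlace := []uint{1, 2, 3}
	fmt.Println(addVW(inPlace, inPlace, 4), inPlace)
}
```

For the in-place call the loop touches one word and returns, which is the O(1) behavior the commit message describes.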
2025-04-18	math/big: add more complete tests and benchmarks of assembly	Russ Cox
	Also fix a few real but currently harmless bugs from CL 664895. There were a few places that were still wrong if z != x or if a != 0. Change-Id: Id8971e2505523bc4708780c82bf998a546f4f081 Reviewed-on: https://go-review.googlesource.com/c/go/+/664897 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com>
2025-04-15	math/big: fix loong64 assembly for vet	Keith Randall
	Vet is failing on this code because some arguments of mulAddVWW got renamed in the go decl (CL 664895) but not the assembly accessors. Looks like the assembly got written before that CL but checked in after that CL. Change-Id: I270e8db5f8327aa2029c21a126fab1231a3506a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/665717 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
2025-04-15	math/big: optimize subVV function for loong64	Huang Qiqi
	Benchmark results on Loongson 3C5000 (which is an LA464 implementation):
	goos: linux
	goarch: loong64
	pkg: math/big
	cpu: Loongson-3C5000 @ 2200.00MHz
	              │ test/old_3c5000_subvv.log │ test/new_3c5000_subvv.log
	              │ sec/op                    │ sec/op         vs base
	SubVV/1         10.920n ± 0%                7.657n ± 0%    -29.88% (p=0.000 n=20)
	SubVV/2         14.100n ± 0%                8.841n ± 0%    -37.30% (p=0.000 n=20)
	SubVV/3         16.38n ± 0%                 11.06n ± 0%    -32.48% (p=0.000 n=20)
	SubVV/4         18.65n ± 0%                 12.85n ± 0%    -31.10% (p=0.000 n=20)
	SubVV/5         20.93n ± 0%                 14.79n ± 0%    -29.34% (p=0.000 n=20)
	SubVV/10        32.30n ± 0%                 22.29n ± 0%    -30.99% (p=0.000 n=20)
	SubVV/100       244.3n ± 0%                 149.2n ± 0%    -38.93% (p=0.000 n=20)
	SubVV/1000      2.292µ ± 0%                 1.378µ ± 0%    -39.88% (p=0.000 n=20)
	SubVV/10000     26.26µ ± 0%                 25.64µ ± 0%    -2.33% (p=0.000 n=20)
	SubVV/100000    341.3µ ± 0%                 238.0µ ± 0%    -30.26% (p=0.000 n=20)
	geomean         209.1n                      144.5n         -30.86%
	Change-Id: I3863c2c6728f1b0f8fecbf77de13254299c5b1cb Reviewed-on: https://go-review.googlesource.com/c/go/+/659877 Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-15	math/big: optimize mulAddVWW function for loong64	Huang Qiqi
	Benchmark results on Loongson 3A5000 (which is an LA464 implementation):
	goos: linux
	goarch: loong64
	pkg: math/big
	cpu: Loongson-3A5000-HV @ 2500.00MHz
	                  │ test/old_3a5000_muladdvww.log │ test/new_3a5000_muladdvww.log
	                  │ sec/op                        │ sec/op         vs base
	MulAddVWW/1         7.606n ± 0%                     6.987n ± 0%    -8.14% (p=0.000 n=20)
	MulAddVWW/2         9.207n ± 0%                     8.567n ± 0%    -6.95% (p=0.000 n=20)
	MulAddVWW/3         10.810n ± 0%                    9.223n ± 0%    -14.68% (p=0.000 n=20)
	MulAddVWW/4         13.01n ± 0%                     12.41n ± 0%    -4.61% (p=0.000 n=20)
	MulAddVWW/5         15.79n ± 0%                     12.99n ± 0%    -17.73% (p=0.000 n=20)
	MulAddVWW/10        25.62n ± 0%                     20.02n ± 0%    -21.86% (p=0.000 n=20)
	MulAddVWW/100       217.0n ± 0%                     170.9n ± 0%    -21.24% (p=0.000 n=20)
	MulAddVWW/1000      2.064µ ± 0%                     1.612µ ± 0%    -21.90% (p=0.000 n=20)
	MulAddVWW/10000     24.50µ ± 0%                     16.74µ ± 0%    -31.66% (p=0.000 n=20)
	MulAddVWW/100000    239.1µ ± 0%                     171.1µ ± 0%    -28.45% (p=0.000 n=20)
	geomean             159.2n                          130.3n         -18.18%
	Change-Id: I063434bc382f4f1234f879172ab671a3d6f2eb80 Reviewed-on: https://go-review.googlesource.com/c/go/+/659881 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-15	math/big: optimize subVW function for loong64	Huang Qiqi
	Benchmark results on Loongson 3C5000 (which is an LA464 implementation):
	goos: linux
	goarch: loong64
	pkg: math/big
	cpu: Loongson-3C5000 @ 2200.00MHz
	                 │ test/old_3c5000_subvw.log │ test/new_3c5000_subvw.log
	                 │ sec/op                    │ sec/op         vs base
	SubVW/1            8.564n ± 0%                 5.915n ± 0%    -30.93% (p=0.000 n=20)
	SubVW/2            11.675n ± 0%                6.825n ± 0%    -41.54% (p=0.000 n=20)
	SubVW/3            13.410n ± 0%                7.969n ± 0%    -40.57% (p=0.000 n=20)
	SubVW/4            15.300n ± 0%                9.740n ± 0%    -36.34% (p=0.000 n=20)
	SubVW/5            17.34n ± 1%                 10.66n ± 0%    -38.55% (p=0.000 n=20)
	SubVW/10           26.55n ± 0%                 15.21n ± 0%    -42.70% (p=0.000 n=20)
	SubVW/100          199.2n ± 0%                 102.5n ± 0%    -48.52% (p=0.000 n=20)
	SubVW/1000         1866.5n ± 1%                924.6n ± 0%    -50.46% (p=0.000 n=20)
	SubVW/10000        17.67µ ± 2%                 12.04µ ± 2%    -31.83% (p=0.000 n=20)
	SubVW/100000       186.4µ ± 0%                 132.0µ ± 0%    -29.17% (p=0.000 n=20)
	SubVWext/1         8.616n ± 0%                 5.949n ± 0%    -30.95% (p=0.000 n=20)
	SubVWext/2         11.410n ± 0%                7.008n ± 1%    -38.58% (p=0.000 n=20)
	SubVWext/3         13.255n ± 1%                8.073n ± 0%    -39.09% (p=0.000 n=20)
	SubVWext/4         15.095n ± 0%                9.893n ± 0%    -34.47% (p=0.000 n=20)
	SubVWext/5         16.87n ± 0%                 10.86n ± 0%    -35.63% (p=0.000 n=20)
	SubVWext/10        26.00n ± 0%                 15.54n ± 0%    -40.22% (p=0.000 n=20)
	SubVWext/100       196.0n ± 0%                 104.3n ± 1%    -46.76% (p=0.000 n=20)
	SubVWext/1000      1847.0n ± 0%                923.7n ± 0%    -49.99% (p=0.000 n=20)
	SubVWext/10000     17.30µ ± 1%                 11.71µ ± 1%    -32.31% (p=0.000 n=20)
	SubVWext/100000    187.5µ ± 0%                 131.6µ ± 0%    -29.82% (p=0.000 n=20)
	geomean            159.7n                      97.79n         -38.79%
	Change-Id: I21a6903e79b02cb22282e80c9bfe2ae9f1a87589 Reviewed-on: https://go-review.googlesource.com/c/go/+/659878 Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2025-04-15	math/big: optimize addVW function for loong64	Huang Qiqi
	Benchmark results on Loongson 3C5000 (which is an LA464 implementation):
	goos: linux
	goarch: loong64
	pkg: math/big
	cpu: Loongson-3C5000 @ 2200.00MHz
	                 │ test/old_3c5000_addvw.log │ test/new_3c5000_addvw.log
	                 │ sec/op                    │ sec/op         vs base
	AddVW/1            9.555n ± 0%                 5.915n ± 0%    -38.09% (p=0.000 n=20)
	AddVW/2            11.370n ± 0%                6.825n ± 0%    -39.97% (p=0.000 n=20)
	AddVW/3            12.485n ± 0%                7.970n ± 0%    -36.16% (p=0.000 n=20)
	AddVW/4            14.980n ± 0%                9.718n ± 0%    -35.13% (p=0.000 n=20)
	AddVW/5            16.73n ± 0%                 10.63n ± 0%    -36.46% (p=0.000 n=20)
	AddVW/10           24.57n ± 0%                 15.18n ± 0%    -38.23% (p=0.000 n=20)
	AddVW/100          184.9n ± 0%                 102.4n ± 0%    -44.62% (p=0.000 n=20)
	AddVW/1000         1721.0n ± 0%                921.4n ± 0%    -46.46% (p=0.000 n=20)
	AddVW/10000        16.83µ ± 0%                 11.68µ ± 0%    -30.58% (p=0.000 n=20)
	AddVW/100000       184.7µ ± 0%                 131.3µ ± 0%    -28.93% (p=0.000 n=20)
	AddVWext/1         9.554n ± 0%                 5.915n ± 0%    -38.09% (p=0.000 n=20)
	AddVWext/2         11.370n ± 0%                6.825n ± 0%    -39.97% (p=0.000 n=20)
	AddVWext/3         12.505n ± 0%                7.969n ± 0%    -36.27% (p=0.000 n=20)
	AddVWext/4         14.980n ± 0%                9.718n ± 0%    -35.13% (p=0.000 n=20)
	AddVWext/5         16.70n ± 0%                 10.63n ± 0%    -36.33% (p=0.000 n=20)
	AddVWext/10        24.54n ± 0%                 15.18n ± 0%    -38.13% (p=0.000 n=20)
	AddVWext/100       185.0n ± 0%                 102.4n ± 0%    -44.65% (p=0.000 n=20)
	AddVWext/1000      1721.0n ± 0%                921.4n ± 0%    -46.46% (p=0.000 n=20)
	AddVWext/10000     16.83µ ± 0%                 11.68µ ± 0%    -30.60% (p=0.000 n=20)
	AddVWext/100000    184.9µ ± 0%                 130.4µ ± 0%    -29.51% (p=0.000 n=20)
	geomean            155.5n                      96.87n         -37.70%
	Change-Id: I824a90cb365e09d7d0d4a2c53ff4b30cf057a75e Reviewed-on: https://go-review.googlesource.com/c/go/+/659876 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-04-11	math/big: remove copy responsibility from, rename shlVU, shrVU	Russ Cox
	It is annoying that non-x86 implementations of shlVU and shrVU have to go out of their way to handle the trivial case shift==0 with their own copy loops. Instead, arrange to never call them with shift==0, so that the code can be removed. Unfortunately, there are linknames of shlVU, so we cannot change that function. But we can rename the functions and then leave behind a shlVU wrapper, so do that. Since the big.Int API calls the operations Lsh and Rsh, rename shlVU/shrVU to lshVU/rshVU. Also rename various other shl/shr methods and functions to lsh/rsh. Change-Id: Ieaf54e0110a298730aa3e4566ce5be57ba7fc121 Reviewed-on: https://go-review.googlesource.com/c/go/+/664896 Reviewed-by: Alan Donovan <adonovan@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
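The wrapper arrangement described above can be sketched in pure Go. This is only an illustration under stated assumptions: the real routines are per-architecture assembly, and the bodies below (as well as the precondition that both slices have equal length) are guesses at the contract, not the code from the CL.

```go
package main

import (
	"fmt"
	"math/bits"
)

type Word uint

const W = bits.UintSize // bits per Word

// lshVU sets z = x << s and returns the bits shifted out of the top word.
// Assumed precondition: 0 < s < W and len(z) == len(x), so no copy path is needed.
func lshVU(z, x []Word, s uint) (c Word) {
	if len(z) == 0 {
		return 0
	}
	c = x[len(z)-1] >> (W - s)
	// Walk from the high word down so that z and x may be the same slice.
	for i := len(z) - 1; i > 0; i-- {
		z[i] = x[i]<<s | x[i-1]>>(W-s)
	}
	z[0] = x[0] << s
	return c
}

// shlVU keeps the old name and the old contract for external (linknamed)
// callers: a shift of zero still means "copy x into z".
func shlVU(z, x []Word, s uint) (c Word) {
	if s == 0 {
		copy(z, x)
		return 0
	}
	return lshVU(z, x, s)
}

func main() {
	x := []Word{1, 1 << (W - 1)}
	z := make([]Word, len(x))
	c := shlVU(z, x, 1)
	fmt.Println(z, c)
}
```

With this split, only the wrapper has to know about the shift-by-zero case; the renamed routine can assume a nonzero shift.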
2025-04-11	math/big: replace addMulVVW with addMulVVWW	Russ Cox
	addMulVVW is an unnecessarily special case. All other assembly routines taking []Word (V as in vector) arguments take separate source and destination. For example: addVV: z = x+y; mulAddVWW: z = x*m+a. addMulVVW uses the z parameter as both destination and source: addMulVVW: z = z+x*m. Even looking at the signatures is confusing: all the VV routines take two input vectors x and y, but addMulVVW takes only x: where is y? (The answer is that the two inputs are z and x.) It would be nice to fix this, both for understandability and regularity, and to simplify a future assembly generator. We cannot remove or redefine addMulVVW, because it has been used in linknames. Instead, the CL adds a new final addend argument ‘a’ like in mulAddVWW, making the natural name addMulVVWW (two input vectors, two input words): addMulVVWW: z = x+y*m+a. This CL updates all the assembly implementations to rename the inputs z, x, y -> x, y, m, and then introduces a separate destination z. Change-Id: Ib76c80b53f6d1f4a901f663566e9c4764bb20488 Reviewed-on: https://go-review.googlesource.com/c/go/+/664895 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com>
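The operation z = x+y*m+a can be written as a short pure-Go loop using math/bits. This is a sketch of the semantics only, assuming equal-length slices and plain uint words; it is not the assembly or the generic fallback from the CL. The old in-place form then falls out as a call with z passed as both destination and first input.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addMulVVWW sets z = x + y*m + a, word by word, and returns the final carry.
// Assumes len(x) == len(y) == len(z).
func addMulVVWW(z, x, y []uint, m, a uint) (c uint) {
	c = a
	for i := range z {
		hi, lo := bits.Mul(y[i], m)
		lo, cc := bits.Add(lo, x[i], 0)
		hi += cc // cannot overflow: hi is at most 2^W-2 after the multiply
		lo, cc = bits.Add(lo, c, 0)
		hi += cc
		z[i] = lo
		c = hi
	}
	return c
}

// addMulVVW is the old in-place form, z = z + x*m, expressed through the new one.
func addMulVVW(z, x []uint, m uint) (c uint) {
	return addMulVVWW(z, z, x, m, 0)
}

func main() {
	z := make([]uint, 2)
	c := addMulVVWW(z, []uint{1, 2}, []uint{3, 4}, 5, 6)
	fmt.Println(z, c) // [22 22] 0
}
```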
2025-03-24	cmd/asm: add LCDBR instruction on s390x	Vishwanatha HD
	This CL adds the LCDBR assembly instruction mnemonic, mainly used in the math package. The LCDBR instruction has the same effect as the FNEG pseudo-instruction, except that it also sets the condition code. Change-Id: I3f00f1ed19148d074c3b6c5f64af0772289f2802 Reviewed-on: https://go-review.googlesource.com/c/go/+/648036 Reviewed-by: Srinivas Pokala <Pokala.Srinivas@ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Munday <mike.munday@lowrisc.org> Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2025-03-12	math/big: update calibration tests and recalibrate	Russ Cox
	Refactor calibration tests to use the same logic for all. Choosing thresholds that are broadly appropriate for all systems is part science but also part guesswork and judgement. We could instead set per-GOOS/GOARCH thresholds, but that seems like too much work, and even then there would be variation between different chips within a GOOS/GOARCH. (For example see the three linux/amd64 systems benchmarked below.) The thresholds chosen in this CL are: karatsubaThreshold = 40 // unchanged basicSqrThreshold = 12 // was 20 karatsubaSqrThreshold = 80 // was 260 divRecursiveThreshold = 40 // was 100 The new file calibrate.md explains the calibration process and links to graphs justifying those values. (The graphs are hosted on swtch.com to avoid adding a megabyte of extra data to the Go repo and Go distributions.) A rendered copy of calibrate.md is at https://swtch.com/math/big/calibrate.html. goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-88 13.13n ± 2% 13.14n ± 2% ~ (p=0.494 n=15) Div/40/20-88 13.13n ± 2% 13.14n ± 2% ~ (p=0.137 n=15) Div/100/50-88 25.50n ± 0% 25.51n ± 0% ~ (p=0.038 n=15) Div/200/100-88 113.1n ± 1% 116.0n ± 3% +2.56% (p=0.000 n=15) Div/400/200-88 135.3n ± 0% 137.1n ± 1% ~ (p=0.004 n=15) Div/1000/500-88 259.9n ± 1% 259.0n ± 2% ~ (p=0.182 n=15) Div/2000/1000-88 568.8n ± 1% 564.7n ± 3% ~ (p=0.927 n=15) Div/20000/10000-88 25.79µ ± 1% 22.11µ ± 2% -14.26% (p=0.000 n=15) Div/200000/100000-88 755.1µ ± 1% 737.6µ ± 1% -2.32% (p=0.000 n=15) Div/2000000/1000000-88 31.30m ± 0% 31.20m ± 1% ~ (p=0.081 n=15) Div/20000000/10000000-88 1.268 ± 0% 1.265 ± 0% ~ (p=0.011 n=15) NatMul/10-88 142.6n ± 0% 142.9n ± 7% ~ (p=0.145 n=15) NatMul/100-88 4.347µ ± 0% 4.350µ ± 3% ~ (p=0.430 n=15) NatMul/1000-88 187.6µ ± 0% 188.4µ ± 2% ~ (p=0.004 n=15) NatMul/10000-88 8.052m ± 0% 8.057m ± 1% ~ (p=0.148 n=15) NatMul/100000-88 260.6m ± 0% 260.7m ± 0% ~ (p=0.512 n=15) NatSqr/1-88 26.58n ± 5% 27.96n 
± 8% ~ (p=0.574 n=15) NatSqr/2-88 42.35n ± 7% 44.87n ± 6% ~ (p=0.690 n=15) NatSqr/3-88 53.28n ± 4% 55.62n ± 5% ~ (p=0.151 n=15) NatSqr/5-88 76.26n ± 6% 81.43n ± 6% +6.78% (p=0.000 n=15) NatSqr/8-88 110.8n ± 5% 116.4n ± 6% ~ (p=0.040 n=15) NatSqr/10-88 141.4n ± 4% 147.8n ± 4% ~ (p=0.011 n=15) NatSqr/20-88 325.8n ± 3% 341.7n ± 4% +4.88% (p=0.000 n=15) NatSqr/30-88 536.8n ± 3% 556.1n ± 4% ~ (p=0.027 n=15) NatSqr/50-88 1.168µ ± 3% 1.197µ ± 3% ~ (p=0.442 n=15) NatSqr/80-88 2.527µ ± 2% 2.480µ ± 2% -1.86% (p=0.000 n=15) NatSqr/100-88 3.771µ ± 2% 3.535µ ± 2% -6.26% (p=0.000 n=15) NatSqr/200-88 14.03µ ± 2% 10.57µ ± 3% -24.68% (p=0.000 n=15) NatSqr/300-88 24.06µ ± 2% 20.57µ ± 2% -14.52% (p=0.000 n=15) NatSqr/500-88 65.43µ ± 1% 45.45µ ± 1% -30.55% (p=0.000 n=15) NatSqr/800-88 126.41µ ± 1% 94.13µ ± 2% -25.54% (p=0.000 n=15) NatSqr/1000-88 196.4µ ± 1% 135.1µ ± 1% -31.18% (p=0.000 n=15) NatSqr/10000-88 6.404m ± 0% 5.326m ± 1% -16.84% (p=0.000 n=15) NatSqr/100000-88 267.2m ± 0% 198.7m ± 0% -25.64% (p=0.000 n=15) geomean 7.318µ 6.948µ -5.06% goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) CPU @ 3.10GHz │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-16 22.23n ± 0% 22.23n ± 0% ~ (p=0.973 n=15) Div/40/20-16 22.23n ± 0% 22.23n ± 0% ~ (p=0.226 n=15) Div/100/50-16 55.27n ± 1% 55.59n ± 0% ~ (p=0.004 n=15) Div/200/100-16 174.7n ± 3% 175.9n ± 2% ~ (p=0.645 n=15) Div/400/200-16 208.8n ± 1% 209.5n ± 2% ~ (p=0.169 n=15) Div/1000/500-16 378.7n ± 2% 380.5n ± 2% ~ (p=0.091 n=15) Div/2000/1000-16 778.4n ± 1% 781.1n ± 2% ~ (p=0.104 n=15) Div/20000/10000-16 25.16µ ± 1% 24.93µ ± 1% -0.91% (p=0.000 n=15) Div/200000/100000-16 926.4µ ± 0% 927.7µ ± 1% ~ (p=0.436 n=15) Div/2000000/1000000-16 35.58m ± 0% 35.53m ± 0% ~ (p=0.267 n=15) Div/20000000/10000000-16 1.333 ± 0% 1.330 ± 0% ~ (p=0.126 n=15) NatMul/10-16 172.6n ± 0% 165.4n ± 0% -4.17% (p=0.000 n=15) NatMul/100-16 5.706µ ± 0% 5.503µ ± 0% -3.56% (p=0.000 n=15) NatMul/1000-16 220.8µ ± 0% 219.1µ ± 0% -0.76% (p=0.000 n=15) 
NatMul/10000-16 8.688m ± 0% 8.621m ± 0% -0.77% (p=0.000 n=15) NatMul/100000-16 333.3m ± 0% 333.5m ± 0% ~ (p=0.512 n=15) NatSqr/1-16 28.66n ± 1% 28.42n ± 3% -0.84% (p=0.000 n=15) NatSqr/2-16 48.29n ± 2% 48.19n ± 2% ~ (p=0.042 n=15) NatSqr/3-16 59.93n ± 0% 59.64n ± 2% -0.48% (p=0.000 n=15) NatSqr/5-16 88.05n ± 0% 87.89n ± 3% ~ (p=0.066 n=15) NatSqr/8-16 127.7n ± 0% 126.9n ± 3% -0.63% (p=0.000 n=15) NatSqr/10-16 170.4n ± 0% 169.7n ± 3% ~ (p=0.004 n=15) NatSqr/20-16 388.8n ± 0% 392.9n ± 3% ~ (p=0.123 n=15) NatSqr/30-16 635.2n ± 0% 641.7n ± 3% ~ (p=0.123 n=15) NatSqr/50-16 1.304µ ± 1% 1.314µ ± 3% ~ (p=0.927 n=15) NatSqr/80-16 2.709µ ± 1% 2.899µ ± 4% +7.01% (p=0.000 n=15) NatSqr/100-16 3.885µ ± 0% 3.981µ ± 4% ~ (p=0.123 n=15) NatSqr/200-16 13.29µ ± 2% 12.14µ ± 4% -8.67% (p=0.000 n=15) NatSqr/300-16 23.39µ ± 0% 22.51µ ± 3% -3.78% (p=0.000 n=15) NatSqr/500-16 58.13µ ± 1% 50.56µ ± 2% -13.02% (p=0.000 n=15) NatSqr/800-16 118.4µ ± 1% 107.6µ ± 2% -9.11% (p=0.000 n=15) NatSqr/1000-16 172.7µ ± 1% 151.8µ ± 2% -12.11% (p=0.000 n=15) NatSqr/10000-16 6.065m ± 1% 5.757m ± 1% -5.08% (p=0.000 n=15) NatSqr/100000-16 240.9m ± 0% 228.1m ± 0% -5.32% (p=0.000 n=15) geomean 8.601µ 8.453µ -1.71% goos: linux goarch: amd64 pkg: math/big cpu: AMD Ryzen 9 7950X 16-Core Processor │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-32 11.11n ± 0% 11.11n ± 1% ~ (p=0.532 n=15) Div/40/20-32 11.08n ± 1% 11.11n ± 0% ~ (p=0.815 n=15) Div/100/50-32 16.81n ± 0% 16.84n ± 29% ~ (p=0.020 n=15) Div/200/100-32 73.91n ± 0% 76.85n ± 11% +3.98% (p=0.000 n=15) Div/400/200-32 87.35n ± 0% 88.91n ± 34% +1.79% (p=0.000 n=15) Div/1000/500-32 169.3n ± 1% 168.9n ± 1% ~ (p=0.049 n=15) Div/2000/1000-32 369.3n ± 0% 369.0n ± 0% ~ (p=0.108 n=15) Div/20000/10000-32 15.92µ ± 0% 13.55µ ± 2% -14.91% (p=0.000 n=15) Div/200000/100000-32 491.4µ ± 0% 482.4µ ± 1% -1.84% (p=0.000 n=15) Div/2000000/1000000-32 20.09m ± 0% 19.96m ± 0% -0.69% (p=0.000 n=15) Div/20000000/10000000-32 756.5m ± 0% 755.5m ± 0% ~ (p=0.089 n=15) NatMul/10-32 
125.4n ± 5% 124.8n ± 1% ~ (p=0.588 n=15) NatMul/100-32 2.952µ ± 3% 2.969µ ± 0% ~ (p=0.237 n=15) NatMul/1000-32 120.7µ ± 0% 121.1µ ± 0% +0.30% (p=0.000 n=15) NatMul/10000-32 4.845m ± 0% 4.839m ± 1% ~ (p=0.653 n=15) NatMul/100000-32 173.3m ± 0% 173.3m ± 0% ~ (p=0.838 n=15) NatSqr/1-32 31.18n ± 23% 32.08n ± 2% ~ (p=0.015 n=15) NatSqr/2-32 57.22n ± 28% 58.88n ± 2% ~ (p=0.054 n=15) NatSqr/3-32 61.34n ± 18% 64.33n ± 2% ~ (p=0.237 n=15) NatSqr/5-32 72.47n ± 17% 79.81n ± 3% ~ (p=0.067 n=15) NatSqr/8-32 83.26n ± 26% 100.10n ± 3% ~ (p=0.016 n=15) NatSqr/10-32 87.31n ± 43% 125.50n ± 2% ~ (p=0.003 n=15) NatSqr/20-32 193.5n ± 25% 244.4n ± 13% ~ (p=0.002 n=15) NatSqr/30-32 323.9n ± 17% 380.9n ± 6% ~ (p=0.003 n=15) NatSqr/50-32 713.4n ± 9% 761.7n ± 8% ~ (p=0.419 n=15) NatSqr/80-32 1.486µ ± 7% 1.609µ ± 5% +8.28% (p=0.000 n=15) NatSqr/100-32 2.115µ ± 9% 2.253µ ± 1% ~ (p=0.104 n=15) NatSqr/200-32 7.201µ ± 4% 6.610µ ± 1% -8.21% (p=0.000 n=15) NatSqr/300-32 13.08µ ± 2% 12.37µ ± 1% -5.41% (p=0.000 n=15) NatSqr/500-32 32.56µ ± 2% 27.83µ ± 2% -14.52% (p=0.000 n=15) NatSqr/800-32 66.83µ ± 3% 59.59µ ± 1% -10.83% (p=0.000 n=15) NatSqr/1000-32 98.09µ ± 1% 83.59µ ± 1% -14.78% (p=0.000 n=15) NatSqr/10000-32 3.445m ± 1% 3.245m ± 0% -5.81% (p=0.000 n=15) NatSqr/100000-32 137.3m ± 0% 127.0m ± 0% -7.54% (p=0.000 n=15) geomean 4.897µ 4.972µ +1.52% goos: linux goarch: arm64 pkg: math/big │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-16 15.26n ± 2% 15.14n ± 1% ~ (p=0.212 n=15) Div/40/20-16 15.22n ± 1% 15.16n ± 0% ~ (p=0.190 n=15) Div/100/50-16 26.53n ± 2% 26.42n ± 0% -0.41% (p=0.000 n=15) Div/200/100-16 124.3n ± 0% 124.0n ± 0% ~ (p=0.704 n=15) Div/400/200-16 142.4n ± 0% 141.8n ± 0% ~ (p=0.074 n=15) Div/1000/500-16 262.0n ± 1% 261.3n ± 1% ~ (p=0.046 n=15) Div/2000/1000-16 532.6n ± 0% 532.5n ± 1% ~ (p=0.798 n=15) Div/20000/10000-16 22.27µ ± 0% 22.88µ ± 0% +2.73% (p=0.000 n=15) Div/200000/100000-16 890.4µ ± 0% 902.8µ ± 0% +1.39% (p=0.000 n=15) Div/2000000/1000000-16 35.03m ± 0% 35.10m ± 0% ~ 
(p=0.305 n=15) Div/20000000/10000000-16 1.380 ± 0% 1.385 ± 0% ~ (p=0.019 n=15) NatMul/10-16 177.6n ± 1% 175.6n ± 3% ~ (p=0.480 n=15) NatMul/100-16 5.675µ ± 0% 5.669µ ± 1% ~ (p=0.705 n=15) NatMul/1000-16 224.3µ ± 0% 224.6µ ± 0% ~ (p=0.653 n=15) NatMul/10000-16 8.735m ± 0% 8.739m ± 0% ~ (p=0.567 n=15) NatMul/100000-16 331.6m ± 0% 331.6m ± 1% ~ (p=0.412 n=15) NatSqr/1-16 43.69n ± 2% 42.77n ± 6% ~ (p=0.383 n=15) NatSqr/2-16 65.26n ± 2% 63.91n ± 5% ~ (p=0.285 n=15) NatSqr/3-16 73.95n ± 1% 72.25n ± 6% ~ (p=0.198 n=15) NatSqr/5-16 95.06n ± 1% 94.21n ± 3% ~ (p=0.721 n=15) NatSqr/8-16 155.5n ± 1% 153.4n ± 4% ~ (p=0.170 n=15) NatSqr/10-16 175.4n ± 1% 174.0n ± 2% ~ (p=0.271 n=15) NatSqr/20-16 360.8n ± 0% 358.5n ± 2% ~ (p=0.170 n=15) NatSqr/30-16 584.7n ± 0% 582.9n ± 1% ~ (p=0.170 n=15) NatSqr/50-16 1.323µ ± 0% 1.322µ ± 0% ~ (p=0.627 n=15) NatSqr/80-16 2.916µ ± 0% 2.674µ ± 0% -8.30% (p=0.000 n=15) NatSqr/100-16 4.365µ ± 0% 3.802µ ± 0% -12.90% (p=0.000 n=15) NatSqr/200-16 16.42µ ± 0% 11.29µ ± 0% -31.26% (p=0.000 n=15) NatSqr/300-16 28.07µ ± 0% 22.83µ ± 0% -18.68% (p=0.000 n=15) NatSqr/500-16 76.30µ ± 0% 50.06µ ± 0% -34.39% (p=0.000 n=15) NatSqr/800-16 147.5µ ± 0% 101.2µ ± 1% -31.41% (p=0.000 n=15) NatSqr/1000-16 228.6µ ± 0% 149.5µ ± 0% -34.61% (p=0.000 n=15) NatSqr/10000-16 7.417m ± 0% 6.025m ± 0% -18.76% (p=0.000 n=15) NatSqr/100000-16 309.2m ± 0% 214.9m ± 0% -30.50% (p=0.000 n=15) geomean 8.559µ 7.906µ -7.63% goos: darwin goarch: arm64 pkg: math/big cpu: Apple M3 Pro │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-12 9.577n ± 6% 9.473n ± 5% ~ (p=0.384 n=15) Div/40/20-12 9.480n ± 1% 9.430n ± 1% ~ (p=0.019 n=15) Div/100/50-12 14.82n ± 0% 14.82n ± 0% ~ (p=0.845 n=15) Div/200/100-12 83.94n ± 1% 84.35n ± 4% ~ (p=0.512 n=15) Div/400/200-12 102.7n ± 1% 102.9n ± 0% ~ (p=0.845 n=15) Div/1000/500-12 185.3n ± 1% 181.9n ± 1% -1.83% (p=0.000 n=15) Div/2000/1000-12 397.0n ± 1% 396.7n ± 0% ~ (p=0.959 n=15) Div/20000/10000-12 14.05µ ± 0% 13.70µ ± 1% ~ (p=0.002 n=15) Div/200000/100000-12 
529.4µ ± 3% 526.7µ ± 2% ~ (p=0.967 n=15) Div/2000000/1000000-12 20.05m ± 0% 20.05m ± 0% ~ (p=0.653 n=15) Div/20000000/10000000-12 788.2m ± 1% 789.0m ± 1% ~ (p=0.412 n=15) NatMul/10-12 79.95n ± 1% 80.87n ± 1% +1.15% (p=0.000 n=15) NatMul/100-12 2.973µ ± 0% 2.986µ ± 2% ~ (p=0.051 n=15) NatMul/1000-12 122.6µ ± 5% 123.0µ ± 1% ~ (p=0.783 n=15) NatMul/10000-12 4.990m ± 1% 5.000m ± 1% ~ (p=0.653 n=15) NatMul/100000-12 185.3m ± 3% 190.3m ± 1% ~ (p=0.089 n=15) NatSqr/1-12 11.84n ± 1% 11.88n ± 1% ~ (p=0.735 n=15) NatSqr/2-12 21.01n ± 1% 21.44n ± 6% ~ (p=0.039 n=15) NatSqr/3-12 25.59n ± 0% 26.74n ± 9% +4.49% (p=0.000 n=15) NatSqr/5-12 36.78n ± 0% 37.04n ± 1% +0.71% (p=0.000 n=15) NatSqr/8-12 63.09n ± 3% 63.22n ± 1% ~ (p=0.846 n=15) NatSqr/10-12 79.98n ± 0% 79.78n ± 0% ~ (p=0.100 n=15) NatSqr/20-12 174.0n ± 0% 175.5n ± 1% ~ (p=0.361 n=15) NatSqr/30-12 290.0n ± 0% 291.4n ± 0% ~ (p=0.002 n=15) NatSqr/50-12 655.2n ± 4% 658.1n ± 0% ~ (p=0.060 n=15) NatSqr/80-12 1.506µ ± 0% 1.397µ ± 5% -7.24% (p=0.000 n=15) NatSqr/100-12 2.273µ ± 0% 2.005µ ± 5% -11.79% (p=0.000 n=15) NatSqr/200-12 8.833µ ± 6% 6.109µ ± 0% -30.84% (p=0.000 n=15) NatSqr/300-12 15.15µ ± 4% 12.37µ ± 0% -18.34% (p=0.000 n=15) NatSqr/500-12 41.89µ ± 0% 27.70µ ± 1% -33.88% (p=0.000 n=15) NatSqr/800-12 80.72µ ± 0% 56.40µ ± 0% -30.12% (p=0.000 n=15) NatSqr/1000-12 127.06µ ± 1% 84.06µ ± 1% -33.84% (p=0.000 n=15) NatSqr/10000-12 4.130m ± 0% 3.390m ± 0% -17.91% (p=0.000 n=15) NatSqr/100000-12 173.2m ± 0% 131.2m ± 6% -24.25% (p=0.000 n=15) geomean 4.489µ 4.189µ -6.68% Change-Id: Iaf65fd85457b003ebf07a787c875cda321b40cc9 Reviewed-on: https://go-review.googlesource.com/c/go/+/652058 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Griesemer <gri@google.com> Reviewed-by: Alan Donovan <adonovan@google.com> Auto-Submit: Russ Cox <rsc@golang.org>
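To make the role of the squaring thresholds concrete, here is a minimal sketch of a size-based dispatch. The threshold values are the ones listed in the message above; the function sqrStrategy and the assumed mapping from size ranges to algorithms are illustrative guesses, not code taken from math/big.

```go
package main

import "fmt"

// Threshold values as listed in the commit message (sizes in words).
const (
	basicSqrThreshold     = 12 // was 20
	karatsubaSqrThreshold = 80 // was 260
)

// sqrStrategy is a hypothetical helper that names the algorithm a squaring
// of an n-word number would use under the assumed dispatch rule.
func sqrStrategy(n int) string {
	switch {
	case n < basicSqrThreshold:
		return "basicMul" // tiny inputs: general schoolbook multiply
	case n < karatsubaSqrThreshold:
		return "basicSqr" // medium inputs: schoolbook squaring
	default:
		return "karatsubaSqr" // large inputs: Karatsuba squaring
	}
}

func main() {
	for _, n := range []int{5, 20, 100, 200} {
		fmt.Println(n, sqrStrategy(n))
	}
}
```

Under this reading, lowering karatsubaSqrThreshold from 260 to 80 moves sizes such as 100 and 200 words onto the Karatsuba path, which is the range where the NatSqr rows above change most.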
2025-03-12	math/big: simplify, speed up Karatsuba multiplication	Russ Cox
	The old Karatsuba implementation only operated on lengths that are a power of two times a number smaller than karatsubaThreshold. For example, when karatsubaThreshold = 40, multiplying a pair of 99-word numbers runs karatsuba on the low 96 (= 24<<2) words and then has to fix up the answer to include the high 3 words of each. I suspect this requirement was needed to make the analysis of how many temporary words to reserve easier, back when the answer was 3*n and depended on exactly halving the size at each Karatsuba step. Now that we have the more flexible temporary allocation stack, we can change Karatsuba to accept operands of odd length. Doing so avoids most of the fixup that the old approach required. For example, multiplying a pair of 99-word numbers runs karatsuba on all 99 words now. This is simpler and about the same speed or, for large cases, faster. goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) CPU @ 3.10GHz │ old │ new │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-16 99.62n ± 3% 99.10n ± 3% ~ (p=0.009 n=15) GCD10x10/WithXY-16 243.4n ± 1% 245.2n ± 1% ~ (p=0.009 n=15) GCD100x100/WithoutXY-16 921.9n ± 1% 919.2n ± 1% ~ (p=0.076 n=15) GCD100x100/WithXY-16 1.527µ ± 1% 1.526µ ± 0% ~ (p=0.813 n=15) GCD1000x1000/WithoutXY-16 9.704µ ± 1% 9.696µ ± 0% ~ (p=0.532 n=15) GCD1000x1000/WithXY-16 14.03µ ± 1% 13.96µ ± 0% ~ (p=0.014 n=15) GCD10000x10000/WithoutXY-16 206.5µ ± 2% 206.5µ ± 0% ~ (p=0.967 n=15) GCD10000x10000/WithXY-16 398.0µ ± 1% 397.4µ ± 0% ~ (p=0.683 n=15) Div/20/10-16 22.22n ± 0% 22.23n ± 0% ~ (p=0.105 n=15) Div/40/20-16 22.23n ± 0% 22.23n ± 0% ~ (p=0.307 n=15) Div/100/50-16 55.47n ± 0% 55.47n ± 0% ~ (p=0.573 n=15) Div/200/100-16 174.9n ± 1% 174.6n ± 1% ~ (p=0.814 n=15) Div/400/200-16 209.5n ± 1% 210.5n ± 1% ~ (p=0.454 n=15) Div/1000/500-16 379.9n ± 0% 383.5n ± 2% ~ (p=0.123 n=15) Div/2000/1000-16 780.1n ± 0% 784.6n ± 1% +0.58% (p=0.000 n=15) Div/20000/10000-16 25.22µ ± 1% 25.15µ ± 0% ~ (p=0.213 n=15) Div/200000/100000-16 921.8µ ± 1% 
926.1µ ± 0% ~ (p=0.009 n=15) Div/2000000/1000000-16 37.91m ± 0% 35.63m ± 0% -6.02% (p=0.000 n=15) Div/20000000/10000000-16 1.378 ± 0% 1.336 ± 0% -3.03% (p=0.000 n=15) NatMul/10-16 166.8n ± 4% 168.9n ± 3% ~ (p=0.008 n=15) NatMul/100-16 5.519µ ± 2% 5.548µ ± 4% ~ (p=0.032 n=15) NatMul/1000-16 230.4µ ± 1% 220.2µ ± 1% -4.43% (p=0.000 n=15) NatMul/10000-16 8.569m ± 1% 8.640m ± 1% ~ (p=0.005 n=15) NatMul/100000-16 376.5m ± 1% 334.1m ± 0% -11.26% (p=0.000 n=15) NatSqr/1-16 27.85n ± 5% 28.60n ± 2% ~ (p=0.123 n=15) NatSqr/2-16 47.99n ± 2% 48.84n ± 1% ~ (p=0.008 n=15) NatSqr/3-16 59.41n ± 2% 60.87n ± 2% +2.46% (p=0.001 n=15) NatSqr/5-16 87.27n ± 2% 89.31n ± 3% ~ (p=0.087 n=15) NatSqr/8-16 124.6n ± 3% 128.9n ± 3% ~ (p=0.006 n=15) NatSqr/10-16 166.3n ± 3% 172.7n ± 3% ~ (p=0.002 n=15) NatSqr/20-16 385.2n ± 2% 394.7n ± 3% ~ (p=0.036 n=15) NatSqr/30-16 622.7n ± 3% 642.9n ± 3% ~ (p=0.032 n=15) NatSqr/50-16 1.274µ ± 3% 1.323µ ± 4% ~ (p=0.003 n=15) NatSqr/80-16 2.606µ ± 4% 2.714µ ± 4% ~ (p=0.044 n=15) NatSqr/100-16 3.731µ ± 4% 3.871µ ± 4% ~ (p=0.038 n=15) NatSqr/200-16 12.99µ ± 2% 13.09µ ± 3% ~ (p=0.838 n=15) NatSqr/300-16 22.87µ ± 2% 23.25µ ± 2% ~ (p=0.285 n=15) NatSqr/500-16 58.43µ ± 1% 58.25µ ± 2% ~ (p=0.345 n=15) NatSqr/800-16 115.3µ ± 3% 116.2µ ± 3% ~ (p=0.126 n=15) NatSqr/1000-16 173.9µ ± 1% 174.3µ ± 1% ~ (p=0.935 n=15) NatSqr/10000-16 6.133m ± 2% 6.034m ± 1% -1.62% (p=0.000 n=15) NatSqr/100000-16 253.8m ± 1% 241.5m ± 0% -4.87% (p=0.000 n=15) geomean 7.745µ 7.760µ +0.19% goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz │ old │ new │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-88 62.17n ± 4% 61.44n ± 0% -1.17% (p=0.000 n=15) GCD10x10/WithXY-88 173.4n ± 2% 172.4n ± 4% ~ (p=0.615 n=15) GCD100x100/WithoutXY-88 584.0n ± 1% 582.9n ± 0% ~ (p=0.009 n=15) GCD100x100/WithXY-88 1.098µ ± 1% 1.091µ ± 2% ~ (p=0.002 n=15) GCD1000x1000/WithoutXY-88 6.055µ ± 0% 6.049µ ± 0% ~ (p=0.007 n=15) GCD1000x1000/WithXY-88 9.430µ ± 0% 9.417µ ± 1% ~ (p=0.123 
n=15) GCD10000x10000/WithoutXY-88 153.4µ ± 2% 149.0µ ± 2% -2.85% (p=0.000 n=15) GCD10000x10000/WithXY-88 350.6µ ± 3% 349.0µ ± 2% ~ (p=0.126 n=15) Div/20/10-88 13.12n ± 0% 13.12n ± 1% 0.00% (p=0.042 n=15) Div/40/20-88 13.12n ± 0% 13.13n ± 0% ~ (p=0.004 n=15) Div/100/50-88 25.49n ± 0% 25.49n ± 0% ~ (p=0.452 n=15) Div/200/100-88 115.7n ± 2% 113.8n ± 2% ~ (p=0.212 n=15) Div/400/200-88 135.0n ± 1% 136.1n ± 1% ~ (p=0.005 n=15) Div/1000/500-88 257.5n ± 1% 259.9n ± 1% ~ (p=0.004 n=15) Div/2000/1000-88 567.5n ± 1% 572.4n ± 2% ~ (p=0.616 n=15) Div/20000/10000-88 25.65µ ± 0% 25.77µ ± 1% ~ (p=0.032 n=15) Div/200000/100000-88 777.4µ ± 1% 754.3µ ± 1% -2.97% (p=0.000 n=15) Div/2000000/1000000-88 33.66m ± 0% 31.37m ± 0% -6.81% (p=0.000 n=15) Div/20000000/10000000-88 1.320 ± 0% 1.266 ± 0% -4.04% (p=0.000 n=15) NatMul/10-88 151.9n ± 7% 143.3n ± 7% ~ (p=0.878 n=15) NatMul/100-88 4.418µ ± 2% 4.337µ ± 3% ~ (p=0.512 n=15) NatMul/1000-88 206.8µ ± 1% 189.8µ ± 1% -8.25% (p=0.000 n=15) NatMul/10000-88 8.531m ± 1% 8.095m ± 0% -5.12% (p=0.000 n=15) NatMul/100000-88 298.9m ± 0% 260.5m ± 1% -12.85% (p=0.000 n=15) NatSqr/1-88 27.55n ± 6% 28.25n ± 7% ~ (p=0.024 n=15) NatSqr/2-88 44.71n ± 6% 46.21n ± 9% ~ (p=0.024 n=15) NatSqr/3-88 55.44n ± 4% 58.41n ± 10% ~ (p=0.126 n=15) NatSqr/5-88 80.71n ± 5% 81.41n ± 5% ~ (p=0.032 n=15) NatSqr/8-88 115.7n ± 4% 115.4n ± 5% ~ (p=0.814 n=15) NatSqr/10-88 147.4n ± 4% 147.3n ± 4% ~ (p=0.505 n=15) NatSqr/20-88 337.8n ± 3% 337.3n ± 4% ~ (p=0.814 n=15) NatSqr/30-88 556.9n ± 3% 557.6n ± 4% ~ (p=0.814 n=15) NatSqr/50-88 1.208µ ± 4% 1.208µ ± 3% ~ (p=0.910 n=15) NatSqr/80-88 2.591µ ± 3% 2.581µ ± 3% ~ (p=0.705 n=15) NatSqr/100-88 3.870µ ± 3% 3.858µ ± 3% ~ (p=0.846 n=15) NatSqr/200-88 14.43µ ± 3% 14.28µ ± 2% ~ (p=0.383 n=15) NatSqr/300-88 24.68µ ± 2% 24.49µ ± 2% ~ (p=0.624 n=15) NatSqr/500-88 66.27µ ± 1% 66.18µ ± 1% ~ (p=0.735 n=15) NatSqr/800-88 128.7µ ± 1% 127.4µ ± 1% ~ (p=0.050 n=15) NatSqr/1000-88 198.7µ ± 1% 197.7µ ± 1% ~ (p=0.229 n=15) NatSqr/10000-88 6.582m ± 1% 
6.426m ± 1% -2.37% (p=0.000 n=15) NatSqr/100000-88 274.3m ± 0% 267.3m ± 0% -2.57% (p=0.000 n=15) geomean 6.518µ 6.438µ -1.22% goos: linux goarch: arm64 pkg: math/big │ old │ new │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-16 61.70n ± 1% 61.32n ± 1% ~ (p=0.361 n=15) GCD10x10/WithXY-16 217.3n ± 1% 217.0n ± 1% ~ (p=0.395 n=15) GCD100x100/WithoutXY-16 569.7n ± 0% 572.6n ± 2% ~ (p=0.213 n=15) GCD100x100/WithXY-16 1.241µ ± 1% 1.236µ ± 1% ~ (p=0.157 n=15) GCD1000x1000/WithoutXY-16 5.558µ ± 0% 5.566µ ± 0% ~ (p=0.228 n=15) GCD1000x1000/WithXY-16 9.319µ ± 0% 9.326µ ± 0% ~ (p=0.233 n=15) GCD10000x10000/WithoutXY-16 126.4µ ± 2% 128.7µ ± 3% ~ (p=0.081 n=15) GCD10000x10000/WithXY-16 279.3µ ± 0% 278.3µ ± 5% ~ (p=0.187 n=15) Div/20/10-16 15.12n ± 1% 15.21n ± 1% ~ (p=0.490 n=15) Div/40/20-16 15.11n ± 0% 15.23n ± 1% ~ (p=0.107 n=15) Div/100/50-16 26.53n ± 0% 26.50n ± 0% ~ (p=0.299 n=15) Div/200/100-16 123.7n ± 0% 124.0n ± 0% ~ (p=0.086 n=15) Div/400/200-16 142.5n ± 0% 142.4n ± 0% ~ (p=0.039 n=15) Div/1000/500-16 259.9n ± 1% 261.2n ± 1% ~ (p=0.044 n=15) Div/2000/1000-16 539.4n ± 1% 532.3n ± 1% -1.32% (p=0.001 n=15) Div/20000/10000-16 22.43µ ± 0% 22.32µ ± 0% -0.49% (p=0.000 n=15) Div/200000/100000-16 898.3µ ± 0% 889.6µ ± 0% -0.96% (p=0.000 n=15) Div/2000000/1000000-16 38.37m ± 0% 35.11m ± 0% -8.49% (p=0.000 n=15) Div/20000000/10000000-16 1.449 ± 0% 1.384 ± 0% -4.48% (p=0.000 n=15) NatMul/10-16 182.0n ± 1% 177.8n ± 1% -2.31% (p=0.000 n=15) NatMul/100-16 5.537µ ± 0% 5.693µ ± 0% +2.82% (p=0.000 n=15) NatMul/1000-16 229.9µ ± 0% 224.8µ ± 0% -2.24% (p=0.000 n=15) NatMul/10000-16 8.985m ± 0% 8.751m ± 0% -2.61% (p=0.000 n=15) NatMul/100000-16 371.1m ± 0% 331.5m ± 0% -10.66% (p=0.000 n=15) NatSqr/1-16 46.77n ± 6% 42.76n ± 1% -8.57% (p=0.000 n=15) NatSqr/2-16 66.99n ± 4% 63.62n ± 1% -5.03% (p=0.000 n=15) NatSqr/3-16 76.79n ± 4% 73.42n ± 1% ~ (p=0.007 n=15) NatSqr/5-16 99.00n ± 3% 95.35n ± 1% -3.69% (p=0.000 n=15) NatSqr/8-16 160.0n ± 3% 155.1n ± 1% -3.06% (p=0.001 n=15) NatSqr/10-16 178.4n 
± 2% 175.9n ± 0% -1.40% (p=0.001 n=15) NatSqr/20-16 361.9n ± 2% 361.3n ± 0% ~ (p=0.083 n=15) NatSqr/30-16 584.7n ± 0% 586.8n ± 0% +0.36% (p=0.000 n=15) NatSqr/50-16 1.327µ ± 0% 1.329µ ± 0% ~ (p=0.349 n=15) NatSqr/80-16 2.893µ ± 1% 2.925µ ± 0% +1.11% (p=0.000 n=15) NatSqr/100-16 4.330µ ± 1% 4.381µ ± 0% +1.18% (p=0.000 n=15) NatSqr/200-16 16.25µ ± 1% 16.43µ ± 0% +1.07% (p=0.000 n=15) NatSqr/300-16 27.85µ ± 1% 28.06µ ± 0% +0.77% (p=0.000 n=15) NatSqr/500-16 76.01µ ± 0% 76.34µ ± 0% ~ (p=0.002 n=15) NatSqr/800-16 146.8µ ± 0% 148.1µ ± 0% +0.83% (p=0.000 n=15) NatSqr/1000-16 228.2µ ± 0% 228.6µ ± 0% ~ (p=0.123 n=15) NatSqr/10000-16 7.524m ± 0% 7.426m ± 0% -1.31% (p=0.000 n=15) NatSqr/100000-16 316.7m ± 0% 309.2m ± 0% -2.36% (p=0.000 n=15) geomean 7.264µ 7.172µ -1.27% goos: darwin goarch: arm64 pkg: math/big cpu: Apple M3 Pro │ old │ new │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-12 32.61n ± 1% 32.42n ± 1% ~ (p=0.021 n=15) GCD10x10/WithXY-12 87.70n ± 1% 88.42n ± 1% ~ (p=0.010 n=15) GCD100x100/WithoutXY-12 305.9n ± 0% 306.4n ± 0% ~ (p=0.003 n=15) GCD100x100/WithXY-12 560.3n ± 2% 556.6n ± 1% ~ (p=0.018 n=15) GCD1000x1000/WithoutXY-12 3.509µ ± 2% 3.464µ ± 1% ~ (p=0.145 n=15) GCD1000x1000/WithXY-12 5.347µ ± 2% 5.372µ ± 1% ~ (p=0.046 n=15) GCD10000x10000/WithoutXY-12 73.75µ ± 1% 73.99µ ± 1% ~ (p=0.004 n=15) GCD10000x10000/WithXY-12 148.4µ ± 0% 147.8µ ± 1% ~ (p=0.076 n=15) Div/20/10-12 9.481n ± 0% 9.462n ± 1% ~ (p=0.631 n=15) Div/40/20-12 9.457n ± 0% 9.462n ± 1% ~ (p=0.798 n=15) Div/100/50-12 14.91n ± 0% 14.79n ± 1% -0.80% (p=0.000 n=15) Div/200/100-12 84.56n ± 1% 84.60n ± 1% ~ (p=0.271 n=15) Div/400/200-12 103.8n ± 0% 102.8n ± 0% -0.96% (p=0.000 n=15) Div/1000/500-12 181.3n ± 1% 184.2n ± 2% ~ (p=0.091 n=15) Div/2000/1000-12 397.5n ± 0% 397.4n ± 0% ~ (p=0.299 n=15) Div/20000/10000-12 14.04µ ± 1% 13.99µ ± 0% ~ (p=0.221 n=15) Div/200000/100000-12 523.1µ ± 0% 514.0µ ± 3% ~ (p=0.775 n=15) Div/2000000/1000000-12 21.58m ± 0% 20.01m ± 1% -7.29% (p=0.000 n=15) 
Div/20000000/10000000-12 813.5m ± 0% 796.2m ± 1% -2.13% (p=0.000 n=15) NatMul/10-12 80.46n ± 1% 80.02n ± 1% ~ (p=0.063 n=15) NatMul/100-12 2.904µ ± 0% 2.979µ ± 1% +2.58% (p=0.000 n=15) NatMul/1000-12 127.8µ ± 0% 122.3µ ± 0% -4.28% (p=0.000 n=15) NatMul/10000-12 5.141m ± 0% 4.975m ± 1% -3.23% (p=0.000 n=15) NatMul/100000-12 208.8m ± 0% 189.6m ± 3% -9.21% (p=0.000 n=15) NatSqr/1-12 11.90n ± 1% 11.76n ± 1% ~ (p=0.059 n=15) NatSqr/2-12 21.33n ± 1% 21.12n ± 0% ~ (p=0.063 n=15) NatSqr/3-12 26.05n ± 1% 25.79n ± 0% ~ (p=0.002 n=15) NatSqr/5-12 37.31n ± 0% 36.98n ± 1% ~ (p=0.008 n=15) NatSqr/8-12 63.07n ± 0% 62.75n ± 1% ~ (p=0.061 n=15) NatSqr/10-12 79.48n ± 0% 79.59n ± 0% ~ (p=0.455 n=15) NatSqr/20-12 173.1n ± 0% 173.2n ± 1% ~ (p=0.518 n=15) NatSqr/30-12 288.6n ± 1% 289.2n ± 0% ~ (p=0.030 n=15) NatSqr/50-12 653.3n ± 0% 653.3n ± 0% ~ (p=0.361 n=15) NatSqr/80-12 1.492µ ± 0% 1.496µ ± 0% ~ (p=0.018 n=15) NatSqr/100-12 2.270µ ± 1% 2.270µ ± 0% ~ (p=0.326 n=15) NatSqr/200-12 8.776µ ± 1% 8.784µ ± 1% ~ (p=0.083 n=15) NatSqr/300-12 15.07µ ± 0% 15.09µ ± 0% ~ (p=0.455 n=15) NatSqr/500-12 41.71µ ± 0% 41.77µ ± 1% ~ (p=0.305 n=15) NatSqr/800-12 80.77µ ± 1% 80.59µ ± 0% ~ (p=0.113 n=15) NatSqr/1000-12 126.4µ ± 1% 126.5µ ± 0% ~ (p=0.683 n=15) NatSqr/10000-12 4.204m ± 0% 4.119m ± 0% -2.02% (p=0.000 n=15) NatSqr/100000-12 177.0m ± 0% 172.9m ± 0% -2.31% (p=0.000 n=15) geomean 3.790µ 3.757µ -0.87% Change-Id: Ifc7a9b61f678df216690511ac8bb9143189a795e Reviewed-on: https://go-review.googlesource.com/c/go/+/652057 Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Griesemer <gri@google.com>
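The length restriction of the old implementation can be illustrated with a small helper. This is a sketch of the rounding rule as the message describes it (halve the length until it is at or below the threshold, then scale back up by the same power of two); the function name and loop are assumptions for illustration, not the removed code.

```go
package main

import "fmt"

// oldKaratsubaLen is a hypothetical helper returning the largest length of
// the form k<<i, with k <= threshold, that the old scheme would process.
func oldKaratsubaLen(n, threshold int) int {
	shift := uint(0)
	for n > threshold {
		n >>= 1
		shift++
	}
	return n << shift
}

func main() {
	n, threshold := 99, 40
	k := oldKaratsubaLen(n, threshold)
	// 99 -> 49 -> 24, so k = 24<<2 = 96 and 3 high words need a separate fixup.
	fmt.Println(k, n-k) // 96 3
}
```

With operands of odd length accepted directly, the leftover n-k words no longer need a separate pass.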
2025-03-06	math/big: avoid negative slice size in nat.rem	Russ Cox
	In a division, normally the answer to N digits / D digits has N-D digits, but not when N-D is negative. Fix the calculation of the number of digits for the temporary in nat.rem not to be negative. Fixes #72043. Change-Id: Ib9faa430aeb6c5f4c4a730f1ec631d2bf3f7472c Reviewed-on: https://go-review.googlesource.com/c/go/+/655156 Reviewed-by: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
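The failure mode is easy to reproduce in isolation: passing a negative length to make panics at run time. The sketch below shows the clamping idea only; quotientTempLen is a hypothetical helper using the N-D rule of thumb from the message, and the exact expression in nat.rem may differ.

```go
package main

import "fmt"

// quotientTempLen is a hypothetical helper: the number of words to reserve
// for the quotient temporary when dividing n words by d words, never negative.
func quotientTempLen(n, d int) int {
	if n < d {
		return 0 // dividend shorter than divisor: the quotient is zero
	}
	return n - d
}

func main() {
	for _, c := range [][2]int{{10, 3}, {3, 3}, {2, 5}} {
		// Without the clamp, the last case would call make with length -3 and panic.
		q := make([]uint, quotientTempLen(c[0], c[1]))
		fmt.Println(c[0], c[1], len(q))
	}
}
```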
2025-03-05	math: implement func archExp and archExp2 in assembly on loong64	Xiaolin Zhao
	goos: linux
	goarch: loong64
	pkg: math
	cpu: Loongson-3A6000 @ 2500.00MHz
	| bench.old | bench.new |
	| sec/op | sec/op vs base |
	Exp  26.30n ± 0%  12.93n ± 0%  -50.85% (p=0.000 n=10)
	ExpGo  26.86n ± 0%  26.92n ± 0%  +0.22% (p=0.000 n=10)
	Expm1  16.76n ± 0%  16.75n ± 0%  ~ (p=0.060 n=10)
	Exp2  23.05n ± 0%  12.12n ± 0%  -47.42% (p=0.000 n=10)
	Exp2Go  23.41n ± 0%  23.47n ± 0%  +0.28% (p=0.000 n=10)
	geomean  22.97n  17.54n  -23.64%
	goos: linux
	goarch: loong64
	pkg: math/cmplx
	cpu: Loongson-3A6000 @ 2500.00MHz
	| bench.old | bench.new |
	| sec/op | sec/op vs base |
	Exp  51.32n ± 0%  35.41n ± 0%  -30.99% (p=0.000 n=10)
	goos: linux
	goarch: loong64
	pkg: math
	cpu: Loongson-3A5000 @ 2500.00MHz
	| bench.old | bench.new |
	| sec/op | sec/op vs base |
	Exp  50.27n ± 0%  48.75n ± 1%  -3.01% (p=0.000 n=10)
	ExpGo  50.72n ± 0%  50.44n ± 0%  -0.55% (p=0.000 n=10)
	Expm1  28.40n ± 0%  28.32n ± 0%  ~ (p=0.360 n=10)
	Exp2  50.09n ± 0%  21.49n ± 1%  -57.10% (p=0.000 n=10)
	Exp2Go  50.05n ± 0%  49.69n ± 0%  -0.72% (p=0.000 n=10)
	geomean  44.85n  37.52n  -16.35%
	goos: linux
	goarch: loong64
	pkg: math/cmplx
	cpu: Loongson-3A5000 @ 2500.00MHz
	| bench.old | bench.new |
	| sec/op | sec/op vs base |
	Exp  88.56n ± 0%  67.29n ± 0%  -24.03% (p=0.000 n=10)
	Change-Id: I89e456d26fc075d83335ee4a31227d2aface5714
	Reviewed-on: https://go-review.googlesource.com/c/go/+/653935
	Reviewed-by: Junyang Shao <shaojunyang@google.com>
	Reviewed-by: abner chenc <chenguoqi@loongson.cn>
	LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
	Reviewed-by: Michael Pratt <mpratt@google.com>
2025-02-27	math/big: add tests for allocation during multiply	Russ Cox
	Test that big.Int.Mul reusing the same target is not allocating temporary garbage during its computation. That code is going to be modified in an upcoming CL. Change-Id: I3ed55c06da030282233c29cd7af2a04f395dc7a2 Reviewed-on: https://go-review.googlesource.com/c/go/+/652056 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com> Auto-Submit: Russ Cox <rsc@golang.org>
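A check of this kind can be written with testing.AllocsPerRun from the standard library. The sketch below is an assumption about the general shape of such a test, not the test added in the CL; the helper name mulAllocs and the operand sizes are made up for illustration.

```go
package main

import (
	"fmt"
	"math/big"
	"testing"
)

// mulAllocs is a hypothetical helper that reports the average number of heap
// allocations per z.Mul(x, y) call when the same target z is reused.
func mulAllocs(words int) float64 {
	// Build an operand of roughly the requested size: 2^(64*words) - 1.
	x := new(big.Int).Lsh(big.NewInt(1), uint(words)*64)
	x.Sub(x, big.NewInt(1))
	y := new(big.Int).Set(x)

	z := new(big.Int)
	z.Mul(x, y) // first call grows z so later calls can reuse its storage

	return testing.AllocsPerRun(100, func() {
		z.Mul(x, y)
	})
}

func main() {
	for _, words := range []int{10, 100, 1000} {
		fmt.Printf("%d words: %v allocs/op\n", words, mulAllocs(words))
	}
}
```

A real test would compare the reported value against zero and fail otherwise; the value printed here depends on the Go version in use.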
2025-02-27	math/big: move multiplication to natmul.go	Russ Cox
	No code changes. This CL moves the multiplication (and squaring) code into natmul.go, in preparation for cleaning up Karatsuba and then adding Toom-Cook and FFT-based multiplication. Change-Id: I7f84328284cc4e1ca4da0ebb9f666a5535e8d7f2 Reviewed-on: https://go-review.googlesource.com/c/go/+/652055 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Alan Donovan <adonovan@google.com>
2025-02-27	math/big: optimize atoi of base 2, 4, 16	Russ Cox
	Avoid multiplies when converting base 2, 4, 16 inputs, reducing conversion time from O(N²) to O(N). The Base8 and Base10 code paths should be unmodified, but the base-2,4,16 changes tickle the compiler to generate better (amd64) or worse (arm64) code when really it should not. This is described in detail in #71868 and should be ignored for the purposes of this CL. goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) CPU @ 3.10GHz │ old │ new │ │ sec/op │ sec/op vs base │ Scan/10/Base2-16 324.4n ± 0% 258.7n ± 0% -20.25% (p=0.000 n=15) Scan/100/Base2-16 2.376µ ± 0% 1.968µ ± 0% -17.17% (p=0.000 n=15) Scan/1000/Base2-16 23.89µ ± 0% 19.16µ ± 0% -19.80% (p=0.000 n=15) Scan/10000/Base2-16 311.5µ ± 0% 190.4µ ± 0% -38.86% (p=0.000 n=15) Scan/100000/Base2-16 10.508m ± 0% 1.904m ± 0% -81.88% (p=0.000 n=15) Scan/10/Base8-16 138.3n ± 0% 127.9n ± 0% -7.52% (p=0.000 n=15) Scan/100/Base8-16 886.1n ± 0% 790.2n ± 0% -10.82% (p=0.000 n=15) Scan/1000/Base8-16 9.227µ ± 0% 8.234µ ± 0% -10.76% (p=0.000 n=15) Scan/10000/Base8-16 165.8µ ± 0% 155.6µ ± 0% -6.19% (p=0.000 n=15) Scan/100000/Base8-16 9.044m ± 0% 8.935m ± 0% -1.20% (p=0.000 n=15) Scan/10/Base10-16 129.9n ± 0% 120.0n ± 0% -7.62% (p=0.000 n=15) Scan/100/Base10-16 816.3n ± 0% 730.0n ± 0% -10.57% (p=0.000 n=15) Scan/1000/Base10-16 8.518µ ± 0% 7.628µ ± 0% -10.45% (p=0.000 n=15) Scan/10000/Base10-16 158.6µ ± 0% 149.4µ ± 0% -5.80% (p=0.000 n=15) Scan/100000/Base10-16 8.962m ± 0% 8.855m ± 0% -1.20% (p=0.000 n=15) Scan/10/Base16-16 114.5n ± 0% 108.6n ± 0% -5.15% (p=0.000 n=15) Scan/100/Base16-16 648.3n ± 0% 525.0n ± 0% -19.02% (p=0.000 n=15) Scan/1000/Base16-16 7.375µ ± 0% 5.636µ ± 0% -23.58% (p=0.000 n=15) Scan/10000/Base16-16 171.18µ ± 0% 66.99µ ± 0% -60.87% (p=0.000 n=15) Scan/100000/Base16-16 9490.9µ ± 0% 682.8µ ± 0% -92.81% (p=0.000 n=15) geomean 20.11µ 13.69µ -31.94% goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz │ old │ new │ │ sec/op │ sec/op vs base │ Scan/10/Base2-88 275.4n ± 0% 
215.0n ± 0% -21.93% (p=0.000 n=15) Scan/100/Base2-88 1.869µ ± 0% 1.629µ ± 0% -12.84% (p=0.000 n=15) Scan/1000/Base2-88 18.56µ ± 0% 15.81µ ± 0% -14.82% (p=0.000 n=15) Scan/10000/Base2-88 270.0µ ± 0% 157.2µ ± 0% -41.77% (p=0.000 n=15) Scan/100000/Base2-88 11.518m ± 0% 1.571m ± 0% -86.36% (p=0.000 n=15) Scan/10/Base8-88 108.9n ± 0% 106.0n ± 0% -2.66% (p=0.000 n=15) Scan/100/Base8-88 655.2n ± 0% 594.9n ± 0% -9.20% (p=0.000 n=15) Scan/1000/Base8-88 6.467µ ± 0% 5.966µ ± 0% -7.75% (p=0.000 n=15) Scan/10000/Base8-88 151.2µ ± 0% 147.4µ ± 0% -2.53% (p=0.000 n=15) Scan/100000/Base8-88 10.33m ± 0% 10.30m ± 0% -0.25% (p=0.000 n=15) Scan/10/Base10-88 100.20n ± 0% 98.53n ± 0% -1.67% (p=0.000 n=15) Scan/100/Base10-88 596.9n ± 0% 543.3n ± 0% -8.98% (p=0.000 n=15) Scan/1000/Base10-88 5.904µ ± 0% 5.485µ ± 0% -7.10% (p=0.000 n=15) Scan/10000/Base10-88 145.7µ ± 0% 142.0µ ± 0% -2.55% (p=0.000 n=15) Scan/100000/Base10-88 10.26m ± 0% 10.24m ± 0% -0.18% (p=0.000 n=15) Scan/10/Base16-88 90.33n ± 0% 87.60n ± 0% -3.02% (p=0.000 n=15) Scan/100/Base16-88 506.4n ± 0% 437.7n ± 0% -13.57% (p=0.000 n=15) Scan/1000/Base16-88 5.056µ ± 0% 4.007µ ± 0% -20.75% (p=0.000 n=15) Scan/10000/Base16-88 163.35µ ± 0% 65.37µ ± 0% -59.98% (p=0.000 n=15) Scan/100000/Base16-88 11027.2µ ± 0% 735.1µ ± 0% -93.33% (p=0.000 n=15) geomean 17.13µ 11.74µ -31.46% goos: linux goarch: arm64 pkg: math/big │ old │ new │ │ sec/op │ sec/op vs base │ Scan/10/Base2-16 324.7n ± 0% 348.4n ± 0% +7.30% (p=0.000 n=15) Scan/100/Base2-16 2.604µ ± 0% 3.031µ ± 0% +16.40% (p=0.000 n=15) Scan/1000/Base2-16 26.15µ ± 0% 29.94µ ± 0% +14.52% (p=0.000 n=15) Scan/10000/Base2-16 334.3µ ± 0% 298.8µ ± 0% -10.64% (p=0.000 n=15) Scan/100000/Base2-16 10.664m ± 0% 2.991m ± 0% -71.95% (p=0.000 n=15) Scan/10/Base8-16 144.4n ± 1% 162.2n ± 1% +12.33% (p=0.000 n=15) Scan/100/Base8-16 917.2n ± 0% 1084.0n ± 0% +18.19% (p=0.000 n=15) Scan/1000/Base8-16 9.367µ ± 0% 10.901µ ± 0% +16.38% (p=0.000 n=15) Scan/10000/Base8-16 164.2µ ± 0% 181.2µ ± 0% +10.34% (p=0.000 
n=15) Scan/100000/Base8-16 8.871m ± 1% 9.140m ± 0% +3.04% (p=0.000 n=15) Scan/10/Base10-16 134.6n ± 1% 148.3n ± 1% +10.18% (p=0.000 n=15) Scan/100/Base10-16 837.1n ± 0% 986.6n ± 0% +17.86% (p=0.000 n=15) Scan/1000/Base10-16 8.563µ ± 0% 9.936µ ± 0% +16.03% (p=0.000 n=15) Scan/10000/Base10-16 156.5µ ± 1% 171.3µ ± 0% +9.41% (p=0.000 n=15) Scan/100000/Base10-16 8.863m ± 1% 9.011m ± 0% +1.66% (p=0.000 n=15) Scan/10/Base16-16 115.7n ± 2% 129.1n ± 1% +11.58% (p=0.000 n=15) Scan/100/Base16-16 708.6n ± 0% 796.8n ± 0% +12.45% (p=0.000 n=15) Scan/1000/Base16-16 7.314µ ± 0% 7.554µ ± 0% +3.28% (p=0.000 n=15) Scan/10000/Base16-16 149.05µ ± 0% 74.60µ ± 0% -49.95% (p=0.000 n=15) Scan/100000/Base16-16 9091.6µ ± 0% 741.5µ ± 0% -91.84% (p=0.000 n=15) geomean 20.39µ 17.65µ -13.44% goos: darwin goarch: arm64 pkg: math/big cpu: Apple M3 Pro │ old │ new │ │ sec/op │ sec/op vs base │ Scan/10/Base2-12 193.8n ± 2% 157.3n ± 1% -18.83% (p=0.000 n=15) Scan/100/Base2-12 1.445µ ± 2% 1.362µ ± 1% -5.74% (p=0.000 n=15) Scan/1000/Base2-12 14.28µ ± 0% 13.51µ ± 0% -5.42% (p=0.000 n=15) Scan/10000/Base2-12 177.1µ ± 0% 134.6µ ± 0% -24.04% (p=0.000 n=15) Scan/100000/Base2-12 5.429m ± 1% 1.333m ± 0% -75.45% (p=0.000 n=15) Scan/10/Base8-12 75.52n ± 2% 76.09n ± 1% ~ (p=0.010 n=15) Scan/100/Base8-12 528.4n ± 1% 532.1n ± 1% ~ (p=0.003 n=15) Scan/1000/Base8-12 5.423µ ± 1% 5.427µ ± 0% ~ (p=0.183 n=15) Scan/10000/Base8-12 89.26µ ± 1% 89.37µ ± 0% ~ (p=0.237 n=15) Scan/100000/Base8-12 4.543m ± 2% 4.560m ± 1% ~ (p=0.595 n=15) Scan/10/Base10-12 69.87n ± 1% 70.51n ± 0% ~ (p=0.002 n=15) Scan/100/Base10-12 488.4n ± 1% 491.2n ± 0% ~ (p=0.060 n=15) Scan/1000/Base10-12 5.014µ ± 1% 5.008µ ± 0% ~ (p=0.783 n=15) Scan/10000/Base10-12 84.90µ ± 0% 85.10µ ± 0% ~ (p=0.109 n=15) Scan/100000/Base10-12 4.516m ± 1% 4.521m ± 1% ~ (p=0.713 n=15) Scan/10/Base16-12 59.21n ± 1% 57.70n ± 1% -2.55% (p=0.000 n=15) Scan/100/Base16-12 380.0n ± 1% 360.7n ± 1% -5.08% (p=0.000 n=15) Scan/1000/Base16-12 3.775µ ± 0% 3.421µ ± 0% -9.38% (p=0.000 
n=15) Scan/10000/Base16-12 80.62µ ± 0% 34.44µ ± 1% -57.28% (p=0.000 n=15) Scan/100000/Base16-12 4826.4µ ± 2% 450.9µ ± 2% -90.66% (p=0.000 n=15) geomean 11.05µ 8.448µ -23.52% Change-Id: Ifdb2049545f34072aa75cdbb72bed4cf465f0ad7 Reviewed-on: https://go-review.googlesource.com/c/go/+/650640 Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Griesemer <gri@google.com>
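The O(N) claim for power-of-two bases can be illustrated with a small sketch. Each hexadecimal digit contributes exactly four bits, so digits can be packed into words with shifts and ORs and no multiplies. `parseHexWords` is a hypothetical helper written for this note, with fixed 64-bit words; it is not the math/big implementation.

```go
package main

import "fmt"

// parseHexWords converts a hexadecimal string into little-endian 64-bit
// words using only shifts and ORs. Every digit is touched once, so the
// work is linear in the number of digits.
// Hypothetical helper for illustration; not the math/big code.
func parseHexWords(s string) ([]uint64, error) {
	words := make([]uint64, (len(s)+15)/16) // 16 hex digits per word
	for i := 0; i < len(s); i++ {
		c := s[len(s)-1-i] // walk from the least significant digit
		var d uint64
		switch {
		case '0' <= c && c <= '9':
			d = uint64(c - '0')
		case 'a' <= c && c <= 'f':
			d = uint64(c-'a') + 10
		case 'A' <= c && c <= 'F':
			d = uint64(c-'A') + 10
		default:
			return nil, fmt.Errorf("invalid hex digit %q", c)
		}
		words[i/16] |= d << (4 * (i % 16))
	}
	return words, nil
}

func main() {
	w, err := parseHexWords("1ffffffffffffffff")
	fmt.Println(w, err)
}
```

Bases 8 and 10 do not line up with word boundaries this way, which is consistent with the commit leaving those code paths alone.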
2025-02-27	math/big: improve scan test and benchmark	Russ Cox
	Add a few more test cases for scanning (integer conversion), which were helpful in debugging some upcoming changes.
	BenchmarkScan currently times converting the value 10**N represented in base B back into []Word form. When B = 10, the text is 1 followed by many zeros, which could hit a "multiply by zero" special case when processing many digit chunks, misrepresenting the actual time required depending on whether that case is optimized. Change the benchmark to use 9**N, which is about as big and will not cause runs of zeros in any of the tested bases.
	The benchmark comparison below is not showing faster code, since of course the code is not changing at all here. Instead, it is showing that the new benchmark work is roughly the same size as the old benchmark work.
	goos: darwin
	goarch: arm64
	pkg: math/big
	cpu: Apple M3 Pro
	                       │     old     │                new                 │
	                       │   sec/op    │   sec/op     vs base               │
	ScanPi-12              43.35µ ± 1%   43.59µ ± 1%       ~ (p=0.069 n=15)
	Scan/10/Base2-12       202.3n ± 2%   193.7n ± 1%  -4.25% (p=0.000 n=15)
	Scan/100/Base2-12      1.512µ ± 3%   1.447µ ± 1%  -4.30% (p=0.000 n=15)
	Scan/1000/Base2-12     15.06µ ± 2%   14.33µ ± 0%  -4.83% (p=0.000 n=15)
	Scan/10000/Base2-12    188.0µ ± 5%   177.3µ ± 1%  -5.65% (p=0.000 n=15)
	Scan/100000/Base2-12   5.814m ± 3%   5.382m ± 1%  -7.43% (p=0.000 n=15)
	Scan/10/Base8-12       78.57n ± 2%   75.02n ± 1%  -4.52% (p=0.000 n=15)
	Scan/100/Base8-12      548.2n ± 2%   526.8n ± 1%  -3.90% (p=0.000 n=15)
	Scan/1000/Base8-12     5.674µ ± 2%   5.421µ ± 0%  -4.46% (p=0.000 n=15)
	Scan/10000/Base8-12    94.42µ ± 1%   88.61µ ± 1%  -6.15% (p=0.000 n=15)
	Scan/100000/Base8-12   4.906m ± 2%   4.498m ± 3%  -8.31% (p=0.000 n=15)
	Scan/10/Base10-12      73.42n ± 1%   69.56n ± 0%  -5.26% (p=0.000 n=15)
	Scan/100/Base10-12     511.9n ± 1%   488.2n ± 0%  -4.63% (p=0.000 n=15)
	Scan/1000/Base10-12    5.254µ ± 2%   5.009µ ± 0%  -4.66% (p=0.000 n=15)
	Scan/10000/Base10-12   90.22µ ± 2%   84.52µ ± 0%  -6.32% (p=0.000 n=15)
	Scan/100000/Base10-12  4.842m ± 3%   4.471m ± 3%  -7.65% (p=0.000 n=15)
	Scan/10/Base16-12      62.28n ± 1%   58.70n ± 1%  -5.75% (p=0.000 n=15)
	Scan/100/Base16-12     398.6n ± 0%   377.9n ± 1%  -5.19% (p=0.000 n=15)
	Scan/1000/Base16-12    4.108µ ± 1%   3.782µ ± 0%  -7.94% (p=0.000 n=15)
	Scan/10000/Base16-12   83.78µ ± 2%   80.51µ ± 1%  -3.90% (p=0.000 n=15)
	Scan/100000/Base16-12  5.080m ± 3%   4.698m ± 3%  -7.53% (p=0.000 n=15)
	geomean                12.41µ        11.74µ       -5.36%
	Change-Id: If3ce290ecc7f38672f11b42fd811afb53dee665d
	Reviewed-on: https://go-review.googlesource.com/c/go/+/650639
	Reviewed-by: Alan Donovan <adonovan@google.com>
	LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
	Auto-Submit: Russ Cox <rsc@golang.org>
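To see why a power of ten is an awkward benchmark input, the following sketch prints the longest run of zero digits in a power of ten and a power of nine, rendered in each tested base. It uses only the public math/big API; `pow` and `longestZeroRun` are hypothetical helpers for this note, and the exponent 100 is an arbitrary choice.

```go
package main

import (
	"fmt"
	"math/big"
)

// pow returns base**n as a *big.Int.
func pow(base, n int64) *big.Int {
	return new(big.Int).Exp(big.NewInt(base), big.NewInt(n), nil)
}

// longestZeroRun reports the length of the longest run of '0' digits in s.
func longestZeroRun(s string) int {
	best, cur := 0, 0
	for _, c := range s {
		if c == '0' {
			cur++
			if cur > best {
				best = cur
			}
		} else {
			cur = 0
		}
	}
	return best
}

func main() {
	const n = 100
	for _, base := range []int{2, 8, 10, 16} {
		ten := pow(10, n).Text(base)
		nine := pow(9, n).Text(base)
		fmt.Printf("base %2d: longest zero run in 10**%d is %d, in 9**%d is %d\n",
			base, n, longestZeroRun(ten), n, longestZeroRun(nine))
	}
}
```

In base 10 the power of ten is a 1 followed by nothing but zeros, which is the input shape the commit message warns could trigger a special case.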
2025-02-27	math/big: replace nat pool with Word stack	Russ Cox
	In the early days of math/big, algorithms that needed more space grew the result larger than it needed to be and then used the high words as extra space. This made results their own temporary space caches, at the cost that saving a result in a data structure might hold significantly more memory than necessary. Specifically, new(big.Int).Mul(x, y) returned a big.Int with a backing slice 3X as big as it strictly needed to be. If you are storing many multiplication results, or even a single large result, the 3X overhead can add up.
	This approach to storage for temporaries also requires being able to analyze the algorithms to predict the exact amount they need, which can be difficult.
	For both these reasons, the implementation of recursive long division, which came later, introduced a “nat pool” where temporaries could be stored and reused, or reclaimed by the GC when no longer used. This avoids the storage and bookkeeping overheads but introduces a per-temporary sync.Pool overhead. divRecursiveStep takes an array of cached temporaries to remove some of that overhead.
	The nat pool was better but is still not quite right.
	This CL introduces something even better than the nat pool (still probably not quite right, but the best I can see for now): a sync.Pool holding stacks for allocating temporaries. Now an operation can get one stack out of the pool and then allocate as many temporaries as it needs during the operation, eventually returning the stack back to the pool. The sync.Pool operations are now per-exported-operation (like big.Int.Mul), not per-temporary.
	This CL converts both the pre-allocation in nat.mul and the uses of the nat pool to use stack pools instead. This simplifies some code and sets us up better for more complex algorithms (such as Toom-Cook or FFT-based multiplication) that need more temporaries. It is also a little bit faster.
goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) CPU @ 3.10GHz │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-16 23.68n ± 0% 22.21n ± 0% -6.21% (p=0.000 n=15) Div/40/20-16 23.68n ± 0% 22.21n ± 0% -6.21% (p=0.000 n=15) Div/100/50-16 56.65n ± 0% 55.53n ± 0% -1.98% (p=0.000 n=15) Div/200/100-16 194.6n ± 1% 172.8n ± 0% -11.20% (p=0.000 n=15) Div/400/200-16 232.1n ± 0% 206.7n ± 0% -10.94% (p=0.000 n=15) Div/1000/500-16 405.3n ± 1% 383.8n ± 0% -5.30% (p=0.000 n=15) Div/2000/1000-16 810.4n ± 1% 795.2n ± 0% -1.88% (p=0.000 n=15) Div/20000/10000-16 25.88µ ± 0% 25.39µ ± 0% -1.89% (p=0.000 n=15) Div/200000/100000-16 931.5µ ± 0% 924.3µ ± 0% -0.77% (p=0.000 n=15) Div/2000000/1000000-16 37.77m ± 0% 37.75m ± 0% ~ (p=0.098 n=15) Div/20000000/10000000-16 1.367 ± 0% 1.377 ± 0% +0.72% (p=0.003 n=15) NatMul/10-16 168.5n ± 3% 164.0n ± 4% ~ (p=0.751 n=15) NatMul/100-16 6.086µ ± 3% 5.380µ ± 3% -11.60% (p=0.000 n=15) NatMul/1000-16 238.1µ ± 3% 228.3µ ± 1% -4.12% (p=0.000 n=15) NatMul/10000-16 8.721m ± 2% 8.518m ± 1% -2.33% (p=0.000 n=15) NatMul/100000-16 369.6m ± 0% 371.1m ± 0% +0.42% (p=0.000 n=15) geomean 19.57µ 18.74µ -4.21% │ old │ new │ │ B/op │ B/op vs base │ NatMul/10-16 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-16 4.750Ki ± 0% 1.751Ki ± 0% -63.14% (p=0.000 n=15) NatMul/1000-16 48.16Ki ± 0% 16.02Ki ± 0% -66.73% (p=0.000 n=15) NatMul/10000-16 482.9Ki ± 1% 165.4Ki ± 3% -65.75% (p=0.000 n=15) NatMul/100000-16 5.747Mi ± 7% 4.197Mi ± 0% -26.97% (p=0.000 n=15) geomean 41.42Ki 20.63Ki -50.18% ¹ all samples are equal │ old │ new │ │ allocs/op │ allocs/op vs base │ NatMul/10-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/1000-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/10000-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100000-16 7.000 ± 14% 7.000 ± 14% ~ (p=0.668 n=15) geomean 1.476 1.476 +0.00% ¹ all samples are equal goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) Platinum 
8481C CPU @ 2.70GHz │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-88 15.84n ± 1% 13.12n ± 0% -17.17% (p=0.000 n=15) Div/40/20-88 15.88n ± 1% 13.12n ± 0% -17.38% (p=0.000 n=15) Div/100/50-88 26.42n ± 0% 25.47n ± 0% -3.60% (p=0.000 n=15) Div/200/100-88 132.4n ± 0% 114.9n ± 0% -13.22% (p=0.000 n=15) Div/400/200-88 150.1n ± 0% 135.6n ± 0% -9.66% (p=0.000 n=15) Div/1000/500-88 275.5n ± 0% 264.1n ± 0% -4.14% (p=0.000 n=15) Div/2000/1000-88 586.5n ± 0% 581.1n ± 0% -0.92% (p=0.000 n=15) Div/20000/10000-88 25.87µ ± 0% 25.72µ ± 0% -0.59% (p=0.000 n=15) Div/200000/100000-88 772.2µ ± 0% 779.0µ ± 0% +0.88% (p=0.000 n=15) Div/2000000/1000000-88 33.36m ± 0% 33.63m ± 0% +0.80% (p=0.000 n=15) Div/20000000/10000000-88 1.307 ± 0% 1.320 ± 0% +1.03% (p=0.000 n=15) NatMul/10-88 140.4n ± 0% 148.8n ± 4% +5.98% (p=0.000 n=15) NatMul/100-88 4.663µ ± 1% 4.388µ ± 1% -5.90% (p=0.000 n=15) NatMul/1000-88 207.7µ ± 0% 205.8µ ± 0% -0.89% (p=0.000 n=15) NatMul/10000-88 8.456m ± 0% 8.468m ± 0% +0.14% (p=0.021 n=15) NatMul/100000-88 295.1m ± 0% 297.9m ± 0% +0.94% (p=0.000 n=15) geomean 14.96µ 14.33µ -4.23% │ old │ new │ │ B/op │ B/op vs base │ NatMul/10-88 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-88 4.750Ki ± 0% 1.758Ki ± 0% -62.99% (p=0.000 n=15) NatMul/1000-88 48.44Ki ± 0% 16.08Ki ± 0% -66.80% (p=0.000 n=15) NatMul/10000-88 489.7Ki ± 1% 166.1Ki ± 3% -66.08% (p=0.000 n=15) NatMul/100000-88 5.546Mi ± 0% 3.819Mi ± 60% -31.15% (p=0.000 n=15) geomean 41.29Ki 20.30Ki -50.85% ¹ all samples are equal │ old │ new │ │ allocs/op │ allocs/op vs base │ NatMul/10-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/1000-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/10000-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100000-88 5.000 ± 20% 6.000 ± 67% ~ (p=0.672 n=15) geomean 1.380 1.431 +3.71% ¹ all samples are equal goos: linux goarch: arm64 pkg: math/big │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-16 15.85n ± 0% 15.23n ± 0% 
-3.91% (p=0.000 n=15) Div/40/20-16 15.88n ± 0% 15.22n ± 0% -4.16% (p=0.000 n=15) Div/100/50-16 29.69n ± 0% 26.39n ± 0% -11.11% (p=0.000 n=15) Div/200/100-16 149.2n ± 0% 123.3n ± 0% -17.36% (p=0.000 n=15) Div/400/200-16 160.3n ± 0% 139.2n ± 0% -13.16% (p=0.000 n=15) Div/1000/500-16 271.0n ± 0% 256.1n ± 0% -5.50% (p=0.000 n=15) Div/2000/1000-16 545.3n ± 0% 527.0n ± 0% -3.36% (p=0.000 n=15) Div/20000/10000-16 22.60µ ± 0% 22.20µ ± 0% -1.77% (p=0.000 n=15) Div/200000/100000-16 889.0µ ± 0% 892.2µ ± 0% +0.35% (p=0.000 n=15) Div/2000000/1000000-16 38.01m ± 0% 38.12m ± 0% +0.30% (p=0.000 n=15) Div/20000000/10000000-16 1.437 ± 0% 1.444 ± 0% +0.50% (p=0.000 n=15) NatMul/10-16 166.4n ± 2% 169.5n ± 1% +1.86% (p=0.000 n=15) NatMul/100-16 5.733µ ± 1% 5.570µ ± 1% -2.84% (p=0.000 n=15) NatMul/1000-16 232.6µ ± 1% 229.8µ ± 0% -1.22% (p=0.000 n=15) NatMul/10000-16 9.039m ± 1% 8.969m ± 0% -0.77% (p=0.000 n=15) NatMul/100000-16 367.0m ± 0% 368.8m ± 0% +0.48% (p=0.000 n=15) geomean 16.15µ 15.50µ -4.01% │ old │ new │ │ B/op │ B/op vs base │ NatMul/10-16 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-16 4.750Ki ± 0% 1.751Ki ± 0% -63.14% (p=0.000 n=15) NatMul/1000-16 48.33Ki ± 0% 16.02Ki ± 0% -66.85% (p=0.000 n=15) NatMul/10000-16 536.5Ki ± 1% 165.7Ki ± 3% -69.12% (p=0.000 n=15) NatMul/100000-16 6.078Mi ± 6% 4.197Mi ± 0% -30.94% (p=0.000 n=15) geomean 42.81Ki 20.64Ki -51.78% ¹ all samples are equal │ old │ new │ │ allocs/op │ allocs/op vs base │ NatMul/10-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/1000-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/10000-16 2.000 ± 50% 1.000 ± 0% -50.00% (p=0.001 n=15) NatMul/100000-16 9.000 ± 11% 8.000 ± 12% -11.11% (p=0.001 n=15) geomean 1.783 1.516 -14.97% ¹ all samples are equal goos: darwin goarch: arm64 pkg: math/big cpu: Apple M3 Pro │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-12 9.850n ± 1% 9.405n ± 1% -4.52% (p=0.000 n=15) Div/40/20-12 9.858n ± 0% 9.403n ± 1% -4.62% 
(p=0.000 n=15) Div/100/50-12 16.40n ± 1% 14.81n ± 0% -9.70% (p=0.000 n=15) Div/200/100-12 88.48n ± 2% 80.88n ± 0% -8.59% (p=0.000 n=15) Div/400/200-12 107.90n ± 1% 99.28n ± 1% -7.99% (p=0.000 n=15) Div/1000/500-12 188.8n ± 1% 178.6n ± 1% -5.40% (p=0.000 n=15) Div/2000/1000-12 399.9n ± 0% 389.1n ± 0% -2.70% (p=0.000 n=15) Div/20000/10000-12 13.94µ ± 2% 13.81µ ± 1% ~ (p=0.574 n=15) Div/200000/100000-12 523.8µ ± 0% 521.7µ ± 0% -0.40% (p=0.000 n=15) Div/2000000/1000000-12 21.46m ± 0% 21.48m ± 0% ~ (p=0.067 n=15) Div/20000000/10000000-12 812.5m ± 0% 812.9m ± 0% ~ (p=0.061 n=15) NatMul/10-12 77.14n ± 0% 78.35n ± 1% +1.57% (p=0.000 n=15) NatMul/100-12 2.999µ ± 0% 2.871µ ± 1% -4.27% (p=0.000 n=15) NatMul/1000-12 126.2µ ± 0% 126.8µ ± 0% +0.51% (p=0.011 n=15) NatMul/10000-12 5.099m ± 0% 5.125m ± 0% +0.51% (p=0.000 n=15) NatMul/100000-12 206.7m ± 0% 208.4m ± 0% +0.80% (p=0.000 n=15) geomean 9.512µ 9.236µ -2.91% │ old │ new │ │ B/op │ B/op vs base │ NatMul/10-12 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-12 4.750Ki ± 0% 1.750Ki ± 0% -63.16% (p=0.000 n=15) NatMul/1000-12 48.13Ki ± 0% 16.01Ki ± 0% -66.73% (p=0.000 n=15) NatMul/10000-12 483.5Ki ± 1% 163.2Ki ± 2% -66.24% (p=0.000 n=15) NatMul/100000-12 5.480Mi ± 4% 1.532Mi ± 104% -72.05% (p=0.000 n=15) geomean 41.03Ki 16.82Ki -59.01% ¹ all samples are equal │ old │ new │ │ allocs/op │ allocs/op vs base │ NatMul/10-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/1000-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/10000-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100000-12 5.000 ± 0% 1.000 ± 400% -80.00% (p=0.007 n=15) geomean 1.380 1.000 -27.52% ¹ all samples are equal Change-Id: I7efa6fe37971ed26ae120a32250fcb47ece0a011 Reviewed-on: https://go-review.googlesource.com/c/go/+/650638 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> 
Reviewed-by: Alan Donovan <adonovan@google.com>
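The idea of a sync.Pool that hands out stacks, rather than individual temporaries, can be sketched roughly as follows. All names here (`wordStack`, `getStack`, `alloc`, `mark`, `release`, `free`) are hypothetical and chosen for this note; the sketch shows the general shape of a pooled bump allocator under those assumptions and is not the math/big code.

```go
package main

import (
	"fmt"
	"sync"
)

// wordStack is a bump allocator for temporary word slices.
// Hypothetical type for illustration only.
type wordStack struct {
	buf []uint // backing storage shared by all live temporaries
	top int    // index of the first unused word in buf
}

var stackPool = sync.Pool{New: func() any { return new(wordStack) }}

// getStack takes one stack from the pool: one pool operation per
// exported operation, instead of one per temporary.
func getStack() *wordStack { return stackPool.Get().(*wordStack) }

// free resets the stack and returns it to the pool for reuse.
func (s *wordStack) free() {
	s.top = 0
	stackPool.Put(s)
}

// alloc returns a zeroed temporary of n words carved from the stack.
func (s *wordStack) alloc(n int) []uint {
	if s.top+n > len(s.buf) {
		// Switch to a larger buffer. Temporaries handed out earlier keep
		// pointing at the old buffer, which stays valid while they are used.
		s.buf = make([]uint, 2*(s.top+n))
	}
	t := s.buf[s.top : s.top+n : s.top+n]
	s.top += n
	clear(t)
	return t
}

// mark records the current stack height; release pops back to it,
// discarding every temporary allocated since the mark.
func (s *wordStack) mark() int     { return s.top }
func (s *wordStack) release(m int) { s.top = m }

func main() {
	stk := getStack()
	defer stk.free()

	a := stk.alloc(4) // lives for the whole operation
	m := stk.mark()
	b := stk.alloc(8) // scratch space for an inner step
	b[0] = 1
	stk.release(m)    // inner scratch is discarded, a is still live
	c := stk.alloc(8) // reuses the words that b occupied

	fmt.Println(len(a), len(c), c[0], &b[0] == &c[0])
}
```

The design choice this illustrates is that pool traffic is paid once per operation, while the temporaries inside the operation cost only a bounds check and an index bump.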
2025-02-27	math/big: report allocs in BenchmarkNatMul, BenchmarkNatSqr	Russ Cox
	Change-Id: I112f55c0e3ee3b75e615a06b27552de164565c04 Reviewed-on: https://go-review.googlesource.com/c/go/+/650637 Reviewed-by: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Russ Cox <rsc@golang.org>
2025-02-27	math/big: clean up GCD a little	Russ Cox
	The GCD code was setting one *Int to the value of another by smashing one struct on top of the other, instead of using Set. That was safe in this one case, but it's not idiomatic in math/big nor safe in general, so rewrite the code not to do that. (In one case, by swapping variables around; in another, by calling Set.) The added Set call does slow down GCDs by a small amount, since the answer has to be copied out. To compensate for that, optimize a bit: remove the s, t temporaries entirely and handle vector x word multiplication directly. The net result is that almost all GCDs are faster, except for small ones, which are a few nanoseconds slower. goos: darwin goarch: arm64 pkg: math/big cpu: Apple M3 Pro │ bench.before │ bench.after │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-12 23.80n ± 1% 31.71n ± 1% +33.24% (p=0.000 n=10) GCD10x10/WithXY-12 100.40n ± 0% 92.14n ± 1% -8.22% (p=0.000 n=10) GCD10x100/WithoutXY-12 63.70n ± 0% 70.73n ± 0% +11.05% (p=0.000 n=10) GCD10x100/WithXY-12 278.6n ± 0% 233.1n ± 1% -16.35% (p=0.000 n=10) GCD10x1000/WithoutXY-12 153.4n ± 0% 162.2n ± 1% +5.74% (p=0.000 n=10) GCD10x1000/WithXY-12 456.0n ± 0% 411.8n ± 1% -9.69% (p=0.000 n=10) GCD10x10000/WithoutXY-12 1.002µ ± 1% 1.036µ ± 0% +3.39% (p=0.000 n=10) GCD10x10000/WithXY-12 2.330µ ± 1% 2.210µ ± 0% -5.13% (p=0.000 n=10) GCD10x100000/WithoutXY-12 8.894µ ± 0% 8.889µ ± 1% ~ (p=0.754 n=10) GCD10x100000/WithXY-12 20.84µ ± 0% 20.24µ ± 0% -2.84% (p=0.000 n=10) GCD100x100/WithoutXY-12 373.3n ± 3% 314.4n ± 0% -15.76% (p=0.000 n=10) GCD100x100/WithXY-12 662.5n ± 0% 572.4n ± 1% -13.59% (p=0.000 n=10) GCD100x1000/WithoutXY-12 641.8n ± 0% 598.1n ± 1% -6.81% (p=0.000 n=10) GCD100x1000/WithXY-12 1.123µ ± 0% 1.019µ ± 1% -9.26% (p=0.000 n=10) GCD100x10000/WithoutXY-12 2.870µ ± 0% 2.831µ ± 0% -1.38% (p=0.000 n=10) GCD100x10000/WithXY-12 4.930µ ± 1% 4.675µ ± 0% -5.16% (p=0.000 n=10) GCD100x100000/WithoutXY-12 24.08µ ± 0% 23.97µ ± 0% -0.48% (p=0.007 n=10) GCD100x100000/WithXY-12 43.66µ ± 0% 42.52µ ± 0% 
-2.61% (p=0.001 n=10) GCD1000x1000/WithoutXY-12 3.999µ ± 0% 3.569µ ± 1% -10.75% (p=0.000 n=10) GCD1000x1000/WithXY-12 6.397µ ± 0% 5.534µ ± 0% -13.49% (p=0.000 n=10) GCD1000x10000/WithoutXY-12 6.875µ ± 0% 6.450µ ± 0% -6.18% (p=0.000 n=10) GCD1000x10000/WithXY-12 20.75µ ± 1% 19.17µ ± 1% -7.64% (p=0.000 n=10) GCD1000x100000/WithoutXY-12 36.38µ ± 0% 35.60µ ± 1% -2.13% (p=0.000 n=10) GCD1000x100000/WithXY-12 172.1µ ± 0% 174.4µ ± 3% ~ (p=0.052 n=10) GCD10000x10000/WithoutXY-12 79.89µ ± 1% 75.16µ ± 2% -5.92% (p=0.000 n=10) GCD10000x10000/WithXY-12 160.1µ ± 0% 150.0µ ± 0% -6.33% (p=0.000 n=10) GCD10000x100000/WithoutXY-12 213.2µ ± 1% 209.0µ ± 1% -1.98% (p=0.000 n=10) GCD10000x100000/WithXY-12 1.399m ± 0% 1.342m ± 3% -4.08% (p=0.002 n=10) GCD100000x100000/WithoutXY-12 5.463m ± 1% 5.504m ± 2% ~ (p=0.190 n=10) GCD100000x100000/WithXY-12 11.36m ± 0% 11.46m ± 1% +0.86% (p=0.000 n=10) geomean 6.953µ 6.695µ -3.71% goos: linux goarch: amd64 pkg: math/big cpu: AMD Ryzen 9 7950X 16-Core Processor │ bench.before │ bench.after │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-32 39.66n ± 4% 44.34n ± 4% +11.77% (p=0.000 n=10) GCD10x10/WithXY-32 156.7n ± 12% 130.8n ± 2% -16.53% (p=0.000 n=10) GCD10x100/WithoutXY-32 115.8n ± 5% 120.2n ± 2% +3.89% (p=0.000 n=10) GCD10x100/WithXY-32 465.3n ± 3% 368.1n ± 2% -20.91% (p=0.000 n=10) GCD10x1000/WithoutXY-32 201.1n ± 1% 210.8n ± 2% +4.82% (p=0.000 n=10) GCD10x1000/WithXY-32 652.9n ± 4% 605.0n ± 1% -7.32% (p=0.002 n=10) GCD10x10000/WithoutXY-32 1.046µ ± 2% 1.143µ ± 1% +9.33% (p=0.000 n=10) GCD10x10000/WithXY-32 3.360µ ± 1% 3.258µ ± 1% -3.04% (p=0.000 n=10) GCD10x100000/WithoutXY-32 9.391µ ± 3% 9.997µ ± 1% +6.46% (p=0.000 n=10) GCD10x100000/WithXY-32 27.92µ ± 1% 28.21µ ± 0% +1.04% (p=0.043 n=10) GCD100x100/WithoutXY-32 443.7n ± 5% 320.0n ± 2% -27.88% (p=0.000 n=10) GCD100x100/WithXY-32 789.9n ± 2% 690.4n ± 1% -12.60% (p=0.000 n=10) GCD100x1000/WithoutXY-32 718.4n ± 3% 600.0n ± 1% -16.48% (p=0.000 n=10) GCD100x1000/WithXY-32 1.388µ ± 4% 1.175µ ± 1% 
-15.28% (p=0.000 n=10) GCD100x10000/WithoutXY-32 2.750µ ± 1% 2.668µ ± 1% -2.96% (p=0.000 n=10) GCD100x10000/WithXY-32 6.016µ ± 1% 5.590µ ± 1% -7.09% (p=0.000 n=10) GCD100x100000/WithoutXY-32 21.40µ ± 1% 22.30µ ± 1% +4.21% (p=0.000 n=10) GCD100x100000/WithXY-32 47.02µ ± 4% 48.80µ ± 0% +3.78% (p=0.015 n=10) GCD1000x1000/WithoutXY-32 3.417µ ± 4% 3.020µ ± 1% -11.65% (p=0.000 n=10) GCD1000x1000/WithXY-32 5.752µ ± 0% 5.418µ ± 2% -5.81% (p=0.000 n=10) GCD1000x10000/WithoutXY-32 6.150µ ± 0% 6.246µ ± 1% +1.55% (p=0.000 n=10) GCD1000x10000/WithXY-32 24.68µ ± 3% 25.07µ ± 1% ~ (p=0.051 n=10) GCD1000x100000/WithoutXY-32 34.60µ ± 2% 36.85µ ± 1% +6.51% (p=0.000 n=10) GCD1000x100000/WithXY-32 209.5µ ± 4% 227.4µ ± 0% +8.56% (p=0.000 n=10) GCD10000x10000/WithoutXY-32 90.69µ ± 0% 88.48µ ± 0% -2.44% (p=0.000 n=10) GCD10000x10000/WithXY-32 197.1µ ± 0% 200.5µ ± 0% +1.73% (p=0.000 n=10) GCD10000x100000/WithoutXY-32 239.1µ ± 0% 242.5µ ± 0% +1.42% (p=0.000 n=10) GCD10000x100000/WithXY-32 1.963m ± 3% 2.028m ± 0% +3.28% (p=0.000 n=10) GCD100000x100000/WithoutXY-32 7.466m ± 0% 7.412m ± 0% -0.71% (p=0.000 n=10) GCD100000x100000/WithXY-32 16.10m ± 2% 16.47m ± 0% +2.25% (p=0.000 n=10) geomean 8.388µ 8.127µ -3.12% Change-Id: I161dc409bad11bcc553bc8116449905ae5b06742 Reviewed-on: https://go-review.googlesource.com/c/go/+/650636 Reviewed-by: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com> Auto-Submit: Russ Cox <rsc@golang.org>
2025-02-25	all: surround -test.run arguments with ^$	qmuntal
	If the -test.run value is not surrounded by ^$ then any test that matches the -test.run value will be run. This is normally not the desired behavior, as it can lead to unexpected tests being run. Change-Id: I3447aaebad5156bbef7f263cdb9f6b8c32331324 Reviewed-on: https://go-review.googlesource.com/c/go/+/651956 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
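The effect of the missing anchors can be reproduced with the regexp package alone. This is a simplified model, assuming top-level test names only (the real flag also handles subtest patterns separated by slashes); `selected` and the test names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// selected returns the names matched by pattern using an unanchored
// regular expression match, which is how an unanchored -test.run value
// ends up selecting more tests than intended.
// Simplified model: top-level names only.
func selected(pattern string, names []string) []string {
	re := regexp.MustCompile(pattern)
	var out []string
	for _, n := range names {
		if re.MatchString(n) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	names := []string{"TestScan", "TestScanBase", "TestScanPi"}
	fmt.Println(selected("TestScan", names))   // unanchored
	fmt.Println(selected("^TestScan$", names)) // anchored
}
```

With the anchors in place only the exact name matches, which is the behavior the commit restores across the tree.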
2025-02-03	math/big: use built-in max function	Eng Zer Jun
	Change-Id: I65721039dab311762e55c6a60dd75b82f6b4622f Reviewed-on: https://go-review.googlesource.com/c/go/+/642335 Reviewed-by: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Griesemer <gri@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com>
2024-12-04	math/bits: update reference to debruijn paper	Sean Liao
	The old link no longer works. Fixes #70684 Change-Id: I8711ef7d5721bf20ef83f5192dd0d1f73dda6ce1 Reviewed-on: https://go-review.googlesource.com/c/go/+/633775 Auto-Submit: Ian Lance Taylor <iant@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-12-04	math/rand/v2: replace <= 0 with == 0 for Uint function docs	Jorropo
	This harmonizes the docs with the (Rand).Uint functions. It also makes the behavior clearer: I wasn't sure whether it would try to interpret the uint as a signed number somehow. It does not pull any surprises, so make that clear. Change-Id: I5a87a0a5563dbabfc31e536e40ee69b11f5cb6cf Reviewed-on: https://go-review.googlesource.com/c/go/+/633535 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Commit-Queue: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Robert Griesemer <gri@google.com>
2024-11-20	internal/byteorder: use canonical Go casing in names	Russ Cox
	If Be and Le stand for big-endian and little-endian, then they should be BE and LE. Change-Id: I723e3962b8918da84791783d3c547638f1c9e8a9 Reviewed-on: https://go-review.googlesource.com/c/go/+/627376 Reviewed-by: Robert Griesemer <gri@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-10-31	math/big: properly linkify a reference	Adam
	Change-Id: Ie7649060db25f1573eeaadd534a600bb24d30572 GitHub-Last-Rev: c617848a4ec9f5c21820982efc95e0ec4ca2510c GitHub-Pull-Request: golang/go#70134 Reviewed-on: https://go-review.googlesource.com/c/go/+/623757 Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Robert Griesemer <gri@google.com>
2024-10-12	math: implement arch{Floor, Ceil, Trunc} in hardware on loong64	Xiaolin Zhao
	benchmark:
	goos: linux
	goarch: loong64
	pkg: math
	cpu: Loongson-3A6000 @ 2500.00MHz
	        │  bench.old   │              bench.new              │
	        │    sec/op    │   sec/op     vs base                │
	Ceil      10.810n ± 0%   2.578n ± 0%  -76.15% (p=0.000 n=20)
	Floor     10.810n ± 0%   2.531n ± 0%  -76.59% (p=0.000 n=20)
	Trunc      9.606n ± 0%   2.530n ± 0%  -73.67% (p=0.000 n=20)
	geomean   10.39n         2.546n       -75.50%
	goos: linux
	goarch: loong64
	pkg: math
	cpu: Loongson-3A5000 @ 2500.00MHz
	        │  bench.old   │              bench.new              │
	        │    sec/op    │   sec/op     vs base                │
	Ceil      13.220n ± 0%   7.703n ± 8%  -41.73% (p=0.000 n=20)
	Floor     12.410n ± 0%   7.248n ± 2%  -41.59% (p=0.000 n=20)
	Trunc     11.210n ± 0%   7.757n ± 4%  -30.80% (p=0.000 n=20)
	geomean   12.25n         7.566n       -38.25%
	Change-Id: I3af51e9852e9cf5f965fed895d68945a2e8675f4
	Reviewed-on: https://go-review.googlesource.com/c/go/+/612615
	Reviewed-by: Michael Knyszek <mknyszek@google.com>
	Reviewed-by: Cherry Mui <cherryyz@google.com>
	Reviewed-by: abner chenc <chenguoqi@loongson.cn>
	LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-09-16	math/big: add clarifying (internal) comment	Robert Griesemer
	Follow-up on CL 467555. Change-Id: I1815b5def656ae4b86c31385ad0737f0465fa2d6 Reviewed-on: https://go-review.googlesource.com/c/go/+/613535 Auto-Submit: Robert Griesemer <gri@google.com> TryBot-Bypass: Robert Griesemer <gri@google.com> Reviewed-by: Robert Griesemer <gri@google.com> Reviewed-by: Tim King <taking@google.com>
2024-09-16	math/big: simplify divBasic ujn assignment	Joel Sing
	Rather than conditionally assigning ujn, initialise ujn above the loop to invent the leading 0 for u, then unconditionally load ujn at the bottom of the loop. This code operates on the basis that n >= 2, hence j+n-1 is always greater than zero. Change-Id: I1272ef30c787ed8707ae8421af2adcccc776d389 Reviewed-on: https://go-review.googlesource.com/c/go/+/467555 Auto-Submit: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Commit-Queue: Robert Griesemer <gri@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Robert Griesemer <gri@google.com>
2024-09-11	math: add round assembly implementations on riscv64	Meng Zhuo
	This CL reapplies CL 504737 and adds an integer precision limit check, since CL 504737 only checked whether the floating-point number was ±Inf or NaN. This CL is also ~7% faster than CL 504737.
	Updates #68322
	goos: linux
	goarch: riscv64
	pkg: math
	            │ math.old.bench │           math.new.bench            │
	            │     sec/op     │   sec/op     vs base                │
	Ceil           54.09n ± 0%     18.72n ± 0%  -65.39% (p=0.000 n=10)
	Floor          40.72n ± 0%     18.72n ± 0%  -54.03% (p=0.000 n=10)
	Round          20.73n ± 0%     20.73n ± 0%        ~ (p=1.000 n=10)
	RoundToEven    24.07n ± 0%     24.07n ± 0%        ~ (p=1.000 n=10)
	Trunc          38.72n ± 0%     18.72n ± 0%  -51.65% (p=0.000 n=10)
	geomean        33.56n          20.09n       -40.13%
	Change-Id: I06cfe2cb9e2535cd705d40b6650a7e71fedd906c
	Reviewed-on: https://go-review.googlesource.com/c/go/+/600075
	Reviewed-by: Keith Randall <khr@golang.org>
	Reviewed-by: Joel Sing <joel@sing.id.au>
	Reviewed-by: Keith Randall <khr@google.com>
	Reviewed-by: Michael Knyszek <mknyszek@google.com>
	LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-09-09	all: remove unnecessary symbols and add missing symbols	cuishuang
	Change-Id: I535a7aaaf3f9e8a9c0e0c04f8f745ad7445a32f7 Reviewed-on: https://go-review.googlesource.com/c/go/+/611678 Run-TryBot: shuang cui <imcusg@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
2024-09-04	math: add large exact float rounding tests	Meng Zhuo
	This CL adds Trunc, Ceil, and Floor tests for large exact floats. Change-Id: Ib7ffec1d2d50d2ac955398a3dd0fd06d494fcf4f Reviewed-on: https://go-review.googlesource.com/c/go/+/601095 Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2024-09-03	all: omit unnecessary 0 in slice expression	nlwkobe30
	All changes are related to the code, except for the comments in src/regexp/syntax/parse.go and src/slices/slices.go. Change-Id: I73c5d3c54099749b62210aa7f3182c5eb84bb6a6 GitHub-Last-Rev: 794aa9b0539811d00e1cd42be1e8d9fe9afe0281 GitHub-Pull-Request: golang/go#69170 Reviewed-on: https://go-review.googlesource.com/c/go/+/609678 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com>
2024-09-03	math,os,os/*: use testenv.Executable	Kir Kolyshkin
	As some callers don't have a testing context, modify testenv.Executable to accept nil (similar to how testenv.GOROOT works). Change-Id: I39112a7869933785a26b5cb6520055b3cc42b847 Reviewed-on: https://go-review.googlesource.com/c/go/+/609835 Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2024-08-23	math/big: implement addMulVVW in riscv64 assembly	Joel Sing
	This provides an assembly implementation of addMulVVW for riscv64, processing up to four words per loop, resulting in a significant performance gain. On a StarFive VisionFive 2: │ addmulvvw.1 │ addmulvvw.2 │ │ sec/op │ sec/op vs base │ AddMulVVW/1-4 65.49n ± 0% 50.79n ± 0% -22.44% (p=0.000 n=10) AddMulVVW/2-4 82.81n ± 0% 66.83n ± 0% -19.29% (p=0.000 n=10) AddMulVVW/3-4 100.20n ± 0% 82.87n ± 0% -17.30% (p=0.000 n=10) AddMulVVW/4-4 117.50n ± 0% 84.20n ± 0% -28.34% (p=0.000 n=10) AddMulVVW/5-4 134.9n ± 0% 100.3n ± 0% -25.69% (p=0.000 n=10) AddMulVVW/10-4 221.7n ± 0% 164.4n ± 0% -25.85% (p=0.000 n=10) AddMulVVW/100-4 1.794µ ± 0% 1.250µ ± 0% -30.32% (p=0.000 n=10) AddMulVVW/1000-4 17.42µ ± 0% 12.08µ ± 0% -30.68% (p=0.000 n=10) AddMulVVW/10000-4 254.9µ ± 0% 214.8µ ± 0% -15.75% (p=0.000 n=10) AddMulVVW/100000-4 2.569m ± 0% 2.178m ± 0% -15.20% (p=0.000 n=10) geomean 1.443µ 1.107µ -23.29% │ addmulvvw.1 │ addmulvvw.2 │ │ B/s │ B/s vs base │ AddMulVVW/1-4 932.0Mi ± 0% 1201.6Mi ± 0% +28.93% (p=0.000 n=10) AddMulVVW/2-4 1.440Gi ± 0% 1.784Gi ± 0% +23.90% (p=0.000 n=10) AddMulVVW/3-4 1.785Gi ± 0% 2.158Gi ± 0% +20.87% (p=0.000 n=10) AddMulVVW/4-4 2.029Gi ± 0% 2.832Gi ± 0% +39.59% (p=0.000 n=10) AddMulVVW/5-4 2.209Gi ± 0% 2.973Gi ± 0% +34.55% (p=0.000 n=10) AddMulVVW/10-4 2.689Gi ± 0% 3.626Gi ± 0% +34.86% (p=0.000 n=10) AddMulVVW/100-4 3.323Gi ± 0% 4.770Gi ± 0% +43.54% (p=0.000 n=10) AddMulVVW/1000-4 3.421Gi ± 0% 4.936Gi ± 0% +44.27% (p=0.000 n=10) AddMulVVW/10000-4 2.338Gi ± 0% 2.776Gi ± 0% +18.69% (p=0.000 n=10) AddMulVVW/100000-4 2.320Gi ± 0% 2.736Gi ± 0% +17.93% (p=0.000 n=10) geomean 2.109Gi 2.749Gi +30.36% Change-Id: I6c7ee48233c53ff9b6a5a9002675886cd9bff5af Reviewed-on: https://go-review.googlesource.com/c/go/+/595400 Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> LUCI-TryBot-Result: Go LUCI 
<golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
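For reference, the operation being accelerated can be written in portable Go. This is a sketch under stated assumptions: words are fixed at 64 bits, len(z) is assumed to be at least len(x), and the loop handles one word per iteration rather than the up-to-four-word unrolling described for the assembly. The function reuses the name addMulVVW for readability but is not the package's code.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addMulVVW computes z += x*y word by word and returns the final carry.
// Portable sketch with 64-bit words; assumes len(z) >= len(x).
func addMulVVW(z, x []uint64, y uint64) (c uint64) {
	for i := range x {
		hi, lo := bits.Mul64(x[i], y) // 128-bit product of one word pair
		lo, cc := bits.Add64(lo, z[i], 0)
		hi += cc // cannot overflow: the full sum fits in 128 bits
		lo, cc = bits.Add64(lo, c, 0)
		hi += cc
		z[i], c = lo, hi
	}
	return c
}

func main() {
	z := []uint64{1, 0}
	x := []uint64{2, 5}
	c := addMulVVW(z, x, 3)
	fmt.Println(z, c)
}
```

The carry chain from one word to the next is what makes this loop hard to speed up in Go and a natural target for hand-written assembly.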
2024-08-23	math/big: implement mulAddVWW in riscv64 assembly	Joel Sing
	This provides an assembly implementation of mulAddVWW for riscv64, processing up to four words per loop, resulting in a significant performance gain. On a StarFive VisionFive 2:
	                    │ muladdvww.1  │           muladdvww.2            │
	                    │    sec/op    │    sec/op     vs base            │
	MulAddVWW/1-4          68.18n ± 0%    65.49n ± 0%   -3.95% (p=0.000 n=10)
	MulAddVWW/2-4          82.81n ± 0%    78.85n ± 0%   -4.78% (p=0.000 n=10)
	MulAddVWW/3-4          97.49n ± 0%    72.18n ± 0%  -25.96% (p=0.000 n=10)
	MulAddVWW/4-4         112.20n ± 0%    85.54n ± 0%  -23.76% (p=0.000 n=10)
	MulAddVWW/5-4         126.90n ± 0%    98.90n ± 0%  -22.06% (p=0.000 n=10)
	MulAddVWW/10-4         200.3n ± 0%    144.3n ± 0%  -27.96% (p=0.000 n=10)
	MulAddVWW/100-4       1532.0n ± 0%    860.0n ± 0%  -43.86% (p=0.000 n=10)
	MulAddVWW/1000-4      14.757µ ± 0%    8.076µ ± 0%  -45.27% (p=0.000 n=10)
	MulAddVWW/10000-4      204.0µ ± 0%    137.1µ ± 0%  -32.77% (p=0.000 n=10)
	MulAddVWW/100000-4     2.066m ± 0%    1.382m ± 0%  -33.12% (p=0.000 n=10)
	geomean                1.311µ         950.0n       -27.51%
	                    │ muladdvww.1  │           muladdvww.2            │
	                    │     B/s      │     B/s       vs base            │
	MulAddVWW/1-4         895.1Mi ± 0%   932.0Mi ± 0%   +4.11% (p=0.000 n=10)
	MulAddVWW/2-4         1.440Gi ± 0%   1.512Gi ± 0%   +5.02% (p=0.000 n=10)
	MulAddVWW/3-4         1.834Gi ± 0%   2.477Gi ± 0%  +35.07% (p=0.000 n=10)
	MulAddVWW/4-4         2.125Gi ± 0%   2.787Gi ± 0%  +31.15% (p=0.000 n=10)
	MulAddVWW/5-4         2.349Gi ± 0%   3.013Gi ± 0%  +28.28% (p=0.000 n=10)
	MulAddVWW/10-4        2.975Gi ± 0%   4.130Gi ± 0%  +38.79% (p=0.000 n=10)
	MulAddVWW/100-4       3.891Gi ± 0%   6.930Gi ± 0%  +78.11% (p=0.000 n=10)
	MulAddVWW/1000-4      4.039Gi ± 0%   7.380Gi ± 0%  +82.72% (p=0.000 n=10)
	MulAddVWW/10000-4     2.922Gi ± 0%   4.346Gi ± 0%  +48.74% (p=0.000 n=10)
	MulAddVWW/100000-4    2.884Gi ± 0%   4.313Gi ± 0%  +49.52% (p=0.000 n=10)
	geomean               2.321Gi        3.202Gi       +37.95%
	Change-Id: If08191607913ce5c7641f34bae8fa5c9dfb44777 Reviewed-on: https://go-review.googlesource.com/c/go/+/595399 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
2024-08-22	math/big: implement subVW in riscv64 assembly	Joel Sing
	This provides an assembly implementation of subVW for riscv64, processing up to four words per loop, resulting in a significant performance gain. On a StarFive VisionFive 2:
	                   │   subvw.1    │             subvw.2              │
	                   │    sec/op    │    sec/op     vs base            │
	SubVW/1-4             57.43n ± 0%    41.45n ± 0%  -27.82% (p=0.000 n=10)
	SubVW/2-4             69.31n ± 0%    48.15n ± 0%  -30.53% (p=0.000 n=10)
	SubVW/3-4             76.12n ± 0%    54.87n ± 0%  -27.92% (p=0.000 n=10)
	SubVW/4-4             85.47n ± 0%    56.14n ± 0%  -34.32% (p=0.000 n=10)
	SubVW/5-4             96.15n ± 0%    62.83n ± 0%  -34.65% (p=0.000 n=10)
	SubVW/10-4           149.60n ± 0%    89.55n ± 0%  -40.14% (p=0.000 n=10)
	SubVW/100-4          1115.0n ± 0%    549.3n ± 0%  -50.74% (p=0.000 n=10)
	SubVW/1000-4         10.732µ ± 0%    5.071µ ± 0%  -52.75% (p=0.000 n=10)
	SubVW/10000-4         153.0µ ± 0%    103.7µ ± 0%  -32.21% (p=0.000 n=10)
	SubVW/100000-4        1.542m ± 0%    1.046m ± 0%  -32.13% (p=0.000 n=10)
	SubVWext/1-4          57.42n ± 0%    41.45n ± 0%  -27.81% (p=0.000 n=10)
	SubVWext/2-4          69.33n ± 0%    48.15n ± 0%  -30.55% (p=0.000 n=10)
	SubVWext/3-4          76.12n ± 0%    54.93n ± 0%  -27.84% (p=0.000 n=10)
	SubVWext/4-4          85.47n ± 0%    56.14n ± 0%  -34.32% (p=0.000 n=10)
	SubVWext/5-4          96.15n ± 0%    62.83n ± 0%  -34.65% (p=0.000 n=10)
	SubVWext/10-4        149.60n ± 0%    89.56n ± 0%  -40.14% (p=0.000 n=10)
	SubVWext/100-4       1115.0n ± 0%    549.3n ± 0%  -50.74% (p=0.000 n=10)
	SubVWext/1000-4      10.732µ ± 0%    5.061µ ± 0%  -52.84% (p=0.000 n=10)
	SubVWext/10000-4      152.5µ ± 0%    103.7µ ± 0%  -32.02% (p=0.000 n=10)
	SubVWext/100000-4     1.533m ± 0%    1.046m ± 0%  -31.75% (p=0.000 n=10)
	geomean               1.005µ         633.7n       -36.92%
	                   │   subvw.1    │             subvw.2               │
	                   │     B/s      │      B/s       vs base            │
	SubVW/1-4            132.9Mi ± 0%    184.1Mi ± 0%   +38.54% (p=0.000 n=10)
	SubVW/2-4            220.1Mi ± 0%    316.9Mi ± 0%   +43.95% (p=0.000 n=10)
	SubVW/3-4            300.7Mi ± 0%    417.1Mi ± 0%   +38.72% (p=0.000 n=10)
	SubVW/4-4            357.1Mi ± 0%    543.6Mi ± 0%   +52.24% (p=0.000 n=10)
	SubVW/5-4            396.7Mi ± 0%    607.2Mi ± 0%   +53.03% (p=0.000 n=10)
	SubVW/10-4           510.1Mi ± 0%    851.9Mi ± 0%   +67.01% (p=0.000 n=10)
	SubVW/100-4          684.2Mi ± 0%   1388.9Mi ± 0%  +102.99% (p=0.000 n=10)
	SubVW/1000-4         710.9Mi ± 0%   1504.5Mi ± 0%  +111.63% (p=0.000 n=10)
	SubVW/10000-4        498.7Mi ± 0%    735.7Mi ± 0%   +47.52% (p=0.000 n=10)
	SubVW/100000-4       494.8Mi ± 0%    729.1Mi ± 0%   +47.34% (p=0.000 n=10)
	SubVWext/1-4         132.9Mi ± 0%    184.1Mi ± 0%   +38.53% (p=0.000 n=10)
	SubVWext/2-4         220.1Mi ± 0%    316.9Mi ± 0%   +44.00% (p=0.000 n=10)
	SubVWext/3-4         300.7Mi ± 0%    416.7Mi ± 0%   +38.57% (p=0.000 n=10)
	SubVWext/4-4         357.1Mi ± 0%    543.6Mi ± 0%   +52.24% (p=0.000 n=10)
	SubVWext/5-4         396.7Mi ± 0%    607.2Mi ± 0%   +53.04% (p=0.000 n=10)
	SubVWext/10-4        510.1Mi ± 0%    851.9Mi ± 0%   +67.01% (p=0.000 n=10)
	SubVWext/100-4       684.2Mi ± 0%   1388.9Mi ± 0%  +102.99% (p=0.000 n=10)
	SubVWext/1000-4      710.9Mi ± 0%   1507.6Mi ± 0%  +112.07% (p=0.000 n=10)
	SubVWext/10000-4     500.1Mi ± 0%    735.7Mi ± 0%   +47.10% (p=0.000 n=10)
	SubVWext/100000-4    497.8Mi ± 0%    729.4Mi ± 0%   +46.52% (p=0.000 n=10)
	geomean              387.6Mi         614.5Mi        +58.51%
	Change-Id: I9d7fac719e977710ad9db9121fa298db6df605de Reviewed-on: https://go-review.googlesource.com/c/go/+/595398 Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-08-22	math/big: implement addVW in riscv64 assembly	Joel Sing
	This provides an assembly implementation of addVW for riscv64, processing up to four words per loop, resulting in a significant performance gain. On a StarFive VisionFive 2:
	                   │   addvw.1    │             addvw.2              │
	                   │    sec/op    │    sec/op     vs base            │
	AddVW/1-4             57.43n ± 0%    41.45n ± 0%  -27.83% (p=0.000 n=10)
	AddVW/2-4             69.31n ± 0%    48.15n ± 0%  -30.53% (p=0.000 n=10)
	AddVW/3-4             76.12n ± 0%    54.97n ± 0%  -27.79% (p=0.000 n=10)
	AddVW/4-4             85.47n ± 0%    56.14n ± 0%  -34.32% (p=0.000 n=10)
	AddVW/5-4             96.16n ± 0%    62.82n ± 0%  -34.67% (p=0.000 n=10)
	AddVW/10-4           149.60n ± 0%    89.55n ± 0%  -40.14% (p=0.000 n=10)
	AddVW/100-4          1115.0n ± 0%    549.3n ± 0%  -50.74% (p=0.000 n=10)
	AddVW/1000-4         10.732µ ± 0%    5.060µ ± 0%  -52.85% (p=0.000 n=10)
	AddVW/10000-4         151.7µ ± 0%    103.7µ ± 0%  -31.63% (p=0.000 n=10)
	AddVW/100000-4        1.523m ± 0%    1.050m ± 0%  -31.03% (p=0.000 n=10)
	AddVWext/1-4          57.42n ± 0%    41.45n ± 0%  -27.81% (p=0.000 n=10)
	AddVWext/2-4          69.32n ± 0%    48.15n ± 0%  -30.54% (p=0.000 n=10)
	AddVWext/3-4          76.12n ± 0%    54.87n ± 0%  -27.92% (p=0.000 n=10)
	AddVWext/4-4          85.47n ± 0%    56.14n ± 0%  -34.32% (p=0.000 n=10)
	AddVWext/5-4          96.15n ± 0%    62.82n ± 0%  -34.66% (p=0.000 n=10)
	AddVWext/10-4        149.60n ± 0%    89.55n ± 0%  -40.14% (p=0.000 n=10)
	AddVWext/100-4       1115.0n ± 0%    549.3n ± 0%  -50.74% (p=0.000 n=10)
	AddVWext/1000-4      10.732µ ± 0%    5.060µ ± 0%  -52.85% (p=0.000 n=10)
	AddVWext/10000-4      150.5µ ± 0%    103.7µ ± 0%  -31.10% (p=0.000 n=10)
	AddVWext/100000-4     1.530m ± 0%    1.049m ± 0%  -31.41% (p=0.000 n=10)
	geomean               1.003µ         633.9n       -36.79%
	                   │   addvw.1    │             addvw.2               │
	                   │     B/s      │      B/s       vs base            │
	AddVW/1-4            132.8Mi ± 0%    184.1Mi ± 0%   +38.55% (p=0.000 n=10)
	AddVW/2-4            220.1Mi ± 0%    316.9Mi ± 0%   +43.96% (p=0.000 n=10)
	AddVW/3-4            300.7Mi ± 0%    416.4Mi ± 0%   +38.48% (p=0.000 n=10)
	AddVW/4-4            357.1Mi ± 0%    543.6Mi ± 0%   +52.25% (p=0.000 n=10)
	AddVW/5-4            396.7Mi ± 0%    607.2Mi ± 0%   +53.06% (p=0.000 n=10)
	AddVW/10-4           510.1Mi ± 0%    852.0Mi ± 0%   +67.02% (p=0.000 n=10)
	AddVW/100-4          684.1Mi ± 0%   1389.0Mi ± 0%  +103.03% (p=0.000 n=10)
	AddVW/1000-4         710.9Mi ± 0%   1507.8Mi ± 0%  +112.08% (p=0.000 n=10)
	AddVW/10000-4        503.1Mi ± 0%    735.8Mi ± 0%   +46.26% (p=0.000 n=10)
	AddVW/100000-4       501.0Mi ± 0%    726.5Mi ± 0%   +45.00% (p=0.000 n=10)
	AddVWext/1-4         132.9Mi ± 0%    184.1Mi ± 0%   +38.55% (p=0.000 n=10)
	AddVWext/2-4         220.1Mi ± 0%    316.9Mi ± 0%   +43.98% (p=0.000 n=10)
	AddVWext/3-4         300.7Mi ± 0%    417.1Mi ± 0%   +38.73% (p=0.000 n=10)
	AddVWext/4-4         357.1Mi ± 0%    543.6Mi ± 0%   +52.25% (p=0.000 n=10)
	AddVWext/5-4         396.7Mi ± 0%    607.2Mi ± 0%   +53.05% (p=0.000 n=10)
	AddVWext/10-4        510.1Mi ± 0%    852.0Mi ± 0%   +67.02% (p=0.000 n=10)
	AddVWext/100-4       684.2Mi ± 0%   1389.0Mi ± 0%  +103.02% (p=0.000 n=10)
	AddVWext/1000-4      710.9Mi ± 0%   1507.7Mi ± 0%  +112.08% (p=0.000 n=10)
	AddVWext/10000-4     506.9Mi ± 0%    735.8Mi ± 0%   +45.15% (p=0.000 n=10)
	AddVWext/100000-4    498.6Mi ± 0%    727.0Mi ± 0%   +45.79% (p=0.000 n=10)
	geomean              388.3Mi         614.3Mi        +58.19%
	Change-Id: Ib14a4b8c1d81e710753bbf6dd5546bbca44fe3f1 Reviewed-on: https://go-review.googlesource.com/c/go/+/595397 Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
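To illustrate what "processing up to four words per loop" means for addVW, here is a minimal pure-Go sketch of the same unrolling idea. It is not the riscv64 assembly from the commit and not the math/big source; the function name mirrors the commit, but the use of `uint` in place of `big.Word` and the overall shape are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addVW sets z = x + y, where x is a multi-word value (least significant
// word first) and y is a single word, and returns the final carry.
// Illustrative sketch only: uint stands in for math/big's Word type.
func addVW(z, x []uint, y uint) (c uint) {
	if len(z) != len(x) {
		panic("addVW: length mismatch")
	}
	c = y // the single word enters as the initial "carry"
	i := 0
	// Main loop: four words per iteration, carry threaded through each add.
	for ; i+4 <= len(x); i += 4 {
		z[i], c = bits.Add(x[i], c, 0)
		z[i+1], c = bits.Add(x[i+1], c, 0)
		z[i+2], c = bits.Add(x[i+2], c, 0)
		z[i+3], c = bits.Add(x[i+3], c, 0)
	}
	// Tail loop: the remaining zero to three words.
	for ; i < len(x); i++ {
		z[i], c = bits.Add(x[i], c, 0)
	}
	return c
}

func main() {
	max := ^uint(0)
	x := []uint{max, max, max, max, 5}
	z := make([]uint, len(x))
	c := addVW(z, x, 1)
	fmt.Println(z, c) // prints "[0 0 0 0 6] 0"
}
```

Unrolling reduces loop overhead per word, which is consistent with the larger gains the benchmarks show as the word count grows, although the actual assembly may differ in how it handles the tail and the carry.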
2024-08-21	crypto/x509,math/rand/v2: implement the encoding.(Binary|Text)Appender	apocelipes
	Implement the encoding.(Binary|Text)Appender interfaces for "x509.OID". Implement the encoding.BinaryAppender interface for "rand/v2.PCG" and "rand/v2.ChaCha8". "rand/v2.ChaCha8.MarshalBinary" also gains some performance benefits:
	                           │     old      │                 new                 │
	                           │    sec/op    │    sec/op     vs base               │
	ChaCha8MarshalBinary-8       33.730n ± 2%    9.786n ± 1%   -70.99% (p=0.000 n=10)
	ChaCha8MarshalBinaryRead-8    99.86n ± 1%    17.79n ± 0%   -82.18% (p=0.000 n=10)
	geomean                       58.04n         13.19n        -77.27%
	                           │     old      │                 new                 │
	                           │     B/op     │     B/op      vs base               │
	ChaCha8MarshalBinary-8         48.00 ± 0%      0.00 ± 0%  -100.00% (p=0.000 n=10)
	ChaCha8MarshalBinaryRead-8     83.00 ± 0%      0.00 ± 0%  -100.00% (p=0.000 n=10)
	                           │     old      │                 new                 │
	                           │  allocs/op   │   allocs/op   vs base               │
	ChaCha8MarshalBinary-8         1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
	ChaCha8MarshalBinaryRead-8     2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
	For #62384 Change-Id: I604bde6dad90a916012909c7260f4bb06dcf5c0a GitHub-Last-Rev: 78abf9c5dfb74838985637798bcd5cb957541d20 GitHub-Pull-Request: golang/go#68987 Reviewed-on: https://go-review.googlesource.com/c/go/+/607079 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
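The allocation savings above come from the appender pattern: the encoder writes into a caller-supplied buffer instead of allocating a fresh slice on every call. The sketch below shows that pattern on a made-up type; `counter` and its 16-byte layout are hypothetical and are not the x509.OID, PCG, or ChaCha8 encodings. Only the method shape `AppendBinary(b []byte) ([]byte, error)` follows the interface named in the commit.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// counter is a hypothetical type used only to illustrate the pattern.
type counter struct{ hi, lo uint64 }

// AppendBinary appends the 16-byte big-endian encoding of c to b and
// returns the extended slice. It allocates only if b lacks capacity.
func (c counter) AppendBinary(b []byte) ([]byte, error) {
	b = binary.BigEndian.AppendUint64(b, c.hi)
	b = binary.BigEndian.AppendUint64(b, c.lo)
	return b, nil
}

// MarshalBinary is expressed in terms of AppendBinary, so both paths
// share one encoder.
func (c counter) MarshalBinary() ([]byte, error) {
	return c.AppendBinary(make([]byte, 0, 16))
}

func main() {
	buf := make([]byte, 0, 64) // reused across calls, so no per-call allocation
	for i := uint64(0); i < 3; i++ {
		var err error
		buf, err = counter{hi: i, lo: i * 2}.AppendBinary(buf[:0])
		if err != nil {
			panic(err)
		}
		fmt.Printf("%x\n", buf)
	}
}
```

Reusing `buf[:0]` across iterations is what drives allocations to zero in a hot loop; callers that only need a one-off encoding can keep using `MarshalBinary`.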
2024-08-20	src: fix typos	Alexander Cyon
	Fix typos in ~30 files Change-Id: Ie433aea01e7d15944c1e9e103691784876d5c1f9 GitHub-Last-Rev: bbaeb3d1f88a5fa6bbb69607b1bd075f496a7894 GitHub-Pull-Request: golang/go#68964 Reviewed-on: https://go-review.googlesource.com/c/go/+/606955 Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
2024-08-19	math/rand: make calls to Seed no-op	Paschalis T
	Makes calls to the global Seed a no-op. The GODEBUG=randseednop=0 setting can be used to revert this behavior. Fixes #67273 Change-Id: I79c1b2b23f3bc472fbd6190cb916a9d7583250f4 Reviewed-on: https://go-review.googlesource.com/c/go/+/606055 Auto-Submit: Cherry Mui <cherryyz@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
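Code that called the global Seed to get a reproducible sequence is affected by this change unless GODEBUG=randseednop=0 is set. One way to keep reproducibility without relying on the global generator is a locally seeded one, sketched below. The helper name `sequence` is hypothetical; `rand.New` and `rand.NewSource` are the existing math/rand constructors.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sequence returns n pseudo-random values in [0, 1000) drawn from a
// generator seeded locally with seed. It never touches the global
// generator, so it does not depend on the behavior of rand.Seed.
func sequence(seed int64, n int) []int {
	r := rand.New(rand.NewSource(seed))
	out := make([]int, n)
	for i := range out {
		out[i] = r.Intn(1000)
	}
	return out
}

func main() {
	a := sequence(42, 5)
	b := sequence(42, 5)
	fmt.Println(a)
	fmt.Println(b) // same values as a: equal seeds give equal sequences
}
```

A local generator also avoids sharing state with other packages that draw from the global source, which is a common reason to prefer it in tests.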
2024-08-15	math/big,regexp: implement the encoding.TextAppender interface	apocelipes
	For #62384 Change-Id: I1557704c6a0f9c6f3b9aad001374dd5cdbc99065 GitHub-Last-Rev: c258d18ccedab5feeb481a2431d5647bde7e5c58 GitHub-Pull-Request: golang/go#68893 Reviewed-on: https://go-review.googlesource.com/c/go/+/605758 Reviewed-by: Ian Lance Taylor <iant@google.com> Commit-Queue: Robert Griesemer <gri@google.com> Reviewed-by: Robert Griesemer <gri@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Robert Griesemer <gri@google.com>


Step 4 of the mini-compiler: switch to the new generated assembly. No systematic performance regressions, and many many improvements. In the benchmarks, the systems are: c3h88 GOARCH=amd64 c3h88 perf gomote (newer Intel, Google Cloud) c2s16 GOARCH=amd64 c2s16 perf gomote (Intel, Google Cloud) s7 GOARCH=amd64 rsc basement server (AMD Ryzen 9 7950X) 386 GOARCH=386 gotip-linux-386 gomote (Intel, Google Cloud) s7-386 GOARCH=386 rsc basement server (AMD Ryzen 9 7950X) c4as16 GOARCH=arm64 c4as16 perf gomote (Google Cloud) mac GOARCH=arm64 Apple M3 Pro in MacBook Pro arm GOARCH=arm gotip-linux-arm gomote loong64 GOARCH=loong64 gotip-linux-loong64 gomote ppc64le GOARCH=ppc64le gotip-linux-ppc64le gomote riscv64 GOARCH=riscv64 gotip-linux-riscv64 gomote s390x GOARCH=s390x linux-s390x-ibm old gomote benchmark \ system c3h88 c2s16 s7 386 s7-386 c4as16 mac arm loong64 ppc64le riscv64 s390x AddVV/words=1 -4.03% +5.21% -4.04% +4.94% ~ ~ ~ ~ -19.51% ~ ~ ~ AddVV/words=10 -10.20% +0.34% -3.46% -11.50% -7.46% +7.66% +5.97% ~ -17.90% ~ ~ ~ AddVV/words=16 -10.91% -6.45% -8.45% -21.86% -17.90% +2.73% -1.61% ~ -22.47% -3.54% ~ ~ AddVV/words=100 -3.77% -4.30% -3.17% -47.27% -45.34% -0.78% ~ -8.74% -27.19% ~ ~ ~ AddVV/words=1000 -0.08% -0.71% ~ -49.21% -48.07% ~ ~ -16.80% -24.74% ~ ~ ~ AddVV/words=10000 ~ ~ ~ -48.73% -48.56% -0.06% ~ -17.08% ~ ~ -4.81% ~ AddVV/words=100000 ~ ~ ~ -47.80% -48.38% ~ ~ -15.10% -25.06% ~ -5.34% ~ SubVV/words=1 -0.84% +3.43% -3.62% +1.34% ~ -0.76% ~ ~ -18.18% +5.58% ~ ~ SubVV/words=10 -9.99% +0.34% ~ -11.23% -8.24% +7.53% +6.15% ~ -17.55% +2.77% -2.08% ~ SubVV/words=16 -11.94% -6.45% -6.81% -21.82% -18.11% +1.58% -1.21% ~ -20.36% ~ ~ ~ SubVV/words=100 -3.38% -4.32% -1.80% -46.14% -46.43% +0.41% ~ -7.20% -26.17% ~ -0.42% ~ SubVV/words=1000 -0.38% -0.80% ~ -49.22% -48.90% ~ ~ -15.86% -24.73% ~ ~ ~ SubVV/words=10000 ~ ~ ~ -49.57% -49.64% -0.03% ~ -15.85% -26.52% ~ -5.05% ~ SubVV/words=100000 ~ ~ ~ -46.88% -49.66% ~ ~ -15.45% -16.11% ~ -4.99% ~ LshVU/words=1 ~ 
+5.78% ~ ~ -2.48% +1.61% +2.18% +2.70% -18.16% -34.16% -21.29% ~ LshVU/words=10 -18.34% -3.78% +2.21% ~ ~ -2.81% -12.54% ~ -25.02% -24.78% -38.11% -66.98% LshVU/words=16 -23.15% +1.03% +7.74% +0.73% ~ +8.88% +1.56% ~ -25.37% -28.46% -41.27% ~ LshVU/words=100 -32.85% -8.86% -2.58% ~ +2.69% +1.24% ~ -20.63% -44.14% -42.68% -53.09% ~ LshVU/words=1000 -37.30% -0.20% +5.67% ~ ~ +1.44% ~ -27.83% -45.01% -37.07% -57.02% -46.57% LshVU/words=10000 -36.84% -2.30% +3.82% ~ +1.86% +1.57% -66.81% -28.00% -13.15% -35.40% -41.97% ~ LshVU/words=100000 -40.30% ~ +3.96% ~ ~ ~ ~ -24.91% -19.06% -36.14% -40.99% -66.03% RshVU/words=1 -3.17% +4.76% -4.06% +4.31% +4.55% ~ ~ ~ -20.61% ~ -26.20% -51.33% RshVU/words=10 -22.08% -4.41% -17.99% +3.64% -11.87% ~ -16.30% ~ -30.01% ~ -40.37% -63.05% RshVU/words=16 -26.03% -8.50% -18.09% ~ -17.52% +6.50% ~ -2.85% -30.24% ~ -42.93% -63.13% RshVU/words=100 -20.87% -28.83% -29.45% ~ -26.25% +1.46% -1.14% -16.20% -45.65% -16.20% -53.66% -77.27% RshVU/words=1000 -24.03% -21.37% -26.71% ~ -28.95% +0.98% ~ -18.82% -45.21% -23.55% -57.09% -71.18% RshVU/words=10000 -24.56% -22.44% -27.01% ~ -28.88% +0.78% -5.35% -17.47% -16.87% -20.67% -41.97% ~ RshVU/words=100000 -23.36% -15.65% -27.54% ~ -29.26% +1.73% -6.67% -13.68% -21.40% -23.02% -40.37% -66.31% MulAddVWW/words=1 +2.37% +8.14% ~ +4.10% +3.71% ~ ~ ~ -21.62% ~ +1.12% ~ MulAddVWW/words=10 ~ -2.72% -15.15% +8.04% ~ ~ ~ -2.52% -19.48% ~ -6.18% ~ MulAddVWW/words=16 ~ +1.49% ~ +4.49% +6.58% -8.70% -7.16% -12.08% -21.43% -6.59% -9.05% ~ MulAddVWW/words=100 +0.37% +1.11% -4.51% -13.59% ~ -11.10% -3.63% -21.40% -22.27% -2.92% -14.41% ~ MulAddVWW/words=1000 ~ +0.90% -7.13% -18.94% ~ -14.02% -9.97% -28.31% -18.72% -2.32% -15.80% ~ MulAddVWW/words=10000 ~ +1.08% -6.75% -19.10% ~ -14.61% -9.04% -28.48% -14.29% -2.25% -9.40% ~ MulAddVWW/words=100000 ~ ~ -6.93% -18.09% ~ -14.33% -9.66% -28.92% -16.63% -2.43% -8.23% ~ AddMulVVWW/words=1 +2.30% +4.83% -11.37% +4.58% ~ -3.14% ~ ~ -10.58% +30.35% ~ ~ AddMulVVWW/words=10 
-3.27% ~ +8.96% +5.74% ~ +2.67% -1.44% -7.64% -13.41% ~ ~ ~ AddMulVVWW/words=16 -6.12% ~ ~ ~ +1.91% -7.90% -16.22% -14.07% -14.26% -4.15% -7.30% ~ AddMulVVWW/words=100 -5.48% -2.14% ~ -9.40% +9.98% -1.43% -12.35% -18.56% -21.94% ~ -9.84% ~ AddMulVVWW/words=1000 -11.35% -3.40% -3.64% -11.04% +12.82% -1.33% -15.63% -20.50% -20.95% ~ -11.06% -51.97% AddMulVVWW/words=10000 -10.31% -1.61% -8.41% -12.15% +13.10% -1.03% -16.34% -22.46% -1.00% ~ -10.33% -49.80% AddMulVVWW/words=100000 -13.71% ~ -8.31% -12.18% +12.98% -1.35% -15.20% -21.89% ~ ~ -9.38% -48.30% Change-Id: I0a33c33602c0d053c84d9946e662500cfa048e2d Reviewed-on: https://go-review.googlesource.com/c/go/+/664938 Reviewed-by: Alan Donovan <adonovan@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

Step 3 of the mini-compiler: add the generators for the shift and mul routines. Change-Id: I981d5b7086262c740036f5db768d3e63083984e2 Reviewed-on: https://go-review.googlesource.com/c/go/+/664937 Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Alan Donovan <adonovan@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

Step 2 of the mini-compiler: add all the remaining architectures. Change-Id: I8c5283aa8baa497785a5c15f2248528fa9ae886e Reviewed-on: https://go-review.googlesource.com/c/go/+/664936 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com> Auto-Submit: Russ Cox <rsc@golang.org>

The arith assembly is big enough, and the details that you have to keep in mind are complex enough and varied enough, that it is worth using a Go program to generate the assembly. That way, all the architectures can use the same algorithms, and porting to new architectures will be easier. This is the first of a sequence of CLs to introduce a new mini-compiler for generating the arith assembly, in math/big/internal/asmgen. This CL has the basics of the compiler as well as a couple simple architectures and the generator for addVV/subVV. It does not check in the generated assembly yet. That will happen in a followup CL after the other architectures and generators have been added. Change-Id: Ib704c60fd972fc5690ac04d8fae3712ee2c1a80a Reviewed-on: https://go-review.googlesource.com/c/go/+/664935 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com> Auto-Submit: Russ Cox <rsc@golang.org>

The vast majority of the time, carry propagation is limited and addVW/subVW only need to consider a single word for carry propagation. As Josh Bleecher-Snyder pointed out in 2019 (CL 164968), once carrying is done, the remaining words can be handled faster with copy (memmove). In the benchmarks below, this is the data=random case. Even more important, if the source and destination are the same, the copy can be optimized away entirely, making a small in-place addition to a big.Int O(1) instead of O(N). To date, only a few systems (amd64, arm64, and pure Go, meaning wasm) make use of this asymptotic improvement. This is the data=shortcut case. This CL deletes the addVW/subVW assembly and replaces it with an optimized pure Go version. Using Go makes it easy to call the real copy builtin, which will use optimized memmove code, instead of recreating a worse memmove in assembly (as arm64 does) or omitting the copy optimization entirely (as most others do). The worst case for the Go version versus assembly is the case of incrementing 2^N-1 by 1, which has to propagate a carry the entire length of the array. This is the data=carry case. On balance, we believe this case is rare enough to be worth taking a hit in that case, in exchange for significant wins in the other cases and the deletion of significant amounts of assembly of varying quality. (Remember that half the assembly has the copy optimization and shortcut, while half does not.) 
In the benchmarks, the systems are: c2s16 GOARCH=amd64 c2s16 perf gomote (Intel, Google Cloud) c3h88 GOARCH=amd64 c3h88 perf gomote (newer Intel, Google Cloud) s7 GOARCH=amd64 rsc basement server (AMD Ryzen 9 7950X) c4as16 GOARCH=arm64 c4as16 perf gomote (Google Cloud) mac GOARCH=arm64 Apple M3 Pro in MacBook Pro 386 GOARCH=386 gotip-linux-386 gomote arm GOARCH=arm gotip-linux-arm gomote loong64 GOARCH=loong64 gotip-linux-loong64 gomote ppc64le GOARCH=ppc64le gotip-linux-ppc64le gomote riscv64 GOARCH=riscv64 gotip-linux-riscv64 gomote benchmark \ system c2s16 c3h88 s7 c4as16 mac 386 arm loong64 ppc64le riscv64 AddVW/words=1/data=random -1.15% -1.74% -5.89% -9.80% -11.54% +23.71% -12.74% -14.25% +14.67% +10.27% AddVW/words=2/data=random -2.59% ~ -4.38% -19.31% -15.41% +24.80% ~ -19.99% +13.73% +19.71% AddVW/words=3/data=random -3.75% -19.10% -3.79% -23.15% -17.04% +20.04% -10.07% -23.20% ~ +15.39% AddVW/words=4/data=random -2.84% +7.05% -8.77% -22.64% -15.77% +16.01% -7.36% -28.22% ~ +23.00% AddVW/words=5/data=random -10.97% +2.16% -12.09% -20.89% -17.14% +9.42% -4.69% -32.60% ~ +10.07% AddVW/words=6/data=random -9.87% ~ -7.54% -19.08% -6.46% ~ -3.44% -34.61% ~ +12.19% AddVW/words=7/data=random -14.36% ~ -10.09% -19.10% -10.47% -6.20% -5.06% -38.14% -11.54% +6.79% AddVW/words=8/data=random -17.50% ~ -11.06% -25.14% -12.88% -8.35% -5.11% -41.39% -14.04% +11.87% AddVW/words=9/data=random -19.76% -4.05% -15.47% -24.08% -16.50% -12.34% -21.56% -44.25% -14.82% ~ AddVW/words=10/data=random -13.89% ~ -9.69% -23.06% -8.04% -12.58% -19.25% -32.80% -11.68% ~ AddVW/words=16/data=random -29.36% -15.35% -21.86% -25.04% -19.89% -32.26% -16.29% -42.66% -25.92% -3.01% AddVW/words=32/data=random -39.02% -28.76% -39.87% -11.22% -2.85% -55.40% -31.17% -55.37% -37.92% -16.28% AddVW/words=64/data=random -25.94% -19.09% -20.60% -6.90% +8.91% -51.00% -43.72% -62.27% -44.11% -28.74% AddVW/words=100/data=random -22.79% -18.13% -18.25% ~ +33.89% -67.40% -51.77% -63.54% -53.75% -30.97% 
AddVW/words=1000/data=random -8.98% -3.84% ~ -3.15% ~ -93.35% -63.92% -65.66% -68.67% -42.30% AddVW/words=10000/data=random -1.38% -0.38% ~ ~ ~ -89.16% -65.18% -44.65% -70.35% -20.08% AddVW/words=100000/data=random ~ ~ ~ ~ ~ -87.03% -64.51% -36.08% -61.40% -16.53% SubVW/words=1/data=random -3.67% ~ -8.38% -10.26% -3.07% +45.78% -6.06% -11.17% ~ ~ SubVW/words=2/data=random -3.48% -10.07% -5.76% -20.14% -8.45% +44.28% ~ -19.09% ~ +16.98% SubVW/words=3/data=random -7.11% -26.64% -4.48% -22.07% -9.21% +35.61% ~ -23.93% -18.20% ~ SubVW/words=4/data=random -4.23% +7.19% -8.95% -22.62% -13.89% +33.20% -8.96% -29.96% ~ +22.23% SubVW/words=5/data=random -11.49% +1.92% -10.86% -22.27% -17.53% +24.48% -2.88% -35.19% -19.55% ~ SubVW/words=6/data=random -7.67% ~ -7.72% -18.44% -6.24% +12.03% -2.00% -39.68% -10.73% ~ SubVW/words=7/data=random -13.69% -18.32% -11.82% -18.92% -11.57% +6.63% ~ -43.54% -30.81% ~ SubVW/words=8/data=random -16.02% ~ -11.07% -24.50% -11.92% +4.32% -3.01% -46.95% -24.14% ~ SubVW/words=9/data=random -18.76% -3.34% -14.84% -23.79% -17.50% ~ -21.80% -49.98% -29.62% ~ SubVW/words=10/data=random -13.23% ~ -9.25% -21.26% -11.63% ~ -18.58% -39.19% -20.09% ~ SubVW/words=16/data=random -28.25% -13.24% -22.66% -27.18% -19.13% -23.38% -20.24% -51.01% -28.06% -3.05% SubVW/words=32/data=random -38.41% -28.88% -40.12% -11.20% -2.80% -49.17% -34.67% -63.29% -39.25% -15.20% SubVW/words=64/data=random -25.51% -19.24% -22.20% -6.57% +9.98% -48.52% -48.14% -69.50% -49.44% -27.92% SubVW/words=100/data=random -21.69% -18.51% ~ +1.92% +34.42% -65.88% -54.67% -71.24% -58.88% -30.71% SubVW/words=1000/data=random -9.81% -4.05% -2.14% -3.06% ~ -93.37% -67.33% -74.12% -68.36% -42.17% SubVW/words=10000/data=random ~ -0.52% ~ ~ ~ -88.87% -68.54% -44.94% -70.63% -19.95% SubVW/words=100000/data=random ~ ~ ~ ~ ~ -86.69% -68.09% -48.36% -62.42% -19.32% AddVW/words=1/data=shortcut -29.38% -25.38% -27.37% -23.15% -25.41% +3.01% -33.60% -36.12% -15.76% ~ AddVW/words=2/data=shortcut 
-32.79% -34.72% -31.47% -24.47% -28.21% -3.75% -34.66% -43.89% -23.65% -21.56% AddVW/words=3/data=shortcut -38.50% -46.83% -35.67% -26.38% -30.29% -10.41% -44.89% -47.68% -30.93% -26.85% AddVW/words=4/data=shortcut -40.40% -28.85% -34.19% -29.83% -32.95% -16.09% -42.86% -51.02% -34.19% -26.69% AddVW/words=5/data=shortcut -43.87% -35.42% -36.46% -32.59% -37.72% -20.82% -45.14% -54.01% -35.49% -30.48% AddVW/words=6/data=shortcut -46.98% -39.34% -42.22% -35.43% -38.18% -27.46% -46.72% -56.61% -40.21% -34.07% AddVW/words=7/data=shortcut -49.63% -47.97% -46.61% -35.28% -41.93% -31.14% -49.29% -58.89% -41.10% -37.01% AddVW/words=8/data=shortcut -50.48% -42.33% -45.40% -40.24% -41.74% -32.92% -50.62% -60.98% -44.85% -38.10% AddVW/words=9/data=shortcut -54.27% -43.52% -49.06% -42.16% -45.22% -37.57% -51.84% -62.91% -46.04% -40.82% AddVW/words=10/data=shortcut -56.01% -45.40% -51.42% -43.29% -46.14% -38.65% -53.65% -64.62% -47.05% -43.21% AddVW/words=16/data=shortcut -62.73% -55.66% -59.31% -56.38% -54.31% -53.16% -61.03% -72.29% -58.24% -52.57% AddVW/words=32/data=shortcut -74.00% -69.42% -71.75% -33.65% -37.35% -71.73% -72.59% -82.44% -70.87% -67.69% AddVW/words=64/data=shortcut -56.69% -52.72% -52.09% -35.48% -36.87% -84.24% -83.10% -90.37% -82.56% -80.81% AddVW/words=100/data=shortcut -56.68% -53.18% -51.49% -33.49% -37.72% -89.95% -88.21% -93.37% -88.47% -86.52% AddVW/words=1000/data=shortcut -56.68% -52.45% -51.66% -35.31% -36.65% -98.88% -98.62% -99.24% -98.78% -98.41% AddVW/words=10000/data=shortcut -56.70% -52.40% -51.92% -33.49% -36.98% -99.89% -99.86% -99.92% -99.87% -99.91% AddVW/words=100000/data=shortcut -56.67% -52.46% -52.38% -35.31% -37.20% -99.99% -99.99% -99.99% -99.99% -99.99% SubVW/words=1/data=shortcut -29.80% -20.71% -26.94% -23.24% -25.33% +26.97% -32.02% -37.85% -40.20% -12.67% SubVW/words=2/data=shortcut -35.47% -36.38% -31.93% -25.43% -30.18% +18.96% -33.48% -46.48% -39.38% -18.65% SubVW/words=3/data=shortcut -39.22% -49.96% -36.90% -25.82% 
-30.96% +12.53% -40.67% -51.07% -43.71% -23.78% SubVW/words=4/data=shortcut -40.46% -24.90% -34.66% -29.87% -33.97% +4.60% -42.32% -54.92% -42.83% -22.45% SubVW/words=5/data=shortcut -43.84% -34.17% -38.00% -32.55% -37.27% -2.46% -43.09% -58.18% -45.70% -26.45% SubVW/words=6/data=shortcut -47.69% -37.49% -42.73% -35.90% -37.73% -8.52% -46.55% -61.01% -44.00% -30.14% SubVW/words=7/data=shortcut -49.45% -50.66% -46.88% -34.77% -41.64% -14.46% -48.92% -63.46% -50.47% -33.39% SubVW/words=8/data=shortcut -50.45% -39.31% -47.14% -40.47% -41.70% -15.77% -50.21% -65.64% -47.71% -34.01% SubVW/words=9/data=shortcut -54.28% -43.07% -49.42% -41.34% -44.99% -19.39% -51.55% -67.61% -56.92% -36.82% SubVW/words=10/data=shortcut -56.85% -47.88% -50.92% -42.76% -45.67% -23.60% -53.04% -69.34% -60.18% -39.43% SubVW/words=16/data=shortcut -62.36% -54.83% -58.80% -55.83% -53.74% -41.04% -60.16% -76.75% -60.56% -48.63% SubVW/words=32/data=shortcut -73.68% -68.64% -71.57% -33.52% -37.34% -64.73% -72.67% -85.89% -71.87% -64.56% SubVW/words=64/data=shortcut -56.68% -51.66% -52.56% -34.75% -37.54% -80.30% -83.58% -92.39% -83.41% -78.70% SubVW/words=100/data=shortcut -56.68% -50.97% -51.57% -33.68% -36.78% -87.42% -88.53% -94.84% -88.87% -84.96% SubVW/words=1000/data=shortcut -56.68% -50.89% -52.10% -34.94% -37.77% -98.59% -98.71% -99.43% -98.80% -98.20% SubVW/words=10000/data=shortcut -56.68% -51.00% -52.44% -33.65% -37.27% -99.86% -99.87% -99.94% -99.88% -99.90% SubVW/words=100000/data=shortcut -56.68% -50.80% -52.20% -34.79% -37.46% -99.99% -99.99% -99.99% -99.99% -99.99% AddVW/words=1/data=carry -0.51% -5.29% -24.03% -26.48% ~ ~ -33.14% -30.23% ~ -20.74% AddVW/words=2/data=carry -6.36% ~ -21.05% -39.40% ~ +10.72% -29.12% -31.34% ~ -17.29% AddVW/words=3/data=carry ~ ~ -17.46% -19.53% +17.58% ~ -26.23% -23.61% +7.80% -14.34% AddVW/words=4/data=carry +19.02% +16.80% ~ ~ +28.25% ~ -27.90% -20.31% +19.16% ~ AddVW/words=5/data=carry +3.97% +53.02% ~ ~ +11.31% ~ -19.05% -17.47% +16.81% ~ 
AddVW/words=6/data=carry +2.98% +19.83% ~ ~ +14.84% ~ -18.48% -14.92% +18.25% ~ AddVW/words=7/data=carry ~ ~ ~ ~ +27.17% ~ -15.50% -12.74% +13.00% ~ AddVW/words=8/data=carry +0.58% +22.32% ~ +6.10% +29.63% ~ -13.04% ~ +28.46% +2.95% AddVW/words=9/data=carry ~ +31.53% ~ ~ +14.42% ~ -11.32% ~ +18.37% +3.28% AddVW/words=10/data=carry +3.94% +22.36% ~ +6.29% +19.22% ~ -11.27% ~ +20.10% +3.91% AddVW/words=16/data=carry +2.82% +14.23% ~ +10.06% +25.91% -16.12% ~ ~ +52.28% +10.40% AddVW/words=32/data=carry ~ +25.35% +13.66% ~ +34.89% -34.39% +6.51% -18.71% +41.06% +19.42% AddVW/words=64/data=carry -42.03% ~ -39.70% +6.65% +32.29% -39.94% +14.34% ~ +19.68% +20.86% AddVW/words=100/data=carry -33.95% -34.28% -39.65% ~ +27.72% -26.80% +17.40% ~ +26.39% +23.32% AddVW/words=1000/data=carry -42.49% -47.87% -47.44% +1.25% +4.25% -41.76% +23.40% ~ +25.48% +27.99% AddVW/words=10000/data=carry -41.85% -48.49% -49.43% ~ ~ -42.09% +24.61% -10.32% +40.55% +18.35% AddVW/words=100000/data=carry -28.18% -48.13% -48.24% +1.35% ~ -42.90% +24.73% -9.79% +22.55% +17.16% SubVW/words=1/data=carry -10.32% -17.16% -24.14% -26.24% ~ +18.43% -34.10% -29.54% -9.57% ~ SubVW/words=2/data=carry -19.45% -23.31% -20.74% -39.73% ~ +15.74% -28.13% -30.21% ~ -18.74% SubVW/words=3/data=carry ~ -16.18% -15.34% -19.54% +17.62% +12.39% -27.64% -27.09% ~ -14.97% SubVW/words=4/data=carry +11.67% +24.42% ~ ~ +25.11% +14.07% -28.08% -26.18% ~ ~ SubVW/words=5/data=carry +8.08% +25.64% ~ ~ +10.35% +8.12% -21.75% -25.50% ~ -4.86% SubVW/words=6/data=carry ~ +13.82% ~ ~ +12.92% +6.79% -20.25% -24.70% ~ -2.74% SubVW/words=7/data=carry ~ ~ +8.29% +4.51% +26.59% +4.62% -18.01% -24.09% ~ -1.26% SubVW/words=8/data=carry ~ +23.16% +16.19% +6.16% +25.46% +6.74% -15.57% -22.74% ~ +1.44% SubVW/words=9/data=carry ~ +30.71% +20.81% ~ +12.36% ~ -12.99% ~ ~ +3.13% SubVW/words=10/data=carry +5.03% +19.53% +14.84% +14.16% +16.12% ~ -11.64% -16.00% +15.45% +3.29% SubVW/words=16/data=carry +14.42% +15.58% +33.07% +11.43% +24.65% ~ ~ 
-21.90% +25.59% +9.40% SubVW/words=32/data=carry ~ +27.57% +46.58% ~ +35.35% -8.49% ~ -24.04% +11.86% +18.40% SubVW/words=64/data=carry -24.34% -27.83% -20.90% +13.34% +37.17% -14.90% ~ -8.81% +12.88% +18.92% SubVW/words=100/data=carry -25.19% -34.70% -27.45% +12.86% +28.42% -14.48% ~ ~ +25.71% +21.93% SubVW/words=1000/data=carry -24.93% -47.86% -47.26% +2.66% ~ -23.88% ~ ~ +25.99% +27.81% SubVW/words=10000/data=carry -24.17% -36.48% -49.41% +1.06% ~ -25.06% ~ -26.50% +27.94% +18.36% SubVW/words=100000/data=carry -22.51% -35.86% -49.46% +3.96% ~ -25.18% ~ -22.15% +26.86% +15.44% Change-Id: I8f252073040e674780ac6ec9912082fb205329dd Reviewed-on: https://go-review.googlesource.com/c/go/+/664898 Reviewed-by: Alan Donovan <adonovan@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

Also fix a few real but currently harmless bugs from CL 664895. There were a few places that were still wrong if z != x or if a != 0. Change-Id: Id8971e2505523bc4708780c82bf998a546f4f081 Reviewed-on: https://go-review.googlesource.com/c/go/+/664897 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com>

Vet is failing on this code because some arguments of mulAddVWW were renamed in the Go declaration (CL 664895) but not in the assembly accessors. It looks like the assembly was written before that CL but checked in after it. Change-Id: I270e8db5f8327aa2029c21a126fab1231a3506a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/665717 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>

Benchmark results on Loongson 3C5000 (which is an LA464 implementation):

goos: linux
goarch: loong64
pkg: math/big
cpu: Loongson-3C5000 @ 2200.00MHz
              │ test/old_3c5000_subvv.log │ test/new_3c5000_subvv.log │
              │ sec/op                    │ sec/op        vs base     │
SubVV/1         10.920n ± 0%   7.657n ± 0%  -29.88% (p=0.000 n=20)
SubVV/2         14.100n ± 0%   8.841n ± 0%  -37.30% (p=0.000 n=20)
SubVV/3          16.38n ± 0%   11.06n ± 0%  -32.48% (p=0.000 n=20)
SubVV/4          18.65n ± 0%   12.85n ± 0%  -31.10% (p=0.000 n=20)
SubVV/5          20.93n ± 0%   14.79n ± 0%  -29.34% (p=0.000 n=20)
SubVV/10         32.30n ± 0%   22.29n ± 0%  -30.99% (p=0.000 n=20)
SubVV/100        244.3n ± 0%   149.2n ± 0%  -38.93% (p=0.000 n=20)
SubVV/1000       2.292µ ± 0%   1.378µ ± 0%  -39.88% (p=0.000 n=20)
SubVV/10000      26.26µ ± 0%   25.64µ ± 0%   -2.33% (p=0.000 n=20)
SubVV/100000     341.3µ ± 0%   238.0µ ± 0%  -30.26% (p=0.000 n=20)
geomean          209.1n        144.5n       -30.86%

Change-Id: I3863c2c6728f1b0f8fecbf77de13254299c5b1cb
Reviewed-on: https://go-review.googlesource.com/c/go/+/659877
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

Benchmark results on Loongson 3A5000 (which is an LA464 implementation):

goos: linux
goarch: loong64
pkg: math/big
cpu: Loongson-3A5000-HV @ 2500.00MHz
                  │ test/old_3a5000_muladdvww.log │ test/new_3a5000_muladdvww.log │
                  │ sec/op                        │ sec/op        vs base         │
MulAddVWW/1          7.606n ± 0%   6.987n ± 0%   -8.14% (p=0.000 n=20)
MulAddVWW/2          9.207n ± 0%   8.567n ± 0%   -6.95% (p=0.000 n=20)
MulAddVWW/3         10.810n ± 0%   9.223n ± 0%  -14.68% (p=0.000 n=20)
MulAddVWW/4          13.01n ± 0%   12.41n ± 0%   -4.61% (p=0.000 n=20)
MulAddVWW/5          15.79n ± 0%   12.99n ± 0%  -17.73% (p=0.000 n=20)
MulAddVWW/10         25.62n ± 0%   20.02n ± 0%  -21.86% (p=0.000 n=20)
MulAddVWW/100        217.0n ± 0%   170.9n ± 0%  -21.24% (p=0.000 n=20)
MulAddVWW/1000       2.064µ ± 0%   1.612µ ± 0%  -21.90% (p=0.000 n=20)
MulAddVWW/10000      24.50µ ± 0%   16.74µ ± 0%  -31.66% (p=0.000 n=20)
MulAddVWW/100000     239.1µ ± 0%   171.1µ ± 0%  -28.45% (p=0.000 n=20)
geomean              159.2n        130.3n       -18.18%

Change-Id: I063434bc382f4f1234f879172ab671a3d6f2eb80
Reviewed-on: https://go-review.googlesource.com/c/go/+/659881
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

Benchmark results on Loongson 3C5000 (which is an LA464 implementation):

goos: linux
goarch: loong64
pkg: math/big
cpu: Loongson-3C5000 @ 2200.00MHz
                 │ test/old_3c5000_subvw.log │ test/new_3c5000_subvw.log │
                 │ sec/op                    │ sec/op        vs base     │
SubVW/1             8.564n ± 0%   5.915n ± 0%  -30.93% (p=0.000 n=20)
SubVW/2            11.675n ± 0%   6.825n ± 0%  -41.54% (p=0.000 n=20)
SubVW/3            13.410n ± 0%   7.969n ± 0%  -40.57% (p=0.000 n=20)
SubVW/4            15.300n ± 0%   9.740n ± 0%  -36.34% (p=0.000 n=20)
SubVW/5             17.34n ± 1%   10.66n ± 0%  -38.55% (p=0.000 n=20)
SubVW/10            26.55n ± 0%   15.21n ± 0%  -42.70% (p=0.000 n=20)
SubVW/100           199.2n ± 0%   102.5n ± 0%  -48.52% (p=0.000 n=20)
SubVW/1000         1866.5n ± 1%   924.6n ± 0%  -50.46% (p=0.000 n=20)
SubVW/10000         17.67µ ± 2%   12.04µ ± 2%  -31.83% (p=0.000 n=20)
SubVW/100000        186.4µ ± 0%   132.0µ ± 0%  -29.17% (p=0.000 n=20)
SubVWext/1          8.616n ± 0%   5.949n ± 0%  -30.95% (p=0.000 n=20)
SubVWext/2         11.410n ± 0%   7.008n ± 1%  -38.58% (p=0.000 n=20)
SubVWext/3         13.255n ± 1%   8.073n ± 0%  -39.09% (p=0.000 n=20)
SubVWext/4         15.095n ± 0%   9.893n ± 0%  -34.47% (p=0.000 n=20)
SubVWext/5          16.87n ± 0%   10.86n ± 0%  -35.63% (p=0.000 n=20)
SubVWext/10         26.00n ± 0%   15.54n ± 0%  -40.22% (p=0.000 n=20)
SubVWext/100        196.0n ± 0%   104.3n ± 1%  -46.76% (p=0.000 n=20)
SubVWext/1000      1847.0n ± 0%   923.7n ± 0%  -49.99% (p=0.000 n=20)
SubVWext/10000      17.30µ ± 1%   11.71µ ± 1%  -32.31% (p=0.000 n=20)
SubVWext/100000     187.5µ ± 0%   131.6µ ± 0%  -29.82% (p=0.000 n=20)
geomean             159.7n        97.79n       -38.79%

Change-Id: I21a6903e79b02cb22282e80c9bfe2ae9f1a87589
Reviewed-on: https://go-review.googlesource.com/c/go/+/659878
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>

Benchmark results on Loongson 3C5000 (which is an LA464 implementation):

goos: linux
goarch: loong64
pkg: math/big
cpu: Loongson-3C5000 @ 2200.00MHz
                 │ test/old_3c5000_addvw.log │ test/new_3c5000_addvw.log │
                 │ sec/op                    │ sec/op        vs base     │
AddVW/1             9.555n ± 0%   5.915n ± 0%  -38.09% (p=0.000 n=20)
AddVW/2            11.370n ± 0%   6.825n ± 0%  -39.97% (p=0.000 n=20)
AddVW/3            12.485n ± 0%   7.970n ± 0%  -36.16% (p=0.000 n=20)
AddVW/4            14.980n ± 0%   9.718n ± 0%  -35.13% (p=0.000 n=20)
AddVW/5             16.73n ± 0%   10.63n ± 0%  -36.46% (p=0.000 n=20)
AddVW/10            24.57n ± 0%   15.18n ± 0%  -38.23% (p=0.000 n=20)
AddVW/100           184.9n ± 0%   102.4n ± 0%  -44.62% (p=0.000 n=20)
AddVW/1000         1721.0n ± 0%   921.4n ± 0%  -46.46% (p=0.000 n=20)
AddVW/10000         16.83µ ± 0%   11.68µ ± 0%  -30.58% (p=0.000 n=20)
AddVW/100000        184.7µ ± 0%   131.3µ ± 0%  -28.93% (p=0.000 n=20)
AddVWext/1          9.554n ± 0%   5.915n ± 0%  -38.09% (p=0.000 n=20)
AddVWext/2         11.370n ± 0%   6.825n ± 0%  -39.97% (p=0.000 n=20)
AddVWext/3         12.505n ± 0%   7.969n ± 0%  -36.27% (p=0.000 n=20)
AddVWext/4         14.980n ± 0%   9.718n ± 0%  -35.13% (p=0.000 n=20)
AddVWext/5          16.70n ± 0%   10.63n ± 0%  -36.33% (p=0.000 n=20)
AddVWext/10         24.54n ± 0%   15.18n ± 0%  -38.13% (p=0.000 n=20)
AddVWext/100        185.0n ± 0%   102.4n ± 0%  -44.65% (p=0.000 n=20)
AddVWext/1000      1721.0n ± 0%   921.4n ± 0%  -46.46% (p=0.000 n=20)
AddVWext/10000      16.83µ ± 0%   11.68µ ± 0%  -30.60% (p=0.000 n=20)
AddVWext/100000     184.9µ ± 0%   130.4µ ± 0%  -29.51% (p=0.000 n=20)
geomean             155.5n        96.87n       -37.70%

Change-Id: I824a90cb365e09d7d0d4a2c53ff4b30cf057a75e
Reviewed-on: https://go-review.googlesource.com/c/go/+/659876
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>

It is annoying that non-x86 implementations of shlVU and shrVU have to go out of their way to handle the trivial case shift==0 with their own copy loops. Instead, arrange to never call them with shift==0, so that the code can be removed. Unfortunately, there are linknames of shlVU, so we cannot change that function. But we can rename the functions and then leave behind a shlVU wrapper, so do that. Since the big.Int API calls the operations Lsh and Rsh, rename shlVU/shrVU to lshVU/rshVU. Also rename various other shl/shr methods and functions to lsh/rsh. Change-Id: Ieaf54e0110a298730aa3e4566ce5be57ba7fc121 Reviewed-on: https://go-review.googlesource.com/c/go/+/664896 Reviewed-by: Alan Donovan <adonovan@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
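The rename-plus-wrapper arrangement described in this commit message can be sketched in pure Go as follows. This is only an illustration: the Word type is assumed to be 64 bits wide, and the lshVU body here is a hypothetical stand-in for the real assembly routines, not the code from the CL.

```go
package main

import "fmt"

// Word stands in for math/big's Word type (assumed to be 64 bits here).
type Word uint64

// lshVU is a hypothetical pure-Go stand-in for the renamed routine.
// It computes z = x << s for 0 < s < 64 and returns the bits shifted out
// of the top word. Callers must never pass s == 0, so there is no copy loop.
// It assumes len(z) == len(x).
func lshVU(z, x []Word, s uint) (c Word) {
	if len(z) == 0 {
		return 0
	}
	c = x[len(x)-1] >> (64 - s)
	// Walk from the high word down so that z may alias x.
	for i := len(x) - 1; i > 0; i-- {
		z[i] = x[i]<<s | x[i-1]>>(64-s)
	}
	z[0] = x[0] << s
	return c
}

// shlVU keeps the old name alive for existing linkname users and is the
// only place that still handles the trivial s == 0 case.
func shlVU(z, x []Word, s uint) (c Word) {
	if s == 0 {
		copy(z, x)
		return 0
	}
	return lshVU(z, x, s)
}

func main() {
	x := []Word{1<<63 | 1, 2}
	z := make([]Word, len(x))
	c := shlVU(z, x, 1)
	fmt.Println(z, c)
}
```

The point of the split is that only the thin wrapper pays for the shift==0 check, while the per-architecture routines can assume a nonzero shift.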

addMulVVW is an unnecessarily special case. All other assembly routines taking []Word (V as in vector) arguments take separate source and destination. For example:

addVV: z = x+y
mulAddVWW: z = x*m+a

addMulVVW uses the z parameter as both destination and source:

addMulVVW: z = z+x*m

Even looking at the signatures is confusing: all the VV routines take two input vectors x and y, but addMulVVW takes only x: where is y? (The answer is that the two inputs are z and x.) It would be nice to fix this, both for understandability and regularity, and to simplify a future assembly generator.

We cannot remove or redefine addMulVVW, because it has been used in linknames. Instead, the CL adds a new final addend argument ‘a’ like in mulAddVWW, making the natural name addMulVVWW (two input vectors, two input words):

addMulVVWW: z = x+y*m+a

This CL updates all the assembly implementations to rename the inputs z, x, y -> x, y, m, and then introduces a separate destination z.

Change-Id: Ib76c80b53f6d1f4a901f663566e9c4764bb20488
Reviewed-on: https://go-review.googlesource.com/c/go/+/664895
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
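As a rough illustration of the formula z = x+y*m+a, here is a minimal pure-Go sketch built on math/bits. The signature and the returned carry word are assumptions made for this example; the real routines are per-architecture assembly and may differ in detail.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Word stands in for math/big's Word type.
type Word uint

// addMulVVWW computes z = x + y*m + a word by word and returns the final
// carry word. Illustrative sketch only; it assumes len(x) and len(y) are
// at least len(z).
func addMulVVWW(z, x, y []Word, m, a Word) (c Word) {
	c = a
	for i := range z {
		// hi:lo = y[i]*m, then add the running carry and x[i].
		// The total always fits in two words, so hi cannot overflow.
		hi, lo := bits.Mul(uint(y[i]), uint(m))
		lo, cc := bits.Add(lo, uint(c), 0)
		hi += cc
		lo, cc = bits.Add(lo, uint(x[i]), 0)
		hi += cc
		z[i] = Word(lo)
		c = Word(hi)
	}
	return c
}

func main() {
	max := ^Word(0)
	x := []Word{max, 1}
	y := []Word{max, 0}
	z := make([]Word, 2)
	c := addMulVVWW(z, x, y, 2, 1)
	fmt.Println(z, c)
}
```

With this shape, the old in-place behaviour z = z+x*m corresponds to the call addMulVVWW(z, z, x, m, 0), which is how a compatibility wrapper under the old name could be expressed.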

This CL adds the LCDBR assembly instruction mnemonic, which is mainly used in the math package. The LCDBR instruction has the same effect as the FNEG pseudo-instruction, except that it sets the flag. Change-Id: I3f00f1ed19148d074c3b6c5f64af0772289f2802 Reviewed-on: https://go-review.googlesource.com/c/go/+/648036 Reviewed-by: Srinivas Pokala <Pokala.Srinivas@ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Munday <mike.munday@lowrisc.org> Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> TryBot-Result: Gopher Robot <gobot@golang.org>

Refactor calibration tests to use the same logic for all. Choosing thresholds that are broadly appropriate for all systems is part science but also part guesswork and judgement. We could instead set per-GOOS/GOARCH thresholds, but that seems like too much work, and even then there would be variation between different chips within a GOOS/GOARCH. (For example see the three linux/amd64 systems benchmarked below.)

The thresholds chosen in this CL are:

karatsubaThreshold = 40 // unchanged
basicSqrThreshold = 12 // was 20
karatsubaSqrThreshold = 80 // was 260
divRecursiveThreshold = 40 // was 100

The new file calibrate.md explains the calibration process and links to graphs justifying those values. (The graphs are hosted on swtch.com to avoid adding a megabyte of extra data to the Go repo and Go distributions.) A rendered copy of calibrate.md is at https://swtch.com/math/big/calibrate.html.

goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz
│ old │ new │
│ sec/op │ sec/op vs base │
Div/20/10-88 13.13n ± 2% 13.14n ± 2% ~ (p=0.494 n=15)
Div/40/20-88 13.13n ± 2% 13.14n ± 2% ~ (p=0.137 n=15)
Div/100/50-88 25.50n ± 0% 25.51n ± 0% ~ (p=0.038 n=15)
Div/200/100-88 113.1n ± 1% 116.0n ± 3% +2.56% (p=0.000 n=15)
Div/400/200-88 135.3n ± 0% 137.1n ± 1% ~ (p=0.004 n=15)
Div/1000/500-88 259.9n ± 1% 259.0n ± 2% ~ (p=0.182 n=15)
Div/2000/1000-88 568.8n ± 1% 564.7n ± 3% ~ (p=0.927 n=15)
Div/20000/10000-88 25.79µ ± 1% 22.11µ ± 2% -14.26% (p=0.000 n=15)
Div/200000/100000-88 755.1µ ± 1% 737.6µ ± 1% -2.32% (p=0.000 n=15)
Div/2000000/1000000-88 31.30m ± 0% 31.20m ± 1% ~ (p=0.081 n=15)
Div/20000000/10000000-88 1.268 ± 0% 1.265 ± 0% ~ (p=0.011 n=15)
NatMul/10-88 142.6n ± 0% 142.9n ± 7% ~ (p=0.145 n=15)
NatMul/100-88 4.347µ ± 0% 4.350µ ± 3% ~ (p=0.430 n=15)
NatMul/1000-88 187.6µ ± 0% 188.4µ ± 2% ~ (p=0.004 n=15)
NatMul/10000-88 8.052m ± 0% 8.057m ± 1% ~ (p=0.148 n=15)
NatMul/100000-88 260.6m ± 0% 260.7m ± 0% ~ (p=0.512 n=15)
NatSqr/1-88 26.58n ± 5% 27.96n ± 
8% ~ (p=0.574 n=15) NatSqr/2-88 42.35n ± 7% 44.87n ± 6% ~ (p=0.690 n=15) NatSqr/3-88 53.28n ± 4% 55.62n ± 5% ~ (p=0.151 n=15) NatSqr/5-88 76.26n ± 6% 81.43n ± 6% +6.78% (p=0.000 n=15) NatSqr/8-88 110.8n ± 5% 116.4n ± 6% ~ (p=0.040 n=15) NatSqr/10-88 141.4n ± 4% 147.8n ± 4% ~ (p=0.011 n=15) NatSqr/20-88 325.8n ± 3% 341.7n ± 4% +4.88% (p=0.000 n=15) NatSqr/30-88 536.8n ± 3% 556.1n ± 4% ~ (p=0.027 n=15) NatSqr/50-88 1.168µ ± 3% 1.197µ ± 3% ~ (p=0.442 n=15) NatSqr/80-88 2.527µ ± 2% 2.480µ ± 2% -1.86% (p=0.000 n=15) NatSqr/100-88 3.771µ ± 2% 3.535µ ± 2% -6.26% (p=0.000 n=15) NatSqr/200-88 14.03µ ± 2% 10.57µ ± 3% -24.68% (p=0.000 n=15) NatSqr/300-88 24.06µ ± 2% 20.57µ ± 2% -14.52% (p=0.000 n=15) NatSqr/500-88 65.43µ ± 1% 45.45µ ± 1% -30.55% (p=0.000 n=15) NatSqr/800-88 126.41µ ± 1% 94.13µ ± 2% -25.54% (p=0.000 n=15) NatSqr/1000-88 196.4µ ± 1% 135.1µ ± 1% -31.18% (p=0.000 n=15) NatSqr/10000-88 6.404m ± 0% 5.326m ± 1% -16.84% (p=0.000 n=15) NatSqr/100000-88 267.2m ± 0% 198.7m ± 0% -25.64% (p=0.000 n=15) geomean 7.318µ 6.948µ -5.06% goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) CPU @ 3.10GHz │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-16 22.23n ± 0% 22.23n ± 0% ~ (p=0.973 n=15) Div/40/20-16 22.23n ± 0% 22.23n ± 0% ~ (p=0.226 n=15) Div/100/50-16 55.27n ± 1% 55.59n ± 0% ~ (p=0.004 n=15) Div/200/100-16 174.7n ± 3% 175.9n ± 2% ~ (p=0.645 n=15) Div/400/200-16 208.8n ± 1% 209.5n ± 2% ~ (p=0.169 n=15) Div/1000/500-16 378.7n ± 2% 380.5n ± 2% ~ (p=0.091 n=15) Div/2000/1000-16 778.4n ± 1% 781.1n ± 2% ~ (p=0.104 n=15) Div/20000/10000-16 25.16µ ± 1% 24.93µ ± 1% -0.91% (p=0.000 n=15) Div/200000/100000-16 926.4µ ± 0% 927.7µ ± 1% ~ (p=0.436 n=15) Div/2000000/1000000-16 35.58m ± 0% 35.53m ± 0% ~ (p=0.267 n=15) Div/20000000/10000000-16 1.333 ± 0% 1.330 ± 0% ~ (p=0.126 n=15) NatMul/10-16 172.6n ± 0% 165.4n ± 0% -4.17% (p=0.000 n=15) NatMul/100-16 5.706µ ± 0% 5.503µ ± 0% -3.56% (p=0.000 n=15) NatMul/1000-16 220.8µ ± 0% 219.1µ ± 0% -0.76% (p=0.000 n=15) NatMul/10000-16 
8.688m ± 0% 8.621m ± 0% -0.77% (p=0.000 n=15) NatMul/100000-16 333.3m ± 0% 333.5m ± 0% ~ (p=0.512 n=15) NatSqr/1-16 28.66n ± 1% 28.42n ± 3% -0.84% (p=0.000 n=15) NatSqr/2-16 48.29n ± 2% 48.19n ± 2% ~ (p=0.042 n=15) NatSqr/3-16 59.93n ± 0% 59.64n ± 2% -0.48% (p=0.000 n=15) NatSqr/5-16 88.05n ± 0% 87.89n ± 3% ~ (p=0.066 n=15) NatSqr/8-16 127.7n ± 0% 126.9n ± 3% -0.63% (p=0.000 n=15) NatSqr/10-16 170.4n ± 0% 169.7n ± 3% ~ (p=0.004 n=15) NatSqr/20-16 388.8n ± 0% 392.9n ± 3% ~ (p=0.123 n=15) NatSqr/30-16 635.2n ± 0% 641.7n ± 3% ~ (p=0.123 n=15) NatSqr/50-16 1.304µ ± 1% 1.314µ ± 3% ~ (p=0.927 n=15) NatSqr/80-16 2.709µ ± 1% 2.899µ ± 4% +7.01% (p=0.000 n=15) NatSqr/100-16 3.885µ ± 0% 3.981µ ± 4% ~ (p=0.123 n=15) NatSqr/200-16 13.29µ ± 2% 12.14µ ± 4% -8.67% (p=0.000 n=15) NatSqr/300-16 23.39µ ± 0% 22.51µ ± 3% -3.78% (p=0.000 n=15) NatSqr/500-16 58.13µ ± 1% 50.56µ ± 2% -13.02% (p=0.000 n=15) NatSqr/800-16 118.4µ ± 1% 107.6µ ± 2% -9.11% (p=0.000 n=15) NatSqr/1000-16 172.7µ ± 1% 151.8µ ± 2% -12.11% (p=0.000 n=15) NatSqr/10000-16 6.065m ± 1% 5.757m ± 1% -5.08% (p=0.000 n=15) NatSqr/100000-16 240.9m ± 0% 228.1m ± 0% -5.32% (p=0.000 n=15) geomean 8.601µ 8.453µ -1.71% goos: linux goarch: amd64 pkg: math/big cpu: AMD Ryzen 9 7950X 16-Core Processor │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-32 11.11n ± 0% 11.11n ± 1% ~ (p=0.532 n=15) Div/40/20-32 11.08n ± 1% 11.11n ± 0% ~ (p=0.815 n=15) Div/100/50-32 16.81n ± 0% 16.84n ± 29% ~ (p=0.020 n=15) Div/200/100-32 73.91n ± 0% 76.85n ± 11% +3.98% (p=0.000 n=15) Div/400/200-32 87.35n ± 0% 88.91n ± 34% +1.79% (p=0.000 n=15) Div/1000/500-32 169.3n ± 1% 168.9n ± 1% ~ (p=0.049 n=15) Div/2000/1000-32 369.3n ± 0% 369.0n ± 0% ~ (p=0.108 n=15) Div/20000/10000-32 15.92µ ± 0% 13.55µ ± 2% -14.91% (p=0.000 n=15) Div/200000/100000-32 491.4µ ± 0% 482.4µ ± 1% -1.84% (p=0.000 n=15) Div/2000000/1000000-32 20.09m ± 0% 19.96m ± 0% -0.69% (p=0.000 n=15) Div/20000000/10000000-32 756.5m ± 0% 755.5m ± 0% ~ (p=0.089 n=15) NatMul/10-32 125.4n ± 5% 124.8n ± 
1% ~ (p=0.588 n=15) NatMul/100-32 2.952µ ± 3% 2.969µ ± 0% ~ (p=0.237 n=15) NatMul/1000-32 120.7µ ± 0% 121.1µ ± 0% +0.30% (p=0.000 n=15) NatMul/10000-32 4.845m ± 0% 4.839m ± 1% ~ (p=0.653 n=15) NatMul/100000-32 173.3m ± 0% 173.3m ± 0% ~ (p=0.838 n=15) NatSqr/1-32 31.18n ± 23% 32.08n ± 2% ~ (p=0.015 n=15) NatSqr/2-32 57.22n ± 28% 58.88n ± 2% ~ (p=0.054 n=15) NatSqr/3-32 61.34n ± 18% 64.33n ± 2% ~ (p=0.237 n=15) NatSqr/5-32 72.47n ± 17% 79.81n ± 3% ~ (p=0.067 n=15) NatSqr/8-32 83.26n ± 26% 100.10n ± 3% ~ (p=0.016 n=15) NatSqr/10-32 87.31n ± 43% 125.50n ± 2% ~ (p=0.003 n=15) NatSqr/20-32 193.5n ± 25% 244.4n ± 13% ~ (p=0.002 n=15) NatSqr/30-32 323.9n ± 17% 380.9n ± 6% ~ (p=0.003 n=15) NatSqr/50-32 713.4n ± 9% 761.7n ± 8% ~ (p=0.419 n=15) NatSqr/80-32 1.486µ ± 7% 1.609µ ± 5% +8.28% (p=0.000 n=15) NatSqr/100-32 2.115µ ± 9% 2.253µ ± 1% ~ (p=0.104 n=15) NatSqr/200-32 7.201µ ± 4% 6.610µ ± 1% -8.21% (p=0.000 n=15) NatSqr/300-32 13.08µ ± 2% 12.37µ ± 1% -5.41% (p=0.000 n=15) NatSqr/500-32 32.56µ ± 2% 27.83µ ± 2% -14.52% (p=0.000 n=15) NatSqr/800-32 66.83µ ± 3% 59.59µ ± 1% -10.83% (p=0.000 n=15) NatSqr/1000-32 98.09µ ± 1% 83.59µ ± 1% -14.78% (p=0.000 n=15) NatSqr/10000-32 3.445m ± 1% 3.245m ± 0% -5.81% (p=0.000 n=15) NatSqr/100000-32 137.3m ± 0% 127.0m ± 0% -7.54% (p=0.000 n=15) geomean 4.897µ 4.972µ +1.52% goos: linux goarch: arm64 pkg: math/big │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-16 15.26n ± 2% 15.14n ± 1% ~ (p=0.212 n=15) Div/40/20-16 15.22n ± 1% 15.16n ± 0% ~ (p=0.190 n=15) Div/100/50-16 26.53n ± 2% 26.42n ± 0% -0.41% (p=0.000 n=15) Div/200/100-16 124.3n ± 0% 124.0n ± 0% ~ (p=0.704 n=15) Div/400/200-16 142.4n ± 0% 141.8n ± 0% ~ (p=0.074 n=15) Div/1000/500-16 262.0n ± 1% 261.3n ± 1% ~ (p=0.046 n=15) Div/2000/1000-16 532.6n ± 0% 532.5n ± 1% ~ (p=0.798 n=15) Div/20000/10000-16 22.27µ ± 0% 22.88µ ± 0% +2.73% (p=0.000 n=15) Div/200000/100000-16 890.4µ ± 0% 902.8µ ± 0% +1.39% (p=0.000 n=15) Div/2000000/1000000-16 35.03m ± 0% 35.10m ± 0% ~ (p=0.305 n=15) 
Div/20000000/10000000-16 1.380 ± 0% 1.385 ± 0% ~ (p=0.019 n=15) NatMul/10-16 177.6n ± 1% 175.6n ± 3% ~ (p=0.480 n=15) NatMul/100-16 5.675µ ± 0% 5.669µ ± 1% ~ (p=0.705 n=15) NatMul/1000-16 224.3µ ± 0% 224.6µ ± 0% ~ (p=0.653 n=15) NatMul/10000-16 8.735m ± 0% 8.739m ± 0% ~ (p=0.567 n=15) NatMul/100000-16 331.6m ± 0% 331.6m ± 1% ~ (p=0.412 n=15) NatSqr/1-16 43.69n ± 2% 42.77n ± 6% ~ (p=0.383 n=15) NatSqr/2-16 65.26n ± 2% 63.91n ± 5% ~ (p=0.285 n=15) NatSqr/3-16 73.95n ± 1% 72.25n ± 6% ~ (p=0.198 n=15) NatSqr/5-16 95.06n ± 1% 94.21n ± 3% ~ (p=0.721 n=15) NatSqr/8-16 155.5n ± 1% 153.4n ± 4% ~ (p=0.170 n=15) NatSqr/10-16 175.4n ± 1% 174.0n ± 2% ~ (p=0.271 n=15) NatSqr/20-16 360.8n ± 0% 358.5n ± 2% ~ (p=0.170 n=15) NatSqr/30-16 584.7n ± 0% 582.9n ± 1% ~ (p=0.170 n=15) NatSqr/50-16 1.323µ ± 0% 1.322µ ± 0% ~ (p=0.627 n=15) NatSqr/80-16 2.916µ ± 0% 2.674µ ± 0% -8.30% (p=0.000 n=15) NatSqr/100-16 4.365µ ± 0% 3.802µ ± 0% -12.90% (p=0.000 n=15) NatSqr/200-16 16.42µ ± 0% 11.29µ ± 0% -31.26% (p=0.000 n=15) NatSqr/300-16 28.07µ ± 0% 22.83µ ± 0% -18.68% (p=0.000 n=15) NatSqr/500-16 76.30µ ± 0% 50.06µ ± 0% -34.39% (p=0.000 n=15) NatSqr/800-16 147.5µ ± 0% 101.2µ ± 1% -31.41% (p=0.000 n=15) NatSqr/1000-16 228.6µ ± 0% 149.5µ ± 0% -34.61% (p=0.000 n=15) NatSqr/10000-16 7.417m ± 0% 6.025m ± 0% -18.76% (p=0.000 n=15) NatSqr/100000-16 309.2m ± 0% 214.9m ± 0% -30.50% (p=0.000 n=15) geomean 8.559µ 7.906µ -7.63% goos: darwin goarch: arm64 pkg: math/big cpu: Apple M3 Pro │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-12 9.577n ± 6% 9.473n ± 5% ~ (p=0.384 n=15) Div/40/20-12 9.480n ± 1% 9.430n ± 1% ~ (p=0.019 n=15) Div/100/50-12 14.82n ± 0% 14.82n ± 0% ~ (p=0.845 n=15) Div/200/100-12 83.94n ± 1% 84.35n ± 4% ~ (p=0.512 n=15) Div/400/200-12 102.7n ± 1% 102.9n ± 0% ~ (p=0.845 n=15) Div/1000/500-12 185.3n ± 1% 181.9n ± 1% -1.83% (p=0.000 n=15) Div/2000/1000-12 397.0n ± 1% 396.7n ± 0% ~ (p=0.959 n=15) Div/20000/10000-12 14.05µ ± 0% 13.70µ ± 1% ~ (p=0.002 n=15) Div/200000/100000-12 529.4µ ± 3% 
526.7µ ± 2% ~ (p=0.967 n=15) Div/2000000/1000000-12 20.05m ± 0% 20.05m ± 0% ~ (p=0.653 n=15) Div/20000000/10000000-12 788.2m ± 1% 789.0m ± 1% ~ (p=0.412 n=15) NatMul/10-12 79.95n ± 1% 80.87n ± 1% +1.15% (p=0.000 n=15) NatMul/100-12 2.973µ ± 0% 2.986µ ± 2% ~ (p=0.051 n=15) NatMul/1000-12 122.6µ ± 5% 123.0µ ± 1% ~ (p=0.783 n=15) NatMul/10000-12 4.990m ± 1% 5.000m ± 1% ~ (p=0.653 n=15) NatMul/100000-12 185.3m ± 3% 190.3m ± 1% ~ (p=0.089 n=15) NatSqr/1-12 11.84n ± 1% 11.88n ± 1% ~ (p=0.735 n=15) NatSqr/2-12 21.01n ± 1% 21.44n ± 6% ~ (p=0.039 n=15) NatSqr/3-12 25.59n ± 0% 26.74n ± 9% +4.49% (p=0.000 n=15) NatSqr/5-12 36.78n ± 0% 37.04n ± 1% +0.71% (p=0.000 n=15) NatSqr/8-12 63.09n ± 3% 63.22n ± 1% ~ (p=0.846 n=15) NatSqr/10-12 79.98n ± 0% 79.78n ± 0% ~ (p=0.100 n=15) NatSqr/20-12 174.0n ± 0% 175.5n ± 1% ~ (p=0.361 n=15) NatSqr/30-12 290.0n ± 0% 291.4n ± 0% ~ (p=0.002 n=15) NatSqr/50-12 655.2n ± 4% 658.1n ± 0% ~ (p=0.060 n=15) NatSqr/80-12 1.506µ ± 0% 1.397µ ± 5% -7.24% (p=0.000 n=15) NatSqr/100-12 2.273µ ± 0% 2.005µ ± 5% -11.79% (p=0.000 n=15) NatSqr/200-12 8.833µ ± 6% 6.109µ ± 0% -30.84% (p=0.000 n=15) NatSqr/300-12 15.15µ ± 4% 12.37µ ± 0% -18.34% (p=0.000 n=15) NatSqr/500-12 41.89µ ± 0% 27.70µ ± 1% -33.88% (p=0.000 n=15) NatSqr/800-12 80.72µ ± 0% 56.40µ ± 0% -30.12% (p=0.000 n=15) NatSqr/1000-12 127.06µ ± 1% 84.06µ ± 1% -33.84% (p=0.000 n=15) NatSqr/10000-12 4.130m ± 0% 3.390m ± 0% -17.91% (p=0.000 n=15) NatSqr/100000-12 173.2m ± 0% 131.2m ± 6% -24.25% (p=0.000 n=15) geomean 4.489µ 4.189µ -6.68% Change-Id: Iaf65fd85457b003ebf07a787c875cda321b40cc9 Reviewed-on: https://go-review.googlesource.com/c/go/+/652058 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Griesemer <gri@google.com> Reviewed-by: Alan Donovan <adonovan@google.com> Auto-Submit: Russ Cox <rsc@golang.org>
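To make the role of the squaring thresholds concrete, here is a minimal sketch of size-based algorithm selection. The constant values come from the commit message above; the selection order and the algorithm names returned by the hypothetical sqrAlgorithm helper are assumptions about how such thresholds are typically applied, not a copy of the math/big code.

```go
package main

import "fmt"

// Threshold values as listed in the commit message (sizes are in words).
const (
	karatsubaThreshold    = 40 // unchanged
	basicSqrThreshold     = 12 // was 20
	karatsubaSqrThreshold = 80 // was 260
	divRecursiveThreshold = 40 // was 100
)

// sqrAlgorithm is a hypothetical helper that reports which squaring
// strategy an n-word operand would get under the assumed dispatch order:
// plain multiplication for tiny inputs, a dedicated quadratic squaring
// loop for mid-sized inputs, and Karatsuba squaring above that.
func sqrAlgorithm(n int) string {
	switch {
	case n < basicSqrThreshold:
		return "basicMul"
	case n < karatsubaSqrThreshold:
		return "basicSqr"
	default:
		return "karatsubaSqr"
	}
}

func main() {
	for _, n := range []int{8, 12, 50, 80, 200, 1000} {
		fmt.Printf("n=%d words: %s\n", n, sqrAlgorithm(n))
	}
}
```

Under this reading, lowering karatsubaSqrThreshold from 260 to 80 moves operands of 80 to 259 words onto the Karatsuba path, which is consistent with the NatSqr/80 through NatSqr/200 rows being among those that change in the tables above.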

The old Karatsuba implementation only operated on lengths that are a power of two times a number smaller than karatsubaThreshold. For example, when karatsubaThreshold = 40, multiplying a pair of 99-word numbers runs karatsuba on the low 96 (= 24<<2) words and then has to fix up the answer to include the high 3 words of each.

I suspect this requirement was needed to make the analysis of how many temporary words to reserve easier, back when the answer was 3*n and depended on exactly halving the size at each Karatsuba step.

Now that we have the more flexible temporary allocation stack, we can change Karatsuba to accept operands of odd length. Doing so avoids most of the fixup that the old approach required. For example, multiplying a pair of 99-word numbers runs karatsuba on all 99 words now.

This is simpler and about the same speed or, for large cases, faster.

goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) CPU @ 3.10GHz
│ old │ new │
│ sec/op │ sec/op vs base │
GCD10x10/WithoutXY-16 99.62n ± 3% 99.10n ± 3% ~ (p=0.009 n=15)
GCD10x10/WithXY-16 243.4n ± 1% 245.2n ± 1% ~ (p=0.009 n=15)
GCD100x100/WithoutXY-16 921.9n ± 1% 919.2n ± 1% ~ (p=0.076 n=15)
GCD100x100/WithXY-16 1.527µ ± 1% 1.526µ ± 0% ~ (p=0.813 n=15)
GCD1000x1000/WithoutXY-16 9.704µ ± 1% 9.696µ ± 0% ~ (p=0.532 n=15)
GCD1000x1000/WithXY-16 14.03µ ± 1% 13.96µ ± 0% ~ (p=0.014 n=15)
GCD10000x10000/WithoutXY-16 206.5µ ± 2% 206.5µ ± 0% ~ (p=0.967 n=15)
GCD10000x10000/WithXY-16 398.0µ ± 1% 397.4µ ± 0% ~ (p=0.683 n=15)
Div/20/10-16 22.22n ± 0% 22.23n ± 0% ~ (p=0.105 n=15)
Div/40/20-16 22.23n ± 0% 22.23n ± 0% ~ (p=0.307 n=15)
Div/100/50-16 55.47n ± 0% 55.47n ± 0% ~ (p=0.573 n=15)
Div/200/100-16 174.9n ± 1% 174.6n ± 1% ~ (p=0.814 n=15)
Div/400/200-16 209.5n ± 1% 210.5n ± 1% ~ (p=0.454 n=15)
Div/1000/500-16 379.9n ± 0% 383.5n ± 2% ~ (p=0.123 n=15)
Div/2000/1000-16 780.1n ± 0% 784.6n ± 1% +0.58% (p=0.000 n=15)
Div/20000/10000-16 25.22µ ± 1% 25.15µ ± 0% ~ (p=0.213 n=15)
Div/200000/100000-16 921.8µ ± 1% 926.1µ 
± 0% ~ (p=0.009 n=15) Div/2000000/1000000-16 37.91m ± 0% 35.63m ± 0% -6.02% (p=0.000 n=15) Div/20000000/10000000-16 1.378 ± 0% 1.336 ± 0% -3.03% (p=0.000 n=15) NatMul/10-16 166.8n ± 4% 168.9n ± 3% ~ (p=0.008 n=15) NatMul/100-16 5.519µ ± 2% 5.548µ ± 4% ~ (p=0.032 n=15) NatMul/1000-16 230.4µ ± 1% 220.2µ ± 1% -4.43% (p=0.000 n=15) NatMul/10000-16 8.569m ± 1% 8.640m ± 1% ~ (p=0.005 n=15) NatMul/100000-16 376.5m ± 1% 334.1m ± 0% -11.26% (p=0.000 n=15) NatSqr/1-16 27.85n ± 5% 28.60n ± 2% ~ (p=0.123 n=15) NatSqr/2-16 47.99n ± 2% 48.84n ± 1% ~ (p=0.008 n=15) NatSqr/3-16 59.41n ± 2% 60.87n ± 2% +2.46% (p=0.001 n=15) NatSqr/5-16 87.27n ± 2% 89.31n ± 3% ~ (p=0.087 n=15) NatSqr/8-16 124.6n ± 3% 128.9n ± 3% ~ (p=0.006 n=15) NatSqr/10-16 166.3n ± 3% 172.7n ± 3% ~ (p=0.002 n=15) NatSqr/20-16 385.2n ± 2% 394.7n ± 3% ~ (p=0.036 n=15) NatSqr/30-16 622.7n ± 3% 642.9n ± 3% ~ (p=0.032 n=15) NatSqr/50-16 1.274µ ± 3% 1.323µ ± 4% ~ (p=0.003 n=15) NatSqr/80-16 2.606µ ± 4% 2.714µ ± 4% ~ (p=0.044 n=15) NatSqr/100-16 3.731µ ± 4% 3.871µ ± 4% ~ (p=0.038 n=15) NatSqr/200-16 12.99µ ± 2% 13.09µ ± 3% ~ (p=0.838 n=15) NatSqr/300-16 22.87µ ± 2% 23.25µ ± 2% ~ (p=0.285 n=15) NatSqr/500-16 58.43µ ± 1% 58.25µ ± 2% ~ (p=0.345 n=15) NatSqr/800-16 115.3µ ± 3% 116.2µ ± 3% ~ (p=0.126 n=15) NatSqr/1000-16 173.9µ ± 1% 174.3µ ± 1% ~ (p=0.935 n=15) NatSqr/10000-16 6.133m ± 2% 6.034m ± 1% -1.62% (p=0.000 n=15) NatSqr/100000-16 253.8m ± 1% 241.5m ± 0% -4.87% (p=0.000 n=15) geomean 7.745µ 7.760µ +0.19% goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz │ old │ new │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-88 62.17n ± 4% 61.44n ± 0% -1.17% (p=0.000 n=15) GCD10x10/WithXY-88 173.4n ± 2% 172.4n ± 4% ~ (p=0.615 n=15) GCD100x100/WithoutXY-88 584.0n ± 1% 582.9n ± 0% ~ (p=0.009 n=15) GCD100x100/WithXY-88 1.098µ ± 1% 1.091µ ± 2% ~ (p=0.002 n=15) GCD1000x1000/WithoutXY-88 6.055µ ± 0% 6.049µ ± 0% ~ (p=0.007 n=15) GCD1000x1000/WithXY-88 9.430µ ± 0% 9.417µ ± 1% ~ (p=0.123 n=15) 
GCD10000x10000/WithoutXY-88 153.4µ ± 2% 149.0µ ± 2% -2.85% (p=0.000 n=15) GCD10000x10000/WithXY-88 350.6µ ± 3% 349.0µ ± 2% ~ (p=0.126 n=15) Div/20/10-88 13.12n ± 0% 13.12n ± 1% 0.00% (p=0.042 n=15) Div/40/20-88 13.12n ± 0% 13.13n ± 0% ~ (p=0.004 n=15) Div/100/50-88 25.49n ± 0% 25.49n ± 0% ~ (p=0.452 n=15) Div/200/100-88 115.7n ± 2% 113.8n ± 2% ~ (p=0.212 n=15) Div/400/200-88 135.0n ± 1% 136.1n ± 1% ~ (p=0.005 n=15) Div/1000/500-88 257.5n ± 1% 259.9n ± 1% ~ (p=0.004 n=15) Div/2000/1000-88 567.5n ± 1% 572.4n ± 2% ~ (p=0.616 n=15) Div/20000/10000-88 25.65µ ± 0% 25.77µ ± 1% ~ (p=0.032 n=15) Div/200000/100000-88 777.4µ ± 1% 754.3µ ± 1% -2.97% (p=0.000 n=15) Div/2000000/1000000-88 33.66m ± 0% 31.37m ± 0% -6.81% (p=0.000 n=15) Div/20000000/10000000-88 1.320 ± 0% 1.266 ± 0% -4.04% (p=0.000 n=15) NatMul/10-88 151.9n ± 7% 143.3n ± 7% ~ (p=0.878 n=15) NatMul/100-88 4.418µ ± 2% 4.337µ ± 3% ~ (p=0.512 n=15) NatMul/1000-88 206.8µ ± 1% 189.8µ ± 1% -8.25% (p=0.000 n=15) NatMul/10000-88 8.531m ± 1% 8.095m ± 0% -5.12% (p=0.000 n=15) NatMul/100000-88 298.9m ± 0% 260.5m ± 1% -12.85% (p=0.000 n=15) NatSqr/1-88 27.55n ± 6% 28.25n ± 7% ~ (p=0.024 n=15) NatSqr/2-88 44.71n ± 6% 46.21n ± 9% ~ (p=0.024 n=15) NatSqr/3-88 55.44n ± 4% 58.41n ± 10% ~ (p=0.126 n=15) NatSqr/5-88 80.71n ± 5% 81.41n ± 5% ~ (p=0.032 n=15) NatSqr/8-88 115.7n ± 4% 115.4n ± 5% ~ (p=0.814 n=15) NatSqr/10-88 147.4n ± 4% 147.3n ± 4% ~ (p=0.505 n=15) NatSqr/20-88 337.8n ± 3% 337.3n ± 4% ~ (p=0.814 n=15) NatSqr/30-88 556.9n ± 3% 557.6n ± 4% ~ (p=0.814 n=15) NatSqr/50-88 1.208µ ± 4% 1.208µ ± 3% ~ (p=0.910 n=15) NatSqr/80-88 2.591µ ± 3% 2.581µ ± 3% ~ (p=0.705 n=15) NatSqr/100-88 3.870µ ± 3% 3.858µ ± 3% ~ (p=0.846 n=15) NatSqr/200-88 14.43µ ± 3% 14.28µ ± 2% ~ (p=0.383 n=15) NatSqr/300-88 24.68µ ± 2% 24.49µ ± 2% ~ (p=0.624 n=15) NatSqr/500-88 66.27µ ± 1% 66.18µ ± 1% ~ (p=0.735 n=15) NatSqr/800-88 128.7µ ± 1% 127.4µ ± 1% ~ (p=0.050 n=15) NatSqr/1000-88 198.7µ ± 1% 197.7µ ± 1% ~ (p=0.229 n=15) NatSqr/10000-88 6.582m ± 1% 6.426m ± 
1% -2.37% (p=0.000 n=15) NatSqr/100000-88 274.3m ± 0% 267.3m ± 0% -2.57% (p=0.000 n=15) geomean 6.518µ 6.438µ -1.22% goos: linux goarch: arm64 pkg: math/big │ old │ new │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-16 61.70n ± 1% 61.32n ± 1% ~ (p=0.361 n=15) GCD10x10/WithXY-16 217.3n ± 1% 217.0n ± 1% ~ (p=0.395 n=15) GCD100x100/WithoutXY-16 569.7n ± 0% 572.6n ± 2% ~ (p=0.213 n=15) GCD100x100/WithXY-16 1.241µ ± 1% 1.236µ ± 1% ~ (p=0.157 n=15) GCD1000x1000/WithoutXY-16 5.558µ ± 0% 5.566µ ± 0% ~ (p=0.228 n=15) GCD1000x1000/WithXY-16 9.319µ ± 0% 9.326µ ± 0% ~ (p=0.233 n=15) GCD10000x10000/WithoutXY-16 126.4µ ± 2% 128.7µ ± 3% ~ (p=0.081 n=15) GCD10000x10000/WithXY-16 279.3µ ± 0% 278.3µ ± 5% ~ (p=0.187 n=15) Div/20/10-16 15.12n ± 1% 15.21n ± 1% ~ (p=0.490 n=15) Div/40/20-16 15.11n ± 0% 15.23n ± 1% ~ (p=0.107 n=15) Div/100/50-16 26.53n ± 0% 26.50n ± 0% ~ (p=0.299 n=15) Div/200/100-16 123.7n ± 0% 124.0n ± 0% ~ (p=0.086 n=15) Div/400/200-16 142.5n ± 0% 142.4n ± 0% ~ (p=0.039 n=15) Div/1000/500-16 259.9n ± 1% 261.2n ± 1% ~ (p=0.044 n=15) Div/2000/1000-16 539.4n ± 1% 532.3n ± 1% -1.32% (p=0.001 n=15) Div/20000/10000-16 22.43µ ± 0% 22.32µ ± 0% -0.49% (p=0.000 n=15) Div/200000/100000-16 898.3µ ± 0% 889.6µ ± 0% -0.96% (p=0.000 n=15) Div/2000000/1000000-16 38.37m ± 0% 35.11m ± 0% -8.49% (p=0.000 n=15) Div/20000000/10000000-16 1.449 ± 0% 1.384 ± 0% -4.48% (p=0.000 n=15) NatMul/10-16 182.0n ± 1% 177.8n ± 1% -2.31% (p=0.000 n=15) NatMul/100-16 5.537µ ± 0% 5.693µ ± 0% +2.82% (p=0.000 n=15) NatMul/1000-16 229.9µ ± 0% 224.8µ ± 0% -2.24% (p=0.000 n=15) NatMul/10000-16 8.985m ± 0% 8.751m ± 0% -2.61% (p=0.000 n=15) NatMul/100000-16 371.1m ± 0% 331.5m ± 0% -10.66% (p=0.000 n=15) NatSqr/1-16 46.77n ± 6% 42.76n ± 1% -8.57% (p=0.000 n=15) NatSqr/2-16 66.99n ± 4% 63.62n ± 1% -5.03% (p=0.000 n=15) NatSqr/3-16 76.79n ± 4% 73.42n ± 1% ~ (p=0.007 n=15) NatSqr/5-16 99.00n ± 3% 95.35n ± 1% -3.69% (p=0.000 n=15) NatSqr/8-16 160.0n ± 3% 155.1n ± 1% -3.06% (p=0.001 n=15) NatSqr/10-16 178.4n ± 2% 
175.9n ± 0% -1.40% (p=0.001 n=15) NatSqr/20-16 361.9n ± 2% 361.3n ± 0% ~ (p=0.083 n=15) NatSqr/30-16 584.7n ± 0% 586.8n ± 0% +0.36% (p=0.000 n=15) NatSqr/50-16 1.327µ ± 0% 1.329µ ± 0% ~ (p=0.349 n=15) NatSqr/80-16 2.893µ ± 1% 2.925µ ± 0% +1.11% (p=0.000 n=15) NatSqr/100-16 4.330µ ± 1% 4.381µ ± 0% +1.18% (p=0.000 n=15) NatSqr/200-16 16.25µ ± 1% 16.43µ ± 0% +1.07% (p=0.000 n=15) NatSqr/300-16 27.85µ ± 1% 28.06µ ± 0% +0.77% (p=0.000 n=15) NatSqr/500-16 76.01µ ± 0% 76.34µ ± 0% ~ (p=0.002 n=15) NatSqr/800-16 146.8µ ± 0% 148.1µ ± 0% +0.83% (p=0.000 n=15) NatSqr/1000-16 228.2µ ± 0% 228.6µ ± 0% ~ (p=0.123 n=15) NatSqr/10000-16 7.524m ± 0% 7.426m ± 0% -1.31% (p=0.000 n=15) NatSqr/100000-16 316.7m ± 0% 309.2m ± 0% -2.36% (p=0.000 n=15) geomean 7.264µ 7.172µ -1.27% goos: darwin goarch: arm64 pkg: math/big cpu: Apple M3 Pro │ old │ new │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-12 32.61n ± 1% 32.42n ± 1% ~ (p=0.021 n=15) GCD10x10/WithXY-12 87.70n ± 1% 88.42n ± 1% ~ (p=0.010 n=15) GCD100x100/WithoutXY-12 305.9n ± 0% 306.4n ± 0% ~ (p=0.003 n=15) GCD100x100/WithXY-12 560.3n ± 2% 556.6n ± 1% ~ (p=0.018 n=15) GCD1000x1000/WithoutXY-12 3.509µ ± 2% 3.464µ ± 1% ~ (p=0.145 n=15) GCD1000x1000/WithXY-12 5.347µ ± 2% 5.372µ ± 1% ~ (p=0.046 n=15) GCD10000x10000/WithoutXY-12 73.75µ ± 1% 73.99µ ± 1% ~ (p=0.004 n=15) GCD10000x10000/WithXY-12 148.4µ ± 0% 147.8µ ± 1% ~ (p=0.076 n=15) Div/20/10-12 9.481n ± 0% 9.462n ± 1% ~ (p=0.631 n=15) Div/40/20-12 9.457n ± 0% 9.462n ± 1% ~ (p=0.798 n=15) Div/100/50-12 14.91n ± 0% 14.79n ± 1% -0.80% (p=0.000 n=15) Div/200/100-12 84.56n ± 1% 84.60n ± 1% ~ (p=0.271 n=15) Div/400/200-12 103.8n ± 0% 102.8n ± 0% -0.96% (p=0.000 n=15) Div/1000/500-12 181.3n ± 1% 184.2n ± 2% ~ (p=0.091 n=15) Div/2000/1000-12 397.5n ± 0% 397.4n ± 0% ~ (p=0.299 n=15) Div/20000/10000-12 14.04µ ± 1% 13.99µ ± 0% ~ (p=0.221 n=15) Div/200000/100000-12 523.1µ ± 0% 514.0µ ± 3% ~ (p=0.775 n=15) Div/2000000/1000000-12 21.58m ± 0% 20.01m ± 1% -7.29% (p=0.000 n=15) Div/20000000/10000000-12 
813.5m ± 0% 796.2m ± 1% -2.13% (p=0.000 n=15) NatMul/10-12 80.46n ± 1% 80.02n ± 1% ~ (p=0.063 n=15) NatMul/100-12 2.904µ ± 0% 2.979µ ± 1% +2.58% (p=0.000 n=15) NatMul/1000-12 127.8µ ± 0% 122.3µ ± 0% -4.28% (p=0.000 n=15) NatMul/10000-12 5.141m ± 0% 4.975m ± 1% -3.23% (p=0.000 n=15) NatMul/100000-12 208.8m ± 0% 189.6m ± 3% -9.21% (p=0.000 n=15) NatSqr/1-12 11.90n ± 1% 11.76n ± 1% ~ (p=0.059 n=15) NatSqr/2-12 21.33n ± 1% 21.12n ± 0% ~ (p=0.063 n=15) NatSqr/3-12 26.05n ± 1% 25.79n ± 0% ~ (p=0.002 n=15) NatSqr/5-12 37.31n ± 0% 36.98n ± 1% ~ (p=0.008 n=15) NatSqr/8-12 63.07n ± 0% 62.75n ± 1% ~ (p=0.061 n=15) NatSqr/10-12 79.48n ± 0% 79.59n ± 0% ~ (p=0.455 n=15) NatSqr/20-12 173.1n ± 0% 173.2n ± 1% ~ (p=0.518 n=15) NatSqr/30-12 288.6n ± 1% 289.2n ± 0% ~ (p=0.030 n=15) NatSqr/50-12 653.3n ± 0% 653.3n ± 0% ~ (p=0.361 n=15) NatSqr/80-12 1.492µ ± 0% 1.496µ ± 0% ~ (p=0.018 n=15) NatSqr/100-12 2.270µ ± 1% 2.270µ ± 0% ~ (p=0.326 n=15) NatSqr/200-12 8.776µ ± 1% 8.784µ ± 1% ~ (p=0.083 n=15) NatSqr/300-12 15.07µ ± 0% 15.09µ ± 0% ~ (p=0.455 n=15) NatSqr/500-12 41.71µ ± 0% 41.77µ ± 1% ~ (p=0.305 n=15) NatSqr/800-12 80.77µ ± 1% 80.59µ ± 0% ~ (p=0.113 n=15) NatSqr/1000-12 126.4µ ± 1% 126.5µ ± 0% ~ (p=0.683 n=15) NatSqr/10000-12 4.204m ± 0% 4.119m ± 0% -2.02% (p=0.000 n=15) NatSqr/100000-12 177.0m ± 0% 172.9m ± 0% -2.31% (p=0.000 n=15) geomean 3.790µ 3.757µ -0.87% Change-Id: Ifc7a9b61f678df216690511ac8bb9143189a795e Reviewed-on: https://go-review.googlesource.com/c/go/+/652057 Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Griesemer <gri@google.com>

In a division, normally the answer to N digits / D digits has N-D digits, but not when N-D is negative. Fix the calculation of the number of digits for the temporary in nat.rem not to be negative. Fixes #72043. Change-Id: Ib9faa430aeb6c5f4c4a730f1ec631d2bf3f7472c Reviewed-on: https://go-review.googlesource.com/c/go/+/655156 Reviewed-by: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
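A minimal sketch of the clamping described above. The helper name quotientLen is hypothetical and is not the code in nat.rem; it only illustrates that the size of the temporary must not go negative. The big.Int call shows the user-visible case where the dividend is shorter than the divisor.

```go
package main

import (
	"fmt"
	"math/big"
)

// quotientLen is a hypothetical helper: the number of digits to reserve
// for the quotient of an n-digit dividend by a d-digit divisor.
// Normally that is n-d, but it is clamped to zero when n < d.
func quotientLen(n, d int) int {
	if n < d {
		return 0
	}
	return n - d
}

func main() {
	fmt.Println(quotientLen(10, 4)) // ordinary case: n-d digits
	fmt.Println(quotientLen(4, 10)) // short dividend: clamped, never negative

	// A dividend shorter than the divisor is returned unchanged as the remainder.
	small := big.NewInt(5)
	large := new(big.Int).Lsh(big.NewInt(1), 200)
	fmt.Println(new(big.Int).Rem(small, large))
}
```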

goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
        |  bench.old  |              bench.new              |
        |   sec/op    |   sec/op     vs base                |
Exp       26.30n ± 0%   12.93n ± 0%  -50.85% (p=0.000 n=10)
ExpGo     26.86n ± 0%   26.92n ± 0%   +0.22% (p=0.000 n=10)
Expm1     16.76n ± 0%   16.75n ± 0%        ~ (p=0.060 n=10)
Exp2      23.05n ± 0%   12.12n ± 0%  -47.42% (p=0.000 n=10)
Exp2Go    23.41n ± 0%   23.47n ± 0%   +0.28% (p=0.000 n=10)
geomean   22.97n        17.54n       -23.64%
goos: linux
goarch: loong64
pkg: math/cmplx
cpu: Loongson-3A6000 @ 2500.00MHz
        |  bench.old  |              bench.new              |
        |   sec/op    |   sec/op     vs base                |
Exp       51.32n ± 0%   35.41n ± 0%  -30.99% (p=0.000 n=10)
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A5000 @ 2500.00MHz
        |  bench.old  |              bench.new              |
        |   sec/op    |   sec/op     vs base                |
Exp       50.27n ± 0%   48.75n ± 1%   -3.01% (p=0.000 n=10)
ExpGo     50.72n ± 0%   50.44n ± 0%   -0.55% (p=0.000 n=10)
Expm1     28.40n ± 0%   28.32n ± 0%        ~ (p=0.360 n=10)
Exp2      50.09n ± 0%   21.49n ± 1%  -57.10% (p=0.000 n=10)
Exp2Go    50.05n ± 0%   49.69n ± 0%   -0.72% (p=0.000 n=10)
geomean   44.85n        37.52n       -16.35%
goos: linux
goarch: loong64
pkg: math/cmplx
cpu: Loongson-3A5000 @ 2500.00MHz
        |  bench.old  |              bench.new              |
        |   sec/op    |   sec/op     vs base                |
Exp       88.56n ± 0%   67.29n ± 0%  -24.03% (p=0.000 n=10)
Change-Id: I89e456d26fc075d83335ee4a31227d2aface5714 Reviewed-on: https://go-review.googlesource.com/c/go/+/653935 Reviewed-by: Junyang Shao <shaojunyang@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>

Test that big.Int.Mul reusing the same target is not allocating temporary garbage during its computation. That code is going to be modified in an upcoming CL. Change-Id: I3ed55c06da030282233c29cd7af2a04f395dc7a2 Reviewed-on: https://go-review.googlesource.com/c/go/+/652056 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com> Auto-Submit: Russ Cox <rsc@golang.org>
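One way such a check can be written is with testing.AllocsPerRun. This is a sketch under assumptions, not the test added by the CL: the helper name mulAllocs and the operand size are illustrative, and the operands are kept small so the multiplication should stay on the schoolbook path.

```go
package main

import (
	"fmt"
	"math/big"
	"testing"
)

// mulAllocs reports the average number of heap allocations made by
// z.Mul(x, y) when z already has enough capacity for the product.
// x and y each occupy the given number of 64-bit words.
func mulAllocs(words int) float64 {
	x := new(big.Int).Lsh(big.NewInt(1), uint(words*64))
	x.Sub(x, big.NewInt(1))
	y := new(big.Int).Set(x)

	z := new(big.Int)
	z.Mul(x, y) // warm up: lets z grow its backing slice once

	return testing.AllocsPerRun(100, func() {
		z.Mul(x, y) // reuses z as the target on every run
	})
}

func main() {
	fmt.Println("allocs per Mul:", mulAllocs(8))
}
```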

No code changes. This CL moves the multiplication (and squaring) code into natmul.go, in preparation for cleaning up Karatsuba and then adding Toom-Cook and FFT-based multiplication. Change-Id: I7f84328284cc4e1ca4da0ebb9f666a5535e8d7f2 Reviewed-on: https://go-review.googlesource.com/c/go/+/652055 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Alan Donovan <adonovan@google.com>

Avoid multiplies when converting base 2, 4, 16 inputs, reducing conversion time from O(N²) to O(N). The Base8 and Base10 code paths should be unmodified, but the base-2,4,16 changes tickle the compiler to generate better (amd64) or worse (arm64) when really it should not. This is described in detail in #71868 and should be ignored for the purposes of this CL. goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) CPU @ 3.10GHz │ old │ new │ │ sec/op │ sec/op vs base │ Scan/10/Base2-16 324.4n ± 0% 258.7n ± 0% -20.25% (p=0.000 n=15) Scan/100/Base2-16 2.376µ ± 0% 1.968µ ± 0% -17.17% (p=0.000 n=15) Scan/1000/Base2-16 23.89µ ± 0% 19.16µ ± 0% -19.80% (p=0.000 n=15) Scan/10000/Base2-16 311.5µ ± 0% 190.4µ ± 0% -38.86% (p=0.000 n=15) Scan/100000/Base2-16 10.508m ± 0% 1.904m ± 0% -81.88% (p=0.000 n=15) Scan/10/Base8-16 138.3n ± 0% 127.9n ± 0% -7.52% (p=0.000 n=15) Scan/100/Base8-16 886.1n ± 0% 790.2n ± 0% -10.82% (p=0.000 n=15) Scan/1000/Base8-16 9.227µ ± 0% 8.234µ ± 0% -10.76% (p=0.000 n=15) Scan/10000/Base8-16 165.8µ ± 0% 155.6µ ± 0% -6.19% (p=0.000 n=15) Scan/100000/Base8-16 9.044m ± 0% 8.935m ± 0% -1.20% (p=0.000 n=15) Scan/10/Base10-16 129.9n ± 0% 120.0n ± 0% -7.62% (p=0.000 n=15) Scan/100/Base10-16 816.3n ± 0% 730.0n ± 0% -10.57% (p=0.000 n=15) Scan/1000/Base10-16 8.518µ ± 0% 7.628µ ± 0% -10.45% (p=0.000 n=15) Scan/10000/Base10-16 158.6µ ± 0% 149.4µ ± 0% -5.80% (p=0.000 n=15) Scan/100000/Base10-16 8.962m ± 0% 8.855m ± 0% -1.20% (p=0.000 n=15) Scan/10/Base16-16 114.5n ± 0% 108.6n ± 0% -5.15% (p=0.000 n=15) Scan/100/Base16-16 648.3n ± 0% 525.0n ± 0% -19.02% (p=0.000 n=15) Scan/1000/Base16-16 7.375µ ± 0% 5.636µ ± 0% -23.58% (p=0.000 n=15) Scan/10000/Base16-16 171.18µ ± 0% 66.99µ ± 0% -60.87% (p=0.000 n=15) Scan/100000/Base16-16 9490.9µ ± 0% 682.8µ ± 0% -92.81% (p=0.000 n=15) geomean 20.11µ 13.69µ -31.94% goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz │ old │ new │ │ sec/op │ sec/op vs base │ Scan/10/Base2-88 275.4n ± 0% 
215.0n ± 0% -21.93% (p=0.000 n=15) Scan/100/Base2-88 1.869µ ± 0% 1.629µ ± 0% -12.84% (p=0.000 n=15) Scan/1000/Base2-88 18.56µ ± 0% 15.81µ ± 0% -14.82% (p=0.000 n=15) Scan/10000/Base2-88 270.0µ ± 0% 157.2µ ± 0% -41.77% (p=0.000 n=15) Scan/100000/Base2-88 11.518m ± 0% 1.571m ± 0% -86.36% (p=0.000 n=15) Scan/10/Base8-88 108.9n ± 0% 106.0n ± 0% -2.66% (p=0.000 n=15) Scan/100/Base8-88 655.2n ± 0% 594.9n ± 0% -9.20% (p=0.000 n=15) Scan/1000/Base8-88 6.467µ ± 0% 5.966µ ± 0% -7.75% (p=0.000 n=15) Scan/10000/Base8-88 151.2µ ± 0% 147.4µ ± 0% -2.53% (p=0.000 n=15) Scan/100000/Base8-88 10.33m ± 0% 10.30m ± 0% -0.25% (p=0.000 n=15) Scan/10/Base10-88 100.20n ± 0% 98.53n ± 0% -1.67% (p=0.000 n=15) Scan/100/Base10-88 596.9n ± 0% 543.3n ± 0% -8.98% (p=0.000 n=15) Scan/1000/Base10-88 5.904µ ± 0% 5.485µ ± 0% -7.10% (p=0.000 n=15) Scan/10000/Base10-88 145.7µ ± 0% 142.0µ ± 0% -2.55% (p=0.000 n=15) Scan/100000/Base10-88 10.26m ± 0% 10.24m ± 0% -0.18% (p=0.000 n=15) Scan/10/Base16-88 90.33n ± 0% 87.60n ± 0% -3.02% (p=0.000 n=15) Scan/100/Base16-88 506.4n ± 0% 437.7n ± 0% -13.57% (p=0.000 n=15) Scan/1000/Base16-88 5.056µ ± 0% 4.007µ ± 0% -20.75% (p=0.000 n=15) Scan/10000/Base16-88 163.35µ ± 0% 65.37µ ± 0% -59.98% (p=0.000 n=15) Scan/100000/Base16-88 11027.2µ ± 0% 735.1µ ± 0% -93.33% (p=0.000 n=15) geomean 17.13µ 11.74µ -31.46% goos: linux goarch: arm64 pkg: math/big │ old │ new │ │ sec/op │ sec/op vs base │ Scan/10/Base2-16 324.7n ± 0% 348.4n ± 0% +7.30% (p=0.000 n=15) Scan/100/Base2-16 2.604µ ± 0% 3.031µ ± 0% +16.40% (p=0.000 n=15) Scan/1000/Base2-16 26.15µ ± 0% 29.94µ ± 0% +14.52% (p=0.000 n=15) Scan/10000/Base2-16 334.3µ ± 0% 298.8µ ± 0% -10.64% (p=0.000 n=15) Scan/100000/Base2-16 10.664m ± 0% 2.991m ± 0% -71.95% (p=0.000 n=15) Scan/10/Base8-16 144.4n ± 1% 162.2n ± 1% +12.33% (p=0.000 n=15) Scan/100/Base8-16 917.2n ± 0% 1084.0n ± 0% +18.19% (p=0.000 n=15) Scan/1000/Base8-16 9.367µ ± 0% 10.901µ ± 0% +16.38% (p=0.000 n=15) Scan/10000/Base8-16 164.2µ ± 0% 181.2µ ± 0% +10.34% (p=0.000 
n=15) Scan/100000/Base8-16 8.871m ± 1% 9.140m ± 0% +3.04% (p=0.000 n=15) Scan/10/Base10-16 134.6n ± 1% 148.3n ± 1% +10.18% (p=0.000 n=15) Scan/100/Base10-16 837.1n ± 0% 986.6n ± 0% +17.86% (p=0.000 n=15) Scan/1000/Base10-16 8.563µ ± 0% 9.936µ ± 0% +16.03% (p=0.000 n=15) Scan/10000/Base10-16 156.5µ ± 1% 171.3µ ± 0% +9.41% (p=0.000 n=15) Scan/100000/Base10-16 8.863m ± 1% 9.011m ± 0% +1.66% (p=0.000 n=15) Scan/10/Base16-16 115.7n ± 2% 129.1n ± 1% +11.58% (p=0.000 n=15) Scan/100/Base16-16 708.6n ± 0% 796.8n ± 0% +12.45% (p=0.000 n=15) Scan/1000/Base16-16 7.314µ ± 0% 7.554µ ± 0% +3.28% (p=0.000 n=15) Scan/10000/Base16-16 149.05µ ± 0% 74.60µ ± 0% -49.95% (p=0.000 n=15) Scan/100000/Base16-16 9091.6µ ± 0% 741.5µ ± 0% -91.84% (p=0.000 n=15) geomean 20.39µ 17.65µ -13.44% goos: darwin goarch: arm64 pkg: math/big cpu: Apple M3 Pro │ old │ new │ │ sec/op │ sec/op vs base │ Scan/10/Base2-12 193.8n ± 2% 157.3n ± 1% -18.83% (p=0.000 n=15) Scan/100/Base2-12 1.445µ ± 2% 1.362µ ± 1% -5.74% (p=0.000 n=15) Scan/1000/Base2-12 14.28µ ± 0% 13.51µ ± 0% -5.42% (p=0.000 n=15) Scan/10000/Base2-12 177.1µ ± 0% 134.6µ ± 0% -24.04% (p=0.000 n=15) Scan/100000/Base2-12 5.429m ± 1% 1.333m ± 0% -75.45% (p=0.000 n=15) Scan/10/Base8-12 75.52n ± 2% 76.09n ± 1% ~ (p=0.010 n=15) Scan/100/Base8-12 528.4n ± 1% 532.1n ± 1% ~ (p=0.003 n=15) Scan/1000/Base8-12 5.423µ ± 1% 5.427µ ± 0% ~ (p=0.183 n=15) Scan/10000/Base8-12 89.26µ ± 1% 89.37µ ± 0% ~ (p=0.237 n=15) Scan/100000/Base8-12 4.543m ± 2% 4.560m ± 1% ~ (p=0.595 n=15) Scan/10/Base10-12 69.87n ± 1% 70.51n ± 0% ~ (p=0.002 n=15) Scan/100/Base10-12 488.4n ± 1% 491.2n ± 0% ~ (p=0.060 n=15) Scan/1000/Base10-12 5.014µ ± 1% 5.008µ ± 0% ~ (p=0.783 n=15) Scan/10000/Base10-12 84.90µ ± 0% 85.10µ ± 0% ~ (p=0.109 n=15) Scan/100000/Base10-12 4.516m ± 1% 4.521m ± 1% ~ (p=0.713 n=15) Scan/10/Base16-12 59.21n ± 1% 57.70n ± 1% -2.55% (p=0.000 n=15) Scan/100/Base16-12 380.0n ± 1% 360.7n ± 1% -5.08% (p=0.000 n=15) Scan/1000/Base16-12 3.775µ ± 0% 3.421µ ± 0% -9.38% (p=0.000 
n=15) Scan/10000/Base16-12 80.62µ ± 0% 34.44µ ± 1% -57.28% (p=0.000 n=15) Scan/100000/Base16-12 4826.4µ ± 2% 450.9µ ± 2% -90.66% (p=0.000 n=15) geomean 11.05µ 8.448µ -23.52% Change-Id: Ifdb2049545f34072aa75cdbb72bed4cf465f0ad7 Reviewed-on: https://go-review.googlesource.com/c/go/+/650640 Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Griesemer <gri@google.com>
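The idea of avoiding multiplies for power-of-two bases can be sketched as follows. This is an illustration only, not the math/big implementation: parseHexWords and toBig are hypothetical helpers, and the sketch handles base 16 with 64-bit words. Each digit is placed with a shift and an OR, so the work is linear in the input length.

```go
package main

import (
	"fmt"
	"math/big"
)

// hexDigit returns the value of one hexadecimal digit and whether c was valid.
func hexDigit(c byte) (uint64, bool) {
	switch {
	case '0' <= c && c <= '9':
		return uint64(c - '0'), true
	case 'a' <= c && c <= 'f':
		return uint64(c-'a') + 10, true
	case 'A' <= c && c <= 'F':
		return uint64(c-'A') + 10, true
	}
	return 0, false
}

// parseHexWords converts a hex string into little-endian 64-bit words.
// Every digit is 4 bits, so it is shifted into place directly; no
// multiplication of the accumulated value is needed.
func parseHexWords(s string) ([]uint64, bool) {
	if s == "" {
		return nil, false
	}
	words := make([]uint64, (len(s)+15)/16)
	for i := 0; i < len(s); i++ {
		d, ok := hexDigit(s[len(s)-1-i]) // walk from the least significant digit
		if !ok {
			return nil, false
		}
		words[i/16] |= d << (4 * uint(i%16))
	}
	return words, true
}

// toBig rebuilds a big.Int from the words, only to cross-check the result.
func toBig(words []uint64) *big.Int {
	z := new(big.Int)
	for i := len(words) - 1; i >= 0; i-- {
		z.Lsh(z, 64)
		z.Or(z, new(big.Int).SetUint64(words[i]))
	}
	return z
}

func main() {
	const s = "123456789abcdef0fedcba9876543210ff"
	words, ok := parseHexWords(s)
	want, _ := new(big.Int).SetString(s, 16)
	fmt.Println(ok, toBig(words).Cmp(want) == 0)
}
```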

Add a few more test cases for scanning (integer conversion), which were helpful in debugging some upcoming changes.
BenchmarkScan currently times converting the value 10**N represented in base B back into []Word form. When B = 10, the text is 1 followed by many zeros, which could hit a "multiply by zero" special case when processing many digit chunks, misrepresenting the actual time required depending on whether that case is optimized. Change the benchmark to use 9**N, which is about as big and will not cause runs of zeros in any of the tested bases.
The benchmark comparison below is not showing faster code, since of course the code is not changing at all here. Instead, it is showing that the new benchmark work is roughly the same size as the old benchmark work.
goos: darwin
goarch: arm64
pkg: math/big
cpu: Apple M3 Pro
                        │     old     │                new                 │
                        │   sec/op    │   sec/op     vs base               │
ScanPi-12                 43.35µ ± 1%   43.59µ ± 1%       ~ (p=0.069 n=15)
Scan/10/Base2-12          202.3n ± 2%   193.7n ± 1%  -4.25% (p=0.000 n=15)
Scan/100/Base2-12         1.512µ ± 3%   1.447µ ± 1%  -4.30% (p=0.000 n=15)
Scan/1000/Base2-12        15.06µ ± 2%   14.33µ ± 0%  -4.83% (p=0.000 n=15)
Scan/10000/Base2-12       188.0µ ± 5%   177.3µ ± 1%  -5.65% (p=0.000 n=15)
Scan/100000/Base2-12      5.814m ± 3%   5.382m ± 1%  -7.43% (p=0.000 n=15)
Scan/10/Base8-12          78.57n ± 2%   75.02n ± 1%  -4.52% (p=0.000 n=15)
Scan/100/Base8-12         548.2n ± 2%   526.8n ± 1%  -3.90% (p=0.000 n=15)
Scan/1000/Base8-12        5.674µ ± 2%   5.421µ ± 0%  -4.46% (p=0.000 n=15)
Scan/10000/Base8-12       94.42µ ± 1%   88.61µ ± 1%  -6.15% (p=0.000 n=15)
Scan/100000/Base8-12      4.906m ± 2%   4.498m ± 3%  -8.31% (p=0.000 n=15)
Scan/10/Base10-12         73.42n ± 1%   69.56n ± 0%  -5.26% (p=0.000 n=15)
Scan/100/Base10-12        511.9n ± 1%   488.2n ± 0%  -4.63% (p=0.000 n=15)
Scan/1000/Base10-12       5.254µ ± 2%   5.009µ ± 0%  -4.66% (p=0.000 n=15)
Scan/10000/Base10-12      90.22µ ± 2%   84.52µ ± 0%  -6.32% (p=0.000 n=15)
Scan/100000/Base10-12     4.842m ± 3%   4.471m ± 3%  -7.65% (p=0.000 n=15)
Scan/10/Base16-12         62.28n ± 1%   58.70n ± 1%  -5.75% (p=0.000 n=15)
Scan/100/Base16-12        398.6n ± 0%   377.9n ± 1%  -5.19% (p=0.000 n=15)
Scan/1000/Base16-12       4.108µ ± 1%   3.782µ ± 0%  -7.94% (p=0.000 n=15)
Scan/10000/Base16-12      83.78µ ± 2%   80.51µ ± 1%  -3.90% (p=0.000 n=15)
Scan/100000/Base16-12     5.080m ± 3%   4.698m ± 3%  -7.53% (p=0.000 n=15)
geomean                   12.41µ        11.74µ       -5.36%
Change-Id: If3ce290ecc7f38672f11b42fd811afb53dee665d Reviewed-on: https://go-review.googlesource.com/c/go/+/650639 Reviewed-by: Alan Donovan <adonovan@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org>
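The difference between 10**N and 9**N as benchmark inputs can be seen by counting zero digits. The sketch below is not part of the CL; longestZeroRun is a hypothetical helper that prints the longest run of '0' digits of each value in the bases the benchmark uses.

```go
package main

import (
	"fmt"
	"math/big"
)

// longestZeroRun returns the length of the longest run of '0' digits
// in base**exp when written out in the given text base.
func longestZeroRun(base, exp int64, textBase int) int {
	v := new(big.Int).Exp(big.NewInt(base), big.NewInt(exp), nil)
	best, cur := 0, 0
	for _, c := range v.Text(textBase) {
		if c == '0' {
			cur++
			if cur > best {
				best = cur
			}
		} else {
			cur = 0
		}
	}
	return best
}

func main() {
	for _, b := range []int{2, 8, 10, 16} {
		fmt.Printf("base %2d: 10**100 -> %3d zeros in a row, 9**100 -> %d\n",
			b, longestZeroRun(10, 100, b), longestZeroRun(9, 100, b))
	}
}
```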

In the early days of math/big, algorithms that needed more space grew the result larger than it needed to be and then used the high words as extra space. This made results their own temporary space caches, at the cost that saving a result in a data structure might hold significantly more memory than necessary. Specifically, new(big.Int).Mul(x, y) returned a big.Int with a backing slice 3X as big as it strictly needed to be. If you are storing many multiplication results, or even a single large result, the 3X overhead can add up. This approach to storage for temporaries also requires being able to analyze the algorithms to predict the exact amount they need, which can be difficult. For both these reasons, the implementation of recursive long division, which came later, introduced a “nat pool” where temporaries could be stored and reused, or reclaimed by the GC when no longer used. This avoids the storage and bookkeeping overheads but introduces a per-temporary sync.Pool overhead. divRecursiveStep takes an array of cached temporaries to remove some of that overhead. The nat pool was better but is still not quite right. This CL introduces something even better than the nat pool (still probably not quite right, but the best I can see for now): a sync.Pool holding stacks for allocating temporaries. Now an operation can get one stack out of the pool and then allocate as many temporaries as it needs during the operation, eventually returning the stack back to the pool. The sync.Pool operations are now per-exported-operation (like big.Int.Mul), not per-temporary. This CL converts both the pre-allocation in nat.mul and the uses of the nat pool to use stack pools instead. This simplifies some code and sets us up better for more complex algorithms (such as Toom-Cook or FFT-based multiplication) that need more temporaries. It is also a little bit faster. 
goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) CPU @ 3.10GHz │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-16 23.68n ± 0% 22.21n ± 0% -6.21% (p=0.000 n=15) Div/40/20-16 23.68n ± 0% 22.21n ± 0% -6.21% (p=0.000 n=15) Div/100/50-16 56.65n ± 0% 55.53n ± 0% -1.98% (p=0.000 n=15) Div/200/100-16 194.6n ± 1% 172.8n ± 0% -11.20% (p=0.000 n=15) Div/400/200-16 232.1n ± 0% 206.7n ± 0% -10.94% (p=0.000 n=15) Div/1000/500-16 405.3n ± 1% 383.8n ± 0% -5.30% (p=0.000 n=15) Div/2000/1000-16 810.4n ± 1% 795.2n ± 0% -1.88% (p=0.000 n=15) Div/20000/10000-16 25.88µ ± 0% 25.39µ ± 0% -1.89% (p=0.000 n=15) Div/200000/100000-16 931.5µ ± 0% 924.3µ ± 0% -0.77% (p=0.000 n=15) Div/2000000/1000000-16 37.77m ± 0% 37.75m ± 0% ~ (p=0.098 n=15) Div/20000000/10000000-16 1.367 ± 0% 1.377 ± 0% +0.72% (p=0.003 n=15) NatMul/10-16 168.5n ± 3% 164.0n ± 4% ~ (p=0.751 n=15) NatMul/100-16 6.086µ ± 3% 5.380µ ± 3% -11.60% (p=0.000 n=15) NatMul/1000-16 238.1µ ± 3% 228.3µ ± 1% -4.12% (p=0.000 n=15) NatMul/10000-16 8.721m ± 2% 8.518m ± 1% -2.33% (p=0.000 n=15) NatMul/100000-16 369.6m ± 0% 371.1m ± 0% +0.42% (p=0.000 n=15) geomean 19.57µ 18.74µ -4.21% │ old │ new │ │ B/op │ B/op vs base │ NatMul/10-16 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-16 4.750Ki ± 0% 1.751Ki ± 0% -63.14% (p=0.000 n=15) NatMul/1000-16 48.16Ki ± 0% 16.02Ki ± 0% -66.73% (p=0.000 n=15) NatMul/10000-16 482.9Ki ± 1% 165.4Ki ± 3% -65.75% (p=0.000 n=15) NatMul/100000-16 5.747Mi ± 7% 4.197Mi ± 0% -26.97% (p=0.000 n=15) geomean 41.42Ki 20.63Ki -50.18% ¹ all samples are equal │ old │ new │ │ allocs/op │ allocs/op vs base │ NatMul/10-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/1000-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/10000-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100000-16 7.000 ± 14% 7.000 ± 14% ~ (p=0.668 n=15) geomean 1.476 1.476 +0.00% ¹ all samples are equal goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) Platinum 
8481C CPU @ 2.70GHz │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-88 15.84n ± 1% 13.12n ± 0% -17.17% (p=0.000 n=15) Div/40/20-88 15.88n ± 1% 13.12n ± 0% -17.38% (p=0.000 n=15) Div/100/50-88 26.42n ± 0% 25.47n ± 0% -3.60% (p=0.000 n=15) Div/200/100-88 132.4n ± 0% 114.9n ± 0% -13.22% (p=0.000 n=15) Div/400/200-88 150.1n ± 0% 135.6n ± 0% -9.66% (p=0.000 n=15) Div/1000/500-88 275.5n ± 0% 264.1n ± 0% -4.14% (p=0.000 n=15) Div/2000/1000-88 586.5n ± 0% 581.1n ± 0% -0.92% (p=0.000 n=15) Div/20000/10000-88 25.87µ ± 0% 25.72µ ± 0% -0.59% (p=0.000 n=15) Div/200000/100000-88 772.2µ ± 0% 779.0µ ± 0% +0.88% (p=0.000 n=15) Div/2000000/1000000-88 33.36m ± 0% 33.63m ± 0% +0.80% (p=0.000 n=15) Div/20000000/10000000-88 1.307 ± 0% 1.320 ± 0% +1.03% (p=0.000 n=15) NatMul/10-88 140.4n ± 0% 148.8n ± 4% +5.98% (p=0.000 n=15) NatMul/100-88 4.663µ ± 1% 4.388µ ± 1% -5.90% (p=0.000 n=15) NatMul/1000-88 207.7µ ± 0% 205.8µ ± 0% -0.89% (p=0.000 n=15) NatMul/10000-88 8.456m ± 0% 8.468m ± 0% +0.14% (p=0.021 n=15) NatMul/100000-88 295.1m ± 0% 297.9m ± 0% +0.94% (p=0.000 n=15) geomean 14.96µ 14.33µ -4.23% │ old │ new │ │ B/op │ B/op vs base │ NatMul/10-88 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-88 4.750Ki ± 0% 1.758Ki ± 0% -62.99% (p=0.000 n=15) NatMul/1000-88 48.44Ki ± 0% 16.08Ki ± 0% -66.80% (p=0.000 n=15) NatMul/10000-88 489.7Ki ± 1% 166.1Ki ± 3% -66.08% (p=0.000 n=15) NatMul/100000-88 5.546Mi ± 0% 3.819Mi ± 60% -31.15% (p=0.000 n=15) geomean 41.29Ki 20.30Ki -50.85% ¹ all samples are equal │ old │ new │ │ allocs/op │ allocs/op vs base │ NatMul/10-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/1000-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/10000-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100000-88 5.000 ± 20% 6.000 ± 67% ~ (p=0.672 n=15) geomean 1.380 1.431 +3.71% ¹ all samples are equal goos: linux goarch: arm64 pkg: math/big │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-16 15.85n ± 0% 15.23n ± 0% 
-3.91% (p=0.000 n=15) Div/40/20-16 15.88n ± 0% 15.22n ± 0% -4.16% (p=0.000 n=15) Div/100/50-16 29.69n ± 0% 26.39n ± 0% -11.11% (p=0.000 n=15) Div/200/100-16 149.2n ± 0% 123.3n ± 0% -17.36% (p=0.000 n=15) Div/400/200-16 160.3n ± 0% 139.2n ± 0% -13.16% (p=0.000 n=15) Div/1000/500-16 271.0n ± 0% 256.1n ± 0% -5.50% (p=0.000 n=15) Div/2000/1000-16 545.3n ± 0% 527.0n ± 0% -3.36% (p=0.000 n=15) Div/20000/10000-16 22.60µ ± 0% 22.20µ ± 0% -1.77% (p=0.000 n=15) Div/200000/100000-16 889.0µ ± 0% 892.2µ ± 0% +0.35% (p=0.000 n=15) Div/2000000/1000000-16 38.01m ± 0% 38.12m ± 0% +0.30% (p=0.000 n=15) Div/20000000/10000000-16 1.437 ± 0% 1.444 ± 0% +0.50% (p=0.000 n=15) NatMul/10-16 166.4n ± 2% 169.5n ± 1% +1.86% (p=0.000 n=15) NatMul/100-16 5.733µ ± 1% 5.570µ ± 1% -2.84% (p=0.000 n=15) NatMul/1000-16 232.6µ ± 1% 229.8µ ± 0% -1.22% (p=0.000 n=15) NatMul/10000-16 9.039m ± 1% 8.969m ± 0% -0.77% (p=0.000 n=15) NatMul/100000-16 367.0m ± 0% 368.8m ± 0% +0.48% (p=0.000 n=15) geomean 16.15µ 15.50µ -4.01% │ old │ new │ │ B/op │ B/op vs base │ NatMul/10-16 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-16 4.750Ki ± 0% 1.751Ki ± 0% -63.14% (p=0.000 n=15) NatMul/1000-16 48.33Ki ± 0% 16.02Ki ± 0% -66.85% (p=0.000 n=15) NatMul/10000-16 536.5Ki ± 1% 165.7Ki ± 3% -69.12% (p=0.000 n=15) NatMul/100000-16 6.078Mi ± 6% 4.197Mi ± 0% -30.94% (p=0.000 n=15) geomean 42.81Ki 20.64Ki -51.78% ¹ all samples are equal │ old │ new │ │ allocs/op │ allocs/op vs base │ NatMul/10-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/1000-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/10000-16 2.000 ± 50% 1.000 ± 0% -50.00% (p=0.001 n=15) NatMul/100000-16 9.000 ± 11% 8.000 ± 12% -11.11% (p=0.001 n=15) geomean 1.783 1.516 -14.97% ¹ all samples are equal goos: darwin goarch: arm64 pkg: math/big cpu: Apple M3 Pro │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-12 9.850n ± 1% 9.405n ± 1% -4.52% (p=0.000 n=15) Div/40/20-12 9.858n ± 0% 9.403n ± 1% -4.62% 
(p=0.000 n=15) Div/100/50-12 16.40n ± 1% 14.81n ± 0% -9.70% (p=0.000 n=15) Div/200/100-12 88.48n ± 2% 80.88n ± 0% -8.59% (p=0.000 n=15) Div/400/200-12 107.90n ± 1% 99.28n ± 1% -7.99% (p=0.000 n=15) Div/1000/500-12 188.8n ± 1% 178.6n ± 1% -5.40% (p=0.000 n=15) Div/2000/1000-12 399.9n ± 0% 389.1n ± 0% -2.70% (p=0.000 n=15) Div/20000/10000-12 13.94µ ± 2% 13.81µ ± 1% ~ (p=0.574 n=15) Div/200000/100000-12 523.8µ ± 0% 521.7µ ± 0% -0.40% (p=0.000 n=15) Div/2000000/1000000-12 21.46m ± 0% 21.48m ± 0% ~ (p=0.067 n=15) Div/20000000/10000000-12 812.5m ± 0% 812.9m ± 0% ~ (p=0.061 n=15) NatMul/10-12 77.14n ± 0% 78.35n ± 1% +1.57% (p=0.000 n=15) NatMul/100-12 2.999µ ± 0% 2.871µ ± 1% -4.27% (p=0.000 n=15) NatMul/1000-12 126.2µ ± 0% 126.8µ ± 0% +0.51% (p=0.011 n=15) NatMul/10000-12 5.099m ± 0% 5.125m ± 0% +0.51% (p=0.000 n=15) NatMul/100000-12 206.7m ± 0% 208.4m ± 0% +0.80% (p=0.000 n=15) geomean 9.512µ 9.236µ -2.91% │ old │ new │ │ B/op │ B/op vs base │ NatMul/10-12 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-12 4.750Ki ± 0% 1.750Ki ± 0% -63.16% (p=0.000 n=15) NatMul/1000-12 48.13Ki ± 0% 16.01Ki ± 0% -66.73% (p=0.000 n=15) NatMul/10000-12 483.5Ki ± 1% 163.2Ki ± 2% -66.24% (p=0.000 n=15) NatMul/100000-12 5.480Mi ± 4% 1.532Mi ± 104% -72.05% (p=0.000 n=15) geomean 41.03Ki 16.82Ki -59.01% ¹ all samples are equal │ old │ new │ │ allocs/op │ allocs/op vs base │ NatMul/10-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/1000-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/10000-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100000-12 5.000 ± 0% 1.000 ± 400% -80.00% (p=0.007 n=15) geomean 1.380 1.000 -27.52% ¹ all samples are equal Change-Id: I7efa6fe37971ed26ae120a32250fcb47ece0a011 Reviewed-on: https://go-review.googlesource.com/c/go/+/650638 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> 
Reviewed-by: Alan Donovan <adonovan@google.com>
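The stack-pool idea described in this commit can be sketched roughly as below. All names here (stack, getStack, alloc, mark, release, free, sumOfSquares) are hypothetical and the layout is a simplification; it is not the math/big code. The point it illustrates is that the sync.Pool is touched once per operation, while each temporary is a cheap bump allocation from the borrowed stack.

```go
package main

import (
	"fmt"
	"sync"
)

// stack is a hypothetical bump allocator for word-sized temporaries.
type stack struct {
	buf []uint64
}

// stackPool holds idle stacks; the GC may reclaim them when unused.
var stackPool = sync.Pool{New: func() interface{} { return new(stack) }}

// getStack borrows a stack for the duration of one operation.
func getStack() *stack { return stackPool.Get().(*stack) }

// free resets the stack and returns it to the pool.
func (s *stack) free() {
	s.buf = s.buf[:0]
	stackPool.Put(s)
}

// mark records the current depth; release pops back to a recorded depth.
func (s *stack) mark() int     { return len(s.buf) }
func (s *stack) release(m int) { s.buf = s.buf[:m] }

// alloc returns a zeroed temporary of n words from the top of the stack.
// If the buffer grows, earlier temporaries keep pointing at the old array
// and therefore stay valid.
func (s *stack) alloc(n int) []uint64 {
	start := len(s.buf)
	if start+n > cap(s.buf) {
		grown := make([]uint64, start, 2*cap(s.buf)+n)
		copy(grown, s.buf)
		s.buf = grown
	}
	s.buf = s.buf[:start+n]
	t := s.buf[start : start+n : start+n]
	for i := range t {
		t[i] = 0
	}
	return t
}

// sumOfSquares stands in for an exported operation: one pool Get and Put,
// however many temporaries it needs in between.
func sumOfSquares(xs []uint64) uint64 {
	stk := getStack()
	defer stk.free()

	squares := stk.alloc(len(xs))
	for i, x := range xs {
		squares[i] = x * x
	}
	total := stk.alloc(1)
	for _, sq := range squares {
		total[0] += sq
	}
	return total[0]
}

func main() {
	fmt.Println(sumOfSquares([]uint64{1, 2, 3}))
}
```

Because the result is copied out before the stack is returned, nothing the caller keeps is backed by oversized temporary storage.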

Change-Id: I112f55c0e3ee3b75e615a06b27552de164565c04 Reviewed-on: https://go-review.googlesource.com/c/go/+/650637 Reviewed-by: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Russ Cox <rsc@golang.org>

The GCD code was setting one *Int to the value of another by smashing one struct on top of the other, instead of using Set. That was safe in this one case, but it's not idiomatic in math/big nor safe in general, so rewrite the code not to do that. (In one case, by swapping variables around; in another, by calling Set.) The added Set call does slow down GCDs by a small amount, since the answer has to be copied out. To compensate for that, optimize a bit: remove the s, t temporaries entirely and handle vector x word multiplication directly. The net result is that almost all GCDs are faster, except for small ones, which are a few nanoseconds slower. goos: darwin goarch: arm64 pkg: math/big cpu: Apple M3 Pro │ bench.before │ bench.after │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-12 23.80n ± 1% 31.71n ± 1% +33.24% (p=0.000 n=10) GCD10x10/WithXY-12 100.40n ± 0% 92.14n ± 1% -8.22% (p=0.000 n=10) GCD10x100/WithoutXY-12 63.70n ± 0% 70.73n ± 0% +11.05% (p=0.000 n=10) GCD10x100/WithXY-12 278.6n ± 0% 233.1n ± 1% -16.35% (p=0.000 n=10) GCD10x1000/WithoutXY-12 153.4n ± 0% 162.2n ± 1% +5.74% (p=0.000 n=10) GCD10x1000/WithXY-12 456.0n ± 0% 411.8n ± 1% -9.69% (p=0.000 n=10) GCD10x10000/WithoutXY-12 1.002µ ± 1% 1.036µ ± 0% +3.39% (p=0.000 n=10) GCD10x10000/WithXY-12 2.330µ ± 1% 2.210µ ± 0% -5.13% (p=0.000 n=10) GCD10x100000/WithoutXY-12 8.894µ ± 0% 8.889µ ± 1% ~ (p=0.754 n=10) GCD10x100000/WithXY-12 20.84µ ± 0% 20.24µ ± 0% -2.84% (p=0.000 n=10) GCD100x100/WithoutXY-12 373.3n ± 3% 314.4n ± 0% -15.76% (p=0.000 n=10) GCD100x100/WithXY-12 662.5n ± 0% 572.4n ± 1% -13.59% (p=0.000 n=10) GCD100x1000/WithoutXY-12 641.8n ± 0% 598.1n ± 1% -6.81% (p=0.000 n=10) GCD100x1000/WithXY-12 1.123µ ± 0% 1.019µ ± 1% -9.26% (p=0.000 n=10) GCD100x10000/WithoutXY-12 2.870µ ± 0% 2.831µ ± 0% -1.38% (p=0.000 n=10) GCD100x10000/WithXY-12 4.930µ ± 1% 4.675µ ± 0% -5.16% (p=0.000 n=10) GCD100x100000/WithoutXY-12 24.08µ ± 0% 23.97µ ± 0% -0.48% (p=0.007 n=10) GCD100x100000/WithXY-12 43.66µ ± 0% 42.52µ ± 0% 
-2.61% (p=0.001 n=10) GCD1000x1000/WithoutXY-12 3.999µ ± 0% 3.569µ ± 1% -10.75% (p=0.000 n=10) GCD1000x1000/WithXY-12 6.397µ ± 0% 5.534µ ± 0% -13.49% (p=0.000 n=10) GCD1000x10000/WithoutXY-12 6.875µ ± 0% 6.450µ ± 0% -6.18% (p=0.000 n=10) GCD1000x10000/WithXY-12 20.75µ ± 1% 19.17µ ± 1% -7.64% (p=0.000 n=10) GCD1000x100000/WithoutXY-12 36.38µ ± 0% 35.60µ ± 1% -2.13% (p=0.000 n=10) GCD1000x100000/WithXY-12 172.1µ ± 0% 174.4µ ± 3% ~ (p=0.052 n=10) GCD10000x10000/WithoutXY-12 79.89µ ± 1% 75.16µ ± 2% -5.92% (p=0.000 n=10) GCD10000x10000/WithXY-12 160.1µ ± 0% 150.0µ ± 0% -6.33% (p=0.000 n=10) GCD10000x100000/WithoutXY-12 213.2µ ± 1% 209.0µ ± 1% -1.98% (p=0.000 n=10) GCD10000x100000/WithXY-12 1.399m ± 0% 1.342m ± 3% -4.08% (p=0.002 n=10) GCD100000x100000/WithoutXY-12 5.463m ± 1% 5.504m ± 2% ~ (p=0.190 n=10) GCD100000x100000/WithXY-12 11.36m ± 0% 11.46m ± 1% +0.86% (p=0.000 n=10) geomean 6.953µ 6.695µ -3.71% goos: linux goarch: amd64 pkg: math/big cpu: AMD Ryzen 9 7950X 16-Core Processor │ bench.before │ bench.after │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-32 39.66n ± 4% 44.34n ± 4% +11.77% (p=0.000 n=10) GCD10x10/WithXY-32 156.7n ± 12% 130.8n ± 2% -16.53% (p=0.000 n=10) GCD10x100/WithoutXY-32 115.8n ± 5% 120.2n ± 2% +3.89% (p=0.000 n=10) GCD10x100/WithXY-32 465.3n ± 3% 368.1n ± 2% -20.91% (p=0.000 n=10) GCD10x1000/WithoutXY-32 201.1n ± 1% 210.8n ± 2% +4.82% (p=0.000 n=10) GCD10x1000/WithXY-32 652.9n ± 4% 605.0n ± 1% -7.32% (p=0.002 n=10) GCD10x10000/WithoutXY-32 1.046µ ± 2% 1.143µ ± 1% +9.33% (p=0.000 n=10) GCD10x10000/WithXY-32 3.360µ ± 1% 3.258µ ± 1% -3.04% (p=0.000 n=10) GCD10x100000/WithoutXY-32 9.391µ ± 3% 9.997µ ± 1% +6.46% (p=0.000 n=10) GCD10x100000/WithXY-32 27.92µ ± 1% 28.21µ ± 0% +1.04% (p=0.043 n=10) GCD100x100/WithoutXY-32 443.7n ± 5% 320.0n ± 2% -27.88% (p=0.000 n=10) GCD100x100/WithXY-32 789.9n ± 2% 690.4n ± 1% -12.60% (p=0.000 n=10) GCD100x1000/WithoutXY-32 718.4n ± 3% 600.0n ± 1% -16.48% (p=0.000 n=10) GCD100x1000/WithXY-32 1.388µ ± 4% 1.175µ ± 1% 
-15.28% (p=0.000 n=10) GCD100x10000/WithoutXY-32 2.750µ ± 1% 2.668µ ± 1% -2.96% (p=0.000 n=10) GCD100x10000/WithXY-32 6.016µ ± 1% 5.590µ ± 1% -7.09% (p=0.000 n=10) GCD100x100000/WithoutXY-32 21.40µ ± 1% 22.30µ ± 1% +4.21% (p=0.000 n=10) GCD100x100000/WithXY-32 47.02µ ± 4% 48.80µ ± 0% +3.78% (p=0.015 n=10) GCD1000x1000/WithoutXY-32 3.417µ ± 4% 3.020µ ± 1% -11.65% (p=0.000 n=10) GCD1000x1000/WithXY-32 5.752µ ± 0% 5.418µ ± 2% -5.81% (p=0.000 n=10) GCD1000x10000/WithoutXY-32 6.150µ ± 0% 6.246µ ± 1% +1.55% (p=0.000 n=10) GCD1000x10000/WithXY-32 24.68µ ± 3% 25.07µ ± 1% ~ (p=0.051 n=10) GCD1000x100000/WithoutXY-32 34.60µ ± 2% 36.85µ ± 1% +6.51% (p=0.000 n=10) GCD1000x100000/WithXY-32 209.5µ ± 4% 227.4µ ± 0% +8.56% (p=0.000 n=10) GCD10000x10000/WithoutXY-32 90.69µ ± 0% 88.48µ ± 0% -2.44% (p=0.000 n=10) GCD10000x10000/WithXY-32 197.1µ ± 0% 200.5µ ± 0% +1.73% (p=0.000 n=10) GCD10000x100000/WithoutXY-32 239.1µ ± 0% 242.5µ ± 0% +1.42% (p=0.000 n=10) GCD10000x100000/WithXY-32 1.963m ± 3% 2.028m ± 0% +3.28% (p=0.000 n=10) GCD100000x100000/WithoutXY-32 7.466m ± 0% 7.412m ± 0% -0.71% (p=0.000 n=10) GCD100000x100000/WithXY-32 16.10m ± 2% 16.47m ± 0% +2.25% (p=0.000 n=10) geomean 8.388µ 8.127µ -3.12% Change-Id: I161dc409bad11bcc553bc8116449905ae5b06742 Reviewed-on: https://go-review.googlesource.com/c/go/+/650636 Reviewed-by: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com> Auto-Submit: Russ Cox <rsc@golang.org>
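Why copying one Int struct over another is unsafe in general can be sketched with the public API. This is an illustration, not the GCD code: a struct copy shares the backing words with the source, so a later in-place update of the source may show through, whereas Set gives the destination its own storage. Whether the shallow copy actually changes depends on math/big internals, so the sketch prints it rather than asserting a value.

```go
package main

import (
	"fmt"
	"math/big"
)

// copyWithSet copies src into a fresh Int with its own storage.
func copyWithSet(src *big.Int) *big.Int {
	return new(big.Int).Set(src)
}

// deepCopyAfterMutation copies 5 with Set, overwrites the source with 7,
// and returns the value still held by the copy.
func deepCopyAfterMutation() int64 {
	src := big.NewInt(5)
	deep := copyWithSet(src)
	src.SetInt64(7)
	return deep.Int64()
}

func main() {
	src := big.NewInt(5)

	shallow := *src          // struct copy: shares src's backing slice
	deep := copyWithSet(src) // independent copy

	src.SetInt64(7) // may overwrite the shared words in place

	fmt.Println("copy made with Set:   ", deep)
	fmt.Println("copy made by struct:  ", &shallow) // may no longer be 5
	fmt.Println("helper result:        ", deepCopyAfterMutation())
}
```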

If the -test.run value is not surrounded by ^$ then any test that matches the -test.run value will be run. This is normally not the desired behavior, as it can lead to unexpected tests being run. Change-Id: I3447aaebad5156bbef7f263cdb9f6b8c32331324 Reviewed-on: https://go-review.googlesource.com/c/go/+/651956 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
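The effect of the missing anchors can be reproduced with the regexp package alone. The test names below are made up for illustration, and the sketch ignores the extra handling go test applies to subtest names separated by slashes.

```go
package main

import (
	"fmt"
	"regexp"
)

// selected reports which of the names match the given pattern,
// using an unanchored match unless the pattern anchors itself.
func selected(pattern string, names []string) []string {
	re := regexp.MustCompile(pattern)
	var out []string
	for _, n := range names {
		if re.MatchString(n) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	names := []string{"TestScan", "TestScanPi", "TestIntScan"}
	fmt.Println(selected("TestScan", names))   // unanchored: also picks up TestScanPi
	fmt.Println(selected("^TestScan$", names)) // anchored: only the intended test
}
```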

Change-Id: I65721039dab311762e55c6a60dd75b82f6b4622f Reviewed-on: https://go-review.googlesource.com/c/go/+/642335 Reviewed-by: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Griesemer <gri@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com>

The old link no longer works. Fixes #70684 Change-Id: I8711ef7d5721bf20ef83f5192dd0d1f73dda6ce1 Reviewed-on: https://go-review.googlesource.com/c/go/+/633775 Auto-Submit: Ian Lance Taylor <iant@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

This harmonizes the docs with the (*Rand).Uint* functions. It also makes the behavior clearer: I wasn't sure whether it would try to interpret the uint as a signed number somehow. It does not pull any surprises, so make that clear. Change-Id: I5a87a0a5563dbabfc31e536e40ee69b11f5cb6cf Reviewed-on: https://go-review.googlesource.com/c/go/+/633535 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Commit-Queue: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Robert Griesemer <gri@google.com>

If Be and Le stand for big-endian and little-endian, then they should be BE and LE. Change-Id: I723e3962b8918da84791783d3c547638f1c9e8a9 Reviewed-on: https://go-review.googlesource.com/c/go/+/627376 Reviewed-by: Robert Griesemer <gri@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

Change-Id: Ie7649060db25f1573eeaadd534a600bb24d30572 GitHub-Last-Rev: c617848a4ec9f5c21820982efc95e0ec4ca2510c GitHub-Pull-Request: golang/go#70134 Reviewed-on: https://go-review.googlesource.com/c/go/+/623757 Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Robert Griesemer <gri@google.com>

benchmark:
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
        │   bench.old   │               bench.new                │
        │    sec/op     │    sec/op      vs base                 │
Ceil       10.810n ± 0%     2.578n ± 0%   -76.15% (p=0.000 n=20)
Floor      10.810n ± 0%     2.531n ± 0%   -76.59% (p=0.000 n=20)
Trunc       9.606n ± 0%     2.530n ± 0%   -73.67% (p=0.000 n=20)
geomean     10.39n          2.546n        -75.50%
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A5000 @ 2500.00MHz
        │   bench.old   │               bench.new                │
        │    sec/op     │    sec/op      vs base                 │
Ceil       13.220n ± 0%     7.703n ± 8%   -41.73% (p=0.000 n=20)
Floor      12.410n ± 0%     7.248n ± 2%   -41.59% (p=0.000 n=20)
Trunc      11.210n ± 0%     7.757n ± 4%   -30.80% (p=0.000 n=20)
geomean     12.25n          7.566n        -38.25%
Change-Id: I3af51e9852e9cf5f965fed895d68945a2e8675f4 Reviewed-on: https://go-review.googlesource.com/c/go/+/612615 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

Follow-up on CL 467555. Change-Id: I1815b5def656ae4b86c31385ad0737f0465fa2d6 Reviewed-on: https://go-review.googlesource.com/c/go/+/613535 Auto-Submit: Robert Griesemer <gri@google.com> TryBot-Bypass: Robert Griesemer <gri@google.com> Reviewed-by: Robert Griesemer <gri@google.com> Reviewed-by: Tim King <taking@google.com>

Rather than conditionally assigning ujn, initialise ujn above the loop to invent the leading 0 for u, then unconditionally load ujn at the bottom of the loop. This code operates on the basis that n >= 2, hence j+n-1 is always greater than zero. Change-Id: I1272ef30c787ed8707ae8421af2adcccc776d389 Reviewed-on: https://go-review.googlesource.com/c/go/+/467555 Auto-Submit: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Commit-Queue: Robert Griesemer <gri@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Robert Griesemer <gri@google.com>

This CL reapplies CL 504737 and adds an integer precision limitation check, since CL 504737 only checked whether the floating-point number is ±Inf or NaN. This CL is also ~7% faster than CL 504737. Updates #68322
goos: linux
goarch: riscv64
pkg: math
            │ math.old.bench │             math.new.bench             │
            │     sec/op     │    sec/op      vs base                 │
Ceil             54.09n ± 0%     18.72n ± 0%   -65.39% (p=0.000 n=10)
Floor            40.72n ± 0%     18.72n ± 0%   -54.03% (p=0.000 n=10)
Round            20.73n ± 0%     20.73n ± 0%         ~ (p=1.000 n=10)
RoundToEven      24.07n ± 0%     24.07n ± 0%         ~ (p=1.000 n=10)
Trunc            38.72n ± 0%     18.72n ± 0%   -51.65% (p=0.000 n=10)
geomean          33.56n          20.09n        -40.13%
Change-Id: I06cfe2cb9e2535cd705d40b6650a7e71fedd906c Reviewed-on: https://go-review.googlesource.com/c/go/+/600075 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

Change-Id: I535a7aaaf3f9e8a9c0e0c04f8f745ad7445a32f7 Reviewed-on: https://go-review.googlesource.com/c/go/+/611678 Run-TryBot: shuang cui <imcusg@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>

This CL adds Trunc, Ceil, and Floor tests for large exact floats. Change-Id: Ib7ffec1d2d50d2ac955398a3dd0fd06d494fcf4f Reviewed-on: https://go-review.googlesource.com/c/go/+/601095 Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org>
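
As a rough illustration of what such tests exercise, the sketch below checks that rounding leaves large integral floats untouched. It assumes "large exact float" means a float64 with magnitude of at least 2^52. The helper `unchanged` is hypothetical, and the values are chosen for this example rather than taken from the CL.

```go
package main

import (
	"fmt"
	"math"
)

// unchanged reports whether Floor, Ceil, and Trunc all return x itself,
// which must hold for any float64 that is already an integer.
func unchanged(x float64) bool {
	return math.Floor(x) == x && math.Ceil(x) == x && math.Trunc(x) == x
}

func main() {
	// Values at and above 2^52 have no fractional bits left.
	for _, x := range []float64{1 << 52, -(1 << 52), 1 << 53, -(1 << 53)} {
		fmt.Println(x, unchanged(x))
	}
	// Just below 2^52 a half step is still representable, so rounding acts.
	fmt.Println(unchanged(1<<52 - 0.5))
}
```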

All changes are related to the code, except for the comments in src/regexp/syntax/parse.go and src/slices/slices.go. Change-Id: I73c5d3c54099749b62210aa7f3182c5eb84bb6a6 GitHub-Last-Rev: 794aa9b0539811d00e1cd42be1e8d9fe9afe0281 GitHub-Pull-Request: golang/go#69170 Reviewed-on: https://go-review.googlesource.com/c/go/+/609678 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com>

As some callers don't have a testing context, modify testenv.Executable to accept nil (similar to how testenv.GOROOT works). Change-Id: I39112a7869933785a26b5cb6520055b3cc42b847 Reviewed-on: https://go-review.googlesource.com/c/go/+/609835 Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>

This provides an assembly implementation of addMulVVW for riscv64, processing up to four words per loop, resulting in a significant performance gain. On a StarFive VisionFive 2:
                   │  addmulvvw.1  │              addmulvvw.2               │
                   │    sec/op     │    sec/op      vs base                 │
AddMulVVW/1-4          65.49n ± 0%     50.79n ± 0%   -22.44% (p=0.000 n=10)
AddMulVVW/2-4          82.81n ± 0%     66.83n ± 0%   -19.29% (p=0.000 n=10)
AddMulVVW/3-4         100.20n ± 0%     82.87n ± 0%   -17.30% (p=0.000 n=10)
AddMulVVW/4-4         117.50n ± 0%     84.20n ± 0%   -28.34% (p=0.000 n=10)
AddMulVVW/5-4          134.9n ± 0%     100.3n ± 0%   -25.69% (p=0.000 n=10)
AddMulVVW/10-4         221.7n ± 0%     164.4n ± 0%   -25.85% (p=0.000 n=10)
AddMulVVW/100-4        1.794µ ± 0%     1.250µ ± 0%   -30.32% (p=0.000 n=10)
AddMulVVW/1000-4       17.42µ ± 0%     12.08µ ± 0%   -30.68% (p=0.000 n=10)
AddMulVVW/10000-4      254.9µ ± 0%     214.8µ ± 0%   -15.75% (p=0.000 n=10)
AddMulVVW/100000-4     2.569m ± 0%     2.178m ± 0%   -15.20% (p=0.000 n=10)
geomean                1.443µ          1.107µ        -23.29%
                   │  addmulvvw.1  │              addmulvvw.2               │
                   │      B/s      │      B/s       vs base                 │
AddMulVVW/1-4         932.0Mi ± 0%   1201.6Mi ± 0%   +28.93% (p=0.000 n=10)
AddMulVVW/2-4         1.440Gi ± 0%    1.784Gi ± 0%   +23.90% (p=0.000 n=10)
AddMulVVW/3-4         1.785Gi ± 0%    2.158Gi ± 0%   +20.87% (p=0.000 n=10)
AddMulVVW/4-4         2.029Gi ± 0%    2.832Gi ± 0%   +39.59% (p=0.000 n=10)
AddMulVVW/5-4         2.209Gi ± 0%    2.973Gi ± 0%   +34.55% (p=0.000 n=10)
AddMulVVW/10-4        2.689Gi ± 0%    3.626Gi ± 0%   +34.86% (p=0.000 n=10)
AddMulVVW/100-4       3.323Gi ± 0%    4.770Gi ± 0%   +43.54% (p=0.000 n=10)
AddMulVVW/1000-4      3.421Gi ± 0%    4.936Gi ± 0%   +44.27% (p=0.000 n=10)
AddMulVVW/10000-4     2.338Gi ± 0%    2.776Gi ± 0%   +18.69% (p=0.000 n=10)
AddMulVVW/100000-4    2.320Gi ± 0%    2.736Gi ± 0%   +17.93% (p=0.000 n=10)
geomean               2.109Gi         2.749Gi        +30.36%
Change-Id: I6c7ee48233c53ff9b6a5a9002675886cd9bff5af Reviewed-on: https://go-review.googlesource.com/c/go/+/595400 Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
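
For readers unfamiliar with the operation being accelerated, the following is a minimal portable sketch of what an addMulVVW-style routine computes, namely z += x*y with the final carry returned. It is written against plain `uint` slices with `math/bits` and is not the riscv64 assembly or the math/big source. The sketch assumes `len(z) >= len(x)`.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addMulVVW computes z[i] += x[i]*y for every word of x, propagating the
// carry between words, and returns the carry out of the last word.
// Sketch only: it assumes len(z) >= len(x).
func addMulVVW(z, x []uint, y uint) (carry uint) {
	for i := range x {
		hi, lo := bits.Mul(x[i], y)    // double-word product
		lo, c := bits.Add(lo, z[i], 0) // add the existing word of z
		hi += c
		lo, c = bits.Add(lo, carry, 0) // add the carry from the previous word
		hi += c
		z[i] = lo
		carry = hi // cannot overflow: x*y + z + carry fits in two words
	}
	return carry
}

func main() {
	z := []uint{1, 0}
	c := addMulVVW(z, []uint{^uint(0), 0}, 2)
	fmt.Println(z, c)
}
```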

This provides an assembly implementation of mulAddVWW for riscv64, processing up to four words per loop, resulting in a significant performance gain. On a StarFive VisionFive 2:
                   │  muladdvww.1  │              muladdvww.2               │
                   │    sec/op     │    sec/op      vs base                 │
MulAddVWW/1-4          68.18n ± 0%     65.49n ± 0%    -3.95% (p=0.000 n=10)
MulAddVWW/2-4          82.81n ± 0%     78.85n ± 0%    -4.78% (p=0.000 n=10)
MulAddVWW/3-4          97.49n ± 0%     72.18n ± 0%   -25.96% (p=0.000 n=10)
MulAddVWW/4-4         112.20n ± 0%     85.54n ± 0%   -23.76% (p=0.000 n=10)
MulAddVWW/5-4         126.90n ± 0%     98.90n ± 0%   -22.06% (p=0.000 n=10)
MulAddVWW/10-4         200.3n ± 0%     144.3n ± 0%   -27.96% (p=0.000 n=10)
MulAddVWW/100-4       1532.0n ± 0%     860.0n ± 0%   -43.86% (p=0.000 n=10)
MulAddVWW/1000-4      14.757µ ± 0%     8.076µ ± 0%   -45.27% (p=0.000 n=10)
MulAddVWW/10000-4      204.0µ ± 0%     137.1µ ± 0%   -32.77% (p=0.000 n=10)
MulAddVWW/100000-4     2.066m ± 0%     1.382m ± 0%   -33.12% (p=0.000 n=10)
geomean                1.311µ          950.0n        -27.51%
                   │  muladdvww.1  │              muladdvww.2               │
                   │      B/s      │      B/s       vs base                 │
MulAddVWW/1-4         895.1Mi ± 0%    932.0Mi ± 0%    +4.11% (p=0.000 n=10)
MulAddVWW/2-4         1.440Gi ± 0%    1.512Gi ± 0%    +5.02% (p=0.000 n=10)
MulAddVWW/3-4         1.834Gi ± 0%    2.477Gi ± 0%   +35.07% (p=0.000 n=10)
MulAddVWW/4-4         2.125Gi ± 0%    2.787Gi ± 0%   +31.15% (p=0.000 n=10)
MulAddVWW/5-4         2.349Gi ± 0%    3.013Gi ± 0%   +28.28% (p=0.000 n=10)
MulAddVWW/10-4        2.975Gi ± 0%    4.130Gi ± 0%   +38.79% (p=0.000 n=10)
MulAddVWW/100-4       3.891Gi ± 0%    6.930Gi ± 0%   +78.11% (p=0.000 n=10)
MulAddVWW/1000-4      4.039Gi ± 0%    7.380Gi ± 0%   +82.72% (p=0.000 n=10)
MulAddVWW/10000-4     2.922Gi ± 0%    4.346Gi ± 0%   +48.74% (p=0.000 n=10)
MulAddVWW/100000-4    2.884Gi ± 0%    4.313Gi ± 0%   +49.52% (p=0.000 n=10)
geomean               2.321Gi         3.202Gi        +37.95%
Change-Id: If08191607913ce5c7641f34bae8fa5c9dfb44777 Reviewed-on: https://go-review.googlesource.com/c/go/+/595399 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
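
A portable sketch of the semantics of a mulAddVWW-style routine may help here: it computes z = x*y + r word by word and returns the final carry. This uses plain `uint` slices and `math/bits`. It is an illustration under the assumption that `len(z) >= len(x)`, not the assembly added by the CL.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mulAddVWW sets z[i] to the low word of x[i]*y plus the running carry,
// where the carry starts out as r, and returns the carry out of the last
// word. Sketch only: it assumes len(z) >= len(x).
func mulAddVWW(z, x []uint, y, r uint) (carry uint) {
	carry = r
	for i := range x {
		hi, lo := bits.Mul(x[i], y)
		lo, c := bits.Add(lo, carry, 0)
		z[i] = lo
		carry = hi + c // hi is at most 2^w-2, so adding c cannot overflow
	}
	return carry
}

func main() {
	maxWord := ^uint(0)
	z := make([]uint, 2)
	c := mulAddVWW(z, []uint{maxWord, maxWord}, maxWord, maxWord)
	fmt.Println(z, c == maxWord)
}
```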

This provides an assembly implementation of subVW for riscv64, processing up to four words per loop, resulting in a significant performance gain. On a StarFive VisionFive 2:
                   │    subvw.1    │                subvw.2                 │
                   │    sec/op     │    sec/op      vs base                 │
SubVW/1-4              57.43n ± 0%     41.45n ± 0%   -27.82% (p=0.000 n=10)
SubVW/2-4              69.31n ± 0%     48.15n ± 0%   -30.53% (p=0.000 n=10)
SubVW/3-4              76.12n ± 0%     54.87n ± 0%   -27.92% (p=0.000 n=10)
SubVW/4-4              85.47n ± 0%     56.14n ± 0%   -34.32% (p=0.000 n=10)
SubVW/5-4              96.15n ± 0%     62.83n ± 0%   -34.65% (p=0.000 n=10)
SubVW/10-4            149.60n ± 0%     89.55n ± 0%   -40.14% (p=0.000 n=10)
SubVW/100-4           1115.0n ± 0%     549.3n ± 0%   -50.74% (p=0.000 n=10)
SubVW/1000-4          10.732µ ± 0%     5.071µ ± 0%   -52.75% (p=0.000 n=10)
SubVW/10000-4          153.0µ ± 0%     103.7µ ± 0%   -32.21% (p=0.000 n=10)
SubVW/100000-4         1.542m ± 0%     1.046m ± 0%   -32.13% (p=0.000 n=10)
SubVWext/1-4           57.42n ± 0%     41.45n ± 0%   -27.81% (p=0.000 n=10)
SubVWext/2-4           69.33n ± 0%     48.15n ± 0%   -30.55% (p=0.000 n=10)
SubVWext/3-4           76.12n ± 0%     54.93n ± 0%   -27.84% (p=0.000 n=10)
SubVWext/4-4           85.47n ± 0%     56.14n ± 0%   -34.32% (p=0.000 n=10)
SubVWext/5-4           96.15n ± 0%     62.83n ± 0%   -34.65% (p=0.000 n=10)
SubVWext/10-4         149.60n ± 0%     89.56n ± 0%   -40.14% (p=0.000 n=10)
SubVWext/100-4        1115.0n ± 0%     549.3n ± 0%   -50.74% (p=0.000 n=10)
SubVWext/1000-4       10.732µ ± 0%     5.061µ ± 0%   -52.84% (p=0.000 n=10)
SubVWext/10000-4       152.5µ ± 0%     103.7µ ± 0%   -32.02% (p=0.000 n=10)
SubVWext/100000-4      1.533m ± 0%     1.046m ± 0%   -31.75% (p=0.000 n=10)
geomean                1.005µ          633.7n        -36.92%
                   │    subvw.1    │                subvw.2                 │
                   │      B/s      │      B/s       vs base                 │
SubVW/1-4             132.9Mi ± 0%    184.1Mi ± 0%   +38.54% (p=0.000 n=10)
SubVW/2-4             220.1Mi ± 0%    316.9Mi ± 0%   +43.95% (p=0.000 n=10)
SubVW/3-4             300.7Mi ± 0%    417.1Mi ± 0%   +38.72% (p=0.000 n=10)
SubVW/4-4             357.1Mi ± 0%    543.6Mi ± 0%   +52.24% (p=0.000 n=10)
SubVW/5-4             396.7Mi ± 0%    607.2Mi ± 0%   +53.03% (p=0.000 n=10)
SubVW/10-4            510.1Mi ± 0%    851.9Mi ± 0%   +67.01% (p=0.000 n=10)
SubVW/100-4           684.2Mi ± 0%   1388.9Mi ± 0%  +102.99% (p=0.000 n=10)
SubVW/1000-4          710.9Mi ± 0%   1504.5Mi ± 0%  +111.63% (p=0.000 n=10)
SubVW/10000-4         498.7Mi ± 0%    735.7Mi ± 0%   +47.52% (p=0.000 n=10)
SubVW/100000-4        494.8Mi ± 0%    729.1Mi ± 0%   +47.34% (p=0.000 n=10)
SubVWext/1-4          132.9Mi ± 0%    184.1Mi ± 0%   +38.53% (p=0.000 n=10)
SubVWext/2-4          220.1Mi ± 0%    316.9Mi ± 0%   +44.00% (p=0.000 n=10)
SubVWext/3-4          300.7Mi ± 0%    416.7Mi ± 0%   +38.57% (p=0.000 n=10)
SubVWext/4-4          357.1Mi ± 0%    543.6Mi ± 0%   +52.24% (p=0.000 n=10)
SubVWext/5-4          396.7Mi ± 0%    607.2Mi ± 0%   +53.04% (p=0.000 n=10)
SubVWext/10-4         510.1Mi ± 0%    851.9Mi ± 0%   +67.01% (p=0.000 n=10)
SubVWext/100-4        684.2Mi ± 0%   1388.9Mi ± 0%  +102.99% (p=0.000 n=10)
SubVWext/1000-4       710.9Mi ± 0%   1507.6Mi ± 0%  +112.07% (p=0.000 n=10)
SubVWext/10000-4      500.1Mi ± 0%    735.7Mi ± 0%   +47.10% (p=0.000 n=10)
SubVWext/100000-4     497.8Mi ± 0%    729.4Mi ± 0%   +46.52% (p=0.000 n=10)
geomean               387.6Mi         614.5Mi        +58.51%
Change-Id: I9d7fac719e977710ad9db9121fa298db6df605de Reviewed-on: https://go-review.googlesource.com/c/go/+/595398 Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
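
The operation itself is small. The sketch below shows, in portable Go, what a subVW-style routine computes: a multi-word value minus a single word, with the borrow rippling upward. It uses plain `uint` slices and assumes `len(z) >= len(x)`. It is not the riscv64 assembly from the CL.

```go
package main

import (
	"fmt"
	"math/bits"
)

// subVW computes z = x - y, where x is a little-endian vector of words and
// y is a single word, and returns the borrow out of the top word.
// Sketch only: it assumes len(z) >= len(x).
func subVW(z, x []uint, y uint) (borrow uint) {
	borrow = y // the first word subtracts y, later words subtract the borrow
	for i := range x {
		z[i], borrow = bits.Sub(x[i], borrow, 0)
	}
	return borrow
}

func main() {
	z := make([]uint, 3)
	b := subVW(z, []uint{0, 0, 5}, 1)
	fmt.Println(z, b)
}
```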

This provides an assembly implementation of addVW for riscv64, processing up to four words per loop, resulting in a significant performance gain. On a StarFive VisionFive 2:
                   │    addvw.1    │                addvw.2                 │
                   │    sec/op     │    sec/op      vs base                 │
AddVW/1-4              57.43n ± 0%     41.45n ± 0%   -27.83% (p=0.000 n=10)
AddVW/2-4              69.31n ± 0%     48.15n ± 0%   -30.53% (p=0.000 n=10)
AddVW/3-4              76.12n ± 0%     54.97n ± 0%   -27.79% (p=0.000 n=10)
AddVW/4-4              85.47n ± 0%     56.14n ± 0%   -34.32% (p=0.000 n=10)
AddVW/5-4              96.16n ± 0%     62.82n ± 0%   -34.67% (p=0.000 n=10)
AddVW/10-4            149.60n ± 0%     89.55n ± 0%   -40.14% (p=0.000 n=10)
AddVW/100-4           1115.0n ± 0%     549.3n ± 0%   -50.74% (p=0.000 n=10)
AddVW/1000-4          10.732µ ± 0%     5.060µ ± 0%   -52.85% (p=0.000 n=10)
AddVW/10000-4          151.7µ ± 0%     103.7µ ± 0%   -31.63% (p=0.000 n=10)
AddVW/100000-4         1.523m ± 0%     1.050m ± 0%   -31.03% (p=0.000 n=10)
AddVWext/1-4           57.42n ± 0%     41.45n ± 0%   -27.81% (p=0.000 n=10)
AddVWext/2-4           69.32n ± 0%     48.15n ± 0%   -30.54% (p=0.000 n=10)
AddVWext/3-4           76.12n ± 0%     54.87n ± 0%   -27.92% (p=0.000 n=10)
AddVWext/4-4           85.47n ± 0%     56.14n ± 0%   -34.32% (p=0.000 n=10)
AddVWext/5-4           96.15n ± 0%     62.82n ± 0%   -34.66% (p=0.000 n=10)
AddVWext/10-4         149.60n ± 0%     89.55n ± 0%   -40.14% (p=0.000 n=10)
AddVWext/100-4        1115.0n ± 0%     549.3n ± 0%   -50.74% (p=0.000 n=10)
AddVWext/1000-4       10.732µ ± 0%     5.060µ ± 0%   -52.85% (p=0.000 n=10)
AddVWext/10000-4       150.5µ ± 0%     103.7µ ± 0%   -31.10% (p=0.000 n=10)
AddVWext/100000-4      1.530m ± 0%     1.049m ± 0%   -31.41% (p=0.000 n=10)
geomean                1.003µ          633.9n        -36.79%
                   │    addvw.1    │                addvw.2                 │
                   │      B/s      │      B/s       vs base                 │
AddVW/1-4             132.8Mi ± 0%    184.1Mi ± 0%   +38.55% (p=0.000 n=10)
AddVW/2-4             220.1Mi ± 0%    316.9Mi ± 0%   +43.96% (p=0.000 n=10)
AddVW/3-4             300.7Mi ± 0%    416.4Mi ± 0%   +38.48% (p=0.000 n=10)
AddVW/4-4             357.1Mi ± 0%    543.6Mi ± 0%   +52.25% (p=0.000 n=10)
AddVW/5-4             396.7Mi ± 0%    607.2Mi ± 0%   +53.06% (p=0.000 n=10)
AddVW/10-4            510.1Mi ± 0%    852.0Mi ± 0%   +67.02% (p=0.000 n=10)
AddVW/100-4           684.1Mi ± 0%   1389.0Mi ± 0%  +103.03% (p=0.000 n=10)
AddVW/1000-4          710.9Mi ± 0%   1507.8Mi ± 0%  +112.08% (p=0.000 n=10)
AddVW/10000-4         503.1Mi ± 0%    735.8Mi ± 0%   +46.26% (p=0.000 n=10)
AddVW/100000-4        501.0Mi ± 0%    726.5Mi ± 0%   +45.00% (p=0.000 n=10)
AddVWext/1-4          132.9Mi ± 0%    184.1Mi ± 0%   +38.55% (p=0.000 n=10)
AddVWext/2-4          220.1Mi ± 0%    316.9Mi ± 0%   +43.98% (p=0.000 n=10)
AddVWext/3-4          300.7Mi ± 0%    417.1Mi ± 0%   +38.73% (p=0.000 n=10)
AddVWext/4-4          357.1Mi ± 0%    543.6Mi ± 0%   +52.25% (p=0.000 n=10)
AddVWext/5-4          396.7Mi ± 0%    607.2Mi ± 0%   +53.05% (p=0.000 n=10)
AddVWext/10-4         510.1Mi ± 0%    852.0Mi ± 0%   +67.02% (p=0.000 n=10)
AddVWext/100-4        684.2Mi ± 0%   1389.0Mi ± 0%  +103.02% (p=0.000 n=10)
AddVWext/1000-4       710.9Mi ± 0%   1507.7Mi ± 0%  +112.08% (p=0.000 n=10)
AddVWext/10000-4      506.9Mi ± 0%    735.8Mi ± 0%   +45.15% (p=0.000 n=10)
AddVWext/100000-4     498.6Mi ± 0%    727.0Mi ± 0%   +45.79% (p=0.000 n=10)
geomean               388.3Mi         614.3Mi        +58.19%
Change-Id: Ib14a4b8c1d81e710753bbf6dd5546bbca44fe3f1 Reviewed-on: https://go-review.googlesource.com/c/go/+/595397 Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
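
The phrase "up to four words per loop" refers to loop unrolling. The sketch below shows that shape in portable Go for an addVW-style routine (z = x + y for a single word y): a main loop handles four words per iteration and a tail loop handles the remainder. This is only an illustration of the structure under the assumption that `len(z) >= len(x)`. The real gain in the CL comes from hand-written riscv64 assembly, which this code does not reproduce.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addVW computes z = x + y, where x is a little-endian vector of words and
// y is a single word, and returns the carry out of the top word.
// Sketch only: it assumes len(z) >= len(x).
func addVW(z, x []uint, y uint) (carry uint) {
	carry = y // the first word adds y, later words add the carry
	i := 0
	// Main loop: four words per iteration.
	for ; i+4 <= len(x); i += 4 {
		z[i+0], carry = bits.Add(x[i+0], carry, 0)
		z[i+1], carry = bits.Add(x[i+1], carry, 0)
		z[i+2], carry = bits.Add(x[i+2], carry, 0)
		z[i+3], carry = bits.Add(x[i+3], carry, 0)
	}
	// Tail loop: the remaining zero to three words.
	for ; i < len(x); i++ {
		z[i], carry = bits.Add(x[i], carry, 0)
	}
	return carry
}

func main() {
	m := ^uint(0)
	z := make([]uint, 6)
	c := addVW(z, []uint{m, m, m, m, m, 7}, 1)
	fmt.Println(z, c)
}
```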

Implement the encoding.(Binary|Text)Appender interfaces for "x509.OID". Implement the encoding.BinaryAppender interface for "rand/v2.PCG" and "rand/v2.ChaCha8". "rand/v2.ChaCha8.MarshalBinary" also gains some performance benefits:
                          │      old      │                  new                   │
                          │    sec/op     │    sec/op      vs base                 │
ChaCha8MarshalBinary-8       33.730n ± 2%     9.786n ± 1%   -70.99% (p=0.000 n=10)
ChaCha8MarshalBinaryRead-8    99.86n ± 1%     17.79n ± 0%   -82.18% (p=0.000 n=10)
geomean                       58.04n          13.19n        -77.27%
                          │      old      │                  new                   │
                          │     B/op      │     B/op       vs base                 │
ChaCha8MarshalBinary-8         48.00 ± 0%       0.00 ± 0%  -100.00% (p=0.000 n=10)
ChaCha8MarshalBinaryRead-8     83.00 ± 0%       0.00 ± 0%  -100.00% (p=0.000 n=10)
                          │      old      │                  new                   │
                          │   allocs/op   │   allocs/op    vs base                 │
ChaCha8MarshalBinary-8         1.000 ± 0%      0.000 ± 0%  -100.00% (p=0.000 n=10)
ChaCha8MarshalBinaryRead-8     2.000 ± 0%      0.000 ± 0%  -100.00% (p=0.000 n=10)
For #62384 Change-Id: I604bde6dad90a916012909c7260f4bb06dcf5c0a GitHub-Last-Rev: 78abf9c5dfb74838985637798bcd5cb957541d20 GitHub-Pull-Request: golang/go#68987 Reviewed-on: https://go-review.googlesource.com/c/go/+/607079 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
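
The appender pattern behind these numbers can be sketched with a stand-in type. The `state` type below is hypothetical and is not the rand/v2 implementation. It only shows the method shape (`AppendBinary(b []byte) ([]byte, error)`) and why a caller that reuses its buffer can plausibly avoid per-call allocations, with `MarshalBinary` expressed in terms of the appender.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// state is a hypothetical two-word generator state used only for this sketch.
type state struct{ hi, lo uint64 }

// AppendBinary appends the big-endian encoding of s to b and returns the
// extended slice, following the appender method shape.
func (s *state) AppendBinary(b []byte) ([]byte, error) {
	b = binary.BigEndian.AppendUint64(b, s.hi)
	b = binary.BigEndian.AppendUint64(b, s.lo)
	return b, nil
}

// MarshalBinary delegates to AppendBinary with an exactly sized buffer.
func (s *state) MarshalBinary() ([]byte, error) {
	return s.AppendBinary(make([]byte, 0, 16))
}

func main() {
	s := &state{hi: 1, lo: 2}
	buf := make([]byte, 0, 64) // reused by the caller across calls
	buf, _ = s.AppendBinary(buf[:0])
	fmt.Printf("%d bytes: %x\n", len(buf), buf)
}
```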

Fix typos in ~30 files Change-Id: Ie433aea01e7d15944c1e9e103691784876d5c1f9 GitHub-Last-Rev: bbaeb3d1f88a5fa6bbb69607b1bd075f496a7894 GitHub-Pull-Request: golang/go#68964 Reviewed-on: https://go-review.googlesource.com/c/go/+/606955 Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com>

Makes calls to the global Seed a no-op. The GODEBUG=randseednop=0 setting can be used to revert this behavior. Fixes #67273 Change-Id: I79c1b2b23f3bc472fbd6190cb916a9d7583250f4 Reviewed-on: https://go-review.googlesource.com/c/go/+/606055 Auto-Submit: Cherry Mui <cherryyz@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
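
Code that called the global Seed to get reproducible output can use an explicitly seeded local generator instead, which does not depend on the GODEBUG setting. A minimal sketch with math/rand follows. The helper name `sequence` is made up for this example.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sequence returns n values drawn from a local generator that is seeded
// explicitly, so the result does not depend on the global Seed function.
func sequence(seed int64, n int) []int {
	r := rand.New(rand.NewSource(seed))
	out := make([]int, n)
	for i := range out {
		out[i] = r.Intn(100)
	}
	return out
}

func main() {
	fmt.Println(sequence(42, 5))
	fmt.Println(sequence(42, 5)) // same values as the line above
}
```

To keep the old global behavior while migrating, the setting from the commit message can be supplied through the environment, for example `GODEBUG=randseednop=0 go run .`.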

For #62384 Change-Id: I1557704c6a0f9c6f3b9aad001374dd5cdbc99065 GitHub-Last-Rev: c258d18ccedab5feeb481a2431d5647bde7e5c58 GitHub-Pull-Request: golang/go#68893 Reviewed-on: https://go-review.googlesource.com/c/go/+/605758 Reviewed-by: Ian Lance Taylor <iant@google.com> Commit-Queue: Robert Griesemer <gri@google.com> Reviewed-by: Robert Griesemer <gri@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Robert Griesemer <gri@google.com>