| Age | Commit message (Collapse) | Author |
|
Step 4 of the mini-compiler: switch to the new generated assembly.
No systematic performance regressions, and many many improvements.
In the benchmarks, the systems are:
c3h88 GOARCH=amd64 c3h88 perf gomote (newer Intel, Google Cloud)
c2s16 GOARCH=amd64 c2s16 perf gomote (Intel, Google Cloud)
s7 GOARCH=amd64 rsc basement server (AMD Ryzen 9 7950X)
386 GOARCH=386 gotip-linux-386 gomote (Intel, Google Cloud)
s7-386 GOARCH=386 rsc basement server (AMD Ryzen 9 7950X)
c4as16 GOARCH=arm64 c4as16 perf gomote (Google Cloud)
mac GOARCH=arm64 Apple M3 Pro in MacBook Pro
arm GOARCH=arm gotip-linux-arm gomote
loong64 GOARCH=loong64 gotip-linux-loong64 gomote
ppc64le GOARCH=ppc64le gotip-linux-ppc64le gomote
riscv64 GOARCH=riscv64 gotip-linux-riscv64 gomote
s390x GOARCH=s390x linux-s390x-ibm old gomote
benchmark \ system c3h88 c2s16 s7 386 s7-386 c4as16 mac arm loong64 ppc64le riscv64 s390x
AddVV/words=1 -4.03% +5.21% -4.04% +4.94% ~ ~ ~ ~ -19.51% ~ ~ ~
AddVV/words=10 -10.20% +0.34% -3.46% -11.50% -7.46% +7.66% +5.97% ~ -17.90% ~ ~ ~
AddVV/words=16 -10.91% -6.45% -8.45% -21.86% -17.90% +2.73% -1.61% ~ -22.47% -3.54% ~ ~
AddVV/words=100 -3.77% -4.30% -3.17% -47.27% -45.34% -0.78% ~ -8.74% -27.19% ~ ~ ~
AddVV/words=1000 -0.08% -0.71% ~ -49.21% -48.07% ~ ~ -16.80% -24.74% ~ ~ ~
AddVV/words=10000 ~ ~ ~ -48.73% -48.56% -0.06% ~ -17.08% ~ ~ -4.81% ~
AddVV/words=100000 ~ ~ ~ -47.80% -48.38% ~ ~ -15.10% -25.06% ~ -5.34% ~
SubVV/words=1 -0.84% +3.43% -3.62% +1.34% ~ -0.76% ~ ~ -18.18% +5.58% ~ ~
SubVV/words=10 -9.99% +0.34% ~ -11.23% -8.24% +7.53% +6.15% ~ -17.55% +2.77% -2.08% ~
SubVV/words=16 -11.94% -6.45% -6.81% -21.82% -18.11% +1.58% -1.21% ~ -20.36% ~ ~ ~
SubVV/words=100 -3.38% -4.32% -1.80% -46.14% -46.43% +0.41% ~ -7.20% -26.17% ~ -0.42% ~
SubVV/words=1000 -0.38% -0.80% ~ -49.22% -48.90% ~ ~ -15.86% -24.73% ~ ~ ~
SubVV/words=10000 ~ ~ ~ -49.57% -49.64% -0.03% ~ -15.85% -26.52% ~ -5.05% ~
SubVV/words=100000 ~ ~ ~ -46.88% -49.66% ~ ~ -15.45% -16.11% ~ -4.99% ~
LshVU/words=1 ~ +5.78% ~ ~ -2.48% +1.61% +2.18% +2.70% -18.16% -34.16% -21.29% ~
LshVU/words=10 -18.34% -3.78% +2.21% ~ ~ -2.81% -12.54% ~ -25.02% -24.78% -38.11% -66.98%
LshVU/words=16 -23.15% +1.03% +7.74% +0.73% ~ +8.88% +1.56% ~ -25.37% -28.46% -41.27% ~
LshVU/words=100 -32.85% -8.86% -2.58% ~ +2.69% +1.24% ~ -20.63% -44.14% -42.68% -53.09% ~
LshVU/words=1000 -37.30% -0.20% +5.67% ~ ~ +1.44% ~ -27.83% -45.01% -37.07% -57.02% -46.57%
LshVU/words=10000 -36.84% -2.30% +3.82% ~ +1.86% +1.57% -66.81% -28.00% -13.15% -35.40% -41.97% ~
LshVU/words=100000 -40.30% ~ +3.96% ~ ~ ~ ~ -24.91% -19.06% -36.14% -40.99% -66.03%
RshVU/words=1 -3.17% +4.76% -4.06% +4.31% +4.55% ~ ~ ~ -20.61% ~ -26.20% -51.33%
RshVU/words=10 -22.08% -4.41% -17.99% +3.64% -11.87% ~ -16.30% ~ -30.01% ~ -40.37% -63.05%
RshVU/words=16 -26.03% -8.50% -18.09% ~ -17.52% +6.50% ~ -2.85% -30.24% ~ -42.93% -63.13%
RshVU/words=100 -20.87% -28.83% -29.45% ~ -26.25% +1.46% -1.14% -16.20% -45.65% -16.20% -53.66% -77.27%
RshVU/words=1000 -24.03% -21.37% -26.71% ~ -28.95% +0.98% ~ -18.82% -45.21% -23.55% -57.09% -71.18%
RshVU/words=10000 -24.56% -22.44% -27.01% ~ -28.88% +0.78% -5.35% -17.47% -16.87% -20.67% -41.97% ~
RshVU/words=100000 -23.36% -15.65% -27.54% ~ -29.26% +1.73% -6.67% -13.68% -21.40% -23.02% -40.37% -66.31%
MulAddVWW/words=1 +2.37% +8.14% ~ +4.10% +3.71% ~ ~ ~ -21.62% ~ +1.12% ~
MulAddVWW/words=10 ~ -2.72% -15.15% +8.04% ~ ~ ~ -2.52% -19.48% ~ -6.18% ~
MulAddVWW/words=16 ~ +1.49% ~ +4.49% +6.58% -8.70% -7.16% -12.08% -21.43% -6.59% -9.05% ~
MulAddVWW/words=100 +0.37% +1.11% -4.51% -13.59% ~ -11.10% -3.63% -21.40% -22.27% -2.92% -14.41% ~
MulAddVWW/words=1000 ~ +0.90% -7.13% -18.94% ~ -14.02% -9.97% -28.31% -18.72% -2.32% -15.80% ~
MulAddVWW/words=10000 ~ +1.08% -6.75% -19.10% ~ -14.61% -9.04% -28.48% -14.29% -2.25% -9.40% ~
MulAddVWW/words=100000 ~ ~ -6.93% -18.09% ~ -14.33% -9.66% -28.92% -16.63% -2.43% -8.23% ~
AddMulVVWW/words=1 +2.30% +4.83% -11.37% +4.58% ~ -3.14% ~ ~ -10.58% +30.35% ~ ~
AddMulVVWW/words=10 -3.27% ~ +8.96% +5.74% ~ +2.67% -1.44% -7.64% -13.41% ~ ~ ~
AddMulVVWW/words=16 -6.12% ~ ~ ~ +1.91% -7.90% -16.22% -14.07% -14.26% -4.15% -7.30% ~
AddMulVVWW/words=100 -5.48% -2.14% ~ -9.40% +9.98% -1.43% -12.35% -18.56% -21.94% ~ -9.84% ~
AddMulVVWW/words=1000 -11.35% -3.40% -3.64% -11.04% +12.82% -1.33% -15.63% -20.50% -20.95% ~ -11.06% -51.97%
AddMulVVWW/words=10000 -10.31% -1.61% -8.41% -12.15% +13.10% -1.03% -16.34% -22.46% -1.00% ~ -10.33% -49.80%
AddMulVVWW/words=100000 -13.71% ~ -8.31% -12.18% +12.98% -1.35% -15.20% -21.89% ~ ~ -9.38% -48.30%
Change-Id: I0a33c33602c0d053c84d9946e662500cfa048e2d
Reviewed-on: https://go-review.googlesource.com/c/go/+/664938
Reviewed-by: Alan Donovan <adonovan@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The vast majority of the time, carry propagation is limited and
addVW/subVW only need to consider a single word for carry propagation.
As Josh Bleecher-Snyder pointed out in 2019 (CL 164968), once carrying
is done, the remaining words can be handled faster with copy (memmove).
In the benchmarks below, this is the data=random case.
Even more important, if the source and destination are the same,
the copy can be optimized away entirely, making a small in-place
addition to a big.Int O(1) instead of O(N). To date, only a few
systems (amd64, arm64, and pure Go, meaning wasm) make use of this
asymptotic improvement. This is the data=shortcut case.
This CL deletes the addVW/subVW assembly and replaces it with
an optimized pure Go version. Using Go makes it easy to call
the real copy builtin, which will use optimized memmove code,
instead of recreating a worse memmove in assembly (as arm64 does)
or omitting the copy optimization entirely (as most others do).
The worst case for the Go version versus assembly is the case
of incrementing 2^N-1 by 1, which has to propagate a carry
the entire length of the array. This is the data=carry case.
On balance, we believe this case is rare enough to be worth
taking a hit in that case, in exchange for significant wins
in the other cases and the deletion of significant amounts of
assembly of varying quality. (Remember that half the assembly has
the copy optimization and shortcut, while half does not.)
In the benchmarks, the systems are:
c2s16 GOARCH=amd64 c2s16 perf gomote (Intel, Google Cloud)
c3h88 GOARCH=amd64 c3h88 perf gomote (newer Intel, Google Cloud)
s7 GOARCH=amd64 rsc basement server (AMD Ryzen 9 7950X)
c4as16 GOARCH=arm64 c4as16 perf gomote (Google Cloud)
mac GOARCH=arm64 Apple M3 Pro in MacBook Pro
386 GOARCH=386 gotip-linux-386 gomote
arm GOARCH=arm gotip-linux-arm gomote
loong64 GOARCH=loong64 gotip-linux-loong64 gomote
ppc64le GOARCH=ppc64le gotip-linux-ppc64le gomote
riscv64 GOARCH=riscv64 gotip-linux-riscv64 gomote
benchmark \ system c2s16 c3h88 s7 c4as16 mac 386 arm loong64 ppc64le riscv64
AddVW/words=1/data=random -1.15% -1.74% -5.89% -9.80% -11.54% +23.71% -12.74% -14.25% +14.67% +10.27%
AddVW/words=2/data=random -2.59% ~ -4.38% -19.31% -15.41% +24.80% ~ -19.99% +13.73% +19.71%
AddVW/words=3/data=random -3.75% -19.10% -3.79% -23.15% -17.04% +20.04% -10.07% -23.20% ~ +15.39%
AddVW/words=4/data=random -2.84% +7.05% -8.77% -22.64% -15.77% +16.01% -7.36% -28.22% ~ +23.00%
AddVW/words=5/data=random -10.97% +2.16% -12.09% -20.89% -17.14% +9.42% -4.69% -32.60% ~ +10.07%
AddVW/words=6/data=random -9.87% ~ -7.54% -19.08% -6.46% ~ -3.44% -34.61% ~ +12.19%
AddVW/words=7/data=random -14.36% ~ -10.09% -19.10% -10.47% -6.20% -5.06% -38.14% -11.54% +6.79%
AddVW/words=8/data=random -17.50% ~ -11.06% -25.14% -12.88% -8.35% -5.11% -41.39% -14.04% +11.87%
AddVW/words=9/data=random -19.76% -4.05% -15.47% -24.08% -16.50% -12.34% -21.56% -44.25% -14.82% ~
AddVW/words=10/data=random -13.89% ~ -9.69% -23.06% -8.04% -12.58% -19.25% -32.80% -11.68% ~
AddVW/words=16/data=random -29.36% -15.35% -21.86% -25.04% -19.89% -32.26% -16.29% -42.66% -25.92% -3.01%
AddVW/words=32/data=random -39.02% -28.76% -39.87% -11.22% -2.85% -55.40% -31.17% -55.37% -37.92% -16.28%
AddVW/words=64/data=random -25.94% -19.09% -20.60% -6.90% +8.91% -51.00% -43.72% -62.27% -44.11% -28.74%
AddVW/words=100/data=random -22.79% -18.13% -18.25% ~ +33.89% -67.40% -51.77% -63.54% -53.75% -30.97%
AddVW/words=1000/data=random -8.98% -3.84% ~ -3.15% ~ -93.35% -63.92% -65.66% -68.67% -42.30%
AddVW/words=10000/data=random -1.38% -0.38% ~ ~ ~ -89.16% -65.18% -44.65% -70.35% -20.08%
AddVW/words=100000/data=random ~ ~ ~ ~ ~ -87.03% -64.51% -36.08% -61.40% -16.53%
SubVW/words=1/data=random -3.67% ~ -8.38% -10.26% -3.07% +45.78% -6.06% -11.17% ~ ~
SubVW/words=2/data=random -3.48% -10.07% -5.76% -20.14% -8.45% +44.28% ~ -19.09% ~ +16.98%
SubVW/words=3/data=random -7.11% -26.64% -4.48% -22.07% -9.21% +35.61% ~ -23.93% -18.20% ~
SubVW/words=4/data=random -4.23% +7.19% -8.95% -22.62% -13.89% +33.20% -8.96% -29.96% ~ +22.23%
SubVW/words=5/data=random -11.49% +1.92% -10.86% -22.27% -17.53% +24.48% -2.88% -35.19% -19.55% ~
SubVW/words=6/data=random -7.67% ~ -7.72% -18.44% -6.24% +12.03% -2.00% -39.68% -10.73% ~
SubVW/words=7/data=random -13.69% -18.32% -11.82% -18.92% -11.57% +6.63% ~ -43.54% -30.81% ~
SubVW/words=8/data=random -16.02% ~ -11.07% -24.50% -11.92% +4.32% -3.01% -46.95% -24.14% ~
SubVW/words=9/data=random -18.76% -3.34% -14.84% -23.79% -17.50% ~ -21.80% -49.98% -29.62% ~
SubVW/words=10/data=random -13.23% ~ -9.25% -21.26% -11.63% ~ -18.58% -39.19% -20.09% ~
SubVW/words=16/data=random -28.25% -13.24% -22.66% -27.18% -19.13% -23.38% -20.24% -51.01% -28.06% -3.05%
SubVW/words=32/data=random -38.41% -28.88% -40.12% -11.20% -2.80% -49.17% -34.67% -63.29% -39.25% -15.20%
SubVW/words=64/data=random -25.51% -19.24% -22.20% -6.57% +9.98% -48.52% -48.14% -69.50% -49.44% -27.92%
SubVW/words=100/data=random -21.69% -18.51% ~ +1.92% +34.42% -65.88% -54.67% -71.24% -58.88% -30.71%
SubVW/words=1000/data=random -9.81% -4.05% -2.14% -3.06% ~ -93.37% -67.33% -74.12% -68.36% -42.17%
SubVW/words=10000/data=random ~ -0.52% ~ ~ ~ -88.87% -68.54% -44.94% -70.63% -19.95%
SubVW/words=100000/data=random ~ ~ ~ ~ ~ -86.69% -68.09% -48.36% -62.42% -19.32%
AddVW/words=1/data=shortcut -29.38% -25.38% -27.37% -23.15% -25.41% +3.01% -33.60% -36.12% -15.76% ~
AddVW/words=2/data=shortcut -32.79% -34.72% -31.47% -24.47% -28.21% -3.75% -34.66% -43.89% -23.65% -21.56%
AddVW/words=3/data=shortcut -38.50% -46.83% -35.67% -26.38% -30.29% -10.41% -44.89% -47.68% -30.93% -26.85%
AddVW/words=4/data=shortcut -40.40% -28.85% -34.19% -29.83% -32.95% -16.09% -42.86% -51.02% -34.19% -26.69%
AddVW/words=5/data=shortcut -43.87% -35.42% -36.46% -32.59% -37.72% -20.82% -45.14% -54.01% -35.49% -30.48%
AddVW/words=6/data=shortcut -46.98% -39.34% -42.22% -35.43% -38.18% -27.46% -46.72% -56.61% -40.21% -34.07%
AddVW/words=7/data=shortcut -49.63% -47.97% -46.61% -35.28% -41.93% -31.14% -49.29% -58.89% -41.10% -37.01%
AddVW/words=8/data=shortcut -50.48% -42.33% -45.40% -40.24% -41.74% -32.92% -50.62% -60.98% -44.85% -38.10%
AddVW/words=9/data=shortcut -54.27% -43.52% -49.06% -42.16% -45.22% -37.57% -51.84% -62.91% -46.04% -40.82%
AddVW/words=10/data=shortcut -56.01% -45.40% -51.42% -43.29% -46.14% -38.65% -53.65% -64.62% -47.05% -43.21%
AddVW/words=16/data=shortcut -62.73% -55.66% -59.31% -56.38% -54.31% -53.16% -61.03% -72.29% -58.24% -52.57%
AddVW/words=32/data=shortcut -74.00% -69.42% -71.75% -33.65% -37.35% -71.73% -72.59% -82.44% -70.87% -67.69%
AddVW/words=64/data=shortcut -56.69% -52.72% -52.09% -35.48% -36.87% -84.24% -83.10% -90.37% -82.56% -80.81%
AddVW/words=100/data=shortcut -56.68% -53.18% -51.49% -33.49% -37.72% -89.95% -88.21% -93.37% -88.47% -86.52%
AddVW/words=1000/data=shortcut -56.68% -52.45% -51.66% -35.31% -36.65% -98.88% -98.62% -99.24% -98.78% -98.41%
AddVW/words=10000/data=shortcut -56.70% -52.40% -51.92% -33.49% -36.98% -99.89% -99.86% -99.92% -99.87% -99.91%
AddVW/words=100000/data=shortcut -56.67% -52.46% -52.38% -35.31% -37.20% -99.99% -99.99% -99.99% -99.99% -99.99%
SubVW/words=1/data=shortcut -29.80% -20.71% -26.94% -23.24% -25.33% +26.97% -32.02% -37.85% -40.20% -12.67%
SubVW/words=2/data=shortcut -35.47% -36.38% -31.93% -25.43% -30.18% +18.96% -33.48% -46.48% -39.38% -18.65%
SubVW/words=3/data=shortcut -39.22% -49.96% -36.90% -25.82% -30.96% +12.53% -40.67% -51.07% -43.71% -23.78%
SubVW/words=4/data=shortcut -40.46% -24.90% -34.66% -29.87% -33.97% +4.60% -42.32% -54.92% -42.83% -22.45%
SubVW/words=5/data=shortcut -43.84% -34.17% -38.00% -32.55% -37.27% -2.46% -43.09% -58.18% -45.70% -26.45%
SubVW/words=6/data=shortcut -47.69% -37.49% -42.73% -35.90% -37.73% -8.52% -46.55% -61.01% -44.00% -30.14%
SubVW/words=7/data=shortcut -49.45% -50.66% -46.88% -34.77% -41.64% -14.46% -48.92% -63.46% -50.47% -33.39%
SubVW/words=8/data=shortcut -50.45% -39.31% -47.14% -40.47% -41.70% -15.77% -50.21% -65.64% -47.71% -34.01%
SubVW/words=9/data=shortcut -54.28% -43.07% -49.42% -41.34% -44.99% -19.39% -51.55% -67.61% -56.92% -36.82%
SubVW/words=10/data=shortcut -56.85% -47.88% -50.92% -42.76% -45.67% -23.60% -53.04% -69.34% -60.18% -39.43%
SubVW/words=16/data=shortcut -62.36% -54.83% -58.80% -55.83% -53.74% -41.04% -60.16% -76.75% -60.56% -48.63%
SubVW/words=32/data=shortcut -73.68% -68.64% -71.57% -33.52% -37.34% -64.73% -72.67% -85.89% -71.87% -64.56%
SubVW/words=64/data=shortcut -56.68% -51.66% -52.56% -34.75% -37.54% -80.30% -83.58% -92.39% -83.41% -78.70%
SubVW/words=100/data=shortcut -56.68% -50.97% -51.57% -33.68% -36.78% -87.42% -88.53% -94.84% -88.87% -84.96%
SubVW/words=1000/data=shortcut -56.68% -50.89% -52.10% -34.94% -37.77% -98.59% -98.71% -99.43% -98.80% -98.20%
SubVW/words=10000/data=shortcut -56.68% -51.00% -52.44% -33.65% -37.27% -99.86% -99.87% -99.94% -99.88% -99.90%
SubVW/words=100000/data=shortcut -56.68% -50.80% -52.20% -34.79% -37.46% -99.99% -99.99% -99.99% -99.99% -99.99%
AddVW/words=1/data=carry -0.51% -5.29% -24.03% -26.48% ~ ~ -33.14% -30.23% ~ -20.74%
AddVW/words=2/data=carry -6.36% ~ -21.05% -39.40% ~ +10.72% -29.12% -31.34% ~ -17.29%
AddVW/words=3/data=carry ~ ~ -17.46% -19.53% +17.58% ~ -26.23% -23.61% +7.80% -14.34%
AddVW/words=4/data=carry +19.02% +16.80% ~ ~ +28.25% ~ -27.90% -20.31% +19.16% ~
AddVW/words=5/data=carry +3.97% +53.02% ~ ~ +11.31% ~ -19.05% -17.47% +16.81% ~
AddVW/words=6/data=carry +2.98% +19.83% ~ ~ +14.84% ~ -18.48% -14.92% +18.25% ~
AddVW/words=7/data=carry ~ ~ ~ ~ +27.17% ~ -15.50% -12.74% +13.00% ~
AddVW/words=8/data=carry +0.58% +22.32% ~ +6.10% +29.63% ~ -13.04% ~ +28.46% +2.95%
AddVW/words=9/data=carry ~ +31.53% ~ ~ +14.42% ~ -11.32% ~ +18.37% +3.28%
AddVW/words=10/data=carry +3.94% +22.36% ~ +6.29% +19.22% ~ -11.27% ~ +20.10% +3.91%
AddVW/words=16/data=carry +2.82% +14.23% ~ +10.06% +25.91% -16.12% ~ ~ +52.28% +10.40%
AddVW/words=32/data=carry ~ +25.35% +13.66% ~ +34.89% -34.39% +6.51% -18.71% +41.06% +19.42%
AddVW/words=64/data=carry -42.03% ~ -39.70% +6.65% +32.29% -39.94% +14.34% ~ +19.68% +20.86%
AddVW/words=100/data=carry -33.95% -34.28% -39.65% ~ +27.72% -26.80% +17.40% ~ +26.39% +23.32%
AddVW/words=1000/data=carry -42.49% -47.87% -47.44% +1.25% +4.25% -41.76% +23.40% ~ +25.48% +27.99%
AddVW/words=10000/data=carry -41.85% -48.49% -49.43% ~ ~ -42.09% +24.61% -10.32% +40.55% +18.35%
AddVW/words=100000/data=carry -28.18% -48.13% -48.24% +1.35% ~ -42.90% +24.73% -9.79% +22.55% +17.16%
SubVW/words=1/data=carry -10.32% -17.16% -24.14% -26.24% ~ +18.43% -34.10% -29.54% -9.57% ~
SubVW/words=2/data=carry -19.45% -23.31% -20.74% -39.73% ~ +15.74% -28.13% -30.21% ~ -18.74%
SubVW/words=3/data=carry ~ -16.18% -15.34% -19.54% +17.62% +12.39% -27.64% -27.09% ~ -14.97%
SubVW/words=4/data=carry +11.67% +24.42% ~ ~ +25.11% +14.07% -28.08% -26.18% ~ ~
SubVW/words=5/data=carry +8.08% +25.64% ~ ~ +10.35% +8.12% -21.75% -25.50% ~ -4.86%
SubVW/words=6/data=carry ~ +13.82% ~ ~ +12.92% +6.79% -20.25% -24.70% ~ -2.74%
SubVW/words=7/data=carry ~ ~ +8.29% +4.51% +26.59% +4.62% -18.01% -24.09% ~ -1.26%
SubVW/words=8/data=carry ~ +23.16% +16.19% +6.16% +25.46% +6.74% -15.57% -22.74% ~ +1.44%
SubVW/words=9/data=carry ~ +30.71% +20.81% ~ +12.36% ~ -12.99% ~ ~ +3.13%
SubVW/words=10/data=carry +5.03% +19.53% +14.84% +14.16% +16.12% ~ -11.64% -16.00% +15.45% +3.29%
SubVW/words=16/data=carry +14.42% +15.58% +33.07% +11.43% +24.65% ~ ~ -21.90% +25.59% +9.40%
SubVW/words=32/data=carry ~ +27.57% +46.58% ~ +35.35% -8.49% ~ -24.04% +11.86% +18.40%
SubVW/words=64/data=carry -24.34% -27.83% -20.90% +13.34% +37.17% -14.90% ~ -8.81% +12.88% +18.92%
SubVW/words=100/data=carry -25.19% -34.70% -27.45% +12.86% +28.42% -14.48% ~ ~ +25.71% +21.93%
SubVW/words=1000/data=carry -24.93% -47.86% -47.26% +2.66% ~ -23.88% ~ ~ +25.99% +27.81%
SubVW/words=10000/data=carry -24.17% -36.48% -49.41% +1.06% ~ -25.06% ~ -26.50% +27.94% +18.36%
SubVW/words=100000/data=carry -22.51% -35.86% -49.46% +3.96% ~ -25.18% ~ -22.15% +26.86% +15.44%
Change-Id: I8f252073040e674780ac6ec9912082fb205329dd
Reviewed-on: https://go-review.googlesource.com/c/go/+/664898
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Also fix a few real but currently harmless bugs from CL 664895.
There were a few places that were still wrong if z != x or if a != 0.
Change-Id: Id8971e2505523bc4708780c82bf998a546f4f081
Reviewed-on: https://go-review.googlesource.com/c/go/+/664897
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
|
|
It is annoying that non-x86 implementations of shlVU and shrVU
have to go out of their way to handle the trivial case shift==0
with their own copy loops. Instead, arrange to never call them
with shift==0, so that the code can be removed.
Unfortunately, there are linknames of shlVU, so we cannot
change that function. But we can rename the functions and
then leave behind a shlVU wrapper, so do that.
Since the big.Int API calls the operations Lsh and Rsh, rename
shlVU/shrVU to lshVU/rshVU. Also rename various other shl/shr
methods and functions to lsh/rsh.
Change-Id: Ieaf54e0110a298730aa3e4566ce5be57ba7fc121
Reviewed-on: https://go-review.googlesource.com/c/go/+/664896
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
addMulVVW is an unnecessarily special case.
All other assembly routines taking []Word (V as in vector) arguments
take separate source and destination. For example:
addVV: z = x+y
mulAddVWW: z = x*m+a
addMulVVW uses the z parameter as both destination and source:
addMulVVW: z = z+x*m
Even looking at the signatures is confusing: all the VV routines take
two input vectors x and y, but addMulVVW takes only x: where is y?
(The answer is that the two inputs are z and x.)
It would be nice to fix this, both for understandability and regularity,
and to simplify a future assembly generator.
We cannot remove or redefine addMulVVW, because it has been used
in linknames. Instead, the CL adds a new final addend argument ‘a’
like in mulAddVWW, making the natural name addMulVVWW
(two input vectors, two input words):
addMulVVWW: z = x+y*m+a
This CL updates all the assembly implementations to rename the
inputs z, x, y -> x, y, m, and then introduces a separate destination z.
Change-Id: Ib76c80b53f6d1f4a901f663566e9c4764bb20488
Reviewed-on: https://go-review.googlesource.com/c/go/+/664895
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
|
|
Running 'go fix' on the cmd+std packages handled much of this change.
Also update code generators to use only the new go:build lines,
not the old +build ones.
For #41184.
For #60268.
Change-Id: If35532abe3012e7357b02c79d5992ff5ac37ca23
Cq-Include-Trybots: luci.golang.try:gotip-linux-386-longtest,gotip-linux-amd64-longtest,gotip-windows-amd64-longtest
Reviewed-on: https://go-review.googlesource.com/c/go/+/536237
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Now gc can generate the same assembly code.
Change-Id: Iac503003e14045d63e2def66408c13cee516aa37
Reviewed-on: https://go-review.googlesource.com/c/go/+/402575
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Minor leftover from CL 74851.
Change-Id: I1b56afcde3c505ba77a0f79e8ae9b01000362298
GitHub-Last-Rev: 87e97571a58d5eadd63a28226543aaf1510a7b02
GitHub-Pull-Request: golang/go#48942
Reviewed-on: https://go-review.googlesource.com/c/go/+/355629
Reviewed-by: Robert Griesemer <gri@golang.org>
Trust: Daniel Martí <mvdan@mvdan.cc>
Trust: Alexander Rakoczy <alex@golang.org>
|
|
Don't add them to files in vendor and cmd/vendor though. These will be
pulled in by updating the respective dependencies.
For #41184
Change-Id: Icc57458c9b3033c347124323f33084c85b224c70
Reviewed-on: https://go-review.googlesource.com/c/go/+/319389
Trust: Tobias Klauser <tobias.klauser@gmail.com>
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
Division is much slower than multiplication. And the method of using
multiplication by multiplying reciprocal and replacing division with it
can increase the speed of divWVW algorithm by three times,and at the
same time increase the speed of nats division.
The benchmark test on arm64 is as follows:
name old time/op new time/op delta
DivWVW/1-4 13.1ns ± 4% 13.3ns ± 4% ~ (p=0.444 n=5+5)
DivWVW/2-4 48.6ns ± 1% 51.2ns ± 2% +5.39% (p=0.008 n=5+5)
DivWVW/3-4 82.0ns ± 1% 69.7ns ± 1% -15.03% (p=0.008 n=5+5)
DivWVW/4-4 116ns ± 1% 71ns ± 2% -38.88% (p=0.008 n=5+5)
DivWVW/5-4 152ns ± 1% 84ns ± 4% -44.70% (p=0.008 n=5+5)
DivWVW/10-4 319ns ± 1% 155ns ± 4% -51.50% (p=0.008 n=5+5)
DivWVW/100-4 3.44µs ± 3% 1.30µs ± 8% -62.30% (p=0.008 n=5+5)
DivWVW/1000-4 33.8µs ± 0% 10.9µs ± 1% -67.74% (p=0.008 n=5+5)
DivWVW/10000-4 343µs ± 4% 111µs ± 5% -67.63% (p=0.008 n=5+5)
DivWVW/100000-4 3.35ms ± 1% 1.25ms ± 3% -62.79% (p=0.008 n=5+5)
QuoRem-4 3.08µs ± 2% 2.21µs ± 4% -28.40% (p=0.008 n=5+5)
ModSqrt225_Tonelli-4 444µs ± 2% 457µs ± 3% ~ (p=0.095 n=5+5)
ModSqrt225_3Mod4-4 136µs ± 1% 138µs ± 3% ~ (p=0.151 n=5+5)
ModSqrt231_Tonelli-4 473µs ± 3% 483µs ± 4% ~ (p=0.548 n=5+5)
ModSqrt231_5Mod8-4 164µs ± 9% 169µs ±12% ~ (p=0.421 n=5+5)
Sqrt-4 36.8µs ± 1% 28.6µs ± 0% -22.17% (p=0.016 n=5+4)
Div/20/10-4 50.0ns ± 3% 51.3ns ± 6% ~ (p=0.238 n=5+5)
Div/40/20-4 49.8ns ± 2% 51.3ns ± 6% ~ (p=0.222 n=5+5)
Div/100/50-4 85.8ns ± 4% 86.5ns ± 5% ~ (p=0.246 n=5+5)
Div/200/100-4 335ns ± 3% 296ns ± 2% -11.60% (p=0.008 n=5+5)
Div/400/200-4 442ns ± 2% 359ns ± 5% -18.81% (p=0.008 n=5+5)
Div/1000/500-4 858ns ± 3% 643ns ± 6% -25.06% (p=0.008 n=5+5)
Div/2000/1000-4 1.70µs ± 3% 1.28µs ± 4% -24.80% (p=0.008 n=5+5)
Div/20000/10000-4 45.0µs ± 5% 41.8µs ± 4% -7.17% (p=0.016 n=5+5)
Div/200000/100000-4 1.51ms ± 7% 1.43ms ± 3% -5.42% (p=0.016 n=5+5)
Div/2000000/1000000-4 57.6ms ± 4% 57.5ms ± 3% ~ (p=1.000 n=5+5)
Div/20000000/10000000-4 2.08s ± 3% 2.04s ± 1% ~ (p=0.095 n=5+5)
name old speed new speed delta
DivWVW/1-4 4.87GB/s ± 4% 4.80GB/s ± 4% ~ (p=0.310 n=5+5)
DivWVW/2-4 2.63GB/s ± 1% 2.50GB/s ± 2% -5.07% (p=0.008 n=5+5)
DivWVW/3-4 2.34GB/s ± 1% 2.76GB/s ± 1% +17.70% (p=0.008 n=5+5)
DivWVW/4-4 2.21GB/s ± 1% 3.61GB/s ± 2% +63.42% (p=0.008 n=5+5)
DivWVW/5-4 2.10GB/s ± 2% 3.81GB/s ± 4% +80.89% (p=0.008 n=5+5)
DivWVW/10-4 2.01GB/s ± 0% 4.13GB/s ± 4% +105.91% (p=0.008 n=5+5)
DivWVW/100-4 1.86GB/s ± 2% 4.95GB/s ± 7% +165.63% (p=0.008 n=5+5)
DivWVW/1000-4 1.89GB/s ± 0% 5.86GB/s ± 1% +209.96% (p=0.008 n=5+5)
DivWVW/10000-4 1.87GB/s ± 4% 5.76GB/s ± 5% +208.96% (p=0.008 n=5+5)
DivWVW/100000-4 1.91GB/s ± 1% 5.14GB/s ± 3% +168.85% (p=0.008 n=5+5)
Change-Id: I049f1196562b20800e6ef8a6493fd147f93ad830
Reviewed-on: https://go-review.googlesource.com/c/go/+/250417
Trust: Giovanni Bajo <rasky@develer.com>
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Assembly files with "/vendor/" or "testdata" in their paths were ignored.
Change-Id: I3882ff07eb4426abb9f8ee96f82dff73c81cd61f
GitHub-Last-Rev: 51ae8c324d72a12a059272fcf8568e670bfaf21b
GitHub-Pull-Request: golang/go#31166
Reviewed-on: https://go-review.googlesource.com/c/go/+/170197
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This matches the pure Go fast path added in the previous commit.
I will leave other architectures to those with ready access to hardware.
name old time/op new time/op delta
AddVW/1-8 3.60ns ± 3% 3.59ns ± 1% ~ (p=0.147 n=91+86)
AddVW/2-8 3.92ns ± 1% 3.91ns ± 2% -0.36% (p=0.000 n=86+92)
AddVW/3-8 4.33ns ± 5% 4.46ns ± 5% +2.94% (p=0.000 n=96+97)
AddVW/4-8 4.76ns ± 5% 4.82ns ± 5% +1.28% (p=0.000 n=95+92)
AddVW/5-8 5.40ns ± 1% 5.42ns ± 0% +0.47% (p=0.000 n=76+71)
AddVW/10-8 8.03ns ± 1% 7.80ns ± 5% -2.90% (p=0.000 n=73+96)
AddVW/100-8 43.8ns ± 5% 17.9ns ± 1% -59.12% (p=0.000 n=94+81)
AddVW/1000-8 428ns ± 4% 85ns ± 6% -80.20% (p=0.000 n=96+99)
AddVW/10000-8 4.22µs ± 2% 1.80µs ± 3% -57.32% (p=0.000 n=69+92)
AddVW/100000-8 44.8µs ± 8% 31.5µs ± 3% -29.76% (p=0.000 n=99+90)
name old time/op new time/op delta
SubVW/1-8 3.53ns ± 2% 3.63ns ± 5% +2.97% (p=0.000 n=94+93)
SubVW/2-8 4.33ns ± 5% 4.01ns ± 2% -7.36% (p=0.000 n=90+85)
SubVW/3-8 4.32ns ± 2% 4.32ns ± 5% ~ (p=0.084 n=87+97)
SubVW/4-8 4.70ns ± 2% 4.83ns ± 6% +2.77% (p=0.000 n=85+96)
SubVW/5-8 5.84ns ± 1% 5.35ns ± 1% -8.35% (p=0.000 n=87+87)
SubVW/10-8 8.01ns ± 4% 7.54ns ± 4% -5.84% (p=0.000 n=98+97)
SubVW/100-8 43.9ns ± 5% 17.9ns ± 1% -59.20% (p=0.000 n=98+76)
SubVW/1000-8 426ns ± 2% 85ns ± 3% -80.13% (p=0.000 n=90+98)
SubVW/10000-8 4.24µs ± 2% 1.81µs ± 3% -57.28% (p=0.000 n=74+91)
SubVW/100000-8 44.5µs ± 4% 31.5µs ± 2% -29.33% (p=0.000 n=84+91)
Change-Id: I10dd361cbaca22197c27e7734c0f50065292afbb
Reviewed-on: https://go-review.googlesource.com/c/go/+/164969
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
Don't worry, this patch just remove trailing whitespace from
assembly files, and does not touch any logical changes.
Change-Id: Ia724ac0b1abf8bc1e41454bdc79289ef317c165d
Reviewed-on: https://go-review.googlesource.com/c/113595
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Use MULX/ADOX/ADCX instructions to speed-up addMulVVW,
when they are available. addMulVVW is a hotspot in rsa.
This is faster than ADD/ADC/IMUL version, because ADOX/ADCX only
modify carry/overflow flag, so they can be interleaved with each other
and with MULX, which doesn't modify flags at all.
Increasing unroll factor to e. g. 16 makes rsa 1% faster, but 3PrimeRSA2048Decrypt
performance falls back to baseline.
Updates #20058
AddMulVVW/1-8 3.28ns ± 2% 3.26ns ± 3% ~ (p=0.107 n=10+10)
AddMulVVW/2-8 4.26ns ± 2% 4.24ns ± 3% ~ (p=0.327 n=9+9)
AddMulVVW/3-8 5.07ns ± 2% 5.26ns ± 2% +3.73% (p=0.000 n=10+10)
AddMulVVW/4-8 6.40ns ± 2% 6.50ns ± 2% +1.61% (p=0.000 n=10+10)
AddMulVVW/5-8 6.77ns ± 2% 6.86ns ± 1% +1.38% (p=0.001 n=9+9)
AddMulVVW/10-8 12.2ns ± 2% 10.6ns ± 3% -13.65% (p=0.000 n=10+10)
AddMulVVW/100-8 79.7ns ± 2% 52.4ns ± 1% -34.17% (p=0.000 n=10+10)
AddMulVVW/1000-8 695ns ± 1% 491ns ± 2% -29.39% (p=0.000 n=9+10)
AddMulVVW/10000-8 7.26µs ± 2% 5.92µs ± 6% -18.42% (p=0.000 n=10+10)
AddMulVVW/100000-8 72.6µs ± 2% 62.2µs ± 2% -14.31% (p=0.000 n=10+10)
crypto/rsa speed-up is smaller, but stil noticeable:
RSA2048Decrypt-8 1.61ms ± 1% 1.38ms ± 1% -14.13% (p=0.000 n=10+10)
RSA2048Sign-8 1.93ms ± 1% 1.70ms ± 1% -11.86% (p=0.000 n=10+10)
3PrimeRSA2048Decrypt-8 932µs ± 0% 828µs ± 0% -11.15% (p=0.000 n=10+10)
Results on crypto/tls:
HandshakeServer/RSA-8 901µs ± 1% 777µs ± 0% -13.70% (p=0.000 n=10+8)
HandshakeServer/ECDHE-P256-RSA-8 1.01ms ± 1% 0.90ms ± 0% -11.53% (p=0.000 n=10+9)
Full math/big benchmarks:
name old time/op new time/op delta
AddVV/1-8 3.74ns ± 6% 3.55ns ± 2% ~ (p=0.082 n=10+8)
AddVV/2-8 3.96ns ± 2% 3.98ns ± 5% ~ (p=0.794 n=10+9)
AddVV/3-8 4.97ns ± 2% 4.94ns ± 1% ~ (p=0.081 n=10+9)
AddVV/4-8 5.59ns ± 2% 5.59ns ± 2% ~ (p=0.809 n=10+10)
AddVV/5-8 6.63ns ± 1% 6.62ns ± 1% ~ (p=0.560 n=9+10)
AddVV/10-8 8.11ns ± 1% 8.11ns ± 2% ~ (p=0.402 n=10+10)
AddVV/100-8 46.9ns ± 2% 46.8ns ± 1% ~ (p=0.809 n=10+10)
AddVV/1000-8 389ns ± 1% 391ns ± 4% ~ (p=0.809 n=10+10)
AddVV/10000-8 5.05µs ± 5% 4.98µs ± 2% ~ (p=0.113 n=9+10)
AddVV/100000-8 55.3µs ± 3% 55.2µs ± 3% ~ (p=0.796 n=10+10)
AddVW/1-8 3.04ns ± 3% 3.02ns ± 3% ~ (p=0.538 n=10+10)
AddVW/2-8 3.57ns ± 2% 3.61ns ± 2% +1.12% (p=0.032 n=9+9)
AddVW/3-8 3.77ns ± 1% 3.79ns ± 2% ~ (p=0.719 n=10+10)
AddVW/4-8 4.69ns ± 1% 4.69ns ± 2% ~ (p=0.920 n=10+9)
AddVW/5-8 4.58ns ± 1% 4.58ns ± 1% ~ (p=0.812 n=10+10)
AddVW/10-8 7.62ns ± 2% 7.63ns ± 1% ~ (p=0.926 n=10+10)
AddVW/100-8 41.1ns ± 2% 42.4ns ± 3% +3.34% (p=0.000 n=10+10)
AddVW/1000-8 386ns ± 2% 389ns ± 4% ~ (p=0.514 n=10+10)
AddVW/10000-8 3.88µs ± 3% 3.87µs ± 3% ~ (p=0.448 n=10+10)
AddVW/100000-8 41.2µs ± 3% 41.7µs ± 3% ~ (p=0.148 n=10+10)
AddMulVVW/1-8 3.28ns ± 2% 3.26ns ± 3% ~ (p=0.107 n=10+10)
AddMulVVW/2-8 4.26ns ± 2% 4.24ns ± 3% ~ (p=0.327 n=9+9)
AddMulVVW/3-8 5.07ns ± 2% 5.26ns ± 2% +3.73% (p=0.000 n=10+10)
AddMulVVW/4-8 6.40ns ± 2% 6.50ns ± 2% +1.61% (p=0.000 n=10+10)
AddMulVVW/5-8 6.77ns ± 2% 6.86ns ± 1% +1.38% (p=0.001 n=9+9)
AddMulVVW/10-8 12.2ns ± 2% 10.6ns ± 3% -13.65% (p=0.000 n=10+10)
AddMulVVW/100-8 79.7ns ± 2% 52.4ns ± 1% -34.17% (p=0.000 n=10+10)
AddMulVVW/1000-8 695ns ± 1% 491ns ± 2% -29.39% (p=0.000 n=9+10)
AddMulVVW/10000-8 7.26µs ± 2% 5.92µs ± 6% -18.42% (p=0.000 n=10+10)
AddMulVVW/100000-8 72.6µs ± 2% 62.2µs ± 2% -14.31% (p=0.000 n=10+10)
DecimalConversion-8 108µs ±19% 104µs ± 4% ~ (p=0.460 n=10+8)
FloatString/100-8 926ns ±14% 908ns ± 5% ~ (p=0.398 n=9+9)
FloatString/1000-8 25.7µs ± 1% 25.7µs ± 1% ~ (p=0.739 n=10+10)
FloatString/10000-8 2.13ms ± 1% 2.12ms ± 1% ~ (p=0.353 n=10+10)
FloatString/100000-8 207ms ± 1% 206ms ± 2% ~ (p=0.912 n=10+10)
FloatAdd/10-8 61.3ns ± 3% 61.9ns ± 3% ~ (p=0.183 n=10+10)
FloatAdd/100-8 62.0ns ± 2% 62.9ns ± 4% ~ (p=0.118 n=10+10)
FloatAdd/1000-8 84.7ns ± 2% 84.4ns ± 1% ~ (p=0.591 n=10+10)
FloatAdd/10000-8 305ns ± 2% 306ns ± 1% ~ (p=0.443 n=10+10)
FloatAdd/100000-8 2.45µs ± 1% 2.46µs ± 1% ~ (p=0.782 n=10+10)
FloatSub/10-8 56.8ns ± 4% 56.5ns ± 5% ~ (p=0.423 n=10+10)
FloatSub/100-8 57.3ns ± 4% 57.1ns ± 5% ~ (p=0.540 n=10+10)
FloatSub/1000-8 66.8ns ± 4% 66.6ns ± 1% ~ (p=0.868 n=10+10)
FloatSub/10000-8 199ns ± 1% 198ns ± 1% ~ (p=0.287 n=10+9)
FloatSub/100000-8 1.47µs ± 2% 1.47µs ± 2% ~ (p=0.920 n=10+9)
ParseFloatSmallExp-8 8.74µs ±10% 9.48µs ±10% +8.51% (p=0.010 n=9+10)
ParseFloatLargeExp-8 39.2µs ±25% 39.6µs ±12% ~ (p=0.529 n=10+10)
GCD10x10/WithoutXY-8 173ns ±23% 177ns ±20% ~ (p=0.698 n=10+10)
GCD10x10/WithXY-8 736ns ±12% 728ns ±16% ~ (p=0.838 n=10+10)
GCD10x100/WithoutXY-8 325ns ±16% 326ns ±14% ~ (p=0.912 n=10+10)
GCD10x100/WithXY-8 1.14µs ±13% 1.16µs ± 6% ~ (p=0.287 n=10+9)
GCD10x1000/WithoutXY-8 851ns ±25% 820ns ±12% ~ (p=0.592 n=10+10)
GCD10x1000/WithXY-8 2.89µs ±17% 2.85µs ± 5% ~ (p=1.000 n=10+9)
GCD10x10000/WithoutXY-8 6.66µs ±12% 6.82µs ±19% ~ (p=0.529 n=10+10)
GCD10x10000/WithXY-8 18.0µs ± 5% 17.2µs ±19% ~ (p=0.315 n=7+10)
GCD10x100000/WithoutXY-8 77.8µs ±18% 73.3µs ±11% ~ (p=0.315 n=10+9)
GCD10x100000/WithXY-8 186µs ±14% 204µs ±29% ~ (p=0.218 n=10+10)
GCD100x100/WithoutXY-8 1.09µs ± 1% 1.09µs ± 2% ~ (p=0.117 n=9+10)
GCD100x100/WithXY-8 7.93µs ± 1% 7.97µs ± 1% +0.52% (p=0.006 n=10+10)
GCD100x1000/WithoutXY-8 2.00µs ± 3% 2.04µs ± 6% ~ (p=0.053 n=9+10)
GCD100x1000/WithXY-8 9.23µs ± 1% 9.29µs ± 1% +0.63% (p=0.009 n=10+10)
GCD100x10000/WithoutXY-8 10.2µs ±11% 9.7µs ± 6% ~ (p=0.278 n=10+9)
GCD100x10000/WithXY-8 33.3µs ± 4% 33.6µs ± 4% ~ (p=0.481 n=10+10)
GCD100x100000/WithoutXY-8 106µs ±17% 105µs ±13% ~ (p=0.853 n=10+10)
GCD100x100000/WithXY-8 289µs ±17% 276µs ± 8% ~ (p=0.353 n=10+10)
GCD1000x1000/WithoutXY-8 12.2µs ± 1% 12.1µs ± 1% -0.45% (p=0.007 n=10+10)
GCD1000x1000/WithXY-8 131µs ± 1% 132µs ± 0% +0.93% (p=0.000 n=9+7)
GCD1000x10000/WithoutXY-8 20.6µs ± 2% 20.6µs ± 1% ~ (p=0.326 n=10+9)
GCD1000x10000/WithXY-8 238µs ± 1% 237µs ± 1% ~ (p=0.356 n=9+10)
GCD1000x100000/WithoutXY-8 117µs ± 8% 114µs ±11% ~ (p=0.190 n=10+10)
GCD1000x100000/WithXY-8 1.51ms ± 1% 1.50ms ± 1% ~ (p=0.053 n=9+10)
GCD10000x10000/WithoutXY-8 220µs ± 1% 218µs ± 1% -0.86% (p=0.000 n=10+10)
GCD10000x10000/WithXY-8 3.04ms ± 0% 3.05ms ± 0% +0.33% (p=0.001 n=9+10)
GCD10000x100000/WithoutXY-8 513µs ± 0% 511µs ± 0% -0.38% (p=0.000 n=10+10)
GCD10000x100000/WithXY-8 15.1ms ± 0% 15.0ms ± 0% ~ (p=0.053 n=10+9)
GCD100000x100000/WithoutXY-8 10.4ms ± 1% 10.4ms ± 2% ~ (p=0.258 n=9+9)
GCD100000x100000/WithXY-8 205ms ± 1% 205ms ± 1% ~ (p=0.481 n=10+10)
Hilbert-8 1.25ms ±15% 1.24ms ±17% ~ (p=0.853 n=10+10)
Binomial-8 3.03µs ±24% 2.90µs ±16% ~ (p=0.481 n=10+10)
QuoRem-8 1.95µs ± 1% 1.95µs ± 2% ~ (p=0.117 n=9+10)
Exp-8 5.12ms ± 2% 3.99ms ± 1% -22.02% (p=0.000 n=10+9)
Exp2-8 5.14ms ± 2% 3.98ms ± 0% -22.55% (p=0.000 n=10+9)
Bitset-8 16.4ns ± 2% 16.5ns ± 2% ~ (p=0.311 n=9+10)
BitsetNeg-8 46.3ns ± 4% 45.8ns ± 4% ~ (p=0.272 n=10+10)
BitsetOrig-8 250ns ±19% 247ns ±14% ~ (p=0.671 n=10+10)
BitsetNegOrig-8 416ns ±14% 429ns ±14% ~ (p=0.353 n=10+10)
ModSqrt225_Tonelli-8 400µs ± 0% 320µs ± 0% -19.88% (p=0.000 n=9+7)
ModSqrt224_3Mod4-8 123µs ± 1% 97µs ± 0% -21.21% (p=0.000 n=9+10)
ModSqrt5430_Tonelli-8 1.87s ± 0% 1.39s ± 1% -25.70% (p=0.000 n=9+10)
ModSqrt5430_3Mod4-8 630ms ± 2% 465ms ± 1% -26.12% (p=0.000 n=10+10)
Sqrt-8 25.8µs ± 1% 25.9µs ± 0% +0.66% (p=0.002 n=10+8)
IntSqr/1-8 11.3ns ± 1% 11.3ns ± 2% ~ (p=0.360 n=9+10)
IntSqr/2-8 26.6ns ± 1% 27.4ns ± 2% +2.87% (p=0.000 n=8+9)
IntSqr/3-8 36.5ns ± 6% 36.6ns ± 5% ~ (p=0.589 n=10+10)
IntSqr/5-8 57.2ns ± 2% 57.8ns ± 1% +0.92% (p=0.045 n=10+9)
IntSqr/8-8 112ns ± 1% 93ns ± 1% -16.60% (p=0.000 n=10+10)
IntSqr/10-8 148ns ± 1% 129ns ± 5% -12.85% (p=0.000 n=10+10)
IntSqr/20-8 642ns ±28% 692ns ±21% ~ (p=0.105 n=10+10)
IntSqr/30-8 1.03µs ±18% 1.06µs ±15% ~ (p=0.422 n=10+8)
IntSqr/50-8 2.33µs ±14% 2.14µs ±20% ~ (p=0.063 n=10+10)
IntSqr/80-8 4.06µs ±13% 3.72µs ±14% -8.31% (p=0.029 n=10+10)
IntSqr/100-8 5.79µs ±10% 5.20µs ±18% -10.15% (p=0.004 n=10+10)
IntSqr/200-8 17.1µs ± 1% 12.9µs ± 3% -24.44% (p=0.000 n=10+10)
IntSqr/300-8 35.9µs ± 0% 26.6µs ± 1% -25.75% (p=0.000 n=10+10)
IntSqr/500-8 84.9µs ± 0% 71.7µs ± 1% -15.49% (p=0.000 n=10+10)
IntSqr/800-8 170µs ± 1% 142µs ± 2% -16.73% (p=0.000 n=10+10)
IntSqr/1000-8 258µs ± 1% 218µs ± 1% -15.65% (p=0.000 n=10+10)
Mul-8 10.4ms ± 1% 8.3ms ± 0% -20.05% (p=0.000 n=10+9)
Exp3Power/0x10-8 311ns ±15% 321ns ±24% ~ (p=0.447 n=10+10)
Exp3Power/0x40-8 358ns ±21% 346ns ±37% ~ (p=0.591 n=10+10)
Exp3Power/0x100-8 611ns ±19% 570ns ±27% ~ (p=0.393 n=10+10)
Exp3Power/0x400-8 1.31µs ±26% 1.34µs ±19% ~ (p=0.853 n=10+10)
Exp3Power/0x1000-8 6.76µs ±23% 6.22µs ±16% ~ (p=0.095 n=10+9)
Exp3Power/0x4000-8 37.6µs ±14% 36.4µs ±21% ~ (p=0.247 n=10+10)
Exp3Power/0x10000-8 345µs ±14% 310µs ±11% -9.99% (p=0.005 n=10+10)
Exp3Power/0x40000-8 2.77ms ± 1% 2.34ms ± 1% -15.47% (p=0.000 n=10+10)
Exp3Power/0x100000-8 25.1ms ± 1% 21.3ms ± 1% -15.26% (p=0.000 n=10+10)
Exp3Power/0x400000-8 225ms ± 1% 190ms ± 1% -15.61% (p=0.000 n=10+10)
Fibo-8 23.4ms ± 1% 23.3ms ± 0% ~ (p=0.052 n=10+10)
NatSqr/1-8 58.4ns ±24% 59.8ns ±38% ~ (p=0.739 n=10+10)
NatSqr/2-8 122ns ±21% 122ns ±16% ~ (p=0.896 n=10+10)
NatSqr/3-8 140ns ±28% 148ns ±30% ~ (p=0.288 n=10+10)
NatSqr/5-8 193ns ±29% 210ns ±34% ~ (p=0.469 n=10+10)
NatSqr/8-8 317ns ±21% 296ns ±25% ~ (p=0.393 n=10+10)
NatSqr/10-8 362ns ± 8% 373ns ±30% ~ (p=0.617 n=9+10)
NatSqr/20-8 1.24µs ±16% 1.06µs ±29% -14.57% (p=0.019 n=10+10)
NatSqr/30-8 1.90µs ±32% 1.71µs ±10% ~ (p=0.176 n=10+9)
NatSqr/50-8 4.22µs ±19% 3.67µs ± 7% -13.03% (p=0.017 n=10+9)
NatSqr/80-8 7.33µs ±20% 6.50µs ±15% -11.26% (p=0.009 n=10+10)
NatSqr/100-8 9.84µs ±18% 9.33µs ± 8% ~ (p=0.280 n=10+10)
NatSqr/200-8 21.4µs ± 7% 20.0µs ±14% ~ (p=0.075 n=10+10)
NatSqr/300-8 38.0µs ± 2% 31.3µs ±10% -17.63% (p=0.000 n=10+10)
NatSqr/500-8 102µs ± 5% 101µs ± 4% ~ (p=0.780 n=9+10)
NatSqr/800-8 190µs ± 3% 166µs ± 6% -12.29% (p=0.000 n=10+10)
NatSqr/1000-8 277µs ± 2% 245µs ± 6% -11.64% (p=0.000 n=10+10)
ScanPi-8 144µs ±23% 149µs ±24% ~ (p=0.579 n=10+10)
StringPiParallel-8 25.6µs ± 0% 25.8µs ± 0% +0.69% (p=0.000 n=9+10)
Scan/10/Base2-8 305ns ± 1% 309ns ± 1% +1.32% (p=0.000 n=10+9)
Scan/100/Base2-8 1.95µs ± 1% 1.98µs ± 1% +1.10% (p=0.000 n=10+10)
Scan/1000/Base2-8 19.5µs ± 1% 19.7µs ± 1% +1.39% (p=0.000 n=10+10)
Scan/10000/Base2-8 270µs ± 1% 272µs ± 1% +0.58% (p=0.024 n=9+9)
Scan/100000/Base2-8 10.3ms ± 0% 10.3ms ± 0% +0.16% (p=0.022 n=9+10)
Scan/10/Base8-8 146ns ± 4% 154ns ± 4% +5.57% (p=0.000 n=9+9)
Scan/100/Base8-8 748ns ± 1% 759ns ± 1% +1.51% (p=0.000 n=9+10)
Scan/1000/Base8-8 7.88µs ± 1% 8.00µs ± 1% +1.64% (p=0.000 n=10+10)
Scan/10000/Base8-8 155µs ± 1% 155µs ± 1% ~ (p=0.968 n=10+9)
Scan/100000/Base8-8 9.11ms ± 0% 9.11ms ± 0% ~ (p=0.604 n=9+10)
Scan/10/Base10-8 140ns ± 5% 149ns ± 5% +6.39% (p=0.000 n=9+10)
Scan/100/Base10-8 680ns ± 0% 688ns ± 1% +1.08% (p=0.000 n=9+10)
Scan/1000/Base10-8 7.09µs ± 1% 7.16µs ± 1% +0.98% (p=0.019 n=10+10)
Scan/10000/Base10-8 149µs ± 3% 150µs ± 3% ~ (p=0.143 n=10+10)
Scan/100000/Base10-8 9.16ms ± 0% 9.16ms ± 0% ~ (p=0.661 n=10+9)
Scan/10/Base16-8 134ns ± 5% 135ns ± 3% ~ (p=0.505 n=9+9)
Scan/100/Base16-8 560ns ± 1% 563ns ± 0% +0.67% (p=0.000 n=10+8)
Scan/1000/Base16-8 6.28µs ± 1% 6.26µs ± 1% ~ (p=0.448 n=10+10)
Scan/10000/Base16-8 161µs ± 1% 162µs ± 1% +0.74% (p=0.008 n=9+9)
Scan/100000/Base16-8 9.64ms ± 0% 9.64ms ± 0% ~ (p=0.436 n=10+10)
String/10/Base2-8 116ns ±12% 118ns ±13% ~ (p=0.645 n=10+10)
String/100/Base2-8 871ns ±23% 860ns ±22% ~ (p=0.699 n=10+10)
String/1000/Base2-8 10.0µs ±20% 10.0µs ±23% ~ (p=0.853 n=10+10)
String/10000/Base2-8 110µs ±21% 120µs ±25% ~ (p=0.436 n=10+10)
String/100000/Base2-8 768µs ±11% 733µs ±16% ~ (p=0.393 n=10+10)
String/10/Base8-8 51.3ns ± 1% 51.0ns ± 3% ~ (p=0.286 n=9+9)
String/100/Base8-8 284ns ± 9% 272ns ±12% ~ (p=0.267 n=9+10)
String/1000/Base8-8 3.06µs ± 9% 3.04µs ±10% ~ (p=0.739 n=10+10)
String/10000/Base8-8 36.1µs ±14% 35.1µs ± 9% ~ (p=0.447 n=10+9)
String/100000/Base8-8 371µs ±12% 373µs ±16% ~ (p=0.739 n=10+10)
String/10/Base10-8 167ns ±11% 165ns ± 9% ~ (p=0.781 n=10+10)
String/100/Base10-8 727ns ± 1% 740ns ± 2% +1.70% (p=0.001 n=10+10)
String/1000/Base10-8 5.30µs ±18% 5.37µs ±14% ~ (p=0.631 n=10+10)
String/10000/Base10-8 45.0µs ±14% 44.6µs ±10% ~ (p=0.720 n=9+10)
String/100000/Base10-8 5.10ms ± 1% 5.05ms ± 3% ~ (p=0.211 n=9+10)
String/10/Base16-8 47.7ns ± 6% 47.7ns ± 6% ~ (p=0.985 n=10+10)
String/100/Base16-8 221ns ±10% 234ns ±27% ~ (p=0.541 n=10+10)
String/1000/Base16-8 2.23µs ±11% 2.12µs ± 8% -4.81% (p=0.029 n=9+8)
String/10000/Base16-8 28.3µs ±21% 28.5µs ±14% ~ (p=0.796 n=10+10)
String/100000/Base16-8 291µs ±16% 293µs ±15% ~ (p=0.931 n=9+9)
LeafSize/0-8 2.43ms ± 1% 2.49ms ± 1% +2.56% (p=0.000 n=10+10)
LeafSize/1-8 49.7µs ± 9% 46.3µs ±16% -6.78% (p=0.017 n=10+9)
LeafSize/2-8 48.4µs ±18% 46.3µs ±19% ~ (p=0.436 n=10+10)
LeafSize/3-8 81.7µs ± 3% 80.9µs ± 3% ~ (p=0.278 n=10+9)
LeafSize/4-8 47.0µs ± 7% 47.9µs ±13% ~ (p=0.905 n=9+10)
LeafSize/5-8 96.8µs ± 1% 97.3µs ± 2% ~ (p=0.515 n=8+10)
LeafSize/6-8 82.5µs ± 4% 80.9µs ± 2% -1.92% (p=0.019 n=10+10)
LeafSize/7-8 67.2µs ±13% 66.6µs ± 9% ~ (p=0.842 n=10+9)
LeafSize/8-8 46.0µs ±28% 45.1µs ±12% ~ (p=0.739 n=10+10)
LeafSize/9-8 111µs ± 1% 111µs ± 1% ~ (p=0.739 n=10+10)
LeafSize/10-8 98.8µs ± 4% 97.9µs ± 3% ~ (p=0.278 n=10+9)
LeafSize/11-8 96.8µs ± 1% 96.4µs ± 1% ~ (p=0.211 n=9+10)
LeafSize/12-8 81.0µs ± 4% 81.3µs ± 3% ~ (p=0.579 n=10+10)
LeafSize/13-8 79.7µs ± 5% 79.2µs ± 3% ~ (p=0.661 n=10+9)
LeafSize/14-8 67.6µs ±12% 65.8µs ± 7% ~ (p=0.447 n=10+9)
LeafSize/15-8 63.9µs ±17% 66.3µs ±14% ~ (p=0.481 n=10+10)
LeafSize/16-8 44.0µs ±28% 46.0µs ±27% ~ (p=0.481 n=10+10)
LeafSize/32-8 46.2µs ±13% 43.5µs ±18% ~ (p=0.156 n=9+10)
LeafSize/64-8 53.3µs ±10% 53.0µs ±19% ~ (p=0.730 n=9+9)
ProbablyPrime/n=0-8 3.60ms ± 1% 3.39ms ± 1% -5.87% (p=0.000 n=10+9)
ProbablyPrime/n=1-8 4.42ms ± 1% 4.08ms ± 1% -7.69% (p=0.000 n=10+10)
ProbablyPrime/n=5-8 7.57ms ± 2% 6.79ms ± 1% -10.24% (p=0.000 n=10+10)
ProbablyPrime/n=10-8 11.6ms ± 2% 10.2ms ± 1% -11.69% (p=0.000 n=10+10)
ProbablyPrime/n=20-8 19.4ms ± 2% 16.9ms ± 2% -12.89% (p=0.000 n=10+10)
ProbablyPrime/Lucas-8 2.81ms ± 2% 2.72ms ± 1% -3.22% (p=0.000 n=10+9)
ProbablyPrime/MillerRabinBase2-8 797µs ± 1% 680µs ± 1% -14.64% (p=0.000 n=10+10)
name old speed new speed delta
AddVV/1-8 17.1GB/s ± 6% 18.0GB/s ± 2% ~ (p=0.122 n=10+8)
AddVV/2-8 32.4GB/s ± 2% 32.2GB/s ± 4% ~ (p=0.661 n=10+9)
AddVV/3-8 38.6GB/s ± 2% 38.9GB/s ± 1% ~ (p=0.113 n=10+9)
AddVV/4-8 45.8GB/s ± 2% 45.8GB/s ± 2% ~ (p=0.796 n=10+10)
AddVV/5-8 48.1GB/s ± 2% 48.3GB/s ± 1% ~ (p=0.315 n=10+10)
AddVV/10-8 78.9GB/s ± 1% 78.9GB/s ± 2% ~ (p=0.353 n=10+10)
AddVV/100-8 136GB/s ± 2% 137GB/s ± 1% ~ (p=0.971 n=10+10)
AddVV/1000-8 164GB/s ± 1% 164GB/s ± 4% ~ (p=0.853 n=10+10)
AddVV/10000-8 126GB/s ± 6% 129GB/s ± 2% ~ (p=0.063 n=10+10)
AddVV/100000-8 116GB/s ± 3% 116GB/s ± 3% ~ (p=0.796 n=10+10)
AddVW/1-8 2.64GB/s ± 3% 2.64GB/s ± 3% ~ (p=0.579 n=10+10)
AddVW/2-8 4.49GB/s ± 2% 4.44GB/s ± 2% -1.09% (p=0.040 n=9+9)
AddVW/3-8 6.36GB/s ± 1% 6.34GB/s ± 2% ~ (p=0.684 n=10+10)
AddVW/4-8 6.83GB/s ± 1% 6.82GB/s ± 2% ~ (p=0.905 n=10+9)
AddVW/5-8 8.75GB/s ± 1% 8.73GB/s ± 1% ~ (p=0.796 n=10+10)
AddVW/10-8 10.5GB/s ± 2% 10.5GB/s ± 1% ~ (p=0.971 n=10+10)
AddVW/100-8 19.5GB/s ± 2% 18.9GB/s ± 2% -3.22% (p=0.000 n=10+10)
AddVW/1000-8 20.7GB/s ± 2% 20.6GB/s ± 4% ~ (p=0.631 n=10+10)
AddVW/10000-8 20.6GB/s ± 3% 20.7GB/s ± 3% ~ (p=0.481 n=10+10)
AddVW/100000-8 19.4GB/s ± 2% 19.2GB/s ± 3% ~ (p=0.165 n=10+10)
AddMulVVW/1-8 19.5GB/s ± 2% 19.7GB/s ± 3% ~ (p=0.123 n=10+10)
AddMulVVW/2-8 30.1GB/s ± 2% 30.2GB/s ± 3% ~ (p=0.297 n=9+9)
AddMulVVW/3-8 37.9GB/s ± 2% 36.5GB/s ± 2% -3.63% (p=0.000 n=10+10)
AddMulVVW/4-8 40.0GB/s ± 2% 39.4GB/s ± 2% -1.58% (p=0.001 n=10+10)
AddMulVVW/5-8 47.3GB/s ± 2% 46.6GB/s ± 1% -1.35% (p=0.001 n=9+9)
AddMulVVW/10-8 52.3GB/s ± 2% 60.6GB/s ± 3% +15.76% (p=0.000 n=10+10)
AddMulVVW/100-8 80.3GB/s ± 2% 122.1GB/s ± 1% +51.92% (p=0.000 n=10+10)
AddMulVVW/1000-8 92.0GB/s ± 1% 130.3GB/s ± 2% +41.61% (p=0.000 n=9+10)
AddMulVVW/10000-8 88.2GB/s ± 2% 108.2GB/s ± 5% +22.66% (p=0.000 n=10+10)
AddMulVVW/100000-8 88.2GB/s ± 2% 102.9GB/s ± 2% +16.69% (p=0.000 n=10+10)
Change-Id: Ic98e30c91d437d845fed03e07e976c3fdbf02b36
Reviewed-on: https://go-review.googlesource.com/74851
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Adam Langley <agl@golang.org>
|
|
Fixes #20986.
Change-Id: Ic3cf5c0ab260f259ecff7b92cfdf5f4ae432aef3
Reviewed-on: https://go-review.googlesource.com/73072
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
|
Verified that BenchmarkBitLen time went down from 2.25 ns/op to 0.65 ns/op
an a 2.3 GHz Intel Core i7, before removing that benchmark (now covered by
math/bits benchmarks).
Change-Id: I3890bb7d1889e95b9a94bd68f0bdf06f1885adeb
Reviewed-on: https://go-review.googlesource.com/38464
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
- Add new BenchmarkQuoRem.
- Eliminate allocation in divLarge nat pool
- Unroll mulAddVWW body 4x
- Remove some redundant slice loads in divLarge
name old time/op new time/op delta
QuoRem-8 2.18µs ± 1% 1.93µs ± 1% -11.38% (p=0.000 n=19+18)
The starting point in the comparison here is Cherry's
pending CL to turn mulWW and divWW into intrinsics.
The optimizations in divLarge work best because all
the function calls are gone. The effect of this CL is not
as large if you don't assume Cherry's CL.
Change-Id: Ia6138907489c5b9168497912e43705634e163b35
Reviewed-on: https://go-review.googlesource.com/30613
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
On Haswell I measure anywhere between 2X to 3.5X speedup for RSA.
I believe other architectures will also greatly improve.
In the future may be upgraded by dedicated assembly routine.
Built-in benchmarks i5-4278U turbo off:
benchmark old ns/op new ns/op delta
BenchmarkRSA2048Decrypt 6696649 3073769 -54.10%
Benchmark3PrimeRSA2048Decrypt 4472340 1669080 -62.68%
Change-Id: I17df84f85e34208f990665f9f90ea671695b2add
Reviewed-on: https://go-review.googlesource.com/9253
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Adam Langley <agl@golang.org>
Reviewed-by: Vlad Krasnov <vlad@cloudflare.com>
Run-TryBot: Adam Langley <agl@golang.org>
|
|
To use a pure Go implementation of the low-level arithmetic
functions (when no platform-specific assembly implementations
are available), set the build tag math_big_pure_go.
This will make it easy to vendor the math/big package where no
assembly is available (for instance for use with gc which relies
on 1.4 functionality for now).
Change-Id: I91e17c0fdc568a20ec1512d7c64621241dc60c17
Reviewed-on: https://go-review.googlesource.com/7856
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Replaced use of rotate instructions (RCRQ, RCLQ) with ADDQ/SBBQ
for restoring/saving the carry flag per suggestion from Torbjörn
Granlund (author of GMP bignum libs for C).
The rotate instructions tend to be slower on todays machines.
benchmark old ns/op new ns/op delta
BenchmarkAddVV_1 5.69 5.51 -3.16%
BenchmarkAddVV_2 7.15 6.87 -3.92%
BenchmarkAddVV_3 8.69 8.06 -7.25%
BenchmarkAddVV_4 8.10 8.13 +0.37%
BenchmarkAddVV_5 8.37 8.47 +1.19%
BenchmarkAddVV_1e1 13.1 12.0 -8.40%
BenchmarkAddVV_1e2 78.1 69.4 -11.14%
BenchmarkAddVV_1e3 815 656 -19.51%
BenchmarkAddVV_1e4 8137 7345 -9.73%
BenchmarkAddVV_1e5 100127 93909 -6.21%
BenchmarkAddVW_1 4.86 4.71 -3.09%
BenchmarkAddVW_2 5.67 5.50 -3.00%
BenchmarkAddVW_3 6.51 6.34 -2.61%
BenchmarkAddVW_4 6.69 6.66 -0.45%
BenchmarkAddVW_5 7.20 7.21 +0.14%
BenchmarkAddVW_1e1 10.0 9.34 -6.60%
BenchmarkAddVW_1e2 45.4 52.3 +15.20%
BenchmarkAddVW_1e3 417 491 +17.75%
BenchmarkAddVW_1e4 4760 4852 +1.93%
BenchmarkAddVW_1e5 69107 67717 -2.01%
benchmark old MB/s new MB/s speedup
BenchmarkAddVV_1 11241.82 11610.28 1.03x
BenchmarkAddVV_2 17902.68 18631.82 1.04x
BenchmarkAddVV_3 22082.43 23835.64 1.08x
BenchmarkAddVV_4 31588.18 31492.06 1.00x
BenchmarkAddVV_5 38229.90 37783.17 0.99x
BenchmarkAddVV_1e1 48891.67 53340.91 1.09x
BenchmarkAddVV_1e2 81940.61 92191.86 1.13x
BenchmarkAddVV_1e3 78443.09 97480.44 1.24x
BenchmarkAddVV_1e4 78644.18 87129.50 1.11x
BenchmarkAddVV_1e5 63918.48 68150.84 1.07x
BenchmarkAddVW_1 13165.09 13581.00 1.03x
BenchmarkAddVW_2 22588.04 23275.41 1.03x
BenchmarkAddVW_3 29483.82 30303.96 1.03x
BenchmarkAddVW_4 38286.54 38453.21 1.00x
BenchmarkAddVW_5 44414.57 44370.59 1.00x
BenchmarkAddVW_1e1 63816.84 68494.08 1.07x
BenchmarkAddVW_1e2 140885.41 122427.16 0.87x
BenchmarkAddVW_1e3 153258.31 130325.28 0.85x
BenchmarkAddVW_1e4 134447.63 131904.02 0.98x
BenchmarkAddVW_1e5 92609.41 94509.88 1.02x
Change-Id: Ia473e9ab9c63a955c252426684176bca566645ae
Reviewed-on: https://go-review.googlesource.com/2503
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Preparation was in CL 134570043.
This CL contains only the effect of 'hg mv src/pkg/* src'.
For more about the move, see golang.org/s/go14nopkg.
|