| Age | Commit message | Author |
|
Fix comment as w&1 is the parity of 'x', not of 'n'.
Change-Id: Ia0e448f7e5896412ff9b164459ce15561ab624cc
GitHub-Last-Rev: 54ba08ab1055b5e6e506fc8ac06c2920ff095b6e
GitHub-Pull-Request: golang/go#29419
Reviewed-on: https://go-review.googlesource.com/c/155743
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
The s390x implementations for Sin/Cos/SinCos/Tan use assembly
routines which don't reduce arguments accurately enough for
huge inputs.
Fixes #29221.
Change-Id: I340f576899d67bb52a553c3ab22e6464172c936d
Reviewed-on: https://go-review.googlesource.com/c/154119
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
The previous comment mis-stated the number of bits in mPi4.
The correct value is 19*64 + 1 == 1217 bits.
Change-Id: Ife971ff6936ce2d5b81ce663ce48044749d592a0
Reviewed-on: https://go-review.googlesource.com/c/154017
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
This is a minor follow-up on https://golang.org/cl/153059.
TBR=iant
Updates #6794.
Change-Id: I03657dafc572959d46a03f86bbeb280825bc969d
Reviewed-on: https://go-review.googlesource.com/c/153845
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
This change implements Payne-Hanek range reduction by Pi/4
to properly calculate trigonometric functions of huge arguments.
The implementation is based on:
"ARGUMENT REDUCTION FOR HUGE ARGUMENTS: Good to the Last Bit"
K. C. Ng et al., March 24, 1992
The major difference with the reference is that the simulated
multi-precision calculation of x*B is implemented using 64-bit
integer arithmetic rather than floating point to ease extraction
of the relevant bits of 4/Pi.
The assembly implementations for 386 were removed since the trigonometric
instructions only use a 66-bit representation of Pi internally for
reduction. It is not possible to use these instructions and maintain
accuracy without a prior accurate reduction in software as recommended
by Intel.
Fixes #6794
Change-Id: I31bf1369e0578891d738c5473447fe9b10560196
Reviewed-on: https://go-review.googlesource.com/c/153059
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
TrailingZeros16 is the only one of the TrailingZeros functions with a
named return value in the signature. This creates a slightly
unpleasant effect in the godoc listing:
func TrailingZeros(x uint) int
func TrailingZeros16(x uint16) (n int)
func TrailingZeros32(x uint32) int
func TrailingZeros64(x uint64) int
func TrailingZeros8(x uint8) int
Since the named return value is not even used, remove it.
Change-Id: I15c5aedb6157003911b6e0685c357ce56e466c0e
Reviewed-on: https://go-review.googlesource.com/c/153340
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
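As a small illustration of the point in the commit above, the sketch below defines two hypothetical wrappers, namedResult and plainResult, that differ only in whether the result is named. Both return the same values; only the signature shown in generated documentation changes.

```go
package main

import (
	"fmt"
	"math/bits"
)

// namedResult declares a named result n but never assigns to it,
// so the name only adds noise to the documented signature.
func namedResult(x uint16) (n int) {
	return bits.TrailingZeros16(x)
}

// plainResult has the same behavior with an unnamed result.
func plainResult(x uint16) int {
	return bits.TrailingZeros16(x)
}

func main() {
	for _, x := range []uint16{0, 1, 8, 0x8000} {
		fmt.Println(x, namedResult(x), plainResult(x))
	}
}
```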
|
|
Fixes #27736.
Change-Id: Ibda7da7ec6e731626fc43abf3e8c1190117f7885
Reviewed-on: https://go-review.googlesource.com/c/153057
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
For many uses of math/big, most numbers are small in practice.
Prior to this change, big.NewInt allocated a minimum of five Words:
one to hold the value, and four as extra capacity.
In most cases, this extra capacity is waste.
Worse, allocating a single Word uses a fast malloc path for tiny allocs;
allocating five Words is more expensive in CPU as well as memory.
This change is a simple fix: Treat a request for one Word at its word.
I experimented with more complicated fixes and did not find anything
that outperformed this easy fix.
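The allocation policy described above can be sketched as follows. This is an illustration under stated assumptions, not the math/big source: Word, extraCap and makeWords are hypothetical names, and the extra capacity of 4 is taken from the "four as extra capacity" figure in the message.

```go
package main

import "fmt"

// Word stands in for math/big's machine-word type in this sketch.
type Word uint

// extraCap is the spare capacity given to multi-word allocations.
const extraCap = 4

// makeWords returns a slice of length n, reusing buf when it is large
// enough. A fresh one-word request gets exactly one word; larger
// requests keep extraCap words of headroom for later growth.
func makeWords(buf []Word, n int) []Word {
	if n <= cap(buf) {
		return buf[:n] // reuse the existing backing array
	}
	if n == 1 {
		// Small values tend to stay small, so do not over-allocate.
		return make([]Word, 1)
	}
	return make([]Word, n, n+extraCap)
}

func main() {
	one := makeWords(nil, 1)
	three := makeWords(nil, 3)
	fmt.Println(len(one), cap(one))     // 1 1
	fmt.Println(len(three), cap(three)) // 3 7
}
```

The trade-off is visible in the benchmark tables: values that stay at one word allocate fewer bytes, while values that grow past one word may need an extra allocation they previously avoided.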
On some real world programs, this is a clear win.
The compiler:
name old alloc/op new alloc/op delta
Template 37.1MB ± 0% 37.0MB ± 0% -0.23% (p=0.008 n=5+5)
Unicode 29.2MB ± 0% 28.5MB ± 0% -2.48% (p=0.008 n=5+5)
GoTypes 133MB ± 0% 133MB ± 0% -0.05% (p=0.008 n=5+5)
Compiler 628MB ± 0% 628MB ± 0% -0.06% (p=0.008 n=5+5)
SSA 2.04GB ± 0% 2.03GB ± 0% -0.14% (p=0.008 n=5+5)
Flate 24.7MB ± 0% 24.6MB ± 0% -0.23% (p=0.008 n=5+5)
GoParser 29.6MB ± 0% 29.6MB ± 0% -0.07% (p=0.008 n=5+5)
Reflect 82.3MB ± 0% 82.2MB ± 0% -0.05% (p=0.008 n=5+5)
Tar 36.2MB ± 0% 36.2MB ± 0% -0.12% (p=0.008 n=5+5)
XML 49.5MB ± 0% 49.4MB ± 0% -0.23% (p=0.008 n=5+5)
[Geo mean] 85.1MB 84.8MB -0.37%
name old allocs/op new allocs/op delta
Template 364k ± 0% 364k ± 0% ~ (p=0.476 n=5+5)
Unicode 341k ± 0% 341k ± 0% ~ (p=0.690 n=5+5)
GoTypes 1.37M ± 0% 1.37M ± 0% ~ (p=0.444 n=5+5)
Compiler 5.50M ± 0% 5.50M ± 0% +0.02% (p=0.008 n=5+5)
SSA 16.0M ± 0% 16.0M ± 0% +0.01% (p=0.008 n=5+5)
Flate 238k ± 0% 238k ± 0% ~ (p=0.222 n=5+5)
GoParser 305k ± 0% 305k ± 0% ~ (p=0.841 n=5+5)
Reflect 976k ± 0% 976k ± 0% ~ (p=0.222 n=5+5)
Tar 354k ± 0% 354k ± 0% ~ (p=0.103 n=5+5)
XML 450k ± 0% 450k ± 0% ~ (p=0.151 n=5+5)
[Geo mean] 837k 837k +0.01%
go.skylark.net (at ea6d2813de75ded8d157b9540bc3d3ad0b688623):
name old alloc/op new alloc/op delta
Hashtable-8 456kB ± 0% 299kB ± 0% -34.33% (p=0.000 n=9+9)
/bench_builtin_method-8 220kB ± 0% 190kB ± 0% -13.55% (p=0.000 n=9+10)
name old allocs/op new allocs/op delta
Hashtable-8 7.84k ± 0% 7.84k ± 0% ~ (all equal)
/bench_builtin_method-8 7.49k ± 0% 7.49k ± 0% ~ (all equal)
The math/big benchmarks are messy, which is predictable, since they
naturally exercise the bigger-than-one-word code more.
Also worth noting is that many of the benchmarks have very high variance.
I've omitted the opVV and opVW benchmarks, as they are unrelated.
name old time/op new time/op delta
DecimalConversion-8 92.5µs ± 1% 90.6µs ± 0% -2.12% (p=0.000 n=17+19)
FloatString/100-8 867ns ± 0% 871ns ± 0% +0.50% (p=0.000 n=18+18)
FloatString/1000-8 26.4µs ± 1% 26.5µs ± 1% ~ (p=0.396 n=20+19)
FloatString/10000-8 2.15ms ± 2% 2.16ms ± 2% ~ (p=0.089 n=19+20)
FloatString/100000-8 209ms ± 1% 209ms ± 1% ~ (p=0.583 n=19+19)
FloatAdd/10-8 63.5ns ± 2% 64.1ns ± 6% ~ (p=0.389 n=19+19)
FloatAdd/100-8 66.0ns ± 2% 65.8ns ± 2% ~ (p=0.825 n=20+20)
FloatAdd/1000-8 93.9ns ± 1% 94.3ns ± 1% ~ (p=0.273 n=19+20)
FloatAdd/10000-8 347ns ± 2% 342ns ± 1% -1.50% (p=0.000 n=18+18)
FloatAdd/100000-8 2.78µs ± 1% 2.78µs ± 2% ~ (p=0.961 n=20+19)
FloatSub/10-8 56.9ns ± 2% 57.8ns ± 3% +1.59% (p=0.001 n=19+19)
FloatSub/100-8 58.2ns ± 2% 58.9ns ± 2% +1.25% (p=0.004 n=20+20)
FloatSub/1000-8 74.9ns ± 1% 74.4ns ± 1% -0.76% (p=0.000 n=19+20)
FloatSub/10000-8 223ns ± 1% 220ns ± 2% -1.29% (p=0.000 n=16+20)
FloatSub/100000-8 1.66µs ± 1% 1.66µs ± 2% ~ (p=0.147 n=20+20)
ParseFloatSmallExp-8 8.38µs ± 0% 8.59µs ± 0% +2.48% (p=0.000 n=19+19)
ParseFloatLargeExp-8 31.1µs ± 0% 32.0µs ± 0% +3.04% (p=0.000 n=16+17)
GCD10x10/WithoutXY-8 115ns ± 1% 99ns ± 3% -14.07% (p=0.000 n=20+20)
GCD10x10/WithXY-8 322ns ± 0% 312ns ± 0% -3.11% (p=0.000 n=18+13)
GCD10x100/WithoutXY-8 233ns ± 1% 219ns ± 1% -5.73% (p=0.000 n=19+17)
GCD10x100/WithXY-8 709ns ± 0% 759ns ± 0% +7.04% (p=0.000 n=19+19)
GCD10x1000/WithoutXY-8 653ns ± 1% 642ns ± 1% -1.69% (p=0.000 n=17+20)
GCD10x1000/WithXY-8 1.35µs ± 0% 1.35µs ± 1% ~ (p=0.255 n=20+16)
GCD10x10000/WithoutXY-8 4.57µs ± 1% 4.61µs ± 1% +0.95% (p=0.000 n=18+17)
GCD10x10000/WithXY-8 6.82µs ± 0% 6.84µs ± 0% +0.27% (p=0.000 n=16+17)
GCD10x100000/WithoutXY-8 43.9µs ± 1% 44.0µs ± 1% +0.28% (p=0.000 n=18+17)
GCD10x100000/WithXY-8 60.6µs ± 0% 60.6µs ± 0% ~ (p=0.907 n=18+18)
GCD100x100/WithoutXY-8 1.13µs ± 0% 1.21µs ± 0% +6.39% (p=0.000 n=19+19)
GCD100x100/WithXY-8 1.82µs ± 0% 1.92µs ± 0% +5.24% (p=0.000 n=19+17)
GCD100x1000/WithoutXY-8 2.00µs ± 0% 2.03µs ± 1% +1.61% (p=0.000 n=18+16)
GCD100x1000/WithXY-8 3.22µs ± 0% 3.20µs ± 1% -0.83% (p=0.000 n=19+19)
GCD100x10000/WithoutXY-8 9.28µs ± 1% 9.17µs ± 1% -1.25% (p=0.000 n=18+19)
GCD100x10000/WithXY-8 13.5µs ± 0% 13.3µs ± 0% -1.12% (p=0.000 n=18+19)
GCD100x100000/WithoutXY-8 80.4µs ± 0% 78.6µs ± 0% -2.25% (p=0.000 n=19+19)
GCD100x100000/WithXY-8 114µs ± 0% 112µs ± 0% -1.46% (p=0.000 n=19+17)
GCD1000x1000/WithoutXY-8 12.9µs ± 1% 12.9µs ± 2% -0.50% (p=0.014 n=20+19)
GCD1000x1000/WithXY-8 19.6µs ± 1% 19.6µs ± 2% -0.28% (p=0.040 n=17+18)
GCD1000x10000/WithoutXY-8 22.4µs ± 0% 22.4µs ± 2% ~ (p=0.220 n=19+19)
GCD1000x10000/WithXY-8 57.0µs ± 0% 56.5µs ± 0% -0.87% (p=0.000 n=20+20)
GCD1000x100000/WithoutXY-8 116µs ± 0% 115µs ± 0% -0.49% (p=0.000 n=18+19)
GCD1000x100000/WithXY-8 410µs ± 0% 411µs ± 0% ~ (p=0.052 n=19+19)
GCD10000x10000/WithoutXY-8 247µs ± 1% 244µs ± 1% -0.92% (p=0.000 n=19+19)
GCD10000x10000/WithXY-8 476µs ± 1% 473µs ± 1% -0.48% (p=0.009 n=19+19)
GCD10000x100000/WithoutXY-8 573µs ± 1% 571µs ± 1% -0.45% (p=0.012 n=20+20)
GCD10000x100000/WithXY-8 3.35ms ± 1% 3.35ms ± 1% ~ (p=0.444 n=20+19)
GCD100000x100000/WithoutXY-8 12.0ms ± 2% 11.9ms ± 2% ~ (p=0.276 n=18+20)
GCD100000x100000/WithXY-8 27.3ms ± 1% 27.3ms ± 1% ~ (p=0.792 n=20+19)
Hilbert-8 672µs ± 0% 611µs ± 0% -9.02% (p=0.000 n=19+19)
Binomial-8 1.40µs ± 0% 1.18µs ± 0% -15.69% (p=0.000 n=16+14)
QuoRem-8 2.20µs ± 1% 2.17µs ± 1% -1.13% (p=0.000 n=19+19)
Exp-8 4.10ms ± 1% 4.11ms ± 1% ~ (p=0.296 n=20+19)
Exp2-8 4.11ms ± 1% 4.12ms ± 1% ~ (p=0.429 n=20+20)
Bitset-8 8.67ns ± 6% 8.74ns ± 4% ~ (p=0.139 n=19+17)
BitsetNeg-8 43.6ns ± 1% 43.8ns ± 2% +0.61% (p=0.036 n=20+20)
BitsetOrig-8 77.5ns ± 1% 68.4ns ± 1% -11.77% (p=0.000 n=19+20)
BitsetNegOrig-8 145ns ± 1% 141ns ± 1% -2.87% (p=0.000 n=19+20)
ModSqrt225_Tonelli-8 324µs ± 1% 324µs ± 1% ~ (p=0.409 n=18+20)
ModSqrt225_3Mod4-8 98.9µs ± 1% 99.1µs ± 1% ~ (p=0.298 n=19+18)
ModSqrt231_Tonelli-8 337µs ± 1% 337µs ± 1% ~ (p=0.718 n=20+18)
ModSqrt231_5Mod8-8 115µs ± 1% 114µs ± 1% -0.22% (p=0.050 n=20+20)
ModInverse-8 895ns ± 0% 869ns ± 1% -2.83% (p=0.000 n=17+17)
Sqrt-8 28.1µs ± 1% 28.1µs ± 0% -0.28% (p=0.000 n=16+20)
IntSqr/1-8 10.8ns ± 3% 10.5ns ± 3% -2.51% (p=0.000 n=19+17)
IntSqr/2-8 30.5ns ± 2% 30.3ns ± 4% -0.71% (p=0.035 n=18+18)
IntSqr/3-8 40.1ns ± 1% 40.1ns ± 1% ~ (p=0.710 n=20+17)
IntSqr/5-8 65.3ns ± 1% 65.4ns ± 2% ~ (p=0.744 n=19+19)
IntSqr/8-8 101ns ± 1% 102ns ± 0% ~ (p=0.234 n=19+20)
IntSqr/10-8 138ns ± 0% 138ns ± 2% ~ (p=0.827 n=18+18)
IntSqr/20-8 378ns ± 1% 378ns ± 1% ~ (p=0.479 n=18+18)
IntSqr/30-8 637ns ± 0% 638ns ± 1% ~ (p=0.051 n=18+20)
IntSqr/50-8 1.34µs ± 2% 1.34µs ± 1% ~ (p=0.970 n=18+19)
IntSqr/80-8 2.78µs ± 0% 2.78µs ± 1% -0.18% (p=0.006 n=19+17)
IntSqr/100-8 3.98µs ± 0% 3.98µs ± 0% ~ (p=0.057 n=17+19)
IntSqr/200-8 13.5µs ± 0% 13.5µs ± 1% -0.33% (p=0.000 n=19+17)
IntSqr/300-8 25.3µs ± 1% 25.3µs ± 1% ~ (p=0.361 n=19+20)
IntSqr/500-8 62.9µs ± 0% 62.9µs ± 1% ~ (p=0.899 n=17+17)
IntSqr/800-8 128µs ± 1% 127µs ± 1% -0.32% (p=0.016 n=18+20)
IntSqr/1000-8 192µs ± 0% 192µs ± 1% ~ (p=0.916 n=17+18)
Div/20/10-8 34.9ns ± 2% 35.6ns ± 1% +2.01% (p=0.000 n=20+20)
Div/200/100-8 218ns ± 1% 215ns ± 2% -1.43% (p=0.000 n=18+18)
Div/2000/1000-8 1.16µs ± 1% 1.15µs ± 1% -1.04% (p=0.000 n=19+20)
Div/20000/10000-8 35.7µs ± 1% 35.4µs ± 1% -0.69% (p=0.000 n=19+18)
Div/200000/100000-8 2.89ms ± 1% 2.88ms ± 1% -0.62% (p=0.007 n=19+20)
Mul-8 9.28ms ± 1% 9.27ms ± 1% ~ (p=0.563 n=18+18)
ZeroShifts/Shl-8 712ns ± 6% 716ns ± 7% ~ (p=0.597 n=20+20)
ZeroShifts/ShlSame-8 4.00ns ± 1% 4.06ns ± 5% ~ (p=0.162 n=18+20)
ZeroShifts/Shr-8 714ns ±10% 1285ns ±156% ~ (p=0.250 n=20+20)
ZeroShifts/ShrSame-8 4.00ns ± 1% 4.09ns ±10% +2.34% (p=0.048 n=16+19)
Exp3Power/0x10-8 154ns ± 0% 159ns ±13% ~ (p=0.197 n=14+20)
Exp3Power/0x40-8 171ns ± 1% 175ns ± 8% ~ (p=0.058 n=16+19)
Exp3Power/0x100-8 287ns ± 0% 316ns ± 4% +10.03% (p=0.000 n=17+19)
Exp3Power/0x400-8 698ns ± 1% 801ns ± 6% +14.75% (p=0.000 n=19+20)
Exp3Power/0x1000-8 2.87µs ± 0% 3.65µs ± 6% +27.24% (p=0.000 n=18+18)
Exp3Power/0x4000-8 21.9µs ± 1% 28.7µs ± 8% +31.09% (p=0.000 n=18+20)
Exp3Power/0x10000-8 204µs ± 0% 267µs ± 9% +30.81% (p=0.000 n=20+20)
Exp3Power/0x40000-8 1.86ms ± 0% 2.26ms ± 5% +21.68% (p=0.000 n=18+19)
Exp3Power/0x100000-8 17.5ms ± 1% 20.7ms ± 7% +18.39% (p=0.000 n=19+20)
Exp3Power/0x400000-8 156ms ± 0% 172ms ± 6% +10.54% (p=0.000 n=19+20)
Fibo-8 26.9ms ± 1% 27.5ms ± 3% +2.32% (p=0.000 n=19+19)
NatSqr/1-8 31.0ns ± 4% 39.5ns ±29% +27.25% (p=0.000 n=20+19)
NatSqr/2-8 54.1ns ± 1% 69.0ns ±28% +27.52% (p=0.000 n=20+20)
NatSqr/3-8 66.6ns ± 1% 83.0ns ±25% +24.59% (p=0.000 n=20+20)
NatSqr/5-8 97.1ns ± 1% 119.9ns ±12% +23.50% (p=0.000 n=16+20)
NatSqr/8-8 138ns ± 1% 171ns ± 9% +24.20% (p=0.000 n=19+20)
NatSqr/10-8 182ns ± 0% 225ns ± 9% +23.50% (p=0.000 n=16+20)
NatSqr/20-8 447ns ± 1% 624ns ± 6% +39.64% (p=0.000 n=19+19)
NatSqr/30-8 736ns ± 2% 986ns ± 9% +33.94% (p=0.000 n=19+20)
NatSqr/50-8 1.51µs ± 2% 1.97µs ± 9% +30.42% (p=0.000 n=20+20)
NatSqr/80-8 3.03µs ± 1% 3.67µs ± 7% +21.08% (p=0.000 n=20+20)
NatSqr/100-8 4.31µs ± 1% 5.20µs ± 7% +20.52% (p=0.000 n=19+20)
NatSqr/200-8 14.2µs ± 0% 16.3µs ± 4% +14.92% (p=0.000 n=19+20)
NatSqr/300-8 27.8µs ± 1% 33.2µs ± 7% +19.28% (p=0.000 n=20+18)
NatSqr/500-8 66.6µs ± 1% 74.5µs ± 3% +11.87% (p=0.000 n=18+18)
NatSqr/800-8 135µs ± 1% 165µs ± 7% +22.33% (p=0.000 n=20+20)
NatSqr/1000-8 200µs ± 0% 228µs ± 3% +14.39% (p=0.000 n=19+20)
NatSetBytes/8-8 8.87ns ± 4% 8.77ns ± 2% -1.17% (p=0.020 n=20+16)
NatSetBytes/24-8 38.6ns ± 3% 49.5ns ±29% +28.32% (p=0.000 n=18+19)
NatSetBytes/128-8 75.2ns ± 1% 120.7ns ±29% +60.60% (p=0.000 n=17+20)
NatSetBytes/7-8 16.2ns ± 2% 16.5ns ± 2% +1.76% (p=0.000 n=20+20)
NatSetBytes/23-8 46.5ns ± 1% 60.2ns ±24% +29.59% (p=0.000 n=20+20)
NatSetBytes/127-8 83.1ns ± 1% 118.2ns ±20% +42.33% (p=0.000 n=18+20)
ScanPi-8 89.1µs ± 1% 117.4µs ±12% +31.75% (p=0.000 n=18+20)
StringPiParallel-8 35.1µs ± 9% 40.2µs ±12% +14.53% (p=0.000 n=20+20)
Scan/10/Base2-8 410ns ±14% 429ns ±10% +4.47% (p=0.018 n=19+20)
Scan/100/Base2-8 3.05µs ±20% 2.97µs ±14% ~ (p=0.449 n=20+20)
Scan/1000/Base2-8 29.3µs ± 8% 30.1µs ±23% ~ (p=0.355 n=20+20)
Scan/10000/Base2-8 402µs ±13% 395µs ±14% ~ (p=0.355 n=20+20)
Scan/100000/Base2-8 11.8ms ±10% 11.6ms ± 1% ~ (p=0.245 n=17+18)
Scan/10/Base8-8 194ns ± 6% 196ns ±12% ~ (p=0.829 n=20+19)
Scan/100/Base8-8 1.11µs ±15% 1.11µs ±12% ~ (p=0.743 n=20+20)
Scan/1000/Base8-8 11.7µs ±10% 11.7µs ±12% ~ (p=0.904 n=20+20)
Scan/10000/Base8-8 209µs ± 7% 210µs ± 8% ~ (p=0.478 n=20+20)
Scan/100000/Base8-8 10.6ms ± 7% 10.4ms ± 6% ~ (p=0.112 n=20+18)
Scan/10/Base10-8 182ns ±12% 188ns ±11% +3.52% (p=0.044 n=20+20)
Scan/100/Base10-8 1.01µs ± 8% 1.00µs ±13% ~ (p=0.588 n=20+20)
Scan/1000/Base10-8 10.7µs ±20% 10.6µs ±14% ~ (p=0.560 n=20+20)
Scan/10000/Base10-8 195µs ±10% 194µs ± 9% ~ (p=0.883 n=20+20)
Scan/100000/Base10-8 10.6ms ± 2% 10.6ms ± 2% ~ (p=0.495 n=20+20)
Scan/10/Base16-8 166ns ±10% 174ns ±17% ~ (p=0.072 n=20+20)
Scan/100/Base16-8 836ns ±10% 826ns ±12% ~ (p=0.562 n=20+17)
Scan/1000/Base16-8 8.96µs ±13% 8.65µs ± 9% ~ (p=0.203 n=20+18)
Scan/10000/Base16-8 198µs ± 3% 198µs ± 5% ~ (p=0.718 n=20+20)
Scan/100000/Base16-8 11.1ms ± 3% 11.0ms ± 4% ~ (p=0.512 n=20+20)
String/10/Base2-8 88.1ns ± 7% 94.1ns ±11% +6.80% (p=0.000 n=19+20)
String/100/Base2-8 577ns ± 4% 598ns ± 5% +3.72% (p=0.000 n=20+20)
String/1000/Base2-8 5.25µs ± 2% 5.62µs ± 5% +7.04% (p=0.000 n=19+20)
String/10000/Base2-8 55.6µs ± 1% 60.1µs ± 2% +8.12% (p=0.000 n=19+19)
String/100000/Base2-8 519µs ± 2% 560µs ± 2% +7.91% (p=0.000 n=18+17)
String/10/Base8-8 52.2ns ± 8% 53.3ns ±12% ~ (p=0.188 n=20+18)
String/100/Base8-8 218ns ± 3% 232ns ±10% +6.66% (p=0.000 n=20+20)
String/1000/Base8-8 1.84µs ± 3% 1.94µs ± 4% +5.07% (p=0.000 n=20+18)
String/10000/Base8-8 18.1µs ± 2% 19.1µs ± 3% +5.84% (p=0.000 n=20+19)
String/100000/Base8-8 184µs ± 2% 197µs ± 1% +7.15% (p=0.000 n=19+19)
String/10/Base10-8 158ns ± 7% 146ns ± 6% -7.65% (p=0.000 n=20+19)
String/100/Base10-8 807ns ± 2% 845ns ± 4% +4.79% (p=0.000 n=20+19)
String/1000/Base10-8 3.99µs ± 3% 3.99µs ± 7% ~ (p=0.920 n=20+20)
String/10000/Base10-8 20.8µs ± 6% 22.1µs ±10% +6.11% (p=0.000 n=19+20)
String/100000/Base10-8 5.60ms ± 2% 5.59ms ± 2% ~ (p=0.749 n=20+19)
String/10/Base16-8 49.0ns ±13% 49.3ns ±16% ~ (p=0.581 n=19+20)
String/100/Base16-8 173ns ± 5% 185ns ± 6% +6.63% (p=0.000 n=20+18)
String/1000/Base16-8 1.38µs ± 3% 1.49µs ±10% +8.27% (p=0.000 n=19+20)
String/10000/Base16-8 13.5µs ± 2% 14.5µs ± 3% +7.08% (p=0.000 n=20+20)
String/100000/Base16-8 138µs ± 4% 148µs ± 4% +7.57% (p=0.000 n=19+20)
LeafSize/0-8 2.74ms ± 1% 2.79ms ± 2% +2.00% (p=0.000 n=19+19)
LeafSize/1-8 24.8µs ± 4% 26.1µs ± 8% +5.33% (p=0.000 n=18+19)
LeafSize/2-8 24.9µs ± 7% 25.0µs ± 8% ~ (p=0.989 n=20+19)
LeafSize/3-8 97.6µs ± 3% 100.2µs ± 5% +2.66% (p=0.001 n=20+19)
LeafSize/4-8 25.2µs ± 5% 25.4µs ± 5% ~ (p=0.173 n=19+20)
LeafSize/5-8 118µs ± 2% 119µs ± 5% ~ (p=0.478 n=20+20)
LeafSize/6-8 97.6µs ± 3% 100.1µs ± 8% +2.65% (p=0.021 n=20+19)
LeafSize/7-8 65.6µs ± 5% 67.5µs ± 6% +2.92% (p=0.003 n=20+19)
LeafSize/8-8 25.5µs ± 5% 25.6µs ± 6% ~ (p=0.461 n=19+20)
LeafSize/9-8 134µs ± 4% 136µs ± 5% ~ (p=0.194 n=19+20)
LeafSize/10-8 119µs ± 3% 122µs ± 3% +2.52% (p=0.000 n=20+19)
LeafSize/11-8 115µs ± 5% 116µs ± 5% ~ (p=0.158 n=20+19)
LeafSize/12-8 97.4µs ± 4% 100.3µs ± 5% +2.91% (p=0.003 n=19+20)
LeafSize/13-8 93.1µs ± 4% 93.0µs ± 6% ~ (p=0.698 n=20+20)
LeafSize/14-8 67.0µs ± 3% 69.7µs ± 6% +4.10% (p=0.000 n=20+20)
LeafSize/15-8 48.3µs ± 2% 49.3µs ± 6% +1.91% (p=0.014 n=19+20)
LeafSize/16-8 25.6µs ± 5% 25.6µs ± 6% ~ (p=0.947 n=20+20)
LeafSize/32-8 30.1µs ± 4% 30.3µs ± 5% ~ (p=0.685 n=18+19)
LeafSize/64-8 53.4µs ± 2% 54.0µs ± 3% ~ (p=0.053 n=19+19)
ProbablyPrime/n=0-8 3.59ms ± 1% 3.55ms ± 1% -1.12% (p=0.000 n=20+18)
ProbablyPrime/n=1-8 4.21ms ± 2% 4.17ms ± 2% -0.73% (p=0.018 n=20+19)
ProbablyPrime/n=5-8 6.74ms ± 1% 6.72ms ± 1% ~ (p=0.102 n=20+20)
ProbablyPrime/n=10-8 9.91ms ± 1% 9.89ms ± 2% ~ (p=0.322 n=19+20)
ProbablyPrime/n=20-8 16.2ms ± 1% 16.1ms ± 2% -0.52% (p=0.006 n=19+19)
ProbablyPrime/Lucas-8 2.94ms ± 1% 2.95ms ± 1% +0.52% (p=0.002 n=18+19)
ProbablyPrime/MillerRabinBase2-8 641µs ± 2% 640µs ± 2% ~ (p=0.607 n=19+20)
FloatSqrt/64-8 653ns ± 5% 704ns ± 5% +7.82% (p=0.000 n=19+20)
FloatSqrt/128-8 1.32µs ± 3% 1.42µs ± 5% +7.29% (p=0.000 n=18+20)
FloatSqrt/256-8 1.44µs ± 2% 1.45µs ± 4% ~ (p=0.089 n=19+19)
FloatSqrt/1000-8 3.36µs ± 3% 3.42µs ± 5% +1.82% (p=0.012 n=20+20)
FloatSqrt/10000-8 25.5µs ± 2% 27.5µs ± 7% +7.91% (p=0.000 n=18+19)
FloatSqrt/100000-8 629µs ± 6% 663µs ± 9% +5.32% (p=0.000 n=18+20)
FloatSqrt/1000000-8 46.4ms ± 2% 46.6ms ± 5% ~ (p=0.351 n=20+19)
[Geo mean] 9.60µs 10.01µs +4.28%
name old alloc/op new alloc/op delta
DecimalConversion-8 54.0kB ± 0% 43.6kB ± 0% -19.40% (p=0.000 n=20+20)
FloatString/100-8 400B ± 0% 400B ± 0% ~ (all equal)
FloatString/1000-8 3.10kB ± 0% 3.10kB ± 0% ~ (all equal)
FloatString/10000-8 52.1kB ± 0% 52.1kB ± 0% ~ (p=0.153 n=20+20)
FloatString/100000-8 582kB ± 0% 582kB ± 0% ~ (all equal)
FloatAdd/10-8 0.00B 0.00B ~ (all equal)
FloatAdd/100-8 0.00B 0.00B ~ (all equal)
FloatAdd/1000-8 0.00B 0.00B ~ (all equal)
FloatAdd/10000-8 0.00B 0.00B ~ (all equal)
FloatAdd/100000-8 0.00B 0.00B ~ (all equal)
FloatSub/10-8 0.00B 0.00B ~ (all equal)
FloatSub/100-8 0.00B 0.00B ~ (all equal)
FloatSub/1000-8 0.00B 0.00B ~ (all equal)
FloatSub/10000-8 0.00B 0.00B ~ (all equal)
FloatSub/100000-8 0.00B 0.00B ~ (all equal)
ParseFloatSmallExp-8 4.18kB ± 0% 3.60kB ± 0% -13.79% (p=0.000 n=20+20)
ParseFloatLargeExp-8 18.9kB ± 0% 19.3kB ± 0% +2.25% (p=0.000 n=20+20)
GCD10x10/WithoutXY-8 96.0B ± 0% 16.0B ± 0% -83.33% (p=0.000 n=20+20)
GCD10x10/WithXY-8 240B ± 0% 88B ± 0% -63.33% (p=0.000 n=20+20)
GCD10x100/WithoutXY-8 192B ± 0% 112B ± 0% -41.67% (p=0.000 n=20+20)
GCD10x100/WithXY-8 464B ± 0% 424B ± 0% -8.62% (p=0.000 n=20+20)
GCD10x1000/WithoutXY-8 416B ± 0% 336B ± 0% -19.23% (p=0.000 n=20+20)
GCD10x1000/WithXY-8 1.25kB ± 0% 1.10kB ± 0% -12.18% (p=0.000 n=20+20)
GCD10x10000/WithoutXY-8 2.91kB ± 0% 2.83kB ± 0% -2.75% (p=0.000 n=20+20)
GCD10x10000/WithXY-8 8.70kB ± 0% 8.55kB ± 0% -1.76% (p=0.000 n=16+16)
GCD10x100000/WithoutXY-8 27.2kB ± 0% 27.2kB ± 0% -0.29% (p=0.000 n=20+20)
GCD10x100000/WithXY-8 82.4kB ± 0% 82.3kB ± 0% -0.17% (p=0.000 n=20+19)
GCD100x100/WithoutXY-8 288B ± 0% 384B ± 0% +33.33% (p=0.000 n=20+20)
GCD100x100/WithXY-8 464B ± 0% 576B ± 0% +24.14% (p=0.000 n=20+20)
GCD100x1000/WithoutXY-8 640B ± 0% 688B ± 0% +7.50% (p=0.000 n=20+20)
GCD100x1000/WithXY-8 1.52kB ± 0% 1.46kB ± 0% -3.68% (p=0.000 n=20+20)
GCD100x10000/WithoutXY-8 4.24kB ± 0% 4.29kB ± 0% +1.13% (p=0.000 n=20+20)
GCD100x10000/WithXY-8 11.1kB ± 0% 11.0kB ± 0% -0.51% (p=0.000 n=15+20)
GCD100x100000/WithoutXY-8 40.9kB ± 0% 40.9kB ± 0% +0.12% (p=0.000 n=20+19)
GCD100x100000/WithXY-8 110kB ± 0% 109kB ± 0% -0.08% (p=0.000 n=20+20)
GCD1000x1000/WithoutXY-8 1.22kB ± 0% 1.06kB ± 0% -13.16% (p=0.000 n=20+20)
GCD1000x1000/WithXY-8 2.37kB ± 0% 2.11kB ± 0% -10.83% (p=0.000 n=20+20)
GCD1000x10000/WithoutXY-8 4.71kB ± 0% 4.63kB ± 0% -1.70% (p=0.000 n=20+19)
GCD1000x10000/WithXY-8 28.2kB ± 0% 28.0kB ± 0% -0.43% (p=0.000 n=20+15)
GCD1000x100000/WithoutXY-8 41.3kB ± 0% 41.2kB ± 0% -0.20% (p=0.000 n=20+16)
GCD1000x100000/WithXY-8 301kB ± 0% 301kB ± 0% -0.13% (p=0.000 n=20+20)
GCD10000x10000/WithoutXY-8 8.64kB ± 0% 8.48kB ± 0% -1.85% (p=0.000 n=20+20)
GCD10000x10000/WithXY-8 57.2kB ± 0% 57.7kB ± 0% +0.80% (p=0.000 n=20+20)
GCD10000x100000/WithoutXY-8 43.8kB ± 0% 43.7kB ± 0% -0.19% (p=0.000 n=20+18)
GCD10000x100000/WithXY-8 2.08MB ± 0% 2.08MB ± 0% -0.02% (p=0.000 n=15+19)
GCD100000x100000/WithoutXY-8 81.6kB ± 0% 81.4kB ± 0% -0.20% (p=0.000 n=20+20)
GCD100000x100000/WithXY-8 4.32MB ± 0% 4.33MB ± 0% +0.12% (p=0.000 n=20+20)
Hilbert-8 653kB ± 0% 313kB ± 0% -52.13% (p=0.000 n=19+20)
Binomial-8 1.82kB ± 0% 1.02kB ± 0% -43.86% (p=0.000 n=20+20)
QuoRem-8 0.00B 0.00B ~ (all equal)
Exp-8 11.1kB ± 0% 11.0kB ± 0% -0.34% (p=0.000 n=19+20)
Exp2-8 11.3kB ± 0% 11.3kB ± 0% -0.35% (p=0.000 n=19+20)
Bitset-8 0.00B 0.00B ~ (all equal)
BitsetNeg-8 0.00B 0.00B ~ (all equal)
BitsetOrig-8 103B ± 0% 63B ± 0% -38.83% (p=0.000 n=20+20)
BitsetNegOrig-8 215B ± 0% 175B ± 0% -18.60% (p=0.000 n=20+20)
ModSqrt225_Tonelli-8 11.3kB ± 0% 11.0kB ± 0% -2.76% (p=0.000 n=20+17)
ModSqrt225_3Mod4-8 3.57kB ± 0% 3.53kB ± 0% -1.12% (p=0.000 n=20+20)
ModSqrt231_Tonelli-8 11.0kB ± 0% 10.7kB ± 0% -2.55% (p=0.000 n=20+20)
ModSqrt231_5Mod8-8 4.21kB ± 0% 4.09kB ± 0% -2.85% (p=0.000 n=16+20)
ModInverse-8 1.44kB ± 0% 1.28kB ± 0% -11.11% (p=0.000 n=20+20)
Sqrt-8 6.00kB ± 0% 6.00kB ± 0% ~ (all equal)
IntSqr/1-8 0.00B 0.00B ~ (all equal)
IntSqr/2-8 0.00B 0.00B ~ (all equal)
IntSqr/3-8 0.00B 0.00B ~ (all equal)
IntSqr/5-8 0.00B 0.00B ~ (all equal)
IntSqr/8-8 0.00B 0.00B ~ (all equal)
IntSqr/10-8 0.00B 0.00B ~ (all equal)
IntSqr/20-8 320B ± 0% 320B ± 0% ~ (all equal)
IntSqr/30-8 480B ± 0% 480B ± 0% ~ (all equal)
IntSqr/50-8 896B ± 0% 896B ± 0% ~ (all equal)
IntSqr/80-8 1.28kB ± 0% 1.28kB ± 0% ~ (all equal)
IntSqr/100-8 1.79kB ± 0% 1.79kB ± 0% ~ (all equal)
IntSqr/200-8 3.20kB ± 0% 3.20kB ± 0% ~ (all equal)
IntSqr/300-8 8.06kB ± 0% 8.06kB ± 0% ~ (all equal)
IntSqr/500-8 12.3kB ± 0% 12.3kB ± 0% ~ (all equal)
IntSqr/800-8 28.8kB ± 0% 28.8kB ± 0% ~ (all equal)
IntSqr/1000-8 36.9kB ± 0% 36.9kB ± 0% ~ (all equal)
Div/20/10-8 0.00B 0.00B ~ (all equal)
Div/200/100-8 0.00B 0.00B ~ (all equal)
Div/2000/1000-8 0.00B 0.00B ~ (all equal)
Div/20000/10000-8 0.00B 0.00B ~ (all equal)
Div/200000/100000-8 690B ± 0% 690B ± 0% ~ (all equal)
Mul-8 565kB ± 0% 565kB ± 0% ~ (all equal)
ZeroShifts/Shl-8 6.53kB ± 0% 6.53kB ± 0% ~ (all equal)
ZeroShifts/ShlSame-8 0.00B 0.00B ~ (all equal)
ZeroShifts/Shr-8 6.53kB ± 0% 6.53kB ± 0% ~ (all equal)
ZeroShifts/ShrSame-8 0.00B 0.00B ~ (all equal)
Exp3Power/0x10-8 192B ± 0% 112B ± 0% -41.67% (p=0.000 n=20+20)
Exp3Power/0x40-8 192B ± 0% 112B ± 0% -41.67% (p=0.000 n=20+20)
Exp3Power/0x100-8 288B ± 0% 208B ± 0% -27.78% (p=0.000 n=20+20)
Exp3Power/0x400-8 672B ± 0% 592B ± 0% -11.90% (p=0.000 n=20+20)
Exp3Power/0x1000-8 3.33kB ± 0% 3.25kB ± 0% -2.40% (p=0.000 n=20+20)
Exp3Power/0x4000-8 13.8kB ± 0% 13.7kB ± 0% -0.58% (p=0.000 n=20+20)
Exp3Power/0x10000-8 117kB ± 0% 117kB ± 0% -0.07% (p=0.000 n=20+20)
Exp3Power/0x40000-8 755kB ± 0% 755kB ± 0% -0.01% (p=0.000 n=19+20)
Exp3Power/0x100000-8 5.22MB ± 0% 5.22MB ± 0% -0.00% (p=0.000 n=20+20)
Exp3Power/0x400000-8 39.8MB ± 0% 39.8MB ± 0% -0.00% (p=0.000 n=20+19)
Fibo-8 3.09MB ± 0% 3.08MB ± 0% -0.28% (p=0.000 n=20+16)
NatSqr/1-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
NatSqr/2-8 64.0B ± 0% 64.0B ± 0% ~ (all equal)
NatSqr/3-8 80.0B ± 0% 80.0B ± 0% ~ (all equal)
NatSqr/5-8 112B ± 0% 112B ± 0% ~ (all equal)
NatSqr/8-8 160B ± 0% 160B ± 0% ~ (all equal)
NatSqr/10-8 192B ± 0% 192B ± 0% ~ (all equal)
NatSqr/20-8 672B ± 0% 672B ± 0% ~ (all equal)
NatSqr/30-8 992B ± 0% 992B ± 0% ~ (all equal)
NatSqr/50-8 1.79kB ± 0% 1.79kB ± 0% ~ (all equal)
NatSqr/80-8 2.69kB ± 0% 2.69kB ± 0% ~ (all equal)
NatSqr/100-8 3.58kB ± 0% 3.58kB ± 0% ~ (all equal)
NatSqr/200-8 6.66kB ± 0% 6.66kB ± 0% ~ (all equal)
NatSqr/300-8 24.4kB ± 0% 24.4kB ± 0% ~ (all equal)
NatSqr/500-8 36.9kB ± 0% 36.9kB ± 0% ~ (all equal)
NatSqr/800-8 69.8kB ± 0% 69.8kB ± 0% ~ (all equal)
NatSqr/1000-8 86.0kB ± 0% 86.0kB ± 0% ~ (all equal)
NatSetBytes/8-8 0.00B 0.00B ~ (all equal)
NatSetBytes/24-8 64.0B ± 0% 64.0B ± 0% ~ (all equal)
NatSetBytes/128-8 160B ± 0% 160B ± 0% ~ (all equal)
NatSetBytes/7-8 0.00B 0.00B ~ (all equal)
NatSetBytes/23-8 64.0B ± 0% 64.0B ± 0% ~ (all equal)
NatSetBytes/127-8 160B ± 0% 160B ± 0% ~ (all equal)
ScanPi-8 75.4kB ± 0% 75.7kB ± 0% +0.41% (p=0.000 n=20+20)
StringPiParallel-8 20.4kB ± 0% 20.4kB ± 0% ~ (p=0.223 n=20+20)
Scan/10/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/100/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/1000/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/10000/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/100000/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/10/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/100/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/1000/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/10000/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/100000/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/10/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/100/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/1000/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/10000/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/100000/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/10/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/100/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/1000/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/10000/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
Scan/100000/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
String/10/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal)
String/100/Base2-8 352B ± 0% 352B ± 0% ~ (all equal)
String/1000/Base2-8 3.46kB ± 0% 3.46kB ± 0% ~ (all equal)
String/10000/Base2-8 41.0kB ± 0% 41.0kB ± 0% ~ (all equal)
String/100000/Base2-8 336kB ± 0% 336kB ± 0% ~ (all equal)
String/10/Base8-8 16.0B ± 0% 16.0B ± 0% ~ (all equal)
String/100/Base8-8 112B ± 0% 112B ± 0% ~ (all equal)
String/1000/Base8-8 1.15kB ± 0% 1.15kB ± 0% ~ (all equal)
String/10000/Base8-8 12.3kB ± 0% 12.3kB ± 0% ~ (all equal)
String/100000/Base8-8 115kB ± 0% 115kB ± 0% ~ (all equal)
String/10/Base10-8 64.0B ± 0% 24.0B ± 0% -62.50% (p=0.000 n=20+20)
String/100/Base10-8 192B ± 0% 192B ± 0% ~ (all equal)
String/1000/Base10-8 1.95kB ± 0% 1.95kB ± 0% ~ (all equal)
String/10000/Base10-8 20.0kB ± 0% 20.0kB ± 0% ~ (p=0.983 n=19+20)
String/100000/Base10-8 210kB ± 1% 211kB ± 1% +0.82% (p=0.000 n=19+20)
String/10/Base16-8 16.0B ± 0% 16.0B ± 0% ~ (all equal)
String/100/Base16-8 96.0B ± 0% 96.0B ± 0% ~ (all equal)
String/1000/Base16-8 896B ± 0% 896B ± 0% ~ (all equal)
String/10000/Base16-8 9.47kB ± 0% 9.47kB ± 0% ~ (all equal)
String/100000/Base16-8 90.1kB ± 0% 90.1kB ± 0% ~ (all equal)
LeafSize/0-8 16.9kB ± 0% 16.8kB ± 0% -0.44% (p=0.000 n=20+20)
LeafSize/1-8 22.4kB ± 0% 22.3kB ± 0% -0.34% (p=0.000 n=20+19)
LeafSize/2-8 22.4kB ± 0% 22.3kB ± 0% -0.34% (p=0.000 n=20+19)
LeafSize/3-8 22.4kB ± 0% 22.3kB ± 0% -0.34% (p=0.000 n=20+17)
LeafSize/4-8 22.4kB ± 0% 22.3kB ± 0% -0.34% (p=0.000 n=20+19)
LeafSize/5-8 22.4kB ± 0% 22.3kB ± 0% -0.33% (p=0.000 n=20+20)
LeafSize/6-8 22.3kB ± 0% 22.2kB ± 0% -0.34% (p=0.000 n=20+20)
LeafSize/7-8 22.3kB ± 0% 22.2kB ± 0% -0.35% (p=0.000 n=20+20)
LeafSize/8-8 22.3kB ± 0% 22.2kB ± 0% -0.34% (p=0.000 n=16+20)
LeafSize/9-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20)
LeafSize/10-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20)
LeafSize/11-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20)
LeafSize/12-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20)
LeafSize/13-8 22.3kB ± 0% 22.2kB ± 0% -0.34% (p=0.000 n=20+15)
LeafSize/14-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20)
LeafSize/15-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20)
LeafSize/16-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=19+20)
LeafSize/32-8 22.3kB ± 0% 22.2kB ± 0% -0.32% (p=0.000 n=20+20)
LeafSize/64-8 21.8kB ± 0% 21.7kB ± 0% -0.33% (p=0.000 n=18+19)
ProbablyPrime/n=0-8 15.3kB ± 0% 14.9kB ± 0% -2.35% (p=0.000 n=20+20)
ProbablyPrime/n=1-8 21.0kB ± 0% 20.7kB ± 0% -1.71% (p=0.000 n=20+20)
ProbablyPrime/n=5-8 43.4kB ± 0% 42.9kB ± 0% -1.20% (p=0.000 n=20+20)
ProbablyPrime/n=10-8 71.5kB ± 0% 70.7kB ± 0% -1.01% (p=0.000 n=19+20)
ProbablyPrime/n=20-8 127kB ± 0% 126kB ± 0% -0.88% (p=0.000 n=20+20)
ProbablyPrime/Lucas-8 3.07kB ± 0% 2.79kB ± 0% -9.12% (p=0.000 n=20+20)
ProbablyPrime/MillerRabinBase2-8 12.1kB ± 0% 12.0kB ± 0% -0.66% (p=0.000 n=20+20)
FloatSqrt/64-8 416B ± 0% 360B ± 0% -13.46% (p=0.000 n=20+20)
FloatSqrt/128-8 640B ± 0% 584B ± 0% -8.75% (p=0.000 n=20+20)
FloatSqrt/256-8 512B ± 0% 472B ± 0% -7.81% (p=0.000 n=20+20)
FloatSqrt/1000-8 1.47kB ± 0% 1.43kB ± 0% -2.72% (p=0.000 n=20+20)
FloatSqrt/10000-8 18.2kB ± 0% 18.1kB ± 0% -0.22% (p=0.000 n=20+20)
FloatSqrt/100000-8 204kB ± 0% 204kB ± 0% -0.02% (p=0.000 n=20+20)
FloatSqrt/1000000-8 6.37MB ± 0% 6.37MB ± 0% -0.00% (p=0.000 n=19+20)
[Geo mean] 3.42kB 3.24kB -5.33%
name old allocs/op new allocs/op delta
DecimalConversion-8 1.65k ± 0% 1.65k ± 0% ~ (all equal)
FloatString/100-8 8.00 ± 0% 8.00 ± 0% ~ (all equal)
FloatString/1000-8 9.00 ± 0% 9.00 ± 0% ~ (all equal)
FloatString/10000-8 22.0 ± 0% 22.0 ± 0% ~ (all equal)
FloatString/100000-8 136 ± 0% 136 ± 0% ~ (all equal)
FloatAdd/10-8 0.00 0.00 ~ (all equal)
FloatAdd/100-8 0.00 0.00 ~ (all equal)
FloatAdd/1000-8 0.00 0.00 ~ (all equal)
FloatAdd/10000-8 0.00 0.00 ~ (all equal)
FloatAdd/100000-8 0.00 0.00 ~ (all equal)
FloatSub/10-8 0.00 0.00 ~ (all equal)
FloatSub/100-8 0.00 0.00 ~ (all equal)
FloatSub/1000-8 0.00 0.00 ~ (all equal)
FloatSub/10000-8 0.00 0.00 ~ (all equal)
FloatSub/100000-8 0.00 0.00 ~ (all equal)
ParseFloatSmallExp-8 110 ± 0% 130 ± 0% +18.18% (p=0.000 n=20+20)
ParseFloatLargeExp-8 319 ± 0% 371 ± 0% +16.30% (p=0.000 n=20+20)
GCD10x10/WithoutXY-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
GCD10x10/WithXY-8 5.00 ± 0% 6.00 ± 0% +20.00% (p=0.000 n=20+20)
GCD10x100/WithoutXY-8 4.00 ± 0% 4.00 ± 0% ~ (all equal)
GCD10x100/WithXY-8 9.00 ± 0% 12.00 ± 0% +33.33% (p=0.000 n=20+20)
GCD10x1000/WithoutXY-8 4.00 ± 0% 4.00 ± 0% ~ (all equal)
GCD10x1000/WithXY-8 11.0 ± 0% 12.0 ± 0% +9.09% (p=0.000 n=20+20)
GCD10x10000/WithoutXY-8 4.00 ± 0% 4.00 ± 0% ~ (all equal)
GCD10x10000/WithXY-8 11.0 ± 0% 12.0 ± 0% +9.09% (p=0.000 n=20+20)
GCD10x100000/WithoutXY-8 4.00 ± 0% 4.00 ± 0% ~ (all equal)
GCD10x100000/WithXY-8 11.0 ± 0% 12.0 ± 0% +9.09% (p=0.000 n=20+20)
GCD100x100/WithoutXY-8 6.00 ± 0% 10.00 ± 0% +66.67% (p=0.000 n=20+20)
GCD100x100/WithXY-8 9.00 ± 0% 15.00 ± 0% +66.67% (p=0.000 n=20+20)
GCD100x1000/WithoutXY-8 6.00 ± 0% 8.00 ± 0% +33.33% (p=0.000 n=20+20)
GCD100x1000/WithXY-8 12.0 ± 0% 13.0 ± 0% +8.33% (p=0.000 n=20+20)
GCD100x10000/WithoutXY-8 6.00 ± 0% 8.00 ± 0% +33.33% (p=0.000 n=20+20)
GCD100x10000/WithXY-8 12.0 ± 0% 13.0 ± 0% +8.33% (p=0.000 n=20+20)
GCD100x100000/WithoutXY-8 6.00 ± 0% 8.00 ± 0% +33.33% (p=0.000 n=20+20)
GCD100x100000/WithXY-8 12.0 ± 0% 13.0 ± 0% +8.33% (p=0.000 n=20+20)
GCD1000x1000/WithoutXY-8 10.0 ± 0% 10.0 ± 0% ~ (all equal)
GCD1000x1000/WithXY-8 19.0 ± 0% 20.0 ± 0% +5.26% (p=0.000 n=20+20)
GCD1000x10000/WithoutXY-8 8.00 ± 0% 8.00 ± 0% ~ (all equal)
GCD1000x10000/WithXY-8 26.0 ± 0% 26.0 ± 0% ~ (all equal)
GCD1000x100000/WithoutXY-8 8.00 ± 0% 8.00 ± 0% ~ (all equal)
GCD1000x100000/WithXY-8 27.0 ± 0% 27.0 ± 0% ~ (all equal)
GCD10000x10000/WithoutXY-8 10.0 ± 0% 10.0 ± 0% ~ (all equal)
GCD10000x10000/WithXY-8 76.0 ± 0% 78.0 ± 0% +2.63% (p=0.000 n=20+20)
GCD10000x100000/WithoutXY-8 8.00 ± 0% 8.00 ± 0% ~ (all equal)
GCD10000x100000/WithXY-8 174 ± 0% 174 ± 0% ~ (all equal)
GCD100000x100000/WithoutXY-8 10.0 ± 0% 10.0 ± 0% ~ (all equal)
GCD100000x100000/WithXY-8 645 ± 0% 647 ± 0% +0.31% (p=0.000 n=20+20)
Hilbert-8 14.1k ± 0% 14.3k ± 0% +0.92% (p=0.000 n=20+20)
Binomial-8 38.0 ± 0% 38.0 ± 0% ~ (all equal)
QuoRem-8 0.00 0.00 ~ (all equal)
Exp-8 21.0 ± 0% 21.0 ± 0% ~ (all equal)
Exp2-8 22.0 ± 0% 22.0 ± 0% ~ (all equal)
Bitset-8 0.00 0.00 ~ (all equal)
BitsetNeg-8 0.00 0.00 ~ (all equal)
BitsetOrig-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
BitsetNegOrig-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
ModSqrt225_Tonelli-8 85.0 ± 0% 86.0 ± 0% +1.18% (p=0.000 n=20+20)
ModSqrt225_3Mod4-8 25.0 ± 0% 25.0 ± 0% ~ (all equal)
ModSqrt231_Tonelli-8 80.0 ± 0% 80.0 ± 0% ~ (all equal)
ModSqrt231_5Mod8-8 32.0 ± 0% 32.0 ± 0% ~ (all equal)
ModInverse-8 11.0 ± 0% 11.0 ± 0% ~ (all equal)
Sqrt-8 13.0 ± 0% 13.0 ± 0% ~ (all equal)
IntSqr/1-8 0.00 0.00 ~ (all equal)
IntSqr/2-8 0.00 0.00 ~ (all equal)
IntSqr/3-8 0.00 0.00 ~ (all equal)
IntSqr/5-8 0.00 0.00 ~ (all equal)
IntSqr/8-8 0.00 0.00 ~ (all equal)
IntSqr/10-8 0.00 0.00 ~ (all equal)
IntSqr/20-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
IntSqr/30-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
IntSqr/50-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
IntSqr/80-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
IntSqr/100-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
IntSqr/200-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
IntSqr/300-8 3.00 ± 0% 3.00 ± 0% ~ (all equal)
IntSqr/500-8 3.00 ± 0% 3.00 ± 0% ~ (all equal)
IntSqr/800-8 9.00 ± 0% 9.00 ± 0% ~ (all equal)
IntSqr/1000-8 9.00 ± 0% 9.00 ± 0% ~ (all equal)
Div/20/10-8 0.00 0.00 ~ (all equal)
Div/200/100-8 0.00 0.00 ~ (all equal)
Div/2000/1000-8 0.00 0.00 ~ (all equal)
Div/20000/10000-8 0.00 0.00 ~ (all equal)
Div/200000/100000-8 0.00 0.00 ~ (all equal)
Mul-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
ZeroShifts/Shl-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
ZeroShifts/ShlSame-8 0.00 0.00 ~ (all equal)
ZeroShifts/Shr-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
ZeroShifts/ShrSame-8 0.00 0.00 ~ (all equal)
Exp3Power/0x10-8 4.00 ± 0% 4.00 ± 0% ~ (all equal)
Exp3Power/0x40-8 4.00 ± 0% 4.00 ± 0% ~ (all equal)
Exp3Power/0x100-8 5.00 ± 0% 5.00 ± 0% ~ (all equal)
Exp3Power/0x400-8 7.00 ± 0% 7.00 ± 0% ~ (all equal)
Exp3Power/0x1000-8 11.0 ± 0% 11.0 ± 0% ~ (all equal)
Exp3Power/0x4000-8 15.0 ± 0% 15.0 ± 0% ~ (all equal)
Exp3Power/0x10000-8 29.0 ± 0% 29.0 ± 0% ~ (all equal)
Exp3Power/0x40000-8 140 ± 0% 140 ± 0% ~ (all equal)
Exp3Power/0x100000-8 1.12k ± 0% 1.12k ± 0% ~ (all equal)
Exp3Power/0x400000-8 9.88k ± 0% 9.88k ± 0% ~ (p=0.747 n=17+19)
Fibo-8 739 ± 0% 743 ± 0% +0.54% (p=0.000 n=20+20)
NatSqr/1-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
NatSqr/2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
NatSqr/3-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
NatSqr/5-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
NatSqr/8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
NatSqr/10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
NatSqr/20-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
NatSqr/30-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
NatSqr/50-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
NatSqr/80-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
NatSqr/100-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
NatSqr/200-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
NatSqr/300-8 4.00 ± 0% 4.00 ± 0% ~ (all equal)
NatSqr/500-8 4.00 ± 0% 4.00 ± 0% ~ (all equal)
NatSqr/800-8 10.0 ± 0% 10.0 ± 0% ~ (all equal)
NatSqr/1000-8 10.0 ± 0% 10.0 ± 0% ~ (all equal)
NatSetBytes/8-8 0.00 0.00 ~ (all equal)
NatSetBytes/24-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
NatSetBytes/128-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
NatSetBytes/7-8 0.00 0.00 ~ (all equal)
NatSetBytes/23-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
NatSetBytes/127-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
ScanPi-8 60.0 ± 0% 61.0 ± 0% +1.67% (p=0.000 n=20+20)
StringPiParallel-8 24.0 ± 0% 24.0 ± 0% ~ (all equal)
Scan/10/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/100/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/1000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/10000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/100000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/10/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/100/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/1000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/10000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/100000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/10/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/100/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/1000/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/10000/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/100000/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/10/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/100/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/1000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/10000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
Scan/100000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/10/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/100/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/1000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/10000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/100000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/10/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/100/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/1000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/10000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/100000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/10/Base10-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
String/100/Base10-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
String/1000/Base10-8 3.00 ± 0% 3.00 ± 0% ~ (all equal)
String/10000/Base10-8 3.00 ± 0% 3.00 ± 0% ~ (all equal)
String/100000/Base10-8 3.00 ± 0% 3.00 ± 0% ~ (all equal)
String/10/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/100/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/1000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/10000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
String/100000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
LeafSize/0-8 10.0 ± 0% 10.0 ± 0% ~ (all equal)
LeafSize/1-8 13.0 ± 0% 13.0 ± 0% ~ (all equal)
LeafSize/2-8 13.0 ± 0% 13.0 ± 0% ~ (all equal)
LeafSize/3-8 13.0 ± 0% 13.0 ± 0% ~ (all equal)
LeafSize/4-8 13.0 ± 0% 13.0 ± 0% ~ (all equal)
LeafSize/5-8 13.0 ± 0% 13.0 ± 0% ~ (all equal)
LeafSize/6-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/7-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/8-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/9-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/10-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/11-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/12-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/13-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/14-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/15-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/16-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/32-8 12.0 ± 0% 12.0 ± 0% ~ (all equal)
LeafSize/64-8 11.0 ± 0% 11.0 ± 0% ~ (all equal)
ProbablyPrime/n=0-8 52.0 ± 0% 52.0 ± 0% ~ (all equal)
ProbablyPrime/n=1-8 73.0 ± 0% 73.0 ± 0% ~ (all equal)
ProbablyPrime/n=5-8 157 ± 0% 157 ± 0% ~ (all equal)
ProbablyPrime/n=10-8 262 ± 0% 262 ± 0% ~ (all equal)
ProbablyPrime/n=20-8 472 ± 0% 472 ± 0% ~ (all equal)
ProbablyPrime/Lucas-8 22.0 ± 0% 22.0 ± 0% ~ (all equal)
ProbablyPrime/MillerRabinBase2-8 29.0 ± 0% 29.0 ± 0% ~ (all equal)
FloatSqrt/64-8 9.00 ± 0% 10.00 ± 0% +11.11% (p=0.000 n=20+20)
FloatSqrt/128-8 12.0 ± 0% 13.0 ± 0% +8.33% (p=0.000 n=20+20)
FloatSqrt/256-8 8.00 ± 0% 8.00 ± 0% ~ (all equal)
FloatSqrt/1000-8 9.00 ± 0% 9.00 ± 0% ~ (all equal)
FloatSqrt/10000-8 14.0 ± 0% 14.0 ± 0% ~ (all equal)
FloatSqrt/100000-8 33.0 ± 0% 33.0 ± 0% ~ (all equal)
FloatSqrt/1000000-8 1.16k ± 0% 1.16k ± 0% ~ (all equal)
[Geo mean] 6.62 6.76 +2.09%
Change-Id: Id9df4157cac1e07721e35cff7fcdefe60703873a
Reviewed-on: https://go-review.googlesource.com/c/150999
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
Div panics when y<=hi because either the quotient overflows
the size of the output or division by zero occurs when y==0.
This provides a uniform behavior for all implementations.
Fixes #28316
Change-Id: If23aeb10e0709ee1a60b7d614afc9103d674a980
Reviewed-on: https://go-review.googlesource.com/c/149517
Reviewed-by: Robert Griesemer <gri@golang.org>
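As a rough illustration of the behavior described in this entry, the sketch below calls `bits.Div64` with `y > hi`, `y == hi`, and `y == 0`. The helper `divPanics` is hypothetical and exists only for this example; it relies on the panics being recoverable runtime errors.

```go
package main

import (
	"fmt"
	"math/bits"
)

// divPanics reports whether bits.Div64(hi, lo, y) panics.
// Hypothetical helper, not part of math/bits.
func divPanics(hi, lo, y uint64) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	bits.Div64(hi, lo, y)
	return false
}

func main() {
	fmt.Println(divPanics(0, 10, 3)) // y > hi: the quotient fits in 64 bits
	fmt.Println(divPanics(3, 0, 3))  // y == hi: the quotient would overflow
	fmt.Println(divPanics(1, 0, 0))  // y == 0: division by zero
}
```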
|
|
Explicitly check for divide-by-zero/overflow and panic with the appropriate
runtime error. The additional checks have basically no effect on performance
since the branch is easily predicted.
name old time/op new time/op delta
Div-4 53.9ns ± 1% 53.0ns ± 1% -1.59% (p=0.016 n=4+5)
Div32-4 17.9ns ± 0% 18.4ns ± 0% +2.56% (p=0.008 n=5+5)
Div64-4 53.5ns ± 0% 53.3ns ± 0% ~ (p=0.095 n=5+5)
Updates #28316
Change-Id: I36297ee9946cbbc57fefb44d1730283b049ecf57
Reviewed-on: https://go-review.googlesource.com/c/144377
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Go documentation style for boolean funcs is to say:
// Foo reports whether ...
func Foo() bool
(rather than "returns true if")
This CL also replaces 4 uses of "iff" with the same "reports whether"
wording, which doesn't lose any meaning, and will prevent people from
sending typo fixes when they don't realize it's "if and only if". In
the past I think we've had the typo CLs updated to just say "reports
whether". So do them all at once.
(Inspired by the addition of another "returns true if" in CL 146938
in fd_plan9.go)
Created with:
$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true iff" | grep -v vendor)
$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true if" | grep -v vendor)
Change-Id: Ided502237f5ab0d25cb625dbab12529c361a8b9f
Reviewed-on: https://go-review.googlesource.com/c/147037
Reviewed-by: Ian Lance Taylor <iant@golang.org>
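A minimal sketch of the documentation style this entry describes. The function `isEven` is a made-up example, not something touched by the CL.

```go
package main

import "fmt"

// isEven reports whether n is divisible by 2.
// (Preferred over: "isEven returns true if n is divisible by 2".)
func isEven(n int) bool {
	return n%2 == 0
}

func main() {
	fmt.Println(isEven(4), isEven(7))
}
```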
|
|
Fixes #28423.
Change-Id: Ie57ade565d0407a4bffaa86fb4475ff083168e79
Reviewed-on: https://go-review.googlesource.com/c/145537
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
The function documentation was wrong: it referred to a parameter that does not exist.
This change replaces it with the correct parameter.
The wrong formula was: q = (u1<<_W + u0 - r)/y
The function has a parameter "v" (of type Word), not a parameter "y".
So, the right formula is: q = (u1<<_W + u0 - r)/v
Fixes #28444
Change-Id: I82e57ba014735a9fdb6262874ddf498754d30d33
Reviewed-on: https://go-review.googlesource.com/c/145280
Reviewed-by: Robert Griesemer <gri@golang.org>
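The corrected formula can be checked numerically. The sketch below uses `bits.Div64` as a stand-in for the internal math/big routine (an assumption made for illustration, with the word size fixed at 64 bits) and verifies that `q*v + r` reconstructs the two-word dividend `u1:u0`. The helper name `checkDiv` is hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// checkDiv reports whether q, r from dividing the two-word value u1:u0
// by v satisfy u1<<64 + u0 == q*v + r with r < v.
// The caller must ensure v > u1, otherwise Div64 panics.
func checkDiv(u1, u0, v uint64) bool {
	q, r := bits.Div64(u1, u0, v)
	hi, lo := bits.Mul64(q, v)        // q*v as a two-word product
	lo, carry := bits.Add64(lo, r, 0) // add the remainder to the low word
	hi, _ = bits.Add64(hi, 0, carry)  // propagate the carry into the high word
	return hi == u1 && lo == u0 && r < v
}

func main() {
	fmt.Println(checkDiv(1, 5, 7))
	fmt.Println(checkDiv(0, 100, 9))
}
```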
|
|
name old time/op new time/op delta
Add-8 1.11ns ± 0% 1.18ns ± 0% +6.31% (p=0.029 n=4+4)
Add32-8 1.02ns ± 0% 1.02ns ± 1% ~ (p=0.333 n=4+5)
Add64-8 1.11ns ± 1% 1.17ns ± 0% +5.79% (p=0.008 n=5+5)
Add64multiple-8 4.35ns ± 1% 0.86ns ± 0% -80.22% (p=0.000 n=5+4)
The individual ops are a bit slower (but still very fast).
Using the ops in carry chains is very fast.
Update #28273
Change-Id: Id975f76df2b930abf0e412911d327b6c5b1befe5
Reviewed-on: https://go-review.googlesource.com/c/144257
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
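The "carry chain" pattern that the Add64multiple benchmark appears to exercise can be sketched as a two-limb addition, where the carry out of the low word feeds the high word. The function `add128` is a hypothetical name used only here.

```go
package main

import (
	"fmt"
	"math/bits"
)

// add128 adds two 128-bit values given as (hi, lo) limb pairs and
// returns the 128-bit sum plus the final carry-out.
func add128(aHi, aLo, bHi, bLo uint64) (hi, lo, carry uint64) {
	lo, c := bits.Add64(aLo, bLo, 0)    // low limbs, no carry-in
	hi, carry = bits.Add64(aHi, bHi, c) // high limbs consume the carry
	return hi, lo, carry
}

func main() {
	hi, lo, carry := add128(0, ^uint64(0), 0, 1)
	fmt.Println(hi, lo, carry)
}
```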
|
|
Previously, the benchmark was measuring Add64 instead of Sub64.
Change-Id: I0cf30935c8a4728bead9868834377aae0b34f008
Reviewed-on: https://go-review.googlesource.com/c/144380
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Change-Id: If2954bdfc551515403706b2cd0dde94e45936e08
GitHub-Last-Rev: d4cfc41a5504cf10befefdb881d4c45986a1d1f8
GitHub-Pull-Request: golang/go#28049
Reviewed-on: https://go-review.googlesource.com/c/140299
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
name old time/op new time/op delta
PowInt 55.7ns ± 1% 53.4ns ± 2% -4.15% (p=0.000 n=9+9)
PowFrac 133ns ± 1% 133ns ± 2% ~ (p=0.587 n=8+9)
Change-Id: Ica0f4c2cbd554f2195c6d1762ed26742ff8e3924
Reviewed-on: https://go-review.googlesource.com/c/85375
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
goos: linux
goarch: amd64
pkg: math
name old time/op new time/op delta
Mod 64.7ns ± 2% 63.7ns ± 2% -1.52% (p=0.003 n=8+10)
Change-Id: I851bec0fd6c223dab73e4a680b7393d49e81a0e8
Reviewed-on: https://go-review.googlesource.com/c/85095
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
This patch only removes trailing whitespace from assembly files
and makes no logical changes.
Change-Id: Ia724ac0b1abf8bc1e41454bdc79289ef317c165d
Reviewed-on: https://go-review.googlesource.com/c/113595
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Use float <-> int register moves without conversion instead of stores
and loads to move float <-> int values.
Math package benchmark results.
name old time/op new time/op delta
Acosh 153ns ± 0% 147ns ± 0% -3.92% (p=0.000 n=10+10)
Asinh 183ns ± 0% 177ns ± 0% -3.28% (p=0.000 n=10+10)
Atanh 157ns ± 0% 155ns ± 0% -1.27% (p=0.000 n=10+10)
Atan2 118ns ± 0% 117ns ± 1% -0.59% (p=0.003 n=10+10)
Cbrt 119ns ± 0% 114ns ± 0% -4.20% (p=0.000 n=10+10)
Copysign 7.51ns ± 0% 6.51ns ± 0% -13.32% (p=0.000 n=9+10)
Cos 73.1ns ± 0% 70.6ns ± 0% -3.42% (p=0.000 n=10+10)
Cosh 119ns ± 0% 121ns ± 0% +1.68% (p=0.000 n=10+9)
ExpGo 154ns ± 0% 149ns ± 0% -3.05% (p=0.000 n=9+10)
Expm1 101ns ± 0% 99ns ± 0% -1.88% (p=0.000 n=10+10)
Exp2Go 150ns ± 0% 146ns ± 0% -2.67% (p=0.000 n=10+10)
Abs 7.01ns ± 0% 6.01ns ± 0% -14.27% (p=0.000 n=10+9)
Mod 234ns ± 0% 212ns ± 0% -9.40% (p=0.000 n=9+10)
Frexp 34.5ns ± 0% 30.0ns ± 0% -13.04% (p=0.000 n=10+10)
Gamma 112ns ± 0% 111ns ± 0% -0.89% (p=0.000 n=10+10)
Hypot 73.6ns ± 0% 68.6ns ± 0% -6.79% (p=0.000 n=10+10)
HypotGo 77.1ns ± 0% 72.1ns ± 0% -6.49% (p=0.000 n=10+10)
Ilogb 31.0ns ± 0% 28.0ns ± 0% -9.68% (p=0.000 n=10+10)
J0 437ns ± 0% 434ns ± 0% -0.62% (p=0.000 n=10+10)
J1 433ns ± 0% 431ns ± 0% -0.46% (p=0.000 n=10+10)
Jn 927ns ± 0% 922ns ± 0% -0.54% (p=0.000 n=10+10)
Ldexp 41.5ns ± 0% 37.0ns ± 0% -10.84% (p=0.000 n=9+10)
Log 124ns ± 0% 118ns ± 0% -4.84% (p=0.000 n=10+9)
Logb 34.0ns ± 0% 32.0ns ± 0% -5.88% (p=0.000 n=10+10)
Log1p 110ns ± 0% 108ns ± 0% -1.82% (p=0.000 n=10+10)
Log10 136ns ± 0% 132ns ± 0% -2.94% (p=0.000 n=10+10)
Log2 51.6ns ± 0% 47.1ns ± 0% -8.72% (p=0.000 n=10+10)
Nextafter32 33.0ns ± 0% 30.5ns ± 0% -7.58% (p=0.000 n=10+10)
Nextafter64 29.0ns ± 0% 26.5ns ± 0% -8.62% (p=0.000 n=10+10)
PowInt 169ns ± 0% 160ns ± 0% -5.33% (p=0.000 n=10+10)
PowFrac 375ns ± 0% 361ns ± 0% -3.73% (p=0.000 n=10+10)
RoundToEven 14.0ns ± 0% 12.5ns ± 0% -10.71% (p=0.000 n=10+10)
Remainder 206ns ± 0% 192ns ± 0% -6.80% (p=0.000 n=10+9)
Signbit 6.01ns ± 0% 5.51ns ± 0% -8.32% (p=0.000 n=10+9)
Sin 70.1ns ± 0% 69.6ns ± 0% -0.71% (p=0.000 n=10+10)
Sincos 99.1ns ± 0% 99.6ns ± 0% +0.50% (p=0.000 n=9+10)
SqrtGoLatency 178ns ± 0% 146ns ± 0% -17.70% (p=0.000 n=8+10)
SqrtPrime 9.19µs ± 0% 9.20µs ± 0% +0.01% (p=0.000 n=9+9)
Tanh 125ns ± 1% 127ns ± 0% +1.36% (p=0.000 n=10+10)
Y0 428ns ± 0% 426ns ± 0% -0.47% (p=0.000 n=10+10)
Y1 431ns ± 0% 429ns ± 0% -0.46% (p=0.000 n=10+9)
Yn 906ns ± 0% 901ns ± 0% -0.55% (p=0.000 n=10+10)
Float64bits 4.50ns ± 0% 3.50ns ± 0% -22.22% (p=0.000 n=10+10)
Float64frombits 4.00ns ± 0% 3.50ns ± 0% -12.50% (p=0.000 n=10+9)
Float32bits 4.50ns ± 0% 3.50ns ± 0% -22.22% (p=0.002 n=8+10)
Float32frombits 4.00ns ± 0% 3.50ns ± 0% -12.50% (p=0.000 n=10+10)
Change-Id: Iba829e15d5624962fe0c699139ea783efeefabc2
Reviewed-on: https://go-review.googlesource.com/129715
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Port math/big pure go versions of add-with-carry, subtract-with-borrow,
full-width multiply, and full-width divide.
Updates #24813
Change-Id: Ifae5d2f6ee4237137c9dcba931f69c91b80a4b1c
Reviewed-on: https://go-review.googlesource.com/123157
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Change-Id: Ibef5f96ea588d17eac1c96ee3992e01943ba0fef
Reviewed-on: https://go-review.googlesource.com/131496
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
The divLarge code contained "todo"s about avoiding alias
and clear calls in the initialization of variables. By
rearranging the order of initialization and always using
an auxiliary variable for the shifted divisor, all of these
calls can be safely avoided. On average, normalizing
the divisor (shift>0) is required 31/32 or 63/64 of the
time. If one always performs the shift into an auxiliary
variable first, this avoids the need to check for aliasing of
vIn in the output variables u and z. The remainder u is
initialized via a left shift of uIn and thus needs no
alias check against uIn. Since uIn and vIn were both used,
z needs no alias checks except against u which is used for
storage of the remainder. This change has a minimal impact
on performance (see below), but cleans up the initialization
code and eliminates the "todo"s.
name old time/op new time/op delta
Div/20/10-4 86.7ns ± 6% 85.7ns ± 5% ~ (p=0.841 n=5+5)
Div/200/100-4 523ns ± 5% 502ns ± 3% -4.13% (p=0.024 n=5+5)
Div/2000/1000-4 2.55µs ± 3% 2.59µs ± 5% ~ (p=0.548 n=5+5)
Div/20000/10000-4 80.4µs ± 4% 80.0µs ± 2% ~ (p=1.000 n=5+5)
Div/200000/100000-4 6.43ms ± 6% 6.35ms ± 4% ~ (p=0.548 n=5+5)
Fixes #22928
Change-Id: I30d8498ef1cf8b69b0f827165c517bc25a5c32d7
Reviewed-on: https://go-review.googlesource.com/130775
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
The Sqrt code previously used explicit constants for 2 and 1/2. This change
replaces multiplication by these constants with increment and decrement of
the floating point exponent directly. This improves performance by ~7-10%
for small inputs, with minimal improvement for large inputs.
name old time/op new time/op delta
FloatSqrt/64-4 1.39µs ± 0% 1.29µs ± 3% -7.01% (p=0.016 n=4+5)
FloatSqrt/128-4 2.84µs ± 0% 2.60µs ± 1% -8.33% (p=0.008 n=5+5)
FloatSqrt/256-4 3.24µs ± 1% 2.91µs ± 2% -10.00% (p=0.008 n=5+5)
FloatSqrt/1000-4 7.42µs ± 1% 6.74µs ± 0% -9.16% (p=0.008 n=5+5)
FloatSqrt/10000-4 65.9µs ± 1% 65.3µs ± 4% ~ (p=0.310 n=5+5)
FloatSqrt/100000-4 1.57ms ± 8% 1.52ms ± 1% ~ (p=0.111 n=5+4)
FloatSqrt/1000000-4 127ms ± 1% 126ms ± 1% ~ (p=0.690 n=5+5)
Change-Id: Id81ac842a9d64981e001c4ca3ff129eebd227593
Reviewed-on: https://go-review.googlesource.com/130835
Reviewed-by: Robert Griesemer <gri@golang.org>
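The idea of scaling by a power of two through the exponent can be illustrated with the public `big.Float` API. This is only an analogy: the commit changes internal code, whereas the sketch below uses `SetMantExp` and compares it against multiplication by 0.5. All helper names are hypothetical.

```go
package main

import (
	"fmt"
	"math/big"
)

// halveByExp returns x/2 by adjusting the binary exponent.
func halveByExp(x *big.Float) *big.Float {
	return new(big.Float).SetMantExp(x, -1)
}

// halveByMul returns x/2 via an explicit multiplication by 0.5.
func halveByMul(x *big.Float) *big.Float {
	return new(big.Float).Mul(x, big.NewFloat(0.5))
}

// halvesAgree reports whether both approaches give the same value for v.
func halvesAgree(v float64) bool {
	x := big.NewFloat(v)
	return halveByExp(x).Cmp(halveByMul(x)) == 0
}

func main() {
	x := big.NewFloat(3)
	fmt.Println(halveByExp(x), halveByMul(x), halvesAgree(3))
}
```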
|
|
Ceil and Trunc of -0.2 return -0, not +0, but we didn't test that.
Updates #23647
Change-Id: Idbd4699376abfb4ca93f16c73c114d610d86a9f2
Reviewed-on: https://go-review.googlesource.com/91335
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
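A short sketch of the property the new test covers. Because `-0 == 0` compares true, the sign has to be inspected with `math.Signbit`. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// isNegZero reports whether f is negative zero.
func isNegZero(f float64) bool {
	return f == 0 && math.Signbit(f)
}

// ceilIsNegZero reports whether math.Ceil(x) is negative zero.
func ceilIsNegZero(x float64) bool { return isNegZero(math.Ceil(x)) }

// truncIsNegZero reports whether math.Trunc(x) is negative zero.
func truncIsNegZero(x float64) bool { return isNegZero(math.Trunc(x)) }

func main() {
	fmt.Println(ceilIsNegZero(-0.2), truncIsNegZero(-0.2))
}
```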
|
|
Performed `switch true {}` => `switch {}` replacement.
Found using https://go-critic.github.io/overview.html#switchTrue-ref
Change-Id: Ib39ea98531651966a5a56b7bd729b46e4eeb7f7c
Reviewed-on: https://go-review.googlesource.com/123378
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
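For readers unfamiliar with the idiom, a tagless `switch` is equivalent to `switch true`. The function `classify` below is a made-up example of the preferred form.

```go
package main

import "fmt"

// classify describes the sign of n using a tagless switch,
// which behaves exactly like `switch true { ... }`.
func classify(n int) string {
	switch {
	case n < 0:
		return "negative"
	case n == 0:
		return "zero"
	default:
		return "positive"
	}
}

func main() {
	fmt.Println(classify(-3), classify(0), classify(8))
}
```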
|
|
TMLL, LGDR and LDGR have all been added to the Go assembler
previously, so we don't need to encode them using WORD and BYTE
directives anymore. This is purely a cosmetic change, it does not
change the contents of any object files.
Change-Id: I93f815b91be310858297d8a0dc9e6d8e3f09dd65
Reviewed-on: https://go-review.googlesource.com/129895
Run-TryBot: Michael Munday <mike.munday@ibm.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Notify readers that interval notation is used.
Fixes: #26765
Change-Id: Id02a7fcffbf41699e85631badeee083f5d4b2201
Reviewed-on: https://go-review.googlesource.com/127549
Reviewed-by: Rob Pike <r@golang.org>
|
|
Test large but not infinite arguments.
This CL adds a test which breaks s390x. Don't submit until
a fix for that is figured out.
Update #26477
Change-Id: Ic86739fe3554e87d7f8e15482875c198fcf1d59c
Reviewed-on: https://go-review.googlesource.com/125641
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
The existing implementation produces correct results with a wide range of inputs,
but invalid results asymptotically. With this change we ensure correct asymptotic results
on s390x.
Fixes #26477
Change-Id: I760c1f8177f7cab2d7622ab9a926dfb1f8113b49
Reviewed-on: https://go-review.googlesource.com/127119
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
For modular exponentiation, negative exponents can be handled using
the following relation.
for y < 0: x**y mod m == (x**(-1))**|y| mod m
First compute ModInverse(x, m) and then compute the exponentiation
with the absolute value of the exponent. Non-modular exponentiation
with a negative exponent still returns 1.
Fixes #25865
Change-Id: I2a35986a24794b48e549c8de935ac662d217d8a0
Reviewed-on: https://go-review.googlesource.com/118562
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
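The relation above can be sketched by hand with `ModInverse` followed by `Exp` on the absolute value of the exponent. This is an illustration of the relation, not the code added by the change; `modExp` and `modExpInt64` are hypothetical helpers, and the sketch assumes `ModInverse` returns nil when no inverse exists.

```go
package main

import (
	"fmt"
	"math/big"
)

// modExp computes x**y mod m, handling y < 0 via
// x**y mod m == (x**(-1))**|y| mod m. It returns nil if y < 0
// and x has no inverse modulo m.
func modExp(x, y, m *big.Int) *big.Int {
	if y.Sign() >= 0 {
		return new(big.Int).Exp(x, y, m)
	}
	inv := new(big.Int).ModInverse(x, m)
	if inv == nil {
		return nil
	}
	return new(big.Int).Exp(inv, new(big.Int).Abs(y), m)
}

// modExpInt64 wraps modExp for small operands; -1 signals "no result".
func modExpInt64(x, y, m int64) int64 {
	r := modExp(big.NewInt(x), big.NewInt(y), big.NewInt(m))
	if r == nil {
		return -1
	}
	return r.Int64()
}

func main() {
	fmt.Println(modExpInt64(3, -1, 7)) // inverse of 3 modulo 7
	fmt.Println(modExpInt64(3, -2, 7)) // inverse of 3*3 modulo 7
}
```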
|
|
Handling of sign bit as defined by IEEE 754-2008, section 6.3:
When the sum of two operands with opposite signs (or the difference of
two operands with like signs) is exactly zero, the sign of that sum (or
difference) shall be +0 in all rounding-direction attributes except
roundTowardNegative; under that attribute, the sign of an exact zero
sum (or difference) shall be −0. However, x+x = x−(−x) retains the same
sign as x even when x is zero.
This change handles the special case of Add/Sub resulting in exactly zero
when the rounding mode is ToNegativeInf, setting the sign bit accordingly.
Fixes #25798
Change-Id: I4d0715fa3c3e4a3d8a4d7861dc1d6423c8b1c68c
Reviewed-on: https://go-review.googlesource.com/117495
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
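A minimal sketch of the special case, assuming a Go release that includes this change: subtracting a value from itself gives an exact zero whose sign depends on the result's rounding mode. The helper `exactZeroSignbit` is hypothetical.

```go
package main

import (
	"fmt"
	"math/big"
)

// exactZeroSignbit computes x - x for x = 1.5 and reports whether the
// resulting zero is negative. With roundDown set, the result uses the
// ToNegativeInf rounding mode; otherwise it uses the default mode.
func exactZeroSignbit(roundDown bool) bool {
	x := big.NewFloat(1.5)
	z := new(big.Float)
	if roundDown {
		z.SetMode(big.ToNegativeInf)
	}
	z.Sub(x, x)
	return z.Signbit()
}

func main() {
	fmt.Println(exactZeroSignbit(false), exactZeroSignbit(true))
}
```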
|
|
Change-Id: I9154df128b349c102854bb0f21e4c313685dd0e6
Reviewed-on: https://go-review.googlesource.com/118659
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Each URL was manually verified to ensure it did not serve up incorrect
content.
Change-Id: I4dc846227af95a73ee9a3074d0c379ff0fa955df
Reviewed-on: https://go-review.googlesource.com/115798
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
|
|
For primes congruent to 5 mod 8 there is a simple deterministic
method for calculating the modular square root due to Atkin,
using one exponentiation and 4 multiplications.
A. Atkin. Probabilistic primality testing, summary by F. Morain.
Research Report 1779, INRIA, pages 159–163, 1992.
This increases the speed of modular square roots for these primes
considerably.
name old time/op new time/op delta
ModSqrt231_5Mod8-4 1.03ms ± 2% 0.36ms ± 5% -65.06% (p=0.008 n=5+5)
Change-Id: I024f6e514bbca8d634218983117db2afffe615fe
Reviewed-on: https://go-review.googlesource.com/99615
Reviewed-by: Robert Griesemer <gri@golang.org>
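One common formulation of Atkin's method can be sketched as follows. It assumes p is a prime with p ≡ 5 (mod 8) and a is a quadratic residue modulo p, and it is not necessarily identical to the code added by this change. The names `sqrtMod5Mod8` and `sqrtSquaresBack` are hypothetical.

```go
package main

import (
	"fmt"
	"math/big"
)

// sqrtMod5Mod8 returns a square root of a modulo a prime p with
// p % 8 == 5, using one exponentiation and four multiplications:
//   alpha = (2a)^((p-5)/8), beta = 2a*alpha^2, root = a*alpha*(beta-1).
func sqrtMod5Mod8(a, p *big.Int) *big.Int {
	e := new(big.Int).Rsh(p, 3)            // (p-5)/8, since p = 8e + 5
	twoA := new(big.Int).Lsh(a, 1)         // 2a
	alpha := new(big.Int).Exp(twoA, e, p)  // the single exponentiation
	beta := new(big.Int).Mul(alpha, alpha) // multiplication 1
	beta.Mod(beta, p)
	beta.Mul(beta, twoA) // multiplication 2
	beta.Mod(beta, p)
	beta.Sub(beta, big.NewInt(1))
	beta.Mul(beta, a) // multiplication 3
	beta.Mod(beta, p)
	beta.Mul(beta, alpha) // multiplication 4
	return beta.Mod(beta, p)
}

// sqrtSquaresBack reports whether the computed root r satisfies r*r == a (mod p).
func sqrtSquaresBack(a, p int64) bool {
	r := sqrtMod5Mod8(big.NewInt(a), big.NewInt(p)).Int64()
	return (r*r)%p == a%p
}

func main() {
	fmt.Println(sqrtMod5Mod8(big.NewInt(3), big.NewInt(13)), sqrtSquaresBack(3, 13))
}
```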
|
|
Change-Id: I34838320047792c4719837591e848b87ccb7f5ab
Reviewed-on: https://go-review.googlesource.com/115058
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
For #23221.
Change-Id: If55dcf2e0706d6658f4a0863e3740437e008706c
Reviewed-on: https://go-review.googlesource.com/114335
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
Currently we use three different algorithms for squaring:
1. basic multiplication for small numbers
2. basic squaring for medium numbers
3. Karatsuba multiplication for large numbers
Change 3. to a version of Karatsuba multiplication specialized
for x == y.
Increasing the performance of 3. lets us lower the threshold
between 2. and 3.
Adapt TestCalibrate to the change that 3. isn't independent
of the threshold between 1. and 2. any more.
Fixes #23221.
benchstat old.txt new.txt
name old time/op new time/op delta
NatSqr/1-4 29.6ns ± 7% 29.5ns ± 5% ~ (p=0.103 n=50+50)
NatSqr/2-4 51.9ns ± 1% 51.9ns ± 1% ~ (p=0.693 n=42+49)
NatSqr/3-4 64.3ns ± 1% 64.1ns ± 0% -0.26% (p=0.000 n=46+43)
NatSqr/5-4 93.5ns ± 2% 93.1ns ± 1% -0.39% (p=0.000 n=48+49)
NatSqr/8-4 131ns ± 1% 131ns ± 1% ~ (p=0.870 n=46+49)
NatSqr/10-4 175ns ± 1% 175ns ± 1% +0.38% (p=0.000 n=49+47)
NatSqr/20-4 426ns ± 1% 429ns ± 1% +0.84% (p=0.000 n=46+48)
NatSqr/30-4 702ns ± 2% 699ns ± 1% -0.38% (p=0.011 n=46+44)
NatSqr/50-4 1.44µs ± 2% 1.43µs ± 1% -0.54% (p=0.010 n=48+48)
NatSqr/80-4 2.85µs ± 1% 2.87µs ± 1% +0.68% (p=0.000 n=47+47)
NatSqr/100-4 4.06µs ± 1% 4.07µs ± 1% +0.29% (p=0.000 n=46+45)
NatSqr/200-4 13.4µs ± 1% 13.5µs ± 1% +0.73% (p=0.000 n=48+48)
NatSqr/300-4 28.5µs ± 1% 28.2µs ± 1% -1.22% (p=0.000 n=46+48)
NatSqr/500-4 81.9µs ± 1% 67.0µs ± 1% -18.25% (p=0.000 n=48+48)
NatSqr/800-4 161µs ± 1% 140µs ± 1% -13.29% (p=0.000 n=47+48)
NatSqr/1000-4 245µs ± 1% 207µs ± 1% -15.17% (p=0.000 n=49+49)
go test -v -calibrate --run TestCalibrate
...
Calibrating threshold between basicSqr(x) and karatsubaSqr(x)
Looking for a timing difference for x between 200 - 500 words by 10 step
words = 200 deltaT = -980ns ( -7%) is karatsubaSqr(x) better: false
words = 210 deltaT = -773ns ( -5%) is karatsubaSqr(x) better: false
words = 220 deltaT = -695ns ( -4%) is karatsubaSqr(x) better: false
words = 230 deltaT = -570ns ( -3%) is karatsubaSqr(x) better: false
words = 240 deltaT = -458ns ( -2%) is karatsubaSqr(x) better: false
words = 250 deltaT = -63ns ( 0%) is karatsubaSqr(x) better: false
words = 260 deltaT = 118ns ( 0%) is karatsubaSqr(x) better: true threshold found
words = 270 deltaT = 377ns ( 1%) is karatsubaSqr(x) better: true
words = 280 deltaT = 765ns ( 3%) is karatsubaSqr(x) better: true
words = 290 deltaT = 673ns ( 2%) is karatsubaSqr(x) better: true
words = 300 deltaT = 502ns ( 1%) is karatsubaSqr(x) better: true
words = 310 deltaT = 629ns ( 2%) is karatsubaSqr(x) better: true
words = 320 deltaT = 1.011µs ( 3%) is karatsubaSqr(x) better: true
words = 330 deltaT = 1.36µs ( 4%) is karatsubaSqr(x) better: true
words = 340 deltaT = 3.001µs ( 8%) is karatsubaSqr(x) better: true
words = 350 deltaT = 3.178µs ( 8%) is karatsubaSqr(x) better: true
...
Change-Id: I6f13c23d94d042539ac28e77fd2618cdc37a429e
Reviewed-on: https://go-review.googlesource.com/105075
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
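The saving from specializing Karatsuba for x == y comes from replacing the three sub-multiplications with three sub-squarings. The sketch below shows a single split level on `big.Int` values, using the identity x² = x1²·B² + (x1² + x0² − (x1 − x0)²)·B + x0² with B = 2^k. It is illustrative only: the real implementation works on word slices and recurses, and the helper names here are hypothetical.

```go
package main

import (
	"fmt"
	"math/big"
)

// sqrSplit squares a non-negative x by splitting it at bit k into
// x = x1*2^k + x0 and combining three squarings.
func sqrSplit(x *big.Int, k uint) *big.Int {
	mask := new(big.Int).Lsh(big.NewInt(1), k)
	mask.Sub(mask, big.NewInt(1))
	x0 := new(big.Int).And(x, mask) // low part
	x1 := new(big.Int).Rsh(x, k)    // high part

	hiSq := new(big.Int).Mul(x1, x1)
	loSq := new(big.Int).Mul(x0, x0)
	diff := new(big.Int).Sub(x1, x0)
	diffSq := diff.Mul(diff, diff)

	// mid = x1^2 + x0^2 - (x1-x0)^2 = 2*x1*x0
	mid := new(big.Int).Add(hiSq, loSq)
	mid.Sub(mid, diffSq)
	mid.Lsh(mid, k)

	res := new(big.Int).Lsh(hiSq, 2*k)
	res.Add(res, mid)
	return res.Add(res, loSq)
}

// sqrSplitAgrees reports whether sqrSplit matches x*x for the decimal string dec.
func sqrSplitAgrees(dec string, k uint) bool {
	x, ok := new(big.Int).SetString(dec, 10)
	if !ok || x.Sign() < 0 {
		return false
	}
	return sqrSplit(x, k).Cmp(new(big.Int).Mul(x, x)) == 0
}

func main() {
	fmt.Println(sqrSplitAgrees("123456789012345678901234567890", 48))
}
```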
|
|
Fixes #25325
Change-Id: I101641be99a820722edb7272918e04e8d2e1646c
Reviewed-on: https://go-review.googlesource.com/112775
Reviewed-by: Rob Pike <r@golang.org>
|
|
Updates #15833
The extended GCD algorithm can be implemented using
Lehmer's algorithm with additional updates for the
cosequences following Algorithm 10.45 from Cohen et al.
"Handbook of Elliptic and Hyperelliptic Curve Cryptography", p. 192.
This brings the speed of the extended GCD calculation within
~2x of the base GCD calculation. There is a slight degradation in
the non-extended GCD speed for small inputs (1-2 words) due to the
additional code to handle the extended updates.
name old time/op new time/op delta
GCD10x10/WithoutXY-4 262ns ± 1% 266ns ± 2% ~ (p=0.333 n=5+5)
GCD10x10/WithXY-4 1.42µs ± 2% 0.74µs ± 3% -47.90% (p=0.008 n=5+5)
GCD10x100/WithoutXY-4 520ns ± 2% 539ns ± 1% +3.81% (p=0.008 n=5+5)
GCD10x100/WithXY-4 2.32µs ± 1% 1.67µs ± 0% -27.80% (p=0.008 n=5+5)
GCD10x1000/WithoutXY-4 1.40µs ± 1% 1.45µs ± 2% +3.26% (p=0.016 n=4+5)
GCD10x1000/WithXY-4 4.78µs ± 1% 3.43µs ± 1% -28.37% (p=0.008 n=5+5)
GCD10x10000/WithoutXY-4 10.0µs ± 0% 10.2µs ± 3% +1.80% (p=0.008 n=5+5)
GCD10x10000/WithXY-4 20.9µs ± 3% 17.9µs ± 1% -14.20% (p=0.008 n=5+5)
GCD10x100000/WithoutXY-4 96.8µs ± 0% 96.3µs ± 1% ~ (p=0.310 n=5+5)
GCD10x100000/WithXY-4 196µs ± 3% 159µs ± 2% -18.61% (p=0.008 n=5+5)
GCD100x100/WithoutXY-4 2.53µs ±15% 2.34µs ± 0% -7.35% (p=0.008 n=5+5)
GCD100x100/WithXY-4 19.3µs ± 0% 3.9µs ± 1% -79.58% (p=0.008 n=5+5)
GCD100x1000/WithoutXY-4 4.23µs ± 0% 4.17µs ± 3% ~ (p=0.127 n=5+5)
GCD100x1000/WithXY-4 22.8µs ± 1% 7.5µs ±10% -67.00% (p=0.008 n=5+5)
GCD100x10000/WithoutXY-4 19.1µs ± 0% 19.0µs ± 0% ~ (p=0.095 n=5+5)
GCD100x10000/WithXY-4 75.1µs ± 2% 30.5µs ± 2% -59.38% (p=0.008 n=5+5)
GCD100x100000/WithoutXY-4 170µs ± 5% 167µs ± 1% ~ (p=1.000 n=5+5)
GCD100x100000/WithXY-4 542µs ± 2% 267µs ± 2% -50.79% (p=0.008 n=5+5)
GCD1000x1000/WithoutXY-4 28.0µs ± 0% 27.1µs ± 0% -3.29% (p=0.008 n=5+5)
GCD1000x1000/WithXY-4 329µs ± 0% 42µs ± 1% -87.12% (p=0.008 n=5+5)
GCD1000x10000/WithoutXY-4 47.2µs ± 0% 46.4µs ± 0% -1.65% (p=0.016 n=5+4)
GCD1000x10000/WithXY-4 607µs ± 9% 123µs ± 1% -79.70% (p=0.008 n=5+5)
GCD1000x100000/WithoutXY-4 260µs ±17% 245µs ± 0% ~ (p=0.056 n=5+5)
GCD1000x100000/WithXY-4 3.64ms ± 1% 0.93ms ± 1% -74.41% (p=0.016 n=4+5)
GCD10000x10000/WithoutXY-4 513µs ± 0% 507µs ± 0% -1.22% (p=0.008 n=5+5)
GCD10000x10000/WithXY-4 7.44ms ± 1% 1.00ms ± 0% -86.58% (p=0.008 n=5+5)
GCD10000x100000/WithoutXY-4 1.23ms ± 0% 1.23ms ± 1% ~ (p=0.056 n=5+5)
GCD10000x100000/WithXY-4 37.3ms ± 0% 7.3ms ± 1% -80.45% (p=0.008 n=5+5)
GCD100000x100000/WithoutXY-4 24.2ms ± 0% 24.2ms ± 0% ~ (p=0.841 n=5+5)
GCD100000x100000/WithXY-4 505ms ± 1% 56ms ± 1% -88.92% (p=0.008 n=5+5)
Change-Id: I25f42ab8c55033acb83cc32bb03c12c1963925e8
Reviewed-on: https://go-review.googlesource.com/78755
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
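The "WithXY" benchmarks above correspond to calling `GCD` with non-nil cofactor arguments. A small usage sketch, with the hypothetical wrapper `bezout` for small positive operands:

```go
package main

import (
	"fmt"
	"math/big"
)

// bezout returns g = gcd(a, b) together with cofactors x, y
// such that g == a*x + b*y. Intended for small positive a and b.
func bezout(a, b int64) (g, x, y int64) {
	var bx, by big.Int
	bg := new(big.Int).GCD(&bx, &by, big.NewInt(a), big.NewInt(b))
	return bg.Int64(), bx.Int64(), by.Int64()
}

func main() {
	g, x, y := bezout(240, 46)
	fmt.Println(g, x, y, 240*x+46*y == g)
}
```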
|
|
This commit adds the wasm architecture to the math package.
Updates #18892
Change-Id: I5cc38552a31b193d35fb81ae87600a76b8b9e9b5
Reviewed-on: https://go-review.googlesource.com/106996
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This makes math/bits not have any explicit imports even
when compiling tests and thereby avoids import cycles when
dependencies of testing want to import math/bits.
Change-Id: I95eccae2f5c4310e9b18124abfa85212dfbd9daa
Reviewed-on: https://go-review.googlesource.com/110479
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Currently, there is no check for a negative modulus in ModInverse.
Negative moduli are passed internally to GCD, which returns 0 for
negative arguments. Mod is symmetric with respect to negative moduli,
so the calculation can be done by just negating the modulus before
passing the arguments to GCD.
Fixes #24949
Change-Id: Ifd1e64c9b2343f0489c04ab65504e73a623378c7
Reviewed-on: https://go-review.googlesource.com/108115
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
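A sketch of the behavior after this fix, assuming a Go release that includes it: the sign of the modulus is ignored, so the inverse of 3 modulo 7 and modulo -7 agree. The wrapper `inverseMod` is hypothetical.

```go
package main

import (
	"fmt"
	"math/big"
)

// inverseMod returns the inverse of g modulo n and whether it exists.
func inverseMod(g, n int64) (int64, bool) {
	z := new(big.Int).ModInverse(big.NewInt(g), big.NewInt(n))
	if z == nil {
		return 0, false
	}
	return z.Int64(), true
}

func main() {
	fmt.Println(inverseMod(3, 7))
	fmt.Println(inverseMod(3, -7))
}
```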
|
|
Currently, the behavior of z.ModInverse(g, n) is undefined
when g and n are not relatively prime. In that case, no
ModInverse exists, which can be easily checked during the
computation of the ModInverse. Because the ModInverse does
not indicate whether the inverse exists, there are reimplementations
of a "checked" ModInverse in crypto/rsa. This change removes the
undefined behavior. If the ModInverse does not exist, the receiver z
is unchanged and the return value is nil. This matches the behavior of
ModSqrt for the case where the square root does not exist.
name old time/op new time/op delta
ModInverse-4 2.40µs ± 4% 2.22µs ± 0% -7.74% (p=0.016 n=5+4)
name old alloc/op new alloc/op delta
ModInverse-4 1.36kB ± 0% 1.17kB ± 0% -14.12% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
ModInverse-4 10.0 ± 0% 9.0 ± 0% -10.00% (p=0.008 n=5+5)
Fixes #24922
Change-Id: If7f9d491858450bdb00f1e317152f02493c9c8a8
Reviewed-on: https://go-review.googlesource.com/108996
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
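The nil return described above lets callers test for the existence of an inverse directly. In the sketch below, the hypothetical helper `tryInverse` seeds the receiver with a sentinel value to show that it is left unchanged on failure.

```go
package main

import (
	"fmt"
	"math/big"
)

// tryInverse calls z.ModInverse(g, n) on a receiver pre-set to sentinel.
// It reports whether an inverse exists and the receiver's value afterwards.
func tryInverse(g, n, sentinel int64) (ok bool, z int64) {
	recv := big.NewInt(sentinel)
	res := recv.ModInverse(big.NewInt(g), big.NewInt(n))
	return res != nil, recv.Int64()
}

func main() {
	fmt.Println(tryInverse(2, 4, 42)) // 2 and 4 share a factor: no inverse
	fmt.Println(tryInverse(3, 7, 42)) // 3 and 7 are coprime
}
```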
|
|
One might try to implement the Mod or Remainder function with the expression
x - TRUNC(x/y + 0.5)*y, but in fact this method is wrong, because the rounding
of (x/y + 0.5) to initialize the argument of TRUNC may lose too much precision.
However, the current test cases cannot detect this error. This CL adds two
test cases to guard against such attempts.
Change-Id: I6690f5cffb21bf8ae06a314b7a45cafff8bcee13
Reviewed-on: https://go-review.googlesource.com/84275
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
Made constant names more idiomatic,
moved some constants to function seedrand,
and found better name for _M.
Change-Id: I192172f398378bef486a5bbceb6ba86af48ebcc9
Reviewed-on: https://go-review.googlesource.com/107135
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Use the new softfloat support in the compiler, originally added
for softfloat on MIPS. This support is portable, so we can just
use it for softfloat on ARM.
In the old softfloat support on ARM, the compiler generates
floating point instructions, then the assembler inserts calls
to _sfloat before FP instructions. _sfloat decodes the following
FP instructions and simulates them.
In the new scheme, the compiler generates runtime calls to do FP
operations at a higher level. It doesn't generate FP instructions,
and therefore the assembler won't insert _sfloat calls, i.e. the
old mechanism is automatically suppressed.
The old method may still be triggered by assembly code
using FP instructions. In the standard library, the only
occurrence is math/sqrt_arm.s, which is rewritten to call the
Go implementation instead.
Some significant speedups for code using floating point:
name old time/op new time/op delta
BinaryTree17-4 37.1s ± 2% 37.3s ± 1% ~ (p=0.105 n=10+10)
Fannkuch11-4 13.0s ± 0% 13.1s ± 0% +0.46% (p=0.000 n=10+10)
FmtFprintfEmpty-4 700ns ± 4% 734ns ± 6% +4.84% (p=0.009 n=10+10)
FmtFprintfString-4 1.22µs ± 3% 1.22µs ± 4% ~ (p=0.897 n=10+10)
FmtFprintfInt-4 1.27µs ± 2% 1.30µs ± 1% +1.91% (p=0.001 n=10+9)
FmtFprintfIntInt-4 1.83µs ± 2% 1.81µs ± 3% ~ (p=0.149 n=10+10)
FmtFprintfPrefixedInt-4 1.80µs ± 3% 1.81µs ± 2% ~ (p=0.421 n=10+8)
FmtFprintfFloat-4 6.89µs ± 3% 3.59µs ± 2% -47.93% (p=0.000 n=10+10)
FmtManyArgs-4 6.39µs ± 1% 6.09µs ± 1% -4.61% (p=0.000 n=10+9)
GobDecode-4 109ms ± 2% 81ms ± 2% -25.99% (p=0.000 n=9+10)
GobEncode-4 109ms ± 2% 76ms ± 2% -29.88% (p=0.000 n=10+9)
Gzip-4 3.61s ± 1% 3.59s ± 1% ~ (p=0.247 n=10+10)
Gunzip-4 449ms ± 4% 450ms ± 1% ~ (p=0.230 n=10+7)
HTTPClientServer-4 1.55ms ± 3% 1.53ms ± 2% ~ (p=0.400 n=9+10)
JSONEncode-4 356ms ± 1% 183ms ± 1% -48.73% (p=0.000 n=10+10)
JSONDecode-4 1.12s ± 2% 0.87s ± 1% -21.88% (p=0.000 n=10+10)
Mandelbrot200-4 5.49s ± 1% 2.55s ± 1% -53.45% (p=0.000 n=9+10)
GoParse-4 49.6ms ± 2% 47.5ms ± 1% -4.08% (p=0.000 n=10+9)
RegexpMatchEasy0_32-4 1.13µs ± 4% 1.20µs ± 4% +6.42% (p=0.000 n=10+10)
RegexpMatchEasy0_1K-4 4.41µs ± 2% 4.44µs ± 2% ~ (p=0.128 n=10+10)
RegexpMatchEasy1_32-4 1.15µs ± 5% 1.20µs ± 5% +4.85% (p=0.002 n=10+10)
RegexpMatchEasy1_1K-4 6.21µs ± 2% 6.37µs ± 4% +2.62% (p=0.001 n=9+10)
RegexpMatchMedium_32-4 1.58µs ± 5% 1.65µs ± 3% +4.85% (p=0.000 n=10+10)
RegexpMatchMedium_1K-4 341µs ± 3% 351µs ± 7% ~ (p=0.573 n=8+10)
RegexpMatchHard_32-4 21.4µs ± 3% 21.5µs ± 5% ~ (p=0.931 n=9+9)
RegexpMatchHard_1K-4 626µs ± 2% 626µs ± 1% ~ (p=0.645 n=8+8)
Revcomp-4 46.4ms ± 2% 47.4ms ± 2% +2.07% (p=0.000 n=10+10)
Template-4 1.31s ± 3% 1.23s ± 4% -6.13% (p=0.000 n=10+10)
TimeParse-4 4.49µs ± 1% 4.41µs ± 2% -1.81% (p=0.000 n=10+9)
TimeFormat-4 9.31µs ± 1% 9.32µs ± 2% ~ (p=0.561 n=9+9)
Change-Id: Iaeeff6c9a09c1b2c064d06e09dd88101dc02bfa4
Reviewed-on: https://go-review.googlesource.com/106735
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Updates #22830
Because divLarge did not check whether its output slices alias,
calls of the form z.div(z, x, y) caused the slice z
to be used to store both the quotient and the
remainder of the division. CL 78995 applies an alias
check to correct that error. This CL cleans up the
additional div calls that attempt to supply the same slice
to hold both the quotient and remainder.
Note that the call in expNN was responsible for the reported
error in r.Exp(x, 1, m) when r was initialized to a non-zero value.
The second instance in expNNMontgomery did not result in an error
due to the size of the arguments.
// RR = 2**(2*_W*len(m)) mod m
RR := nat(nil).setWord(1)
zz := nat(nil).shl(RR, uint(2*numWords*_W))
_, RR = RR.div(RR, zz, m)
Specifically,
cap(RR) == 5 after setWord(1) due to const e = 4 in z.make(1)
len(zz) == 2*len(m) + 1 after shifting left, numWords = len(m)
Reusing the backing array for z and z2 in div was only triggered if
cap(RR) >= len(zz) + 1 and len(m) > 1 so that divLarge was called.
But, 5 < 2*len(m) + 2 if len(m) > 1, so new arrays were allocated
and the error was never triggered in this case.
Change-Id: Iedac80dbbde13216c94659e84d28f6f4be3aaf24
Reviewed-on: https://go-review.googlesource.com/81055
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
This change improves performance of addVV, subVV and mulAddVWW
by unrolling the loops, with improvements up to 1.45x.
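As an illustration of the technique only, here is a Go sketch of a four-way unrolled add-with-carry loop. The slice-of-uint64 signature and the unroll factor are assumptions of this example, not taken from the CL:

```go
package main

import (
	"fmt"
	"math/bits"
)

// addVV sets z = x + y word by word and returns the final carry.
// It assumes len(x) and len(y) are at least len(z).
func addVV(z, x, y []uint64) (carry uint64) {
	i := 0
	// Unrolled body: four words per iteration, fewer loop branches.
	for ; i+4 <= len(z); i += 4 {
		z[i], carry = bits.Add64(x[i], y[i], carry)
		z[i+1], carry = bits.Add64(x[i+1], y[i+1], carry)
		z[i+2], carry = bits.Add64(x[i+2], y[i+2], carry)
		z[i+3], carry = bits.Add64(x[i+3], y[i+3], carry)
	}
	// Remainder loop for the last len(z) % 4 words.
	for ; i < len(z); i++ {
		z[i], carry = bits.Add64(x[i], y[i], carry)
	}
	return carry
}

func main() {
	max := ^uint64(0)
	x := []uint64{max, max, max, max, 1}
	y := []uint64{1, 0, 0, 0, 0}
	z := make([]uint64, len(x))
	c := addVV(z, x, y)
	fmt.Println(z, c) // prints "[0 0 0 0 2] 0"
}
```

Short vectors never reach the unrolled body, which is consistent with the gains in the tables below showing up only at larger sizes.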
benchmark old ns/op new ns/op delta
BenchmarkAddVV/1-16 5.79 5.85 +1.04%
BenchmarkAddVV/2-16 6.41 6.62 +3.28%
BenchmarkAddVV/3-16 6.89 7.35 +6.68%
BenchmarkAddVV/4-16 7.47 8.26 +10.58%
BenchmarkAddVV/5-16 8.04 8.18 +1.74%
BenchmarkAddVV/10-16 10.9 11.2 +2.75%
BenchmarkAddVV/100-16 81.7 57.0 -30.23%
BenchmarkAddVV/1000-16 714 500 -29.97%
BenchmarkAddVV/10000-16 7088 4946 -30.22%
BenchmarkAddVV/100000-16 71514 49364 -30.97%
BenchmarkSubVV/1-16 5.94 5.89 -0.84%
BenchmarkSubVV/2-16 12.9 6.82 -47.13%
BenchmarkSubVV/3-16 7.03 7.34 +4.41%
BenchmarkSubVV/4-16 7.58 8.23 +8.58%
BenchmarkSubVV/5-16 8.15 8.19 +0.49%
BenchmarkSubVV/10-16 11.2 11.4 +1.79%
BenchmarkSubVV/100-16 82.4 57.0 -30.83%
BenchmarkSubVV/1000-16 715 499 -30.21%
BenchmarkSubVV/10000-16 7089 4947 -30.22%
BenchmarkSubVV/100000-16 71568 49378 -31.01%
benchmark old MB/s new MB/s speedup
BenchmarkAddVV/1-16 11048.49 10939.92 0.99x
BenchmarkAddVV/2-16 19973.41 19323.60 0.97x
BenchmarkAddVV/3-16 27847.09 26123.06 0.94x
BenchmarkAddVV/4-16 34276.46 30976.54 0.90x
BenchmarkAddVV/5-16 39781.92 39140.68 0.98x
BenchmarkAddVV/10-16 58559.29 56894.68 0.97x
BenchmarkAddVV/100-16 78354.88 112243.69 1.43x
BenchmarkAddVV/1000-16 89592.74 127889.04 1.43x
BenchmarkAddVV/10000-16 90292.39 129387.06 1.43x
BenchmarkAddVV/100000-16 89492.92 129647.78 1.45x
BenchmarkSubVV/1-16 10781.03 10861.22 1.01x
BenchmarkSubVV/2-16 9949.27 18760.21 1.89x
BenchmarkSubVV/3-16 27319.40 26166.01 0.96x
BenchmarkSubVV/4-16 33764.35 31123.02 0.92x
BenchmarkSubVV/5-16 39272.40 39050.31 0.99x
BenchmarkSubVV/10-16 57262.87 56206.33 0.98x
BenchmarkSubVV/100-16 77641.78 112280.86 1.45x
BenchmarkSubVV/1000-16 89486.27 128064.08 1.43x
BenchmarkSubVV/10000-16 90274.37 129356.59 1.43x
BenchmarkSubVV/100000-16 89424.42 129610.50 1.45x
Change-Id: I2795a82134d1e3b75e2634c76b8ca165a723ec7b
Reviewed-on: https://go-review.googlesource.com/103495
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
The go/printer (and thus gofmt) uses a heuristic to determine
whether to break alignment between elements of an expression
list which is spread across multiple lines. The heuristic only
kicked in if the entry size (character length) was above a
certain threshold (20) and the ratio between the previous and
current entry size was above a certain value (4).
This heuristic worked reasonably most of the time, but also
led to unfortunate breaks in many cases where a single entry
was suddenly much smaller (or larger) than the previous one.
The behavior of gofmt was sufficiently mysterious in some of
these situations that many issues were filed against it.
The simplest solution to address this problem is to remove
the heuristic altogether and have a programmer introduce
empty lines to force different alignments if it improves
readability. The problem with that approach is that the
places where it really matters, very long tables with many
(hundreds, or more) entries, may be machine-generated and
not "post-processed" by a human (e.g., unicode/utf8/tables.go).
If a single one of those entries is overlong, the result
would be that the alignment would force all comments or
values in key:value pairs to be adjusted to that overlong
value, making the table hard to read (e.g., that entry may
not even be visible on screen and all other entries seem
spaced out too wide).
Instead, we opted for a slightly improved heuristic that
behaves much better for "normal", human-written code.
1) The threshold is increased from 20 to 40. This disables
the heuristic for many common cases. Even if the alignment
is not "ideal", 40 is not that many characters per line with
today's screens, making it very likely that the entire line
remains "visible" in an editor.
2) Changed the heuristic to not simply look at the size ratio
between current and previous line, but instead to consider the
geometric mean of the sizes of the previous (aligned) lines.
This emphasizes the "overall picture" of the previous lines,
rather than a single one (which might be an outlier).
3) Changed the ratio from 4 to 2.5. Now that we ignore sizes
below 40, a ratio of 4 would mean that a new entry would have
to be 4 times bigger (160) or smaller (10) before alignment
would be broken. A ratio of 2.5 seems more sensible.
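The three points above can be sketched as follows. This is a simplified illustration, not the go/printer code. The function name breakAlignment, the constant names, and the exact boundary handling are assumptions, and all sizes are assumed to be positive:

```go
package main

import (
	"fmt"
	"math"
)

const (
	smallSize = 40  // sizes at or below this are ignored (point 1)
	maxRatio  = 2.5 // allowed size ratio in either direction (point 3)
)

// breakAlignment reports whether an entry of the given size should break
// alignment with the previous aligned entries, whose sizes are in prev.
func breakAlignment(prev []int, size int) bool {
	if len(prev) == 0 || size <= 0 {
		return false
	}
	if prev[len(prev)-1] <= smallSize && size <= smallSize {
		return false
	}
	// Geometric mean of the previous sizes (point 2), via logarithms.
	lnsum := 0.0
	for _, s := range prev {
		lnsum += math.Log(float64(s))
	}
	geomean := math.Exp(lnsum / float64(len(prev)))
	ratio := float64(size) / geomean
	return ratio >= maxRatio || ratio*maxRatio <= 1
}

func main() {
	fmt.Println(breakAlignment([]int{50, 50, 50}, 130)) // much larger entry
	fmt.Println(breakAlignment([]int{50, 50, 50}, 60))  // similar entry
	fmt.Println(breakAlignment([]int{10, 12}, 15))      // all below threshold
}
```

Using the geometric mean keeps one unusually long or short earlier entry from dominating the decision.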
Applied updated gofmt to all of src and misc. Also tested
against several former issues that complained about this
and verified that the output for the given examples is
satisfactory (added respective test cases).
Some of the files changed because they were not gofmt-ed
in the first place.
For #644.
For #7335.
For #10392.
(and probably more related issues)
Fixes #22852.
Change-Id: I5e48b3d3b157a5cf2d649833b7297b33f43a6f6e
|