| Age | Commit message (Collapse) | Author |
|
Also move the arm64 CountByte implementation while we're here.
Fixes #19792
Change-Id: I1e0fdf1e03e3135af84150a2703b58dad1b0d57e
Reviewed-on: https://go-review.googlesource.com/98518
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Move bytes.Count and strings.Count to bytealg.
Update #19792
Change-Id: I3e4e14b504a0b71758885bb131e5656e342cf8cb
Reviewed-on: https://go-review.googlesource.com/98495
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Use SIMD instructions when counting a single byte.
Inspired from runtime IndexByte implementation.
Benchmark results of bytes, where 1 byte in every 8 is the one we are looking:
name old time/op new time/op delta
CountSingle/10-8 96.1ns ± 1% 38.8ns ± 0% -59.64% (p=0.000 n=9+7)
CountSingle/32-8 172ns ± 2% 36ns ± 1% -79.27% (p=0.000 n=10+10)
CountSingle/4K-8 18.2µs ± 1% 0.9µs ± 0% -95.17% (p=0.000 n=9+10)
CountSingle/4M-8 18.4ms ± 0% 0.9ms ± 0% -95.00% (p=0.000 n=10+9)
CountSingle/64M-8 284ms ± 4% 19ms ± 0% -93.40% (p=0.000 n=10+10)
name old speed new speed delta
CountSingle/10-8 104MB/s ± 1% 258MB/s ± 0% +147.99% (p=0.000 n=9+10)
CountSingle/32-8 185MB/s ± 1% 897MB/s ± 1% +385.33% (p=0.000 n=9+10)
CountSingle/4K-8 225MB/s ± 1% 4658MB/s ± 0% +1967.40% (p=0.000 n=9+10)
CountSingle/4M-8 228MB/s ± 0% 4555MB/s ± 0% +1901.71% (p=0.000 n=10+9)
CountSingle/64M-8 236MB/s ± 4% 3575MB/s ± 0% +1414.69% (p=0.000 n=10+10)
Change-Id: Ifccb51b3c8658c49773fe05147c3cf3aead361e5
Reviewed-on: https://go-review.googlesource.com/71111
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Use IndexByte first, as it allows us to skip lots of bytes quickly.
If IndexByte is generating a lot of false positives, switch over to Rabin-Karp.
Experiments for ppc64le
bytes:
name old time/op new time/op delta
IndexPeriodic/IndexPeriodic2-2 1.12ms ± 0% 0.18ms ± 0% -83.54% (p=0.000 n=10+9)
IndexPeriodic/IndexPeriodic4-2 635µs ± 0% 184µs ± 0% -71.06% (p=0.000 n=9+9)
IndexPeriodic/IndexPeriodic8-2 289µs ± 0% 184µs ± 0% -36.51% (p=0.000 n=10+9)
IndexPeriodic/IndexPeriodic16-2 133µs ± 0% 183µs ± 0% +37.68% (p=0.000 n=10+9)
IndexPeriodic/IndexPeriodic32-2 68.3µs ± 0% 70.2µs ± 0% +2.76% (p=0.000 n=10+10)
IndexPeriodic/IndexPeriodic64-2 35.8µs ± 0% 36.6µs ± 0% +2.17% (p=0.000 n=8+10)
strings:
name old time/op new time/op delta
IndexPeriodic/IndexPeriodic2-2 184µs ± 0% 184µs ± 0% +0.11% (p=0.029 n=4+4)
IndexPeriodic/IndexPeriodic4-2 184µs ± 0% 184µs ± 0% ~ (p=0.886 n=4+4)
IndexPeriodic/IndexPeriodic8-2 184µs ± 0% 184µs ± 0% ~ (p=0.486 n=4+4)
IndexPeriodic/IndexPeriodic16-2 185µs ± 1% 184µs ± 0% ~ (p=0.343 n=4+4)
IndexPeriodic/IndexPeriodic32-2 184µs ± 0% 69µs ± 0% -62.37% (p=0.029 n=4+4)
IndexPeriodic/IndexPeriodic64-2 184µs ± 0% 37µs ± 0% -80.17% (p=0.029 n=4+4)
Fixes #22578
Change-Id: If2a4d8554cb96bfd699b58149d13ac294615f8b8
Reviewed-on: https://go-review.googlesource.com/76070
Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>
|
|
Fixes #21950
Change-Id: I6fa392abd2c3bf6a4f80f14c6b1419470e9a944d
Reviewed-on: https://go-review.googlesource.com/66750
Run-TryBot: Rob Pike <r@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
|
|
Use SSE/AVX2 when counting a single byte.
Inspired from runtime indexbyte implementation.
Benchmark against previous implementation, where
1 byte in every 8 is the one we are looking for:
* On a machine without AVX2
name old time/op new time/op delta
CountSingle/10-4 61.8ns ±10% 15.6ns ±11% -74.83% (p=0.000 n=10+10)
CountSingle/32-4 100ns ± 4% 17ns ±10% -82.54% (p=0.000 n=10+9)
CountSingle/4K-4 9.66µs ± 3% 0.37µs ± 6% -96.21% (p=0.000 n=10+10)
CountSingle/4M-4 11.0ms ± 6% 0.4ms ± 4% -96.04% (p=0.000 n=10+10)
CountSingle/64M-4 194ms ± 8% 8ms ± 2% -95.64% (p=0.000 n=10+10)
name old speed new speed delta
CountSingle/10-4 162MB/s ±10% 645MB/s ±10% +297.00% (p=0.000 n=10+10)
CountSingle/32-4 321MB/s ± 5% 1844MB/s ± 9% +474.79% (p=0.000 n=10+9)
CountSingle/4K-4 424MB/s ± 3% 11169MB/s ± 6% +2533.10% (p=0.000 n=10+10)
CountSingle/4M-4 381MB/s ± 7% 9609MB/s ± 4% +2421.88% (p=0.000 n=10+10)
CountSingle/64M-4 346MB/s ± 7% 7924MB/s ± 2% +2188.78% (p=0.000 n=10+10)
* On a machine with AVX2
name old time/op new time/op delta
CountSingle/10-8 37.1ns ± 3% 8.2ns ± 1% -77.80% (p=0.000 n=10+10)
CountSingle/32-8 66.1ns ± 3% 9.8ns ± 2% -85.23% (p=0.000 n=10+10)
CountSingle/4K-8 7.36µs ± 3% 0.11µs ± 1% -98.54% (p=0.000 n=10+10)
CountSingle/4M-8 7.46ms ± 2% 0.15ms ± 2% -97.95% (p=0.000 n=10+9)
CountSingle/64M-8 124ms ± 2% 6ms ± 4% -95.09% (p=0.000 n=10+10)
name old speed new speed delta
CountSingle/10-8 269MB/s ± 3% 1213MB/s ± 1% +350.32% (p=0.000 n=10+10)
CountSingle/32-8 484MB/s ± 4% 3277MB/s ± 2% +576.66% (p=0.000 n=10+10)
CountSingle/4K-8 556MB/s ± 3% 37933MB/s ± 1% +6718.36% (p=0.000 n=10+10)
CountSingle/4M-8 562MB/s ± 2% 27444MB/s ± 3% +4783.43% (p=0.000 n=10+9)
CountSingle/64M-8 543MB/s ± 2% 11054MB/s ± 3% +1935.81% (p=0.000 n=10+10)
Fixes #19411
Change-Id: Ieaf20b1fabccabe767c55c66e242e86f3617f883
Reviewed-on: https://go-review.googlesource.com/38258
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Continuation of CL 20111.
Change-Id: Ie2f62237e6ec316989c021de9b267cc9d6ee6676
Reviewed-on: https://go-review.googlesource.com/32830
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Use vector instructions to speed up indexing operations for short
strings (64 bytes or less).
bytes_s390x.go and strings_s390x.go are based on their amd64
equivalents in CL 31690.
bytes package:
name old time/op new time/op delta
Index/10 40.3ns ± 7% 11.3ns ± 4% -72.06% (p=0.000 n=10+10)
Index/32 196ns ± 1% 27ns ± 2% -86.25% (p=0.000 n=10+10)
Index/4K 28.9µs ± 1% 1.5µs ± 2% -94.94% (p=0.000 n=9+9)
Index/4M 30.1ms ± 2% 1.5ms ± 3% -94.94% (p=0.000 n=10+10)
Index/64M 549ms ±13% 28ms ± 3% -94.87% (p=0.000 n=10+9)
IndexEasy/10 18.8ns ±11% 11.5ns ± 2% -38.81% (p=0.000 n=10+10)
IndexEasy/32 23.6ns ± 6% 28.1ns ± 3% +19.29% (p=0.000 n=10+10)
IndexEasy/4K 251ns ± 5% 223ns ± 8% -11.04% (p=0.000 n=10+10)
IndexEasy/4M 318µs ± 9% 266µs ± 8% -16.42% (p=0.000 n=10+10)
IndexEasy/64M 14.7ms ±16% 13.2ms ±11% -10.22% (p=0.001 n=10+10)
strings package:
name old time/op new time/op delta
IndexRune 88.1ns ±16% 28.9ns ± 4% -67.20% (p=0.000 n=10+10)
IndexRuneLongString 456ns ± 7% 34ns ± 3% -92.50% (p=0.000 n=10+10)
IndexRuneFastPath 12.9ns ±14% 11.1ns ± 6% -13.84% (p=0.000 n=10+10)
Index 13.0ns ± 7% 11.3ns ± 4% -13.31% (p=0.000 n=10+10)
IndexHard1 3.38ms ± 9% 0.07ms ± 1% -97.79% (p=0.000 n=10+10)
IndexHard2 3.58ms ± 7% 0.37ms ± 2% -89.78% (p=0.000 n=10+10)
IndexHard3 3.47ms ± 7% 0.75ms ± 1% -78.52% (p=0.000 n=10+10)
IndexHard4 3.56ms ± 6% 1.34ms ± 0% -62.39% (p=0.000 n=9+9)
Change-Id: If36c2afb8c02e80fcaa1cf5ec2abb0a2be08c7d1
Reviewed-on: https://go-review.googlesource.com/32447
Run-TryBot: Michael Munday <munday@ca.ibm.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
name old time/op new time/op delta
IndexByte32-48 9.05ns ± 7% 9.59ns ±11% +5.93% (p=0.001 n=19+20)
IndexByte4K-48 118ns ± 4% 122ns ± 8% +3.52% (p=0.002 n=19+19)
IndexByte4M-48 172µs ±13% 188µs ±12% +9.49% (p=0.000 n=20+20)
IndexByte64M-48 8.00ms ±14% 8.05ms ±23% ~ (p=0.799 n=20+20)
IndexBytePortable32-48 41.7ns ±15% 42.5ns ±12% ~ (p=0.372 n=20+20)
IndexBytePortable4K-48 3.08µs ±16% 3.26µs ±10% +5.77% (p=0.018 n=20+20)
IndexBytePortable4M-48 3.12ms ±17% 3.20ms ±10% ~ (p=0.157 n=20+20)
IndexBytePortable64M-48 54.0ms ±14% 55.3ms ±14% ~ (p=0.640 n=20+20)
Index32-48 230ns ±12% 46ns ± 6% -79.87% (p=0.000 n=20+19)
Index4K-48 43.2µs ± 9% 3.2µs ±12% -92.58% (p=0.000 n=19+20)
Index4M-48 44.4ms ± 7% 3.3ms ±13% -92.59% (p=0.000 n=19+20)
Index64M-48 714ms ±10% 56ms ± 8% -92.22% (p=0.000 n=19+19)
IndexEasy32-48 52.7ns ±10% 31.0ns ±11% -41.21% (p=0.000 n=20+20)
IndexEasy4K-48 139ns ± 5% 1598ns ± 6% +1046.37% (p=0.000 n=19+19)
IndexEasy4M-48 179µs ± 8% 1674µs ±10% +834.31% (p=0.000 n=19+20)
IndexEasy64M-48 8.56ms ±10% 27.82ms ±16% +225.14% (p=0.000 n=19+20)
name old speed new speed delta
IndexByte32-48 3.52GB/s ± 7% 3.35GB/s ±11% -4.99% (p=0.001 n=20+20)
IndexByte4K-48 34.5GB/s ± 7% 33.2GB/s ±10% -3.67% (p=0.002 n=20+20)
IndexByte4M-48 24.6GB/s ±14% 22.4GB/s ±14% -8.73% (p=0.000 n=20+20)
IndexByte64M-48 8.42GB/s ±16% 8.42GB/s ±19% ~ (p=0.799 n=20+20)
IndexBytePortable32-48 770MB/s ±13% 756MB/s ±11% ~ (p=0.383 n=20+20)
IndexBytePortable4K-48 1.34GB/s ±14% 1.26GB/s ±10% -5.76% (p=0.018 n=20+20)
IndexBytePortable4M-48 1.35GB/s ±15% 1.31GB/s ±11% ~ (p=0.157 n=20+20)
IndexBytePortable64M-48 1.25GB/s ±16% 1.22GB/s ±13% ~ (p=0.640 n=20+20)
Index32-48 138MB/s ± 8% 687MB/s ± 8% +398.57% (p=0.000 n=19+20)
Index4K-48 94.9MB/s ± 9% 1280.5MB/s ±11% +1249.11% (p=0.000 n=19+20)
Index4M-48 94.6MB/s ± 7% 1278.5MB/s ±12% +1250.99% (p=0.000 n=19+20)
Index64M-48 94.2MB/s ±10% 1210.9MB/s ± 8% +1185.04% (p=0.000 n=19+19)
IndexEasy32-48 608MB/s ±10% 1035MB/s ±10% +70.15% (p=0.000 n=20+20)
IndexEasy4K-48 29.3GB/s ± 6% 2.6GB/s ± 6% -91.24% (p=0.000 n=19+19)
IndexEasy4M-48 23.3GB/s ±10% 2.5GB/s ± 9% -89.23% (p=0.000 n=20+20)
IndexEasy64M-48 7.86GB/s ±11% 2.42GB/s ±14% -69.18% (p=0.000 n=19+20)
Change-Id: Ia191f0a6ca80e113397d9ed98d25f195768b65bc
Reviewed-on: https://go-review.googlesource.com/22550
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|