aboutsummaryrefslogtreecommitdiff
path: root/src/bytes/bytes_amd64.go
AgeCommit message (Collapse)Author
2018-03-04internal/bytealg: move short string Index implementations into bytealgKeith Randall
Also move the arm64 CountByte implementation while we're here. Fixes #19792 Change-Id: I1e0fdf1e03e3135af84150a2703b58dad1b0d57e Reviewed-on: https://go-review.googlesource.com/98518 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-04internal/bytealg: move Count to bytealgKeith Randall
Move bytes.Count and strings.Count to bytealg. Update #19792 Change-Id: I3e4e14b504a0b71758885bb131e5656e342cf8cb Reviewed-on: https://go-review.googlesource.com/98495 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-11-15bytes,strings: in generic Index, use mix of IndexByte and Rabin-KarpKeith Randall
Use IndexByte first, as it allows us to skip lots of bytes quickly. If IndexByte is generating a lot of false positives, switch over to Rabin-Karp. Experiments for ppc64le bytes: name old time/op new time/op delta IndexPeriodic/IndexPeriodic2-2 1.12ms ± 0% 0.18ms ± 0% -83.54% (p=0.000 n=10+9) IndexPeriodic/IndexPeriodic4-2 635µs ± 0% 184µs ± 0% -71.06% (p=0.000 n=9+9) IndexPeriodic/IndexPeriodic8-2 289µs ± 0% 184µs ± 0% -36.51% (p=0.000 n=10+9) IndexPeriodic/IndexPeriodic16-2 133µs ± 0% 183µs ± 0% +37.68% (p=0.000 n=10+9) IndexPeriodic/IndexPeriodic32-2 68.3µs ± 0% 70.2µs ± 0% +2.76% (p=0.000 n=10+10) IndexPeriodic/IndexPeriodic64-2 35.8µs ± 0% 36.6µs ± 0% +2.17% (p=0.000 n=8+10) strings: name old time/op new time/op delta IndexPeriodic/IndexPeriodic2-2 184µs ± 0% 184µs ± 0% +0.11% (p=0.029 n=4+4) IndexPeriodic/IndexPeriodic4-2 184µs ± 0% 184µs ± 0% ~ (p=0.886 n=4+4) IndexPeriodic/IndexPeriodic8-2 184µs ± 0% 184µs ± 0% ~ (p=0.486 n=4+4) IndexPeriodic/IndexPeriodic16-2 185µs ± 1% 184µs ± 0% ~ (p=0.343 n=4+4) IndexPeriodic/IndexPeriodic32-2 184µs ± 0% 69µs ± 0% -62.37% (p=0.029 n=4+4) IndexPeriodic/IndexPeriodic64-2 184µs ± 0% 37µs ± 0% -80.17% (p=0.029 n=4+4) Fixes #22578 Change-Id: If2a4d8554cb96bfd699b58149d13ac294615f8b8 Reviewed-on: https://go-review.googlesource.com/76070 Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>
2017-10-02bytes: explicitly state if a function expects UTF-8-encoded dataTim Cooper
Fixes #21950 Change-Id: I6fa392abd2c3bf6a4f80f14c6b1419470e9a944d Reviewed-on: https://go-review.googlesource.com/66750 Run-TryBot: Rob Pike <r@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rob Pike <r@golang.org>
2017-05-10internal/cpu: new package to detect cpu featuresMartin Möhrmann
Implements detection of x86 cpu features that are used in the go standard library. Changes all standard library packages to use the new cpu package instead of using runtime internal variables to check x86 cpu features. Updates: #15403 Change-Id: I2999a10cb4d9ec4863ffbed72f4e021a1dbc4bb9 Reviewed-on: https://go-review.googlesource.com/41476 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-07strings: optimize Count for amd64Josselin Costanzi
Move optimized Count implementation from bytes to runtime. Use in both bytes and strings packages. Add CountByte benchmark to strings. Strings benchmarks: name old time/op new time/op delta CountHard1-4 226µs ± 1% 226µs ± 2% ~ (p=0.247 n=10+10) CountHard2-4 316µs ± 1% 315µs ± 0% ~ (p=0.133 n=9+10) CountHard3-4 919µs ± 1% 920µs ± 1% ~ (p=0.968 n=10+9) CountTorture-4 15.4µs ± 1% 15.7µs ± 1% +2.47% (p=0.000 n=10+9) CountTortureOverlapping-4 9.60ms ± 0% 9.65ms ± 1% ~ (p=0.247 n=10+10) CountByte/10-4 26.3ns ± 1% 10.9ns ± 1% -58.71% (p=0.000 n=9+9) CountByte/32-4 42.7ns ± 0% 14.2ns ± 0% -66.64% (p=0.000 n=10+10) CountByte/4096-4 3.07µs ± 0% 0.31µs ± 2% -89.99% (p=0.000 n=9+10) CountByte/4194304-4 3.48ms ± 1% 0.34ms ± 1% -90.09% (p=0.000 n=10+9) CountByte/67108864-4 55.6ms ± 1% 7.0ms ± 0% -87.49% (p=0.000 n=9+8) name old speed new speed delta CountByte/10-4 380MB/s ± 1% 919MB/s ± 1% +142.21% (p=0.000 n=9+9) CountByte/32-4 750MB/s ± 0% 2247MB/s ± 0% +199.62% (p=0.000 n=10+10) CountByte/4096-4 1.33GB/s ± 0% 13.32GB/s ± 2% +898.13% (p=0.000 n=9+10) CountByte/4194304-4 1.21GB/s ± 1% 12.17GB/s ± 1% +908.87% (p=0.000 n=10+9) CountByte/67108864-4 1.21GB/s ± 1% 9.65GB/s ± 0% +699.29% (p=0.000 n=9+8) Fixes #19411 Change-Id: I8d2d409f0fa6df6d03b60790aa86e540b4a4e3b0 Reviewed-on: https://go-review.googlesource.com/38693 Reviewed-by: Keith Randall <khr@golang.org>
2017-03-22bytes: fix typo in commentJosselin Costanzi
Change-Id: Ia739337dc9961422982912cc6a669022559fb991 Reviewed-on: https://go-review.googlesource.com/38365 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-21bytes: add optimized countByte for amd64Josselin Costanzi
Use SSE/AVX2 when counting a single byte. Inspired from runtime indexbyte implementation. Benchmark against previous implementation, where 1 byte in every 8 is the one we are looking for: * On a machine without AVX2 name old time/op new time/op delta CountSingle/10-4 61.8ns ±10% 15.6ns ±11% -74.83% (p=0.000 n=10+10) CountSingle/32-4 100ns ± 4% 17ns ±10% -82.54% (p=0.000 n=10+9) CountSingle/4K-4 9.66µs ± 3% 0.37µs ± 6% -96.21% (p=0.000 n=10+10) CountSingle/4M-4 11.0ms ± 6% 0.4ms ± 4% -96.04% (p=0.000 n=10+10) CountSingle/64M-4 194ms ± 8% 8ms ± 2% -95.64% (p=0.000 n=10+10) name old speed new speed delta CountSingle/10-4 162MB/s ±10% 645MB/s ±10% +297.00% (p=0.000 n=10+10) CountSingle/32-4 321MB/s ± 5% 1844MB/s ± 9% +474.79% (p=0.000 n=10+9) CountSingle/4K-4 424MB/s ± 3% 11169MB/s ± 6% +2533.10% (p=0.000 n=10+10) CountSingle/4M-4 381MB/s ± 7% 9609MB/s ± 4% +2421.88% (p=0.000 n=10+10) CountSingle/64M-4 346MB/s ± 7% 7924MB/s ± 2% +2188.78% (p=0.000 n=10+10) * On a machine with AVX2 name old time/op new time/op delta CountSingle/10-8 37.1ns ± 3% 8.2ns ± 1% -77.80% (p=0.000 n=10+10) CountSingle/32-8 66.1ns ± 3% 9.8ns ± 2% -85.23% (p=0.000 n=10+10) CountSingle/4K-8 7.36µs ± 3% 0.11µs ± 1% -98.54% (p=0.000 n=10+10) CountSingle/4M-8 7.46ms ± 2% 0.15ms ± 2% -97.95% (p=0.000 n=10+9) CountSingle/64M-8 124ms ± 2% 6ms ± 4% -95.09% (p=0.000 n=10+10) name old speed new speed delta CountSingle/10-8 269MB/s ± 3% 1213MB/s ± 1% +350.32% (p=0.000 n=10+10) CountSingle/32-8 484MB/s ± 4% 3277MB/s ± 2% +576.66% (p=0.000 n=10+10) CountSingle/4K-8 556MB/s ± 3% 37933MB/s ± 1% +6718.36% (p=0.000 n=10+10) CountSingle/4M-8 562MB/s ± 2% 27444MB/s ± 3% +4783.43% (p=0.000 n=10+9) CountSingle/64M-8 543MB/s ± 2% 11054MB/s ± 3% +1935.81% (p=0.000 n=10+10) Fixes #19411 Change-Id: Ieaf20b1fabccabe767c55c66e242e86f3617f883 Reviewed-on: https://go-review.googlesource.com/38258 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-11-04all: make copyright headers consistent with one space after periodMichael Munday
Continuation of CL 20111. Change-Id: Ie2f62237e6ec316989c021de9b267cc9d6ee6676 Reviewed-on: https://go-review.googlesource.com/32830 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-11-01bytes,strings: use IndexByte more often in Index on AMD64Ilya Tocar
IndexByte+compare is faster than indexShortStr in good case, when first byte is rare, but is more costly in bad cases. Start with IndexByte and switch to indexShortStr if we encounter false positives more often than once per 8 bytes. Benchmark changes for package bytes: IndexRune/4K-8 416ns ± 0% 86ns ± 0% -79.24% (p=0.000 n=10+10) IndexRune/4M-8 413µs ± 0% 100µs ± 1% -75.88% (p=0.000 n=10+10) IndexRune/64M-8 6.73ms ± 0% 2.86ms ± 1% -57.49% (p=0.000 n=10+10) Index/10-8 8.45ns ± 0% 8.96ns ± 0% +6.04% (p=0.000 n=9+10) Index/32-8 9.64ns ± 0% 9.51ns ± 0% -1.30% (p=0.000 n=8+9) Index/4K-8 2.11µs ± 0% 2.12µs ± 0% +0.26% (p=0.000 n=10+10) Index/4M-8 3.60ms ± 5% 3.59ms ± 7% ~ (p=0.497 n=9+10) Index/64M-8 57.1ms ± 3% 58.7ms ± 5% ~ (p=0.113 n=9+10) IndexEasy/10-8 7.10ns ± 1% 7.71ns ± 1% +8.60% (p=0.000 n=10+10) IndexEasy/32-8 9.29ns ± 1% 9.22ns ± 0% -0.75% (p=0.000 n=9+10) IndexEasy/4K-8 1.06µs ± 0% 0.08µs ± 0% -92.18% (p=0.000 n=10+10) IndexEasy/4M-8 1.07ms ± 0% 0.10ms ± 1% -90.74% (p=0.000 n=9+10) IndexEasy/64M-8 17.3ms ± 0% 2.8ms ± 1% -83.76% (p=0.000 n=10+9) IndexRune/4K-8 9.84GB/s ± 0% 47.42GB/s ± 0% +381.85% (p=0.000 n=8+10) IndexRune/4M-8 10.1GB/s ± 0% 42.1GB/s ± 1% +314.56% (p=0.000 n=10+10) IndexRune/64M-8 10.0GB/s ± 0% 23.4GB/s ± 1% +135.25% (p=0.000 n=10+10) Index/10-8 1.18GB/s ± 0% 1.12GB/s ± 0% -5.67% (p=0.000 n=10+9) Index/32-8 3.32GB/s ± 0% 3.36GB/s ± 0% +1.27% (p=0.000 n=10+9) Index/4K-8 1.94GB/s ± 0% 1.93GB/s ± 0% -0.25% (p=0.000 n=10+9) Index/4M-8 1.17GB/s ± 5% 1.17GB/s ± 7% ~ (p=0.497 n=9+10) Index/64M-8 1.17GB/s ± 3% 1.15GB/s ± 6% ~ (p=0.113 n=9+10) IndexEasy/10-8 1.41GB/s ± 1% 1.30GB/s ± 1% -7.90% (p=0.000 n=10+10) IndexEasy/32-8 3.45GB/s ± 1% 3.47GB/s ± 0% +0.73% (p=0.000 n=9+10) IndexEasy/4K-8 3.84GB/s ± 0% 49.16GB/s ± 0% +1178.78% (p=0.000 n=9+10) IndexEasy/4M-8 3.91GB/s ± 0% 42.19GB/s ± 1% +980.37% (p=0.000 n=9+10) IndexEasy/64M-8 3.88GB/s ± 0% 23.91GB/s ± 1% +515.76% (p=0.000 n=10+9) No significant changes in strings. In regexp I see: Match/Easy0/32-8 536MB/s ± 1% 540MB/s ± 1% +0.75% (p=0.001 n=9+10) Match/Easy0/1K-8 1.62GB/s ± 0% 4.42GB/s ± 1% +172.48% (p=0.000 n=10+10) Match/Easy0/32K-8 1.87GB/s ± 0% 9.07GB/s ± 1% +384.24% (p=0.000 n=7+10) Match/Easy0/1M-8 1.90GB/s ± 0% 4.83GB/s ± 0% +154.56% (p=0.000 n=8+10) Match/Easy0/32M-8 1.90GB/s ± 0% 4.53GB/s ± 0% +138.62% (p=0.000 n=7+10) Compared to in 1.7: Match/Easy0/32-8 59.5ns ± 0% 59.2ns ± 1% -0.45% (p=0.008 n=9+10) Match/Easy0/1K-8 226ns ± 1% 231ns ± 1% +2.30% (p=0.000 n=10+10) Match/Easy0/32K-8 3.73µs ± 2% 3.61µs ± 1% -3.12% (p=0.000 n=10+10) Match/Easy0/1M-8 206µs ± 1% 217µs ± 0% +5.34% (p=0.000 n=10+10) Match/Easy0/32M-8 7.03ms ± 1% 7.40ms ± 0% +5.23% (p=0.000 n=10+10) Fixes #17456 Change-Id: I38b2fabcaed7119cc4bf37007ba7bfe7504c8f9f Reviewed-on: https://go-review.googlesource.com/31690 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-09-07strings: use AVX2 for Index if availableIlya Tocar
IndexHard4-4 1.50ms ± 2% 0.71ms ± 0% -52.36% (p=0.000 n=20+19) This also fixes a bug, that caused a string of length 16 to use two 8-byte comparisons instead of one 16-byte. And adds a test for cases when partial_match fails. Change-Id: I1ee8fc4e068bb36c95c45de78f067c822c0d9df0 Reviewed-on: https://go-review.googlesource.com/22551 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-09-06bytes: make IndexRune fasterHiroshi Ioka
re-implement IndexRune by IndexByte and Index which are well optimized to get performance gain. name old time/op new time/op delta IndexRune/10-4 53.2ns ± 1% 29.1ns ± 1% -45.32% (p=0.008 n=5+5) IndexRune/32-4 191ns ± 1% 27ns ± 1% -85.75% (p=0.008 n=5+5) IndexRune/4K-4 23.5µs ± 1% 1.0µs ± 1% -95.77% (p=0.008 n=5+5) IndexRune/4M-4 23.8ms ± 0% 1.0ms ± 2% -95.90% (p=0.008 n=5+5) IndexRune/64M-4 384ms ± 1% 15ms ± 1% -95.98% (p=0.008 n=5+5) IndexRuneASCII/10-4 61.5ns ± 0% 10.3ns ± 4% -83.17% (p=0.008 n=5+5) IndexRuneASCII/32-4 203ns ± 0% 11ns ± 5% -94.68% (p=0.008 n=5+5) IndexRuneASCII/4K-4 23.4µs ± 0% 0.3µs ± 2% -98.60% (p=0.008 n=5+5) IndexRuneASCII/4M-4 24.0ms ± 1% 0.3ms ± 1% -98.60% (p=0.008 n=5+5) IndexRuneASCII/64M-4 386ms ± 2% 6ms ± 1% -98.57% (p=0.008 n=5+5) name old speed new speed delta IndexRune/10-4 188MB/s ± 1% 344MB/s ± 1% +82.91% (p=0.008 n=5+5) IndexRune/32-4 167MB/s ± 0% 1175MB/s ± 1% +603.52% (p=0.008 n=5+5) IndexRune/4K-4 174MB/s ± 1% 4117MB/s ± 1% +2262.71% (p=0.008 n=5+5) IndexRune/4M-4 176MB/s ± 0% 4299MB/s ± 2% +2340.46% (p=0.008 n=5+5) IndexRune/64M-4 175MB/s ± 1% 4354MB/s ± 1% +2388.57% (p=0.008 n=5+5) IndexRuneASCII/10-4 163MB/s ± 0% 968MB/s ± 4% +494.66% (p=0.008 n=5+5) IndexRuneASCII/32-4 157MB/s ± 0% 2974MB/s ± 4% +1788.59% (p=0.008 n=5+5) IndexRuneASCII/4K-4 175MB/s ± 0% 12481MB/s ± 2% +7027.71% (p=0.008 n=5+5) IndexRuneASCII/4M-4 175MB/s ± 1% 12510MB/s ± 1% +7061.15% (p=0.008 n=5+5) IndexRuneASCII/64M-4 174MB/s ± 2% 12143MB/s ± 1% +6881.70% (p=0.008 n=5+5) Change-Id: I0632eadb83937c2a9daa7f0ce79df1dee64f992e Reviewed-on: https://go-review.googlesource.com/28537 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-01bytes: Use the same algorithm as strings for IndexIlya Tocar
name old time/op new time/op delta IndexByte32-48 9.05ns ± 7% 9.59ns ±11% +5.93% (p=0.001 n=19+20) IndexByte4K-48 118ns ± 4% 122ns ± 8% +3.52% (p=0.002 n=19+19) IndexByte4M-48 172µs ±13% 188µs ±12% +9.49% (p=0.000 n=20+20) IndexByte64M-48 8.00ms ±14% 8.05ms ±23% ~ (p=0.799 n=20+20) IndexBytePortable32-48 41.7ns ±15% 42.5ns ±12% ~ (p=0.372 n=20+20) IndexBytePortable4K-48 3.08µs ±16% 3.26µs ±10% +5.77% (p=0.018 n=20+20) IndexBytePortable4M-48 3.12ms ±17% 3.20ms ±10% ~ (p=0.157 n=20+20) IndexBytePortable64M-48 54.0ms ±14% 55.3ms ±14% ~ (p=0.640 n=20+20) Index32-48 230ns ±12% 46ns ± 6% -79.87% (p=0.000 n=20+19) Index4K-48 43.2µs ± 9% 3.2µs ±12% -92.58% (p=0.000 n=19+20) Index4M-48 44.4ms ± 7% 3.3ms ±13% -92.59% (p=0.000 n=19+20) Index64M-48 714ms ±10% 56ms ± 8% -92.22% (p=0.000 n=19+19) IndexEasy32-48 52.7ns ±10% 31.0ns ±11% -41.21% (p=0.000 n=20+20) IndexEasy4K-48 139ns ± 5% 1598ns ± 6% +1046.37% (p=0.000 n=19+19) IndexEasy4M-48 179µs ± 8% 1674µs ±10% +834.31% (p=0.000 n=19+20) IndexEasy64M-48 8.56ms ±10% 27.82ms ±16% +225.14% (p=0.000 n=19+20) name old speed new speed delta IndexByte32-48 3.52GB/s ± 7% 3.35GB/s ±11% -4.99% (p=0.001 n=20+20) IndexByte4K-48 34.5GB/s ± 7% 33.2GB/s ±10% -3.67% (p=0.002 n=20+20) IndexByte4M-48 24.6GB/s ±14% 22.4GB/s ±14% -8.73% (p=0.000 n=20+20) IndexByte64M-48 8.42GB/s ±16% 8.42GB/s ±19% ~ (p=0.799 n=20+20) IndexBytePortable32-48 770MB/s ±13% 756MB/s ±11% ~ (p=0.383 n=20+20) IndexBytePortable4K-48 1.34GB/s ±14% 1.26GB/s ±10% -5.76% (p=0.018 n=20+20) IndexBytePortable4M-48 1.35GB/s ±15% 1.31GB/s ±11% ~ (p=0.157 n=20+20) IndexBytePortable64M-48 1.25GB/s ±16% 1.22GB/s ±13% ~ (p=0.640 n=20+20) Index32-48 138MB/s ± 8% 687MB/s ± 8% +398.57% (p=0.000 n=19+20) Index4K-48 94.9MB/s ± 9% 1280.5MB/s ±11% +1249.11% (p=0.000 n=19+20) Index4M-48 94.6MB/s ± 7% 1278.5MB/s ±12% +1250.99% (p=0.000 n=19+20) Index64M-48 94.2MB/s ±10% 1210.9MB/s ± 8% +1185.04% (p=0.000 n=19+19) IndexEasy32-48 608MB/s ±10% 1035MB/s ±10% +70.15% (p=0.000 n=20+20) IndexEasy4K-48 29.3GB/s ± 6% 2.6GB/s ± 6% -91.24% (p=0.000 n=19+19) IndexEasy4M-48 23.3GB/s ±10% 2.5GB/s ± 9% -89.23% (p=0.000 n=20+20) IndexEasy64M-48 7.86GB/s ±11% 2.42GB/s ±14% -69.18% (p=0.000 n=19+20) Change-Id: Ia191f0a6ca80e113397d9ed98d25f195768b65bc Reviewed-on: https://go-review.googlesource.com/22550 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>