path: root/src/internal/bytealg
Age  Commit message  Author
2023-02-11  internal/bytealg: simplify and improve compare on riscv64  Joel Sing
Remove some unnecessary loops and pull the comparison code out from the compare/loop code. Add an unaligned 8 byte comparison, which reads 8 bytes from each input before comparing them. This gives a reasonable gain in performance for the large unaligned case.

Updates #50615

name                                 old time/op    new time/op    delta
CompareBytesEqual-4                  116ns ± 0%     111ns ± 0%     -4.10%  (p=0.000 n=5+5)
CompareBytesToNil-4                  34.9ns ± 0%    35.0ns ± 0%    +0.45%  (p=0.002 n=5+5)
CompareBytesEmpty-4                  29.6ns ± 1%    29.8ns ± 0%    +0.71%  (p=0.016 n=5+5)
CompareBytesIdentical-4              29.8ns ± 0%    29.9ns ± 1%    +0.50%  (p=0.036 n=5+5)
CompareBytesSameLength-4             66.1ns ± 0%    60.4ns ± 0%    -8.59%  (p=0.000 n=5+5)
CompareBytesDifferentLength-4        63.1ns ± 0%    60.5ns ± 0%    -4.20%  (p=0.000 n=5+5)
CompareBytesBigUnaligned/offset=1-4  6.84ms ± 3%    6.04ms ± 5%    -11.70%  (p=0.001 n=5+5)
CompareBytesBigUnaligned/offset=2-4  6.99ms ± 4%    5.93ms ± 6%    -15.22%  (p=0.000 n=5+5)
CompareBytesBigUnaligned/offset=3-4  6.74ms ± 1%    6.00ms ± 5%    -10.94%  (p=0.001 n=5+5)
CompareBytesBigUnaligned/offset=4-4  7.20ms ± 6%    5.97ms ± 6%    -17.05%  (p=0.000 n=5+5)
CompareBytesBigUnaligned/offset=5-4  6.75ms ± 1%    5.81ms ± 8%    -13.93%  (p=0.001 n=5+5)
CompareBytesBigUnaligned/offset=6-4  6.89ms ± 5%    5.75ms ± 2%    -16.58%  (p=0.000 n=5+4)
CompareBytesBigUnaligned/offset=7-4  6.91ms ± 6%    6.13ms ± 6%    -11.27%  (p=0.001 n=5+5)
CompareBytesBig-4                    2.75ms ± 5%    2.71ms ± 8%    ~  (p=0.651 n=5+5)
CompareBytesBigIdentical-4           29.9ns ± 1%    29.8ns ± 0%    ~  (p=0.751 n=5+5)

name                                 old speed      new speed      delta
CompareBytesBigUnaligned/offset=1-4  153MB/s ± 3%   174MB/s ± 6%   +13.40%  (p=0.003 n=5+5)
CompareBytesBigUnaligned/offset=2-4  150MB/s ± 4%   177MB/s ± 6%   +18.06%  (p=0.001 n=5+5)
CompareBytesBigUnaligned/offset=3-4  156MB/s ± 1%   175MB/s ± 5%   +12.39%  (p=0.002 n=5+5)
CompareBytesBigUnaligned/offset=4-4  146MB/s ± 6%   176MB/s ± 6%   +20.67%  (p=0.001 n=5+5)
CompareBytesBigUnaligned/offset=5-4  155MB/s ± 1%   181MB/s ± 7%   +16.35%  (p=0.002 n=5+5)
CompareBytesBigUnaligned/offset=6-4  152MB/s ± 5%   182MB/s ± 2%   +19.74%  (p=0.000 n=5+4)
CompareBytesBigUnaligned/offset=7-4  152MB/s ± 6%   171MB/s ± 6%   +12.70%  (p=0.001 n=5+5)
CompareBytesBig-4                    382MB/s ± 5%   388MB/s ± 9%   ~  (p=0.616 n=5+5)
CompareBytesBigIdentical-4           35.1TB/s ± 1%  35.1TB/s ± 0%  ~  (p=0.800 n=5+5)

Change-Id: I127edc376e62a2c529719a4ab172f481e0a81357
Reviewed-on: https://go-review.googlesource.com/c/go/+/431100
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meng Zhuo <mzh@golangcn.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joedian Reid <joedian@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
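The idea of comparing 8 bytes from each input per step can be illustrated in portable Go. This is only a sketch under stated assumptions, not the riscv64 assembly from the CL: the name `compare8` is hypothetical, and the big-endian loads from `encoding/binary` are a choice made for this example so that integer order matches lexicographic byte order.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// compare8 is a hypothetical helper (not part of internal/bytealg).
// It returns -1, 0 or +1 for the lexicographic order of a and b,
// examining 8 bytes per step while at least 8 bytes remain in both.
func compare8(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	i := 0
	for ; i+8 <= n; i += 8 {
		// Big-endian loads make the integer order equal to the byte order.
		x := binary.BigEndian.Uint64(a[i:])
		y := binary.BigEndian.Uint64(b[i:])
		if x != y {
			if x < y {
				return -1
			}
			return 1
		}
	}
	// Tail: fewer than 8 common bytes left, compare one byte at a time.
	for ; i < n; i++ {
		if a[i] != b[i] {
			if a[i] < b[i] {
				return -1
			}
			return 1
		}
	}
	// All common bytes are equal, so the shorter input sorts first.
	switch {
	case len(a) < len(b):
		return -1
	case len(a) > len(b):
		return 1
	}
	return 0
}

func main() {
	fmt.Println(compare8([]byte("abcdefgh1"), []byte("abcdefgh2")))
	fmt.Println(compare8([]byte("same bytes"), []byte("same bytes")))
}
```

The loads here work at any offset because they are assembled by the library; how the assembly obtains its 8 bytes on hardware that traps on misaligned loads is not shown.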
2022-11-07  internal/bytealg: add PCALIGN to indexbodyp9 function on ppc64x  Archana R
Adding PCALIGN in the indexbodyp9 function shows improvements in some SimonWaldherr benchmarks and one of the index benchmarks on both Power9 and Power10.

name           old time/op    new time/op    delta
Contains       19.8ns ± 0%    15.6ns ± 0%    -21.24%
ContainsNot    21.3ns ± 0%    18.9ns ± 0%    -11.03%
ContainsBytes  19.1ns ± 0%    16.0ns ± 0%    -16.54%
Index/10       17.3ns ± 0%    16.1ns ± 0%    -7.30%
Index/32       59.6ns ± 0%    59.6ns ± 0%    +0.12%
Index/4K       3.68µs ± 0%    3.68µs ± 0%    ~
Index/4M       3.74ms ± 0%    3.74ms ± 0%    -0.00%
Index/64M      59.8ms ± 0%    59.8ms ± 0%    ~

Change-Id: I784e57e0b0f5bac143f57f3a32845219e43d47fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/447595
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-10-31  internal/bytealg: fix bug in index function for ppc64le/power9  Archana R
The index function was not handling certain corner cases where two more bytes remained to be examined at the tail end of the string to complete the comparison. Fix the code to ensure that the correct bytes are examined when the string has to be shifted two more times. Also hoist vsplat to V10 so that all paths use the correct value. Some comments had incorrect register names; these have been corrected. The strings that were failing have been added to the strings test for verification.

Fixes #56457

Change-Id: Idba7cbc802e3d73c8f4fe89309871cc8447792f5
Reviewed-on: https://go-review.googlesource.com/c/go/+/446135
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Archana Ravindar <ravindararchana@gmail.com>
2022-10-26  all: delete riscv64 non-register ABI fallback path  Wayne Zuo
Change-Id: I9e997b59ffb868575b780b9660df1f5ac322b79a Reviewed-on: https://go-review.googlesource.com/c/go/+/443556 Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: David Chase <drchase@google.com>
2022-09-19  internal/bytealg: correct alignment checks for compare/memequal on riscv64  Joel Sing
On riscv64 we need 8 byte alignment for 8 byte loads - the existing check was only ensuring 4 byte alignment, which potentially results in unaligned loads being performed. Unaligned loads incur a significant performance penalty due to the resulting kernel traps and fix ups.

Adjust BenchmarkCompareBytesBigUnaligned so that this issue would have been more readily visible.

Updates #50615

name                                 old time/op    new time/op      delta
CompareBytesBigUnaligned/offset=1-4  6.98ms ± 5%    6.84ms ± 3%      ~  (p=0.319 n=5+5)
CompareBytesBigUnaligned/offset=2-4  6.75ms ± 1%    6.99ms ± 4%      ~  (p=0.063 n=5+5)
CompareBytesBigUnaligned/offset=3-4  6.84ms ± 1%    6.74ms ± 1%      -1.48%  (p=0.003 n=5+5)
CompareBytesBigUnaligned/offset=4-4  146ms ± 1%     7ms ± 6%         -95.08%  (p=0.000 n=5+5)
CompareBytesBigUnaligned/offset=5-4  7.05ms ± 5%    6.75ms ± 1%      ~  (p=0.079 n=5+5)
CompareBytesBigUnaligned/offset=6-4  7.11ms ± 5%    6.89ms ± 5%      ~  (p=0.177 n=5+5)
CompareBytesBigUnaligned/offset=7-4  7.14ms ± 5%    6.91ms ± 6%      ~  (p=0.165 n=5+5)

name                                 old speed      new speed        delta
CompareBytesBigUnaligned/offset=1-4  150MB/s ± 5%   153MB/s ± 3%     ~  (p=0.336 n=5+5)
CompareBytesBigUnaligned/offset=2-4  155MB/s ± 1%   150MB/s ± 4%     ~  (p=0.058 n=5+5)
CompareBytesBigUnaligned/offset=3-4  153MB/s ± 1%   156MB/s ± 1%     +1.51%  (p=0.004 n=5+5)
CompareBytesBigUnaligned/offset=4-4  7.16MB/s ± 1%  145.79MB/s ± 6%  +1936.23%  (p=0.000 n=5+5)
CompareBytesBigUnaligned/offset=5-4  149MB/s ± 5%   155MB/s ± 1%     ~  (p=0.078 n=5+5)
CompareBytesBigUnaligned/offset=6-4  148MB/s ± 5%   152MB/s ± 5%     ~  (p=0.175 n=5+5)
CompareBytesBigUnaligned/offset=7-4  147MB/s ± 5%   152MB/s ± 6%     ~  (p=0.160 n=5+5)

Change-Id: I2c859e061919db482318ce63b85b808aa973a9ba
Reviewed-on: https://go-review.googlesource.com/c/go/+/431099
Reviewed-by: Meng Zhuo <mzh@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
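The difference between a 4 byte and an 8 byte alignment check can be shown with plain address arithmetic. The following is a minimal sketch, not the assembly from the CL: `bothAligned` and the example addresses are hypothetical, and the mask test `(a|b)&(align-1) == 0` is simply the usual way to test two addresses against a power-of-two alignment at once.

```go
package main

import "fmt"

// bothAligned reports whether both addresses are multiples of align.
// align must be a power of two. Hypothetical helper for illustration.
func bothAligned(a, b, align uintptr) bool {
	// OR-ing the addresses lets a single mask test cover both of them.
	return (a|b)&(align-1) == 0
}

func main() {
	// Example addresses: the second buffer starts 4 bytes past an
	// 8 byte boundary, like the offset=4 benchmark case.
	a, b := uintptr(0x1000), uintptr(0x1004)

	fmt.Println(bothAligned(a, b, 4)) // true: passes a 4 byte check
	fmt.Println(bothAligned(a, b, 8)) // false: not safe for 8 byte loads
}
```

An offset of 4 is the only one of the benchmarked offsets 1 to 7 that passes the 4 byte check while failing the 8 byte one, which is consistent with offset=4 being the single case with a large delta in the table above.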
2022-08-18  runtime: remove dead code and unnecessary checks for amd64  vpachkov
Use amd64 assembly header to remove unnecessary cpu flags checks and dead code that is guaranteed to not be executed when compiling for specific microarchitectures. name old time/op new time/op delta BytesCompare/1-12 3.88ns ± 1% 3.18ns ± 1% -18.15% (p=0.008 n=5+5) BytesCompare/2-12 3.89ns ± 1% 3.21ns ± 2% -17.66% (p=0.008 n=5+5) BytesCompare/4-12 3.89ns ± 0% 3.17ns ± 0% -18.62% (p=0.008 n=5+5) BytesCompare/8-12 3.44ns ± 2% 3.39ns ± 1% -1.36% (p=0.008 n=5+5) BytesCompare/16-12 3.40ns ± 1% 3.14ns ± 0% -7.77% (p=0.008 n=5+5) BytesCompare/32-12 3.90ns ± 1% 3.65ns ± 0% -6.19% (p=0.008 n=5+5) BytesCompare/64-12 4.96ns ± 1% 4.71ns ± 2% -4.98% (p=0.008 n=5+5) BytesCompare/128-12 6.42ns ± 0% 5.99ns ± 4% -6.75% (p=0.008 n=5+5) BytesCompare/256-12 9.36ns ± 0% 7.40ns ± 0% -20.97% (p=0.008 n=5+5) BytesCompare/512-12 15.9ns ± 1% 11.4ns ± 1% -28.36% (p=0.008 n=5+5) BytesCompare/1024-12 27.0ns ± 0% 19.3ns ± 0% -28.36% (p=0.008 n=5+5) BytesCompare/2048-12 50.2ns ± 0% 43.3ns ± 0% -13.71% (p=0.008 n=5+5) [Geo mean] 7.13ns 6.07ns -14.86% name old speed new speed delta Count/10-12 723MB/s ± 0% 704MB/s ± 1% -2.73% (p=0.008 n=5+5) Count/32-12 2.21GB/s ± 0% 2.12GB/s ± 2% -4.21% (p=0.008 n=5+5) Count/4K-12 1.03GB/s ± 0% 1.03GB/s ± 1% ~ (p=1.000 n=5+5) Count/4M-12 1.04GB/s ± 0% 1.02GB/s ± 2% ~ (p=0.310 n=5+5) Count/64M-12 1.02GB/s ± 0% 1.01GB/s ± 1% -1.00% (p=0.016 n=5+5) CountEasy/10-12 779MB/s ± 0% 768MB/s ± 1% -1.48% (p=0.008 n=5+5) CountEasy/32-12 2.15GB/s ± 0% 2.09GB/s ± 1% -2.71% (p=0.008 n=5+5) CountEasy/4K-12 45.1GB/s ± 1% 45.2GB/s ± 1% ~ (p=0.421 n=5+5) CountEasy/4M-12 36.4GB/s ± 1% 36.5GB/s ± 1% ~ (p=0.690 n=5+5) CountEasy/64M-12 16.1GB/s ± 2% 16.4GB/s ± 1% ~ (p=0.056 n=5+5) CountSingle/10-12 2.15GB/s ± 2% 2.22GB/s ± 1% +3.37% (p=0.008 n=5+5) CountSingle/32-12 5.86GB/s ± 1% 5.76GB/s ± 1% -1.55% (p=0.008 n=5+5) CountSingle/4K-12 54.6GB/s ± 1% 55.0GB/s ± 1% ~ (p=0.548 n=5+5) CountSingle/4M-12 45.9GB/s ± 4% 46.4GB/s ± 2% ~ (p=0.548 n=5+5) CountSingle/64M-12 17.3GB/s ± 1% 17.2GB/s ± 
2% ~ (p=1.000 n=5+5) [Geo mean] 5.11GB/s 5.08GB/s -0.53% name old speed new speed delta Equal/1-12 200MB/s ± 0% 188MB/s ± 1% -6.11% (p=0.008 n=5+5) Equal/6-12 1.20GB/s ± 0% 1.13GB/s ± 1% -6.38% (p=0.008 n=5+5) Equal/9-12 1.67GB/s ± 3% 1.74GB/s ± 1% +3.83% (p=0.008 n=5+5) Equal/15-12 2.82GB/s ± 1% 2.89GB/s ± 1% +2.63% (p=0.008 n=5+5) Equal/16-12 2.96GB/s ± 1% 3.08GB/s ± 1% +3.95% (p=0.008 n=5+5) Equal/20-12 3.33GB/s ± 1% 3.54GB/s ± 1% +6.36% (p=0.008 n=5+5) Equal/32-12 4.57GB/s ± 0% 5.26GB/s ± 1% +15.09% (p=0.008 n=5+5) Equal/4K-12 62.0GB/s ± 1% 65.9GB/s ± 2% +6.29% (p=0.008 n=5+5) Equal/4M-12 23.6GB/s ± 2% 24.8GB/s ± 4% +5.43% (p=0.008 n=5+5) Equal/64M-12 11.1GB/s ± 2% 11.3GB/s ± 1% +1.69% (p=0.008 n=5+5) [Geo mean] 3.91GB/s 4.03GB/s +3.11% name old speed new speed delta IndexByte/10-12 2.64GB/s ± 0% 2.69GB/s ± 0% +1.67% (p=0.008 n=5+5) IndexByte/32-12 6.79GB/s ± 0% 6.27GB/s ± 0% -7.57% (p=0.008 n=5+5) IndexByte/4K-12 56.2GB/s ± 0% 56.9GB/s ± 0% +1.27% (p=0.008 n=5+5) IndexByte/4M-12 40.1GB/s ± 1% 41.7GB/s ± 1% +4.05% (p=0.008 n=5+5) IndexByte/64M-12 17.5GB/s ± 0% 17.7GB/s ± 1% ~ (p=0.095 n=5+5) IndexBytePortable/10-12 2.06GB/s ± 1% 2.16GB/s ± 1% +5.08% (p=0.008 n=5+5) IndexBytePortable/32-12 1.40GB/s ± 1% 1.54GB/s ± 1% +10.05% (p=0.008 n=5+5) IndexBytePortable/4K-12 3.99GB/s ± 0% 4.08GB/s ± 0% +2.16% (p=0.008 n=5+5) IndexBytePortable/4M-12 4.05GB/s ± 1% 4.08GB/s ± 2% ~ (p=0.095 n=5+5) IndexBytePortable/64M-12 3.80GB/s ± 1% 3.81GB/s ± 0% ~ (p=0.421 n=5+5) IndexRune/10-12 746MB/s ± 1% 752MB/s ± 0% +0.85% (p=0.008 n=5+5) IndexRune/32-12 2.33GB/s ± 0% 2.42GB/s ± 0% +3.66% (p=0.008 n=5+5) IndexRune/4K-12 44.4GB/s ± 0% 44.2GB/s ± 0% ~ (p=0.095 n=5+5) IndexRune/4M-12 36.2GB/s ± 1% 36.3GB/s ± 2% ~ (p=0.841 n=5+5) IndexRune/64M-12 16.2GB/s ± 2% 16.3GB/s ± 2% ~ (p=0.548 n=5+5) IndexRuneASCII/10-12 2.57GB/s ± 0% 2.58GB/s ± 0% +0.63% (p=0.008 n=5+5) IndexRuneASCII/32-12 6.00GB/s ± 0% 6.30GB/s ± 1% +4.98% (p=0.008 n=5+5) IndexRuneASCII/4K-12 56.7GB/s ± 0% 56.8GB/s ± 1% ~ 
(p=0.151 n=5+5) IndexRuneASCII/4M-12 41.6GB/s ± 1% 41.7GB/s ± 2% ~ (p=0.151 n=5+5) IndexRuneASCII/64M-12 17.7GB/s ± 1% 17.6GB/s ± 1% ~ (p=0.222 n=5+5) Index/10-12 1.06GB/s ± 1% 1.06GB/s ± 0% ~ (p=0.310 n=5+5) Index/32-12 3.57GB/s ± 0% 3.56GB/s ± 1% ~ (p=0.056 n=5+5) Index/4K-12 1.02GB/s ± 2% 1.03GB/s ± 0% ~ (p=0.690 n=5+5) Index/4M-12 1.04GB/s ± 0% 1.03GB/s ± 1% ~ (p=1.000 n=4+5) Index/64M-12 1.02GB/s ± 0% 1.02GB/s ± 0% ~ (p=0.905 n=5+4) IndexEasy/10-12 1.12GB/s ± 2% 1.15GB/s ± 1% +3.10% (p=0.008 n=5+5) IndexEasy/32-12 3.14GB/s ± 2% 3.13GB/s ± 1% ~ (p=0.310 n=5+5) IndexEasy/4K-12 47.6GB/s ± 1% 47.7GB/s ± 2% ~ (p=0.310 n=5+5) IndexEasy/4M-12 36.4GB/s ± 1% 36.3GB/s ± 2% ~ (p=0.690 n=5+5) IndexEasy/64M-12 16.1GB/s ± 1% 16.4GB/s ± 5% ~ (p=0.151 n=5+5) [Geo mean] 6.39GB/s 6.46GB/s +1.11% Change-Id: Ic1ca62f5cc719d87e2c4aeff25ad73507facff82 Reviewed-on: https://go-review.googlesource.com/c/go/+/397576 Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2022-05-17  internal/bytealg: support basic byte operation on loong64  Xiaodong Liu
Contributors to the loong64 port are: Weining Lu <luweining@loongson.cn> Lei Wang <wanglei@loongson.cn> Lingqin Gong <gonglingqin@loongson.cn> Xiaolin Zhao <zhaoxiaolin@loongson.cn> Meidan Li <limeidan@loongson.cn> Xiaojuan Zhai <zhaixiaojuan@loongson.cn> Qiyuan Pu <puqiyuan@loongson.cn> Guoqi Chen <chenguoqi@loongson.cn> This port has been updated to Go 1.15.6: https://github.com/loongson/go Updates #46229 Change-Id: I4ac6d38dc632abfa0b698325ca0ae349c0d7ecd3 Reviewed-on: https://go-review.googlesource.com/c/go/+/342316 Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-09  internal/bytealg: optimize index function for ppc64le/power9  Archana R
Optimized index2to16 loop by unrolling the loop by 4. Multiple benchmark tests show performance improvement on POWER9. Similar improvements are seen on POWER10. Added tests to ensure changes work fine.

name        old time/op    new time/op    delta
Index/10    18.3ns ± 0%    19.7ns ±25%    ~
Index/32    75.3ns ± 0%    69.2ns ± 0%    -8.22%
Index/4K    5.53µs ± 0%    3.69µs ± 0%    -33.20%
Index/4M    5.64ms ± 0%    3.75ms ± 0%    -33.55%
Index/64M   92.9ms ± 0%    61.6ms ± 0%    -33.69%
IndexHard2  1.41ms ± 0%    0.93ms ± 0%    -33.75%
CountHard2  1.41ms ± 0%    0.93ms ± 0%    -33.75%

Change-Id: If9331df6a141a4716724b8cb648d2b91bdf17e5f
Reviewed-on: https://go-review.googlesource.com/c/go/+/377016
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Archana Ravindar <aravind5@in.ibm.com>
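Unrolling a search loop by 4 can be sketched in Go for the simplest case, a two byte separator. This is an illustration of the general technique only, not a translation of the POWER9 vector code: `indexPair` is a hypothetical name, and the scalar comparisons stand in for whatever the assembly does per position.

```go
package main

import "fmt"

// indexPair returns the index of the first occurrence of the byte pair
// (c0, c1) in s, or -1 if it is absent. Hypothetical helper.
func indexPair(s []byte, c0, c1 byte) int {
	i := 0
	// Unrolled body: test four start positions per iteration.
	// The last test reads s[i+4], hence the i+4 < len(s) bound.
	for ; i+4 < len(s); i += 4 {
		if s[i] == c0 && s[i+1] == c1 {
			return i
		}
		if s[i+1] == c0 && s[i+2] == c1 {
			return i + 1
		}
		if s[i+2] == c0 && s[i+3] == c1 {
			return i + 2
		}
		if s[i+3] == c0 && s[i+4] == c1 {
			return i + 3
		}
	}
	// Remainder: fewer than four start positions are left.
	for ; i+1 < len(s); i++ {
		if s[i] == c0 && s[i+1] == c1 {
			return i
		}
	}
	return -1
}

func main() {
	fmt.Println(indexPair([]byte("hello world"), 'w', 'o')) // 6
	fmt.Println(indexPair([]byte("hello world"), 'z', 'z')) // -1
}
```

Unrolling reduces the number of loop-control branches per position examined, at the cost of a separate remainder loop for the tail.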
2022-05-03  internal/bytealg: mask high bit for riscv64 regabi  Meng Zhuo
This CL masks byte parameters whose high bits (~0xff) are unused under the riscv64 register ABI. Currently the compiler only guarantees that the low bits contain the value.

Change-Id: I6dd6c867e60d2143fefde92c866f78c4b007a2f7
Reviewed-on: https://go-review.googlesource.com/c/go/+/402894
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: mzh <mzh@golangcn.org>
Reviewed-by: Benny Siegert <bsiegert@gmail.com>
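Why the mask matters can be shown by simulating a 64 bit register in Go. This is a sketch under stated assumptions: `lowByte` is a hypothetical helper, and the junk value in the upper bits is invented for the example; real register contents depend on the caller.

```go
package main

import "fmt"

// lowByte keeps only bits 0-7 of a simulated 64 bit register, which is
// the only part a byte argument is guaranteed to occupy.
func lowByte(reg uint64) uint64 {
	return reg & 0xff
}

func main() {
	// Simulated argument register: 'A' (0x41) in the low byte, with
	// arbitrary leftover bits above it (assumed for illustration).
	reg := uint64(0xdeadbeef00000041)

	// A byte loaded from memory is zero-extended to the full register.
	loaded := uint64('A')

	fmt.Println(reg == loaded)          // false: junk bits break the compare
	fmt.Println(lowByte(reg) == loaded) // true: masked value matches
}
```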
2022-05-02  internal/bytealg: improve PPC64 equal  Paul E. Murphy
Rewrite the vector loop to process 64B per iteration, this greatly improves performance on POWER9/POWER10 for large sizes. Likewise, use a similar tricks for sizes >= 8 and <= 64. And, rewrite small comparisons, it's a little slower for 1 byte, but constant time for 1-7 bytes. Thus, it is increasingly faster for 2-7B. Benchmarks results below are from P8/P9 ppc64le (in that order), several additional testcases have been added to test interesting sizes. Likewise, the old variant was padded to the same code size of the new variant to minimize layout related noise: POWER8/ppc64le/linux: name old speed new speed delta Equal/1 110MB/s ± 0% 106MB/s ± 0% -3.26% Equal/2 202MB/s ± 0% 203MB/s ± 0% +0.18% Equal/3 280MB/s ± 0% 319MB/s ± 0% +13.89% Equal/4 350MB/s ± 0% 414MB/s ± 0% +18.27% Equal/5 412MB/s ± 0% 533MB/s ± 0% +29.19% Equal/6 462MB/s ± 0% 620MB/s ± 0% +34.11% Equal/7 507MB/s ± 0% 745MB/s ± 0% +47.02% Equal/8 913MB/s ± 0% 994MB/s ± 0% +8.84% Equal/9 909MB/s ± 0% 1117MB/s ± 0% +22.85% Equal/10 937MB/s ± 0% 1242MB/s ± 0% +32.59% Equal/11 962MB/s ± 0% 1370MB/s ± 0% +42.37% Equal/12 989MB/s ± 0% 1490MB/s ± 0% +50.60% Equal/13 1.01GB/s ± 0% 1.61GB/s ± 0% +60.27% Equal/14 1.02GB/s ± 0% 1.74GB/s ± 0% +71.22% Equal/15 1.03GB/s ± 0% 1.86GB/s ± 0% +81.45% Equal/16 1.60GB/s ± 0% 1.99GB/s ± 0% +24.21% Equal/17 1.54GB/s ± 0% 2.04GB/s ± 0% +32.28% Equal/20 1.48GB/s ± 0% 2.40GB/s ± 0% +62.64% Equal/32 3.58GB/s ± 0% 3.84GB/s ± 0% +7.18% Equal/63 3.74GB/s ± 0% 7.17GB/s ± 0% +91.79% Equal/64 6.35GB/s ± 0% 7.29GB/s ± 0% +14.75% Equal/65 5.85GB/s ± 0% 7.00GB/s ± 0% +19.66% Equal/127 6.74GB/s ± 0% 13.74GB/s ± 0% +103.77% Equal/128 10.6GB/s ± 0% 12.9GB/s ± 0% +21.98% Equal/129 9.66GB/s ± 0% 11.96GB/s ± 0% +23.85% Equal/191 9.12GB/s ± 0% 17.80GB/s ± 0% +95.26% Equal/192 13.4GB/s ± 0% 17.2GB/s ± 0% +28.66% Equal/4K 29.5GB/s ± 0% 37.3GB/s ± 0% +26.39% Equal/4M 22.6GB/s ± 0% 23.1GB/s ± 0% +2.40% Equal/64M 10.6GB/s ± 0% 11.2GB/s ± 0% +5.83% POWER9/ppc64le/linux: name old speed new speed delta 
Equal/1 122MB/s ± 0% 121MB/s ± 0% -0.94% Equal/2 223MB/s ± 0% 241MB/s ± 0% +8.29% Equal/3 289MB/s ± 0% 362MB/s ± 0% +24.90% Equal/4 366MB/s ± 0% 483MB/s ± 0% +31.82% Equal/5 427MB/s ± 0% 603MB/s ± 0% +41.28% Equal/6 462MB/s ± 0% 723MB/s ± 0% +56.65% Equal/7 509MB/s ± 0% 843MB/s ± 0% +65.57% Equal/8 974MB/s ± 0% 1066MB/s ± 0% +9.46% Equal/9 1.00GB/s ± 0% 1.20GB/s ± 0% +19.53% Equal/10 1.00GB/s ± 0% 1.33GB/s ± 0% +32.81% Equal/11 1.01GB/s ± 0% 1.47GB/s ± 0% +45.28% Equal/12 1.04GB/s ± 0% 1.60GB/s ± 0% +53.46% Equal/13 1.05GB/s ± 0% 1.73GB/s ± 0% +64.67% Equal/14 1.02GB/s ± 0% 1.87GB/s ± 0% +82.93% Equal/15 1.04GB/s ± 0% 2.00GB/s ± 0% +92.07% Equal/16 1.83GB/s ± 0% 2.13GB/s ± 0% +16.58% Equal/17 1.78GB/s ± 0% 2.18GB/s ± 0% +22.65% Equal/20 1.72GB/s ± 0% 2.57GB/s ± 0% +49.24% Equal/32 3.89GB/s ± 0% 4.10GB/s ± 0% +5.53% Equal/63 3.63GB/s ± 0% 7.63GB/s ± 0% +110.45% Equal/64 6.69GB/s ± 0% 7.75GB/s ± 0% +15.84% Equal/65 6.28GB/s ± 0% 7.07GB/s ± 0% +12.46% Equal/127 6.41GB/s ± 0% 13.65GB/s ± 0% +112.95% Equal/128 11.1GB/s ± 0% 14.1GB/s ± 0% +26.56% Equal/129 10.2GB/s ± 0% 11.2GB/s ± 0% +9.44% Equal/191 8.64GB/s ± 0% 16.39GB/s ± 0% +89.75% Equal/192 15.3GB/s ± 0% 17.8GB/s ± 0% +16.31% Equal/4K 24.6GB/s ± 0% 27.8GB/s ± 0% +13.12% Equal/4M 21.1GB/s ± 0% 22.7GB/s ± 0% +7.66% Equal/64M 20.8GB/s ± 0% 22.4GB/s ± 0% +8.06% Change-Id: Ie3c582133d526cc14e8846ef364c44c93eb7b9a2 Reviewed-on: https://go-review.googlesource.com/c/go/+/399976 Reviewed-by: Carlos Amedee <carlos@golang.org> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2022-04-22  internal/bytealg: optimize cmpbody for ppc64le/ppc64  Archana R
Vectorize the cmpbody loop for bytes of size greater than or equal to 32 on both POWER8 (LE and BE) and POWER9 (LE and BE) and improve performance of smaller size compares.

Performance improves for most sizes with this change on POWER8, POWER9 and POWER10. For the very small sizes (up to 8) the overhead of the function call starts to impact performance.

POWER9:
name               old time/op    new time/op    delta
BytesCompare/1     4.60ns ± 0%    5.49ns ± 0%    +19.27%
BytesCompare/2     4.68ns ± 0%    5.46ns ± 0%    +16.71%
BytesCompare/4     6.58ns ± 0%    5.49ns ± 0%    -16.58%
BytesCompare/8     4.89ns ± 0%    5.46ns ± 0%    +11.64%
BytesCompare/16    5.21ns ± 0%    4.96ns ± 0%    -4.70%
BytesCompare/32    5.09ns ± 0%    4.98ns ± 0%    -2.14%
BytesCompare/64    6.40ns ± 0%    5.96ns ± 0%    -6.84%
BytesCompare/128   11.3ns ± 0%    8.1ns ± 0%     -28.09%
BytesCompare/256   15.1ns ± 0%    12.8ns ± 0%    -15.16%
BytesCompare/512   26.5ns ± 0%    23.3ns ± 5%    -12.03%
BytesCompare/1024  50.2ns ± 0%    41.6ns ± 2%    -17.01%
BytesCompare/2048  99.3ns ± 0%    86.5ns ± 0%    -12.88%

Change-Id: I24f93b2910591e6829ddd8509aa6eeaa6355c609
Reviewed-on: https://go-review.googlesource.com/c/go/+/362797
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Archana Ravindar <aravind5@in.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
2022-04-22  internal/bytealg: port bytealg functions to reg ABI on riscv64  Meng Zhuo
This CL adds support for the reg ABI to the bytes functions for riscv64. These are initially under control of the GOEXPERIMENT macro until all changes are in. Change-Id: I026295ae38e2aba055f7a51c77f92c1921e5ec97 Reviewed-on: https://go-review.googlesource.com/c/go/+/361916 Run-TryBot: mzh <mzh@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
2022-04-15  internal/bytealg: optimize indexbyte function for ppc64le/power9  Archana R
Added specific code for POWER9 that does not need prealignment prior to the vector load. Optimized the vector loop to jump out as soon as there is a match instead of accumulating matches for 4 indices and then processing them. For the small input size 10, the caller function dominates performance.

name                     old time/op    new time/op    delta
IndexByte/10             9.20ns ± 0%    10.40ns ± 0%   +13.08%
IndexByte/32             9.77ns ± 0%    9.20ns ± 0%    -5.84%
IndexByte/4K             171ns ± 0%     136ns ± 0%     -20.51%
IndexByte/4M             154µs ± 0%     126µs ± 0%     -17.92%
IndexByte/64M            2.48ms ± 0%    2.03ms ± 0%    -18.27%
IndexAnyASCII/1:32       10.2ns ± 1%    9.2ns ± 0%     -9.19%
IndexAnyASCII/1:64       11.3ns ± 0%    10.1ns ± 0%    -11.29%
IndexAnyUTF8/1:64        11.4ns ± 0%    9.8ns ± 0%     -13.73%
IndexAnyUTF8/16:64       156ns ± 1%     131ns ± 0%     -16.23%
IndexAnyUTF8/256:64      2.27µs ± 0%    1.86µs ± 0%    -18.03%
LastIndexAnyUTF8/1:64    11.8ns ± 0%    10.5ns ± 0%    -10.81%
LastIndexAnyUTF8/16:64   165ns ±11%     132ns ± 0%     -19.75%
LastIndexAnyUTF8/256:2   1.68µs ± 0%    1.44µs ± 0%    -14.33%
LastIndexAnyUTF8/256:4   1.68µs ± 0%    1.49µs ± 0%    -11.10%
LastIndexAnyUTF8/256:8   1.68µs ± 0%    1.50µs ± 0%    -11.05%
LastIndexAnyUTF8/256:64  2.30µs ± 0%    1.90µs ± 0%    -17.56%

Change-Id: I3d2550bdfdea38fece2da9960bbe62fe6cb1840c
Reviewed-on: https://go-review.googlesource.com/c/go/+/397614
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Archana Ravindar <aravind5@in.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2022-03-28  all: delete PPC64 non-register ABI fallback path  Cherry Mui
Change-Id: Ie058c0549167b256ad943a0134907df3aca4a69f Reviewed-on: https://go-review.googlesource.com/c/go/+/394215 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-03-18  all: delete ARM64 non-register ABI fallback path  Cherry Mui
Change-Id: I3996fb31789a1f8559348e059cf371774e548a8d Reviewed-on: https://go-review.googlesource.com/c/go/+/393875 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2022-03-10  cmd/compile,bytealg: change context register on riscv64  Meng Zhuo
The register ABI will use X8-X23 (CL 356519), this CL changes context register from X20(S4) to X26(S10) to meet the prerequisite. Update #40724 Change-Id: I93d51d22fe7b3ea5ceffe96dff93e3af60fbe7f6 Reviewed-on: https://go-review.googlesource.com/c/go/+/357974 Trust: mzh <mzh@golangcn.org> Run-TryBot: mzh <mzh@golangcn.org> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-03-08  internal/bytealg: optimise compare on riscv64  Joel Sing
Implement compare using loops that process 32 bytes, 16 bytes, 4 bytes or 1 byte depending on size and alignment. For comparisons that are less than 32 bytes the overhead of checking and adjusting alignment usually exceeds the overhead of reading and processing 4 bytes at a time.

Updates #50615

name                           old time/op    new time/op     delta
BytesCompare/1-4               68.4ns ± 1%    61.0ns ± 0%     -10.78%  (p=0.001 n=3+3)
BytesCompare/2-4               82.9ns ± 0%    71.0ns ± 1%     -14.31%  (p=0.000 n=3+3)
BytesCompare/4-4               107ns ± 0%     70ns ± 0%       -34.96%  (p=0.000 n=3+3)
BytesCompare/8-4               156ns ± 0%     90ns ± 0%       -42.36%  (p=0.000 n=3+3)
BytesCompare/16-4              267ns ±11%     130ns ± 0%      -51.10%  (p=0.011 n=3+3)
BytesCompare/32-4              446ns ± 0%     74ns ± 0%       -83.31%  (p=0.000 n=3+3)
BytesCompare/64-4              840ns ± 2%     91ns ± 0%       -89.17%  (p=0.000 n=3+3)
BytesCompare/128-4             1.60µs ± 0%    0.13µs ± 0%     -92.18%  (p=0.000 n=3+3)
BytesCompare/256-4             3.15µs ± 0%    0.19µs ± 0%     -93.91%  (p=0.000 n=3+3)
BytesCompare/512-4             6.25µs ± 0%    0.33µs ± 0%     -94.80%  (p=0.000 n=3+3)
BytesCompare/1024-4            12.5µs ± 0%    0.6µs ± 0%      -95.23%  (p=0.000 n=3+3)
BytesCompare/2048-4            24.8µs ± 0%    1.1µs ± 0%      -95.46%  (p=0.000 n=3+3)
CompareBytesEqual-4            225ns ± 0%     131ns ± 0%      -41.69%  (p=0.000 n=3+3)
CompareBytesToNil-4            45.3ns ± 7%    46.7ns ± 0%     ~  (p=0.452 n=3+3)
CompareBytesEmpty-4            41.0ns ± 1%    40.6ns ± 0%     ~  (p=0.071 n=3+3)
CompareBytesIdentical-4        48.9ns ± 0%    41.3ns ± 1%     -15.58%  (p=0.000 n=3+3)
CompareBytesSameLength-4       127ns ± 0%     77ns ± 0%       -39.48%  (p=0.000 n=3+3)
CompareBytesDifferentLength-4  136ns ±12%     78ns ± 0%       -42.65%  (p=0.018 n=3+3)
CompareBytesBigUnaligned-4     14.9ms ± 1%    7.3ms ± 1%      -50.95%  (p=0.000 n=3+3)
CompareBytesBig-4              14.9ms ± 1%    2.7ms ± 8%      -82.10%  (p=0.000 n=3+3)
CompareBytesBigIdentical-4     52.5ns ± 0%    44.9ns ± 0%     -14.53%  (p=0.000 n=3+3)

name                           old speed      new speed       delta
CompareBytesBigUnaligned-4     70.5MB/s ± 1%  143.8MB/s ± 1%  +103.87%  (p=0.000 n=3+3)
CompareBytesBig-4              70.3MB/s ± 1%  393.8MB/s ± 8%  +460.43%  (p=0.003 n=3+3)
CompareBytesBigIdentical-4     20.0TB/s ± 0%  23.4TB/s ± 0%   +17.00%  (p=0.000 n=3+3)

Change-Id: Ie18712a9009d425c75e1ab49d5a673d84e73a1eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/380076
Trust: Joel Sing <joel@sing.id.au>
Trust: mzh <mzh@golangcn.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-03-08  internal/bytealg: optimise memequal on riscv64  Joel Sing
Implement memequal using loops that process 32 bytes, 16 bytes, 4 bytes or 1 byte depending on size and alignment. For comparisons that are less than 32 bytes the overhead of checking and adjusting alignment usually exceeds the overhead of reading and processing 4 bytes at a time.

Updates #50615

name                 old time/op    new time/op     delta
Equal/0-4            38.3ns ± 0%    43.1ns ± 0%     +12.54%  (p=0.000 n=3+3)
Equal/1-4            77.7ns ± 0%    90.3ns ± 0%     +16.27%  (p=0.000 n=3+3)
Equal/6-4            116ns ± 0%     121ns ± 0%      +3.85%  (p=0.002 n=3+3)
Equal/9-4            137ns ± 0%     126ns ± 0%      -7.98%  (p=0.000 n=3+3)
Equal/15-4           179ns ± 0%     170ns ± 0%      -4.77%  (p=0.001 n=3+3)
Equal/16-4           186ns ± 0%     159ns ± 0%      -14.65%  (p=0.000 n=3+3)
Equal/20-4           215ns ± 0%     178ns ± 0%      -17.18%  (p=0.000 n=3+3)
Equal/32-4           298ns ± 0%     101ns ± 0%      -66.22%  (p=0.000 n=3+3)
Equal/4K-4           28.9µs ± 0%    2.2µs ± 0%      -92.56%  (p=0.000 n=3+3)
Equal/4M-4           29.6ms ± 0%    2.2ms ± 0%      -92.72%  (p=0.000 n=3+3)
Equal/64M-4          758ms ±75%     35ms ± 0%       ~  (p=0.127 n=3+3)
CompareBytesEqual-4  226ns ± 0%     131ns ± 0%      -41.76%  (p=0.000 n=3+3)

name                 old speed      new speed       delta
Equal/1-4            12.9MB/s ± 0%  11.1MB/s ± 0%   -13.98%  (p=0.000 n=3+3)
Equal/6-4            51.7MB/s ± 0%  49.8MB/s ± 0%   -3.72%  (p=0.002 n=3+3)
Equal/9-4            65.7MB/s ± 0%  71.4MB/s ± 0%   +8.67%  (p=0.000 n=3+3)
Equal/15-4           83.8MB/s ± 0%  88.0MB/s ± 0%   +5.02%  (p=0.001 n=3+3)
Equal/16-4           85.9MB/s ± 0%  100.6MB/s ± 0%  +17.19%  (p=0.000 n=3+3)
Equal/20-4           93.2MB/s ± 0%  112.6MB/s ± 0%  +20.74%  (p=0.000 n=3+3)
Equal/32-4           107MB/s ± 0%   317MB/s ± 0%    +195.97%  (p=0.000 n=3+3)
Equal/4K-4           142MB/s ± 0%   1902MB/s ± 0%   +1243.76%  (p=0.000 n=3+3)
Equal/4M-4           142MB/s ± 0%   1946MB/s ± 0%   +1274.22%  (p=0.000 n=3+3)
Equal/64M-4          111MB/s ±55%   1941MB/s ± 0%   +1641.21%  (p=0.000 n=3+3)

Change-Id: I9af7e82de3c4c5af8813772ed139230900c03b92
Reviewed-on: https://go-review.googlesource.com/c/go/+/380075
Trust: Joel Sing <joel@sing.id.au>
Trust: mzh <mzh@golangcn.org>
Reviewed-by: mzh <mzh@golangcn.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
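The tiered structure described in this commit (32 byte, 16 byte, 4 byte and 1 byte loops) can be sketched in Go. This is a rough illustration only, not the riscv64 assembly: `equalChunked` is a hypothetical name, the alignment handling mentioned in the commit message is omitted, and the `encoding/binary` loads are an assumption standing in for machine word loads.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// equalChunked reports whether a and b hold the same bytes, working
// through progressively smaller chunk sizes. Hypothetical helper.
func equalChunked(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	i, n := 0, len(a)
	// 32 bytes per iteration: four 8 byte words.
	for ; n-i >= 32; i += 32 {
		for j := 0; j < 32; j += 8 {
			if binary.LittleEndian.Uint64(a[i+j:]) != binary.LittleEndian.Uint64(b[i+j:]) {
				return false
			}
		}
	}
	// 16 bytes per iteration: two 8 byte words.
	for ; n-i >= 16; i += 16 {
		if binary.LittleEndian.Uint64(a[i:]) != binary.LittleEndian.Uint64(b[i:]) ||
			binary.LittleEndian.Uint64(a[i+8:]) != binary.LittleEndian.Uint64(b[i+8:]) {
			return false
		}
	}
	// 4 bytes per iteration.
	for ; n-i >= 4; i += 4 {
		if binary.LittleEndian.Uint32(a[i:]) != binary.LittleEndian.Uint32(b[i:]) {
			return false
		}
	}
	// Remaining 0-3 bytes, one at a time.
	for ; i < n; i++ {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	a := []byte("the quick brown fox jumps over the lazy dog")
	b := []byte("the quick brown fox jumps over the lazy cog")
	fmt.Println(equalChunked(a, a))
	fmt.Println(equalChunked(a, b))
}
```

Byte order does not matter for an equality test, so little-endian loads are used arbitrarily here; an ordered compare would need more care.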
2021-11-06  all: remove more leftover // +build lines  Tobias Klauser
CL 344955 and CL 359476 removed almost all // +build lines, but leaving some assembly files and generating scripts. Also, some files were added with // +build lines after CL 359476 was merged. Remove these or rename files where more appropriate. For #41184 Change-Id: I7eb85a498ed9788b42a636e775f261d755504ffa Reviewed-on: https://go-review.googlesource.com/c/go/+/361480 Trust: Tobias Klauser <tobias.klauser@gmail.com> Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Bryan C. Mills <bcmills@google.com>
2021-10-28  all: go fix -fix=buildtag std cmd (except for bootstrap deps, vendor)  Russ Cox
When these packages are released as part of Go 1.18, Go 1.16 will no longer be supported, so we can remove the +build tags in these files. Ran go fix -fix=buildtag std cmd and then reverted the bootstrapDirs as defined in src/cmd/dist/buildtool.go, which need to continue to build with Go 1.4 for now. Also reverted src/vendor and src/cmd/vendor, which will need to be updated in their own repos first. Manual changes in runtime/pprof/mprof_test.go to adjust line numbers. For #41184. Change-Id: Ic0f93f7091295b6abc76ed5cd6e6746e1280861e Reviewed-on: https://go-review.googlesource.com/c/go/+/344955 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Bryan C. Mills <bcmills@google.com>
2021-10-21  internal/bytealg: fix Separator length check for Index/ppc64le  Archana R
Modified the condition in the ASM implementation of indexbody that determines whether the separator length crosses 16 bytes, changing it from BGE to BGT to avoid incorrectly crossing a page. Also fixed IndexString to invoke indexbodyp9 when on the POWER9 platform.

Change-Id: I0602a797cc75287990eea1972e9e473744f6f5a9
Reviewed-on: https://go-review.googlesource.com/c/go/+/356849
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Trust: Keith Randall <khr@golang.org>
2021-10-19  internal/bytealg: port bytes.Index and bytes.Count to reg ABI on ppc64x  Archana R
This change adds support for the reg ABI to the Index and Count functions for ppc64/ppc64le.

Most Index and Count benchmarks show improvement in performance on POWER9 with this change. Similar numbers observed on POWER8 and POWER10.

name            old time/op    new time/op    delta
Index/32        71.0ns ± 0%    67.9ns ± 0%    -4.42%  (p=0.001 n=7+6)
IndexEasy/10    17.5ns ± 0%    17.2ns ± 0%    -1.30%  (p=0.001 n=7+7)

name            old time/op    new time/op    delta
Count/10        26.6ns ± 0%    25.0ns ± 1%    -6.02%  (p=0.001 n=7+7)
Count/32        78.6ns ± 0%    74.7ns ± 0%    -4.97%  (p=0.001 n=7+7)
Count/4K        5.03µs ± 0%    5.03µs ± 0%    -0.07%  (p=0.000 n=6+7)
CountEasy/10    26.9ns ± 0%    25.2ns ± 1%    -6.31%  (p=0.001 n=7+7)
CountSingle/32  11.8ns ± 0%    9.9ns ± 0%     -15.70%  (p=0.002 n=6+6)

Change-Id: Ibd146c04f8107291c55f9e6100b8264dfccc41ae
Reviewed-on: https://go-review.googlesource.com/c/go/+/355509
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-09-28  internal/bytealg: port bytealg functions to reg ABI on ppc64x  Lynn Boger
This adds support for the reg ABI to the bytes functions for ppc64/ppc64le. These are initially under control of the GOEXPERIMENT macro until all changes are in. Change-Id: Id82f31056af8caa8541e27c6735f6b815a5dbf5a Reviewed-on: https://go-review.googlesource.com/c/go/+/351190 Trust: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-08-23  all: replace runtime SSE2 detection with GO386 setting  Martin Möhrmann
When GO386=sse2 we can assume sse2 to be present without a runtime check. If GO386=softfloat is set we can avoid the usage of SSE2 even if detected. This might cause a memcpy, memclr and bytealg slowdown of Go binaries compiled with softfloat on machines that support SSE2. Such setups are rare and should use GO386=sse2 instead if performance matters. On targets that support SSE2 we avoid the runtime overhead of dynamic cpu feature dispatch. The removal of runtime sse2 checks also allows to simplify internal/cpu further by removing handling of the required feature option as a followup after this CL. Change-Id: I90a853a8853a405cb665497c6d1a86556947ba17 Reviewed-on: https://go-review.googlesource.com/c/go/+/344350 Trust: Martin Möhrmann <martin@golang.org> Run-TryBot: Martin Möhrmann <martin@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2021-08-11  [dev.typeparams] runtime, internal/bytealg: remove regabi fallback code on AMD64  Cherry Mui
As we commit to always enabling register ABI on AMD64, remove the fallback code. Change-Id: I30556858ba4bac367495fa94f6a8682ecd771196 Reviewed-on: https://go-review.googlesource.com/c/go/+/341152 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Austin Clements <austin@google.com>
2021-06-16[dev.typeparams] all: merge master (785a8f6) into dev.typeparamsCuong Manh Le
- test/run.go CL 328050 added fixedbugs/issue46749.go to -G=3 excluded files list

Merge List:
+ 2021-06-16 785a8f677f cmd/compile: better error message for invalid untyped operation
+ 2021-06-16 a752bc0746 syscall: fix TestGroupCleanupUserNamespace test failure on Fedora
+ 2021-06-15 d77f4c0c5c net/http: improve some server docs
+ 2021-06-15 219fe9d547 cmd/go: ignore UTF8 BOM when reading source code
+ 2021-06-15 723f199edd cmd/link: set correct flags in .dynamic for PIE buildmode
+ 2021-06-15 4d2d89ff42 cmd/go, go/build: update docs to use //go:build syntax
+ 2021-06-15 033d885315 doc/go1.17: document go run pkg@version
+ 2021-06-15 ea8612ef42 syscall: disable c-shared test when no cgo, for windows/arm
+ 2021-06-15 abc56fd1a0 internal/bytealg: remove duplicate go:build line
+ 2021-06-15 4061d3463b syscall: rewrite handle inheritance test to use C rather than Powershell
+ 2021-06-15 cf4e3e3d3b reflect: explain why convertible or comparable types may still panic
+ 2021-06-14 7841cb14d9 doc/go1.17: assorted fixes
+ 2021-06-14 8a5a6f46dc debug/elf: don't apply DWARF relocations for ET_EXEC binaries
+ 2021-06-14 9d13f8d43e runtime: update the variable name in comment
+ 2021-06-14 0fd20ed5b6 reflect: use same conversion panic in reflect and runtime
+ 2021-06-14 6bbb0a9d4a cmd/internal/sys: mark windows/arm64 as c-shared-capable
+ 2021-06-14 d4f34f8c63 doc/go1.17: reword "results" in stack trace printing

Change-Id: I60d1f67c4d48cd4093c350fc89bd60c454d23944
2021-06-15internal/bytealg: remove duplicate go:build linecuishuang
Change-Id: I6b71bf468b9544820829f02e320673f5edd785fa GitHub-Last-Rev: 8082ac5fba18e630dd2a21771837e6f0b1f9853f GitHub-Pull-Request: golang/go#46683 Reviewed-on: https://go-review.googlesource.com/c/go/+/326730 Reviewed-by: Ian Lance Taylor <iant@golang.org> Trust: Tobias Klauser <tobias.klauser@gmail.com>
2021-06-03[dev.typeparams] runtime, internal/bytealg: port performance-critical functions to register ABI on ARM64Cherry Mui
This CL ports a few performance-critical assembly functions to use register arguments directly. This is similar to CL 308931 and CL 310184. Change-Id: I6e30dfff17f76b8578ce8cfd51de21b66610fdb0 Reviewed-on: https://go-review.googlesource.com/c/go/+/324400 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org>
2021-06-03[dev.typeparams] internal/bytealg: call memeqbody directly in memequal_varlen on ARM64Cherry Mui
Currently, memequal_varlen opens up a frame and calls memequal, which then tail-calls memeqbody. This CL changes memequal_varlen to tail-call memeqbody directly. This makes it simpler to switch to the register ABI in the next CL. Change-Id: Ia1367c0abb7f4755fe736c404411793fb9e5c04f Reviewed-on: https://go-review.googlesource.com/c/go/+/324399 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2021-05-13all: add //go:build lines to assembly filesTobias Klauser
Don't add them to files in vendor and cmd/vendor though. These will be pulled in by updating the respective dependencies. For #41184 Change-Id: Icc57458c9b3033c347124323f33084c85b224c70 Reviewed-on: https://go-review.googlesource.com/c/go/+/319389 Trust: Tobias Klauser <tobias.klauser@gmail.com> Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>
2021-04-21internal/bytealg: add power9 version of bytes indexLynn Boger
This adds a power9 version of the bytes.Index function for little endian. Here is the improvement on power9 for some of the Index benchmarks:

Index/10                      -0.14%
Index/32                      -3.19%
Index/4K                      -12.66%
Index/4M                      -13.34%
Index/64M                     -13.17%
Count/10                      -0.59%
Count/32                      -2.88%
Count/4K                      -12.63%
Count/4M                      -13.35%
Count/64M                     -13.17%
IndexHard1                    -23.03%
IndexHard2                    -13.01%
IndexHard3                    -22.12%
IndexHard4                    +0.16%
CountHard1                    -23.02%
CountHard2                    -13.01%
CountHard3                    -22.12%
IndexPeriodic/IndexPeriodic2  -22.85%
IndexPeriodic/IndexPeriodic4  -23.15%

Change-Id: Id72353e2771eba2efbb1544d5f0be65f8a9f0433
Reviewed-on: https://go-review.googlesource.com/c/go/+/311380
Run-TryBot: Carlos Eduardo Seo <carlos.seo@linaro.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2021-04-15bytes: add asm implementation for index on ppc64xLynn Boger
This adds an asm implementation of index on ppc64le and ppc64. It results in a significant improvement in some of the benchmarks that use bytes.Index. The implementation is based on a port of the s390x asm implementation. Comments on the design are found with the code. The following improvements occurred on power8:

name                          old time/op  new time/op  delta (%)
Index/10                      70.7ns ± 0%  18.8ns ± 0%  -73.4
Index/32                       165ns ± 0%    95ns ± 0%  -42.6
Index/4K                      9.23µs ± 0%  4.91µs ± 0%  -46
Index/4M                      9.52ms ± 0%  5.10ms ± 0%  -46.4
Index/64M                      155ms ± 0%    85ms ± 0%  -45.1
Count/10                      83.0ns ± 0%  32.1ns ± 0%  -61.3
Count/32                       178ns ± 0%   109ns ± 0%  -38.8
Count/4K                      9.24µs ± 0%  4.93µs ± 0%  -46
Count/4M                      9.52ms ± 0%  5.10ms ± 0%  -46.4
Count/64M                      155ms ± 0%    85ms ± 0%  -45.1
IndexHard1                    2.36ms ± 0%  0.13ms ± 0%  -94.4
IndexHard2                    2.36ms ± 0%  1.28ms ± 0%  -45.8
IndexHard3                    2.36ms ± 0%  1.19ms ± 0%  -49.4
IndexHard4                    2.36ms ± 0%  2.35ms ± 0%  -0.1
CountHard1                    2.36ms ± 0%  0.13ms ± 0%  -94.4
CountHard2                    2.36ms ± 0%  1.28ms ± 0%  -45.8
CountHard3                    2.36ms ± 0%  1.19ms ± 0%  -49.4
IndexPeriodic/IndexPeriodic2   146µs ± 0%     8µs ± 0%  -94
IndexPeriodic/IndexPeriodic4   146µs ± 0%     8µs ± 0%  -94

Change-Id: I7dd2bb7e278726e27f51825ca8b2f8317d460e60
Reviewed-on: https://go-review.googlesource.com/c/go/+/309730
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
Trust: Carlos Eduardo Seo <carlos.seo@linaro.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2021-04-15internal/bytealg: port more performance-critical functions to ABIInternalAustin Clements
CL 308931 ported several runtime assembly functions to ABIInternal so that compiler-generated ABIInternal calls don't go through ABI wrappers, but it missed the runtime assembly functions that are actually defined in internal/bytealg. This eliminates the cost of wrappers for the BleveQuery and GopherLuaKNucleotide benchmarks, but there's still more to do for Tile38.

                                    0-base       1-wrappers
                                    sec/op       sec/op       vs base
BleveQuery                          6.507 ± 0%   6.477 ± 0%   -0.46% (p=0.004 n=20)
GopherLuaKNucleotide                30.39 ± 1%   30.34 ± 0%   ~ (p=0.301 n=20)
Tile38IntersectsCircle100kmRequest  1.038m ± 1%  1.080m ± 2%  +4.03% (p=0.000 n=20)

For #40724.

Change-Id: I0b722443f684fcb997b1d70802c5ed4b8d8f9829
Reviewed-on: https://go-review.googlesource.com/c/go/+/310184
Trust: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-02-20all: go fmt std cmd (but revert vendor)Russ Cox
Make all our package sources use Go 1.17 gofmt format (adding //go:build lines). Part of //go:build change (#41184). See https://golang.org/design/draft-gobuild Change-Id: Ia0534360e4957e58cd9a18429c39d0e32a6addb4 Reviewed-on: https://go-review.googlesource.com/c/go/+/294430 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-10-23internal/bytealg: improve mips64x equal on large sizeMeng Zhuo
name               old time/op  new time/op  delta
Equal/0            9.94ns ± 4%  9.12ns ± 5%   -8.26%  (p=0.000 n=10+10)
Equal/1            24.5ns ± 0%  27.2ns ± 1%  +11.22%  (p=0.000 n=9+10)
Equal/6            28.1ns ± 0%  32.1ns ± 1%  +14.20%  (p=0.000 n=8+10)
Equal/9            37.1ns ± 0%  37.8ns ± 1%   +1.95%  (p=0.000 n=8+9)
Equal/15           47.3ns ± 0%  44.3ns ± 0%   -6.34%  (p=0.000 n=9+10)
Equal/16           42.9ns ± 0%  24.6ns ± 0%  -42.66%  (p=0.000 n=10+7)
Equal/20           44.3ns ± 0%  57.4ns ± 0%  +29.57%  (p=0.000 n=9+10)
Equal/32           63.2ns ± 0%  35.8ns ± 0%  -43.35%  (p=0.000 n=10+10)
Equal/4K           6.49µs ± 0%  0.50µs ± 0%  -92.27%  (p=0.000 n=10+8)
Equal/4M           6.70ms ± 0%  0.48ms ± 0%  -92.78%  (p=0.000 n=8+10)
Equal/64M           110ms ± 0%     8ms ± 0%  -92.65%  (p=0.000 n=9+9)
CompareBytesEqual  36.6ns ± 0%  35.9ns ± 0%   -1.83%  (p=0.000 n=10+9)

name       old speed      new speed      delta
Equal/1    40.8MB/s ± 0%  36.7MB/s ± 0%    -10.16%  (p=0.000 n=10+10)
Equal/6     213MB/s ± 0%   187MB/s ± 1%    -12.32%  (p=0.000 n=10+10)
Equal/9     243MB/s ± 0%   238MB/s ± 1%     -1.94%  (p=0.000 n=9+10)
Equal/15    317MB/s ± 0%   339MB/s ± 0%     +6.86%  (p=0.000 n=9+9)
Equal/16    373MB/s ± 0%   651MB/s ± 0%    +74.70%  (p=0.000 n=8+10)
Equal/20    452MB/s ± 0%   348MB/s ± 0%    -22.90%  (p=0.000 n=8+10)
Equal/32    506MB/s ± 0%   893MB/s ± 0%    +76.53%  (p=0.000 n=10+9)
Equal/4K    631MB/s ± 0%  8166MB/s ± 0%  +1194.73%  (p=0.000 n=10+10)
Equal/4M    626MB/s ± 0%  8673MB/s ± 0%  +1284.94%  (p=0.000 n=8+10)
Equal/64M   608MB/s ± 0%  8277MB/s ± 0%  +1260.83%  (p=0.000 n=9+9)

Change-Id: I1cd14ade16390a5097a8d4e9721d5e822fa6218f
Reviewed-on: https://go-review.googlesource.com/c/go/+/199597
Run-TryBot: Meng Zhuo <mzh@golangcn.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: Meng Zhuo <mzh@golangcn.org>
2020-10-19internal/bytealg: add assembly implementation of Count/CountString for riscv64Tobias Klauser
Simple single-byte loop count for now, to be further improved in future CLs. Benchmark on linux/riscv64 (HiFive Unleashed):

name               old time/op  new time/op  delta
CountSingle/10-4    190ns ± 1%   145ns ± 1%  -23.66%  (p=0.000 n=10+9)
CountSingle/32-4    422ns ± 1%   268ns ± 0%  -36.43%  (p=0.000 n=10+7)
CountSingle/4K-4   43.3µs ± 0%  23.8µs ± 0%  -45.09%  (p=0.000 n=8+10)
CountSingle/4M-4   54.2ms ± 1%  33.3ms ± 1%  -38.48%  (p=0.000 n=10+10)
CountSingle/64M-4   1.52s ± 1%   1.20s ± 1%  -21.20%  (p=0.000 n=9+9)

name               old speed      new speed       delta
CountSingle/10-4   52.7MB/s ± 1%   69.1MB/s ± 1%  +31.03%  (p=0.000 n=10+9)
CountSingle/32-4   75.9MB/s ± 1%  119.5MB/s ± 0%  +57.34%  (p=0.000 n=10+8)
CountSingle/4K-4   94.6MB/s ± 0%  172.2MB/s ± 0%  +82.10%  (p=0.000 n=8+10)
CountSingle/4M-4   77.4MB/s ± 1%  125.8MB/s ± 1%  +62.54%  (p=0.000 n=10+10)
CountSingle/64M-4  44.2MB/s ± 1%   56.1MB/s ± 1%  +26.91%  (p=0.000 n=9+9)

Change-Id: I2a6bd50d22d5f598517bb3c5a50066c54280cac5
Reviewed-on: https://go-review.googlesource.com/c/go/+/263541
Trust: Tobias Klauser <tobias.klauser@gmail.com>
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
2020-10-13internal/bytealg: fix typo in IndexRabinKarp{,Bytes} godocTobias Klauser
Change-Id: I09ba19e19b195e345a0fe29d542e0d86529b0d31 Reviewed-on: https://go-review.googlesource.com/c/go/+/261359 Trust: Tobias Klauser <tobias.klauser@gmail.com> Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-09-23bytes, internal/bytealg: fix incorrect IndexString usageMichael Munday
The IndexString implementation in the bytealg package requires that the string passed into it be in the range '2 <= len(s) <= MaxLen' where MaxLen may be any value (including 0). CL 156998 added calls to bytealg.IndexString where MaxLen was not first checked. This led to an illegal instruction on s390x with the vector facility disabled. This CL guards the calls to bytealg.IndexString with a MaxLen check. If the check fails then the code now falls back to the pre CL 156998 implementation (a loop over the runes in the string). Since the MaxLen check is now in place the generic implementation is no longer called so I have returned it to its original unimplemented state. In future we may want to drop MaxLen to prevent this kind of confusion. Fixes #41552. Change-Id: Ibeb3f08720444a05c08d719ed97f6cef2423bbe9 Reviewed-on: https://go-review.googlesource.com/c/go/+/256717 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Michael Munday <mike.munday@ibm.com> Reviewed-by: Keith Randall <khr@golang.org>
2020-08-17internal/bytealg: use CBZ instructionsHeisenberg
Use CBZ in place of a comparison against zero followed by a conditional jump in the arm64 assembly file. Change-Id: Ie16fb52e27b4d327343e119ebc0f0ca756437bc4 Reviewed-on: https://go-review.googlesource.com/c/go/+/237477 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-08-16crypto,internal/bytealg: fix assembly that clobbers BPKeith Randall
BP should be callee-save. It will be saved automatically if there is a nonzero frame size. Otherwise, we need to avoid this register. Change-Id: If3f551efa42d830c8793d9f0183cb8daad7a2ab5 Reviewed-on: https://go-review.googlesource.com/c/go/+/248260 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-03-11strings, bytes: improve IndexAny and LastIndexAny performanceerifan01
For a pattern containing a multi-byte rune, the time complexity of the previous algorithm is O(nm), so search performance is poor when both input arguments are long. This CL improves search performance for these cases by using IndexRune, which is mainly implemented with IndexByte and Index. IndexByte and Index are specially optimized with some powerful instructions for short patterns (a UTF-8 rune is 1 to 4 bytes), so they help reduce the running time of IndexAny and LastIndexAny. Another optimization method is to use a hash table; however, the actual test results show that using IndexRune is better, and its space complexity is lower. There are two fast paths in IndexAny and LastIndexAny for cases where the length of an input argument is 1; their locations are not exactly the same, which was determined based on the actual test results. Benchmarks on arm64 and amd64: name old time/op new time/op delta pkg:strings goos:linux goarch:arm64 IndexAnyASCII/1:1-8 23.7ns ± 3% 28.5ns ± 0% +20.15% (p=0.008 n=5+5) IndexAnyASCII/1:2-8 18.0ns ± 0% 33.1ns ± 0% +83.67% (p=0.008 n=5+5) IndexAnyASCII/1:4-8 20.0ns ± 0% 36.0ns ± 0% +80.00% (p=0.029 n=4+4) IndexAnyASCII/1:8-8 36.1ns ± 0% 36.0ns ± 0% ~ (p=0.095 n=5+4) IndexAnyASCII/1:16-8 48.1ns ± 0% 36.0ns ± 0% -25.19% (p=0.029 n=4+4) IndexAnyASCII/1:32-8 72.1ns ± 0% 36.0ns ± 0% -50.01% (p=0.008 n=5+5) IndexAnyASCII/1:64-8 120ns ± 0% 39ns ± 0% -67.83% (p=0.008 n=5+5) IndexAnyASCII/16:1-8 73.0ns ± 0% 28.5ns ± 0% -60.95% (p=0.008 n=5+5) IndexAnyASCII/16:2-8 76.8ns ± 0% 77.0ns ± 0% ~ (p=1.000 n=5+5) IndexAnyASCII/16:4-8 83.2ns ± 1% 83.0ns ± 0% ~ (p=0.770 n=5+5) IndexAnyASCII/16:8-8 111ns ± 1% 107ns ± 0% -3.25% (p=0.008 n=5+5) IndexAnyASCII/16:16-8 139ns ± 1% 137ns ± 0% -1.58% (p=0.008 n=5+5) IndexAnyASCII/16:32-8 199ns ± 1% 197ns ± 0% -1.20% (p=0.008 n=5+5) IndexAnyASCII/16:64-8 307ns ± 0% 313ns ± 0% +1.82% (p=0.016 n=5+4) IndexAnyASCII/256:1-8 674ns ± 0% 65ns ± 0% -90.31% (p=0.008 n=5+5)
IndexAnyASCII/256:2-8 678ns ± 0% 683ns ± 0% +0.68% (p=0.008 n=5+5) IndexAnyASCII/256:4-8 685ns ± 0% 683ns ± 0% -0.29% (p=0.000 n=5+4) IndexAnyASCII/256:8-8 711ns ± 0% 708ns ± 0% -0.48% (p=0.008 n=5+5) IndexAnyASCII/256:16-8 740ns ± 0% 740ns ± 0% ~ (p=0.444 n=5+5) IndexAnyASCII/256:32-8 799ns ± 0% 798ns ± 0% -0.18% (p=0.008 n=5+5) IndexAnyASCII/256:64-8 910ns ± 0% 914ns ± 0% +0.44% (p=0.016 n=4+5) IndexAnyUTF8/1:1-8 27.1ns ± 0% 19.0ns ± 0% -29.79% (p=0.008 n=5+5) IndexAnyUTF8/1:2-8 44.1ns ± 0% 33.0ns ± 0% -25.17% (p=0.008 n=5+5) IndexAnyUTF8/1:4-8 46.1ns ± 0% 33.1ns ± 0% -28.29% (p=0.016 n=4+5) IndexAnyUTF8/1:8-8 85.1ns ± 0% 33.0ns ± 0% -61.18% (p=0.008 n=5+5) IndexAnyUTF8/1:16-8 110ns ± 1% 36ns ± 0% -67.27% (p=0.008 n=5+5) IndexAnyUTF8/1:32-8 188ns ± 0% 36ns ± 0% -80.85% (p=0.008 n=5+5) IndexAnyUTF8/1:64-8 332ns ± 0% 39ns ± 0% ~ (p=0.079 n=4+5) IndexAnyUTF8/16:1-8 293ns ± 0% 54ns ± 0% -81.56% (p=0.008 n=5+5) IndexAnyUTF8/16:2-8 563ns ± 0% 349ns ± 0% -37.98% (p=0.008 n=5+5) IndexAnyUTF8/16:4-8 546ns ± 1% 349ns ± 0% -36.10% (p=0.000 n=5+4) IndexAnyUTF8/16:8-8 1.22µs ± 0% 0.35µs ± 0% -71.39% (p=0.008 n=5+5) IndexAnyUTF8/16:16-8 1.63µs ± 1% 0.42µs ± 0% -73.98% (p=0.008 n=5+5) IndexAnyUTF8/16:32-8 2.87µs ± 0% 0.42µs ± 0% -85.22% (p=0.008 n=5+5) IndexAnyUTF8/16:64-8 5.18µs ± 0% 0.47µs ± 0% -90.98% (p=0.008 n=5+5) IndexAnyUTF8/256:1-8 4.26µs ± 0% 0.47µs ± 0% -88.85% (p=0.000 n=4+5) IndexAnyUTF8/256:2-8 8.62µs ± 0% 5.15µs ± 0% -40.21% (p=0.008 n=5+5) IndexAnyUTF8/256:4-8 8.25µs ± 0% 5.15µs ± 0% -37.50% (p=0.016 n=5+4) IndexAnyUTF8/256:8-8 19.2µs ± 1% 5.2µs ± 0% -73.08% (p=0.016 n=5+4) IndexAnyUTF8/256:16-8 25.6µs ± 1% 6.3µs ± 0% -75.32% (p=0.008 n=5+5) IndexAnyUTF8/256:32-8 45.6µs ± 0% 6.3µs ± 0% -86.15% (p=0.008 n=5+5) IndexAnyUTF8/256:64-8 82.4µs ± 0% 7.0µs ± 0% -91.53% (p=0.016 n=5+4) LastIndexAnyASCII/1:1-8 23.0ns ± 0% 33.5ns ± 0% +45.65% (p=0.008 n=5+5) LastIndexAnyASCII/1:2-8 24.5ns ± 0% 33.5ns ± 0% +36.73% (p=0.016 n=4+5) LastIndexAnyASCII/1:4-8 27.5ns ± 0% 35.5ns ± 
0% +29.09% (p=0.008 n=5+5) LastIndexAnyASCII/1:8-8 44.5ns ± 0% 35.5ns ± 0% -20.13% (p=0.008 n=5+5) LastIndexAnyASCII/1:16-8 56.5ns ± 0% 35.5ns ± 0% -37.15% (p=0.008 n=5+5) LastIndexAnyASCII/1:32-8 80.3ns ± 0% 35.5ns ± 0% -55.79% (p=0.000 n=5+4) LastIndexAnyASCII/1:64-8 129ns ± 0% 40ns ± 0% -68.85% (p=0.008 n=5+5) LastIndexAnyASCII/16:1-8 72.8ns ± 0% 72.7ns ± 0% -0.19% (p=0.016 n=4+5) LastIndexAnyASCII/16:2-8 75.4ns ± 0% 75.1ns ± 0% ~ (p=0.127 n=5+5) LastIndexAnyASCII/16:4-8 81.9ns ± 1% 80.2ns ± 0% -2.00% (p=0.008 n=5+5) LastIndexAnyASCII/16:8-8 110ns ± 1% 108ns ± 0% -1.46% (p=0.008 n=5+5) LastIndexAnyASCII/16:16-8 138ns ± 1% 134ns ± 0% -3.18% (p=0.008 n=5+5) LastIndexAnyASCII/16:32-8 198ns ± 0% 197ns ± 0% -0.51% (p=0.008 n=5+5) LastIndexAnyASCII/16:64-8 309ns ± 0% 313ns ± 0% +1.30% (p=0.008 n=5+5) LastIndexAnyASCII/256:1-8 652ns ± 0% 653ns ± 0% +0.21% (p=0.008 n=5+5) LastIndexAnyASCII/256:2-8 656ns ± 0% 656ns ± 0% ~ (all equal) LastIndexAnyASCII/256:4-8 663ns ± 0% 663ns ± 0% ~ (p=0.444 n=5+5) LastIndexAnyASCII/256:8-8 691ns ± 0% 690ns ± 0% ~ (p=0.079 n=4+5) LastIndexAnyASCII/256:16-8 719ns ± 0% 715ns ± 0% -0.53% (p=0.000 n=5+4) LastIndexAnyASCII/256:32-8 779ns ± 0% 780ns ± 0% +0.13% (p=0.029 n=4+4) LastIndexAnyASCII/256:64-8 890ns ± 0% 894ns ± 0% +0.45% (p=0.008 n=5+5) LastIndexAnyUTF8/1:1-8 31.6ns ± 0% 33.5ns ± 0% +6.01% (p=0.008 n=5+5) LastIndexAnyUTF8/1:2-8 48.6ns ± 0% 33.5ns ± 0% -30.99% (p=0.008 n=5+5) LastIndexAnyUTF8/1:4-8 48.6ns ± 0% 33.5ns ± 0% -31.13% (p=0.000 n=5+4) LastIndexAnyUTF8/1:8-8 89.6ns ± 0% 33.5ns ± 0% -62.56% (p=0.008 n=5+5) LastIndexAnyUTF8/1:16-8 113ns ± 1% 36ns ± 0% -68.47% (p=0.000 n=5+4) LastIndexAnyUTF8/1:32-8 190ns ± 0% 36ns ± 0% -81.26% (p=0.029 n=4+4) LastIndexAnyUTF8/1:64-8 327ns ± 0% 40ns ± 0% -87.77% (p=0.008 n=5+5) LastIndexAnyUTF8/16:1-8 364ns ± 0% 158ns ± 0% ~ (p=0.079 n=4+5) LastIndexAnyUTF8/16:2-8 636ns ± 0% 472ns ± 0% -25.79% (p=0.000 n=5+4) LastIndexAnyUTF8/16:4-8 630ns ± 0% 472ns ± 0% -25.03% (p=0.008 n=5+5) 
LastIndexAnyUTF8/16:8-8 1.28µs ± 0% 0.47µs ± 0% -63.09% (p=0.016 n=5+4) LastIndexAnyUTF8/16:16-8 1.66µs ± 0% 0.53µs ± 0% -68.39% (p=0.016 n=5+4) LastIndexAnyUTF8/16:32-8 2.88µs ± 0% 0.53µs ± 0% -81.72% (p=0.008 n=5+5) LastIndexAnyUTF8/16:64-8 5.08µs ± 0% 0.57µs ± 0% -88.79% (p=0.008 n=5+5) LastIndexAnyUTF8/256:1-8 5.41µs ± 0% 2.03µs ± 0% -62.46% (p=0.016 n=4+5) LastIndexAnyUTF8/256:2-8 9.77µs ± 0% 7.14µs ± 0% -26.97% (p=0.008 n=5+5) LastIndexAnyUTF8/256:4-8 9.63µs ± 0% 7.14µs ± 0% -25.86% (p=0.008 n=5+5) LastIndexAnyUTF8/256:8-8 20.0µs ± 0% 7.1µs ± 0% -64.30% (p=0.008 n=5+5) LastIndexAnyUTF8/256:16-8 26.1µs ± 1% 8.0µs ± 0% -69.40% (p=0.008 n=5+5) LastIndexAnyUTF8/256:32-8 45.6µs ± 1% 8.0µs ± 0% -82.51% (p=0.008 n=5+5) LastIndexAnyUTF8/256:64-8 80.8µs ± 0% 8.6µs ± 0% -89.33% (p=0.016 n=5+4) pkg:bytes goos:linux goarch:arm64 IndexAnyASCII/1:1-8 26.2ns ± 1% 26.5ns ± 0% +1.30% (p=0.016 n=5+4) IndexAnyASCII/1:2-8 18.5ns ± 0% 26.5ns ± 0% +43.24% (p=0.008 n=5+5) IndexAnyASCII/1:4-8 21.0ns ± 0% 26.5ns ± 0% +26.38% (p=0.008 n=5+5) IndexAnyASCII/1:8-8 37.5ns ± 0% 26.5ns ± 0% -29.33% (p=0.000 n=5+4) IndexAnyASCII/1:16-8 49.6ns ± 0% 26.5ns ± 0% -46.49% (p=0.008 n=5+5) IndexAnyASCII/1:32-8 73.6ns ± 0% 30.1ns ± 0% -59.16% (p=0.008 n=5+5) IndexAnyASCII/1:64-8 122ns ± 0% 33ns ± 0% -73.23% (p=0.008 n=5+5) IndexAnyASCII/16:1-8 73.7ns ± 0% 33.4ns ± 0% -54.71% (p=0.008 n=5+5) IndexAnyASCII/16:2-8 79.1ns ± 0% 78.9ns ± 0% -0.30% (p=0.016 n=4+5) IndexAnyASCII/16:4-8 84.8ns ± 0% 86.1ns ± 0% +1.58% (p=0.016 n=5+4) IndexAnyASCII/16:8-8 111ns ± 0% 111ns ± 0% ~ (all equal) IndexAnyASCII/16:16-8 139ns ± 0% 144ns ± 0% +3.60% (p=0.016 n=4+5) IndexAnyASCII/16:32-8 196ns ± 0% 207ns ± 0% +5.61% (p=0.016 n=5+4) IndexAnyASCII/16:64-8 311ns ± 0% 320ns ± 0% +2.89% (p=0.016 n=4+5) IndexAnyASCII/256:1-8 674ns ± 0% 65ns ± 1% -90.35% (p=0.008 n=5+5) IndexAnyASCII/256:2-8 680ns ± 0% 680ns ± 0% ~ (p=0.444 n=5+5) IndexAnyASCII/256:4-8 686ns ± 0% 687ns ± 0% ~ (p=0.167 n=5+5) IndexAnyASCII/256:8-8 713ns ± 0% 
712ns ± 0% -0.14% (p=0.008 n=5+5) IndexAnyASCII/256:16-8 740ns ± 0% 744ns ± 0% +0.54% (p=0.016 n=5+4) IndexAnyASCII/256:32-8 797ns ± 0% 808ns ± 0% +1.43% (p=0.008 n=5+5) IndexAnyASCII/256:64-8 912ns ± 0% 921ns ± 0% +0.99% (p=0.016 n=4+5) IndexAnyUTF8/1:1-8 27.5ns ± 0% 26.5ns ± 0% -3.64% (p=0.008 n=5+5) IndexAnyUTF8/1:2-8 44.5ns ± 0% 26.5ns ± 0% -40.50% (p=0.008 n=5+5) IndexAnyUTF8/1:4-8 45.6ns ± 0% 26.5ns ± 0% -41.89% (p=0.000 n=5+4) IndexAnyUTF8/1:8-8 85.8ns ± 1% 26.5ns ± 0% -69.11% (p=0.008 n=5+5) IndexAnyUTF8/1:16-8 110ns ± 1% 26ns ± 0% -76.00% (p=0.016 n=5+4) IndexAnyUTF8/1:32-8 188ns ± 0% 30ns ± 0% -84.04% (p=0.008 n=5+5) IndexAnyUTF8/1:64-8 333ns ± 0% 33ns ± 0% -90.20% (p=0.008 n=5+5) IndexAnyUTF8/16:1-8 294ns ± 0% 235ns ± 0% -20.07% (p=0.008 n=5+5) IndexAnyUTF8/16:2-8 563ns ± 0% 309ns ± 0% -45.12% (p=0.008 n=5+5) IndexAnyUTF8/16:4-8 558ns ± 1% 309ns ± 0% -44.60% (p=0.000 n=5+4) IndexAnyUTF8/16:8-8 1.23µs ± 0% 0.31µs ± 0% -74.79% (p=0.008 n=5+5) IndexAnyUTF8/16:16-8 1.62µs ± 2% 0.31µs ± 0% -80.93% (p=0.008 n=5+5) IndexAnyUTF8/16:32-8 2.86µs ± 0% 0.38µs ± 0% -86.87% (p=0.008 n=5+5) IndexAnyUTF8/16:64-8 5.18µs ± 0% 0.42µs ± 0% -91.86% (p=0.008 n=5+5) IndexAnyUTF8/256:1-8 4.27µs ± 1% 3.30µs ± 1% -22.75% (p=0.008 n=5+5) IndexAnyUTF8/256:2-8 8.61µs ± 0% 4.45µs ± 0% -48.31% (p=0.016 n=4+5) IndexAnyUTF8/256:4-8 8.44µs ± 0% 4.45µs ± 0% -47.23% (p=0.008 n=5+5) IndexAnyUTF8/256:8-8 19.2µs ± 0% 4.5µs ± 0% -76.78% (p=0.008 n=5+5) IndexAnyUTF8/256:16-8 25.6µs ± 0% 4.5µs ± 0% -82.63% (p=0.008 n=5+5) IndexAnyUTF8/256:32-8 45.4µs ± 0% 5.5µs ± 0% -87.85% (p=0.016 n=4+5) IndexAnyUTF8/256:64-8 82.5µs ± 0% 6.2µs ± 0% -92.49% (p=0.008 n=5+5) LastIndexAnyASCII/1:1-8 23.0ns ± 0% 26.5ns ± 0% +15.02% (p=0.008 n=5+5) LastIndexAnyASCII/1:2-8 24.5ns ± 0% 26.5ns ± 0% +8.16% (p=0.008 n=5+5) LastIndexAnyASCII/1:4-8 27.8ns ± 0% 26.5ns ± 0% -4.68% (p=0.029 n=4+4) LastIndexAnyASCII/1:8-8 45.1ns ± 1% 26.5ns ± 0% -41.29% (p=0.000 n=5+4) LastIndexAnyASCII/1:16-8 57.1ns ± 0% 26.5ns ± 0% -53.61% 
(p=0.008 n=5+5) LastIndexAnyASCII/1:32-8 81.5ns ± 0% 30.0ns ± 0% ~ (p=0.079 n=4+5) LastIndexAnyASCII/1:64-8 129ns ± 0% 32ns ± 0% -74.81% (p=0.008 n=5+5) LastIndexAnyASCII/16:1-8 72.6ns ± 0% 72.1ns ± 0% -0.63% (p=0.000 n=4+5) LastIndexAnyASCII/16:2-8 77.2ns ± 0% 77.2ns ± 0% ~ (p=0.167 n=5+5) LastIndexAnyASCII/16:4-8 83.1ns ± 0% 83.2ns ± 0% ~ (p=0.444 n=5+5) LastIndexAnyASCII/16:8-8 109ns ± 1% 108ns ± 0% ~ (p=0.167 n=5+5) LastIndexAnyASCII/16:16-8 136ns ± 0% 136ns ± 0% ~ (all equal) LastIndexAnyASCII/16:32-8 195ns ± 0% 197ns ± 0% +0.82% (p=0.008 n=5+5) LastIndexAnyASCII/16:64-8 309ns ± 0% 309ns ± 0% ~ (all equal) LastIndexAnyASCII/256:1-8 653ns ± 0% 657ns ± 0% +0.61% (p=0.008 n=5+5) LastIndexAnyASCII/256:2-8 659ns ± 0% 658ns ± 0% ~ (p=0.167 n=5+5) LastIndexAnyASCII/256:4-8 664ns ± 0% 663ns ± 0% ~ (p=0.095 n=5+4) LastIndexAnyASCII/256:8-8 698ns ± 0% 689ns ± 0% -1.29% (p=0.008 n=5+5) LastIndexAnyASCII/256:16-8 726ns ± 0% 717ns ± 0% -1.24% (p=0.008 n=5+5) LastIndexAnyASCII/256:32-8 777ns ± 0% 779ns ± 0% ~ (p=0.079 n=5+4) LastIndexAnyASCII/256:64-8 889ns ± 0% 890ns ± 0% ~ (p=0.444 n=5+5) LastIndexAnyUTF8/1:1-8 32.1ns ± 0% 26.5ns ± 0% -17.45% (p=0.000 n=5+4) LastIndexAnyUTF8/1:2-8 48.6ns ± 0% 26.5ns ± 0% -45.52% (p=0.000 n=5+4) LastIndexAnyUTF8/1:4-8 49.6ns ± 0% 26.5ns ± 0% -46.62% (p=0.008 n=5+5) LastIndexAnyUTF8/1:8-8 91.9ns ± 0% 26.5ns ± 0% -71.18% (p=0.008 n=5+5) LastIndexAnyUTF8/1:16-8 114ns ± 1% 26ns ± 0% -76.84% (p=0.000 n=5+4) LastIndexAnyUTF8/1:32-8 203ns ± 6% 30ns ± 0% -85.25% (p=0.008 n=5+5) LastIndexAnyUTF8/1:64-8 330ns ± 0% 33ns ± 0% -90.14% (p=0.000 n=4+5) LastIndexAnyUTF8/16:1-8 365ns ± 0% 164ns ± 0% -55.04% (p=0.008 n=5+5) LastIndexAnyUTF8/16:2-8 638ns ± 0% 296ns ± 0% -53.58% (p=0.008 n=5+5) LastIndexAnyUTF8/16:4-8 634ns ± 0% 296ns ± 0% -53.31% (p=0.008 n=5+5) LastIndexAnyUTF8/16:8-8 1.30µs ± 0% 0.30µs ± 0% -77.18% (p=0.000 n=4+5) LastIndexAnyUTF8/16:16-8 1.66µs ± 0% 0.30µs ± 0% -82.19% (p=0.008 n=5+5) LastIndexAnyUTF8/16:32-8 2.90µs ± 0% 0.38µs ± 0% 
-87.00% (p=0.029 n=4+4) LastIndexAnyUTF8/16:64-8 5.10µs ± 0% 0.42µs ± 0% -91.78% (p=0.008 n=5+5) LastIndexAnyUTF8/256:1-8 5.42µs ± 0% 2.12µs ± 0% -60.92% (p=0.008 n=5+5) LastIndexAnyUTF8/256:2-8 9.79µs ± 0% 4.26µs ± 0% -56.47% (p=0.008 n=5+5) LastIndexAnyUTF8/256:4-8 9.66µs ± 0% 4.26µs ± 0% -55.87% (p=0.008 n=5+5) LastIndexAnyUTF8/256:8-8 20.4µs ± 0% 4.3µs ± 0% -79.10% (p=0.008 n=5+5) LastIndexAnyUTF8/256:16-8 26.0µs ± 1% 4.3µs ± 0% -83.62% (p=0.008 n=5+5) LastIndexAnyUTF8/256:32-8 46.0µs ± 0% 5.5µs ± 0% -88.09% (p=0.008 n=5+5) LastIndexAnyUTF8/256:64-8 81.1µs ± 0% 6.2µs ± 0% -92.38% (p=0.008 n=5+5) name old time/op new time/op delta pkg:strings goos:linux goarch:amd64 IndexAnyASCII/1:1-48 10.0ns ± 0% 13.3ns ± 0% +33.00% (p=0.008 n=5+5) IndexAnyASCII/1:2-48 11.0ns ± 0% 15.5ns ± 0% +40.55% (p=0.016 n=4+5) IndexAnyASCII/1:4-48 12.9ns ± 0% 15.4ns ± 0% +19.69% (p=0.008 n=5+5) IndexAnyASCII/1:8-48 18.6ns ± 0% 15.5ns ± 0% -16.45% (p=0.000 n=4+5) IndexAnyASCII/1:16-48 30.1ns ± 0% 16.9ns ± 0% ~ (p=0.079 n=4+5) IndexAnyASCII/1:32-48 53.1ns ± 0% 18.6ns ± 0% -64.95% (p=0.000 n=5+4) IndexAnyASCII/1:64-48 98.9ns ± 0% 17.4ns ± 0% -82.41% (p=0.000 n=5+4) IndexAnyASCII/16:1-48 35.0ns ± 0% 14.2ns ± 0% -59.47% (p=0.000 n=5+4) IndexAnyASCII/16:2-48 35.5ns ± 0% 35.6ns ± 0% ~ (p=0.238 n=5+4) IndexAnyASCII/16:4-48 40.8ns ± 0% 40.7ns ± 1% ~ (p=0.643 n=5+5) IndexAnyASCII/16:8-48 50.8ns ± 0% 50.9ns ± 1% ~ (p=1.000 n=4+5) IndexAnyASCII/16:16-48 64.0ns ± 1% 64.5ns ± 1% ~ (p=0.071 n=5+5) IndexAnyASCII/16:32-48 98.3ns ± 0% 100.8ns ± 1% +2.52% (p=0.008 n=5+5) IndexAnyASCII/16:64-48 156ns ± 0% 157ns ± 0% ~ (p=0.238 n=4+5) IndexAnyASCII/256:1-48 299ns ± 0% 24ns ± 3% -92.12% (p=0.008 n=5+5) IndexAnyASCII/256:2-48 303ns ± 0% 304ns ± 0% ~ (p=0.762 n=5+5) IndexAnyASCII/256:4-48 311ns ± 0% 311ns ± 0% ~ (p=0.476 n=5+5) IndexAnyASCII/256:8-48 321ns ± 0% 321ns ± 0% ~ (p=0.429 n=4+5) IndexAnyASCII/256:16-48 334ns ± 0% 335ns ± 0% ~ (p=0.079 n=5+4) IndexAnyASCII/256:32-48 367ns ± 0% 365ns ± 0% ~ (p=0.079 
n=4+5) IndexAnyASCII/256:64-48 431ns ± 1% 421ns ± 0% -2.27% (p=0.008 n=5+5) IndexAnyUTF8/1:1-48 17.2ns ± 0% 10.8ns ± 0% -37.21% (p=0.029 n=4+4) IndexAnyUTF8/1:2-48 26.7ns ± 0% 15.6ns ± 0% ~ (p=0.079 n=4+5) IndexAnyUTF8/1:4-48 28.2ns ± 0% 15.6ns ± 0% -44.68% (p=0.000 n=5+4) IndexAnyUTF8/1:8-48 48.8ns ± 0% 15.6ns ± 0% -68.03% (p=0.029 n=4+4) IndexAnyUTF8/1:16-48 58.3ns ± 0% 16.2ns ± 0% ~ (p=0.079 n=4+5) IndexAnyUTF8/1:32-48 103ns ± 0% 18ns ± 0% -82.27% (p=0.008 n=5+5) IndexAnyUTF8/1:64-48 182ns ± 0% 17ns ± 0% -90.53% (p=0.008 n=5+5) IndexAnyUTF8/16:1-48 197ns ± 0% 25ns ± 0% -87.34% (p=0.000 n=5+4) IndexAnyUTF8/16:2-48 348ns ± 0% 163ns ± 0% -53.11% (p=0.000 n=5+4) IndexAnyUTF8/16:4-48 374ns ± 0% 163ns ± 0% -56.37% (p=0.000 n=5+4) IndexAnyUTF8/16:8-48 716ns ± 0% 163ns ± 0% -77.22% (p=0.000 n=5+4) IndexAnyUTF8/16:16-48 859ns ± 0% 175ns ± 0% -79.63% (p=0.000 n=5+4) IndexAnyUTF8/16:32-48 1.58µs ± 0% 0.20µs ± 0% -87.01% (p=0.029 n=4+4) IndexAnyUTF8/16:64-48 2.84µs ± 0% 0.19µs ± 1% -93.34% (p=0.008 n=5+5) IndexAnyUTF8/256:1-48 2.61µs ± 0% 0.27µs ± 0% -89.81% (p=0.008 n=5+5) IndexAnyUTF8/256:2-48 4.95µs ± 0% 2.23µs ± 0% -54.91% (p=0.016 n=5+4) IndexAnyUTF8/256:4-48 5.55µs ± 0% 2.23µs ± 0% -59.72% (p=0.008 n=5+5) IndexAnyUTF8/256:8-48 10.8µs ± 0% 2.2µs ± 0% -79.39% (p=0.008 n=5+5) IndexAnyUTF8/256:16-48 13.1µs ± 0% 2.5µs ± 0% -81.21% (p=0.016 n=4+5) IndexAnyUTF8/256:32-48 24.7µs ± 0% 2.8µs ± 0% -88.49% (p=0.008 n=5+5) IndexAnyUTF8/256:64-48 45.0µs ± 0% 2.6µs ± 1% -94.23% (p=0.008 n=5+5) LastIndexAnyASCII/1:1-48 13.9ns ± 0% 15.2ns ± 0% +9.35% (p=0.008 n=5+5) LastIndexAnyASCII/1:2-48 14.4ns ± 0% 15.2ns ± 0% +5.56% (p=0.008 n=5+5) LastIndexAnyASCII/1:4-48 16.7ns ± 0% 15.2ns ± 0% -8.98% (p=0.008 n=5+5) LastIndexAnyASCII/1:8-48 24.0ns ± 0% 15.2ns ± 0% -36.67% (p=0.008 n=5+5) LastIndexAnyASCII/1:16-48 35.6ns ± 0% 15.0ns ± 0% -57.82% (p=0.008 n=5+5) LastIndexAnyASCII/1:32-48 68.9ns ± 0% 16.7ns ± 0% -75.75% (p=0.008 n=5+5) LastIndexAnyASCII/1:64-48 104ns ± 0% 17ns ± 1% -83.81% 
(p=0.008 n=5+5) LastIndexAnyASCII/16:1-48 35.0ns ± 0% 35.0ns ± 0% ~ (all equal) LastIndexAnyASCII/16:2-48 35.6ns ± 0% 35.6ns ± 0% ~ (all equal) LastIndexAnyASCII/16:4-48 41.0ns ± 0% 40.8ns ± 0% -0.49% (p=0.032 n=5+5) LastIndexAnyASCII/16:8-48 50.9ns ± 0% 50.7ns ± 1% ~ (p=0.397 n=5+5) LastIndexAnyASCII/16:16-48 64.3ns ± 1% 64.4ns ± 1% ~ (p=1.000 n=4+5) LastIndexAnyASCII/16:32-48 100ns ± 0% 100ns ± 0% +0.38% (p=0.016 n=4+5) LastIndexAnyASCII/16:64-48 157ns ± 1% 163ns ± 0% +3.82% (p=0.008 n=5+5) LastIndexAnyASCII/256:1-48 302ns ± 0% 300ns ± 0% -0.53% (p=0.008 n=5+5) LastIndexAnyASCII/256:2-48 305ns ± 0% 303ns ± 0% -0.66% (p=0.000 n=5+4) LastIndexAnyASCII/256:4-48 313ns ± 0% 307ns ± 0% -2.04% (p=0.000 n=4+5) LastIndexAnyASCII/256:8-48 323ns ± 0% 315ns ± 0% -2.48% (p=0.029 n=4+4) LastIndexAnyASCII/256:16-48 333ns ± 0% 332ns ± 0% -0.30% (p=0.048 n=5+5) LastIndexAnyASCII/256:32-48 366ns ± 0% 367ns ± 0% ~ (p=0.238 n=4+5) LastIndexAnyASCII/256:64-48 430ns ± 0% 430ns ± 0% ~ (p=1.000 n=5+5) LastIndexAnyUTF8/1:1-48 21.1ns ± 0% 13.9ns ± 0% -34.00% (p=0.008 n=5+5) LastIndexAnyUTF8/1:2-48 29.5ns ± 0% 13.9ns ± 0% -52.95% (p=0.008 n=5+5) LastIndexAnyUTF8/1:4-48 31.6ns ± 0% 13.9ns ± 0% -55.96% (p=0.008 n=5+5) LastIndexAnyUTF8/1:8-48 51.1ns ± 0% 13.9ns ± 0% -72.81% (p=0.008 n=5+5) LastIndexAnyUTF8/1:16-48 58.9ns ± 0% 14.6ns ± 0% -75.23% (p=0.016 n=5+4) LastIndexAnyUTF8/1:32-48 103ns ± 0% 16ns ± 1% -84.12% (p=0.008 n=5+5) LastIndexAnyUTF8/1:64-48 177ns ± 0% 17ns ± 1% -90.62% (p=0.008 n=5+5) LastIndexAnyUTF8/16:1-48 275ns ± 1% 105ns ± 0% -61.85% (p=0.000 n=5+4) LastIndexAnyUTF8/16:2-48 406ns ± 0% 216ns ± 0% -46.70% (p=0.008 n=5+5) LastIndexAnyUTF8/16:4-48 458ns ± 0% 216ns ± 0% -52.75% (p=0.000 n=4+5) LastIndexAnyUTF8/16:8-48 753ns ± 0% 216ns ± 0% -71.31% (p=0.029 n=4+4) LastIndexAnyUTF8/16:16-48 902ns ± 0% 221ns ± 0% -75.50% (p=0.016 n=5+4) LastIndexAnyUTF8/16:32-48 1.57µs ± 0% 0.24µs ± 0% -84.46% (p=0.008 n=5+5) LastIndexAnyUTF8/16:64-48 2.77µs ± 0% 0.24µs ± 0% -91.22% (p=0.000 n=5+4) 
LastIndexAnyUTF8/256:1-48 4.06µs ± 0% 1.53µs ± 0% -62.26% (p=0.008 n=5+5) LastIndexAnyUTF8/256:2-48 5.92µs ± 0% 3.04µs ± 0% -48.55% (p=0.016 n=4+5) LastIndexAnyUTF8/256:4-48 6.82µs ± 0% 3.04µs ± 0% -55.34% (p=0.008 n=5+5) LastIndexAnyUTF8/256:8-48 11.5µs ± 0% 3.0µs ± 0% -73.48% (p=0.008 n=5+5) LastIndexAnyUTF8/256:16-48 14.1µs ± 0% 3.1µs ± 0% -77.85% (p=0.008 n=5+5) LastIndexAnyUTF8/256:32-48 24.5µs ± 0% 3.5µs ± 0% -85.85% (p=0.016 n=5+4) LastIndexAnyUTF8/256:64-48 44.0µs ± 0% 3.5µs ± 0% -92.12% (p=0.008 n=5+5) pkg:bytes goos:linux goarch:amd64 IndexAnyASCII/1:1-48 9.56ns ± 0% 11.00ns ± 0% +15.06% (p=0.016 n=5+4) IndexAnyASCII/1:2-48 11.0ns ± 0% 10.8ns ± 2% -1.64% (p=0.048 n=5+5) IndexAnyASCII/1:4-48 13.9ns ± 0% 11.0ns ± 1% -21.15% (p=0.008 n=5+5) IndexAnyASCII/1:8-48 19.6ns ± 0% 10.8ns ± 3% -44.90% (p=0.008 n=5+5) IndexAnyASCII/1:16-48 31.1ns ± 0% 11.5ns ± 0% -63.02% (p=0.008 n=5+5) IndexAnyASCII/1:32-48 54.0ns ± 0% 11.8ns ± 0% -78.15% (p=0.000 n=5+4) IndexAnyASCII/1:64-48 100ns ± 0% 13ns ± 0% -86.89% (p=0.008 n=5+5) IndexAnyASCII/16:1-48 35.5ns ± 0% 14.8ns ± 0% -58.26% (p=0.008 n=5+5) IndexAnyASCII/16:2-48 36.2ns ± 1% 36.0ns ± 1% ~ (p=0.087 n=5+5) IndexAnyASCII/16:4-48 40.3ns ± 1% 39.7ns ± 4% ~ (p=0.175 n=4+5) IndexAnyASCII/16:8-48 48.7ns ± 5% 45.8ns ± 0% -6.02% (p=0.016 n=5+4) IndexAnyASCII/16:16-48 64.1ns ±11% 62.1ns ± 1% ~ (p=0.143 n=5+5) IndexAnyASCII/16:32-48 97.9ns ± 1% 98.3ns ± 1% ~ (p=0.294 n=5+5) IndexAnyASCII/16:64-48 163ns ± 0% 157ns ± 0% -3.68% (p=0.008 n=5+5) IndexAnyASCII/256:1-48 389ns ± 0% 25ns ± 0% -93.65% (p=0.000 n=5+4) IndexAnyASCII/256:2-48 391ns ± 0% 307ns ± 0% -21.48% (p=0.000 n=5+4) IndexAnyASCII/256:4-48 394ns ± 0% 323ns ± 0% -17.92% (p=0.008 n=5+5) IndexAnyASCII/256:8-48 402ns ± 0% 323ns ± 0% -19.51% (p=0.008 n=5+5) IndexAnyASCII/256:16-48 414ns ± 0% 334ns ± 0% -19.32% (p=0.016 n=4+5) IndexAnyASCII/256:32-48 446ns ± 0% 367ns ± 0% -17.75% (p=0.016 n=5+4) IndexAnyASCII/256:64-48 511ns ± 0% 424ns ± 0% -17.02% (p=0.008 n=5+5) 
IndexAnyUTF8/1:1-48  17.4ns ± 0%  11.0ns ± 0%  -36.64%  (p=0.008 n=5+5)
IndexAnyUTF8/1:2-48  27.3ns ± 1%  11.0ns ± 0%  -59.74%  (p=0.008 n=5+5)
IndexAnyUTF8/1:4-48  28.7ns ± 0%  11.0ns ± 0%  -61.73%  (p=0.008 n=5+5)
IndexAnyUTF8/1:8-48  49.2ns ± 0%  11.0ns ± 0%  -77.66%  (p=0.008 n=5+5)
IndexAnyUTF8/1:16-48  56.0ns ± 0%  11.5ns ± 0%  -79.46%  (p=0.000 n=5+4)
IndexAnyUTF8/1:32-48  102ns ± 0%  12ns ± 0%  -88.24%  (p=0.008 n=5+5)
IndexAnyUTF8/1:64-48  177ns ± 0%  13ns ± 0%  -92.51%  (p=0.008 n=5+5)
IndexAnyUTF8/16:1-48  212ns ± 0%  112ns ± 0%  -47.17%  (p=0.008 n=5+5)
IndexAnyUTF8/16:2-48  356ns ± 0%  159ns ± 1%  -55.28%  (p=0.000 n=4+5)
IndexAnyUTF8/16:4-48  372ns ± 0%  158ns ± 0%  -57.47%  (p=0.008 n=5+5)
IndexAnyUTF8/16:8-48  712ns ± 0%  159ns ± 1%  -77.70%  (p=0.008 n=5+5)
IndexAnyUTF8/16:16-48  829ns ± 0%  129ns ± 0%  -84.44%  (p=0.008 n=5+5)
IndexAnyUTF8/16:32-48  1.55µs ± 0%  0.16µs ± 0%  -89.87%  (p=0.008 n=5+5)
IndexAnyUTF8/16:64-48  2.77µs ± 0%  0.14µs ± 0%  -94.94%  (p=0.008 n=5+5)
IndexAnyUTF8/256:1-48  2.85µs ± 0%  1.63µs ± 1%  -42.74%  (p=0.008 n=5+5)
IndexAnyUTF8/256:2-48  5.14µs ± 1%  2.03µs ± 0%  -60.51%  (p=0.008 n=5+5)
IndexAnyUTF8/256:4-48  5.56µs ± 0%  2.03µs ± 0%  -63.52%  (p=0.008 n=5+5)
IndexAnyUTF8/256:8-48  10.8µs ± 0%  2.0µs ± 0%  -81.22%  (p=0.008 n=5+5)
IndexAnyUTF8/256:16-48  12.9µs ± 0%  1.9µs ± 0%  -85.55%  (p=0.008 n=5+5)
IndexAnyUTF8/256:32-48  24.2µs ± 0%  2.1µs ± 0%  -91.29%  (p=0.016 n=5+4)
IndexAnyUTF8/256:64-48  43.7µs ± 0%  2.0µs ± 0%  -95.32%  (p=0.016 n=5+4)
LastIndexAnyASCII/1:1-48  13.7ns ± 1%  12.8ns ± 0%  -6.57%  (p=0.016 n=5+4)
LastIndexAnyASCII/1:2-48  14.7ns ± 0%  12.7ns ± 1%  -13.33%  (p=0.000 n=4+5)
LastIndexAnyASCII/1:4-48  16.9ns ± 0%  12.7ns ± 1%  -24.73%  (p=0.000 n=4+5)
LastIndexAnyASCII/1:8-48  20.5ns ± 0%  12.7ns ± 0%  -37.85%  (p=0.000 n=4+5)
LastIndexAnyASCII/1:16-48  28.0ns ± 0%  11.7ns ± 0%  ~  (p=0.079 n=4+5)
LastIndexAnyASCII/1:32-48  69.8ns ± 0%  12.4ns ± 0%  -82.19%  (p=0.008 n=5+5)
LastIndexAnyASCII/1:64-48  73.8ns ± 0%  13.3ns ± 0%  -82.03%  (p=0.000 n=4+5)
LastIndexAnyASCII/16:1-48  35.5ns ± 0%  35.5ns ± 0%  ~  (all equal)
LastIndexAnyASCII/16:2-48  36.0ns ± 0%  36.1ns ± 0%  +0.28%  (p=0.016 n=4+5)
LastIndexAnyASCII/16:4-48  40.3ns ± 2%  40.0ns ± 6%  ~  (p=0.651 n=5+5)
LastIndexAnyASCII/16:8-48  50.3ns ± 0%  50.2ns ± 9%  ~  (p=0.175 n=4+5)
LastIndexAnyASCII/16:16-48  62.4ns ± 4%  64.4ns ± 0%  +3.28%  (p=0.016 n=5+4)
LastIndexAnyASCII/16:32-48  98.9ns ± 0%  98.4ns ± 0%  -0.53%  (p=0.016 n=5+4)
LastIndexAnyASCII/16:64-48  160ns ± 1%  161ns ± 1%  ~  (p=0.325 n=5+5)
LastIndexAnyASCII/256:1-48  300ns ± 0%  301ns ± 0%  +0.33%  (p=0.008 n=5+5)
LastIndexAnyASCII/256:2-48  304ns ± 0%  304ns ± 0%  ~  (p=1.000 n=5+5)
LastIndexAnyASCII/256:4-48  311ns ± 0%  311ns ± 0%  ~  (p=0.556 n=4+5)
LastIndexAnyASCII/256:8-48  320ns ± 0%  321ns ± 0%  ~  (p=0.143 n=5+5)
LastIndexAnyASCII/256:16-48  333ns ± 0%  335ns ± 0%  +0.60%  (p=0.029 n=4+4)
LastIndexAnyASCII/256:32-48  367ns ± 0%  366ns ± 0%  ~  (p=0.095 n=4+5)
LastIndexAnyASCII/256:64-48  431ns ± 0%  424ns ± 0%  -1.62%  (p=0.008 n=5+5)
LastIndexAnyUTF8/1:1-48  19.7ns ± 1%  11.9ns ± 0%  -39.47%  (p=0.008 n=5+5)
LastIndexAnyUTF8/1:2-48  27.6ns ± 1%  11.9ns ± 0%  -56.82%  (p=0.008 n=5+5)
LastIndexAnyUTF8/1:4-48  29.9ns ± 0%  11.9ns ± 0%  ~  (p=0.079 n=4+5)
LastIndexAnyUTF8/1:8-48  48.7ns ± 0%  11.9ns ± 0%  -75.54%  (p=0.008 n=5+5)
LastIndexAnyUTF8/1:16-48  57.8ns ± 0%  11.4ns ± 0%  -80.26%  (p=0.008 n=5+5)
LastIndexAnyUTF8/1:32-48  94.7ns ± 0%  12.2ns ± 0%  -87.07%  (p=0.008 n=5+5)
LastIndexAnyUTF8/1:64-48  163ns ± 0%  13ns ± 1%  -91.93%  (p=0.008 n=5+5)
LastIndexAnyUTF8/16:1-48  258ns ± 0%  88ns ± 0%  -65.76%  (p=0.008 n=5+5)
LastIndexAnyUTF8/16:2-48  400ns ± 0%  162ns ± 0%  -59.38%  (p=0.008 n=5+5)
LastIndexAnyUTF8/16:4-48  415ns ± 0%  162ns ± 0%  -60.87%  (p=0.008 n=5+5)
LastIndexAnyUTF8/16:8-48  737ns ± 0%  162ns ± 0%  -78.02%  (p=0.000 n=5+4)
LastIndexAnyUTF8/16:16-48  882ns ± 0%  128ns ± 0%  -85.49%  (p=0.008 n=5+5)
LastIndexAnyUTF8/16:32-48  1.47µs ± 0%  0.16µs ± 0%  -89.29%  (p=0.000 n=4+5)
LastIndexAnyUTF8/16:64-48  2.56µs ± 0%  0.14µs ± 0%  -94.41%  (p=0.016 n=5+4)
LastIndexAnyUTF8/256:1-48  3.60µs ± 0%  1.23µs ± 0%  -65.67%  (p=0.008 n=5+5)
LastIndexAnyUTF8/256:2-48  5.78µs ± 0%  2.18µs ± 0%  -62.32%  (p=0.008 n=5+5)
LastIndexAnyUTF8/256:4-48  6.26µs ± 0%  2.18µs ± 0%  -65.15%  (p=0.008 n=5+5)
LastIndexAnyUTF8/256:8-48  11.2µs ± 0%  2.2µs ± 0%  -80.53%  (p=0.008 n=5+5)
LastIndexAnyUTF8/256:16-48  13.5µs ± 0%  1.9µs ± 0%  -86.02%  (p=0.016 n=4+5)
LastIndexAnyUTF8/256:32-48  23.0µs ± 0%  2.1µs ± 0%  -90.72%  (p=0.008 n=5+5)
LastIndexAnyUTF8/256:64-48  40.5µs ± 0%  2.1µs ± 0%  -94.73%  (p=0.008 n=5+5)

Change-Id: Ie05e306f8b184b989701868cb161ce8b3f18203b
Reviewed-on: https://go-review.googlesource.com/c/go/+/156998
Run-TryBot: eric fang <eric.fang@arm.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-03-04  bytes, strings: moves indexRabinKarp function to internal/bytealg
Author: erifan01

To facilitate optimization of IndexAny and LastIndexAny, this patch moves three Rabin-Karp related functions (indexRabinKarp, hashStr and hashStrRev) from the strings package to internal/bytealg. The bytes package has three functions with the same names and behavior but different parameter types. To highlight this difference, this patch also moves them to internal/bytealg and gives them slightly different names.

Related benchmark changes on amd64 and arm64:

name  old time/op  new time/op  delta
pkg:strings goos:linux goarch:amd64
Index-16  14.0ns ± 1%  14.1ns ± 2%  ~  (p=0.738 n=5+5)
LastIndex-16  15.5ns ± 1%  15.7ns ± 4%  ~  (p=0.897 n=5+5)
pkg:bytes goos:linux goarch:amd64
Index/10-16  26.5ns ± 1%  26.5ns ± 0%  ~  (p=0.873 n=5+5)
Index/32-16  26.2ns ± 0%  25.7ns ± 0%  -1.68%  (p=0.008 n=5+5)
Index/4K-16  5.12µs ± 4%  5.14µs ± 2%  ~  (p=0.841 n=5+5)
Index/4M-16  5.44ms ± 3%  5.34ms ± 2%  ~  (p=0.056 n=5+5)
Index/64M-16  85.8ms ± 3%  84.6ms ± 0%  -1.37%  (p=0.016 n=5+5)

name  old speed  new speed  delta
pkg:bytes goos:linux goarch:amd64
Index/10-16  377MB/s ± 1%  377MB/s ± 0%  ~  (p=1.000 n=5+5)
Index/32-16  1.22GB/s ± 1%  1.24GB/s ± 0%  +1.66%  (p=0.008 n=5+5)
Index/4K-16  800MB/s ± 4%  797MB/s ± 2%  ~  (p=0.841 n=5+5)
Index/4M-16  771MB/s ± 3%  786MB/s ± 2%  ~  (p=0.056 n=5+5)
Index/64M-16  783MB/s ± 3%  793MB/s ± 0%  +1.36%  (p=0.016 n=5+5)

name  old time/op  new time/op  delta
pkg:strings goos:linux goarch:arm64
Index-8  22.6ns ± 0%  22.5ns ± 0%  ~  (p=0.167 n=5+5)
LastIndex-8  17.5ns ± 0%  17.5ns ± 0%  ~  (all equal)
pkg:bytes goos:linux goarch:arm64
Index/10-8  25.0ns ± 0%  25.0ns ± 0%  ~  (all equal)
Index/32-8  160ns ± 0%  160ns ± 0%  ~  (all equal)
Index/4K-8  6.26µs ± 0%  6.26µs ± 0%  ~  (p=0.167 n=5+5)
Index/4M-8  6.30ms ± 0%  6.31ms ± 0%  ~  (p=1.000 n=5+5)
Index/64M-8  101ms ± 0%  101ms ± 0%  ~  (p=0.690 n=5+5)

name  old speed  new speed  delta
pkg:bytes goos:linux goarch:arm64
Index/10-8  399MB/s ± 0%  400MB/s ± 0%  +0.08%  (p=0.008 n=5+5)
Index/32-8  200MB/s ± 0%  200MB/s ± 0%  ~  (p=0.127 n=4+5)
Index/4K-8  654MB/s ± 0%  654MB/s ± 0%  +0.01%  (p=0.016 n=5+5)
Index/4M-8  665MB/s ± 0%  665MB/s ± 0%  ~  (p=0.833 n=5+5)
Index/64M-8  665MB/s ± 0%  665MB/s ± 0%  ~  (p=0.913 n=5+5)

Change-Id: Icce3bc162bb8613ac36dc963a46c51f8e82ab842
Reviewed-on: https://go-review.googlesource.com/c/go/+/208638
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
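
For readers unfamiliar with the technique named in the commit above, here is a minimal Rabin-Karp substring search in Go. It is a sketch, not the code that was moved: the names rkPrime, rkHash and rkIndex are hypothetical, and the multiplier is an illustrative choice. The idea is to hash the pattern once, keep a rolling hash of the current window of the text, and fall back to a direct string comparison only when the two hashes agree.

```go
package main

import "fmt"

// rkPrime is the multiplier of the rolling hash. The value is an
// illustrative choice and is not taken from the commit.
const rkPrime = 16777619

// rkHash (hypothetical helper) returns the hash of sep together with
// rkPrime^len(sep), the factor needed to drop the oldest byte from a
// rolling window. Arithmetic wraps modulo 2^32.
func rkHash(sep string) (hash, pow uint32) {
	pow = 1
	for i := 0; i < len(sep); i++ {
		hash = hash*rkPrime + uint32(sep[i])
		pow *= rkPrime
	}
	return hash, pow
}

// rkIndex (hypothetical name) returns the index of the first occurrence
// of substr in s, or -1 if substr is not present.
func rkIndex(s, substr string) int {
	n := len(substr)
	if n > len(s) {
		return -1
	}
	want, pow := rkHash(substr)

	// Hash the first window s[0:n].
	var h uint32
	for i := 0; i < n; i++ {
		h = h*rkPrime + uint32(s[i])
	}
	if h == want && s[:n] == substr {
		return 0
	}

	// Slide the window one byte at a time: add s[i], remove s[i-n].
	for i := n; i < len(s); i++ {
		h = h*rkPrime + uint32(s[i]) - pow*uint32(s[i-n])
		// A hash match can be a collision, so confirm with a comparison.
		if h == want && s[i-n+1:i+1] == substr {
			return i - n + 1
		}
	}
	return -1
}

func main() {
	fmt.Println(rkIndex("internal/bytealg", "bytealg"))
	fmt.Println(rkIndex("internal/bytealg", "strings"))
}
```

Because the hash is updated in constant time per byte, the search is linear in the length of the text as long as hash collisions are rare.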
2020-02-04  internal/bytealg: fix riscv64 offset names
Author: Josh Bleecher Snyder

Vet caught that these were incorrect.

Updates #37022

Change-Id: I7b5cd8032ea95eb8e0729f6a4f386aec613c71d8
Reviewed-on: https://go-review.googlesource.com/c/go/+/217777
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-11-15  all: fix a bunch of misspellings
Author: Ville Skyttä

Change-Id: I5b909df0fd048cd66c5a27fca1b06466d3bcaac7
GitHub-Last-Rev: 778c5d21311abee09a5fbda2e4005a5fd4cc3f9f
GitHub-Pull-Request: golang/go#35624
Reviewed-on: https://go-review.googlesource.com/c/go/+/207421
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-11-11  internal/cpu,internal/bytealg: add support for riscv64
Author: Joel Sing

Based on riscv-go port.

Updates #27532

Change-Id: Ia3aed521d4109e7b73f762c5a3cdacc7cdac430d
Reviewed-on: https://go-review.googlesource.com/c/go/+/204635
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-11-04  internal/bytealg: add SIMD byte count implementation for s390x
Author: Michael Munday

Add a 'single lane' SIMD implementation of the single byte count function for use on machines that support the vector facility. This allows up to 16 bytes to be counted per loop iteration.

We can probably improve performance further by adding more 'lanes' (i.e. counting more bytes in parallel); however, this will increase the complexity of the function, so I'm not sure it is worth doing yet.

name  old speed  new speed  delta
pkg:strings goos:linux goarch:s390x
CountByte/10  789MB/s ± 0%  1131MB/s ± 0%  +43.44%  (p=0.000 n=9+9)
CountByte/32  936MB/s ± 0%  3236MB/s ± 0%  +245.87%  (p=0.000 n=8+9)
CountByte/4096  1.06GB/s ± 0%  21.26GB/s ± 0%  +1907.07%  (p=0.000 n=10+10)
CountByte/4194304  1.06GB/s ± 0%  20.54GB/s ± 0%  +1838.50%  (p=0.000 n=10+10)
CountByte/67108864  1.06GB/s ± 0%  18.31GB/s ± 0%  +1629.51%  (p=0.000 n=10+10)
pkg:bytes goos:linux goarch:s390x
CountSingle/10  800MB/s ± 0%  986MB/s ± 0%  +23.21%  (p=0.000 n=9+10)
CountSingle/32  925MB/s ± 0%  2744MB/s ± 0%  +196.55%  (p=0.000 n=9+10)
CountSingle/4K  1.26GB/s ± 0%  19.44GB/s ± 0%  +1445.59%  (p=0.000 n=10+10)
CountSingle/4M  1.26GB/s ± 0%  20.28GB/s ± 0%  +1510.26%  (p=0.000 n=8+10)
CountSingle/64M  1.23GB/s ± 0%  17.78GB/s ± 0%  +1350.67%  (p=0.000 n=9+10)

Change-Id: I230d57905db92a8fdfc50b1d5be338941ae3a7a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/199979
Run-TryBot: Michael Munday <mike.munday@ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
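
The loop structure described in the commit above (one 16-byte block per iteration, then a tail) can be sketched in portable Go. This is only a scalar illustration under assumed names: countByteBlocks is hypothetical, and the real change is s390x vector assembly in which the inner loop below corresponds to a single vector compare over the whole block.

```go
package main

import "fmt"

// blockSize mirrors the 16 bytes handled per iteration that the commit
// message mentions.
const blockSize = 16

// countByteBlocks (hypothetical name) counts occurrences of c in b.
// It walks b one 16-byte block at a time, then handles the remaining
// bytes individually.
func countByteBlocks(b []byte, c byte) int {
	n := 0
	for len(b) >= blockSize {
		// In a SIMD version this inner loop would be one vector
		// compare followed by a count of the matching lanes.
		for _, x := range b[:blockSize] {
			if x == c {
				n++
			}
		}
		b = b[blockSize:]
	}
	// Tail: fewer than 16 bytes remain.
	for _, x := range b {
		if x == c {
			n++
		}
	}
	return n
}

func main() {
	data := []byte("abracadabra, abracadabra, abracadabra")
	fmt.Println(countByteBlocks(data, 'a'))
}
```

Adding more 'lanes', as the commit message suggests, would amount to processing several such blocks per iteration with independent accumulators.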
2019-10-11  runtime,internal/bytealg: optimize wasmZero, wasmMove, Compare
Author: Agniva De Sarker

Coalesce set/get pairs into a tee.

Change-Id: I88ccdcb148465615437bebf24145e941a037e0a5
Reviewed-on: https://go-review.googlesource.com/c/go/+/200357
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Richard Musiol <neelance@gmail.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-10  all: remove nacl (part 3, more amd64p32)
Author: Brad Fitzpatrick

Part 1: CL 199499 (GOOS nacl)
Part 2: CL 200077 (amd64p32 files, toolchain)
Part 3: stuff that arguably should've been part of Part 2, but I forgot one of my grep patterns when splitting the original CL up into two parts.

This one might also have interesting stuff to resurrect for any future x32 ABI support.

Updates #30439

Change-Id: I2b4143374a253a003666f3c69e776b7e456bdb9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/200318
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-10-09  all: remove the nacl port (part 2, amd64p32 + toolchain)
Author: Brad Fitzpatrick

This is part two of the nacl removal. Part 1 was CL 199499.

This CL removes amd64p32 support, which might be useful in the future if we implement the x32 ABI. It also removes the nacl bits in the toolchain, and some remaining nacl bits.

Updates #30439

Change-Id: I2475d5bb066d1b474e00e40d95b520e7c2e286e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/200077
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-10-03  internal/bytealg: (re)adding mips64x compare implementation
Author: Meng Zhuo

The original CL for the mips64x compare function was reverted because its implementation was wrong on little-endian.

Original CL: https://go-review.googlesource.com/c/go/+/196837

name  old time/op  new time/op  delta
BytesCompare/1  28.9ns ± 4%  22.1ns ± 0%  -23.60%  (p=0.000 n=9+8)
BytesCompare/2  34.6ns ± 0%  23.1ns ± 0%  -33.25%  (p=0.000 n=8+10)
BytesCompare/4  54.6ns ± 0%  40.8ns ± 0%  -25.27%  (p=0.000 n=8+8)
BytesCompare/8  73.9ns ± 0%  49.1ns ± 0%  -33.56%  (p=0.000 n=8+8)
BytesCompare/16  113ns ± 0%  24ns ± 0%  -79.20%  (p=0.000 n=9+9)
BytesCompare/32  190ns ± 0%  26ns ± 0%  -86.53%  (p=0.000 n=10+10)
BytesCompare/64  345ns ± 0%  44ns ± 0%  -87.19%  (p=0.000 n=10+8)
BytesCompare/128  654ns ± 0%  52ns ± 0%  -91.97%  (p=0.000 n=9+8)
BytesCompare/256  1.27µs ± 0%  0.07µs ± 0%  -94.14%  (p=0.001 n=8+9)
BytesCompare/512  2.51µs ± 0%  0.12µs ± 0%  -95.26%  (p=0.000 n=9+10)
BytesCompare/1024  4.99µs ± 0%  0.21µs ± 0%  -95.85%  (p=0.000 n=8+10)
BytesCompare/2048  9.94µs ± 0%  0.38µs ± 0%  -96.14%  (p=0.000 n=8+8)
CompareBytesEqual  105ns ± 0%  64ns ± 0%  -39.43%  (p=0.000 n=10+9)
CompareBytesToNil  34.8ns ± 1%  38.6ns ± 3%  +11.01%  (p=0.000 n=10+10)
CompareBytesEmpty  33.6ns ± 3%  36.6ns ± 0%  +8.77%  (p=0.000 n=10+8)
CompareBytesIdentical  29.7ns ± 0%  40.5ns ± 1%  +36.45%  (p=0.000 n=10+8)
CompareBytesSameLength  69.1ns ± 0%  51.8ns ± 0%  -25.04%  (p=0.000 n=10+9)
CompareBytesDifferentLength  69.8ns ± 0%  52.5ns ± 0%  -24.79%  (p=0.000 n=10+8)
CompareBytesBigUnaligned  5.15ms ± 0%  2.19ms ± 0%  -57.59%  (p=0.000 n=9+9)
CompareBytesBig  5.28ms ± 0%  0.28ms ± 0%  -94.64%  (p=0.000 n=8+8)
CompareBytesBigIdentical  29.7ns ± 0%  36.9ns ± 2%  +24.11%  (p=0.000 n=8+10)

name  old speed  new speed  delta
CompareBytesBigUnaligned  204MB/s ± 0%  480MB/s ± 0%  +135.77%  (p=0.000 n=9+9)
CompareBytesBig  198MB/s ± 0%  3704MB/s ± 0%  +1765.97%  (p=0.000 n=8+8)
CompareBytesBigIdentical  35.3TB/s ± 0%  28.4TB/s ± 2%  -19.44%  (p=0.000 n=8+10)

Fixes #34549

Change-Id: I2ef29f13cdd4229745ac2d018bb53c76f2ff1209
Reviewed-on: https://go-review.googlesource.com/c/go/+/197557
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
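
The commit message above does not describe the little-endian bug itself. As a general illustration of why byte order matters when a compare routine reads 8 bytes at a time, the Go sketch below (hypothetical names compareWords and naiveLittleEndianLess, not the mips64x assembly) shows that word-wise comparison matches lexicographic order only when the first byte in memory ends up in the most significant position.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// compareWords (hypothetical name) compares a and b lexicographically,
// 8 bytes at a time where possible. It returns -1, 0 or +1.
func compareWords(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	i := 0
	for ; i+8 <= n; i += 8 {
		// A big-endian load puts the first byte in the most significant
		// position, so unsigned integer order equals byte order.
		x := binary.BigEndian.Uint64(a[i:])
		y := binary.BigEndian.Uint64(b[i:])
		if x != y {
			if x < y {
				return -1
			}
			return 1
		}
	}
	// Compare the remaining bytes one by one.
	for ; i < n; i++ {
		if a[i] != b[i] {
			if a[i] < b[i] {
				return -1
			}
			return 1
		}
	}
	// Common prefix is equal: the shorter slice sorts first.
	switch {
	case len(a) < len(b):
		return -1
	case len(a) > len(b):
		return 1
	}
	return 0
}

// naiveLittleEndianLess shows the pitfall: comparing 8-byte words loaded
// in little-endian order without a byte swap. Both inputs must hold at
// least 8 bytes.
func naiveLittleEndianLess(a, b []byte) bool {
	return binary.LittleEndian.Uint64(a) < binary.LittleEndian.Uint64(b)
}

func main() {
	a := []byte{1, 0, 0, 0, 0, 0, 0, 2}
	b := []byte{2, 0, 0, 0, 0, 0, 0, 1}
	fmt.Println(compareWords(a, b))          // a sorts before b
	fmt.Println(naiveLittleEndianLess(a, b)) // the naive check disagrees
}
```

On a little-endian machine a native 8-byte load behaves like the naive version, so a word-at-a-time compare has to swap bytes (or locate the first differing byte) before deciding the order.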