aboutsummaryrefslogtreecommitdiff
path: root/src/internal/bytealg
AgeCommit message (Collapse)Author
2026-03-03internal/bytealg: on android/arm64, don't read outside 16-byte regionsKeith Randall
CL 749062 attempted this, but handled only the case of reading past the end. This CL handles the case of reading before the beginning. Update #59090 Change-Id: Ia21166a9a3fb20ac9003c192589a3d92304c9ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/751020 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2026-02-27runtime: on android/arm64, don't read outside 16-byte regionsKeith Randall
Because MTE might be enforced. Update #59090 Update #27610 Change-Id: Idfaecbf3b7a93c5e371abcace666febfc303de9a Reviewed-on: https://go-review.googlesource.com/c/go/+/749062 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com>
2026-02-06internal/bytealg: unexport {Last,}IndexRabinKarp helpersTobias Klauser
After CL 538737 these no longer need to be exported. They are only used in internal/bytealg and can be unexported. Change-Id: Idd405f397c7ec9f96425d2b7e0e74de61daa7a6e Reviewed-on: https://go-review.googlesource.com/c/go/+/741920 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Tobias Klauser <tobias.klauser@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com>
2025-11-24internal/bytealg: port bytealg functions to reg ABI on s390xSrinivas Pokala
This adds support for the reg ABI to the byte/string functions for s390x. These are initially under control of the GOEXPERIMENT macro until all changes are in. Updates #40724 Change-Id: Ia3532523fe3a839cc0370d6fe1544972327be514 Reviewed-on: https://go-review.googlesource.com/c/go/+/719481 Reviewed-by: Vishwanatha HD <vishwanatha.hd@ibm.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>
2025-08-08internal/bytealg: vector implementation of compare for riscv64Joel Sing
Provide a vector implementation of compare for riscv64, which is used when compiled with the rva23u64 profile, or when vector is detected to be available. Inputs that are 8 byte aligned will still be handled via a the non-vector code if the length is less than or equal to 128 bytes. On a Banana Pi F3, with GORISCV64=rva23u64: │ compare.1 │ compare.2 │ │ sec/op │ sec/op vs base │ BytesCompare/1-8 24.36n ± 0% 24.15n ± 0% -0.84% (p=0.007 n=10) BytesCompare/2-8 26.75n ± 0% 26.97n ± 0% +0.82% (p=0.000 n=10) BytesCompare/4-8 27.63n ± 0% 27.80n ± 0% +0.60% (p=0.001 n=10) BytesCompare/8-8 35.91n ± 0% 35.19n ± 0% -2.01% (p=0.000 n=10) BytesCompare/16-8 53.22n ± 0% 24.04n ± 1% -54.82% (p=0.000 n=10) BytesCompare/32-8 25.12n ± 0% 26.09n ± 1% +3.86% (p=0.000 n=10) BytesCompare/64-8 32.52n ± 0% 33.43n ± 1% +2.78% (p=0.000 n=10) BytesCompare/128-8 46.59n ± 0% 48.22n ± 1% +3.50% (p=0.000 n=10) BytesCompare/256-8 74.25n ± 0% 50.18n ± 0% -32.42% (p=0.000 n=10) BytesCompare/512-8 129.85n ± 0% 83.12n ± 0% -35.98% (p=0.000 n=10) BytesCompare/1024-8 244.6n ± 0% 148.0n ± 1% -39.49% (p=0.000 n=10) BytesCompare/2048-8 465.9n ± 0% 282.8n ± 2% -39.30% (p=0.000 n=10) CompareBytesEqual-8 51.96n ± 0% 52.90n ± 1% +1.80% (p=0.000 n=10) CompareBytesToNil-8 15.77n ± 1% 15.68n ± 0% -0.57% (p=0.000 n=10) CompareBytesEmpty-8 14.21n ± 1% 14.20n ± 1% ~ (p=1.000 n=10) CompareBytesIdentical-8 14.20n ± 1% 15.07n ± 1% +6.20% (p=0.000 n=10) CompareBytesSameLength-8 31.38n ± 0% 30.52n ± 0% -2.74% (p=0.000 n=10) CompareBytesDifferentLength-8 31.38n ± 0% 30.53n ± 0% -2.71% (p=0.000 n=10) CompareBytesBigUnaligned/offset=1-8 2401.0µ ± 0% 437.6µ ± 0% -81.77% (p=0.000 n=10) CompareBytesBigUnaligned/offset=2-8 2376.8µ ± 0% 437.4µ ± 0% -81.60% (p=0.000 n=10) CompareBytesBigUnaligned/offset=3-8 2384.1µ ± 0% 437.5µ ± 0% -81.65% (p=0.000 n=10) CompareBytesBigUnaligned/offset=4-8 2377.7µ ± 0% 437.4µ ± 0% -81.60% (p=0.000 n=10) CompareBytesBigUnaligned/offset=5-8 2366.3µ ± 0% 437.5µ ± 0% -81.51% (p=0.000 n=10) CompareBytesBigUnaligned/offset=6-8 2357.3µ ± 0% 437.3µ ± 0% -81.45% (p=0.000 n=10) CompareBytesBigUnaligned/offset=7-8 2385.3µ ± 0% 437.6µ ± 0% -81.65% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=0-8 447.2µ ± 0% 464.8µ ± 0% +3.94% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=1-8 447.7µ ± 0% 453.1µ ± 0% +1.20% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=2-8 447.9µ ± 0% 453.0µ ± 0% +1.15% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=3-8 448.0µ ± 0% 452.5µ ± 0% +1.02% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=4-8 448.0µ ± 0% 452.1µ ± 0% +0.92% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=5-8 447.8µ ± 0% 452.8µ ± 0% +1.12% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=6-8 447.9µ ± 0% 452.4µ ± 0% +1.01% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=7-8 447.9µ ± 0% 452.8µ ± 0% +1.09% (p=0.000 n=10) CompareBytesBig-8 441.2µ ± 0% 461.8µ ± 0% +4.66% (p=0.000 n=10) CompareBytesBigIdentical-8 13.81n ± 0% 13.80n ± 0% ~ (p=0.519 n=10) geomean 3.980µ 2.651µ -33.40% │ compare.1 │ compare.2 │ │ B/s │ B/s vs base │ CompareBytesBigUnaligned/offset=1-8 416.5Mi ± 0% 2285.1Mi ± 0% +448.64% (p=0.000 n=10) CompareBytesBigUnaligned/offset=2-8 420.7Mi ± 0% 2286.4Mi ± 0% +443.43% (p=0.000 n=10) CompareBytesBigUnaligned/offset=3-8 419.5Mi ± 0% 2285.9Mi ± 0% +444.97% (p=0.000 n=10) CompareBytesBigUnaligned/offset=4-8 420.6Mi ± 0% 2286.1Mi ± 0% +443.57% (p=0.000 n=10) CompareBytesBigUnaligned/offset=5-8 422.6Mi ± 0% 2285.7Mi ± 0% +440.86% (p=0.000 n=10) CompareBytesBigUnaligned/offset=6-8 424.2Mi ± 0% 2286.8Mi ± 0% +439.07% (p=0.000 n=10) CompareBytesBigUnaligned/offset=7-8 419.2Mi ± 0% 2285.2Mi ± 0% +445.07% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=0-8 2.184Gi ± 0% 2.101Gi ± 0% -3.79% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=1-8 2.181Gi ± 0% 2.155Gi ± 0% -1.18% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=2-8 2.180Gi ± 0% 2.156Gi ± 0% -1.13% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=3-8 2.180Gi ± 0% 2.158Gi ± 0% -1.01% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=4-8 2.180Gi ± 0% 2.160Gi ± 0% -0.91% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=5-8 2.181Gi ± 0% 2.157Gi ± 0% -1.11% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=6-8 2.181Gi ± 0% 2.159Gi ± 0% -1.00% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=7-8 2.180Gi ± 0% 2.157Gi ± 0% -1.08% (p=0.000 n=10) CompareBytesBig-8 2.213Gi ± 0% 2.115Gi ± 0% -4.45% (p=0.000 n=10) CompareBytesBigIdentical-8 69.06Ti ± 0% 69.09Ti ± 0% ~ (p=0.315 n=10) geomean 2.022Gi 4.022Gi +98.95% Change-Id: Id3012faf8d353eb1be0e1fb01b78ac43fa4c7e8b Reviewed-on: https://go-review.googlesource.com/c/go/+/646737 Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Mark Freeman <markfreeman@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
2025-08-07internal/bytealg: optimize Index/IndexString on loong64limeidan
goos: linux goarch: loong64 pkg: bytes cpu: Loongson-3A6000 @ 2500.00MHz | 3a6000.old.txt | 3a6000.new.txt | | sec/op | sec/op vs base | IndexRune/10 23.56n ± 1% 20.42n ± 0% -13.33% (p=0.000 n=10) IndexRune/32 29.91n ± 1% 22.46n ± 0% -24.90% (p=0.000 n=10) IndexRune/4K 102.45n ± 2% 72.66n ± 0% -29.08% (p=0.000 n=10) IndexRune/4M 111.96µ ± 1% 52.50µ ± 1% -53.11% (p=0.000 n=10) IndexRune/64M 3.653m ± 30% 3.633m ± 0% ~ (p=0.143 n=10) IndexRuneASCII/10 8.736n ± 2% 7.206n ± 0% -17.51% (p=0.000 n=10) IndexRuneASCII/32 10.195n ± 2% 8.008n ± 0% -21.45% (p=0.000 n=10) IndexRuneASCII/4K 70.27n ± 2% 52.84n ± 0% -24.80% (p=0.000 n=10) IndexRuneASCII/4M 98.15µ ± 1% 87.87µ ± 1% -10.47% (p=0.000 n=10) IndexRuneASCII/64M 2.028m ± 0% 1.918m ± 2% -5.41% (p=0.000 n=10) IndexRuneUnicode/Latin/10 18.80n ± 1% 13.61n ± 0% -27.59% (p=0.000 n=10) IndexRuneUnicode/Latin/32 28.09n ± 2% 20.82n ± 0% -25.88% (p=0.000 n=10) IndexRuneUnicode/Latin/4K 373.8n ± 1% 357.1n ± 0% -4.47% (p=0.000 n=10) IndexRuneUnicode/Latin/4M 395.8µ ± 0% 381.0µ ± 0% -3.74% (p=0.000 n=10) IndexRuneUnicode/Latin/64M 8.056m ± 0% 7.614m ± 0% -5.49% (p=0.000 n=10) IndexRuneUnicode/Cyrillic/10 23.72n ± 1% 20.42n ± 0% -13.91% (p=0.000 n=10) IndexRuneUnicode/Cyrillic/32 30.20n ± 1% 22.42n ± 0% -25.77% (p=0.000 n=10) IndexRuneUnicode/Cyrillic/4K 1.134µ ± 1% 1.122µ ± 0% -1.06% (p=0.000 n=10) IndexRuneUnicode/Cyrillic/4M 1.160m ± 1% 1.152m ± 0% -0.72% (p=0.005 n=10) IndexRuneUnicode/Cyrillic/64M 20.26m ± 1% 19.61m ± 0% -3.24% (p=0.000 n=10) IndexRuneUnicode/Han/10 30.11n ± 2% 24.82n ± 0% -17.57% (p=0.000 n=10) IndexRuneUnicode/Han/32 36.16n ± 2% 27.20n ± 0% -24.78% (p=0.000 n=10) IndexRuneUnicode/Han/4K 548.1n ± 0% 524.8n ± 0% -4.25% (p=0.000 n=10) IndexRuneUnicode/Han/4M 706.7µ ± 1% 624.0µ ± 0% -11.70% (p=0.000 n=10) IndexRuneUnicode/Han/64M 12.50m ± 1% 10.84m ± 1% -13.24% (p=0.000 n=10) Index/10 42.03n ± 2% 10.01n ± 0% -76.18% (p=0.000 n=10) Index/32 133.15n ± 1% 40.03n ± 0% -69.94% (p=0.000 n=10) Index/4K 11.647µ ± 1% 2.493µ ± 0% -78.60% (p=0.000 n=10) Index/4M 11.536m ± 0% 2.519m ± 0% -78.16% (p=0.000 n=10) Index/64M 184.60m ± 1% 40.42m ± 0% -78.10% (p=0.000 n=10) IndexEasy/10 17.290n ± 2% 9.608n ± 0% -44.43% (p=0.000 n=10) IndexEasy/32 23.71n ± 2% 16.61n ± 0% -29.95% (p=0.000 n=10) IndexEasy/4K 95.64n ± 2% 68.25n ± 0% -28.64% (p=0.000 n=10) IndexEasy/4M 105.04µ ± 1% 91.94µ ± 0% -12.47% (p=0.000 n=10) IndexEasy/64M 4.280m ± 0% 4.264m ± 0% -0.38% (p=0.002 n=10) Count/10 53.09n ± 1% 16.81n ± 0% -68.33% (p=0.000 n=10) Count/32 142.20n ± 2% 46.44n ± 0% -67.34% (p=0.000 n=10) Count/4K 11.428µ ± 1% 2.500µ ± 1% -78.12% (p=0.000 n=10) Count/4M 11.536m ± 1% 2.520m ± 0% -78.16% (p=0.000 n=10) Count/64M 183.80m ± 1% 40.42m ± 0% -78.01% (p=0.000 n=10) IndexHard1 2906.4µ ± 1% 420.4µ ± 0% -85.54% (p=0.000 n=10) IndexHard2 2918.0µ ± 1% 421.1µ ± 1% -85.57% (p=0.000 n=10) IndexHard3 2912.8µ ± 1% 440.2µ ± 0% -84.89% (p=0.000 n=10) IndexHard4 2909.6µ ± 1% 840.4µ ± 0% -71.12% (p=0.000 n=10) LastIndexHard1 2.939m ± 1% 2.621m ± 0% -10.83% (p=0.000 n=10) LastIndexHard2 2.924m ± 1% 2.624m ± 0% -10.26% (p=0.000 n=10) LastIndexHard3 2.936m ± 1% 2.580m ± 1% -12.12% (p=0.000 n=10) CountHard1 2900.4µ ± 1% 420.0µ ± 0% -85.52% (p=0.000 n=10) CountHard2 2915.6µ ± 1% 420.0µ ± 0% -85.59% (p=0.000 n=10) CountHard3 2905.0µ ± 0% 440.0µ ± 0% -84.85% (p=0.000 n=10) IndexPeriodic/IndexPeriodic2 181.95µ ± 1% 26.28µ ± 0% -85.56% (p=0.000 n=10) IndexPeriodic/IndexPeriodic4 182.59µ ± 1% 26.29µ ± 0% -85.60% (p=0.000 n=10) IndexPeriodic/IndexPeriodic8 183.9µ ± 1% 108.2µ ± 0% -41.14% (p=0.000 n=10) IndexPeriodic/IndexPeriodic16 58.24µ ± 0% 56.58µ ± 0% -2.86% (p=0.000 n=10) IndexPeriodic/IndexPeriodic32 30.82µ ± 0% 29.62µ ± 0% -3.92% (p=0.000 n=10) IndexPeriodic/IndexPeriodic64 16.59µ ± 0% 15.00µ ± 0% -9.62% (p=0.000 n=10) geomean 22.69µ 11.59µ -48.92% Change-Id: Iacc9e686027f99bb0413b566cfc8ee6cd873d2d9 Reviewed-on: https://go-review.googlesource.com/c/go/+/693878 Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Mark Freeman <markfreeman@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-08-06internal/bytealg: vector implementation of indexbyte for riscv64Joel Sing
Provide a vector implementation of indexbyte for riscv64, which is used when compiled with the rva23u64 profile, or when vector is detected to be available. Inputs that are smaller than 24 bytes will continue to use the non-vector path. On a Banana Pi F3, with GORISCV64=rva23u64: │ indexbyte.1 │ indexbyte.2 │ │ sec/op │ sec/op vs base │ IndexByte/10-8 52.68n ± 0% 47.26n ± 0% -10.30% (p=0.000 n=10) IndexByte/32-8 68.62n ± 0% 47.02n ± 0% -31.49% (p=0.000 n=10) IndexByte/4K-8 2217.0n ± 0% 420.4n ± 0% -81.04% (p=0.000 n=10) IndexByte/4M-8 2624.4µ ± 0% 767.5µ ± 0% -70.75% (p=0.000 n=10) IndexByte/64M-8 68.08m ± 10% 47.84m ± 45% -29.73% (p=0.004 n=10) geomean 17.03µ 8.073µ -52.59% │ indexbyte.1 │ indexbyte.2 │ │ B/s │ B/s vs base │ IndexByte/10-8 181.0Mi ± 0% 201.8Mi ± 0% +11.48% (p=0.000 n=10) IndexByte/32-8 444.7Mi ± 0% 649.1Mi ± 0% +45.97% (p=0.000 n=10) IndexByte/4K-8 1.721Gi ± 0% 9.076Gi ± 0% +427.51% (p=0.000 n=10) IndexByte/4M-8 1.488Gi ± 0% 5.089Gi ± 0% +241.93% (p=0.000 n=10) IndexByte/64M-8 940.3Mi ± 9% 1337.8Mi ± 31% +42.27% (p=0.004 n=10) geomean 727.1Mi 1.498Gi +110.94% Change-Id: If7b0dbef38d76fa7a2021e4ecaed668a1d4b9783 Reviewed-on: https://go-review.googlesource.com/c/go/+/648856 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Mark Freeman <markfreeman@google.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-08-06internal/bytealg: vector implementation of equal for riscv64Joel Sing
Provide a vector implementation of equal for riscv64, which is used when compiled with the rva23u64 profile, or when vector is detected to be available. Inputs that are 8 byte aligned will still be handled via a the non-vector code if the length is less than or equal to 64 bytes. On a Banana Pi F3, with GORISCV64=rva23u64: │ equal.1 │ equal.2 │ │ sec/op │ sec/op vs base │ Equal/0-8 1.254n ± 0% 1.254n ± 0% ~ (p=1.000 n=10) Equal/same/1-8 21.32n ± 0% 21.32n ± 0% ~ (p=0.466 n=10) Equal/same/6-8 21.32n ± 0% 21.32n ± 0% ~ (p=0.689 n=10) Equal/same/9-8 21.32n ± 0% 21.32n ± 0% ~ (p=0.861 n=10) Equal/same/15-8 21.32n ± 0% 21.32n ± 0% ~ (p=0.657 n=10) Equal/same/16-8 21.32n ± 0% 21.33n ± 0% ~ (p=0.075 n=10) Equal/same/20-8 21.32n ± 0% 21.32n ± 0% ~ (p=0.249 n=10) Equal/same/32-8 21.32n ± 0% 21.32n ± 0% ~ (p=0.303 n=10) Equal/same/4K-8 21.32n ± 0% 21.32n ± 0% ~ (p=1.000 n=10) Equal/same/4M-8 21.32n ± 0% 21.32n ± 0% ~ (p=0.582 n=10) Equal/same/64M-8 21.32n ± 0% 21.32n ± 0% ~ (p=0.930 n=10) Equal/1-8 39.16n ± 1% 38.71n ± 0% -1.15% (p=0.000 n=10) Equal/6-8 51.49n ± 1% 50.40n ± 1% -2.12% (p=0.000 n=10) Equal/9-8 54.46n ± 1% 53.89n ± 0% -1.04% (p=0.000 n=10) Equal/15-8 71.81n ± 1% 70.59n ± 0% -1.71% (p=0.000 n=10) Equal/16-8 69.14n ± 0% 68.21n ± 0% -1.34% (p=0.000 n=10) Equal/20-8 78.59n ± 0% 77.59n ± 0% -1.26% (p=0.000 n=10) Equal/32-8 41.55n ± 0% 41.16n ± 0% -0.96% (p=0.000 n=10) Equal/4K-8 925.5n ± 0% 561.4n ± 1% -39.34% (p=0.000 n=10) Equal/4M-8 3.110m ± 32% 2.463m ± 16% -20.80% (p=0.000 n=10) Equal/64M-8 47.34m ± 30% 39.89m ± 16% -15.75% (p=0.004 n=10) EqualBothUnaligned/64_0-8 32.17n ± 1% 32.11n ± 1% ~ (p=0.184 n=10) EqualBothUnaligned/64_1-8 79.48n ± 0% 48.24n ± 1% -39.31% (p=0.000 n=10) EqualBothUnaligned/64_4-8 72.71n ± 0% 48.37n ± 1% -33.48% (p=0.000 n=10) EqualBothUnaligned/64_7-8 77.12n ± 0% 48.16n ± 1% -37.56% (p=0.000 n=10) EqualBothUnaligned/4096_0-8 908.4n ± 0% 562.4n ± 2% -38.09% (p=0.000 n=10) EqualBothUnaligned/4096_1-8 956.6n ± 0% 571.4n ± 3% -40.26% (p=0.000 n=10) EqualBothUnaligned/4096_4-8 949.6n ± 0% 571.6n ± 3% -39.81% (p=0.000 n=10) EqualBothUnaligned/4096_7-8 954.2n ± 0% 571.7n ± 3% -40.09% (p=0.000 n=10) EqualBothUnaligned/4194304_0-8 2.935m ± 29% 2.664m ± 19% ~ (p=0.089 n=10) EqualBothUnaligned/4194304_1-8 3.341m ± 13% 2.896m ± 34% ~ (p=0.075 n=10) EqualBothUnaligned/4194304_4-8 3.204m ± 39% 3.352m ± 33% ~ (p=0.796 n=10) EqualBothUnaligned/4194304_7-8 3.226m ± 30% 2.737m ± 34% -15.16% (p=0.043 n=10) EqualBothUnaligned/67108864_0-8 49.04m ± 17% 39.94m ± 12% -18.57% (p=0.005 n=10) EqualBothUnaligned/67108864_1-8 51.96m ± 15% 42.48m ± 15% -18.23% (p=0.015 n=10) EqualBothUnaligned/67108864_4-8 47.67m ± 17% 37.85m ± 41% -20.61% (p=0.035 n=10) EqualBothUnaligned/67108864_7-8 53.00m ± 22% 38.76m ± 21% -26.87% (p=0.000 n=10) CompareBytesEqual-8 51.71n ± 1% 52.00n ± 0% +0.57% (p=0.002 n=10) geomean 1.469µ 1.265µ -13.93% │ equal.1 │ equal.2 │ │ B/s │ B/s vs base │ Equal/same/1-8 44.73Mi ± 0% 44.72Mi ± 0% ~ (p=0.426 n=10) Equal/same/6-8 268.3Mi ± 0% 268.4Mi ± 0% ~ (p=0.753 n=10) Equal/same/9-8 402.6Mi ± 0% 402.5Mi ± 0% ~ (p=0.209 n=10) Equal/same/15-8 670.9Mi ± 0% 670.9Mi ± 0% ~ (p=0.724 n=10) Equal/same/16-8 715.6Mi ± 0% 715.4Mi ± 0% -0.04% (p=0.022 n=10) Equal/same/20-8 894.6Mi ± 0% 894.5Mi ± 0% ~ (p=0.060 n=10) Equal/same/32-8 1.398Gi ± 0% 1.398Gi ± 0% ~ (p=0.986 n=10) Equal/same/4K-8 178.9Gi ± 0% 178.9Gi ± 0% ~ (p=0.853 n=10) Equal/same/4M-8 178.9Ti ± 0% 178.9Ti ± 0% ~ (p=0.971 n=10) Equal/same/64M-8 2862.8Ti ± 0% 2862.6Ti ± 0% ~ (p=0.971 n=10) Equal/1-8 24.35Mi ± 1% 24.63Mi ± 0% +1.16% (p=0.000 n=10) Equal/6-8 111.1Mi ± 1% 113.5Mi ± 1% +2.17% (p=0.000 n=10) Equal/9-8 157.6Mi ± 1% 159.3Mi ± 0% +1.05% (p=0.000 n=10) Equal/15-8 199.2Mi ± 1% 202.7Mi ± 0% +1.74% (p=0.000 n=10) Equal/16-8 220.7Mi ± 0% 223.7Mi ± 0% +1.36% (p=0.000 n=10) Equal/20-8 242.7Mi ± 0% 245.8Mi ± 0% +1.27% (p=0.000 n=10) Equal/32-8 734.3Mi ± 0% 741.6Mi ± 0% +0.98% (p=0.000 n=10) Equal/4K-8 4.122Gi ± 0% 6.795Gi ± 1% +64.84% (p=0.000 n=10) Equal/4M-8 1.258Gi ± 24% 1.586Gi ± 14% +26.12% (p=0.000 n=10) Equal/64M-8 1.320Gi ± 23% 1.567Gi ± 14% +18.69% (p=0.004 n=10) EqualBothUnaligned/64_0-8 1.853Gi ± 1% 1.856Gi ± 1% ~ (p=0.190 n=10) EqualBothUnaligned/64_1-8 767.9Mi ± 0% 1265.2Mi ± 1% +64.76% (p=0.000 n=10) EqualBothUnaligned/64_4-8 839.4Mi ± 0% 1261.9Mi ± 1% +50.33% (p=0.000 n=10) EqualBothUnaligned/64_7-8 791.4Mi ± 0% 1267.5Mi ± 1% +60.16% (p=0.000 n=10) EqualBothUnaligned/4096_0-8 4.199Gi ± 0% 6.784Gi ± 2% +61.54% (p=0.000 n=10) EqualBothUnaligned/4096_1-8 3.988Gi ± 0% 6.676Gi ± 3% +67.40% (p=0.000 n=10) EqualBothUnaligned/4096_4-8 4.017Gi ± 0% 6.674Gi ± 3% +66.14% (p=0.000 n=10) EqualBothUnaligned/4096_7-8 3.998Gi ± 0% 6.673Gi ± 3% +66.92% (p=0.000 n=10) EqualBothUnaligned/4194304_0-8 1.332Gi ± 22% 1.468Gi ± 16% ~ (p=0.089 n=10) EqualBothUnaligned/4194304_1-8 1.169Gi ± 12% 1.350Gi ± 25% ~ (p=0.075 n=10) EqualBothUnaligned/4194304_4-8 1.222Gi ± 28% 1.165Gi ± 48% ~ (p=0.796 n=10) EqualBothUnaligned/4194304_7-8 1.211Gi ± 23% 1.427Gi ± 26% +17.88% (p=0.043 n=10) EqualBothUnaligned/67108864_0-8 1.274Gi ± 14% 1.567Gi ± 14% +22.97% (p=0.005 n=10) EqualBothUnaligned/67108864_1-8 1.204Gi ± 14% 1.471Gi ± 13% +22.18% (p=0.015 n=10) EqualBothUnaligned/67108864_4-8 1.311Gi ± 14% 1.651Gi ± 29% +25.92% (p=0.035 n=10) EqualBothUnaligned/67108864_7-8 1.179Gi ± 18% 1.612Gi ± 17% +36.73% (p=0.000 n=10) geomean 1.870Gi 2.190Gi +17.16% Change-Id: I9c5270bcc6997d020a96d1e97c7e7cfc7ca7fd34 Reviewed-on: https://go-review.googlesource.com/c/go/+/646736 Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Mark Freeman <markfreeman@google.com>
2025-05-13internal/bytealg: optimize the function compare using SIMD on loong64limeidan
goos: linux goarch: loong64 pkg: bytes cpu: Loongson-3A6000-HV @ 2500.00MHz │ old │ new │ │ sec/op │ sec/op vs base │ BytesCompare/1 7.238n ± 25% 5.204n ± 0% -28.10% (p=0.001 n=10) BytesCompare/2 7.242n ± 6% 5.204n ± 0% -28.14% (p=0.000 n=10) BytesCompare/4 7.229n ± 5% 4.403n ± 0% -39.10% (p=0.000 n=10) BytesCompare/8 7.077n ± 36% 4.403n ± 0% -37.78% (p=0.000 n=10) BytesCompare/16 8.373n ± 6% 6.004n ± 0% -28.30% (p=0.000 n=10) BytesCompare/32 8.040n ± 3% 4.803n ± 0% -40.26% (p=0.000 n=10) BytesCompare/64 8.434n ± 24% 10.410n ± 0% +23.42% (p=0.014 n=10) BytesCompare/128 11.530n ± 23% 5.604n ± 0% -51.40% (p=0.000 n=10) BytesCompare/256 14.180n ± 0% 7.606n ± 0% -46.36% (p=0.000 n=10) BytesCompare/512 26.83n ± 0% 10.81n ± 0% -59.71% (p=0.000 n=10) BytesCompare/1024 52.60n ± 0% 17.21n ± 0% -67.28% (p=0.000 n=10) BytesCompare/2048 103.70n ± 0% 30.02n ± 0% -71.05% (p=0.000 n=10) geomean 13.49n 7.607n -43.63% goos: linux goarch: loong64 pkg: bytes cpu: Loongson-3A6000-HV @ 2500.00MHz │ old │ new │ │ sec/op │ sec/op vs base │ CompareBytesEqual 5.603n ± 0% 5.604n ± 0% ~ (p=0.191 n=10) CompareBytesToNil 3.202n ± 0% 3.202n ± 0% ~ (p=1.000 n=10) CompareBytesEmpty 2.802n ± 0% 2.802n ± 0% ~ (p=1.000 n=10) CompareBytesIdentical 3.202n ± 0% 2.538n ± 1% -20.72% (p=0.000 n=10) CompareBytesSameLength 8.805n ± 0% 4.803n ± 0% -45.45% (p=0.000 n=10) CompareBytesDifferentLength 9.206n ± 0% 4.403n ± 0% -52.17% (p=0.000 n=10) CompareBytesBigUnaligned/offset=1 82.04µ ± 0% 45.91µ ± 0% -44.04% (p=0.000 n=10) CompareBytesBigUnaligned/offset=2 82.04µ ± 0% 45.91µ ± 0% -44.04% (p=0.000 n=10) CompareBytesBigUnaligned/offset=3 82.04µ ± 0% 45.91µ ± 0% -44.04% (p=0.000 n=10) CompareBytesBigUnaligned/offset=4 82.04µ ± 0% 45.91µ ± 0% -44.04% (p=0.000 n=10) CompareBytesBigUnaligned/offset=5 82.04µ ± 0% 45.91µ ± 0% -44.04% (p=0.000 n=10) CompareBytesBigUnaligned/offset=6 82.03µ ± 0% 45.93µ ± 0% -44.01% (p=0.000 n=10) CompareBytesBigUnaligned/offset=7 82.04µ ± 0% 45.93µ ± 0% -44.01% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=0 78.76µ ± 0% 45.69µ ± 0% -41.98% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=1 85.32µ ± 0% 46.04µ ± 0% -46.03% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=2 85.31µ ± 0% 46.04µ ± 0% -46.03% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=3 85.32µ ± 0% 46.04µ ± 0% -46.03% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=4 85.32µ ± 0% 46.04µ ± 0% -46.03% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=5 85.32µ ± 0% 46.04µ ± 0% -46.03% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=6 85.31µ ± 0% 46.06µ ± 0% -46.02% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=7 85.32µ ± 0% 52.32µ ± 7% -38.68% (p=0.000 n=10) CompareBytesBig 78.76µ ± 0% 50.20µ ± 6% -36.26% (p=0.000 n=10) CompareBytesBigIdentical 3.202n ± 0% 3.442n ± 24% ~ (p=0.462 n=10) geomean 4.197µ 2.630µ -37.34% Change-Id: I621145aef3e6a2c68e7127152f26ed047c6b2ece Reviewed-on: https://go-review.googlesource.com/c/go/+/671315 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-05-08internal/bytealg: optimize the function indexbyte using SIMD on loong64limeidan
goos: linux goarch: loong64 pkg: bytes cpu: Loongson-3C5000 @ 2200.00MHz │ old │ new │ │ sec/op │ sec/op vs base │ IndexByte/10 19.32n ± 0% 11.84n ± 0% -38.72% (p=0.000 n=10) IndexByte/32 49.34n ± 0% 14.11n ± 0% -71.40% (p=0.000 n=10) IndexByte/4K 5608.0n ± 0% 138.8n ± 0% -97.52% (p=0.000 n=10) IndexByte/4M 3822.8µ ± 0% 119.4µ ± 0% -96.88% (p=0.000 n=10) IndexByte/64M 61.826m ± 1% 3.812m ± 0% -93.83% (p=0.000 n=10) geomean 16.61µ 1.602µ -90.35% goos: linux goarch: loong64 pkg: bytes cpu: Loongson-3A6000-HV @ 2500.00MHz │ old │ new │ │ sec/op │ sec/op vs base │ IndexByte/10 6.809n ± 0% 5.804n ± 0% -14.75% (p=0.000 n=10) IndexByte/32 16.015n ± 0% 6.404n ± 0% -60.01% (p=0.000 n=10) IndexByte/4K 1651.00n ± 0% 52.83n ± 0% -96.80% (p=0.000 n=10) IndexByte/4M 1680.76µ ± 0% 91.10µ ± 0% -94.58% (p=0.000 n=10) IndexByte/64M 26.878m ± 0% 2.010m ± 27% -92.52% (p=0.000 n=10) geomean 6.054µ 815.0n -86.54% Change-Id: Ib75b997249708f921c6717eba43543c6650bf376 Reviewed-on: https://go-review.googlesource.com/c/go/+/668055 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
2025-05-01internal/bytealg: deduplicate code between Count/CountString for riscv64Joel Sing
Change-Id: I22eb4e7444e5fe5f6767cc960895f3c6e2fa13cc Reviewed-on: https://go-review.googlesource.com/c/go/+/661615 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Carlos Amedee <carlos@golang.org> Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-04-23runtime: optimize the function memequal using SIMD on loong64limeidan
goos: linux goarch: loong64 pkg: bytes cpu: Loongson-3A6000-HV @ 2500.00MHz │ old │ new │ │ sec/op │ sec/op vs base │ Equal/0 0.4012n ± 0% 0.4003n ± 0% -0.21% (p=0.000 n=10) Equal/same/1 2.555n ± 1% 2.419n ± 0% -5.32% (p=0.000 n=10) Equal/same/6 2.574n ± 1% 2.425n ± 1% -5.79% (p=0.000 n=10) Equal/same/9 2.578n ± 0% 2.419n ± 1% -6.19% (p=0.000 n=10) Equal/same/15 2.565n ± 1% 2.417n ± 0% -5.73% (p=0.000 n=10) Equal/same/16 2.576n ± 1% 2.414n ± 0% -6.31% (p=0.000 n=10) Equal/same/20 2.573n ± 1% 2.416n ± 0% -6.10% (p=0.000 n=10) Equal/same/32 2.559n ± 0% 2.411n ± 0% -5.80% (p=0.000 n=10) Equal/same/4K 2.579n ± 1% 2.410n ± 0% -6.53% (p=0.000 n=10) Equal/same/4M 2.571n ± 0% 2.411n ± 0% -6.22% (p=0.000 n=10) Equal/same/64M 2.568n ± 1% 2.413n ± 0% -6.05% (p=0.000 n=10) Equal/1 5.215n ± 0% 6.404n ± 0% +22.80% (p=0.000 n=10) Equal/6 11.630n ± 0% 6.404n ± 0% -44.94% (p=0.000 n=10) Equal/9 15.240n ± 0% 6.404n ± 0% -57.98% (p=0.000 n=10) Equal/15 22.925n ± 0% 6.404n ± 0% -72.07% (p=0.000 n=10) Equal/16 24.070n ± 0% 5.203n ± 0% -78.38% (p=0.000 n=10) Equal/20 28.880n ± 0% 6.404n ± 0% -77.83% (p=0.000 n=10) Equal/32 43.320n ± 0% 6.404n ± 0% -85.22% (p=0.000 n=10) Equal/4K 4938.50n ± 0% 55.43n ± 0% -98.88% (p=0.000 n=10) Equal/4M 5048.8µ ± 0% 202.0µ ± 0% -96.00% (p=0.000 n=10) Equal/64M 80.819m ± 0% 4.539m ± 0% -94.38% (p=0.000 n=10) EqualBothUnaligned/64_0 79.830n ± 0% 4.803n ± 0% -93.98% (p=0.000 n=10) EqualBothUnaligned/64_1 79.830n ± 0% 4.803n ± 0% -93.98% (p=0.000 n=10) EqualBothUnaligned/64_4 79.830n ± 0% 4.803n ± 0% -93.98% (p=0.000 n=10) EqualBothUnaligned/64_7 79.830n ± 0% 4.803n ± 0% -93.98% (p=0.000 n=10) EqualBothUnaligned/4096_0 4937.00n ± 0% 65.64n ± 0% -98.67% (p=0.000 n=10) EqualBothUnaligned/4096_1 4937.00n ± 0% 78.85n ± 0% -98.40% (p=0.000 n=10) EqualBothUnaligned/4096_4 4937.00n ± 0% 78.87n ± 0% -98.40% (p=0.000 n=10) EqualBothUnaligned/4096_7 4937.00n ± 0% 78.87n ± 0% -98.40% (p=0.000 n=10) EqualBothUnaligned/4194304_0 5049.2µ ± 0% 204.2µ ± 0% -95.96% (p=0.000 n=10) EqualBothUnaligned/4194304_1 5049.2µ ± 0% 205.1µ ± 0% -95.94% (p=0.000 n=10) EqualBothUnaligned/4194304_4 5049.4µ ± 0% 205.1µ ± 0% -95.94% (p=0.000 n=10) EqualBothUnaligned/4194304_7 5049.2µ ± 0% 205.1µ ± 0% -95.94% (p=0.000 n=10) EqualBothUnaligned/67108864_0 80.796m ± 0% 3.863m ± 0% -95.22% (p=0.000 n=10) EqualBothUnaligned/67108864_1 80.801m ± 0% 3.706m ± 0% -95.41% (p=0.000 n=10) EqualBothUnaligned/67108864_4 80.799m ± 0% 3.706m ± 0% -95.41% (p=0.000 n=10) EqualBothUnaligned/67108864_7 80.781m ± 0% 3.706m ± 0% -95.41% (p=0.000 n=10) geomean 1.040µ 149.6n -85.63% Change-Id: Id4c2bc0ca758337dd9759df83750c761814be488 Reviewed-on: https://go-review.googlesource.com/c/go/+/667255 Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Junyang Shao <shaojunyang@google.com>
2025-04-03internal/bytealg: optimize Index/IndexString/IndexByte/IndexByteString on arm64Vasily Leonenko
Introduce ABIInternal support for Index/IndexString/IndexByte/IndexByteString goos: linux goarch: arm64 pkg: bytes │ base.txt │ new.txt │ │ B/s │ B/s vs base │ IndexByte/10 1.090Gi ± 0% 1.313Gi ± 0% +20.51% (p=0.000 n=10) IndexByte/32 3.714Gi ± 0% 4.289Gi ± 0% +15.47% (p=0.000 n=10) IndexByte/4K 22.92Gi ± 0% 23.01Gi ± 0% +0.37% (p=0.000 n=10) IndexByte/4M 20.23Gi ± 0% 20.35Gi ± 0% +0.60% (p=0.000 n=10) IndexByte/64M 23.82Gi ± 0% 23.81Gi ± 0% -0.01% (p=0.002 n=10) IndexBytePortable/10 788.5Mi ± 0% 788.5Mi ± 0% ~ (p=0.722 n=10) IndexBytePortable/32 1002.3Mi ± 0% 1002.3Mi ± 0% ~ (p=0.137 n=10) IndexBytePortable/4K 1.111Gi ± 0% 1.111Gi ± 0% ~ (p=0.692 n=10) IndexBytePortable/4M 1.116Gi ± 0% 1.116Gi ± 0% ~ (p=0.158 n=10) IndexBytePortable/64M 1.116Gi ± 0% 1.116Gi ± 0% -0.01% (p=0.000 n=10) IndexRune/10 352.1Mi ± 0% 445.0Mi ± 0% +26.38% (p=0.000 n=10) IndexRune/32 1.101Gi ± 0% 1.391Gi ± 0% +26.43% (p=0.000 n=10) IndexRune/4K 21.07Gi ± 0% 21.25Gi ± 0% +0.82% (p=0.000 n=10) IndexRune/4M 23.81Gi ± 0% 23.81Gi ± 0% ~ (p=0.218 n=10) IndexRune/64M 23.81Gi ± 0% 23.81Gi ± 0% ~ (p=0.271 n=10) IndexRuneASCII/10 1.038Gi ± 0% 1.190Gi ± 1% +14.63% (p=0.000 n=10) IndexRuneASCII/32 3.643Gi ± 2% 4.203Gi ± 0% +15.38% (p=0.000 n=10) IndexRuneASCII/4K 22.90Gi ± 0% 22.98Gi ± 0% +0.34% (p=0.000 n=10) IndexRuneASCII/4M 23.81Gi ± 0% 23.81Gi ± 0% ~ (p=0.108 n=10) IndexRuneASCII/64M 23.82Gi ± 0% 23.81Gi ± 0% ~ (p=0.105 n=10) IndexRuneUnicode/Latin/10 404.4Mi ± 0% 493.7Mi ± 0% +22.10% (p=0.000 n=10) IndexRuneUnicode/Latin/32 1.261Gi ± 0% 1.543Gi ± 0% +22.31% (p=0.000 n=10) IndexRuneUnicode/Latin/4K 6.966Gi ± 0% 8.115Gi ± 0% +16.50% (p=0.000 n=10) IndexRuneUnicode/Latin/4M 6.599Gi ± 0% 7.576Gi ± 0% +14.80% (p=0.000 n=10) IndexRuneUnicode/Latin/64M 6.297Gi ± 0% 7.070Gi ± 2% +12.28% (p=0.000 n=10) IndexRuneUnicode/Cyrillic/10 385.9Mi ± 0% 440.1Mi ± 0% +14.03% (p=0.000 n=10) IndexRuneUnicode/Cyrillic/32 1.206Gi ± 0% 1.375Gi ± 0% +14.05% (p=0.000 n=10) IndexRuneUnicode/Cyrillic/4K 2.468Gi ± 0% 2.921Gi ± 0% +18.37% (p=0.000 n=10) IndexRuneUnicode/Cyrillic/4M 2.386Gi ± 0% 2.845Gi ± 0% +19.23% (p=0.000 n=10) IndexRuneUnicode/Cyrillic/64M 2.280Gi ± 0% 2.717Gi ± 0% +19.14% (p=0.000 n=10) IndexRuneUnicode/Han/10 307.1Mi ± 0% 331.5Mi ± 0% +7.94% (p=0.000 n=10) IndexRuneUnicode/Han/32 982.2Mi ± 0% 1060.2Mi ± 0% +7.94% (p=0.000 n=10) IndexRuneUnicode/Han/4K 4.986Gi ± 0% 5.957Gi ± 0% +19.48% (p=0.000 n=10) IndexRuneUnicode/Han/4M 3.822Gi ± 0% 4.198Gi ± 0% +9.83% (p=0.000 n=10) IndexRuneUnicode/Han/64M 3.765Gi ± 0% 4.140Gi ± 0% +9.96% (p=0.000 n=10) Index/10 634.6Mi ± 0% 635.2Mi ± 0% +0.09% (p=0.000 n=10) Index/32 375.3Mi ± 0% 385.1Mi ± 0% +2.63% (p=0.000 n=10) Index/4K 754.8Mi ± 0% 755.2Mi ± 0% +0.04% (p=0.001 n=10) Index/4M 746.5Mi ± 0% 746.3Mi ± 0% -0.03% (p=0.000 n=10) Index/64M 746.5Mi ± 0% 746.3Mi ± 0% -0.03% (p=0.000 n=10) IndexEasy/10 714.6Mi ± 0% 714.6Mi ± 0% +0.00% (p=0.001 n=10) IndexEasy/32 1.221Gi ± 0% 1.524Gi ± 0% +24.81% (p=0.000 n=10) IndexEasy/4K 21.06Gi ± 0% 21.47Gi ± 0% +1.91% (p=0.000 n=10) IndexEasy/4M 20.23Gi ± 0% 20.24Gi ± 0% ~ (p=0.684 n=10) IndexEasy/64M 13.07Gi ± 0% 12.58Gi ± 4% -3.75% (p=0.000 n=10) IndexHard1 1.114Gi ± 0% 1.114Gi ± 0% ~ (p=0.193 n=10) IndexHard2 1.111Gi ± 0% 1.112Gi ± 0% +0.04% (p=0.001 n=10) IndexHard3 1.086Gi ± 0% 1.081Gi ± 0% -0.37% (p=0.000 n=10) IndexHard4 607.9Mi ± 0% 607.9Mi ± 0% ~ (p=0.136 n=10) geomean 2.536Gi 2.720Gi +7.26% Change-Id: I1fc246783ebb215882d7144d05dbe2433dc66751 Reviewed-on: https://go-review.googlesource.com/c/go/+/662415 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org>
2025-04-03internal/bytealg: optimize Count/CountString on arm64Vasily Leonenko
Introduce ABIInternal support for Count/CountString Move <32 size block from function end to beginning as fastpath goos: linux goarch: arm64 pkg: strings │ base.txt │ new.txt │ │ B/s │ B/s vs base │ CountByte/10 672.5Mi ± 0% 692.9Mi ± 0% +3.04% (p=0.000 n=10) CountByte/32 3.592Gi ± 0% 3.970Gi ± 0% +10.53% (p=0.000 n=10) CountByte/4096 16.63Gi ± 0% 16.73Gi ± 0% +0.64% (p=0.000 n=10) CountByte/4194304 14.97Gi ± 2% 15.02Gi ± 1% ~ (p=0.190 n=10) CountByte/67108864 12.50Gi ± 0% 12.50Gi ± 0% ~ (p=0.853 n=10) geomean 5.931Gi 6.099Gi +2.83% Change-Id: I5af1be2b117d9fb8d570739637499923de62251c Reviewed-on: https://go-review.googlesource.com/c/go/+/662395 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Commit-Queue: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2025-03-27cmd/internal/obj/riscv,internal/bytealg: synthesize MIN/MAX/MINU/MAXU ↵Joel Sing
instructions Provide a synthesized version of the MIN/MAX/MINU/MAXU instructions if they're not natively available. This allows these instructions to be used in assembly unconditionally. Use MIN in internal/bytealg.compare. Cq-Include-Trybots: luci.golang.try:gotip-linux-riscv64 Change-Id: I8a5a3a59f0a9205e136fc3d673b23eaf3ca469f8 Reviewed-on: https://go-review.googlesource.com/c/go/+/653295 Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-11internal/bytealg: optimize Count{,String} in loong64Guoqi Chen
Benchmark on Loongson 3A6000 and 3A5000: goos: linux goarch: loong64 pkg: bytes cpu: Loongson-3A6000 @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | CountSingle/10 13.210n ± 0% 9.984n ± 0% -24.42% (p=0.000 n=15) CountSingle/32 31.970n ± 1% 7.205n ± 0% -77.46% (p=0.000 n=15) CountSingle/4K 4039.0n ± 0% 108.7n ± 0% -97.31% (p=0.000 n=15) CountSingle/4M 4158.9µ ± 0% 117.3µ ± 0% -97.18% (p=0.000 n=15) CountSingle/64M 68.641m ± 0% 2.585m ± 1% -96.23% (p=0.000 n=15) geomean 13.72µ 1.189µ -91.34% | bench.old | bench.new | | B/s | B/s vs base | CountSingle/10 722.0Mi ± 0% 955.2Mi ± 0% +32.30% (p=0.000 n=15) CountSingle/32 954.6Mi ± 1% 4235.4Mi ± 0% +343.68% (p=0.000 n=15) CountSingle/4K 967.2Mi ± 0% 35947.6Mi ± 0% +3616.64% (p=0.000 n=15) CountSingle/4M 961.8Mi ± 0% 34092.7Mi ± 0% +3444.71% (p=0.000 n=15) CountSingle/64M 932.4Mi ± 0% 24757.2Mi ± 1% +2555.24% (p=0.000 n=15) geomean 902.2Mi 10.17Gi +1054.77% goos: linux goarch: loong64 pkg: bytes cpu: Loongson-3A5000 @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | CountSingle/10 14.41n ± 0% 12.81n ± 0% -11.10% (p=0.000 n=15) CountSingle/32 36.230n ± 0% 9.609n ± 0% -73.48% (p=0.000 n=15) CountSingle/4K 4366.0n ± 0% 165.5n ± 0% -96.21% (p=0.000 n=15) CountSingle/4M 4464.7µ ± 0% 325.2µ ± 0% -92.72% (p=0.000 n=15) CountSingle/64M 75.627m ± 0% 8.307m ± 69% -89.02% (p=0.000 n=15) geomean 15.04µ 2.229µ -85.18% | bench.old | bench.new | | B/s | B/s vs base | CountSingle/10 661.8Mi ± 0% 744.4Mi ± 0% +12.49% (p=0.000 n=15) CountSingle/32 842.4Mi ± 0% 3176.1Mi ± 0% +277.03% (p=0.000 n=15) CountSingle/4K 894.7Mi ± 0% 23596.7Mi ± 0% +2537.34% (p=0.000 n=15) CountSingle/4M 895.9Mi ± 0% 12299.7Mi ± 0% +1272.88% (p=0.000 n=15) CountSingle/64M 846.3Mi ± 0% 7703.9Mi ± 41% +810.34% (p=0.000 n=15) geomean 823.3Mi 5.424Gi +574.68% Change-Id: Ie07592beac61bdb093470c524049ed494df4d703 Reviewed-on: https://go-review.googlesource.com/c/go/+/586055 Reviewed-by: Meidan Li <limeidan@loongson.cn> Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2025-02-21internal/bytealg: add assembly implementation of Count/CountString for mips64xJulian Zhu
Add a simple assembly implementation of Count/CountString for mips64x. name old sec/op new sec/op vs base CountSingle/10-4 31.16n ± 0% 41.69n ± 0% +33.79% (p=0.000 n=11) CountSingle/32-4 69.58n ± 0% 59.61n ± 0% -14.33% (p=0.000 n=11) CountSingle/4K-4 7.428µ ± 0% 5.153µ ± 0% -30.63% (p=0.000 n=11) CountSingle/4M-4 7.634m ± 0% 5.300m ± 0% -30.58% (p=0.000 n=11) CountSingle/64M-4 134.4m ± 0% 100.8m ± 3% -24.99% (p=0.000 n=11) name old B/s new B/s vs base CountSingle/10-4 306.1Mi ± 0% 228.8Mi ± 0% -25.25% (p=0.000 n=11) CountSingle/32-4 438.6Mi ± 0% 512.0Mi ± 0% +16.74% (p=0.000 n=11) CountSingle/4K-4 525.9Mi ± 0% 758.0Mi ± 0% +44.15% (p=0.000 n=11) CountSingle/4M-4 523.9Mi ± 0% 754.7Mi ± 0% +44.05% (p=0.000 n=11) CountSingle/64M-4 476.3Mi ± 0% 635.0Mi ± 0% +33.31% (p=0.000 n=11) Change-Id: Id5ddbea0d080e2903156ef8dc86c030a8179115b Reviewed-on: https://go-review.googlesource.com/c/go/+/650995 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org>
2025-02-20internal/bytealg: clean up and simplify the riscv64 equal implementationJoel Sing
Now that riscv64 is only regabi, remove the entrypoint separation and have runtime.memequal_varlen call runtime.memequal. Add a zero byte length check and replace the equal and not equal exit paths with a single exit path that conditions on length reaching zero. Cq-Include-Trybots: luci.golang.try:gotip-linux-riscv64 Change-Id: Ida4e54378daa7fd423f759753eba04ce513a27cb Reviewed-on: https://go-review.googlesource.com/c/go/+/648855 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-11-20internal/bytealg: optimize IndexByte for riscv64Mark Ryan
The existing implementations of IndexByte and IndexByteString for riscv64 are very simplistic. They load and compare a single byte at a time in a tight loop. It's possible to improve performance in the general case by loading and checking 8 bytes at a time. This is achieved using the 'Determine if a word has a byte equal to n' bit hack from https://graphics.stanford.edu/~seander/bithacks.html. We broadcast the byte we're looking for across a 64 bit register, let v be the result of xoring that register with 8 bytes loaded from the buffer and then use the formula, (((v) - 0x0101010101010101UL) & ~(v) & 0x8080808080808080UL) which evaluates to true if any one of the bytes in v is 0, i.e, matches the byte we're looking for. We then just need to figure out which byte out of the 8 it is to return the correct index. This change generally improves performance when the byte we're looking for is not in the first 24 bytes of the buffer and degrades performance slightly when it is. Some example benchmarks results from the bytes and strings package are presented below. These were generated on a VisionFive2 running Ubuntu 24.04. Subset of bytes Index benchmarks IndexByte/10 46.49n ± 0% 44.08n ± 0% -5.19% (p=0.000 n=10) IndexByte/32 75.98n ± 0% 67.90n ± 0% -10.63% (p=0.000 n=10) IndexByte/4K 5.512µ ± 0% 2.113µ ± 0% -61.67% (p=0.000 n=10) IndexByte/4M 7.354m ± 0% 3.218m ± 0% -56.24% (p=0.000 n=10) IndexByte/64M 90.15m ± 0% 33.86m ± 0% -62.44% (p=0.000 n=10) IndexBytePortable/10 50.41n ± 0% 54.92n ± 1% +8.94% (p=0.000 n=10) IndexBytePortable/32 111.9n ± 0% 115.5n ± 0% +3.22% (p=0.000 n=10) IndexBytePortable/4K 10.99µ ± 0% 10.99µ ± 0% +0.04% (p=0.000 n=10) IndexBytePortable/4M 11.24m ± 0% 11.24m ± 0% ~ (p=0.218 n=10) IndexBytePortable/64M 179.8m ± 0% 179.8m ± 0% +0.01% (p=0.001 n=10) IndexRune/10 104.2n ± 0% 104.4n ± 0% +0.19% (p=0.000 n=10) IndexRune/32 133.7n ± 0% 139.3n ± 0% +4.23% (p=0.000 n=10) IndexRune/4K 5.573µ ± 0% 2.184µ ± 0% -60.81% (p=0.000 n=10) IndexRune/4M 5.634m ± 0% 2.112m ± 0% -62.51% (p=0.000 n=10) IndexRune/64M 90.19m ± 0% 33.87m ± 0% -62.45% (p=0.000 n=10) IndexRuneASCII/10 50.42n ± 2% 47.14n ± 0% -6.52% (p=0.000 n=10) IndexRuneASCII/32 79.64n ± 1% 70.39n ± 0% -11.61% (p=0.000 n=10) IndexRuneASCII/4K 5.516µ ± 0% 2.115µ ± 0% -61.66% (p=0.000 n=10) IndexRuneASCII/4M 5.634m ± 0% 2.112m ± 0% -62.51% (p=0.000 n=10) IndexRuneASCII/64M 90.16m ± 0% 33.86m ± 0% -62.44% (p=0.000 n=10) IndexRuneUnicode/Latin/10 82.14n ± 0% 82.07n ± 0% -0.09% (p=0.000 n=10) IndexRuneUnicode/Latin/32 111.6n ± 0% 117.1n ± 0% +4.93% (p=0.000 n=10) IndexRuneUnicode/Latin/4K 6.222µ ± 0% 3.429µ ± 0% -44.89% (p=0.000 n=10) IndexRuneUnicode/Latin/4M 8.189m ± 0% 4.706m ± 0% -42.53% (p=0.000 n=10) IndexRuneUnicode/Latin/64M 171.8m ± 2% 105.8m ± 0% -38.44% (p=0.000 n=10) IndexRuneUnicode/Cyrillic/10 89.69n ± 0% 89.67n ± 0% -0.02% (p=0.000 n=10) IndexRuneUnicode/Cyrillic/32 119.1n ± 0% 124.1n ± 0% +4.20% (p=0.000 n=10) IndexRuneUnicode/Cyrillic/4K 8.002µ ± 0% 6.232µ ± 0% -22.12% (p=0.000 n=10) IndexRuneUnicode/Cyrillic/4M 9.501m ± 0% 7.510m ± 0% -20.95% (p=0.000 n=10) IndexRuneUnicode/Cyrillic/64M 186.5m ± 0% 150.3m ± 0% -19.41% (p=0.000 n=10) IndexRuneUnicode/Han/10 117.8n ± 0% 118.1n ± 0% +0.25% (p=0.000 n=10) IndexRuneUnicode/Han/32 151.5n ± 0% 154.0n ± 0% +1.65% (p=0.000 n=10) IndexRuneUnicode/Han/4K 6.664µ ± 0% 4.125µ ± 0% -38.11% (p=0.000 n=10) IndexRuneUnicode/Han/4M 8.526m ± 0% 5.502m ± 0% -35.46% (p=0.000 n=10) IndexRuneUnicode/Han/64M 171.8m ± 1% 112.2m ± 0% -34.68% (p=0.000 n=10) Index/10 199.3n ± 1% 199.4n ± 0% ~ (p=1.000 n=10) Index/32 547.7n ± 0% 547.3n ± 0% -0.08% (p=0.001 n=10) Index/4K 38.62µ ± 0% 38.62µ ± 0% -0.01% (p=0.023 n=10) Index/4M 40.46m ± 0% 40.45m ± 0% ~ (p=0.105 n=10) Index/64M 648.5m ± 0% 648.4m ± 0% ~ (p=1.000 n=10) IndexEasy/10 70.25n ± 0% 70.92n ± 0% +0.95% (p=0.000 n=10) IndexEasy/32 104.60n ± 0% 95.67n ± 0% -8.54% (p=0.000 n=10) IndexEasy/4K 5.544µ ± 0% 2.142µ ± 0% -61.36% (p=0.000 n=10) IndexEasy/4M 7.354m ± 0% 3.213m ± 0% -56.32% (p=0.000 n=10) IndexEasy/64M 114.93m ± 2% 52.61m ± 0% -54.22% (p=0.000 n=10) IndexHard1 10.09m ± 0% 10.09m ± 0% ~ (p=0.393 n=10) IndexHard2 10.09m ± 0% 10.09m ± 0% ~ (p=0.481 n=10) IndexHard3 10.09m ± 0% 10.09m ± 0% ~ (p=1.000 n=10) IndexHard4 10.09m ± 0% 10.09m ± 0% ~ (p=0.739 n=10) LastIndexHard1 10.71m ± 0% 10.71m ± 0% ~ (p=0.052 n=10) LastIndexHard2 10.71m ± 0% 10.71m ± 0% ~ (p=0.218 n=10) LastIndexHard3 10.71m ± 0% 10.71m ± 0% ~ (p=0.739 n=10) IndexAnyASCII/1:1 30.13n ± 0% 30.79n ± 0% +2.19% (p=0.000 n=10) IndexAnyASCII/1:2 31.49n ± 0% 32.16n ± 0% +2.13% (p=0.000 n=10) IndexAnyASCII/1:4 34.16n ± 0% 34.82n ± 0% +1.93% (p=0.000 n=10) IndexAnyASCII/1:8 39.50n ± 0% 40.16n ± 0% +1.67% (p=0.000 n=10) IndexAnyASCII/1:16 50.20n ± 0% 50.87n ± 0% +1.33% (p=0.000 n=10) IndexAnyASCII/1:32 81.04n ± 0% 50.29n ± 0% -37.94% (p=0.000 n=10) IndexAnyASCII/1:64 119.80n ± 0% 66.94n ± 0% -44.13% (p=0.000 n=10) IndexAnyASCII/16:1 54.86n ± 0% 55.53n ± 0% +1.22% (p=0.000 n=10) IndexAnyASCII/16:2 268.2n ± 0% 268.2n ± 0% ~ (p=1.000 n=10) IndexAnyASCII/16:4 288.1n ± 0% 288.1n ± 0% ~ (p=1.000 n=10) ¹ IndexAnyASCII/16:8 328.3n ± 0% 328.2n ± 0% ~ (p=0.370 n=10) IndexAnyASCII/16:16 413.4n ± 0% 413.4n ± 0% ~ (p=0.628 n=10) IndexAnyASCII/16:32 574.0n ± 0% 573.9n ± 0% ~ (p=0.141 n=10) IndexAnyASCII/16:64 895.1n ± 0% 895.1n ± 0% ~ (p=0.548 n=10) IndexAnyASCII/256:1 381.4n ± 0% 175.4n ± 0% -53.99% (p=0.000 n=10) IndexAnyASCII/256:2 2.998µ ± 0% 2.998µ ± 0% ~ (p=0.365 n=10) IndexAnyASCII/256:4 3.018µ ± 0% 3.018µ ± 0% ~ (p=0.650 n=10) IndexAnyASCII/256:8 3.058µ ± 0% 3.064µ ± 0% +0.20% (p=0.011 n=10) IndexAnyASCII/256:16 3.143µ ± 0% 3.150µ ± 0% +0.21% (p=0.000 n=10) IndexAnyASCII/256:32 3.303µ ± 0% 3.307µ ± 0% +0.12% (p=0.000 n=10) IndexAnyASCII/256:64 3.625µ ± 0% 3.638µ ± 0% +0.36% (p=0.000 n=10) IndexAnyUTF8/1:1 30.13n ± 0% 30.94n ± 0% +2.69% (p=0.000 n=10) IndexAnyUTF8/1:2 31.49n ± 0% 32.30n ± 0% +2.59% (p=0.000 n=10) IndexAnyUTF8/1:4 34.16n ± 0% 35.03n ± 0% +2.55% (p=0.000 n=10) IndexAnyUTF8/1:8 39.50n ± 0% 40.16n ± 0% +1.67% (p=0.000 n=10) IndexAnyUTF8/1:16 50.20n ± 0% 50.84n ± 0% +1.27% (p=0.000 n=10) IndexAnyUTF8/1:32 81.02n ± 0% 61.55n ± 0% -24.03% (p=0.000 n=10) IndexAnyUTF8/1:64 119.80n ± 0% 80.04n ± 0% -33.19% (p=0.000 n=10) IndexAnyUTF8/16:1 489.0n ± 0% 489.0n ± 0% ~ (p=1.000 n=10) IndexAnyUTF8/16:2 361.9n ± 0% 372.6n ± 0% +2.96% (p=0.000 n=10) IndexAnyUTF8/16:4 404.7n ± 0% 415.4n ± 0% +2.64% (p=0.000 n=10) IndexAnyUTF8/16:8 489.9n ± 0% 500.7n ± 0% +2.20% (p=0.000 n=10) IndexAnyUTF8/16:16 661.2n ± 0% 671.9n ± 0% +1.62% (p=0.000 n=10) IndexAnyUTF8/16:32 1004.0n ± 0% 881.6n ± 0% -12.19% (p=0.000 n=10) IndexAnyUTF8/16:64 1.767µ ± 0% 1.129µ ± 0% -36.11% (p=0.000 n=10) IndexAnyUTF8/256:1 7.072µ ± 0% 7.072µ ± 0% ~ (p=0.387 n=10) IndexAnyUTF8/256:2 4.700µ ± 0% 4.872µ ± 0% +3.66% (p=0.000 n=10) IndexAnyUTF8/256:4 5.386µ ± 0% 5.557µ ± 0% +3.18% (p=0.000 n=10) IndexAnyUTF8/256:8 6.752µ ± 0% 6.923µ ± 0% +2.53% (p=0.000 n=10) IndexAnyUTF8/256:16 9.493µ ± 0% 9.664µ ± 0% +1.80% (p=0.000 n=10) IndexAnyUTF8/256:32 14.97µ ± 0% 12.93µ ± 0% -13.64% (p=0.000 n=10) IndexAnyUTF8/256:64 27.15µ ± 0% 16.89µ ± 0% -37.80% (p=0.000 n=10) LastIndexAnyASCII/1:1 30.78n ± 0% 31.45n ± 0% +2.18% (p=0.000 n=10) LastIndexAnyASCII/1:2 32.13n ± 0% 32.80n ± 0% +2.07% (p=0.000 n=10) LastIndexAnyASCII/1:4 34.81n ± 0% 35.48n ± 0% +1.92% (p=0.000 n=10) LastIndexAnyASCII/1:8 40.14n ± 0% 40.81n ± 0% +1.67% (p=0.000 n=10) LastIndexAnyASCII/1:16 50.85n ± 0% 51.51n ± 0% +1.30% (p=0.000 n=10) LastIndexAnyASCII/1:32 84.03n ± 0% 50.85n ± 0% -39.49% (p=0.000 n=10) LastIndexAnyASCII/1:64 121.50n ± 0% 68.16n ± 0% -43.90% (p=0.000 n=10) LastIndexAnyASCII/16:1 249.7n ± 0% 249.7n ± 0% ~ (p=1.000 n=10) ¹ LastIndexAnyASCII/16:2 255.2n ± 0% 255.2n ± 0% ~ (p=1.000 n=10) ¹ LastIndexAnyASCII/16:4 274.0n ± 0% 274.0n ± 0% ~ (p=1.000 n=10) ¹ LastIndexAnyASCII/16:8 314.1n ± 0% 314.1n ± 0% ~ (p=1.000 n=10) LastIndexAnyASCII/16:16 403.8n ± 0% 403.8n ± 0% ~ (p=1.000 n=10) LastIndexAnyASCII/16:32 564.4n ± 0% 564.4n ± 0% ~ (p=1.000 n=10) LastIndexAnyASCII/16:64 885.5n ± 0% 885.5n ± 0% ~ (p=0.474 n=10) LastIndexAnyASCII/256:1 2.819µ ± 0% 2.819µ ± 0% ~ (p=0.211 n=10) LastIndexAnyASCII/256:2 2.824µ ± 0% 2.824µ ± 0% ~ (p=1.000 n=10) ¹ LastIndexAnyASCII/256:4 2.843µ ± 0% 2.843µ ± 0% ~ (p=1.000 n=10) ¹ LastIndexAnyASCII/256:8 2.883µ ± 0% 2.883µ ± 0% ~ (p=1.000 n=10) ¹ LastIndexAnyASCII/256:16 2.973µ ± 0% 2.973µ ± 0% ~ (p=1.000 n=10) LastIndexAnyASCII/256:32 3.133µ ± 0% 3.133µ ± 0% ~ (p=0.628 n=10) LastIndexAnyASCII/256:64 3.454µ ± 0% 3.454µ ± 0% ~ (p=1.000 n=10) LastIndexAnyUTF8/1:1 30.78n ± 0% 31.45n ± 0% +2.18% (p=0.000 n=10) LastIndexAnyUTF8/1:2 32.13n ± 0% 32.80n ± 0% +2.07% (p=0.000 n=10) LastIndexAnyUTF8/1:4 34.81n ± 0% 35.48n ± 0% +1.92% (p=0.000 n=10) LastIndexAnyUTF8/1:8 40.14n ± 0% 40.81n ± 0% +1.67% (p=0.000 n=10) LastIndexAnyUTF8/1:16 50.84n ± 0% 51.52n ± 0% +1.33% (p=0.000 n=10) LastIndexAnyUTF8/1:32 83.87n ± 0% 62.90n ± 0% -25.00% (p=0.000 n=10) LastIndexAnyUTF8/1:64 121.50n ± 0% 81.67n ± 0% -32.78% (p=0.000 n=10) LastIndexAnyUTF8/16:1 330.0n ± 0% 330.0n ± 0% ~ (p=1.000 n=10) LastIndexAnyUTF8/16:2 365.4n ± 1% 376.1n ± 0% +2.93% (p=0.000 n=10) LastIndexAnyUTF8/16:4 399.9n ± 0% 410.6n ± 0% +2.68% (p=0.000 n=10) LastIndexAnyUTF8/16:8 485.5n ± 0% 496.2n ± 0% +2.20% (p=0.000 n=10) LastIndexAnyUTF8/16:16 656.8n ± 0% 667.5n ± 0% +1.63% (p=0.000 n=10) LastIndexAnyUTF8/16:32 999.3n ± 0% 882.6n ± 0% -11.68% (p=0.000 n=10) LastIndexAnyUTF8/16:64 1.744µ ± 0% 1.129µ ± 0% -35.26% (p=0.000 n=10) LastIndexAnyUTF8/256:1 4.023µ ± 0% 4.023µ ± 0% 0.00% (p=0.033 n=10) LastIndexAnyUTF8/256:2 4.645µ ± 0% 4.816µ ± 0% +3.68% (p=0.000 n=10) LastIndexAnyUTF8/256:4 5.217µ ± 0% 5.388µ ± 0% +3.28% (p=0.000 n=10) LastIndexAnyUTF8/256:8 6.587µ ± 0% 6.758µ ± 0% +2.60% (p=0.000 n=10) LastIndexAnyUTF8/256:16 9.327µ ± 0% 9.498µ ± 0% +1.83% (p=0.000 n=10) LastIndexAnyUTF8/256:32 14.81µ ± 0% 12.92µ ± 0% -12.73% (p=0.000 n=10) LastIndexAnyUTF8/256:64 26.69µ ± 0% 16.84µ ± 0% -36.92% (p=0.000 n=10) IndexPeriodic/IndexPeriodic2 625.6µ ± 0% 625.6µ ± 0% ~ (p=0.529 n=10) IndexPeriodic/IndexPeriodic4 625.5µ ± 0% 625.6µ ± 0% +0.01% (p=0.002 n=10) IndexPeriodic/IndexPeriodic8 625.4µ ± 0% 625.4µ ± 0% +0.01% (p=0.001 n=10) IndexPeriodic/IndexPeriodic16 236.5µ ± 0% 225.4µ ± 0% -4.69% (p=0.000 n=10) IndexPeriodic/IndexPeriodic32 171.1µ ± 3% 133.4µ ± 0% -22.05% (p=0.000 n=10) IndexPeriodic/IndexPeriodic64 139.10µ ± 3% 89.28µ ± 0% -35.82% (p=0.000 n=10) geomean 4.222µ 3.628µ -14.0 Subset of strings Index benchmarks IndexRune 110.7n ± 0% 117.7n ± 0% +6.32% (p=0.000 n=10) IndexRuneLongString 246.6n ± 0% 187.4n ± 3% -24.01% (p=0.000 n=10) IndexRuneFastPath 46.82n ± 0% 46.06n ± 0% -1.62% (p=0.000 n=10) Index 48.28n ± 0% 47.61n ± 0% -1.39% (p=0.000 n=10) LastIndex 34.50n ± 0% 34.50n ± 0% ~ (p=1.000 n=10) ¹ IndexByte 41.72n ± 0% 40.83n ± 0% -2.13% (p=0.000 n=10) IndexHard1 10.01m ± 0% 10.01m ± 0% +0.02% (p=0.000 n=10) IndexHard2 10.01m ± 0% 10.01m ± 0% +0.02% (p=0.000 n=10) IndexHard3 10.01m ± 0% 10.01m ± 0% +0.02% (p=0.000 n=10) IndexHard4 10.01m ± 0% 10.01m ± 0% +0.02% (p=0.000 n=10) LastIndexHard1 10.71m ± 0% 10.71m ± 0% +0.03% (p=0.000 n=10) LastIndexHard2 10.71m ± 0% 10.71m ± 0% +0.03% (p=0.000 n=10) LastIndexHard3 10.71m ± 0% 10.71m ± 0% +0.03% (p=0.000 n=10) IndexTorture 71.33µ ± 0% 71.37µ ± 0% +0.05% (p=0.000 n=10) IndexAnyASCII/1:1 34.40n ± 0% 35.07n ± 0% +1.95% (p=0.000 n=10) IndexAnyASCII/1:2 46.87n ± 0% 47.54n ± 0% +1.43% (p=0.000 n=10) IndexAnyASCII/1:4 49.53n ± 0% 50.20n ± 0% +1.35% (p=0.000 n=10) IndexAnyASCII/1:8 54.86n ± 0% 55.53n ± 0% +1.22% (p=0.000 n=10) IndexAnyASCII/1:16 65.56n ± 0% 66.24n ± 0% +1.04% (p=0.000 n=10) IndexAnyASCII/1:32 86.97n ± 0% 77.82n ± 0% -10.52% (p=0.000 n=10) IndexAnyASCII/1:64 134.50n ± 0% 98.57n ± 0% -26.71% (p=0.000 n=10) IndexAnyASCII/16:1 54.19n ± 0% 54.86n ± 0% +1.24% (p=0.000 n=10) IndexAnyASCII/16:2 257.4n ± 0% 256.7n ± 0% -0.27% (p=0.000 n=10) IndexAnyASCII/16:4 275.3n ± 0% 275.3n ± 0% ~ (p=1.000 n=10) IndexAnyASCII/16:8 315.4n ± 0% 315.5n ± 0% +0.03% (p=0.001 n=10) IndexAnyASCII/16:16 405.4n ± 0% 405.4n ± 0% ~ (p=1.000 n=10) IndexAnyASCII/16:32 566.0n ± 0% 566.0n ± 0% ~ (p=1.000 n=10) IndexAnyASCII/16:64 887.0n ± 0% 887.1n ± 0% ~ (p=0.181 n=10) IndexAnyASCII/256:1 380.0n ± 0% 174.7n ± 0% -54.03% (p=0.000 n=10) IndexAnyASCII/256:2 2.826µ ± 0% 2.826µ ± 0% ~ (p=1.000 n=10) ¹ IndexAnyASCII/256:4 2.844µ ± 0% 2.844µ ± 0% ~ (p=1.000 n=10) ¹ IndexAnyASCII/256:8 2.884µ ± 0% 2.884µ ± 0% ~ (p=0.087 n=10) IndexAnyASCII/256:16 2.974µ ± 0% 2.974µ ± 0% ~ (p=1.000 n=10) IndexAnyASCII/256:32 3.135µ ± 0% 3.135µ ± 0% ~ (p=1.000 n=10) IndexAnyASCII/256:64 3.456µ ± 0% 3.456µ ± 0% ~ (p=1.000 n=10) ¹ IndexAnyUTF8/1:1 38.13n ± 0% 38.13n ± 0% ~ (p=1.000 n=10) ¹ IndexAnyUTF8/1:2 46.87n ± 0% 47.54n ± 0% +1.43% (p=0.000 n=10) IndexAnyUTF8/1:4 49.53n ± 0% 50.19n ± 0% +1.33% (p=0.000 n=10) IndexAnyUTF8/1:8 54.86n ± 0% 55.52n ± 0% +1.20% (p=0.000 n=10) IndexAnyUTF8/1:16 65.56n ± 0% 66.23n ± 0% +1.02% (p=0.000 n=10) IndexAnyUTF8/1:32 86.97n ± 0% 82.25n ± 0% -5.42% (p=0.000 n=10) IndexAnyUTF8/1:64 134.50n ± 0% 99.96n ± 0% -25.68% (p=0.000 n=10) IndexAnyUTF8/16:1 98.34n ± 0% 98.34n ± 0% ~ (p=1.000 n=10) IndexAnyUTF8/16:2 462.7n ± 0% 473.7n ± 0% +2.38% (p=0.000 n=10) IndexAnyUTF8/16:4 504.6n ± 0% 515.3n ± 0% +2.11% (p=0.000 n=10) IndexAnyUTF8/16:8 589.1n ± 0% 599.7n ± 0% +1.80% (p=0.000 n=10) IndexAnyUTF8/16:16 760.4n ± 0% 770.9n ± 0% +1.38% (p=0.000 n=10) IndexAnyUTF8/16:32 1.103µ ± 0% 1.023µ ± 0% -7.25% (p=0.000 n=10) IndexAnyUTF8/16:64 1.857µ ± 0% 1.294µ ± 0% -30.32% (p=0.000 n=10) IndexAnyUTF8/256:1 1.066µ ± 0% 1.066µ ± 0% ~ (p=1.000 n=10) ¹ IndexAnyUTF8/256:2 6.106µ ± 0% 6.277µ ± 0% +2.81% (p=0.000 n=10) IndexAnyUTF8/256:4 6.787µ ± 0% 6.958µ ± 0% +2.52% (p=0.000 n=10) IndexAnyUTF8/256:8 8.136µ ± 0% 8.308µ ± 0% +2.11% (p=0.000 n=10) IndexAnyUTF8/256:16 10.88µ ± 0% 11.05µ ± 0% +1.57% (p=0.000 n=10) IndexAnyUTF8/256:32 16.36µ ± 0% 14.90µ ± 0% -8.93% (p=0.000 n=10) IndexAnyUTF8/256:64 28.51µ ± 0% 19.41µ ± 0% -31.92% (p=0.000 n=10) LastIndexAnyASCII/1:1 35.79n ± 0% 38.52n ± 0% +7.63% (p=0.000 n=10) LastIndexAnyASCII/1:2 37.12n ± 0% 39.85n ± 0% +7.35% (p=0.000 n=10) LastIndexAnyASCII/1:4 39.76n ± 0% 42.08n ± 0% +5.84% (p=0.000 n=10) LastIndexAnyASCII/1:8 44.82n ± 0% 47.22n ± 0% +5.34% (p=0.000 n=10) LastIndexAnyASCII/1:16 55.53n ± 0% 57.92n ± 3% +4.30% (p=0.000 n=10) LastIndexAnyASCII/1:32 76.94n ± 0% 70.16n ± 0% -8.81% (p=0.000 n=10) LastIndexAnyASCII/1:64 124.40n ± 0% 89.67n ± 0% -27.92% (p=0.000 n=10) LastIndexAnyASCII/16:1 245.9n ± 0% 245.9n ± 0% ~ (p=1.000 n=10) LastIndexAnyASCII/16:2 255.2n ± 0% 255.2n ± 0% ~ (p=1.000 n=10) ¹ LastIndexAnyASCII/16:4 275.1n ± 0% 275.1n ± 0% ~ (p=1.000 n=10) ¹ LastIndexAnyASCII/16:8 315.2n ± 0% 315.2n ± 0% ~ (p=1.000 n=10) LastIndexAnyASCII/16:16 400.4n ± 0% 400.4n ± 0% ~ (p=0.087 n=10) LastIndexAnyASCII/16:32 560.9n ± 0% 560.9n ± 0% ~ (p=0.124 n=10) LastIndexAnyASCII/16:64 882.1n ± 0% 882.0n ± 0% -0.01% (p=0.003 n=10) LastIndexAnyASCII/256:1 2.815µ ± 0% 2.815µ ± 0% ~ (p=0.211 n=10) LastIndexAnyASCII/256:2 2.824µ ± 0% 2.824µ ± 0% ~ (p=1.000 n=10) LastIndexAnyASCII/256:4 2.844µ ± 0% 2.844µ ± 0% ~ (p=1.000 n=10) ¹ LastIndexAnyASCII/256:8 2.884µ ± 0% 2.884µ ± 0% ~ (p=1.000 n=10) ¹ LastIndexAnyASCII/256:16 2.969µ ± 0% 2.969µ ± 0% ~ (p=1.000 n=10) LastIndexAnyASCII/256:32 3.130µ ± 0% 3.130µ ± 0% ~ (p=1.000 n=10) ¹ LastIndexAnyASCII/256:64 3.451µ ± 0% 3.451µ ± 0% ~ (p=0.474 n=10) LastIndexAnyUTF8/1:1 35.79n ± 0% 36.13n ± 0% +0.95% (p=0.000 n=10) LastIndexAnyUTF8/1:2 37.11n ± 0% 37.47n ± 0% +0.97% (p=0.000 n=10) LastIndexAnyUTF8/1:4 39.75n ± 0% 40.14n ± 0% +0.97% (p=0.000 n=10) LastIndexAnyUTF8/1:8 44.82n ± 0% 45.49n ± 0% +1.49% (p=0.000 n=10) LastIndexAnyUTF8/1:16 55.52n ± 0% 56.20n ± 0% +1.22% (p=0.000 n=10) LastIndexAnyUTF8/1:32 76.93n ± 0% 74.25n ± 0% -3.48% (p=0.000 n=10) LastIndexAnyUTF8/1:64 124.40n ± 0% 91.15n ± 0% -26.73% (p=0.000 n=10) LastIndexAnyUTF8/16:1 322.5n ± 0% 322.5n ± 0% ~ (p=0.087 n=10) LastIndexAnyUTF8/16:2 634.2n ± 0% 616.4n ± 0% -2.81% (p=0.000 n=10) LastIndexAnyUTF8/16:4 674.5n ± 0% 657.9n ± 0% -2.46% (p=0.000 n=10) LastIndexAnyUTF8/16:8 758.3n ± 0% 741.0n ± 0% -2.28% (p=0.000 n=10) LastIndexAnyUTF8/16:16 929.6n ± 0% 912.3n ± 0% -1.86% (p=0.000 n=10) LastIndexAnyUTF8/16:32 1.272µ ± 0% 1.176µ ± 0% -7.55% (p=0.000 n=10) LastIndexAnyUTF8/16:64 2.018µ ± 0% 1.453µ ± 0% -28.00% (p=0.000 n=10) LastIndexAnyUTF8/256:1 4.015µ ± 0% 4.016µ ± 0% +0.02% (p=0.000 n=10) LastIndexAnyUTF8/256:2 8.896µ ± 0% 8.537µ ± 0% -4.04% (p=0.000 n=10) LastIndexAnyUTF8/256:4 9.553µ ± 0% 9.217µ ± 0% -3.52% (p=0.000 n=10) LastIndexAnyUTF8/256:8 10.90µ ± 0% 10.54µ ± 0% -3.29% (p=0.000 n=10) LastIndexAnyUTF8/256:16 13.64µ ± 0% 13.28µ ± 0% -2.63% (p=0.000 n=10) LastIndexAnyUTF8/256:32 19.12µ ± 0% 17.16µ ± 1% -10.23% (p=0.000 n=10) LastIndexAnyUTF8/256:64 31.11µ ± 0% 21.98µ ± 0% -29.36% (p=0.000 n=10) IndexPeriodic/IndexPeriodic2 625.5µ ± 0% 625.5µ ± 0% ~ (p=0.955 n=10) IndexPeriodic/IndexPeriodic4 625.4µ ± 0% 625.4µ ± 0% ~ (p=0.838 n=10) IndexPeriodic/IndexPeriodic8 625.3µ ± 0% 625.3µ ± 0% +0.01% (p=0.009 n=10) IndexPeriodic/IndexPeriodic16 229.8µ ± 0% 227.0µ ± 0% -1.22% (p=0.000 n=10) IndexPeriodic/IndexPeriodic32 168.9µ ± 3% 131.8µ ± 0% -22.00% (p=0.000 n=10) IndexPeriodic/IndexPeriodic64 126.36µ ± 0% 86.66µ ± 0% -31.42% (p=0.000 n=10) geomean 1.361µ 1.302µ -4.31% As these functions are so heavily used this change impacts other benchmarks. I include the improvements in geomean for the all the benchmarks in the strings and bytes packages, along with some selected benchmarks to illustrate the impact of the change. geomean for bytes 13.81µ 12.92µ -6.44% geomean for string 9.385µ 9.224µ -1.72% Note that when building for rva22u64 a single Zbb instruction is used in the main loop. This also helps to improve performance slightly. The geomean for all the bytes benchmarks when building with GORISCV64=rva22u64 with and without the patch is shown below. geomean for bytes (rva22u64) 13.46µ 12.49µ -7.21% Examples of non-Index benchmarks affected by this commit. ReadString uses IndexByte to search for a byte stored at the end of 32KB buffer, so we see a speed up. SplitSingleByteSeparator searches large buffers, but the byte being sought occurs within the first 15 bytes of the buffer, 76% of the time, hence the slowdown. In SplitMultiByteSeparator the first byte of the separator only occurs in the first 15 bytes 33% of the time so we see a speed up. ReadString 05.13µ ± 2% 74.67µ ± 0% -28.97% (p=0.000 n=10) SplitSingleByteSeparator 11.31m ± 2% 12.43m ± 1% +9.83% (p=0.000 n=10) SplitMultiByteSeparator 8.070m ± 1% 7.707m ± 1% -4.49% (p=0.000 n=10) Change-Id: I6210ea2f3decdc6d2e0609df72b1b66e6d6f5395 Reviewed-on: https://go-review.googlesource.com/c/go/+/561275 Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-10-29cmd/internal/obj/ppc64: support for extended mnemonics of BCJayanth Krishnamurthy
BGT, BLT, BLE, BGE, BNE, BVS, BVC, and BEQ support by assembler. This will simplify the usage of BC constructs like BC 12, 30, LR <=> BEQ CR7, LR BC 12, 2, LR <=> BEQ CR0, LR BC 12, 0, target <=> BLT CR0, target BC 12, 2, target <=> BEQ CR0, target BC 12, 5, target <=> BGT CR1, target BC 12, 30, target <=> BEQ CR7, target BC 4, 6, target <=> BNE CR1, target BC 4, 5, target <=> BLE CR1, target code cleanup based on the above additions. Change-Id: I02fdb212b6fe3f85ce447e05f4d42118c9ce63b5 Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10 Reviewed-on: https://go-review.googlesource.com/c/go/+/612395 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2024-09-11src/internal/bytealg: optimize the function Compare on loong64Xiaolin Zhao
The relevant performance improved by 66.73%. benchmark: goos: linux goarch: loong64 pkg: bytes cpu: Loongson-3A6000 @ 2500.00MHz │ old │ new │ │ sec/op │ sec/op vs base │ BytesCompare/1 5.603n ± 0% 4.002n ± 0% -28.57% (p=0.000 n=20) BytesCompare/2 6.405n ± 0% 4.002n ± 0% -37.52% (p=0.000 n=20) BytesCompare/4 8.007n ± 0% 4.002n ± 0% -50.02% (p=0.000 n=20) BytesCompare/8 11.210n ± 0% 4.002n ± 0% -64.30% (p=0.000 n=20) BytesCompare/16 6.005n ± 0% 4.802n ± 0% -20.03% (p=0.000 n=20) BytesCompare/32 6.806n ± 0% 4.402n ± 0% -35.32% (p=0.000 n=20) BytesCompare/64 8.407n ± 0% 6.003n ± 0% -28.60% (p=0.000 n=20) BytesCompare/128 11.610n ± 0% 8.404n ± 0% -27.61% (p=0.000 n=20) BytesCompare/256 18.02n ± 0% 14.01n ± 0% -22.25% (p=0.000 n=20) BytesCompare/512 31.23n ± 0% 26.98n ± 0% -13.61% (p=0.000 n=20) BytesCompare/1024 56.85n ± 0% 52.43n ± 0% -7.77% (p=0.000 n=20) BytesCompare/2048 108.1n ± 0% 103.8n ± 0% -3.98% (p=0.000 n=20) CompareBytesEqual 15.610n ± 0% 5.203n ± 0% -66.67% (p=0.000 n=20) CompareBytesToNil 3.203n ± 0% 3.202n ± 0% -0.03% (p=0.000 n=20) CompareBytesEmpty 3.203n ± 0% 2.423n ± 0% -24.35% (p=0.000 n=20) CompareBytesIdentical 3.203n ± 0% 2.424n ± 0% -24.32% (p=0.000 n=20) CompareBytesSameLength 8.407n ± 0% 8.004n ± 0% -4.79% (p=0.000 n=20) CompareBytesDifferentLength 8.808n ± 0% 7.604n ± 0% -13.67% (p=0.000 n=20) CompareBytesBigUnaligned/offset=1 839.85µ ± 0% 82.04µ ± 0% -90.23% (p=0.000 n=20) CompareBytesBigUnaligned/offset=2 839.86µ ± 0% 82.03µ ± 0% -90.23% (p=0.000 n=20) CompareBytesBigUnaligned/offset=3 839.86µ ± 0% 82.03µ ± 0% -90.23% (p=0.000 n=20) CompareBytesBigUnaligned/offset=4 839.86µ ± 0% 82.03µ ± 0% -90.23% (p=0.000 n=20) CompareBytesBigUnaligned/offset=5 839.85µ ± 0% 82.04µ ± 0% -90.23% (p=0.000 n=20) CompareBytesBigUnaligned/offset=6 839.85µ ± 0% 82.03µ ± 0% -90.23% (p=0.000 n=20) CompareBytesBigUnaligned/offset=7 839.85µ ± 0% 82.03µ ± 0% -90.23% (p=0.000 n=20) CompareBytesBigBothUnaligned/offset=0 78.77µ ± 0% 78.75µ ± 0% -0.03% (p=0.000 n=20) CompareBytesBigBothUnaligned/offset=1 839.84µ ± 0% 85.31µ ± 0% -89.84% (p=0.000 n=20) CompareBytesBigBothUnaligned/offset=2 839.84µ ± 0% 85.31µ ± 0% -89.84% (p=0.000 n=20) CompareBytesBigBothUnaligned/offset=3 839.85µ ± 0% 85.31µ ± 0% -89.84% (p=0.000 n=20) CompareBytesBigBothUnaligned/offset=4 839.83µ ± 0% 85.31µ ± 0% -89.84% (p=0.000 n=20) CompareBytesBigBothUnaligned/offset=5 839.85µ ± 0% 85.31µ ± 0% -89.84% (p=0.000 n=20) CompareBytesBigBothUnaligned/offset=6 839.85µ ± 0% 85.31µ ± 0% -89.84% (p=0.000 n=20) CompareBytesBigBothUnaligned/offset=7 839.84µ ± 0% 85.31µ ± 0% -89.84% (p=0.000 n=20) CompareBytesBig 78.77µ ± 0% 78.75µ ± 0% -0.03% (p=0.001 n=20) CompareBytesBigIdentical 2.802n ± 0% 2.801n ± 0% -0.04% (p=0.001 n=20) geomean 1.524µ 507.2n -66.73% Change-Id: Ice9f4ef0ce0fbb5a6424823c5f8e0c0c369fd159 Reviewed-on: https://go-review.googlesource.com/c/go/+/589538 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Tim King <taking@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Auto-Submit: Tim King <taking@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2024-08-06internal/bytealg: optimize Equal for arm64 targetVasily Leonenko
Remove redundant intermediate jump in runtime.memequal Remove redundant a.ptr==b.ptr check in runtime.memequal_varlen Add 16-bytes alignment before some labels in runtime.memequal goos: linux goarch: arm64 pkg: bytes │ ./master.log │ ./opt.log │ │ sec/op │ sec/op vs base │ Equal/0-4 0.8342n ± 0% 0.5254n ± 3% -37.01% (p=0.000 n=8) Equal/same/1-4 2.720n ± 0% 2.720n ± 2% ~ (p=0.779 n=8) Equal/same/6-4 2.720n ± 5% 2.720n ± 2% ~ (p=0.908 n=8) Equal/same/9-4 2.722n ± 2% 2.721n ± 2% ~ (p=0.779 n=8) Equal/same/15-4 2.719n ± 0% 2.719n ± 0% ~ (p=0.641 n=8) Equal/same/16-4 2.721n ± 2% 2.719n ± 0% -0.07% (p=0.014 n=8) Equal/same/20-4 2.720n ± 0% 2.721n ± 2% ~ (p=0.236 n=8) Equal/same/32-4 2.720n ± 1% 2.720n ± 0% ~ (p=0.396 n=8) Equal/same/4K-4 2.719n ± 0% 2.720n ± 0% ~ (p=0.663 n=8) Equal/same/4M-4 2.721n ± 0% 2.720n ± 0% ~ (p=0.075 n=8) Equal/same/64M-4 2.720n ± 0% 2.720n ± 2% ~ (p=0.806 n=8) Equal/1-4 6.671n ± 0% 5.449n ± 0% -18.33% (p=0.000 n=8) Equal/6-4 8.761n ± 2% 7.508n ± 0% -14.30% (p=0.000 n=8) Equal/9-4 8.343n ± 0% 7.091n ± 0% -15.01% (p=0.000 n=8) Equal/15-4 8.339n ± 2% 7.090n ± 0% -14.98% (p=0.000 n=8) Equal/16-4 9.173n ± 0% 7.925n ± 2% -13.61% (p=0.000 n=8) Equal/20-4 11.26n ± 0% 10.01n ± 0% -11.10% (p=0.000 n=8) Equal/32-4 10.425n ± 0% 9.176n ± 0% -11.98% (p=0.000 n=8) Equal/4K-4 192.9n ± 0% 192.7n ± 0% -0.10% (p=0.044 n=8) Equal/4M-4 191.3µ ± 0% 191.3µ ± 0% ~ (p=0.798 n=8) Equal/64M-4 3.066m ± 2% 3.065m ± 0% ~ (p=0.083 n=8) EqualBothUnaligned/64_0-4 7.506n ± 2% 7.090n ± 2% -5.55% (p=0.000 n=8) EqualBothUnaligned/64_1-4 7.850n ± 1% 7.423n ± 0% -5.43% (p=0.000 n=8) EqualBothUnaligned/64_4-4 7.505n ± 0% 7.088n ± 0% -5.56% (p=0.000 n=8) EqualBothUnaligned/64_7-4 7.840n ± 0% 7.413n ± 0% -5.44% (p=0.000 n=8) EqualBothUnaligned/4096_0-4 193.0n ± 4% 190.9n ± 0% -1.09% (p=0.004 n=8) EqualBothUnaligned/4096_1-4 223.9n ± 0% 223.1n ± 0% -0.36% (p=0.000 n=8) EqualBothUnaligned/4096_4-4 191.9n ± 2% 191.5n ± 0% -0.21% (p=0.004 n=8) EqualBothUnaligned/4096_7-4 223.8n ± 0% 223.1n ± 1% ~ (p=0.098 n=8) EqualBothUnaligned/4194304_0-4 191.8µ ± 0% 191.8µ ± 0% ~ (p=0.504 n=8) EqualBothUnaligned/4194304_1-4 225.4µ ± 2% 225.5µ ± 0% ~ (p=0.065 n=8) EqualBothUnaligned/4194304_4-4 192.6µ ± 0% 192.7µ ± 2% +0.06% (p=0.041 n=8) EqualBothUnaligned/4194304_7-4 225.4µ ± 0% 225.5µ ± 0% +0.05% (p=0.050 n=8) EqualBothUnaligned/67108864_0-4 3.069m ± 0% 3.069m ± 0% ~ (p=0.314 n=8) EqualBothUnaligned/67108864_1-4 3.589m ± 0% 3.588m ± 0% ~ (p=0.959 n=8) EqualBothUnaligned/67108864_4-4 3.083m ± 0% 3.083m ± 2% ~ (p=0.505 n=8) EqualBothUnaligned/67108864_7-4 3.588m ± 0% 3.588m ± 0% ~ (p=1.000 n=8) geomean 199.9n 190.5n -4.70% Change-Id: Ib8d0d4006dd39162a600ac98a5f44a0f05136ed3 Reviewed-on: https://go-review.googlesource.com/c/go/+/601135 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org>
2024-07-17internal/bytealg: extend memchr result correctly on wasmZxilly
The mem address should be regarded as uint32. Fixes #65571 Change-Id: Icee38d11f2d93eeca7d50b2e133159e321daeb90 GitHub-Last-Rev: c2568b104369bcf5c4d42c6281d235a52bb9675f GitHub-Pull-Request: golang/go#68400 Reviewed-on: https://go-review.googlesource.com/c/go/+/597955 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-29all: document legacy //go:linkname for final round of modulesRuss Cox
Add linknames for most modules with ≥50 dependents. Add linknames for a few other modules that we know are important but are below 50. Remove linknames from badlinkname.go that do not merit inclusion (very small number of dependents). We can add them back later if the need arises. Fixes #67401. (For now.) Change-Id: I1e49fec0292265256044d64b1841d366c4106002 Reviewed-on: https://go-review.googlesource.com/c/go/+/587756 Auto-Submit: Russ Cox <rsc@golang.org> TryBot-Bypass: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-05-29all: document legacy //go:linkname for modules with ≥100 dependentsRuss Cox
For #67401. Change-Id: I015408a3f437c1733d97160ef2fb5da6d4efcc5c Reviewed-on: https://go-review.googlesource.com/c/go/+/587598 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org>
2024-05-23all: document legacy //go:linkname for modules with ≥500 dependentsRuss Cox
For #67401. Change-Id: I7dd28c3b01a1a647f84929d15412aa43ab0089ee Reviewed-on: https://go-review.googlesource.com/c/go/+/587575 Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-13all: delete loong64 non-register ABI fallback pathGuoqi Chen
Change-Id: If1d3eba9a922ac6f9d78301bb8f07e445c712899 Reviewed-on: https://go-review.googlesource.com/c/go/+/525576 Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Meidan Li <limeidan@loongson.cn> Commit-Queue: abner chenc <chenguoqi@loongson.cn> Run-TryBot: abner chenc <chenguoqi@loongson.cn>
2024-04-04strings: intrinsify and optimize CompareEmma Haruka Iwao
slices.SortFunc requires a three-way comparison and we need an efficient strings.Compare to perform three-way string comparisons. This new implementation adds bytealg.CompareString as a wrapper of runtime_cmpstring and changes Compare to use bytealg.CompareString. The new implementation of Compare with runtime_cmpstring is about 28% faster than the previous one. Fixes #61725 │ /tmp/gobench-sort-cmp.txt │ /tmp/gobench-sort-strings.txt │ │ sec/op │ sec/op vs base │ SortFuncStruct/Size16-48 918.8n ± 1% 726.6n ± 0% -20.92% (p=0.000 n=10) SortFuncStruct/Size32-48 2.666µ ± 1% 2.003µ ± 1% -24.85% (p=0.000 n=10) SortFuncStruct/Size64-48 1.934µ ± 1% 1.331µ ± 1% -31.22% (p=0.000 n=10) SortFuncStruct/Size128-48 3.560µ ± 1% 2.423µ ± 0% -31.94% (p=0.000 n=10) SortFuncStruct/Size512-48 13.019µ ± 0% 9.071µ ± 0% -30.33% (p=0.000 n=10) SortFuncStruct/Size1024-48 25.61µ ± 0% 17.75µ ± 0% -30.70% (p=0.000 n=10) geomean 4.217µ 3.018µ -28.44% Change-Id: I2513b6f8c1b9b273ef2d23f0a86f691e2d097eb6 Reviewed-on: https://go-review.googlesource.com/c/go/+/532195 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Ian Lance Taylor <iant@golang.org> Reviewed-by: qiu laidongfeng2 <2645477756@qq.com> Reviewed-by: Keith Randall <khr@google.com>
2024-02-19strings: make use of sizeclasses in (*Builder).GrowMateusz Poliwczak
Fixes #64833 Change-Id: Ice3f5dfab65f5525bc7a6f57ddeaabda8d64dfa3 GitHub-Last-Rev: 38f1d6c19d8ec29ae5645ce677839a301f798df3 GitHub-Pull-Request: golang/go#64835 Reviewed-on: https://go-review.googlesource.com/c/go/+/552135 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-01-24runtime: short path for equal pointers in arm64 memequalMauri de Souza Meneguzzo
If memequal is invoked with the same pointers as arguments it ends up comparing the whole memory contents, instead of just comparing the pointers. This effectively makes an operation that could be O(1) into O(n). All the other architectures already have this optimization in place. For instance, arm64 also have it, in memequal_varlen. Such optimization is very specific, one case that it will probably benefit is programs that rely heavily on interning of strings. goos: darwin goarch: arm64 pkg: bytes │ old.txt │ new.txt │ │ sec/op │ sec/op vs base │ Equal/same/1-8 2.678n ± ∞ ¹ 2.400n ± ∞ ¹ -10.38% (p=0.008 n=5) Equal/same/6-8 3.267n ± ∞ ¹ 2.431n ± ∞ ¹ -25.59% (p=0.008 n=5) Equal/same/9-8 2.981n ± ∞ ¹ 2.385n ± ∞ ¹ -19.99% (p=0.008 n=5) Equal/same/15-8 2.974n ± ∞ ¹ 2.390n ± ∞ ¹ -19.64% (p=0.008 n=5) Equal/same/16-8 2.983n ± ∞ ¹ 2.380n ± ∞ ¹ -20.21% (p=0.008 n=5) Equal/same/20-8 3.567n ± ∞ ¹ 2.384n ± ∞ ¹ -33.17% (p=0.008 n=5) Equal/same/32-8 3.568n ± ∞ ¹ 2.385n ± ∞ ¹ -33.16% (p=0.008 n=5) Equal/same/4K-8 78.040n ± ∞ ¹ 2.378n ± ∞ ¹ -96.95% (p=0.008 n=5) Equal/same/4M-8 78713.000n ± ∞ ¹ 2.385n ± ∞ ¹ -100.00% (p=0.008 n=5) Equal/same/64M-8 1348095.000n ± ∞ ¹ 2.381n ± ∞ ¹ -100.00% (p=0.008 n=5) geomean 43.52n 2.390n -94.51% ¹ need >= 6 samples for confidence interval at level 0.95 │ old.txt │ new.txt │ │ B/s │ B/s vs base │ Equal/same/1-8 356.1Mi ± ∞ ¹ 397.3Mi ± ∞ ¹ +11.57% (p=0.008 n=5) Equal/same/6-8 1.711Gi ± ∞ ¹ 2.298Gi ± ∞ ¹ +34.35% (p=0.008 n=5) Equal/same/9-8 2.812Gi ± ∞ ¹ 3.515Gi ± ∞ ¹ +24.99% (p=0.008 n=5) Equal/same/15-8 4.698Gi ± ∞ ¹ 5.844Gi ± ∞ ¹ +24.41% (p=0.008 n=5) Equal/same/16-8 4.995Gi ± ∞ ¹ 6.260Gi ± ∞ ¹ +25.34% (p=0.008 n=5) Equal/same/20-8 5.222Gi ± ∞ ¹ 7.814Gi ± ∞ ¹ +49.63% (p=0.008 n=5) Equal/same/32-8 8.353Gi ± ∞ ¹ 12.496Gi ± ∞ ¹ +49.59% (p=0.008 n=5) Equal/same/4K-8 48.88Gi ± ∞ ¹ 1603.96Gi ± ∞ ¹ +3181.17% (p=0.008 n=5) Equal/same/4M-8 49.63Gi ± ∞ ¹ 1637911.85Gi ± ∞ ¹ +3300381.91% (p=0.008 n=5) Equal/same/64M-8 46.36Gi ± ∞ ¹ 26253069.97Gi ± ∞ ¹ +56626517.99% (p=0.008 n=5) geomean 6.737Gi 122.7Gi +1721.01% ¹ need >= 6 samples for confidence interval at level 0.95 Fixes #64381 Change-Id: I7d423930a688edd88c4ba60d45e097296d9be852 GitHub-Last-Rev: ae8189fafb1cba87b5394f09f971746ae9299273 GitHub-Pull-Request: golang/go#64419 Reviewed-on: https://go-review.googlesource.com/c/go/+/545416 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com>
2023-11-22internal/bytealg: optimize Count with PCALIGN in riscv64Meng Zhuo
For #63678 Benchmark on Milk-V Mars CM eMMC (Starfive/JH7110 SoC) goos: linux goarch: riscv64 pkg: bytes │ /root/bytes.old.bench │ /root/bytes.pc16.bench │ │ sec/op │ sec/op vs base │ Count/10 223.9n ± 1% 220.8n ± 1% -1.36% (p=0.001 n=10) Count/32 571.6n ± 0% 571.3n ± 0% ~ (p=0.054 n=10) Count/4K 38.56µ ± 0% 38.55µ ± 0% -0.01% (p=0.010 n=10) Count/4M 40.13m ± 0% 39.21m ± 0% -2.28% (p=0.000 n=10) Count/64M 627.5m ± 0% 627.4m ± 0% -0.01% (p=0.019 n=10) CountEasy/10 101.3n ± 0% 101.3n ± 0% ~ (p=1.000 n=10) ¹ CountEasy/32 139.3n ± 0% 139.3n ± 0% ~ (p=1.000 n=10) ¹ CountEasy/4K 5.565µ ± 0% 5.564µ ± 0% -0.02% (p=0.001 n=10) CountEasy/4M 5.619m ± 0% 5.619m ± 0% ~ (p=0.190 n=10) CountEasy/64M 89.94m ± 0% 89.93m ± 0% ~ (p=0.436 n=10) CountSingle/10 53.80n ± 0% 46.06n ± 0% -14.39% (p=0.000 n=10) CountSingle/32 104.30n ± 0% 79.64n ± 0% -23.64% (p=0.000 n=10) CountSingle/4K 10.413µ ± 0% 7.247µ ± 0% -30.40% (p=0.000 n=10) CountSingle/4M 11.603m ± 0% 8.388m ± 0% -27.71% (p=0.000 n=10) CountSingle/64M 230.9m ± 0% 172.3m ± 0% -25.40% (p=0.000 n=10) CountHard1 9.981m ± 0% 9.981m ± 0% ~ (p=0.810 n=10) CountHard2 9.981m ± 0% 9.981m ± 0% ~ (p=0.315 n=10) CountHard3 9.981m ± 0% 9.981m ± 0% ~ (p=0.159 n=10) geomean 144.6µ 133.5µ -7.70% ¹ all samples are equal │ /root/bytes.old.bench │ /root/bytes.pc16.bench │ │ B/s │ B/s vs base │ Count/10 42.60Mi ± 1% 43.19Mi ± 1% +1.39% (p=0.001 n=10) Count/32 53.38Mi ± 0% 53.42Mi ± 0% +0.06% (p=0.049 n=10) Count/4K 101.3Mi ± 0% 101.3Mi ± 0% ~ (p=0.077 n=10) Count/4M 99.68Mi ± 0% 102.01Mi ± 0% +2.34% (p=0.000 n=10) Count/64M 102.0Mi ± 0% 102.0Mi ± 0% ~ (p=0.076 n=10) CountEasy/10 94.18Mi ± 0% 94.18Mi ± 0% ~ (p=0.054 n=10) CountEasy/32 219.1Mi ± 0% 219.1Mi ± 0% +0.01% (p=0.016 n=10) CountEasy/4K 702.0Mi ± 0% 702.0Mi ± 0% +0.00% (p=0.000 n=10) CountEasy/4M 711.9Mi ± 0% 711.9Mi ± 0% ~ (p=0.133 n=10) CountEasy/64M 711.6Mi ± 0% 711.7Mi ± 0% ~ (p=0.447 n=10) CountSingle/10 177.2Mi ± 0% 207.0Mi ± 0% +16.81% (p=0.000 n=10) CountSingle/32 292.7Mi ± 0% 383.2Mi ± 0% +30.91% (p=0.000 n=10) CountSingle/4K 375.1Mi ± 0% 539.0Mi ± 0% +43.70% (p=0.000 n=10) CountSingle/4M 344.7Mi ± 0% 476.9Mi ± 0% +38.33% (p=0.000 n=10) CountSingle/64M 277.2Mi ± 0% 371.5Mi ± 0% +34.05% (p=0.000 n=10) geomean 199.7Mi 219.8Mi +10.10% Change-Id: I1abf6b220b9802028f8ad5eebc8d3b7cfa3e89ea Reviewed-on: https://go-review.googlesource.com/c/go/+/541756 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: M Zhuo <mzh@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Wang Yaduo <wangyaduo@linux.alibaba.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
2023-11-21internal/bytealg: add regABI support in bytealg functions on loong64Guoqi Chen
Update #40724 Co-authored-by: Xiaolin Zhao <zhaoxiaolin@loongson.cn> Change-Id: I4a7392afd7238d44e7d09aaca7e0d733649926ac Reviewed-on: https://go-review.googlesource.com/c/go/+/521785 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Meidan Li <limeidan@loongson.cn> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: David Chase <drchase@google.com>
2023-11-17internal/bytealg: use PCALIGN in memequalAlexander Yastrebov
goos: linux goarch: amd64 pkg: bytes cpu: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz │ master │ HEAD │ │ sec/op │ sec/op vs base │ Equal/0-8 0.2800n ± 22% 0.2865n ± 26% ~ (p=0.075 n=10) Equal/1-8 18.57n ± 2% 19.34n ± 6% +4.15% (p=0.014 n=10) Equal/6-8 19.07n ± 1% 19.38n ± 2% +1.63% (p=0.014 n=10) Equal/9-8 19.39n ± 2% 19.05n ± 1% -1.78% (p=0.005 n=10) Equal/15-8 19.46n ± 1% 19.10n ± 1% -1.85% (p=0.000 n=10) Equal/16-8 19.36n ± 2% 18.95n ± 1% -2.09% (p=0.011 n=10) Equal/20-8 20.20n ± 1% 19.83n ± 1% -1.86% (p=0.001 n=10) Equal/32-8 20.95n ± 1% 20.84n ± 1% -0.57% (p=0.010 n=10) Equal/4K-8 97.40n ± 2% 81.34n ± 3% -16.49% (p=0.000 n=10) Equal/4M-8 81.74µ ± 3% 71.52µ ± 4% -12.49% (p=0.000 n=10) Equal/64M-8 1.319m ± 1% 1.139m ± 3% -13.68% (p=0.000 n=10) EqualBothUnaligned/64_0-8 8.707n ± 4% 8.588n ± 3% ~ (p=0.353 n=10) EqualBothUnaligned/64_1-8 8.513n ± 3% 8.614n ± 2% ~ (p=0.481 n=10) EqualBothUnaligned/64_4-8 8.752n ± 3% 8.637n ± 4% ~ (p=0.148 n=10) EqualBothUnaligned/64_7-8 8.742n ± 3% 8.514n ± 2% ~ (p=0.052 n=10) EqualBothUnaligned/4096_0-8 89.87n ± 3% 70.44n ± 5% -21.63% (p=0.000 n=10) EqualBothUnaligned/4096_1-8 91.67n ± 5% 70.89n ± 3% -22.67% (p=0.000 n=10) EqualBothUnaligned/4096_4-8 90.43n ± 2% 70.52n ± 3% -22.01% (p=0.000 n=10) EqualBothUnaligned/4096_7-8 89.53n ± 3% 72.02n ± 5% -19.56% (p=0.000 n=10) EqualBothUnaligned/4194304_0-8 86.43µ ± 3% 73.40µ ± 4% -15.07% (p=0.000 n=10) EqualBothUnaligned/4194304_1-8 85.48µ ± 2% 75.35µ ± 1% -11.85% (p=0.000 n=10) EqualBothUnaligned/4194304_4-8 86.51µ ± 3% 75.44µ ± 4% -12.80% (p=0.000 n=10) EqualBothUnaligned/4194304_7-8 86.40µ ± 3% 74.41µ ± 3% -13.88% (p=0.000 n=10) EqualBothUnaligned/67108864_0-8 1.374m ± 3% 1.171m ± 3% -14.75% (p=0.000 n=10) EqualBothUnaligned/67108864_1-8 1.401m ± 4% 1.198m ± 4% -14.49% (p=0.000 n=10) EqualBothUnaligned/67108864_4-8 1.393m ± 4% 1.205m ± 4% -13.53% (p=0.000 n=10) EqualBothUnaligned/67108864_7-8 1.396m ± 3% 1.199m ± 4% -14.11% (p=0.000 n=10) geomean 735.7n 666.7n -9.39% │ master │ HEAD │ │ B/s │ B/s vs base │ Equal/1-8 51.36Mi ± 2% 49.32Mi ± 6% -3.98% (p=0.015 n=10) Equal/6-8 300.0Mi ± 1% 295.3Mi ± 2% -1.57% (p=0.011 n=10) Equal/9-8 442.5Mi ± 2% 450.6Mi ± 1% +1.82% (p=0.005 n=10) Equal/15-8 734.9Mi ± 1% 748.8Mi ± 1% +1.90% (p=0.000 n=10) Equal/16-8 788.4Mi ± 2% 805.2Mi ± 1% +2.14% (p=0.011 n=10) Equal/20-8 944.2Mi ± 1% 961.8Mi ± 1% +1.87% (p=0.002 n=10) Equal/32-8 1.422Gi ± 0% 1.430Gi ± 1% +0.58% (p=0.011 n=10) Equal/4K-8 39.17Gi ± 2% 46.90Gi ± 3% +19.74% (p=0.000 n=10) Equal/4M-8 47.79Gi ± 3% 54.62Gi ± 4% +14.27% (p=0.000 n=10) Equal/64M-8 47.38Gi ± 1% 54.89Gi ± 3% +15.85% (p=0.000 n=10) EqualBothUnaligned/64_0-8 6.845Gi ± 4% 6.940Gi ± 3% ~ (p=0.353 n=10) EqualBothUnaligned/64_1-8 7.002Gi ± 3% 6.919Gi ± 2% ~ (p=0.481 n=10) EqualBothUnaligned/64_4-8 6.811Gi ± 3% 6.901Gi ± 4% ~ (p=0.165 n=10) EqualBothUnaligned/64_7-8 6.819Gi ± 3% 7.002Gi ± 2% ~ (p=0.052 n=10) EqualBothUnaligned/4096_0-8 42.45Gi ± 3% 54.16Gi ± 5% +27.60% (p=0.000 n=10) EqualBothUnaligned/4096_1-8 41.61Gi ± 6% 53.82Gi ± 3% +29.33% (p=0.000 n=10) EqualBothUnaligned/4096_4-8 42.19Gi ± 2% 54.09Gi ± 3% +28.22% (p=0.000 n=10) EqualBothUnaligned/4096_7-8 42.61Gi ± 3% 52.97Gi ± 5% +24.33% (p=0.000 n=10) EqualBothUnaligned/4194304_0-8 45.20Gi ± 3% 53.22Gi ± 4% +17.75% (p=0.000 n=10) EqualBothUnaligned/4194304_1-8 45.70Gi ± 2% 51.84Gi ± 1% +13.43% (p=0.000 n=10) EqualBothUnaligned/4194304_4-8 45.15Gi ± 3% 51.78Gi ± 4% +14.68% (p=0.000 n=10) EqualBothUnaligned/4194304_7-8 45.21Gi ± 3% 52.50Gi ± 4% +16.12% (p=0.000 n=10) EqualBothUnaligned/67108864_0-8 45.50Gi ± 3% 53.37Gi ± 3% +17.30% (p=0.000 n=10) EqualBothUnaligned/67108864_1-8 44.63Gi ± 4% 52.17Gi ± 4% +16.89% (p=0.000 n=10) EqualBothUnaligned/67108864_4-8 44.86Gi ± 4% 51.88Gi ± 4% +15.65% (p=0.000 n=10) EqualBothUnaligned/67108864_7-8 44.76Gi ± 3% 52.12Gi ± 4% +16.43% (p=0.000 n=10) geomean 9.734Gi 10.79Gi +10.88% For #63678 Change-Id: I427b8756e361fd4d36984c2bdb8bc3661ac3a0b8 GitHub-Last-Rev: 981d272d172a9e07c17fab04d6dbab032ecb2426 GitHub-Pull-Request: golang/go#63757 Reviewed-on: https://go-review.googlesource.com/c/go/+/537995 Reviewed-by: David Chase <drchase@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: qiulaidongfeng <2645477756@qq.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2023-11-09all: clean up addition of constants in riscv64 assemblyJoel Sing
Use ADD with constants, instead of ADDI. Also use SUB with a positive constant rather than ADD with a negative constant. The resulting assembly is still the same. Change-Id: Ife10bf5ae4122e525f0e7d41b5e463e748236a9c Reviewed-on: https://go-review.googlesource.com/c/go/+/540136 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: M Zhuo <mzh@golangcn.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Heschi Kreinick <heschi@google.com> Run-TryBot: Joel Sing <joel@sing.id.au>
2023-11-01internal/bytealg: optimize indexbyte in amd64qiulaidongfeng
goos: windows goarch: amd64 pkg: bytes cpu: AMD Ryzen 7 7840HS w/ Radeon 780M Graphics │ old.txt │ new.txt │ │ sec/op │ sec/op vs base │ IndexByte/10-16 2.613n ± 1% 2.558n ± 1% -2.09% (p=0.014 n=10) IndexByte/32-16 3.034n ± 1% 3.010n ± 2% ~ (p=0.305 n=10) IndexByte/4K-16 57.20n ± 2% 39.58n ± 2% -30.81% (p=0.000 n=10) IndexByte/4M-16 34.48µ ± 1% 33.83µ ± 2% -1.87% (p=0.023 n=10) IndexByte/64M-16 1.493m ± 2% 1.450m ± 2% -2.89% (p=0.000 n=10) IndexBytePortable/10-16 3.172n ± 4% 3.163n ± 2% ~ (p=0.684 n=10) IndexBytePortable/32-16 8.465n ± 2% 8.375n ± 3% ~ (p=0.631 n=10) IndexBytePortable/4K-16 852.0n ± 1% 846.6n ± 3% ~ (p=0.971 n=10) IndexBytePortable/4M-16 868.2µ ± 2% 856.6µ ± 2% ~ (p=0.393 n=10) IndexBytePortable/64M-16 13.81m ± 2% 13.88m ± 3% ~ (p=0.684 n=10) geomean 1.204µ 1.148µ -4.63% │ old.txt │ new.txt │ │ B/s │ B/s vs base │ IndexByte/10-16 3.565Gi ± 1% 3.641Gi ± 1% +2.15% (p=0.015 n=10) IndexByte/32-16 9.821Gi ± 1% 9.899Gi ± 2% ~ (p=0.315 n=10) IndexByte/4K-16 66.70Gi ± 2% 96.39Gi ± 2% +44.52% (p=0.000 n=10) IndexByte/4M-16 113.3Gi ± 1% 115.5Gi ± 2% +1.91% (p=0.023 n=10) IndexByte/64M-16 41.85Gi ± 2% 43.10Gi ± 2% +2.98% (p=0.000 n=10) IndexBytePortable/10-16 2.936Gi ± 4% 2.945Gi ± 2% ~ (p=0.684 n=10) IndexBytePortable/32-16 3.521Gi ± 2% 3.559Gi ± 3% ~ (p=0.631 n=10) IndexBytePortable/4K-16 4.477Gi ± 1% 4.506Gi ± 3% ~ (p=0.971 n=10) IndexBytePortable/4M-16 4.499Gi ± 2% 4.560Gi ± 2% ~ (p=0.393 n=10) IndexBytePortable/64M-16 4.525Gi ± 2% 4.504Gi ± 3% ~ (p=0.684 n=10) geomean 10.04Gi 10.53Gi +4.86% For #63678 Change-Id: I0571c2b540a816d57bd6ed8bb1df4191c7992d92 GitHub-Last-Rev: 7e95b8bfb035b53175f5a1b7d8750113933a7e17 GitHub-Pull-Request: golang/go#63847 Reviewed-on: https://go-review.googlesource.com/c/go/+/538715 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-11-01bytes,internal/bytealg: add func bytealg.LastIndexRabinKarpJes Cok
Also rename 'substr' to 'sep' in IndexRabinKarp for consistency. Change-Id: Icc2ad1116aecaf002c8264daa2fa608306c9a88a GitHub-Last-Rev: 1784b93f53d569991f86585f9011120ea26f193f GitHub-Pull-Request: golang/go#63854 Reviewed-on: https://go-review.googlesource.com/c/go/+/538716 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2023-10-31bytes,internal/bytealg: eliminate IndexRabinKarpBytes using genericsJes Cok
This is a follow-up to CL 538175. Change-Id: Iec2523b36a16d7e157c17858c89fcd43c2470d58 GitHub-Last-Rev: 812d36e57c71ea3bf44d2d64bde0703ef02a1b91 GitHub-Pull-Request: golang/go#63770 Reviewed-on: https://go-review.googlesource.com/c/go/+/538195 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Bryan Mills <bcmills@google.com>
2023-10-31internal/bytealg: optimize Count/CountString in arm64cui fliter
For #63678 goos: darwin goarch: arm64 pkg: strings │ count_old.txt │ count_new.txt │ │ sec/op │ sec/op vs base │ CountHard1-8 368.7µ ± 11% 332.0µ ± 1% -9.95% (p=0.002 n=10) CountHard2-8 348.8µ ± 5% 333.1µ ± 1% -4.51% (p=0.000 n=10) CountHard3-8 402.7µ ± 25% 359.5µ ± 1% -10.75% (p=0.000 n=10) CountTorture-8 10.536µ ± 23% 9.913µ ± 0% -5.91% (p=0.000 n=10) CountTortureOverlapping-8 74.86µ ± 9% 67.56µ ± 1% -9.75% (p=0.000 n=10) CountByte/10-8 6.905n ± 3% 6.690n ± 1% -3.11% (p=0.001 n=10) CountByte/32-8 3.247n ± 13% 3.207n ± 2% -1.23% (p=0.030 n=10) CountByte/4096-8 83.72n ± 1% 82.58n ± 1% -1.36% (p=0.007 n=10) CountByte/4194304-8 85.17µ ± 5% 84.02µ ± 8% ~ (p=0.075 n=10) CountByte/67108864-8 1.497m ± 8% 1.397m ± 2% -6.69% (p=0.000 n=10) geomean 9.977µ 9.426µ -5.53% │ count_old.txt │ count_new.txt │ │ B/s │ B/s vs base │ CountByte/10-8 1.349Gi ± 3% 1.392Gi ± 1% +3.20% (p=0.002 n=10) CountByte/32-8 9.180Gi ± 11% 9.294Gi ± 2% +1.24% (p=0.029 n=10) CountByte/4096-8 45.57Gi ± 1% 46.20Gi ± 1% +1.38% (p=0.007 n=10) CountByte/4194304-8 45.86Gi ± 5% 46.49Gi ± 7% ~ (p=0.075 n=10) CountByte/67108864-8 41.75Gi ± 8% 44.74Gi ± 2% +7.16% (p=0.000 n=10) geomean 16.10Gi 16.55Gi +2.85% Change-Id: Ifc2173ba3a926b0fa9598372d4404b8645929d45 Reviewed-on: https://go-review.googlesource.com/c/go/+/538116 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Bryan Mills <bcmills@google.com> Run-TryBot: shuang cui <imcusg@gmail.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-27bytes,internal/bytealg: eliminate HashStrBytes,HashStrRevBytes using …Jes Cok
…generics The logic of HashStrBytes, HashStrRevBytes and HashStr, HashStrRev, are exactly the same, except that the types are different. Since the bootstrap toolchain is bumped to 1.20, we can eliminate them by using generics. Change-Id: I4336b1cab494ba963f09646c169b45f6b1ee62e3 GitHub-Last-Rev: b11a2bf9476d54bed4bd18a3f9269b5c95a66d67 GitHub-Pull-Request: golang/go#63766 Reviewed-on: https://go-review.googlesource.com/c/go/+/538175 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-06internal/bytealg: process two AVX2 lanes per Count loopAchille Roussel
The branch taken by the bytealg.Count algorithm used to process a single 32 bytes block per loop iteration. Throughput of the algorithm can be improved by unrolling two iterations per loop: the lack of data dependencies between each iteration allows for better utilization of the CPU pipeline. The improvement is most significant on medium size payloads that fit in the L1 cache; beyond the L1 cache size, memory bandwidth is likely the bottleneck and the change does not show any measurable improvements. goos: linux goarch: amd64 pkg: bytes cpu: Intel(R) Xeon(R) CPU @ 2.60GHz │ old.txt │ new.txt │ │ sec/op │ sec/op vs base │ CountSingle/10 4.800n ± 0% 4.811n ± 0% +0.23% (p=0.000 n=10) CountSingle/32 5.445n ± 0% 5.430n ± 0% ~ (p=0.085 n=10) CountSingle/4K 81.38n ± 1% 63.12n ± 0% -22.43% (p=0.000 n=10) CountSingle/4M 133.0µ ± 7% 130.1µ ± 4% ~ (p=0.280 n=10) CountSingle/64M 4.079m ± 1% 4.070m ± 3% ~ (p=0.796 n=10) geomean 1.029µ 973.3n -5.41% │ old.txt │ new.txt │ │ B/s │ B/s vs base │ CountSingle/10 1.940Gi ± 0% 1.936Gi ± 0% -0.22% (p=0.000 n=10) CountSingle/32 5.474Gi ± 0% 5.488Gi ± 0% ~ (p=0.075 n=10) CountSingle/4K 46.88Gi ± 1% 60.43Gi ± 0% +28.92% (p=0.000 n=10) CountSingle/4M 29.39Gi ± 7% 30.02Gi ± 4% ~ (p=0.280 n=10) CountSingle/64M 15.32Gi ± 1% 15.36Gi ± 3% ~ (p=0.796 n=10) geomean 11.75Gi 12.42Gi +5.71% Change-Id: I1098228c726a2ee814806dcb438b7e92febf4370 Reviewed-on: https://go-review.googlesource.com/c/go/+/532457 Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-08-28internal/bytealg: improve compare on Power10/PPC64Paul Murphy
Handle comparisons of 15 or less bytes more efficiently with Power10 instructions when building with GOPPC64=power10. name old time/op new time/op delta BytesCompare/1 2.53ns ± 0% 2.17ns ± 0% -14.17% BytesCompare/2 2.70ns ± 0% 2.17ns ± 0% -19.77% BytesCompare/4 2.59ns ± 0% 2.17ns ± 0% -16.20% BytesCompare/8 2.66ns ± 0% 2.17ns ± 0% -18.63% Change-Id: I6d7c6af0a58ea3e03acc3930c54b77f2ac1dfbd5 Reviewed-on: https://go-review.googlesource.com/c/go/+/522315 Reviewed-by: Joedian Reid <joedian@golang.org> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Bryan Mills <bcmills@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-08-25internal/bytealg: add generic LastIndexByte{,String}Tobias Klauser
To avoid duplicating them in net/netip and os and to allow these packages automatically benefiting from future performance improvements when optimized native LastIndexByte{,String} implementations are added. For #36891 Change-Id: I4905a4742273570c2c36b867df57762c5bfbe1e4 Reviewed-on: https://go-review.googlesource.com/c/go/+/522475 Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> Auto-Submit: Tobias Klauser <tobias.klauser@gmail.com> Reviewed-by: Bryan Mills <bcmills@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-08-14internal/bytealg: optimize Count/CountString for PPC64/Power10Paul E. Murphy
Power10 adds a handful of new instructions which make this noticeably quicker for smaller values. Likewise, since the vector loop requires 32B to enter, unroll it once to count 32B per iteration. This improvement benefits all PPC64 cpus. On Power10 comparing a binary built with GOPPC64=power8 CountSingle/10 8.99ns ± 0% 5.55ns ± 3% -38.24% CountSingle/16 7.55ns ± 0% 5.56ns ± 3% -26.37% CountSingle/17 7.45ns ± 0% 5.25ns ± 0% -29.52% CountSingle/31 18.4ns ± 0% 6.2ns ± 0% -66.41% CountSingle/32 6.17ns ± 0% 5.04ns ± 0% -18.37% CountSingle/33 7.13ns ± 0% 5.99ns ± 0% -15.94% CountSingle/4K 198ns ± 0% 115ns ± 0% -42.08% CountSingle/4M 190µs ± 0% 109µs ± 0% -42.49% CountSingle/64M 3.28ms ± 0% 2.08ms ± 0% -36.53% Furthermore, comparing the new tail implementation on GOPPC64=power8 with GOPPC64=power10: CountSingle/10 5.55ns ± 3% 4.52ns ± 1% -18.66% CountSingle/16 5.56ns ± 3% 4.80ns ± 0% -13.65% CountSingle/17 5.25ns ± 0% 4.79ns ± 0% -8.78% CountSingle/31 6.17ns ± 0% 4.82ns ± 0% -21.79% CountSingle/32 5.04ns ± 0% 5.09ns ± 6% +1.01% CountSingle/33 5.99ns ± 0% 5.42ns ± 2% -9.54% Change-Id: I62d80be3b5d706e1abbb4bec7d6278a939a5eed4 Reviewed-on: https://go-review.googlesource.com/c/go/+/512695 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-08-07internal/bytealg: optimize Count/CountString in amd64Mauri de Souza Meneguzzo
Another optimization by aligning a hot loop. ``` │ sec/op │ sec/op vs base │ Count/10-16 11.29n ± 1% 10.50n ± 1% -7.04% (p=0.000 n=10) Count/32-16 11.06n ± 1% 11.36n ± 2% +2.76% (p=0.000 n=10) Count/4K-16 2.852µ ± 1% 1.953µ ± 1% -31.52% (p=0.000 n=10) Count/4M-16 2.884m ± 1% 1.958m ± 1% -32.11% (p=0.000 n=10) Count/64M-16 46.27m ± 1% 30.86m ± 0% -33.31% (p=0.000 n=10) CountEasy/10-16 9.873n ± 1% 9.669n ± 1% -2.07% (p=0.000 n=10) CountEasy/32-16 11.07n ± 1% 11.23n ± 1% +1.49% (p=0.000 n=10) CountEasy/4K-16 73.47n ± 1% 54.20n ± 0% -26.22% (p=0.000 n=10) CountEasy/4M-16 61.12µ ± 1% 49.42µ ± 0% -19.15% (p=0.000 n=10) CountEasy/64M-16 1.303m ± 3% 1.082m ± 4% -16.97% (p=0.000 n=10) CountSingle/10-16 4.150n ± 1% 3.679n ± 1% -11.36% (p=0.000 n=10) CountSingle/32-16 4.815n ± 1% 4.588n ± 1% -4.71% (p=0.000 n=10) CountSingle/4M-16 72.18µ ± 2% 75.38µ ± 1% +4.44% (p=0.000 n=10) CountHard3-16 462.6µ ± 1% 484.4µ ± 1% +4.73% (p=0.000 n=10) │ old.txt │ new.txt │ │ B/s │ B/s vs base │ Count/10-16 844.1Mi ± 1% 908.3Mi ± 1% +7.60% (p=0.000 n=10) Count/32-16 2.695Gi ± 1% 2.623Gi ± 2% -2.66% (p=0.000 n=10) Count/4K-16 1.337Gi ± 1% 1.953Gi ± 1% +46.06% (p=0.000 n=10) Count/4M-16 1.355Gi ± 1% 1.995Gi ± 1% +47.29% (p=0.000 n=10) Count/64M-16 1.351Gi ± 1% 2.026Gi ± 0% +49.95% (p=0.000 n=10) CountEasy/10-16 965.9Mi ± 1% 986.3Mi ± 1% +2.11% (p=0.000 n=10) CountEasy/32-16 2.693Gi ± 1% 2.653Gi ± 1% -1.48% (p=0.000 n=10) CountEasy/4K-16 51.93Gi ± 1% 70.38Gi ± 0% +35.54% (p=0.000 n=10) CountEasy/4M-16 63.91Gi ± 1% 79.05Gi ± 0% +23.68% (p=0.000 n=10) CountEasy/64M-16 47.97Gi ± 3% 57.77Gi ± 4% +20.44% (p=0.000 n=10) CountSingle/10-16 2.244Gi ± 1% 2.532Gi ± 1% +12.80% (p=0.000 n=10) CountSingle/32-16 6.190Gi ± 1% 6.496Gi ± 1% +4.94% (p=0.000 n=10) CountSingle/4M-16 54.12Gi ± 2% 51.82Gi ± 1% -4.25% (p=0.000 n=10) ``` Change-Id: I847b36125d2b11e2a88d31f48f6c160f041b3624 GitHub-Last-Rev: faacba662ee6bf41f69960060d48d340cfdbbbd6 GitHub-Pull-Request: golang/go#61793 Reviewed-on: https://go-review.googlesource.com/c/go/+/516455 TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org>
2023-08-07internal/bytealg: optimize Index/IndexString in amd64Mauri de Souza Meneguzzo
Now with PCALIGN available on amd64 we can start optimizing some routines that benefit from instruction alignment. ``` │ sec/op │ sec/op vs base │ IndexByte/4K-16 69.89n ± ∞ ¹ 45.88n ± ∞ ¹ -34.35% (p=0.008 n=5) IndexByte/4M-16 65.36µ ± ∞ ¹ 47.32µ ± ∞ ¹ -27.60% (p=0.008 n=5) IndexByte/64M-16 1.435m ± ∞ ¹ 1.140m ± ∞ ¹ -20.57% (p=0.008 n=5) │ B/s │ B/s vs base │ IndexByte/4K-16 54.58Gi ± ∞ ¹ 83.14Gi ± ∞ ¹ +52.32% (p=0.008 n=5) IndexByte/4M-16 59.76Gi ± ∞ ¹ 82.54Gi ± ∞ ¹ +38.12% (p=0.008 n=5) IndexByte/64M-16 43.56Gi ± ∞ ¹ 54.84Gi ± ∞ ¹ +25.89% (p=0.008 n=5) ``` Change-Id: Iff3dfd542c55e7569242be81f38b2887b9e04e87 GitHub-Last-Rev: f309f898b13ad8fdf88a21f2f105382db9ada2f5 GitHub-Pull-Request: golang/go#61792 Reviewed-on: https://go-review.googlesource.com/c/go/+/516435 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-07-20internal/bytealg: use generic IndexByte on plan9/amd64Philip Silva
Use generic implementation of IndexByte/IndexByteString on plan9/amd64 since the assembly implementation uses SSE instructions which are classified as floating point instructions and cannot be used in a note handler. A similar issue was fixed in CL 100577. This fixes runtime.TestBreakpoint. Fixes #61087. Change-Id: Id0c085e47da449be405ea04ab9b93518c4e2fde8 Reviewed-on: https://go-review.googlesource.com/c/go/+/508400 Reviewed-by: Heschi Kreinick <heschi@google.com> Auto-Submit: Ian Lance Taylor <iant@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David du Colombier <0intro@gmail.com>
2023-06-29internal/bytealg: fix alignment code in equal_riscv64.sMark Ryan
The riscv64 implementation of equal has an optimization that is applied when both pointers share the same alignment but that alignment is not 8 bytes. In this case it tries to align both pointers to an 8 byte boundaries, by individually comparing the first few bytes of each buffer. Unfortunately, the existing code is incorrect. It adjusts the pointers by the wrong number of bytes resulting, in most cases, in pointers that are not 8 byte aligned. This commit fixes the issue by individually comparing the first (8 - (pointer & 7)) bytes of each buffer rather than the first (pointer & 7) bytes. This particular optimization is not covered by any of the existing benchmarks so a new benchmark, BenchmarkEqualBothUnaligned, is provided. The benchmark tests the case where both pointers have the same alignment but may not be 8 byte aligned. Results of the new benchmark along with some of the existing benchmarks generated on a SiFive HiFive Unmatched A00 with 16GB of RAM running Ubuntu 23.04 are presented below. Equal/0-4 3.356n ± 0% 3.357n ± 0% ~ (p=0.840 n=10) Equal/1-4 63.91n ± 7% 65.97n ± 5% +3.22% (p=0.029 n=10) Equal/6-4 72.94n ± 5% 76.09n ± 4% ~ (p=0.075 n=10) Equal/9-4 84.61n ± 7% 85.83n ± 3% ~ (p=0.315 n=10) Equal/15-4 103.7n ± 2% 102.9n ± 4% ~ (p=0.739 n=10) Equal/16-4 89.14n ± 3% 100.40n ± 4% +12.64% (p=0.000 n=10) Equal/20-4 107.8n ± 3% 106.8n ± 3% ~ (p=0.725 n=10) Equal/32-4 63.95n ± 8% 67.79n ± 7% ~ (p=0.089 n=10) Equal/4K-4 1.256µ ± 1% 1.254µ ± 0% ~ (p=0.925 n=10) Equal/4M-4 1.231m ± 0% 1.230m ± 0% -0.04% (p=0.011 n=10) Equal/64M-4 19.77m ± 0% 19.78m ± 0% ~ (p=0.052 n=10) EqualBothUnaligned/64_0-4 43.70n ± 4% 44.40n ± 5% ~ (p=0.529 n=10) EqualBothUnaligned/64_1-4 6957.5n ± 0% 105.9n ± 1% -98.48% (p=0.000 n=10) EqualBothUnaligned/64_4-4 100.1n ± 2% 101.5n ± 4% ~ (p=0.149 n=10) EqualBothUnaligned/64_7-4 6965.00n ± 0% 95.60n ± 4% -98.63% (p=0.000 n=10) EqualBothUnaligned/4096_0-4 1.233µ ± 1% 1.225µ ± 0% -0.65% (p=0.015 n=10) EqualBothUnaligned/4096_1-4 584.226µ ± 0% 1.277µ ± 0% -99.78% (p=0.000 n=10) EqualBothUnaligned/4096_4-4 1.270µ ± 1% 1.268µ ± 0% ~ (p=0.105 n=10) EqualBothUnaligned/4096_7-4 584.944µ ± 0% 1.266µ ± 1% -99.78% (p=0.000 n=10) EqualBothUnaligned/4194304_0-4 1.241m ± 0% 1.236m ± 0% -0.38% (p=0.035 n=10) EqualBothUnaligned/4194304_1-4 600.956m ± 0% 1.238m ± 0% -99.79% (p=0.000 n=10) EqualBothUnaligned/4194304_4-4 1.239m ± 0% 1.241m ± 0% +0.22% (p=0.007 n=10) EqualBothUnaligned/4194304_7-4 601.036m ± 0% 1.239m ± 0% -99.79% (p=0.000 n=10) EqualBothUnaligned/67108864_0-4 19.79m ± 0% 19.78m ± 0% ~ (p=0.393 n=10) EqualBothUnaligned/67108864_1-4 9616.61m ± 0% 19.82m ± 0% -99.79% (p=0.000 n=10) EqualBothUnaligned/67108864_4-4 19.82m ± 0% 19.82m ± 0% ~ (p=0.971 n=10) EqualBothUnaligned/67108864_7-4 9616.34m ± 0% 19.86m ± 0% -99.79% (p=0.000 n=10) geomean 38.38µ 7.194µ -81.26% Change-Id: I4caab6c3450bd7e2773426b08b70bbc37fbe4e5f Reviewed-on: https://go-review.googlesource.com/c/go/+/500855 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-05-30internal/bytealg: fix alignment code in compare_riscv64.sMark Ryan
The riscv64 implementation of compare has an optimization that is applied when both pointers share the same alignment but that alignment is not 8 bytes. In this case it tries to align both pointers to an 8 byte boundaries, by individually comparing the first few bytes of each buffer. Unfortunately, the existing code is incorrect. It adjusts the pointers by the wrong number of bytes resulting, in most cases, in pointers that are not 8 byte aligned. This commit fixes the issue by individually comparing the first (8 - (pointer & 7)) bytes of each buffer rather than the first (pointer & 7) bytes. We also remove an unnecessary immediate MOV instruction. This particular optimization is not covered by any of the existing benchmarks so a new benchmark, benchmarkCompareBytesBigBothUnaligned, is provided. The benchmark tests the case where both pointers have the same alignment but may not be 8 byte aligned. Results of the new benchmark along with some of the existing benchmarks generated on a SiFive HiFive Unmatched A00 with 16GB of RAM running Ubuntu 23.04 are presented below. CompareBytesEqual-4 70.00n ± 6% 68.32n ± 0% -2.40% (p=0.020 n=10) CompareBytesToNil-4 19.31n ± 0% 18.47n ± 0% -4.35% (p=0.000 n=10) CompareBytesEmpty-4 16.79n ± 0% 15.95n ± 0% -4.97% (p=0.000 n=10) CompareBytesIdentical-4 19.94n ± 15% 18.32n ± 13% -8.15% (p=0.040 n=10) CompareBytesSameLength-4 37.93n ± 0% 42.44n ± 1% +11.91% (p=0.000 n=10) CompareBytesDifferentLength-4 37.93n ± 0% 42.44n ± 0% +11.89% (p=0.000 n=10) CompareBytesBigUnaligned/offset=1-4 3.881m ± 14% 3.880m ± 15% ~ (p=0.436 n=10) CompareBytesBigUnaligned/offset=2-4 3.884m ± 0% 3.875m ± 0% ~ (p=0.190 n=10) CompareBytesBigUnaligned/offset=3-4 3.858m ± 1% 3.868m ± 1% ~ (p=0.105 n=10) CompareBytesBigUnaligned/offset=4-4 3.877m ± 1% 3.876m ± 0% ~ (p=0.529 n=10) CompareBytesBigUnaligned/offset=5-4 3.859m ± 0% 3.874m ± 0% +0.39% (p=0.009 n=10) CompareBytesBigUnaligned/offset=6-4 3.878m ± 1% 3.876m ± 0% ~ (p=0.353 n=10) CompareBytesBigUnaligned/offset=7-4 3.868m ± 1% 3.877m ± 0% ~ (p=0.190 n=10) CompareBytesBigBothUnaligned/offset=0-4 1.586m ± 0% 1.765m ± 0% +11.30% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=1-4 153.132m ± 1% 1.765m ± 1% -98.85% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=2-4 152.930m ± 1% 1.765m ± 1% -98.85% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=3-4 152.093m ± 1% 1.769m ± 0% -98.84% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=4-4 1.602m ± 0% 1.764m ± 0% +10.11% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=5-4 152.314m ± 1% 1.768m ± 0% -98.84% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=6-4 152.905m ± 1% 1.764m ± 1% -98.85% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=7-4 152.951m ± 1% 1.804m ± 2% -98.82% (p=0.000 n=10) CompareBytesBig-4 1.441m ± 21% 1.373m ± 55% ~ (p=0.481 n=10) CompareBytesBigIdentical-4 19.94n ± 1% 19.10n ± 0% -4.21% (p=0.001 n=10) geomean 243.7µ 76.65µ -68.54% CompareBytesBigUnaligned/offset=1-4 257.7Mi ± 12% 257.7Mi ± 13% ~ (p=0.424 n=10) CompareBytesBigUnaligned/offset=2-4 257.5Mi ± 0% 258.1Mi ± 0% ~ (p=0.190 n=10) CompareBytesBigUnaligned/offset=3-4 259.2Mi ± 1% 258.5Mi ± 1% ~ (p=0.105 n=10) CompareBytesBigUnaligned/offset=4-4 257.9Mi ± 1% 258.0Mi ± 0% ~ (p=0.529 n=10) CompareBytesBigUnaligned/offset=5-4 259.1Mi ± 0% 258.1Mi ± 0% -0.39% (p=0.008 n=10) CompareBytesBigUnaligned/offset=6-4 257.9Mi ± 1% 258.0Mi ± 0% ~ (p=0.353 n=10) CompareBytesBigUnaligned/offset=7-4 258.5Mi ± 1% 257.9Mi ± 0% ~ (p=0.190 n=10) CompareBytesBigBothUnaligned/offset=0-4 630.6Mi ± 0% 566.6Mi ± 0% -10.15% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=1-4 6.533Mi ± 1% 566.545Mi ± 1% +8572.48% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=2-4 6.537Mi ± 1% 566.683Mi ± 1% +8568.27% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=3-4 6.576Mi ± 1% 565.200Mi ± 0% +8495.43% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=4-4 624.2Mi ± 0% 566.9Mi ± 0% -9.18% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=5-4 6.566Mi ± 1% 565.758Mi ± 0% +8516.41% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=6-4 6.542Mi ± 1% 567.036Mi ± 1% +8567.35% (p=0.000 n=10) CompareBytesBigBothUnaligned/offset=7-4 6.542Mi ± 1% 554.390Mi ± 2% +8374.05% (p=0.000 n=10) CompareBytesBig-4 694.2Mi ± 18% 728.1Mi ± 35% ~ (p=0.481 n=10) CompareBytesBigIdentical-4 47.83Ti ± 1% 49.92Ti ± 0% +4.39% (p=0.002 n=10) geomean 170.0Mi 813.8Mi +378.66% Change-Id: I0a2d0386d5ca1ffa249682a12ebd1533508e31e9 Reviewed-on: https://go-review.googlesource.com/c/go/+/497838 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: M Zhuo <mzh@golangcn.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: M Zhuo <mzh@golangcn.org>
2023-04-21cmd/internal/obj/ppc64: modify PCALIGN to ensure alignmentLynn Boger
The initial purpose of PCALIGN was to identify code where it would be beneficial to align code for performance, but avoid cases where too many NOPs were added. On p10, it is now necessary to enforce a certain alignment in some cases, so the behavior of PCALIGN needs to be slightly different. Code will now be aligned to the value specified on the PCALIGN instruction regardless of number of NOPs added, which is more intuitive and consistent with power assembler alignment directives. This also adds 64 as a possible alignment value. The existing values used in PCALIGN were modified according to the new behavior. A testcase was updated and performance testing was done to verify that this does not adversely affect performance. Change-Id: Iad1cf5ff112e5bfc0514f0805be90e24095e932b Reviewed-on: https://go-review.googlesource.com/c/go/+/485056 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Archana Ravindar <aravind5@in.ibm.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Bryan Mills <bcmills@google.com>
2023-04-21internal/bytealg: rewrite indexbytebody on PPC64Paul E. Murphy
Use P8 instructions throughout to be backwards compatible, but otherwise not impede performance. Use overlapping loads where possible, and prioritize larger checks over smaller check. However, some newer instructions can be used surgically when targeting a newer GOPPC64. These can lead to noticeable performance improvements with minimal impact to readability. All tests run below on a Power10/ppc64le, and use a small modification to BenchmarkIndexByte to ensure the IndexByte wrapper call is inlined (as it likely is under realistic usage). This wrapper adds substantial overhead if not inlined. Previous (power9 path, GOPPC64=power8) vs. GOPPC64=power8: IndexByte/1 3.81ns ± 8% 3.11ns ± 5% -18.39% IndexByte/2 3.82ns ± 3% 3.20ns ± 6% -16.23% IndexByte/3 3.61ns ± 4% 3.25ns ± 6% -10.13% IndexByte/4 3.66ns ± 5% 3.08ns ± 1% -15.91% IndexByte/5 3.82ns ± 0% 3.75ns ± 2% -1.94% IndexByte/6 3.83ns ± 0% 3.87ns ± 4% +1.04% IndexByte/7 3.83ns ± 0% 3.82ns ± 0% -0.27% IndexByte/8 3.82ns ± 0% 2.92ns ±11% -23.70% IndexByte/9 3.70ns ± 2% 3.08ns ± 2% -16.87% IndexByte/10 3.74ns ± 2% 3.04ns ± 0% -18.75% IndexByte/11 3.75ns ± 0% 3.31ns ± 8% -11.79% IndexByte/12 3.74ns ± 0% 3.04ns ± 0% -18.86% IndexByte/13 3.83ns ± 4% 3.04ns ± 0% -20.64% IndexByte/14 3.80ns ± 1% 3.30ns ± 8% -13.18% IndexByte/15 3.77ns ± 1% 3.04ns ± 0% -19.33% IndexByte/16 3.81ns ± 0% 2.78ns ± 7% -26.88% IndexByte/17 4.12ns ± 0% 3.04ns ± 1% -26.11% IndexByte/18 4.27ns ± 6% 3.05ns ± 0% -28.64% IndexByte/19 4.30ns ± 4% 3.02ns ± 2% -29.65% IndexByte/20 4.43ns ± 7% 3.45ns ± 7% -22.15% IndexByte/21 4.12ns ± 0% 3.03ns ± 1% -26.35% IndexByte/22 4.40ns ± 6% 3.05ns ± 0% -30.82% IndexByte/23 4.40ns ± 6% 3.01ns ± 2% -31.48% IndexByte/24 4.32ns ± 5% 3.07ns ± 0% -28.98% IndexByte/25 4.76ns ± 2% 3.04ns ± 1% -36.11% IndexByte/26 4.82ns ± 0% 3.05ns ± 0% -36.66% IndexByte/27 4.82ns ± 0% 2.97ns ± 3% -38.39% IndexByte/28 4.82ns ± 0% 2.96ns ± 3% -38.57% IndexByte/29 4.82ns ± 0% 3.34ns ± 9% -30.71% IndexByte/30 4.82ns ± 0% 3.05ns ± 0% -36.77% IndexByte/31 4.81ns ± 0% 3.05ns ± 0% -36.70% IndexByte/32 3.52ns ± 0% 3.44ns ± 1% -2.15% IndexByte/33 4.77ns ± 1% 3.35ns ± 0% -29.81% IndexByte/34 5.01ns ± 5% 3.35ns ± 0% -33.15% IndexByte/35 4.92ns ± 9% 3.35ns ± 0% -31.89% IndexByte/36 4.81ns ± 5% 3.35ns ± 0% -30.37% IndexByte/37 4.99ns ± 6% 3.35ns ± 0% -32.86% IndexByte/38 5.06ns ± 5% 3.35ns ± 0% -33.84% IndexByte/39 5.02ns ± 5% 3.48ns ± 9% -30.58% IndexByte/40 5.21ns ± 9% 3.55ns ± 4% -31.82% IndexByte/41 5.18ns ± 0% 3.42ns ± 2% -33.98% IndexByte/42 5.19ns ± 0% 3.55ns ±11% -31.56% IndexByte/43 5.18ns ± 0% 3.45ns ± 5% -33.46% IndexByte/44 5.18ns ± 0% 3.39ns ± 0% -34.56% IndexByte/45 5.18ns ± 0% 3.43ns ± 4% -33.74% IndexByte/46 5.18ns ± 0% 3.47ns ± 1% -33.03% IndexByte/47 5.18ns ± 0% 3.44ns ± 2% -33.54% IndexByte/48 5.18ns ± 0% 3.39ns ± 0% -34.52% IndexByte/49 5.69ns ± 0% 3.79ns ± 0% -33.45% IndexByte/50 5.70ns ± 0% 3.70ns ± 3% -34.98% IndexByte/51 5.70ns ± 0% 3.70ns ± 2% -35.05% IndexByte/52 5.69ns ± 0% 3.80ns ± 1% -33.35% IndexByte/53 5.69ns ± 0% 3.78ns ± 0% -33.54% IndexByte/54 5.69ns ± 0% 3.78ns ± 1% -33.51% IndexByte/55 5.69ns ± 0% 3.78ns ± 0% -33.61% IndexByte/56 5.69ns ± 0% 3.81ns ± 3% -33.12% IndexByte/57 6.20ns ± 0% 3.79ns ± 4% -38.89% IndexByte/58 6.20ns ± 0% 3.74ns ± 2% -39.58% IndexByte/59 6.20ns ± 0% 3.69ns ± 2% -40.47% IndexByte/60 6.20ns ± 0% 3.79ns ± 1% -38.81% IndexByte/61 6.20ns ± 0% 3.77ns ± 1% -39.23% IndexByte/62 6.20ns ± 0% 3.79ns ± 0% -38.89% IndexByte/63 6.20ns ± 0% 3.79ns ± 0% -38.90% IndexByte/64 4.17ns ± 0% 3.47ns ± 3% -16.70% IndexByte/65 5.38ns ± 0% 4.21ns ± 0% -21.59% IndexByte/66 5.38ns ± 0% 4.21ns ± 0% -21.58% IndexByte/67 5.38ns ± 0% 4.22ns ± 0% -21.58% IndexByte/68 5.38ns ± 0% 4.22ns ± 0% -21.59% IndexByte/69 5.38ns ± 0% 4.22ns ± 0% -21.56% IndexByte/70 5.38ns ± 0% 4.21ns ± 0% -21.59% IndexByte/71 5.37ns ± 0% 4.21ns ± 0% -21.51% IndexByte/72 5.37ns ± 0% 4.22ns ± 0% -21.46% IndexByte/73 5.71ns ± 0% 4.22ns ± 0% -26.20% IndexByte/74 5.71ns ± 0% 4.21ns ± 0% -26.21% IndexByte/75 5.71ns ± 0% 4.21ns ± 0% -26.17% IndexByte/76 5.71ns ± 0% 4.22ns ± 0% -26.22% IndexByte/77 5.71ns ± 0% 4.22ns ± 0% -26.22% IndexByte/78 5.71ns ± 0% 4.21ns ± 0% -26.22% IndexByte/79 5.71ns ± 0% 4.22ns ± 0% -26.21% IndexByte/80 5.71ns ± 0% 4.21ns ± 0% -26.19% IndexByte/81 6.20ns ± 0% 4.39ns ± 0% -29.13% IndexByte/82 6.20ns ± 0% 4.36ns ± 0% -29.67% IndexByte/83 6.20ns ± 0% 4.36ns ± 0% -29.63% IndexByte/84 6.20ns ± 0% 4.39ns ± 0% -29.21% IndexByte/85 6.20ns ± 0% 4.36ns ± 0% -29.64% IndexByte/86 6.20ns ± 0% 4.36ns ± 0% -29.63% IndexByte/87 6.20ns ± 0% 4.39ns ± 0% -29.21% IndexByte/88 6.20ns ± 0% 4.36ns ± 0% -29.65% IndexByte/89 6.74ns ± 0% 4.36ns ± 0% -35.33% IndexByte/90 6.75ns ± 0% 4.37ns ± 0% -35.22% IndexByte/91 6.74ns ± 0% 4.36ns ± 0% -35.30% IndexByte/92 6.74ns ± 0% 4.36ns ± 0% -35.34% IndexByte/93 6.74ns ± 0% 4.37ns ± 0% -35.20% IndexByte/94 6.74ns ± 0% 4.36ns ± 0% -35.33% IndexByte/95 6.75ns ± 0% 4.36ns ± 0% -35.32% IndexByte/96 4.83ns ± 0% 4.34ns ± 2% -10.24% IndexByte/97 5.91ns ± 0% 4.65ns ± 0% -21.24% IndexByte/98 5.91ns ± 0% 4.65ns ± 0% -21.24% IndexByte/99 5.91ns ± 0% 4.65ns ± 0% -21.23% IndexByte/100 5.90ns ± 0% 4.65ns ± 0% -21.21% IndexByte/101 5.90ns ± 0% 4.65ns ± 0% -21.22% IndexByte/102 5.90ns ± 0% 4.65ns ± 0% -21.23% IndexByte/103 5.91ns ± 0% 4.65ns ± 0% -21.23% IndexByte/104 5.91ns ± 0% 4.65ns ± 0% -21.24% IndexByte/105 6.25ns ± 0% 4.65ns ± 0% -25.59% IndexByte/106 6.25ns ± 0% 4.65ns ± 0% -25.59% IndexByte/107 6.25ns ± 0% 4.65ns ± 0% -25.60% IndexByte/108 6.25ns ± 0% 4.65ns ± 0% -25.58% IndexByte/109 6.24ns ± 0% 4.65ns ± 0% -25.50% IndexByte/110 6.25ns ± 0% 4.65ns ± 0% -25.56% IndexByte/111 6.25ns ± 0% 4.65ns ± 0% -25.60% IndexByte/112 6.25ns ± 0% 4.65ns ± 0% -25.59% IndexByte/113 6.76ns ± 0% 5.05ns ± 0% -25.37% IndexByte/114 6.76ns ± 0% 5.05ns ± 0% -25.31% IndexByte/115 6.76ns ± 0% 5.05ns ± 0% -25.38% IndexByte/116 6.76ns ± 0% 5.05ns ± 0% -25.31% IndexByte/117 6.76ns ± 0% 5.05ns ± 0% -25.38% IndexByte/118 6.76ns ± 0% 5.05ns ± 0% -25.31% IndexByte/119 6.76ns ± 0% 5.05ns ± 0% -25.38% IndexByte/120 6.76ns ± 0% 5.05ns ± 0% -25.36% IndexByte/121 7.35ns ± 0% 5.05ns ± 0% -31.33% IndexByte/122 7.36ns ± 0% 5.05ns ± 0% -31.42% IndexByte/123 7.38ns ± 0% 5.05ns ± 0% -31.60% IndexByte/124 7.38ns ± 0% 5.05ns ± 0% -31.59% IndexByte/125 7.38ns ± 0% 5.05ns ± 0% -31.60% IndexByte/126 7.38ns ± 0% 5.05ns ± 0% -31.58% IndexByte/128 5.28ns ± 0% 5.10ns ± 0% -3.41% IndexByte/256 7.27ns ± 0% 7.28ns ± 2% +0.13% IndexByte/512 12.1ns ± 0% 11.8ns ± 0% -2.51% IndexByte/1K 23.1ns ± 3% 22.0ns ± 0% -4.66% IndexByte/2K 42.6ns ± 0% 42.4ns ± 0% -0.41% IndexByte/4K 90.3ns ± 0% 89.4ns ± 0% -0.98% IndexByte/8K 170ns ± 0% 170ns ± 0% -0.59% IndexByte/16K 331ns ± 0% 330ns ± 0% -0.27% IndexByte/32K 660ns ± 0% 660ns ± 0% -0.08% IndexByte/64K 1.30µs ± 0% 1.30µs ± 0% -0.08% IndexByte/128K 2.58µs ± 0% 2.58µs ± 0% -0.04% IndexByte/256K 5.15µs ± 0% 5.15µs ± 0% -0.04% IndexByte/512K 10.3µs ± 0% 10.3µs ± 0% -0.03% IndexByte/1M 20.6µs ± 0% 20.5µs ± 0% -0.03% IndexByte/2M 41.1µs ± 0% 41.1µs ± 0% -0.03% IndexByte/4M 82.2µs ± 0% 82.1µs ± 0% -0.02% IndexByte/8M 164µs ± 0% 164µs ± 0% -0.01% IndexByte/16M 328µs ± 0% 328µs ± 0% -0.01% IndexByte/32M 657µs ± 0% 657µs ± 0% -0.00% GOPPC64=power8 vs GOPPC64=power9. The Improvement is most noticed between 16 and 64B, and goes away around 128B. IndexByte/16 2.78ns ± 7% 2.65ns ±15% -4.74% IndexByte/17 3.04ns ± 1% 2.80ns ± 3% -7.85% IndexByte/18 3.05ns ± 0% 2.71ns ± 4% -11.00% IndexByte/19 3.02ns ± 2% 2.76ns ±10% -8.74% IndexByte/20 3.45ns ± 7% 2.91ns ± 0% -15.46% IndexByte/21 3.03ns ± 1% 2.84ns ± 9% -6.33% IndexByte/22 3.05ns ± 0% 2.67ns ± 1% -12.38% IndexByte/23 3.01ns ± 2% 2.67ns ± 1% -11.24% IndexByte/24 3.07ns ± 0% 2.92ns ±12% -4.79% IndexByte/25 3.04ns ± 1% 3.15ns ±15% +3.63% IndexByte/26 3.05ns ± 0% 2.83ns ±13% -7.33% IndexByte/27 2.97ns ± 3% 2.98ns ±10% +0.56% IndexByte/28 2.96ns ± 3% 2.96ns ± 9% -0.05% IndexByte/29 3.34ns ± 9% 3.03ns ±12% -9.33% IndexByte/30 3.05ns ± 0% 2.68ns ± 1% -12.05% IndexByte/31 3.05ns ± 0% 2.83ns ±12% -7.27% IndexByte/32 3.44ns ± 1% 3.21ns ±10% -6.78% IndexByte/33 3.35ns ± 0% 3.41ns ± 2% +1.95% IndexByte/34 3.35ns ± 0% 3.13ns ± 0% -6.53% IndexByte/35 3.35ns ± 0% 3.13ns ± 0% -6.54% IndexByte/36 3.35ns ± 0% 3.13ns ± 0% -6.52% IndexByte/37 3.35ns ± 0% 3.13ns ± 0% -6.52% IndexByte/38 3.35ns ± 0% 3.24ns ± 4% -3.30% IndexByte/39 3.48ns ± 9% 3.44ns ± 2% -1.19% IndexByte/40 3.55ns ± 4% 3.46ns ± 2% -2.44% IndexByte/41 3.42ns ± 2% 3.39ns ± 4% -0.86% IndexByte/42 3.55ns ±11% 3.46ns ± 1% -2.65% IndexByte/43 3.45ns ± 5% 3.44ns ± 2% -0.31% IndexByte/44 3.39ns ± 0% 3.43ns ± 3% +1.23% IndexByte/45 3.43ns ± 4% 3.50ns ± 1% +2.07% IndexByte/46 3.47ns ± 1% 3.46ns ± 2% -0.31% IndexByte/47 3.44ns ± 2% 3.47ns ± 1% +0.78% IndexByte/48 3.39ns ± 0% 3.46ns ± 2% +1.96% IndexByte/49 3.79ns ± 0% 3.47ns ± 0% -8.41% IndexByte/50 3.70ns ± 3% 3.64ns ± 5% -1.66% IndexByte/51 3.70ns ± 2% 3.75ns ± 0% +1.40% IndexByte/52 3.80ns ± 1% 3.77ns ± 0% -0.70% IndexByte/53 3.78ns ± 0% 3.77ns ± 0% -0.46% IndexByte/54 3.78ns ± 1% 3.53ns ± 7% -6.74% IndexByte/55 3.78ns ± 0% 3.47ns ± 0% -8.17% IndexByte/56 3.81ns ± 3% 3.45ns ± 0% -9.43% IndexByte/57 3.79ns ± 4% 3.47ns ± 0% -8.45% IndexByte/58 3.74ns ± 2% 3.55ns ± 4% -5.16% IndexByte/59 3.69ns ± 2% 3.61ns ± 4% -2.01% IndexByte/60 3.79ns ± 1% 3.45ns ± 0% -9.09% IndexByte/61 3.77ns ± 1% 3.47ns ± 0% -7.93% IndexByte/62 3.79ns ± 0% 3.45ns ± 0% -8.97% IndexByte/63 3.79ns ± 0% 3.47ns ± 0% -8.44% IndexByte/64 3.47ns ± 3% 3.18ns ± 0% -8.41% GOPPC64=power9 vs GOPPC64=power10. Only sizes <16 will show meaningful changes. IndexByte/1 3.27ns ± 8% 2.36ns ± 2% -27.58% IndexByte/2 3.06ns ± 4% 2.34ns ± 1% -23.42% IndexByte/3 3.77ns ±11% 2.48ns ± 7% -34.03% IndexByte/4 3.18ns ± 8% 2.33ns ± 1% -26.69% IndexByte/5 3.18ns ± 5% 2.34ns ± 4% -26.26% IndexByte/6 3.13ns ± 3% 2.35ns ± 1% -24.97% IndexByte/7 3.25ns ± 1% 2.33ns ± 1% -28.22% IndexByte/8 2.79ns ± 2% 2.36ns ± 1% -15.32% IndexByte/9 2.90ns ± 0% 2.34ns ± 2% -19.36% IndexByte/10 2.99ns ± 3% 2.31ns ± 1% -22.70% IndexByte/11 3.13ns ± 7% 2.31ns ± 0% -26.08% IndexByte/12 3.01ns ± 4% 2.32ns ± 1% -22.91% IndexByte/13 2.98ns ± 3% 2.31ns ± 1% -22.72% IndexByte/14 2.92ns ± 2% 2.61ns ±16% -10.58% IndexByte/15 3.02ns ± 5% 2.69ns ± 7% -10.90% IndexByte/16 2.65ns ±15% 2.29ns ± 1% -13.61% Change-Id: I4482f762d25eabf60def4981a0b2bc0c10ccf50c Reviewed-on: https://go-review.googlesource.com/c/go/+/478656 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Bryan Mills <bcmills@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Archana Ravindar <aravind5@in.ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org>