aboutsummaryrefslogtreecommitdiff
path: root/src/cmd/asm
diff options
context:
space:
mode:
authorWei Xiao <wei.xiao@arm.com>2017-06-16 06:45:14 +0000
committerBrad Fitzpatrick <bradfitz@golang.org>2017-11-21 19:07:38 +0000
commit9a14cd9e7556e68ea98255ae809b5862d574df9e (patch)
treee3f36968e830bc9313f0ea83bdf622f54ee6aca0 /src/cmd/asm
parent63ef3cde335e5b46fc3c8027b5e2f474a26717e8 (diff)
downloadgo-9a14cd9e7556e68ea98255ae809b5862d574df9e.tar.xz
bytes: add optimized countByte for arm64
Use SIMD instructions when counting a single byte. Inspired from runtime IndexByte implementation. Benchmark results of bytes, where 1 byte in every 8 is the one we are looking: name old time/op new time/op delta CountSingle/10-8 96.1ns ± 1% 38.8ns ± 0% -59.64% (p=0.000 n=9+7) CountSingle/32-8 172ns ± 2% 36ns ± 1% -79.27% (p=0.000 n=10+10) CountSingle/4K-8 18.2µs ± 1% 0.9µs ± 0% -95.17% (p=0.000 n=9+10) CountSingle/4M-8 18.4ms ± 0% 0.9ms ± 0% -95.00% (p=0.000 n=10+9) CountSingle/64M-8 284ms ± 4% 19ms ± 0% -93.40% (p=0.000 n=10+10) name old speed new speed delta CountSingle/10-8 104MB/s ± 1% 258MB/s ± 0% +147.99% (p=0.000 n=9+10) CountSingle/32-8 185MB/s ± 1% 897MB/s ± 1% +385.33% (p=0.000 n=9+10) CountSingle/4K-8 225MB/s ± 1% 4658MB/s ± 0% +1967.40% (p=0.000 n=9+10) CountSingle/4M-8 228MB/s ± 0% 4555MB/s ± 0% +1901.71% (p=0.000 n=10+9) CountSingle/64M-8 236MB/s ± 4% 3575MB/s ± 0% +1414.69% (p=0.000 n=10+10) Change-Id: Ifccb51b3c8658c49773fe05147c3cf3aead361e5 Reviewed-on: https://go-review.googlesource.com/71111 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
Diffstat (limited to 'src/cmd/asm')
-rw-r--r--src/cmd/asm/internal/asm/testdata/arm64.s5
1 files changed, 5 insertions, 0 deletions
diff --git a/src/cmd/asm/internal/asm/testdata/arm64.s b/src/cmd/asm/internal/asm/testdata/arm64.s
index 269e363f7e..ab6ad5bcb7 100644
--- a/src/cmd/asm/internal/asm/testdata/arm64.s
+++ b/src/cmd/asm/internal/asm/testdata/arm64.s
@@ -51,6 +51,11 @@ TEXT foo(SB), DUPOK|NOSPLIT, $-8
SHA1P V11.S4, V10, V9 // 49110b5e
VADDV V0.S4, V0 // 00b8b14e
VMOVI $82, V0.B16 // 40e6024f
+ VUADDLV V6.B16, V6 // c638306e
+ VADD V1, V2, V3 // 4384e15e
+ VADD V1, V3, V3 // 6384e15e
+ VSUB V12, V30, V30 // de87ec7e
+ VSUB V12, V20, V30 // 9e86ec7e
// LTYPE1 imsr ',' spreg ','
// {