aboutsummaryrefslogtreecommitdiff
path: root/src/internal/bytealg
diff options
context:
space:
mode:
authorArchana R <aravind5@in.ibm.com>2022-11-03 10:02:04 -0500
committerLynn Boger <laboger@linux.vnet.ibm.com>2022-11-07 19:27:43 +0000
commit92c7df116ecbd8f230b48f72eb44fa7de5d13233 (patch)
tree970b99b7a5464b39d74d512153b67ad7717c1b85 /src/internal/bytealg
parentd2b5a6f332b011c75c17bfb99216cc51ac7a0b5f (diff)
downloadgo-92c7df116ecbd8f230b48f72eb44fa7de5d13233.tar.xz
internal/bytealg: add PCALIGN to indexbodyp9 function on ppc64x
Adding PCALIGN in indexbodyp9 function shows improvements in some SimonWaldherr benchmarks and one of the index benchmarks on both Power9 and Power10 name old time/op new time/op delta Contains 19.8ns ± 0% 15.6ns ± 0% -21.24% ContainsNot 21.3ns ± 0% 18.9ns ± 0% -11.03% ContainsBytes 19.1ns ± 0% 16.0ns ± 0% -16.54% Index/10 17.3ns ± 0% 16.1ns ± 0% -7.30% Index/32 59.6ns ± 0% 59.6ns ± 0% +0.12% Index/4K 3.68µs ± 0% 3.68µs ± 0% ~ Index/4M 3.74ms ± 0% 3.74ms ± 0% -0.00% Index/64M 59.8ms ± 0% 59.8ms ± 0% ~ Change-Id: I784e57e0b0f5bac143f57f3a32845219e43d47fd Reviewed-on: https://go-review.googlesource.com/c/go/+/447595 Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
Diffstat (limited to 'src/internal/bytealg')
-rw-r--r--src/internal/bytealg/index_ppc64x.s4
1 files changed, 2 insertions, 2 deletions
diff --git a/src/internal/bytealg/index_ppc64x.s b/src/internal/bytealg/index_ppc64x.s
index 735159cd8e..26205cebaf 100644
--- a/src/internal/bytealg/index_ppc64x.s
+++ b/src/internal/bytealg/index_ppc64x.s
@@ -660,7 +660,7 @@ index2to16:
BGT index2to16tail
MOVD $3, R17 // Number of bytes beyond 16
-
+ PCALIGN $32
index2to16loop:
LXVB16X (R7)(R0), V1 // Load next 16 bytes of string into V1 from R7
LXVB16X (R7)(R17), V5 // Load next 16 bytes of string into V5 from R7+3
@@ -738,7 +738,7 @@ short:
MTVSRD R10, V8 // Set up shift
VSLDOI $8, V8, V8, V8
VSLO V1, V8, V1 // Shift by start byte
-
+ PCALIGN $32
index2to16next:
VAND V1, SEPMASK, V2 // Just compare size of sep
VCMPEQUBCC V0, V2, V3 // Compare sep and partial string