author Xiaolin Zhao <zhaoxiaolin@loongson.cn> 2025-08-29 16:20:16 +0800
committer Gopher Robot <gobot@golang.org> 2025-09-04 09:22:33 -0700
commit b8cc907425c4b851d2b941cf689cf8177ea8a153 (patch)
tree c6d99ae0cff79fbfa55dcaa69928a1c24ffc474a /src/internal
parent 8c27a808905b0611b0a7b7bbff08819206be3b86 (diff)
cmd/internal/obj/loong64: fix the usage of offset in the instructions [X]VLDREPL.{B/H/W/D}
The previously defined usage of offset was ambiguous and not easy to understand.

For example, to fetch 4 bytes of data from the address base+8 and broadcast
it to each word element of vector register V5, the assembly implementation
is as follows:

    previous: VMOVQ 2(base), V5.W4
    current:  VMOVQ 8(base), V5.W4

Change-Id: I8bc84e35033ab63bd10f4c61618789f94314f78c
Reviewed-on: https://go-review.googlesource.com/c/go/+/699875
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Diffstat (limited to 'src/internal')
-rw-r--r-- src/internal/chacha8rand/chacha8_loong64.s | 20
1 files changed, 10 insertions, 10 deletions
diff --git a/src/internal/chacha8rand/chacha8_loong64.s b/src/internal/chacha8rand/chacha8_loong64.s
index 5e6857ed3a..73a1e5bf05 100644
--- a/src/internal/chacha8rand/chacha8_loong64.s
+++ b/src/internal/chacha8rand/chacha8_loong64.s
@@ -50,22 +50,22 @@ lsx_chacha8:
// load contants
VMOVQ (R10), V0.W4
- VMOVQ 1(R10), V1.W4
- VMOVQ 2(R10), V2.W4
- VMOVQ 3(R10), V3.W4
+ VMOVQ 4(R10), V1.W4
+ VMOVQ 8(R10), V2.W4
+ VMOVQ 12(R10), V3.W4
// load 4-32bit data from incRotMatrix added to counter
VMOVQ (R11), V30
// load seed
VMOVQ (R4), V4.W4
- VMOVQ 1(R4), V5.W4
- VMOVQ 2(R4), V6.W4
- VMOVQ 3(R4), V7.W4
- VMOVQ 4(R4), V8.W4
- VMOVQ 5(R4), V9.W4
- VMOVQ 6(R4), V10.W4
- VMOVQ 7(R4), V11.W4
+ VMOVQ 4(R4), V5.W4
+ VMOVQ 8(R4), V6.W4
+ VMOVQ 12(R4), V7.W4
+ VMOVQ 16(R4), V8.W4
+ VMOVQ 20(R4), V9.W4
+ VMOVQ 24(R4), V10.W4
+ VMOVQ 28(R4), V11.W4
// load counter and update counter
VMOVQ R6, V12.W4