| author | Xiaolin Zhao <zhaoxiaolin@loongson.cn> | 2025-08-29 16:20:16 +0800 |
|---|---|---|
| committer | Gopher Robot <gobot@golang.org> | 2025-09-04 09:22:33 -0700 |
| commit | b8cc907425c4b851d2b941cf689cf8177ea8a153 (patch) | |
| tree | c6d99ae0cff79fbfa55dcaa69928a1c24ffc474a /src/internal | |
| parent | 8c27a808905b0611b0a7b7bbff08819206be3b86 (diff) | |
cmd/internal/obj/loong64: fix the usage of offset in the instructions [X]VLDREPL.{B/H/W/D}
The previous meaning of the offset was ambiguous and hard to understand: it was scaled by the element size instead of being a plain byte offset.
For example, to fetch 4 bytes of data from the address base+8 and
broadcast it to each word element of vector register V5, the assembly
implementation is as follows:
previous: VMOVQ 2(base), V5.W4
current: VMOVQ 8(base), V5.W4
Change-Id: I8bc84e35033ab63bd10f4c61618789f94314f78c
Reviewed-on: https://go-review.googlesource.com/c/go/+/699875
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Diffstat (limited to 'src/internal')
| -rw-r--r-- | src/internal/chacha8rand/chacha8_loong64.s | 20 |
1 files changed, 10 insertions, 10 deletions
    diff --git a/src/internal/chacha8rand/chacha8_loong64.s b/src/internal/chacha8rand/chacha8_loong64.s
    index 5e6857ed3a..73a1e5bf05 100644
    --- a/src/internal/chacha8rand/chacha8_loong64.s
    +++ b/src/internal/chacha8rand/chacha8_loong64.s
    @@ -50,22 +50,22 @@ lsx_chacha8:
     	// load contants
     	VMOVQ	(R10), V0.W4
    -	VMOVQ	1(R10), V1.W4
    -	VMOVQ	2(R10), V2.W4
    -	VMOVQ	3(R10), V3.W4
    +	VMOVQ	4(R10), V1.W4
    +	VMOVQ	8(R10), V2.W4
    +	VMOVQ	12(R10), V3.W4
     
     	// load 4-32bit data from incRotMatrix added to counter
     	VMOVQ	(R11), V30
     
     	// load seed
     	VMOVQ	(R4), V4.W4
    -	VMOVQ	1(R4), V5.W4
    -	VMOVQ	2(R4), V6.W4
    -	VMOVQ	3(R4), V7.W4
    -	VMOVQ	4(R4), V8.W4
    -	VMOVQ	5(R4), V9.W4
    -	VMOVQ	6(R4), V10.W4
    -	VMOVQ	7(R4), V11.W4
    +	VMOVQ	4(R4), V5.W4
    +	VMOVQ	8(R4), V6.W4
    +	VMOVQ	12(R4), V7.W4
    +	VMOVQ	16(R4), V8.W4
    +	VMOVQ	20(R4), V9.W4
    +	VMOVQ	24(R4), V10.W4
    +	VMOVQ	28(R4), V11.W4
     
     	// load counter and update counter
     	VMOVQ	R6, V12.W4
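The offset rewrite in this diff follows the rule given in the commit message: an offset that used to be written as an element index is now written as a byte offset. The Go sketch below only illustrates that arithmetic. The helper `byteOffset` is hypothetical and not part of the Go toolchain. The `.W` case (4-byte elements) matches the commit; the element sizes for `.B`, `.H`, and `.D` (1, 2, and 8 bytes) are assumed to follow the same rule.

```go
package main

import "fmt"

// byteOffset is a hypothetical helper (not a toolchain API). It converts an
// offset written in the old style (an element index) into the byte offset
// used by the new style. elemSize is the element width in bytes; 4 for .W
// is confirmed by the commit, other widths are an assumption.
func byteOffset(oldIndex, elemSize int) int {
	return oldIndex * elemSize
}

func main() {
	const wordSize = 4 // .W4 arrangement: 32-bit elements
	// Reproduce the seed-load rewrites from the diff (old offsets 1..7).
	for old := 1; old <= 7; old++ {
		fmt.Printf("old: VMOVQ %d(R4)  ->  new: VMOVQ %d(R4)\n",
			old, byteOffset(old, wordSize))
	}
}
```

With `wordSize = 4`, the old offsets 1 through 7 map to 4, 8, ..., 28, which are the values in the `+` lines above.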
