author: Huang Qiqi <huangqiqi@loongson.cn> 2024-06-11 16:09:10 +0800
committer: abner chenc <chenguoqi@loongson.cn> 2025-04-15 04:55:58 -0700
commit: 2fe0330cd74ce1ff7510977a5eee9194acc741d0
tree: 60e46c6b09d6eada6d0138dd21fe591709085f42 /src/math
parent: b8ed752d6f2bd4d7edb48c8614a133182e7646ee
math/big: optimize addVW function for loong64

Benchmark results on Loongson 3C5000 (which is an LA464 implementation):

goos: linux
goarch: loong64
pkg: math/big
cpu: Loongson-3C5000 @ 2200.00MHz
                 │ test/old_3c5000_addvw.log │      test/new_3c5000_addvw.log      │
                 │          sec/op           │    sec/op     vs base               │
AddVW/1                         9.555n ± 0%     5.915n ± 0%  -38.09% (p=0.000 n=20)
AddVW/2                        11.370n ± 0%     6.825n ± 0%  -39.97% (p=0.000 n=20)
AddVW/3                        12.485n ± 0%     7.970n ± 0%  -36.16% (p=0.000 n=20)
AddVW/4                        14.980n ± 0%     9.718n ± 0%  -35.13% (p=0.000 n=20)
AddVW/5                         16.73n ± 0%     10.63n ± 0%  -36.46% (p=0.000 n=20)
AddVW/10                        24.57n ± 0%     15.18n ± 0%  -38.23% (p=0.000 n=20)
AddVW/100                       184.9n ± 0%     102.4n ± 0%  -44.62% (p=0.000 n=20)
AddVW/1000                     1721.0n ± 0%     921.4n ± 0%  -46.46% (p=0.000 n=20)
AddVW/10000                     16.83µ ± 0%     11.68µ ± 0%  -30.58% (p=0.000 n=20)
AddVW/100000                    184.7µ ± 0%     131.3µ ± 0%  -28.93% (p=0.000 n=20)
AddVWext/1                      9.554n ± 0%     5.915n ± 0%  -38.09% (p=0.000 n=20)
AddVWext/2                     11.370n ± 0%     6.825n ± 0%  -39.97% (p=0.000 n=20)
AddVWext/3                     12.505n ± 0%     7.969n ± 0%  -36.27% (p=0.000 n=20)
AddVWext/4                     14.980n ± 0%     9.718n ± 0%  -35.13% (p=0.000 n=20)
AddVWext/5                      16.70n ± 0%     10.63n ± 0%  -36.33% (p=0.000 n=20)
AddVWext/10                     24.54n ± 0%     15.18n ± 0%  -38.13% (p=0.000 n=20)
AddVWext/100                    185.0n ± 0%     102.4n ± 0%  -44.65% (p=0.000 n=20)
AddVWext/1000                  1721.0n ± 0%     921.4n ± 0%  -46.46% (p=0.000 n=20)
AddVWext/10000                  16.83µ ± 0%     11.68µ ± 0%  -30.60% (p=0.000 n=20)
AddVWext/100000                 184.9µ ± 0%     130.4µ ± 0%  -29.51% (p=0.000 n=20)
geomean                         155.5n          96.87n       -37.70%

Change-Id: I824a90cb365e09d7d0d4a2c53ff4b30cf057a75e
Reviewed-on: https://go-review.googlesource.com/c/go/+/659876
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Diffstat (limited to 'src/math')
-rw-r--r-- src/math/big/arith_loong64.s 24
1 files changed, 23 insertions, 1 deletions
diff --git a/src/math/big/arith_loong64.s b/src/math/big/arith_loong64.s
index 12cfb84eea..a0130efc31 100644
--- a/src/math/big/arith_loong64.s
+++ b/src/math/big/arith_loong64.s
@@ -15,8 +15,30 @@ TEXT ·addVV(SB),NOSPLIT,$0
 TEXT ·subVV(SB),NOSPLIT,$0
 	JMP ·subVV_g(SB)
 
+// func addVW(z, x []Word, y Word) (c Word)
 TEXT ·addVW(SB),NOSPLIT,$0
-	JMP ·addVW_g(SB)
+	// input:
+	//   R4: z
+	//   R5: z_len
+	//   R7: x
+	//   R10: y
+	MOVV	z+0(FP), R4
+	MOVV	z_len+8(FP), R5
+	MOVV	x+24(FP), R7
+	MOVV	y+48(FP), R10
+	MOVV	$0, R6
+	SLLV	$3, R5
+loop:
+	BEQ	R5, R6, done
+	MOVV	(R6)(R7), R8
+	ADDV	R8, R10, R9	// x1 + c = z1, if z1 < x1 then z1 overflow
+	SGTU	R8, R9, R10
+	MOVV	R9, (R6)(R4)
+	ADDV	$8, R6
+	JMP	loop
+done:
+	MOVV	R10, c+56(FP)
+	RET
 
 TEXT ·subVW(SB),NOSPLIT,$0
 	JMP ·subVW_g(SB)
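
For orientation, the new assembly loop can be read as the following Go sketch. It is not the `addVW_g` fallback from math/big. `addVWSketch` and the local `Word` type (assumed to be 64 bits wide, as on loong64) are hypothetical names introduced only for illustration. Register names in the comments map back to the patch.

```go
package main

import "fmt"

// Word stands in for math/big's Word; a 64-bit width is assumed here.
type Word uint64

// addVWSketch is a hypothetical Go rendering of the assembly loop:
// it stores x + y into z word by word and returns the final carry.
func addVWSketch(z, x []Word, y Word) (c Word) {
	c = y // R10 holds y on entry and the running carry afterwards
	for i := range z {
		xi := x[i]   // MOVV (R6)(R7), R8
		zi := xi + c // ADDV R8, R10, R9 (wraps on overflow)
		if zi < xi { // SGTU R8, R9, R10: carry out iff the sum wrapped
			c = 1
		} else {
			c = 0
		}
		z[i] = zi // MOVV R9, (R6)(R4)
	}
	return c
}

func main() {
	x := []Word{^Word(0), ^Word(0), 5}
	z := make([]Word, len(x))
	c := addVWSketch(z, x, 1)
	fmt.Println(z, c) // [0 0 6] 0
}
```

The unsigned comparison `zi < xi` is the same overflow test the `SGTU` instruction performs, so no separate carry flag is needed.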