| Age | Commit message (Collapse) | Author |
|
Ref: #733621
Updates #75463
Change-Id: Idd8821d1713754097a2fe83a050c25d9ec5b17eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/735540
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Gets rid of some sign extensions, like arm64.
Change-Id: I9fc37e15a82718bfcf53db8cab0c4e7baaa0a747
Reviewed-on: https://go-review.googlesource.com/c/go/+/721522
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A6000 @ 2500.00MHz
| old.txt | new.txt |
| sec/op | sec/op vs base |
ClearFat256 6.406n ± 0% 3.329n ± 1% -48.03% (p=0.000 n=10)
ClearFat512 12.810n ± 0% 7.607n ± 0% -40.62% (p=0.000 n=10)
ClearFat1024 25.62n ± 0% 14.01n ± 0% -45.32% (p=0.000 n=10)
ClearFat1032 26.02n ± 0% 14.28n ± 0% -45.14% (p=0.000 n=10)
ClearFat1040 26.02n ± 0% 14.41n ± 0% -44.62% (p=0.000 n=10)
MemclrKnownSize192 4.804n ± 0% 2.827n ± 0% -41.15% (p=0.000 n=10)
MemclrKnownSize248 6.561n ± 0% 4.371n ± 0% -33.38% (p=0.000 n=10)
MemclrKnownSize256 6.406n ± 0% 3.335n ± 0% -47.94% (p=0.000 n=10)
geomean 11.41n 6.453n -43.45%
goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3C5000 @ 2200.00MHz
| old.txt | new.txt |
| sec/op | sec/op vs base |
ClearFat256 14.570n ± 0% 7.284n ± 0% -50.01% (p=0.000 n=10)
ClearFat512 29.13n ± 0% 14.57n ± 0% -49.98% (p=0.000 n=10)
ClearFat1024 58.26n ± 0% 29.15n ± 0% -49.97% (p=0.000 n=10)
ClearFat1032 58.73n ± 0% 29.15n ± 0% -50.36% (p=0.000 n=10)
ClearFat1040 59.18n ± 0% 29.26n ± 0% -50.56% (p=0.000 n=10)
MemclrKnownSize192 10.930n ± 0% 5.466n ± 0% -49.99% (p=0.000 n=10)
MemclrKnownSize248 14.110n ± 0% 6.772n ± 0% -52.01% (p=0.000 n=10)
MemclrKnownSize256 14.570n ± 0% 7.285n ± 0% -50.00% (p=0.000 n=10)
geomean 25.75n 12.78n -50.36%
Change-Id: I88d7b6ae2f6fc3f095979f24fb83ff42a9d2d42e
Reviewed-on: https://go-review.googlesource.com/c/go/+/720940
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Mark Freeman <markfreeman@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
The ICE seen on loong64 while compiling the `(*gcWork).tryStealSpan`
function was due to an `LoweredAtomicAnd32` op (inlined from the
`(pMask).clear` implementation) being incorrectly assigned an output
register while it shouldn't have. Because the op is of mem type, it has
needRegister() == false; hence in the shuffle phase of regalloc, its
bogus output register has no associated `orig` value recorded. The bug
was introduced in CL 482756, but only recently exposed by CL 696035.
Since the old-style atomic ops need no return value (and is even
documented so besides the loong64 ssa op definition), just fix the
register info for both.
While at it, add a note in the ssa op definition file about the
architectural necessity of resultNotInArgs for loong64 atomic ops,
because the practice is not seen in several other arches I have
checked.
Updates #75776
Change-Id: I087f51b8a2825d7b00fc3965b0afcc8b02cad277
Reviewed-on: https://go-review.googlesource.com/c/go/+/710475
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Following CL 357330, use jump tables on Loong64.
goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
│ old │ new │
│ sec/op │ sec/op vs base │
Switch8Predictable 2.352n ± 0% 2.101n ± 0% -10.65% (p=0.000 n=10)
Switch8Unpredictable 11.99n ± 0% 10.25n ± 0% -14.51% (p=0.000 n=10)
Switch32Predictable 3.153n ± 0% 1.887n ± 1% -40.14% (p=0.000 n=10)
Switch32Unpredictable 12.47n ± 0% 10.22n ± 0% -18.00% (p=0.000 n=10)
SwitchStringPredictable 3.162n ± 0% 3.352n ± 0% +6.01% (p=0.000 n=10)
SwitchStringUnpredictable 14.70n ± 0% 13.31n ± 0% -9.46% (p=0.000 n=10)
SwitchTypePredictable 3.702n ± 0% 2.201n ± 0% -40.55% (p=0.000 n=10)
SwitchTypeUnpredictable 16.18n ± 0% 14.48n ± 0% -10.51% (p=0.000 n=10)
SwitchInterfaceTypePredictable 7.654n ± 0% 9.680n ± 0% +26.47% (p=0.000 n=10)
SwitchInterfaceTypeUnpredictable 22.04n ± 0% 22.44n ± 0% +1.81% (p=0.000 n=10)
geomean 7.441n 6.469n -13.07%
Change-Id: Id6f30fa73349c60fac17670084daee56973a955f
Reviewed-on: https://go-review.googlesource.com/c/go/+/705396
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
|
|
Fixes #75479
Change-Id: I362d3e49090e94f91a840dd5a475978b59222a00
Reviewed-on: https://go-review.googlesource.com/c/go/+/704135
Reviewed-by: Mark Freeman <markfreeman@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
|
|
On loong64, the addi.d instruction can only directly handle 12-bit
immediate numbers. If a larger immediate number needs to be processed,
it must first be placed in a register, and then the add.d instruction
is used to complete the processing of the larger immediate number.
If a larger immediate number c satisfies is32Bit(c) && c&0xffff == 0,
then the ADDV16 instruction can be used to complete the addition operation.
Removes 164 instructions from the go binary on loong64.
Change-Id: I404de93cc4eaaa12fe424f5a0d61b03231215d1a
Reviewed-on: https://go-review.googlesource.com/c/go/+/700536
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Change-Id: I44d6452933c8010f7dfbf821a32053f9d1cf151e
Reviewed-on: https://go-review.googlesource.com/c/go/+/700096
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
|
|
Change-Id: If9da2b5681e5d05d7c3d51f003f1fe662d3feaec
Reviewed-on: https://go-review.googlesource.com/c/go/+/699855
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Removes 152 instructions from the Go binary on loong64.
Change-Id: Icf8ead4f4ca965f51add85ac5e45c3cca8916401
Reviewed-on: https://go-review.googlesource.com/c/go/+/700335
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
|
|
Change-Id: Id43ee4353d4bac96627f8b0f54545cdd3d2a1d1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/699695
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
|
|
loong64
Refer to CL 633075, loong64 has a zero(R0) register that can be used to do this.
Change-Id: I846c6bdfcfd6dbfa18338afc13e34e350580ead4
Reviewed-on: https://go-review.googlesource.com/c/go/+/693876
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
Pattern1: (the type of c is uint16)
c>>8 | c<<8
To:
revb2h c
Pattern2: (the type of c is uint32)
(c & 0xff00ff00)>>8 | (c & 0x00ff00ff)<<8
To:
revb2h c
Pattern3: (the type of c is uint64)
(c & 0xff00ff00ff00ff00)>>8 | (c & 0x00ff00ff00ff00ff)<<8
To:
revb4h c
Change-Id: Ic6231a3f476cbacbea4bd00e31193d107cb86cda
Reviewed-on: https://go-review.googlesource.com/c/go/+/696335
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
'ADDshiftLLV' on loong64
goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
│ old │ new │
│ sec/op │ sec/op vs base │
MulconstI32/3 0.8004n ± 0% 0.4247n ± 2% -46.94% (p=0.000 n=10)
MulconstI32/5 0.8005n ± 0% 0.4256n ± 1% -46.83% (p=0.000 n=10)
MulconstI32/12 1.2010n ± 0% 0.8005n ± 0% -33.35% (p=0.000 n=10)
MulconstI32/120 0.8090n ± 0% 0.8067n ± 0% -0.28% (p=0.007 n=10)
MulconstI32/-120 0.8109n ± 0% 0.8072n ± 0% -0.47% (p=0.000 n=10)
MulconstI32/65537 0.8004n ± 0% 0.8004n ± 0% ~ (p=1.000 n=10)
MulconstI32/65538 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.265 n=10)
MulconstI64/3 0.8005n ± 0% 0.4241n ± 1% -47.02% (p=0.000 n=10)
MulconstI64/5 0.8004n ± 0% 0.4249n ± 1% -46.91% (p=0.000 n=10)
MulconstI64/12 1.2010n ± 0% 0.8004n ± 0% -33.36% (p=0.000 n=10)
MulconstI64/120 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.635 n=10)
MulconstI64/-120 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.837 n=10)
MulconstI64/65537 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.837 n=10)
MulconstI64/65538 0.8096n ± 0% 0.8004n ± 0% -1.14% (p=0.000 n=10)
MulconstU32/3 0.8004n ± 0% 0.4263n ± 1% -46.75% (p=0.000 n=10)
MulconstU32/5 0.8005n ± 0% 0.4262n ± 1% -46.76% (p=0.000 n=10)
MulconstU32/12 1.2010n ± 0% 0.8005n ± 0% -33.35% (p=0.000 n=10)
MulconstU32/120 0.8105n ± 0% 0.8096n ± 0% ~ (p=0.183 n=10)
MulconstU32/65537 0.8004n ± 0% 0.8004n ± 0% ~ (p=1.000 n=10)
MulconstU32/65538 0.8005n ± 0% 0.8005n ± 0% ~ (p=1.000 n=10)
MulconstU64/3 0.8004n ± 0% 0.4265n ± 4% -46.71% (p=0.000 n=10)
MulconstU64/5 0.8004n ± 0% 0.4256n ± 0% -46.82% (p=0.000 n=10)
MulconstU64/12 1.2010n ± 0% 0.8004n ± 0% -33.36% (p=0.000 n=10)
MulconstU64/120 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.387 n=10)
MulconstU64/65537 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.265 n=10)
MulconstU64/65538 0.8080n ± 0% 0.8004n ± 0% -0.93% (p=0.000 n=10)
geomean 0.8539n 0.6597n -22.74%
Change-Id: Ie33e88985d7639f481bbba540bc917b9f185c357
Reviewed-on: https://go-review.googlesource.com/c/go/+/693855
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
loong64
Add branches to convert EQZ/NEZ into more optimal branch conditions.
This reduces 720 instructions from the go toolchain binary on loong64.
file before after Δ %
asm 555306 555082 -224 -0.0403%
cgo 481814 481742 -72 -0.0149%
compile 2475686 2475710 +24 +0.0010%
cover 516854 516770 -84 -0.0163%
link 702566 702530 -36 -0.0051%
preprofile 238612 238548 -64 -0.0268%
vet 793140 793060 -80 -0.0101%
go 1573466 1573346 -120 -0.0076%
gofmt 320560 320496 -64 -0.0200%
total 7658004 7657284 -720 -0.0094%
Additionally, rename EQ/NE to EQZ/NEZ to enhance readability.
Change-Id: Ibc876bc8b8d4e81d5c3aaf0b74b60419f3c771b1
Reviewed-on: https://go-review.googlesource.com/c/go/+/693455
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Change-Id: I680bae7fc1a26c1f249ab833fa8d41e9387b2d50
Reviewed-on: https://go-review.googlesource.com/c/go/+/693456
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
|
|
Change-Id: I5dec33d10d16a5d5c0dc7231cd1f764a6d1d7598
Reviewed-on: https://go-review.googlesource.com/c/go/+/682399
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
|
|
Reduce the number of go toolchain instructions on loong64 as follows.
file before after Δ %
addr2line 279880 279776 -104 -0.0372%
asm 556638 556410 -228 -0.0410%
buildid 272272 272072 -200 -0.0735%
cgo 481522 481318 -204 -0.0424%
compile 2457788 2457580 -208 -0.0085%
covdata 323384 323280 -104 -0.0322%
cover 518450 518234 -216 -0.0417%
dist 340790 340686 -104 -0.0305%
distpack 282456 282252 -204 -0.0722%
doc 789932 789688 -244 -0.0309%
fix 324332 324228 -104 -0.0321%
link 704622 704390 -232 -0.0329%
nm 277132 277028 -104 -0.0375%
objdump 507862 507758 -104 -0.0205%
pack 221774 221674 -100 -0.0451%
pprof 1469816 1469552 -264 -0.0180%
test2json 254836 254732 -104 -0.0408%
trace 1100002 1099738 -264 -0.0240%
vet 781078 780874 -204 -0.0261%
go 1529116 1528848 -268 -0.0175%
gofmt 318556 318448 -108 -0.0339%
total 13792238 13788566 -3672 -0.0266%
Change-Id: I23fb3ebd41309252c7075e57ea7094e79f8c4fef
Reviewed-on: https://go-review.googlesource.com/c/go/+/674335
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
|
|
This CL enables intrinsic support to emit the following prefetch
instructions for loong64 platform:
1.Prefetch - prefetches data from memory address to cache;
2.PrefetchStreamed - prefetches data from memory address, with a
hint that this data is being streamed.
Benchmarks picked from go/test/bench/garbage
Parameters tested with:
GOMAXPROCS=8
tree2 -heapsize=1000000000 -cpus=8
tree -n=18
parser
peano
Benchmarks Loongson-3A6000-HV @ 2500.00MHz:
| bench.old | bench.new |
| sec/op | sec/op vs base |
Tree2-8 1238.2µ ± 24% 999.9µ ± 453% ~ (p=0.089 n=10)
Tree-8 277.4m ± 1% 275.5m ± 1% ~ (p=0.063 n=10)
Parser-8 3.564 ± 0% 3.509 ± 1% -1.56% (p=0.000 n=10)
Peano-8 39.12m ± 2% 38.85m ± 2% ~ (p=0.353 n=10)
geomean 83.19m 78.28m -5.90%
Change-Id: I59e9aa4f609a106d4f70706e6d6d1fe6738ab72a
Reviewed-on: https://go-review.googlesource.com/c/go/+/671876
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000-HV @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
LeadingZeros 1.100n ± 1% 1.101n ± 0% ~ (p=0.566 n=10)
LeadingZeros8 1.501n ± 0% 1.502n ± 0% +0.07% (p=0.000 n=10)
LeadingZeros16 1.501n ± 0% 1.502n ± 0% +0.07% (p=0.000 n=10)
LeadingZeros32 1.2010n ± 0% 0.9511n ± 0% -20.81% (p=0.000 n=10)
LeadingZeros64 1.104n ± 1% 1.119n ± 0% +1.40% (p=0.000 n=10)
TrailingZeros 0.8137n ± 0% 0.8086n ± 0% -0.63% (p=0.001 n=10)
TrailingZeros8 1.031n ± 1% 1.031n ± 1% ~ (p=0.956 n=10)
TrailingZeros16 0.8204n ± 1% 0.8114n ± 0% -1.11% (p=0.000 n=10)
TrailingZeros32 0.8145n ± 0% 0.8090n ± 0% -0.68% (p=0.000 n=10)
TrailingZeros64 0.8159n ± 0% 0.8089n ± 1% -0.86% (p=0.000 n=10)
OnesCount 0.8672n ± 0% 0.8677n ± 0% +0.06% (p=0.000 n=10)
OnesCount8 0.8005n ± 0% 0.8009n ± 0% +0.06% (p=0.000 n=10)
OnesCount16 0.9339n ± 0% 0.9344n ± 0% +0.05% (p=0.000 n=10)
OnesCount32 0.8672n ± 0% 0.8677n ± 0% +0.06% (p=0.000 n=10)
OnesCount64 1.201n ± 0% 1.201n ± 0% ~ (p=0.474 n=10)
RotateLeft 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10)
RotateLeft8 1.202n ± 0% 1.202n ± 0% ~ (p=0.210 n=10)
RotateLeft16 0.8050n ± 0% 0.8036n ± 0% -0.17% (p=0.002 n=10)
RotateLeft32 0.6674n ± 0% 0.6674n ± 0% ~ (p=1.000 n=10)
RotateLeft64 0.6673n ± 0% 0.6674n ± 0% ~ (p=0.072 n=10)
Reverse 0.4123n ± 0% 0.4067n ± 1% -1.37% (p=0.000 n=10)
Reverse8 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10)
Reverse16 0.8004n ± 0% 0.8009n ± 0% +0.06% (p=0.000 n=10)
Reverse32 0.8004n ± 0% 0.8009n ± 0% +0.06% (p=0.000 n=10)
Reverse64 0.8004n ± 0% 0.8009n ± 0% +0.06% (p=0.001 n=10)
ReverseBytes 0.4100n ± 1% 0.4057n ± 1% -1.06% (p=0.002 n=10)
ReverseBytes16 0.8004n ± 0% 0.8009n ± 0% +0.07% (p=0.000 n=10)
ReverseBytes32 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10)
ReverseBytes64 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10)
Add 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Add32 1.201n ± 0% 1.201n ± 0% ~ (p=0.474 n=10)
Add64 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Add64multiple 1.831n ± 0% 1.832n ± 0% ~ (p=1.000 n=10)
Sub 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Sub32 1.601n ± 0% 1.602n ± 0% +0.06% (p=0.000 n=10)
Sub64 1.201n ± 0% 1.201n ± 0% ~ (p=0.474 n=10)
Sub64multiple 2.400n ± 0% 2.402n ± 0% +0.10% (p=0.000 n=10)
Mul 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10)
Mul32 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10)
Mul64 0.8004n ± 0% 0.8008n ± 0% +0.05% (p=0.000 n=10)
Div 9.107n ± 0% 9.083n ± 0% ~ (p=0.255 n=10)
Div32 4.009n ± 0% 4.011n ± 0% +0.05% (p=0.000 n=10)
Div64 9.705n ± 0% 9.711n ± 0% +0.06% (p=0.000 n=10)
geomean 1.089n 1.083n -0.62%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
LeadingZeros 1.352n ± 0% 1.341n ± 4% -0.81% (p=0.024 n=10)
LeadingZeros8 1.766n ± 0% 1.781n ± 0% +0.88% (p=0.000 n=10)
LeadingZeros16 1.766n ± 0% 1.782n ± 0% +0.88% (p=0.000 n=10)
LeadingZeros32 1.536n ± 0% 1.341n ± 1% -12.73% (p=0.000 n=10)
LeadingZeros64 1.351n ± 1% 1.338n ± 0% -0.96% (p=0.000 n=10)
TrailingZeros 0.9037n ± 0% 0.9025n ± 0% -0.12% (p=0.020 n=10)
TrailingZeros8 1.087n ± 3% 1.056n ± 0% ~ (p=0.060 n=10)
TrailingZeros16 1.101n ± 0% 1.101n ± 0% ~ (p=0.211 n=10)
TrailingZeros32 0.9040n ± 0% 0.9024n ± 1% -0.18% (p=0.017 n=10)
TrailingZeros64 0.9043n ± 0% 0.9028n ± 1% ~ (p=0.118 n=10)
OnesCount 1.503n ± 2% 1.482n ± 1% -1.43% (p=0.001 n=10)
OnesCount8 1.207n ± 0% 1.206n ± 0% -0.12% (p=0.000 n=10)
OnesCount16 1.501n ± 0% 1.534n ± 0% +2.13% (p=0.000 n=10)
OnesCount32 1.483n ± 1% 1.531n ± 1% +3.27% (p=0.000 n=10)
OnesCount64 1.301n ± 0% 1.302n ± 0% +0.08% (p=0.000 n=10)
RotateLeft 0.8136n ± 4% 0.8083n ± 0% -0.66% (p=0.002 n=10)
RotateLeft8 1.311n ± 0% 1.310n ± 0% ~ (p=0.786 n=10)
RotateLeft16 1.165n ± 0% 1.149n ± 0% -1.33% (p=0.001 n=10)
RotateLeft32 0.8138n ± 1% 0.8093n ± 0% -0.57% (p=0.017 n=10)
RotateLeft64 0.8149n ± 1% 0.8088n ± 0% -0.74% (p=0.000 n=10)
Reverse 0.5195n ± 1% 0.5109n ± 0% -1.67% (p=0.000 n=10)
Reverse8 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.000 n=10)
Reverse16 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.000 n=10)
Reverse32 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.012 n=10)
Reverse64 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.010 n=10)
ReverseBytes 0.5120n ± 1% 0.5122n ± 2% ~ (p=0.306 n=10)
ReverseBytes16 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.000 n=10)
ReverseBytes32 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.000 n=10)
ReverseBytes64 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.000 n=10)
Add 1.201n ± 0% 1.201n ± 4% ~ (p=0.334 n=10)
Add32 1.201n ± 0% 1.201n ± 0% ~ (p=0.563 n=10)
Add64 1.201n ± 0% 1.201n ± 1% ~ (p=0.652 n=10)
Add64multiple 1.909n ± 0% 1.902n ± 0% ~ (p=0.126 n=10)
Sub 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Sub32 1.655n ± 0% 1.654n ± 0% ~ (p=0.589 n=10)
Sub64 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Sub64multiple 2.150n ± 0% 2.180n ± 4% +1.37% (p=0.000 n=10)
Mul 0.9341n ± 0% 0.9345n ± 0% +0.04% (p=0.011 n=10)
Mul32 1.053n ± 0% 1.030n ± 0% -2.23% (p=0.000 n=10)
Mul64 0.9341n ± 0% 0.9345n ± 0% +0.04% (p=0.018 n=10)
Div 11.59n ± 0% 11.57n ± 1% ~ (p=0.091 n=10)
Div32 4.337n ± 0% 4.337n ± 1% ~ (p=0.783 n=10)
Div64 12.81n ± 0% 12.76n ± 0% -0.39% (p=0.001 n=10)
geomean 1.257n 1.252n -0.46%
Change-Id: I9e93ea49736760c19dc6b6463d2aa95878121b7b
Reviewed-on: https://go-review.googlesource.com/c/go/+/627855
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
|
|
In Loongson's new microstructure LA664 (Loongson-3A6000) and later, the atomic
instruction AMSWAP[DB]{B,H} [1] is supported. Therefore, the implementation of
the atomic operation exchange can be selected according to the CPUCFG flag LAM_BH:
AMSWAPDBB(full barrier) instruction is used on new microstructures, and traditional
LL-SC is used on LA464 (Loongson-3A5000) and older microstructures. This can
significantly improve the performance of Go programs on new microstructures.
Because Xchg8 implemented using traditional LL-SC uses too many temporary
registers, it is not suitable for intrinsics.
goos: linux
goarch: loong64
pkg: internal/runtime/atomic
cpu: Loongson-3A6000 @ 2500.00MHz
BenchmarkXchg8 100000000 10.41 ns/op
BenchmarkXchg8-2 100000000 10.41 ns/op
BenchmarkXchg8-4 100000000 10.41 ns/op
BenchmarkXchg8Parallel 96647592 12.41 ns/op
BenchmarkXchg8Parallel-2 58376136 20.60 ns/op
BenchmarkXchg8Parallel-4 78458899 17.97 ns/op
goos: linux
goarch: loong64
pkg: internal/runtime/atomic
cpu: Loongson-3A5000-HV @ 2500.00MHz
BenchmarkXchg8 38323825 31.23 ns/op
BenchmarkXchg8-2 38368219 31.23 ns/op
BenchmarkXchg8-4 37154156 31.26 ns/op
BenchmarkXchg8Parallel 37908301 31.63 ns/op
BenchmarkXchg8Parallel-2 30413440 39.42 ns/op
BenchmarkXchg8Parallel-4 30737626 39.03 ns/op
For #69735
[1]: https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html
Change-Id: I02ba68f66a2210b6902344fdc9975eb62de728ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/623058
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
In Loongson's new microstructure LA664 (Loongson-3A6000) and later, the atomic
compare-and-exchange instruction AMCAS[DB]{B,W,H,V} [1] is supported. Therefore,
the implementation of the atomic operation compare-and-swap can be selected according
to the CPUCFG flag LAMCAS: AMCASDB(full barrier) instruction is used on new
microstructures, and traditional LL-SC is used on LA464 (Loongson-3A5000) and older
microstructures. This can significantly improve the performance of Go programs on
new microstructures.
goos: linux
goarch: loong64
pkg: internal/runtime/atomic
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
Cas 46.84n ± 0% 22.82n ± 0% -51.28% (p=0.000 n=20)
Cas-2 47.58n ± 0% 29.57n ± 0% -37.85% (p=0.000 n=20)
Cas-4 43.27n ± 20% 25.31n ± 13% -41.50% (p=0.000 n=20)
Cas64 46.85n ± 0% 22.82n ± 0% -51.29% (p=0.000 n=20)
Cas64-2 47.43n ± 0% 29.53n ± 0% -37.74% (p=0.002 n=20)
Cas64-4 43.18n ± 0% 25.28n ± 2% -41.46% (p=0.000 n=20)
geomean 45.82n 25.74n -43.82%
goos: linux
goarch: loong64
pkg: internal/runtime/atomic
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
Cas 50.05n ± 0% 51.26n ± 0% +2.42% (p=0.000 n=20)
Cas-2 52.80n ± 0% 53.11n ± 0% +0.59% (p=0.000 n=20)
Cas-4 55.97n ± 0% 57.31n ± 0% +2.39% (p=0.000 n=20)
Cas64 50.05n ± 0% 51.26n ± 0% +2.42% (p=0.000 n=20)
Cas64-2 52.68n ± 0% 53.11n ± 0% +0.82% (p=0.000 n=20)
Cas64-4 55.96n ± 0% 57.26n ± 0% +2.33% (p=0.000 n=20)
geomean 52.86n 53.83n +1.82%
[1]: https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html
Change-Id: I9b777c63c124fb492f61c903f77061fa2b4e5322
Reviewed-on: https://go-review.googlesource.com/c/go/+/613396
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Micro-benchmark results on Loongson 3A5000 and 3A6000:
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
TrailingZeros 1.7240n ± 0% 0.8120n ± 0% -52.90% (p=0.000 n=20)
TrailingZeros8 1.0530n ± 0% 0.8015n ± 0% -23.88% (p=0.000 n=20)
TrailingZeros16 2.072n ± 0% 1.015n ± 0% -51.01% (p=0.000 n=20)
TrailingZeros32 1.7160n ± 0% 0.8122n ± 0% -52.67% (p=0.000 n=20)
TrailingZeros64 2.0060n ± 0% 0.8125n ± 0% -59.50% (p=0.000 n=20)
geomean 1.669n 0.8470n -49.25%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
TrailingZeros 2.6275n ± 0% 0.9120n ± 0% -65.29% (p=0.000 n=20)
TrailingZeros8 1.451n ± 0% 1.163n ± 0% -19.85% (p=0.000 n=20)
TrailingZeros16 3.069n ± 0% 1.201n ± 0% -60.87% (p=0.000 n=20)
TrailingZeros32 2.9060n ± 0% 0.9115n ± 0% -68.63% (p=0.000 n=20)
TrailingZeros64 2.6305n ± 0% 0.9115n ± 0% -65.35% (p=0.000 n=20)
geomean 2.456n 1.011n -58.83%
This patch is a copy of CL 479498.
Co-authored-by: WANG Xuerui <git@xen0n.name>
Change-Id: I1a5b2114a844dc0d02c8e68f41ce2443ac3b5fda
Reviewed-on: https://go-review.googlesource.com/c/go/+/624356
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Use Loong64's LSX instruction VPCNT to implement math/bits.OnesCount{16,32,64}
and make it intrinsic.
Benchmark results on loongson 3A5000 and 3A6000 machines:
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000-HV @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
OnesCount 4.413n ± 0% 1.401n ± 0% -68.25% (p=0.000 n=10)
OnesCount8 1.364n ± 0% 1.363n ± 0% ~ (p=0.130 n=10)
OnesCount16 2.112n ± 0% 1.534n ± 0% -27.37% (p=0.000 n=10)
OnesCount32 4.533n ± 0% 1.529n ± 0% -66.27% (p=0.000 n=10)
OnesCount64 4.565n ± 0% 1.531n ± 1% -66.46% (p=0.000 n=10)
geomean 3.048n 1.470n -51.78%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
OnesCount 3.553n ± 0% 1.201n ± 0% -66.20% (p=0.000 n=10)
OnesCount8 0.8021n ± 0% 0.8004n ± 0% -0.21% (p=0.000 n=10)
OnesCount16 1.216n ± 0% 1.000n ± 0% -17.76% (p=0.000 n=10)
OnesCount32 3.006n ± 0% 1.035n ± 0% -65.57% (p=0.000 n=10)
OnesCount64 3.503n ± 0% 1.035n ± 0% -70.45% (p=0.000 n=10)
geomean 2.053n 1.006n -51.01%
Change-Id: I07a5b8da2bb48711b896387ec7625145804affc8
Reviewed-on: https://go-review.googlesource.com/c/go/+/620978
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Micro-benchmark results on Loongson 3A5000 and 3A6000:
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
| CL 624576 | this CL |
| sec/op | sec/op vs base |
Reverse 2.8130n ± 0% 0.8008n ± 0% -71.53% (p=0.000 n=20)
Reverse8 0.7014n ± 0% 0.4040n ± 0% -42.40% (p=0.000 n=20)
Reverse16 1.2975n ± 0% 0.6632n ± 1% -48.89% (p=0.000 n=20)
Reverse32 2.7520n ± 0% 0.4042n ± 0% -85.31% (p=0.000 n=20)
Reverse64 2.8970n ± 0% 0.4041n ± 0% -86.05% (p=0.000 n=20)
geomean 1.828n 0.5116n -72.01%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
| CL 624576 | this CL |
| sec/op | sec/op vs base |
Reverse 4.0050n ± 0% 0.8011n ± 0% -80.00% (p=0.000 n=20)
Reverse8 0.8010n ± 0% 0.5210n ± 1% -34.96% (p=0.000 n=20)
Reverse16 1.6160n ± 0% 0.6008n ± 0% -62.82% (p=0.000 n=20)
Reverse32 3.8550n ± 0% 0.5179n ± 0% -86.57% (p=0.000 n=20)
Reverse64 3.8050n ± 0% 0.5177n ± 0% -86.40% (p=0.000 n=20)
geomean 2.378n 0.5828n -75.49%
Updates #59120
This patch is a copy of CL 483656.
Co-authored-by: WANG Xuerui <git@xen0n.name>
Change-Id: I98681091763279279c8404bd0295785f13ea1c8e
Reviewed-on: https://go-review.googlesource.com/c/go/+/624276
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
on loong64
Use loong64's atomic operation instruction AMANDDB{V,W,W} (full barrier) to implement
And{64,32,8}, AMORDB{V,W,W} (full barrier) to implement Or{64,32,8}.
Intrinsify And{64,32,8} and Or{64,32,8}, And this CL alias all of the And/Or operations
into sync/atomic package.
goos: linux
goarch: loong64
pkg: internal/runtime/atomic
cpu: Loongson-3A6000-HV @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
And32 27.73n ± 0% 10.81n ± 0% -61.02% (p=0.000 n=20)
And32Parallel 28.96n ± 0% 12.41n ± 0% -57.15% (p=0.000 n=20)
And64 27.73n ± 0% 10.81n ± 0% -61.02% (p=0.000 n=20)
And64Parallel 28.96n ± 0% 12.41n ± 0% -57.15% (p=0.000 n=20)
Or32 27.62n ± 0% 10.81n ± 0% -60.86% (p=0.000 n=20)
Or32Parallel 28.96n ± 0% 12.41n ± 0% -57.15% (p=0.000 n=20)
Or64 27.62n ± 0% 10.81n ± 0% -60.86% (p=0.000 n=20)
Or64Parallel 28.97n ± 0% 12.41n ± 0% -57.16% (p=0.000 n=20)
And8 29.15n ± 0% 13.21n ± 0% -54.68% (p=0.000 n=20)
And 27.71n ± 0% 12.82n ± 0% -53.74% (p=0.000 n=20)
And8Parallel 28.99n ± 0% 14.46n ± 0% -50.12% (p=0.000 n=20)
AndParallel 29.12n ± 0% 14.42n ± 0% -50.48% (p=0.000 n=20)
Or8 28.31n ± 0% 12.81n ± 0% -54.75% (p=0.000 n=20)
Or 27.72n ± 0% 12.81n ± 0% -53.79% (p=0.000 n=20)
Or8Parallel 29.03n ± 0% 14.62n ± 0% -49.64% (p=0.000 n=20)
OrParallel 29.12n ± 0% 14.42n ± 0% -50.49% (p=0.000 n=20)
geomean 28.47n 12.58n -55.80%
goos: linux
goarch: loong64
pkg: internal/runtime/atomic
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
And32 30.02n ± 0% 14.81n ± 0% -50.67% (p=0.000 n=20)
And32Parallel 30.83n ± 0% 15.61n ± 0% -49.37% (p=0.000 n=20)
And64 30.02n ± 0% 14.81n ± 0% -50.67% (p=0.000 n=20)
And64Parallel 30.83n ± 0% 15.61n ± 0% -49.37% (p=0.000 n=20)
And8 30.42n ± 0% 14.41n ± 0% -52.63% (p=0.000 n=20)
And 30.02n ± 0% 13.61n ± 0% -54.66% (p=0.000 n=20)
And8Parallel 31.23n ± 0% 15.21n ± 0% -51.30% (p=0.000 n=20)
AndParallel 30.83n ± 0% 14.41n ± 0% -53.26% (p=0.000 n=20)
Or32 30.02n ± 0% 14.81n ± 0% -50.67% (p=0.000 n=20)
Or32Parallel 30.83n ± 0% 15.61n ± 0% -49.37% (p=0.000 n=20)
Or64 30.02n ± 0% 14.82n ± 0% -50.63% (p=0.000 n=20)
Or64Parallel 30.83n ± 0% 15.61n ± 0% -49.37% (p=0.000 n=20)
Or8 30.02n ± 0% 14.01n ± 0% -53.33% (p=0.000 n=20)
Or 30.02n ± 0% 13.61n ± 0% -54.66% (p=0.000 n=20)
Or8Parallel 30.83n ± 0% 14.81n ± 0% -51.96% (p=0.000 n=20)
OrParallel 30.83n ± 0% 14.41n ± 0% -53.26% (p=0.000 n=20)
geomean 30.47n 14.75n -51.61%
Change-Id: If008ff6a08b51905076f8ddb6e92f8e214d3f7b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/482756
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Use Loong64's atomic operation instruction AMSWAPDB{W,V} (full barrier)
to implement atomic.Xchg{32,64}
goos: linux
goarch: loong64
pkg: internal/runtime/atomic
cpu: Loongson-3A5000 @ 2500.00MHz
| old.bench | new.bench |
| sec/op | sec/op vs base |
Xchg 26.44n ± 0% 12.01n ± 0% -54.58% (p=0.000 n=20)
Xchg-2 30.10n ± 0% 25.58n ± 0% -15.02% (p=0.000 n=20)
Xchg-4 30.06n ± 0% 24.82n ± 0% -17.43% (p=0.000 n=20)
Xchg64 26.44n ± 0% 12.02n ± 0% -54.54% (p=0.000 n=20)
Xchg64-2 30.10n ± 0% 25.57n ± 0% -15.05% (p=0.000 n=20)
Xchg64-4 30.05n ± 0% 24.80n ± 0% -17.47% (p=0.000 n=20)
geomean 28.81n 19.68n -31.69%
goos: linux
goarch: loong64
pkg: internal/runtime/atomic
cpu: Loongson-3A6000 @ 2500.00MHz
| old.bench | new.bench |
| sec/op | sec/op vs base |
Xchg 25.62n ± 0% 12.41n ± 0% -51.56% (p=0.000 n=20)
Xchg-2 35.01n ± 0% 20.59n ± 0% -41.19% (p=0.000 n=20)
Xchg-4 34.63n ± 0% 19.59n ± 0% -43.42% (p=0.000 n=20)
Xchg64 25.62n ± 0% 12.41n ± 0% -51.56% (p=0.000 n=20)
Xchg64-2 35.01n ± 0% 20.59n ± 0% -41.19% (p=0.000 n=20)
Xchg64-4 34.67n ± 0% 19.59n ± 0% -43.50% (p=0.000 n=20)
geomean 31.44n 17.11n -45.59%
Updates #59120.
Change-Id: Ied74fc20338b63799c6d6eeb122c31b42cff0f7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/481578
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: WANG Xuerui <git@xen0n.name>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
|
|
Benchmark results on Loongson 3A5000 and 3A6000:
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
FMA 25.930n ± 0% 2.002n ± 0% -92.28% (p=0.000 n=10)
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
FMA 32.840n ± 0% 2.002n ± 0% -93.90% (p=0.000 n=10)
Updates #59120
This patch is a copy of CL 483355.
Co-authored-by: WANG Xuerui <git@xen0n.name>
Change-Id: I88b89d23f00864f9173a182a47ee135afec7ed6e
Reviewed-on: https://go-review.googlesource.com/c/go/+/625335
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
The publication barrier is a StoreStore barrier, which is implemented
by "DBAR 0x1A" [1] on loong64.
goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
Malloc8 31.76n ± 0% 22.79n ± 0% -28.24% (p=0.000 n=20)
Malloc8-2 25.46n ± 0% 18.33n ± 0% -28.00% (p=0.000 n=20)
Malloc8-4 25.75n ± 0% 18.43n ± 0% -28.41% (p=0.000 n=20)
Malloc16 62.97n ± 0% 42.41n ± 0% -32.65% (p=0.000 n=20)
Malloc16-2 49.11n ± 0% 31.68n ± 0% -35.50% (p=0.000 n=20)
Malloc16-4 49.64n ± 1% 31.95n ± 0% -35.62% (p=0.000 n=20)
MallocTypeInfo8 58.57n ± 0% 46.51n ± 0% -20.61% (p=0.000 n=20)
MallocTypeInfo8-2 51.43n ± 0% 38.01n ± 0% -26.09% (p=0.000 n=20)
MallocTypeInfo8-4 51.65n ± 0% 38.15n ± 0% -26.13% (p=0.000 n=20)
MallocTypeInfo16 68.07n ± 0% 51.62n ± 0% -24.17% (p=0.000 n=20)
MallocTypeInfo16-2 54.73n ± 0% 41.13n ± 0% -24.85% (p=0.000 n=20)
MallocTypeInfo16-4 55.05n ± 0% 41.28n ± 0% -25.02% (p=0.000 n=20)
MallocLargeStruct 491.5n ± 0% 454.8n ± 0% -7.47% (p=0.000 n=20)
MallocLargeStruct-2 351.8n ± 1% 323.8n ± 0% -7.94% (p=0.000 n=20)
MallocLargeStruct-4 333.6n ± 0% 316.7n ± 0% -5.10% (p=0.000 n=20)
geomean 71.01n 53.78n -24.26%
[1]: https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html
Change-Id: Ica0c89db6f2bebd55d9b3207a1c462a9454e9268
Reviewed-on: https://go-review.googlesource.com/c/go/+/577515
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
Use Loong64's atomic operation instruction AMADDDB{W,V} (full barrier)
to implement atomic.Xadd{32,64}
goos: linux
goarch: loong64
pkg: internal/runtime/atomic
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
Xadd 27.24n ± 0% 12.01n ± 0% -55.91% (p=0.000 n=20)
Xadd-2 31.93n ± 0% 25.55n ± 0% -19.98% (p=0.000 n=20)
Xadd-4 31.90n ± 0% 24.80n ± 0% -22.26% (p=0.000 n=20)
Xadd64 27.23n ± 0% 12.01n ± 0% -55.89% (p=0.000 n=20)
Xadd64-2 31.93n ± 0% 25.57n ± 0% -19.90% (p=0.000 n=20)
Xadd64-4 31.89n ± 0% 24.80n ± 0% -22.23% (p=0.000 n=20)
geomean 30.27n 19.67n -35.01%
goos: linux
goarch: loong64
pkg: internal/runtime/atomic
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
Xadd 26.02n ± 0% 12.41n ± 0% -52.31% (p=0.000 n=20)
Xadd-2 37.36n ± 0% 20.60n ± 0% -44.86% (p=0.000 n=20)
Xadd-4 37.22n ± 0% 19.59n ± 0% -47.37% (p=0.000 n=20)
Xadd64 26.42n ± 0% 12.41n ± 0% -53.03% (p=0.000 n=20)
Xadd64-2 37.77n ± 0% 20.60n ± 0% -45.46% (p=0.000 n=20)
Xadd64-4 37.78n ± 0% 19.59n ± 0% -48.15% (p=0.000 n=20)
geomean 33.30n 17.11n -48.62%
Change-Id: I982539c2aa04680e9dd11b099ba8d5f215bf9b32
Reviewed-on: https://go-review.googlesource.com/c/go/+/481937
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: WANG Xuerui <git@xen0n.name>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
|
|
On Loong64, AMSWAPDB{W,V} instructions are supported by default, and AMSWAPDB{B,H} [1]
is a new instruction added by LA664(Loongson 3A6000) and later microarchitectures.
Therefore, AMSWAPDB{W,V} (full barrier) is used to implement AtomicStore{32,64}, and
the traditional MOVB or the new AMSWAPDBB is used to implement AtomicStore8 according
to the CPU feature.
The StoreRelease barrier on Loong64 is "dbar 0x12", but it is still necessary to
ensure consistency in the order of Store/Load [2].
LoweredAtomicStorezero{32,64} was removed because on loong64 the constant "0" uses
the R0 register, and there is no performance difference between the implementations
of LoweredAtomicStorezero{32,64} and LoweredAtomicStore{32,64}.
goos: linux
goarch: loong64
pkg: internal/runtime/atomic
cpu: Loongson-3A5000-HV @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
AtomicStore64 19.61n ± 0% 13.61n ± 0% -30.60% (p=0.000 n=20)
AtomicStore64-2 19.61n ± 0% 13.61n ± 0% -30.57% (p=0.000 n=20)
AtomicStore64-4 19.62n ± 0% 13.61n ± 0% -30.63% (p=0.000 n=20)
AtomicStore 19.61n ± 0% 13.61n ± 0% -30.60% (p=0.000 n=20)
AtomicStore-2 19.62n ± 0% 13.61n ± 0% -30.63% (p=0.000 n=20)
AtomicStore-4 19.62n ± 0% 13.62n ± 0% -30.58% (p=0.000 n=20)
AtomicStore8 19.61n ± 0% 20.01n ± 0% +2.04% (p=0.000 n=20)
AtomicStore8-2 19.62n ± 0% 20.02n ± 0% +2.01% (p=0.000 n=20)
AtomicStore8-4 19.61n ± 0% 20.02n ± 0% +2.09% (p=0.000 n=20)
geomean 19.61n 15.48n -21.08%
goos: linux
goarch: loong64
pkg: internal/runtime/atomic
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
AtomicStore64 18.03n ± 0% 12.81n ± 0% -28.93% (p=0.000 n=20)
AtomicStore64-2 18.02n ± 0% 12.81n ± 0% -28.91% (p=0.000 n=20)
AtomicStore64-4 18.01n ± 0% 12.81n ± 0% -28.87% (p=0.000 n=20)
AtomicStore 18.02n ± 0% 12.81n ± 0% -28.91% (p=0.000 n=20)
AtomicStore-2 18.01n ± 0% 12.81n ± 0% -28.87% (p=0.000 n=20)
AtomicStore-4 18.01n ± 0% 12.81n ± 0% -28.87% (p=0.000 n=20)
AtomicStore8 18.01n ± 0% 12.81n ± 0% -28.87% (p=0.000 n=20)
AtomicStore8-2 18.01n ± 0% 12.81n ± 0% -28.87% (p=0.000 n=20)
AtomicStore8-4 18.01n ± 0% 12.81n ± 0% -28.87% (p=0.000 n=20)
geomean 18.01n 12.81n -28.89%
[1]: https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html
[2]: https://gcc.gnu.org/git/?p=gcc.git;a=blob_plain;f=gcc/config/loongarch/sync.md
Change-Id: I4ae5e8dd0e6f026129b6e503990a763ed40c6097
Reviewed-on: https://go-review.googlesource.com/c/go/+/581356
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Micro-benchmark results on Loongson 3A5000 and 3A6000:
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
ReverseBytes 2.0020n ± 0% 0.4040n ± 0% -79.82% (p=0.000 n=20)
ReverseBytes16 0.8866n ± 1% 0.8007n ± 0% -9.69% (p=0.000 n=20)
ReverseBytes32 1.2195n ± 0% 0.8007n ± 0% -34.34% (p=0.000 n=20)
ReverseBytes64 2.0705n ± 0% 0.8008n ± 0% -61.32% (p=0.000 n=20)
geomean 1.455n 0.6749n -53.62%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
ReverseBytes 2.8040n ± 0% 0.5205n ± 0% -81.44% (p=0.000 n=20)
ReverseBytes16 0.7066n ± 0% 0.8011n ± 0% +13.37% (p=0.000 n=20)
ReverseBytes32 1.5500n ± 0% 0.8010n ± 0% -48.32% (p=0.000 n=20)
ReverseBytes64 2.7665n ± 0% 0.8010n ± 0% -71.05% (p=0.000 n=20)
geomean 1.707n 0.7192n -57.87%
Updates #59120
This patch is a copy of CL 483357.
Co-authored-by: WANG Xuerui <git@xen0n.name>
Change-Id: If355354cd031533df91991fcc3392e5a6c314295
Reviewed-on: https://go-review.googlesource.com/c/go/+/624576
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
For the SubFromLen64 codegen test case to work as intended, we need
to fold c-(-(x-d)) into x+(c-d).
Still, some instances of LeadingZeros are not optimized into single
CLZ instructions right now (actually, the LeadingZeros micro-benchmarks
are currently still compiled with redundant adds/subs of 64, due to
interference of loop optimizations before lowering), but perf numbers
indicate it's not that bad after all.
Micro-benchmark results on Loongson 3A5000 and 3A6000:
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
LeadingZeros 3.660n ± 0% 1.348n ± 0% -63.17% (p=0.000 n=20)
LeadingZeros8 1.777n ± 0% 1.767n ± 0% -0.56% (p=0.000 n=20)
LeadingZeros16 2.816n ± 0% 1.770n ± 0% -37.14% (p=0.000 n=20)
LeadingZeros32 5.293n ± 1% 1.683n ± 0% -68.21% (p=0.000 n=20)
LeadingZeros64 3.622n ± 0% 1.349n ± 0% -62.76% (p=0.000 n=20)
geomean 3.229n 1.571n -51.35%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
LeadingZeros 2.410n ± 0% 1.103n ± 1% -54.23% (p=0.000 n=20)
LeadingZeros8 1.236n ± 0% 1.501n ± 0% +21.44% (p=0.000 n=20)
LeadingZeros16 2.106n ± 0% 1.501n ± 0% -28.73% (p=0.000 n=20)
LeadingZeros32 2.860n ± 0% 1.324n ± 0% -53.72% (p=0.000 n=20)
LeadingZeros64 2.6135n ± 0% 0.9509n ± 0% -63.62% (p=0.000 n=20)
geomean 2.159n 1.256n -41.81%
Updates #59120
This patch is a copy of CL 483356.
Co-authored-by: WANG Xuerui <git@xen0n.name>
Change-Id: Iee81a17f7da06d77a427e73dfcc016f2b15ae556
Reviewed-on: https://go-review.googlesource.com/c/go/+/624575
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
|
|
os: linux
goarch: loong64
pkg: test/bench/go1
cpu: Loongson-3A6000 @ 2500.00MHz
│ old │ new │
│ sec/op │ sec/op vs base │
BinaryTree17 7.521 ± 1% 7.551 ± 2% ~ (p=0.190 n=10)
Fannkuch11 2.736 ± 0% 2.667 ± 0% -2.51% (p=0.000 n=10)
FmtFprintfEmpty 34.42n ± 0% 35.22n ± 0% +2.32% (p=0.000 n=10)
FmtFprintfString 61.24n ± 0% 56.84n ± 0% -7.18% (p=0.000 n=10)
FmtFprintfInt 68.04n ± 0% 65.65n ± 0% -3.51% (p=0.000 n=10)
FmtFprintfIntInt 111.9n ± 0% 106.0n ± 0% -5.32% (p=0.000 n=10)
FmtFprintfPrefixedInt 131.4n ± 0% 122.5n ± 0% -6.77% (p=0.000 n=10)
FmtFprintfFloat 241.1n ± 0% 235.1n ± 0% -2.51% (p=0.000 n=10)
FmtManyArgs 553.7n ± 0% 518.9n ± 0% -6.28% (p=0.000 n=10)
GobDecode 7.223m ± 1% 7.291m ± 1% +0.94% (p=0.004 n=10)
GobEncode 6.741m ± 1% 6.622m ± 2% -1.77% (p=0.011 n=10)
Gzip 288.9m ± 0% 280.3m ± 0% -3.00% (p=0.000 n=10)
Gunzip 34.07m ± 0% 33.33m ± 0% -2.18% (p=0.000 n=10)
HTTPClientServer 60.15µ ± 0% 60.63µ ± 0% +0.80% (p=0.000 n=10)
JSONEncode 10.052m ± 1% 9.840m ± 0% -2.12% (p=0.000 n=10)
JSONDecode 50.96m ± 0% 51.32m ± 0% +0.70% (p=0.002 n=10)
Mandelbrot200 4.525m ± 0% 4.602m ± 0% +1.69% (p=0.000 n=10)
GoParse 5.018m ± 0% 4.996m ± 0% -0.44% (p=0.000 n=10)
RegexpMatchEasy0_32 58.74n ± 0% 59.95n ± 0% +2.06% (p=0.000 n=10)
RegexpMatchEasy0_1K 464.9n ± 0% 466.1n ± 0% +0.26% (p=0.000 n=10)
RegexpMatchEasy1_32 64.88n ± 0% 59.64n ± 0% -8.08% (p=0.000 n=10)
RegexpMatchEasy1_1K 557.2n ± 0% 564.4n ± 0% +1.29% (p=0.000 n=10)
RegexpMatchMedium_32 879.3n ± 0% 912.8n ± 1% +3.81% (p=0.000 n=10)
RegexpMatchMedium_1K 28.08µ ± 0% 28.70µ ± 0% +2.20% (p=0.000 n=10)
RegexpMatchHard_32 1.456µ ± 0% 1.414µ ± 0% -2.88% (p=0.000 n=10)
RegexpMatchHard_1K 43.81µ ± 0% 42.23µ ± 0% -3.61% (p=0.000 n=10)
Revcomp 472.4m ± 0% 474.5m ± 1% +0.45% (p=0.000 n=10)
Template 83.45m ± 0% 83.39m ± 0% ~ (p=0.481 n=10)
TimeParse 291.3n ± 0% 283.8n ± 0% -2.57% (p=0.000 n=10)
TimeFormat 322.8n ± 0% 313.1n ± 0% -3.02% (p=0.000 n=10)
geomean 54.32µ 53.45µ -1.61%
Change-Id: If68fdd952ec6137c77e25ce8932358cac28da324
Reviewed-on: https://go-review.googlesource.com/c/go/+/620977
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
|
|
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
LeadingZeros 1.0095n ± 0% 0.8011n ± 0% -20.64% (p=0.000 n=10)
LeadingZeros8 1.201n ± 0% 1.167n ± 0% -2.83% (p=0.000 n=10)
LeadingZeros16 1.201n ± 0% 1.167n ± 0% -2.83% (p=0.000 n=10)
LeadingZeros32 1.201n ± 0% 1.134n ± 0% -5.58% (p=0.000 n=10)
LeadingZeros64 0.8007n ± 0% 1.0115n ± 0% +26.32% (p=0.000 n=10)
TrailingZeros 0.8054n ± 0% 0.8106n ± 1% +0.65% (p=0.000 n=10)
TrailingZeros8 1.067n ± 0% 1.002n ± 1% -6.09% (p=0.000 n=10)
TrailingZeros16 1.0540n ± 0% 0.8389n ± 0% -20.40% (p=0.000 n=10)
TrailingZeros32 0.8014n ± 0% 0.8117n ± 0% +1.29% (p=0.000 n=10)
TrailingZeros64 0.8015n ± 0% 0.8124n ± 1% +1.36% (p=0.000 n=10)
OnesCount 3.418n ± 0% 3.417n ± 0% ~ (p=0.911 n=10)
OnesCount8 0.8004n ± 0% 0.8004n ± 0% ~ (p=1.000 n=10)
OnesCount16 1.440n ± 0% 1.299n ± 0% -9.79% (p=0.000 n=10)
OnesCount32 2.969n ± 0% 2.940n ± 0% -0.94% (p=0.000 n=10)
OnesCount64 3.563n ± 0% 3.558n ± 0% -0.14% (p=0.000 n=10)
RotateLeft 0.6677n ± 0% 0.6670n ± 0% ~ (p=0.055 n=10)
RotateLeft8 1.318n ± 1% 1.321n ± 0% ~ (p=0.117 n=10)
RotateLeft16 0.8457n ± 1% 0.8442n ± 0% ~ (p=0.325 n=10)
RotateLeft32 0.8004n ± 0% 0.8004n ± 0% ~ (p=0.837 n=10)
RotateLeft64 0.6678n ± 0% 0.6670n ± 0% -0.13% (p=0.000 n=10)
Reverse 0.8004n ± 0% 0.8004n ± 0% ~ (p=1.000 n=10)
Reverse8 0.6989n ± 0% 0.6969n ± 1% ~ (p=0.138 n=10)
Reverse16 0.6998n ± 1% 0.7004n ± 1% ~ (p=0.985 n=10)
Reverse32 0.4158n ± 1% 0.4159n ± 1% ~ (p=0.870 n=10)
Reverse64 0.4165n ± 1% 0.4194n ± 2% ~ (p=0.093 n=10)
ReverseBytes 0.8004n ± 0% 0.8004n ± 0% ~ (p=1.000 n=10)
ReverseBytes16 0.4183n ± 2% 0.4148n ± 1% ~ (p=0.055 n=10)
ReverseBytes32 0.4143n ± 2% 0.4153n ± 1% ~ (p=0.869 n=10)
ReverseBytes64 0.4168n ± 1% 0.4177n ± 1% ~ (p=0.184 n=10)
Add 1.201n ± 0% 1.201n ± 0% ~ (p=0.087 n=10)
Add32 1.603n ± 0% 1.601n ± 0% -0.12% (p=0.000 n=10)
Add64 1.201n ± 0% 1.201n ± 0% ~ (p=0.211 n=10)
Add64multiple 1.839n ± 0% 1.835n ± 0% -0.24% (p=0.001 n=10)
Sub 1.202n ± 0% 1.201n ± 0% -0.04% (p=0.033 n=10)
Sub32 2.401n ± 0% 1.601n ± 0% -33.32% (p=0.000 n=10)
Sub64 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Sub64multiple 2.105n ± 0% 2.096n ± 0% -0.40% (p=0.000 n=10)
Mul 0.8008n ± 0% 0.8004n ± 0% -0.05% (p=0.000 n=10)
Mul32 0.8041n ± 0% 0.8014n ± 0% -0.34% (p=0.000 n=10)
Mul64 0.8008n ± 0% 0.8004n ± 0% -0.05% (p=0.000 n=10)
Div 8.977n ± 0% 8.945n ± 0% -0.36% (p=0.000 n=10)
Div32 4.084n ± 0% 4.086n ± 0% ~ (p=0.445 n=10)
Div64 9.316n ± 0% 9.301n ± 0% -0.17% (p=0.000 n=10)
geomean 1.141n 1.117n -2.09%
Change-Id: I4dc1eaab6728f771bc722ed331fe5c6429bd1037
Reviewed-on: https://go-review.googlesource.com/c/go/+/618475
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
goos: linux
goarch: loong64
pkg: test/bench/go1
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
BinaryTree17 7.766 ± 1% 7.640 ± 2% -1.62% (p=0.000 n=20)
Fannkuch11 2.649 ± 0% 2.358 ± 0% -10.96% (p=0.000 n=20)
FmtFprintfEmpty 35.89n ± 0% 35.87n ± 0% -0.06% (p=0.000 n=20)
FmtFprintfString 59.44n ± 0% 57.25n ± 2% -3.68% (p=0.000 n=20)
FmtFprintfInt 62.07n ± 0% 60.04n ± 0% -3.27% (p=0.000 n=20)
FmtFprintfIntInt 97.90n ± 0% 97.26n ± 0% -0.65% (p=0.000 n=20)
FmtFprintfPrefixedInt 116.7n ± 0% 119.2n ± 0% +2.14% (p=0.000 n=20)
FmtFprintfFloat 204.5n ± 0% 201.9n ± 0% -1.30% (p=0.000 n=20)
FmtManyArgs 455.9n ± 0% 466.8n ± 0% +2.39% (p=0.000 n=20)
GobDecode 7.458m ± 1% 7.138m ± 1% -4.28% (p=0.000 n=20)
GobEncode 8.573m ± 1% 8.473m ± 1% ~ (p=0.091 n=20)
Gzip 280.2m ± 0% 284.9m ± 0% +1.67% (p=0.000 n=20)
Gunzip 32.68m ± 0% 32.67m ± 0% ~ (p=0.211 n=20)
HTTPClientServer 54.22µ ± 0% 53.24µ ± 0% -1.80% (p=0.000 n=20)
JSONEncode 9.427m ± 1% 9.152m ± 0% -2.92% (p=0.000 n=20)
JSONDecode 47.08m ± 1% 46.85m ± 1% -0.49% (p=0.007 n=20)
Mandelbrot200 4.601m ± 0% 4.605m ± 0% +0.08% (p=0.000 n=20)
GoParse 4.776m ± 0% 4.655m ± 1% -2.52% (p=0.000 n=20)
RegexpMatchEasy0_32 59.77n ± 0% 57.59n ± 0% -3.66% (p=0.000 n=20)
RegexpMatchEasy0_1K 458.1n ± 0% 458.8n ± 0% +0.15% (p=0.000 n=20)
RegexpMatchEasy1_32 59.36n ± 0% 59.24n ± 0% -0.20% (p=0.000 n=20)
RegexpMatchEasy1_1K 557.7n ± 0% 560.2n ± 0% +0.46% (p=0.000 n=20)
RegexpMatchMedium_32 803.1n ± 0% 772.8n ± 0% -3.77% (p=0.000 n=20)
RegexpMatchMedium_1K 27.29µ ± 0% 25.88µ ± 0% -5.18% (p=0.000 n=20)
RegexpMatchHard_32 1.385µ ± 0% 1.304µ ± 0% -5.85% (p=0.000 n=20)
RegexpMatchHard_1K 40.92µ ± 0% 39.58µ ± 0% -3.27% (p=0.000 n=20)
Revcomp 474.3m ± 0% 410.0m ± 0% -13.56% (p=0.000 n=20)
Template 78.16m ± 0% 76.32m ± 1% -2.36% (p=0.000 n=20)
TimeParse 271.8n ± 0% 272.1n ± 0% +0.11% (p=0.000 n=20)
TimeFormat 292.3n ± 0% 294.8n ± 0% +0.86% (p=0.000 n=20)
geomean 51.98µ 50.82µ -2.22%
Change-Id: Ia78f1ddee8f1d9ec7192a4b8d2a4ec6058679956
Reviewed-on: https://go-review.googlesource.com/c/go/+/615918
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
|
|
Moving these intrinsics to a base package enables other internal/runtime
packages to use them.
For #54766.
Change-Id: I0b3eded3bb45af53e3eb5bab93e3792e6a8beb46
Reviewed-on: https://go-review.googlesource.com/c/go/+/613260
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
loong64
Use float <-> int register moves without conversion instead of stores
and loads to move float <-> int values like arm64 and mips64.
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
│ bench.old │ bench.new │
│ sec/op │ sec/op vs base │
Acos 15.98n ± 0% 15.94n ± 0% -0.25% (p=0.000 n=20)
Acosh 27.75n ± 0% 25.56n ± 0% -7.89% (p=0.000 n=20)
Asin 15.85n ± 0% 15.76n ± 0% -0.57% (p=0.000 n=20)
Asinh 39.79n ± 0% 37.69n ± 0% -5.28% (p=0.000 n=20)
Atan 7.261n ± 0% 7.242n ± 0% -0.27% (p=0.000 n=20)
Atanh 28.30n ± 0% 27.62n ± 0% -2.40% (p=0.000 n=20)
Atan2 15.85n ± 0% 15.75n ± 0% -0.63% (p=0.000 n=20)
Cbrt 27.02n ± 0% 21.08n ± 0% -21.98% (p=0.000 n=20)
Ceil 2.830n ± 1% 2.896n ± 1% +2.31% (p=0.000 n=20)
Copysign 0.8022n ± 0% 0.8004n ± 0% -0.22% (p=0.000 n=20)
Cos 11.64n ± 0% 11.61n ± 0% -0.26% (p=0.000 n=20)
Cosh 35.98n ± 0% 33.44n ± 0% -7.05% (p=0.000 n=20)
Erf 10.09n ± 0% 10.08n ± 0% -0.10% (p=0.000 n=20)
Erfc 11.40n ± 0% 11.35n ± 0% -0.44% (p=0.000 n=20)
Erfinv 12.31n ± 0% 12.29n ± 0% -0.16% (p=0.000 n=20)
Erfcinv 12.16n ± 0% 12.17n ± 0% +0.08% (p=0.000 n=20)
Exp 28.41n ± 0% 26.44n ± 0% -6.95% (p=0.000 n=20)
ExpGo 28.68n ± 0% 27.07n ± 0% -5.60% (p=0.000 n=20)
Expm1 17.21n ± 0% 16.75n ± 0% -2.67% (p=0.000 n=20)
Exp2 24.71n ± 0% 23.01n ± 0% -6.88% (p=0.000 n=20)
Exp2Go 25.17n ± 0% 23.91n ± 0% -4.99% (p=0.000 n=20)
Abs 0.8004n ± 0% 0.8004n ± 0% ~ (p=0.224 n=20)
Dim 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=20) ¹
Floor 2.848n ± 0% 2.859n ± 0% +0.39% (p=0.000 n=20)
Max 3.074n ± 0% 3.071n ± 0% ~ (p=0.481 n=20)
Min 3.179n ± 0% 3.176n ± 0% -0.09% (p=0.003 n=20)
Mod 49.62n ± 0% 44.82n ± 0% -9.67% (p=0.000 n=20)
Frexp 7.604n ± 0% 6.803n ± 0% -10.53% (p=0.000 n=20)
Gamma 18.01n ± 0% 17.61n ± 0% -2.22% (p=0.000 n=20)
Hypot 7.204n ± 0% 7.604n ± 0% +5.55% (p=0.000 n=20)
HypotGo 7.204n ± 0% 7.604n ± 0% +5.56% (p=0.000 n=20)
Ilogb 6.003n ± 0% 6.003n ± 0% ~ (p=0.407 n=20)
J0 76.43n ± 0% 76.24n ± 0% -0.25% (p=0.000 n=20)
J1 76.44n ± 0% 76.44n ± 0% ~ (p=1.000 n=20)
Jn 168.2n ± 0% 168.5n ± 0% +0.18% (p=0.000 n=20)
Ldexp 8.804n ± 0% 7.604n ± 0% -13.63% (p=0.000 n=20)
Lgamma 19.01n ± 0% 19.01n ± 0% ~ (p=0.695 n=20)
Log 19.38n ± 0% 19.12n ± 0% -1.34% (p=0.000 n=20)
Logb 6.003n ± 0% 6.003n ± 0% ~ (p=1.000 n=20)
Log1p 18.57n ± 0% 16.72n ± 0% -9.96% (p=0.000 n=20)
Log10 20.67n ± 0% 20.45n ± 0% -1.06% (p=0.000 n=20)
Log2 9.605n ± 0% 8.804n ± 0% -8.34% (p=0.000 n=20)
Modf 4.402n ± 0% 4.402n ± 0% ~ (p=1.000 n=20)
Nextafter32 7.204n ± 0% 5.603n ± 0% -22.22% (p=0.000 n=20)
Nextafter64 6.803n ± 0% 6.003n ± 0% -11.76% (p=0.000 n=20)
PowInt 39.62n ± 0% 37.22n ± 0% -6.06% (p=0.000 n=20)
PowFrac 120.9n ± 0% 108.9n ± 0% -9.93% (p=0.000 n=20)
Pow10Pos 1.601n ± 0% 1.601n ± 0% ~ (p=0.487 n=20)
Pow10Neg 2.675n ± 0% 2.675n ± 0% ~ (p=1.000 n=20)
Round 3.018n ± 0% 2.401n ± 0% -20.46% (p=0.000 n=20)
RoundToEven 3.822n ± 0% 3.001n ± 0% -21.48% (p=0.000 n=20)
Remainder 45.62n ± 0% 42.42n ± 0% -7.01% (p=0.000 n=20)
Signbit 0.9075n ± 0% 0.8004n ± 0% -11.81% (p=0.000 n=20)
Sin 12.65n ± 0% 12.65n ± 0% ~ (p=0.503 n=20)
Sincos 14.81n ± 0% 14.60n ± 0% -1.42% (p=0.000 n=20)
Sinh 36.75n ± 0% 35.11n ± 0% -4.46% (p=0.000 n=20)
SqrtIndirect 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=20) ¹
SqrtLatency 4.002n ± 0% 4.002n ± 0% ~ (p=1.000 n=20)
SqrtIndirectLatency 4.002n ± 0% 4.002n ± 0% ~ (p=1.000 n=20)
SqrtGoLatency 52.85n ± 0% 40.82n ± 0% -22.76% (p=0.000 n=20)
SqrtPrime 887.4n ± 0% 887.4n ± 0% ~ (p=0.751 n=20)
Tan 13.95n ± 0% 13.97n ± 0% +0.18% (p=0.000 n=20)
Tanh 36.79n ± 0% 34.89n ± 0% -5.16% (p=0.000 n=20)
Trunc 2.849n ± 0% 2.861n ± 0% +0.42% (p=0.000 n=20)
Y0 77.44n ± 0% 77.64n ± 0% +0.26% (p=0.000 n=20)
Y1 74.41n ± 0% 74.33n ± 0% -0.11% (p=0.000 n=20)
Yn 158.7n ± 0% 159.0n ± 0% +0.19% (p=0.000 n=20)
Float64bits 0.8774n ± 0% 0.4002n ± 0% -54.39% (p=0.000 n=20)
Float64frombits 0.8042n ± 0% 0.4002n ± 0% -50.24% (p=0.000 n=20)
Float32bits 1.1230n ± 0% 0.5336n ± 0% -52.48% (p=0.000 n=20)
Float32frombits 1.0670n ± 0% 0.8004n ± 0% -24.99% (p=0.000 n=20)
FMA 2.001n ± 0% 2.001n ± 0% ~ (p=0.605 n=20)
geomean 10.87n 10.10n -7.15%
¹ all samples are equal
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A5000 @ 2500.00MHz
│ bench.old │ bench.new │
│ sec/op │ sec/op vs base │
Acos 33.10n ± 0% 31.95n ± 2% -3.46% (p=0.000 n=20)
Acosh 58.38n ± 0% 50.44n ± 0% -13.60% (p=0.000 n=20)
Asin 32.70n ± 0% 31.94n ± 0% -2.32% (p=0.000 n=20)
Asinh 57.65n ± 0% 50.83n ± 0% -11.82% (p=0.000 n=20)
Atan 14.21n ± 0% 14.21n ± 0% ~ (p=0.501 n=20)
Atanh 60.86n ± 0% 54.44n ± 0% -10.56% (p=0.000 n=20)
Atan2 32.02n ± 0% 34.02n ± 0% +6.25% (p=0.000 n=20)
Cbrt 55.58n ± 0% 40.64n ± 0% -26.88% (p=0.000 n=20)
Ceil 9.566n ± 0% 9.566n ± 0% ~ (p=0.463 n=20)
Copysign 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.806 n=20)
Cos 18.02n ± 0% 18.02n ± 0% ~ (p=0.191 n=20)
Cosh 64.44n ± 0% 65.64n ± 0% +1.86% (p=0.000 n=20)
Erf 16.15n ± 0% 16.16n ± 0% ~ (p=0.770 n=20)
Erfc 18.71n ± 0% 18.83n ± 0% +0.61% (p=0.000 n=20)
Erfinv 19.33n ± 0% 19.34n ± 0% ~ (p=0.513 n=20)
Erfcinv 18.90n ± 0% 19.78n ± 0% +4.63% (p=0.000 n=20)
Exp 50.04n ± 0% 49.66n ± 0% -0.75% (p=0.000 n=20)
ExpGo 50.03n ± 0% 50.03n ± 0% ~ (p=0.723 n=20)
Expm1 28.41n ± 0% 28.27n ± 0% -0.49% (p=0.000 n=20)
Exp2 50.08n ± 0% 51.23n ± 0% +2.31% (p=0.000 n=20)
Exp2Go 49.77n ± 0% 49.89n ± 0% +0.24% (p=0.000 n=20)
Abs 0.8009n ± 0% 0.8006n ± 0% ~ (p=0.317 n=20)
Dim 1.987n ± 0% 1.993n ± 0% +0.28% (p=0.001 n=20)
Floor 8.543n ± 0% 8.548n ± 0% ~ (p=0.509 n=20)
Max 6.670n ± 0% 6.672n ± 0% ~ (p=0.335 n=20)
Min 6.694n ± 0% 6.694n ± 0% ~ (p=0.459 n=20)
Mod 56.44n ± 0% 53.23n ± 0% -5.70% (p=0.000 n=20)
Frexp 8.409n ± 0% 7.606n ± 0% -9.55% (p=0.000 n=20)
Gamma 35.64n ± 0% 35.23n ± 0% -1.15% (p=0.000 n=20)
Hypot 11.21n ± 0% 10.61n ± 0% -5.31% (p=0.000 n=20)
HypotGo 11.50n ± 0% 11.01n ± 0% -4.30% (p=0.000 n=20)
Ilogb 7.606n ± 0% 6.804n ± 0% -10.54% (p=0.000 n=20)
J0 125.3n ± 0% 126.5n ± 0% +0.96% (p=0.000 n=20)
J1 124.9n ± 0% 125.3n ± 0% +0.32% (p=0.000 n=20)
Jn 264.3n ± 0% 265.9n ± 0% +0.61% (p=0.000 n=20)
Ldexp 9.606n ± 0% 9.204n ± 0% -4.19% (p=0.000 n=20)
Lgamma 38.82n ± 0% 38.85n ± 0% +0.06% (p=0.019 n=20)
Log 38.44n ± 0% 28.04n ± 0% -27.06% (p=0.000 n=20)
Logb 8.405n ± 0% 7.605n ± 0% -9.52% (p=0.000 n=20)
Log1p 31.62n ± 0% 27.11n ± 0% -14.26% (p=0.000 n=20)
Log10 38.83n ± 0% 28.42n ± 0% -26.81% (p=0.000 n=20)
Log2 11.21n ± 0% 10.41n ± 0% -7.14% (p=0.000 n=20)
Modf 5.204n ± 0% 5.205n ± 0% ~ (p=0.983 n=20)
Nextafter32 8.809n ± 0% 7.208n ± 0% -18.18% (p=0.000 n=20)
Nextafter64 8.405n ± 0% 8.406n ± 0% +0.01% (p=0.007 n=20)
PowInt 48.83n ± 0% 44.78n ± 0% -8.28% (p=0.000 n=20)
PowFrac 146.9n ± 0% 142.1n ± 0% -3.23% (p=0.000 n=20)
Pow10Pos 2.334n ± 0% 2.333n ± 0% ~ (p=0.110 n=20)
Pow10Neg 4.803n ± 0% 4.803n ± 0% ~ (p=0.130 n=20)
Round 4.816n ± 0% 3.819n ± 0% -20.70% (p=0.000 n=20)
RoundToEven 5.735n ± 0% 5.204n ± 0% -9.26% (p=0.000 n=20)
Remainder 52.05n ± 0% 49.64n ± 0% -4.63% (p=0.000 n=20)
Signbit 1.201n ± 0% 1.001n ± 0% -16.65% (p=0.000 n=20)
Sin 20.63n ± 0% 20.64n ± 0% +0.05% (p=0.040 n=20)
Sincos 23.82n ± 0% 24.62n ± 0% +3.36% (p=0.000 n=20)
Sinh 71.25n ± 0% 68.44n ± 0% -3.94% (p=0.000 n=20)
SqrtIndirect 2.001n ± 0% 2.001n ± 0% ~ (p=0.182 n=20)
SqrtLatency 4.003n ± 0% 4.003n ± 0% ~ (p=0.754 n=20)
SqrtIndirectLatency 4.003n ± 0% 4.003n ± 0% ~ (p=0.773 n=20)
SqrtGoLatency 60.84n ± 0% 81.26n ± 0% +33.56% (p=0.000 n=20)
SqrtPrime 1.791µ ± 0% 1.791µ ± 0% ~ (p=0.784 n=20)
Tan 27.22n ± 0% 27.22n ± 0% ~ (p=0.819 n=20)
Tanh 70.88n ± 0% 69.04n ± 0% -2.60% (p=0.000 n=20)
Trunc 8.543n ± 0% 8.543n ± 0% ~ (p=0.784 n=20)
Y0 122.9n ± 0% 122.9n ± 0% ~ (p=0.559 n=20)
Y1 123.3n ± 0% 121.7n ± 0% -1.30% (p=0.000 n=20)
Yn 263.0n ± 0% 262.6n ± 0% -0.15% (p=0.000 n=20)
Float64bits 1.2010n ± 0% 0.6004n ± 0% -50.01% (p=0.000 n=20)
Float64frombits 1.2010n ± 0% 0.6004n ± 0% -50.01% (p=0.000 n=20)
Float32bits 1.7010n ± 0% 0.8005n ± 0% -52.94% (p=0.000 n=20)
Float32frombits 1.5010n ± 0% 0.8005n ± 0% -46.67% (p=0.000 n=20)
FMA 2.001n ± 0% 2.001n ± 0% ~ (p=0.238 n=20)
geomean 17.41n 16.15n -7.19%
Change-Id: I0a0c263af2f07203eab1782e69c706f20c689d8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/604737
Auto-Submit: Tim King <taking@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Tim King <taking@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
|
|
CL 580283 left cmd/compile/internal/ssa/_gen/ in a state where `go run *.go` would always fails ! :'(
Change-Id: I0b3aea9b3f6275cb17c552898c5034e15f0107d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/603995
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
│ old.bench │ new.bench │
│ sec/op │ sec/op vs base │
Copysign 1.9710n ± 0% 0.8006n ± 0% -59.38% (p=0.000 n=10)
Abs 1.8745n ± 0% 0.8006n ± 0% -57.29% (p=0.000 n=10)
geomean 1.922n 0.8006n -58.35%
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A5000 @ 2500.00MHz
│ old.bench │ new.bench │
│ sec/op │ sec/op vs base │
Copysign 2.4020n ± 0% 0.9006n ± 0% -62.51% (p=0.000 n=10)
Abs 2.4020n ± 0% 0.8005n ± 0% -66.67% (p=0.000 n=10)
geomean 2.402n 0.8491n -64.65%
Updates #59120.
Change-Id: Ic409e1f4d15ad15cb3568a5aaa100046e9302842
Reviewed-on: https://go-review.googlesource.com/c/go/+/580280
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Make math.{Min,Max} intrinsics and implement math.{archMax,archMin}
in hardware.
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
│ old.bench │ new.bench │
│ sec/op │ sec/op vs base │
Max 7.606n ± 0% 3.087n ± 0% -59.41% (p=0.000 n=20)
Min 7.205n ± 0% 2.904n ± 0% -59.69% (p=0.000 n=20)
MinFloat 37.220n ± 0% 4.802n ± 0% -87.10% (p=0.000 n=20)
MaxFloat 33.620n ± 0% 4.802n ± 0% -85.72% (p=0.000 n=20)
geomean 16.18n 3.792n -76.57%
goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A5000 @ 2500.00MHz
│ old.bench │ new.bench │
│ sec/op │ sec/op vs base │
Max 10.010n ± 0% 7.196n ± 0% -28.11% (p=0.000 n=20)
Min 8.806n ± 0% 7.155n ± 0% -18.75% (p=0.000 n=20)
MinFloat 60.010n ± 0% 7.976n ± 0% -86.71% (p=0.000 n=20)
MaxFloat 56.410n ± 0% 7.980n ± 0% -85.85% (p=0.000 n=20)
geomean 23.37n 7.566n -67.63%
Updates #59120.
Change-Id: I6815d20bc304af3cbf5d6ca8fe0ca1c2ddebea2d
Reviewed-on: https://go-review.googlesource.com/c/go/+/580283
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
allow the loong64 CALL* ops to take variable number of args
Update #40724
Co-authored-by: Xiaolin Zhao <zhaoxiaolin@loongson.cn>
Change-Id: I4706d9651fcbf9a0f201af6820c97b1a924f14e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/521781
Auto-Submit: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
|
|
Update #40724
Co-authored-by: Xiaolin Zhao <zhaoxiaolin@loongson.cn>
Change-Id: Ifd7d94147b01e4fc83978b53dca2bcc0ad1ac4e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/521779
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
|
|
regABI arguments
Update #40724
Co-authored-by: Xiaolin Zhao <zhaoxiaolin@loongson.cn>
Change-Id: Ic7e2e7fb4c1d3670e6abbfb817aa6e4e654e08d3
Reviewed-on: https://go-review.googlesource.com/c/go/+/521777
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Auto-Submit: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: David Chase <drchase@google.com>
|
|
device for loong64
Add R21 to the allocatable registers, use R20 and R21 in duff
device. This CL is in preparation for subsequent regABI support.
Updates #40724
Co-authored-by: Xiaolin Zhao <zhaoxiaolin@loongson.cn>
Change-Id: If1661adc0f766925fbe74827a369797f95fa28a9
Reviewed-on: https://go-review.googlesource.com/c/go/+/521775
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Like the CL 487295, when implementing Lowered{Move,Zero}, 8 is first subtracted
from Rarg0 (parameter Ptr), and then the offset of 8 is added during subsequent
operations on Rarg0. This operation is meaningless, so delete it.
Change LoweredMove's Rarg0 register to R20, consistent with duffcopy.
goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3C5000 @ 2200.00MHz
│ old.bench │ new.bench │
│ sec/op │ sec/op vs base │
Memmove/15 19.10n ± 0% 19.10n ± 0% ~ (p=0.483 n=15)
MemmoveUnalignedDst/15 25.02n ± 0% 25.02n ± 0% ~ (p=0.741 n=15)
MemmoveUnalignedDst/32 48.22n ± 0% 48.22n ± 0% ~ (p=1.000 n=15) ¹
MemmoveUnalignedDst/64 90.57n ± 0% 90.52n ± 0% ~ (p=0.212 n=15)
MemmoveUnalignedDstOverlap/32 44.12n ± 0% 44.13n ± 0% +0.02% (p=0.000 n=15)
MemmoveUnalignedDstOverlap/64 87.79n ± 0% 87.80n ± 0% +0.01% (p=0.002 n=15)
MemmoveUnalignedSrc/0 3.639n ± 0% 3.639n ± 0% ~ (p=1.000 n=15) ¹
MemmoveUnalignedSrc/1 7.733n ± 0% 7.733n ± 0% ~ (p=1.000 n=15)
MemmoveUnalignedSrc/2 9.097n ± 0% 9.097n ± 0% ~ (p=1.000 n=15)
MemmoveUnalignedSrc/3 10.46n ± 0% 10.46n ± 0% ~ (p=1.000 n=15) ¹
MemmoveUnalignedSrc/4 11.83n ± 0% 11.83n ± 0% ~ (p=1.000 n=15) ¹
MemmoveUnalignedSrc/64 93.71n ± 0% 93.70n ± 0% ~ (p=0.128 n=15)
Memclr/4096 699.1n ± 0% 699.1n ± 0% ~ (p=0.682 n=15)
Memclr/65536 11.18µ ± 0% 11.18µ ± 0% -0.01% (p=0.000 n=15)
Memclr/1M 175.2µ ± 0% 175.2µ ± 0% ~ (p=0.191 n=15)
Memclr/4M 661.8µ ± 0% 662.0µ ± 0% ~ (p=0.486 n=15)
MemclrUnaligned/4_5 19.39n ± 0% 20.47n ± 0% +5.57% (p=0.000 n=15)
MemclrUnaligned/4_16 22.29n ± 0% 21.38n ± 0% -4.08% (p=0.000 n=15)
MemclrUnaligned/4_64 30.58n ± 0% 29.81n ± 0% -2.52% (p=0.000 n=15)
MemclrUnaligned/4_65536 11.19µ ± 0% 11.20µ ± 0% +0.02% (p=0.000 n=15)
GoMemclr/5 12.73n ± 0% 12.73n ± 0% ~ (p=0.261 n=15)
GoMemclr/16 10.01n ± 0% 10.00n ± 0% ~ (p=0.264 n=15)
GoMemclr/256 50.94n ± 0% 50.94n ± 0% ~ (p=0.372 n=15)
ClearFat15 14.95n ± 0% 15.01n ± 4% ~ (p=0.925 n=15)
ClearFat1032 125.5n ± 0% 125.6n ± 0% +0.08% (p=0.000 n=15)
CopyFat64 10.58n ± 0% 10.01n ± 0% -5.39% (p=0.000 n=15)
CopyFat1040 244.3n ± 0% 155.6n ± 0% -36.31% (p=0.000 n=15)
Issue18740/2byte 29.82µ ± 0% 29.82µ ± 0% ~ (p=0.648 n=30)
Issue18740/4byte 18.18µ ± 0% 18.18µ ± 0% -0.02% (p=0.001 n=30)
Issue18740/8byte 8.395µ ± 0% 8.395µ ± 0% ~ (p=0.401 n=30)
geomean 154.5n 151.8n -1.70%
¹ all samples are equal
Change-Id: Ia3f3c8b25e1e93c97ab72328651de78ca9dec016
Reviewed-on: https://go-review.googlesource.com/c/go/+/488515
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: WANG Xuerui <git@xen0n.name>
Reviewed-by: xiaodong liu <teaofmoli@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Currently we subtract 8 from offset when calling duffzero because 8
is added to offset in the duffzero implementation. This operation is
meaningless, so remove it.
Change-Id: I7e451d04d7e98ccafe711645d81d3aadf376766f
Reviewed-on: https://go-review.googlesource.com/c/go/+/487295
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: WANG Xuerui <git@xen0n.name>
Run-TryBot: WANG Xuerui <git@xen0n.name>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: xiaodong liu <teaofmoli@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
|
|
Previously, we need calculate both quotient and remainder together.
However, in most cases, only one result is needed. By separating these
instructions, we can save one instruction in most cases.
Change-Id: I0a2d4167cda68ab606783ba1aa2720ede19d6b53
Reviewed-on: https://go-review.googlesource.com/c/go/+/475315
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Previously, multiplication on loong64 architecture was performed using
MULV and MULHVU instructions to calculate the low 64-bit and high
64-bit of a multiplication respectively. However, in most cases, only
the low 64-bits are needed. This commit enalbes only computating the low
64-bit result with the MULV instruction.
Reduce the binary size slightly.
file before after Δ %
addr2line 2833777 2833849 +72 +0.003%
asm 5267499 5266963 -536 -0.010%
buildid 2579706 2579402 -304 -0.012%
cgo 4798260 4797444 -816 -0.017%
compile 25247419 25175030 -72389 -0.287%
cover 4973091 4972027 -1064 -0.021%
dist 3631013 3565653 -65360 -1.800%
doc 4076036 4074004 -2032 -0.050%
fix 3496378 3496066 -312 -0.009%
link 6984102 6983214 -888 -0.013%
nm 2743820 2743516 -304 -0.011%
objdump 4277171 4277035 -136 -0.003%
pack 2379248 2378872 -376 -0.016%
pprof 14419090 14419874 +784 +0.005%
test2json 2684386 2684018 -368 -0.014%
trace 13640018 13631034 -8984 -0.066%
vet 7748918 7752630 +3712 +0.048%
go 15643850 15638098 -5752 -0.037%
total 127423782 127268729 -155053 -0.122%
Change-Id: Ifce4a9a3ed1d03c170681e39cb6f3541db9882dc
Reviewed-on: https://go-review.googlesource.com/c/go/+/472775
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Have the write barrier call return a pointer to a buffer into which
the generated code records pointers that need write barrier treatment.
Change-Id: I7871764298e0aa1513de417010c8d46b296b199e
Reviewed-on: https://go-review.googlesource.com/c/go/+/447781
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Bypass: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|