| author | Michael Munday <mike.munday@ibm.com> | 2017-10-27 09:45:45 -0400 |
|---|---|---|
| committer | Michael Munday <mike.munday@ibm.com> | 2017-10-30 23:42:51 +0000 |
| commit | 96cdacb9717271126eb60de3d8410c9cecd67b28 | |
| tree | 49eba8ac93d1d88fa7dc826df47afb8bf523a9e0 /src/cmd/asm/internal | |
| parent | 7fff1db0605739fee20673475cbc1813fdf7008e | |
cmd/asm, cmd/compile: optimize math.Abs and math.Copysign on s390x

This change adds three new instructions:

- LPDFR: load positive (math.Abs(x))
- LNDFR: load negative (-math.Abs(x))
- CPSDR: copy sign (math.Copysign(x, y))

By making use of GPR <-> FPR moves we can now compile math.Abs and
math.Copysign to these instructions using SSA rules.

This CL also adds new rules to merge address generation into combined
load operations. This makes GPR <-> FPR move matching more reliable.
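
For readers unfamiliar with why GPR <-> FPR moves matter here: the portable versions of these functions can be written as integer operations on the float's bit pattern, and that is the shape the new instructions replace. The following is a minimal sketch of those bit patterns, not the code in this CL; the helper names `abs`, `negAbs` and `copysign` are hypothetical, and the pairing with each instruction in the comments follows the list above.

```go
package main

import (
	"fmt"
	"math"
)

// signBit is the IEEE 754 binary64 sign bit.
const signBit = 1 << 63

// abs clears the sign bit; this is the operation LPDFR performs.
func abs(x float64) float64 {
	return math.Float64frombits(math.Float64bits(x) &^ signBit)
}

// negAbs sets the sign bit; this is the operation LNDFR performs.
func negAbs(x float64) float64 {
	return math.Float64frombits(math.Float64bits(x) | signBit)
}

// copysign takes the magnitude of x and the sign of y; this is the
// operation CPSDR performs.
func copysign(x, y float64) float64 {
	return math.Float64frombits(math.Float64bits(x)&^signBit | math.Float64bits(y)&signBit)
}

func main() {
	fmt.Println(abs(-2.5), negAbs(2.5), copysign(3, -1))
}
```

Each helper moves a float to an integer, applies a mask, and moves it back, which is why recognizing the move pairs reliably is a prerequisite for matching the whole pattern.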
name      old time/op  new time/op  delta
Copysign  1.85ns ± 0%  1.40ns ± 1%  -24.65%  (p=0.000 n=8+10)
Abs       1.58ns ± 1%  0.73ns ± 1%  -53.64%  (p=0.000 n=10+10)
The geo mean improvement for all math package benchmarks was 4.6%.
Change-Id: I0cec35c5c1b3fb45243bf666b56b57faca981bc9
Reviewed-on: https://go-review.googlesource.com/73950
Run-TryBot: Michael Munday <mike.munday@ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
Diffstat (limited to 'src/cmd/asm/internal')
| -rw-r--r-- | src/cmd/asm/internal/asm/testdata/s390x.s | 3 |
1 file changed, 3 insertions, 0 deletions
diff --git a/src/cmd/asm/internal/asm/testdata/s390x.s b/src/cmd/asm/internal/asm/testdata/s390x.s
index 6cc129ccc5..269f8bd077 100644
--- a/src/cmd/asm/internal/asm/testdata/s390x.s
+++ b/src/cmd/asm/internal/asm/testdata/s390x.s
@@ -296,6 +296,9 @@ TEXT main·foo(SB),DUPOK|NOSPLIT,$16-0 // TEXT main.foo(SB), DUPOK|NOSPLIT, $16-
 	FMADDS	F1, F2, F3             // b30e3012
 	FMSUB	F4, F5, F5             // b31f5045
 	FMSUBS	F6, F6, F7             // b30f7066
+	LPDFR	F1, F2                 // b3700021
+	LNDFR	F3, F4                 // b3710043
+	CPSDR	F5, F6, F7             // b3725076
 
 	VL	(R15), V1               // e710f0000006
 	VST	V1, (R15)               // e710f000000e
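
The expected bytes in the three added test lines can be reproduced by hand, which helps when reading the operand order. The sketch below is illustrative only and is not the assembler's encoder: `encodeRRE` and `encodeRRF` are hypothetical helpers, and the mapping from operand position to instruction field is an assumption inferred from the expected encodings in the test data above.

```go
package main

import "fmt"

// encodeRRE packs a 16-bit opcode with two 4-bit register fields:
// r1 (destination) in bits 4-7 and r2 (source) in bits 0-3.
// Assumed layout, inferred from the test encodings.
func encodeRRE(op, r1, r2 uint32) uint32 {
	return op<<16 | r1<<4 | r2
}

// encodeRRF additionally places a third register field, r3, in
// bits 12-15. Assumed layout, inferred from the test encodings.
func encodeRRF(op, r3, r1, r2 uint32) uint32 {
	return op<<16 | r3<<12 | r1<<4 | r2
}

func main() {
	fmt.Printf("%08x\n", encodeRRE(0xb370, 2, 1))    // LPDFR F1, F2
	fmt.Printf("%08x\n", encodeRRE(0xb371, 4, 3))    // LNDFR F3, F4
	fmt.Printf("%08x\n", encodeRRF(0xb372, 5, 7, 6)) // CPSDR F5, F6, F7
}
```

Under this reading, the last operand in the Go assembly syntax is the destination, and for CPSDR the first operand lands in the r3 field.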
