diff options
| author | Paul E. Murphy <murp@ibm.com> | 2021-06-08 13:16:01 -0500 |
|---|---|---|
| committer | Paul Murphy <murp@ibm.com> | 2022-05-10 20:03:53 +0000 |
| commit | 0c43878baa035db39d9bbf84ce8721cd8a97c78a (patch) | |
| tree | 27bc4f48cd69efdeb8aeb3510befd583dbf9801e /src/os/exec | |
| parent | 526de61c67add8400d73b28bbed3f3680586c472 (diff) | |
| download | go-0c43878baa035db39d9bbf84ce8721cd8a97c78a.tar.xz | |
cmd/compile: lower Add64/Sub64 into ssa on PPC64
math/bits.Add64 and math/bits.Sub64 now lower and optimize
directly in SSA form.
The optimization of carry chains focuses around eliding
XER<->GPR transfers of the CA bit when used exclusively as an
input to a single carry operations, or when the CA value is
known.
This also adds support for handling XER spills in the assembler
which could happen if carry chains contain inter-dependencies
on each other (which seems very unlikely with practical usage),
or a clobber happens (SRAW/SRAD/SUBFC operations clobber CA).
With PPC64 Add64/Sub64 lowering into SSA and this patch, the net
performance difference in crypto/elliptic benchmarks on P9/ppc64le
are:
name old time/op new time/op delta
ScalarBaseMult/P256 46.3µs ± 0% 46.9µs ± 0% +1.34%
ScalarBaseMult/P224 356µs ± 0% 209µs ± 0% -41.14%
ScalarBaseMult/P384 1.20ms ± 0% 0.57ms ± 0% -52.14%
ScalarBaseMult/P521 3.38ms ± 0% 1.44ms ± 0% -57.27%
ScalarMult/P256 199µs ± 0% 199µs ± 0% -0.17%
ScalarMult/P224 357µs ± 0% 212µs ± 0% -40.56%
ScalarMult/P384 1.20ms ± 0% 0.58ms ± 0% -51.86%
ScalarMult/P521 3.37ms ± 0% 1.44ms ± 0% -57.32%
MarshalUnmarshal/P256/Uncompressed 2.59µs ± 0% 2.52µs ± 0% -2.63%
MarshalUnmarshal/P256/Compressed 2.58µs ± 0% 2.52µs ± 0% -2.06%
MarshalUnmarshal/P224/Uncompressed 1.54µs ± 0% 1.40µs ± 0% -9.42%
MarshalUnmarshal/P224/Compressed 1.54µs ± 0% 1.39µs ± 0% -9.87%
MarshalUnmarshal/P384/Uncompressed 2.40µs ± 0% 1.80µs ± 0% -24.93%
MarshalUnmarshal/P384/Compressed 2.35µs ± 0% 1.81µs ± 0% -23.03%
MarshalUnmarshal/P521/Uncompressed 3.79µs ± 0% 2.58µs ± 0% -31.81%
MarshalUnmarshal/P521/Compressed 3.80µs ± 0% 2.60µs ± 0% -31.67%
Note, P256 uses an asm implementation, thus, little variation is expected.
Change-Id: I88a24f6bf0f4f285c649e40243b1ab69cc452b71
Reviewed-on: https://go-review.googlesource.com/c/go/+/346870
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Diffstat (limited to 'src/os/exec')
0 files changed, 0 insertions, 0 deletions
