| Age | Commit message (Collapse) | Author |
|
1. Support for decimal arithmetic quad instructions of powerpc: DADDQ, DSUBQ, DMULQ
and DDIVQ.
2. Support for decimal compare ordered, unordered, quad instructions of powerpc:
DCMPU, DCMPO, DCMPUQ, and DCMPOQ.
Change-Id: I32a15a7f0a127b022b1f43d376e0ab0f7e9dd108
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/623036
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Paul Murphy <murp@ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Assembler support provided for the instructions DADD, DSUB, DMUL, and DDIV.
Change-Id: Ic12ba02ce453cb1ca275334ca1924fb2009da767
Reviewed-on: https://go-review.googlesource.com/c/go/+/620856
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
This enables efficient use of the builtin min/max function
for float64 and float32 types on GOPPC64 >= power9.
Extend the assembler to support xsminjdp/xsmaxjdp and use
them to implement float min/max.
Simplify the VSX xx3 opcode rules to allow FPR arguments,
if all arguments are an FPR.
Change-Id: I15882a4ce5dc46eba71d683cf1d184dc4236a328
Reviewed-on: https://go-review.googlesource.com/c/go/+/574535
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
|
|
These are ISA 3.0 power9 instructions which are helpful when reducing
a vector compare result into a GPR.
They are used in a future patch to improve the bytes.IndexByte asm
routine.
Change-Id: I424e2628e577167b9b7c0fcbd82099daf568ea35
Reviewed-on: https://go-review.googlesource.com/c/go/+/478115
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
This ISA 3.0 (power9) instruction is helpful for some string functions
in a future change.
Change-Id: I1a659488ffb5099f8c89f480c39af4ef9c4b556a
Reviewed-on: https://go-review.googlesource.com/c/go/+/472635
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This CL removes some opcode placeholders that do not correspond
to any existing instructions and hence create confusion. Some
instructions that are no longer valid like LDMX are also removed.
Any references to this instruction in ISA 3.0 are considered
as documentation errata.
Change-Id: Ib71a657099723bbe1db88873233ee573b5c42fe7
Reviewed-on: https://go-review.googlesource.com/c/go/+/429860
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Paul Murphy <murp@ibm.com>
Run-TryBot: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Benny Siegert <bsiegert@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
|
|
Use ppc64map (from x/arch) to generate ISA 3.1 support for the
assembler. A new file asm9_gtables.go is added which contains
generated code to encode ISA 3.1 instructions, a function to assist
filling out the oprange structure, a lookup table for the fixed
bits of each instructions, and a slice of string name. Generated
functions are shared if their bitwise encoding match, and the
translation from an obj.Prog structure matches.
The generated file is entirely self-contained, and does not require
regenerating any other files for changes within it. If opcodes in
a.out.go are reordered or changed, anames.go must be updated in
the same way as before.
Future improvements could shrink the generated opcode table
to 32 bit entries as there is much less variation of the
encoding of the prefix word, but it is not always identical
for instructions which share a similar encoding of arguments
(e.g PLWA and PLWZ).
Updates #44549
Change-Id: Ie83fa02497c9ad2280678d68391043d3aae63175
Reviewed-on: https://go-review.googlesource.com/c/go/+/419535
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Jenny Rakoczy <jenny@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Jenny Rakoczy <jenny@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Jenny Rakoczy <jenny@golang.org>
|
|
Support BDNZ and BDZ mnemonics, they are commonly used
POWER instructions. The raw BC mnemonic is not easy
to read.
Likewise, cleanup code surrounding these changes.
Change-Id: I72f1dad5013f7856bd0dd320bfb17b5a9f3c69ee
Reviewed-on: https://go-review.googlesource.com/c/go/+/390696
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Trust: Paul Murphy <murp@ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Insert machine NOPs when a prefixed instruction crosses a 64B boundary.
ISA 3.1 prohibits prefixed instructions being placed across them. Such
instructions generate SIGILL if executed.
Likewise, adjust the function alignment to guarantee such instructions
can never cross one. And, don't pad the PC based on alignment. The
linker can fit these more optimally.
Likewise, include the function alignment when printing function debug
information. This is needed to verify function alignment happens.
Updates #44549
Change-Id: I434fb0ee4e984ca00dc4566f7569c3bcdf93f910
Reviewed-on: https://go-review.googlesource.com/c/go/+/347050
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This adds support for the extswsli instruction which combines
extsw followed by a shift.
New benchmark demonstrates the improvement:
name old time/op new time/op delta
ExtShift 1.34µs ± 0% 1.30µs ± 0% -3.15% (p=0.057 n=4+3)
Change-Id: I21b410676fdf15d20e0cbbaa75d7c6dcd3bbb7b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/257017
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This change adds rules to find pairs of instructions that can
be combined into a single shifts. These instruction sequences
are common in array addressing within loops. Improvements can
be seen in many crypto packages and the hash packages.
These are based on the extended mnemonics found in the ISA
sections C.8.1 and C.8.2.
Some rules in PPC64.rules were moved because the ordering prevented
some matching.
The following results were generated on power9.
hash/crc32:
CRC32/poly=Koopman/size=40/align=0 195ns ± 0% 163ns ± 0% -16.41%
CRC32/poly=Koopman/size=40/align=1 200ns ± 0% 163ns ± 0% -18.50%
CRC32/poly=Koopman/size=512/align=0 1.98µs ± 0% 1.67µs ± 0% -15.46%
CRC32/poly=Koopman/size=512/align=1 1.98µs ± 0% 1.69µs ± 0% -14.80%
CRC32/poly=Koopman/size=1kB/align=0 3.90µs ± 0% 3.31µs ± 0% -15.27%
CRC32/poly=Koopman/size=1kB/align=1 3.85µs ± 0% 3.31µs ± 0% -14.15%
CRC32/poly=Koopman/size=4kB/align=0 15.3µs ± 0% 13.1µs ± 0% -14.22%
CRC32/poly=Koopman/size=4kB/align=1 15.4µs ± 0% 13.1µs ± 0% -14.79%
CRC32/poly=Koopman/size=32kB/align=0 137µs ± 0% 105µs ± 0% -23.56%
CRC32/poly=Koopman/size=32kB/align=1 137µs ± 0% 105µs ± 0% -23.53%
crypto/rc4:
RC4_128 733ns ± 0% 650ns ± 0% -11.32% (p=1.000 n=1+1)
RC4_1K 5.80µs ± 0% 5.17µs ± 0% -10.89% (p=1.000 n=1+1)
RC4_8K 45.7µs ± 0% 40.8µs ± 0% -10.73% (p=1.000 n=1+1)
crypto/sha1:
Hash8Bytes 635ns ± 0% 613ns ± 0% -3.46% (p=1.000 n=1+1)
Hash320Bytes 2.30µs ± 0% 2.18µs ± 0% -5.38% (p=1.000 n=1+1)
Hash1K 5.88µs ± 0% 5.38µs ± 0% -8.62% (p=1.000 n=1+1)
Hash8K 42.0µs ± 0% 37.9µs ± 0% -9.75% (p=1.000 n=1+1)
There are other improvements found in golang.org/x/crypto which are all in the
range of 5-15%.
Change-Id: I193471fbcf674151ffe2edab212799d9b08dfb8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/252097
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
|
|
This updates the PPC64.rules file to use the MOD instructions
that are available in power9. Prior to power9 this is done
using a longer sequence with multiply and divide.
Included in this change is removal of the REM* opcode variations
that set the CC or OV bits since their settings are based
on the DIV and are not appropriate for the REM.
Change-Id: Iceed9ce33e128e1911c15592ee674276ce8ba3fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/229761
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This change adds some instructions that were missing from the
ppc64 assembler, mostly power9 but a few others from earlier.
Tests in cmd/asm for ppc64 were updated: ppc64.s includes the
new instructions, and ppc64enc.s now includes not only the
new instructions but most ppc64 opcodes to provide a more
complete test of the ppc64 assembler.
The ppc64 instruction set is used for linux/ppc64le,
linux/ppc64, and aix/ppc64.
Change-Id: I8695f89dbca06174847963f4ef869f2e584d5bbf
Reviewed-on: https://go-review.googlesource.com/c/go/+/229479
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This does some clean up of the ppc64 opcodes to remove names
from the opcode list that don't actually assemble. At one time
names were added to this list to represent opcode "classes" to
organize other opcodes that have the same set of operand
combinations. Since this is not documented, it is confusing as
to which opcodes can be used in an asm file and which can't, and
which opcodes should be supported in the disassembler. It is
clearer for the user if the list of Go opcodes are all opcodes
that can be assembled with names that match the ppc64 opcode
where possible.
I found this when trying to use Go opcode XXLAND in an asm file
which seems like it should map to ppc64 xxland but when used it
gets this error:
go tool asm test_xxland.s
asm: bad r/r, r/r/r or r/r/r/r opcode XXLAND
asm: assembly failed
This change removes the opcodes that are only used for opcode
"classes" and fixes the case statement where they are referenced.
This also fixes XXLAND and XXPERM which are opcodes that should
assemble to their corresponding ppc64 opcode but do not.
Change-Id: I52300db6b22f7f8b3dd3491c3f35a384b943352c
Reviewed-on: https://go-review.googlesource.com/c/go/+/223138
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
POWER9 (ISA 3.0) introduced a new format of load/store instructions to
implement indexed load/store quadword, using an immediate value instead
of a register index.
This change adds support for this new instruction encoding and adds the
new load/store quadword instructions (lxv/stxv) to the assembler.
This change also adds the missing XX1-form loads/stores (halfword and byte)
included in ISA 3.0.
Change-Id: Ibcdf53c342d7a352d64a9403c2fe7b25be9c3b24
Reviewed-on: https://go-review.googlesource.com/c/go/+/200399
Run-TryBot: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This adds support for ppc64 instructions vmrgow and vmrgew which
are needed for an improved implementation of chacha20.
Change-Id: I967a2de54236bcc573a99f7e2b222d5a8bb29e03
Reviewed-on: https://go-review.googlesource.com/c/go/+/192117
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
|
|
zeros instructions on POWER9
This change adds new POWER9 instructions for counting trailing zeros (CNTTZW/CNTTZD)
to the assembler and generates them in SSA when GOPPC64=power9.
name old time/op new time/op delta
TrailingZeros-160 1.59ns ±20% 1.45ns ±10% -8.81% (p=0.000 n=14+13)
TrailingZeros8-160 1.55ns ±23% 1.62ns ±44% ~ (p=0.593 n=13+15)
TrailingZeros16-160 1.78ns ±23% 1.62ns ±38% -9.31% (p=0.003 n=14+14)
TrailingZeros32-160 1.64ns ±10% 1.49ns ± 9% -9.15% (p=0.000 n=13+14)
TrailingZeros64-160 1.53ns ± 6% 1.45ns ± 5% -5.38% (p=0.000 n=15+13)
Change-Id: I365e6ff79f3ce4d8ebe089a6a86b1771853eb596
Reviewed-on: https://go-review.googlesource.com/c/go/+/167517
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
VPERMXOR is missing from the Go assembler for ppc64. It has the
same format as VPERM. It was requested by an external user so
they could write an optimized algorithm in asm.
Change-Id: Icf4c682f7f46716ccae64e6ae3d62e8cec67f6c1
Reviewed-on: https://go-review.googlesource.com/c/151578
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
|
|
Follow the convertion (https://golang.org/s/generatedcode) for generated
code in stringer.go.
Change-Id: I7b5fbb04ba03e8ac77a9a0a402088669469de858
Reviewed-on: https://go-review.googlesource.com/122015
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
This change implements math.Round as an intrinsic on ppc64x so it can be
done using a single instruction.
benchmark old ns/op new ns/op delta
BenchmarkRound-16 2.60 0.69 -73.46%
Change-Id: I9408363e96201abdfc73ced7bcd5f0c29db006a8
Reviewed-on: https://go-review.googlesource.com/109395
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This change adds vector multiply instructions to the assembler for
ppc64x.
Change-Id: I5143a2dc3736951344d43999066d38ab8be4a721
Reviewed-on: https://go-review.googlesource.com/107795
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
The current implementation of l*arx instructions does not accept non-zero
offsets in RA nor the EH field. This change adds full functionality to those
instructions.
Updates #23845
Change-Id: If113f70d11de5f35f8389520b049390dbc40e863
Reviewed-on: https://go-review.googlesource.com/99635
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This change adds ADD/AND/OR/XOR Immediate Shifted instructions for
ppc64x so they are usable in Go asm code. These instructions were
originally present in asm9.go, but they were only usable in that
file (as -AADD, -AANDCC, -AOR, -AXOR). These old mnemonics are now
removed.
Updates #23845
Change-Id: Ifa2fac685e8bc628cb241dd446adfc3068181826
Reviewed-on: https://go-review.googlesource.com/94115
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This change adds a better implementation of IndexByte in asm that uses the
vector registers/instructions on ppc64x.
benchmark old ns/op new ns/op delta
BenchmarkIndexByte/10-8 9.70 9.37 -3.40%
BenchmarkIndexByte/32-8 10.9 10.9 +0.00%
BenchmarkIndexByte/4K-8 254 92.8 -63.46%
BenchmarkIndexByte/4M-8 249246 118435 -52.48%
BenchmarkIndexByte/64M-8 10737987 7383096 -31.24%
benchmark old MB/s new MB/s speedup
BenchmarkIndexByte/10-8 1030.63 1067.24 1.04x
BenchmarkIndexByte/32-8 2922.69 2928.53 1.00x
BenchmarkIndexByte/4K-8 16065.95 44156.45 2.75x
BenchmarkIndexByte/4M-8 16827.96 35414.21 2.10x
BenchmarkIndexByte/64M-8 6249.67 9089.53 1.45x
Change-Id: I81dbdd620f7bb4e395ce4d1f2a14e8e91e39f9a1
Reviewed-on: https://go-review.googlesource.com/71710
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
on ppc64x
This adds support for math Abs, Copysign to be instrinsics on ppc64x.
New instruction FCPSGN is added to generate fcpsgn. Some new
rules are added to improve the int<->float conversions that are
generated mainly due to the Float64bits and Float64frombits in
the math package. PPC64.rules is also modified as suggested
in the review for CL 63290.
Improvements:
benchmark old ns/op new ns/op delta
BenchmarkAbs-16 1.12 0.69 -38.39%
BenchmarkCopysign-16 1.30 0.93 -28.46%
BenchmarkNextafter32-16 9.34 8.05 -13.81%
BenchmarkFrexp-16 8.81 7.60 -13.73%
Others that used Copysign also saw smaller improvements.
I attempted to make this work using rules since that
seems to be preferred, but due to the use of Float64bits and
Float64frombits in these functions, several rules had to be added and
even then not all cases were matched. Using rules became too
complicated and seemed too fragile for these.
Updates #21390
Change-Id: Ia265da9a18355e08000818a4fba1a40e9e031995
Reviewed-on: https://go-review.googlesource.com/67130
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
The functions Float64bits and Float64frombits perform
poorly on ppc64x because the int<->float conversions
often result in load and store sequences to handle the
type change. This patch adds more rules to recognize
those sequences and use register to register moves
and avoid unnecessary loads and stores where possible.
There were some existing rules to improve these conversions,
but this provides additional improvements. Included here:
- New instruction FCFIDS to improve on conversion to 32 bit
- Rename Xf2i64 and Xi2f64 as MTVSRD, MFVSRD, to match the asm
- Add rules to lower some of the load/store sequences for
- Added new go asm to ppc64.s testcase.
conversions
Improvements:
BenchmarkAbs-16 2.16 0.93 -56.94%
BenchmarkCopysign-16 2.66 1.18 -55.64%
BenchmarkRound-16 4.82 2.69 -44.19%
BenchmarkSignbit-16 1.71 1.14 -33.33%
BenchmarkFrexp-16 11.4 7.94 -30.35%
BenchmarkLogb-16 10.4 7.34 -29.42%
BenchmarkLdexp-16 15.7 11.2 -28.66%
BenchmarkIlogb-16 10.2 7.32 -28.24%
BenchmarkPowInt-16 69.6 55.9 -19.68%
BenchmarkModf-16 10.1 8.19 -18.91%
BenchmarkLog2-16 17.4 14.3 -17.82%
BenchmarkCbrt-16 45.0 37.3 -17.11%
BenchmarkAtanh-16 57.6 48.3 -16.15%
BenchmarkRemainder-16 76.6 65.4 -14.62%
BenchmarkGamma-16 26.0 22.5 -13.46%
BenchmarkPowFrac-16 197 174 -11.68%
BenchmarkMod-16 112 99.8 -10.89%
BenchmarkAsinh-16 59.9 53.7 -10.35%
BenchmarkAcosh-16 44.8 40.3 -10.04%
Updates #21390
Change-Id: I56cc991fc2e55249d69518d4e1ba76cc23904e35
Reviewed-on: https://go-review.googlesource.com/63290
Reviewed-by: Michael Munday <mike.munday@ibm.com>
|
|
This change adds new ppc64 instructions from the POWER9 ISA. This includes
compares, loads, maths, register moves and the new random number generator and
copy/paste facilities.
Change-Id: Ife3720b90f5af184ff115bbcdcbce5c1302d39b6
Reviewed-on: https://go-review.googlesource.com/53930
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This updates PPC64.rules to include rules to generate rotates
for ADD, OR, XOR operators that combine two opposite shifts
that sum to 32 or 64.
To support this change opcodes for ROTL and ROTLW were added to
be used like the rotldi and rotlwi extended mnemonics.
This provides the following improvement in sha3:
BenchmarkPermutationFunction-8 302.83 376.40 1.24x
BenchmarkSha3_512_MTU-8 98.64 121.92 1.24x
BenchmarkSha3_384_MTU-8 136.80 168.30 1.23x
BenchmarkSha3_256_MTU-8 169.21 211.29 1.25x
BenchmarkSha3_224_MTU-8 179.76 221.19 1.23x
BenchmarkShake128_MTU-8 212.87 263.23 1.24x
BenchmarkShake256_MTU-8 196.62 245.60 1.25x
BenchmarkShake256_16x-8 163.57 194.37 1.19x
BenchmarkShake256_1MiB-8 199.02 248.74 1.25x
BenchmarkSha3_512_1MiB-8 106.55 133.13 1.25x
Fixes #20030
Change-Id: I484c56f48395d32f53ff3ecb3ac6cb8191cfee44
Reviewed-on: https://go-review.googlesource.com/40992
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This change adds instructions from ISA 2.05, 2.06 and 2.07 that are frequently
used in assembly optimizations for ppc64.
It also fixes two problems:
* the implementation of RLDICR[CC]/RLDICL[CC] did not consider all possible
cases for the bit mask.
* removed two non-existing instructions that were added by mistake in the VMX
implementation (VORL/VANDL).
Change-Id: Iaef4e5c6a5240c2156c6c0f28ad3bcd8780e9830
Reviewed-on: https://go-review.googlesource.com/36230
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
instructions
The current implementation for Power architecture does not include the vector
scalar (VSX) registers. This adds the 63 VSX registers and the most commonly
used instructions: load/store VSX vector/scalar, move to/from VSR, logical
operations, select, merge, splat, permute, shift, FP-FP conversion, FP-integer
conversion and integer-FP conversion.
Change-Id: I0f7572d2359fe7f3ea0124a1eb1b0bebab33649e
Reviewed-on: https://go-review.googlesource.com/30510
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Some original shift opcodes for ppc64x expected an operand to be
a mask instead of a shift count, preventing some valid shift counts
from being written.
This adds new opcodes for shifts where needed, using mnemonics that
match the ppc64 asm and allowing the assembler to accept the full set
of valid shift counts.
Fixes #15016
Change-Id: Id573489f852038d06def279c13fd0523736878a7
Reviewed-on: https://go-review.googlesource.com/31853
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
This improves the performance for byte.Compare by rewriting
the cmpbody function in runtime/asm_ppc64x.s. The previous code
had a simple loop which loaded a pair of bytes and compared them,
which is inefficient for long buffers. The updated function checks
for 8 or 32 byte chunks and then loads and compares double words where
possible.
Because the byte.Compare result indicates greater or less than,
the doubleword loads must take endianness into account, using a
byte reversed load in the little endian case.
Fixes #17433
benchmark old ns/op new ns/op delta
BenchmarkBytesCompare/8-16 13.6 7.16 -47.35%
BenchmarkBytesCompare/16-16 25.7 7.83 -69.53%
BenchmarkBytesCompare/32-16 38.1 7.78 -79.58%
BenchmarkBytesCompare/64-16 63.0 10.6 -83.17%
BenchmarkBytesCompare/128-16 112 13.0 -88.39%
BenchmarkBytesCompare/256-16 211 28.1 -86.68%
BenchmarkBytesCompare/512-16 410 38.6 -90.59%
BenchmarkBytesCompare/1024-16 807 60.2 -92.54%
BenchmarkBytesCompare/2048-16 1601 103 -93.57%
Change-Id: I121acc74fcd27c430797647b8d682eb0607c63eb
Reviewed-on: https://go-review.googlesource.com/30949
Reviewed-by: David Chase <drchase@google.com>
|
|
This adds the instructions frim, frip, and friz to the ppc64x
assembler for use in implementing the math.Floor, math.Ceil, and
math.Trunc functions to improve performance.
Fixes #17185
BenchmarkCeil-128 21.4 6.99 -67.34%
BenchmarkFloor-128 13.9 6.37 -54.17%
BenchmarkTrunc-128 12.7 6.33 -50.16%
Change-Id: I96131bd4e8c9c8dbafb25bfeb544cf9d2dbb4282
Reviewed-on: https://go-review.googlesource.com/29654
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Michael Munday <munday@ca.ibm.com>
|
|
The current implementation for Power architecture does not include the vector
(Altivec) registers. This adds the 32 VMX registers and the most commonly used
instructions: X-form loads/stores; VX-form logical operations, add/sub,
rotate/shift, count, splat, SHA Sigma and AES cipher; VC-form compare; and
VA-form permute, shift, add/sub and select.
Fixes #15619
Change-Id: I544b990631726e8fdfcce8ecca0aeeb72faae9aa
Reviewed-on: https://go-review.googlesource.com/25600
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
This adds the support for the ppc64le isel instruction so
it can be used by SSA.
Fixed #16771
Change-Id: Ia2517f0834ff5e7ad927e218b84493e0106ab4a7
Reviewed-on: https://go-review.googlesource.com/28611
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
The existing implementation used a pure go implementation, leading to slow
cryptographic performance.
Implemented mulWW, subVV, mulAddVWW, addMulVVW, and bitLen for
ppc64{le}.
Implemented divWW for ppc64le only, as the DIVDEU instruction is only
available on Power8 or newer.
benchcmp output:
benchmark old ns/op new ns/op delta
BenchmarkSignP384 28934360 10877330 -62.41%
BenchmarkRSA2048Decrypt 41261033 5139930 -87.54%
BenchmarkRSA2048Sign 45231300 7610985 -83.17%
Benchmark3PrimeRSA2048Decrypt 20487300 2481408 -87.89%
Fixes #16621
Change-Id: If8b68963bb49909bde832f2bda08a3791c4f5b7a
Reviewed-on: https://go-review.googlesource.com/26951
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <munday@ca.ibm.com>
|
|
Passes ssa_test.
Requires a few new instructions and some scratchpad
memory to move data between G and F registers.
Also fixed comparisons to be correct in case of NaN.
Added missing instructions for run.bash.
Removed some FP registers that are apparently "reserved"
(but that are also apparently also unused except for a
gratuitous multiplication by two when y = x+x would work
just as well).
Currently failing stack splits.
Updates #16010.
Change-Id: I73b161bfff54445d72bd7b813b1479f89fc72602
Reviewed-on: https://go-review.googlesource.com/26813
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
The following performance improvements have been made to the
low-level atomic functions for ppc64le & ppc64:
- For those cases containing a lwarx and stwcx (or other sizes):
sync, lwarx, maybe something, stwcx, loop to sync, sync, isync
The sync is moved before (outside) the lwarx/stwcx loop, and the
sync after is removed, so it becomes:
sync, lwarx, maybe something, stwcx, loop to lwarx, isync
- For the Or8 and And8, the shifting and manipulation of the
address to the word aligned version were removed and the
instructions were changed to use lbarx, stbcx instead of
register shifting, xor, then lwarx, stwcx.
- New instructions LWSYNC, LBAR, STBCC were tested and added.
runtime/atomic_ppc64x.s was changed to use the LWSYNC opcode
instead of the WORD encoding.
Fixes #15469
Ran some of the benchmarks in the runtime and sync directories.
Some results varied from run to run but the trend was improvement
based on best times for base and new:
runtime.test:
BenchmarkChanNonblocking-128 0.88 0.89 +1.14%
BenchmarkChanUncontended-128 569 511 -10.19%
BenchmarkChanContended-128 63110 53231 -15.65%
BenchmarkChanSync-128 691 598 -13.46%
BenchmarkChanSyncWork-128 11355 11649 +2.59%
BenchmarkChanProdCons0-128 2402 2090 -12.99%
BenchmarkChanProdCons10-128 1348 1363 +1.11%
BenchmarkChanProdCons100-128 1002 746 -25.55%
BenchmarkChanProdConsWork0-128 2554 2720 +6.50%
BenchmarkChanProdConsWork10-128 1909 1804 -5.50%
BenchmarkChanProdConsWork100-128 1624 1580 -2.71%
BenchmarkChanCreation-128 237 212 -10.55%
BenchmarkChanSem-128 705 667 -5.39%
BenchmarkChanPopular-128 5081190 4497566 -11.49%
BenchmarkCreateGoroutines-128 532 473 -11.09%
BenchmarkCreateGoroutinesParallel-128 35.0 34.7 -0.86%
BenchmarkCreateGoroutinesCapture-128 4923 4200 -14.69%
sync.test:
BenchmarkUncontendedSemaphore-128 112 94.2 -15.89%
BenchmarkContendedSemaphore-128 133 128 -3.76%
BenchmarkMutexUncontended-128 1.90 1.67 -12.11%
BenchmarkMutex-128 353 310 -12.18%
BenchmarkMutexSlack-128 304 283 -6.91%
BenchmarkMutexWork-128 554 541 -2.35%
BenchmarkMutexWorkSlack-128 567 556 -1.94%
BenchmarkMutexNoSpin-128 275 242 -12.00%
BenchmarkMutexSpin-128 1129 1030 -8.77%
BenchmarkOnce-128 1.08 0.96 -11.11%
BenchmarkPool-128 29.8 27.4 -8.05%
BenchmarkPoolOverflow-128 40564 36583 -9.81%
BenchmarkSemaUncontended-128 3.14 2.63 -16.24%
BenchmarkSemaSyntNonblock-128 1087 1069 -1.66%
BenchmarkSemaSyntBlock-128 897 893 -0.45%
BenchmarkSemaWorkNonblock-128 1034 1028 -0.58%
BenchmarkSemaWorkBlock-128 949 886 -6.64%
Change-Id: I4403fb29d3cd5254b7b1ce87a216bd11b391079e
Reviewed-on: https://go-review.googlesource.com/22549
Reviewed-by: Michael Munday <munday@ca.ibm.com>
Reviewed-by: Minux Ma <minux@golang.org>
|
|
Adds the FCFIDU instruction and uses it instead of the FCFID
instruction for unsigned integer to float casts. This change means
that unsigned integers do not have to be cast to signed integers
before being cast to a floating point value. Therefore it is no
longer necessary to insert instructions to detect and fix
values that overflow int64.
The previous code generating the uint64 to int64 cast handled
overflow by truncating the uint64 value. This truncation can
change the result of the rounding performed by the integer to
float cast.
The FCFIDU instruction was added in Power ISA 2.06B.
Fixes #15539.
Change-Id: Ia37a9631293eff91032d4cd9a9bec759d2142437
Reviewed-on: https://go-review.googlesource.com/22772
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
The old numerical names like 6.out.go are a relic from the old tools.
Easier to rename than explain.
The anames.go files were modified by go generate; no changes
beyond the explanatory comment at the top.
Change-Id: I84742c75c60e47724baa9d49a91fef1f8581f021
Reviewed-on: https://go-review.googlesource.com/12069
Run-TryBot: Rob Pike <r@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
Add cmd/internal/obj/stringer.go to do the generation and update
the architecture packages to use it to maintain the Anames tables.
Change-Id: I9c6d4def1bf21624668396d70c17973d0db11fbc
Reviewed-on: https://go-review.googlesource.com/7430
Reviewed-by: Russ Cox <rsc@golang.org>
|