path: root/src/cmd/internal/obj/ppc64/anames.go
Age | Commit message | Author
2024-11-21 | cmd/internal/obj/ppc64: support for decimal floating point instructions | Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
1. Support for decimal arithmetic quad instructions of powerpc: DADDQ, DSUBQ, DMULQ and DDIVQ.
2. Support for decimal compare ordered, unordered, quad instructions of powerpc: DCMPU, DCMPO, DCMPUQ, and DCMPOQ.
Change-Id: I32a15a7f0a127b022b1f43d376e0ab0f7e9dd108 Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10 Reviewed-on: https://go-review.googlesource.com/c/go/+/623036 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Paul Murphy <murp@ibm.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-10-29 | cmd/internal/obj/ppc64: add double-decimal arithmetic instructions | Jayanth Krishnamurthy
Adds assembler support for the instructions DADD, DSUB, DMUL, and DDIV. Change-Id: Ic12ba02ce453cb1ca275334ca1924fb2009da767 Reviewed-on: https://go-review.googlesource.com/c/go/+/620856 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2024-04-01 | cmd/compile: support float min/max instructions on PPC64 | Paul E. Murphy
This enables efficient use of the builtin min/max functions for float64 and float32 types on GOPPC64 >= power9. Extend the assembler to support xsminjdp/xsmaxjdp and use them to implement float min/max. Simplify the VSX xx3 opcode rules to allow FPR arguments, if all arguments are FPRs. Change-Id: I15882a4ce5dc46eba71d683cf1d184dc4236a328 Reviewed-on: https://go-review.googlesource.com/c/go/+/574535 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Than McIntosh <thanm@google.com>
2023-03-22 | cmd/internal/obj/ppc64: add VC[LT]ZLSBB instructions | Paul E. Murphy
These are ISA 3.0 power9 instructions which are helpful when reducing a vector compare result into a GPR. They are used in a future patch to improve the bytes.IndexByte asm routine. Change-Id: I424e2628e577167b9b7c0fcbd82099daf568ea35 Reviewed-on: https://go-review.googlesource.com/c/go/+/478115 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-03-08 | cmd/internal/obj/ppc64: add SETB instruction | Paul E. Murphy
This ISA 3.0 (power9) instruction is helpful for some string functions in a future change. Change-Id: I1a659488ffb5099f8c89f480c39af4ef9c4b556a Reviewed-on: https://go-review.googlesource.com/c/go/+/472635 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Archana Ravindar <aravind5@in.ibm.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-09-28 | cmd/internal/obj/ppc64: remove unnecessary opcodes | Archana R
This CL removes some opcode placeholders that do not correspond to any existing instructions and hence create confusion. Some instructions that are no longer valid, like LDMX, are also removed. Any references to LDMX in ISA 3.0 should be treated as documentation errata. Change-Id: Ib71a657099723bbe1db88873233ee573b5c42fe7 Reviewed-on: https://go-review.googlesource.com/c/go/+/429860 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Paul Murphy <murp@ibm.com> Run-TryBot: Archana Ravindar <aravind5@in.ibm.com> Reviewed-by: Benny Siegert <bsiegert@gmail.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
2022-09-15 | cmd/internal/obj/ppc64: add ISA 3.1 instructions | Paul E. Murphy
Use ppc64map (from x/arch) to generate ISA 3.1 support for the assembler. A new file asm9_gtables.go is added which contains generated code to encode ISA 3.1 instructions, a function to assist filling out the oprange structure, a lookup table for the fixed bits of each instruction, and a slice of string names. Generated functions are shared if their bitwise encodings match and the translation from an obj.Prog structure matches. The generated file is entirely self-contained, and does not require regenerating any other files for changes within it. If opcodes in a.out.go are reordered or changed, anames.go must be updated in the same way as before. Future improvements could shrink the generated opcode table to 32-bit entries, as there is much less variation in the encoding of the prefix word, but it is not always identical for instructions which share a similar encoding of arguments (e.g. PLWA and PLWZ). Updates #44549 Change-Id: Ie83fa02497c9ad2280678d68391043d3aae63175 Reviewed-on: https://go-review.googlesource.com/c/go/+/419535 Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Jenny Rakoczy <jenny@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Jenny Rakoczy <jenny@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Jenny Rakoczy <jenny@golang.org>
2022-03-11 | cmd/asm: add support for bdnz/bdz extended mnemonics on PPC64 | Paul E. Murphy
Support the BDNZ and BDZ mnemonics; they are commonly used POWER instructions, and the raw BC mnemonic is not easy to read. Likewise, clean up code surrounding these changes. Change-Id: I72f1dad5013f7856bd0dd320bfb17b5a9f3c69ee Reviewed-on: https://go-review.googlesource.com/c/go/+/390696 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Trust: Paul Murphy <murp@ibm.com> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2021-10-12 | cmd/internal/obj/ppc64: support alignment of prefixed insn | Paul E. Murphy
Insert machine NOPs when a prefixed instruction crosses a 64B boundary. ISA 3.1 prohibits prefixed instructions being placed across them. Such instructions generate SIGILL if executed. Likewise, adjust the function alignment to guarantee such instructions can never cross one. And, don't pad the PC based on alignment. The linker can fit these more optimally. Likewise, include the function alignment when printing function debug information. This is needed to verify function alignment happens. Updates #44549 Change-Id: I434fb0ee4e984ca00dc4566f7569c3bcdf93f910 Reviewed-on: https://go-review.googlesource.com/c/go/+/347050 Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
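A minimal sketch of the padding decision described above, assuming 4-byte-aligned PCs and the 8-byte size (prefix word plus suffix word) of ISA 3.1 prefixed instructions. padBeforePrefixed and its constants are hypothetical names, not what asm9.go actually uses.

```go
package main

// Hypothetical constants for illustration.
const (
	wordSize     = 4  // every ppc64 instruction word is 4 bytes
	prefixedSize = 8  // prefix word + suffix word
	boundary     = 64 // prefixed instructions may not cross this
)

// padBeforePrefixed returns how many NOP bytes (0 or 4) must precede a
// prefixed instruction that would otherwise start at pc. With 4-byte
// aligned PCs, only a start in the last word of a 64-byte block
// (pc%64 == 60) would straddle the boundary.
func padBeforePrefixed(pc int64) int64 {
	if pc%boundary+prefixedSize > boundary {
		return wordSize
	}
	return 0
}
```

The check is relative to 64-byte boundaries in the final image, so it only holds if the function itself is placed on a suitably aligned address; that is what the accompanying function-alignment change guarantees.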
2020-09-28 | cmd/asm,cmd/compile,cmd/internal/obj/ppc64: add extswsli support on power9 | Lynn Boger
This adds support for the extswsli instruction which combines extsw followed by a shift. New benchmark demonstrates the improvement:
name      old time/op  new time/op  delta
ExtShift  1.34µs ± 0%  1.30µs ± 0%  -3.15%  (p=0.057 n=4+3)
Change-Id: I21b410676fdf15d20e0cbbaa75d7c6dcd3bbb7b0 Reviewed-on: https://go-review.googlesource.com/c/go/+/257017 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com> Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-09-17 | cmd/compile: use combined shifts to improve array addressing on ppc64x | Lynn Boger
This change adds rules to find pairs of instructions that can be combined into a single shift. These instruction sequences are common in array addressing within loops. Improvements can be seen in many crypto packages and the hash packages. These are based on the extended mnemonics found in the ISA sections C.8.1 and C.8.2. Some rules in PPC64.rules were moved because the ordering prevented some matching. The following results were generated on power9.
hash/crc32:
CRC32/poly=Koopman/size=40/align=0     195ns ± 0%   163ns ± 0%  -16.41%
CRC32/poly=Koopman/size=40/align=1     200ns ± 0%   163ns ± 0%  -18.50%
CRC32/poly=Koopman/size=512/align=0   1.98µs ± 0%  1.67µs ± 0%  -15.46%
CRC32/poly=Koopman/size=512/align=1   1.98µs ± 0%  1.69µs ± 0%  -14.80%
CRC32/poly=Koopman/size=1kB/align=0   3.90µs ± 0%  3.31µs ± 0%  -15.27%
CRC32/poly=Koopman/size=1kB/align=1   3.85µs ± 0%  3.31µs ± 0%  -14.15%
CRC32/poly=Koopman/size=4kB/align=0   15.3µs ± 0%  13.1µs ± 0%  -14.22%
CRC32/poly=Koopman/size=4kB/align=1   15.4µs ± 0%  13.1µs ± 0%  -14.79%
CRC32/poly=Koopman/size=32kB/align=0   137µs ± 0%   105µs ± 0%  -23.56%
CRC32/poly=Koopman/size=32kB/align=1   137µs ± 0%   105µs ± 0%  -23.53%
crypto/rc4:
RC4_128        733ns ± 0%   650ns ± 0%  -11.32%  (p=1.000 n=1+1)
RC4_1K        5.80µs ± 0%  5.17µs ± 0%  -10.89%  (p=1.000 n=1+1)
RC4_8K        45.7µs ± 0%  40.8µs ± 0%  -10.73%  (p=1.000 n=1+1)
crypto/sha1:
Hash8Bytes     635ns ± 0%   613ns ± 0%   -3.46%  (p=1.000 n=1+1)
Hash320Bytes  2.30µs ± 0%  2.18µs ± 0%   -5.38%  (p=1.000 n=1+1)
Hash1K        5.88µs ± 0%  5.38µs ± 0%   -8.62%  (p=1.000 n=1+1)
Hash8K        42.0µs ± 0%  37.9µs ± 0%   -9.75%  (p=1.000 n=1+1)
There are other improvements found in golang.org/x/crypto which are all in the range of 5-15%. Change-Id: I193471fbcf674151ffe2edab212799d9b08dfb8c Reviewed-on: https://go-review.googlesource.com/c/go/+/252097 Trust: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2020-04-29 | cmd/compile,cmd/internal/obj/ppc64: use mod instructions on power9 | Lynn Boger
This updates the PPC64.rules file to use the MOD instructions that are available in power9. Prior to power9 this is done using a longer sequence with multiply and divide. Included in this change is removal of the REM* opcode variations that set the CC or OV bits since their settings are based on the DIV and are not appropriate for the REM. Change-Id: Iceed9ce33e128e1911c15592ee674276ce8ba3fa Reviewed-on: https://go-review.googlesource.com/c/go/+/229761 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
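For reference, the longer pre-power9 sequence derives the remainder from the quotient: divide, multiply back, subtract. A hedged plain-Go sketch of that identity follows (remViaDiv is a hypothetical name); it agrees with Go's truncated `%`, which is the result a single MOD instruction now produces directly.

```go
package main

// remViaDiv computes a % b the pre-power9 way: quotient first, then
// a - q*b. Go's integer division truncates toward zero, so this matches
// Go's % operator for every b != 0, including MinInt64 % -1 == 0
// (the spec defines MinInt64 / -1 == MinInt64, and the multiply wraps).
func remViaDiv(a, b int64) int64 {
	q := a / b
	return a - q*b
}
```

The three-instruction form also explains the CC/OV note above: any flags set would describe the divide, not the remainder.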
2020-04-23 | cmd/asm,cmd/internal/obj/ppc64: update instructions and tests | Lynn Boger
This change adds some instructions that were missing from the ppc64 assembler, mostly power9 but a few others from earlier. Tests in cmd/asm for ppc64 were updated: ppc64.s includes the new instructions, and ppc64enc.s now includes not only the new instructions but most ppc64 opcodes to provide a more complete test of the ppc64 assembler. The ppc64 instruction set is used for linux/ppc64le, linux/ppc64, and aix/ppc64. Change-Id: I8695f89dbca06174847963f4ef869f2e584d5bbf Reviewed-on: https://go-review.googlesource.com/c/go/+/229479 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-13 | cmd/internal/obj/ppc64: clean up some opcodes | Lynn Boger
This does some clean up of the ppc64 opcodes to remove names from the opcode list that don't actually assemble. At one time names were added to this list to represent opcode "classes" to organize other opcodes that have the same set of operand combinations. Since this is not documented, it is confusing as to which opcodes can be used in an asm file and which can't, and which opcodes should be supported in the disassembler. It is clearer for the user if the list of Go opcodes are all opcodes that can be assembled with names that match the ppc64 opcode where possible. I found this when trying to use Go opcode XXLAND in an asm file which seems like it should map to ppc64 xxland but when used it gets this error:
  go tool asm test_xxland.s
  asm: bad r/r, r/r/r or r/r/r/r opcode XXLAND
  asm: assembly failed
This change removes the opcodes that are only used for opcode "classes" and fixes the case statement where they are referenced. This also fixes XXLAND and XXPERM which are opcodes that should assemble to their corresponding ppc64 opcode but do not. Change-Id: I52300db6b22f7f8b3dd3491c3f35a384b943352c Reviewed-on: https://go-review.googlesource.com/c/go/+/223138 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-11-04 | cmd/internal/obj/ppc64: add support for DQ-form instructions | Carlos Eduardo Seo
POWER9 (ISA 3.0) introduced a new format of load/store instructions to implement indexed load/store quadword, using an immediate value instead of a register index. This change adds support for this new instruction encoding and adds the new load/store quadword instructions (lxv/stxv) to the assembler. This change also adds the missing XX1-form loads/stores (halfword and byte) included in ISA 3.0. Change-Id: Ibcdf53c342d7a352d64a9403c2fe7b25be9c3b24 Reviewed-on: https://go-review.googlesource.com/c/go/+/200399 Run-TryBot: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
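In the DQ form the displacement is a 12-bit signed field that is implicitly shifted left by 4 bits, so only 16-byte-aligned offsets in a limited range can be encoded directly. A minimal sketch of that check; dqEncodable and dqField are hypothetical helpers, not the assembler's own functions.

```go
package main

// dqEncodable reports whether a byte offset fits a DQ-form displacement:
// a multiple of 16 whose value/16 fits in 12 signed bits, i.e. in
// [-32768, 32752].
func dqEncodable(offset int64) bool {
	return offset%16 == 0 && offset >= -32768 && offset <= 32752
}

// dqField returns the raw 12-bit DQ field for an encodable offset
// (two's complement, so -16 encodes as 0xFFF).
func dqField(offset int64) uint32 {
	return uint32(offset>>4) & 0xFFF
}
```

An offset that fails the check has to be handled some other way (for example via the indexed X-form loads/stores that predate ISA 3.0).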
2019-08-29 | cmd/internal/obj/ppc64: add support for vmrgow,vmrgew | Lynn Boger
This adds support for ppc64 instructions vmrgow and vmrgew which are needed for an improved implementation of chacha20. Change-Id: I967a2de54236bcc573a99f7e2b222d5a8bb29e03 Reviewed-on: https://go-review.googlesource.com/c/go/+/192117 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2019-03-20 | cmd/compile/internal, cmd/internal/obj/ppc64: generate new count trailing zeros instructions on POWER9 | Carlos Eduardo Seo
This change adds new POWER9 instructions for counting trailing zeros (CNTTZW/CNTTZD) to the assembler and generates them in SSA when GOPPC64=power9.
name                 old time/op  new time/op  delta
TrailingZeros-160    1.59ns ±20%  1.45ns ±10%  -8.81%  (p=0.000 n=14+13)
TrailingZeros8-160   1.55ns ±23%  1.62ns ±44%     ~    (p=0.593 n=13+15)
TrailingZeros16-160  1.78ns ±23%  1.62ns ±38%  -9.31%  (p=0.003 n=14+14)
TrailingZeros32-160  1.64ns ±10%  1.49ns ± 9%  -9.15%  (p=0.000 n=13+14)
TrailingZeros64-160  1.53ns ± 6%  1.45ns ± 5%  -5.38%  (p=0.000 n=15+13)
Change-Id: I365e6ff79f3ce4d8ebe089a6a86b1771853eb596 Reviewed-on: https://go-review.googlesource.com/c/go/+/167517 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-11-28 | cmd/asm,cmd/internal/obj/ppc64: add VPERMXOR to ppc64 assembler | Lynn Boger
VPERMXOR is missing from the Go assembler for ppc64. It has the same format as VPERM. It was requested by an external user so they could write an optimized algorithm in asm. Change-Id: Icf4c682f7f46716ccae64e6ae3d62e8cec67f6c1 Reviewed-on: https://go-review.googlesource.com/c/151578 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2018-07-03 | cmd/internal/obj: follow convention for generated code comment | Tobias Klauser
Follow the convention (https://golang.org/s/generatedcode) for generated code in stringer.go. Change-Id: I7b5fbb04ba03e8ac77a9a0a402088669469de858 Reviewed-on: https://go-review.googlesource.com/122015 Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
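The convention asks for a line matching `^// Code generated .* DO NOT EDIT\.$` before the first non-comment, non-blank text of the file. A minimal sketch of a checker for that rule; isGenerated is a hypothetical helper, and for brevity it only understands `//` line comments, not `/* */` blocks.

```go
package main

import (
	"bufio"
	"regexp"
	"strings"
)

// generatedRE is the marker pattern from the generated-code convention.
var generatedRE = regexp.MustCompile(`^// Code generated .* DO NOT EDIT\.$`)

// isGenerated reports whether src carries the marker before its first
// non-blank, non-comment line (simplified: block comments are ignored).
func isGenerated(src string) bool {
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := sc.Text()
		if generatedRE.MatchString(line) {
			return true
		}
		trimmed := strings.TrimSpace(line)
		if trimmed != "" && !strings.HasPrefix(trimmed, "//") {
			// Real code started before any marker was seen.
			return false
		}
	}
	return false
}
```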
2018-04-26 | cmd/compile, cmd/internal/obj/ppc64: make math.Round an intrinsic on ppc64x | Carlos Eduardo Seo
This change implements math.Round as an intrinsic on ppc64x so it can be done using a single instruction.
benchmark             old ns/op     new ns/op     delta
BenchmarkRound-16     2.60          0.69          -73.46%
Change-Id: I9408363e96201abdfc73ced7bcd5f0c29db006a8 Reviewed-on: https://go-review.googlesource.com/109395 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-04-18 | cmd/internal/obj/ppc64: add vector multiply instructions | Carlos Eduardo Seo
This change adds vector multiply instructions to the assembler for ppc64x. Change-Id: I5143a2dc3736951344d43999066d38ab8be4a721 Reviewed-on: https://go-review.googlesource.com/107795 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-13 | cmd/internal/obj/ppc64: implement full operand support for l*arx instructions | Carlos Eduardo Seo
The current implementation of l*arx instructions does not accept non-zero offsets in RA nor the EH field. This change adds full functionality to those instructions. Updates #23845 Change-Id: If113f70d11de5f35f8389520b049390dbc40e863 Reviewed-on: https://go-review.googlesource.com/99635 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-02-15 | cmd/asm, cmd/internal/obj/ppc64: add Immediate Shifted opcodes for ppc64x | Carlos Eduardo Seo
This change adds ADD/AND/OR/XOR Immediate Shifted instructions for ppc64x so they are usable in Go asm code. These instructions were originally present in asm9.go, but they were only usable in that file (as -AADD, -AANDCC, -AOR, -AXOR). These old mnemonics are now removed. Updates #23845 Change-Id: Ifa2fac685e8bc628cb241dd446adfc3068181826 Reviewed-on: https://go-review.googlesource.com/94115 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
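A common use of the immediate-shifted forms is materializing a 32-bit constant as ADDIS (adds imm<<16) followed by ADDI. Because ADDI sign-extends its 16-bit immediate, the high half must be bumped by one whenever bit 15 of the constant is set. A minimal sketch modeling 64-bit registers; splitHiLo and rebuild are hypothetical helpers, not assembler code.

```go
package main

// splitHiLo splits v into immediates for ADDIS rT, 0, hi; ADDI rT, rT, lo.
// ok is false when v cannot be reached by that pair in a 64-bit register.
func splitHiLo(v int64) (hi, lo int16, ok bool) {
	lo = int16(v)              // low 16 bits, reinterpreted as signed (as ADDI does)
	h := (v - int64(lo)) >> 16 // exact: v-lo has zero low 16 bits
	if h < -1<<15 || h > 1<<15-1 {
		return 0, 0, false
	}
	return int16(h), lo, true
}

// rebuild models the register value after the ADDIS/ADDI pair:
// EXTS(hi)<<16 plus EXTS(lo).
func rebuild(hi, lo int16) int64 {
	return int64(hi)<<16 + int64(lo)
}
```

0x7FFFFFFF, for example, is out of reach of ADDIS+ADDI in a 64-bit register; ADDIS 0x7FFF followed by the zero-extending ORI 0xFFFF produces it without any adjustment.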
2017-11-06 | runtime: improve IndexByte for ppc64x | Carlos Eduardo Seo
This change adds a better implementation of IndexByte in asm that uses the vector registers/instructions on ppc64x.
benchmark                    old ns/op     new ns/op     delta
BenchmarkIndexByte/10-8      9.70          9.37          -3.40%
BenchmarkIndexByte/32-8      10.9          10.9          +0.00%
BenchmarkIndexByte/4K-8      254           92.8          -63.46%
BenchmarkIndexByte/4M-8      249246        118435        -52.48%
BenchmarkIndexByte/64M-8     10737987      7383096       -31.24%
benchmark                    old MB/s      new MB/s      speedup
BenchmarkIndexByte/10-8      1030.63       1067.24       1.04x
BenchmarkIndexByte/32-8      2922.69       2928.53       1.00x
BenchmarkIndexByte/4K-8      16065.95      44156.45      2.75x
BenchmarkIndexByte/4M-8      16827.96      35414.21      2.10x
BenchmarkIndexByte/64M-8     6249.67       9089.53       1.45x
Change-Id: I81dbdd620f7bb4e395ce4d1f2a14e8e91e39f9a1 Reviewed-on: https://go-review.googlesource.com/71710 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2017-10-30 | cmd/compile,cmd/internal/obj/ppc64: make math.Abs,math.Copysign intrinsics on ppc64x | Lynn Boger
This adds support for math.Abs and math.Copysign as intrinsics on ppc64x. New instruction FCPSGN is added to generate fcpsgn. Some new rules are added to improve the int<->float conversions that are generated mainly due to the Float64bits and Float64frombits in the math package. PPC64.rules is also modified as suggested in the review for CL 63290. Improvements:
benchmark                   old ns/op     new ns/op     delta
BenchmarkAbs-16             1.12          0.69          -38.39%
BenchmarkCopysign-16        1.30          0.93          -28.46%
BenchmarkNextafter32-16     9.34          8.05          -13.81%
BenchmarkFrexp-16           8.81          7.60          -13.73%
Others that used Copysign also saw smaller improvements. I attempted to make this work using rules since that seems to be preferred, but due to the use of Float64bits and Float64frombits in these functions, several rules had to be added and even then not all cases were matched. Using rules became too complicated and seemed too fragile for these. Updates #21390 Change-Id: Ia265da9a18355e08000818a4fba1a40e9e031995 Reviewed-on: https://go-review.googlesource.com/67130 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Keith Randall <khr@golang.org>
2017-09-14 | cmd/compile,math: improve int<->float conversions on ppc64x | Lynn Boger
The functions Float64bits and Float64frombits perform poorly on ppc64x because the int<->float conversions often result in load and store sequences to handle the type change. This patch adds more rules to recognize those sequences and use register-to-register moves, avoiding unnecessary loads and stores where possible. There were some existing rules to improve these conversions, but this provides additional improvements. Included here:
- New instruction FCFIDS to improve on conversion to 32 bit
- Rename Xf2i64 and Xi2f64 as MTVSRD, MFVSRD, to match the asm
- Add rules to lower some of the load/store sequences for conversions
- Added new go asm to ppc64.s testcase.
Improvements:
BenchmarkAbs-16           2.16     0.93     -56.94%
BenchmarkCopysign-16      2.66     1.18     -55.64%
BenchmarkRound-16         4.82     2.69     -44.19%
BenchmarkSignbit-16       1.71     1.14     -33.33%
BenchmarkFrexp-16         11.4     7.94     -30.35%
BenchmarkLogb-16          10.4     7.34     -29.42%
BenchmarkLdexp-16         15.7     11.2     -28.66%
BenchmarkIlogb-16         10.2     7.32     -28.24%
BenchmarkPowInt-16        69.6     55.9     -19.68%
BenchmarkModf-16          10.1     8.19     -18.91%
BenchmarkLog2-16          17.4     14.3     -17.82%
BenchmarkCbrt-16          45.0     37.3     -17.11%
BenchmarkAtanh-16         57.6     48.3     -16.15%
BenchmarkRemainder-16     76.6     65.4     -14.62%
BenchmarkGamma-16         26.0     22.5     -13.46%
BenchmarkPowFrac-16       197      174      -11.68%
BenchmarkMod-16           112      99.8     -10.89%
BenchmarkAsinh-16         59.9     53.7     -10.35%
BenchmarkAcosh-16         44.8     40.3     -10.04%
Updates #21390 Change-Id: I56cc991fc2e55249d69518d4e1ba76cc23904e35 Reviewed-on: https://go-review.googlesource.com/63290 Reviewed-by: Michael Munday <mike.munday@ibm.com>
2017-08-29 | cmd/asm, cmd/internal/obj/ppc64: add ISA 3.0 instructions | Carlos Eduardo Seo
This change adds new ppc64 instructions from the POWER9 ISA. This includes compares, loads, maths, register moves and the new random number generator and copy/paste facilities. Change-Id: Ife3720b90f5af184ff115bbcdcbce5c1302d39b6 Reviewed-on: https://go-review.googlesource.com/53930 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2017-04-20 | cmd/compile: add rotates to PPC64.rules | Lynn Boger
This updates PPC64.rules to include rules to generate rotates for ADD, OR, XOR operators that combine two opposite shifts that sum to 32 or 64. To support this change opcodes for ROTL and ROTLW were added to be used like the rotldi and rotlwi extended mnemonics. This provides the following improvement in sha3:
BenchmarkPermutationFunction-8    302.83    376.40    1.24x
BenchmarkSha3_512_MTU-8           98.64     121.92    1.24x
BenchmarkSha3_384_MTU-8           136.80    168.30    1.23x
BenchmarkSha3_256_MTU-8           169.21    211.29    1.25x
BenchmarkSha3_224_MTU-8           179.76    221.19    1.23x
BenchmarkShake128_MTU-8           212.87    263.23    1.24x
BenchmarkShake256_MTU-8           196.62    245.60    1.25x
BenchmarkShake256_16x-8           163.57    194.37    1.19x
BenchmarkShake256_1MiB-8          199.02    248.74    1.25x
BenchmarkSha3_512_1MiB-8          106.55    133.13    1.25x
Fixes #20030 Change-Id: I484c56f48395d32f53ff3ecb3ac6cb8191cfee44 Reviewed-on: https://go-review.googlesource.com/40992 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
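The three source shapes the new rules match all compute the same rotate, because the two shifted ranges never overlap, so OR, ADD and XOR combine them identically. A plain-Go sketch of the 64-bit case; rotlOr, rotlAdd and rotlXor are hypothetical names.

```go
package main

// Each function rotates x left by k (mod 64) using two opposite shifts
// whose counts sum to 64. In Go, x>>64 on a uint64 is 0, so k == 0
// yields x unchanged in all three forms.

func rotlOr(x uint64, k uint) uint64 {
	k &= 63
	return x<<k | x>>(64-k)
}

func rotlAdd(x uint64, k uint) uint64 {
	k &= 63
	return x<<k + x>>(64-k) // no carries: the shifted bit ranges are disjoint
}

func rotlXor(x uint64, k uint) uint64 {
	k &= 63
	return x<<k ^ x>>(64-k)
}
```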
2017-02-09 | cmd/asm, cmd/internal/obj/ppc64: Add ISA 2.05, 2.06 and 2.07 instructions. | Carlos Eduardo Seo
This change adds instructions from ISA 2.05, 2.06 and 2.07 that are frequently used in assembly optimizations for ppc64. It also fixes two problems:
* the implementation of RLDICR[CC]/RLDICL[CC] did not consider all possible cases for the bit mask.
* removed two nonexistent instructions that were added by mistake in the VMX implementation (VORL/VANDL).
Change-Id: Iaef4e5c6a5240c2156c6c0f28ad3bcd8780e9830 Reviewed-on: https://go-review.googlesource.com/36230 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2016-10-28 | cmd/asm, cmd/internal/obj/ppc64: Add vector scalar (VSX) registers and instructions | Carlos Eduardo Seo
The current implementation for Power architecture does not include the vector scalar (VSX) registers. This adds the 64 VSX registers and the most commonly used instructions: load/store VSX vector/scalar, move to/from VSR, logical operations, select, merge, splat, permute, shift, FP-FP conversion, FP-integer conversion and integer-FP conversion. Change-Id: I0f7572d2359fe7f3ea0124a1eb1b0bebab33649e Reviewed-on: https://go-review.googlesource.com/30510 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-25 | cmd/internal: add shift opcodes with shift operands on ppc64x | Lynn Boger
Some original shift opcodes for ppc64x expected an operand to be a mask instead of a shift count, preventing some valid shift counts from being written. This adds new opcodes for shifts where needed, using mnemonics that match the ppc64 asm and allowing the assembler to accept the full set of valid shift counts. Fixes #15016 Change-Id: Id573489f852038d06def279c13fd0523736878a7 Reviewed-on: https://go-review.googlesource.com/31853 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: David Chase <drchase@google.com>
2016-10-17 | bytes: improve performance for bytes.Compare on ppc64x | Lynn Boger
This improves the performance of bytes.Compare by rewriting the cmpbody function in runtime/asm_ppc64x.s. The previous code had a simple loop which loaded a pair of bytes and compared them, which is inefficient for long buffers. The updated function checks for 8 or 32 byte chunks and then loads and compares double words where possible. Because the bytes.Compare result indicates greater or less than, the doubleword loads must take endianness into account, using a byte reversed load in the little endian case. Fixes #17433
benchmark                        old ns/op     new ns/op     delta
BenchmarkBytesCompare/8-16       13.6          7.16          -47.35%
BenchmarkBytesCompare/16-16      25.7          7.83          -69.53%
BenchmarkBytesCompare/32-16      38.1          7.78          -79.58%
BenchmarkBytesCompare/64-16      63.0          10.6          -83.17%
BenchmarkBytesCompare/128-16     112           13.0          -88.39%
BenchmarkBytesCompare/256-16     211           28.1          -86.68%
BenchmarkBytesCompare/512-16     410           38.6          -90.59%
BenchmarkBytesCompare/1024-16    807           60.2          -92.54%
BenchmarkBytesCompare/2048-16    1601          103           -93.57%
Change-Id: I121acc74fcd27c430797647b8d682eb0607c63eb Reviewed-on: https://go-review.googlesource.com/30949 Reviewed-by: David Chase <drchase@google.com>
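The endianness point can be seen in plain Go: comparing 8-byte chunks as unsigned integers gives the same order as comparing bytes only if each chunk is read big-endian, which is what a byte-reversed load achieves on little-endian ppc64le. A hedged sketch of the chunked idea (compareChunked is a hypothetical function; the real code is assembly and also handles 32-byte chunks).

```go
package main

import "encoding/binary"

// compareChunked returns -1, 0 or +1 like bytes.Compare, comparing
// 8 bytes at a time while both slices have a full chunk left.
func compareChunked(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	i := 0
	for ; i+8 <= n; i += 8 {
		// Big-endian loads make integer order equal lexicographic byte order.
		x := binary.BigEndian.Uint64(a[i:])
		y := binary.BigEndian.Uint64(b[i:])
		if x != y {
			if x < y {
				return -1
			}
			return 1
		}
	}
	for ; i < n; i++ { // leftover tail, byte by byte
		if a[i] != b[i] {
			if a[i] < b[i] {
				return -1
			}
			return 1
		}
	}
	switch {
	case len(a) < len(b):
		return -1
	case len(a) > len(b):
		return 1
	}
	return 0
}
```

With a little-endian load, {1,0,0,0,0,0,0,2} would read as larger than {2,0,0,0,0,0,0,1}, the opposite of their byte order.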
2016-09-23 | math, cmd/internal/obj/ppc64: improve floor, ceil, trunc with asm | Lynn Boger
This adds the instructions frim, frip, and friz to the ppc64x assembler for use in implementing the math.Floor, math.Ceil, and math.Trunc functions to improve performance. Fixes #17185
BenchmarkCeil-128      21.4     6.99     -67.34%
BenchmarkFloor-128     13.9     6.37     -54.17%
BenchmarkTrunc-128     12.7     6.33     -50.16%
Change-Id: I96131bd4e8c9c8dbafb25bfeb544cf9d2dbb4282 Reviewed-on: https://go-review.googlesource.com/29654 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Michael Munday <munday@ca.ibm.com>
2016-09-19 | cmd/asm, cmd/internal/obj/ppc64: add ppc64 vector registers and instructions | Carlos Eduardo Seo
The current implementation for Power architecture does not include the vector (Altivec) registers. This adds the 32 VMX registers and the most commonly used instructions: X-form loads/stores; VX-form logical operations, add/sub, rotate/shift, count, splat, SHA Sigma and AES cipher; VC-form compare; and VA-form permute, shift, add/sub and select. Fixes #15619 Change-Id: I544b990631726e8fdfcce8ecca0aeeb72faae9aa Reviewed-on: https://go-review.googlesource.com/25600 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: David Chase <drchase@google.com>
2016-09-13 | cmd/asm: ppc64le support for ISEL for use by SSA | Lynn Boger
This adds support for the ppc64le isel instruction so it can be used by SSA. Fixes #16771 Change-Id: Ia2517f0834ff5e7ad927e218b84493e0106ab4a7 Reviewed-on: https://go-review.googlesource.com/28611 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-08-29 | math/big: add assembly implementation of arith for ppc64{le} | Ethan Miller
The existing implementation was pure Go, leading to slow cryptographic performance. Implemented mulWW, subVV, mulAddVWW, addMulVVW, and bitLen for ppc64{le}. Implemented divWW for ppc64le only, as the DIVDEU instruction is only available on Power8 or newer. benchcmp output:
benchmark                        old ns/op     new ns/op     delta
BenchmarkSignP384                28934360      10877330      -62.41%
BenchmarkRSA2048Decrypt          41261033      5139930       -87.54%
BenchmarkRSA2048Sign             45231300      7610985       -83.17%
Benchmark3PrimeRSA2048Decrypt    20487300      2481408       -87.89%
Fixes #16621 Change-Id: If8b68963bb49909bde832f2bda08a3791c4f5b7a Reviewed-on: https://go-review.googlesource.com/26951 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Michael Munday <munday@ca.ibm.com>
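The routines named above are word-at-a-time loops over multi-precision numbers whose core is a 64x64->128-bit multiply with carry propagation. A pure-Go sketch in the spirit of mulAddVWW, using math/bits; it is a hypothetical illustration on uint64 words, not math/big's actual code or signature.

```go
package main

import "math/bits"

// mulAddVWW sets z = x*y + r, where x and z are little-endian word
// slices (len(x) >= len(z)), and returns the carry-out word.
func mulAddVWW(z, x []uint64, y, r uint64) (c uint64) {
	c = r
	for i := range z {
		hi, lo := bits.Mul64(x[i], y) // full 128-bit product
		var cc uint64
		lo, cc = bits.Add64(lo, c, 0) // add incoming carry word
		z[i] = lo
		c = hi + cc // cannot overflow: x*y + c < 2^128
	}
	return c
}
```

Each iteration is a handful of machine instructions (mulld, mulhdu, addc, adde on ppc64), which is the kind of loop the assembly versions unroll.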
2016-08-15 | [dev.ssa] cmd/compile: PPC64, FP to/from int conversions. | David Chase
Passes ssa_test. Requires a few new instructions and some scratchpad memory to move data between G and F registers. Also fixed comparisons to be correct in case of NaN. Added missing instructions for run.bash. Removed some FP registers that are apparently "reserved" (but that also appear to be unused except for a gratuitous multiplication by two when y = x+x would work just as well). Currently failing stack splits. Updates #16010. Change-Id: I73b161bfff54445d72bd7b813b1479f89fc72602 Reviewed-on: https://go-review.googlesource.com/26813 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-05-05 | sync/atomic, runtime/internal/atomic: improve ppc64x atomics | Lynn Boger
The following performance improvements have been made to the low-level atomic functions for ppc64le & ppc64:
- For those cases containing a lwarx and stwcx (or other sizes): sync, lwarx, maybe something, stwcx, loop to sync, sync, isync
  The sync is moved before (outside) the lwarx/stwcx loop, and the sync after is removed, so it becomes: sync, lwarx, maybe something, stwcx, loop to lwarx, isync
- For the Or8 and And8, the shifting and manipulation of the address to the word aligned version were removed and the instructions were changed to use lbarx, stbcx instead of register shifting, xor, then lwarx, stwcx.
- New instructions LWSYNC, LBAR, STBCC were tested and added. runtime/atomic_ppc64x.s was changed to use the LWSYNC opcode instead of the WORD encoding.
Fixes #15469
Ran some of the benchmarks in the runtime and sync directories. Some results varied from run to run but the trend was improvement based on best times for base and new:
runtime.test:
BenchmarkChanNonblocking-128              0.88      0.89      +1.14%
BenchmarkChanUncontended-128              569       511       -10.19%
BenchmarkChanContended-128                63110     53231     -15.65%
BenchmarkChanSync-128                     691       598       -13.46%
BenchmarkChanSyncWork-128                 11355     11649     +2.59%
BenchmarkChanProdCons0-128                2402      2090      -12.99%
BenchmarkChanProdCons10-128               1348      1363      +1.11%
BenchmarkChanProdCons100-128              1002      746       -25.55%
BenchmarkChanProdConsWork0-128            2554      2720      +6.50%
BenchmarkChanProdConsWork10-128           1909      1804      -5.50%
BenchmarkChanProdConsWork100-128          1624      1580      -2.71%
BenchmarkChanCreation-128                 237       212       -10.55%
BenchmarkChanSem-128                      705       667       -5.39%
BenchmarkChanPopular-128                  5081190   4497566   -11.49%
BenchmarkCreateGoroutines-128             532       473       -11.09%
BenchmarkCreateGoroutinesParallel-128     35.0      34.7      -0.86%
BenchmarkCreateGoroutinesCapture-128      4923      4200      -14.69%
sync.test:
BenchmarkUncontendedSemaphore-128         112       94.2      -15.89%
BenchmarkContendedSemaphore-128           133       128       -3.76%
BenchmarkMutexUncontended-128             1.90      1.67      -12.11%
BenchmarkMutex-128                        353       310       -12.18%
BenchmarkMutexSlack-128                   304       283       -6.91%
BenchmarkMutexWork-128                    554       541       -2.35%
BenchmarkMutexWorkSlack-128               567       556       -1.94%
BenchmarkMutexNoSpin-128                  275       242       -12.00%
BenchmarkMutexSpin-128                    1129      1030      -8.77%
BenchmarkOnce-128                         1.08      0.96      -11.11%
BenchmarkPool-128                         29.8      27.4      -8.05%
BenchmarkPoolOverflow-128                 40564     36583     -9.81%
BenchmarkSemaUncontended-128              3.14      2.63      -16.24%
BenchmarkSemaSyntNonblock-128             1087      1069      -1.66%
BenchmarkSemaSyntBlock-128                897       893       -0.45%
BenchmarkSemaWorkNonblock-128             1034      1028      -0.58%
BenchmarkSemaWorkBlock-128                949       886       -6.64%
Change-Id: I4403fb29d3cd5254b7b1ce87a216bd11b391079e Reviewed-on: https://go-review.googlesource.com/22549 Reviewed-by: Michael Munday <munday@ca.ibm.com> Reviewed-by: Minux Ma <minux@golang.org>
2016-05-04 | cmd/compile: fix uint64 to float casts on ppc64 | Michael Munday
Adds the FCFIDU instruction and uses it instead of the FCFID instruction for unsigned integer to float casts. This change means that unsigned integers do not have to be cast to signed integers before being cast to a floating point value. Therefore it is no longer necessary to insert instructions to detect and fix values that overflow int64. The previous code generating the uint64 to int64 cast handled overflow by truncating the uint64 value. This truncation can change the result of the rounding performed by the integer to float cast. The FCFIDU instruction was added in Power ISA 2.06B. Fixes #15539. Change-Id: Ia37a9631293eff91032d4cd9a9bec759d2142437 Reviewed-on: https://go-review.googlesource.com/22772 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
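Why truncation hurts: when only a signed int64->float64 conversion (fcfid) is available, large uint64 values must be squeezed into int64 range first, and dropping low bits along the way can flip a round-to-nearest decision. A standard software fallback, shown here only as an illustration and not as the code this commit replaced, halves large inputs but ORs the dropped bit back in as a "sticky" bit so the final rounding stays correct. u64ToF64 is a hypothetical name.

```go
package main

// u64ToF64 converts x to float64 using only signed int64 -> float64
// conversions. Inputs >= 2^63 are halved; OR-ing in the lost low bit
// keeps it visible to rounding, and doubling the result is exact.
func u64ToF64(x uint64) float64 {
	if x < 1<<63 {
		return float64(int64(x))
	}
	half := x>>1 | x&1 // sticky bit preserves round-to-nearest-even
	f := float64(int64(half))
	return f + f
}
```

For x = 2^63 + 2^10 + 1, plain truncation (x>>1) lands exactly halfway between two float64 values and rounds down to 2^63, whereas the sticky bit makes it round up to the correct 2^63 + 2^11.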
2015-07-13 | cmd/internal/obj: rename *.out.go to a.out.go | Rob Pike
The old numerical names like 6.out.go are a relic from the old tools. Easier to rename than explain. The anames.go files were modified by go generate; no changes beyond the explanatory comment at the top. Change-Id: I84742c75c60e47724baa9d49a91fef1f8581f021 Reviewed-on: https://go-review.googlesource.com/12069 Run-TryBot: Rob Pike <r@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>
2015-03-11 | cmd/internal/obj/*: go generate the slice of Anames strings | Rob Pike
Add cmd/internal/obj/stringer.go to do the generation and update the architecture packages to use it to maintain the Anames tables. Change-Id: I9c6d4def1bf21624668396d70c17973d0db11fbc Reviewed-on: https://go-review.googlesource.com/7430 Reviewed-by: Russ Cox <rsc@golang.org>
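The generator's exact parsing rules are not described in this log. As a hedged illustration of the idea only, the sketch below uses go/parser to collect the A-prefixed opcode constants from a.out.go-style source, in declaration order, with the prefix stripped. anames is a hypothetical function, and a real Anames table may need details (such as how architecture-specific opcodes are indexed) that this ignores.

```go
package main

import (
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// anames returns, in declaration order, the names of all constants in
// src whose identifiers start with "A", with that prefix removed.
func anames(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "a.out.go", src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range f.Decls {
		gd, ok := decl.(*ast.GenDecl)
		if !ok || gd.Tok != token.CONST {
			continue
		}
		for _, spec := range gd.Specs {
			// Every spec in a const declaration is a *ast.ValueSpec.
			for _, id := range spec.(*ast.ValueSpec).Names {
				if strings.HasPrefix(id.Name, "A") {
					names = append(names, strings.TrimPrefix(id.Name, "A"))
				}
			}
		}
	}
	return names, nil
}
```

Because the table is derived from declaration order, reordering opcodes in a.out.go requires regenerating anames.go, as the ISA 3.1 entry above notes.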