| Age | Commit message (Collapse) | Author |
|
Found by github.com/mdempsky/unconvert
Change-Id: I88ce10390a49ba768a4deaa0df9057c93c1164de
GitHub-Last-Rev: 3b0f7e8f74f58340637f33287c238765856b2483
GitHub-Pull-Request: golang/go#75974
Reviewed-on: https://go-review.googlesource.com/c/go/+/712940
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
1. Support for decimal arithmetic quad instructions of powerpc: DADDQ, DSUBQ, DMULQ
and DDIVQ.
2. Support for decimal compare ordered, unordered, quad instructions of powerpc:
DCMPU, DCMPO, DCMPUQ, and DCMPOQ.
Change-Id: I32a15a7f0a127b022b1f43d376e0ab0f7e9dd108
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/623036
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Paul Murphy <murp@ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Add cmd/internal/obj/mkcnames.go to do the generation and update
the architecture packages to use it to maintain the Cnames tables.
Currently works correctly on arm64,loong64,mips,ppc64 and s390x.
Change-Id: I5220b0ba6d8a8a5fcc4d9774731eb2af69a671af
Reviewed-on: https://go-review.googlesource.com/c/go/+/622256
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Commit-Queue: Ian Lance Taylor <iant@golang.org>
|
|
Assembler support provided for the instructions DADD, DSUB, DMUL, and DDIV.
Change-Id: Ic12ba02ce453cb1ca275334ca1924fb2009da767
Reviewed-on: https://go-review.googlesource.com/c/go/+/620856
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
This enables efficient use of the builtin min/max function
for float64 and float32 types on GOPPC64 >= power9.
Extend the assembler to support xsminjdp/xsmaxjdp and use
them to implement float min/max.
Simplify the VSX xx3 opcode rules to allow FPR arguments,
if all arguments are an FPR.
Change-Id: I15882a4ce5dc46eba71d683cf1d184dc4236a328
Reviewed-on: https://go-review.googlesource.com/c/go/+/574535
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
|
|
Rename C_LCON, C_SCON, C_ADDCON, C_ANDCON into their aliased names
and remove them.
Change-Id: I8f67cc973f8059e65b81669d91a44500fc136b0a
Reviewed-on: https://go-review.googlesource.com/c/go/+/563097
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Make C_S32CON, C_U32CON, and C_32CON distinct classifiers to allow
more specific matching of 32 bit constants. C_U31CON is added to
support C_S32CON.
Likewise, add C_16CON which is the union of C_S16CON and C_U16CON
classification. This wil allow simplifying MOVD/MOVW optab entries
in a future patch.
Change-Id: I193acc0ded8f3edd91d306e39c3e7e55a9811e04
Reviewed-on: https://go-review.googlesource.com/c/go/+/562346
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
The assembler treats C_SBRA and C_LBRA optab classes identically,
combine them into one class to reduce the number of optab classes.
Likewise, C_LBRAPIC is renamed to C_BRAPIC for consistency with
the above change.
Change-Id: I47000e7273cb8f89a4d0621d71433ccbfb7afb70
Reviewed-on: https://go-review.googlesource.com/c/go/+/557916
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
This halves the size of the xcmp lookup table.
Change-Id: I543fb72709ca45c026e9b7d8084a78f2a8fcd43e
Reviewed-on: https://go-review.googlesource.com/c/go/+/542295
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This optab matching rule was used to match signed 16 bit values shifted
left by 16 bits. Unsigned 16 bit values greater than 0x7FFF<<16 were
classified as C_U32CON which led to larger than necessary codegen.
Instead, rewrite logical/arithmetic operations in the preprocessor pass
to use the 16 bit shifted immediate operation (e.g ADDIS vs ADD). This
simplifies the optab matching rules, while also minimizing codegen size
for large unsigned values.
Note, ADDIS sign-extends the constant argument, all others do not.
For matching opcodes, this means:
MOVD $is<<16,Rx becomes ADDIS $is,Rx or ORIS $is,Rx
MOVW $is<<16,Rx becomes ADDIS $is,Rx
ADD $is<<16,[Rx,]Ry becomes ADDIS $is[Rx,]Ry
OR $is<<16,[Rx,]Ry becomes ORIS $is[Rx,]Ry
XOR $is<<16,[Rx,]Ry becomes XORIS $is[Rx,]Ry
Change-Id: I1a988d9f52517a04bb8dc2e41d7caf3d5fff867c
Reviewed-on: https://go-review.googlesource.com/c/go/+/536735
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
These are ISA 3.0 power9 instructions which are helpful when reducing
a vector compare result into a GPR.
They are used in a future patch to improve the bytes.IndexByte asm
routine.
Change-Id: I424e2628e577167b9b7c0fcbd82099daf568ea35
Reviewed-on: https://go-review.googlesource.com/c/go/+/478115
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
This ISA 3.0 (power9) instruction is helpful for some string functions
in a future change.
Change-Id: I1a659488ffb5099f8c89f480c39af4ef9c4b556a
Reviewed-on: https://go-review.googlesource.com/c/go/+/472635
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This CL removes some opcode placeholders that do not correspond
to any existing instructions and hence create confusion. Some
instructions that are no longer valid like LDMX are also removed.
Any references to this instruction in ISA 3.0 are considered
as documentation errata.
Change-Id: Ib71a657099723bbe1db88873233ee573b5c42fe7
Reviewed-on: https://go-review.googlesource.com/c/go/+/429860
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Paul Murphy <murp@ibm.com>
Run-TryBot: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Benny Siegert <bsiegert@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
|
|
Use ppc64map (from x/arch) to generate ISA 3.1 support for the
assembler. A new file asm9_gtables.go is added which contains
generated code to encode ISA 3.1 instructions, a function to assist
filling out the oprange structure, a lookup table for the fixed
bits of each instructions, and a slice of string name. Generated
functions are shared if their bitwise encoding match, and the
translation from an obj.Prog structure matches.
The generated file is entirely self-contained, and does not require
regenerating any other files for changes within it. If opcodes in
a.out.go are reordered or changed, anames.go must be updated in
the same way as before.
Future improvements could shrink the generated opcode table
to 32 bit entries as there is much less variation of the
encoding of the prefix word, but it is not always identical
for instructions which share a similar encoding of arguments
(e.g PLWA and PLWZ).
Updates #44549
Change-Id: Ie83fa02497c9ad2280678d68391043d3aae63175
Reviewed-on: https://go-review.googlesource.com/c/go/+/419535
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Jenny Rakoczy <jenny@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Jenny Rakoczy <jenny@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Jenny Rakoczy <jenny@golang.org>
|
|
Allow the assembler frontend to match MMA register arguments added by
ISA 3.1. The prefix "A" (for accumulator) is chosen to identify them.
Updates #44549
Change-Id: I363e7d1103aee19d7966829d2079c3d876621efc
Reviewed-on: https://go-review.googlesource.com/c/go/+/419534
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
When a base+displacement kind of operand is given in an index-mode
instruction, the assembler does not flag it as an invalid instruction
causing the user to get an incorrect encoding of that instruction
leading to incorrect execution of the program.
Enable assembler to recognize valid and invalid operands used in index
mode instructions by classifying SOREG type into two further types
XOREG (used uniquely in index addressing mode instructions) and SOREG
for instructions working on base+displacement operands.
Also cleaned up usage of obj.Addr.Scale on PPC64.
Change-Id: Ib4d84343ae57477c6c074f44c4c2749496e11b91
Reviewed-on: https://go-review.googlesource.com/c/go/+/405542
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Archana Ravindar <aravind5@in.ibm.com>
|
|
Avoid coercing the CR bit into a GPR register type argument, and
move the existing usage to CRx_y register types. And, update the
compiler usage to this. This transformation is done internally,
so it should not alter existing assembly code.
Likewise, add assembly tests for all optab entries of BC/BR. This
found some cases which were not possible to realize with handwritten
asm, or assemble to something very unexpected if generated by the
compiler. The following optab entries are removed, and the cases
simplified or removed:
{as: ABR, a3: C_SCON, a6: C_LR, type_: 18, size: 4}
This existed only to pass the BH hint to JMP (LR) from compiler
generated code. It cannot be matched with asm. Instead, add and
support 4-operand form "BC{,L} $BO, $BI, $BH, (LR)".
{as: ABR, a1: C_REG, a6: C_CTR, type_: 18, size: 4}
Could be used like "BR R1, (CTR)", but always compiles to bctr
irrespective of arg 1. Any usage should be rewritten as "JMP (CTR)",
or rewritten if this was not the intended behavior.
{as: ABR, a6: C_ZOREG, type_: 15, size: 8}:
{as: ABC, a6: C_ZOREG, type_: 15, size: 8},
Not reachable: 0(reg) is coerced to reg in assembler frontend.
{as: ABC, a2: C_REG, a6: C_LR, type_: 18, size: 4}
{as: ABC, a2: C_REG, a6: C_CTR, type_: 18, size: 4}
Only usable from the compiler. However, the compiler does not
generate this form today. Without a BO operand (usually in a1), it
is not clear what this should assemble to.
Change-Id: I1b5151f884a5877e4a610e6fd41261e8e64c5454
Reviewed-on: https://go-review.googlesource.com/c/go/+/357775
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Support BDNZ and BDZ mnemonics, they are commonly used
POWER instructions. The raw BC mnemonic is not easy
to read.
Likewise, cleanup code surrounding these changes.
Change-Id: I72f1dad5013f7856bd0dd320bfb17b5a9f3c69ee
Reviewed-on: https://go-review.googlesource.com/c/go/+/390696
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Trust: Paul Murphy <murp@ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
CR bits and CR fields should be treated separately. Some instructions
modify an entire CR, a CR field, or a single CR bit.
Add a new argument class for CR bits, and teach the assembler the
recognize them as names like CR0LT or CR2SO, and update the CR
bit logic instructions to use them. They will no longer accept
register field (CRn) type arguments.
Fixes #46422
Change-Id: Iaba127d88abada0c2a49b8d3b07a976180565ae4
Reviewed-on: https://go-review.googlesource.com/c/go/+/357774
Run-TryBot: Paul Murphy <murp@ibm.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This introduces a number of new classifications which will make it
easier to generate functions to assemble the new instructions of
ISA 3.1, and potentially earlier versions.
No code generation changes should occur as a result of these. These
allow finer control over how an opcode is matched to an optab entry.
Literal values are now classified based on the smallest number of bits
needed to encode, and matching rules will accept a literal if it
can be zero/sign extended to fit a larger literal class.
Likewise, support classifying even register numbers for GPR, VSX, and
FPR instructions. Some instructions require and even/odd register pair,
and these are usually represented by specifying the even register, and
similarly encoded.
Likewise, add a unit test for the argument classifier function (aclass).
This caught an off-by-one bug in aclass which is also fixed.
Updates #44549
Change-Id: Ia03013aea8b56c4d59b7c3812cdd67ddb3b720b9
Reviewed-on: https://go-review.googlesource.com/c/go/+/350152
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
|
|
Insert machine NOPs when a prefixed instruction crosses a 64B boundary.
ISA 3.1 prohibits prefixed instructions being placed across them. Such
instructions generate SIGILL if executed.
Likewise, adjust the function alignment to guarantee such instructions
can never cross one. And, don't pad the PC based on alignment. The
linker can fit these more optimally.
Likewise, include the function alignment when printing function debug
information. This is needed to verify function alignment happens.
Updates #44549
Change-Id: I434fb0ee4e984ca00dc4566f7569c3bcdf93f910
Reviewed-on: https://go-review.googlesource.com/c/go/+/347050
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
The assembler does not support parsing DCRx registers,
nor does the compiler generate opcodes with these.
Likewise, these registers are only available on ISA
2.07 embedded processors which are not supported in
golang.
Change-Id: Iea258e5958a2022bda0eee8348de1b06437148df
Reviewed-on: https://go-review.googlesource.com/c/go/+/352790
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This adds support for duffcopy on ppc64x and updates the
ssa/config.go file to enable register args and recognize
the duffDevice is available on ppc64x.
Change-Id: Ifc472cc9cc19c9a80e468fb52078c75f7dd44d36
Reviewed-on: https://go-review.googlesource.com/c/go/+/351490
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Likewise, reorder register numbers such that extended mnemonics which
use FPR arguments can be transparently encoded as a VSR argument for
the move to/from VSR class of instructions. Specifically, ensure the
following holds for all FPx and VRx constants: FPRx & 63 == x, and
VRx & 63 == x + 32.
This simplifies encoding machine instructions, and likewise helps
ppc64 assembly writers to avoid hokey workarounds when switching from
vector to vector-scalar register notation. Notably, many VSX
instructions are limited to vector operands due to encoding
restrictions.
Secondly, this explicitly rejects dubious usages of the m[tf]vsr
family of instructions which had previously been accepted.
* Reject two GPR arguments for non-MTVSRDD opcodes. These
have no defined behavior today, and may set RFU bits. e.g
MTVSRD R1, R2, VS1
* Reject FPR destinations for MTVSRDD, and only accept with two GPR
arguments. This copies two GPR values into either half of a VSR. e.g
MTVSRDD R1, R2, F1
MTVSRDD R1, F1
Change-Id: If13dd88c3791d1892dbd18ef0e34675a5285fff9
Reviewed-on: https://go-review.googlesource.com/c/go/+/342929
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Improve the code which fixes up conditional branches which exceed the
range of a single instruction by inserting one extra jump when
possible instead of two.
Change-Id: Ib0eb5b0f47f7d0e0ccd55471307a5f73fbda88a9
Reviewed-on: https://go-review.googlesource.com/c/go/+/342930
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
These generate similar machine code sequences to
other symbol accesses, therefore we should merge them.
Change-Id: Id8ead284d430fadd2e58bad255deb465498dfade
Reviewed-on: https://go-review.googlesource.com/c/go/+/314109
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
|
|
These are privileged instructions, and thus will never work with
usermode code. I don't think there is a case where this isn't
true. The motivation is to simplify handling of MOV* opcodes.
Assembler support for recognizing the MSR as a register is
retained.
Change-Id: Ic33f021a20057b64e69df8ea125e23dd8937e38d
Reviewed-on: https://go-review.googlesource.com/c/go/+/307814
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Trust: Carlos Eduardo Seo <carlos.seo@linaro.org>
|
|
C_LECON and C_SECON classifications are not generated on ppc64, however
there are many optab entries which match against them. Remove them to
resolve their related TODOs.
Likewise, reorder the optab entries for better readability.
Change-Id: I894a209a148014e5aa438b7303e7fbdda4727c4e
Reviewed-on: https://go-review.googlesource.com/c/go/+/307429
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Trust: Carlos Eduardo Seo <carlos.seo@linaro.org>
|
|
Several classifications exist only to help disambiguate an
implied register (i.e $0/R0 as the implied second register
argument when loading constants, or pseudo-registers used
exclusively by the assembler front-end).
The register determination is folded into getimpliedreg. The
classifications and their related optab entries are removed
or updated.
Change-Id: Iffb167aa9fa57fbc1a537c79fbdfb36cb38f9d95
Reviewed-on: https://go-review.googlesource.com/c/go/+/301789
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Cherry Zhang <cherryyz@google.com>
|
|
This adds support for the extswsli instruction which combines
extsw followed by a shift.
New benchmark demonstrates the improvement:
name old time/op new time/op delta
ExtShift 1.34µs ± 0% 1.30µs ± 0% -3.15% (p=0.057 n=4+3)
Change-Id: I21b410676fdf15d20e0cbbaa75d7c6dcd3bbb7b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/257017
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This change adds rules to find pairs of instructions that can
be combined into a single shifts. These instruction sequences
are common in array addressing within loops. Improvements can
be seen in many crypto packages and the hash packages.
These are based on the extended mnemonics found in the ISA
sections C.8.1 and C.8.2.
Some rules in PPC64.rules were moved because the ordering prevented
some matching.
The following results were generated on power9.
hash/crc32:
CRC32/poly=Koopman/size=40/align=0 195ns ± 0% 163ns ± 0% -16.41%
CRC32/poly=Koopman/size=40/align=1 200ns ± 0% 163ns ± 0% -18.50%
CRC32/poly=Koopman/size=512/align=0 1.98µs ± 0% 1.67µs ± 0% -15.46%
CRC32/poly=Koopman/size=512/align=1 1.98µs ± 0% 1.69µs ± 0% -14.80%
CRC32/poly=Koopman/size=1kB/align=0 3.90µs ± 0% 3.31µs ± 0% -15.27%
CRC32/poly=Koopman/size=1kB/align=1 3.85µs ± 0% 3.31µs ± 0% -14.15%
CRC32/poly=Koopman/size=4kB/align=0 15.3µs ± 0% 13.1µs ± 0% -14.22%
CRC32/poly=Koopman/size=4kB/align=1 15.4µs ± 0% 13.1µs ± 0% -14.79%
CRC32/poly=Koopman/size=32kB/align=0 137µs ± 0% 105µs ± 0% -23.56%
CRC32/poly=Koopman/size=32kB/align=1 137µs ± 0% 105µs ± 0% -23.53%
crypto/rc4:
RC4_128 733ns ± 0% 650ns ± 0% -11.32% (p=1.000 n=1+1)
RC4_1K 5.80µs ± 0% 5.17µs ± 0% -10.89% (p=1.000 n=1+1)
RC4_8K 45.7µs ± 0% 40.8µs ± 0% -10.73% (p=1.000 n=1+1)
crypto/sha1:
Hash8Bytes 635ns ± 0% 613ns ± 0% -3.46% (p=1.000 n=1+1)
Hash320Bytes 2.30µs ± 0% 2.18µs ± 0% -5.38% (p=1.000 n=1+1)
Hash1K 5.88µs ± 0% 5.38µs ± 0% -8.62% (p=1.000 n=1+1)
Hash8K 42.0µs ± 0% 37.9µs ± 0% -9.75% (p=1.000 n=1+1)
There are other improvements found in golang.org/x/crypto which are all in the
range of 5-15%.
Change-Id: I193471fbcf674151ffe2edab212799d9b08dfb8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/252097
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
|
|
This updates the PPC64.rules file to use the MOD instructions
that are available in power9. Prior to power9 this is done
using a longer sequence with multiply and divide.
Included in this change is removal of the REM* opcode variations
that set the CC or OV bits since their settings are based
on the DIV and are not appropriate for the REM.
Change-Id: Iceed9ce33e128e1911c15592ee674276ce8ba3fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/229761
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This change adds some instructions that were missing from the
ppc64 assembler, mostly power9 but a few others from earlier.
Tests in cmd/asm for ppc64 were updated: ppc64.s includes the
new instructions, and ppc64enc.s now includes not only the
new instructions but most ppc64 opcodes to provide a more
complete test of the ppc64 assembler.
The ppc64 instruction set is used for linux/ppc64le,
linux/ppc64, and aix/ppc64.
Change-Id: I8695f89dbca06174847963f4ef869f2e584d5bbf
Reviewed-on: https://go-review.googlesource.com/c/go/+/229479
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This does some clean up of the ppc64 opcodes to remove names
from the opcode list that don't actually assemble. At one time
names were added to this list to represent opcode "classes" to
organize other opcodes that have the same set of operand
combinations. Since this is not documented, it is confusing as
to which opcodes can be used in an asm file and which can't, and
which opcodes should be supported in the disassembler. It is
clearer for the user if the list of Go opcodes are all opcodes
that can be assembled with names that match the ppc64 opcode
where possible.
I found this when trying to use Go opcode XXLAND in an asm file
which seems like it should map to ppc64 xxland but when used it
gets this error:
go tool asm test_xxland.s
asm: bad r/r, r/r/r or r/r/r/r opcode XXLAND
asm: assembly failed
This change removes the opcodes that are only used for opcode
"classes" and fixes the case statement where they are referenced.
This also fixes XXLAND and XXPERM which are opcodes that should
assemble to their corresponding ppc64 opcode but do not.
Change-Id: I52300db6b22f7f8b3dd3491c3f35a384b943352c
Reviewed-on: https://go-review.googlesource.com/c/go/+/223138
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
POWER9 (ISA 3.0) introduced a new format of load/store instructions to
implement indexed load/store quadword, using an immediate value instead
of a register index.
This change adds support for this new instruction encoding and adds the
new load/store quadword instructions (lxv/stxv) to the assembler.
This change also adds the missing XX1-form loads/stores (halfword and byte)
included in ISA 3.0.
Change-Id: Ibcdf53c342d7a352d64a9403c2fe7b25be9c3b24
Reviewed-on: https://go-review.googlesource.com/c/go/+/200399
Run-TryBot: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This adds support for ppc64 instructions vmrgow and vmrgew which
are needed for an improved implementation of chacha20.
Change-Id: I967a2de54236bcc573a99f7e2b222d5a8bb29e03
Reviewed-on: https://go-review.googlesource.com/c/go/+/192117
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
|
|
zeros instructions on POWER9
This change adds new POWER9 instructions for counting trailing zeros (CNTTZW/CNTTZD)
to the assembler and generates them in SSA when GOPPC64=power9.
name old time/op new time/op delta
TrailingZeros-160 1.59ns ±20% 1.45ns ±10% -8.81% (p=0.000 n=14+13)
TrailingZeros8-160 1.55ns ±23% 1.62ns ±44% ~ (p=0.593 n=13+15)
TrailingZeros16-160 1.78ns ±23% 1.62ns ±38% -9.31% (p=0.003 n=14+14)
TrailingZeros32-160 1.64ns ±10% 1.49ns ± 9% -9.15% (p=0.000 n=13+14)
TrailingZeros64-160 1.53ns ± 6% 1.45ns ± 5% -5.38% (p=0.000 n=15+13)
Change-Id: I365e6ff79f3ce4d8ebe089a6a86b1771853eb596
Reviewed-on: https://go-review.googlesource.com/c/go/+/167517
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
VPERMXOR is missing from the Go assembler for ppc64. It has the
same format as VPERM. It was requested by an external user so
they could write an optimized algorithm in asm.
Change-Id: Icf4c682f7f46716ccae64e6ae3d62e8cec67f6c1
Reviewed-on: https://go-review.googlesource.com/c/151578
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
|
|
This commit changes the code generated for addressing symbols on AIX
operating system.
On AIX, every symbol accesses must be done via another symbol near the TOC,
named TOC anchor or TOC entry. This TOC anchor is a pointer to the symbol
address.
During Progedit function, when a symbol access is detected, its instructions
are modified to create a load on its TOC anchor and retrieve the symbol.
Change-Id: I00cf8f49c13004bc99fa8af13d549a709320f797
Reviewed-on: https://go-review.googlesource.com/c/151039
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
This change implements math.Round as an intrinsic on ppc64x so it can be
done using a single instruction.
benchmark old ns/op new ns/op delta
BenchmarkRound-16 2.60 0.69 -73.46%
Change-Id: I9408363e96201abdfc73ced7bcd5f0c29db006a8
Reviewed-on: https://go-review.googlesource.com/109395
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This change adds vector multiply instructions to the assembler for
ppc64x.
Change-Id: I5143a2dc3736951344d43999066d38ab8be4a721
Reviewed-on: https://go-review.googlesource.com/107795
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
The current implementation of l*arx instructions does not accept non-zero
offsets in RA nor the EH field. This change adds full functionality to those
instructions.
Updates #23845
Change-Id: If113f70d11de5f35f8389520b049390dbc40e863
Reviewed-on: https://go-review.googlesource.com/99635
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
Before DWARF location lists can be turned on, 3 bugs need
fixing.
This CL addresses two -- lack of register definitions for
various architectures, and bugs on 32-bit platforms.
The third bug comes later.
Passes
GO_GCFLAGS=-dwarflocationlists ./run.bash -no-rebuild
(-no-rebuild because the map dependence causes trouble)
Change-Id: I4223b48ade84763e4b048e4aeb81149f082c7bc7
Reviewed-on: https://go-review.googlesource.com/99255
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This change adds ADD/AND/OR/XOR Immediate Shifted instructions for
ppc64x so they are usable in Go asm code. These instructions were
originally present in asm9.go, but they were only usable in that
file (as -AADD, -AANDCC, -AOR, -AXOR). These old mnemonics are now
removed.
Updates #23845
Change-Id: Ifa2fac685e8bc628cb241dd446adfc3068181826
Reviewed-on: https://go-review.googlesource.com/94115
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This change adds a better implementation of IndexByte in asm that uses the
vector registers/instructions on ppc64x.
benchmark old ns/op new ns/op delta
BenchmarkIndexByte/10-8 9.70 9.37 -3.40%
BenchmarkIndexByte/32-8 10.9 10.9 +0.00%
BenchmarkIndexByte/4K-8 254 92.8 -63.46%
BenchmarkIndexByte/4M-8 249246 118435 -52.48%
BenchmarkIndexByte/64M-8 10737987 7383096 -31.24%
benchmark old MB/s new MB/s speedup
BenchmarkIndexByte/10-8 1030.63 1067.24 1.04x
BenchmarkIndexByte/32-8 2922.69 2928.53 1.00x
BenchmarkIndexByte/4K-8 16065.95 44156.45 2.75x
BenchmarkIndexByte/4M-8 16827.96 35414.21 2.10x
BenchmarkIndexByte/64M-8 6249.67 9089.53 1.45x
Change-Id: I81dbdd620f7bb4e395ce4d1f2a14e8e91e39f9a1
Reviewed-on: https://go-review.googlesource.com/71710
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
on ppc64x
This adds support for math Abs, Copysign to be instrinsics on ppc64x.
New instruction FCPSGN is added to generate fcpsgn. Some new
rules are added to improve the int<->float conversions that are
generated mainly due to the Float64bits and Float64frombits in
the math package. PPC64.rules is also modified as suggested
in the review for CL 63290.
Improvements:
benchmark old ns/op new ns/op delta
BenchmarkAbs-16 1.12 0.69 -38.39%
BenchmarkCopysign-16 1.30 0.93 -28.46%
BenchmarkNextafter32-16 9.34 8.05 -13.81%
BenchmarkFrexp-16 8.81 7.60 -13.73%
Others that used Copysign also saw smaller improvements.
I attempted to make this work using rules since that
seems to be preferred, but due to the use of Float64bits and
Float64frombits in these functions, several rules had to be added and
even then not all cases were matched. Using rules became too
complicated and seemed too fragile for these.
Updates #21390
Change-Id: Ia265da9a18355e08000818a4fba1a40e9e031995
Reviewed-on: https://go-review.googlesource.com/67130
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
The functions Float64bits and Float64frombits perform
poorly on ppc64x because the int<->float conversions
often result in load and store sequences to handle the
type change. This patch adds more rules to recognize
those sequences and use register to register moves
and avoid unnecessary loads and stores where possible.
There were some existing rules to improve these conversions,
but this provides additional improvements. Included here:
- New instruction FCFIDS to improve on conversion to 32 bit
- Rename Xf2i64 and Xi2f64 as MTVSRD, MFVSRD, to match the asm
- Add rules to lower some of the load/store sequences for
- Added new go asm to ppc64.s testcase.
conversions
Improvements:
BenchmarkAbs-16 2.16 0.93 -56.94%
BenchmarkCopysign-16 2.66 1.18 -55.64%
BenchmarkRound-16 4.82 2.69 -44.19%
BenchmarkSignbit-16 1.71 1.14 -33.33%
BenchmarkFrexp-16 11.4 7.94 -30.35%
BenchmarkLogb-16 10.4 7.34 -29.42%
BenchmarkLdexp-16 15.7 11.2 -28.66%
BenchmarkIlogb-16 10.2 7.32 -28.24%
BenchmarkPowInt-16 69.6 55.9 -19.68%
BenchmarkModf-16 10.1 8.19 -18.91%
BenchmarkLog2-16 17.4 14.3 -17.82%
BenchmarkCbrt-16 45.0 37.3 -17.11%
BenchmarkAtanh-16 57.6 48.3 -16.15%
BenchmarkRemainder-16 76.6 65.4 -14.62%
BenchmarkGamma-16 26.0 22.5 -13.46%
BenchmarkPowFrac-16 197 174 -11.68%
BenchmarkMod-16 112 99.8 -10.89%
BenchmarkAsinh-16 59.9 53.7 -10.35%
BenchmarkAcosh-16 44.8 40.3 -10.04%
Updates #21390
Change-Id: I56cc991fc2e55249d69518d4e1ba76cc23904e35
Reviewed-on: https://go-review.googlesource.com/63290
Reviewed-by: Michael Munday <mike.munday@ibm.com>
|
|
This change adds new ppc64 instructions from the POWER9 ISA. This includes
compares, loads, maths, register moves and the new random number generator and
copy/paste facilities.
Change-Id: Ife3720b90f5af184ff115bbcdcbce5c1302d39b6
Reviewed-on: https://go-review.googlesource.com/53930
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This updates PPC64.rules to include rules to generate rotates
for ADD, OR, XOR operators that combine two opposite shifts
that sum to 32 or 64.
To support this change opcodes for ROTL and ROTLW were added to
be used like the rotldi and rotlwi extended mnemonics.
This provides the following improvement in sha3:
BenchmarkPermutationFunction-8 302.83 376.40 1.24x
BenchmarkSha3_512_MTU-8 98.64 121.92 1.24x
BenchmarkSha3_384_MTU-8 136.80 168.30 1.23x
BenchmarkSha3_256_MTU-8 169.21 211.29 1.25x
BenchmarkSha3_224_MTU-8 179.76 221.19 1.23x
BenchmarkShake128_MTU-8 212.87 263.23 1.24x
BenchmarkShake256_MTU-8 196.62 245.60 1.25x
BenchmarkShake256_16x-8 163.57 194.37 1.19x
BenchmarkShake256_1MiB-8 199.02 248.74 1.25x
BenchmarkSha3_512_1MiB-8 106.55 133.13 1.25x
Fixes #20030
Change-Id: I484c56f48395d32f53ff3ecb3ac6cb8191cfee44
Reviewed-on: https://go-review.googlesource.com/40992
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This change adds instructions from ISA 2.05, 2.06 and 2.07 that are frequently
used in assembly optimizations for ppc64.
It also fixes two problems:
* the implementation of RLDICR[CC]/RLDICL[CC] did not consider all possible
cases for the bit mask.
* removed two non-existing instructions that were added by mistake in the VMX
implementation (VORL/VANDL).
Change-Id: Iaef4e5c6a5240c2156c6c0f28ad3bcd8780e9830
Reviewed-on: https://go-review.googlesource.com/36230
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|