| Age | Commit message (Collapse) | Author |
|
This CL is generated by CL 764980.
This CL supports these new special constants:
<prfop>, which Go already supports (prefetch modifier)
<vl>, the vector length specifier, which includes VLx2 and VLx4.
Change-Id: I831f306a816493c08f3c22786e5360f2a37acf6c
Reviewed-on: https://go-review.googlesource.com/c/go/+/765000
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
This CL is generated by CL 759800.
The new register patterns are (examples):
Z1.B[5]
Z2[6]
P1[7]
PN1[8]
Change-Id: I5bccc4f1c0474dbd4cd4878bd488f36a7026c7ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/759780
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The GP registers and SIMD registers conform to the existing Go
syntax: they are V or R registers, and their widths are specified in the
opcode. The rules for specifying them are:
- if the instruction contains only one GP or SIMD register:
If it's a 32-bit GP register, append W to the end of the opcode.
If it's a 64-bit GP register, no changes.
If it's a SIMD register with a BHWD width specification, BHSDQ is just
appended to the end of the opcode.
- if it contains multiple GP or SIMD registers, manual observation
found that they either all have the same width or have fixed
widths. We distinguish them by the width of their first Go ASM operand. The
rules for appending suffixes are the same as in the single-register case above.
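The suffix rules above can be sketched in Go as follows. This is only an illustration: `regKind`, `goOpcode` and the base mnemonic `FOO` are hypothetical names, not identifiers from the generator or the assembler.

```go
package main

import "fmt"

// regKind is a hypothetical classification of the first Go ASM operand.
type regKind int

const (
	gp32 regKind = iota // 32-bit general-purpose register
	gp64                // 64-bit general-purpose register
	simd                // SIMD register with an element width letter
)

// goOpcode appends the width suffix described in the rules to base.
// width is only consulted for SIMD registers and must be B, H, S, D or Q.
func goOpcode(base string, kind regKind, width byte) (string, error) {
	switch kind {
	case gp32:
		return base + "W", nil
	case gp64:
		return base, nil
	case simd:
		switch width {
		case 'B', 'H', 'S', 'D', 'Q':
			return base + string(width), nil
		}
		return "", fmt.Errorf("unsupported SIMD width %q", width)
	}
	return "", fmt.Errorf("unknown register kind %d", kind)
}

func main() {
	w, _ := goOpcode("FOO", gp32, 0)
	x, _ := goOpcode("FOO", gp64, 0)
	d, _ := goOpcode("FOO", simd, 'D')
	fmt.Println(w, x, d)
}
```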
This CL is generated by CL 759280.
Change-Id: Icc819cc30dd8fd1609de31ba7bcb4e3ac83c465e
Reviewed-on: https://go-review.googlesource.com/c/go/+/759261
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add support for ASIMD instructions that reduce a vector to
a scalar by operating across all lanes. These use the ASIMDALL
encoding class from the ARM architecture specification.
Integer cross-lane reductions (.B8, .B16, .H4, .H8, .S4):
Signed max/min across lanes: VSMAXV, VSMINV
Unsigned max/min across lanes: VUMAXV, VUMINV
Floating-point cross-lane reductions (.S4 arrangement):
FP max/min across lanes: VFMAXV, VFMINV
FP max/min across lanes (NM): VFMAXNMV, VFMINNMV
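As a rough model of what "across all lanes" means, the Go sketch below reduces a slice of lanes to one scalar. The function names `smaxv` and `uminv` are illustrative only, and the code models the semantics, not the assembler change itself.

```go
package main

import "fmt"

// smaxv models a signed max-across-lanes reduction: every lane
// participates and a single scalar is produced. lanes must be non-empty.
func smaxv(lanes []int8) int8 {
	m := lanes[0]
	for _, v := range lanes[1:] {
		if v > m {
			m = v
		}
	}
	return m
}

// uminv models an unsigned min-across-lanes reduction. lanes must be non-empty.
func uminv(lanes []uint8) uint8 {
	m := lanes[0]
	for _, v := range lanes[1:] {
		if v < m {
			m = v
		}
	}
	return m
}

func main() {
	fmt.Println(smaxv([]int8{-3, 7, 2, -128, 5, 0, 1, 6}))
	fmt.Println(uminv([]uint8{9, 200, 4, 17, 255, 8, 6, 5}))
}
```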
Change-Id: I6af4462d26803dfc7c78db2ad9df4284083e31e8
Reviewed-on: https://go-review.googlesource.com/c/go/+/762202
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add support for ASIMD unary miscellaneous instructions that operate
on a single source register. These use the ASIMDMISC encoding
class from the ARM architecture specification.
These instructions need some validation for arrangement constraints:
- VNOT only allows .B8/.B16 arrangements
- VCLS/VCLZ do not support D arrangements
- Floating-point variants (VFABS, VFNEG, VFSQRT, VFRINT*) only
allow floating-point arrangements (S and D)
New instructions by group:
Integer absolute/negate: VABS, VNEG
Floating-point abs/negate: VFABS, VFNEG
Floating-point sqrt: VFSQRT
Floating-point round: VFRINTN, VFRINTP, VFRINTM, VFRINTZ
Saturating abs/negate: VSQABS, VSQNEG
Bit/count operations: VCLS, VCLZ, VNOT
Change-Id: I62242eda31f82cd34119c7d4f97316a030e7663b
Reviewed-on: https://go-review.googlesource.com/c/go/+/762201
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
Add encoding support for ASIMD three-register instructions covering
floating-point, saturating, halving, integer multiply/accumulate,
min/max (including pairwise variants), and bitwise operations.
These belong to the "Advanced SIMD Three-register (same)" instruction
class defined by the ARM architecture, meaning the two source registers
use the same element arrangement (e.g., both .S4 or both .D2). In the
assembler they share a common encoding path using the ASIMDSAME()
macro.
New instructions by group:
Floating-point arithmetic: VFADD, VFSUB, VFMUL, VFDIV
Floating-point min/max: VFMAX, VFMAXNM, VFMIN, VFMINNM
Pairwise floating-point: VFADDP, VFMAXP, VFMINP, VFMAXNMP,
VFMINNMP
Saturating arithmetic: VSQADD, VUQADD, VSQSUB, VUQSUB
Average (halving add): VSHADD, VSRHADD, VUHADD, VURHADD
Integer multiply/accum: VMUL, VMLA, VMLS
Integer min/max: VSMAX, VSMIN
Pairwise integer min/max: VSMAXP, VSMINP, VUMAXP, VUMINP
Bitwise: VBIC, VORN
Change-Id: I732c84123ad1f302260514fdfe0d020787da017b
Reviewed-on: https://go-review.googlesource.com/c/go/+/762200
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add support for ASIMD shift instructions. These use the ASIMDSHF
encoding class from the ARM architecture specification, where the
shift amount is encoded as an immediate derived from the element size.
Also add ASIMD shifts-by-vector (3-register form) where the shift
amount comes from a second vector register. These use the ASIMDSAME
encoding class.
New instructions by group:
Shift by immediate (signed): VSSHR, VSRSHR
Shift by immediate (saturating): VSQSHL, VUQSHL
Narrowing shift by immediate: VSHRN, VSHRN2
Shift by vector (3-reg): VSSHL, VUSHL, VSQSHL, VUQSHL
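For the shift-by-immediate forms, the 7-bit immh:immb field combines the element size and the shift amount. The sketch below follows the Arm ARM rule that right shifts store 2*esize - shift and left shifts store esize + shift; `shiftImm` is a hypothetical helper and not the function used in the assembler.

```go
package main

import (
	"errors"
	"fmt"
)

// shiftImm returns the 7-bit immh:immb field for a vector shift by
// immediate. esize is the element size in bits (8, 16, 32 or 64).
func shiftImm(esize, shift int, right bool) (uint32, error) {
	switch esize {
	case 8, 16, 32, 64:
	default:
		return 0, errors.New("element size must be 8, 16, 32 or 64")
	}
	if right {
		// Right shifts allow 1..esize and are stored as 2*esize - shift.
		if shift < 1 || shift > esize {
			return 0, errors.New("right shift out of range")
		}
		return uint32(2*esize - shift), nil
	}
	// Left shifts allow 0..esize-1 and are stored as esize + shift.
	if shift < 0 || shift >= esize {
		return 0, errors.New("left shift out of range")
	}
	return uint32(esize + shift), nil
}

func main() {
	r, _ := shiftImm(8, 3, true)   // e.g. a right shift of byte lanes by 3
	l, _ := shiftImm(32, 5, false) // e.g. a left shift of word lanes by 5
	fmt.Printf("%07b %07b\n", r, l)
}
```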
Change-Id: I039cc16bc01980b04e6940cc1d4670faf5fa7e3c
Reviewed-on: https://go-review.googlesource.com/c/go/+/762180
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add remaining arm64 ASIMD vector compare instructions.
All of these instructions set every bit of each result lane to zero (false)
or to one (true).
Added integer comparison instructions:
- VCMEQ (compare to zero)
- VCMGE, VCMGT (signed, both two-register and compare to zero)
- VCMHI, VCMHS (unsigned two-register compare)
- VCMLE, VCMLT (signed compare to zero)
Added floating-point comparison instructions:
- VFCMEQ, VFCMGE, VFCMGT (both two-register and zero variants)
- VFCMLE, VFCMLT (compare to zero)
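The per-lane result mask can be modelled in Go as below. This is a sketch of the semantics on eight byte lanes only; `cmgt` and `cmeqZero` are illustrative names, not assembler functions.

```go
package main

import "fmt"

// cmgt models a signed greater-than compare on eight byte lanes:
// a lane becomes 0xff when a[i] > b[i] and 0x00 otherwise.
func cmgt(a, b [8]int8) [8]uint8 {
	var out [8]uint8
	for i := range a {
		if a[i] > b[i] {
			out[i] = 0xff
		}
	}
	return out
}

// cmeqZero models the compare-to-zero form of an equality compare.
func cmeqZero(a [8]int8) [8]uint8 {
	var out [8]uint8
	for i := range a {
		if a[i] == 0 {
			out[i] = 0xff
		}
	}
	return out
}

func main() {
	a := [8]int8{1, -1, 5, 0, 7, -8, 3, 3}
	b := [8]int8{0, 0, 5, -1, 8, -9, 2, 4}
	fmt.Printf("%02x\n", cmgt(a, b))
	fmt.Printf("%02x\n", cmeqZero(a))
}
```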
Change-Id: I913165d3934f2556c9bdf38c5103ef56d86383ef
Reviewed-on: https://go-review.googlesource.com/c/go/+/721640
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
This CL integrates a new assembling path specifically designed for SVE
and other modern ARM64 instructions, utilizing generated instruction
tables. It contains the foundational files and modifications to direct
the assembling pipeline to use this new data-driven path.
In a.out.go, it registers new constants for registers and operand types
used by SVE.
A new file inst.go is added, which defines the instruction table data
types and utility functions for the new path. The entry point from the
upstream pipeline is `tryEncode`.
`tryEncode` returns false upon an encoding failure, which allows the
upstream matching logic to handle multiple potential matches. The exact
match is not finalized until an instruction is actually encoded, as
detailed in the comments for `elemEncoders`.
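The "try each match until one encodes" idea can be illustrated with a small Go sketch. Everything here is hypothetical: the `candidate` type, the `assemble` loop and the toy encodings are assumptions made for illustration and do not reflect the real signature of `tryEncode` or the generated tables.

```go
package main

import "fmt"

// candidate is a hypothetical table entry: a name plus an encoder that
// reports false when the operand does not fit this instruction form.
type candidate struct {
	name   string
	encode func(imm int) (uint32, bool)
}

// demoCandidates returns two toy forms with made-up encodings.
func demoCandidates() []candidate {
	return []candidate{
		{"imm4", func(imm int) (uint32, bool) { return uint32(imm), imm >= 0 && imm < 16 }},
		{"imm8", func(imm int) (uint32, bool) { return 0x100 | uint32(imm), imm >= 0 && imm < 256 }},
	}
}

// assemble tries each candidate in order and keeps the first one whose
// encoder succeeds, so the match is only final once encoding works.
func assemble(cands []candidate, imm int) (string, uint32, bool) {
	for _, c := range cands {
		if enc, ok := c.encode(imm); ok {
			return c.name, enc, true
		}
	}
	return "", 0, false
}

func main() {
	for _, imm := range []int{7, 100, 1000} {
		name, enc, ok := assemble(demoCandidates(), imm)
		fmt.Println(imm, name, enc, ok)
	}
}
```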
This CL also introduces the core generated tables (`anames_gen.go`,
`encoding_gen.go`, `goops_gen.go`, and `inst_gen.go`) which handle a
wide variety of SVE instructions. A comprehensive end-to-end assembly
test file (`arm64sveenc.s`) is added, containing hundreds of test cases
for these SVE instructions to verify the new encoding path.
To facilitate these encodings, this CL implements handling for operand
types such as AC_ARNG, AC_PREG, AC_PREGZM, and AC_ZREG. Others are left
as TODOs.
The generated files in this CL are produced by the `instgen` tool in CL
755180.
Original author Eric Fang (eric.fang@arm.com, CL 424137)
Change-Id: I483f170c776fcd8edd8b8b04520f9d69ee0855dd
Reviewed-on: https://go-review.googlesource.com/c/go/+/742620
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Add the SB (speculation barrier) instruction, and an internal/cpu
feature bit to check its availability.
Change-Id: I7c2d887ae75598f7c11cc875ec15ec3be76c09f5
Reviewed-on: https://go-review.googlesource.com/c/go/+/729501
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add support for the Pointer Authentication Code instructions
required for the ELF ABI when enabling PAC-aware binaries.
This allows assembly writers to add PAC instructions where needed to
support this ABI. Follow-up work is to enable the compiler to emit these
instructions in the appropriate places.
The TL;DR for the Linux ABI is that the prologue of a function that
pushes the link register (LR) to the stack signs the LR with a key
managed by the operating system and hardware, using a PAC instruction
like "paciasp". The function epilogue, when restoring the LR from the
stack, verifies the signature using an instruction like "autiasp".
This helps prevent attackers from modifying the return address on the
stack, a common technique for ROP attacks.
Details on PAC can be found here:
- https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/enabling-pac-and-bti-on-aarch64
- https://developer.arm.com/documentation/109576/0100/Pointer-Authentication-Code
The ABI details can be found here:
- https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst
Change-Id: I4516ed1294d19f9ff9d278833d542821b6642aa9
Reviewed-on: https://go-review.googlesource.com/c/go/+/676675
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Add support for the `BTI` instruction to the arm64 assembler. This
instruction provides Branch Target Identification for targets of
indirect branches. A BTI can be marked with a target type of
'C' (call), 'J' (jump) or 'JC' (jump or call).
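In the A64 encoding, BTI lives in the HINT space, so the three target types map to three HINT immediates. The Go sketch below computes the resulting machine words; `hint` and `bti` are hypothetical helpers and not the assembler's own code.

```go
package main

import "fmt"

// hint returns the A64 encoding of HINT #imm, where imm is a 7-bit value
// placed in bits 11:5 of the instruction word.
func hint(imm uint32) uint32 {
	return 0xd503201f | (imm&0x7f)<<5
}

// btiHint maps a BTI target type to its HINT immediate.
var btiHint = map[string]uint32{"C": 34, "J": 36, "JC": 38}

// bti returns the encoding of BTI with the given target type.
func bti(target string) (uint32, bool) {
	imm, ok := btiHint[target]
	if !ok {
		return 0, false
	}
	return hint(imm), true
}

func main() {
	for _, t := range []string{"C", "J", "JC"} {
		enc, _ := bti(t)
		fmt.Printf("BTI %-2s %#08x\n", t, enc)
	}
}
```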
Updates #66054
Change-Id: I1cf31a0382207bb75b9b2deb49ac298a59c00d8a
Reviewed-on: https://go-review.googlesource.com/c/go/+/646781
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Marvin Drees <marvin.drees@9elements.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
instructions
Implement vector configuration setting instructions (VSETVLI,
VSETIVLI, VSETVL). These allow the vector length (vl) and vector
type (vtype) CSRs to be configured via a single instruction.
Unfortunately each instruction has its own dedicated encoding.
In the case of VSETVLI/VSETIVLI, the vector type is specified via
a series of special operands, which specify the selected element
width (E8, E16, E32, E64), the vector register group multiplier
(M1, M2, M4, M8, MF2, MF4, MF8), the vector tail policy (TU, TA)
and vector mask policy (MU, MA). Note that the order of these
special operands matches non-Go assemblers.
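The special operands combine into a vtype value. The Go sketch below follows the field layout in the RISC-V vector specification (vlmul in bits 2:0, vsew in bits 5:3, vta in bit 6, vma in bit 7); the `vtype` function and its string arguments are illustrative and are not how the assembler represents these operands.

```go
package main

import "fmt"

// Field values for the selected element width and register group multiplier.
var sewBits = map[string]uint32{"E8": 0, "E16": 1, "E32": 2, "E64": 3}
var lmulBits = map[string]uint32{
	"M1": 0, "M2": 1, "M4": 2, "M8": 3,
	"MF8": 5, "MF4": 6, "MF2": 7,
}

// vtype packs the four special operands into a vtype immediate.
func vtype(sew, lmul, tail, mask string) (uint32, error) {
	s, ok := sewBits[sew]
	if !ok {
		return 0, fmt.Errorf("invalid element width %q", sew)
	}
	l, ok := lmulBits[lmul]
	if !ok {
		return 0, fmt.Errorf("invalid group multiplier %q", lmul)
	}
	var vta, vma uint32
	switch tail {
	case "TA":
		vta = 1
	case "TU":
	default:
		return 0, fmt.Errorf("invalid tail policy %q", tail)
	}
	switch mask {
	case "MA":
		vma = 1
	case "MU":
	default:
		return 0, fmt.Errorf("invalid mask policy %q", mask)
	}
	return vma<<7 | vta<<6 | s<<3 | l, nil
}

func main() {
	v, _ := vtype("E32", "M1", "TA", "MA")
	fmt.Printf("%#x\n", v)
}
```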
Partially based on work by Pengcheng Wang <wangpengcheng.pp@bytedance.com>.
Cq-Include-Trybots: luci.golang.try:gotip-linux-riscv64
Change-Id: I431f59c1e048a3e84754f0643a963da473a741fe
Reviewed-on: https://go-review.googlesource.com/c/go/+/631936
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Add cmd/internal/obj/mkcnames.go to do the generation and update
the architecture packages to use it to maintain the Cnames tables.
Currently works correctly on arm64, loong64, mips, ppc64 and s390x.
Change-Id: I5220b0ba6d8a8a5fcc4d9774731eb2af69a671af
Reviewed-on: https://go-review.googlesource.com/c/go/+/622256
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Commit-Queue: Ian Lance Taylor <iant@golang.org>
|
|
Change-Id: I36a0f0989d37bef45ea8778da799b56a7e9a0c30
Reviewed-on: https://go-review.googlesource.com/c/go/+/529515
Run-TryBot: shuang cui <imcusg@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
|
|
Currently, pool literals are added when they are not needed, namely
in the case where the offset is a 24-bit unsigned scaled immediate.
By improving the classification of loads and stores, we can avoid
generating unused pool literals. More importantly, this
provides a basis for further improvement of the load and store
code generation.
Updates #59615
Change-Id: Ia3bad1709314565a05894a76c434cca2fa4533c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/512538
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Currently the Optab structure contains four arguments of an instruction
but excludes the fifth argument, p.RegTo2. It does not participate in
instruction matching and is usually handled separately.
Instructions with five operands are common in the newer arm instruction
set, so this CL adds the fifth argument to Optab to make instruction
matching easier. The oplook function needs to be updated accordingly,
so this CL also cleans up and modifies that function.
Change-Id: I1d95ad99e72a44dfad1e00db182cfc369a0e55c6
Reviewed-on: https://go-review.googlesource.com/c/go/+/505975
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
|
|
Fix spelling errors discovered using https://github.com/codespell-project/codespell. Errors in data files and vendored packages are ignored.
Change-Id: I83c7818222f2eea69afbd270c15b7897678131dc
GitHub-Last-Rev: 3491615b1b82832cc0064f535786546e89aa6184
GitHub-Pull-Request: golang/go#60758
Reviewed-on: https://go-review.googlesource.com/c/go/+/502576
Auto-Submit: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
ARM64 doesn't have MOVNP/MOVNPW and STLP/STLPW instructions, so these
opcodes are currently useless. This CL deletes them. It also
sorts the opcodes by name, which looks cleaner.
Change-Id: I25cfb636b23356ba0a50cba527a8c85b3f7e2ee4
Reviewed-on: https://go-review.googlesource.com/c/go/+/495695
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Previously we converted $0 to the ZR register, which causes
two problems:
1. Confusion: the special case of the ZR register needs to be considered
when dealing with constants. For encoding, in some places we encode ZR and
in others we encode $0, even though we have converted $0 to ZR.
2. Unexpected instruction formats: in every instruction that supports a ZR
register operand, ZR can be replaced by $0.
This patch removes this conversion. Note that this patch may cause instruction
formats that were previously supported unintentionally to no longer be supported.
Change-Id: I3d8d2c06711b7614a38191397da7776417f1861c
Reviewed-on: https://go-review.googlesource.com/c/go/+/404316
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Change-Id: Icd9eeb78bfc0c0bbe19dcb9841c9fdc0abc29cc9
Reviewed-on: https://go-review.googlesource.com/c/go/+/413314
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
There was only a placeholder for DC instruction in the previous code.
gVisor needs this instruction. This CL completes its support.
This patch is a copy of CL 250858, contributed by Junchen Li(junchen.li@arm.com).
Co-authored-by: Junchen Li(junchen.li@arm.com)
CustomizedGitHooks: yes
Change-Id: I76098048a227fbd08aa42c4173b028f0ab4f66e8
Reviewed-on: https://go-review.googlesource.com/c/go/+/302851
Reviewed-by: Cherry Mui <cherryyz@google.com>
Trust: Eric Fang <eric.fang@arm.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
There was only a placeholder for TLBI instruction in the previous code.
gVisor needs this instruction. This CL completes its support.
This patch is a copy of CL 250758, contributed by Junchen Li(junchen.li@arm.com).
Co-authored-by: Junchen Li(junchen.li@arm.com)
Change-Id: I69e893d2c1f75e227475de9e677548e14870f3cd
Reviewed-on: https://go-review.googlesource.com/c/go/+/302850
Reviewed-by: Cherry Mui <cherryyz@google.com>
Trust: Eric Fang <eric.fang@arm.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
The previous code treats some operands such as EQ, LT, etc. as special
registers. However, they are not. This CL adds a new AddrType TYPE_SPOPD
and a new class C_SPOPD to support this kind of special operands, and
refactors the relevant code.
This patch is a copy of CL 260861, contributed by Junchen Li(junchen.li@arm.com).
Co-authored-by: Junchen Li(junchen.li@arm.com)
Change-Id: I57b28da458ee3332f610602632e7eda03af435f5
Reviewed-on: https://go-review.googlesource.com/c/go/+/302849
Reviewed-by: Cherry Mui <cherryyz@google.com>
Trust: Eric Fang <eric.fang@arm.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Add test cases.
Fixes #51628
Change-Id: I433367d87e6bb5da5579c4be540079b92701c1fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/392294
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Trust: Fannie Zhang <Fannie.Zhang@arm.com>
|
|
Fixes #48002
Change-Id: Ie3a157d55b291f5ac2ef4845e6ce4fefd84fc642
Reviewed-on: https://go-review.googlesource.com/c/go/+/350912
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This CL adds support for arm64 fp&simd instructions VUMAX and VUMIN.
Fixes #42326
Change-Id: I3757ba165dc31ce1ce70f3b06a9e5b94c14d2ab9
Reviewed-on: https://go-review.googlesource.com/c/go/+/271497
Trust: eric fang <eric.fang@arm.com>
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: fannie zhang <Fannie.Zhang@arm.com>
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This CL adds assembly support for 128-bit FLDPQ and FSTPQ instructions.
This CL also deletes some wrong pre/post-indexed LDP and STP instructions,
such as {ALDP, C_UAUTO4K, C_NONE, C_NONE, C_PAIR, 74, 8, REGSP, 0, C_XPRE},
because when the offset type is C_UAUTO4K, pre and post don't work.
Change-Id: Ifd901d4440eb06eb9e86c9dd17518749fdf32848
Reviewed-on: https://go-review.googlesource.com/c/go/+/273668
Trust: eric fang <eric.fang@arm.com>
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Frame pointers were already enabled on linux, darwin, ios,
but not freebsd, android, openbsd, netbsd.
But the space was reserved on all platforms, leading to
two different arm64 framepointer conditions in different
parts of the code, one of which had no name
(framepointer_enabled || GOARCH == "arm64",
which might have been "framepointer_space_reserved").
So on the disabled systems, the stack layouts were still
set up for frame pointers and the only difference was not
actually maintaining the FP register in the generated code.
Reduce complexity by just enabling the frame pointer
completely on all the arm64 systems.
This commit passes on freebsd, android, netbsd.
I have not been able to try it on openbsd.
This CL is part of a stack adding windows/arm64
support (#36439), intended to land in the Go 1.17 cycle.
This CL is, however, not windows/arm64-specific.
It is cleanup meant to make the port (and future ports) easier.
Change-Id: I83bd23369d24b76db4c6a648fa74f6917819a093
Reviewed-on: https://go-review.googlesource.com/c/go/+/288814
Trust: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
The LDANDx instructions were misleading because they correspond to the
mnemonic LDCLRx as defined in the Arm Architecture Reference Manual for
Armv8. This changes the assembler to use the same mnemonic as the GNU
assembler and the manual.
The instruction has the form:
LDCLRx Rs, (Rb), Rt: *Rb -> Rt, *Rb AND NOT(Rs) -> *Rb
Change-Id: I94ae003e99e817209bba1afe960e612bf3a0b410
Reviewed-on: https://go-review.googlesource.com/c/go/+/267138
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: fannie zhang <Fannie.Zhang@arm.com>
|
|
This patch adds support for CASx and CASPx atomic instructions.
go syntax                                 gnu syntax
CASD Rs, (Rn|RSP), Rt                  => cas Xs, Xt, (Xn|SP)
CASALW Rs, (Rn|RSP), Rt                => casal Ws, Wt, (Xn|SP)
CASPD (Rs, Rs+1), (Rn|RSP), (Rt, Rt+1) => casp Xs, Xs+1, Xt, Xt+1, (Xn|SP)
CASPW (Rs, Rs+1), (Rn|RSP), (Rt, Rt+1) => casp Ws, Ws+1, Wt, Wt+1, (Xn|SP)
This patch changes the type of prog.RestArgs from "[]Addr" to
"[]struct{Addr, Pos}", where Pos is an enum indicating the position of
the operand.
This patch also adds test cases.
Change-Id: Ib971cfda7890b7aa895d17bab22dea326c7fcaa4
Reviewed-on: https://go-review.googlesource.com/c/go/+/233277
Trust: fannie zhang <Fannie.Zhang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This patch enables the VSLI, VUADDW(2), VUSRA and FMOVQ SIMD instructions
required by issue #40725. The GNU syntax of 'FMOVQ' is the 128-bit
ldr/str (immediate, simd&fp).
Add test cases.
Fixes #40725
Change-Id: Ide968ef4a9385ce4cd8f69bce854289014d30456
Reviewed-on: https://go-review.googlesource.com/c/go/+/258397
Trust: fannie zhang <Fannie.Zhang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Armv8.2-SHA introduced four SHA3-related instructions:
EOR3 <Vd>.16B, <Vn>.16B, <Vm>.16B, <Va>.16B
RAX1 <Vd>.2D, <Vn>.2D, <Vm>.2D
XAR <Vd>.2D, <Vn>.2D, <Vm>.2D, #<imm6>
BCAX <Vd>.16B, <Vn>.16B, <Vm>.16B, <Va>.16B
We convert them into Go asm style as:
VEOR3 <Va>.B16, <Vm>.B16, <Vn>.B16, <Vd>.B16
VRAX1 <Vm>.D2, <Vn>.D2, <Vd>.D2
VXAR $imm6, <Vm>.D2, <Vn>.D2, <Vd>.D2
VBCAX <Va>.B16, <Vm>.B16, <Vn>.B16, <Vd>.B16
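The mapping from the GNU form to the Go form shown above (operands reversed, `16B` written as `B16`, `#imm` written as `$imm`, mnemonic prefixed with V) can be sketched as a string transformation. The helpers `toGoOperand` and `toGoAsm` are hypothetical and exist only to illustrate the convention; they are not part of the assembler.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// arrangement matches a GNU-style vector operand such as V1.16B.
var arrangement = regexp.MustCompile(`^(V\d+)\.(\d+)([BHSD])$`)

// toGoOperand rewrites one GNU-style operand: V1.16B becomes V1.B16 and
// #imm becomes $imm. Anything else is returned unchanged.
func toGoOperand(op string) string {
	if m := arrangement.FindStringSubmatch(op); m != nil {
		return m[1] + "." + m[3] + m[2]
	}
	if strings.HasPrefix(op, "#") {
		return "$" + op[1:]
	}
	return op
}

// toGoAsm converts "MNEMONIC op1, op2, ..." into the Go operand order.
func toGoAsm(gnu string) string {
	mnemonic, rest, _ := strings.Cut(strings.TrimSpace(gnu), " ")
	ops := strings.Split(rest, ",")
	out := make([]string, 0, len(ops))
	for i := len(ops) - 1; i >= 0; i-- {
		out = append(out, toGoOperand(strings.TrimSpace(ops[i])))
	}
	return "V" + mnemonic + " " + strings.Join(out, ", ")
}

func main() {
	fmt.Println(toGoAsm("EOR3 V0.16B, V1.16B, V2.16B, V3.16B"))
	fmt.Println(toGoAsm("XAR V0.2D, V1.2D, V2.2D, #10"))
}
```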
Armv8 Reference Manual:
* EOR3 (Three-way Exclusive OR) on C7.2.42
* RAX1 (Rotate and Exclusive OR) on C7.2.217
* XAR (Exclusive OR and Rotate) on C7.2.401
* BCAX (Bit Clear and Exclusive OR) on C7.2.12
Change-Id: I9a5d1b5ad508ed8fd5289d535906c54d9a63ca5a
Reviewed-on: https://go-review.googlesource.com/c/go/+/180757
Run-TryBot: Meng Zhuo <mzh@golangcn.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: Emmanuel Odeke <emm.odeke@gmail.com>
|
|
CL 249758 added the `FMOVQ $vcon, Vd` instruction, and the assembler used
128-bit simd literal-loading to load `$vcon` from the pool into the 128-bit vector
register `Vd`. Because Go does not have 128-bit integers for now, the
assembler reports an `immediate out of range` error when
assembling the `FMOVQ $0x123456789abcdef0123456789abcdef, V0` instruction.
This patch lets 128-bit integers take two 64-bit operands, for the high
and low parts separately, and adds the `VMOVQ $hi, $lo, Vd` instruction to
move `$hi<<64+$lo` into the 128-bit register `Vd`.
In addition, this patch renames the `FMOVQ/FMOVD/FMOVS` ops to `VMOVQ/VMOVD/VMOVS`
and uses them to move 128-bit, 64-bit and 32-bit constants into vector
registers, respectively.
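To see how a 128-bit constant maps onto the two 64-bit operands, the Go sketch below splits a hexadecimal value into its high and low halves with `math/big`. The helper `split128` is hypothetical; it only illustrates the hi<<64+lo relationship and is not code from the assembler.

```go
package main

import (
	"fmt"
	"math/big"
)

// split128 splits a hexadecimal constant (without 0x prefix) of up to
// 128 bits into two halves such that value == hi<<64 + lo.
func split128(hex string) (hi, lo uint64, ok bool) {
	v, good := new(big.Int).SetString(hex, 16)
	if !good || v.Sign() < 0 || v.BitLen() > 128 {
		return 0, 0, false
	}
	mask := new(big.Int).SetUint64(^uint64(0))
	lo = new(big.Int).And(v, mask).Uint64()
	hi = new(big.Int).Rsh(v, 64).Uint64()
	return hi, lo, true
}

func main() {
	hi, lo, _ := split128("123456789abcdef0123456789abcdef")
	fmt.Printf("VMOVQ $%#x, $%#x, V0\n", hi, lo)
}
```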
Update the go doc.
Fixes #40725
Change-Id: Ia3c83bb6463f104d2bee960905053a97299e0a3a
Reviewed-on: https://go-review.googlesource.com/c/go/+/255900
Trust: fannie zhang <Fannie.Zhang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
stack address
Currently, when the offset of "MOVD $offset(Rn), Rd" is a large positive
constant or a negative constant, the assembler will load this offset from
the constant pool. This patch gets rid of the constant pool by encoding the
offset into two ADD instructions if it's a large positive constant or one
SUB instruction if negative. Very large negative offsets are rarely
used, so we don't optimize that case here.
Optimized case 1: MOVD $-0x100000(R7), R0
Before: LDR 0x67670(constant pool), R27; ADD R27.UXTX, R0, R7
After: SUB $0x100000, R7, R0
Optimized case 2: MOVD $0x123468(R7), R0
Before: LDR 0x67670(constant pool), R27; ADD R27.UXTX, R0, R7
After: ADD $0x123000, R7, R27; ADD $0x000468, R27, R0
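The two optimized cases can be reproduced with a small Go sketch that splits the offset into a 12-bit-shifted high part and a 12-bit low part. The names `isAddSubImm` and `lowerOffset` are hypothetical, the exact range checks in the assembler may differ, and R27 is used as the temporary only because the example above does so.

```go
package main

import "fmt"

// isAddSubImm reports whether v fits an arm64 ADD/SUB immediate:
// a 12-bit value, optionally shifted left by 12.
func isAddSubImm(v int64) bool {
	return v >= 0 && (v&^0xfff == 0 || v&^0xfff000 == 0)
}

// lowerOffset returns a textual instruction sequence for
// "MOVD $off(rn), rd", or nil when the constant pool is still needed.
func lowerOffset(off int64, rn, rd string) []string {
	switch {
	case off >= 0 && isAddSubImm(off):
		return []string{fmt.Sprintf("ADD $%#x, %s, %s", off, rn, rd)}
	case off < 0 && isAddSubImm(-off):
		return []string{fmt.Sprintf("SUB $%#x, %s, %s", -off, rn, rd)}
	case off > 0 && off < 1<<24:
		hi, lo := off&^0xfff, off&0xfff
		return []string{
			fmt.Sprintf("ADD $%#x, %s, R27", hi, rn),
			fmt.Sprintf("ADD $%#x, R27, %s", lo, rd),
		}
	}
	return nil
}

func main() {
	fmt.Println(lowerOffset(-0x100000, "R7", "R0"))
	fmt.Println(lowerOffset(0x123468, "R7", "R0"))
}
```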
1. Binary size before/after.
binary                 size change
pkg/linux_arm64        +4KB
pkg/tool/linux_arm64   no change
go                     no change
gofmt                  no change
2. go1 benchmark.
name old time/op new time/op delta
pkg:test/bench/go1 goos:linux goarch:arm64
BinaryTree17-64 7335721401.800000ns +-40% 6264542009.800000ns +-14% ~ (p=0.421 n=5+5)
Fannkuch11-64 3886551822.600000ns +- 0% 3875870590.200000ns +- 0% ~ (p=0.151 n=5+5)
FmtFprintfEmpty-64 82.960000ns +- 1% 83.900000ns +- 2% +1.13% (p=0.048 n=5+5)
FmtFprintfString-64 149.200000ns +- 1% 148.000000ns +- 0% -0.80% (p=0.016 n=5+4)
FmtFprintfInt-64 177.000000ns +- 0% 178.400000ns +- 2% ~ (p=0.794 n=4+5)
FmtFprintfIntInt-64 240.200000ns +- 2% 239.400000ns +- 4% ~ (p=0.302 n=5+5)
FmtFprintfPrefixedInt-64 300.400000ns +- 0% 299.200000ns +- 1% ~ (p=0.119 n=5+5)
FmtFprintfFloat-64 360.000000ns +- 0% 361.600000ns +- 3% ~ (p=0.349 n=4+5)
FmtManyArgs-64 1064.400000ns +- 1% 1061.400000ns +- 0% ~ (p=0.087 n=5+5)
GobDecode-64 12080404.400000ns +- 2% 11637601.000000ns +- 1% -3.67% (p=0.008 n=5+5)
GobEncode-64 8474973.800000ns +- 2% 7977801.600000ns +- 2% -5.87% (p=0.008 n=5+5)
Gzip-64 416501238.400000ns +- 0% 410463405.400000ns +- 0% -1.45% (p=0.008 n=5+5)
Gunzip-64 58088415.200000ns +- 0% 58826209.600000ns +- 0% +1.27% (p=0.008 n=5+5)
HTTPClientServer-64 128660.200000ns +-23% 117840.800000ns +- 8% ~ (p=0.222 n=5+5)
JSONEncode-64 17547746.800000ns +- 4% 17216180.000000ns +- 1% ~ (p=0.222 n=5+5)
JSONDecode-64 80879896.000000ns +- 1% 80063737.200000ns +- 0% -1.01% (p=0.008 n=5+5)
Mandelbrot200-64 5484901.600000ns +- 0% 5483614.400000ns +- 0% ~ (p=0.310 n=5+5)
GoParse-64 6201166.800000ns +- 6% 6150920.600000ns +- 1% ~ (p=0.548 n=5+5)
RegexpMatchEasy0_32-64 135.000000ns +- 0% 139.200000ns +- 7% ~ (p=0.643 n=5+5)
RegexpMatchEasy0_1K-64 484.600000ns +- 2% 483.800000ns +- 2% ~ (p=0.984 n=5+5)
RegexpMatchEasy1_32-64 128.000000ns +- 1% 124.600000ns +- 1% -2.66% (p=0.008 n=5+5)
RegexpMatchEasy1_1K-64 769.400000ns +- 2% 761.400000ns +- 1% ~ (p=0.460 n=5+5)
RegexpMatchMedium_32-64 12.900000ns +- 0% 12.500000ns +- 0% -3.10% (p=0.008 n=5+5)
RegexpMatchMedium_1K-64 57879.200000ns +- 1% 56512.200000ns +- 0% -2.36% (p=0.008 n=5+5)
RegexpMatchHard_32-64 3091.600000ns +- 1% 3071.000000ns +- 0% -0.67% (p=0.048 n=5+5)
RegexpMatchHard_1K-64 92941.200000ns +- 1% 92794.000000ns +- 0% ~ (p=1.000 n=5+5)
Revcomp-64 1695605187.000000ns +-54% 1821697637.400000ns +-47% ~ (p=1.000 n=5+5)
Template-64 112839686.800000ns +- 1% 109964069.200000ns +- 3% ~ (p=0.095 n=5+5)
TimeParse-64 587.000000ns +- 0% 587.000000ns +- 0% ~ (all equal)
TimeFormat-64 586.000000ns +- 1% 584.200000ns +- 1% ~ (p=0.659 n=5+5)
[Geo mean] 81804.262218ns 80694.712973ns -1.36%
name old speed new speed delta
pkg:test/bench/go1 goos:linux goarch:arm64
GobDecode-64 63.6MB/s +- 2% 66.0MB/s +- 1% +3.78% (p=0.008 n=5+5)
GobEncode-64 90.6MB/s +- 2% 96.2MB/s +- 2% +6.23% (p=0.008 n=5+5)
Gzip-64 46.6MB/s +- 0% 47.3MB/s +- 0% +1.47% (p=0.008 n=5+5)
Gunzip-64 334MB/s +- 0% 330MB/s +- 0% -1.25% (p=0.008 n=5+5)
JSONEncode-64 111MB/s +- 4% 113MB/s +- 1% ~ (p=0.222 n=5+5)
JSONDecode-64 24.0MB/s +- 1% 24.2MB/s +- 0% +1.02% (p=0.008 n=5+5)
GoParse-64 9.35MB/s +- 6% 9.42MB/s +- 1% ~ (p=0.571 n=5+5)
RegexpMatchEasy0_32-64 237MB/s +- 0% 231MB/s +- 7% ~ (p=0.690 n=5+5)
RegexpMatchEasy0_1K-64 2.11GB/s +- 2% 2.12GB/s +- 2% ~ (p=1.000 n=5+5)
RegexpMatchEasy1_32-64 250MB/s +- 1% 257MB/s +- 1% +2.63% (p=0.008 n=5+5)
RegexpMatchEasy1_1K-64 1.33GB/s +- 2% 1.35GB/s +- 1% ~ (p=0.548 n=5+5)
RegexpMatchMedium_32-64 77.6MB/s +- 0% 79.8MB/s +- 0% +2.80% (p=0.008 n=5+5)
RegexpMatchMedium_1K-64 17.7MB/s +- 1% 18.1MB/s +- 0% +2.41% (p=0.008 n=5+5)
RegexpMatchHard_32-64 10.4MB/s +- 1% 10.4MB/s +- 0% ~ (p=0.056 n=5+5)
RegexpMatchHard_1K-64 11.0MB/s +- 1% 11.0MB/s +- 0% ~ (p=0.984 n=5+5)
Revcomp-64 188MB/s +-71% 155MB/s +-71% ~ (p=1.000 n=5+5)
Template-64 17.2MB/s +- 1% 17.7MB/s +- 3% ~ (p=0.095 n=5+5)
[Geo mean] 79.2MB/s 79.3MB/s +0.24%
Change-Id: I593ac3e7037afafc3605ad4b0cfb51d5dd88015d
Reviewed-on: https://go-review.googlesource.com/c/go/+/232438
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This CL adds USHLL, USHLL2, UZP1, UZP2, and BIF instructions requested
by #40725. And since UXTL* are aliases of USHLL*, this CL also merges
them into one case.
Updates #40725
Change-Id: I404a4fdaf953319f72eea548175bec1097a2a816
Reviewed-on: https://go-review.googlesource.com/c/go/+/253659
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Enable the VBSL, VBIT, VCMTST, VUXTL, VUXTL2 and FMOVQ SIMD
instructions required by issue #40725. The FMOVQ
instruction is used to move a large constant to a Vn
register.
Add test cases.
Fixes #40725
Change-Id: I1cac1922a0a0165d698a4b73a41f7a5f0a0ad549
Reviewed-on: https://go-review.googlesource.com/c/go/+/249758
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
ARMv8.2-SHA adds SHA512 instructions:
1. SHA512H Vm.D2, Vn, Vd
2. SHA512H2 Vm.D2, Vn, Vd
3. SHA512SU0 Vn.D2, Vd.D2
4. SHA512SU1 Vm.D2, Vn.D2, Vd.D2
ARMv8 Architecture Reference Manual C7.2.234-C7.2.234
Change-Id: Ie970fef1bba5312ad466f246035da4c40a1bbb39
Reviewed-on: https://go-review.googlesource.com/c/go/+/180057
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Add support for the asimd instruction rev16, which reverses elements in
16-bit halfwords.
syntax:
VREV16 <Vn>.<T>, <Vd>.<T>
<T> should be either B8 or B16.
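For byte arrangements, reversing elements in 16-bit halfwords amounts to swapping each pair of adjacent bytes. The Go sketch below models that on a byte slice; `rev16` here is an illustrative function, not the assembler's encoder.

```go
package main

import "fmt"

// rev16 returns a copy of b with the two bytes of every 16-bit
// halfword swapped. A trailing odd byte, if any, is left in place.
func rev16(b []byte) []byte {
	out := make([]byte, len(b))
	copy(out, b)
	for i := 0; i+1 < len(out); i += 2 {
		out[i], out[i+1] = out[i+1], out[i]
	}
	return out
}

func main() {
	fmt.Printf("% x\n", rev16([]byte{0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77}))
}
```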
Change-Id: I7a7b8e772589c51ca9eb6dca98bab1aac863c6c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/213738
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This patch uses the symbol NOOP to support the arm64 instruction NOP. In
arm64, NOP (No Operation) does nothing other than
advance the value of the program counter by 4. This instruction
can be used for instruction alignment purposes. NOOP is used
instead of NOP because we already have a generic
"NOP" instruction, which is a zero-width pseudo-instruction.
In arm64, instruction NOP is an alias of HINT #0. This patch adds
test cases for instruction HINT #0.
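The alias relationship can be checked against the A64 encoding: HINT places a 7-bit immediate in bits 11:5 of the word 0xd503201f, so HINT #0 is the same word as NOP. The helper name `hint` below is made up for this sketch and is not the assembler's code.

```go
package main

import "fmt"

// hintBase is the A64 encoding of HINT #0, which is also NOP.
const hintBase uint32 = 0xd503201f

// hint returns the encoding of HINT #imm. The 7-bit immediate
// occupies bits 11:5 of the instruction word.
func hint(imm uint32) uint32 {
	return hintBase | (imm&0x7f)<<5
}

func main() {
	fmt.Printf("%#x\n", hint(0)) // prints 0xd503201f (NOP)
	fmt.Printf("%#x\n", hint(1)) // prints 0xd503203f (YIELD)
}
```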
Change-Id: I54e6854c46516eb652b412ef9e0f73ab7f171f8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/200578
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This CL adds system register error checking test cases. There are two kinds of
error test cases:
1. illegal combination.
MRS should be used in this way: MRS <system register>, <general register>.
MSR should be used in this way: MSR <general register>, <system register>.
Error usage examples:
MRS R8, VTCR_EL2 // ERROR "illegal combination"
MSR VTCR_EL2, R8 // ERROR "illegal combination"
2. illegal read or write access.
Error usage examples:
MSR R7, MIDR_EL1 // ERROR "expected writable system register or pstate"
MRS OSLAR_EL1, R3 // ERROR "expected readable system register"
This CL reads each system register's readable and writable properties to check whether
it is used with legal read or write access. This property is named AccessFlags
in sysRegEnc.go, and it is automatically generated by modifying the system register
generator.
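The access check described above can be sketched in Go as follows. The flag names, the table and the two check functions are hypothetical and only mirror the idea of an AccessFlags property; the real generated table in sysRegEnc.go may look different. MIDR_EL1 is modeled as read-only and OSLAR_EL1 as write-only, which follows from the error examples above; treating VTCR_EL2 as readable and writable is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// accessFlags is a hypothetical stand-in for the AccessFlags property.
type accessFlags uint8

const (
	srRead accessFlags = 1 << iota
	srWrite
)

// sysRegs is an illustrative table, not the generated one.
var sysRegs = map[string]accessFlags{
	"MIDR_EL1":  srRead,
	"OSLAR_EL1": srWrite,
	"VTCR_EL2":  srRead | srWrite, // assumed read/write
}

// checkMRS validates "MRS <system register>, <general register>",
// which reads the system register.
func checkMRS(reg string) error {
	if sysRegs[reg]&srRead == 0 {
		return errors.New("expected readable system register")
	}
	return nil
}

// checkMSR validates "MSR <general register>, <system register>",
// which writes the system register.
func checkMSR(reg string) error {
	if sysRegs[reg]&srWrite == 0 {
		return errors.New("expected writable system register or pstate")
	}
	return nil
}

func main() {
	fmt.Println(checkMSR("MIDR_EL1"))  // write to a read-only register
	fmt.Println(checkMRS("OSLAR_EL1")) // read from a write-only register
	fmt.Println(checkMRS("VTCR_EL2"))  // legal read
}
```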
Change-Id: Ic83d5f372de38d1ecd0df1ca56b354ee157f16b4
Reviewed-on: https://go-review.googlesource.com/c/go/+/194917
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This change adds VLD1R, VLD2R, VLD3R, VLD4R
Change-Id: Ie19e9ae02fdfc94b9344acde8c9938849efb0bf0
Reviewed-on: https://go-review.googlesource.com/c/go/+/181697
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This change adds VLD2, VLD3, VLD4, VST2, VST3, VST4 (multiple structures)
for image or multimedia optimization.
Change-Id: Iae3538ef4434e436e3fb2f19153c58f918f773af
Reviewed-on: https://go-review.googlesource.com/c/go/+/166518
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This patch supports the EL0 and EL1 system registers used in MRS/MSR
instructions. This patch refactors the assembler code, allowing the
assembler to read system register information from the automatically
generated sysRegEnc.go file and move existing declared system registers
to the sysRegEnc.go file.
This patch adds 431 system registers. It is worth noting that the number
of special registers is initialized to less than 1024 in the list7.go file.
This CL also adds some test cases to test the newly added system registers.
The test cases are contributed by Dianhong Xu <Dianhong.Xu@arm.com>
Change-Id: Ic09a937eaaeefe82bd08b5dd726808f8ff6cebf6
Reviewed-on: https://go-review.googlesource.com/c/go/+/189577
Reviewed-by: Ben Shi <powerman1st@163.com>
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Currently we use R16 and R17 for ARM64's Duff's devices.
According to ARM64 ABI, R16 and R17 can be used by the (external)
linker as scratch registers in trampolines. So don't use these
registers to pass information across functions.
It seems unlikely that calling Duff's devices would need a
trampoline in normal cases. But it could happen if the call
target is out of the 128 MB direct jump limit.
The choice of R20 and R21 is kind of arbitrary. The register
allocator allocates from low-numbered registers. High-numbered
registers are chosen so that they are unlikely to hold a live value,
which would force a spill.
Fixes #32773.
Change-Id: Id22d555b5afeadd4efcf62797d1580d641c39218
Reviewed-on: https://go-review.googlesource.com/c/go/+/183842
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This change adds several arm64 v8.1 atomic instructions and test cases.
They are LDADDAx, LDADDLx, LDANDAx, LDANDALx, LDANDLx, LDEORAx, LDEORALx,
LDEORLx, LDORAx, LDORALx, LDORLx, SWPAx and SWPLx. Their form is consistent
with the form of the existing atomic instructions.
For instructions STXRx, STLXRx, STXPx and STLXPx, the second destination
register can't be RSP. This CL also adds a check for this.
LDADDx Rs, (Rb), Rt: *Rb -> Rt, Rs + *Rb -> *Rb
LDANDx Rs, (Rb), Rt: *Rb -> Rt, Rs AND NOT(*Rb) -> *Rb
LDEORx Rs, (Rb), Rt: *Rb -> Rt, Rs EOR *Rb -> *Rb
LDORx Rs, (Rb), Rt: *Rb -> Rt, Rs OR *Rb -> *Rb
Change-Id: I9f9b0245958cb57ab7d88c66fb9159b23b9017fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/157001
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
instructions
The current assembler loads large constants from the constant pool. This CL
gets rid of the pool by using MOVZ/MOVN and MOVK to load large
constants.
This CL changes the assembler behavior as follows.
1. go assembly:
   (1) MOVD $0x1111222233334444, R1
   (2) MOVD $0x1111ffff1111ffff, R1
previous version: MOVD 0x9a4, R1 (loads constant from pool).
optimized version:
   (1) MOVD $0x4444, R1; MOVK $(0x3333<<16), R1; MOVK $(0x2222<<32), R1; MOVK $(0x1111<<48), R1.
   (2) MOVN $(0xeeee<<16), R1; MOVK $(0x1111<<48), R1.
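The decomposition shown above can be modeled with a short Go sketch. It is a simplified illustration, not the assembler's actual code: the type `step` and the functions `splitConst` and `eval` are hypothetical names, and the heuristic (start with MOVN when more halfwords are 0xffff than 0x0000) is an assumption chosen because it reproduces both sequences listed above.

```go
package main

import "fmt"

// step is one instruction of a hypothetical MOVZ/MOVN/MOVK sequence.
type step struct {
	Op    string // "MOVZ", "MOVN" or "MOVK"
	Imm16 uint16 // 16-bit immediate
	Shift uint   // 0, 16, 32 or 48
}

// splitConst breaks a 64-bit constant into 16-bit halfwords and emits
// one MOVZ or MOVN followed by a MOVK for every remaining halfword
// that differs from the fill pattern.
func splitConst(v uint64) []step {
	var zeros, ones int
	for s := uint(0); s < 64; s += 16 {
		switch uint16(v >> s) {
		case 0:
			zeros++
		case 0xffff:
			ones++
		}
	}
	op, fill := "MOVZ", uint16(0)
	if ones > zeros {
		op, fill = "MOVN", 0xffff
	}
	var steps []step
	for s := uint(0); s < 64; s += 16 {
		h := uint16(v >> s)
		if h == fill {
			continue // already produced by the first instruction
		}
		if len(steps) == 0 {
			imm := h
			if op == "MOVN" {
				imm = ^h // MOVN writes the inverted immediate
			}
			steps = append(steps, step{op, imm, s})
		} else {
			steps = append(steps, step{"MOVK", h, s})
		}
	}
	if len(steps) == 0 {
		steps = append(steps, step{op, 0, 0}) // v is 0 or all ones
	}
	return steps
}

// eval replays a sequence to recover the constant it loads.
func eval(steps []step) uint64 {
	var r uint64
	for _, st := range steps {
		imm := uint64(st.Imm16) << st.Shift
		switch st.Op {
		case "MOVZ":
			r = imm
		case "MOVN":
			r = ^imm
		case "MOVK":
			r = r&^(uint64(0xffff)<<st.Shift) | imm
		}
	}
	return r
}

func main() {
	for _, v := range []uint64{0x1111222233334444, 0x1111ffff1111ffff} {
		for _, st := range splitConst(v) {
			fmt.Printf("%s $(%#x<<%d); ", st.Op, st.Imm16, st.Shift)
		}
		fmt.Printf("=> %#x\n", eval(splitConst(v)))
	}
}
```

Replaying the sequence with `eval` is a cheap way to confirm that a decomposition reconstructs the original constant.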
Add test cases. Below are the binary size comparison and benchmark results.
1. Binary size before/after
binary size change
pkg/linux_arm64 +25.4KB
pkg/tool/linux_arm64 -2.9KB
go -2KB
gofmt no change
2. compiler benchmark.
name old time/op new time/op delta
Template 574ms ±21% 577ms ±14% ~ (p=0.853 n=10+10)
Unicode 327ms ±29% 353ms ±23% ~ (p=0.360 n=10+8)
GoTypes 1.97s ± 8% 2.04s ±11% ~ (p=0.143 n=10+10)
Compiler 9.13s ± 9% 9.25s ± 8% ~ (p=0.684 n=10+10)
SSA 29.2s ± 5% 27.0s ± 4% -7.40% (p=0.000 n=10+10)
Flate 402ms ±40% 308ms ± 6% -23.29% (p=0.004 n=10+10)
GoParser 470ms ±26% 382ms ±10% -18.82% (p=0.000 n=9+10)
Reflect 1.36s ±16% 1.17s ± 7% -13.92% (p=0.001 n=9+10)
Tar 561ms ±19% 466ms ±15% -17.08% (p=0.000 n=9+10)
XML 745ms ±20% 679ms ±20% ~ (p=0.123 n=10+10)
StdCmd 35.5s ± 6% 37.2s ± 3% +4.81% (p=0.001 n=9+8)
name old user-time/op new user-time/op delta
Template 625ms ±14% 660ms ±18% ~ (p=0.343 n=10+10)
Unicode 355ms ±10% 373ms ±20% ~ (p=0.346 n=9+10)
GoTypes 2.39s ± 8% 2.37s ± 5% ~ (p=0.897 n=10+10)
Compiler 11.1s ± 4% 11.4s ± 2% +2.63% (p=0.010 n=10+9)
SSA 35.4s ± 3% 34.9s ± 2% ~ (p=0.113 n=10+9)
Flate 402ms ±13% 371ms ±30% ~ (p=0.089 n=10+9)
GoParser 513ms ± 8% 489ms ±24% -4.76% (p=0.039 n=9+9)
Reflect 1.52s ±12% 1.41s ± 5% -7.32% (p=0.001 n=9+10)
Tar 607ms ±10% 558ms ± 8% -7.96% (p=0.009 n=9+10)
XML 828ms ±10% 789ms ±12% ~ (p=0.059 n=10+10)
name old text-bytes new text-bytes delta
HelloSize 714kB ± 0% 712kB ± 0% -0.23% (p=0.000 n=10+10)
CmdGoSize 8.26MB ± 0% 8.25MB ± 0% -0.14% (p=0.000 n=10+10)
name old data-bytes new data-bytes delta
HelloSize 10.5kB ± 0% 10.5kB ± 0% ~ (all equal)
CmdGoSize 258kB ± 0% 258kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 125kB ± 0% 125kB ± 0% ~ (all equal)
CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.18MB ± 0% 1.18MB ± 0% ~ (all equal)
CmdGoSize 11.2MB ± 0% 11.2MB ± 0% -0.13% (p=0.000 n=10+10)
3. go1 benchmark.
name old time/op new time/op delta
BinaryTree17 6.60s ±18% 7.36s ±22% ~ (p=0.222 n=5+5)
Fannkuch11 4.04s ± 0% 4.05s ± 0% ~ (p=0.421 n=5+5)
FmtFprintfEmpty 91.8ns ±14% 91.2ns ± 9% ~ (p=0.667 n=5+5)
FmtFprintfString 145ns ± 0% 151ns ± 6% ~ (p=0.397 n=4+5)
FmtFprintfInt 169ns ± 0% 176ns ± 5% +4.14% (p=0.016 n=4+5)
FmtFprintfIntInt 229ns ± 2% 243ns ± 6% ~ (p=0.143 n=5+5)
FmtFprintfPrefixedInt 343ns ± 0% 350ns ± 3% +1.92% (p=0.048 n=5+5)
FmtFprintfFloat 400ns ± 3% 394ns ± 3% ~ (p=0.063 n=5+5)
FmtManyArgs 1.04µs ± 0% 1.05µs ± 0% +1.62% (p=0.029 n=4+4)
GobDecode 13.9ms ± 4% 13.9ms ± 5% ~ (p=1.000 n=5+5)
GobEncode 10.6ms ± 4% 10.6ms ± 5% ~ (p=0.421 n=5+5)
Gzip 567ms ± 1% 563ms ± 4% ~ (p=0.548 n=5+5)
Gunzip 60.2ms ± 1% 60.4ms ± 0% ~ (p=0.056 n=5+5)
HTTPClientServer 114µs ± 4% 108µs ± 7% ~ (p=0.095 n=5+5)
JSONEncode 18.4ms ± 2% 17.8ms ± 2% -3.06% (p=0.016 n=5+5)
JSONDecode 105ms ± 1% 103ms ± 2% ~ (p=0.056 n=5+5)
Mandelbrot200 5.48ms ± 0% 5.49ms ± 0% ~ (p=0.841 n=5+5)
GoParse 6.05ms ± 1% 6.05ms ± 2% ~ (p=1.000 n=5+5)
RegexpMatchEasy0_32 143ns ± 1% 146ns ± 4% +2.10% (p=0.048 n=4+5)
RegexpMatchEasy0_1K 499ns ± 1% 492ns ± 2% ~ (p=0.079 n=5+5)
RegexpMatchEasy1_32 137ns ± 0% 136ns ± 1% -0.73% (p=0.016 n=4+5)
RegexpMatchEasy1_1K 826ns ± 4% 823ns ± 2% ~ (p=0.841 n=5+5)
RegexpMatchMedium_32 224ns ± 5% 233ns ± 8% ~ (p=0.119 n=5+5)
RegexpMatchMedium_1K 59.6µs ± 0% 59.3µs ± 1% -0.66% (p=0.016 n=4+5)
RegexpMatchHard_32 3.29µs ± 3% 3.26µs ± 1% ~ (p=0.889 n=5+5)
RegexpMatchHard_1K 98.8µs ± 2% 99.0µs ± 0% ~ (p=0.690 n=5+5)
Revcomp 1.02s ± 1% 1.01s ± 1% ~ (p=0.095 n=5+5)
Template 135ms ± 5% 131ms ± 1% ~ (p=0.151 n=5+5)
TimeParse 591ns ± 0% 593ns ± 0% +0.20% (p=0.048 n=5+5)
TimeFormat 655ns ± 2% 607ns ± 0% -7.42% (p=0.016 n=5+4)
[Geo mean] 93.5µs 93.8µs +0.23%
name old speed new speed delta
GobDecode 55.1MB/s ± 4% 55.1MB/s ± 4% ~ (p=1.000 n=5+5)
GobEncode 72.4MB/s ± 4% 72.3MB/s ± 5% ~ (p=0.421 n=5+5)
Gzip 34.2MB/s ± 1% 34.5MB/s ± 4% ~ (p=0.548 n=5+5)
Gunzip 322MB/s ± 1% 321MB/s ± 0% ~ (p=0.056 n=5+5)
JSONEncode 106MB/s ± 2% 109MB/s ± 2% +3.16% (p=0.016 n=5+5)
JSONDecode 18.5MB/s ± 1% 18.8MB/s ± 2% ~ (p=0.056 n=5+5)
GoParse 9.57MB/s ± 1% 9.57MB/s ± 2% ~ (p=0.952 n=5+5)
RegexpMatchEasy0_32 223MB/s ± 1% 221MB/s ± 0% -1.10% (p=0.029 n=4+4)
RegexpMatchEasy0_1K 2.05GB/s ± 1% 2.08GB/s ± 2% ~ (p=0.095 n=5+5)
RegexpMatchEasy1_32 232MB/s ± 0% 234MB/s ± 1% +0.76% (p=0.016 n=4+5)
RegexpMatchEasy1_1K 1.24GB/s ± 4% 1.24GB/s ± 2% ~ (p=0.841 n=5+5)
RegexpMatchMedium_32 4.45MB/s ± 5% 4.20MB/s ± 1% -5.63% (p=0.000 n=5+4)
RegexpMatchMedium_1K 17.2MB/s ± 0% 17.3MB/s ± 1% +0.66% (p=0.016 n=4+5)
RegexpMatchHard_32 9.73MB/s ± 3% 9.83MB/s ± 1% ~ (p=0.889 n=5+5)
RegexpMatchHard_1K 10.4MB/s ± 2% 10.3MB/s ± 0% ~ (p=0.635 n=5+5)
Revcomp 249MB/s ± 1% 252MB/s ± 1% ~ (p=0.095 n=5+5)
Template 14.4MB/s ± 4% 14.8MB/s ± 1% ~ (p=0.151 n=5+5)
[Geo mean] 62.1MB/s 62.3MB/s +0.34%
Fixes #10108
Change-Id: I79038f3c4c2ff874c136053d1a2b1c8a5a9cfac5
Reviewed-on: https://go-review.googlesource.com/c/118796
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
The current assembler saves constants in Offset, whose type is int64,
causing 32-bit constants to have an incorrect class. This CL reclassifies
constants when opcodes are 32-bit variants, like MOVW, ANDW and
ADDW, etc. Besides, this CL encodes some constants of ADDCON class
as MOV instructions.
This CL changes the assembler behavior as follows.
1. go assembler ADDW $MOVCON, Rn, Rd
previous version: MOVD $MOVCON, Rtmp; ADDW Rtmp, Rn, Rd
current version: MOVW $MOVCON, Rtmp; ADDW Rtmp, Rn, Rd
2. go assembly MOVW $0xaaaaffff, R1
previous version: treats $0xaaaaffff as VCON, encodes it as MOVW 0x994, R1 (loads it from pool).
current version: treats $0xaaaaffff as MOVCON, and encodes it into MOVW instructions.
3. go assembly MOVD $0x210000, R1
previous version: treats $0x210000 as ADDCON, loads it from pool
current version: treats $0x210000 as MOVCON, and encodes it into MOVD instructions.
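Why the operand width matters can be seen with a simplified model. Here a constant counts as loadable by a single MOVZ or MOVN when the value, or its bitwise complement within the operand width, has at most one nonzero 16-bit halfword. The function names are hypothetical and the assembler's real constant classes are more detailed than this sketch.

```go
package main

import "fmt"

// atMostOneHalfword reports whether v has at most one nonzero 16-bit
// halfword within the low `width` bits.
func atMostOneHalfword(v uint64, width uint) bool {
	n := 0
	for s := uint(0); s < width; s += 16 {
		if uint16(v>>s) != 0 {
			n++
		}
	}
	return n <= 1
}

// singleMov is a simplified model of "fits one MOVZ or MOVN" for a
// 32-bit or 64-bit operand width.
func singleMov(v uint64, width uint) bool {
	mask := ^uint64(0)
	if width == 32 {
		mask = 0xffffffff
	}
	v &= mask
	return atMostOneHalfword(v, width) || atMostOneHalfword(^v&mask, width)
}

func main() {
	// As a 64-bit value, 0xaaaaffff needs more than one instruction.
	fmt.Println(singleMov(0xaaaaffff, 64)) // prints false
	// As a 32-bit value, its complement is 0x55550000, so one MOVN works.
	fmt.Println(singleMov(0xaaaaffff, 32)) // prints true
	// 0x210000 is 0x21<<16, so a single MOVZ works at either width.
	fmt.Println(singleMov(0x210000, 64)) // prints true
}
```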
Add the test cases.
1. Binary size before/after.
binary size change
pkg/linux_arm64 -1.534KB
pkg/tool/linux_arm64 -0.718KB
go -0.32KB
gofmt no change
2. go1 benchmark result.
name old time/op new time/op delta
BinaryTree17-8 6.26s ± 1% 6.28s ± 1% ~ (p=0.105 n=10+10)
Fannkuch11-8 5.40s ± 0% 5.39s ± 0% -0.29% (p=0.028 n=9+10)
FmtFprintfEmpty-8 94.5ns ± 0% 95.0ns ± 0% +0.51% (p=0.000 n=10+9)
FmtFprintfString-8 163ns ± 1% 159ns ± 1% -2.06% (p=0.000 n=10+9)
FmtFprintfInt-8 200ns ± 1% 196ns ± 1% -1.99% (p=0.000 n=9+10)
FmtFprintfIntInt-8 292ns ± 3% 284ns ± 1% -2.87% (p=0.001 n=10+9)
FmtFprintfPrefixedInt-8 422ns ± 1% 420ns ± 1% -0.59% (p=0.015 n=10+10)
FmtFprintfFloat-8 458ns ± 0% 463ns ± 1% +1.19% (p=0.000 n=9+10)
FmtManyArgs-8 1.37µs ± 1% 1.35µs ± 1% -1.85% (p=0.000 n=10+10)
GobDecode-8 15.5ms ± 1% 15.3ms ± 1% -1.82% (p=0.000 n=10+10)
GobEncode-8 11.7ms ± 5% 11.7ms ± 2% ~ (p=0.549 n=10+9)
Gzip-8 622ms ± 0% 624ms ± 0% +0.23% (p=0.000 n=10+9)
Gunzip-8 73.6ms ± 0% 73.8ms ± 1% ~ (p=0.077 n=9+9)
HTTPClientServer-8 115µs ± 1% 115µs ± 1% ~ (p=0.796 n=10+10)
JSONEncode-8 31.1ms ± 2% 28.7ms ± 1% -7.98% (p=0.000 n=10+9)
JSONDecode-8 145ms ± 0% 145ms ± 1% ~ (p=0.447 n=9+10)
Mandelbrot200-8 9.67ms ± 0% 9.60ms ± 0% -0.76% (p=0.000 n=9+9)
GoParse-8 7.56ms ± 1% 7.58ms ± 0% +0.21% (p=0.035 n=10+9)
RegexpMatchEasy0_32-8 208ns ±10% 222ns ± 0% ~ (p=0.531 n=10+6)
RegexpMatchEasy0_1K-8 699ns ± 4% 694ns ± 4% ~ (p=0.868 n=10+10)
RegexpMatchEasy1_32-8 186ns ± 8% 190ns ±12% ~ (p=0.955 n=10+10)
RegexpMatchEasy1_1K-8 1.13µs ± 1% 1.05µs ± 2% -6.64% (p=0.000 n=10+10)
RegexpMatchMedium_32-8 316ns ± 7% 288ns ± 1% -8.68% (p=0.000 n=10+7)
RegexpMatchMedium_1K-8 90.2µs ± 0% 85.5µs ± 2% -5.19% (p=0.000 n=10+10)
RegexpMatchHard_32-8 5.53µs ± 0% 3.90µs ± 0% -29.52% (p=0.000 n=10+10)
RegexpMatchHard_1K-8 119µs ± 0% 124µs ± 0% +4.29% (p=0.000 n=9+10)
Revcomp-8 1.07s ± 0% 1.07s ± 0% ~ (p=0.094 n=9+9)
Template-8 162ms ± 1% 160ms ± 2% ~ (p=0.089 n=10+10)
TimeParse-8 756ns ± 2% 763ns ± 1% ~ (p=0.158 n=10+10)
TimeFormat-8 758ns ± 1% 746ns ± 1% -1.52% (p=0.000 n=10+10)
name old speed new speed delta
GobDecode-8 49.4MB/s ± 1% 50.3MB/s ± 1% +1.84% (p=0.000 n=10+10)
GobEncode-8 65.6MB/s ± 5% 65.4MB/s ± 2% ~ (p=0.549 n=10+9)
Gzip-8 31.2MB/s ± 0% 31.1MB/s ± 0% -0.24% (p=0.000 n=9+9)
Gunzip-8 264MB/s ± 0% 263MB/s ± 1% ~ (p=0.073 n=9+9)
JSONEncode-8 62.3MB/s ± 2% 67.7MB/s ± 1% +8.67% (p=0.000 n=10+9)
JSONDecode-8 13.4MB/s ± 0% 13.4MB/s ± 1% ~ (p=0.508 n=9+10)
GoParse-8 7.66MB/s ± 1% 7.64MB/s ± 0% -0.23% (p=0.049 n=10+9)
RegexpMatchEasy0_32-8 154MB/s ± 9% 143MB/s ± 3% ~ (p=0.303 n=10+7)
RegexpMatchEasy0_1K-8 1.46GB/s ± 4% 1.47GB/s ± 4% ~ (p=0.912 n=10+10)
RegexpMatchEasy1_32-8 172MB/s ± 9% 170MB/s ±12% ~ (p=0.971 n=10+10)
RegexpMatchEasy1_1K-8 908MB/s ± 1% 972MB/s ± 2% +7.12% (p=0.000 n=10+10)
RegexpMatchMedium_32-8 3.17MB/s ± 7% 3.46MB/s ± 1% +9.14% (p=0.000 n=10+7)
RegexpMatchMedium_1K-8 11.3MB/s ± 0% 12.0MB/s ± 2% +5.51% (p=0.000 n=10+10)
RegexpMatchHard_32-8 5.78MB/s ± 0% 8.21MB/s ± 0% +41.93% (p=0.000 n=9+10)
RegexpMatchHard_1K-8 8.62MB/s ± 0% 8.27MB/s ± 0% -4.11% (p=0.000 n=9+10)
Revcomp-8 237MB/s ± 0% 237MB/s ± 0% ~ (p=0.081 n=9+9)
Template-8 12.0MB/s ± 1% 12.1MB/s ± 2% ~ (p=0.072 n=10+10)
Change-Id: I080801f520366b42d5f9699954bd33106976a81b
Reviewed-on: https://go-review.googlesource.com/c/120661
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Currently "ADD $0x123456, Rs, Rd" will load pre-stored 0x123456
from the constant pool and use it for the addition. This costs
12 bytes in total. The same applies to SUB.
This CL breaks it into "ADD $0x123000, Rs, Rd" + "ADD $0x000456, Rd, Rd".
Both "0x123000" and "0x000456" can be directly encoded into the
instruction binary code. So 4 bytes are saved.
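The split works because the A64 ADD/SUB immediate form takes a 12-bit unsigned value that may optionally be shifted left by 12 bits, so any constant below 1<<24 can be covered by two such immediates. The helper below is a hypothetical sketch of that arithmetic, not the assembler's code.

```go
package main

import "fmt"

// splitAddConst splits v into a high part (a 12-bit value shifted left
// by 12) and a low 12-bit part, each of which fits the ADD/SUB
// immediate encoding. ok is false when v does not fit in 24 bits.
func splitAddConst(v uint32) (hi, lo uint32, ok bool) {
	if v >= 1<<24 {
		return 0, 0, false
	}
	return v &^ 0xfff, v & 0xfff, true
}

func main() {
	hi, lo, ok := splitAddConst(0x123456)
	fmt.Printf("%#x %#x %v\n", hi, lo, ok) // prints 0x123000 0x456 true
}
```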
1. The total size of pkg/android_arm64 decreases about 0.3KB.
2. The go1 benchmark shows little regression (excluding noise).
name old time/op new time/op delta
BinaryTree17-4 15.9s ± 0% 15.9s ± 1% +0.10% (p=0.044 n=29+29)
Fannkuch11-4 8.72s ± 0% 8.75s ± 0% +0.34% (p=0.000 n=30+24)
FmtFprintfEmpty-4 173ns ± 0% 173ns ± 0% ~ (all equal)
FmtFprintfString-4 368ns ± 0% 368ns ± 0% ~ (p=0.593 n=30+30)
FmtFprintfInt-4 417ns ± 0% 417ns ± 0% ~ (all equal)
FmtFprintfIntInt-4 673ns ± 0% 661ns ± 1% -1.70% (p=0.000 n=30+30)
FmtFprintfPrefixedInt-4 805ns ± 0% 805ns ± 0% +0.10% (p=0.011 n=30+30)
FmtFprintfFloat-4 1.09µs ± 0% 1.09µs ± 0% ~ (p=0.125 n=30+29)
FmtManyArgs-4 2.68µs ± 0% 2.68µs ± 0% +0.07% (p=0.004 n=30+30)
GobDecode-4 32.9ms ± 0% 33.2ms ± 1% +1.07% (p=0.000 n=29+29)
GobEncode-4 29.5ms ± 0% 29.6ms ± 0% +0.26% (p=0.000 n=28+28)
Gzip-4 1.38s ± 1% 1.35s ± 3% -1.94% (p=0.000 n=28+30)
Gunzip-4 139ms ± 0% 139ms ± 0% +0.10% (p=0.000 n=28+29)
HTTPClientServer-4 745µs ± 5% 742µs ± 3% ~ (p=0.405 n=28+29)
JSONEncode-4 49.5ms ± 1% 49.9ms ± 0% +0.89% (p=0.000 n=30+30)
JSONDecode-4 264ms ± 1% 264ms ± 0% +0.25% (p=0.001 n=30+30)
Mandelbrot200-4 16.6ms ± 0% 16.6ms ± 0% ~ (p=0.507 n=29+29)
GoParse-4 15.9ms ± 0% 16.0ms ± 1% +0.91% (p=0.002 n=23+30)
RegexpMatchEasy0_32-4 379ns ± 0% 379ns ± 0% ~ (all equal)
RegexpMatchEasy0_1K-4 1.31µs ± 0% 1.31µs ± 0% +0.09% (p=0.008 n=27+30)
RegexpMatchEasy1_32-4 357ns ± 0% 358ns ± 0% +0.28% (p=0.000 n=28+29)
RegexpMatchEasy1_1K-4 2.04µs ± 0% 2.04µs ± 0% ~ (p=0.850 n=30+30)
RegexpMatchMedium_32-4 587ns ± 0% 589ns ± 0% +0.33% (p=0.000 n=30+30)
RegexpMatchMedium_1K-4 162µs ± 0% 163µs ± 0% ~ (p=0.351 n=30+29)
RegexpMatchHard_32-4 9.54µs ± 0% 9.60µs ± 0% +0.59% (p=0.000 n=28+30)
RegexpMatchHard_1K-4 287µs ± 0% 287µs ± 0% +0.11% (p=0.000 n=26+29)
Revcomp-4 2.50s ± 0% 2.50s ± 0% -0.13% (p=0.012 n=28+27)
Template-4 312ms ± 1% 312ms ± 1% +0.20% (p=0.015 n=27+30)
TimeParse-4 1.68µs ± 0% 1.68µs ± 0% -0.35% (p=0.000 n=30+30)
TimeFormat-4 1.66µs ± 0% 1.64µs ± 0% -1.20% (p=0.000 n=25+29)
[Geo mean] 246µs 246µs -0.00%
name old speed new speed delta
GobDecode-4 23.3MB/s ± 0% 23.1MB/s ± 1% -1.05% (p=0.000 n=29+29)
GobEncode-4 26.0MB/s ± 0% 25.9MB/s ± 0% -0.25% (p=0.000 n=29+28)
Gzip-4 14.1MB/s ± 1% 14.4MB/s ± 3% +1.94% (p=0.000 n=27+30)
Gunzip-4 139MB/s ± 0% 139MB/s ± 0% -0.10% (p=0.000 n=28+29)
JSONEncode-4 39.2MB/s ± 1% 38.9MB/s ± 0% -0.88% (p=0.000 n=30+30)
JSONDecode-4 7.37MB/s ± 0% 7.35MB/s ± 0% -0.26% (p=0.001 n=30+30)
GoParse-4 3.65MB/s ± 0% 3.62MB/s ± 1% -0.86% (p=0.001 n=23+30)
RegexpMatchEasy0_32-4 84.3MB/s ± 0% 84.3MB/s ± 0% ~ (p=0.126 n=27+26)
RegexpMatchEasy0_1K-4 784MB/s ± 0% 783MB/s ± 0% -0.10% (p=0.003 n=27+30)
RegexpMatchEasy1_32-4 89.5MB/s ± 0% 89.3MB/s ± 0% -0.20% (p=0.000 n=27+29)
RegexpMatchEasy1_1K-4 502MB/s ± 0% 502MB/s ± 0% ~ (p=0.858 n=30+28)
RegexpMatchMedium_32-4 1.70MB/s ± 0% 1.70MB/s ± 0% -0.25% (p=0.000 n=30+30)
RegexpMatchMedium_1K-4 6.30MB/s ± 0% 6.30MB/s ± 0% ~ (all equal)
RegexpMatchHard_32-4 3.35MB/s ± 0% 3.33MB/s ± 0% -0.47% (p=0.000 n=30+30)
RegexpMatchHard_1K-4 3.57MB/s ± 0% 3.56MB/s ± 0% -0.20% (p=0.000 n=27+30)
Revcomp-4 102MB/s ± 0% 102MB/s ± 0% +0.14% (p=0.008 n=28+28)
Template-4 6.23MB/s ± 0% 6.21MB/s ± 1% -0.21% (p=0.009 n=21+30)
[Geo mean] 24.1MB/s 24.0MB/s -0.16%
Change-Id: Ifcef3edb667540e2d86e586c23afcfbc2cf1340b
Reviewed-on: https://go-review.googlesource.com/c/134536
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
LDADDALD (64-bit) and LDADDALW (32-bit) are already supported.
This CL adds support for LDADDALH (16-bit) and LDADDALB (8-bit).
Change-Id: I4eac61adcec226d618dfce88618a2b98f5f1afe7
Reviewed-on: https://go-review.googlesource.com/132135
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|