| Age | Commit message | Author |
|
This CL is generated by CL 764800.
Supported addressing patterns:
(Z7.D.SXTW<<2)(Z6.D), where Z6.D is the base, Z7.D is the indices.
SXTW/UXTW represents signed/unsigned extension, << represents LSL.
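As a rough illustration of what this operand form means, the sketch below computes one address per 64-bit lane. The helper name gatherAddrs and the slice-based lane model are assumptions made for this example, not part of the assembler.

```go
package main

import "fmt"

// gatherAddrs models the operand (Zm.D.SXTW<<shift)(Zn.D): for each 64-bit
// lane, the low 32 bits of the index are sign-extended (SXTW), shifted left,
// and added to the base lane. Hypothetical helper, for illustration only.
// It assumes index has at least as many lanes as base.
func gatherAddrs(base, index []uint64, shift uint) []uint64 {
	out := make([]uint64, len(base))
	for i := range base {
		ext := int64(int32(uint32(index[i]))) // sign-extend the low word
		out[i] = base[i] + uint64(ext<<shift)
	}
	return out
}

func main() {
	base := []uint64{0x1000, 0x2000}
	index := []uint64{3, 0xFFFFFFFF} // the second lane is -1 after SXTW
	fmt.Printf("%#x\n", gatherAddrs(base, index, 2))
}
```

With UXTW the low word would be zero-extended instead, so the second lane would move forward rather than backward.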
Change-Id: Ifc6c47833d5113be7cfe96943d369ab977b3a6ee
Reviewed-on: https://go-review.googlesource.com/c/go/+/764780
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Commit-Queue: Junyang Shao <shaojunyang@google.com>
|
|
This CL is generated by CL 759800.
The new register patterns are (examples):
Z1.B[5]
Z2[6]
P1[7]
PN1[8]
Change-Id: I5bccc4f1c0474dbd4cd4878bd488f36a7026c7ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/759780
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add support for ASIMD instructions that reduce a vector to
a scalar by operating across all lanes. These use the ASIMDALL
encoding class from the ARM architecture specification.
Integer cross-lane reductions (.B8, .B16, .H4, .H8, .S4):
Signed max/min across lanes: VSMAXV, VSMINV
Unsigned max/min across lanes: VUMAXV, VUMINV
Floating-point cross-lane reductions (.S4 arrangement):
FP max/min across lanes: VFMAXV, VFMINV
FP max/min across lanes (NM): VFMAXNMV, VFMINNMV
Change-Id: I6af4462d26803dfc7c78db2ad9df4284083e31e8
Reviewed-on: https://go-review.googlesource.com/c/go/+/762202
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add support for ASIMD unary miscellaneous instructions that operate
on a single source register. These use the ASIMDMISC encoding
class from the ARM architecture specification.
These instructions need some validation for arrangement constraints:
- VNOT only allows .B8/.B16 arrangements
- VCLS/VCLZ do not support D arrangements
- Floating-point variants (VFABS, VFNEG, VFSQRT, VFRINT*) only
allow floating-point arrangements (S and D)
New instructions by group:
Integer absolute/negate: VABS, VNEG
Floating-point abs/negate: VFABS, VFNEG
Floating-point sqrt: VFSQRT
Floating-point round: VFRINTN, VFRINTP, VFRINTM, VFRINTZ
Saturating abs/negate: VSQABS, VSQNEG
Bit/count operations: VCLS, VCLZ, VNOT
Change-Id: I62242eda31f82cd34119c7d4f97316a030e7663b
Reviewed-on: https://go-review.googlesource.com/c/go/+/762201
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
Add encoding support for ASIMD three-register instructions covering
floating-point, saturating, halving, integer multiply/accumulate,
min/max (including pairwise variants), and bitwise operations.
These belong to the "Advanced SIMD Three-register (same)" instruction
class defined by the ARM architecture, meaning the two source registers
use the same element arrangement (e.g., both .S4 or both .D2). In the
assembler they share a common encoding path using the ASIMDSAME()
macro.
New instructions by group:
Floating-point arithmetic: VFADD, VFSUB, VFMUL, VFDIV
Floating-point min/max: VFMAX, VFMAXNM, VFMIN, VFMINNM
Pairwise floating-point: VFADDP, VFMAXP, VFMINP, VFMAXNMP,
VFMINNMP
Saturating arithmetic: VSQADD, VUQADD, VSQSUB, VUQSUB
Average (halving add): VSHADD, VSRHADD, VUHADD, VURHADD
Integer multiply/accum: VMUL, VMLA, VMLS
Integer min/max: VSMAX, VSMIN
Pairwise integer min/max: VSMAXP, VSMINP, VUMAXP, VUMINP
Bitwise: VBIC, VORN
Change-Id: I732c84123ad1f302260514fdfe0d020787da017b
Reviewed-on: https://go-review.googlesource.com/c/go/+/762200
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add support for ASIMD shift instructions. These use the ASIMDSHF
encoding class from the ARM architecture specification, where the
shift amount is encoded as an immediate derived from the element size.
Also add ASIMD shifts-by-vector (3-register form) where the shift
amount comes from a second vector register. These use the ASIMDSAME
encoding class.
New instructions by group:
Shift by immediate (signed): VSSHR, VSRSHR
Shift by immediate (saturating): VSQSHL, VUQSHL
Narrowing shift by immediate: VSHRN, VSHRN2
Shift by vector (3-reg): VSSHL, VUSHL, VSQSHL, VUQSHL
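A minimal sketch of how a shift amount and an element size can combine into one immediate field, assuming the architectural immh:immb convention (right shifts store 2*esize - shift, left shifts store esize + shift). The function names are hypothetical and do not come from the assembler.

```go
package main

import "fmt"

// immhImmbRight returns the 7-bit immh:immb value for a right shift by
// `shift` bits on elements of `esize` bits (8, 16, 32 or 64).
func immhImmbRight(esize, shift int) (int, error) {
	if shift < 1 || shift > esize {
		return 0, fmt.Errorf("right shift %d out of range 1..%d", shift, esize)
	}
	return 2*esize - shift, nil
}

// immhImmbLeft returns the 7-bit immh:immb value for a left shift by
// `shift` bits on elements of `esize` bits.
func immhImmbLeft(esize, shift int) (int, error) {
	if shift < 0 || shift >= esize {
		return 0, fmt.Errorf("left shift %d out of range 0..%d", shift, esize-1)
	}
	return esize + shift, nil
}

func main() {
	r, _ := immhImmbRight(8, 1)
	l, _ := immhImmbLeft(16, 3)
	fmt.Printf("right: %07b left: %07b\n", r, l)
}
```

The position of the highest set bit in the result identifies the element size, which is why a single field can carry both pieces of information.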
Change-Id: I039cc16bc01980b04e6940cc1d4670faf5fa7e3c
Reviewed-on: https://go-review.googlesource.com/c/go/+/762180
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add remaining arm64 ASIMD vector compare instructions.
Each of these instructions sets every bit of a result lane to zero (false)
or to one (true), depending on the comparison for that lane.
Added integer comparison instructions:
- VCMEQ (compare to zero)
- VCMGE, VCMGT (signed, both two-register and compare to zero)
- VCMHI, VCMHS (unsigned two-register compare)
- VCMLE, VCMLT (signed compare to zero)
Added floating-point comparison instructions:
- VFCMEQ, VFCMGE, VFCMGT (both two-register and zero variants)
- VFCMLE, VFCMLT (compare to zero)
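The per-lane result convention can be modelled in plain Go. This is only a sketch of the semantics for byte lanes; the function names are made up for the example.

```go
package main

import "fmt"

// cmeq models a two-register equality compare on eight byte lanes:
// a lane becomes 0xFF when the inputs match and 0x00 otherwise.
func cmeq(a, b [8]uint8) [8]uint8 {
	var out [8]uint8
	for i := range a {
		if a[i] == b[i] {
			out[i] = 0xFF
		}
	}
	return out
}

// cmgtZero models a signed "greater than zero" compare on eight byte lanes.
func cmgtZero(a [8]int8) [8]uint8 {
	var out [8]uint8
	for i := range a {
		if a[i] > 0 {
			out[i] = 0xFF
		}
	}
	return out
}

func main() {
	fmt.Printf("%#x\n", cmeq([8]uint8{1, 2, 3}, [8]uint8{1, 0, 3}))
	fmt.Printf("%#x\n", cmgtZero([8]int8{-1, 0, 5}))
}
```

Because each lane is either all ones or all zeros, the result can be used directly as a mask for bitwise select operations.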
Change-Id: I913165d3934f2556c9bdf38c5103ef56d86383ef
Reviewed-on: https://go-review.googlesource.com/c/go/+/721640
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Refactor arm64 ASIMD opcodes to use common helper routines named
after their instruction classes from the arm64 XML specification.
Add helper routines like ASIMDSAME for instructions with encoding
class "asimdsame" in arm64 encodingindex.xml. Helper arguments
follow the bitfield order in the specification tables.
For example, the CMEQ instruction entry:
<tr class="instructiontable" encname="CMEQ_asimdsame_only"...>
<td bitwidth="1" class="bitfield">1</td>
<td bitwidth="2" class="bitfield"></td>
<td bitwidth="5" class="bitfield">10001</td>
<td class="iformname" iformid="CMEQ_advsimd_reg">CMEQ (register)</td>
<td class="enctags">Vector</td>
</tr>
now corresponds to ASIMDSAME(1, 0, 0x11), where each argument
matches the corresponding bitfield value in the table.
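One way such a helper could compose an opcode is sketched below. The lowercase name asimdSame and its signature are assumptions for this example; the bit positions (U at bit 29, size at bits 23:22, opcode at bits 15:11, fixed bits 0x0E200400) follow the "Advanced SIMD three same" layout in the Arm architecture manual.

```go
package main

import "fmt"

// asimdSame builds the fixed part of an "Advanced SIMD three same"
// instruction from the three table bitfields. Register numbers and the Q
// bit are left as zero for the caller to fill in. Hypothetical helper.
func asimdSame(u, size, opcode uint32) uint32 {
	const fixed = 0x0E200400 // bits 28:24 = 01110, bit 21 = 1, bit 10 = 1
	return fixed | u<<29 | size<<22 | opcode<<11
}

func main() {
	// CMEQ (register): U=1, size left blank in the table (0), opcode=10001.
	fmt.Printf("%#08x\n", asimdSame(1, 0, 0x11))
}
```

Passing the fields in table order means an entry can be checked against the XML by reading both left to right.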
Change-Id: I024f3eba552906a865841bc1a296f14e3fca73f5
Reviewed-on: https://go-review.googlesource.com/c/go/+/719280
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This CL integrates a new assembling path specifically designed for SVE
and other modern ARM64 instructions, utilizing generated instruction
tables. It contains the foundational files and modifications to direct
the assembling pipeline to use this new data-driven path.
In a.out.go, it registers new constants for registers and operand types
used by SVE.
A new file inst.go is added, which defines the instruction table data
types and utility functions for the new path. The entry point from the
upstream pipeline is `tryEncode`.
`tryEncode` returns false upon an encoding failure, which allows the
upstream matching logic to handle multiple potential matches. The exact
match is not finalized until an instruction is actually encoded, as
detailed in the comments for `elemEncoders`.
This CL also introduces the core generated tables (`anames_gen.go`,
`encoding_gen.go`, `goops_gen.go`, and `inst_gen.go`) which handle a
wide variety of SVE instructions. A comprehensive end-to-end assembly
test file (`arm64sveenc.s`) is added, containing hundreds of test cases
for these SVE instructions to verify the new encoding path.
To facilitate these encodings, this CL implements handling for operand
types such as AC_ARNG, AC_PREG, AC_PREGZM, and AC_ZREG. Others are left
as TODOs.
The generated files in this CL are produced by the `instgen` tool in CL
755180.
Original author Eric Fang (eric.fang@arm.com, CL 424137)
Change-Id: I483f170c776fcd8edd8b8b04520f9d69ee0855dd
Reviewed-on: https://go-review.googlesource.com/c/go/+/742620
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Extend splitImm24uScaled to support an unshifted hi value (hi <= 0xfff)
in addition to the shifted hi value (hi & ^0xfff000 == 0). This allows
load/store instructions to handle more offsets using ADD + load/store
sequences instead of falling back to the literal pool.
This will be used by a subsequent change to add FMOVQ support in SSA form.
Change-Id: I78490f5b1a60d49c1d42ad4daefb5d4e6021c965
Reviewed-on: https://go-review.googlesource.com/c/go/+/737320
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Add the SB (speculation barrier) instruction, and an internal/cpu
feature bit to check its availability.
Change-Id: I7c2d887ae75598f7c11cc875ec15ec3be76c09f5
Reviewed-on: https://go-review.googlesource.com/c/go/+/729501
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Support arm64 FMOVQ from/to global address. Currently there are no
global addresses known to be aligned by 16 bytes, and with this CL
we will always use R_ADDRARM64 relocation with ADRP+ADD+FMOVQ instructions.
Change-Id: I283009eda151d1875cf4457734e79b68a941a6df
Reviewed-on: https://go-review.googlesource.com/c/go/+/718001
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Remove the ARM64 prefix from encoding helper functions that were moved to
cmd/internal/obj to be used by both cmd/asm and cmd/compile. These
functions now use the package prefix and look like:
arm64.EncodeRegisterExtension and arm64.RegisterListOffset.
Change-Id: I3548a4fce1072083eb2f55310c9f7ca6a8e12253
Reviewed-on: https://go-review.googlesource.com/c/go/+/714320
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Change-Id: Ieaacd8c40495e7dad61a068125b1d0e0cee832c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/713500
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
There is nothing particularly special about content-addressable
symbols; they are just a place to start.
This reduces the size of the tailscaled binary by about 16K.
This happens mainly because before this CL the linker's symalign
function kicks in for all static composite literals and PCDATA symbols,
and gives them an alignment based on their size. If the size happens
to be a multiple of 32, it gets an alignment of 32.
That wastes space.
For #6853
For #36313
Change-Id: I2f049eee8f2463dd2b5e20d7c9a270ac32a31e50
Reviewed-on: https://go-review.googlesource.com/c/go/+/727920
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
log2
In several places the integer log2 is calculated using loops or similar
mechanisms. The math/bits.Len* functions provide a simpler and more
efficient mechanism for this.
Annoyingly, every usage has slightly different ideas of what "log2"
means and how non-positive inputs should be handled. I verified the
replacements in each case by comparing the result for inputs from 0
to 1<<16.
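The kind of replacement and check described above might look like the sketch below. The loop-based function and its convention of returning -1 for zero are assumptions chosen for this example; each real call site may use a different convention.

```go
package main

import (
	"fmt"
	"math/bits"
)

// log2Loop is a loop-based floor(log2(n)). It returns -1 for n == 0,
// which is a convention assumed for this sketch.
func log2Loop(n uint32) int {
	r := -1
	for ; n != 0; n >>= 1 {
		r++
	}
	return r
}

// log2Bits computes the same value with math/bits.
func log2Bits(n uint32) int {
	return bits.Len32(n) - 1
}

// firstMismatch returns the first input in [0, limit] for which the two
// implementations disagree, or -1 if they agree everywhere.
func firstMismatch(limit uint32) int64 {
	for n := uint32(0); n <= limit; n++ {
		if log2Loop(n) != log2Bits(n) {
			return int64(n)
		}
	}
	return -1
}

func main() {
	fmt.Println(firstMismatch(1 << 16))
}
```

A call site that rounds up, or that treats zero differently, would need its own adjustment around bits.Len32, which is why an exhaustive comparison per site is useful.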
Change-Id: Ie962a74674802da363e0038d34c06979ccb41cf3
Reviewed-on: https://go-review.googlesource.com/c/go/+/721880
Reviewed-by: Mark Freeman <markfreeman@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Large integer constants can take up to 4 instructions to encode.
We can encode some large constants with a single instruction, namely
those which are bit patterns (repetitions of certain runs of 0s and 1s).
Often the constants we want to encode are *close* to those bit patterns,
but don't exactly match. For those, we can use 2 instructions, one to
load the close-by bit pattern and one to fix up any mismatches.
The constants we use to strength reduce divides often fit this pattern.
For unsigned divides by 1 through 15, this CL applies to the constant
for N=3,5,6,10,12,15.
Triggers 17 times in hello world.
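The idea can be sketched as a search for a nearby bit pattern. Everything here is an assumption for illustration: the helper names, the heuristic of copying one 16-bit chunk from another chunk, and the notion that the fix-up instruction overwrites a single 16-bit chunk. The commit message does not say which instructions or search strategy the assembler uses.

```go
package main

import "fmt"

// isBitmaskImm reports whether v has the shape of an arm64 logical
// immediate: an element of 2, 4, 8, 16, 32 or 64 bits, replicated to 64
// bits, where the element is a rotated contiguous run of ones.
func isBitmaskImm(v uint64) bool {
	if v == 0 || v == ^uint64(0) {
		return false
	}
	for size := 2; size <= 64; size *= 2 {
		mask := ^uint64(0) >> (64 - size)
		elem := v & mask
		rep := uint64(0)
		for i := 0; i < 64; i += size {
			rep |= elem << i
		}
		if rep != v {
			continue // v does not repeat with this element size
		}
		for r := 0; r < size; r++ {
			rot := (elem>>r | elem<<(size-r)) & mask
			if rot&(rot+1) == 0 { // rot has the form 0...01...1
				return true
			}
		}
	}
	return false
}

// nearBitmask looks for a 16-bit chunk of v that, once replaced by another
// chunk of v, gives a bitmask immediate. It returns that pattern and the
// bit position of the chunk a second instruction would have to patch.
func nearBitmask(v uint64) (pattern uint64, shift int, ok bool) {
	for s := 0; s < 64; s += 16 {
		for t := 0; t < 64; t += 16 {
			if s == t {
				continue
			}
			repl := (v >> t) & 0xffff
			p := v&^(uint64(0xffff)<<s) | repl<<s
			if p != v && isBitmaskImm(p) {
				return p, s, true
			}
		}
	}
	return 0, 0, false
}

func main() {
	// A constant of the kind used when dividing by 3 (assumed example).
	p, s, ok := nearBitmask(0xAAAAAAAAAAAAAAAB)
	fmt.Printf("%#x %d %v\n", p, s, ok)
}
```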
Change-Id: I623abf32961fb3e74d0a163f6822f0647cd94499
Reviewed-on: https://go-review.googlesource.com/c/go/+/717900
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Support arm64 FMOVQ with a large immediate offset, which is encoded
using the register-offset form in opldrr or opstrr. This will help
fold immediates into the new SSA ops FMOVQload and FMOVQstore.
For example: FMOVQ F0, -20000(R0) is encoded as follows:
MOVD 3(PC), R27
FMOVQ F0, (R0)(R27)
RET
ffff b1e0 # constant value
Change-Id: Ib71f92f6ff4b310bda004a440b1df41ffe164523
Reviewed-on: https://go-review.googlesource.com/c/go/+/716960
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Found by github.com/mdempsky/unconvert
Change-Id: I88ce10390a49ba768a4deaa0df9057c93c1164de
GitHub-Last-Rev: 3b0f7e8f74f58340637f33287c238765856b2483
GitHub-Pull-Request: golang/go#75974
Reviewed-on: https://go-review.googlesource.com/c/go/+/712940
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Change-Id: Iab41674953655efa7be3d306dfb3f5be486be501
Reviewed-on: https://go-review.googlesource.com/c/go/+/701455
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Add support for the Pointer Authentication Code instructions
required for the ELF ABI when enabling PAC aware binaries.
This allows for assembly writers to add PAC instructions where needed to
support this ABI. Follow up work is to enable the compiler to emit these
instructions in the appropriate places.
In short, under the Linux ABI, the prologue of a function that pushes
the link register (LR) to the stack first signs the LR, using a PAC
instruction such as "paciasp" and a key managed by the operating system
and hardware. The function epilogue, when restoring the LR from the
stack, verifies the signature using an instruction such as "autiasp".
This helps prevent attackers from modifying the return address on the
stack, a common technique for ROP attacks.
Details on PAC can be found here:
- https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/enabling-pac-and-bti-on-aarch64
- https://developer.arm.com/documentation/109576/0100/Pointer-Authentication-Code
The ABI details can be found here:
- https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst
Change-Id: I4516ed1294d19f9ff9d278833d542821b6642aa9
Reviewed-on: https://go-review.googlesource.com/c/go/+/676675
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
go1.26's vet printf checker can associate the printf-wrapper
property with local vars and struct fields if they are assigned
from a printf-like func literal (CL 706635). This leads to better
detection of mistakes.
Change-Id: I604be1e200aa1aba75e09d4f36ab68c1dba3b8a3
Reviewed-on: https://go-review.googlesource.com/c/go/+/710195
Auto-Submit: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This reverts commit 719dfcf8a8478d70360bf3c34c0e920be7b32994.
Reason for revert: Causing crashes.
Change-Id: I0b8526dd03d82fa074ce4f97f1789eeac702b3eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/709755
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Instead of storing LR (the return address) at 0(SP) and the FP
(parent's frame pointer) at -8(SP), store them at framesize-8(SP)
and framesize-16(SP), respectively.
We push and pop data onto the stack such that we're never accessing
anything below SP.
The prolog/epilog lengths are unchanged (3 insns for a typical prolog,
2 for a typical epilog).
We use 8 bytes more per frame.
Typical prologue:
STP.W (FP, LR), -16(SP)
MOVD SP, FP
SUB $C, SP
Typical epilogue:
ADD $C, SP
LDP.P 16(SP), (FP, LR)
RET
The previous word where we stored LR, at 0(SP), is now unused.
We could repurpose that slot for storing a local variable.
The new prolog and epilog instructions are recognized by libunwind,
so pc-sampling tools like perf should now be accurate. (TODO: except
maybe after the first RET instruction? Have to look into that.)
Update #73753 (fixes, for arm64)
Update #57302 (Quim thinks this will help on that issue)
Change-Id: I4800036a9a9a08aaaf35d9f99de79a36cf37ebb8
Reviewed-on: https://go-review.googlesource.com/c/go/+/674615
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Change-Id: Ib290079a77a746a8512cd4638310b24164f6a930
Reviewed-on: https://go-review.googlesource.com/c/go/+/679456
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Fixes #74076
Change-Id: Icc67b3d4e342f329584433bd1250c56ae8f5a73d
Reviewed-on: https://go-review.googlesource.com/c/go/+/690635
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Commit-Queue: Alan Donovan <adonovan@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Alan Donovan <adonovan@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Return the shift in bits from movcon, rather than returning an index.
This allows a number of multiplications to be removed, making the code
more readable. Scale down to an index only when encoding.
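A small sketch of the convention described here, using a hypothetical function movconShift rather than the assembler's own code: the helper reports the shift in bits, and the division by 16 happens only where an index is needed for encoding.

```go
package main

import "fmt"

// movconShift returns the shift in bits (0, 16, 32 or 48) if v has all of
// its set bits inside a single aligned 16-bit chunk, and -1 otherwise.
// Hypothetical stand-in for illustration.
func movconShift(v uint64) int {
	for s := 0; s < 64; s += 16 {
		if v&^(uint64(0xffff)<<s) == 0 {
			return s
		}
	}
	return -1
}

func main() {
	v := uint64(0x12340000)
	s := movconShift(v)
	if s >= 0 {
		imm16 := (v >> s) & 0xffff // callers can use the shift directly
		index := s / 16            // scaled down only when encoding
		fmt.Printf("imm16=%#x shift=%d index=%d\n", imm16, s, index)
	}
}
```

With an index-based return value, every caller that wanted the chunk itself would need to multiply by 16 first.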
Change-Id: I1be91eb526ad95d389e2f8ce97212311551790df
Reviewed-on: https://go-review.googlesource.com/c/go/+/650939
Auto-Submit: Joel Sing <joel@sing.id.au>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Teach conclass how to handle 32 bit values and deduplicate the code
between con32class and conclass.
Change-Id: I9c5eea31d443fd4c2ce700c6ea21e1d0bef665b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/650938
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Joel Sing <joel@sing.id.au>
|
|
Reduce repetition by pulling some common conversions into variables.
Change-Id: I8c1cc806236b5ecdadf90f4507923718fa5de9b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/650937
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This will allow for further improvements and deduplication.
Change-Id: I9374fc2d16168ced06f3fcc9e558a9c85e24fd01
Reviewed-on: https://go-review.googlesource.com/c/go/+/650936
Reviewed-by: Fannie Zhang <Fannie.Zhang@arm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Add support for the `BTI' instruction to the arm64 assembler. This
instruction provides Branch Target Identification for targets of
indirect branches. A BTI can be marked with a target type of
'C' (call), 'J' (jump) or 'JC' (jump or call).
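For reference, the target types map onto the HINT instruction space. The sketch below assumes the architectural encoding BTI = HINT #(32 + 2*t) with t = 0 (no target), 1 (C), 2 (J), 3 (JC); the helper btiEncoding is hypothetical and is not the assembler's implementation.

```go
package main

import "fmt"

// btiEncoding returns the instruction word for BTI with the given target
// type: "" (none), "C", "J" or "JC".
func btiEncoding(target string) (uint32, error) {
	var t uint32
	switch target {
	case "":
		t = 0
	case "C":
		t = 1
	case "J":
		t = 2
	case "JC":
		t = 3
	default:
		return 0, fmt.Errorf("unknown BTI target %q", target)
	}
	const hint = 0xD503201F // HINT #0, with the hint number in bits 11:5
	return hint | (32+2*t)<<5, nil
}

func main() {
	for _, target := range []string{"", "C", "J", "JC"} {
		w, _ := btiEncoding(target)
		fmt.Printf("BTI %-2s %#08x\n", target, w)
	}
}
```

Because BTI sits in the HINT space, processors without the feature execute it as a NOP.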
Updates #66054
Change-Id: I1cf31a0382207bb75b9b2deb49ac298a59c00d8a
Reviewed-on: https://go-review.googlesource.com/c/go/+/646781
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Marvin Drees <marvin.drees@9elements.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Rather than having register encoding knowledge in each caller of opldrr/opstrr
(and in a separate olsxrr function), pass the registers into opldrr/opstrr and
let them handle the encoding. This reduces duplication and improves readability.
Change-Id: I50a25263f305d01454f3ff95e8b6e7c76e760ab0
Reviewed-on: https://go-review.googlesource.com/c/go/+/471521
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Provide a four register version of oprrr, which takes an additional 'ra'
register. Use this instead of oprrr where appropriate.
Change-Id: I8882957a83c2b08e407f37a37c61864cd920bbc9
Reviewed-on: https://go-review.googlesource.com/c/go/+/471519
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Rather than having register encoding knowledge in each caller of oprrr,
pass the registers into oprrr and let it handle the encoding. This reduces
duplication and improves readability.
Change-Id: Iab6c70f7796b7a8c071419654b8a5686aeee8c1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/471518
Reviewed-by: Fannie Zhang <Fannie.Zhang@arm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
isaddcon2 tests for the range 0 <= v <= 0xffffff - replace duplicated range
checks with calls to isaddcon2.
Change-Id: Ia6f331852ed3d77715b265cb4fcc500579eac711
Reviewed-on: https://go-review.googlesource.com/c/go/+/650935
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Fannie Zhang <Fannie.Zhang@arm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Code like x := [12]byte{1,2,3,4,5,6,7,8,9,10,11,12} stores x in
a pair of registers and uses MOVD/MOVWU to load the values
from RODATA. The code generator needs to understand not
to use the aligned PC-relative relocation for that sequence.
In non-FIPS modes, more statictemp optimizations can be applied
and this problematic sequence doesn't happen.
Fix the decision about whether to assume alignment to match
the code used by the linker when deciding what to align.
Fixes the linker failure in CL 626437 patch set 5.
Change-Id: Iedad862c6faee758d4a2c5120cab2d329265b134
Reviewed-on: https://go-review.googlesource.com/c/go/+/628835
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Bypass: Russ Cox <rsc@golang.org>
|
|
The old API was to do
r := obj.AddRel(sym)
r.Type = this
r.Off = that
etc
The new API is:
sym.AddRel(ctxt, obj.Reloc{Type: this, Off: that, etc})
This new API is more idiomatic and avoids ever having relocations
that are only partially constructed. Most importantly, it sets up
for sym.AddRel being able to check relocation validity in the future.
(Passing ctxt is for use in validity checking.)
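The shape of the new API can be sketched with stand-in types. The types below are simplified placeholders invented for this example; the real ones live in cmd/internal/obj, have more fields, and cannot be imported from outside the Go toolchain.

```go
package main

import "fmt"

// Reloc, Link and LSym are simplified stand-ins for the types in
// cmd/internal/obj, kept just large enough to show the calling pattern.
type Reloc struct {
	Type int
	Off  int32
	Siz  uint8
}

type Link struct{}

type LSym struct {
	R []Reloc
}

// AddRel appends a fully constructed relocation. Because the whole value
// arrives at once, a validity check could be added here later; ctxt is
// accepted for that purpose and is unused in this sketch.
func (s *LSym) AddRel(ctxt *Link, r Reloc) {
	s.R = append(s.R, r)
}

func main() {
	ctxt := &Link{}
	sym := &LSym{}
	sym.AddRel(ctxt, Reloc{Type: 1, Off: 8, Siz: 4})
	fmt.Printf("%+v\n", sym.R)
}
```

With the old style, a relocation was visible on the symbol while some of its fields were still zero; passing a struct literal removes that window.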
Passes golang.org/x/tools/cmd/toolstash/buildall.
Change-Id: I042ea76e61bb3bf6402f98ca11291a13f4799972
Reviewed-on: https://go-review.googlesource.com/c/go/+/625616
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Adds helper functions for the literal pooling, large branch handling
and code emission stages of the span7 assembler pass. This hides the
implementation of the current assembler from the general workflow in
span7 to make the implementation easier to change in future.
Updates #44734
Change-Id: I8859956b23ad4faebeeff6df28051b098ef90fed
Reviewed-on: https://go-review.googlesource.com/c/go/+/595755
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
with slices there's no need to implement sort.Interface
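A minimal sketch of the pattern, with a made-up entry type: slices.SortFunc takes a comparison function directly, so the Len, Less and Swap methods that sort.Interface requires are no longer needed.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// entry is a hypothetical table row used only for this example.
type entry struct {
	op   int
	name string
}

// sortEntries orders entries by op, then by name, without defining a
// named slice type or any sort.Interface methods.
func sortEntries(es []entry) {
	slices.SortFunc(es, func(a, b entry) int {
		if c := cmp.Compare(a.op, b.op); c != 0 {
			return c
		}
		return cmp.Compare(a.name, b.name)
	})
}

func main() {
	es := []entry{{2, "b"}, {1, "z"}, {2, "a"}}
	sortEntries(es)
	fmt.Println(es)
}
```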
Change-Id: I59167e78881cb1df89a71e33d738d6aeca7adb71
GitHub-Last-Rev: 507ba84453f7305b6b2bf6317292111c00c93ffe
GitHub-Pull-Request: golang/go#68724
Reviewed-on: https://go-review.googlesource.com/c/go/+/602895
Reviewed-by: Ian Lance Taylor <iant@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
UDF provides a stronger guarantee for generating the Undefined
Instruction exception than the current value being emitted.
Change-Id: I234cd70ce04f21311959c1061ae24992438105f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/605155
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Set the right instruction bits in asmout in order
to allow using MSR with DIT and an immediate
value. This allows us to avoid using an
intermediary register when we want to set DIT
(unsetting DIT already worked with the zero
register).
Ref: https://developer.arm.com/documentation/ddi0602/2024-06/Base-Instructions/MSR--immediate---Move-immediate-value-to-special-register-?lang=en
Change-Id: Id049a0b4e0feb534cea992553228f9b5e12ddcea
Reviewed-on: https://go-review.googlesource.com/c/go/+/597595
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This appears to be useful only on amd64, and was specifically
benchmarked on Apple Silicon and did not produce any benefit there.
This CL adds the assembly instruction `PCALIGNMAX align,amount`
which aligns to `align` if that can be achieved with `amount`
or fewer bytes of padding. (0 means never, but will align the
enclosing function.)
Specifically, alignment is performed when low-order-address-bits + amount is
greater than or equal to align; thus, `PCALIGNMAX 64,63` is
the same as `PCALIGN 64` and `PCALIGNMAX 64,0` will never
emit any alignment, but will still cause the function itself
to be aligned to (at least) 64 bytes.
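The padding rule can be written down as a small function. This is a sketch of the rule as stated in this message, with a hypothetical name; it assumes align is a power of two and is not the assembler's own code.

```go
package main

import "fmt"

// alignPadding returns the number of padding bytes PCALIGNMAX align,amount
// would insert at address pc: the distance to the next multiple of align,
// or zero if that distance exceeds amount.
func alignPadding(pc, align, amount int64) int64 {
	pad := -pc & (align - 1) // bytes needed to reach the next boundary
	if pad > amount {
		return 0
	}
	return pad
}

func main() {
	fmt.Println(alignPadding(1, 64, 63)) // behaves like PCALIGN 64
	fmt.Println(alignPadding(1, 64, 0))  // never pads
	fmt.Println(alignPadding(60, 64, 8)) // close enough to the boundary
	fmt.Println(alignPadding(10, 64, 8)) // too far away, no padding
}
```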
Change-Id: Id51a056f1672f8095e8f755e01f72836c9686aa3
Reviewed-on: https://go-review.googlesource.com/c/go/+/577935
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
The ZR register is valid in the register pair of LDP, LDPW and LDPSW
instructions, but the assembler currently rejects it. This CL fixes this issue.
Change-Id: I8467502de4664214e0b7dad0295c44f6cff16ee6
Reviewed-on: https://go-review.googlesource.com/c/go/+/547815
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
|
|
Change-Id: I36a0f0989d37bef45ea8778da799b56a7e9a0c30
Reviewed-on: https://go-review.googlesource.com/c/go/+/529515
Run-TryBot: shuang cui <imcusg@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
|
|
pairs
Implement better classification for load and store pair operations. This in
turn allows us to avoid using pool literals when the offset fits in a 24 bit
unsigned immediate. In this case, the offset can be calculated using two
add immediate instructions, rather than loading the offset from the pool
literal and then adding the offset to the base register. This requires the
same number of instructions, however avoids a load from memory and does
not require the offset to be stored in the literal pool.
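As an illustration of the two-instruction approach: an arm64 add immediate holds a 12-bit value, optionally shifted left by 12, so a 24-bit unsigned offset splits into one shifted and one unshifted immediate. The helper below is a hypothetical sketch, not the assembler's classification code.

```go
package main

import "fmt"

// splitAddImm24 splits a 24-bit unsigned offset into the two immediates
// used by a pair of ADD instructions: hi is a multiple of 0x1000 (the
// shifted form) and lo fits in 12 bits (the unshifted form).
func splitAddImm24(off uint32) (hi, lo uint32, ok bool) {
	if off > 0xffffff {
		return 0, 0, false // would still need the literal pool
	}
	return off & 0xfff000, off & 0xfff, true
}

func main() {
	hi, lo, ok := splitAddImm24(0x123456)
	fmt.Printf("hi=%#x lo=%#x ok=%v\n", hi, lo, ok)
}
```

Both immediates are encoded in the instructions themselves, so nothing has to be loaded from memory to form the address.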
Updates #59615
Change-Id: I316ec3d54f1d06ae9d930e98d0c32471775fcb26
Reviewed-on: https://go-review.googlesource.com/c/go/+/515615
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joedian Reid <joedian@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Load large constants into vector registers from rodata, instead of placing them
in the literal pool. This treats VMOVQ/VMOVD/VMOVS the same as FMOVD/FMOVS and
makes use of the existing mechanism for storing values in rodata. Two additional
instructions are required for a load, however these instructions are used
infrequently and already have a high latency.
Updates #59615
Change-Id: I54226730267689963d73321e548733ae2d66740e
Reviewed-on: https://go-review.googlesource.com/c/go/+/515617
Reviewed-by: Eric Fang <eric.fang@arm.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Rather than having register encoding knowledge in each caller of opxrrr,
pass the registers into opxrrr and let it handle the encoding. This reduces
duplication and improves readability.
Change-Id: I202c503465a0169277a0f64340598203c9dcf20c
Reviewed-on: https://go-review.googlesource.com/c/go/+/461140
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
The previous implementation would limit itself to 0xfff000 | 0xfff << shift,
while the maximum possible value is 0xfff000 + 0xfff << shift. In practical
terms, this means that an additional ((1 << shift) - 1) * 0x1000 of offset
is reachable for operations that use this splitting format. In the case of
an 8 byte load/store, this is an additional 0x7000 that can be reached
without needing to use the literal pool.
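The arithmetic above can be checked directly. The two functions below simply evaluate the expressions quoted in this message for a given shift; their names are made up for the example.

```go
package main

import "fmt"

// oldLimit is the largest offset reachable with the previous rule.
func oldLimit(shift uint) int64 {
	return 0xfff000 | int64(0xfff)<<shift
}

// newLimit is the largest offset reachable when the two parts are added
// rather than ORed together.
func newLimit(shift uint) int64 {
	return 0xfff000 + int64(0xfff)<<shift
}

func main() {
	for shift := uint(0); shift <= 4; shift++ {
		gain := newLimit(shift) - oldLimit(shift)
		fmt.Printf("shift=%d old=%#x new=%#x gain=%#x\n",
			shift, oldLimit(shift), newLimit(shift), gain)
	}
}
```

The gain comes from the bits of 0xfff << shift that overlap 0xfff000: OR discards them, while addition carries them upward.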
Updates #59615
Change-Id: Ice7023104042d31c115eafb9398c2b999bdd6583
Reviewed-on: https://go-review.googlesource.com/c/go/+/512540
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
|
|
In a number of load and store cases, the use of the literal pool can be
entirely avoided by simply adding or subtracting the offset from the
register. This uses the same number of instructions, while avoiding a
load from memory, along with the need for the value to be in the literal
pool. Overall this reduces the size of binaries slightly and should have
lower overhead.
Updates #59615
Change-Id: I9cb6a403dc71e34a46af913f5db87dbf52f8688c
Reviewed-on: https://go-review.googlesource.com/c/go/+/512539
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
|
|
Currently, pool literals are added when they are not needed, namely
in the case where the offset is a 24 bit unsigned scaled immediate.
By improving the classification of loads and stores, we can avoid
generating unused pool literals. However, more importantly this
provides a basis for further improvement of the load and store
code generation.
Updates #59615
Change-Id: Ia3bad1709314565a05894a76c434cca2fa4533c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/512538
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
|