| Age | Commit message (Collapse) | Author |
|
This CL is generated by CL 765440.
This CL supports this addressing pattern:
(VL*imm)(Reg)
(-VL*imm)(Reg)
Change-Id: I4d1bab2ef6c4141699a47b28aa14b28cdee6cb3f
Reviewed-on: https://go-review.googlesource.com/c/go/+/765420
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Commit-Queue: Junyang Shao <shaojunyang@google.com>
|
|
This CL is generated by CL 765100
This CL supports this addressing pattern:
imm(reg.T)
Change-Id: I16789e8e6cf03c4fa225c0fe1bd31dc23c9feb21
Reviewed-on: https://go-review.googlesource.com/c/go/+/765080
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This CL is generated by CL 764980.
This CL supports these new special constants:
<prfop>, which Go already support (prefetch modifier)
<vl>, which include VLx2 and VLx4, which is the vector length specifier.
Change-Id: I831f306a816493c08f3c22786e5360f2a37acf6c
Reviewed-on: https://go-review.googlesource.com/c/go/+/765000
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
This CL is generated by CL 764800.
Supported addressing patterns:
(Z7.D.SXTW<<2)(Z6.D), where Z6.D is the base, Z7.D is the indices.
SXTW/UXTW represents signed/unsigned extension, << represents LSL.
Change-Id: Ifc6c47833d5113be7cfe96943d369ab977b3a6ee
Reviewed-on: https://go-review.googlesource.com/c/go/+/764780
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Commit-Queue: Junyang Shao <shaojunyang@google.com>
|
|
This CL adds the register list support for SVE:
[Z1.B, Z2.B]
[P1.B, P2.B]
[Z1.D]
[Z1.D, Z2.D, Z3.D]
[Z1.D, Z2.D, Z3.D, Z4.D]
This CL is generated by CL 763780.
Change-Id: I92210097a8a7525a5a53a2dce0b7652397275dd6
Reviewed-on: https://go-review.googlesource.com/c/go/+/763820
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This CL supports various immediate operand patterns.
ARM designs the immediate to carry significant semantics, this CL tries
to address them as what GNU assembler do, and what the ARM ASL
specifies.
This CL is generated by CL 763781.
Change-Id: I40e2b573f196a947c4f3e55c2be7b8d551471c84
Reviewed-on: https://go-review.googlesource.com/c/go/+/763769
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Commit-Queue: Junyang Shao <shaojunyang@google.com>
|
|
This CL is generated by CL 759800.
The new register patterns are (examples):
Z1.B[5]
Z2[6]
P1[7]
PN1[8]
Change-Id: I5bccc4f1c0474dbd4cd4878bd488f36a7026c7ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/759780
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The GP registers and SIMD registers are comforming to the existing Go
syntax: they are V or R registers, their widths are specified in the
Opcode, the rules to specify them is:
- if that instruction only contains one GP or SIMD register:
If it's 32-bit GP, then append W to the end of the opcode.
If it's 64-bit GP, no changes.
If it's SIMD register with BHWD width specification, BHSDQ will just
be appended to the end of the opcode.
- if it contains multiple GP or SIMD registers, then manual observation
found that they are either specified the same width, or they are fixed
width. We distinguish them by their first Go ASM operand width. The rule
to append suffixes are the same to the single-reg case above.
This CL is generated by CL 759280.
Change-Id: Icc819cc30dd8fd1609de31ba7bcb4e3ac83c465e
Reviewed-on: https://go-review.googlesource.com/c/go/+/759261
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add support for ASIMD instructions that reduce a vector to
a scalar by operating across all lanes. These use the ASIMDALL
encoding class from the ARM architecture specification.
Integer cross-lane reductions (.B8, .B16, .H4, .H8, .S4):
Signed max/min across lanes: VSMAXV, VSMINV
Unsigned max/min across lanes: VUMAXV, VUMINV
Floating-point cross-lane reductions (.S4 arrangement):
FP max/min across lanes: VFMAXV, VFMINV
FP max/min across lanes (NM): VFMAXNMV, VFMINNMV
Change-Id: I6af4462d26803dfc7c78db2ad9df4284083e31e8
Reviewed-on: https://go-review.googlesource.com/c/go/+/762202
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add support for ASIMD unary miscellaneous instructions that operate
on a single source register. These use the ASIMDMISC encoding
class from the ARM architecture specification.
These instruction need some validation for arrangement constraints:
- VNOT only allows .B8/.B16 arrangements
- VCLS/VCLZ do not support D arrangements
- Floating-point variants (VFABS, VFNEG, VFSQRT, VFRINT*) only
allow floating-point arrangements (S and D)
New instructions by group:
Integer absolute/negate: VABS, VNEG
Floating-point abs/negate: VFABS, VFNEG
Floating-point sqrt: VFSQRT
Floating-point round: VFRINTN, VFRINTP, VFRINTM, VFRINTZ
Saturating abs/negate: VSQABS, VSQNEG
Bit/count operations: VCLS, VCLZ, VNOT
Change-Id: I62242eda31f82cd34119c7d4f97316a030e7663b
Reviewed-on: https://go-review.googlesource.com/c/go/+/762201
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
Add encoding support for ASIMD three-register instructions covering
floating-point, saturating, halving, integer multiply/accumulate,
min/max (including pairwise variants), and bitwise operations.
These belong to the "Advanced SIMD Three-register (same)" instruction
class defined by the ARM architecture, meaning the two source registers
use the same element arrangement (e.g., both .S4 or both .D2). In the
assembler they share a common encoding path using the ASIMDSAME()
macro.
New instructions by group:
Floating-point arithmetic: VFADD, VFSUB, VFMUL, VFDIV
Floating-point min/max: VFMAX, VFMAXNM, VFMIN, VFMINNM
Pairwise floating-point: VFADDP, VFMAXP, VFMINP, VFMAXNMP,
VFMINNMP
Saturating arithmetic: VSQADD, VUQADD, VSQSUB, VUQSUB
Average (halving add): VSHADD, VSRHADD, VUHADD, VURHADD
Integer multiply/accum: VMUL, VMLA, VMLS
Integer min/max: VSMAX, VSMIN
Pairwise integer min/max: VSMAXP, VSMINP, VUMAXP, VUMINP
Bitwise: VBIC, VORN
Change-Id: I732c84123ad1f302260514fdfe0d020787da017b
Reviewed-on: https://go-review.googlesource.com/c/go/+/762200
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add support for ASIMD shift instructions. These use the ASIMDSHF
encoding class from the ARM architecture specification, where the
shift amount is encoded as an immediate derived from the element size.
Also add ASIMD shifts-by-vector (3-register form) where the shift
amount comes from a second vector register. These use the ASIMDSAME
encoding class.
New instructions by group:
Shift by immediate (signed): VSSHR, VSRSHR
Shift by immediate (saturating): VSQSHL, VUQSHL
Narrowing shift by immediate: VSHRN, VSHRN2
Shift by vector (3-reg): VSSHL, VUSHL, VSQSHL, VUQSHL
Change-Id: I039cc16bc01980b04e6940cc1d4670faf5fa7e3c
Reviewed-on: https://go-review.googlesource.com/c/go/+/762180
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add remaining arm64 ASIMD vector compare instructions.
All these instructions produce either all zeroes (false) or all ones (true)
bits in each corresponding lane as the result.
Added integer comparison instructions:
- VCMEQ (compare to zero)
- VCMGE, VCMGT (singed, both two-register and compare to zero)
- VCMHI, VCMHS (unsigned two-register compare)
- VCMLE, VCMLT (signed compare to zero)
Added floating-point comparison instructions:
- VFCMEQ, VFCMGE, VFCMGT (both two-register and zero variants)
- VFCMLE, VFCMLT (compare to zero)
Change-Id: I913165d3934f2556c9bdf38c5103ef56d86383ef
Reviewed-on: https://go-review.googlesource.com/c/go/+/721640
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Refactor arm64 ASIMD opcodes to use common helper routines named
after their instruction classes from the arm64 XML specification.
Add helper routines like ASIMDSAME for instructions with encoding
class "asimdsame" in arm64 encodingindex.xml. Helper arguments
follow the bitfield order in the speficication tables.
For example, the CMEQ instruction entry:
<tr class="instructiontable" encname="CMEQ_asimdsame_only"...>
<td bitwidth="1" class="bitfield">1</td>
<td bitwidth="2" class="bitfield"></td>
<td bitwidth="5" class="bitfield">10001</td>
<td class="iformname" iformid="CMEQ_advsimd_reg">CMEQ (register)</td>
<td class="enctags">Vector</td>
</tr>
Now corresponds to ASIMDSAME(1, 0, 0x11), where each argument
matches the correspoding bitfield value in the table.
Change-Id: I024f3eba552906a865841bc1a296f14e3fca73f5
Reviewed-on: https://go-review.googlesource.com/c/go/+/719280
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This CL integrates a new assembling path specifically designed for SVE
and other modern ARM64 instructions, utilizing generated instruction
tables. It contains the foundational files and modifications to direct
the assembling pipeline to use this new data-driven path.
In a.out.go, it registers new constants for registers and operand types
used by SVE.
A new file inst.go is added, which defines the instruction table data
types and utility functions for the new path. The entry point from the
upstream pipeline is `tryEncode`.
`tryEncode` returns false upon an encoding failure, which allows the
upstream matching logic to handle multiple potential matches. The exact
match is not finalized until an instruction is actually encoded, as
detailed in the comments for `elemEncoders`.
This CL also introduces the core generated tables (`anames_gen.go`,
`encoding_gen.go`, `goops_gen.go`, and `inst_gen.go`) which handle a
wide variety of SVE instructions. A comprehensive end-to-end assembly
test file (`arm64sveenc.s`) is added, containing hundreds of test cases
for these SVE instructions to verify the new encoding path.
To facilitate these encodings, this CL implements handling for operand
types such as AC_ARNG, AC_PREG, AC_PREGZM, and AC_ZREG. Others are left
as TODOs.
The generated files in this CL are produced by the `instgen` tool in CL
755180.
Original author Eric Fang (eric.fang@arm.com, CL 424137)
Change-Id: I483f170c776fcd8edd8b8b04520f9d69ee0855dd
Reviewed-on: https://go-review.googlesource.com/c/go/+/742620
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Extend splitImm24uScaled to support an unshifted hi value (hi <= 0xfff)
in addition to the shifted hi value (hi & ^0xfff000 == 0). This allows
load/store instructions to handle more offsets using ADD + load/store
sequences instead of falling back to the literal pool.
This will be used by a subsequent change to add FMOVQ support in SSA form.
Change-Id: I78490f5b1a60d49c1d42ad4daefb5d4e6021c965
Reviewed-on: https://go-review.googlesource.com/c/go/+/737320
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Add the SB (speculation barrier) instruction, and an internal/cpu
feature bit to check its availability.
Change-Id: I7c2d887ae75598f7c11cc875ec15ec3be76c09f5
Reviewed-on: https://go-review.googlesource.com/c/go/+/729501
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Support arm64 FMOVQ from/to global address. Currently there are no
global addresses known to be aligned by 16 bytes, and with this CL
we will always use R_ADDRARM64 relocation with ADRP+ADD+FMOVQ instructions.
Change-Id: I283009eda151d1875cf4457734e79b68a941a6df
Reviewed-on: https://go-review.googlesource.com/c/go/+/718001
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Remove the ARM64 prefix from encoding helper functions that were moved to
cmd/internal/obj to be used by both cmd/asm and cmd/compile. These
functions now use the package prefix and look like:
arm64.EncodeRegisterExtension and arm64.RegisterListOffset.
Change-Id: I3548a4fce1072083eb2f55310c9f7ca6a8e12253
Reviewed-on: https://go-review.googlesource.com/c/go/+/714320
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Change-Id: Ieaacd8c40495e7dad61a068125b1d0e0cee832c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/713500
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
There is nothing particularly special about content-addressable
symbols, it's just a place to start.
This reduces the size of the tailscaled binary by about 16K.
This happens mainly because before this CL the linker's symalign
function kicks in for all static composite literals and PCDATA symbols,
and gives them an alignment based on their size. If the size happens
to be a multiple of 32, it gets an alignment of 32.
That wastes space.
For #6853
For #36313
Change-Id: I2f049eee8f2463dd2b5e20d7c9a270ac32a31e50
Reviewed-on: https://go-review.googlesource.com/c/go/+/727920
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
log2
In several places the integer log2 is calculated using loops or similar
mechanisms. math/bits.Len* provide a simpler and more efficient
mechanisms for this.
Annoyingly, every usage has slightly different ideas of what "log2"
means and how non-positive inputs should be handled. I verified the
replacements in each case by comparing the result for inputs from 0
to 1<<16.
Change-Id: Ie962a74674802da363e0038d34c06979ccb41cf3
Reviewed-on: https://go-review.googlesource.com/c/go/+/721880
Reviewed-by: Mark Freeman <markfreeman@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Large integer constants can take up to 4 instructions to encode.
We can encode some large constants with a single instruction, namely
those which are bit patterns (repetitions of certain runs of 0s and 1s).
Often the constants we want to encode are *close* to those bit patterns,
but don't exactly match. For those, we can use 2 instructions, one to
load the close-by bit pattern and one to fix up any mismatches.
The constants we use to strength reduce divides often fit this pattern.
For unsigned divides by 1 through 15, this CL applies to the constant
for N=3,5,6,10,12,15.
Triggers 17 times in hello world.
Change-Id: I623abf32961fb3e74d0a163f6822f0647cd94499
Reviewed-on: https://go-review.googlesource.com/c/go/+/717900
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Support arm64 FMOVQ with large offset in immediate which is encoded
using register offset instruction in opldrr or opstrr. This will help
allowing folding immediate into new ssa ops FMOVQload and FMOVQstore.
For example: FMOVQ F0, -20000(R0) is encoded as following:
MOVD 3(PC), R27
FMOVQ F0, (R0)(R27)
RET
ffff b1e0 # constant value
Change-Id: Ib71f92f6ff4b310bda004a440b1df41ffe164523
Reviewed-on: https://go-review.googlesource.com/c/go/+/716960
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Found by github.com/mdempsky/unconvert
Change-Id: I88ce10390a49ba768a4deaa0df9057c93c1164de
GitHub-Last-Rev: 3b0f7e8f74f58340637f33287c238765856b2483
GitHub-Pull-Request: golang/go#75974
Reviewed-on: https://go-review.googlesource.com/c/go/+/712940
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Change-Id: Iab41674953655efa7be3d306dfb3f5be486be501
Reviewed-on: https://go-review.googlesource.com/c/go/+/701455
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Add support for the Pointer Authentication Code instructions
required for the ELF ABI when enabling PAC aware binaries.
This allows for assembly writers to add PAC instructions where needed to
support this ABI. Follow up work is to enable the compiler to emit these
instructions in the appropriate places.
The TL;DR for the Linux ABI is that the prologue of a function that
pushes the link register (LR) to the stack, signs the LR with a key
managed by the operating system and hardware using a PAC instruction,
like "paciasp". The function epilog, when restoring the LR from the
stack will verify the signature, using an instruction like "autiasp".
This helps prevents attackers from modifying the return address on the
stack, a common technique for ROP attacks.
Details on PAC can be found here:
- https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/enabling-pac-and-bti-on-aarch64
- https://developer.arm.com/documentation/109576/0100/Pointer-Authentication-Code
The ABI details can be found here:
- https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst
Change-Id: I4516ed1294d19f9ff9d278833d542821b6642aa9
Reviewed-on: https://go-review.googlesource.com/c/go/+/676675
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
go1.26's vet printf checker can associate the printf-wrapper
property with local vars and struct fields if they are assigned
from a printf-like func literal (CL 706635). This leads to better
detection of mistakes.
Change-Id: I604be1e200aa1aba75e09d4f36ab68c1dba3b8a3
Reviewed-on: https://go-review.googlesource.com/c/go/+/710195
Auto-Submit: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This reverts commit 719dfcf8a8478d70360bf3c34c0e920be7b32994.
Reason for revert: Causing crashes.
Change-Id: I0b8526dd03d82fa074ce4f97f1789eeac702b3eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/709755
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Instead of storing LR (the return address) at 0(SP) and the FP
(parent's frame pointer) at -8(SP), store them at framesize-8(SP)
and framesize-16(SP), respectively.
We push and pop data onto the stack such that we're never accessing
anything below SP.
The prolog/epilog lengths are unchanged (3 insns for a typical prolog,
2 for a typical epilog).
We use 8 bytes more per frame.
Typical prologue:
STP.W (FP, LR), -16(SP)
MOVD SP, FP
SUB $C, SP
Typical epilogue:
ADD $C, SP
LDP.P 16(SP), (FP, LR)
RET
The previous word where we stored LR, at 0(SP), is now unused.
We could repurpose that slot for storing a local variable.
The new prolog and epilog instructions are recognized by libunwind,
so pc-sampling tools like perf should now be accurate. (TODO: except
maybe after the first RET instruction? Have to look into that.)
Update #73753 (fixes, for arm64)
Update #57302 (Quim thinks this will help on that issue)
Change-Id: I4800036a9a9a08aaaf35d9f99de79a36cf37ebb8
Reviewed-on: https://go-review.googlesource.com/c/go/+/674615
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Change-Id: Ib290079a77a746a8512cd4638310b24164f6a930
Reviewed-on: https://go-review.googlesource.com/c/go/+/679456
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Fixes #74076
Change-Id: Icc67b3d4e342f329584433bd1250c56ae8f5a73d
Reviewed-on: https://go-review.googlesource.com/c/go/+/690635
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Commit-Queue: Alan Donovan <adonovan@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Alan Donovan <adonovan@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
The previous CL made this adjustment unnecessary. The argp field
is no longer used by the runtime.
Change-Id: I3491eeef4103c6653ec345d604c0acd290af9e8f
Reviewed-on: https://go-review.googlesource.com/c/go/+/685356
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
|
|
That way, the frame is atomically popped. Previously, for big frames
the SP was unwound in two steps (because arm64 can only add constants
up to 1<<12 in a single instruction).
Fixes #73259
Change-Id: I382c249194ad7bc9fc19607c27487c58d90d49e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/689235
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
This reverts commits
3f3782feed6e0726ddb08afd32dad7d94fbb38c6 (CL 648518)
b386b628521780c048af14a148f373c84e687b26 (CL 668475)
Fixes #73542
Change-Id: I218851c5c0b62700281feb0b3f82b6b9b97b910d
Reviewed-on: https://go-review.googlesource.com/c/go/+/670055
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
We currently make some parts of the preamble unpreemptible because
it confuses morestack. See comments in the code.
Instead, have morestack handle those weird cases so we can
remove unpreemptible marks from most places.
This CL makes user functions preemptible everywhere if they have no
write barriers (at least, on x86). In cmd/go the fraction of functions
that need preemptible markings drops from 82% to 36%. Makes the cmd/go
binary 0.3% smaller.
Update #35470
Change-Id: Ic83d5eabfd0f6d239a92e65684bcce7e67ff30bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/648518
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Return the shift in bits from movcon, rather than returning an index.
This allows a number of multiplications to be removed, making the code
more readable. Scale down to an index only when encoding.
Change-Id: I1be91eb526ad95d389e2f8ce97212311551790df
Reviewed-on: https://go-review.googlesource.com/c/go/+/650939
Auto-Submit: Joel Sing <joel@sing.id.au>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Teach conclass how to handle 32 bit values and deduplicate the code
between con32class and conclass.
Change-Id: I9c5eea31d443fd4c2ce700c6ea21e1d0bef665b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/650938
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Joel Sing <joel@sing.id.au>
|
|
Reduce repetition by pulling some common conversions into variables.
Change-Id: I8c1cc806236b5ecdadf90f4507923718fa5de9b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/650937
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This will allow for further improvements and deduplication.
Change-Id: I9374fc2d16168ced06f3fcc9e558a9c85e24fd01
Reviewed-on: https://go-review.googlesource.com/c/go/+/650936
Reviewed-by: Fannie Zhang <Fannie.Zhang@arm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Add support for the `BTI' instruction to the arm64 assembler. This
instruction provides Branch Target Identification for targets of
indirect branches. A BTI can be marked with a target type of
'C' (call), 'J' (jump) or 'JC' (jump or call).
Updates #66054
Change-Id: I1cf31a0382207bb75b9b2deb49ac298a59c00d8a
Reviewed-on: https://go-review.googlesource.com/c/go/+/646781
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Marvin Drees <marvin.drees@9elements.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Rather than having register encoding knowledge in each caller of opldrr/opstrr
(and in a separate olsxrr function), pass the registers into opldrr/opstrr and
let them handle the encoding. This reduces duplication and improves readability.
Change-Id: I50a25263f305d01454f3ff95e8b6e7c76e760ab0
Reviewed-on: https://go-review.googlesource.com/c/go/+/471521
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Provide a four register version of oprrr, which takes an additional 'ra'
register. Use this instead of oprrr where appropriate.
Change-Id: I8882957a83c2b08e407f37a37c61864cd920bbc9
Reviewed-on: https://go-review.googlesource.com/c/go/+/471519
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Rather than having register encoding knowledge in each caller of oprrr,
pass the registers into oprrr and let it handle the encoding. This reduces
duplication and improves readability.
Change-Id: Iab6c70f7796b7a8c071419654b8a5686aeee8c1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/471518
Reviewed-by: Fannie Zhang <Fannie.Zhang@arm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
isaddcon2 tests for the range 0 <= v <= 0xffffff - replace duplicated range
checks with calls to isaddcon2.
Change-Id: Ia6f331852ed3d77715b265cb4fcc500579eac711
Reviewed-on: https://go-review.googlesource.com/c/go/+/650935
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Fannie Zhang <Fannie.Zhang@arm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
instructions
Implement vector configuration setting instructions (VSETVLI,
VSETIVLI, VSETL). These allow the vector length (vl) and vector
type (vtype) CSRs to be configured via a single instruction.
Unfortunately each instruction has its own dedicated encoding.
In the case of VSETVLI/VSETIVLI, the vector type is specified via
a series of special operands, which specify the selected element
width (E8, E16, E32, E64), the vector register group multiplier
(M1, M2, M4, M8, MF2, MF4, MF8), the vector tail policy (TU, TA)
and vector mask policy (MU, MA). Note that the order of these
special operands matches non-Go assemblers.
Partially based on work by Pengcheng Wang <wangpengcheng.pp@bytedance.com>.
Cq-Include-Trybots: luci.golang.try:gotip-linux-riscv64
Change-Id: I431f59c1e048a3e84754f0643a963da473a741fe
Reviewed-on: https://go-review.googlesource.com/c/go/+/631936
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Code like x := [12]byte{1,2,3,4,5,6,7,8,9,10,11,12} stores x in
a pair of registers and uses MOVD/MOVWU to load the values
from RODATA. The code generator needs to understand not
to use the aligned PC-relative relocation for that sequence.
In non-FIPS modes, more statictemp optimizations can be applied
and this problematic sequence doesn't happen.
Fix the decision about whether to assume alignment to match
the code used by the linker when deciding what to align.
Fixes the linker failure in CL 626437 patch set 5.
Change-Id: Iedad862c6faee758d4a2c5120cab2d329265b134
Reviewed-on: https://go-review.googlesource.com/c/go/+/628835
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Bypass: Russ Cox <rsc@golang.org>
|
|
Add cmd/internal/obj/mkcnames.go to do the generation and update
the architecture packages to use it to maintain the Cnames tables.
Currently works correctly on arm64,loong64,mips,ppc64 and s390x.
Change-Id: I5220b0ba6d8a8a5fcc4d9774731eb2af69a671af
Reviewed-on: https://go-review.googlesource.com/c/go/+/622256
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Commit-Queue: Ian Lance Taylor <iant@golang.org>
|
|
The old API was to do
r := obj.AddRel(sym)
r.Type = this
r.Off = that
etc
The new API is:
sym.AddRel(ctxt, obj.Reloc{Type: this: Off: that, etc})
This new API is more idiomatic and avoids ever having relocations
that are only partially constructed. Most importantly, it sets up
for sym.AddRel being able to check relocation validity in the future.
(Passing ctxt is for use in validity checking.)
Passes golang.org/x/tools/cmd/toolstash/buildall.
Change-Id: I042ea76e61bb3bf6402f98ca11291a13f4799972
Reviewed-on: https://go-review.googlesource.com/c/go/+/625616
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
for small frames
CL 379075 implemented function prologue/epilogue with STP/LDP.
To fix issue #53374, CL 412474 reverted the prologue STP change for
small frames, and the LDP in epilogue was kept. The current instructions
are:
prologue:
MOVD.W R30, -offset(RSP)
MOVD R29, -8(RSP)
epilogue:
LDP -8(RSP), (R29, R30)
ADD $offset, RSP, RSP
It seems a bit strange, as:
1) The prolog and epilogue are not in the same pattern (either STR-LDR,
or STP-LDP).
2) Go Internal ABI defines that R30 is saved at 0(RSP) and R29 is saved
at -8(RSP), so we can not use a single STP.W/LDP.P to save/restore
LR&FP and adjust SP. Changing the ABI causes too much complexity,
and the benefit is not that big.
This patch reverts the small frames' epilogue change in CL 379075. It
converts LDP in the epilogue to LDR-LDR. Another solution is to re-apply
the STP change in prologue, which requires to fix #53609. This seems the
easier and safer solution in the mean time. The new instructions are:
prologue:
MOVD.W R30, -offset(RSP)
MOVD R29, -8(RSP)
epilogue:
MOVD -8(RSP), R29
MOVD.P offset(RSP), R30
The current pattern may cause performance issues in Store-Forwarding on
micro-architectures like AmpereOne. Assuming a function call in the
middle of such code is short enough that the stores are still around,
then the LDP executes and it may wait longer to get the results from
separated stores in Store Buffers other than single STP.
Store-Forwarding aims to improve the efficiency of the processor by
allowing data to be forwarded directly from a store operation to a
subsequent load operation when certain conditions are met. See the
paper: "Memory Barriers: a Hardware View for Software Hackers"
(chapter 3.2: Store Forwarding).
The performance of following ARM64 Linux servers were tested:
1) AmpereOne (ARM v8.6+) from Ampere Computing.
2) Ampere Altra (ARM Neoverse N1) from Ampere Computing.
3) Graviton2 (ARM Neoverse N1) from AWS.
The effect of this change depends the hardware implementation of
store-forwarding. It can obviously improve AmpereOne, especially for
small functions that are frequently called and returned quickly.
E.g., JSON Marshal/Unmarshal benchmarks on AmpereOne:
goos: linux
goarch: arm64
pkg: encoding/json
│ ampere-one.base │ ampere-one.new │
│ sec/op │ sec/op vs base │
CodeMarshal-8 882.1µ ± 1% 779.6µ ± 1% -11.62% (p=0.000 n=10)
CodeMarshalError-8 961.5µ ± 0% 855.7µ ± 1% -11.01% (p=0.000 n=10)
MarshalBytes/32-8 207.6n ± 1% 187.8n ± 0% -9.52% (p=0.000 n=10)
MarshalBytes/256-8 501.0n ± 1% 482.6n ± 1% -3.68% (p=0.000 n=10)
MarshalBytes/4096-8 5.336µ ± 1% 5.074µ ± 1% -4.92% (p=0.000 n=10)
MarshalBytesError/32-8 242.3µ ± 2% 205.7µ ± 3% -15.08% (p=0.000 n=10)
MarshalBytesError/256-8 242.4µ ± 1% 205.2µ ± 2% -15.35% (p=0.000 n=10)
MarshalBytesError/4096-8 247.9µ ± 0% 210.1µ ± 1% -15.24% (p=0.000 n=10)
MarshalMap-8 150.8n ± 1% 145.7n ± 0% -3.35% (p=0.000 n=10)
EncodeMarshaler-8 50.30n ± 26% 54.48n ± 6% ~ (p=0.739 n=10)
CodeUnmarshal-8 4.796m ± 2% 4.055m ± 1% -15.45% (p=0.000 n=10)
CodeUnmarshalReuse-8 4.260m ± 1% 3.496m ± 1% -17.94% (p=0.000 n=10)
UnmarshalString-8 73.89n ± 1% 65.83n ± 1% -10.91% (p=0.000 n=10)
UnmarshalFloat64-8 60.63n ± 1% 58.66n ± 25% ~ (p=0.143 n=10)
UnmarshalInt64-8 55.62n ± 1% 53.25n ± 22% ~ (p=0.468 n=10)
UnmarshalMap-8 255.3n ± 1% 230.3n ± 1% -9.77% (p=0.000 n=10)
UnmarshalNumber-8 467.2n ± 1% 367.0n ± 0% -21.43% (p=0.000 n=10)
geomean 6.224µ 5.605µ -9.94%
Other ARM64 micro-architectures may be not affected so much by such
issue. E.g., benchmarks on Ampere Altra and Graviton2 show slight
improvements:
│ altra.base │ altra.new │
│ sec/op │ sec/op vs base │
CodeMarshal-8 980.1µ ± 1% 977.3µ ± 1% ~ (p=0.912 n=10)
CodeMarshalError-8 1.109m ± 3% 1.096m ± 5% ~ (p=0.971 n=10)
MarshalBytes/32-8 246.8n ± 1% 245.4n ± 0% -0.55% (p=0.002 n=10)
MarshalBytes/256-8 590.9n ± 1% 606.6n ± 1% +2.67% (p=0.000 n=10)
MarshalBytes/4096-8 6.351µ ± 1% 6.376µ ± 1% ~ (p=0.183 n=10)
MarshalBytesError/32-8 245.3µ ± 2% 246.1µ ± 2% ~ (p=0.684 n=10)
MarshalBytesError/256-8 245.5µ ± 1% 248.7µ ± 2% ~ (p=0.218 n=10)
MarshalBytesError/4096-8 254.2µ ± 1% 254.9µ ± 1% ~ (p=0.481 n=10)
MarshalMap-8 152.7n ± 2% 151.5n ± 3% ~ (p=0.782 n=10)
EncodeMarshaler-8 45.95n ± 7% 42.88n ± 5% -6.70% (p=0.014 n=10)
CodeUnmarshal-8 5.121m ± 4% 5.125m ± 3% ~ (p=0.579 n=10)
CodeUnmarshalReuse-8 4.616m ± 3% 4.634m ± 2% ~ (p=0.529 n=10)
UnmarshalString-8 72.12n ± 2% 72.20n ± 2% ~ (p=0.912 n=10)
UnmarshalFloat64-8 64.44n ± 5% 63.20n ± 4% ~ (p=0.393 n=10)
UnmarshalInt64-8 61.49n ± 2% 58.14n ± 4% -5.45% (p=0.002 n=10)
UnmarshalMap-8 263.6n ± 2% 266.2n ± 1% ~ (p=0.196 n=10)
UnmarshalNumber-8 464.7n ± 1% 464.0n ± 0% ~ (p=0.566 n=10)
geomean 6.617µ 6.575µ -0.64%
│ graviton2.base │ graviton2.new │
│ sec/op │ sec/op vs base │
CodeMarshal-8 1.122m ± 0% 1.118m ± 1% ~ (p=0.052 n=10)
CodeMarshalError-8 1.216m ± 1% 1.214m ± 0% ~ (p=0.631 n=10)
MarshalBytes/32-8 289.9n ± 0% 280.8n ± 0% -3.17% (p=0.000 n=10)
MarshalBytes/256-8 675.9n ± 0% 664.7n ± 0% -1.66% (p=0.000 n=10)
MarshalBytes/4096-8 6.884µ ± 0% 6.885µ ± 0% ~ (p=0.565 n=10)
MarshalBytesError/32-8 293.1µ ± 2% 288.9µ ± 2% ~ (p=0.123 n=10)
MarshalBytesError/256-8 296.0µ ± 3% 289.0µ ± 1% -2.36% (p=0.019 n=10)
MarshalBytesError/4096-8 300.4µ ± 1% 295.6µ ± 0% -1.60% (p=0.000 n=10)
MarshalMap-8 168.8n ± 1% 168.8n ± 1% ~ (p=1.000 n=10)
EncodeMarshaler-8 53.77n ± 8% 50.05n ± 12% ~ (p=0.579 n=10)
CodeUnmarshal-8 5.875m ± 2% 5.882m ± 1% ~ (p=0.796 n=10)
CodeUnmarshalReuse-8 5.383m ± 1% 5.366m ± 0% ~ (p=0.631 n=10)
UnmarshalString-8 74.59n ± 1% 73.99n ± 0% -0.80% (p=0.001 n=10)
UnmarshalFloat64-8 68.52n ± 7% 64.19n ± 18% ~ (p=0.868 n=10)
UnmarshalInt64-8 65.32n ± 13% 62.24n ± 8% ~ (p=0.138 n=10)
UnmarshalMap-8 290.1n ± 0% 291.3n ± 0% +0.43% (p=0.010 n=10)
UnmarshalNumber-8 514.4n ± 0% 499.4n ± 0% -2.93% (p=0.000 n=10)
geomean 7.459µ 7.317µ -1.91%
Change-Id: If27386fc5f514b76bdaf2012c2ce86cc65f7ca5b
Reviewed-on: https://go-review.googlesource.com/c/go/+/621775
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|