path: root/src/cmd/internal/obj/arm64/a.out.go
Age | Commit message | Author
34 hours | cmd/asm, cmd/internal/obj/arm64: support special operands in SVE | Junyang Shao
This CL is generated by CL 764980.

This CL supports these new special constants:
- <prfop>, the prefetch operation modifier, which Go already supports
- <vl>, the vector length specifier, which includes VLx2 and VLx4

Change-Id: I831f306a816493c08f3c22786e5360f2a37acf6c
Reviewed-on: https://go-review.googlesource.com/c/go/+/765000
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
5 days | cmd/asm, cmd/internal/obj/arm64: support register with index in SVE | Junyang Shao
This CL is generated by CL 759800.

The new register patterns are (examples):
- Z1.B[5]
- Z2[6]
- P1[7]
- PN1[8]

Change-Id: I5bccc4f1c0474dbd4cd4878bd488f36a7026c7ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/759780
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
5 days | cmd/asm, cmd/internal/obj/arm64: add GP and SIMD reg support for SVE | Junyang Shao
The GP registers and SIMD registers conform to the existing Go syntax: they are V or R registers, and their widths are specified in the opcode. The rules for specifying them are:
- If the instruction contains only one GP or SIMD register: for a 32-bit GP register, append W to the end of the opcode; for a 64-bit GP register, make no change; for a SIMD register with a width specification, append the corresponding letter (B, H, S, D or Q) to the end of the opcode.
- If the instruction contains multiple GP or SIMD registers, manual inspection found that they either share the same width or have fixed widths. They are distinguished by the width of their first Go assembly operand, and the suffix rules are the same as in the single-register case above.

This CL is generated by CL 759280.

Change-Id: Icc819cc30dd8fd1609de31ba7bcb4e3ac83c465e
Reviewed-on: https://go-review.googlesource.com/c/go/+/759261
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
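The suffix rules above can be expressed as a small lookup. This is a minimal sketch assuming a simplified two-kind register model; `regKind`, `opcodeSuffix` and the placeholder mnemonic `OPC` are hypothetical and do not come from the assembler or from its generator (CL 759280).

```go
package main

import "fmt"

// regKind is a hypothetical classification used only in this sketch; the
// assembler's real operand classes are different.
type regKind int

const (
	gpReg   regKind = iota // general-purpose R register
	simdReg                // SIMD&FP V register
)

// opcodeSuffix returns the suffix appended to the opcode for the first GP
// or SIMD operand: a 32-bit GP register adds "W", a 64-bit GP register adds
// nothing, and a SIMD register adds B, H, S, D or Q according to its width.
func opcodeSuffix(kind regKind, bits int) (string, error) {
	switch kind {
	case gpReg:
		switch bits {
		case 32:
			return "W", nil
		case 64:
			return "", nil
		}
	case simdReg:
		switch bits {
		case 8:
			return "B", nil
		case 16:
			return "H", nil
		case 32:
			return "S", nil
		case 64:
			return "D", nil
		case 128:
			return "Q", nil
		}
	}
	return "", fmt.Errorf("no suffix rule for kind %d with %d bits", kind, bits)
}

func main() {
	// "OPC" is a placeholder mnemonic, not a real SVE opcode.
	for _, c := range []struct {
		kind regKind
		bits int
	}{{gpReg, 32}, {gpReg, 64}, {simdReg, 16}} {
		s, err := opcodeSuffix(c.kind, c.bits)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println("OPC" + s) // OPCW, then OPC, then OPCH
	}
}
```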
7 days | cmd/internal/obj/arm64: add ASIMD cross-lane reduction instructions | Alexander Musman
Add support for ASIMD instructions that reduce a vector to a scalar by operating across all lanes. These use the ASIMDALL encoding class from the ARM architecture specification.

Integer cross-lane reductions (.B8, .B16, .H4, .H8, .S4):
- Signed max/min across lanes: VSMAXV, VSMINV
- Unsigned max/min across lanes: VUMAXV, VUMINV

Floating-point cross-lane reductions (.S4 arrangement):
- FP max/min across lanes: VFMAXV, VFMINV
- FP max/min across lanes (NM): VFMAXNMV, VFMINNMV

Change-Id: I6af4462d26803dfc7c78db2ad9df4284083e31e8
Reviewed-on: https://go-review.googlesource.com/c/go/+/762202
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
7 days | cmd/internal/obj/arm64: add ASIMD miscellaneous unary instructions | Alexander Musman
Add support for ASIMD unary miscellaneous instructions that operate on a single source register. These use the ASIMDMISC encoding class from the ARM architecture specification.

These instructions need some validation for arrangement constraints:
- VNOT only allows .B8/.B16 arrangements
- VCLS/VCLZ do not support D arrangements
- Floating-point variants (VFABS, VFNEG, VFSQRT, VFRINT*) only allow floating-point arrangements (S and D)

New instructions by group:
- Integer absolute/negate: VABS, VNEG
- Floating-point abs/negate: VFABS, VFNEG
- Floating-point sqrt: VFSQRT
- Floating-point round: VFRINTN, VFRINTP, VFRINTM, VFRINTZ
- Saturating abs/negate: VSQABS, VSQNEG
- Bit/count operations: VCLS, VCLZ, VNOT

Change-Id: I62242eda31f82cd34119c7d4f97316a030e7663b
Reviewed-on: https://go-review.googlesource.com/c/go/+/762201
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
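The arrangement constraints listed above amount to a per-mnemonic check. The sketch below is a hypothetical validator, not the assembler's code; it assumes arrangements are written as in Go syntax (B8, H4, S4, D2, ...) and that every floating-point mnemonic in this group starts with `VF`.

```go
package main

import (
	"fmt"
	"strings"
)

// checkArrangement applies the arrangement constraints from the commit
// message. Detecting floating-point variants by the "VF" prefix is an
// assumption of this sketch.
func checkArrangement(op, arng string) error {
	switch {
	case op == "VNOT":
		if arng != "B8" && arng != "B16" {
			return fmt.Errorf("%s: arrangement %s not allowed, want B8 or B16", op, arng)
		}
	case op == "VCLS" || op == "VCLZ":
		if strings.HasPrefix(arng, "D") {
			return fmt.Errorf("%s: D arrangements are not supported", op)
		}
	case strings.HasPrefix(op, "VF"):
		// VFABS, VFNEG, VFSQRT and VFRINT* only accept S and D lanes.
		if !strings.HasPrefix(arng, "S") && !strings.HasPrefix(arng, "D") {
			return fmt.Errorf("%s: floating-point arrangement required, got %s", op, arng)
		}
	}
	return nil
}

func main() {
	cases := [][2]string{{"VNOT", "B16"}, {"VNOT", "H8"}, {"VCLZ", "D2"}, {"VFSQRT", "S4"}, {"VFABS", "B8"}}
	for _, c := range cases {
		fmt.Println(c[0], c[1], checkArrangement(c[0], c[1]))
	}
}
```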
7 days | cmd/internal/obj/arm64: add ASIMD arithmetic instructions | Alexander Musman
Add encoding support for ASIMD three-register instructions covering floating-point, saturating, halving, integer multiply/accumulate, min/max (including pairwise variants), and bitwise operations.

These belong to the "Advanced SIMD Three-register (same)" instruction class defined by the ARM architecture, meaning the two source registers use the same element arrangement (e.g., both .S4 or both .D2). In the assembler they share a common encoding path using the ASIMDSAME() macro.

New instructions by group:
- Floating-point arithmetic: VFADD, VFSUB, VFMUL, VFDIV
- Floating-point min/max: VFMAX, VFMAXNM, VFMIN, VFMINNM
- Pairwise floating-point: VFADDP, VFMAXP, VFMINP, VFMAXNMP, VFMINNMP
- Saturating arithmetic: VSQADD, VUQADD, VSQSUB, VUQSUB
- Average (halving add): VSHADD, VSRHADD, VUHADD, VURHADD
- Integer multiply/accum: VMUL, VMLA, VMLS
- Integer min/max: VSMAX, VSMIN
- Pairwise integer min/max: VSMAXP, VSMINP, VUMAXP, VUMINP
- Bitwise: VBIC, VORN

Change-Id: I732c84123ad1f302260514fdfe0d020787da017b
Reviewed-on: https://go-review.googlesource.com/c/go/+/762200
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
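To illustrate what sharing an encoding path means, here is a sketch of how the fields of a three-register (same) instruction are packed into a 32-bit word. The function `threeSame` is hypothetical and is not the assembler's ASIMDSAME() macro; the bit layout and the example encodings follow the Arm Advanced SIMD three-same class, with registers given in GNU operand order.

```go
package main

import "fmt"

// threeSame packs the fields of an Advanced SIMD "three registers of the
// same type" instruction word. Bit layout, from bit 31 down:
//
//	0 | Q | U | 01110 | size(2) | 1 | Rm(5) | opcode(5) | 1 | Rn(5) | Rd(5)
//
// Q selects the 128-bit form, U and opcode select the operation, and size
// selects the element width. Out-of-range field values are masked off.
func threeSame(q, u, size, opcode, rm, rn, rd uint32) uint32 {
	return (q&1)<<30 | (u&1)<<29 | 0b01110<<24 | (size&3)<<22 | 1<<21 |
		(rm&31)<<16 | (opcode&31)<<11 | 1<<10 | (rn&31)<<5 | rd&31
}

func main() {
	// GNU syntax: add v0.16b, v1.16b, v2.16b (U=0, opcode=10000, size=00).
	fmt.Printf("%#x\n", threeSame(1, 0, 0, 0b10000, 2, 1, 0)) // 0x4e228420
	// GNU syntax: fadd v0.4s, v1.4s, v2.4s (U=0, opcode=11010, size=0:sz, sz=0).
	fmt.Printf("%#x\n", threeSame(1, 0, 0, 0b11010, 2, 1, 0)) // 0x4e22d420
}
```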
7 days | cmd/internal/obj/arm64: add ASIMD shift instructions | Alexander Musman
Add support for ASIMD shift instructions. These use the ASIMDSHF encoding class from the ARM architecture specification, where the shift amount is encoded as an immediate derived from the element size.

Also add ASIMD shifts-by-vector (3-register form) where the shift amount comes from a second vector register. These use the ASIMDSAME encoding class.

New instructions by group:
- Shift by immediate (signed): VSSHR, VSRSHR
- Shift by immediate (saturating): VSQSHL, VUQSHL
- Narrowing shift by immediate: VSHRN, VSHRN2
- Shift by vector (3-reg): VSSHL, VUSHL, VSQSHL, VUQSHL

Change-Id: I039cc16bc01980b04e6940cc1d4670faf5fa7e3c
Reviewed-on: https://go-review.googlesource.com/c/go/+/762180
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
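The shift-by-immediate forms fold the element size and the shift amount into the 7-bit immh:immb field. The helpers below sketch that computation as the Arm architecture defines it; the function names are hypothetical, and for narrowing shifts such as VSHRN `esize` is the size of the destination (narrow) element.

```go
package main

import "fmt"

// Right shifts (e.g. SSHR, SRSHR, SHRN): immh:immb = 2*esize - shift, 1 <= shift <= esize.
// Left shifts (e.g. SQSHL, UQSHL):      immh:immb = esize + shift,   0 <= shift < esize.
// The highest set bit of immh therefore identifies the element size.

func validEsize(esize uint) bool {
	return esize == 8 || esize == 16 || esize == 32 || esize == 64
}

// rightShiftImm returns immh:immb for a right shift of esize-bit elements.
func rightShiftImm(esize, shift uint) (uint, error) {
	if !validEsize(esize) || shift < 1 || shift > esize {
		return 0, fmt.Errorf("right shift %d out of range for %d-bit elements", shift, esize)
	}
	return 2*esize - shift, nil
}

// leftShiftImm returns immh:immb for a left shift of esize-bit elements.
func leftShiftImm(esize, shift uint) (uint, error) {
	if !validEsize(esize) || shift >= esize {
		return 0, fmt.Errorf("left shift %d out of range for %d-bit elements", shift, esize)
	}
	return esize + shift, nil
}

func main() {
	if v, err := rightShiftImm(8, 3); err == nil {
		fmt.Printf("right shift #3, 8-bit lanes: immh:immb = %07b\n", v) // 0001101
	}
	if v, err := leftShiftImm(32, 3); err == nil {
		fmt.Printf("left shift #3, 32-bit lanes: immh:immb = %07b\n", v) // 0100011
	}
}
```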
7 days | cmd/internal/obj/arm64: add remaining ASIMD compare instructions | Alexander Musman
Add the remaining arm64 ASIMD vector compare instructions. Each of these instructions produces either all zeros (false) or all ones (true) in each corresponding lane of the result.

Added integer comparison instructions:
- VCMEQ (compare to zero)
- VCMGE, VCMGT (signed, both two-register and compare to zero)
- VCMHI, VCMHS (unsigned two-register compare)
- VCMLE, VCMLT (signed compare to zero)

Added floating-point comparison instructions:
- VFCMEQ, VFCMGE, VFCMGT (both two-register and zero variants)
- VFCMLE, VFCMLT (compare to zero)

Change-Id: I913165d3934f2556c9bdf38c5103ef56d86383ef
Reviewed-on: https://go-review.googlesource.com/c/go/+/721640
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
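The all-ones/all-zeros result, and the difference between the signed (VCMGT) and unsigned (VCMHI) forms, can be modelled per lane in Go. These functions are hypothetical illustrations of the instruction semantics on the .B16 arrangement, not assembler code.

```go
package main

import "fmt"

// cmgt models VCMGT on .B16 lanes: a signed greater-than per lane that
// yields all ones (0xFF) for true and all zeros for false.
func cmgt(a, b [16]byte) (r [16]byte) {
	for i := range a {
		if int8(a[i]) > int8(b[i]) {
			r[i] = 0xFF
		}
	}
	return r
}

// cmhi models VCMHI: the same comparison on unsigned lanes.
func cmhi(a, b [16]byte) (r [16]byte) {
	for i := range a {
		if a[i] > b[i] {
			r[i] = 0xFF
		}
	}
	return r
}

// cmltZero models VCMLT against zero: all ones where the signed lane is negative.
func cmltZero(a [16]byte) (r [16]byte) {
	for i := range a {
		if int8(a[i]) < 0 {
			r[i] = 0xFF
		}
	}
	return r
}

func main() {
	var a, b [16]byte
	a[0], b[0] = 0x80, 0x01 // -128 vs 1 when signed, 128 vs 1 when unsigned
	fmt.Println(cmgt(a, b)[0], cmhi(a, b)[0], cmltZero(a)[0]) // 0 255 255
}
```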
2026-03-20 | cmd/internal/obj/arm64: new arm64 assembling path for SVE | Junyang Shao
This CL integrates a new assembling path specifically designed for SVE and other modern ARM64 instructions, utilizing generated instruction tables. It contains the foundational files and modifications to direct the assembling pipeline to use this new data-driven path.

In a.out.go, it registers new constants for registers and operand types used by SVE.

A new file inst.go is added, which defines the instruction table data types and utility functions for the new path. The entry point from the upstream pipeline is `tryEncode`. `tryEncode` returns false upon an encoding failure, which allows the upstream matching logic to handle multiple potential matches. The exact match is not finalized until an instruction is actually encoded, as detailed in the comments for `elemEncoders`.

This CL also introduces the core generated tables (`anames_gen.go`, `encoding_gen.go`, `goops_gen.go`, and `inst_gen.go`), which handle a wide variety of SVE instructions. A comprehensive end-to-end assembly test file (`arm64sveenc.s`) is added, containing hundreds of test cases for these SVE instructions to verify the new encoding path.

To facilitate these encodings, this CL implements handling for operand types such as AC_ARNG, AC_PREG, AC_PREGZM, and AC_ZREG. Others are left as TODOs.

The generated files in this CL are produced by the `instgen` tool in CL 755180.

Original author Eric Fang (eric.fang@arm.com, CL 424137)

Change-Id: I483f170c776fcd8edd8b8b04520f9d69ee0855dd
Reviewed-on: https://go-review.googlesource.com/c/go/+/742620
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
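The try-and-fall-through contract of `tryEncode` can be sketched as a loop over candidate forms. The `candidate` type, the `encodeFirst` function and the made-up encodings below are hypothetical stand-ins; the real tables and the `tryEncode` signature in inst.go may look quite different.

```go
package main

import (
	"errors"
	"fmt"
)

// candidate is a hypothetical stand-in for one instruction-table entry
// that could match a mnemonic; it is not a type from inst.go.
type candidate struct {
	name   string
	encode func(ops []string) (uint32, bool)
}

// encodeFirst tries each candidate in order. A false result only means
// "this form does not fit these operands", so the loop moves on; the match
// becomes final only when some candidate actually returns an encoding.
func encodeFirst(cands []candidate, ops []string) (uint32, string, error) {
	for _, c := range cands {
		if word, ok := c.encode(ops); ok {
			return word, c.name, nil
		}
	}
	return 0, "", errors.New("no candidate could encode the operands")
}

func main() {
	cands := []candidate{
		// Placeholder forms with made-up encodings, for illustration only.
		{"three-operand form", func(ops []string) (uint32, bool) { return 0x1, len(ops) == 3 }},
		{"two-operand form", func(ops []string) (uint32, bool) { return 0x2, len(ops) == 2 }},
	}
	word, name, err := encodeFirst(cands, []string{"Z1.S", "Z2.S"})
	fmt.Println(word, name, err) // 2 two-operand form <nil>
}
```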
2026-02-24 | internal/cpu,cmd/internal/obj/arm64: add SB | Roland Shoemaker
Add the SB (speculation barrier) instruction, and an internal/cpu feature bit to check its availability.

Change-Id: I7c2d887ae75598f7c11cc875ec15ec3be76c09f5
Reviewed-on: https://go-review.googlesource.com/c/go/+/729501
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-10-14 | cmd/internal/obj/arm64: add support for PAC instructions | Bill Roberts
Add support for the Pointer Authentication Code instructions required for the ELF ABI when enabling PAC-aware binaries. This allows assembly writers to add PAC instructions where needed to support this ABI. Follow-up work is to enable the compiler to emit these instructions in the appropriate places.

The TL;DR for the Linux ABI is that the prologue of a function that pushes the link register (LR) to the stack signs the LR with a key managed by the operating system and hardware, using a PAC instruction such as "paciasp". When the function epilogue restores the LR from the stack, it verifies the signature using an instruction such as "autiasp". This helps prevent attackers from modifying the return address on the stack, a common technique for ROP attacks.

Details on PAC can be found here:
- https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/enabling-pac-and-bti-on-aarch64
- https://developer.arm.com/documentation/109576/0100/Pointer-Authentication-Code

The ABI details can be found here:
- https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst

Change-Id: I4516ed1294d19f9ff9d278833d542821b6642aa9
Reviewed-on: https://go-review.googlesource.com/c/go/+/676675
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-03-26 | cmd/internal/obj/arm64: add support for BTI instruction | Joel Sing
Add support for the `BTI` instruction to the arm64 assembler. This instruction provides Branch Target Identification for targets of indirect branches. A BTI can be marked with a target type of 'C' (call), 'J' (jump) or 'JC' (jump or call).

Updates #66054

Change-Id: I1cf31a0382207bb75b9b2deb49ac298a59c00d8a
Reviewed-on: https://go-review.googlesource.com/c/go/+/646781
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Marvin Drees <marvin.drees@9elements.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
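The three target types can be modelled as a small compatibility check. This is a deliberately simplified sketch with hypothetical names: it treats every indirect branch as either a call or a jump and ignores finer architectural rules (for example, those for BR through X16 or X17).

```go
package main

import "fmt"

// btiTarget is a hypothetical model of the three BTI target types.
type btiTarget int

const (
	btiC  btiTarget = iota // 'C': valid landing pad for indirect calls
	btiJ                   // 'J': valid landing pad for indirect jumps
	btiJC                  // 'JC': valid for both
)

// landingAllowed reports whether an indirect call (isCall) or jump
// (!isCall) may land on a BTI of type t in this simplified model.
func landingAllowed(t btiTarget, isCall bool) bool {
	switch t {
	case btiC:
		return isCall
	case btiJ:
		return !isCall
	case btiJC:
		return true
	}
	return false
}

func main() {
	names := []string{"C", "J", "JC"}
	for t := btiC; t <= btiJC; t++ {
		fmt.Printf("BTI %-2s call=%v jump=%v\n", names[t], landingAllowed(t, true), landingAllowed(t, false))
	}
}
```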
2025-02-14 | cmd/asm,cmd/internal/obj/riscv: implement vector configuration setting instructions | Joel Sing
Implement vector configuration setting instructions (VSETVLI, VSETIVLI, VSETVL). These allow the vector length (vl) and vector type (vtype) CSRs to be configured via a single instruction. Unfortunately, each instruction has its own dedicated encoding.

In the case of VSETVLI/VSETIVLI, the vector type is specified via a series of special operands, which specify the selected element width (E8, E16, E32, E64), the vector register group multiplier (M1, M2, M4, M8, MF2, MF4, MF8), the vector tail policy (TU, TA) and the vector mask policy (MU, MA). Note that the order of these special operands matches non-Go assemblers.

Partially based on work by Pengcheng Wang <wangpengcheng.pp@bytedance.com>.

Cq-Include-Trybots: luci.golang.try:gotip-linux-riscv64
Change-Id: I431f59c1e048a3e84754f0643a963da473a741fe
Reviewed-on: https://go-review.googlesource.com/c/go/+/631936
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
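The special operands map onto fields of the vtype immediate. The sketch below packs them according to the RISC-V Vector extension 1.0 layout (vlmul in bits 2:0, vsew in bits 5:3, vta in bit 6, vma in bit 7); the function and table names are hypothetical, and the sketch does not reproduce the Go assembler's own encoding code.

```go
package main

import "fmt"

var (
	sewBits  = map[string]uint32{"E8": 0, "E16": 1, "E32": 2, "E64": 3}
	lmulBits = map[string]uint32{"M1": 0, "M2": 1, "M4": 2, "M8": 3, "MF8": 5, "MF4": 6, "MF2": 7}
)

// vtypeImm packs the VSETVLI/VSETIVLI special operands into a vtype value.
func vtypeImm(sew, lmul, tail, mask string) (uint32, error) {
	s, ok := sewBits[sew]
	if !ok {
		return 0, fmt.Errorf("unknown element width %q", sew)
	}
	l, ok := lmulBits[lmul]
	if !ok {
		return 0, fmt.Errorf("unknown register group multiplier %q", lmul)
	}
	var ta, ma uint32
	switch tail {
	case "TU":
	case "TA":
		ta = 1
	default:
		return 0, fmt.Errorf("unknown tail policy %q", tail)
	}
	switch mask {
	case "MU":
	case "MA":
		ma = 1
	default:
		return 0, fmt.Errorf("unknown mask policy %q", mask)
	}
	return ma<<7 | ta<<6 | s<<3 | l, nil
}

func main() {
	v, err := vtypeImm("E32", "M1", "TA", "MA")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%#x\n", v) // 0xd0
}
```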
2024-11-13 | cmd/internal/obj: add tool to generate Cnames string | chenguoqi
Add cmd/internal/obj/mkcnames.go to do the generation, and update the architecture packages to use it to maintain the Cnames tables. It currently works correctly on arm64, loong64, mips, ppc64 and s390x.

Change-Id: I5220b0ba6d8a8a5fcc4d9774731eb2af69a671af
Reviewed-on: https://go-review.googlesource.com/c/go/+/622256
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Commit-Queue: Ian Lance Taylor <iant@golang.org>
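One way such a generator can work is to parse the architecture's a.out.go and collect the `C_` constants in declaration order. This is a minimal sketch built on go/parser, assuming the Cnames entries are the constant names without their `C_` prefix; it is not the actual mkcnames.go, and the output variable name is a placeholder.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// cnames returns, in declaration order, the names of all constants in src
// that start with "C_", with that prefix removed.
func cnames(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "a.out.go", src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range f.Decls {
		gd, ok := decl.(*ast.GenDecl)
		if !ok || gd.Tok != token.CONST {
			continue
		}
		for _, spec := range gd.Specs {
			for _, id := range spec.(*ast.ValueSpec).Names {
				if strings.HasPrefix(id.Name, "C_") {
					names = append(names, strings.TrimPrefix(id.Name, "C_"))
				}
			}
		}
	}
	return names, nil
}

func main() {
	src := "package arm64\n\nconst (\n\tC_NONE = iota\n\tC_REG\n\tC_RSP\n)\n"
	names, err := cnames(src)
	if err != nil {
		fmt.Println(err)
		return
	}
	// "cnamesExample" is a placeholder variable name.
	fmt.Println("var cnamesExample = []string{")
	for _, n := range names {
		fmt.Printf("\t%q,\n", n)
	}
	fmt.Println("}")
}
```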
2023-10-18 | cmd/internal/obj/arm64: replace the migrated url address | cui fliter
Change-Id: I36a0f0989d37bef45ea8778da799b56a7e9a0c30
Reviewed-on: https://go-review.googlesource.com/c/go/+/529515
Run-TryBot: shuang cui <imcusg@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
2023-07-31 | cmd/internal/obj/arm64: improve classification of loads and stores | Joel Sing
Currently, pool literals are added when they are not needed, namely when the offset is a 24-bit unsigned scaled immediate. Improving the classification of loads and stores avoids generating these unused pool literals. More importantly, it provides a basis for further improvement of the load and store code generation.

Updates #59615

Change-Id: Ia3bad1709314565a05894a76c434cca2fa4533c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/512538
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-07-27 | cmd/asm: add the fifth argument of the instruction to Optab on arm64 | erifan01
Currently the Optab structure contains four arguments of an instruction, excluding the fifth argument p.RegTo2, which does not participate in instruction matching and is usually handled separately. Instructions with five operands are common in the newer Arm instruction set, so this CL adds the fifth argument to Optab to make instruction matching easier.

Because the oplook function must be updated accordingly, this CL also cleans up and modifies that function.

Change-Id: I1d95ad99e72a44dfad1e00db182cfc369a0e55c6
Reviewed-on: https://go-review.googlesource.com/c/go/+/505975
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
2023-06-14 | all: fix spelling errors | Alexander Yastrebov
Fix spelling errors discovered using https://github.com/codespell-project/codespell. Errors in data files and vendored packages are ignored.

Change-Id: I83c7818222f2eea69afbd270c15b7897678131dc
GitHub-Last-Rev: 3491615b1b82832cc0064f535786546e89aa6184
GitHub-Pull-Request: golang/go#60758
Reviewed-on: https://go-review.googlesource.com/c/go/+/502576
Auto-Submit: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-05-18 | cmd/asm: remove unsupported opcodes MOVNP and STLP for arm64 | erifan01
ARM64 has no MOVNP/MOVNPW or STLP/STLPW instructions, so these opcodes are currently useless. This CL deletes them. It also sorts the opcodes by name, which looks cleaner.

Change-Id: I25cfb636b23356ba0a50cba527a8c85b3f7e2ee4
Reviewed-on: https://go-review.googlesource.com/c/go/+/495695
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-23 | cmd/internal/obj/arm64: remove the transition from $0 to ZR | eric fang
Previously we converted $0 to the ZR register for various reasons, which causes two problems:
1. Confusion: the special case of the ZR register needs to be considered when dealing with constants. For encoding, some places encode ZR and some places encode $0, even though $0 has already been converted to ZR.
2. Unexpected instruction formats: every instruction that supports ZR register operands also accepts $0 in their place.

This patch removes this conversion. Note that this patch may cause previously unintentionally supported instruction formats to no longer be supported.

Change-Id: I3d8d2c06711b7614a38191397da7776417f1861c
Reviewed-on: https://go-review.googlesource.com/c/go/+/404316
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-09 | cmd/asm: add VTBX instruction on arm64 | Nick Ripley
Change-Id: Icd9eeb78bfc0c0bbe19dcb9841c9fdc0abc29cc9
Reviewed-on: https://go-review.googlesource.com/c/go/+/413314
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-04-01 | cmd/asm: add DC instruction on arm64 | erifan01
The previous code had only a placeholder for the DC instruction. gVisor needs this instruction, so this CL completes its support.

This patch is a copy of CL 250858, contributed by Junchen Li (junchen.li@arm.com).

Co-authored-by: Junchen Li(junchen.li@arm.com)
CustomizedGitHooks: yes
Change-Id: I76098048a227fbd08aa42c4173b028f0ab4f66e8
Reviewed-on: https://go-review.googlesource.com/c/go/+/302851
Reviewed-by: Cherry Mui <cherryyz@google.com>
Trust: Eric Fang <eric.fang@arm.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-01 | cmd/asm: add TLBI instruction on arm64 | erifan01
The previous code had only a placeholder for the TLBI instruction. gVisor needs this instruction, so this CL completes its support.

This patch is a copy of CL 250758, contributed by Junchen Li (junchen.li@arm.com).

Co-authored-by: Junchen Li(junchen.li@arm.com)
Change-Id: I69e893d2c1f75e227475de9e677548e14870f3cd
Reviewed-on: https://go-review.googlesource.com/c/go/+/302850
Reviewed-by: Cherry Mui <cherryyz@google.com>
Trust: Eric Fang <eric.fang@arm.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-01 | cmd/asm: refactor some operands that are not special registers on arm64 | erifan01
The previous code treats some operands, such as EQ and LT, as special registers, but they are not. This CL adds a new AddrType TYPE_SPOPD and a new class C_SPOPD to support this kind of special operand, and refactors the relevant code.

This patch is a copy of CL 260861, contributed by Junchen Li (junchen.li@arm.com).

Co-authored-by: Junchen Li(junchen.li@arm.com)
Change-Id: I57b28da458ee3332f610602632e7eda03af435f5
Reviewed-on: https://go-review.googlesource.com/c/go/+/302849
Reviewed-by: Cherry Mui <cherryyz@google.com>
Trust: Eric Fang <eric.fang@arm.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-03-15 | cmd/internal/obj/arm64: add TRN1 and TRN2 instructions support | fanzha02
Add test cases.

Fixes #51628

Change-Id: I433367d87e6bb5da5579c4be540079b92701c1fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/392294
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Trust: Fannie Zhang <Fannie.Zhang@arm.com>
2021-09-20 | cmd/compile: allow rotates to be merged with logical ops on arm64 | Keith Randall
Fixes #48002

Change-Id: Ie3a157d55b291f5ac2ef4845e6ce4fefd84fc642
Reviewed-on: https://go-review.googlesource.com/c/go/+/350912
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-03-04 | cmd/asm: add arm64 instructions VUMAX and VUMIN | eric fang
This CL adds support for the arm64 FP&SIMD instructions VUMAX and VUMIN.

Fixes #42326

Change-Id: I3757ba165dc31ce1ce70f3b06a9e5b94c14d2ab9
Reviewed-on: https://go-review.googlesource.com/c/go/+/271497
Trust: eric fang <eric.fang@arm.com>
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: fannie zhang <Fannie.Zhang@arm.com>
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-03-04 | cmd/asm: add 128-bit FLDPQ and FSTPQ instructions for arm64 | eric fang
This CL adds assembly support for the 128-bit FLDPQ and FSTPQ instructions.

This CL also deletes some incorrect pre/post-indexed LDP and STP entries, such as {ALDP, C_UAUTO4K, C_NONE, C_NONE, C_PAIR, 74, 8, REGSP, 0, C_XPRE}, because pre- and post-indexing do not work when the offset type is C_UAUTO4K.

Change-Id: Ifd901d4440eb06eb9e86c9dd17518749fdf32848
Reviewed-on: https://go-review.googlesource.com/c/go/+/273668
Trust: eric fang <eric.fang@arm.com>
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-02-19 | runtime: enable framepointer on all arm64 | Russ Cox
Frame pointers were already enabled on linux, darwin and ios, but not on freebsd, android, openbsd or netbsd. But the space was reserved on all platforms, leading to two different arm64 framepointer conditions in different parts of the code, one of which had no name (framepointer_enabled || GOARCH == "arm64", which might have been "framepointer_space_reserved"). So on the disabled systems, the stack layouts were still set up for frame pointers and the only difference was not actually maintaining the FP register in the generated code.

Reduce complexity by just enabling the frame pointer completely on all the arm64 systems.

This commit passes on freebsd, android and netbsd. I have not been able to try it on openbsd.

This CL is part of a stack adding windows/arm64 support (#36439), intended to land in the Go 1.17 cycle. This CL is, however, not windows/arm64-specific. It is cleanup meant to make the port (and future ports) easier.

Change-Id: I83bd23369d24b76db4c6a648fa74f6917819a093
Reviewed-on: https://go-review.googlesource.com/c/go/+/288814
Trust: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-11-04 | cmd/asm: rename arm64 instructions LDANDx to LDCLRx | Jonathan Swinney
The LDANDx instructions were misleading because they correspond to the mnemonic LDCLRx as defined in the Arm Architecture Reference Manual for Armv8. This changes the assembler to use the same mnemonic as the GNU assembler and the manual.

The instruction has the form:
LDCLRx Rs, (Rb), Rt: *Rb -> Rt, *Rb AND NOT(Rs) -> *Rb

Change-Id: I94ae003e99e817209bba1afe960e612bf3a0b410
Reviewed-on: https://go-review.googlesource.com/c/go/+/267138
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: fannie zhang <Fannie.Zhang@arm.com>
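The LDCLR semantics (clear in memory the bits that are set in Rs, and return the old value) can be modelled in portable Go with a compare-and-swap loop. `ldclr` is a hypothetical helper for the 64-bit form; it shows the memory effect and the returned old value, not how the assembler encodes the instruction.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// ldclr models the 64-bit LDCLRx: it atomically replaces *addr with
// *addr AND NOT rs and returns the previous value, which the hardware
// instruction writes to Rt.
func ldclr(addr *uint64, rs uint64) uint64 {
	for {
		old := atomic.LoadUint64(addr)
		if atomic.CompareAndSwapUint64(addr, old, old&^rs) {
			return old
		}
	}
}

func main() {
	mem := uint64(0b1111)
	old := ldclr(&mem, 0b0101)
	fmt.Printf("old=%04b new=%04b\n", old, mem) // old=1111 new=1010
}
```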
2020-10-29 | cmd/internal/obj/arm64: add CASx/CASPx instructions | fanzha02
This patch adds support for the CASx and CASPx atomic instructions.

go syntax => gnu syntax
CASD Rs, (Rn|RSP), Rt => cas Xs, Xt, (Xn|SP)
CASALW Rs, (Rn|RSP), Rt => casal Ws, Wt, (Xn|SP)
CASPD (Rs, Rs+1), (Rn|RSP), (Rt, Rt+1) => casp Xs, Xs+1, Xt, Xt+1, (Xn|SP)
CASPW (Rs, Rs+1), (Rn|RSP), (Rt, Rt+1) => casp Ws, Ws+1, Wt, Wt+1, (Xn|SP)

This patch changes the type of prog.RestArgs from "[]Addr" to "[]struct{Addr, Pos}", where Pos is an enum indicating the position of the operand.

This patch also adds test cases.

Change-Id: Ib971cfda7890b7aa895d17bab22dea326c7fcaa4
Reviewed-on: https://go-review.googlesource.com/c/go/+/233277
Trust: fannie zhang <Fannie.Zhang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
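The single-register CAS semantics can be modelled in Go with sync/atomic. `casd` is a hypothetical helper for the 64-bit form only; the pair forms (CASPD, CASPW) compare and swap two adjacent registers and are not modelled here.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// casd models CASD Rs, (Rn), Rt: if *addr equals rs, it is replaced by rt.
// Either way, the value observed in memory is returned, which the hardware
// instruction writes back to Rs.
func casd(addr *uint64, rs, rt uint64) uint64 {
	for {
		old := atomic.LoadUint64(addr)
		if old != rs {
			return old // comparison failed; memory is unchanged
		}
		if atomic.CompareAndSwapUint64(addr, old, rt) {
			return old
		}
		// Memory changed between the load and the CAS; retry.
	}
}

func main() {
	mem := uint64(5)
	old := casd(&mem, 5, 9)
	fmt.Println(old, mem) // 5 9
	old = casd(&mem, 5, 1)
	fmt.Println(old, mem) // 9 9
}
```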
2020-10-29 | cmd/asm: add several arm64 SIMD instructions | fanzha02
This patch enables the VSLI, VUADDW(2), VUSRA and FMOVQ SIMD instructions required by issue #40725. The GNU syntax of 'FMOVQ' is the 128-bit ldr/str (immediate, SIMD&FP).

Add test cases.

Fixes #40725

Change-Id: Ide968ef4a9385ce4cd8f69bce854289014d30456
Reviewed-on: https://go-review.googlesource.com/c/go/+/258397
Trust: fannie zhang <Fannie.Zhang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-10 | cmd/asm: Add SHA3 hardware instructions for ARM64 | Meng Zhuo
Armv8.2-SHA introduced four SHA3-related instructions:
EOR3 <Vd>.16B, <Vn>.16B, <Vm>.16B, <Va>.16B
RAX1 <Vd>.2D, <Vn>.2D, <Vm>.2D
XAR <Vd>.2D, <Vn>.2D, <Vm>.2D, #<imm6>
BCAX <Vd>.16B, <Vn>.16B, <Vm>.16B, <Va>.16B

We convert them into Go asm style as:
VEOR3 <Va>.B16, <Vm>.B16, <Vn>.B16, <Vd>.B16
VRAX1 <Vm>.D2, <Vn>.D2, <Vd>.D2
VXAR $imm6, <Vm>.D2, <Vn>.D2, <Vd>.D2
VBCAX <Va>.B16, <Vm>.B16, <Vn>.B16, <Vd>.B16

Armv8 Reference Manual:
* EOR3 (Three-way Exclusive OR) on C7.2.42
* RAX1 (Rotate and Exclusive OR) on C7.2.217
* XAR (Exclusive OR and Rotate) on C7.2.401
* BCAX (Bit Clear and Exclusive OR) on C7.2.12

Change-Id: I9a5d1b5ad508ed8fd5289d535906c54d9a63ca5a
Reviewed-on: https://go-review.googlesource.com/c/go/+/180757
Run-TryBot: Meng Zhuo <mzh@golangcn.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: Emmanuel Odeke <emm.odeke@gmail.com>
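The operations these instructions perform can be written out per 64-bit lane in Go with math/bits. These helpers are hypothetical models of the instruction semantics using the GNU operand names (Vd, Vn, Vm, Va); they can be handy for computing expected results in tests, but they are not assembler code. EOR3 and BCAX are bitwise on the whole register, so applying them lane by lane gives the same result.

```go
package main

import (
	"fmt"
	"math/bits"
)

// eor3: Vd = Vn ^ Vm ^ Va
func eor3(n, m, a uint64) uint64 { return n ^ m ^ a }

// rax1: Vd = Vn ^ ROL(Vm, 1)
func rax1(n, m uint64) uint64 { return n ^ bits.RotateLeft64(m, 1) }

// xar: Vd = ROR(Vn ^ Vm, imm6); a negative count rotates right.
func xar(n, m uint64, imm6 uint) uint64 { return bits.RotateLeft64(n^m, -int(imm6&63)) }

// bcax: Vd = Vn ^ (Vm AND NOT Va)
func bcax(n, m, a uint64) uint64 { return n ^ (m &^ a) }

func main() {
	fmt.Printf("%#x\n", eor3(1, 2, 4))       // 0x7
	fmt.Printf("%#x\n", rax1(0, 1))          // 0x2
	fmt.Printf("%#x\n", xar(1, 0, 1))        // 0x8000000000000000
	fmt.Printf("%#x\n", bcax(0, 0xff, 0x0f)) // 0xf0
}
```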
2020-09-25 | cmd/asm: fix the issue of moving 128-bit integers to vector registers on arm64 | fanzha02
CL 249758 added the `FMOVQ $vcon, Vd` instruction, and the assembler used 128-bit SIMD literal loading to load `$vcon` from the pool into the 128-bit vector register `Vd`. Because Go does not have 128-bit integers for now, the assembler reports an `immediate out of range` error when assembling the `FMOVQ $0x123456789abcdef0123456789abcdef, V0` instruction.

This patch lets 128-bit integers take two 64-bit operands, for the high and low parts separately, and adds the `VMOVQ $hi, $lo, Vd` instruction to move `$hi<<64+$lo` into the 128-bit register `Vd`.

In addition, this patch renames the `FMOVQ/FMOVD/FMOVS` ops to `VMOVQ/VMOVD/VMOVS` and uses them to move 128-bit, 64-bit and 32-bit constants into vector registers, respectively.

Update the Go doc.

Fixes #40725

Change-Id: Ia3c83bb6463f104d2bee960905053a97299e0a3a
Reviewed-on: https://go-review.googlesource.com/c/go/+/255900
Trust: fannie zhang <Fannie.Zhang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
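Splitting a 128-bit literal into the `$hi` and `$lo` operands of VMOVQ can be sketched with math/big. `split128` is a hypothetical helper, not part of the assembler. For the literal used in the message above, the two halves happen to be equal.

```go
package main

import (
	"fmt"
	"math/big"
)

// split128 parses a non-negative integer literal of at most 128 bits and
// returns its high and low 64-bit halves, so that value == hi<<64 + lo.
func split128(lit string) (hi, lo uint64, err error) {
	v, ok := new(big.Int).SetString(lit, 0) // base 0 accepts a 0x prefix
	if !ok {
		return 0, 0, fmt.Errorf("invalid integer literal %q", lit)
	}
	if v.Sign() < 0 || v.BitLen() > 128 {
		return 0, 0, fmt.Errorf("%s does not fit in 128 unsigned bits", lit)
	}
	mask := new(big.Int).SetUint64(^uint64(0))
	lo = new(big.Int).And(v, mask).Uint64()
	hi = new(big.Int).Rsh(v, 64).Uint64()
	return hi, lo, nil
}

func main() {
	hi, lo, err := split128("0x123456789abcdef0123456789abcdef")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("VMOVQ $0x%016x, $0x%016x, V0\n", hi, lo)
	// VMOVQ $0x0123456789abcdef, $0x0123456789abcdef, V0
}
```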
2020-09-16cmd/internal/obj/arm64: optimize the instruction of moving long effective ↵diaxu01
stack address
Currently, when the offset of "MOVD $offset(Rn), Rd" is a large positive constant or a negative constant, the assembler loads this offset from the constant pool. This patch gets rid of the constant pool by encoding the offset into two ADD instructions if it is a large positive constant, or into one SUB instruction if it is negative. Very large negative offsets are rarely used, so that case is not optimized.
Optimized case 1: MOVD $-0x100000(R7), R0
Before: LDR 0x67670(constant pool), R27; ADD R27.UXTX, R0, R7
After: SUB $0x100000, R7, R0
Optimized case 2: MOVD $0x123468(R7), R0
Before: LDR 0x67670(constant pool), R27; ADD R27.UXTX, R0, R7
After: ADD $0x123000, R7, R27; ADD $0x000468, R27, R0
1. Binary size before/after.
binary size change
pkg/linux_arm64 +4KB
pkg/tool/linux_arm64 no change
go no change
gofmt no change
2. go1 benchmark.
name old time/op new time/op delta
pkg:test/bench/go1 goos:linux goarch:arm64
BinaryTree17-64 7335721401.800000ns +-40% 6264542009.800000ns +-14% ~ (p=0.421 n=5+5)
Fannkuch11-64 3886551822.600000ns +- 0% 3875870590.200000ns +- 0% ~ (p=0.151 n=5+5)
FmtFprintfEmpty-64 82.960000ns +- 1% 83.900000ns +- 2% +1.13% (p=0.048 n=5+5)
FmtFprintfString-64 149.200000ns +- 1% 148.000000ns +- 0% -0.80% (p=0.016 n=5+4)
FmtFprintfInt-64 177.000000ns +- 0% 178.400000ns +- 2% ~ (p=0.794 n=4+5)
FmtFprintfIntInt-64 240.200000ns +- 2% 239.400000ns +- 4% ~ (p=0.302 n=5+5)
FmtFprintfPrefixedInt-64 300.400000ns +- 0% 299.200000ns +- 1% ~ (p=0.119 n=5+5)
FmtFprintfFloat-64 360.000000ns +- 0% 361.600000ns +- 3% ~ (p=0.349 n=4+5)
FmtManyArgs-64 1064.400000ns +- 1% 1061.400000ns +- 0% ~ (p=0.087 n=5+5)
GobDecode-64 12080404.400000ns +- 2% 11637601.000000ns +- 1% -3.67% (p=0.008 n=5+5)
GobEncode-64 8474973.800000ns +- 2% 7977801.600000ns +- 2% -5.87% (p=0.008 n=5+5)
Gzip-64 416501238.400000ns +- 0% 410463405.400000ns +- 0% -1.45% (p=0.008 n=5+5)
Gunzip-64 58088415.200000ns +- 0% 58826209.600000ns +- 0% +1.27% (p=0.008 n=5+5)
HTTPClientServer-64 128660.200000ns +-23% 117840.800000ns +- 8% ~ (p=0.222 n=5+5)
JSONEncode-64 17547746.800000ns +- 4% 17216180.000000ns +- 1% ~ (p=0.222 n=5+5)
JSONDecode-64 80879896.000000ns +- 1% 80063737.200000ns +- 0% -1.01% (p=0.008 n=5+5)
Mandelbrot200-64 5484901.600000ns +- 0% 5483614.400000ns +- 0% ~ (p=0.310 n=5+5)
GoParse-64 6201166.800000ns +- 6% 6150920.600000ns +- 1% ~ (p=0.548 n=5+5)
RegexpMatchEasy0_32-64 135.000000ns +- 0% 139.200000ns +- 7% ~ (p=0.643 n=5+5)
RegexpMatchEasy0_1K-64 484.600000ns +- 2% 483.800000ns +- 2% ~ (p=0.984 n=5+5)
RegexpMatchEasy1_32-64 128.000000ns +- 1% 124.600000ns +- 1% -2.66% (p=0.008 n=5+5)
RegexpMatchEasy1_1K-64 769.400000ns +- 2% 761.400000ns +- 1% ~ (p=0.460 n=5+5)
RegexpMatchMedium_32-64 12.900000ns +- 0% 12.500000ns +- 0% -3.10% (p=0.008 n=5+5)
RegexpMatchMedium_1K-64 57879.200000ns +- 1% 56512.200000ns +- 0% -2.36% (p=0.008 n=5+5)
RegexpMatchHard_32-64 3091.600000ns +- 1% 3071.000000ns +- 0% -0.67% (p=0.048 n=5+5)
RegexpMatchHard_1K-64 92941.200000ns +- 1% 92794.000000ns +- 0% ~ (p=1.000 n=5+5)
Revcomp-64 1695605187.000000ns +-54% 1821697637.400000ns +-47% ~ (p=1.000 n=5+5)
Template-64 112839686.800000ns +- 1% 109964069.200000ns +- 3% ~ (p=0.095 n=5+5)
TimeParse-64 587.000000ns +- 0% 587.000000ns +- 0% ~ (all equal)
TimeFormat-64 586.000000ns +- 1% 584.200000ns +- 1% ~ (p=0.659 n=5+5)
[Geo mean] 81804.262218ns 80694.712973ns -1.36%

name old speed new speed delta
pkg:test/bench/go1 goos:linux goarch:arm64
GobDecode-64 63.6MB/s +- 2% 66.0MB/s +- 1% +3.78% (p=0.008 n=5+5)
GobEncode-64 90.6MB/s +- 2% 96.2MB/s +- 2% +6.23% (p=0.008 n=5+5)
Gzip-64 46.6MB/s +- 0% 47.3MB/s +- 0% +1.47% (p=0.008 n=5+5)
Gunzip-64 334MB/s +- 0% 330MB/s +- 0% -1.25% (p=0.008 n=5+5)
JSONEncode-64 111MB/s +- 4% 113MB/s +- 1% ~ (p=0.222 n=5+5)
JSONDecode-64 24.0MB/s +- 1% 24.2MB/s +- 0% +1.02% (p=0.008 n=5+5)
GoParse-64 9.35MB/s +- 6% 9.42MB/s +- 1% ~ (p=0.571 n=5+5)
RegexpMatchEasy0_32-64 237MB/s +- 0% 231MB/s +- 7% ~ (p=0.690 n=5+5)
RegexpMatchEasy0_1K-64 2.11GB/s +- 2% 2.12GB/s +- 2% ~ (p=1.000 n=5+5)
RegexpMatchEasy1_32-64 250MB/s +- 1% 257MB/s +- 1% +2.63% (p=0.008 n=5+5)
RegexpMatchEasy1_1K-64 1.33GB/s +- 2% 1.35GB/s +- 1% ~ (p=0.548 n=5+5)
RegexpMatchMedium_32-64 77.6MB/s +- 0% 79.8MB/s +- 0% +2.80% (p=0.008 n=5+5)
RegexpMatchMedium_1K-64 17.7MB/s +- 1% 18.1MB/s +- 0% +2.41% (p=0.008 n=5+5)
RegexpMatchHard_32-64 10.4MB/s +- 1% 10.4MB/s +- 0% ~ (p=0.056 n=5+5)
RegexpMatchHard_1K-64 11.0MB/s +- 1% 11.0MB/s +- 0% ~ (p=0.984 n=5+5)
Revcomp-64 188MB/s +-71% 155MB/s +-71% ~ (p=1.000 n=5+5)
Template-64 17.2MB/s +- 1% 17.7MB/s +- 3% ~ (p=0.095 n=5+5)
[Geo mean] 79.2MB/s 79.3MB/s +0.24%

Change-Id: I593ac3e7037afafc3605ad4b0cfb51d5dd88015d
Reviewed-on: https://go-review.googlesource.com/c/go/+/232438
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
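A minimal sketch of the offset lowering described in this commit, assuming that an arm64 ADD/SUB immediate is an unsigned 12-bit value optionally shifted left by 12, and that R27 is the temporary register as in the examples. The helpers `addImmOK` and `lowerAddrOffset` are hypothetical (not the assembler's real functions) and only return the instruction text.

```go
package main

import "fmt"

// addImmOK reports whether v fits an arm64 ADD/SUB immediate:
// an unsigned 12-bit value, optionally shifted left by 12 bits.
func addImmOK(v int64) bool {
	return v >= 0 && (v < 1<<12 || (v&0xfff == 0 && v < 1<<24))
}

// lowerAddrOffset sketches how "MOVD $off(Rn), Rd" can avoid the constant
// pool. It returns the instructions as text, or ok=false when the offset
// still needs the constant-pool fallback (e.g. a very large negative offset).
// R27 is used as the temporary register (hypothetical choice for the sketch).
func lowerAddrOffset(off int64, rn, rd string) (seq []string, ok bool) {
	switch {
	case addImmOK(off):
		// Small offset: a single ADD.
		return []string{fmt.Sprintf("ADD $%#x, %s, %s", off, rn, rd)}, true
	case off < 0 && addImmOK(-off):
		// Negative offset whose magnitude is encodable: a single SUB.
		return []string{fmt.Sprintf("SUB $%#x, %s, %s", -off, rn, rd)}, true
	case off > 0 && off < 1<<24:
		// Large positive offset: split into a shifted high part and a low 12-bit part.
		hi, lo := off&^0xfff, off&0xfff
		return []string{
			fmt.Sprintf("ADD $%#x, %s, R27", hi, rn),
			fmt.Sprintf("ADD $%#x, R27, %s", lo, rd),
		}, true
	}
	return nil, false
}

func main() {
	for _, off := range []int64{-0x100000, 0x123468, -0x1234567} {
		seq, ok := lowerAddrOffset(off, "R7", "R0")
		fmt.Println(off, seq, ok)
	}
}
```

The split works because any positive offset below 1<<24 is the sum of a 12-bit value shifted by 12 and an unshifted 12-bit value, each of which an ADD can encode directly.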
2020-09-10 | cmd/asm: add more SIMD instructions on arm64 | Junchen Li
This CL adds the USHLL, USHLL2, UZP1, UZP2 and BIF instructions requested in #40725. Since UXTL* are aliases of USHLL*, this CL also merges them into one case.
Updates #40725
Change-Id: I404a4fdaf953319f72eea548175bec1097a2a816
Reviewed-on: https://go-review.googlesource.com/c/go/+/253659
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
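Why UXTL can share a case with USHLL is easiest to see from the element semantics. The sketch below models only the lower-half 8B to 8H form on plain Go arrays (USHLL2, which reads the upper half of a 16B source, is not modeled); `ushll` and `uxtl` are hypothetical helper names, not assembler APIs.

```go
package main

import "fmt"

// ushll models the lower-half 8B -> 8H form of USHLL: each of the eight
// unsigned source bytes is zero-extended to 16 bits and shifted left by
// shift (0..7 for this arrangement).
func ushll(src [8]uint8, shift uint) [8]uint16 {
	var dst [8]uint16
	for i, b := range src {
		dst[i] = uint16(b) << shift
	}
	return dst
}

// uxtl is USHLL with a shift of 0, which is why the assembler can handle
// both mnemonics in one case.
func uxtl(src [8]uint8) [8]uint16 {
	return ushll(src, 0)
}

func main() {
	src := [8]uint8{1, 2, 3, 0x80, 0xff, 0, 7, 9}
	fmt.Println(uxtl(src))     // zero-extended lanes
	fmt.Println(ushll(src, 4)) // zero-extended, then shifted left by 4
}
```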
2020-09-10 | cmd/internal/obj/arm64: enable some SIMD instructions | fanzha02
Enable the VBSL, VBIT, VCMTST, VUXTL, VUXTL2 and FMOVQ SIMD instructions requested in issue #40725. The FMOVQ instruction is used to move a large constant into a Vn register. Add test cases.
Fixes #40725
Change-Id: I1cac1922a0a0165d698a4b73a41f7a5f0a0ad549
Reviewed-on: https://go-review.googlesource.com/c/go/+/249758
Reviewed-by: Cherry Zhang <cherryyz@google.com>
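The commit does not spell out what VBSL, VBIT and VCMTST compute. The sketch below models the per-bit semantics of the underlying A64 instructions BSL, BIT and CMTST on one 64-bit lane, using the Arm manual's operand names (Vd, Vn, Vm) rather than Go's operand order, which may differ. The helper names are hypothetical.

```go
package main

import "fmt"

// bsl models BSL Vd, Vn, Vm on one 64-bit lane: each set bit of the old Vd
// selects the corresponding bit of Vn, each clear bit selects Vm.
func bsl(d, n, m uint64) uint64 { return (d & n) | (^d & m) }

// bit models BIT Vd, Vn, Vm: bits of Vn are inserted into Vd wherever the
// corresponding bit of Vm is set; the other bits of Vd are kept.
func bit(d, n, m uint64) uint64 { return (d &^ m) | (n & m) }

// cmtstB models CMTST on eight 8-bit lanes packed into a uint64: a result
// lane is all ones if Vn AND Vm has any bit set in that lane, else zero.
func cmtstB(n, m uint64) uint64 {
	x := n & m
	var r uint64
	for i := uint(0); i < 8; i++ {
		if (x>>(8*i))&0xff != 0 {
			r |= uint64(0xff) << (8 * i)
		}
	}
	return r
}

func main() {
	fmt.Printf("%#x\n", bsl(0xff00, 0x1234, 0xabcd))
	fmt.Printf("%#x\n", bit(0, 0xffff, 0x00f0))
	fmt.Printf("%#x\n", cmtstB(0x0100ff, 0x0101f0))
}
```

BSL and BIT are purely bitwise, so the lane width does not matter for them; CMTST is per element, which is why the sketch fixes an 8-bit arrangement.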
2020-08-18 | cmd/asm: Add SHA512 hardware instructions for ARM64 | Meng Zhuo
ARMv8.2-SHA adds the SHA512 instructions:
1. SHA512H Vm.D2, Vn, Vd
2. SHA512H2 Vm.D2, Vn, Vd
3. SHA512SU0 Vn.D2, Vd.D2
4. SHA512SU1 Vm.D2, Vn.D2, Vd.D2
ARMv8 Architecture Reference Manual C7.2.234-C7.2.234
Change-Id: Ie970fef1bba5312ad466f246035da4c40a1bbb39
Reviewed-on: https://go-review.googlesource.com/c/go/+/180057
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-02-25 | cmd/asm: add asimd instruction 'rev16' on arm64 | Xiangdong Ji
Add support for the ASIMD instruction REV16, which reverses the order of the 8-bit elements within each 16-bit halfword.
Syntax: VREV16 <Vn>.<T>, <Vd>.<T>
<T> must be either B8 or B16.
Change-Id: I7a7b8e772589c51ca9eb6dca98bab1aac863c6c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/213738
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
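A small model of what VREV16 does with the B16 arrangement: the two bytes inside each halfword are swapped. The B8 form works on the low 8 bytes only and is not modeled here; `rev16` is a hypothetical helper operating on a plain array, not an assembler function.

```go
package main

import "fmt"

// rev16 models VREV16 with the B16 arrangement: the two bytes inside each
// 16-bit halfword of the 128-bit register are swapped.
func rev16(v [16]byte) [16]byte {
	var out [16]byte
	for i := 0; i < len(v); i += 2 {
		out[i], out[i+1] = v[i+1], v[i]
	}
	return out
}

func main() {
	var in [16]byte
	for i := range in {
		in[i] = byte(i)
	}
	fmt.Println(rev16(in))
}
```

Applying the operation twice restores the input, which is a quick sanity check for any implementation of it.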
2019-10-18 | cmd/internal/obj/arm64: add support of NOOP instruction | diaxu01
This patch uses the symbol NOOP to support the arm64 instruction NOP. On arm64, NOP (No Operation) does nothing other than advance the program counter by 4; it can be used for instruction alignment. The name NOOP is used instead of NOP because Go already has a generic "NOP" instruction, which is a zero-width pseudo-instruction. On arm64, NOP is an alias of HINT #0, so this patch also adds test cases for HINT #0.
Change-Id: I54e6854c46516eb652b412ef9e0f73ab7f171f8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/200578
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
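Because NOP is HINT #0, its 4-byte encoding falls out of the HINT encoding, where the 7-bit immediate sits in bits [11:5] (the CRm:op2 fields). The helper `hint` below is a hypothetical sketch based on the A64 encoding, not the assembler's own code.

```go
package main

import "fmt"

// hint returns the 32-bit encoding of the arm64 instruction HINT #imm.
// imm (0..127) occupies bits [11:5], the CRm:op2 fields.
func hint(imm uint32) uint32 {
	return 0xd503201f | (imm&0x7f)<<5
}

func main() {
	// NOP is HINT #0. Unlike Go's zero-width NOP pseudo-instruction, it
	// occupies 4 bytes in the instruction stream.
	fmt.Printf("NOOP/NOP = %#x\n", hint(0))
	fmt.Printf("HINT #1  = %#x\n", hint(1))
}
```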
2019-10-08 | cmd/internal/obj/arm64: add error checking for system registers. | diaxu01
This CL adds system register error checking test cases. There are two kinds of error test cases:
1. Illegal combination.
MRS must be used as: MRS <system register>, <general register>.
MSR must be used as: MSR <general register>, <system register>.
Error usage examples:
MRS R8, VTCR_EL2 // ERROR "illegal combination"
MSR VTCR_EL2, R8 // ERROR "illegal combination"
2. Illegal read or write access.
Error usage examples:
MSR R7, MIDR_EL1 // ERROR "expected writable system register or pstate"
MRS OSLAR_EL1, R3 // ERROR "expected readable system register"
This CL reads each system register's readable and writable properties to check whether it is used with legal read or write access. This property is named AccessFlags in sysRegEnc.go, and it is generated automatically; the system register generator was modified to emit it.
Change-Id: Ic83d5f372de38d1ecd0df1ca56b354ee157f16b4
Reviewed-on: https://go-review.googlesource.com/c/go/+/194917
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
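A sketch of the read/write access check, assuming a per-register flag set in the spirit of AccessFlags. The type `accessFlags`, the constants `srRead`/`srWrite`, the three-entry table and `checkSysRegAccess` are all hypothetical stand-ins; the real generated data in sysRegEnc.go may be laid out differently. The error strings are the ones quoted in the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// accessFlags is a hypothetical stand-in for the AccessFlags property.
type accessFlags uint8

const (
	srRead accessFlags = 1 << iota
	srWrite
)

// sysRegFlags is a tiny hypothetical table covering the registers used in
// the examples above.
var sysRegFlags = map[string]accessFlags{
	"MIDR_EL1":  srRead,           // read-only
	"OSLAR_EL1": srWrite,          // write-only
	"VTCR_EL2":  srRead | srWrite, // read/write
}

// checkSysRegAccess rejects an MRS from a register that is not readable and
// an MSR to a register that is not writable.
func checkSysRegAccess(op, reg string) error {
	flags, ok := sysRegFlags[reg]
	if !ok {
		return fmt.Errorf("unknown system register %s", reg)
	}
	switch {
	case op == "MRS" && flags&srRead == 0:
		return errors.New("expected readable system register")
	case op == "MSR" && flags&srWrite == 0:
		return errors.New("expected writable system register or pstate")
	}
	return nil
}

func main() {
	fmt.Println(checkSysRegAccess("MSR", "MIDR_EL1"))
	fmt.Println(checkSysRegAccess("MRS", "OSLAR_EL1"))
	fmt.Println(checkSysRegAccess("MRS", "VTCR_EL2"))
}
```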
2019-10-03 | cmd/asm: add VLD[1-4]R vector instructions on arm64 | Meng Zhuo
This change adds VLD1R, VLD2R, VLD3R and VLD4R.
Change-Id: Ie19e9ae02fdfc94b9344acde8c9938849efb0bf0
Reviewed-on: https://go-review.googlesource.com/c/go/+/181697
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
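The "R" variants load a single structure and replicate it to every lane. The sketch below models LDnR for 32-bit elements with a four-lane arrangement, assuming little-endian memory. `ldNR` is a hypothetical helper, not an assembler or runtime API.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ldNR models LDnR with 32-bit elements (the S4 arrangement): n consecutive
// little-endian elements are read from mem, and element i is replicated
// into all four lanes of register i. n is 1..4 for VLD1R..VLD4R.
func ldNR(mem []byte, n int) [][4]uint32 {
	regs := make([][4]uint32, n)
	for i := range regs {
		e := binary.LittleEndian.Uint32(mem[4*i:])
		regs[i] = [4]uint32{e, e, e, e}
	}
	return regs
}

func main() {
	mem := []byte{1, 0, 0, 0, 2, 0, 0, 0}
	fmt.Println(ldNR(mem, 2)) // [[1 1 1 1] [2 2 2 2]]
}
```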
2019-08-28 | cmd/asm: add V[LD|ST][2-4] vector instructions on arm64 | Meng Zhuo
This change adds VLD2, VLD3, VLD4, VST2, VST3 and VST4 (multiple structures) for image and multimedia optimization.
Change-Id: Iae3538ef4434e436e3fb2f19153c58f918f773af
Reviewed-on: https://go-review.googlesource.com/c/go/+/166518
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
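The image use case comes from the de-interleaving behavior of the multiple-structure forms: for example, VLD3 can split packed RGB pixels into separate R, G and B registers, and VST3 re-packs them. The sketch below models this with byte elements on slices; `ldN` and `stN` are hypothetical helpers.

```go
package main

import "fmt"

// ldN models LDn (multiple structures) with byte elements: mem holds
// interleaved n-member structures (for example R,G,B,R,G,B,...) and
// register j receives member j of every structure.
func ldN(mem []byte, n, lanes int) [][]byte {
	regs := make([][]byte, n)
	for j := range regs {
		regs[j] = make([]byte, lanes)
		for i := 0; i < lanes; i++ {
			regs[j][i] = mem[i*n+j]
		}
	}
	return regs
}

// stN models STn: the inverse of ldN, re-interleaving the registers.
func stN(regs [][]byte) []byte {
	n, lanes := len(regs), len(regs[0])
	mem := make([]byte, n*lanes)
	for j := 0; j < n; j++ {
		for i := 0; i < lanes; i++ {
			mem[i*n+j] = regs[j][i]
		}
	}
	return mem
}

func main() {
	rgb := []byte{10, 20, 30, 11, 21, 31} // two RGB pixels
	planes := ldN(rgb, 3, 2)
	fmt.Println(planes)      // [[10 11] [20 21] [30 31]]
	fmt.Println(stN(planes)) // [10 20 30 11 21 31]
}
```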
2019-08-28 | cmd/internal/obj/arm64: add support for most system registers | fanzha02
This patch supports the EL0 and EL1 system registers used in MRS/MSR instructions. It refactors the assembler so that it reads system register information from the automatically generated sysRegEnc.go file, and moves the existing declared system registers into sysRegEnc.go.
This patch adds 431 system registers. It is worth noting that the number of special registers is initialized to less than 1024 in the list7.go file.
This CL also adds test cases for the newly added system registers. The test cases are contributed by Dianhong Xu <Dianhong.Xu@arm.com>
Change-Id: Ic09a937eaaeefe82bd08b5dd726808f8ff6cebf6
Reviewed-on: https://go-review.googlesource.com/c/go/+/189577
Reviewed-by: Ben Shi <powerman1st@163.com>
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
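Each system register is identified by five encoding fields (op0, op1, CRn, CRm, op2), which the MRS/MSR instruction carries in bits [19:5]. The sketch below packs those fields per the A64 encoding, written in Arm syntax (Go writes the operands in the opposite order, e.g. MRS <system register>, <general register>). The struct and functions are hypothetical, and the generated sysRegEnc.go may store the encoding in a different form.

```go
package main

import "fmt"

// sysRegFields identifies a system register by its encoding fields.
// op0 is 2 or 3 for registers accessible via MRS/MSR.
type sysRegFields struct{ op0, op1, crn, crm, op2 uint32 }

// pack places the fields at bits [19:5] of an MRS/MSR instruction; only the
// low bit of op0 (called o0) is stored, at bit 19.
func (r sysRegFields) pack() uint32 {
	return (r.op0-2)<<19 | r.op1<<16 | r.crn<<12 | r.crm<<8 | r.op2<<5
}

// mrs encodes MRS Xt, <reg>; msr encodes MSR <reg>, Xt (Arm operand order).
func mrs(r sysRegFields, rt uint32) uint32 { return 0xd5300000 | r.pack() | rt }
func msr(r sysRegFields, rt uint32) uint32 { return 0xd5100000 | r.pack() | rt }

func main() {
	tpidrEL0 := sysRegFields{op0: 3, op1: 3, crn: 13, crm: 0, op2: 2}
	fmt.Printf("MRS X0, TPIDR_EL0 = %#x\n", mrs(tpidrEL0, 0))
	fmt.Printf("MSR TPIDR_EL0, X0 = %#x\n", msr(tpidrEL0, 0))
}
```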
2019-06-26 | cmd/compile, runtime: use R20, R21 in ARM64's Duff's devices | Cherry Zhang
Currently we use R16 and R17 for ARM64's Duff's devices. According to the ARM64 ABI, R16 and R17 can be used by the (external) linker as scratch registers in trampolines, so these registers must not be used to pass information across functions.
It seems unlikely that calling a Duff's device would need a trampoline in normal cases, but it could happen if the call target is beyond the 128 MB direct jump limit.
The choice of R20 and R21 is somewhat arbitrary. The register allocator allocates from low-numbered registers, so high-numbered registers are chosen because they are unlikely to hold a live value that would force a spill.
Fixes #32773.
Change-Id: Id22d555b5afeadd4efcf62797d1580d641c39218
Reviewed-on: https://go-review.googlesource.com/c/go/+/183842
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2019-03-06 | cmd/asm: add arm64 v8.1 atomic instructions | erifan01
This change adds several arm64 v8.1 atomic instructions and test cases. They are LDADDAx, LDADDLx, LDANDAx, LDANDALx, LDANDLx, LDEORAx, LDEORALx, LDEORLx, LDORAx, LDORALx, LDORLx, SWPAx and SWPLx. Their form is consistent with the form of the existing atomic instructions.
For the instructions STXRx, STLXRx, STXPx and STLXPx, the second destination register can't be RSP. This CL also adds a check for this.
LDADDx Rs, (Rb), Rt: *Rb -> Rt, Rs + *Rb -> *Rb
LDANDx Rs, (Rb), Rt: *Rb -> Rt, Rs AND NOT(*Rb) -> *Rb
LDEORx Rs, (Rb), Rt: *Rb -> Rt, Rs EOR *Rb -> *Rb
LDORx Rs, (Rb), Rt: *Rb -> Rt, Rs OR *Rb -> *Rb
Change-Id: I9f9b0245958cb57ab7d88c66fb9159b23b9017fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/157001
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
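The "*Rb -> Rt, op -> *Rb" notation above describes a fetch-and-modify: the old memory value is returned and the combined value is stored atomically. The sketch below emulates LDADD, LDEOR and LDOR with Go's sync/atomic package (LDAND is left out). It only mirrors the value semantics: the hardware does this in one instruction, whereas the EOR/OR helpers here use a compare-and-swap loop, and the acquire/release ordering of the A/L/AL variants is not modeled. All helper names are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// ldadd emulates LDADD Rs, (Rb), Rt: it returns the old *Rb (the value
// that would land in Rt) and atomically stores Rs + *Rb.
func ldadd(p *uint64, rs uint64) uint64 {
	return atomic.AddUint64(p, rs) - rs
}

// rmw atomically replaces *p with op(*p) using a CAS loop and returns the old value.
func rmw(p *uint64, op func(old uint64) uint64) uint64 {
	for {
		old := atomic.LoadUint64(p)
		if atomic.CompareAndSwapUint64(p, old, op(old)) {
			return old
		}
	}
}

// ldeor emulates LDEOR: *Rb -> Rt, Rs EOR *Rb -> *Rb.
func ldeor(p *uint64, rs uint64) uint64 {
	return rmw(p, func(old uint64) uint64 { return rs ^ old })
}

// ldor emulates LDOR: *Rb -> Rt, Rs OR *Rb -> *Rb.
func ldor(p *uint64, rs uint64) uint64 {
	return rmw(p, func(old uint64) uint64 { return rs | old })
}

func main() {
	x := uint64(5)
	old := ldadd(&x, 3)
	fmt.Println(old, x) // 5 8
	old = ldeor(&x, 0xf)
	fmt.Println(old, x) // 8 7
	old = ldor(&x, 0x10)
	fmt.Println(old, x) // 7 23
}
```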
2018-11-07 | cmd/internal/obj/arm64: encode large constants into MOVZ/MOVN and MOVK instructions | fanzha02
The current assembler loads large constants from the constant pool. This CL gets rid of the pool by using MOVZ/MOVN and MOVK to load large constants.
This CL changes the assembler behavior as follows.
1. Go assembly:
(1) MOVD $0x1111222233334444, R1
(2) MOVD $0x1111ffff1111ffff, R1
Previous version: MOVD 0x9a4, R1 (loads the constant from the pool).
Optimized version:
(1) MOVD $0x4444, R1; MOVK $(0x3333<<16), R1; MOVK $(0x2222<<32), R1; MOVK $(0x1111<<48), R1.
(2) MOVN $(0xeeee<<16), R1; MOVK $(0x1111<<48), R1.
Add test cases; below are the binary size comparison and benchmark results.
1. Binary size before/after
binary size change
pkg/linux_arm64 +25.4KB
pkg/tool/linux_arm64 -2.9KB
go -2KB
gofmt no change
2. Compiler benchmark.
name old time/op new time/op delta
Template 574ms ±21% 577ms ±14% ~ (p=0.853 n=10+10)
Unicode 327ms ±29% 353ms ±23% ~ (p=0.360 n=10+8)
GoTypes 1.97s ± 8% 2.04s ±11% ~ (p=0.143 n=10+10)
Compiler 9.13s ± 9% 9.25s ± 8% ~ (p=0.684 n=10+10)
SSA 29.2s ± 5% 27.0s ± 4% -7.40% (p=0.000 n=10+10)
Flate 402ms ±40% 308ms ± 6% -23.29% (p=0.004 n=10+10)
GoParser 470ms ±26% 382ms ±10% -18.82% (p=0.000 n=9+10)
Reflect 1.36s ±16% 1.17s ± 7% -13.92% (p=0.001 n=9+10)
Tar 561ms ±19% 466ms ±15% -17.08% (p=0.000 n=9+10)
XML 745ms ±20% 679ms ±20% ~ (p=0.123 n=10+10)
StdCmd 35.5s ± 6% 37.2s ± 3% +4.81% (p=0.001 n=9+8)

name old user-time/op new user-time/op delta
Template 625ms ±14% 660ms ±18% ~ (p=0.343 n=10+10)
Unicode 355ms ±10% 373ms ±20% ~ (p=0.346 n=9+10)
GoTypes 2.39s ± 8% 2.37s ± 5% ~ (p=0.897 n=10+10)
Compiler 11.1s ± 4% 11.4s ± 2% +2.63% (p=0.010 n=10+9)
SSA 35.4s ± 3% 34.9s ± 2% ~ (p=0.113 n=10+9)
Flate 402ms ±13% 371ms ±30% ~ (p=0.089 n=10+9)
GoParser 513ms ± 8% 489ms ±24% -4.76% (p=0.039 n=9+9)
Reflect 1.52s ±12% 1.41s ± 5% -7.32% (p=0.001 n=9+10)
Tar 607ms ±10% 558ms ± 8% -7.96% (p=0.009 n=9+10)
XML 828ms ±10% 789ms ±12% ~ (p=0.059 n=10+10)

name old text-bytes new text-bytes delta
HelloSize 714kB ± 0% 712kB ± 0% -0.23% (p=0.000 n=10+10)
CmdGoSize 8.26MB ± 0% 8.25MB ± 0% -0.14% (p=0.000 n=10+10)

name old data-bytes new data-bytes delta
HelloSize 10.5kB ± 0% 10.5kB ± 0% ~ (all equal)
CmdGoSize 258kB ± 0% 258kB ± 0% ~ (all equal)

name old bss-bytes new bss-bytes delta
HelloSize 125kB ± 0% 125kB ± 0% ~ (all equal)
CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal)

name old exe-bytes new exe-bytes delta
HelloSize 1.18MB ± 0% 1.18MB ± 0% ~ (all equal)
CmdGoSize 11.2MB ± 0% 11.2MB ± 0% -0.13% (p=0.000 n=10+10)

3. go1 benchmark.
name old time/op new time/op delta
BinaryTree17 6.60s ±18% 7.36s ±22% ~ (p=0.222 n=5+5)
Fannkuch11 4.04s ± 0% 4.05s ± 0% ~ (p=0.421 n=5+5)
FmtFprintfEmpty 91.8ns ±14% 91.2ns ± 9% ~ (p=0.667 n=5+5)
FmtFprintfString 145ns ± 0% 151ns ± 6% ~ (p=0.397 n=4+5)
FmtFprintfInt 169ns ± 0% 176ns ± 5% +4.14% (p=0.016 n=4+5)
FmtFprintfIntInt 229ns ± 2% 243ns ± 6% ~ (p=0.143 n=5+5)
FmtFprintfPrefixedInt 343ns ± 0% 350ns ± 3% +1.92% (p=0.048 n=5+5)
FmtFprintfFloat 400ns ± 3% 394ns ± 3% ~ (p=0.063 n=5+5)
FmtManyArgs 1.04µs ± 0% 1.05µs ± 0% +1.62% (p=0.029 n=4+4)
GobDecode 13.9ms ± 4% 13.9ms ± 5% ~ (p=1.000 n=5+5)
GobEncode 10.6ms ± 4% 10.6ms ± 5% ~ (p=0.421 n=5+5)
Gzip 567ms ± 1% 563ms ± 4% ~ (p=0.548 n=5+5)
Gunzip 60.2ms ± 1% 60.4ms ± 0% ~ (p=0.056 n=5+5)
HTTPClientServer 114µs ± 4% 108µs ± 7% ~ (p=0.095 n=5+5)
JSONEncode 18.4ms ± 2% 17.8ms ± 2% -3.06% (p=0.016 n=5+5)
JSONDecode 105ms ± 1% 103ms ± 2% ~ (p=0.056 n=5+5)
Mandelbrot200 5.48ms ± 0% 5.49ms ± 0% ~ (p=0.841 n=5+5)
GoParse 6.05ms ± 1% 6.05ms ± 2% ~ (p=1.000 n=5+5)
RegexpMatchEasy0_32 143ns ± 1% 146ns ± 4% +2.10% (p=0.048 n=4+5)
RegexpMatchEasy0_1K 499ns ± 1% 492ns ± 2% ~ (p=0.079 n=5+5)
RegexpMatchEasy1_32 137ns ± 0% 136ns ± 1% -0.73% (p=0.016 n=4+5)
RegexpMatchEasy1_1K 826ns ± 4% 823ns ± 2% ~ (p=0.841 n=5+5)
RegexpMatchMedium_32 224ns ± 5% 233ns ± 8% ~ (p=0.119 n=5+5)
RegexpMatchMedium_1K 59.6µs ± 0% 59.3µs ± 1% -0.66% (p=0.016 n=4+5)
RegexpMatchHard_32 3.29µs ± 3% 3.26µs ± 1% ~ (p=0.889 n=5+5)
RegexpMatchHard_1K 98.8µs ± 2% 99.0µs ± 0% ~ (p=0.690 n=5+5)
Revcomp 1.02s ± 1% 1.01s ± 1% ~ (p=0.095 n=5+5)
Template 135ms ± 5% 131ms ± 1% ~ (p=0.151 n=5+5)
TimeParse 591ns ± 0% 593ns ± 0% +0.20% (p=0.048 n=5+5)
TimeFormat 655ns ± 2% 607ns ± 0% -7.42% (p=0.016 n=5+4)
[Geo mean] 93.5µs 93.8µs +0.23%

name old speed new speed delta
GobDecode 55.1MB/s ± 4% 55.1MB/s ± 4% ~ (p=1.000 n=5+5)
GobEncode 72.4MB/s ± 4% 72.3MB/s ± 5% ~ (p=0.421 n=5+5)
Gzip 34.2MB/s ± 1% 34.5MB/s ± 4% ~ (p=0.548 n=5+5)
Gunzip 322MB/s ± 1% 321MB/s ± 0% ~ (p=0.056 n=5+5)
JSONEncode 106MB/s ± 2% 109MB/s ± 2% +3.16% (p=0.016 n=5+5)
JSONDecode 18.5MB/s ± 1% 18.8MB/s ± 2% ~ (p=0.056 n=5+5)
GoParse 9.57MB/s ± 1% 9.57MB/s ± 2% ~ (p=0.952 n=5+5)
RegexpMatchEasy0_32 223MB/s ± 1% 221MB/s ± 0% -1.10% (p=0.029 n=4+4)
RegexpMatchEasy0_1K 2.05GB/s ± 1% 2.08GB/s ± 2% ~ (p=0.095 n=5+5)
RegexpMatchEasy1_32 232MB/s ± 0% 234MB/s ± 1% +0.76% (p=0.016 n=4+5)
RegexpMatchEasy1_1K 1.24GB/s ± 4% 1.24GB/s ± 2% ~ (p=0.841 n=5+5)
RegexpMatchMedium_32 4.45MB/s ± 5% 4.20MB/s ± 1% -5.63% (p=0.000 n=5+4)
RegexpMatchMedium_1K 17.2MB/s ± 0% 17.3MB/s ± 1% +0.66% (p=0.016 n=4+5)
RegexpMatchHard_32 9.73MB/s ± 3% 9.83MB/s ± 1% ~ (p=0.889 n=5+5)
RegexpMatchHard_1K 10.4MB/s ± 2% 10.3MB/s ± 0% ~ (p=0.635 n=5+5)
Revcomp 249MB/s ± 1% 252MB/s ± 1% ~ (p=0.095 n=5+5)
Template 14.4MB/s ± 4% 14.8MB/s ± 1% ~ (p=0.151 n=5+5)
[Geo mean] 62.1MB/s 62.3MB/s +0.34%

Fixes #10108
Change-Id: I79038f3c4c2ff874c136053d1a2b1c8a5a9cfac5
Reviewed-on: https://go-review.googlesource.com/c/118796
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
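A sketch of how the MOVZ/MOVN plus MOVK sequences in the examples above can be chosen: if more 16-bit halfwords are 0xffff than 0x0000, start from MOVN (all ones) and patch the rest with MOVK; otherwise start from MOVZ (all zeros). This heuristic reproduces both sequences in the commit message, but `movConst` and `imm` are hypothetical helpers; the real assembler also considers other forms (for example bitmask immediates), and the sketch writes the first instruction as MOVZ where the message shows MOVD.

```go
package main

import (
	"fmt"
	"strings"
)

// imm formats a 16-bit chunk and its shift in the notation used above.
func imm(h uint64, shift int) string {
	if shift == 0 {
		return fmt.Sprintf("$%#x", h)
	}
	return fmt.Sprintf("$(%#x<<%d)", h, shift)
}

// movConst sketches the MOVZ/MOVN + MOVK choice for a 64-bit constant.
func movConst(v uint64, rd string) []string {
	zeros, ones := 0, 0
	for s := 0; s < 64; s += 16 {
		switch (v >> s) & 0xffff {
		case 0:
			zeros++
		case 0xffff:
			ones++
		}
	}
	useMOVN := ones > zeros
	skip := uint64(0) // halfword value already produced by the first instruction
	if useMOVN {
		skip = 0xffff
	}
	var out []string
	for s := 0; s < 64; s += 16 {
		h := (v >> s) & 0xffff
		switch {
		case h == skip:
			continue
		case len(out) > 0:
			out = append(out, "MOVK "+imm(h, s)+", "+rd)
		case useMOVN:
			out = append(out, "MOVN "+imm(^h&0xffff, s)+", "+rd)
		default:
			out = append(out, "MOVZ "+imm(h, s)+", "+rd)
		}
	}
	if len(out) == 0 { // v is 0 or all ones
		op := "MOVZ"
		if useMOVN {
			op = "MOVN"
		}
		out = append(out, op+" $0x0, "+rd)
	}
	return out
}

func main() {
	for _, v := range []uint64{0x1111222233334444, 0x1111ffff1111ffff} {
		fmt.Println(strings.Join(movConst(v, "R1"), "; "))
	}
}
```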
2018-10-22 | cmd/internal/obj/arm64: reclassify 32-bit/64-bit constants | fanzha02
The current assembler saves constants in Offset, whose type is int64, which causes 32-bit constants to get an incorrect class. This CL reclassifies constants when the opcode is a 32-bit variant, such as MOVW, ANDW and ADDW. In addition, this CL encodes some constants of the ADDCON class as MOV instructions.
This CL changes the assembler behavior as follows.
1. Go assembly: ADDW $MOVCON, Rn, Rd
Previous version: MOVD $MOVCON, Rtmp; ADDW Rtmp, Rn, Rd
Current version: MOVW $MOVCON, Rtmp; ADDW Rtmp, Rn, Rd
2. Go assembly: MOVW $0xaaaaffff, R1
Previous version: treats $0xaaaaffff as VCON and encodes it as MOVW 0x994, R1 (loads it from the pool).
Current version: treats $0xaaaaffff as MOVCON and encodes it into MOVW instructions.
3. Go assembly: MOVD $0x210000, R1
Previous version: treats $0x210000 as ADDCON and loads it from the pool.
Current version: treats $0x210000 as MOVCON and encodes it into MOVD instructions.
Add the test cases.
1. Binary size before/after.
binary size change
pkg/linux_arm64 -1.534KB
pkg/tool/linux_arm64 -0.718KB
go -0.32KB
gofmt no change
2. go1 benchmark result.
name old time/op new time/op delta
BinaryTree17-8 6.26s ± 1% 6.28s ± 1% ~ (p=0.105 n=10+10)
Fannkuch11-8 5.40s ± 0% 5.39s ± 0% -0.29% (p=0.028 n=9+10)
FmtFprintfEmpty-8 94.5ns ± 0% 95.0ns ± 0% +0.51% (p=0.000 n=10+9)
FmtFprintfString-8 163ns ± 1% 159ns ± 1% -2.06% (p=0.000 n=10+9)
FmtFprintfInt-8 200ns ± 1% 196ns ± 1% -1.99% (p=0.000 n=9+10)
FmtFprintfIntInt-8 292ns ± 3% 284ns ± 1% -2.87% (p=0.001 n=10+9)
FmtFprintfPrefixedInt-8 422ns ± 1% 420ns ± 1% -0.59% (p=0.015 n=10+10)
FmtFprintfFloat-8 458ns ± 0% 463ns ± 1% +1.19% (p=0.000 n=9+10)
FmtManyArgs-8 1.37µs ± 1% 1.35µs ± 1% -1.85% (p=0.000 n=10+10)
GobDecode-8 15.5ms ± 1% 15.3ms ± 1% -1.82% (p=0.000 n=10+10)
GobEncode-8 11.7ms ± 5% 11.7ms ± 2% ~ (p=0.549 n=10+9)
Gzip-8 622ms ± 0% 624ms ± 0% +0.23% (p=0.000 n=10+9)
Gunzip-8 73.6ms ± 0% 73.8ms ± 1% ~ (p=0.077 n=9+9)
HTTPClientServer-8 115µs ± 1% 115µs ± 1% ~ (p=0.796 n=10+10)
JSONEncode-8 31.1ms ± 2% 28.7ms ± 1% -7.98% (p=0.000 n=10+9)
JSONDecode-8 145ms ± 0% 145ms ± 1% ~ (p=0.447 n=9+10)
Mandelbrot200-8 9.67ms ± 0% 9.60ms ± 0% -0.76% (p=0.000 n=9+9)
GoParse-8 7.56ms ± 1% 7.58ms ± 0% +0.21% (p=0.035 n=10+9)
RegexpMatchEasy0_32-8 208ns ±10% 222ns ± 0% ~ (p=0.531 n=10+6)
RegexpMatchEasy0_1K-8 699ns ± 4% 694ns ± 4% ~ (p=0.868 n=10+10)
RegexpMatchEasy1_32-8 186ns ± 8% 190ns ±12% ~ (p=0.955 n=10+10)
RegexpMatchEasy1_1K-8 1.13µs ± 1% 1.05µs ± 2% -6.64% (p=0.000 n=10+10)
RegexpMatchMedium_32-8 316ns ± 7% 288ns ± 1% -8.68% (p=0.000 n=10+7)
RegexpMatchMedium_1K-8 90.2µs ± 0% 85.5µs ± 2% -5.19% (p=0.000 n=10+10)
RegexpMatchHard_32-8 5.53µs ± 0% 3.90µs ± 0% -29.52% (p=0.000 n=10+10)
RegexpMatchHard_1K-8 119µs ± 0% 124µs ± 0% +4.29% (p=0.000 n=9+10)
Revcomp-8 1.07s ± 0% 1.07s ± 0% ~ (p=0.094 n=9+9)
Template-8 162ms ± 1% 160ms ± 2% ~ (p=0.089 n=10+10)
TimeParse-8 756ns ± 2% 763ns ± 1% ~ (p=0.158 n=10+10)
TimeFormat-8 758ns ± 1% 746ns ± 1% -1.52% (p=0.000 n=10+10)

name old speed new speed delta
GobDecode-8 49.4MB/s ± 1% 50.3MB/s ± 1% +1.84% (p=0.000 n=10+10)
GobEncode-8 65.6MB/s ± 5% 65.4MB/s ± 2% ~ (p=0.549 n=10+9)
Gzip-8 31.2MB/s ± 0% 31.1MB/s ± 0% -0.24% (p=0.000 n=9+9)
Gunzip-8 264MB/s ± 0% 263MB/s ± 1% ~ (p=0.073 n=9+9)
JSONEncode-8 62.3MB/s ± 2% 67.7MB/s ± 1% +8.67% (p=0.000 n=10+9)
JSONDecode-8 13.4MB/s ± 0% 13.4MB/s ± 1% ~ (p=0.508 n=9+10)
GoParse-8 7.66MB/s ± 1% 7.64MB/s ± 0% -0.23% (p=0.049 n=10+9)
RegexpMatchEasy0_32-8 154MB/s ± 9% 143MB/s ± 3% ~ (p=0.303 n=10+7)
RegexpMatchEasy0_1K-8 1.46GB/s ± 4% 1.47GB/s ± 4% ~ (p=0.912 n=10+10)
RegexpMatchEasy1_32-8 172MB/s ± 9% 170MB/s ±12% ~ (p=0.971 n=10+10)
RegexpMatchEasy1_1K-8 908MB/s ± 1% 972MB/s ± 2% +7.12% (p=0.000 n=10+10)
RegexpMatchMedium_32-8 3.17MB/s ± 7% 3.46MB/s ± 1% +9.14% (p=0.000 n=10+7)
RegexpMatchMedium_1K-8 11.3MB/s ± 0% 12.0MB/s ± 2% +5.51% (p=0.000 n=10+10)
RegexpMatchHard_32-8 5.78MB/s ± 0% 8.21MB/s ± 0% +41.93% (p=0.000 n=9+10)
RegexpMatchHard_1K-8 8.62MB/s ± 0% 8.27MB/s ± 0% -4.11% (p=0.000 n=9+10)
Revcomp-8 237MB/s ± 0% 237MB/s ± 0% ~ (p=0.081 n=9+9)
Template-8 12.0MB/s ± 1% 12.1MB/s ± 2% ~ (p=0.072 n=10+10)

Change-Id: I080801f520366b42d5f9699954bd33106976a81b
Reviewed-on: https://go-review.googlesource.com/c/120661
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
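Why the operand width changes the class can be shown with a simplified MOVCON test: a value qualifies if a single MOVZ (at most one non-zero halfword) or a single MOVN (at most one halfword that is not 0xffff) produces it at the instruction's width. `nonzeroHalfwords` and `isMOVCON` are hypothetical helpers; the assembler's real classifier distinguishes more classes than this sketch.

```go
package main

import "fmt"

// nonzeroHalfwords counts the non-zero 16-bit chunks in the low width bits of v.
func nonzeroHalfwords(v uint64, width int) int {
	n := 0
	for s := 0; s < width; s += 16 {
		if (v>>s)&0xffff != 0 {
			n++
		}
	}
	return n
}

// isMOVCON reports whether v, interpreted at the given operand width (32 or
// 64), can be built by one MOVZ (at most one non-zero halfword) or one MOVN
// (at most one halfword that is not 0xffff).
func isMOVCON(v uint64, width int) bool {
	inv := ^v
	if width == 32 {
		// A 32-bit instruction only sees the low 32 bits of the int64 Offset.
		v &= 0xffffffff
		inv &= 0xffffffff
	}
	return nonzeroHalfwords(v, width) <= 1 || nonzeroHalfwords(inv, width) <= 1
}

func main() {
	fmt.Println(isMOVCON(0xaaaaffff, 32)) // true: MOVN of 0x5555<<16
	fmt.Println(isMOVCON(0xaaaaffff, 64)) // false as a 64-bit value
	fmt.Println(isMOVCON(0x210000, 64))   // true: MOVZ of 0x21<<16
}
```

Read as a 64-bit value, 0xaaaaffff has two non-zero halfwords and its complement has three, so it needs more than one instruction. Masked to 32 bits, its complement is 0x55550000, so a single 32-bit MOVN suffices.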
2018-10-04 | cmd/internal/obj/arm64: simplify ADD and SUB | Ben Shi
Currently "ADD $0x123456, Rs, Rd" loads the pre-stored constant 0x123456 from the constant pool and uses it for the addition, costing 12 bytes in total. SUB behaves the same way. This CL breaks it into "ADD 0x123000, Rs, Rd" + "ADD 0x000456, Rd, Rd". Both 0x123000 and 0x000456 can be encoded directly in the instruction, so 4 bytes are saved.
1. The total size of pkg/android_arm64 decreases by about 0.3KB.
2. The go1 benchmarks show little regression (excluding noise).
name old time/op new time/op delta
BinaryTree17-4 15.9s ± 0% 15.9s ± 1% +0.10% (p=0.044 n=29+29)
Fannkuch11-4 8.72s ± 0% 8.75s ± 0% +0.34% (p=0.000 n=30+24)
FmtFprintfEmpty-4 173ns ± 0% 173ns ± 0% ~ (all equal)
FmtFprintfString-4 368ns ± 0% 368ns ± 0% ~ (p=0.593 n=30+30)
FmtFprintfInt-4 417ns ± 0% 417ns ± 0% ~ (all equal)
FmtFprintfIntInt-4 673ns ± 0% 661ns ± 1% -1.70% (p=0.000 n=30+30)
FmtFprintfPrefixedInt-4 805ns ± 0% 805ns ± 0% +0.10% (p=0.011 n=30+30)
FmtFprintfFloat-4 1.09µs ± 0% 1.09µs ± 0% ~ (p=0.125 n=30+29)
FmtManyArgs-4 2.68µs ± 0% 2.68µs ± 0% +0.07% (p=0.004 n=30+30)
GobDecode-4 32.9ms ± 0% 33.2ms ± 1% +1.07% (p=0.000 n=29+29)
GobEncode-4 29.5ms ± 0% 29.6ms ± 0% +0.26% (p=0.000 n=28+28)
Gzip-4 1.38s ± 1% 1.35s ± 3% -1.94% (p=0.000 n=28+30)
Gunzip-4 139ms ± 0% 139ms ± 0% +0.10% (p=0.000 n=28+29)
HTTPClientServer-4 745µs ± 5% 742µs ± 3% ~ (p=0.405 n=28+29)
JSONEncode-4 49.5ms ± 1% 49.9ms ± 0% +0.89% (p=0.000 n=30+30)
JSONDecode-4 264ms ± 1% 264ms ± 0% +0.25% (p=0.001 n=30+30)
Mandelbrot200-4 16.6ms ± 0% 16.6ms ± 0% ~ (p=0.507 n=29+29)
GoParse-4 15.9ms ± 0% 16.0ms ± 1% +0.91% (p=0.002 n=23+30)
RegexpMatchEasy0_32-4 379ns ± 0% 379ns ± 0% ~ (all equal)
RegexpMatchEasy0_1K-4 1.31µs ± 0% 1.31µs ± 0% +0.09% (p=0.008 n=27+30)
RegexpMatchEasy1_32-4 357ns ± 0% 358ns ± 0% +0.28% (p=0.000 n=28+29)
RegexpMatchEasy1_1K-4 2.04µs ± 0% 2.04µs ± 0% ~ (p=0.850 n=30+30)
RegexpMatchMedium_32-4 587ns ± 0% 589ns ± 0% +0.33% (p=0.000 n=30+30)
RegexpMatchMedium_1K-4 162µs ± 0% 163µs ± 0% ~ (p=0.351 n=30+29)
RegexpMatchHard_32-4 9.54µs ± 0% 9.60µs ± 0% +0.59% (p=0.000 n=28+30)
RegexpMatchHard_1K-4 287µs ± 0% 287µs ± 0% +0.11% (p=0.000 n=26+29)
Revcomp-4 2.50s ± 0% 2.50s ± 0% -0.13% (p=0.012 n=28+27)
Template-4 312ms ± 1% 312ms ± 1% +0.20% (p=0.015 n=27+30)
TimeParse-4 1.68µs ± 0% 1.68µs ± 0% -0.35% (p=0.000 n=30+30)
TimeFormat-4 1.66µs ± 0% 1.64µs ± 0% -1.20% (p=0.000 n=25+29)
[Geo mean] 246µs 246µs -0.00%

name old speed new speed delta
GobDecode-4 23.3MB/s ± 0% 23.1MB/s ± 1% -1.05% (p=0.000 n=29+29)
GobEncode-4 26.0MB/s ± 0% 25.9MB/s ± 0% -0.25% (p=0.000 n=29+28)
Gzip-4 14.1MB/s ± 1% 14.4MB/s ± 3% +1.94% (p=0.000 n=27+30)
Gunzip-4 139MB/s ± 0% 139MB/s ± 0% -0.10% (p=0.000 n=28+29)
JSONEncode-4 39.2MB/s ± 1% 38.9MB/s ± 0% -0.88% (p=0.000 n=30+30)
JSONDecode-4 7.37MB/s ± 0% 7.35MB/s ± 0% -0.26% (p=0.001 n=30+30)
GoParse-4 3.65MB/s ± 0% 3.62MB/s ± 1% -0.86% (p=0.001 n=23+30)
RegexpMatchEasy0_32-4 84.3MB/s ± 0% 84.3MB/s ± 0% ~ (p=0.126 n=27+26)
RegexpMatchEasy0_1K-4 784MB/s ± 0% 783MB/s ± 0% -0.10% (p=0.003 n=27+30)
RegexpMatchEasy1_32-4 89.5MB/s ± 0% 89.3MB/s ± 0% -0.20% (p=0.000 n=27+29)
RegexpMatchEasy1_1K-4 502MB/s ± 0% 502MB/s ± 0% ~ (p=0.858 n=30+28)
RegexpMatchMedium_32-4 1.70MB/s ± 0% 1.70MB/s ± 0% -0.25% (p=0.000 n=30+30)
RegexpMatchMedium_1K-4 6.30MB/s ± 0% 6.30MB/s ± 0% ~ (all equal)
RegexpMatchHard_32-4 3.35MB/s ± 0% 3.33MB/s ± 0% -0.47% (p=0.000 n=30+30)
RegexpMatchHard_1K-4 3.57MB/s ± 0% 3.56MB/s ± 0% -0.20% (p=0.000 n=27+30)
Revcomp-4 102MB/s ± 0% 102MB/s ± 0% +0.14% (p=0.008 n=28+28)
Template-4 6.23MB/s ± 0% 6.21MB/s ± 1% -0.21% (p=0.009 n=21+30)
[Geo mean] 24.1MB/s 24.0MB/s -0.16%

Change-Id: Ifcef3edb667540e2d86e586c23afcfbc2cf1340b
Reviewed-on: https://go-review.googlesource.com/c/134536
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-04 | cmd/internal/obj/arm64: support more atomic instructions | Ben Shi
LDADDALD (64-bit) and LDADDALW (32-bit) are already supported. This CL adds support for LDADDALH (16-bit) and LDADDALB (8-bit).
Change-Id: I4eac61adcec226d618dfce88618a2b98f5f1afe7
Reviewed-on: https://go-review.googlesource.com/132135
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>