| Age | Commit message (Collapse) | Author |
|
Uncomment the test cases in arm64enc.s because they can be handled
by current assembler. In addition, CL supplements more test cases.
Change-Id: I583d45793b8227c6ec370868652dd8bcbfaa1ecf
Reviewed-on: https://go-review.googlesource.com/109275
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This change provides VZIP1, VZIP2, VTBL instruction for supporting
ChaCha20Poly1305 implementation later.
Change-Id: Ife7c87b8ab1a6495a444478eeb9d906ae4c5ffa9
Reviewed-on: https://go-review.googlesource.com/110015
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Add a compiler intrinsic for getcallerpc on arm64 for better code generation.
Change-Id: I897e670a2b8ffa1a8c2fdc638f5b2c44bda26318
Reviewed-on: https://go-review.googlesource.com/109276
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This instruction was introduced on the z14 to accelerate "limbified"
multiplications for certain cryptographic algorithms. This change allows
it to be used in Go assembly.
Change-Id: Ic93dae7fec1756f662874c08a5abc435bce9dd9e
Reviewed-on: https://go-review.googlesource.com/109695
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Current optab entries are unordered, because the new instructions
are added at the end of the optab. The patch reorders them by comments
in optab, such as arithmetic operations, logical operations and a
series of load/store etc.
The patch removes the VMOVS opcode because FMOVS already has the same
operation.
Change-Id: Iccdf89ecbb3875b9dfcb6e06be2cc19c7e5581a2
Reviewed-on: https://go-review.googlesource.com/109896
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This change provides VSRI instruction for ChaCha20Poly1305 implementation.
Change-Id: Ifee727b6f3982b629b44a67cac5bbe87ca59027b
Reviewed-on: https://go-review.googlesource.com/109342
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Use CMN/TST to simplify comparisons. This can reduce the
register pressure by removing single def/use registers for example:
ADDW R0, R1, R8 -> CMNW R1, R0 ; CMN is an alias of ADDS.
CBZW R8, label -> BEQ label ; single def/use of R8 removed.
Little change in performance of go1 benchmark on Amberwing:
name old time/op new time/op delta
RegexpMatchEasy0_32 247ns ± 0% 246ns ± 0% -0.40% (p=0.008 n=5+5)
RegexpMatchEasy0_1K 581ns ± 0% 580ns ± 0% ~ (p=0.079 n=4+5)
RegexpMatchEasy1_32 244ns ± 0% 243ns ± 0% -0.41% (p=0.008 n=5+5)
RegexpMatchEasy1_1K 804ns ± 0% 806ns ± 0% +0.25% (p=0.016 n=5+4)
RegexpMatchMedium_32 313ns ± 0% 311ns ± 0% -0.64% (p=0.008 n=5+5)
RegexpMatchMedium_1K 52.2µs ± 0% 51.9µs ± 0% -0.51% (p=0.008 n=5+5)
RegexpMatchHard_32 2.76µs ± 3% 2.74µs ± 0% ~ (p=0.683 n=5+5)
RegexpMatchHard_1K 78.8µs ± 0% 78.9µs ± 0% +0.04% (p=0.008 n=5+5)
FmtFprintfEmpty 58.6ns ± 0% 57.7ns ± 0% -1.54% (p=0.008 n=5+5)
FmtFprintfString 118ns ± 0% 115ns ± 0% -2.54% (p=0.008 n=5+5)
FmtFprintfInt 119ns ± 0% 119ns ± 0% ~ (all equal)
FmtFprintfIntInt 192ns ± 0% 192ns ± 0% ~ (all equal)
FmtFprintfPrefixedInt 224ns ± 0% 205ns ± 0% -8.48% (p=0.008 n=5+5)
FmtFprintfFloat 336ns ± 0% 333ns ± 1% ~ (p=0.683 n=5+5)
FmtManyArgs 779ns ± 1% 760ns ± 1% -2.41% (p=0.008 n=5+5)
Gzip 437ms ± 0% 436ms ± 0% -0.27% (p=0.008 n=5+5)
HTTPClientServer 90.1µs ± 1% 91.1µs ± 0% +1.19% (p=0.008 n=5+5)
JSONEncode 20.1ms ± 0% 20.2ms ± 1% ~ (p=0.690 n=5+5)
JSONDecode 94.5ms ± 1% 94.1ms ± 1% ~ (p=0.095 n=5+5)
Mandelbrot200 5.37ms ± 0% 5.37ms ± 0% ~ (p=0.421 n=5+5)
TimeParse 450ns ± 0% 446ns ± 0% -0.89% (p=0.000 n=5+4)
TimeFormat 483ns ± 1% 473ns ± 0% -2.19% (p=0.008 n=5+5)
Template 90.6ms ± 0% 89.7ms ± 0% -0.93% (p=0.008 n=5+5)
GoParse 5.97ms ± 0% 6.01ms ± 0% +0.65% (p=0.008 n=5+5)
BinaryTree17 11.8s ± 0% 11.7s ± 0% -0.28% (p=0.016 n=5+5)
Revcomp 669ms ± 0% 669ms ± 0% ~ (p=0.222 n=5+5)
Fannkuch11 3.28s ± 0% 3.34s ± 0% +1.72% (p=0.016 n=4+5)
[Geo mean] 46.6µs 46.3µs -0.74%
name old speed new speed delta
RegexpMatchEasy0_32 129MB/s ± 0% 130MB/s ± 0% +0.32% (p=0.016 n=5+4)
RegexpMatchEasy0_1K 1.76GB/s ± 0% 1.76GB/s ± 0% +0.13% (p=0.016 n=4+5)
RegexpMatchEasy1_32 131MB/s ± 0% 132MB/s ± 0% +0.32% (p=0.008 n=5+5)
RegexpMatchEasy1_1K 1.27GB/s ± 0% 1.27GB/s ± 0% -0.24% (p=0.016 n=5+4)
RegexpMatchMedium_32 3.19MB/s ± 0% 3.21MB/s ± 0% +0.63% (p=0.008 n=5+5)
RegexpMatchMedium_1K 19.6MB/s ± 0% 19.7MB/s ± 0% +0.51% (p=0.029 n=4+4)
RegexpMatchHard_32 11.6MB/s ± 2% 11.7MB/s ± 0% ~ (p=1.000 n=5+5)
RegexpMatchHard_1K 13.0MB/s ± 0% 13.0MB/s ± 0% ~ (p=0.079 n=4+5)
Gzip 44.4MB/s ± 0% 44.5MB/s ± 0% +0.27% (p=0.008 n=5+5)
JSONEncode 96.4MB/s ± 0% 96.2MB/s ± 1% ~ (p=0.579 n=5+5)
JSONDecode 20.5MB/s ± 1% 20.6MB/s ± 1% ~ (p=0.111 n=5+5)
Template 21.4MB/s ± 0% 21.6MB/s ± 0% +0.94% (p=0.008 n=5+5)
GoParse 9.70MB/s ± 0% 9.63MB/s ± 0% -0.68% (p=0.016 n=4+5)
Revcomp 380MB/s ± 0% 380MB/s ± 0% ~ (p=0.222 n=5+5)
[Geo mean] 55.3MB/s 55.4MB/s +0.23%
Change-Id: I2e5338138991d9bc984e67b51212aa5d1b0f2a6b
Reviewed-on: https://go-review.googlesource.com/97335
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
|
|
Memory arguments for debug/control register moves are a
minefield for programmer: not useful, but can lead to errors.
See referenced issue for detailed explanation.
Fixes #24981
Change-Id: I918e81cd4a8b1dfcfc9023cdfc3de45abe29e749
Reviewed-on: https://go-review.googlesource.com/107075
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Reject to compile I386/AMD64 asm code that contains
(Register)(PseudoReg*scale) forms of memory operands.
Example of such program: "CALL (AX)(PC*2)".
PseudoReg is one of the PC, FP, SB (but not SP).
When pseudo-register is used in register indirect as
scaled index base, x86 backend will panic because
its register file misses SB/FP/PC registers.
Fixes #12657.
Change-Id: I30fca797b537cbc86ab47583ae96c6a0c59acaa1
Reviewed-on: https://go-review.googlesource.com/107835
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
All test cases are commented-out until real implementation is merged.
These files are required to make x86avxgen work: it expects that
for each new instruction it enables end2end test cases are available.
This test suite is automatically generated.
Additional AVX512 tests will be added later.
Change-Id: I5f5cb6b90540834585ee5ad4c00ebfbb6efa8094
Reviewed-on: https://go-review.googlesource.com/107217
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
This change provides VREV64 instruction for AES-GCM implementation.
Change-Id: Icdf278862b03556388586f459964b025c47b8c19
Reviewed-on: https://go-review.googlesource.com/107696
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
the backend
The current code encodes the register and the shift/extension into a.Offset
field and this is done in the frontend. The CL refactors it to have the
frontend record the register/shift/extension information in a.Reg or a.Index
and leave the encoding stuff for the backend.
Change-Id: I600f456aec95377b7b79cd58e94afcb30aca5d19
Reviewed-on: https://go-review.googlesource.com/106815
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This change adds vector multiply instructions to the assembler for
ppc64x.
Change-Id: I5143a2dc3736951344d43999066d38ab8be4a721
Reviewed-on: https://go-review.googlesource.com/107795
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
CL 106735 changed to the new softfloat support on GOARM=5.
ARM assembly code that uses FP instructions not guarded on GOARM,
if any, will break. The easiest way to fix is probably to use Go
implementation on GOARM=5, like
MOVB runtime·goarm(SB), R11
CMP $5, R11
BEQ arm5
... FP instructions ...
RET
arm5:
CALL or JMP to Go implementation
Change-Id: I52fc76fac9c854ebe7c6c856c365fba35d3f560a
Reviewed-on: https://go-review.googlesource.com/107475
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Say "offset too large" instead of "invalid instruction" when
assembling for AMD64. GOARCH=386 already reports error correctly.
Fixed #24871
Change-Id: Iab029307b5c5edbb45f9df4b64c60ecb5f101349
Reviewed-on: https://go-review.googlesource.com/107116
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
LDP/STP/LDPW/STPW
The current assembler will report error when the negative offset is in
the range of [-256, 0) and is not the multiples of 4/8.
The fix introduces C_NSAUTO_8, C_NSAUTO_4 and C_NAUTO4K. C_NPAUTO
includes C_NSAUTO_8 instead of C_NSAUTO, C_NAUTO4K includes C_NSAUTO_8,
C_NSAUTO_4 and C_NSAUTO. So that assembler will encode the negative offset
that is greater than -4095 and is not the multiples of 4/8 as two instructions.
Add the test cases.
Fixed #24471
Change-Id: I42f34e3b8a9fc52c9e8b41504294271aafade639
Reviewed-on: https://go-review.googlesource.com/102635
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
SWPD/SWPW/SWPH/SWPB were introduced in ARMv8.1. They swap content
of register and memory atomically. And their difference is
SWPD: 64-bit double word data
SWPW: 32-bit word data (zero extended to 64-bit)
SWPH: 16-bit half word data (zero extended to 64-bit)
SWPB: 8-bit byte data (zero extended to 64-bit)
This CL implements them in the arm64 assembler.
Change-Id: I2d9fb2310674bd92693531210e187143e7eed602
Reviewed-on: https://go-review.googlesource.com/101516
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
offset instrucitons
The patch adds support for arm64 instructions LDRB, LDRH, LDRSB,
LDRSH, LDRSW, STR, STRB and STRH with register offset.
Test cases are also added.
Change-Id: I8d17fddd2963c0bc366e12b00bac49b93f3f0957
Reviewed-on: https://go-review.googlesource.com/91575
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
The patch adds support for LDR(register offset) instruction.
And add the test cases and negative tests.
Change-Id: I5b32c6a5065afc4571116d4896f7ebec3c0416d3
Reviewed-on: https://go-review.googlesource.com/87955
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
ParseARM64Suffix is not used outside cmd/asm/internal/arch.
Change-Id: I8e7782dce11cf8cd2fd08dd17e555ced8d87ba24
Reviewed-on: https://go-review.googlesource.com/105115
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
* Remove some redundant returns
* Replace HasPrefix with TrimPrefix
* Remove some obviously dead code
Passes toolstash -cmp on std cmd.
Change-Id: Ifb0d70a45cbb8a8553758a8c4878598b7fe932bc
Reviewed-on: https://go-review.googlesource.com/105017
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Floating point test instructions allow special cases (NaN, ±∞ and
a few other useful properties) to be checked directly.
This CL adds the following instructions to the assembler:
* LTEBR - load and test (float32)
* LTDBR - load and test (float64)
* TCEB - test data class (float32)
* TCDB - test data class (float64)
Note that I have only added immediate versions of the 'test data
class' instructions for now as that's the only case I think the
compiler will use.
Change-Id: I3398aab2b3a758bf909bd158042234030c8af582
Reviewed-on: https://go-review.googlesource.com/104457
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This change adds VLD1, VST1, VPMULL{2}, VEXT, VRBIT, VUSHR and VSHL instructions
for supporting AES-GCM implementation later.
Fixes #24400
Change-Id: I556feb88067f195cbe25629ec2b7a817acc58709
Reviewed-on: https://go-review.googlesource.com/101095
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Change-Id: I9d2a4b8df324897e264d30801e95ddc0f0e75f3a
Reviewed-on: https://go-review.googlesource.com/102337
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
|
|
Change-Id: Ib67a61d5b37af210ff15d60d72bd5238b9c2d0ca
Reviewed-on: https://go-review.googlesource.com/94815
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
1. STPW stores the lower 32-bit words of a pair of registers to memory.
2. LDPW loads two 32-bit words from memory, zero extends them to 64-bit,
and then copies to a pair of registers.
3. LDPSW does the same as LDPW, except a sign extension.
This CL implements those 3 instructions and adds test cases.
Change-Id: Ied9834d8240240d23ce00e086b4ea456e1611f1a
Reviewed-on: https://go-review.googlesource.com/99956
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
For each replacement, test case is added to new 386enc.s file
with exception of EMMS, SYSENTER, MFENCE and LFENCE as they
are already covered in amd64enc.s (same on amd64 and 386).
The replacement became less obvious after go vet suggested changes
Before:
BYTE $0x0f; BYTE $0x7f; BYTE $0x44; BYTE $0x24; BYTE $0x08
Changed to MOVQ (this form is being tested):
MOVQ M0, 8(SP)
Refactored to FP-relative access (go vet advice):
MOVQ M0, val+4(FP)
Change-Id: I56b87cf3371b6ad81ad0cd9db2033aee407b5818
Reviewed-on: https://go-review.googlesource.com/101475
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
|
|
on ARM64
This change fixes index error when encoding VMOV instruction which pattern
is vmov Vn.<T>[index], Vd.<T>[index]
Change-Id: I949166e6dfd63fb0a9365f183b6c50d452614f9d
Reviewed-on: https://go-review.googlesource.com/101335
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
ARM64
This change fixes index error when encoding VMOV instruction which pattern is
VMOV Rn, V.<T>[index]. For example VMOV R1, V1.S[1] is assembled as VMOV R1, V1.S[0]
Fixes #24400
Change-Id: I82b5edc8af4e06862bc4692b119697c6bb7dc3fb
Reviewed-on: https://go-review.googlesource.com/101297
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Current ARM64 assembler has no check for the invalid value of both
shift amount and post-index immediate offset of LD1/ST1. This patch
adds the check.
This patch also fixes the printing error of register number equals
to 31, which should be printed as ZR instead of R31. Test cases
are also added.
Change-Id: I476235f3ab3a3fc91fe89c5a3149a4d4529c05c7
Reviewed-on: https://go-review.googlesource.com/100255
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
|
|
Improve runtime memclr_arm64.s using ZVA feature to zero out memory when n
is at least 64 bytes.
Also add DCZID_EL0 system register to use in MRS instruction.
Benchmark results of runtime/Memclr on Amberwing:
name old time/op new time/op delta
Memclr/5 12.7ns ± 0% 12.7ns ± 0% ~ (all equal)
Memclr/16 12.7ns ± 0% 12.2ns ± 1% -4.13% (p=0.000 n=7+8)
Memclr/64 14.0ns ± 0% 14.6ns ± 1% +4.29% (p=0.000 n=7+8)
Memclr/256 23.7ns ± 0% 25.7ns ± 0% +8.44% (p=0.000 n=8+7)
Memclr/4096 204ns ± 0% 74ns ± 0% -63.71% (p=0.000 n=8+8)
Memclr/65536 2.89µs ± 0% 0.84µs ± 0% -70.91% (p=0.000 n=8+8)
Memclr/1M 45.9µs ± 0% 17.0µs ± 0% -62.88% (p=0.000 n=8+8)
Memclr/4M 184µs ± 0% 77µs ± 4% -57.94% (p=0.001 n=6+8)
Memclr/8M 367µs ± 0% 144µs ± 1% -60.72% (p=0.000 n=7+8)
Memclr/16M 734µs ± 0% 293µs ± 1% -60.09% (p=0.000 n=8+8)
Memclr/64M 2.94ms ± 0% 1.23ms ± 0% -58.06% (p=0.000 n=7+8)
GoMemclr/5 8.00ns ± 0% 8.79ns ± 0% +9.83% (p=0.000 n=8+8)
GoMemclr/16 8.00ns ± 0% 7.60ns ± 0% -5.00% (p=0.000 n=8+8)
GoMemclr/64 10.8ns ± 0% 10.4ns ± 0% -3.70% (p=0.000 n=8+8)
GoMemclr/256 20.4ns ± 0% 21.2ns ± 0% +3.92% (p=0.000 n=8+8)
name old speed new speed delta
Memclr/5 394MB/s ± 0% 393MB/s ± 0% -0.28% (p=0.006 n=8+8)
Memclr/16 1.26GB/s ± 0% 1.31GB/s ± 1% +4.07% (p=0.000 n=7+8)
Memclr/64 4.57GB/s ± 0% 4.39GB/s ± 2% -3.91% (p=0.000 n=7+8)
Memclr/256 10.8GB/s ± 0% 10.0GB/s ± 0% -7.95% (p=0.001 n=7+6)
Memclr/4096 20.1GB/s ± 0% 55.3GB/s ± 0% +175.46% (p=0.000 n=8+8)
Memclr/65536 22.6GB/s ± 0% 77.8GB/s ± 0% +243.63% (p=0.000 n=7+8)
Memclr/1M 22.8GB/s ± 0% 61.5GB/s ± 0% +169.38% (p=0.000 n=8+8)
Memclr/4M 22.8GB/s ± 0% 54.3GB/s ± 4% +137.85% (p=0.001 n=6+8)
Memclr/8M 22.8GB/s ± 0% 58.1GB/s ± 1% +154.56% (p=0.000 n=7+8)
Memclr/16M 22.8GB/s ± 0% 57.2GB/s ± 1% +150.54% (p=0.000 n=8+8)
Memclr/64M 22.8GB/s ± 0% 54.4GB/s ± 0% +138.42% (p=0.000 n=7+8)
GoMemclr/5 625MB/s ± 0% 569MB/s ± 0% -8.90% (p=0.000 n=7+8)
GoMemclr/16 2.00GB/s ± 0% 2.10GB/s ± 0% +5.26% (p=0.000 n=8+8)
GoMemclr/64 5.92GB/s ± 0% 6.15GB/s ± 0% +3.83% (p=0.000 n=7+8)
GoMemclr/256 12.5GB/s ± 0% 12.1GB/s ± 0% -3.77% (p=0.000 n=8+7)
Benchmark results of runtime/Memclr on Amberwing without ZVA:
name old time/op new time/op delta
Memclr/5 12.7ns ± 0% 12.8ns ± 0% +0.79% (p=0.008 n=5+5)
Memclr/16 12.7ns ± 0% 12.7ns ± 0% ~ (p=0.444 n=5+5)
Memclr/64 14.0ns ± 0% 14.4ns ± 0% +2.86% (p=0.008 n=5+5)
Memclr/256 23.7ns ± 1% 19.2ns ± 0% -19.06% (p=0.008 n=5+5)
Memclr/4096 203ns ± 0% 119ns ± 0% -41.38% (p=0.008 n=5+5)
Memclr/65536 2.89µs ± 0% 1.66µs ± 0% -42.76% (p=0.008 n=5+5)
Memclr/1M 45.9µs ± 0% 26.2µs ± 0% -42.82% (p=0.008 n=5+5)
Memclr/4M 184µs ± 0% 105µs ± 0% -42.81% (p=0.008 n=5+5)
Memclr/8M 367µs ± 0% 210µs ± 0% -42.76% (p=0.008 n=5+5)
Memclr/16M 734µs ± 0% 420µs ± 0% -42.74% (p=0.008 n=5+5)
Memclr/64M 2.94ms ± 0% 1.69ms ± 0% -42.46% (p=0.008 n=5+5)
GoMemclr/5 8.00ns ± 0% 8.40ns ± 0% +5.00% (p=0.008 n=5+5)
GoMemclr/16 8.00ns ± 0% 8.40ns ± 0% +5.00% (p=0.008 n=5+5)
GoMemclr/64 10.8ns ± 0% 9.6ns ± 0% -11.02% (p=0.008 n=5+5)
GoMemclr/256 20.4ns ± 0% 17.2ns ± 0% -15.69% (p=0.008 n=5+5)
name old speed new speed delta
Memclr/5 393MB/s ± 0% 391MB/s ± 0% -0.64% (p=0.008 n=5+5)
Memclr/16 1.26GB/s ± 0% 1.26GB/s ± 0% -0.55% (p=0.008 n=5+5)
Memclr/64 4.57GB/s ± 0% 4.44GB/s ± 0% -2.79% (p=0.008 n=5+5)
Memclr/256 10.8GB/s ± 0% 13.3GB/s ± 0% +23.07% (p=0.016 n=4+5)
Memclr/4096 20.1GB/s ± 0% 34.3GB/s ± 0% +70.91% (p=0.008 n=5+5)
Memclr/65536 22.7GB/s ± 0% 39.6GB/s ± 0% +74.65% (p=0.008 n=5+5)
Memclr/1M 22.8GB/s ± 0% 40.0GB/s ± 0% +74.88% (p=0.008 n=5+5)
Memclr/4M 22.8GB/s ± 0% 39.9GB/s ± 0% +74.84% (p=0.008 n=5+5)
Memclr/8M 22.9GB/s ± 0% 39.9GB/s ± 0% +74.71% (p=0.008 n=5+5)
Memclr/16M 22.9GB/s ± 0% 39.9GB/s ± 0% +74.64% (p=0.008 n=5+5)
Memclr/64M 22.8GB/s ± 0% 39.7GB/s ± 0% +73.79% (p=0.008 n=5+5)
GoMemclr/5 625MB/s ± 0% 595MB/s ± 0% -4.77% (p=0.000 n=4+5)
GoMemclr/16 2.00GB/s ± 0% 1.90GB/s ± 0% -4.77% (p=0.008 n=5+5)
GoMemclr/64 5.92GB/s ± 0% 6.66GB/s ± 0% +12.48% (p=0.016 n=4+5)
GoMemclr/256 12.5GB/s ± 0% 14.9GB/s ± 0% +18.95% (p=0.008 n=5+5)
Fixes #22948
Change-Id: Iaae4e22391e25b54d299821bb7f8a81ac3986b93
Reviewed-on: https://go-review.googlesource.com/82055
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Thanks to Iskander Sharipov for spotting this in an earlier CL of mine.
Change-Id: Idf45ad266205ff83985367cb38f585badfbed151
Reviewed-on: https://go-review.googlesource.com/100535
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Iskander Sharipov <iskander.sharipov@intel.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
The imm8 argument consists of 4 2-bit indices, so it can take values up
to $255. However, the assembler was treating it as Yi8, which reads
"fits in int8". Add a Yu8 variant, to also keep backwards compatibility
with negative values possible with Yi8.
Fixes #24378.
Change-Id: I24ddb19c219b54d039a6c1bcdb903717d1c7c3b8
Reviewed-on: https://go-review.googlesource.com/100475
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
The current implementation of l*arx instructions does not accept non-zero
offsets in RA nor the EH field. This change adds full functionality to those
instructions.
Updates #23845
Change-Id: If113f70d11de5f35f8389520b049390dbc40e863
Reviewed-on: https://go-review.googlesource.com/99635
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
Logical instructions can have RSP as its destination. Support it.
Note that the two-operand form, like "AND $1, RSP", which is
equivalent to the three-operand form "AND $1, RSP, RSP", is
invalid, because the source register is not allowed to be RSP.
Also note that instructions that set the conditional flags, like
ANDS, cannot target RSP. Because of this, we split out the optab
entries of AND et al. and ANDS et al.
Merge the optab entries of BIC et al. to AND et al., because they
are same.
Fixes #24332.
Change-Id: I3584d6f2e7cea98a659a1ed9fdf67c353e090637
Reviewed-on: https://go-review.googlesource.com/100217
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
The current code assigns vector register arrangement a wrong value
when the arrangement specifier is S2, which causes the incorrect
assembly.
The patch fixes the issue and adds the test cases.
Fixes #24249
Change-Id: I9736df1279494003d0b178da1af9cee9cd85ce21
Reviewed-on: https://go-review.googlesource.com/98555
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
When instructions add, and, or, xor, and movd have
constant operands in some cases more instructions are
generated than necessary by the assembler.
This adds more opcode/operand combinations to the optab
and improves the code generation for the cases where the
size and sign of the constant allows the use of 1
instructions instead of 2.
Example of previous code:
oris r3, r0, 0
ori r3, r3, 65533
now:
ori r3, r0, 65533
This does not significantly reduce the overall binary size
because the improvement depends on the constant value.
Some procedures show a 1-2% reduction in size. This improvement
could also be significant in cases where the extra instructions
occur in a critical loop.
Testcase ppc64enc.s was added to cmd/asm/internal/asm/testdata
with the variations affected by this change.
Updates #23845
Change-Id: I7fdf2320c95815d99f2755ba77d0c6921cd7fad7
Reviewed-on: https://go-review.googlesource.com/95135
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
In RET instruction, the operand is the return jump's target,
which should be put in Prog.To.
Add an action "buildrundir" to the test driver, which builds
(compile+assemble+link) the code in a directory and runs the
resulting binary.
Fixes #23838.
Change-Id: I7ebe7eda49024b40a69a24857322c5ca9c67babb
Reviewed-on: https://go-review.googlesource.com/94175
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
|
Instructions LDARB, LDARH, LDAXPW, LDAXP, STLRB, STLRH, STLXP, STLXPW, STXP,
STXPW have been added before, but they are not enabled. This CL enabled them.
Change the form of LDXP and LDXPW to the form of LDP, and fix a bug of STLXP.
Change-Id: I5d2b51494b92451bf6b072c65cfdd8acf07e9b54
Reviewed-on: https://go-review.googlesource.com/96215
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Minimizes the amount of "TODO" stuff in test suite
of cmd/asm/internal/asm/testdata/amd64enc.s.
Some instructions were already implemented, but
test cases for them were commented-out.
Does not enable MMX instructions, calls/jumps and some
segment registers instructions.
-- Affected instructions --
BLENDVPD, BLENDVPS
BSWAPW
CBW
CDQE
CLAC
CLFLUSHOPT
CMPXCHG16B
CRC32B, CRC32L, CRC32W
CWDE
FBLD
FBSTP
FCMOVB
FCMOVBE
FCMOVE
FCMOVNB
FCMOVNBE
FCMOVU
FCOMI
FCOMIP
IMUL3L, IMUL3Q, IMUL3W
ICEBP, INT
INVPCID
LARQ
LGDT, LIDT, LLDT
LMSW
LTR
LZCNTL, LZCNTQ, LZCNTW
MONITOR
MOVBELL, MOVBEQQ, MOVBEWW
MOVBQZX
MOVQ
MOVSWW, MOVZWW
MWAIT
NOPL, NOPW
PBLENDVB
PEXTRW
RDPKRU
RDRANDL, RDRANDQ, RDRANDW
RDSEEDL, RDSEEDQ, RDSEEDW
RDTSCP
SAHF
SGDT, SIDT
SLDTL, SLDTQ, SLDTW
SMSWL, SMSWQ, SMSWW
STAC
STRL, STRQ, STRW
SYSENTER, SYSENTER64
SYSEXIT, SYSEXIT64
SHA256RNDS2
TZCNTL, TZCNTQ, TZCNTW
UD1, UD2
WRPKRU
XRSTOR, XRSTOR64
XRSTORS, XRSTORS64
XSAVE, XSAVE64
XSAVEC, XSAVEC64
XSAVEOPT, XSAVEOPT64
XSAVES, XSAVES64
XSETBV
Fixes #6739
Change-Id: I8b125d9a5ea39bb4b9da7e66a63a16f609cef376
Reviewed-on: https://go-review.googlesource.com/97235
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
CL generated mechanically with github.com/mdempsky/unconvert.
Also updated cmd/compile/internal/ssa/gen/*.rules manually.
Change-Id: If721ef73cf0771ae83ce7e2d11623fc8d9155768
Reviewed-on: https://go-review.googlesource.com/97075
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Extend cmd/internal/src.PosBase to track column information,
and adjust the meaning of the PosBase position to mean the
position at which the PosBase's relative (line, col) position
starts (rather than indicating the position of the //line
directive). Because this semantic change is made in the
compiler's noder, it doesn't affect the logic of src.PosBase,
only its test setup (where PosBases are constructed with
corrected incomming positions). In short, src.PosBase now
matches syntax.PosBase with respect to the semantics of
src.PosBase.pos.
For #22662.
Change-Id: I5b1451cb88fff3f149920c2eec08b6167955ce27
Reviewed-on: https://go-review.googlesource.com/96535
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
Add arm64 HW instructions FMADDD, FMADDS, FMSUBD, FMSUBS, FNMADDD, FNMADDS,
FNMSUBD, FNMSUBS, VFMLA, VFMLS, VMOV (element) for math optimization.
Add check on register element index and test cases.
Change-Id: Ice07c50b1a02d488ad2cde2a4e8aea93f3e3afff
Reviewed-on: https://go-review.googlesource.com/90876
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
GitHub-Last-Rev: 468df242d07419c228656985702325aa78952d99
GitHub-Pull-Request: golang/go#23935
Change-Id: If751ce3ffa3a4d5e00a3138211383d12cb6b23fc
Reviewed-on: https://go-review.googlesource.com/95577
Run-TryBot: Andrew Bonventre <andybons@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Andrew Bonventre <andybons@golang.org>
|
|
This adds math/bits intrinsics for OnesCount on arm64.
name old time/op new time/op delta
OnesCount 3.81ns ± 0% 1.60ns ± 0% -57.96% (p=0.000 n=7+8)
OnesCount8 1.60ns ± 0% 1.60ns ± 0% ~ (all equal)
OnesCount16 2.41ns ± 0% 1.60ns ± 0% -33.61% (p=0.000 n=8+8)
OnesCount32 4.17ns ± 0% 1.60ns ± 0% -61.58% (p=0.000 n=8+8)
OnesCount64 3.80ns ± 0% 1.60ns ± 0% -57.84% (p=0.000 n=8+8)
Update #18616
Conflicts:
src/cmd/compile/internal/gc/asm_test.go
Change-Id: I63ac2f63acafdb1f60656ab8a56be0b326eec5cb
Reviewed-on: https://go-review.googlesource.com/90835
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This change adds ADD/AND/OR/XOR Immediate Shifted instructions for
ppc64x so they are usable in Go asm code. These instructions were
originally present in asm9.go, but they were only usable in that
file (as -AADD, -AANDCC, -AOR, -AXOR). These old mnemonics are now
removed.
Updates #23845
Change-Id: Ifa2fac685e8bc628cb241dd446adfc3068181826
Reviewed-on: https://go-review.googlesource.com/94115
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
The current assembler cannot handle PRFM(immediate) instruciton.
The fix creates a prfopfield struct that contains the eight
prefetch operations and the value to use in instruction. And add
the test cases.
Fixes #22932
Change-Id: I621d611bd930ef3c42306a4372447c46d53b2ccf
Reviewed-on: https://go-review.googlesource.com/81675
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Add support of NEG{V,W} pseudo-instructions, which are translated
to a SUB instruction from R0 with proper width.
Also turn illegal instruction to UNDEF, to avoid crashing in
asmout when it tries to read the operands.
Fixes #23548.
Change-Id: I047b27559ccd9594c3dcf62ab039b636098f30a3
Reviewed-on: https://go-review.googlesource.com/89896
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
R28 is used as the SB register on MIPS64, and it was printed as
"RSB" on both 32-bit and 64-bit MIPS. This is confusing on MIPS32
as there R28 is just a general purpose register. Further, this
string representation is used in the assembler's frontend to parse
register symbols, and this leads to failure in parsing R28 in
MIPS32 assembly code. Change rconv to always print the register
as R28. This fixes the parsing problem on MIPS32, and this is
a reasonable representation on both MIPS32 and MIPS64.
Change-Id: I30d6c0a442fbb08ea615f32f1763b5baadcee1da
Reviewed-on: https://go-review.googlesource.com/92915
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
|
Was missing a check in validSymbol.
Fixes #23580.
Can wait for go1.11. Probably safe but the crash is only for
invalid input, so not worth the risk.
Change-Id: I51f88c5be35a8880536147d1fe5c5dd6798c29de
Reviewed-on: https://go-review.googlesource.com/90398
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|