aboutsummaryrefslogtreecommitdiff
path: root/src/cmd/asm
AgeCommit message (Collapse)Author
2018-05-02cmd/asm/internal/asm: update the test cases in arm64enc.s filefanzha02
Uncomment the test cases in arm64enc.s because they can be handled by current assembler. In addition, CL supplements more test cases. Change-Id: I583d45793b8227c6ec370868652dd8bcbfaa1ecf Reviewed-on: https://go-review.googlesource.com/109275 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-30cmd/asm: add vector instructions for ChaCha20Poly1305 on ARM64Balaram Makam
This change provides VZIP1, VZIP2, VTBL instruction for supporting ChaCha20Poly1305 implementation later. Change-Id: Ife7c87b8ab1a6495a444478eeb9d906ae4c5ffa9 Reviewed-on: https://go-review.googlesource.com/110015 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-30cmd/compile: intrinsify runtime.getcallerpc on arm64Wei Xiao
Add a compiler intrinsic for getcallerpc on arm64 for better code generation. Change-Id: I897e670a2b8ffa1a8c2fdc638f5b2c44bda26318 Reviewed-on: https://go-review.googlesource.com/109276 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-28cmd/asm: add s390x VMSLG instructionbill_ofarrell
This instruction was introduced on the z14 to accelerate "limbified" multiplications for certain cryptographic algorithms. This change allows it to be used in Go assembly. Change-Id: Ic93dae7fec1756f662874c08a5abc435bce9dd9e Reviewed-on: https://go-review.googlesource.com/109695 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-28cmd/internal/obj/arm64: reorder the assembler's optab entriesfanzha02
Current optab entries are unordered, because the new instructions are added at the end of the optab. The patch reorders them by comments in optab, such as arithmetic operations, logical operations and a series of load/store etc. The patch removes the VMOVS opcode because FMOVS already has the same operation. Change-Id: Iccdf89ecbb3875b9dfcb6e06be2cc19c7e5581a2 Reviewed-on: https://go-review.googlesource.com/109896 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-04-26cmd/asm: add VSRI instruction on ARM64Balaram Makam
This change provides VSRI instruction for ChaCha20Poly1305 implementation. Change-Id: Ifee727b6f3982b629b44a67cac5bbe87ca59027b Reviewed-on: https://go-review.googlesource.com/109342 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-04-26cmd/compile: optimize ARM64 code with CMN/TSTBalaram Makam
Use CMN/TST to simplify comparisons. This can reduce the register pressure by removing single def/use registers for example: ADDW R0, R1, R8 -> CMNW R1, R0 ; CMN is an alias of ADDS. CBZW R8, label -> BEQ label ; single def/use of R8 removed. Little change in performance of go1 benchmark on Amberwing: name old time/op new time/op delta RegexpMatchEasy0_32 247ns ± 0% 246ns ± 0% -0.40% (p=0.008 n=5+5) RegexpMatchEasy0_1K 581ns ± 0% 580ns ± 0% ~ (p=0.079 n=4+5) RegexpMatchEasy1_32 244ns ± 0% 243ns ± 0% -0.41% (p=0.008 n=5+5) RegexpMatchEasy1_1K 804ns ± 0% 806ns ± 0% +0.25% (p=0.016 n=5+4) RegexpMatchMedium_32 313ns ± 0% 311ns ± 0% -0.64% (p=0.008 n=5+5) RegexpMatchMedium_1K 52.2µs ± 0% 51.9µs ± 0% -0.51% (p=0.008 n=5+5) RegexpMatchHard_32 2.76µs ± 3% 2.74µs ± 0% ~ (p=0.683 n=5+5) RegexpMatchHard_1K 78.8µs ± 0% 78.9µs ± 0% +0.04% (p=0.008 n=5+5) FmtFprintfEmpty 58.6ns ± 0% 57.7ns ± 0% -1.54% (p=0.008 n=5+5) FmtFprintfString 118ns ± 0% 115ns ± 0% -2.54% (p=0.008 n=5+5) FmtFprintfInt 119ns ± 0% 119ns ± 0% ~ (all equal) FmtFprintfIntInt 192ns ± 0% 192ns ± 0% ~ (all equal) FmtFprintfPrefixedInt 224ns ± 0% 205ns ± 0% -8.48% (p=0.008 n=5+5) FmtFprintfFloat 336ns ± 0% 333ns ± 1% ~ (p=0.683 n=5+5) FmtManyArgs 779ns ± 1% 760ns ± 1% -2.41% (p=0.008 n=5+5) Gzip 437ms ± 0% 436ms ± 0% -0.27% (p=0.008 n=5+5) HTTPClientServer 90.1µs ± 1% 91.1µs ± 0% +1.19% (p=0.008 n=5+5) JSONEncode 20.1ms ± 0% 20.2ms ± 1% ~ (p=0.690 n=5+5) JSONDecode 94.5ms ± 1% 94.1ms ± 1% ~ (p=0.095 n=5+5) Mandelbrot200 5.37ms ± 0% 5.37ms ± 0% ~ (p=0.421 n=5+5) TimeParse 450ns ± 0% 446ns ± 0% -0.89% (p=0.000 n=5+4) TimeFormat 483ns ± 1% 473ns ± 0% -2.19% (p=0.008 n=5+5) Template 90.6ms ± 0% 89.7ms ± 0% -0.93% (p=0.008 n=5+5) GoParse 5.97ms ± 0% 6.01ms ± 0% +0.65% (p=0.008 n=5+5) BinaryTree17 11.8s ± 0% 11.7s ± 0% -0.28% (p=0.016 n=5+5) Revcomp 669ms ± 0% 669ms ± 0% ~ (p=0.222 n=5+5) Fannkuch11 3.28s ± 0% 3.34s ± 0% +1.72% (p=0.016 n=4+5) [Geo mean] 46.6µs 46.3µs -0.74% name old speed new speed delta RegexpMatchEasy0_32 129MB/s ± 0% 130MB/s ± 0% +0.32% (p=0.016 n=5+4) RegexpMatchEasy0_1K 1.76GB/s ± 0% 1.76GB/s ± 0% +0.13% (p=0.016 n=4+5) RegexpMatchEasy1_32 131MB/s ± 0% 132MB/s ± 0% +0.32% (p=0.008 n=5+5) RegexpMatchEasy1_1K 1.27GB/s ± 0% 1.27GB/s ± 0% -0.24% (p=0.016 n=5+4) RegexpMatchMedium_32 3.19MB/s ± 0% 3.21MB/s ± 0% +0.63% (p=0.008 n=5+5) RegexpMatchMedium_1K 19.6MB/s ± 0% 19.7MB/s ± 0% +0.51% (p=0.029 n=4+4) RegexpMatchHard_32 11.6MB/s ± 2% 11.7MB/s ± 0% ~ (p=1.000 n=5+5) RegexpMatchHard_1K 13.0MB/s ± 0% 13.0MB/s ± 0% ~ (p=0.079 n=4+5) Gzip 44.4MB/s ± 0% 44.5MB/s ± 0% +0.27% (p=0.008 n=5+5) JSONEncode 96.4MB/s ± 0% 96.2MB/s ± 1% ~ (p=0.579 n=5+5) JSONDecode 20.5MB/s ± 1% 20.6MB/s ± 1% ~ (p=0.111 n=5+5) Template 21.4MB/s ± 0% 21.6MB/s ± 0% +0.94% (p=0.008 n=5+5) GoParse 9.70MB/s ± 0% 9.63MB/s ± 0% -0.68% (p=0.016 n=4+5) Revcomp 380MB/s ± 0% 380MB/s ± 0% ~ (p=0.222 n=5+5) [Geo mean] 55.3MB/s 55.4MB/s +0.23% Change-Id: I2e5338138991d9bc984e67b51212aa5d1b0f2a6b Reviewed-on: https://go-review.googlesource.com/97335 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com>
2018-04-24cmd/internal/obj/x86: forbid mem args for MOV_DR and MOV_CRquasilyte
Memory arguments for debug/control register moves are a minefield for programmer: not useful, but can lead to errors. See referenced issue for detailed explanation. Fixes #24981 Change-Id: I918e81cd4a8b1dfcfc9023cdfc3de45abe29e749 Reviewed-on: https://go-review.googlesource.com/107075 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-04-20cmd/internal/obj/x86: disallow PC/FP/SB scaled indexquasilyte
Reject to compile I386/AMD64 asm code that contains (Register)(PseudoReg*scale) forms of memory operands. Example of such program: "CALL (AX)(PC*2)". PseudoReg is one of the PC, FP, SB (but not SP). When pseudo-register is used in register indirect as scaled index base, x86 backend will panic because its register file misses SB/FP/PC registers. Fixes #12657. Change-Id: I30fca797b537cbc86ab47583ae96c6a0c59acaa1 Reviewed-on: https://go-review.googlesource.com/107835 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-20cmd/asm/internal/asm: add AVX512 end2end testsisharipo
All test cases are commented-out until real implementation is merged. These files are required to make x86avxgen work: it expects that for each new instruction it enables end2end test cases are available. This test suite is automatically generated. Additional AVX512 tests will be added later. Change-Id: I5f5cb6b90540834585ee5ad4c00ebfbb6efa8094 Reviewed-on: https://go-review.googlesource.com/107217 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-19cmd/asm: add rev64 instruction on ARM64Fangming.Fang
This change provides VREV64 instruction for AES-GCM implementation. Change-Id: Icdf278862b03556388586f459964b025c47b8c19 Reviewed-on: https://go-review.googlesource.com/107696 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-18cmd/internal/obj/arm64: refactor the extended/shifted register encoding to ↵fanzha02
the backend The current code encodes the register and the shift/extension into a.Offset field and this is done in the frontend. The CL refactors it to have the frontend record the register/shift/extension information in a.Reg or a.Index and leave the encoding stuff for the backend. Change-Id: I600f456aec95377b7b79cd58e94afcb30aca5d19 Reviewed-on: https://go-review.googlesource.com/106815 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-18cmd/internal/obj/ppc64: add vector multiply instructionsCarlos Eduardo Seo
This change adds vector multiply instructions to the assembler for ppc64x. Change-Id: I5143a2dc3736951344d43999066d38ab8be4a721 Reviewed-on: https://go-review.googlesource.com/107795 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-17cmd/internal/obj/arm, runtime: delete old ARM softfloat codeCherry Zhang
CL 106735 changed to the new softfloat support on GOARM=5. ARM assembly code that uses FP instructions not guarded on GOARM, if any, will break. The easiest way to fix is probably to use Go implementation on GOARM=5, like MOVB runtime·goarm(SB), R11 CMP $5, R11 BEQ arm5 ... FP instructions ... RET arm5: CALL or JMP to Go implementation Change-Id: I52fc76fac9c854ebe7c6c856c365fba35d3f560a Reviewed-on: https://go-review.googlesource.com/107475 Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-17cmd/internal/obj/x86: better error msg for offset overflow on AMD64quasilyte
Say "offset too large" instead of "invalid instruction" when assembling for AMD64. GOARCH=386 already reports error correctly. Fixed #24871 Change-Id: Iab029307b5c5edbb45f9df4b64c60ecb5f101349 Reviewed-on: https://go-review.googlesource.com/107116 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-04-13cmd/internal/obj/arm64: fix the bug of incorrect handling negative offset of ↵fanzha02
LDP/STP/LDPW/STPW The current assembler will report error when the negative offset is in the range of [-256, 0) and is not the multiples of 4/8. The fix introduces C_NSAUTO_8, C_NSAUTO_4 and C_NAUTO4K. C_NPAUTO includes C_NSAUTO_8 instead of C_NSAUTO, C_NAUTO4K includes C_NSAUTO_8, C_NSAUTO_4 and C_NSAUTO. So that assembler will encode the negative offset that is greater than -4095 and is not the multiples of 4/8 as two instructions. Add the test cases. Fixed #24471 Change-Id: I42f34e3b8a9fc52c9e8b41504294271aafade639 Reviewed-on: https://go-review.googlesource.com/102635 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-04-11cmd/internal/obj/arm64: support SWPD/SWPW/SWPH/SWPBBen Shi
SWPD/SWPW/SWPH/SWPB were introduced in ARMv8.1. They swap content of register and memory atomically. And their difference is SWPD: 64-bit double word data SWPW: 32-bit word data (zero extended to 64-bit) SWPH: 16-bit half word data (zero extended to 64-bit) SWPB: 8-bit byte data (zero extended to 64-bit) This CL implements them in the arm64 assembler. Change-Id: I2d9fb2310674bd92693531210e187143e7eed602 Reviewed-on: https://go-review.googlesource.com/101516 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-11cmd/internal/obj/arm64: add support for a series of load/store with register ↵fanzha02
offset instrucitons The patch adds support for arm64 instructions LDRB, LDRH, LDRSB, LDRSH, LDRSW, STR, STRB and STRH with register offset. Test cases are also added. Change-Id: I8d17fddd2963c0bc366e12b00bac49b93f3f0957 Reviewed-on: https://go-review.googlesource.com/91575 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-09cmd/internal/obj/arm64: add assembler support for load with register offsetfanzha02
The patch adds support for LDR(register offset) instruction. And add the test cases and negative tests. Change-Id: I5b32c6a5065afc4571116d4896f7ebec3c0416d3 Reviewed-on: https://go-review.googlesource.com/87955 Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-04-06cmd/asm/internal/arch: unexport ParseARM64SuffixTobias Klauser
ParseARM64Suffix is not used outside cmd/asm/internal/arch. Change-Id: I8e7782dce11cf8cd2fd08dd17e555ced8d87ba24 Reviewed-on: https://go-review.googlesource.com/105115 Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-06cmd: some semi-automated cleanupsDaniel Martí
* Remove some redundant returns * Replace HasPrefix with TrimPrefix * Remove some obviously dead code Passes toolstash -cmp on std cmd. Change-Id: Ifb0d70a45cbb8a8553758a8c4878598b7fe932bc Reviewed-on: https://go-review.googlesource.com/105017 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-03cmd/asm, math: add s390x floating point test instructionsMichael Munday
Floating point test instructions allow special cases (NaN, ±∞ and a few other useful properties) to be checked directly. This CL adds the following instructions to the assembler: * LTEBR - load and test (float32) * LTDBR - load and test (float64) * TCEB - test data class (float32) * TCDB - test data class (float64) Note that I have only added immediate versions of the 'test data class' instructions for now as that's the only case I think the compiler will use. Change-Id: I3398aab2b3a758bf909bd158042234030c8af582 Reviewed-on: https://go-review.googlesource.com/104457 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-03cmd/asm: add essential instructions for AES-GCM on ARM64Fangming.Fang
This change adds VLD1, VST1, VPMULL{2}, VEXT, VRBIT, VUSHR and VSHL instructions for supporting AES-GCM implementation later. Fixes #24400 Change-Id: I556feb88067f195cbe25629ec2b7a817acc58709 Reviewed-on: https://go-review.googlesource.com/101095 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-02cmd: remove some unused parametersDaniel Martí
Change-Id: I9d2a4b8df324897e264d30801e95ddc0f0e75f3a Reviewed-on: https://go-review.googlesource.com/102337 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-03-27cmd/internal/obj/arm: add DMB instructionYuval Pavel Zholkover
Change-Id: Ib67a61d5b37af210ff15d60d72bd5238b9c2d0ca Reviewed-on: https://go-review.googlesource.com/94815 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-27cmd/internal/obj/arm64: add LDPW/LDPSW/STPW to arm64 assemblerBen Shi
1. STPW stores the lower 32-bit words of a pair of registers to memory. 2. LDPW loads two 32-bit words from memory, zero extends them to 64-bit, and then copies to a pair of registers. 3. LDPSW does the same as LDPW, except a sign extension. This CL implements those 3 instructions and adds test cases. Change-Id: Ied9834d8240240d23ce00e086b4ea456e1611f1a Reviewed-on: https://go-review.googlesource.com/99956 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-21runtime,sync/atomic: replace asm BYTEs with insts for x86isharipo
For each replacement, test case is added to new 386enc.s file with exception of EMMS, SYSENTER, MFENCE and LFENCE as they are already covered in amd64enc.s (same on amd64 and 386). The replacement became less obvious after go vet suggested changes Before: BYTE $0x0f; BYTE $0x7f; BYTE $0x44; BYTE $0x24; BYTE $0x08 Changed to MOVQ (this form is being tested): MOVQ M0, 8(SP) Refactored to FP-relative access (go vet advice): MOVQ M0, val+4(FP) Change-Id: I56b87cf3371b6ad81ad0cd9db2033aee407b5818 Reviewed-on: https://go-review.googlesource.com/101475 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-03-20cmd/asm: fix bug about VMOV instruction (move a vector element to another) ↵Fangming.Fang
on ARM64 This change fixes index error when encoding VMOV instruction which pattern is vmov Vn.<T>[index], Vd.<T>[index] Change-Id: I949166e6dfd63fb0a9365f183b6c50d452614f9d Reviewed-on: https://go-review.googlesource.com/101335 Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-20cmd/asm: fix bug about VMOV instruction (move register to vector element) on ↵Fangming.Fang
ARM64 This change fixes index error when encoding VMOV instruction which pattern is VMOV Rn, V.<T>[index]. For example VMOV R1, V1.S[1] is assembled as VMOV R1, V1.S[0] Fixes #24400 Change-Id: I82b5edc8af4e06862bc4692b119697c6bb7dc3fb Reviewed-on: https://go-review.googlesource.com/101297 Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-19cmd/asm: add ARM64 assembler check for incorrect inputfanzha02
Current ARM64 assembler has no check for the invalid value of both shift amount and post-index immediate offset of LD1/ST1. This patch adds the check. This patch also fixes the printing error of register number equals to 31, which should be printed as ZR instead of R31. Test cases are also added. Change-Id: I476235f3ab3a3fc91fe89c5a3149a4d4529c05c7 Reviewed-on: https://go-review.googlesource.com/100255 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com>
2018-03-14runtime: improve arm64 memclr implementationBalaram Makam
Improve runtime memclr_arm64.s using ZVA feature to zero out memory when n is at least 64 bytes. Also add DCZID_EL0 system register to use in MRS instruction. Benchmark results of runtime/Memclr on Amberwing: name old time/op new time/op delta Memclr/5 12.7ns ± 0% 12.7ns ± 0% ~ (all equal) Memclr/16 12.7ns ± 0% 12.2ns ± 1% -4.13% (p=0.000 n=7+8) Memclr/64 14.0ns ± 0% 14.6ns ± 1% +4.29% (p=0.000 n=7+8) Memclr/256 23.7ns ± 0% 25.7ns ± 0% +8.44% (p=0.000 n=8+7) Memclr/4096 204ns ± 0% 74ns ± 0% -63.71% (p=0.000 n=8+8) Memclr/65536 2.89µs ± 0% 0.84µs ± 0% -70.91% (p=0.000 n=8+8) Memclr/1M 45.9µs ± 0% 17.0µs ± 0% -62.88% (p=0.000 n=8+8) Memclr/4M 184µs ± 0% 77µs ± 4% -57.94% (p=0.001 n=6+8) Memclr/8M 367µs ± 0% 144µs ± 1% -60.72% (p=0.000 n=7+8) Memclr/16M 734µs ± 0% 293µs ± 1% -60.09% (p=0.000 n=8+8) Memclr/64M 2.94ms ± 0% 1.23ms ± 0% -58.06% (p=0.000 n=7+8) GoMemclr/5 8.00ns ± 0% 8.79ns ± 0% +9.83% (p=0.000 n=8+8) GoMemclr/16 8.00ns ± 0% 7.60ns ± 0% -5.00% (p=0.000 n=8+8) GoMemclr/64 10.8ns ± 0% 10.4ns ± 0% -3.70% (p=0.000 n=8+8) GoMemclr/256 20.4ns ± 0% 21.2ns ± 0% +3.92% (p=0.000 n=8+8) name old speed new speed delta Memclr/5 394MB/s ± 0% 393MB/s ± 0% -0.28% (p=0.006 n=8+8) Memclr/16 1.26GB/s ± 0% 1.31GB/s ± 1% +4.07% (p=0.000 n=7+8) Memclr/64 4.57GB/s ± 0% 4.39GB/s ± 2% -3.91% (p=0.000 n=7+8) Memclr/256 10.8GB/s ± 0% 10.0GB/s ± 0% -7.95% (p=0.001 n=7+6) Memclr/4096 20.1GB/s ± 0% 55.3GB/s ± 0% +175.46% (p=0.000 n=8+8) Memclr/65536 22.6GB/s ± 0% 77.8GB/s ± 0% +243.63% (p=0.000 n=7+8) Memclr/1M 22.8GB/s ± 0% 61.5GB/s ± 0% +169.38% (p=0.000 n=8+8) Memclr/4M 22.8GB/s ± 0% 54.3GB/s ± 4% +137.85% (p=0.001 n=6+8) Memclr/8M 22.8GB/s ± 0% 58.1GB/s ± 1% +154.56% (p=0.000 n=7+8) Memclr/16M 22.8GB/s ± 0% 57.2GB/s ± 1% +150.54% (p=0.000 n=8+8) Memclr/64M 22.8GB/s ± 0% 54.4GB/s ± 0% +138.42% (p=0.000 n=7+8) GoMemclr/5 625MB/s ± 0% 569MB/s ± 0% -8.90% (p=0.000 n=7+8) GoMemclr/16 2.00GB/s ± 0% 2.10GB/s ± 0% +5.26% (p=0.000 n=8+8) GoMemclr/64 5.92GB/s ± 0% 6.15GB/s ± 0% +3.83% (p=0.000 n=7+8) GoMemclr/256 12.5GB/s ± 0% 12.1GB/s ± 0% -3.77% (p=0.000 n=8+7) Benchmark results of runtime/Memclr on Amberwing without ZVA: name old time/op new time/op delta Memclr/5 12.7ns ± 0% 12.8ns ± 0% +0.79% (p=0.008 n=5+5) Memclr/16 12.7ns ± 0% 12.7ns ± 0% ~ (p=0.444 n=5+5) Memclr/64 14.0ns ± 0% 14.4ns ± 0% +2.86% (p=0.008 n=5+5) Memclr/256 23.7ns ± 1% 19.2ns ± 0% -19.06% (p=0.008 n=5+5) Memclr/4096 203ns ± 0% 119ns ± 0% -41.38% (p=0.008 n=5+5) Memclr/65536 2.89µs ± 0% 1.66µs ± 0% -42.76% (p=0.008 n=5+5) Memclr/1M 45.9µs ± 0% 26.2µs ± 0% -42.82% (p=0.008 n=5+5) Memclr/4M 184µs ± 0% 105µs ± 0% -42.81% (p=0.008 n=5+5) Memclr/8M 367µs ± 0% 210µs ± 0% -42.76% (p=0.008 n=5+5) Memclr/16M 734µs ± 0% 420µs ± 0% -42.74% (p=0.008 n=5+5) Memclr/64M 2.94ms ± 0% 1.69ms ± 0% -42.46% (p=0.008 n=5+5) GoMemclr/5 8.00ns ± 0% 8.40ns ± 0% +5.00% (p=0.008 n=5+5) GoMemclr/16 8.00ns ± 0% 8.40ns ± 0% +5.00% (p=0.008 n=5+5) GoMemclr/64 10.8ns ± 0% 9.6ns ± 0% -11.02% (p=0.008 n=5+5) GoMemclr/256 20.4ns ± 0% 17.2ns ± 0% -15.69% (p=0.008 n=5+5) name old speed new speed delta Memclr/5 393MB/s ± 0% 391MB/s ± 0% -0.64% (p=0.008 n=5+5) Memclr/16 1.26GB/s ± 0% 1.26GB/s ± 0% -0.55% (p=0.008 n=5+5) Memclr/64 4.57GB/s ± 0% 4.44GB/s ± 0% -2.79% (p=0.008 n=5+5) Memclr/256 10.8GB/s ± 0% 13.3GB/s ± 0% +23.07% (p=0.016 n=4+5) Memclr/4096 20.1GB/s ± 0% 34.3GB/s ± 0% +70.91% (p=0.008 n=5+5) Memclr/65536 22.7GB/s ± 0% 39.6GB/s ± 0% +74.65% (p=0.008 n=5+5) Memclr/1M 22.8GB/s ± 0% 40.0GB/s ± 0% +74.88% (p=0.008 n=5+5) Memclr/4M 22.8GB/s ± 0% 39.9GB/s ± 0% +74.84% (p=0.008 n=5+5) Memclr/8M 22.9GB/s ± 0% 39.9GB/s ± 0% +74.71% (p=0.008 n=5+5) Memclr/16M 22.9GB/s ± 0% 39.9GB/s ± 0% +74.64% (p=0.008 n=5+5) Memclr/64M 22.8GB/s ± 0% 39.7GB/s ± 0% +73.79% (p=0.008 n=5+5) GoMemclr/5 625MB/s ± 0% 595MB/s ± 0% -4.77% (p=0.000 n=4+5) GoMemclr/16 2.00GB/s ± 0% 1.90GB/s ± 0% -4.77% (p=0.008 n=5+5) GoMemclr/64 5.92GB/s ± 0% 6.66GB/s ± 0% +12.48% (p=0.016 n=4+5) GoMemclr/256 12.5GB/s ± 0% 14.9GB/s ± 0% +18.95% (p=0.008 n=5+5) Fixes #22948 Change-Id: Iaae4e22391e25b54d299821bb7f8a81ac3986b93 Reviewed-on: https://go-review.googlesource.com/82055 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-14cmd/asm: move manual tests out of generated fileDaniel Martí
Thanks to Iskander Sharipov for spotting this in an earlier CL of mine. Change-Id: Idf45ad266205ff83985367cb38f585badfbed151 Reviewed-on: https://go-review.googlesource.com/100535 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Iskander Sharipov <iskander.sharipov@intel.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-13cmd/asm: VPERMQ's imm8 arg is an uint8Daniel Martí
The imm8 argument consists of 4 2-bit indices, so it can take values up to $255. However, the assembler was treating it as Yi8, which reads "fits in int8". Add a Yu8 variant, to also keep backwards compatibility with negative values possible with Yi8. Fixes #24378. Change-Id: I24ddb19c219b54d039a6c1bcdb903717d1c7c3b8 Reviewed-on: https://go-review.googlesource.com/100475 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-03-13cmd/internal/obj/ppc64: implement full operand support for l*arx instructionsCarlos Eduardo Seo
The current implementation of l*arx instructions does not accept non-zero offsets in RA nor the EH field. This change adds full functionality to those instructions. Updates #23845 Change-Id: If113f70d11de5f35f8389520b049390dbc40e863 Reviewed-on: https://go-review.googlesource.com/99635 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-03-13cmd/internal/obj/arm64: support logical instructions targeting RSPCherry Zhang
Logical instructions can have RSP as its destination. Support it. Note that the two-operand form, like "AND $1, RSP", which is equivalent to the three-operand form "AND $1, RSP, RSP", is invalid, because the source register is not allowed to be RSP. Also note that instructions that set the conditional flags, like ANDS, cannot target RSP. Because of this, we split out the optab entries of AND et al. and ANDS et al. Merge the optab entries of BIC et al. to AND et al., because they are same. Fixes #24332. Change-Id: I3584d6f2e7cea98a659a1ed9fdf67c353e090637 Reviewed-on: https://go-review.googlesource.com/100217 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-03-12cmd/asm: fix ARM64 vector register arrangement encoding bugfanzha02
The current code assigns vector register arrangement a wrong value when the arrangement specifier is S2, which causes the incorrect assembly. The patch fixes the issue and adds the test cases. Fixes #24249 Change-Id: I9736df1279494003d0b178da1af9cee9cd85ce21 Reviewed-on: https://go-review.googlesource.com/98555 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-08cmd/asm, cmd/internal/obj/ppc64: avoid unnecessary load zerosLynn Boger
When instructions add, and, or, xor, and movd have constant operands in some cases more instructions are generated than necessary by the assembler. This adds more opcode/operand combinations to the optab and improves the code generation for the cases where the size and sign of the constant allows the use of 1 instructions instead of 2. Example of previous code: oris r3, r0, 0 ori r3, r3, 65533 now: ori r3, r0, 65533 This does not significantly reduce the overall binary size because the improvement depends on the constant value. Some procedures show a 1-2% reduction in size. This improvement could also be significant in cases where the extra instructions occur in a critical loop. Testcase ppc64enc.s was added to cmd/asm/internal/asm/testdata with the variations affected by this change. Updates #23845 Change-Id: I7fdf2320c95815d99f2755ba77d0c6921cd7fad7 Reviewed-on: https://go-review.googlesource.com/95135 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-03-01cmd/asm: fix assembling return jumpCherry Zhang
In RET instruction, the operand is the return jump's target, which should be put in Prog.To. Add an action "buildrundir" to the test driver, which builds (compile+assemble+link) the code in a directory and runs the resulting binary. Fixes #23838. Change-Id: I7ebe7eda49024b40a69a24857322c5ca9c67babb Reviewed-on: https://go-review.googlesource.com/94175 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2018-02-28cmd/asm: enable several arm64 load & store instructionserifan01
Instructions LDARB, LDARH, LDAXPW, LDAXP, STLRB, STLRH, STLXP, STLXPW, STXP, STXPW have been added before, but they are not enabled. This CL enabled them. Change the form of LDXP and LDXPW to the form of LDP, and fix a bug of STLXP. Change-Id: I5d2b51494b92451bf6b072c65cfdd8acf07e9b54 Reviewed-on: https://go-review.googlesource.com/96215 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-27cmd/internal/obj/x86: add missing legacy instsisharipo
Minimizes the amount of "TODO" stuff in test suite of cmd/asm/internal/asm/testdata/amd64enc.s. Some instructions were already implemented, but test cases for them were commented-out. Does not enable MMX instructions, calls/jumps and some segment registers instructions. -- Affected instructions -- BLENDVPD, BLENDVPS BSWAPW CBW CDQE CLAC CLFLUSHOPT CMPXCHG16B CRC32B, CRC32L, CRC32W CWDE FBLD FBSTP FCMOVB FCMOVBE FCMOVE FCMOVNB FCMOVNBE FCMOVU FCOMI FCOMIP IMUL3L, IMUL3Q, IMUL3W ICEBP, INT INVPCID LARQ LGDT, LIDT, LLDT LMSW LTR LZCNTL, LZCNTQ, LZCNTW MONITOR MOVBELL, MOVBEQQ, MOVBEWW MOVBQZX MOVQ MOVSWW, MOVZWW MWAIT NOPL, NOPW PBLENDVB PEXTRW RDPKRU RDRANDL, RDRANDQ, RDRANDW RDSEEDL, RDSEEDQ, RDSEEDW RDTSCP SAHF SGDT, SIDT SLDTL, SLDTQ, SLDTW SMSWL, SMSWQ, SMSWW STAC STRL, STRQ, STRW SYSENTER, SYSENTER64 SYSEXIT, SYSEXIT64 SHA256RNDS2 TZCNTL, TZCNTQ, TZCNTW UD1, UD2 WRPKRU XRSTOR, XRSTOR64 XRSTORS, XRSTORS64 XSAVE, XSAVE64 XSAVEC, XSAVEC64 XSAVEOPT, XSAVEOPT64 XSAVES, XSAVES64 XSETBV Fixes #6739 Change-Id: I8b125d9a5ea39bb4b9da7e66a63a16f609cef376 Reviewed-on: https://go-review.googlesource.com/97235 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-02-26cmd: avoid unnecessary type conversionsKunpei Sakai
CL generated mechanically with github.com/mdempsky/unconvert. Also updated cmd/compile/internal/ssa/gen/*.rules manually. Change-Id: If721ef73cf0771ae83ce7e2d11623fc8d9155768 Reviewed-on: https://go-review.googlesource.com/97075 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-26cmd/compile: track line directives w/ column informationRobert Griesemer
Extend cmd/internal/src.PosBase to track column information, and adjust the meaning of the PosBase position to mean the position at which the PosBase's relative (line, col) position starts (rather than indicating the position of the //line directive). Because this semantic change is made in the compiler's noder, it doesn't affect the logic of src.PosBase, only its test setup (where PosBases are constructed with corrected incomming positions). In short, src.PosBase now matches syntax.PosBase with respect to the semantics of src.PosBase.pos. For #22662. Change-Id: I5b1451cb88fff3f149920c2eec08b6167955ce27 Reviewed-on: https://go-review.googlesource.com/96535 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-02-22cmd/asm: add arm64 instructions for math optimizationerifan01
Add arm64 HW instructions FMADDD, FMADDS, FMSUBD, FMSUBS, FNMADDD, FNMADDS, FNMSUBD, FNMSUBS, VFMLA, VFMLS, VMOV (element) for math optimization. Add check on register element index and test cases. Change-Id: Ice07c50b1a02d488ad2cde2a4e8aea93f3e3afff Reviewed-on: https://go-review.googlesource.com/90876 Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-20all: fix misspellingsShawn Smith
GitHub-Last-Rev: 468df242d07419c228656985702325aa78952d99 GitHub-Pull-Request: golang/go#23935 Change-Id: If751ce3ffa3a4d5e00a3138211383d12cb6b23fc Reviewed-on: https://go-review.googlesource.com/95577 Run-TryBot: Andrew Bonventre <andybons@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Andrew Bonventre <andybons@golang.org>
2018-02-15cmd/compile: arm64 intrinsics for math/bits.OnesCountBalaram Makam
This adds math/bits intrinsics for OnesCount on arm64. name old time/op new time/op delta OnesCount 3.81ns ± 0% 1.60ns ± 0% -57.96% (p=0.000 n=7+8) OnesCount8 1.60ns ± 0% 1.60ns ± 0% ~ (all equal) OnesCount16 2.41ns ± 0% 1.60ns ± 0% -33.61% (p=0.000 n=8+8) OnesCount32 4.17ns ± 0% 1.60ns ± 0% -61.58% (p=0.000 n=8+8) OnesCount64 3.80ns ± 0% 1.60ns ± 0% -57.84% (p=0.000 n=8+8) Update #18616 Conflicts: src/cmd/compile/internal/gc/asm_test.go Change-Id: I63ac2f63acafdb1f60656ab8a56be0b326eec5cb Reviewed-on: https://go-review.googlesource.com/90835 Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-15cmd/asm, cmd/internal/obj/ppc64: add Immediate Shifted opcodes for ppc64xCarlos Eduardo Seo
This change adds ADD/AND/OR/XOR Immediate Shifted instructions for ppc64x so they are usable in Go asm code. These instructions were originally present in asm9.go, but they were only usable in that file (as -AADD, -AANDCC, -AOR, -AXOR). These old mnemonics are now removed. Updates #23845 Change-Id: Ifa2fac685e8bc628cb241dd446adfc3068181826 Reviewed-on: https://go-review.googlesource.com/94115 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-02-14cmd/asm: add PRFM instruction on ARM64fanzha02
The current assembler cannot handle PRFM(immediate) instruciton. The fix creates a prfopfield struct that contains the eight prefetch operations and the value to use in instruction. And add the test cases. Fixes #22932 Change-Id: I621d611bd930ef3c42306a4372447c46d53b2ccf Reviewed-on: https://go-review.googlesource.com/81675 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-14cmd/internal/obj/mips: support NEG, avoid crash with illegal instructionCherry Zhang
Add support of NEG{V,W} pseudo-instructions, which are translated to a SUB instruction from R0 with proper width. Also turn illegal instruction to UNDEF, to avoid crashing in asmout when it tries to read the operands. Fixes #23548. Change-Id: I047b27559ccd9594c3dcf62ab039b636098f30a3 Reviewed-on: https://go-review.googlesource.com/89896 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-02-13cmd/internal/obj/mips: fix use of R28 on 32-bit MIPSCherry Zhang
R28 is used as the SB register on MIPS64, and it was printed as "RSB" on both 32-bit and 64-bit MIPS. This is confusing on MIPS32 as there R28 is just a general purpose register. Further, this string representation is used in the assembler's frontend to parse register symbols, and this leads to failure in parsing R28 in MIPS32 assembly code. Change rconv to always print the register as R28. This fixes the parsing problem on MIPS32, and this is a reasonable representation on both MIPS32 and MIPS64. Change-Id: I30d6c0a442fbb08ea615f32f1763b5baadcee1da Reviewed-on: https://go-review.googlesource.com/92915 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2018-02-13cmd/asm: fix crash on bad symbol for TEXTRob Pike
Was missing a check in validSymbol. Fixes #23580. Can wait for go1.11. Probably safe but the crash is only for invalid input, so not worth the risk. Change-Id: I51f88c5be35a8880536147d1fe5c5dd6798c29de Reviewed-on: https://go-review.googlesource.com/90398 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>