path: root/src/cmd/asm
Age | Commit message | Author
2019-04-16 | cmd/asm: add s390x 'rotate then ... selected bits' instructions | Michael Munday
This CL adds the following instructions, useful for shifting/rotating and masking operations:

* RNSBG - rotate then and selected bits
* ROSBG - rotate then or selected bits
* RXSBG - rotate then exclusive or selected bits
* RISBG - rotate then insert selected bits

It also adds the 'T' (test), 'Z' (zero), 'H' (high), 'L' (low) and 'N' (no test) variants of these instructions as appropriate.

Operands are ordered as:

    I₃, I₄, I₅, R₂, R₁

Key:

    I₃=start, I₄=end, I₅=amount, R₂=source, R₁=destination

Change-Id: I200d12287e1df7447f37f4919da5e9a93d27c792
Reviewed-on: https://go-review.googlesource.com/c/go/+/159357
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
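The four operations differ only in how the rotated, masked source is combined with the destination. The Go sketch below models that data flow on plain uint64 values. It assumes the z/Architecture convention that bit 0 is the most significant bit and that a start position greater than the end position selects a range that wraps around; condition-code setting and the 'T'/'H'/'L' variants are not modeled. The function names are hypothetical and are not part of cmd/asm.

```go
package main

import (
	"fmt"
	"math/bits"
)

// selMask returns a mask with bits start..end set, using big-endian bit
// numbering (bit 0 is the most significant bit). If start > end, the
// selected range wraps around from bit 63 back to bit 0.
func selMask(start, end uint8) uint64 {
	start &= 63
	end &= 63
	var m uint64
	for i := start; ; i = (i + 1) & 63 {
		m |= uint64(1) << (63 - i)
		if i == end {
			break
		}
	}
	return m
}

// rotSel rotates src left by amount and returns it with the selection mask.
func rotSel(start, end, amount uint8, src uint64) (uint64, uint64) {
	return bits.RotateLeft64(src, int(amount&63)), selMask(start, end)
}

// rnsbg ANDs the selected bits of the rotated source into dst.
func rnsbg(start, end, amount uint8, src, dst uint64) uint64 {
	r, m := rotSel(start, end, amount, src)
	return dst & (r | ^m)
}

// rosbg ORs the selected bits of the rotated source into dst.
func rosbg(start, end, amount uint8, src, dst uint64) uint64 {
	r, m := rotSel(start, end, amount, src)
	return dst | (r & m)
}

// rxsbg XORs the selected bits of the rotated source into dst.
func rxsbg(start, end, amount uint8, src, dst uint64) uint64 {
	r, m := rotSel(start, end, amount, src)
	return dst ^ (r & m)
}

// risbg replaces the selected bits of dst with those of the rotated source.
// With zero set (the 'Z' variant), the unselected bits are cleared instead.
func risbg(start, end, amount uint8, src, dst uint64, zero bool) uint64 {
	r, m := rotSel(start, end, amount, src)
	if zero {
		return r & m
	}
	return dst&^m | r&m
}

func main() {
	src := uint64(0x1122334455667788)
	// Rotate left by 8, then keep only bits 56..63 (the low byte).
	fmt.Printf("%#x\n", risbg(56, 63, 8, src, 0, true))
}
```

Because the mask is expressed as a bit range rather than a shift count, a single instruction can both shift and mask, which is what makes these instructions attractive for lowering shift-and-mask patterns.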
2019-04-10 | cmd/asm/internal/arch: improve the comment of function IsMIPSMUL | smileeye
A check for MADD and MSUB was added to the function IsMIPSMUL in a previous commit, so its comment should be updated as well.

Change-Id: I2d3da055d55b459b908714c542dff99ab5c6cf99
Reviewed-on: https://go-review.googlesource.com/c/go/+/171102
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-04-09 | cmd/internal/obj/x86: allow non-zero offset in TLS reference | Cherry Zhang
An instruction that references TLS, e.g.

    MOVQ 0(TLS), AX

on some platforms (e.g. Android), or in shared mode, may be translated to (assuming the TLS offset is already loaded into CX)

    MOVQ 0(CX)(TLS*1), AX

which in turn translates to

    movq %fs:(%rcx), %rax

We have been rejecting non-zero offsets in TLS references, like 16(TLS). However, the instruction can take an offset; for example, this is a valid instruction:

    movq %fs:16(%rcx),%rcx

So, allow offsets in TLS references.

Change-Id: Iaf1996bad7fe874e0c298ea441af5acb136a4028
Reviewed-on: https://go-review.googlesource.com/c/go/+/171151
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-03-29 | cmd/asm: add 'insert program mask' instruction for s390x | Michael Munday
This CL adds the 'insert program mask' (IPM) instruction to s390x. IPM stores the current program mask (which contains the condition code) into a general purpose register.

This instruction will be useful when implementing intrinsics for the arithmetic functions in the math/bits package. We can also potentially use it to convert some condition codes into bool values.

The condition code can be saved and restored using an instruction sequence such as:

    IPM  R4          // save condition code to R4
    ...
    TMLH R4, $0x3000 // restore condition code from R4

We can also use IPM to save the carry bit to a register using an instruction sequence such as:

    IPM     R4                   // save condition code to R4
    RISBLGZ $31, $31, $3, R4, R4 // isolate carry bit in R4

Change-Id: I169d450b6ea1a7ff8c0286115ddc42618da8a2f4
Reviewed-on: https://go-review.googlesource.com/c/go/+/165997
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
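To see why TMLH with mask $0x3000 restores a condition code saved by IPM, the Go sketch below models both instructions on a uint64 register. It relies on the architecture definitions as we understand them: IPM writes the condition code into bits 34-35 and the program mask into bits 36-39 (bit 0 being the most significant bit), zeroes bits 32-33 and leaves the rest unchanged; TMLH tests its mask against bits 32-47 and sets CC 0 (selected bits all zero), 1 (mixed, leftmost selected bit zero), 2 (mixed, leftmost selected bit one) or 3 (all one). The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// ipm models INSERT PROGRAM MASK: the 2-bit condition code goes into bits
// 34-35 and the 4-bit program mask into bits 36-39 (big-endian numbering),
// bits 32-33 are zeroed and all other bits of reg are left unchanged.
func ipm(reg uint64, cc, progMask uint8) uint64 {
	reg &^= 0xff << 24 // clear bits 32-39
	reg |= uint64(cc&3) << 28
	reg |= uint64(progMask&0xf) << 24
	return reg
}

// tmlh models TEST UNDER MASK (low high): mask is tested against
// bits 32-47 of reg and the resulting condition code is returned.
func tmlh(reg uint64, mask uint16) uint8 {
	half := uint16(reg >> 16)
	sel := half & mask
	switch {
	case sel == 0:
		return 0 // selected bits all zero (or mask is zero)
	case sel == mask:
		return 3 // selected bits all one
	}
	leftmost := uint16(1) << uint(15-bits.LeadingZeros16(mask))
	if half&leftmost != 0 {
		return 2 // mixed, leftmost selected bit is one
	}
	return 1 // mixed, leftmost selected bit is zero
}

func main() {
	for cc := uint8(0); cc < 4; cc++ {
		saved := ipm(0xdeadbeef, cc, 0x5)
		fmt.Println(cc, tmlh(saved, 0x3000))
	}
}
```

The mask $0x3000 selects exactly the two condition-code bits, and because TMLH distinguishes the two mixed cases by the leftmost selected bit, each of the four saved values maps back to the same condition code.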
2019-03-23 | cmd/internal/obj/mips: add MADD/MSUB | Ben Shi
This CL implements MADD and MSUB, which are mips32r2 instructions.

Change-Id: I06fe51573569baf3b71536336b34b95ccd24750b
Reviewed-on: https://go-review.googlesource.com/c/go/+/167680
Run-TryBot: Ben Shi <powerman1st@163.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-03-14 | cmd/internal/obj/ppc64: fix wrong register encoding in XX1-Form instructions | Carlos Eduardo Seo
A bug in the encoding of XX1-Form instructions flips bit 31 of such instructions. This may result in register clobbering when using VSX instructions. This was not exposed before because we currently don't generate these instructions in SSA, and the asm files in which they are present aren't affected by register clobbering.

This change fixes the bug and adds a testcase for the problem.

Fixes #30112

Change-Id: I77b606159ae1efea33d2ba3e1c74b7fae8d5d2e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/163759
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-03-06 | cmd/asm: add arm64 v8.1 atomic instructions | erifan01
This change adds several arm64 v8.1 atomic instructions and test cases. They are LDADDAx, LDADDLx, LDANDAx, LDANDALx, LDANDLx, LDEORAx, LDEORALx, LDEORLx, LDORAx, LDORALx, LDORLx, SWPAx and SWPLx. Their form is consistent with the form of the existing atomic instructions.

For instructions STXRx, STLXRx, STXPx and STLXPx, the second destination register can't be RSP. This CL also adds a check for this.

    LDADDx Rs, (Rb), Rt: *Rb -> Rt, Rs + *Rb -> *Rb
    LDANDx Rs, (Rb), Rt: *Rb -> Rt, Rs AND NOT(*Rb) -> *Rb
    LDEORx Rs, (Rb), Rt: *Rb -> Rt, Rs EOR *Rb -> *Rb
    LDORx Rs, (Rb), Rt: *Rb -> Rt, Rs OR *Rb -> *Rb

Change-Id: I9f9b0245958cb57ab7d88c66fb9159b23b9017fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/157001
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
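The load-and-operate data flow listed above (the old memory value goes to Rt, the combined value goes back to memory) can be modeled in Go with sync/atomic. In the hypothetical sketch below, each helper atomically updates *p and returns the value *p held before the update. It covers only LDADD, LDEOR and LDOR, and says nothing about instruction encoding or the acquire/release (A/L/AL) ordering variants.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// ldadd models LDADDx Rs, (Rb), Rt: *Rb -> Rt, Rs + *Rb -> *Rb.
func ldadd(p *uint64, rs uint64) uint64 {
	return atomic.AddUint64(p, rs) - rs // AddUint64 returns the new value
}

// update atomically replaces *p with op(*p) using a compare-and-swap
// loop and returns the previous value.
func update(p *uint64, op func(uint64) uint64) uint64 {
	for {
		old := atomic.LoadUint64(p)
		if atomic.CompareAndSwapUint64(p, old, op(old)) {
			return old
		}
	}
}

// ldeor models LDEORx Rs, (Rb), Rt: *Rb -> Rt, Rs EOR *Rb -> *Rb.
func ldeor(p *uint64, rs uint64) uint64 {
	return update(p, func(v uint64) uint64 { return v ^ rs })
}

// ldor models LDORx Rs, (Rb), Rt: *Rb -> Rt, Rs OR *Rb -> *Rb.
func ldor(p *uint64, rs uint64) uint64 {
	return update(p, func(v uint64) uint64 { return v | rs })
}

func main() {
	mem := uint64(5)
	rt := ldadd(&mem, 3)
	fmt.Println(rt, mem)
}
```

On hardware with these v8.1 instructions, each of these read-modify-write operations is a single instruction, whereas the CAS loop above is what one would need without them.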
2019-02-27 | cmd/asm: improve DATA size operand validation | Josh Bleecher Snyder
Prior to this change, DATA instructions accepted the values 1, 2, 4, and 8 as sizes. The acceptable sizes were further restricted to 4 and 8 for float constants.

This was both too restrictive and not restrictive enough: string constants may reasonably have any length, and address constants should really only accept pointer-length sizes.

Fixes #30269

Change-Id: I06e44ecdf5909eca7b19553861aec1fa39655c2b
Reviewed-on: https://go-review.googlesource.com/c/163747
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
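The rule argued for above can be summarized as a small validator. The Go sketch below is hypothetical: the operand kinds, the function name and the exact checks are assumptions drawn from the text (integer sizes 1, 2, 4 or 8; float sizes 4 or 8; strings of any positive length; addresses only at pointer size), not the code in cmd/asm.

```go
package main

import "fmt"

// operandKind is a hypothetical classification of a DATA value operand.
type operandKind int

const (
	intConst operandKind = iota
	floatConst
	stringConst
	addrConst
)

// validDataSize reports whether size is acceptable for a DATA directive
// whose value operand has the given kind, on a target whose pointers are
// ptrSize bytes wide.
func validDataSize(kind operandKind, size, ptrSize int) bool {
	switch kind {
	case intConst:
		return size == 1 || size == 2 || size == 4 || size == 8
	case floatConst:
		return size == 4 || size == 8
	case stringConst:
		return size > 0 // any length
	case addrConst:
		return size == ptrSize
	}
	return false
}

func main() {
	fmt.Println(validDataSize(stringConst, 13, 8)) // a 13-byte string
	fmt.Println(validDataSize(addrConst, 4, 8))    // too small for a pointer
}
```

Making the pointer size a parameter reflects the point of the change: an address constant's valid size depends on the target architecture rather than on a fixed list.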
2019-02-22 | cmd/internal/obj/arm64: fix the bug assembling TSTW | fanzha02
The current assembler reports an error when it assembles "TSTW $1689262177517664, R3", but go1.11 assembled it fine.

Fixes #30334

Change-Id: I9c16d36717cd05df2134e8eb5b17edc385aff0a9
Reviewed-on: https://go-review.googlesource.com/c/163259
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ben Shi <powerman1st@163.com>
2018-12-11 | cmd/compile: use innermost line number for -S | Keith Randall
When functions are inlined, for instructions in the inlined body, does -S print the location of the call, or the location of the body? Right now, we do the former. I'd like to do the latter by default; it makes much more sense when reading disassembly. With mid-stack inlining enabled in more cases, this quandary will come up more often.

The original behavior is still available with -S=2. Some tests use this mode (so they can find assembly generated by a particular source line).

This helped me with understanding what the compiler was doing while fixing #29007.

Change-Id: Id14a3a41e1b18901e7c5e460aa4caf6d940ed064
Reviewed-on: https://go-review.googlesource.com/c/153241
Reviewed-by: David Chase <drchase@google.com>
2018-11-28 | cmd/asm,cmd/internal/obj/ppc64: add VPERMXOR to ppc64 assembler | Lynn Boger
VPERMXOR is missing from the Go assembler for ppc64. It has the same format as VPERM. It was requested by an external user so they could write an optimized algorithm in asm.

Change-Id: Icf4c682f7f46716ccae64e6ae3d62e8cec67f6c1
Reviewed-on: https://go-review.googlesource.com/c/151578
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2018-11-16 | cmd/asm: rename -symabis to -gensymabis | Austin Clements
Currently, both asm and compile have a -symabis flag, but in asm it's a boolean flag that means to generate a symbol ABIs file, and in the compiler it's a string flag giving the path of the symbol ABIs file to consume. I'm worried about this false symmetry biting us in the future, so rename asm's flag to -gensymabis.

Updates #27539.

Change-Id: I8b9c18a852d2838099718f8989813f19d82e7434
Reviewed-on: https://go-review.googlesource.com/c/149818
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-11-12 | cmd/asm: add mode to collect symbol ABIs | Austin Clements
This adds a -symabis flag that runs the assembler in a special mode that outputs symbol definition and reference ABIs rather than assembling the code. This uses a fast and somewhat lax parser because the go_asm.h definitions may not be available.

For #27539.

Change-Id: I248ba0ebab7cc75dcb2a90e82a82eb445da7e88e
Reviewed-on: https://go-review.googlesource.com/c/147098
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-11-12 | cmd/asm: factor out line parsing from assembling | Austin Clements
Currently cmd/asm's Parser.line both consumes a line of assembly from the lexer and assembles it. This CL separates these two steps so that the line parser can be reused for purposes other than generating a Prog stream.

For #27539.
Updates #17544.

Change-Id: I452c9a2112fbcc1c94bf909efc0d1fcc71014812
Reviewed-on: https://go-review.googlesource.com/c/147097
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-11-07 | cmd/internal/obj/arm64: encode large constants into MOVZ/MOVN and MOVK instructions | fanzha02
The current assembler gets large constants from the constant pool. This CL gets rid of the pool by using MOVZ/MOVN and MOVK to load large constants.

This CL changes the assembler behavior as follows.

1. go assembly
   1, MOVD $0x1111222233334444, R1
   2, MOVD $0x1111ffff1111ffff, R1

   previous version: MOVD 0x9a4, R1 (loads constant from pool).
   optimized version:
   1, MOVD $0x4444, R1; MOVK $(0x3333<<16), R1; MOVK $(0x2222<<32), R1; MOVK $(0x1111<<48), R1.
   2, MOVN $(0xeeee<<16), R1; MOVK $(0x1111<<48), R1.

Add test cases; the binary size comparison and benchmark results are below.

1. Binary size before/after

   binary                 size change
   pkg/linux_arm64        +25.4KB
   pkg/tool/linux_arm64   -2.9KB
   go                     -2KB
   gofmt                  no change

2. compiler benchmark.

name old time/op new time/op delta
Template 574ms ±21% 577ms ±14% ~ (p=0.853 n=10+10)
Unicode 327ms ±29% 353ms ±23% ~ (p=0.360 n=10+8)
GoTypes 1.97s ± 8% 2.04s ±11% ~ (p=0.143 n=10+10)
Compiler 9.13s ± 9% 9.25s ± 8% ~ (p=0.684 n=10+10)
SSA 29.2s ± 5% 27.0s ± 4% -7.40% (p=0.000 n=10+10)
Flate 402ms ±40% 308ms ± 6% -23.29% (p=0.004 n=10+10)
GoParser 470ms ±26% 382ms ±10% -18.82% (p=0.000 n=9+10)
Reflect 1.36s ±16% 1.17s ± 7% -13.92% (p=0.001 n=9+10)
Tar 561ms ±19% 466ms ±15% -17.08% (p=0.000 n=9+10)
XML 745ms ±20% 679ms ±20% ~ (p=0.123 n=10+10)
StdCmd 35.5s ± 6% 37.2s ± 3% +4.81% (p=0.001 n=9+8)

name old user-time/op new user-time/op delta
Template 625ms ±14% 660ms ±18% ~ (p=0.343 n=10+10)
Unicode 355ms ±10% 373ms ±20% ~ (p=0.346 n=9+10)
GoTypes 2.39s ± 8% 2.37s ± 5% ~ (p=0.897 n=10+10)
Compiler 11.1s ± 4% 11.4s ± 2% +2.63% (p=0.010 n=10+9)
SSA 35.4s ± 3% 34.9s ± 2% ~ (p=0.113 n=10+9)
Flate 402ms ±13% 371ms ±30% ~ (p=0.089 n=10+9)
GoParser 513ms ± 8% 489ms ±24% -4.76% (p=0.039 n=9+9)
Reflect 1.52s ±12% 1.41s ± 5% -7.32% (p=0.001 n=9+10)
Tar 607ms ±10% 558ms ± 8% -7.96% (p=0.009 n=9+10)
XML 828ms ±10% 789ms ±12% ~ (p=0.059 n=10+10)

name old text-bytes new text-bytes delta
HelloSize 714kB ± 0% 712kB ± 0% -0.23% (p=0.000 n=10+10)
CmdGoSize 8.26MB ± 0% 8.25MB ± 0% -0.14% (p=0.000 n=10+10)

name old data-bytes new data-bytes delta
HelloSize 10.5kB ± 0% 10.5kB ± 0% ~ (all equal)
CmdGoSize 258kB ± 0% 258kB ± 0% ~ (all equal)

name old bss-bytes new bss-bytes delta
HelloSize 125kB ± 0% 125kB ± 0% ~ (all equal)
CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal)

name old exe-bytes new exe-bytes delta
HelloSize 1.18MB ± 0% 1.18MB ± 0% ~ (all equal)
CmdGoSize 11.2MB ± 0% 11.2MB ± 0% -0.13% (p=0.000 n=10+10)

3. go1 benchmark.

name old time/op new time/op delta
BinaryTree17 6.60s ±18% 7.36s ±22% ~ (p=0.222 n=5+5)
Fannkuch11 4.04s ± 0% 4.05s ± 0% ~ (p=0.421 n=5+5)
FmtFprintfEmpty 91.8ns ±14% 91.2ns ± 9% ~ (p=0.667 n=5+5)
FmtFprintfString 145ns ± 0% 151ns ± 6% ~ (p=0.397 n=4+5)
FmtFprintfInt 169ns ± 0% 176ns ± 5% +4.14% (p=0.016 n=4+5)
FmtFprintfIntInt 229ns ± 2% 243ns ± 6% ~ (p=0.143 n=5+5)
FmtFprintfPrefixedInt 343ns ± 0% 350ns ± 3% +1.92% (p=0.048 n=5+5)
FmtFprintfFloat 400ns ± 3% 394ns ± 3% ~ (p=0.063 n=5+5)
FmtManyArgs 1.04µs ± 0% 1.05µs ± 0% +1.62% (p=0.029 n=4+4)
GobDecode 13.9ms ± 4% 13.9ms ± 5% ~ (p=1.000 n=5+5)
GobEncode 10.6ms ± 4% 10.6ms ± 5% ~ (p=0.421 n=5+5)
Gzip 567ms ± 1% 563ms ± 4% ~ (p=0.548 n=5+5)
Gunzip 60.2ms ± 1% 60.4ms ± 0% ~ (p=0.056 n=5+5)
HTTPClientServer 114µs ± 4% 108µs ± 7% ~ (p=0.095 n=5+5)
JSONEncode 18.4ms ± 2% 17.8ms ± 2% -3.06% (p=0.016 n=5+5)
JSONDecode 105ms ± 1% 103ms ± 2% ~ (p=0.056 n=5+5)
Mandelbrot200 5.48ms ± 0% 5.49ms ± 0% ~ (p=0.841 n=5+5)
GoParse 6.05ms ± 1% 6.05ms ± 2% ~ (p=1.000 n=5+5)
RegexpMatchEasy0_32 143ns ± 1% 146ns ± 4% +2.10% (p=0.048 n=4+5)
RegexpMatchEasy0_1K 499ns ± 1% 492ns ± 2% ~ (p=0.079 n=5+5)
RegexpMatchEasy1_32 137ns ± 0% 136ns ± 1% -0.73% (p=0.016 n=4+5)
RegexpMatchEasy1_1K 826ns ± 4% 823ns ± 2% ~ (p=0.841 n=5+5)
RegexpMatchMedium_32 224ns ± 5% 233ns ± 8% ~ (p=0.119 n=5+5)
RegexpMatchMedium_1K 59.6µs ± 0% 59.3µs ± 1% -0.66% (p=0.016 n=4+5)
RegexpMatchHard_32 3.29µs ± 3% 3.26µs ± 1% ~ (p=0.889 n=5+5)
RegexpMatchHard_1K 98.8µs ± 2% 99.0µs ± 0% ~ (p=0.690 n=5+5)
Revcomp 1.02s ± 1% 1.01s ± 1% ~ (p=0.095 n=5+5)
Template 135ms ± 5% 131ms ± 1% ~ (p=0.151 n=5+5)
TimeParse 591ns ± 0% 593ns ± 0% +0.20% (p=0.048 n=5+5)
TimeFormat 655ns ± 2% 607ns ± 0% -7.42% (p=0.016 n=5+4)
[Geo mean] 93.5µs 93.8µs +0.23%

name old speed new speed delta
GobDecode 55.1MB/s ± 4% 55.1MB/s ± 4% ~ (p=1.000 n=5+5)
GobEncode 72.4MB/s ± 4% 72.3MB/s ± 5% ~ (p=0.421 n=5+5)
Gzip 34.2MB/s ± 1% 34.5MB/s ± 4% ~ (p=0.548 n=5+5)
Gunzip 322MB/s ± 1% 321MB/s ± 0% ~ (p=0.056 n=5+5)
JSONEncode 106MB/s ± 2% 109MB/s ± 2% +3.16% (p=0.016 n=5+5)
JSONDecode 18.5MB/s ± 1% 18.8MB/s ± 2% ~ (p=0.056 n=5+5)
GoParse 9.57MB/s ± 1% 9.57MB/s ± 2% ~ (p=0.952 n=5+5)
RegexpMatchEasy0_32 223MB/s ± 1% 221MB/s ± 0% -1.10% (p=0.029 n=4+4)
RegexpMatchEasy0_1K 2.05GB/s ± 1% 2.08GB/s ± 2% ~ (p=0.095 n=5+5)
RegexpMatchEasy1_32 232MB/s ± 0% 234MB/s ± 1% +0.76% (p=0.016 n=4+5)
RegexpMatchEasy1_1K 1.24GB/s ± 4% 1.24GB/s ± 2% ~ (p=0.841 n=5+5)
RegexpMatchMedium_32 4.45MB/s ± 5% 4.20MB/s ± 1% -5.63% (p=0.000 n=5+4)
RegexpMatchMedium_1K 17.2MB/s ± 0% 17.3MB/s ± 1% +0.66% (p=0.016 n=4+5)
RegexpMatchHard_32 9.73MB/s ± 3% 9.83MB/s ± 1% ~ (p=0.889 n=5+5)
RegexpMatchHard_1K 10.4MB/s ± 2% 10.3MB/s ± 0% ~ (p=0.635 n=5+5)
Revcomp 249MB/s ± 1% 252MB/s ± 1% ~ (p=0.095 n=5+5)
Template 14.4MB/s ± 4% 14.8MB/s ± 1% ~ (p=0.151 n=5+5)
[Geo mean] 62.1MB/s 62.3MB/s +0.34%

Fixes #10108

Change-Id: I79038f3c4c2ff874c136053d1a2b1c8a5a9cfac5
Reviewed-on: https://go-review.googlesource.com/c/118796
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-11-06 | cmd/asm: rename R18 to R18_PLATFORM on ARM64 | Cherry Zhang
In the ARM64 ABI, R18 is the "platform register", whose use is OS specific. The OS may choose to reserve this register. In practice, it seems fine to use R18 on Linux but not on darwin (iOS).

Rename R18 to R18_PLATFORM to prevent accidental use. There is no R18 usage within the standard library (besides tests, which are updated).

Fixes #26110

Change-Id: Icef7b9549e2049db1df307a0180a3c90a12d7a84
Reviewed-on: https://go-review.googlesource.com/c/147218
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-11-03 | cmd/internal/obj/arm64: fix encoding of 32-bit negated logical instructions | Cherry Zhang
32-bit negated logical instructions (BICW, ORNW, EONW) with constants were mis-encoded, because they were missing in the cases where we handle 32-bit logical instructions. This CL adds the missing cases.

Fixes #28548

Change-Id: I3d6acde7d3b72bb7d3d5d00a9df698a72c806ad5
Reviewed-on: https://go-review.googlesource.com/c/147077
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Ben Shi <powerman1st@163.com>
Reviewed-by: Ben Shi <powerman1st@163.com>
2018-11-02 | all: use "reports whether" consistently in the few places that didn't | Brad Fitzpatrick
Go documentation style for boolean funcs is to say:

    // Foo reports whether ...
    func Foo() bool

(rather than "returns true if")

This CL also replaces 4 uses of "iff" with the same "reports whether" wording, which doesn't lose any meaning, and will prevent people from sending typo fixes when they don't realize it's "if and only if". In the past I think we've had the typo CLs updated to just say "reports whether". So do them all at once.

(Inspired by the addition of another "returns true if" in CL 146938 in fd_plan9.go)

Created with:

$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true iff" | grep -v vendor)
$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true if" | grep -v vendor)

Change-Id: Ided502237f5ab0d25cb625dbab12529c361a8b9f
Reviewed-on: https://go-review.googlesource.com/c/147037
Reviewed-by: Ian Lance Taylor <iant@golang.org>
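For illustration, a boolean function documented in the recommended style might look like this; IsASCIIDigit is a hypothetical example, not a function in the standard library.

```go
package main

import "fmt"

// IsASCIIDigit reports whether b is an ASCII decimal digit.
func IsASCIIDigit(b byte) bool {
	return '0' <= b && b <= '9'
}

func main() {
	fmt.Println(IsASCIIDigit('7'), IsASCIIDigit('x'))
}
```

The comment starts with the function name and states the condition under which the result is true, so it reads the same whether the result is true or false.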
2018-10-29 | cmd/asm: add s390x VMSLG instruction variants | bill_ofarrell
VMSLG has three variants on z14 and later machines. These variants are used in "limbified" squaring:

VMSLEG: Even Shift Indication -- the even-indexed intermediate result is doubled
VMSLOG: Odd Shift Indication -- the odd-indexed intermediate result is doubled
VMSLEOG: Even and Odd Shift Indication -- both intermediate results are doubled

Limbified squaring is very useful for high-performance cryptographic algorithms, such as elliptic curve cryptography. This change allows these instructions to be used in Go assembly.

Change-Id: Iaad577b07320205539f99b3cb37a2a984882721b
Reviewed-on: https://go-review.googlesource.com/c/145180
Reviewed-by: Michael Munday <mike.munday@ibm.com>
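As a rough model of what the shift indications do, the Go sketch below computes a VMSLG-style result on two-element vectors of uint64: the even (index 0) and odd (index 1) products are formed as 128-bit values, optionally doubled, and summed with a 128-bit addend. It assumes the sum wraps modulo 2^128 and models nothing else about the vector facility; the type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// u128 is a 128-bit unsigned value.
type u128 struct{ hi, lo uint64 }

// add128 returns a+b modulo 2^128.
func add128(a, b u128) u128 {
	lo, carry := bits.Add64(a.lo, b.lo, 0)
	hi, _ := bits.Add64(a.hi, b.hi, carry)
	return u128{hi, lo}
}

// mul128 returns the full 128-bit product of a and b.
func mul128(a, b uint64) u128 {
	hi, lo := bits.Mul64(a, b)
	return u128{hi, lo}
}

// vmslg models VMSLG and its shift-indication variants: evenShift
// corresponds to VMSLEG, oddShift to VMSLOG, and both to VMSLEOG.
func vmslg(a, b [2]uint64, c u128, evenShift, oddShift bool) u128 {
	even := mul128(a[0], b[0])
	odd := mul128(a[1], b[1])
	if evenShift {
		even = add128(even, even) // doubling = shift left by one bit
	}
	if oddShift {
		odd = add128(odd, odd)
	}
	return add128(add128(even, odd), c)
}

func main() {
	a, b := [2]uint64{3, 5}, [2]uint64{7, 11}
	fmt.Println(vmslg(a, b, u128{}, false, false)) // 3*7 + 5*11
	fmt.Println(vmslg(a, b, u128{}, true, true))   // 2*(3*7) + 2*(5*11)
}
```

Doubling is what squaring needs: when squaring a multi-limb number, each cross product a[i]*a[j] with i != j appears twice, so folding the factor of two into the multiply-sum saves a separate shift or add.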
2018-10-23 | cmd/asm/internal,cmd/internal/obj/ppc64: add alignment directive to asm for ppc64x | Lynn Boger
This adds support for an alignment directive that can be used within Go asm to indicate preferred code alignment for ppc64x. This is intended to be used with loops to improve performance.

This change only adds the directive and aligns the code based on it. Follow-up changes will modify asm functions for ppc64x that benefit from preferred alignment.

Fixes #14935

Here is one example of the improvement in memmove when the directive is used on the loops in the code:

Memmove/64 8.74ns ± 0% 8.64ns ± 0% -1.19% (p=0.000 n=8+8)
Memmove/128 11.5ns ± 0% 11.0ns ± 0% -4.35% (p=0.000 n=8+8)
Memmove/256 23.0ns ± 0% 15.3ns ± 0% -33.48% (p=0.000 n=8+8)
Memmove/512 31.7ns ± 0% 31.8ns ± 0% +0.32% (p=0.000 n=8+8)
Memmove/1024 52.3ns ± 0% 43.9ns ± 0% -16.10% (p=0.000 n=8+8)
Memmove/2048 93.2ns ± 0% 76.2ns ± 0% -18.24% (p=0.000 n=8+8)
Memmove/4096 174ns ± 0% 141ns ± 0% -18.97% (p=0.000 n=8+8)

Change-Id: I200d77e923dd5d78c22fe3f8eb142a8fbaff57bf
Reviewed-on: https://go-review.googlesource.com/c/144218
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
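Conceptually, honoring an alignment directive means padding with NOPs until the PC is a multiple of the requested alignment. The Go sketch below computes that padding for the fixed 4-byte instruction size of ppc64x. The function name is hypothetical, and the sketch ignores any restrictions the assembler places on which alignment values are allowed.

```go
package main

import "fmt"

const instSize = 4 // ppc64x instructions are 4 bytes

// nopsToAlign returns how many 4-byte NOPs must be emitted at pc so that
// the next instruction starts at a multiple of align (a power of two >= 4).
func nopsToAlign(pc, align int64) int64 {
	pad := (align - pc%align) % align
	return pad / instSize
}

func main() {
	fmt.Println(nopsToAlign(0x104, 16)) // 0x104 -> 0x110
	fmt.Println(nopsToAlign(0x120, 32)) // already aligned
}
```

Placing the padding before a loop head means it is executed once on entry, while every iteration benefits from the loop starting on an aligned boundary.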
2018-10-22 | cmd/internal/obj/arm64: reclassify 32-bit/64-bit constants | fanzha02
The current assembler saves constants in Offset, whose type is int64, so 32-bit constants get an incorrect class. This CL reclassifies constants when the opcodes are 32-bit variants, like MOVW, ANDW and ADDW. In addition, this CL encodes some constants of the ADDCON class as MOV instructions.

This CL changes the assembler behavior as follows.

1. go assembly: ADDW $MOVCON, Rn, Rd
   previous version: MOVD $MOVCON, Rtmp; ADDW Rtmp, Rn, Rd
   current version: MOVW $MOVCON, Rtmp; ADDW Rtmp, Rn, Rd

2. go assembly: MOVW $0xaaaaffff, R1
   previous version: treats $0xaaaaffff as VCON and encodes it as MOVW 0x994, R1 (loads it from the pool).
   current version: treats $0xaaaaffff as MOVCON and encodes it into MOVW instructions.

3. go assembly: MOVD $0x210000, R1
   previous version: treats $0x210000 as ADDCON and loads it from the pool.
   current version: treats $0x210000 as MOVCON and encodes it into MOVD instructions.

Add the test cases.

1. Binary size before/after.

   binary                 size change
   pkg/linux_arm64        -1.534KB
   pkg/tool/linux_arm64   -0.718KB
   go                     -0.32KB
   gofmt                  no change

2. go1 benchmark result.
name old time/op new time/op delta
BinaryTree17-8 6.26s ± 1% 6.28s ± 1% ~ (p=0.105 n=10+10)
Fannkuch11-8 5.40s ± 0% 5.39s ± 0% -0.29% (p=0.028 n=9+10)
FmtFprintfEmpty-8 94.5ns ± 0% 95.0ns ± 0% +0.51% (p=0.000 n=10+9)
FmtFprintfString-8 163ns ± 1% 159ns ± 1% -2.06% (p=0.000 n=10+9)
FmtFprintfInt-8 200ns ± 1% 196ns ± 1% -1.99% (p=0.000 n=9+10)
FmtFprintfIntInt-8 292ns ± 3% 284ns ± 1% -2.87% (p=0.001 n=10+9)
FmtFprintfPrefixedInt-8 422ns ± 1% 420ns ± 1% -0.59% (p=0.015 n=10+10)
FmtFprintfFloat-8 458ns ± 0% 463ns ± 1% +1.19% (p=0.000 n=9+10)
FmtManyArgs-8 1.37µs ± 1% 1.35µs ± 1% -1.85% (p=0.000 n=10+10)
GobDecode-8 15.5ms ± 1% 15.3ms ± 1% -1.82% (p=0.000 n=10+10)
GobEncode-8 11.7ms ± 5% 11.7ms ± 2% ~ (p=0.549 n=10+9)
Gzip-8 622ms ± 0% 624ms ± 0% +0.23% (p=0.000 n=10+9)
Gunzip-8 73.6ms ± 0% 73.8ms ± 1% ~ (p=0.077 n=9+9)
HTTPClientServer-8 115µs ± 1% 115µs ± 1% ~ (p=0.796 n=10+10)
JSONEncode-8 31.1ms ± 2% 28.7ms ± 1% -7.98% (p=0.000 n=10+9)
JSONDecode-8 145ms ± 0% 145ms ± 1% ~ (p=0.447 n=9+10)
Mandelbrot200-8 9.67ms ± 0% 9.60ms ± 0% -0.76% (p=0.000 n=9+9)
GoParse-8 7.56ms ± 1% 7.58ms ± 0% +0.21% (p=0.035 n=10+9)
RegexpMatchEasy0_32-8 208ns ±10% 222ns ± 0% ~ (p=0.531 n=10+6)
RegexpMatchEasy0_1K-8 699ns ± 4% 694ns ± 4% ~ (p=0.868 n=10+10)
RegexpMatchEasy1_32-8 186ns ± 8% 190ns ±12% ~ (p=0.955 n=10+10)
RegexpMatchEasy1_1K-8 1.13µs ± 1% 1.05µs ± 2% -6.64% (p=0.000 n=10+10)
RegexpMatchMedium_32-8 316ns ± 7% 288ns ± 1% -8.68% (p=0.000 n=10+7)
RegexpMatchMedium_1K-8 90.2µs ± 0% 85.5µs ± 2% -5.19% (p=0.000 n=10+10)
RegexpMatchHard_32-8 5.53µs ± 0% 3.90µs ± 0% -29.52% (p=0.000 n=10+10)
RegexpMatchHard_1K-8 119µs ± 0% 124µs ± 0% +4.29% (p=0.000 n=9+10)
Revcomp-8 1.07s ± 0% 1.07s ± 0% ~ (p=0.094 n=9+9)
Template-8 162ms ± 1% 160ms ± 2% ~ (p=0.089 n=10+10)
TimeParse-8 756ns ± 2% 763ns ± 1% ~ (p=0.158 n=10+10)
TimeFormat-8 758ns ± 1% 746ns ± 1% -1.52% (p=0.000 n=10+10)

name old speed new speed delta
GobDecode-8 49.4MB/s ± 1% 50.3MB/s ± 1% +1.84% (p=0.000 n=10+10)
GobEncode-8 65.6MB/s ± 5% 65.4MB/s ± 2% ~ (p=0.549 n=10+9)
Gzip-8 31.2MB/s ± 0% 31.1MB/s ± 0% -0.24% (p=0.000 n=9+9)
Gunzip-8 264MB/s ± 0% 263MB/s ± 1% ~ (p=0.073 n=9+9)
JSONEncode-8 62.3MB/s ± 2% 67.7MB/s ± 1% +8.67% (p=0.000 n=10+9)
JSONDecode-8 13.4MB/s ± 0% 13.4MB/s ± 1% ~ (p=0.508 n=9+10)
GoParse-8 7.66MB/s ± 1% 7.64MB/s ± 0% -0.23% (p=0.049 n=10+9)
RegexpMatchEasy0_32-8 154MB/s ± 9% 143MB/s ± 3% ~ (p=0.303 n=10+7)
RegexpMatchEasy0_1K-8 1.46GB/s ± 4% 1.47GB/s ± 4% ~ (p=0.912 n=10+10)
RegexpMatchEasy1_32-8 172MB/s ± 9% 170MB/s ±12% ~ (p=0.971 n=10+10)
RegexpMatchEasy1_1K-8 908MB/s ± 1% 972MB/s ± 2% +7.12% (p=0.000 n=10+10)
RegexpMatchMedium_32-8 3.17MB/s ± 7% 3.46MB/s ± 1% +9.14% (p=0.000 n=10+7)
RegexpMatchMedium_1K-8 11.3MB/s ± 0% 12.0MB/s ± 2% +5.51% (p=0.000 n=10+10)
RegexpMatchHard_32-8 5.78MB/s ± 0% 8.21MB/s ± 0% +41.93% (p=0.000 n=9+10)
RegexpMatchHard_1K-8 8.62MB/s ± 0% 8.27MB/s ± 0% -4.11% (p=0.000 n=9+10)
Revcomp-8 237MB/s ± 0% 237MB/s ± 0% ~ (p=0.081 n=9+9)
Template-8 12.0MB/s ± 1% 12.1MB/s ± 2% ~ (p=0.072 n=10+10)

Change-Id: I080801f520366b42d5f9699954bd33106976a81b
Reviewed-on: https://go-review.googlesource.com/c/120661
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
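The constant classes used in this change can be illustrated with a simplified classifier. The Go sketch assumes these definitions: a MOVCON is a value that a single MOVZ or MOVN can produce at the given operand width (at most one 16-bit halfword differs from all-zeros, or at most one differs from all-ones), and an ADDCON is a 12-bit unsigned immediate, optionally shifted left by 12. The function names are hypothetical, and the real assembler distinguishes more classes than this.

```go
package main

import "fmt"

// oneHalfword reports whether at most one 16-bit halfword of the low
// width bits of v is non-zero.
func oneHalfword(v uint64, width uint) bool {
	n := 0
	for s := uint(0); s < width; s += 16 {
		if uint16(v>>s) != 0 {
			n++
		}
	}
	return n <= 1
}

// isMOVCON reports whether v, viewed as a width-bit value (32 or 64),
// can be loaded by a single MOVZ or MOVN.
func isMOVCON(v uint64, width uint) bool {
	mask := uint64(1)<<width - 1 // all ones for width 64
	v &= mask
	return oneHalfword(v, width) || oneHalfword(^v&mask, width)
}

// isADDCON reports whether v fits ADD's 12-bit immediate, optionally
// shifted left by 12.
func isADDCON(v uint64) bool {
	return v&^0xfff == 0 || v&^(0xfff<<12) == 0
}

func main() {
	fmt.Println(isMOVCON(0xaaaaffff, 32), isMOVCON(0xaaaaffff, 64))
	fmt.Println(isMOVCON(0x210000, 64), isADDCON(0x210000))
}
```

The first line shows why the operand width matters: 0xaaaaffff is a single MOVN as a 32-bit value but needs two instructions as a 64-bit one. The second shows that 0x210000 belongs to both classes, so preferring MOVCON lets it be encoded without the literal pool.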
2018-10-10 | cmd/internal/obj/ppc64: generate float 0 more efficiently on ppc64x | Lynn Boger
This change makes use of a VSX instruction to generate the float 0 value instead of generating a constant in memory and loading it from there. In the +0 case, this uses 1 instruction instead of 2 and avoids a memory reference; in the -0 case, it uses 2 instructions but still avoids the memory reference.

Since this is done in the assembler for ppc64x, the assembler test has been updated.

Change-Id: Ief7dddcb057bfb602f78215f6947664e8c841464
Reviewed-on: https://go-review.googlesource.com/c/139420
Reviewed-by: Michael Munday <mike.munday@ibm.com>
2018-10-08 | all: fix a bunch of misspellings | Igor Zhilianin
Change-Id: I94cebca86706e072fbe3be782d3edbe0e22b9432
GitHub-Last-Rev: 8e15a40545704fb21b41a8768079f2da19341ef3
GitHub-Pull-Request: golang/go#28067
Reviewed-on: https://go-review.googlesource.com/c/140437
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-10-04 | cmd/internal/obj/arm64: simplify ADD and SUB | Ben Shi
Currently "ADD $0x123456, Rs, Rd" loads the pre-stored 0x123456 from the constant pool and uses it for the addition, costing 12 bytes in total. SUB behaves the same way.

This CL breaks it into "ADD 0x123000, Rs, Rd" + "ADD 0x000456, Rd, Rd". Both "0x123000" and "0x000456" can be encoded directly into the instruction, so 4 bytes are saved.

1. The total size of pkg/android_arm64 decreases by about 0.3KB.

2. The go1 benchmark shows little regression (excluding noise).

name old time/op new time/op delta
BinaryTree17-4 15.9s ± 0% 15.9s ± 1% +0.10% (p=0.044 n=29+29)
Fannkuch11-4 8.72s ± 0% 8.75s ± 0% +0.34% (p=0.000 n=30+24)
FmtFprintfEmpty-4 173ns ± 0% 173ns ± 0% ~ (all equal)
FmtFprintfString-4 368ns ± 0% 368ns ± 0% ~ (p=0.593 n=30+30)
FmtFprintfInt-4 417ns ± 0% 417ns ± 0% ~ (all equal)
FmtFprintfIntInt-4 673ns ± 0% 661ns ± 1% -1.70% (p=0.000 n=30+30)
FmtFprintfPrefixedInt-4 805ns ± 0% 805ns ± 0% +0.10% (p=0.011 n=30+30)
FmtFprintfFloat-4 1.09µs ± 0% 1.09µs ± 0% ~ (p=0.125 n=30+29)
FmtManyArgs-4 2.68µs ± 0% 2.68µs ± 0% +0.07% (p=0.004 n=30+30)
GobDecode-4 32.9ms ± 0% 33.2ms ± 1% +1.07% (p=0.000 n=29+29)
GobEncode-4 29.5ms ± 0% 29.6ms ± 0% +0.26% (p=0.000 n=28+28)
Gzip-4 1.38s ± 1% 1.35s ± 3% -1.94% (p=0.000 n=28+30)
Gunzip-4 139ms ± 0% 139ms ± 0% +0.10% (p=0.000 n=28+29)
HTTPClientServer-4 745µs ± 5% 742µs ± 3% ~ (p=0.405 n=28+29)
JSONEncode-4 49.5ms ± 1% 49.9ms ± 0% +0.89% (p=0.000 n=30+30)
JSONDecode-4 264ms ± 1% 264ms ± 0% +0.25% (p=0.001 n=30+30)
Mandelbrot200-4 16.6ms ± 0% 16.6ms ± 0% ~ (p=0.507 n=29+29)
GoParse-4 15.9ms ± 0% 16.0ms ± 1% +0.91% (p=0.002 n=23+30)
RegexpMatchEasy0_32-4 379ns ± 0% 379ns ± 0% ~ (all equal)
RegexpMatchEasy0_1K-4 1.31µs ± 0% 1.31µs ± 0% +0.09% (p=0.008 n=27+30)
RegexpMatchEasy1_32-4 357ns ± 0% 358ns ± 0% +0.28% (p=0.000 n=28+29)
RegexpMatchEasy1_1K-4 2.04µs ± 0% 2.04µs ± 0% ~ (p=0.850 n=30+30)
RegexpMatchMedium_32-4 587ns ± 0% 589ns ± 0% +0.33% (p=0.000 n=30+30)
RegexpMatchMedium_1K-4 162µs ± 0% 163µs ± 0% ~ (p=0.351 n=30+29)
RegexpMatchHard_32-4 9.54µs ± 0% 9.60µs ± 0% +0.59% (p=0.000 n=28+30)
RegexpMatchHard_1K-4 287µs ± 0% 287µs ± 0% +0.11% (p=0.000 n=26+29)
Revcomp-4 2.50s ± 0% 2.50s ± 0% -0.13% (p=0.012 n=28+27)
Template-4 312ms ± 1% 312ms ± 1% +0.20% (p=0.015 n=27+30)
TimeParse-4 1.68µs ± 0% 1.68µs ± 0% -0.35% (p=0.000 n=30+30)
TimeFormat-4 1.66µs ± 0% 1.64µs ± 0% -1.20% (p=0.000 n=25+29)
[Geo mean] 246µs 246µs -0.00%

name old speed new speed delta
GobDecode-4 23.3MB/s ± 0% 23.1MB/s ± 1% -1.05% (p=0.000 n=29+29)
GobEncode-4 26.0MB/s ± 0% 25.9MB/s ± 0% -0.25% (p=0.000 n=29+28)
Gzip-4 14.1MB/s ± 1% 14.4MB/s ± 3% +1.94% (p=0.000 n=27+30)
Gunzip-4 139MB/s ± 0% 139MB/s ± 0% -0.10% (p=0.000 n=28+29)
JSONEncode-4 39.2MB/s ± 1% 38.9MB/s ± 0% -0.88% (p=0.000 n=30+30)
JSONDecode-4 7.37MB/s ± 0% 7.35MB/s ± 0% -0.26% (p=0.001 n=30+30)
GoParse-4 3.65MB/s ± 0% 3.62MB/s ± 1% -0.86% (p=0.001 n=23+30)
RegexpMatchEasy0_32-4 84.3MB/s ± 0% 84.3MB/s ± 0% ~ (p=0.126 n=27+26)
RegexpMatchEasy0_1K-4 784MB/s ± 0% 783MB/s ± 0% -0.10% (p=0.003 n=27+30)
RegexpMatchEasy1_32-4 89.5MB/s ± 0% 89.3MB/s ± 0% -0.20% (p=0.000 n=27+29)
RegexpMatchEasy1_1K-4 502MB/s ± 0% 502MB/s ± 0% ~ (p=0.858 n=30+28)
RegexpMatchMedium_32-4 1.70MB/s ± 0% 1.70MB/s ± 0% -0.25% (p=0.000 n=30+30)
RegexpMatchMedium_1K-4 6.30MB/s ± 0% 6.30MB/s ± 0% ~ (all equal)
RegexpMatchHard_32-4 3.35MB/s ± 0% 3.33MB/s ± 0% -0.47% (p=0.000 n=30+30)
RegexpMatchHard_1K-4 3.57MB/s ± 0% 3.56MB/s ± 0% -0.20% (p=0.000 n=27+30)
Revcomp-4 102MB/s ± 0% 102MB/s ± 0% +0.14% (p=0.008 n=28+28)
Template-4 6.23MB/s ± 0% 6.21MB/s ± 1% -0.21% (p=0.009 n=21+30)
[Geo mean] 24.1MB/s 24.0MB/s -0.16%

Change-Id: Ifcef3edb667540e2d86e586c23afcfbc2cf1340b
Reviewed-on: https://go-review.googlesource.com/c/134536
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
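The split described in this change can be sketched as follows: a constant below 1<<24 is divided into a high part that fits the 12-bit immediate shifted left by 12 and a low 12-bit part, and each part feeds one ADD. The function name is hypothetical, and the sketch ignores negative values and the SUB form.

```go
package main

import "fmt"

// splitAddImm splits v into hi (a multiple of 0x1000 below 1<<24) and lo
// (below 0x1000) with hi+lo == v, so that "ADD $v, Rs, Rd" can become
// "ADD $hi, Rs, Rd; ADD $lo, Rd, Rd". ok is false if v needs more than 24 bits.
func splitAddImm(v uint64) (hi, lo uint64, ok bool) {
	if v >= 1<<24 {
		return 0, 0, false
	}
	return v &^ 0xfff, v & 0xfff, true
}

func main() {
	hi, lo, ok := splitAddImm(0x123456)
	fmt.Printf("%#x %#x %v\n", hi, lo, ok) // 0x123000 0x456 true
}
```

Two 4-byte ADDs replace a 4-byte literal load, the pool entry and the ADD itself, which is where the 4-byte saving per occurrence comes from.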
2018-10-03 | all: this big patch removes whitespace from assembly files | Zhou Peng
Don't worry, this patch just removes trailing whitespace from assembly files and makes no logical changes.

Change-Id: Ia724ac0b1abf8bc1e41454bdc79289ef317c165d
Reviewed-on: https://go-review.googlesource.com/c/113595
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-09-12 | cmd/internal/obj/arm64: add error report for invalid base register | fanzha02
The current assembler accepts a non-integer register as the base register, which should be an illegal combination.

Add the test cases.

Change-Id: Ia21596bbb5b1e212e34bd3a170748ae788860422
Reviewed-on: https://go-review.googlesource.com/134575
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-09-06 | cmd/internal/obj/arm64: add CONSTRAINED UNPREDICTABLE behavior check for some load/store | fanzha02
According to the ARM64 manual, the behavior is "constrained unpredictable" if the source and destination registers of some load/store instructions are the same. To prevent such unpredictable behavior completely, this CL adds the check to the assembler for the affected load/store instructions that it supports.

Add test cases.

Update #25823

Change-Id: I64c14ad99ee543d778e7ec8ae6516a532293dbb3
Reviewed-on: https://go-review.googlesource.com/120660
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
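A check of this kind might look like the Go sketch below. It encodes two rules from the Arm architecture as we understand them: a load with base-register writeback whose destination is also the base register (other than SP), and a load pair whose two destinations are the same register. Which instructions the assembler actually checks is not listed in this message, so the helper, its parameters and its rules are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// checkLoad returns an error for CONSTRAINED UNPREDICTABLE register
// combinations in a hypothetical load description. Registers are numbered
// 0-31, with 31 meaning SP when used as the base. rt2 < 0 means the load
// has a single destination register.
func checkLoad(rt, rt2, rn int, writeback bool) error {
	if rt2 >= 0 && rt == rt2 {
		return errors.New("load pair with identical destination registers")
	}
	if writeback && rn != 31 && (rt == rn || rt2 == rn) {
		return errors.New("writeback base register is also a destination")
	}
	return nil
}

func main() {
	fmt.Println(checkLoad(1, 1, 2, false)) // like LDP (R2), (R1, R1)
	fmt.Println(checkLoad(3, -1, 3, true)) // like MOVD.P 8(R3), R3
	fmt.Println(checkLoad(1, 2, 31, true)) // SP base: allowed
}
```

Rejecting these combinations at assembly time turns what would otherwise be silently hardware-dependent behavior into an explicit error.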
2018-09-05 | cmd/internal/obj/arm64: encode float constants into FMOVS/FMOVD instructions | fanzha02
The current assembler rewrites float constants other than 0.0 into values stored in memory, which is not performant.

This patch uses the FMOVS/FMOVD instructions to move floating-point constants that fit the instructions' immediate field directly into SIMD&FP destination registers. Whether a constant can be encoded into FMOVS/FMOVD is checked by the chipfloat7() function.

go1 benchmark results.

name old time/op new time/op delta
BinaryTree17-8 6.27s ± 1% 6.27s ± 1% ~ (p=0.762 n=10+8)
Fannkuch11-8 5.42s ± 1% 5.38s ± 0% -0.63% (p=0.000 n=10+10)
FmtFprintfEmpty-8 92.9ns ± 1% 93.4ns ± 0% +0.47% (p=0.004 n=9+8)
FmtFprintfString-8 169ns ± 2% 170ns ± 4% ~ (p=0.378 n=10+10)
FmtFprintfInt-8 197ns ± 1% 196ns ± 1% -0.77% (p=0.009 n=10+9)
FmtFprintfIntInt-8 284ns ± 1% 286ns ± 1% ~ (p=0.051 n=10+10)
FmtFprintfPrefixedInt-8 419ns ± 0% 422ns ± 1% +0.69% (p=0.038 n=6+10)
FmtFprintfFloat-8 458ns ± 0% 463ns ± 1% +1.14% (p=0.000 n=10+10)
FmtManyArgs-8 1.35µs ± 2% 1.36µs ± 1% +0.91% (p=0.043 n=10+10)
GobDecode-8 16.0ms ± 2% 15.5ms ± 1% -3.39% (p=0.000 n=10+10)
GobEncode-8 11.9ms ± 3% 11.4ms ± 1% -3.98% (p=0.000 n=10+9)
Gzip-8 621ms ± 0% 625ms ± 0% +0.59% (p=0.000 n=9+10)
Gunzip-8 74.0ms ± 1% 74.3ms ± 0% ~ (p=0.059 n=9+8)
HTTPClientServer-8 116µs ± 1% 116µs ± 1% ~ (p=0.165 n=10+10)
JSONEncode-8 29.3ms ± 1% 29.5ms ± 0% +0.72% (p=0.001 n=10+10)
JSONDecode-8 145ms ± 1% 148ms ± 2% +2.06% (p=0.000 n=10+10)
Mandelbrot200-8 9.67ms ± 0% 9.48ms ± 1% -1.92% (p=0.000 n=8+10)
GoParse-8 7.55ms ± 0% 7.60ms ± 0% +0.57% (p=0.000 n=9+10)
RegexpMatchEasy0_32-8 234ns ± 0% 210ns ± 0% -10.13% (p=0.000 n=8+10)
RegexpMatchEasy0_1K-8 753ns ± 1% 729ns ± 0% -3.17% (p=0.000 n=10+8)
RegexpMatchEasy1_32-8 225ns ± 0% 224ns ± 0% -0.44% (p=0.000 n=9+9)
RegexpMatchEasy1_1K-8 1.03µs ± 0% 1.04µs ± 1% +1.29% (p=0.000 n=10+10)
RegexpMatchMedium_32-8 320ns ± 3% 296ns ± 6% -7.50% (p=0.000 n=10+10)
RegexpMatchMedium_1K-8 77.0µs ± 5% 73.6µs ± 1% ~ (p=0.393 n=10+10)
RegexpMatchHard_32-8 3.93µs ± 0% 3.89µs ± 1% -0.95% (p=0.000 n=10+9)
RegexpMatchHard_1K-8 120µs ± 5% 115µs ± 1% ~ (p=0.739 n=10+10) Revcomp-8 1.07s ± 0% 1.08s ± 1% +0.63% (p=0.000 n=10+9) Template-8 165ms ± 1% 163ms ± 1% -1.05% (p=0.001 n=8+10) TimeParse-8 751ns ± 1% 749ns ± 1% ~ (p=0.209 n=10+10) TimeFormat-8 759ns ± 1% 751ns ± 1% -0.96% (p=0.001 n=10+10) name old speed new speed delta GobDecode-8 48.0MB/s ± 2% 49.6MB/s ± 1% +3.50% (p=0.000 n=10+10) GobEncode-8 64.5MB/s ± 3% 67.1MB/s ± 1% +4.08% (p=0.000 n=10+9) Gzip-8 31.2MB/s ± 0% 31.1MB/s ± 0% -0.55% (p=0.000 n=9+8) Gunzip-8 262MB/s ± 1% 261MB/s ± 0% ~ (p=0.059 n=9+8) JSONEncode-8 66.3MB/s ± 1% 65.8MB/s ± 0% -0.72% (p=0.001 n=10+10) JSONDecode-8 13.4MB/s ± 1% 13.2MB/s ± 1% -2.02% (p=0.000 n=10+10) GoParse-8 7.67MB/s ± 0% 7.63MB/s ± 0% -0.57% (p=0.000 n=9+10) RegexpMatchEasy0_32-8 136MB/s ± 0% 152MB/s ± 0% +11.45% (p=0.000 n=10+10) RegexpMatchEasy0_1K-8 1.36GB/s ± 1% 1.40GB/s ± 0% +3.25% (p=0.000 n=10+8) RegexpMatchEasy1_32-8 142MB/s ± 0% 143MB/s ± 0% +0.35% (p=0.000 n=10+9) RegexpMatchEasy1_1K-8 992MB/s ± 0% 980MB/s ± 1% -1.27% (p=0.000 n=10+10) RegexpMatchMedium_32-8 3.12MB/s ± 3% 3.38MB/s ± 6% +8.17% (p=0.000 n=10+10) RegexpMatchMedium_1K-8 13.3MB/s ± 5% 13.9MB/s ± 1% ~ (p=0.362 n=10+10) RegexpMatchHard_32-8 8.14MB/s ± 0% 8.21MB/s ± 1% +0.95% (p=0.000 n=10+9) RegexpMatchHard_1K-8 8.54MB/s ± 5% 8.90MB/s ± 1% ~ (p=0.636 n=10+10) Revcomp-8 238MB/s ± 0% 236MB/s ± 1% -0.63% (p=0.000 n=10+9) Template-8 11.8MB/s ± 1% 11.9MB/s ± 1% +1.07% (p=0.001 n=8+10) Change-Id: I57b372d8dcd47e6aec39893843b20385d5d9c37e Reviewed-on: https://go-review.googlesource.com/129555 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
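The kind of check chipfloat7() performs can be pictured with the sketch below. It is a hypothetical Go reimplementation, not the assembler's code, and it assumes the standard ARMv8 8-bit floating-point immediate format: a sign bit, a 3-bit exponent covering 2^-3 through 2^4, and a 4-bit fraction, so the representable values are ±(16+f)/16 × 2^e. Zero has no encoding in this format, which fits with 0.0 being handled separately.

```go
package main

import (
	"fmt"
	"math"
)

// fmovImm8 is a hypothetical helper: it reports whether f can be encoded as
// an ARMv8 FMOV 8-bit floating-point immediate and, if so, returns the
// encoded byte (sign in bit 7, exponent in bits 6-4, fraction in bits 3-0).
func fmovImm8(f float64) (uint8, bool) {
	b := math.Float64bits(f)
	sign := b >> 63
	exp := int(b>>52&0x7ff) - 1023 // unbiased exponent
	frac := b & (1<<52 - 1)

	// Only the top 4 of the 52 fraction bits may be set.
	if frac&(1<<48-1) != 0 {
		return 0, false
	}
	// The 3-bit exponent field covers 2^-3 .. 2^4. This also rejects zero,
	// subnormals, infinities and NaNs, whose exponents fall outside.
	if exp < -3 || exp > 4 {
		return 0, false
	}
	// The exponent field stores exp+3 with its top bit inverted.
	e := uint8((exp+3)^4) << 4
	return uint8(sign<<7) | e | uint8(frac>>48), true
}

func main() {
	for _, f := range []float64{1.0, 0.5, 31.0, -1.0, 0.1, 0.0} {
		imm, ok := fmovImm8(f)
		fmt.Printf("%v: ok=%v imm=0x%02x\n", f, ok, imm)
	}
}
```

With this format, 1.0 encodes as 0x70 and 31.0 (the largest magnitude) as 0x3f, while a value such as 0.1 needs more fraction bits than the immediate provides and would still be loaded from memory.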
2018-09-04 · cmd/internal/obj/arm64: support more atomic instructions · Ben Shi
LDADDALD (64-bit) and LDADDALW (32-bit) are already supported. This CL adds support for LDADDALH (16-bit) and LDADDALB (8-bit). Change-Id: I4eac61adcec226d618dfce88618a2b98f5f1afe7 Reviewed-on: https://go-review.googlesource.com/132135 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-03 · cmd/compile: implement OnesCount{8,16,32,64} intrinsics on s390x · Michael Munday
This CL implements the math/bits.OnesCount{8,16,32,64} functions as intrinsics on s390x using the 'population count' (popcnt) instruction. This instruction was released as the 'population-count' facility, which shares facility bit 45 with the 'distinct-operands' facility, a prerequisite for Go on s390x. We can therefore use it without a feature check.

The s390x popcnt instruction treats a 64-bit register as a vector of 8 bytes and counts the ones in each byte individually, writing each count to the corresponding byte of the output register. Therefore, to implement OnesCount{16,32,64} we need to sum the individual byte counts using some extra instructions. To do this efficiently I've added some pseudo operations to the s390x SSA backend.

Unlike on other architectures, the new instruction sequence is faster for OnesCount8, so it is implemented using the intrinsic as well.

name         old time/op    new time/op    delta
OnesCount    3.21ns ± 1%    1.35ns ± 0%    -58.00%  (p=0.000 n=20+20)
OnesCount8   0.91ns ± 1%    0.81ns ± 0%    -11.43%  (p=0.000 n=20+20)
OnesCount16  1.51ns ± 3%    1.21ns ± 0%    -19.71%  (p=0.000 n=20+17)
OnesCount32  1.91ns ± 0%    1.12ns ± 1%    -41.60%  (p=0.000 n=19+20)
OnesCount64  3.18ns ± 4%    1.35ns ± 0%    -57.52%  (p=0.000 n=20+20)

Change-Id: Id54f0bd28b6db9a887ad12c0d72fcc168ef9c4e0 Reviewed-on: https://go-review.googlesource.com/114675 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
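The byte-wise popcnt behaviour described above, and one way to fold the eight byte counts into a single total, can be modelled in portable Go. This is a sketch: popcntBytes is a hypothetical software stand-in for the instruction, and the multiply-and-shift reduction is a common bit trick, not necessarily the instruction sequence the s390x backend emits.

```go
package main

import (
	"fmt"
	"math/bits"
)

// popcntBytes is a software stand-in for the s390x popcnt instruction as
// described above: byte i of the result holds the number of ones in byte i
// of x.
func popcntBytes(x uint64) uint64 {
	var r uint64
	for shift := uint(0); shift < 64; shift += 8 {
		r |= uint64(bits.OnesCount8(uint8(x>>shift))) << shift
	}
	return r
}

// onesCount64 adds up the eight byte counts. Multiplying by
// 0x0101010101010101 accumulates every byte lane into the top byte; each
// byte of the product holds a partial sum of at most 64, so no carries
// cross byte boundaries.
func onesCount64(x uint64) int {
	return int(popcntBytes(x) * 0x0101010101010101 >> 56)
}

func main() {
	for _, x := range []uint64{0, 0xff, 0x0123456789abcdef, ^uint64(0)} {
		fmt.Println(onesCount64(x), bits.OnesCount64(x))
	}
}
```

OnesCount32 and OnesCount16 can reuse the same reduction on a zero-extended input, because the unused upper byte counts are zero.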
2018-08-24 · cmd/internal/obj: support more arm64 FP instructions · Ben Shi
ARM64 also supports floating-point LDP (load pair) and STP (store pair). This CL adds the implementation and corresponding test cases for FLDPD/FLDPS/FSTPD/FSTPS. Change-Id: I45f112012a4e097bfaf023d029b36e6cbc7a5859 Reviewed-on: https://go-review.googlesource.com/125438 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-08-23 · all: fix typos detected by github.com/client9/misspell · Kazuhiro Sera
Change-Id: Iadb3c5de8ae9ea45855013997ed70f7929a88661 GitHub-Last-Rev: ae85bcf82be8fee533e2b9901c6133921382c70a GitHub-Pull-Request: golang/go#26920 Reviewed-on: https://go-review.googlesource.com/128955 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-08-20 · cmd/internal/obj/arm64: add register indexed FMOVS/FMOVD · Ben Shi
This CL adds register-indexed FMOVS/FMOVD:
FMOVS Fx, (Rn)(Rm)
FMOVS Fx, (Rn)(Rm<<2)
FMOVD Fx, (Rn)(Rm)
FMOVD Fx, (Rn)(Rm<<3)
FMOVS (Rn)(Rm), Fx
FMOVS (Rn)(Rm<<2), Fx
FMOVD (Rn)(Rm), Fx
FMOVD (Rn)(Rm<<3), Fx
Change-Id: Id76de6a4be96b64cf79d7e9a1962d9d49cb462f2 Reviewed-on: https://go-review.googlesource.com/123995 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
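The shift amounts in the scaled forms match the element sizes: a float32 occupies 4 bytes (1<<2) and a float64 8 bytes (1<<3), so (Rn)(Rm<<2) and (Rn)(Rm<<3) address element Rm of an array starting at Rn. The sketch below checks that correspondence against Go slices; checkScaling and byteOffset are illustrative names, not part of the assembler.

```go
package main

import (
	"fmt"
	"unsafe"
)

// byteOffset returns how many bytes p lies past base.
func byteOffset(p, base unsafe.Pointer) uintptr {
	return uintptr(p) - uintptr(base)
}

// checkScaling reports whether, for every i < n, element i of a []float32
// sits at base + i<<2 and element i of a []float64 sits at base + i<<3:
// the addresses (Rn)(Rm<<2) and (Rn)(Rm<<3) form with Rn = base, Rm = i.
// n must be at least 1.
func checkScaling(n int) bool {
	s := make([]float32, n)
	d := make([]float64, n)
	for i := 0; i < n; i++ {
		if byteOffset(unsafe.Pointer(&s[i]), unsafe.Pointer(&s[0])) != uintptr(i)<<2 {
			return false
		}
		if byteOffset(unsafe.Pointer(&d[i]), unsafe.Pointer(&d[0])) != uintptr(i)<<3 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(unsafe.Sizeof(float32(0)), unsafe.Sizeof(float64(0))) // 4 8
	fmt.Println(checkScaling(16))                                     // true
}
```

The unscaled forms (Rn)(Rm) instead treat Rm as a plain byte offset.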
2018-08-20 · cmd/internal/obj/arm64: add SWPALD/SWPALW/SWPALH/SWPALB · Ben Shi
Unlike the existing atomic SWPD/SWPW/SWPH/SWPB, these new instructions also have acquire/release semantics. Change-Id: I24821a4d21aebc342897ae52903aef612c8d8a4a Reviewed-on: https://go-review.googlesource.com/128476 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
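Acquire/release ordering is what makes an atomic swap usable as a lock: the acquire half keeps the critical section's memory accesses from being reordered before the lock is taken. The toy spinlock below shows the pattern with sync/atomic, whose operations are sequentially consistent and therefore at least as strong as acquire/release. This is an illustration only; the CL adds assembler mnemonics, and nothing here claims that sync/atomic is compiled to SWPAL*.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// spinLock is a toy lock built on an atomic swap: Lock writes 1 and inspects
// the previous value; a previous value of 0 means the lock was free.
type spinLock struct{ state uint32 }

func (l *spinLock) Lock() {
	for atomic.SwapUint32(&l.state, 1) != 0 {
		runtime.Gosched() // let the holder run before retrying
	}
}

func (l *spinLock) Unlock() { atomic.StoreUint32(&l.state, 0) }

// run increments a shared counter from several goroutines under the lock
// and returns the final count.
func run(goroutines, iters int) int {
	var l spinLock
	var wg sync.WaitGroup
	counter := 0
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				l.Lock()
				counter++
				l.Unlock()
			}
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(run(8, 1000)) // 8000
}
```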
2018-08-06 · cmd/asm/internal/arch: add package definition · Mario Arranz
Package arch had no package comment, as can be seen at https://tip.golang.org/pkg/cmd/asm/internal/arch/ Change-Id: I07653b396393a75c445d04dbae5e22e90a0d5133 GitHub-Last-Rev: a859e9410f38073853687b933f53eb6570af3216 GitHub-Pull-Request: golang/go#26817 Reviewed-on: https://go-review.googlesource.com/127929 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-08-03 · cmd/internal/obj/arm64: fix incorrect rejection of legal instructions · Ben Shi
"BFI $0, R1, $7, R2" is expected to copy bits 0-6 of R1 into R2, leaving R2's other bits unchanged, but the assembler rejects it with the error "illegal bit number". BFIW/SBFIZ/SBFIZW/UBFIZ/UBFIZW have the same problem. This CL fixes the issue and adds corresponding test cases. fixes #26736 Change-Id: Ie0090a0faa38a49dd9b096a0f435987849800b76 Reviewed-on: https://go-review.googlesource.com/127159 Run-TryBot: Ben Shi <powerman1st@163.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
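The intended effect of "BFI $0, R1, $7, R2" can be written as a mask-and-merge in Go. The sketch below is a hypothetical model of bitfield insert, assuming the operand order $lsb, Rn, $width, Rd implied by that example; it is not the assembler's code.

```go
package main

import "fmt"

// bfi models "BFI $lsb, Rn, $width, Rd": the low width bits of rn replace
// bits lsb..lsb+width-1 of rd, and all other bits of rd are kept.
// It assumes 1 <= width and lsb+width <= 64, as the instruction requires.
func bfi(rd, rn uint64, lsb, width uint) uint64 {
	mask := uint64(1)<<width - 1 // width == 64 gives all ones, since 1<<64 == 0
	return rd&^(mask<<lsb) | (rn&mask)<<lsb
}

func main() {
	// "BFI $0, R1, $7, R2" with R1 = 0x1ff and R2 = 0xff00.
	fmt.Printf("%#x\n", bfi(0xff00, 0x1ff, 0, 7)) // 0xff7f
}
```

Bit 8 of R1 is outside the 7-bit field, so it does not reach R2, and R2's bits 8-15 survive unchanged.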
2018-07-30 · cmd/internal/obj/arm64: reject incorrect form of LDP/STP · Ben Shi
The arm64 assembler silently accepts "LDP (R0), (F0, F1)" and "STP (F1, F2), (R0)" without any error message. This CL fixes that bug. fixes #26556. Change-Id: Ib6fae81956deb39a4ffd95e9409acc8dad3ab2d2 Reviewed-on: https://go-review.googlesource.com/125637 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-07-12 · doc: ArgsSizeUnknown it's defined in cmd/internal/objabi now · Xia Bin
Change-Id: I877c82788f3edbcb0b334b42049c1a06f36a6477 Reviewed-on: https://go-review.googlesource.com/123517 Reviewed-by: Rob Pike <r@golang.org>
2018-06-21 · cmd/compile: improve atomic add intrinsics with ARMv8.1 new instruction · Wei Xiao
ARMv8.1 added a new instruction (LDADDAL) for atomic memory operations. This CL improves the existing atomic add intrinsics with the new instruction. Since the new instruction is only guaranteed to be present from ARMv8.1 onward, we guard its use with a CPU feature check.

Performance result on an ARMv8.1 machine:

name        old time/op    new time/op    delta
Xadd-224    1.05µs ± 6%    0.02µs ± 4%    -98.06%  (p=0.000 n=10+8)
Xadd64-224  1.05µs ± 3%    0.02µs ±13%    -98.10%  (p=0.000 n=9+10)
[Geo mean]  1.05µs         0.02µs         -98.08%

Performance result on an ARMv8.0 machine:

name        old time/op    new time/op    delta
Xadd-46     538ns ± 1%     541ns ± 1%     +0.62%   (p=0.000 n=9+9)
Xadd64-46   505ns ± 1%     508ns ± 0%     +0.48%   (p=0.003 n=9+8)
[Geo mean]  521ns          524ns          +0.55%

Change-Id: If4b5d8d0e2d6f84fe1492a4f5de0789910ad0ee9 Reviewed-on: https://go-review.googlesource.com/81877 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
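The feature guard can be pictured as a runtime branch between two implementations. In this sketch, hasLSE is a hypothetical flag standing in for whatever CPU-feature variable is set at startup, atomic.AddUint32 stands in for the single-instruction LDADDAL path, and a compare-and-swap loop stands in for the ARMv8.0 fallback. It shows the dispatch shape only; the compiler emits this branch in generated code rather than in Go source.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// hasLSE is a hypothetical stand-in for a CPU-feature flag detected at
// startup; true means the ARMv8.1 atomic instructions are available.
var hasLSE bool

// xaddCAS is the fallback path: a compare-and-swap retry loop, analogous
// to a load-exclusive/store-exclusive loop on ARMv8.0.
func xaddCAS(p *uint32, delta uint32) uint32 {
	for {
		old := atomic.LoadUint32(p)
		if atomic.CompareAndSwapUint32(p, old, old+delta) {
			return old + delta
		}
	}
}

// xadd adds delta to *p and returns the new value. When hasLSE is set it
// takes the single-operation path, with atomic.AddUint32 standing in for
// LDADDAL.
func xadd(p *uint32, delta uint32) uint32 {
	if hasLSE {
		return atomic.AddUint32(p, delta)
	}
	return xaddCAS(p, delta)
}

func main() {
	for _, lse := range []bool{false, true} {
		hasLSE = lse
		var n uint32
		var wg sync.WaitGroup
		for i := 0; i < 4; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				for j := 0; j < 1000; j++ {
					xadd(&n, 1)
				}
			}()
		}
		wg.Wait()
		fmt.Println(lse, atomic.LoadUint32(&n)) // 4000 either way
	}
}
```

Both paths give the same result; the benchmarks above show that only the cost differs.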
2018-06-04 · cmd/asm/internal/asm: add extra AVX-512 tests · Iskander Sharipov
These tests were meant to be included in https://golang.org/cl/113315 but were lost along the way. This CL adds hand-written AVX-512 tests that complement the auto-generated test suite. They are worth including because they:
- cover every new Z-case explicitly
- check every opcode suffix encoding
Change-Id: Id6da5f58773e07bef3d532fc3ca5db391d380ebf Reviewed-on: https://go-review.googlesource.com/115858 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-06-01 · cmd/internal/obj/arm64: fix two issues in the assembler · Ben Shi
There are two issues in the arm64 assembler.
1. "CMPW $0x22220000, RSP" is encoded as 5b44a4d2ff031b6b, which is the combination of "MOVD $0x22220000, Rtmp" and "NEGSW Rtmp, ZR". The correct encoding is the combination of "MOVD $0x22220000, Rtmp" and "CMPW Rtmp, RSP".
2. "AND $0x22220000, R2, RSP" is encoded as 5b44a4d25f601b00, which is the combination of "MOVD $0x22220000, Rtmp" and an illegal instruction. The assembler should instead report an "illegal combination" error, since "AND Rtmp, RSP, RSP" is invalid in ARMv8.
This CL fixes the above two issues and adds more test cases. fixes #25557
Change-Id: Ia510be26b58a229f5dfe8a5fa0b35569b2d566e7 Reviewed-on: https://go-review.googlesource.com/114796 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
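In the CMPW case, the constant has to go through Rtmp because 0x22220000 does not fit the ARM64 arithmetic immediate form: an unsigned 12-bit value, optionally shifted left by 12. A minimal sketch of that fit test follows; addSubImmOK is a hypothetical name, and the check ignores any other rewrites the assembler may attempt. (AND uses a different, bitmask-based immediate encoding that this sketch does not cover.)

```go
package main

import "fmt"

// addSubImmOK is a hypothetical check for the ARM64 ADD/SUB/CMP immediate
// form: an unsigned 12-bit value, optionally shifted left by 12 bits.
func addSubImmOK(v uint64) bool {
	if v < 1<<12 {
		return true
	}
	return v&0xfff == 0 && v < 1<<24
}

func main() {
	for _, v := range []uint64{0xfff, 0x123000, 0x22220000} {
		fmt.Printf("%#x fits: %v\n", v, addSubImmOK(v))
	}
}
```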
2018-05-23 · cmd/internal/obj/x86: add missing Yi8 for VEX ytabs · isharipo
This change adds the Yi8 forms back to every ytab that had them before the AVX-512 patch, for backwards compatibility. The EVEX forms remain strict and unchanged, as they are not bound by any backwards-compatibility constraints. Fixes #25510 Change-Id: Icd692266010ed64c9fe47cc837afc2edf2ad2d1d Reviewed-on: https://go-review.googlesource.com/114136 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-05-22 · cmd/asm: enable AVX512 · isharipo
- Uncomment tests for the AVX512 encoder
- Permit instruction suffixes for x86
- Permit limited register list [reg-reg] syntax on x86 for multi-source ops
- EVEX encoding support in obj/x86 (Z-cases, asmevex, etc.)
- optabs and ytabs generated by x86avxgen (https://golang.org/cl/107216)
Note: suffix formatting is implemented with an updated CConv function. Each arch asm backend should now register its formatting function by calling RegisterOpSuffix.
Updates #22779
Change-Id: I076a167ee49582700e058c56ad74e6696710c8c8 Reviewed-on: https://go-review.googlesource.com/113315 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-05-17 · cmd/internal/obj/x86: fix VPERMQ and VPERMPD ytab · isharipo
Fixes the invalid encoding of VPERMQ and VPERMPD with a negative immediate argument. Fixes #25418 Updates #25420 Change-Id: Idd8180c4c632a76b76f3a68efd5f930d94431994 Reviewed-on: https://go-review.googlesource.com/113615 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
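A negative immediate is meaningful here because VPERMQ and VPERMPD only use the low 8 bits of the immediate, as four 2-bit source-lane selectors. The sketch below decodes an immediate under the assumption that it is truncated to its low byte in two's complement, so $-1 selects lane 3 for every destination lane and $-28 (0xe4) is the identity permutation. vpermqLanes is a hypothetical helper, not assembler code.

```go
package main

import "fmt"

// vpermqLanes is a hypothetical helper that decodes a VPERMQ/VPERMPD
// immediate: destination lane i takes source lane (imm >> 2*i) & 3.
// Only the low 8 bits are used, so negative immediates wrap in two's
// complement (-1 becomes 0xff).
func vpermqLanes(imm int64) [4]int {
	b := uint8(imm)
	var lanes [4]int
	for i := range lanes {
		lanes[i] = int(b>>uint(2*i)) & 3
	}
	return lanes
}

func main() {
	fmt.Println(vpermqLanes(-1))   // [3 3 3 3]
	fmt.Println(vpermqLanes(-28))  // [0 1 2 3]
	fmt.Println(vpermqLanes(0x1b)) // [3 2 1 0]
}
```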
2018-05-17 · cmd/asm/internal/asm/testdata: convert CRLF to LF line ending · Zhou Peng
Change-Id: Icbff14b52e040826bc6de704942ff2f8e0164e3e Reviewed-on: https://go-review.googlesource.com/113596 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-05-14 · cmd/internal/obj/arm: fix wrong encoding of MUL · Ben Shi
The arm assembler incorrectly encodes the following instructions:
"MUL R2, R4" -> 0xe0040492 ("MUL R4, R2, R4")
"MUL R2, R4, R4" -> 0xe0040492 ("MUL R4, R2, R4")
This CL fixes that issue. fixes #25347
Change-Id: I883716c7bc51c5f64837ae7d81342f94540a58cb Reviewed-on: https://go-review.googlesource.com/112737 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-05-11 · cmd/internal/obj/x86: use named consts for movtab Z-cases · quasilyte
Use 0-terminated opbyte sequences for Zlit-like movtabs instead of E=0xff. movCodeFullPtr is unused (loading a full pointer is unsupported), but removing it is left to a separate CL, if it is removed at all. Passes toolstash-check. Change-Id: I28436718d93b017153de0e50e3bcec344ea4ee05 Reviewed-on: https://go-review.googlesource.com/107076 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-05-07 · cmd/internal/obj/arm64: fix illegal 4-operand instructions accepted arm64 bug · fanzha02
The current assembler accepts MUL-related instructions with 4 operands, such as "MUL R1, R2, R3, R4", which is illegal. The fix adds a field to Optab that records the expected class of the extra source operand (C_NONE, C_REG, etc.), so that the assembler can check p.From3Type against it in oplook. Test cases are added as well. Fixes #25059 Change-Id: I0656319383c460696b392197bf5960b987f8fc97 Reviewed-on: https://go-review.googlesource.com/109295 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com>
2018-05-04 · cmd/compile: add wasm architecture · Richard Musiol
This commit adds the wasm architecture to the compile command. A later commit will contain the corresponding linker changes.
Design doc: https://docs.google.com/document/d/131vjr4DH6JFnb-blm_uRdaC0_Nv3OUwjEY5qVCxCup4
The following files are generated:
- src/cmd/compile/internal/ssa/opGen.go
- src/cmd/compile/internal/ssa/rewriteWasm.go
- src/cmd/internal/obj/wasm/anames.go
Updates #18892
Change-Id: Ifb4a96a3e427aac2362a1c97967d5667450fba3b Reviewed-on: https://go-review.googlesource.com/103295 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-05-03 · cmd/internal/obj/arm64: add more atomic instructions · Ben Shi
ARMv8.1 introduced more atomic instructions. This CL adds support for them, along with corresponding test cases.
LDADD Rs, (Rb), Rt: (Rb) -> Rt, Rs+(Rb) -> (Rb)
LDAND Rs, (Rb), Rt: (Rb) -> Rt, Rs&(Rb) -> (Rb)
LDEOR Rs, (Rb), Rt: (Rb) -> Rt, Rs^(Rb) -> (Rb)
LDOR Rs, (Rb), Rt: (Rb) -> Rt, Rs|(Rb) -> (Rb)
Change-Id: Ifb9df86583c4dc54fb96274852c3b93a197045e4 Reviewed-on: https://go-review.googlesource.com/110535 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
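The fetch-and-modify semantics listed for LDADD, LDAND, LDEOR and LDOR can be modelled in portable Go with compare-and-swap loops. This sketch illustrates the table as written, not how the hardware or the assembler implements the instructions: each hypothetical helper atomically applies its operation to *p and returns the previous value, which is what lands in Rt.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// fetchAndOp atomically replaces *p with op(s, *p) and returns the old
// value, retrying the compare-and-swap if another writer got in first.
func fetchAndOp(s uint64, p *uint64, op func(s, old uint64) uint64) uint64 {
	for {
		old := atomic.LoadUint64(p)
		if atomic.CompareAndSwapUint64(p, old, op(s, old)) {
			return old
		}
	}
}

// Hypothetical helpers mirroring "LDxxx Rs, (Rb), Rt": s is Rs, p is Rb,
// and the return value is what the instruction writes to Rt.
func ldadd(s uint64, p *uint64) uint64 {
	return fetchAndOp(s, p, func(s, old uint64) uint64 { return s + old })
}

func ldand(s uint64, p *uint64) uint64 {
	return fetchAndOp(s, p, func(s, old uint64) uint64 { return s & old })
}

func ldeor(s uint64, p *uint64) uint64 {
	return fetchAndOp(s, p, func(s, old uint64) uint64 { return s ^ old })
}

func ldor(s uint64, p *uint64) uint64 {
	return fetchAndOp(s, p, func(s, old uint64) uint64 { return s | old })
}

func main() {
	x := uint64(12)
	old := ldadd(1, &x)
	fmt.Println(old, x) // 12 13
}
```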