path: root/src/cmd/asm
Age  Commit message  Author
2018-02-06  cmd/internal/obj/arm64: fix assemble add/adds/sub/subs/cmp/cmn(extended register) bug  fanzha02
The current code encodes the wrong option value in the binary. The fix reconstructs the function opxrrr() so that it does not encode the option value into the binary value when the argument is a sign- or zero-extended register. Add the relevant test cases and negative tests. Fixes #23501 Change-Id: Ie5850ead2ad08d9a235a5664869aac5051762f1f Reviewed-on: https://go-review.googlesource.com/88876 Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-01-23  cmd/internal/obj/arm64: fix assemble VLD1/VST1 bug  fanzha02
The current code misassembles VLD1/VST1 instructions with a non-zero offset. The offset is dropped silently without any error message. The cause of the misassembly is that the current code treats the argument (Rn)(Rm) as ZOREG type. The fix changes the matching rules and considers (Rn)(Rm) as ROFF type. The fix reports an error when assembling VLD1/VST1 (R8)(R13), [V1.16B]. The fix enables the ARM64Errors test. Fixes #23448 Change-Id: I3dd518b91e9960131ffb8efcb685cb8df84b70eb Reviewed-on: https://go-review.googlesource.com/87956 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-12-21  cmd/internal/obj/arm: fix wrong encoding of NMULAF/NMULAD/NMULSF/NMULSD  Ben Shi
NMULAF/NMULAD/NMULSF/NMULSD are incorrectly encoded by the arm assembler.
  Instruction            Right binary   Current wrong binary
  "NMULAF F5, F6, F7"    0xee167a45     0xee167a05
  "NMULAD F5, F6, F7"    0xee167b45     0xee167b05
  "NMULSF F5, F6, F7"    0xee167a05     0xee167a45
  "NMULSD F5, F6, F7"    0xee167b05     0xee167b45
This patch fixes this issue. fixes issue #23212 Change-Id: Ic9c203f92c34b90d6eef492a694c0e95b4d479c5 Reviewed-on: https://go-review.googlesource.com/85116 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
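The eight words listed in this commit message can be compared bit by bit. The following is only an illustrative sketch: the type and function names are made up here, and the only data used are the encodings quoted in the message. It shows that each wrong encoding differs from the right one in a single bit, and that the accumulate and subtract forms had effectively been swapped.

```go
package main

import "fmt"

// encodingFix pairs the correct word with the word the assembler used to emit.
// The type name is hypothetical; the values come from the commit message.
type encodingFix struct {
	insn         string
	right, wrong uint32
}

var fixes = []encodingFix{
	{"NMULAF F5, F6, F7", 0xee167a45, 0xee167a05},
	{"NMULAD F5, F6, F7", 0xee167b45, 0xee167b05},
	{"NMULSF F5, F6, F7", 0xee167a05, 0xee167a45},
	{"NMULSD F5, F6, F7", 0xee167b05, 0xee167b45},
}

// diffBits returns the bits that differ between the right and wrong words.
func diffBits(f encodingFix) uint32 { return f.right ^ f.wrong }

func main() {
	for _, f := range fixes {
		fmt.Printf("%-20s differing bits: %#x\n", f.insn, diffBits(f))
	}
	// The old NMULSF word is the correct NMULAF word, and vice versa.
	fmt.Println(fixes[0].right == fixes[2].wrong, fixes[2].right == fixes[0].wrong)
}
```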
2017-11-30  compiler,linker: support for DWARF inlined instances  Than McIntosh
Compiler and linker changes to support DWARF inlined instances, see https://go.googlesource.com/proposal/+/HEAD/design/22080-dwarf-inlining.md for design details. This functionality is gated via the cmd/compile option -gendwarfinl=N, where N={0,1,2}, where a value of 0 disables dwarf inline generation, a value of 1 turns on dwarf generation without tracking of formal/local vars from inlined routines, and a value of 2 enables inlines with variable tracking. Updates #22080 Change-Id: I69309b3b815d9fed04aebddc0b8d33d0dbbfad6e Reviewed-on: https://go-review.googlesource.com/75550 Run-TryBot: Than McIntosh <thanm@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-11-21  cmd/internal/obj/x86: fix /is4 encoding for VBLEND  isharipo
Fixes VBLENDVP{D/S}, VPBLENDVB encoding for /is4 imm8[7:4] encoded register operand. Explanation: `reg[r]+regrex[r]+1` will yield correct values for 8..15 reg indexes, but for 0..7 it gives `index+1` results. There was no test that used lower 8 register with /is4 encoding, so the bug passed the tests. The proper solution is to get 4th bit from regrex with a proper shift: `reg[r]|(regrex[r]<<1)`. Instead of inlining `reg[r]|(regrex[r]<<1)` expr, using new `regIndex(r)` function. Test that reproduces this issue is added to amd64enc_extra.s test suite. Bug came from https://golang.org/cl/70650. Change-Id: I846a25e88d5e6df88df9d9c3f5fe94ec55416a33 Reviewed-on: https://go-review.googlesource.com/78815 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
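The arithmetic behind this bug can be reproduced with a small model. This is a sketch under stated assumptions, not code from asm6.go: `low3` stands in for `reg[r]`, `rexFlags` stands in for `regrex[r]` and is assumed to hold the combined flag value 0x7 for registers 8..15, and `rexR` is a hypothetical name for the flag bit that supplies index bit 3. The mask in `newIndex` is part of that assumption; the commit message writes the expression without it.

```go
package main

import "fmt"

// rexR is a hypothetical name for the flag bit that carries register index bit 3.
const rexR = 0x4

// low3 models reg[r]: the low three bits of the register index.
func low3(idx int) int { return idx & 7 }

// rexFlags models regrex[r]; the value 0x7 for registers 8..15 is an assumption.
func rexFlags(idx int) int {
	if idx >= 8 {
		return 0x7
	}
	return 0
}

// oldIndex mirrors the buggy expression reg[r]+regrex[r]+1.
func oldIndex(idx int) int { return low3(idx) + rexFlags(idx) + 1 }

// newIndex takes bit 3 from the flags with a shift and ORs it onto the low bits.
func newIndex(idx int) int { return low3(idx) | ((rexFlags(idx) & rexR) << 1) }

func main() {
	for idx := 0; idx < 16; idx++ {
		fmt.Printf("reg %2d: old=%2d new=%2d\n", idx, oldIndex(idx), newIndex(idx))
	}
}
```

Under this model the old expression is right only for indexes 8..15 and off by one for 0..7, which matches the explanation in the message.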
2017-11-21  bytes: add optimized countByte for arm64  Wei Xiao
Use SIMD instructions when counting a single byte. Inspired by the runtime IndexByte implementation. Benchmark results of bytes, where 1 byte in every 8 is the one we are looking for:
  name               old time/op    new time/op    delta
  CountSingle/10-8   96.1ns ± 1%    38.8ns ± 0%    -59.64%  (p=0.000 n=9+7)
  CountSingle/32-8   172ns ± 2%     36ns ± 1%      -79.27%  (p=0.000 n=10+10)
  CountSingle/4K-8   18.2µs ± 1%    0.9µs ± 0%     -95.17%  (p=0.000 n=9+10)
  CountSingle/4M-8   18.4ms ± 0%    0.9ms ± 0%     -95.00%  (p=0.000 n=10+9)
  CountSingle/64M-8  284ms ± 4%     19ms ± 0%      -93.40%  (p=0.000 n=10+10)
  name               old speed      new speed      delta
  CountSingle/10-8   104MB/s ± 1%   258MB/s ± 0%   +147.99%  (p=0.000 n=9+10)
  CountSingle/32-8   185MB/s ± 1%   897MB/s ± 1%   +385.33%  (p=0.000 n=9+10)
  CountSingle/4K-8   225MB/s ± 1%   4658MB/s ± 0%  +1967.40%  (p=0.000 n=9+10)
  CountSingle/4M-8   228MB/s ± 0%   4555MB/s ± 0%  +1901.71%  (p=0.000 n=10+9)
  CountSingle/64M-8  236MB/s ± 4%   3575MB/s ± 0%  +1414.69%  (p=0.000 n=10+10)
Change-Id: Ifccb51b3c8658c49773fe05147c3cf3aead361e5 Reviewed-on: https://go-review.googlesource.com/71111 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-11-21  cmd/internal/obj/arm64: fix assemble msr/mrs bug  fanzha02
The argument <pstatefield> is a struct with two elements: reg is the special register and enc is the pstate field value. The current code compares values of two different types and gets an incorrect result. The fix follows the pstate field approach and creates a systemreg struct in which each system register has a value to use in the instruction. Uncomment the msr/mrs cases. Fixes #21464 Change-Id: I1bb1587ec8548f3e4bd8d5be4d7127bd10d53186 Reviewed-on: https://go-review.googlesource.com/56030 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-11-20  cmd/internal/obj/arm64: fix assemble hlt/hvc/smc/brk/clrex bug  fanzha02
When instruction has only one argument, Go parser saves the argument value into prog.From without any special handling. But assembler gets the argument value from prog.To. The fix adds special handling for CLREX and puts other instructions arguments value into prog.From. Uncomment hlt/hvc/smc/brk/dcps1/dcps2/dcps3/clrex test cases. Fixes #20765 Change-Id: I1fc0d2faafb19b537cab5a665bd4af56c3a2c925 Reviewed-on: https://go-review.googlesource.com/78275 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-11-17  cmd/internal/obj/x86: add AVX2 gather and VSIB  isharipo
Enables AVX2 gather instructions and VSIB support, which makes vm32{x,y} vm64{x,y} operands encodable. AXXX constants placed with respect to sorting order. New VEX optabs inserted near non-VEX entries to simplify potential transition to auto-generated VSIB optabs. Tests go into new AMD64 encoder test file (amd64enc_extra.s) to avoid unnecessary interactions with auto-generated "amd64enc.s". Side note: x86avxgen did not produce these instructions because x86.v0.2.csv misses them. This also explains why the x86 test suite has no AVX2 gather instruction tests. List of new instructions: VGATHERDPD VGATHERDPS VGATHERQPD VGATHERQPS VPGATHERDD VPGATHERDQ VPGATHERQD VPGATHERQQ Change-Id: Iac852f3c5016523670bd99de6bec6a48f66fb4f6 Reviewed-on: https://go-review.googlesource.com/77970 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2017-11-13  cmd/compile: record original and absolute file names for line directives  griesemer
Also, with this change, error locations don't print absolute positions in [] brackets following positions relative to line directives. To get the absolute positions as well, specify the -L flag. Fixes #22660. Change-Id: I9ecfa254f053defba9c802222874155fa12fee2c Reviewed-on: https://go-review.googlesource.com/77090 Reviewed-by: David Crawshaw <crawshaw@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>
2017-11-06  runtime: improve IndexByte for ppc64x  Carlos Eduardo Seo
This change adds a better implementation of IndexByte in asm that uses the vector registers/instructions on ppc64x.
  benchmark                  old ns/op   new ns/op   delta
  BenchmarkIndexByte/10-8    9.70        9.37        -3.40%
  BenchmarkIndexByte/32-8    10.9        10.9        +0.00%
  BenchmarkIndexByte/4K-8    254         92.8        -63.46%
  BenchmarkIndexByte/4M-8    249246      118435      -52.48%
  BenchmarkIndexByte/64M-8   10737987    7383096     -31.24%
  benchmark                  old MB/s    new MB/s    speedup
  BenchmarkIndexByte/10-8    1030.63     1067.24     1.04x
  BenchmarkIndexByte/32-8    2922.69     2928.53     1.00x
  BenchmarkIndexByte/4K-8    16065.95    44156.45    2.75x
  BenchmarkIndexByte/4M-8    16827.96    35414.21    2.10x
  BenchmarkIndexByte/64M-8   6249.67     9089.53     1.45x
Change-Id: I81dbdd620f7bb4e395ce4d1f2a14e8e91e39f9a1 Reviewed-on: https://go-review.googlesource.com/71710 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2017-11-03  cmd/internal/obj/x86: add most missing AVX1/2 insts  isharipo
This change applies x86avxgen output. https://go-review.googlesource.com/c/arch/+/66972 As an effect, many new AVX instructions are now available. One of the side-effects of this patch is sorted AXXX (A-enum) constants. Some AVX1/2 instructions still not added due to: 1. x86.csv V0.2 does not list them; 2. partially because of (1), test suite does not contain tests for these instructions. Change-Id: I90430d773974ca5c995d6950d90e2c62ec88ef47 Reviewed-on: https://go-review.googlesource.com/75490 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-11-03  cmd/internal/obj/arm: add BFC/BFI to arm's assembler  Ben Shi
BFC (Bit Field Clear) and BFI (Bit Field Insert) were introduced in ARMv6T2, and the compiler can use them to do further optimization. Change-Id: I5a3fbcd2c2400c9bf4b939da6366c854c744c27f Reviewed-on: https://go-review.googlesource.com/72891 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
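As a rough illustration of what the two operations compute, here is a minimal Go model. The function names, the parameter order and the `lsb`/`width` parameters are assumptions made for this sketch; they are not the Go assembler's operand syntax.

```go
package main

import "fmt"

// fieldMask returns a mask with `width` one-bits starting at bit `lsb`.
// It assumes 1 <= width and lsb+width <= 32.
func fieldMask(lsb, width uint) uint32 {
	return uint32((uint64(1)<<width - 1) << lsb)
}

// bfc models Bit Field Clear: zero the selected field and keep the other bits.
func bfc(x uint32, lsb, width uint) uint32 {
	return x &^ fieldMask(lsb, width)
}

// bfi models Bit Field Insert: copy the low `width` bits of src into dst at lsb.
func bfi(dst, src uint32, lsb, width uint) uint32 {
	m := fieldMask(lsb, width)
	return dst&^m | (src<<lsb)&m
}

func main() {
	fmt.Printf("%#08x\n", bfc(0xffffffff, 4, 8))
	fmt.Printf("%#08x\n", bfi(0xffffffff, 0x5a, 8, 8))
}
```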
2017-11-02  cmd/internal/obj/x86: add ADX extension  Ilya Tocar
Add support for ADX cpuid bit detection and all instructions, implied by that bit (ADOX/ADCX). They are useful for rsa and math/big in general. Change-Id: Idaa93303ead48fd18b9b3da09b3e79de2f7e2193 Reviewed-on: https://go-review.googlesource.com/74850 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-30  cmd/asm, cmd/internal/obj/s390x, math: add "test under mask" instructions  Michael Munday
Adds the following s390x test under mask (immediate) instructions: TMHH TMHL TMLH TMLL These are useful for testing bits and are already used in the math package. Change-Id: Idffb3f83b238dba76ac1e42ac6b0bf7f1d11bea2 Reviewed-on: https://go-review.googlesource.com/41092 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-30  cmd/asm, cmd/compile: optimize math.Abs and math.Copysign on s390x  Michael Munday
This change adds three new instructions:
 - LPDFR: load positive (math.Abs(x))
 - LNDFR: load negative (-math.Abs(x))
 - CPSDR: copy sign (math.Copysign(x, y))
By making use of GPR <-> FPR moves we can now compile math.Abs and math.Copysign to these instructions using SSA rules. This CL also adds new rules to merge address generation into combined load operations. This makes GPR <-> FPR move matching more reliable.
  name      old time/op   new time/op   delta
  Copysign  1.85ns ± 0%   1.40ns ± 1%   -24.65%  (p=0.000 n=8+10)
  Abs       1.58ns ± 1%   0.73ns ± 1%   -53.64%  (p=0.000 n=10+10)
The geo mean improvement for all math package benchmarks was 4.6%. Change-Id: I0cec35c5c1b3fb45243bf666b56b57faca981bc9 Reviewed-on: https://go-review.googlesource.com/73950 Run-TryBot: Michael Munday <mike.munday@ibm.com> Reviewed-by: Keith Randall <khr@golang.org>
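The three operations named in this message are all sign-bit manipulations on the IEEE 754 bit pattern, which is why GPR <-> FPR moves show up in the generic code. A minimal sketch of that bit-level view follows; the helper names are made up for illustration and this is not the math package source.

```go
package main

import (
	"fmt"
	"math"
)

// signBit is the IEEE 754 binary64 sign bit.
const signBit = 1 << 63

// loadPositive models the LPDFR result, math.Abs(x).
func loadPositive(x float64) float64 {
	return math.Float64frombits(math.Float64bits(x) &^ signBit)
}

// loadNegative models the LNDFR result, -math.Abs(x).
func loadNegative(x float64) float64 {
	return math.Float64frombits(math.Float64bits(x) | signBit)
}

// copySign models the CPSDR result, math.Copysign(x, y).
func copySign(x, y float64) float64 {
	return math.Float64frombits(math.Float64bits(x)&^signBit | math.Float64bits(y)&signBit)
}

func main() {
	fmt.Println(loadPositive(-2.5), loadNegative(2.5), copySign(3, -1))
}
```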
2017-10-30  cmd/compile,cmd/internal/obj/ppc64: make math.Abs,math.Copysign intrinsics on ppc64x  Lynn Boger
This adds support for math Abs, Copysign to be intrinsics on ppc64x. New instruction FCPSGN is added to generate fcpsgn. Some new rules are added to improve the int<->float conversions that are generated mainly due to the Float64bits and Float64frombits in the math package. PPC64.rules is also modified as suggested in the review for CL 63290. Improvements:
  benchmark                 old ns/op   new ns/op   delta
  BenchmarkAbs-16           1.12        0.69        -38.39%
  BenchmarkCopysign-16      1.30        0.93        -28.46%
  BenchmarkNextafter32-16   9.34        8.05        -13.81%
  BenchmarkFrexp-16         8.81        7.60        -13.73%
Others that used Copysign also saw smaller improvements. I attempted to make this work using rules since that seems to be preferred, but due to the use of Float64bits and Float64frombits in these functions, several rules had to be added and even then not all cases were matched. Using rules became too complicated and seemed too fragile for these. Updates #21390 Change-Id: Ia265da9a18355e08000818a4fba1a40e9e031995 Reviewed-on: https://go-review.googlesource.com/67130 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Keith Randall <khr@golang.org>
2017-10-23  cmd/internal/obj/arm64: handle global address in LDP/STP  Cherry Zhang
The addressing mode of global variable was missing, whereas the compiler may make use of it, causing "illegal combination" error. This CL adds support of that addressing mode. Fixes #22390. Change-Id: Ic8eade31aba73e6fb895f758ee7f277f8f1832ef Reviewed-on: https://go-review.googlesource.com/72610 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-17  cmd/internal/obj/arm: better solution of .S/.P/.U/.W suffix check  Ben Shi
Current suffix check is based on instruction, which is not very accurate. For example, "MOVW.S R1, R2" is valid, but "MOVW.S $0xaaaaaaaa, R1" and "MOVW.P CPSR, R9" are not. This patch fixes the above kinds of issues by checking suffix based on []optab. And also more test cases are added. fixes #20509 Change-Id: Ibad91be72c78eefa719412a83b4d44370d2202a8 Reviewed-on: https://go-review.googlesource.com/70910 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-16  cmd/asm: reject STREX with same source and destination register on ARM  Cherry Zhang
On ARM, STREX does not permit the same register used as both the source and the destination. Reject the bad instruction. The assembler also accepted special cases
  STREX R0, (R1) as STREX R0, (R1), R0
  STREX (R1), R0 as STREX R0, (R1), R0
both are illegal. Remove this special case as well. For STREXD, check that the destination is not source, and not source+1. Also check that the source register is even numbered, as required by the architecture's manual. Fixes #22268. Change-Id: I6bfde86ae692d8f1d35bd0bd7aac0f8a11ce8e22 Reviewed-on: https://go-review.googlesource.com/71190 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
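The register constraints described in this message can be written down as two small checks. This is only a sketch: the function names, the error texts and the use of plain integers for register numbers are assumptions for illustration, not the assembler's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// checkSTREX rejects a destination (status) register equal to the source register.
func checkSTREX(src, dst int) error {
	if dst == src {
		return errors.New("STREX: destination must differ from source")
	}
	return nil
}

// checkSTREXD applies the extra rules for the register-pair form:
// the source must be even numbered, and the destination must be
// neither the source nor source+1.
func checkSTREXD(src, dst int) error {
	if src%2 != 0 {
		return errors.New("STREXD: source register must be even numbered")
	}
	if dst == src || dst == src+1 {
		return errors.New("STREXD: destination overlaps the source pair")
	}
	return nil
}

func main() {
	fmt.Println(checkSTREX(0, 0))
	fmt.Println(checkSTREXD(2, 3))
	fmt.Println(checkSTREXD(2, 4))
}
```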
2017-10-13  cmd/asm: refine Go assembly for ARM64  Wei Xiao
Some ARM64-specific instructions (such as SIMD instructions) are not supported. This patch adds support for the following:
1. Extended register, e.g.: ADD Rm.<ext>[<<amount], Rn, Rd
   <ext> can have the following values: UXTB, UXTH, UXTW, UXTX, SXTB, SXTH, SXTW and SXTX
2. Arrangement for SIMD instructions, e.g.: VADDP Vm.<T>, Vn.<T>, Vd.<T>
   <T> can have the following values: B8, B16, H4, H8, S2, S4 and D2
3. Width specifier and element index for SIMD instructions, e.g.: VMOV Vn.<T>[index], Rd // MOV(to general register)
   <T> can have the following values: S and D
4. Register List, e.g.: VLD1 (Rn), [Vt1.<T>, Vt2.<T>, Vt3.<T>]
5. Register offset variant, e.g.: VLD1.P (Rn)(Rm), [Vt1.<T>, Vt2.<T>] // Rm is the post-index register
6. Go assembly for ARM64 reference manual: newly added instructions are required to have corresponding explanation items in the manual, and items for existing instructions will be added incrementally
For more information about the refinement background, please refer to the discussion (https://groups.google.com/forum/#!topic/golang-dev/rWgDxCrL4GU) This patch only adds syntax and doesn't break any assembly that already exists. Change-Id: I34e90b7faae032820593a0e417022c354a882008 Reviewed-on: https://go-review.googlesource.com/41654 Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-10  cmd/asm, cmd/internal/obj/ppc64: Fix Data Cache instructions for ppc64x  Carlos Eduardo Seo
This change fixes the implementation of Data Cache instructions for ppc64x, allowing non-zero hint field values. Change-Id: I454aac9293d069a4817ee574d5809fa1799b3216 Reviewed-on: https://go-review.googlesource.com/68670 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2017-10-06  cmd/asm, cmd/cgo, cmd/compile, cmd/cover, cmd/link: use standard -V output  Russ Cox
Also add -V=full to print a unique identifier of the specific tool being invoked. This will be used for content-based staleness. Also sort and clean up a few of the flag doc comments. Change-Id: I786fe50be0b8e5f77af809d8d2dab721185c2abd Reviewed-on: https://go-review.googlesource.com/68590 Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Crawshaw <crawshaw@golang.org>
2017-10-06  cmd/asm: add amd64 EXTRACTPS instruction  isharipo
Adds last missing SSE4 instruction. Also introduces additional ytab set 'yextractps'. See https://golang.org/cl/57470 that adds other SSE4 instructions but skips this one due to 'yextractps'. To make EXTRACTPS less "sloppy", Yu2 oclass added to forbid usage of invalid offset values in immediate operand. Part of the mission to add missing amd64 SSE4 instructions to Go asm. Change-Id: I0e67e3497054f53257dd8eb4c6268da5118b4853 Reviewed-on: https://go-review.googlesource.com/57490 TryBot-Result: Gobot Gobot <gobot@golang.org> Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> Reviewed-by: Russ Cox <rsc@golang.org>
2017-10-05  cmd/internal/obj/arm: support more ARMv6 instructions  Ben Shi
The following instructions were introduced in ARMv6, and the compiler can do more optimization with them.
1. "MOVBS Rm@>i, Rd": rotates Rm 0/8/16/24 bits, does signed byte extension to word, and writes the result to Rd.
2. "MOVHS Rm@>i, Rd": rotates Rm 0/8/16/24 bits, does signed halfword extension to word, and writes the result to Rd.
3. "MOVBU Rm@>i, Rd": rotates Rm 0/8/16/24 bits, does unsigned byte extension to word, and writes the result to Rd.
4. "MOVHU Rm@>i, Rd": rotates Rm 0/8/16/24 bits, does unsigned half-word extension to word, and writes the result to Rd.
5. "XTAB Rm@>i, Rn, Rd": rotates Rm 0/8/16/24 bits, does signed byte extension to word, adds Rn, and writes the result to Rd.
6. "XTAH Rm@>i, Rn, Rd": rotates Rm 0/8/16/24 bits, does signed half-word extension to word, adds Rn, and writes the result to Rd.
7. "XTABU Rm@>i, Rn, Rd": rotates Rm 0/8/16/24 bits, does unsigned byte extension to word, adds Rn, and writes the result to Rd.
8. "XTAHU Rm@>i, Rn, Rd": rotates Rm 0/8/16/24 bits, does unsigned half-word extension to word, adds Rn, and writes the result to Rd.
Change-Id: I4306d7ebac93015d7e2e40d307f2c4271c03f466 Reviewed-on: https://go-review.googlesource.com/65790 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
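The rotate-then-extend behaviour in this list can be modelled in a few lines of Go. This is a sketch of the semantics as worded in the message, assuming `@>` denotes a rotate right; the function names are lower-case stand-ins chosen for illustration and only four of the eight forms are shown.

```go
package main

import (
	"fmt"
	"math/bits"
)

// ror rotates x right by n bits (n is 0, 8, 16 or 24 for these instructions).
func ror(x uint32, n int) uint32 { return bits.RotateLeft32(x, -n) }

// movbs: rotate, then sign-extend the low byte to a word.
func movbs(rm uint32, rot int) uint32 { return uint32(int32(int8(ror(rm, rot)))) }

// movhs: rotate, then sign-extend the low halfword to a word.
func movhs(rm uint32, rot int) uint32 { return uint32(int32(int16(ror(rm, rot)))) }

// movbu: rotate, then zero-extend the low byte to a word.
func movbu(rm uint32, rot int) uint32 { return uint32(uint8(ror(rm, rot))) }

// xtab: the movbs result plus Rn, with 32-bit wraparound.
func xtab(rm uint32, rot int, rn uint32) uint32 { return rn + movbs(rm, rot) }

func main() {
	fmt.Printf("%#x %#x %#x %d\n",
		movbs(0x0000ff00, 8), movhs(0x80000000, 16), movbu(0x0000ff00, 8), xtab(0x0000ff00, 8, 10))
}
```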
2017-09-18  cmd/internal/obj/x86: add ADDSUBPS/PD  Damien Lespiau
These are the last instructions missing to complete SSE3 support. For reference what was missing was found by a tool [1]:
  $ x86db-gogen list --extension SSE3 --not-known
  ADDSUBPD xmmreg,xmmrm [rm: 66 0f d0 /r] PRESCOTT,SSE3,SO
  ADDSUBPS xmmreg,xmmrm [rm: f2 0f d0 /r] PRESCOTT,SSE3,SO
[1] https://github.com/dlespiau/x86db
Fixes #20293 Change-Id: Ib5a91bf64dcc5282cdb60eae740ae52b4db16ebd Reviewed-on: https://go-review.googlesource.com/42990 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2017-09-15  cmd/internal/obj: change Prog.From3 to RestArgs ([]Addr)  isharipo
This change makes it easier to express instructions with arbitrary number of operands. Rationale: previous approach with operand "hiding" does not scale well, AVX and especially AVX512 have many instructions with 3+ operands. x86 asm backend is updated to handle up to 6 explicit operands. It also fixes issue with 4-th immediate operand type checks. All `ytab` tables are updated accordingly. Changes to non-x86 backends only include these patterns:
  `p.From3 = X` => `p.SetFrom3(X)`
  `p.From3.X = Y` => `p.GetFrom3().X = Y`
Over time, other backends can adapt Prog.RestArgs and reduce the amount of workarounds.
-- Performance --
x/benchmark/build:
  $ benchstat upstream.bench patched.bench
  name      old time/op                 new time/op                 delta
  Build-48  21.7s ± 2%                  21.8s ± 2%                  ~  (p=0.218 n=10+10)
  name      old binary-size             new binary-size             delta
  Build-48  10.3M ± 0%                  10.3M ± 0%                  ~  (all equal)
  name      old build-time/op           new build-time/op           delta
  Build-48  21.7s ± 2%                  21.8s ± 2%                  ~  (p=0.218 n=10+10)
  name      old build-peak-RSS-bytes    new build-peak-RSS-bytes    delta
  Build-48  145MB ± 5%                  148MB ± 5%                  ~  (p=0.218 n=10+10)
  name      old build-user+sys-time/op  new build-user+sys-time/op  delta
  Build-48  21.0s ± 2%                  21.2s ± 2%                  ~  (p=0.075 n=10+10)
Microbenchmark shows a slight slowdown.
  name        old time/op  new time/op  delta
  AMD64asm-4  49.5ms ± 1%  49.9ms ± 1%  +0.67%  (p=0.001 n=23+15)

  func BenchmarkAMD64asm(b *testing.B) {
    for i := 0; i < b.N; i++ {
      TestAMD64EndToEnd(nil)
      TestAMD64Encoder(nil)
    }
  }
Change-Id: I4f1d37b5c2c966da3f2127705ccac9bff0038183 Reviewed-on: https://go-review.googlesource.com/63490 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
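The two rewrite patterns in this message can be illustrated with a heavily simplified model. The `Addr` and `Prog` types below are cut-down stand-ins written for this sketch, with only a few assumed fields; they are not the real cmd/internal/obj definitions, which cannot be imported from outside the Go toolchain.

```go
package main

import "fmt"

// Addr is a simplified stand-in for an instruction operand.
type Addr struct {
	Reg    int16
	Offset int64
}

// Prog is a simplified stand-in for an instruction. Extra operands live in
// RestArgs instead of a dedicated From3 field.
type Prog struct {
	From     Addr
	RestArgs []Addr
	To       Addr
}

// SetFrom3 replaces the old `p.From3 = X` pattern.
func (p *Prog) SetFrom3(a Addr) { p.RestArgs = []Addr{a} }

// GetFrom3 replaces reads and field writes through the old p.From3.
// It returns nil when the instruction has no third operand.
func (p *Prog) GetFrom3() *Addr {
	if len(p.RestArgs) == 0 {
		return nil
	}
	return &p.RestArgs[0]
}

func main() {
	p := &Prog{}
	p.SetFrom3(Addr{Reg: 3})  // was: p.From3 = X
	p.GetFrom3().Offset = 8   // was: p.From3.Offset = 8
	fmt.Println(p.RestArgs[0])
}
```

Because the accessor returns a pointer into the slice, field writes through `GetFrom3()` update the stored operand, which is what lets the second pattern remain a one-line change.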
2017-09-14  cmd/compile,math: improve int<->float conversions on ppc64x  Lynn Boger
The functions Float64bits and Float64frombits perform poorly on ppc64x because the int<->float conversions often result in load and store sequences to handle the type change. This patch adds more rules to recognize those sequences and use register to register moves and avoid unnecessary loads and stores where possible. There were some existing rules to improve these conversions, but this provides additional improvements. Included here:
- New instruction FCFIDS to improve on conversion to 32 bit
- Rename Xf2i64 and Xi2f64 as MTVSRD, MFVSRD, to match the asm
- Add rules to lower some of the load/store sequences for conversions
- Added new go asm to ppc64.s testcase.
Improvements:
  BenchmarkAbs-16         2.16   0.93   -56.94%
  BenchmarkCopysign-16    2.66   1.18   -55.64%
  BenchmarkRound-16       4.82   2.69   -44.19%
  BenchmarkSignbit-16     1.71   1.14   -33.33%
  BenchmarkFrexp-16       11.4   7.94   -30.35%
  BenchmarkLogb-16        10.4   7.34   -29.42%
  BenchmarkLdexp-16       15.7   11.2   -28.66%
  BenchmarkIlogb-16       10.2   7.32   -28.24%
  BenchmarkPowInt-16      69.6   55.9   -19.68%
  BenchmarkModf-16        10.1   8.19   -18.91%
  BenchmarkLog2-16        17.4   14.3   -17.82%
  BenchmarkCbrt-16        45.0   37.3   -17.11%
  BenchmarkAtanh-16       57.6   48.3   -16.15%
  BenchmarkRemainder-16   76.6   65.4   -14.62%
  BenchmarkGamma-16       26.0   22.5   -13.46%
  BenchmarkPowFrac-16     197    174    -11.68%
  BenchmarkMod-16         112    99.8   -10.89%
  BenchmarkAsinh-16       59.9   53.7   -10.35%
  BenchmarkAcosh-16       44.8   40.3   -10.04%
Updates #21390 Change-Id: I56cc991fc2e55249d69518d4e1ba76cc23904e35 Reviewed-on: https://go-review.googlesource.com/63290 Reviewed-by: Michael Munday <mike.munday@ibm.com>
2017-09-11  cmd/internal/obj/arm: support more ARM VFP instructions  Ben Shi
Add support of more ARM VFP instructions in the assembler. They were introduced in ARM VFPv4.
  "FMULAF/FMULAD Fm, Fn, Fd": Fd = Fd + Fn*Fm
  "FNMULAF/FNMULAD Fm, Fn, Fd": Fd = -(Fd + Fn*Fm)
  "FMULSF/FMULSD Fm, Fn, Fd": Fd = Fd - Fn*Fm
  "FNMULSF/FNMULSD Fm, Fn, Fd": Fd = -(Fd - Fn*Fm)
The multiplication results are not rounded. Change-Id: Id9cc52fd8e1b9a708103cd1e514c85a9e1cb3f47 Reviewed-on: https://go-review.googlesource.com/62550 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
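The note that the multiplication results are not rounded is the difference between a fused and an unfused multiply-add. The sketch below models the double-precision formulas with Go's `math.FMA`; the function names are lower-case stand-ins chosen for illustration, and `unfusedMulAdd` is included only as a contrast case.

```go
package main

import (
	"fmt"
	"math"
)

// fmulad models Fd = Fd + Fn*Fm with a single rounding at the end.
func fmulad(fd, fn, fm float64) float64 { return math.FMA(fn, fm, fd) }

// fnmulad models Fd = -(Fd + Fn*Fm).
func fnmulad(fd, fn, fm float64) float64 { return -math.FMA(fn, fm, fd) }

// fmulsd models Fd = Fd - Fn*Fm.
func fmulsd(fd, fn, fm float64) float64 { return math.FMA(-fn, fm, fd) }

// unfusedMulAdd rounds the product first; the explicit conversion keeps the
// Go compiler from fusing the two operations.
func unfusedMulAdd(fd, fn, fm float64) float64 { return fd + float64(fn*fm) }

func main() {
	a := 1 + 0x1p-30
	b := 1 - 0x1p-30
	// The exact product is 1 - 2^-60, which rounds to 1 in float64.
	fmt.Println(fmulad(-1, a, b), unfusedMulAdd(-1, a, b))
}
```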
2017-09-06  cmd/internal/obj/x86: add some more AVX2 instructions  Agniva De Sarker
This adds the VFMADD[213|231]SD, VFNMADD[213|231]SD, VADDSD, VSUBSD instructions This will allow us to write a fast path for exp_amd64.s where these optimizations can be applied in a lot of places. Change-Id: Ide292107ab887bd1e225a1ad60880235b5ed7c61 Reviewed-on: https://go-review.googlesource.com/61810 Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-06  cmd/asm: add most SSE4 missing instructions  isharipo
Instructions added:
  INSERTPS immb, r/m, xmm
  MPSADBW immb, r/m, xmm
  BLENDPD immb, r/m, xmm
  BLENDPS immb, r/m, xmm
  DPPD immb, r/m, xmm
  DPPS immb, r/m, xmm
  MOVNTDQA r/m, xmm
  PACKUSDW r/m, xmm
  PBLENDW immb, r/m, xmm
  PCMPEQQ r/m, xmm
  PCMPGTQ r/m, xmm
  PCMPISTRI immb, r/m, xmm
  PCMPISTRM immb, r/m, xmm
  PMAXSB r/m, xmm
  PMAXSD r/m, xmm
  PMAXUD r/m, xmm
  PMAXUW r/m, xmm
  PMINSB r/m, xmm
  PMINSD r/m, xmm
  PMINUD r/m, xmm
  PMINUW r/m, xmm
  PTEST r/m, xmm
  PCMPESTRM immb, r/m, xmm
Note: only 'optab' table is extended. `EXTRACTPS immb, xmm, r/m` is not included in this change due to new ytab set 'yextractps'. This should simplify code review. 4-operand instructions are a subject of upcoming changes that make 4-th (and so on) operands explicit. Related TODO note in asm6.go: "dont't hide 4op, some version have xmm version". Part of the mission to add missing amd64 SSE4 instructions to Go asm. Change-Id: I71716df14a8a5332e866dd0f0d52d43d7714872f Reviewed-on: https://go-review.googlesource.com/57470 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2017-09-06  cmd/asm: add amd64 PALIGNR instruction  isharipo
3rd change out of 3 to cover AMD64 SSSE3 instruction set in Go asm. This commit adds an instruction that does require a new ytab variable. Change-Id: I0bc7d9401c9176eb3760c3d59494ef082e97af84 Reviewed-on: https://go-review.googlesource.com/56870 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> Reviewed-by: Russ Cox <rsc@golang.org>
2017-09-06  cmd/asm: add amd64 CLFLUSH instruction  isharipo
This is the last instruction I found missing in SSE2 set. It does not reuse 'yprefetch' ytabs due to differences in operands SRC/DST roles:
- PREFETCHx: ModRM:r/m(r) -> FROM
- CLFLUSH: ModRM:r/m(w) -> TO
unaryDst map is extended accordingly. Change-Id: I89e34ebb81cc0ee5f9ebbb1301bad417f7ee437f Reviewed-on: https://go-review.googlesource.com/56833 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> Reviewed-by: Russ Cox <rsc@golang.org>
2017-09-06  cmd/asm: add amd64 PAB{SB,SD,SW}, PMADDUBSW, PMULHRSW, PSIG{NB,ND,NW} instructions  isharipo
1st change out of 3 to cover AMD64 SSSE3 instruction set in Go asm. This commit adds instructions that do not require new named ytab sets. Change-Id: I0c3dfd8d39c3daa8b7683ab163c63145626d042e Reviewed-on: https://go-review.googlesource.com/56834 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> Reviewed-by: Russ Cox <rsc@golang.org>
2017-08-31  cmd/internal/obj/arm: support more ARM VFP instructions  Ben Shi
Add support of more ARM VFP instructions in the assembler. They were introduced in ARM VFPv2.
  "NMULF/NMULD Fm, Fn, Fd": Fd = -Fn*Fm
  "MULAF/MULAD Fm, Fn, Fd": Fd = Fd + Fn*Fm
  "NMULAF/NMULAD Fm, Fn, Fd": Fd = -(Fd + Fn*Fm)
  "MULSF/MULSD Fm, Fn, Fd": Fd = Fd - Fn*Fm
  "NMULSF/NMULSD Fm, Fn, Fd": Fd = -(Fd - Fn*Fm)
Change-Id: Icd302676ca44a9f5f153fce734225299403c4163 Reviewed-on: https://go-review.googlesource.com/60170 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-29  cmd/asm, cmd/internal/obj/ppc64: add ISA 3.0 instructions  Carlos Eduardo Seo
This change adds new ppc64 instructions from the POWER9 ISA. This includes compares, loads, maths, register moves and the new random number generator and copy/paste facilities. Change-Id: Ife3720b90f5af184ff115bbcdcbce5c1302d39b6 Reviewed-on: https://go-review.googlesource.com/53930 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2017-08-28  all: remove some unused result params  Daniel Martí
Most of these are return values that were part of a receiving parameter, so they're still accessible. A few others are not, but those have never had a use. Found with github.com/mvdan/unparam, after Kevin Burke's suggestion that the tool should also warn about unused result parameters. Change-Id: Id8b5ed89912a99db22027703a88bd94d0b292b8b Reviewed-on: https://go-review.googlesource.com/55910 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-26  all: remove some double spaces from comments  Daniel Martí
Went mainly for the ones that make no sense, such as the ones mid-sentence or after commas. Change-Id: Ie245d2c19cc7428a06295635cf6a9482ade25ff0 Reviewed-on: https://go-review.googlesource.com/57293 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-25  cmd/internal/obj/arm64: fix assemble fcsels/fcseld bug  fanzha02
The current code treats the type of SIMD&FP register as C_REG incorrectly. The fix code converts C_REG type into C_FREG type. Uncomment fcsels/fcseld test cases. Fixes #21582 Change-Id: I754c51f72a0418bd352cbc0f7740f14cc599c72d Reviewed-on: https://go-review.googlesource.com/58350 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-23  cmd/internal/obj/arm64: fix assemble fcmp/fcmpe bug  fanzha02
The current code treats a floating-point constant as an integer and does not treat fcmp/fcmpe as comparison instructions that require special handling. The fix corrects the type of immediate arguments and adds fcmp/fcmpe to the special handling. Uncomment the fcmp/fcmpe cases. Fixes #21567 Change-Id: I6782520e2770f6ce70270b667dd5e68f71e2d5ad Reviewed-on: https://go-review.googlesource.com/57852 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-08-22  cmd/internal/obj/arm64: fix assemble movk bug  fanzha02
The current code gets shift arguments value from prog.From3.Offset. But prog.From3.Offset is not assigned the shift arguments value in instructions assemble process. The fix calls movcon() function to get the correct value. Uncomment the movk/movkw cases. Fixes #21398 Change-Id: I78d40c33c24bd4e3688a04622e4af7ddb5333fa6 Reviewed-on: https://go-review.googlesource.com/54990 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-08-21  cmd/internal/obj/arm: support BFX/BFXU instructions  Ben Shi
BFX extracts the given bits from the source register, sign-extends them to 32 bits, and writes the result to the destination register. BFXU does the same operation with zero extension. They were introduced in ARMv6T2. Change-Id: I6822ebf663497a87a662d3645eddd7c611de2b1e Reviewed-on: https://go-review.googlesource.com/56071 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
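The two extractions differ only in how the upper bits are filled. The following Go sketch models that; the function names and the `lsb`/`width` parameters are assumptions for illustration and say nothing about the assembler's operand order.

```go
package main

import "fmt"

// bfxu extracts `width` bits starting at bit `lsb` and zero-extends them.
// It assumes 1 <= width and lsb+width <= 32.
func bfxu(x uint32, lsb, width uint) uint32 {
	return (x << (32 - lsb - width)) >> (32 - width)
}

// bfx extracts the same field but sign-extends it, using an arithmetic
// right shift on the signed value.
func bfx(x uint32, lsb, width uint) uint32 {
	return uint32(int32(x<<(32-lsb-width)) >> (32 - width))
}

func main() {
	x := uint32(0x00000f50) // bits 4..11 hold 0xf5
	fmt.Printf("%#x %#x\n", bfxu(x, 4, 8), bfx(x, 4, 8))
}
```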
2017-08-18  cmd/asm: uncomment tests for amd64 PHADD{SW,W}, PHSUB{D,SW,W}  isharipo
Instructions added in https://golang.org/cl/18853 2nd change out of 3 to cover AMD64 SSSE3 instruction set in Go asm. This commit does not actually add any new instructions, only enables some test cases. Change-Id: I9596435b31ee4c19460a51dd6cea4530aac9d198 Reviewed-on: https://go-review.googlesource.com/56835 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2017-08-18cmd/internal/obj/arm: support new arm instructionsBen Shi
There are two changes in this CL.
1. Add new forms of MOVH/MOVHS/MOVHU.
   MOVHS R0<<0(R1), R2 // ldrsh
   MOVH R0<<0(R1), R2 // ldrsh
   MOVHU R0<<0(R1), R2 // ldrh
   MOVHS R2, R5<<0(R1) // strh
   MOVH R2, R5<<0(R1) // strh
   MOVHU R2, R5<<0(R1) // strh
2. Simplify "MVN $0xffffffaa, Rn" to "MOVW $0x55, Rn". It was originally assembled to two instructions: "MOVW offset(PC), R11" and "MVN R11, Rn".
Change-Id: I8e863dcfb2bd8f21a04c5d627fa7beec0afe65fb Reviewed-on: https://go-review.googlesource.com/53690 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
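The MVN rewrite rests on a bitwise complement: MVN writes the inverse of its operand, so inverting 0xffffffaa gives the value a plain move can load directly. The sketch below checks that arithmetic and whether each constant fits the ARM data-processing immediate form (an 8-bit value rotated right by an even amount). The helper name `fitsARMImmediate` is hypothetical and this is an illustration only; the assembler's actual selection logic is not shown in the commit.

```go
package main

import (
	"fmt"
	"math/bits"
)

// fitsARMImmediate (hypothetical helper) reports whether v can be written
// as an 8-bit value rotated right by an even number of bits, which is the
// immediate form of ARM data-processing instructions.
func fitsARMImmediate(v uint32) bool {
	for rot := 0; rot < 32; rot += 2 {
		// Rotating left undoes a rotate-right by the same amount.
		if bits.RotateLeft32(v, rot) <= 0xff {
			return true
		}
	}
	return false
}

func main() {
	const mvnOperand uint32 = 0xffffffaa
	movOperand := ^mvnOperand // the value MVN would write to Rn

	fmt.Printf("complement: %#x\n", movOperand)
	fmt.Println("MVN operand fits immediate:", fitsARMImmediate(mvnOperand))
	fmt.Println("MOVW operand fits immediate:", fitsARMImmediate(movOperand))
}
```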
2017-08-17cmd/asm: uncomment tests for PCMPESTRI, PHMINPOSUWisharipo
Instructions are implemented in the following revisions: PCMPESTRI - https://golang.org/cl/22337 PHMINPOSUW - https://golang.org/cl/18853 It is unknown when x86test will be updated/re-run, but the tests are useful for checking which x86 instructions are not yet supported. One example of a tool that uses this information is Damien Lespiau's x86db. Part of the mission to add missing amd64 SSE4 instructions to Go asm. Change-Id: I512ff26040f47a0976b3e37000fb1f37eac5b762 Reviewed-on: https://go-review.googlesource.com/55830 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2017-08-15cmd/internal/obj/arm64: fix assemble stxr/stxrw/stxrb/stxrh bugfanzha02
The stxr/stxrw/stxrb/stxrh instructions belong to the STLXR-like instruction set and require special handling, which the current code lacks. The fix adds the special handling for those instructions. Uncomment stxr/stxrw/stxrb/stxrh test cases. Fixes #21397 Change-Id: I31cee29dd6b30b1c25badd5c7574dda7a01bf016 Reviewed-on: https://go-review.googlesource.com/54951 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-06-30cmd/internal/obj/arm: check illegal base registers in ARM instructionsBen Shi
Wrong instructions "MOVW 8(F0), R1" and "MOVW R0<<0(F1), R1" are silently accepted, and all Fx are treated as Rx. The patch checks all those illegal base registers. fixes #20724 Change-Id: I05d41bb43fe774b023205163b7daf4a846e9dc88 Reviewed-on: https://go-review.googlesource.com/46132 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-06-30cmd/internal/obj/arm64: fix assemble LDXP bugfanzha02
The current code calculates register number incorrectly. The fix corrects the register number calculation. Add cases created by decoder to test assembler. Fixes #20697 Fixes #20723 Change-Id: I73ac153df9ea9f51c43a5104828d7a5389551c92 Reviewed-on: https://go-review.googlesource.com/45850 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-06-23cmd/internal/obj/arm: fix wrong encoding of MULBBBen Shi
"MULBB R1, R2, R3" is encoded to 0xe163f182, which should be 0xe1630182. This patch fixes it. fix #20764 Change-Id: I9d3c3ffa40ecde86638e5e083eacc67578caebf4 Reviewed-on: https://go-review.googlesource.com/46491 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-06-23cmd/internal/obj/arm: fix setting U bit in shifted register offset of MOVBSBen Shi
"MOVBS.U R0<<0(R1), R2" is assembled to 0xe19120d0 (ldrsb r2, [r1, r0]), but it is expected to be 0xe11120d0 (ldrsb r2, [r1, -r0]). This patch fixes it and adds more encoding tests. fixes #20701 Change-Id: Ic1fb46438d71a978dbef06d97494a70c95fcbf3a Reviewed-on: https://go-review.googlesource.com/45996 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
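The two encodings quoted above differ in a single bit. A small Go sketch makes this visible by XOR-ing them; it assumes the standard ARM load/store layout in which bit 23 is the U bit (set: offset added, clear: offset subtracted). The names `uBit` and `addsOffset` are hypothetical and do not come from the assembler source.

```go
package main

import "fmt"

// uBit is bit 23 of an ARM load/store encoding (assumed layout):
// when set the offset is added to the base, when clear it is subtracted.
const uBit uint32 = 1 << 23

// addsOffset (hypothetical helper) reports whether the U bit is set.
func addsOffset(word uint32) bool {
	return word&uBit != 0
}

func main() {
	const (
		wrong uint32 = 0xe19120d0 // ldrsb r2, [r1, r0]
		want  uint32 = 0xe11120d0 // ldrsb r2, [r1, -r0]
	)
	fmt.Printf("differing bits: %#08x\n", wrong^want)
	fmt.Println("wrong adds offset:", addsOffset(wrong))
	fmt.Println("want adds offset: ", addsOffset(want))
}
```

The XOR isolates exactly the U bit, which matches the commit's description of the bug as a wrongly set U bit.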