| Age | Commit message | Author |
|
This CL adds the following instructions, useful for shifting/rotating
and masking operations:
* RNSBG - rotate then and selected bits
* ROSBG - rotate then or selected bits
* RXSBG - rotate then exclusive or selected bits
* RISBG - rotate then insert selected bits
It also adds the 'T' (test), 'Z' (zero), 'H' (high), 'L' (low) and
'N' (no test) variants of these instructions as appropriate.
Operands are ordered as: I₃, I₄, I₅, R₂, R₁.
Key: I₃=start, I₄=end, I₅=amount, R₂=source, R₁=destination
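The operand roles above can be modelled in plain Go. This is a minimal
sketch of the "rotate then insert selected bits" behaviour only, assuming
the usual s390x big-endian bit numbering (bit 0 is the most significant
bit) and a wrap-around range when start > end. The names selectMask and
risbg are illustrative and are not part of the assembler.

```go
package main

import (
	"fmt"
	"math/bits"
)

// selectMask returns a mask covering bits start..end inclusive, using
// big-endian bit numbering (bit 0 is the most significant bit).
// When start > end the range wraps around through bit 63 back to bit 0.
// Both arguments are assumed to be in the range 0..63.
func selectMask(start, end uint) uint64 {
	fromStart := ^uint64(0) >> start    // bits start..63
	upToEnd := ^uint64(0) << (63 - end) // bits 0..end
	if start <= end {
		return fromStart & upToEnd
	}
	return fromStart | upToEnd
}

// risbg rotates src left by amount, then copies the selected bits into dst.
// With zero set (modelling the 'Z' variant) the unselected bits are cleared
// instead of being kept from dst.
func risbg(dst, src uint64, start, end, amount uint, zero bool) uint64 {
	m := selectMask(start, end)
	rotated := bits.RotateLeft64(src, int(amount))
	if zero {
		return rotated & m
	}
	return dst&^m | rotated&m
}

func main() {
	// Extract the second-lowest byte of src, clearing everything else.
	fmt.Printf("%#x\n", risbg(0, 0xff00, 48, 55, 0, true))
	// Clear the low nibble of dst by inserting zero bits from src.
	fmt.Printf("%#x\n", risbg(^uint64(0), 0, 60, 63, 0, false))
}
```

The and/or/xor forms would differ only in how the selected bits are
combined with dst.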
Change-Id: I200d12287e1df7447f37f4919da5e9a93d27c792
Reviewed-on: https://go-review.googlesource.com/c/go/+/159357
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
The check of MADD&MSUB was added to the function IsMIPSMUL in
a previous commit, and the comments should also be updated.
Change-Id: I2d3da055d55b459b908714c542dff99ab5c6cf99
Reviewed-on: https://go-review.googlesource.com/c/go/+/171102
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
An instruction that references TLS, e.g.
MOVQ 0(TLS), AX
on some platforms (e.g. Android), or in shared mode, may be
translated to (assuming TLS offset already loaded to CX)
MOVQ 0(CX)(TLS*1), AX
which in turn translates to
movq %fs:(%rcx), %rax
Until now we rejected a non-zero offset in a TLS reference, such as 16(TLS).
However, the instruction can take an offset; for example, the following
is a valid instruction:
movq %fs:16(%rcx),%rcx
So, allow offset in TLS reference.
Change-Id: Iaf1996bad7fe874e0c298ea441af5acb136a4028
Reviewed-on: https://go-review.googlesource.com/c/go/+/171151
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
This CL adds the 'insert program mask' (IPM) instruction to s390x.
IPM stores the current program mask (which contains the condition
code) into a general purpose register.
This instruction will be useful when implementing intrinsics for
the arithmetic functions in the math/bits package. We can also
potentially use it to convert some condition codes into bool
values.
The condition code can be saved and restored using an instruction
sequence such as:
IPM R4 // save condition code to R4
...
TMLH R4, $0x3000 // restore condition code from R4
We can also use IPM to save the carry bit to a register using an
instruction sequence such as:
IPM R4 // save condition code to R4
RISBLGZ $31, $31, $3, R4, R4 // isolate carry bit in R4
Change-Id: I169d450b6ea1a7ff8c0286115ddc42618da8a2f4
Reviewed-on: https://go-review.googlesource.com/c/go/+/165997
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
This CL implements MADD&MSUB, which are mips32r2 instructions.
Change-Id: I06fe51573569baf3b71536336b34b95ccd24750b
Reviewed-on: https://go-review.googlesource.com/c/go/+/167680
Run-TryBot: Ben Shi <powerman1st@163.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
A bug in the encoding of XX1-Form is flipping bit 31 of such instructions.
This may result in register clobbering when using VSX instructions.
This was not exposed before because we currently don't generate these
instructions in SSA, and the asm files in which they are present aren't
affected by register clobbering.
This change fixes the bug and adds a testcase for the problem.
Fixes #30112
Change-Id: I77b606159ae1efea33d2ba3e1c74b7fae8d5d2e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/163759
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This change adds several arm64 v8.1 atomic instructions and test cases.
They are LDADDAx, LDADDLx, LDANDAx, LDANDALx, LDANDLx, LDEORAx, LDEORALx,
LDEORLx, LDORAx, LDORALx, LDORLx, SWPAx and SWPLx. Their form is consistent
with the form of the existing atomic instructions.
For instructions STXRx, STLXRx, STXPx and STLXPx, the second destination
register can't be RSP. This CL also adds a check for this.
LDADDx Rs, (Rb), Rt: *Rb -> Rt, Rs + *Rb -> *Rb
LDANDx Rs, (Rb), Rt: *Rb -> Rt, Rs AND NOT(*Rb) -> *Rb
LDEORx Rs, (Rb), Rt: *Rb -> Rt, Rs EOR *Rb -> *Rb
LDORx Rs, (Rb), Rt: *Rb -> Rt, Rs OR *Rb -> *Rb
Change-Id: I9f9b0245958cb57ab7d88c66fb9159b23b9017fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/157001
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Prior to this change, DATA instructions accepted
the values 1, 2, 4, and 8 as sizes.
The acceptable sizes were further restricted
to 4 and 8 for float constants.
This was both too restrictive and not restrictive enough:
string constants may reasonably have any length,
and address constants should really only accept pointer-length sizes.
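The size rules described above can be sketched as a small validation
function. This is only an illustration: operandKind, its constants and
validDataSize are hypothetical names, and the rule kept for integer
constants (1, 2, 4 or 8) is an assumption, since the message only
describes the changes for string and address constants.

```go
package main

import "fmt"

// operandKind is a hypothetical classification of the DATA value operand.
type operandKind int

const (
	intConst operandKind = iota
	floatConst
	stringConst
	addrConst
)

// validDataSize reports whether size is acceptable for a DATA value of the
// given kind. ptrSize is the pointer size of the target in bytes.
func validDataSize(kind operandKind, size, ptrSize int) bool {
	switch kind {
	case intConst:
		// Assumed unchanged: the sizes accepted before the change.
		return size == 1 || size == 2 || size == 4 || size == 8
	case floatConst:
		return size == 4 || size == 8
	case stringConst:
		// Strings may have any length; a positive size is assumed here.
		return size > 0
	case addrConst:
		// Addresses must be exactly pointer-sized.
		return size == ptrSize
	}
	return false
}

func main() {
	fmt.Println(validDataSize(stringConst, 13, 8))
	fmt.Println(validDataSize(addrConst, 4, 8))
}
```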
Fixes #30269
Change-Id: I06e44ecdf5909eca7b19553861aec1fa39655c2b
Reviewed-on: https://go-review.googlesource.com/c/163747
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
The current assembler reports an error when it assembles
"TSTW $1689262177517664, R3", although go1.11 assembled it
without complaint.
Fixes #30334
Change-Id: I9c16d36717cd05df2134e8eb5b17edc385aff0a9
Reviewed-on: https://go-review.googlesource.com/c/163259
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ben Shi <powerman1st@163.com>
|
|
When functions are inlined, for instructions in the inlined body, does
-S print the location of the call, or the location of the body? Right
now, we do the former. I'd like to do the latter by default, it makes
much more sense when reading disassembly. With mid-stack inlining
enabled in more cases, this quandary will come up more often.
The original behavior is still available with -S=2. Some tests
use this mode (so they can find assembly generated by a particular
source line).
This helped me with understanding what the compiler was doing
while fixing #29007.
Change-Id: Id14a3a41e1b18901e7c5e460aa4caf6d940ed064
Reviewed-on: https://go-review.googlesource.com/c/153241
Reviewed-by: David Chase <drchase@google.com>
|
|
VPERMXOR is missing from the Go assembler for ppc64. It has the
same format as VPERM. It was requested by an external user so
they could write an optimized algorithm in asm.
Change-Id: Icf4c682f7f46716ccae64e6ae3d62e8cec67f6c1
Reviewed-on: https://go-review.googlesource.com/c/151578
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
|
|
Currently, both asm and compile have a -symabis flag, but in asm it's
a boolean flag that means to generate a symbol ABIs file and in the
compiler it's a string flag giving the path of the symbol ABIs file to
consume. I'm worried about this false symmetry biting us in the
future, so rename asm's flag to -gensymabis.
Updates #27539.
Change-Id: I8b9c18a852d2838099718f8989813f19d82e7434
Reviewed-on: https://go-review.googlesource.com/c/149818
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
This adds a -symabis flag that runs the assembler in a special mode
that outputs symbol definition and reference ABIs rather than
assembling the code. This uses a fast and somewhat lax parser because
the go_asm.h definitions may not be available.
For #27539.
Change-Id: I248ba0ebab7cc75dcb2a90e82a82eb445da7e88e
Reviewed-on: https://go-review.googlesource.com/c/147098
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Currently cmd/asm's Parser.line both consumes a line of assembly from
the lexer and assembles it. This CL separates these two steps so that
the line parser can be reused for purposes other than generating a
Prog stream.
For #27539.
Updates #17544.
Change-Id: I452c9a2112fbcc1c94bf909efc0d1fcc71014812
Reviewed-on: https://go-review.googlesource.com/c/147097
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
instructions
The current assembler loads large constants from the constant pool. This CL
gets rid of the pool by using MOVZ/MOVN and MOVK to load large
constants.
This CL changes the assembler behavior as follows.
1. Go assembly:
   (1) MOVD $0x1111222233334444, R1
   (2) MOVD $0x1111ffff1111ffff, R1
   previous version: MOVD 0x9a4, R1 (loads constant from pool).
   optimized version:
   (1) MOVD $0x4444, R1; MOVK $(0x3333<<16), R1; MOVK $(0x2222<<32), R1; MOVK $(0x1111<<48), R1.
   (2) MOVN $(0xeeee<<16), R1; MOVK $(0x1111<<48), R1.
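The decomposition shown above can be sketched in Go as follows. This is
not the assembler's code; loadConst is a hypothetical helper that splits a
64-bit constant into 16-bit chunks and starts from MOVN when more chunks
are 0xffff than 0x0000, so that fewer MOVK instructions are needed. It
prints MOVZ where the listing above writes the first instruction as MOVD.

```go
package main

import "fmt"

// loadConst returns an illustrative instruction sequence that materializes
// v in reg without using a constant pool.
func loadConst(v uint64, reg string) []string {
	// Count the 16-bit chunks that are all zeros or all ones.
	zeros, ones := 0, 0
	for shift := uint(0); shift < 64; shift += 16 {
		switch uint16(v >> shift) {
		case 0x0000:
			zeros++
		case 0xffff:
			ones++
		}
	}
	// MOVN fills the untouched chunks with ones, MOVZ fills them with zeros.
	useMOVN := ones > zeros
	fill := uint16(0x0000)
	if useMOVN {
		fill = 0xffff
	}

	var out []string
	for shift := uint(0); shift < 64; shift += 16 {
		chunk := uint16(v >> shift)
		if chunk == fill {
			continue // already produced by the first instruction
		}
		switch {
		case len(out) > 0:
			out = append(out, fmt.Sprintf("MOVK $(%#x<<%d), %s", chunk, shift, reg))
		case useMOVN:
			// MOVN takes the inverted chunk as its immediate.
			out = append(out, fmt.Sprintf("MOVN $(%#x<<%d), %s", ^chunk, shift, reg))
		default:
			out = append(out, fmt.Sprintf("MOVZ $(%#x<<%d), %s", chunk, shift, reg))
		}
	}
	if len(out) == 0 {
		// v is all zeros or all ones: a single instruction suffices.
		if useMOVN {
			out = append(out, "MOVN $0, "+reg)
		} else {
			out = append(out, "MOVZ $0, "+reg)
		}
	}
	return out
}

func main() {
	for _, v := range []uint64{0x1111222233334444, 0x1111ffff1111ffff} {
		fmt.Printf("%#x: %v\n", v, loadConst(v, "R1"))
	}
}
```

For the two constants in the example this yields four and two
instructions respectively, matching the optimized sequences listed above.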
Add test cases. Binary size comparisons and benchmark results are below.
1. Binary size before/after
binary size change
pkg/linux_arm64 +25.4KB
pkg/tool/linux_arm64 -2.9KB
go -2KB
gofmt no change
2. compiler benchmark.
name old time/op new time/op delta
Template 574ms ±21% 577ms ±14% ~ (p=0.853 n=10+10)
Unicode 327ms ±29% 353ms ±23% ~ (p=0.360 n=10+8)
GoTypes 1.97s ± 8% 2.04s ±11% ~ (p=0.143 n=10+10)
Compiler 9.13s ± 9% 9.25s ± 8% ~ (p=0.684 n=10+10)
SSA 29.2s ± 5% 27.0s ± 4% -7.40% (p=0.000 n=10+10)
Flate 402ms ±40% 308ms ± 6% -23.29% (p=0.004 n=10+10)
GoParser 470ms ±26% 382ms ±10% -18.82% (p=0.000 n=9+10)
Reflect 1.36s ±16% 1.17s ± 7% -13.92% (p=0.001 n=9+10)
Tar 561ms ±19% 466ms ±15% -17.08% (p=0.000 n=9+10)
XML 745ms ±20% 679ms ±20% ~ (p=0.123 n=10+10)
StdCmd 35.5s ± 6% 37.2s ± 3% +4.81% (p=0.001 n=9+8)
name old user-time/op new user-time/op delta
Template 625ms ±14% 660ms ±18% ~ (p=0.343 n=10+10)
Unicode 355ms ±10% 373ms ±20% ~ (p=0.346 n=9+10)
GoTypes 2.39s ± 8% 2.37s ± 5% ~ (p=0.897 n=10+10)
Compiler 11.1s ± 4% 11.4s ± 2% +2.63% (p=0.010 n=10+9)
SSA 35.4s ± 3% 34.9s ± 2% ~ (p=0.113 n=10+9)
Flate 402ms ±13% 371ms ±30% ~ (p=0.089 n=10+9)
GoParser 513ms ± 8% 489ms ±24% -4.76% (p=0.039 n=9+9)
Reflect 1.52s ±12% 1.41s ± 5% -7.32% (p=0.001 n=9+10)
Tar 607ms ±10% 558ms ± 8% -7.96% (p=0.009 n=9+10)
XML 828ms ±10% 789ms ±12% ~ (p=0.059 n=10+10)
name old text-bytes new text-bytes delta
HelloSize 714kB ± 0% 712kB ± 0% -0.23% (p=0.000 n=10+10)
CmdGoSize 8.26MB ± 0% 8.25MB ± 0% -0.14% (p=0.000 n=10+10)
name old data-bytes new data-bytes delta
HelloSize 10.5kB ± 0% 10.5kB ± 0% ~ (all equal)
CmdGoSize 258kB ± 0% 258kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 125kB ± 0% 125kB ± 0% ~ (all equal)
CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.18MB ± 0% 1.18MB ± 0% ~ (all equal)
CmdGoSize 11.2MB ± 0% 11.2MB ± 0% -0.13% (p=0.000 n=10+10)
3. go1 benchmark.
name old time/op new time/op delta
BinaryTree17 6.60s ±18% 7.36s ±22% ~ (p=0.222 n=5+5)
Fannkuch11 4.04s ± 0% 4.05s ± 0% ~ (p=0.421 n=5+5)
FmtFprintfEmpty 91.8ns ±14% 91.2ns ± 9% ~ (p=0.667 n=5+5)
FmtFprintfString 145ns ± 0% 151ns ± 6% ~ (p=0.397 n=4+5)
FmtFprintfInt 169ns ± 0% 176ns ± 5% +4.14% (p=0.016 n=4+5)
FmtFprintfIntInt 229ns ± 2% 243ns ± 6% ~ (p=0.143 n=5+5)
FmtFprintfPrefixedInt 343ns ± 0% 350ns ± 3% +1.92% (p=0.048 n=5+5)
FmtFprintfFloat 400ns ± 3% 394ns ± 3% ~ (p=0.063 n=5+5)
FmtManyArgs 1.04µs ± 0% 1.05µs ± 0% +1.62% (p=0.029 n=4+4)
GobDecode 13.9ms ± 4% 13.9ms ± 5% ~ (p=1.000 n=5+5)
GobEncode 10.6ms ± 4% 10.6ms ± 5% ~ (p=0.421 n=5+5)
Gzip 567ms ± 1% 563ms ± 4% ~ (p=0.548 n=5+5)
Gunzip 60.2ms ± 1% 60.4ms ± 0% ~ (p=0.056 n=5+5)
HTTPClientServer 114µs ± 4% 108µs ± 7% ~ (p=0.095 n=5+5)
JSONEncode 18.4ms ± 2% 17.8ms ± 2% -3.06% (p=0.016 n=5+5)
JSONDecode 105ms ± 1% 103ms ± 2% ~ (p=0.056 n=5+5)
Mandelbrot200 5.48ms ± 0% 5.49ms ± 0% ~ (p=0.841 n=5+5)
GoParse 6.05ms ± 1% 6.05ms ± 2% ~ (p=1.000 n=5+5)
RegexpMatchEasy0_32 143ns ± 1% 146ns ± 4% +2.10% (p=0.048 n=4+5)
RegexpMatchEasy0_1K 499ns ± 1% 492ns ± 2% ~ (p=0.079 n=5+5)
RegexpMatchEasy1_32 137ns ± 0% 136ns ± 1% -0.73% (p=0.016 n=4+5)
RegexpMatchEasy1_1K 826ns ± 4% 823ns ± 2% ~ (p=0.841 n=5+5)
RegexpMatchMedium_32 224ns ± 5% 233ns ± 8% ~ (p=0.119 n=5+5)
RegexpMatchMedium_1K 59.6µs ± 0% 59.3µs ± 1% -0.66% (p=0.016 n=4+5)
RegexpMatchHard_32 3.29µs ± 3% 3.26µs ± 1% ~ (p=0.889 n=5+5)
RegexpMatchHard_1K 98.8µs ± 2% 99.0µs ± 0% ~ (p=0.690 n=5+5)
Revcomp 1.02s ± 1% 1.01s ± 1% ~ (p=0.095 n=5+5)
Template 135ms ± 5% 131ms ± 1% ~ (p=0.151 n=5+5)
TimeParse 591ns ± 0% 593ns ± 0% +0.20% (p=0.048 n=5+5)
TimeFormat 655ns ± 2% 607ns ± 0% -7.42% (p=0.016 n=5+4)
[Geo mean] 93.5µs 93.8µs +0.23%
name old speed new speed delta
GobDecode 55.1MB/s ± 4% 55.1MB/s ± 4% ~ (p=1.000 n=5+5)
GobEncode 72.4MB/s ± 4% 72.3MB/s ± 5% ~ (p=0.421 n=5+5)
Gzip 34.2MB/s ± 1% 34.5MB/s ± 4% ~ (p=0.548 n=5+5)
Gunzip 322MB/s ± 1% 321MB/s ± 0% ~ (p=0.056 n=5+5)
JSONEncode 106MB/s ± 2% 109MB/s ± 2% +3.16% (p=0.016 n=5+5)
JSONDecode 18.5MB/s ± 1% 18.8MB/s ± 2% ~ (p=0.056 n=5+5)
GoParse 9.57MB/s ± 1% 9.57MB/s ± 2% ~ (p=0.952 n=5+5)
RegexpMatchEasy0_32 223MB/s ± 1% 221MB/s ± 0% -1.10% (p=0.029 n=4+4)
RegexpMatchEasy0_1K 2.05GB/s ± 1% 2.08GB/s ± 2% ~ (p=0.095 n=5+5)
RegexpMatchEasy1_32 232MB/s ± 0% 234MB/s ± 1% +0.76% (p=0.016 n=4+5)
RegexpMatchEasy1_1K 1.24GB/s ± 4% 1.24GB/s ± 2% ~ (p=0.841 n=5+5)
RegexpMatchMedium_32 4.45MB/s ± 5% 4.20MB/s ± 1% -5.63% (p=0.000 n=5+4)
RegexpMatchMedium_1K 17.2MB/s ± 0% 17.3MB/s ± 1% +0.66% (p=0.016 n=4+5)
RegexpMatchHard_32 9.73MB/s ± 3% 9.83MB/s ± 1% ~ (p=0.889 n=5+5)
RegexpMatchHard_1K 10.4MB/s ± 2% 10.3MB/s ± 0% ~ (p=0.635 n=5+5)
Revcomp 249MB/s ± 1% 252MB/s ± 1% ~ (p=0.095 n=5+5)
Template 14.4MB/s ± 4% 14.8MB/s ± 1% ~ (p=0.151 n=5+5)
[Geo mean] 62.1MB/s 62.3MB/s +0.34%
Fixes #10108
Change-Id: I79038f3c4c2ff874c136053d1a2b1c8a5a9cfac5
Reviewed-on: https://go-review.googlesource.com/c/118796
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
In ARM64 ABI, R18 is the "platform register", the use of which is
OS specific. The OS could choose to reserve this register. In
practice, it seems fine to use R18 on Linux but not on darwin (iOS).
Rename R18 to R18_PLATFORM to prevent accidental use. There is no
R18 usage within the standard library (besides tests, which are
updated).
Fixes #26110
Change-Id: Icef7b9549e2049db1df307a0180a3c90a12d7a84
Reviewed-on: https://go-review.googlesource.com/c/147218
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
32-bit negated logical instructions (BICW, ORNW, EONW) with
constants were mis-encoded, because they were missing in the
cases where we handle 32-bit logical instructions. This CL
adds the missing cases.
Fixes #28548
Change-Id: I3d6acde7d3b72bb7d3d5d00a9df698a72c806ad5
Reviewed-on: https://go-review.googlesource.com/c/147077
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Ben Shi <powerman1st@163.com>
Reviewed-by: Ben Shi <powerman1st@163.com>
|
|
Go documentation style for boolean funcs is to say:
// Foo reports whether ...
func Foo() bool
(rather than "returns true if")
This CL also replaces 4 uses of "iff" with the same "reports whether"
wording, which doesn't lose any meaning, and will prevent people from
sending typo fixes when they don't realize it's "if and only if". In
the past I think we've had the typo CLs updated to just say "reports
whether". So do them all at once.
(Inspired by the addition of another "returns true if" in CL 146938
in fd_plan9.go)
Created with:
$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true iff" | grep -v vendor)
$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true if" | grep -v vendor)
Change-Id: Ided502237f5ab0d25cb625dbab12529c361a8b9f
Reviewed-on: https://go-review.googlesource.com/c/147037
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
VMSLG has three variants on z14 and later machines. These variants are used in "limbified" squaring:
VMSLEG: Even Shift Indication -- the even-indexed intermediate result is doubled
VMSLOG: Odd Shift Indication -- the odd-indexed intermediate result is doubled
VMSLEOG: Even and Odd Shift Indication -- both intermediate results are doubled
Limbified squaring is very useful for high performance cryptographic algorithms, such as
elliptic curve. This change allows these instructions to be used in Go assembly.
Change-Id: Iaad577b07320205539f99b3cb37a2a984882721b
Reviewed-on: https://go-review.googlesource.com/c/145180
Reviewed-by: Michael Munday <mike.munday@ibm.com>
|
|
ppc64x
This adds support for an alignment directive that can be used
within Go asm to indicate preferred code alignment for ppc64x.
This is intended to be used with loops to improve
performance.
This change only adds the directive and aligns the code based
on it. Follow up changes will modify asm functions for
ppc64x that benefit from preferred alignment.
Fixes #14935
Here is one example of the improvement in memmove when the
directive is used on the loops in the code:
Memmove/64 8.74ns ± 0% 8.64ns ± 0% -1.19% (p=0.000 n=8+8)
Memmove/128 11.5ns ± 0% 11.0ns ± 0% -4.35% (p=0.000 n=8+8)
Memmove/256 23.0ns ± 0% 15.3ns ± 0% -33.48% (p=0.000 n=8+8)
Memmove/512 31.7ns ± 0% 31.8ns ± 0% +0.32% (p=0.000 n=8+8)
Memmove/1024 52.3ns ± 0% 43.9ns ± 0% -16.10% (p=0.000 n=8+8)
Memmove/2048 93.2ns ± 0% 76.2ns ± 0% -18.24% (p=0.000 n=8+8)
Memmove/4096 174ns ± 0% 141ns ± 0% -18.97% (p=0.000 n=8+8)
Change-Id: I200d77e923dd5d78c22fe3f8eb142a8fbaff57bf
Reviewed-on: https://go-review.googlesource.com/c/144218
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
The current assembler saves constants in Offset, whose type is int64,
which causes 32-bit constants to be given an incorrect class. This CL
reclassifies constants when the opcode is a 32-bit variant, such as MOVW,
ANDW or ADDW. In addition, this CL encodes some constants of the ADDCON
class as MOV instructions.
This CL changes the assembler behavior as follows.
1. go assembly ADDW $MOVCON, Rn, Rd
previous version: MOVD $MOVCON, Rtmp; ADDW Rtmp, Rn, Rd
current version: MOVW $MOVCON, Rtmp; ADDW Rtmp, Rn, Rd
2. go assembly MOVW $0xaaaaffff, R1
previous version: treats $0xaaaaffff as VCON, encodes it as MOVW 0x994, R1 (loads it from pool).
current version: treats $0xaaaaffff as MOVCON, and encodes it into MOVW instructions.
3. go assembly MOVD $0x210000, R1
previous version: treats $0x210000 as ADDCON, loads it from pool
current version: treats $0x210000 as MOVCON, and encodes it into MOVD instructions.
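Why the operand width matters for classification can be sketched as
follows. This is a simplified model, not the assembler's classifier:
isAddCon and isMovCon are hypothetical helpers, and the real assembler
distinguishes more constant classes than these two.

```go
package main

import "fmt"

// isAddCon reports whether v fits a 12-bit ADD/SUB immediate, optionally
// shifted left by 12 bits.
func isAddCon(v uint64) bool {
	return v&^0xfff == 0 || v&^0xfff000 == 0
}

// isMovCon reports whether v, viewed as a width-bit value (32 or 64), can be
// loaded by a single move-wide instruction: either one non-zero 16-bit chunk
// (MOVZ) or one chunk that is not all ones (MOVN).
func isMovCon(v uint64, width uint) bool {
	mask := ^uint64(0) >> (64 - width)
	v &= mask
	inv := ^v & mask
	for shift := uint(0); shift < width; shift += 16 {
		chunk := uint64(0xffff) << shift
		if v&^chunk == 0 || inv&^chunk == 0 {
			return true
		}
	}
	return false
}

func main() {
	// Viewed as 64 bits, 0xaaaaffff needs more than one instruction;
	// viewed as 32 bits, its inverse 0x55550000 is a single chunk.
	fmt.Println(isMovCon(0xaaaaffff, 64), isMovCon(0xaaaaffff, 32))
	// 0x210000 satisfies both tests, so the choice of class is a policy.
	fmt.Println(isAddCon(0x210000), isMovCon(0x210000, 64))
}
```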
Add the test cases.
1. Binary size before/after.
binary size change
pkg/linux_arm64 -1.534KB
pkg/tool/linux_arm64 -0.718KB
go -0.32KB
gofmt no change
2. go1 benchmark result.
name old time/op new time/op delta
BinaryTree17-8 6.26s ± 1% 6.28s ± 1% ~ (p=0.105 n=10+10)
Fannkuch11-8 5.40s ± 0% 5.39s ± 0% -0.29% (p=0.028 n=9+10)
FmtFprintfEmpty-8 94.5ns ± 0% 95.0ns ± 0% +0.51% (p=0.000 n=10+9)
FmtFprintfString-8 163ns ± 1% 159ns ± 1% -2.06% (p=0.000 n=10+9)
FmtFprintfInt-8 200ns ± 1% 196ns ± 1% -1.99% (p=0.000 n=9+10)
FmtFprintfIntInt-8 292ns ± 3% 284ns ± 1% -2.87% (p=0.001 n=10+9)
FmtFprintfPrefixedInt-8 422ns ± 1% 420ns ± 1% -0.59% (p=0.015 n=10+10)
FmtFprintfFloat-8 458ns ± 0% 463ns ± 1% +1.19% (p=0.000 n=9+10)
FmtManyArgs-8 1.37µs ± 1% 1.35µs ± 1% -1.85% (p=0.000 n=10+10)
GobDecode-8 15.5ms ± 1% 15.3ms ± 1% -1.82% (p=0.000 n=10+10)
GobEncode-8 11.7ms ± 5% 11.7ms ± 2% ~ (p=0.549 n=10+9)
Gzip-8 622ms ± 0% 624ms ± 0% +0.23% (p=0.000 n=10+9)
Gunzip-8 73.6ms ± 0% 73.8ms ± 1% ~ (p=0.077 n=9+9)
HTTPClientServer-8 115µs ± 1% 115µs ± 1% ~ (p=0.796 n=10+10)
JSONEncode-8 31.1ms ± 2% 28.7ms ± 1% -7.98% (p=0.000 n=10+9)
JSONDecode-8 145ms ± 0% 145ms ± 1% ~ (p=0.447 n=9+10)
Mandelbrot200-8 9.67ms ± 0% 9.60ms ± 0% -0.76% (p=0.000 n=9+9)
GoParse-8 7.56ms ± 1% 7.58ms ± 0% +0.21% (p=0.035 n=10+9)
RegexpMatchEasy0_32-8 208ns ±10% 222ns ± 0% ~ (p=0.531 n=10+6)
RegexpMatchEasy0_1K-8 699ns ± 4% 694ns ± 4% ~ (p=0.868 n=10+10)
RegexpMatchEasy1_32-8 186ns ± 8% 190ns ±12% ~ (p=0.955 n=10+10)
RegexpMatchEasy1_1K-8 1.13µs ± 1% 1.05µs ± 2% -6.64% (p=0.000 n=10+10)
RegexpMatchMedium_32-8 316ns ± 7% 288ns ± 1% -8.68% (p=0.000 n=10+7)
RegexpMatchMedium_1K-8 90.2µs ± 0% 85.5µs ± 2% -5.19% (p=0.000 n=10+10)
RegexpMatchHard_32-8 5.53µs ± 0% 3.90µs ± 0% -29.52% (p=0.000 n=10+10)
RegexpMatchHard_1K-8 119µs ± 0% 124µs ± 0% +4.29% (p=0.000 n=9+10)
Revcomp-8 1.07s ± 0% 1.07s ± 0% ~ (p=0.094 n=9+9)
Template-8 162ms ± 1% 160ms ± 2% ~ (p=0.089 n=10+10)
TimeParse-8 756ns ± 2% 763ns ± 1% ~ (p=0.158 n=10+10)
TimeFormat-8 758ns ± 1% 746ns ± 1% -1.52% (p=0.000 n=10+10)
name old speed new speed delta
GobDecode-8 49.4MB/s ± 1% 50.3MB/s ± 1% +1.84% (p=0.000 n=10+10)
GobEncode-8 65.6MB/s ± 5% 65.4MB/s ± 2% ~ (p=0.549 n=10+9)
Gzip-8 31.2MB/s ± 0% 31.1MB/s ± 0% -0.24% (p=0.000 n=9+9)
Gunzip-8 264MB/s ± 0% 263MB/s ± 1% ~ (p=0.073 n=9+9)
JSONEncode-8 62.3MB/s ± 2% 67.7MB/s ± 1% +8.67% (p=0.000 n=10+9)
JSONDecode-8 13.4MB/s ± 0% 13.4MB/s ± 1% ~ (p=0.508 n=9+10)
GoParse-8 7.66MB/s ± 1% 7.64MB/s ± 0% -0.23% (p=0.049 n=10+9)
RegexpMatchEasy0_32-8 154MB/s ± 9% 143MB/s ± 3% ~ (p=0.303 n=10+7)
RegexpMatchEasy0_1K-8 1.46GB/s ± 4% 1.47GB/s ± 4% ~ (p=0.912 n=10+10)
RegexpMatchEasy1_32-8 172MB/s ± 9% 170MB/s ±12% ~ (p=0.971 n=10+10)
RegexpMatchEasy1_1K-8 908MB/s ± 1% 972MB/s ± 2% +7.12% (p=0.000 n=10+10)
RegexpMatchMedium_32-8 3.17MB/s ± 7% 3.46MB/s ± 1% +9.14% (p=0.000 n=10+7)
RegexpMatchMedium_1K-8 11.3MB/s ± 0% 12.0MB/s ± 2% +5.51% (p=0.000 n=10+10)
RegexpMatchHard_32-8 5.78MB/s ± 0% 8.21MB/s ± 0% +41.93% (p=0.000 n=9+10)
RegexpMatchHard_1K-8 8.62MB/s ± 0% 8.27MB/s ± 0% -4.11% (p=0.000 n=9+10)
Revcomp-8 237MB/s ± 0% 237MB/s ± 0% ~ (p=0.081 n=9+9)
Template-8 12.0MB/s ± 1% 12.1MB/s ± 2% ~ (p=0.072 n=10+10)
Change-Id: I080801f520366b42d5f9699954bd33106976a81b
Reviewed-on: https://go-review.googlesource.com/c/120661
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This change makes use of a VSX instruction to generate the
float 0 value instead of generating a constant in memory and
loading it from there.
In the +0 case this uses 1 instruction instead of 2 and avoids a memory
reference. In the -0 case it uses 2 instructions but still avoids
the memory reference.
Since this is done in the assembler for ppc64x, an update has
been made to the assembler test.
Change-Id: Ief7dddcb057bfb602f78215f6947664e8c841464
Reviewed-on: https://go-review.googlesource.com/c/139420
Reviewed-by: Michael Munday <mike.munday@ibm.com>
|
|
Change-Id: I94cebca86706e072fbe3be782d3edbe0e22b9432
GitHub-Last-Rev: 8e15a40545704fb21b41a8768079f2da19341ef3
GitHub-Pull-Request: golang/go#28067
Reviewed-on: https://go-review.googlesource.com/c/140437
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
Currently "ADD $0x123456, Rs, Rd" loads the pre-stored 0x123456
from the constant pool and uses it for the addition, which costs
12 bytes in total. The same applies to SUB.
This CL splits it into "ADD $0x123000, Rs, Rd" + "ADD $0x000456, Rd, Rd".
Both "0x123000" and "0x000456" can be encoded directly into the
instruction, so 4 bytes are saved.
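The split itself is simple arithmetic on the immediate. A minimal sketch,
with splitAddImm as a hypothetical helper name:

```go
package main

import "fmt"

// splitAddImm splits a 24-bit immediate into a high part (a 12-bit value
// shifted left by 12) and a low 12-bit part. Each part fits the immediate
// field of a single ADD or SUB. ok is false if imm needs more than 24 bits.
func splitAddImm(imm uint32) (hi, lo uint32, ok bool) {
	if imm > 0xffffff {
		return 0, 0, false
	}
	return imm &^ 0xfff, imm & 0xfff, true
}

func main() {
	hi, lo, ok := splitAddImm(0x123456)
	fmt.Printf("%#x %#x %v\n", hi, lo, ok)
}
```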
1. The total size of pkg/android_arm64 decreases about 0.3KB.
2. The go1 benchmark shows little regression (excluding noise).
name old time/op new time/op delta
BinaryTree17-4 15.9s ± 0% 15.9s ± 1% +0.10% (p=0.044 n=29+29)
Fannkuch11-4 8.72s ± 0% 8.75s ± 0% +0.34% (p=0.000 n=30+24)
FmtFprintfEmpty-4 173ns ± 0% 173ns ± 0% ~ (all equal)
FmtFprintfString-4 368ns ± 0% 368ns ± 0% ~ (p=0.593 n=30+30)
FmtFprintfInt-4 417ns ± 0% 417ns ± 0% ~ (all equal)
FmtFprintfIntInt-4 673ns ± 0% 661ns ± 1% -1.70% (p=0.000 n=30+30)
FmtFprintfPrefixedInt-4 805ns ± 0% 805ns ± 0% +0.10% (p=0.011 n=30+30)
FmtFprintfFloat-4 1.09µs ± 0% 1.09µs ± 0% ~ (p=0.125 n=30+29)
FmtManyArgs-4 2.68µs ± 0% 2.68µs ± 0% +0.07% (p=0.004 n=30+30)
GobDecode-4 32.9ms ± 0% 33.2ms ± 1% +1.07% (p=0.000 n=29+29)
GobEncode-4 29.5ms ± 0% 29.6ms ± 0% +0.26% (p=0.000 n=28+28)
Gzip-4 1.38s ± 1% 1.35s ± 3% -1.94% (p=0.000 n=28+30)
Gunzip-4 139ms ± 0% 139ms ± 0% +0.10% (p=0.000 n=28+29)
HTTPClientServer-4 745µs ± 5% 742µs ± 3% ~ (p=0.405 n=28+29)
JSONEncode-4 49.5ms ± 1% 49.9ms ± 0% +0.89% (p=0.000 n=30+30)
JSONDecode-4 264ms ± 1% 264ms ± 0% +0.25% (p=0.001 n=30+30)
Mandelbrot200-4 16.6ms ± 0% 16.6ms ± 0% ~ (p=0.507 n=29+29)
GoParse-4 15.9ms ± 0% 16.0ms ± 1% +0.91% (p=0.002 n=23+30)
RegexpMatchEasy0_32-4 379ns ± 0% 379ns ± 0% ~ (all equal)
RegexpMatchEasy0_1K-4 1.31µs ± 0% 1.31µs ± 0% +0.09% (p=0.008 n=27+30)
RegexpMatchEasy1_32-4 357ns ± 0% 358ns ± 0% +0.28% (p=0.000 n=28+29)
RegexpMatchEasy1_1K-4 2.04µs ± 0% 2.04µs ± 0% ~ (p=0.850 n=30+30)
RegexpMatchMedium_32-4 587ns ± 0% 589ns ± 0% +0.33% (p=0.000 n=30+30)
RegexpMatchMedium_1K-4 162µs ± 0% 163µs ± 0% ~ (p=0.351 n=30+29)
RegexpMatchHard_32-4 9.54µs ± 0% 9.60µs ± 0% +0.59% (p=0.000 n=28+30)
RegexpMatchHard_1K-4 287µs ± 0% 287µs ± 0% +0.11% (p=0.000 n=26+29)
Revcomp-4 2.50s ± 0% 2.50s ± 0% -0.13% (p=0.012 n=28+27)
Template-4 312ms ± 1% 312ms ± 1% +0.20% (p=0.015 n=27+30)
TimeParse-4 1.68µs ± 0% 1.68µs ± 0% -0.35% (p=0.000 n=30+30)
TimeFormat-4 1.66µs ± 0% 1.64µs ± 0% -1.20% (p=0.000 n=25+29)
[Geo mean] 246µs 246µs -0.00%
name old speed new speed delta
GobDecode-4 23.3MB/s ± 0% 23.1MB/s ± 1% -1.05% (p=0.000 n=29+29)
GobEncode-4 26.0MB/s ± 0% 25.9MB/s ± 0% -0.25% (p=0.000 n=29+28)
Gzip-4 14.1MB/s ± 1% 14.4MB/s ± 3% +1.94% (p=0.000 n=27+30)
Gunzip-4 139MB/s ± 0% 139MB/s ± 0% -0.10% (p=0.000 n=28+29)
JSONEncode-4 39.2MB/s ± 1% 38.9MB/s ± 0% -0.88% (p=0.000 n=30+30)
JSONDecode-4 7.37MB/s ± 0% 7.35MB/s ± 0% -0.26% (p=0.001 n=30+30)
GoParse-4 3.65MB/s ± 0% 3.62MB/s ± 1% -0.86% (p=0.001 n=23+30)
RegexpMatchEasy0_32-4 84.3MB/s ± 0% 84.3MB/s ± 0% ~ (p=0.126 n=27+26)
RegexpMatchEasy0_1K-4 784MB/s ± 0% 783MB/s ± 0% -0.10% (p=0.003 n=27+30)
RegexpMatchEasy1_32-4 89.5MB/s ± 0% 89.3MB/s ± 0% -0.20% (p=0.000 n=27+29)
RegexpMatchEasy1_1K-4 502MB/s ± 0% 502MB/s ± 0% ~ (p=0.858 n=30+28)
RegexpMatchMedium_32-4 1.70MB/s ± 0% 1.70MB/s ± 0% -0.25% (p=0.000 n=30+30)
RegexpMatchMedium_1K-4 6.30MB/s ± 0% 6.30MB/s ± 0% ~ (all equal)
RegexpMatchHard_32-4 3.35MB/s ± 0% 3.33MB/s ± 0% -0.47% (p=0.000 n=30+30)
RegexpMatchHard_1K-4 3.57MB/s ± 0% 3.56MB/s ± 0% -0.20% (p=0.000 n=27+30)
Revcomp-4 102MB/s ± 0% 102MB/s ± 0% +0.14% (p=0.008 n=28+28)
Template-4 6.23MB/s ± 0% 6.21MB/s ± 1% -0.21% (p=0.009 n=21+30)
[Geo mean] 24.1MB/s 24.0MB/s -0.16%
Change-Id: Ifcef3edb667540e2d86e586c23afcfbc2cf1340b
Reviewed-on: https://go-review.googlesource.com/c/134536
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Don't worry, this patch just removes trailing whitespace from
assembly files, and does not make any logical changes.
Change-Id: Ia724ac0b1abf8bc1e41454bdc79289ef317c165d
Reviewed-on: https://go-review.googlesource.com/c/113595
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
The current assembler accepts a non-integer register as the base register,
which should be an illegal combination.
Add the test cases.
Change-Id: Ia21596bbb5b1e212e34bd3a170748ae788860422
Reviewed-on: https://go-review.googlesource.com/134575
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
some load/store
According to the ARM64 manual, it is "constrained unpredictable behavior"
if the src and dst registers of some load/store instructions are the same.
To completely prevent such unpredictable behavior, this CL adds a
check to the assembler for the load/store instructions that it
supports.
Add test cases.
Update #25823
Change-Id: I64c14ad99ee543d778e7ec8ae6516a532293dbb3
Reviewed-on: https://go-review.googlesource.com/120660
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
The current assembler rewrites float constants, except 0.0, to values
stored in memory, which is not performant. This patch uses the FMOVS/FMOVD
instructions to move the floating-point constants that can be encoded as
immediates into SIMD&FP destination registers. Whether a constant can be
encoded into an FMOVS/FMOVD instruction is checked by the chipfloat7() function.
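Which constants qualify can be sketched in Go. This is not the
chipfloat7() implementation; fmovImm8 is a hypothetical helper that
assumes the A64 8-bit floating-point immediate format, whose values are
±(16+frac)/16 × 2^exp with frac in 0..15 and exp in -3..4.

```go
package main

import (
	"fmt"
	"math"
)

// fmovImm8 reports whether f is one of the 256 values representable as an
// 8-bit floating-point immediate and, if so, returns the encoding.
// Zero is not representable in this format.
func fmovImm8(f float64) (uint8, bool) {
	if f == 0 || math.IsNaN(f) || math.IsInf(f, 0) {
		return 0, false
	}
	var sign uint8
	if math.Signbit(f) {
		sign = 1
		f = -f
	}
	for exp := -3; exp <= 4; exp++ {
		for frac := 0; frac < 16; frac++ {
			// Candidate value: (16+frac)/16 * 2^exp.
			if f != math.Ldexp(float64(16+frac), exp-4) {
				continue
			}
			// 3-bit exponent field: exponents 1..4 map to 0..3,
			// exponents -3..0 map to 4..7.
			field := exp - 1
			if exp <= 0 {
				field = exp + 7
			}
			return sign<<7 | uint8(field)<<4 | uint8(frac), true
		}
	}
	return 0, false
}

func main() {
	for _, f := range []float64{1.0, -2.0, 0.1, 31.0} {
		enc, ok := fmovImm8(f)
		fmt.Printf("%v: %#02x %v\n", f, enc, ok)
	}
}
```

Constants such as 0.1 fall outside this set and would presumably still be
loaded from memory.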
go1 benchmark results.
name old time/op new time/op delta
BinaryTree17-8 6.27s ± 1% 6.27s ± 1% ~ (p=0.762 n=10+8)
Fannkuch11-8 5.42s ± 1% 5.38s ± 0% -0.63% (p=0.000 n=10+10)
FmtFprintfEmpty-8 92.9ns ± 1% 93.4ns ± 0% +0.47% (p=0.004 n=9+8)
FmtFprintfString-8 169ns ± 2% 170ns ± 4% ~ (p=0.378 n=10+10)
FmtFprintfInt-8 197ns ± 1% 196ns ± 1% -0.77% (p=0.009 n=10+9)
FmtFprintfIntInt-8 284ns ± 1% 286ns ± 1% ~ (p=0.051 n=10+10)
FmtFprintfPrefixedInt-8 419ns ± 0% 422ns ± 1% +0.69% (p=0.038 n=6+10)
FmtFprintfFloat-8 458ns ± 0% 463ns ± 1% +1.14% (p=0.000 n=10+10)
FmtManyArgs-8 1.35µs ± 2% 1.36µs ± 1% +0.91% (p=0.043 n=10+10)
GobDecode-8 16.0ms ± 2% 15.5ms ± 1% -3.39% (p=0.000 n=10+10)
GobEncode-8 11.9ms ± 3% 11.4ms ± 1% -3.98% (p=0.000 n=10+9)
Gzip-8 621ms ± 0% 625ms ± 0% +0.59% (p=0.000 n=9+10)
Gunzip-8 74.0ms ± 1% 74.3ms ± 0% ~ (p=0.059 n=9+8)
HTTPClientServer-8 116µs ± 1% 116µs ± 1% ~ (p=0.165 n=10+10)
JSONEncode-8 29.3ms ± 1% 29.5ms ± 0% +0.72% (p=0.001 n=10+10)
JSONDecode-8 145ms ± 1% 148ms ± 2% +2.06% (p=0.000 n=10+10)
Mandelbrot200-8 9.67ms ± 0% 9.48ms ± 1% -1.92% (p=0.000 n=8+10)
GoParse-8 7.55ms ± 0% 7.60ms ± 0% +0.57% (p=0.000 n=9+10)
RegexpMatchEasy0_32-8 234ns ± 0% 210ns ± 0% -10.13% (p=0.000 n=8+10)
RegexpMatchEasy0_1K-8 753ns ± 1% 729ns ± 0% -3.17% (p=0.000 n=10+8)
RegexpMatchEasy1_32-8 225ns ± 0% 224ns ± 0% -0.44% (p=0.000 n=9+9)
RegexpMatchEasy1_1K-8 1.03µs ± 0% 1.04µs ± 1% +1.29% (p=0.000 n=10+10)
RegexpMatchMedium_32-8 320ns ± 3% 296ns ± 6% -7.50% (p=0.000 n=10+10)
RegexpMatchMedium_1K-8 77.0µs ± 5% 73.6µs ± 1% ~ (p=0.393 n=10+10)
RegexpMatchHard_32-8 3.93µs ± 0% 3.89µs ± 1% -0.95% (p=0.000 n=10+9)
RegexpMatchHard_1K-8 120µs ± 5% 115µs ± 1% ~ (p=0.739 n=10+10)
Revcomp-8 1.07s ± 0% 1.08s ± 1% +0.63% (p=0.000 n=10+9)
Template-8 165ms ± 1% 163ms ± 1% -1.05% (p=0.001 n=8+10)
TimeParse-8 751ns ± 1% 749ns ± 1% ~ (p=0.209 n=10+10)
TimeFormat-8 759ns ± 1% 751ns ± 1% -0.96% (p=0.001 n=10+10)
name old speed new speed delta
GobDecode-8 48.0MB/s ± 2% 49.6MB/s ± 1% +3.50% (p=0.000 n=10+10)
GobEncode-8 64.5MB/s ± 3% 67.1MB/s ± 1% +4.08% (p=0.000 n=10+9)
Gzip-8 31.2MB/s ± 0% 31.1MB/s ± 0% -0.55% (p=0.000 n=9+8)
Gunzip-8 262MB/s ± 1% 261MB/s ± 0% ~ (p=0.059 n=9+8)
JSONEncode-8 66.3MB/s ± 1% 65.8MB/s ± 0% -0.72% (p=0.001 n=10+10)
JSONDecode-8 13.4MB/s ± 1% 13.2MB/s ± 1% -2.02% (p=0.000 n=10+10)
GoParse-8 7.67MB/s ± 0% 7.63MB/s ± 0% -0.57% (p=0.000 n=9+10)
RegexpMatchEasy0_32-8 136MB/s ± 0% 152MB/s ± 0% +11.45% (p=0.000 n=10+10)
RegexpMatchEasy0_1K-8 1.36GB/s ± 1% 1.40GB/s ± 0% +3.25% (p=0.000 n=10+8)
RegexpMatchEasy1_32-8 142MB/s ± 0% 143MB/s ± 0% +0.35% (p=0.000 n=10+9)
RegexpMatchEasy1_1K-8 992MB/s ± 0% 980MB/s ± 1% -1.27% (p=0.000 n=10+10)
RegexpMatchMedium_32-8 3.12MB/s ± 3% 3.38MB/s ± 6% +8.17% (p=0.000 n=10+10)
RegexpMatchMedium_1K-8 13.3MB/s ± 5% 13.9MB/s ± 1% ~ (p=0.362 n=10+10)
RegexpMatchHard_32-8 8.14MB/s ± 0% 8.21MB/s ± 1% +0.95% (p=0.000 n=10+9)
RegexpMatchHard_1K-8 8.54MB/s ± 5% 8.90MB/s ± 1% ~ (p=0.636 n=10+10)
Revcomp-8 238MB/s ± 0% 236MB/s ± 1% -0.63% (p=0.000 n=10+9)
Template-8 11.8MB/s ± 1% 11.9MB/s ± 1% +1.07% (p=0.001 n=8+10)
Change-Id: I57b372d8dcd47e6aec39893843b20385d5d9c37e
Reviewed-on: https://go-review.googlesource.com/129555
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
LDADDALD (64-bit) and LDADDALW (32-bit) are already supported.
This CL adds support for LDADDALH (16-bit) and LDADDALB (8-bit).
Change-Id: I4eac61adcec226d618dfce88618a2b98f5f1afe7
Reviewed-on: https://go-review.googlesource.com/132135
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This CL implements the math/bits.OnesCount{8,16,32,64} functions
as intrinsics on s390x using the 'population count' (popcnt)
instruction. This instruction was released as the 'population-count'
facility which uses the same facility bit (45) as the
'distinct-operands' facility which is a pre-requisite for Go on
s390x. We can therefore use it without a feature check.
The s390x popcnt instruction treats a 64 bit register as a vector
of 8 bytes, summing the number of ones in each byte individually.
It then writes the results to the corresponding bytes in the
output register. Therefore to implement OnesCount{16,32,64} we
need to sum the individual byte counts using some extra
instructions. To do this efficiently I've added some additional
pseudo operations to the s390x SSA backend.
Unlike other architectures the new instruction sequence is faster
for OnesCount8, so that is implemented using the intrinsic.
name old time/op new time/op delta
OnesCount 3.21ns ± 1% 1.35ns ± 0% -58.00% (p=0.000 n=20+20)
OnesCount8 0.91ns ± 1% 0.81ns ± 0% -11.43% (p=0.000 n=20+20)
OnesCount16 1.51ns ± 3% 1.21ns ± 0% -19.71% (p=0.000 n=20+17)
OnesCount32 1.91ns ± 0% 1.12ns ± 1% -41.60% (p=0.000 n=19+20)
OnesCount64 3.18ns ± 4% 1.35ns ± 0% -57.52% (p=0.000 n=20+20)
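The per-byte behaviour described above can be modelled in plain Go. This is only an illustrative sketch, not the SSA pseudo operations added by the CL: the names bytePopCounts, sumBytes and onesCount64 are hypothetical, and the shift-and-add reduction is one possible way to sum the byte lanes.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bytePopCounts models the described popcnt behaviour: each byte of the
// result holds the number of set bits in the corresponding byte of x.
// (Hypothetical helper, software model only.)
func bytePopCounts(x uint64) uint64 {
	var r uint64
	for i := uint(0); i < 64; i += 8 {
		b := uint8(x >> i)
		r |= uint64(bits.OnesCount8(b)) << i
	}
	return r
}

// sumBytes adds the eight byte lanes together. Every lane holds at most 8,
// so the partial sums (at most 64) never carry into a neighbouring byte.
func sumBytes(c uint64) int {
	c += c >> 32
	c += c >> 16
	c += c >> 8
	return int(c & 0xff)
}

// onesCount64 combines the two steps: per-byte counts, then a reduction.
func onesCount64(x uint64) int {
	return sumBytes(bytePopCounts(x))
}

func main() {
	x := uint64(0xFF00FF00FF00FF00)
	fmt.Println(onesCount64(x), bits.OnesCount64(x))
}
```

For OnesCount8 no reduction is needed in this model, since the low byte of the per-byte result is already the answer.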
Change-Id: Id54f0bd28b6db9a887ad12c0d72fcc168ef9c4e0
Reviewed-on: https://go-review.googlesource.com/114675
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
ARM64 also supports floating-point LDP (load pair) and STP (store pair).
This CL adds the implementation and corresponding test cases for
FLDPD/FLDPS/FSTPD/FSTPS.
Change-Id: I45f112012a4e097bfaf023d029b36e6cbc7a5859
Reviewed-on: https://go-review.googlesource.com/125438
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Change-Id: Iadb3c5de8ae9ea45855013997ed70f7929a88661
GitHub-Last-Rev: ae85bcf82be8fee533e2b9901c6133921382c70a
GitHub-Pull-Request: golang/go#26920
Reviewed-on: https://go-review.googlesource.com/128955
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This CL adds register indexed FMOVS/FMOVD.
FMOVS Fx, (Rn)(Rm)
FMOVS Fx, (Rn)(Rm<<2)
FMOVD Fx, (Rn)(Rm)
FMOVD Fx, (Rn)(Rm<<3)
FMOVS (Rn)(Rm), Fx
FMOVS (Rn)(Rm<<2), Fx
FMOVD (Rn)(Rm), Fx
FMOVD (Rn)(Rm<<3), Fx
Change-Id: Id76de6a4be96b64cf79d7e9a1962d9d49cb462f2
Reviewed-on: https://go-review.googlesource.com/123995
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
These new instructions have acquire/release semantics, in addition to
the normal atomic SWPD/SWPW/SWPH/SWPB.
Change-Id: I24821a4d21aebc342897ae52903aef612c8d8a4a
Reviewed-on: https://go-review.googlesource.com/128476
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
The package arch didn't have a definition as you can see in https://tip.golang.org/pkg/cmd/asm/internal/arch/
Change-Id: I07653b396393a75c445d04dbae5e22e90a0d5133
GitHub-Last-Rev: a859e9410f38073853687b933f53eb6570af3216
GitHub-Pull-Request: golang/go#26817
Reviewed-on: https://go-review.googlesource.com/127929
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
"BFI $0, R1, $7, R2" is expected to copy bit 0~6 from R1 to R2, and
leave R2's other bits unchanged.
But the assembler rejects it with error "illegal bit number", and
BFIW/SBFIZ/SBFIZW/UBFIZ/UBFIZW have the same problem.
This CL fixes that issue and adds corresponding test cases.
fixes #26736
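The expected effect of the bitfield insert can be written out in Go as a rough model. The function name bfi and its parameter order are hypothetical, and the sketch assumes the usual BFI behaviour of taking the low width bits of the source and placing them at bit position lsb of the destination; the message above only spells out the lsb = 0 case.

```go
package main

import "fmt"

// bfi models a bitfield insert: the low `width` bits of src replace bits
// [lsb, lsb+width) of dst, and all other bits of dst are left unchanged.
// (Hypothetical helper; assumes lsb+width <= 64.)
func bfi(dst, src uint64, lsb, width uint) uint64 {
	mask := (uint64(1)<<width - 1) << lsb
	return dst&^mask | (src<<lsb)&mask
}

func main() {
	// Corresponds to "BFI $0, R1, $7, R2" with R1 = 0xABCD and
	// R2 = 0xFFFFFFFFFFFFFF00: bits 0~6 come from R1.
	r1 := uint64(0xABCD)
	r2 := uint64(0xFFFFFFFFFFFFFF00)
	fmt.Printf("%#x\n", bfi(r2, r1, 0, 7))
}
```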
Change-Id: Ie0090a0faa38a49dd9b096a0f435987849800b76
Reviewed-on: https://go-review.googlesource.com/127159
Run-TryBot: Ben Shi <powerman1st@163.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
"LDP (R0), (F0, F1)" and "STP (F1, F2), (R0)" are
silently accepted by the arm64 assembler without
any error message. This CL fixes that bug.
fixes #26556.
Change-Id: Ib6fae81956deb39a4ffd95e9409acc8dad3ab2d2
Reviewed-on: https://go-review.googlesource.com/125637
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Change-Id: I877c82788f3edbcb0b334b42049c1a06f36a6477
Reviewed-on: https://go-review.googlesource.com/123517
Reviewed-by: Rob Pike <r@golang.org>
|
|
ARMv8.1 added a new instruction (LDADDAL) for atomic memory operations. This
CL improves the existing atomic add intrinsics by using the new instruction.
Since the new instruction is only guaranteed to be present from ARMv8.1 onward,
we guard its use with a check on the CPU feature.
Performance result on ARMv8.1 machine:
name old time/op new time/op delta
Xadd-224 1.05µs ± 6% 0.02µs ± 4% -98.06% (p=0.000 n=10+8)
Xadd64-224 1.05µs ± 3% 0.02µs ±13% -98.10% (p=0.000 n=9+10)
[Geo mean] 1.05µs 0.02µs -98.08%
Performance result on ARMv8.0 machine:
name old time/op new time/op delta
Xadd-46 538ns ± 1% 541ns ± 1% +0.62% (p=0.000 n=9+9)
Xadd64-46 505ns ± 1% 508ns ± 0% +0.48% (p=0.003 n=9+8)
[Geo mean] 521ns 524ns +0.55%
Change-Id: If4b5d8d0e2d6f84fe1492a4f5de0789910ad0ee9
Reviewed-on: https://go-review.googlesource.com/81877
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
These tests were meant to be included into https://golang.org/cl/113315,
but were lost somewhere in the middle.
This CL adds hand-written AVX-512 tests that complement the
auto-generated test suite.
They are worth including because they:
- Cover every new Z-case explicitly
- Check every opcode suffix encoding
Change-Id: Id6da5f58773e07bef3d532fc3ca5db391d380ebf
Reviewed-on: https://go-review.googlesource.com/115858
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
There are two issues in the arm64 assembler.
1. "CMPW $0x22220000, RSP" is encoded to 5b44a4d2ff031b6b, which
is the combination of "MOVD $0x22220000, Rtmp" and
"NEGSW Rtmp, ZR".
The right encoding should be a combination of
"MOVD $0x22220000, Rtmp" and "CMPW Rtmp, RSP".
2. "AND $0x22220000, R2, RSP" is encoded to 5b44a4d25f601b00,
which is the combination of "MOVD $0x22220000, Rtmp" and
an illegal instruction.
The right behavior should be an error report of
"illegal combination", since "AND Rtmp, RSP, RSP" is invalid
in armv8.
This CL fixes the above 2 issues and adds more test cases.
fixes #25557
Change-Id: Ia510be26b58a229f5dfe8a5fa0b35569b2d566e7
Reviewed-on: https://go-review.googlesource.com/114796
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This change adds Yi8 forms for every ytab that had them before the AVX-512 patch.
The rationale is backwards-compatibility.
EVEX forms remain strict and unchanged as they're not bound to any
backwards-compatibility issues.
Fixes #25510
Change-Id: Icd692266010ed64c9fe47cc837afc2edf2ad2d1d
Reviewed-on: https://go-review.googlesource.com/114136
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
|
|
- Uncomment tests for AVX512 encoder
- Permit instruction suffixes for x86
- Permit limited reg list [reg-reg] syntax for x86 for multi-source ops
- EVEX encoding support in obj/x86 (Z-cases, asmevex, etc.)
- optabs and ytabs generated by x86avxgen (https://golang.org/cl/107216)
Note: suffix formatting is implemented with the updated CConv function.
Each arch asm backend should now register its formatting function by
calling RegisterOpSuffix.
Updates #22779
Change-Id: I076a167ee49582700e058c56ad74e6696710c8c8
Reviewed-on: https://go-review.googlesource.com/113315
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Fixes invalid encoding of VPERMQ and VPERMPD when they use
a negative immediate argument.
Fixes #25418
Updates #25420
Change-Id: Idd8180c4c632a76b76f3a68efd5f930d94431994
Reviewed-on: https://go-review.googlesource.com/113615
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
|
|
Change-Id: Icbff14b52e040826bc6de704942ff2f8e0164e3e
Reviewed-on: https://go-review.googlesource.com/113596
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
The arm assembler incorrectly encodes the following instructions.
"MUL R2, R4" -> 0xe0040492 ("MUL R4, R2, R4")
"MUL R2, R4, R4" -> 0xe0040492 ("MUL R4, R2, R4")
The CL fixes that issue.
fixes #25347
Change-Id: I883716c7bc51c5f64837ae7d81342f94540a58cb
Reviewed-on: https://go-review.googlesource.com/112737
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Use 0-terminated opbyte sequences for Zlit-like movtabs instead of E=0xff.
movCodeFullPtr is unused (load full ptr is unsupported), but it should
be removed in a separate CL (if removed at all).
Passes toolstash-check.
Change-Id: I28436718d93b017153de0e50e3bcec344ea4ee05
Reviewed-on: https://go-review.googlesource.com/107076
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
The current assembler accepts MUL* related instructions with 4 operands,
such as "MUL R1, R2, R3, R4", which is illegal.
The fix adds an extra field to Optab, which holds a value such as
C_NONE, C_REG, etc., so the assembler can use p.From3Type for checking
in oplook.
Add test cases.
Fixes #25059
Change-Id: I0656319383c460696b392197bf5960b987f8fc97
Reviewed-on: https://go-review.googlesource.com/109295
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
|
|
This commit adds the wasm architecture to the compile command.
A later commit will contain the corresponding linker changes.
Design doc: https://docs.google.com/document/d/131vjr4DH6JFnb-blm_uRdaC0_Nv3OUwjEY5qVCxCup4
The following files are generated:
- src/cmd/compile/internal/ssa/opGen.go
- src/cmd/compile/internal/ssa/rewriteWasm.go
- src/cmd/internal/obj/wasm/anames.go
Updates #18892
Change-Id: Ifb4a96a3e427aac2362a1c97967d5667450fba3b
Reviewed-on: https://go-review.googlesource.com/103295
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
More atomic instructions were introduced in ARMv8.1. This CL
adds support for them along with corresponding test cases.
LDADD Rs, (Rb), Rt: (Rb) -> Rt, Rs+(Rb) -> (Rb)
LDAND Rs, (Rb), Rt: (Rb) -> Rt, Rs&(Rb) -> (Rb)
LDEOR Rs, (Rb), Rt: (Rb) -> Rt, Rs^(Rb) -> (Rb)
LDOR Rs, (Rb), Rt: (Rb) -> Rt, Rs|(Rb) -> (Rb)
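The four semantics lines above share one pattern: the old memory value goes to Rt and the combined value is stored back. A rough Go model of that pattern is sketched below. The helper names (fetchOp, ldadd, ldand, ldeor, ldor) are hypothetical, and the compare-and-swap loop only imitates the effect; the hardware does this in a single instruction.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// fetchOp atomically replaces *addr with op(*addr, rs) and returns the
// value that was in memory beforehand, mirroring
// "(Rb) -> Rt, Rs op (Rb) -> (Rb)". (Hypothetical software model.)
func fetchOp(addr *uint64, rs uint64, op func(mem, rs uint64) uint64) uint64 {
	for {
		old := atomic.LoadUint64(addr)
		if atomic.CompareAndSwapUint64(addr, old, op(old, rs)) {
			return old
		}
	}
}

func ldadd(addr *uint64, rs uint64) uint64 {
	return fetchOp(addr, rs, func(m, r uint64) uint64 { return m + r })
}

func ldand(addr *uint64, rs uint64) uint64 {
	return fetchOp(addr, rs, func(m, r uint64) uint64 { return m & r })
}

func ldeor(addr *uint64, rs uint64) uint64 {
	return fetchOp(addr, rs, func(m, r uint64) uint64 { return m ^ r })
}

func ldor(addr *uint64, rs uint64) uint64 {
	return fetchOp(addr, rs, func(m, r uint64) uint64 { return m | r })
}

func main() {
	mem := uint64(10)
	rt := ldadd(&mem, 5)
	fmt.Println(rt, mem) // old value, then updated memory
}
```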
Change-Id: Ifb9df86583c4dc54fb96274852c3b93a197045e4
Reviewed-on: https://go-review.googlesource.com/110535
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|