| Age | Commit message (Collapse) | Author |
|
register) bug
The current code encodes the wrong option value in the binary.
The fix reconstructs the function opxrrr() that does not encode the option
value into the binary value when arguments is sign or zero-extended register.
Add the relevant test cases and negative tests.
Fixes #23501
Change-Id: Ie5850ead2ad08d9a235a5664869aac5051762f1f
Reviewed-on: https://go-review.googlesource.com/88876
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
The current code misassembles VLD1/VST1 instruction with non-zero
offset. The offset is dropped silently without any error message.
The cause of the misassembling is the current code treats argument
(Rn)(Rm) as ZOREG type.
The fix changes the matching rules and considers (Rn)(Rm) as ROFF
type. The fix will report error information when assembles VLD1/VST1
(R8)(R13), [V1.16B].
The fix enables the ARM64Errors test.
Fixes #23448
Change-Id: I3dd518b91e9960131ffb8efcb685cb8df84b70eb
Reviewed-on: https://go-review.googlesource.com/87956
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
NMULAF/NMULAD/NMULSF/NMULSD are incorrectly encoded by the arm
assembler.
Instruction Right binary Current wrong binary
"NMULAF F5, F6, F7" 0xee167a45 0xee167a05
"NMULAD F5, F6, F7" 0xee167b45 0xee167b05
"NMULSF F5, F6, F7" 0xee167a05 0xee167a45
"NMULSD F5, F6, F7" 0xee167b05 0xee167b45
This patch fixes this issue.
fixes issue #23212
Change-Id: Ic9c203f92c34b90d6eef492a694c0e95b4d479c5
Reviewed-on: https://go-review.googlesource.com/85116
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Compiler and linker changes to support DWARF inlined instances,
see https://go.googlesource.com/proposal/+/HEAD/design/22080-dwarf-inlining.md
for design details.
This functionality is gated via the cmd/compile option -gendwarfinl=N,
where N={0,1,2}, where a value of 0 disables dwarf inline generation,
a value of 1 turns on dwarf generation without tracking of formal/local
vars from inlined routines, and a value of 2 enables inlines with
variable tracking.
Updates #22080
Change-Id: I69309b3b815d9fed04aebddc0b8d33d0dbbfad6e
Reviewed-on: https://go-review.googlesource.com/75550
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Fixes VBLENDVP{D/S}, VPBLENDVB encoding for /is4 imm8[7:4]
encoded register operand.
Explanation:
`reg[r]+regrex[r]+1` will yield correct values for 8..15 reg indexes,
but for 0..7 it gives `index+1` results.
There was no test that used lower 8 register with /is4 encoding,
so the bug passed the tests.
The proper solution is to get 4th bit from regrex with a proper shift:
`reg[r]|(regrex[r]<<1)`.
Instead of inlining `reg[r]|(regrex[r]<<1)` expr,
using new `regIndex(r)` function.
Test that reproduces this issue is added to
amd64enc_extra.s test suite.
Bug came from https://golang.org/cl/70650.
Change-Id: I846a25e88d5e6df88df9d9c3f5fe94ec55416a33
Reviewed-on: https://go-review.googlesource.com/78815
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
|
|
Use SIMD instructions when counting a single byte.
Inspired from runtime IndexByte implementation.
Benchmark results of bytes, where 1 byte in every 8 is the one we are looking:
name old time/op new time/op delta
CountSingle/10-8 96.1ns ± 1% 38.8ns ± 0% -59.64% (p=0.000 n=9+7)
CountSingle/32-8 172ns ± 2% 36ns ± 1% -79.27% (p=0.000 n=10+10)
CountSingle/4K-8 18.2µs ± 1% 0.9µs ± 0% -95.17% (p=0.000 n=9+10)
CountSingle/4M-8 18.4ms ± 0% 0.9ms ± 0% -95.00% (p=0.000 n=10+9)
CountSingle/64M-8 284ms ± 4% 19ms ± 0% -93.40% (p=0.000 n=10+10)
name old speed new speed delta
CountSingle/10-8 104MB/s ± 1% 258MB/s ± 0% +147.99% (p=0.000 n=9+10)
CountSingle/32-8 185MB/s ± 1% 897MB/s ± 1% +385.33% (p=0.000 n=9+10)
CountSingle/4K-8 225MB/s ± 1% 4658MB/s ± 0% +1967.40% (p=0.000 n=9+10)
CountSingle/4M-8 228MB/s ± 0% 4555MB/s ± 0% +1901.71% (p=0.000 n=10+9)
CountSingle/64M-8 236MB/s ± 4% 3575MB/s ± 0% +1414.69% (p=0.000 n=10+10)
Change-Id: Ifccb51b3c8658c49773fe05147c3cf3aead361e5
Reviewed-on: https://go-review.googlesource.com/71111
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
The arguments <pstatefield> is a struct that includes two elements,
element reg is special register, elememt enc is pstate field values.
The current code compares two different type values and get a incorrect
result.
The fix follows pstate field to create a systemreg struct,
each system register has a vaule to use in instruction.
Uncomment the msr/mrs cases.
Fixes #21464
Change-Id: I1bb1587ec8548f3e4bd8d5be4d7127bd10d53186
Reviewed-on: https://go-review.googlesource.com/56030
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
When instruction has only one argument, Go parser saves the
argument value into prog.From without any special handling.
But assembler gets the argument value from prog.To.
The fix adds special handling for CLREX and puts other instructions
arguments value into prog.From.
Uncomment hlt/hvc/smc/brk/dcps1/dcps2/dcps3/clrex test cases.
Fixes #20765
Change-Id: I1fc0d2faafb19b537cab5a665bd4af56c3a2c925
Reviewed-on: https://go-review.googlesource.com/78275
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Enables AVX2 gather instructions and VSIB support,
which makes vm32{x,y} vm64{x,y} operands encodable.
AXXX constants placed with respect to sorting order.
New VEX optabs inserted near non-VEX entries to simplify
potential transition to auto-generated VSIB optabs.
Tests go into new AMD64 encoder test file (amd64enc_extra.s)
to avoid unnecessary interactions with auto-generated "amd64enc.s".
Side note: x86avxgen did not produced these instructions
because x86.v0.2.csv misses them.
This also explains why x86 test suite have no AVX2 gather
instructions tests.
List of new instructions:
VGATHERPDP
VGATHERDPS
VGATHERQPD
VGATHERQPS
VPGATHERDD
VPGATHERDQ
VPGATHERQD
VPGATHERQQ
Change-Id: Iac852f3c5016523670bd99de6bec6a48f66fb4f6
Reviewed-on: https://go-review.googlesource.com/77970
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
|
|
Also, with this change, error locations don't print absolute positions
in [] brackets following positions relative to line directives. To get
the absolute positions as well, specify the -L flag.
Fixes #22660.
Change-Id: I9ecfa254f053defba9c802222874155fa12fee2c
Reviewed-on: https://go-review.googlesource.com/77090
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
This change adds a better implementation of IndexByte in asm that uses the
vector registers/instructions on ppc64x.
benchmark old ns/op new ns/op delta
BenchmarkIndexByte/10-8 9.70 9.37 -3.40%
BenchmarkIndexByte/32-8 10.9 10.9 +0.00%
BenchmarkIndexByte/4K-8 254 92.8 -63.46%
BenchmarkIndexByte/4M-8 249246 118435 -52.48%
BenchmarkIndexByte/64M-8 10737987 7383096 -31.24%
benchmark old MB/s new MB/s speedup
BenchmarkIndexByte/10-8 1030.63 1067.24 1.04x
BenchmarkIndexByte/32-8 2922.69 2928.53 1.00x
BenchmarkIndexByte/4K-8 16065.95 44156.45 2.75x
BenchmarkIndexByte/4M-8 16827.96 35414.21 2.10x
BenchmarkIndexByte/64M-8 6249.67 9089.53 1.45x
Change-Id: I81dbdd620f7bb4e395ce4d1f2a14e8e91e39f9a1
Reviewed-on: https://go-review.googlesource.com/71710
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This change applies x86avxgen output.
https://go-review.googlesource.com/c/arch/+/66972
As an effect, many new AVX instructions are now available.
One of the side-effects of this patch is
sorted AXXX (A-enum) constants.
Some AVX1/2 instructions still not added due to:
1. x86.csv V0.2 does not list them;
2. partially because of (1), test suite does not contain tests for
these instructions.
Change-Id: I90430d773974ca5c995d6950d90e2c62ec88ef47
Reviewed-on: https://go-review.googlesource.com/75490
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
BFC (Bit Field Clear) and BFI (Bit Field Insert) were
introduced in ARMv6T2, and the compiler can use them
to do further optimization.
Change-Id: I5a3fbcd2c2400c9bf4b939da6366c854c744c27f
Reviewed-on: https://go-review.googlesource.com/72891
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Add support for ADX cpuid bit detection and all instructions,
implied by that bit (ADOX/ADCX). They are useful for rsa and math/big in
general.
Change-Id: Idaa93303ead48fd18b9b3da09b3e79de2f7e2193
Reviewed-on: https://go-review.googlesource.com/74850
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Adds the following s390x test under mask (immediate) instructions:
TMHH
TMHL
TMLH
TMLL
These are useful for testing bits and are already used in the math package.
Change-Id: Idffb3f83b238dba76ac1e42ac6b0bf7f1d11bea2
Reviewed-on: https://go-review.googlesource.com/41092
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This change adds three new instructions:
- LPDFR: load positive (math.Abs(x))
- LNDFR: load negative (-math.Abs(x))
- CPSDR: copy sign (math.Copysign(x, y))
By making use of GPR <-> FPR moves we can now compile math.Abs and
math.Copysign to these instructions using SSA rules.
This CL also adds new rules to merge address generation into combined
load operations. This makes GPR <-> FPR move matching more reliable.
name old time/op new time/op delta
Copysign 1.85ns ± 0% 1.40ns ± 1% -24.65% (p=0.000 n=8+10)
Abs 1.58ns ± 1% 0.73ns ± 1% -53.64% (p=0.000 n=10+10)
The geo mean improvement for all math package benchmarks was 4.6%.
Change-Id: I0cec35c5c1b3fb45243bf666b56b57faca981bc9
Reviewed-on: https://go-review.googlesource.com/73950
Run-TryBot: Michael Munday <mike.munday@ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
on ppc64x
This adds support for math Abs, Copysign to be instrinsics on ppc64x.
New instruction FCPSGN is added to generate fcpsgn. Some new
rules are added to improve the int<->float conversions that are
generated mainly due to the Float64bits and Float64frombits in
the math package. PPC64.rules is also modified as suggested
in the review for CL 63290.
Improvements:
benchmark old ns/op new ns/op delta
BenchmarkAbs-16 1.12 0.69 -38.39%
BenchmarkCopysign-16 1.30 0.93 -28.46%
BenchmarkNextafter32-16 9.34 8.05 -13.81%
BenchmarkFrexp-16 8.81 7.60 -13.73%
Others that used Copysign also saw smaller improvements.
I attempted to make this work using rules since that
seems to be preferred, but due to the use of Float64bits and
Float64frombits in these functions, several rules had to be added and
even then not all cases were matched. Using rules became too
complicated and seemed too fragile for these.
Updates #21390
Change-Id: Ia265da9a18355e08000818a4fba1a40e9e031995
Reviewed-on: https://go-review.googlesource.com/67130
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
The addressing mode of global variable was missing, whereas the
compiler may make use of it, causing "illegal combination" error.
This CL adds support of that addressing mode.
Fixes #22390.
Change-Id: Ic8eade31aba73e6fb895f758ee7f277f8f1832ef
Reviewed-on: https://go-review.googlesource.com/72610
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Current suffix check is based on instruction, which is not very
accurate. For example, "MOVW.S R1, R2" is valid, but
"MOVW.S $0xaaaaaaaa, R1" and "MOVW.P CPSR, R9" are not.
This patch fixes the above kinds of issues by checking suffix
based on []optab. And also more test cases are added.
fixes #20509
Change-Id: Ibad91be72c78eefa719412a83b4d44370d2202a8
Reviewed-on: https://go-review.googlesource.com/70910
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
On ARM, STREX does not permit the same register used as both the
source and the destination. Reject the bad instruction.
The assembler also accepted special cases
STREX R0, (R1) as STREX R0, (R1), R0
STREX (R1), R0 as STREX R0, (R1), R0
both are illegal. Remove this special case as well.
For STREXD, check that the destination is not source, and not
source+1. Also check that the source register is even numbered,
as required by the architecture's manual.
Fixes #22268.
Change-Id: I6bfde86ae692d8f1d35bd0bd7aac0f8a11ce8e22
Reviewed-on: https://go-review.googlesource.com/71190
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
Some ARM64-specific instructions (such as SIMD instructions) are not supported.
This patch adds support for the following:
1. Extended register, e.g.:
ADD Rm.<ext>[<<amount], Rn, Rd
<ext> can have the following values:
UXTB, UXTH, UXTW, UXTX, SXTB, SXTH, SXTW and SXTX
2. Arrangement for SIMD instructions, e.g.:
VADDP Vm.<T>, Vn.<T>, Vd.<T>
<T> can have the following values:
B8, B16, H4, H8, S2, S4 and D2
3. Width specifier and element index for SIMD instructions, e.g.:
VMOV Vn.<T>[index], Rd // MOV(to general register)
<T> can have the following values:
S and D
4. Register List, e.g.:
VLD1 (Rn), [Vt1.<T>, Vt2.<T>, Vt3.<T>]
5. Register offset variant, e.g.:
VLD1.P (Rn)(Rm), [Vt1.<T>, Vt2.<T>] // Rm is the post-index register
6. Go assembly for ARM64 reference manual
new added instructions are required to have according explanation items in
the manual and items for existed instructions will be added incrementally
For more information about the refinement background, please refer to the
discussion (https://groups.google.com/forum/#!topic/golang-dev/rWgDxCrL4GU)
This patch only adds syntax and doesn't break any assembly that already exists.
Change-Id: I34e90b7faae032820593a0e417022c354a882008
Reviewed-on: https://go-review.googlesource.com/41654
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This change fixes the implementation of Data Cache instructions for
ppc64x, allowing non-zero hint field values.
Change-Id: I454aac9293d069a4817ee574d5809fa1799b3216
Reviewed-on: https://go-review.googlesource.com/68670
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
Also add -V=full to print a unique identifier of the specific tool being invoked.
This will be used for content-based staleness.
Also sort and clean up a few of the flag doc comments.
Change-Id: I786fe50be0b8e5f77af809d8d2dab721185c2abd
Reviewed-on: https://go-review.googlesource.com/68590
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
|
|
Adds last missing SSE4 instruction.
Also introduces additional ytab set 'yextractps'.
See https://golang.org/cl/57470 that adds other SSE4 instructions
but skips this one due to 'yextractps'.
To make EXTRACTPS less "sloppy", Yu2 oclass added to forbid
usage of invalid offset values in immediate operand.
Part of the mission to add missing amd64 SSE4 instructions to Go asm.
Change-Id: I0e67e3497054f53257dd8eb4c6268da5118b4853
Reviewed-on: https://go-review.googlesource.com/57490
TryBot-Result: Gobot Gobot <gobot@golang.org>
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
The following instructions were introduced in ARMv6, and the compiler
can do more optimization with them.
1. "MOVBS Rm@>i, Rd": rotates Rm 0/8/16/24 bits, does signed
byte extension to word, and writes the result to Rd.
2. "MOVHS Rm@>i, Rd": rotates Rm 0/8/16/24 bits, does signed
halfword extension to word, and writes the result to Rd.
3. "MOVBU Rm@>i, Rd": rotates Rm 0/8/16/24 bits, does unsigned
byte extension to word, and writes the result to Rd.
4. "MOVHU Rm@>i, Rd": rotates Rm 0/8/16/24 bits, does unsigned
half-word extension to word, and writes the result to Rd.
5. "XTAB Rm@>i, Rn, Rd": rotates Rm 0/8/16/24 bits, does signed
byte extension to word, adds Rn, and writes the result to Rd.
6. "XTAH Rm@>i, Rn, Rd": rotates Rm 0/8/16/24 bits, does signed
half-word extension to word, adds Rn, and writes the result to Rd.
7. "XTABU Rm@>i, Rn, Rd": rotates Rm 0/8/16/24 bits, does unsigned
byte extension to word, adds Rn, and writes the result to Rd.
8. "XTAHU Rm@>i, Rn, Rd": rotates Rm 0/8/16/24 bits, does unsigned
half-word extension to word, adds Rn, and writes the result to Rd.
Change-Id: I4306d7ebac93015d7e2e40d307f2c4271c03f466
Reviewed-on: https://go-review.googlesource.com/65790
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
These are the last instructions missing to complete SSE3 support.
For reference what was missing was found by a tool [1]:
$ x86db-gogen list --extension SSE3 --not-known
ADDSUBPD xmmreg,xmmrm [rm: 66 0f d0 /r] PRESCOTT,SSE3,SO
ADDSUBPS xmmreg,xmmrm [rm: f2 0f d0 /r] PRESCOTT,SSE3,SO
[1] https://github.com/dlespiau/x86db
Fixes #20293
Change-Id: Ib5a91bf64dcc5282cdb60eae740ae52b4db16ebd
Reviewed-on: https://go-review.googlesource.com/42990
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
|
|
This change makes it easier to express instructions
with arbitrary number of operands.
Rationale: previous approach with operand "hiding" does
not scale well, AVX and especially AVX512 have many
instructions with 3+ operands.
x86 asm backend is updated to handle up to 6 explicit operands.
It also fixes issue with 4-th immediate operand type checks.
All `ytab` tables are updated accordingly.
Changes to non-x86 backends only include these patterns:
`p.From3 = X` => `p.SetFrom3(X)`
`p.From3.X = Y` => `p.GetFrom3().X = Y`
Over time, other backends can adapt Prog.RestArgs
and reduce the amount of workarounds.
-- Performance --
x/benchmark/build:
$ benchstat upstream.bench patched.bench
name old time/op new time/op delta
Build-48 21.7s ± 2% 21.8s ± 2% ~ (p=0.218 n=10+10)
name old binary-size new binary-size delta
Build-48 10.3M ± 0% 10.3M ± 0% ~ (all equal)
name old build-time/op new build-time/op delta
Build-48 21.7s ± 2% 21.8s ± 2% ~ (p=0.218 n=10+10)
name old build-peak-RSS-bytes new build-peak-RSS-bytes delta
Build-48 145MB ± 5% 148MB ± 5% ~ (p=0.218 n=10+10)
name old build-user+sys-time/op new build-user+sys-time/op delta
Build-48 21.0s ± 2% 21.2s ± 2% ~ (p=0.075 n=10+10)
Microbenchmark shows a slight slowdown.
name old time/op new time/op delta
AMD64asm-4 49.5ms ± 1% 49.9ms ± 1% +0.67% (p=0.001 n=23+15)
func BenchmarkAMD64asm(b *testing.B) {
for i := 0; i < b.N; i++ {
TestAMD64EndToEnd(nil)
TestAMD64Encoder(nil)
}
}
Change-Id: I4f1d37b5c2c966da3f2127705ccac9bff0038183
Reviewed-on: https://go-review.googlesource.com/63490
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
The functions Float64bits and Float64frombits perform
poorly on ppc64x because the int<->float conversions
often result in load and store sequences to handle the
type change. This patch adds more rules to recognize
those sequences and use register to register moves
and avoid unnecessary loads and stores where possible.
There were some existing rules to improve these conversions,
but this provides additional improvements. Included here:
- New instruction FCFIDS to improve on conversion to 32 bit
- Rename Xf2i64 and Xi2f64 as MTVSRD, MFVSRD, to match the asm
- Add rules to lower some of the load/store sequences for
- Added new go asm to ppc64.s testcase.
conversions
Improvements:
BenchmarkAbs-16 2.16 0.93 -56.94%
BenchmarkCopysign-16 2.66 1.18 -55.64%
BenchmarkRound-16 4.82 2.69 -44.19%
BenchmarkSignbit-16 1.71 1.14 -33.33%
BenchmarkFrexp-16 11.4 7.94 -30.35%
BenchmarkLogb-16 10.4 7.34 -29.42%
BenchmarkLdexp-16 15.7 11.2 -28.66%
BenchmarkIlogb-16 10.2 7.32 -28.24%
BenchmarkPowInt-16 69.6 55.9 -19.68%
BenchmarkModf-16 10.1 8.19 -18.91%
BenchmarkLog2-16 17.4 14.3 -17.82%
BenchmarkCbrt-16 45.0 37.3 -17.11%
BenchmarkAtanh-16 57.6 48.3 -16.15%
BenchmarkRemainder-16 76.6 65.4 -14.62%
BenchmarkGamma-16 26.0 22.5 -13.46%
BenchmarkPowFrac-16 197 174 -11.68%
BenchmarkMod-16 112 99.8 -10.89%
BenchmarkAsinh-16 59.9 53.7 -10.35%
BenchmarkAcosh-16 44.8 40.3 -10.04%
Updates #21390
Change-Id: I56cc991fc2e55249d69518d4e1ba76cc23904e35
Reviewed-on: https://go-review.googlesource.com/63290
Reviewed-by: Michael Munday <mike.munday@ibm.com>
|
|
Add support of more ARM VFP instructions in the assembler.
They were introduced in ARM VFPv4.
"FMULAF/FMULAD Fm, Fn, Fd": Fd = Fd + Fn*Fm
"FNMULAF/FNMULAD Fm, Fn, Fd": Fd = -(Fd + Fn*Fm)
"FMULSF/FMULSD Fm, Fn, Fd": Fd = Fd - Fn*Fm
"FNMULSF/FNMULSD Fm, Fn, Fd": Fd = -(Fd - Fn*Fm)
The multiplication results are not rounded.
Change-Id: Id9cc52fd8e1b9a708103cd1e514c85a9e1cb3f47
Reviewed-on: https://go-review.googlesource.com/62550
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This adds the VFMADD[213|231]SD, VFNMADD[213|231]SD,
VADDSD, VSUBSD instructions
This will allow us to write a fast path for exp_amd64.s where
these optimizations can be applied in a lot of places.
Change-Id: Ide292107ab887bd1e225a1ad60880235b5ed7c61
Reviewed-on: https://go-review.googlesource.com/61810
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Instructions added:
INSERTPS immb, r/m, xmm
MPSADBW immb, r/m, xmm
BLENDPD immb, r/m, xmm
BLENDPS immb, r/m, xmm
DPPD immb, r/m, xmm
DPPS immb, r/m, xmm
MOVNTDQA r/m, xmm
PACKUSDW r/m, xmm
PBLENDW immb, r/m, xmm
PCMPEQQ r/m, xmm
PCMPGTQ r/m, xmm
PCMPISTRI immb, r/m, xmm
PCMPISTRM immb, r/m, xmm
PMAXSB r/m, xmm
PMAXSD r/m, xmm
PMAXUD r/m, xmm
PMAXUW r/m, xmm
PMINSB r/m, xmm
PMINSD r/m, xmm
PMINUD r/m, xmm
PMINUW r/m, xmm
PTEST r/m, xmm
PCMPESTRM immb, r/m, xmm
Note: only 'optab' table is extended.
`EXTRACTPS immb, xmm, r/m` is not included in this
change due to new ytab set 'yextractps'. This should simplify
code review.
4-operand instructions are a subject of upcoming changes that
make 4-th (and so on) operands explicit.
Related TODO note in asm6.go:
"dont't hide 4op, some version have xmm version".
Part of the mission to add missing amd64 SSE4 instructions to Go asm.
Change-Id: I71716df14a8a5332e866dd0f0d52d43d7714872f
Reviewed-on: https://go-review.googlesource.com/57470
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
|
|
3rd change out of 3 to cover AMD64 SSSE3 instruction set in Go asm.
This commit adds instruction that do require new ytab variable.
Change-Id: I0bc7d9401c9176eb3760c3d59494ef082e97af84
Reviewed-on: https://go-review.googlesource.com/56870
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
This is the last instruction I found missing in SSE2 set.
It does not reuse 'yprefetch' ytabs due to differences in
operands SRC/DST roles:
- PREFETCHx: ModRM:r/m(r) -> FROM
- CLFLUSH: ModRM:r/m(w) -> TO
unaryDst map is extended accordingly.
Change-Id: I89e34ebb81cc0ee5f9ebbb1301bad417f7ee437f
Reviewed-on: https://go-review.googlesource.com/56833
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
instructions
1st change out of 3 to cover AMD64 SSSE3 instruction set in Go asm.
This commit adds instructions that do not require new named ytab sets.
Change-Id: I0c3dfd8d39c3daa8b7683ab163c63145626d042e
Reviewed-on: https://go-review.googlesource.com/56834
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
Add support of more ARM VFP instructions in the assembler.
They were introduced in ARM VFPv2.
"NMULF/NMULD Fm, Fn, Fd": Fd = -Fn*Fm
"MULAF/MULAD Fm, Fn, Fd": Fd = Fd + Fn*Fm
"NMULAF/NMULAD Fm, Fn, Fd": Fd = -(Fd + Fn*Fm)
"MULSF/MULSD Fm, Fn, Fd": Fd = Fd - Fn*Fm
"NMULSF/NMULSD Fm, Fn, Fd": Fd = -(Fd - Fn*Fm)
Change-Id: Icd302676ca44a9f5f153fce734225299403c4163
Reviewed-on: https://go-review.googlesource.com/60170
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This change adds new ppc64 instructions from the POWER9 ISA. This includes
compares, loads, maths, register moves and the new random number generator and
copy/paste facilities.
Change-Id: Ife3720b90f5af184ff115bbcdcbce5c1302d39b6
Reviewed-on: https://go-review.googlesource.com/53930
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
Most of these are return values that were part of a receiving parameter,
so they're still accessible.
A few others are not, but those have never had a use.
Found with github.com/mvdan/unparam, after Kevin Burke's suggestion that
the tool should also warn about unused result parameters.
Change-Id: Id8b5ed89912a99db22027703a88bd94d0b292b8b
Reviewed-on: https://go-review.googlesource.com/55910
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
Went mainly for the ones that make no sense, such as the ones
mid-sentence or after commas.
Change-Id: Ie245d2c19cc7428a06295635cf6a9482ade25ff0
Reviewed-on: https://go-review.googlesource.com/57293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
The current code treats the type of SIMD&FP register as C_REG incorrectly.
The fix code converts C_REG type into C_FREG type.
Uncomment fcsels/fcseld test cases.
Fixes #21582
Change-Id: I754c51f72a0418bd352cbc0f7740f14cc599c72d
Reviewed-on: https://go-review.googlesource.com/58350
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
The current code treats floating-point constant as integer
and does not treat fcmp/fcmpe as the comparison instrucitons
that requires special handling.
The fix corrects the type of immediate arguments and adds fcmp/fcmpe
in the special handing.
Uncomment the fcmp/fcmpe cases.
Fixes #21567
Change-Id: I6782520e2770f6ce70270b667dd5e68f71e2d5ad
Reviewed-on: https://go-review.googlesource.com/57852
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
The current code gets shift arguments value from prog.From3.Offset.
But prog.From3.Offset is not assigned the shift arguments value in
instructions assemble process.
The fix calls movcon() function to get the correct value.
Uncomment the movk/movkw cases.
Fixes #21398
Change-Id: I78d40c33c24bd4e3688a04622e4af7ddb5333fa6
Reviewed-on: https://go-review.googlesource.com/54990
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
BFX extracts given bits from the source register, sign extends them
to 32-bit, and writes to destination register. BFXU does the similar
operation with zero extention.
They were introduced in ARMv6T2.
Change-Id: I6822ebf663497a87a662d3645eddd7c611de2b1e
Reviewed-on: https://go-review.googlesource.com/56071
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Instructions added in https://golang.org/cl/18853
2nd change out of 3 to cover AMD64 SSSE3 instruction set in Go asm.
This commit does not actually add any new instructions, only
enables some test cases.
Change-Id: I9596435b31ee4c19460a51dd6cea4530aac9d198
Reviewed-on: https://go-review.googlesource.com/56835
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
|
|
There are two changes in this CL.
1. Add new forms of MOVH/MOVHS/MOVHU.
MOVHS R0<<0(R1), R2 // ldrsh
MOVH R0<<0(R1), R2 // ldrsh
MOVHU R0<<0(R1), R2 // ldrh
MOVHS R2, R5<<0(R1) // strh
MOVH R2, R5<<0(R1) // strh
MOVHU R2, R5<<0(R1) // strh
2. Simpify "MVN $0xffffffaa, Rn" to "MOVW $0x55, Rn".
It is originally assembled to two instructions.
"MOVW offset(PC), R11"
"MVN R11, Rn"
Change-Id: I8e863dcfb2bd8f21a04c5d627fa7beec0afe65fb
Reviewed-on: https://go-review.googlesource.com/53690
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Instructions are implemented in the following revisions:
PCMPESTRI - https://golang.org/cl/22337
PHMINPOSUW - https://golang.org/cl/18853
It is unknown when x86test will be updated/re-run, but tests are useful
to check which x86 instructions are not yet supported.
As an example of tool that uses this information, there is Damien
Lespiau x86db.
Part of the mission to add missing amd64 SSE4 instructions to Go asm.
Change-Id: I512ff26040f47a0976b3e37000fb1f37eac5b762
Reviewed-on: https://go-review.googlesource.com/55830
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
|
|
The stxr/stxrw/stxrb/stxrh instructions belong to STLXR-like instructions
set and they require special handling. The current code has no special
handling for those instructions.
The fix adds the special handling for those instructions.
Uncomment stxr/stxrw/stxrb/stxrh test cases.
Fixes #21397
Change-Id: I31cee29dd6b30b1c25badd5c7574dda7a01bf016
Reviewed-on: https://go-review.googlesource.com/54951
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Wrong instructions "MOVW 8(F0), R1" and "MOVW R0<<0(F1), R1"
are silently accepted, and all Fx are treated as Rx.
The patch checks all those illegal base registers.
fixes #20724
Change-Id: I05d41bb43fe774b023205163b7daf4a846e9dc88
Reviewed-on: https://go-review.googlesource.com/46132
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
The current code calculates register number incorrectly.
The fix corrects the register number calculation.
Add cases created by decoder to test assembler.
Fixes #20697
Fixes #20723
Change-Id: I73ac153df9ea9f51c43a5104828d7a5389551c92
Reviewed-on: https://go-review.googlesource.com/45850
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
"MULBB R1, R2, R3" is encoded to 0xe163f182, which should be
0xe1630182.
This patch fix it.
fix #20764
Change-Id: I9d3c3ffa40ecde86638e5e083eacc67578caebf4
Reviewed-on: https://go-review.googlesource.com/46491
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
"MOVBS.U R0<<0(R1), R2" is assembled to 0xe19120d0 (ldrsb r2, [r1, r0]),
but it is expected to be 0xe11120d0 (ldrsb r2, [r1, -r0]).
This patch fixes it and adds more encoding tests.
fixes #20701
Change-Id: Ic1fb46438d71a978dbef06d97494a70c95fcbf3a
Reviewed-on: https://go-review.googlesource.com/45996
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|