| Age | Commit message (Collapse) | Author |
|
CL 307010 for ppc64.
I spent a long time trying to figure out how to use the carry bit from
ADDCCC to further simplify this (like what we do on arm64), but gave
up after I couldn't figure out how to access the carry bit without
just adding more instructions.
Change-Id: I6cad51b93616865b203cb16554f16121375aabbc
Reviewed-on: https://go-review.googlesource.com/c/go/+/307149
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This was missed in https://golang.org/cl/303329 . It is another
impossible usage of MOVBU as a load like "MOVBU 0(rX), rY, rZ" or
"MOVBU rX(rB), rY, rZ".
Change-Id: Ib3dd984b6424907498ed65b798649f0b990d50a7
Reviewed-on: https://go-review.googlesource.com/c/go/+/304471
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
No valid operation should match those removed by this patch. They
kind of look as if they match X-form load/stores on ppc64, but the
second argument is always ignored when translating to machine code.
Similarly, it should be noted an X-form memory access encodes into
an Addr which is a classified as a ZOREG argument with a non-zero
index, and a register type Addr.
Change-Id: I1adbb020d1b2612b18949d0e7eda05dbb3e8a25c
Reviewed-on: https://go-review.googlesource.com/c/go/+/303329
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Carlos Eduardo Seo <carlos.seo@linaro.org>
|
|
Several classifications exist only to help disambiguate an
implied register (i.e $0/R0 as the implied second register
argument when loading constants, or pseudo-registers used
exclusively by the assembler front-end).
The register determination is folded into getimpliedreg. The
classifications and their related optab entries are removed
or updated.
Change-Id: Iffb167aa9fa57fbc1a537c79fbdfb36cb38f9d95
Reviewed-on: https://go-review.googlesource.com/c/go/+/301789
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Cherry Zhang <cherryyz@google.com>
|
|
These are always sorted and grouped during initialization of the
actual opcode -> optab map generation. Thus, their initial location
in optab is mostly aimed at readability. This cleanup is intends to
ease reviewing of future patches which simplify, combine, or remove
MOV* optab entries.
Change-Id: I87583ed34fab79e0f625880f419d499939e2a9e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/300612
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
|
|
This is rarely used, and is implied based on the
memory type of the operand. This is a step towards
simplifying the MOV* pseudo opcodes on ppc64.
Similarly, remove the bogus param value from AVMULESB.
Change-Id: Ibad4d045ec6d8c5163a468b2db1dfb762ef674ee
Reviewed-on: https://go-review.googlesource.com/c/go/+/300177
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Trust: Ian Lance Taylor <iant@golang.org>
|
|
These instructions are actually 5 argument opcodes as specified
by the ISA. Prior to this patch, the MB and ME arguments were
merged into a single bitmask operand to workaround the limitations
of the ppc64 assembler backend.
This limitation no longer exists. Thus, we can pass operands for
these opcodes without having to merge the MB and ME arguments in
the assembler frontend or compiler backend.
Likewise, support for 4 operand variants is unchanged.
Change-Id: Ib086774f3581edeaadfd2190d652aaaa8a90daeb
Reviewed-on: https://go-review.googlesource.com/c/go/+/298750
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
Trust: Carlos Eduardo Seo <carlos.seo@linaro.org>
|
|
This is a preparatory patch to support 6 arg opcodes on POWER10,
and simplify 5 arg opcode processing (e.g RLWNM and similar).
This expands the optab structure, and renames a4 arguments to a6.
No actual change in functionality is made.
Change-Id: I785e4177778e4bf1326cf8e46e8aeaaa0e4d406b
Reviewed-on: https://go-review.googlesource.com/c/go/+/295031
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Carlos Eduardo Seo <carlos.seo@linaro.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: Keith Randall <khr@golang.org>
|
|
This is no-functionality change to begin the process of supporting
more than 6 operands.
This rewrites the table to use named arguments, and removes default
initialized argument values. The following sed regexes rewrote the table:
s/{\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\)}/{as:\1,a1:\2,a2:\3,a3:\4,a4:\5,type_:\6,size:\7,param:\8}
s/a[1-4]: C_NONE, //g
s/, param: 0//
Change-Id: I5f4de9da75f2fb3964d625d6b4e2f1ce1e29cc47
Reviewed-on: https://go-review.googlesource.com/c/go/+/294189
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
|
|
The runtime traceback code has its own definition of which functions
mark the top frame of a stack, separate from the TOPFRAME bits that
exist in the assembly and are passed along in DWARF information.
It's error-prone and redundant to have two different sources of truth.
This CL provides the actual TOPFRAME bits to the runtime, so that
the runtime can use those bits instead of reinventing its own category.
This CL also adds a new bit, SPWRITE, which marks functions that
write directly to SP (anything but adding and subtracting constants).
Such functions must stop a traceback, because the traceback has no
way to rederive the SP on entry. Again, the runtime has its own definition
which is mostly correct, but also missing some functions. During ordinary
goroutine context switches, such functions do not appear on the stack,
so the incompleteness in the runtime usually doesn't matter.
But profiling signals can arrive at any moment, and the runtime may
crash during traceback if it attempts to unwind an SP-writing frame
and gets out-of-sync with the actual stack. The runtime contains code
to try to detect likely candidates but again it is incomplete.
Deriving the SPWRITE bit automatically from the actual assembly code
provides the complete truth, and passing it to the runtime lets the
runtime use it.
This CL is part of a stack adding windows/arm64
support (#36439), intended to land in the Go 1.17 cycle.
This CL is, however, not windows/arm64-specific.
It is cleanup meant to make the port (and future ports) easier.
Change-Id: I227f53b23ac5b3dabfcc5e8ee3f00df4e113cf58
Reviewed-on: https://go-review.googlesource.com/c/go/+/288800
Trust: Russ Cox <rsc@golang.org>
Trust: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
by inserting hint when using bclrl.
Using this instruction as subroutine call is not the expected
default behavior, and as a result confuses the branch predictor.
The default expected behavior is a conditional return from a
subroutine.
We can change this assumption by encoding a hint this is not a
subroutine return.
The regex benchmarks are a pretty good example of how much this
hint can help generic ppc64le code on a power9 machine:
name old time/op new time/op delta
Find 606ns ± 0% 447ns ± 0% -26.27%
FindAllNoMatches 309ns ± 0% 205ns ± 0% -33.72%
FindString 609ns ± 0% 451ns ± 0% -26.04%
FindSubmatch 734ns ± 0% 594ns ± 0% -19.07%
FindStringSubmatch 706ns ± 0% 574ns ± 0% -18.83%
Literal 177ns ± 0% 136ns ± 0% -22.89%
NotLiteral 4.69µs ± 0% 2.34µs ± 0% -50.14%
MatchClass 6.05µs ± 0% 3.26µs ± 0% -46.08%
MatchClass_InRange 5.93µs ± 0% 3.15µs ± 0% -46.86%
ReplaceAll 3.15µs ± 0% 2.18µs ± 0% -30.77%
AnchoredLiteralShortNonMatch 156ns ± 0% 109ns ± 0% -30.61%
AnchoredLiteralLongNonMatch 192ns ± 0% 136ns ± 0% -29.34%
AnchoredShortMatch 268ns ± 0% 209ns ± 0% -22.00%
AnchoredLongMatch 472ns ± 0% 357ns ± 0% -24.30%
OnePassShortA 1.16µs ± 0% 0.87µs ± 0% -25.03%
NotOnePassShortA 1.34µs ± 0% 1.20µs ± 0% -10.63%
OnePassShortB 940ns ± 0% 655ns ± 0% -30.29%
NotOnePassShortB 873ns ± 0% 703ns ± 0% -19.52%
OnePassLongPrefix 258ns ± 0% 155ns ± 0% -40.13%
OnePassLongNotPrefix 943ns ± 0% 529ns ± 0% -43.89%
MatchParallelShared 591ns ± 0% 436ns ± 0% -26.31%
MatchParallelCopied 596ns ± 0% 435ns ± 0% -27.10%
QuoteMetaAll 186ns ± 0% 186ns ± 0% -0.16%
QuoteMetaNone 55.9ns ± 0% 55.9ns ± 0% +0.02%
Compile/Onepass 9.64µs ± 0% 9.26µs ± 0% -3.97%
Compile/Medium 21.7µs ± 0% 20.6µs ± 0% -4.90%
Compile/Hard 174µs ± 0% 174µs ± 0% +0.07%
Match/Easy0/16 7.35ns ± 0% 7.34ns ± 0% -0.11%
Match/Easy0/32 116ns ± 0% 97ns ± 0% -16.27%
Match/Easy0/1K 592ns ± 0% 562ns ± 0% -5.04%
Match/Easy0/32K 12.6µs ± 0% 12.5µs ± 0% -0.64%
Match/Easy0/1M 556µs ± 0% 556µs ± 0% -0.00%
Match/Easy0/32M 17.7ms ± 0% 17.7ms ± 0% +0.05%
Match/Easy0i/16 7.34ns ± 0% 7.35ns ± 0% +0.10%
Match/Easy0i/32 2.82µs ± 0% 1.64µs ± 0% -41.71%
Match/Easy0i/1K 83.2µs ± 0% 48.2µs ± 0% -42.06%
Match/Easy0i/32K 2.13ms ± 0% 1.80ms ± 0% -15.34%
Match/Easy0i/1M 68.1ms ± 0% 57.6ms ± 0% -15.31%
Match/Easy0i/32M 2.18s ± 0% 1.80s ± 0% -17.52%
Match/Easy1/16 7.36ns ± 0% 7.34ns ± 0% -0.24%
Match/Easy1/32 118ns ± 0% 96ns ± 0% -18.72%
Match/Easy1/1K 2.46µs ± 0% 1.58µs ± 0% -35.65%
Match/Easy1/32K 80.2µs ± 0% 54.6µs ± 0% -31.92%
Match/Easy1/1M 2.75ms ± 0% 1.88ms ± 0% -31.66%
Match/Easy1/32M 87.5ms ± 0% 59.8ms ± 0% -31.62%
Match/Medium/16 7.34ns ± 0% 7.34ns ± 0% +0.01%
Match/Medium/32 2.60µs ± 0% 1.50µs ± 0% -42.61%
Match/Medium/1K 78.1µs ± 0% 43.7µs ± 0% -44.06%
Match/Medium/32K 2.08ms ± 0% 1.52ms ± 0% -27.11%
Match/Medium/1M 66.5ms ± 0% 48.6ms ± 0% -26.96%
Match/Medium/32M 2.14s ± 0% 1.60s ± 0% -25.18%
Match/Hard/16 7.35ns ± 0% 7.35ns ± 0% +0.03%
Match/Hard/32 3.58µs ± 0% 2.44µs ± 0% -31.82%
Match/Hard/1K 108µs ± 0% 75µs ± 0% -31.04%
Match/Hard/32K 2.79ms ± 0% 2.25ms ± 0% -19.30%
Match/Hard/1M 89.4ms ± 0% 72.2ms ± 0% -19.26%
Match/Hard/32M 2.91s ± 0% 2.37s ± 0% -18.60%
Match/Hard1/16 11.1µs ± 0% 8.3µs ± 0% -25.07%
Match/Hard1/32 21.4µs ± 0% 16.1µs ± 0% -24.85%
Match/Hard1/1K 658µs ± 0% 498µs ± 0% -24.27%
Match/Hard1/32K 12.2ms ± 0% 11.7ms ± 0% -4.60%
Match/Hard1/1M 391ms ± 0% 374ms ± 0% -4.40%
Match/Hard1/32M 12.6s ± 0% 12.0s ± 0% -4.68%
Match_onepass_regex/16 870ns ± 0% 611ns ± 0% -29.79%
Match_onepass_regex/32 1.58µs ± 0% 1.08µs ± 0% -31.48%
Match_onepass_regex/1K 45.7µs ± 0% 30.3µs ± 0% -33.58%
Match_onepass_regex/32K 1.45ms ± 0% 0.97ms ± 0% -33.20%
Match_onepass_regex/1M 46.2ms ± 0% 30.9ms ± 0% -33.01%
Match_onepass_regex/32M 1.46s ± 0% 0.99s ± 0% -32.02%
name old alloc/op new alloc/op delta
Find 0.00B 0.00B 0.00%
FindAllNoMatches 0.00B 0.00B 0.00%
FindString 0.00B 0.00B 0.00%
FindSubmatch 48.0B ± 0% 48.0B ± 0% 0.00%
FindStringSubmatch 32.0B ± 0% 32.0B ± 0% 0.00%
Compile/Onepass 4.02kB ± 0% 4.02kB ± 0% 0.00%
Compile/Medium 9.39kB ± 0% 9.39kB ± 0% 0.00%
Compile/Hard 84.7kB ± 0% 84.7kB ± 0% 0.00%
Match_onepass_regex/16 0.00B 0.00B 0.00%
Match_onepass_regex/32 0.00B 0.00B 0.00%
Match_onepass_regex/1K 0.00B 0.00B 0.00%
Match_onepass_regex/32K 0.00B 0.00B 0.00%
Match_onepass_regex/1M 5.00B ± 0% 3.00B ± 0% -40.00%
Match_onepass_regex/32M 136B ± 0% 68B ± 0% -50.00%
name old allocs/op new allocs/op delta
Find 0.00 0.00 0.00%
FindAllNoMatches 0.00 0.00 0.00%
FindString 0.00 0.00 0.00%
FindSubmatch 1.00 ± 0% 1.00 ± 0% 0.00%
FindStringSubmatch 1.00 ± 0% 1.00 ± 0% 0.00%
Compile/Onepass 52.0 ± 0% 52.0 ± 0% 0.00%
Compile/Medium 112 ± 0% 112 ± 0% 0.00%
Compile/Hard 424 ± 0% 424 ± 0% 0.00%
Match_onepass_regex/16 0.00 0.00 0.00%
Match_onepass_regex/32 0.00 0.00 0.00%
Match_onepass_regex/1K 0.00 0.00 0.00%
Match_onepass_regex/32K 0.00 0.00 0.00%
Match_onepass_regex/1M 0.00 0.00 0.00%
Match_onepass_regex/32M 2.00 ± 0% 1.00 ± 0% -50.00%
name old speed new speed delta
QuoteMetaAll 75.2MB/s ± 0% 75.3MB/s ± 0% +0.15%
QuoteMetaNone 465MB/s ± 0% 465MB/s ± 0% -0.02%
Match/Easy0/16 2.18GB/s ± 0% 2.18GB/s ± 0% +0.10%
Match/Easy0/32 276MB/s ± 0% 330MB/s ± 0% +19.46%
Match/Easy0/1K 1.73GB/s ± 0% 1.82GB/s ± 0% +5.29%
Match/Easy0/32K 2.60GB/s ± 0% 2.62GB/s ± 0% +0.64%
Match/Easy0/1M 1.89GB/s ± 0% 1.89GB/s ± 0% +0.00%
Match/Easy0/32M 1.89GB/s ± 0% 1.89GB/s ± 0% -0.05%
Match/Easy0i/16 2.18GB/s ± 0% 2.18GB/s ± 0% -0.10%
Match/Easy0i/32 11.4MB/s ± 0% 19.5MB/s ± 0% +71.48%
Match/Easy0i/1K 12.3MB/s ± 0% 21.2MB/s ± 0% +72.62%
Match/Easy0i/32K 15.4MB/s ± 0% 18.2MB/s ± 0% +18.12%
Match/Easy0i/1M 15.4MB/s ± 0% 18.2MB/s ± 0% +18.12%
Match/Easy0i/32M 15.4MB/s ± 0% 18.6MB/s ± 0% +21.21%
Match/Easy1/16 2.17GB/s ± 0% 2.18GB/s ± 0% +0.24%
Match/Easy1/32 271MB/s ± 0% 333MB/s ± 0% +23.07%
Match/Easy1/1K 417MB/s ± 0% 648MB/s ± 0% +55.38%
Match/Easy1/32K 409MB/s ± 0% 600MB/s ± 0% +46.88%
Match/Easy1/1M 381MB/s ± 0% 558MB/s ± 0% +46.33%
Match/Easy1/32M 383MB/s ± 0% 561MB/s ± 0% +46.25%
Match/Medium/16 2.18GB/s ± 0% 2.18GB/s ± 0% -0.01%
Match/Medium/32 12.3MB/s ± 0% 21.4MB/s ± 0% +74.13%
Match/Medium/1K 13.1MB/s ± 0% 23.4MB/s ± 0% +78.73%
Match/Medium/32K 15.7MB/s ± 0% 21.6MB/s ± 0% +37.23%
Match/Medium/1M 15.8MB/s ± 0% 21.6MB/s ± 0% +36.93%
Match/Medium/32M 15.7MB/s ± 0% 21.0MB/s ± 0% +33.67%
Match/Hard/16 2.18GB/s ± 0% 2.18GB/s ± 0% -0.03%
Match/Hard/32 8.93MB/s ± 0% 13.10MB/s ± 0% +46.70%
Match/Hard/1K 9.48MB/s ± 0% 13.74MB/s ± 0% +44.94%
Match/Hard/32K 11.7MB/s ± 0% 14.5MB/s ± 0% +23.87%
Match/Hard/1M 11.7MB/s ± 0% 14.5MB/s ± 0% +23.87%
Match/Hard/32M 11.6MB/s ± 0% 14.2MB/s ± 0% +22.86%
Match/Hard1/16 1.44MB/s ± 0% 1.93MB/s ± 0% +34.03%
Match/Hard1/32 1.49MB/s ± 0% 1.99MB/s ± 0% +33.56%
Match/Hard1/1K 1.56MB/s ± 0% 2.05MB/s ± 0% +31.41%
Match/Hard1/32K 2.68MB/s ± 0% 2.80MB/s ± 0% +4.48%
Match/Hard1/1M 2.68MB/s ± 0% 2.80MB/s ± 0% +4.48%
Match/Hard1/32M 2.66MB/s ± 0% 2.79MB/s ± 0% +4.89%
Match_onepass_regex/16 18.4MB/s ± 0% 26.2MB/s ± 0% +42.41%
Match_onepass_regex/32 20.2MB/s ± 0% 29.5MB/s ± 0% +45.92%
Match_onepass_regex/1K 22.4MB/s ± 0% 33.8MB/s ± 0% +50.54%
Match_onepass_regex/32K 22.6MB/s ± 0% 33.9MB/s ± 0% +49.67%
Match_onepass_regex/1M 22.7MB/s ± 0% 33.9MB/s ± 0% +49.27%
Match_onepass_regex/32M 23.0MB/s ± 0% 33.9MB/s ± 0% +47.14%
Fixes #42709
Change-Id: Ice07fec2de4c5b1302febf8c2978ae8c1e4fd3e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/271337
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Trust: Carlos Eduardo Seo <carlos.seo@linaro.org>
|
|
This patch adds a prologue_end statement to the DWARF information for
the ppc64 arch.
Prologue end is used by the Delve debugger in order to determine where
to set a breakpoint to avoid the stacksplit prologue.
Updates #36612
Change-Id: Ifb16c1476fe716a0bf493c5486d1d88ebe8d0253
GitHub-Last-Rev: 77a217206d529df8bf8d4ef10a5347b6ae524612
GitHub-Pull-Request: golang/go#42261
Reviewed-on: https://go-review.googlesource.com/c/go/+/266019
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Trust: Dmitri Shuralyov <dmitshur@golang.org>
|
|
The wrong value for the first reg parameter was selected.
Likewise the wrong opcode was selected. This should match
rlwnm (rrr type), not rlwinm (irr type).
Similarly, fix the optab matching rules so clrlslwi does
not match reg,reg,const,reg arguments. This is not a valid
operand combination for clrlslwi.
Fixes #42368
Change-Id: I4eb16d45a760b9fd3f497ef9863f82465351d39f
Reviewed-on: https://go-review.googlesource.com/c/go/+/267421
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
When a fix was made at the end of the last release related to
NOPs, it was discovered that the ppc64.s testcase was out of date
and contained comments that weren't being processed. Essentially the
instructions in that test were being assembled but there was no
verification that the encodings weres correct. The ppc64enc.s file
was mostly complete and included the valid encodings for verification.
This change moves ppc64enc.s to ppc64.s and adds the instructions
that were missing.
This also adds a minor fix to asm9.go on the assembly of the
addex that was discovered during this testing.
Change-Id: Iaada1563b137849ad195fa88f32ecc9ab3e1e95f
Reviewed-on: https://go-review.googlesource.com/c/go/+/260217
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This creates space for a different kind of extension field
in LSym without making the struct any larger.
(There are many LSym, so we care about keeping the struct small.)
Change-Id: Ib16edb9e15f54c2a7351c8b875e19684058711e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/243943
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This adds support to allow the use of mulli when one of the multiply
operands is a constant that fits in 16 bits.
This especially helps in the case where this instruction appears in
a loop since the load of the constant is not being moved out of the loop.
Some improvements seen in compress/flate on power9:
Decode/Digits/Huffman/1e4 259µs ± 0% 261µs ± 0% +0.57% (p=1.000 n=1+1)
Decode/Digits/Huffman/1e5 2.43ms ± 0% 2.45ms ± 0% +0.79% (p=1.000 n=1+1)
Decode/Digits/Huffman/1e6 23.9ms ± 0% 24.2ms ± 0% +0.86% (p=1.000 n=1+1)
Decode/Digits/Speed/1e4 278µs ± 0% 279µs ± 0% +0.34% (p=1.000 n=1+1)
Decode/Digits/Speed/1e5 2.80ms ± 0% 2.81ms ± 0% +0.29% (p=1.000 n=1+1)
Decode/Digits/Speed/1e6 28.0ms ± 0% 28.1ms ± 0% +0.28% (p=1.000 n=1+1)
Decode/Digits/Default/1e4 278µs ± 0% 278µs ± 0% +0.28% (p=1.000 n=1+1)
Decode/Digits/Default/1e5 2.68ms ± 0% 2.69ms ± 0% +0.19% (p=1.000 n=1+1)
Decode/Digits/Default/1e6 26.6ms ± 0% 26.6ms ± 0% +0.21% (p=1.000 n=1+1)
Decode/Digits/Compression/1e4 278µs ± 0% 278µs ± 0% +0.00% (p=1.000 n=1+1)
Decode/Digits/Compression/1e5 2.68ms ± 0% 2.69ms ± 0% +0.21% (p=1.000 n=1+1)
Decode/Digits/Compression/1e6 26.6ms ± 0% 26.6ms ± 0% +0.07% (p=1.000 n=1+1)
Decode/Newton/Huffman/1e4 322µs ± 0% 312µs ± 0% -2.84% (p=1.000 n=1+1)
Decode/Newton/Huffman/1e5 3.11ms ± 0% 2.91ms ± 0% -6.41% (p=1.000 n=1+1)
Decode/Newton/Huffman/1e6 31.4ms ± 0% 29.3ms ± 0% -6.85% (p=1.000 n=1+1)
Decode/Newton/Speed/1e4 282µs ± 0% 269µs ± 0% -4.69% (p=1.000 n=1+1)
Decode/Newton/Speed/1e5 2.29ms ± 0% 2.20ms ± 0% -4.13% (p=1.000 n=1+1)
Decode/Newton/Speed/1e6 22.7ms ± 0% 21.3ms ± 0% -6.06% (p=1.000 n=1+1)
Decode/Newton/Default/1e4 254µs ± 0% 237µs ± 0% -6.60% (p=1.000 n=1+1)
Decode/Newton/Default/1e5 1.86ms ± 0% 1.75ms ± 0% -5.99% (p=1.000 n=1+1)
Decode/Newton/Default/1e6 18.1ms ± 0% 17.4ms ± 0% -4.10% (p=1.000 n=1+1)
Decode/Newton/Compression/1e4 254µs ± 0% 244µs ± 0% -3.91% (p=1.000 n=1+1)
Decode/Newton/Compression/1e5 1.85ms ± 0% 1.79ms ± 0% -3.10% (p=1.000 n=1+1)
Decode/Newton/Compression/1e6 18.0ms ± 0% 17.3ms ± 0% -3.88% (p=1.000 n=1+1)
Change-Id: I840320fab1c4bf64c76b001c2651ab79f23df4eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/259444
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
A recent change to improve shifts was generating some
invalid cases when the rule was based on an AND. The
extended mnemonics CLRLSLDI and CLRLSLWI only allow
certain values for the operands and in the mask case
those values were not being checked properly. This
adds a check to those rules to verify that the
'b' and 'n' values used when an AND was part of the rule
have correct values.
There was a bug in some diag messages in asm9. The
message expected 3 values but only provided 2. Those are
corrected here also.
The test/codegen/shift.go was updated to add a few more
cases to check for the case mentioned here.
Some of the comments that mention the order of operands
in these extended mnemonics were wrong and those have been
corrected.
Fixes #41683.
Change-Id: If5bb860acaa5051b9e0cd80784b2868b85898c31
Reviewed-on: https://go-review.googlesource.com/c/go/+/258138
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This adds support for the extswsli instruction which combines
extsw followed by a shift.
New benchmark demonstrates the improvement:
name old time/op new time/op delta
ExtShift 1.34µs ± 0% 1.30µs ± 0% -3.15% (p=0.057 n=4+3)
Change-Id: I21b410676fdf15d20e0cbbaa75d7c6dcd3bbb7b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/257017
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This change adds rules to find pairs of instructions that can
be combined into a single shifts. These instruction sequences
are common in array addressing within loops. Improvements can
be seen in many crypto packages and the hash packages.
These are based on the extended mnemonics found in the ISA
sections C.8.1 and C.8.2.
Some rules in PPC64.rules were moved because the ordering prevented
some matching.
The following results were generated on power9.
hash/crc32:
CRC32/poly=Koopman/size=40/align=0 195ns ± 0% 163ns ± 0% -16.41%
CRC32/poly=Koopman/size=40/align=1 200ns ± 0% 163ns ± 0% -18.50%
CRC32/poly=Koopman/size=512/align=0 1.98µs ± 0% 1.67µs ± 0% -15.46%
CRC32/poly=Koopman/size=512/align=1 1.98µs ± 0% 1.69µs ± 0% -14.80%
CRC32/poly=Koopman/size=1kB/align=0 3.90µs ± 0% 3.31µs ± 0% -15.27%
CRC32/poly=Koopman/size=1kB/align=1 3.85µs ± 0% 3.31µs ± 0% -14.15%
CRC32/poly=Koopman/size=4kB/align=0 15.3µs ± 0% 13.1µs ± 0% -14.22%
CRC32/poly=Koopman/size=4kB/align=1 15.4µs ± 0% 13.1µs ± 0% -14.79%
CRC32/poly=Koopman/size=32kB/align=0 137µs ± 0% 105µs ± 0% -23.56%
CRC32/poly=Koopman/size=32kB/align=1 137µs ± 0% 105µs ± 0% -23.53%
crypto/rc4:
RC4_128 733ns ± 0% 650ns ± 0% -11.32% (p=1.000 n=1+1)
RC4_1K 5.80µs ± 0% 5.17µs ± 0% -10.89% (p=1.000 n=1+1)
RC4_8K 45.7µs ± 0% 40.8µs ± 0% -10.73% (p=1.000 n=1+1)
crypto/sha1:
Hash8Bytes 635ns ± 0% 613ns ± 0% -3.46% (p=1.000 n=1+1)
Hash320Bytes 2.30µs ± 0% 2.18µs ± 0% -5.38% (p=1.000 n=1+1)
Hash1K 5.88µs ± 0% 5.38µs ± 0% -8.62% (p=1.000 n=1+1)
Hash8K 42.0µs ± 0% 37.9µs ± 0% -9.75% (p=1.000 n=1+1)
There are other improvements found in golang.org/x/crypto which are all in the
range of 5-15%.
Change-Id: I193471fbcf674151ffe2edab212799d9b08dfb8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/252097
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
|
|
We currently use two fields to store the targets of branches.
Some phases use p.To.Val, some use p.Pcond. Rewrite so that
every branch instruction uses p.To.Val.
p.From.Val is also used in rare instances.
Introduce a Pool link for use by arm/arm64, instead of
repurposing Pcond.
This is a cleanup CL in preparation for some stack frame CLs.
Change-Id: If8239177e4b1ea2bccd0608eb39553d23210d405
Reviewed-on: https://go-review.googlesource.com/c/go/+/251437
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This reverts CL 243318.
Reason for revert: Seems to be crashing some builders.
Change-Id: I2ffc59bc5535be60b884b281c8d0eff4647dc756
Reviewed-on: https://go-review.googlesource.com/c/go/+/251169
Reviewed-by: Bryan C. Mills <bcmills@google.com>
|
|
We currently use two fields to store the targets of branches.
Some phases use p.To.Val, some use p.Pcond. Rewrite so that
every branch instruction uses p.To.Val.
p.From.Val is also used in rare instances.
Introduce a Pool link for use by arm/arm64, instead of
repurposing Pcond.
This is a cleanup CL in preparation for some stack frame CLs.
Change-Id: I9055bf0a1d986aff421e47951a1dedc301c846f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/243318
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
These are the indexed vsx load operations with the
same endian and alignment benefits of {l,st}vx.
Likewise, cleanup redundant comments in op{load,store}x and
fix ISA 3.0 typos nearby.
Change-Id: Ie1ace17c6150cf9168a834e435114028ff6eb07c
Reviewed-on: https://go-review.googlesource.com/c/go/+/249025
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
Previously, the assembler removed NOPs from the Prog list in
obj9.go. NOPs shouldn't be removed if they were added as
an inline mark, as described in the issue below.
Fixes #40689
Once the NOPs were left in the Prog list, some instructions
were flagged as invalid because they had an operand which was
not represented in optab. In order to preserve the previous
assembler behavior, entries were added to optab for those
operand cases. They were not flagged as errors before because
the NOP instructions were removed before the code to check the
valid opcode/operand combinations.
Change-Id: Iae5145f94459027cf458e914d7c5d6089807ccf8
Reviewed-on: https://go-review.googlesource.com/c/go/+/247842
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Set the AttrStatic flag on compiler-emitted TOC symbols for ppc64; these
symbols don't need to go into the final symbol table in Go binaries.
This fixes a buglet introduced by CL 240539 that was causing failures
on the aix builder.
Change-Id: If8b63bcf6d2791f1ec5a0c371d2d11e806202fd2
Reviewed-on: https://go-review.googlesource.com/c/go/+/241637
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This updates the ppc64 asm doc file, including information on
updates to the objdump, correcting information on operand order,
and adding some information on shifts.
Change-Id: Ib8ed53eac86c2121ea5b657c361ad92aae31cb32
Reviewed-on: https://go-review.googlesource.com/c/go/+/238237
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This updates the PPC64.rules file to use the MOD instructions
that are available in power9. Prior to power9 this is done
using a longer sequence with multiply and divide.
Included in this change is removal of the REM* opcode variations
that set the CC or OV bits since their settings are based
on the DIV and are not appropriate for the REM.
Change-Id: Iceed9ce33e128e1911c15592ee674276ce8ba3fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/229761
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This change adds some instructions that were missing from the
ppc64 assembler, mostly power9 but a few others from earlier.
Tests in cmd/asm for ppc64 were updated: ppc64.s includes the
new instructions, and ppc64enc.s now includes not only the
new instructions but most ppc64 opcodes to provide a more
complete test of the ppc64 assembler.
The ppc64 instruction set is used for linux/ppc64le,
linux/ppc64, and aix/ppc64.
Change-Id: I8695f89dbca06174847963f4ef869f2e584d5bbf
Reviewed-on: https://go-review.googlesource.com/c/go/+/229479
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This adds support support for the PCALIGN value 32. When this
directive occurs code will be aligned to 32 bytes unless
too many NOPs are needed, and then will fall back to 16
byte alignment.
On Linux the function's alignment is promoted from 16 to 32
in functions where PCALIGN 32 appears. On AIX the function's
alignment is left at 16 due to complexity with modifying its
alignment, which means code will be aligned to at least 16,
possibly 32 at times, which is still good.
Test was updated to accept new value.
Change-Id: I28e72d5f30ca472ed9ba736ddeabfea192d11797
Reviewed-on: https://go-review.googlesource.com/c/go/+/228258
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This reverts CL 227775.
Reason for revert: broke aix-ppc64 builder (https://build.golang.org/log/cf3b4f9fd09ee81f422a4b58488b9d0a2692c949).
Change-Id: I2095bb2aadb5a4064eb89ad353012503faf15709
Reviewed-on: https://go-review.googlesource.com/c/go/+/228143
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Previous PCALIGN support on ppc64x only accepted 8 and 16 byte
alignment since the default function alignment was 16. Now that
the function's alignment can be set to a larger value when needed,
PCALIGN can accept 32. When this happens then the function's
alignment will be changed to 32.
Test has been updated to recognized this new value.
Change-Id: If82c3cd50d7c686fcf8a9e819708b15660cdfa63
Reviewed-on: https://go-review.googlesource.com/c/go/+/227775
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Change-Id: I03e2a573eb778591071db4f783585a5d71a14c03
Reviewed-on: https://go-review.googlesource.com/c/go/+/227005
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This fixes a potential issue with the previous implementation
of PCALIGN on ppc64. Previously PCALIGN was processed inside of
asmout and indicated the padding size by setting the value in
the optab, changing it back after the alignment instructions
were added. Now PCALIGN is processed outside of asmout, and optab
is not changed.
Change-Id: I8b0093a0e2b7e06176af27e05150d04ae2c55d60
Reviewed-on: https://go-review.googlesource.com/c/go/+/225198
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This does some clean up of the ppc64 opcodes to remove names
from the opcode list that don't actually assemble. At one time
names were added to this list to represent opcode "classes" to
organize other opcodes that have the same set of operand
combinations. Since this is not documented, it is confusing as
to which opcodes can be used in an asm file and which can't, and
which opcodes should be supported in the disassembler. It is
clearer for the user if the list of Go opcodes are all opcodes
that can be assembled with names that match the ppc64 opcode
where possible.
I found this when trying to use Go opcode XXLAND in an asm file
which seems like it should map to ppc64 xxland but when used it
gets this error:
go tool asm test_xxland.s
asm: bad r/r, r/r/r or r/r/r/r opcode XXLAND
asm: assembly failed
This change removes the opcodes that are only used for opcode
"classes" and fixes the case statement where they are referenced.
This also fixes XXLAND and XXPERM which are opcodes that should
assemble to their corresponding ppc64 opcode but do not.
Change-Id: I52300db6b22f7f8b3dd3491c3f35a384b943352c
Reviewed-on: https://go-review.googlesource.com/c/go/+/223138
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
When there are both a synchronous preemption request (by
clobbering the stack guard) and an asynchronous one (by signal),
the running goroutine may observe the synchronous request first
in stack bounds check, and go to the path of calling morestack.
If the preemption signal arrives at this point before the call to
morestack, the goroutine will be asynchronously preempted,
entering the scheduler. When it is resumed, the scheduler clears
the preemption request, unclobbers the stack guard. But the
resumed goroutine will still call morestack, as it is already on
its way. morestack will, as there is no preemption request,
double the stack unnecessarily. If this happens multiple times,
the stack may grow too big, although only a small amount is
actually used.
To fix this, we mark the stack bounds check and the call to
morestack async-nonpreemptible, starting after the memory
instruction (mostly a load, on x86 CMP with memory).
Not done for Wasm as it does not support async preemption.
Fixes #35470.
Change-Id: Ibd7f3d935a3649b80f47539116ec9b9556680cf2
Reviewed-on: https://go-review.googlesource.com/c/go/+/207350
Reviewed-by: David Chase <drchase@google.com>
|
|
If a MOVDU instruction is used with an offset of SP, the
instruction changes SP therefore needs an SP delta, which is used
for generating the PC-SP table for stack unwinding. MOVDU is
frequently used for allocating the frame and saving the LR in the
same instruction, so this is particularly useful.
Change-Id: Icb63eb55aa01c3dc350ac4e4cff6371f4c3c5867
Reviewed-on: https://go-review.googlesource.com/c/go/+/205279
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
We'll use CTR as a scratch register for call injection. Mark code
sequences that use CTR as unsafe for async preemption. Currently
it is only used in LoweredZero and LoweredMove. It is unfortunate
that they are nonpreemptible. But I think it is still better than
using LR for call injection and marking all leaf functions
nonpreemptible.
Also mark the prologue of large frame functions nonpreemptible,
as we write below SP.
Change-Id: I05a75431499f3f4b2f23651a7b17f7fcf2afbe06
Reviewed-on: https://go-review.googlesource.com/c/go/+/203823
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
On PPC64, indirect calls can be made through LR or CTR. Currently
both are used. This CL changes it to always use LR.
For async preemption, to return from the injected call, we need
an indirect jump back to the PC we preeempted. This jump can be
made through LR or CTR. So we'll have to clobber either LR or CTR.
Currently, LR is used more frequently. In particular, for a leaf
function, LR is live throughout the function. We don't want to
make leaf functions nonpreemptible. So we choose CTR for the call
injection. For code sequences that use CTR, if it is ok to use
another register, change it to.
Plus, it is a call so it will clobber LR anyway. It doesn't need
to also clobber CTR (even without preemption).
Change-Id: I07bd0e93b94a1a3aa2be2cd465801136165d8ab8
Reviewed-on: https://go-review.googlesource.com/c/go/+/203822
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
POWER9 (ISA 3.0) introduced a new format of load/store instructions to
implement indexed load/store quadword, using an immediate value instead
of a register index.
This change adds support for this new instruction encoding and adds the
new load/store quadword instructions (lxv/stxv) to the assembler.
This change also adds the missing XX1-form loads/stores (halfword and byte)
included in ISA 3.0.
Change-Id: Ibcdf53c342d7a352d64a9403c2fe7b25be9c3b24
Reviewed-on: https://go-review.googlesource.com/c/go/+/200399
Run-TryBot: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
ADUFFCOPY and ADUFFZERO instructions weren't handled by rewriteToUseTOC.
These instructions are considered as a simple branch except with -dynlink
where they become an indirect call.
Fixes #34604
Change-Id: I16ca6a152164966fb9cbf792219a8a39aad2b53b
Reviewed-on: https://go-review.googlesource.com/c/go/+/197842
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This adds support for ppc64 instructions vmrgow and vmrgew which
are needed for an improved implementation of chacha20.
Change-Id: I967a2de54236bcc573a99f7e2b222d5a8bb29e03
Reviewed-on: https://go-review.googlesource.com/c/go/+/192117
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
|
|
Adding some details on writing Go assembler for ppc64.
Change-Id: I46fc6b75ee6c36946f90280b2b670e0d32bcc6b1
Reviewed-on: https://go-review.googlesource.com/c/go/+/183837
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
zeros instructions on POWER9
This change adds new POWER9 instructions for counting trailing zeros (CNTTZW/CNTTZD)
to the assembler and generates them in SSA when GOPPC64=power9.
name old time/op new time/op delta
TrailingZeros-160 1.59ns ±20% 1.45ns ±10% -8.81% (p=0.000 n=14+13)
TrailingZeros8-160 1.55ns ±23% 1.62ns ±44% ~ (p=0.593 n=13+15)
TrailingZeros16-160 1.78ns ±23% 1.62ns ±38% -9.31% (p=0.003 n=14+14)
TrailingZeros32-160 1.64ns ±10% 1.49ns ± 9% -9.15% (p=0.000 n=13+14)
TrailingZeros64-160 1.53ns ± 6% 1.45ns ± 5% -5.38% (p=0.000 n=15+13)
Change-Id: I365e6ff79f3ce4d8ebe089a6a86b1771853eb596
Reviewed-on: https://go-review.googlesource.com/c/go/+/167517
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
A bug in the encoding of XX1-Form is flipping bit 31 of such instructions.
This may result in register clobering when using VSX instructions.
This was not exposed before because we currently don't generate these
instructions in SSA, and the asm files in which they are present aren't
affected by register clobbering.
This change fixes the bug and adds a testcase for the problem.
Fixes #30112
Change-Id: I77b606159ae1efea33d2ba3e1c74b7fae8d5d2e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/163759
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This commit allows to cross-compiling aix/ppc64. The nosplit limit must
twice as large as on others platforms because of AIX syscalls.
The stack limit, especially stackGuardMultiplier, was set by cmd/dist
during the bootstrap and doesn't depend on GOOS/GOARCH target.
Fixes #29572
Change-Id: Id51e38885e1978d981aa9e14972eaec17294322e
Reviewed-on: https://go-review.googlesource.com/c/157117
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
This commit updates the new symbol addressing made for aix/ppc64 according
to feedbacks given in CL 151039.
Change-Id: Ic4eb9943dc520d65f7d084adf8fa9a2530f4d3f9
Reviewed-on: https://go-review.googlesource.com/c/151302
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
VPERMXOR is missing from the Go assembler for ppc64. It has the
same format as VPERM. It was requested by an external user so
they could write an optimized algorithm in asm.
Change-Id: Icf4c682f7f46716ccae64e6ae3d62e8cec67f6c1
Reviewed-on: https://go-review.googlesource.com/c/151578
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
|
|
This commit changes the code generated for addressing symbols on AIX
operating system.
On AIX, every symbol accesses must be done via another symbol near the TOC,
named TOC anchor or TOC entry. This TOC anchor is a pointer to the symbol
address.
During Progedit function, when a symbol access is detected, its instructions
are modified to create a load on its TOC anchor and retrieve the symbol.
Change-Id: I00cf8f49c13004bc99fa8af13d549a709320f797
Reviewed-on: https://go-review.googlesource.com/c/151039
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
ppc64x
This adds support for an alignment directive that can be used
within Go asm to indicate preferred code alignment for ppc64x.
This is intended to be used with loops to improve
performance.
This change only adds the directive and aligns the code based
on it. Follow up changes will modify asm functions for
ppc64x that benefit from preferred alignment.
Fixes #14935
Here is one example of the improvement in memmove when the
directive is used on the loops in the code:
Memmove/64 8.74ns ± 0% 8.64ns ± 0% -1.19% (p=0.000 n=8+8)
Memmove/128 11.5ns ± 0% 11.0ns ± 0% -4.35% (p=0.000 n=8+8)
Memmove/256 23.0ns ± 0% 15.3ns ± 0% -33.48% (p=0.000 n=8+8)
Memmove/512 31.7ns ± 0% 31.8ns ± 0% +0.32% (p=0.000 n=8+8)
Memmove/1024 52.3ns ± 0% 43.9ns ± 0% -16.10% (p=0.000 n=8+8)
Memmove/2048 93.2ns ± 0% 76.2ns ± 0% -18.24% (p=0.000 n=8+8)
Memmove/4096 174ns ± 0% 141ns ± 0% -18.97% (p=0.000 n=8+8)
Change-Id: I200d77e923dd5d78c22fe3f8eb142a8fbaff57bf
Reviewed-on: https://go-review.googlesource.com/c/144218
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
This change makes use of a VSX instruction to generate the
float 0 value instead of generating a constant in memory and
loading it from there.
This uses 1 instruction instead of 2 and avoids a memory reference.
in the +0 case, uses 2 instructions in the -0 case but avoids
the memory reference.
Since this is done in the assembler for ppc64x, an update has
been made to the assembler test.
Change-Id: Ief7dddcb057bfb602f78215f6947664e8c841464
Reviewed-on: https://go-review.googlesource.com/c/139420
Reviewed-by: Michael Munday <mike.munday@ibm.com>
|