| Age | Commit message (Collapse) | Author |
|
Logical ops on uint8/uint16 (AND/OR/XOR) with constants sometimes
materialized the mask via MOVD (often as a negative immediate), even
when the value fit in the UI-immediate range. This prevented the backend
from selecting andi. / ori / xori forms.
This CL makes:
UI-immediate truncation is performed only at the use-site of
logical-immediate ops, and only when the constant does not fit in the
8- or 16-bit unsigned domain (m != uint8(m) / m != uint16(m)).
This avoids negative-mask materialization and enables correct emission of
UI-form logical instructions. Arithmetic SI-immediate instructions (addi, subfic, etc.) and other
use-patterns are unchanged.
Codegen tests are added to ensure the expected andi./ori/xori
patterns appear and that MOVD is not emitted for valid 8/16-bit masks.
Change-Id: I9fcdf4498c4e984c7587814fb9019a75865c4a0d
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/704015
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Paul Murphy <paumurph@redhat.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
|
|
On ppc64/ppc64le, rewrite (x + x) << c to x << (c+1) for constant shifts. This removes an ADD, shortens the dependency chain, and reduces code size.
Add rules for both 64-bit (SLDconst) and 32-bit (SLWconst), and extend
test/codegen/shift.go with ppc64x checks to assert a single SLD/SLW and
forbid ADD. Aligns ppc64 with other architectures that already assert
similar codegen in shift.go.
Change-Id: Ie564afbb029a5bd48887b82b0c455ca1dddd5508
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/712000
Reviewed-by: Archana Ravindar <aravinda@redhat.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
type I interface {
foo()
}
type S struct {
I
}
Because I is embedded in S, S needs a foo method. We generate a
wrapper function to implement (*S).foo. It just loads the embedded
field I out of S and calls foo on it.
When the thing in S.I itself needs a wrapper, then we have a wrapper
calling another wrapper. This can continue, leaving a potentially long
sequence of wrappers on the stack. When we then call runtime.Callers
or friends, we have to walk an unbounded number of frames to find a
bounded number of non-wrapper frames.
This really happens, for instance with I = context.Context, S =
context.ValueCtx, and runtime.Callers = pprof sample (for any of
context.Context's methods).
To fix, make the interface call in the wrapper a tail call.
That way, the number of wrapper frames on the stack does not
increase when there are lots of wrappers happening.
Fixes #75764
Fixes #77781
Change-Id: I03b1731159d9218c7f14f72ecbbac822d6a6bb87
Reviewed-on: https://go-review.googlesource.com/c/go/+/751465
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Fixes #76084
I was focused on restoring the old behavior and fixing the failing
test/codegen/arithmetic.go:MergeMuls2 test.
It is probable this same bug hides elsewhere in this file.
Change-Id: I17f2ee6b97a1e33b8132648d9d750749d006f7e0
Reviewed-on: https://go-review.googlesource.com/c/go/+/715560
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Paul Murphy <paumurph@redhat.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
|
|
Change-Id: I25a9bbc247b2490e7e37ed843386f53a71822146
Reviewed-on: https://go-review.googlesource.com/c/go/+/682498
Reviewed-by: Paul Murphy <paumurph@redhat.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Decompose Ctz16 and Ctz8 within the SSA rules for LOONG64, MIPS, PPC64
and S390X, rather than having a custom intrinsic. Note that for PPC64 this
actually allows the existing Ctz16 and Ctz8 rules to be used.
Change-Id: I27a5e978f852b9d75396d2a80f5d7dfcb5ef7dd4
Reviewed-on: https://go-review.googlesource.com/c/go/+/651816
Reviewed-by: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Decompose BitLen16 and BitLen8 within the SSA rules for architectures that
support BitLen32 or BitLen64, rather than having a custom intrinsic.
Change-Id: Ie4188ce69d1021e63cec27a8e7418efb0714812b
Reviewed-on: https://go-review.googlesource.com/c/go/+/651817
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Fold constant int16 addends for usages of math/bits.Add64(x,const,0)
on PPC64. This usage shows up in a few crypto implementations;
notably the go wrapper for CL 626176.
Change-Id: I6963163330487d04e0479b4fdac235f97bb96889
Reviewed-on: https://go-review.googlesource.com/c/go/+/625899
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
CL 621357 introduced new generic lowering rules which caused
several shift related codegen test failures.
Add new rules to fix the test regressions, and cleanup tests
which are changed but not regressed. Some CLRLSLDI tests are
removed as they are no test CLRLSLDI rules.
Fixes #70003
Change-Id: I1ecc5a7e63ab709a4a0cebf11fa078d5cf164034
Reviewed-on: https://go-review.googlesource.com/c/go/+/622236
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This is minor extension of the existing support for 32 and
64 bit types.
For #69735
Change-Id: I6828ec223951d2b692e077dc507b000ac23c32a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/617496
Reviewed-by: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Change-Id: I50a19bd971176598bf8e4ef86ec98f008abe245c
Reviewed-on: https://go-review.googlesource.com/c/go/+/615198
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
This can be done efficiently with few instructions.
This also adds MULHDUCC for further codegen improvement.
Change-Id: I06320ba4383a679341b911a237a360ef07b19168
Reviewed-on: https://go-review.googlesource.com/c/go/+/605975
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Archana Ravindar <aravinda@redhat.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
RLIWNM does not clear the upper 32 bits of the target register if
the mask wraps around (e.g 0xF000000F). Don't elide MOVWZreg for
such masks. All other usage clears the upper 32 bits.
Fixes #67844.
Change-Id: I11b89f1da9ae077624369bfe2bf25e9b7c9b79bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/590896
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This allows more effective conversion of rotate and mask opcodes
into their CC equivalents, while simplifying the first lowering
pass.
This was removed before the latelower pass was introduced to fold
more cases of compare against zero. Add ANDconst to push the
conversion of ANDconst to ANDCCconst into latelower with the other
CC opcodes.
This also requires introducing RLDICLCC to prevent regressions
when ANDconst is converted to RLDICL then to RLDICLCC and back
to ANDCCconst when possible.
Change-Id: I9e5f9c99fbefa334db18c6c152c5f967f3ff2590
Reviewed-on: https://go-review.googlesource.com/c/go/+/586160
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
Avoid creating duplicate usages of ANDCCconst. This is preparation for
a patch to reintroduce ANDconst to simplify the lower pass while
treating ANDCCconst like other *CC* ssa opcodes.
Also, move many of the similar rules wich retarget ANDCCconst users
to the flag result to a common rule for all compares against zero.
Change-Id: Ida86efe17ff413cb82c349d8ef69d2899361f4c0
Reviewed-on: https://go-review.googlesource.com/c/go/+/585400
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Investigating binaries, these patterns seem to show up frequently.
Change-Id: I987251e4070e35c25e98da321e444ccaa1526912
Reviewed-on: https://go-review.googlesource.com/c/go/+/583302
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
RLWINM
This provides a small performance bump to crc64 as measured on ppc64le/power10:
name old time/op new time/op delta
Crc64/ISO64KB 49.6µs ± 0% 46.6µs ± 0% -6.18%
Crc64/ISO4KB 3.16µs ± 0% 2.97µs ± 0% -5.83%
Crc64/ISO1KB 840ns ± 0% 794ns ± 0% -5.46%
Crc64/ECMA64KB 49.6µs ± 0% 46.5µs ± 0% -6.20%
Crc64/Random64KB 53.1µs ± 0% 49.9µs ± 0% -6.04%
Crc64/Random16KB 15.9µs ± 1% 15.0µs ± 0% -5.73%
Change-Id: I302b5431c7dc46dfd2d211545c483bdcdfe011f1
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/581937
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Eli Bendersky <eliben@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
This enables efficient use of the builtin min/max function
for float64 and float32 types on GOPPC64 >= power9.
Extend the assembler to support xsminjdp/xsmaxjdp and use
them to implement float min/max.
Simplify the VSX xx3 opcode rules to allow FPR arguments,
if all arguments are an FPR.
Change-Id: I15882a4ce5dc46eba71d683cf1d184dc4236a328
Reviewed-on: https://go-review.googlesource.com/c/go/+/574535
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
|
|
This usage shows up in quite a few places, and helps reduce
register pressure in several complex cryto functions by
removing a MOVD $0,... instruction.
Change-Id: I9444ea8f9d19bfd68fb71ea8dc34e109681b3802
Reviewed-on: https://go-review.googlesource.com/c/go/+/571055
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
|
|
For example, the Slicemask rule in PPC64 generates a sequence wherein there is andi operation, after an sradi, which can be replaced by srdi. This new rule eliminates ANDCCconst.
Change-Id: I27aaadf76b9c749a60bcdc5e87b1ebb8167d2fd4
Reviewed-on: https://go-review.googlesource.com/c/go/+/539055
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
In the Power10 rule to fold bit reversal into load, the MOVWZreg or
MOVHZreg (Zeroing out the upper bits of a word or halfword) becomes
redundant since byte reverse (BR) load clears the upper bits. Hence
removing for Power10. Similarly for < Power10 cases in the rule used to
fold bit reversal into load (Bswap), the above redundant operation is removed.
Change-Id: Idb027e8b6e79b6acfb81d48a9a6cc06f8e9cd2db
Reviewed-on: https://go-review.googlesource.com/c/go/+/531377
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
In the PPC64 ISA, the instruction to do an 'and' operation
using an immediate constant is only available in the form that
also sets CR0 (i.e. clobbers the condition register.) This means
CR0 is being clobbered unnecessarily in many cases. That
affects some decisions made during some compiler passes
that check for it.
In those cases when the constant used by the ANDCC is a right
justified consecutive set of bits, a shift instruction can
be used which has the same effect if CR0 does not need to be
set. The rule to do that has been added to the late rules file
after other rules using ANDCCconst have been processed in the
main rules file.
Some codegen tests had to be updated since ANDCC is no
longer generated for some cases. A new test case was added to
verify the ANDCC is present if the results for both the AND
and CR0 are used.
Change-Id: I304f607c039a458e2d67d25351dd00aea72ba542
Reviewed-on: https://go-review.googlesource.com/c/go/+/531435
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
This adds some rules to recognize MOVDaddr in those cases where
it is just adding 0 to a ptr value. Instead the ptr value can just
be used.
Change-Id: I95188defc9701165c86bbea70d14d037a9e54853
Reviewed-on: https://go-review.googlesource.com/c/go/+/527698
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Paul Murphy <murp@ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
|
|
This sequence can show up in the lowering pass on PPC64. If it
makes it to the latelower pass, it will cause an error because
it looks like it can be turned into RLDICL, but -1 isn't an
accepted mask.
Also, print more debug info if panic is called from
encodePPC64RotateMask.
Fixes #62698
Change-Id: I0f3322e2205357abe7fc28f96e05e3f7ad65567c
Reviewed-on: https://go-review.googlesource.com/c/go/+/529195
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
(ANDCCconst [y] (MOV.*reg x)) should only be merged when zero
extending. Otherwise, sign bits are lost on negative values.
(ANDCCconst [0xFF] (MOVBreg x)) should be simplified to a zero
extension of x. Likewise for the MOVHreg variant.
Fixes #61297
Change-Id: I04e4fd7dc6a826e870681f37506620d48393698b
Reviewed-on: https://go-review.googlesource.com/c/go/+/508775
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This modifies some existing rules to allow more prefixed instructions
to be generated when using GOPPC64=power10. Some rules also check
if PCRel is available, which is currently supported for linux/ppc64le
and linux/ppc64 (internal linking only).
Prior to p10, DS-offset loads and stores had a 16 bit size limit for
the offset field. If the offset of the data for load or store was
beyond this range then an indexed load or store would be selected by
the rules.
In p10 the assembler can generate prefixed instructions in this case,
but does not if an indexed instruction was selected during the lowering
pass.
This allows many more cases to use prefixed loads or stores, reducing
function sizes and improving performance in some cases where the code
change happens in key loops.
For example in strconv BenchmarkAppendQuoteRune before:
12c5e4: 15 00 10 06 pla r10,1425660
12c5e8: fc c0 40 39
12c5ec: 00 00 6a e8 ld r3,0(r10)
12c5f0: 10 00 aa e8 ld r5,16(r10)
After this change:
12a828: 15 00 10 04 pld r3,1433272
12a82c: b8 de 60 e4
12a830: 15 00 10 04 pld r5,1433280
12a834: c0 de a0 e4
Performs better in the second case.
A testcase was added to verify that the rules correctly select a load or
store based on the offset and whether power10 or earlier.
Change-Id: I4335fed0bd9b8aba8a4f84d69b89f819cc464846
Reviewed-on: https://go-review.googlesource.com/c/go/+/477398
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
|
|
Memory op combining is currently done using arch-specific rewrite rules.
Instead, do them as a arch-independent rewrite pass. This ensures that
all architectures (with unaligned loads & stores) get equal treatment.
This removes a lot of rewrite rules.
The new pass is a bit more comprehensive. It handles things like out-of-order
writes and is careful not to apply partial optimizations that then block
further optimizations.
Change-Id: I780ff3bb052475cd725a923309616882d25b8d9e
Reviewed-on: https://go-review.googlesource.com/c/go/+/478475
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
No change in semantics, just removing an unneeded helper.
Also align rules a bit.
Change-Id: Ie4dabb99392315a7700c645b3d0931eb8766a5fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/483439
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Argument type is dangerous because it may be thinner than the actual
store being issued.
Change-Id: Id19fbd8e6c41390a453994f897dd5048473136aa
Reviewed-on: https://go-review.googlesource.com/c/go/+/483438
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
This reduces unbounded shift latency by one cycle, and may
generate less instructions in some cases.
When there is a choice whether to use doubleword or word shifts, use
doubleword shifts. Doubleword shifts have fewer hardware scheduling
restrictions across P8/P9/P10.
Likewise, rework the shift sequence to allow the compare/shift/overshift
values to compute in parallel, then choose the correct value.
Some ANDCCconst rules also need reworked to ensure they simplify when
used for their flag value. This commonly occurs when prove fails to
identify a bounded shift (e.g foo32<<uint(x&31)).
Change-Id: Ifc6ff4a865d68675e57745056db414b0eb6f2d34
Reviewed-on: https://go-review.googlesource.com/c/go/+/442597
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
For c + nil, we want the result to still be of pointer type.
Fixes ppc64le build failure with CL 468455, in issue33724.go.
The problem in that test is that it requires a nil check to be
scheduled before the corresponding load. This normally happens fine
because we prioritize nil checks. If we have nilcheck(p) and load(p),
once p is scheduled the nil check will always go before the load.
The issue we saw in 33724 is that when p is a nil pointer, we ended up
with two different p's, an int64(0) as the argument to the nil check
and an (*Outer)(0) as the argument to the load. Those two zeroes don't
get CSEd, so if the (*Outer)(0) happens to get scheduled first, the
load can end up before the nilcheck.
Fix this by always having constant arithmetic preserve the pointerness
of the value, so that both zeroes are of type *Outer and get CSEd.
Update #58482
Update #33724
Change-Id: Ib9b8c0446f1690b574e0f3c0afb9934efbaf3513
Reviewed-on: https://go-review.googlesource.com/c/go/+/468615
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Bypass: Keith Randall <khr@golang.org>
|
|
This CL adds rules that replaces instances of ISEL that produce
a boolean result based on a condition register by SETBC/SETBCR
operations. On Power10 these are convereted to SETBC/SETBCR
instructions that use one register instead of 3 registers
conventionally used by ISEL and hence reduces register pressure.
On loops written specifically to exercise such instances of ISEL
extensively, a performance improvement of 2.5% is seen on Power10.
Also added verification tests to verify correct generation of
SETBC/SETBCR instructions on Power10.
Change-Id: Ib719897f09d893de40324440a43052dca026e8fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/449795
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This change intrinsifies ReverseBytes{16|32|64} by generating the
corresponding new instructions in Power10: brh, brd and brw and
adds a verification test for the same.
On Power 9 and 8, the .go code performs optimally as it is.
Performance improvement seen on Power10:
ReverseBytes32 1.38ns ± 0% 1.18ns ± 0% -14.2
ReverseBytes64 1.52ns ± 0% 1.11ns ± 0% -26.87
ReverseBytes16 1.41ns ± 1% 1.18ns ± 0% -16.47
Change-Id: I88f127f3ab9ba24a772becc21ad90acfba324b37
Reviewed-on: https://go-review.googlesource.com/c/go/+/446675
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
The SPanchored opcode is identical to SP, except that it takes a memory
argument so that it (and more importantly, anything that uses it)
must be scheduled at or after that memory argument.
This opcode ensures that a LEAQ of a variable gets scheduled after the
corresponding VARDEF for that variable.
This may lead to less CSE of LEAQ operations. The effect is very small.
The go binary is only 80 bytes bigger after this CL. Usually LEAQs get
folded into load/store operations, so the effect is only for pointerful
types, large enough to need a duffzero, and have their address passed
somewhere. Even then, usually the CSEd LEAQs will be un-CSEd because
the two uses are on different sides of a function call and the LEAQ
ends up being rematerialized at the second use anyway.
Change-Id: Ib893562cd05369b91dd563b48fb83f5250950293
Reviewed-on: https://go-review.googlesource.com/c/go/+/452916
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Martin Möhrmann <martin@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Similar to CL 456556 but for ppc64 instead of arm64.
Change docs about how booleans are stored in registers for ppc64.
We now don't promise to keep the upper bits zeroed; they might be junk.
To test, I changed the boolean generation instructions (MOVBZload* and ISEL*
with boolean type) to OR in 0x100 to the result. all.bash still passed,
so I think nothing else is depending on the upper bits of booleans.
Update #57184
Change-Id: Ie66f8934a0dafa34d0a8c2a37324868d959a852c
Reviewed-on: https://go-review.googlesource.com/c/go/+/456437
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: KAMPANAT THUMWONG (KONG PC) <1992kongpc.kth@gmail.com>
Run-TryBot: Archana Ravindar <aravind5@in.ibm.com>
|
|
This adds a -d debug flag "fmahash" for hashcode search for
floating point architecture-dependent problems. This variable has no
effect on architectures w/o fused-multiply-add.
This was rebased onto the GOSSAHASH renovation so that this could have
its own dedicated environment variable, and so that it would be
cheap (a nil check) to check it in the normal case.
Includes a basic test of the trigger plumbing.
Sample use (on arm64, ppc64le, s390x):
% GOCOMPILEDEBUG=fmahash=001110110 \
go build -o foo cmd/compile/internal/ssa/testdata/fma.go
fmahash triggered main.main:24 101111101101111001110110
GOFMAHASH triggered main.main:20 010111010000101110111011
1.0000000000000002 1.0000000000000004 -2.220446049250313e-16
exit status 1
The intended use is in conjunction with github.com/dr2chase/gossahash,
which will probably acquire a flag "-fma" to streamline its use. This
tool+use was inspired by an ad hoc use of this technique "in anger"
to debug this very problem. This is also a dry-run for using this
same technique to identify code sensitive to loop variable
lifetime/capture, should we make that change.
Example intended use, with current search tool (using old environment
variable), for a test example:
gossahash -e GOFMAHASH GOMAGIC=GOFMAHASH go run fma.go
Trying go args=[...], env=[GOFMAHASH=1 GOMAGIC=GOFMAHASH]
go failed (81 distinct triggers): exit status 1
Trying go args=[...], env=[GOFMAHASH=11 GOMAGIC=GOFMAHASH]
go failed (39 distinct triggers): exit status 1
Trying go args=[...], env=[GOFMAHASH=011 GOMAGIC=GOFMAHASH]
go failed (18 distinct triggers): exit status 1
Trying go args=[...], env=[GOFMAHASH=0011 GOMAGIC=GOFMAHASH]
Trying go args=[...], env=[GOFMAHASH=1011 GOMAGIC=GOFMAHASH]
...
Trying go args=[...], env=[GOFMAHASH=0110111011 GOMAGIC=GOFMAHASH]
Trying go args=[...], env=[GOFMAHASH=1110111011 GOMAGIC=GOFMAHASH]
go failed (2 distinct triggers): exit status 1
Trigger string is 'GOFMAHASH triggered math.qzero:427 111111101010011110111011', repeated 6 times
Trigger string is 'GOFMAHASH triggered main.main:20 010111010000101110111011', repeated 1 times
Trying go args=[...], env=[GOFMAHASH=01110111011 GOMAGIC=GOFMAHASH]
go failed (1 distinct triggers): exit status 1
Trigger string is 'GOFMAHASH triggered main.main:20 010111010000101110111011', repeated 1 times
Review GSHS_LAST_FAIL.0.log for failing run
FINISHED, suggest this command line for debugging:
GOSSAFUNC='main.main:20 010111010000101110111011' \
GOFMAHASH=01110111011 GOMAGIC=GOFMAHASH go run fma.go
Change-Id: Ifa22dd8f1c37c18fc8a4f7c396345a364bc367d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/394754
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
This updates some rules to use ops with CC variations to
set the condition code when the result of the operation is
zero. This allows the following compare with zero to be
removed since the equivalent condition code has already been
set.
In addition, a previous rule change to use ANDCCconst was modified
to allow any constant value, not just 1 in some cases.
Improvements in the reflect package benchmarks:
DeepEqual/int8-4 23.9ns ± 1% 23.1ns ± 1% -3.57% (p=0.029 n=4+4)
DeepEqual/[]int8-4 109ns ± 2% 102ns ± 1% -6.67% (p=0.029 n=4+4)
DeepEqual/int16-4 23.8ns ± 1% 22.8ns ± 0% -3.97% (p=0.029 n=4+4)
DeepEqual/[]int16-4 108ns ± 1% 102ns ± 0% -6.25% (p=0.029 n=4+4)
DeepEqual/int32-4 24.9ns ± 3% 23.6ns ± 0% -5.09% (p=0.029 n=4+4)
DeepEqual/[]int32-4 109ns ± 1% 103ns ± 0% -5.64% (p=0.029 n=4+4)
DeepEqual/int64-4 25.5ns ± 1% 23.7ns ± 0% -7.03% (p=0.029 n=4+4)
DeepEqual/[]int64-4 109ns ± 1% 102ns ± 0% -6.73% (p=0.029 n=4+4)
DeepEqual/int-4 23.2ns ± 1% 22.7ns ± 0% -2.05% (p=0.029 n=4+4)
DeepEqual/[]int-4 109ns ± 3% 101ns ± 0% -7.18% (p=0.029 n=4+4)
DeepEqual/uint8-4 23.9ns ± 1% 23.5ns ± 0% -1.69% (p=0.029 n=4+4)
DeepEqual/[]uint8-4 89.1ns ± 0% 85.6ns ± 1% -3.95% (p=0.029 n=4+4)
DeepEqual/uint16-4 24.0ns ± 1% 23.8ns ± 0% -0.76% (p=0.343 n=4+4)
DeepEqual/[]uint16-4 111ns ± 0% 106ns ± 4% -4.74% (p=0.029 n=4+4)
DeepEqual/uint32-4 23.5ns ± 1% 23.0ns ± 0% -2.15% (p=0.029 n=4+4)
DeepEqual/[]uint32-4 110ns ± 1% 104ns ± 0% -5.66% (p=0.029 n=4+4)
DeepEqual/uint64-4 24.6ns ± 1% 24.3ns ± 0% -1.10% (p=0.143 n=4+4)
DeepEqual/[]uint64-4 111ns ± 0% 105ns ± 1% -5.16% (p=0.029 n=4+4)
DeepEqual/uint-4 23.6ns ± 0% 23.0ns ± 0% -2.70% (p=0.029 n=4+4)
DeepEqual/[]uint-4 109ns ± 0% 103ns ± 1% -5.74% (p=0.029 n=4+4)
DeepEqual/uintptr-4 25.1ns ± 1% 24.8ns ± 2% -1.11% (p=0.171 n=4+4)
DeepEqual/[]uintptr-4 111ns ± 0% 106ns ± 1% -4.45% (p=0.029 n=4+4)
DeepEqual/float32-4 22.5ns ± 0% 22.2ns ± 0% -1.29% (p=0.029 n=4+4)
DeepEqual/[]float32-4 105ns ± 0% 101ns ± 1% -3.75% (p=0.029 n=4+4)
DeepEqual/float64-4 22.7ns ± 2% 22.1ns ± 0% -2.52% (p=0.029 n=4+4)
DeepEqual/[]float64-4 105ns ± 1% 103ns ± 1% -2.77% (p=0.029 n=4+4)
DeepEqual/complex64-4 22.9ns ± 0% 22.8ns ± 0% -0.48% (p=0.029 n=4+4)
DeepEqual/[]complex64-4 107ns ± 0% 101ns ± 0% -5.48% (p=0.029 n=4+4)
DeepEqual/complex128-4 23.2ns ± 1% 22.6ns ± 0% -2.34% (p=0.029 n=4+4)
DeepEqual/[]complex128-4 107ns ± 0% 101ns ± 0% -5.60% (p=0.029 n=4+4)
DeepEqual/bool-4 22.0ns ± 1% 21.7ns ± 0% -1.44% (p=0.029 n=4+4)
DeepEqual/[]bool-4 106ns ± 1% 100ns ± 0% -5.42% (p=0.029 n=4+4)
DeepEqual/string-4 26.7ns ± 1% 24.7ns ± 0% -7.47% (p=0.029 n=4+4)
DeepEqual/[]string-4 112ns ± 0% 107ns ± 0% -4.21% (p=0.029 n=4+4)
DeepEqual/[]uint8#01-4 89.4ns ± 1% 85.5ns ± 1% -4.44% (p=0.029 n=4+4)
DeepEqual/[][]uint8-4 177ns ± 0% 173ns ± 1% -2.22% (p=0.029 n=4+4)
DeepEqual/[6]uint8-4 137ns ± 1% 137ns ± 0% -0.56% (p=0.057 n=4+4)
DeepEqual/[][6]uint8-4 232ns ± 0% 230ns ± 1% -1.09% (p=0.029 n=4+4)
Change-Id: I275624e21dc4d70001032be48897f1504cbfdd1c
Reviewed-on: https://go-review.googlesource.com/c/go/+/427634
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
These two directories are full of //go:build ignore files.
We can ignore them more easily by putting an underscore
at the start of the name. That also works around a bug
in Go 1.17 that was not fixed until Go 1.17.3.
Change-Id: Ia5389b65c79b1e6d08e4fef374d335d776d44ead
Reviewed-on: https://go-review.googlesource.com/c/go/+/435472
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|