| Age | Commit message (Collapse) | Author |
|
Logical ops on uint8/uint16 (AND/OR/XOR) with constants sometimes
materialized the mask via MOVD (often as a negative immediate), even
when the value fit in the UI-immediate range. This prevented the backend
from selecting andi. / ori / xori forms.
This CL makes:
UI-immediate truncation is performed only at the use-site of
logical-immediate ops, and only when the constant does not fit in the
8- or 16-bit unsigned domain (m != uint8(m) / m != uint16(m)).
This avoids negative-mask materialization and enables correct emission of
UI-form logical instructions. Arithmetic SI-immediate instructions (addi, subfic, etc.) and other
use-patterns are unchanged.
Codegen tests are added to ensure the expected andi./ori/xori
patterns appear and that MOVD is not emitted for valid 8/16-bit masks.
Change-Id: I9fcdf4498c4e984c7587814fb9019a75865c4a0d
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/704015
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Paul Murphy <paumurph@redhat.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
|
|
On ppc64/ppc64le, rewrite (x + x) << c to x << (c+1) for constant shifts. This removes an ADD, shortens the dependency chain, and reduces code size.
Add rules for both 64-bit (SLDconst) and 32-bit (SLWconst), and extend
test/codegen/shift.go with ppc64x checks to assert a single SLD/SLW and
forbid ADD. Aligns ppc64 with other architectures that already assert
similar codegen in shift.go.
Change-Id: Ie564afbb029a5bd48887b82b0c455ca1dddd5508
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/712000
Reviewed-by: Archana Ravindar <aravinda@redhat.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
type I interface {
foo()
}
type S struct {
I
}
Because I is embedded in S, S needs a foo method. We generate a
wrapper function to implement (*S).foo. It just loads the embedded
field I out of S and calls foo on it.
When the thing in S.I itself needs a wrapper, then we have a wrapper
calling another wrapper. This can continue, leaving a potentially long
sequence of wrappers on the stack. When we then call runtime.Callers
or friends, we have to walk an unbounded number of frames to find a
bounded number of non-wrapper frames.
This really happens, for instance with I = context.Context, S =
context.ValueCtx, and runtime.Callers = pprof sample (for any of
context.Context's methods).
To fix, make the interface call in the wrapper a tail call.
That way, the number of wrapper frames on the stack does not
increase when there are lots of wrappers happening.
Fixes #75764
Fixes #77781
Change-Id: I03b1731159d9218c7f14f72ecbbac822d6a6bb87
Reviewed-on: https://go-review.googlesource.com/c/go/+/751465
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Fixes #76084
I was focused on restoring the old behavior and fixing the failing
test/codegen/arithmetic.go:MergeMuls2 test.
It is probable this same bug hides elsewhere in this file.
Change-Id: I17f2ee6b97a1e33b8132648d9d750749d006f7e0
Reviewed-on: https://go-review.googlesource.com/c/go/+/715560
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Paul Murphy <paumurph@redhat.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
|
|
Change-Id: I25a9bbc247b2490e7e37ed843386f53a71822146
Reviewed-on: https://go-review.googlesource.com/c/go/+/682498
Reviewed-by: Paul Murphy <paumurph@redhat.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Decompose Ctz16 and Ctz8 within the SSA rules for LOONG64, MIPS, PPC64
and S390X, rather than having a custom intrinsic. Note that for PPC64 this
actually allows the existing Ctz16 and Ctz8 rules to be used.
Change-Id: I27a5e978f852b9d75396d2a80f5d7dfcb5ef7dd4
Reviewed-on: https://go-review.googlesource.com/c/go/+/651816
Reviewed-by: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Decompose BitLen16 and BitLen8 within the SSA rules for architectures that
support BitLen32 or BitLen64, rather than having a custom intrinsic.
Change-Id: Ie4188ce69d1021e63cec27a8e7418efb0714812b
Reviewed-on: https://go-review.googlesource.com/c/go/+/651817
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Fold constant int16 addends for usages of math/bits.Add64(x,const,0)
on PPC64. This usage shows up in a few crypto implementations;
notably the go wrapper for CL 626176.
Change-Id: I6963163330487d04e0479b4fdac235f97bb96889
Reviewed-on: https://go-review.googlesource.com/c/go/+/625899
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
CL 621357 introduced new generic lowering rules which caused
several shift related codegen test failures.
Add new rules to fix the test regressions, and cleanup tests
which are changed but not regressed. Some CLRLSLDI tests are
removed as they are no test CLRLSLDI rules.
Fixes #70003
Change-Id: I1ecc5a7e63ab709a4a0cebf11fa078d5cf164034
Reviewed-on: https://go-review.googlesource.com/c/go/+/622236
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This is minor extension of the existing support for 32 and
64 bit types.
For #69735
Change-Id: I6828ec223951d2b692e077dc507b000ac23c32a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/617496
Reviewed-by: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Change-Id: I50a19bd971176598bf8e4ef86ec98f008abe245c
Reviewed-on: https://go-review.googlesource.com/c/go/+/615198
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
This can be done efficiently with few instructions.
This also adds MULHDUCC for further codegen improvement.
Change-Id: I06320ba4383a679341b911a237a360ef07b19168
Reviewed-on: https://go-review.googlesource.com/c/go/+/605975
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Archana Ravindar <aravinda@redhat.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
RLIWNM does not clear the upper 32 bits of the target register if
the mask wraps around (e.g 0xF000000F). Don't elide MOVWZreg for
such masks. All other usage clears the upper 32 bits.
Fixes #67844.
Change-Id: I11b89f1da9ae077624369bfe2bf25e9b7c9b79bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/590896
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This allows more effective conversion of rotate and mask opcodes
into their CC equivalents, while simplifying the first lowering
pass.
This was removed before the latelower pass was introduced to fold
more cases of compare against zero. Add ANDconst to push the
conversion of ANDconst to ANDCCconst into latelower with the other
CC opcodes.
This also requires introducing RLDICLCC to prevent regressions
when ANDconst is converted to RLDICL then to RLDICLCC and back
to ANDCCconst when possible.
Change-Id: I9e5f9c99fbefa334db18c6c152c5f967f3ff2590
Reviewed-on: https://go-review.googlesource.com/c/go/+/586160
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
Avoid creating duplicate usages of ANDCCconst. This is preparation for
a patch to reintroduce ANDconst to simplify the lower pass while
treating ANDCCconst like other *CC* ssa opcodes.
Also, move many of the similar rules wich retarget ANDCCconst users
to the flag result to a common rule for all compares against zero.
Change-Id: Ida86efe17ff413cb82c349d8ef69d2899361f4c0
Reviewed-on: https://go-review.googlesource.com/c/go/+/585400
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Investigating binaries, these patterns seem to show up frequently.
Change-Id: I987251e4070e35c25e98da321e444ccaa1526912
Reviewed-on: https://go-review.googlesource.com/c/go/+/583302
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
RLWINM
This provides a small performance bump to crc64 as measured on ppc64le/power10:
name old time/op new time/op delta
Crc64/ISO64KB 49.6µs ± 0% 46.6µs ± 0% -6.18%
Crc64/ISO4KB 3.16µs ± 0% 2.97µs ± 0% -5.83%
Crc64/ISO1KB 840ns ± 0% 794ns ± 0% -5.46%
Crc64/ECMA64KB 49.6µs ± 0% 46.5µs ± 0% -6.20%
Crc64/Random64KB 53.1µs ± 0% 49.9µs ± 0% -6.04%
Crc64/Random16KB 15.9µs ± 1% 15.0µs ± 0% -5.73%
Change-Id: I302b5431c7dc46dfd2d211545c483bdcdfe011f1
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/581937
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Eli Bendersky <eliben@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
This enables efficient use of the builtin min/max function
for float64 and float32 types on GOPPC64 >= power9.
Extend the assembler to support xsminjdp/xsmaxjdp and use
them to implement float min/max.
Simplify the VSX xx3 opcode rules to allow FPR arguments,
if all arguments are an FPR.
Change-Id: I15882a4ce5dc46eba71d683cf1d184dc4236a328
Reviewed-on: https://go-review.googlesource.com/c/go/+/574535
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
|
|
This usage shows up in quite a few places, and helps reduce
register pressure in several complex cryto functions by
removing a MOVD $0,... instruction.
Change-Id: I9444ea8f9d19bfd68fb71ea8dc34e109681b3802
Reviewed-on: https://go-review.googlesource.com/c/go/+/571055
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
|
|
For example, the Slicemask rule in PPC64 generates a sequence wherein there is andi operation, after an sradi, which can be replaced by srdi. This new rule eliminates ANDCCconst.
Change-Id: I27aaadf76b9c749a60bcdc5e87b1ebb8167d2fd4
Reviewed-on: https://go-review.googlesource.com/c/go/+/539055
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
In the Power10 rule to fold bit reversal into load, the MOVWZreg or
MOVHZreg (Zeroing out the upper bits of a word or halfword) becomes
redundant since byte reverse (BR) load clears the upper bits. Hence
removing for Power10. Similarly for < Power10 cases in the rule used to
fold bit reversal into load (Bswap), the above redundant operation is removed.
Change-Id: Idb027e8b6e79b6acfb81d48a9a6cc06f8e9cd2db
Reviewed-on: https://go-review.googlesource.com/c/go/+/531377
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
In the PPC64 ISA, the instruction to do an 'and' operation
using an immediate constant is only available in the form that
also sets CR0 (i.e. clobbers the condition register.) This means
CR0 is being clobbered unnecessarily in many cases. That
affects some decisions made during some compiler passes
that check for it.
In those cases when the constant used by the ANDCC is a right
justified consecutive set of bits, a shift instruction can
be used which has the same effect if CR0 does not need to be
set. The rule to do that has been added to the late rules file
after other rules using ANDCCconst have been processed in the
main rules file.
Some codegen tests had to be updated since ANDCC is no
longer generated for some cases. A new test case was added to
verify the ANDCC is present if the results for both the AND
and CR0 are used.
Change-Id: I304f607c039a458e2d67d25351dd00aea72ba542
Reviewed-on: https://go-review.googlesource.com/c/go/+/531435
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
This adds some rules to recognize MOVDaddr in those cases where
it is just adding 0 to a ptr value. Instead the ptr value can just
be used.
Change-Id: I95188defc9701165c86bbea70d14d037a9e54853
Reviewed-on: https://go-review.googlesource.com/c/go/+/527698
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Paul Murphy <murp@ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
|
|
This sequence can show up in the lowering pass on PPC64. If it
makes it to the latelower pass, it will cause an error because
it looks like it can be turned into RLDICL, but -1 isn't an
accepted mask.
Also, print more debug info if panic is called from
encodePPC64RotateMask.
Fixes #62698
Change-Id: I0f3322e2205357abe7fc28f96e05e3f7ad65567c
Reviewed-on: https://go-review.googlesource.com/c/go/+/529195
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
(ANDCCconst [y] (MOV.*reg x)) should only be merged when zero
extending. Otherwise, sign bits are lost on negative values.
(ANDCCconst [0xFF] (MOVBreg x)) should be simplified to a zero
extension of x. Likewise for the MOVHreg variant.
Fixes #61297
Change-Id: I04e4fd7dc6a826e870681f37506620d48393698b
Reviewed-on: https://go-review.googlesource.com/c/go/+/508775
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This modifies some existing rules to allow more prefixed instructions
to be generated when using GOPPC64=power10. Some rules also check
if PCRel is available, which is currently supported for linux/ppc64le
and linux/ppc64 (internal linking only).
Prior to p10, DS-offset loads and stores had a 16 bit size limit for
the offset field. If the offset of the data for load or store was
beyond this range then an indexed load or store would be selected by
the rules.
In p10 the assembler can generate prefixed instructions in this case,
but does not if an indexed instruction was selected during the lowering
pass.
This allows many more cases to use prefixed loads or stores, reducing
function sizes and improving performance in some cases where the code
change happens in key loops.
For example in strconv BenchmarkAppendQuoteRune before:
12c5e4: 15 00 10 06 pla r10,1425660
12c5e8: fc c0 40 39
12c5ec: 00 00 6a e8 ld r3,0(r10)
12c5f0: 10 00 aa e8 ld r5,16(r10)
After this change:
12a828: 15 00 10 04 pld r3,1433272
12a82c: b8 de 60 e4
12a830: 15 00 10 04 pld r5,1433280
12a834: c0 de a0 e4
Performs better in the second case.
A testcase was added to verify that the rules correctly select a load or
store based on the offset and whether power10 or earlier.
Change-Id: I4335fed0bd9b8aba8a4f84d69b89f819cc464846
Reviewed-on: https://go-review.googlesource.com/c/go/+/477398
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
|
|
Memory op combining is currently done using arch-specific rewrite rules.
Instead, do them as a arch-independent rewrite pass. This ensures that
all architectures (with unaligned loads & stores) get equal treatment.
This removes a lot of rewrite rules.
The new pass is a bit more comprehensive. It handles things like out-of-order
writes and is careful not to apply partial optimizations that then block
further optimizations.
Change-Id: I780ff3bb052475cd725a923309616882d25b8d9e
Reviewed-on: https://go-review.googlesource.com/c/go/+/478475
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
No change in semantics, just removing an unneeded helper.
Also align rules a bit.
Change-Id: Ie4dabb99392315a7700c645b3d0931eb8766a5fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/483439
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Argument type is dangerous because it may be thinner than the actual
store being issued.
Change-Id: Id19fbd8e6c41390a453994f897dd5048473136aa
Reviewed-on: https://go-review.googlesource.com/c/go/+/483438
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
This reduces unbounded shift latency by one cycle, and may
generate less instructions in some cases.
When there is a choice whether to use doubleword or word shifts, use
doubleword shifts. Doubleword shifts have fewer hardware scheduling
restrictions across P8/P9/P10.
Likewise, rework the shift sequence to allow the compare/shift/overshift
values to compute in parallel, then choose the correct value.
Some ANDCCconst rules also need reworked to ensure they simplify when
used for their flag value. This commonly occurs when prove fails to
identify a bounded shift (e.g foo32<<uint(x&31)).
Change-Id: Ifc6ff4a865d68675e57745056db414b0eb6f2d34
Reviewed-on: https://go-review.googlesource.com/c/go/+/442597
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
For c + nil, we want the result to still be of pointer type.
Fixes ppc64le build failure with CL 468455, in issue33724.go.
The problem in that test is that it requires a nil check to be
scheduled before the corresponding load. This normally happens fine
because we prioritize nil checks. If we have nilcheck(p) and load(p),
once p is scheduled the nil check will always go before the load.
The issue we saw in 33724 is that when p is a nil pointer, we ended up
with two different p's, an int64(0) as the argument to the nil check
and an (*Outer)(0) as the argument to the load. Those two zeroes don't
get CSEd, so if the (*Outer)(0) happens to get scheduled first, the
load can end up before the nilcheck.
Fix this by always having constant arithmetic preserve the pointerness
of the value, so that both zeroes are of type *Outer and get CSEd.
Update #58482
Update #33724
Change-Id: Ib9b8c0446f1690b574e0f3c0afb9934efbaf3513
Reviewed-on: https://go-review.googlesource.com/c/go/+/468615
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Bypass: Keith Randall <khr@golang.org>
|
|
This CL adds rules that replaces instances of ISEL that produce
a boolean result based on a condition register by SETBC/SETBCR
operations. On Power10 these are convereted to SETBC/SETBCR
instructions that use one register instead of 3 registers
conventionally used by ISEL and hence reduces register pressure.
On loops written specifically to exercise such instances of ISEL
extensively, a performance improvement of 2.5% is seen on Power10.
Also added verification tests to verify correct generation of
SETBC/SETBCR instructions on Power10.
Change-Id: Ib719897f09d893de40324440a43052dca026e8fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/449795
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This change intrinsifies ReverseBytes{16|32|64} by generating the
corresponding new instructions in Power10: brh, brd and brw and
adds a verification test for the same.
On Power 9 and 8, the .go code performs optimally as it is.
Performance improvement seen on Power10:
ReverseBytes32 1.38ns ± 0% 1.18ns ± 0% -14.2
ReverseBytes64 1.52ns ± 0% 1.11ns ± 0% -26.87
ReverseBytes16 1.41ns ± 1% 1.18ns ± 0% -16.47
Change-Id: I88f127f3ab9ba24a772becc21ad90acfba324b37
Reviewed-on: https://go-review.googlesource.com/c/go/+/446675
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
The SPanchored opcode is identical to SP, except that it takes a memory
argument so that it (and more importantly, anything that uses it)
must be scheduled at or after that memory argument.
This opcode ensures that a LEAQ of a variable gets scheduled after the
corresponding VARDEF for that variable.
This may lead to less CSE of LEAQ operations. The effect is very small.
The go binary is only 80 bytes bigger after this CL. Usually LEAQs get
folded into load/store operations, so the effect is only for pointerful
types, large enough to need a duffzero, and have their address passed
somewhere. Even then, usually the CSEd LEAQs will be un-CSEd because
the two uses are on different sides of a function call and the LEAQ
ends up being rematerialized at the second use anyway.
Change-Id: Ib893562cd05369b91dd563b48fb83f5250950293
Reviewed-on: https://go-review.googlesource.com/c/go/+/452916
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Martin Möhrmann <martin@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
The standard way to generate code in a Go package is via //go:generate
directives, which are invoked by the developer explicitly running:
go generate import/path/of/said/package
Switch to using that approach here.
This way, developers don't need to learn and remember a custom way that
each particular Go package may choose to implement its code generation.
It also enables conveniences such as 'go generate -n' to discover how
code is generated without running anything (this works on all packages
that rely on //go:generate directives), being able to generate multiple
packages at once and from any directory, and so on.
Change-Id: I0e5b6a1edeff670a8e588befeef0c445613803c7
Reviewed-on: https://go-review.googlesource.com/c/go/+/460135
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Similar to CL 456556 but for ppc64 instead of arm64.
Change docs about how booleans are stored in registers for ppc64.
We now don't promise to keep the upper bits zeroed; they might be junk.
To test, I changed the boolean generation instructions (MOVBZload* and ISEL*
with boolean type) to OR in 0x100 to the result. all.bash still passed,
so I think nothing else is depending on the upper bits of booleans.
Update #57184
Change-Id: Ie66f8934a0dafa34d0a8c2a37324868d959a852c
Reviewed-on: https://go-review.googlesource.com/c/go/+/456437
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: KAMPANAT THUMWONG (KONG PC) <1992kongpc.kth@gmail.com>
Run-TryBot: Archana Ravindar <aravind5@in.ibm.com>
|
|
This adds a -d debug flag "fmahash" for hashcode search for
floating point architecture-dependent problems. This variable has no
effect on architectures w/o fused-multiply-add.
This was rebased onto the GOSSAHASH renovation so that this could have
its own dedicated environment variable, and so that it would be
cheap (a nil check) to check it in the normal case.
Includes a basic test of the trigger plumbing.
Sample use (on arm64, ppc64le, s390x):
% GOCOMPILEDEBUG=fmahash=001110110 \
go build -o foo cmd/compile/internal/ssa/testdata/fma.go
fmahash triggered main.main:24 101111101101111001110110
GOFMAHASH triggered main.main:20 010111010000101110111011
1.0000000000000002 1.0000000000000004 -2.220446049250313e-16
exit status 1
The intended use is in conjunction with github.com/dr2chase/gossahash,
which will probably acquire a flag "-fma" to streamline its use. This
tool+use was inspired by an ad hoc use of this technique "in anger"
to debug this very problem. This is also a dry-run for using this
same technique to identify code sensitive to loop variable
lifetime/capture, should we make that change.
Example intended use, with current search tool (using old environment
variable), for a test example:
gossahash -e GOFMAHASH GOMAGIC=GOFMAHASH go run fma.go
Trying go args=[...], env=[GOFMAHASH=1 GOMAGIC=GOFMAHASH]
go failed (81 distinct triggers): exit status 1
Trying go args=[...], env=[GOFMAHASH=11 GOMAGIC=GOFMAHASH]
go failed (39 distinct triggers): exit status 1
Trying go args=[...], env=[GOFMAHASH=011 GOMAGIC=GOFMAHASH]
go failed (18 distinct triggers): exit status 1
Trying go args=[...], env=[GOFMAHASH=0011 GOMAGIC=GOFMAHASH]
Trying go args=[...], env=[GOFMAHASH=1011 GOMAGIC=GOFMAHASH]
...
Trying go args=[...], env=[GOFMAHASH=0110111011 GOMAGIC=GOFMAHASH]
Trying go args=[...], env=[GOFMAHASH=1110111011 GOMAGIC=GOFMAHASH]
go failed (2 distinct triggers): exit status 1
Trigger string is 'GOFMAHASH triggered math.qzero:427 111111101010011110111011', repeated 6 times
Trigger string is 'GOFMAHASH triggered main.main:20 010111010000101110111011', repeated 1 times
Trying go args=[...], env=[GOFMAHASH=01110111011 GOMAGIC=GOFMAHASH]
go failed (1 distinct triggers): exit status 1
Trigger string is 'GOFMAHASH triggered main.main:20 010111010000101110111011', repeated 1 times
Review GSHS_LAST_FAIL.0.log for failing run
FINISHED, suggest this command line for debugging:
GOSSAFUNC='main.main:20 010111010000101110111011' \
GOFMAHASH=01110111011 GOMAGIC=GOFMAHASH go run fma.go
Change-Id: Ifa22dd8f1c37c18fc8a4f7c396345a364bc367d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/394754
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
The gen folder was renamed to _gen in CL 435472, but references in code
and docs were not updated. This updates the references.
Change-Id: Ibadc0cdcb5bed145c3257b58465a8df370487ae5
Reviewed-on: https://go-review.googlesource.com/c/go/+/444355
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Johan Brandhorst-Satzkorn <johan.brandhorst@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This updates some rules to use ops with CC variations to
set the condition code when the result of the operation is
zero. This allows the following compare with zero to be
removed since the equivalent condition code has already been
set.
In addition, a previous rule change to use ANDCCconst was modified
to allow any constant value, not just 1 in some cases.
Improvements in the reflect package benchmarks:
DeepEqual/int8-4 23.9ns ± 1% 23.1ns ± 1% -3.57% (p=0.029 n=4+4)
DeepEqual/[]int8-4 109ns ± 2% 102ns ± 1% -6.67% (p=0.029 n=4+4)
DeepEqual/int16-4 23.8ns ± 1% 22.8ns ± 0% -3.97% (p=0.029 n=4+4)
DeepEqual/[]int16-4 108ns ± 1% 102ns ± 0% -6.25% (p=0.029 n=4+4)
DeepEqual/int32-4 24.9ns ± 3% 23.6ns ± 0% -5.09% (p=0.029 n=4+4)
DeepEqual/[]int32-4 109ns ± 1% 103ns ± 0% -5.64% (p=0.029 n=4+4)
DeepEqual/int64-4 25.5ns ± 1% 23.7ns ± 0% -7.03% (p=0.029 n=4+4)
DeepEqual/[]int64-4 109ns ± 1% 102ns ± 0% -6.73% (p=0.029 n=4+4)
DeepEqual/int-4 23.2ns ± 1% 22.7ns ± 0% -2.05% (p=0.029 n=4+4)
DeepEqual/[]int-4 109ns ± 3% 101ns ± 0% -7.18% (p=0.029 n=4+4)
DeepEqual/uint8-4 23.9ns ± 1% 23.5ns ± 0% -1.69% (p=0.029 n=4+4)
DeepEqual/[]uint8-4 89.1ns ± 0% 85.6ns ± 1% -3.95% (p=0.029 n=4+4)
DeepEqual/uint16-4 24.0ns ± 1% 23.8ns ± 0% -0.76% (p=0.343 n=4+4)
DeepEqual/[]uint16-4 111ns ± 0% 106ns ± 4% -4.74% (p=0.029 n=4+4)
DeepEqual/uint32-4 23.5ns ± 1% 23.0ns ± 0% -2.15% (p=0.029 n=4+4)
DeepEqual/[]uint32-4 110ns ± 1% 104ns ± 0% -5.66% (p=0.029 n=4+4)
DeepEqual/uint64-4 24.6ns ± 1% 24.3ns ± 0% -1.10% (p=0.143 n=4+4)
DeepEqual/[]uint64-4 111ns ± 0% 105ns ± 1% -5.16% (p=0.029 n=4+4)
DeepEqual/uint-4 23.6ns ± 0% 23.0ns ± 0% -2.70% (p=0.029 n=4+4)
DeepEqual/[]uint-4 109ns ± 0% 103ns ± 1% -5.74% (p=0.029 n=4+4)
DeepEqual/uintptr-4 25.1ns ± 1% 24.8ns ± 2% -1.11% (p=0.171 n=4+4)
DeepEqual/[]uintptr-4 111ns ± 0% 106ns ± 1% -4.45% (p=0.029 n=4+4)
DeepEqual/float32-4 22.5ns ± 0% 22.2ns ± 0% -1.29% (p=0.029 n=4+4)
DeepEqual/[]float32-4 105ns ± 0% 101ns ± 1% -3.75% (p=0.029 n=4+4)
DeepEqual/float64-4 22.7ns ± 2% 22.1ns ± 0% -2.52% (p=0.029 n=4+4)
DeepEqual/[]float64-4 105ns ± 1% 103ns ± 1% -2.77% (p=0.029 n=4+4)
DeepEqual/complex64-4 22.9ns ± 0% 22.8ns ± 0% -0.48% (p=0.029 n=4+4)
DeepEqual/[]complex64-4 107ns ± 0% 101ns ± 0% -5.48% (p=0.029 n=4+4)
DeepEqual/complex128-4 23.2ns ± 1% 22.6ns ± 0% -2.34% (p=0.029 n=4+4)
DeepEqual/[]complex128-4 107ns ± 0% 101ns ± 0% -5.60% (p=0.029 n=4+4)
DeepEqual/bool-4 22.0ns ± 1% 21.7ns ± 0% -1.44% (p=0.029 n=4+4)
DeepEqual/[]bool-4 106ns ± 1% 100ns ± 0% -5.42% (p=0.029 n=4+4)
DeepEqual/string-4 26.7ns ± 1% 24.7ns ± 0% -7.47% (p=0.029 n=4+4)
DeepEqual/[]string-4 112ns ± 0% 107ns ± 0% -4.21% (p=0.029 n=4+4)
DeepEqual/[]uint8#01-4 89.4ns ± 1% 85.5ns ± 1% -4.44% (p=0.029 n=4+4)
DeepEqual/[][]uint8-4 177ns ± 0% 173ns ± 1% -2.22% (p=0.029 n=4+4)
DeepEqual/[6]uint8-4 137ns ± 1% 137ns ± 0% -0.56% (p=0.057 n=4+4)
DeepEqual/[][6]uint8-4 232ns ± 0% 230ns ± 1% -1.09% (p=0.029 n=4+4)
Change-Id: I275624e21dc4d70001032be48897f1504cbfdd1c
Reviewed-on: https://go-review.googlesource.com/c/go/+/427634
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
In CL 429035 Keith suggested removing some
rules that were covered by generic rules. This
follows up on that comment.
Change-Id: I57b6c9ae0cd85f33a0eb2fef8356575d3d7820fb
Reviewed-on: https://go-review.googlesource.com/c/go/+/430417
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Remove OpPPC64LoweredMuluhilo as this operation can be done
more efficiently with MULHDU and MULLD directly. This has the
benefit of not needing to use tuple select operations, and giving
the scheduler more freedom to place these operations.
The primary reason to avoid using tuples here is to to avoid
suboptimal scheduling when carry ops (e.x ADDC/ADDE) are used in
the same block as 64->128b multiples. CL 432275 modifies the
scheduling priorities which may cause non-flag/non-carry generating
tuple ops to interfere with carry opcodes. Thus resulting in excess
saving and restoring of the XER register.
This allows CL 432275 to adjust the scheduling priorities without
having to workaround odd tuple scheduling behavior.
Change-Id: Id04ef009ec4b86416e5436f2b44ae1474e73720e
Reviewed-on: https://go-review.googlesource.com/c/go/+/434855
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This uses rulegen syntax which allows similar rules
to be combined, saving lines in the rules file.
The Lsh16x32 rule had an incorrect value and that was
fixed.
Change-Id: I637410e39d8554825076aca5ac24083ce05ab186
Reviewed-on: https://go-review.googlesource.com/c/go/+/429035
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Detect rotate instructions while still in architecture-independent form.
It's easier to do here, and we don't need to repeat it in each
architecture file.
Change-Id: I9396954b3f3b3bfb96c160d064a02002309935bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/421195
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Eric Fang <eric.fang@arm.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
Reviewed-by: Ruinan Sun <Ruinan.Sun@arm.com>
Run-TryBot: Keith Randall <khr@golang.org>
|
|
Currently there is a an ANDconst and an ANDCCconst op in PPC64,
which is confusing since they map onto the same instruction.
One of these ops sets the result of the AND operation, and the
other sets the flag (condition register).
This converts ANDCCconst into an op with the 2 expected results:
the integer result of the AND and the flag setting. The ANDconst
op has been removed.
Note that in the PPC64 ISA the only variation of the 'and immediate'
is the one that sets the condition bit, which probably led to the
original (confusing) implementation.
This also adds a few rules to improve the use of ANDCCconst with
ISELB and some testcases to verify those improvements.
Change-Id: I523703fa4da2098eb995dc3ba744d36fa28e41d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/422015
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
|
|
math/bits.Add64 and math/bits.Sub64 now lower and optimize
directly in SSA form.
The optimization of carry chains focuses around eliding
XER<->GPR transfers of the CA bit when used exclusively as an
input to a single carry operations, or when the CA value is
known.
This also adds support for handling XER spills in the assembler
which could happen if carry chains contain inter-dependencies
on each other (which seems very unlikely with practical usage),
or a clobber happens (SRAW/SRAD/SUBFC operations clobber CA).
With PPC64 Add64/Sub64 lowering into SSA and this patch, the net
performance difference in crypto/elliptic benchmarks on P9/ppc64le
are:
name old time/op new time/op delta
ScalarBaseMult/P256 46.3µs ± 0% 46.9µs ± 0% +1.34%
ScalarBaseMult/P224 356µs ± 0% 209µs ± 0% -41.14%
ScalarBaseMult/P384 1.20ms ± 0% 0.57ms ± 0% -52.14%
ScalarBaseMult/P521 3.38ms ± 0% 1.44ms ± 0% -57.27%
ScalarMult/P256 199µs ± 0% 199µs ± 0% -0.17%
ScalarMult/P224 357µs ± 0% 212µs ± 0% -40.56%
ScalarMult/P384 1.20ms ± 0% 0.58ms ± 0% -51.86%
ScalarMult/P521 3.37ms ± 0% 1.44ms ± 0% -57.32%
MarshalUnmarshal/P256/Uncompressed 2.59µs ± 0% 2.52µs ± 0% -2.63%
MarshalUnmarshal/P256/Compressed 2.58µs ± 0% 2.52µs ± 0% -2.06%
MarshalUnmarshal/P224/Uncompressed 1.54µs ± 0% 1.40µs ± 0% -9.42%
MarshalUnmarshal/P224/Compressed 1.54µs ± 0% 1.39µs ± 0% -9.87%
MarshalUnmarshal/P384/Uncompressed 2.40µs ± 0% 1.80µs ± 0% -24.93%
MarshalUnmarshal/P384/Compressed 2.35µs ± 0% 1.81µs ± 0% -23.03%
MarshalUnmarshal/P521/Uncompressed 3.79µs ± 0% 2.58µs ± 0% -31.81%
MarshalUnmarshal/P521/Compressed 3.80µs ± 0% 2.60µs ± 0% -31.67%
Note, P256 uses an asm implementation, thus, little variation is expected.
Change-Id: I88a24f6bf0f4f285c649e40243b1ab69cc452b71
Reviewed-on: https://go-review.googlesource.com/c/go/+/346870
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
Following CL 405114, for PPC64.
Should fix PPC64 builds.
Updates #52788.
Change-Id: I193ac31cfba18b4f907dd2523b51368251fd6fad
Reviewed-on: https://go-review.googlesource.com/c/go/+/405116
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
For this code:
z &= 63
_ = x<<z | x>>(64-z)
Now can prove 'x<<z' in bound. In ppc64 lowering pass, it will not
produce an extra '(ANDconst <typ.Int64> [63] z)' causing
codegen/rotate.go failed. Just remove the type check in rewrite rules
as the workaround.
Removes 32 bounds checks during make.bat.
Fixes #52563.
Change-Id: I14ed2c093ff5638dfea7de9bc7649c0f756dd71a
Reviewed-on: https://go-review.googlesource.com/c/go/+/404315
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This enables publicationBarrier to be used as an intrinsic
on ppc64le/ppc64.
A call to this appears in test/bench/go1 BinaryTree17
Change-Id: If53528a82de99688270473cbe23472f37046ad65
Reviewed-on: https://go-review.googlesource.com/c/go/+/404056
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
This shows up in a few crypto functions, and other
assorted places.
Change-Id: I5a7f4c25ddd4a6499dc295ef693b9fe43d2448ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/404057
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
This CL fixes encoding of PrefetchStreamed on PPC64 to be consistent
with what is implemented on AMD64 and ARM64 platforms which is
prefetchNTA (prefetch non-temporal access). Looking at the definition
of prefetchNTA, the closest corresponding Touch hint (TH) value to be
used on PPC64 is 16 that states that the address is accessed in a
transient manner. Current usage of TH=8 may cause degraded
performance.
Change-Id: I393bf5a9b971a22f632b3cbfb4fa659062af9a27
Reviewed-on: https://go-review.googlesource.com/c/go/+/390316
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|