| Age | Commit message | Author |
|
The SSA generic rewrite rules implement DeMorgan's laws but are
missing the closely related boolean absorption laws:
x & (x | y) == x
x | (x & y) == x
These are fundamental boolean algebra identities (see
https://en.wikipedia.org/wiki/Absorption_law) that hold for all
bit patterns, all widths, signed and unsigned. Both GCC and LLVM
recognize and optimize these patterns at -O2.
Add two generic rules covering all four widths (8, 16, 32, 64).
Commutativity of AND/OR is handled automatically by the rule
engine, so all argument orderings are matched.
The rules eliminate two redundant ALU instructions per occurrence
and fire on real code (defer bit-manipulation patterns in runtime,
testing, go/parser, and third-party packages).
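A minimal sketch of the two identities, checked exhaustively at 8 bits. The helper names (absorbAnd, absorbOr, checkAbsorption) are made up for illustration and are not part of the CL; the actual change lives in the SSA rewrite rules, not in Go source.

```go
package main

import "fmt"

// absorbAnd computes x & (x | y); by the absorption law this equals x.
func absorbAnd(x, y uint8) uint8 { return x & (x | y) }

// absorbOr computes x | (x & y); by the absorption law this equals x.
func absorbOr(x, y uint8) uint8 { return x | (x & y) }

// checkAbsorption tests both identities over every pair of uint8 values.
func checkAbsorption() bool {
	for x := 0; x < 256; x++ {
		for y := 0; y < 256; y++ {
			a, b := uint8(x), uint8(y)
			if absorbAnd(a, b) != a || absorbOr(a, b) != a {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println("absorption laws hold for all uint8 pairs:", checkAbsorption())
}
```

Because the identities are bitwise, the same argument carries over to the wider types unchanged.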
Fixes #78632
Change-Id: Ib59e839081302ad1635e823309d8aec768c25dcf
GitHub-Last-Rev: 23f8296ece08c77fcaeeaf59c2c2d8ce23d1202c
GitHub-Pull-Request: golang/go#78634
Reviewed-on: https://go-review.googlesource.com/c/go/+/765580
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
|
|
Change-Id: I27696b1a5fa0593d9f36743efa3559a36d23ec4b
Reviewed-on: https://go-review.googlesource.com/c/go/+/760844
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
- fix a bug where it wouldn't recognize 1<<63 as a power of two
- remove the IsSigned check; there is no such thing as a signed Mul
If the rule works for signed numbers it works for unsigned ones too.
Even if the intermediate steps make no sense, it ends up wrapping
the right way around in the end.
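Both points can be illustrated with a small sketch. isPowerOfTwo, mulSigned and mulUnsigned are hypothetical helpers written for this example, not the functions used by the compiler.

```go
package main

import "fmt"

// isPowerOfTwo reports whether the bit pattern of c has exactly one bit set.
// Testing the unsigned pattern means 1<<63 (the minimum int64 when read as a
// signed value) is recognized as well.
func isPowerOfTwo(c int64) bool {
	u := uint64(c)
	return u != 0 && u&(u-1) == 0
}

// mulSigned and mulUnsigned multiply the same bit patterns. In two's
// complement the low 64 bits of the product are identical either way.
func mulSigned(x, c int64) int64     { return x * c }
func mulUnsigned(x, c uint64) uint64 { return x * c }

func main() {
	const top = int64(-1 << 63) // bit pattern 1<<63
	fmt.Println("1<<63 is a power of two:", isPowerOfTwo(top))

	x := int64(-12345)
	same := uint64(mulSigned(x, 8)) == mulUnsigned(uint64(x), 8)
	fmt.Println("signed and unsigned products agree:", same)
}
```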
Change-Id: I86182762aec5eff784e2d9bc49ee028825fb9ea0
Reviewed-on: https://go-review.googlesource.com/c/go/+/760843
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
|
|
Change-Id: Ic97f661c68180ff7adb9976fcc61279e1e1f04a4
Reviewed-on: https://go-review.googlesource.com/c/go/+/760842
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
On amd64, along with:
if b { x += 1 } => x += b
We can also implement constants 2, 4, and 8:
if b { x += 2 } => x += b * 2
This compiles to a displacement LEA.
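A source-level sketch of the equivalence, assuming a hypothetical b2i helper that stands in for the compiler's internal bool-to-integer conversion:

```go
package main

import "fmt"

// b2i converts a bool to 0 or 1. It is a stand-in for illustration only.
func b2i(b bool) int64 {
	if b {
		return 1
	}
	return 0
}

// condAddBranch is the form written in source code.
func condAddBranch(x int64, b bool) int64 {
	if b {
		x += 2
	}
	return x
}

// condAddBranchless is the arithmetic form the rule rewrites to.
func condAddBranchless(x int64, b bool) int64 {
	return x + b2i(b)*2
}

func main() {
	for _, b := range []bool{false, true} {
		fmt.Println(b, condAddBranch(10, b), condAddBranchless(10, b))
	}
}
```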
Change-Id: Ib00fcc5059acb0ebb346e056c4a656f164cc63df
Reviewed-on: https://go-review.googlesource.com/c/go/+/760841
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Logical ops on uint8/uint16 (AND/OR/XOR) with constants sometimes
materialized the mask via MOVD (often as a negative immediate), even
when the value fit in the UI-immediate range. This prevented the backend
from selecting andi. / ori / xori forms.
With this CL, UI-immediate truncation is performed only at the use site of
logical-immediate ops, and only when the constant does not fit in the
8- or 16-bit unsigned domain (m != uint8(m) / m != uint16(m)).
This avoids negative-mask materialization and enables correct emission of
UI-form logical instructions. Arithmetic SI-immediate instructions (addi,
subfic, etc.) and other use patterns are unchanged.
Codegen tests are added to ensure the expected andi./ori/xori
patterns appear and that MOVD is not emitted for valid 8/16-bit masks.
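The fit check can be sketched in plain Go. This is one reading of the condition quoted above, not the rule text from the CL; fitsUint8, fitsUint16 and uiMask8 are hypothetical names, and the assumption is that a uint8 constant such as 0xF0 reaches the backend as a sign-extended (negative) 64-bit value.

```go
package main

import "fmt"

// fitsUint8 reports whether m already lies in the 8-bit unsigned domain,
// that is, the negation of the m != uint8(m) condition.
func fitsUint8(m int64) bool { return m == int64(uint8(m)) }

// fitsUint16 is the 16-bit counterpart.
func fitsUint16(m int64) bool { return m == int64(uint16(m)) }

// uiMask8 returns the mask to use as an unsigned immediate for an 8-bit
// logical op, truncating only when the constant does not already fit.
func uiMask8(m int64) int64 {
	if fitsUint8(m) {
		return m
	}
	return int64(uint8(m))
}

func main() {
	m := int64(int8(-16)) // the uint8 constant 0xF0, sign-extended
	x := uint8(0xAB)
	fmt.Println("fits:", fitsUint8(m), "UI mask:", uiMask8(m))
	// For an 8-bit operand the low 8 bits are all that matter, so both
	// masks give the same result.
	fmt.Println("same result:", x&uint8(m) == x&uint8(uiMask8(m)))
}
```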
Change-Id: I9fcdf4498c4e984c7587814fb9019a75865c4a0d
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/704015
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Paul Murphy <paumurph@redhat.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
|
|
If the bool comes from a local operation this is foldable into the comparison.
if a == b {
} else {
x++
}
becomes:
x += !(a == b)
becomes:
x += a != b
If the bool is passed in or loaded rather than being locally computed
this adds an extra XOR ^1 to invert it.
At worst this should cost the same as compute + CMP + CMOV,
which is a tie on modern CPUs that can execute CMOV on all integer ALUs
and a win on cheaper or older ones that can't.
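Both cases can be written out at the source level. The sketch below assumes a hypothetical b2i helper for the bool-to-integer conversion; the function names are for illustration only.

```go
package main

import "fmt"

// b2i converts a bool to 0 or 1. It is a stand-in for illustration only.
func b2i(b bool) int {
	if b {
		return 1
	}
	return 0
}

// incUnlessEqual is the source form: the increment sits in the else branch.
func incUnlessEqual(x, a, b int) int {
	if a == b {
	} else {
		x++
	}
	return x
}

// incUnlessEqualFolded folds the inversion into the comparison itself.
func incUnlessEqualFolded(x, a, b int) int {
	return x + b2i(a != b)
}

// incUnlessFlag covers a bool that is passed in: the inversion costs an
// extra XOR with 1.
func incUnlessFlag(x int, flag bool) int {
	return x + (b2i(flag) ^ 1)
}

func main() {
	fmt.Println(incUnlessEqual(5, 1, 1), incUnlessEqualFolded(5, 1, 1))
	fmt.Println(incUnlessEqual(5, 1, 2), incUnlessEqualFolded(5, 1, 2))
	fmt.Println(incUnlessFlag(5, true), incUnlessFlag(5, false))
}
```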
Change-Id: Idd2566c7a3826ec432ebfbba7b3898aa0db4b812
Reviewed-on: https://go-review.googlesource.com/c/go/+/760922
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
|
|
Similar to CL 685676 but for XOR.
Change-Id: Ib5ffd4c13348f176a808b3218fdbbafc2c42794f
Reviewed-on: https://go-review.googlesource.com/c/go/+/760921
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Similar to CL 685676 but for OR.
Change-Id: I0ddfd457ed9e8888462306138a251ac48ad42084
Reviewed-on: https://go-review.googlesource.com/c/go/+/760920
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
|
|
Change-Id: I9596dbca8991c93c7543d10dc1b155056dfa7db3
Reviewed-on: https://go-review.googlesource.com/c/go/+/759500
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Change-Id: Ib37b35dfff6236c59c0242c3b7d979c95aefbb8b
Reviewed-on: https://go-review.googlesource.com/c/go/+/750321
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
|
|
Change-Id: I4dff3ba1462848f408257cbadedf202e62d1ea69
Reviewed-on: https://go-review.googlesource.com/c/go/+/750320
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
On ppc64/ppc64le, rewrite (x + x) << c to x << (c+1) for constant shifts.
This removes an ADD, shortens the dependency chain, and reduces code size.
Add rules for both 64-bit (SLDconst) and 32-bit (SLWconst), and extend
test/codegen/shift.go with ppc64x checks to assert a single SLD/SLW and
forbid ADD. Aligns ppc64 with other architectures that already assert
similar codegen in shift.go.
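The identity behind the rewrite can be checked directly in Go, including when x + x wraps. The function names below are hypothetical and only mirror the two sides of the rule.

```go
package main

import "fmt"

// addThenShift64 is the original shape: (x + x) << c.
func addThenShift64(x uint64, c uint) uint64 { return (x + x) << c }

// singleShift64 is the rewritten shape: x << (c+1).
func singleShift64(x uint64, c uint) uint64 { return x << (c + 1) }

// The 32-bit variants correspond to the SLWconst case.
func addThenShift32(x uint32, c uint) uint32 { return (x + x) << c }
func singleShift32(x uint32, c uint) uint32  { return x << (c + 1) }

func main() {
	x := uint64(1)<<63 + 5 // x + x wraps around here
	ok := true
	for c := uint(0); c < 64; c++ {
		if addThenShift64(x, c) != singleShift64(x, c) {
			ok = false
		}
	}
	fmt.Println("64-bit forms agree for all shift counts:", ok)
	fmt.Println(addThenShift32(3, 2), singleShift32(3, 2))
}
```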
Change-Id: Ie564afbb029a5bd48887b82b0c455ca1dddd5508
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/712000
Reviewed-by: Archana Ravindar <aravinda@redhat.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
clears"
This reverts CL 750480.
Reason: Adding preemptible memclrNoHeapPointers exposes existing unsafe
use of notInHeapSlice, causing crashes. Revert the memclr stack until
the underlying issue is fixed.
We keep the test added in CL 755942, which is useful regardless.
For #78254.
Change-Id: I8be3f9a20292b7f294e98e74e5a86c6a204406ae
Reviewed-on: https://go-review.googlesource.com/c/go/+/757343
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Add rules to eliminate sign-extension of values that have already
been zero-extended from fewer bits via an I64And mask:
(I64Extend32S x:(I64And _ (I64Const [c]))) && c >= 0 && int64(int32(c)) == c => x
(I64Extend16S x:(I64And _ (I64Const [c]))) && c >= 0 && int64(int16(c)) == c => x
(I64Extend8S x:(I64And _ (I64Const [c]))) && c >= 0 && int64(int8(c)) == c => x
When a value has been masked to fit within the non-negative range of
the sign-extension width, the upper bits are already zero and sign-
extending is a no-op. For example, (I64Extend32S (I64And x 0xff))
can be elided because 0xff fits in a signed int32, so bit 31 is
guaranteed to be zero and sign-extending from 32 bits is identity.
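The guard can be modelled in Go as follows. signExtend and guard are hypothetical helpers that mimic the I64ExtendNS ops and the rule conditions; they are not compiler code.

```go
package main

import "fmt"

// signExtend sign-extends the low `bits` bits of x to 64 bits, like
// I64Extend8S, I64Extend16S and I64Extend32S for bits = 8, 16, 32.
func signExtend(x int64, bits uint) int64 {
	shift := 64 - bits
	return x << shift >> shift
}

// guard mirrors the rule condition: c >= 0 and c survives a round trip
// through the narrow signed type.
func guard(c int64, bits uint) bool {
	return c >= 0 && signExtend(c, bits) == c
}

func main() {
	samples := []int64{0, 1, -1, 0x12345678, -0x12345678, 1 << 40}
	masks := []int64{0x7f, 0xff, 0x7fff, 0x7fffffff, 0x80000000}
	for _, c := range masks {
		identity := true
		for _, x := range samples {
			if signExtend(x&c, 32) != x&c {
				identity = false
			}
		}
		fmt.Printf("mask %#x: guard=%v identity=%v\n", c, guard(c, 32), identity)
	}
}
```

Whenever the guard holds, the masked value is non-negative and below the sign bit of the extension width, so the extension returns its input unchanged.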
Cq-Include-Trybots: luci.golang.try:gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero
Change-Id: Ia54d67358756e47ca7635a6a8ca4beadb003820a
Reviewed-on: https://go-review.googlesource.com/c/go/+/756320
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
Absorb unnecessary conversion between float32 and float64
if both src and dst are 32 bit.
Ref: CL 733621
Updates #75463
Change-Id: I439f92aa3d940fa4979e76845c0893e43bf584af
Reviewed-on: https://go-review.googlesource.com/c/go/+/739520
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
|
|
R30 is a callee-saved register; using it requires saving and then restoring it.
Therefore, we replace it with a caller-saved register.
R4~R19 are argument registers on loong64, and R20 is the only remaining usable
caller-saved register. To use R20 in the trampoline, we changed the registers used
by the LoweredMove/LoweredMoveLoop operations (originally R20 and R21,
now R23 and R24).
Change-Id: Ie7bba0caa30a764a45bcb47635c35c829036c5a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/726140
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Update Move rewrite rules to use FMOVQload/store and FLDPQ/FSTPQ
for medium-sized copies (16-64 bytes). This generates fewer and
wider instructions than the previous approach using LDP/STP pairs.
Executable   Base .text   go1        Change
--------------------------------------------
asm          2112308      2105732    -0.31%
cgo          1826132      1823172    -0.16%
compile      10474868     10460644   -0.14%
cover        1990036      1985748    -0.22%
fix          3234116      3226340    -0.24%
link         2702628      2695316    -0.27%
preprofile   947652       947028     -0.07%
vet          3140964      3133524    -0.24%
Performance effect on OrangePi 6 plus:
│ orig.out │ movq.out │
│ sec/op │ sec/op vs base │
CopyFat16 0.4711n ± 0% 0.3852n ± 0% -18.23% (p=0.000 n=10)
CopyFat17 0.7705n ± 0% 0.7705n ± 0% ~ (p=0.984 n=10)
CopyFat18 0.7703n ± 0% 0.7703n ± 0% ~ (p=0.771 n=10)
CopyFat19 0.7703n ± 0% 0.7703n ± 0% ~ (p=0.637 n=10)
CopyFat20 0.7703n ± 0% 0.7704n ± 0% ~ (p=0.103 n=10)
CopyFat21 0.7703n ± 0% 0.7708n ± 0% ~ (p=0.505 n=10)
CopyFat22 0.7704n ± 0% 0.7705n ± 0% ~ (p=0.589 n=10)
CopyFat23 0.7703n ± 0% 0.7703n ± 0% ~ (p=0.347 n=10)
CopyFat24 0.7704n ± 0% 0.7703n ± 0% ~ (p=0.383 n=10)
CopyFat25 0.8385n ± 0% 0.6589n ± 0% -21.41% (p=0.000 n=10)
CopyFat26 0.8386n ± 0% 0.6590n ± 0% -21.42% (p=0.000 n=10)
CopyFat27 0.8385n ± 0% 0.6590n ± 0% -21.41% (p=0.000 n=10)
CopyFat28 0.8386n ± 0% 0.6571n ± 0% -21.65% (p=0.000 n=10)
CopyFat29 0.8385n ± 0% 0.6590n ± 0% -21.41% (p=0.000 n=10)
CopyFat30 0.8387n ± 0% 0.6591n ± 0% -21.42% (p=0.000 n=10)
CopyFat31 0.8385n ± 0% 0.6589n ± 0% -21.42% (p=0.000 n=10)
CopyFat32 0.8318n ± 0% 0.4969n ± 0% -40.26% (p=0.000 n=10)
CopyFat33 1.1550n ± 0% 0.7705n ± 0% -33.29% (p=0.000 n=10)
CopyFat34 1.1560n ± 0% 0.7703n ± 0% -33.37% (p=0.000 n=10)
CopyFat35 1.1550n ± 0% 0.7705n ± 0% -33.29% (p=0.000 n=10)
CopyFat36 1.1550n ± 0% 0.7704n ± 0% -33.30% (p=0.000 n=10)
CopyFat37 1.1555n ± 0% 0.7704n ± 0% -33.33% (p=0.000 n=10)
CopyFat38 1.1550n ± 0% 0.7704n ± 0% -33.30% (p=0.000 n=10)
CopyFat39 1.1560n ± 0% 0.7703n ± 0% -33.36% (p=0.000 n=10)
CopyFat40 1.0020n ± 0% 0.7705n ± 0% -23.10% (p=0.000 n=10)
CopyFat41 1.2060n ± 0% 0.7703n ± 0% -36.12% (p=0.000 n=10)
CopyFat42 1.2060n ± 0% 0.7704n ± 0% -36.12% (p=0.000 n=10)
CopyFat43 1.2060n ± 0% 0.7705n ± 0% -36.11% (p=0.000 n=10)
CopyFat44 1.2060n ± 0% 0.7704n ± 0% -36.12% (p=0.000 n=10)
CopyFat45 1.2060n ± 0% 0.7704n ± 0% -36.12% (p=0.000 n=10)
CopyFat46 1.2060n ± 0% 0.7703n ± 0% -36.13% (p=0.000 n=10)
CopyFat47 1.2060n ± 0% 0.7703n ± 0% -36.12% (p=0.000 n=10)
CopyFat48 1.2060n ± 0% 0.7703n ± 0% -36.13% (p=0.000 n=10)
CopyFat49 1.3620n ± 0% 0.8622n ± 0% -36.70% (p=0.000 n=10)
CopyFat50 1.3620n ± 0% 0.8621n ± 0% -36.70% (p=0.000 n=10)
CopyFat51 1.3620n ± 0% 0.8622n ± 0% -36.70% (p=0.000 n=10)
CopyFat52 1.3620n ± 0% 0.8623n ± 0% -36.69% (p=0.000 n=10)
CopyFat53 1.3620n ± 0% 0.8621n ± 0% -36.70% (p=0.000 n=10)
CopyFat54 1.3620n ± 0% 0.8622n ± 0% -36.70% (p=0.000 n=10)
CopyFat55 1.3620n ± 0% 0.8620n ± 0% -36.71% (p=0.000 n=10)
CopyFat56 1.3120n ± 0% 0.8622n ± 0% -34.28% (p=0.000 n=10)
CopyFat57 1.5905n ± 0% 0.8621n ± 0% -45.80% (p=0.000 n=10)
CopyFat58 1.5830n ± 1% 0.8622n ± 0% -45.53% (p=0.000 n=10)
CopyFat59 1.5865n ± 1% 0.8621n ± 0% -45.66% (p=0.000 n=10)
CopyFat60 1.5720n ± 1% 0.8622n ± 0% -45.15% (p=0.000 n=10)
CopyFat61 1.5900n ± 1% 0.8621n ± 0% -45.78% (p=0.000 n=10)
CopyFat62 1.5890n ± 0% 0.8622n ± 0% -45.74% (p=0.000 n=10)
CopyFat63 1.5900n ± 1% 0.8620n ± 0% -45.78% (p=0.000 n=10)
CopyFat64 1.5440n ± 0% 0.8568n ± 0% -44.51% (p=0.000 n=10)
geomean 1.093n 0.7636n -30.13%
Kunpeng 920C:
goos: linux
goarch: arm64
pkg: runtime
│ orig.out │ movq.out │
│ sec/op │ sec/op vs base │
CopyFat16 0.4892n ± 1% 0.5072n ± 0% +3.68% (p=0.000 n=10)
CopyFat17 0.6394n ± 0% 0.4638n ± 0% -27.47% (p=0.000 n=10)
CopyFat18 0.6394n ± 0% 0.4638n ± 0% -27.46% (p=0.000 n=10)
CopyFat19 0.6395n ± 0% 0.4638n ± 0% -27.48% (p=0.000 n=10)
CopyFat20 0.6393n ± 0% 0.4638n ± 0% -27.45% (p=0.000 n=10)
CopyFat21 0.6394n ± 0% 0.4637n ± 0% -27.48% (p=0.000 n=10)
CopyFat22 0.6395n ± 0% 0.4638n ± 0% -27.47% (p=0.000 n=10)
CopyFat23 0.6395n ± 0% 0.4638n ± 0% -27.47% (p=0.000 n=10)
CopyFat24 0.6091n ± 0% 0.4639n ± 0% -23.84% (p=0.000 n=10)
CopyFat25 0.9109n ± 0% 0.4674n ± 0% -48.69% (p=0.000 n=10)
CopyFat26 0.9107n ± 0% 0.4674n ± 0% -48.68% (p=0.000 n=10)
CopyFat27 0.9108n ± 0% 0.4674n ± 0% -48.69% (p=0.000 n=10)
CopyFat28 0.9109n ± 0% 0.4674n ± 0% -48.69% (p=0.000 n=10)
CopyFat29 0.9110n ± 0% 0.4673n ± 0% -48.70% (p=0.000 n=10)
CopyFat30 0.9109n ± 0% 0.4673n ± 0% -48.70% (p=0.000 n=10)
CopyFat31 0.9110n ± 0% 0.4674n ± 0% -48.69% (p=0.000 n=10)
CopyFat32 0.6845n ± 0% 0.4845n ± 1% -29.21% (p=0.000 n=10)
CopyFat33 0.9130n ± 0% 0.9117n ± 0% -0.14% (p=0.000 n=10)
CopyFat34 0.9131n ± 0% 0.9118n ± 0% -0.14% (p=0.001 n=10)
CopyFat35 0.9131n ± 0% 0.9117n ± 0% -0.15% (p=0.001 n=10)
CopyFat36 0.9129n ± 0% 0.9117n ± 0% -0.14% (p=0.003 n=10)
CopyFat37 0.9129n ± 0% 0.9117n ± 0% -0.14% (p=0.000 n=10)
CopyFat38 0.9130n ± 0% 0.9118n ± 0% -0.14% (p=0.000 n=10)
CopyFat39 0.9131n ± 0% 0.9118n ± 0% -0.15% (p=0.000 n=10)
CopyFat40 0.9112n ± 0% 0.9118n ± 0% +0.07% (p=0.027 n=10)
CopyFat41 1.1390n ± 0% 0.9118n ± 0% -19.95% (p=0.000 n=10)
CopyFat42 1.1390n ± 0% 0.9118n ± 0% -19.95% (p=0.000 n=10)
CopyFat43 1.1390n ± 0% 0.9116n ± 0% -19.96% (p=0.000 n=10)
CopyFat44 1.1390n ± 0% 0.9119n ± 0% -19.94% (p=0.000 n=10)
CopyFat45 1.1390n ± 0% 0.9118n ± 0% -19.95% (p=0.000 n=10)
CopyFat46 1.1390n ± 0% 0.9118n ± 0% -19.95% (p=0.000 n=10)
CopyFat47 1.1390n ± 0% 0.9117n ± 0% -19.96% (p=0.000 n=10)
CopyFat48 0.9111n ± 0% 0.9116n ± 0% +0.06% (p=0.002 n=10)
CopyFat49 1.2160n ± 0% 0.9292n ± 0% -23.59% (p=0.000 n=10)
CopyFat50 1.2160n ± 0% 0.9302n ± 0% -23.50% (p=0.000 n=10)
CopyFat51 1.2160n ± 0% 0.9292n ± 0% -23.59% (p=0.000 n=10)
CopyFat52 1.2160n ± 0% 0.9302n ± 0% -23.50% (p=0.000 n=10)
CopyFat53 1.2160n ± 0% 0.9293n ± 0% -23.58% (p=0.000 n=10)
CopyFat54 1.2160n ± 0% 0.9302n ± 0% -23.50% (p=0.000 n=10)
CopyFat55 1.2160n ± 0% 0.9292n ± 0% -23.59% (p=0.000 n=10)
CopyFat56 1.1480n ± 0% 0.9303n ± 0% -18.96% (p=0.000 n=10)
CopyFat57 1.3690n ± 0% 0.9293n ± 0% -32.12% (p=0.000 n=10)
CopyFat58 1.3690n ± 0% 0.9303n ± 0% -32.05% (p=0.000 n=10)
CopyFat59 1.3690n ± 0% 0.9293n ± 0% -32.12% (p=0.000 n=10)
CopyFat60 1.3690n ± 0% 0.9303n ± 0% -32.05% (p=0.000 n=10)
CopyFat61 1.3690n ± 0% 0.9293n ± 0% -32.12% (p=0.000 n=10)
CopyFat62 1.3690n ± 0% 0.9303n ± 0% -32.05% (p=0.000 n=10)
CopyFat63 1.3690n ± 0% 0.9293n ± 0% -32.12% (p=0.000 n=10)
CopyFat64 1.1470n ± 0% 0.5742n ± 0% -49.94% (p=0.000 n=10)
geomean 0.9710n 0.7214n -25.70%
Change-Id: Iecfe52fde1d431a1e4503cd848813a67f3896512
Reviewed-on: https://go-review.googlesource.com/c/go/+/738261
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Add rules to eliminate redundant I64Extend sign-extension operations
in the wasm backend:
Idempotent (applying the same extend twice is redundant):
(I64Extend32S (I64Extend32S x)) => (I64Extend32S x)
(I64Extend16S (I64Extend16S x)) => (I64Extend16S x)
(I64Extend8S (I64Extend8S x)) => (I64Extend8S x)
Narrower-subsumes-wider (a narrower sign-extend already determines
all the bits that a wider one would set):
(I64Extend32S (I64Extend16S x)) => (I64Extend16S x)
(I64Extend32S (I64Extend8S x)) => (I64Extend8S x)
(I64Extend16S (I64Extend8S x)) => (I64Extend8S x)
These patterns arise from nested sub-word type conversions. For
example, converting int8 -> int16 -> int32 -> int64 lowers to
I64Extend8S -> I64Extend16S -> I64Extend32S, but the I64Extend8S
alone is sufficient since it already sign-extends from 8 to 64 bits.
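Both families of rules can be checked with Go conversions standing in for the wasm ops. ext8, ext16 and ext32 are hypothetical models of I64Extend8S, I64Extend16S and I64Extend32S.

```go
package main

import "fmt"

// Each helper sign-extends the low N bits of x to 64 bits.
func ext8(x int64) int64  { return int64(int8(x)) }
func ext16(x int64) int64 { return int64(int16(x)) }
func ext32(x int64) int64 { return int64(int32(x)) }

// rulesHold checks the idempotent and narrower-subsumes-wider identities
// for one input value.
func rulesHold(x int64) bool {
	return ext32(ext32(x)) == ext32(x) &&
		ext16(ext16(x)) == ext16(x) &&
		ext8(ext8(x)) == ext8(x) &&
		ext32(ext16(x)) == ext16(x) &&
		ext32(ext8(x)) == ext8(x) &&
		ext16(ext8(x)) == ext8(x)
}

func main() {
	for _, x := range []int64{0, 0x7f, 0x80, 0xff, 0x8000, 0x80000000, -1} {
		fmt.Printf("%#x: rules hold = %v\n", x, rulesHold(x))
	}
}
```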
Cq-Include-Trybots: luci.golang.try:gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero
Change-Id: I1637687df31893b1ffa36915a3bd2e10d455f4ef
Reviewed-on: https://go-review.googlesource.com/c/go/+/754040
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Absorb unnecessary conversion between float32 and float64
if both src and dst are 32 bit.
Updates #75463
Change-Id: Ia71941223b5cca3fea66b559da7b8f916e63feaf
Reviewed-on: https://go-review.googlesource.com/c/go/+/733621
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Julian Zhu <jz531210@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add a rule to collapse cascaded I64And operations with constant masks
into a single mask:
(I64And (I64And x (I64Const [c1])) (I64Const [c2])) =>
(I64And x (I64Const [c1 & c2]))
This pattern arises from sub-word comparisons. For example,
(Eq32 x y) lowers to (I64Eq (ZeroExt32to64 x) (ZeroExt32to64 y)),
which becomes (I64Eq (I64And x 0xffffffff) (I64And y 0xffffffff)).
If x or y is the result of another sub-word operation that already
inserted a mask, the masks cascade and this rule collapses them.
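The rule relies on AND being associative, which a short sketch makes concrete. The function names are hypothetical.

```go
package main

import "fmt"

// cascaded applies two constant masks one after the other.
func cascaded(x, c1, c2 int64) int64 { return (x & c1) & c2 }

// collapsed applies the single combined mask c1 & c2.
func collapsed(x, c1, c2 int64) int64 { return x & (c1 & c2) }

func main() {
	x := int64(-0x123456789)
	// A 16-bit mask followed by a 32-bit mask, as nested sub-word
	// operations might produce.
	fmt.Printf("%#x %#x\n", cascaded(x, 0xffff, 0xffffffff), collapsed(x, 0xffff, 0xffffffff))
}
```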
Cq-Include-Trybots: luci.golang.try:gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero
Change-Id: Id7856b391be3ac20f1bc9eee40995b52c0754aed
Reviewed-on: https://go-review.googlesource.com/c/go/+/753620
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
It should be 2, not 1.
Fixes #78013
Change-Id: If1c48c84c324a3fd50e9f4b43cca2ea62a995dc5
Reviewed-on: https://go-review.googlesource.com/c/go/+/752740
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
|
|
Fixes #77720
Add a generic SSA rewrite that forwards `Load` from a `Move` destination
back to the `Move` source when it is provably safe, so field reads like
`s.h.Value().typ` don’t force a full struct copy.
- Add `Load <- Move` rewrite in `generic.rules` with safety guard:
non-volatile source
- Tweak `fixedbugs/issue22200*` so that it can still trigger the "stack frame too large" error.
- Regenerate `rewritegeneric.go`.
- Add `test/codegen/moveload.go` to assert no `MOVUPS` and direct `MOVBLZX`
in both direct and inlined forms.
Benchmark results (linux/amd64, i7-14700KF):
$ go test cmd/compile/internal/test -run='^$' -bench='MoveLoad' -count=20
Before:
BenchmarkMoveLoadTypViaValue-20 ~76.9 ns/op
BenchmarkMoveLoadTypViaPtr-20 ~1.97 ns/op
After:
BenchmarkMoveLoadTypViaValue-20 ~1.894 ns/op
BenchmarkMoveLoadTypViaPtr-20 ~1.905 ns/op
The rewrite removes the redundant struct copy in
`s.h.Value().typ`, bringing it in line with the direct pointer form.
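A hypothetical reconstruction of the code shape involved; the types below (big, handle, holder) are invented for illustration and are not the benchmark from the CL. Value returns a large struct by value, and the caller reads a single field from the result.

```go
package main

import "fmt"

// big is large enough that copying it is noticeably costlier than
// loading one byte.
type big struct {
	pad [64]byte
	typ uint8
}

type handle struct{ p *big }

// Value returns the pointed-to struct by value.
func (h handle) Value() big { return *h.p }

type holder struct{ h handle }

// typViaValue reads one field of the by-value result. Without forwarding
// the load to the source of the copy, the whole struct may be copied first.
func typViaValue(s *holder) uint8 { return s.h.Value().typ }

// typViaPtr reads the same field directly through the pointer.
func typViaPtr(s *holder) uint8 { return s.h.p.typ }

func main() {
	s := &holder{h: handle{p: &big{typ: 9}}}
	fmt.Println(typViaValue(s), typViaPtr(s))
}
```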
Change-Id: Iddf2263e390030ba013e0642a695b87c75f899da
Reviewed-on: https://go-review.googlesource.com/c/go/+/748200
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Mark Freeman <markfreeman@google.com>
|
|
Since Go 1.22, loop variables have per-iteration scope, making
the x := x pattern unnecessary for goroutine capture.
No issue required for this trivial cleanup.
Change-Id: I00d98522537fc2b9a6b4d598c8aa21b447628d41
Reviewed-on: https://go-review.googlesource.com/c/go/+/753400
Auto-Submit: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
|
|
Add post-lowering identity and absorption rules for I64And, I64Or,
I64Xor, and I64Mul with constant operands:
(I64And x (I64Const [-1])) => x
(I64And x (I64Const [0])) => (I64Const [0])
(I64Or x (I64Const [0])) => x
(I64Or x (I64Const [-1])) => (I64Const [-1])
(I64Xor x (I64Const [0])) => x
(I64Mul x (I64Const [0])) => (I64Const [0])
(I64Mul x (I64Const [1])) => x
The generic SSA rules handle these patterns before lowering, but
these rules catch cases where wasm-specific lowering or other
post-lowering optimization passes produce new nodes with identity
or absorbing constant operands.
For example, the complement rule lowers Com64(x) to
(I64Xor x (I64Const [-1])), and if x is later determined to be
all-ones, the I64And absorption rule can fold the result to zero.
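The seven identities can be confirmed with ordinary int64 arithmetic. identitiesHold is a hypothetical helper; each line corresponds to one rule listed above.

```go
package main

import "fmt"

// identitiesHold checks every identity and absorption rule for one value.
func identitiesHold(x int64) bool {
	return x&-1 == x && // (I64And x -1) => x
		x&0 == 0 && // (I64And x 0) => 0
		x|0 == x && // (I64Or x 0) => x
		x|-1 == -1 && // (I64Or x -1) => -1
		x^0 == x && // (I64Xor x 0) => x
		x*0 == 0 && // (I64Mul x 0) => 0
		x*1 == x // (I64Mul x 1) => x
}

func main() {
	for _, x := range []int64{0, 1, -1, 0x7fffffffffffffff, -0x8000000000000000} {
		fmt.Println(x, identitiesHold(x))
	}
}
```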
Cq-Include-Trybots: luci.golang.try:gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero
Change-Id: Ie9a40e075662d4828a70e30b258d92ee171d0bc2
Reviewed-on: https://go-review.googlesource.com/c/go/+/752861
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Add OpARM64FMOVQload, OpARM64FMOVQstore, OpARM64FLDPQ, and
OpARM64FSTPQ for loading and storing Vec128 values.
Includes offset folding and address combining rules.
These ops will be used by subsequent CLs.
Change-Id: I4ac86a0a31f878411f49d390cb8df01f81cfc4d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/738260
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
Negation on a condition can be eliminated.
Change-Id: I94fab5f019cbaebb2ca589e1d8796a9cb72f3894
Reviewed-on: https://go-review.googlesource.com/c/go/+/748401
Reviewed-by: Xueqi Luo <1824368278@qq.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Julian Zhu <jz531210@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
Replace one of the boolean simplification rules with two new rules
in order to cover more cases.
This is a rebase of CL 42516 which slipped through the cracks.
Change-Id: I6da4cf30e5156174e8eac6bc2f0e2cebe95e555c
Reviewed-on: https://go-review.googlesource.com/c/go/+/750520
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
|
|
Change-Id: I0e0a5919536b643477a6f9278fcc60492ea5a759
Reviewed-on: https://go-review.googlesource.com/c/go/+/750540
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Add the missing I64Sub constant folding rule to the wasm backend.
Every other wasm arithmetic operation (I64Add, I64Mul, I64And, I64Or,
I64Xor, I64Shl, I64ShrU, I64ShrS) already had a post-lowering
constant folding rule, but I64Sub was missing.
While the generic SSA pass folds Sub64(Const64, Const64) before
lowering, this rule ensures consistency and handles any edge cases
where later wasm-specific passes produce I64Sub with two constant
operands.
Cq-Include-Trybots: luci.golang.try:gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero
Change-Id: Ie8bc044dd300dcc6d077feec34f9a65f4a310b13
Reviewed-on: https://go-review.googlesource.com/c/go/+/751441
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
Commit-Queue: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
type I interface {
foo()
}
type S struct {
I
}
Because I is embedded in S, S needs a foo method. We generate a
wrapper function to implement (*S).foo. It just loads the embedded
field I out of S and calls foo on it.
When the thing in S.I itself needs a wrapper, then we have a wrapper
calling another wrapper. This can continue, leaving a potentially long
sequence of wrappers on the stack. When we then call runtime.Callers
or friends, we have to walk an unbounded number of frames to find a
bounded number of non-wrapper frames.
This really happens, for instance with I = context.Context, S =
context.ValueCtx, and runtime.Callers = pprof sample (for any of
context.Context's methods).
To fix, make the interface call in the wrapper a tail call.
That way, the number of wrapper frames on the stack does not
increase when there are lots of wrappers happening.
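A sketch of how a long chain of wrappers arises, reusing the I and S types from above. Two things are changed for illustration: foo returns an int so the call has an observable result, and leaf and nest are hypothetical additions.

```go
package main

import "fmt"

type I interface {
	foo() int
}

// S embeds I, so the compiler generates a wrapper method (*S).foo that
// loads the embedded field and calls foo on it.
type S struct {
	I
}

// leaf is a concrete implementation at the bottom of the chain.
type leaf struct{}

func (leaf) foo() int { return 42 }

// nest wraps base in depth layers of S. Calling foo on the result goes
// through depth wrappers before it reaches leaf.foo.
func nest(base I, depth int) I {
	v := base
	for i := 0; i < depth; i++ {
		v = &S{v}
	}
	return v
}

func main() {
	fmt.Println(nest(leaf{}, 1000).foo())
}
```

With a non-tail call in each wrapper, all 1000 wrapper frames are on the stack when leaf.foo runs; with a tail call, each wrapper frame is replaced by the next.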
Fixes #75764
Fixes #77781
Change-Id: I03b1731159d9218c7f14f72ecbbac822d6a6bb87
Reviewed-on: https://go-review.googlesource.com/c/go/+/751465
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Large memory clearing operations (via clear() or large slice allocation)
currently use non-preemptible assembly loops. This blocks the Garbage
Collector from performing a Stop The World (STW) event, leading to
significant tail latency or even indefinite hangs in tight loops.
This change introduces memclrNoHeapPointersPreemptible, which chunks
clears into 256KB blocks with preemption checks. The compiler's walk
phase is updated to emit this call for large pointer-free clears.
To prevent regressions, SSA rewrite rules are added to ensure that
constant-size clears (which are common and small) continue to be
inlined into OpZero assembly.
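The chunking idea can be sketched in user-level Go. This is not the runtime implementation: clearChunked is a hypothetical function, and runtime.Gosched is only a stand-in for the runtime's preemption check.

```go
package main

import (
	"fmt"
	"runtime"
)

// chunkSize is the 256KB block size named in the commit message.
const chunkSize = 256 << 10

// clearChunked zeroes b one chunk at a time and returns the number of
// chunks processed. Between chunks it yields, giving the scheduler a
// chance to run other work.
func clearChunked(b []byte) int {
	chunks := 0
	for len(b) > 0 {
		n := len(b)
		if n > chunkSize {
			n = chunkSize
		}
		for i := range b[:n] {
			b[i] = 0
		}
		b = b[n:]
		chunks++
		runtime.Gosched() // stand-in for a preemption check
	}
	return chunks
}

func main() {
	buf := make([]byte, 50<<20) // 50MB, the size used in the STW benchmark
	fmt.Println("chunks:", clearChunked(buf))
}
```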
Benchmarks on darwin/arm64:
- STW with 50MB clear: Improved from 'Hung' to ~500µs max pause.
- Small clears (5-64B): No measurable regression.
- Large clears (1M-64M): No measurable regression.
Fixes #69327
Change-Id: Ide14d6bcdca1f60d6ac95443acb57da9a8822538
Reviewed-on: https://go-review.googlesource.com/c/go/+/750480
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
|
|
Before this CL, simdgen contained a sign check to selectively enable
such rules for deduplication purposes. This left out `VPSRL` as it's
only available in unsigned form. This CL fixes that.
It looks like the previous documentation fix to the SHA instruction might
not have run go generate, so this CL also contains the generated code for
that fix.
There is also a weird phantom import in
cmd/compile/internal/ssa/issue77582_test.go
This CL also fixes that.
The trybot didn't complain?
Change-Id: Ibbf9f789c1a67af1474f0285ab376bc07f17667e
Reviewed-on: https://go-review.googlesource.com/c/go/+/748501
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Saves a few lines. If applicable, we also directly rewrite to 32 bit
MOVLconst, skipping the redundant transformation.
Change-Id: I4c2f5e2bb480e798cbe373de608e19a951d168ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/640215
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Not a fix because there are other architectures
still to be done.
Updates #75463.
Change-Id: Ia5233c2b6c5f4439e269950efdd851e72e8e7ff6
Reviewed-on: https://go-review.googlesource.com/c/go/+/730160
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Not a fix because there are other architectures
still to be done.
Updates #75463.
Change-Id: Ifca03975023e4e5d0ffa98d1f877314a1a291be0
Reviewed-on: https://go-review.googlesource.com/c/go/+/729161
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Not a fix because there are other architectures
still to be done.
Updates #75463.
Change-Id: I3d7754ce4a26af0f5c4ef0be1254d164e68f8442
Reviewed-on: https://go-review.googlesource.com/c/go/+/729160
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Introduce a new zero-arg op, Empty, which builds a zero-sized value.
This is like ArrayMake0 but can make more general zero-sized values,
like those of type [2][0]int.
Needed for the subsequent CL.
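
To illustrate what "more general zero-sized values" means at the language level, here is a minimal sketch. It only demonstrates that a type such as [2][0]int occupies no storage while still having two elements; the names `nested` and `sizes` are made up for this example and are not part of the compiler.

```go
package main

import (
	"fmt"
	"unsafe"
)

// nested is zero-sized, but it is not a zero-length array:
// it has two elements, each of which is an empty array.
type nested [2][0]int

// sizes reports the size in bytes of three zero-sized types.
func sizes() (arr0, nestedArr, emptyStruct uintptr) {
	return unsafe.Sizeof([0]int{}), unsafe.Sizeof(nested{}), unsafe.Sizeof(struct{}{})
}

func main() {
	a, n, s := sizes()
	fmt.Println("sizes:", a, n, s, "len(nested):", len(nested{}))
}
```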
Update #77635
Change-Id: If928e9677be5d40a4e2d7501dada66e062319711
Reviewed-on: https://go-review.googlesource.com/c/go/+/747761
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Ensures that deeply nested structs that have the underlying shape
of a pointer get unwrapped properly.
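
As a rough illustration of a "pointer-shaped" nested struct, the sketch below wraps a single pointer in several layers of structs and one-element arrays. The type names are hypothetical; the example only shows that the wrapper has the same size as a bare pointer, not how the compiler unwraps it.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Each layer adds no storage: the only data is the innermost pointer.
type inner struct{ p *int }
type middle struct{ in inner }
type outer struct{ m [1]middle }

// unwrap digs the pointer back out of all the wrapper layers.
func unwrap(o outer) *int { return o.m[0].in.p }

// pointerShaped reports whether outer is exactly as large as one pointer.
func pointerShaped() bool {
	return unsafe.Sizeof(outer{}) == unsafe.Sizeof((*int)(nil))
}

func main() {
	x := 42
	o := outer{m: [1]middle{{in: inner{p: &x}}}}
	fmt.Println(pointerShaped(), *unwrap(o))
}
```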
Update #77534
Change-Id: I004f424d2c62ec7026281daded9b3d96c021e2e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/747760
Reviewed-by: Mark Freeman <markfreeman@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Zicond is a mandatory extension in rva23u64. This patch converts
certain branches to CondSelect and optimizes them to Zicond
instructions on RISC-V in appropriate cases, along with additional
optimization rules.
Zicond can provide performance benefits on unpredictable branches by
avoiding branch misprediction penalties. However, on simple predictable
branches, zicond uses 4 instructions vs 2 for traditional branches,
which can cause performance regressions.
To avoid regressions, we keep CondSelect globally disabled for riscv64
and only enable it for the ConstantTimeSelect intrinsic,
which has been shown to benefit from zicond:
goos: linux
goarch: riscv64
pkg: crypto/subtle
CPU: SG2044
│ nozicond.txt │ zicond.txt │
│ sec/op │ sec/op vs base │
ConstantTimeSelect-44 2.325n ± 4% 1.750n ± 2% -24.69% (p=0.000 n=10)
Future work can explore enabling zicond for other cases that can benefit
from zicond.
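
For context, the kind of branch-free selection that ConstantTimeSelect performs can be sketched in plain Go as below. `maskSelect` is a hypothetical helper written for this example, not the library's implementation; it is only checked against crypto/subtle for v in {0, 1}.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// maskSelect returns x if v == 1 and y if v == 0, without branching.
// The result is unspecified for any other v.
func maskSelect(v, x, y int) int {
	mask := -v // all ones when v == 1, zero when v == 0
	return (x & mask) | (y &^ mask)
}

// agreesWithSubtle compares maskSelect with subtle.ConstantTimeSelect
// on a few sample inputs.
func agreesWithSubtle() bool {
	samples := [][2]int{{1, 2}, {-5, 9}, {0, 0}, {1 << 40, -1}}
	for _, s := range samples {
		for v := 0; v <= 1; v++ {
			if maskSelect(v, s[0], s[1]) != subtle.ConstantTimeSelect(v, s[0], s[1]) {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println(maskSelect(1, 10, 20), maskSelect(0, 10, 20), agreesWithSubtle())
}
```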
Follow-up to CL 631595
Updates #75350
Co-authored-by: wangpengcheng.pp@bytedance.com
mengzhuo1203@gmail.com
Change-Id: If5d9555980e0d1e26fa924974f88943eb86b050b
GitHub-Last-Rev: 7a61508780953295f5507e5f927ab5be1d6afd91
GitHub-Pull-Request: golang/go#75577
Reviewed-on: https://go-review.googlesource.com/c/go/+/705996
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Folding a 64-bit integer into a 32-bit constant may result in a negative
integer if the value exceeds math.MaxInt32 (the maximum value of a
32-bit signed integer). This negative value will be sign-extended to 64
bits at runtime, leading to unexpected results when used in bitwise
AND/OR operations.
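
The arithmetic behind this bug can be reproduced in ordinary Go. The sketch below is not the compiler's code: `foldTo32` and `fitsInt32` are hypothetical helpers that model storing a 64-bit constant in a signed 32-bit immediate and reading it back with sign extension.

```go
package main

import "fmt"

// foldTo32 models keeping only the low 32 bits of c as a signed
// immediate and sign-extending it back to 64 bits when it is used.
func foldTo32(c int64) int64 { return int64(int32(c)) }

// fitsInt32 reports whether c survives that round trip unchanged,
// i.e. whether folding it into a 32-bit constant is safe.
func fitsInt32(c int64) bool { return c == foldTo32(c) }

func main() {
	c := int64(0x80000000) // math.MaxInt32 + 1
	x := int64(0x1_8000_0000)

	folded := foldTo32(c)
	fmt.Printf("c      = %#x\n", uint64(c))
	fmt.Printf("folded = %#x\n", uint64(folded))
	fmt.Printf("x & c      = %#x\n", uint64(x&c))
	fmt.Printf("x & folded = %#x\n", uint64(x&folded))
	fmt.Println("safe to fold:", fitsInt32(c))
}
```

With these inputs the two AND results differ, because the folded constant has all of its upper 32 bits set.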
Fixes #77613
Change-Id: Idb081a3c20c28bddddcc8eff1225d62123b37a2d
Reviewed-on: https://go-review.googlesource.com/c/go/+/745581
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
When creating a dynamically-sized slice, the compiler attempts to use a
stack-allocated buffer if the slice does not escape and its buffer size
is ≤ 32 bytes.
In this case, the SSA will contain a (OpPhi (OpSliceMake) (OpSliceMake))
value: one OpSliceMake uses the stack-allocated buffer, and the other
uses the heap-allocated buffer. The len and cap arguments for these two
OpSliceMake values are expected to be identical.
This CL enables the prove pass to recognize this scenario and handle
OpSliceLen and OpSliceCap as intended.
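
A source-level pattern that can produce this shape is sketched below, under the assumption that the compiler chooses between a stack and a heap buffer for `buf`. Whether it does so, and whether the bounds check is then removed, depends on the Go version; `sumSquares` is a made-up function for illustration.

```go
package main

import "fmt"

// sumSquares builds a slice whose length is only known at run time.
// buf does not escape, so the compiler may back it with a small stack
// buffer when n is small and with a heap allocation otherwise.
func sumSquares(n int) int {
	buf := make([]int, n)
	for i := 0; i < n; i++ {
		// In range on either allocation path, since len(buf) == n.
		buf[i] = i * i
	}
	total := 0
	for _, v := range buf {
		total += v
	}
	return total
}

func main() {
	fmt.Println(sumSquares(4), sumSquares(100))
}
```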
Fixes #77375
Change-Id: Id77a2473caf66d366f5c94108aa6cb6d3df5b887
Reviewed-on: https://go-review.googlesource.com/c/go/+/740840
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
The code previously filtered out VAES-only instructions; this CL adds
them back.
This CL adds the VAES feature check following the Intel xed data:
XED_ISA_SET_VAES: vaes.7.0.ecx.9 # avx.1.0.ecx.28
This CL also found that the old AVX512VAES feature check was not
checking the correct bits, and fixes that too:
XED_ISA_SET_AVX512_VAES_128: vaes.7.0.ecx.9 aes.1.0.ecx.25 avx512f.7.0.ebx.16 avx512vl.7.0.ebx.31
XED_ISA_SET_AVX512_VAES_256: vaes.7.0.ecx.9 aes.1.0.ecx.25 avx512f.7.0.ebx.16 avx512vl.7.0.ebx.31
XED_ISA_SET_AVX512_VAES_512: vaes.7.0.ecx.9 aes.1.0.ecx.25 avx512f.7.0.ebx.16
It restricts to the strictest common set: it includes avx512vl even for
512 bits, although that is not required.
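
The bit positions in the xed data can be read as "feature.leaf.subleaf.register.bit". A minimal sketch of the resulting checks follows. The `cpuidRegs` type and function names are hypothetical, treating the commented-out avx bit as required is an assumption, and real detection also has to confirm OS support for the vector state, which is omitted here.

```go
package main

import "fmt"

// cpuidRegs holds the CPUID outputs referenced by the xed data.
// Hypothetical type for illustration only.
type cpuidRegs struct {
	leaf1ECX uint32 // CPUID leaf 1, ECX
	leaf7EBX uint32 // CPUID leaf 7, subleaf 0, EBX
	leaf7ECX uint32 // CPUID leaf 7, subleaf 0, ECX
}

// bit reports whether bit n of r is set.
func bit(r uint32, n uint) bool { return r&(1<<n) != 0 }

// hasVAES checks vaes.7.0.ecx.9 plus avx.1.0.ecx.28 (assumed required).
func hasVAES(r cpuidRegs) bool {
	return bit(r.leaf7ECX, 9) && bit(r.leaf1ECX, 28)
}

// hasAVX512VAES checks the strictest common set of the three
// AVX512_VAES entries: vaes, aes, avx512f and avx512vl.
func hasAVX512VAES(r cpuidRegs) bool {
	return bit(r.leaf7ECX, 9) && bit(r.leaf1ECX, 25) &&
		bit(r.leaf7EBX, 16) && bit(r.leaf7EBX, 31)
}

func main() {
	r := cpuidRegs{
		leaf1ECX: 1<<25 | 1<<28,
		leaf7EBX: 1 << 16, // avx512f without avx512vl
		leaf7ECX: 1 << 9,
	}
	fmt.Println(hasVAES(r), hasAVX512VAES(r))
}
```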
Change-Id: I4e2f72b312fd2411589fbc12f9ee5c63c09c2e9a
Reviewed-on: https://go-review.googlesource.com/c/go/+/738500
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Ref: #733621
Updates #75463
Change-Id: Idd8821d1713754097a2fe83a050c25d9ec5b17eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/735540
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
In the mips{,64} instruction sets and their extensions, there is no
NORI instruction.
Change-Id: If008442c792297d011b3d0c1e8501e62e32ab175
Reviewed-on: https://go-review.googlesource.com/c/go/+/735900
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
riscv64
On RISC-V the JAL and JALR instructions provide Return Address
Stack (RAS) prediction hints based on the registers used (as per section
2.5.1 of the RISC-V ISA manual). When a JALR instruction uses X1 or X5
as the source register, it hints that a pop should occur.
When making a function call, avoid the use of X5 as a source register
since this results in the RAS performing a pop-then-push instead of a
push, breaking call/return pairing and significantly degrading front-end
branch prediction performance.
Based on test results of golang.org/x/benchmarks/json on SpacemiT K1, the
fixed version shows a performance improvement of about 7%.
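
The hint rules described above can be modelled as a small decision function. This is a sketch based on my reading of the hint table in the ISA manual, with registers given as plain numbers (1 for X1, 5 for X5); `jalrHint` and the hint names are invented for this example.

```go
package main

import "fmt"

// rasHint is the action a JALR encoding suggests to the return address stack.
type rasHint string

const (
	hintNone        rasHint = "none"
	hintPop         rasHint = "pop"
	hintPush        rasHint = "push"
	hintPopThenPush rasHint = "pop, then push"
)

// isLink reports whether register number r is a link register (X1 or X5).
func isLink(r int) bool { return r == 1 || r == 5 }

// jalrHint returns the RAS hint for "JALR rd, rs1".
func jalrHint(rd, rs1 int) rasHint {
	switch {
	case !isLink(rd) && !isLink(rs1):
		return hintNone
	case !isLink(rd):
		return hintPop // rs1 is a link register: looks like a return
	case !isLink(rs1):
		return hintPush // rd is a link register: looks like a call
	case rd != rs1:
		return hintPopThenPush // the case the commit avoids
	default:
		return hintPush
	}
}

func main() {
	fmt.Println("call via X5:", jalrHint(1, 5))
	fmt.Println("call via X6:", jalrHint(1, 6))
}
```

A call that writes the return address to X1 while jumping through X5 lands in the "pop, then push" case, which is why a different source register keeps call/return pairing intact.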
Fixes #76654
Change-Id: I867c8d7cfb54f5decbe176f3ab3bb3d78af1cf64
Reviewed-on: https://go-review.googlesource.com/c/go/+/726760
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Run-TryBot: Joel Sing <joel@sing.id.au>
|
|
Fixes #76449
This saves a single byte for the REX prefix per OpCopy it triggers on.
Change-Id: I1eab364d07354555ba2f23ffd2f9c522d4a04bd0
Reviewed-on: https://go-review.googlesource.com/c/go/+/731640
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Merge the signed and unsigned generic functions.
The only implementation difference between the two is:
n > 0 vs n != 0 check.
For unsigned numbers, n > 0 is equivalent to n != 0, and we in fact optimize
the first to the second.
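
The equivalence is easy to confirm exhaustively for 8-bit values. The sketch below is only a check of the arithmetic claim, not the merged compiler functions; the helper names are made up.

```go
package main

import "fmt"

// unsignedAgree reports whether n > 0 and n != 0 give the same answer
// for every uint8 value.
func unsignedAgree() bool {
	for i := 0; i < 256; i++ {
		n := uint8(i)
		if (n > 0) != (n != 0) {
			return false
		}
	}
	return true
}

// signedMismatches counts the int8 values for which the two checks differ.
// These are exactly the negative values.
func signedMismatches() int {
	count := 0
	for i := -128; i <= 127; i++ {
		n := int8(i)
		if (n > 0) != (n != 0) {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println(unsignedAgree(), signedMismatches())
}
```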
Change-Id: Ia2f6c3e3d4eb098d98f85e06dc2e81baa60bad4e
Reviewed-on: https://go-review.googlesource.com/c/go/+/726720
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This removes 744 instructions from the go toolchain binary on loong64.
file         before    after     Δ      %
asm          599282    599222    -60    -0.0100%
cgo          513606    513534    -72    -0.0140%
compile      2939250   2939146   -104   -0.0035%
cover        564136    564056    -80    -0.0142%
fix          895622    895546    -76    -0.0085%
link         759460    759376    -84    -0.0111%
preprofile   264960    264916    -44    -0.0166%
vet          869964    869888    -76    -0.0087%
go           1712990   1712890   -100   -0.0058%
gofmt        346416    346368    -48    -0.0139%
total        9465686   9464942   -744   -0.0079%
Change-Id: I32dfa7506d0458ca0b6de83b030c330cd2b82176
Reviewed-on: https://go-review.googlesource.com/c/go/+/725720
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
The generic SSA representation uses explicit extension and
truncation operations to change widths of values. The map
intrinsics were playing somewhat fast and loose with this
requirement. Fix that, and add a check to make sure we
don't regress.
I don't think there is a triggerable bug here, but I ran into
this with some prove pass modifications, where
cmd/compile/internal/ssa/prove.go:isCleanExt (and/or its uses)
is actually wrong when this invariant is not maintained.
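
At the source level, the explicit width changes correspond to Go's integer conversions. The sketch below shows why the choice of extension matters; the function names loosely echo the generic ops but are hypothetical helpers, not compiler APIs.

```go
package main

import "fmt"

// zeroExt8to64 widens an unsigned byte: the upper 56 bits become zero.
func zeroExt8to64(x uint8) uint64 { return uint64(x) }

// signExt8to64 widens a signed byte: the upper 56 bits copy the sign bit.
func signExt8to64(x int8) int64 { return int64(x) }

// trunc64to8 keeps only the low 8 bits.
func trunc64to8(x uint64) uint8 { return uint8(x) }

func main() {
	fmt.Printf("%#x\n", zeroExt8to64(0xff))
	fmt.Printf("%#x\n", uint64(signExt8to64(-1)))
	fmt.Printf("%#x\n", trunc64to8(0x1ff))
}
```

The same 8-bit pattern widens to two different 64-bit values, so a pass that reasons about value ranges needs to know which operation was applied.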
Change-Id: Idb7be6e691e2dbf6d7af6584641c3227c5c64bf5
Reviewed-on: https://go-review.googlesource.com/c/go/+/731300
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|