| Age | Commit message | Author |
|
|
|
Negating the minimum int overflows back to itself, which causes a panic
inside the subWillUnderflow check.
Fixes #78641
Change-Id: Ibbf2fa3228b9890a1a76ac6f4ff504b7e125b29f
Reviewed-on: https://go-review.googlesource.com/c/go/+/766260
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
The SSA generic rewrite rules implement DeMorgan's laws but are
missing the closely related boolean absorption laws:
x & (x | y) == x
x | (x & y) == x
These are fundamental boolean algebra identities (see
https://en.wikipedia.org/wiki/Absorption_law) that hold for all
bit patterns, all widths, signed and unsigned. Both GCC and LLVM
recognize and optimize these patterns at -O2.
Add two generic rules covering all four widths (8, 16, 32, 64).
Commutativity of AND/OR is handled automatically by the rule
engine, so all argument orderings are matched.
The rules eliminate two redundant ALU instructions per occurrence
and fire on real code (defer bit-manipulation patterns in runtime,
testing, go/parser, and third-party packages).
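As a quick sanity check of the two identities, the sketch below tests them exhaustively for 8-bit values. It is an illustration written for this log, not part of the CL; `absorptionHolds` is a hypothetical name.

```go
package main

import "fmt"

// absorptionHolds checks both absorption identities for every pair of
// 8-bit values. Hypothetical helper, for illustration only.
func absorptionHolds() bool {
	for x := 0; x < 256; x++ {
		for y := 0; y < 256; y++ {
			a, b := uint8(x), uint8(y)
			if a&(a|b) != a || a|(a&b) != a {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println(absorptionHolds())
}
```

Because the identities hold bit by bit, the same argument carries over to wider integer types.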
Fixes #78632
Change-Id: Ib59e839081302ad1635e823309d8aec768c25dcf
GitHub-Last-Rev: 23f8296ece08c77fcaeeaf59c2c2d8ce23d1202c
GitHub-Pull-Request: golang/go#78634
Reviewed-on: https://go-review.googlesource.com/c/go/+/765580
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
|
|
Change-Id: Ia9ee618aa68aad5bab73ee62eea176084ee162da
GitHub-Last-Rev: 4cc005d3cd1ae4e5eaa283b1799c7be26b2279f5
GitHub-Pull-Request: golang/go#78625
Reviewed-on: https://go-review.googlesource.com/c/go/+/765280
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Fixes #78558
I've also added tests to make sure PPC still generates ISEL when
the constant isn't 1.
This makes sure we aren't generating a sequence that wouldn't
work right now.
It does not mean we couldn't try to optimize other constants
on PPC64 if a fast sequence exists, for example like arm64's
inline register shifts.
Change-Id: Ic241d593149b7a11533948f5d4c52db357cc134f
Reviewed-on: https://go-review.googlesource.com/c/go/+/763340
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
Reviewed-by: Paul Murphy <paumurph@redhat.com>
|
|
The original algorithm merges a store with the first
mergeable store in the chain, but it misses some
cases. Additionally, building a list of stores that
write to adjacent memory cells allows merging them
in order of increasing address.
I already tried another algorithm in CL 698097,
but it was reverted. This algorithm works differently
and fixes the bug introduced by the variant in that CL.
Fixes #71987, #75365
Here are the results of the sweet benchmarks:
│ base.stat │ opt.stat │
│ sec/op │ sec/op vs base │
ESBuildThreeJS-4 1.088 ± 2% 1.086 ± 1% ~ (p=1.000 n=10)
ESBuildRomeTS-4 263.0m ± 2% 260.8m ± 1% ~ (p=0.105 n=10)
EtcdPut-4 73.08m ± 1% 73.16m ± 1% ~ (p=0.971 n=10)
EtcdSTM-4 414.9m ± 1% 415.4m ± 1% ~ (p=0.393 n=10)
GoBuildKubelet-4 203.3 ± 0% 203.5 ± 0% ~ (p=0.393 n=10)
GoBuildKubeletLink-4 19.06 ± 1% 19.05 ± 0% ~ (p=0.280 n=10)
GoBuildIstioctl-4 156.6 ± 0% 156.6 ± 0% ~ (p=0.796 n=10)
GoBuildIstioctlLink-4 14.16 ± 1% 14.18 ± 1% ~ (p=0.853 n=10)
GoBuildFrontend-4 56.45 ± 1% 56.57 ± 0% ~ (p=0.579 n=10)
GoBuildFrontendLink-4 3.635 ± 1% 3.646 ± 0% ~ (p=0.436 n=10)
GoBuildTsgo-4 103.0 ± 1% 103.4 ± 1% ~ (p=0.529 n=10)
GoBuildTsgoLink-4 1.865 ± 1% 1.860 ± 1% ~ (p=0.684 n=10)
GopherLuaKNucleotide-4 33.55 ± 0% 33.58 ± 0% ~ (p=0.075 n=10)
MarkdownRenderXHTML-4 281.0m ± 0% 280.3m ± 0% -0.23% (p=0.019 n=10)
Tile38QueryLoad-4 970.0µ ± 1% 969.3µ ± 0% ~ (p=0.436 n=10)
geomean 3.128 3.128 -0.01%
Change-Id: Ia548b43601b1bdb1c1723d300a4b8b907ab0c040
Reviewed-on: https://go-review.googlesource.com/c/go/+/760100
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
addWillOverflow and subWillOverflow implicitly assume that y is
positive, so using them outside of addU and subU is incorrect. This CL
fixes those incorrect uses by applying the correct logic in place.
Thanks to Jakub Ciolek for reporting this issue.
Fixes #78333
Fixes CVE-2026-27143
Change-Id: I263e8e7ac227e2a68109eb7bbd45f66569ed22ec
Reviewed-on: https://go-internal-review.googlesource.com/c/go/+/3700
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Neal Patel <nealpatel@google.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/763765
Reviewed-by: Jakub Ciolek <jakub@ciolek.dev>
Reviewed-by: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: David Chase <drchase@google.com>
|
|
Change-Id: I27696b1a5fa0593d9f36743efa3559a36d23ec4b
Reviewed-on: https://go-review.googlesource.com/c/go/+/760844
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
- fix a bug where it wouldn't recognize 1<<63 as a power of two
- remove the IsSigned check; there is no such thing as a signed Mul
If the rule works for signed numbers it works for unsigned ones too.
Even if the intermediate steps make no sense, the result ends up
wrapping the right way around in the end.
Change-Id: I86182762aec5eff784e2d9bc49ee028825fb9ea0
Reviewed-on: https://go-review.googlesource.com/c/go/+/760843
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
|
|
(I copied Ilya Tocar's CL 27656 and heavily modified it.)
This adds an optimization that moves loop invariant computations
out of the loop. For example:
    a := ...
    for ... {
        b := a + 15
        // uses of b
    }
Turns into
    a := ...
    b := a + 15
    for ... {
        // uses of b
    }
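A runnable version of the same transformation, written by hand, is sketched below. The function names are hypothetical; the point is only that both forms compute the same result, so hoisting is safe when the hoisted expression depends on nothing that changes inside the loop.

```go
package main

import "fmt"

// sumInLoop recomputes a+15 on every iteration.
func sumInLoop(a, n int) int {
	total := 0
	for i := 0; i < n; i++ {
		b := a + 15 // loop invariant: depends only on a
		total += b * i
	}
	return total
}

// sumHoisted computes a+15 once, before the loop.
func sumHoisted(a, n int) int {
	total := 0
	b := a + 15
	for i := 0; i < n; i++ {
		total += b * i
	}
	return total
}

func main() {
	fmt.Println(sumInLoop(3, 10) == sumHoisted(3, 10))
}
```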
Change-Id: I36c8c7e2b3bc1c5e6b4b293bed3a76dc20d6c825
Reviewed-on: https://go-review.googlesource.com/c/go/+/697235
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Change-Id: Ic97f661c68180ff7adb9976fcc61279e1e1f04a4
Reviewed-on: https://go-review.googlesource.com/c/go/+/760842
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
On amd64, alongside:
if b { x += 1 } => x += b
we can also implement the constants 2, 4, and 8:
if b { x += 2 } => x += b * 2
This compiles to a displacement LEA.
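At the source level, the rewrite corresponds to replacing a conditional add with arithmetic on the boolean. The sketch below shows the equivalence in Go; `b2i` is a hypothetical helper, and whether a given compiler version emits an LEA for either form is not something this example asserts.

```go
package main

import "fmt"

// b2i converts a bool to 0 or 1. Hypothetical helper.
func b2i(b bool) int {
	if b {
		return 1
	}
	return 0
}

// branchy is the form a programmer would normally write.
func branchy(x int, b bool) int {
	if b {
		x += 2
	}
	return x
}

// branchless expresses the same computation as x + b*2.
func branchless(x int, b bool) int {
	return x + b2i(b)*2
}

func main() {
	fmt.Println(branchy(10, true) == branchless(10, true))
	fmt.Println(branchy(10, false) == branchless(10, false))
}
```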
Change-Id: Ib00fcc5059acb0ebb346e056c4a656f164cc63df
Reviewed-on: https://go-review.googlesource.com/c/go/+/760841
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Refactor the DWARF variable generation in the compiler:
1. Replace the intermediate []byte location list encoding with a
structured LocListEntry type. The old code packed SSA block/value
IDs into pointer-sized integers, wrote them alongside DWARF4-format
length-prefixed expressions, then re-read and decoded everything
during final encoding. The new approach stores entries as
{StartBlock, StartValue, EndBlock, EndValue, Expr} structs that
PutLocationListDwarf4/5 directly encode into the appropriate format.
This eliminates encodeValue, decodeValue, appendPtr, writePtr,
readPtr, and SetupLocList, and removes the DWARF4 re-encoding in
PutLocationListDwarf5.
2. Unify createDwarfVars into a single processing loop. The old code
had three mutually exclusive paths (createSimpleVars, createABIVars,
createComplexVars) selected by build mode, followed by a separate
conservative-var loop. The new code uses one loop that tries
createComplexVar first (when SSA debug info is available), then
falls back to createSimpleVar. This removes createSimpleVars,
createABIVars, and createComplexVars.
3. Extract createConservativeVar and shouldEmitDwarfVar as named
functions, consolidating inline code and scattered filtering logic.
4. Fix createHeapDerefLocationList to return []LocListEntry instead
of raw bytes, consistent with the new representation.
Change-Id: If6fb755c22e398d7615dccaf33b1367828e6c47e
Reviewed-on: https://go-review.googlesource.com/c/go/+/750920
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Logical ops on uint8/uint16 (AND/OR/XOR) with constants sometimes
materialized the mask via MOVD (often as a negative immediate), even
when the value fit in the UI-immediate range. This prevented the backend
from selecting andi. / ori / xori forms.
This CL makes the following change:
UI-immediate truncation is performed only at the use site of
logical-immediate ops, and only when the constant does not fit in the
8- or 16-bit unsigned domain (m != uint8(m) / m != uint16(m)).
This avoids negative-mask materialization and enables correct emission of
UI-form logical instructions. Arithmetic SI-immediate instructions
(addi, subfic, etc.) and other use patterns are unchanged.
Codegen tests are added to ensure the expected andi./ori/xori
patterns appear and that MOVD is not emitted for valid 8/16-bit masks.
Change-Id: I9fcdf4498c4e984c7587814fb9019a75865c4a0d
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/704015
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Paul Murphy <paumurph@redhat.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
|
|
If the bool comes from a local operation this is foldable into the comparison.
    if a == b {
    } else {
        x++
    }
becomes:
    x += !(a == b)
becomes:
    x += a != b
If the bool is passed in or loaded rather than being locally computed,
this adds an extra XOR ^1 to invert it.
At worst, the cost should equal that of the compute + CMP + CMOV sequence,
which is a tie on modern CPUs that can execute CMOV on all integer ALUs
and a win on cheaper or older ones that can't.
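The equivalence between the else-branch increment and adding the inverted condition can be checked directly. This is a hand-written sketch; `b2i` and the function names are hypothetical, and Go itself has no implicit bool-to-int conversion, so the helper stands in for it.

```go
package main

import "fmt"

// b2i converts a bool to 0 or 1. Hypothetical helper.
func b2i(b bool) int {
	if b {
		return 1
	}
	return 0
}

// incInElse increments x only on the else branch.
func incInElse(x, a, b int) int {
	if a == b {
	} else {
		x++
	}
	return x
}

// incByInvertedCond adds the inverted comparison instead of branching.
func incByInvertedCond(x, a, b int) int {
	return x + b2i(a != b)
}

func main() {
	fmt.Println(incInElse(7, 1, 1) == incByInvertedCond(7, 1, 1))
	fmt.Println(incInElse(7, 1, 2) == incByInvertedCond(7, 1, 2))
}
```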
Change-Id: Idd2566c7a3826ec432ebfbba7b3898aa0db4b812
Reviewed-on: https://go-review.googlesource.com/c/go/+/760922
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
|
|
Similar to CL 685676 but for XOR.
Change-Id: Ib5ffd4c13348f176a808b3218fdbbafc2c42794f
Reviewed-on: https://go-review.googlesource.com/c/go/+/760921
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Similar to CL 685676 but for OR.
Change-Id: I0ddfd457ed9e8888462306138a251ac48ad42084
Reviewed-on: https://go-review.googlesource.com/c/go/+/760920
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
|
|
We have two induction variables i and j in the following loop:
    for i, j := 0, len(s)-1; i < j; i, j = i+1, j-1 {
        // loop body
    }
This CL enables the prove pass to handle cases where one if block
uses two induction variables.
Updates #45078
Change-Id: I8b8dc8b7b2d160a796dab1d1e29a00ef4e8e8157
Reviewed-on: https://go-review.googlesource.com/c/go/+/757700
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This is hit 3 times (unique by LOC) when building the std.
Change-Id: Ic1fc7b60a129e73470d9bc4f603f4be12d154b0f
Reviewed-on: https://go-review.googlesource.com/c/go/+/750342
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Change-Id: Id8baeb89e6e11a01d53cd63c665f0b2966f50392
Reviewed-on: https://go-review.googlesource.com/c/go/+/750341
Reviewed-by: Mark Freeman <markfreeman@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This is hit 308 times (unique by LOC) when building the std.
There are many hits in defer generated code.
My original intent was to optimize cryptographic code that
uses And to implement modulus by a power of two where the
number is always smaller than the modulus.
It also works there, but there are (unsurprisingly) far fewer hits.
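The underlying identity is that masking with m-1 does nothing when the value is already smaller than the power of two m. The sketch below checks that identity for a few moduli; `maskIsNoOp` is a hypothetical helper, and the example does not reproduce the rule added by this CL.

```go
package main

import "fmt"

// maskIsNoOp reports whether x&(m-1) == x for every x < m.
// m is assumed to be a power of two. Hypothetical helper.
func maskIsNoOp(m uint32) bool {
	for x := uint32(0); x < m; x++ {
		if x&(m-1) != x {
			return false
		}
	}
	return true
}

func main() {
	for _, m := range []uint32{1, 2, 16, 1 << 16} {
		fmt.Println(m, maskIsNoOp(m))
	}
}
```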
Change-Id: Ia7a9a57099b98de966673c6e8231ef09f7c80964
Reviewed-on: https://go-review.googlesource.com/c/go/+/750200
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Change-Id: I9596dbca8991c93c7543d10dc1b155056dfa7db3
Reviewed-on: https://go-review.googlesource.com/c/go/+/759500
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Change-Id: Ib37b35dfff6236c59c0242c3b7d979c95aefbb8b
Reviewed-on: https://go-review.googlesource.com/c/go/+/750321
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
|
|
Change-Id: I4dff3ba1462848f408257cbadedf202e62d1ea69
Reviewed-on: https://go-review.googlesource.com/c/go/+/750320
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
On the final iteration we need space below start (which becomes end)
such that i-step does not overflow or underflow.
In other words, the code used to assume that the last time the loop
header executes, `start < i - step` (or `<=`, `>`, `>=` depending on the
loop) is always false.
That seems correct, since by definition the only way for it to be the
last execution of the loop header is for the condition to become false.
However, here is an example with uint (even though the code doesn't
support them yet) to make things simpler:
start = 1
i = 2
step = 100
We compute 2 - 100, which should give 1 < -98 == false and break the loop.
Instead we get 18446744073709551518, which gives
1 < 18446744073709551518 == true and keeps the loop going.
This patch fixes this issue by ensuring that in the last execution of
a loop header the induction variable does not underflow or overflow.
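The numbers in the example can be reproduced with a few lines of Go. `nextValue` is a hypothetical helper that stands in for the `i - step` computation in the loop header.

```go
package main

import "fmt"

// nextValue computes i - step with wrapping unsigned arithmetic.
// Hypothetical helper, for illustration only.
func nextValue(i, step uint64) uint64 {
	return i - step
}

func main() {
	var start, i, step uint64 = 1, 2, 100
	next := nextValue(i, step)
	fmt.Println(next)         // 18446744073709551518
	fmt.Println(start < next) // true: the loop condition still holds
}
```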
Fixes #78303
Change-Id: I64e8e8592b023d79fdbc7f1598d584726ed601f5
Reviewed-on: https://go-review.googlesource.com/c/go/+/758801
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Jakub Ciolek <jakub@ciolek.dev>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
On ppc64/ppc64le, rewrite (x + x) << c to x << (c+1) for constant
shifts. This removes an ADD, shortens the dependency chain, and reduces
code size.
Add rules for both 64-bit (SLDconst) and 32-bit (SLWconst), and extend
test/codegen/shift.go with ppc64x checks to assert a single SLD/SLW and
forbid ADD. Aligns ppc64 with other architectures that already assert
similar codegen in shift.go.
Change-Id: Ie564afbb029a5bd48887b82b0c455ca1dddd5508
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/712000
Reviewed-by: Archana Ravindar <aravinda@redhat.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
clears"
This reverts CL 750480.
Reason: Adding preemptible memclrNoHeapPointers exposes existing unsafe
use of notInHeapSlice, causing crashes. Revert the memclr stack until
the underlying issue is fixed.
We keep the test added in CL 755942, which is useful regardless.
For #78254.
Change-Id: I8be3f9a20292b7f294e98e74e5a86c6a204406ae
Reviewed-on: https://go-review.googlesource.com/c/go/+/757343
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Look into the following block(s) for a load that can be paired with
the load we're trying to pair up.
This particularly helps the generated equality functions. Instead of doing
    MOVD x(R0), R2
    MOVD x(R1), R3
    CMP R2, R3
    BNE noteq
    MOVD x+8(R0), R2
    MOVD x+8(R1), R3
    CMP R2, R3
    BNE noteq
we do
    LDP x(R0), (R2, R4)
    LDP x(R1), (R3, R5)
    CMP R2, R3
    BNE noteq
    CMP R4, R5
    BNE noteq
Removes 5296 bytes of code from cmd/go.
Change-Id: I6368686892ac944783c8b07ed7252126d1ef4031
Reviewed-on: https://go-review.googlesource.com/c/go/+/740741
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add rules to eliminate sign-extension of values that have already
been zero-extended from fewer bits via an I64And mask:
(I64Extend32S x:(I64And _ (I64Const [c]))) && c >= 0 && int64(int32(c)) == c => x
(I64Extend16S x:(I64And _ (I64Const [c]))) && c >= 0 && int64(int16(c)) == c => x
(I64Extend8S x:(I64And _ (I64Const [c]))) && c >= 0 && int64(int8(c)) == c => x
When a value has been masked to fit within the non-negative range of
the sign-extension width, the upper bits are already zero and sign-
extending is a no-op. For example, (I64Extend32S (I64And x 0xff))
can be elided because 0xff fits in a signed int32, so bit 31 is
guaranteed to be zero and sign-extending from 32 bits is identity.
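The 32-bit rule's condition can be modeled in Go with integer conversions. In the sketch below, `signExtend32` imitates I64Extend32S and `maskAllowsElision32` mirrors the condition `c >= 0 && int64(int32(c)) == c`; both names are hypothetical.

```go
package main

import "fmt"

// signExtend32 models I64Extend32S: sign-extend the low 32 bits.
func signExtend32(x int64) int64 {
	return int64(int32(x))
}

// maskAllowsElision32 mirrors the rule's condition on the mask c.
func maskAllowsElision32(c int64) bool {
	return c >= 0 && int64(int32(c)) == c
}

func main() {
	x := int64(-1) // all bits set

	small := int64(0xff)
	fmt.Println(maskAllowsElision32(small), signExtend32(x&small) == x&small)

	// 0xffffffff fails the condition, and eliding would be wrong here:
	// bit 31 of the masked value is set.
	wide := int64(0xffffffff)
	fmt.Println(maskAllowsElision32(wide), signExtend32(x&wide) == x&wide)
}
```

The second case shows why the mask must fit in the non-negative range of the extension width rather than merely in its bit width.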
Cq-Include-Trybots: luci.golang.try:gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero
Change-Id: Ia54d67358756e47ca7635a6a8ca4beadb003820a
Reviewed-on: https://go-review.googlesource.com/c/go/+/756320
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
Absorb unnecessary conversion between float32 and float64
if both src and dst are 32 bit.
Ref: CL 733621
Updates #75463
Change-Id: I439f92aa3d940fa4979e76845c0893e43bf584af
Reviewed-on: https://go-review.googlesource.com/c/go/+/739520
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
|
|
R30 is a callee-saved register; using it requires saving and then restoring it.
Therefore, we replace it with a caller-saved register.
R4~R19 are argument registers on loong64, and R20 is the only remaining usable
caller-saved register. To use R20 in the trampoline, we changed the registers used
by the LoweredMove/LoweredMoveLoop operations (originally R20 and R21,
now R23 and R24).
Change-Id: Ie7bba0caa30a764a45bcb47635c35c829036c5a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/726140
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
In this CL, the restriction that each block can only have one induction
variable has been removed. This reduces missed optimizations.
Fixes #76269
Change-Id: I14043182a40cc7887c5b6d9c1a5df8ea3a1bfedc
Reviewed-on: https://go-review.googlesource.com/c/go/+/719881
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Update Move rewrite rules to use FMOVQload/store and FLDPQ/FSTPQ
for medium-sized copies (16-64 bytes). This generates fewer and
wider instructions than the previous approach using LDP/STP pairs.
Executable Base .text go1 Change
----------------------------------------------------
asm 2112308 2105732 -0.31%
cgo 1826132 1823172 -0.16%
compile 10474868 10460644 -0.14%
cover 1990036 1985748 -0.22%
fix 3234116 3226340 -0.24%
link 2702628 2695316 -0.27%
preprofile 947652 947028 -0.07%
vet 3140964 3133524 -0.24%
Performance effect on OrangePi 6 plus:
│ orig.out │ movq.out │
│ sec/op │ sec/op vs base │
CopyFat16 0.4711n ± 0% 0.3852n ± 0% -18.23% (p=0.000 n=10)
CopyFat17 0.7705n ± 0% 0.7705n ± 0% ~ (p=0.984 n=10)
CopyFat18 0.7703n ± 0% 0.7703n ± 0% ~ (p=0.771 n=10)
CopyFat19 0.7703n ± 0% 0.7703n ± 0% ~ (p=0.637 n=10)
CopyFat20 0.7703n ± 0% 0.7704n ± 0% ~ (p=0.103 n=10)
CopyFat21 0.7703n ± 0% 0.7708n ± 0% ~ (p=0.505 n=10)
CopyFat22 0.7704n ± 0% 0.7705n ± 0% ~ (p=0.589 n=10)
CopyFat23 0.7703n ± 0% 0.7703n ± 0% ~ (p=0.347 n=10)
CopyFat24 0.7704n ± 0% 0.7703n ± 0% ~ (p=0.383 n=10)
CopyFat25 0.8385n ± 0% 0.6589n ± 0% -21.41% (p=0.000 n=10)
CopyFat26 0.8386n ± 0% 0.6590n ± 0% -21.42% (p=0.000 n=10)
CopyFat27 0.8385n ± 0% 0.6590n ± 0% -21.41% (p=0.000 n=10)
CopyFat28 0.8386n ± 0% 0.6571n ± 0% -21.65% (p=0.000 n=10)
CopyFat29 0.8385n ± 0% 0.6590n ± 0% -21.41% (p=0.000 n=10)
CopyFat30 0.8387n ± 0% 0.6591n ± 0% -21.42% (p=0.000 n=10)
CopyFat31 0.8385n ± 0% 0.6589n ± 0% -21.42% (p=0.000 n=10)
CopyFat32 0.8318n ± 0% 0.4969n ± 0% -40.26% (p=0.000 n=10)
CopyFat33 1.1550n ± 0% 0.7705n ± 0% -33.29% (p=0.000 n=10)
CopyFat34 1.1560n ± 0% 0.7703n ± 0% -33.37% (p=0.000 n=10)
CopyFat35 1.1550n ± 0% 0.7705n ± 0% -33.29% (p=0.000 n=10)
CopyFat36 1.1550n ± 0% 0.7704n ± 0% -33.30% (p=0.000 n=10)
CopyFat37 1.1555n ± 0% 0.7704n ± 0% -33.33% (p=0.000 n=10)
CopyFat38 1.1550n ± 0% 0.7704n ± 0% -33.30% (p=0.000 n=10)
CopyFat39 1.1560n ± 0% 0.7703n ± 0% -33.36% (p=0.000 n=10)
CopyFat40 1.0020n ± 0% 0.7705n ± 0% -23.10% (p=0.000 n=10)
CopyFat41 1.2060n ± 0% 0.7703n ± 0% -36.12% (p=0.000 n=10)
CopyFat42 1.2060n ± 0% 0.7704n ± 0% -36.12% (p=0.000 n=10)
CopyFat43 1.2060n ± 0% 0.7705n ± 0% -36.11% (p=0.000 n=10)
CopyFat44 1.2060n ± 0% 0.7704n ± 0% -36.12% (p=0.000 n=10)
CopyFat45 1.2060n ± 0% 0.7704n ± 0% -36.12% (p=0.000 n=10)
CopyFat46 1.2060n ± 0% 0.7703n ± 0% -36.13% (p=0.000 n=10)
CopyFat47 1.2060n ± 0% 0.7703n ± 0% -36.12% (p=0.000 n=10)
CopyFat48 1.2060n ± 0% 0.7703n ± 0% -36.13% (p=0.000 n=10)
CopyFat49 1.3620n ± 0% 0.8622n ± 0% -36.70% (p=0.000 n=10)
CopyFat50 1.3620n ± 0% 0.8621n ± 0% -36.70% (p=0.000 n=10)
CopyFat51 1.3620n ± 0% 0.8622n ± 0% -36.70% (p=0.000 n=10)
CopyFat52 1.3620n ± 0% 0.8623n ± 0% -36.69% (p=0.000 n=10)
CopyFat53 1.3620n ± 0% 0.8621n ± 0% -36.70% (p=0.000 n=10)
CopyFat54 1.3620n ± 0% 0.8622n ± 0% -36.70% (p=0.000 n=10)
CopyFat55 1.3620n ± 0% 0.8620n ± 0% -36.71% (p=0.000 n=10)
CopyFat56 1.3120n ± 0% 0.8622n ± 0% -34.28% (p=0.000 n=10)
CopyFat57 1.5905n ± 0% 0.8621n ± 0% -45.80% (p=0.000 n=10)
CopyFat58 1.5830n ± 1% 0.8622n ± 0% -45.53% (p=0.000 n=10)
CopyFat59 1.5865n ± 1% 0.8621n ± 0% -45.66% (p=0.000 n=10)
CopyFat60 1.5720n ± 1% 0.8622n ± 0% -45.15% (p=0.000 n=10)
CopyFat61 1.5900n ± 1% 0.8621n ± 0% -45.78% (p=0.000 n=10)
CopyFat62 1.5890n ± 0% 0.8622n ± 0% -45.74% (p=0.000 n=10)
CopyFat63 1.5900n ± 1% 0.8620n ± 0% -45.78% (p=0.000 n=10)
CopyFat64 1.5440n ± 0% 0.8568n ± 0% -44.51% (p=0.000 n=10)
geomean 1.093n 0.7636n -30.13%
Kunpeng 920C:
goos: linux
goarch: arm64
pkg: runtime
│ orig.out │ movq.out │
│ sec/op │ sec/op vs base │
CopyFat16 0.4892n ± 1% 0.5072n ± 0% +3.68% (p=0.000 n=10)
CopyFat17 0.6394n ± 0% 0.4638n ± 0% -27.47% (p=0.000 n=10)
CopyFat18 0.6394n ± 0% 0.4638n ± 0% -27.46% (p=0.000 n=10)
CopyFat19 0.6395n ± 0% 0.4638n ± 0% -27.48% (p=0.000 n=10)
CopyFat20 0.6393n ± 0% 0.4638n ± 0% -27.45% (p=0.000 n=10)
CopyFat21 0.6394n ± 0% 0.4637n ± 0% -27.48% (p=0.000 n=10)
CopyFat22 0.6395n ± 0% 0.4638n ± 0% -27.47% (p=0.000 n=10)
CopyFat23 0.6395n ± 0% 0.4638n ± 0% -27.47% (p=0.000 n=10)
CopyFat24 0.6091n ± 0% 0.4639n ± 0% -23.84% (p=0.000 n=10)
CopyFat25 0.9109n ± 0% 0.4674n ± 0% -48.69% (p=0.000 n=10)
CopyFat26 0.9107n ± 0% 0.4674n ± 0% -48.68% (p=0.000 n=10)
CopyFat27 0.9108n ± 0% 0.4674n ± 0% -48.69% (p=0.000 n=10)
CopyFat28 0.9109n ± 0% 0.4674n ± 0% -48.69% (p=0.000 n=10)
CopyFat29 0.9110n ± 0% 0.4673n ± 0% -48.70% (p=0.000 n=10)
CopyFat30 0.9109n ± 0% 0.4673n ± 0% -48.70% (p=0.000 n=10)
CopyFat31 0.9110n ± 0% 0.4674n ± 0% -48.69% (p=0.000 n=10)
CopyFat32 0.6845n ± 0% 0.4845n ± 1% -29.21% (p=0.000 n=10)
CopyFat33 0.9130n ± 0% 0.9117n ± 0% -0.14% (p=0.000 n=10)
CopyFat34 0.9131n ± 0% 0.9118n ± 0% -0.14% (p=0.001 n=10)
CopyFat35 0.9131n ± 0% 0.9117n ± 0% -0.15% (p=0.001 n=10)
CopyFat36 0.9129n ± 0% 0.9117n ± 0% -0.14% (p=0.003 n=10)
CopyFat37 0.9129n ± 0% 0.9117n ± 0% -0.14% (p=0.000 n=10)
CopyFat38 0.9130n ± 0% 0.9118n ± 0% -0.14% (p=0.000 n=10)
CopyFat39 0.9131n ± 0% 0.9118n ± 0% -0.15% (p=0.000 n=10)
CopyFat40 0.9112n ± 0% 0.9118n ± 0% +0.07% (p=0.027 n=10)
CopyFat41 1.1390n ± 0% 0.9118n ± 0% -19.95% (p=0.000 n=10)
CopyFat42 1.1390n ± 0% 0.9118n ± 0% -19.95% (p=0.000 n=10)
CopyFat43 1.1390n ± 0% 0.9116n ± 0% -19.96% (p=0.000 n=10)
CopyFat44 1.1390n ± 0% 0.9119n ± 0% -19.94% (p=0.000 n=10)
CopyFat45 1.1390n ± 0% 0.9118n ± 0% -19.95% (p=0.000 n=10)
CopyFat46 1.1390n ± 0% 0.9118n ± 0% -19.95% (p=0.000 n=10)
CopyFat47 1.1390n ± 0% 0.9117n ± 0% -19.96% (p=0.000 n=10)
CopyFat48 0.9111n ± 0% 0.9116n ± 0% +0.06% (p=0.002 n=10)
CopyFat49 1.2160n ± 0% 0.9292n ± 0% -23.59% (p=0.000 n=10)
CopyFat50 1.2160n ± 0% 0.9302n ± 0% -23.50% (p=0.000 n=10)
CopyFat51 1.2160n ± 0% 0.9292n ± 0% -23.59% (p=0.000 n=10)
CopyFat52 1.2160n ± 0% 0.9302n ± 0% -23.50% (p=0.000 n=10)
CopyFat53 1.2160n ± 0% 0.9293n ± 0% -23.58% (p=0.000 n=10)
CopyFat54 1.2160n ± 0% 0.9302n ± 0% -23.50% (p=0.000 n=10)
CopyFat55 1.2160n ± 0% 0.9292n ± 0% -23.59% (p=0.000 n=10)
CopyFat56 1.1480n ± 0% 0.9303n ± 0% -18.96% (p=0.000 n=10)
CopyFat57 1.3690n ± 0% 0.9293n ± 0% -32.12% (p=0.000 n=10)
CopyFat58 1.3690n ± 0% 0.9303n ± 0% -32.05% (p=0.000 n=10)
CopyFat59 1.3690n ± 0% 0.9293n ± 0% -32.12% (p=0.000 n=10)
CopyFat60 1.3690n ± 0% 0.9303n ± 0% -32.05% (p=0.000 n=10)
CopyFat61 1.3690n ± 0% 0.9293n ± 0% -32.12% (p=0.000 n=10)
CopyFat62 1.3690n ± 0% 0.9303n ± 0% -32.05% (p=0.000 n=10)
CopyFat63 1.3690n ± 0% 0.9293n ± 0% -32.12% (p=0.000 n=10)
CopyFat64 1.1470n ± 0% 0.5742n ± 0% -49.94% (p=0.000 n=10)
geomean 0.9710n 0.7214n -25.70%
Change-Id: Iecfe52fde1d431a1e4503cd848813a67f3896512
Reviewed-on: https://go-review.googlesource.com/c/go/+/738261
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Add rules to eliminate redundant I64Extend sign-extension operations
in the wasm backend:
Idempotent (applying the same extend twice is redundant):
(I64Extend32S (I64Extend32S x)) => (I64Extend32S x)
(I64Extend16S (I64Extend16S x)) => (I64Extend16S x)
(I64Extend8S (I64Extend8S x)) => (I64Extend8S x)
Narrower-subsumes-wider (a narrower sign-extend already determines
all the bits that a wider one would set):
(I64Extend32S (I64Extend16S x)) => (I64Extend16S x)
(I64Extend32S (I64Extend8S x)) => (I64Extend8S x)
(I64Extend16S (I64Extend8S x)) => (I64Extend8S x)
These patterns arise from nested sub-word type conversions. For
example, converting int8 -> int16 -> int32 -> int64 lowers to
I64Extend8S -> I64Extend16S -> I64Extend32S, but the I64Extend8S
alone is sufficient since it already sign-extends from 8 to 64 bits.
Cq-Include-Trybots: luci.golang.try:gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero
Change-Id: I1637687df31893b1ffa36915a3bd2e10d455f4ef
Reviewed-on: https://go-review.googlesource.com/c/go/+/754040
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Absorb unnecessary conversion between float32 and float64
if both src and dst are 32 bit.
Updates #75463
Change-Id: Ia71941223b5cca3fea66b559da7b8f916e63feaf
Reviewed-on: https://go-review.googlesource.com/c/go/+/733621
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Julian Zhu <jz531210@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add a rule to collapse cascaded I64And operations with constant masks
into a single mask:
(I64And (I64And x (I64Const [c1])) (I64Const [c2])) =>
(I64And x (I64Const [c1 & c2]))
This pattern arises from sub-word comparisons. For example,
(Eq32 x y) lowers to (I64Eq (ZeroExt32to64 x) (ZeroExt32to64 y)),
which becomes (I64Eq (I64And x 0xffffffff) (I64And y 0xffffffff)).
If x or y is the result of another sub-word operation that already
inserted a mask, the masks cascade and this rule collapses them.
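The rule relies on the associativity of AND, which the following sketch checks for a few inputs. The function names are hypothetical and the example is independent of the wasm backend.

```go
package main

import "fmt"

// twoMasks applies the masks one after the other, as in the cascaded form.
func twoMasks(x, c1, c2 int64) int64 {
	return (x & c1) & c2
}

// oneMask applies the pre-combined mask c1 & c2 in a single AND.
func oneMask(x, c1, c2 int64) int64 {
	return x & (c1 & c2)
}

func main() {
	x := int64(0x1234_5678_9abc_def0)
	fmt.Println(twoMasks(x, 0xffffffff, 0xffff) == oneMask(x, 0xffffffff, 0xffff))
}
```

Because c1 and c2 are constants, c1 & c2 can be folded at compile time, leaving a single AND at run time.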
Cq-Include-Trybots: luci.golang.try:gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero
Change-Id: Id7856b391be3ac20f1bc9eee40995b52c0754aed
Reviewed-on: https://go-review.googlesource.com/c/go/+/753620
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
It should be 2, not 1.
Fixes #78013
Change-Id: If1c48c84c324a3fd50e9f4b43cca2ea62a995dc5
Reviewed-on: https://go-review.googlesource.com/c/go/+/752740
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
|
|
Fixes #77720
Add a generic SSA rewrite that forwards `Load` from a `Move` destination
back to the `Move` source when it is provably safe, so field reads like
`s.h.Value().typ` don’t force a full struct copy.
- Add `Load <- Move` rewrite in `generic.rules` with safety guard:
non-volatile source
- Tweak `fixedbugs/issue22200*` so that it can still trigger the "stack frame too large" error.
- Regenerate `rewritegeneric.go`.
- Add `test/codegen/moveload.go` to assert no `MOVUPS` and direct `MOVBLZX`
in both direct and inlined forms.
Benchmark results (linux/amd64, i7-14700KF):
$ go test cmd/compile/internal/test -run='^$' -bench='MoveLoad' -count=20
Before:
BenchmarkMoveLoadTypViaValue-20 ~76.9 ns/op
BenchmarkMoveLoadTypViaPtr-20 ~1.97 ns/op
After:
BenchmarkMoveLoadTypViaValue-20 ~1.894 ns/op
BenchmarkMoveLoadTypViaPtr-20 ~1.905 ns/op
The rewrite removes the redundant struct copy in
`s.h.Value().typ`, bringing it in line with the direct pointer form.
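For readers without the benchmark source at hand, the two access forms look roughly like the sketch below. All types here (header, handle, state) are hypothetical stand-ins chosen for illustration; they are not the types used in the actual benchmark or test, and the sketch makes no claim about which instructions the compiler emits for it.

```go
package main

import "fmt"

// header is a hypothetical struct, large enough that copying it by value
// costs noticeably more than loading a single one-byte field.
type header struct {
	pad [64]byte
	typ uint8
}

// handle is a hypothetical wrapper whose Value method returns a copy of
// the struct it points to.
type handle struct{ p *header }

func (h handle) Value() header { return *h.p }

type state struct{ h handle }

// typViaValue reads one field out of the by-value result. Without load
// forwarding, this shape can force a copy of the whole struct first.
func typViaValue(s *state) uint8 { return s.h.Value().typ }

// typViaPtr reads the same field directly through the pointer.
func typViaPtr(s *state) uint8 { return s.h.p.typ }

func main() {
	s := &state{h: handle{p: &header{typ: 7}}}
	fmt.Println(typViaValue(s), typViaPtr(s)) // 7 7
}
```

Both functions return the same value; the difference the commit targets is only in how much memory traffic the first form generates.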
Change-Id: Iddf2263e390030ba013e0642a695b87c75f899da
Reviewed-on: https://go-review.googlesource.com/c/go/+/748200
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Mark Freeman <markfreeman@google.com>
|
|
Since Go 1.22, loop variables have per-iteration scope, making
the x := x pattern unnecessary for goroutine capture.
No issue required for this trivial cleanup.
Change-Id: I00d98522537fc2b9a6b4d598c8aa21b447628d41
Reviewed-on: https://go-review.googlesource.com/c/go/+/753400
Auto-Submit: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
|
|
Add post-lowering identity and absorption rules for I64And, I64Or,
I64Xor, and I64Mul with constant operands:
(I64And x (I64Const [-1])) => x
(I64And x (I64Const [0])) => (I64Const [0])
(I64Or x (I64Const [0])) => x
(I64Or x (I64Const [-1])) => (I64Const [-1])
(I64Xor x (I64Const [0])) => x
(I64Mul x (I64Const [0])) => (I64Const [0])
(I64Mul x (I64Const [1])) => x
The generic SSA rules handle these patterns before lowering, but
these rules catch cases where wasm-specific lowering or other
post-lowering optimization passes produce new nodes with identity
or absorbing constant operands.
For example, the complement rule lowers Com64(x) to
(I64Xor x (I64Const [-1])), and if x is later determined to be
all-ones, the I64And absorption rule can fold the result to zero.
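The seven rewrites are ordinary two's-complement identities and can be sanity-checked in Go. This is a small sketch under that assumption; identitiesHold is a hypothetical helper, and the constants are held in variables only so the expressions mirror the rule shapes.

```go
package main

import "fmt"

// identitiesHold reports whether each of the seven rewrites listed in the
// commit message preserves the value of the expression for the given x.
func identitiesHold(x int64) bool {
	var zero, one, allOnes int64 = 0, 1, -1
	return x&allOnes == x && // (I64And x (I64Const [-1])) => x
		x&zero == 0 && // (I64And x (I64Const [0])) => (I64Const [0])
		x|zero == x && // (I64Or x (I64Const [0])) => x
		x|allOnes == -1 && // (I64Or x (I64Const [-1])) => (I64Const [-1])
		x^zero == x && // (I64Xor x (I64Const [0])) => x
		x*zero == 0 && // (I64Mul x (I64Const [0])) => (I64Const [0])
		x*one == x // (I64Mul x (I64Const [1])) => x
}

func main() {
	for _, x := range []int64{0, 1, -1, 42, -1 << 63} {
		fmt.Println(x, identitiesHold(x))
	}
}
```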
Cq-Include-Trybots: luci.golang.try:gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero
Change-Id: Ie9a40e075662d4828a70e30b258d92ee171d0bc2
Reviewed-on: https://go-review.googlesource.com/c/go/+/752861
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Add OpARM64FMOVQload, OpARM64FMOVQstore, OpARM64FLDPQ, and
OpARM64FSTPQ for loading and storing Vec128 values.
Includes offset folding and address combining rules.
These ops will be used by subsequent CLs.
Change-Id: I4ac86a0a31f878411f49d390cb8df01f81cfc4d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/738260
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
Negation on a condition can be eliminated.
Change-Id: I94fab5f019cbaebb2ca589e1d8796a9cb72f3894
Reviewed-on: https://go-review.googlesource.com/c/go/+/748401
Reviewed-by: Xueqi Luo <1824368278@qq.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Julian Zhu <jz531210@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
Replace one of the boolean simplification rules with two new rules
in order to cover more cases.
This is a rebase of CL 42516 which slipped through the cracks.
Change-Id: I6da4cf30e5156174e8eac6bc2f0e2cebe95e555c
Reviewed-on: https://go-review.googlesource.com/c/go/+/750520
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
|
|
Change-Id: I0e0a5919536b643477a6f9278fcc60492ea5a759
Reviewed-on: https://go-review.googlesource.com/c/go/+/750540
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Add the missing I64Sub constant folding rule to the wasm backend.
Every other wasm arithmetic operation (I64Add, I64Mul, I64And, I64Or,
I64Xor, I64Shl, I64ShrU, I64ShrS) already had a post-lowering
constant folding rule, but I64Sub was missing.
While the generic SSA pass folds Sub64(Const64, Const64) before
lowering, this rule ensures consistency and handles any edge cases
where later wasm-specific passes produce I64Sub with two constant
operands.
Cq-Include-Trybots: luci.golang.try:gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero
Change-Id: Ie8bc044dd300dcc6d077feec34f9a65f4a310b13
Reviewed-on: https://go-review.googlesource.com/c/go/+/751441
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
Commit-Queue: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
type I interface {
foo()
}
type S struct {
I
}
Because I is embedded in S, S needs a foo method. We generate a
wrapper function to implement (*S).foo. It just loads the embedded
field I out of S and calls foo on it.
When the thing in S.I itself needs a wrapper, then we have a wrapper
calling another wrapper. This can continue, leaving a potentially long
sequence of wrappers on the stack. When we then call runtime.Callers
or friends, we have to walk an unbounded number of frames to find a
bounded number of non-wrapper frames.
This really happens, for instance with I = context.Context, S =
context.ValueCtx, and runtime.Callers = pprof sample (for any of
context.Context's methods).
To fix, make the interface call in the wrapper a tail call.
That way, the number of wrapper frames on the stack does not
grow with the length of the wrapper chain.
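A wrapper chain of this kind is easy to build by hand. The sketch below is illustrative only: foo is given an int result so the call can be observed, and base and nest are hypothetical names that do not come from the commit. It shows how the chain arises, not how many frames a particular compiler version leaves on the stack.

```go
package main

import "fmt"

// I and S follow the shapes in the commit message, except that foo returns
// an int here (an assumption made so the result is observable).
type I interface{ foo() int }

type S struct{ I }

// base is a hypothetical concrete implementation at the bottom of the chain.
type base struct{}

func (base) foo() int { return 42 }

// nest wraps v in depth layers of *S. Each layer gets a compiler-generated
// (*S).foo wrapper that loads the embedded I and calls foo on it, so a call
// on the result passes through depth wrappers before reaching base.foo.
func nest(v I, depth int) I {
	for i := 0; i < depth; i++ {
		v = &S{I: v}
	}
	return v
}

func main() {
	fmt.Println(nest(base{}, 1000).foo()) // 42
}
```

With the interface call in each wrapper compiled as a tail call, each wrapper's frame is replaced by its callee's rather than stacked under it.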
Fixes #75764
Fixes #77781
Change-Id: I03b1731159d9218c7f14f72ecbbac822d6a6bb87
Reviewed-on: https://go-review.googlesource.com/c/go/+/751465
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Large memory clearing operations (via clear() or large slice allocation)
currently use non-preemptible assembly loops. This blocks the Garbage
Collector from performing a Stop The World (STW) event, leading to
significant tail latency or even indefinite hangs in tight loops.
This change introduces memclrNoHeapPointersPreemptible, which chunks
clears into 256KB blocks with preemption checks. The compiler's walk
phase is updated to emit this call for large pointer-free clears.
To prevent regressions, SSA rewrite rules are added to ensure that
constant-size clears (which are common and small) continue to be
inlined into OpZero assembly.
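The chunking idea can be sketched in user-level Go. This is not the runtime's memclrNoHeapPointersPreemptible: clearChunked is a hypothetical function, and runtime.Gosched is used only as a stand-in for the runtime's internal preemption check, which ordinary code cannot call.

```go
package main

import (
	"fmt"
	"runtime"
)

// chunkSize is the 256KB block size named in the commit message.
const chunkSize = 256 << 10

// clearChunked zeroes b one chunk at a time and returns the number of
// chunks processed. Between chunks it yields with runtime.Gosched, which
// stands in for the preemption check (an assumption of this sketch).
func clearChunked(b []byte) int {
	chunks := 0
	for len(b) > 0 {
		n := len(b)
		if n > chunkSize {
			n = chunkSize
		}
		clear(b[:n]) // each individual clear is bounded by chunkSize
		b = b[n:]
		chunks++
		runtime.Gosched()
	}
	return chunks
}

func main() {
	buf := make([]byte, 600<<10)
	for i := range buf {
		buf[i] = 1
	}
	fmt.Println(clearChunked(buf)) // 3
}
```

Bounding each uninterruptible clear at a fixed size bounds how long a stop-the-world request can be delayed, regardless of the total size being cleared.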
Benchmarks on darwin/arm64:
- STW with 50MB clear: Improved from 'Hung' to ~500µs max pause.
- Small clears (5-64B): No measurable regression.
- Large clears (1M-64M): No measurable regression.
Fixes #69327
Change-Id: Ide14d6bcdca1f60d6ac95443acb57da9a8822538
Reviewed-on: https://go-review.googlesource.com/c/go/+/750480
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
|
|
While investigating other optimizations, I found several
opportunities to accelerate sccp convergence:
- Avoid adding duplicate uses to the re-visit worklist
- Prevent queueing uses of values that have already reached Bottom
- Add an early exit when processing a value that is already Bottom
These changes provide an overall speedup of ~9% for the sccp phase
during a full make.bash run. They do not change the number of
constants found or the amount of dead code eliminated.
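The first two points amount to a worklist that deduplicates and filters on push. The sketch below is a simplified model under stated assumptions, not the code in the sccp pass: the lattice type, the worklist type and its methods are hypothetical, and values are identified by plain ints.

```go
package main

import "fmt"

// lattice is a hypothetical three-level lattice as used in constant
// propagation: top (unknown), constant, and bottom (not a constant).
type lattice int

const (
	top lattice = iota
	constant
	bottom
)

// worklist holds value IDs waiting to be re-visited.
type worklist struct {
	queue  []int
	queued map[int]bool
}

func newWorklist() *worklist { return &worklist{queued: map[int]bool{}} }

// push queues id unless it is already queued (no duplicates) or its state
// is already bottom (it cannot be lowered further, so a visit is wasted).
// It reports whether id was added.
func (w *worklist) push(id int, state map[int]lattice) bool {
	if w.queued[id] || state[id] == bottom {
		return false
	}
	w.queued[id] = true
	w.queue = append(w.queue, id)
	return true
}

// pop removes and returns the oldest queued id, or false if none is left.
func (w *worklist) pop() (int, bool) {
	if len(w.queue) == 0 {
		return 0, false
	}
	id := w.queue[0]
	w.queue = w.queue[1:]
	delete(w.queued, id)
	return id, true
}

func main() {
	state := map[int]lattice{2: bottom} // IDs not in the map default to top
	w := newWorklist()
	fmt.Println(w.push(1, state), w.push(1, state), w.push(2, state)) // true false false
}
```

Both filters only skip work that could not change any lattice value, which is consistent with the results of the analysis staying the same.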
Updates #77325
Change-Id: Iaf83f6ea355eed366c3d09fc38f85561634a5a16
GitHub-Last-Rev: 078c3d309cc1f1e4b7f7a40635ffc4506f2ac1c6
GitHub-Pull-Request: golang/go#77399
Reviewed-on: https://go-review.googlesource.com/c/go/+/740980
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
Before this CL, simdgen contained a sign check to selectively enable
such rules for deduplication purposes. This left out `VPSRL`, as it is
only available in unsigned form. This CL fixes that.
It looks like the previous documentation fix to the SHA instruction might
not have run go generate, so this CL also contains the generated code for
that fix.
There is also a weird phantom import in
cmd/compile/internal/ssa/issue77582_test.go
This CL also fixes that.
The trybot didn't complain?
Change-Id: Ibbf9f789c1a67af1474f0285ab376bc07f17667e
Reviewed-on: https://go-review.googlesource.com/c/go/+/748501
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Saves a few lines. If applicable, we also directly rewrite to the 32-bit
MOVLconst, skipping the redundant transformation.
Change-Id: I4c2f5e2bb480e798cbe373de608e19a951d168ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/640215
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|