| author | Russ Cox <rsc@golang.org> | 2025-10-22 22:22:51 -0400 |
|---|---|---|
| committer | Gopher Robot <gobot@golang.org> | 2025-10-29 18:49:40 -0700 |
| commit | 9bbda7c99d2c176592186d230dab013147954bda (patch) | |
| tree | 6d299cf47b9956e35f71ad6d58130975a48218d4 /src/cmd/compile/internal/ssa/_gen/generic.rules | |
| parent | 915c1839fe76aef4bea6191282be1e48ef1c64e2 (diff) | |
cmd/compile: make prove understand div, mod better
This CL introduces new divisible and divmod passes that rewrite
divisibility checks and div, mod, and mul. These happen after
prove, so that prove can make better sense of the code for
deriving bounds, and they must run before decompose, so that
64-bit ops can be lowered to 32-bit ops on 32-bit systems.
And then they need another generic pass as well, to optimize
the generated code before decomposing.
The three opt passes are "opt", "middle opt", and "late opt".
(Perhaps instead they should be "generic", "opt", and "late opt"?)
The "late opt" pass repeats the "middle opt" work on any new code
that has been generated in the interim.
There will not be new divs or mods, but there may be new muls.
The x%c==0 rewrite rules are much simpler now, since they can
match before divs have been rewritten. This has the effect of
applying them more consistently and making the rewrite rules
independent of the exact div rewrites.
Prove is also now charged with marking signed div/mod as
unsigned when the arguments call for it, allowing simpler
code to be emitted in various cases. For example,
t.Seconds()/2 and len(x)/2 are now recognized as unsigned,
meaning they compile to a simple shift (unsigned division),
avoiding the more complex fixup we need for signed values.
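The two source patterns above can be sketched in standalone Go (illustrative code, not from this CL; the function names are invented):

```go
package main

import "fmt"

// halfLen divides a slice length by 2. len(x) is provably non-negative,
// so prove can mark the signed division as unsigned, and dividing by 2
// compiles to a single logical right shift instead of the longer
// signed-division fixup sequence.
func halfLen(x []byte) int {
	return len(x) / 2
}

// halfSecs mimics the t.Seconds()/2 case for a value already known to
// be non-negative: the guard lets prove derive secs >= 0, so the signed
// division is again treated as unsigned.
func halfSecs(secs int64) int64 {
	if secs < 0 {
		secs = 0
	}
	return secs / 2
}

func main() {
	fmt.Println(halfLen(make([]byte, 9)), halfSecs(9))
}
```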
https://gist.github.com/rsc/99d9d3bd99cde87b6a1a390e3d85aa32
shows a diff of 'go build -a -gcflags=-d=ssa/prove/debug=1 std'
output before and after. "Proved Rsh64x64 shifts to zero" is replaced
by the higher-level "Proved Div64 is unsigned" (the shift was in the
signed expansion of div by constant), but otherwise prove is only
finding more things to prove.
One short example, in code that does x[i%len(x)]:
< runtime/mfinal.go:131:34: Proved Rsh64x64 shifts to zero
---
> runtime/mfinal.go:131:34: Proved Div64 is unsigned
> runtime/mfinal.go:131:38: Proved IsInBounds
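A minimal standalone sketch of that x[i%len(x)] shape (a hypothetical function, not the runtime code):

```go
package main

import "fmt"

// cyclic indexes a non-empty slice round-robin. Once prove marks the
// division behind i%len(x) as unsigned, it also knows the result lies
// in [0, len(x)), so the bounds check on the index (IsInBounds) can be
// eliminated.
func cyclic(x []int, i int) int {
	if i < 0 {
		i = 0
	}
	return x[i%len(x)]
}

func main() {
	fmt.Println(cyclic([]int{10, 20, 30}, 4))
}
```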
A longer example:
< crypto/internal/fips140/sha3/shake.go:28:30: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:38:27: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:53:46: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:55:46: Proved Rsh64x64 shifts to zero
---
> crypto/internal/fips140/sha3/shake.go:28:30: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:38:27: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:45:7: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:46:4: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsSliceInBounds
These diffs are due to the smaller opt being better
and taking work away from prove:
< image/jpeg/dct.go:307:5: Proved IsInBounds
< image/jpeg/dct.go:308:5: Proved IsInBounds
...
< image/jpeg/dct.go:442:5: Proved IsInBounds
In the old opt, Mul by 8 was rewritten to Lsh by 3 early.
This CL delays that rule to help prove recognize mods,
but it also helps opt constant-fold the slice x[8*i:8*i+8:8*i+8].
Specifically, computing the length, opt can now do:
(Sub64 (Add (Mul 8 i) 8) (Add (Mul 8 i) 8)) ->
(Add 8 (Sub (Mul 8 i) (Mul 8 i))) ->
(Add 8 (Mul 8 (Sub i i))) ->
(Add 8 (Mul 8 0)) ->
(Add 8 0) ->
8
The key step is (Sub (Mul x y) (Mul x z)) -> (Mul x (Sub y z)).
Leaving the multiply as a Mul enables that step; the old
rewrite to Lsh blocked it, leaving prove to figure out the length
and then remove the bounds checks. But now opt can evaluate
the length down to a constant 8 and then constant-fold away
the bounds checks 0 < 8, 1 < 8, and so on. After that,
the compiler has nothing left to prove.
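For concreteness, a hypothetical Go function exhibiting the x[8*i:8*i+8:8*i+8] shape discussed above (the name is invented for the example):

```go
package main

import "fmt"

// chunk8 extracts the i'th 8-byte group with a full slice expression.
// The slice length is (8*i+8)-(8*i); with 8*i kept as a Mul through
// early opt, the (Sub (Mul x y) (Mul x z)) -> (Mul x (Sub y z)) rule
// lets opt fold the length to the constant 8, after which the bounds
// checks on the constant indices 0..7 constant-fold away.
func chunk8(x []byte, i int) [8]byte {
	s := x[8*i : 8*i+8 : 8*i+8]
	return [8]byte{s[0], s[1], s[2], s[3], s[4], s[5], s[6], s[7]}
}

func main() {
	fmt.Println(chunk8([]byte{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}, 1))
}
```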
Benchmarks are noisy in general; I checked the assembly for the many
large increases below, and the vast majority are unchanged and
presumably hitting the caches differently in some way.
The divisibility optimizations were not reliably triggering before.
This leads to a very large improvement in some cases, like
DivisiblePow2constI64, DivisibleconstI64 on 64-bit systems
and DivisibleconstU64 on 32-bit systems.
Another way the divisibility optimizations were unreliable before
was that they incorrectly triggered for x/3, x%3, even though
they are written not to do that. There is a real but small slowdown
in the DivisibleWDivconst benchmarks on Mac because in the cases
used in the benchmark, it is still faster (on Mac) to do the
divisibility check than to remultiply.
This may be worth further study. Perhaps when there is no rotate
(meaning the divisor is odd), the divisibility optimization
should be enabled always. In any event, this CL makes it possible
to study that.
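The tradeoff can be seen in source form; a sketch in the style of the DivisibleWDivconst benchmarks (assumed illustrative code, not the actual benchmark):

```go
package main

import "fmt"

// divisibleByMod computes the quotient and tests divisibility with a
// separate mod; the comparison can be compiled to the mul+rotate
// divisibility check even though x/3 is also needed.
func divisibleByMod(x uint64) (uint64, bool) {
	return x / 3, x%3 == 0
}

// divisibleByRemul derives the remainder from the quotient instead,
// trading the divisibility check for a remultiply; which version is
// faster is machine-dependent, as the benchmark results show.
func divisibleByRemul(x uint64) (uint64, bool) {
	q := x / 3
	return q, x-q*3 == 0
}

func main() {
	fmt.Println(divisibleByMod(9))
	fmt.Println(divisibleByRemul(10))
}
```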
benchmark \ host s7 linux-amd64 mac linux-arm64 linux-ppc64le linux-386 s7:GOARCH=386 linux-arm
vs base vs base vs base vs base vs base vs base vs base vs base
LoadAdd ~ ~ ~ ~ ~ -1.59% ~ ~
ExtShift ~ ~ -42.14% +0.10% ~ +1.44% +5.66% +8.50%
Modify ~ ~ ~ ~ ~ ~ ~ -1.53%
MullImm ~ ~ ~ ~ ~ +37.90% -21.87% +3.05%
ConstModify ~ ~ ~ ~ -49.14% ~ ~ ~
BitSet ~ ~ ~ ~ -15.86% -14.57% +6.44% +0.06%
BitClear ~ ~ ~ ~ ~ +1.78% +3.50% +0.06%
BitToggle ~ ~ ~ ~ ~ -16.09% +2.91% ~
BitSetConst ~ ~ ~ ~ ~ ~ ~ -0.49%
BitClearConst ~ ~ ~ ~ -28.29% ~ ~ -0.40%
BitToggleConst ~ ~ ~ +8.89% -31.19% ~ ~ -0.77%
MulNeg ~ ~ ~ ~ ~ ~ ~ ~
Mul2Neg ~ ~ -4.83% ~ ~ -13.75% -5.92% ~
DivconstI64 ~ ~ ~ ~ ~ -30.12% ~ +0.50%
ModconstI64 ~ ~ -9.94% -4.63% ~ +3.15% ~ +5.32%
DivisiblePow2constI64 -34.49% -12.58% ~ ~ -12.25% ~ ~ ~
DivisibleconstI64 -24.69% -25.06% -0.40% -2.27% -42.61% -3.31% ~ +1.63%
DivisibleWDivconstI64 ~ ~ ~ ~ ~ -17.55% ~ -0.60%
DivconstU64/3 ~ ~ ~ ~ ~ +1.51% ~ ~
DivconstU64/5 ~ ~ ~ ~ ~ ~ ~ ~
DivconstU64/37 ~ ~ -0.18% ~ ~ +2.70% ~ ~
DivconstU64/1234567 ~ ~ ~ ~ ~ ~ ~ +0.12%
ModconstU64 ~ ~ ~ -0.24% ~ -5.10% -1.07% -1.56%
DivisibleconstU64 ~ ~ ~ ~ ~ -29.01% -59.13% -50.72%
DivisibleWDivconstU64 ~ ~ -12.18% -18.88% ~ -5.50% -3.91% +5.17%
DivconstI32 ~ ~ -0.48% ~ -34.69% +89.01% -6.01% -16.67%
ModconstI32 ~ +2.95% -0.33% ~ ~ -2.98% -5.40% -8.30%
DivisiblePow2constI32 ~ ~ ~ ~ ~ ~ ~ -16.22%
DivisibleconstI32 ~ ~ ~ ~ ~ -37.27% -47.75% -25.03%
DivisibleWDivconstI32 -11.59% +5.22% -12.99% -23.83% ~ +45.95% -7.03% -10.01%
DivconstU32 ~ ~ ~ ~ ~ +74.71% +4.81% ~
ModconstU32 ~ ~ +0.53% +0.18% ~ +51.16% ~ ~
DivisibleconstU32 ~ ~ ~ -0.62% ~ -4.25% ~ ~
DivisibleWDivconstU32 -2.77% +5.56% +11.12% -5.15% ~ +48.70% +25.11% -4.07%
DivconstI16 -6.06% ~ -0.33% +0.22% ~ ~ -9.68% +5.47%
ModconstI16 ~ ~ +4.44% +2.82% ~ ~ ~ +5.06%
DivisiblePow2constI16 ~ ~ ~ ~ ~ ~ ~ -0.17%
DivisibleconstI16 ~ ~ -0.23% ~ ~ ~ +4.60% +6.64%
DivisibleWDivconstI16 -1.44% -0.43% +13.48% -5.76% ~ +1.62% -23.15% -9.06%
DivconstU16 +1.61% ~ -0.35% -0.47% ~ ~ +15.59% ~
ModconstU16 ~ ~ ~ ~ ~ -0.72% ~ +14.23%
DivisibleconstU16 ~ ~ -0.05% +3.00% ~ ~ ~ +5.06%
DivisibleWDivconstU16 +52.10% +0.75% +17.28% +4.79% ~ -37.39% +5.28% -9.06%
DivconstI8 ~ ~ -0.34% -0.96% ~ ~ -9.20% ~
ModconstI8 +2.29% ~ +4.38% +2.96% ~ ~ ~ ~
DivisiblePow2constI8 ~ ~ ~ ~ ~ ~ ~ ~
DivisibleconstI8 ~ ~ ~ ~ ~ ~ +6.04% ~
DivisibleWDivconstI8 -26.44% +1.69% +17.03% +4.05% ~ +32.48% -24.90% ~
DivconstU8 -4.50% +14.06% -0.28% ~ ~ ~ +4.16% +0.88%
ModconstU8 ~ ~ +25.84% -0.64% ~ ~ ~ ~
DivisibleconstU8 ~ ~ -5.70% ~ ~ ~ ~ ~
DivisibleWDivconstU8 +49.55% +9.07% ~ +4.03% +53.87% -40.03% +39.72% -3.01%
Mul2 ~ ~ ~ ~ ~ ~ ~ ~
MulNeg2 ~ ~ ~ ~ -11.73% ~ ~ -0.02%
EfaceInteger ~ ~ ~ ~ ~ +18.11% ~ +2.53%
TypeAssert +33.90% +2.86% ~ ~ ~ -1.07% -5.29% -1.04%
Div64UnsignedSmall ~ ~ ~ ~ ~ ~ ~ ~
Div64Small ~ ~ ~ ~ ~ -0.88% ~ +2.39%
Div64SmallNegDivisor ~ ~ ~ ~ ~ ~ ~ +0.35%
Div64SmallNegDividend ~ ~ ~ ~ ~ -0.84% ~ +3.57%
Div64SmallNegBoth ~ ~ ~ ~ ~ -0.86% ~ +3.55%
Div64Unsigned ~ ~ ~ ~ ~ ~ ~ -0.11%
Div64 ~ ~ ~ ~ ~ ~ ~ +0.11%
Div64NegDivisor ~ ~ ~ ~ ~ -1.29% ~ ~
Div64NegDividend ~ ~ ~ ~ ~ -1.44% ~ ~
Div64NegBoth ~ ~ ~ ~ ~ ~ ~ +0.28%
Mod64UnsignedSmall ~ ~ ~ ~ ~ +0.48% ~ +0.93%
Mod64Small ~ ~ ~ ~ ~ ~ ~ ~
Mod64SmallNegDivisor ~ ~ ~ ~ ~ ~ ~ +1.44%
Mod64SmallNegDividend ~ ~ ~ ~ ~ +0.22% ~ +1.37%
Mod64SmallNegBoth ~ ~ ~ ~ ~ ~ ~ -2.22%
Mod64Unsigned ~ ~ ~ ~ ~ -0.95% ~ +0.11%
Mod64 ~ ~ ~ ~ ~ ~ ~ ~
Mod64NegDivisor ~ ~ ~ ~ ~ ~ ~ -0.02%
Mod64NegDividend ~ ~ ~ ~ ~ ~ ~ ~
Mod64NegBoth ~ ~ ~ ~ ~ ~ ~ -0.02%
MulconstI32/3 ~ ~ ~ -25.00% ~ ~ ~ +47.37%
MulconstI32/5 ~ ~ ~ +33.28% ~ ~ ~ +32.21%
MulconstI32/12 ~ ~ ~ -2.13% ~ ~ ~ -0.02%
MulconstI32/120 ~ ~ ~ +2.93% ~ ~ ~ -0.03%
MulconstI32/-120 ~ ~ ~ -2.17% ~ ~ ~ -0.03%
MulconstI32/65537 ~ ~ ~ ~ ~ ~ ~ +0.03%
MulconstI32/65538 ~ ~ ~ ~ ~ -33.38% ~ +0.04%
MulconstI64/3 ~ ~ ~ +33.35% ~ -0.37% ~ -0.13%
MulconstI64/5 ~ ~ ~ -25.00% ~ -0.34% ~ ~
MulconstI64/12 ~ ~ ~ +2.13% ~ +11.62% ~ +2.30%
MulconstI64/120 ~ ~ ~ -1.98% ~ ~ ~ ~
MulconstI64/-120 ~ ~ ~ +0.75% ~ ~ ~ ~
MulconstI64/65537 ~ ~ ~ ~ ~ +5.61% ~ ~
MulconstI64/65538 ~ ~ ~ ~ ~ +5.25% ~ ~
MulconstU32/3 ~ +0.81% ~ +33.39% ~ +77.92% ~ -32.31%
MulconstU32/5 ~ ~ ~ -24.97% ~ +77.92% ~ -24.47%
MulconstU32/12 ~ ~ ~ +2.06% ~ ~ ~ +0.03%
MulconstU32/120 ~ ~ ~ -2.74% ~ ~ ~ +0.03%
MulconstU32/65537 ~ ~ ~ ~ ~ ~ ~ +0.03%
MulconstU32/65538 ~ ~ ~ ~ ~ -33.42% ~ -0.03%
MulconstU64/3 ~ ~ ~ +33.33% ~ -0.28% ~ +1.22%
MulconstU64/5 ~ ~ ~ -25.00% ~ ~ ~ -0.64%
MulconstU64/12 ~ ~ ~ +2.30% ~ +11.59% ~ +0.14%
MulconstU64/120 ~ ~ ~ -2.82% ~ ~ ~ +0.04%
MulconstU64/65537 ~ +0.37% ~ ~ ~ +5.58% ~ ~
MulconstU64/65538 ~ ~ ~ ~ ~ +5.16% ~ ~
ShiftArithmeticRight ~ ~ ~ ~ ~ -10.81% ~ +0.31%
Switch8Predictable +14.69% ~ ~ ~ ~ -24.85% ~ ~
Switch8Unpredictable ~ -0.58% -3.80% ~ ~ -11.78% ~ -0.79%
Switch32Predictable -10.33% +17.89% ~ ~ ~ +5.76% ~ ~
Switch32Unpredictable -3.15% +1.19% +9.42% ~ ~ -10.30% -5.09% +0.44%
SwitchStringPredictable +70.88% +20.48% ~ ~ ~ +2.39% ~ +0.31%
SwitchStringUnpredictable ~ +3.91% -5.06% -0.98% ~ +0.61% +2.03% ~
SwitchTypePredictable +146.58% -1.10% ~ -12.45% ~ -0.46% -3.81% ~
SwitchTypeUnpredictable +0.46% -0.83% ~ +4.18% ~ +0.43% ~ +0.62%
SwitchInterfaceTypePredictable -13.41% -10.13% +11.03% ~ ~ -4.38% ~ +0.75%
SwitchInterfaceTypeUnpredictable -6.37% -2.14% ~ -3.21% ~ -4.20% ~ +1.08%
Fixes #63110.
Fixes #75954.
Change-Id: I55a876f08c6c14f419ce1a8cbba2eaae6c6efbf0
Reviewed-on: https://go-review.googlesource.com/c/go/+/714160
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Diffstat (limited to 'src/cmd/compile/internal/ssa/_gen/generic.rules')
| -rw-r--r-- | src/cmd/compile/internal/ssa/_gen/generic.rules | 887 |
1 file changed, 56 insertions, 831 deletions
diff --git a/src/cmd/compile/internal/ssa/_gen/generic.rules b/src/cmd/compile/internal/ssa/_gen/generic.rules
index 795e9f052e..3f02644832 100644
--- a/src/cmd/compile/internal/ssa/_gen/generic.rules
+++ b/src/cmd/compile/internal/ssa/_gen/generic.rules
@@ -199,16 +199,6 @@
 (And(8|16|32|64) <t> (Com(8|16|32|64) x) (Com(8|16|32|64) y)) => (Com(8|16|32|64) (Or(8|16|32|64) <t> x y))
 (Or(8|16|32|64) <t> (Com(8|16|32|64) x) (Com(8|16|32|64) y)) => (Com(8|16|32|64) (And(8|16|32|64) <t> x y))
 
-// Convert multiplication by a power of two to a shift.
-(Mul8 <t> n (Const8 [c])) && isPowerOfTwo(c) => (Lsh8x64 <t> n (Const64 <typ.UInt64> [log8(c)]))
-(Mul16 <t> n (Const16 [c])) && isPowerOfTwo(c) => (Lsh16x64 <t> n (Const64 <typ.UInt64> [log16(c)]))
-(Mul32 <t> n (Const32 [c])) && isPowerOfTwo(c) => (Lsh32x64 <t> n (Const64 <typ.UInt64> [log32(c)]))
-(Mul64 <t> n (Const64 [c])) && isPowerOfTwo(c) => (Lsh64x64 <t> n (Const64 <typ.UInt64> [log64(c)]))
-(Mul8 <t> n (Const8 [c])) && t.IsSigned() && isPowerOfTwo(-c) => (Neg8 (Lsh8x64 <t> n (Const64 <typ.UInt64> [log8(-c)])))
-(Mul16 <t> n (Const16 [c])) && t.IsSigned() && isPowerOfTwo(-c) => (Neg16 (Lsh16x64 <t> n (Const64 <typ.UInt64> [log16(-c)])))
-(Mul32 <t> n (Const32 [c])) && t.IsSigned() && isPowerOfTwo(-c) => (Neg32 (Lsh32x64 <t> n (Const64 <typ.UInt64> [log32(-c)])))
-(Mul64 <t> n (Const64 [c])) && t.IsSigned() && isPowerOfTwo(-c) => (Neg64 (Lsh64x64 <t> n (Const64 <typ.UInt64> [log64(-c)])))
-
 (Mod8 (Const8 [c]) (Const8 [d])) && d != 0 => (Const8 [c % d])
 (Mod16 (Const16 [c]) (Const16 [d])) && d != 0 => (Const16 [c % d])
 (Mod32 (Const32 [c]) (Const32 [d])) && d != 0 => (Const32 [c % d])
@@ -380,13 +370,15 @@
 // Distribute multiplication c * (d+x) -> c*d + c*x. Useful for:
 // a[i].b = ...; a[i+1].b = ...
-(Mul64 (Const64 <t> [c]) (Add64 <t> (Const64 <t> [d]) x)) =>
+// The !isPowerOfTwo is a kludge to keep a[i+1] using an index by a multiply,
+// which turns into an index by a shift, which can use a shifted operand on ARM systems.
+(Mul64 (Const64 <t> [c]) (Add64 <t> (Const64 <t> [d]) x)) && !isPowerOfTwo(c) =>
   (Add64 (Const64 <t> [c*d]) (Mul64 <t> (Const64 <t> [c]) x))
-(Mul32 (Const32 <t> [c]) (Add32 <t> (Const32 <t> [d]) x)) =>
+(Mul32 (Const32 <t> [c]) (Add32 <t> (Const32 <t> [d]) x)) && !isPowerOfTwo(c) =>
   (Add32 (Const32 <t> [c*d]) (Mul32 <t> (Const32 <t> [c]) x))
-(Mul16 (Const16 <t> [c]) (Add16 <t> (Const16 <t> [d]) x)) =>
+(Mul16 (Const16 <t> [c]) (Add16 <t> (Const16 <t> [d]) x)) && !isPowerOfTwo(c) =>
   (Add16 (Const16 <t> [c*d]) (Mul16 <t> (Const16 <t> [c]) x))
-(Mul8 (Const8 <t> [c]) (Add8 <t> (Const8 <t> [d]) x)) =>
+(Mul8 (Const8 <t> [c]) (Add8 <t> (Const8 <t> [d]) x)) && !isPowerOfTwo(c) =>
   (Add8 (Const8 <t> [c*d]) (Mul8 <t> (Const8 <t> [c]) x))
 
 // Rewrite x*y ± x*z to x*(y±z)
@@ -1034,176 +1026,9 @@
 // We must ensure that no intermediate computations are invalid pointers.
 (Convert a:(Add(64|32) (Add(64|32) (Convert ptr mem) off1) off2) mem) => (AddPtr ptr (Add(64|32) <a.Type> off1 off2))
 
-// strength reduction of divide by a constant.
-// See ../magic.go for a detailed description of these algorithms.
-
-// Unsigned divide by power of 2. Strength reduce to a shift.
-(Div8u n (Const8 [c])) && isUnsignedPowerOfTwo(uint8(c)) => (Rsh8Ux64 n (Const64 <typ.UInt64> [log8u(uint8(c))]))
-(Div16u n (Const16 [c])) && isUnsignedPowerOfTwo(uint16(c)) => (Rsh16Ux64 n (Const64 <typ.UInt64> [log16u(uint16(c))]))
-(Div32u n (Const32 [c])) && isUnsignedPowerOfTwo(uint32(c)) => (Rsh32Ux64 n (Const64 <typ.UInt64> [log32u(uint32(c))]))
-(Div64u n (Const64 [c])) && isUnsignedPowerOfTwo(uint64(c)) => (Rsh64Ux64 n (Const64 <typ.UInt64> [log64u(uint64(c))]))
-
-// Signed non-negative divide by power of 2.
-(Div8 n (Const8 [c])) && isNonNegative(n) && isPowerOfTwo(c) => (Rsh8Ux64 n (Const64 <typ.UInt64> [log8(c)]))
-(Div16 n (Const16 [c])) && isNonNegative(n) && isPowerOfTwo(c) => (Rsh16Ux64 n (Const64 <typ.UInt64> [log16(c)]))
-(Div32 n (Const32 [c])) && isNonNegative(n) && isPowerOfTwo(c) => (Rsh32Ux64 n (Const64 <typ.UInt64> [log32(c)]))
-(Div64 n (Const64 [c])) && isNonNegative(n) && isPowerOfTwo(c) => (Rsh64Ux64 n (Const64 <typ.UInt64> [log64(c)]))
-(Div64 n (Const64 [-1<<63])) && isNonNegative(n) => (Const64 [0])
-
-// Unsigned divide, not a power of 2. Strength reduce to a multiply.
-// For 8-bit divides, we just do a direct 9-bit by 8-bit multiply.
-(Div8u x (Const8 [c])) && umagicOK8(c) =>
-  (Trunc32to8
-    (Rsh32Ux64 <typ.UInt32>
-      (Mul32 <typ.UInt32>
-        (Const32 <typ.UInt32> [int32(1<<8+umagic8(c).m)])
-        (ZeroExt8to32 x))
-      (Const64 <typ.UInt64> [8+umagic8(c).s])))
-
-// For 16-bit divides on 64-bit machines, we do a direct 17-bit by 16-bit multiply.
-(Div16u x (Const16 [c])) && umagicOK16(c) && config.RegSize == 8 =>
-  (Trunc64to16
-    (Rsh64Ux64 <typ.UInt64>
-      (Mul64 <typ.UInt64>
-        (Const64 <typ.UInt64> [int64(1<<16+umagic16(c).m)])
-        (ZeroExt16to64 x))
-      (Const64 <typ.UInt64> [16+umagic16(c).s])))
-
-// For 16-bit divides on 32-bit machines
-(Div16u x (Const16 [c])) && umagicOK16(c) && config.RegSize == 4 && umagic16(c).m&1 == 0 =>
-  (Trunc32to16
-    (Rsh32Ux64 <typ.UInt32>
-      (Mul32 <typ.UInt32>
-        (Const32 <typ.UInt32> [int32(1<<15+umagic16(c).m/2)])
-        (ZeroExt16to32 x))
-      (Const64 <typ.UInt64> [16+umagic16(c).s-1])))
-(Div16u x (Const16 [c])) && umagicOK16(c) && config.RegSize == 4 && c&1 == 0 =>
-  (Trunc32to16
-    (Rsh32Ux64 <typ.UInt32>
-      (Mul32 <typ.UInt32>
-        (Const32 <typ.UInt32> [int32(1<<15+(umagic16(c).m+1)/2)])
-        (Rsh32Ux64 <typ.UInt32> (ZeroExt16to32 x) (Const64 <typ.UInt64> [1])))
-      (Const64 <typ.UInt64> [16+umagic16(c).s-2])))
-(Div16u x (Const16 [c])) && umagicOK16(c) && config.RegSize == 4 && config.useAvg =>
-  (Trunc32to16
-    (Rsh32Ux64 <typ.UInt32>
-      (Avg32u
-        (Lsh32x64 <typ.UInt32> (ZeroExt16to32 x) (Const64 <typ.UInt64> [16]))
-        (Mul32 <typ.UInt32>
-          (Const32 <typ.UInt32> [int32(umagic16(c).m)])
-          (ZeroExt16to32 x)))
-      (Const64 <typ.UInt64> [16+umagic16(c).s-1])))
-
-// For 32-bit divides on 32-bit machines
-(Div32u x (Const32 [c])) && umagicOK32(c) && config.RegSize == 4 && umagic32(c).m&1 == 0 && config.useHmul =>
-  (Rsh32Ux64 <typ.UInt32>
-    (Hmul32u <typ.UInt32>
-      (Const32 <typ.UInt32> [int32(1<<31+umagic32(c).m/2)])
-      x)
-    (Const64 <typ.UInt64> [umagic32(c).s-1]))
-(Div32u x (Const32 [c])) && umagicOK32(c) && config.RegSize == 4 && c&1 == 0 && config.useHmul =>
-  (Rsh32Ux64 <typ.UInt32>
-    (Hmul32u <typ.UInt32>
-      (Const32 <typ.UInt32> [int32(1<<31+(umagic32(c).m+1)/2)])
-      (Rsh32Ux64 <typ.UInt32> x (Const64 <typ.UInt64> [1])))
-    (Const64 <typ.UInt64> [umagic32(c).s-2]))
-(Div32u x (Const32 [c])) && umagicOK32(c) && config.RegSize == 4 && config.useAvg && config.useHmul =>
-  (Rsh32Ux64 <typ.UInt32>
-    (Avg32u
-      x
-      (Hmul32u <typ.UInt32>
-        (Const32 <typ.UInt32> [int32(umagic32(c).m)])
-        x))
-    (Const64 <typ.UInt64> [umagic32(c).s-1]))
-
-// For 32-bit divides on 64-bit machines
-// We'll use a regular (non-hi) multiply for this case.
-(Div32u x (Const32 [c])) && umagicOK32(c) && config.RegSize == 8 && umagic32(c).m&1 == 0 =>
-  (Trunc64to32
-    (Rsh64Ux64 <typ.UInt64>
-      (Mul64 <typ.UInt64>
-        (Const64 <typ.UInt64> [int64(1<<31+umagic32(c).m/2)])
-        (ZeroExt32to64 x))
-      (Const64 <typ.UInt64> [32+umagic32(c).s-1])))
-(Div32u x (Const32 [c])) && umagicOK32(c) && config.RegSize == 8 && c&1 == 0 =>
-  (Trunc64to32
-    (Rsh64Ux64 <typ.UInt64>
-      (Mul64 <typ.UInt64>
-        (Const64 <typ.UInt64> [int64(1<<31+(umagic32(c).m+1)/2)])
-        (Rsh64Ux64 <typ.UInt64> (ZeroExt32to64 x) (Const64 <typ.UInt64> [1])))
-      (Const64 <typ.UInt64> [32+umagic32(c).s-2])))
-(Div32u x (Const32 [c])) && umagicOK32(c) && config.RegSize == 8 && config.useAvg =>
-  (Trunc64to32
-    (Rsh64Ux64 <typ.UInt64>
-      (Avg64u
-        (Lsh64x64 <typ.UInt64> (ZeroExt32to64 x) (Const64 <typ.UInt64> [32]))
-        (Mul64 <typ.UInt64>
-          (Const64 <typ.UInt32> [int64(umagic32(c).m)])
-          (ZeroExt32to64 x)))
-      (Const64 <typ.UInt64> [32+umagic32(c).s-1])))
-
-// For unsigned 64-bit divides on 32-bit machines,
-// if the constant fits in 16 bits (so that the last term
-// fits in 32 bits), convert to three 32-bit divides by a constant.
-//
-// If 1<<32 = Q * c + R
-// and x = hi << 32 + lo
-//
-// Then x = (hi/c*c + hi%c) << 32 + lo
-//        = hi/c*c<<32 + hi%c<<32 + lo
-//        = hi/c*c<<32 + (hi%c)*(Q*c+R) + lo/c*c + lo%c
-//        = hi/c*c<<32 + (hi%c)*Q*c + lo/c*c + (hi%c*R+lo%c)
-// and x / c = (hi/c)<<32 + (hi%c)*Q + lo/c + (hi%c*R+lo%c)/c
-(Div64u x (Const64 [c])) && c > 0 && c <= 0xFFFF && umagicOK32(int32(c)) && config.RegSize == 4 && config.useHmul =>
-  (Add64
-    (Add64 <typ.UInt64>
-      (Add64 <typ.UInt64>
-        (Lsh64x64 <typ.UInt64>
-          (ZeroExt32to64
-            (Div32u <typ.UInt32>
-              (Trunc64to32 <typ.UInt32> (Rsh64Ux64 <typ.UInt64> x (Const64 <typ.UInt64> [32])))
-              (Const32 <typ.UInt32> [int32(c)])))
-          (Const64 <typ.UInt64> [32]))
-        (ZeroExt32to64 (Div32u <typ.UInt32> (Trunc64to32 <typ.UInt32> x) (Const32 <typ.UInt32> [int32(c)]))))
-      (Mul64 <typ.UInt64>
-        (ZeroExt32to64 <typ.UInt64>
-          (Mod32u <typ.UInt32>
-            (Trunc64to32 <typ.UInt32> (Rsh64Ux64 <typ.UInt64> x (Const64 <typ.UInt64> [32])))
-            (Const32 <typ.UInt32> [int32(c)])))
-        (Const64 <typ.UInt64> [int64((1<<32)/c)])))
-    (ZeroExt32to64
-      (Div32u <typ.UInt32>
-        (Add32 <typ.UInt32>
-          (Mod32u <typ.UInt32> (Trunc64to32 <typ.UInt32> x) (Const32 <typ.UInt32> [int32(c)]))
-          (Mul32 <typ.UInt32>
-            (Mod32u <typ.UInt32>
-              (Trunc64to32 <typ.UInt32> (Rsh64Ux64 <typ.UInt64> x (Const64 <typ.UInt64> [32])))
-              (Const32 <typ.UInt32> [int32(c)]))
-            (Const32 <typ.UInt32> [int32((1<<32)%c)])))
-        (Const32 <typ.UInt32> [int32(c)]))))
-
-// For 64-bit divides on 64-bit machines
-// (64-bit divides on 32-bit machines are lowered to a runtime call by the walk pass.)
-(Div64u x (Const64 [c])) && umagicOK64(c) && config.RegSize == 8 && umagic64(c).m&1 == 0 && config.useHmul =>
-  (Rsh64Ux64 <typ.UInt64>
-    (Hmul64u <typ.UInt64>
-      (Const64 <typ.UInt64> [int64(1<<63+umagic64(c).m/2)])
-      x)
-    (Const64 <typ.UInt64> [umagic64(c).s-1]))
-(Div64u x (Const64 [c])) && umagicOK64(c) && config.RegSize == 8 && c&1 == 0 && config.useHmul =>
-  (Rsh64Ux64 <typ.UInt64>
-    (Hmul64u <typ.UInt64>
-      (Const64 <typ.UInt64> [int64(1<<63+(umagic64(c).m+1)/2)])
-      (Rsh64Ux64 <typ.UInt64> x (Const64 <typ.UInt64> [1])))
-    (Const64 <typ.UInt64> [umagic64(c).s-2]))
-(Div64u x (Const64 [c])) && umagicOK64(c) && config.RegSize == 8 && config.useAvg && config.useHmul =>
-  (Rsh64Ux64 <typ.UInt64>
-    (Avg64u
-      x
-      (Hmul64u <typ.UInt64>
-        (Const64 <typ.UInt64> [int64(umagic64(c).m)])
-        x))
-    (Const64 <typ.UInt64> [umagic64(c).s-1]))
+// Simplification of divisions.
+// Only trivial, easily analyzed (by prove) rewrites here.
+// Strength reduction of div to mul is delayed to divmod.rules.
 
 // Signed divide by a negative constant. Rewrite to divide by a positive constant.
 (Div8 <t> n (Const8 [c])) && c < 0 && c != -1<<7 => (Neg8 (Div8 <t> n (Const8 <t> [-c])))
@@ -1214,107 +1039,41 @@
 // Dividing by the most-negative number. Result is always 0 except
 // if the input is also the most-negative number.
 // We can detect that using the sign bit of x & -x.
+(Div64 x (Const64 [-1<<63])) && isNonNegative(x) => (Const64 [0])
 (Div8 <t> x (Const8 [-1<<7 ])) => (Rsh8Ux64 (And8 <t> x (Neg8 <t> x)) (Const64 <typ.UInt64> [7 ]))
 (Div16 <t> x (Const16 [-1<<15])) => (Rsh16Ux64 (And16 <t> x (Neg16 <t> x)) (Const64 <typ.UInt64> [15]))
 (Div32 <t> x (Const32 [-1<<31])) => (Rsh32Ux64 (And32 <t> x (Neg32 <t> x)) (Const64 <typ.UInt64> [31]))
 (Div64 <t> x (Const64 [-1<<63])) => (Rsh64Ux64 (And64 <t> x (Neg64 <t> x)) (Const64 <typ.UInt64> [63]))
 
-// Signed divide by power of 2.
-// n / c = n >> log(c) if n >= 0
-//       = (n+c-1) >> log(c) if n < 0
-// We conditionally add c-1 by adding n>>63>>(64-log(c)) (first shift signed, second shift unsigned).
-(Div8 <t> n (Const8 [c])) && isPowerOfTwo(c) =>
-  (Rsh8x64
-    (Add8 <t> n (Rsh8Ux64 <t> (Rsh8x64 <t> n (Const64 <typ.UInt64> [ 7])) (Const64 <typ.UInt64> [int64( 8-log8(c))])))
-    (Const64 <typ.UInt64> [int64(log8(c))]))
-(Div16 <t> n (Const16 [c])) && isPowerOfTwo(c) =>
-  (Rsh16x64
-    (Add16 <t> n (Rsh16Ux64 <t> (Rsh16x64 <t> n (Const64 <typ.UInt64> [15])) (Const64 <typ.UInt64> [int64(16-log16(c))])))
-    (Const64 <typ.UInt64> [int64(log16(c))]))
-(Div32 <t> n (Const32 [c])) && isPowerOfTwo(c) =>
-  (Rsh32x64
-    (Add32 <t> n (Rsh32Ux64 <t> (Rsh32x64 <t> n (Const64 <typ.UInt64> [31])) (Const64 <typ.UInt64> [int64(32-log32(c))])))
-    (Const64 <typ.UInt64> [int64(log32(c))]))
-(Div64 <t> n (Const64 [c])) && isPowerOfTwo(c) =>
-  (Rsh64x64
-    (Add64 <t> n (Rsh64Ux64 <t> (Rsh64x64 <t> n (Const64 <typ.UInt64> [63])) (Const64 <typ.UInt64> [int64(64-log64(c))])))
-    (Const64 <typ.UInt64> [int64(log64(c))]))
+// Unsigned divide by power of 2. Strength reduce to a shift.
+(Div8u n (Const8 [c])) && isUnsignedPowerOfTwo(uint8(c)) => (Rsh8Ux64 n (Const64 <typ.UInt64> [log8u(uint8(c))]))
+(Div16u n (Const16 [c])) && isUnsignedPowerOfTwo(uint16(c)) => (Rsh16Ux64 n (Const64 <typ.UInt64> [log16u(uint16(c))]))
+(Div32u n (Const32 [c])) && isUnsignedPowerOfTwo(uint32(c)) => (Rsh32Ux64 n (Const64 <typ.UInt64> [log32u(uint32(c))]))
+(Div64u n (Const64 [c])) && isUnsignedPowerOfTwo(uint64(c)) => (Rsh64Ux64 n (Const64 <typ.UInt64> [log64u(uint64(c))]))
+
+// Strength reduce multiplication by a power of two to a shift.
+// Excluded from early opt so that prove can recognize mod
+// by the x - (x/d)*d pattern.
+// (Runs during "middle opt" and "late opt".)
+(Mul8 <t> x (Const8 [c])) && isPowerOfTwo(c) && v.Block.Func.pass.name != "opt" =>
+  (Lsh8x64 <t> x (Const64 <typ.UInt64> [log8(c)]))
+(Mul16 <t> x (Const16 [c])) && isPowerOfTwo(c) && v.Block.Func.pass.name != "opt" =>
+  (Lsh16x64 <t> x (Const64 <typ.UInt64> [log16(c)]))
+(Mul32 <t> x (Const32 [c])) && isPowerOfTwo(c) && v.Block.Func.pass.name != "opt" =>
+  (Lsh32x64 <t> x (Const64 <typ.UInt64> [log32(c)]))
+(Mul64 <t> x (Const64 [c])) && isPowerOfTwo(c) && v.Block.Func.pass.name != "opt" =>
+  (Lsh64x64 <t> x (Const64 <typ.UInt64> [log64(c)]))
+(Mul8 <t> x (Const8 [c])) && t.IsSigned() && isPowerOfTwo(-c) && v.Block.Func.pass.name != "opt" =>
+  (Neg8 (Lsh8x64 <t> x (Const64 <typ.UInt64> [log8(-c)])))
+(Mul16 <t> x (Const16 [c])) && t.IsSigned() && isPowerOfTwo(-c) && v.Block.Func.pass.name != "opt" =>
+  (Neg16 (Lsh16x64 <t> x (Const64 <typ.UInt64> [log16(-c)])))
+(Mul32 <t> x (Const32 [c])) && t.IsSigned() && isPowerOfTwo(-c) && v.Block.Func.pass.name != "opt" =>
+  (Neg32 (Lsh32x64 <t> x (Const64 <typ.UInt64> [log32(-c)])))
+(Mul64 <t> x (Const64 [c])) && t.IsSigned() && isPowerOfTwo(-c) && v.Block.Func.pass.name != "opt" =>
+  (Neg64 (Lsh64x64 <t> x (Const64 <typ.UInt64> [log64(-c)])))
-
-// Signed divide, not a power of 2. Strength reduce to a multiply.
-(Div8 <t> x (Const8 [c])) && smagicOK8(c) =>
-  (Sub8 <t>
-    (Rsh32x64 <t>
-      (Mul32 <typ.UInt32>
-        (Const32 <typ.UInt32> [int32(smagic8(c).m)])
-        (SignExt8to32 x))
-      (Const64 <typ.UInt64> [8+smagic8(c).s]))
-    (Rsh32x64 <t>
-      (SignExt8to32 x)
-      (Const64 <typ.UInt64> [31])))
-(Div16 <t> x (Const16 [c])) && smagicOK16(c) =>
-  (Sub16 <t>
-    (Rsh32x64 <t>
-      (Mul32 <typ.UInt32>
-        (Const32 <typ.UInt32> [int32(smagic16(c).m)])
-        (SignExt16to32 x))
-      (Const64 <typ.UInt64> [16+smagic16(c).s]))
-    (Rsh32x64 <t>
-      (SignExt16to32 x)
-      (Const64 <typ.UInt64> [31])))
-(Div32 <t> x (Const32 [c])) && smagicOK32(c) && config.RegSize == 8 =>
-  (Sub32 <t>
-    (Rsh64x64 <t>
-      (Mul64 <typ.UInt64>
-        (Const64 <typ.UInt64> [int64(smagic32(c).m)])
-        (SignExt32to64 x))
-      (Const64 <typ.UInt64> [32+smagic32(c).s]))
-    (Rsh64x64 <t>
-      (SignExt32to64 x)
-      (Const64 <typ.UInt64> [63])))
-(Div32 <t> x (Const32 [c])) && smagicOK32(c) && config.RegSize == 4 && smagic32(c).m&1 == 0 && config.useHmul =>
-  (Sub32 <t>
-    (Rsh32x64 <t>
-      (Hmul32 <t>
-        (Const32 <typ.UInt32> [int32(smagic32(c).m/2)])
-        x)
-      (Const64 <typ.UInt64> [smagic32(c).s-1]))
-    (Rsh32x64 <t>
-      x
-      (Const64 <typ.UInt64> [31])))
-(Div32 <t> x (Const32 [c])) && smagicOK32(c) && config.RegSize == 4 && smagic32(c).m&1 != 0 && config.useHmul =>
-  (Sub32 <t>
-    (Rsh32x64 <t>
-      (Add32 <t>
-        (Hmul32 <t>
-          (Const32 <typ.UInt32> [int32(smagic32(c).m)])
-          x)
-        x)
-      (Const64 <typ.UInt64> [smagic32(c).s]))
-    (Rsh32x64 <t>
-      x
-      (Const64 <typ.UInt64> [31])))
-(Div64 <t> x (Const64 [c])) && smagicOK64(c) && smagic64(c).m&1 == 0 && config.useHmul =>
-  (Sub64 <t>
-    (Rsh64x64 <t>
-      (Hmul64 <t>
-        (Const64 <typ.UInt64> [int64(smagic64(c).m/2)])
-        x)
-      (Const64 <typ.UInt64> [smagic64(c).s-1]))
-    (Rsh64x64 <t>
-      x
-      (Const64 <typ.UInt64> [63])))
-(Div64 <t> x (Const64 [c])) && smagicOK64(c) && smagic64(c).m&1 != 0 && config.useHmul =>
-  (Sub64 <t>
-    (Rsh64x64 <t>
-      (Add64 <t>
-        (Hmul64 <t>
-          (Const64 <typ.UInt64> [int64(smagic64(c).m)])
-          x)
-        x)
-      (Const64 <typ.UInt64> [smagic64(c).s]))
-    (Rsh64x64 <t>
-      x
-      (Const64 <typ.UInt64> [63])))
+// Strength reduction of mod to div.
+// Strength reduction of div to mul is delayed to genericlateopt.rules.
 
 // Unsigned mod by power of 2 constant.
 (Mod8u <t> n (Const8 [c])) && isUnsignedPowerOfTwo(uint8(c)) => (And8 n (Const8 <t> [c-1]))
@@ -1323,6 +1082,7 @@
 (Mod64u <t> n (Const64 [c])) && isUnsignedPowerOfTwo(uint64(c)) => (And64 n (Const64 <t> [c-1]))
 
 // Signed non-negative mod by power of 2 constant.
+// TODO: Replace ModN with ModNu in prove.
 (Mod8 <t> n (Const8 [c])) && isNonNegative(n) && isPowerOfTwo(c) => (And8 n (Const8 <t> [c-1]))
 (Mod16 <t> n (Const16 [c])) && isNonNegative(n) && isPowerOfTwo(c) => (And16 n (Const16 <t> [c-1]))
 (Mod32 <t> n (Const32 [c])) && isNonNegative(n) && isPowerOfTwo(c) => (And32 n (Const32 <t> [c-1]))
@@ -1355,7 +1115,9 @@
 (Mod64u <t> x (Const64 [c])) && x.Op != OpConst64 && c > 0 && umagicOK64(c) =>
   (Sub64 x (Mul64 <t> (Div64u <t> x (Const64 <t> [c])) (Const64 <t> [c])))
 
-// For architectures without rotates on less than 32-bits, promote these checks to 32-bit.
+// Set up for mod->mul+rot optimization in genericlateopt.rules.
+// For architectures without rotates on less than 32-bits, promote to 32-bit.
+// TODO: Also != 0 case?
 (Eq8 (Mod8u x (Const8 [c])) (Const8 [0])) && x.Op != OpConst8 && udivisibleOK8(c) && !hasSmallRotate(config) =>
   (Eq32 (Mod32u <typ.UInt32> (ZeroExt8to32 <typ.UInt32> x) (Const32 <typ.UInt32> [int32(uint8(c))])) (Const32 <typ.UInt32> [0]))
 (Eq16 (Mod16u x (Const16 [c])) (Const16 [0])) && x.Op != OpConst16 && udivisibleOK16(c) && !hasSmallRotate(config) =>
@@ -1365,557 +1127,6 @@
 (Eq16 (Mod16 x (Const16 [c])) (Const16 [0])) && x.Op != OpConst16 && sdivisibleOK16(c) && !hasSmallRotate(config) =>
   (Eq32 (Mod32 <typ.Int32> (SignExt16to32 <typ.Int32> x) (Const32 <typ.Int32> [int32(c)])) (Const32 <typ.Int32> [0]))
 
-// Divisibility checks x%c == 0 convert to multiply and rotate.
-// Note, x%c == 0 is rewritten as x == c*(x/c) during the opt pass
-// where (x/c) is performed using multiplication with magic constants.
-// To rewrite x%c == 0 requires pattern matching the rewritten expression
-// and checking that the division by the same constant wasn't already calculated.
-// This check is made by counting uses of the magic constant multiplication.
-// Note that if there were an intermediate opt pass, this rule could be applied
-// directly on the Div op and magic division rewrites could be delayed to late opt.
-
-// Unsigned divisibility checks convert to multiply and rotate.
-(Eq8 x (Mul8 (Const8 [c])
-  (Trunc32to8
-    (Rsh32Ux64
-      mul:(Mul32
-        (Const32 [m])
-        (ZeroExt8to32 x))
-      (Const64 [s])))
-  )
-)
-  && v.Block.Func.pass.name != "opt" && mul.Uses == 1
-  && m == int32(1<<8+umagic8(c).m) && s == 8+umagic8(c).s
-  && x.Op != OpConst8 && udivisibleOK8(c)
-  => (Leq8U
-    (RotateLeft8 <typ.UInt8>
-      (Mul8 <typ.UInt8>
-        (Const8 <typ.UInt8> [int8(udivisible8(c).m)])
-        x)
-      (Const8 <typ.UInt8> [int8(8-udivisible8(c).k)])
-    )
-    (Const8 <typ.UInt8> [int8(udivisible8(c).max)])
-  )
-
-(Eq16 x (Mul16 (Const16 [c])
-  (Trunc64to16
-    (Rsh64Ux64
-      mul:(Mul64
-        (Const64 [m])
-        (ZeroExt16to64 x))
-      (Const64 [s])))
-  )
-)
-  && v.Block.Func.pass.name != "opt" && mul.Uses == 1
-  && m == int64(1<<16+umagic16(c).m) && s == 16+umagic16(c).s
-  && x.Op != OpConst16 && udivisibleOK16(c)
-  => (Leq16U
-    (RotateLeft16 <typ.UInt16>
-      (Mul16 <typ.UInt16>
-        (Const16 <typ.UInt16> [int16(udivisible16(c).m)])
-        x)
-      (Const16 <typ.UInt16> [int16(16-udivisible16(c).k)])
-    )
-    (Const16 <typ.UInt16> [int16(udivisible16(c).max)])
-  )
-
-(Eq16 x (Mul16 (Const16 [c])
-  (Trunc32to16
-    (Rsh32Ux64
-      mul:(Mul32
-        (Const32 [m])
-        (ZeroExt16to32 x))
-      (Const64 [s])))
-  )
-)
-  && v.Block.Func.pass.name != "opt" && mul.Uses == 1
-  && m == int32(1<<15+umagic16(c).m/2) && s == 16+umagic16(c).s-1
-  && x.Op != OpConst16 && udivisibleOK16(c)
-  => (Leq16U
-    (RotateLeft16 <typ.UInt16>
-      (Mul16 <typ.UInt16>
-        (Const16 <typ.UInt16> [int16(udivisible16(c).m)])
-        x)
-      (Const16 <typ.UInt16> [int16(16-udivisible16(c).k)])
-    )
-    (Const16 <typ.UInt16> [int16(udivisible16(c).max)])
-  )
-
-(Eq16 x (Mul16 (Const16 [c])
-  (Trunc32to16
-    (Rsh32Ux64
-      mul:(Mul32
-        (Const32 [m])
-        (Rsh32Ux64 (ZeroExt16to32 x) (Const64 [1])))
-      (Const64 [s])))
-  )
-)
-  && v.Block.Func.pass.name != "opt" && mul.Uses == 1
-  && m == int32(1<<15+(umagic16(c).m+1)/2) && s == 16+umagic16(c).s-2
-  && x.Op != OpConst16 && udivisibleOK16(c)
-  => (Leq16U
-    (RotateLeft16 <typ.UInt16>
-      (Mul16 <typ.UInt16>
-        (Const16 <typ.UInt16> [int16(udivisible16(c).m)])
-        x)
-      (Const16 <typ.UInt16> [int16(16-udivisible16(c).k)])
-    )
-    (Const16 <typ.UInt16> [int16(udivisible16(c).max)])
-  )
-
-(Eq16 x (Mul16 (Const16 [c])
-  (Trunc32to16
-    (Rsh32Ux64
-      (Avg32u
-        (Lsh32x64 (ZeroExt16to32 x) (Const64 [16]))
-        mul:(Mul32
-          (Const32 [m])
-          (ZeroExt16to32 x)))
-      (Const64 [s])))
-  )
-)
-  && v.Block.Func.pass.name != "opt" && mul.Uses == 1
-  && m == int32(umagic16(c).m) && s == 16+umagic16(c).s-1
-  && x.Op != OpConst16 && udivisibleOK16(c)
-  => (Leq16U
-    (RotateLeft16 <typ.UInt16>
-      (Mul16 <typ.UInt16>
-        (Const16 <typ.UInt16> [int16(udivisible16(c).m)])
-        x)
-      (Const16 <typ.UInt16> [int16(16-udivisible16(c).k)])
-    )
-    (Const16 <typ.UInt16> [int16(udivisible16(c).max)])
-  )
-
-(Eq32 x (Mul32 (Const32 [c])
-  (Rsh32Ux64
-    mul:(Hmul32u
-      (Const32 [m])
-      x)
-    (Const64 [s]))
-  )
-)
-  && v.Block.Func.pass.name != "opt" && mul.Uses == 1
-  && m == int32(1<<31+umagic32(c).m/2) && s == umagic32(c).s-1
-  && x.Op != OpConst32 && udivisibleOK32(c)
-  => (Leq32U
-    (RotateLeft32 <typ.UInt32>
-      (Mul32 <typ.UInt32>
-        (Const32 <typ.UInt32> [int32(udivisible32(c).m)])
-        x)
-      (Const32 <typ.UInt32> [int32(32-udivisible32(c).k)])
-    )
-    (Const32 <typ.UInt32> [int32(udivisible32(c).max)])
-  )
-
-(Eq32 x (Mul32 (Const32 [c])
-  (Rsh32Ux64
-    mul:(Hmul32u
-      (Const32 <typ.UInt32> [m])
-      (Rsh32Ux64 x (Const64 [1])))
-    (Const64 [s]))
-  )
-)
-  && v.Block.Func.pass.name != "opt" && mul.Uses == 1
-  && m == int32(1<<31+(umagic32(c).m+1)/2) && s == umagic32(c).s-2
-  && x.Op != OpConst32 && udivisibleOK32(c)
-  => (Leq32U
-    (RotateLeft32 <typ.UInt32>
-      (Mul32 <typ.UInt32>
-        (Const32 <typ.UInt32> [int32(udivisible32(c).m)])
-        x)
-      (Const32 <typ.UInt32> [int32(32-udivisible32(c).k)])
-    )
-    (Const32 <typ.UInt32> [int32(udivisible32(c).max)])
-  )
-
-(Eq32 x (Mul32 (Const32 [c])
-  (Rsh32Ux64
-    (Avg32u
-      x
-      mul:(Hmul32u
-        (Const32 [m])
-        x))
-    (Const64 [s]))
-  )
-)
-  && v.Block.Func.pass.name != "opt" && mul.Uses == 1
-  && m == int32(umagic32(c).m) && s == umagic32(c).s-1
-  && x.Op != OpConst32 && udivisibleOK32(c)
-  => (Leq32U
-    (RotateLeft32 <typ.UInt32>
-      (Mul32 <typ.UInt32>
-        (Const32 <typ.UInt32> [int32(udivisible32(c).m)])
-        x)
-      (Const32 <typ.UInt32> [int32(32-udivisible32(c).k)])
-    )
-    (Const32 <typ.UInt32> [int32(udivisible32(c).max)])
-  )
-
-(Eq32 x (Mul32 (Const32 [c])
-  (Trunc64to32
-    (Rsh64Ux64
-      mul:(Mul64
-        (Const64 [m])
-        (ZeroExt32to64 x))
-      (Const64 [s])))
-  )
-)
-  && v.Block.Func.pass.name != "opt" && mul.Uses == 1
-  && m == int64(1<<31+umagic32(c).m/2) && s == 32+umagic32(c).s-1
-  && x.Op != OpConst32 && udivisibleOK32(c)
-  => (Leq32U
-    (RotateLeft32 <typ.UInt32>
-      (Mul32 <typ.UInt32>
-        (Const32 <typ.UInt32> [int32(udivisible32(c).m)])
-        x)
-      (Const32 <typ.UInt32> [int32(32-udivisible32(c).k)])
-    )
-    (Const32 <typ.UInt32> [int32(udivisible32(c).max)])
-  )
-
-(Eq32 x (Mul32 (Const32 [c])
-  (Trunc64to32
-    (Rsh64Ux64
-      mul:(Mul64
-        (Const64 [m])
-        (Rsh64Ux64 (ZeroExt32to64 x) (Const64 [1])))
-      (Const64 [s])))
-  )
-)
-  && v.Block.Func.pass.name != "opt" && mul.Uses == 1
-  && m == int64(1<<31+(umagic32(c).m+1)/2) && s == 32+umagic32(c).s-2
-  && x.Op != OpConst32 && udivisibleOK32(c)
-  => (Leq32U
-    (RotateLeft32 <typ.UInt32>
-      (Mul32 <typ.UInt32>
-        (Const32 <typ.UInt32> [int32(udivisible32(c).m)])
-        x)
-      (Const32 <typ.UInt32> [int32(32-udivisible32(c).k)])
-    )
-    (Const32 <typ.UInt32> [int32(udivisible32(c).max)])
-  )
-
-(Eq32 x (Mul32 (Const32 [c])
-  (Trunc64to32
-    (Rsh64Ux64
-      (Avg64u
-        (Lsh64x64 (ZeroExt32to64 x) (Const64 [32]))
-        mul:(Mul64
-          (Const64 [m])
-          (ZeroExt32to64 x)))
-      (Const64 [s])))
-  )
-)
-  && v.Block.Func.pass.name != "opt" && mul.Uses == 1
-  && m == int64(umagic32(c).m) && s == 32+umagic32(c).s-1
-  && x.Op != OpConst32 && udivisibleOK32(c)
-  => (Leq32U
-    (RotateLeft32 <typ.UInt32>
-      (Mul32 <typ.UInt32>
-        (Const32 <typ.UInt32> [int32(udivisible32(c).m)])
-        x)
-      (Const32 <typ.UInt32> [int32(32-udivisible32(c).k)])
-    )
-    (Const32 <typ.UInt32> [int32(udivisible32(c).max)])
-  )
-
-(Eq64 x (Mul64 (Const64 [c])
-  (Rsh64Ux64
-    mul:(Hmul64u
-      (Const64 [m])
-      x)
-    (Const64 [s]))
-  )
-) && v.Block.Func.pass.name != "opt" && mul.Uses == 1
-  && m == int64(1<<63+umagic64(c).m/2) && s == umagic64(c).s-1
-  && x.Op != OpConst64 && udivisibleOK64(c)
-  => (Leq64U
-    (RotateLeft64 <typ.UInt64>
-      (Mul64 <typ.UInt64>
-        (Const64 <typ.UInt64> [int64(udivisible64(c).m)])
-        x)
-      (Const64 <typ.UInt64> [64-udivisible64(c).k])
-    )
-    (Const64 <typ.UInt64> [int64(udivisible64(c).max)])
-  )
-(Eq64 x (Mul64 (Const64 [c])
-  (Rsh64Ux64
-    mul:(Hmul64u
-      (Const64 [m])
-      (Rsh64Ux64 x (Const64 [1])))
-    (Const64 [s]))
-  )
-) && v.Block.Func.pass.name != "opt" && mul.Uses == 1
-  && m == int64(1<<63+(umagic64(c).m+1)/2) && s == umagic64(c).s-2
-  && x.Op != OpConst64 && udivisibleOK64(c)
-  => (Leq64U
-    (RotateLeft64 <typ.UInt64>
-      (Mul64 <typ.UInt64>
-        (Const64 <typ.UInt64> [int64(udivisible64(c).m)])
-        x)
-      (Const64 <typ.UInt64> [64-udivisible64(c).k])
-    )
-    (Const64 <typ.UInt64> [int64(udivisible64(c).max)])
-  )
-(Eq64 x (Mul64 (Const64 [c])
-  (Rsh64Ux64
-    (Avg64u
-      x
-      mul:(Hmul64u
-        (Const64 [m])
-        x))
-    (Const64 [s]))
-  )
-) && v.Block.Func.pass.name != "opt" && mul.Uses == 1
-  && m == int64(umagic64(c).m) && s == umagic64(c).s-1
-  && x.Op != OpConst64 && udivisibleOK64(c)
-  => (Leq64U
-    (RotateLeft64 <typ.UInt64>
-      (Mul64 <typ.UInt64>
-        (Const64 <typ.UInt64>
[int64(udivisible64(c).m)]) - x) - (Const64 <typ.UInt64> [64-udivisible64(c).k]) - ) - (Const64 <typ.UInt64> [int64(udivisible64(c).max)]) - ) - -// Signed divisibility checks convert to multiply, add and rotate. -(Eq8 x (Mul8 (Const8 [c]) - (Sub8 - (Rsh32x64 - mul:(Mul32 - (Const32 [m]) - (SignExt8to32 x)) - (Const64 [s])) - (Rsh32x64 - (SignExt8to32 x) - (Const64 [31]))) - ) -) - && v.Block.Func.pass.name != "opt" && mul.Uses == 1 - && m == int32(smagic8(c).m) && s == 8+smagic8(c).s - && x.Op != OpConst8 && sdivisibleOK8(c) - => (Leq8U - (RotateLeft8 <typ.UInt8> - (Add8 <typ.UInt8> - (Mul8 <typ.UInt8> - (Const8 <typ.UInt8> [int8(sdivisible8(c).m)]) - x) - (Const8 <typ.UInt8> [int8(sdivisible8(c).a)]) - ) - (Const8 <typ.UInt8> [int8(8-sdivisible8(c).k)]) - ) - (Const8 <typ.UInt8> [int8(sdivisible8(c).max)]) - ) - -(Eq16 x (Mul16 (Const16 [c]) - (Sub16 - (Rsh32x64 - mul:(Mul32 - (Const32 [m]) - (SignExt16to32 x)) - (Const64 [s])) - (Rsh32x64 - (SignExt16to32 x) - (Const64 [31]))) - ) -) - && v.Block.Func.pass.name != "opt" && mul.Uses == 1 - && m == int32(smagic16(c).m) && s == 16+smagic16(c).s - && x.Op != OpConst16 && sdivisibleOK16(c) - => (Leq16U - (RotateLeft16 <typ.UInt16> - (Add16 <typ.UInt16> - (Mul16 <typ.UInt16> - (Const16 <typ.UInt16> [int16(sdivisible16(c).m)]) - x) - (Const16 <typ.UInt16> [int16(sdivisible16(c).a)]) - ) - (Const16 <typ.UInt16> [int16(16-sdivisible16(c).k)]) - ) - (Const16 <typ.UInt16> [int16(sdivisible16(c).max)]) - ) - -(Eq32 x (Mul32 (Const32 [c]) - (Sub32 - (Rsh64x64 - mul:(Mul64 - (Const64 [m]) - (SignExt32to64 x)) - (Const64 [s])) - (Rsh64x64 - (SignExt32to64 x) - (Const64 [63]))) - ) -) - && v.Block.Func.pass.name != "opt" && mul.Uses == 1 - && m == int64(smagic32(c).m) && s == 32+smagic32(c).s - && x.Op != OpConst32 && sdivisibleOK32(c) - => (Leq32U - (RotateLeft32 <typ.UInt32> - (Add32 <typ.UInt32> - (Mul32 <typ.UInt32> - (Const32 <typ.UInt32> [int32(sdivisible32(c).m)]) - x) - (Const32 <typ.UInt32> [int32(sdivisible32(c).a)]) 
- ) - (Const32 <typ.UInt32> [int32(32-sdivisible32(c).k)]) - ) - (Const32 <typ.UInt32> [int32(sdivisible32(c).max)]) - ) - -(Eq32 x (Mul32 (Const32 [c]) - (Sub32 - (Rsh32x64 - mul:(Hmul32 - (Const32 [m]) - x) - (Const64 [s])) - (Rsh32x64 - x - (Const64 [31]))) - ) -) - && v.Block.Func.pass.name != "opt" && mul.Uses == 1 - && m == int32(smagic32(c).m/2) && s == smagic32(c).s-1 - && x.Op != OpConst32 && sdivisibleOK32(c) - => (Leq32U - (RotateLeft32 <typ.UInt32> - (Add32 <typ.UInt32> - (Mul32 <typ.UInt32> - (Const32 <typ.UInt32> [int32(sdivisible32(c).m)]) - x) - (Const32 <typ.UInt32> [int32(sdivisible32(c).a)]) - ) - (Const32 <typ.UInt32> [int32(32-sdivisible32(c).k)]) - ) - (Const32 <typ.UInt32> [int32(sdivisible32(c).max)]) - ) - -(Eq32 x (Mul32 (Const32 [c]) - (Sub32 - (Rsh32x64 - (Add32 - mul:(Hmul32 - (Const32 [m]) - x) - x) - (Const64 [s])) - (Rsh32x64 - x - (Const64 [31]))) - ) -) - && v.Block.Func.pass.name != "opt" && mul.Uses == 1 - && m == int32(smagic32(c).m) && s == smagic32(c).s - && x.Op != OpConst32 && sdivisibleOK32(c) - => (Leq32U - (RotateLeft32 <typ.UInt32> - (Add32 <typ.UInt32> - (Mul32 <typ.UInt32> - (Const32 <typ.UInt32> [int32(sdivisible32(c).m)]) - x) - (Const32 <typ.UInt32> [int32(sdivisible32(c).a)]) - ) - (Const32 <typ.UInt32> [int32(32-sdivisible32(c).k)]) - ) - (Const32 <typ.UInt32> [int32(sdivisible32(c).max)]) - ) - -(Eq64 x (Mul64 (Const64 [c]) - (Sub64 - (Rsh64x64 - mul:(Hmul64 - (Const64 [m]) - x) - (Const64 [s])) - (Rsh64x64 - x - (Const64 [63]))) - ) -) - && v.Block.Func.pass.name != "opt" && mul.Uses == 1 - && m == int64(smagic64(c).m/2) && s == smagic64(c).s-1 - && x.Op != OpConst64 && sdivisibleOK64(c) - => (Leq64U - (RotateLeft64 <typ.UInt64> - (Add64 <typ.UInt64> - (Mul64 <typ.UInt64> - (Const64 <typ.UInt64> [int64(sdivisible64(c).m)]) - x) - (Const64 <typ.UInt64> [int64(sdivisible64(c).a)]) - ) - (Const64 <typ.UInt64> [64-sdivisible64(c).k]) - ) - (Const64 <typ.UInt64> [int64(sdivisible64(c).max)]) - ) - -(Eq64 x (Mul64 
(Const64 [c]) - (Sub64 - (Rsh64x64 - (Add64 - mul:(Hmul64 - (Const64 [m]) - x) - x) - (Const64 [s])) - (Rsh64x64 - x - (Const64 [63]))) - ) -) - && v.Block.Func.pass.name != "opt" && mul.Uses == 1 - && m == int64(smagic64(c).m) && s == smagic64(c).s - && x.Op != OpConst64 && sdivisibleOK64(c) - => (Leq64U - (RotateLeft64 <typ.UInt64> - (Add64 <typ.UInt64> - (Mul64 <typ.UInt64> - (Const64 <typ.UInt64> [int64(sdivisible64(c).m)]) - x) - (Const64 <typ.UInt64> [int64(sdivisible64(c).a)]) - ) - (Const64 <typ.UInt64> [64-sdivisible64(c).k]) - ) - (Const64 <typ.UInt64> [int64(sdivisible64(c).max)]) - ) - -// Divisibility check for signed integers for power of two constant are simple mask. -// However, we must match against the rewritten n%c == 0 -> n - c*(n/c) == 0 -> n == c*(n/c) -// where n/c contains fixup code to handle signed n. -((Eq8|Neq8) n (Lsh8x64 - (Rsh8x64 - (Add8 <t> n (Rsh8Ux64 <t> (Rsh8x64 <t> n (Const64 <typ.UInt64> [ 7])) (Const64 <typ.UInt64> [kbar]))) - (Const64 <typ.UInt64> [k])) - (Const64 <typ.UInt64> [k])) -) && k > 0 && k < 7 && kbar == 8 - k - => ((Eq8|Neq8) (And8 <t> n (Const8 <t> [1<<uint(k)-1])) (Const8 <t> [0])) - -((Eq16|Neq16) n (Lsh16x64 - (Rsh16x64 - (Add16 <t> n (Rsh16Ux64 <t> (Rsh16x64 <t> n (Const64 <typ.UInt64> [15])) (Const64 <typ.UInt64> [kbar]))) - (Const64 <typ.UInt64> [k])) - (Const64 <typ.UInt64> [k])) -) && k > 0 && k < 15 && kbar == 16 - k - => ((Eq16|Neq16) (And16 <t> n (Const16 <t> [1<<uint(k)-1])) (Const16 <t> [0])) - -((Eq32|Neq32) n (Lsh32x64 - (Rsh32x64 - (Add32 <t> n (Rsh32Ux64 <t> (Rsh32x64 <t> n (Const64 <typ.UInt64> [31])) (Const64 <typ.UInt64> [kbar]))) - (Const64 <typ.UInt64> [k])) - (Const64 <typ.UInt64> [k])) -) && k > 0 && k < 31 && kbar == 32 - k - => ((Eq32|Neq32) (And32 <t> n (Const32 <t> [1<<uint(k)-1])) (Const32 <t> [0])) - -((Eq64|Neq64) n (Lsh64x64 - (Rsh64x64 - (Add64 <t> n (Rsh64Ux64 <t> (Rsh64x64 <t> n (Const64 <typ.UInt64> [63])) (Const64 <typ.UInt64> [kbar]))) - (Const64 <typ.UInt64> [k])) - (Const64 
<typ.UInt64> [k])) -) && k > 0 && k < 63 && kbar == 64 - k - => ((Eq64|Neq64) (And64 <t> n (Const64 <t> [1<<uint(k)-1])) (Const64 <t> [0])) - (Eq(8|16|32|64) s:(Sub(8|16|32|64) x y) (Const(8|16|32|64) [0])) && s.Uses == 1 => (Eq(8|16|32|64) x y) (Neq(8|16|32|64) s:(Sub(8|16|32|64) x y) (Const(8|16|32|64) [0])) && s.Uses == 1 => (Neq(8|16|32|64) x y) @@ -1925,6 +1136,20 @@ (Neq(8|16|32|64) (And(8|16|32|64) <t> x (Const(8|16|32|64) <t> [y])) (Const(8|16|32|64) <t> [y])) && oneBit(y) => (Eq(8|16|32|64) (And(8|16|32|64) <t> x (Const(8|16|32|64) <t> [y])) (Const(8|16|32|64) <t> [0])) +// Mark newly generated bounded shifts as bounded, for opt passes after prove. +(Lsh64x(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 64 => (Lsh64x(8|16|32|64) [true] x con) +(Rsh64x(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 64 => (Rsh64x(8|16|32|64) [true] x con) +(Rsh64Ux(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 64 => (Rsh64Ux(8|16|32|64) [true] x con) +(Lsh32x(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 32 => (Lsh32x(8|16|32|64) [true] x con) +(Rsh32x(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 32 => (Rsh32x(8|16|32|64) [true] x con) +(Rsh32Ux(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 32 => (Rsh32Ux(8|16|32|64) [true] x con) +(Lsh16x(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 16 => (Lsh16x(8|16|32|64) [true] x con) +(Rsh16x(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 16 => (Rsh16x(8|16|32|64) [true] x con) +(Rsh16Ux(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 16 => (Rsh16Ux(8|16|32|64) [true] x con) +(Lsh8x(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 8 => (Lsh8x(8|16|32|64) [true] x con) +(Rsh8x(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 8 => (Rsh8x(8|16|32|64) [true] x con) +(Rsh8Ux(8|16|32|64) [false] x con:(Const(8|16|32|64) 
[c])) && 0 < c && c < 8 => (Rsh8Ux(8|16|32|64) [true] x con) + // Reassociate expressions involving // constants such that constants come first, // exposing obvious constant-folding opportunities. |
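For readers unfamiliar with the multiply-and-rotate trick that the removed unsigned rules match, here is a hedged sketch in Go. The helper names (`divisibleMagic`, `divisible`) are illustrative, not the compiler's `udivisible32`; the constants mirror what the rules expect: `m` is the multiplicative inverse of the odd part of `c` modulo 2^32, `k` is its trailing-zero count, and `max` is floor((2^32-1)/c). Rotating left by `32-k` is the same right rotation by `k` that the `(RotateLeft32 ... [32-udivisible32(c).k])` pattern encodes.

```go
package main

import (
	"fmt"
	"math/bits"
)

// divisibleMagic computes constants so that, for d = d0<<k with d0 odd,
//   x%d == 0  <=>  rotr(x*m, k) <= max
// where m is the inverse of d0 mod 2^32 and max = (2^32-1)/d.
// (Illustrative names; the compiler's helper is udivisible32.)
func divisibleMagic(d uint32) (m uint32, k int, max uint32) {
	k = bits.TrailingZeros32(d)
	d0 := d >> uint(k)
	m = d0 // any odd d0 is its own inverse mod 2^3
	for i := 0; i < 4; i++ {
		m *= 2 - d0*m // Newton's iteration doubles the correct bits each round
	}
	max = ^uint32(0) / d
	return
}

// divisible reports whether x%d == 0 using only a multiply, a rotate,
// and a compare, matching the shape the rewrite rules emit.
func divisible(x, d uint32) bool {
	m, k, max := divisibleMagic(d)
	// RotateLeft32 by 32-k is a right rotation by k.
	return bits.RotateLeft32(x*m, 32-k) <= max
}

func main() {
	// Cross-check the fast test against x%d for a few divisors.
	for _, d := range []uint32{3, 6, 7, 10, 12, 1000} {
		for x := uint32(0); x < 5000; x++ {
			if divisible(x, d) != (x%d == 0) {
				fmt.Println("mismatch at", x, d)
				return
			}
		}
	}
	fmt.Println("ok")
}
```

The same shape generalizes to 8-, 16-, and 64-bit widths, which is why the rules come in four families.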

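The signed power-of-two rules above rely on an equivalence worth spelling out: the front end lowers `n % (1<<k)` for signed `n` with a sign fixup (add `c-1` before shifting when `n` is negative, so the quotient truncates toward zero), and the rule collapses the resulting `n == (fix>>k)<<k` comparison to a plain mask test. A small self-check of that equivalence, with illustrative names rather than compiler code:

```go
package main

import "fmt"

// divisiblePow2 evaluates the pre-rewrite shape for c = 1<<k:
//   fix = n + ((n>>31) >>> (32-k))   // adds c-1 only when n is negative
//   n%c == 0  <=>  n == (fix>>k)<<k
// which the removed rules simplify to n&(c-1) == 0.
func divisiblePow2(n int32, k uint) bool {
	kbar := 32 - k
	fix := n + int32(uint32(n>>31)>>kbar)
	return n == (fix>>k)<<k
}

func main() {
	for n := int32(-40); n <= 40; n++ {
		for k := uint(1); k < 5; k++ {
			want := n&(1<<k-1) == 0 // the mask test the rules emit
			if divisiblePow2(n, k) != want {
				fmt.Println("mismatch at", n, k)
				return
			}
		}
	}
	fmt.Println("ok")
}
```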