| Age | Commit message | Author |
|
In the mips{,64} instruction sets and their extensions, there is no
NORI instruction.
Change-Id: If008442c792297d011b3d0c1e8501e62e32ab175
Reviewed-on: https://go-review.googlesource.com/c/go/+/735900
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Merge the signed and unsigned generic functions.
The only implementation difference between the two is the
n > 0 vs n != 0 check.
For unsigned numbers, n > 0 is equivalent to n != 0, and we in fact
optimize the first to the second.
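
A minimal sketch of why the two checks can share one implementation. The helper names positive and nonZero are made up for illustration; they are not the functions this CL merges.

```go
package main

import "fmt"

// positive and nonZero are hypothetical helpers for illustration only.
func positive[T int32 | uint32](n T) bool { return n > 0 }
func nonZero[T int32 | uint32](n T) bool  { return n != 0 }

func main() {
	// For every unsigned value the two predicates agree.
	for _, n := range []uint32{0, 1, 1 << 31, ^uint32(0)} {
		fmt.Println(n, positive(n) == nonZero(n))
	}
	// For signed values they differ on negative inputs.
	fmt.Println(positive(int32(-1)), nonZero(int32(-1)))
}
```

The signed case is the only one where the stricter n > 0 check matters, which is why a single generic function can presumably keep that form for both.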
Change-Id: Ia2f6c3e3d4eb098d98f85e06dc2e81baa60bad4e
Reviewed-on: https://go-review.googlesource.com/c/go/+/726720
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This CL implements Mul64uhilo, Hmul64, Hmul64u, and Avg64u
on 32-bit systems, with the effect that constant division of both
int64s and uint64s can now be emitted directly in all cases,
and also that bits.Mul64 can be intrinsified on 32-bit systems.
Previously, constant division of uint64s by values 0 ≤ c ≤ 0xFFFF was
implemented as uint32 divisions by c and some fixup. After expanding
those smaller constant divisions, the code for i/999 required:
(386) 7 mul, 10 add, 2 sub, 3 rotate, 3 shift (104 bytes)
(arm) 7 mul, 9 add, 3 sub, 2 shift (104 bytes)
(mips) 7 mul, 10 add, 5 sub, 6 shift, 3 sgtu (176 bytes)
For that much code, we might as well use a full 64x64->128 multiply
that can be used for all divisors, not just small ones.
Having done that, the same i/999 now generates:
(386) 4 mul, 9 add, 2 sub, 2 or, 6 shift (112 bytes)
(arm) 4 mul, 8 add, 2 sub, 2 or, 3 shift (92 bytes)
(mips) 4 mul, 11 add, 3 sub, 6 shift, 8 sgtu, 4 or (196 bytes)
The size increase on 386 is due to a few extra register spills.
The size increase on mips is due to add-with-carry being hard.
The new approach is more general, letting us delete the old special case
and guarantee that all int64 and uint64 divisions by constants are
generated directly on 32-bit systems.
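
As a rough illustration of the idea (not the compiler's generated code), the sketch below builds a 64x64->128 multiply from 32-bit halves, the way a 32-bit target has to, and uses its high word to divide by a constant. The names mul64 and div3 are hypothetical, and the divisor 3 is chosen only because its magic constant is well known.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mul64 returns the 128-bit product of x and y, assembled from four
// partial products of 32-bit halves. On a real 32-bit target each
// partial product would be a 32x32->64 multiply.
func mul64(x, y uint64) (hi, lo uint64) {
	const mask32 = 1<<32 - 1
	x0, x1 := x&mask32, x>>32
	y0, y1 := y&mask32, y>>32
	w0 := x0 * y0
	t := x1*y0 + w0>>32
	w1 := t & mask32
	w2 := t >> 32
	w1 += x0 * y1
	hi = x1*y1 + w2 + w1>>32
	lo = x * y
	return
}

// div3 divides by the constant 3 using only the high word of a multiply.
func div3(x uint64) uint64 {
	const magic = 0xAAAAAAAAAAAAAAAB // (2^65 + 1) / 3
	hi, _ := mul64(x, magic)
	return hi >> 1
}

func main() {
	x, y := uint64(0x123456789ABCDEF0), uint64(999)
	hi, lo := mul64(x, y)
	wantHi, wantLo := bits.Mul64(x, y)
	fmt.Println(hi == wantHi, lo == wantLo)
	fmt.Println(div3(1000) == 1000/3)
}
```

Because the high-multiply form works for any divisor, only the magic constant and the final shift change from one constant to the next.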
This especially speeds up code making heavy use of bits.Mul64 with
a constant argument, which happens in strconv and various crypto
packages. A few examples are benchmarked below.
pkg: cmd/compile/internal/test
benchmark \ host          local  linux-amd64       s7  linux-386  s7:GOARCH=386
                        vs base      vs base  vs base    vs base        vs base
DivconstI64                   ~            ~        ~    -49.66%        -21.02%
ModconstI64                   ~            ~        ~    -13.45%        +14.52%
DivisiblePow2constI64         ~            ~        ~     +0.97%         -1.32%
DivisibleconstI64             ~            ~        ~    -20.01%        -48.28%
DivisibleWDivconstI64         ~            ~   -1.76%    -38.59%        -42.74%
DivconstU64/3                 ~            ~        ~    -13.82%         -4.09%
DivconstU64/5                 ~            ~        ~    -14.10%         -3.54%
DivconstU64/37           -2.07%       -4.45%        ~    -19.60%         -9.55%
DivconstU64/1234567           ~            ~        ~    -61.55%        -56.93%
ModconstU64                   ~            ~        ~     -6.25%              ~
DivisibleconstU64             ~            ~        ~     -2.78%         -7.82%
DivisibleWDivconstU64         ~            ~        ~     +4.23%         +2.56%
pkg: math/bits
benchmark \ host         s7  linux-amd64  linux-386  s7:GOARCH=386
                    vs base      vs base    vs base        vs base
Add                       ~            ~          ~              ~
Add32                +1.59%            ~          ~              ~
Add64                     ~            ~          ~              ~
Add64multiple             ~            ~          ~              ~
Sub                       ~            ~          ~              ~
Sub32                     ~            ~          ~              ~
Sub64                     ~            ~     -9.20%              ~
Sub64multiple             ~            ~          ~              ~
Mul                       ~            ~          ~              ~
Mul32                     ~            ~          ~              ~
Mul64                     ~            ~    -41.58%        -53.21%
Div                       ~            ~          ~              ~
Div32                     ~            ~          ~              ~
Div64                     ~            ~          ~              ~
pkg: strconv
benchmark \ host                       s7  linux-amd64  linux-386  s7:GOARCH=386
                                  vs base      vs base    vs base        vs base
ParseInt/Pos/7bit                       ~            ~    -11.08%         -6.75%
ParseInt/Pos/26bit                      ~            ~    -13.65%        -11.02%
ParseInt/Pos/31bit                      ~            ~    -14.65%         -9.71%
ParseInt/Pos/56bit                 -1.80%            ~    -17.97%        -10.78%
ParseInt/Pos/63bit                      ~            ~    -13.85%         -9.63%
ParseInt/Neg/7bit                       ~            ~    -12.14%         -7.26%
ParseInt/Neg/26bit                      ~            ~    -14.18%         -9.81%
ParseInt/Neg/31bit                      ~            ~    -14.51%         -9.02%
ParseInt/Neg/56bit                      ~            ~    -15.79%         -9.79%
ParseInt/Neg/63bit                      ~            ~    -15.68%        -11.07%
AppendFloat/Decimal                     ~            ~     -7.25%        -12.26%
AppendFloat/Float                       ~            ~    -15.96%        -19.45%
AppendFloat/Exp                         ~            ~    -13.96%        -17.76%
AppendFloat/NegExp                      ~            ~    -14.89%        -20.27%
AppendFloat/LongExp                     ~            ~    -12.68%        -17.97%
AppendFloat/Big                         ~            ~    -11.10%        -16.64%
AppendFloat/BinaryExp                   ~            ~          ~              ~
AppendFloat/32Integer                   ~            ~    -10.05%        -10.91%
AppendFloat/32ExactFraction             ~            ~     -8.93%        -13.00%
AppendFloat/32Point                     ~            ~    -10.36%        -14.89%
AppendFloat/32Exp                       ~            ~     -9.88%        -13.54%
AppendFloat/32NegExp                    ~            ~    -10.16%        -14.26%
AppendFloat/32Shortest                  ~            ~    -11.39%        -14.96%
AppendFloat/32Fixed8Hard                ~            ~          ~         -2.31%
AppendFloat/32Fixed9Hard                ~            ~          ~         -7.01%
AppendFloat/64Fixed1                    ~            ~     -2.83%         -8.23%
AppendFloat/64Fixed2                    ~            ~          ~         -7.94%
AppendFloat/64Fixed3                    ~            ~     -4.07%         -7.22%
AppendFloat/64Fixed4                    ~            ~     -7.24%         -7.62%
AppendFloat/64Fixed12                   ~            ~     -6.57%         -4.82%
AppendFloat/64Fixed16                   ~            ~     -4.00%         -5.81%
AppendFloat/64Fixed12Hard          -2.22%            ~     -4.07%         -6.35%
AppendFloat/64Fixed17Hard          -2.12%            ~          ~         -3.79%
AppendFloat/64Fixed18Hard          -1.89%            ~     +2.48%              ~
AppendFloat/Slowpath64             -1.85%            ~    -14.49%        -18.21%
AppendFloat/SlowpathDenormal64          ~            ~    -13.08%        -19.41%
pkg: crypto/internal/fips140/nistec/fiat
benchmark \ host         s7  linux-amd64  linux-386  s7:GOARCH=386
                    vs base      vs base    vs base        vs base
Mul/P224                  ~            ~    -29.95%        -39.60%
Mul/P384                  ~            ~    -37.11%        -63.33%
Mul/P521                  ~            ~    -26.62%        -12.42%
Square/P224          +1.46%            ~    -40.62%        -49.18%
Square/P384               ~            ~    -45.51%        -69.68%
Square/P521         +90.37%            ~    -25.26%        -11.23%
(The +90% is a separate problem and not real; that much variation
can be seen on that system by running the same binary from two
different files.)
pkg: crypto/internal/fips140/edwards25519
benchmark \ host                    s7  linux-amd64  linux-386  s7:GOARCH=386
                               vs base      vs base    vs base        vs base
EncodingDecoding                     ~            ~    -34.67%        -35.75%
ScalarBaseMult                       ~            ~    -31.25%        -30.29%
ScalarMult                           ~            ~    -33.45%        -32.54%
VarTimeDoubleScalarBaseMult          ~            ~    -33.78%        -33.68%
Change-Id: Id3c91d42cd01def6731b755e99f8f40c6ad1bb65
Reviewed-on: https://go-review.googlesource.com/c/go/+/716061
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Change-Id: Ied54ea7bf68c4c943c621ca059aca1048903c041
Reviewed-on: https://go-review.googlesource.com/c/go/+/682497
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Julian Zhu <jz531210@gmail.com>
Reviewed-by: Mark Freeman <mark@golang.org>
|
|
Just using isUnsignedPowerOfTwo and log32u is enough.
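
The commit message does not quote the rule being simplified, so the following is only a sketch of what an unsigned power-of-two test plus a base-2 logarithm provide. The names isPow2U32 and log2U32 are hypothetical stand-ins, not the compiler's helpers, and the multiply-to-shift use is an assumed example.

```go
package main

import (
	"fmt"
	"math/bits"
)

// isPow2U32 reports whether n has exactly one bit set (hypothetical stand-in).
func isPow2U32(n uint32) bool { return n != 0 && n&(n-1) == 0 }

// log2U32 returns floor(log2(n)) for n > 0 (hypothetical stand-in).
func log2U32(n uint32) int { return bits.Len32(n) - 1 }

func main() {
	for _, c := range []uint32{1, 8, 12, 0x80000000} {
		if isPow2U32(c) {
			fmt.Printf("x * %#x can become x << %d\n", c, log2U32(c))
		} else {
			fmt.Printf("%#x is not a power of two\n", c)
		}
	}
}
```

Treating the constant as unsigned means 0x80000000 is recognized as 1<<31, whereas a signed 32-bit view of the same bits is negative.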
Change-Id: I93d49ab71c6245d05f6507adbcb9ef2a696e75d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/691476
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
This enables publicationBarrier to be used as an intrinsic on mipsx.
Change-Id: Ic199f34b84b3058bcfab79aac8f2399ff21a97ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/674856
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Fold negation into addition/subtraction and avoid double negation.
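
The algebraic identities behind this kind of folding can be checked in plain Go. This is a sketch of the arithmetic only, not the rewrite rules themselves; the function names are invented for the example.

```go
package main

import "fmt"

// Each function is written in the unfolded form; the comment gives the
// folded form that computes the same value under wraparound arithmetic.
func addNeg(x, y int32) int32 { return x + (-y) } // x - y
func subNeg(x, y int32) int32 { return x - (-y) } // x + y
func negNeg(x int32) int32    { return -(-x) }    // x

func main() {
	// The minimum int32 is included because its negation wraps to itself.
	x, y := int32(7), int32(-1<<31)
	fmt.Println(addNeg(x, y) == x-y, subNeg(x, y) == x+y, negNeg(y) == y)
}
```

Each folded form needs one instruction fewer than the unfolded one, which is presumably the source of the small size reductions in the table.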
file      before    after     Δ       %
addr2line 3742022   3741986   -36     -0.001%
asm       6668616   6668628   +12     +0.000%
buildid   3583786   3583630   -156    -0.004%
cgo       6020370   6019634   -736    -0.012%
compile   29416016  29417336  +1320   +0.004%
cover     6801903   6801675   -228    -0.003%
dist      4485916   4485816   -100    -0.002%
doc       10652787  10652251  -536    -0.005%
fix       4115988   4115560   -428    -0.010%
link      9002328   9001616   -712    -0.008%
nm        3733148   3732780   -368    -0.010%
objdump   6163292   6163068   -224    -0.004%
pack      2944768   2944604   -164    -0.006%
pprof     18909973  18908773  -1200   -0.006%
test2json 3394662   3394778   +116    +0.003%
trace     17350911  17349751  -1160   -0.007%
vet       10077727  10077527  -200    -0.002%
go        19118769  19118609  -160    -0.001%
total     166182982 166178022 -4960   -0.003%
Change-Id: Id55698800fd70f3cb2ff48393584456b87208921
Reviewed-on: https://go-review.googlesource.com/c/go/+/673556
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Decompose Ctz16 and Ctz8 within the SSA rules for LOONG64, MIPS, PPC64
and S390X, rather than having a custom intrinsic. Note that for PPC64 this
actually allows the existing Ctz16 and Ctz8 rules to be used.
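
One common way to decompose a narrow count-trailing-zeros into a 32-bit one is to set a guard bit just above the narrow value, so a zero input still yields the narrow width. The sketch below assumes that decomposition; the exact rules in the CL are not quoted here, and ctz16 and ctz8 are illustrative names.

```go
package main

import (
	"fmt"
	"math/bits"
)

// ctz16 counts trailing zeros of a 16-bit value using a 32-bit count.
// The guard bit at position 16 makes ctz16(0) == 16.
func ctz16(x uint16) int { return bits.TrailingZeros32(uint32(x) | 1<<16) }

// ctz8 does the same for 8-bit values with a guard bit at position 8.
func ctz8(x uint8) int { return bits.TrailingZeros32(uint32(x) | 1<<8) }

func main() {
	ok := true
	for i := 0; i < 1<<16; i++ {
		if ctz16(uint16(i)) != bits.TrailingZeros16(uint16(i)) {
			ok = false
		}
	}
	for i := 0; i < 1<<8; i++ {
		if ctz8(uint8(i)) != bits.TrailingZeros8(uint8(i)) {
			ok = false
		}
	}
	fmt.Println(ok)
}
```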
Change-Id: I27a5e978f852b9d75396d2a80f5d7dfcb5ef7dd4
Reviewed-on: https://go-review.googlesource.com/c/go/+/651816
Reviewed-by: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Decompose BitLen16 and BitLen8 within the SSA rules for architectures that
support BitLen32 or BitLen64, rather than having a custom intrinsic.
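
For bit length the decomposition is simpler than for trailing zeros: zero-extending the narrow value does not change the position of its highest set bit. A minimal sketch, with bitLen16 and bitLen8 as illustrative names rather than compiler ops:

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitLen16 and bitLen8 zero-extend to 32 bits and reuse the 32-bit length.
func bitLen16(x uint16) int { return bits.Len32(uint32(x)) }
func bitLen8(x uint8) int   { return bits.Len32(uint32(x)) }

func main() {
	ok := true
	for i := 0; i < 1<<16; i++ {
		if bitLen16(uint16(i)) != bits.Len16(uint16(i)) {
			ok = false
		}
	}
	for i := 0; i < 1<<8; i++ {
		if bitLen8(uint8(i)) != bits.Len8(uint8(i)) {
			ok = false
		}
	}
	fmt.Println(ok)
}
```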
Change-Id: Ie4188ce69d1021e63cec27a8e7418efb0714812b
Reviewed-on: https://go-review.googlesource.com/c/go/+/651817
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Change-Id: I097b53e9f13de6ff6eb18ae2261842b097f26390
Reviewed-on: https://go-review.googlesource.com/c/go/+/615197
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
This CL uses MFC1/MTC1 instructions to move float/int values between
GPRs and FPRs, instead of going through stores and loads.
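
At the source level, the kind of operation that needs a GPR/FPR transfer is a bit-pattern reinterpretation such as math.Float32bits. The sketch below only shows that source pattern; which instructions the compiler picks for it is not visible from Go code, and roundTrip is a name made up for the example.

```go
package main

import (
	"fmt"
	"math"
)

// roundTrip reinterprets a float32 as its IEEE 754 bit pattern and back.
func roundTrip(f float32) (uint32, float32) {
	b := math.Float32bits(f)
	return b, math.Float32frombits(b)
}

func main() {
	b, f := roundTrip(1.5)
	fmt.Printf("%#x %v\n", b, f)
}
```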
goos: linux
goarch: mipsle
pkg: math
│ oldmathf │ newmathf │
│ sec/op │ sec/op vs base │
Acos-4 282.7n ± 0% 282.1n ± 0% -0.18% (p=0.010 n=8)
Acosh-4 450.8n ± 0% 450.9n ± 0% ~ (p=0.699 n=8)
Asin-4 272.6n ± 0% 272.1n ± 0% ~ (p=0.050 n=8)
Asinh-4 476.8n ± 0% 475.1n ± 0% -0.35% (p=0.018 n=8)
Atan-4 208.1n ± 0% 207.7n ± 0% -0.17% (p=0.009 n=8)
Atanh-4 448.8n ± 0% 448.7n ± 0% -0.03% (p=0.014 n=8)
Atan2-4 310.2n ± 0% 310.1n ± 0% ~ (p=0.133 n=8)
Cbrt-4 357.9n ± 0% 358.4n ± 0% +0.11% (p=0.014 n=8)
Ceil-4 203.8n ± 0% 204.7n ± 0% +0.42% (p=0.008 n=8)
Compare-4 21.12n ± 0% 22.09n ± 0% +4.59% (p=0.000 n=8)
Compare32-4 19.105n ± 0% 6.022n ± 0% -68.48% (p=0.000 n=8)
Copysign-4 33.17n ± 0% 33.15n ± 0% ~ (p=0.795 n=8)
Cos-4 385.2n ± 0% 384.8n ± 1% ~ (p=0.112 n=8)
Cosh-4 546.0n ± 0% 545.0n ± 0% -0.17% (p=0.012 n=8)
Erf-4 192.4n ± 0% 195.4n ± 1% +1.59% (p=0.000 n=8)
Erfc-4 187.8n ± 0% 192.7n ± 0% +2.64% (p=0.000 n=8)
Erfinv-4 221.8n ± 1% 219.8n ± 0% -0.88% (p=0.000 n=8)
Erfcinv-4 224.1n ± 1% 219.9n ± 0% -1.87% (p=0.000 n=8)
Exp-4 434.7n ± 0% 435.0n ± 0% ~ (p=0.339 n=8)
ExpGo-4 433.7n ± 0% 434.2n ± 0% +0.13% (p=0.005 n=8)
Expm1-4 243.0n ± 0% 242.9n ± 0% ~ (p=0.103 n=8)
Exp2-4 426.6n ± 0% 426.6n ± 0% ~ (p=0.822 n=8)
Exp2Go-4 425.6n ± 0% 425.5n ± 0% ~ (p=0.377 n=8)
Abs-4 8.033n ± 0% 8.029n ± 0% ~ (p=0.065 n=8)
Dim-4 18.07n ± 0% 18.07n ± 0% ~ (p=0.051 n=8)
Floor-4 151.6n ± 0% 151.6n ± 0% ~ (p=0.450 n=8)
Max-4 100.9n ± 8% 103.2n ± 2% ~ (p=0.099 n=8)
Min-4 116.4n ± 0% 116.4n ± 0% ~ (p=0.467 n=8)
Mod-4 959.6n ± 1% 950.9n ± 0% -0.91% (p=0.006 n=8)
Frexp-4 147.6n ± 0% 147.5n ± 0% -0.07% (p=0.026 n=8)
Gamma-4 482.7n ± 0% 478.2n ± 2% -0.92% (p=0.000 n=8)
Hypot-4 139.8n ± 1% 127.1n ± 8% -9.12% (p=0.000 n=8)
HypotGo-4 137.2n ± 7% 117.5n ± 2% -14.39% (p=0.001 n=8)
Ilogb-4 109.5n ± 0% 108.4n ± 1% -1.05% (p=0.001 n=8)
J0-4 1.304µ ± 0% 1.304µ ± 0% ~ (p=0.853 n=8)
J1-4 1.349µ ± 0% 1.331µ ± 0% -1.33% (p=0.000 n=8)
Jn-4 2.774µ ± 0% 2.750µ ± 0% -0.87% (p=0.000 n=8)
Ldexp-4 151.6n ± 0% 151.5n ± 0% ~ (p=0.695 n=8)
Lgamma-4 226.9n ± 0% 233.9n ± 0% +3.09% (p=0.000 n=8)
Log-4 407.6n ± 0% 407.4n ± 0% ~ (p=0.340 n=8)
Logb-4 121.5n ± 0% 121.5n ± 0% -0.08% (p=0.042 n=8)
Log1p-4 315.5n ± 0% 315.6n ± 0% ~ (p=0.930 n=8)
Log10-4 417.8n ± 0% 417.5n ± 0% ~ (p=0.053 n=8)
Log2-4 208.8n ± 0% 208.8n ± 0% ~ (p=0.582 n=8)
Modf-4 126.5n ± 0% 126.4n ± 0% ~ (p=0.128 n=8)
Nextafter32-4 112.45n ± 0% 82.27n ± 0% -26.84% (p=0.000 n=8)
Nextafter64-4 141.5n ± 0% 141.5n ± 0% ~ (p=0.569 n=8)
PowInt-4 754.0n ± 1% 754.6n ± 0% ~ (p=0.279 n=8)
PowFrac-4 1.608µ ± 1% 1.596µ ± 1% ~ (p=0.661 n=8)
Pow10Pos-4 18.07n ± 0% 18.07n ± 0% ~ (p=0.413 n=8)
Pow10Neg-4 17.08n ± 0% 18.07n ± 0% +5.80% (p=0.000 n=8)
Round-4 68.30n ± 0% 69.29n ± 0% +1.45% (p=0.000 n=8)
RoundToEven-4 78.33n ± 0% 78.34n ± 0% ~ (p=0.975 n=8)
Remainder-4 740.6n ± 1% 736.7n ± 0% ~ (p=0.098 n=8)
Signbit-4 18.08n ± 0% 18.07n ± 0% ~ (p=0.546 n=8)
Sin-4 389.4n ± 0% 389.5n ± 0% ~ (p=0.451 n=8)
Sincos-4 415.6n ± 0% 415.6n ± 0% ~ (p=0.450 n=8)
Sinh-4 607.0n ± 0% 590.8n ± 1% -2.68% (p=0.000 n=8)
SqrtIndirect-4 8.034n ± 0% 8.030n ± 0% ~ (p=0.487 n=8)
SqrtLatency-4 8.031n ± 0% 8.034n ± 0% ~ (p=0.152 n=8)
SqrtIndirectLatency-4 8.032n ± 0% 8.032n ± 0% ~ (p=0.818 n=8)
SqrtGoLatency-4 895.8n ± 0% 895.3n ± 0% ~ (p=0.553 n=8)
SqrtPrime-4 5.405µ ± 0% 5.379µ ± 0% -0.48% (p=0.000 n=8)
Tan-4 405.6n ± 0% 405.7n ± 0% ~ (p=0.980 n=8)
Tanh-4 545.1n ± 0% 545.1n ± 0% ~ (p=0.806 n=8)
Trunc-4 146.5n ± 0% 146.6n ± 0% ~ (p=0.380 n=8)
Y0-4 1.308µ ± 0% 1.306µ ± 0% ~ (p=0.071 n=8)
Y1-4 1.311µ ± 0% 1.315µ ± 0% +0.31% (p=0.000 n=8)
Yn-4 2.737µ ± 0% 2.745µ ± 0% +0.27% (p=0.000 n=8)
Float64bits-4 14.56n ± 0% 14.56n ± 0% ~ (p=0.689 n=8)
Float64frombits-4 19.08n ± 0% 19.08n ± 0% ~ (p=0.580 n=8)
Float32bits-4 13.050n ± 0% 5.019n ± 0% -61.54% (p=0.000 n=8)
Float32frombits-4 13.060n ± 0% 4.016n ± 0% -69.25% (p=0.000 n=8)
FMA-4 608.5n ± 0% 586.1n ± 0% -3.67% (p=0.000 n=8)
geomean 185.5n 176.2n -5.02%
Change-Id: Ibf91092ffe70104e6c5ec03bc76d51259818b9b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/494535
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This commit optimizes the math.Abs implementation on mipsx.
Tested on a Loongson 3A2000.
goos: linux
goarch: mipsle
pkg: math
│ oldmath │ newmath │
│ sec/op │ sec/op vs base │
Acos-4 282.6n ± 0% 282.3n ± 0% ~ (p=0.140 n=7)
Acosh-4 506.1n ± 0% 451.8n ± 0% -10.73% (p=0.001 n=7)
Asin-4 272.3n ± 0% 272.2n ± 0% ~ (p=0.808 n=7)
Asinh-4 529.7n ± 0% 475.3n ± 0% -10.27% (p=0.001 n=7)
Atan-4 208.2n ± 0% 207.9n ± 0% ~ (p=0.134 n=7)
Atanh-4 503.4n ± 1% 449.7n ± 0% -10.67% (p=0.001 n=7)
Atan2-4 310.5n ± 0% 310.5n ± 0% ~ (p=0.928 n=7)
Cbrt-4 359.3n ± 0% 358.8n ± 0% ~ (p=0.121 n=7)
Ceil-4 203.9n ± 0% 204.0n ± 0% ~ (p=0.600 n=7)
Compare-4 23.11n ± 0% 23.11n ± 0% ~ (p=0.702 n=7)
Compare32-4 19.09n ± 0% 19.12n ± 0% ~ (p=0.070 n=7)
Copysign-4 33.20n ± 0% 34.02n ± 0% +2.47% (p=0.001 n=7)
Cos-4 422.5n ± 0% 385.4n ± 1% -8.78% (p=0.001 n=7)
Cosh-4 628.0n ± 0% 545.5n ± 0% -13.14% (p=0.001 n=7)
Erf-4 193.7n ± 2% 192.7n ± 1% ~ (p=0.430 n=7)
Erfc-4 192.8n ± 1% 193.0n ± 0% ~ (p=0.245 n=7)
Erfinv-4 220.7n ± 1% 221.5n ± 2% ~ (p=0.272 n=7)
Erfcinv-4 221.3n ± 1% 220.4n ± 2% ~ (p=0.738 n=7)
Exp-4 471.4n ± 0% 435.1n ± 0% -7.70% (p=0.001 n=7)
ExpGo-4 470.6n ± 0% 434.0n ± 0% -7.78% (p=0.001 n=7)
Expm1-4 243.1n ± 0% 243.4n ± 0% ~ (p=0.417 n=7)
Exp2-4 463.1n ± 0% 427.0n ± 0% -7.80% (p=0.001 n=7)
Exp2Go-4 462.4n ± 0% 426.2n ± 5% -7.83% (p=0.001 n=7)
Abs-4 37.000n ± 0% 8.039n ± 9% -78.27% (p=0.001 n=7)
Dim-4 18.09n ± 0% 18.11n ± 0% ~ (p=0.094 n=7)
Floor-4 151.9n ± 0% 151.8n ± 0% ~ (p=0.190 n=7)
Max-4 116.7n ± 1% 116.7n ± 1% ~ (p=0.842 n=7)
Min-4 116.6n ± 1% 116.6n ± 0% ~ (p=0.464 n=7)
Mod-4 1244.0n ± 0% 980.9n ± 0% -21.15% (p=0.001 n=7)
Frexp-4 199.0n ± 0% 146.7n ± 0% -26.28% (p=0.001 n=7)
Gamma-4 516.4n ± 0% 479.3n ± 1% -7.18% (p=0.001 n=7)
Hypot-4 169.8n ± 0% 117.8n ± 2% -30.62% (p=0.001 n=7)
HypotGo-4 170.8n ± 0% 117.5n ± 0% -31.21% (p=0.001 n=7)
Ilogb-4 160.8n ± 0% 109.5n ± 0% -31.90% (p=0.001 n=7)
J0-4 1.359µ ± 0% 1.305µ ± 0% -3.97% (p=0.001 n=7)
J1-4 1.386µ ± 0% 1.334µ ± 0% -3.75% (p=0.001 n=7)
Jn-4 2.864µ ± 0% 2.758µ ± 0% -3.70% (p=0.001 n=7)
Ldexp-4 202.9n ± 0% 151.7n ± 0% -25.23% (p=0.001 n=7)
Lgamma-4 234.0n ± 0% 234.3n ± 0% ~ (p=0.199 n=7)
Log-4 444.1n ± 0% 407.9n ± 0% -8.15% (p=0.001 n=7)
Logb-4 157.8n ± 0% 121.6n ± 0% -22.94% (p=0.001 n=7)
Log1p-4 354.8n ± 0% 315.4n ± 0% -11.10% (p=0.001 n=7)
Log10-4 453.9n ± 0% 417.9n ± 0% -7.93% (p=0.001 n=7)
Log2-4 245.3n ± 0% 209.1n ± 0% -14.76% (p=0.001 n=7)
Modf-4 126.6n ± 0% 126.6n ± 0% ~ (p=0.126 n=7)
Nextafter32-4 112.5n ± 0% 112.5n ± 0% ~ (p=0.853 n=7)
Nextafter64-4 141.7n ± 0% 141.6n ± 0% ~ (p=0.331 n=7)
PowInt-4 878.8n ± 1% 758.3n ± 1% -13.71% (p=0.001 n=7)
PowFrac-4 1.809µ ± 0% 1.615µ ± 0% -10.72% (p=0.001 n=7)
Pow10Pos-4 18.10n ± 0% 18.12n ± 0% ~ (p=0.464 n=7)
Pow10Neg-4 17.09n ± 0% 17.09n ± 0% ~ (p=0.263 n=7)
Round-4 68.36n ± 0% 68.33n ± 0% ~ (p=0.325 n=7)
RoundToEven-4 78.40n ± 0% 78.40n ± 0% ~ (p=0.934 n=7)
Remainder-4 894.0n ± 1% 753.4n ± 1% -15.73% (p=0.001 n=7)
Signbit-4 18.09n ± 0% 18.09n ± 0% ~ (p=0.761 n=7)
Sin-4 389.8n ± 1% 389.8n ± 0% ~ (p=0.995 n=7)
Sincos-4 416.0n ± 0% 415.9n ± 0% ~ (p=0.361 n=7)
Sinh-4 634.6n ± 4% 585.6n ± 1% -7.72% (p=0.001 n=7)
SqrtIndirect-4 8.035n ± 0% 8.036n ± 0% ~ (p=0.523 n=7)
SqrtLatency-4 8.039n ± 0% 8.037n ± 0% ~ (p=0.218 n=7)
SqrtIndirectLatency-4 8.040n ± 0% 8.040n ± 0% ~ (p=0.652 n=7)
SqrtGoLatency-4 895.7n ± 0% 896.6n ± 0% +0.10% (p=0.004 n=7)
SqrtPrime-4 5.406µ ± 0% 5.407µ ± 0% ~ (p=0.592 n=7)
Tan-4 406.1n ± 0% 405.8n ± 1% ~ (p=0.435 n=7)
Tanh-4 627.6n ± 0% 545.5n ± 0% -13.08% (p=0.001 n=7)
Trunc-4 146.7n ± 1% 146.7n ± 0% ~ (p=0.755 n=7)
Y0-4 1.359µ ± 0% 1.310µ ± 0% -3.61% (p=0.001 n=7)
Y1-4 1.351µ ± 0% 1.301µ ± 0% -3.70% (p=0.001 n=7)
Yn-4 2.829µ ± 0% 2.729µ ± 0% -3.53% (p=0.001 n=7)
Float64bits-4 14.08n ± 0% 14.07n ± 0% ~ (p=0.069 n=7)
Float64frombits-4 19.09n ± 0% 19.10n ± 0% ~ (p=0.755 n=7)
Float32bits-4 13.06n ± 0% 13.07n ± 1% ~ (p=0.586 n=7)
Float32frombits-4 13.06n ± 0% 13.06n ± 0% ~ (p=0.853 n=7)
FMA-4 606.9n ± 0% 606.8n ± 0% ~ (p=0.393 n=7)
geomean 201.1n 185.4n -7.81%
Change-Id: I6d41a97ad3789ed5731588588859ac0b8b13b664
Reviewed-on: https://go-review.googlesource.com/c/go/+/484675
Reviewed-by: Rong Zhang <rongrong@oss.cipunited.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
|
|
No change in semantics, just removing an unneeded helper.
Also align rules a bit.
Change-Id: Ie4dabb99392315a7700c645b3d0931eb8766a5fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/483439
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Argument type is dangerous because it may be thinner than the actual
store being issued.
Change-Id: Id19fbd8e6c41390a453994f897dd5048473136aa
Reviewed-on: https://go-review.googlesource.com/c/go/+/483438
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
For c + nil, we want the result to still be of pointer type.
Fixes ppc64le build failure with CL 468455, in issue33724.go.
The problem in that test is that it requires a nil check to be
scheduled before the corresponding load. This normally happens fine
because we prioritize nil checks. If we have nilcheck(p) and load(p),
once p is scheduled the nil check will always go before the load.
The issue we saw in 33724 is that when p is a nil pointer, we ended up
with two different p's, an int64(0) as the argument to the nil check
and an (*Outer)(0) as the argument to the load. Those two zeroes don't
get CSEd, so if the (*Outer)(0) happens to get scheduled first, the
load can end up before the nilcheck.
Fix this by always having constant arithmetic preserve the pointerness
of the value, so that both zeroes are of type *Outer and get CSEd.
Update #58482
Update #33724
Change-Id: Ib9b8c0446f1690b574e0f3c0afb9934efbaf3513
Reviewed-on: https://go-review.googlesource.com/c/go/+/468615
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Bypass: Keith Randall <khr@golang.org>
|
|
The SPanchored opcode is identical to SP, except that it takes a memory
argument so that it (and more importantly, anything that uses it)
must be scheduled at or after that memory argument.
This opcode ensures that a LEAQ of a variable gets scheduled after the
corresponding VARDEF for that variable.
This may lead to less CSE of LEAQ operations. The effect is very small.
The go binary is only 80 bytes bigger after this CL. Usually LEAQs get
folded into load/store operations, so the effect is limited to
variables of pointerful types that are large enough to need a duffzero
and that have their address passed somewhere. Even then, usually the
CSEd LEAQs will be un-CSEd because
the two uses are on different sides of a function call and the LEAQ
ends up being rematerialized at the second use anyway.
Change-Id: Ib893562cd05369b91dd563b48fb83f5250950293
Reviewed-on: https://go-review.googlesource.com/c/go/+/452916
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Martin Möhrmann <martin@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
The standard way to generate code in a Go package is via //go:generate
directives, which are invoked by the developer explicitly running:
go generate import/path/of/said/package
Switch to using that approach here.
This way, developers don't need to learn and remember a custom way that
each particular Go package may choose to implement its code generation.
It also enables conveniences such as 'go generate -n' to discover how
code is generated without running anything (this works on all packages
that rely on //go:generate directives), being able to generate multiple
packages at once and from any directory, and so on.
Change-Id: I0e5b6a1edeff670a8e588befeef0c445613803c7
Reviewed-on: https://go-review.googlesource.com/c/go/+/460135
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
The gen folder was renamed to _gen in CL 435472, but references in code
and docs were not updated. This updates the references.
Change-Id: Ibadc0cdcb5bed145c3257b58465a8df370487ae5
Reviewed-on: https://go-review.googlesource.com/c/go/+/444355
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Johan Brandhorst-Satzkorn <johan.brandhorst@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
For certain types of method wrappers we used to generate a tail
call. That was disabled in CL 307234 when register ABI is used,
because with the current IR it was difficult to generate a tail
call with the arguments in the right places. The problem was that
the IR does not contain a CALL-like node with arguments; instead,
it contains an OAS node that adjusts the receiver, then an
OTAILCALL node that just contains the target, but no argument
(with the assumption that the OAS node will put the adjusted
receiver in the right place). With register ABI, putting
arguments in registers is done in SSA. The assignment (OAS)
doesn't put the receiver in a register.
This CL changes the IR of a tail call to take an actual OCALL
node. Specifically, a tail call is represented as
OTAILCALL (OCALL target args...)
This way, the call target and args are connected through the OCALL
node. So the call can be analyzed in SSA and the args can be passed
in the right places.
(Alternatively, we could have OTAILCALL node directly take the
target and the args, without the OCALL node. Using an OCALL node is
convenient as there is existing code that processes OCALL nodes
which does not need to be changed. Also, a tail call is similar to
ORETURN (OCALL target args...), except it doesn't preserve the
frame. I did the former but I'm open to change.)
The SSA representation is similar. Previously, the IR lowered to
a Store of the receiver, then a BlockRetJmp which jumps to the target
(without putting the arg in a register). Now we use a TailCall op,
which takes the target and the args. The call expansion pass and
the register allocator handle TailCall pretty much like a
StaticCall, and will do the right ABI analysis and put the args
in the right places. (Args other than the receiver are already in
the right places. For register args it generates no code for them.
For stack args currently it generates a self copy. I'll work on
optimizing that out.) BlockRetJmp is still used, signaling it is a
tail call. The actual call is made in the TailCall op so
BlockRetJmp generates no code (we could use BlockExit if we like).
This slightly reduces binary size:
            old       new
cmd/go      14003088  13953936
cmd/link    6275552   6271456
Change-Id: I2d16d8d419fe1f17554916d317427383e17e27f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/350145
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
If I change a rule in ARM64.rules to use the variable name "b" in a
conflicting way, rulegen would previously not complain, and the compiler
would later give a confusing error:
$ go run *.go && go build cmd/compile/internal/ssa
# cmd/compile/internal/ssa
../rewriteARM64.go:24236:10: b.NewValue0 undefined (type int64 has no field or method NewValue0)
Make rulegen complain early about those cases. Sometimes they might
happen to be harmless, but in general they can easily cause confusion or
unintended effects due to shadowing.
After the change, with the same conflicting rule:
$ go run *.go && go build cmd/compile/internal/ssa
2021/03/22 11:31:49 rule ARM64.rules:495 uses the reserved name b
exit status 1
Note that 24 existing rules were using reserved names. It seems like the
shadowing was harmless, as it wasn't causing typechecking issues nor did
it seem to cause unintended behavior when the rule rewrite code ran.
The bool values "b" were renamed "t", since that seems to have a
precedent in other rules and in the fmt package.
Sequential values like "a b c" were renamed to "x y z", since "b" is
reserved.
Finally, "typ" was renamed to "_typ", since there doesn't seem to be an
obviously better answer.
Passes all three of:
$ GOARCH=amd64 go build -toolexec 'toolstash -cmp' -a std
$ GOARCH=arm64 go build -toolexec 'toolstash -cmp' -a std
$ GOARCH=mips64 go build -toolexec 'toolstash -cmp' -a std
Fixes #45154.
Change-Id: I1cce194dc7b477886a9c218c17973e996bcedccf
Reviewed-on: https://go-review.googlesource.com/c/go/+/303549
Trust: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Add a generic rule to rewrite the single-precision square root expression
into a single single-precision instruction. The optimization removes two
precision conversions between double-precision and single-precision.
On the arm64 platform:
previous:
FCVTSD F0, F0
FSQRTD F0, F0
FCVTDS F0, F0
optimized:
FSQRTS S0, S0
This patch also adds a test case to check correctness.
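
In Go source, the expression this rule targets looks like the sketch below: a float32 is widened, passed to the float64 square root, and narrowed again. The function name sqrt32 is illustrative. Computing the root directly in single precision gives the same result because float64 carries enough extra precision that the final narrowing rounds correctly.

```go
package main

import (
	"fmt"
	"math"
)

// sqrt32 is the source-level pattern: widen, take a float64 sqrt, narrow.
func sqrt32(x float32) float32 {
	return float32(math.Sqrt(float64(x)))
}

func main() {
	for _, x := range []float32{0, 2, 9, 16} {
		fmt.Println(x, sqrt32(x))
	}
}
```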
This patch refers to CL 241877, contributed by Alice Xu
(dianhong.xu@arm.com)
Change-Id: I6de5d02281c693017ac4bd4c10963dd55989bd7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/276873
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
MOV*nop and MOV*reg seem superfluous. They are there to keep type
information around that would otherwise get thrown away. Not sure
what we need it for. I think our compiler needs a normalization of
how types are represented in SSA, especially after lowering.
MOV*nop gets in the way of some optimization rules firing, like for
load combining.
For now, just fold MOV*nop and MOV*const. It's certainly safe to
do that, as the type info on the MOV*const isn't ever useful.
R=go1.17
Change-Id: I3630a80afc2455a8e9cd9fde10c7abe05ddc3767
Reviewed-on: https://go-review.googlesource.com/c/go/+/276792
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
It just makes the compiler crash. Oops.
Fixes #43099
Change-Id: Id996c14799c1a5d0063ecae3b8770568161c2440
Reviewed-on: https://go-review.googlesource.com/c/go/+/276652
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
The shift amount in SRAconst needs to be in the [0,31] range, so stop
MOVWing -1 to SRA in the Rsh lowering rules.
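
A sketch of why a large signed right shift does not need an out-of-range amount: Go defines x >> s for s >= 32 on an int32 as all sign bits, which is exactly what a shift by 31 produces. Clamping to 31 is an assumption about how a valid amount can be chosen, not a quote of the new rule, and rsh32 is a hypothetical name.

```go
package main

import "fmt"

// rsh32 models Go's signed right shift for any unsigned count by
// clamping the count to 31, which always lies in the [0,31] range.
func rsh32(x int32, s uint32) int32 {
	if s > 31 {
		s = 31
	}
	return x >> s
}

func main() {
	ok := true
	for _, x := range []int32{-8, -1, 0, 8, -1 << 31, 1<<31 - 1} {
		for _, s := range []uint32{0, 1, 31, 32, 40, 1 << 20} {
			if rsh32(x, s) != x>>s {
				ok = false
			}
		}
	}
	fmt.Println(ok)
}
```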
Also see CL 270117.
Passes
$ GOARCH=mips go build -toolexec 'toolstash -cmp' -a std
$ GOARCH=mipsle go build -toolexec 'toolstash -cmp' -a std
Updates #42587
Change-Id: Ib5eb99b82310e404cc2d6f0c619b21b8a15406ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/270558
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
mips SRA/SLL/SRL shift amounts are used mod 32; this change aligns the
XXXconst rules to mask the shift amount by &31.
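
The difference between the hardware behavior described here and Go's shift semantics can be modeled in a few lines. The function sraHW is a hypothetical model of a shift whose amount is taken mod 32; it is not compiler code.

```go
package main

import "fmt"

// sraHW models an arithmetic right shift that only looks at the low
// five bits of the shift amount.
func sraHW(x int32, s uint32) int32 { return x >> (s & 31) }

func main() {
	x := int32(-8)
	// A shift by 33 behaves like a shift by 1 on such hardware.
	fmt.Println(sraHW(x, 33), sraHW(x, 1))
	// Go itself defines x >> 33 as all sign bits.
	fmt.Println(x >> uint32(33))
}
```

Masking the constant with &31 when folding keeps the folded constant consistent with what the instruction would have done with the unmasked amount.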
Passes
$ GOARCH=mips go build -toolexec 'toolstash -cmp' -a std
$ GOARCH=mipsle go build -toolexec 'toolstash -cmp' -a std
Fixes #42587
Change-Id: I6003ebd0bc500fba4cf6fb10254e1b557bf8c48f
Reviewed-on: https://go-review.googlesource.com/c/go/+/270117
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Also make canMergeSym take Syms instead of interface{}
Change-Id: I4926a1fc586aa90e198249d67e5b520404b40869
Reviewed-on: https://go-review.googlesource.com/c/go/+/265817
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Change-Id: I7fbb0c1ead6e29a7445c8ab43f7050947597f3e8
Reviewed-on: https://go-review.googlesource.com/c/go/+/265497
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
rewrite.go has two identical functions isPowerOfTwo and
isPowerOfTwo64; the former has been there for a while, while the
latter was added together with isPowerOfTwo{8,16,32} for use in typed
rules.
This change deletes isPowerOfTwo and switches to using isPowerOfTwo64
everywhere.
Change-Id: If26c94565d2393fac6f0ba117ee7ee2fc915f7cd
Reviewed-on: https://go-review.googlesource.com/c/go/+/265417
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This one is trivial, as there are already 32-bit AND and OR ops used to
implement the more complex 8-bit versions.
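
To illustrate why the 8-bit versions are the more complex ones, here is a sketch that builds an 8-bit atomic AND on top of a 32-bit atomic AND. Everything in it is an assumption made for the example: the names and32 and and8, the compare-and-swap loop standing in for a native 32-bit op, and the byte numbering (0 is the least significant byte).

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// and32 atomically performs *addr &= mask using a compare-and-swap loop.
func and32(addr *uint32, mask uint32) {
	for {
		old := atomic.LoadUint32(addr)
		if atomic.CompareAndSwapUint32(addr, old, old&mask) {
			return
		}
	}
}

// and8 ANDs val into byte idx of the word at addr. The other three bytes
// get mask bits of 1 so the 32-bit AND leaves them unchanged.
func and8(addr *uint32, idx uint, val uint8) {
	shift := 8 * (idx & 3)
	mask := uint32(val)<<shift | ^(uint32(0xFF) << shift)
	and32(addr, mask)
}

func main() {
	w := uint32(0xFFFFFFFF)
	and8(&w, 1, 0x0F)
	fmt.Printf("%#x\n", w)
}
```

The 32-bit operation needs none of the shifting and mask building, which matches the commit's description of it as trivial.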
Change-Id: Ic48a53ea291d0067ebeab8e96c82e054daf20ae7
Reviewed-on: https://go-review.googlesource.com/c/go/+/263149
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Keep track of all expressions encountered while
generating a rewrite result, and re-use them whenever possible.
Named expressions may still be used for clarity when desired.
Change-Id: I640dca108763eb8baeff8f9a4169300af3445b82
Reviewed-on: https://go-review.googlesource.com/c/go/+/229800
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
|
|
Passes
GOARCH=mips gotip build -toolexec 'toolstash -cmp' -a std
GOARCH=mipsle gotip build -toolexec 'toolstash -cmp' -a std
Change-Id: I35df0522e299aa755491cd25f47f1f1bf447848c
Reviewed-on: https://go-review.googlesource.com/c/go/+/229637
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This covers most of the lowering rules.
Passes
GOARCH=mips gotip build -toolexec 'toolstash -cmp' -a std
GOARCH=mipsle gotip build -toolexec 'toolstash -cmp' -a std
Change-Id: I9d00aaebecb36622e3bdaf556e5a9377670bf86b
Reviewed-on: https://go-review.googlesource.com/c/go/+/229102
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Extend CL 220417 (which removed the integer Greater and Geq ops) to
floating point comparisons. Greater and Geq can always be
implemented using Less and Leq.
Fixes #37316.
Change-Id: Ieaddb4877dd0ff9037a1dd11d0a9a9e45ced71e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/222397
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
For 1.15, unless someone really wants it in 1.14.
A performance-sensitive user thought this would be useful,
though "large" was not well-defined. If 128 is large,
there are 139 static instances of "large" copies in the compiler
itself.
Includes test.
Change-Id: I81f20c62da59d37072429f3a22c1809e6fb2946d
Reviewed-on: https://go-review.googlesource.com/c/go/+/205066
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
In cases in which we had a named value whose args were all _,
like this rule from ARM.rules:
(MOVBUreg x:(MOVBUload _ _)) -> (MOVWreg x)
We previously inserted
_ = x.Args[1]
even though it is unnecessary.
This change eliminates this pointless bounds check.
And in other cases, we now check bounds just as far as strictly necessary.
No significant movement on any compiler metrics.
Just nicer (and less) code.
Passes toolstash-check -all.
Change-Id: I075dfe9f926cc561cdc705e9ddaab563164bed3a
Reviewed-on: https://go-review.googlesource.com/c/go/+/221781
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
For use in rewrite rules. Shrinks cmd/compile:
compile 20082104 19967416 -114688 -0.571%
Passes toolstash-check -all.
Change-Id: Ic856508b27ec5b7fb9b6ca63e955a7139ae7dc30
Reviewed-on: https://go-review.googlesource.com/c/go/+/221780
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This:
* Simplifies and shortens the generated code for rewrite rules.
* Shrinks cmd/compile by 86k (0.4%) and makes it easier to compile.
* Removes the stmt boundary code wrangling from Value.reset,
in favor of doing it in the one place where it actually does some work,
namely the writebarrier pass. (This was ascertained by inspecting the
code for cases in which notStmtBoundary values were generated.)
Passes toolstash-check -all.
Change-Id: I25671d4c4bbd772f235195d11da090878ea2cc07
Reviewed-on: https://go-review.googlesource.com/c/go/+/221421
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
This shrinks the compiler without impacting performance.
(The performance-sensitive part of rewrite rules is the non-match case.)
Passes toolstash-check -all.
Executable size:
file before after Δ %
compile 20356168 20163960 -192208 -0.944%
total 115599376 115407168 -192208 -0.166%
Text size:
file before after Δ %
cmd/compile/internal/ssa.s 3928309 3778774 -149535 -3.807%
total 18862943 18713408 -149535 -0.793%
Memory allocated compiling package SSA:
SSA 12.7M ± 0% 12.5M ± 0% -1.74% (p=0.008 n=5+5)
Compiler speed impact:
name old time/op new time/op delta
Template 211ms ± 1% 211ms ± 2% ~ (p=0.832 n=49+49)
Unicode 82.8ms ± 2% 83.2ms ± 2% +0.44% (p=0.022 n=46+49)
GoTypes 726ms ± 1% 728ms ± 2% ~ (p=0.076 n=46+48)
Compiler 3.39s ± 2% 3.40s ± 2% ~ (p=0.633 n=48+49)
SSA 7.71s ± 1% 7.65s ± 1% -0.78% (p=0.000 n=45+44)
Flate 134ms ± 1% 134ms ± 1% ~ (p=0.195 n=50+49)
GoParser 167ms ± 1% 167ms ± 1% ~ (p=0.390 n=47+47)
Reflect 453ms ± 3% 452ms ± 2% ~ (p=0.492 n=48+49)
Tar 184ms ± 3% 184ms ± 2% ~ (p=0.862 n=50+48)
XML 248ms ± 2% 248ms ± 2% ~ (p=0.096 n=49+47)
[Geo mean] 415ms 415ms -0.03%
name old user-time/op new user-time/op delta
Template 273ms ± 1% 273ms ± 2% ~ (p=0.711 n=48+48)
Unicode 117ms ± 6% 117ms ± 5% ~ (p=0.633 n=50+50)
GoTypes 972ms ± 2% 974ms ± 1% +0.29% (p=0.016 n=47+49)
Compiler 4.46s ± 6% 4.51s ± 6% ~ (p=0.093 n=50+50)
SSA 10.4s ± 1% 10.3s ± 2% -0.94% (p=0.000 n=45+50)
Flate 166ms ± 2% 167ms ± 2% ~ (p=0.148 n=49+48)
GoParser 202ms ± 1% 202ms ± 2% -0.28% (p=0.014 n=47+49)
Reflect 594ms ± 2% 594ms ± 2% ~ (p=0.717 n=48+49)
Tar 224ms ± 2% 224ms ± 2% ~ (p=0.805 n=50+49)
XML 311ms ± 1% 310ms ± 1% ~ (p=0.177 n=49+48)
[Geo mean] 537ms 537ms +0.01%
Change-Id: I562b9f349b34ddcff01771769e6dbbc80604da7a
Reviewed-on: https://go-review.googlesource.com/c/go/+/221237
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
We try to preserve type correctness of generic ops.
phiopt modified a bool to be an int without a conversion.
Add a conversion. There are a few random fluctuations in the
generated code as a result, but nothing noteworthy or systematic.
no binary size changes
file before after Δ %
math.s 35966 35961 -5 -0.014%
debug/dwarf.s 108141 108147 +6 +0.006%
crypto/dsa.s 6047 6044 -3 -0.050%
image/png.s 42882 42885 +3 +0.007%
go/parser.s 80281 80278 -3 -0.004%
cmd/internal/obj.s 115116 115113 -3 -0.003%
go/types.s 322130 322118 -12 -0.004%
cmd/internal/obj/arm64.s 151679 151685 +6 +0.004%
go/internal/gccgoimporter.s 56487 56493 +6 +0.011%
cmd/test2json.s 1650 1647 -3 -0.182%
cmd/link/internal/loadelf.s 35442 35443 +1 +0.003%
cmd/go/internal/work.s 305039 305035 -4 -0.001%
cmd/link/internal/ld.s 544835 544834 -1 -0.000%
net/http.s 558777 558774 -3 -0.001%
cmd/compile/internal/ssa.s 3926551 3926994 +443 +0.011%
cmd/compile/internal/gc.s 1552320 1552321 +1 +0.000%
total 18862241 18862670 +429 +0.002%
Change-Id: I4289e773be6be534ea3f907d68f614441b8f9b46
Reviewed-on: https://go-review.googlesource.com/c/go/+/221607
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
The generic Greater and Geq ops can always be replaced with the Less and
Leq ops. This CL therefore removes them. This simplifies the compiler since
it reduces the number of operations that need handling in both code and in
rewrite rules. This will be especially true when adding control flow
optimizations such as the integer-in-range optimizations in CL 165998.
Change-Id: If0648b2b19998ac1bddccbf251283f3be4ec3040
Reviewed-on: https://go-review.googlesource.com/c/go/+/220417
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Passes toolstash-check -all.
Change-Id: I14db0acb9b531029c613fa31bc076928651b6448
Reviewed-on: https://go-review.googlesource.com/c/go/+/217007
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
We added chunking of rewrite rules to speed up compiling package SSA.
This series of changes has significantly shrunk the number of
rewrite rules, and they are no longer being added nearly as fast.
Now that we are sharing v.Args across multiple rewrite rules,
there is additional benefit to having more rules in a single function.
Removing chunking now has an incidental impact on compiling package SSA,
marginally speeds up other compilation, shrinks the cmd/compile binary,
and simplifies the code.
name old time/op new time/op delta
Template 211ms ± 2% 210ms ± 2% -0.50% (p=0.000 n=91+97)
Unicode 81.9ms ± 3% 81.8ms ± 3% ~ (p=0.179 n=96+91)
GoTypes 731ms ± 2% 731ms ± 1% ~ (p=0.442 n=94+96)
Compiler 3.43s ± 2% 3.41s ± 2% -0.36% (p=0.001 n=98+94)
SSA 8.30s ± 2% 8.32s ± 2% +0.19% (p=0.034 n=94+95)
Flate 135ms ± 2% 134ms ± 1% -0.30% (p=0.006 n=98+94)
GoParser 167ms ± 1% 167ms ± 1% -0.22% (p=0.001 n=92+94)
Reflect 453ms ± 2% 453ms ± 3% ~ (p=0.306 n=98+97)
Tar 184ms ± 2% 183ms ± 2% -0.31% (p=0.012 n=94+94)
XML 249ms ± 2% 248ms ± 1% -0.26% (p=0.002 n=96+92)
[Geo mean] 419ms 418ms -0.21%
name old user-time/op new user-time/op delta
Template 273ms ± 2% 272ms ± 2% -0.46% (p=0.000 n=93+96)
Unicode 116ms ± 4% 117ms ± 4% ~ (p=0.433 n=98+98)
GoTypes 977ms ± 2% 977ms ± 1% ~ (p=0.971 n=92+99)
Compiler 4.56s ± 6% 4.53s ± 6% ~ (p=0.081 n=100+100)
SSA 11.1s ± 2% 11.1s ± 2% ~ (p=0.064 n=99+96)
Flate 167ms ± 2% 167ms ± 1% -0.24% (p=0.004 n=95+96)
GoParser 203ms ± 1% 203ms ± 2% -0.14% (p=0.049 n=96+97)
Reflect 595ms ± 2% 595ms ± 2% ~ (p=0.544 n=95+92)
Tar 225ms ± 2% 224ms ± 2% ~ (p=0.562 n=99+99)
XML 312ms ± 2% 311ms ± 1% ~ (p=0.050 n=97+93)
[Geo mean] 543ms 542ms -0.13%
Change-Id: I8d34ab59f154b28f20c6f9e416b976bfce339baa
Reviewed-on: https://go-review.googlesource.com/c/go/+/216220
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
CL 213703 converted generated rewrite rules for commutative ops
to use loops instead of duplicated code.
However, it loaded args using expressions like
v.Args[i] and v.Args[i^1], for which the compiler could
not eliminate bounds checks (including with all outstanding
prove CLs).
Also, given a series of separate rewrite rules for the same op,
we generated bounds checks for every rewrite rule, even though
we were repeatedly loading the same set of args.
This change reduces both sets of bounds checks.
Instead of loading v.Args[i] and v.Args[i^1] for commutative loops,
we now preload v.Args[0] and v.Args[1] into local variables,
and then swap them (as needed) in the commutative loop post statement.
And we now load all top level v.Args into local variables
at the beginning of every rewrite rule function.
The second optimization is the more significant,
but the first helps a little, and they play together
nicely from the perspective of generating the code.
This does increase register pressure, but the reduced bounds
checks more than compensate.
Note that the vast majority of rewrite rules evaluated
are not applied, so the prologue is the most important
part of the rewrite rules.
There is one subtle aspect to the new generated code.
Because the top level v.Args are shared across rewrite rules,
and rule evaluation can swap v_0 and v_1, v_0 and v_1
can end up being swapped from one rule to the next.
That is OK, because any time a rule does not get applied,
they will have been swapped exactly twice.
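The shape of the loop described above can be sketched by hand as follows. The `value` type and `matchAddConst` are hypothetical stand-ins for the SSA Value type and a generated rule function; only the preload-then-swap pattern is the point.

```go
package main

import "fmt"

// value is a hypothetical, stripped-down stand-in for an SSA value.
type value struct {
	op   string
	aux  int64
	args []*value
}

// matchAddConst matches (Add (Const [c]) x), where Add is commutative.
func matchAddConst(v *value) (c int64, x *value, ok bool) {
	// Preload the top-level args once; this is the only bounds-checked access.
	v_1 := v.args[1]
	v_0 := v.args[0]
	// Try both argument orders. The post statement swaps v_0 and v_1,
	// so a rule that does not match leaves them swapped exactly twice.
	for i := 0; i <= 1; i, v_0, v_1 = i+1, v_1, v_0 {
		if v_0.op != "Const" {
			continue
		}
		return v_0.aux, v_1, true
	}
	return 0, nil, false
}

func main() {
	x := &value{op: "Arg"}
	k := &value{op: "Const", aux: 7}
	add := &value{op: "Add", args: []*value{x, k}}
	c, other, ok := matchAddConst(add)
	fmt.Println(c, other == x, ok) // 7 true true
}
```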
Passes toolstash-check -all.
name old time/op new time/op delta
Template 213ms ± 2% 211ms ± 2% -0.85% (p=0.000 n=92+96)
Unicode 83.5ms ± 2% 83.2ms ± 2% -0.41% (p=0.004 n=95+90)
GoTypes 737ms ± 2% 733ms ± 2% -0.51% (p=0.000 n=91+94)
Compiler 3.45s ± 2% 3.43s ± 2% -0.44% (p=0.000 n=99+100)
SSA 8.54s ± 1% 8.32s ± 2% -2.56% (p=0.000 n=96+99)
Flate 136ms ± 2% 135ms ± 1% -0.47% (p=0.000 n=96+96)
GoParser 169ms ± 1% 168ms ± 1% -0.33% (p=0.000 n=96+93)
Reflect 456ms ± 3% 455ms ± 3% ~ (p=0.261 n=95+94)
Tar 186ms ± 2% 185ms ± 2% -0.48% (p=0.000 n=94+95)
XML 251ms ± 1% 250ms ± 1% -0.51% (p=0.000 n=91+94)
[Geo mean] 424ms 421ms -0.68%
name old user-time/op new user-time/op delta
Template 275ms ± 1% 274ms ± 2% -0.55% (p=0.000 n=95+98)
Unicode 118ms ± 4% 118ms ± 4% ~ (p=0.642 n=98+90)
GoTypes 983ms ± 1% 980ms ± 1% -0.30% (p=0.000 n=93+93)
Compiler 4.56s ± 6% 4.52s ± 6% -0.72% (p=0.003 n=100+100)
SSA 11.4s ± 1% 11.1s ± 1% -2.50% (p=0.000 n=96+97)
Flate 168ms ± 1% 167ms ± 1% -0.49% (p=0.000 n=92+92)
GoParser 204ms ± 1% 204ms ± 2% -0.27% (p=0.003 n=99+96)
Reflect 599ms ± 2% 598ms ± 2% ~ (p=0.116 n=95+92)
Tar 227ms ± 2% 225ms ± 2% -0.57% (p=0.000 n=95+98)
XML 313ms ± 2% 312ms ± 1% -0.37% (p=0.000 n=89+95)
[Geo mean] 547ms 544ms -0.61%
file before after Δ %
compile 21113112 21109016 -4096 -0.019%
total 131704940 131700844 -4096 -0.003%
Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956
Reviewed-on: https://go-review.googlesource.com/c/go/+/216218
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Prior to this change, we generated additional rules at rulegen time
for all possible combinations of args to commutative ops.
This is simple and works well, but leads to lots of generated rules.
This in turn has increased the size of the compiler,
made it hard to compile package ssa on small machines,
and provided a disincentive to mark some ops as commutative.
This change reworks how we handle commutative ops.
Instead of generating a rule per argument permutation,
we generate a series of nested loops, one for each commutative op.
Each loop tries both possible argument orderings.
I also considered attempting to canonicalize the inputs to the
rewrite rules. However, because either or both arguments might be
nothing more than an identifier, and because there can be arbitrary
conditions to evaluate during matching, I did not see how to proceed.
The duplicate rule detection now sorts arguments to commutative ops,
so that it can detect commutative-only duplicates.
There may be further optimizations to the new generated code.
In particular, we may not be removing as many bounds checks as before;
I have not investigated deeply. If more work here is needed,
we could do it with more hints or with improvements to the prove pass.
This change has almost no impact on the generated code.
It does not pass toolstash-check, however. In a handful of functions,
for reasons I do not understand, there are minor position changes.
For the entire series ending at this change,
there is negligible compiler performance impact.
The compiler binary shrinks by about 15%,
and package ssa shrinks by about 25%.
Package ssa also compiles ~25% faster with ~25% less memory.
Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4
Reviewed-on: https://go-review.googlesource.com/c/go/+/213703
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
CL 203284 added compiler intrinsics for atomic Load8 and Store8 on
several architectures, but missed the lowering on MIPS. This CL fixes
that.
Updates #10958, #24543.
Change-Id: I82e88971554fe8c33ad2bf195a633c44b9ac4cf7
Reviewed-on: https://go-review.googlesource.com/c/go/+/203977
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Add a Reset method to blocks that allows us to reduce the amount of
code we generate for block rewrite rules.
Thanks to Cherry for suggesting a similar fix to this in CL 196557.
Compilebench result:
name old time/op new time/op delta
Template 211ms ± 1% 211ms ± 1% -0.30% (p=0.028 n=19+20)
Unicode 83.7ms ± 3% 83.0ms ± 2% -0.79% (p=0.029 n=18+19)
GoTypes 757ms ± 1% 755ms ± 1% -0.31% (p=0.034 n=19+19)
Compiler 3.51s ± 1% 3.50s ± 1% -0.20% (p=0.013 n=18+18)
SSA 11.7s ± 1% 11.7s ± 1% -0.38% (p=0.000 n=19+19)
Flate 131ms ± 1% 130ms ± 1% -0.32% (p=0.024 n=18+18)
GoParser 162ms ± 1% 162ms ± 1% ~ (p=0.059 n=20+18)
Reflect 471ms ± 0% 470ms ± 0% -0.24% (p=0.045 n=20+17)
Tar 187ms ± 1% 186ms ± 1% ~ (p=0.157 n=20+20)
XML 255ms ± 1% 255ms ± 1% ~ (p=0.461 n=19+20)
LinkCompiler 754ms ± 2% 755ms ± 2% ~ (p=0.919 n=17+17)
ExternalLinkCompiler 2.82s ±16% 2.37s ±10% -15.94% (p=0.000 n=20+20)
LinkWithoutDebugCompiler 439ms ± 4% 442ms ± 6% ~ (p=0.461 n=18+19)
StdCmd 25.8s ± 2% 25.5s ± 1% -0.95% (p=0.000 n=20+20)
name old user-time/op new user-time/op delta
Template 240ms ± 8% 238ms ± 7% ~ (p=0.301 n=20+20)
Unicode 107ms ±18% 104ms ±13% ~ (p=0.149 n=20+20)
GoTypes 883ms ± 3% 888ms ± 2% ~ (p=0.211 n=20+20)
Compiler 4.22s ± 1% 4.20s ± 1% ~ (p=0.077 n=20+18)
SSA 14.1s ± 1% 14.1s ± 2% ~ (p=0.192 n=20+20)
Flate 145ms ±10% 148ms ± 5% ~ (p=0.126 n=20+18)
GoParser 186ms ± 7% 186ms ± 7% ~ (p=0.779 n=20+20)
Reflect 538ms ± 3% 541ms ± 3% ~ (p=0.192 n=20+20)
Tar 218ms ± 4% 217ms ± 6% ~ (p=0.835 n=19+20)
XML 298ms ± 5% 298ms ± 5% ~ (p=0.749 n=19+20)
LinkCompiler 818ms ± 5% 825ms ± 8% ~ (p=0.461 n=20+20)
ExternalLinkCompiler 1.55s ± 4% 1.53s ± 5% ~ (p=0.063 n=20+18)
LinkWithoutDebugCompiler 460ms ±12% 460ms ± 7% ~ (p=0.925 n=20+20)
name old object-bytes new object-bytes delta
Template 554kB ± 0% 554kB ± 0% ~ (all equal)
Unicode 215kB ± 0% 215kB ± 0% ~ (all equal)
GoTypes 2.01MB ± 0% 2.01MB ± 0% ~ (all equal)
Compiler 7.97MB ± 0% 7.97MB ± 0% +0.00% (p=0.000 n=20+20)
SSA 26.8MB ± 0% 26.9MB ± 0% +0.27% (p=0.000 n=20+20)
Flate 340kB ± 0% 340kB ± 0% ~ (all equal)
GoParser 434kB ± 0% 434kB ± 0% ~ (all equal)
Reflect 1.34MB ± 0% 1.34MB ± 0% ~ (all equal)
Tar 480kB ± 0% 480kB ± 0% ~ (all equal)
XML 622kB ± 0% 622kB ± 0% ~ (all equal)
name old export-bytes new export-bytes delta
Template 20.4kB ± 0% 20.4kB ± 0% ~ (all equal)
Unicode 8.21kB ± 0% 8.21kB ± 0% ~ (all equal)
GoTypes 36.6kB ± 0% 36.6kB ± 0% ~ (all equal)
Compiler 115kB ± 0% 115kB ± 0% +0.08% (p=0.000 n=20+20)
SSA 141kB ± 0% 141kB ± 0% +0.07% (p=0.000 n=20+20)
Flate 5.11kB ± 0% 5.11kB ± 0% ~ (all equal)
GoParser 8.93kB ± 0% 8.93kB ± 0% ~ (all equal)
Reflect 11.8kB ± 0% 11.8kB ± 0% ~ (all equal)
Tar 10.9kB ± 0% 10.9kB ± 0% ~ (all equal)
XML 17.4kB ± 0% 17.4kB ± 0% ~ (all equal)
name old text-bytes new text-bytes delta
HelloSize 742kB ± 0% 742kB ± 0% ~ (all equal)
CmdGoSize 10.7MB ± 0% 10.7MB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal)
CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal)
CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.10MB ± 0% 1.10MB ± 0% ~ (all equal)
CmdGoSize 14.9MB ± 0% 14.9MB ± 0% ~ (all equal)
Change-Id: Ic89a8e62423b3d9fd9391159e0663acf450803b5
Reviewed-on: https://go-review.googlesource.com/c/go/+/198419
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
|
|
Control values are used to choose which successor of a block is
jumped to. Typically a control value takes the form of a 'flags'
value that represents the result of a comparison. Some
architectures however use a variable in a register as a control
value.
Up until now we have managed with a single control value per block.
However some architectures (e.g. s390x and riscv64) have combined
compare-and-branch instructions that take two variables in registers
as parameters. To generate these instructions we need to support 2
control values per block.
This CL allows up to 2 control values to be used in a block in
order to support the addition of compare-and-branch instructions.
I have implemented s390x compare-and-branch instructions in a
different CL.
Passes toolstash-check -all.
Results of compilebench:
name old time/op new time/op delta
Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20)
Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18)
GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18)
Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18)
SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18)
Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20)
GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20)
Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20)
Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20)
XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18)
LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19)
ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20)
LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20)
StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20)
name old user-time/op new user-time/op delta
Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20)
Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20)
GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19)
Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18)
SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18)
Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18)
GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20)
Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18)
Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20)
XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20)
LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20)
ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19)
LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20)
name old object-bytes new object-bytes delta
Template 559kB ± 0% 559kB ± 0% ~ (all equal)
Unicode 216kB ± 0% 216kB ± 0% ~ (all equal)
GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal)
Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20)
SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20)
Flate 343kB ± 0% 343kB ± 0% ~ (all equal)
GoParser 441kB ± 0% 441kB ± 0% ~ (all equal)
Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal)
Tar 487kB ± 0% 487kB ± 0% ~ (all equal)
XML 632kB ± 0% 632kB ± 0% ~ (all equal)
name old export-bytes new export-bytes delta
Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal)
Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal)
GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal)
Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20)
SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20)
Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal)
GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal)
Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal)
Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal)
XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal)
name old text-bytes new text-bytes delta
HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal)
CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal)
CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal)
CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal)
CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal)
Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f
Reviewed-on: https://go-review.googlesource.com/c/go/+/196557
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
First, remove unnecessary "// cond:" lines from the generated files.
This shaves off about 7k lines.
Second, join "if cond { break }" statements via "||", which allows us to
deduplicate a large number of them. This shaves off another ~25k lines.
This change is not for readability or simplicity, but rather to avoid
unnecessary verbosity that makes the generated files larger. All in all,
git reports that the generated files overall weigh ~200KiB less, or
about 2.7% less.
While at it, add a -trace flag to rulegen.
Updates #33644.
Change-Id: I3fac0290a6066070cc62400bf970a4ae0929470a
Reviewed-on: https://go-review.googlesource.com/c/go/+/196498
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
First, add cpu and memory profiling flags, as these are useful to see
where rulegen is spending its time. It now takes many seconds to run on
a recent laptop, so we have to keep an eye on what it's doing.
Second, stop writing '_ = var' lines to keep imports and variables used
at all times. Now that rulegen removes all such unused names, they're
unnecessary.
To perform the removal, lean on go/types to first detect what names are
unused. We can configure it to give us all the type-checking errors in a
file, so we can collect all "declared but not used" errors in a single
pass.
We then use astutil.Apply to remove the relevant nodes based on the line
information from each unused error. This allows us to apply the changes
without having to do extra parser+printer roundtrips to plaintext, which
are far too expensive.
We need to do multiple such passes, as removing an unused variable
declaration might then make another declaration unused. Two passes are
enough to clean every file at the moment, so add a limit of three passes
for now to avoid eating cpu uncontrollably by accident.
The resulting performance of the changes above is a ~30% loss across the
table, since go/types is fairly expensive. The numbers were obtained
with 'benchcmd Rulegen go run *.go', which involves compiling rulegen
itself, but that seems reflective of how the program is used.
name old time/op new time/op delta
Rulegen 5.61s ± 0% 7.36s ± 0% +31.17% (p=0.016 n=5+4)
name old user-time/op new user-time/op delta
Rulegen 7.20s ± 1% 9.92s ± 1% +37.76% (p=0.016 n=5+4)
name old sys-time/op new sys-time/op delta
Rulegen 135ms ±19% 169ms ±17% +25.66% (p=0.032 n=5+5)
name old peak-RSS-bytes new peak-RSS-bytes delta
Rulegen 71.0MB ± 2% 85.6MB ± 2% +20.56% (p=0.008 n=5+5)
We can live with a bit more resource usage, but the time/op getting
close to 10s isn't good. To win that back, introduce concurrency in
main.go. This further increases resource usage a bit, but the real time
on this quad-core laptop is greatly reduced. The final benchstat is as
follows:
name old time/op new time/op delta
Rulegen 5.61s ± 0% 3.97s ± 1% -29.26% (p=0.008 n=5+5)
name old user-time/op new user-time/op delta
Rulegen 7.20s ± 1% 13.91s ± 1% +93.09% (p=0.008 n=5+5)
name old sys-time/op new sys-time/op delta
Rulegen 135ms ±19% 269ms ± 9% +99.17% (p=0.008 n=5+5)
name old peak-RSS-bytes new peak-RSS-bytes delta
Rulegen 71.0MB ± 2% 226.3MB ± 1% +218.72% (p=0.008 n=5+5)
It might be possible to reduce the cpu or memory usage in the future,
such as configuring go/types to do less work, or taking shortcuts to
avoid having to run it many times. For now, ~2x cpu and ~4x memory usage
seems like a fair trade for a faster and better rulegen.
Finally, we can remove the old code that tried to remove some unused
variables in a hacky and unmaintainable way.
Change-Id: Iff9e83e3f253babf5a1bd48cc993033b8550cee6
Reviewed-on: https://go-review.googlesource.com/c/go/+/189798
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
"Division by invariant integers using multiplication" paper
by Granlund and Montgomery contains a method for directly computing
divisibility (x%c == 0 for c constant) by means of the modular inverse.
The method is further elaborated in "Hacker's Delight" by Warren, Section 10-17.
This general rule can compute divisibility with one multiplication and a compare
for odd divisors, plus an additional rotate for even divisors.
To apply the divisibility rule, we must take into account
the rules to rewrite x%c = x-((x/c)*c) and (x/c) for c constant on the first
optimization pass "opt". This complicates the matching as we want to match
only in the cases where the result of (x/c) is not also available.
So, we must match on the expanded form of (x/c) in the expression x == c*(x/c)
in the "late opt" pass after common subexpression elimination.
Note that if there is an intermediate opt pass introduced in the future we
could simplify these rules by delaying the magic division rewrite to "late opt"
and matching directly on (x/c) in the intermediate opt pass.
Additional rules to lower the generic RotateLeft* ops were also applied.
On amd64, the divisibility check is 25-50% faster.
name old time/op new time/op delta
DivconstI64-4 2.08ns ± 0% 2.08ns ± 1% ~ (p=0.881 n=5+5)
DivisibleconstI64-4 2.67ns ± 0% 2.67ns ± 1% ~ (p=1.000 n=5+5)
DivisibleWDivconstI64-4 2.67ns ± 0% 2.67ns ± 0% ~ (p=0.683 n=5+5)
DivconstU64-4 2.08ns ± 1% 2.08ns ± 1% ~ (p=1.000 n=5+5)
DivisibleconstU64-4 2.77ns ± 1% 1.55ns ± 2% -43.90% (p=0.008 n=5+5)
DivisibleWDivconstU64-4 2.99ns ± 1% 2.99ns ± 1% ~ (p=1.000 n=5+5)
DivconstI32-4 1.53ns ± 2% 1.53ns ± 0% ~ (p=1.000 n=5+5)
DivisibleconstI32-4 2.23ns ± 0% 2.25ns ± 3% ~ (p=0.167 n=5+5)
DivisibleWDivconstI32-4 2.27ns ± 1% 2.27ns ± 1% ~ (p=0.429 n=5+5)
DivconstU32-4 1.78ns ± 0% 1.78ns ± 1% ~ (p=1.000 n=4+5)
DivisibleconstU32-4 2.52ns ± 2% 1.26ns ± 0% -49.96% (p=0.000 n=5+4)
DivisibleWDivconstU32-4 2.63ns ± 0% 2.85ns ±10% +8.29% (p=0.016 n=4+5)
DivconstI16-4 1.54ns ± 0% 1.54ns ± 0% ~ (p=0.333 n=4+5)
DivisibleconstI16-4 2.10ns ± 0% 2.10ns ± 1% ~ (p=0.571 n=4+5)
DivisibleWDivconstI16-4 2.22ns ± 0% 2.23ns ± 1% ~ (p=0.556 n=4+5)
DivconstU16-4 1.09ns ± 0% 1.01ns ± 1% -7.74% (p=0.000 n=4+5)
DivisibleconstU16-4 1.83ns ± 0% 1.26ns ± 0% -31.52% (p=0.008 n=5+5)
DivisibleWDivconstU16-4 1.88ns ± 0% 1.89ns ± 1% ~ (p=0.365 n=5+5)
DivconstI8-4 1.54ns ± 1% 1.54ns ± 1% ~ (p=1.000 n=5+5)
DivisibleconstI8-4 2.10ns ± 0% 2.11ns ± 0% ~ (p=0.238 n=5+4)
DivisibleWDivconstI8-4 2.22ns ± 0% 2.23ns ± 2% ~ (p=0.762 n=5+5)
DivconstU8-4 0.92ns ± 1% 0.94ns ± 1% +2.65% (p=0.008 n=5+5)
DivisibleconstU8-4 1.66ns ± 0% 1.26ns ± 1% -24.28% (p=0.008 n=5+5)
DivisibleWDivconstU8-4 1.79ns ± 0% 1.80ns ± 1% ~ (p=0.079 n=4+5)
A follow-up change will address the signed division case.
Updates #30282
Change-Id: I7e995f167179aa5c76bb10fbcbeb49c520943403
Reviewed-on: https://go-review.googlesource.com/c/go/+/168037
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|