path: root/src/cmd/compile/internal/ssa/rewriteMIPS.go
Age | Commit message | Author
2026-01-28 | cmd/compile: remove the NORconst op on mips{,64} | Xiaolin Zhao
In the mips{,64} instruction sets and their extensions, there is no NORI instruction. Change-Id: If008442c792297d011b3d0c1e8501e62e32ab175 Reviewed-on: https://go-review.googlesource.com/c/go/+/735900 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Cherry Mui <cherryyz@google.com>
2026-01-23 | cmd/compile: cleanup isUnsignedPowerOfTwo | Jorropo
Merge the signed and unsigned generic functions. The only implementation difference between the two is the n > 0 vs n != 0 check. For unsigned numbers, n > 0 is equivalent to n != 0, and we in fact optimize the first to the second. Change-Id: Ia2f6c3e3d4eb098d98f85e06dc2e81baa60bad4e Reviewed-on: https://go-review.googlesource.com/c/go/+/726720 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
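The merge described above can be sketched with one generic predicate. This is a minimal illustration, not the compiler's code: the `integer` constraint and the exact function shape are assumptions made for the example.

```go
package main

import "fmt"

// integer is a hypothetical constraint covering signed and unsigned integer
// types; the compiler's own helper may be declared differently.
type integer interface {
	~int8 | ~int16 | ~int32 | ~int64 | ~uint8 | ~uint16 | ~uint32 | ~uint64
}

// isPowerOfTwo reports whether n is a positive power of two.
// For unsigned T, the n > 0 test is the same as n != 0, so a single
// implementation serves both the signed and the unsigned case.
func isPowerOfTwo[T integer](n T) bool {
	return n > 0 && n&(n-1) == 0
}

func main() {
	fmt.Println(isPowerOfTwo(int32(64)), isPowerOfTwo(uint32(64)))
	fmt.Println(isPowerOfTwo(int32(-8)), isPowerOfTwo(uint8(0)))
}
```

Negative signed values and zero are both rejected by the same `n > 0` guard, which is why the two former variants can share one body.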
2025-10-30 | cmd/compile: implement bits.Mul64 on 32-bit systems | Russ Cox
This CL implements Mul64uhilo, Hmul64, Hmul64u, and Avg64u on 32-bit systems, with the effect that constant division of both int64s and uint64s can now be emitted directly in all cases, and also that bits.Mul64 can be intrinsified on 32-bit systems.

Previously, constant division of uint64s by values 0 ≤ c ≤ 0xFFFF were implemented as uint32 divisions by c and some fixup. After expanding those smaller constant divisions, the code for i/999 required:

    (386) 7 mul, 10 add, 2 sub, 3 rotate, 3 shift (104 bytes)
    (arm) 7 mul, 9 add, 3 sub, 2 shift (104 bytes)
    (mips) 7 mul, 10 add, 5 sub, 6 shift, 3 sgtu (176 bytes)

For that much code, we might as well use a full 64x64->128 multiply that can be used for all divisors, not just small ones. Having done that, the same i/999 now generates:

    (386) 4 mul, 9 add, 2 sub, 2 or, 6 shift (112 bytes)
    (arm) 4 mul, 8 add, 2 sub, 2 or, 3 shift (92 bytes)
    (mips) 4 mul, 11 add, 3 sub, 6 shift, 8 sgtu, 4 or (196 bytes)

The size increase on 386 is due to a few extra register spills. The size increase on mips is due to add-with-carry being hard.

The new approach is more general, letting us delete the old special case and guarantee that all int64 and uint64 divisions by constants are generated directly on 32-bit systems.

This especially speeds up code making heavy use of bits.Mul64 with a constant argument, which happens in strconv and various crypto packages. A few examples are benchmarked below.

pkg: cmd/compile/internal/test
benchmark \ host                 local         linux-amd64   s7            linux-386     s7:GOARCH=386
                                 vs base       vs base       vs base       vs base       vs base
DivconstI64                      ~             ~             ~             -49.66%       -21.02%
ModconstI64                      ~             ~             ~             -13.45%       +14.52%
DivisiblePow2constI64            ~             ~             ~             +0.97%        -1.32%
DivisibleconstI64                ~             ~             ~             -20.01%       -48.28%
DivisibleWDivconstI64            ~             ~             -1.76%        -38.59%       -42.74%
DivconstU64/3                    ~             ~             ~             -13.82%       -4.09%
DivconstU64/5                    ~             ~             ~             -14.10%       -3.54%
DivconstU64/37                   -2.07%        -4.45%        ~             -19.60%       -9.55%
DivconstU64/1234567              ~             ~             ~             -61.55%       -56.93%
ModconstU64                      ~             ~             ~             -6.25%        ~
DivisibleconstU64                ~             ~             ~             -2.78%        -7.82%
DivisibleWDivconstU64            ~             ~             ~             +4.23%        +2.56%

pkg: math/bits
benchmark \ host                 s7            linux-amd64   linux-386     s7:GOARCH=386
                                 vs base       vs base       vs base       vs base
Add                              ~             ~             ~             ~
Add32                            +1.59%        ~             ~             ~
Add64                            ~             ~             ~             ~
Add64multiple                    ~             ~             ~             ~
Sub                              ~             ~             ~             ~
Sub32                            ~             ~             ~             ~
Sub64                            ~             ~             -9.20%        ~
Sub64multiple                    ~             ~             ~             ~
Mul                              ~             ~             ~             ~
Mul32                            ~             ~             ~             ~
Mul64                            ~             ~             -41.58%       -53.21%
Div                              ~             ~             ~             ~
Div32                            ~             ~             ~             ~
Div64                            ~             ~             ~             ~

pkg: strconv
benchmark \ host                 s7            linux-amd64   linux-386     s7:GOARCH=386
                                 vs base       vs base       vs base       vs base
ParseInt/Pos/7bit                ~             ~             -11.08%       -6.75%
ParseInt/Pos/26bit               ~             ~             -13.65%       -11.02%
ParseInt/Pos/31bit               ~             ~             -14.65%       -9.71%
ParseInt/Pos/56bit               -1.80%        ~             -17.97%       -10.78%
ParseInt/Pos/63bit               ~             ~             -13.85%       -9.63%
ParseInt/Neg/7bit                ~             ~             -12.14%       -7.26%
ParseInt/Neg/26bit               ~             ~             -14.18%       -9.81%
ParseInt/Neg/31bit               ~             ~             -14.51%       -9.02%
ParseInt/Neg/56bit               ~             ~             -15.79%       -9.79%
ParseInt/Neg/63bit               ~             ~             -15.68%       -11.07%
AppendFloat/Decimal              ~             ~             -7.25%        -12.26%
AppendFloat/Float                ~             ~             -15.96%       -19.45%
AppendFloat/Exp                  ~             ~             -13.96%       -17.76%
AppendFloat/NegExp               ~             ~             -14.89%       -20.27%
AppendFloat/LongExp              ~             ~             -12.68%       -17.97%
AppendFloat/Big                  ~             ~             -11.10%       -16.64%
AppendFloat/BinaryExp            ~             ~             ~             ~
AppendFloat/32Integer            ~             ~             -10.05%       -10.91%
AppendFloat/32ExactFraction      ~             ~             -8.93%        -13.00%
AppendFloat/32Point              ~             ~             -10.36%       -14.89%
AppendFloat/32Exp                ~             ~             -9.88%        -13.54%
AppendFloat/32NegExp             ~             ~             -10.16%       -14.26%
AppendFloat/32Shortest           ~             ~             -11.39%       -14.96%
AppendFloat/32Fixed8Hard         ~             ~             ~             -2.31%
AppendFloat/32Fixed9Hard         ~             ~             ~             -7.01%
AppendFloat/64Fixed1             ~             ~             -2.83%        -8.23%
AppendFloat/64Fixed2             ~             ~             ~             -7.94%
AppendFloat/64Fixed3             ~             ~             -4.07%        -7.22%
AppendFloat/64Fixed4             ~             ~             -7.24%        -7.62%
AppendFloat/64Fixed12            ~             ~             -6.57%        -4.82%
AppendFloat/64Fixed16            ~             ~             -4.00%        -5.81%
AppendFloat/64Fixed12Hard        -2.22%        ~             -4.07%        -6.35%
AppendFloat/64Fixed17Hard        -2.12%        ~             ~             -3.79%
AppendFloat/64Fixed18Hard        -1.89%        ~             +2.48%        ~
AppendFloat/Slowpath64           -1.85%        ~             -14.49%       -18.21%
AppendFloat/SlowpathDenormal64   ~             ~             -13.08%       -19.41%

pkg: crypto/internal/fips140/nistec/fiat
benchmark \ host                 s7            linux-amd64   linux-386     s7:GOARCH=386
                                 vs base       vs base       vs base       vs base
Mul/P224                         ~             ~             -29.95%       -39.60%
Mul/P384                         ~             ~             -37.11%       -63.33%
Mul/P521                         ~             ~             -26.62%       -12.42%
Square/P224                      +1.46%        ~             -40.62%       -49.18%
Square/P384                      ~             ~             -45.51%       -69.68%
Square/P521                      +90.37%       ~             -25.26%       -11.23%

(The +90% is a separate problem and not real; that much variation can be seen on that system by running the same binary from two different files.)

pkg: crypto/internal/fips140/edwards25519
benchmark \ host                 s7            linux-amd64   linux-386     s7:GOARCH=386
                                 vs base       vs base       vs base       vs base
EncodingDecoding                 ~             ~             -34.67%       -35.75%
ScalarBaseMult                   ~             ~             -31.25%       -30.29%
ScalarMult                       ~             ~             -33.45%       -32.54%
VarTimeDoubleScalarBaseMult      ~             ~             -33.78%       -33.68%

Change-Id: Id3c91d42cd01def6731b755e99f8f40c6ad1bb65
Reviewed-on: https://go-review.googlesource.com/c/go/+/716061
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
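To make the "full 64x64->128 multiply" concrete, here is a rough sketch of how such a product can be assembled from 32x32->64 partial products, which is the kind of work a 32-bit target has to do. The helper name `mul64` is hypothetical and this is not the SSA lowering from the CL; it follows the well-known schoolbook decomposition and is checked against `bits.Mul64` from the standard library.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mul64 returns the 128-bit product of x and y as (hi, lo), built only from
// products of 32-bit halves. Hypothetical helper for illustration.
func mul64(x, y uint64) (hi, lo uint64) {
	const mask32 = 1<<32 - 1
	x0, x1 := x&mask32, x>>32
	y0, y1 := y&mask32, y>>32

	w0 := x0 * y0           // low x low
	t := x1*y0 + w0>>32     // high x low, plus carry out of w0
	w1 := t&mask32 + x0*y1  // low x high, plus the low half of t
	hi = x1*y1 + t>>32 + w1>>32
	lo = x * y // the low 64 bits are just the wrapping product
	return hi, lo
}

func main() {
	x, y := uint64(0xDEADBEEFCAFEBABE), uint64(999)
	hi, lo := mul64(x, y)
	wantHi, wantLo := bits.Mul64(x, y)
	fmt.Println(hi == wantHi, lo == wantLo)
}
```

The carries between partial products are the expensive part on a target without add-with-carry, which matches the commit's remark about the mips code size.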
2025-07-30 | cmd/compile: move mips32 over to new bounds check strategy | Keith Randall
Change-Id: Ied54ea7bf68c4c943c621ca059aca1048903c041 Reviewed-on: https://go-review.googlesource.com/c/go/+/682497 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Julian Zhu <jz531210@gmail.com> Reviewed-by: Mark Freeman <mark@golang.org>
2025-07-29 | cmd/compile: removing log2uint32 function | Cuong Manh Le
Just using isUnsignedPowerOfTwo and log32u is enough. Change-Id: I93d49ab71c6245d05f6507adbcb9ef2a696e75d6 Reviewed-on: https://go-review.googlesource.com/c/go/+/691476 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: David Chase <drchase@google.com>
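As a rough illustration of why the two remaining helpers suffice, the sketch below pairs a power-of-two test with a base-2 logarithm to turn a multiplication by a constant into a shift. The function bodies here are stand-ins written for this example; only the names come from the commit message, and the compiler's real definitions may differ.

```go
package main

import (
	"fmt"
	"math/bits"
)

// isUnsignedPowerOfTwo is an illustrative stand-in for the helper named in
// the commit message.
func isUnsignedPowerOfTwo(n uint32) bool { return n != 0 && n&(n-1) == 0 }

// log32u is an illustrative stand-in: floor(log2(n)) for n > 0.
func log32u(n uint32) int64 { return int64(bits.Len32(n)) - 1 }

// mulConst multiplies x by c, using a shift when c is a power of two.
func mulConst(x, c uint32) uint32 {
	if isUnsignedPowerOfTwo(c) {
		return x << uint(log32u(c))
	}
	return x * c
}

func main() {
	fmt.Println(log32u(1024), mulConst(7, 1024) == 7*1024)
}
```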
2025-05-21 | cmd/compile/internal: intrinsify publicationBarrier on mipsx | Julian Zhu
This enables publicationBarrier to be used as an intrinsic on mipsx. Change-Id: Ic199f34b84b3058bcfab79aac8f2399ff21a97ce Reviewed-on: https://go-review.googlesource.com/c/go/+/674856 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com>
2025-05-19 | cmd/compile: fold negation into addition/subtraction on mipsx | Julian Zhu
Fold negation into addition/subtraction and avoid double negation.

file         before      after       Δ       %
addr2line    3742022     3741986     -36     -0.001%
asm          6668616     6668628     +12     +0.000%
buildid      3583786     3583630     -156    -0.004%
cgo          6020370     6019634     -736    -0.012%
compile      29416016    29417336    +1320   +0.004%
cover        6801903     6801675     -228    -0.003%
dist         4485916     4485816     -100    -0.002%
doc          10652787    10652251    -536    -0.005%
fix          4115988     4115560     -428    -0.010%
link         9002328     9001616     -712    -0.008%
nm           3733148     3732780     -368    -0.010%
objdump      6163292     6163068     -224    -0.004%
pack         2944768     2944604     -164    -0.006%
pprof        18909973    18908773    -1200   -0.006%
test2json    3394662     3394778     +116    +0.003%
trace        17350911    17349751    -1160   -0.007%
vet          10077727    10077527    -200    -0.002%
go           19118769    19118609    -160    -0.001%
total        166182982   166178022   -4960   -0.003%

Change-Id: Id55698800fd70f3cb2ff48393584456b87208921
Reviewed-on: https://go-review.googlesource.com/c/go/+/673556
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
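The folds rest on a few identities that hold under 32-bit wraparound arithmetic. The sketch below only demonstrates those identities in Go; the function names are hypothetical and the exact rule set added by the CL is not shown in this log.

```go
package main

import "fmt"

// Each function is written in the "before" form; the comment gives the
// folded form that needs no separate negate instruction.
func addNeg(x, y int32) int32 { return x + (-y) } // folds to x - y
func subNeg(x, y int32) int32 { return x - (-y) } // folds to x + y
func negSub(x, y int32) int32 { return -(x - y) } // folds to y - x
func negNeg(x int32) int32    { return -(-x) }    // folds to x

func main() {
	x, y := int32(10), int32(-2147483648)
	fmt.Println(addNeg(x, y) == x-y, subNeg(x, y) == x+y)
	fmt.Println(negSub(x, y) == y-x, negNeg(y) == y)
}
```

The identities hold even for the most negative int32, because both sides wrap modulo 2^32 in the same way.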
2025-02-27 | cmd/compile: simplify intrinsification of TrailingZeros16 and TrailingZeros8 | Joel Sing
Decompose Ctz16 and Ctz8 within the SSA rules for LOONG64, MIPS, PPC64 and S390X, rather than having a custom intrinsic. Note that for PPC64 this actually allows the existing Ctz16 and Ctz8 rules to be used. Change-Id: I27a5e978f852b9d75396d2a80f5d7dfcb5ef7dd4 Reviewed-on: https://go-review.googlesource.com/c/go/+/651816 Reviewed-by: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Joel Sing <joel@sing.id.au> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>
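One common way to decompose a narrow count-trailing-zeros into a 32-bit one is to set a guard bit just above the narrow value, so that a zero input yields the narrow width. The sketch below shows that idea with `math/bits`; it is an assumption that the SSA rules use this particular form, and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// ctz16 computes TrailingZeros16 via a 32-bit count. The guard bit at
// position 16 makes the result 16 when x is zero.
func ctz16(x uint16) int { return bits.TrailingZeros32(uint32(x) | 1<<16) }

// ctz8 does the same for 8-bit values with a guard bit at position 8.
func ctz8(x uint8) int { return bits.TrailingZeros32(uint32(x) | 1<<8) }

func main() {
	mismatches := 0
	for i := 0; i < 1<<16; i++ {
		if ctz16(uint16(i)) != bits.TrailingZeros16(uint16(i)) {
			mismatches++
		}
	}
	fmt.Println("mismatches:", mismatches)
}
```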
2025-02-26 | cmd/compile: simplify intrinsification of BitLen16 and BitLen8 | Joel Sing
Decompose BitLen16 and BitLen8 within the SSA rules for architectures that support BitLen32 or BitLen64, rather than having a custom intrinsic. Change-Id: Ie4188ce69d1021e63cec27a8e7418efb0714812b Reviewed-on: https://go-review.googlesource.com/c/go/+/651817 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Joel Sing <joel@sing.id.au> Reviewed-by: Michael Knyszek <mknyszek@google.com>
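For bit length the decomposition is simpler than for trailing zeros: zero-extending the narrow value leaves its bit length unchanged, so the 32-bit operation can be used as is. A small sketch with hypothetical helper names, assuming the rules take this zero-extension form:

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitLen16 computes the bit length of a 16-bit value through the 32-bit
// operation; the zero extension adds no significant bits.
func bitLen16(x uint16) int { return bits.Len32(uint32(x)) }

// bitLen8 does the same for 8-bit values.
func bitLen8(x uint8) int { return bits.Len32(uint32(x)) }

func main() {
	fmt.Println(bitLen16(0x1234) == bits.Len16(0x1234), bitLen8(0x7f) == bits.Len8(0x7f))
}
```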
2024-09-24 | cmd/compile: use generics for isPowerOfTwo predicates | khr@golang.org
Change-Id: I097b53e9f13de6ff6eb18ae2261842b097f26390 Reviewed-on: https://go-review.googlesource.com/c/go/+/615197 Auto-Submit: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2023-05-24 | cmd/compile: optimize math.Float32bits and math.Float32frombits on mipsx | Junxian Zhu
This CL uses MFC1/MTC1 instructions to move data between GPR and FPR instead of stores and loads to move float/int values.

goos: linux
goarch: mipsle
pkg: math
                        │   oldmathf    │              newmathf               │
                        │    sec/op     │    sec/op      vs base              │
Acos-4                    282.7n ± 0%     282.1n ± 0%    -0.18% (p=0.010 n=8)
Acosh-4                   450.8n ± 0%     450.9n ± 0%    ~ (p=0.699 n=8)
Asin-4                    272.6n ± 0%     272.1n ± 0%    ~ (p=0.050 n=8)
Asinh-4                   476.8n ± 0%     475.1n ± 0%    -0.35% (p=0.018 n=8)
Atan-4                    208.1n ± 0%     207.7n ± 0%    -0.17% (p=0.009 n=8)
Atanh-4                   448.8n ± 0%     448.7n ± 0%    -0.03% (p=0.014 n=8)
Atan2-4                   310.2n ± 0%     310.1n ± 0%    ~ (p=0.133 n=8)
Cbrt-4                    357.9n ± 0%     358.4n ± 0%    +0.11% (p=0.014 n=8)
Ceil-4                    203.8n ± 0%     204.7n ± 0%    +0.42% (p=0.008 n=8)
Compare-4                 21.12n ± 0%     22.09n ± 0%    +4.59% (p=0.000 n=8)
Compare32-4               19.105n ± 0%    6.022n ± 0%    -68.48% (p=0.000 n=8)
Copysign-4                33.17n ± 0%     33.15n ± 0%    ~ (p=0.795 n=8)
Cos-4                     385.2n ± 0%     384.8n ± 1%    ~ (p=0.112 n=8)
Cosh-4                    546.0n ± 0%     545.0n ± 0%    -0.17% (p=0.012 n=8)
Erf-4                     192.4n ± 0%     195.4n ± 1%    +1.59% (p=0.000 n=8)
Erfc-4                    187.8n ± 0%     192.7n ± 0%    +2.64% (p=0.000 n=8)
Erfinv-4                  221.8n ± 1%     219.8n ± 0%    -0.88% (p=0.000 n=8)
Erfcinv-4                 224.1n ± 1%     219.9n ± 0%    -1.87% (p=0.000 n=8)
Exp-4                     434.7n ± 0%     435.0n ± 0%    ~ (p=0.339 n=8)
ExpGo-4                   433.7n ± 0%     434.2n ± 0%    +0.13% (p=0.005 n=8)
Expm1-4                   243.0n ± 0%     242.9n ± 0%    ~ (p=0.103 n=8)
Exp2-4                    426.6n ± 0%     426.6n ± 0%    ~ (p=0.822 n=8)
Exp2Go-4                  425.6n ± 0%     425.5n ± 0%    ~ (p=0.377 n=8)
Abs-4                     8.033n ± 0%     8.029n ± 0%    ~ (p=0.065 n=8)
Dim-4                     18.07n ± 0%     18.07n ± 0%    ~ (p=0.051 n=8)
Floor-4                   151.6n ± 0%     151.6n ± 0%    ~ (p=0.450 n=8)
Max-4                     100.9n ± 8%     103.2n ± 2%    ~ (p=0.099 n=8)
Min-4                     116.4n ± 0%     116.4n ± 0%    ~ (p=0.467 n=8)
Mod-4                     959.6n ± 1%     950.9n ± 0%    -0.91% (p=0.006 n=8)
Frexp-4                   147.6n ± 0%     147.5n ± 0%    -0.07% (p=0.026 n=8)
Gamma-4                   482.7n ± 0%     478.2n ± 2%    -0.92% (p=0.000 n=8)
Hypot-4                   139.8n ± 1%     127.1n ± 8%    -9.12% (p=0.000 n=8)
HypotGo-4                 137.2n ± 7%     117.5n ± 2%    -14.39% (p=0.001 n=8)
Ilogb-4                   109.5n ± 0%     108.4n ± 1%    -1.05% (p=0.001 n=8)
J0-4                      1.304µ ± 0%     1.304µ ± 0%    ~ (p=0.853 n=8)
J1-4                      1.349µ ± 0%     1.331µ ± 0%    -1.33% (p=0.000 n=8)
Jn-4                      2.774µ ± 0%     2.750µ ± 0%    -0.87% (p=0.000 n=8)
Ldexp-4                   151.6n ± 0%     151.5n ± 0%    ~ (p=0.695 n=8)
Lgamma-4                  226.9n ± 0%     233.9n ± 0%    +3.09% (p=0.000 n=8)
Log-4                     407.6n ± 0%     407.4n ± 0%    ~ (p=0.340 n=8)
Logb-4                    121.5n ± 0%     121.5n ± 0%    -0.08% (p=0.042 n=8)
Log1p-4                   315.5n ± 0%     315.6n ± 0%    ~ (p=0.930 n=8)
Log10-4                   417.8n ± 0%     417.5n ± 0%    ~ (p=0.053 n=8)
Log2-4                    208.8n ± 0%     208.8n ± 0%    ~ (p=0.582 n=8)
Modf-4                    126.5n ± 0%     126.4n ± 0%    ~ (p=0.128 n=8)
Nextafter32-4             112.45n ± 0%    82.27n ± 0%    -26.84% (p=0.000 n=8)
Nextafter64-4             141.5n ± 0%     141.5n ± 0%    ~ (p=0.569 n=8)
PowInt-4                  754.0n ± 1%     754.6n ± 0%    ~ (p=0.279 n=8)
PowFrac-4                 1.608µ ± 1%     1.596µ ± 1%    ~ (p=0.661 n=8)
Pow10Pos-4                18.07n ± 0%     18.07n ± 0%    ~ (p=0.413 n=8)
Pow10Neg-4                17.08n ± 0%     18.07n ± 0%    +5.80% (p=0.000 n=8)
Round-4                   68.30n ± 0%     69.29n ± 0%    +1.45% (p=0.000 n=8)
RoundToEven-4             78.33n ± 0%     78.34n ± 0%    ~ (p=0.975 n=8)
Remainder-4               740.6n ± 1%     736.7n ± 0%    ~ (p=0.098 n=8)
Signbit-4                 18.08n ± 0%     18.07n ± 0%    ~ (p=0.546 n=8)
Sin-4                     389.4n ± 0%     389.5n ± 0%    ~ (p=0.451 n=8)
Sincos-4                  415.6n ± 0%     415.6n ± 0%    ~ (p=0.450 n=8)
Sinh-4                    607.0n ± 0%     590.8n ± 1%    -2.68% (p=0.000 n=8)
SqrtIndirect-4            8.034n ± 0%     8.030n ± 0%    ~ (p=0.487 n=8)
SqrtLatency-4             8.031n ± 0%     8.034n ± 0%    ~ (p=0.152 n=8)
SqrtIndirectLatency-4     8.032n ± 0%     8.032n ± 0%    ~ (p=0.818 n=8)
SqrtGoLatency-4           895.8n ± 0%     895.3n ± 0%    ~ (p=0.553 n=8)
SqrtPrime-4               5.405µ ± 0%     5.379µ ± 0%    -0.48% (p=0.000 n=8)
Tan-4                     405.6n ± 0%     405.7n ± 0%    ~ (p=0.980 n=8)
Tanh-4                    545.1n ± 0%     545.1n ± 0%    ~ (p=0.806 n=8)
Trunc-4                   146.5n ± 0%     146.6n ± 0%    ~ (p=0.380 n=8)
Y0-4                      1.308µ ± 0%     1.306µ ± 0%    ~ (p=0.071 n=8)
Y1-4                      1.311µ ± 0%     1.315µ ± 0%    +0.31% (p=0.000 n=8)
Yn-4                      2.737µ ± 0%     2.745µ ± 0%    +0.27% (p=0.000 n=8)
Float64bits-4             14.56n ± 0%     14.56n ± 0%    ~ (p=0.689 n=8)
Float64frombits-4         19.08n ± 0%     19.08n ± 0%    ~ (p=0.580 n=8)
Float32bits-4             13.050n ± 0%    5.019n ± 0%    -61.54% (p=0.000 n=8)
Float32frombits-4         13.060n ± 0%    4.016n ± 0%    -69.25% (p=0.000 n=8)
FMA-4                     608.5n ± 0%     586.1n ± 0%    -3.67% (p=0.000 n=8)
geomean                   185.5n          176.2n         -5.02%

Change-Id: Ibf91092ffe70104e6c5ec03bc76d51259818b9b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/494535
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2023-05-08 | math: optimize math.Abs on mipsx | Junxian Zhu
This commit optimizes the math.Abs implementation on mipsx. Tested on Loongson 3A2000.

goos: linux
goarch: mipsle
pkg: math
                        │    oldmath    │               newmath               │
                        │    sec/op     │    sec/op      vs base              │
Acos-4                    282.6n ± 0%     282.3n ± 0%    ~ (p=0.140 n=7)
Acosh-4                   506.1n ± 0%     451.8n ± 0%    -10.73% (p=0.001 n=7)
Asin-4                    272.3n ± 0%     272.2n ± 0%    ~ (p=0.808 n=7)
Asinh-4                   529.7n ± 0%     475.3n ± 0%    -10.27% (p=0.001 n=7)
Atan-4                    208.2n ± 0%     207.9n ± 0%    ~ (p=0.134 n=7)
Atanh-4                   503.4n ± 1%     449.7n ± 0%    -10.67% (p=0.001 n=7)
Atan2-4                   310.5n ± 0%     310.5n ± 0%    ~ (p=0.928 n=7)
Cbrt-4                    359.3n ± 0%     358.8n ± 0%    ~ (p=0.121 n=7)
Ceil-4                    203.9n ± 0%     204.0n ± 0%    ~ (p=0.600 n=7)
Compare-4                 23.11n ± 0%     23.11n ± 0%    ~ (p=0.702 n=7)
Compare32-4               19.09n ± 0%     19.12n ± 0%    ~ (p=0.070 n=7)
Copysign-4                33.20n ± 0%     34.02n ± 0%    +2.47% (p=0.001 n=7)
Cos-4                     422.5n ± 0%     385.4n ± 1%    -8.78% (p=0.001 n=7)
Cosh-4                    628.0n ± 0%     545.5n ± 0%    -13.14% (p=0.001 n=7)
Erf-4                     193.7n ± 2%     192.7n ± 1%    ~ (p=0.430 n=7)
Erfc-4                    192.8n ± 1%     193.0n ± 0%    ~ (p=0.245 n=7)
Erfinv-4                  220.7n ± 1%     221.5n ± 2%    ~ (p=0.272 n=7)
Erfcinv-4                 221.3n ± 1%     220.4n ± 2%    ~ (p=0.738 n=7)
Exp-4                     471.4n ± 0%     435.1n ± 0%    -7.70% (p=0.001 n=7)
ExpGo-4                   470.6n ± 0%     434.0n ± 0%    -7.78% (p=0.001 n=7)
Expm1-4                   243.1n ± 0%     243.4n ± 0%    ~ (p=0.417 n=7)
Exp2-4                    463.1n ± 0%     427.0n ± 0%    -7.80% (p=0.001 n=7)
Exp2Go-4                  462.4n ± 0%     426.2n ± 5%    -7.83% (p=0.001 n=7)
Abs-4                     37.000n ± 0%    8.039n ± 9%    -78.27% (p=0.001 n=7)
Dim-4                     18.09n ± 0%     18.11n ± 0%    ~ (p=0.094 n=7)
Floor-4                   151.9n ± 0%     151.8n ± 0%    ~ (p=0.190 n=7)
Max-4                     116.7n ± 1%     116.7n ± 1%    ~ (p=0.842 n=7)
Min-4                     116.6n ± 1%     116.6n ± 0%    ~ (p=0.464 n=7)
Mod-4                     1244.0n ± 0%    980.9n ± 0%    -21.15% (p=0.001 n=7)
Frexp-4                   199.0n ± 0%     146.7n ± 0%    -26.28% (p=0.001 n=7)
Gamma-4                   516.4n ± 0%     479.3n ± 1%    -7.18% (p=0.001 n=7)
Hypot-4                   169.8n ± 0%     117.8n ± 2%    -30.62% (p=0.001 n=7)
HypotGo-4                 170.8n ± 0%     117.5n ± 0%    -31.21% (p=0.001 n=7)
Ilogb-4                   160.8n ± 0%     109.5n ± 0%    -31.90% (p=0.001 n=7)
J0-4                      1.359µ ± 0%     1.305µ ± 0%    -3.97% (p=0.001 n=7)
J1-4                      1.386µ ± 0%     1.334µ ± 0%    -3.75% (p=0.001 n=7)
Jn-4                      2.864µ ± 0%     2.758µ ± 0%    -3.70% (p=0.001 n=7)
Ldexp-4                   202.9n ± 0%     151.7n ± 0%    -25.23% (p=0.001 n=7)
Lgamma-4                  234.0n ± 0%     234.3n ± 0%    ~ (p=0.199 n=7)
Log-4                     444.1n ± 0%     407.9n ± 0%    -8.15% (p=0.001 n=7)
Logb-4                    157.8n ± 0%     121.6n ± 0%    -22.94% (p=0.001 n=7)
Log1p-4                   354.8n ± 0%     315.4n ± 0%    -11.10% (p=0.001 n=7)
Log10-4                   453.9n ± 0%     417.9n ± 0%    -7.93% (p=0.001 n=7)
Log2-4                    245.3n ± 0%     209.1n ± 0%    -14.76% (p=0.001 n=7)
Modf-4                    126.6n ± 0%     126.6n ± 0%    ~ (p=0.126 n=7)
Nextafter32-4             112.5n ± 0%     112.5n ± 0%    ~ (p=0.853 n=7)
Nextafter64-4             141.7n ± 0%     141.6n ± 0%    ~ (p=0.331 n=7)
PowInt-4                  878.8n ± 1%     758.3n ± 1%    -13.71% (p=0.001 n=7)
PowFrac-4                 1.809µ ± 0%     1.615µ ± 0%    -10.72% (p=0.001 n=7)
Pow10Pos-4                18.10n ± 0%     18.12n ± 0%    ~ (p=0.464 n=7)
Pow10Neg-4                17.09n ± 0%     17.09n ± 0%    ~ (p=0.263 n=7)
Round-4                   68.36n ± 0%     68.33n ± 0%    ~ (p=0.325 n=7)
RoundToEven-4             78.40n ± 0%     78.40n ± 0%    ~ (p=0.934 n=7)
Remainder-4               894.0n ± 1%     753.4n ± 1%    -15.73% (p=0.001 n=7)
Signbit-4                 18.09n ± 0%     18.09n ± 0%    ~ (p=0.761 n=7)
Sin-4                     389.8n ± 1%     389.8n ± 0%    ~ (p=0.995 n=7)
Sincos-4                  416.0n ± 0%     415.9n ± 0%    ~ (p=0.361 n=7)
Sinh-4                    634.6n ± 4%     585.6n ± 1%    -7.72% (p=0.001 n=7)
SqrtIndirect-4            8.035n ± 0%     8.036n ± 0%    ~ (p=0.523 n=7)
SqrtLatency-4             8.039n ± 0%     8.037n ± 0%    ~ (p=0.218 n=7)
SqrtIndirectLatency-4     8.040n ± 0%     8.040n ± 0%    ~ (p=0.652 n=7)
SqrtGoLatency-4           895.7n ± 0%     896.6n ± 0%    +0.10% (p=0.004 n=7)
SqrtPrime-4               5.406µ ± 0%     5.407µ ± 0%    ~ (p=0.592 n=7)
Tan-4                     406.1n ± 0%     405.8n ± 1%    ~ (p=0.435 n=7)
Tanh-4                    627.6n ± 0%     545.5n ± 0%    -13.08% (p=0.001 n=7)
Trunc-4                   146.7n ± 1%     146.7n ± 0%    ~ (p=0.755 n=7)
Y0-4                      1.359µ ± 0%     1.310µ ± 0%    -3.61% (p=0.001 n=7)
Y1-4                      1.351µ ± 0%     1.301µ ± 0%    -3.70% (p=0.001 n=7)
Yn-4                      2.829µ ± 0%     2.729µ ± 0%    -3.53% (p=0.001 n=7)
Float64bits-4             14.08n ± 0%     14.07n ± 0%    ~ (p=0.069 n=7)
Float64frombits-4         19.09n ± 0%     19.10n ± 0%    ~ (p=0.755 n=7)
Float32bits-4             13.06n ± 0%     13.07n ± 1%    ~ (p=0.586 n=7)
Float32frombits-4         13.06n ± 0%     13.06n ± 0%    ~ (p=0.853 n=7)
FMA-4                     606.9n ± 0%     606.8n ± 0%    ~ (p=0.393 n=7)
geomean                   201.1n          185.4n         -7.81%

Change-Id: I6d41a97ad3789ed5731588588859ac0b8b13b664
Reviewed-on: https://go-review.googlesource.com/c/go/+/484675
Reviewed-by: Rong Zhang <rongrong@oss.cipunited.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
2023-04-10 | cmd/compile: replace isSigned(t) with t.IsSigned() | Keith Randall
No change in semantics, just removing an unneeded helper. Also align rules a bit. Change-Id: Ie4dabb99392315a7700c645b3d0931eb8766a5fa Reviewed-on: https://go-review.googlesource.com/c/go/+/483439 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2023-04-10 | cmd/compile: clean up store rules to use store type, not argument type | Keith Randall
Argument type is dangerous because it may be thinner than the actual store being issued. Change-Id: Id19fbd8e6c41390a453994f897dd5048473136aa Reviewed-on: https://go-review.googlesource.com/c/go/+/483438 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com>
2023-02-17 | cmd/compile: ensure constant folding of pointer arithmetic remains a pointer | Keith Randall
For c + nil, we want the result to still be of pointer type. Fixes ppc64le build failure with CL 468455, in issue33724.go. The problem in that test is that it requires a nil check to be scheduled before the corresponding load. This normally happens fine because we prioritize nil checks. If we have nilcheck(p) and load(p), once p is scheduled the nil check will always go before the load. The issue we saw in 33724 is that when p is a nil pointer, we ended up with two different p's, an int64(0) as the argument to the nil check and an (*Outer)(0) as the argument to the load. Those two zeroes don't get CSEd, so if the (*Outer)(0) happens to get scheduled first, the load can end up before the nilcheck. Fix this by always having constant arithmetic preserve the pointerness of the value, so that both zeroes are of type *Outer and get CSEd. Update #58482 Update #33724 Change-Id: Ib9b8c0446f1690b574e0f3c0afb9934efbaf3513 Reviewed-on: https://go-review.googlesource.com/c/go/+/468615 Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> TryBot-Bypass: Keith Randall <khr@golang.org>
2023-01-19 | cmd/compile: add anchored version of SP | Keith Randall
The SPanchored opcode is identical to SP, except that it takes a memory argument so that it (and more importantly, anything that uses it) must be scheduled at or after that memory argument. This opcode ensures that a LEAQ of a variable gets scheduled after the corresponding VARDEF for that variable. This may lead to less CSE of LEAQ operations. The effect is very small. The go binary is only 80 bytes bigger after this CL. Usually LEAQs get folded into load/store operations, so the effect is only for pointerful types, large enough to need a duffzero, and have their address passed somewhere. Even then, usually the CSEd LEAQs will be un-CSEd because the two uses are on different sides of a function call and the LEAQ ends up being rematerialized at the second use anyway. Change-Id: Ib893562cd05369b91dd563b48fb83f5250950293 Reviewed-on: https://go-review.googlesource.com/c/go/+/452916 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Martin Möhrmann <martin@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2023-01-19 | cmd/compile/internal/ssa: generate code via a //go:generate directive | Dmitri Shuralyov
The standard way to generate code in a Go package is via //go:generate directives, which are invoked by the developer explicitly running: go generate import/path/of/said/package Switch to using that approach here. This way, developers don't need to learn and remember a custom way that each particular Go package may choose to implement its code generation. It also enables conveniences such as 'go generate -n' to discover how code is generated without running anything (this works on all packages that rely on //go:generate directives), being able to generate multiple packages at once and from any directory, and so on. Change-Id: I0e5b6a1edeff670a8e588befeef0c445613803c7 Reviewed-on: https://go-review.googlesource.com/c/go/+/460135 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2022-10-23 | cmd/internal/ssa: correct references to _gen folder | Johan Brandhorst-Satzkorn
The gen folder was renamed to _gen in CL 435472, but references in code and docs were not updated. This updates the references. Change-Id: Ibadc0cdcb5bed145c3257b58465a8df370487ae5 Reviewed-on: https://go-review.googlesource.com/c/go/+/444355 Reviewed-by: Bryan Mills <bcmills@google.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Johan Brandhorst-Satzkorn <johan.brandhorst@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2021-09-17 | cmd/compile: restore tail call for method wrappers | Cherry Mui
For certain types of method wrappers we used to generate a tail call. That was disabled in CL 307234 when register ABI is used, because with the current IR it was difficult to generate a tail call with the arguments in the right places. The problem was that the IR does not contain a CALL-like node with arguments; instead, it contains an OAS node that adjusts the receiver, then an OTAILCALL node that just contains the target, but no argument (with the assumption that the OAS node will put the adjusted receiver in the right place). With register ABI, putting arguments in registers is done in SSA. The assignment (OAS) doesn't put the receiver in register.

This CL changes the IR of a tail call to take an actual OCALL node. Specifically, a tail call is represented as

    OTAILCALL (OCALL target args...)

This way, the call target and args are connected through the OCALL node. So the call can be analyzed in SSA and the args can be passed in the right places.

(Alternatively, we could have the OTAILCALL node directly take the target and the args, without the OCALL node. Using an OCALL node is convenient as there is existing code that processes OCALL nodes, which does not need to be changed. Also, a tail call is similar to ORETURN (OCALL target args...), except it doesn't preserve the frame. I did the former but I'm open to change.)

The SSA representation is similar. Previously, the IR lowered to a Store of the receiver, then a BlockRetJmp which jumps to the target (without putting the arg in register). Now we use a TailCall op, which takes the target and the args. The call expansion pass and the register allocator handle TailCall pretty much like a StaticCall, and it will do the right ABI analysis and put the args in the right places. (Args other than the receiver are already in the right places. For register args it generates no code for them. For stack args currently it generates a self copy. I'll work on optimizing that out.)

BlockRetJmp is still used, signaling it is a tail call. The actual call is made in the TailCall op so BlockRetJmp generates no code (we could use BlockExit if we like).

This slightly reduces binary size:

              old        new
    cmd/go    14003088   13953936
    cmd/link  6275552    6271456

Change-Id: I2d16d8d419fe1f17554916d317427383e17e27f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/350145
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: David Chase <drchase@google.com>
2021-03-22 | cmd/compile: disallow rewrite rules from declaring reserved names | Daniel Martí
If I change a rule in ARM64.rules to use the variable name "b" in a conflicting way, rulegen would previously not complain, and the compiler would later give a confusing error:

    $ go run *.go && go build cmd/compile/internal/ssa
    # cmd/compile/internal/ssa
    ../rewriteARM64.go:24236:10: b.NewValue0 undefined (type int64 has no field or method NewValue0)

Make rulegen complain early about those cases. Sometimes they might happen to be harmless, but in general they can easily cause confusion or unintended effect due to shadowing.

After the change, with the same conflicting rule:

    $ go run *.go && go build cmd/compile/internal/ssa
    2021/03/22 11:31:49 rule ARM64.rules:495 uses the reserved name b
    exit status 1

Note that 24 existing rules were using reserved names. It seems like the shadowing was harmless, as it wasn't causing typechecking issues nor did it seem to cause unintended behavior when the rule rewrite code ran.

The bool values "b" were renamed "t", since that seems to have a precedent in other rules and in the fmt package. Sequential values like "a b c" were renamed to "x y z", since "b" is reserved.

Finally, "typ" was renamed to "_typ", since there doesn't seem to be an obviously better answer.

Passes all three of:

    $ GOARCH=amd64 go build -toolexec 'toolstash -cmp' -a std
    $ GOARCH=arm64 go build -toolexec 'toolstash -cmp' -a std
    $ GOARCH=mips64 go build -toolexec 'toolstash -cmp' -a std

Fixes #45154.

Change-Id: I1cce194dc7b477886a9c218c17973e996bcedccf
Reviewed-on: https://go-review.googlesource.com/c/go/+/303549
Trust: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-03-02 | cmd/compile: optimize single-precision floating point square root | fanzha02
Add a generic rule to rewrite the single-precision square root expression into one single-precision instruction. The optimization removes two precision conversions between double precision and single precision.

On the arm64 platform:

    previous:
      FCVTSD F0, F0
      FSQRTD F0, F0
      FCVTDS F0, F0
    optimized:
      FSQRTS S0, S0

This patch also adds a test case to check the correctness.

This patch refers to CL 241877, contributed by Alice Xu (dianhong.xu@arm.com)

Change-Id: I6de5d02281c693017ac4bd4c10963dd55989bd7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/276873
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
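For readers unfamiliar with the source pattern such a rule matches, the following sketch shows the usual Go idiom for a float32 square root: widen to float64, call `math.Sqrt`, and narrow the result. The function name `sqrt32` is hypothetical; whether a given target emits one single-precision instruction for it depends on the compiler version and architecture.

```go
package main

import (
	"fmt"
	"math"
)

// sqrt32 is the widen / sqrt / narrow pattern. A compiler that recognizes it
// can compute the root directly in single precision.
func sqrt32(x float32) float32 {
	return float32(math.Sqrt(float64(x)))
}

func main() {
	fmt.Println(sqrt32(9), sqrt32(0.25), sqrt32(2))
}
```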
2021-02-23 | cmd/compile: fold MOV*nop and MOV*const | Keith Randall
MOV*nop and MOV*reg seem superfluous. They are there to keep type information around that would otherwise get thrown away. Not sure what we need it for. I think our compiler needs a normalization of how types are represented in SSA, especially after lowering. MOV*nop gets in the way of some optimization rules firing, like for load combining. For now, just fold MOV*nop and MOV*const. It's certainly safe to do that, as the type info on the MOV*const isn't ever useful. R=go1.17 Change-Id: I3630a80afc2455a8e9cd9fde10c7abe05ddc3767 Reviewed-on: https://go-review.googlesource.com/c/go/+/276792 Trust: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-12-10 | cmd/compile: don't constant fold divide by zero | Keith Randall
It just makes the compiler crash. Oops. Fixes #43099 Change-Id: Id996c14799c1a5d0063ecae3b8770568161c2440 Reviewed-on: https://go-review.googlesource.com/c/go/+/276652 Trust: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
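The general shape of the fix can be sketched as a constant folder that refuses to evaluate a division whose divisor is zero, leaving the operation in place so it fails at run time instead of crashing the compiler. The function below is a hypothetical illustration of that guard, not the rewrite rule from the CL.

```go
package main

import "fmt"

// foldDiv32 tries to evaluate the constant division c / d ahead of time.
// It reports ok=false when d is zero, in which case the caller should keep
// the original division rather than fold it.
func foldDiv32(c, d int32) (q int32, ok bool) {
	if d == 0 {
		return 0, false
	}
	return c / d, true
}

func main() {
	fmt.Println(foldDiv32(7, 2))
	fmt.Println(foldDiv32(7, 0))
}
```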
2020-11-18 | cmd/compile: stop MOVW-ing -1 as SRA shift amount in mips | Alberto Donizetti
The shift amount in SRAconst needs to be in the [0,31] range, so stop MOVWing -1 to SRA in the Rsh lowering rules. Also see CL 270117. Passes $ GOARCH=mips go build -toolexec 'toolstash -cmp' -a std $ GOARCH=mipsle go build -toolexec 'toolstash -cmp' -a std Updates #42587 Change-Id: Ib5eb99b82310e404cc2d6f0c619b21b8a15406ce Reviewed-on: https://go-review.googlesource.com/c/go/+/270558 Trust: Alberto Donizetti <alb.donizetti@gmail.com> Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-11-16 | cmd/compile: mask SLL,SRL,SRAconst shift amount | Alberto Donizetti
mips SRA/SLL/SRL shift amounts are used mod 32; this change aligns the XXXconst rules to mask the shift amount by &31. Passes $ GOARCH=mips go build -toolexec 'toolstash -cmp' -a std $ GOARCH=mipsle go build -toolexec 'toolstash -cmp' -a std Fixes #42587 Change-Id: I6003ebd0bc500fba4cf6fb10254e1b557bf8c48f Reviewed-on: https://go-review.googlesource.com/c/go/+/270117 Trust: Alberto Donizetti <alb.donizetti@gmail.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
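The "mod 32" behaviour can be sketched in Go by masking the constant with `&31` before shifting. The function names are hypothetical; they model the machine instructions, not Go's own shift operators (which give 0 for oversized unsigned shifts).

```go
package main

import "fmt"

// sllMod32 and srlMod32 are hypothetical models of shift instructions that
// only look at the low five bits of the shift amount, so a constant c
// behaves like c & 31.
func sllMod32(x uint32, c int32) uint32 { return x << uint32(c&31) }
func srlMod32(x uint32, c int32) uint32 { return x >> uint32(c&31) }

func main() {
	fmt.Println(sllMod32(1, 33)) // same as shifting by 1
	fmt.Println(srlMod32(8, 35)) // same as shifting by 3
}
```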
2020-10-28 cmd/compile: rename mergeSymTyped to mergeSym (Alberto Donizetti)
Also make canMergeSym take Syms instead of interface{} Change-Id: I4926a1fc586aa90e198249d67e5b520404b40869 Reviewed-on: https://go-review.googlesource.com/c/go/+/265817 Trust: Alberto Donizetti <alb.donizetti@gmail.com> Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-10-27 cmd/compile: replace int32(b2i(x)) with b2i32(x) in rules (Alberto Donizetti)
Change-Id: I7fbb0c1ead6e29a7445c8ab43f7050947597f3e8 Reviewed-on: https://go-review.googlesource.com/c/go/+/265497 Trust: Alberto Donizetti <alb.donizetti@gmail.com> Reviewed-by: Keith Randall <khr@golang.org>
2020-10-27 cmd/compile: delete isPowerOfTwo, switch to isPowerOfTwo64 (Alberto Donizetti)
rewrite.go has two identical functions isPowerOfTwo and isPowerOfTwo64; the former has been there for a while, while the latter was added together with isPowerOfTwo{8,16,32} for use in typed rules. This change deletes isPowerOfTwo and switches to using isPowerOfTwo64 everywhere. Change-Id: If26c94565d2393fac6f0ba117ee7ee2fc915f7cd Reviewed-on: https://go-review.googlesource.com/c/go/+/265417 Trust: Alberto Donizetti <alb.donizetti@gmail.com> Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
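For readers unfamiliar with the predicate, a power-of-two test on a signed 64-bit value can be sketched as below. This is an illustrative stand-in named `isPow2`, not a copy of the helper in rewrite.go.

```go
package main

import "fmt"

// isPow2 is a hypothetical sketch of a power-of-two check. A positive n is
// a power of two exactly when it has a single bit set, in which case
// clearing the lowest set bit with n & (n-1) leaves zero.
func isPow2(n int64) bool {
	return n > 0 && n&(n-1) == 0
}

func main() {
	for _, n := range []int64{-4, 0, 1, 8, 12} {
		fmt.Println(n, isPow2(n))
	}
}
```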
2020-10-23 cmd/compile: intrinsify runtime/internal/atomic.{And,Or} on MIPS (Michael Pratt)
This one is trivial, as there are already 32-bit AND and OR ops used to implement the more complex 8-bit versions. Change-Id: Ic48a53ea291d0067ebeab8e96c82e054daf20ae7 Reviewed-on: https://go-review.googlesource.com/c/go/+/263149 Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-24 cmd/compile: CSE the RHS of rewrite rules (Josh Bleecher Snyder)
Keep track of all expressions encountered while generating a rewrite result, and re-use them whenever possible. Named expressions may still be used for clarity when desired. Change-Id: I640dca108763eb8baeff8f9a4169300af3445b82 Reviewed-on: https://go-review.googlesource.com/c/go/+/229800 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2020-04-24 cmd/compile: convert remaining mips rules to typed aux (Alberto Donizetti)
Passes GOARCH=mips gotip build -toolexec 'toolstash -cmp' -a std GOARCH=mipsle gotip build -toolexec 'toolstash -cmp' -a std Change-Id: I35df0522e299aa755491cd25f47f1f1bf447848c Reviewed-on: https://go-review.googlesource.com/c/go/+/229637 Reviewed-by: Keith Randall <khr@golang.org>
2020-04-22 cmd/compile: switch to typed aux for mips lowering rules (Alberto Donizetti)
This covers most of the lowering rules. Passes GOARCH=mips gotip build -toolexec 'toolstash -cmp' -a std GOARCH=mipsle gotip build -toolexec 'toolstash -cmp' -a std Change-Id: I9d00aaebecb36622e3bdaf556e5a9377670bf86b Reviewed-on: https://go-review.googlesource.com/c/go/+/229102 Reviewed-by: Keith Randall <khr@golang.org>
2020-04-07 cmd/compile: delete the floating point Greater and Geq ops (Michael Munday)
Extend CL 220417 (which removed the integer Greater and Geq ops) to floating point comparisons. Greater and Geq can always be implemented using Less and Leq. Fixes #37316. Change-Id: Ieaddb4877dd0ff9037a1dd11d0a9a9e45ced71e7 Reviewed-on: https://go-review.googlesource.com/c/go/+/222397 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
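The claim that Greater and Geq are expressible through Less and Leq amounts to swapping the operands. A small sketch, with hypothetical function names, shows that this also holds for NaN, where every ordered comparison is false.

```go
package main

import (
	"fmt"
	"math"
)

// greater and geq are hypothetical helpers that express x > y and x >= y
// using only < and <=, by swapping the operands.
func greater(x, y float64) bool { return y < x }
func geq(x, y float64) bool     { return y <= x }

// nan returns a NaN value for the demonstration below.
func nan() float64 { return math.NaN() }

func main() {
	fmt.Println(greater(2, 1), geq(1, 1), greater(nan(), 1), geq(1, nan()))
}
```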
2020-04-03 cmd/compile: add logging for large (>= 128 byte) copies (David Chase)
For 1.15, unless someone really wants it in 1.14. A performance-sensitive user thought this would be useful, though "large" was not well-defined. If 128 is large, there are 139 static instances of "large" copies in the compiler itself. Includes test. Change-Id: I81f20c62da59d37072429f3a22c1809e6fb2946d Reviewed-on: https://go-review.googlesource.com/c/go/+/205066 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-02 cmd/compile: make pre-elimination of rulegen bounds checks more precise (Josh Bleecher Snyder)
In cases in which we had a named value whose args were all _, like this rule from ARM.rules: (MOVBUreg x:(MOVBUload _ _)) -> (MOVWreg x) We previously inserted _ = x.Args[1] even though it is unnecessary. This change eliminates this pointless bounds check. And in other cases, we now check bounds just as far as strictly necessary. No significant movement on any compiler metrics. Just nicer (and less) code. Passes toolstash-check -all. Change-Id: I075dfe9f926cc561cdc705e9ddaab563164bed3a Reviewed-on: https://go-review.googlesource.com/c/go/+/221781 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-03-02 cmd/compile: add streamlined Block Reset+AddControl routines (Josh Bleecher Snyder)
For use in rewrite rules. Shrinks cmd/compile: compile 20082104 19967416 -114688 -0.571% Passes toolstash-check -all. Change-Id: Ic856508b27ec5b7fb9b6ca63e955a7139ae7dc30 Reviewed-on: https://go-review.googlesource.com/c/go/+/221780 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-03-02 cmd/compile: add specialized Value reset for OpCopy (Josh Bleecher Snyder)
This:
* Simplifies and shortens the generated code for rewrite rules.
* Shrinks cmd/compile by 86k (0.4%) and makes it easier to compile.
* Removes the stmt boundary code wrangling from Value.reset, in favor of doing it in the one place where it actually does some work, namely the writebarrier pass. (This was ascertained by inspecting the code for cases in which notStmtBoundary values were generated.)
Passes toolstash-check -all. Change-Id: I25671d4c4bbd772f235195d11da090878ea2cc07 Reviewed-on: https://go-review.googlesource.com/c/go/+/221421 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-03-01 cmd/compile: add specialized AddArgN functions for rewrite rules (Josh Bleecher Snyder)
This shrinks the compiler without impacting performance. (The performance-sensitive part of rewrite rules is the non-match case.) Passes toolstash-check -all. Executable size: file before after Δ % compile 20356168 20163960 -192208 -0.944% total 115599376 115407168 -192208 -0.166% Text size: file before after Δ % cmd/compile/internal/ssa.s 3928309 3778774 -149535 -3.807% total 18862943 18713408 -149535 -0.793% Memory allocated compiling package SSA: SSA 12.7M ± 0% 12.5M ± 0% -1.74% (p=0.008 n=5+5) Compiler speed impact: name old time/op new time/op delta Template 211ms ± 1% 211ms ± 2% ~ (p=0.832 n=49+49) Unicode 82.8ms ± 2% 83.2ms ± 2% +0.44% (p=0.022 n=46+49) GoTypes 726ms ± 1% 728ms ± 2% ~ (p=0.076 n=46+48) Compiler 3.39s ± 2% 3.40s ± 2% ~ (p=0.633 n=48+49) SSA 7.71s ± 1% 7.65s ± 1% -0.78% (p=0.000 n=45+44) Flate 134ms ± 1% 134ms ± 1% ~ (p=0.195 n=50+49) GoParser 167ms ± 1% 167ms ± 1% ~ (p=0.390 n=47+47) Reflect 453ms ± 3% 452ms ± 2% ~ (p=0.492 n=48+49) Tar 184ms ± 3% 184ms ± 2% ~ (p=0.862 n=50+48) XML 248ms ± 2% 248ms ± 2% ~ (p=0.096 n=49+47) [Geo mean] 415ms 415ms -0.03% name old user-time/op new user-time/op delta Template 273ms ± 1% 273ms ± 2% ~ (p=0.711 n=48+48) Unicode 117ms ± 6% 117ms ± 5% ~ (p=0.633 n=50+50) GoTypes 972ms ± 2% 974ms ± 1% +0.29% (p=0.016 n=47+49) Compiler 4.46s ± 6% 4.51s ± 6% ~ (p=0.093 n=50+50) SSA 10.4s ± 1% 10.3s ± 2% -0.94% (p=0.000 n=45+50) Flate 166ms ± 2% 167ms ± 2% ~ (p=0.148 n=49+48) GoParser 202ms ± 1% 202ms ± 2% -0.28% (p=0.014 n=47+49) Reflect 594ms ± 2% 594ms ± 2% ~ (p=0.717 n=48+49) Tar 224ms ± 2% 224ms ± 2% ~ (p=0.805 n=50+49) XML 311ms ± 1% 310ms ± 1% ~ (p=0.177 n=49+48) [Geo mean] 537ms 537ms +0.01% Change-Id: I562b9f349b34ddcff01771769e6dbbc80604da7a Reviewed-on: https://go-review.googlesource.com/c/go/+/221237 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-02-29 cmd/compile: use correct types in phiopt (Josh Bleecher Snyder)
We try to preserve type correctness of generic ops. phiopt modified a bool to be an int without a conversion. Add a conversion. There are a few random fluctuations in the generated code as a result, but nothing noteworthy or systematic. No binary size changes.
file                          before    after     Δ     %
math.s                        35966     35961     -5    -0.014%
debug/dwarf.s                 108141    108147    +6    +0.006%
crypto/dsa.s                  6047      6044      -3    -0.050%
image/png.s                   42882     42885     +3    +0.007%
go/parser.s                   80281     80278     -3    -0.004%
cmd/internal/obj.s            115116    115113    -3    -0.003%
go/types.s                    322130    322118    -12   -0.004%
cmd/internal/obj/arm64.s      151679    151685    +6    +0.004%
go/internal/gccgoimporter.s   56487     56493     +6    +0.011%
cmd/test2json.s               1650      1647      -3    -0.182%
cmd/link/internal/loadelf.s   35442     35443     +1    +0.003%
cmd/go/internal/work.s        305039    305035    -4    -0.001%
cmd/link/internal/ld.s        544835    544834    -1    -0.000%
net/http.s                    558777    558774    -3    -0.001%
cmd/compile/internal/ssa.s    3926551   3926994   +443  +0.011%
cmd/compile/internal/gc.s     1552320   1552321   +1    +0.000%
total                         18862241  18862670  +429  +0.002%
Change-Id: I4289e773be6be534ea3f907d68f614441b8f9b46 Reviewed-on: https://go-review.googlesource.com/c/go/+/221607 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Keith Randall <khr@golang.org>
2020-02-26 cmd/compile: remove Greater* and Geq* generic integer ops (Michael Munday)
The generic Greater and Geq ops can always be replaced with the Less and Leq ops. This CL therefore removes them. This simplifies the compiler since it reduces the number of operations that need handling in both code and in rewrite rules. This will be especially true when adding control flow optimizations such as the integer-in-range optimizations in CL 165998. Change-Id: If0648b2b19998ac1bddccbf251283f3be4ec3040 Reviewed-on: https://go-review.googlesource.com/c/go/+/220417 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-02-24 cmd/compile: use ellipses in MIPS rules (Josh Bleecher Snyder)
Passes toolstash-check -all. Change-Id: I14db0acb9b531029c613fa31bc076928651b6448 Reviewed-on: https://go-review.googlesource.com/c/go/+/217007 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-02-21 cmd/compile: remove chunking of rewrite rules (Josh Bleecher Snyder)
We added chunking of rewrite rules to speed up compiling package SSA. This series of changes has significantly shrunk the number of rewrite rules, and they are no longer being added nearly as fast. Now that we are sharing v.Args across multiple rewrite rules, there is additional benefit to having more rules in a single function. Removing chunking now has an incidental impact on compiling package SSA, marginally speeds up other compilation, shrinks the cmd/compile binary, and simplifies the code. name old time/op new time/op delta Template 211ms ± 2% 210ms ± 2% -0.50% (p=0.000 n=91+97) Unicode 81.9ms ± 3% 81.8ms ± 3% ~ (p=0.179 n=96+91) GoTypes 731ms ± 2% 731ms ± 1% ~ (p=0.442 n=94+96) Compiler 3.43s ± 2% 3.41s ± 2% -0.36% (p=0.001 n=98+94) SSA 8.30s ± 2% 8.32s ± 2% +0.19% (p=0.034 n=94+95) Flate 135ms ± 2% 134ms ± 1% -0.30% (p=0.006 n=98+94) GoParser 167ms ± 1% 167ms ± 1% -0.22% (p=0.001 n=92+94) Reflect 453ms ± 2% 453ms ± 3% ~ (p=0.306 n=98+97) Tar 184ms ± 2% 183ms ± 2% -0.31% (p=0.012 n=94+94) XML 249ms ± 2% 248ms ± 1% -0.26% (p=0.002 n=96+92) [Geo mean] 419ms 418ms -0.21% name old user-time/op new user-time/op delta Template 273ms ± 2% 272ms ± 2% -0.46% (p=0.000 n=93+96) Unicode 116ms ± 4% 117ms ± 4% ~ (p=0.433 n=98+98) GoTypes 977ms ± 2% 977ms ± 1% ~ (p=0.971 n=92+99) Compiler 4.56s ± 6% 4.53s ± 6% ~ (p=0.081 n=100+100) SSA 11.1s ± 2% 11.1s ± 2% ~ (p=0.064 n=99+96) Flate 167ms ± 2% 167ms ± 1% -0.24% (p=0.004 n=95+96) GoParser 203ms ± 1% 203ms ± 2% -0.14% (p=0.049 n=96+97) Reflect 595ms ± 2% 595ms ± 2% ~ (p=0.544 n=95+92) Tar 225ms ± 2% 224ms ± 2% ~ (p=0.562 n=99+99) XML 312ms ± 2% 311ms ± 1% ~ (p=0.050 n=97+93) [Geo mean] 543ms 542ms -0.13% Change-Id: I8d34ab59f154b28f20c6f9e416b976bfce339baa Reviewed-on: https://go-review.googlesource.com/c/go/+/216220 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-02-21 cmd/compile: reduce bounds checks in generated rewrite rules (Josh Bleecher Snyder)
CL 213703 converted generated rewrite rules for commutative ops to use loops instead of duplicated code. However, it loaded args using expressions like v.Args[i] and v.Args[i^1], which the compiler could not eliminate bounds for (including with all outstanding prove CLs). Also, given a series of separate rewrite rules for the same op, we generated bounds checks for every rewrite rule, even though we were repeatedly loading the same set of args. This change reduces both sets of bounds checks. Instead of loading v.Args[i] and v.Args[i^1] for commutative loops, we now preload v.Args[0] and v.Args[1] into local variables, and then swap them (as needed) in the commutative loop post statement. And we now load all top level v.Args into local variables at the beginning of every rewrite rule function. The second optimization is the more significant, but the first helps a little, and they play together nicely from the perspective of generating the code. This does increase register pressure, but the reduced bounds checks more than compensate. Note that the vast majority of rewrite rules evaluated are not applied, so the prologue is the most important part of the rewrite rules. There is one subtle aspect to the new generated code. Because the top level v.Args are shared across rewrite rules, and rule evaluation can swap v_0 and v_1, v_0 and v_1 can end up being swapped from one rule to the next. That is OK, because any time a rule does not get applied, they will have been swapped exactly twice. Passes toolstash-check -all. 
name old time/op new time/op delta Template 213ms ± 2% 211ms ± 2% -0.85% (p=0.000 n=92+96) Unicode 83.5ms ± 2% 83.2ms ± 2% -0.41% (p=0.004 n=95+90) GoTypes 737ms ± 2% 733ms ± 2% -0.51% (p=0.000 n=91+94) Compiler 3.45s ± 2% 3.43s ± 2% -0.44% (p=0.000 n=99+100) SSA 8.54s ± 1% 8.32s ± 2% -2.56% (p=0.000 n=96+99) Flate 136ms ± 2% 135ms ± 1% -0.47% (p=0.000 n=96+96) GoParser 169ms ± 1% 168ms ± 1% -0.33% (p=0.000 n=96+93) Reflect 456ms ± 3% 455ms ± 3% ~ (p=0.261 n=95+94) Tar 186ms ± 2% 185ms ± 2% -0.48% (p=0.000 n=94+95) XML 251ms ± 1% 250ms ± 1% -0.51% (p=0.000 n=91+94) [Geo mean] 424ms 421ms -0.68% name old user-time/op new user-time/op delta Template 275ms ± 1% 274ms ± 2% -0.55% (p=0.000 n=95+98) Unicode 118ms ± 4% 118ms ± 4% ~ (p=0.642 n=98+90) GoTypes 983ms ± 1% 980ms ± 1% -0.30% (p=0.000 n=93+93) Compiler 4.56s ± 6% 4.52s ± 6% -0.72% (p=0.003 n=100+100) SSA 11.4s ± 1% 11.1s ± 1% -2.50% (p=0.000 n=96+97) Flate 168ms ± 1% 167ms ± 1% -0.49% (p=0.000 n=92+92) GoParser 204ms ± 1% 204ms ± 2% -0.27% (p=0.003 n=99+96) Reflect 599ms ± 2% 598ms ± 2% ~ (p=0.116 n=95+92) Tar 227ms ± 2% 225ms ± 2% -0.57% (p=0.000 n=95+98) XML 313ms ± 2% 312ms ± 1% -0.37% (p=0.000 n=89+95) [Geo mean] 547ms 544ms -0.61% file before after Δ % compile 21113112 21109016 -4096 -0.019% total 131704940 131700844 -4096 -0.003% Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956 Reviewed-on: https://go-review.googlesource.com/c/go/+/216218 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
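The preload-and-swap scheme described in this commit can be sketched with a toy SSA value type. Everything here (`Value`, `Op`, `matchAddConst`) is a simplified, hypothetical stand-in for the real generated code; only the shape of the loop, with the swap in the post statement, follows the description.

```go
package main

import "fmt"

// Op and Value are toy stand-ins for the compiler's SSA types.
type Op int

const (
	OpAdd Op = iota
	OpConst
	OpArg
)

type Value struct {
	Op     Op
	AuxInt int64
	Args   []*Value
}

// matchAddConst reports whether v has the shape (Add x (Const [c])) with
// the arguments in either order. The two arguments are loaded into locals
// once, and the loop's post statement swaps them, so the body never
// indexes v.Args again.
func matchAddConst(v *Value) (x *Value, c int64, ok bool) {
	if v.Op != OpAdd || len(v.Args) != 2 {
		return nil, 0, false
	}
	v_1 := v.Args[1]
	v_0 := v.Args[0]
	for _i0 := 0; _i0 <= 1; _i0, v_0, v_1 = _i0+1, v_1, v_0 {
		if v_1.Op != OpConst {
			continue // try the other argument order
		}
		return v_0, v_1.AuxInt, true
	}
	// After two failed iterations v_0 and v_1 have been swapped twice,
	// so they are back in their original order.
	return nil, 0, false
}

func main() {
	x := &Value{Op: OpArg}
	k := &Value{Op: OpConst, AuxInt: 7}
	_, c, ok := matchAddConst(&Value{Op: OpAdd, Args: []*Value{k, x}})
	fmt.Println(c, ok)
}
```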
2020-02-20 cmd/compile: use loops to handle commutative ops in rules (Josh Bleecher Snyder)
Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
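The "nested loops, one for each commutative op" idea can be sketched as follows, using the `Args[i]` / `Args[i^1]` indexing style. The types and the function `matchNestedAdd` are hypothetical simplifications, not the generated code itself.

```go
package main

import "fmt"

// Op and Value are toy stand-ins for the compiler's SSA types.
type Op int

const (
	OpAdd Op = iota
	OpConst
	OpArg
)

type Value struct {
	Op     Op
	AuxInt int64
	Args   []*Value
}

// matchNestedAdd matches (Add (Const [c]) (Add (Const [d]) x)) where both
// Adds are commutative. Each commutative op gets its own loop, and each
// loop tries both argument orderings instead of relying on one generated
// rule per permutation.
func matchNestedAdd(v *Value) (x *Value, c, d int64, ok bool) {
	if v.Op != OpAdd || len(v.Args) != 2 {
		return nil, 0, 0, false
	}
	for i := 0; i <= 1; i++ {
		outerConst, inner := v.Args[i], v.Args[i^1]
		if outerConst.Op != OpConst || inner.Op != OpAdd || len(inner.Args) != 2 {
			continue
		}
		for j := 0; j <= 1; j++ {
			innerConst, rest := inner.Args[j], inner.Args[j^1]
			if innerConst.Op != OpConst {
				continue
			}
			return rest, outerConst.AuxInt, innerConst.AuxInt, true
		}
	}
	return nil, 0, 0, false
}

func main() {
	x := &Value{Op: OpArg}
	inner := &Value{Op: OpAdd, Args: []*Value{x, {Op: OpConst, AuxInt: 2}}}
	outer := &Value{Op: OpAdd, Args: []*Value{inner, {Op: OpConst, AuxInt: 1}}}
	_, c, d, ok := matchNestedAdd(outer)
	fmt.Println(c, d, ok)
}
```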
2019-10-29 cmd/compile: fix missing lowering of atomic {Load,Store}8 (Austin Clements)
CL 203284 added compiler intrinsics for atomic Load8 and Store8 on several architectures, but missed the lowering on MIPS. This CL fixes that. Updates #10958, #24543. Change-Id: I82e88971554fe8c33ad2bf195a633c44b9ac4cf7 Reviewed-on: https://go-review.googlesource.com/c/go/+/203977 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-10-07 cmd/compile: reduce amount of code generated for block rewrite rules (Michael Munday)
Add a Reset method to blocks that allows us to reduce the amount of code we generate for block rewrite rules. Thanks to Cherry for suggesting a similar fix to this in CL 196557. Compilebench result: name old time/op new time/op delta Template 211ms ± 1% 211ms ± 1% -0.30% (p=0.028 n=19+20) Unicode 83.7ms ± 3% 83.0ms ± 2% -0.79% (p=0.029 n=18+19) GoTypes 757ms ± 1% 755ms ± 1% -0.31% (p=0.034 n=19+19) Compiler 3.51s ± 1% 3.50s ± 1% -0.20% (p=0.013 n=18+18) SSA 11.7s ± 1% 11.7s ± 1% -0.38% (p=0.000 n=19+19) Flate 131ms ± 1% 130ms ± 1% -0.32% (p=0.024 n=18+18) GoParser 162ms ± 1% 162ms ± 1% ~ (p=0.059 n=20+18) Reflect 471ms ± 0% 470ms ± 0% -0.24% (p=0.045 n=20+17) Tar 187ms ± 1% 186ms ± 1% ~ (p=0.157 n=20+20) XML 255ms ± 1% 255ms ± 1% ~ (p=0.461 n=19+20) LinkCompiler 754ms ± 2% 755ms ± 2% ~ (p=0.919 n=17+17) ExternalLinkCompiler 2.82s ±16% 2.37s ±10% -15.94% (p=0.000 n=20+20) LinkWithoutDebugCompiler 439ms ± 4% 442ms ± 6% ~ (p=0.461 n=18+19) StdCmd 25.8s ± 2% 25.5s ± 1% -0.95% (p=0.000 n=20+20) name old user-time/op new user-time/op delta Template 240ms ± 8% 238ms ± 7% ~ (p=0.301 n=20+20) Unicode 107ms ±18% 104ms ±13% ~ (p=0.149 n=20+20) GoTypes 883ms ± 3% 888ms ± 2% ~ (p=0.211 n=20+20) Compiler 4.22s ± 1% 4.20s ± 1% ~ (p=0.077 n=20+18) SSA 14.1s ± 1% 14.1s ± 2% ~ (p=0.192 n=20+20) Flate 145ms ±10% 148ms ± 5% ~ (p=0.126 n=20+18) GoParser 186ms ± 7% 186ms ± 7% ~ (p=0.779 n=20+20) Reflect 538ms ± 3% 541ms ± 3% ~ (p=0.192 n=20+20) Tar 218ms ± 4% 217ms ± 6% ~ (p=0.835 n=19+20) XML 298ms ± 5% 298ms ± 5% ~ (p=0.749 n=19+20) LinkCompiler 818ms ± 5% 825ms ± 8% ~ (p=0.461 n=20+20) ExternalLinkCompiler 1.55s ± 4% 1.53s ± 5% ~ (p=0.063 n=20+18) LinkWithoutDebugCompiler 460ms ±12% 460ms ± 7% ~ (p=0.925 n=20+20) name old object-bytes new object-bytes delta Template 554kB ± 0% 554kB ± 0% ~ (all equal) Unicode 215kB ± 0% 215kB ± 0% ~ (all equal) GoTypes 2.01MB ± 0% 2.01MB ± 0% ~ (all equal) Compiler 7.97MB ± 0% 7.97MB ± 0% +0.00% (p=0.000 n=20+20) SSA 26.8MB ± 0% 26.9MB ± 0% +0.27% 
(p=0.000 n=20+20) Flate 340kB ± 0% 340kB ± 0% ~ (all equal) GoParser 434kB ± 0% 434kB ± 0% ~ (all equal) Reflect 1.34MB ± 0% 1.34MB ± 0% ~ (all equal) Tar 480kB ± 0% 480kB ± 0% ~ (all equal) XML 622kB ± 0% 622kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 20.4kB ± 0% 20.4kB ± 0% ~ (all equal) Unicode 8.21kB ± 0% 8.21kB ± 0% ~ (all equal) GoTypes 36.6kB ± 0% 36.6kB ± 0% ~ (all equal) Compiler 115kB ± 0% 115kB ± 0% +0.08% (p=0.000 n=20+20) SSA 141kB ± 0% 141kB ± 0% +0.07% (p=0.000 n=20+20) Flate 5.11kB ± 0% 5.11kB ± 0% ~ (all equal) GoParser 8.93kB ± 0% 8.93kB ± 0% ~ (all equal) Reflect 11.8kB ± 0% 11.8kB ± 0% ~ (all equal) Tar 10.9kB ± 0% 10.9kB ± 0% ~ (all equal) XML 17.4kB ± 0% 17.4kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 742kB ± 0% 742kB ± 0% ~ (all equal) CmdGoSize 10.7MB ± 0% 10.7MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.10MB ± 0% 1.10MB ± 0% ~ (all equal) CmdGoSize 14.9MB ± 0% 14.9MB ± 0% ~ (all equal) Change-Id: Ic89a8e62423b3d9fd9391159e0663acf450803b5 Reviewed-on: https://go-review.googlesource.com/c/go/+/198419 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2019-10-02 cmd/compile: allow multiple SSA block control values (Michael Munday)
Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 
534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: 
Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-09-23 cmd/compile: reduce rulegen's output by 200 KiB (Daniel Martí)
First, remove unnecessary "// cond:" lines from the generated files. This shaves off about 7k lines. Second, join "if cond { break }" statements via "||", which allows us to deduplicate a large number of them. This shaves off another ~25k lines. This change is not for readability or simplicity, but rather to avoid unnecessary verbosity that makes the generated files larger. All in all, git reports that the generated files overall weigh ~200KiB less, or about 2.7% less. While at it, add a -trace flag to rulegen. Updates #33644. Change-Id: I3fac0290a6066070cc62400bf970a4ae0929470a Reviewed-on: https://go-review.googlesource.com/c/go/+/196498 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
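The effect of joining consecutive "if cond { break }" statements can be sketched with two equivalent matchers. Both functions are hypothetical and use plain strings and integers instead of SSA values; they only illustrate the before and after shapes.

```go
package main

import "fmt"

// matchSeparate models the older output shape: one "if cond { break }"
// statement per condition.
func matchSeparate(op string, aux int64) bool {
	for {
		if op != "Add" {
			break
		}
		if aux != 0 {
			break
		}
		return true
	}
	return false
}

// matchJoined models the newer output shape: the same conditions joined
// with "||" into a single statement, which takes fewer lines.
func matchJoined(op string, aux int64) bool {
	for {
		if op != "Add" || aux != 0 {
			break
		}
		return true
	}
	return false
}

func main() {
	fmt.Println(matchSeparate("Add", 0), matchJoined("Add", 0))
	fmt.Println(matchSeparate("Sub", 0), matchJoined("Sub", 0))
}
```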
2019-08-27 cmd/compile: teach rulegen to remove unused decls (Daniel Martí)
First, add cpu and memory profiling flags, as these are useful to see where rulegen is spending its time. It now takes many seconds to run on a recent laptop, so we have to keep an eye on what it's doing. Second, stop writing '_ = var' lines to keep imports and variables used at all times. Now that rulegen removes all such unused names, they're unnecessary. To perform the removal, lean on go/types to first detect what names are unused. We can configure it to give us all the type-checking errors in a file, so we can collect all "declared but not used" errors in a single pass. We then use astutil.Apply to remove the relevant nodes based on the line information from each unused error. This allows us to apply the changes without having to do extra parser+printer roundtrips to plaintext, which are far too expensive. We need to do multiple such passes, as removing an unused variable declaration might then make another declaration unused. Two passes are enough to clean every file at the moment, so add a limit of three passes for now to avoid eating cpu uncontrollably by accident. The resulting performance of the changes above is a ~30% loss across the table, since go/types is fairly expensive. The numbers were obtained with 'benchcmd Rulegen go run *.go', which involves compiling rulegen itself, but that seems reflective of how the program is used. name old time/op new time/op delta Rulegen 5.61s ± 0% 7.36s ± 0% +31.17% (p=0.016 n=5+4) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 9.92s ± 1% +37.76% (p=0.016 n=5+4) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 169ms ±17% +25.66% (p=0.032 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 85.6MB ± 2% +20.56% (p=0.008 n=5+5) We can live with a bit more resource usage, but the time/op getting close to 10s isn't good. To win that back, introduce concurrency in main.go. 
This further increases resource usage a bit, but the real time on this quad-core laptop is greatly reduced. The final benchstat is as follows: name old time/op new time/op delta Rulegen 5.61s ± 0% 3.97s ± 1% -29.26% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 13.91s ± 1% +93.09% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 269ms ± 9% +99.17% (p=0.008 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 226.3MB ± 1% +218.72% (p=0.008 n=5+5) It might be possible to reduce the cpu or memory usage in the future, such as configuring go/types to do less work, or taking shortcuts to avoid having to run it many times. For now, ~2x cpu and ~4x memory usage seems like a fair trade for a faster and better rulegen. Finally, we can remove the old code that tried to remove some unused variables in a hacky and unmaintainable way. Change-Id: Iff9e83e3f253babf5a1bd48cc993033b8550cee6 Reviewed-on: https://go-review.googlesource.com/c/go/+/189798 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
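The "collect all type-checking errors in a single pass" step can be sketched with go/types as below. This is a minimal illustration, not rulegen's code: the function `unusedErrors`, the sample source, and the choice to filter messages by the substring "not used" (the exact wording differs between Go releases) are all assumptions.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"strings"
)

// src is a small sample with one unused variable (b).
const src = `package p

func f() int {
	a := 1
	b := 2
	return a
}
`

// unusedErrors type-checks src and returns the messages of all errors that
// mention an unused declaration. Setting Config.Error makes the checker
// report every error through the callback instead of stopping at the first.
func unusedErrors(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "p.go", src, 0)
	if err != nil {
		return nil, err
	}
	var msgs []string
	conf := types.Config{
		Error: func(err error) {
			if strings.Contains(err.Error(), "not used") {
				msgs = append(msgs, err.Error())
			}
		},
	}
	// The returned error is ignored on purpose: errors arrive via conf.Error.
	conf.Check("p", fset, []*ast.File{f}, nil)
	return msgs, nil
}

func main() {
	msgs, err := unusedErrors(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, m := range msgs {
		fmt.Println(m)
	}
}
```

Each message carries a file:line:column position, which is the kind of line information the commit describes using to locate and remove the offending nodes.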
2019-04-27 cmd/compile: add unsigned divisibility rules (Brian Kessler)
The "Division by invariant integers using multiplication" paper by Granlund and Montgomery contains a method for directly computing divisibility (x%c == 0 for c constant) by means of the modular inverse. The method is further elaborated in "Hacker's Delight" by Warren, Section 10-17. This general rule can compute divisibility by one multiplication and a compare for odd divisors and an additional rotate for even divisors. To apply the divisibility rule, we must take into account the rules to rewrite x%c = x-((x/c)*c) and (x/c) for c constant on the first optimization pass "opt". This complicates the matching as we want to match only in the cases where the result of (x/c) is not also available. So, we must match on the expanded form of (x/c) in the expression x == c*(x/c) in the "late opt" pass after common subexpression elimination. Note that if there is an intermediate opt pass introduced in the future we could simplify these rules by delaying the magic division rewrite to "late opt" and matching directly on (x/c) in the intermediate opt pass. Additional rules to lower the generic RotateLeft* ops were also applied. On amd64, the divisibility check is 25-50% faster. 
name old time/op new time/op delta DivconstI64-4 2.08ns ± 0% 2.08ns ± 1% ~ (p=0.881 n=5+5) DivisibleconstI64-4 2.67ns ± 0% 2.67ns ± 1% ~ (p=1.000 n=5+5) DivisibleWDivconstI64-4 2.67ns ± 0% 2.67ns ± 0% ~ (p=0.683 n=5+5) DivconstU64-4 2.08ns ± 1% 2.08ns ± 1% ~ (p=1.000 n=5+5) DivisibleconstU64-4 2.77ns ± 1% 1.55ns ± 2% -43.90% (p=0.008 n=5+5) DivisibleWDivconstU64-4 2.99ns ± 1% 2.99ns ± 1% ~ (p=1.000 n=5+5) DivconstI32-4 1.53ns ± 2% 1.53ns ± 0% ~ (p=1.000 n=5+5) DivisibleconstI32-4 2.23ns ± 0% 2.25ns ± 3% ~ (p=0.167 n=5+5) DivisibleWDivconstI32-4 2.27ns ± 1% 2.27ns ± 1% ~ (p=0.429 n=5+5) DivconstU32-4 1.78ns ± 0% 1.78ns ± 1% ~ (p=1.000 n=4+5) DivisibleconstU32-4 2.52ns ± 2% 1.26ns ± 0% -49.96% (p=0.000 n=5+4) DivisibleWDivconstU32-4 2.63ns ± 0% 2.85ns ±10% +8.29% (p=0.016 n=4+5) DivconstI16-4 1.54ns ± 0% 1.54ns ± 0% ~ (p=0.333 n=4+5) DivisibleconstI16-4 2.10ns ± 0% 2.10ns ± 1% ~ (p=0.571 n=4+5) DivisibleWDivconstI16-4 2.22ns ± 0% 2.23ns ± 1% ~ (p=0.556 n=4+5) DivconstU16-4 1.09ns ± 0% 1.01ns ± 1% -7.74% (p=0.000 n=4+5) DivisibleconstU16-4 1.83ns ± 0% 1.26ns ± 0% -31.52% (p=0.008 n=5+5) DivisibleWDivconstU16-4 1.88ns ± 0% 1.89ns ± 1% ~ (p=0.365 n=5+5) DivconstI8-4 1.54ns ± 1% 1.54ns ± 1% ~ (p=1.000 n=5+5) DivisibleconstI8-4 2.10ns ± 0% 2.11ns ± 0% ~ (p=0.238 n=5+4) DivisibleWDivconstI8-4 2.22ns ± 0% 2.23ns ± 2% ~ (p=0.762 n=5+5) DivconstU8-4 0.92ns ± 1% 0.94ns ± 1% +2.65% (p=0.008 n=5+5) DivisibleconstU8-4 1.66ns ± 0% 1.26ns ± 1% -24.28% (p=0.008 n=5+5) DivisibleWDivconstU8-4 1.79ns ± 0% 1.80ns ± 1% ~ (p=0.079 n=4+5) A follow-up change will address the signed division case. Updates #30282 Change-Id: I7e995f167179aa5c76bb10fbcbeb49c520943403 Reviewed-on: https://go-review.googlesource.com/c/go/+/168037 Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
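The multiply, rotate, and compare test described in this commit can be sketched for 32-bit unsigned values as follows. The function names and the Newton iteration used to obtain the modular inverse are illustrative assumptions; the compiler derives its constants in its own way.

```go
package main

import (
	"fmt"
	"math/bits"
)

// modInverse32 returns m such that c0*m == 1 (mod 2^32) for odd c0.
// Starting from m = c0 (correct to 3 bits, since c0*c0 == 1 mod 8), each
// Newton step doubles the number of correct low bits: 3, 6, 12, 24, 48.
func modInverse32(c0 uint32) uint32 {
	m := c0
	for i := 0; i < 4; i++ {
		m *= 2 - c0*m
	}
	return m
}

// divisible reports whether x%c == 0 for c > 0 without dividing x.
// Write c = c0 << k with c0 odd. Then x is divisible by c exactly when
// x*inverse(c0), rotated right by k bits, is at most (2^32-1)/c.
// For odd c the rotate amount is zero, leaving one multiply and a compare.
func divisible(x, c uint32) bool {
	k := bits.TrailingZeros32(c)
	m := modInverse32(c >> k)
	limit := ^uint32(0) / c // a compile-time constant when c is constant
	return bits.RotateLeft32(x*m, -k) <= limit
}

func main() {
	fmt.Println(divisible(12, 6), divisible(13, 6), divisible(999000, 999))
}
```

In a compiler setting the inverse, the rotate amount, and the limit are all constants, so only the multiply, the optional rotate, and the compare remain at run time.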