| Age | Commit message | Author |
|
These were broken by CL 721206, which changes Rsh to RshU for
positive inputs.
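
As a rough illustration of why a signed right shift (Rsh) can be replaced by an unsigned one (RshU) when the input is known to be non-negative, here is a minimal sketch. The function names are hypothetical and are not taken from the CL; the sketch only shows that the two shifts agree for non-negative values and differ for negative ones.

```go
package main

import "fmt"

// rshSigned performs an arithmetic (sign-extending) right shift.
func rshSigned(x int64, s uint) int64 {
	return x >> s
}

// rshUnsigned performs a logical (zero-filling) right shift on the same bits.
func rshUnsigned(x int64, s uint) int64 {
	return int64(uint64(x) >> s)
}

func main() {
	// For a non-negative input the sign bit is zero, so both shifts agree.
	fmt.Println(rshSigned(1024, 3) == rshUnsigned(1024, 3))
	// For a negative input the results differ, so the rewrite would be invalid.
	fmt.Println(rshSigned(-8, 1) == rshUnsigned(-8, 1))
}
```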
Change-Id: I9e38c3c428fb8aeb70cf51e7e76f4711c864f027
Reviewed-on: https://go-review.googlesource.com/c/go/+/723340
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Separate patterns in asmcheck by spaces instead of commas.
Many patterns end in comma (like "MOV [$]123,") so separating
patterns by comma is not great; they're already quoted, so spaces are fine.
Also replace all tabs in the assembly lines with spaces before matching.
Finally, replace \$ or \\$ with [$] as the matching idiom.
The effect of all these is to make the patterns look like:
// amd64:"BSFQ" "ORQ [$]256"
instead of the old:
// amd64:"BSFQ","ORQ\t\\$256"
Update all tests as well.
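
The matching idea described above can be sketched roughly as follows. This is an illustration only, not the actual asmcheck implementation: matchAsmLine is a hypothetical helper that replaces tabs with spaces and then applies the pattern as a regular expression, which is why "ORQ [$]256" can match an assembly line that contains a tab.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchAsmLine is a hypothetical helper (not the real asmcheck code).
// It replaces every tab in the assembly line with a space and then
// reports whether the regular expression pattern matches the result.
func matchAsmLine(pattern, asmLine string) bool {
	normalized := strings.ReplaceAll(asmLine, "\t", " ")
	return regexp.MustCompile(pattern).MatchString(normalized)
}

func main() {
	line := "ORQ\t$256, AX" // assumed shape of a line of compiler assembly output

	// New-style pattern: a plain space, and [$] for a literal dollar sign.
	fmt.Println(matchAsmLine(`ORQ [$]256`, line))
	// A pattern for a different instruction does not match this line.
	fmt.Println(matchAsmLine(`BSFQ`, line))
}
```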
Change-Id: Ia39febe5d7f67ba115846422789e11b185d5c807
Reviewed-on: https://go-review.googlesource.com/c/go/+/716060
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
|
|
Fixes #75479
Change-Id: I362d3e49090e94f91a840dd5a475978b59222a00
Reviewed-on: https://go-review.googlesource.com/c/go/+/704135
Reviewed-by: Mark Freeman <markfreeman@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
|
|
Change-Id: I4bee2770fedf97e35b5a5b9187a8ba3c41f9ec2e
Reviewed-on: https://go-review.googlesource.com/c/go/+/702697
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@google.com>
|
|
loong64
As in CL 633075, loong64 has a zero register (R0) that can be used to do this.
Change-Id: I846c6bdfcfd6dbfa18338afc13e34e350580ead4
Reviewed-on: https://go-review.googlesource.com/c/go/+/693876
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
Change-Id: I782f93510bba92ba60b298c1c1cde456c8bcec38
Reviewed-on: https://go-review.googlesource.com/c/go/+/697956
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
CL 622236 forgot to check that the mask was also a 32-bit rotate mask. Add
a modified version of isPPC64WordRotateMask which validates that the mask is
contiguous and fits inside a uint32.
I don't think this is possible when merging SRDconst; the first check should
always reject such combines. But be extra careful and do it there
too.
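
A simplified sketch of the kind of check described above. isContiguousMask32 is a hypothetical name, not the function added by this CL, and it deliberately handles only non-wrapping masks; the compiler's own rotate-mask check may accept other forms as well.

```go
package main

import "fmt"

// isContiguousMask32 is a hypothetical helper. It reports whether mask is a
// single non-wrapping run of one bits that fits entirely inside a uint32.
func isContiguousMask32(mask uint64) bool {
	if mask == 0 || mask > 0xFFFFFFFF {
		return false // empty, or has bits set above bit 31
	}
	// Fill in every bit below the lowest set bit. If the ones were
	// contiguous, the result has the form 0...01...1, so adding one
	// yields a value that shares no bits with it.
	filled := mask | (mask - 1)
	return filled&(filled+1) == 0
}

func main() {
	fmt.Println(isContiguousMask32(0x0000FF00))  // contiguous, fits in 32 bits
	fmt.Println(isContiguousMask32(0x00000F0F))  // two separate runs
	fmt.Println(isContiguousMask32(0x100000000)) // does not fit in a uint32
}
```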
Fixes #73153
Change-Id: Ie95f74ec5e7d89dc761511126db814f886a7a435
Reviewed-on: https://go-review.googlesource.com/c/go/+/679775
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
-N+1 <= x % N <= N-1
This is useful for cases like:
func setBit(b []byte, i int) {
b[i/8] |= 1<<(i%8)
}
The shift does not need protection against larger-than-7 cases.
(It does still need protection against <0 cases.)
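
The bound can be checked empirically with a small sketch. remInRange is a hypothetical helper written for this illustration; setBit is the function from the message above.

```go
package main

import "fmt"

// remInRange is a hypothetical helper that checks the bound
// -n+1 <= x % n <= n-1 for a positive divisor n.
func remInRange(x, n int) bool {
	r := x % n
	return -n+1 <= r && r <= n-1
}

// setBit is the example from the commit message: because i%8 is at most 7,
// the shift never needs a guard against counts larger than 7.
// A negative i would still be a problem, as the message notes.
func setBit(b []byte, i int) {
	b[i/8] |= 1 << (i % 8)
}

func main() {
	ok := true
	for x := -100; x <= 100; x++ {
		ok = ok && remInRange(x, 8)
	}
	fmt.Println(ok)

	b := make([]byte, 2)
	setBit(b, 10) // sets bit 2 of b[1]
	fmt.Printf("%08b %08b\n", b[0], b[1])
}
```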
Change-Id: Idf83101386af538548bfeb6e2928cea855610ce2
Reviewed-on: https://go-review.googlesource.com/c/go/+/672995
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
This adds tests for type conversion and shifts, detailing various
cases of poor code generation that currently exist for riscv64. These
will be addressed in future CLs.
Change-Id: Ie1d366dfe878832df691600f8500ef383da92848
Reviewed-on: https://go-review.googlesource.com/c/go/+/615678
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
Tests that exist for riscv64/rva22u64 should also be applied to
riscv64/rva23u64.
Change-Id: Ia529fdf0ac55b8bcb3dcd24fa80efef2351f3842
Reviewed-on: https://go-review.googlesource.com/c/go/+/652315
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Use the shiftIsBounded function to generate more efficient shift instructions.
This change also optimizes shift ops when the shift value is v&63 or v&31.
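
For context, a minimal sketch of the source-level difference between a bounded and an unbounded shift. The function names are hypothetical. In Go, shifting a uint64 by 64 or more yields 0, so an unbounded shift needs extra handling of large counts, while a count masked with 63 is always in range.

```go
package main

import "fmt"

// shlUnbounded shifts by an arbitrary count. Go defines the result as 0
// once s reaches 64, which the generated code has to guarantee.
func shlUnbounded(x uint64, s uint) uint64 {
	return x << s
}

// shlMasked shifts by s&63, so the count is always in the range 0..63
// and no handling of oversized counts is needed.
func shlMasked(x uint64, s uint) uint64 {
	return x << (s & 63)
}

func main() {
	fmt.Println(shlUnbounded(1, 65)) // count is out of range for a uint64
	fmt.Println(shlMasked(1, 65))    // 65&63 == 1, so this shifts by one
}
```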
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000-HV @ 2500.00MHz
| CL 627855 | this CL |
| sec/op | sec/op vs base |
LeadingZeros 1.1005n ± 0% 0.8425n ± 1% -23.44% (p=0.000 n=10)
LeadingZeros8 1.502n ± 0% 1.501n ± 0% -0.07% (p=0.001 n=10)
LeadingZeros16 1.502n ± 0% 1.501n ± 0% -0.07% (p=0.000 n=10)
LeadingZeros32 0.9511n ± 0% 0.8050n ± 0% -15.36% (p=0.000 n=10)
LeadingZeros64 1.1195n ± 0% 0.8423n ± 0% -24.76% (p=0.000 n=10)
TrailingZeros 0.8086n ± 0% 0.8005n ± 0% -1.00% (p=0.000 n=10)
TrailingZeros8 1.031n ± 1% 1.035n ± 1% ~ (p=0.136 n=10)
TrailingZeros16 0.8114n ± 0% 0.8254n ± 1% +1.73% (p=0.000 n=10)
TrailingZeros32 0.8090n ± 0% 0.8005n ± 0% -1.05% (p=0.000 n=10)
TrailingZeros64 0.8089n ± 1% 0.8005n ± 0% -1.04% (p=0.000 n=10)
OnesCount 0.8677n ± 0% 1.2010n ± 0% +38.41% (p=0.000 n=10)
OnesCount8 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.000 n=10)
OnesCount16 0.9344n ± 0% 1.2010n ± 0% +28.53% (p=0.000 n=10)
OnesCount32 0.8677n ± 0% 1.2010n ± 0% +38.41% (p=0.000 n=10)
OnesCount64 1.2010n ± 0% 0.8671n ± 0% -27.80% (p=0.000 n=10)
RotateLeft 0.8009n ± 0% 0.6671n ± 0% -16.71% (p=0.000 n=10)
RotateLeft8 1.202n ± 0% 1.327n ± 0% +10.40% (p=0.000 n=10)
RotateLeft16 0.8036n ± 0% 0.8218n ± 0% +2.26% (p=0.000 n=10)
RotateLeft32 0.6674n ± 0% 0.8004n ± 0% +19.94% (p=0.000 n=10)
RotateLeft64 0.6674n ± 0% 0.8004n ± 0% +19.94% (p=0.000 n=10)
Reverse 0.4067n ± 1% 0.4122n ± 1% +1.38% (p=0.001 n=10)
Reverse8 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.000 n=10)
Reverse16 0.8009n ± 0% 0.8005n ± 0% -0.05% (p=0.000 n=10)
Reverse32 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.001 n=10)
Reverse64 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.008 n=10)
ReverseBytes 0.4057n ± 1% 0.4133n ± 1% +1.90% (p=0.000 n=10)
ReverseBytes16 0.8009n ± 0% 0.8004n ± 0% -0.07% (p=0.000 n=10)
ReverseBytes32 0.8009n ± 0% 0.8005n ± 0% -0.05% (p=0.000 n=10)
ReverseBytes64 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.000 n=10)
Add 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Add32 1.201n ± 0% 1.201n ± 0% ~ (p=0.474 n=10)
Add64 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Add64multiple 1.832n ± 0% 1.828n ± 0% -0.22% (p=0.001 n=10)
Sub 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Sub32 1.602n ± 0% 1.601n ± 0% -0.06% (p=0.000 n=10)
Sub64 1.201n ± 0% 1.201n ± 0% ~ (p=0.474 n=10)
Sub64multiple 2.402n ± 0% 2.400n ± 0% -0.10% (p=0.000 n=10)
Mul 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.000 n=10)
Mul32 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.000 n=10)
Mul64 0.8008n ± 0% 0.8004n ± 0% -0.05% (p=0.000 n=10)
Div 9.083n ± 0% 7.638n ± 0% -15.91% (p=0.000 n=10)
Div32 4.011n ± 0% 4.009n ± 0% -0.05% (p=0.000 n=10)
Div64 9.711n ± 0% 8.204n ± 0% -15.51% (p=0.000 n=10)
geomean 1.083n 1.078n -0.40%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
| CL 627855 | this CL |
| sec/op | sec/op vs base |
LeadingZeros 1.341n ± 4% 1.331n ± 2% -0.71% (p=0.008 n=10)
LeadingZeros8 1.781n ± 0% 1.766n ± 1% -0.84% (p=0.011 n=10)
LeadingZeros16 1.782n ± 0% 1.767n ± 0% -0.79% (p=0.001 n=10)
LeadingZeros32 1.341n ± 1% 1.333n ± 0% -0.52% (p=0.001 n=10)
LeadingZeros64 1.338n ± 0% 1.333n ± 0% -0.37% (p=0.008 n=10)
TrailingZeros 0.9025n ± 0% 0.8077n ± 0% -10.50% (p=0.000 n=10)
TrailingZeros8 1.056n ± 0% 1.089n ± 1% +3.17% (p=0.001 n=10)
TrailingZeros16 1.101n ± 0% 1.102n ± 0% +0.09% (p=0.011 n=10)
TrailingZeros32 0.9024n ± 1% 0.8083n ± 0% -10.43% (p=0.000 n=10)
TrailingZeros64 0.9028n ± 1% 0.8087n ± 0% -10.43% (p=0.000 n=10)
OnesCount 1.482n ± 1% 1.302n ± 0% -12.15% (p=0.000 n=10)
OnesCount8 1.206n ± 0% 1.207n ± 2% +0.12% (p=0.000 n=10)
OnesCount16 1.534n ± 0% 1.402n ± 0% -8.58% (p=0.000 n=10)
OnesCount32 1.531n ± 1% 1.302n ± 0% -14.99% (p=0.000 n=10)
OnesCount64 1.302n ± 0% 1.538n ± 1% +18.16% (p=0.000 n=10)
RotateLeft 0.8083n ± 0% 0.8087n ± 1% ~ (p=0.579 n=10)
RotateLeft8 1.310n ± 0% 1.323n ± 0% +0.95% (p=0.001 n=10)
RotateLeft16 1.149n ± 0% 1.165n ± 1% +1.35% (p=0.001 n=10)
RotateLeft32 0.8093n ± 0% 0.8105n ± 0% ~ (p=0.393 n=10)
RotateLeft64 0.8088n ± 0% 0.8090n ± 0% ~ (p=0.739 n=10)
Reverse 0.5109n ± 0% 0.5172n ± 1% +1.25% (p=0.000 n=10)
Reverse8 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.000 n=10)
Reverse16 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.002 n=10)
Reverse32 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.000 n=10)
Reverse64 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.005 n=10)
ReverseBytes 0.5122n ± 2% 0.5182n ± 1% ~ (p=0.060 n=10)
ReverseBytes16 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.005 n=10)
ReverseBytes32 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.005 n=10)
ReverseBytes64 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.001 n=10)
Add 1.201n ± 4% 1.202n ± 0% +0.08% (p=0.028 n=10)
Add32 1.201n ± 0% 1.202n ± 2% +0.08% (p=0.014 n=10)
Add64 1.201n ± 1% 1.202n ± 0% +0.08% (p=0.025 n=10)
Add64multiple 1.902n ± 0% 1.913n ± 0% +0.55% (p=0.004 n=10)
Sub 1.201n ± 0% 1.202n ± 3% +0.08% (p=0.001 n=10)
Sub32 1.654n ± 0% 1.656n ± 1% ~ (p=0.117 n=10)
Sub64 1.201n ± 0% 1.202n ± 0% +0.08% (p=0.001 n=10)
Sub64multiple 2.180n ± 4% 2.159n ± 1% -0.96% (p=0.006 n=10)
Mul 0.9345n ± 0% 0.9346n ± 0% +0.01% (p=0.000 n=10)
Mul32 1.030n ± 0% 1.050n ± 1% +1.94% (p=0.000 n=10)
Mul64 0.9345n ± 0% 0.9346n ± 1% +0.01% (p=0.000 n=10)
Div 11.57n ± 1% 11.12n ± 0% -3.85% (p=0.000 n=10)
Div32 4.337n ± 1% 4.341n ± 1% ~ (p=0.286 n=10)
Div64 12.76n ± 0% 12.02n ± 3% -5.80% (p=0.000 n=10)
geomean 1.252n 1.235n -1.32%
Change-Id: Iec4cfd2b83bb0f946068c1d657369ff081d95b04
Reviewed-on: https://go-review.googlesource.com/c/go/+/628575
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000-HV @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
LeadingZeros 1.100n ± 1% 1.101n ± 0% ~ (p=0.566 n=10)
LeadingZeros8 1.501n ± 0% 1.502n ± 0% +0.07% (p=0.000 n=10)
LeadingZeros16 1.501n ± 0% 1.502n ± 0% +0.07% (p=0.000 n=10)
LeadingZeros32 1.2010n ± 0% 0.9511n ± 0% -20.81% (p=0.000 n=10)
LeadingZeros64 1.104n ± 1% 1.119n ± 0% +1.40% (p=0.000 n=10)
TrailingZeros 0.8137n ± 0% 0.8086n ± 0% -0.63% (p=0.001 n=10)
TrailingZeros8 1.031n ± 1% 1.031n ± 1% ~ (p=0.956 n=10)
TrailingZeros16 0.8204n ± 1% 0.8114n ± 0% -1.11% (p=0.000 n=10)
TrailingZeros32 0.8145n ± 0% 0.8090n ± 0% -0.68% (p=0.000 n=10)
TrailingZeros64 0.8159n ± 0% 0.8089n ± 1% -0.86% (p=0.000 n=10)
OnesCount 0.8672n ± 0% 0.8677n ± 0% +0.06% (p=0.000 n=10)
OnesCount8 0.8005n ± 0% 0.8009n ± 0% +0.06% (p=0.000 n=10)
OnesCount16 0.9339n ± 0% 0.9344n ± 0% +0.05% (p=0.000 n=10)
OnesCount32 0.8672n ± 0% 0.8677n ± 0% +0.06% (p=0.000 n=10)
OnesCount64 1.201n ± 0% 1.201n ± 0% ~ (p=0.474 n=10)
RotateLeft 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10)
RotateLeft8 1.202n ± 0% 1.202n ± 0% ~ (p=0.210 n=10)
RotateLeft16 0.8050n ± 0% 0.8036n ± 0% -0.17% (p=0.002 n=10)
RotateLeft32 0.6674n ± 0% 0.6674n ± 0% ~ (p=1.000 n=10)
RotateLeft64 0.6673n ± 0% 0.6674n ± 0% ~ (p=0.072 n=10)
Reverse 0.4123n ± 0% 0.4067n ± 1% -1.37% (p=0.000 n=10)
Reverse8 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10)
Reverse16 0.8004n ± 0% 0.8009n ± 0% +0.06% (p=0.000 n=10)
Reverse32 0.8004n ± 0% 0.8009n ± 0% +0.06% (p=0.000 n=10)
Reverse64 0.8004n ± 0% 0.8009n ± 0% +0.06% (p=0.001 n=10)
ReverseBytes 0.4100n ± 1% 0.4057n ± 1% -1.06% (p=0.002 n=10)
ReverseBytes16 0.8004n ± 0% 0.8009n ± 0% +0.07% (p=0.000 n=10)
ReverseBytes32 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10)
ReverseBytes64 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10)
Add 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Add32 1.201n ± 0% 1.201n ± 0% ~ (p=0.474 n=10)
Add64 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Add64multiple 1.831n ± 0% 1.832n ± 0% ~ (p=1.000 n=10)
Sub 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Sub32 1.601n ± 0% 1.602n ± 0% +0.06% (p=0.000 n=10)
Sub64 1.201n ± 0% 1.201n ± 0% ~ (p=0.474 n=10)
Sub64multiple 2.400n ± 0% 2.402n ± 0% +0.10% (p=0.000 n=10)
Mul 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10)
Mul32 0.8005n ± 0% 0.8009n ± 0% +0.05% (p=0.000 n=10)
Mul64 0.8004n ± 0% 0.8008n ± 0% +0.05% (p=0.000 n=10)
Div 9.107n ± 0% 9.083n ± 0% ~ (p=0.255 n=10)
Div32 4.009n ± 0% 4.011n ± 0% +0.05% (p=0.000 n=10)
Div64 9.705n ± 0% 9.711n ± 0% +0.06% (p=0.000 n=10)
geomean 1.089n 1.083n -0.62%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
LeadingZeros 1.352n ± 0% 1.341n ± 4% -0.81% (p=0.024 n=10)
LeadingZeros8 1.766n ± 0% 1.781n ± 0% +0.88% (p=0.000 n=10)
LeadingZeros16 1.766n ± 0% 1.782n ± 0% +0.88% (p=0.000 n=10)
LeadingZeros32 1.536n ± 0% 1.341n ± 1% -12.73% (p=0.000 n=10)
LeadingZeros64 1.351n ± 1% 1.338n ± 0% -0.96% (p=0.000 n=10)
TrailingZeros 0.9037n ± 0% 0.9025n ± 0% -0.12% (p=0.020 n=10)
TrailingZeros8 1.087n ± 3% 1.056n ± 0% ~ (p=0.060 n=10)
TrailingZeros16 1.101n ± 0% 1.101n ± 0% ~ (p=0.211 n=10)
TrailingZeros32 0.9040n ± 0% 0.9024n ± 1% -0.18% (p=0.017 n=10)
TrailingZeros64 0.9043n ± 0% 0.9028n ± 1% ~ (p=0.118 n=10)
OnesCount 1.503n ± 2% 1.482n ± 1% -1.43% (p=0.001 n=10)
OnesCount8 1.207n ± 0% 1.206n ± 0% -0.12% (p=0.000 n=10)
OnesCount16 1.501n ± 0% 1.534n ± 0% +2.13% (p=0.000 n=10)
OnesCount32 1.483n ± 1% 1.531n ± 1% +3.27% (p=0.000 n=10)
OnesCount64 1.301n ± 0% 1.302n ± 0% +0.08% (p=0.000 n=10)
RotateLeft 0.8136n ± 4% 0.8083n ± 0% -0.66% (p=0.002 n=10)
RotateLeft8 1.311n ± 0% 1.310n ± 0% ~ (p=0.786 n=10)
RotateLeft16 1.165n ± 0% 1.149n ± 0% -1.33% (p=0.001 n=10)
RotateLeft32 0.8138n ± 1% 0.8093n ± 0% -0.57% (p=0.017 n=10)
RotateLeft64 0.8149n ± 1% 0.8088n ± 0% -0.74% (p=0.000 n=10)
Reverse 0.5195n ± 1% 0.5109n ± 0% -1.67% (p=0.000 n=10)
Reverse8 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.000 n=10)
Reverse16 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.000 n=10)
Reverse32 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.012 n=10)
Reverse64 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.010 n=10)
ReverseBytes 0.5120n ± 1% 0.5122n ± 2% ~ (p=0.306 n=10)
ReverseBytes16 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.000 n=10)
ReverseBytes32 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.000 n=10)
ReverseBytes64 0.8007n ± 0% 0.8010n ± 0% +0.04% (p=0.000 n=10)
Add 1.201n ± 0% 1.201n ± 4% ~ (p=0.334 n=10)
Add32 1.201n ± 0% 1.201n ± 0% ~ (p=0.563 n=10)
Add64 1.201n ± 0% 1.201n ± 1% ~ (p=0.652 n=10)
Add64multiple 1.909n ± 0% 1.902n ± 0% ~ (p=0.126 n=10)
Sub 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Sub32 1.655n ± 0% 1.654n ± 0% ~ (p=0.589 n=10)
Sub64 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Sub64multiple 2.150n ± 0% 2.180n ± 4% +1.37% (p=0.000 n=10)
Mul 0.9341n ± 0% 0.9345n ± 0% +0.04% (p=0.011 n=10)
Mul32 1.053n ± 0% 1.030n ± 0% -2.23% (p=0.000 n=10)
Mul64 0.9341n ± 0% 0.9345n ± 0% +0.04% (p=0.018 n=10)
Div 11.59n ± 0% 11.57n ± 1% ~ (p=0.091 n=10)
Div32 4.337n ± 0% 4.337n ± 1% ~ (p=0.783 n=10)
Div64 12.81n ± 0% 12.76n ± 0% -0.39% (p=0.001 n=10)
geomean 1.257n 1.252n -0.46%
Change-Id: I9e93ea49736760c19dc6b6463d2aa95878121b7b
Reviewed-on: https://go-review.googlesource.com/c/go/+/627855
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
|
|
ADD(Q|L) generally has twice the throughput of a shift by one.
Came up in CL 626998.
Throughput by arch:
Zen 4:
SHLL (R64, 1): 0.5
ADD (R64, R64): 0.25
Intel Alder Lake:
SHLL (R64, 1): 0.5
ADD (R64, R64): 0.2
Intel Haswell:
SHLL (R64, 1): 0.5
ADD (R64, R64): 0.25
Also include a minor opt for:
(x + x) << c -> x << (c + 1)
Before this, the code:
func addShift(x int64) int64 {
return (x + x) << 1
}
emitted two instructions:
ADDQ AX, AX
SHLQ $1, AX
but we can do it in a single shift:
SHLQ $2, AX
Add a codegen test for clearing the last bit.
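
The identities behind these rewrites can be sanity-checked with a short sketch. singleShift and clearLastBit are hypothetical names used only for this illustration; addShift is the function from the message above.

```go
package main

import "fmt"

// addShift is the example from the commit message.
func addShift(x int64) int64 {
	return (x + x) << 1
}

// singleShift is the form the rewrite (x + x) << c -> x << (c + 1)
// produces for c == 1.
func singleShift(x int64) int64 {
	return x << 2
}

// clearLastBit clears the lowest bit with a shift pair; it is equivalent
// to x &^ 1.
func clearLastBit(x uint64) uint64 {
	return x >> 1 << 1
}

func main() {
	// Includes a value whose doubling wraps around; Go integer arithmetic
	// wraps, so both forms still agree.
	for _, x := range []int64{0, 1, -7, 12345, int64(1)<<62 + 3} {
		fmt.Println(x, addShift(x) == singleShift(x), x+x == x<<1)
	}
	fmt.Println(clearLastBit(7) == 7&^1)
}
```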
compilecmp linux/amd64:
math
math.sqrt 243 -> 242 (-0.41%)
math [cmd/compile]
math.sqrt 243 -> 242 (-0.41%)
runtime
runtime.selectgo 5455 -> 5445 (-0.18%)
runtime.sysargs 665 -> 662 (-0.45%)
runtime.isPinned 145 -> 141 (-2.76%)
runtime.atoi64 198 -> 194 (-2.02%)
runtime.setPinned 714 -> 709 (-0.70%)
runtime [cmd/compile]
runtime.sysargs 665 -> 662 (-0.45%)
runtime.setPinned 714 -> 709 (-0.70%)
runtime.atoi64 198 -> 194 (-2.02%)
runtime.isPinned 145 -> 141 (-2.76%)
strconv
strconv.computeBounds 109 -> 107 (-1.83%)
strconv.FormatInt 201 -> 197 (-1.99%)
strconv.ryuFtoaShortest 1298 -> 1266 (-2.47%)
strconv.small 144 -> 134 (-6.94%)
strconv.AppendInt 357 -> 344 (-3.64%)
strconv.ryuDigits32 490 -> 488 (-0.41%)
strconv.AppendUint 342 -> 340 (-0.58%)
strconv [cmd/compile]
strconv.FormatInt 201 -> 197 (-1.99%)
strconv.ryuFtoaShortest 1298 -> 1266 (-2.47%)
strconv.ryuDigits32 490 -> 488 (-0.41%)
strconv.AppendUint 342 -> 340 (-0.58%)
strconv.computeBounds 109 -> 107 (-1.83%)
strconv.small 144 -> 134 (-6.94%)
strconv.AppendInt 357 -> 344 (-3.64%)
image
image.Rectangle.Inset 101 -> 97 (-3.96%)
regexp/syntax
regexp/syntax.inCharClass.func1 111 -> 110 (-0.90%)
regexp/syntax.(*compiler).quest 586 -> 573 (-2.22%)
regexp/syntax.ranges.Less 153 -> 150 (-1.96%)
regexp/syntax.(*compiler).loop 583 -> 568 (-2.57%)
time
time.Time.Before 179 -> 161 (-10.06%)
time.Time.Compare 189 -> 166 (-12.17%)
time.Time.Sub 444 -> 425 (-4.28%)
time.Time.UnixMicro 106 -> 95 (-10.38%)
time.div 592 -> 587 (-0.84%)
time.Time.UnixNano 85 -> 78 (-8.24%)
time.(*Time).UnixMilli 141 -> 140 (-0.71%)
time.Time.UnixMilli 106 -> 95 (-10.38%)
time.(*Time).UnixMicro 141 -> 140 (-0.71%)
time.Time.After 179 -> 161 (-10.06%)
time.Time.Equal 170 -> 150 (-11.76%)
time.Time.AppendBinary 766 -> 757 (-1.17%)
time.Time.IsZero 74 -> 66 (-10.81%)
time.(*Time).UnixNano 124 -> 113 (-8.87%)
time.(*Time).IsZero 113 -> 108 (-4.42%)
regexp
regexp.(*Regexp).FindAllStringSubmatch.func1 590 -> 569 (-3.56%)
regexp.QuoteMeta 485 -> 469 (-3.30%)
regexp/syntax [cmd/compile]
regexp/syntax.inCharClass.func1 111 -> 110 (-0.90%)
regexp/syntax.(*compiler).loop 583 -> 568 (-2.57%)
regexp/syntax.(*compiler).quest 586 -> 573 (-2.22%)
regexp/syntax.ranges.Less 153 -> 150 (-1.96%)
encoding/base64
encoding/base64.decodedLen 92 -> 90 (-2.17%)
encoding/base64.(*Encoding).DecodedLen 99 -> 97 (-2.02%)
time [cmd/compile]
time.(*Time).IsZero 113 -> 108 (-4.42%)
time.Time.IsZero 74 -> 66 (-10.81%)
time.(*Time).UnixNano 124 -> 113 (-8.87%)
time.Time.UnixMilli 106 -> 95 (-10.38%)
time.Time.Equal 170 -> 150 (-11.76%)
time.Time.UnixMicro 106 -> 95 (-10.38%)
time.(*Time).UnixMicro 141 -> 140 (-0.71%)
time.Time.Before 179 -> 161 (-10.06%)
time.Time.UnixNano 85 -> 78 (-8.24%)
time.Time.AppendBinary 766 -> 757 (-1.17%)
time.div 592 -> 587 (-0.84%)
time.Time.After 179 -> 161 (-10.06%)
time.Time.Compare 189 -> 166 (-12.17%)
time.(*Time).UnixMilli 141 -> 140 (-0.71%)
time.Time.Sub 444 -> 425 (-4.28%)
index/suffixarray
index/suffixarray.sais_8_32 1677 -> 1645 (-1.91%)
index/suffixarray.sais_32 1677 -> 1645 (-1.91%)
index/suffixarray.sais_64 1677 -> 1654 (-1.37%)
index/suffixarray.sais_8_64 1677 -> 1654 (-1.37%)
index/suffixarray.writeInt 249 -> 247 (-0.80%)
os
os.Expand 1070 -> 1051 (-1.78%)
os.Chtimes 787 -> 774 (-1.65%)
regexp [cmd/compile]
regexp.(*Regexp).FindAllStringSubmatch.func1 590 -> 569 (-3.56%)
regexp.QuoteMeta 485 -> 469 (-3.30%)
encoding/base64 [cmd/compile]
encoding/base64.decodedLen 92 -> 90 (-2.17%)
encoding/base64.(*Encoding).DecodedLen 99 -> 97 (-2.02%)
encoding/hex
encoding/hex.Encode 138 -> 136 (-1.45%)
encoding/hex.(*decoder).Read 830 -> 824 (-0.72%)
crypto/des
crypto/des.initFeistelBox 235 -> 229 (-2.55%)
crypto/des.cryptBlock 549 -> 538 (-2.00%)
os [cmd/compile]
os.Chtimes 787 -> 774 (-1.65%)
os.Expand 1070 -> 1051 (-1.78%)
math/big
math/big.newFloat 238 -> 223 (-6.30%)
math/big.nat.mul 2138 -> 2122 (-0.75%)
math/big.karatsubaSqr 1372 -> 1369 (-0.22%)
math/big.(*Float).sqrtInverse 895 -> 878 (-1.90%)
math/big.basicSqr 1032 -> 1017 (-1.45%)
cmd/vendor/golang.org/x/sys/unix
cmd/vendor/golang.org/x/sys/unix.TimeToTimespec 72 -> 66 (-8.33%)
encoding/json
encoding/json.Indent 404 -> 403 (-0.25%)
encoding/json.MarshalIndent 303 -> 297 (-1.98%)
testing
testing.(*T).Deadline 84 -> 82 (-2.38%)
testing.(*M).Run 3545 -> 3525 (-0.56%)
archive/zip
archive/zip.headerFileInfo.ModTime 229 -> 223 (-2.62%)
encoding/gob
encoding/gob.(*encoderState).encodeInt 474 -> 469 (-1.05%)
crypto/elliptic
crypto/elliptic.Marshal 728 -> 714 (-1.92%)
debug/buildinfo
debug/buildinfo.readString 325 -> 315 (-3.08%)
image/png
image/png.(*decoder).readImagePass 10866 -> 10834 (-0.29%)
archive/tar
archive/tar.Header.allowedFormats.func3 1768 -> 1736 (-1.81%)
archive/tar.formatPAXTime 389 -> 358 (-7.97%)
archive/tar.(*Writer).writeGNUHeader 741 -> 727 (-1.89%)
archive/tar.readGNUSparseMap0x1 709 -> 695 (-1.97%)
archive/tar.(*Writer).templateV7Plus 915 -> 909 (-0.66%)
crypto/internal/cryptotest
crypto/internal/cryptotest.TestHash.func4 890 -> 879 (-1.24%)
crypto/internal/cryptotest.TestStream.func6.1 646 -> 645 (-0.15%)
crypto/internal/cryptotest.testCipher.func3 1300 -> 1289 (-0.85%)
internal/pkgbits
internal/pkgbits.(*Encoder).Int64 113 -> 103 (-8.85%)
internal/pkgbits.(*Encoder).rawVarint 74 -> 72 (-2.70%)
testing/quick
testing/quick.(*Config).getRand 316 -> 315 (-0.32%)
log/slog
log/slog.TimeValue 489 -> 479 (-2.04%)
runtime/pprof
runtime/pprof.(*profileBuilder).build 2341 -> 2322 (-0.81%)
internal/coverage/cfile
internal/coverage/cfile.(*emitState).openMetaFile 824 -> 822 (-0.24%)
internal/coverage/cfile.(*emitState).openCounterFile 904 -> 892 (-1.33%)
cmd/internal/objabi
cmd/internal/objabi.expandArgs 1177 -> 1169 (-0.68%)
crypto/ecdsa
crypto/ecdsa.pointFromAffine 1162 -> 1144 (-1.55%)
net
net.minNonzeroTime 313 -> 308 (-1.60%)
net.cgoLookupAddrPTR 812 -> 797 (-1.85%)
net.(*IPNet).String 851 -> 827 (-2.82%)
net.IP.AppendText 488 -> 471 (-3.48%)
net.IPMask.String 281 -> 270 (-3.91%)
net.partialDeadline 374 -> 366 (-2.14%)
net.hexString 249 -> 240 (-3.61%)
net.IP.String 454 -> 453 (-0.22%)
internal/fuzz
internal/fuzz.newPcgRand 240 -> 234 (-2.50%)
crypto/x509
crypto/x509.(*Certificate).isValid 2642 -> 2611 (-1.17%)
cmd/internal/obj/s390x
cmd/internal/obj/s390x.buildop 33676 -> 33644 (-0.10%)
encoding/hex [cmd/compile]
encoding/hex.(*decoder).Read 830 -> 824 (-0.72%)
encoding/hex.Encode 138 -> 136 (-1.45%)
cmd/internal/objabi [cmd/compile]
cmd/internal/objabi.expandArgs 1177 -> 1169 (-0.68%)
math/big [cmd/compile]
math/big.(*Float).sqrtInverse 895 -> 878 (-1.90%)
math/big.nat.mul 2138 -> 2122 (-0.75%)
math/big.karatsubaSqr 1372 -> 1369 (-0.22%)
math/big.basicSqr 1032 -> 1017 (-1.45%)
math/big.newFloat 238 -> 223 (-6.30%)
encoding/json [cmd/compile]
encoding/json.MarshalIndent 303 -> 297 (-1.98%)
encoding/json.Indent 404 -> 403 (-0.25%)
cmd/covdata
main.(*metaMerge).emitCounters 985 -> 973 (-1.22%)
runtime/pprof [cmd/compile]
runtime/pprof.(*profileBuilder).build 2341 -> 2322 (-0.81%)
cmd/compile/internal/syntax
cmd/compile/internal/syntax.(*source).fill 722 -> 703 (-2.63%)
cmd/dist
main.runInstall 19081 -> 19049 (-0.17%)
crypto/tls
crypto/tls.extractPadding 176 -> 175 (-0.57%)
slices.Clone[[]crypto/tls.SignatureScheme,crypto/tls.SignatureScheme] 253 -> 247 (-2.37%)
slices.Clone[[]uint16,uint16] 253 -> 247 (-2.37%)
slices.Clone[[]crypto/tls.CurveID,crypto/tls.CurveID] 253 -> 247 (-2.37%)
crypto/tls.(*Config).cipherSuites 335 -> 326 (-2.69%)
slices.DeleteFunc[go.shape.[]crypto/tls.CurveID,go.shape.uint16] 437 -> 434 (-0.69%)
crypto/tls.dial 1349 -> 1339 (-0.74%)
slices.DeleteFunc[go.shape.[]uint16,go.shape.uint16] 437 -> 434 (-0.69%)
internal/pkgbits [cmd/compile]
internal/pkgbits.(*Encoder).Int64 113 -> 103 (-8.85%)
internal/pkgbits.(*Encoder).rawVarint 74 -> 72 (-2.70%)
cmd/compile/internal/syntax [cmd/compile]
cmd/compile/internal/syntax.(*source).fill 722 -> 703 (-2.63%)
cmd/internal/obj/s390x [cmd/compile]
cmd/internal/obj/s390x.buildop 33676 -> 33644 (-0.10%)
cmd/go/internal/trace
cmd/go/internal/trace.Flow 910 -> 886 (-2.64%)
cmd/go/internal/trace.(*Span).Done 311 -> 304 (-2.25%)
cmd/go/internal/trace.StartSpan 620 -> 615 (-0.81%)
cmd/internal/script
cmd/internal/script.(*Engine).Execute.func2 534 -> 528 (-1.12%)
cmd/link/internal/loader
cmd/link/internal/loader.(*Loader).SetSymSect 344 -> 338 (-1.74%)
net/http
net/http.(*Transport).queueForIdleConn 1797 -> 1766 (-1.73%)
net/http.(*Transport).getConn 2149 -> 2131 (-0.84%)
net/http.(*http2ClientConn).tooIdleLocked 207 -> 197 (-4.83%)
net/http.(*http2responseWriter).SetWriteDeadline.func1 520 -> 508 (-2.31%)
net/http.(*Cookie).Valid 837 -> 818 (-2.27%)
net/http.(*http2responseWriter).SetReadDeadline 373 -> 357 (-4.29%)
net/http.checkIfRange 701 -> 690 (-1.57%)
net/http.(*http2SettingsFrame).Value 325 -> 298 (-8.31%)
net/http.(*http2SettingsFrame).HasDuplicates 777 -> 767 (-1.29%)
net/http.(*Server).Serve 1746 -> 1739 (-0.40%)
net/http.http2traceGotConn 569 -> 556 (-2.28%)
net/http/pprof
net/http/pprof.collectProfile 242 -> 239 (-1.24%)
cmd/compile/internal/coverage
cmd/compile/internal/coverage.metaHashAndLen 439 -> 438 (-0.23%)
cmd/vendor/golang.org/x/telemetry/internal/upload
cmd/vendor/golang.org/x/telemetry/internal/upload.(*uploader).findWork 4570 -> 4540 (-0.66%)
cmd/vendor/golang.org/x/telemetry/internal/upload.(*uploader).reports 3604 -> 3572 (-0.89%)
cmd/compile/internal/coverage [cmd/compile]
cmd/compile/internal/coverage.metaHashAndLen 439 -> 438 (-0.23%)
cmd/vendor/golang.org/x/text/language
cmd/vendor/golang.org/x/text/language.regionGroupDist 287 -> 284 (-1.05%)
cmd/go/internal/vcweb
cmd/go/internal/vcweb.(*Server).overview.func1 1045 -> 1041 (-0.38%)
cmd/go/internal/vcs
cmd/go/internal/vcs.expand 761 -> 741 (-2.63%)
cmd/compile/internal/inline/inlheur
slices.stableCmpFunc[go.shape.struct 2300 -> 2284 (-0.70%)
cmd/compile/internal/inline/inlheur [cmd/compile]
slices.stableCmpFunc[go.shape.struct 2300 -> 2284 (-0.70%)
cmd/go/internal/modfetch/codehost
cmd/go/internal/modfetch/codehost.bzrParseStat 2217 -> 2213 (-0.18%)
cmd/link/internal/ld
cmd/link/internal/ld.decodetypeStructFieldCount 157 -> 152 (-3.18%)
cmd/link/internal/ld.(*Link).address 12559 -> 12495 (-0.51%)
cmd/link/internal/ld.(*dodataState).allocateDataSections 18345 -> 18205 (-0.76%)
cmd/link/internal/ld.elfshreloc 618 -> 616 (-0.32%)
cmd/link/internal/ld.(*deadcodePass).decodetypeMethods 794 -> 779 (-1.89%)
cmd/link/internal/ld.(*dodataState).assignDsymsToSection 668 -> 663 (-0.75%)
cmd/link/internal/ld.relocSectFn 285 -> 284 (-0.35%)
cmd/link/internal/ld.decodetypeIfaceMethodCount 146 -> 144 (-1.37%)
cmd/link/internal/ld.decodetypeArrayLen 157 -> 152 (-3.18%)
cmd/link/internal/arm64
cmd/link/internal/arm64.gensymlate.func1 895 -> 888 (-0.78%)
cmd/go/internal/modload
cmd/go/internal/modload.queryProxy.func3 1029 -> 1012 (-1.65%)
cmd/go/internal/load
cmd/go/internal/load.(*Package).setBuildInfo 8453 -> 8447 (-0.07%)
cmd/go/internal/clean
cmd/go/internal/clean.runClean 2120 -> 2104 (-0.75%)
cmd/compile/internal/ssa
cmd/compile/internal/ssa.(*poset).aliasnodes 2010 -> 1978 (-1.59%)
cmd/compile/internal/ssa.rewriteValueARM64_OpARM64MOVHstoreidx2 730 -> 719 (-1.51%)
cmd/compile/internal/ssa.(*debugState).buildLocationLists 3326 -> 3294 (-0.96%)
cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDLconst 3069 -> 2941 (-4.17%)
cmd/compile/internal/ssa.(*debugState).processValue 9756 -> 9724 (-0.33%)
cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDQconst 3069 -> 2941 (-4.17%)
cmd/compile/internal/ssa.(*poset).mergeroot 1079 -> 1054 (-2.32%)
cmd/compile/internal/ssa [cmd/compile]
cmd/compile/internal/ssa.rewriteValueARM64_OpARM64MOVHstoreidx2 730 -> 719 (-1.51%)
cmd/compile/internal/ssa.(*poset).aliasnodes 2010 -> 1978 (-1.59%)
cmd/compile/internal/ssa.(*poset).mergeroot 1079 -> 1054 (-2.32%)
cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDQconst 3069 -> 2941 (-4.17%)
cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDLconst 3069 -> 2941 (-4.17%)
file before after Δ %
math/bits.s 2352 2354 +2 +0.085%
math/bits [cmd/compile].s 2352 2354 +2 +0.085%
math.s 35675 35674 -1 -0.003%
math [cmd/compile].s 35675 35674 -1 -0.003%
runtime.s 577251 577245 -6 -0.001%
runtime [cmd/compile].s 642419 642438 +19 +0.003%
sort.s 37434 37435 +1 +0.003%
strconv.s 48391 48343 -48 -0.099%
sort [cmd/compile].s 37434 37435 +1 +0.003%
bufio.s 21386 21418 +32 +0.150%
strconv [cmd/compile].s 48391 48343 -48 -0.099%
image.s 34978 35022 +44 +0.126%
regexp/syntax.s 81719 81781 +62 +0.076%
time.s 94341 94184 -157 -0.166%
regexp.s 60411 60399 -12 -0.020%
bufio [cmd/compile].s 21512 21544 +32 +0.149%
encoding/binary.s 34062 34087 +25 +0.073%
regexp/syntax [cmd/compile].s 81719 81781 +62 +0.076%
encoding/base64.s 11907 11903 -4 -0.034%
time [cmd/compile].s 94341 94184 -157 -0.166%
index/suffixarray.s 41633 41527 -106 -0.255%
os.s 101770 101738 -32 -0.031%
regexp [cmd/compile].s 60411 60399 -12 -0.020%
encoding/binary [cmd/compile].s 37173 37198 +25 +0.067%
encoding/base64 [cmd/compile].s 11907 11903 -4 -0.034%
os/exec.s 23900 23907 +7 +0.029%
encoding/hex.s 6038 6030 -8 -0.132%
crypto/des.s 5073 5056 -17 -0.335%
os [cmd/compile].s 102030 101998 -32 -0.031%
vendor/golang.org/x/net/http2/hpack.s 22027 22033 +6 +0.027%
math/big.s 164808 164753 -55 -0.033%
cmd/vendor/golang.org/x/sys/unix.s 121450 121444 -6 -0.005%
encoding/json.s 110294 110287 -7 -0.006%
testing.s 115303 115281 -22 -0.019%
archive/zip.s 65329 65325 -4 -0.006%
os/user.s 10078 10080 +2 +0.020%
encoding/gob.s 143788 143783 -5 -0.003%
crypto/elliptic.s 30686 30704 +18 +0.059%
go/doc/comment.s 49401 49433 +32 +0.065%
debug/buildinfo.s 9095 9085 -10 -0.110%
image/png.s 36113 36081 -32 -0.089%
archive/tar.s 71994 71897 -97 -0.135%
crypto/internal/cryptotest.s 60872 60849 -23 -0.038%
internal/pkgbits.s 20441 20429 -12 -0.059%
testing/quick.s 8236 8235 -1 -0.012%
log/slog.s 77568 77558 -10 -0.013%
internal/trace/internal/oldtrace.s 52885 52896 +11 +0.021%
runtime/pprof.s 123978 123969 -9 -0.007%
internal/coverage/cfile.s 25198 25184 -14 -0.056%
cmd/internal/objabi.s 19954 19946 -8 -0.040%
crypto/ecdsa.s 29159 29141 -18 -0.062%
log/slog/internal/benchmarks.s 6694 6695 +1 +0.015%
net.s 299569 299503 -66 -0.022%
os/exec [cmd/compile].s 23888 23895 +7 +0.029%
internal/trace.s 179226 179240 +14 +0.008%
internal/fuzz.s 86190 86191 +1 +0.001%
crypto/x509.s 177195 177164 -31 -0.017%
cmd/internal/obj/s390x.s 121642 121610 -32 -0.026%
cmd/internal/obj/ppc64.s 140118 140122 +4 +0.003%
encoding/hex [cmd/compile].s 6149 6141 -8 -0.130%
cmd/internal/objabi [cmd/compile].s 19954 19946 -8 -0.040%
cmd/internal/obj/arm64.s 158523 158555 +32 +0.020%
go/doc/comment [cmd/compile].s 49512 49544 +32 +0.065%
math/big [cmd/compile].s 166394 166339 -55 -0.033%
encoding/json [cmd/compile].s 110712 110705 -7 -0.006%
cmd/covdata.s 39699 39687 -12 -0.030%
runtime/pprof [cmd/compile].s 125209 125200 -9 -0.007%
cmd/compile/internal/syntax.s 181755 181736 -19 -0.010%
cmd/dist.s 177893 177861 -32 -0.018%
crypto/tls.s 389157 389113 -44 -0.011%
internal/pkgbits [cmd/compile].s 41644 41632 -12 -0.029%
cmd/compile/internal/syntax [cmd/compile].s 196105 196086 -19 -0.010%
cmd/compile/internal/types.s 71315 71345 +30 +0.042%
cmd/internal/obj/s390x [cmd/compile].s 121733 121701 -32 -0.026%
cmd/go/internal/trace.s 4796 4760 -36 -0.751%
cmd/internal/obj/arm64 [cmd/compile].s 168120 168147 +27 +0.016%
cmd/internal/obj/ppc64 [cmd/compile].s 140219 140223 +4 +0.003%
cmd/internal/script.s 83442 83436 -6 -0.007%
cmd/link/internal/loader.s 93299 93294 -5 -0.005%
net/http.s 620639 620472 -167 -0.027%
net/http/pprof.s 35016 35013 -3 -0.009%
cmd/compile/internal/coverage.s 6668 6667 -1 -0.015%
cmd/vendor/golang.org/x/telemetry/internal/upload.s 34210 34148 -62 -0.181%
cmd/compile/internal/coverage [cmd/compile].s 6664 6663 -1 -0.015%
cmd/vendor/golang.org/x/text/language.s 48077 48074 -3 -0.006%
cmd/go/internal/vcweb.s 45193 45189 -4 -0.009%
cmd/go/internal/vcs.s 44749 44729 -20 -0.045%
cmd/compile/internal/inline/inlheur.s 83758 83742 -16 -0.019%
cmd/compile/internal/inline/inlheur [cmd/compile].s 84773 84757 -16 -0.019%
cmd/go/internal/modfetch/codehost.s 89098 89094 -4 -0.004%
cmd/trace.s 257550 257564 +14 +0.005%
cmd/link/internal/ld.s 641945 641706 -239 -0.037%
cmd/link/internal/arm64.s 34805 34798 -7 -0.020%
cmd/go/internal/modload.s 328971 328954 -17 -0.005%
cmd/go/internal/load.s 178877 178871 -6 -0.003%
cmd/go/internal/clean.s 11006 10990 -16 -0.145%
cmd/compile/internal/ssa.s 3552843 3553347 +504 +0.014%
cmd/compile/internal/ssa [cmd/compile].s 3752511 3753123 +612 +0.016%
total 36179015 36178687 -328 -0.001%
Change-Id: I251c2898ccf3c9931d162d87dabbd49cf4ec73a5
Reviewed-on: https://go-review.googlesource.com/c/go/+/641757
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
CL 621357 introduced new generic lowering rules which caused
several shift related codegen test failures.
Add new rules to fix the test regressions, and clean up tests
which are changed but not regressed. Some CLRLSLDI tests are
removed as they no longer test CLRLSLDI rules.
Fixes #70003
Change-Id: I1ecc5a7e63ab709a4a0cebf11fa078d5cf164034
Reviewed-on: https://go-review.googlesource.com/c/go/+/622236
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
When GORISCV64 enables rva22u64, combine shift and addition using the
SH1ADD, SH2ADD and SH3ADD instructions that are available via the Zba
extension. This results in more than 2000 instructions being removed
from the Go binary on riscv64.
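As a rough illustration, these are the kinds of Go expressions whose shift-then-add shape matches SH1ADD, SH2ADD and SH3ADD. The function names are hypothetical and not taken from the CL, and whether the compiler actually selects those instructions depends on the GORISCV64 setting and the surrounding code.

```go
package main

import "fmt"

// Each function adds y to x shifted left by 1, 2 or 3 bits, the shape
// that the Zba SH1ADD, SH2ADD and SH3ADD instructions compute in one step.
// The names are illustrative only.
func shift1Add(x, y uint64) uint64 { return x<<1 + y }
func shift2Add(x, y uint64) uint64 { return x<<2 + y }
func shift3Add(x, y uint64) uint64 { return x<<3 + y }

func main() {
	fmt.Println(shift1Add(3, 4), shift2Add(3, 4), shift3Add(3, 4))
}
```

Slice and array indexing with 2, 4 or 8 byte elements is a typical source of this shape.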
Change-Id: Ia62ae7dda3d8083cff315113421bee73f518eea8
Reviewed-on: https://go-review.googlesource.com/c/go/+/606636
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
|
|
The rotate value was not correctly converted from a 64 bit to a 32
bit rotate. This caused a miscompile of
golang.org/x/text/unicode/runenames.Names.
Fixes #67526
Change-Id: Ief56fbab27ccc71cd4c01117909bfee7f60a2ea1
Reviewed-on: https://go-review.googlesource.com/c/go/+/586915
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
Investigating binaries, these patterns seem to show up frequently.
Change-Id: I987251e4070e35c25e98da321e444ccaa1526912
Reviewed-on: https://go-review.googlesource.com/c/go/+/583302
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
RLWINM
This provides a small performance bump to crc64 as measured on ppc64le/power10:
name old time/op new time/op delta
Crc64/ISO64KB 49.6µs ± 0% 46.6µs ± 0% -6.18%
Crc64/ISO4KB 3.16µs ± 0% 2.97µs ± 0% -5.83%
Crc64/ISO1KB 840ns ± 0% 794ns ± 0% -5.46%
Crc64/ECMA64KB 49.6µs ± 0% 46.5µs ± 0% -6.20%
Crc64/Random64KB 53.1µs ± 0% 49.9µs ± 0% -6.04%
Crc64/Random16KB 15.9µs ± 1% 15.0µs ± 0% -5.73%
Change-Id: I302b5431c7dc46dfd2d211545c483bdcdfe011f1
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/581937
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Eli Bendersky <eliben@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
The code generation on riscv64 will currently result in incorrect
assembly when a 32 bit integer is right shifted by an amount that
exceeds the size of the type. In particular, this occurs when an
int32 or uint32 is cast to a 64 bit type and right shifted by a
value larger than 31.
Fix this by moving the SRAW/SRLW conversion into the right shift
rules and removing the SignExt32to64/ZeroExt32to64. Add additional
rules that rewrite to SRAIW/SRLIW when the shift is less than the
size of the type, or replace/eliminate the shift when it exceeds
the size of the type.
Add SSA tests that would have caught this issue. Also add additional
codegen tests to ensure that the resulting assembly is what we
expect in these overflow cases.
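The Go semantics that the fix restores can be stated as a small check. This is only a sketch with hypothetical helper names; the SSA and codegen tests actually added by the CL are not reproduced here.

```go
package main

import "fmt"

// widenShiftU zero-extends a uint32 to 64 bits and then shifts right.
// For s > 31 every original bit is shifted out, so the result must be 0.
func widenShiftU(v uint32, s uint) uint64 { return uint64(v) >> s }

// widenShiftS sign-extends an int32 to 64 bits and then shifts right.
// For s > 31 only copies of the sign bit remain: 0 or -1.
func widenShiftS(v int32, s uint) int64 { return int64(v) >> s }

func main() {
	fmt.Println(widenShiftU(0xffffffff, 32), widenShiftS(-5, 40), widenShiftS(5, 40))
}
```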
Fixes #64285
Change-Id: Ie97b05668597cfcb91413afefaab18ee1aa145ec
Reviewed-on: https://go-review.googlesource.com/c/go/+/545035
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: M Zhuo <mzh@golangcn.org>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
The compiler is currently sign extending 32 bit signed integers to
64 bits before right shifting them using a 64 bit shift instruction.
There's no need to do this as RISC-V has instructions for right
shifting 32 bit signed values (sraw and sraiw) which sign extend
the result of the shift to 64 bits. Change the compiler so that
it uses sraw and sraiw for shifts of signed 32 bit integers reducing
in most cases the number of instructions needed to perform the shift.
Here are some examples of code sequences that are changed by this
patch:
int32(a) >> 2
before:
sll x5,x10,0x20
sra x10,x5,0x22
after:
sraw x10,x10,0x2
int32(v) >> int(s)
before:
sext.w x5,x10
sltiu x6,x11,64
add x6,x6,-1
or x6,x11,x6
sra x10,x5,x6
after:
sltiu x5,x11,32
add x5,x5,-1
or x5,x11,x5
sraw x10,x10,x5
int32(v) >> (int(s) & 31)
before:
sext.w x5,x10
and x6,x11,63
sra x10,x5,x6
after:
and x5,x11,31
sraw x10,x10,x5
int32(100) >> int(a)
before:
bltz x10,<target address calls runtime.panicshift>
sltiu x5,x10,64
add x5,x5,-1
or x5,x10,x5
li x6,100
sra x10,x6,x5
after:
bltz x10,<target address calls runtime.panicshift>
sltiu x5,x10,32
add x5,x5,-1
or x5,x10,x5
li x6,100
sraw x10,x6,x5
int32(v) >> (int(s) & 63)
before:
sext.w x5,x10
and x6,x11,63
sra x10,x5,x6
after:
and x5,x11,63
sltiu x6,x5,32
add x6,x6,-1
or x5,x5,x6
sraw x10,x10,x5
In most cases we eliminate one instruction. In the case where
we shift an int32 constant by a variable the number of instructions
generated is identical. A sra is simply replaced by a sraw. In the
unusual case where we shift right by a variable anded with a constant
> 31 but < 64, we generate two additional instructions. As this is
an unusual case we do not try to optimize for it.
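To see why the sltiu/add/or/sraw sequence for `int32(v) >> int(s)` matches Go's shift semantics, the sequence can be modelled in Go. This is a sketch: `sraw` and `shiftViaSraw` are hypothetical names that imitate the instructions, not compiler code.

```go
package main

import "fmt"

// sraw models the RISC-V instruction: only the low 5 bits of the
// shift amount are used, and the shift is arithmetic on 32 bits.
func sraw(v int32, s uint64) int32 { return v >> (s & 31) }

// shiftViaSraw mirrors the sltiu/add/or/sraw sequence.
func shiftViaSraw(v int32, s uint64) int32 {
	var lt uint64 // sltiu: 1 if s < 32, else 0
	if s < 32 {
		lt = 1
	}
	mask := lt - 1         // add -1: 0 if s < 32, all ones otherwise
	return sraw(v, s|mask) // or + sraw: oversized shifts become a shift by 31
}

func main() {
	for _, s := range []uint64{0, 2, 31, 32, 100} {
		fmt.Println(s, shiftViaSraw(-1000, s) == int32(-1000)>>s)
	}
}
```

An oversized shift amount is turned into all ones, so sraw shifts by 31 and leaves only copies of the sign bit, which is what Go requires for signed shifts of 32 or more.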
Some improvements can be seen in some of the existing benchmarks,
notably in the utf8 package which performs right shifts of runes
which are signed 32 bit integers.
| utf8-old | utf8-new |
| sec/op | sec/op vs base |
EncodeASCIIRune-4 17.68n ± 0% 17.67n ± 0% ~ (p=0.312 n=10)
EncodeJapaneseRune-4 35.34n ± 0% 34.53n ± 1% -2.31% (p=0.000 n=10)
AppendASCIIRune-4 3.213n ± 0% 3.213n ± 0% ~ (p=0.318 n=10)
AppendJapaneseRune-4 36.14n ± 0% 35.35n ± 0% -2.19% (p=0.000 n=10)
DecodeASCIIRune-4 28.11n ± 0% 27.36n ± 0% -2.69% (p=0.000 n=10)
DecodeJapaneseRune-4 38.55n ± 0% 38.58n ± 0% ~ (p=0.612 n=10)
Change-Id: I60a91cbede9ce65597571c7b7dd9943eeb8d3cc2
Reviewed-on: https://go-review.googlesource.com/c/go/+/535115
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: M Zhuo <mzh@golangcn.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
In the PPC64 ISA, the instruction to do an 'and' operation
using an immediate constant is only available in the form that
also sets CR0 (i.e. clobbers the condition register.) This means
CR0 is being clobbered unnecessarily in many cases. That
affects some decisions made during some compiler passes
that check for it.
In those cases when the constant used by the ANDCC is a right
justified consecutive set of bits, a shift instruction can
be used which has the same effect if CR0 does not need to be
set. The rule to do that has been added to the late rules file
after other rules using ANDCCconst have been processed in the
main rules file.
Some codegen tests had to be updated since ANDCC is no
longer generated for some cases. A new test case was added to
verify the ANDCC is present if the results for both the AND
and CR0 are used.
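The equivalence the rule relies on can be sketched in Go. The helper names are hypothetical, and the exact instruction the rule emits is not shown in this message; the sketch only shows that an AND with a right justified run of ones can be expressed with shifts alone.

```go
package main

import "fmt"

// isLowMask reports whether m has the form 0...01...1: a non-empty run
// of set bits that starts at bit 0.
func isLowMask(m uint64) bool { return m != 0 && m&(m+1) == 0 }

// clearHigh keeps the low n bits of x (1 <= n <= 64) using shifts only,
// so no condition register result is involved.
func clearHigh(x uint64, n uint) uint64 { return x << (64 - n) >> (64 - n) }

func main() {
	x := uint64(0x123456789abcdef0)
	fmt.Println(isLowMask(0xff), x&0xff == clearHigh(x, 8))
}
```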
Change-Id: I304f607c039a458e2d67d25351dd00aea72ba542
Reviewed-on: https://go-review.googlesource.com/c/go/+/531435
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
The compiler is currently zero extending 32 bit unsigned integers to
64 bits before right shifting them using a 64 bit shift instruction.
There's no need to do this as RISC-V has instructions for right
shifting 32 bit unsigned values (srlw and srliw) which zero extend
the result of the shift to 64 bits. Change the compiler so that
it uses srlw and srliw for 32 bit unsigned shifts reducing in most
cases the number of instructions needed to perform the shift.
Here are some examples of code sequences that are changed by this
patch:
uint32(a) >> 2
before:
sll x5,x10,0x20
srl x10,x5,0x22
after:
srlw x10,x10,0x2
uint32(a) >> int(b)
before:
sll x5,x10,0x20
srl x5,x5,0x20
srl x5,x5,x11
sltiu x6,x11,64
neg x6,x6
and x10,x5,x6
after:
srlw x5,x10,x11
sltiu x6,x11,32
neg x6,x6
and x10,x5,x6
bits.RotateLeft32(uint32(a), 1)
before:
sll x5,x10,0x1
sll x6,x10,0x20
srl x7,x6,0x3f
or x5,x5,x7
after:
sll x5,x10,0x1
srlw x6,x10,0x1f
or x10,x5,x6
bits.RotateLeft32(uint32(a), int(b))
before:
and x6,x11,31
sll x7,x10,x6
sll x8,x10,0x20
srl x8,x8,0x20
add x6,x6,-32
neg x6,x6
srl x9,x8,x6
sltiu x6,x6,64
neg x6,x6
and x6,x9,x6
or x6,x6,x7
after:
and x5,x11,31
sll x6,x10,x5
add x5,x5,-32
neg x5,x5
srlw x7,x10,x5
sltiu x5,x5,32
neg x5,x5
and x5,x7,x5
or x10,x6,x5
The one regression observed is the following case, an unbounded right
shift of a uint32 where the value we're shifting by is known to be
< 64 but > 31. As this is an unusual case this commit does not
optimize for it, although the existing code does.
uint32(a) >> (b & 63)
before:
sll x5,x10,0x20
srl x5,x5,0x20
and x6,x11,63
srl x10,x5,x6
after:
and x5,x11,63
srlw x6,x10,x5
sltiu x5,x5,32
neg x5,x5
and x10,x6,x5
Here we have one extra instruction.
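The srlw/sltiu/neg/and sequence used for `uint32(a) >> int(b)` can be modelled in Go to check that it preserves Go's semantics for oversized shifts. This is a sketch; `srlw` and `shiftViaSrlw` are hypothetical names that imitate the instructions.

```go
package main

import "fmt"

// srlw models the RISC-V instruction: a 32 bit logical right shift
// that uses only the low 5 bits of the shift amount.
func srlw(v uint32, s uint64) uint32 { return v >> (s & 31) }

// shiftViaSrlw mirrors the srlw/sltiu/neg/and sequence.
func shiftViaSrlw(v uint32, s uint64) uint32 {
	r := srlw(v, s)
	var lt uint64 // sltiu: 1 if s < 32, else 0
	if s < 32 {
		lt = 1
	}
	mask := uint32(-lt) // neg: all ones if s < 32, else 0
	return r & mask     // and: oversized shifts produce 0
}

func main() {
	for _, s := range []uint64{0, 2, 31, 32, 63} {
		fmt.Println(s, shiftViaSrlw(0xdeadbeef, s) == uint32(0xdeadbeef)>>s)
	}
}
```

The comparison against 32 rather than 64 is why a shift amount masked with 63 still needs the filter in the regression case.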
Some benchmark highlights, generated on a VisionFive2 8GB running
Ubuntu 23.04.
pkg: math/bits
LeadingZeros32-4 18.64n ± 0% 17.32n ± 0% -7.11% (p=0.000 n=10)
LeadingZeros64-4 15.47n ± 0% 15.51n ± 0% +0.26% (p=0.027 n=10)
TrailingZeros16-4 18.48n ± 0% 17.68n ± 0% -4.33% (p=0.000 n=10)
TrailingZeros32-4 16.87n ± 0% 16.07n ± 0% -4.74% (p=0.000 n=10)
TrailingZeros64-4 15.26n ± 0% 15.27n ± 0% +0.07% (p=0.043 n=10)
OnesCount32-4 20.08n ± 0% 19.29n ± 0% -3.96% (p=0.000 n=10)
RotateLeft-4 8.864n ± 0% 8.838n ± 0% -0.30% (p=0.006 n=10)
RotateLeft32-4 8.837n ± 0% 8.032n ± 0% -9.11% (p=0.000 n=10)
Reverse32-4 29.77n ± 0% 26.52n ± 0% -10.93% (p=0.000 n=10)
ReverseBytes32-4 9.640n ± 0% 8.838n ± 0% -8.32% (p=0.000 n=10)
Sub32-4 8.835n ± 0% 8.035n ± 0% -9.06% (p=0.000 n=10)
geomean 11.50n 11.33n -1.45%
pkg: crypto/md5
Hash8Bytes-4 1.486µ ± 0% 1.426µ ± 0% -4.04% (p=0.000 n=10)
Hash64-4 2.079µ ± 0% 1.968µ ± 0% -5.36% (p=0.000 n=10)
Hash128-4 2.720µ ± 0% 2.557µ ± 0% -5.99% (p=0.000 n=10)
Hash256-4 3.996µ ± 0% 3.733µ ± 0% -6.58% (p=0.000 n=10)
Hash512-4 6.541µ ± 0% 6.072µ ± 0% -7.18% (p=0.000 n=10)
Hash1K-4 11.64µ ± 0% 10.75µ ± 0% -7.58% (p=0.000 n=10)
Hash8K-4 82.95µ ± 0% 76.32µ ± 0% -7.99% (p=0.000 n=10)
Hash1M-4 10.436m ± 0% 9.591m ± 0% -8.10% (p=0.000 n=10)
Hash8M-4 83.50m ± 0% 76.73m ± 0% -8.10% (p=0.000 n=10)
Hash8BytesUnaligned-4 1.494µ ± 0% 1.434µ ± 0% -4.02% (p=0.000 n=10)
Hash1KUnaligned-4 11.64µ ± 0% 10.76µ ± 0% -7.52% (p=0.000 n=10)
Hash8KUnaligned-4 83.01µ ± 0% 76.32µ ± 0% -8.07% (p=0.000 n=10)
geomean 28.32µ 26.42µ -6.72%
Change-Id: I20483a6668cca1b53fe83944bee3706aadcf8693
Reviewed-on: https://go-review.googlesource.com/c/go/+/528975
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Manually consolidate the remaining ppc64/ppc64le tests which
are not so trivial to automatically merge.
The remaining ppc64le tests are limited to cases where load/stores are
merged (this only happens on ppc64le) and the race detector (only
supported on ppc64le).
Change-Id: I1f9c0f3d3ddbb7fbbd8c81fbbd6537394fba63ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/463217
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
Use a small python script to consolidate duplicate
ppc64/ppc64le tests into a single ppc64x codegen test.
This makes the small assumption that any time two tests
for different arch/variant combos exist, those tests
can be combined into a single ppc64x test.
E.g.:
// ppc64le: foo
// ppc64le/power9: foo
into
// ppc64x: foo
or
// ppc64: foo
// ppc64le: foo
into
// ppc64x: foo
import glob
import re
files = glob.glob("codegen/*.go")
for file in files:
    with open(file) as f:
        text = [l for l in f]
    i = 0
    while i < len(text):
        first = re.match("\s*// ?ppc64(le)?(/power[89])?:(.*)", text[i])
        if first:
            j = i+1
            while j < len(text):
                second = re.match("\s*// ?ppc64(le)?(/power[89])?:(.*)", text[j])
                if not second:
                    break
                if (not first.group(2) or first.group(2) == second.group(2)) and first.group(3) == second.group(3):
                    text[i] = re.sub(" ?ppc64(le|x)?"," ppc64x",text[i])
                    text=text[:j] + (text[j+1:])
                else:
                    j += 1
        i+=1
    with open(file, 'w') as f:
        f.write("".join(text))
Change-Id: Ic6b009b54eacaadc5a23db9c5a3bf7331b595821
Reviewed-on: https://go-review.googlesource.com/c/go/+/463220
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
For example:
movb a0, a0
srai $1, a0, a0
the assembler will expand to:
slli $56, a0, a0
srai $56, a0, a0
srai $1, a0, a0
this CL optimizes it to:
slli $56, a0, a0
srai $57, a0, a0
Remove 270+ instructions from Go binary on linux/riscv64.
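The folding is valid because two consecutive arithmetic right shifts add up. A minimal Go sketch of the equivalence, using hypothetical function names:

```go
package main

import "fmt"

// twoStep sign-extends the low byte (slli 56, srai 56) and then
// shifts right by one more bit (srai 1).
func twoStep(a int64) int64 { return a << 56 >> 56 >> 1 }

// fused folds the last two arithmetic shifts into one (srai 57).
func fused(a int64) int64 { return a << 56 >> 57 }

func main() {
	for _, a := range []int64{-128, -3, 0, 5, 127, 300} {
		fmt.Println(a, twoStep(a) == fused(a), fused(a) == int64(int8(a))>>1)
	}
}
```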
Change-Id: I375e19f9d3bd54f2781791d8cbe5970191297dc8
Reviewed-on: https://go-review.googlesource.com/c/go/+/428496
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
When SLTI/SLTIU is used with ANDI/ORI, it may be possible to determine the
outcome based on the values of the immediates. Resolve these cases.
Improves code generation for various shift operations.
While here, sort tests by architecture to improve readability and ease
future maintenance.
Change-Id: I87e71e016a0e396a928e7d6389a2df61583dfd8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/428217
Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Jenny Rakoczy <jenny@golang.org>
Reviewed-by: Jenny Rakoczy <jenny@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Jenny Rakoczy <jenny@golang.org>
|
|
This CL adds shiftIsBounded checks for the Lsh* and Rsh* rules in arm64.
There is no need to check the shift value again with CMP + CSEL when the
shift value is valid.
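A sketch of the kind of source this targets, with hypothetical function names. Whether a given shift is actually marked bounded depends on what the compiler can prove, so the comments describe the intent rather than guaranteed output.

```go
package main

import "fmt"

// lshChecked shifts only when s is known to be in range, so the shift
// itself should need no extra out-of-range handling.
func lshChecked(x uint64, s uint) uint64 {
	if s < 64 {
		return x << s
	}
	return 0
}

// lshUnchecked must still yield 0 for s >= 64, as Go requires, so the
// generated code has to handle the oversized case.
func lshUnchecked(x uint64, s uint) uint64 { return x << s }

func main() {
	fmt.Println(lshChecked(1, 3), lshChecked(1, 64), lshUnchecked(1, 64))
}
```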
Change-Id: I54620de64f02a1b5a11089add237248ae2de01b4
Reviewed-on: https://go-review.googlesource.com/c/go/+/417714
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
|
|
Fixes #54496
Change-Id: I3c2ed8cd55836d5b07c8cdec00d3b584885aca79
Reviewed-on: https://go-review.googlesource.com/c/go/+/424856
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Martin Möhrmann <martin@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Martin Möhrmann <martin@golang.org>
|
|
For the following code case:
var x uint64
x >> (shift & 63)
We can directly generate `x >> shift` on arm64, since the hardware will
only use the bottom 6 bits of the shift amount.
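Written out as a complete function, the case looks like the sketch below. The function name is hypothetical; the point is only that the explicit mask does not change the result of a shift that already uses just the low 6 bits of the amount.

```go
package main

import "fmt"

// rshMasked shifts by the low 6 bits of shift. Because the mask keeps
// the amount below 64, Go's oversized-shift rule never applies here.
func rshMasked(x uint64, shift uint) uint64 { return x >> (shift & 63) }

func main() {
	// 72 & 63 == 8, so both calls shift by 8.
	fmt.Println(rshMasked(1<<40, 8), rshMasked(1<<40, 72))
}
```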
Benchmark old time/op new time/op delta
ShiftArithmeticRight-8 0.40ns 0.31ns -21.7%
Change-Id: Id58c8a5b2f6dd5c30c3876f4a36e11b4d81e2dc9
Reviewed-on: https://go-review.googlesource.com/c/go/+/425294
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
|
|
The prove pass will mark some shifts bounded, and then we can use that
information to generate better code on riscv64.
Change-Id: Ia22f43d0598453c9417adac7017db28d7240948b
Reviewed-on: https://go-review.googlesource.com/c/go/+/422616
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Updated multiple tests in test/codegen: math.go, mathbits.go, shift.go
and slices.go to verify on ppc64/ppc64le as well
Change-Id: Id88dd41569b7097819fb4d451b615f69cf7f7a94
Reviewed-on: https://go-review.googlesource.com/c/go/+/412115
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
Instructions with immediates can be precomputed when operating on a
constant - do so for SLTI/SLTIU, SLLI/SRLI/SRAI, NEG/NEGW, ANDI, ORI
and ADDI. Additionally, optimise ANDI and ORI when the immediate is
all ones or all zeroes.
In particular, the RISCV64 logical left and right shift rules
(Lsh*x*/Rsh*Ux*) produce sequences that check whether the shift amount
is 64 or greater and, if so, return zero. When the shift amount is a
constant we can precompute and eliminate the filter entirely.
Likewise the arithmetic right shift rules produce sequences that
check whether the shift amount is 64 or greater and, if so, ensure that
the lower six bits of the shift are all ones. When the shift amount
is a constant we can precompute the shift value.
Arithmetic right shift sequences like:
117fc: 00100513 li a0,1
11800: 04053593 sltiu a1,a0,64
11804: fff58593 addi a1,a1,-1
11808: 0015e593 ori a1,a1,1
1180c: 40b45433 sra s0,s0,a1
Are now a single srai instruction:
117fc: 40145413 srai s0,s0,0x1
Likewise for logical left shift (and logical right shift):
1d560: 01100413 li s0,17
1d564: 04043413 sltiu s0,s0,64
1d568: 40800433 neg s0,s0
1d56c: 01131493 slli s1,t1,0x11
1d570: 0084f433 and s0,s1,s0
Which are now a single slli (or srli) instruction:
1d120: 01131413 slli s0,t1,0x11
This removes more than 30,000 instructions from the Go binary and
should improve performance in a variety of areas - of note
runtime.makemap_small drops from 48 to 36 instructions. Similar
gains exist in other parts of runtime and in math/bits, at least.
Change-Id: I33f6f3d1fd36d9ff1bda706997162bfe4bb859b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/350689
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Add tests for shift by constant, masked shifts and bounded shifts. While here,
sort tests by architecture and keep order of tests consistent (lsh, rshU, rsh).
Change-Id: I512d64196f34df9cb2884e8c0f6adcf9dd88b0fc
Reviewed-on: https://go-review.googlesource.com/c/go/+/351289
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
|
|
amd64 can shift in bits from another register instead of filling with 0/1.
This pattern is helpful when implementing 128 bit shifts or arbitrary length shifts.
In the standard library, it shows up in pure Go math/big.
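A minimal sketch of the pattern for a 128 bit left shift, assuming a shift amount below 64. The function name is hypothetical, and the exact expression shape the new rules match is not spelled out in this message.

```go
package main

import "fmt"

// shl128 shifts the 128 bit value hi:lo left by s bits, 0 <= s < 64.
// The high word takes in the bits that fall out of the top of lo
// instead of being filled with zeros.
func shl128(hi, lo uint64, s uint) (uint64, uint64) {
	return hi<<s | lo>>(64-s), lo << s
}

func main() {
	hi, lo := shl128(0, 1<<63, 1)
	fmt.Println(hi, lo)
}
```

For s == 0 the term `lo>>(64-s)` is a shift by 64, which Go defines as 0, so the high word is returned unchanged.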
Benchmark results on amd64 with -tags=math_big_pure_go.
name old time/op new time/op delta
NonZeroShifts/1/shrVU-8 4.45ns ± 3% 4.39ns ± 1% -1.28% (p=0.000 n=30+27)
NonZeroShifts/1/shlVU-8 4.13ns ± 4% 4.10ns ± 2% ~ (p=0.254 n=29+28)
NonZeroShifts/2/shrVU-8 5.55ns ± 1% 5.63ns ± 2% +1.42% (p=0.000 n=28+29)
NonZeroShifts/2/shlVU-8 5.70ns ± 2% 5.14ns ± 1% -9.82% (p=0.000 n=29+28)
NonZeroShifts/3/shrVU-8 6.79ns ± 2% 6.35ns ± 2% -6.46% (p=0.000 n=28+29)
NonZeroShifts/3/shlVU-8 6.69ns ± 1% 6.25ns ± 1% -6.60% (p=0.000 n=28+27)
NonZeroShifts/4/shrVU-8 7.79ns ± 2% 7.06ns ± 2% -9.48% (p=0.000 n=30+30)
NonZeroShifts/4/shlVU-8 7.82ns ± 1% 7.24ns ± 1% -7.37% (p=0.000 n=28+29)
NonZeroShifts/5/shrVU-8 8.90ns ± 3% 7.93ns ± 1% -10.84% (p=0.000 n=29+26)
NonZeroShifts/5/shlVU-8 8.68ns ± 1% 7.92ns ± 1% -8.76% (p=0.000 n=29+29)
NonZeroShifts/10/shrVU-8 14.4ns ± 1% 12.3ns ± 2% -14.79% (p=0.000 n=28+29)
NonZeroShifts/10/shlVU-8 14.1ns ± 1% 11.9ns ± 2% -15.55% (p=0.000 n=28+27)
NonZeroShifts/100/shrVU-8 118ns ± 1% 96ns ± 3% -18.82% (p=0.000 n=30+29)
NonZeroShifts/100/shlVU-8 120ns ± 2% 98ns ± 2% -18.46% (p=0.000 n=29+28)
NonZeroShifts/1000/shrVU-8 1.10µs ± 1% 0.88µs ± 2% -19.63% (p=0.000 n=29+30)
NonZeroShifts/1000/shlVU-8 1.10µs ± 2% 0.88µs ± 2% -20.28% (p=0.000 n=29+28)
NonZeroShifts/10000/shrVU-8 10.9µs ± 1% 8.7µs ± 1% -19.78% (p=0.000 n=28+27)
NonZeroShifts/10000/shlVU-8 10.9µs ± 2% 8.7µs ± 1% -19.64% (p=0.000 n=29+27)
NonZeroShifts/100000/shrVU-8 111µs ± 2% 90µs ± 2% -19.39% (p=0.000 n=28+29)
NonZeroShifts/100000/shlVU-8 113µs ± 2% 90µs ± 2% -20.43% (p=0.000 n=30+27)
The assembly version is still faster, unfortunately, but the gap is narrowing.
Speedup from pure Go to assembly:
name old time/op new time/op delta
NonZeroShifts/1/shrVU-8 4.39ns ± 1% 3.45ns ± 2% -21.36% (p=0.000 n=27+29)
NonZeroShifts/1/shlVU-8 4.10ns ± 2% 3.47ns ± 3% -15.42% (p=0.000 n=28+30)
NonZeroShifts/2/shrVU-8 5.63ns ± 2% 3.97ns ± 0% -29.40% (p=0.000 n=29+25)
NonZeroShifts/2/shlVU-8 5.14ns ± 1% 3.77ns ± 2% -26.65% (p=0.000 n=28+26)
NonZeroShifts/3/shrVU-8 6.35ns ± 2% 4.79ns ± 2% -24.52% (p=0.000 n=29+29)
NonZeroShifts/3/shlVU-8 6.25ns ± 1% 4.42ns ± 1% -29.29% (p=0.000 n=27+26)
NonZeroShifts/4/shrVU-8 7.06ns ± 2% 5.64ns ± 1% -20.05% (p=0.000 n=30+29)
NonZeroShifts/4/shlVU-8 7.24ns ± 1% 5.34ns ± 2% -26.23% (p=0.000 n=29+29)
NonZeroShifts/5/shrVU-8 7.93ns ± 1% 6.56ns ± 2% -17.26% (p=0.000 n=26+30)
NonZeroShifts/5/shlVU-8 7.92ns ± 1% 6.27ns ± 1% -20.79% (p=0.000 n=29+25)
NonZeroShifts/10/shrVU-8 12.3ns ± 2% 10.2ns ± 2% -17.21% (p=0.000 n=29+29)
NonZeroShifts/10/shlVU-8 11.9ns ± 2% 10.5ns ± 2% -12.45% (p=0.000 n=27+29)
NonZeroShifts/100/shrVU-8 95.9ns ± 3% 77.7ns ± 1% -19.00% (p=0.000 n=29+30)
NonZeroShifts/100/shlVU-8 97.5ns ± 2% 66.8ns ± 2% -31.47% (p=0.000 n=28+30)
NonZeroShifts/1000/shrVU-8 884ns ± 2% 705ns ± 1% -20.17% (p=0.000 n=30+28)
NonZeroShifts/1000/shlVU-8 880ns ± 2% 590ns ± 1% -32.96% (p=0.000 n=28+25)
NonZeroShifts/10000/shrVU-8 8.74µs ± 1% 7.34µs ± 3% -15.94% (p=0.000 n=27+30)
NonZeroShifts/10000/shlVU-8 8.73µs ± 1% 6.00µs ± 1% -31.25% (p=0.000 n=27+28)
NonZeroShifts/100000/shrVU-8 89.6µs ± 2% 75.5µs ± 2% -15.80% (p=0.000 n=29+29)
NonZeroShifts/100000/shlVU-8 89.6µs ± 2% 68.0µs ± 3% -24.09% (p=0.000 n=27+30)
Change-Id: I18f58d8f5513d737d9cdf09b8f9d14011ffe3958
Reviewed-on: https://go-review.googlesource.com/c/go/+/297050
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
These instructions are actually 5 argument opcodes as specified
by the ISA. Prior to this patch, the MB and ME arguments were
merged into a single bitmask operand to workaround the limitations
of the ppc64 assembler backend.
This limitation no longer exists. Thus, we can pass operands for
these opcodes without having to merge the MB and ME arguments in
the assembler frontend or compiler backend.
Likewise, support for 4 operand variants is unchanged.
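For readers unfamiliar with the MB and ME arguments, the following sketch models what a rotate-and-mask word instruction such as RLWINM computes. The function names are hypothetical; the mask follows the ISA convention that bit 0 is the most significant bit of the 32 bit word.

```go
package main

import (
	"fmt"
	"math/bits"
)

// maskMBME builds the 32 bit mask with ones from bit mb through bit me,
// using the ISA's numbering where bit 0 is the most significant bit.
// When mb > me the mask wraps around.
func maskMBME(mb, me uint) uint32 {
	hi := ^uint32(0) >> mb        // ones from bit mb to bit 31
	lo := ^uint32(0) << (31 - me) // ones from bit 0 to bit me
	if mb <= me {
		return hi & lo
	}
	return hi | lo
}

// rlwinm rotates x left by sh and keeps the bits selected by mb and me.
func rlwinm(x uint32, sh int, mb, me uint) uint32 {
	return bits.RotateLeft32(x, sh) & maskMBME(mb, me)
}

func main() {
	fmt.Printf("%#x\n", rlwinm(0x12345678, 8, 24, 31))
}
```

Passing mb and me separately, as in this model, corresponds to the 5 argument form; the older merged bitmask operand is the value `maskMBME` returns.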
Change-Id: Ib086774f3581edeaadfd2190d652aaaa8a90daeb
Reviewed-on: https://go-review.googlesource.com/c/go/+/298750
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
Trust: Carlos Eduardo Seo <carlos.seo@linaro.org>
|
|
Optimize combinations of left and right shifts by a constant value
into a 'rotate then insert selected bits [into zero]' instruction.
Use the same instruction for contiguous masks since it has some
benefits over 'and immediate' (not restricted to 32-bits, does not
overwrite source register).
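The identity behind the first optimization can be sketched in Go: a right shift followed by a left shift, both by constants, equals one rotation followed by a mask. The function names and the shift amounts 8 and 4 are chosen for illustration only.

```go
package main

import (
	"fmt"
	"math/bits"
)

// twoShifts drops the low 8 bits of x and then shifts the rest left by 4.
func twoShifts(x uint64) uint64 { return x >> 8 << 4 }

// rotateAndMask gets the same result from one rotation plus a mask that
// zeroes everything outside bits 4..59.
func rotateAndMask(x uint64) uint64 {
	const mask = ^uint64(0) >> 8 << 4
	return bits.RotateLeft64(x, -4) & mask
}

func main() {
	x := uint64(0xfedcba9876543210)
	fmt.Println(twoShifts(x) == rotateAndMask(x))
}
```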
To keep the complexity of this change under control I've only
implemented 64 bit operations for now.
There are a lot more optimizations that can be done with this
instruction family. However, since their function overlaps with other
instructions we need to be somewhat careful not to break existing
optimization rules by creating optimization dead ends. This is
particularly true of the load/store merging rules which contain lots
of zero extensions and shifts.
This CL does interfere with the store merging rules when an operand
is shifted left before it is stored:
binary.BigEndian.PutUint64(b, x << 1)
This is unfortunate but it's not critical and somewhat complex so
I plan to fix that in a follow up CL.
file before after Δ %
addr2line 4117446 4117282 -164 -0.004%
api 4945184 4942752 -2432 -0.049%
asm 4998079 4991891 -6188 -0.124%
buildid 2685158 2684074 -1084 -0.040%
cgo 4553732 4553394 -338 -0.007%
compile 19294446 19245070 -49376 -0.256%
cover 4897105 4891319 -5786 -0.118%
dist 3544389 3542785 -1604 -0.045%
doc 3926795 3927617 +822 +0.021%
fix 3302958 3293868 -9090 -0.275%
link 6546274 6543456 -2818 -0.043%
nm 4102021 4100825 -1196 -0.029%
objdump 4542431 4548483 +6052 +0.133%
pack 2482465 2416389 -66076 -2.662%
pprof 13366541 13363915 -2626 -0.020%
test2json 2829007 2761515 -67492 -2.386%
trace 10216164 10219684 +3520 +0.034%
vet 6773956 6773572 -384 -0.006%
total 107124151 106917891 -206260 -0.193%
Change-Id: I7591cce41e06867ba10a745daae9333513062746
Reviewed-on: https://go-review.googlesource.com/c/go/+/233317
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Michael Munday <mike.munday@ibm.com>
|
|
Combine (AND m (SRWconst x)) or (SRWconst (AND m x)) when mask m
and the shift value produce a constant which can be encoded into an
RLWINM instruction.
Combine (CLRLSLDI (SRWconst x)) if the combining of the underlying rotate
masks produces a constant which can be encoded into RLWINM.
Likewise for (SLDconst (SRWconst x)) and (CLRLSLDI (RLWINM x)).
Combine rotate word + and operations which can be encoded as a single
RLWINM/RLWNM instruction.
The most notable performance improvements arise from the crypto
benchmarks below (GOPPC64=power8 on a ppc64le/linux):
pkg:golang.org/x/crypto/blowfish goos:linux goarch:ppc64le
ExpandKeyWithSalt 52.2µs ± 0% 47.5µs ± 0% -8.88%
ExpandKey 44.4µs ± 0% 40.3µs ± 0% -9.15%
pkg:golang.org/x/crypto/ssh/internal/bcrypt_pbkdf goos:linux goarch:ppc64le
Key 57.6ms ± 0% 52.3ms ± 0% -9.13%
pkg:golang.org/x/crypto/bcrypt goos:linux goarch:ppc64le
Equal 90.9ms ± 0% 82.6ms ± 0% -9.13%
DefaultCost 91.0ms ± 0% 82.7ms ± 0% -9.12%
Change-Id: I59a0ca29face38f4ab46e37124c32906f216c4ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/260798
Run-TryBot: Carlos Eduardo Seo <carlos.seo@linaro.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
A recent change to improve shifts was generating some
invalid cases when the rule was based on an AND. The
extended mnemonics CLRLSLDI and CLRLSLWI only allow
certain values for the operands and in the mask case
those values were not being checked properly. This
adds a check to those rules to verify that the
'b' and 'n' values used when an AND was part of the rule
have correct values.
There was a bug in some diag messages in asm9. The
message expected 3 values but only provided 2. Those are
corrected here also.
The test/codegen/shift.go was updated to add a few more
cases to check for the case mentioned here.
Some of the comments that mention the order of operands
in these extended mnemonics were wrong and those have been
corrected.
Fixes #41683.
Change-Id: If5bb860acaa5051b9e0cd80784b2868b85898c31
Reviewed-on: https://go-review.googlesource.com/c/go/+/258138
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This adds support for the extswsli instruction which combines
extsw followed by a shift.
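In Go source the combined operation corresponds to a sign extension from 32 to 64 bits followed by a constant left shift. A minimal sketch with a hypothetical function name follows; the shift amount 3 is an arbitrary choice.

```go
package main

import "fmt"

// extShift sign-extends the low 32 bits of x to 64 bits and then
// shifts left by 3, the two steps that extswsli performs at once.
func extShift(x int64) int64 { return int64(int32(x)) << 3 }

func main() {
	fmt.Println(extShift(1), extShift(-1), extShift(1<<32|5))
}
```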
New benchmark demonstrates the improvement:
name old time/op new time/op delta
ExtShift 1.34µs ± 0% 1.30µs ± 0% -3.15% (p=0.057 n=4+3)
Change-Id: I21b410676fdf15d20e0cbbaa75d7c6dcd3bbb7b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/257017
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This change adds rules to find pairs of instructions that can
be combined into a single shift. These instruction sequences
are common in array addressing within loops. Improvements can
be seen in many crypto packages and the hash packages.
These are based on the extended mnemonics found in the ISA
sections C.8.1 and C.8.2.
Some rules in PPC64.rules were moved because the ordering prevented
some matching.
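The arithmetic behind such a combination can be sketched in Go. A mask followed by a shift, as in a scaled table index, computes the same value as one rotate followed by one mask, which is the form a single rotate-and-mask instruction provides. The function names are hypothetical and this is not the rule text from PPC64.rules.

```go
package main

import (
	"fmt"
	"math/bits"
)

// maskThenShift is the two-step form: keep the low byte, then scale by 8
// (as when indexing a table of 8-byte elements with a byte-sized index).
func maskThenShift(x uint64) uint64 {
	return (x & 0xFF) << 3
}

// rotateAndMask is the one-step form: rotate left by 3 and keep bits 3..10.
func rotateAndMask(x uint64) uint64 {
	return bits.RotateLeft64(x, 3) & (0xFF << 3)
}

func main() {
	for _, x := range []uint64{0, 0xF0, 0x123456789ABCDEF0, ^uint64(0)} {
		fmt.Println(maskThenShift(x) == rotateAndMask(x)) // prints true each time
	}
}
```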
The following results were generated on power9.
hash/crc32:
CRC32/poly=Koopman/size=40/align=0 195ns ± 0% 163ns ± 0% -16.41%
CRC32/poly=Koopman/size=40/align=1 200ns ± 0% 163ns ± 0% -18.50%
CRC32/poly=Koopman/size=512/align=0 1.98µs ± 0% 1.67µs ± 0% -15.46%
CRC32/poly=Koopman/size=512/align=1 1.98µs ± 0% 1.69µs ± 0% -14.80%
CRC32/poly=Koopman/size=1kB/align=0 3.90µs ± 0% 3.31µs ± 0% -15.27%
CRC32/poly=Koopman/size=1kB/align=1 3.85µs ± 0% 3.31µs ± 0% -14.15%
CRC32/poly=Koopman/size=4kB/align=0 15.3µs ± 0% 13.1µs ± 0% -14.22%
CRC32/poly=Koopman/size=4kB/align=1 15.4µs ± 0% 13.1µs ± 0% -14.79%
CRC32/poly=Koopman/size=32kB/align=0 137µs ± 0% 105µs ± 0% -23.56%
CRC32/poly=Koopman/size=32kB/align=1 137µs ± 0% 105µs ± 0% -23.53%
crypto/rc4:
RC4_128 733ns ± 0% 650ns ± 0% -11.32% (p=1.000 n=1+1)
RC4_1K 5.80µs ± 0% 5.17µs ± 0% -10.89% (p=1.000 n=1+1)
RC4_8K 45.7µs ± 0% 40.8µs ± 0% -10.73% (p=1.000 n=1+1)
crypto/sha1:
Hash8Bytes 635ns ± 0% 613ns ± 0% -3.46% (p=1.000 n=1+1)
Hash320Bytes 2.30µs ± 0% 2.18µs ± 0% -5.38% (p=1.000 n=1+1)
Hash1K 5.88µs ± 0% 5.38µs ± 0% -8.62% (p=1.000 n=1+1)
Hash8K 42.0µs ± 0% 37.9µs ± 0% -9.75% (p=1.000 n=1+1)
There are other improvements found in golang.org/x/crypto which are all in the
range of 5-15%.
Change-Id: I193471fbcf674151ffe2edab212799d9b08dfb8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/252097
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
|
|
This changes the code generated for variable length shift
counts to use isel instead of instructions that set and
read the carry flag.
This shortens the code generated for such shifts
by 1 instruction.
This sequence can be found in strconv with these results
on power9:
Atof64Decimal 71.6ns ± 0% 68.3ns ± 0% -4.61%
Atof64Float 95.3ns ± 0% 90.9ns ± 0% -4.62%
Atof64FloatExp 153ns ± 0% 149ns ± 0% -2.61%
Atof64Big 234ns ± 0% 232ns ± 0% -0.85%
Atof64RandomBits 348ns ± 0% 369ns ± 0% +6.03%
Atof64RandomFloats 262ns ± 0% 262ns ± 0% ~
Atof32Decimal 72.0ns ± 0% 68.2ns ± 0% -5.28%
Atof32Float 92.1ns ± 0% 87.1ns ± 0% -5.43%
Atof32FloatExp 159ns ± 0% 158ns ± 0% -0.63%
Atof32Random 194ns ± 0% 191ns ± 0% -1.55%
Some tests in codegen/shift.go are enabled to verify the
expected instructions are generated.
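For context, Go defines a variable shift by a count at or above the operand width to produce 0 for unsigned operands, so the generated code has to choose between the shifted value and 0. The sketch below writes that choice out as an explicit select, which is the role isel can play; the function names are hypothetical and the code is a semantic model, not the generated sequence.

```go
package main

import "fmt"

// shl is the plain Go shift; the language guarantees 0 when s >= 64.
func shl(x uint64, s uint) uint64 {
	return x << s
}

// shlSelect models the same result as "shift by the low 6 bits of the
// count, then select between the shifted value and 0".
func shlSelect(x uint64, s uint) uint64 {
	shifted := x << (s & 63)
	if s < 64 {
		return shifted
	}
	return 0
}

func main() {
	fmt.Println(shl(1, 3), shlSelect(1, 3))   // prints 8 8
	fmt.Println(shl(1, 64), shlSelect(1, 64)) // prints 0 0
}
```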
Change-Id: I968715d10ada405a8c46132bf19b8ed9b85796d1
Reviewed-on: https://go-review.googlesource.com/c/go/+/227337
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This change to the rules removes some unnecessary signed shifts
that appear in the math/rand functions. Existing rules did not
cover some of the signed cases.
A small improvement is seen in math/rand from removing 1 of the 2
instructions generated for Int31n, which is inlined quite often.
Intn1000 46.9ns ± 0% 45.5ns ± 0% -2.99% (p=1.000 n=1+1)
Int63n1000 33.5ns ± 0% 32.8ns ± 0% -2.09% (p=1.000 n=1+1)
Int31n1000 32.7ns ± 0% 32.6ns ± 0% -0.31% (p=1.000 n=1+1)
Float32 32.7ns ± 0% 30.3ns ± 0% -7.34% (p=1.000 n=1+1)
Float64 21.7ns ± 0% 20.9ns ± 0% -3.69% (p=1.000 n=1+1)
Perm3 205ns ± 0% 202ns ± 0% -1.46% (p=1.000 n=1+1)
Perm30 1.71µs ± 0% 1.68µs ± 0% -1.35% (p=1.000 n=1+1)
Perm30ViaShuffle 1.65µs ± 0% 1.65µs ± 0% -0.30% (p=1.000 n=1+1)
ShuffleOverhead 2.83µs ± 0% 2.83µs ± 0% -0.07% (p=1.000 n=1+1)
Read3 18.7ns ± 0% 16.1ns ± 0% -13.90% (p=1.000 n=1+1)
Read64 126ns ± 0% 124ns ± 0% -1.59% (p=1.000 n=1+1)
Read1000 1.75µs ± 0% 1.63µs ± 0% -7.08% (p=1.000 n=1+1)
Change-Id: I11502dfca7d65aafc76749a8d713e9e50c24a858
Reviewed-on: https://go-review.googlesource.com/c/go/+/225917
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Use the shiftIsBounded function to generate more efficient
Shift instructions.
Updates #25167
Change-Id: Id350f8462dc3a7ed3bfed0bcbea2860b8f40048a
Reviewed-on: https://go-review.googlesource.com/c/go/+/182558
Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Richard Musiol <neelance@gmail.com>
|
|
We know that a & 31 is non-negative for all a, signed or not.
We can avoid checking that and needing to write out an
unreachable call to panicshift.
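A small Go example of the source-level behavior involved: with a signed shift count, a negative count panics at run time, while masking with 31 always yields a count in the range 0..31. The function names and the panics helper are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// shlMasked masks the signed count first; s & 31 is never negative.
func shlMasked(x uint32, s int) uint32 {
	return x << (s & 31)
}

// shlUnmasked uses the signed count directly and panics if s < 0.
func shlUnmasked(x uint32, s int) uint32 {
	return x << s
}

// panics reports whether calling f results in a run-time panic.
func panics(f func()) (p bool) {
	defer func() { p = recover() != nil }()
	f()
	return false
}

func main() {
	fmt.Println(shlMasked(1, -1))                           // prints 2147483648
	fmt.Println(panics(func() { shlUnmasked(1, -1) }))      // prints true
	fmt.Println(panics(func() { shlMasked(1, -1) }))        // prints false
}
```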
Change-Id: I32f32fb2c950d2b2b35ac5c0e99b7b2dbd47f917
Reviewed-on: https://go-review.googlesource.com/c/go/+/167499
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Change-Id: I33f5b5051e5f75aa264ec656926223c5a3c09c1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/167498
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matt Layher <mdlayher@gmail.com>
|
|
Use conditional moves instead of subtractions with borrow to handle
saturation cases. This allows us to delete the SUBE/SUBEW ops and
associated rules from the SSA backend. Using conditional moves also
means we can detect when shift values are masked so I've added some
new rules to constant fold the relevant comparisons and masking ops.
Also use the new shiftIsBounded() function to avoid generating code
to handle saturation cases where possible.
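The "saturation cases" are shift counts at or above the operand width. A minimal Go sketch of the semantics, using hypothetical function names: a signed right shift by an oversized count behaves like a shift by 63, and a count masked with 63 can never reach the saturation case at all, which is the situation where the extra handling can be dropped.

```go
package main

import "fmt"

// sar is the plain Go signed right shift; for s >= 64 the result is
// all sign bits (0 or -1).
func sar(x int64, s uint) int64 {
	return x >> s
}

// sarClamped models the saturation explicitly by clamping the count.
func sarClamped(x int64, s uint) int64 {
	if s > 63 {
		s = 63
	}
	return x >> s
}

// shrBounded has a provably bounded count, so no saturation case exists.
func shrBounded(x uint64, s uint) uint64 {
	return x >> (s & 63)
}

func main() {
	fmt.Println(sar(-8, 100), sarClamped(-8, 100)) // prints -1 -1
	fmt.Println(sar(8, 100), sarClamped(8, 100))   // prints 0 0
	fmt.Println(shrBounded(256, 68))               // prints 16
}
```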
Updates #25167 for s390x.
Change-Id: Ief9991c91267c9151ce4c5ec07642abb4dcc1c0d
Reviewed-on: https://go-review.googlesource.com/110070
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|