aboutsummaryrefslogtreecommitdiff
path: root/test/codegen
AgeCommit message (Collapse)Author
2024-08-28cmd/compile/internal/ssa: combine shift and addition for riscv64 rva22u64Joel Sing
When GORISCV64 enables rva22u64, combined shift and addition using the SH1ADD, SH2ADD and SH3ADD instructions that are available via the Zba extension. This results in more than 2000 instructions being removed from the Go binary on riscv64. Change-Id: Ia62ae7dda3d8083cff315113421bee73f518eea8 Reviewed-on: https://go-review.googlesource.com/c/go/+/606636 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
2024-08-26cmd/compile: regalloc: drop values that aren't used until after a callKeith Randall
No point in keeping values in registers when their next use is after a call, as we'd have to spill/restore them anyway. cmd/go is 0.1% smaller. Fixes #59297 Change-Id: I10ee761d0d23229f57de278f734c44d6a8dccd6c Reviewed-on: https://go-review.googlesource.com/c/go/+/509255 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-08-26cmd/compile: intrinsify math.MulUintptr on PPC64Paul E. Murphy
This can be done efficiently with few instructions. This also adds MULHDUCC for further codegen improvement. Change-Id: I06320ba4383a679341b911a237a360ef07b19168 Reviewed-on: https://go-review.googlesource.com/c/go/+/605975 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Archana Ravindar <aravinda@redhat.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-08-23test/codegen: add initial codegen tests for integer min/maxJoel Sing
Change-Id: I006370053748edbec930c7279ee88a805009aa0d Reviewed-on: https://go-review.googlesource.com/c/go/+/606976 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-08-14cmd/compile: improve unneeded zeroing removalKeith Randall
After newobject, we don't need to write zeroes to initialize the object. It has already been zeroed by the allocator. This is already handled in most cases, but because we run builtin decomposition after the opt pass, we don't handle cases where the zero of a compound builtin is being written. Improve the zero detector to handle those cases. Fixes #68845 Change-Id: If3dde2e304a05e5a6a6723565191d5444b334bcc Reviewed-on: https://go-review.googlesource.com/c/go/+/605255 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Auto-Submit: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-08-12cmd/compile: add additional arm64 bit field ruleskhr@golang.org
Get rid of TODO in prove pass. We currently avoid marking shifts of constants as bounded, where bounded means we don't have to worry about <0 or >=bitwidth shifts. We do this because it causes different rule applications during lowering which cause some codegen tests to fail. Add some new rules which ensure that we get the right final instruction sequence regardless of the ordering. Then we can remove this special case. Change-Id: I4e962d4f09992b42ab47e123de5ded3b8b8fb205 Reviewed-on: https://go-review.googlesource.com/c/go/+/602935 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2024-08-07cmd/compile: simplify prove passkhr@golang.org
We don't need noLimit checks in a bunch of places. Also simplify folding of provable constant results. At this point in the CL stack, compilebench reports no performance changes. The only thing of note is that binaries got a bit smaller. name old text-bytes new text-bytes delta HelloSize 960kB ± 0% 952kB ± 0% -0.83% (p=0.000 n=10+10) CmdGoSize 12.3MB ± 0% 12.1MB ± 0% -1.53% (p=0.000 n=10+10) Change-Id: Id4be75eec0f8c93f2f3b93a8521ce2278ee2ee2c Reviewed-on: https://go-review.googlesource.com/c/go/+/599197 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2024-08-07cmd/compile: rewrite the constant parts of the prove passkhr@golang.org
Handles a lot more cases where constant ranges can eliminate various (mostly bounds failure) paths. Fixes #66826 Fixes #66692 Fixes #48213 Update #57959 TODO: remove constant logic from poset code, no longer needed. Change-Id: Id196436fcd8a0c84c7d59c04f93bd92e26a0fd7e Reviewed-on: https://go-review.googlesource.com/c/go/+/599096 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2024-08-07cmd/compile, math: improve implementation of math.{Max,Min} on loong64Xiaolin Zhao
Make math.{Min,Max} intrinsics and implement math.{archMax,archMin} in hardware. goos: linux goarch: loong64 pkg: math cpu: Loongson-3A6000 @ 2500.00MHz │ old.bench │ new.bench │ │ sec/op │ sec/op vs base │ Max 7.606n ± 0% 3.087n ± 0% -59.41% (p=0.000 n=20) Min 7.205n ± 0% 2.904n ± 0% -59.69% (p=0.000 n=20) MinFloat 37.220n ± 0% 4.802n ± 0% -87.10% (p=0.000 n=20) MaxFloat 33.620n ± 0% 4.802n ± 0% -85.72% (p=0.000 n=20) geomean 16.18n 3.792n -76.57% goos: linux goarch: loong64 pkg: runtime cpu: Loongson-3A5000 @ 2500.00MHz │ old.bench │ new.bench │ │ sec/op │ sec/op vs base │ Max 10.010n ± 0% 7.196n ± 0% -28.11% (p=0.000 n=20) Min 8.806n ± 0% 7.155n ± 0% -18.75% (p=0.000 n=20) MinFloat 60.010n ± 0% 7.976n ± 0% -86.71% (p=0.000 n=20) MaxFloat 56.410n ± 0% 7.980n ± 0% -85.85% (p=0.000 n=20) geomean 23.37n 7.566n -67.63% Updates #59120. Change-Id: I6815d20bc304af3cbf5d6ca8fe0ca1c2ddebea2d Reviewed-on: https://go-review.googlesource.com/c/go/+/580283 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com>
2024-08-02cmd/compile,runtime: disable swissmap fast variantsMichael Pratt
Temporary measure to reduce the required MVP code. For #54766. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap Change-Id: I44dc8acd0dc8280c6beb40451998e84bc85c238a Reviewed-on: https://go-review.googlesource.com/c/go/+/580915 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com>
2024-07-23cmd/compile: make sync/atomic AND/OR operations intrinsic on amd64Keith Randall
Update #61395 Change-Id: I59a950f48efc587dfdffce00e2f4f3ab99d8df00 Reviewed-on: https://go-review.googlesource.com/c/go/+/594738 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Nicolas Hillegeer <aktau@google.com>
2024-07-23cmd/compile: store constant floats using integer constantsKeith Randall
x86 is better at storing constant ints than constant floats. (It uses a constant directly in the instruction stream, instead of loading it from a constant global memory.) Noticed as part of #67957 Change-Id: I9b7b586ad8e0fe9ce245324f020e9526f82b209d Reviewed-on: https://go-review.googlesource.com/c/go/+/592596 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-06-07cmd/compile/ssa: fix (MOVWZreg (RLWINM)) folding on PPC64Paul E. Murphy
RLIWNM does not clear the upper 32 bits of the target register if the mask wraps around (e.g 0xF000000F). Don't elide MOVWZreg for such masks. All other usage clears the upper 32 bits. Fixes #67844. Change-Id: I11b89f1da9ae077624369bfe2bf25e9b7c9b79bc Reviewed-on: https://go-review.googlesource.com/c/go/+/590896 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-23test/codegen: add Mul test for riscv64Meng Zhuo
Change-Id: I51e9832317e5dee1e3fe0772e7592b3dae95a625 Reviewed-on: https://go-review.googlesource.com/c/go/+/586797 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-21cmd/compile/internal/ssa: fix ppc64 merging of (CLRLSLDI (SRD ...))Paul E. Murphy
The rotate value was not correctly converted from a 64 bit to 32 bit rotate. This caused a miscompile of golang.org/x/text/unicode/runenames.Names. Fixes #67526 Change-Id: Ief56fbab27ccc71cd4c01117909bfee7f60a2ea1 Reviewed-on: https://go-review.googlesource.com/c/go/+/586915 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-05-17cmd/compile/internal/ssa: cleanup ANDCCconst rewrite rules on PPC64Paul E. Murphy
Avoid creating duplicate usages of ANDCCconst. This is preparation for a patch to reintroduce ANDconst to simplify the lower pass while treating ANDCCconst like other *CC* ssa opcodes. Also, move many of the similar rules wich retarget ANDCCconst users to the flag result to a common rule for all compares against zero. Change-Id: Ida86efe17ff413cb82c349d8ef69d2899361f4c0 Reviewed-on: https://go-review.googlesource.com/c/go/+/585400 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2024-05-15cmd/compile/internal/ssa: combine more shift and masking on PPC64Paul E. Murphy
Investigating binaries, these patterns seem to show up frequently. Change-Id: I987251e4070e35c25e98da321e444ccaa1526912 Reviewed-on: https://go-review.googlesource.com/c/go/+/583302 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2024-05-03cmd/compile/internal/ssa: on PPC64, try combining CLRLSLDI and SRDconst into ↵Paul E. Murphy
RLWINM This provides a small performance bump to crc64 as measured on ppc64le/power10: name old time/op new time/op delta Crc64/ISO64KB 49.6µs ± 0% 46.6µs ± 0% -6.18% Crc64/ISO4KB 3.16µs ± 0% 2.97µs ± 0% -5.83% Crc64/ISO1KB 840ns ± 0% 794ns ± 0% -5.46% Crc64/ECMA64KB 49.6µs ± 0% 46.5µs ± 0% -6.20% Crc64/Random64KB 53.1µs ± 0% 49.9µs ± 0% -6.04% Crc64/Random16KB 15.9µs ± 1% 15.0µs ± 0% -5.73% Change-Id: I302b5431c7dc46dfd2d211545c483bdcdfe011f1 Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10 Reviewed-on: https://go-review.googlesource.com/c/go/+/581937 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Eli Bendersky <eliben@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2024-04-19cmd/compile: remove redundant calls to cmpstringkhr@golang.org
The results of cmpstring are reuseable if the second call has the same arguments and memory. Note that this gets rid of cmpstring, but we still generate a redundant </<= test and branch afterwards, because the compiler doesn't know that cmpstring only ever returns -1,0,1. Update #61725 Change-Id: I93a0d1ccca50d90b1e1a888240ffb75a3b10b59b Reviewed-on: https://go-review.googlesource.com/c/go/+/578835 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-04-04cmd/internal/obj/ppc64: on Power10, use xxspltidp for float constantsPaul E. Murphy
Any normal float32 constant can be generated by this instruction; use xxspltidp when possible. This prefixed instruction is much faster than the two instruction load sequence from the float32/float64 constant pool. Change-Id: Id751d9ffdae71463adbde66427b986f0b2ef74c2 Reviewed-on: https://go-review.googlesource.com/c/go/+/575555 Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Paul Murphy <murp@ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2024-04-02cmd/compile: check ODEREF for safe lhs in assignment during static initCuong Manh Le
For #66585 Change-Id: Iddc407e3ef4c3b6ecf5173963b66b3e65e43c92d Reviewed-on: https://go-review.googlesource.com/c/go/+/575336 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2024-04-01cmd/compile: support float min/max instructions on PPC64Paul E. Murphy
This enables efficient use of the builtin min/max function for float64 and float32 types on GOPPC64 >= power9. Extend the assembler to support xsminjdp/xsmaxjdp and use them to implement float min/max. Simplify the VSX xx3 opcode rules to allow FPR arguments, if all arguments are an FPR. Change-Id: I15882a4ce5dc46eba71d683cf1d184dc4236a328 Reviewed-on: https://go-review.googlesource.com/c/go/+/574535 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Than McIntosh <thanm@google.com>
2024-03-21cmd/compile,cmd/go,cmd/internal,runtime: remove dynamic checks for atomics ↵Andrey Bokhanko
for ARM64 targets that support LSE Remove dynamic checks for atomic instructions for ARM64 targets that support LSE extension. For #66131 Change-Id: I0ec1b183a3f4ea4c8a537430646e6bc4b4f64271 Reviewed-on: https://go-review.googlesource.com/c/go/+/569536 Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Fannie Zhang <Fannie.Zhang@arm.com> Reviewed-by: Shu-Chun Weng <scw@google.com>
2024-03-21cmd/compile: include constant bools in memcombineKeith Randall
Constant bools are like constant 1-byte values, they memcombine just fine. (There are still trickier cases that this pass doesn't catch yet, see TODO at memcombine.go:503.) Fixes #66413 Change-Id: Ia67cf72ed1c416e27ac22da443bd88a3f09a6cc8 Reviewed-on: https://go-review.googlesource.com/c/go/+/573416 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Joseph Tsai <joetsai@digital-static.net> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Keith Randall <khr@google.com>
2024-03-15cmd/compile/internal: generate ADDZE on PPC64Paul E. Murphy
This usage shows up in quite a few places, and helps reduce register pressure in several complex cryto functions by removing a MOVD $0,... instruction. Change-Id: I9444ea8f9d19bfd68fb71ea8dc34e109681b3802 Reviewed-on: https://go-review.googlesource.com/c/go/+/571055 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Paul Murphy <murp@ibm.com>
2024-03-13cmd/asm,cmd/compile: generate less instructions for most 32 bit constant ↵Paul E. Murphy
adds on ppc64x For GOPPC64 < 10 targets, most large 32 bit constants (those exceeding int16 capacity) can be added using two instructions instead of 3. This cannot be done for values greater than 0x7FFF7FFF, so this must be done during asm preprocessing as the optab matching rules cannot differentiate this special case. Likewise, constants 0x8000 <= x < 0x10000 are not converted. The assembler currently generates 2 instructions sequences for these constants. Change-Id: I1ccc839c6c28fc32f15d286b2e52e2d22a2a06d4 Reviewed-on: https://go-review.googlesource.com/c/go/+/568116 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2024-03-07cmd/compile,cmd/internal/obj: provide rotation pseudo-instructions for riscv64Joel Sing
Provide and use rotation pseudo-instructions for riscv64. The RISC-V bitmanip extension adds support for hardware rotation instructions in the form of ROL, ROLW, ROR, RORI, RORIW and RORW. These are easily implemented in the assembler as pseudo-instructions for CPUs that do not support the bitmanip extension. This approach provides a number of advantages, including reducing the rewrite rules needed in the compiler, simplifying codegen tests and most importantly, allowing these instructions to be used in assembly (for example, riscv64 optimised versions of SHA-256 and SHA-512). When bitmanip support is added, these instruction sequences can simply be replaced with a single instruction if permitted by the GORISCV64 profile. Change-Id: Ia23402e1a82f211ac760690deb063386056ae1fa Reviewed-on: https://go-review.googlesource.com/c/go/+/565015 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: M Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Run-TryBot: Joel Sing <joel@sing.id.au>
2024-02-29cmd/compile: soften type matching when allocating stack slotskhr@golang.org
Currently we use pointer equality on types when deciding whether we can reuse a stack slot. That's too strict, as we don't guarantee pointer equality for the same type. In particular, it can vary based on whether PtrTo has been called in the frontend or not. Instead, use the type's LinkString, which is guaranteed to both be unique for a type, and to not vary given two different type structures describing the same type. Update #65783 Change-Id: I64f55138475f04bfa30cfb819b786b7cc06aebe4 Reviewed-on: https://go-review.googlesource.com/c/go/+/565436 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2024-02-16cmd/compile: improve rotations for riscv64Joel Sing
Enable canRotate for riscv64, enable rotation intrinsics and provide better rewrite implementations for rotations. By avoiding Lsh*x64 and Rsh*Ux64 we can produce better code, especially for 32 and 64 bit rotations. By enabling canRotate we also benefit from the generic rotation rewrite rules. Benchmark on a StarFive VisionFive 2: │ rotate.1 │ rotate.2 │ │ sec/op │ sec/op vs base │ RotateLeft-4 14.700n ± 0% 8.016n ± 0% -45.47% (p=0.000 n=10) RotateLeft8-4 14.70n ± 0% 10.69n ± 0% -27.28% (p=0.000 n=10) RotateLeft16-4 14.70n ± 0% 12.02n ± 0% -18.23% (p=0.000 n=10) RotateLeft32-4 13.360n ± 0% 8.016n ± 0% -40.00% (p=0.000 n=10) RotateLeft64-4 13.360n ± 0% 8.016n ± 0% -40.00% (p=0.000 n=10) geomean 14.15n 9.208n -34.92% Change-Id: I1a2036fdc57cf88ebb6617eb8d92e1d187e183b2 Reviewed-on: https://go-review.googlesource.com/c/go/+/560315 Reviewed-by: M Zhuo <mengzhuo1203@gmail.com> Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com>
2024-02-08test/codegen: add float max/min codegen testMeng Zhuo
As CL 514596 and CL 514775 adds hardware implement of float max/min, we should add codegen test for these two CL. Change-Id: I347331032fe9f67a2e6fdb5d3cfe20203296b81c Reviewed-on: https://go-review.googlesource.com/c/go/+/561295 Reviewed-by: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: M Zhuo <mengzhuo1203@gmail.com> Reviewed-by: David Chase <drchase@google.com>
2023-12-14all: remove newline characters after return statementsDanil Timerbulatov
This commit is aimed at improving the readability and consistency of the code base. Extraneous newline characters were present after some return statements, creating unnecessary separation in the code. Fixes #64610 Change-Id: Ic1b05bf11761c4dff22691c2f1c3755f66d341f7 Reviewed-on: https://go-review.googlesource.com/c/go/+/548316 Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2023-12-01cmd/compile: correct code generation for right shifts on riscv64Joel Sing
The code generation on riscv64 will currently result in incorrect assembly when a 32 bit integer is right shifted by an amount that exceeds the size of the type. In particular, this occurs when an int32 or uint32 is cast to a 64 bit type and right shifted by a value larger than 31. Fix this by moving the SRAW/SRLW conversion into the right shift rules and removing the SignExt32to64/ZeroExt32to64. Add additional rules that rewrite to SRAIW/SRLIW when the shift is less than the size of the type, or replace/eliminate the shift when it exceeds the size of the type. Add SSA tests that would have caught this issue. Also add additional codegen tests to ensure that the resulting assembly is what we expect in these overflow cases. Fixes #64285 Change-Id: Ie97b05668597cfcb91413afefaab18ee1aa145ec Reviewed-on: https://go-review.googlesource.com/c/go/+/545035 Reviewed-by: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: M Zhuo <mzh@golangcn.org> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-11-30cmd/compile: fix memcombine pass for big endian, > 1 byte elementsKeith Randall
The shift amounts were wrong in this case, leading to miscompilation of load combining. Also the store combining was not triggering when it should. Fixes #64468 Change-Id: Iaeb08972c5fc1d6f628800334789c6af7216e87b Reviewed-on: https://go-review.googlesource.com/c/go/+/546355 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-21cmd/compile/internal/walk: copy SSA-able variablesMatthew Dempsky
order.go ensures expressions that are passed to the runtime by address are in fact addressable. However, in the case of local variables, if the variable hasn't already been marked as addrtaken, then taking its address here will effectively prevent the variable from being converted to SSA form. Instead, it's better to just copy the variable into a new temporary, which we can pass by address instead. This ensures the original variable can still be converted to SSA form. Fixes #63332. Change-Id: I182376d98d419df8bf07c400d84c344c9b82c0fb Reviewed-on: https://go-review.googlesource.com/c/go/+/541715 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Matthew Dempsky <mdempsky@google.com>
2023-11-13cmd/compile/internal/ssa: on PPC64, merge (CMPconst [0] (op ...)) more ↵Paul E. Murphy
aggressively Generate the CC version of many opcodes whose result is compared against signed 0. The approach taken here works even if the opcode result is used in multiple places too. Add support for ADD, ADDconst, ANDN, SUB, NEG, CNTLZD, NOR conversions to their CC opcode variant. These are the most commonly used variants. Also, do not set clobberFlags of CNTLZD and CNTLZW, they do not clobber flags. This results in about 1% smaller text sections in kubernetes binaries, and no regressions in the crypto benchmarks. Change-Id: I9e0381944869c3774106bf348dead5ecb96dffda Reviewed-on: https://go-review.googlesource.com/c/go/+/538636 Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com> Reviewed-by: Heschi Kreinick <heschi@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2023-11-09cmd/internal/obj/ppc64: remove C_UCON optab matching classPaul E. Murphy
This optab matching rule was used to match signed 16 bit values shifted left by 16 bits. Unsigned 16 bit values greater than 0x7FFF<<16 were classified as C_U32CON which led to larger than necessary codegen. Instead, rewrite logical/arithmetic operations in the preprocessor pass to use the 16 bit shifted immediate operation (e.g ADDIS vs ADD). This simplifies the optab matching rules, while also minimizing codegen size for large unsigned values. Note, ADDIS sign-extends the constant argument, all others do not. For matching opcodes, this means: MOVD $is<<16,Rx becomes ADDIS $is,Rx or ORIS $is,Rx MOVW $is<<16,Rx becomes ADDIS $is,Rx ADD $is<<16,[Rx,]Ry becomes ADDIS $is[Rx,]Ry OR $is<<16,[Rx,]Ry becomes ORIS $is[Rx,]Ry XOR $is<<16,[Rx,]Ry becomes XORIS $is[Rx,]Ry Change-Id: I1a988d9f52517a04bb8dc2e41d7caf3d5fff867c Reviewed-on: https://go-review.googlesource.com/c/go/+/536735 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Heschi Kreinick <heschi@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2023-10-30cmd/compile: optimize right shifts of int32 on riscv64Ubuntu
The compiler is currently sign extending 32 bit signed integers to 64 bits before right shifting them using a 64 bit shift instruction. There's no need to do this as RISC-V has instructions for right shifting 32 bit signed values (sraw and sraiw) which sign extend the result of the shift to 64 bits. Change the compiler so that it uses sraw and sraiw for shifts of signed 32 bit integers reducing in most cases the number of instructions needed to perform the shift. Here are some examples of code sequences that are changed by this patch: int32(a) >> 2 before: sll x5,x10,0x20 sra x10,x5,0x22 after: sraw x10,x10,0x2 int32(v) >> int(s) before: sext.w x5,x10 sltiu x6,x11,64 add x6,x6,-1 or x6,x11,x6 sra x10,x5,x6 after: sltiu x5,x11,32 add x5,x5,-1 or x5,x11,x5 sraw x10,x10,x5 int32(v) >> (int(s) & 31) before: sext.w x5,x10 and x6,x11,63 sra x10,x5,x6 after: and x5,x11,31 sraw x10,x10,x5 int32(100) >> int(a) before: bltz x10,<target address calls runtime.panicshift> sltiu x5,x10,64 add x5,x5,-1 or x5,x10,x5 li x6,100 sra x10,x6,x5 after: bltz x10,<target address calls runtime.panicshift> sltiu x5,x10,32 add x5,x5,-1 or x5,x10,x5 li x6,100 sraw x10,x6,x5 int32(v) >> (int(s) & 63) before: sext.w x5,x10 and x6,x11,63 sra x10,x5,x6 after: and x5,x11,63 sltiu x6,x5,32 add x6,x6,-1 or x5,x5,x6 sraw x10,x10,x5 In most cases we eliminate one instruction. In the case where we shift a int32 constant by a variable the number of instructions generated is identical. A sra is simply replaced by a sraw. In the unusual case where we shift right by a variable anded with a constant > 31 but < 64, we generate two additional instructions. As this is an unusual case we do not try to optimize for it. Some improvements can be seen in some of the existing benchmarks, notably in the utf8 package which performs right shifts of runes which are signed 32 bit integers. | utf8-old | utf8-new | | sec/op | sec/op vs base | EncodeASCIIRune-4 17.68n ± 0% 17.67n ± 0% ~ (p=0.312 n=10) EncodeJapaneseRune-4 35.34n ± 0% 34.53n ± 1% -2.31% (p=0.000 n=10) AppendASCIIRune-4 3.213n ± 0% 3.213n ± 0% ~ (p=0.318 n=10) AppendJapaneseRune-4 36.14n ± 0% 35.35n ± 0% -2.19% (p=0.000 n=10) DecodeASCIIRune-4 28.11n ± 0% 27.36n ± 0% -2.69% (p=0.000 n=10) DecodeJapaneseRune-4 38.55n ± 0% 38.58n ± 0% ~ (p=0.612 n=10) Change-Id: I60a91cbede9ce65597571c7b7dd9943eeb8d3cc2 Reviewed-on: https://go-review.googlesource.com/c/go/+/535115 Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: M Zhuo <mzh@golangcn.org> Reviewed-by: David Chase <drchase@google.com>
2023-10-19test: migrate remaining files to go:build syntaxDmitri Shuralyov
Most of the test cases in the test directory use the new go:build syntax already. Convert the rest. In general, try to place the build constraint line below the test directive comment in more places. For #41184. For #60268. Change-Id: I11c41a0642a8a26dc2eda1406da908645bbc005b Cq-Include-Trybots: luci.golang.try:gotip-linux-386-longtest,gotip-linux-amd64-longtest,gotip-windows-amd64-longtest Reviewed-on: https://go-review.googlesource.com/c/go/+/536236 Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-19test/codegen: fix PPC64 AddLargeConst testPaul E. Murphy
Commit 061d77cb70 was published in parallel with another commit 36ecff0893 which changed how certain constants were generated. Update the test to account for the changes. Change-Id: I314b735a34857efa02392b7a0dd9fd634e4ee428 Reviewed-on: https://go-review.googlesource.com/c/go/+/536256 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Paul Murphy <murp@ibm.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-10-18cmd/compile/internal/ssa: on PPC64, generate large constant paddiPaul E. Murphy
This is only supported power10/linux/PPC64. This generates smaller, faster code by merging a pli + add into paddi. Change-Id: I1f4d522fce53aea4c072713cc119a9e0d7065acc Reviewed-on: https://go-review.googlesource.com/c/go/+/531717 Run-TryBot: Paul Murphy <murp@ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2023-10-18cmd/compile: avoid ANDCCconst on PPC64 if condition not neededLynn Boger
In the PPC64 ISA, the instruction to do an 'and' operation using an immediate constant is only available in the form that also sets CR0 (i.e. clobbers the condition register.) This means CR0 is being clobbered unnecessarily in many cases. That affects some decisions made during some compiler passes that check for it. In those cases when the constant used by the ANDCC is a right justified consecutive set of bits, a shift instruction can be used which has the same effect if CR0 does not need to be set. The rule to do that has been added to the late rules file after other rules using ANDCCconst have been processed in the main rules file. Some codegen tests had to be updated since ANDCC is no longer generated for some cases. A new test case was added to verify the ANDCC is present if the results for both the AND and CR0 are used. Change-Id: I304f607c039a458e2d67d25351dd00aea72ba542 Reviewed-on: https://go-review.googlesource.com/c/go/+/531435 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Paul Murphy <murp@ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2023-10-12cmd/compile: when combining stores, use line number of first storeKeith Randall
var p *[2]uint32 = ... p[0] = 0 p[1] = 0 When we combine these two 32-bit stores into a single 64-bit store, use the line number of the first store, not the second one. This differs from the default behavior because usually with the combining that the compiler does, we use the line number of the last instruction in the combo (e.g. load+add, we use the line number of the add). This is the same behavior that gcc does in C (picking the line number of the first of a set of combined stores). Change-Id: Ie70bf6151755322d33ecd50e4d9caf62f7881784 Reviewed-on: https://go-review.googlesource.com/c/go/+/521678 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com>
2023-10-09cmd/compile: use type hash from itab field instead of type fieldKeith Randall
It is one less dependent load away, and right next to another field in the itab we also load as part of the type switch or type assert. Change-Id: If7aaa7814c47bd79a6c7ed4232ece0bc1d63550e Reviewed-on: https://go-review.googlesource.com/c/go/+/533117 Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2023-10-09cmd/compile: use cache in front of convI2IKeith Randall
This is the last of the getitab users to receive a cache. We should now no longer see getitab (and callees) in profiles. Hopefully. Change-Id: I2ed72b9943095bbe8067c805da7f08e00706c98c Reviewed-on: https://go-review.googlesource.com/c/go/+/531055 Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2023-10-07cmd/compile: optimize right shifts of uint32 on riscvMark Ryan
The compiler is currently zero extending 32 bit unsigned integers to 64 bits before right shifting them using a 64 bit shift instruction. There's no need to do this as RISC-V has instructions for right shifting 32 bit unsigned values (srlw and srliw) which zero extend the result of the shift to 64 bits. Change the compiler so that it uses srlw and srliw for 32 bit unsigned shifts reducing in most cases the number of instructions needed to perform the shift. Here are some examples of code sequences that are changed by this patch: uint32(a) >> 2 before: sll x5,x10,0x20 srl x10,x5,0x22 after: srlw x10,x10,0x2 uint32(a) >> int(b) before: sll x5,x10,0x20 srl x5,x5,0x20 srl x5,x5,x11 sltiu x6,x11,64 neg x6,x6 and x10,x5,x6 after: srlw x5,x10,x11 sltiu x6,x11,32 neg x6,x6 and x10,x5,x6 bits.RotateLeft32(uint32(a), 1) before: sll x5,x10,0x1 sll x6,x10,0x20 srl x7,x6,0x3f or x5,x5,x7 after: sll x5,x10,0x1 srlw x6,x10,0x1f or x10,x5,x6 bits.RotateLeft32(uint32(a), int(b)) before: and x6,x11,31 sll x7,x10,x6 sll x8,x10,0x20 srl x8,x8,0x20 add x6,x6,-32 neg x6,x6 srl x9,x8,x6 sltiu x6,x6,64 neg x6,x6 and x6,x9,x6 or x6,x6,x7 after: and x5,x11,31 sll x6,x10,x5 add x5,x5,-32 neg x5,x5 srlw x7,x10,x5 sltiu x5,x5,32 neg x5,x5 and x5,x7,x5 or x10,x6,x5 The one regression observed is the following case, an unbounded right shift of a uint32 where the value we're shifting by is known to be < 64 but > 31. As this is an unusual case this commit does not optimize for it, although the existing code does. uint32(a) >> (b & 63) before: sll x5,x10,0x20 srl x5,x5,0x20 and x6,x11,63 srl x10,x5,x6 after and x5,x11,63 srlw x6,x10,x5 sltiu x5,x5,32 neg x5,x5 and x10,x6,x5 Here we have one extra instruction. Some benchmark highlights, generated on a VisionFive2 8GB running Ubuntu 23.04. pkg: math/bits LeadingZeros32-4 18.64n ± 0% 17.32n ± 0% -7.11% (p=0.000 n=10) LeadingZeros64-4 15.47n ± 0% 15.51n ± 0% +0.26% (p=0.027 n=10) TrailingZeros16-4 18.48n ± 0% 17.68n ± 0% -4.33% (p=0.000 n=10) TrailingZeros32-4 16.87n ± 0% 16.07n ± 0% -4.74% (p=0.000 n=10) TrailingZeros64-4 15.26n ± 0% 15.27n ± 0% +0.07% (p=0.043 n=10) OnesCount32-4 20.08n ± 0% 19.29n ± 0% -3.96% (p=0.000 n=10) RotateLeft-4 8.864n ± 0% 8.838n ± 0% -0.30% (p=0.006 n=10) RotateLeft32-4 8.837n ± 0% 8.032n ± 0% -9.11% (p=0.000 n=10) Reverse32-4 29.77n ± 0% 26.52n ± 0% -10.93% (p=0.000 n=10) ReverseBytes32-4 9.640n ± 0% 8.838n ± 0% -8.32% (p=0.000 n=10) Sub32-4 8.835n ± 0% 8.035n ± 0% -9.06% (p=0.000 n=10) geomean 11.50n 11.33n -1.45% pkg: crypto/md5 Hash8Bytes-4 1.486µ ± 0% 1.426µ ± 0% -4.04% (p=0.000 n=10) Hash64-4 2.079µ ± 0% 1.968µ ± 0% -5.36% (p=0.000 n=10) Hash128-4 2.720µ ± 0% 2.557µ ± 0% -5.99% (p=0.000 n=10) Hash256-4 3.996µ ± 0% 3.733µ ± 0% -6.58% (p=0.000 n=10) Hash512-4 6.541µ ± 0% 6.072µ ± 0% -7.18% (p=0.000 n=10) Hash1K-4 11.64µ ± 0% 10.75µ ± 0% -7.58% (p=0.000 n=10) Hash8K-4 82.95µ ± 0% 76.32µ ± 0% -7.99% (p=0.000 n=10) Hash1M-4 10.436m ± 0% 9.591m ± 0% -8.10% (p=0.000 n=10) Hash8M-4 83.50m ± 0% 76.73m ± 0% -8.10% (p=0.000 n=10) Hash8BytesUnaligned-4 1.494µ ± 0% 1.434µ ± 0% -4.02% (p=0.000 n=10) Hash1KUnaligned-4 11.64µ ± 0% 10.76µ ± 0% -7.52% (p=0.000 n=10) Hash8KUnaligned-4 83.01µ ± 0% 76.32µ ± 0% -8.07% (p=0.000 n=10) geomean 28.32µ 26.42µ -6.72% Change-Id: I20483a6668cca1b53fe83944bee3706aadcf8693 Reviewed-on: https://go-review.googlesource.com/c/go/+/528975 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-10-06cmd/compile: expand calls cleanupDavid Chase
Convert expand calls into a smaller number of focused recursive rewrites, and rely on an enhanced version of "decompose" to clean up afterwards. Debugging information seems to emerge intact. Change-Id: Ic46da4207e3a4da5c8e2c47b637b0e35abbe56bb Reviewed-on: https://go-review.googlesource.com/c/go/+/507295 Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-10-06cmd/compile: use cache in front of type assert runtime callKeith Randall
That way we don't need to call into the runtime for every type assertion (to an interface type). name old time/op new time/op delta TypeAssert-24 3.78ns ± 3% 1.00ns ± 1% -73.53% (p=0.000 n=10+8) Change-Id: I0ba308aaf0f24a5495b4e13c814d35af0c58bfde Reviewed-on: https://go-review.googlesource.com/c/go/+/529316 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2023-10-06cmd/compile: add a cache to interface type switchesKeith Randall
That way we don't need to call into the runtime when the type being switched on has been seen many times before. The cache is just a hash table of a sample of all the concrete types that have been switched on at that source location. We record the matching case number and the resulting itab for each concrete input type. The caches seldom get large. The only two in a run of all.bash that get more than 100 entries, even with the sampling rate set to 1, are test/fixedbugs/issue29264.go, with 101 test/fixedbugs/issue29312.go, with 254 Both happen at the type switch in fmt.(*pp).handleMethods, perhaps unsurprisingly. name old time/op new time/op delta SwitchInterfaceTypePredictable-24 25.8ns ± 2% 2.5ns ± 3% -90.43% (p=0.000 n=10+10) SwitchInterfaceTypeUnpredictable-24 37.5ns ± 2% 11.2ns ± 1% -70.02% (p=0.000 n=10+10) Change-Id: I4961ac9547b7f15b03be6f55cdcb972d176955eb Reviewed-on: https://go-review.googlesource.com/c/go/+/526658 Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Keith Randall <khr@google.com>
2023-10-06cmd/compile: improve interface type switchesKeith Randall
For type switches where the targets are interface types, call into the runtime once instead of doing a sequence of assert* calls. name old time/op new time/op delta SwitchInterfaceTypePredictable-24 26.6ns ± 1% 25.8ns ± 2% -2.86% (p=0.000 n=10+10) SwitchInterfaceTypeUnpredictable-24 39.3ns ± 1% 37.5ns ± 2% -4.57% (p=0.000 n=10+10) Not super helpful by itself, but this code organization allows followon CLs that add caching to the lookups. Change-Id: I7967f85a99171faa6c2550690e311bea8b54b01c Reviewed-on: https://go-review.googlesource.com/c/go/+/526657 Reviewed-by: Matthew Dempsky <mdempsky@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com>
2023-10-05cmd/internal/obj/ppc64: generate MOVD mask constants in registerPaul E. Murphy
Add a new form of RLDC which maps directly to the ISA definition of rldc: RLDC Rs, $sh, $mb, Ra. This is used to generate mask constants described below. Using MOVD $-1, Rx; RLDC Rx, $sh, $mb, Rx, any mask constant can be generated. A mask is a contiguous series of 1 bits, which may wrap. Change-Id: Ifcaae1114080ad58b5fdaa3e5fc9019e2051f282 Reviewed-on: https://go-review.googlesource.com/c/go/+/531120 Reviewed-by: Matthew Dempsky <mdempsky@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org>