path: root/src/cmd/compile/internal/ssa/check.go
Age | Commit message | Author
7 days | cmd/compile: fix typo | Weixie Cui
Change-Id: Ia9ee618aa68aad5bab73ee62eea176084ee162da GitHub-Last-Rev: 4cc005d3cd1ae4e5eaa283b1799c7be26b2279f5 GitHub-Pull-Request: golang/go#78625 Reviewed-on: https://go-review.googlesource.com/c/go/+/765280 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2026-02-25 | cmd/compile: handle zero-sized values more generally | khr@golang.org
Introduce a new zero-arg op, Empty, which builds a zero-sized value. This is like ArrayMake0 but can make more general zero-sized values, like those of type [2][0]int. Needed for the subsequent CL. Update #77635 Change-Id: If928e9677be5d40a4e2d7501dada66e062319711 Reviewed-on: https://go-review.googlesource.com/c/go/+/747761 Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
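As a language-level illustration of what "more general zero-sized values" covers, the sketch below checks the storage size of a few types, including the [2][0]int case named in the message. It is ordinary Go, not compiler code, and the helper name isZeroSized is made up for this example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// isZeroSized reports whether a value of type T occupies no storage.
// Hypothetical helper for illustration; it is not part of cmd/compile.
func isZeroSized[T any]() bool {
	var v T
	return unsafe.Sizeof(v) == 0
}

func main() {
	fmt.Println("[0]int:", isZeroSized[[0]int]())       // plain zero-length array
	fmt.Println("[2][0]int:", isZeroSized[[2][0]int]()) // array of zero-length arrays
	fmt.Println("struct{}:", isZeroSized[struct{}]())   // empty struct
	fmt.Println("[2]int:", isZeroSized[[2]int]())       // has storage
}
```

The nested array is zero-sized even though its outer length is nonzero, which is the kind of value a single "empty array" constructor does not obviously describe.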
2026-01-22 | cmd/compile: ensure ops have the expected argument widths | Keith Randall
The generic SSA representation uses explicit extension and truncation operations to change widths of values. The map intrinsics were playing somewhat fast and loose with this requirement. Fix that, and add a check to make sure we don't regress. I don't think there is a triggerable bug here, but I ran into this with some prove pass modifications, where cmd/compile/internal/ssa/prove.go:isCleanExt (and/or its uses) is actually wrong when this invariant is not maintained. Change-Id: Idb7be6e691e2dbf6d7af6584641c3227c5c64bf5 Reviewed-on: https://go-review.googlesource.com/c/go/+/731300 Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
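The invariant described above can be pictured with a toy checker. This is only a sketch under stated assumptions: the value type, the wantArgSize table, and checkArgWidths are hypothetical stand-ins and are much simpler than the real ssa.Value and the check added by this CL. Only the op names (Add64, Add32, ZeroExt32to64) are borrowed from the generic op set.

```go
package main

import "fmt"

// value is a toy stand-in for an SSA value: an op name, a result size in
// bytes, and its arguments. Hypothetical; far simpler than ssa.Value.
type value struct {
	op   string
	size int64
	args []*value
}

// wantArgSize lists, for a few ops, the argument width in bytes that this
// sketch expects. The table is illustrative, not the compiler's.
var wantArgSize = map[string]int64{
	"Add64":         8,
	"Add32":         4,
	"ZeroExt32to64": 4,
}

// checkArgWidths returns an error for the first argument whose size does
// not match the width expected by v's op. Unknown ops are not checked.
func checkArgWidths(v *value) error {
	want, ok := wantArgSize[v.op]
	if !ok {
		return nil
	}
	for i, a := range v.args {
		if a.size != want {
			return fmt.Errorf("%s: arg %d has size %d, want %d", v.op, i, a.size, want)
		}
	}
	return nil
}

func main() {
	x32 := &value{op: "Arg", size: 4}
	y64 := &value{op: "Arg", size: 8}

	// Mixing a 32-bit value into a 64-bit add violates the invariant.
	bad := &value{op: "Add64", size: 8, args: []*value{x32, y64}}
	fmt.Println("without extension:", checkArgWidths(bad))

	// Inserting an explicit extension first restores it.
	ext := &value{op: "ZeroExt32to64", size: 8, args: []*value{x32}}
	good := &value{op: "Add64", size: 8, args: []*value{ext, y64}}
	fmt.Println("with extension:", checkArgWidths(good))
}
```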
2025-09-26 | [dev.simd] simd: generalize select-float32-from-pair | David Chase
This adds methods SelectFromPair for {Int,Uint,Float}32x4 and SelectFromPairGrouped for {Int,Uint,Float}32x8. Each of these has the signature
```
func (x T32xK) Method(a, b, c, d uint8, y T32xK) T32xK
```
where a, b, c, d can be 0-7 and each one specifies an element from the concatenated elements of x (0-3) and y (4-7). When a, b, c, d are constants, 1 or 2 instructions are generated, otherwise, it's done the harder-slower way with a function call. Change-Id: I05eb9342e90edb9d83a4d0f5b924bcd2cfd4d12e Reviewed-on: https://go-review.googlesource.com/c/go/+/703575 Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
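The selection rule for the four-element case can be modeled in scalar Go. This is a reference sketch only: it uses a plain [4]int32 instead of the simd vector types, the function name selectFromPair is hypothetical, and masking the selectors with 7 is an assumption of the sketch (the message only says the selectors can be 0-7).

```go
package main

import "fmt"

// selectFromPair is a scalar model of the selection described above:
// selectors 0-3 pick an element of x, selectors 4-7 pick an element of y.
// Hypothetical helper; not the simd package's implementation.
func selectFromPair(x, y [4]int32, a, b, c, d uint8) [4]int32 {
	// Concatenate x (indices 0-3) and y (indices 4-7).
	var cat [8]int32
	copy(cat[:4], x[:])
	copy(cat[4:], y[:])
	// Assumption of this sketch: out-of-range selectors are masked to 0-7.
	return [4]int32{cat[a&7], cat[b&7], cat[c&7], cat[d&7]}
}

func main() {
	x := [4]int32{10, 11, 12, 13}
	y := [4]int32{20, 21, 22, 23}
	// Picks x[0], y[1], x[2], y[3].
	fmt.Println(selectFromPair(x, y, 0, 5, 2, 7))
}
```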
2025-09-11 | [dev.simd] all: merge master (cf5e993) into dev.simd | Cherry Mui
Merge List:
+ 2025-09-11 cf5e993177 cmd/link: allow one to specify the data section in the internal linker
+ 2025-09-11 cdb3d467fa encoding/gob: make use of reflect.TypeAssert
+ 2025-09-11 fef360964c archive/tar: fix typo in benchmark name
+ 2025-09-11 7d562b8460 syscall: actually remove unreachable code
+ 2025-09-11 c349582344 crypto/rsa: don't test CL 687836 against v1.0.0 FIPS 140-3 module
+ 2025-09-11 253dd08f5d debug/macho: filter non-external symbols when reading imported symbols without LC_DYSYMTAB
+ 2025-09-10 2009e6c596 internal/runtime/maps: remove redundant package docs
+ 2025-09-10 de5d7eccb9 runtime/internal/maps: only conditionally clear groups when sparse
+ 2025-09-10 8098b99547 internal/runtime/maps: speed up Clear
+ 2025-09-10 fe5420b054 cmd: delete some more windows/arm remnants
+ 2025-09-10 fad1dc608d runtime: don't artificially limit TestReadMetricsSched
+ 2025-09-10 b1f3e38e41 cmd/compile: when CSEing two values, prefer the statement marked one
+ 2025-09-10 00824f5ff5 types2: better documentation for resolve()
+ 2025-09-10 5cf8ca42e3 internal/trace/raw: use strings.Cut instead of strings.SplitN 2
+ 2025-09-10 80a2aae922 Revert "cmd/compile: improve stp merging for non-sequent cases"
+ 2025-09-10 f327a05419 go/token, syscall: annotate if blocks that defeat vet's unreachable pass
+ 2025-09-10 9650c97d0f syscall: remove unreachable code
+ 2025-09-10 f1c4b860d4 Revert "crypto/internal/fips140: update frozen module version to "v1.0.0""
+ 2025-09-10 30686c4cc8 encoding/json/v2: document context annotation with SemanticError
+ 2025-09-09 c5737dc21b runtime: when using cgo on 386, call C sigaction function
+ 2025-09-09 b9a4a09b0f runtime: remove duff support for riscv64
+ 2025-09-09 4dac9e093f cmd/compile: use generated loops instead of DUFFCOPY on riscv64
+ 2025-09-09 879ff736d3 cmd/compile: use generated loops instead of DUFFZERO on riscv64
+ 2025-09-09 77643dc63f cmd/compile: simplify zerorange on riscv64
+ 2025-09-09 e6605a1bcc encoding/json: use reflect.TypeAssert
+ 2025-09-09 4c20f7f15a cmd/cgo: run gcc to get errors and debug info in parallel
+ 2025-09-09 5dcedd6550 runtime: lock mheap_.speciallock when allocating synctest specials
+ 2025-09-09 d3be949ada runtime: don't negate eventfd errno
+ 2025-09-09 836fa74518 syscall: optimise cgo clearenv
+ 2025-09-09 ce39174482 crypto/rsa: check PrivateKey.D for consistency with Dp and Dq
+ 2025-09-09 5d9d0513dc crypto/rsa: check for post-Precompute changes in Validate
+ 2025-09-09 968a5107a9 crypto/internal/fips140: update frozen module version to "v1.0.0"
+ 2025-09-09 645ee44492 crypto/ecdsa: deprecate direct use of big.Int fields in keys
+ 2025-09-09 a67977da5e cmd/compile/internal/inline: ignore superfluous slicing
+ 2025-09-09 a5fa5ea51c cmd/compile/internal/ssa: expand runtime.memequal for length {3,5,6,7}
+ 2025-09-09 4c63d798cb cmd/compile: improve stp merging for non-sequent cases
+ 2025-09-09 bdd51e7855 cmd/compile: use constant zero register instead of specialized zero instructions on mips64x
+ 2025-09-09 10ac80de77 cmd/compile: introduce CCMP generation
+ 2025-09-09 3b3b16957c Revert "cmd/go: use os.Rename to move files on Windows"
+ 2025-09-09 e3223518b8 cmd/go: split generating cover files into its own action
+ 2025-09-09 af03343f93 cmd/compile: fix bounds check report
+ 2025-09-08 6447ff409a cmd/compile: fold constant in ADDshift op on loong64
+ 2025-09-08 5b218461f9 cmd/compile: optimize loads from abi.Type.{Size_,PtrBytes,Kind_}
+ 2025-09-08 b915e14490 cmd/compile: consolidate logic for rewriting fixed loads
+ 2025-09-08 06e791c0cd cmd/compile: simplify zerorange on mips
+ 2025-09-08 cf42b785b7 cmd/cgo: run recordTypes for each of the debugs at the end of Translate
+ 2025-09-08 5e6296f3f8 archive/tar: optimize nanosecond parsing in parsePAXTime
+ 2025-09-08 ea00650784 debug/pe: permit symbols with no name
+ 2025-09-08 4cc7cc74c3 crypto: update Hash comments to point to crypto/sha3
+ 2025-09-08 ff45d5d53c encoding/json/internal/jsonflags: fix comment with wrong field name
+ 2025-09-06 861c90c907 net/http: pool transport gzip readers
+ 2025-09-06 57769b5532 os: reject OpenDir of a non-directory file in Plan 9
+ 2025-09-06 a6144613d3 crypto/tls: use context.AfterFunc in handshakeContext
+ 2025-09-05 e8126bce9e runtime/cgo: save and restore R31 for crosscall1 on loong64
+ 2025-09-05 d767064170 cmd/compile: mark abi.PtrType.Elem sym as used
+ 2025-09-05 0b1eed09a3 vendor/golang.org/x/tools: update to a09a2fb
+ 2025-09-05 f5b20689e9 cmd/compile: optimize loads from readonly globals into constants on loong64
+ 2025-09-05 3492e4262b cmd/compile: simplify specific addition operations using the ADDV16 instruction
+ 2025-09-05 459b85ccaa cmd/fix: remove all functionality except for buildtag
+ 2025-09-05 87e72769fa runtime: simplify openbsd check in usesLibcall and mStackIsSystemAllocated
+ 2025-09-05 bb48272e24 cmd/compile: simplify zerorange on mips64
+ 2025-09-05 d52a56cce1 cmd/link/internal/ld: unconditionally use posix_fallocate on FreeBSD
+ 2025-09-04 9d0829963c net/http: fix cookie value of "" being interpreted as empty string.
+ 2025-09-04 ddce0522be cmd/internal/obj/loong64: add ADDU16I.D instruction support
+ 2025-09-04 00b8474e47 cmd/trace: don't filter events for profile by whether they have stack
+ 2025-09-04 e36c5aead6 log/slog: add multiple handlers support for logger
+ 2025-09-04 150fae714e crypto/x509: don't force system roots load in SetFallbackRoots
+ 2025-09-04 4f7bbc62c7 runtime, cmd/compile, cmd/internal/obj: remove duff support for loong64
+ 2025-09-04 b8cc907425 cmd/internal/obj/loong64: fix the usage of offset in the instructions [X]VLDREPL.{B/H/W/D}
+ 2025-09-04 8c27a80890 path{,/filepath}: speed up Match
+ 2025-09-04 b7c20413c5 runtime: remove obsolete osArchInit function
+ 2025-09-04 df29038486 cmd/compile/internal/ssa: load constant values from abi.PtrType.Elem
+ 2025-09-04 4373754bc9 cmd/compile: add store to load forwarding rules on riscv64
+ 2025-09-03 80038586ed cmd/compile: export to DWARF types only referenced through interfaces
+ 2025-09-03 91e76a513b cmd/compile: use generated loops instead of DUFFCOPY on loong64
+ 2025-09-03 c552ad913f cmd/compile: simplify memory load and store operations on loong64
+ 2025-09-03 e8f9127d1f net/netip: export Prefix.Compare, fix ordering
+ 2025-09-03 731e546166 cmd/compile: simplify the support for 32bit high multiply on loong64
Change-Id: I2c124fb8071e2972d39804867cafb6806e601aba
2025-09-09 | cmd/compile: introduce CCMP generation | Ch1n-ch1nless
Introduce a new aux type, "ARM64ConditionalParams", which contains a condition code, NZCV flags, and a constant together with an indicator of whether the constant is used, for CCMP instructions. Updates #71268 Change-Id: I322a6cb7077c9a2c4415893c5eb7ff7692d5a2de Reviewed-on: https://go-review.googlesource.com/c/go/+/698037 Reviewed-by: Mark Freeman <markfreeman@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org>
2025-08-11 | [dev.simd] cmd/compile, simd: jump table for imm ops | Junyang Shao
This CL fixes some errors in prog generation for imm operations; please see the changes in ssa.go for details. This CL also implements the jump table for non-const immediate arg. The current implementation exhausts 0-255; the bound-checked version will be in the next CL. This CL is partially generated by CL 694375. Change-Id: I75fe9900430b4fca5b39b0c0958a13b20b1104b7 Reviewed-on: https://go-review.googlesource.com/c/go/+/694395 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-07-24 | cmd/compile,runtime: remember idx+len for bounds check failure with less code | Keith Randall
Currently we must put the index and length into specific registers so we can call into the runtime to report a bounds check failure. So a typical bounds check call is something like:

    MOVD R3, R0
    MOVD R7, R1
    CALL runtime.panicIndex

or, if for instance the index is constant,

    MOVD $7, R0
    MOVD R9, R1
    CALL runtime.panicIndex

Sometimes the MOVD can be avoided, if the value happens to be in the right register already. But that's not terribly common, and doesn't work at all for constants. Let's get rid of those MOVD instructions. They pollute the instruction cache and are almost never executed. Instead, we'll encode in a PCDATA table where the runtime should find the index and length. The table encodes, for each index and length, whether it is a constant or in a register, and which register or constant it is. That way, we can avoid all those useless MOVDs. Instead, we can figure out the index and length at runtime. This makes the bounds panic path slower, but that's a good tradeoff. We can encode registers 0-15 and constants 0-31. Anything outside that range still needs to use an explicit instruction. This CL is the foundation; follow-on CLs will move each architecture to the new strategy.
Change-Id: I705c511e546e6aac59fed922a8eaed4585e96820 Reviewed-on: https://go-review.googlesource.com/c/go/+/682396 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
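One way to picture "register 0-15 or constant 0-31" is a small code per operand. The sketch below is an invented layout for illustration only: the message does not describe the actual PCDATA encoding, and the operand type and the encode/decode functions are hypothetical. It reserves codes 0-15 for registers and 16-47 for constants, and reports failure for anything outside those ranges, which would still need an explicit instruction.

```go
package main

import "fmt"

// operand says where the runtime would find an index or a length:
// either in a register or as a small constant. Hypothetical type.
type operand struct {
	isConst bool
	n       int // register number, or the constant's value
}

// encode packs an operand into one small code. Invented layout:
// codes 0-15 are registers 0-15, codes 16-47 are constants 0-31.
// ok is false when the operand cannot be encoded.
func encode(o operand) (code uint8, ok bool) {
	switch {
	case !o.isConst && o.n >= 0 && o.n <= 15:
		return uint8(o.n), true
	case o.isConst && o.n >= 0 && o.n <= 31:
		return uint8(16 + o.n), true
	}
	return 0, false
}

// decode reverses encode for codes produced by it.
func decode(code uint8) operand {
	if code < 16 {
		return operand{isConst: false, n: int(code)}
	}
	return operand{isConst: true, n: int(code) - 16}
}

func main() {
	for _, o := range []operand{
		{isConst: false, n: 3}, // index in register 3
		{isConst: true, n: 7},  // constant index 7
		{isConst: true, n: 32}, // too large to encode
	} {
		code, ok := encode(o)
		if !ok {
			fmt.Printf("%+v: needs an explicit instruction\n", o)
			continue
		}
		fmt.Printf("%+v -> code %d -> %+v\n", o, code, decode(code))
	}
}
```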
2024-04-09 | cmd/compile/internal: merge stack slots for selected local auto vars | Than McIntosh
[This is a partial roll-forward of CL 553055, the main change here is that the stack slot overlap operation is flagged off by default (can be enabled by hand with -gcflags=-d=mergelocals=1) ] Preliminary compiler support for merging/overlapping stack slots of local variables whose access patterns are disjoint. This patch includes changes in AllocFrame to do the actual merging/overlapping based on information returned from a new liveness.MergeLocals helper. The MergeLocals helper identifies candidates by looking for sets of AUTO variables that either A) have the same size and GC shape (if types contain pointers), or B) have the same size (but potentially different types as long as those types have no pointers). Variables must be greater than (3*types.PtrSize) in size to be considered for merging. After forming candidates, MergeLocals collects variables into "can be overlapped" equivalence classes or partitions; this process is driven by an additional liveness analysis pass. Ideally it would be nice to move the existing stackmap liveness pass up before AllocFrame and "widen" it to include merge candidates so that we can do just a single liveness as opposed to two passes, however this may be difficult given that the merge-locals liveness has to take into account writes corresponding to dead stores. This patch also required a change to the way ssa.OpVarDef pseudo-ops are generated; prior to this point they would only be created for variables whose type included pointers; if stack slot merging is enabled then the ssagen code creates OpVarDef ops for all auto vars that are merge candidates. Note that some temporaries created late in the compilation process (e.g. during ssa backend) are difficult to reason about, especially in cases where we take the address of a temp and pass it to the runtime. For the time being we mark most of the vars created post-ssagen as "not a merge candidate". 
Stack slot merging for locals/autos is enabled by default if "-N" is not in effect, and can be disabled via "-gcflags=-d=mergelocals=0". Fixmes/todos/restrictions: - try lowering size restrictions - re-evaluate the various skips that happen in SSA-created autotmps Updates #62737. Updates #65532. Updates #65495. Change-Id: Ifda26bc48cde5667de245c8a9671b3f0a30bb45d Reviewed-on: https://go-review.googlesource.com/c/go/+/575415 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
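The candidate criteria in the message (size above 3*PtrSize, and either the same size and GC shape for pointer-containing types or the same size for pointer-free types) can be written as a predicate. This is a sketch under assumptions: the local type, the gcShape string, and the 8-byte ptrSize are hypothetical, and the liveness-based check that access patterns are disjoint is not modeled at all.

```go
package main

import "fmt"

// ptrSize is assumed to be 8 (a 64-bit target) for this sketch.
const ptrSize = 8

// local is a toy description of an AUTO variable. Hypothetical type;
// gcShape stands in for whatever pointer-layout summary the compiler uses.
type local struct {
	name        string
	size        int64
	hasPointers bool
	gcShape     string
}

// bigEnough applies the size threshold from the message.
func bigEnough(v local) bool { return v.size > 3*ptrSize }

// typeCompatible applies criteria A and B: equal sizes, and for
// pointer-containing types also equal GC shapes. It says nothing about
// whether the variables' lifetimes actually overlap.
func typeCompatible(a, b local) bool {
	if !bigEnough(a) || !bigEnough(b) || a.size != b.size {
		return false
	}
	if a.hasPointers || b.hasPointers {
		return a.hasPointers && b.hasPointers && a.gcShape == b.gcShape
	}
	return true
}

func main() {
	buf1 := local{name: "buf1", size: 64}
	buf2 := local{name: "buf2", size: 64}
	node := local{name: "node", size: 64, hasPointers: true, gcShape: "P......."}
	small := local{name: "small", size: 16}

	fmt.Println(typeCompatible(buf1, buf2))   // pointer-free, same size
	fmt.Println(typeCompatible(buf1, node))   // pointer-free vs pointer-ful
	fmt.Println(typeCompatible(small, small)) // below the size threshold
}
```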
2024-03-30 | Revert "cmd/compile/internal: merge stack slots for selected local auto vars" | Cuong Manh Le
This reverts CL 553055. Reason for revert: causes crypto/ecdsa failures on linux ppc64/s390x builders Change-Id: I9266b030693a5b6b1e667a009de89d613755b048 Reviewed-on: https://go-review.googlesource.com/c/go/+/575236 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Than McIntosh <thanm@google.com> Auto-Submit: Than McIntosh <thanm@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-03-29 | cmd/compile/internal: merge stack slots for selected local auto vars | Than McIntosh
Preliminary compiler support for merging/overlapping stack slots of local variables whose access patterns are disjoint. This patch includes changes in AllocFrame to do the actual merging/overlapping based on information returned from a new liveness.MergeLocals helper. The MergeLocals helper identifies candidates by looking for sets of AUTO variables that either A) have the same size and GC shape (if types contain pointers), or B) have the same size (but potentially different types as long as those types have no pointers). Variables must be greater than (3*types.PtrSize) in size to be considered for merging. After forming candidates, MergeLocals collects variables into "can be overlapped" equivalence classes or partitions; this process is driven by an additional liveness analysis pass. Ideally it would be nice to move the existing stackmap liveness pass up before AllocFrame and "widen" it to include merge candidates so that we can do just a single liveness as opposed to two passes, however this may be difficult given that the merge-locals liveness has to take into account writes corresponding to dead stores. This patch also required a change to the way ssa.OpVarDef pseudo-ops are generated; prior to this point they would only be created for variables whose type included pointers; if stack slot merging is enabled then the ssagen code creates OpVarDef ops for all auto vars that are merge candidates. Note that some temporaries created late in the compilation process (e.g. during ssa backend) are difficult to reason about, especially in cases where we take the address of a temp and pass it to the runtime. For the time being we mark most of the vars created post-ssagen as "not a merge candidate". Stack slot merging for locals/autos is enabled by default if "-N" is not in effect, and can be disabled via "-gcflags=-d=mergelocals=0". Fixmes/todos/restrictions: - try lowering size restrictions - re-evaluate the various skips that happen in SSA-created autotmps Fixes #62737. 
Updates #65532. Updates #65495. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest Change-Id: Ibc22e8a76c87e47bc9fafe4959804d9ea923623d Reviewed-on: https://go-review.googlesource.com/c/go/+/553055 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-31 | cmd/compile: ensure pointer arithmetic happens after the nil check | Keith Randall
Have nil checks return a pointer that is known non-nil. Users of that pointer can use the result, ensuring that they are ordered after the nil check itself. The order dependence goes away after scheduling, when we've fixed an order. At that point we move uses back to the original pointer so it doesn't change regalloc any. This prevents pointer arithmetic on nil from being spilled to the stack and then observed by a stack scan. Fixes #63657 Change-Id: I1a5fa4f2e6d9000d672792b4f90dfc1b7b67f6ea Reviewed-on: https://go-review.googlesource.com/c/go/+/537775 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com>
2022-08-22 | cmd/compile: issue VarDef only for pointer-ful types | Keith Randall
Use OpVarDef only when the variable being defined has pointers in it. VarDef markers are only used for liveness analysis, and that only runs on pointer-ful variables. Fixes #53810 Change-Id: I09b0ef7ed31e72528916fe79325f80bbe69ff9b4 Reviewed-on: https://go-review.googlesource.com/c/go/+/419320 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Joedian Reid <joedian@golang.org> Run-TryBot: Keith Randall <khr@golang.org>
2022-04-14 | cmd/compile: implement jump tables | Keith Randall
Performance is kind of hard to exactly quantify. One big difference between jump tables and the old binary search scheme is that there's only 1 branch statement instead of O(n) of them. That can be both a blessing and a curse, and can make evaluating jump tables very hard to do. The single branch can become a choke point for the hardware branch predictor. A branch table jump must fit all of its state in a single branch predictor entry (technically, a branch target predictor entry). With binary search that predictor state can be spread among lots of entries. In cases where the case selection is repetitive and thus predictable, binary search can perform better. The big win for a jump table is that it doesn't consume so much of the branch predictor's resources. But that benefit is essentially never observed in microbenchmarks, because the branch predictor can easily keep state for all the binary search branches in a microbenchmark. So that benefit is really hard to measure. So predictable switch microbenchmarks are ~useless - they will almost always favor the binary search scheme. Fully unpredictable switch microbenchmarks are better, as they aren't lying to us quite so much. In a perfectly unpredictable situation, a jump table will expect to incur 1-1/N branch mispredicts, where a binary search would incur lg(N)/2 of them. That makes the crossover point at about N=4. But of course switches in real programs are seldom fully unpredictable, so we'll use a higher crossover point. Beyond the branch predictor, jump tables tend to execute more instructions per switch but have no additional instructions per case, which also argues for a larger crossover. As far as code size goes, with this CL cmd/go has a slightly smaller code segment and a slightly larger overall size (from the jump tables themselves which live in the data segment). This is a case where some FDO (feedback-directed optimization) would be really nice to have. 
#28262 Some large-program benchmarks might help make the case for this CL. Especially if we can turn on branch mispredict counters so we can see how much using jump tables can free up branch prediction resources that can be gainfully used elsewhere in the program.
name                   old time/op  new time/op  delta
Switch8Predictable     1.89ns ± 2%  1.27ns ± 3%  -32.58%  (p=0.000 n=9+10)
Switch8Unpredictable   9.33ns ± 1%  7.50ns ± 1%  -19.60%  (p=0.000 n=10+9)
Switch32Predictable    2.20ns ± 2%  1.64ns ± 1%  -25.39%  (p=0.000 n=10+9)
Switch32Unpredictable  10.0ns ± 2%  7.6ns ± 2%   -24.04%  (p=0.000 n=10+10)
Fixes #5496 Update #34381 Change-Id: I3ff56011d02be53f605ca5fd3fb96b905517c34f Reviewed-on: https://go-review.googlesource.com/c/go/+/357330 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com>
2021-09-17 | cmd/compile: restore tail call for method wrappers | Cherry Mui
For certain types of method wrappers we used to generate a tail call. That was disabled in CL 307234 when register ABI is used, because with the current IR it was difficult to generate a tail call with the arguments in the right places. The problem was that the IR does not contain a CALL-like node with arguments; instead, it contains an OAS node that adjusts the receiver, then an OTAILCALL node that just contains the target, but no argument (with the assumption that the OAS node will put the adjusted receiver in the right place). With register ABI, putting arguments in registers is done in SSA. The assignment (OAS) doesn't put the receiver in a register. This CL changes the IR of a tail call to take an actual OCALL node. Specifically, a tail call is represented as OTAILCALL (OCALL target args...) This way, the call target and args are connected through the OCALL node. So the call can be analyzed in SSA and the args can be passed in the right places. (Alternatively, we could have the OTAILCALL node directly take the target and the args, without the OCALL node. Using an OCALL node is convenient as there is existing code that processes OCALL nodes which does not need to be changed. Also, a tail call is similar to ORETURN (OCALL target args...), except it doesn't preserve the frame. I did the former but I'm open to change.) The SSA representation is similar. Previously, the IR lowered to a Store of the receiver and then a BlockRetJmp which jumps to the target (without putting the arg in a register). Now we use a TailCall op, which takes the target and the args. The call expansion pass and the register allocator handle TailCall pretty much like a StaticCall, and they will do the right ABI analysis and put the args in the right places. (Args other than the receiver are already in the right places. For register args it generates no code for them. For stack args currently it generates a self copy. I'll work on optimizing that out.) BlockRetJmp is still used, signaling it is a tail call.
The actual call is made in the TailCall op so BlockRetJmp generates no code (we could use BlockExit if we like). This slightly reduces binary size:
          old       new
cmd/go    14003088  13953936
cmd/link  6275552   6271456
Change-Id: I2d16d8d419fe1f17554916d317427383e17e27f0 Reviewed-on: https://go-review.googlesource.com/c/go/+/350145 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: David Chase <drchase@google.com>
2021-03-03 | cmd/compile: make modified Aux type for OpArgXXXX pass ssa/check | David Chase
For #40724. Change-Id: I7d1e76139d187cd15a6e0df9d19542b7200589f6 Reviewed-on: https://go-review.googlesource.com/c/go/+/297911 Trust: David Chase <drchase@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-12-21 | [dev.regabi] all: merge master into dev.regabi | Matthew Dempsky
The list of conflicted files for this merge is:
src/cmd/compile/internal/gc/inl.go
src/cmd/compile/internal/gc/order.go
src/cmd/compile/internal/gc/ssa.go
test/fixedbugs/issue20415.go
test/fixedbugs/issue22822.go
test/fixedbugs/issue28079b.go
inl.go was updated for changes on dev.regabi: namely that OSELRECV has been removed, and that OSELRECV2 now only uses List, rather than both Left and List. order.go was updated because IsAutoTmp is now a standalone function, rather than a method on Node. ssa.go was similarly updated for new APIs involving package ir. The tests are all merging upstream additions for gccgo error messages with changes to cmd/compile's error messages on the dev.regabi branch. Change-Id: Icaaf186d69da791b5994dbb6688ec989caabec42
2020-12-14 | cmd/compile: fix incorrect shift count type with s390x rules | Ruixin Bao
The type of the shift count must be an unsigned integer. Some s390x rules for shift use an auxint of type int8. This results in a compilation failure on s390x with an invalid operation when running make.bash using older versions of go (e.g., go1.10.4). This CL adds an auxint type of uint8 and changes the ops for shift and rotate to use auxint with type uint8. The related rules are also modified to address this change. Fixes #43090 Change-Id: I594274b6e3d9b23092fc9e9f4b354870164f2f19 Reviewed-on: https://go-review.googlesource.com/c/go/+/277078 Reviewed-by: Keith Randall <khr@golang.org> Trust: Dmitri Shuralyov <dmitshur@golang.org>
2020-12-08 | [dev.regabi] cmd/compile: add ssa.Aux tag interface for Value.Aux | Matthew Dempsky
It's currently hard to automate refactorings around the Value.Aux field, because we don't have any static typing information for it. Adding a tag interface will make subsequent CLs easier and safer. Passes buildall w/ toolstash -cmp. Updates #42982. Change-Id: I41ae8e411a66bda3195a0957b60c2fe8a8002893 Reviewed-on: https://go-review.googlesource.com/c/go/+/275756 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Trust: Matthew Dempsky <mdempsky@google.com>
2020-09-18 | cmd/compile: add type check for ssa genericOps | surechen
Change-Id: I2233a6a157ec8feffaefd6a8ee65b1c38778c1cd Reviewed-on: https://go-review.googlesource.com/c/go/+/255238 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Trust: Giovanni Bajo <rasky@develer.com>
2020-09-16 | cmd/compile: introduce special ssa Aux type for calls | David Chase
This is a prerequisite to moving call expansion later into SSA, and probably a good idea anyway. Passes tests. This is the first minimal CL that does a 1-for-1 substitution of *ssa.AuxCall for *obj.LSym. Next step (next CL) is to make this change for all calls so that additional information can be stored in AuxCall. Change-Id: Ia3a7715648fd9fb1a176850767a726e6f5b959eb Reviewed-on: https://go-review.googlesource.com/c/go/+/237680 Trust: David Chase <drchase@google.com> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-09-03 | cmd/compile: store the comparison pseudo-ops of arm64 conditional instructions in AuxInt | fanzha02
The current implementation stores the comparison pseudo-ops of arm64 conditional instructions (CSEL/CSEL0) in Aux; this patch stores them in AuxInt instead, which avoids the allocation. Change-Id: I0b69e51f63acd84c6878c6a59ccf6417501a8cfc Reviewed-on: https://go-review.googlesource.com/c/go/+/252517 Run-TryBot: fannie zhang <Fannie.Zhang@arm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-06-18 | cmd/compile: redo flag constant ops for arm | Keith Randall
Encode the flag results in an auxint field instead of having one opcode per flag state. This helps us handle the new *noov branches in a unified manner. This is only for arm; arm64 is in a subsequent CL. We could extend to other architectures as well, although it would only be cleanup, no behavioral change. Update #39505 Change-Id: Ia46cea596faad540d1496c5915ab1274571543f0 Reviewed-on: https://go-review.googlesource.com/c/go/+/238077 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-17 | cmd/compile: make some s390x rules use strongly typed aux values | Michael Munday
This first pass makes the rules using the condition code mask (CCMask) and rotate parameters (RotateParams) aux values strongly typed. This required adding strongly typed aux handling to the block rulegen. More CLs like this to follow, but this is probably the most complex. Passes toolstash-check -all. Change-Id: Ie513b07d527f0c1b398d7748331442dcb5f7b17d Reviewed-on: https://go-review.googlesource.com/c/go/+/228518 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-04-09 | cmd/compile: start implementing strongly typed aux and auxint fields | Keith Randall
Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
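The "autoconverting to and from the generic types" idea can be sketched with a pair of small conversion helpers around an int64 field. Everything here is illustrative: the value struct, the helper names, and the foldAdd function are assumptions for the example and are not taken from the rule generator's output.

```go
package main

import "fmt"

// value is a toy stand-in for an SSA value; only the generic int64
// AuxInt storage described in the message is modeled.
type value struct {
	AuxInt int64
}

// Conversion helpers between the generic storage and the type an op is
// known to use. Names are chosen for this sketch.
func auxIntToInt32(i int64) int32 { return int32(i) }
func int32ToAuxInt(i int32) int64 { return int64(i) }

// foldAdd shows what a strongly typed rule body would do: read the field
// as its real type, compute in that type, and store it back. The int32
// arithmetic wraps at 32 bits, which an untyped int64 add would not.
func foldAdd(v *value, c int32) {
	off := auxIntToInt32(v.AuxInt)
	v.AuxInt = int32ToAuxInt(off + c)
}

func main() {
	v := &value{AuxInt: 40}
	foldAdd(v, 2)
	fmt.Println(v.AuxInt)

	// A negative int32 is stored sign-extended in the int64 field.
	fmt.Println(int32ToAuxInt(-1))
}
```

Keeping the conversions in one place is what lets the rules themselves stay free of casts.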
2020-03-04 | cmd/compile: don't allow NaNs in floating-point constant ops | Keith Randall
Trying this CL again, with a fixed test that allows platforms to disagree on the exact behavior of converting NaNs. We store 32-bit floating point constants in a 64-bit field, by converting that 32-bit float to 64-bit float to store it, and convert it back to use it. That works for *almost* all floating-point constants. The exception is signaling NaNs. The round trip described above means we can't represent a 32-bit signaling NaN, because conversions strip the signaling bit. To fix this issue, just forbid NaNs as floating-point constants in SSA form. This shouldn't affect any real-world code, as people seldom constant-propagate NaNs (except in test code). Additionally, NaNs are somewhat underspecified (which of the many NaNs do you get when dividing 0/0?), so when cross-compiling there's a danger of using the compiler machine's NaN regime for some math, and the target machine's NaN regime for other math. Better to use the target machine's NaN regime always. Update #36400 Change-Id: Idf203b688a15abceabbd66ba290d4e9f63619ecb Reviewed-on: https://go-review.googlesource.com/c/go/+/221790 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
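The float32 to float64 to float32 round trip described above can be tried directly. This is a sketch under assumptions: 0x7fa00000 is treated as a signaling NaN using the common convention that a clear top mantissa bit means "signaling" (some older MIPS targets use the opposite convention), the helper names are invented, and the input is kept in a package-level variable so the conversions are more likely to happen at run time. The result is printed rather than asserted, since the message itself notes that platforms may disagree on the exact behavior.

```go
package main

import (
	"fmt"
	"math"
)

const (
	expMask32  uint32 = 0x7f800000 // exponent bits of a float32
	fracMask32 uint32 = 0x007fffff // mantissa bits of a float32
	quietBit32 uint32 = 0x00400000 // top mantissa bit
	sNaN32     uint32 = 0x7fa00000 // a signaling NaN pattern (see assumptions above)
)

// isNaN32 reports whether b is the bit pattern of a float32 NaN.
func isNaN32(b uint32) bool {
	return b&expMask32 == expMask32 && b&fracMask32 != 0
}

// isSignaling32 reports whether b is a NaN with the quiet bit clear.
func isSignaling32(b uint32) bool {
	return isNaN32(b) && b&quietBit32 == 0
}

// roundTrip widens the float32 with bits b to float64 and narrows it
// back, mirroring how a 32-bit constant is stored in a 64-bit field.
func roundTrip(b uint32) uint32 {
	f64 := float64(math.Float32frombits(b))
	return math.Float32bits(float32(f64))
}

// input is a variable rather than a constant on purpose.
var input = sNaN32

func main() {
	out := roundTrip(input)
	fmt.Printf("in  %#x signaling=%v\n", input, isSignaling32(input))
	fmt.Printf("out %#x signaling=%v\n", out, isSignaling32(out))
}
```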
2020-02-28cmd/compile: add dedicated ARM64BitField aux typeJosh Bleecher Snyder
The goal here is improved AuxInt printing in ssa.html. Instead of displaying an inscrutable encoded integer, it displays something like v25 (28) = UBFX <int> [lsb=4,width=8] v52 which is much nicer for debugging. Change-Id: I40713ff7f4a857c4557486cdf73c2dff137511ca Reviewed-on: https://go-review.googlesource.com/c/go/+/221420 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
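A dedicated aux type makes this kind of printing possible because the type can carry its own String method. The sketch below assumes a packed layout (width in the low 8 bits, lsb in the next 8); the type name bitField, the helper makeBitField and the layout itself are illustrative assumptions and may not match what the compiler uses.

```go
package main

import "fmt"

// bitField is a hypothetical packed encoding of a bit-field operand:
// the width lives in the low 8 bits and the lsb in the next 8 bits.
type bitField int16

// makeBitField packs lsb and width, rejecting fields that do not fit
// in a 64-bit register.
func makeBitField(lsb, width int) bitField {
	if lsb < 0 || lsb > 63 || width < 1 || lsb+width > 64 {
		panic("bit field out of range")
	}
	return bitField(width | lsb<<8)
}

func (bf bitField) lsb() int   { return int(uint16(bf) >> 8) }
func (bf bitField) width() int { return int(bf) & 0xff }

// String renders the decoded fields instead of the raw integer.
func (bf bitField) String() string {
	return fmt.Sprintf("lsb=%d,width=%d", bf.lsb(), bf.width())
}

func main() {
	bf := makeBitField(4, 8)
	fmt.Printf("raw=%d decoded=[%v]\n", int(bf), bf)
}
```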
2020-02-25Revert "cmd/compile: don't allow NaNs in floating-point constant ops"Bryan C. Mills
This reverts CL 213477. Reason for revert: tests are failing on linux-mips*-rtrk builders. Change-Id: I8168f7450890233f1bd7e53930b73693c26d4dc0 Reviewed-on: https://go-review.googlesource.com/c/go/+/220897 Run-TryBot: Bryan C. Mills <bcmills@google.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-02-25cmd/compile: don't allow NaNs in floating-point constant opsKeith Randall
We store 32-bit floating point constants in a 64-bit field, by converting that 32-bit float to 64-bit float to store it, and convert it back to use it. That works for *almost* all floating-point constants. The exception is signaling NaNs. The round trip described above means we can't represent a 32-bit signaling NaN, because conversions strip the signaling bit. To fix this issue, just forbid NaNs as floating-point constants in SSA form. This shouldn't affect any real-world code, as people seldom constant-propagate NaNs (except in test code). Additionally, NaNs are somewhat underspecified (which of the many NaNs do you get when dividing 0/0?), so when cross-compiling there's a danger of using the compiler machine's NaN regime for some math, and the target machine's NaN regime for other math. Better to use the target machine's NaN regime always. This has been a bug since 1.10, and there's an easy workaround (declare a global variable containing the signaling NaN pattern, and use that as the argument to math.Float32frombits) so we'll fix it in 1.15. Fixes #36400 Update #36399 Change-Id: Icf155e743281560eda2eed953d19a829552ccfda Reviewed-on: https://go-review.googlesource.com/c/go/+/213477 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2019-11-05cmd/compile: fix liveness for open-coded defer args for infinite loopsDan Scales
Once defined, a stack slot holding an open-coded defer arg should always be marked live, since it may be used at any time if there is a panic. These stack slots are typically kept live naturally by the open-defer code inlined at each return/exit point. However, we need to do extra work to make sure that they are kept live if a function has an infinite loop or a panic exit. For this fix, only in the case of a function that is using open-coded defers, we compute the set of blocks (most often empty) that cannot reach a return or a BlockExit (panic) because of an infinite loop. Then, for each block b which cannot reach a return or BlockExit or is a BlockExit block, we mark each defer arg slot as live, as long as the definition of the defer arg slot dominates block b. For this change, had to export (*Func).sdom (-> Sdom) and SparseTree.isAncestorEq (-> IsAncestorEq) Updates #35277 Change-Id: I7b53c9bd38ba384a3794386dd0eb94e4cbde4eb1 Reviewed-on: https://go-review.googlesource.com/c/go/+/204802 Run-TryBot: Dan Scales <danscales@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
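The set of blocks that cannot reach a return or exit can be found with a backward reachability walk from the exit blocks. The following is a minimal sketch on a toy CFG, not the compiler's implementation; the block type, cannotReachExit and exampleCFG are hypothetical names introduced here.

```go
package main

import "fmt"

// block is a minimal stand-in for an SSA basic block.
type block struct {
	id    int
	exit  bool // true for return blocks and panic-exit blocks
	succs []*block
}

// cannotReachExit returns the IDs of blocks from which no exit block is
// reachable, for example blocks trapped in an infinite loop.
func cannotReachExit(blocks []*block) []int {
	// Build predecessor lists so we can walk the CFG backward.
	preds := make(map[*block][]*block)
	for _, b := range blocks {
		for _, s := range b.succs {
			preds[s] = append(preds[s], b)
		}
	}
	// Seed the worklist with the exits and propagate to predecessors.
	reach := make(map[*block]bool)
	var work []*block
	for _, b := range blocks {
		if b.exit {
			reach[b] = true
			work = append(work, b)
		}
	}
	for len(work) > 0 {
		b := work[len(work)-1]
		work = work[:len(work)-1]
		for _, p := range preds[b] {
			if !reach[p] {
				reach[p] = true
				work = append(work, p)
			}
		}
	}
	var out []int
	for _, b := range blocks {
		if !reach[b] {
			out = append(out, b.id)
		}
	}
	return out
}

// exampleCFG builds: b0 -> b1 (return), b0 -> b2, b2 -> b3, b3 -> b2.
func exampleCFG() []*block {
	b0, b1 := &block{id: 0}, &block{id: 1, exit: true}
	b2, b3 := &block{id: 2}, &block{id: 3}
	b0.succs = []*block{b1, b2}
	b2.succs = []*block{b3}
	b3.succs = []*block{b2}
	return []*block{b0, b1, b2, b3}
}

func main() {
	fmt.Println(cannotReachExit(exampleCFG()))
}
```

In this toy graph, blocks 2 and 3 form the infinite loop, so they are the ones where defer arg slots would need to be marked live explicitly.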
2019-10-02cmd/compile: allow multiple SSA block control valuesMichael Munday
Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench:

name                      old time/op       new time/op       delta
Template                  208ms ± 1%        209ms ± 1%        ~ (p=0.289 n=20+20)
Unicode                   83.7ms ± 1%       83.3ms ± 3%       -0.49% (p=0.017 n=18+18)
GoTypes                   748ms ± 1%        748ms ± 0%        ~ (p=0.460 n=20+18)
Compiler                  3.47s ± 1%        3.48s ± 1%        ~ (p=0.070 n=19+18)
SSA                       11.5s ± 1%        11.7s ± 1%        +1.64% (p=0.000 n=19+18)
Flate                     130ms ± 1%        130ms ± 1%        ~ (p=0.588 n=19+20)
GoParser                  160ms ± 1%        161ms ± 1%        ~ (p=0.211 n=20+20)
Reflect                   465ms ± 1%        467ms ± 1%        +0.42% (p=0.007 n=20+20)
Tar                       184ms ± 1%        185ms ± 2%        ~ (p=0.087 n=18+20)
XML                       253ms ± 1%        253ms ± 1%        ~ (p=0.377 n=20+18)
LinkCompiler              769ms ± 2%        774ms ± 2%        ~ (p=0.070 n=19+19)
ExternalLinkCompiler      3.59s ±11%        3.68s ± 6%        ~ (p=0.072 n=20+20)
LinkWithoutDebugCompiler  446ms ± 5%        454ms ± 3%        +1.79% (p=0.002 n=19+20)
StdCmd                    26.0s ± 2%        26.0s ± 2%        ~ (p=0.799 n=20+20)

name                      old user-time/op  new user-time/op  delta
Template                  238ms ± 5%        240ms ± 5%        ~ (p=0.142 n=20+20)
Unicode                   105ms ±11%        106ms ±10%        ~ (p=0.512 n=20+20)
GoTypes                   876ms ± 2%        873ms ± 4%        ~ (p=0.647 n=20+19)
Compiler                  4.17s ± 2%        4.19s ± 1%        ~ (p=0.093 n=20+18)
SSA                       13.9s ± 1%        14.1s ± 1%        +1.45% (p=0.000 n=18+18)
Flate                     145ms ±13%        146ms ± 5%        ~ (p=0.851 n=20+18)
GoParser                  185ms ± 5%        188ms ± 7%        ~ (p=0.174 n=20+20)
Reflect                   534ms ± 3%        538ms ± 2%        ~ (p=0.105 n=20+18)
Tar                       215ms ± 4%        211ms ± 9%        ~ (p=0.079 n=19+20)
XML                       295ms ± 6%        295ms ± 5%        ~ (p=0.968 n=20+20)
LinkCompiler              832ms ± 4%        837ms ± 7%        ~ (p=0.707 n=17+20)
ExternalLinkCompiler      1.58s ± 8%        1.60s ± 4%        ~ (p=0.296 n=20+19)
LinkWithoutDebugCompiler  478ms ±12%        489ms ±10%        ~ (p=0.429 n=20+20)

name                      old object-bytes  new object-bytes  delta
Template                  559kB ± 0%        559kB ± 0%        ~ (all equal)
Unicode                   216kB ± 0%        216kB ± 0%        ~ (all equal)
GoTypes                   2.03MB ± 0%       2.03MB ± 0%       ~ (all equal)
Compiler                  8.07MB ± 0%       8.07MB ± 0%       -0.06% (p=0.000 n=20+20)
SSA                       27.1MB ± 0%       27.3MB ± 0%       +0.89% (p=0.000 n=20+20)
Flate                     343kB ± 0%        343kB ± 0%        ~ (all equal)
GoParser                  441kB ± 0%        441kB ± 0%        ~ (all equal)
Reflect                   1.36MB ± 0%       1.36MB ± 0%       ~ (all equal)
Tar                       487kB ± 0%        487kB ± 0%        ~ (all equal)
XML                       632kB ± 0%        632kB ± 0%        ~ (all equal)

name                      old export-bytes  new export-bytes  delta
Template                  18.5kB ± 0%       18.5kB ± 0%       ~ (all equal)
Unicode                   7.92kB ± 0%       7.92kB ± 0%       ~ (all equal)
GoTypes                   35.0kB ± 0%       35.0kB ± 0%       ~ (all equal)
Compiler                  109kB ± 0%        110kB ± 0%        +0.72% (p=0.000 n=20+20)
SSA                       137kB ± 0%        138kB ± 0%        +0.58% (p=0.000 n=20+20)
Flate                     4.89kB ± 0%       4.89kB ± 0%       ~ (all equal)
GoParser                  8.49kB ± 0%       8.49kB ± 0%       ~ (all equal)
Reflect                   11.4kB ± 0%       11.4kB ± 0%       ~ (all equal)
Tar                       10.5kB ± 0%       10.5kB ± 0%       ~ (all equal)
XML                       16.7kB ± 0%       16.7kB ± 0%       ~ (all equal)

name                      old text-bytes    new text-bytes    delta
HelloSize                 761kB ± 0%        761kB ± 0%        ~ (all equal)
CmdGoSize                 10.8MB ± 0%       10.8MB ± 0%       ~ (all equal)

name                      old data-bytes    new data-bytes    delta
HelloSize                 10.7kB ± 0%       10.7kB ± 0%       ~ (all equal)
CmdGoSize                 312kB ± 0%        312kB ± 0%        ~ (all equal)

name                      old bss-bytes     new bss-bytes     delta
HelloSize                 122kB ± 0%        122kB ± 0%        ~ (all equal)
CmdGoSize                 146kB ± 0%        146kB ± 0%        ~ (all equal)

name                      old exe-bytes     new exe-bytes     delta
HelloSize                 1.13MB ± 0%       1.13MB ± 0%       ~ (all equal)
CmdGoSize                 15.1MB ± 0%       15.1MB ± 0%       ~ (all equal)

Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
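A block that carries up to two control values can be modeled with a fixed-size array and a small consistency check. The sketch below is an assumption-laden illustration: the value and block types, addControl, numControls and checkControls are hypothetical names, and the compiler's real data structures are richer.

```go
package main

import "fmt"

// value is a stand-in for an SSA value.
type value struct{ name string }

// block models a basic block with up to two control values. Unused
// slots are nil, and slot 0 is always filled before slot 1.
type block struct {
	kind     string
	controls [2]*value
}

// numControls counts the control values in use.
func (b *block) numControls() int {
	switch {
	case b.controls[0] == nil:
		return 0
	case b.controls[1] == nil:
		return 1
	}
	return 2
}

// addControl appends a control value. Adding a third one panics with an
// index-out-of-range error, since only two slots exist.
func (b *block) addControl(v *value) {
	b.controls[b.numControls()] = v
}

// checkControls rejects a block whose second slot is used while the
// first is empty, the kind of invariant a sanity checker would enforce.
func checkControls(b *block) error {
	if b.controls[0] == nil && b.controls[1] != nil {
		return fmt.Errorf("block %s has a gap in its control values", b.kind)
	}
	return nil
}

func main() {
	b := &block{kind: "CompareAndBranch"}
	b.addControl(&value{name: "x"})
	b.addControl(&value{name: "y"})
	fmt.Println(b.numControls(), checkControls(b))
}
```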
2019-09-26cmd/compile: use numeric condition code masks on s390xMichael Munday
Prior to this CL conditional branches on s390x always used an extended mnemonic such as BNE, BLT and so on to represent branch instructions with different condition code masks. This CL adds support for numeric condition code masks to the s390x SSA backend so that we can encode the condition under which a Block's successor is chosen as a field in that Block rather than in its type. This change will be useful as we come to add support for combined compare-and-branch instructions. Rather than trying to add extended mnemonics for every possible combination of mask and compare-and- branch instruction we can instead use a single mnemonic for each instruction. Change-Id: Idb7458f187b50906877d683695c291dff5279553 Reviewed-on: https://go-review.googlesource.com/c/go/+/197178 Reviewed-by: Keith Randall <khr@golang.org>
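To illustrate why a numeric mask scales better than one mnemonic per condition, here is a small sketch of a 4-bit condition code mask. The type name ccMask, the constant names and the bit assignment (one bit per condition code 0 to 3, most significant bit first) are assumptions made for this example rather than a copy of the compiler's definitions.

```go
package main

import "fmt"

// ccMask is a hypothetical 4-bit mask with one bit per condition code.
type ccMask uint8

const (
	unordered ccMask = 1 << iota // condition code 3
	greater                      // condition code 2
	less                         // condition code 1
	equal                        // condition code 0
	always    = equal | less | greater | unordered
)

// inverse returns the mask that selects exactly the other condition codes.
func (m ccMask) inverse() ccMask { return m ^ always }

// taken reports whether a branch with mask m is taken for condition code cc.
func (m ccMask) taken(cc uint) bool {
	return cc < 4 && m&(equal>>cc) != 0
}

func main() {
	lessOrEqual := less | equal
	fmt.Printf("mask=%04b inverse=%04b\n", lessOrEqual, lessOrEqual.inverse())
	for cc := uint(0); cc < 4; cc++ {
		fmt.Printf("cc=%d taken=%v\n", cc, lessOrEqual.taken(cc))
	}
}
```

With this representation, any combination of conditions is just a different field value on the same branch instruction, so no new mnemonic is needed per combination.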
2019-08-28cmd/compile: remove auxSymInt32Keith Randall
We never used it, might as well get rid of it. Change-Id: I5c23c93e90173bff9ac1fc1b8ae1e2025215d6eb Reviewed-on: https://go-review.googlesource.com/c/go/+/191938 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-05cmd/compile: fix store-to-load forwarding of 32-bit sNaNsMichael Munday
Signalling NaNs were being converted to quiet NaNs during constant propagation through integer <-> float store-to-load forwarding. This occurs because we store float32 constants as float64 values and CPU hardware 'quietens' NaNs during conversion between the two. Eventually we want to move to using float32 values to store float32 constants, however this will be a big change since both the compiler and the assembler expect float64 values. So for now this is a small change that will fix the immediate issue. Fixes #27193. Change-Id: Iac54bd8c13abe26f9396712bc71f9b396f842724 Reviewed-on: https://go-review.googlesource.com/132956 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-07-12cmd/compile: add LocalAddr that takes SP,mem operandsDavid Chase
Lack of a well-defined order between VarDef and related address operations sometimes causes problems with store order and write barrier transformations; glitches in the order are made irreparable (by later optimizations) if the two parts of the glitch straddle a split in the original block caused by insertion of a write barrier diamond. Fix this by creating a LocalAddr for addresses of locals (what VarDef matters for) that takes a memory input to help make the order explicit. Addr is modified to only be legal for SB operand, so there is no overlap between Addr and LocalAddr uses (there may be some downstream cleanup from this). Changes to generic.rules and rewrite.go ensure that codegen tests continue to pass; CSE of LocalAddr is impaired, not quite sure of the cost. Fixes #26105. Change-Id: Id4192b4440aa4e9d7ba54a465c456df9b530b515 Reviewed-on: https://go-review.googlesource.com/122483 Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2018-04-24cmd/compile/internal/ssa: add Op{SP,SB} type checks to check.goisharipo
gc/ssa.go initializes SP and SB values with TUINTPTR type. Assign the same type in SSA tests and modify check.go to catch mismatching types for those ops. This makes SSA tests more consistent. Change-Id: I798440d57d00fb949d1a0cd796759c9b82a934bd Reviewed-on: https://go-review.googlesource.com/106658 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-02-20cmd/compile/internal/ssa: emit csel on arm64philhofer
Introduce a new SSA pass to generate CondSelect instructions, and add CondSelect lowering rules for arm64. In order to make the CSEL instruction easier to optimize, and to simplify the introduction of CSNEG, CSINC, and CSINV in the future, modify the CSEL instruction to accept a condition code in the aux field. Notably, this change makes the go1 Gzip benchmark more than 10% faster. Benchmarks on a Cavium ThunderX:

name                      old time/op  new time/op  delta
BinaryTree17-96           15.9s ± 6%   16.0s ± 4%   ~ (p=0.968 n=10+9)
Fannkuch11-96             7.17s ± 0%   7.00s ± 0%   -2.43% (p=0.000 n=8+9)
FmtFprintfEmpty-96        208ns ± 1%   207ns ± 0%   ~ (p=0.152 n=10+8)
FmtFprintfString-96       379ns ± 0%   375ns ± 0%   -0.95% (p=0.000 n=10+9)
FmtFprintfInt-96          385ns ± 0%   383ns ± 0%   -0.52% (p=0.000 n=9+10)
FmtFprintfIntInt-96       591ns ± 0%   586ns ± 0%   -0.85% (p=0.006 n=7+9)
FmtFprintfPrefixedInt-96  656ns ± 0%   667ns ± 0%   +1.71% (p=0.000 n=10+10)
FmtFprintfFloat-96        967ns ± 0%   984ns ± 0%   +1.78% (p=0.000 n=10+10)
FmtManyArgs-96            2.35µs ± 0%  2.25µs ± 0%  -4.63% (p=0.000 n=9+8)
GobDecode-96              31.0ms ± 0%  30.8ms ± 0%  -0.36% (p=0.006 n=9+9)
GobEncode-96              24.4ms ± 0%  24.5ms ± 0%  +0.30% (p=0.000 n=9+9)
Gzip-96                   1.60s ± 0%   1.43s ± 0%   -10.58% (p=0.000 n=9+10)
Gunzip-96                 167ms ± 0%   169ms ± 0%   +0.83% (p=0.000 n=8+9)
HTTPClientServer-96       311µs ± 1%   308µs ± 0%   -0.75% (p=0.000 n=10+10)
JSONEncode-96             65.0ms ± 0%  64.8ms ± 0%  -0.25% (p=0.000 n=9+8)
JSONDecode-96             262ms ± 1%   261ms ± 1%   ~ (p=0.579 n=10+10)
Mandelbrot200-96          18.0ms ± 0%  18.1ms ± 0%  +0.17% (p=0.000 n=8+10)
GoParse-96                14.0ms ± 0%  14.1ms ± 1%  +0.42% (p=0.003 n=9+10)
RegexpMatchEasy0_32-96    644ns ± 2%   645ns ± 2%   ~ (p=0.836 n=10+10)
RegexpMatchEasy0_1K-96    3.70µs ± 0%  3.49µs ± 0%  -5.58% (p=0.000 n=10+10)
RegexpMatchEasy1_32-96    662ns ± 2%   657ns ± 2%   ~ (p=0.137 n=10+10)
RegexpMatchEasy1_1K-96    4.47µs ± 0%  4.31µs ± 0%  -3.48% (p=0.000 n=10+10)
RegexpMatchMedium_32-96   844ns ± 2%   849ns ± 1%   ~ (p=0.208 n=10+10)
RegexpMatchMedium_1K-96   179µs ± 0%   182µs ± 0%   +1.20% (p=0.000 n=10+10)
RegexpMatchHard_32-96     10.0µs ± 0%  10.1µs ± 0%  +0.48% (p=0.000 n=10+9)
RegexpMatchHard_1K-96     297µs ± 0%   297µs ± 0%   -0.14% (p=0.000 n=10+10)
Revcomp-96                3.08s ± 0%   3.13s ± 0%   +1.56% (p=0.000 n=9+9)
Template-96               276ms ± 2%   275ms ± 1%   ~ (p=0.393 n=10+10)
TimeParse-96              1.37µs ± 0%  1.36µs ± 0%  -0.53% (p=0.000 n=10+7)
TimeFormat-96             1.40µs ± 0%  1.42µs ± 0%  +0.97% (p=0.000 n=10+10)
[Geo mean]                264µs        262µs        -0.77%

Change-Id: Ie54eee4b3092af53e6da3baa6d1755098f57f3a2 Reviewed-on: https://go-review.googlesource.com/55670 Run-TryBot: Philip Hofer <phofer@umich.edu> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2018-02-20cmd/compile: reset branch prediction when deleting a branchKeith Randall
When we go from a branch block to a plain block, reset the branch prediction bit. Downstream passes assume that if the branch prediction is set, then the block has 2 successors. Fixes #23504 Change-Id: I2898ec002228b2e34fe80ce420c6939201c0a5aa Reviewed-on: https://go-review.googlesource.com/88955 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
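The invariant in this message (a set branch prediction implies exactly two successors) can be sketched as a rewrite helper plus a check. The block type, its likely field, makePlain and checkPrediction are hypothetical names used only for this illustration.

```go
package main

import "fmt"

// block is a toy basic block. likely is -1 when the first successor is
// unlikely, +1 when it is likely, and 0 when nothing is known.
type block struct {
	kind   string
	succs  []int
	likely int8
}

// makePlain turns a two-way branch into a plain block that keeps only
// the successor at index keep. It also clears the prediction, which is
// the step the fix above adds.
func (b *block) makePlain(keep int) {
	b.kind = "Plain"
	b.succs = []int{b.succs[keep]}
	b.likely = 0
}

// checkPrediction enforces the invariant downstream passes rely on.
func checkPrediction(b *block) error {
	if b.likely != 0 && len(b.succs) != 2 {
		return fmt.Errorf("%s block has a branch prediction but %d successors", b.kind, len(b.succs))
	}
	return nil
}

func main() {
	b := &block{kind: "If", succs: []int{1, 2}, likely: 1}
	b.makePlain(0)
	fmt.Println(b.kind, b.succs, b.likely, checkPrediction(b))
}
```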
2018-02-14cmd/compile: reimplement location list generationHeschi Kreinick
Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. 
I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-11-30cmd/compile: use soft-float routines for soft-float targetsVladimir Stefanovic
Updates #18162 (mostly fixes) Change-Id: I35bcb8a688bdaa432adb0ddbb73a2f7adda47b9e Reviewed-on: https://go-review.googlesource.com/37958 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-11-21cmd/compile: ignore RegKill ops for non-phi after phi checkThan McIntosh
Relax the 'phi after non-phi' SSA sanity check to allow RegKill ops interspersed with phi ops in a block. This fixes a sanity check failure when -dwarflocationlists is enabled. Updates #22694. Change-Id: Iaae604ab6f1a8b150664dd120003727a6fb2f698 Reviewed-on: https://go-review.googlesource.com/77610 Run-TryBot: Than McIntosh <thanm@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-05cmd/compile: make loop finder more aware of irreducible loopsDavid Chase
The loop finder doesn't return good information if it encounters an irreducible loop. Make a start on improving this, and set a function-level flag to indicate when there is such a loop (and the returned information might be flaky). Use that flag to prevent the loop rotater from getting confused; the existing code seems to depend on artifacts of the previous loop-finding algorithm. (There is one irreducible loop in the go library, in "inflate.go"). Change-Id: If6e26feab38d9b009d2252d556e1470c803bde40 Reviewed-on: https://go-review.googlesource.com/42150 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-09-08cmd/compile: propagate constants through math.Float{32,64}{,from}bitsMichael Munday
This CL adds generic SSA rules to propagate constants through raw bits conversions between floats and integers. This allows constants to propagate through some math functions. For example, math.Copysign(0, -1) is now constant folded to a load of -0.0. Requires a fix to the ARM assembler which loaded -0.0 as +0.0. Change-Id: I52649a4691077c7414f19d17bb599a6743c23ac2 Reviewed-on: https://go-review.googlesource.com/62250 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
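The Copysign case mentioned in this message is easy to observe from user code. The sketch below only demonstrates the value that results, not how the compiler folds it; negZero and isNegZero are helper names introduced for the example.

```go
package main

import (
	"fmt"
	"math"
)

// negZero builds -0.0 the way the commit message's example does.
func negZero() float64 { return math.Copysign(0, -1) }

// isNegZero distinguishes -0.0 from +0.0, which compare equal with ==.
func isNegZero(f float64) bool { return f == 0 && math.Signbit(f) }

func main() {
	z := negZero()
	fmt.Println("equals zero:", z == 0)
	fmt.Println("sign bit set:", math.Signbit(z))
	// Only the sign bit differs from +0.0 in the raw representation.
	fmt.Printf("bits: %#016x\n", math.Float64bits(z))
}
```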
2017-06-07cmd/compile: check that phis are always first after schedulingKeith Randall
Update #20178 Change-Id: I603f77268ed38afdd84228c775efe006f08f14a7 Reviewed-on: https://go-review.googlesource.com/45018 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
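A check of this kind amounts to a single pass over a block's scheduled values. The following is a minimal sketch with hypothetical names (value, checkPhisFirst); ops are plain strings here only to keep the example short.

```go
package main

import "fmt"

// value is a toy SSA value identified only by its op name.
type value struct{ op string }

// checkPhisFirst reports an error if any phi appears after a non-phi
// value in the scheduled order of a block.
func checkPhisFirst(values []value) error {
	seenNonPhi := false
	for i, v := range values {
		if v.op != "Phi" {
			seenNonPhi = true
			continue
		}
		if seenNonPhi {
			return fmt.Errorf("phi at index %d follows a non-phi value", i)
		}
	}
	return nil
}

func main() {
	good := []value{{"Phi"}, {"Phi"}, {"Add64"}, {"Store"}}
	bad := []value{{"Phi"}, {"Add64"}, {"Phi"}}
	fmt.Println(checkPhisFirst(good))
	fmt.Println(checkPhisFirst(bad))
}
```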
2017-05-15cmd/compile: better check for single live memoryKeith Randall
Enhance the one-live-memory-at-a-time check to run during many more phases of the SSA backend. Also make it work in an interblock fashion. Change types.IsMemory to return true for tuples containing a memory type. Fix trim pass to build the merged phi correctly. Doesn't affect code but allows the check to pass after trim runs. Switch the AddTuple* ops to take the memory-containing tuple argument second. Update #20335 Change-Id: I5b03ef3606b75a9e4f765276bb8b183cdc172b43 Reviewed-on: https://go-review.googlesource.com/43495 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-05-11cmd/compile: fix store chain in schedule passKeith Randall
Tuple ops are weird. They are essentially a pair of ops, one which consumes a mem and one which generates a mem (the Select1). The schedule pass didn't handle these quite right. Fix the scheduler to include both parts of the paired op in the store chain. That makes sure that loads are correctly ordered with respect to the first of the pair. Add a check for the ssacheck builder, that there is only one live store at a time. I thought we already had such a check, but apparently not... Fixes #20335 Change-Id: I59eb3446a329100af38d22820b1ca2190ca46a78 Reviewed-on: https://go-review.googlesource.com/43294 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-19cmd/compile: enhance postorder computation and repair loop finderDavid Chase
Replace derecursed postorder computation with one that mimics DFS traversal. Corrected outerinner function in loopfinder. Leave enhanced checks in place. Change-Id: I657ba5e89c88941028d6d4c72e9f9056e30f1ce8 Reviewed-on: https://go-review.googlesource.com/40872 Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
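A non-recursive postorder that still visits nodes in the same order as a recursive DFS can be written with an explicit stack of (node, next successor index) frames. This is a generic sketch of that technique, not the compiler's code; node, postorder and diamond are hypothetical names.

```go
package main

import "fmt"

// node is a toy CFG node.
type node struct {
	id    int
	succs []*node
}

// postorder returns node IDs in DFS postorder without recursion. Each
// stack frame remembers which successor to try next, so the traversal
// order matches what a recursive DFS would produce.
func postorder(entry *node) []int {
	type frame struct {
		n    *node
		next int
	}
	seen := map[*node]bool{entry: true}
	stack := []frame{{n: entry}}
	var order []int
	for len(stack) > 0 {
		top := &stack[len(stack)-1]
		if top.next < len(top.n.succs) {
			s := top.n.succs[top.next]
			top.next++
			if !seen[s] {
				seen[s] = true
				stack = append(stack, frame{n: s})
			}
			continue
		}
		// All successors handled: emit the node and pop its frame.
		order = append(order, top.n.id)
		stack = stack[:len(stack)-1]
	}
	return order
}

// diamond builds 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3 and returns the entry.
func diamond() *node {
	n3 := &node{id: 3}
	n1 := &node{id: 1, succs: []*node{n3}}
	n2 := &node{id: 2, succs: []*node{n3}}
	return &node{id: 0, succs: []*node{n1, n2}}
}

func main() {
	fmt.Println(postorder(diamond()))
}
```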
2017-03-16cmd/compile: use type information in Aux for Store sizeCherry Zhang
Remove size AuxInt in Store, and alignment in Move/Zero. We still pass size AuxInt to Move/Zero, as it is used for partial Move/Zero lowering (e.g. cmd/compile/internal/ssa/gen/386.rules:288). SizeAndAlign is gone. Passes "toolstash -cmp" on std. Change-Id: I1ca34652b65dd30de886940e789fcf41d521475d Reviewed-on: https://go-review.googlesource.com/38150 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-10-25cmd/compile: add a writebarrier phase in SSACherry Zhang
When the compiler inserts write barriers, the frontend makes conservative decisions at an early stage. This may have false positives which result in write barriers for stack writes. A new phase, writebarrier, is added to the SSA backend, to delay the decision and eliminate false positives. The frontend still makes conservative decisions. When building SSA, instead of emitting runtime calls directly, it emits WB ops (StoreWB, MoveWB, etc.), which will be expanded to branches and runtime calls in the writebarrier phase. Writes to static locations on stack are detected and write barriers are removed. All write barriers of stack writes found by the script from issue #17330 are eliminated (except two false positives). Fixes #17330. Change-Id: I9bd66333da9d0ceb64dcaa3c6f33502798d1a0f8 Reviewed-on: https://go-review.googlesource.com/31131 Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-09-19cmd/compile: cache CFG-dependent computationsKeith Randall
We compute a lot of stuff based off the CFG: postorder traversal, dominators, dominator tree, loop nest. Multiple phases use this information and we end up recomputing some of it. Add a cache for this information so if the CFG hasn't changed, we can reuse the previous computation. Change-Id: I9b5b58af06830bd120afbee9cfab395a0a2f74b2 Reviewed-on: https://go-review.googlesource.com/29356 Reviewed-by: David Chase <drchase@google.com>
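The caching idea can be sketched as a lazily computed result that is dropped whenever the CFG is edited. The example below caches only a postorder and uses hypothetical names (fn, postorder, addEdge, computed); it assumes the function has at least one block, with block 0 as the entry.

```go
package main

import "fmt"

// fn is a toy function whose CFG is stored as successor lists.
type fn struct {
	succs           [][]int // succs[b] lists the successors of block b
	cachedPostorder []int   // nil means "not computed or invalidated"
	computed        int     // how many times the traversal actually ran
}

// postorder returns the cached result if the CFG has not changed since
// the last call, and recomputes it otherwise.
func (f *fn) postorder() []int {
	if f.cachedPostorder == nil {
		f.computed++
		seen := make([]bool, len(f.succs))
		var walk func(b int)
		walk = func(b int) {
			seen[b] = true
			for _, s := range f.succs[b] {
				if !seen[s] {
					walk(s)
				}
			}
			f.cachedPostorder = append(f.cachedPostorder, b)
		}
		walk(0)
	}
	return f.cachedPostorder
}

// addEdge changes the CFG, so every CFG-derived cache must be dropped.
func (f *fn) addEdge(from, to int) {
	f.succs[from] = append(f.succs[from], to)
	f.cachedPostorder = nil
}

func main() {
	f := &fn{succs: [][]int{{1}, {2}, {}}}
	f.postorder()
	f.postorder() // served from the cache
	f.addEdge(2, 0)
	f.postorder() // recomputed after the CFG change
	fmt.Println("traversals run:", f.computed)
}
```

The design choice worth noting is that invalidation is tied to the mutation itself, so callers never need to remember to clear the cache.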