| Age | Commit message (Collapse) | Author |
|
We kind of have this mechanism already, just normalizing it and
using it in a bunch of places. Previously a bunch of places cached
slices only for the duration of a single function compilation. Now
we can reuse slices across a whole compiler run.
Use a sync.Pool of powers-of-two sizes. This lets us use not
too much memory, and avoid holding onto memory we're no longer
using when a GC happens.
There's a few different types we need, so generate the code for it.
Generics would be useful here, but we can't use generics in the
compiler because of bootstrapping.
Change-Id: I6cf37e7b7b2e802882aaa723a0b29770511ccd82
Reviewed-on: https://go-review.googlesource.com/c/go/+/444820
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
steals idea from CL 312093
further investigation revealed additional duplicate
slots (equivalent, but not equal), so delete those too.
Rearranged Func.Names to be addresses of slots,
create canonical addresses so that split slots
(which use those addresses to refer to their parent,
and split slots can be further split)
will preserve "equivalent slots are equal".
Removes duplicates, improves metrics for "args at entry".
Change-Id: I5bbdcb50bd33655abcab3d27ad8cdce25499faaf
Reviewed-on: https://go-review.googlesource.com/c/go/+/312292
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Assignment between input parameters causes them to have more than
one "Name", and running this backwards from names to values can end
up confusing (conflating) parameter spill slots.
Around 105a6e9518, this cases a stack overflow running
go test -race encoding/pem
because two slice parameters spill (incorrectly) into the same
stack slots (in the AB?I-defined parameter spill area).
This also tickles a failure in cue, which turned out to be
easier to isolate.
Fixes #45851.
Updates #40724.
Change-Id: I39c56815bd6abb652f1ccbe83c47f4f373a125c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/313212
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
former code only spilled those parameters mentioned in code
AT THE REGISTER LEVEL, this caused problems with liveness
sometimes (which worked on whole variables including
aggregates).
Updates #40724.
Change-Id: Ib9fdc50d95d1d2b1f1e405dd370540e88582ac71
Reviewed-on: https://go-review.googlesource.com/c/go/+/310690
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This reverts CL 308510.
Reason for revert: It breaks "GOEXPERIMENT=regabi,regabiargs ./make.bash"
Change-Id: I553654690ec73120f8a6258dd80623853c430df0
Reviewed-on: https://go-review.googlesource.com/c/go/+/308932
Trust: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
We noticed a while ago that register argument spills were not always
landing where they should.
Updates #40724.
Change-Id: I0b7c3279a2f6270577481c252bae4568cbb6e796
Reviewed-on: https://go-review.googlesource.com/c/go/+/308510
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Morestack works for non-pointer register parameters
Within a function body, pointer-typed parameters are correctly
tracked.
Results still not hooked up.
For #40724.
Change-Id: Icaee0b51d0da54af983662d945d939b756088746
Reviewed-on: https://go-review.googlesource.com/c/go/+/294410
Trust: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Also handles case where OpArg does not escape but has its address
taken.
May have exposed a lurking bug in 1.16 expandCalls,
if e.g., loading len(someArrayOfstructThing[0].secondStringField)
from a local. Maybe.
For #40724.
Change-Id: I0298c4ad5d652b5e3d7ed6a62095d59e2d8819c7
Reviewed-on: https://go-review.googlesource.com/c/go/+/293396
Trust: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This was already documented as always being an ONAME, so it just
needed a few type assertion changes.
Passes buildall w/ toolstash -cmp.
Updates #42982.
Change-Id: I61f4b6ebd57c43b41977f4b37b81fe94fb11a723
Reviewed-on: https://go-review.googlesource.com/c/go/+/275757
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Trust: Matthew Dempsky <mdempsky@google.com>
|
|
The plan is to introduce a Node interface that replaces the old *Node pointer-to-struct.
The previous CL defined an interface INode modeling a *Node.
This CL:
- Changes all references outside internal/ir to use INode,
along with many references inside internal/ir as well.
- Renames Node to node.
- Renames INode to Node
So now ir.Node is an interface implemented by *ir.node, which is otherwise inaccessible,
and the code outside package ir is now (clearly) using only the interface.
The usual rule is never to redefine an existing name with a new meaning,
so that old code that hasn't been updated gets a "unknown name" error
instead of more mysterious errors or silent misbehavior. That rule would
caution against replacing Node-the-struct with Node-the-interface,
as in this CL, because code that says *Node would now be using a pointer
to an interface. But this CL is being landed at the same time as another that
moves Node from gc to ir. So the net effect is to replace *gc.Node with ir.Node,
which does follow the rule: any lingering references to gc.Node will be told
it's gone, not silently start using pointers to interfaces. So the rule is followed
by the CL sequence, just not this specific CL.
Overall, the loss of inlining caused by using interfaces cuts the compiler speed
by about 6%, a not insignificant amount. However, as we convert the representation
to concrete structs that are not the giant Node over the next weeks, that speed
should come back as more of the compiler starts operating directly on concrete types
and the memory taken up by the graph of Nodes drops due to the more precise
structs. Honestly, I was expecting worse.
% benchstat bench.old bench.new
name old time/op new time/op delta
Template 168ms ± 4% 182ms ± 2% +8.34% (p=0.000 n=9+9)
Unicode 72.2ms ±10% 82.5ms ± 6% +14.38% (p=0.000 n=9+9)
GoTypes 563ms ± 8% 598ms ± 2% +6.14% (p=0.006 n=9+9)
Compiler 2.89s ± 4% 3.04s ± 2% +5.37% (p=0.000 n=10+9)
SSA 6.45s ± 4% 7.25s ± 5% +12.41% (p=0.000 n=9+10)
Flate 105ms ± 2% 115ms ± 1% +9.66% (p=0.000 n=10+8)
GoParser 144ms ±10% 152ms ± 2% +5.79% (p=0.011 n=9+8)
Reflect 345ms ± 9% 370ms ± 4% +7.28% (p=0.001 n=10+9)
Tar 149ms ± 9% 161ms ± 5% +8.05% (p=0.001 n=10+9)
XML 190ms ± 3% 209ms ± 2% +9.54% (p=0.000 n=9+8)
LinkCompiler 327ms ± 2% 325ms ± 2% ~ (p=0.382 n=8+8)
ExternalLinkCompiler 1.77s ± 4% 1.73s ± 6% ~ (p=0.113 n=9+10)
LinkWithoutDebugCompiler 214ms ± 4% 211ms ± 2% ~ (p=0.360 n=10+8)
StdCmd 14.8s ± 3% 15.9s ± 1% +6.98% (p=0.000 n=10+9)
[Geo mean] 480ms 510ms +6.31%
name old user-time/op new user-time/op delta
Template 223ms ± 3% 237ms ± 3% +6.16% (p=0.000 n=9+10)
Unicode 103ms ± 6% 113ms ± 3% +9.53% (p=0.000 n=9+9)
GoTypes 758ms ± 8% 800ms ± 2% +5.55% (p=0.003 n=10+9)
Compiler 3.95s ± 2% 4.12s ± 2% +4.34% (p=0.000 n=10+9)
SSA 9.43s ± 1% 9.74s ± 4% +3.25% (p=0.000 n=8+10)
Flate 132ms ± 2% 141ms ± 2% +6.89% (p=0.000 n=9+9)
GoParser 177ms ± 9% 183ms ± 4% ~ (p=0.050 n=9+9)
Reflect 467ms ±10% 495ms ± 7% +6.17% (p=0.029 n=10+10)
Tar 183ms ± 9% 197ms ± 5% +7.92% (p=0.001 n=10+10)
XML 249ms ± 5% 268ms ± 4% +7.82% (p=0.000 n=10+9)
LinkCompiler 544ms ± 5% 544ms ± 6% ~ (p=0.863 n=9+9)
ExternalLinkCompiler 1.79s ± 4% 1.75s ± 6% ~ (p=0.075 n=10+10)
LinkWithoutDebugCompiler 248ms ± 6% 246ms ± 2% ~ (p=0.965 n=10+8)
[Geo mean] 483ms 504ms +4.41%
[git-generate]
cd src/cmd/compile/internal/ir
: # We need to do the conversion in multiple steps, so we introduce
: # a temporary type alias that will start out meaning the pointer-to-struct
: # and then change to mean the interface.
rf '
mv Node OldNode
add node.go \
type Node = *OldNode
'
: # It should work to do this ex in ir, but it misses test files, due to a bug in rf.
: # Run the command in gc to handle gc's tests, and then again in ssa for ssa's tests.
cd ../gc
rf '
ex . ../arm ../riscv64 ../arm64 ../mips64 ../ppc64 ../mips ../wasm {
import "cmd/compile/internal/ir"
*ir.OldNode -> ir.Node
}
'
cd ../ssa
rf '
ex {
import "cmd/compile/internal/ir"
*ir.OldNode -> ir.Node
}
'
: # Back in ir, finish conversion clumsily with sed,
: # because type checking and circular aliases do not mix.
cd ../ir
sed -i '' '
/type Node = \*OldNode/d
s/\*OldNode/Node/g
s/^func (n Node)/func (n *OldNode)/
s/OldNode/node/g
s/type INode interface/type Node interface/
s/var _ INode = (Node)(nil)/var _ Node = (*node)(nil)/
' *.go
gofmt -w *.go
sed -i '' '
s/{Func{}, 136, 248}/{Func{}, 152, 280}/
s/{Name{}, 32, 56}/{Name{}, 44, 80}/
s/{Param{}, 24, 48}/{Param{}, 44, 88}/
s/{node{}, 76, 128}/{node{}, 88, 152}/
' sizeof_test.go
cd ../ssa
sed -i '' '
s/{LocalSlot{}, 28, 40}/{LocalSlot{}, 32, 48}/
' sizeof_test.go
cd ../gc
sed -i '' 's/\*ir.Node/ir.Node/' mkbuiltin.go
cd ../../../..
go install std cmd
cd cmd/compile
go test -u || go test -u
Change-Id: I196bbe3b648e4701662e4a2bada40bf155e2a553
Reviewed-on: https://go-review.googlesource.com/c/go/+/272935
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
The cycle hacks existed because gc needed to import ssa
which need to know about gc.Node. But now that's ir.Node,
and there's no cycle anymore.
Don't know how much it matters but LocalSlot is now
one word shorter than before, because it holds a pointer
instead of an interface for the *Node. That won't last long.
Now that they're not necessary for interface satisfaction,
IsSynthetic and IsAutoTmp can move to top-level ir functions.
Change-Id: Ie511e93466cfa2b17d9a91afc4bd8d53fdb80453
Reviewed-on: https://go-review.googlesource.com/c/go/+/272931
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
As it says, delay expanpsion of OpArg to the expand_calls phase,
to enable (eventually) interprocedural SSA optimizations, and
(sooner) change to a register ABI.
Includes a round of cleanup to function names and comments,
largely to match the expanded scope of the functions.
This CL removes the per-function dependence on GOSSAHASH,
but the go116lateCallExpansion kill switch remains (and was
tested locally to ensure it worked).
Two functions in expand_calls.go that performed overlapping
things were combined into a single function that is called
twice.
Fixes #42236.
For #40724.
Change-Id: Icbb78947eaa39f17f2c1210d5c2caef20abd6571
Reviewed-on: https://go-review.googlesource.com/c/go/+/262117
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Change-Id: If2954bdfc551515403706b2cd0dde94e45936e08
GitHub-Last-Rev: d4cfc41a5504cf10befefdb881d4c45986a1d1f8
GitHub-Pull-Request: golang/go#28049
Reviewed-on: https://go-review.googlesource.com/c/140299
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Go's SSA instructions only operate on registers. For example, an add
instruction would read two registers, do the addition and then write
to a register. WebAssembly's instructions, on the other hand, operate
on the stack. The add instruction first pops two values from the stack,
does the addition, then pushes the result to the stack. To fulfill
Go's semantics, one needs to map Go's single add instruction to
4 WebAssembly instructions:
- Push the value of local variable A to the stack
- Push the value of local variable B to the stack
- Do addition
- Write value from stack to local variable C
Now consider that B was set to the constant 42 before the addition:
- Push constant 42 to the stack
- Write value from stack to local variable B
This works, but is inefficient. Instead, the stack is used directly
by inlining instructions if possible. With inlining it becomes:
- Push the value of local variable A to the stack (add)
- Push constant 42 to the stack (constant)
- Do addition (add)
- Write value from stack to local variable C (add)
Note that the two SSA instructions can not be generated sequentially
anymore, because their WebAssembly instructions are interleaved.
Design doc: https://docs.google.com/document/d/131vjr4DH6JFnb-blm_uRdaC0_Nv3OUwjEY5qVCxCup4
Updates #18892
Change-Id: Ie35e1c0bebf4985fddda0d6330eb2066f9ad6dec
Reviewed-on: https://go-review.googlesource.com/103535
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
The compiler allows code to have multiple differently-typed views of a
single argument. For instance, if we have
func f(x float64) {
y := *(*int64)(unsafe.Pointer(&x))
...
}
Then in SSA we get two OpArg ops, one with float64 type and one with
int64 type.
The compiler will try to reuse argument slots for spill slots. It
checks that the argument slot is dead by consulting an interference
graph.
When building the interference graph, we normally ignore cross-type
edges because the values on either end of that edge can't be allocated
to the same slot. (This is just a space-saving optimization.) This
rule breaks down when one of the values is an argument, because of the
multiple views described above. If we're spilling a float64, it is not
enough that the float64 version of x is dead; the int64 version of x
has to be dead also.
Remove the optimization of not recording interference edges if types
don't match. That optimization is incorrect if one of the values
connected by the edge is an argument.
Fixes #23522
Change-Id: I361f85d80fe3bc7249014ca2c3ec887c3dc30271
Reviewed-on: https://go-review.googlesource.com/89335
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Just to get rid of lots of .Name() stutter in printf calls.
Change-Id: I86cf00b3f7b2172387a1c6a7f189c1897fab6300
Reviewed-on: https://go-review.googlesource.com/56630
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
When the compiler decomposes a user variable, track its origin so that
it can be recomposed during DWARF generation.
Change-Id: Ia71c7f8e7f4d65f0652f1c97b0dda5d9cad41936
Reviewed-on: https://go-review.googlesource.com/50878
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
When package ssa was created, Type was in package gc.
To avoid circular dependencies, we used an interface (ssa.Type)
to represent type information in SSA.
In the Go 1.9 cycle, gri extricated the Type type from package gc.
As a result, we can now use it in package ssa.
Now, instead of package types depending on package ssa,
it is the other way.
This is a more sensible dependency tree,
and helps compiler performance a bit.
Though this is a big CL, most of the changes are
mechanical and uninteresting.
Interesting bits:
* Add new singleton globals to package types for the special
SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types,
and TTUPLE, for SSA tuple types.
ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt
to package types.
* We had picked the name "types" in our rules for the handy
list of types provided by ssa.Config. That conflicted with
the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special
types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem,
and probably also some other duplicated Type methods
designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types,
and they were not particularly careful about types in general.
Of necessity, this CL switches them to use *types.Type;
it does not make them more type-accurate.
Unfortunately, using types.Type means initializing a bit
of the types universe.
This is prime for refactoring and improvement.
This shrinks ssa.Value; it now fits in a smaller size class
on 64 bit systems. This doesn't have a giant impact,
though, since most Values are preallocated in a chunk.
name old alloc/op new alloc/op delta
Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8)
Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10)
GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10)
Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10)
GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9)
Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8)
Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10)
XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10)
[Geo mean] 40.5MB 40.3MB -0.68%
name old allocs/op new allocs/op delta
Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9)
Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10)
GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10)
Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10)
GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9)
Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8)
Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10)
XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10)
[Geo mean] 428k 428k -0.01%
Removing all the interface calls helps non-trivially with CPU, though.
name old time/op new time/op delta
Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96)
Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96)
GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96)
Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99)
GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97)
Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99)
Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94)
XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95)
[Geo mean] 178ms 173ms -2.65%
name old user-time/op new user-time/op delta
Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99)
Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95)
GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99)
Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96)
GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100)
Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92)
Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100)
XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97)
[Geo mean] 220ms 213ms -2.76%
Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Concurrent compilation requires providing an
explicit position and curfn to temp.
This implementation of tempAt temporarily
continues to use the globals lineno and Curfn,
so as not to collide with mdempsky's
work for #19683 eliminating the Curfn dependency
from func nod.
Updates #15756
Updates #19683
Change-Id: Ib3149ca4b0740e9f6eea44babc6f34cdd63028a9
Reviewed-on: https://go-review.googlesource.com/38592
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
Suggested by mdempsky in CL 38232.
This allows us to use the Frontend field
to associate frontend state and information
with a function.
See the following CL in the series for examples.
This is a giant CL, but it is almost entirely routine refactoring.
The ssa test API is starting to feel a bit unwieldy.
I will clean it up separately, once the dust has settled.
Passes toolstash -cmp.
Updates #15756
Change-Id: I71c573bd96ff7251935fce1391b06b1f133c3caf
Reviewed-on: https://go-review.googlesource.com/38327
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
This makes ssa.Func, ssa.Cache, and ssa.Config fulfill
the roles laid out for them in CL 38160.
The only non-trivial change in this CL is how cached
values and blocks get IDs. Prior to this CL, their IDs were
assigned as part of resetting the cache, and only modified
IDs were reset. This required knowing how many values and
blocks were modified, which required a tight coupling between
ssa.Func and ssa.Config. To eliminate that coupling,
we now zero values and blocks during reset,
and assign their IDs when they are used.
Since unused values and blocks have ID == 0,
we can efficiently find the last used value/block,
to avoid zeroing everything.
Bulk zeroing is efficient, but not efficient enough
to obviate the need to avoid zeroing everything every time.
As a happy side-effect, ssa.Func.Free is no longer necessary.
DebugHashMatch and friends now belong in func.go.
They have been left in place for clarity and review.
I will move them in a subsequent CL.
Passes toolstash -cmp. No compiler performance impact.
No change in 'go test cmd/compile/internal/ssa' execution time.
Change-Id: I2eb7af58da067ef6a36e815a6f386cfe8634d098
Reviewed-on: https://go-review.googlesource.com/38167
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
XPos is a compact (8 instead of 16 bytes on a 64bit machine) source
position representation. There is a 1:1 correspondence between each
XPos and each regular Pos, translated via a global table.
In some sense this brings back the LineHist, though positions can
track line and column information; there is a O(1) translation
between the representations (no binary search), and the translation
is factored out.
The size increase with the prior change is brought down again and
the compiler speed is in line with the master repo (measured on
the same "quiet" machine as for prior change):
name old time/op new time/op delta
Template 256ms ± 1% 262ms ± 2% ~ (p=0.063 n=5+4)
Unicode 132ms ± 1% 135ms ± 2% ~ (p=0.063 n=5+4)
GoTypes 891ms ± 1% 871ms ± 1% -2.28% (p=0.016 n=5+4)
Compiler 3.84s ± 2% 3.89s ± 2% ~ (p=0.413 n=5+4)
MakeBash 47.1s ± 1% 46.2s ± 2% ~ (p=0.095 n=5+5)
name old user-ns/op new user-ns/op delta
Template 309M ± 1% 314M ± 2% ~ (p=0.111 n=5+4)
Unicode 165M ± 1% 172M ± 9% ~ (p=0.151 n=5+5)
GoTypes 1.14G ± 2% 1.12G ± 1% ~ (p=0.063 n=5+4)
Compiler 5.00G ± 1% 4.96G ± 1% ~ (p=0.286 n=5+4)
Change-Id: Icc570cc60ab014d8d9af6976f1f961ab8828cc47
Reviewed-on: https://go-review.googlesource.com/34506
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Using a variable instead of a composite literal makes
the code independent of implementation changes of Pos.
Per David Lazar's suggestion.
Change-Id: I336967ac12a027c51a728a58ac6207cb5119af4a
Reviewed-on: https://go-review.googlesource.com/34148
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
Adjust cmd/compile accordingly.
This will make it easier to replace the underlying implementation.
Change-Id: I33645850bb18c839b24785b6222a9e028617addb
Reviewed-on: https://go-review.googlesource.com/34133
Reviewed-by: David Lazar <lazard@golang.org>
|
|
We compute a lot of stuff based off the CFG: postorder traversal,
dominators, dominator tree, loop nest. Multiple phases use this
information and we end up recomputing some of it. Add a cache
for this information so if the CFG hasn't changed, we can reuse
the previous computation.
Change-Id: I9b5b58af06830bd120afbee9cfab395a0a2f74b2
Reviewed-on: https://go-review.googlesource.com/29356
Reviewed-by: David Chase <drchase@google.com>
|
|
This adds a sparse method for locating nearest ancestors
in a dominator tree, and checks blocks with more than one
predecessor for differences and inserts phi functions where
there are.
Uses reversed post order to cut number of passes, running
it from first def to last use ("last use" for paramout and
mem is end-of-program; last use for a phi input from a
backedge is the source of the back edge)
Includes a cutover from old algorithm to new to avoid paying
large constant factor for small programs. This keeps normal
builds running at about the same time, while not running
over-long on large machine-generated inputs.
Add "phase" flags for ssa/build -- ssa/build/stats prints
number of blocks, values (before and after linking references
and inserting phis, so expansion can be measured), and their
product; the product governs the cutover, where a good value
seems to be somewhere between 1 and 5 million.
Among the files compiled by make.bash, this is the shape of
the tail of the distribution for #blocks, #vars, and their
product:
#blocks #vars product
max 6171 28180 173,898,780
99.9% 1641 6548 10,401,878
99% 463 1909 873,721
95% 152 639 95,235
90% 84 359 30,021
The old algorithm is indeed usually fastest, for 99%ile
values of usually.
The fix to LookupVarOutgoing
( https://go-review.googlesource.com/#/c/22790/ )
deals with some of the same problems addressed by this CL,
but on at least one bug ( #15537 ) this change is still
a significant help.
With this CL:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
github.com/gogo/protobuf/test/theproto3/combos/...
...
real 4m35.200s
user 13m16.644s
sys 0m36.712s
and pprof reports 3.4GB allocated in one of the larger profiles
With tip:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
github.com/gogo/protobuf/test/theproto3/combos/...
...
real 10m36.569s
user 25m52.286s
sys 4m3.696s
and pprof reports 8.3GB allocated in the same larger profile
With this CL, most of the compilation time on the benchmarked
input is spent in register/stack allocation (cumulative 53%)
and in the sparse lookup algorithm itself (cumulative 20%).
Fixes #15537.
Change-Id: Ia0299dda6a291534d8b08e5f9883216ded677a00
Reviewed-on: https://go-review.googlesource.com/22342
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Provide indexes along with block pointers for Preds
and Succs arrays. This allows us to splice edges in
and out of those arrays in constant time.
Fixes worst-case O(n^2) behavior in deadcode and fuse.
benchmark old ns/op new ns/op delta
BenchmarkFuse1-8 2065 2057 -0.39%
BenchmarkFuse10-8 9408 9073 -3.56%
BenchmarkFuse100-8 105238 76277 -27.52%
BenchmarkFuse1000-8 3982562 1026750 -74.22%
BenchmarkFuse10000-8 301220329 12824005 -95.74%
BenchmarkDeadCode1-8 1588 1566 -1.39%
BenchmarkDeadCode10-8 4333 4250 -1.92%
BenchmarkDeadCode100-8 32031 32574 +1.70%
BenchmarkDeadCode1000-8 590407 468275 -20.69%
BenchmarkDeadCode10000-8 17822890 5000818 -71.94%
BenchmarkDeadCode100000-8 1388706640 78021127 -94.38%
BenchmarkDeadCode200000-8 5372518479 168598762 -96.86%
Change-Id: Iccabdbb9343fd1c921ba07bbf673330a1c36ee17
Reviewed-on: https://go-review.googlesource.com/22589
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
They have different semantics.
Equal is stricter and is designed for the front-end.
Compare is looser and cheaper and is designed for the back-end.
To avoid possible regression, remove Equal from ssa.Type.
Updates #15043
Change-Id: Ie23ce75ff6b4d01b7982e0a89e6f81b5d099d8d6
Reviewed-on: https://go-review.googlesource.com/21483
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
|
|
This is controlled by the "regalloc" stats flag, since regalloc
calls stackalloc. The plan is for this to allow comparison
of cheaper stack allocation algorithms with what we have now.
Change-Id: Ibf64a780344c69babfcbb328fd6d053ea2e02cfc
Reviewed-on: https://go-review.googlesource.com/21393
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Compound AUTO types weren't named previously. That was because live
variable analysis (plive.go) doesn't handle spilling to compound types.
It can't handle them because there is no valid place to put VARDEFs when
regalloc is spilling compound types.
compound types = multiword builtin types: complex, string, slice, and
interface.
Instead, we split named AUTOs into individual one-word variables. For
example, a string s gets split into a byte ptr s.ptr and an integer
s.len. Those two variables can be spilled to / restored from
independently. As a result, live variable analysis can handle them
because they are one-word objects.
This CL will change how AUTOs are described in DWARF information.
Consider the code:
func f(s string, i int) int {
x := s[i:i+5]
g()
return lookup(x)
}
The old compiler would spill x to two consecutive slots on the stack,
both named x (at offsets 0 and 8). The new compiler spills the pointer
of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a
constant (5) and can be rematerialized for the call to lookup.
So compound objects may not be spilled in their entirety, and even if
they are they won't necessarily be contiguous. Such is the price of
optimization.
Re-enable live variable analysis tests. One test remains disabled, it
fails because of #14904.
Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801
Reviewed-on: https://go-review.googlesource.com/21233
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
This commit replaces some of
for i := len(x) - 1; i >= 0; i-- {...}
style loops, which do not rely on reverse iteration order.
Change-Id: I5542834286562da058200c06e7a173b13760e54d
Reviewed-on: https://go-review.googlesource.com/21044
Reviewed-by: Keith Randall <khr@golang.org>
|
|
It's pretty hard to get reliable CPU numbers, even with 50 runs on an
otherwise-idle physical Linux machine, but the garbage reduction
numbers are nice. To get useful time/op numbers, I modified
compilebench to report user CPU time instead of wall time:
name old time/op new time/op delta
Template 547ms ± 6% 557ms ± 5% +1.80% (p=0.001 n=49+49)
Unicode 360ms ± 9% 365ms ± 6% ~ (p=0.094 n=50+45)
GoTypes 1.84s ± 3% 1.82s ± 3% -1.50% (p=0.000 n=50+49)
Compiler 9.19s ± 2% 9.02s ± 2% -1.87% (p=0.000 n=45+50)
name old alloc/op new alloc/op delta
Template 63.3MB ± 0% 59.1MB ± 0% -6.72% (p=0.000 n=50+50)
Unicode 43.1MB ± 0% 42.9MB ± 0% -0.47% (p=0.000 n=50+49)
GoTypes 220MB ± 0% 200MB ± 0% -9.00% (p=0.000 n=50+50)
Compiler 1.00GB ± 0% 0.89GB ± 0% -10.09% (p=0.000 n=50+49)
name old allocs/op new allocs/op delta
Template 681k ± 0% 680k ± 0% -0.16% (p=0.000 n=50+48)
Unicode 541k ± 0% 541k ± 0% -0.02% (p=0.011 n=48+50)
GoTypes 2.08M ± 0% 2.08M ± 0% -0.19% (p=0.000 n=48+50)
Compiler 9.24M ± 0% 9.23M ± 0% -0.11% (p=0.000 n=50+50)
Change-Id: I1fac4ebf85a1783e3289c3ffb1ed365442837643
Reviewed-on: https://go-review.googlesource.com/20995
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Change the existing flags from compile time consts to be configurable
from the command line.
Change-Id: I4aab4bf3dfcbdd8e2b5a2ff51af95c2543967769
Reviewed-on: https://go-review.googlesource.com/20560
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.
This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:
$ perl -i -npe 's,^(\s*// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.) +([A-Z])')
$ go test go/doc -update
Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Cache sparse sets in the function so they can be reused by subsequent
compiler passes.
benchmark old ns/op new ns/op delta
BenchmarkDSEPass-8 206945 180022 -13.01%
BenchmarkDSEPassBlock-8 5286103 2614054 -50.55%
BenchmarkCSEPass-8 1790277 1790655 +0.02%
BenchmarkCSEPassBlock-8 18083588 18112771 +0.16%
BenchmarkDeadcodePass-8 59837 41375 -30.85%
BenchmarkDeadcodePassBlock-8 1651575 511169 -69.05%
BenchmarkMultiPass-8 531529 427506 -19.57%
BenchmarkMultiPassBlock-8 7033496 4487814 -36.19%
benchmark old allocs new allocs delta
BenchmarkDSEPass-8 11 4 -63.64%
BenchmarkDSEPassBlock-8 599 120 -79.97%
BenchmarkCSEPass-8 18 18 +0.00%
BenchmarkCSEPassBlock-8 2700 2700 +0.00%
BenchmarkDeadcodePass-8 4 3 -25.00%
BenchmarkDeadcodePassBlock-8 30 9 -70.00%
BenchmarkMultiPass-8 24 20 -16.67%
BenchmarkMultiPassBlock-8 1800 1000 -44.44%
benchmark old bytes new bytes delta
BenchmarkDSEPass-8 221367 142 -99.94%
BenchmarkDSEPassBlock-8 3695207 3846 -99.90%
BenchmarkCSEPass-8 303328 303328 +0.00%
BenchmarkCSEPassBlock-8 5006400 5006400 +0.00%
BenchmarkDeadcodePass-8 84232 10506 -87.53%
BenchmarkDeadcodePassBlock-8 1274940 163680 -87.16%
BenchmarkMultiPass-8 608674 313834 -48.44%
BenchmarkMultiPassBlock-8 9906001 5003450 -49.49%
Change-Id: Ib1fa58c7f494b374d1a4bb9cffbc2c48377b59d3
Reviewed-on: https://go-review.googlesource.com/19100
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Reorder how register & stack allocation is done. We used to allocate
registers, then fix up merge edges, then allocate stack slots. This
lead to lots of unnecessary copies on merge edges:
v2 = LoadReg v1
v3 = StoreReg v2
If v1 and v3 are allocated to the same stack slot, then this code is
unnecessary. But at regalloc time we didn't know the homes of v1 and
v3.
To fix this problem, allocate all the stack slots before fixing up the
merge edges. That way, we know what stack slots values use so we know
what copies are required.
Use a good technique for shuffling values around on merge edges.
Improves performance of the go1 TimeParse benchmark by ~12%
Change-Id: I731f43e4ff1a7e0dc4cd4aa428fcdb97812b86fa
Reviewed-on: https://go-review.googlesource.com/17915
Reviewed-by: David Chase <drchase@google.com>
|
|
Declare a function's arguments as having already been
spilled so their use just requires a restore.
Allow spill locations to be portions of larger objects the stack.
Required to load portions of compound input arguments.
Rename the memory input to InputMem. Use Arg for the
pre-spilled argument values.
Change-Id: I8fe2a03ffbba1022d98bfae2052b376b96d32dda
Reviewed-on: https://go-review.googlesource.com/16536
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
For debugging, spill values to named variables instead of autotmp_
variables if possible. We do this by keeping a name -> value map
for each function, keep it up-to-date during deadcode elim, and use
it to override spill decisions in stackalloc.
It might even make stack frames a bit smaller, as it makes it easy
to identify a set of spills which are likely not to interfere.
This just works for one-word variables for now. Strings/slices
will be a separate CL.
Change-Id: Ie89eba8cab16bcd41b311c479ec46dd7e64cdb67
Reviewed-on: https://go-review.googlesource.com/16336
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
For each type, maintain a list of stack slots used to spill
SSA values to the stack. Reuse those stack slots for noninterfering
spills.
Lowers frame sizes. As an example, runtime.mSpan_Sweep goes from
584 bytes to 392 bytes. heapBitsSetType goes from 576 bytes to 152 bytes.
Change-Id: I0e9afe80c2fd84aff9eb368318685de293c363d0
Reviewed-on: https://go-review.googlesource.com/16022
Reviewed-by: David Chase <drchase@google.com>
|
|
This change is all about leveraging the gc bitmap generation
that is already done by the current compiler. We rearrange how
stack allocation is done so that we generate a variable declaration
for each spill. We also reorganize how args/locals are recorded
during SSA. Then we can use the existing allocauto/defframe to
allocate the stack frame and liveness to make the gc bitmaps.
With this change, stack copying works correctly and we no longer
need hacks in runtime/stack*.go to make tests work. GC is close
to working, it just needs write barriers.
Change-Id: I990fb4e3fbe98850c6be35c3185a1c85d9e1a6ba
Reviewed-on: https://go-review.googlesource.com/13894
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
Implement a global (whole function) register allocator.
This replaces the local (per basic block) register allocator.
Clobbering of registers by instructions is handled properly.
A separate change will add the correct clobbers to all the instructions.
Change-Id: I38ce4dc7dccb8303c1c0e0295fe70247b0a3f2ea
Reviewed-on: https://go-review.googlesource.com/13622
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Todd Neal <todd@tneal.org>
|
|
Bad rebase in CL 12439.
Change-Id: I7ad359519c6274be37456b655f19bf0ca6ac6692
Reviewed-on: https://go-review.googlesource.com/12449
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
Prior to this fix, a zero-aligned variable such as a flags
variable would reset n to 0.
While we're here, log the stack layout so that debugging
and reading the generated assembly is easier.
Change-Id: I18ef83ea95b6ea877c83f2e595e14c48c9ad7d84
Reviewed-on: https://go-review.googlesource.com/12439
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Bake the bit width and signedness into opcodes.
Pro: Rewrite rules become easier. Less chance for confusion.
Con: Lots more opcodes.
Let me know what you think. I'm leaning towards this, but I could be
convinced otherwise if people think this is too ugly.
Update #11467
Change-Id: Icf1b894268cdf73515877bb123839800d97b9df9
Reviewed-on: https://go-review.googlesource.com/12362
Reviewed-by: Alan Donovan <adonovan@google.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
Keep track of the outargs size needed at each call.
Compute the size of the outargs section of the stack frame. It's just
the max of the outargs size at all the callsites in the function.
Change-Id: I3d0640f654f01307633b1a5f75bab16e211ea6c0
Reviewed-on: https://go-review.googlesource.com/12178
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
Use *Node of type ONAME instead of string as the key for variable maps.
This will prevent aliasing between two identically named but
differently scoped variables.
Introduce an Aux value that encodes the offset of a variable
from a base pointer (either global base pointer or stack pointer).
Allow LEAQ and derivatives (MOVQ, etc.) to also have such an Aux field.
Allocate space for AUTO variables in stackalloc.
Change-Id: Ibdccdaea4bbc63a1f4882959ac374f2b467e3acd
Reviewed-on: https://go-review.googlesource.com/11238
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
Requested in CL 11380.
Change-Id: Icf0d23fb8d383c76272401e363cc9b2169d11403
Reviewed-on: https://go-review.googlesource.com/11450
Reviewed-by: Alan Donovan <adonovan@google.com>
|
|
The SSA implementation logs for three purposes:
* debug logging
* fatal errors
* unimplemented features
Separating these three uses lets us attempt an SSA
implementation for all functions, not just
_ssa functions. This turns the entire standard
library into a compilation test, and makes it
easy to figure out things like
"how much coverage does SSA have now" and
"what should we do next to get more coverage?".
Functions called _ssa are still special.
They log profusely by default and
the output of the SSA implementation
is used. For all other functions,
logging is off, and the implementation
is built and discarded, due to lack of
support for the runtime.
While we're here, fix a few minor bugs and
add some extra Unimplementeds to allow
all.bash to pass.
As of now, SSA handles 20.79% of the functions
in the standard library (689 of 3314).
The top missing features are:
10.03% 2597 SSA unimplemented: zero for type error not implemented
7.79% 2016 SSA unimplemented: addr: bad op DOTPTR
7.33% 1898 SSA unimplemented: unhandled expr EQ
6.10% 1579 SSA unimplemented: unhandled expr OROR
4.91% 1271 SSA unimplemented: unhandled expr NE
4.49% 1163 SSA unimplemented: unhandled expr LROT
4.00% 1036 SSA unimplemented: unhandled expr LEN
3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC
2.37% 615 SSA unimplemented: zero for type []byte not implemented
1.90% 492 SSA unimplemented: unhandled stmt CALLMETH
1.74% 450 SSA unimplemented: unhandled expr CALLINTER
1.74% 450 SSA unimplemented: unhandled expr DOT
1.71% 444 SSA unimplemented: unhandled expr ANDAND
1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR
1.54% 400 SSA unimplemented: unhandled expr CALLMETH
1.51% 390 SSA unimplemented: unhandled stmt SWITCH
1.47% 380 SSA unimplemented: unhandled expr CONV
1.33% 345 SSA unimplemented: addr: bad op *
1.30% 336 SSA unimplemented: unhandled OLITERAL 6
Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7
Reviewed-on: https://go-review.googlesource.com/11160
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Change-Id: I33025a4a41fd91f6ee317d33a6eebf27fa00ab51
Reviewed-on: https://go-review.googlesource.com/11115
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Add an additional int64 auxiliary field to Value.
There are two main reasons for doing this:
1) Ints in interfaces require allocation, and we store ints in Aux a lot.
2) I'd like to have both *gc.Sym and int offsets included in lots
of operations (e.g. MOVQloadidx8). It will be more efficient to
store them as separate fields instead of a pointer to a sym/int pair.
It also simplifies a bunch of code.
This is just the refactoring. I'll start using this some more in a
subsequent changelist.
Change-Id: I1ca797ff572553986cf90cab3ac0a0c1d01ad241
Reviewed-on: https://go-review.googlesource.com/10929
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|