| Age | Commit message | Author |
|
The obj package needs to emit the PCDATA to select the entry stack map
before calling morestack. Currently this is copied for every
architecture. Since we're about to change how this works, consolidate
all of these copies into a single helper function.
For #24543.
Change-Id: Ia92d94de78f8e23fd06dba747c43e03e5989f67b
Reviewed-on: https://go-review.googlesource.com/109346
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Add a compiler intrinsic for getcallerpc on the following architectures:
arm
mips mipsle mips64 mips64le
ppc64 ppc64le
s390x
Change-Id: I758f3d4742fc214b206bcd07d90408622c17dbef
Reviewed-on: https://go-review.googlesource.com/110835
Run-TryBot: Wei Xiao <Wei.Xiao@arm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Before DWARF location lists can be turned on, 3 bugs need
fixing.
This CL addresses two -- lack of register definitions for
various architectures, and bugs on 32-bit platforms.
The third bug comes later.
Passes
GO_GCFLAGS=-dwarflocationlists ./run.bash -no-rebuild
(-no-rebuild because the map dependence causes trouble)
Change-Id: I4223b48ade84763e4b048e4aeb81149f082c7bc7
Reviewed-on: https://go-review.googlesource.com/99255
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
CL generated mechanically with github.com/mdempsky/unconvert.
Also updated cmd/compile/internal/ssa/gen/*.rules manually.
Change-Id: If721ef73cf0771ae83ce7e2d11623fc8d9155768
Reviewed-on: https://go-review.googlesource.com/97075
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This change makes it easier to express instructions
with an arbitrary number of operands.
Rationale: the previous approach of operand "hiding" does
not scale well; AVX and especially AVX512 have many
instructions with 3+ operands.
The x86 asm backend is updated to handle up to 6 explicit operands.
It also fixes an issue with the type checks for the 4th immediate operand.
All `ytab` tables are updated accordingly.
Changes to non-x86 backends only include these patterns:
`p.From3 = X` => `p.SetFrom3(X)`
`p.From3.X = Y` => `p.GetFrom3().X = Y`
Over time, other backends can adopt Prog.RestArgs
and reduce the number of workarounds.
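The accessor pattern above can be sketched as follows. This is a minimal illustration using simplified stand-in types, not the real cmd/internal/obj definitions: the fields of `Addr` and `Prog` shown here are assumptions chosen to keep the example small.

```go
package main

import "fmt"

// Addr is a simplified stand-in for an operand (assumed fields).
type Addr struct {
	Reg    int16
	Offset int64
}

// Prog is a simplified stand-in for an instruction that carries a
// variable-length list of extra source operands.
type Prog struct {
	From     Addr
	RestArgs []Addr // operands beyond From; the first plays the role of the old From3
	To       Addr
}

// SetFrom3 replaces the extra operands with the single operand a.
func (p *Prog) SetFrom3(a Addr) {
	p.RestArgs = []Addr{a}
}

// GetFrom3 returns a pointer to the first extra operand, or nil if none is set.
func (p *Prog) GetFrom3() *Addr {
	if len(p.RestArgs) == 0 {
		return nil
	}
	return &p.RestArgs[0]
}

func main() {
	p := &Prog{}
	p.SetFrom3(Addr{Reg: 3}) // previously: p.From3 = X
	p.GetFrom3().Offset = 16 // previously: p.From3.X = Y
	fmt.Println(p.GetFrom3().Reg, p.GetFrom3().Offset, len(p.RestArgs))
}
```

Keeping the accessors lets non-x86 backends stay on a "third operand" view while the storage underneath becomes a slice.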
-- Performance --
x/benchmark/build:
$ benchstat upstream.bench patched.bench
name old time/op new time/op delta
Build-48 21.7s ± 2% 21.8s ± 2% ~ (p=0.218 n=10+10)
name old binary-size new binary-size delta
Build-48 10.3M ± 0% 10.3M ± 0% ~ (all equal)
name old build-time/op new build-time/op delta
Build-48 21.7s ± 2% 21.8s ± 2% ~ (p=0.218 n=10+10)
name old build-peak-RSS-bytes new build-peak-RSS-bytes delta
Build-48 145MB ± 5% 148MB ± 5% ~ (p=0.218 n=10+10)
name old build-user+sys-time/op new build-user+sys-time/op delta
Build-48 21.0s ± 2% 21.2s ± 2% ~ (p=0.075 n=10+10)
Microbenchmark shows a slight slowdown.
name old time/op new time/op delta
AMD64asm-4 49.5ms ± 1% 49.9ms ± 1% +0.67% (p=0.001 n=23+15)
func BenchmarkAMD64asm(b *testing.B) {
	for i := 0; i < b.N; i++ {
		TestAMD64EndToEnd(nil)
		TestAMD64Encoder(nil)
	}
}
Change-Id: I4f1d37b5c2c966da3f2127705ccac9bff0038183
Reviewed-on: https://go-review.googlesource.com/63490
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
In PPC64 ELF files, the st_other field indicates the number of
prologue instructions between the global and local entry points.
We add the instructions in the compiler and assembler if -shared is used.
We were assuming that the instructions were present when building a
c-archive or PIE or doing dynamic linking, on the assumption that those
are the cases where the go tool would be building with -shared.
That assumption fails when using some other tool, such as Bazel,
that does not necessarily use -shared in exactly the same way.
This CL records in the object file whether a symbol was compiled
with -shared (this will be the same for all symbols in a given compilation)
and uses that information when setting the st_other field.
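As a rough sketch of how st_other encodes the distance between the two entry points: the ELFv2 ABI stores a 3-bit code in the top bits of st_other, and the byte offset is derived from it. The decoding below follows the ABI's local-entry-offset formula; `stOtherFor` is a hypothetical helper that only illustrates the idea of keying the value on a recorded "compiled with -shared" bit, and is not the linker's actual code.

```go
package main

import "fmt"

const (
	stoLocalBit  = 5    // position of the local-entry code within st_other
	stoLocalMask = 0xe0 // the top three bits of st_other
)

// localEntryOffset decodes the byte distance from the global entry point
// to the local entry point from an ELF symbol's st_other field.
func localEntryOffset(stOther uint8) uint32 {
	n := (stOther & stoLocalMask) >> stoLocalBit
	return ((uint32(1) << n) >> 2) << 2
}

// stOtherFor is a hypothetical helper: it sets the "two prologue
// instructions" code only when the symbol was compiled with -shared.
func stOtherFor(compiledShared bool) uint8 {
	if compiledShared {
		return 3 << stoLocalBit // code 3 decodes to 8 bytes, i.e. two 4-byte instructions
	}
	return 0
}

func main() {
	fmt.Println(localEntryOffset(stOtherFor(true)))
	fmt.Println(localEntryOffset(stOtherFor(false)))
}
```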
Fixes #20290.
Change-Id: Ib2b77e16aef38824871102e3c244fcf04a86c6ea
Reviewed-on: https://go-review.googlesource.com/43051
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
where possible
When the stack register is decremented to acquire stack space at
the beginning of a function, a MOVDU should be used so it is done
atomically, unless the size of the stack frame is too large for
that instruction. The code to determine whether to use MOVDU
or MOVD was checking if the function was a leaf and always generating MOVD
when it was. The choice of MOVD vs. MOVDU should only depend on the stack
frame size. This fixes that problem.
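The corrected rule can be sketched as a function of the frame size alone. This is an illustration only: the function name is hypothetical, and the limit is an assumption based on a 16-bit signed displacement; the real assembler uses its own constant.

```go
package main

import "fmt"

// maxUpdateFrame is an assumed limit: the largest frame whose negated size
// still fits in a 16-bit signed displacement.
const maxUpdateFrame = 1 << 15

// prologueStore is a hypothetical helper that picks the instruction used to
// allocate the frame. Note that it does not look at whether the function is
// a leaf; only the frame size matters.
func prologueStore(frameSize int64) string {
	if frameSize <= maxUpdateFrame {
		return "MOVDU" // store and update SP in one instruction
	}
	return "MOVD" // frame too large: store and adjust SP separately
}

func main() {
	fmt.Println(prologueStore(64))
	fmt.Println(prologueStore(1 << 20))
}
```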
Change-Id: I0e49c79036f1e8f7584179e1442b938fc6da085f
Reviewed-on: https://go-review.googlesource.com/41813
Reviewed-by: Michael Munday <munday@ca.ibm.com>
|
|
There were only two versions, 0 and 1,
and the only user of version 1 was the assembler,
to indicate that a symbol was static.
Rename LSym.Version to Static,
and add it to LSym.Attributes.
Simplify call-sites.
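A minimal sketch of what folding a two-valued version into an attribute bit can look like. The type and constant names are modeled on the description above but simplified; the exact set of attributes and their bit positions are assumptions.

```go
package main

import "fmt"

// Attribute is a small bit set of per-symbol flags.
type Attribute uint16

const (
	AttrDuplicateOK Attribute = 1 << iota
	AttrStatic                // replaces "Version == 1"
)

// Static reports whether the static bit is set.
func (a Attribute) Static() bool { return a&AttrStatic != 0 }

// Set turns flag on or off.
func (a *Attribute) Set(flag Attribute, value bool) {
	if value {
		*a |= flag
	} else {
		*a &^= flag
	}
}

// LSym is a simplified stand-in for a symbol that embeds its attributes.
type LSym struct {
	Name string
	Attribute
}

func main() {
	s := &LSym{Name: "helper"}
	s.Set(AttrStatic, true) // previously expressed as a version of 1
	fmt.Println(s.Name, s.Static())
}
```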
Passes toolstash-check.
Change-Id: Iabd39918f5019cce78f381d13f0481ae09f3871f
Reviewed-on: https://go-review.googlesource.com/41201
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
Now only cmd/asm and cmd/compile depend on cmd/internal/obj. Changing
the assembler backends no longer requires reinstalling cmd/link or
cmd/addr2line.
There's also now one canonical definition of the object file format in
cmd/internal/objabi/doc.go, with a warning to update all three
implementations.
objabi is still something of a grab bag of unrelated code (e.g., flag
and environment variable handling probably belong in a separate "tool"
package), but this is still progress.
Fixes #15165.
Fixes #20026.
Change-Id: Ic4b92fac7d0d35438e0d20c9579aad4085c5534c
Reviewed-on: https://go-review.googlesource.com/40972
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
Automated refactoring using github.com/mdempsky/unbed (to rewrite
s.Foo to s.FuncInfo.Foo) and then gorename (to rename the FuncInfo
field to just Func).
Passes toolstash-check -all.
Change-Id: I802c07a1239e0efea058a91a87c5efe12170083a
Reviewed-on: https://go-review.googlesource.com/40670
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
Prior to this CL, flags such as NOSPLIT
on ATEXT Progs were stored in From3.Offset.
Some but not all of those flags were also
duplicated into From.Sym.Attribute.
This CL migrates all of those flags into
From.Sym.Attribute and stops creating a From3.
A side-effect of this is that printing an
ATEXT Prog can no longer simply dump From3.Offset.
That's kind of good, since the raw flag value
wasn't very informative anyway, but it did
necessitate a bunch of updates to the cmd/asm tests.
The reason I'm doing this work now is that
avoiding storing flags in both From.Sym and From3.Offset
simplifies some other changes to fix the data
race first described in CL 40254.
This CL almost passes toolstash-check -all.
The only changes are in cases where the assembler
has decided that a function's flags may be altered,
e.g. to make a function with no calls in it NOSPLIT.
Prior to this CL, that information was not printed.
Sample before:
"".Ctz64 t=1 size=63 args=0x10 locals=0x0
0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35) TEXT "".Ctz64(SB), $0-16
0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35) FUNCDATA $0, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB)
Sample after:
"".Ctz64 t=1 nosplit size=63 args=0x10 locals=0x0
0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35) TEXT "".Ctz64(SB), NOSPLIT, $0-16
0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35) FUNCDATA $0, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB)
Observe the additional "nosplit" in the first line
and the additional "NOSPLIT" in the second line.
Updates #15756
Change-Id: I5c59bd8f3bdc7c780361f801d94a261f0aef3d13
Reviewed-on: https://go-review.googlesource.com/40495
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
CL 39922 made the arm assembler concurrency-safe.
This CL does the same, but for ppc64.
The approach is similar: introduce ctxt9 to hold
function-local state and thread it through
the assembler as necessary.
One race remains after this CL, similar to CL 40252.
That race is conceptually unrelated to this refactoring,
and will be addressed in a separate CL.
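The general idea of moving function-local state into a per-function context can be sketched as follows. `funcCtxt` is a hypothetical analogue of ctxt9 with made-up fields; the point is only that each goroutine owns its own state instead of sharing package-level variables.

```go
package main

import (
	"fmt"
	"sync"
)

// funcCtxt holds all state used while assembling one function
// (hypothetical fields, for illustration).
type funcCtxt struct {
	name string
	pc   int64
}

// assemble pretends to lay out instrs fixed-size instructions.
func (c *funcCtxt) assemble(instrs int) int64 {
	for i := 0; i < instrs; i++ {
		c.pc += 4 // every instruction is 4 bytes in this sketch
	}
	return c.pc
}

// assembleAll assembles every function concurrently. Each goroutine creates
// its own funcCtxt, so the only shared state is the result map, which is
// guarded by a mutex.
func assembleAll(sizes map[string]int) map[string]int64 {
	var mu sync.Mutex
	var wg sync.WaitGroup
	out := make(map[string]int64)
	for name, n := range sizes {
		wg.Add(1)
		go func(name string, n int) {
			defer wg.Done()
			c := &funcCtxt{name: name}
			size := c.assemble(n)
			mu.Lock()
			out[name] = size
			mu.Unlock()
		}(name, n)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(assembleAll(map[string]int{"f": 2, "g": 5}))
}
```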
Passes toolstash-check -all.
Updates #15756
Change-Id: Icc37d9a971bed2184c8e66b1a64f4f2e556dc207
Reviewed-on: https://go-review.googlesource.com/40372
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Updates #19865
Change-Id: I24fbf5d79b5e4cac09c14cfff678a8215397b670
Reviewed-on: https://go-review.googlesource.com/39914
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
CL 38662 changed the x86 assembler to be eagerly
initialized, for a concurrent backend.
This CL puts in place a proper mechanism for doing so,
and switches all architectures to use it.
Passes toolstash-check -all.
Updates #15756
Change-Id: Id2aa527d3a8259c95797d63a2f0d1123e3ca2a1c
Reviewed-on: https://go-review.googlesource.com/39917
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
This is a straightforward refactoring,
to reduce the scope of upcoming changes.
The symbol size and AttrLocal=true were not
set universally, but it appears not to matter,
since toolstash -cmp is happy.
Passes toolstash-check -all.
Change-Id: I7f8392f939592d3a1bc6f61dec992f5661f42fca
Reviewed-on: https://go-review.googlesource.com/39791
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
It was simply a wrapper around Link.Lookup.
Unwrap everything.
CL prepared using eg with template:
package p
import "cmd/internal/obj"
func before(ctxt *obj.Link, name string, version int) *obj.LSym {
	return obj.Linklookup(ctxt, name, version)
}
func after(ctxt *obj.Link, name string, version int) *obj.LSym {
	return ctxt.Lookup(name, version)
}
Then one comment in cmd/asm/internal/asm/parse.go
was manually updated (and gofmt'ed!),
and func Linklookup deleted.
Passes toolstash-check (as a sanity measure).
Change-Id: Icc4d56b0b2b5c8888d3184c1898c48359ea1e638
Reviewed-on: https://go-review.googlesource.com/39715
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
The existing bulk Prog allocator is not concurrency-safe.
To allow for concurrency-safe bulk allocation of Progs,
I want to move Prog allocation and caching upstream,
to the clients of cmd/internal/obj.
This is a preliminary enabling refactoring.
After this CL, instead of calling Ctxt.NewProg
throughout the assemblers, we thread through
a newprog function that returns a new Prog.
That function is set up to be Ctxt.NewProg,
so there are no real changes in this CL;
this CL only establishes the plumbing.
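The plumbing described above can be sketched with simplified stand-in types. `appendNop` is a hypothetical piece of backend code; the only point is that it asks the caller-supplied function for a new Prog rather than calling the context directly.

```go
package main

import "fmt"

// Prog is a simplified stand-in for an instruction in a linked list.
type Prog struct {
	As   string
	Link *Prog
}

// ProgAlloc is a function that returns a fresh Prog.
type ProgAlloc func() *Prog

// Link is a simplified stand-in for the assembler context.
type Link struct {
	allocated int // counts allocations, for illustration only
}

// NewProg allocates a Prog from the context.
func (ctxt *Link) NewProg() *Prog {
	ctxt.allocated++
	return new(Prog)
}

// appendNop inserts a NOP after p, obtaining memory from newprog.
func appendNop(p *Prog, newprog ProgAlloc) *Prog {
	q := newprog()
	q.As = "NOP"
	q.Link = p.Link
	p.Link = q
	return q
}

func main() {
	ctxt := &Link{}
	p := ctxt.NewProg()
	// For now the allocator is simply ctxt.NewProg, so behavior is unchanged.
	q := appendNop(p, ctxt.NewProg)
	fmt.Println(q.As, ctxt.allocated)
}
```

Because the backend only sees a `ProgAlloc`, a client could later pass in a different allocator without touching the backend code.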
Passes toolstash-check -all.
Negligible compiler performance impact.
Updates #15756
name old time/op new time/op delta
Template 213ms ± 3% 214ms ± 4% ~ (p=0.574 n=49+47)
Unicode 90.1ms ± 5% 89.9ms ± 4% ~ (p=0.417 n=50+49)
GoTypes 585ms ± 4% 584ms ± 3% ~ (p=0.466 n=49+49)
SSA 6.50s ± 3% 6.52s ± 2% ~ (p=0.251 n=49+49)
Flate 128ms ± 4% 128ms ± 4% ~ (p=0.673 n=49+50)
GoParser 152ms ± 3% 152ms ± 3% ~ (p=0.810 n=48+49)
Reflect 372ms ± 4% 372ms ± 5% ~ (p=0.778 n=49+50)
Tar 113ms ± 5% 111ms ± 4% -0.98% (p=0.016 n=50+49)
XML 208ms ± 3% 208ms ± 2% ~ (p=0.483 n=47+49)
[Geo mean] 285ms 285ms -0.17%
name old user-ns/op new user-ns/op delta
Template 253M ± 8% 254M ± 9% ~ (p=0.899 n=50+50)
Unicode 106M ± 9% 106M ±11% ~ (p=0.642 n=50+50)
GoTypes 736M ± 4% 740M ± 4% ~ (p=0.121 n=50+49)
SSA 8.82G ± 3% 8.88G ± 2% +0.65% (p=0.006 n=49+48)
Flate 147M ± 4% 147M ± 5% ~ (p=0.844 n=47+48)
GoParser 179M ± 4% 178M ± 6% ~ (p=0.785 n=50+50)
Reflect 443M ± 6% 441M ± 5% ~ (p=0.850 n=48+47)
Tar 126M ± 5% 126M ± 5% ~ (p=0.734 n=50+50)
XML 244M ± 5% 244M ± 5% ~ (p=0.594 n=49+50)
[Geo mean] 341M 341M +0.11%
Change-Id: Ice962f61eb3a524c2db00a166cb582c22caa7d68
Reviewed-on: https://go-review.googlesource.com/39633
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
There are always 128 bytes available below the stackguard. Allow functions
with medium-sized stack frames to use this, potentially allowing them to
avoid growing the stack.
This change makes all architectures use the same calculation as x86.
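A minimal sketch of the comparison this describes, for small and medium frames only. The 128-byte figure comes from the text; the function name and the plain integer arithmetic are assumptions, and the special handling needed for very large frames is deliberately left out.

```go
package main

import "fmt"

// stackSmall is the number of bytes always available below the stackguard.
const stackSmall = 128

// needsMoreStack reports whether a function with the given frame size must
// grow the stack, given the current stack pointer and the stack guard.
// Very large frames need extra care and are not handled in this sketch.
func needsMoreStack(sp, guard, frameSize int64) bool {
	if frameSize <= stackSmall {
		// Small frame: it may extend into the 128 bytes below the guard.
		return sp <= guard
	}
	// Medium frame: only the part beyond those 128 bytes counts.
	return sp-(frameSize-stackSmall) <= guard
}

func main() {
	const guard = 1000
	fmt.Println(needsMoreStack(1100, guard, 100))
	fmt.Println(needsMoreStack(1100, guard, 200))
	fmt.Println(needsMoreStack(1100, guard, 300))
}
```

With a stack pointer 100 bytes above the guard, the 200-byte frame still fits because 128 of its bytes may sit below the guard, while a naive `sp - frameSize <= guard` test would have forced a stack growth.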
Change-Id: I2afb1a7c686ae5a933e50903b31ea4106e4cd0a0
Reviewed-on: https://go-review.googlesource.com/38734
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Change-Id: I9ac274dbfe887675a7820d2f8f87b5887b1c9b0e
Reviewed-on: https://go-review.googlesource.com/38383
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Stack barriers were removed in CL 36620.
Change-Id: If124d65a73a7b344a42be2a4b386a14d7a0a428b
Reviewed-on: https://go-review.googlesource.com/38169
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
The Follow pass in the assembler backend reorders and copies
instructions. This even applies to hand-written assembly code,
which in many cases should not be reordered. Now that the
SSA compiler does a good job of laying out instructions, the
benefit of this pass is very small:
AMD64: (old = with Follow, new = without Follow)
name old time/op new time/op delta
BinaryTree17-12 2.78s ± 1% 2.79s ± 1% +0.44% (p=0.000 n=20+19)
Fannkuch11-12 3.11s ± 0% 3.31s ± 1% +6.16% (p=0.000 n=19+19)
FmtFprintfEmpty-12 50.9ns ± 1% 51.6ns ± 3% +1.40% (p=0.000 n=17+20)
FmtFprintfString-12 127ns ± 0% 128ns ± 1% +0.88% (p=0.000 n=17+17)
FmtFprintfInt-12 122ns ± 0% 123ns ± 1% +0.76% (p=0.000 n=20+19)
FmtFprintfIntInt-12 185ns ± 1% 186ns ± 1% +0.65% (p=0.000 n=20+19)
FmtFprintfPrefixedInt-12 192ns ± 1% 202ns ± 1% +4.99% (p=0.000 n=20+19)
FmtFprintfFloat-12 284ns ± 0% 288ns ± 0% +1.33% (p=0.000 n=15+19)
FmtManyArgs-12 807ns ± 0% 804ns ± 0% -0.44% (p=0.000 n=16+18)
GobDecode-12 7.23ms ± 1% 7.21ms ± 1% ~ (p=0.052 n=20+20)
GobEncode-12 6.09ms ± 1% 6.12ms ± 1% +0.41% (p=0.002 n=19+19)
Gzip-12 253ms ± 1% 255ms ± 1% +0.95% (p=0.000 n=18+20)
Gunzip-12 38.4ms ± 0% 38.5ms ± 0% +0.34% (p=0.000 n=17+17)
HTTPClientServer-12 95.4µs ± 2% 96.1µs ± 1% +0.78% (p=0.002 n=19+19)
JSONEncode-12 16.5ms ± 1% 16.6ms ± 1% +1.17% (p=0.000 n=19+19)
JSONDecode-12 54.6ms ± 1% 55.3ms ± 1% +1.23% (p=0.000 n=18+18)
Mandelbrot200-12 4.47ms ± 0% 4.47ms ± 0% +0.06% (p=0.000 n=18+18)
GoParse-12 3.47ms ± 1% 3.47ms ± 1% ~ (p=0.583 n=20+20)
RegexpMatchEasy0_32-12 84.8ns ± 1% 85.2ns ± 2% +0.51% (p=0.022 n=20+20)
RegexpMatchEasy0_1K-12 206ns ± 1% 206ns ± 1% ~ (p=0.770 n=20+20)
RegexpMatchEasy1_32-12 82.8ns ± 1% 83.4ns ± 1% +0.64% (p=0.000 n=20+19)
RegexpMatchEasy1_1K-12 363ns ± 1% 361ns ± 1% -0.48% (p=0.007 n=20+20)
RegexpMatchMedium_32-12 126ns ± 1% 126ns ± 0% +0.72% (p=0.000 n=20+20)
RegexpMatchMedium_1K-12 39.1µs ± 1% 39.8µs ± 0% +1.73% (p=0.000 n=19+19)
RegexpMatchHard_32-12 1.97µs ± 0% 1.98µs ± 1% +0.29% (p=0.005 n=18+20)
RegexpMatchHard_1K-12 59.5µs ± 1% 59.8µs ± 1% +0.36% (p=0.000 n=18+20)
Revcomp-12 442ms ± 1% 445ms ± 2% +0.67% (p=0.000 n=19+20)
Template-12 58.0ms ± 1% 57.5ms ± 1% -0.85% (p=0.000 n=19+19)
TimeParse-12 311ns ± 0% 314ns ± 0% +0.94% (p=0.000 n=20+18)
TimeFormat-12 350ns ± 3% 346ns ± 0% ~ (p=0.076 n=20+19)
[Geo mean] 55.9µs 56.4µs +0.80%
ARM32:
name old time/op new time/op delta
BinaryTree17-4 30.4s ± 0% 30.1s ± 0% -1.14% (p=0.000 n=10+8)
Fannkuch11-4 13.7s ± 0% 13.6s ± 0% -0.75% (p=0.000 n=10+10)
FmtFprintfEmpty-4 664ns ± 1% 651ns ± 1% -1.96% (p=0.000 n=7+8)
FmtFprintfString-4 1.83µs ± 2% 1.77µs ± 2% -3.21% (p=0.000 n=10+10)
FmtFprintfInt-4 1.57µs ± 2% 1.54µs ± 2% -2.25% (p=0.007 n=10+10)
FmtFprintfIntInt-4 2.37µs ± 2% 2.31µs ± 1% -2.68% (p=0.000 n=10+10)
FmtFprintfPrefixedInt-4 2.14µs ± 2% 2.10µs ± 1% -1.83% (p=0.006 n=10+10)
FmtFprintfFloat-4 3.69µs ± 2% 3.74µs ± 1% +1.60% (p=0.000 n=10+10)
FmtManyArgs-4 9.43µs ± 1% 9.17µs ± 1% -2.70% (p=0.000 n=10+10)
GobDecode-4 76.3ms ± 1% 75.5ms ± 1% -1.14% (p=0.003 n=10+10)
GobEncode-4 70.7ms ± 2% 69.0ms ± 1% -2.36% (p=0.000 n=10+10)
Gzip-4 2.64s ± 1% 2.65s ± 0% +0.59% (p=0.002 n=10+10)
Gunzip-4 402ms ± 0% 398ms ± 0% -1.11% (p=0.000 n=10+9)
HTTPClientServer-4 458µs ± 0% 457µs ± 0% ~ (p=0.247 n=10+10)
JSONEncode-4 171ms ± 0% 172ms ± 0% +0.56% (p=0.000 n=10+10)
JSONDecode-4 672ms ± 1% 668ms ± 1% ~ (p=0.105 n=10+10)
Mandelbrot200-4 33.5ms ± 0% 33.5ms ± 0% ~ (p=0.156 n=9+10)
GoParse-4 33.9ms ± 0% 34.0ms ± 0% +0.36% (p=0.031 n=9+9)
RegexpMatchEasy0_32-4 823ns ± 1% 835ns ± 1% +1.49% (p=0.000 n=8+8)
RegexpMatchEasy0_1K-4 3.99µs ± 0% 4.02µs ± 1% +0.92% (p=0.000 n=8+10)
RegexpMatchEasy1_32-4 877ns ± 3% 904ns ± 2% +3.07% (p=0.012 n=10+10)
RegexpMatchEasy1_1K-4 5.99µs ± 0% 5.97µs ± 1% -0.38% (p=0.023 n=8+8)
RegexpMatchMedium_32-4 1.40µs ± 2% 1.40µs ± 2% ~ (p=0.590 n=10+9)
RegexpMatchMedium_1K-4 357µs ± 0% 355µs ± 1% -0.72% (p=0.000 n=7+8)
RegexpMatchHard_32-4 22.3µs ± 0% 22.1µs ± 0% -0.49% (p=0.000 n=8+7)
RegexpMatchHard_1K-4 661µs ± 0% 658µs ± 0% -0.42% (p=0.000 n=8+7)
Revcomp-4 46.3ms ± 0% 46.3ms ± 0% ~ (p=0.393 n=10+10)
Template-4 753ms ± 1% 750ms ± 0% ~ (p=0.211 n=10+9)
TimeParse-4 4.28µs ± 1% 4.22µs ± 1% -1.34% (p=0.000 n=8+10)
TimeFormat-4 9.00µs ± 0% 9.05µs ± 0% +0.59% (p=0.000 n=10+10)
[Geo mean] 538µs 535µs -0.55%
ARM64:
name old time/op new time/op delta
BinaryTree17-8 8.39s ± 0% 8.39s ± 0% ~ (p=0.684 n=10+10)
Fannkuch11-8 5.95s ± 0% 5.99s ± 0% +0.63% (p=0.000 n=10+10)
FmtFprintfEmpty-8 116ns ± 0% 116ns ± 0% ~ (all equal)
FmtFprintfString-8 361ns ± 0% 360ns ± 0% -0.31% (p=0.003 n=8+6)
FmtFprintfInt-8 290ns ± 0% 290ns ± 0% ~ (p=0.620 n=9+9)
FmtFprintfIntInt-8 476ns ± 1% 469ns ± 0% -1.47% (p=0.000 n=10+6)
FmtFprintfPrefixedInt-8 412ns ± 2% 417ns ± 2% +1.39% (p=0.006 n=9+10)
FmtFprintfFloat-8 652ns ± 1% 652ns ± 0% ~ (p=0.161 n=10+8)
FmtManyArgs-8 1.94µs ± 0% 1.94µs ± 2% ~ (p=0.781 n=10+10)
GobDecode-8 17.7ms ± 1% 17.7ms ± 0% ~ (p=0.962 n=10+7)
GobEncode-8 15.6ms ± 0% 15.6ms ± 1% ~ (p=0.063 n=10+10)
Gzip-8 786ms ± 0% 787ms ± 0% ~ (p=0.356 n=10+9)
Gunzip-8 127ms ± 0% 127ms ± 0% +0.08% (p=0.028 n=10+9)
HTTPClientServer-8 198µs ± 6% 198µs ± 7% ~ (p=0.796 n=10+10)
JSONEncode-8 42.5ms ± 0% 42.2ms ± 0% -0.73% (p=0.000 n=9+8)
JSONDecode-8 158ms ± 1% 162ms ± 0% +2.28% (p=0.000 n=10+9)
Mandelbrot200-8 10.1ms ± 0% 10.1ms ± 0% -0.01% (p=0.000 n=10+9)
GoParse-8 8.54ms ± 1% 8.63ms ± 1% +1.06% (p=0.000 n=10+9)
RegexpMatchEasy0_32-8 231ns ± 1% 225ns ± 0% -2.52% (p=0.000 n=9+10)
RegexpMatchEasy0_1K-8 1.63µs ± 0% 1.63µs ± 0% ~ (p=0.170 n=10+10)
RegexpMatchEasy1_32-8 253ns ± 0% 249ns ± 0% -1.41% (p=0.000 n=9+10)
RegexpMatchEasy1_1K-8 2.08µs ± 0% 2.08µs ± 0% -0.32% (p=0.000 n=9+10)
RegexpMatchMedium_32-8 355ns ± 1% 351ns ± 0% -1.04% (p=0.007 n=10+7)
RegexpMatchMedium_1K-8 104µs ± 0% 104µs ± 0% ~ (p=0.148 n=10+10)
RegexpMatchHard_32-8 5.79µs ± 0% 5.79µs ± 0% ~ (p=0.578 n=10+10)
RegexpMatchHard_1K-8 176µs ± 0% 176µs ± 0% ~ (p=0.137 n=10+10)
Revcomp-8 1.37s ± 1% 1.36s ± 1% -0.26% (p=0.023 n=10+10)
Template-8 151ms ± 1% 154ms ± 1% +2.14% (p=0.000 n=9+10)
TimeParse-8 723ns ± 2% 721ns ± 1% ~ (p=0.592 n=10+10)
TimeFormat-8 804ns ± 2% 798ns ± 3% ~ (p=0.344 n=10+10)
[Geo mean] 154µs 154µs -0.02%
Therefore remove this pass. This also reduces text size by 0.5% to 2%.
Comment out some dead code in runtime/sys_nacl_amd64p32.s
which contains undefined symbols.
Change-Id: I1473986fe5b18b3d2554ce96cdc6f0999b8d955d
Reviewed-on: https://go-review.googlesource.com/36205
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
Change-Id: I7585d85907869f5a286b36936dfd035f1e8e9906
Reviewed-on: https://go-review.googlesource.com/34197
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
|
on PPC64, ARM64, S390X
The runtime traceback code assumes that a non-empty frame has the
link register saved on LR architectures. Make sure this is so in
the assembler.
Also make sure that LR is stored before SP is updated, so the traceback
code will not see a half-updated stack frame if a signal arrives
during the execution of the function prologue.
Fixes #17381.
Change-Id: I668b04501999b7f9b080275a2d1f8a57029cbbb3
Reviewed-on: https://go-review.googlesource.com/31760
Reviewed-by: Michael Munday <munday@ca.ibm.com>
|
|
Reduces the size of LSym struct.
On 32bit: before 84 after 76
On 64bit: before 136 after 128
name old time/op new time/op delta
Template 182ms ± 3% 182ms ± 3% ~ (p=0.607 n=19+20)
Unicode 93.5ms ± 4% 94.2ms ± 3% ~ (p=0.141 n=20+19)
GoTypes 608ms ± 1% 605ms ± 2% ~ (p=0.056 n=20+20)
name old user-ns/op new user-ns/op delta
Template 249M ± 7% 249M ± 4% ~ (p=0.605 n=18+19)
Unicode 149M ±14% 151M ± 5% ~ (p=0.724 n=20+17)
GoTypes 855M ± 4% 853M ± 3% ~ (p=0.537 n=19+19)
name old alloc/op new alloc/op delta
Template 40.3MB ± 0% 40.3MB ± 0% -0.11% (p=0.000 n=19+20)
Unicode 33.8MB ± 0% 33.8MB ± 0% -0.08% (p=0.000 n=20+20)
GoTypes 119MB ± 0% 119MB ± 0% -0.10% (p=0.000 n=19+20)
name old allocs/op new allocs/op delta
Template 383k ± 0% 383k ± 0% ~ (p=0.703 n=20+20)
Unicode 317k ± 0% 317k ± 0% ~ (p=0.982 n=19+18)
GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.086 n=20+20)
Change-Id: Id6ba0db3ecc4503a4e9af3ed0d5884d4366e8bf9
Reviewed-on: https://go-review.googlesource.com/31870
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Shahar Kohanim <skohanim@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This change omits the stack check on ppc64 and s390x when the size of
a stack frame is less than obj.StackSmall. This is an optimization
x86 already performs.
The effect on s390x isn't huge because we were already omitting the
stack check when the frame size was 0 (it shaves about 1K from the
size of bin/go). On ppc64 however this change reduces the size of the
.text section in bin/go by 33K (1%).
Updates #13379 (for ppc64).
Change-Id: I6af0eb987646bea47fcaf0a812db3496bab0f680
Reviewed-on: https://go-review.googlesource.com/31357
Reviewed-by: David Chase <drchase@google.com>
|
|
In the function prologue, we emit a jump to the beginning of
the function immediately after calling morestack. And in the
runtime stack growing code, it decodes and emulates that jump.
This emulation was necessary before we had per-PC SP deltas,
since the traceback code assumed that the frame size was fixed
for the whole function, except on the first instruction where
it was 0. Since we now have per-PC SP deltas and PCDATA, we
can correctly record that the frame size is 0. This makes the
emulation unnecessary.
This may be helpful for registerized calling convention, where
there may be unspills of arguments after calling morestack. It
also simplifies the runtime.
Change-Id: I7ebee31eaee81795445b33f521ab6a79624c4ceb
Reviewed-on: https://go-review.googlesource.com/30138
Reviewed-by: David Chase <drchase@google.com>
|
|
Replace the various calls to Fprintf(ctxt.Bso, ...) with a helper,
ctxt.Logf. This also addresses the various inconsistent flushing of
ctxt.Bso.
Because we have two Link structures, add Link.Logf in both places.
Change-Id: I23093f9b9b3bf33089a0ffd7f815f92dcd1a1fa1
Reviewed-on: https://go-review.googlesource.com/27730
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
In many places where ctx.Bso is used as the target for some debug
logging, ctx.Bso.Flush is called unconditionally. In the majority of
cases where debug logging is not enabled, this means Flush is called
many times when there is nothing to be flushed (it will be called anyway
when ctx.Bso is eventually closed), sometimes in a loop.
Avoid this by moving the ctx.Bso.Flush call into the same condition
block as the debug print. This pattern was previously applied
sporadically.
Change-Id: I0444cb235cc8b9bac51a59b2e44e59872db2be06
Reviewed-on: https://go-review.googlesource.com/27579
Run-TryBot: Dave Cheney <dave@cheney.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
The following performance improvements have been made to the
low-level atomic functions for ppc64le & ppc64:
- For those cases containing a lwarx and stwcx (or other sizes):
sync, lwarx, maybe something, stwcx, loop to sync, sync, isync
The sync is moved before (outside) the lwarx/stwcx loop, and the
sync after is removed, so it becomes:
sync, lwarx, maybe something, stwcx, loop to lwarx, isync
- For the Or8 and And8, the shifting and manipulation of the
address to the word aligned version were removed and the
instructions were changed to use lbarx, stbcx instead of
register shifting, xor, then lwarx, stwcx.
- New instructions LWSYNC, LBAR, STBCC were tested and added.
runtime/atomic_ppc64x.s was changed to use the LWSYNC opcode
instead of the WORD encoding.
Fixes #15469
Ran some of the benchmarks in the runtime and sync directories.
Some results varied from run to run but the trend was improvement
based on best times for base and new:
runtime.test:
BenchmarkChanNonblocking-128 0.88 0.89 +1.14%
BenchmarkChanUncontended-128 569 511 -10.19%
BenchmarkChanContended-128 63110 53231 -15.65%
BenchmarkChanSync-128 691 598 -13.46%
BenchmarkChanSyncWork-128 11355 11649 +2.59%
BenchmarkChanProdCons0-128 2402 2090 -12.99%
BenchmarkChanProdCons10-128 1348 1363 +1.11%
BenchmarkChanProdCons100-128 1002 746 -25.55%
BenchmarkChanProdConsWork0-128 2554 2720 +6.50%
BenchmarkChanProdConsWork10-128 1909 1804 -5.50%
BenchmarkChanProdConsWork100-128 1624 1580 -2.71%
BenchmarkChanCreation-128 237 212 -10.55%
BenchmarkChanSem-128 705 667 -5.39%
BenchmarkChanPopular-128 5081190 4497566 -11.49%
BenchmarkCreateGoroutines-128 532 473 -11.09%
BenchmarkCreateGoroutinesParallel-128 35.0 34.7 -0.86%
BenchmarkCreateGoroutinesCapture-128 4923 4200 -14.69%
sync.test:
BenchmarkUncontendedSemaphore-128 112 94.2 -15.89%
BenchmarkContendedSemaphore-128 133 128 -3.76%
BenchmarkMutexUncontended-128 1.90 1.67 -12.11%
BenchmarkMutex-128 353 310 -12.18%
BenchmarkMutexSlack-128 304 283 -6.91%
BenchmarkMutexWork-128 554 541 -2.35%
BenchmarkMutexWorkSlack-128 567 556 -1.94%
BenchmarkMutexNoSpin-128 275 242 -12.00%
BenchmarkMutexSpin-128 1129 1030 -8.77%
BenchmarkOnce-128 1.08 0.96 -11.11%
BenchmarkPool-128 29.8 27.4 -8.05%
BenchmarkPoolOverflow-128 40564 36583 -9.81%
BenchmarkSemaUncontended-128 3.14 2.63 -16.24%
BenchmarkSemaSyntNonblock-128 1087 1069 -1.66%
BenchmarkSemaSyntBlock-128 897 893 -0.45%
BenchmarkSemaWorkNonblock-128 1034 1028 -0.58%
BenchmarkSemaWorkBlock-128 949 886 -6.64%
Change-Id: I4403fb29d3cd5254b7b1ce87a216bd11b391079e
Reviewed-on: https://go-review.googlesource.com/22549
Reviewed-by: Michael Munday <munday@ca.ibm.com>
Reviewed-by: Minux Ma <minux@golang.org>
|
|
Follows suit with https://go-review.googlesource.com/#/c/20111.
Generated by running
$ grep -R 'Go Authors.  All' * | cut -d":" -f1 | while read F;do perl -pi -e 's/Go Authors.  All/Go Authors. All/g' $F;done
The code in cmd/internal/unvendor wasn't changed.
Fixes #15213
Change-Id: I4f235cee0a62ec435f9e8540a1ec08ae03b1a75f
Reviewed-on: https://go-review.googlesource.com/21819
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Change-Id: I9bda2ce6f45fb8292503f86d8f9f161601f222b7
Reviewed-on: https://go-review.googlesource.com/22053
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
|
|
Information about CPU architectures (e.g., name, family, byte
ordering, pointer and register size) is currently redundantly
scattered around the source tree. Instead consolidate the basic
information into a single new package cmd/internal/sys.
Also, introduce new sys.I386, sys.AMD64, etc. names for the constants
'8', '6', etc. and replace most uses of the latter. The notable
exceptions are a couple of error messages that still refer to the old
char-based toolchain names and function reltype in cmd/link.
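A consolidated description of an architecture could be sketched like this. The field names follow the properties listed above (name, family, byte ordering, pointer and register size), but the exact shape of the types and the two example values are assumptions for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ArchFamily groups architectures that share a backend.
type ArchFamily byte

const (
	AMD64 ArchFamily = iota + 1
	PPC64
)

// Arch collects the basic facts about one architecture in one place.
type Arch struct {
	Name      string
	Family    ArchFamily
	ByteOrder binary.ByteOrder
	PtrSize   int
	RegSize   int
}

// Two variants of the same family that differ only in name and byte order.
var (
	ArchPPC64   = &Arch{Name: "ppc64", Family: PPC64, ByteOrder: binary.BigEndian, PtrSize: 8, RegSize: 8}
	ArchPPC64LE = &Arch{Name: "ppc64le", Family: PPC64, ByteOrder: binary.LittleEndian, PtrSize: 8, RegSize: 8}
)

func main() {
	for _, a := range []*Arch{ArchPPC64, ArchPPC64LE} {
		// Code can test the family by name instead of a one-character constant.
		fmt.Println(a.Name, a.Family == PPC64, a.ByteOrder, a.PtrSize)
	}
}
```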
Passes toolstash/buildall.
Change-Id: I8a6f0cbd49577ec1672a98addebc45f767e36461
Reviewed-on: https://go-review.googlesource.com/21623
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Change-Id: I91873aaebf79bdf1c00d38aacc1a1fb8d79656a7
Reviewed-on: https://go-review.googlesource.com/21433
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
No performance regression measurable:
name old time/op new time/op delta
Template 432ms ± 3% 422ms ± 2% -2.34% (p=0.010 n=10+9)
GoTypes 1.46s ± 1% 1.46s ± 1% ~ (p=0.796 n=10+10)
Compiler 7.15s ± 1% 7.14s ± 1% ~ (p=0.447 n=10+9)
Change-Id: I21b93cb989017b6fec2215de2423d87f25cf538c
Reviewed-on: https://go-review.googlesource.com/21220
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Passes toolstash/buildall.
Fixes #14692.
Change-Id: I4352678d8251309f2b8b7793674c550fac948006
Reviewed-on: https://go-review.googlesource.com/20350
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedent.
This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:
$ perl -i -npe 's,^(\s*// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.) +([A-Z])')
$ go test go/doc -update
Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
runtime.stackBarrier is a strange function: it is only ever "called" by
smashing its address into a LR slot on the stack. Calling it like this
certainly does not adhere to the rule that r12 is set to the global entry point
before calling it, and the prologue instructions that compute r2 from r12 in fact
just corrupt r2, which is bad because the function that stackBarrier returns to
probably uses r2 to access global data.
Fortunately stackBarrier itself does not access any global data and so does not
depend on the value of r2, meaning we can ignore the ABI rules and simply skip
inserting the prologue instructions into this specific function.
Fixes 64bit.go, append.go and fixedbugs/issue13169.go from "cd test; go run
run.go -linkshared".
Change-Id: I606864133a83935899398e2d42edd08a946aab24
Reviewed-on: https://go-review.googlesource.com/17281
Reviewed-by: Austin Clements <austin@google.com>
|
|
linking
Change-Id: Ie79f72786b1d7154f1910e717a0faf354b913b89
Reviewed-on: https://go-review.googlesource.com/15970
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
ppc64le
Change-Id: I79c60241df6c785f35371e70c777a7bd6e93571c
Reviewed-on: https://go-review.googlesource.com/15968
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
when compiling PIC
The PowerPC ISA does not have a PC-relative load instruction, which poses
obvious challenges when generating position-independent code. The way the ELFv2
ABI addresses this is to specify that r2 points to a per "module" (shared
library or executable) TOC pointer. Maintaining this pointer requires
cooperation between codegen and the system linker:
* Non-leaf functions leave space on the stack at r1+24 to save the TOC pointer.
* A call to a function that *might* have to go via a PLT stub must be followed
by a nop instruction that the system linker can replace with "ld r2, 24(r1)"
to restore the TOC pointer (only when dynamically linking Go code).
* When calling a function via a function pointer, the address of the function
must be in r12, and the first couple of instructions (the "global entry
point") of the called function use this to derive the address of the TOC
for the module it is in.
* When calling a function that is implemented in the same module, the system
linker adjusts the call to skip over the instructions mentioned above (the
"local entry point"), assuming that r2 is already correctly set.
So this changeset adds the global entry point instructions, sets the metadata so
the system linker knows where the local entry point is, inserts code to save the
TOC pointer at 24(r1), adds a nop after any call not known to be local and copes
with the odd non-local code transfer in the runtime (e.g. the stuff around
jmpdefer). It does not actually compile PIC yet.
Change-Id: I7522e22bdfd2f891745a900c60254fe9e372c854
Reviewed-on: https://go-review.googlesource.com/15967
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
stack
Shared libraries on ppc64le will require a larger minimum stack frame (because
the ABI mandates that the TOC pointer is available at 24(R1)). Part 2a of
preparing for that is to have every piece of arch-independent and ppc64-specific
codegen that needs to know the minimum frame size call a function to find out.
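The idea of asking a single function for the minimum frame size can be sketched as follows. The function name, its parameters and the non-shared fallback are assumptions made for illustration and are not taken from the CL. Only the 24(R1) offset comes from the message above.

```go
package main

import "fmt"

// tocSaveOffset is the stack offset at which the ABI expects the TOC
// pointer to be available, per the commit message: 24(R1).
const tocSaveOffset = 24

// fixedFrameSize is a hypothetical helper: every caller that needs the
// minimum frame size asks here instead of hard-coding a constant.
func fixedFrameSize(ptrSize int64, needsTOCSlot bool) int64 {
	if needsTOCSlot {
		// The frame must be large enough to contain the slot at 24(R1).
		return tocSaveOffset + ptrSize
	}
	// Assumption: otherwise a single pointer-sized slot is reserved.
	return ptrSize
}

func main() {
	fmt.Println(fixedFrameSize(8, false), fixedFrameSize(8, true))
}
```

Routing every query through one function means the later switch to a larger frame only has to change one place.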
Change-Id: I55899f73037e92227813c491049a3bd6f30bd41f
Reviewed-on: https://go-review.googlesource.com/15524
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
on ppc64x
Replace the confusing game where a frame size of $-8 would suppress the
implicit setting up of a stack frame with a nice explicit flag.
The code to set up the function prologue is still a little confusing but better
than it was.
Change-Id: I1d49278ff42c6bc734ebfb079998b32bc53f8d9a
Reviewed-on: https://go-review.googlesource.com/15670
Reviewed-by: Minux Ma <minux@golang.org>
|
|
This lets us re-enable duffzero.
Fixes #12108
Change-Id: Iefd24d26eaa56067caa2c29ff99cd20a42d8714a
Reviewed-on: https://go-review.googlesource.com/14937
Reviewed-by: Keith Randall <khr@golang.org>
|
|
obj.ARET is the portable return mnemonic. ppc64.ARETURN is a legacy
alias.
This was done with
sed -i s/ppc64\.ARETURN/obj.ARET/ cmd/compile/**/*.go
sed -i s/ARETURN/obj.ARET/ cmd/internal/obj/ppc64/obj9.go
Change-Id: I4d8e83ff411cee764774a40ef4c7c34dcbca4e43
Reviewed-on: https://go-review.googlesource.com/10673
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
|
|
This is a follow up to rev 443a32e707d2 which reduces some of the
duplication between methods and functions that operate on obj.Biobuf.
obj.Biobuf has Flush and Write methods as well as helpers which duplicate
those methods; consolidate on the former and remove the latter.
Also, address a final comment from CL 9525.
Change-Id: I67deaf3a163bb489a9bb21bb39524785d7a2f6c5
Reviewed-on: https://go-review.googlesource.com/9527
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
This started out as an attempt to remove Bool2int calls, which it does a bit,
but it mostly ended up removing the Link.Symmorestack array, which seemed a
pointless bit of caching.
Change-Id: I91a51eb08cb4b08f3f9f093b575306499267b67a
Reviewed-on: https://go-review.googlesource.com/9239
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
implementations
There were 10 implementations of the trivial bool2int function, 9 of which
were the only thing in their file. Remove all of them in favor of one in
cmd/internal/obj.
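For context, a bool2int helper of the kind described is typically only a few lines. The version below is a minimal sketch, not a copy of the code in cmd/internal/obj.

```go
package main

import "fmt"

// bool2int converts a bool to 0 or 1. Go has no implicit conversion from
// bool to int, which is why C-to-Go translated code needed such a helper.
func bool2int(b bool) int {
	if b {
		return 1
	}
	return 0
}

func main() {
	fmt.Println(bool2int(true), bool2int(false))
}
```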
Change-Id: I9c51d30716239df51186860b9842a5e9b27264d3
Reviewed-on: https://go-review.googlesource.com/9230
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
The c2go translation left a lot of case expressions on separate lines.
Merge expressions onto single lines subject to these constraints:
* Max 4 clauses, all literals or names
* Don't move expressions with comments
The change was created by running http://play.golang.org/p/yHajs72h-g:
$ mergecase cmd/internal/{ld,gc,obj}/*.go cmd/internal/obj/*/*.go
Passes toolstash -cmp.
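As an illustration of the rewrite, the two functions below show the same switch before and after merging. The opcode names are hypothetical, and the example was written by hand rather than produced by the mergecase tool.

```go
package main

import "fmt"

// Hypothetical opcodes, for illustration only.
const (
	opAdd = iota
	opSub
	opMul
	opJmp
)

// isArithSplit shows the c2go style: one case expression per line.
func isArithSplit(op int) bool {
	switch op {
	case opAdd,
		opSub,
		opMul:
		return true
	}
	return false
}

// isArithMerged is the same switch after merging: at most four clauses,
// all of them names, and no comments attached to any of them.
func isArithMerged(op int) bool {
	switch op {
	case opAdd, opSub, opMul:
		return true
	}
	return false
}

func main() {
	fmt.Println(isArithSplit(opSub), isArithMerged(opSub))
}
```

Both forms compile to the same behavior, which is consistent with the change passing toolstash -cmp.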
Change-Id: Iba41b390d302e5486e5dc6ba7599a92270676556
Reviewed-on: https://go-review.googlesource.com/7593
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
|
|
An interface{} is more in the spirit of the original union.
By my calculations, on 64-bit systems this reduces
Addr from 120 to 80 bytes, and Prog from 592 to 424 bytes.
Change-Id: I0d7b0981513c2a3c94c9ac76bb4f8816485b5a3c
Reviewed-on: https://go-review.googlesource.com/7744
Reviewed-by: Rob Pike <r@golang.org>
|
|
These were introduced during C -> Go translation when the loop increment
contained multiple statements.
Change-Id: Ic8abd8dcb3308851a1f7024de00711f0f984e684
Reviewed-on: https://go-review.googlesource.com/7627
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Rob Pike <r@golang.org>
|