| Age | Commit message (Collapse) | Author |
|
|
|
This reverts CL 529738.
Reason for revert: Breaking longtest builders
For #58102.
Fixes #65220.
Change-Id: Id295e3249da9d82f6a9e4fc571760302a1362def
Reviewed-on: https://go-review.googlesource.com/c/go/+/557460
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The pgo compilation time is very long if the profile file is large.
We added a preprocess tool to pre-parse profile file in order to
expedite the compile time.
Change-Id: I6f50bbd01f242448e2463607a9b63483c6ca9a12
Reviewed-on: https://go-review.googlesource.com/c/go/+/529738
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This commit is aimed at improving the readability and consistency
of the code base. Extraneous newline characters were present after
some return statements, creating unnecessary separation in the code.
Fixes #64610
Change-Id: Ic1b05bf11761c4dff22691c2f1c3755f66d341f7
Reviewed-on: https://go-review.googlesource.com/c/go/+/548316
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Move ChaCha8 code into internal/chacha8rand and use it to implement
runtime.rand, which is used for the unseeded global source for
both math/rand and math/rand/v2. This also affects the calculation of
the start point for iteration over very very large maps (when the
32-bit fastrand is not big enough).
The benefit is that misuse of the global random number generators
in math/rand and math/rand/v2 in contexts where non-predictable
randomness is important for security reasons is no longer a
security problem, removing a common mistake among programmers
who are unaware of the different kinds of randomness.
The cost is an extra 304 bytes per thread stored in the m struct
plus 2-3ns more per random uint64 due to the more sophisticated
algorithm. Using PCG looks like it would cost about the same,
although I haven't benchmarked that.
Before this, the math/rand and math/rand/v2 global generator
was wyrand (https://github.com/wangyi-fudan/wyhash).
For math/rand, using wyrand instead of the Mitchell/Reeds/Thompson
ALFG was justifiable, since the latter was not any better.
But for math/rand/v2, the global generator really should be
at least as good as one of the well-studied, specific algorithms
provided directly by the package, and it's not.
(Wyrand is still reasonable for scheduling and cache decisions.)
Good randomness does have a cost: about twice wyrand.
Also rationalize the various runtime rand references.
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ bbb48afeb7.amd64 │ 5cf807d1ea.amd64 │
│ sec/op │ sec/op vs base │
ChaCha8-32 1.862n ± 2% 1.861n ± 2% ~ (p=0.825 n=20)
PCG_DXSM-32 1.471n ± 1% 1.460n ± 2% ~ (p=0.153 n=20)
SourceUint64-32 1.636n ± 2% 1.582n ± 1% -3.30% (p=0.000 n=20)
GlobalInt64-32 2.087n ± 1% 3.663n ± 1% +75.54% (p=0.000 n=20)
GlobalInt64Parallel-32 0.1042n ± 1% 0.2026n ± 1% +94.48% (p=0.000 n=20)
GlobalUint64-32 2.263n ± 2% 3.724n ± 1% +64.57% (p=0.000 n=20)
GlobalUint64Parallel-32 0.1019n ± 1% 0.1973n ± 1% +93.67% (p=0.000 n=20)
Int64-32 1.771n ± 1% 1.774n ± 1% ~ (p=0.449 n=20)
Uint64-32 1.863n ± 2% 1.866n ± 1% ~ (p=0.364 n=20)
GlobalIntN1000-32 3.134n ± 3% 4.730n ± 2% +50.95% (p=0.000 n=20)
IntN1000-32 2.489n ± 1% 2.489n ± 1% ~ (p=0.683 n=20)
Int64N1000-32 2.521n ± 1% 2.516n ± 1% ~ (p=0.394 n=20)
Int64N1e8-32 2.479n ± 1% 2.478n ± 2% ~ (p=0.743 n=20)
Int64N1e9-32 2.530n ± 2% 2.514n ± 2% ~ (p=0.193 n=20)
Int64N2e9-32 2.501n ± 1% 2.494n ± 1% ~ (p=0.616 n=20)
Int64N1e18-32 3.227n ± 1% 3.205n ± 1% ~ (p=0.101 n=20)
Int64N2e18-32 3.647n ± 1% 3.599n ± 1% ~ (p=0.019 n=20)
Int64N4e18-32 5.135n ± 1% 5.069n ± 2% ~ (p=0.034 n=20)
Int32N1000-32 2.657n ± 1% 2.637n ± 1% ~ (p=0.180 n=20)
Int32N1e8-32 2.636n ± 1% 2.636n ± 1% ~ (p=0.763 n=20)
Int32N1e9-32 2.660n ± 2% 2.638n ± 1% ~ (p=0.358 n=20)
Int32N2e9-32 2.662n ± 2% 2.618n ± 2% ~ (p=0.064 n=20)
Float32-32 2.272n ± 2% 2.239n ± 2% ~ (p=0.194 n=20)
Float64-32 2.272n ± 1% 2.286n ± 2% ~ (p=0.763 n=20)
ExpFloat64-32 3.762n ± 1% 3.744n ± 1% ~ (p=0.171 n=20)
NormFloat64-32 3.706n ± 1% 3.655n ± 2% ~ (p=0.066 n=20)
Perm3-32 32.93n ± 3% 34.62n ± 1% +5.13% (p=0.000 n=20)
Perm30-32 202.9n ± 1% 204.0n ± 1% ~ (p=0.482 n=20)
Perm30ViaShuffle-32 115.0n ± 1% 114.9n ± 1% ~ (p=0.358 n=20)
ShuffleOverhead-32 112.8n ± 1% 112.7n ± 1% ~ (p=0.692 n=20)
Concurrent-32 2.107n ± 0% 3.725n ± 1% +76.75% (p=0.000 n=20)
goos: darwin
goarch: arm64
pkg: math/rand/v2
│ bbb48afeb7.arm64 │ 5cf807d1ea.arm64 │
│ sec/op │ sec/op vs base │
ChaCha8-8 2.480n ± 0% 2.429n ± 0% -2.04% (p=0.000 n=20)
PCG_DXSM-8 2.531n ± 0% 2.530n ± 0% ~ (p=0.877 n=20)
SourceUint64-8 2.534n ± 0% 2.533n ± 0% ~ (p=0.732 n=20)
GlobalInt64-8 2.172n ± 1% 4.794n ± 0% +120.67% (p=0.000 n=20)
GlobalInt64Parallel-8 0.4320n ± 0% 0.9605n ± 0% +122.32% (p=0.000 n=20)
GlobalUint64-8 2.182n ± 0% 4.770n ± 0% +118.58% (p=0.000 n=20)
GlobalUint64Parallel-8 0.4307n ± 0% 0.9583n ± 0% +122.51% (p=0.000 n=20)
Int64-8 4.107n ± 0% 4.104n ± 0% ~ (p=0.416 n=20)
Uint64-8 4.080n ± 0% 4.080n ± 0% ~ (p=0.052 n=20)
GlobalIntN1000-8 2.814n ± 2% 5.643n ± 0% +100.50% (p=0.000 n=20)
IntN1000-8 4.141n ± 0% 4.139n ± 0% ~ (p=0.140 n=20)
Int64N1000-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.313 n=20)
Int64N1e8-8 4.140n ± 0% 4.139n ± 0% ~ (p=0.103 n=20)
Int64N1e9-8 4.139n ± 0% 4.140n ± 0% ~ (p=0.761 n=20)
Int64N2e9-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.636 n=20)
Int64N1e18-8 5.266n ± 0% 5.326n ± 1% +1.14% (p=0.001 n=20)
Int64N2e18-8 6.052n ± 0% 6.167n ± 0% +1.90% (p=0.000 n=20)
Int64N4e18-8 8.826n ± 0% 9.051n ± 0% +2.55% (p=0.000 n=20)
Int32N1000-8 4.127n ± 0% 4.132n ± 0% +0.12% (p=0.000 n=20)
Int32N1e8-8 4.126n ± 0% 4.131n ± 0% +0.12% (p=0.000 n=20)
Int32N1e9-8 4.127n ± 0% 4.132n ± 0% +0.12% (p=0.000 n=20)
Int32N2e9-8 4.132n ± 0% 4.131n ± 0% ~ (p=0.017 n=20)
Float32-8 4.109n ± 0% 4.105n ± 0% ~ (p=0.379 n=20)
Float64-8 4.107n ± 0% 4.106n ± 0% ~ (p=0.867 n=20)
ExpFloat64-8 5.339n ± 0% 5.383n ± 0% +0.82% (p=0.000 n=20)
NormFloat64-8 5.735n ± 0% 5.737n ± 1% ~ (p=0.856 n=20)
Perm3-8 26.65n ± 0% 26.80n ± 1% +0.58% (p=0.000 n=20)
Perm30-8 194.8n ± 1% 197.0n ± 0% +1.18% (p=0.000 n=20)
Perm30ViaShuffle-8 156.6n ± 0% 157.6n ± 1% +0.61% (p=0.000 n=20)
ShuffleOverhead-8 124.9n ± 0% 125.5n ± 0% +0.52% (p=0.000 n=20)
Concurrent-8 2.434n ± 3% 5.066n ± 0% +108.09% (p=0.000 n=20)
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ bbb48afeb7.386 │ 5cf807d1ea.386 │
│ sec/op │ sec/op vs base │
ChaCha8-32 11.295n ± 1% 4.748n ± 2% -57.96% (p=0.000 n=20)
PCG_DXSM-32 7.693n ± 1% 7.738n ± 2% ~ (p=0.542 n=20)
SourceUint64-32 7.658n ± 2% 7.622n ± 2% ~ (p=0.344 n=20)
GlobalInt64-32 3.473n ± 2% 7.526n ± 2% +116.73% (p=0.000 n=20)
GlobalInt64Parallel-32 0.3198n ± 0% 0.5444n ± 0% +70.22% (p=0.000 n=20)
GlobalUint64-32 3.612n ± 0% 7.575n ± 1% +109.69% (p=0.000 n=20)
GlobalUint64Parallel-32 0.3168n ± 0% 0.5403n ± 0% +70.51% (p=0.000 n=20)
Int64-32 7.673n ± 2% 7.789n ± 1% ~ (p=0.122 n=20)
Uint64-32 7.773n ± 1% 7.827n ± 2% ~ (p=0.920 n=20)
GlobalIntN1000-32 6.268n ± 1% 9.581n ± 1% +52.87% (p=0.000 n=20)
IntN1000-32 10.33n ± 2% 10.45n ± 1% ~ (p=0.233 n=20)
Int64N1000-32 10.98n ± 2% 11.01n ± 1% ~ (p=0.401 n=20)
Int64N1e8-32 11.19n ± 2% 10.97n ± 1% ~ (p=0.033 n=20)
Int64N1e9-32 11.06n ± 1% 11.08n ± 1% ~ (p=0.498 n=20)
Int64N2e9-32 11.10n ± 1% 11.01n ± 2% ~ (p=0.995 n=20)
Int64N1e18-32 15.23n ± 2% 15.04n ± 1% ~ (p=0.973 n=20)
Int64N2e18-32 15.89n ± 1% 15.85n ± 1% ~ (p=0.409 n=20)
Int64N4e18-32 18.96n ± 2% 19.34n ± 2% ~ (p=0.048 n=20)
Int32N1000-32 10.46n ± 2% 10.44n ± 2% ~ (p=0.480 n=20)
Int32N1e8-32 10.46n ± 2% 10.49n ± 2% ~ (p=0.951 n=20)
Int32N1e9-32 10.28n ± 2% 10.26n ± 1% ~ (p=0.431 n=20)
Int32N2e9-32 10.50n ± 2% 10.44n ± 2% ~ (p=0.249 n=20)
Float32-32 13.80n ± 2% 13.80n ± 2% ~ (p=0.751 n=20)
Float64-32 23.55n ± 2% 23.87n ± 0% ~ (p=0.408 n=20)
ExpFloat64-32 15.36n ± 1% 15.29n ± 2% ~ (p=0.316 n=20)
NormFloat64-32 13.57n ± 1% 13.79n ± 1% +1.66% (p=0.005 n=20)
Perm3-32 45.70n ± 2% 46.99n ± 2% +2.81% (p=0.001 n=20)
Perm30-32 399.0n ± 1% 403.8n ± 1% +1.19% (p=0.006 n=20)
Perm30ViaShuffle-32 349.0n ± 1% 350.4n ± 1% ~ (p=0.909 n=20)
ShuffleOverhead-32 322.3n ± 1% 323.8n ± 1% ~ (p=0.410 n=20)
Concurrent-32 3.331n ± 1% 7.312n ± 1% +119.50% (p=0.000 n=20)
For #61716.
Change-Id: Ibdddeed85c34d9ae397289dc899e04d4845f9ed2
Reviewed-on: https://go-review.googlesource.com/c/go/+/516860
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The code generation on riscv64 will currently result in incorrect
assembly when a 32 bit integer is right shifted by an amount that
exceeds the size of the type. In particular, this occurs when an
int32 or uint32 is cast to a 64 bit type and right shifted by a
value larger than 31.
Fix this by moving the SRAW/SRLW conversion into the right shift
rules and removing the SignExt32to64/ZeroExt32to64. Add additional
rules that rewrite to SRAIW/SRLIW when the shift is less than the
size of the type, or replace/eliminate the shift when it exceeds
the size of the type.
Add SSA tests that would have caught this issue. Also add additional
codegen tests to ensure that the resulting assembly is what we
expect in these overflow cases.
Fixes #64285
Change-Id: Ie97b05668597cfcb91413afefaab18ee1aa145ec
Reviewed-on: https://go-review.googlesource.com/c/go/+/545035
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: M Zhuo <mzh@golangcn.org>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
The shift amounts were wrong in this case, leading to miscompilation
of load combining.
Also the store combining was not triggering when it should.
Fixes #64468
Change-Id: Iaeb08972c5fc1d6f628800334789c6af7216e87b
Reviewed-on: https://go-review.googlesource.com/c/go/+/546355
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Change-Id: Ic61fb181923159e80a86a41582e83ec466ab9bc4
GitHub-Last-Rev: 92469845665fa1f864d257c8bc175201a43b4d43
GitHub-Pull-Request: golang/go#64080
Reviewed-on: https://go-review.googlesource.com/c/go/+/541741
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Jes Cok <xigua67damn@gmail.com>
|
|
As of CL 539699, PGO-based devirtualization supports devirtualization of
function values in addition to interface method calls. As with CL
497175, we need to explicitly look up functions from export data that
may not be imported already.
Symbol naming is ambiguous (`foo.Bar.func1` could be a closure or a
method), so we simply attempt to do both types of lookup. That said,
closures are defined in export data only as OCLOSURE nodes in the
enclosing function, which this CL does not yet attempt to expand.
For #61577.
Change-Id: Ic7205b046218a4dfb8c4162ece3620ed1c3cb40a
Reviewed-on: https://go-review.googlesource.com/c/go/+/540258
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
Today, PGO-based devirtualization only applies to interface calls. This
CL extends initial support to function values (i.e., function/closure
pointers passed as arguments or stored in a struct).
This CL is a minimal implementation with several limitations.
* Export data lookup of function value callees not implemented
(equivalent of CL 497175; done in CL 540258).
* Callees must be standard static functions. Callees that are closures
(requiring closure context) are not supported.
For #61577.
Change-Id: I7d328859035249e176294cd0d9885b2d08c853f6
Reviewed-on: https://go-review.googlesource.com/c/go/+/539699
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This change mostly implements the design described in #60773 and
includes a new scalable parser for the new trace format, available in
internal/trace/v2. I'll leave this commit message short because this is
clearly an enormous CL with a lot of detail.
This change does not hook up the new tracer into cmd/trace yet. A
follow-up CL will handle that.
For #60773.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-amd64-longtest-race
Change-Id: I5d2aca2cc07580ed3c76a9813ac48ec96b157de0
Reviewed-on: https://go-review.googlesource.com/c/go/+/494187
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Currently the user arena code writes heap bits to the (*mspan).heapBits
space with the platform-specific byte ordering (the heap bits are
written and managed as uintptrs). However, the compiler always emits GC
metadata for types in little endian.
Because the scanning part of the code that loads through the type
pointer in the allocation header expects little endian ordering, we end
up with the wrong byte ordering in GC when trying to scan arena memory.
Fix this by writing out the user arena heap bits in little endian on big
endian platforms.
This means that the space returned by (*mspan).heapBits has a different
meaning for user arenas and small object spans, which is a little odd,
so I documented it. To reduce the chance of misuse of the writeHeapBits
API, which now writes out heap bits in a different ordering than
writeSmallHeapBits on big endian platforms, this change also renames
writeHeapBits to writeUserArenaHeapBits.
Much of this can be avoided in the future if the compiler were to write
out the pointer/scalar bits as an array of uintptr values instead of
plain bytes. That's too big of a change for right now though.
This change is a no-op on little endian platforms. I confirmed it by
checking for any assembly code differences in the runtime test binary.
There were none. With this change, the arena tests pass on ppc64.
Fixes #64048.
Change-Id: If077d003872fcccf5a154ff5d8441a58582061bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/541315
Run-TryBot: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This change replaces the 1-bit-per-word heap bitmap for most size
classes with allocation headers for objects that contain pointers. The
header consists of a single pointer to a type. All allocations with
headers are treated as implicitly containing one or more instances of
the type in the header.
As the name implies, headers are usually stored as the first word of an
object. There are two additional exceptions to where headers are stored
and how they're used.
Objects smaller than 512 bytes do not have headers. Instead, a heap
bitmap is reserved at the end of spans for objects of this size. A full
word of overhead is too much for these small objects. The bitmap is of
the same format of the old bitmap, minus the noMorePtrs bits which are
unnecessary. All the objects <512 bytes have a bitmap less than a
pointer-word in size, and that was the granularity at which noMorePtrs
could stop scanning early anyway.
Objects that are larger than 32 KiB (which have their own span) have
their headers stored directly in the span, to allow power-of-two-sized
allocations to not spill over into an extra page.
The full implementation is behind GOEXPERIMENT=allocheaders.
The purpose of this change is performance. First and foremost, with
headers we no longer have to unroll pointer/scalar data at allocation
time for most size classes. Small size classes still need some
unrolling, but their bitmaps are small so we can optimize that case
fairly well. Larger objects effectively have their pointer/scalar data
unrolled on-demand from type data, which is much more compactly
represented and results in less TLB pressure. Furthermore, since the
headers are usually right next to the object and where we're about to
start scanning, we get an additional temporal locality benefit in the
data cache when looking up type metadata. The pointer/scalar data is
now effectively unrolled on-demand, but it's also simpler to unroll than
before; that unrolled data is never written anywhere, and for arrays we
get the benefit of retreading the same data per element, as opposed to
looking it up from scratch for each pointer-word of bitmap. Lastly,
because we no longer have a heap bitmap that spans the entire heap,
there's a flat 1.5% memory use reduction. This is balanced slightly by
some objects possibly being bumped up a size class, but most objects are
not tightly optimized to size class sizes so there's some memory to
spare, making the header basically free in those cases.
See the follow-up CL which turns on this experiment by default for
benchmark results. (CL 538217.)
Change-Id: I4c9034ee200650d06d8bdecd579d5f7c1bbf1fc5
Reviewed-on: https://go-review.googlesource.com/c/go/+/437955
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
TestPGOHash may rebuild dependencies as we pass -trimpath to the
go command. This CL makes it pass -trimpath compiler flag to only
the current package instead, as we only need the current package
to have a stable source file path.
Also refactor buildPGOInliningTest to only take compiler flags,
not go flags, to avoid accidental rebuild.
Should fix #63733.
Change-Id: Iec6c4e90cf659790e21083ee2e697f518234c5b9
Reviewed-on: https://go-review.googlesource.com/c/go/+/535915
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
|
|
Today, the PGO IR graph only contains entries for ir.Func loaded into
the package. This can include functions from transitive dependencies,
but only if they happen to be referenced by something in the current
package. If they are not referenced, noder never bothers to load them.
This leads to a deficiency in PGO devirtualization: some callee methods
are available in transitive dependencies but do not devirtualize because
they happen to not get loaded from export data.
Resolve this by adding an explicit lookup from export data of callees
mentioned in the profile.
I have chosen to do this during loading of the profile for simplicity:
the PGO IR graph always contains all of the functions we might need.
That said, it isn't strictly necessary. PGO devirtualization could do
the lookup lazily if it decides it actually needs a method. This saves
work at the expense of a bit more complexity, but I've chosen the
simpler approach for now as I measured the cost of this as significantly
less than the rest of PGO loading.
For #61577.
Change-Id: Ieafb2a549510587027270ee6b4c3aefd149a901f
Reviewed-on: https://go-review.googlesource.com/c/go/+/497175
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
That way we don't need to call into the runtime for every
type assertion (to an interface type).
name old time/op new time/op delta
TypeAssert-24 3.78ns ± 3% 1.00ns ± 1% -73.53% (p=0.000 n=10+8)
Change-Id: I0ba308aaf0f24a5495b4e13c814d35af0c58bfde
Reviewed-on: https://go-review.googlesource.com/c/go/+/529316
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
|
|
That way we don't need to call into the runtime when the type being
switched on has been seen many times before.
The cache is just a hash table of a sample of all the concrete types
that have been switched on at that source location. We record the
matching case number and the resulting itab for each concrete input
type.
The caches seldom get large. The only two in a run of all.bash that
get more than 100 entries, even with the sampling rate set to 1, are
test/fixedbugs/issue29264.go, with 101
test/fixedbugs/issue29312.go, with 254
Both happen at the type switch in fmt.(*pp).handleMethods, perhaps
unsurprisingly.
name old time/op new time/op delta
SwitchInterfaceTypePredictable-24 25.8ns ± 2% 2.5ns ± 3% -90.43% (p=0.000 n=10+10)
SwitchInterfaceTypeUnpredictable-24 37.5ns ± 2% 11.2ns ± 1% -70.02% (p=0.000 n=10+10)
Change-Id: I4961ac9547b7f15b03be6f55cdcb972d176955eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/526658
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
For type switches where the targets are interface types,
call into the runtime once instead of doing a sequence
of assert* calls.
name old time/op new time/op delta
SwitchInterfaceTypePredictable-24 26.6ns ± 1% 25.8ns ± 2% -2.86% (p=0.000 n=10+10)
SwitchInterfaceTypeUnpredictable-24 39.3ns ± 1% 37.5ns ± 2% -4.57% (p=0.000 n=10+10)
Not super helpful by itself, but this code organization allows
followon CLs that add caching to the lookups.
Change-Id: I7967f85a99171faa6c2550690e311bea8b54b01c
Reviewed-on: https://go-review.googlesource.com/c/go/+/526657
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Uses ,ABI instead of <ABI> because of problems with shell escaping
and windows file names, however if someone goes to all the trouble
of escaping the linker syntax and uses that instead, that works too.
Examples:
```
GOSSAFUNC=runtime.exitsyscall go build main.go
\# runtime
dumped SSA for exitsyscall,0 to ../../src/loopvar/ssa.html
dumped SSA for exitsyscall,1 to ../../src/loopvar/ssa.html
GOSSADIR=`pwd` GOSSAFUNC=runtime.exitsyscall go build main.go
\# runtime
dumped SSA for exitsyscall,0 to ../../src/loopvar/runtime.exitsyscall,0.html
dumped SSA for exitsyscall,1 to ../../src/loopvar/runtime.exitsyscall,1.html
GOSSAFUNC=runtime.exitsyscall,0 go build main.go
\# runtime
dumped SSA for exitsyscall,0 to ../../src/loopvar/ssa.html
GOSSAFUNC=runtime.exitsyscall\<1\> go build main.go
\# runtime
dumped SSA for exitsyscall,1 to ../../src/loopvar/ssa.html
```
Change-Id: Ia1138b61c797d0de49dbfae702dc306b9650a7f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/532475
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: David Chase <drchase@google.com>
|
|
When a PGO build fails or produces incorrect program, it is often
unclear what the problem is. Add pgo hash so we can bisect to
individual optimization decisions, which often helps debugging.
Related to #58153.
Change-Id: I651ffd9c53bad60f2f28c8ec2a90a3f532982712
Reviewed-on: https://go-review.googlesource.com/c/go/+/528400
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Now that types can self-report how many registers they need, it's much
easier to determine whether a parameter will fit into the available
registers: simply compare the number of registers needed against the
number of registers still available.
This also eliminates the need for the NumParamRegs cache.
Also, the new code in NumParamRegs is stricter in only allowing it to
be called on types that can actually be passed in registers, which
requires a test to be corrected for that. While here, change mkstruct
to a variadic function, so the call sites require less boilerplate.
Change-Id: Iebe1a0456a8053a10e551e5da796014e5b1b695b
Reviewed-on: https://go-review.googlesource.com/c/go/+/527339
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
|
|
Use ^ and $ in the -run flag regular expression value when the intention
is to invoke a single named test. This removes the reliance on there not
being another similarly named test to achieve the intended result.
In particular, package syscall has tests named TestUnshareMountNameSpace
and TestUnshareMountNameSpaceChroot that both trigger themselves setting
GO_WANT_HELPER_PROCESS=1 to run alternate code in a helper process. As a
consequence of overlap in their test names, the former was inadvertently
triggering one too many helpers.
Spotted while reviewing CL 525196. Apply the same change in other places
to make it easier for code readers to see that said tests aren't running
extraneous tests. The unlikely cases of -run=TestSomething intentionally
being used to run all tests that have the TestSomething substring in the
name can be better written as -run=^.*TestSomething.*$ or with a comment
so it is clear it wasn't an oversight.
Change-Id: Iba208aba3998acdbf8c6708e5d23ab88938bfc1e
Reviewed-on: https://go-review.googlesource.com/c/go/+/524948
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Kirill Kolyshkin <kolyshkin@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This allows to reuse the slice cap computation across
specialized growslice funcs.
Updates #49480
Change-Id: Ie075d9c3075659ea14c11d51a9cd4ed46aa0e961
Reviewed-on: https://go-review.googlesource.com/c/go/+/495876
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Egon Elbre <egonelbre@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
|
|
For large interface -> concrete type switches, we can use a jump
table on some bits of the type hash instead of a binary search on
the type hash.
name old time/op new time/op delta
SwitchTypePredictable-24 1.99ns ± 2% 1.78ns ± 5% -10.87% (p=0.000 n=10+10)
SwitchTypeUnpredictable-24 11.0ns ± 1% 9.1ns ± 2% -17.55% (p=0.000 n=7+9)
Change-Id: Ida4768e5d62c3ce1c2701288b72664aaa9e64259
Reviewed-on: https://go-review.googlesource.com/c/go/+/521497
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
|
|
Except for a single call site in escape analysis, every use of
ir.AsNode involves a types.Object that's known to contain
an *ir.Name. Asserting directly to that type makes the code simpler
and more efficient.
The one use in escape analysis is extended to handle nil correctly
without it.
Change-Id: I694ae516903e541341d82c2f65a9155e4b0a9809
Reviewed-on: https://go-review.googlesource.com/c/go/+/520775
TryBot-Bypass: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Start making progress towards constructing IR with proper types.
Change-Id: Iad32c1cf60f30ceb8e07c31c8871b115570ac3bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/520263
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
|
|
Make it more consistent with the static devirtualization
diagnostic message. Keep the print of concrete callee's method
name, as it is clearer.
Change-Id: Ibe9b40253eaff2c0071353a2b388177213488822
Reviewed-on: https://go-review.googlesource.com/c/go/+/500960
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
if current package has a concrete reference
The new PGO-driven indirect call specialization from CL 492436
in theory should allow for devirtualization on methods
in another package when those methods are directly referenced
in the current package.
However, inline.InlineImpossible was checking for a zero-length
fn.Body and would cause devirtualization to fail
with a debug log message like:
"should not PGO devirtualize (*Speaker1).Speak: no function body"
Previously, the logic in inline.InlineImpossible was only
called on local functions, but with PGO-based devirtualization,
it can now be called on imported functions, where inlinable
imported functions will have a zero-length fn.Body but a
non-nil fn.Inl.
We update inline.InlineImpossible to handle imported functions
by adding a call to typecheck.HaveInlineBody in the check
that was previously failing.
For the test, we need to have a hopefully temporary workaround
of adding explicit references to the callees in another package
for devirtualization to work. CL 497175 or similar should
enable removing this workaround.
Fixes #60561
Updates #59959
Change-Id: I48449b7d8b329d84151bd3b506b8093c262eb2a3
GitHub-Last-Rev: 2d53c55fd895ad8fefd25510a6e6969e89d54a6d
GitHub-Pull-Request: golang/go#60565
Reviewed-on: https://go-review.googlesource.com/c/go/+/500155
Run-TryBot: thepudds <thepudds1460@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This CL is originally based on CL 484838 from rajbarik@uber.com.
Add a new PGO-based devirtualize pass. This pass conditionally
devirtualizes interface calls for the hottest callee. That is, it
performs a transformation like:
type Iface interface {
Foo()
}
type Concrete struct{}
func (Concrete) Foo() {}
func foo(i Iface) {
i.Foo()
}
to:
func foo(i Iface) {
if c, ok := i.(Concrete); ok {
c.Foo()
} else {
i.Foo()
}
}
The primary benefit of this transformation is enabling inlining of the
direct calls.
Today this change has no impact on the escape behavior, as the fallback
interface always forces an escape. But improving escape analysis to take
advantage of this is an area of potential work.
This CL is the bare minimum of a devirtualization implementation. There
are still numerous limitations:
* Callees not directly referenced in the current package can be missed
(even if they are in the transitive dependences).
* Callees not in the transitive dependencies of the current package are
missed.
* Only interface method calls are supported, not other indirect function
calls.
* Multiple calls to compatible interfaces on the same line cannot be
distinguished and will use the same callee target.
* Callees that only partially implement an interface (they are embedded
in another type that completes the interface) cannot be devirtualized.
* Others, mentioned in TODOs.
Fixes #59959
Change-Id: I8bedb516139695ee4069650b099d05957b7ce5ee
Reviewed-on: https://go-review.googlesource.com/c/go/+/492436
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
[This is a roll-forward of CL 479095, which was reverted due to a bad
interaction between inlining and escape analysis, then later fixed
first with an attempt in CL 482355, then again in CL 484859, and then
one more time with CL 492135.]
Currently, when the inliner is determining if a function is
inlineable, it descends into the bodies of closures constructed by
that function. This has several unfortunate consequences:
- If the closure contains a disallowed operation (e.g., a defer), then
the outer function can't be inlined. It makes sense that the
*closure* can't be inlined in this case, but it doesn't make sense
to punish the function that constructs the closure.
- The hairiness of the closure counts against the inlining budget of
the outer function. Since we currently copy the closure body when
inlining the outer function, this makes sense from the perspective
of export data size and binary size, but ultimately doesn't make
much sense from the perspective of what should be inlineable.
- Since the inliner walks into every closure created by an outer
function in addition to starting a walk at every closure, this adds
an n^2 factor to inlinability analysis.
This CL simply drops this behavior.
In std, this makes 57 more functions inlinable, and disallows inlining
for 10 (due to the basic instability of our bottom-up inlining
approach), for an net increase of 47 inlinable functions (+0.6%).
This will help significantly with the performance of the functions to
be added for #56102, which have a somewhat complicated nesting of
closures with a performance-critical fast path.
The downside of this seems to be a potential increase in export data
and text size, but the practical impact of this seems to be
negligible:
│ before │ after │
│ bytes │ bytes vs base │
Go/binary 15.12Mi ± 0% 15.14Mi ± 0% +0.16% (n=1)
Go/text 5.220Mi ± 0% 5.237Mi ± 0% +0.32% (n=1)
Compile/binary 22.92Mi ± 0% 22.94Mi ± 0% +0.07% (n=1)
Compile/text 8.428Mi ± 0% 8.435Mi ± 0% +0.08% (n=1)
Change-Id: I5f75fcceb177f05853996b75184a486528eafe96
Reviewed-on: https://go-review.googlesource.com/c/go/+/492017
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
|
|
Memory op combining is currently done using arch-specific rewrite rules.
Instead, do them as a arch-independent rewrite pass. This ensures that
all architectures (with unaligned loads & stores) get equal treatment.
This removes a lot of rewrite rules.
The new pass is a bit more comprehensive. It handles things like out-of-order
writes and is careful not to apply partial optimizations that then block
further optimizations.
Change-Id: I780ff3bb052475cd725a923309616882d25b8d9e
Reviewed-on: https://go-review.googlesource.com/c/go/+/478475
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
This reverts commit f8162a0e726f4b3a9df60a37e8ca7883dde61914.
Reason for revert: https://github.com/golang/go/issues/59680
Change-Id: I91821c691a2d019ff0ad5b69509e32f3d56b8f67
Reviewed-on: https://go-review.googlesource.com/c/go/+/485498
Reviewed-by: Russ Cox <rsc@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
[This is a roll-forward of CL 479095, which was reverted due to a bad
interaction between inlining and escape analysis, then later fixed
fist with an attempt in CL 482355, then again in 484859 .]
Currently, when the inliner is determining if a function is
inlineable, it descends into the bodies of closures constructed by
that function. This has several unfortunate consequences:
- If the closure contains a disallowed operation (e.g., a defer), then
the outer function can't be inlined. It makes sense that the
*closure* can't be inlined in this case, but it doesn't make sense
to punish the function that constructs the closure.
- The hairiness of the closure counts against the inlining budget of
the outer function. Since we currently copy the closure body when
inlining the outer function, this makes sense from the perspective
of export data size and binary size, but ultimately doesn't make
much sense from the perspective of what should be inlineable.
- Since the inliner walks into every closure created by an outer
function in addition to starting a walk at every closure, this adds
an n^2 factor to inlinability analysis.
This CL simply drops this behavior.
In std, this makes 57 more functions inlinable, and disallows inlining
for 10 (due to the basic instability of our bottom-up inlining
approach), for an net increase of 47 inlinable functions (+0.6%).
This will help significantly with the performance of the functions to
be added for #56102, which have a somewhat complicated nesting of
closures with a performance-critical fast path.
The downside of this seems to be a potential increase in export data
and text size, but the practical impact of this seems to be
negligible:
│ before │ after │
│ bytes │ bytes vs base │
Go/binary 15.12Mi ± 0% 15.14Mi ± 0% +0.16% (n=1)
Go/text 5.220Mi ± 0% 5.237Mi ± 0% +0.32% (n=1)
Compile/binary 22.92Mi ± 0% 22.94Mi ± 0% +0.07% (n=1)
Compile/text 8.428Mi ± 0% 8.435Mi ± 0% +0.08% (n=1)
Updates #56102.
Change-Id: I6e938d596992ffb473cf51e7e598f372ce08deb0
Reviewed-on: https://go-review.googlesource.com/c/go/+/484860
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
|
|
Updates #57434
Change-Id: Ib90c228f95c3d61204e60f63d7de55884d839e05
Reviewed-on: https://go-review.googlesource.com/c/go/+/459496
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
|
|
This adds benchmarks for division and modulus of 64 bit signed and unsigned
integers.
Updates #59089
Change-Id: Ie757c6d74a1f355873e79619eae26ece21a8f23e
Reviewed-on: https://go-review.googlesource.com/c/go/+/482656
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
This reverts commit http://go.dev/cl/c/482356.
Reason for revert: Reverting this change again, since it is causing additional failures in google-internal testing.
Change-Id: I9234946f62e5bb18c2f873a65e8b298d04af0809
Reviewed-on: https://go-review.googlesource.com/c/go/+/484735
Reviewed-by: Florian Zenker <floriank@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
Auto-Submit: Than McIntosh <thanm@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
|
|
[This is a roll-forward of CL 479095, which was reverted due to a bad
interaction between inlining and escape analysis since fixed in CL 482355.]
Currently, when the inliner is determining if a function is
inlineable, it descends into the bodies of closures constructed by
that function. This has several unfortunate consequences:
- If the closure contains a disallowed operation (e.g., a defer), then
the outer function can't be inlined. It makes sense that the
*closure* can't be inlined in this case, but it doesn't make sense
to punish the function that constructs the closure.
- The hairiness of the closure counts against the inlining budget of
the outer function. Since we currently copy the closure body when
inlining the outer function, this makes sense from the perspective
of export data size and binary size, but ultimately doesn't make
much sense from the perspective of what should be inlineable.
- Since the inliner walks into every closure created by an outer
function in addition to starting a walk at every closure, this adds
an n^2 factor to inlinability analysis.
This CL simply drops this behavior.
In std, this makes 57 more functions inlinable, and disallows inlining
for 10 (due to the basic instability of our bottom-up inlining
approach), for an net increase of 47 inlinable functions (+0.6%).
This will help significantly with the performance of the functions to
be added for #56102, which have a somewhat complicated nesting of
closures with a performance-critical fast path.
The downside of this seems to be a potential increase in export data
and text size, but the practical impact of this seems to be
negligible:
│ before │ after │
│ bytes │ bytes vs base │
Go/binary 15.12Mi ± 0% 15.14Mi ± 0% +0.16% (n=1)
Go/text 5.220Mi ± 0% 5.237Mi ± 0% +0.32% (n=1)
Compile/binary 22.92Mi ± 0% 22.94Mi ± 0% +0.07% (n=1)
Compile/text 8.428Mi ± 0% 8.435Mi ± 0% +0.08% (n=1)
Updates #56102.
Change-Id: I1f4fc96c71609c8feb59fecdb92b69ba7e3b5b41
Reviewed-on: https://go-review.googlesource.com/c/go/+/482356
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Skip one of the testpoints that verifies inlining, since it
no longer passes as a result of reverting CL 479095. Once we
roll forward with a new version of CL 479095 we can re-enable
this testpoint.
Change-Id: I41f6fb3fce78f31e60c5f0ed2856be0e66865149
Reviewed-on: https://go-review.googlesource.com/c/go/+/481755
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
|
|
This adds the three functions from #56102 to the sync package. These
provide a convenient API for the most common uses of sync.Once.
The performance of these is comparable to direct use of sync.Once:
$ go test -run ^$ -bench OnceFunc\|OnceVal -count 20 | benchstat -row .name -col /v
goos: linux
goarch: amd64
pkg: sync
cpu: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
│ Once │ Global │ Local │
│ sec/op │ sec/op vs base │ sec/op vs base │
OnceFunc 1.3500n ± 6% 2.7030n ± 1% +100.22% (p=0.000 n=20) 0.3935n ± 0% -70.86% (p=0.000 n=20)
OnceValue 1.3155n ± 0% 2.7460n ± 1% +108.74% (p=0.000 n=20) 0.5478n ± 1% -58.35% (p=0.000 n=20)
The "Once" column represents the baseline of how code would typically
express these patterns using sync.Once. "Global" binds the closure
returned by OnceFunc/OnceValue to global, which is how I expect these
to be used most of the time. Currently, this defeats some inlining
opportunities, which roughly doubles the cost over sync.Once; however,
it's still *extremely* fast. Finally, "Local" binds the returned
closure to a local variable. This unlocks several levels of inlining
and represents pretty much the best possible case for these APIs, but
is also unlikely to happen in practice. In principle the compiler
could recognize that the global in the "Global" case is initialized in
place and never mutated and do the same optimizations it does in the
"Local" case, but it currently does not.
Fixes #56102
Change-Id: If7355eccd7c8de7288d89a4282ff15ab1469e420
Reviewed-on: https://go-review.googlesource.com/c/go/+/451356
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Andrew Gerrand <adg@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Caleb Spare <cespare@gmail.com>
Auto-Submit: Austin Clements <austin@google.com>
|
|
This reduces unbounded shift latency by one cycle, and may
generate less instructions in some cases.
When there is a choice whether to use doubleword or word shifts, use
doubleword shifts. Doubleword shifts have fewer hardware scheduling
restrictions across P8/P9/P10.
Likewise, rework the shift sequence to allow the compare/shift/overshift
values to compute in parallel, then choose the correct value.
Some ANDCCconst rules also need reworked to ensure they simplify when
used for their flag value. This commonly occurs when prove fails to
identify a bounded shift (e.g foo32<<uint(x&31)).
Change-Id: Ifc6ff4a865d68675e57745056db414b0eb6f2d34
Reviewed-on: https://go-review.googlesource.com/c/go/+/442597
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
Future CLs will remove the invariant that pointers are always put in
the write barrier in pairs.
The behavior of the assembly code changes a bit, where instead of writing
the pointers unconditionally and then checking for overflow, check for
overflow first and then write the pointers.
Also changed the write barrier flush function to not take the src/dst
as arguments.
Change-Id: I2ef708038367b7b82ea67cbaf505a1d5904c775c
Reviewed-on: https://go-review.googlesource.com/c/go/+/447779
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Bypass: Keith Randall <khr@golang.org>
|
|
This patch detects at which index position profiling samples that have
the value-type samples count are, instead of the previously hard-coded
position of index 1. Runtime generated profiles always generate CPU
profiling data with the 0 index being CPU nanoseconds, and samples count
at index 1, which is why this previously hasn't come up.
This is a redo of CL 465135, now allowing empty profiles. Note that
preprocessProfileGraph will already cause pgo.New to return nil for
empty profiles.
Fixes #58292
Change-Id: Ia6c94f0793f6ca9b0882b5e2c4d34f38e600c1e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/466675
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This reverts CL 465135.
Reason for revert: This broke cmd/go.TestScript/build_pgo on the linux-amd64-longtest builder: https://build.golang.org/log/8f8ed7bf576f891a06d295c4a5bca987c6e941d6
Change-Id: Ie2f2cc2731099eb28eda6b94dded4dfc34e29441
Reviewed-on: https://go-review.googlesource.com/c/go/+/466439
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
|
|
This patch detects at which index position profiling samples that have the value-type samples count are, instead of the previously hard-coded position of index 1. Runtime generated profiles always generate CPU profiling data with the 0 index being CPU nanoseconds, and samples count at index 1, which is why this previously hasn't come up.
Fixes #58292
Change-Id: Idde761d53b02259f3076c4e5dcb4a96a9d53df0e
GitHub-Last-Rev: dabbf9f88c560286e150e9b136a27c3ac23c5ec1
GitHub-Pull-Request: golang/go#58294
Reviewed-on: https://go-review.googlesource.com/c/go/+/465135
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Updates #45402
Change-Id: Ieffd1c8b0b5e4e63024b5be2e1f910fb4411eb94
GitHub-Last-Rev: fa7418c8eb977b7214311e774f9df7a1220a3dfd
GitHub-Pull-Request: golang/go#57940
Reviewed-on: https://go-review.googlesource.com/c/go/+/462896
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
|
|
Updates #57410
Change-Id: I9be38e20c6b83d14f7785049a66de77ac7ecdf15
Reviewed-on: https://go-review.googlesource.com/c/go/+/463997
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
This CL removes a handful of features that were only needed for the
pre-unified frontends.
In particular, Type.Pkg was a hack for iexport so that
go/types.Var.Pkg could be precisely populated for struct fields and
signature parameters by gcimporter, but it's no longer necessary with
the unified export data format because we now write export data
directly from types2-supplied type descriptors.
Several other features (e.g., OrigType, implicit interfaces, type
parameters on signatures) are no longer relevant to the unified
frontend, because it only uses types1 to represent instantiated
generic types.
Updates #57410.
Change-Id: I84fd1da5e0b65d2ab91d244a7bb593821ee916e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/458622
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This CL removes the GOEXPERIMENT=nounified knob, and any conditional
statements that depend on that knob. Further CLs to remove unreachable
code follow this one.
Updates #57410.
Change-Id: I39c147e1a83601c73f8316a001705778fee64a91
Reviewed-on: https://go-review.googlesource.com/c/go/+/458615
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This CL avoids allocating in utf16.Decode for code point sequences
with less than 64 elements. It does so by splitting the function in two,
one that can be inlined that preallocates a buffer and the other that
does the heavy-lifting.
The mid-stack inliner will allocate the buffer in the caller stack,
and in many cases this will be enough to avoid the allocation.
unicode/utf16 benchmarks:
name old time/op new time/op delta
DecodeValidASCII-12 60.1ns ± 3% 16.0ns ±20% -73.40% (p=0.000 n=8+10)
DecodeValidJapaneseChars-12 61.3ns ±10% 14.9ns ±39% -75.71% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
DecodeValidASCII-12 48.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10)
DecodeValidJapaneseChars-12 48.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
DecodeValidASCII-12 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
DecodeValidJapaneseChars-12 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
I've also benchmarked os.File.ReadDir with this change applied
to demonstrate that it does make a difference in the caller site, in this
case via syscall.UTF16ToString:
name old time/op new time/op delta
ReadDir-12 592µs ± 8% 620µs ±16% ~ (p=0.280 n=10+10)
name old alloc/op new alloc/op delta
ReadDir-12 30.4kB ± 0% 22.4kB ± 0% -26.10% (p=0.000 n=8+10)
name old allocs/op new allocs/op delta
ReadDir-12 402 ± 0% 272 ± 0% -32.34% (p=0.000 n=10+10)
Change-Id: I65cf5caa3fd3b3a466c0ed837a50a96e975bbe6b
Reviewed-on: https://go-review.googlesource.com/c/go/+/453415
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Quim Muntal <quimmuntal@gmail.com>
|
|
This is the second round to look for spelling mistakes. This time the
manual sifting of the result list was made easier by filtering out
capitalized and camelcase words.
grep -r --include '*.go' -E '^// .*$' . | aspell list | grep -E -x '[A-Za-z]{1}[a-z]*' | sort | uniq
This PR will be imported into Gerrit with the title and first
comment (this text) used to generate the subject and body of
the Gerrit change.
Change-Id: Ie8a2092aaa7e1f051aa90f03dbaf2b9aaf5664a9
GitHub-Last-Rev: fc2bd6e0c51652f13a7588980f1408af8e6080f5
GitHub-Pull-Request: golang/go#57737
Reviewed-on: https://go-review.googlesource.com/c/go/+/461595
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
|