aboutsummaryrefslogtreecommitdiff
path: root/src/cmd/compile/internal/test
AgeCommit message (Collapse)Author
2024-01-25all: prealloc slice with possible minimum capabilitiesShulhan
2024-01-22Revert "cmd/preprofile: Add preprocess tool to pre-parse the profile file."Michael Pratt
This reverts CL 529738. Reason for revert: Breaking longtest builders For #58102. Fixes #65220. Change-Id: Id295e3249da9d82f6a9e4fc571760302a1362def Reviewed-on: https://go-review.googlesource.com/c/go/+/557460 Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Bryan Mills <bcmills@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-01-22cmd/preprofile: Add preprocess tool to pre-parse the profile file.Jin Lin
The pgo compilation time is very long if the profile file is large. We added a preprocess tool to pre-parse profile file in order to expedite the compile time. Change-Id: I6f50bbd01f242448e2463607a9b63483c6ca9a12 Reviewed-on: https://go-review.googlesource.com/c/go/+/529738 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-12-14all: remove newline characters after return statementsDanil Timerbulatov
This commit is aimed at improving the readability and consistency of the code base. Extraneous newline characters were present after some return statements, creating unnecessary separation in the code. Fixes #64610 Change-Id: Ic1b05bf11761c4dff22691c2f1c3755f66d341f7 Reviewed-on: https://go-review.googlesource.com/c/go/+/548316 Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2023-12-05math/rand, math/rand/v2: use ChaCha8 for global randRuss Cox
Move ChaCha8 code into internal/chacha8rand and use it to implement runtime.rand, which is used for the unseeded global source for both math/rand and math/rand/v2. This also affects the calculation of the start point for iteration over very very large maps (when the 32-bit fastrand is not big enough). The benefit is that misuse of the global random number generators in math/rand and math/rand/v2 in contexts where non-predictable randomness is important for security reasons is no longer a security problem, removing a common mistake among programmers who are unaware of the different kinds of randomness. The cost is an extra 304 bytes per thread stored in the m struct plus 2-3ns more per random uint64 due to the more sophisticated algorithm. Using PCG looks like it would cost about the same, although I haven't benchmarked that. Before this, the math/rand and math/rand/v2 global generator was wyrand (https://github.com/wangyi-fudan/wyhash). For math/rand, using wyrand instead of the Mitchell/Reeds/Thompson ALFG was justifiable, since the latter was not any better. But for math/rand/v2, the global generator really should be at least as good as one of the well-studied, specific algorithms provided directly by the package, and it's not. (Wyrand is still reasonable for scheduling and cache decisions.) Good randomness does have a cost: about twice wyrand. Also rationalize the various runtime rand references. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ bbb48afeb7.amd64 │ 5cf807d1ea.amd64 │ │ sec/op │ sec/op vs base │ ChaCha8-32 1.862n ± 2% 1.861n ± 2% ~ (p=0.825 n=20) PCG_DXSM-32 1.471n ± 1% 1.460n ± 2% ~ (p=0.153 n=20) SourceUint64-32 1.636n ± 2% 1.582n ± 1% -3.30% (p=0.000 n=20) GlobalInt64-32 2.087n ± 1% 3.663n ± 1% +75.54% (p=0.000 n=20) GlobalInt64Parallel-32 0.1042n ± 1% 0.2026n ± 1% +94.48% (p=0.000 n=20) GlobalUint64-32 2.263n ± 2% 3.724n ± 1% +64.57% (p=0.000 n=20) GlobalUint64Parallel-32 0.1019n ± 1% 0.1973n ± 1% +93.67% (p=0.000 n=20) Int64-32 1.771n ± 1% 1.774n ± 1% ~ (p=0.449 n=20) Uint64-32 1.863n ± 2% 1.866n ± 1% ~ (p=0.364 n=20) GlobalIntN1000-32 3.134n ± 3% 4.730n ± 2% +50.95% (p=0.000 n=20) IntN1000-32 2.489n ± 1% 2.489n ± 1% ~ (p=0.683 n=20) Int64N1000-32 2.521n ± 1% 2.516n ± 1% ~ (p=0.394 n=20) Int64N1e8-32 2.479n ± 1% 2.478n ± 2% ~ (p=0.743 n=20) Int64N1e9-32 2.530n ± 2% 2.514n ± 2% ~ (p=0.193 n=20) Int64N2e9-32 2.501n ± 1% 2.494n ± 1% ~ (p=0.616 n=20) Int64N1e18-32 3.227n ± 1% 3.205n ± 1% ~ (p=0.101 n=20) Int64N2e18-32 3.647n ± 1% 3.599n ± 1% ~ (p=0.019 n=20) Int64N4e18-32 5.135n ± 1% 5.069n ± 2% ~ (p=0.034 n=20) Int32N1000-32 2.657n ± 1% 2.637n ± 1% ~ (p=0.180 n=20) Int32N1e8-32 2.636n ± 1% 2.636n ± 1% ~ (p=0.763 n=20) Int32N1e9-32 2.660n ± 2% 2.638n ± 1% ~ (p=0.358 n=20) Int32N2e9-32 2.662n ± 2% 2.618n ± 2% ~ (p=0.064 n=20) Float32-32 2.272n ± 2% 2.239n ± 2% ~ (p=0.194 n=20) Float64-32 2.272n ± 1% 2.286n ± 2% ~ (p=0.763 n=20) ExpFloat64-32 3.762n ± 1% 3.744n ± 1% ~ (p=0.171 n=20) NormFloat64-32 3.706n ± 1% 3.655n ± 2% ~ (p=0.066 n=20) Perm3-32 32.93n ± 3% 34.62n ± 1% +5.13% (p=0.000 n=20) Perm30-32 202.9n ± 1% 204.0n ± 1% ~ (p=0.482 n=20) Perm30ViaShuffle-32 115.0n ± 1% 114.9n ± 1% ~ (p=0.358 n=20) ShuffleOverhead-32 112.8n ± 1% 112.7n ± 1% ~ (p=0.692 n=20) Concurrent-32 2.107n ± 0% 3.725n ± 1% +76.75% (p=0.000 n=20) goos: darwin goarch: arm64 pkg: math/rand/v2 │ bbb48afeb7.arm64 │ 5cf807d1ea.arm64 │ │ sec/op │ sec/op vs base │ ChaCha8-8 2.480n ± 0% 2.429n ± 0% -2.04% (p=0.000 n=20) PCG_DXSM-8 2.531n ± 0% 2.530n ± 0% ~ (p=0.877 n=20) SourceUint64-8 2.534n ± 0% 2.533n ± 0% ~ (p=0.732 n=20) GlobalInt64-8 2.172n ± 1% 4.794n ± 0% +120.67% (p=0.000 n=20) GlobalInt64Parallel-8 0.4320n ± 0% 0.9605n ± 0% +122.32% (p=0.000 n=20) GlobalUint64-8 2.182n ± 0% 4.770n ± 0% +118.58% (p=0.000 n=20) GlobalUint64Parallel-8 0.4307n ± 0% 0.9583n ± 0% +122.51% (p=0.000 n=20) Int64-8 4.107n ± 0% 4.104n ± 0% ~ (p=0.416 n=20) Uint64-8 4.080n ± 0% 4.080n ± 0% ~ (p=0.052 n=20) GlobalIntN1000-8 2.814n ± 2% 5.643n ± 0% +100.50% (p=0.000 n=20) IntN1000-8 4.141n ± 0% 4.139n ± 0% ~ (p=0.140 n=20) Int64N1000-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.313 n=20) Int64N1e8-8 4.140n ± 0% 4.139n ± 0% ~ (p=0.103 n=20) Int64N1e9-8 4.139n ± 0% 4.140n ± 0% ~ (p=0.761 n=20) Int64N2e9-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.636 n=20) Int64N1e18-8 5.266n ± 0% 5.326n ± 1% +1.14% (p=0.001 n=20) Int64N2e18-8 6.052n ± 0% 6.167n ± 0% +1.90% (p=0.000 n=20) Int64N4e18-8 8.826n ± 0% 9.051n ± 0% +2.55% (p=0.000 n=20) Int32N1000-8 4.127n ± 0% 4.132n ± 0% +0.12% (p=0.000 n=20) Int32N1e8-8 4.126n ± 0% 4.131n ± 0% +0.12% (p=0.000 n=20) Int32N1e9-8 4.127n ± 0% 4.132n ± 0% +0.12% (p=0.000 n=20) Int32N2e9-8 4.132n ± 0% 4.131n ± 0% ~ (p=0.017 n=20) Float32-8 4.109n ± 0% 4.105n ± 0% ~ (p=0.379 n=20) Float64-8 4.107n ± 0% 4.106n ± 0% ~ (p=0.867 n=20) ExpFloat64-8 5.339n ± 0% 5.383n ± 0% +0.82% (p=0.000 n=20) NormFloat64-8 5.735n ± 0% 5.737n ± 1% ~ (p=0.856 n=20) Perm3-8 26.65n ± 0% 26.80n ± 1% +0.58% (p=0.000 n=20) Perm30-8 194.8n ± 1% 197.0n ± 0% +1.18% (p=0.000 n=20) Perm30ViaShuffle-8 156.6n ± 0% 157.6n ± 1% +0.61% (p=0.000 n=20) ShuffleOverhead-8 124.9n ± 0% 125.5n ± 0% +0.52% (p=0.000 n=20) Concurrent-8 2.434n ± 3% 5.066n ± 0% +108.09% (p=0.000 n=20) goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ bbb48afeb7.386 │ 5cf807d1ea.386 │ │ sec/op │ sec/op vs base │ ChaCha8-32 11.295n ± 1% 4.748n ± 2% -57.96% (p=0.000 n=20) PCG_DXSM-32 7.693n ± 1% 7.738n ± 2% ~ (p=0.542 n=20) SourceUint64-32 7.658n ± 2% 7.622n ± 2% ~ (p=0.344 n=20) GlobalInt64-32 3.473n ± 2% 7.526n ± 2% +116.73% (p=0.000 n=20) GlobalInt64Parallel-32 0.3198n ± 0% 0.5444n ± 0% +70.22% (p=0.000 n=20) GlobalUint64-32 3.612n ± 0% 7.575n ± 1% +109.69% (p=0.000 n=20) GlobalUint64Parallel-32 0.3168n ± 0% 0.5403n ± 0% +70.51% (p=0.000 n=20) Int64-32 7.673n ± 2% 7.789n ± 1% ~ (p=0.122 n=20) Uint64-32 7.773n ± 1% 7.827n ± 2% ~ (p=0.920 n=20) GlobalIntN1000-32 6.268n ± 1% 9.581n ± 1% +52.87% (p=0.000 n=20) IntN1000-32 10.33n ± 2% 10.45n ± 1% ~ (p=0.233 n=20) Int64N1000-32 10.98n ± 2% 11.01n ± 1% ~ (p=0.401 n=20) Int64N1e8-32 11.19n ± 2% 10.97n ± 1% ~ (p=0.033 n=20) Int64N1e9-32 11.06n ± 1% 11.08n ± 1% ~ (p=0.498 n=20) Int64N2e9-32 11.10n ± 1% 11.01n ± 2% ~ (p=0.995 n=20) Int64N1e18-32 15.23n ± 2% 15.04n ± 1% ~ (p=0.973 n=20) Int64N2e18-32 15.89n ± 1% 15.85n ± 1% ~ (p=0.409 n=20) Int64N4e18-32 18.96n ± 2% 19.34n ± 2% ~ (p=0.048 n=20) Int32N1000-32 10.46n ± 2% 10.44n ± 2% ~ (p=0.480 n=20) Int32N1e8-32 10.46n ± 2% 10.49n ± 2% ~ (p=0.951 n=20) Int32N1e9-32 10.28n ± 2% 10.26n ± 1% ~ (p=0.431 n=20) Int32N2e9-32 10.50n ± 2% 10.44n ± 2% ~ (p=0.249 n=20) Float32-32 13.80n ± 2% 13.80n ± 2% ~ (p=0.751 n=20) Float64-32 23.55n ± 2% 23.87n ± 0% ~ (p=0.408 n=20) ExpFloat64-32 15.36n ± 1% 15.29n ± 2% ~ (p=0.316 n=20) NormFloat64-32 13.57n ± 1% 13.79n ± 1% +1.66% (p=0.005 n=20) Perm3-32 45.70n ± 2% 46.99n ± 2% +2.81% (p=0.001 n=20) Perm30-32 399.0n ± 1% 403.8n ± 1% +1.19% (p=0.006 n=20) Perm30ViaShuffle-32 349.0n ± 1% 350.4n ± 1% ~ (p=0.909 n=20) ShuffleOverhead-32 322.3n ± 1% 323.8n ± 1% ~ (p=0.410 n=20) Concurrent-32 3.331n ± 1% 7.312n ± 1% +119.50% (p=0.000 n=20) For #61716. Change-Id: Ibdddeed85c34d9ae397289dc899e04d4845f9ed2 Reviewed-on: https://go-review.googlesource.com/c/go/+/516860 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Filippo Valsorda <filippo@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-12-01cmd/compile: correct code generation for right shifts on riscv64Joel Sing
The code generation on riscv64 will currently result in incorrect assembly when a 32 bit integer is right shifted by an amount that exceeds the size of the type. In particular, this occurs when an int32 or uint32 is cast to a 64 bit type and right shifted by a value larger than 31. Fix this by moving the SRAW/SRLW conversion into the right shift rules and removing the SignExt32to64/ZeroExt32to64. Add additional rules that rewrite to SRAIW/SRLIW when the shift is less than the size of the type, or replace/eliminate the shift when it exceeds the size of the type. Add SSA tests that would have caught this issue. Also add additional codegen tests to ensure that the resulting assembly is what we expect in these overflow cases. Fixes #64285 Change-Id: Ie97b05668597cfcb91413afefaab18ee1aa145ec Reviewed-on: https://go-review.googlesource.com/c/go/+/545035 Reviewed-by: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: M Zhuo <mzh@golangcn.org> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-11-30cmd/compile: fix memcombine pass for big endian, > 1 byte elementsKeith Randall
The shift amounts were wrong in this case, leading to miscompilation of load combining. Also the store combining was not triggering when it should. Fixes #64468 Change-Id: Iaeb08972c5fc1d6f628800334789c6af7216e87b Reviewed-on: https://go-review.googlesource.com/c/go/+/546355 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-17all: add missing copyright headerJes Cok
Change-Id: Ic61fb181923159e80a86a41582e83ec466ab9bc4 GitHub-Last-Rev: 92469845665fa1f864d257c8bc175201a43b4d43 GitHub-Pull-Request: golang/go#64080 Reviewed-on: https://go-review.googlesource.com/c/go/+/541741 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Than McIntosh <thanm@google.com> Run-TryBot: Jes Cok <xigua67damn@gmail.com>
2023-11-13cmd/compile: support lookup of functions from export dataMichael Pratt
As of CL 539699, PGO-based devirtualization supports devirtualization of function values in addition to interface method calls. As with CL 497175, we need to explicitly look up functions from export data that may not be imported already. Symbol naming is ambiguous (`foo.Bar.func1` could be a closure or a method), so we simply attempt to do both types of lookup. That said, closures are defined in export data only as OCLOSURE nodes in the enclosing function, which this CL does not yet attempt to expand. For #61577. Change-Id: Ic7205b046218a4dfb8c4162ece3620ed1c3cb40a Reviewed-on: https://go-review.googlesource.com/c/go/+/540258 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2023-11-13cmd/compile: initial function value devirtualizationMichael Pratt
Today, PGO-based devirtualization only applies to interface calls. This CL extends initial support to function values (i.e., function/closure pointers passed as arguments or stored in a struct). This CL is a minimal implementation with several limitations. * Export data lookup of function value callees not implemented (equivalent of CL 497175; done in CL 540258). * Callees must be standard static functions. Callees that are closures (requiring closure context) are not supported. For #61577. Change-Id: I7d328859035249e176294cd0d9885b2d08c853f6 Reviewed-on: https://go-review.googlesource.com/c/go/+/539699 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-10runtime: add execution tracer v2 behind GOEXPERIMENT=exectracer2Michael Anthony Knyszek
This change mostly implements the design described in #60773 and includes a new scalable parser for the new trace format, available in internal/trace/v2. I'll leave this commit message short because this is clearly an enormous CL with a lot of detail. This change does not hook up the new tracer into cmd/trace yet. A follow-up CL will handle that. For #60773. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-amd64-longtest-race Change-Id: I5d2aca2cc07580ed3c76a9813ac48ec96b157de0 Reviewed-on: https://go-review.googlesource.com/c/go/+/494187 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-10runtime: fix user arena heap bits writing on big endian platformsMichael Anthony Knyszek
Currently the user arena code writes heap bits to the (*mspan).heapBits space with the platform-specific byte ordering (the heap bits are written and managed as uintptrs). However, the compiler always emits GC metadata for types in little endian. Because the scanning part of the code that loads through the type pointer in the allocation header expects little endian ordering, we end up with the wrong byte ordering in GC when trying to scan arena memory. Fix this by writing out the user arena heap bits in little endian on big endian platforms. This means that the space returned by (*mspan).heapBits has a different meaning for user arenas and small object spans, which is a little odd, so I documented it. To reduce the chance of misuse of the writeHeapBits API, which now writes out heap bits in a different ordering than writeSmallHeapBits on big endian platforms, this change also renames writeHeapBits to writeUserArenaHeapBits. Much of this can be avoided in the future if the compiler were to write out the pointer/scalar bits as an array of uintptr values instead of plain bytes. That's too big of a change for right now though. This change is a no-op on little endian platforms. I confirmed it by checking for any assembly code differences in the runtime test binary. There were none. With this change, the arena tests pass on ppc64. Fixes #64048. Change-Id: If077d003872fcccf5a154ff5d8441a58582061bb Reviewed-on: https://go-review.googlesource.com/c/go/+/541315 Run-TryBot: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-11-09runtime: implement experiment to replace heap bitmap with alloc headersMichael Anthony Knyszek
This change replaces the 1-bit-per-word heap bitmap for most size classes with allocation headers for objects that contain pointers. The header consists of a single pointer to a type. All allocations with headers are treated as implicitly containing one or more instances of the type in the header. As the name implies, headers are usually stored as the first word of an object. There are two additional exceptions to where headers are stored and how they're used. Objects smaller than 512 bytes do not have headers. Instead, a heap bitmap is reserved at the end of spans for objects of this size. A full word of overhead is too much for these small objects. The bitmap is of the same format of the old bitmap, minus the noMorePtrs bits which are unnecessary. All the objects <512 bytes have a bitmap less than a pointer-word in size, and that was the granularity at which noMorePtrs could stop scanning early anyway. Objects that are larger than 32 KiB (which have their own span) have their headers stored directly in the span, to allow power-of-two-sized allocations to not spill over into an extra page. The full implementation is behind GOEXPERIMENT=allocheaders. The purpose of this change is performance. First and foremost, with headers we no longer have to unroll pointer/scalar data at allocation time for most size classes. Small size classes still need some unrolling, but their bitmaps are small so we can optimize that case fairly well. Larger objects effectively have their pointer/scalar data unrolled on-demand from type data, which is much more compactly represented and results in less TLB pressure. Furthermore, since the headers are usually right next to the object and where we're about to start scanning, we get an additional temporal locality benefit in the data cache when looking up type metadata. The pointer/scalar data is now effectively unrolled on-demand, but it's also simpler to unroll than before; that unrolled data is never written anywhere, and for arrays we get the benefit of retreading the same data per element, as opposed to looking it up from scratch for each pointer-word of bitmap. Lastly, because we no longer have a heap bitmap that spans the entire heap, there's a flat 1.5% memory use reduction. This is balanced slightly by some objects possibly being bumped up a size class, but most objects are not tightly optimized to size class sizes so there's some memory to spare, making the header basically free in those cases. See the follow-up CL which turns on this experiment by default for benchmark results. (CL 538217.) Change-Id: I4c9034ee200650d06d8bdecd579d5f7c1bbf1fc5 Reviewed-on: https://go-review.googlesource.com/c/go/+/437955 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-27cmd/compile: rework TestPGOHash to not rebuild dependenciesCherry Mui
TestPGOHash may rebuild dependencies as we pass -trimpath to the go command. This CL makes it pass -trimpath compiler flag to only the current package instead, as we only need the current package to have a stable source file path. Also refactor buildPGOInliningTest to only take compiler flags, not go flags, to avoid accidental rebuild. Should fix #63733. Change-Id: Iec6c4e90cf659790e21083ee2e697f518234c5b9 Reviewed-on: https://go-review.googlesource.com/c/go/+/535915 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Bryan Mills <bcmills@google.com>
2023-10-13cmd/compile: lookup indirect callees from export data for devirtualizationMichael Pratt
Today, the PGO IR graph only contains entries for ir.Func loaded into the package. This can include functions from transitive dependencies, but only if they happen to be referenced by something in the current package. If they are not referenced, noder never bothers to load them. This leads to a deficiency in PGO devirtualization: some callee methods are available in transitive dependencies but do not devirtualize because they happen to not get loaded from export data. Resolve this by adding an explicit lookup from export data of callees mentioned in the profile. I have chosen to do this during loading of the profile for simplicity: the PGO IR graph always contains all of the functions we might need. That said, it isn't strictly necessary. PGO devirtualization could do the lookup lazily if it decides it actually needs a method. This saves work at the expense of a bit more complexity, but I've chosen the simpler approach for now as I measured the cost of this as significantly less than the rest of PGO loading. For #61577. Change-Id: Ieafb2a549510587027270ee6b4c3aefd149a901f Reviewed-on: https://go-review.googlesource.com/c/go/+/497175 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-10-06cmd/compile: use cache in front of type assert runtime callKeith Randall
That way we don't need to call into the runtime for every type assertion (to an interface type). name old time/op new time/op delta TypeAssert-24 3.78ns ± 3% 1.00ns ± 1% -73.53% (p=0.000 n=10+8) Change-Id: I0ba308aaf0f24a5495b4e13c814d35af0c58bfde Reviewed-on: https://go-review.googlesource.com/c/go/+/529316 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2023-10-06cmd/compile: add a cache to interface type switchesKeith Randall
That way we don't need to call into the runtime when the type being switched on has been seen many times before. The cache is just a hash table of a sample of all the concrete types that have been switched on at that source location. We record the matching case number and the resulting itab for each concrete input type. The caches seldom get large. The only two in a run of all.bash that get more than 100 entries, even with the sampling rate set to 1, are test/fixedbugs/issue29264.go, with 101 test/fixedbugs/issue29312.go, with 254 Both happen at the type switch in fmt.(*pp).handleMethods, perhaps unsurprisingly. name old time/op new time/op delta SwitchInterfaceTypePredictable-24 25.8ns ± 2% 2.5ns ± 3% -90.43% (p=0.000 n=10+10) SwitchInterfaceTypeUnpredictable-24 37.5ns ± 2% 11.2ns ± 1% -70.02% (p=0.000 n=10+10) Change-Id: I4961ac9547b7f15b03be6f55cdcb972d176955eb Reviewed-on: https://go-review.googlesource.com/c/go/+/526658 Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Keith Randall <khr@google.com>
2023-10-06cmd/compile: improve interface type switchesKeith Randall
For type switches where the targets are interface types, call into the runtime once instead of doing a sequence of assert* calls. name old time/op new time/op delta SwitchInterfaceTypePredictable-24 26.6ns ± 1% 25.8ns ± 2% -2.86% (p=0.000 n=10+10) SwitchInterfaceTypeUnpredictable-24 39.3ns ± 1% 37.5ns ± 2% -4.57% (p=0.000 n=10+10) Not super helpful by itself, but this code organization allows followon CLs that add caching to the lookups. Change-Id: I7967f85a99171faa6c2550690e311bea8b54b01c Reviewed-on: https://go-review.googlesource.com/c/go/+/526657 Reviewed-by: Matthew Dempsky <mdempsky@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com>
2023-10-04cmd/compile: adjust GOSSAFUNC html dumping to be more ABI-awareDavid Chase
Uses ,ABI instead of <ABI> because of problems with shell escaping and windows file names, however if someone goes to all the trouble of escaping the linker syntax and uses that instead, that works too. Examples: ``` GOSSAFUNC=runtime.exitsyscall go build main.go \# runtime dumped SSA for exitsyscall,0 to ../../src/loopvar/ssa.html dumped SSA for exitsyscall,1 to ../../src/loopvar/ssa.html GOSSADIR=`pwd` GOSSAFUNC=runtime.exitsyscall go build main.go \# runtime dumped SSA for exitsyscall,0 to ../../src/loopvar/runtime.exitsyscall,0.html dumped SSA for exitsyscall,1 to ../../src/loopvar/runtime.exitsyscall,1.html GOSSAFUNC=runtime.exitsyscall,0 go build main.go \# runtime dumped SSA for exitsyscall,0 to ../../src/loopvar/ssa.html GOSSAFUNC=runtime.exitsyscall\<1\> go build main.go \# runtime dumped SSA for exitsyscall,1 to ../../src/loopvar/ssa.html ``` Change-Id: Ia1138b61c797d0de49dbfae702dc306b9650a7f8 Reviewed-on: https://go-review.googlesource.com/c/go/+/532475 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: David Chase <drchase@google.com>
2023-09-19cmd/compile: add pgohash for debugging/bisecting PGO optimizationsCherry Mui
When a PGO build fails or produces incorrect program, it is often unclear what the problem is. Add pgo hash so we can bisect to individual optimization decisions, which often helps debugging. Related to #58153. Change-Id: I651ffd9c53bad60f2f28c8ec2a90a3f532982712 Reviewed-on: https://go-review.googlesource.com/c/go/+/528400 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2023-09-13cmd/compile/internal/abi: use Type.RegistersMatthew Dempsky
Now that types can self-report how many registers they need, it's much easier to determine whether a parameter will fit into the available registers: simply compare the number of registers needed against the number of registers still available. This also eliminates the need for the NumParamRegs cache. Also, the new code in NumParamRegs is stricter in only allowing it to be called on types that can actually be passed in registers, which requires a test to be corrected for that. While here, change mkstruct to a variadic function, so the call sites require less boilerplate. Change-Id: Iebe1a0456a8053a10e551e5da796014e5b1b695b Reviewed-on: https://go-review.googlesource.com/c/go/+/527339 Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Auto-Submit: Matthew Dempsky <mdempsky@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Heschi Kreinick <heschi@google.com>
2023-09-05all: use ^TestName$ regular pattern for invoking a single testDmitri Shuralyov
Use ^ and $ in the -run flag regular expression value when the intention is to invoke a single named test. This removes the reliance on there not being another similarly named test to achieve the intended result. In particular, package syscall has tests named TestUnshareMountNameSpace and TestUnshareMountNameSpaceChroot that both trigger themselves setting GO_WANT_HELPER_PROCESS=1 to run alternate code in a helper process. As a consequence of overlap in their test names, the former was inadvertently triggering one too many helpers. Spotted while reviewing CL 525196. Apply the same change in other places to make it easier for code readers to see that said tests aren't running extraneous tests. The unlikely cases of -run=TestSomething intentionally being used to run all tests that have the TestSomething substring in the name can be better written as -run=^.*TestSomething.*$ or with a comment so it is clear it wasn't an oversight. Change-Id: Iba208aba3998acdbf8c6708e5d23ab88938bfc1e Reviewed-on: https://go-review.googlesource.com/c/go/+/524948 Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Kirill Kolyshkin <kolyshkin@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-09-04runtime: introduce nextslicecapEgon Elbre
This allows to reuse the slice cap computation across specialized growslice funcs. Updates #49480 Change-Id: Ie075d9c3075659ea14c11d51a9cd4ed46aa0e961 Reviewed-on: https://go-review.googlesource.com/c/go/+/495876 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Egon Elbre <egonelbre@gmail.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Ian Lance Taylor <iant@golang.org>
2023-08-23cmd/compile: use jump tables for large type switchesKeith Randall
For large interface -> concrete type switches, we can use a jump table on some bits of the type hash instead of a binary search on the type hash. name old time/op new time/op delta SwitchTypePredictable-24 1.99ns ± 2% 1.78ns ± 5% -10.87% (p=0.000 n=10+10) SwitchTypeUnpredictable-24 11.0ns ± 1% 9.1ns ± 2% -17.55% (p=0.000 n=7+9) Change-Id: Ida4768e5d62c3ce1c2701288b72664aaa9e64259 Reviewed-on: https://go-review.googlesource.com/c/go/+/521497 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Keith Randall <khr@golang.org>
2023-08-18cmd/compile/internal/ir: remove AsNodeMatthew Dempsky
Except for a single call site in escape analysis, every use of ir.AsNode involves a types.Object that's known to contain an *ir.Name. Asserting directly to that type makes the code simpler and more efficient. The one use in escape analysis is extended to handle nil correctly without it. Change-Id: I694ae516903e541341d82c2f65a9155e4b0a9809 Reviewed-on: https://go-review.googlesource.com/c/go/+/520775 TryBot-Bypass: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> Auto-Submit: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2023-08-17cmd/compile/internal/ir: add typ parameter to NewNameAtMatthew Dempsky
Start making progress towards constructing IR with proper types. Change-Id: Iad32c1cf60f30ceb8e07c31c8871b115570ac3bd Reviewed-on: https://go-review.googlesource.com/c/go/+/520263 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2023-06-06cmd/compile: adjust PGO devirtualization diagnostic messageCherry Mui
Make it more consistent with the static devirtualization diagnostic message. Keep the print of concrete callee's method name, as it is clearer. Change-Id: Ibe9b40253eaff2c0071353a2b388177213488822 Reviewed-on: https://go-review.googlesource.com/c/go/+/500960 Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>
2023-06-03cmd/compile/internal/devirtualize: devirtualize methods in other packages ↵thepudds
if current package has a concrete reference The new PGO-driven indirect call specialization from CL 492436 in theory should allow for devirtualization on methods in another package when those methods are directly referenced in the current package. However, inline.InlineImpossible was checking for a zero-length fn.Body and would cause devirtualization to fail with a debug log message like: "should not PGO devirtualize (*Speaker1).Speak: no function body" Previously, the logic in inline.InlineImpossible was only called on local functions, but with PGO-based devirtualization, it can now be called on imported functions, where inlinable imported functions will have a zero-length fn.Body but a non-nil fn.Inl. We update inline.InlineImpossible to handle imported functions by adding a call to typecheck.HaveInlineBody in the check that was previously failing. For the test, we need to have a hopefully temporary workaround of adding explicit references to the callees in another package for devirtualization to work. CL 497175 or similar should enable removing this workaround. Fixes #60561 Updates #59959 Change-Id: I48449b7d8b329d84151bd3b506b8093c262eb2a3 GitHub-Last-Rev: 2d53c55fd895ad8fefd25510a6e6969e89d54a6d GitHub-Pull-Request: golang/go#60565 Reviewed-on: https://go-review.googlesource.com/c/go/+/500155 Run-TryBot: thepudds <thepudds1460@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-22cmd/compile: enable PGO-driven call devirtualizationMichael Pratt
This CL is originally based on CL 484838 from rajbarik@uber.com. Add a new PGO-based devirtualize pass. This pass conditionally devirtualizes interface calls for the hottest callee. That is, it performs a transformation like: type Iface interface { Foo() } type Concrete struct{} func (Concrete) Foo() {} func foo(i Iface) { i.Foo() } to: func foo(i Iface) { if c, ok := i.(Concrete); ok { c.Foo() } else { i.Foo() } } The primary benefit of this transformation is enabling inlining of the direct calls. Today this change has no impact on the escape behavior, as the fallback interface always forces an escape. But improving escape analysis to take advantage of this is an area of potential work. This CL is the bare minimum of a devirtualization implementation. There are still numerous limitations: * Callees not directly referenced in the current package can be missed (even if they are in the transitive dependences). * Callees not in the transitive dependencies of the current package are missed. * Only interface method calls are supported, not other indirect function calls. * Multiple calls to compatible interfaces on the same line cannot be distinguished and will use the same callee target. * Callees that only partially implement an interface (they are embedded in another type that completes the interface) cannot be devirtualized. * Others, mentioned in TODOs. Fixes #59959 Change-Id: I8bedb516139695ee4069650b099d05957b7ce5ee Reviewed-on: https://go-review.googlesource.com/c/go/+/492436 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-05cmd/compile: allow more inlining of functions that construct closuresThan McIntosh
[This is a roll-forward of CL 479095, which was reverted due to a bad interaction between inlining and escape analysis, then later fixed first with an attempt in CL 482355, then again in CL 484859, and then one more time with CL 492135.] Currently, when the inliner is determining if a function is inlineable, it descends into the bodies of closures constructed by that function. This has several unfortunate consequences: - If the closure contains a disallowed operation (e.g., a defer), then the outer function can't be inlined. It makes sense that the *closure* can't be inlined in this case, but it doesn't make sense to punish the function that constructs the closure. - The hairiness of the closure counts against the inlining budget of the outer function. Since we currently copy the closure body when inlining the outer function, this makes sense from the perspective of export data size and binary size, but ultimately doesn't make much sense from the perspective of what should be inlineable. - Since the inliner walks into every closure created by an outer function in addition to starting a walk at every closure, this adds an n^2 factor to inlinability analysis. This CL simply drops this behavior. In std, this makes 57 more functions inlinable, and disallows inlining for 10 (due to the basic instability of our bottom-up inlining approach), for an net increase of 47 inlinable functions (+0.6%). This will help significantly with the performance of the functions to be added for #56102, which have a somewhat complicated nesting of closures with a performance-critical fast path. The downside of this seems to be a potential increase in export data and text size, but the practical impact of this seems to be negligible: │ before │ after │ │ bytes │ bytes vs base │ Go/binary 15.12Mi ± 0% 15.14Mi ± 0% +0.16% (n=1) Go/text 5.220Mi ± 0% 5.237Mi ± 0% +0.32% (n=1) Compile/binary 22.92Mi ± 0% 22.94Mi ± 0% +0.07% (n=1) Compile/text 8.428Mi ± 0% 8.435Mi ± 0% +0.08% (n=1) Change-Id: I5f75fcceb177f05853996b75184a486528eafe96 Reviewed-on: https://go-review.googlesource.com/c/go/+/492017 Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Than McIntosh <thanm@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2023-04-21cmd/compile: introduce separate memory op combining passKeith Randall
Memory op combining is currently done using arch-specific rewrite rules. Instead, do them as a arch-independent rewrite pass. This ensures that all architectures (with unaligned loads & stores) get equal treatment. This removes a lot of rewrite rules. The new pass is a bit more comprehensive. It handles things like out-of-order writes and is careful not to apply partial optimizations that then block further optimizations. Change-Id: I780ff3bb052475cd725a923309616882d25b8d9e Reviewed-on: https://go-review.googlesource.com/c/go/+/478475 Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com>
2023-04-17Revert "cmd/compile: allow more inlining of functions that construct closures"Michael Knyszek
This reverts commit f8162a0e726f4b3a9df60a37e8ca7883dde61914. Reason for revert: https://github.com/golang/go/issues/59680 Change-Id: I91821c691a2d019ff0ad5b69509e32f3d56b8f67 Reviewed-on: https://go-review.googlesource.com/c/go/+/485498 Reviewed-by: Russ Cox <rsc@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Michael Knyszek <mknyszek@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2023-04-17cmd/compile: allow more inlining of functions that construct closuresThan McIntosh
[This is a roll-forward of CL 479095, which was reverted due to a bad interaction between inlining and escape analysis, then later fixed fist with an attempt in CL 482355, then again in 484859 .] Currently, when the inliner is determining if a function is inlineable, it descends into the bodies of closures constructed by that function. This has several unfortunate consequences: - If the closure contains a disallowed operation (e.g., a defer), then the outer function can't be inlined. It makes sense that the *closure* can't be inlined in this case, but it doesn't make sense to punish the function that constructs the closure. - The hairiness of the closure counts against the inlining budget of the outer function. Since we currently copy the closure body when inlining the outer function, this makes sense from the perspective of export data size and binary size, but ultimately doesn't make much sense from the perspective of what should be inlineable. - Since the inliner walks into every closure created by an outer function in addition to starting a walk at every closure, this adds an n^2 factor to inlinability analysis. This CL simply drops this behavior. In std, this makes 57 more functions inlinable, and disallows inlining for 10 (due to the basic instability of our bottom-up inlining approach), for an net increase of 47 inlinable functions (+0.6%). This will help significantly with the performance of the functions to be added for #56102, which have a somewhat complicated nesting of closures with a performance-critical fast path. The downside of this seems to be a potential increase in export data and text size, but the practical impact of this seems to be negligible: │ before │ after │ │ bytes │ bytes vs base │ Go/binary 15.12Mi ± 0% 15.14Mi ± 0% +0.16% (n=1) Go/text 5.220Mi ± 0% 5.237Mi ± 0% +0.32% (n=1) Compile/binary 22.92Mi ± 0% 22.94Mi ± 0% +0.07% (n=1) Compile/text 8.428Mi ± 0% 8.435Mi ± 0% +0.08% (n=1) Updates #56102. Change-Id: I6e938d596992ffb473cf51e7e598f372ce08deb0 Reviewed-on: https://go-review.googlesource.com/c/go/+/484860 Run-TryBot: Than McIntosh <thanm@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2023-04-14cmd/compile: add test that non-name call does not allocateCuong Manh Le
Updates #57434 Change-Id: Ib90c228f95c3d61204e60f63d7de55884d839e05 Reviewed-on: https://go-review.googlesource.com/c/go/+/459496 Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Bryan Mills <bcmills@google.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2023-04-14cmd/compile: add math benchmarksJoel Sing
This adds benchmarks for division and modulus of 64 bit signed and unsigned integers. Updates #59089 Change-Id: Ie757c6d74a1f355873e79619eae26ece21a8f23e Reviewed-on: https://go-review.googlesource.com/c/go/+/482656 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>
2023-04-14Revert "cmd/compile: allow more inlining of functions that construct closures"Than McIntosh
This reverts commit http://go.dev/cl/c/482356. Reason for revert: Reverting this change again, since it is causing additional failures in google-internal testing. Change-Id: I9234946f62e5bb18c2f873a65e8b298d04af0809 Reviewed-on: https://go-review.googlesource.com/c/go/+/484735 Reviewed-by: Florian Zenker <floriank@google.com> Run-TryBot: Than McIntosh <thanm@google.com> Auto-Submit: Than McIntosh <thanm@google.com> Reviewed-by: Than McIntosh <thanm@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2023-04-07cmd/compile: allow more inlining of functions that construct closuresThan McIntosh
[This is a roll-forward of CL 479095, which was reverted due to a bad interaction between inlining and escape analysis since fixed in CL 482355.] Currently, when the inliner is determining if a function is inlineable, it descends into the bodies of closures constructed by that function. This has several unfortunate consequences: - If the closure contains a disallowed operation (e.g., a defer), then the outer function can't be inlined. It makes sense that the *closure* can't be inlined in this case, but it doesn't make sense to punish the function that constructs the closure. - The hairiness of the closure counts against the inlining budget of the outer function. Since we currently copy the closure body when inlining the outer function, this makes sense from the perspective of export data size and binary size, but ultimately doesn't make much sense from the perspective of what should be inlineable. - Since the inliner walks into every closure created by an outer function in addition to starting a walk at every closure, this adds an n^2 factor to inlinability analysis. This CL simply drops this behavior. In std, this makes 57 more functions inlinable, and disallows inlining for 10 (due to the basic instability of our bottom-up inlining approach), for an net increase of 47 inlinable functions (+0.6%). This will help significantly with the performance of the functions to be added for #56102, which have a somewhat complicated nesting of closures with a performance-critical fast path. The downside of this seems to be a potential increase in export data and text size, but the practical impact of this seems to be negligible: │ before │ after │ │ bytes │ bytes vs base │ Go/binary 15.12Mi ± 0% 15.14Mi ± 0% +0.16% (n=1) Go/text 5.220Mi ± 0% 5.237Mi ± 0% +0.32% (n=1) Compile/binary 22.92Mi ± 0% 22.94Mi ± 0% +0.07% (n=1) Compile/text 8.428Mi ± 0% 8.435Mi ± 0% +0.08% (n=1) Updates #56102. Change-Id: I1f4fc96c71609c8feb59fecdb92b69ba7e3b5b41 Reviewed-on: https://go-review.googlesource.com/c/go/+/482356 Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Run-TryBot: Than McIntosh <thanm@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-04-03cmd/compile/internal/test: skip testpoint due to revert of CL 479095Than McIntosh
Skip one of the testpoints that verifies inlining, since it no longer passes as a result of reverting CL 479095. Once we roll forward with a new version of CL 479095 we can re-enable this testpoint. Change-Id: I41f6fb3fce78f31e60c5f0ed2856be0e66865149 Reviewed-on: https://go-review.googlesource.com/c/go/+/481755 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Than McIntosh <thanm@google.com> Reviewed-by: Bryan Mills <bcmills@google.com>
2023-03-31sync: implement OnceFunc, OnceValue, and OnceValuesAustin Clements
This adds the three functions from #56102 to the sync package. These provide a convenient API for the most common uses of sync.Once. The performance of these is comparable to direct use of sync.Once: $ go test -run ^$ -bench OnceFunc\|OnceVal -count 20 | benchstat -row .name -col /v goos: linux goarch: amd64 pkg: sync cpu: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz │ Once │ Global │ Local │ │ sec/op │ sec/op vs base │ sec/op vs base │ OnceFunc 1.3500n ± 6% 2.7030n ± 1% +100.22% (p=0.000 n=20) 0.3935n ± 0% -70.86% (p=0.000 n=20) OnceValue 1.3155n ± 0% 2.7460n ± 1% +108.74% (p=0.000 n=20) 0.5478n ± 1% -58.35% (p=0.000 n=20) The "Once" column represents the baseline of how code would typically express these patterns using sync.Once. "Global" binds the closure returned by OnceFunc/OnceValue to global, which is how I expect these to be used most of the time. Currently, this defeats some inlining opportunities, which roughly doubles the cost over sync.Once; however, it's still *extremely* fast. Finally, "Local" binds the returned closure to a local variable. This unlocks several levels of inlining and represents pretty much the best possible case for these APIs, but is also unlikely to happen in practice. In principle the compiler could recognize that the global in the "Global" case is initialized in place and never mutated and do the same optimizations it does in the "Local" case, but it currently does not. Fixes #56102 Change-Id: If7355eccd7c8de7288d89a4282ff15ab1469e420 Reviewed-on: https://go-review.googlesource.com/c/go/+/451356 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Andrew Gerrand <adg@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Caleb Spare <cespare@gmail.com> Auto-Submit: Austin Clements <austin@google.com>
2023-02-23cmd/compile: rework unbounded shift lowering on PPC64Paul E. Murphy
This reduces unbounded shift latency by one cycle, and may generate less instructions in some cases. When there is a choice whether to use doubleword or word shifts, use doubleword shifts. Doubleword shifts have fewer hardware scheduling restrictions across P8/P9/P10. Likewise, rework the shift sequence to allow the compare/shift/overshift values to compute in parallel, then choose the correct value. Some ANDCCconst rules also need reworked to ensure they simplify when used for their flag value. This commonly occurs when prove fails to identify a bounded shift (e.g foo32<<uint(x&31)). Change-Id: Ifc6ff4a865d68675e57745056db414b0eb6f2d34 Reviewed-on: https://go-review.googlesource.com/c/go/+/442597 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Than McIntosh <thanm@google.com> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-02-17runtime: remove the restriction that write barrier ptrs come in pairsKeith Randall
Future CLs will remove the invariant that pointers are always put in the write barrier in pairs. The behavior of the assembly code changes a bit, where instead of writing the pointers unconditionally and then checking for overflow, check for overflow first and then write the pointers. Also changed the write barrier flush function to not take the src/dst as arguments. Change-Id: I2ef708038367b7b82ea67cbaf505a1d5904c775c Reviewed-on: https://go-review.googlesource.com/c/go/+/447779 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> TryBot-Bypass: Keith Randall <khr@golang.org>
2023-02-10cmd/compile/internal/pgo: fix hard-coded PGO sample data positionFrederic Branczyk
This patch detects at which index position profiling samples that have the value-type samples count are, instead of the previously hard-coded position of index 1. Runtime generated profiles always generate CPU profiling data with the 0 index being CPU nanoseconds, and samples count at index 1, which is why this previously hasn't come up. This is a redo of CL 465135, now allowing empty profiles. Note that preprocessProfileGraph will already cause pgo.New to return nil for empty profiles. Fixes #58292 Change-Id: Ia6c94f0793f6ca9b0882b5e2c4d34f38e600c1e3 Reviewed-on: https://go-review.googlesource.com/c/go/+/466675 Run-TryBot: Michael Pratt <mpratt@google.com> Reviewed-by: Austin Clements <austin@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-02-08Revert "cmd/compile/internal/pgo: fix hard-coded PGO sample data position"Michael Pratt
This reverts CL 465135. Reason for revert: This broke cmd/go.TestScript/build_pgo on the linux-amd64-longtest builder: https://build.golang.org/log/8f8ed7bf576f891a06d295c4a5bca987c6e941d6 Change-Id: Ie2f2cc2731099eb28eda6b94dded4dfc34e29441 Reviewed-on: https://go-review.googlesource.com/c/go/+/466439 Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2023-02-08cmd/compile/internal/pgo: fix hard-coded PGO sample data positionFrederic Branczyk
This patch detects at which index position profiling samples that have the value-type samples count are, instead of the previously hard-coded position of index 1. Runtime generated profiles always generate CPU profiling data with the 0 index being CPU nanoseconds, and samples count at index 1, which is why this previously hasn't come up. Fixes #58292 Change-Id: Idde761d53b02259f3076c4e5dcb4a96a9d53df0e GitHub-Last-Rev: dabbf9f88c560286e150e9b136a27c3ac23c5ec1 GitHub-Pull-Request: golang/go#58294 Reviewed-on: https://go-review.googlesource.com/c/go/+/465135 Auto-Submit: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-02-06cmd/compile: replace os.MkdirTemp with T.TempDirOleksandr Redko
Updates #45402 Change-Id: Ieffd1c8b0b5e4e63024b5be2e1f910fb4411eb94 GitHub-Last-Rev: fa7418c8eb977b7214311e774f9df7a1220a3dfd GitHub-Pull-Request: golang/go#57940 Reviewed-on: https://go-review.googlesource.com/c/go/+/462896 Reviewed-by: Bryan Mills <bcmills@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org>
2023-01-31cmd/compile: cleanup atomic.Pointer[T] inline testCuong Manh Le
Updates #57410 Change-Id: I9be38e20c6b83d14f7785049a66de77ac7ecdf15 Reviewed-on: https://go-review.googlesource.com/c/go/+/463997 Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2023-01-26cmd/compile/internal/types: remove unneeded functionalityMatthew Dempsky
This CL removes a handful of features that were only needed for the pre-unified frontends. In particular, Type.Pkg was a hack for iexport so that go/types.Var.Pkg could be precisely populated for struct fields and signature parameters by gcimporter, but it's no longer necessary with the unified export data format because we now write export data directly from types2-supplied type descriptors. Several other features (e.g., OrigType, implicit interfaces, type parameters on signatures) are no longer relevant to the unified frontend, because it only uses types1 to represent instantiated generic types. Updates #57410. Change-Id: I84fd1da5e0b65d2ab91d244a7bb593821ee916e7 Reviewed-on: https://go-review.googlesource.com/c/go/+/458622 Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2023-01-25cmd: remove GOEXPERIMENT=nounified knobMatthew Dempsky
This CL removes the GOEXPERIMENT=nounified knob, and any conditional statements that depend on that knob. Further CLs to remove unreachable code follow this one. Updates #57410. Change-Id: I39c147e1a83601c73f8316a001705778fee64a91 Reviewed-on: https://go-review.googlesource.com/c/go/+/458615 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-01-23utf16: reduce utf16.Decode allocationsqmuntal
This CL avoids allocating in utf16.Decode for code point sequences with less than 64 elements. It does so by splitting the function in two, one that can be inlined that preallocates a buffer and the other that does the heavy-lifting. The mid-stack inliner will allocate the buffer in the caller stack, and in many cases this will be enough to avoid the allocation. unicode/utf16 benchmarks: name old time/op new time/op delta DecodeValidASCII-12 60.1ns ± 3% 16.0ns ±20% -73.40% (p=0.000 n=8+10) DecodeValidJapaneseChars-12 61.3ns ±10% 14.9ns ±39% -75.71% (p=0.000 n=10+10) name old alloc/op new alloc/op delta DecodeValidASCII-12 48.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10) DecodeValidJapaneseChars-12 48.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10) name old allocs/op new allocs/op delta DecodeValidASCII-12 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10) DecodeValidJapaneseChars-12 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10) I've also benchmarked os.File.ReadDir with this change applied to demonstrate that it does make a difference in the caller site, in this case via syscall.UTF16ToString: name old time/op new time/op delta ReadDir-12 592µs ± 8% 620µs ±16% ~ (p=0.280 n=10+10) name old alloc/op new alloc/op delta ReadDir-12 30.4kB ± 0% 22.4kB ± 0% -26.10% (p=0.000 n=8+10) name old allocs/op new allocs/op delta ReadDir-12 402 ± 0% 272 ± 0% -32.34% (p=0.000 n=10+10) Change-Id: I65cf5caa3fd3b3a466c0ed837a50a96e975bbe6b Reviewed-on: https://go-review.googlesource.com/c/go/+/453415 Reviewed-by: Damien Neil <dneil@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Alex Brainman <alex.brainman@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Quim Muntal <quimmuntal@gmail.com>
2023-01-20all: fix typos in go file commentsMarcel Meyer
This is the second round to look for spelling mistakes. This time the manual sifting of the result list was made easier by filtering out capitalized and camelcase words. grep -r --include '*.go' -E '^// .*$' . | aspell list | grep -E -x '[A-Za-z]{1}[a-z]*' | sort | uniq This PR will be imported into Gerrit with the title and first comment (this text) used to generate the subject and body of the Gerrit change. Change-Id: Ie8a2092aaa7e1f051aa90f03dbaf2b9aaf5664a9 GitHub-Last-Rev: fc2bd6e0c51652f13a7588980f1408af8e6080f5 GitHub-Pull-Request: golang/go#57737 Reviewed-on: https://go-review.googlesource.com/c/go/+/461595 Auto-Submit: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Robert Griesemer <gri@google.com>