go - Fork of Go programming language with my patches.

Age	Commit message (Collapse)	Author
2025-06-20	[dev.simd] sync: correct the type of runtime_StoreReluintptr	Cherry Mui
	runtime_StoreReluintptr linknames to internal/runtime/atomic.StoreReluintptr, which does not have a result. Change-Id: I468cce82985f391c221768188a5eaff43cbcd037 Reviewed-on: https://go-review.googlesource.com/c/go/+/683095 TryBot-Bypass: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com>
2024-06-07	sync: include links to the Go memory model in package documentation	Rodrigo Orselli
	The lack of links to https://go.dev/ref/mem in the sync package documentation makes it difficult to read for people who have no previous knowledge of that page. This PR includes the links where needed. Fixes #67891 Change-Id: I0e1344cc6d7b702f4cb2e55fe0fcee3eb089391a GitHub-Last-Rev: 427cf58aaeaae2e4b060248dd592e5fe8c6b7df4 GitHub-Pull-Request: golang/go#67892 Reviewed-on: https://go-review.googlesource.com/c/go/+/591395 Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-29	all: document legacy //go:linkname for modules with ≥100 dependents	Russ Cox
	For #67401. Change-Id: I015408a3f437c1733d97160ef2fb5da6d4efcc5c Reviewed-on: https://go-review.googlesource.com/c/go/+/587598 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org>
2024-05-23	all: document legacy //go:linkname for modules with ≥500 dependents	Russ Cox
	For #67401. Change-Id: I7dd28c3b01a1a647f84929d15412aa43ab0089ee Reviewed-on: https://go-review.googlesource.com/c/go/+/587575 Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-03-25	runtime: migrate internal/atomic to internal/runtime	Andy Pan
	For #65355 Change-Id: I65dd090fb99de9b231af2112c5ccb0eb635db2be Reviewed-on: https://go-review.googlesource.com/c/go/+/560155 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ibrahim Bazoka <ibrahimbazoka729@gmail.com> Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
2024-02-28	all: run go fmt	sivchari
	I ran go fmt to fix format on the entire repository. Change-Id: I2f09166b6b8ba0ffb0ba27f6500efb0ea4cf21ff Reviewed-on: https://go-review.googlesource.com/c/go/+/566835 Reviewed-by: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Auto-Submit: Ian Lance Taylor <iant@golang.org>
2024-02-26	sync: add available godoc link	cui fliter
	Change-Id: I9bc5fd29b0eec8ceadcfee2116de5e7524ef92c2 Reviewed-on: https://go-review.googlesource.com/c/go/+/539617 Auto-Submit: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: shuang cui <imcusg@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-12-05	math/rand, math/rand/v2: use ChaCha8 for global rand	Russ Cox
	Move ChaCha8 code into internal/chacha8rand and use it to implement runtime.rand, which is used for the unseeded global source for both math/rand and math/rand/v2. This also affects the calculation of the start point for iteration over very very large maps (when the 32-bit fastrand is not big enough). The benefit is that misuse of the global random number generators in math/rand and math/rand/v2 in contexts where non-predictable randomness is important for security reasons is no longer a security problem, removing a common mistake among programmers who are unaware of the different kinds of randomness. The cost is an extra 304 bytes per thread stored in the m struct plus 2-3ns more per random uint64 due to the more sophisticated algorithm. Using PCG looks like it would cost about the same, although I haven't benchmarked that. Before this, the math/rand and math/rand/v2 global generator was wyrand (https://github.com/wangyi-fudan/wyhash). For math/rand, using wyrand instead of the Mitchell/Reeds/Thompson ALFG was justifiable, since the latter was not any better. But for math/rand/v2, the global generator really should be at least as good as one of the well-studied, specific algorithms provided directly by the package, and it's not. (Wyrand is still reasonable for scheduling and cache decisions.) Good randomness does have a cost: about twice wyrand. Also rationalize the various runtime rand references. goos: linux goarch: amd64 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ bbb48afeb7.amd64 │ 5cf807d1ea.amd64 │ │ sec/op │ sec/op vs base │ ChaCha8-32 1.862n ± 2% 1.861n ± 2% ~ (p=0.825 n=20) PCG_DXSM-32 1.471n ± 1% 1.460n ± 2% ~ (p=0.153 n=20) SourceUint64-32 1.636n ± 2% 1.582n ± 1% -3.30% (p=0.000 n=20) GlobalInt64-32 2.087n ± 1% 3.663n ± 1% +75.54% (p=0.000 n=20) GlobalInt64Parallel-32 0.1042n ± 1% 0.2026n ± 1% +94.48% (p=0.000 n=20) GlobalUint64-32 2.263n ± 2% 3.724n ± 1% +64.57% (p=0.000 n=20) GlobalUint64Parallel-32 0.1019n ± 1% 0.1973n ± 1% +93.67% (p=0.000 n=20) Int64-32 1.771n ± 1% 1.774n ± 1% ~ (p=0.449 n=20) Uint64-32 1.863n ± 2% 1.866n ± 1% ~ (p=0.364 n=20) GlobalIntN1000-32 3.134n ± 3% 4.730n ± 2% +50.95% (p=0.000 n=20) IntN1000-32 2.489n ± 1% 2.489n ± 1% ~ (p=0.683 n=20) Int64N1000-32 2.521n ± 1% 2.516n ± 1% ~ (p=0.394 n=20) Int64N1e8-32 2.479n ± 1% 2.478n ± 2% ~ (p=0.743 n=20) Int64N1e9-32 2.530n ± 2% 2.514n ± 2% ~ (p=0.193 n=20) Int64N2e9-32 2.501n ± 1% 2.494n ± 1% ~ (p=0.616 n=20) Int64N1e18-32 3.227n ± 1% 3.205n ± 1% ~ (p=0.101 n=20) Int64N2e18-32 3.647n ± 1% 3.599n ± 1% ~ (p=0.019 n=20) Int64N4e18-32 5.135n ± 1% 5.069n ± 2% ~ (p=0.034 n=20) Int32N1000-32 2.657n ± 1% 2.637n ± 1% ~ (p=0.180 n=20) Int32N1e8-32 2.636n ± 1% 2.636n ± 1% ~ (p=0.763 n=20) Int32N1e9-32 2.660n ± 2% 2.638n ± 1% ~ (p=0.358 n=20) Int32N2e9-32 2.662n ± 2% 2.618n ± 2% ~ (p=0.064 n=20) Float32-32 2.272n ± 2% 2.239n ± 2% ~ (p=0.194 n=20) Float64-32 2.272n ± 1% 2.286n ± 2% ~ (p=0.763 n=20) ExpFloat64-32 3.762n ± 1% 3.744n ± 1% ~ (p=0.171 n=20) NormFloat64-32 3.706n ± 1% 3.655n ± 2% ~ (p=0.066 n=20) Perm3-32 32.93n ± 3% 34.62n ± 1% +5.13% (p=0.000 n=20) Perm30-32 202.9n ± 1% 204.0n ± 1% ~ (p=0.482 n=20) Perm30ViaShuffle-32 115.0n ± 1% 114.9n ± 1% ~ (p=0.358 n=20) ShuffleOverhead-32 112.8n ± 1% 112.7n ± 1% ~ (p=0.692 n=20) Concurrent-32 2.107n ± 0% 3.725n ± 1% +76.75% (p=0.000 n=20) goos: darwin goarch: arm64 pkg: math/rand/v2 │ bbb48afeb7.arm64 │ 5cf807d1ea.arm64 │ │ sec/op │ sec/op vs base │ ChaCha8-8 2.480n ± 0% 2.429n ± 0% -2.04% (p=0.000 n=20) PCG_DXSM-8 2.531n ± 0% 2.530n ± 0% ~ (p=0.877 n=20) SourceUint64-8 2.534n ± 0% 2.533n ± 0% ~ (p=0.732 n=20) GlobalInt64-8 2.172n ± 1% 4.794n ± 0% +120.67% (p=0.000 n=20) GlobalInt64Parallel-8 0.4320n ± 0% 0.9605n ± 0% +122.32% (p=0.000 n=20) GlobalUint64-8 2.182n ± 0% 4.770n ± 0% +118.58% (p=0.000 n=20) GlobalUint64Parallel-8 0.4307n ± 0% 0.9583n ± 0% +122.51% (p=0.000 n=20) Int64-8 4.107n ± 0% 4.104n ± 0% ~ (p=0.416 n=20) Uint64-8 4.080n ± 0% 4.080n ± 0% ~ (p=0.052 n=20) GlobalIntN1000-8 2.814n ± 2% 5.643n ± 0% +100.50% (p=0.000 n=20) IntN1000-8 4.141n ± 0% 4.139n ± 0% ~ (p=0.140 n=20) Int64N1000-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.313 n=20) Int64N1e8-8 4.140n ± 0% 4.139n ± 0% ~ (p=0.103 n=20) Int64N1e9-8 4.139n ± 0% 4.140n ± 0% ~ (p=0.761 n=20) Int64N2e9-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.636 n=20) Int64N1e18-8 5.266n ± 0% 5.326n ± 1% +1.14% (p=0.001 n=20) Int64N2e18-8 6.052n ± 0% 6.167n ± 0% +1.90% (p=0.000 n=20) Int64N4e18-8 8.826n ± 0% 9.051n ± 0% +2.55% (p=0.000 n=20) Int32N1000-8 4.127n ± 0% 4.132n ± 0% +0.12% (p=0.000 n=20) Int32N1e8-8 4.126n ± 0% 4.131n ± 0% +0.12% (p=0.000 n=20) Int32N1e9-8 4.127n ± 0% 4.132n ± 0% +0.12% (p=0.000 n=20) Int32N2e9-8 4.132n ± 0% 4.131n ± 0% ~ (p=0.017 n=20) Float32-8 4.109n ± 0% 4.105n ± 0% ~ (p=0.379 n=20) Float64-8 4.107n ± 0% 4.106n ± 0% ~ (p=0.867 n=20) ExpFloat64-8 5.339n ± 0% 5.383n ± 0% +0.82% (p=0.000 n=20) NormFloat64-8 5.735n ± 0% 5.737n ± 1% ~ (p=0.856 n=20) Perm3-8 26.65n ± 0% 26.80n ± 1% +0.58% (p=0.000 n=20) Perm30-8 194.8n ± 1% 197.0n ± 0% +1.18% (p=0.000 n=20) Perm30ViaShuffle-8 156.6n ± 0% 157.6n ± 1% +0.61% (p=0.000 n=20) ShuffleOverhead-8 124.9n ± 0% 125.5n ± 0% +0.52% (p=0.000 n=20) Concurrent-8 2.434n ± 3% 5.066n ± 0% +108.09% (p=0.000 n=20) goos: linux goarch: 386 pkg: math/rand/v2 cpu: AMD Ryzen 9 7950X 16-Core Processor │ bbb48afeb7.386 │ 5cf807d1ea.386 │ │ sec/op │ sec/op vs base │ ChaCha8-32 11.295n ± 1% 4.748n ± 2% -57.96% (p=0.000 n=20) PCG_DXSM-32 7.693n ± 1% 7.738n ± 2% ~ (p=0.542 n=20) SourceUint64-32 7.658n ± 2% 7.622n ± 2% ~ (p=0.344 n=20) GlobalInt64-32 3.473n ± 2% 7.526n ± 2% +116.73% (p=0.000 n=20) GlobalInt64Parallel-32 0.3198n ± 0% 0.5444n ± 0% +70.22% (p=0.000 n=20) GlobalUint64-32 3.612n ± 0% 7.575n ± 1% +109.69% (p=0.000 n=20) GlobalUint64Parallel-32 0.3168n ± 0% 0.5403n ± 0% +70.51% (p=0.000 n=20) Int64-32 7.673n ± 2% 7.789n ± 1% ~ (p=0.122 n=20) Uint64-32 7.773n ± 1% 7.827n ± 2% ~ (p=0.920 n=20) GlobalIntN1000-32 6.268n ± 1% 9.581n ± 1% +52.87% (p=0.000 n=20) IntN1000-32 10.33n ± 2% 10.45n ± 1% ~ (p=0.233 n=20) Int64N1000-32 10.98n ± 2% 11.01n ± 1% ~ (p=0.401 n=20) Int64N1e8-32 11.19n ± 2% 10.97n ± 1% ~ (p=0.033 n=20) Int64N1e9-32 11.06n ± 1% 11.08n ± 1% ~ (p=0.498 n=20) Int64N2e9-32 11.10n ± 1% 11.01n ± 2% ~ (p=0.995 n=20) Int64N1e18-32 15.23n ± 2% 15.04n ± 1% ~ (p=0.973 n=20) Int64N2e18-32 15.89n ± 1% 15.85n ± 1% ~ (p=0.409 n=20) Int64N4e18-32 18.96n ± 2% 19.34n ± 2% ~ (p=0.048 n=20) Int32N1000-32 10.46n ± 2% 10.44n ± 2% ~ (p=0.480 n=20) Int32N1e8-32 10.46n ± 2% 10.49n ± 2% ~ (p=0.951 n=20) Int32N1e9-32 10.28n ± 2% 10.26n ± 1% ~ (p=0.431 n=20) Int32N2e9-32 10.50n ± 2% 10.44n ± 2% ~ (p=0.249 n=20) Float32-32 13.80n ± 2% 13.80n ± 2% ~ (p=0.751 n=20) Float64-32 23.55n ± 2% 23.87n ± 0% ~ (p=0.408 n=20) ExpFloat64-32 15.36n ± 1% 15.29n ± 2% ~ (p=0.316 n=20) NormFloat64-32 13.57n ± 1% 13.79n ± 1% +1.66% (p=0.005 n=20) Perm3-32 45.70n ± 2% 46.99n ± 2% +2.81% (p=0.001 n=20) Perm30-32 399.0n ± 1% 403.8n ± 1% +1.19% (p=0.006 n=20) Perm30ViaShuffle-32 349.0n ± 1% 350.4n ± 1% ~ (p=0.909 n=20) ShuffleOverhead-32 322.3n ± 1% 323.8n ± 1% ~ (p=0.410 n=20) Concurrent-32 3.331n ± 1% 7.312n ± 1% +119.50% (p=0.000 n=20) For #61716. Change-Id: Ibdddeed85c34d9ae397289dc899e04d4845f9ed2 Reviewed-on: https://go-review.googlesource.com/c/go/+/516860 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Filippo Valsorda <filippo@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-07-31	sync: panic rather than throw on nil *Pool	Ian Lance Taylor
	Fixes #61651 Change-Id: I27d581719e6bf38910f9d47dcf023bbff74ddf72 Reviewed-on: https://go-review.googlesource.com/c/go/+/514037 Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Ian Lance Taylor <iant@google.com>
2022-06-06	runtime, sync, sync/atomic: document happens-before guarantees	Russ Cox
	A few of these are copied from the memory model doc. Many are entirely new, following discussion on #47141. See https://research.swtch.com/gomm for background. The rule we are establishing is that each type that is meant to help synchronize a Go program should document its happens-before guarantees. For #50859. Change-Id: I947c40639b263abe67499fa74f68711a97873a39 Reviewed-on: https://go-review.googlesource.com/c/go/+/381316 Auto-Submit: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Alan Donovan <adonovan@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Roland Shoemaker <roland@golang.org>
2022-05-08	sync: remove the redundant logic on sync.(*Pool).Put	Jason7602
	When the procUnpin is placed after shared.pushHead, there is no need for x as a flag to indicate the previous process. This CL can make the logic clear, and at the same time reduce a redundant judgment. Change-Id: I34ec9ba4cb5b5dbdf13a8f158b90481fed248cf5 Reviewed-on: https://go-review.googlesource.com/c/go/+/360059 Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: David Chase <drchase@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2021-12-13	all: gofmt -w -r 'interface{} -> any' src	Russ Cox
	And then revert the bootstrap cmd directories and certain testdata. And adjust tests as needed. Not reverting the changes in std that are bootstrapped, because some of those changes would appear in API docs, and we want to use any consistently. Instead, rewrite 'any' to 'interface{}' in cmd/dist for those directories when preparing the bootstrap copy. A few files changed as a result of running gofmt -w not because of interface{} -> any but because they hadn't been updated for the new //go:build lines. Fixes #49884. Change-Id: Ie8045cba995f65bd79c694ec77a1b3d1fe01bb09 Reviewed-on: https://go-review.googlesource.com/c/go/+/368254 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>
2021-10-07	runtime,sync: using fastrandn instead of modulo reduction	Meng Zhuo
	fastrandn is ~50% faster than fastrand() % n. `ack -v 'fastrand\(\)\s?\%'` finds all modulo on fastrand() name old time/op new time/op delta Fastrandn/2 2.86ns ± 0% 1.59ns ± 0% -44.35% (p=0.000 n=9+10) Fastrandn/3 2.87ns ± 1% 1.59ns ± 0% -44.41% (p=0.000 n=10+9) Fastrandn/4 2.87ns ± 1% 1.58ns ± 1% -45.10% (p=0.000 n=10+10) Fastrandn/5 2.86ns ± 1% 1.58ns ± 1% -44.84% (p=0.000 n=10+10) Change-Id: Ic91f5ca9b9e3b65127bc34792b62fd64fbd13b5c Reviewed-on: https://go-review.googlesource.com/c/go/+/353269 Trust: Meng Zhuo <mzh@golangcn.org> Run-TryBot: Meng Zhuo <mzh@golangcn.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2021-04-26	Revert "sync: improve sync.Pool object stealing"	Bryan C. Mills
	This reverts CL 303949. Reason for revert: broke linux-arm-aws TryBots. Change-Id: Ib44949df70520cdabff857846be0d2221403d2f4 Reviewed-on: https://go-review.googlesource.com/c/go/+/313630 Trust: Bryan C. Mills <bcmills@google.com> Run-TryBot: Bryan C. Mills <bcmills@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org>
2021-04-26	sync: improve sync.Pool object stealing	Ruslan Andreev
	This CL provide abilty to randomly select P to steal object from its shared queue. In order to provide such ability randomOrder structure was copied from runtime/proc.go. It should reduce contention in firsts Ps and improve balance of object stealing across all Ps. Also, the patch provides new benchmark PoolStarvation which force Ps to steal objects. Benchmarks: name old time/op new time/op delta Pool-8 2.16ns ±14% 2.14ns ±16% ~ (p=0.425 n=10+10) PoolOverflow-8 489ns ± 0% 489ns ± 0% ~ (p=0.719 n=9+10) PoolStarvation-8 7.00µs ± 4% 6.59µs ± 2% -5.86% (p=0.000 n=10+10) PoolSTW-8 15.1µs ± 1% 15.2µs ± 1% +0.99% (p=0.001 n=10+10) PoolExpensiveNew-8 1.25ms ±10% 1.31ms ± 9% ~ (p=0.143 n=10+10) [Geo mean] 2.68µs 2.68µs -0.28% name old p50-ns/STW new p50-ns/STW delta PoolSTW-8 15.0k ± 1% 15.1k ± 1% +0.92% (p=0.000 n=10+10) name old p95-ns/STW new p95-ns/STW delta PoolSTW-8 16.2k ± 3% 16.4k ± 2% ~ (p=0.143 n=10+10) name old GCs/op new GCs/op delta PoolExpensiveNew-8 0.29 ± 2% 0.30 ± 1% +2.84% (p=0.000 n=8+10) name old New/op new New/op delta PoolExpensiveNew-8 8.07 ±11% 8.49 ±10% ~ (p=0.123 n=10+10) Change-Id: I3ca1d0bf1f358b1148c58e64740fb2d5bfc0bc02 Reviewed-on: https://go-review.googlesource.com/c/go/+/303949 Reviewed-by: David Chase <drchase@google.com> Trust: Emmanuel Odeke <emmanuel@orijtech.com>
2020-10-26	cmd/go,cmd/compile,sync: remove special import case in cmd/go	Austin Clements
	CL 253748 introduced a special case in cmd/go to allow sync to import runtime/internal/atomic. Besides introducing unnecessary complexity into cmd/go, this breaks other packages (like gopls) that understand how imports work, but don't understand this special case. Fix this by using the more standard linkname-based approach to pull the necessary functions from runtime/internal/atomic into sync. Since these are compiler intrinsics, we also have to tell the compiler that the linknamed symbols are intrinsics to get this optimization in sync. Fixes #42196. Change-Id: I1f91498c255c91583950886a89c3c9adc39a32f0 Reviewed-on: https://go-review.googlesource.com/c/go/+/265124 Trust: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Bryan C. Mills <bcmills@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Paul Murphy <murp@ibm.com> TryBot-Result: Go Bot <gobot@golang.org>
2020-10-21	cmd/compiler,cmd/go,sync: add internal {LoadAcq,StoreRel}64 on ppc64	Paul E. Murphy
	Add an internal atomic intrinsic for load with acquire semantics (extending LoadAcq to 64b) and add LoadAcquintptr for internal use within the sync package. For other arches, this remaps to the appropriate atomic.Load{,64} intrinsic which should not alter code generation. Similarly, add StoreRel{uintptr,64} for consistency, and inline. Finally, add an exception to allow sync to directly use the runtime/internal/atomic package which avoids more convoluted workarounds (contributed by Lynn Boger). In an extreme example, sync.(*Pool).pin consumes 20% of wall time during fmt tests. This is reduced to 5% on ppc64le/power9. From the fmt benchmarks on ppc64le: name old time/op new time/op delta SprintfPadding 468ns ± 0% 451ns ± 0% -3.63% SprintfEmpty 73.3ns ± 0% 51.9ns ± 0% -29.20% SprintfString 135ns ± 0% 122ns ± 0% -9.63% SprintfTruncateString 232ns ± 0% 214ns ± 0% -7.76% SprintfTruncateBytes 216ns ± 0% 202ns ± 0% -6.48% SprintfSlowParsingPath 162ns ± 0% 142ns ± 0% -12.35% SprintfQuoteString 1.00µs ± 0% 0.99µs ± 0% -1.39% SprintfInt 117ns ± 0% 104ns ± 0% -11.11% SprintfIntInt 190ns ± 0% 175ns ± 0% -7.89% SprintfPrefixedInt 232ns ± 0% 212ns ± 0% -8.62% SprintfFloat 270ns ± 0% 255ns ± 0% -5.56% SprintfComplex 1.01µs ± 0% 0.99µs ± 0% -1.68% SprintfBoolean 127ns ± 0% 111ns ± 0% -12.60% SprintfHexString 220ns ± 0% 198ns ± 0% -10.00% SprintfHexBytes 261ns ± 0% 252ns ± 0% -3.45% SprintfBytes 600ns ± 0% 590ns ± 0% -1.67% SprintfStringer 684ns ± 0% 658ns ± 0% -3.80% SprintfStructure 2.57µs ± 0% 2.57µs ± 0% -0.12% ManyArgs 669ns ± 0% 646ns ± 0% -3.44% FprintInt 140ns ± 0% 136ns ± 0% -2.86% FprintfBytes 184ns ± 0% 181ns ± 0% -1.63% FprintIntNoAlloc 140ns ± 0% 136ns ± 0% -2.86% ScanInts 929µs ± 0% 921µs ± 0% -0.79% ScanRecursiveInt 122ms ± 0% 121ms ± 0% -0.11% ScanRecursiveIntReaderWrapper 122ms ± 0% 122ms ± 0% -0.18% Change-Id: I4d66780261b57b06ef600229e475462e7313f0d6 Reviewed-on: https://go-review.googlesource.com/c/go/+/253748 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Keith Randall <khr@golang.org> Trust: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org>
2019-04-19	sync: update comment	Kai Dong
	Comment update. Change-Id: If0d054216f9953f42df04647b85c38008b85b026 GitHub-Last-Rev: 133b4670be6dd1c94d16361c3a7a4bbdf8a355ab GitHub-Pull-Request: golang/go#31539 Reviewed-on: https://go-review.googlesource.com/c/go/+/172700 Reviewed-by: Austin Clements <austin@google.com>
2019-04-05	sync: smooth out Pool behavior over GC with a victim cache	Austin Clements
	Currently, every Pool is cleared completely at the start of each GC. This is a problem for heavy users of Pool because it causes an allocation spike immediately after Pools are clear, which impacts both throughput and latency. This CL fixes this by introducing a victim cache mechanism. Instead of clearing Pools, the victim cache is dropped and the primary cache is moved to the victim cache. As a result, in steady-state, there are (roughly) no new allocations, but if Pool usage drops, objects will still be collected within two GCs (as opposed to one). This victim cache approach also improves Pool's impact on GC dynamics. The current approach causes all objects in Pools to be short lived. However, if an application is in steady state and is just going to repopulate its Pools, then these objects impact the live heap size as if they were long lived. Since Pooled objects count as short lived when computing the GC trigger and goal, but act as long lived objects in the live heap, this causes GC to trigger too frequently. If Pooled objects are a non-trivial portion of an application's heap, this increases the CPU overhead of GC. The victim cache lets Pooled objects affect the GC trigger and goal as long-lived objects. This has no impact on Get/Put performance, but substantially reduces the impact to the Pool user when a GC happens. PoolExpensiveNew demonstrates this in the substantially reduction in the rate at which the "New" function is called. name old time/op new time/op delta Pool-12 2.21ns ±36% 2.00ns ± 0% ~ (p=0.070 n=19+16) PoolOverflow-12 587ns ± 1% 583ns ± 1% -0.77% (p=0.000 n=18+18) PoolSTW-12 5.57µs ± 3% 4.52µs ± 4% -18.82% (p=0.000 n=20+19) PoolExpensiveNew-12 3.69ms ± 7% 1.25ms ± 5% -66.25% (p=0.000 n=20+19) name old p50-ns/STW new p50-ns/STW delta PoolSTW-12 5.48k ± 2% 4.53k ± 2% -17.32% (p=0.000 n=20+20) name old p95-ns/STW new p95-ns/STW delta PoolSTW-12 6.69k ± 4% 5.13k ± 3% -23.31% (p=0.000 n=19+18) name old GCs/op new GCs/op delta PoolExpensiveNew-12 0.39 ± 1% 0.32 ± 2% -17.95% (p=0.000 n=18+20) name old New/op new New/op delta PoolExpensiveNew-12 40.0 ± 6% 12.4 ± 6% -68.91% (p=0.000 n=20+19) (https://perf.golang.org/search?q=upload:20190311.2) Fixes #22950. Change-Id: If2e183d948c650417283076aacc20739682cdd70 Reviewed-on: https://go-review.googlesource.com/c/go/+/166961 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2019-04-05	sync: use lock-free structure for Pool stealing	Austin Clements
	Currently, Pool stores each per-P shard's overflow in a slice protected by a Mutex. In order to store to the overflow or steal from another shard, a P must lock that shard's Mutex. This allows for simple synchronization between Put and Get, but has unfortunate consequences for clearing pools. Pools are cleared during STW sweep termination, and hence rely on pinning a goroutine to its P to synchronize between Get/Put and clearing. This makes the Get/Put fast path extremely fast because it can rely on quiescence-style coordination, which doesn't even require atomic writes, much less locking. The catch is that a goroutine cannot acquire a Mutex while pinned to its P (as this could deadlock). Hence, it must drop the pin on the slow path. But this means the slow path is not synchronized with clearing. As a result, 1) It's difficult to reason about races between clearing and the slow path. Furthermore, this reasoning often depends on unspecified nuances of where preemption points can occur. 2) Clearing must zero out the pointer to every object in every Pool to prevent a concurrent slow path from causing all objects to be retained. Since this happens during STW, this has an O(# objects in Pools) effect on STW time. 3) We can't implement a victim cache without making clearing even slower. This CL solves these problems by replacing the locked overflow slice with a lock-free structure. This allows Gets and Puts to be pinned the whole time they're manipulating the shards slice (Pool.local), which eliminates the races between Get/Put and clearing. This, in turn, eliminates the need to zero all object pointers, reducing clearing to O(# of Pools) during STW. In addition to significantly reducing STW impact, this also happens to speed up the Get/Put fast-path and the slow path. It somewhat increases the cost of PoolExpensiveNew, but we'll fix that in the next CL. name old time/op new time/op delta Pool-12 3.00ns ± 0% 2.21ns ±36% -26.32% (p=0.000 n=18+19) PoolOverflow-12 600ns ± 1% 587ns ± 1% -2.21% (p=0.000 n=16+18) PoolSTW-12 71.0µs ± 2% 5.6µs ± 3% -92.15% (p=0.000 n=20+20) PoolExpensiveNew-12 3.14ms ± 5% 3.69ms ± 7% +17.67% (p=0.000 n=19+20) name old p50-ns/STW new p50-ns/STW delta PoolSTW-12 70.7k ± 1% 5.5k ± 2% -92.25% (p=0.000 n=20+20) name old p95-ns/STW new p95-ns/STW delta PoolSTW-12 73.1k ± 2% 6.7k ± 4% -90.86% (p=0.000 n=18+19) name old GCs/op new GCs/op delta PoolExpensiveNew-12 0.38 ± 1% 0.39 ± 1% +2.07% (p=0.000 n=20+18) name old New/op new New/op delta PoolExpensiveNew-12 33.9 ± 6% 40.0 ± 6% +17.97% (p=0.000 n=19+20) (https://perf.golang.org/search?q=upload:20190311.1) Fixes #22331. For #22950. Change-Id: Ic5cd826e25e218f3f8256dbc4d22835c1fecb391 Reviewed-on: https://go-review.googlesource.com/c/go/+/166960 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-04-20	sync: align poolLocal to CPU cache line size	Aliaksandr Valialkin
	Make poolLocal size multiple of 128, so it aligns to CPU cache line on the most common architectures. This also has the following benefits: - It may help compiler substituting integer multiplication by bit shift inside indexLocal. - It shrinks poolLocal size from 176 bytes to 128 bytes on amd64, so now it fits two cache lines (or a single cache line on certain Intel CPUs - see https://software.intel.com/en-us/articles/optimizing-application-performance-on-intel-coret-microarchitecture-using-hardware-implemented-prefetchers). No measurable performance changes on linux/amd64 and linux/386. Change-Id: I11df0f064718a662e77a85d88b8a15a8919f25e9 Reviewed-on: https://go-review.googlesource.com/40918 Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Run-TryBot: Dmitry Vyukov <dvyukov@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-18	sync: improve Pool performance	Aliaksandr Valialkin
	Rewrite indexLocal to achieve higher performance. Performance results on linux/amd64: name old time/op new time/op delta Pool-4 19.1ns ± 2% 10.1ns ± 1% -47.15% (p=0.000 n=10+8) PoolOverflow-4 3.11µs ± 1% 2.10µs ± 2% -32.66% (p=0.000 n=10+10) Performance results on linux/386: name old time/op new time/op delta Pool-4 20.0ns ± 2% 13.1ns ± 1% -34.59% (p=0.000 n=10+9) PoolOverflow-4 3.51µs ± 1% 2.49µs ± 0% -28.99% (p=0.000 n=10+8) Change-Id: I7d57a2d4cd47ec43d09ca1267bde2e3f05a9faa9 Reviewed-on: https://go-review.googlesource.com/40913 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-30	sync: enable Pool when using race detector	Russ Cox
	Disabled by https://golang.org/cl/53020044 due to false positives. Reenable and model properly. Fixes #17306. Change-Id: I28405ddfcd17f58cf1427c300273212729154359 Reviewed-on: https://go-review.googlesource.com/31589 Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2016-05-06	cmd/vet: check sync.* types' copying	Aliaksandr Valialkin
	Embed noLock struct into the following types, so `go vet -copylocks` catches their copying additionally to types containing sync.Mutex: - sync.Cond - sync.WaitGroup - sync.Pool - atomic.Value Fixes #14582 Change-Id: Icb543ef5ad10524ad239a15eec8a9b334b0e0660 Reviewed-on: https://go-review.googlesource.com/22015 Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-15	all: remove unnecessary type conversions	Matthew Dempsky
	cmd and runtime were handled separately, and I'm intentionally skipped syscall. This is the rest of the standard library. CL generated mechanically with github.com/mdempsky/unconvert. Change-Id: I9e0eff886974dedc37adb93f602064b83e469122 Reviewed-on: https://go-review.googlesource.com/22104 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-21	all: use cannot instead of can not	Josh Bleecher Snyder
	You can not use cannot, but you cannot spell cannot can not. Change-Id: I2f0971481a460804de96fd8c9e46a9cc62a3fc5b Reviewed-on: https://go-review.googlesource.com/19772 Reviewed-by: Rob Pike <r@golang.org>
2015-11-26	internal/race: add package	Dmitry Vyukov
	Factor out duplicated race thunks from sync, syscall net and fmt packages into a separate package and use it. Fixes #8593 Change-Id: I156869c50946277809f6b509463752e7f7d28cdb Reviewed-on: https://go-review.googlesource.com/14870 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Dmitry Vyukov <dvyukov@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2014-10-22	sync: release Pool memory during second and later GCs	Dmitriy Vyukov
	Pool memory was only being released during the first GC after the first Put. Put assumes that p.local != nil means p is on the allPools list. poolCleanup (called during each GC) removed each pool from allPools but did not clear p.local, so each pool was cleared by exactly one GC and then never cleared again. This bug was introduced late in the Go 1.3 release cycle. Fixes #8979. LGTM=rsc R=golang-codereviews, bradfitz, r, rsc CC=golang-codereviews, khr https://golang.org/cl/162980043
2014-09-08	build: move package sources from src/pkg to src	Russ Cox
	Preparation was in CL 134570043. This CL contains only the effect of 'hg mv src/pkg/* src'. For more about the move, see golang.org/s/go14nopkg.