| Age | Commit message (Collapse) | Author |
|
runtime_StoreReluintptr linknames to internal/runtime/atomic.StoreReluintptr,
which does not have a result.
Change-Id: I468cce82985f391c221768188a5eaff43cbcd037
Reviewed-on: https://go-review.googlesource.com/c/go/+/683095
TryBot-Bypass: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
The lack of links to https://go.dev/ref/mem in the sync package
documentation makes it difficult to read for people who have no previous
knowledge of that page. This PR includes the links where needed.
Fixes #67891
Change-Id: I0e1344cc6d7b702f4cb2e55fe0fcee3eb089391a
GitHub-Last-Rev: 427cf58aaeaae2e4b060248dd592e5fe8c6b7df4
GitHub-Pull-Request: golang/go#67892
Reviewed-on: https://go-review.googlesource.com/c/go/+/591395
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
For #67401.
Change-Id: I015408a3f437c1733d97160ef2fb5da6d4efcc5c
Reviewed-on: https://go-review.googlesource.com/c/go/+/587598
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
|
|
For #67401.
Change-Id: I7dd28c3b01a1a647f84929d15412aa43ab0089ee
Reviewed-on: https://go-review.googlesource.com/c/go/+/587575
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
For #65355
Change-Id: I65dd090fb99de9b231af2112c5ccb0eb635db2be
Reviewed-on: https://go-review.googlesource.com/c/go/+/560155
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ibrahim Bazoka <ibrahimbazoka729@gmail.com>
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
|
|
I ran go fmt to fix format on the entire repository.
Change-Id: I2f09166b6b8ba0ffb0ba27f6500efb0ea4cf21ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/566835
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
|
|
Change-Id: I9bc5fd29b0eec8ceadcfee2116de5e7524ef92c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/539617
Auto-Submit: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: shuang cui <imcusg@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
Move ChaCha8 code into internal/chacha8rand and use it to implement
runtime.rand, which is used for the unseeded global source for
both math/rand and math/rand/v2. This also affects the calculation of
the start point for iteration over very very large maps (when the
32-bit fastrand is not big enough).
The benefit is that misuse of the global random number generators
in math/rand and math/rand/v2 in contexts where non-predictable
randomness is important for security reasons is no longer a
security problem, removing a common mistake among programmers
who are unaware of the different kinds of randomness.
The cost is an extra 304 bytes per thread stored in the m struct
plus 2-3ns more per random uint64 due to the more sophisticated
algorithm. Using PCG looks like it would cost about the same,
although I haven't benchmarked that.
Before this, the math/rand and math/rand/v2 global generator
was wyrand (https://github.com/wangyi-fudan/wyhash).
For math/rand, using wyrand instead of the Mitchell/Reeds/Thompson
ALFG was justifiable, since the latter was not any better.
But for math/rand/v2, the global generator really should be
at least as good as one of the well-studied, specific algorithms
provided directly by the package, and it's not.
(Wyrand is still reasonable for scheduling and cache decisions.)
Good randomness does have a cost: about twice wyrand.
Also rationalize the various runtime rand references.
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ bbb48afeb7.amd64 │ 5cf807d1ea.amd64 │
│ sec/op │ sec/op vs base │
ChaCha8-32 1.862n ± 2% 1.861n ± 2% ~ (p=0.825 n=20)
PCG_DXSM-32 1.471n ± 1% 1.460n ± 2% ~ (p=0.153 n=20)
SourceUint64-32 1.636n ± 2% 1.582n ± 1% -3.30% (p=0.000 n=20)
GlobalInt64-32 2.087n ± 1% 3.663n ± 1% +75.54% (p=0.000 n=20)
GlobalInt64Parallel-32 0.1042n ± 1% 0.2026n ± 1% +94.48% (p=0.000 n=20)
GlobalUint64-32 2.263n ± 2% 3.724n ± 1% +64.57% (p=0.000 n=20)
GlobalUint64Parallel-32 0.1019n ± 1% 0.1973n ± 1% +93.67% (p=0.000 n=20)
Int64-32 1.771n ± 1% 1.774n ± 1% ~ (p=0.449 n=20)
Uint64-32 1.863n ± 2% 1.866n ± 1% ~ (p=0.364 n=20)
GlobalIntN1000-32 3.134n ± 3% 4.730n ± 2% +50.95% (p=0.000 n=20)
IntN1000-32 2.489n ± 1% 2.489n ± 1% ~ (p=0.683 n=20)
Int64N1000-32 2.521n ± 1% 2.516n ± 1% ~ (p=0.394 n=20)
Int64N1e8-32 2.479n ± 1% 2.478n ± 2% ~ (p=0.743 n=20)
Int64N1e9-32 2.530n ± 2% 2.514n ± 2% ~ (p=0.193 n=20)
Int64N2e9-32 2.501n ± 1% 2.494n ± 1% ~ (p=0.616 n=20)
Int64N1e18-32 3.227n ± 1% 3.205n ± 1% ~ (p=0.101 n=20)
Int64N2e18-32 3.647n ± 1% 3.599n ± 1% ~ (p=0.019 n=20)
Int64N4e18-32 5.135n ± 1% 5.069n ± 2% ~ (p=0.034 n=20)
Int32N1000-32 2.657n ± 1% 2.637n ± 1% ~ (p=0.180 n=20)
Int32N1e8-32 2.636n ± 1% 2.636n ± 1% ~ (p=0.763 n=20)
Int32N1e9-32 2.660n ± 2% 2.638n ± 1% ~ (p=0.358 n=20)
Int32N2e9-32 2.662n ± 2% 2.618n ± 2% ~ (p=0.064 n=20)
Float32-32 2.272n ± 2% 2.239n ± 2% ~ (p=0.194 n=20)
Float64-32 2.272n ± 1% 2.286n ± 2% ~ (p=0.763 n=20)
ExpFloat64-32 3.762n ± 1% 3.744n ± 1% ~ (p=0.171 n=20)
NormFloat64-32 3.706n ± 1% 3.655n ± 2% ~ (p=0.066 n=20)
Perm3-32 32.93n ± 3% 34.62n ± 1% +5.13% (p=0.000 n=20)
Perm30-32 202.9n ± 1% 204.0n ± 1% ~ (p=0.482 n=20)
Perm30ViaShuffle-32 115.0n ± 1% 114.9n ± 1% ~ (p=0.358 n=20)
ShuffleOverhead-32 112.8n ± 1% 112.7n ± 1% ~ (p=0.692 n=20)
Concurrent-32 2.107n ± 0% 3.725n ± 1% +76.75% (p=0.000 n=20)
goos: darwin
goarch: arm64
pkg: math/rand/v2
│ bbb48afeb7.arm64 │ 5cf807d1ea.arm64 │
│ sec/op │ sec/op vs base │
ChaCha8-8 2.480n ± 0% 2.429n ± 0% -2.04% (p=0.000 n=20)
PCG_DXSM-8 2.531n ± 0% 2.530n ± 0% ~ (p=0.877 n=20)
SourceUint64-8 2.534n ± 0% 2.533n ± 0% ~ (p=0.732 n=20)
GlobalInt64-8 2.172n ± 1% 4.794n ± 0% +120.67% (p=0.000 n=20)
GlobalInt64Parallel-8 0.4320n ± 0% 0.9605n ± 0% +122.32% (p=0.000 n=20)
GlobalUint64-8 2.182n ± 0% 4.770n ± 0% +118.58% (p=0.000 n=20)
GlobalUint64Parallel-8 0.4307n ± 0% 0.9583n ± 0% +122.51% (p=0.000 n=20)
Int64-8 4.107n ± 0% 4.104n ± 0% ~ (p=0.416 n=20)
Uint64-8 4.080n ± 0% 4.080n ± 0% ~ (p=0.052 n=20)
GlobalIntN1000-8 2.814n ± 2% 5.643n ± 0% +100.50% (p=0.000 n=20)
IntN1000-8 4.141n ± 0% 4.139n ± 0% ~ (p=0.140 n=20)
Int64N1000-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.313 n=20)
Int64N1e8-8 4.140n ± 0% 4.139n ± 0% ~ (p=0.103 n=20)
Int64N1e9-8 4.139n ± 0% 4.140n ± 0% ~ (p=0.761 n=20)
Int64N2e9-8 4.140n ± 0% 4.140n ± 0% ~ (p=0.636 n=20)
Int64N1e18-8 5.266n ± 0% 5.326n ± 1% +1.14% (p=0.001 n=20)
Int64N2e18-8 6.052n ± 0% 6.167n ± 0% +1.90% (p=0.000 n=20)
Int64N4e18-8 8.826n ± 0% 9.051n ± 0% +2.55% (p=0.000 n=20)
Int32N1000-8 4.127n ± 0% 4.132n ± 0% +0.12% (p=0.000 n=20)
Int32N1e8-8 4.126n ± 0% 4.131n ± 0% +0.12% (p=0.000 n=20)
Int32N1e9-8 4.127n ± 0% 4.132n ± 0% +0.12% (p=0.000 n=20)
Int32N2e9-8 4.132n ± 0% 4.131n ± 0% ~ (p=0.017 n=20)
Float32-8 4.109n ± 0% 4.105n ± 0% ~ (p=0.379 n=20)
Float64-8 4.107n ± 0% 4.106n ± 0% ~ (p=0.867 n=20)
ExpFloat64-8 5.339n ± 0% 5.383n ± 0% +0.82% (p=0.000 n=20)
NormFloat64-8 5.735n ± 0% 5.737n ± 1% ~ (p=0.856 n=20)
Perm3-8 26.65n ± 0% 26.80n ± 1% +0.58% (p=0.000 n=20)
Perm30-8 194.8n ± 1% 197.0n ± 0% +1.18% (p=0.000 n=20)
Perm30ViaShuffle-8 156.6n ± 0% 157.6n ± 1% +0.61% (p=0.000 n=20)
ShuffleOverhead-8 124.9n ± 0% 125.5n ± 0% +0.52% (p=0.000 n=20)
Concurrent-8 2.434n ± 3% 5.066n ± 0% +108.09% (p=0.000 n=20)
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ bbb48afeb7.386 │ 5cf807d1ea.386 │
│ sec/op │ sec/op vs base │
ChaCha8-32 11.295n ± 1% 4.748n ± 2% -57.96% (p=0.000 n=20)
PCG_DXSM-32 7.693n ± 1% 7.738n ± 2% ~ (p=0.542 n=20)
SourceUint64-32 7.658n ± 2% 7.622n ± 2% ~ (p=0.344 n=20)
GlobalInt64-32 3.473n ± 2% 7.526n ± 2% +116.73% (p=0.000 n=20)
GlobalInt64Parallel-32 0.3198n ± 0% 0.5444n ± 0% +70.22% (p=0.000 n=20)
GlobalUint64-32 3.612n ± 0% 7.575n ± 1% +109.69% (p=0.000 n=20)
GlobalUint64Parallel-32 0.3168n ± 0% 0.5403n ± 0% +70.51% (p=0.000 n=20)
Int64-32 7.673n ± 2% 7.789n ± 1% ~ (p=0.122 n=20)
Uint64-32 7.773n ± 1% 7.827n ± 2% ~ (p=0.920 n=20)
GlobalIntN1000-32 6.268n ± 1% 9.581n ± 1% +52.87% (p=0.000 n=20)
IntN1000-32 10.33n ± 2% 10.45n ± 1% ~ (p=0.233 n=20)
Int64N1000-32 10.98n ± 2% 11.01n ± 1% ~ (p=0.401 n=20)
Int64N1e8-32 11.19n ± 2% 10.97n ± 1% ~ (p=0.033 n=20)
Int64N1e9-32 11.06n ± 1% 11.08n ± 1% ~ (p=0.498 n=20)
Int64N2e9-32 11.10n ± 1% 11.01n ± 2% ~ (p=0.995 n=20)
Int64N1e18-32 15.23n ± 2% 15.04n ± 1% ~ (p=0.973 n=20)
Int64N2e18-32 15.89n ± 1% 15.85n ± 1% ~ (p=0.409 n=20)
Int64N4e18-32 18.96n ± 2% 19.34n ± 2% ~ (p=0.048 n=20)
Int32N1000-32 10.46n ± 2% 10.44n ± 2% ~ (p=0.480 n=20)
Int32N1e8-32 10.46n ± 2% 10.49n ± 2% ~ (p=0.951 n=20)
Int32N1e9-32 10.28n ± 2% 10.26n ± 1% ~ (p=0.431 n=20)
Int32N2e9-32 10.50n ± 2% 10.44n ± 2% ~ (p=0.249 n=20)
Float32-32 13.80n ± 2% 13.80n ± 2% ~ (p=0.751 n=20)
Float64-32 23.55n ± 2% 23.87n ± 0% ~ (p=0.408 n=20)
ExpFloat64-32 15.36n ± 1% 15.29n ± 2% ~ (p=0.316 n=20)
NormFloat64-32 13.57n ± 1% 13.79n ± 1% +1.66% (p=0.005 n=20)
Perm3-32 45.70n ± 2% 46.99n ± 2% +2.81% (p=0.001 n=20)
Perm30-32 399.0n ± 1% 403.8n ± 1% +1.19% (p=0.006 n=20)
Perm30ViaShuffle-32 349.0n ± 1% 350.4n ± 1% ~ (p=0.909 n=20)
ShuffleOverhead-32 322.3n ± 1% 323.8n ± 1% ~ (p=0.410 n=20)
Concurrent-32 3.331n ± 1% 7.312n ± 1% +119.50% (p=0.000 n=20)
For #61716.
Change-Id: Ibdddeed85c34d9ae397289dc899e04d4845f9ed2
Reviewed-on: https://go-review.googlesource.com/c/go/+/516860
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Fixes #61651
Change-Id: I27d581719e6bf38910f9d47dcf023bbff74ddf72
Reviewed-on: https://go-review.googlesource.com/c/go/+/514037
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
|
|
A few of these are copied from the memory model doc.
Many are entirely new, following discussion on #47141.
See https://research.swtch.com/gomm for background.
The rule we are establishing is that each type that is meant
to help synchronize a Go program should document its
happens-before guarantees.
For #50859.
Change-Id: I947c40639b263abe67499fa74f68711a97873a39
Reviewed-on: https://go-review.googlesource.com/c/go/+/381316
Auto-Submit: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Roland Shoemaker <roland@golang.org>
|
|
When the procUnpin is placed after shared.pushHead, there is
no need for x as a flag to indicate the previous process.
This CL can make the logic clear, and at the same time reduce
a redundant judgment.
Change-Id: I34ec9ba4cb5b5dbdf13a8f158b90481fed248cf5
Reviewed-on: https://go-review.googlesource.com/c/go/+/360059
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
And then revert the bootstrap cmd directories and certain testdata.
And adjust tests as needed.
Not reverting the changes in std that are bootstrapped,
because some of those changes would appear in API docs,
and we want to use any consistently.
Instead, rewrite 'any' to 'interface{}' in cmd/dist for those directories
when preparing the bootstrap copy.
A few files changed as a result of running gofmt -w
not because of interface{} -> any but because they
hadn't been updated for the new //go:build lines.
Fixes #49884.
Change-Id: Ie8045cba995f65bd79c694ec77a1b3d1fe01bb09
Reviewed-on: https://go-review.googlesource.com/c/go/+/368254
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
fastrandn is ~50% faster than fastrand() % n.
`ack -v 'fastrand\(\)\s?\%'` finds all modulo on fastrand()
name old time/op new time/op delta
Fastrandn/2 2.86ns ± 0% 1.59ns ± 0% -44.35% (p=0.000 n=9+10)
Fastrandn/3 2.87ns ± 1% 1.59ns ± 0% -44.41% (p=0.000 n=10+9)
Fastrandn/4 2.87ns ± 1% 1.58ns ± 1% -45.10% (p=0.000 n=10+10)
Fastrandn/5 2.86ns ± 1% 1.58ns ± 1% -44.84% (p=0.000 n=10+10)
Change-Id: Ic91f5ca9b9e3b65127bc34792b62fd64fbd13b5c
Reviewed-on: https://go-review.googlesource.com/c/go/+/353269
Trust: Meng Zhuo <mzh@golangcn.org>
Run-TryBot: Meng Zhuo <mzh@golangcn.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This reverts CL 303949.
Reason for revert: broke linux-arm-aws TryBots.
Change-Id: Ib44949df70520cdabff857846be0d2221403d2f4
Reviewed-on: https://go-review.googlesource.com/c/go/+/313630
Trust: Bryan C. Mills <bcmills@google.com>
Run-TryBot: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
|
|
This CL provide abilty to randomly select P to steal object from its
shared queue. In order to provide such ability randomOrder structure
was copied from runtime/proc.go.
It should reduce contention in firsts Ps and improve balance of object
stealing across all Ps. Also, the patch provides new benchmark
PoolStarvation which force Ps to steal objects.
Benchmarks:
name old time/op new time/op delta
Pool-8 2.16ns ±14% 2.14ns ±16% ~ (p=0.425 n=10+10)
PoolOverflow-8 489ns ± 0% 489ns ± 0% ~ (p=0.719 n=9+10)
PoolStarvation-8 7.00µs ± 4% 6.59µs ± 2% -5.86% (p=0.000 n=10+10)
PoolSTW-8 15.1µs ± 1% 15.2µs ± 1% +0.99% (p=0.001 n=10+10)
PoolExpensiveNew-8 1.25ms ±10% 1.31ms ± 9% ~ (p=0.143 n=10+10)
[Geo mean] 2.68µs 2.68µs -0.28%
name old p50-ns/STW new p50-ns/STW delta
PoolSTW-8 15.0k ± 1% 15.1k ± 1% +0.92% (p=0.000 n=10+10)
name old p95-ns/STW new p95-ns/STW delta
PoolSTW-8 16.2k ± 3% 16.4k ± 2% ~ (p=0.143 n=10+10)
name old GCs/op new GCs/op delta
PoolExpensiveNew-8 0.29 ± 2% 0.30 ± 1% +2.84% (p=0.000 n=8+10)
name old New/op new New/op delta
PoolExpensiveNew-8 8.07 ±11% 8.49 ±10% ~ (p=0.123 n=10+10)
Change-Id: I3ca1d0bf1f358b1148c58e64740fb2d5bfc0bc02
Reviewed-on: https://go-review.googlesource.com/c/go/+/303949
Reviewed-by: David Chase <drchase@google.com>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
|
|
CL 253748 introduced a special case in cmd/go to allow sync to import
runtime/internal/atomic. Besides introducing unnecessary complexity
into cmd/go, this breaks other packages (like gopls) that understand
how imports work, but don't understand this special case.
Fix this by using the more standard linkname-based approach to pull
the necessary functions from runtime/internal/atomic into sync. Since
these are compiler intrinsics, we also have to tell the compiler that
the linknamed symbols are intrinsics to get this optimization in sync.
Fixes #42196.
Change-Id: I1f91498c255c91583950886a89c3c9adc39a32f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/265124
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Paul Murphy <murp@ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
|
|
Add an internal atomic intrinsic for load with acquire semantics
(extending LoadAcq to 64b) and add LoadAcquintptr for internal
use within the sync package. For other arches, this remaps to the
appropriate atomic.Load{,64} intrinsic which should not alter code
generation.
Similarly, add StoreRel{uintptr,64} for consistency, and inline.
Finally, add an exception to allow sync to directly use the
runtime/internal/atomic package which avoids more convoluted
workarounds (contributed by Lynn Boger).
In an extreme example, sync.(*Pool).pin consumes 20% of wall time
during fmt tests. This is reduced to 5% on ppc64le/power9.
From the fmt benchmarks on ppc64le:
name old time/op new time/op delta
SprintfPadding 468ns ± 0% 451ns ± 0% -3.63%
SprintfEmpty 73.3ns ± 0% 51.9ns ± 0% -29.20%
SprintfString 135ns ± 0% 122ns ± 0% -9.63%
SprintfTruncateString 232ns ± 0% 214ns ± 0% -7.76%
SprintfTruncateBytes 216ns ± 0% 202ns ± 0% -6.48%
SprintfSlowParsingPath 162ns ± 0% 142ns ± 0% -12.35%
SprintfQuoteString 1.00µs ± 0% 0.99µs ± 0% -1.39%
SprintfInt 117ns ± 0% 104ns ± 0% -11.11%
SprintfIntInt 190ns ± 0% 175ns ± 0% -7.89%
SprintfPrefixedInt 232ns ± 0% 212ns ± 0% -8.62%
SprintfFloat 270ns ± 0% 255ns ± 0% -5.56%
SprintfComplex 1.01µs ± 0% 0.99µs ± 0% -1.68%
SprintfBoolean 127ns ± 0% 111ns ± 0% -12.60%
SprintfHexString 220ns ± 0% 198ns ± 0% -10.00%
SprintfHexBytes 261ns ± 0% 252ns ± 0% -3.45%
SprintfBytes 600ns ± 0% 590ns ± 0% -1.67%
SprintfStringer 684ns ± 0% 658ns ± 0% -3.80%
SprintfStructure 2.57µs ± 0% 2.57µs ± 0% -0.12%
ManyArgs 669ns ± 0% 646ns ± 0% -3.44%
FprintInt 140ns ± 0% 136ns ± 0% -2.86%
FprintfBytes 184ns ± 0% 181ns ± 0% -1.63%
FprintIntNoAlloc 140ns ± 0% 136ns ± 0% -2.86%
ScanInts 929µs ± 0% 921µs ± 0% -0.79%
ScanRecursiveInt 122ms ± 0% 121ms ± 0% -0.11%
ScanRecursiveIntReaderWrapper 122ms ± 0% 122ms ± 0% -0.18%
Change-Id: I4d66780261b57b06ef600229e475462e7313f0d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/253748
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
|
|
Comment update.
Change-Id: If0d054216f9953f42df04647b85c38008b85b026
GitHub-Last-Rev: 133b4670be6dd1c94d16361c3a7a4bbdf8a355ab
GitHub-Pull-Request: golang/go#31539
Reviewed-on: https://go-review.googlesource.com/c/go/+/172700
Reviewed-by: Austin Clements <austin@google.com>
|
|
Currently, every Pool is cleared completely at the start of each GC.
This is a problem for heavy users of Pool because it causes an
allocation spike immediately after Pools are clear, which impacts both
throughput and latency.
This CL fixes this by introducing a victim cache mechanism. Instead of
clearing Pools, the victim cache is dropped and the primary cache is
moved to the victim cache. As a result, in steady-state, there are
(roughly) no new allocations, but if Pool usage drops, objects will
still be collected within two GCs (as opposed to one).
This victim cache approach also improves Pool's impact on GC dynamics.
The current approach causes all objects in Pools to be short lived.
However, if an application is in steady state and is just going to
repopulate its Pools, then these objects impact the live heap size *as
if* they were long lived. Since Pooled objects count as short lived
when computing the GC trigger and goal, but act as long lived objects
in the live heap, this causes GC to trigger too frequently. If Pooled
objects are a non-trivial portion of an application's heap, this
increases the CPU overhead of GC. The victim cache lets Pooled objects
affect the GC trigger and goal as long-lived objects.
This has no impact on Get/Put performance, but substantially reduces
the impact to the Pool user when a GC happens. PoolExpensiveNew
demonstrates this in the substantially reduction in the rate at which
the "New" function is called.
name old time/op new time/op delta
Pool-12 2.21ns ±36% 2.00ns ± 0% ~ (p=0.070 n=19+16)
PoolOverflow-12 587ns ± 1% 583ns ± 1% -0.77% (p=0.000 n=18+18)
PoolSTW-12 5.57µs ± 3% 4.52µs ± 4% -18.82% (p=0.000 n=20+19)
PoolExpensiveNew-12 3.69ms ± 7% 1.25ms ± 5% -66.25% (p=0.000 n=20+19)
name old p50-ns/STW new p50-ns/STW delta
PoolSTW-12 5.48k ± 2% 4.53k ± 2% -17.32% (p=0.000 n=20+20)
name old p95-ns/STW new p95-ns/STW delta
PoolSTW-12 6.69k ± 4% 5.13k ± 3% -23.31% (p=0.000 n=19+18)
name old GCs/op new GCs/op delta
PoolExpensiveNew-12 0.39 ± 1% 0.32 ± 2% -17.95% (p=0.000 n=18+20)
name old New/op new New/op delta
PoolExpensiveNew-12 40.0 ± 6% 12.4 ± 6% -68.91% (p=0.000 n=20+19)
(https://perf.golang.org/search?q=upload:20190311.2)
Fixes #22950.
Change-Id: If2e183d948c650417283076aacc20739682cdd70
Reviewed-on: https://go-review.googlesource.com/c/go/+/166961
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Currently, Pool stores each per-P shard's overflow in a slice
protected by a Mutex. In order to store to the overflow or steal from
another shard, a P must lock that shard's Mutex. This allows for
simple synchronization between Put and Get, but has unfortunate
consequences for clearing pools.
Pools are cleared during STW sweep termination, and hence rely on
pinning a goroutine to its P to synchronize between Get/Put and
clearing. This makes the Get/Put fast path extremely fast because it
can rely on quiescence-style coordination, which doesn't even require
atomic writes, much less locking.
The catch is that a goroutine cannot acquire a Mutex while pinned to
its P (as this could deadlock). Hence, it must drop the pin on the
slow path. But this means the slow path is not synchronized with
clearing. As a result,
1) It's difficult to reason about races between clearing and the slow
path. Furthermore, this reasoning often depends on unspecified nuances
of where preemption points can occur.
2) Clearing must zero out the pointer to every object in every Pool to
prevent a concurrent slow path from causing all objects to be
retained. Since this happens during STW, this has an O(# objects in
Pools) effect on STW time.
3) We can't implement a victim cache without making clearing even
slower.
This CL solves these problems by replacing the locked overflow slice
with a lock-free structure. This allows Gets and Puts to be pinned the
whole time they're manipulating the shards slice (Pool.local), which
eliminates the races between Get/Put and clearing. This, in turn,
eliminates the need to zero all object pointers, reducing clearing to
O(# of Pools) during STW.
In addition to significantly reducing STW impact, this also happens to
speed up the Get/Put fast-path and the slow path. It somewhat
increases the cost of PoolExpensiveNew, but we'll fix that in the next
CL.
name old time/op new time/op delta
Pool-12 3.00ns ± 0% 2.21ns ±36% -26.32% (p=0.000 n=18+19)
PoolOverflow-12 600ns ± 1% 587ns ± 1% -2.21% (p=0.000 n=16+18)
PoolSTW-12 71.0µs ± 2% 5.6µs ± 3% -92.15% (p=0.000 n=20+20)
PoolExpensiveNew-12 3.14ms ± 5% 3.69ms ± 7% +17.67% (p=0.000 n=19+20)
name old p50-ns/STW new p50-ns/STW delta
PoolSTW-12 70.7k ± 1% 5.5k ± 2% -92.25% (p=0.000 n=20+20)
name old p95-ns/STW new p95-ns/STW delta
PoolSTW-12 73.1k ± 2% 6.7k ± 4% -90.86% (p=0.000 n=18+19)
name old GCs/op new GCs/op delta
PoolExpensiveNew-12 0.38 ± 1% 0.39 ± 1% +2.07% (p=0.000 n=20+18)
name old New/op new New/op delta
PoolExpensiveNew-12 33.9 ± 6% 40.0 ± 6% +17.97% (p=0.000 n=19+20)
(https://perf.golang.org/search?q=upload:20190311.1)
Fixes #22331.
For #22950.
Change-Id: Ic5cd826e25e218f3f8256dbc4d22835c1fecb391
Reviewed-on: https://go-review.googlesource.com/c/go/+/166960
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Make poolLocal size multiple of 128, so it aligns to CPU cache line
on the most common architectures.
This also has the following benefits:
- It may help compiler substituting integer multiplication
by bit shift inside indexLocal.
- It shrinks poolLocal size from 176 bytes to 128 bytes on amd64,
so now it fits two cache lines (or a single cache line on certain
Intel CPUs - see https://software.intel.com/en-us/articles/optimizing-application-performance-on-intel-coret-microarchitecture-using-hardware-implemented-prefetchers).
No measurable performance changes on linux/amd64 and linux/386.
Change-Id: I11df0f064718a662e77a85d88b8a15a8919f25e9
Reviewed-on: https://go-review.googlesource.com/40918
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Rewrite indexLocal to achieve higher performance.
Performance results on linux/amd64:
name old time/op new time/op delta
Pool-4 19.1ns ± 2% 10.1ns ± 1% -47.15% (p=0.000 n=10+8)
PoolOverflow-4 3.11µs ± 1% 2.10µs ± 2% -32.66% (p=0.000 n=10+10)
Performance results on linux/386:
name old time/op new time/op delta
Pool-4 20.0ns ± 2% 13.1ns ± 1% -34.59% (p=0.000 n=10+9)
PoolOverflow-4 3.51µs ± 1% 2.49µs ± 0% -28.99% (p=0.000 n=10+8)
Change-Id: I7d57a2d4cd47ec43d09ca1267bde2e3f05a9faa9
Reviewed-on: https://go-review.googlesource.com/40913
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Disabled by https://golang.org/cl/53020044 due to false positives.
Reenable and model properly.
Fixes #17306.
Change-Id: I28405ddfcd17f58cf1427c300273212729154359
Reviewed-on: https://go-review.googlesource.com/31589
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
|
|
Embed noLock struct into the following types, so `go vet -copylocks` catches
their copying additionally to types containing sync.Mutex:
- sync.Cond
- sync.WaitGroup
- sync.Pool
- atomic.Value
Fixes #14582
Change-Id: Icb543ef5ad10524ad239a15eec8a9b334b0e0660
Reviewed-on: https://go-review.googlesource.com/22015
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
cmd and runtime were handled separately, and I'm intentionally skipped
syscall. This is the rest of the standard library.
CL generated mechanically with github.com/mdempsky/unconvert.
Change-Id: I9e0eff886974dedc37adb93f602064b83e469122
Reviewed-on: https://go-review.googlesource.com/22104
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
You can not use cannot, but you cannot spell cannot can not.
Change-Id: I2f0971481a460804de96fd8c9e46a9cc62a3fc5b
Reviewed-on: https://go-review.googlesource.com/19772
Reviewed-by: Rob Pike <r@golang.org>
|
|
Factor out duplicated race thunks from sync, syscall net
and fmt packages into a separate package and use it.
Fixes #8593
Change-Id: I156869c50946277809f6b509463752e7f7d28cdb
Reviewed-on: https://go-review.googlesource.com/14870
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Pool memory was only being released during the first GC after the first Put.
Put assumes that p.local != nil means p is on the allPools list.
poolCleanup (called during each GC) removed each pool from allPools
but did not clear p.local, so each pool was cleared by exactly one GC
and then never cleared again.
This bug was introduced late in the Go 1.3 release cycle.
Fixes #8979.
LGTM=rsc
R=golang-codereviews, bradfitz, r, rsc
CC=golang-codereviews, khr
https://golang.org/cl/162980043
|
|
Preparation was in CL 134570043.
This CL contains only the effect of 'hg mv src/pkg/* src'.
For more about the move, see golang.org/s/go14nopkg.
|