| Age | Commit message (Collapse) | Author |
|
It wasn't actually testing what it says it was testing.
A random permutation isn't cyclic. It only probably hits a few
elements before entering a cycle.
Use an algorithm that generates a random cyclic permutation instead.
Fixing the test makes the previous CL look less good. But it still helps.
(Theory: Fixing the test makes it less cache friendly, so there are
more misses all around. That makes the benchmark slower, suppressing
the differences seen. Also fixing the benchmark makes the loop
iteration count less predictable, which hurts the raw loop
implementation somewhat.)
(baseline = tip, experiment = tip+previous CL, noswiss = GOEXPERIMENT=noswissmap)
goos: darwin
goarch: arm64
pkg: runtime
cpu: Apple M2 Ultra
│ baseline │ experiment │
│ sec/op │ sec/op vs base │
MapCycle-24 20.59n ± 4% 18.99n ± 3% -7.77% (p=0.000 n=10)
khr@Mac-Studio src % benchstat noswiss experiment
goos: darwin
goarch: arm64
pkg: runtime
cpu: Apple M2 Ultra
│ noswiss │ experiment │
│ sec/op │ sec/op vs base │
MapCycle-24 16.12n ± 1% 18.99n ± 3% +17.83% (p=0.000 n=10)
Change-Id: I3a4edb814ba97fec020a6698c535ce3a87a9fc67
Reviewed-on: https://go-review.googlesource.com/c/go/+/625900
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
We've been getting intermittent flakes in this test since 2023,
all reporting values just barely over 100kB on windows-386.
If we were happy with 100kB, we should be happy with 128kB,
and it should fix the flakes.
Fixes #58570.
Change-Id: Iabe734cfbba6fe28a83f62e7811ee03fed424f0b
Reviewed-on: https://go-review.googlesource.com/c/go/+/628795
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
We don't really need the index of the slot we're looking at.
Just keep looking until there are no more filled slots.
This particularly helps when there are only a few filled entries
(packed at the bottom), and we're looking for something that isn't
there. We exit earlier than we would otherwise.
goos: darwin
goarch: arm64
pkg: runtime
cpu: Apple M2 Ultra
│ baseline │ experiment │
│ sec/op │ sec/op vs base │
MapSmallAccessHit/Key=int64/Elem=int64/len=1-24 2.759n ± 0% 2.779n ± 2% ~ (p=0.055 n=10)
MapSmallAccessHit/Key=int64/Elem=int64/len=2-24 2.862n ± 1% 2.922n ± 1% +2.08% (p=0.000 n=10)
MapSmallAccessHit/Key=int64/Elem=int64/len=3-24 3.003n ± 0% 3.061n ± 1% +1.91% (p=0.000 n=10)
MapSmallAccessHit/Key=int64/Elem=int64/len=4-24 3.170n ± 1% 3.188n ± 1% +0.57% (p=0.030 n=10)
MapSmallAccessHit/Key=int64/Elem=int64/len=5-24 3.387n ± 1% 3.391n ± 1% ~ (p=0.362 n=10)
MapSmallAccessHit/Key=int64/Elem=int64/len=6-24 3.601n ± 1% 3.584n ± 0% -0.49% (p=0.009 n=10)
MapSmallAccessHit/Key=int64/Elem=int64/len=7-24 3.785n ± 1% 3.778n ± 3% ~ (p=0.987 n=10)
MapSmallAccessHit/Key=int64/Elem=int64/len=8-24 3.960n ± 1% 3.946n ± 1% ~ (p=0.256 n=10)
MapSmallAccessMiss/Key=int64/Elem=int64/len=0-24 2.004n ± 1%
MapSmallAccessMiss/Key=int64/Elem=int64/len=1-24 5.145n ± 1% 2.411n ± 1% -53.14% (p=0.000 n=10)
MapSmallAccessMiss/Key=int64/Elem=int64/len=2-24 5.128n ± 0% 3.313n ± 1% -35.40% (p=0.000 n=10)
MapSmallAccessMiss/Key=int64/Elem=int64/len=3-24 5.159n ± 1% 3.690n ± 1% -28.48% (p=0.000 n=10)
MapSmallAccessMiss/Key=int64/Elem=int64/len=4-24 5.117n ± 1% 4.466n ± 6% -12.73% (p=0.000 n=10)
MapSmallAccessMiss/Key=int64/Elem=int64/len=5-24 5.115n ± 1% 4.308n ± 1% -15.79% (p=0.000 n=10)
MapSmallAccessMiss/Key=int64/Elem=int64/len=6-24 5.111n ± 1% 4.538n ± 2% -11.19% (p=0.000 n=10)
MapSmallAccessMiss/Key=int64/Elem=int64/len=7-24 4.896n ± 4% 4.831n ± 1% -1.33% (p=0.001 n=10)
MapSmallAccessMiss/Key=int64/Elem=int64/len=8-24 4.905n ± 1% 5.121n ± 1% +4.40% (p=0.000 n=10)
geomean 3.917n 3.631n -11.11%
Change-Id: Ife26ac457a513af24fa0921b839ee6cd5fed6fba
Reviewed-on: https://go-review.googlesource.com/c/go/+/627717
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
For large strings, do a quick equality check on all the slots.
Only if more than one passes the quick equality check do we
resort to hashing.
│ baseline │ experiment │
│ sec/op │ sec/op vs base │
MegMap-24 16609.50n ± 1% 13.91n ± 3% -99.92% (p=0.000 n=10)
MegOneMap-24 16655.00n ± 0% 12.27n ± 1% -99.93% (p=0.000 n=10)
MegEqMap-24 41.31µ ± 1% 25.03µ ± 1% -39.40% (p=0.000 n=10)
MegEmptyMap-24 2.034n ± 0% 2.027n ± 2% ~ (p=0.541 n=10)
MegEmptyMapWithInterfaceKey-24 5.931n ± 2% 5.599n ± 1% -5.60% (p=0.000 n=10)
MapStringKeysEight_16-24 8.473n ± 7% 8.224n ± 5% ~ (p=0.315 n=10)
MapStringKeysEight_32-24 8.441n ± 2% 8.147n ± 1% -3.48% (p=0.002 n=10)
MapStringKeysEight_64-24 8.769n ± 1% 8.517n ± 1% -2.87% (p=0.000 n=10)
MapStringKeysEight_128-24 10.73n ± 4% 13.57n ± 8% +26.57% (p=0.000 n=10)
MapStringKeysEight_256-24 12.97n ± 2% 14.35n ± 4% +10.64% (p=0.001 n=10)
MapStringKeysEight_1M-24 17359.50n ± 3% 13.92n ± 4% -99.92% (p=0.000 n=10)
Change-Id: I4cc2ea4edab12a4b03236de626c7bcf0f96b6cc0
Reviewed-on: https://go-review.googlesource.com/c/go/+/625905
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Switch labelMap from map[string]string to use LabelSet as a data
structure. Optimize Labels() for the case where the keys are given in
sorted order without duplicates.
This is primarily motivated by reducing the overhead of distributed
tracing systems that use pprof labels. We have encountered cases where
users complained about the overhead relative to the rest of our
distributed tracing library code. Additionally, we see this as an
opportunity to free up hundreds of CPU cores across our fleet.
A secondary motivation is eBPF profilers that try to access pprof
labels. The current map[string]string requires them to implement Go map
access in eBPF, which is non-trivial. With the enablement of swiss maps,
this complexity is only increasing. The slice data structure introduced
in this CL will greatly lower the implementation complexity for eBPF
profilers in the future. But to be clear: This change does not imply
that the pprof label mechanism is now a stable ABI. They are still an
implementation detail and may change again in the future.
goos: darwin
goarch: arm64
pkg: runtime/pprof
cpu: Apple M1 Max
│ baseline.txt │ patch1.txt │
│ sec/op │ sec/op vs base │
Labels/set-one-10 153.50n ± 3% 75.00n ± 1% -51.14% (p=0.000 n=10)
Labels/merge-one-10 187.8n ± 1% 128.8n ± 1% -31.42% (p=0.000 n=10)
Labels/overwrite-one-10 193.1n ± 2% 102.0n ± 1% -47.18% (p=0.000 n=10)
Labels/ordered/set-many-10 502.6n ± 4% 146.1n ± 2% -70.94% (p=0.000 n=10)
Labels/ordered/merge-many-10 516.3n ± 2% 238.1n ± 1% -53.89% (p=0.000 n=10)
Labels/ordered/overwrite-many-10 569.3n ± 4% 247.6n ± 2% -56.51% (p=0.000 n=10)
Labels/unordered/set-many-10 488.9n ± 2% 308.3n ± 3% -36.94% (p=0.000 n=10)
Labels/unordered/merge-many-10 523.6n ± 1% 258.5n ± 1% -50.64% (p=0.000 n=10)
Labels/unordered/overwrite-many-10 571.4n ± 1% 412.1n ± 2% -27.89% (p=0.000 n=10)
geomean 366.8n 186.9n -49.05%
│ baseline.txt │ patch1b.txt │
│ B/op │ B/op vs base │
Labels/set-one-10 424.0 ± 0% 104.0 ± 0% -75.47% (p=0.000 n=10)
Labels/merge-one-10 424.0 ± 0% 200.0 ± 0% -52.83% (p=0.000 n=10)
Labels/overwrite-one-10 424.0 ± 0% 136.0 ± 0% -67.92% (p=0.000 n=10)
Labels/ordered/set-many-10 1344.0 ± 0% 392.0 ± 0% -70.83% (p=0.000 n=10)
Labels/ordered/merge-many-10 1184.0 ± 0% 712.0 ± 0% -39.86% (p=0.000 n=10)
Labels/ordered/overwrite-many-10 1056.0 ± 0% 712.0 ± 0% -32.58% (p=0.000 n=10)
Labels/unordered/set-many-10 1344.0 ± 0% 712.0 ± 0% -47.02% (p=0.000 n=10)
Labels/unordered/merge-many-10 1184.0 ± 0% 712.0 ± 0% -39.86% (p=0.000 n=10)
Labels/unordered/overwrite-many-10 1.031Ki ± 0% 1.008Ki ± 0% -2.27% (p=0.000 n=10)
geomean 843.1 405.1 -51.95%
│ baseline.txt │ patch1b.txt │
│ allocs/op │ allocs/op vs base │
Labels/set-one-10 5.000 ± 0% 3.000 ± 0% -40.00% (p=0.000 n=10)
Labels/merge-one-10 5.000 ± 0% 5.000 ± 0% ~ (p=1.000 n=10) ¹
Labels/overwrite-one-10 5.000 ± 0% 4.000 ± 0% -20.00% (p=0.000 n=10)
Labels/ordered/set-many-10 8.000 ± 0% 3.000 ± 0% -62.50% (p=0.000 n=10)
Labels/ordered/merge-many-10 8.000 ± 0% 5.000 ± 0% -37.50% (p=0.000 n=10)
Labels/ordered/overwrite-many-10 7.000 ± 0% 4.000 ± 0% -42.86% (p=0.000 n=10)
Labels/unordered/set-many-10 8.000 ± 0% 4.000 ± 0% -50.00% (p=0.000 n=10)
Labels/unordered/merge-many-10 8.000 ± 0% 5.000 ± 0% -37.50% (p=0.000 n=10)
Labels/unordered/overwrite-many-10 7.000 ± 0% 5.000 ± 0% -28.57% (p=0.000 n=10)
geomean 6.640 4.143 -37.60%
¹ all samples are equal
Change-Id: Ie68e960a25c2d97bcfb6239dc481832fa8a39754
Reviewed-on: https://go-review.googlesource.com/c/go/+/574516
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
This change adds the implementation for AddCleanup.Stop. It allows the
caller to cancel the call to execute the cleanup. Cleanup will not be
stopped if the cleanup has already been queued for execution.
For #67535
Change-Id: I494b77d344e54d772c41489d172286773c3814e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/627975
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Carlos Amedee <carlos@golang.org>
|
|
This change introduces AddCleanup to the runtime package. AddCleanup attaches
a cleanup function to an pointer to an object.
The Stop method on Cleanups will be implemented in a followup CL.
AddCleanup is intended to be an incremental improvement over
SetFinalizer and will result in SetFinalizer being deprecated.
For #67535
Change-Id: I99645152e3fdcee85fcf42a4f312c6917e8aecb1
Reviewed-on: https://go-review.googlesource.com/c/go/+/627695
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
This PR will use test.Skip to bypass a test run for which the vmmap
subprocess appears to hang before the test times out.
In addition it catches a different error message from vmmap that can
occur due to transient resource shortages and triggers a retry for
this additional case.
Fixes #62352
Change-Id: I3ae749e5cd78965c45b1b7c689b896493aa37ba0
Reviewed-on: https://go-review.googlesource.com/c/go/+/560935
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The previous CALLFN macro was copying a single byte at a time
which is inefficient on loong64. In this CL, according to the
argsize, copy 16 bytes or 8 bytes at a time, and copy 1 byte
a time for the rest.
benchmark in reflect on 3A5000 and 3A6000:
goos: linux
goarch: loong64
pkg: reflect
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
CallArgCopy/size=128 360.2n ± 0% 266.9n ± 0% -25.90% (p=0.000 n=20)
CallArgCopy/size=256 473.2n ± 0% 277.5n ± 0% -41.35% (p=0.000 n=20)
CallArgCopy/size=1024 1128.0n ± 0% 332.9n ± 0% -70.49% (p=0.000 n=20)
CallArgCopy/size=4096 3743.0n ± 0% 672.6n ± 0% -82.03% (p=0.000 n=20)
CallArgCopy/size=65536 58.888µ ± 0% 9.667µ ± 0% -83.58% (p=0.000 n=20)
geomean 2.116µ 693.4n -67.22%
| bench.old | bench.new |
| B/s | B/s vs base |
CallArgCopy/size=128 338.9Mi ± 0% 457.3Mi ± 0% +34.94% (p=0.000 n=20)
CallArgCopy/size=256 516.0Mi ± 0% 879.8Mi ± 0% +70.52% (p=0.000 n=20)
CallArgCopy/size=1024 865.5Mi ± 0% 2933.6Mi ± 0% +238.94% (p=0.000 n=20)
CallArgCopy/size=4096 1.019Gi ± 0% 5.672Gi ± 0% +456.52% (p=0.000 n=20)
CallArgCopy/size=65536 1.036Gi ± 0% 6.313Gi ± 0% +509.13% (p=0.000 n=20)
geomean 699.6Mi 2.085Gi +205.10%
goos: linux
goarch: loong64
pkg: reflect
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
CallArgCopy/size=128 466.6n ± 0% 368.7n ± 0% -20.98% (p=0.000 n=20)
CallArgCopy/size=256 579.4n ± 0% 384.6n ± 0% -33.62% (p=0.000 n=20)
CallArgCopy/size=1024 1273.0n ± 0% 492.0n ± 0% -61.35% (p=0.000 n=20)
CallArgCopy/size=4096 4049.0n ± 0% 978.1n ± 0% -75.84% (p=0.000 n=20)
CallArgCopy/size=65536 69.01µ ± 0% 14.50µ ± 0% -78.99% (p=0.000 n=20)
geomean 2.492µ 997.9n -59.96%
| bench.old | bench.new |
| B/s | B/s vs base |
CallArgCopy/size=128 261.6Mi ± 0% 331.0Mi ± 0% +26.54% (p=0.000 n=20)
CallArgCopy/size=256 421.4Mi ± 0% 634.8Mi ± 0% +50.66% (p=0.000 n=20)
CallArgCopy/size=1024 767.2Mi ± 0% 1985.0Mi ± 0% +158.75% (p=0.000 n=20)
CallArgCopy/size=4096 964.8Mi ± 0% 3993.8Mi ± 0% +313.95% (p=0.000 n=20)
CallArgCopy/size=65536 905.7Mi ± 0% 4310.6Mi ± 0% +375.97% (p=0.000 n=20)
geomean 593.9Mi 1.449Gi +149.76%
Change-Id: I9570395af80b2e4b760058098a1b5b07d4b37ad7
Reviewed-on: https://go-review.googlesource.com/c/go/+/627175
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Change-Id: I13fd5a96daff66a2ecb54f5bafa3d6e5c60f3879
Reviewed-on: https://go-review.googlesource.com/c/go/+/586357
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
|
|
The tri-state mutex implementation (unlocked, locked, sleeping) avoids
sleep/wake syscalls when contention is low or absent, but its
performance degrades when many threads are contending for a mutex to
execute a fast critical section.
A fast critical section means frequent unlock2 calls. Each of those
finds the mutex in the "sleeping" state and so wakes a sleeping thread,
even if many other threads are already awake and in the spin loop of
lock2 attempting to acquire the mutex for themselves. Many spinning
threads means wasting energy and CPU time that could be used by other
processes on the machine. Many threads all spinning on the same cache
line leads to performance collapse.
Merge the futex- and semaphore-based mutex implementations by using a
semaphore abstraction for futex platforms. Then, add a bit to the mutex
state word that communicates whether one of the waiting threads is awake
and spinning. When threads in lock2 see the new "spinning" bit, they can
sleep immediately. In unlock2, the "spinning" bit means we can save a
syscall and not wake a sleeping thread.
This brings up the real possibility of starvation: waiting threads are
able to enter a deeper sleep than before, since one of their peers can
volunteer to be the sole "spinning" thread and thus cause unlock2 to
skip the semawakeup call. Additionally, the waiting threads form a LIFO
stack so any wakeups that do occur will target threads that have gone to
sleep most recently. Counteract those effects by periodically waking the
thread at the bottom of the stack and allowing it to spin.
Exempt sched.lock from most of the new behaviors; it's often used by
several threads in sequence to do thread-specific work, so low-latency
handoff is a priority over improved throughput.
Gate use of this implementation behind GOEXPERIMENT=spinbitmutex, so
it's easy to disable. Enable it by default on supported platforms (the
most efficient implementation requires atomic.Xchg8).
Fixes #68578
goos: linux
goarch: amd64
pkg: runtime
cpu: 13th Gen Intel(R) Core(TM) i7-13700H
│ old │ new │
│ sec/op │ sec/op vs base │
MutexContention 17.82n ± 0% 17.74n ± 0% -0.42% (p=0.000 n=10)
MutexContention-2 22.17n ± 9% 19.85n ± 12% ~ (p=0.089 n=10)
MutexContention-3 26.14n ± 14% 20.81n ± 13% -20.41% (p=0.000 n=10)
MutexContention-4 29.28n ± 8% 21.19n ± 10% -27.62% (p=0.000 n=10)
MutexContention-5 31.79n ± 2% 21.98n ± 10% -30.83% (p=0.000 n=10)
MutexContention-6 34.63n ± 1% 22.58n ± 5% -34.79% (p=0.000 n=10)
MutexContention-7 44.16n ± 2% 23.14n ± 7% -47.59% (p=0.000 n=10)
MutexContention-8 53.81n ± 3% 23.66n ± 6% -56.04% (p=0.000 n=10)
MutexContention-9 65.58n ± 4% 23.91n ± 9% -63.54% (p=0.000 n=10)
MutexContention-10 77.35n ± 3% 26.06n ± 9% -66.31% (p=0.000 n=10)
MutexContention-11 89.62n ± 1% 25.56n ± 9% -71.47% (p=0.000 n=10)
MutexContention-12 102.45n ± 2% 25.57n ± 7% -75.04% (p=0.000 n=10)
MutexContention-13 111.95n ± 1% 24.59n ± 8% -78.04% (p=0.000 n=10)
MutexContention-14 123.95n ± 3% 24.42n ± 6% -80.30% (p=0.000 n=10)
MutexContention-15 120.80n ± 10% 25.54n ± 6% -78.86% (p=0.000 n=10)
MutexContention-16 128.10n ± 25% 26.95n ± 4% -78.96% (p=0.000 n=10)
MutexContention-17 139.80n ± 18% 24.96n ± 5% -82.14% (p=0.000 n=10)
MutexContention-18 141.35n ± 7% 25.05n ± 8% -82.27% (p=0.000 n=10)
MutexContention-19 151.35n ± 18% 25.72n ± 6% -83.00% (p=0.000 n=10)
MutexContention-20 153.30n ± 20% 24.75n ± 6% -83.85% (p=0.000 n=10)
MutexHandoff/Solo-20 13.54n ± 1% 13.61n ± 4% ~ (p=0.206 n=10)
MutexHandoff/FastPingPong-20 141.3n ± 209% 164.8n ± 49% ~ (p=0.436 n=10)
MutexHandoff/SlowPingPong-20 1.572µ ± 16% 1.804µ ± 19% +14.76% (p=0.015 n=10)
geomean 74.34n 30.26n -59.30%
goos: darwin
goarch: arm64
pkg: runtime
cpu: Apple M1
│ old │ new │
│ sec/op │ sec/op vs base │
MutexContention 13.86n ± 3% 12.09n ± 3% -12.73% (p=0.000 n=10)
MutexContention-2 15.88n ± 1% 16.50n ± 2% +3.94% (p=0.001 n=10)
MutexContention-3 18.45n ± 2% 16.88n ± 2% -8.54% (p=0.000 n=10)
MutexContention-4 20.01n ± 2% 18.94n ± 18% ~ (p=0.469 n=10)
MutexContention-5 22.60n ± 1% 17.51n ± 9% -22.50% (p=0.000 n=10)
MutexContention-6 23.93n ± 2% 17.35n ± 2% -27.48% (p=0.000 n=10)
MutexContention-7 24.69n ± 1% 17.15n ± 3% -30.54% (p=0.000 n=10)
MutexContention-8 25.01n ± 1% 17.33n ± 2% -30.69% (p=0.000 n=10)
MutexHandoff/Solo-8 13.96n ± 4% 12.04n ± 4% -13.78% (p=0.000 n=10)
MutexHandoff/FastPingPong-8 68.89n ± 4% 64.62n ± 2% -6.20% (p=0.000 n=10)
MutexHandoff/SlowPingPong-8 9.698µ ± 22% 9.646µ ± 35% ~ (p=0.912 n=10)
geomean 38.20n 32.53n -14.84%
Change-Id: I0058c75eadf282d08eea7fce0d426f0518039f7c
Reviewed-on: https://go-review.googlesource.com/c/go/+/620435
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
|
|
Implement sema{create,sleep,wakeup} in terms of the futex syscall when
available. Split the lock2/unlock2 implementations out of lock_sema.go
and lock_futex.go (which they shared with runtime.note) to allow
swapping in new implementations of those.
Let futex-based platforms use the semaphore-based mutex implementation.
Control that via the new "spinbitmutex" GOEXPERMENT value, disabled by
default.
This lays the groundwork for a "spinbit" mutex implementation; it does
not include the new mutex implementation.
For #68578.
Change-Id: I091289c85124212a87abec7079ecbd9e610b4270
Reviewed-on: https://go-review.googlesource.com/c/go/+/622996
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This change moves the check for a change in the memory management
system to after the SetFinalizer parameters have been validated.
Moving the check ensures that invalid parameters will never pass the
validation checks.
Change-Id: I9f1d3454f891f7b147c0d86b6720297172e08ef9
Reviewed-on: https://go-review.googlesource.com/c/go/+/625035
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
We can find a few issues finally turned out to be a race condition,
such as #47513. I believe such a tip can eliminate the need for developers
to file this kind of issue in the first place.
Change-Id: I1597fa09fde641882e8e87453470941747705272
GitHub-Last-Rev: 9f136f5b3bee78f90f434dcea1cabf397c6c05f2
GitHub-Pull-Request: golang/go#70331
Reviewed-on: https://go-review.googlesource.com/c/go/+/627816
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
When multiple threads all need to acquire the same runtime.mutex, make
sure that none of them has to wait for too long. Measure how long a
single thread can capture the mutex, and how long individual other
threads must go between having a turn with the mutex.
For #68578
Change-Id: I56ecc551232f9c2730c128a9f8eeb7bd45c2d3b5
Reviewed-on: https://go-review.googlesource.com/c/go/+/622995
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Go has never supported Cygwin as a C compiler, but users get the
following cryptic error message when they try to use it:
implicit declaration of function '_beginthread'
This is because Cygwin doesn't implement _beginthread. Note that
this is not the only problem with Cygwin, but it's the one that
users are most likely to run into first.
This CL improves the error message to make it clear that Cygwin
is not supported, and suggests using MinGW instead.
Fixes #59490
Fixes #36691
Change-Id: Ifeec7a2cb38d7c5f50d6362c95504f72818c6a76
Reviewed-on: https://go-review.googlesource.com/c/go/+/627935
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
I believe now this code can work in both test and standalone situations.
Fixes #70057
Change-Id: Ieb5163e6b917fd03d050f65589df6c31ad2515fe
GitHub-Last-Rev: db4863c05e4d4bcbd40caf459d29e2eee81f847b
GitHub-Pull-Request: golang/go#70270
Reviewed-on: https://go-review.googlesource.com/c/go/+/625904
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Bypass: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Fixes #65328
Change-Id: I11242be93a95e117a6758ac037e143c3b38aa71c
Reviewed-on: https://go-review.googlesource.com/c/go/+/597980
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
|
|
Currently it's possible for weak->strong conversions to create more GC
work during mark termination. When a weak->strong conversion happens
during the mark phase, we need to mark the newly-strong pointer, since
it may now be the only pointer to that object. In other words, the
object could be white.
But queueing new white objects creates GC work, and if this happens
during mark termination, we could end up violating mark termination
invariants. In the parlance of the mark termination algorithm, the
weak->strong conversion is a non-monotonic source of GC work, unlike the
write barriers (which will eventually only see black objects).
This change fixes the problem by forcing weak->strong conversions to
block during mark termination. We can do this efficiently by setting a
global flag before the ragged barrier that is checked at each
weak->strong conversion. If the flag is set, then the conversions block.
The ragged barrier ensures that all Ps have observed the flag and that
any weak->strong conversions which completed before the ragged barrier
have their newly-minted strong pointers visible in GC work queues if
necessary. We later unset the flag and wake all the blocked goroutines
during the mark termination STW.
There are a few subtleties that we need to account for. For one, it's
possible that a goroutine which blocked in a weak->strong conversion
wakes up only to find it's mark termination time again, so we need to
recheck the global flag on wake. We should also stay non-preemptible
while performing the check, so that if the check *does* appear as true,
it cannot switch back to false while we're actively trying to block. If
it switches to false while we try to block, then we'll be stuck in the
queue until the following GC.
All-in-all, this CL is more complicated than I would have liked, but
it's the only idea so far that is clearly correct to me at a high level.
This change adds a test which is somewhat invasive as it manipulates
mark termination, but hopefully that infrastructure will be useful for
debugging, fixing, and regression testing mark termination whenever we
do fix it.
Fixes #69803.
Change-Id: Ie314e6fd357c9e2a07a9be21f217f75f7aba8c4a
Reviewed-on: https://go-review.googlesource.com/c/go/+/623615
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Iteration over swissmaps with low load (think map with large hint but
only one entry) is signicantly regressed vs old maps. See noswiss vs
swiss-tip below (+60%).
Currently we visit every single slot and individually check if the slot
is full or not.
We can do much better by using the control word to find all full slots
in a group in a single operation. This lets us skip completely empty
groups for instance.
Always using the control match approach is great for maps with low load,
but is a regression for mostly full maps. Mostly full maps have the
majority of slots full, so most calls to mapiternext will return the
next slot. In that case, doing the full group match on every call is
more expensive than checking the individual slot.
Thus we take a hybrid approach: on each call, we first check an
individual slot. If that slot is full, we're done. If that slot is
non-full, then we fall back to doing full group matches.
This trade-off works well. Both mostly empty and mostly full maps
perform nearly as well as doing all matching and all individual,
respectively.
The fast path is placed above the slow path loop rather than combined
(with some sort of `useMatch` variable) into a single loop to help the
compiler's code generation. The compiler really struggles with code
generation on a combined loop for some reason, yielding ~15% additional
instructions/op.
Comparison with old maps prior to this CL:
│ noswiss │ swiss-tip │
│ sec/op │ sec/op vs base │
MapIter/Key=int64/Elem=int64/len=6-12 11.53n ± 2% 10.64n ± 2% -7.72% (p=0.002 n=6)
MapIter/Key=int64/Elem=int64/len=64-12 10.180n ± 2% 9.670n ± 5% -5.01% (p=0.004 n=6)
MapIter/Key=int64/Elem=int64/len=65536-12 10.78n ± 1% 10.15n ± 2% -5.84% (p=0.002 n=6)
MapIterLowLoad/Key=int64/Elem=int64/len=6-12 6.116n ± 2% 6.840n ± 2% +11.84% (p=0.002 n=6)
MapIterLowLoad/Key=int64/Elem=int64/len=64-12 2.403n ± 2% 3.892n ± 0% +61.95% (p=0.002 n=6)
MapIterLowLoad/Key=int64/Elem=int64/len=65536-12 1.940n ± 3% 3.237n ± 1% +66.81% (p=0.002 n=6)
MapPop/Key=int64/Elem=int64/len=6-12 66.20n ± 2% 60.14n ± 3% -9.15% (p=0.002 n=6)
MapPop/Key=int64/Elem=int64/len=64-12 97.24n ± 1% 171.35n ± 1% +76.21% (p=0.002 n=6)
MapPop/Key=int64/Elem=int64/len=65536-12 826.1n ± 12% 842.5n ± 10% ~ (p=0.937 n=6)
geomean 17.93n 20.96n +16.88%
After this CL:
│ noswiss │ swiss-cl │
│ sec/op │ sec/op vs base │
MapIter/Key=int64/Elem=int64/len=6-12 11.53n ± 2% 10.90n ± 3% -5.42% (p=0.002 n=6)
MapIter/Key=int64/Elem=int64/len=64-12 10.180n ± 2% 9.719n ± 9% -4.53% (p=0.043 n=6)
MapIter/Key=int64/Elem=int64/len=65536-12 10.78n ± 1% 10.07n ± 2% -6.63% (p=0.002 n=6)
MapIterLowLoad/Key=int64/Elem=int64/len=6-12 6.116n ± 2% 7.022n ± 1% +14.82% (p=0.002 n=6)
MapIterLowLoad/Key=int64/Elem=int64/len=64-12 2.403n ± 2% 1.475n ± 1% -38.63% (p=0.002 n=6)
MapIterLowLoad/Key=int64/Elem=int64/len=65536-12 1.940n ± 3% 1.210n ± 6% -37.67% (p=0.002 n=6)
MapPop/Key=int64/Elem=int64/len=6-12 66.20n ± 2% 61.54n ± 2% -7.02% (p=0.002 n=6)
MapPop/Key=int64/Elem=int64/len=64-12 97.24n ± 1% 110.10n ± 1% +13.23% (p=0.002 n=6)
MapPop/Key=int64/Elem=int64/len=65536-12 826.1n ± 12% 504.7n ± 6% -38.91% (p=0.002 n=6)
geomean 17.93n 15.29n -14.74%
For #54766.
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10
Change-Id: Ic07f9df763239e85be57873103df5007144fdaef
Reviewed-on: https://go-review.googlesource.com/c/go/+/627156
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Change-Id: I3a3b7da6245a18bf1db0c595008f0eea853ce544
Reviewed-on: https://go-review.googlesource.com/c/go/+/627155
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
|
|
The failures in #70288 are consistent with and strongly imply
stack corruption during fault handling, and debug prints show
that the Go code run during fault handling is running about
300 bytes above the bottom of the goroutine stack.
That should be okay, but that implies the DLL code that called
Go's handler was running near the bottom of the stack too,
and maybe it called other deeper things before or after the
Go handler and smashed the stack that way.
stackSystem is already 4096 bytes on amd64;
making it match that on 386 makes the flaky failures go away.
It's a little unsatisfying not to be able to say exactly what is
overflowing the stack, but the circumstantial evidence is
very strong that it's Windows.
Fixes #70288.
Change-Id: Ife89385873d5e5062a71629dbfee40825edefa49
Reviewed-on: https://go-review.googlesource.com/c/go/+/627375
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Fixes #70189
Fixes #59411
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-noswissmap
Change-Id: I4ef7ecd7e996330189309cb2a658cf34bf9e1119
Reviewed-on: https://go-review.googlesource.com/c/go/+/625275
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Use Loong64's LSX instruction VPCNT to implement math/bits.OnesCount{16,32,64}
and make it intrinsic.
Benchmark results on loongson 3A5000 and 3A6000 machines:
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000-HV @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
OnesCount 4.413n ± 0% 1.401n ± 0% -68.25% (p=0.000 n=10)
OnesCount8 1.364n ± 0% 1.363n ± 0% ~ (p=0.130 n=10)
OnesCount16 2.112n ± 0% 1.534n ± 0% -27.37% (p=0.000 n=10)
OnesCount32 4.533n ± 0% 1.529n ± 0% -66.27% (p=0.000 n=10)
OnesCount64 4.565n ± 0% 1.531n ± 1% -66.46% (p=0.000 n=10)
geomean 3.048n 1.470n -51.78%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
OnesCount 3.553n ± 0% 1.201n ± 0% -66.20% (p=0.000 n=10)
OnesCount8 0.8021n ± 0% 0.8004n ± 0% -0.21% (p=0.000 n=10)
OnesCount16 1.216n ± 0% 1.000n ± 0% -17.76% (p=0.000 n=10)
OnesCount32 3.006n ± 0% 1.035n ± 0% -65.57% (p=0.000 n=10)
OnesCount64 3.503n ± 0% 1.035n ± 0% -70.45% (p=0.000 n=10)
geomean 2.053n 1.006n -51.01%
Change-Id: I07a5b8da2bb48711b896387ec7625145804affc8
Reviewed-on: https://go-review.googlesource.com/c/go/+/620978
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
│ baseline │ experiment │
│ sec/op │ sec/op vs base │
MapDeleteLargeKey-24 312.0n ± 6% 162.3n ± 5% -47.97% (p=0.000 n=10)
Change-Id: I31f1f8e3c344cf8abf2e9eb4b51b78fcd67b93c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/625906
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Now that we support pointer types on wasmimport functions, use
them, instead of unsafe.Pointer. This removes unsafe conversions.
There is still one unsafe.Pointer argument left. It is actually a
*Stat_t, which is an exported type with an int field, which is not
allowed as a wasmimport field type. We probably cannot change it
at this point.
Updates #66984.
Change-Id: I445c70b356c3877a5604bee67d19d99a538c682e
Reviewed-on: https://go-review.googlesource.com/c/go/+/627059
Reviewed-by: Johan Brandhorst-Satzkorn <johan.brandhorst@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
CL 622042 added rand as a compiler builtin, but did not update builtinlist.
Also update the mkbuiltin comment to refer to the current file location,
and add a comment for runtime.rand that it is called from the compiler.
For #54766
Change-Id: I99d2c0bb0658da333775afe2ed0447265c845c82
Reviewed-on: https://go-review.googlesource.com/c/go/+/626755
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
|
|
The publication barrier is a StoreStore barrier, which is implemented
by "DBAR 0x1A" [1] on loong64.
goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
Malloc8 31.76n ± 0% 22.79n ± 0% -28.24% (p=0.000 n=20)
Malloc8-2 25.46n ± 0% 18.33n ± 0% -28.00% (p=0.000 n=20)
Malloc8-4 25.75n ± 0% 18.43n ± 0% -28.41% (p=0.000 n=20)
Malloc16 62.97n ± 0% 42.41n ± 0% -32.65% (p=0.000 n=20)
Malloc16-2 49.11n ± 0% 31.68n ± 0% -35.50% (p=0.000 n=20)
Malloc16-4 49.64n ± 1% 31.95n ± 0% -35.62% (p=0.000 n=20)
MallocTypeInfo8 58.57n ± 0% 46.51n ± 0% -20.61% (p=0.000 n=20)
MallocTypeInfo8-2 51.43n ± 0% 38.01n ± 0% -26.09% (p=0.000 n=20)
MallocTypeInfo8-4 51.65n ± 0% 38.15n ± 0% -26.13% (p=0.000 n=20)
MallocTypeInfo16 68.07n ± 0% 51.62n ± 0% -24.17% (p=0.000 n=20)
MallocTypeInfo16-2 54.73n ± 0% 41.13n ± 0% -24.85% (p=0.000 n=20)
MallocTypeInfo16-4 55.05n ± 0% 41.28n ± 0% -25.02% (p=0.000 n=20)
MallocLargeStruct 491.5n ± 0% 454.8n ± 0% -7.47% (p=0.000 n=20)
MallocLargeStruct-2 351.8n ± 1% 323.8n ± 0% -7.94% (p=0.000 n=20)
MallocLargeStruct-4 333.6n ± 0% 316.7n ± 0% -5.10% (p=0.000 n=20)
geomean 71.01n 53.78n -24.26%
[1]: https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html
Change-Id: Ica0c89db6f2bebd55d9b3207a1c462a9454e9268
Reviewed-on: https://go-review.googlesource.com/c/go/+/577515
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
It is defined in bionic libc since at least API level 3. Use it.
Updates #68285.
Change-Id: I215c2d61d5612e7c0298b2cb69875690f8fbea66
Reviewed-on: https://go-review.googlesource.com/c/go/+/626275
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Add several benchmarks for pprof labels to analyze the impact of
follow-up CLs.
Change-Id: Ifae39cfe83ec93858fce9e3af6c1be024ba76736
Reviewed-on: https://go-review.googlesource.com/c/go/+/574515
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Sometimes the runtime realizes there is a race before the race detector does.
Maybe that's a bug in the race detector? But we should probably handle it.
Update #70164
(Fixes? I'm not sure.)
Change-Id: Ie7e8bf2b06701368e0551b4a1aa40f6746bbddd8
Reviewed-on: https://go-review.googlesource.com/c/go/+/626036
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
On Loong64, AMSWAPDB{W,V} instructions are supported by default, and AMSWAPDB{B,H} [1]
is a new instruction added by LA664(Loongson 3A6000) and later microarchitectures.
Therefore, AMSWAPDB{W,V} (full barrier) is used to implement AtomicStore{32,64}, and
the traditional MOVB or the new AMSWAPDBB is used to implement AtomicStore8 according
to the CPU feature.
The StoreRelease barrier on Loong64 is "dbar 0x12", but it is still necessary to
ensure consistency in the order of Store/Load [2].
LoweredAtomicStorezero{32,64} was removed because on loong64 the constant "0" uses
the R0 register, and there is no performance difference between the implementations
of LoweredAtomicStorezero{32,64} and LoweredAtomicStore{32,64}.
goos: linux
goarch: loong64
pkg: internal/runtime/atomic
cpu: Loongson-3A5000-HV @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
AtomicStore64 19.61n ± 0% 13.61n ± 0% -30.60% (p=0.000 n=20)
AtomicStore64-2 19.61n ± 0% 13.61n ± 0% -30.57% (p=0.000 n=20)
AtomicStore64-4 19.62n ± 0% 13.61n ± 0% -30.63% (p=0.000 n=20)
AtomicStore 19.61n ± 0% 13.61n ± 0% -30.60% (p=0.000 n=20)
AtomicStore-2 19.62n ± 0% 13.61n ± 0% -30.63% (p=0.000 n=20)
AtomicStore-4 19.62n ± 0% 13.62n ± 0% -30.58% (p=0.000 n=20)
AtomicStore8 19.61n ± 0% 20.01n ± 0% +2.04% (p=0.000 n=20)
AtomicStore8-2 19.62n ± 0% 20.02n ± 0% +2.01% (p=0.000 n=20)
AtomicStore8-4 19.61n ± 0% 20.02n ± 0% +2.09% (p=0.000 n=20)
geomean 19.61n 15.48n -21.08%
goos: linux
goarch: loong64
pkg: internal/runtime/atomic
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
AtomicStore64 18.03n ± 0% 12.81n ± 0% -28.93% (p=0.000 n=20)
AtomicStore64-2 18.02n ± 0% 12.81n ± 0% -28.91% (p=0.000 n=20)
AtomicStore64-4 18.01n ± 0% 12.81n ± 0% -28.87% (p=0.000 n=20)
AtomicStore 18.02n ± 0% 12.81n ± 0% -28.91% (p=0.000 n=20)
AtomicStore-2 18.01n ± 0% 12.81n ± 0% -28.87% (p=0.000 n=20)
AtomicStore-4 18.01n ± 0% 12.81n ± 0% -28.87% (p=0.000 n=20)
AtomicStore8 18.01n ± 0% 12.81n ± 0% -28.87% (p=0.000 n=20)
AtomicStore8-2 18.01n ± 0% 12.81n ± 0% -28.87% (p=0.000 n=20)
AtomicStore8-4 18.01n ± 0% 12.81n ± 0% -28.87% (p=0.000 n=20)
geomean 18.01n 12.81n -28.89%
[1]: https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html
[2]: https://gcc.gnu.org/git/?p=gcc.git;a=blob_plain;f=gcc/config/loongarch/sync.md
Change-Id: I4ae5e8dd0e6f026129b6e503990a763ed40c6097
Reviewed-on: https://go-review.googlesource.com/c/go/+/581356
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
The TestProfilerStackDepth/heap test can spuriously fail if the profiler
happens to capture a stack with an allocation several frames deep into
runtime code. The pprof API hides runtime frames at the leaf-end of
stacks, but those frames still count against the profiler's stack depth
limit. The test checks only the first stack it finds with the desired
prefix and fails if it's not deep enough or doesn't have the right root
frame. So it can fail in that scenario, even though the implementation
isn't really broken.
Relax the test to check that there is at least one stack with desired
prefix, depth, and root frame.
Fixes #70112
Change-Id: I337fb3cccd1ddde76530b03aa1ec0f9608aa4112
Reviewed-on: https://go-review.googlesource.com/c/go/+/623998
Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Change-Id: I352fa0e4e048b896d63427f1c2c519bfed24c702
Reviewed-on: https://go-review.googlesource.com/c/go/+/622017
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Currently, at a cgo callback where there is already a Go frame on
the stack (i.e. C->Go->C->Go), we require that at the inner Go
callback the SP is within the g0's stack bounds set by a previous
callback. This is to prevent that the C code switches stack while
having a Go frame on the stack, which we don't really support. But
this could also happen when we cannot get accurate stack bounds,
e.g. when pthread_getattr_np is not available. Since the stack
bounds are just estimates based on the current SP, if there are
multiple C->Go callbacks with various stack depth, it is possible
that the SP of a later callback falls out of a previous call's
estimate. This leads to runtime throw in a seemingly reasonable
program.
This CL changes it to save the old g0 stack bounds at cgocallback,
update the bounds, and restore the old bounds at return. So each
callback will get its own stack bounds based on the current SP,
and when it returns, the outer callback has the its old stack
bounds restored.
Also, at a cgo callback when there is no Go frame on the stack,
we currently always get new stack bounds. We do this because if
we can only get estimated bounds based on the SP, and the stack
depth varies a lot between two C->Go calls, the previous
estimates may be off and we fall out or nearly fall out of the
previous bounds. But this causes a performance problem: the
pthread API to get accurate stack bounds (pthread_getattr_np) is
very slow when called on the main thread. Getting the stack bounds
every time significantly slows down repeated C->Go calls on the
main thread.
This CL fixes it by "caching" the stack bounds if they are
accurate. I.e. at the second time Go calls into C, if the previous
stack bounds are accurate, and the current SP is in bounds, we can
be sure it is the same stack and we don't need to update the bounds.
This avoids the repeated calls to pthread_getattr_np. If we cannot
get the accurate bounds, we continue to update the stack bounds
based on the SP, and that operation is very cheap.
On a Linux/AMD64 machine with glibc:
name old time/op new time/op delta
CgoCallbackMainThread-8 96.4µs ± 3% 0.1µs ± 2% -99.92% (p=0.000 n=10+9)
Fixes #68285.
Fixes #68587.
Change-Id: I3422badd5ad8ff63e1a733152d05fb7a44d5d435
Reviewed-on: https://go-review.googlesource.com/c/go/+/600296
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
The compiler will stack allocate the Map struct and initial group if
possible.
Stack maps are initialized inline without calling into the runtime.
Small heap allocated maps use makemap_small.
These are the same heuristics as existing maps.
For #54766.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap
Change-Id: I6c371d1309716fd1c38a3212d417b3c76db5c9b9
Reviewed-on: https://go-review.googlesource.com/c/go/+/622042
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
|
|
Add all the specialized variants that exist for the existing maps.
Like the existing maps, the fast variants do not support indirect
key/elem.
Note that as of this CL, the Get and Put methods on Map/table are
effectively dead. They are only reachable from the internal/runtime/maps
unit tests.
For #54766.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap
Change-Id: I95297750be6200f34ec483e4cfc897f048c26db7
Reviewed-on: https://go-review.googlesource.com/c/go/+/616463
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
We use the same heuristics as existing maps.
For #54766.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap
Change-Id: I44bb51483cae2c1714717f1b501850fb9e55a39a
Reviewed-on: https://go-review.googlesource.com/c/go/+/616461
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This is the same design as existing maps.
For #54766.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap
Change-Id: I5f6ef5fea1e0f0616bcd90eaae7faee4cdac58c6
Reviewed-on: https://go-review.googlesource.com/c/go/+/616460
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
For #54766.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap
Change-Id: Iebc7f5482299cb7c4ecccc4c2eb46b4bc42c5fc3
Reviewed-on: https://go-review.googlesource.com/c/go/+/616459
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Rather than importing runtime directly, linkname the functions from
runtime. This allows importing internal/race from internal/runtime/*
packages, similar to internal/asan and internal/msan.
For #54766.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap
Change-Id: Ibd9644557782076e3cee7927c8a6e6d2909f0a6e
Reviewed-on: https://go-review.googlesource.com/c/go/+/616458
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
When given a hint size, set the initial capacity large enough to avoid
requiring growth in the average case.
When not given a hint (or given 0), don't allocate anything at all.
For #54766.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap
Change-Id: I8844fc652b8d2d4e5136cd56f7e78999a07fe381
Reviewed-on: https://go-review.googlesource.com/c/go/+/616457
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
|
|
On a whim I decided to investigate the possibility of whether the
flakiness on the asan builder was due to a concurrently executing test.
Of the most recent failures there were a few candidates, and this test
was one of them. After disabling each candidate one by one, we had a
winner: this test causes other concurrently executing tests, running
pure Go code, to spuriously fail.
I do not know why yet, but this test doesn't seem like it would have
incredibly high value for ASAN, and does funky things like MAP_FIXED in
recently unmapped regions, so I think it's fine.
For #70054.
For #64257.
Change-Id: Ib9a84d9b69812e76c390d99b00698710ee1ece1a
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-asan-clang15
Reviewed-on: https://go-review.googlesource.com/c/go/+/623336
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This enables manual inlining Map.Get/table.getWithoutKey to create a
simple fast path with no calls.
For #54766.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap
Change-Id: Ic208dd4c02c7554f312b85b5fadccaf82b23545c
Reviewed-on: https://go-review.googlesource.com/c/go/+/616455
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
|
|
Fixes #70008.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-race
Change-Id: I1fd7d1cbda20cc96016c864bcf0696382453e807
Reviewed-on: https://go-review.googlesource.com/c/go/+/623335
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Rather than storing the same type pointer in multiple places, just pass
it around.
For #54766.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap
Change-Id: Ia6c74805c7a44125ae473177b317f16c6688e6de
Reviewed-on: https://go-review.googlesource.com/c/go/+/622377
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This can get rid of a bounds check.
Followup to CL 622240.
Change-Id: I9d0a2c0408b8d274c46136d32d7a5fb09b4aad1c
Reviewed-on: https://go-review.googlesource.com/c/go/+/622955
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
These fail for the same reason as for the race detector, and is the most
frequently failing test in both.
For #70054.
For #64257.
For #64256.
Change-Id: I3649e58069190b4450f9d4deae6eb8eca5f827a3
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-asan-clang15,gotip-linux-amd64-msan-clang15
Reviewed-on: https://go-review.googlesource.com/c/go/+/623176
TryBot-Bypass: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
BGT, BLT, BLE, BGE, BNE, BVS, BVC, and BEQ support by assembler. This will simplify the usage of BC constructs like
BC 12, 30, LR <=> BEQ CR7, LR
BC 12, 2, LR <=> BEQ CR0, LR
BC 12, 0, target <=> BLT CR0, target
BC 12, 2, target <=> BEQ CR0, target
BC 12, 5, target <=> BGT CR1, target
BC 12, 30, target <=> BEQ CR7, target
BC 4, 6, target <=> BNE CR1, target
BC 4, 5, target <=> BLE CR1, target
code cleanup based on the above additions.
Change-Id: I02fdb212b6fe3f85ce447e05f4d42118c9ce63b5
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/612395
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Updates #69536
Change-Id: Ib68b0e7058221a89908fd47f255f0a983883bee8
Reviewed-on: https://go-review.googlesource.com/c/go/+/621075
Reviewed-by: Daniel McCarney <daniel@binaryparadox.net>
Auto-Submit: Filippo Valsorda <filippo@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|