path: root/src/runtime
2024-11-18  runtime: fix MapCycle test  (Keith Randall)
It wasn't actually testing what it says it was testing. A random permutation isn't cyclic; it probably only hits a few elements before entering a cycle. Use an algorithm that generates a random cyclic permutation instead.

Fixing the test makes the previous CL look less good, but it still helps. (Theory: fixing the test makes it less cache friendly, so there are more misses all around. That makes the benchmark slower, suppressing the differences seen. Fixing the benchmark also makes the loop iteration count less predictable, which hurts the raw loop implementation somewhat.)

(baseline = tip, experiment = tip+previous CL, noswiss = GOEXPERIMENT=noswissmap)

goos: darwin
goarch: arm64
pkg: runtime
cpu: Apple M2 Ultra
              │ baseline │ experiment │
              │  sec/op  │  sec/op     vs base              │
MapCycle-24     20.59n ± 4%   18.99n ± 3%  -7.77% (p=0.000 n=10)

benchstat noswiss experiment:

goos: darwin
goarch: arm64
pkg: runtime
cpu: Apple M2 Ultra
              │ noswiss  │ experiment │
              │  sec/op  │  sec/op     vs base              │
MapCycle-24     16.12n ± 1%   18.99n ± 3%  +17.83% (p=0.000 n=10)

Change-Id: I3a4edb814ba97fec020a6698c535ce3a87a9fc67
Reviewed-on: https://go-review.googlesource.com/c/go/+/625900
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2024-11-18  runtime: relax TestWindowsStackMemory from 100kB to 128kB  (Russ Cox)
We've been getting intermittent flakes in this test since 2023, all reporting values just barely over 100kB on windows-386. If we were happy with 100kB, we should be happy with 128kB, and it should fix the flakes. Fixes #58570. Change-Id: Iabe734cfbba6fe28a83f62e7811ee03fed424f0b Reviewed-on: https://go-review.googlesource.com/c/go/+/628795 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-11-17  internal/runtime/maps: simplify small group lookup  (Keith Randall)
We don't really need the index of the slot we're looking at. Just keep looking until there are no more filled slots. This particularly helps when there are only a few filled entries (packed at the bottom), and we're looking for something that isn't there. We exit earlier than we would otherwise.

goos: darwin
goarch: arm64
pkg: runtime
cpu: Apple M2 Ultra
                                                   │ baseline │ experiment │
                                                   │  sec/op  │  sec/op     vs base               │
MapSmallAccessHit/Key=int64/Elem=int64/len=1-24      2.759n ± 0%   2.779n ± 2%        ~ (p=0.055 n=10)
MapSmallAccessHit/Key=int64/Elem=int64/len=2-24      2.862n ± 1%   2.922n ± 1%   +2.08% (p=0.000 n=10)
MapSmallAccessHit/Key=int64/Elem=int64/len=3-24      3.003n ± 0%   3.061n ± 1%   +1.91% (p=0.000 n=10)
MapSmallAccessHit/Key=int64/Elem=int64/len=4-24      3.170n ± 1%   3.188n ± 1%   +0.57% (p=0.030 n=10)
MapSmallAccessHit/Key=int64/Elem=int64/len=5-24      3.387n ± 1%   3.391n ± 1%        ~ (p=0.362 n=10)
MapSmallAccessHit/Key=int64/Elem=int64/len=6-24      3.601n ± 1%   3.584n ± 0%   -0.49% (p=0.009 n=10)
MapSmallAccessHit/Key=int64/Elem=int64/len=7-24      3.785n ± 1%   3.778n ± 3%        ~ (p=0.987 n=10)
MapSmallAccessHit/Key=int64/Elem=int64/len=8-24      3.960n ± 1%   3.946n ± 1%        ~ (p=0.256 n=10)
MapSmallAccessMiss/Key=int64/Elem=int64/len=0-24                   2.004n ± 1%
MapSmallAccessMiss/Key=int64/Elem=int64/len=1-24     5.145n ± 1%   2.411n ± 1%  -53.14% (p=0.000 n=10)
MapSmallAccessMiss/Key=int64/Elem=int64/len=2-24     5.128n ± 0%   3.313n ± 1%  -35.40% (p=0.000 n=10)
MapSmallAccessMiss/Key=int64/Elem=int64/len=3-24     5.159n ± 1%   3.690n ± 1%  -28.48% (p=0.000 n=10)
MapSmallAccessMiss/Key=int64/Elem=int64/len=4-24     5.117n ± 1%   4.466n ± 6%  -12.73% (p=0.000 n=10)
MapSmallAccessMiss/Key=int64/Elem=int64/len=5-24     5.115n ± 1%   4.308n ± 1%  -15.79% (p=0.000 n=10)
MapSmallAccessMiss/Key=int64/Elem=int64/len=6-24     5.111n ± 1%   4.538n ± 2%  -11.19% (p=0.000 n=10)
MapSmallAccessMiss/Key=int64/Elem=int64/len=7-24     4.896n ± 4%   4.831n ± 1%   -1.33% (p=0.001 n=10)
MapSmallAccessMiss/Key=int64/Elem=int64/len=8-24     4.905n ± 1%   5.121n ± 1%   +4.40% (p=0.000 n=10)
geomean                                              3.917n        3.631n       -11.11%

Change-Id: Ife26ac457a513af24fa0921b839ee6cd5fed6fba
Reviewed-on: https://go-review.googlesource.com/c/go/+/627717
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
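The early-exit idea can be sketched in plain Go. The names and the control-byte value below are illustrative, not the real internal/runtime/maps code: because a small map's full slots are packed at the bottom of its single group, a probe can stop at the first empty slot instead of always scanning all eight.

```go
package main

import "fmt"

const (
	groupSlots = 8
	ctrlEmpty  = 0x80 // illustrative control byte marking an empty slot
)

type group struct {
	ctrl [groupSlots]byte // control bytes; full slots hold hash bits
	keys [groupSlots]int64
	vals [groupSlots]int64
}

// lookupSmall scans a single-group map. Full slots are packed at the
// bottom of the group, so hitting an empty slot means the key is
// absent, and misses on sparsely filled groups exit early.
func lookupSmall(g *group, key int64) (int64, bool) {
	for i := 0; i < groupSlots; i++ {
		if g.ctrl[i] == ctrlEmpty {
			return 0, false // no more filled slots
		}
		if g.keys[i] == key {
			return g.vals[i], true
		}
	}
	return 0, false
}

func main() {
	g := &group{}
	for i := range g.ctrl {
		g.ctrl[i] = ctrlEmpty
	}
	g.ctrl[0], g.keys[0], g.vals[0] = 0, 42, 7
	fmt.Println(lookupSmall(g, 42)) // hit in slot 0
	_, ok := lookupSmall(g, 99)    // miss: stops at slot 1, not slot 8
	fmt.Println(ok)
}
```

This matches the benchmark shape above: hits are barely affected, while misses on maps with few entries get dramatically cheaper.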
2024-11-17  runtime/internal/maps: optimize long string keys for small maps  (Keith Randall)
For large strings, do a quick equality check on all the slots. Only if more than one passes the quick equality check do we resort to hashing.

                                 │  baseline  │  experiment │
                                 │   sec/op   │   sec/op     vs base               │
MegMap-24                          16609.50n ± 1%    13.91n ± 3%  -99.92% (p=0.000 n=10)
MegOneMap-24                       16655.00n ± 0%    12.27n ± 1%  -99.93% (p=0.000 n=10)
MegEqMap-24                           41.31µ ± 1%    25.03µ ± 1%  -39.40% (p=0.000 n=10)
MegEmptyMap-24                        2.034n ± 0%    2.027n ± 2%        ~ (p=0.541 n=10)
MegEmptyMapWithInterfaceKey-24        5.931n ± 2%    5.599n ± 1%   -5.60% (p=0.000 n=10)
MapStringKeysEight_16-24              8.473n ± 7%    8.224n ± 5%        ~ (p=0.315 n=10)
MapStringKeysEight_32-24              8.441n ± 2%    8.147n ± 1%   -3.48% (p=0.002 n=10)
MapStringKeysEight_64-24              8.769n ± 1%    8.517n ± 1%   -2.87% (p=0.000 n=10)
MapStringKeysEight_128-24             10.73n ± 4%    13.57n ± 8%  +26.57% (p=0.000 n=10)
MapStringKeysEight_256-24             12.97n ± 2%    14.35n ± 4%  +10.64% (p=0.001 n=10)
MapStringKeysEight_1M-24           17359.50n ± 3%    13.92n ± 4%  -99.92% (p=0.000 n=10)

Change-Id: I4cc2ea4edab12a4b03236de626c7bcf0f96b6cc0
Reviewed-on: https://go-review.googlesource.com/c/go/+/625905
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
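The "quick equality check" avoids hashing a megabyte-long key just to find its slot. A minimal sketch of the idea, with a hypothetical prefilter that samples length and a few bytes (the runtime's actual check may sample differently):

```go
package main

import "fmt"

// quickEq is a cheap prefilter for long string keys: equal strings
// must match in length and in a few sampled bytes.
func quickEq(a, b string) bool {
	if len(a) != len(b) {
		return false
	}
	if len(a) == 0 {
		return true
	}
	return a[0] == b[0] && a[len(a)-1] == b[len(b)-1] && a[len(a)/2] == b[len(b)/2]
}

// lookupLongString scans a small map's slots. If at most one slot
// passes the prefilter, a single full comparison settles the lookup
// and no hash is ever computed; only when several slots pass do we
// fall back to the usual hash-based probe (elided here).
func lookupLongString(slots []string, key string) (int, bool) {
	candidate, n := -1, 0
	for i, s := range slots {
		if quickEq(s, key) {
			candidate, n = i, n+1
		}
	}
	switch n {
	case 0:
		return -1, false
	case 1:
		return candidate, slots[candidate] == key
	}
	// Multiple candidates: resort to hashing in the real code.
	for i, s := range slots {
		if s == key {
			return i, true
		}
	}
	return -1, false
}

func main() {
	slots := []string{"alpha", "beta", "gammagammagamma"}
	fmt.Println(lookupLongString(slots, "gammagammagamma"))
}
```

This explains the benchmark shape: maps of huge distinct strings (MegMap) go from hashing a 1MB key (~16µs) to a few byte compares (~14ns), while maps whose keys pass the prefilter but differ (the 128- and 256-byte cases) pay a modest extra cost.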
2024-11-16  runtime/pprof: reduce label overhead  (Felix Geisendörfer)
Switch labelMap from map[string]string to use LabelSet as a data structure. Optimize Labels() for the case where the keys are given in sorted order without duplicates.

This is primarily motivated by reducing the overhead of distributed tracing systems that use pprof labels. We have encountered cases where users complained about the overhead relative to the rest of our distributed tracing library code. Additionally, we see this as an opportunity to free up hundreds of CPU cores across our fleet.

A secondary motivation is eBPF profilers that try to access pprof labels. The current map[string]string requires them to implement Go map access in eBPF, which is non-trivial. With the enablement of swiss maps, this complexity is only increasing. The slice data structure introduced in this CL will greatly lower the implementation complexity for eBPF profilers in the future. But to be clear: this change does not imply that the pprof label mechanism is now a stable ABI. Labels are still an implementation detail and may change again in the future.
goos: darwin
goarch: arm64
pkg: runtime/pprof
cpu: Apple M1 Max
                                   │ baseline.txt │ patch1.txt │
                                   │    sec/op    │  sec/op     vs base               │
Labels/set-one-10                     153.50n ± 3%    75.00n ± 1%  -51.14% (p=0.000 n=10)
Labels/merge-one-10                    187.8n ± 1%    128.8n ± 1%  -31.42% (p=0.000 n=10)
Labels/overwrite-one-10                193.1n ± 2%    102.0n ± 1%  -47.18% (p=0.000 n=10)
Labels/ordered/set-many-10             502.6n ± 4%    146.1n ± 2%  -70.94% (p=0.000 n=10)
Labels/ordered/merge-many-10           516.3n ± 2%    238.1n ± 1%  -53.89% (p=0.000 n=10)
Labels/ordered/overwrite-many-10       569.3n ± 4%    247.6n ± 2%  -56.51% (p=0.000 n=10)
Labels/unordered/set-many-10           488.9n ± 2%    308.3n ± 3%  -36.94% (p=0.000 n=10)
Labels/unordered/merge-many-10         523.6n ± 1%    258.5n ± 1%  -50.64% (p=0.000 n=10)
Labels/unordered/overwrite-many-10     571.4n ± 1%    412.1n ± 2%  -27.89% (p=0.000 n=10)
geomean                                366.8n         186.9n       -49.05%

                                   │ baseline.txt │ patch1b.txt │
                                   │     B/op     │    B/op      vs base               │
Labels/set-one-10                      424.0 ± 0%     104.0 ± 0%  -75.47% (p=0.000 n=10)
Labels/merge-one-10                    424.0 ± 0%     200.0 ± 0%  -52.83% (p=0.000 n=10)
Labels/overwrite-one-10                424.0 ± 0%     136.0 ± 0%  -67.92% (p=0.000 n=10)
Labels/ordered/set-many-10            1344.0 ± 0%     392.0 ± 0%  -70.83% (p=0.000 n=10)
Labels/ordered/merge-many-10          1184.0 ± 0%     712.0 ± 0%  -39.86% (p=0.000 n=10)
Labels/ordered/overwrite-many-10      1056.0 ± 0%     712.0 ± 0%  -32.58% (p=0.000 n=10)
Labels/unordered/set-many-10          1344.0 ± 0%     712.0 ± 0%  -47.02% (p=0.000 n=10)
Labels/unordered/merge-many-10        1184.0 ± 0%     712.0 ± 0%  -39.86% (p=0.000 n=10)
Labels/unordered/overwrite-many-10   1.031Ki ± 0%   1.008Ki ± 0%   -2.27% (p=0.000 n=10)
geomean                                843.1          405.1       -51.95%

                                   │ baseline.txt │ patch1b.txt │
                                   │  allocs/op   │  allocs/op   vs base               │
Labels/set-one-10                      5.000 ± 0%     3.000 ± 0%  -40.00% (p=0.000 n=10)
Labels/merge-one-10                    5.000 ± 0%     5.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels/overwrite-one-10                5.000 ± 0%     4.000 ± 0%  -20.00% (p=0.000 n=10)
Labels/ordered/set-many-10             8.000 ± 0%     3.000 ± 0%  -62.50% (p=0.000 n=10)
Labels/ordered/merge-many-10           8.000 ± 0%     5.000 ± 0%  -37.50% (p=0.000 n=10)
Labels/ordered/overwrite-many-10       7.000 ± 0%     4.000 ± 0%  -42.86% (p=0.000 n=10)
Labels/unordered/set-many-10           8.000 ± 0%     4.000 ± 0%  -50.00% (p=0.000 n=10)
Labels/unordered/merge-many-10         8.000 ± 0%     5.000 ± 0%  -37.50% (p=0.000 n=10)
Labels/unordered/overwrite-many-10     7.000 ± 0%     5.000 ± 0%  -28.57% (p=0.000 n=10)
geomean                                6.640          4.143       -37.60%
¹ all samples are equal

Change-Id: Ie68e960a25c2d97bcfb6239dc481832fa8a39754
Reviewed-on: https://go-review.googlesource.com/c/go/+/574516
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
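The fast path for already-sorted keys can be sketched with a sorted key/value slice standing in for the map[string]string. Everything below is illustrative (the real LabelSet type and its invariants live in runtime/pprof); the sketch also ignores duplicate-key handling for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// label mirrors the idea of storing pprof labels as a sorted slice of
// key/value pairs instead of a map[string]string.
type label struct{ key, value string }

// makeLabels builds a label set from alternating key/value arguments.
// If the keys already arrive in sorted order (the common case for
// call sites passing literals), no sorting work is done at all; the
// check costs one string comparison per pair.
func makeLabels(kv ...string) []label {
	if len(kv)%2 != 0 {
		panic("uneven number of key/value pairs")
	}
	set := make([]label, 0, len(kv)/2)
	sorted := true
	for i := 0; i < len(kv); i += 2 {
		if len(set) > 0 && kv[i] <= set[len(set)-1].key {
			sorted = false // out of order (or duplicate): must sort
		}
		set = append(set, label{kv[i], kv[i+1]})
	}
	if !sorted {
		sort.Slice(set, func(i, j int) bool { return set[i].key < set[j].key })
	}
	return set
}

func main() {
	fmt.Println(makeLabels("region", "us", "service", "api"))
}
```

Keeping the pairs sorted in a flat slice is also what makes the data cheap for eBPF profilers to walk: it is a contiguous array rather than a hash table whose layout changes between Go releases.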
2024-11-16  runtime: implement Stop for AddCleanup  (Carlos Amedee)
This change adds the implementation of AddCleanup.Stop, which allows the caller to cancel execution of the cleanup. A cleanup cannot be stopped once it has already been queued for execution.

For #67535

Change-Id: I494b77d344e54d772c41489d172286773c3814e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/627975
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Carlos Amedee <carlos@golang.org>
2024-11-16  runtime: implement AddCleanup  (Carlos Amedee)
This change introduces AddCleanup to the runtime package. AddCleanup attaches a cleanup function to a pointer to an object. The Stop method on Cleanups will be implemented in a followup CL. AddCleanup is intended to be an incremental improvement over SetFinalizer and will result in SetFinalizer being deprecated.

For #67535

Change-Id: I99645152e3fdcee85fcf42a4f312c6917e8aecb1
Reviewed-on: https://go-review.googlesource.com/c/go/+/627695
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2024-11-16  runtime/pprof: continued attempt to deflake the VMInfo test  (Cosmos Nicolaou)
This PR will use test.Skip to bypass a test run for which the vmmap subprocess appears to hang before the test times out. In addition it catches a different error message from vmmap that can occur due to transient resource shortages and triggers a retry for this additional case. Fixes #62352 Change-Id: I3ae749e5cd78965c45b1b7c689b896493aa37ba0 Reviewed-on: https://go-review.googlesource.com/c/go/+/560935 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-16  runtime: improve CALLFN macro for loong64  (Xiaolin Zhao)
The previous CALLFN macro was copying a single byte at a time, which is inefficient on loong64. In this CL, depending on argsize, copy 16 or 8 bytes at a time, and copy the remaining bytes one at a time.

Benchmark in reflect on 3A5000 and 3A6000:

goos: linux
goarch: loong64
pkg: reflect
cpu: Loongson-3A6000 @ 2500.00MHz
                       │ bench.old │ bench.new │
                       │  sec/op   │  sec/op    vs base               │
CallArgCopy/size=128      360.2n ± 0%    266.9n ± 0%  -25.90% (p=0.000 n=20)
CallArgCopy/size=256      473.2n ± 0%    277.5n ± 0%  -41.35% (p=0.000 n=20)
CallArgCopy/size=1024    1128.0n ± 0%    332.9n ± 0%  -70.49% (p=0.000 n=20)
CallArgCopy/size=4096    3743.0n ± 0%    672.6n ± 0%  -82.03% (p=0.000 n=20)
CallArgCopy/size=65536   58.888µ ± 0%    9.667µ ± 0%  -83.58% (p=0.000 n=20)
geomean                   2.116µ        693.4n        -67.22%

                       │ bench.old │ bench.new │
                       │    B/s    │    B/s     vs base                │
CallArgCopy/size=128      338.9Mi ± 0%    457.3Mi ± 0%   +34.94% (p=0.000 n=20)
CallArgCopy/size=256      516.0Mi ± 0%    879.8Mi ± 0%   +70.52% (p=0.000 n=20)
CallArgCopy/size=1024     865.5Mi ± 0%   2933.6Mi ± 0%  +238.94% (p=0.000 n=20)
CallArgCopy/size=4096     1.019Gi ± 0%    5.672Gi ± 0%  +456.52% (p=0.000 n=20)
CallArgCopy/size=65536    1.036Gi ± 0%    6.313Gi ± 0%  +509.13% (p=0.000 n=20)
geomean                   699.6Mi         2.085Gi       +205.10%

goos: linux
goarch: loong64
pkg: reflect
cpu: Loongson-3A5000 @ 2500.00MHz
                       │ bench.old │ bench.new │
                       │  sec/op   │  sec/op    vs base               │
CallArgCopy/size=128      466.6n ± 0%    368.7n ± 0%  -20.98% (p=0.000 n=20)
CallArgCopy/size=256      579.4n ± 0%    384.6n ± 0%  -33.62% (p=0.000 n=20)
CallArgCopy/size=1024    1273.0n ± 0%    492.0n ± 0%  -61.35% (p=0.000 n=20)
CallArgCopy/size=4096    4049.0n ± 0%    978.1n ± 0%  -75.84% (p=0.000 n=20)
CallArgCopy/size=65536    69.01µ ± 0%    14.50µ ± 0%  -78.99% (p=0.000 n=20)
geomean                   2.492µ        997.9n        -59.96%

                       │ bench.old │ bench.new │
                       │    B/s    │    B/s     vs base                │
CallArgCopy/size=128      261.6Mi ± 0%    331.0Mi ± 0%   +26.54% (p=0.000 n=20)
CallArgCopy/size=256      421.4Mi ± 0%    634.8Mi ± 0%   +50.66% (p=0.000 n=20)
CallArgCopy/size=1024     767.2Mi ± 0%   1985.0Mi ± 0%  +158.75% (p=0.000 n=20)
CallArgCopy/size=4096     964.8Mi ± 0%   3993.8Mi ± 0%  +313.95% (p=0.000 n=20)
CallArgCopy/size=65536    905.7Mi ± 0%   4310.6Mi ± 0%  +375.97% (p=0.000 n=20)
geomean                   593.9Mi         1.449Gi       +149.76%

Change-Id: I9570395af80b2e4b760058098a1b5b07d4b37ad7
Reviewed-on: https://go-review.googlesource.com/c/go/+/627175
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
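The copy strategy is easy to see in a high-level sketch. The real change is loong64 assembly using wide loads and stores inside the CALLFN macro; this Go version only mirrors the chunking logic.

```go
package main

import "fmt"

// chunkCopy mimics the loong64 CALLFN strategy: move 16 bytes at a
// time while possible, then 8 bytes at a time, and finish the tail
// one byte at a time.
func chunkCopy(dst, src []byte) {
	n := 0
	for len(src)-n >= 16 {
		copy(dst[n:n+16], src[n:n+16]) // 16-byte chunk
		n += 16
	}
	for len(src)-n >= 8 {
		copy(dst[n:n+8], src[n:n+8]) // 8-byte chunk
		n += 8
	}
	for ; n < len(src); n++ { // byte-at-a-time tail
		dst[n] = src[n]
	}
}

func main() {
	src := make([]byte, 29) // 16 + 8 + 5-byte tail
	for i := range src {
		src[i] = byte(i)
	}
	dst := make([]byte, len(src))
	chunkCopy(dst, src)
	fmt.Println(dst[0], dst[28]) // 0 28
}
```

The benchmark gains grow with argsize because large argument frames spend almost all their time in the wide-copy loops rather than the byte tail.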
2024-11-16  runtime: use ABIInternal for calls to sigtrampgo on linux/loong64  (Guoqi Chen)
Change-Id: I13fd5a96daff66a2ecb54f5bafa3d6e5c60f3879 Reviewed-on: https://go-review.googlesource.com/c/go/+/586357 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Meidan Li <limeidan@loongson.cn> Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
2024-11-15  runtime: unify lock2, allow deeper sleep  (Rhys Hiltner)
The tri-state mutex implementation (unlocked, locked, sleeping) avoids sleep/wake syscalls when contention is low or absent, but its performance degrades when many threads are contending for a mutex to execute a fast critical section.

A fast critical section means frequent unlock2 calls. Each of those finds the mutex in the "sleeping" state and so wakes a sleeping thread, even if many other threads are already awake and in the spin loop of lock2 attempting to acquire the mutex for themselves. Many spinning threads means wasting energy and CPU time that could be used by other processes on the machine. Many threads all spinning on the same cache line leads to performance collapse.

Merge the futex- and semaphore-based mutex implementations by using a semaphore abstraction for futex platforms. Then, add a bit to the mutex state word that communicates whether one of the waiting threads is awake and spinning. When threads in lock2 see the new "spinning" bit, they can sleep immediately. In unlock2, the "spinning" bit means we can save a syscall and not wake a sleeping thread.

This brings up the real possibility of starvation: waiting threads are able to enter a deeper sleep than before, since one of their peers can volunteer to be the sole "spinning" thread and thus cause unlock2 to skip the semawakeup call. Additionally, the waiting threads form a LIFO stack so any wakeups that do occur will target threads that have gone to sleep most recently. Counteract those effects by periodically waking the thread at the bottom of the stack and allowing it to spin.

Exempt sched.lock from most of the new behaviors; it's often used by several threads in sequence to do thread-specific work, so low-latency handoff is a priority over improved throughput.

Gate use of this implementation behind GOEXPERIMENT=spinbitmutex, so it's easy to disable. Enable it by default on supported platforms (the most efficient implementation requires atomic.Xchg8).
Fixes #68578

goos: linux
goarch: amd64
pkg: runtime
cpu: 13th Gen Intel(R) Core(TM) i7-13700H
                               │    old    │    new    │
                               │  sec/op   │  sec/op    vs base               │
MutexContention                  17.82n ± 0%    17.74n ± 0%   -0.42% (p=0.000 n=10)
MutexContention-2                22.17n ± 9%    19.85n ± 12%       ~ (p=0.089 n=10)
MutexContention-3                26.14n ± 14%   20.81n ± 13%  -20.41% (p=0.000 n=10)
MutexContention-4                29.28n ± 8%    21.19n ± 10%  -27.62% (p=0.000 n=10)
MutexContention-5                31.79n ± 2%    21.98n ± 10%  -30.83% (p=0.000 n=10)
MutexContention-6                34.63n ± 1%    22.58n ± 5%   -34.79% (p=0.000 n=10)
MutexContention-7                44.16n ± 2%    23.14n ± 7%   -47.59% (p=0.000 n=10)
MutexContention-8                53.81n ± 3%    23.66n ± 6%   -56.04% (p=0.000 n=10)
MutexContention-9                65.58n ± 4%    23.91n ± 9%   -63.54% (p=0.000 n=10)
MutexContention-10               77.35n ± 3%    26.06n ± 9%   -66.31% (p=0.000 n=10)
MutexContention-11               89.62n ± 1%    25.56n ± 9%   -71.47% (p=0.000 n=10)
MutexContention-12              102.45n ± 2%    25.57n ± 7%   -75.04% (p=0.000 n=10)
MutexContention-13              111.95n ± 1%    24.59n ± 8%   -78.04% (p=0.000 n=10)
MutexContention-14              123.95n ± 3%    24.42n ± 6%   -80.30% (p=0.000 n=10)
MutexContention-15              120.80n ± 10%   25.54n ± 6%   -78.86% (p=0.000 n=10)
MutexContention-16              128.10n ± 25%   26.95n ± 4%   -78.96% (p=0.000 n=10)
MutexContention-17              139.80n ± 18%   24.96n ± 5%   -82.14% (p=0.000 n=10)
MutexContention-18              141.35n ± 7%    25.05n ± 8%   -82.27% (p=0.000 n=10)
MutexContention-19              151.35n ± 18%   25.72n ± 6%   -83.00% (p=0.000 n=10)
MutexContention-20              153.30n ± 20%   24.75n ± 6%   -83.85% (p=0.000 n=10)
MutexHandoff/Solo-20             13.54n ± 1%    13.61n ± 4%        ~ (p=0.206 n=10)
MutexHandoff/FastPingPong-20     141.3n ± 209%  164.8n ± 49%       ~ (p=0.436 n=10)
MutexHandoff/SlowPingPong-20     1.572µ ± 16%   1.804µ ± 19%  +14.76% (p=0.015 n=10)
geomean                          74.34n         30.26n        -59.30%

goos: darwin
goarch: arm64
pkg: runtime
cpu: Apple M1
                              │    old    │    new    │
                              │  sec/op   │  sec/op    vs base               │
MutexContention                 13.86n ± 3%   12.09n ± 3%   -12.73% (p=0.000 n=10)
MutexContention-2               15.88n ± 1%   16.50n ± 2%    +3.94% (p=0.001 n=10)
MutexContention-3               18.45n ± 2%   16.88n ± 2%    -8.54% (p=0.000 n=10)
MutexContention-4               20.01n ± 2%   18.94n ± 18%        ~ (p=0.469 n=10)
MutexContention-5               22.60n ± 1%   17.51n ± 9%   -22.50% (p=0.000 n=10)
MutexContention-6               23.93n ± 2%   17.35n ± 2%   -27.48% (p=0.000 n=10)
MutexContention-7               24.69n ± 1%   17.15n ± 3%   -30.54% (p=0.000 n=10)
MutexContention-8               25.01n ± 1%   17.33n ± 2%   -30.69% (p=0.000 n=10)
MutexHandoff/Solo-8             13.96n ± 4%   12.04n ± 4%   -13.78% (p=0.000 n=10)
MutexHandoff/FastPingPong-8     68.89n ± 4%   64.62n ± 2%    -6.20% (p=0.000 n=10)
MutexHandoff/SlowPingPong-8     9.698µ ± 22%  9.646µ ± 35%        ~ (p=0.912 n=10)
geomean                         38.20n        32.53n        -14.84%

Change-Id: I0058c75eadf282d08eea7fce0d426f0518039f7c
Reviewed-on: https://go-review.googlesource.com/c/go/+/620435
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
2024-11-15  runtime: allow futex OSes to use sema-based mutex  (Rhys Hiltner)
Implement sema{create,sleep,wakeup} in terms of the futex syscall when available. Split the lock2/unlock2 implementations out of lock_sema.go and lock_futex.go (which they shared with runtime.note) to allow swapping in new implementations of those. Let futex-based platforms use the semaphore-based mutex implementation. Control that via the new "spinbitmutex" GOEXPERIMENT value, disabled by default.

This lays the groundwork for a "spinbit" mutex implementation; it does not include the new mutex implementation.

For #68578.

Change-Id: I091289c85124212a87abec7079ecbd9e610b4270
Reviewed-on: https://go-review.googlesource.com/c/go/+/622996
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-15  runtime: validate all calls to SetFinalizer  (Carlos Amedee)
This change moves the check for a change in the memory management system to after the SetFinalizer parameters have been validated. Moving the check ensures that invalid parameters will never pass the validation checks. Change-Id: I9f1d3454f891f7b147c0d86b6720297172e08ef9 Reviewed-on: https://go-review.googlesource.com/c/go/+/625035 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-15  runtime: add race detector tips to reportZombies func  (Lin Lin)
A few reported issues, such as #47513, ultimately turned out to be race conditions. I believe such a tip can eliminate the need for developers to file this kind of issue in the first place.

Change-Id: I1597fa09fde641882e8e87453470941747705272
GitHub-Last-Rev: 9f136f5b3bee78f90f434dcea1cabf397c6c05f2
GitHub-Pull-Request: golang/go#70331
Reviewed-on: https://go-review.googlesource.com/c/go/+/627816
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2024-11-14  runtime: add test for mutex starvation  (Rhys Hiltner)
When multiple threads all need to acquire the same runtime.mutex, make sure that none of them has to wait for too long. Measure how long a single thread can capture the mutex, and how long individual other threads must go between having a turn with the mutex. For #68578 Change-Id: I56ecc551232f9c2730c128a9f8eeb7bd45c2d3b5 Reviewed-on: https://go-review.googlesource.com/c/go/+/622995 Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-11-14  runtime/cgo: report a meaningful error message when using Cygwin  (qmuntal)
Go has never supported Cygwin as a C compiler, but users get the following cryptic error message when they try to use it: implicit declaration of function '_beginthread' This is because Cygwin doesn't implement _beginthread. Note that this is not the only problem with Cygwin, but it's the one that users are most likely to run into first. This CL improves the error message to make it clear that Cygwin is not supported, and suggests using MinGW instead. Fixes #59490 Fixes #36691 Change-Id: Ifeec7a2cb38d7c5f50d6362c95504f72818c6a76 Reviewed-on: https://go-review.googlesource.com/c/go/+/627935 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-11-14  runtime: make Frames example produce documented output  (Jes Cok)
I believe this code can now work in both test and standalone situations.

Fixes #70057

Change-Id: Ieb5163e6b917fd03d050f65589df6c31ad2515fe
GitHub-Last-Rev: db4863c05e4d4bcbd40caf459d29e2eee81f847b
GitHub-Pull-Request: golang/go#70270
Reviewed-on: https://go-review.googlesource.com/c/go/+/625904
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Bypass: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-11-13  runtime/pprof: note difference between go test -memprofile and WriteHeapProfile  (Sean Liao)
Fixes #65328 Change-Id: I11242be93a95e117a6758ac037e143c3b38aa71c Reviewed-on: https://go-review.googlesource.com/c/go/+/597980 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2024-11-13  runtime: prevent weak->strong conversions during mark termination  (Michael Anthony Knyszek)
Currently it's possible for weak->strong conversions to create more GC work during mark termination. When a weak->strong conversion happens during the mark phase, we need to mark the newly-strong pointer, since it may now be the only pointer to that object. In other words, the object could be white. But queueing new white objects creates GC work, and if this happens during mark termination, we could end up violating mark termination invariants. In the parlance of the mark termination algorithm, the weak->strong conversion is a non-monotonic source of GC work, unlike the write barriers (which will eventually only see black objects).

This change fixes the problem by forcing weak->strong conversions to block during mark termination. We can do this efficiently by setting a global flag before the ragged barrier that is checked at each weak->strong conversion. If the flag is set, then the conversions block. The ragged barrier ensures that all Ps have observed the flag and that any weak->strong conversions which completed before the ragged barrier have their newly-minted strong pointers visible in GC work queues if necessary. We later unset the flag and wake all the blocked goroutines during the mark termination STW.

There are a few subtleties that we need to account for. For one, it's possible that a goroutine which blocked in a weak->strong conversion wakes up only to find it's mark termination time again, so we need to recheck the global flag on wake. We should also stay non-preemptible while performing the check, so that if the check *does* appear as true, it cannot switch back to false while we're actively trying to block. If it switches to false while we try to block, then we'll be stuck in the queue until the following GC.

All-in-all, this CL is more complicated than I would have liked, but it's the only idea so far that is clearly correct to me at a high level.
This change adds a test which is somewhat invasive as it manipulates mark termination, but hopefully that infrastructure will be useful for debugging, fixing, and regression testing mark termination whenever we do fix it. Fixes #69803. Change-Id: Ie314e6fd357c9e2a07a9be21f217f75f7aba8c4a Reviewed-on: https://go-review.googlesource.com/c/go/+/623615 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
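The flag-and-block mechanism can be modeled at user level with a condition variable. This is only a toy model of the scheme the CL describes (the runtime uses its own primitives and a ragged barrier, not sync.Cond); the loop in wait mirrors the "recheck the global flag on wake" subtlety.

```go
package main

import (
	"fmt"
	"sync"
)

// gate models the global flag: set before mark termination's ragged
// barrier, it makes weak->strong conversions block until mark
// termination ends.
type gate struct {
	mu      sync.Mutex
	cond    *sync.Cond
	blocked bool
}

func newGate() *gate {
	g := &gate{}
	g.cond = sync.NewCond(&g.mu)
	return g
}

// wait blocks while conversions are disallowed. It rechecks the flag
// in a loop because a woken goroutine may find it is mark-termination
// time again before it gets to run.
func (g *gate) wait() {
	g.mu.Lock()
	for g.blocked {
		g.cond.Wait()
	}
	g.mu.Unlock()
}

// set raises or clears the flag; clearing wakes all blocked waiters,
// like the wakeup done during the mark termination STW.
func (g *gate) set(blocked bool) {
	g.mu.Lock()
	g.blocked = blocked
	g.mu.Unlock()
	if !blocked {
		g.cond.Broadcast()
	}
}

func main() {
	g := newGate()
	g.set(true) // mark termination begins: conversions must block
	done := make(chan struct{})
	go func() {
		g.wait() // a weak->strong conversion arrives and blocks
		close(done)
	}()
	g.set(false) // mark termination over: wake blocked conversions
	<-done
	fmt.Println("conversion proceeded")
}
```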
2024-11-13  internal/runtime/maps: use match to skip non-full slots in iteration  (Michael Pratt)
Iteration over swissmaps with low load (think map with large hint but only one entry) is significantly regressed vs old maps. See noswiss vs swiss-tip below (+60%).

Currently we visit every single slot and individually check if the slot is full or not. We can do much better by using the control word to find all full slots in a group in a single operation. This lets us skip completely empty groups, for instance.

Always using the control match approach is great for maps with low load, but is a regression for mostly full maps. Mostly full maps have the majority of slots full, so most calls to mapiternext will return the next slot. In that case, doing the full group match on every call is more expensive than checking the individual slot.

Thus we take a hybrid approach: on each call, we first check an individual slot. If that slot is full, we're done. If that slot is non-full, then we fall back to doing full group matches. This trade-off works well. Both mostly empty and mostly full maps perform nearly as well as doing all matching and all individual, respectively.

The fast path is placed above the slow path loop rather than combined (with some sort of `useMatch` variable) into a single loop to help the compiler's code generation. The compiler really struggles with code generation on a combined loop for some reason, yielding ~15% additional instructions/op.
Comparison with old maps prior to this CL:

                                                   │ noswiss │ swiss-tip │
                                                   │ sec/op  │  sec/op    vs base              │
MapIter/Key=int64/Elem=int64/len=6-12                11.53n ± 2%    10.64n ± 2%   -7.72% (p=0.002 n=6)
MapIter/Key=int64/Elem=int64/len=64-12              10.180n ± 2%    9.670n ± 5%   -5.01% (p=0.004 n=6)
MapIter/Key=int64/Elem=int64/len=65536-12            10.78n ± 1%    10.15n ± 2%   -5.84% (p=0.002 n=6)
MapIterLowLoad/Key=int64/Elem=int64/len=6-12         6.116n ± 2%    6.840n ± 2%  +11.84% (p=0.002 n=6)
MapIterLowLoad/Key=int64/Elem=int64/len=64-12        2.403n ± 2%    3.892n ± 0%  +61.95% (p=0.002 n=6)
MapIterLowLoad/Key=int64/Elem=int64/len=65536-12     1.940n ± 3%    3.237n ± 1%  +66.81% (p=0.002 n=6)
MapPop/Key=int64/Elem=int64/len=6-12                 66.20n ± 2%    60.14n ± 3%   -9.15% (p=0.002 n=6)
MapPop/Key=int64/Elem=int64/len=64-12                97.24n ± 1%   171.35n ± 1%  +76.21% (p=0.002 n=6)
MapPop/Key=int64/Elem=int64/len=65536-12             826.1n ± 12%   842.5n ± 10%       ~ (p=0.937 n=6)
geomean                                              17.93n         20.96n       +16.88%

After this CL:

                                                   │ noswiss │ swiss-cl │
                                                   │ sec/op  │  sec/op   vs base               │
MapIter/Key=int64/Elem=int64/len=6-12                11.53n ± 2%    10.90n ± 3%   -5.42% (p=0.002 n=6)
MapIter/Key=int64/Elem=int64/len=64-12              10.180n ± 2%    9.719n ± 9%   -4.53% (p=0.043 n=6)
MapIter/Key=int64/Elem=int64/len=65536-12            10.78n ± 1%    10.07n ± 2%   -6.63% (p=0.002 n=6)
MapIterLowLoad/Key=int64/Elem=int64/len=6-12         6.116n ± 2%    7.022n ± 1%  +14.82% (p=0.002 n=6)
MapIterLowLoad/Key=int64/Elem=int64/len=64-12        2.403n ± 2%    1.475n ± 1%  -38.63% (p=0.002 n=6)
MapIterLowLoad/Key=int64/Elem=int64/len=65536-12     1.940n ± 3%    1.210n ± 6%  -37.67% (p=0.002 n=6)
MapPop/Key=int64/Elem=int64/len=6-12                 66.20n ± 2%    61.54n ± 2%   -7.02% (p=0.002 n=6)
MapPop/Key=int64/Elem=int64/len=64-12                97.24n ± 1%   110.10n ± 1%  +13.23% (p=0.002 n=6)
MapPop/Key=int64/Elem=int64/len=65536-12             826.1n ± 12%   504.7n ± 6%  -38.91% (p=0.002 n=6)
geomean                                              17.93n         15.29n       -14.74%

For #54766.

Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10
Change-Id: Ic07f9df763239e85be57873103df5007144fdaef
Reviewed-on: https://go-review.googlesource.com/c/go/+/627156
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2024-11-13  runtime: add benchmark of iteration over map with low load  (Michael Pratt)
Change-Id: I3a3b7da6245a18bf1db0c595008f0eea853ce544 Reviewed-on: https://go-review.googlesource.com/c/go/+/627155 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2024-11-13  runtime: reserve 4kB for system stack on windows-386  (Russ Cox)
The failures in #70288 are consistent with and strongly imply stack corruption during fault handling, and debug prints show that the Go code run during fault handling is running about 300 bytes above the bottom of the goroutine stack. That should be okay, but that implies the DLL code that called Go's handler was running near the bottom of the stack too, and maybe it called other deeper things before or after the Go handler and smashed the stack that way. stackSystem is already 4096 bytes on amd64; making it match that on 386 makes the flaky failures go away. It's a little unsatisfying not to be able to say exactly what is overflowing the stack, but the circumstantial evidence is very strong that it's Windows. Fixes #70288. Change-Id: Ife89385873d5e5062a71629dbfee40825edefa49 Reviewed-on: https://go-review.googlesource.com/c/go/+/627375 Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-12  runtime: fix iterator returning map entries after clear (pre-swissmap)  (Youlin Feng)
Fixes #70189 Fixes #59411 Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-noswissmap Change-Id: I4ef7ecd7e996330189309cb2a658cf34bf9e1119 Reviewed-on: https://go-review.googlesource.com/c/go/+/625275 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-12cmd/compile: optimize math/bits.OnesCount{16,32,64} implementation on loong64Guoqi Chen
Use Loong64's LSX instruction VPCNT to implement math/bits.OnesCount{16,32,64} and make it intrinsic. Benchmark results on loongson 3A5000 and 3A6000 machines:

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000-HV @ 2500.00MHz
            | bench.old    | bench.new                     |
            | sec/op       | sec/op       vs base          |
OnesCount     4.413n ± 0%   1.401n ± 0%  -68.25% (p=0.000 n=10)
OnesCount8    1.364n ± 0%   1.363n ± 0%        ~ (p=0.130 n=10)
OnesCount16   2.112n ± 0%   1.534n ± 0%  -27.37% (p=0.000 n=10)
OnesCount32   4.533n ± 0%   1.529n ± 0%  -66.27% (p=0.000 n=10)
OnesCount64   4.565n ± 0%   1.531n ± 1%  -66.46% (p=0.000 n=10)
geomean       3.048n        1.470n       -51.78%

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
            | bench.old     | bench.new                     |
            | sec/op        | sec/op        vs base         |
OnesCount     3.553n ± 0%    1.201n ± 0%   -66.20% (p=0.000 n=10)
OnesCount8    0.8021n ± 0%   0.8004n ± 0%   -0.21% (p=0.000 n=10)
OnesCount16   1.216n ± 0%    1.000n ± 0%   -17.76% (p=0.000 n=10)
OnesCount32   3.006n ± 0%    1.035n ± 0%   -65.57% (p=0.000 n=10)
OnesCount64   3.503n ± 0%    1.035n ± 0%   -70.45% (p=0.000 n=10)
geomean       2.053n         1.006n        -51.01%

Change-Id: I07a5b8da2bb48711b896387ec7625145804affc8 Reviewed-on: https://go-review.googlesource.com/c/go/+/620978 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Meidan Li <limeidan@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-11internal/runtime/maps: don't hash twice when deletingKeith Randall
                      │ baseline     │ experiment                    │
                      │ sec/op       │ sec/op       vs base          │
MapDeleteLargeKey-24    312.0n ± 6%    162.3n ± 5%  -47.97% (p=0.000 n=10)

Change-Id: I31f1f8e3c344cf8abf2e9eb4b51b78fcd67b93c4 Reviewed-on: https://go-review.googlesource.com/c/go/+/625906 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com>
2024-11-11runtime, syscall: use pointer types on wasmimport functionsCherry Mui
Now that we support pointer types on wasmimport functions, use them, instead of unsafe.Pointer. This removes unsafe conversions. There is still one unsafe.Pointer argument left. It is actually a *Stat_t, which is an exported type with an int field, which is not allowed as a wasmimport field type. We probably cannot change it at this point. Updates #66984. Change-Id: I445c70b356c3877a5604bee67d19d99a538c682e Reviewed-on: https://go-review.googlesource.com/c/go/+/627059 Reviewed-by: Johan Brandhorst-Satzkorn <johan.brandhorst@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2024-11-08cmd/internal/goobj: regenerate builtinlistIan Lance Taylor
CL 622042 added rand as a compiler builtin, but did not update builtinlist. Also update the mkbuiltin comment to refer to the current file location, and add a comment for runtime.rand that it is called from the compiler. For #54766 Change-Id: I99d2c0bb0658da333775afe2ed0447265c845c82 Reviewed-on: https://go-review.googlesource.com/c/go/+/626755 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Ian Lance Taylor <iant@golang.org>
2024-11-08cmd/compile/internal: intrinsify publicationBarrier on loong64Guoqi Chen
The publication barrier is a StoreStore barrier, which is implemented by "DBAR 0x1A" [1] on loong64.

goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A6000 @ 2500.00MHz
                      | bench.old    | bench.new                     |
                      | sec/op       | sec/op       vs base          |
Malloc8                 31.76n ± 0%    22.79n ± 0%  -28.24% (p=0.000 n=20)
Malloc8-2               25.46n ± 0%    18.33n ± 0%  -28.00% (p=0.000 n=20)
Malloc8-4               25.75n ± 0%    18.43n ± 0%  -28.41% (p=0.000 n=20)
Malloc16                62.97n ± 0%    42.41n ± 0%  -32.65% (p=0.000 n=20)
Malloc16-2              49.11n ± 0%    31.68n ± 0%  -35.50% (p=0.000 n=20)
Malloc16-4              49.64n ± 1%    31.95n ± 0%  -35.62% (p=0.000 n=20)
MallocTypeInfo8         58.57n ± 0%    46.51n ± 0%  -20.61% (p=0.000 n=20)
MallocTypeInfo8-2       51.43n ± 0%    38.01n ± 0%  -26.09% (p=0.000 n=20)
MallocTypeInfo8-4       51.65n ± 0%    38.15n ± 0%  -26.13% (p=0.000 n=20)
MallocTypeInfo16        68.07n ± 0%    51.62n ± 0%  -24.17% (p=0.000 n=20)
MallocTypeInfo16-2      54.73n ± 0%    41.13n ± 0%  -24.85% (p=0.000 n=20)
MallocTypeInfo16-4      55.05n ± 0%    41.28n ± 0%  -25.02% (p=0.000 n=20)
MallocLargeStruct       491.5n ± 0%    454.8n ± 0%   -7.47% (p=0.000 n=20)
MallocLargeStruct-2     351.8n ± 1%    323.8n ± 0%   -7.94% (p=0.000 n=20)
MallocLargeStruct-4     333.6n ± 0%    316.7n ± 0%   -5.10% (p=0.000 n=20)
geomean                 71.01n         53.78n       -24.26%

[1]: https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html

Change-Id: Ica0c89db6f2bebd55d9b3207a1c462a9454e9268 Reviewed-on: https://go-review.googlesource.com/c/go/+/577515 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn> Reviewed-by: Meidan Li <limeidan@loongson.cn> Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-11-07runtime/cgo: use pthread_getattr_np on AndroidCherry Mui
It is defined in bionic libc since at least API level 3. Use it. Updates #68285. Change-Id: I215c2d61d5612e7c0298b2cb69875690f8fbea66 Reviewed-on: https://go-review.googlesource.com/c/go/+/626275 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2024-11-07runtime/pprof: add label benchmarkFelix Geisendörfer
Add several benchmarks for pprof labels to analyze the impact of follow-up CLs. Change-Id: Ifae39cfe83ec93858fce9e3af6c1be024ba76736 Reviewed-on: https://go-review.googlesource.com/c/go/+/574515 Auto-Submit: Ian Lance Taylor <iant@golang.org> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2024-11-07runtime/race: treat map concurrent access detection as a race detector hitKeith Randall
Sometimes the runtime realizes there is a race before the race detector does. Maybe that's a bug in the race detector? But we should probably handle it. Update #70164 (Fixes? I'm not sure.) Change-Id: Ie7e8bf2b06701368e0551b4a1aa40f6746bbddd8 Reviewed-on: https://go-review.googlesource.com/c/go/+/626036 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-07cmd/compiler,internal/runtime/atomic: optimize Store{64,32,8} on loong64Guoqi Chen
On Loong64, AMSWAPDB{W,V} instructions are supported by default, while AMSWAPDB{B,H} [1] are new instructions added by LA664 (Loongson 3A6000) and later microarchitectures. Therefore, AMSWAPDB{W,V} (full barrier) is used to implement AtomicStore{32,64}, and either the traditional MOVB or the new AMSWAPDBB is used to implement AtomicStore8, depending on the CPU feature. The StoreRelease barrier on Loong64 is "dbar 0x12", but it is still necessary to ensure consistency in the order of Store/Load [2]. LoweredAtomicStorezero{32,64} was removed because on loong64 the constant "0" uses the R0 register, and there is no performance difference between the implementations of LoweredAtomicStorezero{32,64} and LoweredAtomicStore{32,64}.

goos: linux
goarch: loong64
pkg: internal/runtime/atomic
cpu: Loongson-3A5000-HV @ 2500.00MHz
                  | bench.old    | bench.new                     |
                  | sec/op       | sec/op       vs base          |
AtomicStore64       19.61n ± 0%    13.61n ± 0%  -30.60% (p=0.000 n=20)
AtomicStore64-2     19.61n ± 0%    13.61n ± 0%  -30.57% (p=0.000 n=20)
AtomicStore64-4     19.62n ± 0%    13.61n ± 0%  -30.63% (p=0.000 n=20)
AtomicStore         19.61n ± 0%    13.61n ± 0%  -30.60% (p=0.000 n=20)
AtomicStore-2       19.62n ± 0%    13.61n ± 0%  -30.63% (p=0.000 n=20)
AtomicStore-4       19.62n ± 0%    13.62n ± 0%  -30.58% (p=0.000 n=20)
AtomicStore8        19.61n ± 0%    20.01n ± 0%   +2.04% (p=0.000 n=20)
AtomicStore8-2      19.62n ± 0%    20.02n ± 0%   +2.01% (p=0.000 n=20)
AtomicStore8-4      19.61n ± 0%    20.02n ± 0%   +2.09% (p=0.000 n=20)
geomean             19.61n         15.48n       -21.08%

goos: linux
goarch: loong64
pkg: internal/runtime/atomic
cpu: Loongson-3A6000 @ 2500.00MHz
                  | bench.old    | bench.new                     |
                  | sec/op       | sec/op       vs base          |
AtomicStore64       18.03n ± 0%    12.81n ± 0%  -28.93% (p=0.000 n=20)
AtomicStore64-2     18.02n ± 0%    12.81n ± 0%  -28.91% (p=0.000 n=20)
AtomicStore64-4     18.01n ± 0%    12.81n ± 0%  -28.87% (p=0.000 n=20)
AtomicStore         18.02n ± 0%    12.81n ± 0%  -28.91% (p=0.000 n=20)
AtomicStore-2       18.01n ± 0%    12.81n ± 0%  -28.87% (p=0.000 n=20)
AtomicStore-4       18.01n ± 0%    12.81n ± 0%  -28.87% (p=0.000 n=20)
AtomicStore8        18.01n ± 0%    12.81n ± 0%  -28.87% (p=0.000 n=20)
AtomicStore8-2      18.01n ± 0%    12.81n ± 0%  -28.87% (p=0.000 n=20)
AtomicStore8-4      18.01n ± 0%    12.81n ± 0%  -28.87% (p=0.000 n=20)
geomean             18.01n         12.81n       -28.89%

[1]: https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html
[2]: https://gcc.gnu.org/git/?p=gcc.git;a=blob_plain;f=gcc/config/loongarch/sync.md

Change-Id: I4ae5e8dd0e6f026129b6e503990a763ed40c6097 Reviewed-on: https://go-review.googlesource.com/c/go/+/581356 Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn> Reviewed-by: Meidan Li <limeidan@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2024-11-04runtime/pprof: relax TestProfilerStackDepthNick Ripley
The TestProfilerStackDepth/heap test can spuriously fail if the profiler happens to capture a stack with an allocation several frames deep into runtime code. The pprof API hides runtime frames at the leaf-end of stacks, but those frames still count against the profiler's stack depth limit. The test checks only the first stack it finds with the desired prefix and fails if it's not deep enough or doesn't have the right root frame. So it can fail in that scenario, even though the implementation isn't really broken. Relax the test to check that there is at least one stack with the desired prefix, depth, and root frame. Fixes #70112 Change-Id: I337fb3cccd1ddde76530b03aa1ec0f9608aa4112 Reviewed-on: https://go-review.googlesource.com/c/go/+/623998 Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-01runtime: fix out-of-date comment docchangwang ma
Change-Id: I352fa0e4e048b896d63427f1c2c519bfed24c702 Reviewed-on: https://go-review.googlesource.com/c/go/+/622017 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-10-30runtime: update and restore g0 stack bounds at cgocallbackCherry Mui
Currently, at a cgo callback where there is already a Go frame on the stack (i.e. C->Go->C->Go), we require that at the inner Go callback the SP is within the g0's stack bounds set by a previous callback. This is to prevent the C code from switching stacks while having a Go frame on the stack, which we don't really support. But this could also happen when we cannot get accurate stack bounds, e.g. when pthread_getattr_np is not available. Since the stack bounds are just estimates based on the current SP, if there are multiple C->Go callbacks with varying stack depths, it is possible that the SP of a later callback falls out of a previous call's estimate. This leads to a runtime throw in a seemingly reasonable program.

This CL changes it to save the old g0 stack bounds at cgocallback, update the bounds, and restore the old bounds at return. So each callback will get its own stack bounds based on the current SP, and when it returns, the outer callback has its old stack bounds restored.

Also, at a cgo callback when there is no Go frame on the stack, we currently always get new stack bounds. We do this because if we can only get estimated bounds based on the SP, and the stack depth varies a lot between two C->Go calls, the previous estimates may be off and we fall out or nearly fall out of the previous bounds. But this causes a performance problem: the pthread API to get accurate stack bounds (pthread_getattr_np) is very slow when called on the main thread. Getting the stack bounds every time significantly slows down repeated C->Go calls on the main thread.

This CL fixes it by "caching" the stack bounds if they are accurate. I.e. the second time C calls into Go, if the previous stack bounds are accurate, and the current SP is in bounds, we can be sure it is the same stack and we don't need to update the bounds. This avoids the repeated calls to pthread_getattr_np. If we cannot get the accurate bounds, we continue to update the stack bounds based on the SP, and that operation is very cheap.

On a Linux/AMD64 machine with glibc:

name                     old time/op  new time/op  delta
CgoCallbackMainThread-8  96.4µs ± 3%  0.1µs ± 2%   -99.92% (p=0.000 n=10+9)

Fixes #68285. Fixes #68587. Change-Id: I3422badd5ad8ff63e1a733152d05fb7a44d5d435 Reviewed-on: https://go-review.googlesource.com/c/go/+/600296 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2024-10-30cmd/compile,internal/runtime/maps: stack allocated maps and small allocMichael Pratt
The compiler will stack-allocate the Map struct and initial group if possible. Stack-allocated maps are initialized inline without calling into the runtime. Small heap-allocated maps use makemap_small. These are the same heuristics as for existing maps. For #54766. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap Change-Id: I6c371d1309716fd1c38a3212d417b3c76db5c9b9 Reviewed-on: https://go-review.googlesource.com/c/go/+/622042 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2024-10-30runtime,internal/runtime/maps: specialized swissmapsMichael Pratt
Add all the specialized variants that exist for the existing maps. Like the existing maps, the fast variants do not support indirect key/elem. Note that as of this CL, the Get and Put methods on Map/table are effectively dead. They are only reachable from the internal/runtime/maps unit tests. For #54766. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap Change-Id: I95297750be6200f34ec483e4cfc897f048c26db7 Reviewed-on: https://go-review.googlesource.com/c/go/+/616463 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Keith Randall <khr@google.com>
2024-10-30cmd/compile,runtime: add indirect key/elem to swissmapMichael Pratt
We use the same heuristics as existing maps. For #54766. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap Change-Id: I44bb51483cae2c1714717f1b501850fb9e55a39a Reviewed-on: https://go-review.googlesource.com/c/go/+/616461 Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-10-30runtime: add concurrent write checks to swissmapMichael Pratt
This is the same design as existing maps. For #54766. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap Change-Id: I5f6ef5fea1e0f0616bcd90eaae7faee4cdac58c6 Reviewed-on: https://go-review.googlesource.com/c/go/+/616460 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Keith Randall <khr@google.com>
2024-10-30internal/runtime/maps: enable race for map functions in internal/runtime/mapsMichael Pratt
For #54766. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap Change-Id: Iebc7f5482299cb7c4ecccc4c2eb46b4bc42c5fc3 Reviewed-on: https://go-review.googlesource.com/c/go/+/616459 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>
2024-10-30internal/race,runtime: linkname contents of internal/raceMichael Pratt
Rather than importing runtime directly, linkname the functions from runtime. This allows importing internal/race from internal/runtime/* packages, similar to internal/asan and internal/msan. For #54766. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap Change-Id: Ibd9644557782076e3cee7927c8a6e6d2909f0a6e Reviewed-on: https://go-review.googlesource.com/c/go/+/616458 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com>
2024-10-30internal/runtime/maps: proper capacity hint handlingMichael Pratt
When given a hint size, set the initial capacity large enough to avoid requiring growth in the average case. When not given a hint (or given 0), don't allocate anything at all. For #54766. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap Change-Id: I8844fc652b8d2d4e5136cd56f7e78999a07fe381 Reviewed-on: https://go-review.googlesource.com/c/go/+/616457 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2024-10-29runtime: skip TestMemmoveOverflow with asanMichael Anthony Knyszek
On a whim I decided to investigate the possibility of whether the flakiness on the asan builder was due to a concurrently executing test. Of the most recent failures there were a few candidates, and this test was one of them. After disabling each candidate one by one, we had a winner: this test causes other concurrently executing tests, running pure Go code, to spuriously fail. I do not know why yet, but this test doesn't seem like it would have incredibly high value for ASAN, and does funky things like MAP_FIXED in recently unmapped regions, so I think it's fine. For #70054. For #64257. Change-Id: Ib9a84d9b69812e76c390d99b00698710ee1ece1a Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-asan-clang15 Reviewed-on: https://go-review.googlesource.com/c/go/+/623336 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-10-29runtime: move mapaccess1 and mapassign to internal/runtime/mapsMichael Pratt
This enables manual inlining Map.Get/table.getWithoutKey to create a simple fast path with no calls. For #54766. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap Change-Id: Ic208dd4c02c7554f312b85b5fadccaf82b23545c Reviewed-on: https://go-review.googlesource.com/c/go/+/616455 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2024-10-29runtime: skip most map benchmark combinations by defaultMichael Pratt
Fixes #70008. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-race Change-Id: I1fd7d1cbda20cc96016c864bcf0696382453e807 Reviewed-on: https://go-review.googlesource.com/c/go/+/623335 Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-10-29internal/runtime/maps: remove type fieldsMichael Pratt
Rather than storing the same type pointer in multiple places, just pass it around. For #54766. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap Change-Id: Ia6c74805c7a44125ae473177b317f16c6688e6de Reviewed-on: https://go-review.googlesource.com/c/go/+/622377 Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-10-29strings,bytes: use result of copy in subsequent slicingKeith Randall
This can get rid of a bounds check. Followup to CL 622240. Change-Id: I9d0a2c0408b8d274c46136d32d7a5fb09b4aad1c Reviewed-on: https://go-review.googlesource.com/c/go/+/622955 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
2024-10-29runtime: skip TestNewOSProc0 with asan and msanMichael Anthony Knyszek
These fail for the same reason as under the race detector, and this is the most frequently failing test under both. For #70054. For #64257. For #64256. Change-Id: I3649e58069190b4450f9d4deae6eb8eca5f827a3 Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-asan-clang15,gotip-linux-amd64-msan-clang15 Reviewed-on: https://go-review.googlesource.com/c/go/+/623176 TryBot-Bypass: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-10-29cmd/internal/obj/ppc64: support for extended mnemonics of BCJayanth Krishnamurthy
BGT, BLT, BLE, BGE, BNE, BVS, BVC, and BEQ are now supported by the assembler. This simplifies the usage of BC constructs:

BC 12, 30, LR     <=> BEQ CR7, LR
BC 12, 2, LR      <=> BEQ CR0, LR
BC 12, 0, target  <=> BLT CR0, target
BC 12, 2, target  <=> BEQ CR0, target
BC 12, 5, target  <=> BGT CR1, target
BC 12, 30, target <=> BEQ CR7, target
BC 4, 6, target   <=> BNE CR1, target
BC 4, 5, target   <=> BLE CR1, target

Code cleanup based on the above additions. Change-Id: I02fdb212b6fe3f85ce447e05f4d42118c9ce63b5 Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10 Reviewed-on: https://go-review.googlesource.com/c/go/+/612395 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2024-10-29crypto/internal/fips: add self-test mechanismFilippo Valsorda
Updates #69536 Change-Id: Ib68b0e7058221a89908fd47f255f0a983883bee8 Reviewed-on: https://go-review.googlesource.com/c/go/+/621075 Reviewed-by: Daniel McCarney <daniel@binaryparadox.net> Auto-Submit: Filippo Valsorda <filippo@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Roland Shoemaker <roland@golang.org> Reviewed-by: Carlos Amedee <carlos@golang.org>