author    fanzha02 <fannie.zhang@arm.com>  2025-01-14 09:32:56 +0000
committer Michael Knyszek <mknyszek@google.com>  2025-10-22 17:02:28 -0700
commit    50586182abd82ec724b9beb6b806610d08846d8e (patch)
tree      95d75aa8a575d1aea4a4abe11f0a6dfbc375ab8b /src/runtime/lfstack.go
parent    1ff59f3dd3569e1225c9273fc205cb54df674bf5 (diff)
download  go-50586182abd82ec724b9beb6b806610d08846d8e.tar.xz
runtime: use backoff and ISB instruction to reduce contention in (*lfstack).pop and (*spanSet).pop on arm64
When profiling CPU usage of LiveKit on AArch64/x86 (AWS), the graphs showed
CPU spikes repeating in a semi-periodic manner; the spikes occur when the
GC (garbage collector) is active. Our analysis found that the getempty
function accounted for 10.54% of the overhead, mainly caused by the
work.empty.pop() function. Listing pop shows that the majority of that
time, a 10.29% overhead, is spent on
atomic.Cas64((*uint64)(head), old, next).

This patch adds a backoff approach to reduce the high overhead of the
atomic operation, which primarily occurs when contention over a specific
memory address increases, typically as the number of threads rises. Note
that on platforms other than arm64, the initial value of backoff is zero.

This patch also rewrites the implementation of procyield() on arm64 as an
Armv8.0-A compatible delay function using the counter-timer.

The garbage collector benchmark:

                           │    master    │            opt                      │
                           │    sec/op    │   sec/op     vs base                │
Garbage/benchmem-MB=64-160   3.782m ± 4%    2.264m ± 2%  -40.12% (p=0.000 n=10)

                           │     master      │            opt                        │
                           │ user+sys-sec/op │ user+sys-sec/op  vs base              │
Garbage/benchmem-MB=64-160   433.5m ± 4%       255.4m ± 2%      -41.08% (p=0.000 n=10)

Reference for the backoff mechanism:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/multi-threaded-applications-arm

Change-Id: Ie8128a2243ceacbb82ab2a88941acbb8428bad94
Reviewed-on: https://go-review.googlesource.com/c/go/+/654895
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
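The CAS-with-backoff pattern the patch applies can be sketched in portable Go. This is a minimal illustration, not the runtime's code: casWithBackoff and its plain busy-wait loop are hypothetical stand-ins for the real procyield, which is assembly (PAUSE on x86, and after this patch an ISB/counter-timer delay on arm64). The growth step backoff += backoff / 2 matches the patch.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// casWithBackoff increments *addr via a CAS loop. After each failed
// attempt it spins for `backoff` iterations, then grows the delay by a
// factor of 1.5, reducing pressure on the contended cache line.
func casWithBackoff(addr *uint64, initial uint32) {
	backoff := initial
	for {
		old := atomic.LoadUint64(addr)
		if atomic.CompareAndSwapUint64(addr, old, old+1) {
			return
		}
		for i := uint32(0); i < backoff; i++ {
			// Busy-wait stand-in for procyield(backoff); the runtime
			// executes PAUSE/ISB-class instructions here instead.
		}
		backoff += backoff / 2
	}
}

func main() {
	var counter uint64
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				casWithBackoff(&counter, 128)
			}
		}()
	}
	wg.Wait()
	fmt.Println(counter)
}
```

Under contention, the geometric delay spreads retries out in time, so fewer cores hammer the same cache line simultaneously and successful CAS attempts complete sooner.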
Diffstat (limited to 'src/runtime/lfstack.go')
-rw-r--r--  src/runtime/lfstack.go  15
1 file changed, 15 insertions, 0 deletions
diff --git a/src/runtime/lfstack.go b/src/runtime/lfstack.go
index 8946c80348..1e2f5a2965 100644
--- a/src/runtime/lfstack.go
+++ b/src/runtime/lfstack.go
@@ -34,6 +34,11 @@ func (head *lfstack) push(node *lfnode) {
}
func (head *lfstack) pop() unsafe.Pointer {
+ var backoff uint32
+ // TODO: tweak backoff parameters on other architectures.
+ if GOARCH == "arm64" {
+ backoff = 128
+ }
for {
old := atomic.Load64((*uint64)(head))
if old == 0 {
@@ -44,6 +49,16 @@ func (head *lfstack) pop() unsafe.Pointer {
if atomic.Cas64((*uint64)(head), old, next) {
return unsafe.Pointer(node)
}
+
+ // Use a backoff approach: reducing demand on the shared memory location
+ // decreases memory contention and allows other threads to make quicker
+ // progress.
+ // Read more in this Arm blog post:
+ // https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/multi-threaded-applications-arm
+ procyield(backoff)
+ // Increase backoff time.
+ backoff += backoff / 2
+
}
}
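The backoff += backoff / 2 step above gives a geometric schedule. A quick sketch (not runtime code) of the first few delay values when starting from the arm64 initial value of 128:

```go
package main

import "fmt"

func main() {
	// Reproduce the patch's schedule: start at 128 spin units and
	// grow the delay by half after each failed CAS attempt.
	backoff := uint32(128)
	for i := 0; i < 5; i++ {
		fmt.Println(backoff)
		backoff += backoff / 2
	}
	// Prints: 128, 192, 288, 432, 648.
}
```

On other architectures the initial value stays zero, so procyield(0) is a no-op and the loop behaves as it did before the patch.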