path: root/src/syscall/exec_linux.go
author: Michael Pratt <mpratt@google.com> 2025-10-27 15:34:18 -0400
committer: Gopher Robot <gobot@golang.org> 2025-10-28 13:58:12 -0700
commit: 041f564b3e6fa3f4af13a01b94db14c1ee8a42e0
tree: c5a5edb73e94143952498b4677a5f544d070ade2 /src/syscall/exec_linux.go
parent: 81afd3a59be1a3f343bf2b9d6665cd0fc825c6ba
internal/runtime/gc/scan: avoid memory destination on VPCOMPRESSQ
On AMD Genoa / Zen 4, VPCOMPRESSQ with a memory destination imposes a
severe performance penalty of around an order of magnitude compared to a
register destination.

We can trivially work around this penalty with a register destination
and an additional move to memory.

Benchmark results from:

    $ go test -bench=BenchmarkScanSpanPacked/.*/.*/.*/.*/impl=Platform internal/runtime/gc/scan

I've only included the summarized geomean here because there are ~2500
unique test cases.

AMD Genoa (Zen 4):

    cpu: AMD EPYC 9B14 96-Core Processor
            │    mem     │                reg                 │
            │   sec/op   │   sec/op     vs base               │
    geomean    1.039µ       310.1n       -70.16%

            │    mem     │                reg                 │
            │    B/s     │    B/s       vs base               │
    geomean    2.906Gi      10.99Gi      +278.27%

As expected, we see a massive performance improvement on Genoa.

AMD Turin (Zen 5):

    cpu: AMD EPYC 9B45 128-Core Processor
            │    mem     │                reg                 │
            │   sec/op   │   sec/op     vs base               │
    geomean    231.9n       237.3n       +2.32%

            │    mem     │                reg                 │
            │    B/s     │    B/s       vs base               │
    geomean    14.79Gi      14.43Gi      -2.50%

On Turin there is a minor regression. This is primarily due to a fairly
large regression (~15%) in very small microbenchmark cases where the
entire memory fits in L1 cache. This regression disappears as memory
access slows down with larger memories. The latter should be more common
in real workloads.

Intel Sapphire Rapids:

    cpu: Intel(R) Xeon(R) Platinum 8481C
            │    mem     │                reg                 │
            │   sec/op   │   sec/op     vs base               │
    geomean    254.9n       246.8n       -3.18%

            │    mem     │                reg                 │
            │    B/s     │    B/s       vs base               │
    geomean    13.65Gi      14.15Gi      +3.69%

On Sapphire Rapids there is a minor improvement. Here results are fairly
noisy. Most cases are a wash, but some are arbitrarily 20% slower or 20%
faster for unclear reasons.

For #73581.

Change-Id: I6a6a636cfd294a0dcdc4f34c9ece1bc9a6e5e4c7
Reviewed-on: https://go-review.googlesource.com/c/go/+/715362
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
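
Illustration (not part of the commit): the workaround described in the message can be modeled in plain Go. This is a minimal sketch, not the runtime's assembly. The names compressQ and storeViaRegister are hypothetical, and the exact instruction sequence used in the change is not shown on this page. The sketch assumes the register form packs the selected 64-bit lanes into the low lanes of a temporary with the remaining lanes zeroed, and that a full-width store then follows, so the destination needs slack beyond the packed lanes.

```go
package main

import (
	"fmt"
	"math/bits"
)

// compressQ models the lane selection of a compress on eight 64-bit
// lanes: lanes of src whose mask bit is set are packed contiguously
// into the low lanes of the result. Unselected tail lanes are left
// zero here (an assumption modeling a zeroing register destination).
func compressQ(src [8]uint64, mask uint8) [8]uint64 {
	var out [8]uint64
	n := 0
	for i := 0; i < 8; i++ {
		if mask&(1<<i) != 0 {
			out[n] = src[i]
			n++
		}
	}
	return out
}

// storeViaRegister models the workaround: compress into a temporary
// (the "register"), then write the temporary to memory with an
// ordinary move. It returns how many lanes are meaningful, which is
// the population count of the mask. copy writes at most len(dst)
// lanes, so a short dst is truncated rather than overrun.
func storeViaRegister(dst []uint64, src [8]uint64, mask uint8) int {
	tmp := compressQ(src, mask)
	copy(dst, tmp[:])
	return bits.OnesCount8(mask)
}

func main() {
	dst := make([]uint64, 16)
	src := [8]uint64{10, 11, 12, 13, 14, 15, 16, 17}
	n := storeViaRegister(dst, src, 0b10100101)
	fmt.Println(n, dst[:n])
}
```

A caller would advance its output position by the returned count after each store, so the zeroed tail lanes are overwritten by the next store.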
Diffstat (limited to 'src/syscall/exec_linux.go')
0 files changed, 0 insertions, 0 deletions