path: root/src/runtime/asm_arm64.s
Age  Commit message  Author
2026-01-27  runtime: rename aeshashbody to runtime.aeshashbody  Michael Pratt
Currently this is a raw symbol name with no package component, which is confusing when seen in profilers or similar tools. This function does not follow a Go ABI, and thus should not have a Go function declaration. go vet requires declaration for standard assembly functions. CL 176100 removed the package name as part of making vet pass on package runtime, but simply making the function static via the <> suffix is sufficient, there is no need to shorten the symbol name. Change-Id: I6a6a636c6030f1c9a4b8bb330978733bb336b08e Reviewed-on: https://go-review.googlesource.com/c/go/+/738521 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-11-26  runtime/secret: implement new secret package  Daniel Morsing
Implement secret.Do.

- When secret.Do returns:
  - Clear the stack that is used by the argument function.
  - Clear all the registers that might contain secrets.
- On stack growth in secret mode, clear the old stack.
- When objects are allocated in secret mode, mark them and then zero the marked objects immediately when they are freed.
- If the argument function panics, raise that panic as if it originated from secret.Do. This removes anything about the secret function from tracebacks.

For now, this is only implemented on linux for arm64 and amd64.

This is a rebased version of Keith Randall's initial implementation at CL 600635. I have added arm64 support, signal handling, preemption handling and dealt with vDSOs spilling into system stacks.

Fixes #21865

Change-Id: I6fbd5a233beeaceb160785e0c0199a5c94d8e520
Co-authored-by: Keith Randall <khr@golang.org>
Reviewed-on: https://go-review.googlesource.com/c/go/+/704615
Reviewed-by: Roland Shoemaker <roland@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-11-17  runtime: clear frame pointer at thread entry points  Nick Ripley
There are a few places in the runtime where new threads enter Go code with a possibly invalid frame pointer. mstart is the entry point for new Ms, and rt0_go is the entrypoint for the program. As we try to introduce frame pointer unwinding in more places (e.g. for heap profiling in CL 540476 or for execution trace events on the system stack in CL 593835), we see these functions on the stack. We need to ensure that they have valid frame pointers. These functions are both considered the "top" (first) frame of the call stack, so this CL sets the frame pointer register to 0 in these functions.

Updates #63630

Change-Id: I6a6a6964a9ebc6f68ba23d2616e5fb6f19677f97
Reviewed-on: https://go-review.googlesource.com/c/go/+/721020
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
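As a rough illustration of why a zero frame pointer matters, the sketch below simulates a frame-pointer unwinder over a toy memory map. The `frame` type, the `unwind` helper, and all addresses are hypothetical and are not runtime code; the only behavior taken from the commit message is that a saved frame pointer of 0 marks the top frame.

```go
package main

import "fmt"

// frame models the two words a frame-pointer unwinder reads per frame.
// Hypothetical type for illustration only.
type frame struct {
	savedFP uintptr // caller's frame pointer, 0 at the top frame
	retPC   uintptr // return address into the caller
}

// unwind follows the saved-FP chain starting at fp. It reports ok=true
// only if the walk ends at a zero frame pointer within limit frames.
func unwind(mem map[uintptr]frame, fp uintptr, limit int) (pcs []uintptr, ok bool) {
	for n := 0; n < limit; n++ {
		if fp == 0 {
			return pcs, true // clean stop at the top frame
		}
		f, found := mem[fp]
		if !found {
			return pcs, false // stale pointer: a real unwinder would read garbage
		}
		pcs = append(pcs, f.retPC)
		fp = f.savedFP
	}
	return pcs, false
}

func main() {
	// Entry point cleared FP, so its callee saved 0.
	cleared := map[uintptr]frame{
		0x1000: {savedFP: 0x2000, retPC: 0xa},
		0x2000: {savedFP: 0, retPC: 0xb},
	}
	// Entry point left a leftover value in FP.
	stale := map[uintptr]frame{
		0x1000: {savedFP: 0x2000, retPC: 0xa},
		0x2000: {savedFP: 0xdead, retPC: 0xb},
	}
	fmt.Println(unwind(cleared, 0x1000, 16))
	fmt.Println(unwind(stale, 0x1000, 16))
}
```

In the second case the walk runs off the end of the known frames, which is the situation the change avoids at mstart and rt0_go.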
2025-10-22  runtime: use backoff and ISB instruction to reduce contention in (*lfstack).pop and (*spanSet).pop on arm64  fanzha02
When profiling CPU usage of LiveKit on AArch64/x86 (AWS), the graphs show CPU spikes that repeat semi-periodically and occur when the GC (garbage collector) is active. Our analysis found that the getempty function accounted for 10.54% of the overhead, which was mainly caused by the work.empty.pop() function. Listing pop shows that the majority of the time, with a 10.29% overhead, is spent on atomic.Cas64((*uint64)(head), old, next).

This patch adds a backoff approach to reduce the high overhead of the atomic operation, which primarily occurs when contention over a specific memory address increases, typically as the number of threads rises. Note that on platforms other than arm64, the initial value of backoff is zero.

This patch rewrites the implementation of procyield() on arm64, which is an Armv8.0-A compatible delay function using the counter-timer.

The garbage collector benchmark:

                               │   master    │                 opt                 │
                               │   sec/op    │   sec/op     vs base                │
    Garbage/benchmem-MB=64-160   3.782m ± 4%   2.264m ± 2%  -40.12% (p=0.000 n=10)

                               │     master      │                   opt                   │
                               │ user+sys-sec/op │ user+sys-sec/op  vs base                │
    Garbage/benchmem-MB=64-160       433.5m ± 4%       255.4m ± 2%  -41.08% (p=0.000 n=10)

Reference for backoff mechanism: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/multi-threaded-applications-arm

Change-Id: Ie8128a2243ceacbb82ab2a88941acbb8428bad94
Reviewed-on: https://go-review.googlesource.com/c/go/+/654895
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
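The general idea of backing off after a failed compare-and-swap can be sketched in ordinary Go. This is not the runtime's lfstack: the `stack` type, the `spin` helper, the doubling policy, and the `maxBackoff` cap are all assumptions made for illustration, and `spin` is only a stand-in for a real delay primitive such as procyield.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type node struct {
	val  int
	next *node
}

// stack is a hypothetical Treiber-style lock-free stack.
type stack struct {
	head atomic.Pointer[node]
}

func (s *stack) push(v int) {
	n := &node{val: v}
	for {
		old := s.head.Load()
		n.next = old
		if s.head.CompareAndSwap(old, n) {
			return
		}
	}
}

const maxBackoff = 1 << 10 // assumed cap, not taken from the runtime

// spin burns roughly n loop iterations; a stand-in for procyield.
//
//go:noinline
func spin(n uint32) {
	for i := uint32(0); i < n; i++ {
	}
}

// pop retries the CAS, waiting longer after each failure so that
// contending threads hit the shared head pointer less often.
func (s *stack) pop() (int, bool) {
	backoff := uint32(1)
	for {
		old := s.head.Load()
		if old == nil {
			return 0, false
		}
		if s.head.CompareAndSwap(old, old.next) {
			return old.val, true
		}
		spin(backoff)
		if backoff < maxBackoff {
			backoff *= 2
		}
	}
}

func main() {
	var s stack
	const n = 1000
	for i := 0; i < n; i++ {
		s.push(i)
	}
	var sum atomic.Int64
	var wg sync.WaitGroup
	for w := 0; w < 8; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				v, ok := s.pop()
				if !ok {
					return
				}
				sum.Add(int64(v))
			}
		}()
	}
	wg.Wait()
	// Every pushed value should have been popped exactly once.
	fmt.Println(sum.Load() == n*(n-1)/2)
}
```

The backoff only changes how often a losing thread retries; correctness still rests entirely on the compare-and-swap.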
2025-10-20  runtime: make procyieldAsm no longer loop infinitely if passed 0  Michael Anthony Knyszek
Change-Id: I9f01692373623687e09bee54efebaac0ee361f81 Reviewed-on: https://go-review.googlesource.com/c/go/+/712664 Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-10-20  runtime: wrap procyield assembly and check for 0  Michael Anthony Knyszek
procyield will currently loop infinitely if passed 0 on several platforms. This change sidesteps this bug by renaming procyield to procyieldAsm, and adding a wrapper named procyield that checks for cycles == 0. The benefit of this structure is that procyield called with a constant cycle count of 0 will be inlined and constant folded away, the expected behavior of a procyield of 0 cycles. A follow-up change will fix the assembly to not have this footgun anymore. Change-Id: I7068abfeb961bc0fa475e216836f7c0e46b38373 Reviewed-on: https://go-review.googlesource.com/c/go/+/712663 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
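The wrapper structure described above can be sketched as follows. Here `procyieldAsm` is a plain Go stand-in for the assembly routine, and the `asmCalls` counter is a hypothetical addition used only to make the effect observable.

```go
package main

import "fmt"

// asmCalls counts entries into the stand-in; not part of the runtime.
var asmCalls int

// procyieldAsm stands in for the assembly delay loop.
func procyieldAsm(cycles uint32) {
	asmCalls++
	for i := uint32(0); i < cycles; i++ {
	}
}

// procyield guards the assembly routine so that a zero cycle count
// never reaches it. With a constant 0 argument the compiler can inline
// this wrapper and fold the call away.
func procyield(cycles uint32) {
	if cycles == 0 {
		return
	}
	procyieldAsm(cycles)
}

func main() {
	procyield(0)
	fmt.Println("calls after procyield(0):", asmCalls)
	procyield(10)
	fmt.Println("calls after procyield(10):", asmCalls)
}
```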
2025-10-07  Revert "cmd/compile: redo arm64 LR/FP save and restore"  Keith Randall
This reverts commit 719dfcf8a8478d70360bf3c34c0e920be7b32994. Reason for revert: Causing crashes. Change-Id: I0b8526dd03d82fa074ce4f97f1789eeac702b3eb Reviewed-on: https://go-review.googlesource.com/c/go/+/709755 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-10-06  cmd/compile: redo arm64 LR/FP save and restore  Keith Randall
Instead of storing LR (the return address) at 0(SP) and the FP (parent's frame pointer) at -8(SP), store them at framesize-8(SP) and framesize-16(SP), respectively. We push and pop data onto the stack such that we're never accessing anything below SP.

The prolog/epilog lengths are unchanged (3 insns for a typical prolog, 2 for a typical epilog).

We use 8 bytes more per frame.

Typical prologue:

    STP.W (FP, LR), -16(SP)
    MOVD SP, FP
    SUB $C, SP

Typical epilogue:

    ADD $C, SP
    LDP.P 16(SP), (FP, LR)
    RET

The previous word where we stored LR, at 0(SP), is now unused. We could repurpose that slot for storing a local variable.

The new prolog and epilog instructions are recognized by libunwind, so pc-sampling tools like perf should now be accurate. (TODO: except maybe after the first RET instruction? Have to look into that.)

Update #73753 (fixes, for arm64)
Update #57302 (Quim thinks this will help on that issue)

Change-Id: I4800036a9a9a08aaaf35d9f99de79a36cf37ebb8
Reviewed-on: https://go-review.googlesource.com/c/go/+/674615
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
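The slot arithmetic in this message can be checked with a small worked example. The sketch assumes that framesize equals C+16 for the typical prologue (a 16-byte pre-decrement by STP.W followed by SUB $C); the `slots` type and both helper functions are hypothetical names, not compiler code.

```go
package main

import "fmt"

// slots holds byte offsets from SP, after the prologue, of the saved
// return address (LR) and the saved parent frame pointer (FP).
type slots struct{ lr, fp int }

// oldSlots is the layout the message describes as the previous scheme:
// LR at 0(SP) and FP at -8(SP), i.e. below SP.
func oldSlots() slots { return slots{lr: 0, fp: -8} }

// newSlots is the layout from the message: LR at framesize-8(SP) and
// FP at framesize-16(SP). Assumes framesize = c + 16.
func newSlots(c int) slots {
	framesize := c + 16
	return slots{lr: framesize - 8, fp: framesize - 16}
}

func main() {
	fmt.Printf("old: %+v\n", oldSlots())
	fmt.Printf("new (C=32): %+v\n", newSlots(32))
}
```

Under that assumption both new offsets are non-negative for any C >= 0, which matches the stated goal of never accessing anything below SP.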
2025-09-26  runtime: unify arm64 entry point code  qmuntal
There is a lot of duplication in how arm64 OSes handle entry points. Do as amd64, have all the logic in a common function. Cq-Include-Trybots: luci.golang.try:gotip-darwin-arm64-longtest,gotip-windows-arm64 Change-Id: I370c25c3c4b107b525aba14e9dcac34a02d9872e Reviewed-on: https://go-review.googlesource.com/c/go/+/706175 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Quim Muntal <quimmuntal@gmail.com>
2025-07-24  cmd/compile: move arm64 over to new bounds check strategy  Keith Randall
For all the static bounds checks in cmd/go, we have:

    6877 just a single instruction (the call itself)
     139 needs an additional reg-reg move
     602 needs an additional constant load
      25 needs some other instruction

that's ~90% implemented using just a single instruction.

Reduces the text size of cmd/go by ~0.8%. Total binary size is just barely smaller, ~0.2%. (The difference is the new pcdata table.)

Change-Id: I416e9c196f5d8d0e8f08e191e6df3045e11dccbe
Reviewed-on: https://go-review.googlesource.com/c/go/+/682496
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-02  runtime: clear frame pointer in morestack  Michael Pratt
Corollary to CL 669615. morestack uses the frame pointer from g0.sched.bp. This doesn't really make any sense. morestack wasn't called by whatever used g0 last, so at best unwinding will get misleading results. For #63630. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-arm64-longtest Change-Id: I6a6a636c3a2994eb88f890c506c96fd899e993a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/669616 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Nick Ripley <nick.ripley@datadoghq.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-05-02  runtime: don't restore from g0.sched in systemstack on arm64  Michael Pratt
On arm64, systemstack restores the frame pointer from g0.sched to R29 prior to calling the callback. That doesn't really make any sense. The frame pointer value in g0.sched is some arbitrary BP from a prior context save, but that is not the caller of systemstack. amd64 does not do this. In fact, it leaves BP completely unmodified so frame pointer unwinders like gdb can walk through the systemstack frame and continue traceback on the caller's stack. Unlike mcall, systemstack always returns to the original goroutine, so that is safe. We should do the same on arm64. For #63630. Cq-Include-Trybots: luci.golang.try:gotip-linux-arm64-longtest Change-Id: I6a6a636c35d321dd5d7dc1c4d09e29b55b1ab621 Reviewed-on: https://go-review.googlesource.com/c/go/+/669236 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Nick Ripley <nick.ripley@datadoghq.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-02  runtime: clear frame pointer in mcall  Michael Pratt
On amd64, mcall leaves BP untouched, so the callback will push BP, connecting the g0 stack to the calling g stack. This seems OK (frame pointer unwinders like Linux perf can see what user code called into the scheduler), but the "scheduler" part is problematic. mcall is used when calling into the scheduler to deschedule the current goroutine (e.g., in goyield). Once the goroutine is descheduled, it may be picked up by another M and continue execution. The other thread is mutating the goroutine stack, but our M still has a frame pointer pointing to the goroutine stack. A frame pointer unwinder like Linux perf could get bogus values off of the mutating stack. Note that though the execution tracer uses framepointer unwinding, it never unwinds a g0, so it isn't affected. Clear the frame pointer in mcall so that unwinding always stops at mcall. On arm64, mcall stores the frame pointer from g0.sched.bp. This doesn't really make any sense. mcall wasn't called by whatever used g0 last, so at best unwinding will get misleading results (e.g., it might look like cgocallback calls mcall?). Also clear the frame pointer on arm64. Other architectures don't use frame pointers. For #63630. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-arm64-longtest Change-Id: I6a6a636cb6404f3c95ecabdb969c9b8184615cee Reviewed-on: https://go-review.googlesource.com/c/go/+/669615 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Nick Ripley <nick.ripley@datadoghq.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Michael Pratt <mpratt@google.com>
2025-02-26  runtime: remove ret field from gobuf  Keith Randall
It's not used for anything. Change-Id: I031b3cdfe52b6b1cff4b3cb6713ffe588084542f Reviewed-on: https://go-review.googlesource.com/c/go/+/652276 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-25  cmd/compile, runtime: use PC of deferreturn for panic transfer  David Chase
this removes the old conditional-on-register-value handshake from the deferproc/deferprocstack logic.

The "line" for the recovery-exit frame itself (not the defers that it runs) is the closing brace of the function.

Reduces code size slightly (e.g. go command is 0.2% smaller)

Sample output showing effect of this change, also what sort of code it requires to observe the effect:

```
package main

import "os"

func main() {
	g(len(os.Args) - 1) // stack[0]
}

var gi int
var pi *int = &gi

//go:noinline
func g(i int) {
	switch i {
	case 0:
		defer func() {
			println("g0", i)
			q() // stack[2] if i == 0
		}()
		for j := *pi; j < 1; j++ {
			defer func() {
				println("recover0", recover().(string))
			}()
		}
	default:
		for j := *pi; j < 1; j++ {
			defer func() {
				println("g1", i)
				q() // stack[2] if i == 1
			}()
		}
		defer func() {
			println("recover1", recover().(string))
		}()
	}
	p()
} // stack[1] (deferreturn)

//go:noinline
func p() {
	panic("p()")
}

//go:noinline
func q() {
	panic("q()") // stack[3]
}

/* Sample output for "./foo foo":
recover1 p()
g1 1
panic: q()

goroutine 1 [running]:
main.q()
	.../main.go:46 +0x2c
main.g.func3()
	.../main.go:29 +0x48
main.g(0x1?)
	.../main.go:37 +0x68
main.main()
	.../main.go:6 +0x28
*/
```

Change-Id: Ie39ea62ecc244213500380ea06d44024cadc2317
Reviewed-on: https://go-review.googlesource.com/c/go/+/650795
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-17  runtime: check LSE support on ARM64 at runtime init  Andrey Bokhanko
Check presence of LSE support on ARM64 chip if we targeted it at compile time. Related to #69124 Updates #60905 Fixes #71411 Change-Id: I65e899a28ff64a390182572c0c353aa5931fc85d Reviewed-on: https://go-review.googlesource.com/c/go/+/645795 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
2025-01-27  Revert "runtime: Check LSE support on ARM64 at runtime init"  Cherry Mui
This reverts CL 610195. Reason for revert: SIGILL on macOS. See issue #71411. Updates #69124, #60905. Fixes #71411. Change-Id: Ie0624e516dfb32fb13563327bcd7f557e5cba940 Reviewed-on: https://go-review.googlesource.com/c/go/+/644695 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
2024-10-22  runtime: Check LSE support on ARM64 at runtime init  Andrey Bokhanko
Check presence of LSE support on ARM64 chip if we targeted it at compile time. Related to #69124 Update #60905 Change-Id: I6fe244decbb4982548982e1f88376847721a33c7 Reviewed-on: https://go-review.googlesource.com/c/go/+/610195 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Shu-Chun Weng <scw@google.com>
2024-05-11  runtime: add runtime.debugPinnerV1  Alessandro Arzilli
Adds runtime.debugPinnerV1 which returns a runtime.Pinner object that pins itself. This is intended to be used by debuggers in conjunction with runtime.debugCall to keep heap memory reachable even if it isn't referenced from anywhere else. Change-Id: I508ee6a7b103e68df83c96f2e04a0599200300dc Reviewed-on: https://go-review.googlesource.com/c/go/+/558276 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Austin Clements <austin@google.com>
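The "pinner that pins itself" idea can be approximated with the public runtime.Pinner API (available since Go 1.21). The helper name `selfPinned` is hypothetical, and this sketch is not the runtime's debugPinnerV1; it only illustrates the pattern the message describes.

```go
package main

import (
	"fmt"
	"runtime"
)

// selfPinned returns a Pinner that has pinned itself, so the Pinner
// object stays pinned until Unpin is called. Hypothetical helper.
func selfPinned() *runtime.Pinner {
	p := new(runtime.Pinner)
	p.Pin(p)
	return p
}

func main() {
	p := selfPinned()

	// Pin an additional heap object through the same Pinner.
	buf := new([64]byte)
	p.Pin(buf)

	runtime.GC() // pinned objects are not moved or freed
	buf[0] = 1
	fmt.Println("buf still usable:", buf[0] == 1)

	// Unpin releases every object pinned through p, including p itself.
	p.Unpin()
}
```

A debugger would hold on to such a Pinner across runtime.debugCall invocations; here the Go variable `p` plays that role.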
2023-10-26  runtime: print a stack trace at "morestack on g0"  Cherry Mui
Error like "morestack on g0" is one of the errors that is very hard to debug, because often it doesn't print a useful stack trace. The runtime doesn't directly print a stack trace because it is a bad stack state to call print. Sometimes the SIGABRT may trigger a traceback, but sometimes not especially in a cgo binary. Even if it triggers a traceback it often does not include the stack trace of the bad stack. This CL makes it explicitly print a stack trace and throw. The idea is to have some space as an "emergency" crash stack. When the stack is in a really bad state, we switch to the crash stack and do a traceback. Currently only implemented on AMD64 and ARM64. TODO: also handle errors like "morestack on gsignal" and bad systemstack. Also handle other architectures. Change-Id: Ibfc397202f2bb0737c5cbe99f2763de83301c1c1 Reviewed-on: https://go-review.googlesource.com/c/go/+/419435 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2023-06-14  all: fix spelling errors  Alexander Yastrebov
Fix spelling errors discovered using https://github.com/codespell-project/codespell. Errors in data files and vendored packages are ignored. Change-Id: I83c7818222f2eea69afbd270c15b7897678131dc GitHub-Last-Rev: 3491615b1b82832cc0064f535786546e89aa6184 GitHub-Pull-Request: golang/go#60758 Reviewed-on: https://go-review.googlesource.com/c/go/+/502576 Auto-Submit: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Pratt <mpratt@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>
2023-06-10  runtime: fix typos  Jes Cok
Change-Id: If13f4d4bc545f78e3eb8c23cf2e63f0eb273d71f GitHub-Last-Rev: 32ca70f52a5c3dd66f18535c5e595e66afb3903c GitHub-Pull-Request: golang/go#60703 Reviewed-on: https://go-review.googlesource.com/c/go/+/502055 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-22  runtime: rename getcallerfp to getfp  Felix Geisendörfer
The previous name was wrong due to the mistaken assumption that calling f->g->getcallerpc and f->g->getcallersp would respectively return the pc/sp at g. However, they are actually referring to their caller's caller, i.e. f. Rename getcallerfp to getfp in order to stay consistent with this naming convention. Also see discussion on CL 463835. For #16638 This is a redo of CL 481617 that became necessary because CL 461738 added another call site for getcallerfp(). Change-Id: If0b536e85a6c26061b65e7b5c2859fc31385d025 Reviewed-on: https://go-review.googlesource.com/c/go/+/494857 Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
2023-05-17  runtime/cgo: store M for C-created thread in pthread key  Cherry Mui
This reapplies CL 485500, with a fix drafted in CL 492987 incorporated.

CL 485500 is reverted due to #60004 and #60007. #60004 is fixed in CL 492743. #60007 is fixed in CL 492987 (incorporated in this CL).

[Original CL 485500 description]

This reapplies CL 481061, with the followup fixes in CL 482975, CL 485315, and CL 485316 incorporated.

CL 481061, by doujiang24 <doujiang24@gmail.com>, speed up C to Go calls by binding the M to the C thread. See below for its description.
CL 482975 is a followup fix to a C declaration in testprogcgo.
CL 485315 is a followup fix for x_cgo_getstackbound on Illumos.
CL 485316 is a followup cleanup for ppc64 assembly.

CL 479915 passed the G to _cgo_getstackbound for direct updates to gp.stack.lo. A G can be reused on a new thread after the previous thread exited. This could trigger the C TSAN race detector because it couldn't see the synchronization in Go (lockextra) preventing the same G from being used on multiple threads at the same time.

We work around this by passing the address of a stack variable to _cgo_getstackbound rather than the G. The stack is generally unique per thread, so TSAN won't see the same address from multiple threads. Even if stacks are reused across threads by pthread, C TSAN should see the synchronization in the stack allocator.

A regression test is added to misc/cgo/testsanitizer.

[Original CL 481061 description]

This reapplies CL 392854, with the followup fixes in CL 479255, CL 479915, and CL 481057 incorporated.

CL 392854, by doujiang24 <doujiang24@gmail.com>, speed up C to Go calls by binding the M to the C thread. See below for its description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C to Go calls after the C code uses some stack, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak bug of CL 479915.
[Original CL 392854 description] In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But, needm and dropm are heavy costs due to the signal-related syscalls. So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call. Instead, we only dropm while the C thread exits, so the extra M won't leak. When invoking a Go function from C: Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor. And store the g0 of the current m into the thread-specified value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits. When returning back to C: Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time invoke a Go function from C. This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows. This optimization is significant, and the specific value depends on the OS system and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread. For the newly added BenchmarkCGoInCThread, some benchmark results: 1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz 2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz [CL 479915 description] Currently, when C calls into Go the first time, we grab an M using needm, which sets m.g0's stack bounds using the SP. We don't know how big the stack is, so we simply assume 32K. 
Previously, when the Go function returns to C, we drop the M, and the next time C calls into Go, we put a new stack bound on the g0 based on the current SP. After CL 392854, we don't drop the M, and the next time C calls into Go, we reuse the same g0, without recomputing the stack bounds. If the C code uses quite a bit of stack space before calling into Go, the SP may be well below the 32K stack bound we assumed, so the runtime thinks the g0 stack overflows.

This CL makes needm get a more accurate stack bound from pthread. (In some platforms this may still be a guess as we don't know exactly where we are in the C stack), but it is probably better than simply assuming 32K.

[CL 492987 description]

On the first call into Go from a C thread, currently we set the g0 stack's high bound imprecisely based on the SP. With CL 485500, we keep the M and don't recompute the stack bounds when it calls into Go again. If the first call is made when the C thread uses some deep stack, but a subsequent call is made with a shallower stack, the SP may be above g0.stack.hi.

This is usually okay as we don't usually check stack.hi. One place where we do check for stack.hi is in the signal handler, in adjustSignalStack. In particular, C TSAN delivers signals on the g0 stack (instead of the usual signal stack). If the SP is above g0.stack.hi, we don't see it is on the g0 stack, and throw.

This CL makes it get an accurate stack upper bound with the pthread API (on the platforms where it is available).

Also add some debug print for the "handler not on signal stack" throw.

Fixes #51676.
Fixes #59294.
Fixes #59678.
Fixes #60007.

Change-Id: Ie51c8e81ade34ec81d69fd7bce1fe0039a470776
Reviewed-on: https://go-review.googlesource.com/c/go/+/495855
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
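The two stack-bound problems described in this message come down to simple address arithmetic, sketched below. The `bounds` type and `guessBounds` helper are hypothetical, the addresses are made up, and setting hi exactly to the first SP is a simplification made for this example; only the 32K assumption comes from the text.

```go
package main

import "fmt"

const assumedStack = 32 * 1024 // the "simply assume 32K" guess

// bounds is a hypothetical stand-in for g0's stack.lo and stack.hi.
type bounds struct{ lo, hi uintptr }

// guessBounds derives bounds from the SP seen on the first C-to-Go
// call. Using sp itself as hi is a simplification of this sketch.
func guessBounds(sp uintptr) bounds {
	return bounds{lo: sp - assumedStack, hi: sp}
}

func (b bounds) contains(sp uintptr) bool {
	return sp >= b.lo && sp <= b.hi
}

func main() {
	first := uintptr(0x70000000)
	b := guessBounds(first) // computed once, then reused with the bound M

	// Later call after C used 40K more stack: SP is below lo, which
	// looks like a g0 stack overflow.
	deeper := first - 40*1024
	fmt.Println("deeper call inside bounds:", b.contains(deeper))

	// Later call from a shallower C frame: SP is above hi, so a check
	// such as the one in adjustSignalStack would not recognize g0.
	shallower := first + 4*1024
	fmt.Println("shallower call inside bounds:", b.contains(shallower))
}
```

Both later calls fall outside the guessed range, which is why the fix queries pthread for the real bounds instead of reusing a guess.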
2023-05-11  Revert "runtime: rename getcallerfp to getfp"  Michael Pratt
This reverts CL 481617. Reason for revert: breaks test build on Windows Change-Id: Ifc1a323b0cc521e7a5a1f7de7b3da667f5fee375 Reviewed-on: https://go-review.googlesource.com/c/go/+/494377 Reviewed-by: Bryan Mills <bcmills@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-11  runtime: rename getcallerfp to getfp  Felix Geisendörfer
The previous name was wrong due to the mistaken assumption that calling f->g->getcallerpc and f->g->getcallersp would respectively return the pc/sp at g. However, they are actually referring to their caller's caller, i.e. f. Rename getcallerfp to getfp in order to stay consistent with this naming convention. Also see discussion on CL 463835. For #16638 Change-Id: I07990645da78819efd3db92f643326652ee516f8 Reviewed-on: https://go-review.googlesource.com/c/go/+/481617 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-05  Revert "runtime/cgo: store M for C-created thread in pthread key"  Chressie Himpel
This reverts CL 485500. Reason for revert: This breaks internal tests at Google, see b/280861579 and b/280820455. Change-Id: I426278d400f7611170918fc07c524cb059b9cc55 Reviewed-on: https://go-review.googlesource.com/c/go/+/492995 Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Chressie Himpel <chressie@google.com>
2023-04-26  runtime/cgo: store M for C-created thread in pthread key  Michael Pratt
This reapplies CL 481061, with the followup fixes in CL 482975, CL 485315, and CL 485316 incorporated. CL 481061, by doujiang24 <doujiang24@gmail.com>, speed up C to Go calls by binding the M to the C thread. See below for its description. CL 482975 is a followup fix to a C declaration in testprogcgo. CL 485315 is a followup fix for x_cgo_getstackbound on Illumos. CL 485316 is a followup cleanup for ppc64 assembly. [Original CL 481061 description] This reapplies CL 392854, with the followup fixes in CL 479255, CL 479915, and CL 481057 incorporated. CL 392854, by doujiang24 <doujiang24@gmail.com>, speed up C to Go calls by binding the M to the C thread. See below for its description. CL 479255 is a followup fix for a small bug in ARM assembly code. CL 479915 is another followup fix to address C to Go calls after the C code uses some stack, but that CL is also buggy. CL 481057, by Michael Knyszek, is a followup fix for a memory leak bug of CL 479915. [Original CL 392854 description] In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But, needm and dropm are heavy costs due to the signal-related syscalls. So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call. Instead, we only dropm while the C thread exits, so the extra M won't leak. When invoking a Go function from C: Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor. And store the g0 of the current m into the thread-specified value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits. When returning back to C: Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time invoke a Go function from C. 
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows. This optimization is significant, and the specific value depends on the OS system and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread. For the newly added BenchmarkCGoInCThread, some benchmark results: 1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz 2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz [CL 479915 description] Currently, when C calls into Go the first time, we grab an M using needm, which sets m.g0's stack bounds using the SP. We don't know how big the stack is, so we simply assume 32K. Previously, when the Go function returns to C, we drop the M, and the next time C calls into Go, we put a new stack bound on the g0 based on the current SP. After CL 392854, we don't drop the M, and the next time C calls into Go, we reuse the same g0, without recomputing the stack bounds. If the C code uses quite a bit of stack space before calling into Go, the SP may be well below the 32K stack bound we assumed, so the runtime thinks the g0 stack overflows. This CL makes needm get a more accurate stack bound from pthread. (In some platforms this may still be a guess as we don't know exactly where we are in the C stack), but it is probably better than simply assuming 32K. [CL 485500 description] CL 479915 passed the G to _cgo_getstackbound for direct updates to gp.stack.lo. A G can be reused on a new thread after the previous thread exited. This could trigger the C TSAN race detector because it couldn't see the synchronization in Go (lockextra) preventing the same G from being used on multiple threads at the same time. 
We work around this by passing the address of a stack variable to _cgo_getstackbound rather than the G. The stack is generally unique per thread, so TSAN won't see the same address from multiple threads. Even if stacks are reused across threads by pthread, C TSAN should see the synchronization in the stack allocator.

A regression test is added to misc/cgo/testsanitizer.

Fixes #51676.
Fixes #59294.
Fixes #59678.

Change-Id: Ic62be31a06ee83568215e875a891df37084e08ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/485500
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
2023-04-21runtime: tidy _Stack* constant namingAustin Clements
For #59670. Change-Id: I0efa743edc08e48dc8d906803ba45e9f641369db Reviewed-on: https://go-review.googlesource.com/c/go/+/486977 Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com>
2023-04-20Revert "runtime: tidy _Stack* constant naming"Austin Clements
This reverts commit CL 486381. Submitted out of order and breaks bootstrap. Change-Id: Ia472111cb966e884a48f8ee3893b3bf4b4f4f875 Reviewed-on: https://go-review.googlesource.com/c/go/+/486915 Reviewed-by: David Chase <drchase@google.com> TryBot-Bypass: Austin Clements <austin@google.com>
2023-04-20runtime: tidy _Stack* constant namingAustin Clements
For #59670. Change-Id: I4476d6f92663e8a825d063d6e6a7fc9a2ac99d4d Reviewed-on: https://go-review.googlesource.com/c/go/+/486381 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-04-20runtime: mix a bit more in arm64 hash functionKeith Randall
We really need 3 mix steps between the data being hashed and the output. One mix can only spread a 1 bit change to 32 bits. The second mix can spread to all 128 bits, but the spread is not complete. A third mix spreads out ~evenly to all 128 bits. The amd64 version has 3 mix steps. Fixes #59643 Change-Id: I54ad8686ca42bcffb6d0ec3779d27af682cc96e9 Reviewed-on: https://go-review.googlesource.com/c/go/+/486616 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-04-17Revert "runtime/cgo: store M for C-created thread in pthread key"Michael Pratt
This reverts CL 481061. Reason for revert: When built with C TSAN, x_cgo_getstackbound triggers race detection on `g->stacklo` because the synchronization is in Go, which isn't instrumented. For #51676. For #59294. For #59678. Change-Id: I38afcda9fcffd6537582a39a5214bc23dc147d47 Reviewed-on: https://go-review.googlesource.com/c/go/+/485275 TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Pratt <mpratt@google.com> Reviewed-by: Than McIntosh <thanm@google.com>
2023-04-03runtime/cgo: store M for C-created thread in pthread keydoujiang24
This reapplies CL 392854, with the followup fixes in CL 479255, CL 479915, and CL 481057 incorporated. CL 392854, by doujiang24 <doujiang24@gmail.com>, speeds up C to Go calls by binding the M to the C thread. See below for its description. CL 479255 is a followup fix for a small bug in ARM assembly code. CL 479915 is another followup fix to address C to Go calls after the C code uses some stack, but that CL is also buggy. CL 481057, by Michael Knyszek, is a followup fix for a memory leak bug of CL 479915. [Original CL 392854 description] In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But needm and dropm are costly due to the signal-related syscalls. So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call. Instead, we only dropm while the C thread exits, so the extra M won't leak. When invoking a Go function from C: Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor. And store the g0 of the current m into the thread-specific value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits. When returning back to C: Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time a Go function is invoked from C. This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows. This optimization is significant, and the specific value depends on the OS and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread. For the newly added BenchmarkCGoInCThread, some benchmark results: 1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz 2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz [CL 479915 description] Currently, when C calls into Go the first time, we grab an M using needm, which sets m.g0's stack bounds using the SP. We don't know how big the stack is, so we simply assume 32K. Previously, when the Go function returns to C, we drop the M, and the next time C calls into Go, we put a new stack bound on the g0 based on the current SP. After CL 392854, we don't drop the M, and the next time C calls into Go, we reuse the same g0, without recomputing the stack bounds. If the C code uses quite a bit of stack space before calling into Go, the SP may be well below the 32K stack bound we assumed, so the runtime thinks the g0 stack overflows. This CL makes needm get a more accurate stack bound from pthread. (On some platforms this may still be a guess, as we don't know exactly where we are in the C stack, but it is probably better than simply assuming 32K.) Fixes #51676. Fixes #59294. Change-Id: I9bf1400106d5c08ce621d2ed1df3a2d9e3f55494 Reviewed-on: https://go-review.googlesource.com/c/go/+/481061 Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: DeJiang Zhu (doujiang) <doujiang24@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org>
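The stack-bound problem described in the CL 479915 text can be illustrated with a toy model. The sketch below is not runtime code: `g0Stack`, `assumeBounds`, and `overflows` are hypothetical names, the addresses are made up, and only the 32K guess comes from the description above. It shows why reusing bounds computed on the first call misfires once C has consumed more stack, while recomputing them on every call did not.

```go
package main

import "fmt"

// g0Stack is a hypothetical stand-in for the g0 stack bounds.
type g0Stack struct{ lo, hi uint64 }

// assumedSize is the 32K guess mentioned in the CL description.
const assumedSize = 32 * 1024

// assumeBounds guesses the bounds from the current SP alone
// (simplified: the real runtime also leaves some slack above SP).
func assumeBounds(sp uint64) g0Stack {
	return g0Stack{lo: sp - assumedSize, hi: sp}
}

// overflows reports whether sp lies below the recorded lower bound.
func (s g0Stack) overflows(sp uint64) bool { return sp < s.lo }

func main() {
	first := uint64(1 << 30)   // SP at the first C to Go call (made-up value)
	deeper := first - 64*1024 // SP at a later call, after C used 64K more stack

	reused := assumeBounds(first)
	fmt.Println("bounds reused:    ", reused.overflows(deeper))
	fmt.Println("bounds recomputed:", assumeBounds(deeper).overflows(deeper))
}
```

With reused bounds the later SP falls below `lo`, which corresponds to the false overflow the description mentions. Asking pthread for the real bound avoids depending on where the first call happened to land.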
2023-03-31Revert "runtime/cgo: store M for C-created thread in pthread key"Cherry Mui
This reverts CL 392854. Reason for revert: caused #59294, which was derived from google internal tests. The attempted fix of #59294 caused more breakage. Change-Id: I5a061561ac2740856b7ecc09725ac28bd30f8bba Reviewed-on: https://go-review.googlesource.com/c/go/+/481060 Reviewed-by: Heschi Kreinick <heschi@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2023-03-30runtime/trace: enable frame pointer unwinding on amd64Felix Geisendörfer
Change tracer to use frame pointer unwinding by default on amd64. The expansion of inline frames is delayed until the stack table is dumped at the end of the trace. This requires storing the skip argument in the stack table, which now resides in pcBuf[0]. For stacks that are not produced by traceStackID (e.g. CPU samples), a logicalStackSentinel value in pcBuf[0] indicates that no inline expansion is needed. Add new GODEBUG=tracefpunwindoff=1 option to use the old unwinder if needed. Benchmarks show a considerable decrease in CPU overhead when using frame pointer unwinding for trace events: GODEBUG=tracefpunwindoff=1 ../bin/go test -run '^$' -bench '.+PingPong' -count 20 -v -trace /dev/null ./runtime | tee tracefpunwindoff1.txt GODEBUG=tracefpunwindoff=0 ../bin/go test -run '^$' -bench '.+PingPong' -count 20 -v -trace /dev/null ./runtime | tee tracefpunwindoff0.txt goos: linux goarch: amd64 pkg: runtime cpu: Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz │ tracefpunwindoff1.txt │ tracefpunwindoff0.txt │ │ sec/op │ sec/op vs base │ PingPongHog-32 3782.5n ± 0% 740.7n ± 2% -80.42% (p=0.000 n=20) For #16638 Change-Id: I2928a2fcd8779a31c45ce0f2fbcc0179641190bb Reviewed-on: https://go-review.googlesource.com/c/go/+/463835 Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>
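As a rough illustration of what frame pointer unwinding means, the sketch below walks a simulated stack. It is an assumption-laden toy: the layout (saved caller frame pointer at `stack[fp]`, return PC at `stack[fp+1]`, frame pointer 0 as terminator) and the name `fpUnwind` are hypothetical, and the delayed inline expansion described above is not modeled.

```go
package main

import "fmt"

// fpUnwind collects return PCs by following the chain of saved frame
// pointers. In this toy layout, stack[fp] holds the caller's frame
// pointer (as an index) and stack[fp+1] holds the return PC.
// A frame pointer of 0 ends the walk; max caps the number of frames.
func fpUnwind(stack []uint64, fp uint64, max int) []uint64 {
	var pcs []uint64
	for fp != 0 && len(pcs) < max {
		pcs = append(pcs, stack[fp+1])
		fp = stack[fp]
	}
	return pcs
}

func main() {
	stack := make([]uint64, 16)
	stack[10], stack[11] = 0, 0x100 // outermost frame
	stack[6], stack[7] = 10, 0x200  // its callee
	stack[2], stack[3] = 6, 0x300   // innermost frame
	fmt.Printf("%#x\n", fpUnwind(stack, 2, 8))
}
```

Each step is two loads and no table lookup, which is the general reason this style of unwinding tends to be cheap.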
2023-03-24runtime/cgo: store M for C-created thread in pthread keydoujiang24
In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But needm and dropm are costly due to the signal-related syscalls. So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call. Instead, we only dropm while the C thread exits, so the extra M won't leak. When invoking a Go function from C: Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor. And store the g0 of the current m into the thread-specific value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits. When returning back to C: Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time a Go function is invoked from C. This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows. This optimization is significant, and the specific value depends on the OS and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread. For the newly added BenchmarkCGoInCThread, some benchmark results: 1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz 2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz Fixes #51676 Change-Id: I380702fe2f9b6b401b2d6f04b0aba990f4b9ee6c GitHub-Last-Rev: 93dc64ad98e5583372e41f65ee4b7ab78b5aff51 GitHub-Pull-Request: golang/go#51679 Reviewed-on: https://go-review.googlesource.com/c/go/+/392854 Reviewed-by: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: thepudds <thepudds1460@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-02-24cmd/compile: batch write barrier callsKeith Randall
Have the write barrier call return a pointer to a buffer into which the generated code records pointers that need write barrier treatment. Change-Id: I7871764298e0aa1513de417010c8d46b296b199e Reviewed-on: https://go-review.googlesource.com/c/go/+/447781 Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Bypass: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-02-17cmd/compile: move raw writes out of write barrier codeKeith Randall
Previously, the write barrier calls themselves did the actual writes to memory. Instead, move those writes out to a common location that both the wb-enabled and wb-disabled code paths share. This enables us to optimize the write barrier path without having to worry about performing the actual writes. Change-Id: Ia71ab651908ec124cc33141afb52e4ca19733ac6 Reviewed-on: https://go-review.googlesource.com/c/go/+/447780 Reviewed-by: Michael Knyszek <mknyszek@google.com> TryBot-Bypass: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-02-17runtime: remove the restriction that write barrier ptrs come in pairsKeith Randall
Future CLs will remove the invariant that pointers are always put in the write barrier in pairs. The behavior of the assembly code changes a bit, where instead of writing the pointers unconditionally and then checking for overflow, check for overflow first and then write the pointers. Also changed the write barrier flush function to not take the src/dst as arguments. Change-Id: I2ef708038367b7b82ea67cbaf505a1d5904c775c Reviewed-on: https://go-review.googlesource.com/c/go/+/447779 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> TryBot-Bypass: Keith Randall <khr@golang.org>
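The reordering described here (check for overflow first, then write) can be sketched with a toy buffer. Everything in the block is hypothetical: `wbBuf`, its capacity of 4, and the `record` and `flush` names are illustration only and do not mirror the assembly or the runtime's actual buffer.

```go
package main

import "fmt"

// wbBuf is a hypothetical fixed-capacity pointer buffer.
type wbBuf struct {
	buf     [4]uintptr
	n       int // number of slots in use
	flushes int // how many times flush ran
}

// flush empties the buffer. It takes no src/dst arguments.
func (b *wbBuf) flush() {
	b.n = 0
	b.flushes++
}

// record stores any number of pointers (not only pairs), up to the
// buffer capacity. It checks for overflow first and only then writes.
func (b *wbBuf) record(ptrs ...uintptr) {
	if b.n+len(ptrs) > len(b.buf) {
		b.flush()
	}
	b.n += copy(b.buf[b.n:], ptrs)
}

func main() {
	var b wbBuf
	b.record(1)    // a single pointer is fine
	b.record(2, 3) // a pair still works
	b.record(4, 5) // would overflow, so flush happens before the write
	fmt.Println(b.n, b.flushes)
}
```

Checking before writing means the buffer never has to reserve room for a speculative write, which is what lets the entry count be arbitrary in this model.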
2022-08-25runtime: mark morestack_noctxt SPWRITE on LR architecturesCherry Mui
On LR architectures, morestack (and morestack_noctxt) are called with a special calling convention, where the caller doesn't save LR on stack but passes it as a register, which morestack will save to g.sched.lr. The stack unwinder currently doesn't understand it, and would fail to unwind from it. morestack already writes SP (as it switches stack), but morestack_noctxt (which tailcalls morestack) doesn't. If a profiling signal lands right in morestack_noctxt, the unwinder will try to unwind the stack and go off, and possibly crash. Marking morestack_noctxt SPWRITE stops the unwinding. Ideally we could teach the unwinder about the special calling convention, or change the calling convention to be less special (so the unwinder doesn't need to fetch a register from the signal context). This is a stop-gap solution, to stop the unwinder from crashing. Fixes #54332. Change-Id: I75295f2e27ddcf05f1ea0b541aedcb9000ae7576 Reviewed-on: https://go-review.googlesource.com/c/go/+/425396 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2022-07-25runtime: fix runtime.Breakpoint() on windows/arm64qmuntal
Fixes #53837 Change-Id: I4219fe35aac1a88aae2905998fbb1d7db87bbfb2 Reviewed-on: https://go-review.googlesource.com/c/go/+/418734 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Alessandro Arzilli <alessandro.arzilli@gmail.com>
2022-05-18all: fix spellingJohn Bampton
Change-Id: I63eb42f3ce5ca452279120a5b33518f4ce16be45 GitHub-Last-Rev: a88f2f72bef402344582ae997a4907457002b5df GitHub-Pull-Request: golang/go#52951 Reviewed-on: https://go-review.googlesource.com/c/go/+/406843 Run-TryBot: Robert Griesemer <gri@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Robert Griesemer <gri@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com>
2022-05-04runtime: improve the annotation of debugCallV2 for arm64eric fang
This CL improves the annotation documentation of the debugCallV2 function for arm64. Change-Id: Icc2b52063cf4fe779071039d6a3bca1951108eb0 Reviewed-on: https://go-review.googlesource.com/c/go/+/402514 Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Eric Fang <eric.fang@arm.com> Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2022-04-23runtime: support for debugger function calls on linux/arm64eric fang
This CL adds support for debugger function calls on the linux/arm64 platform. The protocol is basically the same as in CL 109699, except for the following differences: 1. The ABI differs, which affects parameter passing and frame layout. 2. Communication information is stored in R20. 3. The closure register is R26. 4. A BRK 0 instruction is used to generate a breakpoint. The saved PC in sigcontext is the PC where the signal occurred, not the next PC. In addition, this CL refactors the existing code (which is dedicated to amd64) for easier multi-arch scaling. Fixes #50614 Change-Id: I06b14e345cc89aab175f4a5f2287b765da85a86b Reviewed-on: https://go-review.googlesource.com/c/go/+/395754 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Eric Fang <eric.fang@arm.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-03-30runtime: unify C->Go ABI transitions on arm64eric fang
Several places save and restore the C callee-saved registers. The operation is the same everywhere, so this CL defines several macros to do this, which helps reduce code redundancy and unify the operation. This CL also replaces consecutive MOVD instructions with STP and LDP instructions in several places where these macros do not apply. Change-Id: I815f39fe484a9ab9b6bd157dfcbc8ad99c1420fe Reviewed-on: https://go-review.googlesource.com/c/go/+/374397 Trust: Eric Fang <eric.fang@arm.com> Run-TryBot: Eric Fang <eric.fang@arm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-03-18all: delete ARM64 non-register ABI fallback pathCherry Mui
Change-Id: I3996fb31789a1f8559348e059cf371774e548a8d Reviewed-on: https://go-review.googlesource.com/c/go/+/393875 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2022-03-18all: delete regabireflect goexperimentCherry Mui
regabireflect goexperiment was helpful in the register ABI development, to control code paths for reflect calls, before the compiler can generate register ABI everywhere. It is not necessary for now. Drop it. Change-Id: I2731197d2f496e29616c426a01045c9b685946a4 Reviewed-on: https://go-review.googlesource.com/c/go/+/393362 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2021-10-29runtime: remove unnecessary argument stores for panicIndex etc. on ARM64Cherry Mui
If register ABI is used, no need to store the arguments to stack. I forgot them in CL 323937. Change-Id: I888af2b547a8fc97d13716bc8e8f3acd5c5bc127 Reviewed-on: https://go-review.googlesource.com/c/go/+/351609 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2021-08-16runtime: make asmcgocall g0/gsignal checks consistentJoel Sing
In asmcgocall() we need to switch to the g0 stack if we're not already on the g0 stack or the gsignal stack. The preferred way of doing this is to check gsignal first, then g0, since if we are going to switch to g0 we will need g0 handy (thus avoiding a second load). Rewrite/reorder 386 and amd64 to check gsignal first - this shaves a few assembly instructions off and makes the order consistent with arm, arm64, mips64 and ppc64. Add missing gsignal checks to mips, riscv64 and s390x. Change-Id: I1b027bf393c25e0c33e1d8eb80de67e4a0a3f561 Reviewed-on: https://go-review.googlesource.com/c/go/+/335869 Trust: Joel Sing <joel@sing.id.au> Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
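The check order described above can be written out as plain Go for illustration. This is a sketch, not the assembly: the `g` and `m` types and the `switchTarget` function are hypothetical, and the "second load" saving only has meaning at the instruction level.

```go
package main

import "fmt"

// g and m are hypothetical stand-ins for the runtime structures.
type g struct{ name string }
type m struct{ g0, gsignal *g }

// switchTarget returns the g whose stack the call should run on.
// It checks gsignal first; g0 is loaded afterwards and is then already
// at hand if a switch turns out to be needed.
func switchTarget(cur *g, mp *m) (target *g, switched bool) {
	if cur == mp.gsignal {
		return cur, false // already on the signal stack: stay
	}
	g0 := mp.g0 // loaded once, used for both the comparison and the switch
	if cur == g0 {
		return cur, false // already on g0: stay
	}
	return g0, true
}

func main() {
	mp := &m{g0: &g{name: "g0"}, gsignal: &g{name: "gsignal"}}
	user := &g{name: "user"}
	for _, cur := range []*g{user, mp.g0, mp.gsignal} {
		target, switched := switchTarget(cur, mp)
		fmt.Println(cur.name, "->", target.name, switched)
	}
}
```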