| Age | Commit message (Collapse) | Author |
|
All openbsd architectures now use the same code, deduplicate accordingly.
Change-Id: I65f1d9bd78c97dbdf552ec95ebba7ec4d04c8d2d
Reviewed-on: https://go-review.googlesource.com/c/go/+/518622
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
All netbsd architectures now use the same code, deduplicate accordingly.
Change-Id: Ieb179fd76885b7af6d388d7f2aee0f9fac6f1264
Reviewed-on: https://go-review.googlesource.com/c/go/+/518621
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
|
|
Much of the gcc_linux_*.c code is identical and duplicated across
architectures. Consolidate code for 386, arm, loong64, mips* and
riscv64, where the only difference is the build tags (386 also
has some non-functional ordering differences).
Change-Id: I14ee9a4cc6b72e165239d196b68b6343efaddf0a
Reviewed-on: https://go-review.googlesource.com/c/go/+/518620
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Most freebsd architectures now use the same code, deduplicate accordingly.
The arm code differs slightly in that it has a compile time check for
ARM_TP_ADDRESS, however this is written in a way that it can be included
for all architectures.
Change-Id: I7f6032b63521d24d0c3b5e0e08d57e32b4f9ddc4
Reviewed-on: https://go-review.googlesource.com/c/go/+/518619
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
This reduces inconsistency with other architectures and will allow
for further code deduplication.
Change-Id: Icf0d02f765546c3193cccaa22c79e632e12d6bba
Reviewed-on: https://go-review.googlesource.com/c/go/+/518616
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
Use fatalf consistently on freebsd. Also use it on dragonfly, netbsd
and openbsd.
Change-Id: I8643c0b7bc13c3cb5173209d311d6d297913955b
Reviewed-on: https://go-review.googlesource.com/c/go/+/518615
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Most architectures have a crosscall1 function that takes a function
pointer, a setg_gcc function pointer and a g pointer. However,
crosscall_386 only takes a function pointer and the call to setg_gcc
is performed in the thread entry function.
Rename crosscall_386 to crosscall1 for consistency with other
architectures, as well as standardising the API - while not strictly
necessary, it will allow for further deduplication as the calling
code becomes more consistent.
Change-Id: I77cf42e1e15e0a4c5802359849a849c32cebd92f
Reviewed-on: https://go-review.googlesource.com/c/go/+/518618
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
This reduces inconsistency with other architectures and will allow
for further code deduplication.
Change-Id: I5becbf29af2ef714974b5e338f869281f2b4de8e
Reviewed-on: https://go-review.googlesource.com/c/go/+/518617
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This extends CL 419434 to all Unix targets. Rather than repeating
the code, pull all the similar code into a single function.
CL 419434 description:
For a cgo binary, at startup we set g0's stack bounds using the
address of a local variable (&size) in a C function x_cgo_init and
the stack size from pthread_attr_getstacksize. Normally, &size is
an address within the current stack frame. However, when it is
compiled with ASAN, it may be instrumented to __asan_stack_malloc_0
and the address may not live in the current stack frame, causing
the stack bound to be set incorrectly, e.g. lo > hi.
Using __builtin_frame_address(0) to get the stack address instead.
Change-Id: I914a09d32c66a79515b6f700be18c690f3c0c77b
Reviewed-on: https://go-review.googlesource.com/c/go/+/517335
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
|
|
Every call from C to Go does acquire a mutex to check whether Go runtime
has been fully initialized. This often does not matter, because the lock
is held only briefly. However, with code that does a lot of parallel
calls from C to Go could cause heavy contention on the mutex.
Since this is an initialization guard, we can double check with atomic
operation to provide a fast path in case the initialization is done.
With this CL, program in #60961 reduces from ~2.7s to ~1.8s.
Fixes #60961
Change-Id: Iba4cabbee3c9bc646e70ef7eb074212ba63fdc04
Reviewed-on: https://go-review.googlesource.com/c/go/+/505455
Reviewed-by: Heschi Kreinick <heschi@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This reapplies CL 485500, with a fix drafted in CL 492987 incorporated.
CL 485500 is reverted due to #60004 and #60007. #60004 is fixed in
CL 492743. #60007 is fixed in CL 492987 (incorporated in this CL).
[Original CL 485500 description]
This reapplies CL 481061, with the followup fixes in CL 482975, CL 485315, and
CL 485316 incorporated.
CL 481061, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 482975 is a followup fix to a C declaration in testprogcgo.
CL 485315 is a followup fix for x_cgo_getstackbound on Illumos.
CL 485316 is a followup cleanup for ppc64 assembly.
CL 479915 passed the G to _cgo_getstackbound for direct updates to
gp.stack.lo. A G can be reused on a new thread after the previous thread
exited. This could trigger the C TSAN race detector because it couldn't
see the synchronization in Go (lockextra) preventing the same G from
being used on multiple threads at the same time.
We work around this by passing the address of a stack variable to
_cgo_getstackbound rather than the G. The stack is generally unique per
thread, so TSAN won't see the same address from multiple threads. Even
if stacks are reused across threads by pthread, C TSAN should see the
synchonization in the stack allocator.
A regression test is added to misc/cgo/testsanitizer.
[Original CL 481061 description]
This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.
CL 392854, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C to Go calls after
the C code uses some stack, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug of CL 479915.
[Original CL 392854 description]
In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But, needm and dropm are heavy costs due to the signal-related syscalls.
So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call.
Instead, we only dropm while the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specified value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits.
When returning back to C:
Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time invoke a Go function from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.
This optimization is significant, and the specific value depends on the OS system and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
[CL 479915 description]
Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.
This CL makes needm get a more accurate stack bound from
pthread. (In some platforms this may still be a guess as we don't
know exactly where we are in the C stack), but it is probably
better than simply assuming 32K.
[CL 492987 description]
On the first call into Go from a C thread, currently we set the g0
stack's high bound imprecisely based on the SP. With CL 485500, we
keep the M and don't recompute the stack bounds when it calls into
Go again. If the first call is made when the C thread uses some
deep stack, but a subsequent call is made with a shallower stack,
the SP may be above g0.stack.hi.
This is usually okay as we don't check usually stack.hi. One place
where we do check for stack.hi is in the signal handler, in
adjustSignalStack. In particular, C TSAN delivers signals on the
g0 stack (instead of the usual signal stack). If the SP is above
g0.stack.hi, we don't see it is on the g0 stack, and throws.
This CL makes it get an accurate stack upper bound with the
pthread API (on the platforms where it is available).
Also add some debug print for the "handler not on signal stack"
throw.
Fixes #51676.
Fixes #59294.
Fixes #59678.
Fixes #60007.
Change-Id: Ie51c8e81ade34ec81d69fd7bce1fe0039a470776
Reviewed-on: https://go-review.googlesource.com/c/go/+/495855
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
This reverts CL 485500.
Reason for revert: This breaks internal tests at Google, see b/280861579 and b/280820455.
Change-Id: I426278d400f7611170918fc07c524cb059b9cc55
Reviewed-on: https://go-review.googlesource.com/c/go/+/492995
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Chressie Himpel <chressie@google.com>
|
|
Rework this function to closely match the PPC64 crosscall2, but
written in gnu asm. Likewise, fix this to store TOC in the new
frame, not the caller's, as is required by the ELF ABIs.
Change-Id: I8902c74f2607e3436260882a7bea52e72a67b8f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/486335
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Carlos Amedee <carlos@golang.org>
|
|
ELFv1 and ELFv2 declare V20-V31 as nonvolatile (callee save) registers.
Go does not. Preserve them before calling into Go.
Change-Id: If60a6d8f71b51b1136a86eab2b90d964900becd7
Reviewed-on: https://go-review.googlesource.com/c/go/+/480657
Auto-Submit: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
cgo.NewHandle atomically increments a global uintptr index using
atomic.AddUintptr. Use atomic.Uintptr instead, which is
cleaner and clearer.
Change-Id: I845b3e4cb8c461e787a9b9bb2a9ceaaef1d21d8e
Reviewed-on: https://go-review.googlesource.com/c/go/+/490775
Run-TryBot: Quim Muntal <quimmuntal@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
This reapplies CL 481061, with the followup fixes in CL 482975, CL 485315, and
CL 485316 incorporated.
CL 481061, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 482975 is a followup fix to a C declaration in testprogcgo.
CL 485315 is a followup fix for x_cgo_getstackbound on Illumos.
CL 485316 is a followup cleanup for ppc64 assembly.
[Original CL 481061 description]
This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.
CL 392854, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C to Go calls after
the C code uses some stack, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug of CL 479915.
[Original CL 392854 description]
In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But, needm and dropm are heavy costs due to the signal-related syscalls.
So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call.
Instead, we only dropm while the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specified value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits.
When returning back to C:
Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time invoke a Go function from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.
This optimization is significant, and the specific value depends on the OS system and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
[CL 479915 description]
Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.
This CL makes needm get a more accurate stack bound from
pthread. (In some platforms this may still be a guess as we don't
know exactly where we are in the C stack), but it is probably
better than simply assuming 32K.
[CL 485500 description]
CL 479915 passed the G to _cgo_getstackbound for direct updates to
gp.stack.lo. A G can be reused on a new thread after the previous thread
exited. This could trigger the C TSAN race detector because it couldn't
see the synchronization in Go (lockextra) preventing the same G from
being used on multiple threads at the same time.
We work around this by passing the address of a stack variable to
_cgo_getstackbound rather than the G. The stack is generally unique per
thread, so TSAN won't see the same address from multiple threads. Even
if stacks are reused across threads by pthread, C TSAN should see the
synchonization in the stack allocator.
A regression test is added to misc/cgo/testsanitizer.
Fixes #51676.
Fixes #59294.
Fixes #59678.
Change-Id: Ic62be31a06ee83568215e875a891df37084e08ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/485500
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
|
|
This reverts CL 481061.
Reason for revert: When built with C TSAN, x_cgo_getstackbound triggers
race detection on `g->stacklo` because the synchronization is in Go,
which isn't instrumented.
For #51676.
For #59294.
For #59678.
Change-Id: I38afcda9fcffd6537582a39a5214bc23dc147d47
Reviewed-on: https://go-review.googlesource.com/c/go/+/485275
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
|
|
on PPC64"""
This reverts CL 481075 (a re-apply of previously reverted CL 478917).
Reason for revert: CL 481061 causes C TSAN failures and must be
reverted. See CL 485275. This CL depends on CL 481061.
For #59678.
Change-Id: I4bf7f43d9df1ae28e04cd4065552bcbee82ef13f
Reviewed-on: https://go-review.googlesource.com/c/go/+/485316
Reviewed-by: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
|
|
This reverts CL 481795.
Reason for revert: CL 481061 causes C TSAN failures and must be
reverted. See CL 485275. This CL depends on CL 481061.
For #59678.
Change-Id: I5ec1f495154205ebdf19cd44c6e6452a7a3606f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/485315
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
|
|
While Solaris supports pthread_getattr_np, Illumos doesn't...
Instead, Illumos supports pthread_attr_get_np.
Updates #59294.
Change-Id: I2c66dad79b8bf3d510352875bf21d04415f23eeb
Reviewed-on: https://go-review.googlesource.com/c/go/+/481795
TryBot-Bypass: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Add new helper macros to further simplify the transition from
the host's ABI to Go. Fortunately the same one should work for
all PPC64 targets.
Update the other site which uses these wrappers to further
consolidate. Also, update the call to runtime.sigtrampgo to
call the ABIInternal version directly.
Also, update the SAVE/RESTORE_VR macros to accept R0.
Change-Id: I0046176029e1e1b25838688e4b7bf57805b01bd4
Reviewed-on: https://go-review.googlesource.com/c/go/+/476297
Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This reverts CL 481059, which in turn reverts CL 478917.
Reason for revert: reapply the original CL.
Change-Id: Icf6bb6a620313b44fadcc7f69a62fdbb943e34fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/481075
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
|
|
This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.
CL 392854, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C to Go calls after
the C code uses some stack, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug of CL 479915.
[Original CL 392854 description]
In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But, needm and dropm are heavy costs due to the signal-related syscalls.
So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call.
Instead, we only dropm while the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specified value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits.
When returning back to C:
Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time invoke a Go function from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.
This optimization is significant, and the specific value depends on the OS system and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
[CL 479915 description]
Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.
This CL makes needm get a more accurate stack bound from
pthread. (In some platforms this may still be a guess as we don't
know exactly where we are in the C stack), but it is probably
better than simply assuming 32K.
Fixes #51676.
Fixes #59294.
Change-Id: I9bf1400106d5c08ce621d2ed1df3a2d9e3f55494
Reviewed-on: https://go-review.googlesource.com/c/go/+/481061
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: DeJiang Zhu (doujiang) <doujiang24@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This reverts CL 392854.
Reason for revert: caused #59294, which was derived from google
internal tests. The attempted fix of #59294 caused more breakage.
Change-Id: I5a061561ac2740856b7ecc09725ac28bd30f8bba
Reviewed-on: https://go-review.googlesource.com/c/go/+/481060
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This reverts CL 478917.
Reason for revert: need to revert CL 392854, and this caused a conflict.
Change-Id: I02c3285de5635b431a99adc8790c8310d1c4e6a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/481059
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
|
|
This reverts CL 479915.
Reason for revert: breaks a lot google internal tests.
Change-Id: I13a9422e810af7ba58cbf4a7e6e55f4d8cc0ca51
Reviewed-on: https://go-review.googlesource.com/c/go/+/481055
Reviewed-by: Chressie Himpel <chressie@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.
This CL makes needm get a more accurate stack bound from
pthread. (In some platforms this may still be a guess as we don't
know exactly where we are in the C stack), but it is probably
better than simply assuming 32K.
For #59294.
Change-Id: Ie52a8f931e0648d8753e4c1dbe45468b8748b527
Reviewed-on: https://go-review.googlesource.com/c/go/+/479915
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Loong64's R22-R31 and F24-F31 are callee saved registers, which
should be saved in the beginning of sigtramp, and restored at
the end.
In reviewing comments about sigtramp in sys_linux_arm64 it was
noted that a previous issue in arm64 due to missing callee save
registers could also be a problem on loong64, so code was added
to save and restore those.
Updates #31827
Change-Id: I3ae58fe8a64ddb052d0a89b63e82c01ad328dd15
Reviewed-on: https://go-review.googlesource.com/c/go/+/426356
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Auto-Submit: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: xiaodong liu <teaofmoli@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: WANG Xuerui <git@xen0n.name>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This is a cleanup to allow a consistent definitions of a function
descriptor on code shared between AIX and Linux. They need to be
declared in slightly different ways, but we can hide that in one
macro.
And, update all usage.
Change-Id: I10f3580473db555b4fb4d2597b856f3a67d01a53
Reviewed-on: https://go-review.googlesource.com/c/go/+/478917
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Paul Murphy <murp@ibm.com>
|
|
In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But, needm and dropm are heavy costs due to the signal-related syscalls.
So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call.
Instead, we only dropm while the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specified value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits.
When returning back to C:
Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time invoke a Go function from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.
This optimization is significant, and the specific value depends on the OS system and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Fixes #51676
Change-Id: I380702fe2f9b6b401b2d6f04b0aba990f4b9ee6c
GitHub-Last-Rev: 93dc64ad98e5583372e41f65ee4b7ab78b5aff51
GitHub-Pull-Request: golang/go#51679
Reviewed-on: https://go-review.googlesource.com/c/go/+/392854
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: thepudds <thepudds1460@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Rather than implying that all ppc64 GOARCHs use function descriptors,
provide a define for platforms that make use of function descriptors.
Condition on GO_PPC64X_HAS_FUNCDESC when choosing whether or not
to load the entry address from the first slot of the function
descriptor.
Updates #56001.
Change-Id: I9cdc788f2de70a1262c17d8485b555383d1374b5
Reviewed-on: https://go-review.googlesource.com/c/go/+/476117
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
The tsan13 test highlighted a few bugs.
The first being runtime.sigprofNonGoWrapper was being
called from C code and violating the C ABI.
The second was a missed tsan acquire/release after
thread creation.
The third was runtime.cgoSigtramp violating ELFv2
ABI constraints when loading g. It is reworked to
avoid clobbering R30 and R31 via runtime.load_g.
Change-Id: Ib2d98047fa1b4e72b8045767e86457a8ddfe492e
Reviewed-on: https://go-review.googlesource.com/c/go/+/475935
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Change-Id: Ifb8d64f18b67c8b712feec29ffb6719c6e9718ec
Reviewed-on: https://go-review.googlesource.com/c/go/+/474198
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
|
|
The linux build tags were incorrectly removed from these files by CL 460538.
Restore the correct build tags so that they are only included in builds
for linux/mips* platforms.
Change-Id: I21d8802b0252688d8e2228cf904b47d90b253485
Reviewed-on: https://go-review.googlesource.com/c/go/+/469175
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
Add -fno-stack-protector back to the default set of CFLAGS for cgo, so
as to avoid problems with internal linking locating the library
containing the "__stack_chk_fail_local" support function that some
compilers emit (the specific archive can vary based on GOOS).
Updates #52919.
Updates #54313.
Updates #57261.
Updates #58385.
Change-Id: I4591bfb15501f04b7afe1fcd50c4fb93c86db63d
Reviewed-on: https://go-review.googlesource.com/c/go/+/466935
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
We don't support Apple platform on 386, ARM, or PPC64. Remove
dead code.
Change-Id: I5722bf58c0fb73c5db4ba016cb424e392739c7de
Reviewed-on: https://go-review.googlesource.com/c/go/+/455162
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
For internal linking, at the point where we finish reading libgcc.a,
if the symbol "__stack_chk_local" is still undefined, then read
in the host archive libc_nonshared.a as well.
Updates #57261.
Change-Id: I0b1e485aa50aa7940db8cabcb3b9a7959bf99ce7
Reviewed-on: https://go-review.googlesource.com/c/go/+/456856
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Replace deprecated // +build lines by their respective //go:build line
counterpart. Also remove build constraints implied by file name or type.
Change-Id: I8d18cd40071ca28d7654da8f0d22841f43729ca6
Reviewed-on: https://go-review.googlesource.com/c/go/+/460538
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
This CL marks non-leaf nosplit assembly functions as NOFRAME to avoid
relying on the implicit amd64 NOFRAME heuristic, where NOSPLIT functions
without stack were also marked as NOFRAME.
Updates #57302
Updates #40044
Change-Id: Ia4d26f8420dcf2b54528969ffbf40a73f1315d61
Reviewed-on: https://go-review.googlesource.com/c/go/+/459395
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Quim Muntal <quimmuntal@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
This CL redesign how we get the TLS pointer on windows/i386.
It applies the same changes as done in CL 431775 for windows/amd64.
We were previously reading it from the [TEB] arbitrary data slot,
located at 0x14(FS), which can only hold 1 TLS pointer.
With this CL, we will read the TLS pointer from the TEB TLS slot array,
located at 0xE10(GS). The TLS slot array can hold multiple
TLS pointers, up to 64, so multiple Go runtimes running on the
same thread can coexists with different TLS.
Each new TLS slot has to be allocated via [TlsAlloc],
which returns the slot index. This index can then be used to get the
slot offset from GS with the following formula: 0xE10 + index*4.
The slot index is fixed per Go runtime, so we can store it
in runtime.tls_g and use it latter on to read/update the TLS pointer.
Loading the TLS pointer requires the following asm instructions:
MOVQ runtime.tls_g, AX
MOVQ AX(FS), AX
Notice that this approach will now be implemented in all the supported
windows arches.
[TEB]: https://en.wikipedia.org/wiki/Win32_Thread_Information_Block
[TlsAlloc]: https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-tlsalloc
Change-Id: If4550b0d44694ee6480d4093b851f4991a088b32
Reviewed-on: https://go-review.googlesource.com/c/go/+/454675
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Quim Muntal <quimmuntal@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Fix typo in CL 454838.
Change-Id: I0e91d22cf09949f0bf924ebcf09f1ddac525bac4
Reviewed-on: https://go-review.googlesource.com/c/go/+/455161
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
Without it, at least on ARM64 with older BFD linker, it will
include the file of the object file (which is of a temporary path)
as a debug symbol into the binary, causing the build to be
nondeterministic. Adding a .file directive makes it to create a
STT_FILE symbol with deterministic input, and prevent the linker
creating one using the temporary object file path.
Fixes #57035.
Change-Id: I3ab716b240f60f7a891af2f7e10b467df67d1f31
Reviewed-on: https://go-review.googlesource.com/c/go/+/454838
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Cherry Mui <cherryyz@google.com>
|
|
This CL redesign how we get the TLS pointer on windows/amd64.
We were previously reading it from the [TEB] arbitrary data slot,
located at 0x28(GS), which can only hold 1 TLS pointer.
With this CL, we will read the TLS pointer from the TEB TLS slot array,
located at 0x1480(GS). The TLS slot array can hold multiple
TLS pointers, up to 64, so multiple Go runtimes running on the
same thread can coexists with different TLS.
Each new TLS slot has to be allocated via [TlsAlloc],
which returns the slot index. This index can then be used to get the
slot offset from GS with the following formula: 0x1480 + index*8
The slot index is fixed per Go runtime, so we can store it
in runtime.tls_g and use it latter on to read/update the TLS pointer.
Loading the TLS pointer requires the following asm instructions:
MOVQ runtime.tls_g, AX
MOVQ AX(GS), AX
Notice that this approach is also implemented on windows/arm64.
[TEB]: https://en.wikipedia.org/wiki/Win32_Thread_Information_Block
[TlsAlloc]: https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-tlsalloc
Updates #22192
Change-Id: Idea7119fd76a3cd083979a4d57ed64b552fa101b
Reviewed-on: https://go-review.googlesource.com/c/go/+/431775
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Quim Muntal <quimmuntal@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
|
|
Adjust build constraints and change the runtime to call the C mmap function
when using cgo.
R=go1.20
For #53298
Change-Id: If9c3306dc16a8645d1bb9be0343e0472d6c4783f
Reviewed-on: https://go-review.googlesource.com/c/go/+/411274
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
On Mac OS X, the default stack size for non-main threads created by cgo is
fixed at 512KB and cannot be altered by setrlimit. This stack size is too
small for some recursive scenarios. We can solve this problem by explicitly
copying the stack size of the main thread when creating a new thread.
Change-Id: I400d5b2e929a1ee261502914a991e208759f64a8
GitHub-Last-Rev: b29c74599ea1cce4683d25e1ac8a70ffd9b86381
GitHub-Pull-Request: golang/go#53667
Reviewed-on: https://go-review.googlesource.com/c/go/+/415915
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: hopehook <hopehook@golangcn.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Updates #53466
Change-Id: I08ea279c905e265a579b6b3e23aee012165beaee
Reviewed-on: https://go-review.googlesource.com/c/go/+/431658
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Goutnik <dgoutnik@gmail.com>
|
|
Updates #46731
Change-Id: Ia83f27c177cc2f57e240cb5c6708d4552423f5be
Reviewed-on: https://go-review.googlesource.com/c/go/+/421879
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Some compilers default to having -fstack-protector on, which breaks
when using internal linking because the linker doesn't know how to
find the support functions.
Fixes #52919
Fixes #54313
Change-Id: I6f51d5e906503f61fc768ad8e30c163bad135087
Reviewed-on: https://go-review.googlesource.com/c/go/+/421935
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
For a cgo binary, at startup we set g0's stack bounds using the
address of a local variable (&size) in a C function x_cgo_init and
the stack size from pthread_attr_getstacksize. Normally, &size is
an address within the current stack frame. However, when it is
compiled with ASAN, it may be instrumented to __asan_stack_malloc_0
and the address may not live in the current stack frame, causing
the stack bound to be set incorrectly, e.g. lo > hi.
Using __builtin_frame_address(0) to get the stack address instead.
Change-Id: I41df929e5ed24d8bbf3e15027af6dcdfc3736e37
Reviewed-on: https://go-review.googlesource.com/c/go/+/419434
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
We occassionally see _beginthread failing with EACCES, meaning
"insufficient resources" according to the Microsoft documentation.
Exactly which resources is unclear.
Similar to pthread_create on unix systems, we can wait a bit and retry
to try to get success. The alternative is to abort, so we may as well
give it a try.
Fixes #52572.
Change-Id: I6e05add53b4ae36c61e53b1ee3fed6bc74e17dfa
Reviewed-on: https://go-review.googlesource.com/c/go/+/410355
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|