| Age | Commit message (Collapse) | Author |
|
All architectures supporting c-shared and c-archive share the same
initialization code in assembly, and most of it can be implemented in
pure Go.
Cq-Include-Trybots: luci.golang.try:gotip-darwin-arm64-longtest,gotip-linux-ppc64le_power10,gotip-linux-riscv64,gotip-linux-loong64,gotip-linux-s390x
Change-Id: Iaa9fb7d6f9ca8785f1098461646d607ef6b00d47
Reviewed-on: https://go-review.googlesource.com/c/go/+/706417
Auto-Submit: Quim Muntal <quimmuntal@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
Being able to call asmcgocall without a G is useful for code shared
between different stages of the runtime initialization and thread
creation.
Cq-Include-Trybots: luci.golang.try:gotip-darwin-arm64_15,gotip-linux-mips64le,gotip-linux-ppc64le_power10,gotip-linux-riscv64,gotip-openbsd-ppc64,gotip-openbsd-amd64
Change-Id: Ic427764de197e648e8b9987c98c3b7521512cc5c
Reviewed-on: https://go-review.googlesource.com/c/go/+/750541
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
crosscall1 has the same signature and behavior on all platforms except for ppc64x and s390x, fix that.
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64le_power9,gotip-linux-ppc64_power8,gotip-linux-s390x
Change-Id: Iead8b578a45787bb1fb2dd2dcfcb181514595637
Reviewed-on: https://go-review.googlesource.com/c/go/+/708235
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
On AIX, all externally linked programs call _rt0_ppc64_aix_lib,
as seen in runtime/cgo/gcc_aix_ppc64.c. The idea is that we
only do the library initialization is isarchive is set.
However, before this CL, AIX programs would call libpreinit
before checking isarchive. The effect was that signals were
initialized twice. That caused the signal code to record that
all signals had an existing forwarding address, namely the
Go signal handler that was always installed. This caused signal
forwarding to enter an infinite loop. This caused, for example,
"go test os" to hang forever when running TestStdPipe.
Fixes #76081
Change-Id: I5555f8c5e299d45549f5ce601568dc6b3cff4ecc
Reviewed-on: https://go-review.googlesource.com/c/go/+/727820
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Change-Id: I9f01692373623687e09bee54efebaac0ee361f81
Reviewed-on: https://go-review.googlesource.com/c/go/+/712664
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
procyield will currently loop infinitely if passed 0 on several
platforms. This change sidesteps this bug by renaming procyield to
procyieldAsm, and adding a wrapper named procyield that checks for
cycles == 0. The benefit of this structure is that procyield called
with a constant cycle count of 0 will be inlined and constant folded
away, the expected behavior of a procyield of 0 cycles.
A follow-up change will fix the assembly to not have this footgun
anymore.
Change-Id: I7068abfeb961bc0fa475e216836f7c0e46b38373
Reviewed-on: https://go-review.googlesource.com/c/go/+/712663
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
CL 706395 refactored the ppc64 library entry point and missed some
important aix-specific characteristics:
- _rt0_ppc64x_lib should account for the function descriptor when
getting the callback pointer.
- _rt0_ppc64x_lib should only call _cgo_sys_thread_create when
built as a c-archive.
Fixes #75801
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10
Change-Id: I343ca09d3b9688ffa585668a6c52f0ad519d6203
Reviewed-on: https://go-review.googlesource.com/c/go/+/710175
Reviewed-by: Paul Murphy <paumurph@redhat.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
|
|
CL 706395 consolidated the aix and elf startup code, further
update the comments to reflect this.
Change-Id: Iccb98008b3fe4a4b08e55ee822924fad76846cc2
Reviewed-on: https://go-review.googlesource.com/c/go/+/708355
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Paul Murphy <paumurph@redhat.com>
Reviewed-by: Quim Muntal <quimmuntal@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
AIX sets the argc and argv parameters in R14 and R15, but
_rt0_ppc64x_lib expects them to be in R3 and R4. Also, call reginit in
_rt0_ppc64x_lib.
These issues were oversights from CL 706395 which went unnoticed because
there if no LUCI aix/ppc64 builder (see #67299).
Change-Id: I93a2798739935fbcead3e6162b4b90db7e740aa5
Reviewed-on: https://go-review.googlesource.com/c/go/+/708255
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Paul Murphy <paumurph@redhat.com>
|
|
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64le_power10
Change-Id: Ifd7861488b1b47a5d30163552b51838f2bef7248
Reviewed-on: https://go-review.googlesource.com/c/go/+/706395
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
We're running into nosplit limits when compiled with all=-N -l.
Fixes #74910
Change-Id: I156263ae9b54ded240000001719512af86af70ee
Reviewed-on: https://go-review.googlesource.com/c/go/+/693557
Reviewed-by: Paul Murphy <paumurph@redhat.com>
Auto-Submit: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Change-Id: I25a9bbc247b2490e7e37ed843386f53a71822146
Reviewed-on: https://go-review.googlesource.com/c/go/+/682498
Reviewed-by: Paul Murphy <paumurph@redhat.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
It's not used for anything.
Change-Id: I031b3cdfe52b6b1cff4b3cb6713ffe588084542f
Reviewed-on: https://go-review.googlesource.com/c/go/+/652276
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
this removes the old conditional-on-register-value
handshake from the deferproc/deferprocstack logic.
The "line" for the recovery-exit frame itself (not the defers
that it runs) is the closing brace of the function.
Reduces code size slightly (e.g. go command is 0.2% smaller)
Sample output showing effect of this change, also what sort of
code it requires to observe the effect:
```
package main
import "os"
func main() {
g(len(os.Args) - 1) // stack[0]
}
var gi int
var pi *int = &gi
//go:noinline
func g(i int) {
switch i {
case 0:
defer func() {
println("g0", i)
q() // stack[2] if i == 0
}()
for j := *pi; j < 1; j++ {
defer func() {
println("recover0", recover().(string))
}()
}
default:
for j := *pi; j < 1; j++ {
defer func() {
println("g1", i)
q() // stack[2] if i == 1
}()
}
defer func() {
println("recover1", recover().(string))
}()
}
p()
} // stack[1] (deferreturn)
//go:noinline
func p() {
panic("p()")
}
//go:noinline
func q() {
panic("q()") // stack[3]
}
/* Sample output for "./foo foo":
recover1 p()
g1 1
panic: q()
goroutine 1 [running]:
main.q()
.../main.go:46 +0x2c
main.g.func3()
.../main.go:29 +0x48
main.g(0x1?)
.../main.go:37 +0x68
main.main()
.../main.go:6 +0x28
*/
```
Change-Id: Ie39ea62ecc244213500380ea06d44024cadc2317
Reviewed-on: https://go-review.googlesource.com/c/go/+/650795
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Cleanup all remaining trivial compares against $0 in ppc64x assembly.
In math, SRD ...,Rx; CMP Rx, $0 is further simplified to SRDCC.
Change-Id: Ia2bc204953e32f08ee142bfd06a91965f30f99b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/587016
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Adds runtime.debugPinnerV1 which returns a runtime.Pinner object that
pins itself. This is intended to be used by debuggers in conjunction
with runtime.debugCall to keep heap memory reachable even if it isn't
referenced from anywhere else.
Change-Id: I508ee6a7b103e68df83c96f2e04a0599200300dc
Reviewed-on: https://go-review.googlesource.com/c/go/+/558276
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Austin Clements <austin@google.com>
|
|
Define it as ABIInternal, so the result does not take space on
stack.
Also use R10 as a temporary register for arithmetics on SP, so it
is hidden from the assembler's SP delta calculation, which is
irrelevant anyway as we are on the system stack.
Change-Id: I8fed467601c19cad2d7afab26978246d15ce3147
Reviewed-on: https://go-review.googlesource.com/c/go/+/580918
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Change-Id: I8d1011509c4f0f529e97055280606603747a2e1a
Reviewed-on: https://go-review.googlesource.com/c/go/+/541775
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
|
|
This CL adds support for debugger function calls on linux ppc64le
platform. The protocol is basically the same as in CL 395754, except for
the following differences:
1, The abi differences which affect parameter passing and frame layout.
2, The closure register is R11.
3, Minimum framesize on pp64le is 32 bytes
4, Added functions to return parent context structure for general purpose
registers in order to work with the way these structures are defined in
ppc64le
Change-Id: I58e01fedad66a818ab322e2b2d8f5104cfa64f39
Reviewed-on: https://go-review.googlesource.com/c/go/+/512575
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Archana Ravindar <aravinda@redhat.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add runtime support for the openbsd/ppc64 port.
Updates #56001
Change-Id: I3cf010b34f96ce269858b02f16481fdad2cd5b77
Reviewed-on: https://go-review.googlesource.com/c/go/+/475618
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Eric Grosse <grosse@gmail.com>
|
|
On some platforms asmcgocall can be called with a nil g. Additionally, it
can be called when already on a the system (g0) stack or on a signal stack.
In these cases we do not need to switch (and/or cannot switch) to the
system stack and as a result, do not need to save the g.
Rework asmcgocall on ppc64x to follow the pattern used on other architectures,
such as amd64 and arm64, where a separate nosave path is called in the above
cases. The nil g case will be needed to support openbsd/ppc64.
Updates #56001
Change-Id: I431d4200bcbc4aaddeb617aefe18590165ff2927
Reviewed-on: https://go-review.googlesource.com/c/go/+/478775
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Fix spelling errors discovered using https://github.com/codespell-project/codespell. Errors in data files and vendored packages are ignored.
Change-Id: I83c7818222f2eea69afbd270c15b7897678131dc
GitHub-Last-Rev: 3491615b1b82832cc0064f535786546e89aa6184
GitHub-Pull-Request: golang/go#60758
Reviewed-on: https://go-review.googlesource.com/c/go/+/502576
Auto-Submit: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Change-Id: If13f4d4bc545f78e3eb8c23cf2e63f0eb273d71f
GitHub-Last-Rev: 32ca70f52a5c3dd66f18535c5e595e66afb3903c
GitHub-Pull-Request: golang/go#60703
Reviewed-on: https://go-review.googlesource.com/c/go/+/502055
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Surprisingly, it usually survived the call to flush a write
barrier. Usually.
Fixes #60368
Change-Id: I4792a57738e5829c79baebae4d13b62abe9526b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/499679
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This reapplies CL 485500, with a fix drafted in CL 492987 incorporated.
CL 485500 is reverted due to #60004 and #60007. #60004 is fixed in
CL 492743. #60007 is fixed in CL 492987 (incorporated in this CL).
[Original CL 485500 description]
This reapplies CL 481061, with the followup fixes in CL 482975, CL 485315, and
CL 485316 incorporated.
CL 481061, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 482975 is a followup fix to a C declaration in testprogcgo.
CL 485315 is a followup fix for x_cgo_getstackbound on Illumos.
CL 485316 is a followup cleanup for ppc64 assembly.
CL 479915 passed the G to _cgo_getstackbound for direct updates to
gp.stack.lo. A G can be reused on a new thread after the previous thread
exited. This could trigger the C TSAN race detector because it couldn't
see the synchronization in Go (lockextra) preventing the same G from
being used on multiple threads at the same time.
We work around this by passing the address of a stack variable to
_cgo_getstackbound rather than the G. The stack is generally unique per
thread, so TSAN won't see the same address from multiple threads. Even
if stacks are reused across threads by pthread, C TSAN should see the
synchonization in the stack allocator.
A regression test is added to misc/cgo/testsanitizer.
[Original CL 481061 description]
This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.
CL 392854, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C to Go calls after
the C code uses some stack, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug of CL 479915.
[Original CL 392854 description]
In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But, needm and dropm are heavy costs due to the signal-related syscalls.
So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call.
Instead, we only dropm while the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specified value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits.
When returning back to C:
Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time invoke a Go function from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.
This optimization is significant, and the specific value depends on the OS system and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
[CL 479915 description]
Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.
This CL makes needm get a more accurate stack bound from
pthread. (In some platforms this may still be a guess as we don't
know exactly where we are in the C stack), but it is probably
better than simply assuming 32K.
[CL 492987 description]
On the first call into Go from a C thread, currently we set the g0
stack's high bound imprecisely based on the SP. With CL 485500, we
keep the M and don't recompute the stack bounds when it calls into
Go again. If the first call is made when the C thread uses some
deep stack, but a subsequent call is made with a shallower stack,
the SP may be above g0.stack.hi.
This is usually okay as we don't check usually stack.hi. One place
where we do check for stack.hi is in the signal handler, in
adjustSignalStack. In particular, C TSAN delivers signals on the
g0 stack (instead of the usual signal stack). If the SP is above
g0.stack.hi, we don't see it is on the g0 stack, and throws.
This CL makes it get an accurate stack upper bound with the
pthread API (on the platforms where it is available).
Also add some debug print for the "handler not on signal stack"
throw.
Fixes #51676.
Fixes #59294.
Fixes #59678.
Fixes #60007.
Change-Id: Ie51c8e81ade34ec81d69fd7bce1fe0039a470776
Reviewed-on: https://go-review.googlesource.com/c/go/+/495855
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
This reverts CL 485500.
Reason for revert: This breaks internal tests at Google, see b/280861579 and b/280820455.
Change-Id: I426278d400f7611170918fc07c524cb059b9cc55
Reviewed-on: https://go-review.googlesource.com/c/go/+/492995
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Chressie Himpel <chressie@google.com>
|
|
This reapplies CL 481061, with the followup fixes in CL 482975, CL 485315, and
CL 485316 incorporated.
CL 481061, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 482975 is a followup fix to a C declaration in testprogcgo.
CL 485315 is a followup fix for x_cgo_getstackbound on Illumos.
CL 485316 is a followup cleanup for ppc64 assembly.
[Original CL 481061 description]
This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.
CL 392854, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C to Go calls after
the C code uses some stack, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug of CL 479915.
[Original CL 392854 description]
In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But, needm and dropm are heavy costs due to the signal-related syscalls.
So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call.
Instead, we only dropm while the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specified value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits.
When returning back to C:
Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time invoke a Go function from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.
This optimization is significant, and the specific value depends on the OS system and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
[CL 479915 description]
Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.
This CL makes needm get a more accurate stack bound from
pthread. (In some platforms this may still be a guess as we don't
know exactly where we are in the C stack), but it is probably
better than simply assuming 32K.
[CL 485500 description]
CL 479915 passed the G to _cgo_getstackbound for direct updates to
gp.stack.lo. A G can be reused on a new thread after the previous thread
exited. This could trigger the C TSAN race detector because it couldn't
see the synchronization in Go (lockextra) preventing the same G from
being used on multiple threads at the same time.
We work around this by passing the address of a stack variable to
_cgo_getstackbound rather than the G. The stack is generally unique per
thread, so TSAN won't see the same address from multiple threads. Even
if stacks are reused across threads by pthread, C TSAN should see the
synchonization in the stack allocator.
A regression test is added to misc/cgo/testsanitizer.
Fixes #51676.
Fixes #59294.
Fixes #59678.
Change-Id: Ic62be31a06ee83568215e875a891df37084e08ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/485500
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
|
|
For #59670.
Change-Id: I0efa743edc08e48dc8d906803ba45e9f641369db
Reviewed-on: https://go-review.googlesource.com/c/go/+/486977
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
|
|
This reverts commit CL 486381.
Submitted out of order and breaks bootstrap.
Change-Id: Ia472111cb966e884a48f8ee3893b3bf4b4f4f875
Reviewed-on: https://go-review.googlesource.com/c/go/+/486915
Reviewed-by: David Chase <drchase@google.com>
TryBot-Bypass: Austin Clements <austin@google.com>
|
|
For #59670.
Change-Id: I4476d6f92663e8a825d063d6e6a7fc9a2ac99d4d
Reviewed-on: https://go-review.googlesource.com/c/go/+/486381
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This reapplies CL 481075, which was a reappliation of CL 478917.
This CL has been reverted twice now due to conflicts with CL 392854 /
CL 481061, which had bugs and had to be reverted.
Now this CL skips the conflicting changes to runtime/cgo/asm_ppc64x.s,
which will be merged directly into a new version of CL 392854 /
CL 481061. That way, if there are _more_ issues, this CL need not be
involved in any more reverts.
Change-Id: I2801b918faf9418dd0edff19f2a63f4d9e08896c
Reviewed-on: https://go-review.googlesource.com/c/go/+/485335
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This reverts CL 481061.
Reason for revert: When built with C TSAN, x_cgo_getstackbound triggers
race detection on `g->stacklo` because the synchronization is in Go,
which isn't instrumented.
For #51676.
For #59294.
For #59678.
Change-Id: I38afcda9fcffd6537582a39a5214bc23dc147d47
Reviewed-on: https://go-review.googlesource.com/c/go/+/485275
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
|
|
on PPC64"""
This reverts CL 481075 (a re-apply of previously reverted CL 478917).
Reason for revert: CL 481061 causes C TSAN failures and must be
reverted. See CL 485275. This CL depends on CL 481061.
For #59678.
Change-Id: I4bf7f43d9df1ae28e04cd4065552bcbee82ef13f
Reviewed-on: https://go-review.googlesource.com/c/go/+/485316
Reviewed-by: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
|
|
This reverts CL 481059, which in turn reverts CL 478917.
Reason for revert: reapply the original CL.
Change-Id: Icf6bb6a620313b44fadcc7f69a62fdbb943e34fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/481075
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
|
|
This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.
CL 392854, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C to Go calls after
the C code uses some stack, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug of CL 479915.
[Original CL 392854 description]
In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But, needm and dropm are heavy costs due to the signal-related syscalls.
So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call.
Instead, we only dropm while the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specified value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits.
When returning back to C:
Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time invoke a Go function from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.
This optimization is significant, and the specific value depends on the OS system and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
[CL 479915 description]
Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.
This CL makes needm get a more accurate stack bound from
pthread. (In some platforms this may still be a guess as we don't
know exactly where we are in the C stack), but it is probably
better than simply assuming 32K.
Fixes #51676.
Fixes #59294.
Change-Id: I9bf1400106d5c08ce621d2ed1df3a2d9e3f55494
Reviewed-on: https://go-review.googlesource.com/c/go/+/481061
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: DeJiang Zhu (doujiang) <doujiang24@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This reverts CL 392854.
Reason for revert: caused #59294, which was derived from google
internal tests. The attempted fix of #59294 caused more breakage.
Change-Id: I5a061561ac2740856b7ecc09725ac28bd30f8bba
Reviewed-on: https://go-review.googlesource.com/c/go/+/481060
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This reverts CL 478917.
Reason for revert: need to revert CL 392854, and this caused a conflict.
Change-Id: I02c3285de5635b431a99adc8790c8310d1c4e6a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/481059
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
|
|
This is a cleanup to allow a consistent definitions of a function
descriptor on code shared between AIX and Linux. They need to be
declared in slightly different ways, but we can hide that in one
macro.
And, update all usage.
Change-Id: I10f3580473db555b4fb4d2597b856f3a67d01a53
Reviewed-on: https://go-review.googlesource.com/c/go/+/478917
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Paul Murphy <murp@ibm.com>
|
|
In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But, needm and dropm are heavy costs due to the signal-related syscalls.
So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call.
Instead, we only dropm while the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specified value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits.
When returning back to C:
Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time invoke a Go function from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.
This optimization is significant, and the specific value depends on the OS system and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Fixes #51676
Change-Id: I380702fe2f9b6b401b2d6f04b0aba990f4b9ee6c
GitHub-Last-Rev: 93dc64ad98e5583372e41f65ee4b7ab78b5aff51
GitHub-Pull-Request: golang/go#51679
Reviewed-on: https://go-review.googlesource.com/c/go/+/392854
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: thepudds <thepudds1460@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Rather than implying that all ppc64 GOARCHs use function descriptors,
provide a define for platforms that make use of function descriptors.
Condition on GO_PPC64X_HAS_FUNCDESC when choosing whether or not
to load the entry address from the first slot of the function
descriptor.
Updates #56001.
Change-Id: I9cdc788f2de70a1262c17d8485b555383d1374b5
Reviewed-on: https://go-review.googlesource.com/c/go/+/476117
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Have the write barrier call return a pointer to a buffer into which
the generated code records pointers that need write barrier treatment.
Change-Id: I7871764298e0aa1513de417010c8d46b296b199e
Reviewed-on: https://go-review.googlesource.com/c/go/+/447781
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Bypass: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Previously, the write barrier calls themselves did the actual
writes to memory. Instead, move those writes out to a common location
that both the wb-enabled and wb-disabled code paths share.
This enables us to optimize the write barrier path without having
to worry about performing the actual writes.
Change-Id: Ia71ab651908ec124cc33141afb52e4ca19733ac6
Reviewed-on: https://go-review.googlesource.com/c/go/+/447780
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Bypass: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Future CLs will remove the invariant that pointers are always put in
the write barrier in pairs.
The behavior of the assembly code changes a bit, where instead of writing
the pointers unconditionally and then checking for overflow, check for
overflow first and then write the pointers.
Also changed the write barrier flush function to not take the src/dst
as arguments.
Change-Id: I2ef708038367b7b82ea67cbaf505a1d5904c775c
Reviewed-on: https://go-review.googlesource.com/c/go/+/447779
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Bypass: Keith Randall <khr@golang.org>
|
|
In the fix for 54332 the MOVD R1, R1 instruction was added to
morestack_noctxt function to set the SPWRITE bit. However, the
instruction MOVD R1, R1 results in or r1,r1,r1 which is a special
instruction on ppc64 architecture as it changes the thread priority
and can negatively impact performance in some cases.
More details on such similar nops can be found in Power ISA v3.1
Book II on Power ISA Virtual Environment architecture in the chapter
on Program Priority Registers and Or instructions.
Replacing this by OR R0, R1 has the same affect on setting SPWRITE as
needed by the first fix but does not affect thread priority and
hence does not cause the degradation in performance
Hash65536-64 2.81GB/s ±10% 16.69GB/s ± 0% +494.44%
Fixes #57741
Change-Id: Ib912e3716c6afd277994d6c1c5b2891f82225d50
Reviewed-on: https://go-review.googlesource.com/c/go/+/461597
Reviewed-by: Benny Siegert <bsiegert@gmail.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Auto-Submit: Benny Siegert <bsiegert@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
On LR architectures, morestack (and morestack_noctxt) are called
with a special calling convention, where the caller doesn't save
LR on stack but passes it as a register, which morestack will save
to g.sched.lr. The stack unwinder currently doesn't understand it,
and would fail to unwind from it. morestack already writes SP (as
it switches stack), but morestack_noctxt (which tailcalls
morestack) doesn't. If a profiling signal lands right in
morestack_noctxt, the unwinder will try to unwind the stack and
go off, and possibly crash.
Marking morestack_noctxt SPWRITE stops the unwinding.
Ideally we could teach the unwinder about the special calling
convention, or change the calling convention to be less special
(so the unwinder doesn't need to fetch a register from the signal
context). This is a stop-gap solution, to stop the unwinder from
crashing.
Fixes #54332.
Change-Id: I75295f2e27ddcf05f1ea0b541aedcb9000ae7576
Reviewed-on: https://go-review.googlesource.com/c/go/+/425396
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Currently runtime.Breakpoint generates a SIGSEGV in ppc64.
The solution is an unconditional trap similar to what clang and gcc do. It is documented in the section C.6 of the ABI Book 3.
Fixes #52101
Change-Id: I071d2f2679b695ef268445b04c9222bd74e1f9af
GitHub-Last-Rev: fff4e5e8ffe23bf0cef135b22abd2cc0a3838613
GitHub-Pull-Request: golang/go#52102
Reviewed-on: https://go-review.googlesource.com/c/go/+/397554
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Paul Murphy <murp@ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
They are usually needed when internally linking gcc code
compiled with -Os. These are typically generated by ld
or gold, but are missing when linking internally.
The PPC64 ELF ABI describes a set of functions to save/restore
non-volatile, callee-save registers using R1/R0/R12:
_savegpr0_n: Save Rn-R31 relative to R1, save LR (in R0), return
_restgpr0_n: Restore Rn-R31 from R1, and return to saved LR
_savefpr_n: Save Fn-F31 based on R1, and save LR (in R0), return
_restfpr_n: Restore Fn-F31 from R1, and return to 16(R1)
_savegpr1_n: Save Rn-R31 based on R12, return
_restgpr1_n: Restore Rn-R31 based on R12, return
_savevr_m: Save VRm-VR31 based on R0, R12 is scratch, return
_restvr_m: Restore VRm-VR31 based on R0, R12 is scratch, return
m is a value 20<=m<=31
n is a value 14<=n<=31
Add several new functions similar to those suggested by the
PPC64 ELFv2 ABI. And update the linker to scan external relocs
for these calls, and redirect them to runtime.elf_<func>+offset
in runtime/asm_ppc64x.go.
Similarly, code which generates plt stubs is moved into
a dedicated function. This avoids an extra scan of relocs.
fixes #52336
Change-Id: I2f0f8b5b081a7b294dff5c92b4b1db8eba9a9400
Reviewed-on: https://go-review.googlesource.com/c/go/+/400796
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Change-Id: Ie058c0549167b256ad943a0134907df3aca4a69f
Reviewed-on: https://go-review.googlesource.com/c/go/+/394215
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
regabireflect goexperiment was helpful in the register ABI
development, to control code paths for reflect calls, before the
compiler can generate register ABI everywhere. It is not necessary
for now. Drop it.
Change-Id: I2731197d2f496e29616c426a01045c9b685946a4
Reviewed-on: https://go-review.googlesource.com/c/go/+/393362
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
CL 344955 and CL 359476 removed almost all // +build lines, but leaving
some assembly files and generating scripts. Also, some files were added
with // +build lines after CL 359476 was merged. Remove these or rename
files where more appropriate.
For #41184
Change-Id: I7eb85a498ed9788b42a636e775f261d755504ffa
Reviewed-on: https://go-review.googlesource.com/c/go/+/361480
Trust: Tobias Klauser <tobias.klauser@gmail.com>
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
|