| Age | Commit message (Collapse) | Author |
|
CL 743940 puts an atomic bool initialization-done channel, which
is a nice optimization. In race mode, however, the atomic bool
does not have race instrumentation, so the race detector doesn't
know the happens-before edge between the initialization and the
cgo callback.
To tell the race detector about this synchronization, just use the
channel unconditionally in race mode. Channel operations have the
proper race instrumentation. Alternatively, we could explicitly
instrument the atomic boolean operations.
TODO: write a test.
Change-Id: I466b20a46cd39d2bbe2149a9009e1a01b15891e2
Reviewed-on: https://go-review.googlesource.com/c/go/+/746581
Reviewed-by: Ian Lance Taylor <iant@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Calls to Go functions from threads not started by Go have to wait
for Go initialization to be complete. Before this CL they did this
by receiving from a channel that is closed when initialization is done.
That works well but introduces a channel operation into cgo calls.
This is particularly frustrating because the channel is approximately
always closed.
This CL adds an atomic bool before the channel, so that in the normal
case we are just adding a single locked memory load. We still use
the channel as a fallback.
For #77522
Change-Id: I8f609bf349bb0f836cefa5f6ad6d0c3c7bbfe5e0
Reviewed-on: https://go-review.googlesource.com/c/go/+/743940
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
|
|
When we first implemented DIT (crypto/subtle.WithDataIndependentTiming),
we made it so that enabling DIT on a goroutine would lock that goroutine
to its current OS thread. This was done to ensure that the DIT state
(which is per-thread) would not leak to other goroutines. We also did
not make goroutines inherit the DIT state.
This change makes goroutines inherit the DIT state from their parent
at creation time. It also removes the OS thread locking when enabling
DIT on a goroutine. Instead, we now set the DIT state on the OS thread
in the scheduler whenever we switch to a goroutine that has DIT enabled,
and we unset it when switching to a goroutine that has DIT disabled.
We add a new field to G and M, ditEnabled, to track whether the G wants
DIT enabled, and whether the M currently has DIT enabled, respectively.
When the scheduler executes a goroutine, it checks these fields and
enables/disables DIT on the thread as needed.
Additionally, cgocallbackg is updated to check if DIT is enabled when
being called from C, and sets the G and M fields accordingly. This
ensures that if DIT was enabled/disabled in C, the correct state will be
reflected in the Go runtime.
The behavior as it currently stands is as follows:
- The function passed to crypto/subtle.WithDataIndependentTiming
will have DIT enabled.
- Any goroutine created within that function will inherit DIT enabled
for its lifetime. Any goroutine created from subquent goroutines will
also inherit DIT enabled for their lifetimes.
- Calling into a C function within from a goroutine with DIT enabled
will have DIT enabled.
- If the C code disables DIT, the goroutine will have DIT re-enabled
when returning to Go.
- If the C code enables DIT, the goroutine will have DIT disabled
when returning to Go if it was not previously enabled.
- Calling back into Go code from C will have DIT enabled if it was
enabled when calling into C, or if the C code enabled it.
Change-Id: I8e91e6df13bb88e56e1036e0e0e5f04efd8eebd3
Reviewed-on: https://go-review.googlesource.com/c/go/+/726382
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
When using `any` as param or return type of an exported
function, we currently have the error `unrecognized Go
type any`. `any` is an alias of `interface{}` which is
already supported.
This would avoid such change: https://github.com/php/frankenphp/pull/1976
Fixes #76340
Change-Id: I301838ff72e99ae78b035a8eff2405f6a145ed1a
GitHub-Last-Rev: 7dfbccfa582bbc6e79ed29677391b9ae81a9b5bd
GitHub-Pull-Request: golang/go#76325
Reviewed-on: https://go-review.googlesource.com/c/go/+/720960
Reviewed-by: Mark Freeman <markfreeman@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Change-Id: I6a6a636cf38ddb1dc6f2170361eb4093b81acdfb
Reviewed-on: https://go-review.googlesource.com/c/go/+/722521
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
This CL improves the error messages after panics due to the
sharing of an unpinned Go pointer (or a pointer to an unpinned Go
pointer) between Go and C.
This occurs when it is:
1. returned from Go to C (through cgoCheckResult)
2. passed as argument to a C function (through cgoCheckPointer).
An unpinned Go pointer refers to a memory location that might be moved or
freed by the garbage collector.
Therefore:
- change the signature of cgoCheckArg (it does the real work behind
cgoCheckResult and cgoCheckPointer)
- change the signature of cgoCheckUnknownPointer (called by cgoCheckArg
for checking unexpected pointers)
- introduce cgoFormatErr (it is called by cgoCheckArg and
cgoCheckUnknownPointer to format panic error messages)
- update the cgo pointer tests (add new tests, and a field errTextRegexp
to the struct ptrTest)
- remove a loop variable in TestPointerChecks (similar to CL 711640).
1. cgoCheckResult
When an unpinned Go pointer (or a pointer to an unpinned Go pointer) is
returned from Go to C,
1 package main
2
3 /*
4 #include <stdio.h>
5
6 extern void* GoFoo();
7
8 static void CFoo() { GoFoo();}
9 */
10 import (
11 "C"
12 )
13
14 //export GoFoo
15 func GoFoo() map[int]int {
16 return map[int]int{0: 1,}
17 }
18
19 func main() {
20 C.CFoo();
21 }
This error shows up at runtime:
panic: runtime error: cgo result is unpinned Go pointer or points to unpinned Go pointer
goroutine 1 [running]:
main._Cfunc_CFoo()
_cgo_gotypes.go:46 +0x3a
main.main()
/mnt/tmp/parse.go:20 +0xf
exit status 2
GoFoo is the faulty Go function; it is not mentioned in the error message.
Moreover the error does not say which kind of pointer caused the panic; for
instance, a Go map.
Retrieve name and file/line of the Go function, as well as the kind of
pointer; use them in the error message:
panic: runtime error: /mnt/tmp/parse.go:15: result of Go function GoFoo called from cgo is unpinned Go map or points to unpinned Go map
goroutine 1 [running]:
main._Cfunc_CFoo()
_cgo_gotypes.go:46 +0x3a
main.main()
/mnt/tmp/parse.go:20 +0xf
exit status 2
2. cgoCheckPointer
When a pointer to an unpinned Go pointer is passed to a C function,
1 package main
2
3 /*
4 #include <stdio.h>
5 void foo(void *bar) {}
6 */
7 import "C"
8 import "unsafe"
9
10 func main() {
11 m := map[int]int{0: 1,}
12 C.foo(unsafe.Pointer(&m))
13 }
This error shows up at runtime:
panic: runtime error: cgo argument has Go pointer to unpinned Go pointer
goroutine 1 [running]:
main.main.func1(...)
/mnt/tmp/cgomap.go:12
main.main()
/mnt/tmp/cgomap.go:12 +0x91
exit status 2
Retrieve kind of pointer; use it in the error message.
panic: runtime error: argument of cgo function has Go pointer to unpinned Go map
goroutine 1 [running]:
main.main.func1(...)
/mnt/tmp/cgomap.go:12
main.main()
/mnt/tmp/cgomap.go:12 +0x9b
exit status 2
Link: https://pkg.go.dev/cmd/cgo#hdr-Passing_pointers
Suggested-by: Ian Lance Taylor <iant@golang.org>
Fixes #75856
Change-Id: Ia72f01df016feeae0cddb2558ced51a1b07e4486
GitHub-Last-Rev: 76257c7dd7c52a3d88234d43ec7f22bda81edcdd
GitHub-Pull-Request: golang/go#75894
Reviewed-on: https://go-review.googlesource.com/c/go/+/711801
Reviewed-by: Funda Secgin <fundasecgin73@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
The description is misleading: cgoCheckArg is called by both
cgoCheckPointer and cgoCheckResult.
Mention cgoCheckResult in the cgoCheckArg description. Remove extra
spaces between words.
For #75856
Change-Id: I6780cda76b5cb7b4f9af3fbaa37a6c5099cc8d7d
GitHub-Last-Rev: 531928b679ab138bf9bfe47a1bf6470d582776f3
GitHub-Pull-Request: golang/go#75992
Reviewed-on: https://go-review.googlesource.com/c/go/+/713520
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
This info makes more sense in the flags instead of as a high
bit of the kind. This makes kind access simpler because we now
don't need to mask anything.
Cleaned up most direct field accesses to use methods instead.
(reflect making new types is the only remaining direct accessor.)
IfaceIndir -> !IsDirectIface everywhere.
gocore has been updated to handle the new location. So has delve.
TODO: any other tools need updating?
Change-Id: I123f97a4d4bdd0bff1641ee7e276d1cc0bd7e8eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/681936
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Factor out the code related to doing calls using the Windows stdcall
calling convention into a separate package. This will allow us to
reuse it in other low-level packages that can't depend on syscall.
Updates #51087.
Cq-Include-Trybots: luci.golang.try:gotip-windows-arm64,gotip-windows-amd64-longtest,gotip-solaris-amd64
Change-Id: I68640b07091183b50da6bef17406c10a397896e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/689156
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
mp.isExtraInC is intended to indicate that this M has no Go frames at
all; it is entirely executing in C.
If there was a cgocallback to Go and then a cgocall to C, such that the
leaf frames are C, that is fine. e.g., traceback can handle this fine
with SetCgoTraceback (or by simply skipping the C frames).
However, we currently mismanage isExtraInC, unconditionally setting it
on return from cgocallback. This means that if there are two levels of
cgocallback, we end up running Go code with isExtraInC set.
1. C-created thread calls into Go function 1 (via cgocallback).
2. Go function 1 calls into C function 1 (via cgocall).
3. C function 1 calls into Go function 2 (via cgocallback).
4. Go function 2 returns back to C function 1 (returning via the remainder of cgocallback).
5. C function 1 returns back to Go function 1 (returning via the remainder of cgocall).
6. Go function 1 is now running with mp.isExtraInC == true.
The fix is simple; only set isExtraInC on return from cgocallback if
there are no more Go frames. There can't be more Go frames unless there
is an active cgocall out of the Go frames.
Fixes #72870.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest
Change-Id: I6a6a636c4e7ba75a29639d7036c5af3738033467
Reviewed-on: https://go-review.googlesource.com/c/go/+/658035
Reviewed-by: Cherry Mui <cherryyz@google.com>
Commit-Queue: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Add a new function, WithDataIndependentTiming, which takes a function as
an argument, and encloses it with calls to set/unset the DIT PSTATE bit
on Arm64.
Since DIT is OS thread-local, for the duration of the execution of
WithDataIndependentTiming, we lock the goroutine to the OS thread, using
LockOSThread. For long running operations, this is likely to not be
performant, but we expect this to be tightly scoped around cryptographic
operations that have bounded execution times.
If locking to the OS thread turns out to be too slow, another option is
to add a bit to the g state indicating if a goroutine has DIT enabled,
and then have the scheduler enable/disable DIT when scheduling a g.
Additionally, we add a new GODEBUG, dataindependenttiming, which allows
setting DIT for an entire program. Running a program with
dataindependenttiming=1 enables DIT for the program during
initialization. In an ideal world PSTATE.DIT would be inherited from
the parent thread, so we'd only need to set it in the main thread and
then all subsequent threads would inherit the value. While this does
happen in the Linux kernel [0], it is not the case for darwin [1].
Rather than add complex logic to only set it on darwin for each new
thread, we just unconditionally set it in mstart1 and cgocallbackg1
regardless of the OS. DIT will already impose some overhead, and the
cost of setting the bit is only ~two instructions (CALL, MSR), so it
should be cheap enough.
Fixes #66450
Updates #49702
[0] https://github.com/torvalds/linux/blob/e8bdb3c8be08c9a3edc0a373c0aa8729355a0705/arch/arm64/kernel/process.c#L373
[1] https://github.com/apple-oss-distributions/xnu/blob/8d741a5de7ff4191bf97d57b9f54c2f6d4a15585/osfmk/arm64/status.c#L1666
Change-Id: I78eda691ff9254b0415f2b54770e5850a0179749
Reviewed-on: https://go-review.googlesource.com/c/go/+/598336
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Currently, at a cgo callback where there is already a Go frame on
the stack (i.e. C->Go->C->Go), we require that at the inner Go
callback the SP is within the g0's stack bounds set by a previous
callback. This is to prevent that the C code switches stack while
having a Go frame on the stack, which we don't really support. But
this could also happen when we cannot get accurate stack bounds,
e.g. when pthread_getattr_np is not available. Since the stack
bounds are just estimates based on the current SP, if there are
multiple C->Go callbacks with various stack depth, it is possible
that the SP of a later callback falls out of a previous call's
estimate. This leads to runtime throw in a seemingly reasonable
program.
This CL changes it to save the old g0 stack bounds at cgocallback,
update the bounds, and restore the old bounds at return. So each
callback will get its own stack bounds based on the current SP,
and when it returns, the outer callback has the its old stack
bounds restored.
Also, at a cgo callback when there is no Go frame on the stack,
we currently always get new stack bounds. We do this because if
we can only get estimated bounds based on the SP, and the stack
depth varies a lot between two C->Go calls, the previous
estimates may be off and we fall out or nearly fall out of the
previous bounds. But this causes a performance problem: the
pthread API to get accurate stack bounds (pthread_getattr_np) is
very slow when called on the main thread. Getting the stack bounds
every time significantly slows down repeated C->Go calls on the
main thread.
This CL fixes it by "caching" the stack bounds if they are
accurate. I.e. at the second time Go calls into C, if the previous
stack bounds are accurate, and the current SP is in bounds, we can
be sure it is the same stack and we don't need to update the bounds.
This avoids the repeated calls to pthread_getattr_np. If we cannot
get the accurate bounds, we continue to update the stack bounds
based on the SP, and that operation is very cheap.
On a Linux/AMD64 machine with glibc:
name old time/op new time/op delta
CgoCallbackMainThread-8 96.4µs ± 3% 0.1µs ± 2% -99.92% (p=0.000 n=10+9)
Fixes #68285.
Fixes #68587.
Change-Id: I3422badd5ad8ff63e1a733152d05fb7a44d5d435
Reviewed-on: https://go-review.googlesource.com/c/go/+/600296
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
We were missing a case for calling a C function with an index
into a pointer-to-array.
Fixes #70016
Change-Id: I9c74d629e58722813c1aaa0f0dc225a5a64d111b
Reviewed-on: https://go-review.googlesource.com/c/go/+/621576
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
As of CL 580255, the runtime tracks the frame pointer (or base pointer,
bp) when entering syscalls, so that we can use fpTracebackPCs on
goroutines that are sitting in syscalls. That CL mostly got things
right, but missed one very subtle detail.
When calling from Go->C->Go, the goroutine stack performing the calls
when returning to Go is free to move around in memory due to growth,
shrinking, etc. But upon returning back to C, it needs to restore
gp.syscall*, including gp.syscallsp and gp.syscallbp. The way syscallsp
currently gets updated is automagically: it's stored as an
unsafe.Pointer on the stack so that it shows up in a stack map. If the
stack ever moves, it'll get updated correctly. But gp.syscallbp isn't
saved to the stack as an unsafe.Pointer, but rather as a uintptr, so it
never gets updated! As a result, in rare circumstances, fpTracebackPCs
can correctly try to use gp.syscallbp as the starting point for the
traceback, but the value is stale.
This change fixes the problem by just storing gp.syscallbp to the stack
on cgocallback as an unsafe.Pointer, like gp.syscallsp. It also adds a
comment documenting this subtlety; the lack of explanation for the
unsafe.Pointer type on syscallsp meant this detail was missed -- let's
not miss it again in the future.
Now, we have a fix, what about a test? Unfortunately, testing this is
going to be incredibly annoying because the circumstances under which
gp.syscallbp are actually used for traceback are non-deterministic and
hard to arrange, especially from within testprogcgo where we don't have
export_test.go and can't reach into the runtime.
So, instead, add a gp.syscallbp check to reentersyscall and
entersyscallblock that mirrors the gp.syscallbp consistency check. This
probably causes some miniscule slowdown to the syscall path, but it'll
catch the issue without having to actually perform a traceback.
Fixes #69085.
Change-Id: Iaf771758f1666024b854f5fbe2b2c63cbe35b201
Reviewed-on: https://go-review.googlesource.com/c/go/+/608775
Reviewed-by: Nick Ripley <nick.ripley@datadoghq.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Cleanup and friction reduction
For #65355.
Change-Id: Ia14c9dc584a529a35b97801dd3e95b9acc99a511
Reviewed-on: https://go-review.googlesource.com/c/go/+/600436
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
For #67401.
Change-Id: I3ae93042dffd0683b7e6d6225536ae667749515b
Reviewed-on: https://go-review.googlesource.com/c/go/+/587221
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
callbackUpdateSystemStack contains a fast path to exit early without
update if SP is already within the g0.stack bounds.
This is not safe, as a subsequent call may have new stack bounds that
only partially overlap the old stack bounds. In this case it is possible
to see an SP that is in the old stack bounds, but very close to the
bottom of the bounds due to the partial overlap. In that case we're very
likely to "run out" of space on the system stack.
We only need to do this on extra Ms, as normal Ms have precise bounds
defined when we allocated the stack.
TSAN annotations are added to x_cgo_getstackbounds because bounds is a
pointer into the Go stack. The stack can be reused when an old thread
exits and a new thread starts, but TSAN can't see the synchronization
there. This isn't a new case, but we are now calling more often.
Fixes #62440.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest
Change-Id: I5389050494987b7668d0b317fb92f85e61d798ac
Reviewed-on: https://go-review.googlesource.com/c/go/+/584597
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Currently the runtime only tracks the PC and SP upon entering a syscall,
but not the FP (BP). This is mainly for historical reasons, and because
the tracer (which uses the frame pointer unwinder) does not need it.
Until it did, of course, in CL 567076, where the tracer tries to take a
stack trace of a goroutine that's in a syscall from afar. It tries to
use gp.sched.bp and lots of things go wrong. It *really* should be using
the equivalent of gp.syscallbp, which doesn't exist before this CL.
This change introduces gp.syscallbp and tracks it. It also introduces
getcallerfp which is nice for simplifying some code. Because we now have
gp.syscallbp, we can also delete the frame skip count computation in
traceLocker.GoSysCall, because it's now the same regardless of whether
frame pointer unwinding is used.
Fixes #66889.
Change-Id: Ib6d761c9566055e0a037134138cb0f81be73ecf7
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-nocgo
Reviewed-on: https://go-review.googlesource.com/c/go/+/580255
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This change removes the allocheaders, deleting all the old code and
merging mbitmap_allocheaders.go back into mbitmap.go.
This change also deletes the SetType benchmarks which were already
broken in the new GOEXPERIMENT (it's harder to set up than before). We
weren't really watching these benchmarks at all, and they don't provide
additional test coverage.
Change-Id: I135497201c3259087c5cd3722ed3fbe24791d25d
Reviewed-on: https://go-review.googlesource.com/c/go/+/567200
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
For #59670
Change-Id: Id66e102f13e529dd041b68ce869026a56f0a1b9b
GitHub-Last-Rev: 43aa9376f72bc02a9d86518cdc99494a6b2f8573
GitHub-Pull-Request: golang/go#65564
Reviewed-on: https://go-review.googlesource.com/c/go/+/562298
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Austin Clements <austin@google.com>
|
|
Windows syscall.SyscallN currently calls lockOSThread for every syscall.
This can be expensive and produce unnecessary context switches,
especially when the syscall is called frequently under high contention.
The lockOSThread was necessary to ensure that cgocall wouldn't
reschedule the goroutine to a different M, as the syscall return values
are reported back in the M struct.
This CL instructs cgocall to copy the syscall return values into the
the M that will see the caller on return, so the caller no longer needs
to call lockOSThread.
Updates #58336.
Cq-Include-Trybots: luci.golang.try:gotip-windows-arm64,gotip-windows-amd64-longtest
Change-Id: If6644fd111dbacab74e7dcee2afa18ca146735da
Reviewed-on: https://go-review.googlesource.com/c/go/+/562915
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Emmanuel Odeke <emmanuel@orijtech.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Change-Id: Ib78c1513616089f4942297cd17212b1b11871fd5
GitHub-Last-Rev: f97fe5b5bffffe25dc31de7964588640cb70ec41
GitHub-Pull-Request: golang/go#65819
Reviewed-on: https://go-review.googlesource.com/c/go/+/565515
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This change replaces the 1-bit-per-word heap bitmap for most size
classes with allocation headers for objects that contain pointers. The
header consists of a single pointer to a type. All allocations with
headers are treated as implicitly containing one or more instances of
the type in the header.
As the name implies, headers are usually stored as the first word of an
object. There are two additional exceptions to where headers are stored
and how they're used.
Objects smaller than 512 bytes do not have headers. Instead, a heap
bitmap is reserved at the end of spans for objects of this size. A full
word of overhead is too much for these small objects. The bitmap is of
the same format of the old bitmap, minus the noMorePtrs bits which are
unnecessary. All the objects <512 bytes have a bitmap less than a
pointer-word in size, and that was the granularity at which noMorePtrs
could stop scanning early anyway.
Objects that are larger than 32 KiB (which have their own span) have
their headers stored directly in the span, to allow power-of-two-sized
allocations to not spill over into an extra page.
The full implementation is behind GOEXPERIMENT=allocheaders.
The purpose of this change is performance. First and foremost, with
headers we no longer have to unroll pointer/scalar data at allocation
time for most size classes. Small size classes still need some
unrolling, but their bitmaps are small so we can optimize that case
fairly well. Larger objects effectively have their pointer/scalar data
unrolled on-demand from type data, which is much more compactly
represented and results in less TLB pressure. Furthermore, since the
headers are usually right next to the object and where we're about to
start scanning, we get an additional temporal locality benefit in the
data cache when looking up type metadata. The pointer/scalar data is
now effectively unrolled on-demand, but it's also simpler to unroll than
before; that unrolled data is never written anywhere, and for arrays we
get the benefit of retreading the same data per element, as opposed to
looking it up from scratch for each pointer-word of bitmap. Lastly,
because we no longer have a heap bitmap that spans the entire heap,
there's a flat 1.5% memory use reduction. This is balanced slightly by
some objects possibly being bumped up a size class, but most objects are
not tightly optimized to size class sizes so there's some memory to
spare, making the header basically free in those cases.
See the follow-up CL which turns on this experiment by default for
benchmark results. (CL 538217.)
Change-Id: I4c9034ee200650d06d8bdecd579d5f7c1bbf1fc5
Reviewed-on: https://go-review.googlesource.com/c/go/+/437955
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
With the introduction of runtime.Pinner, returning a pointer to a pinned
struct that then points to an unpinned Go pointer is correctly caught.
However, the error message remained as "cgo result has Go pointer",
which should be updated to acknowledge that Go pointers to pinned
memory are allowed.
This also updates the comments for cgoCheckArg and cgoCheckResult
to similarly clarify.
Updates #46787
Change-Id: I147bb09e87dfb70a24d6d43e4cf84e8bcc2aff48
GitHub-Last-Rev: 706facb9f2bf28e1f6e575b7626f8feeca1187cf
GitHub-Pull-Request: golang/go#62606
Reviewed-on: https://go-review.googlesource.com/c/go/+/527702
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
|
|
For cgo callbacks, currently cgocallbackg locks the OS thread and
then call cgocallbackg1, which invokes the actual callback, and
then unlocks the OS thread in a deferred call. cgocallback then
continues assuming we are on the same M. This assumes there is no
preemption point between the deferred unlockOSThread and returning
to the caller (cgocallbackg). But this is not always true. E.g.
when open defer is not used (e.g. PIE or shared build mode on 386),
there is a preemption point in deferreturn after invoking the
deferred function (when it checks whether there are still defers
to run).
Instead of relying on and requiring the defer implementation has
no preemption point, we move the unlockOSThread to the caller, and
ensuring no preemption by setting incgo to true before unlocking.
This doesn't cover the panicking path, so we also adds an
unlockOSThread there. There we don't need to worry about preemption,
because we're panicking out of the callback and we have unwound the
g0 stack, instead of reentering cgo.
Fixes #62102.
Change-Id: I0e0b9f9091be88d01675c0acb7339b81402545be
Reviewed-on: https://go-review.googlesource.com/c/go/+/532615
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
[This is an unmodified redo of CL 527056.]
Standard Ms set g0.stackguard1 to the same value as stackguard0 in
mstart0. For consistency, extra Ms should do the same for their g0. Do
this in needm -> callbackUpdateSystemStack.
Background: getg().stackguard1 is used as the stack guard for the stack
growth prolouge in functions marked //go:systemstack [1]. User Gs set
stackguard1 to ^uintptr(0) so that the check always fail, calling
morestackc, which throws to report a //go:systemstack function call on a
user stack.
g0 setting stackguard1 is unnecessary for this functionality. 0 would be
sufficient, as g0 is always allowed to call //go:systemstack functions.
However, since we have the check anyway, setting stackguard1 to the
actual stack bound is useful to detect actual stack overflows on g0
(though morestackc doesn't detect this case and would report a
misleading message about user stacks).
[1] cmd/internal/obj calls //go:systemstack functions AttrCFunc. This is
a holdover from when the runtime contained actual C functions. But since
CL 2275, it has simply meant "pretend this is a C function, which would
thus need to use the system stack". Hence the name morestackc. At this
point, this terminology is pretty far removed from reality and should
probably be updated to something more intuitive.
Change-Id: If315677217354465fbbfbd0d406d79be20db0cc3
Reviewed-on: https://go-review.googlesource.com/c/go/+/527716
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
[This is a redo of CL 525455 with the test fixed on darwin by defining
_XOPEN_SOURCE, and disabled with android, musl, and openbsd, which do
not provide getcontext.]
Since CL 495855, Ms are cached for C threads calling into Go, including
the stack bounds of the system stack.
Some C libraries (e.g., coroutine libraries) do manual stack management
and may change stacks between calls to Go on the same thread.
Changing the stack if there is more Go up the stack would be
problematic. But if the calls are completely independent there is no
particular reason for Go to care about the changing stack boundary.
Thus, this CL allows the stack bounds to change in such cases. The
primary downside here (besides additional complexity) is that normal
systems that do not manipulate the stack may not notice unintentional
stack corruption as quickly as before.
Note that callbackUpdateSystemStack is written to be usable for the
initial setup in needm as well as updating the stack in cgocallbackg.
Fixes #62440.
For #62130.
Change-Id: I0fe0134f865932bbaff1fc0da377c35c013bd768
Reviewed-on: https://go-review.googlesource.com/c/go/+/527715
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This reverts CL 525455. The test fails to build on darwin, alpine, and
android.
For #62440.
Change-Id: I39c6b1e16499bd61e0f166de6c6efe7a07961e62
Reviewed-on: https://go-review.googlesource.com/c/go/+/527317
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This reverts CL 527056.
CL 525455 breaks darwin, alpine, and android. This CL must be reverted
in order to revert that CL.
For #62440.
Change-Id: I4e1b16e384b475a605e0214ca36c918d50faa22c
Reviewed-on: https://go-review.googlesource.com/c/go/+/527316
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Standard Ms set g0.stackguard1 to the same value as stackguard0 in
mstart0. For consistency, extra Ms should do the same for their g0. Do
this in needm -> callbackUpdateSystemStack.
Background: getg().stackguard1 is used as the stack guard for the stack
growth prolouge in functions marked //go:systemstack [1]. User Gs set
stackguard1 to ^uintptr(0) so that the check always fail, calling
morestackc, which throws to report a //go:systemstack function call on a
user stack.
g0 setting stackguard1 is unnecessary for this functionality. 0 would be
sufficient, as g0 is always allowed to call //go:systemstack functions.
However, since we have the check anyway, setting stackguard1 to the
actual stack bound is useful to detect actual stack overflows on g0
(though morestackc doesn't detect this case and would report a
misleading message about user stacks).
[1] cmd/internal/obj calls //go:systemstack functions AttrCFunc. This is
a holdover from when the runtime contained actual C functions. But since
CL 2275, it has simply meant "pretend this is a C function, which would
thus need to use the system stack". Hence the name morestackc. At this
point, this terminology is pretty far removed from reality and should
probably be updated to something more intuitive.
Change-Id: I8d0e5628ce31ac6a189a7d7a4124be85aef89862
Reviewed-on: https://go-review.googlesource.com/c/go/+/527056
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Since CL 495855, Ms are cached for C threads calling into Go, including
the stack bounds of the system stack.
Some C libraries (e.g., coroutine libraries) do manual stack management
and may change stacks between calls to Go on the same thread.
Changing the stack if there is more Go up the stack would be
problematic. But if the calls are completely independent there is no
particular reason for Go to care about the changing stack boundary.
Thus, this CL allows the stack bounds to change in such cases. The
primary downside here (besides additional complexity) is that normal
systems that do not manipulate the stack may not notice unintentional
stack corruption as quickly as before.
Note that callbackUpdateSystemStack is written to be usable for the
initial setup in needm as well as updating the stack in cgocallbackg.
Fixes #62440.
For #62130.
Change-Id: I7841b056acea1111bdae3b718345a3bd3961b4a8
Reviewed-on: https://go-review.googlesource.com/c/go/+/525455
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
When passing pointers of Go objects from Go to C, the cgo command generate _Cgo_use(pN) for the unsafe.Pointer type arguments, so that the Go compiler will escape these object to heap.
Since the C function may callback to Go, then the Go stack might grow/shrink, that means the pointers that the C function have will be invalid.
After adding the #cgo noescape annotation for a C function, the cgo command won't generate _Cgo_use(pN), and the Go compiler won't force the object escape to heap.
After adding the #cgo nocallback annotation for a C function, which means the C function won't callback to Go, if it do callback to Go, the Go process will crash.
Fixes #56378
Change-Id: Ifdca070584e0d349c7b12276270e50089e481f7a
GitHub-Last-Rev: f1a17b08b0590eca2670e404bbfedad5461df72f
GitHub-Pull-Request: golang/go#60399
Reviewed-on: https://go-review.googlesource.com/c/go/+/497837
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Change-Id: I510b0a4bf3472d937393800dd57472c30beef329
GitHub-Last-Rev: 8d289b73a37bd86080936423d981d21e152aaa33
GitHub-Pull-Request: golang/go#60960
Reviewed-on: https://go-review.googlesource.com/c/go/+/505398
Auto-Submit: Robert Findley <rfindley@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Robert Findley <rfindley@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
Some C APIs require the use or structures that contain pointers to
buffers (iovec, io_uring, ...). The pointer passing rules would
require that these buffers are allocated in C memory and to process
this data with Go libraries it would need to be copied.
In order to provide a zero-copy way to use these C APIs, this CL
implements a Pinner API that allows to pin Go objects, which
guarantees that the garbage collector does not move these objects
while pinned. This allows to relax the pointer passing rules so that
pinned pointers can be stored in C allocated memory or can be
contained in Go memory that is passed to C functions.
The Pin() method accepts pointers to objects of any type and
unsafe.Pointer. Slices and arrays can be pinned by calling Pin()
with the pointer to the first element. Pinning of maps is not
supported.
If the GC collects unreachable Pinner holding pinned objects it
panics. If Pin() is called with the other non-pointer types it
panics as well.
Performance considerations: This change has no impact on execution
time on existing code, because checks are only done in code paths,
that would panic otherwise. The memory footprint on existing code is
one pointer per memory span.
Fixes: #46787
Signed-off-by: Sven Anderson <sven@anderson.de>
Change-Id: I110031fe789b92277ae45a9455624687bd1c54f2
Reviewed-on: https://go-review.googlesource.com/c/go/+/367296
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
|
|
This reapplies CL 485500, with a fix drafted in CL 492987 incorporated.
CL 485500 is reverted due to #60004 and #60007. #60004 is fixed in
CL 492743. #60007 is fixed in CL 492987 (incorporated in this CL).
[Original CL 485500 description]
This reapplies CL 481061, with the followup fixes in CL 482975, CL 485315, and
CL 485316 incorporated.
CL 481061, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 482975 is a followup fix to a C declaration in testprogcgo.
CL 485315 is a followup fix for x_cgo_getstackbound on Illumos.
CL 485316 is a followup cleanup for ppc64 assembly.
CL 479915 passed the G to _cgo_getstackbound for direct updates to
gp.stack.lo. A G can be reused on a new thread after the previous thread
exited. This could trigger the C TSAN race detector because it couldn't
see the synchronization in Go (lockextra) preventing the same G from
being used on multiple threads at the same time.
We work around this by passing the address of a stack variable to
_cgo_getstackbound rather than the G. The stack is generally unique per
thread, so TSAN won't see the same address from multiple threads. Even
if stacks are reused across threads by pthread, C TSAN should see the
synchonization in the stack allocator.
A regression test is added to misc/cgo/testsanitizer.
[Original CL 481061 description]
This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.
CL 392854, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C to Go calls after
the C code uses some stack, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug of CL 479915.
[Original CL 392854 description]
In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But, needm and dropm are heavy costs due to the signal-related syscalls.
So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call.
Instead, we only dropm while the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specified value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits.
When returning back to C:
Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time invoke a Go function from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.
This optimization is significant, and the specific value depends on the OS system and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
[CL 479915 description]
Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.
This CL makes needm get a more accurate stack bound from
pthread. (In some platforms this may still be a guess as we don't
know exactly where we are in the C stack), but it is probably
better than simply assuming 32K.
[CL 492987 description]
On the first call into Go from a C thread, currently we set the g0
stack's high bound imprecisely based on the SP. With CL 485500, we
keep the M and don't recompute the stack bounds when it calls into
Go again. If the first call is made when the C thread uses some
deep stack, but a subsequent call is made with a shallower stack,
the SP may be above g0.stack.hi.
This is usually okay as we don't check usually stack.hi. One place
where we do check for stack.hi is in the signal handler, in
adjustSignalStack. In particular, C TSAN delivers signals on the
g0 stack (instead of the usual signal stack). If the SP is above
g0.stack.hi, we don't see it is on the g0 stack, and throws.
This CL makes it get an accurate stack upper bound with the
pthread API (on the platforms where it is available).
Also add some debug print for the "handler not on signal stack"
throw.
Fixes #51676.
Fixes #59294.
Fixes #59678.
Fixes #60007.
Change-Id: Ie51c8e81ade34ec81d69fd7bce1fe0039a470776
Reviewed-on: https://go-review.googlesource.com/c/go/+/495855
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
Change-Id: I1f031f0f83a94bebe41d3978a91a903dc5bcda66
Reviewed-on: https://go-review.googlesource.com/c/go/+/489276
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Change-Id: I1c478b704d84811caa209006c657dda82d9c4cf9
Reviewed-on: https://go-review.googlesource.com/c/go/+/488435
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Change-Id: If5da1057ead34eb3e4c7f42bbe6ad3d350b97725
Reviewed-on: https://go-review.googlesource.com/c/go/+/484856
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This refactoring is more problematic because the client
package wrap abi.Type, thus the self-referential fields
within ArrayType need to be downcast to the client wrappers
in several places. It's not clear to me this is worthwhile;
this CL is for additional comment, before I attempt similar
changes for other self-referential types.
Change-Id: I41e517e6d851b32560c41676b91b76d7eb17c951
Reviewed-on: https://go-review.googlesource.com/c/go/+/466236
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
This touches a lot of files, which is bad, but it is also good,
since there's N copies of this information commoned into 1.
The new files in internal/abi are copied from the end of the stack;
ultimately this will all end up being used.
Change-Id: Ia252c0055aaa72ca569411ef9f9e96e3d610889e
Reviewed-on: https://go-review.googlesource.com/c/go/+/462995
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This reverts CL 485500.
Reason for revert: This breaks internal tests at Google, see b/280861579 and b/280820455.
Change-Id: I426278d400f7611170918fc07c524cb059b9cc55
Reviewed-on: https://go-review.googlesource.com/c/go/+/492995
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Chressie Himpel <chressie@google.com>
|
|
The current mp.incgocallback() logic allows for trace events to be
recorded using frame pointer unwinding during cgocallbackg when they
shouldn't be. Specifically, mp.incgo will be false during the
reentersyscall call at the end. It's possible to crash with tracing
enabled because of this, if C code which uses the frame pointer register
for other purposes calls into Go. This can be seen, for example, by
forcing testprogcgo/trace_unix.c to write a garbage value to RBP prior
to calling into Go.
We can drop the mp.incgo check, and instead conservatively avoid doing
frame pointer unwinding if there is any C on the stack. This is the case
if mp.ncgo > 0, or if mp.isextra is true (meaning we're coming from a
thread created by C). Rename incgocallback to reflect that we're
checking if there's any C on the stack. We can also move the ncgo
increment in cgocall closer to where the transition to C happens, which
lets us use frame pointer unwinding for the entersyscall event during
the first Go-to-C call on a stack, when there isn't yet any C on the
stack.
Fixes #59830.
Change-Id: If178a705a9d38d0d2fb19589a9e669cd982d32cd
Reviewed-on: https://go-review.googlesource.com/c/go/+/488755
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Nick Ripley <nick.ripley@datadoghq.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This reapplies CL 481061, with the followup fixes in CL 482975, CL 485315, and
CL 485316 incorporated.
CL 481061, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 482975 is a followup fix to a C declaration in testprogcgo.
CL 485315 is a followup fix for x_cgo_getstackbound on Illumos.
CL 485316 is a followup cleanup for ppc64 assembly.
[Original CL 481061 description]
This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.
CL 392854, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C to Go calls after
the C code uses some stack, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug of CL 479915.
[Original CL 392854 description]
In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But, needm and dropm are heavy costs due to the signal-related syscalls.
So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call.
Instead, we only dropm while the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specified value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits.
When returning back to C:
Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time invoke a Go function from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.
This optimization is significant, and the specific value depends on the OS system and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
[CL 479915 description]
Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.
This CL makes needm get a more accurate stack bound from
pthread. (In some platforms this may still be a guess as we don't
know exactly where we are in the C stack), but it is probably
better than simply assuming 32K.
[CL 485500 description]
CL 479915 passed the G to _cgo_getstackbound for direct updates to
gp.stack.lo. A G can be reused on a new thread after the previous thread
exited. This could trigger the C TSAN race detector because it couldn't
see the synchronization in Go (lockextra) preventing the same G from
being used on multiple threads at the same time.
We work around this by passing the address of a stack variable to
_cgo_getstackbound rather than the G. The stack is generally unique per
thread, so TSAN won't see the same address from multiple threads. Even
if stacks are reused across threads by pthread, C TSAN should see the
synchonization in the stack allocator.
A regression test is added to misc/cgo/testsanitizer.
Fixes #51676.
Fixes #59294.
Fixes #59678.
Change-Id: Ic62be31a06ee83568215e875a891df37084e08ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/485500
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
|
|
This reverts CL 481061.
Reason for revert: When built with C TSAN, x_cgo_getstackbound triggers
race detection on `g->stacklo` because the synchronization is in Go,
which isn't instrumented.
For #51676.
For #59294.
For #59678.
Change-Id: I38afcda9fcffd6537582a39a5214bc23dc147d47
Reviewed-on: https://go-review.googlesource.com/c/go/+/485275
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
|
|
This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.
CL 392854, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C to Go calls after
the C code uses some stack, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug of CL 479915.
[Original CL 392854 description]
In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But, needm and dropm are heavy costs due to the signal-related syscalls.
So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call.
Instead, we only dropm while the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specified value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits.
When returning back to C:
Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time invoke a Go function from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.
This optimization is significant, and the specific value depends on the OS system and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
[CL 479915 description]
Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.
This CL makes needm get a more accurate stack bound from
pthread. (In some platforms this may still be a guess as we don't
know exactly where we are in the C stack), but it is probably
better than simply assuming 32K.
Fixes #51676.
Fixes #59294.
Change-Id: I9bf1400106d5c08ce621d2ed1df3a2d9e3f55494
Reviewed-on: https://go-review.googlesource.com/c/go/+/481061
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: DeJiang Zhu (doujiang) <doujiang24@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This reverts CL 392854.
Reason for revert: caused #59294, which was derived from google
internal tests. The attempted fix of #59294 caused more breakage.
Change-Id: I5a061561ac2740856b7ecc09725ac28bd30f8bba
Reviewed-on: https://go-review.googlesource.com/c/go/+/481060
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But, needm and dropm are heavy costs due to the signal-related syscalls.
So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call.
Instead, we only dropm while the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specified value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits.
When returning back to C:
Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time invoke a Go function from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.
This optimization is significant, and the specific value depends on the OS system and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Fixes #51676
Change-Id: I380702fe2f9b6b401b2d6f04b0aba990f4b9ee6c
GitHub-Last-Rev: 93dc64ad98e5583372e41f65ee4b7ab78b5aff51
GitHub-Pull-Request: golang/go#51679
Reviewed-on: https://go-review.googlesource.com/c/go/+/392854
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: thepudds <thepudds1460@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Move this knob from a binary-startup thing to a build-time thing.
This will enable followon optmizations to the write barrier.
Change-Id: Ic3323348621c76a7dc390c09ff55016b19c43018
Reviewed-on: https://go-review.googlesource.com/c/go/+/447778
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Change-Id: I69065f8adf101fdb28682c55997f503013a50e29
Reviewed-on: https://go-review.googlesource.com/c/go/+/449757
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joedian Reid <joedian@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
|
|
This reverts CL 426075.
Reason for revert: Import missing from cgocall.go.
Change-Id: Iac17e914045b83da30484dbe2a624cde526fb175
Reviewed-on: https://go-review.googlesource.com/c/go/+/427614
Reviewed-by: Heschi Kreinick <heschi@google.com>
|