diff options
| author | doujiang24 <doujiang24@gmail.com> | 2023-03-24 01:30:46 +0000 |
|---|---|---|
| committer | Cherry Mui <cherryyz@google.com> | 2023-03-24 16:00:24 +0000 |
| commit | ef0dedce87b505f6115dd39b4329da9e9231ff95 (patch) | |
| tree | 790632c8421af9f34fd84b70b2802ffb964a0533 /src/runtime/cgo/gcc_libinit.c | |
| parent | a6c382eaa8eaa611d71232aa4d5391b56a5c2693 (diff) | |
| download | go-ef0dedce87b505f6115dd39b4329da9e9231ff95.tar.xz | |
runtime/cgo: store M for C-created thread in pthread key
In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But, needm and dropm are heavy costs due to the signal-related syscalls.
So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call.
Instead, we only dropm while the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specified value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits.
When returning back to C:
Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time invoke a Go function from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.
This optimization is significant, and the specific value depends on the OS system and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Fixes #51676
Change-Id: I380702fe2f9b6b401b2d6f04b0aba990f4b9ee6c
GitHub-Last-Rev: 93dc64ad98e5583372e41f65ee4b7ab78b5aff51
GitHub-Pull-Request: golang/go#51679
Reviewed-on: https://go-review.googlesource.com/c/go/+/392854
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: thepudds <thepudds1460@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Diffstat (limited to 'src/runtime/cgo/gcc_libinit.c')
| -rw-r--r-- | src/runtime/cgo/gcc_libinit.c | 35 |
1 files changed, 35 insertions, 0 deletions
diff --git a/src/runtime/cgo/gcc_libinit.c b/src/runtime/cgo/gcc_libinit.c index 57620fe4de..9676593211 100644 --- a/src/runtime/cgo/gcc_libinit.c +++ b/src/runtime/cgo/gcc_libinit.c @@ -17,6 +17,14 @@ static pthread_cond_t runtime_init_cond = PTHREAD_COND_INITIALIZER; static pthread_mutex_t runtime_init_mu = PTHREAD_MUTEX_INITIALIZER; static int runtime_init_done; +// pthread_g is a pthread specific key, for storing the g that binded to the C thread. +// The registered pthread_key_destructor will dropm, when the pthread-specified value g is not NULL, +// while a C thread is exiting. +static pthread_key_t pthread_g; +static void pthread_key_destructor(void* g); +uintptr_t x_cgo_pthread_key_created; +void (*x_crosscall2_ptr)(void (*fn)(void *), void *, int, size_t); + // The context function, used when tracing back C calls into Go. static void (*cgo_context_function)(struct context_arg*); @@ -39,6 +47,12 @@ _cgo_wait_runtime_init_done(void) { pthread_cond_wait(&runtime_init_cond, &runtime_init_mu); } + // The key and x_cgo_pthread_key_created are for the whole program, + // whereas the specific and destructor is per thread. + if (x_cgo_pthread_key_created == 0 && pthread_key_create(&pthread_g, pthread_key_destructor) == 0) { + x_cgo_pthread_key_created = 1; + } + // TODO(iant): For the case of a new C thread calling into Go, such // as when using -buildmode=c-archive, we know that Go runtime // initialization is complete but we do not know that all Go init @@ -61,6 +75,16 @@ _cgo_wait_runtime_init_done(void) { return 0; } +// Store the g into a thread-specific value associated with the pthread key pthread_g. +// And pthread_key_destructor will dropm when the thread is exiting. +void x_cgo_bindm(void* g) { + // We assume this will always succeed, otherwise, there might be extra M leaking, + // when a C thread exits after a cgo call. + // We only invoke this function once per thread in runtime.needAndBindM, + // and the next calls just reuse the bound m. + pthread_setspecific(pthread_g, g); +} + void x_cgo_notify_runtime_init_done(void* dummy __attribute__ ((unused))) { pthread_mutex_lock(&runtime_init_mu); @@ -110,3 +134,14 @@ _cgo_try_pthread_create(pthread_t* thread, const pthread_attr_t* attr, void* (*p } return EAGAIN; } + +static void +pthread_key_destructor(void* g) { + if (x_crosscall2_ptr != NULL) { + // fn == NULL means dropm. + // We restore g by using the stored g, before dropm in runtime.cgocallback, + // since the g stored in the TLS by Go might be cleared in some platforms, + // before this destructor invoked. + x_crosscall2_ptr(NULL, g, 0, 0); + } +} |
