| Age | Commit message (Collapse) | Author |
|
The CV add changes according to TODO in Go source-code.
Internal atomic set does not comply with sync/atomic library and has shortage
operations for signed integers.
This patch extend internal atomic set by Int32 and Int64 operations. It's
implemented new aliases and asm versions of operations. As a result Cas64 was
replaced by Casint64 in findRunnableGCWorker without type casting.
Another purpose is unified structure of internal atomics' source code. Before,
assembly impementations for different archs were in different files. For
example, filename for AMD64 was asm_amd64.s, but filename for RISC-V was
atomic_riscv64.s. Some arches have both files without any meaning. So, assembly
files were merged and renamed to atomic_{$ARCH}.s filenames.
Change-Id: I29a05a7cbf5f4a9cc146e8315536c038af545677
Reviewed-on: https://go-review.googlesource.com/c/go/+/289152
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
|
|
These will be used in a following CL to perform larger bit clear and bit
set than And8/Or8.
Change-Id: I60f7b1099e29b69eb64add77564faee862880a8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/260977
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: Michael Pratt <mpratt@google.com>
|
|
Add an internal atomic intrinsic for load with acquire semantics
(extending LoadAcq to 64b) and add LoadAcquintptr for internal
use within the sync package. For other arches, this remaps to the
appropriate atomic.Load{,64} intrinsic which should not alter code
generation.
Similarly, add StoreRel{uintptr,64} for consistency, and inline.
Finally, add an exception to allow sync to directly use the
runtime/internal/atomic package which avoids more convoluted
workarounds (contributed by Lynn Boger).
In an extreme example, sync.(*Pool).pin consumes 20% of wall time
during fmt tests. This is reduced to 5% on ppc64le/power9.
From the fmt benchmarks on ppc64le:
name old time/op new time/op delta
SprintfPadding 468ns ± 0% 451ns ± 0% -3.63%
SprintfEmpty 73.3ns ± 0% 51.9ns ± 0% -29.20%
SprintfString 135ns ± 0% 122ns ± 0% -9.63%
SprintfTruncateString 232ns ± 0% 214ns ± 0% -7.76%
SprintfTruncateBytes 216ns ± 0% 202ns ± 0% -6.48%
SprintfSlowParsingPath 162ns ± 0% 142ns ± 0% -12.35%
SprintfQuoteString 1.00µs ± 0% 0.99µs ± 0% -1.39%
SprintfInt 117ns ± 0% 104ns ± 0% -11.11%
SprintfIntInt 190ns ± 0% 175ns ± 0% -7.89%
SprintfPrefixedInt 232ns ± 0% 212ns ± 0% -8.62%
SprintfFloat 270ns ± 0% 255ns ± 0% -5.56%
SprintfComplex 1.01µs ± 0% 0.99µs ± 0% -1.68%
SprintfBoolean 127ns ± 0% 111ns ± 0% -12.60%
SprintfHexString 220ns ± 0% 198ns ± 0% -10.00%
SprintfHexBytes 261ns ± 0% 252ns ± 0% -3.45%
SprintfBytes 600ns ± 0% 590ns ± 0% -1.67%
SprintfStringer 684ns ± 0% 658ns ± 0% -3.80%
SprintfStructure 2.57µs ± 0% 2.57µs ± 0% -0.12%
ManyArgs 669ns ± 0% 646ns ± 0% -3.44%
FprintInt 140ns ± 0% 136ns ± 0% -2.86%
FprintfBytes 184ns ± 0% 181ns ± 0% -1.63%
FprintIntNoAlloc 140ns ± 0% 136ns ± 0% -2.86%
ScanInts 929µs ± 0% 921µs ± 0% -0.79%
ScanRecursiveInt 122ms ± 0% 121ms ± 0% -0.11%
ScanRecursiveIntReaderWrapper 122ms ± 0% 122ms ± 0% -0.18%
Change-Id: I4d66780261b57b06ef600229e475462e7313f0d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/253748
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
|
|
We already have Load8, And8, and Or8.
For #10958, #24543, but makes sense on its own.
Change-Id: I478529fc643edc57efdeccaae413c99edd19b2eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/203283
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This change creates the infrastructure for new lightweight atomics
primitives in runtime/internal/atomic:
- LoadAcq, for load-acquire
- StoreRel, for store-release
- CasRel, for Compare-and-Swap-release
and implements them for ppc64x. There is visible performance improvement
in producer-consumer scenarios, like BenchmarkChanProdCons*:
benchmark old ns/op new ns/op delta
BenchmarkChanProdCons0-48 2034 2034 +0.00%
BenchmarkChanProdCons10-48 1798 1608 -10.57%
BenchmarkChanProdCons100-48 1596 1585 -0.69%
BenchmarkChanProdConsWork0-48 2084 2046 -1.82%
BenchmarkChanProdConsWork10-48 1829 1668 -8.80%
BenchmarkChanProdConsWork100-48 1650 1650 +0.00%
Fixes #21348
Change-Id: I1f6ce377e4a0fe4bd7f5f775e8036f50070ad8db
Reviewed-on: https://go-review.googlesource.com/c/142277
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
faster atomics for ppc64x
This change implements faster atomics for ppc64x based on the ISA 2.07B,
Appendix B.2 recommendations, replacing SYNC/ISYNC by LWSYNC in some
cases.
Updates #21348
name old time/op new time/op delta
Cond1-16 955ns 856ns -10.33%
Cond2-16 2.38µs 2.03µs -14.59%
Cond4-16 5.90µs 5.44µs -7.88%
Cond8-16 12.1µs 11.1µs -8.42%
Cond16-16 27.0µs 25.1µs -7.04%
Cond32-16 59.1µs 55.5µs -6.14%
LoadMostlyHits/*sync_test.DeepCopyMap-16 22.1ns 24.1ns +9.02%
LoadMostlyHits/*sync_test.RWMutexMap-16 252ns 249ns -1.20%
LoadMostlyHits/*sync.Map-16 16.2ns 16.3ns ~
LoadMostlyMisses/*sync_test.DeepCopyMap-16 22.3ns 22.6ns ~
LoadMostlyMisses/*sync_test.RWMutexMap-16 249ns 247ns -0.51%
LoadMostlyMisses/*sync.Map-16 12.7ns 12.7ns ~
LoadOrStoreBalanced/*sync_test.RWMutexMap-16 1.27µs 1.17µs -7.54%
LoadOrStoreBalanced/*sync.Map-16 1.12µs 1.10µs -2.35%
LoadOrStoreUnique/*sync_test.RWMutexMap-16 1.75µs 1.68µs -3.84%
LoadOrStoreUnique/*sync.Map-16 2.07µs 1.97µs -5.13%
LoadOrStoreCollision/*sync_test.DeepCopyMap-16 15.8ns 15.9ns ~
LoadOrStoreCollision/*sync_test.RWMutexMap-16 496ns 424ns -14.48%
LoadOrStoreCollision/*sync.Map-16 6.07ns 6.07ns ~
Range/*sync_test.DeepCopyMap-16 1.65µs 1.64µs ~
Range/*sync_test.RWMutexMap-16 278µs 288µs +3.75%
Range/*sync.Map-16 2.00µs 2.01µs ~
AdversarialAlloc/*sync_test.DeepCopyMap-16 3.45µs 3.44µs ~
AdversarialAlloc/*sync_test.RWMutexMap-16 226ns 227ns ~
AdversarialAlloc/*sync.Map-16 1.09µs 1.07µs -2.36%
AdversarialDelete/*sync_test.DeepCopyMap-16 553ns 550ns -0.57%
AdversarialDelete/*sync_test.RWMutexMap-16 273ns 274ns ~
AdversarialDelete/*sync.Map-16 247ns 249ns ~
UncontendedSemaphore-16 79.0ns 65.5ns -17.11%
ContendedSemaphore-16 112ns 97ns -13.77%
MutexUncontended-16 3.34ns 2.51ns -24.69%
Mutex-16 266ns 191ns -28.26%
MutexSlack-16 226ns 159ns -29.55%
MutexWork-16 377ns 338ns -10.14%
MutexWorkSlack-16 335ns 308ns -8.20%
MutexNoSpin-16 196ns 184ns -5.91%
MutexSpin-16 710ns 666ns -6.21%
Once-16 1.29ns 1.29ns ~
Pool-16 8.64ns 8.71ns ~
PoolOverflow-16 1.60µs 1.44µs -10.25%
SemaUncontended-16 5.39ns 4.42ns -17.96%
SemaSyntNonblock-16 539ns 483ns -10.42%
SemaSyntBlock-16 413ns 354ns -14.20%
SemaWorkNonblock-16 305ns 258ns -15.36%
SemaWorkBlock-16 266ns 229ns -14.06%
RWMutexUncontended-16 12.9ns 9.7ns -24.80%
RWMutexWrite100-16 203ns 147ns -27.47%
RWMutexWrite10-16 177ns 119ns -32.74%
RWMutexWorkWrite100-16 435ns 403ns -7.39%
RWMutexWorkWrite10-16 642ns 611ns -4.79%
WaitGroupUncontended-16 4.67ns 3.70ns -20.92%
WaitGroupAddDone-16 402ns 355ns -11.54%
WaitGroupAddDoneWork-16 208ns 250ns +20.09%
WaitGroupWait-16 1.21ns 1.21ns ~
WaitGroupWaitWork-16 5.91ns 5.87ns -0.81%
WaitGroupActuallyWait-16 92.2ns 85.8ns -6.91%
Updates #21348
Change-Id: Ibb9b271d11b308264103829e176c6d9fe8f867d3
Reviewed-on: https://go-review.googlesource.com/95175
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
Starting in go1.9, the minimum processor requirement for ppc64 is POWER8. This
means the checks for GOARCH_ppc64 in asm_ppc64x.s can be removed, since we can
assume LBAR and STBCCC instructions (both from ISA 2.06) will always be
available.
Updates #19074
Change-Id: Ib4418169cd9fc6f871a5ab126b28ee58a2f349e2
Reviewed-on: https://go-review.googlesource.com/38406
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Change-Id: I80ccf40cd3930aff908ee64f6dcbe5f5255198d3
Reviewed-on: https://go-review.googlesource.com/24914
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
This modifies a recent performance improvement to the
And8 and Or8 atomic functions which required both ppc64le
and ppc64 to use power8 instructions. Since then it was
decided that ppc64 (BE) should work for power5 and later.
This change uses instructions compatible with power5 for
ppc64 and uses power8 for ppc64le.
Fixes #16004
Change-Id: I623c75e8e6fd1fa063a53d250d86cdc9d0890dc7
Reviewed-on: https://go-review.googlesource.com/24181
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Andrew Gerrand <adg@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
The following performance improvements have been made to the
low-level atomic functions for ppc64le & ppc64:
- For those cases containing a lwarx and stwcx (or other sizes):
sync, lwarx, maybe something, stwcx, loop to sync, sync, isync
The sync is moved before (outside) the lwarx/stwcx loop, and the
sync after is removed, so it becomes:
sync, lwarx, maybe something, stwcx, loop to lwarx, isync
- For the Or8 and And8, the shifting and manipulation of the
address to the word aligned version were removed and the
instructions were changed to use lbarx, stbcx instead of
register shifting, xor, then lwarx, stwcx.
- New instructions LWSYNC, LBAR, STBCC were tested and added.
runtime/atomic_ppc64x.s was changed to use the LWSYNC opcode
instead of the WORD encoding.
Fixes #15469
Ran some of the benchmarks in the runtime and sync directories.
Some results varied from run to run but the trend was improvement
based on best times for base and new:
runtime.test:
BenchmarkChanNonblocking-128 0.88 0.89 +1.14%
BenchmarkChanUncontended-128 569 511 -10.19%
BenchmarkChanContended-128 63110 53231 -15.65%
BenchmarkChanSync-128 691 598 -13.46%
BenchmarkChanSyncWork-128 11355 11649 +2.59%
BenchmarkChanProdCons0-128 2402 2090 -12.99%
BenchmarkChanProdCons10-128 1348 1363 +1.11%
BenchmarkChanProdCons100-128 1002 746 -25.55%
BenchmarkChanProdConsWork0-128 2554 2720 +6.50%
BenchmarkChanProdConsWork10-128 1909 1804 -5.50%
BenchmarkChanProdConsWork100-128 1624 1580 -2.71%
BenchmarkChanCreation-128 237 212 -10.55%
BenchmarkChanSem-128 705 667 -5.39%
BenchmarkChanPopular-128 5081190 4497566 -11.49%
BenchmarkCreateGoroutines-128 532 473 -11.09%
BenchmarkCreateGoroutinesParallel-128 35.0 34.7 -0.86%
BenchmarkCreateGoroutinesCapture-128 4923 4200 -14.69%
sync.test:
BenchmarkUncontendedSemaphore-128 112 94.2 -15.89%
BenchmarkContendedSemaphore-128 133 128 -3.76%
BenchmarkMutexUncontended-128 1.90 1.67 -12.11%
BenchmarkMutex-128 353 310 -12.18%
BenchmarkMutexSlack-128 304 283 -6.91%
BenchmarkMutexWork-128 554 541 -2.35%
BenchmarkMutexWorkSlack-128 567 556 -1.94%
BenchmarkMutexNoSpin-128 275 242 -12.00%
BenchmarkMutexSpin-128 1129 1030 -8.77%
BenchmarkOnce-128 1.08 0.96 -11.11%
BenchmarkPool-128 29.8 27.4 -8.05%
BenchmarkPoolOverflow-128 40564 36583 -9.81%
BenchmarkSemaUncontended-128 3.14 2.63 -16.24%
BenchmarkSemaSyntNonblock-128 1087 1069 -1.66%
BenchmarkSemaSyntBlock-128 897 893 -0.45%
BenchmarkSemaWorkNonblock-128 1034 1028 -0.58%
BenchmarkSemaWorkBlock-128 949 886 -6.64%
Change-Id: I4403fb29d3cd5254b7b1ce87a216bd11b391079e
Reviewed-on: https://go-review.googlesource.com/22549
Reviewed-by: Michael Munday <munday@ca.ibm.com>
Reviewed-by: Minux Ma <minux@golang.org>
|
|
Make it clear that the point of this function stores a pointer
*without* a write barrier.
sed -i -e 's/Storep1/StorepNoWB/' $(git grep -l Storep1)
Updates #15270.
Change-Id: Ifad7e17815e51a738070655fe3b178afdadaecf6
Reviewed-on: https://go-review.googlesource.com/21994
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
|
|
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.
This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:
$ perl -i -npe 's,^(\s*// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.) +([A-Z])')
$ go test go/doc -update
Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This change breaks out most of the atomics functions in the runtime
into package runtime/internal/atomic. It adds some basic support
in the toolchain for runtime packages, and also modifies linux/arm
atomics to remove the dependency on the runtime's mutex. The mutexes
have been replaced with spinlocks.
all trybots are happy!
In addition to the trybots, I've tested on the darwin/arm64 builder,
on the darwin/arm builder, and on a ppc64le machine.
Change-Id: I6698c8e3cf3834f55ce5824059f44d00dc8e3c2f
Reviewed-on: https://go-review.googlesource.com/14204
Run-TryBot: Michael Matloob <matloob@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
|