| Age | Commit message (Collapse) | Author |
|
For #65355
Change-Id: I65dd090fb99de9b231af2112c5ccb0eb635db2be
Reviewed-on: https://go-review.googlesource.com/c/go/+/560155
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ibrahim Bazoka <ibrahimbazoka729@gmail.com>
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
|
|
And then revert the bootstrap cmd directories and certain testdata.
And adjust tests as needed.
Not reverting the changes in std that are bootstrapped,
because some of those changes would appear in API docs,
and we want to use any consistently.
Instead, rewrite 'any' to 'interface{}' in cmd/dist for those directories
when preparing the bootstrap copy.
A few files changed as a result of running gofmt -w
not because of interface{} -> any but because they
hadn't been updated for the new //go:build lines.
Fixes #49884.
Change-Id: Ie8045cba995f65bd79c694ec77a1b3d1fe01bb09
Reviewed-on: https://go-review.googlesource.com/c/go/+/368254
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
ARMv8.1 has added new instructions for atomic memory operations. This
change builds on the previous change which added support for atomic add,
0a7ac93c27c9ade79fe0f66ae0bb81484c241ae5, to include similar support for
atomic-compare-and-swap, atomic-swap, atomic-or, and atomic-and
intrinsics. Since the new instructions are not guaranteed to be present,
we guard their usages with a branch on a CPU feature.
Peformance on an ARMv8.1 machine:
name old time/op new time/op delta
CompareAndSwap-16 37.9ns ±16% 24.1ns ± 4% -36.44% (p=0.000 n=10+9)
CompareAndSwap64-16 38.6ns ±15% 24.1ns ± 3% -37.47% (p=0.000 n=10+10)
name old time/op new time/op delta
Swap-16 46.9ns ±32% 12.5ns ± 6% -73.40% (p=0.000 n=10+10)
Swap64-16 53.4ns ± 1% 12.5ns ± 6% -76.56% (p=0.000 n=10+10)
name old time/op new time/op delta
Or8-16 8.81ns ± 0% 5.61ns ± 0% -36.32% (p=0.000 n=10+10)
Or-16 7.21ns ± 0% 5.61ns ± 0% -22.19% (p=0.000 n=10+10)
Or8Parallel-16 59.8ns ± 3% 12.5ns ± 2% -79.10% (p=0.000 n=10+10)
OrParallel-16 51.7ns ± 3% 12.5ns ± 2% -75.84% (p=0.000 n=10+10)
name old time/op new time/op delta
And8-16 8.81ns ± 0% 5.61ns ± 0% -36.32% (p=0.000 n=10+10)
And-16 7.21ns ± 0% 5.61ns ± 0% -22.19% (p=0.000 n=10+10)
And8Parallel-16 59.1ns ± 6% 12.8ns ± 3% -78.33% (p=0.000 n=10+10)
AndParallel-16 51.4ns ± 7% 12.8ns ± 3% -75.03% (p=0.000 n=10+10)
Performance on an ARMv8.0 machine (no atomics instructions):
name old time/op new time/op delta
CompareAndSwap-16 61.3ns ± 0% 62.4ns ± 0% +1.70% (p=0.000 n=8+9)
CompareAndSwap64-16 62.0ns ± 3% 61.3ns ± 2% ~ (p=0.093 n=10+10)
name old time/op new time/op delta
Swap-16 127ns ± 2% 131ns ± 2% +2.91% (p=0.001 n=10+10)
Swap64-16 128ns ± 1% 131ns ± 2% +2.43% (p=0.001 n=10+10)
name old time/op new time/op delta
Or8-16 14.9ns ± 0% 15.3ns ± 0% +2.68% (p=0.000 n=10+10)
Or-16 11.8ns ± 0% 12.3ns ± 0% +4.24% (p=0.000 n=10+10)
Or8Parallel-16 137ns ± 1% 144ns ± 1% +4.97% (p=0.000 n=10+10)
OrParallel-16 128ns ± 1% 136ns ± 1% +6.34% (p=0.000 n=10+10)
name old time/op new time/op delta
And8-16 14.9ns ± 0% 15.3ns ± 0% +2.68% (p=0.000 n=10+10)
And-16 11.8ns ± 0% 12.3ns ± 0% +4.24% (p=0.000 n=10+10)
And8Parallel-16 134ns ± 2% 141ns ± 1% +5.29% (p=0.000 n=10+10)
AndParallel-16 125ns ± 2% 134ns ± 1% +7.10% (p=0.000 n=10+10)
Fixes #39304
Change-Id: Idaca68701d4751650be6b4bedca3d57f51571712
Reviewed-on: https://go-review.googlesource.com/c/go/+/234217
Run-TryBot: Emmanuel Odeke <emmanuel@orijtech.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: fannie zhang <Fannie.Zhang@arm.com>
|
|
These will be used in a following CL to perform larger bit clear and bit
set than And8/Or8.
Change-Id: I60f7b1099e29b69eb64add77564faee862880a8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/260977
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: Michael Pratt <mpratt@google.com>
|
|
Intrinsify these functions to match other platforms. Update the
sequence of instructions used in the assembly implementations to
match the intrinsics.
Also, add a micro benchmark so we can more easily measure the
performance of these two functions:
name old time/op new time/op delta
And8-8 5.33ns ± 7% 2.55ns ± 8% -52.12% (p=0.000 n=20+20)
And8Parallel-8 7.39ns ± 5% 3.74ns ± 4% -49.34% (p=0.000 n=20+20)
Or8-8 4.84ns ±15% 2.64ns ±11% -45.50% (p=0.000 n=20+20)
Or8Parallel-8 7.27ns ± 3% 3.84ns ± 4% -47.10% (p=0.000 n=19+20)
By using a 'rotate then xor selected bits' instruction combined with
either a 'load and and' or a 'load and or' instruction we can
implement And8 and Or8 with far fewer instructions. Replacing
'compare and swap' with atomic instructions may also improve
performance when there is contention.
Change-Id: I28bb8032052b73ae8ccdf6e4c612d2877085fa01
Reviewed-on: https://go-review.googlesource.com/c/go/+/204277
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
ARMv8.1 has added new instruction (LDADDAL) for atomic memory operations. This
CL improves existing atomic add intrinsics with the new instruction. Since the
new instruction is only guaranteed to be present after ARMv8.1, we guard its
usage with a conditional on CPU feature.
Performance result on ARMv8.1 machine:
name old time/op new time/op delta
Xadd-224 1.05µs ± 6% 0.02µs ± 4% -98.06% (p=0.000 n=10+8)
Xadd64-224 1.05µs ± 3% 0.02µs ±13% -98.10% (p=0.000 n=9+10)
[Geo mean] 1.05µs 0.02µs -98.08%
Performance result on ARMv8.0 machine:
name old time/op new time/op delta
Xadd-46 538ns ± 1% 541ns ± 1% +0.62% (p=0.000 n=9+9)
Xadd64-46 505ns ± 1% 508ns ± 0% +0.48% (p=0.003 n=9+8)
[Geo mean] 521ns 524ns +0.55%
Change-Id: If4b5d8d0e2d6f84fe1492a4f5de0789910ad0ee9
Reviewed-on: https://go-review.googlesource.com/81877
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This gets us around the kernel helpers on ARMv7.
It is slightly faster than using the kernel helper.
name old time/op new time/op delta
AtomicLoad-4 72.5ns ± 0% 69.5ns ± 0% -4.08% (p=0.000 n=9+9)
AtomicStore-4 57.6ns ± 1% 54.4ns ± 0% -5.58% (p=0.000 n=10+9)
[Geo mean] 64.6ns 61.5ns -4.83%
If performance is really critical, we can even do compiler intrinsics
on GOARM=7.
Fixes #23792.
Change-Id: I36497d880890b26bdf01e048b542bd5fd7b17d23
Reviewed-on: https://go-review.googlesource.com/94076
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
|
Inline atomic reads and writes on amd64. There's no reason
to pay the overhead of a call for these.
To keep atomic loads from being reordered, we make them
return a <value,memory> tuple.
Change the meaning of resultInArg0 for tuple-generating ops
to mean the first part of the result tuple, not the second.
This means we can always put the store part of the tuple last,
matching how arguments are laid out. This requires reordering
the outputs of add32carry and sub32carry and their descendents
in various architectures.
benchmark old ns/op new ns/op delta
BenchmarkAtomicLoad64-8 2.09 0.26 -87.56%
BenchmarkAtomicStore64-8 7.54 5.72 -24.14%
TBD (in a different CL): Cas, Or8, ...
Change-Id: I713ea88e7da3026c44ea5bdb56ed094b20bc5207
Reviewed-on: https://go-review.googlesource.com/27641
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|