| Age | Commit message (Collapse) | Author |
|
Just like we do for race mode. They are just too slow when running
with the sanitizers.
Fixes #59448
Change-Id: I86e3e3488ec5c4c29e410955e9dc4cbc99d39b84
Reviewed-on: https://go-review.googlesource.com/c/go/+/687535
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
|
|
For testing out duffcopy changes.
Change-Id: I93b4a52d75418a6e31aae5ad99f95d1870812b69
Reviewed-on: https://go-review.googlesource.com/c/go/+/678215
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
The overhead for allocation is not significant but it should be excluded
from the memmove/memclr benchmarking anyway.
Change-Id: I7ea86d1b85b13352ccbff16f7510caa250654dab
Reviewed-on: https://go-review.googlesource.com/c/go/+/645576
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This is a follow-up to CL 574675.
Change-Id: I98c3ea968e9c7dc61472849c385a1e697568aa30
Reviewed-on: https://go-review.googlesource.com/c/go/+/574975
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Jes Cok <xigua67damn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
|
|
Change-Id: I1df0685c75fc1044ba46003a69ecc7dfc53bbc2b
Reviewed-on: https://go-review.googlesource.com/c/go/+/574675
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
|
|
Change-Id: I3f0b7209621b39cee69566a5cc95e4343b4f1f20
GitHub-Last-Rev: af9dbbe69ad74e8c210254dafa260a886b690853
GitHub-Pull-Request: golang/go#63321
Reviewed-on: https://go-review.googlesource.com/c/go/+/531916
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
The riscv64 implementation of memmove has two optimizations that are
applied when both source and destination pointers share the same alignment
but that alignment is not 8 bytes. Both optimizations attempt to align the
source and destination pointers to 8 byte boundaries before performing 8 byte
aligned loads and stores. Both optimizations are incorrect. The first
optimization is applied when the destination pointer is smaller than the source
pointer. In this case the code increments both pointers by (pointer & 3) bytes
rather than (8 - (pointer & 7)) bytes. The second optimization is applied
when the destination pointer is larger than the source pointer. In this
case the existing code decrements the pointers by (pointer & 3) bytes instead
of (pointer & 7).
This commit fixes both optimizations avoiding unaligned 8 byte accesses.
As this particular optimization is not covered by any of the existing
benchmarks a new benchmark, BenchmarkMemmoveUnalignedSrcDst,
is provided that exercises both optimizations. Results of the new
benchmark, which were run on a SiFive HiFive Unmatched A00 with 16GB of RAM
running Ubuntu 23.04 are presented below.
MemmoveUnalignedSrcDst/f_16_0-4 39.48n ± 5% 43.47n ± 2% +10.13% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_16_0-4 45.39n ± 5% 41.55n ± 4% -8.47% (p=0.000 n=10)
MemmoveUnalignedSrcDst/f_16_1-4 1230.50n ± 1% 83.44n ± 5% -93.22% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_16_1-4 69.34n ± 4% 67.83n ± 8% ~ (p=0.436 n=10)
MemmoveUnalignedSrcDst/f_16_4-4 2349.00n ± 1% 72.09n ± 4% -96.93% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_16_4-4 2357.00n ± 0% 77.61n ± 4% -96.71% (p=0.000 n=10)
MemmoveUnalignedSrcDst/f_16_7-4 1235.00n ± 0% 62.02n ± 2% -94.98% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_16_7-4 1246.00n ± 0% 84.05n ± 6% -93.25% (p=0.000 n=10)
MemmoveUnalignedSrcDst/f_64_0-4 49.96n ± 2% 50.01n ± 2% ~ (p=0.755 n=10)
MemmoveUnalignedSrcDst/b_64_0-4 52.06n ± 3% 51.65n ± 3% ~ (p=0.631 n=10)
MemmoveUnalignedSrcDst/f_64_1-4 8105.50n ± 0% 97.63n ± 1% -98.80% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_64_1-4 84.07n ± 4% 84.90n ± 5% ~ (p=0.315 n=10)
MemmoveUnalignedSrcDst/f_64_4-4 9192.00n ± 0% 86.16n ± 3% -99.06% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_64_4-4 9195.50n ± 1% 91.88n ± 5% -99.00% (p=0.000 n=10)
MemmoveUnalignedSrcDst/f_64_7-4 8106.50n ± 0% 78.44n ± 9% -99.03% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_64_7-4 8107.00n ± 0% 99.19n ± 1% -98.78% (p=0.000 n=10)
MemmoveUnalignedSrcDst/f_256_0-4 90.95n ± 1% 92.16n ± 8% ~ (p=0.123 n=10)
MemmoveUnalignedSrcDst/b_256_0-4 96.09n ± 12% 94.90n ± 2% ~ (p=0.143 n=10)
MemmoveUnalignedSrcDst/f_256_1-4 35492.5n ± 0% 133.5n ± 0% -99.62% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_256_1-4 128.7n ± 1% 130.1n ± 1% +1.13% (p=0.005 n=10)
MemmoveUnalignedSrcDst/f_256_4-4 36599.0n ± 0% 123.0n ± 1% -99.66% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_256_4-4 36675.5n ± 0% 130.7n ± 1% -99.64% (p=0.000 n=10)
MemmoveUnalignedSrcDst/f_256_7-4 35555.5n ± 0% 121.6n ± 2% -99.66% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_256_7-4 35584.0n ± 0% 139.1n ± 1% -99.61% (p=0.000 n=10)
MemmoveUnalignedSrcDst/f_4096_0-4 956.3n ± 2% 960.8n ± 1% ~ (p=0.306 n=10)
MemmoveUnalignedSrcDst/b_4096_0-4 1.015µ ± 2% 1.012µ ± 2% ~ (p=0.076 n=10)
MemmoveUnalignedSrcDst/f_4096_1-4 584.406µ ± 0% 1.002µ ± 1% -99.83% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_4096_1-4 1.044µ ± 1% 1.040µ ± 2% ~ (p=0.090 n=10)
MemmoveUnalignedSrcDst/f_4096_4-4 585113.5n ± 0% 988.6n ± 2% -99.83% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_4096_4-4 586.521µ ± 0% 1.044µ ± 1% -99.82% (p=0.000 n=10)
MemmoveUnalignedSrcDst/f_4096_7-4 585374.5n ± 0% 986.2n ± 0% -99.83% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_4096_7-4 584.595µ ± 1% 1.055µ ± 0% -99.82% (p=0.000 n=10)
MemmoveUnalignedSrcDst/f_65536_0-4 54.83µ ± 0% 55.00µ ± 0% +0.31% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_65536_0-4 56.54µ ± 0% 56.64µ ± 0% +0.19% (p=0.011 n=10)
MemmoveUnalignedSrcDst/f_65536_1-4 9450.51µ ± 0% 58.25µ ± 0% -99.38% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_65536_1-4 56.65µ ± 0% 56.68µ ± 0% ~ (p=0.353 n=10)
MemmoveUnalignedSrcDst/f_65536_4-4 9449.48µ ± 0% 58.24µ ± 0% -99.38% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_65536_4-4 9462.91µ ± 0% 56.69µ ± 0% -99.40% (p=0.000 n=10)
MemmoveUnalignedSrcDst/f_65536_7-4 9477.37µ ± 0% 58.26µ ± 0% -99.39% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_65536_7-4 9467.96µ ± 0% 56.68µ ± 0% -99.40% (p=0.000 n=10)
geomean 11.16µ 509.8n -95.43%
Change-Id: Idfa1873b81fece3b2b1a0aed398fa5663cc73b83
Reviewed-on: https://go-review.googlesource.com/c/go/+/498377
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
The existing code incorrectly determines whether the pointer passed to
memclrNoHeapPointers is 8 byte aligned (it currently checks to see whether
it's 4 byte aligned).
In addition, the code that aligns the pointer, by individually filling
the first few bytes of the buffer with zeros, is also incorrect. It adjusts
the pointer by the wrong number of bytes, resulting in most cases, in
an unaligned pointer.
This commit fixes both of these issues by anding the pointer with 7
rather than 3 to determine its alignment, and by individually filling
the first (8 - (pointer & 7)) bytes with 0 to align the buffer, rather
than the first (pointer & 3) bytes.
We also remove an unnecessary immediate MOV instruction.
A new benchmark is added to test the performance of memclrNoHeapPointers
on non-aligned pointers. Results of the existing and the new benchmark
on a SiFive HiFive Unmatched A00 with 16GB of RAM running Ubuntu 23.04
are presented below.
Memclr/5-4 21.98n ± 7% 22.66n ± 9% ~ (p=0.079 n=10)
Memclr/16-4 20.85n ± 3% 21.09n ± 5% ~ (p=0.796 n=10)
Memclr/64-4 28.20n ± 4% 27.50n ± 3% ~ (p=0.093 n=10)
Memclr/256-4 53.66n ± 8% 53.44n ± 8% ~ (p=0.280 n=10)
Memclr/4096-4 522.6n ± 1% 523.4n ± 1% ~ (p=0.240 n=10)
Memclr/65536-4 24.17µ ± 0% 24.13µ ± 0% -0.19% (p=0.029 n=10)
Memclr/1M-4 446.9µ ± 0% 446.9µ ± 0% ~ (p=0.684 n=10)
Memclr/4M-4 12.69m ± 2% 12.79m ± 3% +0.78% (p=0.043 n=10)
Memclr/8M-4 29.75m ± 0% 29.76m ± 0% +0.03% (p=0.015 n=10)
Memclr/16M-4 60.34m ± 0% 60.32m ± 0% ~ (p=0.247 n=10)
Memclr/64M-4 241.2m ± 0% 241.3m ± 0% ~ (p=0.247 n=10)
MemclrUnaligned/0_5-4 27.71n ± 0% 27.72n ± 1% ~ (p=0.142 n=10)
MemclrUnaligned/0_16-4 26.95n ± 0% 26.04n ± 0% -3.38% (p=0.000 n=10)
MemclrUnaligned/0_64-4 38.27n ± 4% 40.15n ± 6% +4.89% (p=0.005 n=10)
MemclrUnaligned/0_256-4 63.95n ± 3% 64.19n ± 2% ~ (p=0.971 n=10)
MemclrUnaligned/0_4096-4 532.6n ± 1% 530.9n ± 1% ~ (p=0.324 n=10)
MemclrUnaligned/0_65536-4 24.30µ ± 0% 24.22µ ± 0% -0.32% (p=0.023 n=10)
MemclrUnaligned/1_5-4 29.40n ± 0% 29.39n ± 0% ~ (p=0.060 n=10)
MemclrUnaligned/1_16-4 632.65n ± 1% 63.80n ± 2% -89.92% (p=0.000 n=10)
MemclrUnaligned/1_64-4 4091.00n ± 1% 73.23n ± 1% -98.21% (p=0.000 n=10)
MemclrUnaligned/1_256-4 17803.50n ± 1% 92.03n ± 1% -99.48% (p=0.000 n=10)
MemclrUnaligned/1_4096-4 294150.0n ± 1% 561.9n ± 1% -99.81% (p=0.000 n=10)
MemclrUnaligned/1_65536-4 4692.80µ ± 1% 24.44µ ± 0% -99.48% (p=0.000 n=10)
MemclrUnaligned/4_5-4 27.71n ± 0% 27.71n ± 0% ~ (p=0.308 n=10)
MemclrUnaligned/4_16-4 1187.00n ± 1% 50.74n ± 3% -95.72% (p=0.000 n=10)
MemclrUnaligned/4_64-4 4617.00n ± 1% 59.89n ± 2% -98.70% (p=0.000 n=10)
MemclrUnaligned/4_256-4 18472.50n ± 1% 84.76n ± 2% -99.54% (p=0.000 n=10)
MemclrUnaligned/4_4096-4 292904.0n ± 1% 553.7n ± 0% -99.81% (p=0.000 n=10)
MemclrUnaligned/4_65536-4 4716.12µ ± 0% 24.38µ ± 0% -99.48% (p=0.000 n=10)
MemclrUnaligned/7_5-4 29.39n ± 0% 29.39n ± 0% ~ (p=1.000 n=10)
MemclrUnaligned/7_16-4 636.80n ± 1% 48.33n ± 5% -92.41% (p=0.000 n=10)
MemclrUnaligned/7_64-4 4094.00n ± 1% 58.88n ± 3% -98.56% (p=0.000 n=10)
MemclrUnaligned/7_256-4 17869.00n ± 2% 82.70n ± 3% -99.54% (p=0.000 n=10)
MemclrUnaligned/7_4096-4 294110.5n ± 1% 554.6n ± 1% -99.81% (p=0.000 n=10)
MemclrUnaligned/7_65536-4 4735.00µ ± 1% 24.28µ ± 0% -99.49% (p=0.000 n=10)
MemclrUnaligned/0_1M-4 447.8µ ± 0% 450.0µ ± 1% +0.51% (p=0.000 n=10)
MemclrUnaligned/0_4M-4 12.68m ± 1% 12.64m ± 2% -0.33% (p=0.015 n=10)
MemclrUnaligned/0_8M-4 29.76m ± 0% 29.79m ± 2% ~ (p=0.075 n=10)
MemclrUnaligned/0_16M-4 60.34m ± 1% 60.49m ± 1% ~ (p=0.353 n=10)
MemclrUnaligned/0_64M-4 241.3m ± 0% 241.4m ± 0% ~ (p=0.247 n=10)
MemclrUnaligned/1_1M-4 75937.3µ ± 1% 449.9µ ± 0% -99.41% (p=0.000 n=10)
MemclrUnaligned/1_4M-4 313.96m ± 2% 12.69m ± 0% -95.96% (p=0.000 n=10)
MemclrUnaligned/1_8M-4 630.97m ± 1% 29.76m ± 0% -95.28% (p=0.000 n=10)
MemclrUnaligned/1_16M-4 1263.47m ± 1% 60.35m ± 2% -95.22% (p=0.000 n=10)
MemclrUnaligned/1_64M-4 5053.5m ± 0% 241.3m ± 0% -95.23% (p=0.000 n=10)
MemclrUnaligned/4_1M-4 75880.5µ ± 2% 446.5µ ± 0% -99.41% (p=0.000 n=10)
MemclrUnaligned/4_4M-4 314.00m ± 1% 12.71m ± 2% -95.95% (p=0.000 n=10)
MemclrUnaligned/4_8M-4 630.63m ± 1% 29.77m ± 2% -95.28% (p=0.000 n=10)
MemclrUnaligned/4_16M-4 1257.80m ± 0% 60.34m ± 2% -95.20% (p=0.000 n=10)
MemclrUnaligned/4_64M-4 5041.3m ± 1% 241.2m ± 0% -95.21% (p=0.000 n=10)
MemclrUnaligned/7_1M-4 75866.2µ ± 1% 446.9µ ± 0% -99.41% (p=0.000 n=10)
MemclrUnaligned/7_4M-4 309.86m ± 1% 12.70m ± 1% -95.90% (p=0.000 n=10)
MemclrUnaligned/7_8M-4 626.67m ± 1% 29.75m ± 2% -95.25% (p=0.000 n=10)
MemclrUnaligned/7_16M-4 1252.84m ± 1% 60.31m ± 0% -95.19% (p=0.000 n=10)
MemclrUnaligned/7_64M-4 5015.8m ± 1% 241.4m ± 0% -95.19% (p=0.000 n=10)
geomean 339.1µ 35.83µ -89.43%
Change-Id: I3b958a1d8e8f5ef205052e6b985a5ce21e92ef85
Reviewed-on: https://go-review.googlesource.com/c/go/+/496455
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: M Zhuo <mzh@golangcn.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
This patch rewrites known-size calls to memclrNoHeapPointers with an OpZero.
This significantly improves performance and lets some clears get DSE'd.
One of the cases where this applies is zeroing a known-size array, example:
var x [256]int8
...
for a := range x {
x[a] = 0
}
Other cases can be found in the runtime itself where memclrNoHeapPointers is sometimes directly invoked with a constant.
It seems that for some sized-clears on some architectures (AMD64, maybe others) the memcrlNoHeapPointers is more performant than OpZero.
See the issue #56997 for more details.
Benches ARM (M1 Pro):
name old time/op new time/op delta
MemclrKnownSize1-10 2.03ns ± 0% 0.31ns ± 0% -84.69% (p=0.000 n=18+19)
MemclrKnownSize2-10 1.97ns ± 0% 0.31ns ± 0% -84.19% (p=0.000 n=12+19)
MemclrKnownSize4-10 2.02ns ± 0% 0.31ns ± 0% -84.56% (p=0.000 n=12+20)
MemclrKnownSize8-10 2.02ns ± 0% 0.31ns ± 0% -84.59% (p=0.000 n=14+19)
MemclrKnownSize16-10 2.15ns ± 0% 0.31ns ± 0% -85.50% (p=0.000 n=18+19)
MemclrKnownSize32-10 2.48ns ± 0% 0.31ns ± 0% -87.48% (p=0.000 n=20+19)
MemclrKnownSize64-10 1.93ns ± 0% 0.62ns ± 0% -67.88% (p=0.000 n=20+19)
MemclrKnownSize112-10 2.48ns ± 0% 1.80ns ± 0% -27.74% (p=0.000 n=19+20)
MemclrKnownSize128-10 10.0ns ±112% 2.0ns ± 0% -79.76% (p=0.000 n=18+17)
MemclrKnownSize192-10 27.4ns ±103% 2.6ns ± 0% -90.38% (p=0.000 n=16+19)
MemclrKnownSize248-10 9.67ns ±43% 3.26ns ± 0% -66.29% (p=0.000 n=19+19)
MemclrKnownSize256-10 85.4ns ±148% 3.3ns ± 0% -96.18% (p=0.000 n=20+20)
MemclrKnownSize512-10 223ns ±54% 6ns ± 0% -97.42% (p=0.000 n=18+20)
MemclrKnownSize1024-10 216ns ±26% 11ns ± 0% -95.00% (p=0.000 n=18+15)
MemclrKnownSize4096-10 265ns ± 2% 88ns ± 0% -66.84% (p=0.000 n=19+17)
MemclrKnownSize512KiB-10 9.91µs ± 1% 10.23µs ± 2% +3.14% (p=0.000 n=19+19)
[Geo mean] 15.6ns 2.5ns -83.62%
name old speed new speed delta
MemclrKnownSize1-10 493MB/s ± 0% 3216MB/s ± 0% +553.04% (p=0.000 n=18+19)
MemclrKnownSize2-10 1.02GB/s ± 0% 6.43GB/s ± 0% +532.33% (p=0.000 n=16+19)
MemclrKnownSize4-10 1.99GB/s ± 0% 12.86GB/s ± 0% +547.67% (p=0.000 n=18+20)
MemclrKnownSize8-10 3.96GB/s ± 0% 25.72GB/s ± 0% +548.81% (p=0.000 n=19+19)
MemclrKnownSize16-10 7.46GB/s ± 0% 51.43GB/s ± 0% +589.72% (p=0.000 n=20+19)
MemclrKnownSize32-10 12.9GB/s ± 0% 102.9GB/s ± 0% +698.60% (p=0.000 n=20+18)
MemclrKnownSize64-10 33.1GB/s ± 0% 103.0GB/s ± 0% +211.34% (p=0.000 n=19+19)
MemclrKnownSize112-10 45.1GB/s ± 0% 62.4GB/s ± 0% +38.38% (p=0.000 n=19+20)
MemclrKnownSize128-10 13.3GB/s ±107% 63.5GB/s ± 0% +378.03% (p=0.000 n=19+18)
MemclrKnownSize192-10 6.97GB/s ±139% 72.72GB/s ± 0% +943.44% (p=0.000 n=19+19)
MemclrKnownSize248-10 25.9GB/s ±46% 76.1GB/s ± 0% +194.16% (p=0.000 n=20+17)
MemclrKnownSize256-10 8.64GB/s ±196% 78.51GB/s ± 0% +808.19% (p=0.000 n=20+20)
MemclrKnownSize512-10 2.33GB/s ±86% 89.13GB/s ± 0% +3719.50% (p=0.000 n=17+20)
MemclrKnownSize1024-10 4.85GB/s ±32% 94.93GB/s ± 0% +1856.74% (p=0.000 n=18+19)
MemclrKnownSize4096-10 15.4GB/s ± 2% 46.6GB/s ± 0% +201.55% (p=0.000 n=19+18)
MemclrKnownSize512KiB-10 52.9GB/s ± 1% 51.3GB/s ± 2% -3.04% (p=0.000 n=19+19)
[Geo mean] 7.54GB/s 42.86GB/s +468.76%
Intel Alder Lake 12600k:
name old time/op new time/op delta
MemclrKnownSize1-16 0.59ns ± 3% 0.38ns ± 6% -36.00% (p=0.000 n=19+18)
MemclrKnownSize2-16 0.57ns ± 1% 0.19ns ± 5% -66.27% (p=0.000 n=19+19)
MemclrKnownSize4-16 0.66ns ± 2% 0.36ns ±21% -45.12% (p=0.000 n=19+20)
MemclrKnownSize8-16 0.74ns ± 1% 0.30ns ±26% -59.81% (p=0.000 n=18+20)
MemclrKnownSize16-16 1.00ns ± 7% 0.21ns ± 8% -79.51% (p=0.000 n=20+19)
MemclrKnownSize32-16 0.95ns ± 1% 0.40ns ± 1% -57.61% (p=0.000 n=20+18)
MemclrKnownSize64-16 1.20ns ± 2% 0.41ns ± 0% -65.82% (p=0.000 n=20+18)
MemclrKnownSize112-16 1.27ns ± 2% 1.03ns ± 0% -19.35% (p=0.000 n=20+18)
MemclrKnownSize128-16 1.34ns ± 2% 1.03ns ± 0% -23.02% (p=0.000 n=20+20)
MemclrKnownSize192-16 1.92ns ± 2% 1.44ns ± 0% -24.89% (p=0.000 n=20+16)
MemclrKnownSize248-16 2.77ns ± 1% 3.29ns ± 0% +18.81% (p=0.000 n=20+16)
MemclrKnownSize256-16 1.92ns ± 1% 1.86ns ± 0% -3.49% (p=0.000 n=19+15)
MemclrKnownSize512-16 2.81ns ± 2% 3.49ns ± 0% +24.15% (p=0.000 n=20+17)
MemclrKnownSize1024-16 4.02ns ± 1% 6.78ns ± 0% +68.44% (p=0.000 n=20+18)
MemclrKnownSize4096-16 17.2ns ± 2% 14.4ns ± 0% -16.73% (p=0.000 n=20+17)
MemclrKnownSize512KiB-16 6.71µs ± 1% 6.52µs ± 0% -2.85% (p=0.000 n=20+18)
[Geo mean] 2.60ns 1.71ns -34.06%
name old speed new speed delta
MemclrKnownSize1-16 1.71GB/s ± 3% 2.67GB/s ± 6% +56.39% (p=0.000 n=19+18)
MemclrKnownSize2-16 3.52GB/s ± 2% 10.43GB/s ± 6% +196.04% (p=0.000 n=20+20)
MemclrKnownSize4-16 6.06GB/s ± 1% 10.83GB/s ±11% +78.63% (p=0.000 n=19+18)
MemclrKnownSize8-16 10.7GB/s ± 1% 27.0GB/s ±21% +151.49% (p=0.000 n=18+20)
MemclrKnownSize16-16 16.0GB/s ± 8% 78.1GB/s ± 7% +387.24% (p=0.000 n=20+19)
MemclrKnownSize32-16 33.6GB/s ± 1% 79.4GB/s ± 1% +135.89% (p=0.000 n=20+18)
MemclrKnownSize64-16 53.3GB/s ± 2% 155.9GB/s ± 0% +192.58% (p=0.000 n=20+18)
MemclrKnownSize112-16 88.0GB/s ± 2% 109.1GB/s ± 0% +23.97% (p=0.000 n=20+18)
MemclrKnownSize128-16 95.3GB/s ± 2% 123.8GB/s ± 0% +29.88% (p=0.000 n=20+20)
MemclrKnownSize192-16 100GB/s ± 2% 133GB/s ± 0% +33.12% (p=0.000 n=20+17)
MemclrKnownSize248-16 89.7GB/s ± 1% 75.5GB/s ± 0% -15.84% (p=0.000 n=20+19)
MemclrKnownSize256-16 133GB/s ± 1% 138GB/s ± 0% +3.61% (p=0.000 n=19+14)
MemclrKnownSize512-16 182GB/s ± 2% 147GB/s ± 0% -19.46% (p=0.000 n=20+17)
MemclrKnownSize1024-16 254GB/s ± 1% 151GB/s ± 0% -40.64% (p=0.000 n=20+18)
MemclrKnownSize4096-16 237GB/s ± 2% 285GB/s ± 0% +20.09% (p=0.000 n=20+17)
MemclrKnownSize512KiB-16 78.2GB/s ± 1% 80.4GB/s ± 0% +2.93% (p=0.000 n=20+18)
[Geo mean] 42.1GB/s 63.8GB/s +51.53%
compilecmp linux/amd64:
runtime
runtime.(*pallocData).allocAll 85 -> 45 (-47.06%)
runtime.(*pageAlloc).allocRange 942 -> 923 (-2.02%)
runtime.(*pageAlloc).free 798 -> 774 (-3.01%)
runtime.(*pageBits).clearAll 66 -> 20 (-69.70%)
runtime.startCheckmarks 255 -> 246 (-3.53%)
runtime.(*pallocData).freeAll 86 -> 46 (-46.51%)
runtime.(*pallocBits).freeAll 66 -> 20 (-69.70%)
runtime.(*consistentHeapStats).unsafeClear 66 -> 19 (-71.21%)
runtime.newproc1 965 -> 933 (-3.32%)
crypto/rc4
crypto/rc4.(*Cipher).Reset 78 -> 69 (-11.54%)
compress/bzip2
compress/bzip2.(*reader).readBlock 2973 -> 2941 (-1.08%)
image/jpeg
image/jpeg.(*decoder).processDHT 1179 -> 1166 (-1.10%)
index/suffixarray
index/suffixarray.bucketMax_8_32 394 -> 241 (-38.83%)
index/suffixarray.freq_8_32 317 -> 185 (-41.64%)
index/suffixarray.freq_8_64 317 -> 178 (-43.85%)
index/suffixarray.bucketMin_8_32 394 -> 243 (-38.32%)
index/suffixarray.bucketMin_8_64 398 -> 234 (-41.21%)
index/suffixarray.bucketMax_8_64 398 -> 234 (-41.21%)
compress/flate
compress/flate.(*huffmanBitWriter).generateCodegen 965 -> 838 (-13.16%)
compress/flate.(*compressor).reset 429 -> 409 (-4.66%)
cmd/vendor/golang.org/x/sys/unix
cmd/vendor/golang.org/x/sys/unix.(*FdSet).Zero 66 -> 60 (-9.09%)
cmd/vendor/golang.org/x/sys/unix.(*Ifreq).SetInet4Addr 211 -> 129 (-38.86%)
cmd/vendor/golang.org/x/sys/unix.(*Ifreq).SetUint32 98 -> 14 (-85.71%)
cmd/vendor/golang.org/x/sys/unix.(*Ifreq).clear 66 -> 11 (-83.33%)
cmd/vendor/golang.org/x/sys/unix.(*Ifreq).SetUint16 101 -> 15 (-85.15%)
cmd/vendor/golang.org/x/sys/unix.(*CPUSet).Zero 66 -> 60 (-9.09%)
internal/coverage/decodemeta
internal/coverage/decodemeta.(*CoverageMetaFileReader).rdUint64 325 -> 293 (-9.85%)
crypto/tls
crypto/tls.(*halfConn).setTrafficSecret 253 -> 247 (-2.37%)
crypto/tls.(*Conn).readRecordOrCCS 10315 -> 10283 (-0.31%)
crypto/tls.(*halfConn).changeCipherSpec 271 -> 261 (-3.69%)
crypto/tls.(*Conn).writeRecordLocked 1765 -> 1748 (-0.96%)
file before after Δ %
runtime.s 512467 512164 -303 -0.059%
crypto/rc4.s 955 946 -9 -0.942%
compress/bzip2.s 9586 9554 -32 -0.334%
image/jpeg.s 32122 32109 -13 -0.040%
index/suffixarray.s 38547 37644 -903 -2.343%
compress/flate.s 46668 46521 -147 -0.315%
cmd/vendor/golang.org/x/sys/unix.s 118620 118301 -319 -0.269%
internal/coverage/decodemeta.s 7224 7192 -32 -0.443%
crypto/tls.s 288762 288697 -65 -0.023%
cmd/compile/internal/ssa.s 3639799 3640727 +928 +0.025%
total 20790248 20789353 -895 -0.004%
src/runtime benchmarks (Linux Alder Lake 12600k):
name old time/op new time/op delta
MakeChan/Byte-16 26.2ns ± 2% 25.6ns ± 3% -2.05% (p=0.003 n=9+10)
MakeChan/Int-16 33.9ns ± 2% 33.3ns ± 4% -1.99% (p=0.015 n=10+10)
MakeChan/Ptr-16 54.2ns ± 2% 53.7ns ± 1% -0.90% (p=0.016 n=10+9)
MakeChan/Struct/0-16 23.8ns ± 3% 23.4ns ± 1% -1.72% (p=0.009 n=10+8)
MakeChan/Struct/32-16 55.9ns ± 2% 53.9ns ± 1% -3.48% (p=0.000 n=10+10)
MakeChan/Struct/40-16 63.5ns ± 1% 61.1ns ± 2% -3.79% (p=0.000 n=10+9)
ChanNonblocking-16 0.22ns ± 0% 0.22ns ± 0% +0.40% (p=0.011 n=9+8)
SelectUncontended-16 4.63ns ± 1% 4.62ns ± 0% -0.35% (p=0.001 n=10+8)
SelectSyncContended-16 1.58µs ± 2% 1.59µs ± 1% ~ (p=0.540 n=10+10)
SelectAsyncContended-16 290ns ± 0% 291ns ± 0% +0.14% (p=0.012 n=8+9)
SelectNonblock-16 0.95ns ± 1% 0.95ns ± 1% ~ (p=0.546 n=9+9)
ChanUncontended-16 239ns ± 3% 242ns ± 6% ~ (p=0.886 n=9+10)
ChanContended-16 17.7µs ± 1% 18.2µs ± 1% +2.87% (p=0.000 n=10+9)
ChanSync-16 109ns ± 2% 109ns ± 1% ~ (p=0.342 n=10+10)
ChanSyncWork-16 6.55µs ± 1% 6.53µs ± 1% ~ (p=0.101 n=10+10)
ChanProdCons0-16 502ns ± 1% 499ns ± 0% -0.55% (p=0.001 n=10+9)
ChanProdCons10-16 373ns ± 2% 377ns ± 1% ~ (p=0.095 n=10+9)
ChanProdCons100-16 224ns ± 2% 223ns ± 3% ~ (p=0.150 n=9+10)
ChanProdConsWork0-16 491ns ± 1% 484ns ± 0% -1.26% (p=0.000 n=10+9)
ChanProdConsWork10-16 451ns ± 2% 448ns ± 2% ~ (p=0.210 n=8+10)
ChanProdConsWork100-16 406ns ± 0% 407ns ± 1% ~ (p=0.138 n=8+8)
SelectProdCons-16 509ns ± 0% 509ns ± 0% ~ (p=0.917 n=9+9)
ReceiveDataFromClosedChan-16 12.1ns ± 0% 12.1ns ± 0% ~ (p=0.780 n=10+10)
ChanCreation-16 22.6ns ± 1% 22.4ns ± 0% -0.72% (p=0.001 n=10+8)
ChanSem-16 165ns ± 1% 166ns ± 1% +0.72% (p=0.002 n=10+10)
ChanPopular-16 500µs ± 2% 498µs ± 1% ~ (p=0.218 n=10+10)
ChanClosed-16 0.29ns ± 0% 0.29ns ± 0% +0.09% (p=0.019 n=9+8)
CallClosure-16 1.28ns ± 0% 1.27ns ± 0% -0.51% (p=0.000 n=9+9)
CallClosure1-16 1.50ns ± 0% 1.50ns ± 0% ~ (p=0.123 n=9+9)
CallClosure2-16 8.86ns ± 1% 8.86ns ± 3% ~ (p=0.590 n=9+10)
CallClosure3-16 8.75ns ± 2% 8.69ns ± 2% ~ (p=0.247 n=10+10)
CallClosure4-16 8.65ns ± 2% 8.56ns ± 2% ~ (p=0.105 n=10+10)
Complex128DivNormal-16 2.47ns ± 0% 2.47ns ± 0% ~ (p=0.790 n=10+9)
Complex128DivNisNaN-16 4.44ns ± 0% 4.43ns ± 0% ~ (p=0.564 n=10+10)
Complex128DivDisNaN-16 4.48ns ± 0% 4.48ns ± 0% ~ (p=0.101 n=10+10)
Complex128DivNisInf-16 2.58ns ± 0% 2.58ns ± 0% ~ (p=0.808 n=10+10)
Complex128DivDisInf-16 6.30ns ± 0% 6.31ns ± 0% ~ (p=0.305 n=10+10)
SetTypePtr-16 0.73ns ± 1% 0.73ns ± 3% ~ (p=0.644 n=10+10)
SetTypePtr8-16 4.12ns ± 0% 4.12ns ± 0% ~ (p=0.127 n=10+10)
SetTypePtr16-16 4.13ns ± 1% 4.12ns ± 0% ~ (p=0.109 n=10+10)
SetTypePtr32-16 4.12ns ± 0% 4.12ns ± 0% ~ (p=0.203 n=9+10)
SetTypePtr64-16 4.12ns ± 0% 4.12ns ± 0% ~ (p=0.696 n=10+10)
SetTypePtr126-16 6.91ns ± 0% 6.91ns ± 0% ~ (p=0.469 n=10+10)
SetTypePtr128-16 6.66ns ± 0% 6.67ns ± 0% ~ (p=0.246 n=9+10)
SetTypePtrSlice-16 54.1ns ± 1% 54.1ns ± 1% ~ (p=0.509 n=9+10)
SetTypeNode1-16 4.13ns ± 1% 4.12ns ± 0% ~ (p=0.342 n=10+10)
SetTypeNode1Slice-16 10.1ns ± 1% 10.0ns ± 1% -1.18% (p=0.000 n=10+10)
SetTypeNode8-16 4.12ns ± 0% 4.12ns ± 0% ~ (p=0.137 n=8+8)
SetTypeNode8Slice-16 22.6ns ± 0% 22.6ns ± 0% ~ (p=0.423 n=10+10)
SetTypeNode64-16 6.90ns ± 0% 6.91ns ± 0% ~ (p=0.275 n=10+10)
SetTypeNode64Slice-16 173ns ± 0% 173ns ± 0% ~ (p=0.610 n=9+10)
SetTypeNode64Dead-16 5.53ns ± 0% 5.52ns ± 0% ~ (p=0.123 n=10+6)
SetTypeNode64DeadSlice-16 150ns ± 0% 150ns ± 0% ~ (p=0.398 n=10+10)
SetTypeNode124-16 6.90ns ± 0% 6.90ns ± 0% ~ (p=0.779 n=10+10)
SetTypeNode124Slice-16 222ns ± 5% 217ns ± 0% ~ (p=0.302 n=10+10)
SetTypeNode126-16 6.66ns ± 0% 6.66ns ± 0% ~ (p=0.324 n=10+9)
SetTypeNode126Slice-16 218ns ± 0% 218ns ± 0% ~ (p=0.119 n=9+10)
SetTypeNode128-16 9.76ns ± 0% 9.73ns ± 0% -0.31% (p=0.003 n=9+10)
SetTypeNode128Slice-16 279ns ± 0% 278ns ± 0% ~ (p=0.112 n=10+9)
SetTypeNode130-16 9.77ns ± 0% 9.73ns ± 0% -0.33% (p=0.002 n=10+10)
SetTypeNode130Slice-16 284ns ± 0% 284ns ± 0% ~ (p=0.668 n=10+10)
SetTypeNode1024-16 51.2ns ± 0% 51.6ns ± 1% ~ (p=0.080 n=9+9)
SetTypeNode1024Slice-16 1.83µs ± 0% 1.82µs ± 0% ~ (p=0.115 n=10+10)
Allocation-16 4.64µs ± 1% 4.37µs ± 1% -5.69% (p=0.000 n=9+9)
ReadMemStats-16 5.62µs ± 2% 5.55µs ± 5% -1.36% (p=0.050 n=10+10)
WriteBarrier-16 4.95ns ± 3% 4.99ns ± 3% ~ (p=0.255 n=10+10)
BulkWriteBarrier-16 1.69ns ± 2% 1.63ns ± 4% -3.77% (p=0.001 n=10+10)
ScanStackNoLocals-16 12.8ms ± 2% 12.9ms ± 1% +0.72% (p=0.019 n=10+10)
MSpanCountAlloc/bits=64-16 1.65ns ± 0% 1.65ns ± 0% ~ (p=0.124 n=10+10)
MSpanCountAlloc/bits=128-16 2.08ns ± 1% 2.06ns ± 1% -0.87% (p=0.000 n=10+10)
MSpanCountAlloc/bits=256-16 2.71ns ± 1% 2.69ns ± 1% -0.74% (p=0.001 n=10+9)
MSpanCountAlloc/bits=512-16 4.15ns ± 0% 4.23ns ± 2% +2.15% (p=0.000 n=10+10)
MSpanCountAlloc/bits=1024-16 7.89ns ± 1% 7.89ns ± 1% ~ (p=0.867 n=10+10)
Hash5-16 1.93ns ± 1% 2.01ns ± 0% +3.99% (p=0.000 n=10+8)
Hash16-16 2.04ns ± 1% 2.21ns ± 1% +8.61% (p=0.000 n=10+10)
Hash64-16 2.67ns ± 0% 2.67ns ± 0% ~ (p=0.154 n=9+9)
Hash1024-16 16.4ns ± 0% 16.4ns ± 0% +0.17% (p=0.020 n=9+10)
Hash65536-16 886ns ± 0% 885ns ± 0% ~ (p=0.725 n=10+10)
AlignedLoad-16 0.96ns ± 2% 0.95ns ± 3% ~ (p=0.123 n=10+10)
UnalignedLoad-16 0.95ns ± 2% 1.01ns ± 2% +6.31% (p=0.000 n=10+10)
EqEfaceConcrete-16 0.31ns ± 3% 0.33ns ± 5% +8.10% (p=0.000 n=10+10)
EqIfaceConcrete-16 0.31ns ±13% 0.28ns ± 2% -9.23% (p=0.001 n=10+10)
NeEfaceConcrete-16 0.29ns ± 1% 0.31ns ± 7% +5.59% (p=0.010 n=8+8)
NeIfaceConcrete-16 0.28ns ± 2% 0.29ns ± 1% +4.49% (p=0.000 n=9+8)
ConvT2EByteSized/bool-16 0.53ns ± 1% 0.52ns ± 1% -2.18% (p=0.000 n=10+10)
ConvT2EByteSized/uint8-16 0.53ns ± 1% 0.53ns ± 0% +1.22% (p=0.000 n=10+10)
ConvT2ESmall-16 1.13ns ± 0% 1.13ns ± 0% ~ (p=0.774 n=9+9)
ConvT2EUintptr-16 1.03ns ± 0% 1.04ns ± 0% +0.50% (p=0.000 n=10+8)
ConvT2ELarge-16 14.4ns ± 2% 14.4ns ± 1% ~ (p=0.726 n=10+10)
ConvT2ISmall-16 1.13ns ± 0% 1.13ns ± 0% ~ (p=0.693 n=9+10)
ConvT2IUintptr-16 1.03ns ± 0% 1.03ns ± 0% +0.44% (p=0.000 n=10+10)
ConvT2ILarge-16 14.2ns ± 1% 14.4ns ± 1% +0.85% (p=0.007 n=9+10)
ConvI2E-16 0.54ns ± 1% 0.54ns ± 0% -0.39% (p=0.037 n=10+8)
ConvI2I-16 2.68ns ± 0% 2.70ns ± 1% +0.73% (p=0.000 n=9+10)
AssertE2T-16 0.28ns ± 1% 0.39ns ± 5% +37.38% (p=0.000 n=10+10)
AssertE2TLarge-16 0.42ns ± 2% 0.48ns ± 1% +14.92% (p=0.000 n=9+10)
AssertE2I-16 2.67ns ± 0% 2.67ns ± 0% ~ (p=0.352 n=9+9)
AssertI2T-16 0.37ns ± 3% 0.34ns ± 1% -6.16% (p=0.000 n=10+10)
AssertI2I-16 2.67ns ± 0% 2.67ns ± 0% ~ (p=0.286 n=10+10)
AssertI2E-16 0.54ns ± 1% 0.54ns ± 0% -0.94% (p=0.000 n=10+10)
AssertE2E-16 0.41ns ± 0% 0.41ns ± 1% ~ (p=0.880 n=9+9)
AssertE2T2-16 0.41ns ± 1% 0.41ns ± 1% ~ (p=0.725 n=10+10)
AssertE2T2Blank-16 0.24ns ± 5% 0.21ns ± 1% -14.79% (p=0.000 n=10+9)
AssertI2E2-16 0.69ns ± 0% 0.69ns ± 1% ~ (p=0.541 n=10+10)
AssertI2E2Blank-16 0.26ns ± 9% 0.21ns ± 1% -18.86% (p=0.000 n=10+9)
AssertE2E2-16 0.53ns ± 1% 0.53ns ± 1% +0.72% (p=0.004 n=10+10)
AssertE2E2Blank-16 0.23ns ± 4% 0.21ns ± 1% -8.42% (p=0.000 n=10+10)
ConvT2Ezero/zero/16-16 1.13ns ± 0% 1.14ns ± 1% ~ (p=0.583 n=9+10)
ConvT2Ezero/zero/32-16 1.13ns ± 0% 1.13ns ± 0% ~ (p=0.417 n=10+10)
ConvT2Ezero/zero/64-16 1.03ns ± 1% 1.03ns ± 0% ~ (p=0.051 n=10+10)
ConvT2Ezero/zero/str-16 1.03ns ± 0% 1.03ns ± 0% ~ (p=0.132 n=10+10)
ConvT2Ezero/zero/slice-16 1.14ns ± 0% 1.15ns ± 0% +0.49% (p=0.001 n=10+10)
ConvT2Ezero/zero/big-16 123ns ± 1% 123ns ± 1% ~ (p=0.171 n=10+10)
ConvT2Ezero/nonzero/str-16 19.4ns ± 1% 19.3ns ± 3% ~ (p=0.548 n=9+10)
ConvT2Ezero/nonzero/slice-16 22.2ns ± 2% 22.0ns ± 2% ~ (p=0.109 n=10+10)
ConvT2Ezero/nonzero/big-16 123ns ± 1% 123ns ± 1% ~ (p=0.446 n=10+8)
ConvT2Ezero/smallint/16-16 1.13ns ± 0% 1.14ns ± 1% ~ (p=0.362 n=10+10)
ConvT2Ezero/smallint/32-16 1.13ns ± 0% 1.13ns ± 0% ~ (p=0.907 n=10+9)
ConvT2Ezero/smallint/64-16 1.04ns ± 0% 1.03ns ± 0% -0.38% (p=0.002 n=10+10)
ConvT2Ezero/largeint/16-16 6.65ns ± 1% 6.65ns ± 2% ~ (p=0.618 n=10+9)
ConvT2Ezero/largeint/32-16 6.75ns ± 3% 6.63ns ± 2% -1.77% (p=0.015 n=10+10)
ConvT2Ezero/largeint/64-16 9.19ns ± 1% 9.26ns ± 2% ~ (p=0.123 n=10+10)
Malloc8-16 8.66ns ± 1% 8.89ns ± 2% +2.74% (p=0.000 n=10+10)
Malloc16-16 13.7ns ± 1% 13.8ns ± 1% +0.71% (p=0.022 n=10+8)
MallocTypeInfo8-16 11.7ns ± 3% 11.6ns ± 2% ~ (p=0.469 n=10+10)
MallocTypeInfo16-16 18.3ns ± 1% 18.2ns ± 2% ~ (p=0.251 n=9+10)
MallocLargeStruct-16 195ns ± 1% 198ns ± 1% +1.65% (p=0.000 n=9+10)
GoroutineSelect-16 1.10ms ± 1% 1.12ms ± 1% +1.36% (p=0.000 n=10+8)
GoroutineBlocking-16 986µs ± 1% 998µs ± 1% +1.23% (p=0.002 n=10+10)
GoroutineForRange-16 985µs ± 1% 1001µs ± 1% +1.68% (p=0.000 n=10+10)
GoroutineIdle-16 679µs ± 1% 691µs ± 0% +1.74% (p=0.000 n=10+9)
HashStringSpeed-16 5.33ns ± 5% 5.19ns ± 4% ~ (p=0.113 n=9+9)
HashBytesSpeed-16 8.20ns ± 3% 8.24ns ± 1% ~ (p=0.497 n=10+9)
HashInt32Speed-16 4.01ns ± 2% 3.90ns ± 4% -2.63% (p=0.011 n=9+10)
HashInt64Speed-16 3.94ns ± 4% 3.79ns ± 1% -3.74% (p=0.003 n=10+9)
HashStringArraySpeed-16 12.5ns ± 4% 12.3ns ± 1% ~ (p=0.055 n=10+10)
MegMap-16 3.72ns ± 1% 3.73ns ± 1% ~ (p=0.484 n=9+10)
MegOneMap-16 2.28ns ± 1% 2.27ns ± 1% ~ (p=0.287 n=10+10)
MegEqMap-16 22.0µs ± 3% 22.3µs ± 2% +1.48% (p=0.028 n=10+9)
MegEmptyMap-16 0.93ns ± 1% 0.92ns ± 1% -0.52% (p=0.030 n=10+10)
SmallStrMap-16 3.77ns ± 0% 3.77ns ± 0% ~ (p=0.324 n=10+10)
MapStringKeysEight_16-16 3.91ns ± 0% 3.91ns ± 0% ~ (p=0.088 n=9+9)
MapStringKeysEight_32-16 3.58ns ± 1% 3.50ns ± 0% -2.11% (p=0.000 n=10+10)
MapStringKeysEight_64-16 3.58ns ± 1% 3.50ns ± 0% -2.23% (p=0.000 n=10+10)
MapStringKeysEight_1M-16 3.57ns ± 1% 3.50ns ± 0% -1.92% (p=0.000 n=10+10)
IntMap-16 2.89ns ± 1% 2.89ns ± 0% ~ (p=0.381 n=10+10)
MapFirst/1-16 1.60ns ± 1% 1.59ns ± 2% -0.49% (p=0.020 n=10+9)
MapFirst/2-16 1.61ns ± 0% 1.59ns ± 1% -1.17% (p=0.001 n=10+10)
MapFirst/3-16 1.61ns ± 1% 1.59ns ± 1% -1.45% (p=0.000 n=10+10)
MapFirst/4-16 1.60ns ± 1% 1.59ns ± 1% -1.16% (p=0.000 n=10+10)
MapFirst/5-16 1.60ns ± 1% 1.58ns ± 1% -0.98% (p=0.000 n=10+10)
MapFirst/6-16 1.60ns ± 1% 1.59ns ± 1% -0.87% (p=0.001 n=10+10)
MapFirst/7-16 1.60ns ± 1% 1.59ns ± 1% -0.79% (p=0.002 n=10+10)
MapFirst/8-16 1.60ns ± 1% 1.59ns ± 1% -0.67% (p=0.017 n=9+10)
MapFirst/9-16 2.83ns ± 0% 2.83ns ± 0% ~ (p=0.492 n=10+10)
MapFirst/10-16 2.83ns ± 0% 2.84ns ± 0% +0.24% (p=0.017 n=10+10)
MapFirst/11-16 2.83ns ± 0% 2.83ns ± 0% ~ (p=0.445 n=10+10)
MapFirst/12-16 2.83ns ± 0% 2.83ns ± 0% ~ (p=0.564 n=10+10)
MapFirst/13-16 2.83ns ± 0% 2.84ns ± 0% ~ (p=0.175 n=9+10)
MapFirst/14-16 2.83ns ± 0% 2.83ns ± 0% ~ (p=0.322 n=10+9)
MapFirst/15-16 2.83ns ± 0% 2.84ns ± 1% ~ (p=0.209 n=10+10)
MapFirst/16-16 2.83ns ± 1% 2.84ns ± 0% ~ (p=0.238 n=10+10)
MapMid/1-16 1.64ns ± 0% 1.64ns ± 0% ~ (p=0.453 n=10+9)
MapMid/2-16 1.86ns ± 1% 1.86ns ± 0% ~ (p=0.764 n=10+9)
MapMid/3-16 1.86ns ± 0% 1.86ns ± 1% ~ (p=1.000 n=10+10)
MapMid/4-16 2.06ns ± 0% 2.06ns ± 0% -0.27% (p=0.014 n=10+9)
MapMid/5-16 2.06ns ± 0% 2.06ns ± 0% ~ (p=0.075 n=9+10)
MapMid/6-16 2.27ns ± 0% 2.27ns ± 1% ~ (p=0.898 n=10+10)
MapMid/7-16 2.27ns ± 1% 2.26ns ± 0% -0.23% (p=0.049 n=10+10)
MapMid/8-16 2.47ns ± 0% 2.47ns ± 1% ~ (p=0.840 n=10+10)
MapMid/9-16 4.21ns ± 7% 4.13ns ±19% ~ (p=0.315 n=10+10)
MapMid/10-16 4.17ns ± 7% 4.31ns ± 5% +3.37% (p=0.021 n=10+9)
MapMid/11-16 4.18ns ± 7% 4.32ns ± 6% +3.50% (p=0.015 n=10+10)
MapMid/12-16 4.34ns ± 7% 4.30ns ± 5% ~ (p=0.858 n=9+10)
MapMid/13-16 4.25ns ± 6% 4.28ns ± 6% ~ (p=0.489 n=9+9)
MapMid/14-16 3.75ns ±23% 3.90ns ±16% ~ (p=0.353 n=10+10)
MapMid/15-16 3.87ns ±25% 3.95ns ±26% ~ (p=0.315 n=10+10)
MapMid/16-16 4.06ns ±19% 3.94ns ±16% ~ (p=0.796 n=10+10)
MapLast/1-16 1.65ns ± 0% 1.65ns ± 0% ~ (p=0.607 n=10+10)
MapLast/2-16 1.86ns ± 0% 1.86ns ± 0% +0.26% (p=0.029 n=10+10)
MapLast/3-16 2.06ns ± 1% 2.06ns ± 0% ~ (p=0.689 n=8+9)
MapLast/4-16 2.27ns ± 1% 2.26ns ± 0% ~ (p=0.148 n=10+9)
MapLast/5-16 2.47ns ± 0% 2.47ns ± 0% ~ (p=0.385 n=9+10)
MapLast/6-16 2.67ns ± 0% 2.68ns ± 0% ~ (p=0.202 n=9+10)
MapLast/7-16 2.88ns ± 0% 2.88ns ± 0% ~ (p=0.751 n=10+10)
MapLast/8-16 3.08ns ± 0% 3.08ns ± 0% ~ (p=0.826 n=10+9)
MapLast/9-16 4.31ns ± 6% 4.54ns ± 5% ~ (p=0.070 n=9+8)
MapLast/10-16 4.25ns ± 5% 4.42ns ± 6% ~ (p=0.321 n=9+8)
MapLast/11-16 4.59ns ±16% 5.42ns ±44% +17.99% (p=0.019 n=10+10)
MapLast/12-16 5.04ns ±19% 6.11ns ±28% +21.35% (p=0.005 n=9+10)
MapLast/13-16 6.00ns ±35% 5.76ns ± 3% ~ (p=0.173 n=10+8)
MapLast/14-16 4.27ns ± 5% 4.53ns ± 6% +6.14% (p=0.007 n=10+10)
MapLast/15-16 4.41ns ± 1% 4.44ns ± 7% ~ (p=0.515 n=8+10)
MapLast/16-16 4.18ns ± 6% 4.99ns ±18% +19.48% (p=0.000 n=10+10)
MapCycle-16 7.48ns ± 2% 7.46ns ± 1% ~ (p=0.699 n=10+10)
RepeatedLookupStrMapKey32-16 6.98ns ± 3% 6.73ns ± 2% -3.63% (p=0.000 n=10+10)
RepeatedLookupStrMapKey1M-16 14.7µs ± 5% 14.7µs ± 4% ~ (p=0.604 n=9+10)
MakeMap/[Byte]Byte-16 58.5ns ± 1% 58.5ns ± 1% ~ (p=0.780 n=10+9)
MakeMap/[Int]Int-16 113ns ± 0% 113ns ± 1% ~ (p=0.100 n=8+10)
NewEmptyMap-16 2.47ns ± 0% 2.47ns ± 0% ~ (p=0.638 n=10+10)
NewSmallMap-16 11.5ns ± 1% 11.6ns ± 0% +1.18% (p=0.000 n=10+10)
MapIter-16 42.2ns ± 0% 42.8ns ± 1% +1.50% (p=0.000 n=10+10)
MapIterEmpty-16 1.85ns ± 0% 1.85ns ± 0% ~ (p=0.651 n=10+10)
SameLengthMap-16 1.85ns ± 1% 1.85ns ± 0% ~ (p=0.247 n=10+10)
BigKeyMap-16 7.18ns ± 1% 7.42ns ± 4% +3.33% (p=0.004 n=10+10)
BigValMap-16 7.03ns ± 2% 7.19ns ± 1% +2.33% (p=0.000 n=10+9)
SmallKeyMap-16 5.32ns ± 1% 5.24ns ± 1% -1.41% (p=0.000 n=10+10)
MapPopulate/1-16 6.30ns ± 0% 6.41ns ± 1% +1.81% (p=0.000 n=8+10)
MapPopulate/10-16 239ns ± 2% 234ns ± 2% -2.05% (p=0.001 n=9+10)
MapPopulate/100-16 4.19µs ± 2% 4.22µs ± 2% ~ (p=0.171 n=10+10)
MapPopulate/1000-16 52.3µs ± 1% 52.5µs ± 1% ~ (p=0.133 n=9+10)
MapPopulate/10000-16 459µs ± 1% 466µs ± 2% +1.45% (p=0.005 n=10+10)
MapPopulate/100000-16 4.22ms ± 2% 4.25ms ± 2% ~ (p=0.393 n=10+10)
ComplexAlgMap-16 12.5ns ± 1% 12.4ns ± 1% -0.95% (p=0.022 n=10+10)
GoMapClear/Reflexive/1-16 9.61ns ± 1% 9.58ns ± 0% -0.27% (p=0.027 n=10+10)
GoMapClear/Reflexive/10-16 10.0ns ± 1% 10.0ns ± 1% ~ (p=0.648 n=9+9)
GoMapClear/Reflexive/100-16 31.4ns ± 0% 31.4ns ± 1% ~ (p=0.305 n=9+10)
GoMapClear/Reflexive/1000-16 147ns ± 0% 149ns ± 2% +1.21% (p=0.000 n=10+10)
GoMapClear/Reflexive/10000-16 3.99µs ± 0% 4.00µs ± 0% +0.21% (p=0.018 n=9+10)
GoMapClear/NonReflexive/1-16 41.4ns ± 2% 41.7ns ± 1% +0.55% (p=0.043 n=9+10)
GoMapClear/NonReflexive/10-16 50.3ns ± 1% 50.9ns ± 1% +1.16% (p=0.000 n=10+10)
GoMapClear/NonReflexive/100-16 125ns ± 0% 126ns ± 0% +0.96% (p=0.000 n=8+10)
GoMapClear/NonReflexive/1000-16 1.08µs ± 0% 1.08µs ± 1% ~ (p=0.097 n=10+10)
GoMapClear/NonReflexive/10000-16 8.18µs ± 2% 8.10µs ± 0% -0.91% (p=0.019 n=10+8)
MapStringConversion/32/simple-16 4.66ns ± 1% 4.69ns ± 3% ~ (p=0.905 n=9+10)
MapStringConversion/32/struct-16 4.65ns ± 3% 4.94ns ± 2% +6.23% (p=0.000 n=10+10)
MapStringConversion/32/array-16 4.69ns ± 3% 4.72ns ± 3% ~ (p=0.631 n=10+10)
MapStringConversion/64/simple-16 4.14ns ± 0% 4.14ns ± 1% ~ (p=0.342 n=10+10)
MapStringConversion/64/struct-16 4.13ns ± 0% 4.13ns ± 0% ~ (p=0.809 n=10+10)
MapStringConversion/64/array-16 4.13ns ± 1% 4.13ns ± 1% ~ (p=0.752 n=10+10)
MapInterfaceString-16 7.90ns ±23% 8.51ns ±33% ~ (p=0.604 n=9+10)
MapInterfacePtr-16 7.68ns ±29% 7.10ns ±36% ~ (p=0.353 n=10+10)
NewEmptyMapHintLessThan8-16 3.70ns ± 0% 3.70ns ± 0% ~ (p=0.209 n=10+10)
NewEmptyMapHintGreaterThan8-16 270ns ± 1% 272ns ± 1% +0.71% (p=0.005 n=10+9)
MapPop100-16 6.45µs ± 0% 6.50µs ± 1% +0.77% (p=0.000 n=10+10)
MapPop1000-16 114µs ± 1% 114µs ± 1% ~ (p=0.190 n=10+10)
MapPop10000-16 2.28ms ± 2% 2.28ms ± 2% ~ (p=0.912 n=10+10)
MapAssign/Int32/256-16 4.75ns ± 2% 4.82ns ± 4% ~ (p=0.101 n=10+10)
MapAssign/Int32/65536-16 16.4ns ± 1% 16.7ns ± 0% +1.44% (p=0.000 n=10+9)
MapAssign/Int64/256-16 4.79ns ± 5% 4.79ns ± 1% ~ (p=0.616 n=10+8)
MapAssign/Int64/65536-16 17.1ns ± 1% 16.8ns ± 0% -1.28% (p=0.000 n=10+8)
MapAssign/Str/256-16 6.07ns ± 6% 6.24ns ± 2% +2.84% (p=0.035 n=10+9)
MapAssign/Str/65536-16 21.4ns ± 0% 21.4ns ± 3% ~ (p=0.300 n=7+10)
MapOperatorAssign/Int32/256-16 4.82ns ± 3% 4.81ns ± 3% ~ (p=0.684 n=10+10)
MapOperatorAssign/Int32/65536-16 16.8ns ± 1% 16.5ns ± 1% -1.68% (p=0.000 n=9+10)
MapOperatorAssign/Int64/256-16 4.74ns ± 1% 4.77ns ± 3% ~ (p=0.563 n=10+9)
MapOperatorAssign/Int64/65536-16 16.9ns ± 1% 17.2ns ± 1% +1.88% (p=0.000 n=10+10)
MapOperatorAssign/Str/256-16 1.09µs ± 1% 1.10µs ± 2% ~ (p=0.210 n=10+10)
MapOperatorAssign/Str/65536-16 184ns ± 9% 184ns ± 8% ~ (p=0.922 n=10+9)
MapAppendAssign/Int32/256-16 13.8ns ±10% 14.4ns ±11% ~ (p=0.190 n=10+10)
MapAppendAssign/Int32/65536-16 28.9ns ± 5% 30.7ns ± 6% +6.13% (p=0.001 n=9+10)
MapAppendAssign/Int64/256-16 14.5ns ±12% 13.8ns ± 8% -5.02% (p=0.037 n=10+10)
MapAppendAssign/Int64/65536-16 30.9ns ± 1% 30.4ns ± 2% -1.56% (p=0.001 n=10+10)
MapAppendAssign/Str/256-16 30.2ns ± 6% 30.0ns ±10% ~ (p=0.645 n=10+10)
MapAppendAssign/Str/65536-16 44.5ns ± 4% 46.8ns ± 3% +5.17% (p=0.001 n=8+9)
MapDelete/Int32/100-16 18.7ns ± 0% 18.7ns ± 0% -0.27% (p=0.017 n=10+10)
MapDelete/Int32/1000-16 17.6ns ± 1% 17.5ns ± 1% -0.85% (p=0.000 n=9+10)
MapDelete/Int32/10000-16 18.7ns ± 0% 18.3ns ± 1% -1.92% (p=0.000 n=10+10)
MapDelete/Int64/100-16 19.1ns ± 0% 19.2ns ± 0% +0.68% (p=0.000 n=10+9)
MapDelete/Int64/1000-16 17.7ns ± 2% 18.3ns ± 1% +3.00% (p=0.000 n=10+10)
MapDelete/Int64/10000-16 18.8ns ± 1% 19.2ns ± 0% +2.01% (p=0.000 n=10+9)
MapDelete/Str/100-16 26.5ns ± 0% 26.4ns ± 1% -0.73% (p=0.000 n=10+10)
MapDelete/Str/1000-16 23.5ns ± 2% 23.4ns ± 1% ~ (p=0.425 n=10+10)
MapDelete/Str/10000-16 25.1ns ± 0% 25.1ns ± 1% +0.28% (p=0.037 n=10+10)
MapDelete/Pointer/100-16 20.6ns ± 1% 20.6ns ± 0% ~ (p=0.117 n=10+10)
MapDelete/Pointer/1000-16 19.2ns ± 1% 19.4ns ± 1% +0.97% (p=0.004 n=10+10)
MapDelete/Pointer/10000-16 20.0ns ± 0% 20.1ns ± 1% +0.52% (p=0.022 n=10+10)
Memmove/0-16 0.21ns ± 2% 0.21ns ± 1% ~ (p=0.671 n=10+10)
Memmove/1-16 0.93ns ± 0% 0.93ns ± 0% +0.21% (p=0.034 n=10+10)
Memmove/2-16 0.93ns ± 0% 0.93ns ± 0% ~ (p=0.101 n=10+10)
Memmove/3-16 0.93ns ± 1% 0.93ns ± 1% +0.49% (p=0.004 n=10+10)
Memmove/4-16 1.03ns ± 0% 1.03ns ± 0% ~ (p=0.260 n=10+10)
Memmove/5-16 1.13ns ± 0% 1.13ns ± 0% +0.20% (p=0.034 n=10+10)
Memmove/6-16 1.13ns ± 0% 1.13ns ± 1% ~ (p=0.126 n=10+10)
Memmove/7-16 1.13ns ± 0% 1.13ns ± 1% +0.22% (p=0.028 n=10+10)
Memmove/8-16 1.13ns ± 0% 1.13ns ± 0% ~ (p=0.545 n=9+10)
Memmove/9-16 1.25ns ± 0% 1.35ns ± 0% +7.98% (p=0.000 n=10+10)
Memmove/10-16 1.25ns ± 0% 1.35ns ± 0% +7.96% (p=0.000 n=9+9)
Memmove/11-16 1.25ns ± 0% 1.35ns ± 0% +8.53% (p=0.000 n=10+9)
Memmove/12-16 1.25ns ± 0% 1.35ns ± 1% +8.24% (p=0.000 n=10+10)
Memmove/13-16 1.25ns ± 0% 1.34ns ± 0% +7.75% (p=0.000 n=10+10)
Memmove/14-16 1.25ns ± 0% 1.35ns ± 1% +8.28% (p=0.000 n=10+9)
Memmove/15-16 1.25ns ± 0% 1.35ns ± 0% +8.07% (p=0.000 n=10+9)
Memmove/16-16 1.25ns ± 0% 1.35ns ± 1% +8.35% (p=0.000 n=9+10)
Memmove/32-16 1.34ns ± 0% 1.36ns ± 1% +1.22% (p=0.000 n=10+10)
Memmove/64-16 1.45ns ± 0% 1.64ns ± 0% +13.07% (p=0.000 n=10+9)
Memmove/128-16 1.86ns ± 0% 2.02ns ± 0% +8.64% (p=0.000 n=10+10)
Memmove/256-16 2.47ns ± 0% 2.49ns ± 1% +1.14% (p=0.000 n=10+10)
Memmove/512-16 3.96ns ± 1% 3.96ns ± 0% ~ (p=0.182 n=10+10)
Memmove/1024-16 5.90ns ± 1% 5.87ns ± 1% ~ (p=0.258 n=9+9)
Memmove/2048-16 9.62ns ± 1% 9.62ns ± 2% ~ (p=0.963 n=8+9)
Memmove/4096-16 16.4ns ± 0% 17.1ns ± 4% +4.19% (p=0.003 n=8+9)
MemmoveOverlap/32-16 1.62ns ± 1% 1.68ns ± 1% +3.53% (p=0.000 n=10+10)
MemmoveOverlap/64-16 1.64ns ± 0% 1.65ns ± 0% +0.29% (p=0.002 n=9+9)
MemmoveOverlap/128-16 2.06ns ± 0% 2.06ns ± 0% ~ (p=0.070 n=10+10)
MemmoveOverlap/256-16 2.67ns ± 0% 2.67ns ± 0% +0.26% (p=0.012 n=10+10)
MemmoveOverlap/512-16 6.20ns ±18% 5.74ns ± 0% ~ (p=0.645 n=10+8)
MemmoveOverlap/1024-16 7.28ns ± 0% 7.30ns ± 0% +0.28% (p=0.006 n=8+10)
MemmoveOverlap/2048-16 11.9ns ± 0% 12.0ns ± 1% +0.37% (p=0.014 n=9+9)
MemmoveOverlap/4096-16 23.3ns ± 1% 23.1ns ± 1% -0.84% (p=0.000 n=8+10)
MemmoveUnalignedDst/0-16 1.03ns ± 0% 1.03ns ± 0% +0.19% (p=0.007 n=10+10)
MemmoveUnalignedDst/1-16 1.24ns ± 0% 1.25ns ± 1% +0.52% (p=0.022 n=10+10)
MemmoveUnalignedDst/2-16 1.23ns ± 0% 1.23ns ± 0% ~ (p=0.051 n=10+10)
MemmoveUnalignedDst/3-16 1.23ns ± 0% 1.23ns ± 0% +0.14% (p=0.006 n=9+9)
MemmoveUnalignedDst/4-16 1.23ns ± 0% 1.24ns ± 1% +0.37% (p=0.004 n=10+10)
MemmoveUnalignedDst/5-16 1.35ns ± 0% 1.35ns ± 0% ~ (p=0.075 n=10+10)
MemmoveUnalignedDst/6-16 1.34ns ± 0% 1.34ns ± 0% ~ (p=0.779 n=10+10)
MemmoveUnalignedDst/7-16 1.34ns ± 0% 1.34ns ± 0% ~ (p=1.000 n=10+10)
MemmoveUnalignedDst/8-16 1.34ns ± 0% 1.35ns ± 1% +0.39% (p=0.024 n=10+10)
MemmoveUnalignedDst/9-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.849 n=10+10)
MemmoveUnalignedDst/10-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.255 n=10+10)
MemmoveUnalignedDst/11-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.304 n=10+10)
MemmoveUnalignedDst/12-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.672 n=10+10)
MemmoveUnalignedDst/13-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.435 n=10+10)
MemmoveUnalignedDst/14-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.340 n=10+10)
MemmoveUnalignedDst/15-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.911 n=10+9)
MemmoveUnalignedDst/16-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.074 n=10+10)
MemmoveUnalignedDst/32-16 1.62ns ± 0% 1.63ns ± 0% ~ (p=0.059 n=10+10)
MemmoveUnalignedDst/64-16 1.65ns ± 0% 1.65ns ± 0% ~ (p=0.234 n=10+10)
MemmoveUnalignedDst/128-16 2.06ns ± 0% 2.06ns ± 0% ~ (p=0.709 n=10+9)
MemmoveUnalignedDst/256-16 3.69ns ± 0% 3.70ns ± 0% ~ (p=0.144 n=10+10)
MemmoveUnalignedDst/512-16 4.15ns ± 1% 4.14ns ± 0% ~ (p=0.778 n=10+8)
MemmoveUnalignedDst/1024-16 7.52ns ± 0% 7.53ns ± 1% ~ (p=0.650 n=9+9)
MemmoveUnalignedDst/2048-16 12.9ns ± 0% 12.9ns ± 1% ~ (p=0.548 n=8+8)
MemmoveUnalignedDst/4096-16 25.4ns ± 0% 25.4ns ± 0% ~ (p=0.947 n=9+9)
MemmoveUnalignedDstOverlap/32-16 4.08ns ± 0% 4.09ns ± 0% ~ (p=0.360 n=10+10)
MemmoveUnalignedDstOverlap/64-16 4.56ns ± 0% 4.56ns ± 0% ~ (p=0.705 n=10+9)
MemmoveUnalignedDstOverlap/128-16 4.67ns ± 0% 4.67ns ± 0% ~ (p=0.397 n=10+10)
MemmoveUnalignedDstOverlap/256-16 5.08ns ± 0% 5.08ns ± 0% ~ (p=0.159 n=10+9)
MemmoveUnalignedDstOverlap/512-16 8.45ns ± 5% 8.19ns ± 0% -3.10% (p=0.021 n=10+9)
MemmoveUnalignedDstOverlap/1024-16 9.55ns ± 0% 9.56ns ± 0% ~ (p=0.221 n=8+8)
MemmoveUnalignedDstOverlap/2048-16 14.0ns ± 0% 14.0ns ± 1% ~ (p=0.200 n=10+9)
MemmoveUnalignedDstOverlap/4096-16 26.5ns ± 0% 26.5ns ± 0% ~ (p=0.458 n=10+9)
MemmoveUnalignedSrc/0-16 1.02ns ± 1% 0.99ns ± 1% -2.67% (p=0.000 n=10+9)
MemmoveUnalignedSrc/1-16 1.13ns ± 0% 1.13ns ± 1% -0.25% (p=0.027 n=10+9)
MemmoveUnalignedSrc/2-16 1.13ns ± 1% 1.13ns ± 0% -0.28% (p=0.012 n=10+9)
MemmoveUnalignedSrc/3-16 1.24ns ± 1% 1.23ns ± 0% -0.25% (p=0.022 n=9+10)
MemmoveUnalignedSrc/4-16 1.24ns ± 0% 1.23ns ± 1% ~ (p=0.118 n=9+10)
MemmoveUnalignedSrc/5-16 1.34ns ± 0% 1.34ns ± 1% ~ (p=0.564 n=8+10)
MemmoveUnalignedSrc/6-16 1.34ns ± 0% 1.34ns ± 0% -0.39% (p=0.000 n=10+10)
MemmoveUnalignedSrc/7-16 1.34ns ± 0% 1.34ns ± 0% ~ (p=0.235 n=10+10)
MemmoveUnalignedSrc/8-16 1.34ns ± 0% 1.34ns ± 0% -0.37% (p=0.002 n=10+9)
MemmoveUnalignedSrc/9-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.579 n=10+9)
MemmoveUnalignedSrc/10-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.534 n=10+9)
MemmoveUnalignedSrc/11-16 1.44ns ± 0% 1.44ns ± 1% ~ (p=0.415 n=10+10)
MemmoveUnalignedSrc/12-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.218 n=10+10)
MemmoveUnalignedSrc/13-16 1.44ns ± 0% 1.44ns ± 1% ~ (p=0.693 n=10+10)
MemmoveUnalignedSrc/14-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.901 n=10+10)
MemmoveUnalignedSrc/15-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.379 n=10+10)
MemmoveUnalignedSrc/16-16 1.44ns ± 1% 1.44ns ± 0% ~ (p=0.538 n=10+10)
MemmoveUnalignedSrc/32-16 1.60ns ± 1% 1.60ns ± 0% ~ (p=0.491 n=10+10)
MemmoveUnalignedSrc/64-16 1.65ns ± 0% 1.65ns ± 0% ~ (p=0.564 n=10+10)
MemmoveUnalignedSrc/128-16 2.09ns ± 0% 2.09ns ± 0% ~ (p=0.497 n=10+9)
MemmoveUnalignedSrc/256-16 2.70ns ± 0% 2.78ns ± 1% +2.81% (p=0.000 n=10+10)
MemmoveUnalignedSrc/512-16 4.31ns ± 0% 4.30ns ± 0% -0.26% (p=0.031 n=8+9)
MemmoveUnalignedSrc/1024-16 7.28ns ± 0% 7.21ns ± 1% -1.05% (p=0.000 n=8+10)
MemmoveUnalignedSrc/2048-16 13.0ns ± 0% 13.0ns ± 0% ~ (p=0.180 n=9+8)
MemmoveUnalignedSrc/4096-16 25.4ns ± 0% 25.3ns ± 1% ~ (p=0.054 n=10+10)
MemmoveUnalignedSrcOverlap/32-16 4.04ns ± 0% 4.06ns ± 0% +0.62% (p=0.000 n=9+10)
MemmoveUnalignedSrcOverlap/64-16 4.12ns ± 0% 4.12ns ± 0% ~ (p=0.421 n=10+10)
MemmoveUnalignedSrcOverlap/128-16 4.53ns ± 0% 4.52ns ± 0% ~ (p=0.251 n=10+10)
MemmoveUnalignedSrcOverlap/256-16 6.17ns ± 0% 6.15ns ± 0% -0.35% (p=0.000 n=10+9)
MemmoveUnalignedSrcOverlap/512-16 7.43ns ± 0% 7.44ns ± 0% ~ (p=0.524 n=9+8)
MemmoveUnalignedSrcOverlap/1024-16 8.94ns ± 0% 8.94ns ± 0% ~ (p=0.419 n=8+8)
MemmoveUnalignedSrcOverlap/2048-16 13.2ns ± 0% 14.5ns ±21% ~ (p=0.107 n=8+10)
MemmoveUnalignedSrcOverlap/4096-16 25.6ns ± 0% 25.6ns ± 1% ~ (p=0.650 n=9+9)
Memclr/5-16 0.86ns ± 1% 0.86ns ± 2% ~ (p=0.531 n=9+9)
Memclr/16-16 1.04ns ± 0% 1.04ns ± 0% +0.32% (p=0.013 n=9+10)
Memclr/64-16 1.23ns ± 0% 1.26ns ± 0% +2.28% (p=0.000 n=10+10)
Memclr/256-16 2.27ns ± 0% 2.27ns ± 0% ~ (p=0.127 n=10+10)
Memclr/4096-16 17.1ns ± 1% 17.3ns ± 0% +0.88% (p=0.000 n=10+10)
Memclr/65536-16 821ns ± 0% 822ns ± 0% ~ (p=0.516 n=10+10)
Memclr/1M-16 14.1µs ± 1% 14.0µs ± 1% ~ (p=0.516 n=10+10)
Memclr/4M-16 86.1µs ± 1% 85.9µs ± 0% ~ (p=0.123 n=10+10)
Memclr/8M-16 174µs ± 2% 173µs ± 0% ~ (p=0.408 n=10+8)
Memclr/16M-16 385µs ± 4% 387µs ± 0% ~ (p=0.173 n=10+8)
Memclr/64M-16 2.18ms ± 0% 2.19ms ± 0% ~ (p=0.113 n=10+9)
GoMemclr/5-16 0.82ns ± 0% 0.82ns ± 0% ~ (p=0.346 n=9+10)
GoMemclr/16-16 1.02ns ± 0% 1.02ns ± 0% +0.22% (p=0.003 n=10+8)
GoMemclr/64-16 1.14ns ± 0% 1.14ns ± 0% ~ (p=0.948 n=10+9)
GoMemclr/256-16 2.06ns ± 0% 2.06ns ± 0% ~ (p=0.868 n=10+10)
MemclrRange/1K_2K-16 457ns ± 0% 428ns ± 1% -6.38% (p=0.000 n=10+10)
MemclrRange/2K_8K-16 1.46µs ± 0% 1.46µs ± 0% ~ (p=0.700 n=10+10)
MemclrRange/4K_16K-16 1.16µs ± 0% 1.16µs ± 0% ~ (p=0.567 n=9+10)
MemclrRange/160K_228K-16 20.7µs ± 0% 20.7µs ± 0% ~ (p=0.160 n=10+10)
ClearFat7-16 0.38ns ± 5% 0.21ns ± 1% -45.79% (p=0.000 n=9+10)
ClearFat8-16 0.21ns ± 3% 0.12ns ± 2% -44.16% (p=0.000 n=8+9)
ClearFat11-16 0.35ns ± 3% 0.21ns ± 1% -40.46% (p=0.000 n=9+9)
ClearFat12-16 0.23ns ± 9% 0.21ns ± 1% -10.23% (p=0.000 n=10+9)
ClearFat13-16 0.22ns ± 6% 0.21ns ± 2% -6.53% (p=0.000 n=10+10)
ClearFat14-16 0.22ns ± 4% 0.21ns ± 1% -5.97% (p=0.000 n=10+10)
ClearFat15-16 0.22ns ± 4% 0.21ns ± 1% -6.96% (p=0.000 n=10+9)
ClearFat16-16 0.19ns ± 9% 0.12ns ± 6% -34.89% (p=0.000 n=9+10)
ClearFat24-16 0.23ns ± 6% 0.21ns ± 1% -10.26% (p=0.000 n=10+9)
ClearFat32-16 0.22ns ± 5% 0.21ns ± 2% -5.31% (p=0.000 n=10+10)
ClearFat40-16 0.34ns ± 4% 0.62ns ± 1% +83.00% (p=0.000 n=10+10)
ClearFat48-16 0.33ns ± 2% 0.41ns ± 0% +26.71% (p=0.000 n=10+10)
ClearFat56-16 0.41ns ± 1% 0.41ns ± 0% ~ (p=0.838 n=10+10)
ClearFat64-16 0.41ns ± 0% 0.41ns ± 0% ~ (p=0.178 n=10+8)
ClearFat72-16 0.82ns ± 0% 0.82ns ± 0% ~ (p=0.669 n=10+10)
ClearFat128-16 1.04ns ± 0% 1.04ns ± 0% ~ (p=0.679 n=10+10)
ClearFat256-16 1.86ns ± 0% 1.86ns ± 0% ~ (p=0.066 n=9+10)
ClearFat512-16 3.50ns ± 0% 3.50ns ± 0% ~ (p=0.626 n=10+10)
ClearFat1024-16 6.79ns ± 0% 6.79ns ± 0% ~ (p=0.986 n=10+10)
ClearFat1032-16 13.6ns ± 0% 13.6ns ± 0% +0.13% (p=0.044 n=10+10)
ClearFat1040-16 10.3ns ± 0% 10.3ns ± 0% ~ (p=0.175 n=10+9)
CopyFat7-16 0.37ns ±13% 0.25ns ± 1% -31.74% (p=0.000 n=10+9)
CopyFat8-16 0.17ns ± 1% 0.17ns ± 2% +1.35% (p=0.004 n=9+9)
CopyFat11-16 0.26ns ± 1% 0.30ns ± 3% +12.58% (p=0.000 n=9+10)
CopyFat12-16 0.28ns ± 2% 0.26ns ± 1% -5.66% (p=0.000 n=9+9)
CopyFat13-16 0.26ns ± 0% 0.28ns ± 4% +7.35% (p=0.000 n=8+10)
CopyFat14-16 0.29ns ± 6% 0.26ns ± 2% -10.46% (p=0.000 n=10+9)
CopyFat15-16 0.26ns ± 1% 0.30ns ± 6% +14.12% (p=0.000 n=8+10)
CopyFat16-16 0.21ns ± 1% 0.21ns ± 0% ~ (p=0.426 n=8+8)
CopyFat24-16 0.29ns ± 3% 0.25ns ± 1% -12.27% (p=0.000 n=9+10)
CopyFat32-16 0.26ns ± 4% 0.29ns ± 4% +11.71% (p=0.000 n=10+10)
CopyFat64-16 0.46ns ± 8% 0.42ns ± 1% -8.37% (p=0.002 n=10+10)
CopyFat72-16 0.82ns ± 0% 0.82ns ± 0% ~ (p=0.563 n=10+10)
CopyFat128-16 1.53ns ± 0% 1.54ns ± 0% +0.62% (p=0.000 n=10+10)
CopyFat256-16 2.68ns ± 0% 2.65ns ± 1% -1.23% (p=0.000 n=10+10)
CopyFat512-16 4.93ns ± 1% 5.19ns ± 3% +5.16% (p=0.000 n=9+9)
CopyFat520-16 6.99ns ± 0% 6.99ns ± 0% ~ (p=0.539 n=10+10)
CopyFat1024-16 11.5ns ± 1% 9.8ns ± 1% -14.98% (p=0.000 n=9+10)
CopyFat1032-16 13.6ns ± 0% 13.6ns ± 0% ~ (p=0.728 n=10+10)
CopyFat1040-16 11.0ns ± 0% 11.1ns ± 0% +0.53% (p=0.000 n=10+10)
Issue18740/2byte-16 10.1µs ± 0% 10.1µs ± 0% ~ (p=0.342 n=10+10)
Issue18740/4byte-16 2.34µs ± 0% 2.35µs ± 0% +0.30% (p=0.002 n=10+8)
Issue18740/8byte-16 1.28µs ± 0% 1.28µs ± 0% +0.32% (p=0.000 n=9+10)
Finalizer-16 345µs ± 1% 336µs ± 0% -2.55% (p=0.000 n=10+9)
FinalizerRun-16 450ns ± 3% 420ns ± 1% -6.65% (p=0.000 n=10+10)
PallocBitsSummarize/Unpacked00-16 2.88ns ± 0% 2.88ns ± 0% ~ (p=0.358 n=10+10)
PallocBitsSummarize/UnpackedFFFFFFFFFFFFFFFF-16 15.2ns ± 0% 15.2ns ± 0% ~ (p=0.925 n=10+10)
PallocBitsSummarize/UnpackedAA-16 16.4ns ± 0% 16.3ns ± 0% ~ (p=0.113 n=9+9)
PallocBitsSummarize/UnpackedAAAAAAAAAAAAAAAA-16 16.5ns ± 0% 16.6ns ± 0% ~ (p=0.238 n=10+10)
PallocBitsSummarize/Unpacked80000000AAAAAAAA-16 37.8ns ± 1% 36.4ns ± 0% -3.70% (p=0.000 n=10+9)
PallocBitsSummarize/UnpackedAAAAAAAA00000001-16 41.8ns ± 1% 39.9ns ± 0% -4.68% (p=0.000 n=9+10)
PallocBitsSummarize/UnpackedBBBBBBBBBBBBBBBB-16 18.3ns ± 0% 18.3ns ± 0% ~ (p=0.781 n=10+10)
PallocBitsSummarize/Unpacked80000000BBBBBBBB-16 38.8ns ± 1% 38.1ns ± 0% -1.78% (p=0.000 n=9+10)
PallocBitsSummarize/UnpackedBBBBBBBB00000001-16 37.5ns ± 0% 36.1ns ± 1% -3.88% (p=0.000 n=8+10)
PallocBitsSummarize/UnpackedCCCCCCCCCCCCCCCC-16 21.8ns ± 0% 21.9ns ± 0% +0.20% (p=0.018 n=10+9)
PallocBitsSummarize/Unpacked4444444444444444-16 21.8ns ± 0% 21.9ns ± 0% +0.20% (p=0.029 n=10+9)
PallocBitsSummarize/Unpacked4040404040404040-16 26.5ns ± 0% 26.5ns ± 0% -0.24% (p=0.001 n=9+10)
PallocBitsSummarize/Unpacked4000400040004000-16 33.4ns ± 1% 31.3ns ± 0% -6.20% (p=0.000 n=9+10)
PallocBitsSummarize/Unpacked1000404044CCAAFF-16 36.4ns ± 1% 35.9ns ± 0% -1.50% (p=0.000 n=10+10)
FindBitRange64/Pattern00Size2-16 0.34ns ± 1% 0.35ns ± 1% +3.80% (p=0.000 n=10+9)
FindBitRange64/Pattern00Size8-16 0.70ns ± 1% 0.70ns ± 0% -0.68% (p=0.000 n=10+10)
FindBitRange64/Pattern00Size32-16 0.70ns ± 1% 0.69ns ± 0% -0.86% (p=0.001 n=10+8)
FindBitRange64/PatternFFFFFFFFFFFFFFFFSize2-16 0.34ns ± 1% 0.35ns ± 1% +4.45% (p=0.000 n=9+8)
FindBitRange64/PatternFFFFFFFFFFFFFFFFSize8-16 1.54ns ± 0% 1.54ns ± 1% ~ (p=0.914 n=9+9)
FindBitRange64/PatternFFFFFFFFFFFFFFFFSize32-16 2.78ns ± 0% 2.78ns ± 0% ~ (p=0.295 n=9+10)
FindBitRange64/PatternAASize2-16 0.34ns ± 2% 0.35ns ± 2% +4.61% (p=0.000 n=10+10)
FindBitRange64/PatternAASize8-16 0.70ns ± 1% 0.70ns ± 1% -0.82% (p=0.005 n=10+10)
FindBitRange64/PatternAASize32-16 0.70ns ± 1% 0.70ns ± 0% -0.73% (p=0.003 n=10+9)
FindBitRange64/PatternAAAAAAAAAAAAAAAASize2-16 0.34ns ± 2% 0.35ns ± 2% +3.94% (p=0.000 n=10+10)
FindBitRange64/PatternAAAAAAAAAAAAAAAASize8-16 0.70ns ± 1% 0.70ns ± 1% -0.67% (p=0.025 n=10+10)
FindBitRange64/PatternAAAAAAAAAAAAAAAASize32-16 0.70ns ± 1% 0.70ns ± 1% ~ (p=0.118 n=9+10)
FindBitRange64/Pattern80000000AAAAAAAASize2-16 0.34ns ± 1% 0.35ns ± 2% +3.72% (p=0.000 n=10+9)
FindBitRange64/Pattern80000000AAAAAAAASize8-16 0.70ns ± 1% 0.70ns ± 0% ~ (p=0.102 n=10+10)
FindBitRange64/Pattern80000000AAAAAAAASize32-16 0.70ns ± 1% 0.70ns ± 1% -0.55% (p=0.011 n=10+10)
FindBitRange64/PatternAAAAAAAA00000001Size2-16 0.34ns ± 2% 0.35ns ± 1% +3.83% (p=0.000 n=10+9)
FindBitRange64/PatternAAAAAAAA00000001Size8-16 0.70ns ± 1% 0.70ns ± 1% ~ (p=0.065 n=10+10)
FindBitRange64/PatternAAAAAAAA00000001Size32-16 0.70ns ± 1% 0.70ns ± 1% -0.95% (p=0.002 n=10+10)
FindBitRange64/PatternBBBBBBBBBBBBBBBBSize2-16 0.34ns ± 0% 0.35ns ± 1% +4.12% (p=0.000 n=8+10)
FindBitRange64/PatternBBBBBBBBBBBBBBBBSize8-16 1.24ns ± 0% 1.23ns ± 0% -0.30% (p=0.002 n=10+9)
FindBitRange64/PatternBBBBBBBBBBBBBBBBSize32-16 1.24ns ± 0% 1.24ns ± 0% -0.17% (p=0.023 n=9+10)
FindBitRange64/Pattern80000000BBBBBBBBSize2-16 0.34ns ± 1% 0.35ns ± 2% +4.82% (p=0.000 n=9+10)
FindBitRange64/Pattern80000000BBBBBBBBSize8-16 1.24ns ± 1% 1.24ns ± 0% ~ (p=0.063 n=10+10)
FindBitRange64/Pattern80000000BBBBBBBBSize32-16 1.24ns ± 0% 1.24ns ± 0% ~ (p=0.164 n=9+10)
FindBitRange64/PatternBBBBBBBB00000001Size2-16 0.34ns ± 1% 0.35ns ± 1% +4.38% (p=0.000 n=8+10)
FindBitRange64/PatternBBBBBBBB00000001Size8-16 1.24ns ± 1% 1.24ns ± 0% ~ (p=0.052 n=10+10)
FindBitRange64/PatternBBBBBBBB00000001Size32-16 1.24ns ± 0% 1.23ns ± 0% -0.40% (p=0.000 n=10+10)
FindBitRange64/PatternCCCCCCCCCCCCCCCCSize2-16 0.34ns ± 0% 0.35ns ± 2% +3.96% (p=0.000 n=9+10)
FindBitRange64/PatternCCCCCCCCCCCCCCCCSize8-16 1.24ns ± 0% 1.23ns ± 0% -0.30% (p=0.000 n=10+9)
FindBitRange64/PatternCCCCCCCCCCCCCCCCSize32-16 1.24ns ± 0% 1.24ns ± 1% ~ (p=0.284 n=10+10)
FindBitRange64/Pattern4444444444444444Size2-16 0.34ns ± 1% 0.35ns ± 1% +3.91% (p=0.000 n=9+9)
FindBitRange64/Pattern4444444444444444Size8-16 0.70ns ± 1% 0.70ns ± 1% ~ (p=0.617 n=10+10)
FindBitRange64/Pattern4444444444444444Size32-16 0.70ns ± 1% 0.70ns ± 1% -0.60% (p=0.006 n=10+10)
FindBitRange64/Pattern4040404040404040Size2-16 0.34ns ± 2% 0.35ns ± 2% +3.67% (p=0.000 n=10+10)
FindBitRange64/Pattern4040404040404040Size8-16 0.70ns ± 2% 0.70ns ± 1% -0.87% (p=0.014 n=10+10)
FindBitRange64/Pattern4040404040404040Size32-16 0.70ns ± 1% 0.70ns ± 1% ~ (p=0.256 n=10+10)
FindBitRange64/Pattern4000400040004000Size2-16 0.34ns ± 2% 0.35ns ± 3% +4.71% (p=0.000 n=10+10)
FindBitRange64/Pattern4000400040004000Size8-16 0.70ns ± 1% 0.70ns ± 1% ~ (p=0.393 n=10+10)
FindBitRange64/Pattern4000400040004000Size32-16 0.70ns ± 1% 0.70ns ± 1% -0.86% (p=0.014 n=10+10)
NetpollBreak-16 1.49µs ± 1% 1.50µs ± 3% ~ (p=0.181 n=8+10)
Syscall-16 3.68ns ± 1% 3.66ns ± 2% ~ (p=0.148 n=10+10)
SyscallWork-16 5.15ns ± 1% 5.13ns ± 0% ~ (p=0.188 n=10+9)
SyscallExcess-16 3.89ns ± 2% 3.83ns ± 1% -1.52% (p=0.001 n=10+10)
SyscallExcessWork-16 5.34ns ± 1% 5.31ns ± 0% -0.64% (p=0.000 n=10+9)
PingPongHog-16 397ns ± 7% 394ns ±11% ~ (p=0.912 n=10+10)
StackGrowth-16 67.9ns ± 0% 68.8ns ± 0% +1.28% (p=0.000 n=10+8)
StackGrowthDeep-16 7.70µs ± 1% 8.48µs ± 2% +10.06% (p=0.000 n=9+10)
CreateGoroutines-16 124ns ± 1% 124ns ± 1% ~ (p=0.254 n=10+10)
CreateGoroutinesParallel-16 25.7ns ± 1% 27.6ns ± 2% +7.51% (p=0.000 n=10+10)
CreateGoroutinesCapture-16 823ns ± 1% 821ns ± 2% ~ (p=0.699 n=10+10)
CreateGoroutinesSingle-16 175ns ± 3% 172ns ± 3% -1.90% (p=0.011 n=10+10)
ClosureCall-16 0.11ns ± 7% 0.12ns ± 3% ~ (p=0.842 n=9+10)
WakeupParallelSpinning/0s-16 11.4µs ± 0% 11.4µs ± 0% ~ (p=0.325 n=9+10)
WakeupParallelSpinning/1µs-16 15.4µs ± 0% 15.4µs ± 1% ~ (p=0.955 n=10+10)
WakeupParallelSpinning/2µs-16 18.7µs ± 2% 18.9µs ± 2% ~ (p=0.052 n=10+10)
WakeupParallelSpinning/5µs-16 30.7µs ± 0% 30.7µs ± 0% -0.03% (p=0.003 n=10+10)
WakeupParallelSpinning/10µs-16 48.8µs ± 0% 48.8µs ± 0% ~ (p=0.670 n=10+10)
WakeupParallelSpinning/20µs-16 90.8µs ± 0% 90.8µs ± 0% -0.02% (p=0.004 n=10+10)
WakeupParallelSpinning/50µs-16 211µs ± 0% 211µs ± 0% ~ (p=0.194 n=10+10)
WakeupParallelSpinning/100µs-16 323µs ± 0% 323µs ± 0% ~ (p=1.000 n=10+9)
WakeupParallelSyscall/0s-16 118µs ± 0% 118µs ± 0% ~ (p=0.447 n=10+9)
WakeupParallelSyscall/1µs-16 119µs ± 2% 119µs ± 1% ~ (p=0.604 n=10+9)
WakeupParallelSyscall/2µs-16 120µs ± 1% 121µs ± 3% ~ (p=0.263 n=8+10)
WakeupParallelSyscall/5µs-16 126µs ± 2% 126µs ± 2% ~ (p=0.510 n=10+9)
WakeupParallelSyscall/10µs-16 136µs ± 1% 137µs ± 1% ~ (p=0.095 n=9+10)
WakeupParallelSyscall/20µs-16 156µs ± 2% 157µs ± 3% ~ (p=0.604 n=10+9)
WakeupParallelSyscall/50µs-16 221µs ± 1% 220µs ± 1% ~ (p=0.063 n=10+10)
WakeupParallelSyscall/100µs-16 326µs ± 0% 325µs ± 0% -0.26% (p=0.003 n=9+10)
Matmult-16 0.67ns ± 2% 0.66ns ± 2% ~ (p=0.256 n=10+10)
Fastrand-16 0.08ns ±11% 0.08ns ±13% ~ (p=0.661 n=9+10)
Fastrand64-16 0.08ns ±11% 0.08ns ± 6% ~ (p=0.631 n=10+10)
FastrandHashiter-16 1.76ns ± 1% 1.76ns ± 1% ~ (p=0.854 n=8+8)
Fastrandn/2-16 0.86ns ± 1% 0.86ns ± 1% +1.09% (p=0.000 n=10+9)
Fastrandn/3-16 0.85ns ± 1% 0.86ns ± 1% +1.23% (p=0.001 n=10+10)
Fastrandn/4-16 0.85ns ± 1% 0.87ns ± 2% +1.60% (p=0.000 n=10+10)
Fastrandn/5-16 0.85ns ± 1% 0.86ns ± 1% +1.05% (p=0.000 n=10+10)
IfaceCmp100-16 46.6ns ± 0% 46.1ns ± 0% -1.18% (p=0.000 n=10+10)
IfaceCmpNil100-16 26.8ns ± 0% 26.8ns ± 0% ~ (p=0.777 n=10+8)
EfaceCmpDiff-16 132ns ± 0% 130ns ± 0% -0.95% (p=0.000 n=10+9)
EfaceCmpDiffIndirect-16 209ns ± 0% 211ns ± 0% +1.14% (p=0.000 n=10+9)
Defer-16 3.40ns ± 1% 3.04ns ± 0% -10.67% (p=0.000 n=10+10)
Defer10-16 29.4ns ± 2% 27.2ns ± 3% -7.26% (p=0.000 n=10+10)
DeferMany-16 110ns ± 6% 113ns ± 2% +3.45% (p=0.017 n=9+9)
PanicRecover-16 67.6ns ± 0% 67.7ns ± 2% ~ (p=0.436 n=9+9)
GoroutineProfile/small-nil/idle-16 3.90µs ± 4% 3.86µs ± 2% ~ (p=0.305 n=10+9)
GoroutineProfile/small-nil/loaded-16 4.82µs ± 6% 4.82µs ± 4% ~ (p=0.905 n=10+9)
GoroutineProfile/small/idle-16 103µs ± 3% 102µs ± 3% ~ (p=0.113 n=9+9)
GoroutineProfile/small/loaded-16 432µs ± 5% 440µs ±13% ~ (p=0.604 n=9+10)
GoroutineProfile/large-nil/idle-16 3.86µs ± 3% 3.82µs ± 3% ~ (p=0.210 n=10+10)
GoroutineProfile/large-nil/loaded-16 4.90µs ± 2% 4.90µs ± 5% ~ (p=0.780 n=10+9)
GoroutineProfile/large/idle-16 2.58ms ± 1% 2.52ms ± 1% -2.38% (p=0.000 n=10+10)
GoroutineProfile/large/loaded-16 8.62ms ± 9% 8.90ms ±11% ~ (p=0.400 n=9+10)
GoroutineProfile/sparse-nil/idle-16 3.85µs ± 4% 3.81µs ± 3% ~ (p=0.470 n=10+10)
GoroutineProfile/sparse-nil/loaded-16 4.82µs ± 4% 4.69µs ± 5% ~ (p=0.052 n=10+10)
GoroutineProfile/sparse/idle-16 102µs ± 4% 102µs ± 2% ~ (p=0.497 n=10+9)
GoroutineProfile/sparse/loaded-16 438µs ± 7% 437µs ± 6% ~ (p=0.796 n=10+10)
RWMutexUncontended-16 6.79ns ± 0% 6.78ns ± 0% ~ (p=0.228 n=10+8)
RWMutexWrite100-16 85.4ns ± 0% 87.1ns ± 0% +2.00% (p=0.000 n=10+8)
RWMutexWrite10-16 168ns ±25% 152ns ±11% ~ (p=0.063 n=10+10)
RWMutexWorkWrite100-16 106ns ± 0% 106ns ± 3% ~ (p=0.136 n=10+10)
RWMutexWorkWrite10-16 567ns ± 3% 571ns ± 1% ~ (p=0.326 n=10+9)
SemTable/OneAddrCollision/n=1000-16 15.9µs ± 1% 16.0µs ± 1% +0.50% (p=0.031 n=9+9)
SemTable/ManyAddrCollision/n=1000-16 56.2µs ± 1% 56.8µs ± 1% +1.06% (p=0.000 n=10+10)
SemTable/OneAddrCollision/n=2000-16 32.6µs ± 2% 32.9µs ± 4% ~ (p=0.156 n=10+9)
SemTable/ManyAddrCollision/n=2000-16 118µs ± 0% 119µs ± 0% +0.75% (p=0.000 n=9+10)
SemTable/OneAddrCollision/n=4000-16 65.3µs ± 1% 65.6µs ± 3% ~ (p=0.497 n=9+10)
SemTable/ManyAddrCollision/n=4000-16 245µs ± 0% 248µs ± 2% +1.36% (p=0.000 n=9+10)
SemTable/OneAddrCollision/n=8000-16 131µs ± 1% 130µs ± 1% -1.01% (p=0.002 n=9+10)
SemTable/ManyAddrCollision/n=8000-16 503µs ± 1% 508µs ± 0% +0.97% (p=0.000 n=10+10)
MakeSliceCopy/mallocmove/Byte-16 67.6ns ± 1% 64.1ns ± 2% -5.20% (p=0.000 n=10+10)
MakeSliceCopy/mallocmove/Int-16 65.0ns ± 7% 61.7ns ± 4% -5.08% (p=0.009 n=10+10)
MakeSliceCopy/mallocmove/Ptr-16 88.1ns ± 1% 79.9ns ± 1% -9.29% (p=0.000 n=10+10)
MakeSliceCopy/makecopy/Byte-16 65.2ns ± 6% 63.4ns ± 0% ~ (p=0.500 n=10+8)
MakeSliceCopy/makecopy/Int-16 63.2ns ± 1% 64.1ns ± 1% +1.34% (p=0.001 n=9+9)
MakeSliceCopy/makecopy/Ptr-16 88.1ns ± 1% 80.1ns ± 1% -9.09% (p=0.000 n=10+10)
MakeSliceCopy/nilappend/Byte-16 69.8ns ± 1% 65.7ns ± 3% -5.80% (p=0.000 n=10+10)
MakeSliceCopy/nilappend/Int-16 69.6ns ± 2% 67.2ns ± 1% -3.50% (p=0.000 n=10+9)
MakeSliceCopy/nilappend/Ptr-16 91.5ns ± 1% 83.8ns ± 1% -8.42% (p=0.000 n=9+10)
MakeSlice/Byte-16 6.64ns ± 3% 6.58ns ± 2% ~ (p=0.393 n=10+10)
MakeSlice/Int16-16 8.60ns ± 1% 8.38ns ± 3% -2.48% (p=0.001 n=9+10)
MakeSlice/Int-16 17.7ns ± 3% 16.9ns ± 1% -4.67% (p=0.000 n=10+9)
MakeSlice/Ptr-16 24.0ns ± 3% 23.3ns ± 2% -3.25% (p=0.000 n=10+9)
MakeSlice/Struct/24-16 34.1ns ± 1% 32.0ns ± 1% -6.11% (p=0.000 n=10+10)
MakeSlice/Struct/32-16 39.1ns ± 4% 38.2ns ± 1% ~ (p=0.829 n=10+8)
MakeSlice/Struct/40-16 47.0ns ± 5% 43.0ns ± 2% -8.55% (p=0.000 n=10+9)
GrowSlice/Byte-16 15.3ns ± 3% 15.0ns ± 2% -1.75% (p=0.005 n=9+9)
GrowSlice/Int16-16 18.9ns ± 2% 18.4ns ± 2% -2.71% (p=0.000 n=10+9)
GrowSlice/Int-16 33.9ns ± 1% 32.2ns ± 1% -4.89% (p=0.000 n=10+9)
GrowSlice/Ptr-16 45.3ns ± 2% 43.5ns ± 1% -4.12% (p=0.000 n=10+10)
GrowSlice/Struct/24-16 61.9ns ± 2% 60.0ns ± 4% -3.10% (p=0.002 n=10+10)
GrowSlice/Struct/32-16 79.9ns ± 2% 72.3ns ± 3% -9.58% (p=0.000 n=8+10)
GrowSlice/Struct/40-16 97.1ns ± 7% 88.8ns ± 5% -8.49% (p=0.000 n=10+10)
ExtendSlice/IntSlice-16 21.1ns ± 2% 20.3ns ± 2% -3.71% (p=0.000 n=10+10)
ExtendSlice/PointerSlice-16 26.8ns ± 2% 26.3ns ± 2% -1.86% (p=0.004 n=10+10)
ExtendSlice/NoGrow-16 1.23ns ± 0% 1.30ns ± 1% +5.03% (p=0.000 n=10+10)
Append-16 4.58ns ± 1% 4.53ns ± 0% -1.11% (p=0.000 n=10+10)
AppendGrowByte-16 1.46ms ± 8% 1.42ms ± 7% -3.24% (p=0.035 n=10+10)
AppendGrowString-16 27.8ms ± 4% 27.2ms ± 5% ~ (p=0.052 n=10+10)
AppendSlice/1Bytes-16 1.03ns ± 1% 1.04ns ± 1% ~ (p=0.303 n=10+10)
AppendSlice/4Bytes-16 1.04ns ± 0% 1.05ns ± 0% +0.79% (p=0.000 n=9+10)
AppendSlice/7Bytes-16 1.23ns ± 1% 1.24ns ± 0% +0.45% (p=0.001 n=10+10)
AppendSlice/8Bytes-16 1.24ns ± 0% 1.24ns ± 0% ~ (p=0.183 n=10+10)
AppendSlice/15Bytes-16 1.37ns ± 1% 1.43ns ± 1% +3.88% (p=0.000 n=10+10)
AppendSlice/16Bytes-16 1.37ns ± 1% 1.42ns ± 1% +3.63% (p=0.000 n=9+10)
AppendSlice/32Bytes-16 1.44ns ± 0% 1.47ns ± 1% +1.83% (p=0.000 n=10+10)
AppendSliceLarge/1024Bytes-16 257ns ± 2% 234ns ± 1% -8.96% (p=0.000 n=8+9)
AppendSliceLarge/4096Bytes-16 871ns ± 6% 812ns ± 1% -6.80% (p=0.000 n=10+10)
AppendSliceLarge/16384Bytes-16 3.15µs ± 6% 3.04µs ± 5% ~ (p=0.052 n=10+10)
AppendSliceLarge/65536Bytes-16 10.7µs ± 7% 10.8µs ± 2% ~ (p=0.278 n=10+9)
AppendSliceLarge/262144Bytes-16 42.9µs ± 1% 39.6µs ± 5% -7.75% (p=0.000 n=9+10)
AppendSliceLarge/1048576Bytes-16 147µs ± 4% 144µs ± 4% -2.21% (p=0.035 n=10+10)
AppendStr/1Bytes-16 1.20ns ± 0% 1.20ns ± 0% ~ (p=0.755 n=10+10)
AppendStr/4Bytes-16 1.13ns ± 0% 1.14ns ± 1% +1.20% (p=0.000 n=10+10)
AppendStr/8Bytes-16 1.24ns ± 0% 1.25ns ± 0% +0.93% (p=0.000 n=10+10)
AppendStr/16Bytes-16 1.40ns ± 0% 1.42ns ± 0% +2.10% (p=0.000 n=9+10)
AppendStr/32Bytes-16 1.44ns ± 0% 1.45ns ± 0% +0.99% (p=0.000 n=10+10)
AppendSpecialCase-16 8.64ns ± 1% 8.89ns ± 2% +2.90% (p=0.000 n=10+10)
Copy/1Byte-16 1.24ns ± 1% 1.24ns ± 0% -0.28% (p=0.000 n=10+6)
Copy/1String-16 1.24ns ± 0% 1.23ns ± 0% ~ (p=0.160 n=10+10)
Copy/2Byte-16 1.24ns ± 0% 1.24ns ± 0% ~ (p=0.115 n=10+10)
Copy/2String-16 1.24ns ± 0% 1.24ns ± 1% ~ (p=0.954 n=10+10)
Copy/4Byte-16 1.24ns ± 0% 1.24ns ± 0% -0.44% (p=0.001 n=10+10)
Copy/4String-16 1.23ns ± 0% 1.23ns ± 0% ~ (p=0.081 n=10+10)
Copy/8Byte-16 1.37ns ± 0% 1.34ns ± 0% -1.79% (p=0.000 n=9+9)
Copy/8String-16 1.34ns ± 0% 1.34ns ± 0% -0.58% (p=0.000 n=9+10)
Copy/12Byte-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.149 n=9+9)
Copy/12String-16 1.44ns ± 0% 1.45ns ± 0% ~ (p=0.124 n=9+9)
Copy/16Byte-16 1.44ns ± 0% 1.44ns ± 0% -0.19% (p=0.004 n=10+9)
Copy/16String-16 1.44ns ± 0% 1.45ns ± 0% +0.30% (p=0.008 n=10+10)
Copy/32Byte-16 1.63ns ± 1% 1.62ns ± 1% -0.72% (p=0.002 n=10+10)
Copy/32String-16 1.60ns ± 1% 1.64ns ± 0% +2.23% (p=0.000 n=10+10)
Copy/128Byte-16 2.06ns ± 0% 2.06ns ± 0% ~ (p=0.757 n=9+10)
Copy/128String-16 2.07ns ± 0% 2.07ns ± 0% +0.36% (p=0.004 n=10+10)
Copy/1024Byte-16 6.07ns ± 2% 6.00ns ± 1% -1.20% (p=0.000 n=9+10)
Copy/1024String-16 6.05ns ± 0% 5.95ns ± 1% -1.54% (p=0.000 n=10+9)
AppendInPlace/NoGrow/Byte-16 288ns ± 1% 284ns ± 1% -1.58% (p=0.000 n=10+10)
AppendInPlace/NoGrow/1Ptr-16 844ns ± 1% 809ns ± 3% -4.13% (p=0.000 n=9+10)
AppendInPlace/NoGrow/2Ptr-16 1.47µs ± 1% 1.46µs ± 1% ~ (p=0.388 n=9+10)
AppendInPlace/NoGrow/3Ptr-16 1.87µs ± 7% 1.91µs ± 1% ~ (p=0.166 n=10+8)
AppendInPlace/NoGrow/4Ptr-16 2.66µs ± 1% 2.67µs ± 3% ~ (p=0.968 n=9+10)
AppendInPlace/Grow/Byte-16 126ns ± 2% 121ns ± 2% -4.06% (p=0.000 n=10+10)
AppendInPlace/Grow/1Ptr-16 132ns ± 2% 127ns ± 2% -4.28% (p=0.000 n=10+9)
AppendInPlace/Grow/2Ptr-16 196ns ± 2% 188ns ± 1% -4.20% (p=0.000 n=10+8)
AppendInPlace/Grow/3Ptr-16 264ns ± 1% 260ns ± 1% -1.51% (p=0.000 n=9+10)
AppendInPlace/Grow/4Ptr-16 297ns ± 2% 294ns ± 2% ~ (p=0.085 n=10+10)
StackCopyPtr-16 36.4ms ± 2% 36.7ms ± 2% ~ (p=0.481 n=10+10)
StackCopy-16 33.9ms ± 3% 32.6ms ± 1% -3.87% (p=0.000 n=10+8)
StackCopyNoCache-16 1.00ms ± 5% 1.01ms ± 5% ~ (p=0.143 n=10+10)
StackCopyWithStkobj-16 11.0ms ± 3% 10.9ms ± 4% ~ (p=0.579 n=10+10)
Issue18138-16 49.2µs ± 5% 49.0µs ± 4% ~ (p=1.000 n=10+9)
CompareStringEqual-16 1.39ns ± 1% 1.45ns ± 2% +3.80% (p=0.000 n=8+10)
CompareStringIdentical-16 0.55ns ± 1% 0.55ns ± 0% +0.42% (p=0.007 n=10+10)
CompareStringSameLength-16 1.03ns ± 0% 1.03ns ± 0% ~ (p=0.430 n=9+10)
CompareStringDifferentLength-16 0.11ns ± 2% 0.11ns ± 3% ~ (p=0.139 n=9+10)
CompareStringBigUnaligned-16 23.9µs ± 1% 24.0µs ± 1% ~ (p=0.370 n=9+8)
CompareStringBig-16 22.0µs ± 3% 22.2µs ± 3% ~ (p=0.243 n=9+10)
ConcatStringAndBytes-16 10.7ns ± 1% 10.0ns ± 2% -6.33% (p=0.000 n=10+10)
SliceByteToString/1-16 1.34ns ± 0% 1.34ns ± 0% ~ (p=0.057 n=10+10)
SliceByteToString/2-16 6.67ns ± 2% 6.60ns ± 3% ~ (p=0.101 n=10+10)
SliceByteToString/4-16 7.76ns ± 2% 7.56ns ± 3% -2.59% (p=0.001 n=10+10)
SliceByteToString/8-16 9.81ns ± 4% 9.57ns ± 2% -2.48% (p=0.005 n=10+10)
SliceByteToString/16-16 14.0ns ± 3% 13.7ns ± 2% -2.31% (p=0.009 n=10+10)
SliceByteToString/32-16 17.3ns ± 1% 16.7ns ± 2% -3.41% (p=0.000 n=10+10)
SliceByteToString/64-16 25.1ns ± 1% 24.1ns ± 2% -3.93% (p=0.000 n=9+10)
SliceByteToString/128-16 38.6ns ± 1% 36.5ns ± 1% -5.60% (p=0.000 n=10+10)
RuneCount/lenruneslice/ASCII-16 4.12ns ± 0% 4.11ns ± 0% ~ (p=0.382 n=10+10)
RuneCount/lenruneslice/Japanese-16 25.4ns ± 2% 25.6ns ± 2% ~ (p=0.138 n=9+10)
RuneCount/lenruneslice/MixedLength-16 17.1ns ± 0% 17.2ns ± 0% +0.59% (p=0.000 n=9+9)
RuneCount/rangeloop/ASCII-16 3.30ns ± 1% 3.29ns ± 0% ~ (p=0.267 n=10+10)
RuneCount/rangeloop/Japanese-16 20.1ns ± 1% 24.9ns ± 1% +24.31% (p=0.000 n=9+9)
RuneCount/rangeloop/MixedLength-16 16.5ns ± 1% 16.7ns ± 1% +1.34% (p=0.000 n=10+10)
RuneCount/utf8.RuneCountInString/ASCII-16 5.71ns ± 1% 5.73ns ± 2% ~ (p=0.579 n=10+10)
RuneCount/utf8.RuneCountInString/Japanese-16 22.0ns ± 6% 18.4ns ± 3% -16.41% (p=0.000 n=9+10)
RuneCount/utf8.RuneCountInString/MixedLength-16 15.0ns ± 1% 14.9ns ± 1% -1.01% (p=0.004 n=9+10)
RuneIterate/range/ASCII-16 2.69ns ± 1% 2.72ns ± 0% +0.94% (p=0.026 n=10+9)
RuneIterate/range/Japanese-16 24.5ns ± 2% 25.3ns ± 2% +3.23% (p=0.000 n=9+10)
RuneIterate/range/MixedLength-16 17.0ns ± 1% 17.1ns ± 1% +0.85% (p=0.000 n=10+10)
RuneIterate/range1/ASCII-16 2.70ns ± 1% 2.72ns ± 0% ~ (p=0.058 n=9+9)
RuneIterate/range1/Japanese-16 24.1ns ± 2% 25.2ns ± 3% +4.30% (p=0.000 n=10+10)
RuneIterate/range1/MixedLength-16 16.9ns ± 1% 17.7ns ± 0% +5.04% (p=0.000 n=10+8)
RuneIterate/range2/ASCII-16 2.84ns ± 8% 2.72ns ± 1% -4.28% (p=0.003 n=10+9)
RuneIterate/range2/Japanese-16 22.7ns ± 4% 25.2ns ± 3% +10.97% (p=0.000 n=10+10)
RuneIterate/range2/MixedLength-16 17.0ns ± 1% 17.2ns ± 0% +0.95% (p=0.000 n=10+10)
ArrayEqual-16 0.40ns ± 5% 0.35ns ± 2% -11.83% (p=0.000 n=10+10)
Func/Name-16 8.05ns ± 1% 8.09ns ± 1% +0.40% (p=0.025 n=8+10)
Func/Entry-16 1.73ns ± 1% 1.66ns ± 1% -3.93% (p=0.000 n=10+10)
Func/FileLine-16 27.5ns ± 2% 26.0ns ± 0% -5.50% (p=0.000 n=10+10)
[Geo mean] 16.7ns 15.7ns -6.08%
name old speed new speed delta
SetTypePtr-16 11.0GB/s ± 1% 11.0GB/s ± 3% ~ (p=0.684 n=10+10)
SetTypePtr8-16 15.5GB/s ± 0% 15.5GB/s ± 0% ~ (p=0.123 n=10+10)
SetTypePtr16-16 31.0GB/s ± 1% 31.1GB/s ± 0% ~ (p=0.123 n=10+10)
SetTypePtr32-16 62.1GB/s ± 0% 62.2GB/s ± 0% ~ (p=0.123 n=10+10)
SetTypePtr64-16 124GB/s ± 0% 124GB/s ± 0% ~ (p=0.684 n=10+10)
SetTypePtr126-16 146GB/s ± 0% 146GB/s ± 0% ~ (p=0.481 n=10+10)
SetTypePtr128-16 154GB/s ± 0% 154GB/s ± 0% ~ (p=0.243 n=9+10)
SetTypePtrSlice-16 151GB/s ± 1% 151GB/s ± 1% ~ (p=0.497 n=9+10)
SetTypeNode1-16 5.82GB/s ± 1% 5.82GB/s ± 0% ~ (p=0.353 n=10+10)
SetTypeNode1Slice-16 76.1GB/s ± 1% 77.0GB/s ± 1% +1.19% (p=0.000 n=10+10)
SetTypeNode8-16 19.4GB/s ± 0% 19.4GB/s ± 0% ~ (p=0.130 n=8+8)
SetTypeNode8Slice-16 113GB/s ± 0% 113GB/s ± 0% ~ (p=0.604 n=10+9)
SetTypeNode64-16 76.5GB/s ± 0% 76.5GB/s ± 0% ~ (p=0.190 n=10+10)
SetTypeNode64Slice-16 97.8GB/s ± 0% 97.7GB/s ± 0% ~ (p=0.549 n=9+10)
SetTypeNode64Dead-16 95.5GB/s ± 0% 95.7GB/s ± 0% ~ (p=0.118 n=10+6)
SetTypeNode64DeadSlice-16 112GB/s ± 0% 112GB/s ± 0% ~ (p=0.353 n=10+10)
SetTypeNode124-16 146GB/s ± 0% 146GB/s ± 0% ~ (p=0.853 n=10+10)
SetTypeNode124Slice-16 146GB/s ± 5% 149GB/s ± 0% ~ (p=0.315 n=10+10)
SetTypeNode126-16 154GB/s ± 0% 154GB/s ± 0% ~ (p=0.356 n=10+9)
SetTypeNode126Slice-16 150GB/s ± 0% 150GB/s ± 0% ~ (p=0.095 n=9+10)
SetTypeNode128-16 107GB/s ± 0% 107GB/s ± 0% +0.31% (p=0.003 n=9+10)
SetTypeNode128Slice-16 119GB/s ± 0% 120GB/s ± 0% ~ (p=0.156 n=10+9)
SetTypeNode130-16 108GB/s ± 0% 108GB/s ± 0% +0.33% (p=0.002 n=10+10)
SetTypeNode130Slice-16 119GB/s ± 0% 119GB/s ± 0% ~ (p=0.739 n=10+10)
SetTypeNode1024-16 160GB/s ± 0% 159GB/s ± 1% ~ (p=0.113 n=9+9)
SetTypeNode1024Slice-16 144GB/s ± 0% 144GB/s ± 0% ~ (p=0.063 n=10+10)
Hash5-16 2.59GB/s ± 1% 2.49GB/s ± 0% -3.90% (p=0.000 n=10+9)
Hash16-16 7.85GB/s ± 1% 7.23GB/s ± 1% -7.92% (p=0.000 n=10+10)
Hash64-16 24.0GB/s ± 0% 23.9GB/s ± 0% ~ (p=0.190 n=9+9)
Hash1024-16 62.4GB/s ± 0% 62.3GB/s ± 0% -0.16% (p=0.017 n=9+10)
Hash65536-16 74.0GB/s ± 0% 74.0GB/s ± 0% ~ (p=0.796 n=10+10)
Memmove/1-16 1.08GB/s ± 0% 1.08GB/s ± 0% -0.21% (p=0.035 n=10+10)
Memmove/2-16 2.16GB/s ± 0% 2.15GB/s ± 0% ~ (p=0.105 n=10+10)
Memmove/3-16 3.24GB/s ± 1% 3.22GB/s ± 1% -0.49% (p=0.004 n=10+10)
Memmove/4-16 3.89GB/s ± 0% 3.89GB/s ± 0% ~ (p=0.218 n=10+10)
Memmove/5-16 4.42GB/s ± 0% 4.42GB/s ± 0% ~ (p=0.075 n=10+10)
Memmove/6-16 5.31GB/s ± 0% 5.29GB/s ± 1% ~ (p=0.218 n=10+10)
Memmove/7-16 6.19GB/s ± 0% 6.18GB/s ± 0% -0.15% (p=0.035 n=10+9)
Memmove/8-16 7.07GB/s ± 0% 7.07GB/s ± 0% ~ (p=0.684 n=10+10)
Memmove/9-16 7.22GB/s ± 0% 6.68GB/s ± 0% -7.37% (p=0.000 n=10+10)
Memmove/10-16 8.02GB/s ± 0% 7.43GB/s ± 0% -7.38% (p=0.000 n=9+9)
Memmove/11-16 8.83GB/s ± 0% 8.13GB/s ± 0% -7.87% (p=0.000 n=10+9)
Memmove/12-16 9.62GB/s ± 0% 8.89GB/s ± 1% -7.61% (p=0.000 n=10+10)
Memmove/13-16 10.4GB/s ± 0% 9.7GB/s ± 0% -7.20% (p=0.000 n=10+10)
Memmove/14-16 11.2GB/s ± 0% 10.4GB/s ± 1% -7.64% (p=0.000 n=10+9)
Memmove/15-16 12.0GB/s ± 0% 11.1GB/s ± 0% -7.46% (p=0.000 n=10+9)
Memmove/16-16 12.8GB/s ± 0% 11.8GB/s ± 1% -7.67% (p=0.000 n=10+10)
Memmove/32-16 23.8GB/s ± 0% 23.5GB/s ± 1% -1.20% (p=0.000 n=10+10)
Memmove/64-16 44.2GB/s ± 0% 39.1GB/s ± 0% -11.56% (p=0.000 n=10+9)
Memmove/128-16 68.7GB/s ± 0% 63.2GB/s ± 0% -7.95% (p=0.000 n=10+10)
Memmove/256-16 104GB/s ± 0% 103GB/s ± 0% -1.13% (p=0.000 n=10+10)
Memmove/512-16 129GB/s ± 1% 129GB/s ± 0% ~ (p=0.165 n=10+10)
Memmove/1024-16 174GB/s ± 1% 174GB/s ± 1% ~ (p=0.258 n=9+9)
Memmove/2048-16 213GB/s ± 1% 213GB/s ± 2% ~ (p=0.963 n=8+9)
Memmove/4096-16 250GB/s ± 1% 240GB/s ± 4% -3.83% (p=0.006 n=9+9)
MemmoveOverlap/32-16 19.8GB/s ± 1% 19.1GB/s ± 1% -3.40% (p=0.000 n=10+10)
MemmoveOverlap/64-16 39.0GB/s ± 0% 38.8GB/s ± 0% -0.28% (p=0.001 n=9+9)
MemmoveOverlap/128-16 62.2GB/s ± 0% 62.1GB/s ± 0% ~ (p=0.063 n=10+10)
MemmoveOverlap/256-16 96.0GB/s ± 0% 95.8GB/s ± 0% -0.26% (p=0.009 n=10+10)
MemmoveOverlap/512-16 83.6GB/s ±16% 89.2GB/s ± 0% ~ (p=0.696 n=10+8)
MemmoveOverlap/1024-16 141GB/s ± 0% 140GB/s ± 0% -0.28% (p=0.006 n=8+10)
MemmoveOverlap/2048-16 172GB/s ± 0% 171GB/s ± 1% -0.38% (p=0.008 n=9+9)
MemmoveOverlap/4096-16 176GB/s ± 1% 177GB/s ± 1% +0.84% (p=0.001 n=8+10)
MemmoveUnalignedDst/1-16 806MB/s ± 0% 802MB/s ± 1% -0.52% (p=0.023 n=10+10)
MemmoveUnalignedDst/2-16 1.62GB/s ± 0% 1.62GB/s ± 0% -0.11% (p=0.041 n=10+10)
MemmoveUnalignedDst/3-16 2.43GB/s ± 0% 2.43GB/s ± 0% -0.14% (p=0.006 n=9+9)
MemmoveUnalignedDst/4-16 3.24GB/s ± 0% 3.23GB/s ± 1% -0.36% (p=0.007 n=10+10)
MemmoveUnalignedDst/5-16 3.71GB/s ± 0% 3.71GB/s ± 0% ~ (p=0.063 n=10+10)
MemmoveUnalignedDst/6-16 4.48GB/s ± 0% 4.47GB/s ± 0% ~ (p=0.912 n=10+10)
MemmoveUnalignedDst/7-16 5.22GB/s ± 0% 5.22GB/s ± 0% ~ (p=1.000 n=10+10)
MemmoveUnalignedDst/8-16 5.95GB/s ± 0% 5.93GB/s ± 1% -0.40% (p=0.023 n=10+10)
MemmoveUnalignedDst/9-16 6.24GB/s ± 0% 6.24GB/s ± 0% ~ (p=0.912 n=10+10)
MemmoveUnalignedDst/10-16 6.94GB/s ± 0% 6.94GB/s ± 0% ~ (p=0.353 n=10+10)
MemmoveUnalignedDst/11-16 7.64GB/s ± 0% 7.63GB/s ± 0% ~ (p=0.393 n=10+10)
MemmoveUnalignedDst/12-16 8.33GB/s ± 0% 8.33GB/s ± 0% ~ (p=0.971 n=10+10)
MemmoveUnalignedDst/13-16 9.02GB/s ± 0% 9.01GB/s ± 0% ~ (p=0.436 n=10+10)
MemmoveUnalignedDst/14-16 9.71GB/s ± 0% 9.71GB/s ± 0% ~ (p=0.280 n=10+10)
MemmoveUnalignedDst/15-16 10.4GB/s ± 0% 10.4GB/s ± 1% ~ (p=0.853 n=10+10)
MemmoveUnalignedDst/16-16 11.1GB/s ± 0% 11.1GB/s ± 0% ~ (p=0.089 n=10+10)
MemmoveUnalignedDst/32-16 19.7GB/s ± 1% 19.6GB/s ± 0% ~ (p=0.075 n=10+10)
MemmoveUnalignedDst/64-16 38.9GB/s ± 0% 38.8GB/s ± 0% ~ (p=0.218 n=10+10)
MemmoveUnalignedDst/128-16 62.1GB/s ± 0% 62.1GB/s ± 0% ~ (p=0.549 n=10+9)
MemmoveUnalignedDst/256-16 69.4GB/s ± 0% 69.3GB/s ± 0% ~ (p=0.105 n=10+10)
MemmoveUnalignedDst/512-16 124GB/s ± 1% 124GB/s ± 0% ~ (p=0.762 n=10+8)
MemmoveUnalignedDst/1024-16 136GB/s ± 0% 136GB/s ± 1% ~ (p=0.666 n=9+9)
MemmoveUnalignedDst/2048-16 159GB/s ± 0% 159GB/s ± 0% ~ (p=0.574 n=8+8)
MemmoveUnalignedDst/4096-16 161GB/s ± 0% 161GB/s ± 0% ~ (p=1.000 n=9+9)
MemmoveUnalignedDstOverlap/32-16 7.84GB/s ± 0% 7.83GB/s ± 0% ~ (p=0.353 n=10+10)
MemmoveUnalignedDstOverlap/64-16 14.0GB/s ± 0% 14.0GB/s ± 0% ~ (p=0.661 n=10+9)
MemmoveUnalignedDstOverlap/128-16 27.4GB/s ± 0% 27.4GB/s ± 0% ~ (p=0.353 n=10+10)
MemmoveUnalignedDstOverlap/256-16 50.4GB/s ± 0% 50.4GB/s ± 0% ~ (p=0.156 n=10+9)
MemmoveUnalignedDstOverlap/512-16 60.7GB/s ± 4% 62.5GB/s ± 0% +3.07% (p=0.022 n=10+9)
MemmoveUnalignedDstOverlap/1024-16 107GB/s ± 0% 107GB/s ± 0% ~ (p=0.234 n=8+8)
MemmoveUnalignedDstOverlap/2048-16 146GB/s ± 0% 146GB/s ± 1% ~ (p=0.182 n=10+9)
MemmoveUnalignedDstOverlap/4096-16 155GB/s ± 0% 155GB/s ± 0% ~ (p=0.400 n=10+9)
MemmoveUnalignedSrc/1-16 882MB/s ± 0% 884MB/s ± 1% +0.24% (p=0.033 n=10+9)
MemmoveUnalignedSrc/2-16 1.76GB/s ± 1% 1.77GB/s ± 0% +0.27% (p=0.028 n=10+9)
MemmoveUnalignedSrc/3-16 2.43GB/s ± 0% 2.43GB/s ± 0% +0.26% (p=0.027 n=9+10)
MemmoveUnalignedSrc/4-16 3.24GB/s ± 0% 3.24GB/s ± 1% ~ (p=0.079 n=9+10)
MemmoveUnalignedSrc/5-16 3.73GB/s ± 0% 3.73GB/s ± 1% ~ (p=0.829 n=8+10)
MemmoveUnalignedSrc/6-16 4.47GB/s ± 0% 4.49GB/s ± 0% +0.39% (p=0.000 n=10+10)
MemmoveUnalignedSrc/7-16 5.22GB/s ± 0% 5.23GB/s ± 0% ~ (p=0.280 n=10+10)
MemmoveUnalignedSrc/8-16 5.95GB/s ± 0% 5.98GB/s ± 0% +0.39% (p=0.001 n=10+9)
MemmoveUnalignedSrc/9-16 6.24GB/s ± 0% 6.25GB/s ± 0% ~ (p=0.549 n=10+9)
MemmoveUnalignedSrc/10-16 6.93GB/s ± 0% 6.94GB/s ± 0% ~ (p=0.604 n=10+9)
MemmoveUnalignedSrc/11-16 7.63GB/s ± 0% 7.63GB/s ± 1% ~ (p=0.353 n=10+10)
MemmoveUnalignedSrc/12-16 8.32GB/s ± 0% 8.32GB/s ± 0% ~ (p=0.218 n=10+10)
MemmoveUnalignedSrc/13-16 9.02GB/s ± 0% 9.00GB/s ± 1% ~ (p=0.684 n=10+10)
MemmoveUnalignedSrc/14-16 9.71GB/s ± 0% 9.71GB/s ± 0% ~ (p=0.739 n=10+10)
MemmoveUnalignedSrc/15-16 10.4GB/s ± 0% 10.4GB/s ± 0% ~ (p=0.353 n=10+10)
MemmoveUnalignedSrc/16-16 11.1GB/s ± 1% 11.1GB/s ± 0% ~ (p=0.579 n=10+10)
MemmoveUnalignedSrc/32-16 20.0GB/s ± 1% 20.0GB/s ± 0% ~ (p=0.631 n=10+10)
MemmoveUnalignedSrc/64-16 38.8GB/s ± 0% 38.8GB/s ± 0% ~ (p=0.579 n=10+10)
MemmoveUnalignedSrc/128-16 61.2GB/s ± 0% 61.2GB/s ± 0% ~ (p=0.780 n=10+9)
MemmoveUnalignedSrc/256-16 94.8GB/s ± 0% 92.2GB/s ± 1% -2.73% (p=0.000 n=10+10)
MemmoveUnalignedSrc/512-16 119GB/s ± 0% 119GB/s ± 0% +0.26% (p=0.027 n=8+9)
MemmoveUnalignedSrc/1024-16 141GB/s ± 0% 142GB/s ± 1% +1.07% (p=0.000 n=8+10)
MemmoveUnalignedSrc/2048-16 157GB/s ± 0% 157GB/s ± 0% ~ (p=0.167 n=9+8)
MemmoveUnalignedSrc/4096-16 161GB/s ± 0% 162GB/s ± 1% ~ (p=0.063 n=10+10)
MemmoveUnalignedSrcOverlap/32-16 7.93GB/s ± 0% 7.88GB/s ± 0% -0.63% (p=0.000 n=9+10)
MemmoveUnalignedSrcOverlap/64-16 15.5GB/s ± 0% 15.5GB/s ± 0% ~ (p=0.529 n=10+10)
MemmoveUnalignedSrcOverlap/128-16 28.3GB/s ± 0% 28.3GB/s ± 0% ~ (p=0.218 n=10+10)
MemmoveUnalignedSrcOverlap/256-16 41.5GB/s ± 0% 41.6GB/s ± 0% +0.35% (p=0.000 n=10+9)
MemmoveUnalignedSrcOverlap/512-16 68.9GB/s ± 0% 68.8GB/s ± 0% ~ (p=0.541 n=9+8)
MemmoveUnalignedSrcOverlap/1024-16 115GB/s ± 0% 115GB/s ± 0% ~ (p=0.382 n=8+8)
MemmoveUnalignedSrcOverlap/2048-16 155GB/s ± 0% 144GB/s ±18% ~ (p=0.101 n=8+10)
MemmoveUnalignedSrcOverlap/4096-16 160GB/s ± 0% 160GB/s ± 1% ~ (p=0.605 n=9+9)
Memclr/5-16 5.81GB/s ± 1% 5.80GB/s ± 2% ~ (p=0.546 n=9+9)
Memclr/16-16 15.5GB/s ± 0% 15.4GB/s ± 0% -0.32% (p=0.008 n=9+10)
Memclr/64-16 51.8GB/s ± 0% 50.7GB/s ± 0% -2.22% (p=0.000 n=10+10)
Memclr/256-16 113GB/s ± 0% 113GB/s ± 0% ~ (p=0.143 n=10+10)
Memclr/4096-16 239GB/s ± 1% 237GB/s ± 0% -0.87% (p=0.000 n=10+10)
Memclr/65536-16 79.8GB/s ± 0% 79.7GB/s ± 0% ~ (p=0.529 n=10+10)
Memclr/1M-16 74.6GB/s ± 1% 74.7GB/s ± 1% ~ (p=0.529 n=10+10)
Memclr/4M-16 48.7GB/s ± 1% 48.8GB/s ± 0% ~ (p=0.123 n=10+10)
Memclr/8M-16 48.2GB/s ± 2% 48.6GB/s ± 0% ~ (p=0.408 n=10+8)
Memclr/16M-16 43.6GB/s ± 4% 43.3GB/s ± 0% ~ (p=0.173 n=10+8)
Memclr/64M-16 30.7GB/s ± 0% 30.7GB/s ± 0% ~ (p=0.113 n=10+9)
GoMemclr/5-16 6.07GB/s ± 0% 6.08GB/s ± 0% ~ (p=0.367 n=9+10)
GoMemclr/16-16 15.6GB/s ± 0% 15.6GB/s ± 0% -0.22% (p=0.004 n=9+9)
GoMemclr/64-16 56.1GB/s ± 0% 56.1GB/s ± 0% ~ (p=0.968 n=10+9)
GoMemclr/256-16 125GB/s ± 0% 124GB/s ± 0% ~ (p=0.912 n=10+10)
MemclrRange/1K_2K-16 210GB/s ± 0% 224GB/s ± 1% +6.81% (p=0.000 n=10+10)
MemclrRange/2K_8K-16 228GB/s ± 0% 228GB/s ± 0% ~ (p=0.684 n=10+10)
MemclrRange/4K_16K-16 279GB/s ± 0% 279GB/s ± 0% ~ (p=0.780 n=9+10)
MemclrRange/160K_228K-16 80.3GB/s ± 0% 80.2GB/s ± 0% ~ (p=0.165 n=10+10)
Copy/1Byte-16 808MB/s ± 1% 810MB/s ± 0% +0.28% (p=0.000 n=10+8)
Copy/1String-16 810MB/s ± 0% 811MB/s ± 0% ~ (p=0.105 n=10+10)
Copy/2Byte-16 1.62GB/s ± 0% 1.62GB/s ± 0% ~ (p=0.182 n=10+9)
Copy/2String-16 1.62GB/s ± 1% 1.62GB/s ± 1% ~ (p=1.000 n=10+10)
Copy/4Byte-16 3.22GB/s ± 0% 3.24GB/s ± 0% +0.46% (p=0.000 n=10+10)
Copy/4String-16 3.24GB/s ± 0% 3.24GB/s ± 0% ~ (p=0.075 n=10+10)
Copy/8Byte-16 5.86GB/s ± 0% 5.96GB/s ± 0% +1.82% (p=0.000 n=9+9)
Copy/8String-16 5.95GB/s ± 0% 5.99GB/s ± 0% +0.59% (p=0.000 n=9+10)
Copy/12Byte-16 8.32GB/s ± 0% 8.32GB/s ± 0% ~ (p=0.190 n=9+9)
Copy/12String-16 8.31GB/s ± 0% 8.29GB/s ± 0% ~ (p=0.068 n=9+10)
Copy/16Byte-16 11.1GB/s ± 0% 11.1GB/s ± 0% +0.18% (p=0.003 n=10+9)
Copy/16String-16 11.1GB/s ± 0% 11.1GB/s ± 0% -0.31% (p=0.009 n=10+10)
Copy/32Byte-16 19.6GB/s ± 1% 19.8GB/s ± 1% +0.72% (p=0.002 n=10+10)
Copy/32String-16 20.0GB/s ± 0% 19.5GB/s ± 0% -2.19% (p=0.000 n=10+10)
Copy/128Byte-16 62.2GB/s ± 0% 62.1GB/s ± 0% ~ (p=0.661 n=9+10)
Copy/128String-16 61.9GB/s ± 0% 61.7GB/s ± 0% -0.35% (p=0.005 n=10+10)
Copy/1024Byte-16 169GB/s ± 2% 171GB/s ± 1% +1.21% (p=0.000 n=9+10)
Copy/1024String-16 169GB/s ± 0% 172GB/s ± 1% +1.57% (p=0.000 n=10+9)
CompareStringBigUnaligned-16 43.8GB/s ± 1% 43.7GB/s ± 1% ~ (p=0.370 n=9+8)
CompareStringBig-16 47.6GB/s ± 3% 47.3GB/s ± 3% ~ (p=0.243 n=9+10)
[Geo mean] 25.3GB/s 28.0GB/s +10.66%
name old p50-ns new p50-ns delta
ReadMemStatsLatency-16 98.4k ±37% 112.7k ±64% ~ (p=0.436 n=10+10)
ReadMetricsLatency-16 1.69k ± 3% 1.70k ± 2% ~ (p=0.646 n=9+10)
GoroutineProfile/small-nil/idle-16 3.75k ± 4% 3.72k ± 1% ~ (p=0.447 n=10+9)
GoroutineProfile/small-nil/loaded-16 4.33k ± 3% 4.31k ± 5% ~ (p=0.931 n=9+9)
GoroutineProfile/small/idle-16 102k ± 3% 101k ± 4% ~ (p=0.113 n=9+9)
GoroutineProfile/small/loaded-16 214k ± 3% 215k ± 6% ~ (p=0.842 n=9+10)
GoroutineProfile/large-nil/idle-16 3.70k ± 2% 3.65k ± 2% ~ (p=0.075 n=10+10)
GoroutineProfile/large-nil/loaded-16 4.36k ± 3% 4.31k ±10% ~ (p=0.631 n=10+10)
GoroutineProfile/large/idle-16 2.56M ± 1% 2.51M ± 1% -2.28% (p=0.000 n=10+10)
GoroutineProfile/large/loaded-16 6.77M ± 4% 6.85M ±19% ~ (p=0.536 n=7+10)
GoroutineProfile/sparse-nil/idle-16 3.66k ± 1% 3.64k ± 2% ~ (p=0.136 n=9+9)
GoroutineProfile/sparse-nil/loaded-16 4.25k ± 5% 4.15k ± 4% ~ (p=0.190 n=10+10)
GoroutineProfile/sparse/idle-16 102k ± 4% 101k ± 3% ~ (p=0.447 n=10+9)
GoroutineProfile/sparse/loaded-16 216k ± 4% 218k ± 4% ~ (p=0.549 n=9+10)
[Geo mean] 35.8k 35.9k +0.35%
name old p90-ns new p90-ns delta
ReadMemStatsLatency-16 983k ±310% 200k ±34% -79.62% (p=0.034 n=10+8)
ReadMetricsLatency-16 4.01k ±35% 3.75k ±17% ~ (p=0.315 n=10+10)
GoroutineProfile/small-nil/idle-16 4.21k ± 4% 4.27k ± 8% ~ (p=0.968 n=9+10)
GoroutineProfile/small-nil/loaded-16 5.58k ± 8% 5.35k ±12% ~ (p=0.190 n=10+10)
GoroutineProfile/small/idle-16 108k ± 6% 107k ± 7% ~ (p=0.497 n=9+10)
GoroutineProfile/small/loaded-16 450k ± 5% 432k ± 3% -3.92% (p=0.002 n=9+9)
GoroutineProfile/large-nil/idle-16 4.13k ± 7% 4.04k ± 2% ~ (p=0.181 n=10+8)
GoroutineProfile/large-nil/loaded-16 5.76k ± 4% 5.67k ± 5% ~ (p=0.190 n=10+10)
GoroutineProfile/large/idle-16 2.63M ± 2% 2.58M ± 1% -1.97% (p=0.000 n=10+10)
GoroutineProfile/large/loaded-16 16.9M ± 4% 17.0M ± 6% ~ (p=0.661 n=9+10)
GoroutineProfile/sparse-nil/idle-16 4.21k ±10% 4.07k ± 7% ~ (p=0.128 n=10+10)
GoroutineProfile/sparse-nil/loaded-16 5.55k ± 8% 5.38k ± 6% ~ (p=0.089 n=10+10)
GoroutineProfile/sparse/idle-16 106k ± 4% 106k ± 3% ~ (p=0.661 n=10+9)
GoroutineProfile/sparse/loaded-16 454k ± 6% 441k ± 6% -2.86% (p=0.043 n=10+10)
[Geo mean] 58.4k 51.0k -12.61%
name old p99-ns new p99-ns delta
ReadMemStatsLatency-16 983k ±310% 200k ±34% -79.62% (p=0.034 n=10+8)
ReadMetricsLatency-16 26.6k ±22% 26.6k ±17% ~ (p=0.971 n=10+10)
GoroutineProfile/small-nil/idle-16 5.27k ±12% 5.19k ±12% ~ (p=0.579 n=10+10)
GoroutineProfile/small-nil/loaded-16 7.28k ± 3% 7.04k ± 9% ~ (p=0.113 n=9+10)
GoroutineProfile/small/idle-16 114k ± 6% 113k ± 6% ~ (p=0.604 n=9+10)
GoroutineProfile/small/loaded-16 4.84M ±70% 6.06M ±131% ~ (p=0.842 n=9+10)
GoroutineProfile/large-nil/idle-16 5.38k ±19% 5.26k ±13% ~ (p=0.912 n=10+10)
GoroutineProfile/large-nil/loaded-16 7.38k ± 3% 7.23k ± 4% ~ (p=0.143 n=10+10)
GoroutineProfile/large/idle-16 2.79M ± 5% 2.72M ± 3% ~ (p=0.089 n=10+10)
GoroutineProfile/large/loaded-16 24.0M ±24% 24.9M ±29% ~ (p=0.684 n=10+10)
GoroutineProfile/sparse-nil/idle-16 5.32k ±17% 5.49k ±17% ~ (p=0.684 n=10+10)
GoroutineProfile/sparse-nil/loaded-16 7.25k ± 4% 6.97k ± 5% -3.90% (p=0.005 n=9+10)
GoroutineProfile/sparse/idle-16 113k ± 5% 112k ± 6% ~ (p=0.631 n=10+10)
GoroutineProfile/sparse/loaded-16 4.00M ±66% 4.26M ±65% ~ (p=0.489 n=9+9)
[Geo mean] 107k 97k -9.55%
name old alloc/op new alloc/op delta
NewEmptyMap-16 0.00B 0.00B ~ (all equal)
NewSmallMap-16 0.00B 0.00B ~ (all equal)
MapPopulate/1-16 0.00B 0.00B ~ (all equal)
MapPopulate/10-16 179B ± 0% 179B ± 0% ~ (all equal)
MapPopulate/100-16 3.35kB ± 0% 3.35kB ± 0% ~ (p=0.294 n=10+8)
MapPopulate/1000-16 53.3kB ± 0% 53.3kB ± 0% ~ (p=1.000 n=8+10)
MapPopulate/10000-16 428kB ± 0% 428kB ± 0% ~ (p=0.469 n=10+10)
MapPopulate/100000-16 3.62MB ± 0% 3.62MB ± 0% ~ (p=0.888 n=9+10)
MapStringConversion/32/simple-16 0.00B 0.00B ~ (all equal)
MapStringConversion/32/struct-16 0.00B 0.00B ~ (all equal)
MapStringConversion/32/array-16 0.00B 0.00B ~ (all equal)
MapStringConversion/64/simple-16 0.00B 0.00B ~ (all equal)
MapStringConversion/64/struct-16 0.00B 0.00B ~ (all equal)
MapStringConversion/64/array-16 0.00B 0.00B ~ (all equal)
NewEmptyMapHintLessThan8-16 0.00B 0.00B ~ (all equal)
NewEmptyMapHintGreaterThan8-16 1.15kB ± 0% 1.15kB ± 0% ~ (all equal)
MapAppendAssign/Int32/256-16 41.7B ±15% 44.3B ±12% ~ (p=0.106 n=10+10)
MapAppendAssign/Int32/65536-16 22.6B ± 6% 23.5B ± 6% +4.19% (p=0.025 n=9+10)
MapAppendAssign/Int64/256-16 43.5B ±10% 42.9B ± 7% ~ (p=0.757 n=10+10)
MapAppendAssign/Int64/65536-16 24.7B ± 7% 21.8B ± 6% -11.74% (p=0.000 n=10+10)
MapAppendAssign/Str/256-16 87.6B ±10% 89.2B ± 9% ~ (p=0.379 n=10+10)
MapAppendAssign/Str/65536-16 45.1B ±14% 47.2B ± 8% ~ (p=0.150 n=10+9)
CreateGoroutinesCapture-16 144B ± 0% 144B ± 0% ~ (all equal)
[Geo mean] 769B 770B +0.21%
name old allocs/op new allocs/op delta
NewEmptyMap-16 0.00 0.00 ~ (all equal)
NewSmallMap-16 0.00 0.00 ~ (all equal)
MapPopulate/1-16 0.00 0.00 ~ (all equal)
MapPopulate/10-16 1.00 ± 0% 1.00 ± 0% ~ (all equal)
MapPopulate/100-16 17.0 ± 0% 17.0 ± 0% ~ (all equal)
MapPopulate/1000-16 73.0 ± 0% 73.0 ± 0% ~ (all equal)
MapPopulate/10000-16 320 ± 0% 320 ± 0% ~ (p=1.000 n=10+10)
MapPopulate/100000-16 4.00k ± 0% 4.00k ± 0% ~ (p=0.753 n=10+10)
MapStringConversion/32/simple-16 0.00 0.00 ~ (all equal)
MapStringConversion/32/struct-16 0.00 0.00 ~ (all equal)
MapStringConversion/32/array-16 0.00 0.00 ~ (all equal)
MapStringConversion/64/simple-16 0.00 0.00 ~ (all equal)
MapStringConversion/64/struct-16 0.00 0.00 ~ (all equal)
MapStringConversion/64/array-16 0.00 0.00 ~ (all equal)
NewEmptyMapHintLessThan8-16 0.00 0.00 ~ (all equal)
NewEmptyMapHintGreaterThan8-16 1.00 ± 0% 1.00 ± 0% ~ (all equal)
MapAppendAssign/Int32/256-16 0.00 0.00 ~ (all equal)
MapAppendAssign/Int32/65536-16 0.00 0.00 ~ (all equal)
MapAppendAssign/Int64/256-16 0.00 0.00 ~ (all equal)
MapAppendAssign/Int64/65536-16 0.00 0.00 ~ (all equal)
MapAppendAssign/Str/256-16 0.00 0.00 ~ (all equal)
MapAppendAssign/Str/65536-16 0.00 0.00 ~ (all equal)
CreateGoroutinesCapture-16 5.00 ± 0% 5.00 ± 0% ~ (all equal)
[Geo mean] 26.0 26.0 +0.00%
Change-Id: I5fb03e93df8b380e04795afbdcd1c94aeeecacc6
Reviewed-on: https://go-review.googlesource.com/c/go/+/454255
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Jakub Ciolek <jakub@ciolek.dev>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
For #53821
Change-Id: I2487b8d18a4cd3fc6e64fbbb531419812bfe0f08
Reviewed-on: https://go-review.googlesource.com/c/go/+/427136
Run-TryBot: xie cui <523516579@qq.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
|
|
This CL optimizes memory moving with LDP and STP on arm64.
Benchmarks:
name old time/op new time/op delta
ClearFat7-160 1.08ns ± 0% 0.95ns ± 0% -11.41% (p=0.008 n=5+5)
ClearFat8-160 0.84ns ± 0% 0.84ns ± 0% -0.95% (p=0.008 n=5+5)
ClearFat11-160 1.08ns ± 0% 0.95ns ± 0% -11.46% (p=0.008 n=5+5)
ClearFat12-160 0.95ns ± 0% 0.95ns ± 0% ~ (p=0.063 n=4+5)
ClearFat13-160 1.08ns ± 0% 0.95ns ± 0% -11.45% (p=0.008 n=5+5)
ClearFat14-160 1.08ns ± 0% 0.95ns ± 0% -11.47% (p=0.008 n=5+5)
ClearFat15-160 1.24ns ± 0% 0.95ns ± 0% -22.98% (p=0.029 n=4+4)
ClearFat16-160 0.84ns ± 0% 0.83ns ± 0% -0.11% (p=0.008 n=5+5)
ClearFat24-160 2.15ns ± 0% 2.15ns ± 0% ~ (all equal)
ClearFat32-160 2.86ns ± 0% 2.86ns ± 0% ~ (p=0.333 n=5+4)
ClearFat40-160 2.15ns ± 0% 2.15ns ± 0% ~ (all equal)
ClearFat48-160 3.32ns ± 1% 3.31ns ± 1% ~ (p=0.690 n=5+5)
ClearFat56-160 2.15ns ± 0% 2.15ns ± 0% ~ (all equal)
ClearFat64-160 3.25ns ± 1% 3.26ns ± 1% ~ (p=0.841 n=5+5)
ClearFat72-160 2.22ns ± 0% 2.22ns ± 0% ~ (p=0.444 n=5+5)
ClearFat128-160 4.03ns ± 0% 4.04ns ± 0% +0.32% (p=0.008 n=5+5)
ClearFat256-160 6.44ns ± 0% 6.44ns ± 0% +0.08% (p=0.016 n=4+5)
ClearFat512-160 12.2ns ± 0% 12.2ns ± 0% +0.13% (p=0.008 n=5+5)
ClearFat1024-160 24.3ns ± 0% 24.3ns ± 0% ~ (p=0.167 n=5+5)
ClearFat1032-160 24.5ns ± 0% 24.5ns ± 0% ~ (p=0.238 n=4+5)
ClearFat1040-160 29.2ns ± 0% 29.3ns ± 0% +0.34% (p=0.008 n=5+5)
CopyFat7-160 1.43ns ± 0% 1.07ns ± 0% -24.97% (p=0.008 n=5+5)
CopyFat8-160 0.89ns ± 0% 0.89ns ± 0% ~ (p=0.238 n=5+5)
CopyFat11-160 1.43ns ± 0% 1.07ns ± 0% -24.97% (p=0.008 n=5+5)
CopyFat12-160 1.07ns ± 0% 1.07ns ± 0% ~ (p=0.238 n=5+4)
CopyFat13-160 1.43ns ± 0% 1.07ns ± 0% ~ (p=0.079 n=4+5)
CopyFat14-160 1.43ns ± 0% 1.07ns ± 0% -24.95% (p=0.008 n=5+5)
CopyFat15-160 1.79ns ± 0% 1.07ns ± 0% ~ (p=0.079 n=4+5)
CopyFat16-160 1.07ns ± 0% 1.07ns ± 0% ~ (p=0.444 n=5+5)
CopyFat24-160 1.84ns ± 2% 1.67ns ± 0% -9.28% (p=0.008 n=5+5)
CopyFat32-160 3.22ns ± 0% 2.92ns ± 0% -9.40% (p=0.008 n=5+5)
CopyFat64-160 3.64ns ± 0% 3.57ns ± 0% -1.96% (p=0.008 n=5+5)
CopyFat72-160 3.56ns ± 0% 3.11ns ± 0% -12.89% (p=0.008 n=5+5)
CopyFat128-160 5.06ns ± 0% 5.06ns ± 0% +0.04% (p=0.048 n=5+5)
CopyFat256-160 9.13ns ± 0% 9.13ns ± 0% ~ (p=0.659 n=5+5)
CopyFat512-160 17.4ns ± 0% 17.4ns ± 0% ~ (p=0.167 n=5+5)
CopyFat520-160 17.2ns ± 0% 17.3ns ± 0% +0.37% (p=0.008 n=5+5)
CopyFat1024-160 34.1ns ± 0% 34.0ns ± 0% ~ (p=0.127 n=5+5)
CopyFat1032-160 80.9ns ± 0% 34.2ns ± 0% -57.74% (p=0.008 n=5+5)
CopyFat1040-160 94.4ns ± 0% 41.7ns ± 0% -55.78% (p=0.016 n=5+4)
Change-Id: I14186f9f82b0ecf8b6c02191dc5da566b9a21e6c
Reviewed-on: https://go-review.googlesource.com/c/go/+/421654
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Change-Id: I7eb3de35d1f1f0237962735450b37d738966f30c
Reviewed-on: https://go-review.googlesource.com/c/go/+/423254
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
This benchmark is added to test improvements in memclr_amd64.
As it is stated in Intel Optimization Manual 15.16.3.3, AVX2-implemented
memclr can produce a skewed result with the branch predictor being
trained by the large loop iteration count.
This benchmark generates sizes between some specified range. This should
help to measure how memclr works when branch predictors may be incorrectly
trained.
Change-Id: I14d173cafe43ca47198ed920e655547a66b3909f
Reviewed-on: https://go-review.googlesource.com/c/go/+/373362
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Replace the memmove implementation for moves of 17 bytes or larger
with an implementation from ARM optimized software. The moves of 16
bytes or fewer are unchanged, but the registers used are updated to
match the rest of the implementation.
This implementation makes use of new optimizations:
- software pipelined loop for large (>128 byte) moves
- medium size moves (17..128 bytes) have a new implementation
- address realignment when src or dst is unaligned
- preference for aligned src (loads) or dst (stores) depending on CPU
To support preference for aligned loads or aligned stores, a new CPU
flag is added. This flag indicates that the detected micro
architecture performs better with aligned loads. Some tested CPUs did
not exhibit a significant difference and are left with the default
behavior of realigning based on the destination address (stores).
Neoverse N1 (Tested on Graviton 2)
name old time/op new time/op delta
Memmove/0-4 1.88ns ± 1% 1.87ns ± 1% -0.58% (p=0.020 n=10+10)
Memmove/1-4 4.40ns ± 0% 4.40ns ± 0% ~ (all equal)
Memmove/8-4 3.88ns ± 3% 3.80ns ± 0% -1.97% (p=0.001 n=10+9)
Memmove/16-4 3.90ns ± 3% 3.80ns ± 0% -2.49% (p=0.000 n=10+9)
Memmove/32-4 4.80ns ± 0% 4.40ns ± 0% -8.33% (p=0.000 n=9+8)
Memmove/64-4 5.86ns ± 0% 5.00ns ± 0% -14.76% (p=0.000 n=8+8)
Memmove/128-4 8.46ns ± 0% 8.06ns ± 0% -4.62% (p=0.000 n=10+10)
Memmove/256-4 12.4ns ± 0% 12.2ns ± 0% -1.61% (p=0.000 n=10+10)
Memmove/512-4 19.5ns ± 0% 19.1ns ± 0% -2.05% (p=0.000 n=10+10)
Memmove/1024-4 33.7ns ± 0% 33.5ns ± 0% -0.59% (p=0.000 n=10+10)
Memmove/2048-4 62.1ns ± 0% 59.0ns ± 0% -4.99% (p=0.000 n=10+10)
Memmove/4096-4 117ns ± 1% 110ns ± 0% -5.66% (p=0.000 n=10+10)
MemmoveUnalignedDst/64-4 6.41ns ± 0% 5.62ns ± 0% -12.32% (p=0.000 n=10+7)
MemmoveUnalignedDst/128-4 9.40ns ± 0% 8.34ns ± 0% -11.24% (p=0.000 n=10+10)
MemmoveUnalignedDst/256-4 12.8ns ± 0% 12.8ns ± 0% ~ (all equal)
MemmoveUnalignedDst/512-4 20.4ns ± 0% 19.7ns ± 0% -3.43% (p=0.000 n=9+10)
MemmoveUnalignedDst/1024-4 34.1ns ± 0% 35.1ns ± 0% +2.93% (p=0.000 n=9+9)
MemmoveUnalignedDst/2048-4 61.5ns ± 0% 60.4ns ± 0% -1.77% (p=0.000 n=10+10)
MemmoveUnalignedDst/4096-4 122ns ± 0% 113ns ± 0% -7.38% (p=0.002 n=8+10)
MemmoveUnalignedSrc/64-4 7.25ns ± 1% 6.26ns ± 0% -13.64% (p=0.000 n=9+9)
MemmoveUnalignedSrc/128-4 10.5ns ± 0% 9.7ns ± 0% -7.52% (p=0.000 n=10+10)
MemmoveUnalignedSrc/256-4 17.1ns ± 0% 17.3ns ± 0% +1.17% (p=0.000 n=10+10)
MemmoveUnalignedSrc/512-4 27.0ns ± 0% 27.0ns ± 0% ~ (all equal)
MemmoveUnalignedSrc/1024-4 46.7ns ± 0% 35.7ns ± 0% -23.55% (p=0.000 n=10+9)
MemmoveUnalignedSrc/2048-4 85.2ns ± 0% 61.2ns ± 0% -28.17% (p=0.000 n=10+8)
MemmoveUnalignedSrc/4096-4 162ns ± 0% 113ns ± 0% -30.25% (p=0.000 n=10+10)
name old speed new speed delta
Memmove/4096-4 35.2GB/s ± 0% 37.1GB/s ± 0% +5.56% (p=0.000 n=10+9)
MemmoveUnalignedSrc/1024-4 21.9GB/s ± 0% 28.7GB/s ± 0% +30.90% (p=0.000 n=10+10)
MemmoveUnalignedSrc/2048-4 24.0GB/s ± 0% 33.5GB/s ± 0% +39.18% (p=0.000 n=10+9)
MemmoveUnalignedSrc/4096-4 25.3GB/s ± 0% 36.2GB/s ± 0% +43.50% (p=0.000 n=10+7)
Cortex-A72 (Graviton 1)
name old time/op new time/op delta
Memmove/0-4 3.06ns ± 3% 3.08ns ± 1% ~ (p=0.958 n=10+9)
Memmove/1-4 8.72ns ± 0% 7.85ns ± 0% -9.98% (p=0.002 n=8+10)
Memmove/8-4 8.29ns ± 0% 8.29ns ± 0% ~ (all equal)
Memmove/16-4 8.29ns ± 0% 8.29ns ± 0% ~ (all equal)
Memmove/32-4 8.19ns ± 2% 8.29ns ± 0% ~ (p=0.114 n=10+10)
Memmove/64-4 18.3ns ± 4% 10.0ns ± 0% -45.36% (p=0.000 n=10+10)
Memmove/128-4 14.8ns ± 0% 17.4ns ± 0% +17.77% (p=0.000 n=10+10)
Memmove/256-4 21.8ns ± 0% 23.1ns ± 0% +5.96% (p=0.000 n=10+10)
Memmove/512-4 35.8ns ± 0% 37.2ns ± 0% +3.91% (p=0.000 n=10+10)
Memmove/1024-4 63.7ns ± 0% 67.2ns ± 0% +5.49% (p=0.000 n=10+10)
Memmove/2048-4 126ns ± 0% 123ns ± 0% -2.38% (p=0.000 n=10+10)
Memmove/4096-4 238ns ± 1% 243ns ± 1% +1.93% (p=0.000 n=10+10)
MemmoveUnalignedDst/64-4 19.3ns ± 1% 12.0ns ± 1% -37.49% (p=0.000 n=10+10)
MemmoveUnalignedDst/128-4 17.2ns ± 0% 17.4ns ± 0% +1.16% (p=0.000 n=10+10)
MemmoveUnalignedDst/256-4 28.2ns ± 8% 29.2ns ± 0% ~ (p=0.352 n=10+10)
MemmoveUnalignedDst/512-4 49.8ns ± 3% 48.9ns ± 0% ~ (p=1.000 n=10+10)
MemmoveUnalignedDst/1024-4 89.5ns ± 0% 80.5ns ± 1% -10.02% (p=0.000 n=10+10)
MemmoveUnalignedDst/2048-4 180ns ± 0% 127ns ± 0% -29.44% (p=0.000 n=9+10)
MemmoveUnalignedDst/4096-4 347ns ± 0% 244ns ± 0% -29.59% (p=0.000 n=10+9)
MemmoveUnalignedSrc/128-4 16.1ns ± 0% 21.8ns ± 0% +35.40% (p=0.000 n=10+10)
MemmoveUnalignedSrc/256-4 24.9ns ± 8% 26.6ns ± 0% +6.70% (p=0.015 n=10+10)
MemmoveUnalignedSrc/512-4 39.4ns ± 6% 40.6ns ± 0% ~ (p=0.352 n=10+10)
MemmoveUnalignedSrc/1024-4 72.5ns ± 0% 83.0ns ± 1% +14.44% (p=0.000 n=9+10)
MemmoveUnalignedSrc/2048-4 129ns ± 1% 128ns ± 1% ~ (p=0.179 n=10+10)
MemmoveUnalignedSrc/4096-4 241ns ± 0% 253ns ± 1% +4.99% (p=0.000 n=9+9)
Cortex-A53 (Raspberry Pi 3)
name old time/op new time/op delta
Memmove/0-4 11.0ns ± 0% 11.0ns ± 1% ~ (p=0.294 n=8+10)
Memmove/1-4 29.6ns ± 0% 28.0ns ± 1% -5.41% (p=0.000 n=9+10)
Memmove/8-4 23.5ns ± 0% 22.1ns ± 0% -6.11% (p=0.000 n=8+8)
Memmove/16-4 23.7ns ± 1% 22.1ns ± 0% -6.59% (p=0.000 n=10+8)
Memmove/32-4 27.9ns ± 0% 27.1ns ± 0% -3.13% (p=0.000 n=8+8)
Memmove/64-4 33.8ns ± 0% 31.5ns ± 1% -6.99% (p=0.000 n=8+10)
Memmove/128-4 45.6ns ± 0% 44.2ns ± 1% -3.23% (p=0.000 n=9+10)
Memmove/256-4 69.3ns ± 0% 69.3ns ± 0% ~ (p=0.072 n=8+8)
Memmove/512-4 127ns ± 0% 110ns ± 0% -13.39% (p=0.000 n=8+8)
Memmove/1024-4 222ns ± 0% 205ns ± 1% -7.66% (p=0.000 n=7+10)
Memmove/2048-4 411ns ± 0% 366ns ± 0% -10.98% (p=0.000 n=8+9)
Memmove/4096-4 795ns ± 1% 695ns ± 1% -12.63% (p=0.000 n=10+10)
MemmoveUnalignedDst/64-4 44.0ns ± 0% 40.5ns ± 0% -7.93% (p=0.000 n=8+8)
MemmoveUnalignedDst/128-4 59.6ns ± 0% 54.9ns ± 0% -7.85% (p=0.000 n=9+9)
MemmoveUnalignedDst/256-4 98.2ns ±11% 90.0ns ± 1% ~ (p=0.130 n=10+10)
MemmoveUnalignedDst/512-4 161ns ± 2% 145ns ± 1% -9.96% (p=0.000 n=10+10)
MemmoveUnalignedDst/1024-4 281ns ± 0% 265ns ± 0% -5.65% (p=0.000 n=9+8)
MemmoveUnalignedDst/2048-4 528ns ± 0% 482ns ± 0% -8.73% (p=0.000 n=8+9)
MemmoveUnalignedDst/4096-4 1.02µs ± 1% 0.92µs ± 0% -10.00% (p=0.000 n=10+8)
MemmoveUnalignedSrc/64-4 42.4ns ± 1% 40.5ns ± 0% -4.39% (p=0.000 n=10+8)
MemmoveUnalignedSrc/128-4 57.4ns ± 0% 57.0ns ± 1% -0.75% (p=0.048 n=9+10)
MemmoveUnalignedSrc/256-4 88.1ns ± 1% 89.6ns ± 0% +1.70% (p=0.000 n=9+8)
MemmoveUnalignedSrc/512-4 160ns ± 2% 144ns ± 0% -9.89% (p=0.000 n=10+8)
MemmoveUnalignedSrc/1024-4 286ns ± 0% 266ns ± 1% -6.69% (p=0.000 n=8+10)
MemmoveUnalignedSrc/2048-4 525ns ± 0% 483ns ± 1% -7.96% (p=0.000 n=9+10)
MemmoveUnalignedSrc/4096-4 1.01µs ± 0% 0.92µs ± 1% -9.40% (p=0.000 n=8+10)
Change-Id: Ia1144e9d4dfafdece6e167c5e576bf80f254c8ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/243357
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Change-Id: I6389d7efe90836b6ece44d2e75053d1ad9f35d08
Reviewed-on: https://go-review.googlesource.com/c/go/+/253417
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
In the previous CL we ensures that memmove writes pointers
atomically, so the concurrent GC won't observe a partially
updated pointer. This CL adds a test.
Change-Id: Icd1124bf3a15ef25bac20c7fb8933f1a642d897c
Reviewed-on: https://go-review.googlesource.com/c/go/+/212627
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Speeds up the "go test runtime -cpu=1,2,4 -short -quick" phase of all.bash.
For #26473.
Change-Id: I090f5a5aa754462b3253a2156dc31fa67ce7af2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/177399
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
|
name old time/op new time/op delta
CopyFat8 2.15ns ± 1% 2.19ns ± 6% ~ (p=0.171 n=8+9)
CopyFat12 2.15ns ± 0% 2.17ns ± 2% ~ (p=0.137 n=8+10)
CopyFat16 2.17ns ± 3% 2.15ns ± 0% ~ (p=0.211 n=10+10)
CopyFat24 2.16ns ± 1% 2.15ns ± 0% ~ (p=0.087 n=10+10)
CopyFat32 11.5ns ± 0% 12.8ns ± 2% +10.87% (p=0.000 n=8+10)
CopyFat64 20.2ns ± 2% 12.9ns ± 0% -36.11% (p=0.000 n=10+10)
CopyFat128 37.2ns ± 0% 21.5ns ± 0% -42.20% (p=0.000 n=10+10)
CopyFat256 71.6ns ± 0% 38.7ns ± 0% -45.95% (p=0.000 n=10+10)
CopyFat512 140ns ± 0% 73ns ± 0% -47.86% (p=0.000 n=10+9)
CopyFat520 142ns ± 0% 74ns ± 0% -47.54% (p=0.000 n=10+10)
CopyFat1024 277ns ± 0% 141ns ± 0% -49.10% (p=0.000 n=10+10)
Change-Id: If54bc571add5db674d5e081579c87e80153d0a5a
Reviewed-on: https://go-review.googlesource.com/97395
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This cuts 23 seconds from all.bash on my MacBook Pro.
Change-Id: Ibc4d7c01660b9e9ebd088dd55ba993f0d7ec6aa3
Reviewed-on: https://go-review.googlesource.com/73991
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
memmove used to use 2 2-byte load/store pairs to move 4 bytes.
When the result is loaded with a single 4-byte load, it caused
a store to load fowarding stall. To avoid the stall,
special case memmove to use 4 byte ops for the 4 byte copy case.
We already have a special case for 8-byte copies.
386 already specializes 4-byte copies.
I'll do 2-byte copies also, but not for 1.8.
benchmark old ns/op new ns/op delta
BenchmarkIssue18740-8 7567 4799 -36.58%
3-byte copies get a bit slower. Other copies are unchanged.
name old time/op new time/op delta
Memmove/3-8 4.76ns ± 5% 5.26ns ± 3% +10.50% (p=0.000 n=10+10)
Fixes #18740
Change-Id: Iec82cbac0ecfee80fa3c8fc83828f9a1819c3c74
Reviewed-on: https://go-review.googlesource.com/35567
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
I used the slowtests.go tool as described in
https://golang.org/cl/32684 on packages that stood out.
go test -short std drops from ~56 to ~52 seconds.
This isn't a huge win, but it was mostly an exercise.
Updates #17751
Change-Id: I9f3402e36a038d71e662d06ce2c1d52f6c4b674d
Reviewed-on: https://go-review.googlesource.com/32751
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
Use AVX if available on 4th generation of Intel(TM) Core(TM) processors.
(collected on E5 2609v3 @1.9GHz)
name old speed new speed delta
Memmove/1-6 158MB/s ± 0% 172MB/s ± 0% +9.09% (p=0.000 n=16+16)
Memmove/2-6 316MB/s ± 0% 345MB/s ± 0% +9.09% (p=0.000 n=18+16)
Memmove/3-6 517MB/s ± 0% 517MB/s ± 0% ~ (p=0.445 n=16+16)
Memmove/4-6 687MB/s ± 1% 690MB/s ± 0% +0.35% (p=0.000 n=20+17)
Memmove/5-6 729MB/s ± 0% 729MB/s ± 0% +0.01% (p=0.000 n=16+18)
Memmove/6-6 875MB/s ± 0% 875MB/s ± 0% +0.01% (p=0.000 n=18+18)
Memmove/7-6 1.02GB/s ± 0% 1.02GB/s ± 1% ~ (p=0.139 n=19+20)
Memmove/8-6 1.26GB/s ± 0% 1.26GB/s ± 0% +0.00% (p=0.000 n=18+18)
Memmove/9-6 1.42GB/s ± 0% 1.42GB/s ± 0% +0.00% (p=0.000 n=17+18)
Memmove/10-6 1.58GB/s ± 0% 1.58GB/s ± 0% +0.00% (p=0.000 n=19+19)
Memmove/11-6 1.74GB/s ± 0% 1.74GB/s ± 0% +0.00% (p=0.001 n=18+17)
Memmove/12-6 1.90GB/s ± 0% 1.90GB/s ± 0% +0.00% (p=0.000 n=19+19)
Memmove/13-6 2.05GB/s ± 0% 2.05GB/s ± 0% +0.00% (p=0.000 n=18+19)
Memmove/14-6 2.21GB/s ± 0% 2.21GB/s ± 0% +0.00% (p=0.000 n=16+20)
Memmove/15-6 2.37GB/s ± 0% 2.37GB/s ± 0% +0.00% (p=0.004 n=19+20)
Memmove/16-6 2.53GB/s ± 0% 2.53GB/s ± 0% +0.00% (p=0.000 n=16+16)
Memmove/32-6 4.67GB/s ± 0% 4.67GB/s ± 0% +0.00% (p=0.000 n=17+17)
Memmove/64-6 8.67GB/s ± 0% 8.64GB/s ± 0% -0.33% (p=0.000 n=18+17)
Memmove/128-6 12.6GB/s ± 0% 11.6GB/s ± 0% -8.05% (p=0.000 n=16+19)
Memmove/256-6 16.3GB/s ± 0% 16.6GB/s ± 0% +1.66% (p=0.000 n=20+18)
Memmove/512-6 21.5GB/s ± 0% 24.4GB/s ± 0% +13.35% (p=0.000 n=18+17)
Memmove/1024-6 24.7GB/s ± 0% 33.7GB/s ± 0% +36.12% (p=0.000 n=18+18)
Memmove/2048-6 27.3GB/s ± 0% 43.3GB/s ± 0% +58.77% (p=0.000 n=19+17)
Memmove/4096-6 37.5GB/s ± 0% 50.5GB/s ± 0% +34.56% (p=0.000 n=19+19)
MemmoveUnalignedDst/1-6 135MB/s ± 0% 146MB/s ± 0% +7.69% (p=0.000 n=16+14)
MemmoveUnalignedDst/2-6 271MB/s ± 0% 292MB/s ± 0% +7.69% (p=0.000 n=18+18)
MemmoveUnalignedDst/3-6 438MB/s ± 0% 438MB/s ± 0% ~ (p=0.352 n=16+19)
MemmoveUnalignedDst/4-6 584MB/s ± 0% 584MB/s ± 0% ~ (p=0.876 n=17+17)
MemmoveUnalignedDst/5-6 631MB/s ± 1% 632MB/s ± 0% +0.25% (p=0.000 n=20+17)
MemmoveUnalignedDst/6-6 759MB/s ± 0% 759MB/s ± 0% +0.00% (p=0.000 n=19+16)
MemmoveUnalignedDst/7-6 885MB/s ± 0% 883MB/s ± 1% ~ (p=0.647 n=18+20)
MemmoveUnalignedDst/8-6 1.08GB/s ± 0% 1.08GB/s ± 0% +0.00% (p=0.035 n=19+18)
MemmoveUnalignedDst/9-6 1.22GB/s ± 0% 1.22GB/s ± 0% ~ (p=0.251 n=18+17)
MemmoveUnalignedDst/10-6 1.35GB/s ± 0% 1.35GB/s ± 0% ~ (p=0.327 n=17+18)
MemmoveUnalignedDst/11-6 1.49GB/s ± 0% 1.49GB/s ± 0% ~ (p=0.531 n=18+19)
MemmoveUnalignedDst/12-6 1.63GB/s ± 0% 1.63GB/s ± 0% ~ (p=0.886 n=19+18)
MemmoveUnalignedDst/13-6 1.76GB/s ± 0% 1.76GB/s ± 1% -0.24% (p=0.006 n=18+20)
MemmoveUnalignedDst/14-6 1.90GB/s ± 0% 1.90GB/s ± 0% ~ (p=0.818 n=20+19)
MemmoveUnalignedDst/15-6 2.03GB/s ± 0% 2.03GB/s ± 0% ~ (p=0.294 n=17+16)
MemmoveUnalignedDst/16-6 2.17GB/s ± 0% 2.17GB/s ± 0% ~ (p=0.602 n=16+18)
MemmoveUnalignedDst/32-6 4.05GB/s ± 0% 4.05GB/s ± 0% +0.00% (p=0.010 n=18+17)
MemmoveUnalignedDst/64-6 7.59GB/s ± 0% 7.59GB/s ± 0% +0.00% (p=0.022 n=18+16)
MemmoveUnalignedDst/128-6 11.1GB/s ± 0% 11.4GB/s ± 0% +2.79% (p=0.000 n=18+17)
MemmoveUnalignedDst/256-6 16.4GB/s ± 0% 16.7GB/s ± 0% +1.59% (p=0.000 n=20+17)
MemmoveUnalignedDst/512-6 15.7GB/s ± 0% 21.3GB/s ± 0% +35.87% (p=0.000 n=18+20)
MemmoveUnalignedDst/1024-6 16.0GB/s ±20% 31.5GB/s ± 0% +96.93% (p=0.000 n=20+14)
MemmoveUnalignedDst/2048-6 19.6GB/s ± 0% 42.1GB/s ± 0% +115.16% (p=0.000 n=17+18)
MemmoveUnalignedDst/4096-6 6.41GB/s ± 0% 33.18GB/s ± 0% +417.56% (p=0.000 n=17+18)
MemmoveUnalignedSrc/1-6 171MB/s ± 0% 166MB/s ± 0% -3.33% (p=0.000 n=19+16)
MemmoveUnalignedSrc/2-6 343MB/s ± 0% 342MB/s ± 1% -0.41% (p=0.000 n=17+20)
MemmoveUnalignedSrc/3-6 508MB/s ± 0% 493MB/s ± 1% -2.90% (p=0.000 n=17+17)
MemmoveUnalignedSrc/4-6 677MB/s ± 0% 660MB/s ± 2% -2.55% (p=0.000 n=17+20)
MemmoveUnalignedSrc/5-6 790MB/s ± 0% 790MB/s ± 0% ~ (p=0.139 n=17+17)
MemmoveUnalignedSrc/6-6 948MB/s ± 0% 946MB/s ± 1% ~ (p=0.330 n=17+19)
MemmoveUnalignedSrc/7-6 1.11GB/s ± 0% 1.11GB/s ± 0% -0.05% (p=0.026 n=17+17)
MemmoveUnalignedSrc/8-6 1.38GB/s ± 0% 1.38GB/s ± 0% ~ (p=0.091 n=18+16)
MemmoveUnalignedSrc/9-6 1.42GB/s ± 0% 1.40GB/s ± 1% -1.04% (p=0.000 n=19+20)
MemmoveUnalignedSrc/10-6 1.58GB/s ± 0% 1.56GB/s ± 1% -1.15% (p=0.000 n=18+19)
MemmoveUnalignedSrc/11-6 1.73GB/s ± 0% 1.71GB/s ± 1% -1.30% (p=0.000 n=20+20)
MemmoveUnalignedSrc/12-6 1.89GB/s ± 0% 1.87GB/s ± 1% -1.18% (p=0.000 n=17+20)
MemmoveUnalignedSrc/13-6 2.05GB/s ± 0% 2.02GB/s ± 1% -1.18% (p=0.000 n=17+20)
MemmoveUnalignedSrc/14-6 2.21GB/s ± 0% 2.18GB/s ± 1% -1.14% (p=0.000 n=17+20)
MemmoveUnalignedSrc/15-6 2.36GB/s ± 0% 2.34GB/s ± 1% -1.04% (p=0.000 n=17+20)
MemmoveUnalignedSrc/16-6 2.52GB/s ± 0% 2.49GB/s ± 1% -1.26% (p=0.000 n=19+20)
MemmoveUnalignedSrc/32-6 4.82GB/s ± 0% 4.61GB/s ± 0% -4.40% (p=0.000 n=19+20)
MemmoveUnalignedSrc/64-6 5.03GB/s ± 4% 7.97GB/s ± 0% +58.55% (p=0.000 n=20+16)
MemmoveUnalignedSrc/128-6 11.1GB/s ± 0% 11.2GB/s ± 0% +0.52% (p=0.000 n=17+18)
MemmoveUnalignedSrc/256-6 16.5GB/s ± 0% 16.4GB/s ± 0% -0.10% (p=0.000 n=20+18)
MemmoveUnalignedSrc/512-6 21.0GB/s ± 0% 22.1GB/s ± 0% +5.48% (p=0.000 n=14+17)
MemmoveUnalignedSrc/1024-6 24.9GB/s ± 0% 31.9GB/s ± 0% +28.20% (p=0.000 n=19+20)
MemmoveUnalignedSrc/2048-6 23.3GB/s ± 0% 33.8GB/s ± 0% +45.22% (p=0.000 n=17+19)
MemmoveUnalignedSrc/4096-6 37.3GB/s ± 0% 42.7GB/s ± 0% +14.30% (p=0.000 n=17+17)
Change-Id: Id66aa3e499ccfb117cb99d623ef326b50d057b64
Reviewed-on: https://go-review.googlesource.com/29590
Run-TryBot: Denis Nagorny <denis.nagorny@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This reverts commit 3607c5f4f18ad4d423e40996ebf7f46b2f79ce02.
This was causing failures on amd64 machines without AVX.
Fixes #16939
Change-Id: I70080fbb4e7ae791857334f2bffd847d08cb25fa
Reviewed-on: https://go-review.googlesource.com/28274
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Use AVX if available on 4th generation of Intel(TM) Core(TM) processors.
(collected on E5 2609v3 @1.9GHz)
name old speed new speed delta
Memmove/1-6 158MB/s ± 0% 172MB/s ± 0% +9.09% (p=0.000 n=16+16)
Memmove/2-6 316MB/s ± 0% 345MB/s ± 0% +9.09% (p=0.000 n=18+16)
Memmove/3-6 517MB/s ± 0% 517MB/s ± 0% ~ (p=0.445 n=16+16)
Memmove/4-6 687MB/s ± 1% 690MB/s ± 0% +0.35% (p=0.000 n=20+17)
Memmove/5-6 729MB/s ± 0% 729MB/s ± 0% +0.01% (p=0.000 n=16+18)
Memmove/6-6 875MB/s ± 0% 875MB/s ± 0% +0.01% (p=0.000 n=18+18)
Memmove/7-6 1.02GB/s ± 0% 1.02GB/s ± 1% ~ (p=0.139 n=19+20)
Memmove/8-6 1.26GB/s ± 0% 1.26GB/s ± 0% +0.00% (p=0.000 n=18+18)
Memmove/9-6 1.42GB/s ± 0% 1.42GB/s ± 0% +0.00% (p=0.000 n=17+18)
Memmove/10-6 1.58GB/s ± 0% 1.58GB/s ± 0% +0.00% (p=0.000 n=19+19)
Memmove/11-6 1.74GB/s ± 0% 1.74GB/s ± 0% +0.00% (p=0.001 n=18+17)
Memmove/12-6 1.90GB/s ± 0% 1.90GB/s ± 0% +0.00% (p=0.000 n=19+19)
Memmove/13-6 2.05GB/s ± 0% 2.05GB/s ± 0% +0.00% (p=0.000 n=18+19)
Memmove/14-6 2.21GB/s ± 0% 2.21GB/s ± 0% +0.00% (p=0.000 n=16+20)
Memmove/15-6 2.37GB/s ± 0% 2.37GB/s ± 0% +0.00% (p=0.004 n=19+20)
Memmove/16-6 2.53GB/s ± 0% 2.53GB/s ± 0% +0.00% (p=0.000 n=16+16)
Memmove/32-6 4.67GB/s ± 0% 4.67GB/s ± 0% +0.00% (p=0.000 n=17+17)
Memmove/64-6 8.67GB/s ± 0% 8.64GB/s ± 0% -0.33% (p=0.000 n=18+17)
Memmove/128-6 12.6GB/s ± 0% 11.6GB/s ± 0% -8.05% (p=0.000 n=16+19)
Memmove/256-6 16.3GB/s ± 0% 16.6GB/s ± 0% +1.66% (p=0.000 n=20+18)
Memmove/512-6 21.5GB/s ± 0% 24.4GB/s ± 0% +13.35% (p=0.000 n=18+17)
Memmove/1024-6 24.7GB/s ± 0% 33.7GB/s ± 0% +36.12% (p=0.000 n=18+18)
Memmove/2048-6 27.3GB/s ± 0% 43.3GB/s ± 0% +58.77% (p=0.000 n=19+17)
Memmove/4096-6 37.5GB/s ± 0% 50.5GB/s ± 0% +34.56% (p=0.000 n=19+19)
MemmoveUnalignedDst/1-6 135MB/s ± 0% 146MB/s ± 0% +7.69% (p=0.000 n=16+14)
MemmoveUnalignedDst/2-6 271MB/s ± 0% 292MB/s ± 0% +7.69% (p=0.000 n=18+18)
MemmoveUnalignedDst/3-6 438MB/s ± 0% 438MB/s ± 0% ~ (p=0.352 n=16+19)
MemmoveUnalignedDst/4-6 584MB/s ± 0% 584MB/s ± 0% ~ (p=0.876 n=17+17)
MemmoveUnalignedDst/5-6 631MB/s ± 1% 632MB/s ± 0% +0.25% (p=0.000 n=20+17)
MemmoveUnalignedDst/6-6 759MB/s ± 0% 759MB/s ± 0% +0.00% (p=0.000 n=19+16)
MemmoveUnalignedDst/7-6 885MB/s ± 0% 883MB/s ± 1% ~ (p=0.647 n=18+20)
MemmoveUnalignedDst/8-6 1.08GB/s ± 0% 1.08GB/s ± 0% +0.00% (p=0.035 n=19+18)
MemmoveUnalignedDst/9-6 1.22GB/s ± 0% 1.22GB/s ± 0% ~ (p=0.251 n=18+17)
MemmoveUnalignedDst/10-6 1.35GB/s ± 0% 1.35GB/s ± 0% ~ (p=0.327 n=17+18)
MemmoveUnalignedDst/11-6 1.49GB/s ± 0% 1.49GB/s ± 0% ~ (p=0.531 n=18+19)
MemmoveUnalignedDst/12-6 1.63GB/s ± 0% 1.63GB/s ± 0% ~ (p=0.886 n=19+18)
MemmoveUnalignedDst/13-6 1.76GB/s ± 0% 1.76GB/s ± 1% -0.24% (p=0.006 n=18+20)
MemmoveUnalignedDst/14-6 1.90GB/s ± 0% 1.90GB/s ± 0% ~ (p=0.818 n=20+19)
MemmoveUnalignedDst/15-6 2.03GB/s ± 0% 2.03GB/s ± 0% ~ (p=0.294 n=17+16)
MemmoveUnalignedDst/16-6 2.17GB/s ± 0% 2.17GB/s ± 0% ~ (p=0.602 n=16+18)
MemmoveUnalignedDst/32-6 4.05GB/s ± 0% 4.05GB/s ± 0% +0.00% (p=0.010 n=18+17)
MemmoveUnalignedDst/64-6 7.59GB/s ± 0% 7.59GB/s ± 0% +0.00% (p=0.022 n=18+16)
MemmoveUnalignedDst/128-6 11.1GB/s ± 0% 11.4GB/s ± 0% +2.79% (p=0.000 n=18+17)
MemmoveUnalignedDst/256-6 16.4GB/s ± 0% 16.7GB/s ± 0% +1.59% (p=0.000 n=20+17)
MemmoveUnalignedDst/512-6 15.7GB/s ± 0% 21.3GB/s ± 0% +35.87% (p=0.000 n=18+20)
MemmoveUnalignedDst/1024-6 16.0GB/s ±20% 31.5GB/s ± 0% +96.93% (p=0.000 n=20+14)
MemmoveUnalignedDst/2048-6 19.6GB/s ± 0% 42.1GB/s ± 0% +115.16% (p=0.000 n=17+18)
MemmoveUnalignedDst/4096-6 6.41GB/s ± 0% 33.18GB/s ± 0% +417.56% (p=0.000 n=17+18)
MemmoveUnalignedSrc/1-6 171MB/s ± 0% 166MB/s ± 0% -3.33% (p=0.000 n=19+16)
MemmoveUnalignedSrc/2-6 343MB/s ± 0% 342MB/s ± 1% -0.41% (p=0.000 n=17+20)
MemmoveUnalignedSrc/3-6 508MB/s ± 0% 493MB/s ± 1% -2.90% (p=0.000 n=17+17)
MemmoveUnalignedSrc/4-6 677MB/s ± 0% 660MB/s ± 2% -2.55% (p=0.000 n=17+20)
MemmoveUnalignedSrc/5-6 790MB/s ± 0% 790MB/s ± 0% ~ (p=0.139 n=17+17)
MemmoveUnalignedSrc/6-6 948MB/s ± 0% 946MB/s ± 1% ~ (p=0.330 n=17+19)
MemmoveUnalignedSrc/7-6 1.11GB/s ± 0% 1.11GB/s ± 0% -0.05% (p=0.026 n=17+17)
MemmoveUnalignedSrc/8-6 1.38GB/s ± 0% 1.38GB/s ± 0% ~ (p=0.091 n=18+16)
MemmoveUnalignedSrc/9-6 1.42GB/s ± 0% 1.40GB/s ± 1% -1.04% (p=0.000 n=19+20)
MemmoveUnalignedSrc/10-6 1.58GB/s ± 0% 1.56GB/s ± 1% -1.15% (p=0.000 n=18+19)
MemmoveUnalignedSrc/11-6 1.73GB/s ± 0% 1.71GB/s ± 1% -1.30% (p=0.000 n=20+20)
MemmoveUnalignedSrc/12-6 1.89GB/s ± 0% 1.87GB/s ± 1% -1.18% (p=0.000 n=17+20)
MemmoveUnalignedSrc/13-6 2.05GB/s ± 0% 2.02GB/s ± 1% -1.18% (p=0.000 n=17+20)
MemmoveUnalignedSrc/14-6 2.21GB/s ± 0% 2.18GB/s ± 1% -1.14% (p=0.000 n=17+20)
MemmoveUnalignedSrc/15-6 2.36GB/s ± 0% 2.34GB/s ± 1% -1.04% (p=0.000 n=17+20)
MemmoveUnalignedSrc/16-6 2.52GB/s ± 0% 2.49GB/s ± 1% -1.26% (p=0.000 n=19+20)
MemmoveUnalignedSrc/32-6 4.82GB/s ± 0% 4.61GB/s ± 0% -4.40% (p=0.000 n=19+20)
MemmoveUnalignedSrc/64-6 5.03GB/s ± 4% 7.97GB/s ± 0% +58.55% (p=0.000 n=20+16)
MemmoveUnalignedSrc/128-6 11.1GB/s ± 0% 11.2GB/s ± 0% +0.52% (p=0.000 n=17+18)
MemmoveUnalignedSrc/256-6 16.5GB/s ± 0% 16.4GB/s ± 0% -0.10% (p=0.000 n=20+18)
MemmoveUnalignedSrc/512-6 21.0GB/s ± 0% 22.1GB/s ± 0% +5.48% (p=0.000 n=14+17)
MemmoveUnalignedSrc/1024-6 24.9GB/s ± 0% 31.9GB/s ± 0% +28.20% (p=0.000 n=19+20)
MemmoveUnalignedSrc/2048-6 23.3GB/s ± 0% 33.8GB/s ± 0% +45.22% (p=0.000 n=17+19)
MemmoveUnalignedSrc/4096-6 37.3GB/s ± 0% 42.7GB/s ± 0% +14.30% (p=0.000 n=17+17)
Change-Id: Iab488d93a293cdf573ab5cd89b95a818bbb5d531
Reviewed-on: https://go-review.googlesource.com/22515
Run-TryBot: Denis Nagorny <denis.nagorny@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Names of sub-benchmarks are preserved, short of the additional slash.
Change-Id: I9b3f82964f9a44b0d28724413320afd091ed3106
Reviewed-on: https://go-review.googlesource.com/23425
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Marcel van Lohuizen <mpvl@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
MOVSB is quite a bit faster for unaligned moves.
Possibly we should use MOVSB all of the time, but Intel folks
say it might be a bit faster to use MOVSQ on some processors
(but not any I have access to at the moment).
benchmark old ns/op new ns/op delta
BenchmarkMemmove4096-8 93.9 93.2 -0.75%
BenchmarkMemmoveUnalignedDst4096-8 256 151 -41.02%
BenchmarkMemmoveUnalignedSrc4096-8 175 90.5 -48.29%
Fixes #14630
Change-Id: I568e6d6590eb3615e6a699fb474020596be665ff
Reviewed-on: https://go-review.googlesource.com/20293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
This is a subset of https://golang.org/cl/20022 with only the copyright
header lines, so the next CL will be smaller and more reviewable.
Go policy has been single space after periods in comments for some time.
The copyright header template at:
https://golang.org/doc/contribute.html#copyright
also uses a single space.
Make them all consistent.
Change-Id: Icc26c6b8495c3820da6b171ca96a74701b4a01b0
Reviewed-on: https://go-review.googlesource.com/20111
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Results are a bit noisy, but show good improvement (haswell)
name old time/op new time/op delta
Memclr5-48 6.06ns ± 8% 5.65ns ± 8% -6.81% (p=0.000 n=20+20)
Memclr16-48 5.75ns ± 6% 5.71ns ± 6% ~ (p=0.545 n=20+19)
Memclr64-48 6.54ns ± 5% 6.14ns ± 9% -6.12% (p=0.000 n=18+20)
Memclr256-48 10.1ns ±12% 9.9ns ±14% ~ (p=0.285 n=20+19)
Memclr4096-48 104ns ± 8% 57ns ±15% -44.98% (p=0.000 n=20+20)
Memclr65536-48 2.45µs ± 5% 2.43µs ± 8% ~ (p=0.665 n=16+20)
Memclr1M-48 58.7µs ±13% 56.4µs ±11% -3.92% (p=0.033 n=20+19)
Memclr4M-48 233µs ± 9% 234µs ± 9% ~ (p=0.728 n=20+19)
Memclr8M-48 469µs ±11% 472µs ±16% ~ (p=0.947 n=20+20)
Memclr16M-48 947µs ±10% 916µs ±10% ~ (p=0.050 n=20+19)
Memclr64M-48 10.9ms ±10% 4.5ms ± 9% -58.43% (p=0.000 n=20+20)
GoMemclr5-48 3.80ns ±13% 3.38ns ± 6% -11.02% (p=0.000 n=20+20)
GoMemclr16-48 3.34ns ±15% 3.40ns ± 9% ~ (p=0.351 n=20+20)
GoMemclr64-48 4.10ns ±15% 4.04ns ±10% ~ (p=1.000 n=20+19)
GoMemclr256-48 7.75ns ±20% 7.88ns ± 9% ~ (p=0.227 n=20+19)
name old speed new speed delta
Memclr5-48 826MB/s ± 7% 886MB/s ± 8% +7.32% (p=0.000 n=20+20)
Memclr16-48 2.78GB/s ± 5% 2.81GB/s ± 6% ~ (p=0.550 n=20+19)
Memclr64-48 9.79GB/s ± 5% 10.44GB/s ±10% +6.64% (p=0.000 n=18+20)
Memclr256-48 25.4GB/s ±14% 25.6GB/s ±12% ~ (p=0.647 n=20+19)
Memclr4096-48 39.4GB/s ± 8% 72.0GB/s ±13% +82.81% (p=0.000 n=20+20)
Memclr65536-48 26.6GB/s ± 6% 27.0GB/s ± 9% ~ (p=0.517 n=17+20)
Memclr1M-48 17.9GB/s ±12% 18.5GB/s ±11% ~ (p=0.068 n=20+20)
Memclr4M-48 18.0GB/s ± 9% 17.8GB/s ±14% ~ (p=0.547 n=20+20)
Memclr8M-48 17.9GB/s ±10% 17.8GB/s ±14% ~ (p=0.947 n=20+20)
Memclr16M-48 17.8GB/s ± 9% 18.4GB/s ± 9% ~ (p=0.050 n=20+19)
Memclr64M-48 6.19GB/s ±10% 14.87GB/s ± 9% +140.11% (p=0.000 n=20+20)
GoMemclr5-48 1.31GB/s ±10% 1.48GB/s ± 6% +13.06% (p=0.000 n=19+20)
GoMemclr16-48 4.81GB/s ±14% 4.71GB/s ± 8% ~ (p=0.341 n=20+20)
GoMemclr64-48 15.7GB/s ±13% 15.8GB/s ±11% ~ (p=0.967 n=20+19)
Change-Id: I393f3f20e2f31538d1b1dd70d6e5c201c106a095
Reviewed-on: https://go-review.googlesource.com/16773
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Klaus Post <klauspost@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
they can
Not only is this an obvious optimization:
benchmark old MB/s new MB/s speedup
BenchmarkMemmove1-4 35.35 29.65 0.84x
BenchmarkMemmove2-4 63.78 52.53 0.82x
BenchmarkMemmove3-4 89.72 73.96 0.82x
BenchmarkMemmove4-4 109.94 95.73 0.87x
BenchmarkMemmove5-4 127.60 112.80 0.88x
BenchmarkMemmove6-4 143.59 126.67 0.88x
BenchmarkMemmove7-4 157.90 138.92 0.88x
BenchmarkMemmove8-4 167.18 231.81 1.39x
BenchmarkMemmove9-4 175.23 252.07 1.44x
BenchmarkMemmove10-4 165.68 261.10 1.58x
BenchmarkMemmove11-4 174.43 263.31 1.51x
BenchmarkMemmove12-4 180.76 267.56 1.48x
BenchmarkMemmove13-4 189.06 284.93 1.51x
BenchmarkMemmove14-4 186.31 284.72 1.53x
BenchmarkMemmove15-4 195.75 281.62 1.44x
BenchmarkMemmove16-4 202.96 439.23 2.16x
BenchmarkMemmove32-4 264.77 775.77 2.93x
BenchmarkMemmove64-4 306.81 1209.64 3.94x
BenchmarkMemmove128-4 357.03 1515.41 4.24x
BenchmarkMemmove256-4 380.77 2066.01 5.43x
BenchmarkMemmove512-4 385.05 2556.45 6.64x
BenchmarkMemmove1024-4 381.23 2804.10 7.36x
BenchmarkMemmove2048-4 379.06 2814.83 7.43x
BenchmarkMemmove4096-4 387.43 3064.96 7.91x
BenchmarkMemmoveUnaligned1-4 28.91 25.40 0.88x
BenchmarkMemmoveUnaligned2-4 56.13 47.56 0.85x
BenchmarkMemmoveUnaligned3-4 74.32 69.31 0.93x
BenchmarkMemmoveUnaligned4-4 97.02 83.58 0.86x
BenchmarkMemmoveUnaligned5-4 110.17 103.62 0.94x
BenchmarkMemmoveUnaligned6-4 124.95 113.26 0.91x
BenchmarkMemmoveUnaligned7-4 142.37 130.82 0.92x
BenchmarkMemmoveUnaligned8-4 151.20 205.64 1.36x
BenchmarkMemmoveUnaligned9-4 166.97 215.42 1.29x
BenchmarkMemmoveUnaligned10-4 148.49 221.22 1.49x
BenchmarkMemmoveUnaligned11-4 159.47 239.57 1.50x
BenchmarkMemmoveUnaligned12-4 163.52 247.32 1.51x
BenchmarkMemmoveUnaligned13-4 167.55 256.54 1.53x
BenchmarkMemmoveUnaligned14-4 175.12 251.03 1.43x
BenchmarkMemmoveUnaligned15-4 192.10 267.13 1.39x
BenchmarkMemmoveUnaligned16-4 190.76 378.87 1.99x
BenchmarkMemmoveUnaligned32-4 259.02 562.98 2.17x
BenchmarkMemmoveUnaligned64-4 317.72 842.44 2.65x
BenchmarkMemmoveUnaligned128-4 355.43 1274.49 3.59x
BenchmarkMemmoveUnaligned256-4 378.17 1815.74 4.80x
BenchmarkMemmoveUnaligned512-4 362.15 2180.81 6.02x
BenchmarkMemmoveUnaligned1024-4 376.07 2453.58 6.52x
BenchmarkMemmoveUnaligned2048-4 381.66 2568.32 6.73x
BenchmarkMemmoveUnaligned4096-4 398.51 2669.36 6.70x
BenchmarkMemclr5-4 113.83 107.93 0.95x
BenchmarkMemclr16-4 223.84 389.63 1.74x
BenchmarkMemclr64-4 421.99 1209.58 2.87x
BenchmarkMemclr256-4 525.94 2411.58 4.59x
BenchmarkMemclr4096-4 581.66 4372.20 7.52x
BenchmarkMemclr65536-4 565.84 4747.48 8.39x
BenchmarkGoMemclr5-4 194.63 160.31 0.82x
BenchmarkGoMemclr16-4 295.30 630.07 2.13x
BenchmarkGoMemclr64-4 480.24 1884.03 3.92x
BenchmarkGoMemclr256-4 540.23 2926.49 5.42x
but it turns out that it's necessary to avoid the GC seeing partially written
pointers.
It's of course possible to be more sophisticated (using ldp/stp to move 16
bytes at a time in the core loop and unrolling the tail copying loops being
the obvious ideas) but I wanted something simple and (reasonably) obviously
correct.
Fixes #12552
Change-Id: Iaeaf8a812cd06f4747ba2f792de1ded738890735
Reviewed-on: https://go-review.googlesource.com/14813
Reviewed-by: Austin Clements <austin@google.com>
|
|
It is faster to execute
MOVQ AX,(DI)
MOVQ AX,8(DI)
MOVQ AX,16(DI)
MOVQ AX,24(DI)
ADDQ $32,DI
than
STOSQ
STOSQ
STOSQ
STOSQ
However, in order to be able to jump into
the middle of a block of MOVQs, the call
site needs to pre-adjust DI.
If we're clearing a small area, the cost
of that DI pre-adjustment isn't repaid.
This CL switches the DUFFZERO implementation
to use a hybrid strategy, in which small
clears use STOSQ as before, but large clears
use mostly MOVQ/ADDQ blocks.
benchmark old ns/op new ns/op delta
BenchmarkClearFat8 0.55 0.55 +0.00%
BenchmarkClearFat12 0.82 0.83 +1.22%
BenchmarkClearFat16 0.55 0.55 +0.00%
BenchmarkClearFat24 0.82 0.82 +0.00%
BenchmarkClearFat32 2.20 1.94 -11.82%
BenchmarkClearFat40 1.92 1.66 -13.54%
BenchmarkClearFat48 2.21 1.93 -12.67%
BenchmarkClearFat56 3.03 2.20 -27.39%
BenchmarkClearFat64 3.26 2.48 -23.93%
BenchmarkClearFat72 3.57 2.76 -22.69%
BenchmarkClearFat80 3.83 3.05 -20.37%
BenchmarkClearFat88 4.14 3.30 -20.29%
BenchmarkClearFat128 5.54 4.69 -15.34%
BenchmarkClearFat256 9.95 9.09 -8.64%
BenchmarkClearFat512 18.7 17.9 -4.28%
BenchmarkClearFat1024 36.2 35.4 -2.21%
Change-Id: Ic786406d9b3cab68d5a231688f9e66fcd1bd7103
Reviewed-on: https://go-review.googlesource.com/2585
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Recognize loops of the form
for i := range a {
a[i] = zero
}
in which the evaluation of a is free from side effects.
Replace these loops with calls to memclr.
This occurs in the stdlib in 18 places.
The motivating example is clearing a byte slice:
benchmark old ns/op new ns/op delta
BenchmarkGoMemclr5 3.31 3.26 -1.51%
BenchmarkGoMemclr16 13.7 3.28 -76.06%
BenchmarkGoMemclr64 50.8 4.14 -91.85%
BenchmarkGoMemclr256 157 6.02 -96.17%
Update #5373.
Change-Id: I99d3e6f5f268e8c6499b7e661df46403e5eb83e4
Reviewed-on: https://go-review.googlesource.com/2520
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Preparation was in CL 134570043.
This CL contains only the effect of 'hg mv src/pkg/* src'.
For more about the move, see golang.org/s/go14nopkg.
|