path: root/src/runtime/memmove_test.go
Age | Commit message | Author

2025-07-11 | runtime: turn off large memmove tests under asan/msan | Keith Randall
Just like we do for race mode. They are just too slow when running with the sanitizers.

Fixes #59448

Change-Id: I86e3e3488ec5c4c29e410955e9dc4cbc99d39b84
Reviewed-on: https://go-review.googlesource.com/c/go/+/687535
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Keith Randall <khr@golang.org>

2025-06-02 | runtime: additional memmove benchmarks | Keith Randall
For testing out duffcopy changes.

Change-Id: I93b4a52d75418a6e31aae5ad99f95d1870812b69
Reviewed-on: https://go-review.googlesource.com/c/go/+/678215
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>

2025-02-23 | runtime: exclude allocation(s) from memmove/memclr benchmarking | Dmitrii Martynov
The overhead for allocation is not significant, but it should be excluded from the memmove/memclr benchmarking anyway.

Change-Id: I7ea86d1b85b13352ccbff16f7510caa250654dab
Reviewed-on: https://go-review.googlesource.com/c/go/+/645576
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
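A minimal sketch of the idea, assuming a standalone program rather than the runtime's own benchmarks (benchCopy and the chosen sizes are hypothetical): the source and destination buffers are allocated before b.ResetTimer, so only the copy loop is measured. testing.Benchmark is used so the sketch runs outside go test.

```go
package main

import (
	"fmt"
	"testing"
)

// benchCopy returns a benchmark that copies n bytes per iteration.
// Both buffers are allocated before b.ResetTimer, so the allocation
// is excluded from the reported time per operation.
func benchCopy(n int) func(b *testing.B) {
	return func(b *testing.B) {
		src := make([]byte, n)
		dst := make([]byte, n)
		b.SetBytes(int64(n)) // also report throughput
		b.ResetTimer()       // discard the time spent allocating above
		for i := 0; i < b.N; i++ {
			copy(dst, src)
		}
	}
}

func main() {
	for _, n := range []int{64, 4096} {
		r := testing.Benchmark(benchCopy(n))
		fmt.Printf("copy %d bytes: %s\n", n, r)
	}
}
```

Calling b.ResetTimer after setup is the usual way to keep one-time preparation out of the measurement; inside a _test.go file the same function body would be used as an ordinary BenchmarkXxx.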

2024-03-29 | runtime: make use of builtin clear in tests | Jes Cok
This is a follow-up to CL 574675.

Change-Id: I98c3ea968e9c7dc61472849c385a1e697568aa30
Reviewed-on: https://go-review.googlesource.com/c/go/+/574975
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Jes Cok <xigua67damn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>

2024-03-27 | all: make use of builtin clear | Jes Cok
Change-Id: I1df0685c75fc1044ba46003a69ecc7dfc53bbc2b
Reviewed-on: https://go-review.googlesource.com/c/go/+/574675
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>

2023-10-24 | runtime: use max/min func | qiulaidongfeng
Change-Id: I3f0b7209621b39cee69566a5cc95e4343b4f1f20
GitHub-Last-Rev: af9dbbe69ad74e8c210254dafa260a886b690853
GitHub-Pull-Request: golang/go#63321
Reviewed-on: https://go-review.googlesource.com/c/go/+/531916
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
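The commit message does not list which call sites changed. As a hedged illustration of the pattern only, the hypothetical clamp helpers below show an if-based version next to one using the min and max builtins introduced in Go 1.21.

```go
package main

import "fmt"

// clampIf bounds v to [lo, hi] with explicit comparisons.
func clampIf(v, lo, hi int) int {
	if v < lo {
		v = lo
	}
	if v > hi {
		v = hi
	}
	return v
}

// clampMinMax does the same with the builtins; it assumes lo <= hi.
func clampMinMax(v, lo, hi int) int {
	return max(lo, min(v, hi))
}

func main() {
	for _, v := range []int{-5, 3, 42} {
		fmt.Println(v, clampIf(v, 0, 10), clampMinMax(v, 0, 10))
	}
}
```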

2023-05-31 | runtime: fix alignment code in memmove_riscv64.s | Mark Ryan
The riscv64 implementation of memmove has two optimizations that are applied when both source and destination pointers share the same alignment but that alignment is not 8 bytes. Both optimizations attempt to align the source and destination pointers to 8-byte boundaries before performing 8-byte aligned loads and stores. Both optimizations are incorrect.

The first optimization is applied when the destination pointer is smaller than the source pointer. In this case the code increments both pointers by (pointer & 3) bytes rather than (8 - (pointer & 7)) bytes. The second optimization is applied when the destination pointer is larger than the source pointer. In this case the existing code decrements the pointers by (pointer & 3) bytes instead of (pointer & 7).

This commit fixes both optimizations, avoiding unaligned 8-byte accesses.

As this particular optimization is not covered by any of the existing benchmarks, a new benchmark, BenchmarkMemmoveUnalignedSrcDst, is provided that exercises both optimizations.

Results of the new benchmark, which were run on a SiFive HiFive Unmatched A00 with 16GB of RAM running Ubuntu 23.04, are presented below.

MemmoveUnalignedSrcDst/f_16_0-4 39.48n ± 5% 43.47n ± 2% +10.13% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_16_0-4 45.39n ± 5% 41.55n ± 4% -8.47% (p=0.000 n=10)
MemmoveUnalignedSrcDst/f_16_1-4 1230.50n ± 1% 83.44n ± 5% -93.22% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_16_1-4 69.34n ± 4% 67.83n ± 8% ~ (p=0.436 n=10)
MemmoveUnalignedSrcDst/f_16_4-4 2349.00n ± 1% 72.09n ± 4% -96.93% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_16_4-4 2357.00n ± 0% 77.61n ± 4% -96.71% (p=0.000 n=10)
MemmoveUnalignedSrcDst/f_16_7-4 1235.00n ± 0% 62.02n ± 2% -94.98% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_16_7-4 1246.00n ± 0% 84.05n ± 6% -93.25% (p=0.000 n=10)
MemmoveUnalignedSrcDst/f_64_0-4 49.96n ± 2% 50.01n ± 2% ~ (p=0.755 n=10)
MemmoveUnalignedSrcDst/b_64_0-4 52.06n ± 3% 51.65n ± 3% ~ (p=0.631 n=10)
MemmoveUnalignedSrcDst/f_64_1-4 8105.50n ± 0% 97.63n ± 1% -98.80% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_64_1-4 84.07n ± 4% 84.90n ± 5% ~ (p=0.315 n=10)
MemmoveUnalignedSrcDst/f_64_4-4 9192.00n ± 0% 86.16n ± 3% -99.06% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_64_4-4 9195.50n ± 1% 91.88n ± 5% -99.00% (p=0.000 n=10)
MemmoveUnalignedSrcDst/f_64_7-4 8106.50n ± 0% 78.44n ± 9% -99.03% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_64_7-4 8107.00n ± 0% 99.19n ± 1% -98.78% (p=0.000 n=10)
MemmoveUnalignedSrcDst/f_256_0-4 90.95n ± 1% 92.16n ± 8% ~ (p=0.123 n=10)
MemmoveUnalignedSrcDst/b_256_0-4 96.09n ± 12% 94.90n ± 2% ~ (p=0.143 n=10)
MemmoveUnalignedSrcDst/f_256_1-4 35492.5n ± 0% 133.5n ± 0% -99.62% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_256_1-4 128.7n ± 1% 130.1n ± 1% +1.13% (p=0.005 n=10)
MemmoveUnalignedSrcDst/f_256_4-4 36599.0n ± 0% 123.0n ± 1% -99.66% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_256_4-4 36675.5n ± 0% 130.7n ± 1% -99.64% (p=0.000 n=10)
MemmoveUnalignedSrcDst/f_256_7-4 35555.5n ± 0% 121.6n ± 2% -99.66% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_256_7-4 35584.0n ± 0% 139.1n ± 1% -99.61% (p=0.000 n=10)
MemmoveUnalignedSrcDst/f_4096_0-4 956.3n ± 2% 960.8n ± 1% ~ (p=0.306 n=10)
MemmoveUnalignedSrcDst/b_4096_0-4 1.015µ ± 2% 1.012µ ± 2% ~ (p=0.076 n=10)
MemmoveUnalignedSrcDst/f_4096_1-4 584.406µ ± 0% 1.002µ ± 1% -99.83% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_4096_1-4 1.044µ ± 1% 1.040µ ± 2% ~ (p=0.090 n=10)
MemmoveUnalignedSrcDst/f_4096_4-4 585113.5n ± 0% 988.6n ± 2% -99.83% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_4096_4-4 586.521µ ± 0% 1.044µ ± 1% -99.82% (p=0.000 n=10)
MemmoveUnalignedSrcDst/f_4096_7-4 585374.5n ± 0% 986.2n ± 0% -99.83% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_4096_7-4 584.595µ ± 1% 1.055µ ± 0% -99.82% (p=0.000 n=10)
MemmoveUnalignedSrcDst/f_65536_0-4 54.83µ ± 0% 55.00µ ± 0% +0.31% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_65536_0-4 56.54µ ± 0% 56.64µ ± 0% +0.19% (p=0.011 n=10)
MemmoveUnalignedSrcDst/f_65536_1-4 9450.51µ ± 0% 58.25µ ± 0% -99.38% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_65536_1-4 56.65µ ± 0% 56.68µ ± 0% ~ (p=0.353 n=10)
MemmoveUnalignedSrcDst/f_65536_4-4 9449.48µ ± 0% 58.24µ ± 0% -99.38% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_65536_4-4 9462.91µ ± 0% 56.69µ ± 0% -99.40% (p=0.000 n=10)
MemmoveUnalignedSrcDst/f_65536_7-4 9477.37µ ± 0% 58.26µ ± 0% -99.39% (p=0.000 n=10)
MemmoveUnalignedSrcDst/b_65536_7-4 9467.96µ ± 0% 56.68µ ± 0% -99.40% (p=0.000 n=10)
geomean 11.16µ 509.8n -95.43%

Change-Id: Idfa1873b81fece3b2b1a0aed398fa5663cc73b83
Reviewed-on: https://go-review.googlesource.com/c/go/+/498377
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
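The pointer arithmetic behind this fix can be checked in plain Go. The sketch below models only the adjustment described in the commit message, not the riscv64 assembly, and the helper names are hypothetical. Because source and destination share the same low three bits on this path, moving both by the same amount aligns both.

```go
package main

import "fmt"

// Hypothetical helpers modelling how many bytes are handled one at a time
// before switching to 8-byte loads and stores. They are only meaningful
// when p is not already 8-byte aligned (p&7 != 0).

// fwdOld is the old forward-copy adjustment: advance by p&3 bytes.
func fwdOld(p uintptr) uintptr { return p & 3 }

// fwdNew is the fixed forward-copy adjustment: advance by 8-(p&7) bytes.
func fwdNew(p uintptr) uintptr { return 8 - (p & 7) }

// bwdOld is the old backward-copy adjustment of an end pointer: retreat by p&3 bytes.
func bwdOld(p uintptr) uintptr { return p & 3 }

// bwdNew is the fixed backward-copy adjustment: retreat by p&7 bytes.
func bwdNew(p uintptr) uintptr { return p & 7 }

func main() {
	for p := uintptr(0x1001); p <= 0x1007; p++ {
		fmt.Printf("p=%#x  forward: old %#x new %#x  backward: old %#x new %#x\n",
			p, p+fwdOld(p), p+fwdNew(p), p-bwdOld(p), p-bwdNew(p))
	}
}
```

In this model the old backward adjustment happens to reach an 8-byte boundary when p&7 is 1, 2 or 3, which appears consistent with the b_*_1 rows above showing little or no change while the b_*_4 and b_*_7 rows improve sharply.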

2023-05-24 | runtime: fix alignment code in memclr_riscv64.s | Mark Ryan
The existing code incorrectly determines whether the pointer passed to memclrNoHeapPointers is 8-byte aligned (it currently checks to see whether it's 4-byte aligned). In addition, the code that aligns the pointer, by individually filling the first few bytes of the buffer with zeros, is also incorrect. It adjusts the pointer by the wrong number of bytes, resulting, in most cases, in an unaligned pointer.

This commit fixes both of these issues by ANDing the pointer with 7 rather than 3 to determine its alignment, and by individually filling the first (8 - (pointer & 7)) bytes with 0 to align the buffer, rather than the first (pointer & 3) bytes. We also remove an unnecessary immediate MOV instruction.

A new benchmark is added to test the performance of memclrNoHeapPointers on non-aligned pointers.

Results of the existing and the new benchmark on a SiFive HiFive Unmatched A00 with 16GB of RAM running Ubuntu 23.04 are presented below.

Memclr/5-4 21.98n ± 7% 22.66n ± 9% ~ (p=0.079 n=10)
Memclr/16-4 20.85n ± 3% 21.09n ± 5% ~ (p=0.796 n=10)
Memclr/64-4 28.20n ± 4% 27.50n ± 3% ~ (p=0.093 n=10)
Memclr/256-4 53.66n ± 8% 53.44n ± 8% ~ (p=0.280 n=10)
Memclr/4096-4 522.6n ± 1% 523.4n ± 1% ~ (p=0.240 n=10)
Memclr/65536-4 24.17µ ± 0% 24.13µ ± 0% -0.19% (p=0.029 n=10)
Memclr/1M-4 446.9µ ± 0% 446.9µ ± 0% ~ (p=0.684 n=10)
Memclr/4M-4 12.69m ± 2% 12.79m ± 3% +0.78% (p=0.043 n=10)
Memclr/8M-4 29.75m ± 0% 29.76m ± 0% +0.03% (p=0.015 n=10)
Memclr/16M-4 60.34m ± 0% 60.32m ± 0% ~ (p=0.247 n=10)
Memclr/64M-4 241.2m ± 0% 241.3m ± 0% ~ (p=0.247 n=10)
MemclrUnaligned/0_5-4 27.71n ± 0% 27.72n ± 1% ~ (p=0.142 n=10)
MemclrUnaligned/0_16-4 26.95n ± 0% 26.04n ± 0% -3.38% (p=0.000 n=10)
MemclrUnaligned/0_64-4 38.27n ± 4% 40.15n ± 6% +4.89% (p=0.005 n=10)
MemclrUnaligned/0_256-4 63.95n ± 3% 64.19n ± 2% ~ (p=0.971 n=10)
MemclrUnaligned/0_4096-4 532.6n ± 1% 530.9n ± 1% ~ (p=0.324 n=10)
MemclrUnaligned/0_65536-4 24.30µ ± 0% 24.22µ ± 0% -0.32% (p=0.023 n=10)
MemclrUnaligned/1_5-4 29.40n ± 0% 29.39n ± 0% ~ (p=0.060 n=10)
MemclrUnaligned/1_16-4 632.65n ± 1% 63.80n ± 2% -89.92% (p=0.000 n=10)
MemclrUnaligned/1_64-4 4091.00n ± 1% 73.23n ± 1% -98.21% (p=0.000 n=10)
MemclrUnaligned/1_256-4 17803.50n ± 1% 92.03n ± 1% -99.48% (p=0.000 n=10)
MemclrUnaligned/1_4096-4 294150.0n ± 1% 561.9n ± 1% -99.81% (p=0.000 n=10)
MemclrUnaligned/1_65536-4 4692.80µ ± 1% 24.44µ ± 0% -99.48% (p=0.000 n=10)
MemclrUnaligned/4_5-4 27.71n ± 0% 27.71n ± 0% ~ (p=0.308 n=10)
MemclrUnaligned/4_16-4 1187.00n ± 1% 50.74n ± 3% -95.72% (p=0.000 n=10)
MemclrUnaligned/4_64-4 4617.00n ± 1% 59.89n ± 2% -98.70% (p=0.000 n=10)
MemclrUnaligned/4_256-4 18472.50n ± 1% 84.76n ± 2% -99.54% (p=0.000 n=10)
MemclrUnaligned/4_4096-4 292904.0n ± 1% 553.7n ± 0% -99.81% (p=0.000 n=10)
MemclrUnaligned/4_65536-4 4716.12µ ± 0% 24.38µ ± 0% -99.48% (p=0.000 n=10)
MemclrUnaligned/7_5-4 29.39n ± 0% 29.39n ± 0% ~ (p=1.000 n=10)
MemclrUnaligned/7_16-4 636.80n ± 1% 48.33n ± 5% -92.41% (p=0.000 n=10)
MemclrUnaligned/7_64-4 4094.00n ± 1% 58.88n ± 3% -98.56% (p=0.000 n=10)
MemclrUnaligned/7_256-4 17869.00n ± 2% 82.70n ± 3% -99.54% (p=0.000 n=10)
MemclrUnaligned/7_4096-4 294110.5n ± 1% 554.6n ± 1% -99.81% (p=0.000 n=10)
MemclrUnaligned/7_65536-4 4735.00µ ± 1% 24.28µ ± 0% -99.49% (p=0.000 n=10)
MemclrUnaligned/0_1M-4 447.8µ ± 0% 450.0µ ± 1% +0.51% (p=0.000 n=10)
MemclrUnaligned/0_4M-4 12.68m ± 1% 12.64m ± 2% -0.33% (p=0.015 n=10)
MemclrUnaligned/0_8M-4 29.76m ± 0% 29.79m ± 2% ~ (p=0.075 n=10)
MemclrUnaligned/0_16M-4 60.34m ± 1% 60.49m ± 1% ~ (p=0.353 n=10)
MemclrUnaligned/0_64M-4 241.3m ± 0% 241.4m ± 0% ~ (p=0.247 n=10)
MemclrUnaligned/1_1M-4 75937.3µ ± 1% 449.9µ ± 0% -99.41% (p=0.000 n=10)
MemclrUnaligned/1_4M-4 313.96m ± 2% 12.69m ± 0% -95.96% (p=0.000 n=10)
MemclrUnaligned/1_8M-4 630.97m ± 1% 29.76m ± 0% -95.28% (p=0.000 n=10)
MemclrUnaligned/1_16M-4 1263.47m ± 1% 60.35m ± 2% -95.22% (p=0.000 n=10)
MemclrUnaligned/1_64M-4 5053.5m ± 0% 241.3m ± 0% -95.23% (p=0.000 n=10)
MemclrUnaligned/4_1M-4 75880.5µ ± 2% 446.5µ ± 0% -99.41% (p=0.000 n=10)
MemclrUnaligned/4_4M-4 314.00m ± 1% 12.71m ± 2% -95.95% (p=0.000 n=10)
MemclrUnaligned/4_8M-4 630.63m ± 1% 29.77m ± 2% -95.28% (p=0.000 n=10)
MemclrUnaligned/4_16M-4 1257.80m ± 0% 60.34m ± 2% -95.20% (p=0.000 n=10)
MemclrUnaligned/4_64M-4 5041.3m ± 1% 241.2m ± 0% -95.21% (p=0.000 n=10)
MemclrUnaligned/7_1M-4 75866.2µ ± 1% 446.9µ ± 0% -99.41% (p=0.000 n=10)
MemclrUnaligned/7_4M-4 309.86m ± 1% 12.70m ± 1% -95.90% (p=0.000 n=10)
MemclrUnaligned/7_8M-4 626.67m ± 1% 29.75m ± 2% -95.25% (p=0.000 n=10)
MemclrUnaligned/7_16M-4 1252.84m ± 1% 60.31m ± 0% -95.19% (p=0.000 n=10)
MemclrUnaligned/7_64M-4 5015.8m ± 1% 241.4m ± 0% -95.19% (p=0.000 n=10)
geomean 339.1µ 35.83µ -89.43%

Change-Id: I3b958a1d8e8f5ef205052e6b985a5ce21e92ef85
Reviewed-on: https://go-review.googlesource.com/c/go/+/496455
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: M Zhuo <mzh@golangcn.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
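The fixed strategy (zero single bytes up to an 8-byte boundary, then whole 8-byte words, then any tail) can be modelled in Go with package unsafe. This is an illustrative sketch with hypothetical function names, not the runtime's memclrNoHeapPointers, which is written in assembly on riscv64.

```go
package main

import (
	"fmt"
	"unsafe"
)

// clearAligned zeroes b in three phases, modelling the fixed approach:
//  1. zero single bytes until the write address is 8-byte aligned,
//  2. zero whole 8-byte words with aligned stores,
//  3. zero any remaining tail bytes.
func clearAligned(b []byte) {
	n := len(b)
	if n == 0 {
		return
	}
	addr := uintptr(unsafe.Pointer(&b[0]))
	// Fixed head length: (8 - (addr & 7)) & 7 bytes, not addr & 3.
	head := min(int((8-(addr&7))&7), n)
	i := 0
	for ; i < head; i++ {
		b[i] = 0
	}
	for ; i+8 <= n; i += 8 {
		// &b[i] is 8-byte aligned here, so this is an aligned 8-byte store.
		*(*uint64)(unsafe.Pointer(&b[i])) = 0
	}
	for ; i < n; i++ {
		b[i] = 0
	}
}

// checkClear fills a buffer with 0xff, clears n bytes starting at offset
// off, and reports whether exactly that range became zero.
func checkClear(off, n int) bool {
	buf := make([]byte, off+n+8)
	for i := range buf {
		buf[i] = 0xff
	}
	clearAligned(buf[off : off+n])
	for i, v := range buf {
		inside := i >= off && i < off+n
		if inside && v != 0 || !inside && v != 0xff {
			return false
		}
	}
	return true
}

func main() {
	ok := true
	for off := 0; off < 8; off++ {
		for _, n := range []int{0, 5, 16, 64, 100} {
			ok = ok && checkClear(off, n)
		}
	}
	fmt.Println("all ranges cleared correctly:", ok)
}
```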

2023-01-31 | cmd/compile: inline known-size memclrNoHeapPointers calls | Jakub Ciolek
This patch rewrites known-size calls to memclrNoHeapPointers with an OpZero. This significantly improves performance and lets some clears be removed by dead-store elimination (DSE).

One of the cases where this applies is zeroing a known-size array, example:

    var x [256]int8
    ...
    for a := range x {
        x[a] = 0
    }

Other cases can be found in the runtime itself, where memclrNoHeapPointers is sometimes directly invoked with a constant.

It seems that for some clear sizes on some architectures (AMD64, maybe others), memclrNoHeapPointers is more performant than OpZero. See issue #56997 for more details.

Benches ARM (M1 Pro):

name old time/op new time/op delta
MemclrKnownSize1-10 2.03ns ± 0% 0.31ns ± 0% -84.69% (p=0.000 n=18+19)
MemclrKnownSize2-10 1.97ns ± 0% 0.31ns ± 0% -84.19% (p=0.000 n=12+19)
MemclrKnownSize4-10 2.02ns ± 0% 0.31ns ± 0% -84.56% (p=0.000 n=12+20)
MemclrKnownSize8-10 2.02ns ± 0% 0.31ns ± 0% -84.59% (p=0.000 n=14+19)
MemclrKnownSize16-10 2.15ns ± 0% 0.31ns ± 0% -85.50% (p=0.000 n=18+19)
MemclrKnownSize32-10 2.48ns ± 0% 0.31ns ± 0% -87.48% (p=0.000 n=20+19)
MemclrKnownSize64-10 1.93ns ± 0% 0.62ns ± 0% -67.88% (p=0.000 n=20+19)
MemclrKnownSize112-10 2.48ns ± 0% 1.80ns ± 0% -27.74% (p=0.000 n=19+20)
MemclrKnownSize128-10 10.0ns ±112% 2.0ns ± 0% -79.76% (p=0.000 n=18+17)
MemclrKnownSize192-10 27.4ns ±103% 2.6ns ± 0% -90.38% (p=0.000 n=16+19)
MemclrKnownSize248-10 9.67ns ±43% 3.26ns ± 0% -66.29% (p=0.000 n=19+19)
MemclrKnownSize256-10 85.4ns ±148% 3.3ns ± 0% -96.18% (p=0.000 n=20+20)
MemclrKnownSize512-10 223ns ±54% 6ns ± 0% -97.42% (p=0.000 n=18+20)
MemclrKnownSize1024-10 216ns ±26% 11ns ± 0% -95.00% (p=0.000 n=18+15)
MemclrKnownSize4096-10 265ns ± 2% 88ns ± 0% -66.84% (p=0.000 n=19+17)
MemclrKnownSize512KiB-10 9.91µs ± 1% 10.23µs ± 2% +3.14% (p=0.000 n=19+19)
[Geo mean] 15.6ns 2.5ns -83.62%

name old speed new speed delta
MemclrKnownSize1-10 493MB/s ± 0% 3216MB/s ± 0% +553.04% (p=0.000 n=18+19)
MemclrKnownSize2-10 1.02GB/s ± 0% 6.43GB/s ± 0% +532.33% (p=0.000 n=16+19)
MemclrKnownSize4-10 1.99GB/s ± 0% 12.86GB/s ± 0% +547.67% (p=0.000 n=18+20)
MemclrKnownSize8-10 3.96GB/s ± 0% 25.72GB/s ± 0% +548.81% (p=0.000 n=19+19)
MemclrKnownSize16-10 7.46GB/s ± 0% 51.43GB/s ± 0% +589.72% (p=0.000 n=20+19)
MemclrKnownSize32-10 12.9GB/s ± 0% 102.9GB/s ± 0% +698.60% (p=0.000 n=20+18)
MemclrKnownSize64-10 33.1GB/s ± 0% 103.0GB/s ± 0% +211.34% (p=0.000 n=19+19)
MemclrKnownSize112-10 45.1GB/s ± 0% 62.4GB/s ± 0% +38.38% (p=0.000 n=19+20)
MemclrKnownSize128-10 13.3GB/s ±107% 63.5GB/s ± 0% +378.03% (p=0.000 n=19+18)
MemclrKnownSize192-10 6.97GB/s ±139% 72.72GB/s ± 0% +943.44% (p=0.000 n=19+19)
MemclrKnownSize248-10 25.9GB/s ±46% 76.1GB/s ± 0% +194.16% (p=0.000 n=20+17)
MemclrKnownSize256-10 8.64GB/s ±196% 78.51GB/s ± 0% +808.19% (p=0.000 n=20+20)
MemclrKnownSize512-10 2.33GB/s ±86% 89.13GB/s ± 0% +3719.50% (p=0.000 n=17+20)
MemclrKnownSize1024-10 4.85GB/s ±32% 94.93GB/s ± 0% +1856.74% (p=0.000 n=18+19)
MemclrKnownSize4096-10 15.4GB/s ± 2% 46.6GB/s ± 0% +201.55% (p=0.000 n=19+18)
MemclrKnownSize512KiB-10 52.9GB/s ± 1% 51.3GB/s ± 2% -3.04% (p=0.000 n=19+19)
[Geo mean] 7.54GB/s 42.86GB/s +468.76%

Intel Alder Lake 12600k:

name old time/op new time/op delta
MemclrKnownSize1-16 0.59ns ± 3% 0.38ns ± 6% -36.00% (p=0.000 n=19+18)
MemclrKnownSize2-16 0.57ns ± 1% 0.19ns ± 5% -66.27% (p=0.000 n=19+19)
MemclrKnownSize4-16 0.66ns ± 2% 0.36ns ±21% -45.12% (p=0.000 n=19+20)
MemclrKnownSize8-16 0.74ns ± 1% 0.30ns ±26% -59.81% (p=0.000 n=18+20)
MemclrKnownSize16-16 1.00ns ± 7% 0.21ns ± 8% -79.51% (p=0.000 n=20+19)
MemclrKnownSize32-16 0.95ns ± 1% 0.40ns ± 1% -57.61% (p=0.000 n=20+18)
MemclrKnownSize64-16 1.20ns ± 2% 0.41ns ± 0% -65.82% (p=0.000 n=20+18)
MemclrKnownSize112-16 1.27ns ± 2% 1.03ns ± 0% -19.35% (p=0.000 n=20+18)
MemclrKnownSize128-16 1.34ns ± 2% 1.03ns ± 0% -23.02% (p=0.000 n=20+20)
MemclrKnownSize192-16 1.92ns ± 2% 1.44ns ± 0% -24.89% (p=0.000 n=20+16)
MemclrKnownSize248-16 2.77ns ± 1% 3.29ns ± 0% +18.81% (p=0.000 n=20+16)
MemclrKnownSize256-16 1.92ns ± 1% 1.86ns ± 0% -3.49% (p=0.000 n=19+15)
MemclrKnownSize512-16 2.81ns ± 2% 3.49ns ± 0% +24.15% (p=0.000 n=20+17)
MemclrKnownSize1024-16 4.02ns ± 1% 6.78ns ± 0% +68.44% (p=0.000 n=20+18)
MemclrKnownSize4096-16 17.2ns ± 2% 14.4ns ± 0% -16.73% (p=0.000 n=20+17)
MemclrKnownSize512KiB-16 6.71µs ± 1% 6.52µs ± 0% -2.85% (p=0.000 n=20+18)
[Geo mean] 2.60ns 1.71ns -34.06%

name old speed new speed delta
MemclrKnownSize1-16 1.71GB/s ± 3% 2.67GB/s ± 6% +56.39% (p=0.000 n=19+18)
MemclrKnownSize2-16 3.52GB/s ± 2% 10.43GB/s ± 6% +196.04% (p=0.000 n=20+20)
MemclrKnownSize4-16 6.06GB/s ± 1% 10.83GB/s ±11% +78.63% (p=0.000 n=19+18)
MemclrKnownSize8-16 10.7GB/s ± 1% 27.0GB/s ±21% +151.49% (p=0.000 n=18+20)
MemclrKnownSize16-16 16.0GB/s ± 8% 78.1GB/s ± 7% +387.24% (p=0.000 n=20+19)
MemclrKnownSize32-16 33.6GB/s ± 1% 79.4GB/s ± 1% +135.89% (p=0.000 n=20+18)
MemclrKnownSize64-16 53.3GB/s ± 2% 155.9GB/s ± 0% +192.58% (p=0.000 n=20+18)
MemclrKnownSize112-16 88.0GB/s ± 2% 109.1GB/s ± 0% +23.97% (p=0.000 n=20+18)
MemclrKnownSize128-16 95.3GB/s ± 2% 123.8GB/s ± 0% +29.88% (p=0.000 n=20+20)
MemclrKnownSize192-16 100GB/s ± 2% 133GB/s ± 0% +33.12% (p=0.000 n=20+17)
MemclrKnownSize248-16 89.7GB/s ± 1% 75.5GB/s ± 0% -15.84% (p=0.000 n=20+19)
MemclrKnownSize256-16 133GB/s ± 1% 138GB/s ± 0% +3.61% (p=0.000 n=19+14)
MemclrKnownSize512-16 182GB/s ± 2% 147GB/s ± 0% -19.46% (p=0.000 n=20+17)
MemclrKnownSize1024-16 254GB/s ± 1% 151GB/s ± 0% -40.64% (p=0.000 n=20+18)
MemclrKnownSize4096-16 237GB/s ± 2% 285GB/s ± 0% +20.09% (p=0.000 n=20+17)
MemclrKnownSize512KiB-16 78.2GB/s ± 1% 80.4GB/s ± 0% +2.93% (p=0.000 n=20+18)
[Geo mean] 42.1GB/s 63.8GB/s +51.53%

compilecmp linux/amd64:

runtime
runtime.(*pallocData).allocAll 85 -> 45 (-47.06%)
runtime.(*pageAlloc).allocRange 942 -> 923 (-2.02%)
runtime.(*pageAlloc).free 798 -> 774 (-3.01%)
runtime.(*pageBits).clearAll 66 -> 20 (-69.70%)
runtime.startCheckmarks 255 -> 246 (-3.53%)
runtime.(*pallocData).freeAll 86 -> 46 (-46.51%)
runtime.(*pallocBits).freeAll 66 -> 20 (-69.70%)
runtime.(*consistentHeapStats).unsafeClear 66 -> 19 (-71.21%)
runtime.newproc1 965 -> 933 (-3.32%)

crypto/rc4
crypto/rc4.(*Cipher).Reset 78 -> 69 (-11.54%)

compress/bzip2
compress/bzip2.(*reader).readBlock 2973 -> 2941 (-1.08%)

image/jpeg
image/jpeg.(*decoder).processDHT 1179 -> 1166 (-1.10%)

index/suffixarray
index/suffixarray.bucketMax_8_32 394 -> 241 (-38.83%)
index/suffixarray.freq_8_32 317 -> 185 (-41.64%)
index/suffixarray.freq_8_64 317 -> 178 (-43.85%)
index/suffixarray.bucketMin_8_32 394 -> 243 (-38.32%)
index/suffixarray.bucketMin_8_64 398 -> 234 (-41.21%)
index/suffixarray.bucketMax_8_64 398 -> 234 (-41.21%)

compress/flate
compress/flate.(*huffmanBitWriter).generateCodegen 965 -> 838 (-13.16%)
compress/flate.(*compressor).reset 429 -> 409 (-4.66%)

cmd/vendor/golang.org/x/sys/unix
cmd/vendor/golang.org/x/sys/unix.(*FdSet).Zero 66 -> 60 (-9.09%)
cmd/vendor/golang.org/x/sys/unix.(*Ifreq).SetInet4Addr 211 -> 129 (-38.86%)
cmd/vendor/golang.org/x/sys/unix.(*Ifreq).SetUint32 98 -> 14 (-85.71%)
cmd/vendor/golang.org/x/sys/unix.(*Ifreq).clear 66 -> 11 (-83.33%)
cmd/vendor/golang.org/x/sys/unix.(*Ifreq).SetUint16 101 -> 15 (-85.15%)
cmd/vendor/golang.org/x/sys/unix.(*CPUSet).Zero 66 -> 60 (-9.09%)

internal/coverage/decodemeta
internal/coverage/decodemeta.(*CoverageMetaFileReader).rdUint64 325 -> 293 (-9.85%)

crypto/tls
crypto/tls.(*halfConn).setTrafficSecret 253 -> 247 (-2.37%)
crypto/tls.(*Conn).readRecordOrCCS 10315 -> 10283 (-0.31%)
crypto/tls.(*halfConn).changeCipherSpec 271 -> 261 (-3.69%)
crypto/tls.(*Conn).writeRecordLocked 1765 -> 1748 (-0.96%)

file before after Δ %
runtime.s 512467 512164 -303 -0.059%
crypto/rc4.s 955 946 -9 -0.942%
compress/bzip2.s 9586 9554 -32 -0.334%
image/jpeg.s 32122 32109 -13 -0.040%
index/suffixarray.s 38547 37644 -903 -2.343%
compress/flate.s 46668 46521 -147 -0.315%
cmd/vendor/golang.org/x/sys/unix.s 118620 118301 -319 -0.269%
internal/coverage/decodemeta.s 7224 7192 -32 -0.443%
crypto/tls.s 288762 288697 -65 -0.023%
cmd/compile/internal/ssa.s 3639799 3640727 +928 +0.025%
total 20790248 20789353 -895 -0.004%

src/runtime benchmarks (Linux Alder Lake 12600k):

name old time/op new time/op delta
MakeChan/Byte-16 26.2ns ± 2% 25.6ns ± 3% -2.05% (p=0.003 n=9+10)
MakeChan/Int-16 33.9ns ± 2% 33.3ns ± 4% -1.99% (p=0.015 n=10+10)
MakeChan/Ptr-16 54.2ns ± 2% 53.7ns ± 1% -0.90% (p=0.016 n=10+9)
MakeChan/Struct/0-16 23.8ns ± 3% 23.4ns ± 1% -1.72% (p=0.009 n=10+8)
MakeChan/Struct/32-16 55.9ns ± 2% 53.9ns ± 1% -3.48% (p=0.000 n=10+10)
MakeChan/Struct/40-16 63.5ns ± 1% 61.1ns ± 2% -3.79% (p=0.000 n=10+9)
ChanNonblocking-16 0.22ns ± 0% 0.22ns ± 0% +0.40% (p=0.011 n=9+8)
SelectUncontended-16 4.63ns ± 1% 4.62ns ± 0% -0.35% (p=0.001 n=10+8)
SelectSyncContended-16 1.58µs ± 2% 1.59µs ± 1% ~ (p=0.540 n=10+10)
SelectAsyncContended-16 290ns ± 0% 291ns ± 0% +0.14% (p=0.012 n=8+9)
SelectNonblock-16 0.95ns ± 1% 0.95ns ± 1% ~ (p=0.546 n=9+9)
ChanUncontended-16 239ns ± 3% 242ns ± 6% ~ (p=0.886 n=9+10)
ChanContended-16 17.7µs ± 1% 18.2µs ± 1% +2.87% (p=0.000 n=10+9)
ChanSync-16 109ns ± 2% 109ns ± 1% ~ (p=0.342 n=10+10)
ChanSyncWork-16 6.55µs ± 1% 6.53µs ± 1% ~ (p=0.101 n=10+10)
ChanProdCons0-16 502ns ± 1% 499ns ± 0% -0.55% (p=0.001 n=10+9)
ChanProdCons10-16 373ns ± 2% 377ns ± 1% ~ (p=0.095 n=10+9)
ChanProdCons100-16 224ns ± 2% 223ns ± 3% ~ (p=0.150 n=9+10)
ChanProdConsWork0-16 491ns ± 1% 484ns ± 0% -1.26% (p=0.000 n=10+9)
ChanProdConsWork10-16 451ns ± 2% 448ns ± 2% ~ (p=0.210 n=8+10)
ChanProdConsWork100-16 406ns ± 0% 407ns ± 1% ~ (p=0.138 n=8+8)
SelectProdCons-16 509ns ± 0% 509ns ± 0% ~ (p=0.917 n=9+9)
ReceiveDataFromClosedChan-16 12.1ns ± 0% 12.1ns ± 0% ~ (p=0.780 n=10+10)
ChanCreation-16 22.6ns ± 1% 22.4ns ± 0% -0.72% (p=0.001 n=10+8)
ChanSem-16 165ns ± 1% 166ns ± 1% +0.72% (p=0.002 n=10+10)
ChanPopular-16 500µs ± 2% 498µs ± 1% ~ (p=0.218 n=10+10)
ChanClosed-16 0.29ns ± 0% 0.29ns ± 0% +0.09% (p=0.019 n=9+8)
CallClosure-16 1.28ns ± 0% 1.27ns ± 0% -0.51% (p=0.000 n=9+9)
CallClosure1-16 1.50ns ± 0% 1.50ns ± 0% ~ (p=0.123 n=9+9)
CallClosure2-16 8.86ns ± 1% 8.86ns ± 3% ~ (p=0.590 n=9+10)
CallClosure3-16 8.75ns ± 2% 8.69ns ± 2% ~ (p=0.247 n=10+10)
CallClosure4-16 8.65ns ± 2% 8.56ns ± 2% ~ (p=0.105 n=10+10)
Complex128DivNormal-16 2.47ns ± 0% 2.47ns ± 0% ~ (p=0.790 n=10+9)
Complex128DivNisNaN-16 4.44ns ± 0% 4.43ns ± 0% ~ (p=0.564 n=10+10)
Complex128DivDisNaN-16 4.48ns ± 0% 4.48ns ± 0% ~ (p=0.101 n=10+10)
Complex128DivNisInf-16 2.58ns ± 0% 2.58ns ± 0% ~ (p=0.808 n=10+10)
Complex128DivDisInf-16 6.30ns ± 0% 6.31ns ± 0% ~ (p=0.305 n=10+10)
SetTypePtr-16 0.73ns ± 1% 0.73ns ± 3% ~ (p=0.644 n=10+10)
SetTypePtr8-16 4.12ns ± 0% 4.12ns ± 0% ~ (p=0.127 n=10+10)
SetTypePtr16-16 4.13ns ± 1% 4.12ns ± 0% ~ (p=0.109 n=10+10)
SetTypePtr32-16 4.12ns ± 0% 4.12ns ± 0% ~ (p=0.203 n=9+10)
SetTypePtr64-16 4.12ns ± 0% 4.12ns ± 0% ~ (p=0.696 n=10+10)
SetTypePtr126-16 6.91ns ± 0% 6.91ns ± 0% ~ (p=0.469 n=10+10)
SetTypePtr128-16 6.66ns ± 0% 6.67ns ± 0% ~ (p=0.246 n=9+10)
SetTypePtrSlice-16 54.1ns ± 1% 54.1ns ± 1% ~ (p=0.509 n=9+10)
SetTypeNode1-16 4.13ns ± 1% 4.12ns ± 0% ~ (p=0.342 n=10+10)
SetTypeNode1Slice-16 10.1ns ± 1% 10.0ns ± 1% -1.18% (p=0.000 n=10+10)
SetTypeNode8-16 4.12ns ± 0% 4.12ns ± 0% ~ (p=0.137 n=8+8)
SetTypeNode8Slice-16 22.6ns ± 0% 22.6ns ± 0% ~ (p=0.423 n=10+10)
SetTypeNode64-16 6.90ns ± 0% 6.91ns ± 0% ~ (p=0.275 n=10+10)
SetTypeNode64Slice-16 173ns ± 0% 173ns ± 0% ~ (p=0.610 n=9+10)
SetTypeNode64Dead-16 5.53ns ± 0% 5.52ns ± 0% ~ (p=0.123 n=10+6)
SetTypeNode64DeadSlice-16 150ns ± 0% 150ns ± 0% ~ (p=0.398 n=10+10)
SetTypeNode124-16 6.90ns ± 0% 6.90ns ± 0% ~ (p=0.779 n=10+10)
SetTypeNode124Slice-16 222ns ± 5% 217ns ± 0% ~ (p=0.302 n=10+10)
SetTypeNode126-16 6.66ns ± 0% 6.66ns ± 0% ~ (p=0.324 n=10+9)
SetTypeNode126Slice-16 218ns ± 0% 218ns ± 0% ~ (p=0.119 n=9+10)
SetTypeNode128-16 9.76ns ± 0% 9.73ns ± 0% -0.31% (p=0.003 n=9+10)
SetTypeNode128Slice-16 279ns ± 0% 278ns ± 0% ~ (p=0.112 n=10+9)
SetTypeNode130-16 9.77ns ± 0% 9.73ns ± 0% -0.33% (p=0.002 n=10+10)
SetTypeNode130Slice-16 284ns ± 0% 284ns ± 0% ~ (p=0.668 n=10+10)
SetTypeNode1024-16 51.2ns ± 0% 51.6ns ± 1% ~ (p=0.080 n=9+9)
SetTypeNode1024Slice-16 1.83µs ± 0% 1.82µs ± 0% ~ (p=0.115 n=10+10)
Allocation-16 4.64µs ± 1% 4.37µs ± 1% -5.69% (p=0.000 n=9+9)
ReadMemStats-16 5.62µs ± 2% 5.55µs ± 5% -1.36% (p=0.050 n=10+10)
WriteBarrier-16 4.95ns ± 3% 4.99ns ± 3% ~ (p=0.255 n=10+10)
BulkWriteBarrier-16 1.69ns ± 2% 1.63ns ± 4% -3.77% (p=0.001 n=10+10)
ScanStackNoLocals-16 12.8ms ± 2% 12.9ms ± 1% +0.72% (p=0.019 n=10+10)
MSpanCountAlloc/bits=64-16 1.65ns ± 0% 1.65ns ± 0% ~ (p=0.124 n=10+10)
MSpanCountAlloc/bits=128-16 2.08ns ± 1% 2.06ns ± 1% -0.87% (p=0.000 n=10+10)
MSpanCountAlloc/bits=256-16 2.71ns ± 1% 2.69ns ± 1% -0.74% (p=0.001 n=10+9)
MSpanCountAlloc/bits=512-16 4.15ns ± 0% 4.23ns ± 2% +2.15% (p=0.000 n=10+10)
MSpanCountAlloc/bits=1024-16 7.89ns ± 1% 7.89ns ± 1% ~ (p=0.867 n=10+10)
Hash5-16 1.93ns ± 1% 2.01ns ± 0% +3.99% (p=0.000 n=10+8)
Hash16-16 2.04ns ± 1% 2.21ns ± 1% +8.61% (p=0.000 n=10+10)
Hash64-16 2.67ns ± 0% 2.67ns ± 0% ~ (p=0.154 n=9+9)
Hash1024-16 16.4ns ± 0% 16.4ns ± 0% +0.17% (p=0.020 n=9+10)
Hash65536-16 886ns ± 0% 885ns ± 0% ~ (p=0.725 n=10+10)
AlignedLoad-16 0.96ns ± 2% 0.95ns ± 3% ~ (p=0.123 n=10+10)
UnalignedLoad-16 0.95ns ± 2% 1.01ns ± 2% +6.31% (p=0.000 n=10+10)
EqEfaceConcrete-16 0.31ns ± 3% 0.33ns ± 5% +8.10% (p=0.000 n=10+10)
EqIfaceConcrete-16 0.31ns ±13% 0.28ns ± 2% -9.23% (p=0.001 n=10+10)
NeEfaceConcrete-16 0.29ns ± 1% 0.31ns ± 7% +5.59% (p=0.010 n=8+8)
NeIfaceConcrete-16 0.28ns ± 2% 0.29ns ± 1% +4.49% (p=0.000 n=9+8)
ConvT2EByteSized/bool-16 0.53ns ± 1% 0.52ns ± 1% -2.18% (p=0.000 n=10+10)
ConvT2EByteSized/uint8-16 0.53ns ± 1% 0.53ns ± 0% +1.22% (p=0.000 n=10+10)
ConvT2ESmall-16 1.13ns ± 0% 1.13ns ± 0% ~ (p=0.774 n=9+9)
ConvT2EUintptr-16 1.03ns ± 0% 1.04ns ± 0% +0.50% (p=0.000 n=10+8)
ConvT2ELarge-16 14.4ns ± 2% 14.4ns ± 1% ~ (p=0.726 n=10+10)
ConvT2ISmall-16 1.13ns ± 0% 1.13ns ± 0% ~ (p=0.693 n=9+10)
ConvT2IUintptr-16 1.03ns ± 0% 1.03ns ± 0% +0.44% (p=0.000 n=10+10)
ConvT2ILarge-16 14.2ns ± 1% 14.4ns ± 1% +0.85% (p=0.007 n=9+10)
ConvI2E-16 0.54ns ± 1% 0.54ns ± 0% -0.39% (p=0.037 n=10+8)
ConvI2I-16 2.68ns ± 0% 2.70ns ± 1% +0.73% (p=0.000 n=9+10)
AssertE2T-16 0.28ns ± 1% 0.39ns ± 5% +37.38% (p=0.000 n=10+10)
AssertE2TLarge-16 0.42ns ± 2% 0.48ns ± 1% +14.92% (p=0.000 n=9+10)
AssertE2I-16 2.67ns ± 0% 2.67ns ± 0% ~ (p=0.352 n=9+9)
AssertI2T-16 0.37ns ± 3% 0.34ns ± 1% -6.16% (p=0.000 n=10+10)
AssertI2I-16 2.67ns ± 0% 2.67ns ± 0% ~ (p=0.286 n=10+10)
AssertI2E-16 0.54ns ± 1% 0.54ns ± 0% -0.94% (p=0.000 n=10+10)
AssertE2E-16 0.41ns ± 0% 0.41ns ± 1% ~ (p=0.880 n=9+9)
AssertE2T2-16 0.41ns ± 1% 0.41ns ± 1% ~ (p=0.725 n=10+10)
AssertE2T2Blank-16 0.24ns ± 5% 0.21ns ± 1% -14.79% (p=0.000 n=10+9)
AssertI2E2-16 0.69ns ± 0% 0.69ns ± 1% ~ (p=0.541 n=10+10)
AssertI2E2Blank-16 0.26ns ± 9% 0.21ns ± 1% -18.86% (p=0.000 n=10+9)
AssertE2E2-16 0.53ns ± 1% 0.53ns ± 1% +0.72% (p=0.004 n=10+10)
AssertE2E2Blank-16 0.23ns ± 4% 0.21ns ± 1% -8.42% (p=0.000 n=10+10)
ConvT2Ezero/zero/16-16 1.13ns ± 0% 1.14ns ± 1% ~ (p=0.583 n=9+10)
ConvT2Ezero/zero/32-16 1.13ns ± 0% 1.13ns ± 0% ~ (p=0.417 n=10+10)
ConvT2Ezero/zero/64-16 1.03ns ± 1% 1.03ns ± 0% ~ (p=0.051 n=10+10)
ConvT2Ezero/zero/str-16 1.03ns ± 0% 1.03ns ± 0% ~ (p=0.132 n=10+10)
ConvT2Ezero/zero/slice-16 1.14ns ± 0% 1.15ns ± 0% +0.49% (p=0.001 n=10+10)
ConvT2Ezero/zero/big-16 123ns ± 1% 123ns ± 1% ~ (p=0.171 n=10+10)
ConvT2Ezero/nonzero/str-16 19.4ns ± 1% 19.3ns ± 3% ~ (p=0.548 n=9+10)
ConvT2Ezero/nonzero/slice-16 22.2ns ± 2% 22.0ns ± 2% ~ (p=0.109 n=10+10)
ConvT2Ezero/nonzero/big-16 123ns ± 1% 123ns ± 1% ~ (p=0.446 n=10+8)
ConvT2Ezero/smallint/16-16 1.13ns ± 0% 1.14ns ± 1% ~ (p=0.362 n=10+10)
ConvT2Ezero/smallint/32-16 1.13ns ± 0% 1.13ns ± 0% ~ (p=0.907 n=10+9)
ConvT2Ezero/smallint/64-16 1.04ns ± 0% 1.03ns ± 0% -0.38% (p=0.002 n=10+10)
ConvT2Ezero/largeint/16-16 6.65ns ± 1% 6.65ns ± 2% ~ (p=0.618 n=10+9)
ConvT2Ezero/largeint/32-16 6.75ns ± 3% 6.63ns ± 2% -1.77% (p=0.015 n=10+10)
ConvT2Ezero/largeint/64-16 9.19ns ± 1% 9.26ns ± 2% ~ (p=0.123 n=10+10)
Malloc8-16 8.66ns ± 1% 8.89ns ± 2% +2.74% (p=0.000 n=10+10)
Malloc16-16 13.7ns ± 1% 13.8ns ± 1% +0.71% (p=0.022 n=10+8)
MallocTypeInfo8-16 11.7ns ± 3% 11.6ns ± 2% ~ (p=0.469 n=10+10)
MallocTypeInfo16-16 18.3ns ± 1% 18.2ns ± 2% ~ (p=0.251 n=9+10)
MallocLargeStruct-16 195ns ± 1% 198ns ± 1% +1.65% (p=0.000 n=9+10)
GoroutineSelect-16 1.10ms ± 1% 1.12ms ± 1% +1.36% (p=0.000 n=10+8)
GoroutineBlocking-16 986µs ± 1% 998µs ± 1% +1.23% (p=0.002 n=10+10)
GoroutineForRange-16 985µs ± 1% 1001µs ± 1% +1.68% (p=0.000 n=10+10)
GoroutineIdle-16 679µs ± 1% 691µs ± 0% +1.74% (p=0.000 n=10+9)
HashStringSpeed-16 5.33ns ± 5% 5.19ns ± 4% ~ (p=0.113 n=9+9)
HashBytesSpeed-16 8.20ns ± 3% 8.24ns ± 1% ~ (p=0.497 n=10+9)
HashInt32Speed-16 4.01ns ± 2% 3.90ns ± 4% -2.63% (p=0.011 n=9+10)
HashInt64Speed-16 3.94ns ± 4% 3.79ns ± 1% -3.74% (p=0.003 n=10+9)
HashStringArraySpeed-16 12.5ns ± 4% 12.3ns ± 1% ~ (p=0.055 n=10+10)
MegMap-16 3.72ns ± 1% 3.73ns ± 1% ~ (p=0.484 n=9+10)
MegOneMap-16 2.28ns ± 1% 2.27ns ± 1% ~ (p=0.287 n=10+10)
MegEqMap-16 22.0µs ± 3% 22.3µs ± 2% +1.48% (p=0.028 n=10+9)
MegEmptyMap-16 0.93ns ± 1% 0.92ns ± 1% -0.52% (p=0.030 n=10+10)
SmallStrMap-16 3.77ns ± 0% 3.77ns ± 0% ~ (p=0.324 n=10+10)
MapStringKeysEight_16-16 3.91ns ± 0% 3.91ns ± 0% ~ (p=0.088 n=9+9)
MapStringKeysEight_32-16 3.58ns ± 1% 3.50ns ± 0% -2.11% (p=0.000 n=10+10)
MapStringKeysEight_64-16 3.58ns ± 1% 3.50ns ± 0% -2.23% (p=0.000 n=10+10)
MapStringKeysEight_1M-16 3.57ns ± 1% 3.50ns ± 0% -1.92% (p=0.000 n=10+10)
IntMap-16 2.89ns ± 1% 2.89ns ± 0% ~ (p=0.381 n=10+10)
MapFirst/1-16 1.60ns ± 1% 1.59ns ± 2% -0.49% (p=0.020 n=10+9)
MapFirst/2-16 1.61ns ± 0% 1.59ns ± 1% -1.17% (p=0.001 n=10+10)
MapFirst/3-16 1.61ns ± 1% 1.59ns ± 1% -1.45% (p=0.000 n=10+10)
MapFirst/4-16 1.60ns ± 1% 1.59ns ± 1% -1.16% (p=0.000 n=10+10)
MapFirst/5-16 1.60ns ± 1% 1.58ns ± 1% -0.98% (p=0.000 n=10+10)
MapFirst/6-16 1.60ns ± 1% 1.59ns ± 1% -0.87% (p=0.001 n=10+10)
MapFirst/7-16 1.60ns ± 1% 1.59ns ± 1% -0.79% (p=0.002 n=10+10)
MapFirst/8-16 1.60ns ± 1% 1.59ns ± 1% -0.67% (p=0.017 n=9+10)
MapFirst/9-16 2.83ns ± 0% 2.83ns ± 0% ~ (p=0.492 n=10+10)
MapFirst/10-16 2.83ns ± 0% 2.84ns ± 0% +0.24% (p=0.017 n=10+10)
MapFirst/11-16 2.83ns ± 0% 2.83ns ± 0% ~ (p=0.445 n=10+10)
MapFirst/12-16 2.83ns ± 0% 2.83ns ± 0% ~ (p=0.564 n=10+10)
MapFirst/13-16 2.83ns ± 0% 2.84ns ± 0% ~ (p=0.175 n=9+10)
MapFirst/14-16 2.83ns ± 0% 2.83ns ± 0% ~ (p=0.322 n=10+9)
MapFirst/15-16 2.83ns ± 0% 2.84ns ± 1% ~ (p=0.209 n=10+10)
MapFirst/16-16 2.83ns ± 1% 2.84ns ± 0% ~ (p=0.238 n=10+10)
MapMid/1-16 1.64ns ± 0% 1.64ns ± 0% ~ (p=0.453 n=10+9)
MapMid/2-16 1.86ns ± 1% 1.86ns ± 0% ~ (p=0.764 n=10+9)
MapMid/3-16 1.86ns ± 0% 1.86ns ± 1% ~ (p=1.000 n=10+10)
MapMid/4-16 2.06ns ± 0% 2.06ns ± 0% -0.27% (p=0.014 n=10+9)
MapMid/5-16 2.06ns ± 0% 2.06ns ± 0% ~ (p=0.075 n=9+10)
MapMid/6-16 2.27ns ± 0% 2.27ns ± 1% ~ (p=0.898 n=10+10)
MapMid/7-16 2.27ns ± 1% 2.26ns ± 0% -0.23% (p=0.049 n=10+10)
MapMid/8-16 2.47ns ± 0% 2.47ns ± 1% ~ (p=0.840 n=10+10)
MapMid/9-16 4.21ns ± 7% 4.13ns ±19% ~ (p=0.315 n=10+10)
MapMid/10-16 4.17ns ± 7% 4.31ns ± 5% +3.37% (p=0.021 n=10+9)
MapMid/11-16 4.18ns ± 7% 4.32ns ± 6% +3.50% (p=0.015 n=10+10)
MapMid/12-16 4.34ns ± 7% 4.30ns ± 5% ~ (p=0.858 n=9+10)
MapMid/13-16 4.25ns ± 6% 4.28ns ± 6% ~ (p=0.489 n=9+9)
MapMid/14-16 3.75ns ±23% 3.90ns ±16% ~ (p=0.353 n=10+10)
MapMid/15-16 3.87ns ±25% 3.95ns ±26% ~ (p=0.315 n=10+10)
MapMid/16-16 4.06ns ±19% 3.94ns ±16% ~ (p=0.796 n=10+10)
MapLast/1-16 1.65ns ± 0% 1.65ns ± 0% ~ (p=0.607 n=10+10)
MapLast/2-16 1.86ns ± 0% 1.86ns ± 0% +0.26% (p=0.029 n=10+10)
MapLast/3-16 2.06ns ± 1% 2.06ns ± 0% ~ (p=0.689 n=8+9)
MapLast/4-16 2.27ns ± 1% 2.26ns ± 0% ~ (p=0.148 n=10+9)
MapLast/5-16 2.47ns ± 0% 2.47ns ± 0% ~ (p=0.385 n=9+10)
MapLast/6-16 2.67ns ± 0% 2.68ns ± 0% ~ (p=0.202 n=9+10)
MapLast/7-16 2.88ns ± 0% 2.88ns ± 0% ~ (p=0.751 n=10+10)
MapLast/8-16 3.08ns ± 0% 3.08ns ± 0% ~ (p=0.826 n=10+9)
MapLast/9-16 4.31ns ± 6% 4.54ns ± 5% ~ (p=0.070 n=9+8)
MapLast/10-16 4.25ns ± 5% 4.42ns ± 6% ~ (p=0.321 n=9+8)
MapLast/11-16 4.59ns ±16% 5.42ns ±44% +17.99% (p=0.019 n=10+10)
MapLast/12-16 5.04ns ±19% 6.11ns ±28% +21.35% (p=0.005 n=9+10)
MapLast/13-16 6.00ns ±35% 5.76ns ± 3% ~ (p=0.173 n=10+8)
MapLast/14-16 4.27ns ± 5% 4.53ns ± 6% +6.14% (p=0.007 n=10+10)
MapLast/15-16 4.41ns ± 1% 4.44ns ± 7% ~ (p=0.515 n=8+10)
MapLast/16-16 4.18ns ± 6% 4.99ns ±18% +19.48% (p=0.000 n=10+10)
MapCycle-16 7.48ns ± 2% 7.46ns ± 1% ~ (p=0.699 n=10+10)
RepeatedLookupStrMapKey32-16 6.98ns ± 3% 6.73ns ± 2% -3.63% (p=0.000 n=10+10)
RepeatedLookupStrMapKey1M-16 14.7µs ± 5% 14.7µs ± 4% ~ (p=0.604 n=9+10)
MakeMap/[Byte]Byte-16 58.5ns ± 1% 58.5ns ± 1% ~ (p=0.780 n=10+9)
MakeMap/[Int]Int-16 113ns ± 0% 113ns ± 1% ~ (p=0.100 n=8+10)
NewEmptyMap-16 2.47ns ± 0% 2.47ns ± 0% ~ (p=0.638 n=10+10)
NewSmallMap-16 11.5ns ± 1% 11.6ns ± 0% +1.18% (p=0.000 n=10+10)
MapIter-16 42.2ns ± 0% 42.8ns ± 1% +1.50% (p=0.000 n=10+10)
MapIterEmpty-16 1.85ns ± 0% 1.85ns ± 0% ~ (p=0.651 n=10+10)
SameLengthMap-16 1.85ns ± 1% 1.85ns ± 0% ~ (p=0.247 n=10+10)
BigKeyMap-16 7.18ns ± 1% 7.42ns ± 4% +3.33% (p=0.004 n=10+10)
BigValMap-16 7.03ns ± 2% 7.19ns ± 1% +2.33% (p=0.000 n=10+9)
SmallKeyMap-16 5.32ns ± 1% 5.24ns ± 1% -1.41% (p=0.000 n=10+10)
MapPopulate/1-16 6.30ns ± 0% 6.41ns ± 1% +1.81% (p=0.000 n=8+10)
MapPopulate/10-16 239ns ± 2% 234ns ± 2% -2.05% (p=0.001 n=9+10)
MapPopulate/100-16 4.19µs ± 2% 4.22µs ± 2% ~ (p=0.171 n=10+10)
MapPopulate/1000-16 52.3µs ± 1% 52.5µs ± 1% ~ (p=0.133 n=9+10)
MapPopulate/10000-16 459µs ± 1% 466µs ± 2% +1.45% (p=0.005 n=10+10)
MapPopulate/100000-16 4.22ms ± 2% 4.25ms ± 2% ~ (p=0.393 n=10+10)
ComplexAlgMap-16 12.5ns ± 1% 12.4ns ± 1% -0.95% (p=0.022 n=10+10)
GoMapClear/Reflexive/1-16 9.61ns ± 1% 9.58ns ± 0% -0.27% (p=0.027 n=10+10)
GoMapClear/Reflexive/10-16 10.0ns ± 1% 10.0ns ± 1% ~ (p=0.648 n=9+9)
GoMapClear/Reflexive/100-16 31.4ns ± 0% 31.4ns ± 1% ~ (p=0.305 n=9+10)
GoMapClear/Reflexive/1000-16 147ns ± 0% 149ns ± 2% +1.21% (p=0.000 n=10+10)
GoMapClear/Reflexive/10000-16  3.99µs ± 0%  4.00µs ± 0%  +0.21%  (p=0.018 n=9+10)
GoMapClear/NonReflexive/1-16  41.4ns ± 2%  41.7ns ± 1%  +0.55%  (p=0.043 n=9+10)
GoMapClear/NonReflexive/10-16  50.3ns ± 1%  50.9ns ± 1%  +1.16%  (p=0.000 n=10+10)
GoMapClear/NonReflexive/100-16  125ns ± 0%  126ns ± 0%  +0.96%  (p=0.000 n=8+10)
GoMapClear/NonReflexive/1000-16  1.08µs ± 0%  1.08µs ± 1%  ~  (p=0.097 n=10+10)
GoMapClear/NonReflexive/10000-16  8.18µs ± 2%  8.10µs ± 0%  -0.91%  (p=0.019 n=10+8)
MapStringConversion/32/simple-16  4.66ns ± 1%  4.69ns ± 3%  ~  (p=0.905 n=9+10)
MapStringConversion/32/struct-16  4.65ns ± 3%  4.94ns ± 2%  +6.23%  (p=0.000 n=10+10)
MapStringConversion/32/array-16  4.69ns ± 3%  4.72ns ± 3%  ~  (p=0.631 n=10+10)
MapStringConversion/64/simple-16  4.14ns ± 0%  4.14ns ± 1%  ~  (p=0.342 n=10+10)
MapStringConversion/64/struct-16  4.13ns ± 0%  4.13ns ± 0%  ~  (p=0.809 n=10+10)
MapStringConversion/64/array-16  4.13ns ± 1%  4.13ns ± 1%  ~  (p=0.752 n=10+10)
MapInterfaceString-16  7.90ns ±23%  8.51ns ±33%  ~  (p=0.604 n=9+10)
MapInterfacePtr-16  7.68ns ±29%  7.10ns ±36%  ~  (p=0.353 n=10+10)
NewEmptyMapHintLessThan8-16  3.70ns ± 0%  3.70ns ± 0%  ~  (p=0.209 n=10+10)
NewEmptyMapHintGreaterThan8-16  270ns ± 1%  272ns ± 1%  +0.71%  (p=0.005 n=10+9)
MapPop100-16  6.45µs ± 0%  6.50µs ± 1%  +0.77%  (p=0.000 n=10+10)
MapPop1000-16  114µs ± 1%  114µs ± 1%  ~  (p=0.190 n=10+10)
MapPop10000-16  2.28ms ± 2%  2.28ms ± 2%  ~  (p=0.912 n=10+10)
MapAssign/Int32/256-16  4.75ns ± 2%  4.82ns ± 4%  ~  (p=0.101 n=10+10)
MapAssign/Int32/65536-16  16.4ns ± 1%  16.7ns ± 0%  +1.44%  (p=0.000 n=10+9)
MapAssign/Int64/256-16  4.79ns ± 5%  4.79ns ± 1%  ~  (p=0.616 n=10+8)
MapAssign/Int64/65536-16  17.1ns ± 1%  16.8ns ± 0%  -1.28%  (p=0.000 n=10+8)
MapAssign/Str/256-16  6.07ns ± 6%  6.24ns ± 2%  +2.84%  (p=0.035 n=10+9)
MapAssign/Str/65536-16  21.4ns ± 0%  21.4ns ± 3%  ~  (p=0.300 n=7+10)
MapOperatorAssign/Int32/256-16  4.82ns ± 3%  4.81ns ± 3%  ~  (p=0.684 n=10+10)
MapOperatorAssign/Int32/65536-16  16.8ns ± 1%  16.5ns ± 1%  -1.68%  (p=0.000 n=9+10)
MapOperatorAssign/Int64/256-16  4.74ns ± 1%  4.77ns ± 3%  ~  (p=0.563 n=10+9)
MapOperatorAssign/Int64/65536-16  16.9ns ± 1%  17.2ns ± 1%  +1.88%  (p=0.000 n=10+10)
MapOperatorAssign/Str/256-16  1.09µs ± 1%  1.10µs ± 2%  ~  (p=0.210 n=10+10)
MapOperatorAssign/Str/65536-16  184ns ± 9%  184ns ± 8%  ~  (p=0.922 n=10+9)
MapAppendAssign/Int32/256-16  13.8ns ±10%  14.4ns ±11%  ~  (p=0.190 n=10+10)
MapAppendAssign/Int32/65536-16  28.9ns ± 5%  30.7ns ± 6%  +6.13%  (p=0.001 n=9+10)
MapAppendAssign/Int64/256-16  14.5ns ±12%  13.8ns ± 8%  -5.02%  (p=0.037 n=10+10)
MapAppendAssign/Int64/65536-16  30.9ns ± 1%  30.4ns ± 2%  -1.56%  (p=0.001 n=10+10)
MapAppendAssign/Str/256-16  30.2ns ± 6%  30.0ns ±10%  ~  (p=0.645 n=10+10)
MapAppendAssign/Str/65536-16  44.5ns ± 4%  46.8ns ± 3%  +5.17%  (p=0.001 n=8+9)
MapDelete/Int32/100-16  18.7ns ± 0%  18.7ns ± 0%  -0.27%  (p=0.017 n=10+10)
MapDelete/Int32/1000-16  17.6ns ± 1%  17.5ns ± 1%  -0.85%  (p=0.000 n=9+10)
MapDelete/Int32/10000-16  18.7ns ± 0%  18.3ns ± 1%  -1.92%  (p=0.000 n=10+10)
MapDelete/Int64/100-16  19.1ns ± 0%  19.2ns ± 0%  +0.68%  (p=0.000 n=10+9)
MapDelete/Int64/1000-16  17.7ns ± 2%  18.3ns ± 1%  +3.00%  (p=0.000 n=10+10)
MapDelete/Int64/10000-16  18.8ns ± 1%  19.2ns ± 0%  +2.01%  (p=0.000 n=10+9)
MapDelete/Str/100-16  26.5ns ± 0%  26.4ns ± 1%  -0.73%  (p=0.000 n=10+10)
MapDelete/Str/1000-16  23.5ns ± 2%  23.4ns ± 1%  ~  (p=0.425 n=10+10)
MapDelete/Str/10000-16  25.1ns ± 0%  25.1ns ± 1%  +0.28%  (p=0.037 n=10+10)
MapDelete/Pointer/100-16  20.6ns ± 1%  20.6ns ± 0%  ~  (p=0.117 n=10+10)
MapDelete/Pointer/1000-16  19.2ns ± 1%  19.4ns ± 1%  +0.97%  (p=0.004 n=10+10)
MapDelete/Pointer/10000-16  20.0ns ± 0%  20.1ns ± 1%  +0.52%  (p=0.022 n=10+10)
Memmove/0-16  0.21ns ± 2%  0.21ns ± 1%  ~  (p=0.671 n=10+10)
Memmove/1-16  0.93ns ± 0%  0.93ns ± 0%  +0.21%  (p=0.034 n=10+10)
Memmove/2-16  0.93ns ± 0%  0.93ns ± 0%  ~  (p=0.101 n=10+10)
Memmove/3-16  0.93ns ± 1%  0.93ns ± 1%  +0.49%  (p=0.004 n=10+10)
Memmove/4-16  1.03ns ± 0%  1.03ns ± 0%  ~  (p=0.260 n=10+10)
Memmove/5-16  1.13ns ± 0%  1.13ns ± 0%  +0.20%  (p=0.034 n=10+10)
Memmove/6-16  1.13ns ± 0%  1.13ns ± 1%  ~  (p=0.126 n=10+10)
Memmove/7-16  1.13ns ± 0%  1.13ns ± 1%  +0.22%  (p=0.028 n=10+10)
Memmove/8-16  1.13ns ± 0%  1.13ns ± 0%  ~  (p=0.545 n=9+10)
Memmove/9-16  1.25ns ± 0%  1.35ns ± 0%  +7.98%  (p=0.000 n=10+10)
Memmove/10-16  1.25ns ± 0%  1.35ns ± 0%  +7.96%  (p=0.000 n=9+9)
Memmove/11-16  1.25ns ± 0%  1.35ns ± 0%  +8.53%  (p=0.000 n=10+9)
Memmove/12-16  1.25ns ± 0%  1.35ns ± 1%  +8.24%  (p=0.000 n=10+10)
Memmove/13-16  1.25ns ± 0%  1.34ns ± 0%  +7.75%  (p=0.000 n=10+10)
Memmove/14-16  1.25ns ± 0%  1.35ns ± 1%  +8.28%  (p=0.000 n=10+9)
Memmove/15-16  1.25ns ± 0%  1.35ns ± 0%  +8.07%  (p=0.000 n=10+9)
Memmove/16-16  1.25ns ± 0%  1.35ns ± 1%  +8.35%  (p=0.000 n=9+10)
Memmove/32-16  1.34ns ± 0%  1.36ns ± 1%  +1.22%  (p=0.000 n=10+10)
Memmove/64-16  1.45ns ± 0%  1.64ns ± 0%  +13.07%  (p=0.000 n=10+9)
Memmove/128-16  1.86ns ± 0%  2.02ns ± 0%  +8.64%  (p=0.000 n=10+10)
Memmove/256-16  2.47ns ± 0%  2.49ns ± 1%  +1.14%  (p=0.000 n=10+10)
Memmove/512-16  3.96ns ± 1%  3.96ns ± 0%  ~  (p=0.182 n=10+10)
Memmove/1024-16  5.90ns ± 1%  5.87ns ± 1%  ~  (p=0.258 n=9+9)
Memmove/2048-16  9.62ns ± 1%  9.62ns ± 2%  ~  (p=0.963 n=8+9)
Memmove/4096-16  16.4ns ± 0%  17.1ns ± 4%  +4.19%  (p=0.003 n=8+9)
MemmoveOverlap/32-16  1.62ns ± 1%  1.68ns ± 1%  +3.53%  (p=0.000 n=10+10)
MemmoveOverlap/64-16  1.64ns ± 0%  1.65ns ± 0%  +0.29%  (p=0.002 n=9+9)
MemmoveOverlap/128-16  2.06ns ± 0%  2.06ns ± 0%  ~  (p=0.070 n=10+10)
MemmoveOverlap/256-16  2.67ns ± 0%  2.67ns ± 0%  +0.26%  (p=0.012 n=10+10)
MemmoveOverlap/512-16  6.20ns ±18%  5.74ns ± 0%  ~  (p=0.645 n=10+8)
MemmoveOverlap/1024-16  7.28ns ± 0%  7.30ns ± 0%  +0.28%  (p=0.006 n=8+10)
MemmoveOverlap/2048-16  11.9ns ± 0%  12.0ns ± 1%  +0.37%  (p=0.014 n=9+9)
MemmoveOverlap/4096-16  23.3ns ± 1%  23.1ns ± 1%  -0.84%  (p=0.000 n=8+10)
MemmoveUnalignedDst/0-16  1.03ns ± 0%  1.03ns ± 0%  +0.19%  (p=0.007 n=10+10)
MemmoveUnalignedDst/1-16  1.24ns ± 0%  1.25ns ± 1%  +0.52%  (p=0.022 n=10+10)
MemmoveUnalignedDst/2-16  1.23ns ± 0%  1.23ns ± 0%  ~  (p=0.051 n=10+10)
MemmoveUnalignedDst/3-16  1.23ns ± 0%  1.23ns ± 0%  +0.14%  (p=0.006 n=9+9)
MemmoveUnalignedDst/4-16  1.23ns ± 0%  1.24ns ± 1%  +0.37%  (p=0.004 n=10+10)
MemmoveUnalignedDst/5-16  1.35ns ± 0%  1.35ns ± 0%  ~  (p=0.075 n=10+10)
MemmoveUnalignedDst/6-16  1.34ns ± 0%  1.34ns ± 0%  ~  (p=0.779 n=10+10)
MemmoveUnalignedDst/7-16  1.34ns ± 0%  1.34ns ± 0%  ~  (p=1.000 n=10+10)
MemmoveUnalignedDst/8-16  1.34ns ± 0%  1.35ns ± 1%  +0.39%  (p=0.024 n=10+10)
MemmoveUnalignedDst/9-16  1.44ns ± 0%  1.44ns ± 0%  ~  (p=0.849 n=10+10)
MemmoveUnalignedDst/10-16  1.44ns ± 0%  1.44ns ± 0%  ~  (p=0.255 n=10+10)
MemmoveUnalignedDst/11-16  1.44ns ± 0%  1.44ns ± 0%  ~  (p=0.304 n=10+10)
MemmoveUnalignedDst/12-16  1.44ns ± 0%  1.44ns ± 0%  ~  (p=0.672 n=10+10)
MemmoveUnalignedDst/13-16  1.44ns ± 0%  1.44ns ± 0%  ~  (p=0.435 n=10+10)
MemmoveUnalignedDst/14-16  1.44ns ± 0%  1.44ns ± 0%  ~  (p=0.340 n=10+10)
MemmoveUnalignedDst/15-16  1.44ns ± 0%  1.44ns ± 0%  ~  (p=0.911 n=10+9)
MemmoveUnalignedDst/16-16  1.44ns ± 0%  1.44ns ± 0%  ~  (p=0.074 n=10+10)
MemmoveUnalignedDst/32-16  1.62ns ± 0%  1.63ns ± 0%  ~  (p=0.059 n=10+10)
MemmoveUnalignedDst/64-16  1.65ns ± 0%  1.65ns ± 0%  ~  (p=0.234 n=10+10)
MemmoveUnalignedDst/128-16  2.06ns ± 0%  2.06ns ± 0%  ~  (p=0.709 n=10+9)
MemmoveUnalignedDst/256-16  3.69ns ± 0%  3.70ns ± 0%  ~  (p=0.144 n=10+10)
MemmoveUnalignedDst/512-16  4.15ns ± 1%  4.14ns ± 0%  ~  (p=0.778 n=10+8)
MemmoveUnalignedDst/1024-16  7.52ns ± 0%  7.53ns ± 1%  ~  (p=0.650 n=9+9)
MemmoveUnalignedDst/2048-16  12.9ns ± 0%  12.9ns ± 1%  ~  (p=0.548 n=8+8)
MemmoveUnalignedDst/4096-16  25.4ns ± 0%  25.4ns ± 0%  ~  (p=0.947 n=9+9)
MemmoveUnalignedDstOverlap/32-16  4.08ns ± 0%  4.09ns ± 0%  ~  (p=0.360 n=10+10)
MemmoveUnalignedDstOverlap/64-16  4.56ns ± 0%  4.56ns ± 0%  ~  (p=0.705 n=10+9)
MemmoveUnalignedDstOverlap/128-16  4.67ns ± 0%  4.67ns ± 0%  ~  (p=0.397 n=10+10)
MemmoveUnalignedDstOverlap/256-16  5.08ns ± 0%  5.08ns ± 0%  ~  (p=0.159 n=10+9)
MemmoveUnalignedDstOverlap/512-16  8.45ns ± 5%  8.19ns ± 0%  -3.10%  (p=0.021 n=10+9)
MemmoveUnalignedDstOverlap/1024-16  9.55ns ± 0%  9.56ns ± 0%  ~  (p=0.221 n=8+8)
MemmoveUnalignedDstOverlap/2048-16  14.0ns ± 0%  14.0ns ± 1%  ~  (p=0.200 n=10+9)
MemmoveUnalignedDstOverlap/4096-16  26.5ns ± 0%  26.5ns ± 0%  ~  (p=0.458 n=10+9)
MemmoveUnalignedSrc/0-16  1.02ns ± 1%  0.99ns ± 1%  -2.67%  (p=0.000 n=10+9)
MemmoveUnalignedSrc/1-16  1.13ns ± 0%  1.13ns ± 1%  -0.25%  (p=0.027 n=10+9)
MemmoveUnalignedSrc/2-16  1.13ns ± 1%  1.13ns ± 0%  -0.28%  (p=0.012 n=10+9)
MemmoveUnalignedSrc/3-16  1.24ns ± 1%  1.23ns ± 0%  -0.25%  (p=0.022 n=9+10)
MemmoveUnalignedSrc/4-16  1.24ns ± 0%  1.23ns ± 1%  ~  (p=0.118 n=9+10)
MemmoveUnalignedSrc/5-16  1.34ns ± 0%  1.34ns ± 1%  ~  (p=0.564 n=8+10)
MemmoveUnalignedSrc/6-16  1.34ns ± 0%  1.34ns ± 0%  -0.39%  (p=0.000 n=10+10)
MemmoveUnalignedSrc/7-16  1.34ns ± 0%  1.34ns ± 0%  ~  (p=0.235 n=10+10)
MemmoveUnalignedSrc/8-16  1.34ns ± 0%  1.34ns ± 0%  -0.37%  (p=0.002 n=10+9)
MemmoveUnalignedSrc/9-16  1.44ns ± 0%  1.44ns ± 0%  ~  (p=0.579 n=10+9)
MemmoveUnalignedSrc/10-16  1.44ns ± 0%  1.44ns ± 0%  ~  (p=0.534 n=10+9)
MemmoveUnalignedSrc/11-16  1.44ns ± 0%  1.44ns ± 1%  ~  (p=0.415 n=10+10)
MemmoveUnalignedSrc/12-16  1.44ns ± 0%  1.44ns ± 0%  ~  (p=0.218 n=10+10)
MemmoveUnalignedSrc/13-16  1.44ns ± 0%  1.44ns ± 1%  ~  (p=0.693 n=10+10)
MemmoveUnalignedSrc/14-16  1.44ns ± 0%  1.44ns ± 0%  ~  (p=0.901 n=10+10)
MemmoveUnalignedSrc/15-16  1.44ns ± 0%  1.44ns ± 0%  ~  (p=0.379 n=10+10)
MemmoveUnalignedSrc/16-16  1.44ns ± 1%  1.44ns ± 0%  ~  (p=0.538 n=10+10)
MemmoveUnalignedSrc/32-16  1.60ns ± 1%  1.60ns ± 0%  ~  (p=0.491 n=10+10)
MemmoveUnalignedSrc/64-16  1.65ns ± 0%  1.65ns ± 0%  ~  (p=0.564 n=10+10)
MemmoveUnalignedSrc/128-16  2.09ns ± 0%  2.09ns ± 0%  ~  (p=0.497 n=10+9)
MemmoveUnalignedSrc/256-16  2.70ns ± 0%  2.78ns ± 1%  +2.81%  (p=0.000 n=10+10)
MemmoveUnalignedSrc/512-16  4.31ns ± 0%  4.30ns ± 0%  -0.26%  (p=0.031 n=8+9)
MemmoveUnalignedSrc/1024-16  7.28ns ± 0%  7.21ns ± 1%  -1.05%  (p=0.000 n=8+10)
MemmoveUnalignedSrc/2048-16  13.0ns ± 0%  13.0ns ± 0%  ~  (p=0.180 n=9+8)
MemmoveUnalignedSrc/4096-16  25.4ns ± 0%  25.3ns ± 1%  ~  (p=0.054 n=10+10)
MemmoveUnalignedSrcOverlap/32-16  4.04ns ± 0%  4.06ns ± 0%  +0.62%  (p=0.000 n=9+10)
MemmoveUnalignedSrcOverlap/64-16  4.12ns ± 0%  4.12ns ± 0%  ~  (p=0.421 n=10+10)
MemmoveUnalignedSrcOverlap/128-16  4.53ns ± 0%  4.52ns ± 0%  ~  (p=0.251 n=10+10)
MemmoveUnalignedSrcOverlap/256-16  6.17ns ± 0%  6.15ns ± 0%  -0.35%  (p=0.000 n=10+9)
MemmoveUnalignedSrcOverlap/512-16  7.43ns ± 0%  7.44ns ± 0%  ~  (p=0.524 n=9+8)
MemmoveUnalignedSrcOverlap/1024-16  8.94ns ± 0%  8.94ns ± 0%  ~  (p=0.419 n=8+8)
MemmoveUnalignedSrcOverlap/2048-16  13.2ns ± 0%  14.5ns ±21%  ~  (p=0.107 n=8+10)
MemmoveUnalignedSrcOverlap/4096-16  25.6ns ± 0%  25.6ns ± 1%  ~  (p=0.650 n=9+9)
Memclr/5-16  0.86ns ± 1%  0.86ns ± 2%  ~  (p=0.531 n=9+9)
Memclr/16-16  1.04ns ± 0%  1.04ns ± 0%  +0.32%  (p=0.013 n=9+10)
Memclr/64-16  1.23ns ± 0%  1.26ns ± 0%  +2.28%  (p=0.000 n=10+10)
Memclr/256-16  2.27ns ± 0%  2.27ns ± 0%  ~  (p=0.127 n=10+10)
Memclr/4096-16  17.1ns ± 1%  17.3ns ± 0%  +0.88%  (p=0.000 n=10+10)
Memclr/65536-16  821ns ± 0%  822ns ± 0%  ~  (p=0.516 n=10+10)
Memclr/1M-16  14.1µs ± 1%  14.0µs ± 1%  ~  (p=0.516 n=10+10)
Memclr/4M-16  86.1µs ± 1%  85.9µs ± 0%  ~  (p=0.123 n=10+10)
Memclr/8M-16  174µs ± 2%  173µs ± 0%  ~  (p=0.408 n=10+8)
Memclr/16M-16  385µs ± 4%  387µs ± 0%  ~  (p=0.173 n=10+8)
Memclr/64M-16  2.18ms ± 0%  2.19ms ± 0%  ~  (p=0.113 n=10+9)
GoMemclr/5-16  0.82ns ± 0%  0.82ns ± 0%  ~  (p=0.346 n=9+10)
GoMemclr/16-16  1.02ns ± 0%  1.02ns ± 0%  +0.22%  (p=0.003 n=10+8)
GoMemclr/64-16  1.14ns ± 0%  1.14ns ± 0%  ~  (p=0.948 n=10+9)
GoMemclr/256-16  2.06ns ± 0%  2.06ns ± 0%  ~  (p=0.868 n=10+10)
MemclrRange/1K_2K-16  457ns ± 0%  428ns ± 1%  -6.38%  (p=0.000 n=10+10)
MemclrRange/2K_8K-16  1.46µs ± 0%  1.46µs ± 0%  ~  (p=0.700 n=10+10)
MemclrRange/4K_16K-16  1.16µs ± 0%  1.16µs ± 0%  ~  (p=0.567 n=9+10)
MemclrRange/160K_228K-16  20.7µs ± 0%  20.7µs ± 0%  ~  (p=0.160 n=10+10)
ClearFat7-16  0.38ns ± 5%  0.21ns ± 1%  -45.79%  (p=0.000 n=9+10)
ClearFat8-16  0.21ns ± 3%  0.12ns ± 2%  -44.16%  (p=0.000 n=8+9)
ClearFat11-16  0.35ns ± 3%  0.21ns ± 1%  -40.46%  (p=0.000 n=9+9)
ClearFat12-16  0.23ns ± 9%  0.21ns ± 1%  -10.23%  (p=0.000 n=10+9)
ClearFat13-16  0.22ns ± 6%  0.21ns ± 2%  -6.53%  (p=0.000 n=10+10)
ClearFat14-16  0.22ns ± 4%  0.21ns ± 1%  -5.97%  (p=0.000 n=10+10)
ClearFat15-16  0.22ns ± 4%  0.21ns ± 1%  -6.96%  (p=0.000 n=10+9)
ClearFat16-16  0.19ns ± 9%  0.12ns ± 6%  -34.89%  (p=0.000 n=9+10)
ClearFat24-16  0.23ns ± 6%  0.21ns ± 1%  -10.26%  (p=0.000 n=10+9)
ClearFat32-16  0.22ns ± 5%  0.21ns ± 2%  -5.31%  (p=0.000 n=10+10)
ClearFat40-16  0.34ns ± 4%  0.62ns ± 1%  +83.00%  (p=0.000 n=10+10)
ClearFat48-16  0.33ns ± 2%  0.41ns ± 0%  +26.71%  (p=0.000 n=10+10)
ClearFat56-16  0.41ns ± 1%  0.41ns ± 0%  ~  (p=0.838 n=10+10)
ClearFat64-16  0.41ns ± 0%  0.41ns ± 0%  ~  (p=0.178 n=10+8)
ClearFat72-16  0.82ns ± 0%  0.82ns ± 0%  ~  (p=0.669 n=10+10)
ClearFat128-16  1.04ns ± 0%  1.04ns ± 0%  ~  (p=0.679 n=10+10)
ClearFat256-16  1.86ns ± 0%  1.86ns ± 0%  ~  (p=0.066 n=9+10)
ClearFat512-16  3.50ns ± 0%  3.50ns ± 0%  ~  (p=0.626 n=10+10)
ClearFat1024-16  6.79ns ± 0%  6.79ns ± 0%  ~  (p=0.986 n=10+10)
ClearFat1032-16  13.6ns ± 0%  13.6ns ± 0%  +0.13%  (p=0.044 n=10+10)
ClearFat1040-16  10.3ns ± 0%  10.3ns ± 0%  ~  (p=0.175 n=10+9)
CopyFat7-16  0.37ns ±13%  0.25ns ± 1%  -31.74%  (p=0.000 n=10+9)
CopyFat8-16  0.17ns ± 1%  0.17ns ± 2%  +1.35%  (p=0.004 n=9+9)
CopyFat11-16  0.26ns ± 1%  0.30ns ± 3%  +12.58%  (p=0.000 n=9+10)
CopyFat12-16  0.28ns ± 2%  0.26ns ± 1%  -5.66%  (p=0.000 n=9+9)
CopyFat13-16  0.26ns ± 0%  0.28ns ± 4%  +7.35%  (p=0.000 n=8+10)
CopyFat14-16  0.29ns ± 6%  0.26ns ± 2%  -10.46%  (p=0.000 n=10+9)
CopyFat15-16  0.26ns ± 1%  0.30ns ± 6%  +14.12%  (p=0.000 n=8+10)
CopyFat16-16  0.21ns ± 1%  0.21ns ± 0%  ~  (p=0.426 n=8+8)
CopyFat24-16  0.29ns ± 3%  0.25ns ± 1%  -12.27%  (p=0.000 n=9+10)
CopyFat32-16  0.26ns ± 4%  0.29ns ± 4%  +11.71%  (p=0.000 n=10+10)
CopyFat64-16  0.46ns ± 8%  0.42ns ± 1%  -8.37%  (p=0.002 n=10+10)
CopyFat72-16  0.82ns ± 0%  0.82ns ± 0%  ~  (p=0.563 n=10+10)
CopyFat128-16  1.53ns ± 0%  1.54ns ± 0%  +0.62%  (p=0.000 n=10+10)
CopyFat256-16  2.68ns ± 0%  2.65ns ± 1%  -1.23%  (p=0.000 n=10+10)
CopyFat512-16  4.93ns ± 1%  5.19ns ± 3%  +5.16%  (p=0.000 n=9+9)
CopyFat520-16  6.99ns ± 0%  6.99ns ± 0%  ~  (p=0.539 n=10+10)
CopyFat1024-16  11.5ns ± 1%  9.8ns ± 1%  -14.98%  (p=0.000 n=9+10)
CopyFat1032-16  13.6ns ± 0%  13.6ns ± 0%  ~  (p=0.728 n=10+10)
CopyFat1040-16  11.0ns ± 0%  11.1ns ± 0%  +0.53%  (p=0.000 n=10+10)
Issue18740/2byte-16  10.1µs ± 0%  10.1µs ± 0%  ~  (p=0.342 n=10+10)
Issue18740/4byte-16  2.34µs ± 0%  2.35µs ± 0%  +0.30%  (p=0.002 n=10+8)
Issue18740/8byte-16  1.28µs ± 0%  1.28µs ± 0%  +0.32%  (p=0.000 n=9+10)
Finalizer-16  345µs ± 1%  336µs ± 0%  -2.55%  (p=0.000 n=10+9)
FinalizerRun-16  450ns ± 3%  420ns ± 1%  -6.65%  (p=0.000 n=10+10)
PallocBitsSummarize/Unpacked00-16  2.88ns ± 0%  2.88ns ± 0%  ~  (p=0.358 n=10+10)
PallocBitsSummarize/UnpackedFFFFFFFFFFFFFFFF-16  15.2ns ± 0%  15.2ns ± 0%  ~  (p=0.925 n=10+10)
PallocBitsSummarize/UnpackedAA-16  16.4ns ± 0%  16.3ns ± 0%  ~  (p=0.113 n=9+9)
PallocBitsSummarize/UnpackedAAAAAAAAAAAAAAAA-16  16.5ns ± 0%  16.6ns ± 0%  ~  (p=0.238 n=10+10)
PallocBitsSummarize/Unpacked80000000AAAAAAAA-16  37.8ns ± 1%  36.4ns ± 0%  -3.70%  (p=0.000 n=10+9)
PallocBitsSummarize/UnpackedAAAAAAAA00000001-16  41.8ns ± 1%  39.9ns ± 0%  -4.68%  (p=0.000 n=9+10)
PallocBitsSummarize/UnpackedBBBBBBBBBBBBBBBB-16  18.3ns ± 0%  18.3ns ± 0%  ~  (p=0.781 n=10+10)
PallocBitsSummarize/Unpacked80000000BBBBBBBB-16  38.8ns ± 1%  38.1ns ± 0%  -1.78%  (p=0.000 n=9+10)
PallocBitsSummarize/UnpackedBBBBBBBB00000001-16  37.5ns ± 0%  36.1ns ± 1%  -3.88%  (p=0.000 n=8+10)
PallocBitsSummarize/UnpackedCCCCCCCCCCCCCCCC-16  21.8ns ± 0%  21.9ns ± 0%  +0.20%  (p=0.018 n=10+9)
PallocBitsSummarize/Unpacked4444444444444444-16  21.8ns ± 0%  21.9ns ± 0%  +0.20%  (p=0.029 n=10+9)
PallocBitsSummarize/Unpacked4040404040404040-16  26.5ns ± 0%  26.5ns ± 0%  -0.24%  (p=0.001 n=9+10)
PallocBitsSummarize/Unpacked4000400040004000-16  33.4ns ± 1%  31.3ns ± 0%  -6.20%  (p=0.000 n=9+10)
PallocBitsSummarize/Unpacked1000404044CCAAFF-16  36.4ns ± 1%  35.9ns ± 0%  -1.50%  (p=0.000 n=10+10)
FindBitRange64/Pattern00Size2-16  0.34ns ± 1%  0.35ns ± 1%  +3.80%  (p=0.000 n=10+9)
FindBitRange64/Pattern00Size8-16  0.70ns ± 1%  0.70ns ± 0%  -0.68%  (p=0.000 n=10+10)
FindBitRange64/Pattern00Size32-16  0.70ns ± 1%  0.69ns ± 0%  -0.86%  (p=0.001 n=10+8)
FindBitRange64/PatternFFFFFFFFFFFFFFFFSize2-16  0.34ns ± 1%  0.35ns ± 1%  +4.45%  (p=0.000 n=9+8)
FindBitRange64/PatternFFFFFFFFFFFFFFFFSize8-16  1.54ns ± 0%  1.54ns ± 1%  ~  (p=0.914 n=9+9)
FindBitRange64/PatternFFFFFFFFFFFFFFFFSize32-16  2.78ns ± 0%  2.78ns ± 0%  ~  (p=0.295 n=9+10)
FindBitRange64/PatternAASize2-16  0.34ns ± 2%  0.35ns ± 2%  +4.61%  (p=0.000 n=10+10)
FindBitRange64/PatternAASize8-16  0.70ns ± 1%  0.70ns ± 1%  -0.82%  (p=0.005 n=10+10)
FindBitRange64/PatternAASize32-16  0.70ns ± 1%  0.70ns ± 0%  -0.73%  (p=0.003 n=10+9)
FindBitRange64/PatternAAAAAAAAAAAAAAAASize2-16  0.34ns ± 2%  0.35ns ± 2%  +3.94%  (p=0.000 n=10+10)
FindBitRange64/PatternAAAAAAAAAAAAAAAASize8-16  0.70ns ± 1%  0.70ns ± 1%  -0.67%  (p=0.025 n=10+10)
FindBitRange64/PatternAAAAAAAAAAAAAAAASize32-16  0.70ns ± 1%  0.70ns ± 1%  ~  (p=0.118 n=9+10)
FindBitRange64/Pattern80000000AAAAAAAASize2-16  0.34ns ± 1%  0.35ns ± 2%  +3.72%  (p=0.000 n=10+9)
FindBitRange64/Pattern80000000AAAAAAAASize8-16  0.70ns ± 1%  0.70ns ± 0%  ~  (p=0.102 n=10+10)
FindBitRange64/Pattern80000000AAAAAAAASize32-16  0.70ns ± 1%  0.70ns ± 1%  -0.55%  (p=0.011 n=10+10)
FindBitRange64/PatternAAAAAAAA00000001Size2-16  0.34ns ± 2%  0.35ns ± 1%  +3.83%  (p=0.000 n=10+9)
FindBitRange64/PatternAAAAAAAA00000001Size8-16  0.70ns ± 1%  0.70ns ± 1%  ~  (p=0.065 n=10+10)
FindBitRange64/PatternAAAAAAAA00000001Size32-16  0.70ns ± 1%  0.70ns ± 1%  -0.95%  (p=0.002 n=10+10)
FindBitRange64/PatternBBBBBBBBBBBBBBBBSize2-16  0.34ns ± 0%  0.35ns ± 1%  +4.12%  (p=0.000 n=8+10)
FindBitRange64/PatternBBBBBBBBBBBBBBBBSize8-16  1.24ns ± 0%  1.23ns ± 0%  -0.30%  (p=0.002 n=10+9)
FindBitRange64/PatternBBBBBBBBBBBBBBBBSize32-16  1.24ns ± 0%  1.24ns ± 0%  -0.17%  (p=0.023 n=9+10)
FindBitRange64/Pattern80000000BBBBBBBBSize2-16  0.34ns ± 1%  0.35ns ± 2%  +4.82%  (p=0.000 n=9+10)
FindBitRange64/Pattern80000000BBBBBBBBSize8-16  1.24ns ± 1%  1.24ns ± 0%  ~  (p=0.063 n=10+10)
FindBitRange64/Pattern80000000BBBBBBBBSize32-16  1.24ns ± 0%  1.24ns ± 0%  ~  (p=0.164 n=9+10)
FindBitRange64/PatternBBBBBBBB00000001Size2-16  0.34ns ± 1%  0.35ns ± 1%  +4.38%  (p=0.000 n=8+10)
FindBitRange64/PatternBBBBBBBB00000001Size8-16  1.24ns ± 1%  1.24ns ± 0%  ~  (p=0.052 n=10+10)
FindBitRange64/PatternBBBBBBBB00000001Size32-16  1.24ns ± 0%  1.23ns ± 0%  -0.40%  (p=0.000 n=10+10)
FindBitRange64/PatternCCCCCCCCCCCCCCCCSize2-16  0.34ns ± 0%  0.35ns ± 2%  +3.96%  (p=0.000 n=9+10)
FindBitRange64/PatternCCCCCCCCCCCCCCCCSize8-16  1.24ns ± 0%  1.23ns ± 0%  -0.30%  (p=0.000 n=10+9)
FindBitRange64/PatternCCCCCCCCCCCCCCCCSize32-16  1.24ns ± 0%  1.24ns ± 1%  ~  (p=0.284 n=10+10)
FindBitRange64/Pattern4444444444444444Size2-16  0.34ns ± 1%  0.35ns ± 1%  +3.91%  (p=0.000 n=9+9)
FindBitRange64/Pattern4444444444444444Size8-16  0.70ns ± 1%  0.70ns ± 1%  ~  (p=0.617 n=10+10)
FindBitRange64/Pattern4444444444444444Size32-16  0.70ns ± 1%  0.70ns ± 1%  -0.60%  (p=0.006 n=10+10)
FindBitRange64/Pattern4040404040404040Size2-16  0.34ns ± 2%  0.35ns ± 2%  +3.67%  (p=0.000 n=10+10)
FindBitRange64/Pattern4040404040404040Size8-16  0.70ns ± 2%  0.70ns ± 1%  -0.87%  (p=0.014 n=10+10)
FindBitRange64/Pattern4040404040404040Size32-16  0.70ns ± 1%  0.70ns ± 1%  ~  (p=0.256 n=10+10)
FindBitRange64/Pattern4000400040004000Size2-16  0.34ns ± 2%  0.35ns ± 3%  +4.71%  (p=0.000 n=10+10)
FindBitRange64/Pattern4000400040004000Size8-16  0.70ns ± 1%  0.70ns ± 1%  ~  (p=0.393 n=10+10)
FindBitRange64/Pattern4000400040004000Size32-16  0.70ns ± 1%  0.70ns ± 1%  -0.86%  (p=0.014 n=10+10)
NetpollBreak-16  1.49µs ± 1%  1.50µs ± 3%  ~  (p=0.181 n=8+10)
Syscall-16  3.68ns ± 1%  3.66ns ± 2%  ~  (p=0.148 n=10+10)
SyscallWork-16  5.15ns ± 1%  5.13ns ± 0%  ~  (p=0.188 n=10+9)
SyscallExcess-16  3.89ns ± 2%  3.83ns ± 1%  -1.52%  (p=0.001 n=10+10)
SyscallExcessWork-16  5.34ns ± 1%  5.31ns ± 0%  -0.64%  (p=0.000 n=10+9)
PingPongHog-16  397ns ± 7%  394ns ±11%  ~  (p=0.912 n=10+10)
StackGrowth-16  67.9ns ± 0%  68.8ns ± 0%  +1.28%  (p=0.000 n=10+8)
StackGrowthDeep-16  7.70µs ± 1%  8.48µs ± 2%  +10.06%  (p=0.000 n=9+10)
CreateGoroutines-16  124ns ± 1%  124ns ± 1%  ~  (p=0.254 n=10+10)
CreateGoroutinesParallel-16  25.7ns ± 1%  27.6ns ± 2%  +7.51%  (p=0.000 n=10+10)
CreateGoroutinesCapture-16  823ns ± 1%  821ns ± 2%  ~  (p=0.699 n=10+10)
CreateGoroutinesSingle-16  175ns ± 3%  172ns ± 3%  -1.90%  (p=0.011 n=10+10)
ClosureCall-16  0.11ns ± 7%  0.12ns ± 3%  ~  (p=0.842 n=9+10)
WakeupParallelSpinning/0s-16  11.4µs ± 0%  11.4µs ± 0%  ~  (p=0.325 n=9+10)
WakeupParallelSpinning/1µs-16  15.4µs ± 0%  15.4µs ± 1%  ~  (p=0.955 n=10+10)
WakeupParallelSpinning/2µs-16  18.7µs ± 2%  18.9µs ± 2%  ~  (p=0.052 n=10+10)
WakeupParallelSpinning/5µs-16  30.7µs ± 0%  30.7µs ± 0%  -0.03%  (p=0.003 n=10+10)
WakeupParallelSpinning/10µs-16  48.8µs ± 0%  48.8µs ± 0%  ~  (p=0.670 n=10+10)
WakeupParallelSpinning/20µs-16  90.8µs ± 0%  90.8µs ± 0%  -0.02%  (p=0.004 n=10+10)
WakeupParallelSpinning/50µs-16  211µs ± 0%  211µs ± 0%  ~  (p=0.194 n=10+10)
WakeupParallelSpinning/100µs-16  323µs ± 0%  323µs ± 0%  ~  (p=1.000 n=10+9)
WakeupParallelSyscall/0s-16  118µs ± 0%  118µs ± 0%  ~  (p=0.447 n=10+9)
WakeupParallelSyscall/1µs-16  119µs ± 2%  119µs ± 1%  ~  (p=0.604 n=10+9)
WakeupParallelSyscall/2µs-16  120µs ± 1%  121µs ± 3%  ~  (p=0.263 n=8+10)
WakeupParallelSyscall/5µs-16  126µs ± 2%  126µs ± 2%  ~  (p=0.510 n=10+9)
WakeupParallelSyscall/10µs-16  136µs ± 1%  137µs ± 1%  ~  (p=0.095 n=9+10)
WakeupParallelSyscall/20µs-16  156µs ± 2%  157µs ± 3%  ~  (p=0.604 n=10+9)
WakeupParallelSyscall/50µs-16  221µs ± 1%  220µs ± 1%  ~  (p=0.063 n=10+10)
WakeupParallelSyscall/100µs-16  326µs ± 0%  325µs ± 0%  -0.26%  (p=0.003 n=9+10)
Matmult-16  0.67ns ± 2%  0.66ns ± 2%  ~  (p=0.256 n=10+10)
Fastrand-16  0.08ns ±11%  0.08ns ±13%  ~  (p=0.661 n=9+10)
Fastrand64-16  0.08ns ±11%  0.08ns ± 6%  ~  (p=0.631 n=10+10)
FastrandHashiter-16  1.76ns ± 1%  1.76ns ± 1%  ~  (p=0.854 n=8+8)
Fastrandn/2-16  0.86ns ± 1%  0.86ns ± 1%  +1.09%  (p=0.000 n=10+9)
Fastrandn/3-16  0.85ns ± 1%  0.86ns ± 1%  +1.23%  (p=0.001 n=10+10)
Fastrandn/4-16  0.85ns ± 1%  0.87ns ± 2%  +1.60%  (p=0.000 n=10+10)
Fastrandn/5-16  0.85ns ± 1%  0.86ns ± 1%  +1.05%  (p=0.000 n=10+10)
IfaceCmp100-16  46.6ns ± 0%  46.1ns ± 0%  -1.18%  (p=0.000 n=10+10)
IfaceCmpNil100-16  26.8ns ± 0%  26.8ns ± 0%  ~  (p=0.777 n=10+8)
EfaceCmpDiff-16  132ns ± 0%  130ns ± 0%  -0.95%  (p=0.000 n=10+9)
EfaceCmpDiffIndirect-16  209ns ± 0%  211ns ± 0%  +1.14%  (p=0.000 n=10+9)
Defer-16  3.40ns ± 1%  3.04ns ± 0%  -10.67%  (p=0.000 n=10+10)
Defer10-16  29.4ns ± 2%  27.2ns ± 3%  -7.26%  (p=0.000 n=10+10)
DeferMany-16  110ns ± 6%  113ns ± 2%  +3.45%  (p=0.017 n=9+9)
PanicRecover-16  67.6ns ± 0%  67.7ns ± 2%  ~  (p=0.436 n=9+9)
GoroutineProfile/small-nil/idle-16  3.90µs ± 4%  3.86µs ± 2%  ~  (p=0.305 n=10+9)
GoroutineProfile/small-nil/loaded-16  4.82µs ± 6%  4.82µs ± 4%  ~  (p=0.905 n=10+9)
GoroutineProfile/small/idle-16  103µs ± 3%  102µs ± 3%  ~  (p=0.113 n=9+9)
GoroutineProfile/small/loaded-16  432µs ± 5%  440µs ±13%  ~  (p=0.604 n=9+10)
GoroutineProfile/large-nil/idle-16  3.86µs ± 3%  3.82µs ± 3%  ~  (p=0.210 n=10+10)
GoroutineProfile/large-nil/loaded-16  4.90µs ± 2%  4.90µs ± 5%  ~  (p=0.780 n=10+9)
GoroutineProfile/large/idle-16  2.58ms ± 1%  2.52ms ± 1%  -2.38%  (p=0.000 n=10+10)
GoroutineProfile/large/loaded-16  8.62ms ± 9%  8.90ms ±11%  ~  (p=0.400 n=9+10)
GoroutineProfile/sparse-nil/idle-16  3.85µs ± 4%  3.81µs ± 3%  ~  (p=0.470 n=10+10)
GoroutineProfile/sparse-nil/loaded-16  4.82µs ± 4%  4.69µs ± 5%  ~  (p=0.052 n=10+10)
GoroutineProfile/sparse/idle-16  102µs ± 4%  102µs ± 2%  ~  (p=0.497 n=10+9)
GoroutineProfile/sparse/loaded-16  438µs ± 7%  437µs ± 6%  ~  (p=0.796 n=10+10)
RWMutexUncontended-16  6.79ns ± 0%  6.78ns ± 0%  ~  (p=0.228 n=10+8)
RWMutexWrite100-16  85.4ns ± 0%  87.1ns ± 0%  +2.00%  (p=0.000 n=10+8)
RWMutexWrite10-16  168ns ±25%  152ns ±11%  ~  (p=0.063 n=10+10)
RWMutexWorkWrite100-16  106ns ± 0%  106ns ± 3%  ~  (p=0.136 n=10+10)
RWMutexWorkWrite10-16  567ns ± 3%  571ns ± 1%  ~  (p=0.326 n=10+9)
SemTable/OneAddrCollision/n=1000-16  15.9µs ± 1%  16.0µs ± 1%  +0.50%  (p=0.031 n=9+9)
SemTable/ManyAddrCollision/n=1000-16  56.2µs ± 1%  56.8µs ± 1%  +1.06%  (p=0.000 n=10+10)
SemTable/OneAddrCollision/n=2000-16  32.6µs ± 2%  32.9µs ± 4%  ~  (p=0.156 n=10+9)
SemTable/ManyAddrCollision/n=2000-16  118µs ± 0%  119µs ± 0%  +0.75%  (p=0.000 n=9+10)
SemTable/OneAddrCollision/n=4000-16  65.3µs ± 1%  65.6µs ± 3%  ~  (p=0.497 n=9+10)
SemTable/ManyAddrCollision/n=4000-16  245µs ± 0%  248µs ± 2%  +1.36%  (p=0.000 n=9+10)
SemTable/OneAddrCollision/n=8000-16  131µs ± 1%  130µs ± 1%  -1.01%  (p=0.002 n=9+10)
SemTable/ManyAddrCollision/n=8000-16  503µs ± 1%  508µs ± 0%  +0.97%  (p=0.000 n=10+10)
MakeSliceCopy/mallocmove/Byte-16  67.6ns ± 1%  64.1ns ± 2%  -5.20%  (p=0.000 n=10+10)
MakeSliceCopy/mallocmove/Int-16  65.0ns ± 7%  61.7ns ± 4%  -5.08%  (p=0.009 n=10+10)
MakeSliceCopy/mallocmove/Ptr-16  88.1ns ± 1%  79.9ns ± 1%  -9.29%  (p=0.000 n=10+10)
MakeSliceCopy/makecopy/Byte-16  65.2ns ± 6%  63.4ns ± 0%  ~  (p=0.500 n=10+8)
MakeSliceCopy/makecopy/Int-16  63.2ns ± 1%  64.1ns ± 1%  +1.34%  (p=0.001 n=9+9)
MakeSliceCopy/makecopy/Ptr-16  88.1ns ± 1%  80.1ns ± 1%  -9.09%  (p=0.000 n=10+10)
MakeSliceCopy/nilappend/Byte-16  69.8ns ± 1%  65.7ns ± 3%  -5.80%  (p=0.000 n=10+10)
MakeSliceCopy/nilappend/Int-16  69.6ns ± 2%  67.2ns ± 1%  -3.50%  (p=0.000 n=10+9)
MakeSliceCopy/nilappend/Ptr-16  91.5ns ± 1%  83.8ns ± 1%  -8.42%  (p=0.000 n=9+10)
MakeSlice/Byte-16  6.64ns ± 3%  6.58ns ± 2%  ~  (p=0.393 n=10+10)
MakeSlice/Int16-16  8.60ns ± 1%  8.38ns ± 3%  -2.48%  (p=0.001 n=9+10)
MakeSlice/Int-16  17.7ns ± 3%  16.9ns ± 1%  -4.67%  (p=0.000 n=10+9)
MakeSlice/Ptr-16  24.0ns ± 3%  23.3ns ± 2%  -3.25%  (p=0.000 n=10+9)
MakeSlice/Struct/24-16  34.1ns ± 1%  32.0ns ± 1%  -6.11%  (p=0.000 n=10+10)
MakeSlice/Struct/32-16  39.1ns ± 4%  38.2ns ± 1%  ~  (p=0.829 n=10+8)
MakeSlice/Struct/40-16  47.0ns ± 5%  43.0ns ± 2%  -8.55%  (p=0.000 n=10+9)
GrowSlice/Byte-16  15.3ns ± 3%  15.0ns ± 2%  -1.75%  (p=0.005 n=9+9)
GrowSlice/Int16-16  18.9ns ± 2%  18.4ns ± 2%  -2.71%  (p=0.000 n=10+9)
GrowSlice/Int-16  33.9ns ± 1%  32.2ns ± 1%  -4.89%  (p=0.000 n=10+9)
GrowSlice/Ptr-16  45.3ns ± 2%  43.5ns ± 1%  -4.12%  (p=0.000 n=10+10)
GrowSlice/Struct/24-16  61.9ns ± 2%  60.0ns ± 4%  -3.10%  (p=0.002 n=10+10)
GrowSlice/Struct/32-16  79.9ns ± 2%  72.3ns ± 3%  -9.58%  (p=0.000 n=8+10)
GrowSlice/Struct/40-16  97.1ns ± 7%  88.8ns ± 5%  -8.49%  (p=0.000 n=10+10)
ExtendSlice/IntSlice-16  21.1ns ± 2%  20.3ns ± 2%  -3.71%  (p=0.000 n=10+10)
ExtendSlice/PointerSlice-16  26.8ns ± 2%  26.3ns ± 2%  -1.86%  (p=0.004 n=10+10)
ExtendSlice/NoGrow-16  1.23ns ± 0%  1.30ns ± 1%  +5.03%  (p=0.000 n=10+10)
Append-16  4.58ns ± 1%  4.53ns ± 0%  -1.11%  (p=0.000 n=10+10)
AppendGrowByte-16  1.46ms ± 8%  1.42ms ± 7%  -3.24%  (p=0.035 n=10+10)
AppendGrowString-16  27.8ms ± 4%  27.2ms ± 5%  ~  (p=0.052 n=10+10)
AppendSlice/1Bytes-16  1.03ns ± 1%  1.04ns ± 1%  ~  (p=0.303 n=10+10)
AppendSlice/4Bytes-16  1.04ns ± 0%  1.05ns ± 0%  +0.79%  (p=0.000 n=9+10)
AppendSlice/7Bytes-16  1.23ns ± 1%  1.24ns ± 0%  +0.45%  (p=0.001 n=10+10)
AppendSlice/8Bytes-16  1.24ns ± 0%  1.24ns ± 0%  ~  (p=0.183 n=10+10)
AppendSlice/15Bytes-16  1.37ns ± 1%  1.43ns ± 1%  +3.88%  (p=0.000 n=10+10)
AppendSlice/16Bytes-16  1.37ns ± 1%  1.42ns ± 1%  +3.63%  (p=0.000 n=9+10)
AppendSlice/32Bytes-16  1.44ns ± 0%  1.47ns ± 1%  +1.83%  (p=0.000 n=10+10)
AppendSliceLarge/1024Bytes-16  257ns ± 2%  234ns ± 1%  -8.96%  (p=0.000 n=8+9)
AppendSliceLarge/4096Bytes-16  871ns ± 6%  812ns ± 1%  -6.80%  (p=0.000 n=10+10)
AppendSliceLarge/16384Bytes-16  3.15µs ± 6%  3.04µs ± 5%  ~  (p=0.052 n=10+10)
AppendSliceLarge/65536Bytes-16  10.7µs ± 7%  10.8µs ± 2%  ~  (p=0.278 n=10+9)
AppendSliceLarge/262144Bytes-16  42.9µs ± 1%  39.6µs ± 5%  -7.75%  (p=0.000 n=9+10)
AppendSliceLarge/1048576Bytes-16  147µs ± 4%  144µs ± 4%  -2.21%  (p=0.035 n=10+10)
AppendStr/1Bytes-16  1.20ns ± 0%  1.20ns ± 0%  ~  (p=0.755 n=10+10)
AppendStr/4Bytes-16  1.13ns ± 0%  1.14ns ± 1%  +1.20%  (p=0.000 n=10+10)
AppendStr/8Bytes-16  1.24ns ± 0%  1.25ns ± 0%  +0.93%  (p=0.000 n=10+10)
AppendStr/16Bytes-16  1.40ns ± 0%  1.42ns ± 0%  +2.10%  (p=0.000 n=9+10)
AppendStr/32Bytes-16  1.44ns ± 0%  1.45ns ± 0%  +0.99%  (p=0.000 n=10+10)
AppendSpecialCase-16  8.64ns ± 1%  8.89ns ± 2%  +2.90%  (p=0.000 n=10+10)
Copy/1Byte-16  1.24ns ± 1%  1.24ns ± 0%  -0.28%  (p=0.000 n=10+6)
Copy/1String-16  1.24ns ± 0%  1.23ns ± 0%  ~  (p=0.160 n=10+10)
Copy/2Byte-16  1.24ns ± 0%  1.24ns ± 0%  ~  (p=0.115 n=10+10)
Copy/2String-16  1.24ns ± 0%  1.24ns ± 1%  ~  (p=0.954 n=10+10)
Copy/4Byte-16  1.24ns ± 0%  1.24ns ± 0%  -0.44%  (p=0.001 n=10+10)
Copy/4String-16  1.23ns ± 0%  1.23ns ± 0%  ~  (p=0.081 n=10+10)
Copy/8Byte-16  1.37ns ± 0%  1.34ns ± 0%  -1.79%  (p=0.000 n=9+9)
Copy/8String-16  1.34ns ± 0%  1.34ns ± 0%  -0.58%  (p=0.000 n=9+10)
Copy/12Byte-16  1.44ns ± 0%  1.44ns ± 0%  ~  (p=0.149 n=9+9)
Copy/12String-16  1.44ns ± 0%  1.45ns ± 0%  ~  (p=0.124 n=9+9)
Copy/16Byte-16  1.44ns ± 0%  1.44ns ± 0%  -0.19%  (p=0.004 n=10+9)
Copy/16String-16  1.44ns ± 0%  1.45ns ± 0%  +0.30%  (p=0.008 n=10+10)
Copy/32Byte-16  1.63ns ± 1%  1.62ns ± 1%  -0.72%  (p=0.002 n=10+10)
Copy/32String-16  1.60ns ± 1%  1.64ns ± 0%  +2.23%  (p=0.000 n=10+10)
Copy/128Byte-16  2.06ns ± 0%  2.06ns ± 0%  ~  (p=0.757 n=9+10)
Copy/128String-16  2.07ns ± 0%  2.07ns ± 0%  +0.36%  (p=0.004 n=10+10)
Copy/1024Byte-16  6.07ns ± 2%  6.00ns ± 1%  -1.20%  (p=0.000 n=9+10)
Copy/1024String-16  6.05ns ± 0%  5.95ns ± 1%  -1.54%  (p=0.000 n=10+9)
AppendInPlace/NoGrow/Byte-16  288ns ± 1%  284ns ± 1%  -1.58%  (p=0.000 n=10+10)
AppendInPlace/NoGrow/1Ptr-16  844ns ± 1%  809ns ± 3%  -4.13%  (p=0.000 n=9+10)
AppendInPlace/NoGrow/2Ptr-16  1.47µs ± 1%  1.46µs ± 1%  ~  (p=0.388 n=9+10)
AppendInPlace/NoGrow/3Ptr-16  1.87µs ± 7%  1.91µs ± 1%  ~  (p=0.166 n=10+8)
AppendInPlace/NoGrow/4Ptr-16  2.66µs ± 1%  2.67µs ± 3%  ~  (p=0.968 n=9+10)
AppendInPlace/Grow/Byte-16  126ns ± 2%  121ns ± 2%  -4.06%  (p=0.000 n=10+10)
AppendInPlace/Grow/1Ptr-16  132ns ± 2%  127ns ± 2%  -4.28%  (p=0.000 n=10+9)
AppendInPlace/Grow/2Ptr-16  196ns ± 2%  188ns ± 1%  -4.20%  (p=0.000 n=10+8)
AppendInPlace/Grow/3Ptr-16  264ns ± 1%  260ns ± 1%  -1.51%  (p=0.000 n=9+10)
AppendInPlace/Grow/4Ptr-16  297ns ± 2%  294ns ± 2%  ~  (p=0.085 n=10+10)
StackCopyPtr-16  36.4ms ± 2%  36.7ms ± 2%  ~  (p=0.481 n=10+10)
StackCopy-16  33.9ms ± 3%  32.6ms ± 1%  -3.87%  (p=0.000 n=10+8)
StackCopyNoCache-16  1.00ms ± 5%  1.01ms ± 5%  ~  (p=0.143 n=10+10)
StackCopyWithStkobj-16  11.0ms ± 3%  10.9ms ± 4%  ~  (p=0.579 n=10+10)
Issue18138-16  49.2µs ± 5%  49.0µs ± 4%  ~  (p=1.000 n=10+9)
CompareStringEqual-16  1.39ns ± 1%  1.45ns ± 2%  +3.80%  (p=0.000 n=8+10)
CompareStringIdentical-16  0.55ns ± 1%  0.55ns ± 0%  +0.42%  (p=0.007 n=10+10)
CompareStringSameLength-16  1.03ns ± 0%  1.03ns ± 0%  ~  (p=0.430 n=9+10)
CompareStringDifferentLength-16  0.11ns ± 2%  0.11ns ± 3%  ~  (p=0.139 n=9+10)
CompareStringBigUnaligned-16  23.9µs ± 1%  24.0µs ± 1%  ~  (p=0.370 n=9+8)
CompareStringBig-16  22.0µs ± 3%  22.2µs ± 3%  ~  (p=0.243 n=9+10)
ConcatStringAndBytes-16  10.7ns ± 1%  10.0ns ± 2%  -6.33%  (p=0.000 n=10+10)
SliceByteToString/1-16  1.34ns ± 0%  1.34ns ± 0%  ~  (p=0.057 n=10+10)
SliceByteToString/2-16  6.67ns ± 2%  6.60ns ± 3%  ~  (p=0.101 n=10+10)
SliceByteToString/4-16  7.76ns ± 2%  7.56ns ± 3%  -2.59%  (p=0.001 n=10+10)
SliceByteToString/8-16  9.81ns ± 4%  9.57ns ± 2%  -2.48%  (p=0.005 n=10+10)
SliceByteToString/16-16  14.0ns ± 3%  13.7ns ± 2%  -2.31%  (p=0.009 n=10+10)
SliceByteToString/32-16  17.3ns ± 1%  16.7ns ± 2%  -3.41%  (p=0.000 n=10+10)
SliceByteToString/64-16  25.1ns ± 1%  24.1ns ± 2%  -3.93%  (p=0.000 n=9+10)
SliceByteToString/128-16  38.6ns ± 1%  36.5ns ± 1%  -5.60%  (p=0.000 n=10+10)
RuneCount/lenruneslice/ASCII-16  4.12ns ± 0%  4.11ns ± 0%  ~  (p=0.382 n=10+10)
RuneCount/lenruneslice/Japanese-16  25.4ns ± 2%  25.6ns ± 2%  ~  (p=0.138 n=9+10)
RuneCount/lenruneslice/MixedLength-16  17.1ns ± 0%  17.2ns ± 0%  +0.59%  (p=0.000 n=9+9)
RuneCount/rangeloop/ASCII-16  3.30ns ± 1%  3.29ns ± 0%  ~  (p=0.267 n=10+10)
RuneCount/rangeloop/Japanese-16  20.1ns ± 1%  24.9ns ± 1%  +24.31%  (p=0.000 n=9+9)
RuneCount/rangeloop/MixedLength-16  16.5ns ± 1%  16.7ns ± 1%  +1.34%  (p=0.000 n=10+10)
RuneCount/utf8.RuneCountInString/ASCII-16  5.71ns ± 1%  5.73ns ± 2%  ~  (p=0.579 n=10+10)
RuneCount/utf8.RuneCountInString/Japanese-16  22.0ns ± 6%  18.4ns ± 3%  -16.41%  (p=0.000 n=9+10)
RuneCount/utf8.RuneCountInString/MixedLength-16  15.0ns ± 1%  14.9ns ± 1%  -1.01%  (p=0.004 n=9+10)
RuneIterate/range/ASCII-16  2.69ns ± 1%  2.72ns ± 0%  +0.94%  (p=0.026 n=10+9)
RuneIterate/range/Japanese-16  24.5ns ± 2%  25.3ns ± 2%  +3.23%  (p=0.000 n=9+10)
RuneIterate/range/MixedLength-16  17.0ns ± 1%  17.1ns ± 1%  +0.85%  (p=0.000 n=10+10)
RuneIterate/range1/ASCII-16  2.70ns ± 1%  2.72ns ± 0%  ~  (p=0.058 n=9+9)
RuneIterate/range1/Japanese-16 24.1ns ± 2% 25.2ns ± 3% +4.30% (p=0.000 n=10+10) RuneIterate/range1/MixedLength-16 16.9ns ± 1% 17.7ns ± 0% +5.04% (p=0.000 n=10+8) RuneIterate/range2/ASCII-16 2.84ns ± 8% 2.72ns ± 1% -4.28% (p=0.003 n=10+9) RuneIterate/range2/Japanese-16 22.7ns ± 4% 25.2ns ± 3% +10.97% (p=0.000 n=10+10) RuneIterate/range2/MixedLength-16 17.0ns ± 1% 17.2ns ± 0% +0.95% (p=0.000 n=10+10) ArrayEqual-16 0.40ns ± 5% 0.35ns ± 2% -11.83% (p=0.000 n=10+10) Func/Name-16 8.05ns ± 1% 8.09ns ± 1% +0.40% (p=0.025 n=8+10) Func/Entry-16 1.73ns ± 1% 1.66ns ± 1% -3.93% (p=0.000 n=10+10) Func/FileLine-16 27.5ns ± 2% 26.0ns ± 0% -5.50% (p=0.000 n=10+10) [Geo mean] 16.7ns 15.7ns -6.08% name old speed new speed delta SetTypePtr-16 11.0GB/s ± 1% 11.0GB/s ± 3% ~ (p=0.684 n=10+10) SetTypePtr8-16 15.5GB/s ± 0% 15.5GB/s ± 0% ~ (p=0.123 n=10+10) SetTypePtr16-16 31.0GB/s ± 1% 31.1GB/s ± 0% ~ (p=0.123 n=10+10) SetTypePtr32-16 62.1GB/s ± 0% 62.2GB/s ± 0% ~ (p=0.123 n=10+10) SetTypePtr64-16 124GB/s ± 0% 124GB/s ± 0% ~ (p=0.684 n=10+10) SetTypePtr126-16 146GB/s ± 0% 146GB/s ± 0% ~ (p=0.481 n=10+10) SetTypePtr128-16 154GB/s ± 0% 154GB/s ± 0% ~ (p=0.243 n=9+10) SetTypePtrSlice-16 151GB/s ± 1% 151GB/s ± 1% ~ (p=0.497 n=9+10) SetTypeNode1-16 5.82GB/s ± 1% 5.82GB/s ± 0% ~ (p=0.353 n=10+10) SetTypeNode1Slice-16 76.1GB/s ± 1% 77.0GB/s ± 1% +1.19% (p=0.000 n=10+10) SetTypeNode8-16 19.4GB/s ± 0% 19.4GB/s ± 0% ~ (p=0.130 n=8+8) SetTypeNode8Slice-16 113GB/s ± 0% 113GB/s ± 0% ~ (p=0.604 n=10+9) SetTypeNode64-16 76.5GB/s ± 0% 76.5GB/s ± 0% ~ (p=0.190 n=10+10) SetTypeNode64Slice-16 97.8GB/s ± 0% 97.7GB/s ± 0% ~ (p=0.549 n=9+10) SetTypeNode64Dead-16 95.5GB/s ± 0% 95.7GB/s ± 0% ~ (p=0.118 n=10+6) SetTypeNode64DeadSlice-16 112GB/s ± 0% 112GB/s ± 0% ~ (p=0.353 n=10+10) SetTypeNode124-16 146GB/s ± 0% 146GB/s ± 0% ~ (p=0.853 n=10+10) SetTypeNode124Slice-16 146GB/s ± 5% 149GB/s ± 0% ~ (p=0.315 n=10+10) SetTypeNode126-16 154GB/s ± 0% 154GB/s ± 0% ~ (p=0.356 n=10+9) SetTypeNode126Slice-16 150GB/s ± 0% 
150GB/s ± 0% ~ (p=0.095 n=9+10) SetTypeNode128-16 107GB/s ± 0% 107GB/s ± 0% +0.31% (p=0.003 n=9+10) SetTypeNode128Slice-16 119GB/s ± 0% 120GB/s ± 0% ~ (p=0.156 n=10+9) SetTypeNode130-16 108GB/s ± 0% 108GB/s ± 0% +0.33% (p=0.002 n=10+10) SetTypeNode130Slice-16 119GB/s ± 0% 119GB/s ± 0% ~ (p=0.739 n=10+10) SetTypeNode1024-16 160GB/s ± 0% 159GB/s ± 1% ~ (p=0.113 n=9+9) SetTypeNode1024Slice-16 144GB/s ± 0% 144GB/s ± 0% ~ (p=0.063 n=10+10) Hash5-16 2.59GB/s ± 1% 2.49GB/s ± 0% -3.90% (p=0.000 n=10+9) Hash16-16 7.85GB/s ± 1% 7.23GB/s ± 1% -7.92% (p=0.000 n=10+10) Hash64-16 24.0GB/s ± 0% 23.9GB/s ± 0% ~ (p=0.190 n=9+9) Hash1024-16 62.4GB/s ± 0% 62.3GB/s ± 0% -0.16% (p=0.017 n=9+10) Hash65536-16 74.0GB/s ± 0% 74.0GB/s ± 0% ~ (p=0.796 n=10+10) Memmove/1-16 1.08GB/s ± 0% 1.08GB/s ± 0% -0.21% (p=0.035 n=10+10) Memmove/2-16 2.16GB/s ± 0% 2.15GB/s ± 0% ~ (p=0.105 n=10+10) Memmove/3-16 3.24GB/s ± 1% 3.22GB/s ± 1% -0.49% (p=0.004 n=10+10) Memmove/4-16 3.89GB/s ± 0% 3.89GB/s ± 0% ~ (p=0.218 n=10+10) Memmove/5-16 4.42GB/s ± 0% 4.42GB/s ± 0% ~ (p=0.075 n=10+10) Memmove/6-16 5.31GB/s ± 0% 5.29GB/s ± 1% ~ (p=0.218 n=10+10) Memmove/7-16 6.19GB/s ± 0% 6.18GB/s ± 0% -0.15% (p=0.035 n=10+9) Memmove/8-16 7.07GB/s ± 0% 7.07GB/s ± 0% ~ (p=0.684 n=10+10) Memmove/9-16 7.22GB/s ± 0% 6.68GB/s ± 0% -7.37% (p=0.000 n=10+10) Memmove/10-16 8.02GB/s ± 0% 7.43GB/s ± 0% -7.38% (p=0.000 n=9+9) Memmove/11-16 8.83GB/s ± 0% 8.13GB/s ± 0% -7.87% (p=0.000 n=10+9) Memmove/12-16 9.62GB/s ± 0% 8.89GB/s ± 1% -7.61% (p=0.000 n=10+10) Memmove/13-16 10.4GB/s ± 0% 9.7GB/s ± 0% -7.20% (p=0.000 n=10+10) Memmove/14-16 11.2GB/s ± 0% 10.4GB/s ± 1% -7.64% (p=0.000 n=10+9) Memmove/15-16 12.0GB/s ± 0% 11.1GB/s ± 0% -7.46% (p=0.000 n=10+9) Memmove/16-16 12.8GB/s ± 0% 11.8GB/s ± 1% -7.67% (p=0.000 n=10+10) Memmove/32-16 23.8GB/s ± 0% 23.5GB/s ± 1% -1.20% (p=0.000 n=10+10) Memmove/64-16 44.2GB/s ± 0% 39.1GB/s ± 0% -11.56% (p=0.000 n=10+9) Memmove/128-16 68.7GB/s ± 0% 63.2GB/s ± 0% -7.95% (p=0.000 n=10+10) Memmove/256-16 104GB/s 
± 0% 103GB/s ± 0% -1.13% (p=0.000 n=10+10) Memmove/512-16 129GB/s ± 1% 129GB/s ± 0% ~ (p=0.165 n=10+10) Memmove/1024-16 174GB/s ± 1% 174GB/s ± 1% ~ (p=0.258 n=9+9) Memmove/2048-16 213GB/s ± 1% 213GB/s ± 2% ~ (p=0.963 n=8+9) Memmove/4096-16 250GB/s ± 1% 240GB/s ± 4% -3.83% (p=0.006 n=9+9) MemmoveOverlap/32-16 19.8GB/s ± 1% 19.1GB/s ± 1% -3.40% (p=0.000 n=10+10) MemmoveOverlap/64-16 39.0GB/s ± 0% 38.8GB/s ± 0% -0.28% (p=0.001 n=9+9) MemmoveOverlap/128-16 62.2GB/s ± 0% 62.1GB/s ± 0% ~ (p=0.063 n=10+10) MemmoveOverlap/256-16 96.0GB/s ± 0% 95.8GB/s ± 0% -0.26% (p=0.009 n=10+10) MemmoveOverlap/512-16 83.6GB/s ±16% 89.2GB/s ± 0% ~ (p=0.696 n=10+8) MemmoveOverlap/1024-16 141GB/s ± 0% 140GB/s ± 0% -0.28% (p=0.006 n=8+10) MemmoveOverlap/2048-16 172GB/s ± 0% 171GB/s ± 1% -0.38% (p=0.008 n=9+9) MemmoveOverlap/4096-16 176GB/s ± 1% 177GB/s ± 1% +0.84% (p=0.001 n=8+10) MemmoveUnalignedDst/1-16 806MB/s ± 0% 802MB/s ± 1% -0.52% (p=0.023 n=10+10) MemmoveUnalignedDst/2-16 1.62GB/s ± 0% 1.62GB/s ± 0% -0.11% (p=0.041 n=10+10) MemmoveUnalignedDst/3-16 2.43GB/s ± 0% 2.43GB/s ± 0% -0.14% (p=0.006 n=9+9) MemmoveUnalignedDst/4-16 3.24GB/s ± 0% 3.23GB/s ± 1% -0.36% (p=0.007 n=10+10) MemmoveUnalignedDst/5-16 3.71GB/s ± 0% 3.71GB/s ± 0% ~ (p=0.063 n=10+10) MemmoveUnalignedDst/6-16 4.48GB/s ± 0% 4.47GB/s ± 0% ~ (p=0.912 n=10+10) MemmoveUnalignedDst/7-16 5.22GB/s ± 0% 5.22GB/s ± 0% ~ (p=1.000 n=10+10) MemmoveUnalignedDst/8-16 5.95GB/s ± 0% 5.93GB/s ± 1% -0.40% (p=0.023 n=10+10) MemmoveUnalignedDst/9-16 6.24GB/s ± 0% 6.24GB/s ± 0% ~ (p=0.912 n=10+10) MemmoveUnalignedDst/10-16 6.94GB/s ± 0% 6.94GB/s ± 0% ~ (p=0.353 n=10+10) MemmoveUnalignedDst/11-16 7.64GB/s ± 0% 7.63GB/s ± 0% ~ (p=0.393 n=10+10) MemmoveUnalignedDst/12-16 8.33GB/s ± 0% 8.33GB/s ± 0% ~ (p=0.971 n=10+10) MemmoveUnalignedDst/13-16 9.02GB/s ± 0% 9.01GB/s ± 0% ~ (p=0.436 n=10+10) MemmoveUnalignedDst/14-16 9.71GB/s ± 0% 9.71GB/s ± 0% ~ (p=0.280 n=10+10) MemmoveUnalignedDst/15-16 10.4GB/s ± 0% 10.4GB/s ± 1% ~ (p=0.853 n=10+10) 
MemmoveUnalignedDst/16-16 11.1GB/s ± 0% 11.1GB/s ± 0% ~ (p=0.089 n=10+10) MemmoveUnalignedDst/32-16 19.7GB/s ± 1% 19.6GB/s ± 0% ~ (p=0.075 n=10+10) MemmoveUnalignedDst/64-16 38.9GB/s ± 0% 38.8GB/s ± 0% ~ (p=0.218 n=10+10) MemmoveUnalignedDst/128-16 62.1GB/s ± 0% 62.1GB/s ± 0% ~ (p=0.549 n=10+9) MemmoveUnalignedDst/256-16 69.4GB/s ± 0% 69.3GB/s ± 0% ~ (p=0.105 n=10+10) MemmoveUnalignedDst/512-16 124GB/s ± 1% 124GB/s ± 0% ~ (p=0.762 n=10+8) MemmoveUnalignedDst/1024-16 136GB/s ± 0% 136GB/s ± 1% ~ (p=0.666 n=9+9) MemmoveUnalignedDst/2048-16 159GB/s ± 0% 159GB/s ± 0% ~ (p=0.574 n=8+8) MemmoveUnalignedDst/4096-16 161GB/s ± 0% 161GB/s ± 0% ~ (p=1.000 n=9+9) MemmoveUnalignedDstOverlap/32-16 7.84GB/s ± 0% 7.83GB/s ± 0% ~ (p=0.353 n=10+10) MemmoveUnalignedDstOverlap/64-16 14.0GB/s ± 0% 14.0GB/s ± 0% ~ (p=0.661 n=10+9) MemmoveUnalignedDstOverlap/128-16 27.4GB/s ± 0% 27.4GB/s ± 0% ~ (p=0.353 n=10+10) MemmoveUnalignedDstOverlap/256-16 50.4GB/s ± 0% 50.4GB/s ± 0% ~ (p=0.156 n=10+9) MemmoveUnalignedDstOverlap/512-16 60.7GB/s ± 4% 62.5GB/s ± 0% +3.07% (p=0.022 n=10+9) MemmoveUnalignedDstOverlap/1024-16 107GB/s ± 0% 107GB/s ± 0% ~ (p=0.234 n=8+8) MemmoveUnalignedDstOverlap/2048-16 146GB/s ± 0% 146GB/s ± 1% ~ (p=0.182 n=10+9) MemmoveUnalignedDstOverlap/4096-16 155GB/s ± 0% 155GB/s ± 0% ~ (p=0.400 n=10+9) MemmoveUnalignedSrc/1-16 882MB/s ± 0% 884MB/s ± 1% +0.24% (p=0.033 n=10+9) MemmoveUnalignedSrc/2-16 1.76GB/s ± 1% 1.77GB/s ± 0% +0.27% (p=0.028 n=10+9) MemmoveUnalignedSrc/3-16 2.43GB/s ± 0% 2.43GB/s ± 0% +0.26% (p=0.027 n=9+10) MemmoveUnalignedSrc/4-16 3.24GB/s ± 0% 3.24GB/s ± 1% ~ (p=0.079 n=9+10) MemmoveUnalignedSrc/5-16 3.73GB/s ± 0% 3.73GB/s ± 1% ~ (p=0.829 n=8+10) MemmoveUnalignedSrc/6-16 4.47GB/s ± 0% 4.49GB/s ± 0% +0.39% (p=0.000 n=10+10) MemmoveUnalignedSrc/7-16 5.22GB/s ± 0% 5.23GB/s ± 0% ~ (p=0.280 n=10+10) MemmoveUnalignedSrc/8-16 5.95GB/s ± 0% 5.98GB/s ± 0% +0.39% (p=0.001 n=10+9) MemmoveUnalignedSrc/9-16 6.24GB/s ± 0% 6.25GB/s ± 0% ~ (p=0.549 n=10+9) 
MemmoveUnalignedSrc/10-16 6.93GB/s ± 0% 6.94GB/s ± 0% ~ (p=0.604 n=10+9) MemmoveUnalignedSrc/11-16 7.63GB/s ± 0% 7.63GB/s ± 1% ~ (p=0.353 n=10+10) MemmoveUnalignedSrc/12-16 8.32GB/s ± 0% 8.32GB/s ± 0% ~ (p=0.218 n=10+10) MemmoveUnalignedSrc/13-16 9.02GB/s ± 0% 9.00GB/s ± 1% ~ (p=0.684 n=10+10) MemmoveUnalignedSrc/14-16 9.71GB/s ± 0% 9.71GB/s ± 0% ~ (p=0.739 n=10+10) MemmoveUnalignedSrc/15-16 10.4GB/s ± 0% 10.4GB/s ± 0% ~ (p=0.353 n=10+10) MemmoveUnalignedSrc/16-16 11.1GB/s ± 1% 11.1GB/s ± 0% ~ (p=0.579 n=10+10) MemmoveUnalignedSrc/32-16 20.0GB/s ± 1% 20.0GB/s ± 0% ~ (p=0.631 n=10+10) MemmoveUnalignedSrc/64-16 38.8GB/s ± 0% 38.8GB/s ± 0% ~ (p=0.579 n=10+10) MemmoveUnalignedSrc/128-16 61.2GB/s ± 0% 61.2GB/s ± 0% ~ (p=0.780 n=10+9) MemmoveUnalignedSrc/256-16 94.8GB/s ± 0% 92.2GB/s ± 1% -2.73% (p=0.000 n=10+10) MemmoveUnalignedSrc/512-16 119GB/s ± 0% 119GB/s ± 0% +0.26% (p=0.027 n=8+9) MemmoveUnalignedSrc/1024-16 141GB/s ± 0% 142GB/s ± 1% +1.07% (p=0.000 n=8+10) MemmoveUnalignedSrc/2048-16 157GB/s ± 0% 157GB/s ± 0% ~ (p=0.167 n=9+8) MemmoveUnalignedSrc/4096-16 161GB/s ± 0% 162GB/s ± 1% ~ (p=0.063 n=10+10) MemmoveUnalignedSrcOverlap/32-16 7.93GB/s ± 0% 7.88GB/s ± 0% -0.63% (p=0.000 n=9+10) MemmoveUnalignedSrcOverlap/64-16 15.5GB/s ± 0% 15.5GB/s ± 0% ~ (p=0.529 n=10+10) MemmoveUnalignedSrcOverlap/128-16 28.3GB/s ± 0% 28.3GB/s ± 0% ~ (p=0.218 n=10+10) MemmoveUnalignedSrcOverlap/256-16 41.5GB/s ± 0% 41.6GB/s ± 0% +0.35% (p=0.000 n=10+9) MemmoveUnalignedSrcOverlap/512-16 68.9GB/s ± 0% 68.8GB/s ± 0% ~ (p=0.541 n=9+8) MemmoveUnalignedSrcOverlap/1024-16 115GB/s ± 0% 115GB/s ± 0% ~ (p=0.382 n=8+8) MemmoveUnalignedSrcOverlap/2048-16 155GB/s ± 0% 144GB/s ±18% ~ (p=0.101 n=8+10) MemmoveUnalignedSrcOverlap/4096-16 160GB/s ± 0% 160GB/s ± 1% ~ (p=0.605 n=9+9) Memclr/5-16 5.81GB/s ± 1% 5.80GB/s ± 2% ~ (p=0.546 n=9+9) Memclr/16-16 15.5GB/s ± 0% 15.4GB/s ± 0% -0.32% (p=0.008 n=9+10) Memclr/64-16 51.8GB/s ± 0% 50.7GB/s ± 0% -2.22% (p=0.000 n=10+10) Memclr/256-16 113GB/s ± 0% 113GB/s ± 0% 
~ (p=0.143 n=10+10) Memclr/4096-16 239GB/s ± 1% 237GB/s ± 0% -0.87% (p=0.000 n=10+10) Memclr/65536-16 79.8GB/s ± 0% 79.7GB/s ± 0% ~ (p=0.529 n=10+10) Memclr/1M-16 74.6GB/s ± 1% 74.7GB/s ± 1% ~ (p=0.529 n=10+10) Memclr/4M-16 48.7GB/s ± 1% 48.8GB/s ± 0% ~ (p=0.123 n=10+10) Memclr/8M-16 48.2GB/s ± 2% 48.6GB/s ± 0% ~ (p=0.408 n=10+8) Memclr/16M-16 43.6GB/s ± 4% 43.3GB/s ± 0% ~ (p=0.173 n=10+8) Memclr/64M-16 30.7GB/s ± 0% 30.7GB/s ± 0% ~ (p=0.113 n=10+9) GoMemclr/5-16 6.07GB/s ± 0% 6.08GB/s ± 0% ~ (p=0.367 n=9+10) GoMemclr/16-16 15.6GB/s ± 0% 15.6GB/s ± 0% -0.22% (p=0.004 n=9+9) GoMemclr/64-16 56.1GB/s ± 0% 56.1GB/s ± 0% ~ (p=0.968 n=10+9) GoMemclr/256-16 125GB/s ± 0% 124GB/s ± 0% ~ (p=0.912 n=10+10) MemclrRange/1K_2K-16 210GB/s ± 0% 224GB/s ± 1% +6.81% (p=0.000 n=10+10) MemclrRange/2K_8K-16 228GB/s ± 0% 228GB/s ± 0% ~ (p=0.684 n=10+10) MemclrRange/4K_16K-16 279GB/s ± 0% 279GB/s ± 0% ~ (p=0.780 n=9+10) MemclrRange/160K_228K-16 80.3GB/s ± 0% 80.2GB/s ± 0% ~ (p=0.165 n=10+10) Copy/1Byte-16 808MB/s ± 1% 810MB/s ± 0% +0.28% (p=0.000 n=10+8) Copy/1String-16 810MB/s ± 0% 811MB/s ± 0% ~ (p=0.105 n=10+10) Copy/2Byte-16 1.62GB/s ± 0% 1.62GB/s ± 0% ~ (p=0.182 n=10+9) Copy/2String-16 1.62GB/s ± 1% 1.62GB/s ± 1% ~ (p=1.000 n=10+10) Copy/4Byte-16 3.22GB/s ± 0% 3.24GB/s ± 0% +0.46% (p=0.000 n=10+10) Copy/4String-16 3.24GB/s ± 0% 3.24GB/s ± 0% ~ (p=0.075 n=10+10) Copy/8Byte-16 5.86GB/s ± 0% 5.96GB/s ± 0% +1.82% (p=0.000 n=9+9) Copy/8String-16 5.95GB/s ± 0% 5.99GB/s ± 0% +0.59% (p=0.000 n=9+10) Copy/12Byte-16 8.32GB/s ± 0% 8.32GB/s ± 0% ~ (p=0.190 n=9+9) Copy/12String-16 8.31GB/s ± 0% 8.29GB/s ± 0% ~ (p=0.068 n=9+10) Copy/16Byte-16 11.1GB/s ± 0% 11.1GB/s ± 0% +0.18% (p=0.003 n=10+9) Copy/16String-16 11.1GB/s ± 0% 11.1GB/s ± 0% -0.31% (p=0.009 n=10+10) Copy/32Byte-16 19.6GB/s ± 1% 19.8GB/s ± 1% +0.72% (p=0.002 n=10+10) Copy/32String-16 20.0GB/s ± 0% 19.5GB/s ± 0% -2.19% (p=0.000 n=10+10) Copy/128Byte-16 62.2GB/s ± 0% 62.1GB/s ± 0% ~ (p=0.661 n=9+10) Copy/128String-16 61.9GB/s ± 0% 
61.7GB/s ± 0% -0.35% (p=0.005 n=10+10) Copy/1024Byte-16 169GB/s ± 2% 171GB/s ± 1% +1.21% (p=0.000 n=9+10) Copy/1024String-16 169GB/s ± 0% 172GB/s ± 1% +1.57% (p=0.000 n=10+9) CompareStringBigUnaligned-16 43.8GB/s ± 1% 43.7GB/s ± 1% ~ (p=0.370 n=9+8) CompareStringBig-16 47.6GB/s ± 3% 47.3GB/s ± 3% ~ (p=0.243 n=9+10) [Geo mean] 25.3GB/s 28.0GB/s +10.66% name old p50-ns new p50-ns delta ReadMemStatsLatency-16 98.4k ±37% 112.7k ±64% ~ (p=0.436 n=10+10) ReadMetricsLatency-16 1.69k ± 3% 1.70k ± 2% ~ (p=0.646 n=9+10) GoroutineProfile/small-nil/idle-16 3.75k ± 4% 3.72k ± 1% ~ (p=0.447 n=10+9) GoroutineProfile/small-nil/loaded-16 4.33k ± 3% 4.31k ± 5% ~ (p=0.931 n=9+9) GoroutineProfile/small/idle-16 102k ± 3% 101k ± 4% ~ (p=0.113 n=9+9) GoroutineProfile/small/loaded-16 214k ± 3% 215k ± 6% ~ (p=0.842 n=9+10) GoroutineProfile/large-nil/idle-16 3.70k ± 2% 3.65k ± 2% ~ (p=0.075 n=10+10) GoroutineProfile/large-nil/loaded-16 4.36k ± 3% 4.31k ±10% ~ (p=0.631 n=10+10) GoroutineProfile/large/idle-16 2.56M ± 1% 2.51M ± 1% -2.28% (p=0.000 n=10+10) GoroutineProfile/large/loaded-16 6.77M ± 4% 6.85M ±19% ~ (p=0.536 n=7+10) GoroutineProfile/sparse-nil/idle-16 3.66k ± 1% 3.64k ± 2% ~ (p=0.136 n=9+9) GoroutineProfile/sparse-nil/loaded-16 4.25k ± 5% 4.15k ± 4% ~ (p=0.190 n=10+10) GoroutineProfile/sparse/idle-16 102k ± 4% 101k ± 3% ~ (p=0.447 n=10+9) GoroutineProfile/sparse/loaded-16 216k ± 4% 218k ± 4% ~ (p=0.549 n=9+10) [Geo mean] 35.8k 35.9k +0.35% name old p90-ns new p90-ns delta ReadMemStatsLatency-16 983k ±310% 200k ±34% -79.62% (p=0.034 n=10+8) ReadMetricsLatency-16 4.01k ±35% 3.75k ±17% ~ (p=0.315 n=10+10) GoroutineProfile/small-nil/idle-16 4.21k ± 4% 4.27k ± 8% ~ (p=0.968 n=9+10) GoroutineProfile/small-nil/loaded-16 5.58k ± 8% 5.35k ±12% ~ (p=0.190 n=10+10) GoroutineProfile/small/idle-16 108k ± 6% 107k ± 7% ~ (p=0.497 n=9+10) GoroutineProfile/small/loaded-16 450k ± 5% 432k ± 3% -3.92% (p=0.002 n=9+9) GoroutineProfile/large-nil/idle-16 4.13k ± 7% 4.04k ± 2% ~ (p=0.181 n=10+8) 
GoroutineProfile/large-nil/loaded-16 5.76k ± 4% 5.67k ± 5% ~ (p=0.190 n=10+10) GoroutineProfile/large/idle-16 2.63M ± 2% 2.58M ± 1% -1.97% (p=0.000 n=10+10) GoroutineProfile/large/loaded-16 16.9M ± 4% 17.0M ± 6% ~ (p=0.661 n=9+10) GoroutineProfile/sparse-nil/idle-16 4.21k ±10% 4.07k ± 7% ~ (p=0.128 n=10+10) GoroutineProfile/sparse-nil/loaded-16 5.55k ± 8% 5.38k ± 6% ~ (p=0.089 n=10+10) GoroutineProfile/sparse/idle-16 106k ± 4% 106k ± 3% ~ (p=0.661 n=10+9) GoroutineProfile/sparse/loaded-16 454k ± 6% 441k ± 6% -2.86% (p=0.043 n=10+10) [Geo mean] 58.4k 51.0k -12.61% name old p99-ns new p99-ns delta ReadMemStatsLatency-16 983k ±310% 200k ±34% -79.62% (p=0.034 n=10+8) ReadMetricsLatency-16 26.6k ±22% 26.6k ±17% ~ (p=0.971 n=10+10) GoroutineProfile/small-nil/idle-16 5.27k ±12% 5.19k ±12% ~ (p=0.579 n=10+10) GoroutineProfile/small-nil/loaded-16 7.28k ± 3% 7.04k ± 9% ~ (p=0.113 n=9+10) GoroutineProfile/small/idle-16 114k ± 6% 113k ± 6% ~ (p=0.604 n=9+10) GoroutineProfile/small/loaded-16 4.84M ±70% 6.06M ±131% ~ (p=0.842 n=9+10) GoroutineProfile/large-nil/idle-16 5.38k ±19% 5.26k ±13% ~ (p=0.912 n=10+10) GoroutineProfile/large-nil/loaded-16 7.38k ± 3% 7.23k ± 4% ~ (p=0.143 n=10+10) GoroutineProfile/large/idle-16 2.79M ± 5% 2.72M ± 3% ~ (p=0.089 n=10+10) GoroutineProfile/large/loaded-16 24.0M ±24% 24.9M ±29% ~ (p=0.684 n=10+10) GoroutineProfile/sparse-nil/idle-16 5.32k ±17% 5.49k ±17% ~ (p=0.684 n=10+10) GoroutineProfile/sparse-nil/loaded-16 7.25k ± 4% 6.97k ± 5% -3.90% (p=0.005 n=9+10) GoroutineProfile/sparse/idle-16 113k ± 5% 112k ± 6% ~ (p=0.631 n=10+10) GoroutineProfile/sparse/loaded-16 4.00M ±66% 4.26M ±65% ~ (p=0.489 n=9+9) [Geo mean] 107k 97k -9.55% name old alloc/op new alloc/op delta NewEmptyMap-16 0.00B 0.00B ~ (all equal) NewSmallMap-16 0.00B 0.00B ~ (all equal) MapPopulate/1-16 0.00B 0.00B ~ (all equal) MapPopulate/10-16 179B ± 0% 179B ± 0% ~ (all equal) MapPopulate/100-16 3.35kB ± 0% 3.35kB ± 0% ~ (p=0.294 n=10+8) MapPopulate/1000-16 53.3kB ± 0% 53.3kB ± 0% ~ 
(p=1.000 n=8+10) MapPopulate/10000-16 428kB ± 0% 428kB ± 0% ~ (p=0.469 n=10+10) MapPopulate/100000-16 3.62MB ± 0% 3.62MB ± 0% ~ (p=0.888 n=9+10) MapStringConversion/32/simple-16 0.00B 0.00B ~ (all equal) MapStringConversion/32/struct-16 0.00B 0.00B ~ (all equal) MapStringConversion/32/array-16 0.00B 0.00B ~ (all equal) MapStringConversion/64/simple-16 0.00B 0.00B ~ (all equal) MapStringConversion/64/struct-16 0.00B 0.00B ~ (all equal) MapStringConversion/64/array-16 0.00B 0.00B ~ (all equal) NewEmptyMapHintLessThan8-16 0.00B 0.00B ~ (all equal) NewEmptyMapHintGreaterThan8-16 1.15kB ± 0% 1.15kB ± 0% ~ (all equal) MapAppendAssign/Int32/256-16 41.7B ±15% 44.3B ±12% ~ (p=0.106 n=10+10) MapAppendAssign/Int32/65536-16 22.6B ± 6% 23.5B ± 6% +4.19% (p=0.025 n=9+10) MapAppendAssign/Int64/256-16 43.5B ±10% 42.9B ± 7% ~ (p=0.757 n=10+10) MapAppendAssign/Int64/65536-16 24.7B ± 7% 21.8B ± 6% -11.74% (p=0.000 n=10+10) MapAppendAssign/Str/256-16 87.6B ±10% 89.2B ± 9% ~ (p=0.379 n=10+10) MapAppendAssign/Str/65536-16 45.1B ±14% 47.2B ± 8% ~ (p=0.150 n=10+9) CreateGoroutinesCapture-16 144B ± 0% 144B ± 0% ~ (all equal) [Geo mean] 769B 770B +0.21% name old allocs/op new allocs/op delta NewEmptyMap-16 0.00 0.00 ~ (all equal) NewSmallMap-16 0.00 0.00 ~ (all equal) MapPopulate/1-16 0.00 0.00 ~ (all equal) MapPopulate/10-16 1.00 ± 0% 1.00 ± 0% ~ (all equal) MapPopulate/100-16 17.0 ± 0% 17.0 ± 0% ~ (all equal) MapPopulate/1000-16 73.0 ± 0% 73.0 ± 0% ~ (all equal) MapPopulate/10000-16 320 ± 0% 320 ± 0% ~ (p=1.000 n=10+10) MapPopulate/100000-16 4.00k ± 0% 4.00k ± 0% ~ (p=0.753 n=10+10) MapStringConversion/32/simple-16 0.00 0.00 ~ (all equal) MapStringConversion/32/struct-16 0.00 0.00 ~ (all equal) MapStringConversion/32/array-16 0.00 0.00 ~ (all equal) MapStringConversion/64/simple-16 0.00 0.00 ~ (all equal) MapStringConversion/64/struct-16 0.00 0.00 ~ (all equal) MapStringConversion/64/array-16 0.00 0.00 ~ (all equal) NewEmptyMapHintLessThan8-16 0.00 0.00 ~ (all equal) 
NewEmptyMapHintGreaterThan8-16 1.00 ± 0% 1.00 ± 0% ~ (all equal) MapAppendAssign/Int32/256-16 0.00 0.00 ~ (all equal) MapAppendAssign/Int32/65536-16 0.00 0.00 ~ (all equal) MapAppendAssign/Int64/256-16 0.00 0.00 ~ (all equal) MapAppendAssign/Int64/65536-16 0.00 0.00 ~ (all equal) MapAppendAssign/Str/256-16 0.00 0.00 ~ (all equal) MapAppendAssign/Str/65536-16 0.00 0.00 ~ (all equal) CreateGoroutinesCapture-16 5.00 ± 0% 5.00 ± 0% ~ (all equal) [Geo mean] 26.0 26.0 +0.00% Change-Id: I5fb03e93df8b380e04795afbdcd1c94aeeecacc6 Reviewed-on: https://go-review.googlesource.com/c/go/+/454255 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Jakub Ciolek <jakub@ciolek.dev> TryBot-Result: Gopher Robot <gobot@golang.org>
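The commit above reports both time/op and speed for the same benchmarks. When a benchmark calls `b.SetBytes(n)`, `go test` derives the speed column from time/op: bytes per operation divided by time per operation, in decimal units. The sketch below shows that arithmetic; `mbPerSec` and `nsPerOpFor` are hypothetical helper names, not `testing` package APIs. As a check against the tables, Copy/1024Byte-16 at 6.00 ns/op gives 1024 / 6.00 ≈ 170.7 bytes/ns, about 171 GB/s. That matches the speed row for the same benchmark.

```go
package main

// mbPerSec converts a time per operation into the MB/s figure that
// `go test -bench` prints for a benchmark that called b.SetBytes(bytesPerOp).
// The testing package counts 1 MB as 1e6 bytes, so 1 byte/ns == 1000 MB/s.
func mbPerSec(bytesPerOp int64, nsPerOp float64) float64 {
	if bytesPerOp <= 0 || nsPerOp <= 0 {
		return 0
	}
	return float64(bytesPerOp) / nsPerOp * 1e3
}

// nsPerOpFor inverts mbPerSec: the ns/op implied by a reported throughput.
func nsPerOpFor(bytesPerOp int64, mbps float64) float64 {
	if bytesPerOp <= 0 || mbps <= 0 {
		return 0
	}
	return float64(bytesPerOp) / mbps * 1e3
}
```

Reading the tables this way makes it easy to cross-check a surprising delta in one column against the other.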
2022-09-05 runtime: convert local var ready at TestMemmoveAtomicity to atomic type (cuiweixie)
For #53821 Change-Id: I2487b8d18a4cd3fc6e64fbbb531419812bfe0f08 Reviewed-on: https://go-review.googlesource.com/c/go/+/427136 Run-TryBot: xie cui <523516579@qq.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com>
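The change above replaces a plain `ready` variable with one of the typed atomics that `sync/atomic` gained in Go 1.19. The following is a minimal sketch of that signaling pattern under the assumption that one goroutine announces it has started and another spins until it sees the flag. `runAfterReady` is a hypothetical helper, not the test's actual source.

```go
package main

import (
	"runtime"
	"sync/atomic"
)

// runAfterReady is a hypothetical helper showing the ready-flag pattern with
// the typed atomic.Bool (Go 1.19+) instead of an integer accessed through
// atomic.LoadUint32/atomic.StoreUint32. It starts work on a new goroutine,
// spins until that goroutine signals ready, waits for it to finish, and
// returns the number of spin iterations.
func runAfterReady(work func()) int {
	var ready atomic.Bool // zero value is false; no separate init needed
	done := make(chan struct{})

	go func() {
		ready.Store(true) // signal before starting the (possibly long) work
		work()
		close(done)
	}()

	spins := 0
	for !ready.Load() {
		spins++
		runtime.Gosched() // let the worker run even with GOMAXPROCS=1
	}
	<-done // receiving from done makes work's writes visible here
	return spins
}
```

The typed value cannot be read or written non-atomically by accident, which is the main reason to prefer it over a bare `uint32`.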
2022-08-23 cmd/compile/internal/ssa: optimize memory moving on arm64 (eric fang)
This CL optimizes memory moving with LDP and STP on arm64.
Benchmarks:
name              old time/op    new time/op    delta
ClearFat7-160     1.08ns ± 0%    0.95ns ± 0%    -11.41%  (p=0.008 n=5+5)
ClearFat8-160     0.84ns ± 0%    0.84ns ± 0%     -0.95%  (p=0.008 n=5+5)
ClearFat11-160    1.08ns ± 0%    0.95ns ± 0%    -11.46%  (p=0.008 n=5+5)
ClearFat12-160    0.95ns ± 0%    0.95ns ± 0%       ~     (p=0.063 n=4+5)
ClearFat13-160    1.08ns ± 0%    0.95ns ± 0%    -11.45%  (p=0.008 n=5+5)
ClearFat14-160    1.08ns ± 0%    0.95ns ± 0%    -11.47%  (p=0.008 n=5+5)
ClearFat15-160    1.24ns ± 0%    0.95ns ± 0%    -22.98%  (p=0.029 n=4+4)
ClearFat16-160    0.84ns ± 0%    0.83ns ± 0%     -0.11%  (p=0.008 n=5+5)
ClearFat24-160    2.15ns ± 0%    2.15ns ± 0%       ~     (all equal)
ClearFat32-160    2.86ns ± 0%    2.86ns ± 0%       ~     (p=0.333 n=5+4)
ClearFat40-160    2.15ns ± 0%    2.15ns ± 0%       ~     (all equal)
ClearFat48-160    3.32ns ± 1%    3.31ns ± 1%       ~     (p=0.690 n=5+5)
ClearFat56-160    2.15ns ± 0%    2.15ns ± 0%       ~     (all equal)
ClearFat64-160    3.25ns ± 1%    3.26ns ± 1%       ~     (p=0.841 n=5+5)
ClearFat72-160    2.22ns ± 0%    2.22ns ± 0%       ~     (p=0.444 n=5+5)
ClearFat128-160   4.03ns ± 0%    4.04ns ± 0%     +0.32%  (p=0.008 n=5+5)
ClearFat256-160   6.44ns ± 0%    6.44ns ± 0%     +0.08%  (p=0.016 n=4+5)
ClearFat512-160   12.2ns ± 0%    12.2ns ± 0%     +0.13%  (p=0.008 n=5+5)
ClearFat1024-160  24.3ns ± 0%    24.3ns ± 0%       ~     (p=0.167 n=5+5)
ClearFat1032-160  24.5ns ± 0%    24.5ns ± 0%       ~     (p=0.238 n=4+5)
ClearFat1040-160  29.2ns ± 0%    29.3ns ± 0%     +0.34%  (p=0.008 n=5+5)
CopyFat7-160      1.43ns ± 0%    1.07ns ± 0%    -24.97%  (p=0.008 n=5+5)
CopyFat8-160      0.89ns ± 0%    0.89ns ± 0%       ~     (p=0.238 n=5+5)
CopyFat11-160     1.43ns ± 0%    1.07ns ± 0%    -24.97%  (p=0.008 n=5+5)
CopyFat12-160     1.07ns ± 0%    1.07ns ± 0%       ~     (p=0.238 n=5+4)
CopyFat13-160     1.43ns ± 0%    1.07ns ± 0%       ~     (p=0.079 n=4+5)
CopyFat14-160     1.43ns ± 0%    1.07ns ± 0%    -24.95%  (p=0.008 n=5+5)
CopyFat15-160     1.79ns ± 0%    1.07ns ± 0%       ~     (p=0.079 n=4+5)
CopyFat16-160     1.07ns ± 0%    1.07ns ± 0%       ~     (p=0.444 n=5+5)
CopyFat24-160     1.84ns ± 2%    1.67ns ± 0%     -9.28%  (p=0.008 n=5+5)
CopyFat32-160     3.22ns ± 0%    2.92ns ± 0%     -9.40%  (p=0.008 n=5+5)
CopyFat64-160     3.64ns ± 0%    3.57ns ± 0%     -1.96%  (p=0.008 n=5+5)
CopyFat72-160     3.56ns ± 0%    3.11ns ± 0%    -12.89%  (p=0.008 n=5+5)
CopyFat128-160    5.06ns ± 0%    5.06ns ± 0%     +0.04%  (p=0.048 n=5+5)
CopyFat256-160    9.13ns ± 0%    9.13ns ± 0%       ~     (p=0.659 n=5+5)
CopyFat512-160    17.4ns ± 0%    17.4ns ± 0%       ~     (p=0.167 n=5+5)
CopyFat520-160    17.2ns ± 0%    17.3ns ± 0%     +0.37%  (p=0.008 n=5+5)
CopyFat1024-160   34.1ns ± 0%    34.0ns ± 0%       ~     (p=0.127 n=5+5)
CopyFat1032-160   80.9ns ± 0%    34.2ns ± 0%    -57.74%  (p=0.008 n=5+5)
CopyFat1040-160   94.4ns ± 0%    41.7ns ± 0%    -55.78%  (p=0.016 n=5+4)
Change-Id: I14186f9f82b0ecf8b6c02191dc5da566b9a21e6c
Reviewed-on: https://go-review.googlesource.com/c/go/+/421654
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
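The ClearFat and CopyFat benchmarks above time fixed-size ("fat") value copies and clears. For small sizes the compiler expands these inline as loads and stores rather than calling `runtime.memmove`. The sketch below is a hypothetical benchmark in that style. `fat15`, `copyFat15`, and `BenchmarkCopyFat15Sketch` are illustrative names, not the runtime's source. Whether arm64 actually emits LDP/STP for this copy depends on the compiler version; `go build -gcflags=-S` shows the generated code.

```go
package main

import "testing"

// fat15 is a hypothetical 15-byte value type. Assigning one to another is a
// fixed-size copy that, at this size, the compiler typically lowers to inline
// loads and stores instead of a call to runtime.memmove.
type fat15 [15]byte

// copyFat15 copies *src into *dst. The noinline directive keeps the copy in
// its own function so the generated code is easy to find in -S output.
//
//go:noinline
func copyFat15(dst, src *fat15) {
	*dst = *src
}

// BenchmarkCopyFat15Sketch is a hypothetical benchmark in the style of the
// CopyFat* rows above. It includes the call overhead of copyFat15.
func BenchmarkCopyFat15Sketch(b *testing.B) {
	var src fat15
	dst := new(fat15)
	for i := 0; i < b.N; i++ {
		copyFat15(dst, &src)
	}
}
```

Odd sizes such as 7, 11, 13, and 15 bytes are where the table shows the largest gains. A plausible reading is that overlapping wide accesses replace sequences of narrower ones, but the commit message does not spell this out.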
2022-08-12 runtime: run "gofmt -s -w" (Cuong Manh Le)
Change-Id: I7eb3de35d1f1f0237962735450b37d738966f30c Reviewed-on: https://go-review.googlesource.com/c/go/+/423254 Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2022-05-20 runtime: add BenchmarkMemclrRange (nimelehin)
This benchmark is added to test improvements in memclr_amd64. As stated in section 15.16.3.3 of the Intel Optimization Manual, an AVX2 memclr implementation can produce skewed results when the branch predictor has been trained by a large loop iteration count. This benchmark generates sizes within a specified range, which should help measure how memclr performs when branch predictors may be incorrectly trained. Change-Id: I14d173cafe43ca47198ed920e655547a66b3909f Reviewed-on: https://go-review.googlesource.com/c/go/+/373362 Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Keith Randall <khr@google.com>
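The idea described above is to clear buffers whose lengths vary within a range, so the clear loop never settles into one predictable trip count. The sketch below shows one way to do that, assuming sizes are drawn uniformly from [lo, hi) with a fixed seed for repeatability. All function names are hypothetical. It uses the builtin `clear` (Go 1.21+) rather than the runtime-internal memclr.

```go
package main

import (
	"math/rand"
	"testing"
)

// rangeSizes returns count sizes drawn uniformly from [lo, hi) with a fixed
// seed, so every run clears the same mix of lengths. Requires hi > lo.
func rangeSizes(lo, hi, count int, seed int64) []int {
	r := rand.New(rand.NewSource(seed))
	sizes := make([]int, count)
	for i := range sizes {
		sizes[i] = lo + r.Intn(hi-lo)
	}
	return sizes
}

// benchmarkClearRange is a hypothetical benchmark body: each iteration
// clears buf[:n] for every n in sizes, so the clear loop's trip count keeps
// changing instead of repeating one fixed length.
func benchmarkClearRange(b *testing.B, lo, hi int) {
	sizes := rangeSizes(lo, hi, 1024, 1)
	total := 0
	for _, n := range sizes {
		total += n
	}
	buf := make([]byte, hi)
	b.SetBytes(int64(total)) // bytes cleared per iteration
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		for _, n := range sizes {
			clear(buf[:n]) // builtin clear, Go 1.21+
		}
	}
}

// BenchmarkClearRangeSketch uses a 1K..2K range, similar in spirit to the
// MemclrRange/1K_2K case in the benchmark data above.
func BenchmarkClearRangeSketch(b *testing.B) {
	benchmarkClearRange(b, 1<<10, 2<<10)
}
```

Precomputing the sizes keeps the random number generator out of the timed loop, so the measurement covers only the clears.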
2020-11-02 runtime: improve memmove performance on arm64 (Jonathan Swinney)
Replace the memmove implementation for moves of 17 bytes or larger with an implementation from ARM optimized software. The moves of 16 bytes or fewer are unchanged, but the registers used are updated to match the rest of the implementation.
This implementation makes use of new optimizations:
- software pipelined loop for large (>128 byte) moves
- medium size moves (17..128 bytes) have a new implementation
- address realignment when src or dst is unaligned
- preference for aligned src (loads) or dst (stores) depending on CPU
To support preference for aligned loads or aligned stores, a new CPU flag is added. This flag indicates that the detected micro architecture performs better with aligned loads. Some tested CPUs did not exhibit a significant difference and are left with the default behavior of realigning based on the destination address (stores).
Neoverse N1 (Tested on Graviton 2) name old time/op new time/op delta Memmove/0-4 1.88ns ± 1% 1.87ns ± 1% -0.58% (p=0.020 n=10+10) Memmove/1-4 4.40ns ± 0% 4.40ns ± 0% ~ (all equal) Memmove/8-4 3.88ns ± 3% 3.80ns ± 0% -1.97% (p=0.001 n=10+9) Memmove/16-4 3.90ns ± 3% 3.80ns ± 0% -2.49% (p=0.000 n=10+9) Memmove/32-4 4.80ns ± 0% 4.40ns ± 0% -8.33% (p=0.000 n=9+8) Memmove/64-4 5.86ns ± 0% 5.00ns ± 0% -14.76% (p=0.000 n=8+8) Memmove/128-4 8.46ns ± 0% 8.06ns ± 0% -4.62% (p=0.000 n=10+10) Memmove/256-4 12.4ns ± 0% 12.2ns ± 0% -1.61% (p=0.000 n=10+10) Memmove/512-4 19.5ns ± 0% 19.1ns ± 0% -2.05% (p=0.000 n=10+10) Memmove/1024-4 33.7ns ± 0% 33.5ns ± 0% -0.59% (p=0.000 n=10+10) Memmove/2048-4 62.1ns ± 0% 59.0ns ± 0% -4.99% (p=0.000 n=10+10) Memmove/4096-4 117ns ± 1% 110ns ± 0% -5.66% (p=0.000 n=10+10) MemmoveUnalignedDst/64-4 6.41ns ± 0% 5.62ns ± 0% -12.32% (p=0.000 n=10+7) MemmoveUnalignedDst/128-4 9.40ns ± 0% 8.34ns ± 0% -11.24% (p=0.000 n=10+10) MemmoveUnalignedDst/256-4 12.8ns ± 0% 12.8ns ± 0% ~ (all equal) MemmoveUnalignedDst/512-4 20.4ns ± 0% 19.7ns ± 0% -3.43% (p=0.000 n=9+10) MemmoveUnalignedDst/1024-4 34.1ns ± 0% 35.1ns ± 
0% +2.93% (p=0.000 n=9+9) MemmoveUnalignedDst/2048-4 61.5ns ± 0% 60.4ns ± 0% -1.77% (p=0.000 n=10+10) MemmoveUnalignedDst/4096-4 122ns ± 0% 113ns ± 0% -7.38% (p=0.002 n=8+10) MemmoveUnalignedSrc/64-4 7.25ns ± 1% 6.26ns ± 0% -13.64% (p=0.000 n=9+9) MemmoveUnalignedSrc/128-4 10.5ns ± 0% 9.7ns ± 0% -7.52% (p=0.000 n=10+10) MemmoveUnalignedSrc/256-4 17.1ns ± 0% 17.3ns ± 0% +1.17% (p=0.000 n=10+10) MemmoveUnalignedSrc/512-4 27.0ns ± 0% 27.0ns ± 0% ~ (all equal) MemmoveUnalignedSrc/1024-4 46.7ns ± 0% 35.7ns ± 0% -23.55% (p=0.000 n=10+9) MemmoveUnalignedSrc/2048-4 85.2ns ± 0% 61.2ns ± 0% -28.17% (p=0.000 n=10+8) MemmoveUnalignedSrc/4096-4 162ns ± 0% 113ns ± 0% -30.25% (p=0.000 n=10+10) name old speed new speed delta Memmove/4096-4 35.2GB/s ± 0% 37.1GB/s ± 0% +5.56% (p=0.000 n=10+9) MemmoveUnalignedSrc/1024-4 21.9GB/s ± 0% 28.7GB/s ± 0% +30.90% (p=0.000 n=10+10) MemmoveUnalignedSrc/2048-4 24.0GB/s ± 0% 33.5GB/s ± 0% +39.18% (p=0.000 n=10+9) MemmoveUnalignedSrc/4096-4 25.3GB/s ± 0% 36.2GB/s ± 0% +43.50% (p=0.000 n=10+7) Cortex-A72 (Graviton 1) name old time/op new time/op delta Memmove/0-4 3.06ns ± 3% 3.08ns ± 1% ~ (p=0.958 n=10+9) Memmove/1-4 8.72ns ± 0% 7.85ns ± 0% -9.98% (p=0.002 n=8+10) Memmove/8-4 8.29ns ± 0% 8.29ns ± 0% ~ (all equal) Memmove/16-4 8.29ns ± 0% 8.29ns ± 0% ~ (all equal) Memmove/32-4 8.19ns ± 2% 8.29ns ± 0% ~ (p=0.114 n=10+10) Memmove/64-4 18.3ns ± 4% 10.0ns ± 0% -45.36% (p=0.000 n=10+10) Memmove/128-4 14.8ns ± 0% 17.4ns ± 0% +17.77% (p=0.000 n=10+10) Memmove/256-4 21.8ns ± 0% 23.1ns ± 0% +5.96% (p=0.000 n=10+10) Memmove/512-4 35.8ns ± 0% 37.2ns ± 0% +3.91% (p=0.000 n=10+10) Memmove/1024-4 63.7ns ± 0% 67.2ns ± 0% +5.49% (p=0.000 n=10+10) Memmove/2048-4 126ns ± 0% 123ns ± 0% -2.38% (p=0.000 n=10+10) Memmove/4096-4 238ns ± 1% 243ns ± 1% +1.93% (p=0.000 n=10+10) MemmoveUnalignedDst/64-4 19.3ns ± 1% 12.0ns ± 1% -37.49% (p=0.000 n=10+10) MemmoveUnalignedDst/128-4 17.2ns ± 0% 17.4ns ± 0% +1.16% (p=0.000 n=10+10) MemmoveUnalignedDst/256-4 28.2ns ± 8% 29.2ns ± 0% 
~ (p=0.352 n=10+10) MemmoveUnalignedDst/512-4 49.8ns ± 3% 48.9ns ± 0% ~ (p=1.000 n=10+10) MemmoveUnalignedDst/1024-4 89.5ns ± 0% 80.5ns ± 1% -10.02% (p=0.000 n=10+10) MemmoveUnalignedDst/2048-4 180ns ± 0% 127ns ± 0% -29.44% (p=0.000 n=9+10) MemmoveUnalignedDst/4096-4 347ns ± 0% 244ns ± 0% -29.59% (p=0.000 n=10+9) MemmoveUnalignedSrc/128-4 16.1ns ± 0% 21.8ns ± 0% +35.40% (p=0.000 n=10+10) MemmoveUnalignedSrc/256-4 24.9ns ± 8% 26.6ns ± 0% +6.70% (p=0.015 n=10+10) MemmoveUnalignedSrc/512-4 39.4ns ± 6% 40.6ns ± 0% ~ (p=0.352 n=10+10) MemmoveUnalignedSrc/1024-4 72.5ns ± 0% 83.0ns ± 1% +14.44% (p=0.000 n=9+10) MemmoveUnalignedSrc/2048-4 129ns ± 1% 128ns ± 1% ~ (p=0.179 n=10+10) MemmoveUnalignedSrc/4096-4 241ns ± 0% 253ns ± 1% +4.99% (p=0.000 n=9+9) Cortex-A53 (Raspberry Pi 3) name old time/op new time/op delta Memmove/0-4 11.0ns ± 0% 11.0ns ± 1% ~ (p=0.294 n=8+10) Memmove/1-4 29.6ns ± 0% 28.0ns ± 1% -5.41% (p=0.000 n=9+10) Memmove/8-4 23.5ns ± 0% 22.1ns ± 0% -6.11% (p=0.000 n=8+8) Memmove/16-4 23.7ns ± 1% 22.1ns ± 0% -6.59% (p=0.000 n=10+8) Memmove/32-4 27.9ns ± 0% 27.1ns ± 0% -3.13% (p=0.000 n=8+8) Memmove/64-4 33.8ns ± 0% 31.5ns ± 1% -6.99% (p=0.000 n=8+10) Memmove/128-4 45.6ns ± 0% 44.2ns ± 1% -3.23% (p=0.000 n=9+10) Memmove/256-4 69.3ns ± 0% 69.3ns ± 0% ~ (p=0.072 n=8+8) Memmove/512-4 127ns ± 0% 110ns ± 0% -13.39% (p=0.000 n=8+8) Memmove/1024-4 222ns ± 0% 205ns ± 1% -7.66% (p=0.000 n=7+10) Memmove/2048-4 411ns ± 0% 366ns ± 0% -10.98% (p=0.000 n=8+9) Memmove/4096-4 795ns ± 1% 695ns ± 1% -12.63% (p=0.000 n=10+10) MemmoveUnalignedDst/64-4 44.0ns ± 0% 40.5ns ± 0% -7.93% (p=0.000 n=8+8) MemmoveUnalignedDst/128-4 59.6ns ± 0% 54.9ns ± 0% -7.85% (p=0.000 n=9+9) MemmoveUnalignedDst/256-4 98.2ns ±11% 90.0ns ± 1% ~ (p=0.130 n=10+10) MemmoveUnalignedDst/512-4 161ns ± 2% 145ns ± 1% -9.96% (p=0.000 n=10+10) MemmoveUnalignedDst/1024-4 281ns ± 0% 265ns ± 0% -5.65% (p=0.000 n=9+8) MemmoveUnalignedDst/2048-4 528ns ± 0% 482ns ± 0% -8.73% (p=0.000 n=8+9) MemmoveUnalignedDst/4096-4 
1.02µs ± 1% 0.92µs ± 0% -10.00% (p=0.000 n=10+8) MemmoveUnalignedSrc/64-4 42.4ns ± 1% 40.5ns ± 0% -4.39% (p=0.000 n=10+8) MemmoveUnalignedSrc/128-4 57.4ns ± 0% 57.0ns ± 1% -0.75% (p=0.048 n=9+10) MemmoveUnalignedSrc/256-4 88.1ns ± 1% 89.6ns ± 0% +1.70% (p=0.000 n=9+8) MemmoveUnalignedSrc/512-4 160ns ± 2% 144ns ± 0% -9.89% (p=0.000 n=10+8) MemmoveUnalignedSrc/1024-4 286ns ± 0% 266ns ± 1% -6.69% (p=0.000 n=8+10) MemmoveUnalignedSrc/2048-4 525ns ± 0% 483ns ± 1% -7.96% (p=0.000 n=9+10) MemmoveUnalignedSrc/4096-4 1.01µs ± 0% 0.92µs ± 1% -9.40% (p=0.000 n=8+10) Change-Id: Ia1144e9d4dfafdece6e167c5e576bf80f254c8ab Reviewed-on: https://go-review.googlesource.com/c/go/+/243357 TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com> Reviewed-by: eric fang <eric.fang@arm.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
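The "address realignment" optimization described above splits a copy into three parts. First an unaligned head brings one pointer to an alignment boundary. Then an aligned body moves in fixed-size chunks. Finally a short tail finishes the copy. The real arm64 code is assembly and can realign on either src or dst. The sketch below shows only the default destination-realigning split, in Go. `alignHead` and `copyRealignDst` are hypothetical names, the 16-byte boundary is an assumption, and each phase uses the builtin `copy`, so this illustrates how the work is divided without emitting 16-byte instructions itself.

```go
package main

import "unsafe"

// alignHead returns how many bytes must be copied before addr reaches the
// next multiple of align (0 if addr is already aligned). align must be a
// power of two.
func alignHead(addr, align uintptr) uintptr {
	return (align - addr&(align-1)) & (align - 1)
}

// copyRealignDst copies min(len(dst), len(src)) bytes from src to dst
// (assumed non-overlapping) in three phases: a head that brings dst to a
// 16-byte boundary, a body of whole 16-byte chunks starting at aligned dst
// addresses, and a short tail. It returns the head length.
func copyRealignDst(dst, src []byte) int {
	n := len(src)
	if len(dst) < n {
		n = len(dst)
	}
	if n == 0 {
		return 0
	}
	head := int(alignHead(uintptr(unsafe.Pointer(&dst[0])), 16))
	if head > n {
		head = n
	}
	copy(dst[:head], src[:head])
	i := head
	for ; i+16 <= n; i += 16 {
		copy(dst[i:i+16], src[i:i+16]) // &dst[i] is 16-byte aligned here
	}
	copy(dst[i:n], src[i:n]) // tail shorter than 16 bytes
	return head
}
```

Realigning on dst keeps every store in the body aligned while loads may stay unaligned. The commit's new CPU flag flips this preference to aligned loads on microarchitectures where that is faster.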
2020-10-27 runtime: add 2-byte and 8-byte sub-benchmarks for memmove load/store (Heisenberg)
Change-Id: I6389d7efe90836b6ece44d2e75053d1ad9f35d08 Reviewed-on: https://go-review.googlesource.com/c/go/+/253417 Trust: Emmanuel Odeke <emmanuel@orijtech.com> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 runtime: test memmove writes pointers atomically (Cherry Zhang)
In the previous CL we ensured that memmove writes pointers atomically, so the concurrent GC won't observe a partially updated pointer. This CL adds a test. Change-Id: Icd1124bf3a15ef25bac20c7fb8933f1a642d897c Reviewed-on: https://go-review.googlesource.com/c/go/+/212627 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-05-16 runtime: disable some tests in -quick mode (Russ Cox)
Speeds up the "go test runtime -cpu=1,2,4 -short -quick" phase of all.bash. For #26473. Change-Id: I090f5a5aa754462b3253a2156dc31fa67ce7af2a Reviewed-on: https://go-review.googlesource.com/c/go/+/177399 Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Austin Clements <austin@google.com>
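The message above runs the runtime tests with both `-short` and a `-quick` flag. The sketch below shows one way to gate a slow test on both, assuming a package-level boolean flag named `quick`. `flagQuick`, `runSlowTests`, and `skipIfSlow` are hypothetical names, not the runtime's actual code. The skip decision lives in a pure function so it can be checked without a `*testing.T`.

```go
package main

import (
	"flag"
	"testing"
)

// flagQuick is a hypothetical package-level flag; with it registered,
// `go test -short -quick` would skip the tests gated below.
var flagQuick = flag.Bool("quick", false, "skip slow tests")

// runSlowTests reports whether slow tests should run, given the -short and
// -quick settings.
func runSlowTests(short, quick bool) bool {
	return !short && !quick
}

// skipIfSlow is meant to be called at the top of a slow test. testing.Short
// is only valid after flags are parsed, which `go test` does before any
// test function runs.
func skipIfSlow(t testing.TB) {
	t.Helper()
	if !runSlowTests(testing.Short(), *flagQuick) {
		t.Skip("skipping slow test in -short or -quick mode")
	}
}
```

Skipping, rather than silently shrinking a test, keeps the omission visible in `go test -v` output.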
2018-03-06 runtime, cmd/compile: use ldp for DUFFCOPY on ARM64 (Meng Zhuo)
name         old time/op    new time/op    delta
CopyFat8     2.15ns ± 1%    2.19ns ± 6%       ~     (p=0.171 n=8+9)
CopyFat12    2.15ns ± 0%    2.17ns ± 2%       ~     (p=0.137 n=8+10)
CopyFat16    2.17ns ± 3%    2.15ns ± 0%       ~     (p=0.211 n=10+10)
CopyFat24    2.16ns ± 1%    2.15ns ± 0%       ~     (p=0.087 n=10+10)
CopyFat32    11.5ns ± 0%    12.8ns ± 2%    +10.87%  (p=0.000 n=8+10)
CopyFat64    20.2ns ± 2%    12.9ns ± 0%    -36.11%  (p=0.000 n=10+10)
CopyFat128   37.2ns ± 0%    21.5ns ± 0%    -42.20%  (p=0.000 n=10+10)
CopyFat256   71.6ns ± 0%    38.7ns ± 0%    -45.95%  (p=0.000 n=10+10)
CopyFat512    140ns ± 0%      73ns ± 0%    -47.86%  (p=0.000 n=10+9)
CopyFat520    142ns ± 0%      74ns ± 0%    -47.54%  (p=0.000 n=10+10)
CopyFat1024   277ns ± 0%     141ns ± 0%    -49.10%  (p=0.000 n=10+10)
Change-Id: If54bc571add5db674d5e081579c87e80153d0a5a
Reviewed-on: https://go-review.googlesource.com/97395
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-31 runtime: shorten tests in all.bash (Russ Cox)

This cuts 23 seconds from all.bash on my MacBook Pro.

Change-Id: Ibc4d7c01660b9e9ebd088dd55ba993f0d7ec6aa3
Reviewed-on: https://go-review.googlesource.com/73991
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-01-23 runtime: amd64, use 4-byte ops for memmove of 4 bytes (Keith Randall)

memmove used to move 4 bytes with two 2-byte load/store pairs. When the result was then read with a single 4-byte load, this caused a store-to-load forwarding stall.

To avoid the stall, special-case memmove to use 4-byte ops for the 4-byte copy case. We already have a special case for 8-byte copies, and 386 already specializes 4-byte copies.

I'll do 2-byte copies also, but not for 1.8.

benchmark              old ns/op     new ns/op     delta
BenchmarkIssue18740-8  7567          4799          -36.58%

3-byte copies get a bit slower. Other copies are unchanged.

name         old time/op  new time/op  delta
Memmove/3-8  4.76ns ± 5%  5.26ns ± 3%  +10.50%  (p=0.000 n=10+10)

Fixes #18740

Change-Id: Iec82cbac0ecfee80fa3c8fc83828f9a1819c3c74
Reviewed-on: https://go-review.googlesource.com/35567
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-11-04 all: sprinkle t.Parallel on some slow tests (Brad Fitzpatrick)

I used the slowtests.go tool as described in https://golang.org/cl/32684 on packages that stood out.

"go test -short std" drops from ~56 to ~52 seconds. This isn't a huge win, but it was mostly an exercise.

Updates #17751

Change-Id: I9f3402e36a038d71e662d06ce2c1d52f6c4b674d
Reviewed-on: https://go-review.googlesource.com/32751
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-10-06 runtime: improve memmove for amd64 (Denis Nagorny)

Use AVX, if available, on 4th-generation Intel(TM) Core(TM) processors.

Benchmarks collected on an E5 2609v3 @ 1.9GHz:

name                        old speed      new speed       delta
Memmove/1-6                 158MB/s ± 0%   172MB/s ± 0%    +9.09%    (p=0.000 n=16+16)
Memmove/2-6                 316MB/s ± 0%   345MB/s ± 0%    +9.09%    (p=0.000 n=18+16)
Memmove/3-6                 517MB/s ± 0%   517MB/s ± 0%    ~         (p=0.445 n=16+16)
Memmove/4-6                 687MB/s ± 1%   690MB/s ± 0%    +0.35%    (p=0.000 n=20+17)
Memmove/5-6                 729MB/s ± 0%   729MB/s ± 0%    +0.01%    (p=0.000 n=16+18)
Memmove/6-6                 875MB/s ± 0%   875MB/s ± 0%    +0.01%    (p=0.000 n=18+18)
Memmove/7-6                 1.02GB/s ± 0%  1.02GB/s ± 1%   ~         (p=0.139 n=19+20)
Memmove/8-6                 1.26GB/s ± 0%  1.26GB/s ± 0%   +0.00%    (p=0.000 n=18+18)
Memmove/9-6                 1.42GB/s ± 0%  1.42GB/s ± 0%   +0.00%    (p=0.000 n=17+18)
Memmove/10-6                1.58GB/s ± 0%  1.58GB/s ± 0%   +0.00%    (p=0.000 n=19+19)
Memmove/11-6                1.74GB/s ± 0%  1.74GB/s ± 0%   +0.00%    (p=0.001 n=18+17)
Memmove/12-6                1.90GB/s ± 0%  1.90GB/s ± 0%   +0.00%    (p=0.000 n=19+19)
Memmove/13-6                2.05GB/s ± 0%  2.05GB/s ± 0%   +0.00%    (p=0.000 n=18+19)
Memmove/14-6                2.21GB/s ± 0%  2.21GB/s ± 0%   +0.00%    (p=0.000 n=16+20)
Memmove/15-6                2.37GB/s ± 0%  2.37GB/s ± 0%   +0.00%    (p=0.004 n=19+20)
Memmove/16-6                2.53GB/s ± 0%  2.53GB/s ± 0%   +0.00%    (p=0.000 n=16+16)
Memmove/32-6                4.67GB/s ± 0%  4.67GB/s ± 0%   +0.00%    (p=0.000 n=17+17)
Memmove/64-6                8.67GB/s ± 0%  8.64GB/s ± 0%   -0.33%    (p=0.000 n=18+17)
Memmove/128-6               12.6GB/s ± 0%  11.6GB/s ± 0%   -8.05%    (p=0.000 n=16+19)
Memmove/256-6               16.3GB/s ± 0%  16.6GB/s ± 0%   +1.66%    (p=0.000 n=20+18)
Memmove/512-6               21.5GB/s ± 0%  24.4GB/s ± 0%   +13.35%   (p=0.000 n=18+17)
Memmove/1024-6              24.7GB/s ± 0%  33.7GB/s ± 0%   +36.12%   (p=0.000 n=18+18)
Memmove/2048-6              27.3GB/s ± 0%  43.3GB/s ± 0%   +58.77%   (p=0.000 n=19+17)
Memmove/4096-6              37.5GB/s ± 0%  50.5GB/s ± 0%   +34.56%   (p=0.000 n=19+19)
MemmoveUnalignedDst/1-6     135MB/s ± 0%   146MB/s ± 0%    +7.69%    (p=0.000 n=16+14)
MemmoveUnalignedDst/2-6     271MB/s ± 0%   292MB/s ± 0%    +7.69%    (p=0.000 n=18+18)
MemmoveUnalignedDst/3-6     438MB/s ± 0%   438MB/s ± 0%    ~         (p=0.352 n=16+19)
MemmoveUnalignedDst/4-6     584MB/s ± 0%   584MB/s ± 0%    ~         (p=0.876 n=17+17)
MemmoveUnalignedDst/5-6     631MB/s ± 1%   632MB/s ± 0%    +0.25%    (p=0.000 n=20+17)
MemmoveUnalignedDst/6-6     759MB/s ± 0%   759MB/s ± 0%    +0.00%    (p=0.000 n=19+16)
MemmoveUnalignedDst/7-6     885MB/s ± 0%   883MB/s ± 1%    ~         (p=0.647 n=18+20)
MemmoveUnalignedDst/8-6     1.08GB/s ± 0%  1.08GB/s ± 0%   +0.00%    (p=0.035 n=19+18)
MemmoveUnalignedDst/9-6     1.22GB/s ± 0%  1.22GB/s ± 0%   ~         (p=0.251 n=18+17)
MemmoveUnalignedDst/10-6    1.35GB/s ± 0%  1.35GB/s ± 0%   ~         (p=0.327 n=17+18)
MemmoveUnalignedDst/11-6    1.49GB/s ± 0%  1.49GB/s ± 0%   ~         (p=0.531 n=18+19)
MemmoveUnalignedDst/12-6    1.63GB/s ± 0%  1.63GB/s ± 0%   ~         (p=0.886 n=19+18)
MemmoveUnalignedDst/13-6    1.76GB/s ± 0%  1.76GB/s ± 1%   -0.24%    (p=0.006 n=18+20)
MemmoveUnalignedDst/14-6    1.90GB/s ± 0%  1.90GB/s ± 0%   ~         (p=0.818 n=20+19)
MemmoveUnalignedDst/15-6    2.03GB/s ± 0%  2.03GB/s ± 0%   ~         (p=0.294 n=17+16)
MemmoveUnalignedDst/16-6    2.17GB/s ± 0%  2.17GB/s ± 0%   ~         (p=0.602 n=16+18)
MemmoveUnalignedDst/32-6    4.05GB/s ± 0%  4.05GB/s ± 0%   +0.00%    (p=0.010 n=18+17)
MemmoveUnalignedDst/64-6    7.59GB/s ± 0%  7.59GB/s ± 0%   +0.00%    (p=0.022 n=18+16)
MemmoveUnalignedDst/128-6   11.1GB/s ± 0%  11.4GB/s ± 0%   +2.79%    (p=0.000 n=18+17)
MemmoveUnalignedDst/256-6   16.4GB/s ± 0%  16.7GB/s ± 0%   +1.59%    (p=0.000 n=20+17)
MemmoveUnalignedDst/512-6   15.7GB/s ± 0%  21.3GB/s ± 0%   +35.87%   (p=0.000 n=18+20)
MemmoveUnalignedDst/1024-6  16.0GB/s ±20%  31.5GB/s ± 0%   +96.93%   (p=0.000 n=20+14)
MemmoveUnalignedDst/2048-6  19.6GB/s ± 0%  42.1GB/s ± 0%   +115.16%  (p=0.000 n=17+18)
MemmoveUnalignedDst/4096-6  6.41GB/s ± 0%  33.18GB/s ± 0%  +417.56%  (p=0.000 n=17+18)
MemmoveUnalignedSrc/1-6     171MB/s ± 0%   166MB/s ± 0%    -3.33%    (p=0.000 n=19+16)
MemmoveUnalignedSrc/2-6     343MB/s ± 0%   342MB/s ± 1%    -0.41%    (p=0.000 n=17+20)
MemmoveUnalignedSrc/3-6     508MB/s ± 0%   493MB/s ± 1%    -2.90%    (p=0.000 n=17+17)
MemmoveUnalignedSrc/4-6     677MB/s ± 0%   660MB/s ± 2%    -2.55%    (p=0.000 n=17+20)
MemmoveUnalignedSrc/5-6     790MB/s ± 0%   790MB/s ± 0%    ~         (p=0.139 n=17+17)
MemmoveUnalignedSrc/6-6     948MB/s ± 0%   946MB/s ± 1%    ~         (p=0.330 n=17+19)
MemmoveUnalignedSrc/7-6     1.11GB/s ± 0%  1.11GB/s ± 0%   -0.05%    (p=0.026 n=17+17)
MemmoveUnalignedSrc/8-6     1.38GB/s ± 0%  1.38GB/s ± 0%   ~         (p=0.091 n=18+16)
MemmoveUnalignedSrc/9-6     1.42GB/s ± 0%  1.40GB/s ± 1%   -1.04%    (p=0.000 n=19+20)
MemmoveUnalignedSrc/10-6    1.58GB/s ± 0%  1.56GB/s ± 1%   -1.15%    (p=0.000 n=18+19)
MemmoveUnalignedSrc/11-6    1.73GB/s ± 0%  1.71GB/s ± 1%   -1.30%    (p=0.000 n=20+20)
MemmoveUnalignedSrc/12-6    1.89GB/s ± 0%  1.87GB/s ± 1%   -1.18%    (p=0.000 n=17+20)
MemmoveUnalignedSrc/13-6    2.05GB/s ± 0%  2.02GB/s ± 1%   -1.18%    (p=0.000 n=17+20)
MemmoveUnalignedSrc/14-6    2.21GB/s ± 0%  2.18GB/s ± 1%   -1.14%    (p=0.000 n=17+20)
MemmoveUnalignedSrc/15-6    2.36GB/s ± 0%  2.34GB/s ± 1%   -1.04%    (p=0.000 n=17+20)
MemmoveUnalignedSrc/16-6    2.52GB/s ± 0%  2.49GB/s ± 1%   -1.26%    (p=0.000 n=19+20)
MemmoveUnalignedSrc/32-6    4.82GB/s ± 0%  4.61GB/s ± 0%   -4.40%    (p=0.000 n=19+20)
MemmoveUnalignedSrc/64-6    5.03GB/s ± 4%  7.97GB/s ± 0%   +58.55%   (p=0.000 n=20+16)
MemmoveUnalignedSrc/128-6   11.1GB/s ± 0%  11.2GB/s ± 0%   +0.52%    (p=0.000 n=17+18)
MemmoveUnalignedSrc/256-6   16.5GB/s ± 0%  16.4GB/s ± 0%   -0.10%    (p=0.000 n=20+18)
MemmoveUnalignedSrc/512-6   21.0GB/s ± 0%  22.1GB/s ± 0%   +5.48%    (p=0.000 n=14+17)
MemmoveUnalignedSrc/1024-6  24.9GB/s ± 0%  31.9GB/s ± 0%   +28.20%   (p=0.000 n=19+20)
MemmoveUnalignedSrc/2048-6  23.3GB/s ± 0%  33.8GB/s ± 0%   +45.22%   (p=0.000 n=17+19)
MemmoveUnalignedSrc/4096-6  37.3GB/s ± 0%  42.7GB/s ± 0%   +14.30%   (p=0.000 n=17+17)

Change-Id: Id66aa3e499ccfb117cb99d623ef326b50d057b64
Reviewed-on: https://go-review.googlesource.com/29590
Run-TryBot: Denis Nagorny <denis.nagorny@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-08-31 Revert "runtime: improve memmove for amd64" (Joe Tsai)

This reverts commit 3607c5f4f18ad4d423e40996ebf7f46b2f79ce02.

This was causing failures on amd64 machines without AVX.

Fixes #16939

Change-Id: I70080fbb4e7ae791857334f2bffd847d08cb25fa
Reviewed-on: https://go-review.googlesource.com/28274
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-08-31 runtime: improve memmove for amd64 (Denis Nagorny)

Use AVX, if available, on 4th-generation Intel(TM) Core(TM) processors.

Benchmarks collected on an E5 2609v3 @ 1.9GHz:

name                        old speed      new speed       delta
Memmove/1-6                 158MB/s ± 0%   172MB/s ± 0%    +9.09%    (p=0.000 n=16+16)
Memmove/2-6                 316MB/s ± 0%   345MB/s ± 0%    +9.09%    (p=0.000 n=18+16)
Memmove/3-6                 517MB/s ± 0%   517MB/s ± 0%    ~         (p=0.445 n=16+16)
Memmove/4-6                 687MB/s ± 1%   690MB/s ± 0%    +0.35%    (p=0.000 n=20+17)
Memmove/5-6                 729MB/s ± 0%   729MB/s ± 0%    +0.01%    (p=0.000 n=16+18)
Memmove/6-6                 875MB/s ± 0%   875MB/s ± 0%    +0.01%    (p=0.000 n=18+18)
Memmove/7-6                 1.02GB/s ± 0%  1.02GB/s ± 1%   ~         (p=0.139 n=19+20)
Memmove/8-6                 1.26GB/s ± 0%  1.26GB/s ± 0%   +0.00%    (p=0.000 n=18+18)
Memmove/9-6                 1.42GB/s ± 0%  1.42GB/s ± 0%   +0.00%    (p=0.000 n=17+18)
Memmove/10-6                1.58GB/s ± 0%  1.58GB/s ± 0%   +0.00%    (p=0.000 n=19+19)
Memmove/11-6                1.74GB/s ± 0%  1.74GB/s ± 0%   +0.00%    (p=0.001 n=18+17)
Memmove/12-6                1.90GB/s ± 0%  1.90GB/s ± 0%   +0.00%    (p=0.000 n=19+19)
Memmove/13-6                2.05GB/s ± 0%  2.05GB/s ± 0%   +0.00%    (p=0.000 n=18+19)
Memmove/14-6                2.21GB/s ± 0%  2.21GB/s ± 0%   +0.00%    (p=0.000 n=16+20)
Memmove/15-6                2.37GB/s ± 0%  2.37GB/s ± 0%   +0.00%    (p=0.004 n=19+20)
Memmove/16-6                2.53GB/s ± 0%  2.53GB/s ± 0%   +0.00%    (p=0.000 n=16+16)
Memmove/32-6                4.67GB/s ± 0%  4.67GB/s ± 0%   +0.00%    (p=0.000 n=17+17)
Memmove/64-6                8.67GB/s ± 0%  8.64GB/s ± 0%   -0.33%    (p=0.000 n=18+17)
Memmove/128-6               12.6GB/s ± 0%  11.6GB/s ± 0%   -8.05%    (p=0.000 n=16+19)
Memmove/256-6               16.3GB/s ± 0%  16.6GB/s ± 0%   +1.66%    (p=0.000 n=20+18)
Memmove/512-6               21.5GB/s ± 0%  24.4GB/s ± 0%   +13.35%   (p=0.000 n=18+17)
Memmove/1024-6              24.7GB/s ± 0%  33.7GB/s ± 0%   +36.12%   (p=0.000 n=18+18)
Memmove/2048-6              27.3GB/s ± 0%  43.3GB/s ± 0%   +58.77%   (p=0.000 n=19+17)
Memmove/4096-6              37.5GB/s ± 0%  50.5GB/s ± 0%   +34.56%   (p=0.000 n=19+19)
MemmoveUnalignedDst/1-6     135MB/s ± 0%   146MB/s ± 0%    +7.69%    (p=0.000 n=16+14)
MemmoveUnalignedDst/2-6     271MB/s ± 0%   292MB/s ± 0%    +7.69%    (p=0.000 n=18+18)
MemmoveUnalignedDst/3-6     438MB/s ± 0%   438MB/s ± 0%    ~         (p=0.352 n=16+19)
MemmoveUnalignedDst/4-6     584MB/s ± 0%   584MB/s ± 0%    ~         (p=0.876 n=17+17)
MemmoveUnalignedDst/5-6     631MB/s ± 1%   632MB/s ± 0%    +0.25%    (p=0.000 n=20+17)
MemmoveUnalignedDst/6-6     759MB/s ± 0%   759MB/s ± 0%    +0.00%    (p=0.000 n=19+16)
MemmoveUnalignedDst/7-6     885MB/s ± 0%   883MB/s ± 1%    ~         (p=0.647 n=18+20)
MemmoveUnalignedDst/8-6     1.08GB/s ± 0%  1.08GB/s ± 0%   +0.00%    (p=0.035 n=19+18)
MemmoveUnalignedDst/9-6     1.22GB/s ± 0%  1.22GB/s ± 0%   ~         (p=0.251 n=18+17)
MemmoveUnalignedDst/10-6    1.35GB/s ± 0%  1.35GB/s ± 0%   ~         (p=0.327 n=17+18)
MemmoveUnalignedDst/11-6    1.49GB/s ± 0%  1.49GB/s ± 0%   ~         (p=0.531 n=18+19)
MemmoveUnalignedDst/12-6    1.63GB/s ± 0%  1.63GB/s ± 0%   ~         (p=0.886 n=19+18)
MemmoveUnalignedDst/13-6    1.76GB/s ± 0%  1.76GB/s ± 1%   -0.24%    (p=0.006 n=18+20)
MemmoveUnalignedDst/14-6    1.90GB/s ± 0%  1.90GB/s ± 0%   ~         (p=0.818 n=20+19)
MemmoveUnalignedDst/15-6    2.03GB/s ± 0%  2.03GB/s ± 0%   ~         (p=0.294 n=17+16)
MemmoveUnalignedDst/16-6    2.17GB/s ± 0%  2.17GB/s ± 0%   ~         (p=0.602 n=16+18)
MemmoveUnalignedDst/32-6    4.05GB/s ± 0%  4.05GB/s ± 0%   +0.00%    (p=0.010 n=18+17)
MemmoveUnalignedDst/64-6    7.59GB/s ± 0%  7.59GB/s ± 0%   +0.00%    (p=0.022 n=18+16)
MemmoveUnalignedDst/128-6   11.1GB/s ± 0%  11.4GB/s ± 0%   +2.79%    (p=0.000 n=18+17)
MemmoveUnalignedDst/256-6   16.4GB/s ± 0%  16.7GB/s ± 0%   +1.59%    (p=0.000 n=20+17)
MemmoveUnalignedDst/512-6   15.7GB/s ± 0%  21.3GB/s ± 0%   +35.87%   (p=0.000 n=18+20)
MemmoveUnalignedDst/1024-6  16.0GB/s ±20%  31.5GB/s ± 0%   +96.93%   (p=0.000 n=20+14)
MemmoveUnalignedDst/2048-6  19.6GB/s ± 0%  42.1GB/s ± 0%   +115.16%  (p=0.000 n=17+18)
MemmoveUnalignedDst/4096-6  6.41GB/s ± 0%  33.18GB/s ± 0%  +417.56%  (p=0.000 n=17+18)
MemmoveUnalignedSrc/1-6     171MB/s ± 0%   166MB/s ± 0%    -3.33%    (p=0.000 n=19+16)
MemmoveUnalignedSrc/2-6     343MB/s ± 0%   342MB/s ± 1%    -0.41%    (p=0.000 n=17+20)
MemmoveUnalignedSrc/3-6     508MB/s ± 0%   493MB/s ± 1%    -2.90%    (p=0.000 n=17+17)
MemmoveUnalignedSrc/4-6     677MB/s ± 0%   660MB/s ± 2%    -2.55%    (p=0.000 n=17+20)
MemmoveUnalignedSrc/5-6     790MB/s ± 0%   790MB/s ± 0%    ~         (p=0.139 n=17+17)
MemmoveUnalignedSrc/6-6     948MB/s ± 0%   946MB/s ± 1%    ~         (p=0.330 n=17+19)
MemmoveUnalignedSrc/7-6     1.11GB/s ± 0%  1.11GB/s ± 0%   -0.05%    (p=0.026 n=17+17)
MemmoveUnalignedSrc/8-6     1.38GB/s ± 0%  1.38GB/s ± 0%   ~         (p=0.091 n=18+16)
MemmoveUnalignedSrc/9-6     1.42GB/s ± 0%  1.40GB/s ± 1%   -1.04%    (p=0.000 n=19+20)
MemmoveUnalignedSrc/10-6    1.58GB/s ± 0%  1.56GB/s ± 1%   -1.15%    (p=0.000 n=18+19)
MemmoveUnalignedSrc/11-6    1.73GB/s ± 0%  1.71GB/s ± 1%   -1.30%    (p=0.000 n=20+20)
MemmoveUnalignedSrc/12-6    1.89GB/s ± 0%  1.87GB/s ± 1%   -1.18%    (p=0.000 n=17+20)
MemmoveUnalignedSrc/13-6    2.05GB/s ± 0%  2.02GB/s ± 1%   -1.18%    (p=0.000 n=17+20)
MemmoveUnalignedSrc/14-6    2.21GB/s ± 0%  2.18GB/s ± 1%   -1.14%    (p=0.000 n=17+20)
MemmoveUnalignedSrc/15-6    2.36GB/s ± 0%  2.34GB/s ± 1%   -1.04%    (p=0.000 n=17+20)
MemmoveUnalignedSrc/16-6    2.52GB/s ± 0%  2.49GB/s ± 1%   -1.26%    (p=0.000 n=19+20)
MemmoveUnalignedSrc/32-6    4.82GB/s ± 0%  4.61GB/s ± 0%   -4.40%    (p=0.000 n=19+20)
MemmoveUnalignedSrc/64-6    5.03GB/s ± 4%  7.97GB/s ± 0%   +58.55%   (p=0.000 n=20+16)
MemmoveUnalignedSrc/128-6   11.1GB/s ± 0%  11.2GB/s ± 0%   +0.52%    (p=0.000 n=17+18)
MemmoveUnalignedSrc/256-6   16.5GB/s ± 0%  16.4GB/s ± 0%   -0.10%    (p=0.000 n=20+18)
MemmoveUnalignedSrc/512-6   21.0GB/s ± 0%  22.1GB/s ± 0%   +5.48%    (p=0.000 n=14+17)
MemmoveUnalignedSrc/1024-6  24.9GB/s ± 0%  31.9GB/s ± 0%   +28.20%   (p=0.000 n=19+20)
MemmoveUnalignedSrc/2048-6  23.3GB/s ± 0%  33.8GB/s ± 0%   +45.22%   (p=0.000 n=17+19)
MemmoveUnalignedSrc/4096-6  37.3GB/s ± 0%  42.7GB/s ± 0%   +14.30%   (p=0.000 n=17+17)

Change-Id: Iab488d93a293cdf573ab5cd89b95a818bbb5d531
Reviewed-on: https://go-review.googlesource.com/22515
Run-TryBot: Denis Nagorny <denis.nagorny@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-05-25 runtime: use of Run for some benchmarks (Marcel van Lohuizen)

Names of sub-benchmarks are preserved apart from the added slash; for example, BenchmarkMemmove4096 becomes BenchmarkMemmove/4096.

Change-Id: I9b3f82964f9a44b0d28724413320afd091ed3106
Reviewed-on: https://go-review.googlesource.com/23425
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Marcel van Lohuizen <mpvl@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-21 runtime: use MOVSB instead of MOVSQ for unaligned moves (Keith Randall)

MOVSB is quite a bit faster for unaligned moves. Possibly we should use MOVSB all of the time, but Intel folks say it might be a bit faster to use MOVSQ on some processors (but not any I have access to at the moment).

benchmark                            old ns/op     new ns/op     delta
BenchmarkMemmove4096-8               93.9          93.2          -0.75%
BenchmarkMemmoveUnalignedDst4096-8   256           151           -41.02%
BenchmarkMemmoveUnalignedSrc4096-8   175           90.5          -48.29%

Fixes #14630

Change-Id: I568e6d6590eb3615e6a699fb474020596be665ff
Reviewed-on: https://go-review.googlesource.com/20293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-01 all: make copyright headers consistent with one space after period (Brad Fitzpatrick)

This is a subset of https://golang.org/cl/20022 with only the copyright header lines, so the next CL will be smaller and more reviewable.

Go policy has been single space after periods in comments for some time.

The copyright header template at https://golang.org/doc/contribute.html#copyright also uses a single space.

Make them all consistent.

Change-Id: Icc26c6b8495c3820da6b171ca96a74701b4a01b0
Reviewed-on: https://go-review.googlesource.com/20111
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-24 runtime: speed up memclr with avx2 on amd64 (Ilya Tocar)

Results are a bit noisy, but show a good improvement (Haswell):

name            old time/op  new time/op  delta
Memclr5-48      6.06ns ± 8%  5.65ns ± 8%  -6.81%    (p=0.000 n=20+20)
Memclr16-48     5.75ns ± 6%  5.71ns ± 6%  ~         (p=0.545 n=20+19)
Memclr64-48     6.54ns ± 5%  6.14ns ± 9%  -6.12%    (p=0.000 n=18+20)
Memclr256-48    10.1ns ±12%  9.9ns ±14%   ~         (p=0.285 n=20+19)
Memclr4096-48   104ns ± 8%   57ns ±15%    -44.98%   (p=0.000 n=20+20)
Memclr65536-48  2.45µs ± 5%  2.43µs ± 8%  ~         (p=0.665 n=16+20)
Memclr1M-48     58.7µs ±13%  56.4µs ±11%  -3.92%    (p=0.033 n=20+19)
Memclr4M-48     233µs ± 9%   234µs ± 9%   ~         (p=0.728 n=20+19)
Memclr8M-48     469µs ±11%   472µs ±16%   ~         (p=0.947 n=20+20)
Memclr16M-48    947µs ±10%   916µs ±10%   ~         (p=0.050 n=20+19)
Memclr64M-48    10.9ms ±10%  4.5ms ± 9%   -58.43%   (p=0.000 n=20+20)
GoMemclr5-48    3.80ns ±13%  3.38ns ± 6%  -11.02%   (p=0.000 n=20+20)
GoMemclr16-48   3.34ns ±15%  3.40ns ± 9%  ~         (p=0.351 n=20+20)
GoMemclr64-48   4.10ns ±15%  4.04ns ±10%  ~         (p=1.000 n=20+19)
GoMemclr256-48  7.75ns ±20%  7.88ns ± 9%  ~         (p=0.227 n=20+19)

name            old speed      new speed       delta
Memclr5-48      826MB/s ± 7%   886MB/s ± 8%    +7.32%    (p=0.000 n=20+20)
Memclr16-48     2.78GB/s ± 5%  2.81GB/s ± 6%   ~         (p=0.550 n=20+19)
Memclr64-48     9.79GB/s ± 5%  10.44GB/s ±10%  +6.64%    (p=0.000 n=18+20)
Memclr256-48    25.4GB/s ±14%  25.6GB/s ±12%   ~         (p=0.647 n=20+19)
Memclr4096-48   39.4GB/s ± 8%  72.0GB/s ±13%   +82.81%   (p=0.000 n=20+20)
Memclr65536-48  26.6GB/s ± 6%  27.0GB/s ± 9%   ~         (p=0.517 n=17+20)
Memclr1M-48     17.9GB/s ±12%  18.5GB/s ±11%   ~         (p=0.068 n=20+20)
Memclr4M-48     18.0GB/s ± 9%  17.8GB/s ±14%   ~         (p=0.547 n=20+20)
Memclr8M-48     17.9GB/s ±10%  17.8GB/s ±14%   ~         (p=0.947 n=20+20)
Memclr16M-48    17.8GB/s ± 9%  18.4GB/s ± 9%   ~         (p=0.050 n=20+19)
Memclr64M-48    6.19GB/s ±10%  14.87GB/s ± 9%  +140.11%  (p=0.000 n=20+20)
GoMemclr5-48    1.31GB/s ±10%  1.48GB/s ± 6%   +13.06%   (p=0.000 n=19+20)
GoMemclr16-48   4.81GB/s ±14%  4.71GB/s ± 8%   ~         (p=0.341 n=20+20)
GoMemclr64-48   15.7GB/s ±13%  15.8GB/s ±11%   ~         (p=0.967 n=20+19)

Change-Id: I393f3f20e2f31538d1b1dd70d6e5c201c106a095
Reviewed-on: https://go-review.googlesource.com/16773
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Klaus Post <klauspost@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2015-10-08 runtime: adjust the arm64 memmove and memclr to operate by word as much as they can (Michael Hudson-Doyle)

Not only is this an obvious optimization:

benchmark                         old MB/s  new MB/s  speedup
BenchmarkMemmove1-4               35.35     29.65     0.84x
BenchmarkMemmove2-4               63.78     52.53     0.82x
BenchmarkMemmove3-4               89.72     73.96     0.82x
BenchmarkMemmove4-4               109.94    95.73     0.87x
BenchmarkMemmove5-4               127.60    112.80    0.88x
BenchmarkMemmove6-4               143.59    126.67    0.88x
BenchmarkMemmove7-4               157.90    138.92    0.88x
BenchmarkMemmove8-4               167.18    231.81    1.39x
BenchmarkMemmove9-4               175.23    252.07    1.44x
BenchmarkMemmove10-4              165.68    261.10    1.58x
BenchmarkMemmove11-4              174.43    263.31    1.51x
BenchmarkMemmove12-4              180.76    267.56    1.48x
BenchmarkMemmove13-4              189.06    284.93    1.51x
BenchmarkMemmove14-4              186.31    284.72    1.53x
BenchmarkMemmove15-4              195.75    281.62    1.44x
BenchmarkMemmove16-4              202.96    439.23    2.16x
BenchmarkMemmove32-4              264.77    775.77    2.93x
BenchmarkMemmove64-4              306.81    1209.64   3.94x
BenchmarkMemmove128-4             357.03    1515.41   4.24x
BenchmarkMemmove256-4             380.77    2066.01   5.43x
BenchmarkMemmove512-4             385.05    2556.45   6.64x
BenchmarkMemmove1024-4            381.23    2804.10   7.36x
BenchmarkMemmove2048-4            379.06    2814.83   7.43x
BenchmarkMemmove4096-4            387.43    3064.96   7.91x
BenchmarkMemmoveUnaligned1-4      28.91     25.40     0.88x
BenchmarkMemmoveUnaligned2-4      56.13     47.56     0.85x
BenchmarkMemmoveUnaligned3-4      74.32     69.31     0.93x
BenchmarkMemmoveUnaligned4-4      97.02     83.58     0.86x
BenchmarkMemmoveUnaligned5-4      110.17    103.62    0.94x
BenchmarkMemmoveUnaligned6-4      124.95    113.26    0.91x
BenchmarkMemmoveUnaligned7-4      142.37    130.82    0.92x
BenchmarkMemmoveUnaligned8-4      151.20    205.64    1.36x
BenchmarkMemmoveUnaligned9-4      166.97    215.42    1.29x
BenchmarkMemmoveUnaligned10-4     148.49    221.22    1.49x
BenchmarkMemmoveUnaligned11-4     159.47    239.57    1.50x
BenchmarkMemmoveUnaligned12-4     163.52    247.32    1.51x
BenchmarkMemmoveUnaligned13-4     167.55    256.54    1.53x
BenchmarkMemmoveUnaligned14-4     175.12    251.03    1.43x
BenchmarkMemmoveUnaligned15-4     192.10    267.13    1.39x
BenchmarkMemmoveUnaligned16-4     190.76    378.87    1.99x
BenchmarkMemmoveUnaligned32-4     259.02    562.98    2.17x
BenchmarkMemmoveUnaligned64-4     317.72    842.44    2.65x
BenchmarkMemmoveUnaligned128-4    355.43    1274.49   3.59x
BenchmarkMemmoveUnaligned256-4    378.17    1815.74   4.80x
BenchmarkMemmoveUnaligned512-4    362.15    2180.81   6.02x
BenchmarkMemmoveUnaligned1024-4   376.07    2453.58   6.52x
BenchmarkMemmoveUnaligned2048-4   381.66    2568.32   6.73x
BenchmarkMemmoveUnaligned4096-4   398.51    2669.36   6.70x
BenchmarkMemclr5-4                113.83    107.93    0.95x
BenchmarkMemclr16-4               223.84    389.63    1.74x
BenchmarkMemclr64-4               421.99    1209.58   2.87x
BenchmarkMemclr256-4              525.94    2411.58   4.59x
BenchmarkMemclr4096-4             581.66    4372.20   7.52x
BenchmarkMemclr65536-4            565.84    4747.48   8.39x
BenchmarkGoMemclr5-4              194.63    160.31    0.82x
BenchmarkGoMemclr16-4             295.30    630.07    2.13x
BenchmarkGoMemclr64-4             480.24    1884.03   3.92x
BenchmarkGoMemclr256-4            540.23    2926.49   5.42x

but it turns out that it's necessary to avoid the GC seeing partially written pointers.

It's of course possible to be more sophisticated (using ldp/stp to move 16 bytes at a time in the core loop and unrolling the tail copying loops being the obvious ideas) but I wanted something simple and (reasonably) obviously correct.

Fixes #12552

Change-Id: Iaeaf8a812cd06f4747ba2f792de1ded738890735
Reviewed-on: https://go-review.googlesource.com/14813
Reviewed-by: Austin Clements <austin@google.com>
2015-04-15 cmd/6g, runtime: improve duffzero throughput (Josh Bleecher Snyder)

It is faster to execute

    MOVQ AX,(DI)
    MOVQ AX,8(DI)
    MOVQ AX,16(DI)
    MOVQ AX,24(DI)
    ADDQ $32,DI

than

    STOSQ
    STOSQ
    STOSQ
    STOSQ

However, in order to be able to jump into the middle of a block of MOVQs, the call site needs to pre-adjust DI. If we're clearing a small area, the cost of that DI pre-adjustment isn't repaid.

This CL switches the DUFFZERO implementation to use a hybrid strategy, in which small clears use STOSQ as before, but large clears use mostly MOVQ/ADDQ blocks.

benchmark              old ns/op  new ns/op  delta
BenchmarkClearFat8     0.55       0.55       +0.00%
BenchmarkClearFat12    0.82       0.83       +1.22%
BenchmarkClearFat16    0.55       0.55       +0.00%
BenchmarkClearFat24    0.82       0.82       +0.00%
BenchmarkClearFat32    2.20       1.94       -11.82%
BenchmarkClearFat40    1.92       1.66       -13.54%
BenchmarkClearFat48    2.21       1.93       -12.67%
BenchmarkClearFat56    3.03       2.20       -27.39%
BenchmarkClearFat64    3.26       2.48       -23.93%
BenchmarkClearFat72    3.57       2.76       -22.69%
BenchmarkClearFat80    3.83       3.05       -20.37%
BenchmarkClearFat88    4.14       3.30       -20.29%
BenchmarkClearFat128   5.54       4.69       -15.34%
BenchmarkClearFat256   9.95       9.09       -8.64%
BenchmarkClearFat512   18.7       17.9       -4.28%
BenchmarkClearFat1024  36.2       35.4       -2.21%

Change-Id: Ic786406d9b3cab68d5a231688f9e66fcd1bd7103
Reviewed-on: https://go-review.googlesource.com/2585
Reviewed-by: Keith Randall <khr@golang.org>
2015-01-09 cmd/gc: optimize memclr of slices and arrays (Josh Bleecher Snyder)

Recognize loops of the form

    for i := range a {
        a[i] = zero
    }

in which the evaluation of a is free from side effects. Replace these loops with calls to memclr.

This occurs in the stdlib in 18 places.

The motivating example is clearing a byte slice:

benchmark             old ns/op  new ns/op  delta
BenchmarkGoMemclr5    3.31       3.26       -1.51%
BenchmarkGoMemclr16   13.7       3.28       -76.06%
BenchmarkGoMemclr64   50.8       4.14       -91.85%
BenchmarkGoMemclr256  157        6.02       -96.17%

Update #5373.

Change-Id: I99d3e6f5f268e8c6499b7e661df46403e5eb83e4
Reviewed-on: https://go-review.googlesource.com/2520
Reviewed-by: Keith Randall <khr@golang.org>
2014-09-08 build: move package sources from src/pkg to src (Russ Cox)
Preparation was in CL 134570043. This CL contains only the effect of 'hg mv src/pkg/* src'. For more about the move, see golang.org/s/go14nopkg.