| Age | Commit message (Collapse) | Author |
|
If memequal is invoked with the same pointers as arguments it ends up
comparing the whole memory contents, instead of just comparing the pointers.
This effectively makes an operation that could be O(1) into O(n). All the
other architectures already have this optimization in place. For
instance, arm64 also have it, in memequal_varlen.
Such optimization is very specific, one case that it will probably benefit is
programs that rely heavily on interning of strings.
goos: darwin
goarch: arm64
pkg: bytes
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
Equal/same/1-8 2.678n ± ∞ ¹ 2.400n ± ∞ ¹ -10.38% (p=0.008 n=5)
Equal/same/6-8 3.267n ± ∞ ¹ 2.431n ± ∞ ¹ -25.59% (p=0.008 n=5)
Equal/same/9-8 2.981n ± ∞ ¹ 2.385n ± ∞ ¹ -19.99% (p=0.008 n=5)
Equal/same/15-8 2.974n ± ∞ ¹ 2.390n ± ∞ ¹ -19.64% (p=0.008 n=5)
Equal/same/16-8 2.983n ± ∞ ¹ 2.380n ± ∞ ¹ -20.21% (p=0.008 n=5)
Equal/same/20-8 3.567n ± ∞ ¹ 2.384n ± ∞ ¹ -33.17% (p=0.008 n=5)
Equal/same/32-8 3.568n ± ∞ ¹ 2.385n ± ∞ ¹ -33.16% (p=0.008 n=5)
Equal/same/4K-8 78.040n ± ∞ ¹ 2.378n ± ∞ ¹ -96.95% (p=0.008 n=5)
Equal/same/4M-8 78713.000n ± ∞ ¹ 2.385n ± ∞ ¹ -100.00% (p=0.008 n=5)
Equal/same/64M-8 1348095.000n ± ∞ ¹ 2.381n ± ∞ ¹ -100.00% (p=0.008 n=5)
geomean 43.52n 2.390n -94.51%
¹ need >= 6 samples for confidence interval at level 0.95
│ old.txt │ new.txt │
│ B/s │ B/s vs base │
Equal/same/1-8 356.1Mi ± ∞ ¹ 397.3Mi ± ∞ ¹ +11.57% (p=0.008 n=5)
Equal/same/6-8 1.711Gi ± ∞ ¹ 2.298Gi ± ∞ ¹ +34.35% (p=0.008 n=5)
Equal/same/9-8 2.812Gi ± ∞ ¹ 3.515Gi ± ∞ ¹ +24.99% (p=0.008 n=5)
Equal/same/15-8 4.698Gi ± ∞ ¹ 5.844Gi ± ∞ ¹ +24.41% (p=0.008 n=5)
Equal/same/16-8 4.995Gi ± ∞ ¹ 6.260Gi ± ∞ ¹ +25.34% (p=0.008 n=5)
Equal/same/20-8 5.222Gi ± ∞ ¹ 7.814Gi ± ∞ ¹ +49.63% (p=0.008 n=5)
Equal/same/32-8 8.353Gi ± ∞ ¹ 12.496Gi ± ∞ ¹ +49.59% (p=0.008 n=5)
Equal/same/4K-8 48.88Gi ± ∞ ¹ 1603.96Gi ± ∞ ¹ +3181.17% (p=0.008 n=5)
Equal/same/4M-8 49.63Gi ± ∞ ¹ 1637911.85Gi ± ∞ ¹ +3300381.91% (p=0.008 n=5)
Equal/same/64M-8 46.36Gi ± ∞ ¹ 26253069.97Gi ± ∞ ¹ +56626517.99% (p=0.008 n=5)
geomean 6.737Gi 122.7Gi +1721.01%
¹ need >= 6 samples for confidence interval at level 0.95
Fixes #64381
Change-Id: I7d423930a688edd88c4ba60d45e097296d9be852
GitHub-Last-Rev: ae8189fafb1cba87b5394f09f971746ae9299273
GitHub-Pull-Request: golang/go#64419
Reviewed-on: https://go-review.googlesource.com/c/go/+/545416
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
|
|
Change-Id: I3996fb31789a1f8559348e059cf371774e548a8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/393875
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
functions to register ABI on ARM64
This CL ports a few performance-critical assembly functions to use
register arguments directly. This is similar to CL 308931 and
CL 310184.
Change-Id: I6e30dfff17f76b8578ce8cfd51de21b66610fdb0
Reviewed-on: https://go-review.googlesource.com/c/go/+/324400
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
|
|
memequal_varlen on ARM64
Currently, memequal_varlen opens up a frame and call memequal,
which then tail-calls memeqbody. This CL changes memequal_varlen
tail-calls memeqbody directly.
This makes it simpler to switch to the register ABI in the next
CL.
Change-Id: Ia1367c0abb7f4755fe736c404411793fb9e5c04f
Reviewed-on: https://go-review.googlesource.com/c/go/+/324399
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
The compiler has advanced enough that it is cheaper
to convert to strings than to go through the assembly
trampolines to call runtime.memequal.
Simplify Equal accordingly, and cull dead code from bytealg.
While we're here, simplify Equal's documentation.
Fixes #31587
Change-Id: Ie721d33f9a6cbd86b1d873398b20e7882c2c63e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/173323
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Cleans things up quite a bit.
There's still a few more, like runtime.cmpstring, which might also
be worth fixing.
Change-Id: Ide18dd621efc129cc686db223f47fa0b044b5580
Reviewed-on: https://go-review.googlesource.com/c/148578
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
|
Currently the 16-byte loop chunk16_loop is implemented with NEON instructions LD1, VMOV and VCMEQ.
Using scalar instructions LDP and CMP to achieve this loop can reduce the number of clock cycles.
For cases where the length of strings are between 4 to 15 bytes, loading the last 8 or 4 bytes at
a time to reduce the number of comparisons.
Benchmarks:
name old time/op new time/op delta
Equal/0-8 5.51ns ± 0% 5.84ns ±14% ~ (p=0.246 n=7+8)
Equal/1-8 10.5ns ± 0% 10.5ns ± 0% ~ (all equal)
Equal/6-8 14.0ns ± 0% 12.5ns ± 0% -10.71% (p=0.000 n=8+8)
Equal/9-8 13.5ns ± 0% 12.5ns ± 0% -7.41% (p=0.000 n=8+8)
Equal/15-8 15.5ns ± 0% 12.5ns ± 0% -19.35% (p=0.000 n=8+8)
Equal/16-8 14.0ns ± 0% 13.0ns ± 0% -7.14% (p=0.000 n=8+8)
Equal/20-8 16.5ns ± 0% 16.0ns ± 0% -3.03% (p=0.000 n=8+8)
Equal/32-8 16.5ns ± 0% 15.3ns ± 0% -7.27% (p=0.000 n=8+8)
Equal/4K-8 552ns ± 0% 553ns ± 0% ~ (p=0.315 n=8+8)
Equal/4M-8 1.13ms ±23% 1.20ms ±27% ~ (p=0.442 n=8+8)
Equal/64M-8 32.9ms ± 0% 32.6ms ± 0% -1.15% (p=0.000 n=8+8)
CompareBytesEqual-8 12.0ns ± 0% 12.0ns ± 0% ~ (all equal)
Change-Id: If317ecdcc98e31883d37fd7d42b113b548c5bd2a
Reviewed-on: https://go-review.googlesource.com/112496
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
|
|
Functions exported on behalf of other packages need to have their
argument stack maps specified explicitly. They don't get an implicit
map because they are not in the local package, and if they get defer'd
they need argument maps.
Fixes #24419
Change-Id: I35b7d8b4a03d4770ba88699e1007cb3fcb5397a9
Reviewed-on: https://go-review.googlesource.com/122676
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Move bytes.Equal, runtime.memequal, and runtime.memequal_varlen
to the bytealg package.
Update #19792
Change-Id: Ic4175e952936016ea0bda6c7c3dbb33afdc8e4ac
Reviewed-on: https://go-review.googlesource.com/98355
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|