| Age | Commit message | Author |
|
Change-Id: I742b49a3889892b7b1bb354f47f1c0d933c041e4
Reviewed-on: https://go-review.googlesource.com/c/go/+/682395
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
This CL marks non-leaf nosplit assembly functions as NOFRAME to avoid
relying on the implicit amd64 NOFRAME heuristic, where NOSPLIT functions
without stack were also marked as NOFRAME.
Updates #57302
Updates #40044
Change-Id: Ia4d26f8420dcf2b54528969ffbf40a73f1315d61
Reviewed-on: https://go-review.googlesource.com/c/go/+/459395
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Quim Muntal <quimmuntal@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
In ABIInternal, reserve X15 as constant zero, and use it to zero
memory. (Maybe there can be more use of it?)
The register is zeroed when transitioning from ABI0 to ABIInternal.
Caveat: using X15 generates longer instructions than using X0.
Maybe we want to use X0?
Change-Id: I12d5ee92a01fc0b59dad4e5ab023ac71bc2a8b7d
Reviewed-on: https://go-review.googlesource.com/c/go/+/288093
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
[This is a roll-forward of CL 262319, with a fix for some Darwin test
failures].
Change the definitions of selected runtime assembly routines
from ABI0 (the default) to ABIInternal. The ABIInternal def is
intended to indicate that these functions don't follow the existing Go
runtime ABI. In addition, convert the assembly reference to
runtime.main (from runtime.mainPC) to ABIInternal. Finally, for
functions such as "runtime.duffzero" that are called directly from
generated code, make sure that the compiler looks up the correct
ABI version.
This is intended to support the register ABI work; however, these
changes should not cause any issues even when GOEXPERIMENT=regabi is
not in effect.
Updates #27539, #40724.
Change-Id: Idf507f1c06176073563845239e1a54dad51a9ea9
Reviewed-on: https://go-review.googlesource.com/c/go/+/266638
Trust: Than McIntosh <thanm@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
|
|
This reverts commit 50af50d136551e2009b2b52e829570536271cdaa.
Reason for revert: Causes failures in the runtime package test on Darwin, apparently.
Change-Id: I006bc1b3443fa7207e92fb4a93e3fb438d4d3de3
Reviewed-on: https://go-review.googlesource.com/c/go/+/266257
Trust: Than McIntosh <thanm@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Change the definitions of selected runtime assembly routines
from ABI0 (the default) to ABIInternal. The ABIInternal def is
intended to indicate that these functions don't follow the existing Go
runtime ABI. In addition, convert the assembly reference to
runtime.main (from runtime.mainPC) to ABIInternal. Finally, for
functions such as "runtime.duffzero" that are called directly from
generated code, make sure that the compiler looks up the correct
ABI version.
This is intended to support the register ABI work; however, these
changes should not cause any issues even when GOEXPERIMENT=regabi is
not in effect.
Updates #27539, #40724.
Change-Id: I9846f8dcaccc95718cf2e61a18b7e924a0677e4c
Reviewed-on: https://go-review.googlesource.com/c/go/+/262319
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Than McIntosh <thanm@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Use 16-byte stores instead of 8-byte stores to zero small blocks.
Also switch to duffzero only for blocks of 65+ bytes: each duffzero
call also saves and restores BP, so a call takes 4 instructions, and
replacing it with 4 SSE stores doesn't cause code bloat.
Also switch duffzero to use LEAQ instead of ADDQ, to avoid clobbering flags.
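The size policy described above can be sketched in Go. This is only an illustrative model, not the compiler's code: the names `planZero`, `zeroPlan` and `duffzeroMin` are hypothetical, the sketch ignores any upper size bound above which a different mechanism might be used, and it assumes that bytes left over after the 16-byte stores are cleared by narrower stores.

```go
package main

import "fmt"

// duffzeroMin is the smallest block size, in bytes, that goes through
// duffzero according to the commit message ("65+ bytes").
const duffzeroMin = 65

// zeroPlan is a hypothetical summary of how a block would be cleared.
type zeroPlan struct {
	useDuffzero bool // clear with a call into duffzero
	stores16    int  // number of inline 16-byte stores
	tailBytes   int  // leftover bytes (assumed: cleared by narrower stores)
}

// planZero models the policy: inline 16-byte stores up to 64 bytes,
// duffzero from 65 bytes upward.
func planZero(size int) zeroPlan {
	if size >= duffzeroMin {
		return zeroPlan{useDuffzero: true}
	}
	return zeroPlan{stores16: size / 16, tailBytes: size % 16}
}

func main() {
	for _, size := range []int{16, 24, 64, 65} {
		fmt.Printf("%4d bytes: %+v\n", size, planZero(size))
	}
}
```

Under this model a 64-byte block is exactly four 16-byte stores, which matches the "4 SSE stores" that replace a 4-instruction duffzero call.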
name            old time/op  new time/op  delta
ClearFat8-6     0.54ns ± 0%  0.54ns ± 0%     ~     (all equal)
ClearFat12-6    1.07ns ± 0%  1.07ns ± 0%     ~     (all equal)
ClearFat16-6    1.07ns ± 0%  0.69ns ± 0%  -35.51%  (p=0.001 n=8+9)
ClearFat24-6    1.61ns ± 1%  1.07ns ± 0%  -33.33%  (p=0.000 n=10+10)
ClearFat32-6    2.14ns ± 0%  1.07ns ± 0%  -50.00%  (p=0.001 n=8+9)
ClearFat40-6    2.67ns ± 1%  1.61ns ± 0%  -39.72%  (p=0.000 n=10+8)
ClearFat48-6    3.75ns ± 0%  2.68ns ± 0%  -28.59%  (p=0.000 n=9+9)
ClearFat56-6    4.29ns ± 0%  3.22ns ± 0%  -25.10%  (p=0.000 n=9+9)
ClearFat64-6    4.30ns ± 0%  3.22ns ± 0%  -25.15%  (p=0.000 n=8+8)
ClearFat128-6   7.50ns ± 1%  7.51ns ± 0%     ~     (p=0.767 n=10+9)
ClearFat256-6   13.9ns ± 1%  13.9ns ± 1%     ~     (p=0.257 n=10+10)
ClearFat512-6   26.8ns ± 0%  26.8ns ± 0%     ~     (p=0.467 n=8+8)
ClearFat1024-6  52.5ns ± 0%  52.5ns ± 0%     ~     (p=1.000 n=8+8)
Also shaves ~20 KB from the go tool:
go_old 10384994
go_new 10364514 [-20480 bytes]
section differences
global text (code) = -20585 bytes (-0.532047%)
read-only data = -302 bytes (-0.018101%)
Total difference -20887 bytes (-0.348731%)
Change-Id: I15854e87544545c1af24775df895e38e16e12694
Reviewed-on: https://go-review.googlesource.com/54410
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Per golang.org/s/generatedcode
Updates #nnn
Change-Id: Ia7513ef6bd26c20b62b57b29f7770684a315d389
Reviewed-on: https://go-review.googlesource.com/45470
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Matt Layher <mdlayher@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Use movups to copy 16 bytes at a time.
Results (Haswell):
name            old time/op  new time/op  delta
CopyFat8-48     0.62ns ± 3%  0.63ns ± 3%     ~     (p=0.535 n=20+20)
CopyFat12-48    0.92ns ± 2%  0.93ns ± 3%     ~     (p=0.594 n=17+18)
CopyFat16-48    1.23ns ± 2%  1.23ns ± 2%     ~     (p=0.839 n=20+19)
CopyFat24-48    1.85ns ± 2%  1.84ns ± 0%   -0.48%  (p=0.014 n=19+20)
CopyFat32-48    2.45ns ± 0%  2.45ns ± 1%     ~     (p=1.000 n=16+16)
CopyFat64-48    3.30ns ± 2%  2.14ns ± 1%  -35.00%  (p=0.000 n=20+18)
CopyFat128-48   6.05ns ± 0%  3.98ns ± 0%  -34.22%  (p=0.000 n=18+17)
CopyFat256-48   11.9ns ± 3%   7.7ns ± 0%  -35.87%  (p=0.000 n=20+17)
CopyFat512-48   23.0ns ± 2%  15.1ns ± 2%  -34.52%  (p=0.000 n=20+18)
CopyFat1024-48  44.8ns ± 1%  29.8ns ± 2%  -33.48%  (p=0.000 n=17+19)
Change-Id: I8a78773c656d400726a020894461e00c59f896bf
Reviewed-on: https://go-review.googlesource.com/14836
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Use MOVUPS to zero 16 bytes at a time.
Results (Haswell):
name             old time/op  new time/op  delta
ClearFat8-48     0.62ns ± 2%  0.62ns ± 1%     ~     (p=0.085 n=20+15)
ClearFat12-48    0.93ns ± 2%  0.93ns ± 2%     ~     (p=0.757 n=19+19)
ClearFat16-48    1.23ns ± 1%  1.23ns ± 1%     ~     (p=0.896 n=19+17)
ClearFat24-48    1.85ns ± 2%  1.84ns ± 0%   -0.51%  (p=0.023 n=20+15)
ClearFat32-48    2.45ns ± 0%  2.46ns ± 2%     ~     (p=0.053 n=17+18)
ClearFat40-48    1.99ns ± 0%  0.92ns ± 2%  -53.54%  (p=0.000 n=19+20)
ClearFat48-48    2.15ns ± 1%  0.92ns ± 2%  -56.93%  (p=0.000 n=19+20)
ClearFat56-48    2.46ns ± 1%  1.23ns ± 0%  -49.98%  (p=0.000 n=19+14)
ClearFat64-48    2.76ns ± 0%  2.14ns ± 1%  -22.21%  (p=0.000 n=17+17)
ClearFat128-48   5.21ns ± 0%  3.99ns ± 0%  -23.46%  (p=0.000 n=17+19)
ClearFat256-48   10.3ns ± 4%   7.7ns ± 0%  -25.37%  (p=0.000 n=20+17)
ClearFat512-48   20.2ns ± 4%  15.0ns ± 1%  -25.58%  (p=0.000 n=20+17)
ClearFat1024-48  39.7ns ± 2%  29.7ns ± 0%  -25.05%  (p=0.000 n=19+19)
Change-Id: I200401eec971b2dd2450c0651c51e378bd982405
Reviewed-on: https://go-review.googlesource.com/14408
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
It is faster to execute

	MOVQ AX,(DI)
	MOVQ AX,8(DI)
	MOVQ AX,16(DI)
	MOVQ AX,24(DI)
	ADDQ $32,DI

than

	STOSQ
	STOSQ
	STOSQ
	STOSQ

However, in order to be able to jump into
the middle of a block of MOVQs, the call
site needs to pre-adjust DI.
If we're clearing a small area, the cost
of that DI pre-adjustment isn't repaid.
This CL switches the DUFFZERO implementation
to use a hybrid strategy, in which small
clears use STOSQ as before, but large clears
use mostly MOVQ/ADDQ blocks.
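The DI pre-adjustment described above can be modelled in Go. This is a rough simulation under stated assumptions, not the runtime's implementation: memory is a slice of 8-byte words, DI is an index into it, and the routine is assumed to be laid out as a run of 4-MOVQ blocks followed by a tail of four STOSQs. The names `hybridClear`, `blockLen` and `tailLen` are hypothetical, and the tail length of four is an assumption for this sketch.

```go
package main

import "fmt"

const (
	blockLen = 4 // MOVQs per block, as in the listing above
	tailLen  = 4 // trailing STOSQs (assumed for this sketch)
)

// hybridClear zeroes n words of mem starting at index start, imitating a
// jump into a routine made of MOVQ/ADDQ blocks followed by a STOSQ tail.
// It returns the DI pre-adjustment, in words, that the call site needs.
func hybridClear(mem []uint64, start, n int) int {
	if n <= tailLen {
		// Small clear: jump straight into the STOSQ tail.
		// STOSQ stores at DI and advances DI, so no pre-adjustment.
		for di := start; di < start+n; di++ {
			mem[di] = 0
		}
		return 0
	}

	inBlocks := n - tailLen
	// Number of MOVQs skipped by jumping into the middle of the first block.
	skip := (blockLen - inBlocks%blockLen) % blockLen
	// The remaining MOVQs use fixed offsets 8*j(DI), so DI is moved back
	// to make the first executed store land on mem[start].
	di := start - skip

	j := skip
	for remaining := inBlocks; remaining > 0; {
		for ; j < blockLen; j++ {
			mem[di+j] = 0 // MOVQ AX, 8*j(DI)
			remaining--
		}
		di += blockLen // ADDQ $32, DI
		j = 0
	}
	for i := 0; i < tailLen; i++ {
		mem[di] = 0 // STOSQ
		di++
	}
	return -skip
}

func main() {
	mem := make([]uint64, 16)
	for i := range mem {
		mem[i] = ^uint64(0)
	}
	adj := hybridClear(mem, 2, 9)
	fmt.Println("pre-adjustment (words):", adj)
	fmt.Println(mem)
}
```

In this model a clear of four words or fewer needs no pre-adjustment at all, which is why small clears stay on the STOSQ path, while larger clears pay for one adjustment and then run mostly through the faster MOVQ/ADDQ blocks.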
benchmark                 old ns/op     new ns/op     delta
BenchmarkClearFat8        0.55          0.55          +0.00%
BenchmarkClearFat12       0.82          0.83          +1.22%
BenchmarkClearFat16       0.55          0.55          +0.00%
BenchmarkClearFat24       0.82          0.82          +0.00%
BenchmarkClearFat32       2.20          1.94          -11.82%
BenchmarkClearFat40       1.92          1.66          -13.54%
BenchmarkClearFat48       2.21          1.93          -12.67%
BenchmarkClearFat56       3.03          2.20          -27.39%
BenchmarkClearFat64       3.26          2.48          -23.93%
BenchmarkClearFat72       3.57          2.76          -22.69%
BenchmarkClearFat80       3.83          3.05          -20.37%
BenchmarkClearFat88       4.14          3.30          -20.29%
BenchmarkClearFat128      5.54          4.69          -15.34%
BenchmarkClearFat256      9.95          9.09          -8.64%
BenchmarkClearFat512      18.7          17.9          -4.28%
BenchmarkClearFat1024     36.2          35.4          -2.21%
Change-Id: Ic786406d9b3cab68d5a231688f9e66fcd1bd7103
Reviewed-on: https://go-review.googlesource.com/2585
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This makes it easier to experiment with alternative implementations.
While we're here, update the comments.
No functional changes. Passes toolstash -cmp.
Change-Id: I428535754908f0fdd7cc36c214ddb6e1e60f376e
Reviewed-on: https://go-review.googlesource.com/8310
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|