| Age | Commit message | Author |
|
This CL marks non-leaf nosplit assembly functions as NOFRAME to avoid
relying on the implicit amd64 NOFRAME heuristic, where NOSPLIT functions
without stack were also marked as NOFRAME.
Updates #57302
Updates #40044
Change-Id: Ia4d26f8420dcf2b54528969ffbf40a73f1315d61
Reviewed-on: https://go-review.googlesource.com/c/go/+/459395
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Quim Muntal <quimmuntal@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
Contributors to the loong64 port are:
Weining Lu <luweining@loongson.cn>
Lei Wang <wanglei@loongson.cn>
Lingqin Gong <gonglingqin@loongson.cn>
Xiaolin Zhao <zhaoxiaolin@loongson.cn>
Meidan Li <limeidan@loongson.cn>
Xiaojuan Zhai <zhaixiaojuan@loongson.cn>
Qiyuan Pu <puqiyuan@loongson.cn>
Guoqi Chen <chenguoqi@loongson.cn>
This port has been updated to Go 1.15.6:
https://github.com/loongson/go
Updates #46229
Change-Id: Ida040e76dc8172f60e6aee1ea2b5bce13ab3581e
Reviewed-on: https://go-review.googlesource.com/c/go/+/368077
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
|
|
This CL adds:
- spill functions used by the runtime
- ABIInternal to functions
It also adds a new stubs_riscv64 file to eliminate vet issues while compiling.
Change-Id: I2a9f6088a1cd2d9708f26b2d97895b4e5f9f87e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/360296
Trust: mzh <mzh@golangcn.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
As CL 356519 requires, X8-X23 will be argument registers; however, X10 and
X11 are used by the Duff's device.
This CL changes X10 and X11 to X24 and X25 to meet that prerequisite.
Update #40724
Change-Id: Ie9b899afbba7e9a51bb7dacd89e49ca1c1fc33ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/357976
Trust: mzh <mzh@golangcn.org>
Run-TryBot: mzh <mzh@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
|
|
Update many generators, and also handle files that were not part of the
standard build during 'go fix' in CL 344955.
Fixes #41184.
Change-Id: I1edc684e8101882dcd11f75c6745c266fccfe9e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/359476
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
|
|
When these packages are released as part of Go 1.18,
Go 1.16 will no longer be supported, so we can remove
the +build tags in these files.
Ran go fix -fix=buildtag std cmd and then reverted the bootstrapDirs
as defined in src/cmd/dist/buildtool.go, which need to continue
to build with Go 1.4 for now.
Also reverted src/vendor and src/cmd/vendor, which will need
to be updated in their own repos first.
Manual changes in runtime/pprof/mprof_test.go to adjust line numbers.
For #41184.
Change-Id: Ic0f93f7091295b6abc76ed5cd6e6746e1280861e
Reviewed-on: https://go-review.googlesource.com/c/go/+/344955
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
|
|
This adds support for duffcopy on ppc64x and updates the
ssa/config.go file to enable register args and recognize
the duffDevice is available on ppc64x.
Change-Id: Ifc472cc9cc19c9a80e468fb52078c75f7dd44d36
Reviewed-on: https://go-review.googlesource.com/c/go/+/351490
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
compiler ABIInternal
For functions such as gcWriteBarrier and panicIndexXXX, the
compiler generates ABIInternal calls directly. They must not
use wrappers, because they follow a special calling convention or
use the caller's PC. Mark them as ABIInternal.
Note that even though they are marked as ABIInternal, they don't
actually use the internal ABI, i.e. regabiargs is not honored for
now.
Now all.bash passes with GOEXPERIMENT=regabiwrappers (at least on
macOS).
Change-Id: I87e41964e6dc4efae03e8eb636ae9fa1d99285bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/323934
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Don't add them to files in vendor and cmd/vendor though. These will be
pulled in by updating the respective dependencies.
For #41184
Change-Id: Icc57458c9b3033c347124323f33084c85b224c70
Reviewed-on: https://go-review.googlesource.com/c/go/+/319389
Trust: Tobias Klauser <tobias.klauser@gmail.com>
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
Make all our package sources use Go 1.17 gofmt format
(adding //go:build lines).
Part of //go:build change (#41184).
See https://golang.org/design/draft-gobuild
Change-Id: Ia0534360e4957e58cd9a18429c39d0e32a6addb4
Reviewed-on: https://go-review.googlesource.com/c/go/+/294430
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
In ABIInternal, reserve X15 as constant zero, and use it to zero
memory. (Maybe there can be more use of it?)
The register is zeroed when transitioning to ABIInternal from ABI0.
Caveat: using X15 generates longer instructions than using X0.
Maybe we want to use X0?
Change-Id: I12d5ee92a01fc0b59dad4e5ab023ac71bc2a8b7d
Reviewed-on: https://go-review.googlesource.com/c/go/+/288093
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
As part of #42026, these helpers from io/ioutil were moved to os.
(ioutil.TempFile and TempDir became os.CreateTemp and MkdirTemp.)
Update the Go tree to use the preferred names.
As usual, code compiled with the Go 1.4 bootstrap toolchain
and code vendored from other sources is excluded.
ReadDir changes are in a separate CL, because they are not a
simple search and replace.
For #42026.
Change-Id: If318df0216d57e95ea0c4093b89f65e5b0ababb3
Reviewed-on: https://go-review.googlesource.com/c/go/+/266365
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
Implement runtime.duffzero and runtime.duffcopy for riscv64.
Use obj.ADUFFZERO/obj.ADUFFCOPY for medium size, word aligned
zeroing/moving.
Change-Id: I42ec622055630c94cb77e286d8d33dbe7c9f846c
Reviewed-on: https://go-review.googlesource.com/c/go/+/237797
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Change-Id: I71966cc5def4615d64876165872e5e7f2956b270
Reviewed-on: https://go-review.googlesource.com/c/go/+/253397
Run-TryBot: Martin Möhrmann <martisch@uos.de>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Although duffcopy is not used on PPC64, duff_ppc64x.s and
mkduff.go don't match. Make it so.
Fixes #38188.
Change-Id: Ic6c08e335795ea407880efd449f4229696af7744
Reviewed-on: https://go-review.googlesource.com/c/go/+/226719
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
OS: Linux loongson 3.10.84 mips64el
CPU: Loongson 3A3000 quad core
name old time/op new time/op delta
BinaryTree17 23.5s ± 1% 23.2s ± 0% -1.12% (p=0.008 n=5+5)
Fannkuch11 10.2s ± 0% 10.1s ± 0% -0.19% (p=0.008 n=5+5)
FmtFprintfEmpty 450ns ± 0% 446ns ± 1% -0.89% (p=0.024 n=5+5)
FmtFprintfString 722ns ± 1% 721ns ± 1% ~ (p=0.762 n=5+5)
FmtFprintfInt 693ns ± 2% 691ns ± 2% ~ (p=0.889 n=5+5)
FmtFprintfIntInt 912ns ± 1% 911ns ± 0% ~ (p=0.722 n=5+5)
FmtFprintfPrefixedInt 1.35µs ± 2% 1.35µs ± 2% ~ (p=1.000 n=5+5)
FmtFprintfFloat 1.79µs ± 0% 1.78µs ± 0% ~ (p=0.683 n=5+5)
FmtManyArgs 3.46µs ± 1% 3.48µs ± 1% ~ (p=0.246 n=5+5)
GobDecode 48.8ms ± 1% 48.6ms ± 0% ~ (p=0.222 n=5+5)
GobEncode 37.7ms ± 1% 37.4ms ± 1% ~ (p=0.095 n=5+5)
Gzip 1.72s ± 1% 1.72s ± 0% ~ (p=0.905 n=5+4)
Gunzip 342ms ± 0% 342ms ± 0% ~ (p=0.421 n=5+5)
HTTPClientServer 219µs ± 1% 219µs ± 1% ~ (p=1.000 n=5+5)
JSONEncode 89.1ms ± 1% 89.4ms ± 1% ~ (p=0.222 n=5+5)
JSONDecode 292ms ± 1% 291ms ± 0% ~ (p=0.421 n=5+5)
Mandelbrot200 15.7ms ± 0% 15.6ms ± 0% ~ (p=0.690 n=5+5)
GoParse 19.5ms ± 1% 19.6ms ± 1% ~ (p=0.310 n=5+5)
RegexpMatchEasy0_32 534ns ± 1% 529ns ± 1% ~ (p=0.056 n=5+5)
RegexpMatchEasy0_1K 2.75µs ± 0% 2.74µs ± 0% -0.46% (p=0.008 n=5+5)
RegexpMatchEasy1_32 572ns ± 2% 565ns ± 3% ~ (p=0.310 n=5+5)
RegexpMatchEasy1_1K 4.15µs ± 0% 4.15µs ± 1% ~ (p=0.548 n=5+5)
RegexpMatchMedium_32 31.2ns ± 0% 31.1ns ± 0% -0.45% (p=0.016 n=5+4)
RegexpMatchMedium_1K 235µs ± 1% 235µs ± 0% ~ (p=1.000 n=5+5)
RegexpMatchHard_32 13.9µs ± 1% 13.5µs ± 1% -2.74% (p=0.008 n=5+5)
RegexpMatchHard_1K 416µs ± 2% 410µs ± 2% ~ (p=0.056 n=5+5)
Revcomp 6.36s ± 0% 6.34s ± 0% -0.31% (p=0.008 n=5+5)
Template 352ms ± 1% 353ms ± 0% +0.45% (p=0.032 n=5+5)
TimeParse 2.04µs ± 4% 2.01µs ± 0% ~ (p=0.056 n=5+5)
TimeFormat 2.97µs ± 0% 2.97µs ± 0% ~ (p=1.000 n=5+5)
name old speed new speed delta
GobDecode 15.7MB/s ± 1% 15.8MB/s ± 0% ~ (p=0.206 n=5+5)
GobEncode 20.4MB/s ± 1% 20.5MB/s ± 1% ~ (p=0.056 n=5+5)
Gzip 11.3MB/s ± 1% 11.3MB/s ± 0% ~ (p=0.841 n=5+4)
Gunzip 56.7MB/s ± 0% 56.8MB/s ± 0% ~ (p=0.389 n=5+5)
JSONEncode 21.8MB/s ± 1% 21.7MB/s ± 1% ~ (p=0.246 n=5+5)
JSONDecode 6.66MB/s ± 0% 6.67MB/s ± 0% ~ (p=0.857 n=4+5)
GoParse 2.97MB/s ± 1% 2.96MB/s ± 1% ~ (p=0.238 n=5+5)
RegexpMatchEasy0_32 59.9MB/s ± 1% 60.5MB/s ± 1% +0.92% (p=0.032 n=5+5)
RegexpMatchEasy0_1K 372MB/s ± 0% 374MB/s ± 0% +0.46% (p=0.008 n=5+5)
RegexpMatchEasy1_32 56.0MB/s ± 2% 56.7MB/s ± 3% ~ (p=0.310 n=5+5)
RegexpMatchEasy1_1K 247MB/s ± 0% 247MB/s ± 1% ~ (p=0.548 n=5+5)
RegexpMatchMedium_32 32.0MB/s ± 0% 32.1MB/s ± 0% ~ (p=0.135 n=5+5)
RegexpMatchMedium_1K 4.35MB/s ± 1% 4.35MB/s ± 1% ~ (p=0.825 n=5+5)
RegexpMatchHard_32 2.30MB/s ± 1% 2.37MB/s ± 1% +2.78% (p=0.008 n=5+5)
RegexpMatchHard_1K 2.47MB/s ± 1% 2.50MB/s ± 2% ~ (p=0.095 n=5+5)
Revcomp 40.0MB/s ± 0% 40.1MB/s ± 0% +0.31% (p=0.016 n=5+5)
Template 5.51MB/s ± 1% 5.49MB/s ± 0% ~ (p=0.190 n=5+5)
Change-Id: I540a2e4e7992376ce04f93b332f64fc3b6071237
Reviewed-on: https://go-review.googlesource.com/c/go/+/185078
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Currently we use R16 and R17 for ARM64's Duff's devices.
According to ARM64 ABI, R16 and R17 can be used by the (external)
linker as scratch registers in trampolines. So don't use these
registers to pass information across functions.
It seems unlikely that calling Duff's devices would need a
trampoline in normal cases. But it could happen if the call
target is out of the 128 MB direct jump limit.
The choice of R20 and R21 is somewhat arbitrary. The register
allocator allocates from low-numbered registers, so high-numbered
registers are chosen because they are unlikely to hold a live value
and force a spill.
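The 128 MB limit follows from the ARM64 B/BL encoding, which holds a signed 26-bit word offset. The sketch below checks whether a call would fall outside that range; the function name is hypothetical, and a real linker decides on trampolines with more context than this.

```go
package main

import "fmt"

// armBranchRange is the reach of an ARM64 B/BL instruction in each
// direction: a signed 26-bit immediate scaled by 4, i.e. 1<<27 bytes.
const armBranchRange = 1 << 27

// needsTrampoline (hypothetical helper) reports whether a direct branch
// from address `from` to address `to` is out of range.
func needsTrampoline(from, to uint64) bool {
	d := int64(to) - int64(from)
	return d < -armBranchRange || d >= armBranchRange
}

func main() {
	fmt.Println(needsTrampoline(0x1000, 0x1000+64<<20))  // false: 64 MB away
	fmt.Println(needsTrampoline(0x1000, 0x1000+128<<20)) // true: 128 MB away
}
```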
Fixes #32773.
Change-Id: Id22d555b5afeadd4efcf62797d1580d641c39218
Reviewed-on: https://go-review.googlesource.com/c/go/+/183842
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
name         old time/op  new time/op  delta
CopyFat8     2.15ns ± 1%  2.19ns ± 6%     ~     (p=0.171 n=8+9)
CopyFat12    2.15ns ± 0%  2.17ns ± 2%     ~     (p=0.137 n=8+10)
CopyFat16    2.17ns ± 3%  2.15ns ± 0%     ~     (p=0.211 n=10+10)
CopyFat24    2.16ns ± 1%  2.15ns ± 0%     ~     (p=0.087 n=10+10)
CopyFat32    11.5ns ± 0%  12.8ns ± 2%  +10.87%  (p=0.000 n=8+10)
CopyFat64    20.2ns ± 2%  12.9ns ± 0%  -36.11%  (p=0.000 n=10+10)
CopyFat128   37.2ns ± 0%  21.5ns ± 0%  -42.20%  (p=0.000 n=10+10)
CopyFat256   71.6ns ± 0%  38.7ns ± 0%  -45.95%  (p=0.000 n=10+10)
CopyFat512    140ns ± 0%    73ns ± 0%  -47.86%  (p=0.000 n=10+9)
CopyFat520    142ns ± 0%    74ns ± 0%  -47.54%  (p=0.000 n=10+10)
CopyFat1024   277ns ± 0%   141ns ± 0%  -49.10%  (p=0.000 n=10+10)
Change-Id: If54bc571add5db674d5e081579c87e80153d0a5a
Reviewed-on: https://go-review.googlesource.com/97395
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This replaces frame size -4/-8 with the NOFRAME flag in mips and
mips64 assembly.
This was automated with:
sed -i -e 's/\(^TEXT.*[A-Z]\),\( *\)\$-[84]/\1|NOFRAME,\2$0/' $(find -name '*_mips*.s')
Plus a manual fix to mkduff.go.
The go binary is identical on both architectures before and after this
change.
Change-Id: I0310384d1a584118c41d1cd3a042bb8ea7227efb
Reviewed-on: https://go-review.googlesource.com/92044
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This replaces frame size -8 with the NOFRAME flag in arm64 assembly.
This was automated with:
sed -i -e 's/\(^TEXT.*[A-Z]\),\( *\)\$-8/\1|NOFRAME,\2$0/' $(find -name '*_arm64.s')
Plus a manual fix to mkduff.go.
The go binary is identical before and after this change.
Change-Id: I0310384d1a584118c41d1cd3a042bb8ea7227efa
Reviewed-on: https://go-review.googlesource.com/92043
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Use "STP (ZR, ZR), O(R)" instead of "MOVD ZR, O(R)" to implement memory clearing.
Also improve assembler support for STP/LDP.
Results (A57@2GHzx8):
benchmark                 old ns/op  new ns/op  delta
BenchmarkClearFat8-8      1.00       1.00       +0.00%
BenchmarkClearFat12-8     1.01       1.01       +0.00%
BenchmarkClearFat16-8     1.01       1.01       +0.00%
BenchmarkClearFat24-8     1.52       1.52       +0.00%
BenchmarkClearFat32-8     3.00       2.02       -32.67%
BenchmarkClearFat40-8     3.50       2.52       -28.00%
BenchmarkClearFat48-8     3.50       3.03       -13.43%
BenchmarkClearFat56-8     4.00       3.50       -12.50%
BenchmarkClearFat64-8     4.25       4.00       -5.88%
BenchmarkClearFat128-8    8.01       8.01       +0.00%
BenchmarkClearFat256-8    16.1       16.0       -0.62%
BenchmarkClearFat512-8    32.1       32.0       -0.31%
BenchmarkClearFat1024-8   64.1       64.1       +0.00%
Change-Id: Ie5f5eac271ff685884775005825f206167a5c146
Reviewed-on: https://go-review.googlesource.com/55610
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Use 16-byte stores instead of 8-byte stores to zero small blocks.
Also switch to duffzero for 65+ bytes only: each duffzero call also
saves and restores BP, so a call requires 4 instructions, and replacing
it with 4 SSE stores doesn't cause code bloat.
Also switch duffzero to use LEAQ instead of ADDQ, to avoid clobbering flags.
ClearFat8-6 0.54ns ± 0% 0.54ns ± 0% ~ (all equal)
ClearFat12-6 1.07ns ± 0% 1.07ns ± 0% ~ (all equal)
ClearFat16-6 1.07ns ± 0% 0.69ns ± 0% -35.51% (p=0.001 n=8+9)
ClearFat24-6 1.61ns ± 1% 1.07ns ± 0% -33.33% (p=0.000 n=10+10)
ClearFat32-6 2.14ns ± 0% 1.07ns ± 0% -50.00% (p=0.001 n=8+9)
ClearFat40-6 2.67ns ± 1% 1.61ns ± 0% -39.72% (p=0.000 n=10+8)
ClearFat48-6 3.75ns ± 0% 2.68ns ± 0% -28.59% (p=0.000 n=9+9)
ClearFat56-6 4.29ns ± 0% 3.22ns ± 0% -25.10% (p=0.000 n=9+9)
ClearFat64-6 4.30ns ± 0% 3.22ns ± 0% -25.15% (p=0.000 n=8+8)
ClearFat128-6 7.50ns ± 1% 7.51ns ± 0% ~ (p=0.767 n=10+9)
ClearFat256-6 13.9ns ± 1% 13.9ns ± 1% ~ (p=0.257 n=10+10)
ClearFat512-6 26.8ns ± 0% 26.8ns ± 0% ~ (p=0.467 n=8+8)
ClearFat1024-6 52.5ns ± 0% 52.5ns ± 0% ~ (p=1.000 n=8+8)
Also shaves ~20kb from go tool:
go_old 10384994
go_new 10364514 [-20480 bytes]
section differences
global text (code) = -20585 bytes (-0.532047%)
read-only data = -302 bytes (-0.018101%)
Total difference -20887 bytes (-0.348731%)
Change-Id: I15854e87544545c1af24775df895e38e16e12694
Reviewed-on: https://go-review.googlesource.com/54410
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Per golang.org/s/generatedcode
Updates #nnn
Change-Id: Ia7513ef6bd26c20b62b57b29f7770684a315d389
Reviewed-on: https://go-review.googlesource.com/45470
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Matt Layher <mdlayher@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Update comments for duffzero and duffcopy
which referred to legacy locations:
+ cmd/?g/cgen.go
+ cmd/?g/ggen.go
Remnants of the old days when we had 5g, 6g etc.
Those locations have since moved to:
+ cmd/compile/internal/<arch>/cgen.go
+ cmd/compile/internal/<arch>/ggen.go
Change-Id: Ie2ea668559d52d42b747260ea69a6d5b3d70e859
Reviewed-on: https://go-review.googlesource.com/29073
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
Change-Id: I8984eac30e5df78d4b94f19412135d3cc36969f8
Reviewed-on: https://go-review.googlesource.com/29910
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Change-Id: I9e94027ef66c88007107de2b2b75c3d7cf1352af
Reviewed-on: https://go-review.googlesource.com/14467
Reviewed-by: Minux Ma <minux@golang.org>
|
|
on ppc64x
Replace the confusing game where a frame size of $-8 would suppress the
implicit setting up of a stack frame with a nice explicit flag.
The code to set up the function prologue is still a little confusing but better
than it was.
Change-Id: I1d49278ff42c6bc734ebfb079998b32bc53f8d9a
Reviewed-on: https://go-review.googlesource.com/15670
Reviewed-by: Minux Ma <minux@golang.org>
|
|
Use movups to copy 16 bytes at a time.
Results (haswell):
name old time/op new time/op delta
CopyFat8-48 0.62ns ± 3% 0.63ns ± 3% ~ (p=0.535 n=20+20)
CopyFat12-48 0.92ns ± 2% 0.93ns ± 3% ~ (p=0.594 n=17+18)
CopyFat16-48 1.23ns ± 2% 1.23ns ± 2% ~ (p=0.839 n=20+19)
CopyFat24-48 1.85ns ± 2% 1.84ns ± 0% -0.48% (p=0.014 n=19+20)
CopyFat32-48 2.45ns ± 0% 2.45ns ± 1% ~ (p=1.000 n=16+16)
CopyFat64-48 3.30ns ± 2% 2.14ns ± 1% -35.00% (p=0.000 n=20+18)
CopyFat128-48 6.05ns ± 0% 3.98ns ± 0% -34.22% (p=0.000 n=18+17)
CopyFat256-48 11.9ns ± 3% 7.7ns ± 0% -35.87% (p=0.000 n=20+17)
CopyFat512-48 23.0ns ± 2% 15.1ns ± 2% -34.52% (p=0.000 n=20+18)
CopyFat1024-48 44.8ns ± 1% 29.8ns ± 2% -33.48% (p=0.000 n=17+19)
Change-Id: I8a78773c656d400726a020894461e00c59f896bf
Reviewed-on: https://go-review.googlesource.com/14836
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Use MOVUPS to zero 16 bytes at a time.
results (haswell):
name old time/op new time/op delta
ClearFat8-48 0.62ns ± 2% 0.62ns ± 1% ~ (p=0.085 n=20+15)
ClearFat12-48 0.93ns ± 2% 0.93ns ± 2% ~ (p=0.757 n=19+19)
ClearFat16-48 1.23ns ± 1% 1.23ns ± 1% ~ (p=0.896 n=19+17)
ClearFat24-48 1.85ns ± 2% 1.84ns ± 0% -0.51% (p=0.023 n=20+15)
ClearFat32-48 2.45ns ± 0% 2.46ns ± 2% ~ (p=0.053 n=17+18)
ClearFat40-48 1.99ns ± 0% 0.92ns ± 2% -53.54% (p=0.000 n=19+20)
ClearFat48-48 2.15ns ± 1% 0.92ns ± 2% -56.93% (p=0.000 n=19+20)
ClearFat56-48 2.46ns ± 1% 1.23ns ± 0% -49.98% (p=0.000 n=19+14)
ClearFat64-48 2.76ns ± 0% 2.14ns ± 1% -22.21% (p=0.000 n=17+17)
ClearFat128-48 5.21ns ± 0% 3.99ns ± 0% -23.46% (p=0.000 n=17+19)
ClearFat256-48 10.3ns ± 4% 7.7ns ± 0% -25.37% (p=0.000 n=20+17)
ClearFat512-48 20.2ns ± 4% 15.0ns ± 1% -25.58% (p=0.000 n=20+17)
ClearFat1024-48 39.7ns ± 2% 29.7ns ± 0% -25.05% (p=0.000 n=19+19)
Change-Id: I200401eec971b2dd2450c0651c51e378bd982405
Reviewed-on: https://go-review.googlesource.com/14408
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
To avoid confusion with the runtime concept of copying stack.
Change-Id: I33442377b71012c2482c2d0ddd561492c71e70d0
Reviewed-on: https://go-review.googlesource.com/8639
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
It is faster to execute
MOVQ AX,(DI)
MOVQ AX,8(DI)
MOVQ AX,16(DI)
MOVQ AX,24(DI)
ADDQ $32,DI
than
STOSQ
STOSQ
STOSQ
STOSQ
However, in order to be able to jump into
the middle of a block of MOVQs, the call
site needs to pre-adjust DI.
If we're clearing a small area, the cost
of that DI pre-adjustment isn't repaid.
This CL switches the DUFFZERO implementation
to use a hybrid strategy, in which small
clears use STOSQ as before, but large clears
use mostly MOVQ/ADDQ blocks.
benchmark               old ns/op  new ns/op  delta
BenchmarkClearFat8      0.55       0.55       +0.00%
BenchmarkClearFat12     0.82       0.83       +1.22%
BenchmarkClearFat16     0.55       0.55       +0.00%
BenchmarkClearFat24     0.82       0.82       +0.00%
BenchmarkClearFat32     2.20       1.94       -11.82%
BenchmarkClearFat40     1.92       1.66       -13.54%
BenchmarkClearFat48     2.21       1.93       -12.67%
BenchmarkClearFat56     3.03       2.20       -27.39%
BenchmarkClearFat64     3.26       2.48       -23.93%
BenchmarkClearFat72     3.57       2.76       -22.69%
BenchmarkClearFat80     3.83       3.05       -20.37%
BenchmarkClearFat88     4.14       3.30       -20.29%
BenchmarkClearFat128    5.54       4.69       -15.34%
BenchmarkClearFat256    9.95       9.09       -8.64%
BenchmarkClearFat512    18.7       17.9       -4.28%
BenchmarkClearFat1024   36.2       35.4       -2.21%
Change-Id: Ic786406d9b3cab68d5a231688f9e66fcd1bd7103
Reviewed-on: https://go-review.googlesource.com/2585
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This makes it easier to experiment with alternative implementations.
While we're here, update the comments.
No functional changes. Passes toolstash -cmp.
Change-Id: I428535754908f0fdd7cc36c214ddb6e1e60f376e
Reviewed-on: https://go-review.googlesource.com/8310
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|