| Age | Commit message (Collapse) | Author |
|
This reverts commit 719dfcf8a8478d70360bf3c34c0e920be7b32994.
Reason for revert: Causing crashes.
Change-Id: I0b8526dd03d82fa074ce4f97f1789eeac702b3eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/709755
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Instead of storing LR (the return address) at 0(SP) and the FP
(parent's frame pointer) at -8(SP), store them at framesize-8(SP)
and framesize-16(SP), respectively.
We push and pop data onto the stack such that we're never accessing
anything below SP.
The prolog/epilog lengths are unchanged (3 insns for a typical prolog,
2 for a typical epilog).
We use 8 bytes more per frame.
Typical prologue:
STP.W (FP, LR), -16(SP)
MOVD SP, FP
SUB $C, SP
Typical epilogue:
ADD $C, SP
LDP.P 16(SP), (FP, LR)
RET
The previous word where we stored LR, at 0(SP), is now unused.
We could repurpose that slot for storing a local variable.
The new prolog and epilog instructions are recognized by libunwind,
so pc-sampling tools like perf should now be accurate. (TODO: except
maybe after the first RET instruction? Have to look into that.)
Update #73753 (fixes, for arm64)
Update #57302 (Quim thinks this will help on that issue)
Change-Id: I4800036a9a9a08aaaf35d9f99de79a36cf37ebb8
Reviewed-on: https://go-review.googlesource.com/c/go/+/674615
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Change-Id: Ib290079a77a746a8512cd4638310b24164f6a930
Reviewed-on: https://go-review.googlesource.com/c/go/+/679456
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The previous CL made this adjustment unnecessary. The argp field
is no longer used by the runtime.
Change-Id: I3491eeef4103c6653ec345d604c0acd290af9e8f
Reviewed-on: https://go-review.googlesource.com/c/go/+/685356
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
|
|
That way, the frame is atomically popped. Previously, for big frames
the SP was unwound in two steps (because arm64 can only add constants
up to 1<<12 in a single instruction).
Fixes #73259
Change-Id: I382c249194ad7bc9fc19607c27487c58d90d49e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/689235
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
|
|
This reverts commits
3f3782feed6e0726ddb08afd32dad7d94fbb38c6 (CL 648518)
b386b628521780c048af14a148f373c84e687b26 (CL 668475)
Fixes #73542
Change-Id: I218851c5c0b62700281feb0b3f82b6b9b97b910d
Reviewed-on: https://go-review.googlesource.com/c/go/+/670055
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
We currently make some parts of the preamble unpreemptible because
it confuses morestack. See comments in the code.
Instead, have morestack handle those weird cases so we can
remove unpreemptible marks from most places.
This CL makes user functions preemptible everywhere if they have no
write barriers (at least, on x86). In cmd/go the fraction of functions
that need preemptible markings drops from 82% to 36%. Makes the cmd/go
binary 0.3% smaller.
Update #35470
Change-Id: Ic83d5eabfd0f6d239a92e65684bcce7e67ff30bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/648518
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
for small frames
CL 379075 implemented function prologue/epilogue with STP/LDP.
To fix issue #53374, CL 412474 reverted the prologue STP change for
small frames, and the LDP in epilogue was kept. The current instructions
are:
prologue:
MOVD.W R30, -offset(RSP)
MOVD R29, -8(RSP)
epilogue:
LDP -8(RSP), (R29, R30)
ADD $offset, RSP, RSP
It seems a bit strange, as:
1) The prolog and epilogue are not in the same pattern (either STR-LDR,
or STP-LDP).
2) Go Internal ABI defines that R30 is saved at 0(RSP) and R29 is saved
at -8(RSP), so we can not use a single STP.W/LDP.P to save/restore
LR&FP and adjust SP. Changing the ABI causes too much complexity,
and the benefit is not that big.
This patch reverts the small frames' epilogue change in CL 379075. It
converts LDP in the epilogue to LDR-LDR. Another solution is to re-apply
the STP change in prologue, which requires to fix #53609. This seems the
easier and safer solution in the mean time. The new instructions are:
prologue:
MOVD.W R30, -offset(RSP)
MOVD R29, -8(RSP)
epilogue:
MOVD -8(RSP), R29
MOVD.P offset(RSP), R30
The current pattern may cause performance issues in Store-Forwarding on
micro-architectures like AmpereOne. Assuming a function call in the
middle of such code is short enough that the stores are still around,
then the LDP executes and it may wait longer to get the results from
separated stores in Store Buffers other than single STP.
Store-Forwarding aims to improve the efficiency of the processor by
allowing data to be forwarded directly from a store operation to a
subsequent load operation when certain conditions are met. See the
paper: "Memory Barriers: a Hardware View for Software Hackers"
(chapter 3.2: Store Forwarding).
The performance of following ARM64 Linux servers were tested:
1) AmpereOne (ARM v8.6+) from Ampere Computing.
2) Ampere Altra (ARM Neoverse N1) from Ampere Computing.
3) Graviton2 (ARM Neoverse N1) from AWS.
The effect of this change depends the hardware implementation of
store-forwarding. It can obviously improve AmpereOne, especially for
small functions that are frequently called and returned quickly.
E.g., JSON Marshal/Unmarshal benchmarks on AmpereOne:
goos: linux
goarch: arm64
pkg: encoding/json
│ ampere-one.base │ ampere-one.new │
│ sec/op │ sec/op vs base │
CodeMarshal-8 882.1µ ± 1% 779.6µ ± 1% -11.62% (p=0.000 n=10)
CodeMarshalError-8 961.5µ ± 0% 855.7µ ± 1% -11.01% (p=0.000 n=10)
MarshalBytes/32-8 207.6n ± 1% 187.8n ± 0% -9.52% (p=0.000 n=10)
MarshalBytes/256-8 501.0n ± 1% 482.6n ± 1% -3.68% (p=0.000 n=10)
MarshalBytes/4096-8 5.336µ ± 1% 5.074µ ± 1% -4.92% (p=0.000 n=10)
MarshalBytesError/32-8 242.3µ ± 2% 205.7µ ± 3% -15.08% (p=0.000 n=10)
MarshalBytesError/256-8 242.4µ ± 1% 205.2µ ± 2% -15.35% (p=0.000 n=10)
MarshalBytesError/4096-8 247.9µ ± 0% 210.1µ ± 1% -15.24% (p=0.000 n=10)
MarshalMap-8 150.8n ± 1% 145.7n ± 0% -3.35% (p=0.000 n=10)
EncodeMarshaler-8 50.30n ± 26% 54.48n ± 6% ~ (p=0.739 n=10)
CodeUnmarshal-8 4.796m ± 2% 4.055m ± 1% -15.45% (p=0.000 n=10)
CodeUnmarshalReuse-8 4.260m ± 1% 3.496m ± 1% -17.94% (p=0.000 n=10)
UnmarshalString-8 73.89n ± 1% 65.83n ± 1% -10.91% (p=0.000 n=10)
UnmarshalFloat64-8 60.63n ± 1% 58.66n ± 25% ~ (p=0.143 n=10)
UnmarshalInt64-8 55.62n ± 1% 53.25n ± 22% ~ (p=0.468 n=10)
UnmarshalMap-8 255.3n ± 1% 230.3n ± 1% -9.77% (p=0.000 n=10)
UnmarshalNumber-8 467.2n ± 1% 367.0n ± 0% -21.43% (p=0.000 n=10)
geomean 6.224µ 5.605µ -9.94%
Other ARM64 micro-architectures may be not affected so much by such
issue. E.g., benchmarks on Ampere Altra and Graviton2 show slight
improvements:
│ altra.base │ altra.new │
│ sec/op │ sec/op vs base │
CodeMarshal-8 980.1µ ± 1% 977.3µ ± 1% ~ (p=0.912 n=10)
CodeMarshalError-8 1.109m ± 3% 1.096m ± 5% ~ (p=0.971 n=10)
MarshalBytes/32-8 246.8n ± 1% 245.4n ± 0% -0.55% (p=0.002 n=10)
MarshalBytes/256-8 590.9n ± 1% 606.6n ± 1% +2.67% (p=0.000 n=10)
MarshalBytes/4096-8 6.351µ ± 1% 6.376µ ± 1% ~ (p=0.183 n=10)
MarshalBytesError/32-8 245.3µ ± 2% 246.1µ ± 2% ~ (p=0.684 n=10)
MarshalBytesError/256-8 245.5µ ± 1% 248.7µ ± 2% ~ (p=0.218 n=10)
MarshalBytesError/4096-8 254.2µ ± 1% 254.9µ ± 1% ~ (p=0.481 n=10)
MarshalMap-8 152.7n ± 2% 151.5n ± 3% ~ (p=0.782 n=10)
EncodeMarshaler-8 45.95n ± 7% 42.88n ± 5% -6.70% (p=0.014 n=10)
CodeUnmarshal-8 5.121m ± 4% 5.125m ± 3% ~ (p=0.579 n=10)
CodeUnmarshalReuse-8 4.616m ± 3% 4.634m ± 2% ~ (p=0.529 n=10)
UnmarshalString-8 72.12n ± 2% 72.20n ± 2% ~ (p=0.912 n=10)
UnmarshalFloat64-8 64.44n ± 5% 63.20n ± 4% ~ (p=0.393 n=10)
UnmarshalInt64-8 61.49n ± 2% 58.14n ± 4% -5.45% (p=0.002 n=10)
UnmarshalMap-8 263.6n ± 2% 266.2n ± 1% ~ (p=0.196 n=10)
UnmarshalNumber-8 464.7n ± 1% 464.0n ± 0% ~ (p=0.566 n=10)
geomean 6.617µ 6.575µ -0.64%
│ graviton2.base │ graviton2.new │
│ sec/op │ sec/op vs base │
CodeMarshal-8 1.122m ± 0% 1.118m ± 1% ~ (p=0.052 n=10)
CodeMarshalError-8 1.216m ± 1% 1.214m ± 0% ~ (p=0.631 n=10)
MarshalBytes/32-8 289.9n ± 0% 280.8n ± 0% -3.17% (p=0.000 n=10)
MarshalBytes/256-8 675.9n ± 0% 664.7n ± 0% -1.66% (p=0.000 n=10)
MarshalBytes/4096-8 6.884µ ± 0% 6.885µ ± 0% ~ (p=0.565 n=10)
MarshalBytesError/32-8 293.1µ ± 2% 288.9µ ± 2% ~ (p=0.123 n=10)
MarshalBytesError/256-8 296.0µ ± 3% 289.0µ ± 1% -2.36% (p=0.019 n=10)
MarshalBytesError/4096-8 300.4µ ± 1% 295.6µ ± 0% -1.60% (p=0.000 n=10)
MarshalMap-8 168.8n ± 1% 168.8n ± 1% ~ (p=1.000 n=10)
EncodeMarshaler-8 53.77n ± 8% 50.05n ± 12% ~ (p=0.579 n=10)
CodeUnmarshal-8 5.875m ± 2% 5.882m ± 1% ~ (p=0.796 n=10)
CodeUnmarshalReuse-8 5.383m ± 1% 5.366m ± 0% ~ (p=0.631 n=10)
UnmarshalString-8 74.59n ± 1% 73.99n ± 0% -0.80% (p=0.001 n=10)
UnmarshalFloat64-8 68.52n ± 7% 64.19n ± 18% ~ (p=0.868 n=10)
UnmarshalInt64-8 65.32n ± 13% 62.24n ± 8% ~ (p=0.138 n=10)
UnmarshalMap-8 290.1n ± 0% 291.3n ± 0% +0.43% (p=0.010 n=10)
UnmarshalNumber-8 514.4n ± 0% 499.4n ± 0% -2.93% (p=0.000 n=10)
geomean 7.459µ 7.317µ -1.91%
Change-Id: If27386fc5f514b76bdaf2012c2ce86cc65f7ca5b
Reviewed-on: https://go-review.googlesource.com/c/go/+/621775
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
ARM64 allows for a register to be specified with a return
instruction. While the assembler parsing and encoding currently
supports this, the preprocess function uses LR unconditionally.
Correct this such that if a register is specified, the register
is used.
Change-Id: I708f6c7e910d141559b60d2d5ee76ae2e1dc3a0e
Reviewed-on: https://go-review.googlesource.com/c/go/+/592796
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
For leaf but nonzero-frame functions.
Currently we're not restoring it properly. We also need to restore
it before popping the stack frame, so that the frame won't get
clobbered by a signal handler in the meantime.
Fixes #63830
Needs a test, but I'm not at all sure how we would actually do that. Leaving for inspiration.
Change-Id: I273a25f2a838f05a959c810145cccc5428eaf164
Reviewed-on: https://go-review.googlesource.com/c/go/+/538635
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Eric Fang <eric.fang@arm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Change-Id: I36a0f0989d37bef45ea8778da799b56a7e9a0c30
Reviewed-on: https://go-review.googlesource.com/c/go/+/529515
Run-TryBot: shuang cui <imcusg@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
|
|
The UnspillReg code should always be preemptible because all the arg registers will be saved by runtime.asyncpreempt.
Change-Id: Ie36b5d0cdd1275efcb95661354d83be2e1b00a86
Reviewed-on: https://go-review.googlesource.com/c/go/+/526235
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Load large constants into vector registers from rodata, instead of placing them
in the literal pool. This treats VMOVQ/VMOVD/VMOVS the same as FMOVD/FMOVS and
makes use of the existing mechanism for storing values in rodata. Two additional
instructions are required for a load, however these instructions are used
infrequently and already have a high latency.
Updates #59615
Change-Id: I54226730267689963d73321e548733ae2d66740e
Reviewed-on: https://go-review.googlesource.com/c/go/+/515617
Reviewed-by: Eric Fang <eric.fang@arm.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
For #59670.
Change-Id: I91448363be2fc678964ce119d85cd5fae34a14da
Reviewed-on: https://go-review.googlesource.com/c/go/+/486975
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Auto-Submit: Austin Clements <austin@google.com>
|
|
For #59670.
Change-Id: Ie784ba4dd2701e4f455e1abde4a6bfebee4b1387
Reviewed-on: https://go-review.googlesource.com/c/go/+/485496
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Auto-Submit: Austin Clements <austin@google.com>
|
|
internal/abi"
This reverts commit CL 486379.
Submitted out of order and breaks bootstrap.
Change-Id: Ie20a61cc56efc79a365841293ca4e7352b02d86b
Reviewed-on: https://go-review.googlesource.com/c/go/+/486917
TryBot-Bypass: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
For #59670.
Change-Id: I04a17079b351b9b4999ca252825373c17afb8a88
Reviewed-on: https://go-review.googlesource.com/c/go/+/486379
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Usually optimization rules have corresponding priorities, some need to
be run first, some run next, and some run last, which produces the best
code. But currently our optimization rules have no priority, this CL
adds a late lower pass that runs those rules that need to be run at last,
such as split unreasonable constant folding. This pass can be seen as
the second round of the lower pass.
For example:
func foo(a, b uint64) uint64 {
d := a+0x1234568
d1 := b+0x1234568
return d&d1
}
The code generated by the master branch:
0x0004 00004 ADD $19088744, R0, R2 // movz+movk+add
0x0010 00016 ADD $19088744, R1, R1 // movz+movk+add
0x001c 00028 AND R1, R2, R0
This is because the current constant folding optimization rules do not
take into account the range of constants, causing the constant to be
loaded repeatedly. This CL splits these unreasonable constants folding
in the late lower pass. With this CL the generated code:
0x0004 00004 MOVD $19088744, R2 // movz+movk
0x000c 00012 ADD R0, R2, R3
0x0010 00016 ADD R1, R2, R1
0x0014 00020 AND R1, R3, R0
This CL also adds constant folding optimization for ADDS instruction.
In addition, in order not to introduce the codegen regression, an
optimization rule is added to change the addition of a negative number
into a subtraction of a positive number.
go1 benchmarks:
name old time/op new time/op delta
BinaryTree17-8 1.22s ± 1% 1.24s ± 0% +1.56% (p=0.008 n=5+5)
Fannkuch11-8 1.54s ± 0% 1.53s ± 0% -0.69% (p=0.016 n=4+5)
FmtFprintfEmpty-8 14.1ns ± 0% 14.1ns ± 0% ~ (p=0.079 n=4+5)
FmtFprintfString-8 26.0ns ± 0% 26.1ns ± 0% +0.23% (p=0.008 n=5+5)
FmtFprintfInt-8 32.3ns ± 0% 32.9ns ± 1% +1.72% (p=0.008 n=5+5)
FmtFprintfIntInt-8 54.5ns ± 0% 55.5ns ± 0% +1.83% (p=0.008 n=5+5)
FmtFprintfPrefixedInt-8 61.5ns ± 0% 62.0ns ± 0% +0.93% (p=0.008 n=5+5)
FmtFprintfFloat-8 72.0ns ± 0% 73.6ns ± 0% +2.24% (p=0.008 n=5+5)
FmtManyArgs-8 221ns ± 0% 224ns ± 0% +1.22% (p=0.008 n=5+5)
GobDecode-8 1.91ms ± 0% 1.93ms ± 0% +0.98% (p=0.008 n=5+5)
GobEncode-8 1.40ms ± 1% 1.39ms ± 0% -0.79% (p=0.032 n=5+5)
Gzip-8 115ms ± 0% 117ms ± 1% +1.17% (p=0.008 n=5+5)
Gunzip-8 19.4ms ± 1% 19.3ms ± 0% -0.71% (p=0.016 n=5+4)
HTTPClientServer-8 27.0µs ± 0% 27.3µs ± 0% +0.80% (p=0.008 n=5+5)
JSONEncode-8 3.36ms ± 1% 3.33ms ± 0% ~ (p=0.056 n=5+5)
JSONDecode-8 17.5ms ± 2% 17.8ms ± 0% +1.71% (p=0.016 n=5+4)
Mandelbrot200-8 2.29ms ± 0% 2.29ms ± 0% ~ (p=0.151 n=5+5)
GoParse-8 1.35ms ± 1% 1.36ms ± 1% ~ (p=0.056 n=5+5)
RegexpMatchEasy0_32-8 24.5ns ± 0% 24.5ns ± 0% ~ (p=0.444 n=4+5)
RegexpMatchEasy0_1K-8 131ns ±11% 118ns ± 6% ~ (p=0.056 n=5+5)
RegexpMatchEasy1_32-8 22.9ns ± 0% 22.9ns ± 0% ~ (p=0.905 n=4+5)
RegexpMatchEasy1_1K-8 126ns ± 0% 127ns ± 0% ~ (p=0.063 n=4+5)
RegexpMatchMedium_32-8 486ns ± 5% 483ns ± 0% ~ (p=0.381 n=5+4)
RegexpMatchMedium_1K-8 15.4µs ± 1% 15.5µs ± 0% ~ (p=0.151 n=5+5)
RegexpMatchHard_32-8 687ns ± 0% 686ns ± 0% ~ (p=0.103 n=5+5)
RegexpMatchHard_1K-8 20.7µs ± 0% 20.7µs ± 1% ~ (p=0.151 n=5+5)
Revcomp-8 175ms ± 2% 176ms ± 3% ~ (p=1.000 n=5+5)
Template-8 20.4ms ± 6% 20.1ms ± 2% ~ (p=0.151 n=5+5)
TimeParse-8 112ns ± 0% 113ns ± 0% +0.97% (p=0.016 n=5+4)
TimeFormat-8 156ns ± 0% 145ns ± 0% -7.14% (p=0.029 n=4+4)
Change-Id: I3ced26e89041f873ac989586514ccc5ee09f13da
Reviewed-on: https://go-review.googlesource.com/c/go/+/425134
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Eric Fang <eric.fang@arm.com>
|
|
This CL adds some optimizaion rules:
1, Converts CMP to CMN, or vice versa, when comparing with a negative
number.
2, For equal and not equal comparisons, CMP can be converted to CMN in
some cases. In theory we could do the same optimization for LT, LE, GT
and GE, but need to account for overflow, this CL doesn't handle them.
There are no noticeable performance changes.
Change-Id: Ia49266c019ab7908ebc9510c2f02e121b1607869
Reviewed-on: https://go-review.googlesource.com/c/go/+/429795
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Eric Fang <eric.fang@arm.com>
|
|
Previously the first operand of MSR could be $0, which would be
converted to the ZR register. This is prohibited by CL 404316,
this CL restores this instruction format.
Change-Id: I5b5be59e76aa58423a0fb96942d1b2a9de62e311
Reviewed-on: https://go-review.googlesource.com/c/go/+/426198
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
|
|
Previously the first operand of FMOVD and FMOVS could be $0, which
would be converted to the ZR register. This is prohibited by CL 404316,
also it broken the encoding of "FMOVD/FMOVS ZR, Rn", this CL restores
this instruction format and fixes the encoding issue.
Fixes #54655.
Fixes #54729.
Change-Id: I9c42cd41296bed7ffd601609bd8ecaa27d11e659
Reviewed-on: https://go-review.googlesource.com/c/go/+/425188
Run-TryBot: Eric Fang <eric.fang@arm.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
Previously we convert $0 to the ZR register for some reasons, which causes
two problems:
1. Confusion, the special case of the ZR register needs to be considered
when dealing with constants. For encoding, some places we encode ZR, and
some places we encode $0, although we have converted $0 to ZR.
2. Unexpected instruction format. All instructions that support ZR register
operands can be replaced by $0.
This patch removes this conversion. Note that this patch may cause previously
unintendedly supported instruction formats to no longer be supported.
Change-Id: I3d8d2c06711b7614a38191397da7776417f1861c
Reviewed-on: https://go-review.googlesource.com/c/go/+/404316
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
When we create a thread with signals blocked. But glibc's
pthread_sigmask doesn't really allow us to block SIGSETXID. So we
may get a signal early on before the signal stack is set. If we
get a signal on the current stack, it will clobber anything below
the SP. This CL makes it to save LR and decrement SP in a single
MOVD.W instruction for small frames, so we don't write below the
SP.
We used to use a single MOVD.W instruction before CL 379075.
CL 379075 changed to use an STP instruction to save the LR and FP,
then decrementing the SP. This CL changes it back, just this part
(epilogues and large frame prologues are unchanged). For small
frames, it is the same number of instructions either way.
This decreases the size of a "small" frame from 0x1f0 to 0xf0.
For frame sizes in between, it could benefit from using an
STP instruction instead of using the prologue for the "large"
frame case. We don't bother it for now as this is a stop-gap
solution anyway.
This only addresses the issue with small frames. Luckily, all
functions from thread entry to setting up the signal stack have
samll frames.
Other possible ideas:
- Expand the unwind info metadata, separate SP delta and the
location of the return address, so we can express "SP is
decremented but the return address is in the LR register". Then
we can always create the frame first then write the LR, without
writing anything below the SP (except the frame pointer at SP-8,
which is minor because it doesn't really affect program
execution).
- Set up the signal stack immediately in mstart in assembly.
For Go 1.19 we do this simple fix. We plan to do the metadata fix
in Go 1.20 ( #53609 ).
Other LR architectures are addressed in CL 413428.
Fix #53374.
Change-Id: I9d6582ab14ccb06ac61ad43852943d9555e22ae5
Reviewed-on: https://go-review.googlesource.com/c/go/+/412474
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Eric Fang <eric.fang@arm.com>
|
|
For some 32-bit instructions whose first operand is a constant, we
copy the lower 32 bits of the constant into the upper 32 bits in progedit,
which leads to the wrong value being printed in -S output.
The purpose of this is that we don't need to distinguish between 32-bit
and 64-bit constants when checking C_BITCON, this CL puts the modified
value in a temporary variable, so that the constant operand of the
instruction will not be modified.
Fixes #53551
Change-Id: I40ee9223b4187bff1c0a1bab7eb508fcb30325f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/414374
Run-TryBot: Eric Fang <eric.fang@arm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
Some arithmetic operation instructions such as ADD and SUB support two
formats of left shift (<<) operation, namely shifted register format and
extended register format. And the encoding, supported registers and shifted
amount are both different.
The assembly parser doesn't distinguish them and parses them into TYPE_SHIFT
type, because the parser can't tell them apart and in most cases extended
left-shift can be replaced by shifted left-shift. The only exception is
when the second source register or the destination register is RSP.
This CL converts this case into the extended format in the preprocess stage,
which helps to simplify some of the logic of the new assembler implementation
and also makes this situation look more reasonable.
Change-Id: I2cd7d2d663b38a7ba77a9fef1092708b8cb9bc3d
Reviewed-on: https://go-review.googlesource.com/c/go/+/311709
Trust: Eric Fang <eric.fang@arm.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
In function prologue and epilogue, we save and restore FP and LR
registers, and adjust RSP. The current instruction sequence is as
follow.
For frame size <= 240B,
prologue:
MOVD.W R30, -offset(RSP)
MOVD R29, -8(RSP)
epilogue:
MOVD -8(RSP), R29
MOVD.P offset(RSP), R30
For frame size > 240B,
prologue:
SUB $offset, RSP, R27
MOVD R30, (R27)
MOVD R27, RSP
MOVD R29, -8(RSP)
epilogue:
MOVD -8(RSP), R29
MOVD (RSP), R30
ADD $offset, RSP
Each sequence uses two load or store instructions, actually we can load
or store two registers with one LDP or STP instruction. This CL changes
the sequences as follow.
For frame size <= 496B,
prologue:
STP (R29, R30), -(offset+8)(RSP)
SUB $offset, RSP, RSP
epilogue:
LDP -8(RSP), (R29, R30)
ADD $offset, RSP, RSP
For frame size > 496B,
prologue:
SUB $offset, RSP, R20
STP (R29, R30), -8(R20)
MOVD R20, RSP
epilogue:
LDP -8(RSP), (R29, R30)
ADD $offset, RSP, RSP
Change-Id: Ia58af85fc81cce9b7c393dc38df43bffb203baad
Reviewed-on: https://go-review.googlesource.com/c/go/+/379075
Reviewed-by: Cherry Mui <cherryyz@google.com>
Trust: Eric Fang <eric.fang@arm.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
|
|
When framesize <= objabi.StackSmall, 128B, the stacksplit prologue is:
MOVD 16(g), R16
MOVD SP, R17
CMP R16, R17
BLS morestack_label
The second instruction is not necessary, we can compare R16 with SP
directly, so the sequence becomes:
MOVD 16(g), R16
CMP R16, SP
BLS morestack_label
This CL removes this instruction.
Change-Id: I0567ac52e9be124880957271951e1186da203612
Reviewed-on: https://go-review.googlesource.com/c/go/+/379076
Trust: Eric Fang <eric.fang@arm.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Eric Fang <eric.fang@arm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This adds a debugging hook for optionally calling a "maymorestack"
function in the prologue of any function that might call morestack
(whether it does at run time or not). The maymorestack function will
let us improve lock checking and add debugging modes that stress
function preemption and stack growth.
Passes toolstash-check -all (except on js/wasm, where toolstash
appears to be broken)
Fixes #48297.
Change-Id: I27197947482b329af75dafb9971fc0d3a52eaf31
Reviewed-on: https://go-review.googlesource.com/c/go/+/359795
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
This CL provide new intrinsics to emit prefetch instructions for AMD64
and ARM64 platforms:
Prefetch - prefetches data from memory address to cache;
PrefetchStreamed - prefetches data from memory address, with a hint
that this data is being streamed.
This patch also provides prefetch calls pointed by RSC inside scanobject
and greyobject of GC mark logic.
Performance results provided by Michael:
https://perf.golang.org/search?q=upload:20210901.9
Benchmark parameters:
tree2 -heapsize=1000000000 -cpus=8
tree -n=18
parser
peano
Benchmarks AMD64 (Xeon - Cascade Lake):
name old time/op new time/op delta
Tree2-8 36.1ms ± 6% 33.4ms ± 5% -7.65% (p=0.000 n=9+9)
Tree-8 326ms ± 1% 324ms ± 1% -0.44% (p=0.006 n=9+10)
Parser-8 2.75s ± 1% 2.71s ± 1% -1.47% (p=0.008 n=5+5)
Peano-8 63.1ms ± 1% 63.0ms ± 1% ~ (p=0.730 n=9+9)
[Geo mean] 213ms 207ms -2.45%
Benchmarks ARM64 (Kunpeng 920):
name old time/op new time/op delta
Tree2-8 50.3ms ± 8% 44.1ms ± 5% -12.24% (p=0.000 n=10+9)
Tree-8 494ms ± 1% 493ms ± 1% ~ (p=0.684 n=10+10)
Parser-8 3.99s ± 1% 3.93s ± 1% -1.37% (p=0.016 n=5+5)
Peano-8 84.4ms ± 0% 84.1ms ± 1% ~ (p=0.068 n=8+10)
[Geo mean] 302ms 291ms -3.67%
Change-Id: I43e10bc2f9512dc49d7631dd8843a79036fa43d0
Reviewed-on: https://go-review.googlesource.com/c/go/+/328289
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
|
|
Some arm64 instructions accept ZR as its destination register, such as MOVD,
AND, ADD etc. although it doesn't seem to make much sense, but we should
make sure the encoding is correct. However there exists some encoding mistakes
in the current assembler, they are:
1, 'MOVD $1, ZR' is incorrectly encoded as 'MOVD $1, ZR' + '0x00000000'.
2, 'AND $1, R2, ZR' is incorrectly encoded as 'MOVD $1, R27' + 'AND R27, R2, ZR' +
'0x00000000'.
3, 'AND $1, ZR' is incorrectly encoded as 'AND $1, ZR, RSP'.
Obviously the first two encoding errors can cause SIGILL, and the third one will
rewrite RSP.
At the same time, I found some weird encodings but they don't cause errors.
4, 'MOVD $0x0001000100010001, ZR' is encoded as 'MOVW $1, ZR' + 'MOVKW $(1<<16), ZR'.
5, 'AND $0x0001000100010001, R2, ZR' is encoded as 'MOVD $1, R27' + 'MOVK $(1<<16), R27' +
'MOVK $(1<<32), R27'.
Some of these issues also apply to 32-bit versions of these instructions.
These problems are not very complicated, and are basically caused by the improper
adaptation of the class of the constant to the entry in the optab. But the relationship
between these constant classes is a bit complicated, so I don't know how to deal with
issue 4 and 5, because they won't cause errors, so this CL didn't deal with them.
This CL fixed the first three issues.
Issue 1:
before: 'MOVD $1, ZR' => 'MOVD $1, ZR' + '0x00000000'.
after: 'MOVD $1, ZR' => 'MOVD $1, ZR'.
Issue 2:
before: 'AND $1, R2, ZR' => 'MOVD $1, R27' + 'AND R27, R2, ZR' + '0x00000000'.
after: 'AND $1, R2, ZR' => 'ORR $1, ZR, R27' + 'AND R27, R2, ZR'.
Issue 3:
before: 'AND $1, ZR' => 'AND $1, ZR, RSP'.
after: 'AND $1, ZR' => 'ORR $1, ZR, R27' + 'AND R27, ZR, ZR'.
Change-Id: I3c889079229f847b863ad56c88966be12d947202
Reviewed-on: https://go-review.googlesource.com/c/go/+/329750
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Trust: eric fang <eric.fang@arm.com>
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
|
|
compiler ABIInternal
For functions such as gcWriteBarrier and panicIndexXXX, the
compiler generates ABIInternal calls directly. And they must not
use wrappers because it follows a special calling convention or
the caller's PC is used. Mark them as ABIInternal.
Note that even though they are marked as ABIInternal, they don't
actually use the internal ABI, i.e. regabiargs is not honored for
now.
Now all.bash passes with GOEXPERIMENT=regabiwrappers (at least on
macOS).
Change-Id: I87e41964e6dc4efae03e8eb636ae9fa1d99285bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/323934
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
|
Spill arg registers before calling morestack, and reload after.
Change-Id: I09404def321b8f935d5e8836a46ccae8256d0d55
Reviewed-on: https://go-review.googlesource.com/c/go/+/322853
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
function prologue
Avoid using R1, R2, etc. in function prologue, which may carry
live argument values.
Change-Id: I80322b3f7e8fda7aaff622aaa99bc76d02e09727
Reviewed-on: https://go-review.googlesource.com/c/go/+/322852
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
The go/build package needs access to this configuration,
so move it into a new package available to the standard library.
Change-Id: I868a94148b52350c76116451f4ad9191246adcff
Reviewed-on: https://go-review.googlesource.com/c/go/+/310731
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Jay Conrod <jayconrod@google.com>
|
|
CL 307010 for arm64.
Change-Id: I6c6e1bd6065df059e50c3632a9eb669b64fce899
Reviewed-on: https://go-review.googlesource.com/c/go/+/307050
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This change omits the stack check on arm64 when the size of a stack
frame is less than obj.StackSmall.
The effect is not very significant, because CL 92040 has set the leaf
function with a framesize of 0 to NOFRAME, which makes the code
prologue on arm64 much closer to other architectures. But it is not
without effect, for example, it is effective for std library functions
such as runtime.usleep, fmt.isSpace, etc. Since this CL is very simple,
I think this optimization is worthwhile.
compilecmp results on linux/arm64:
name old time/op new time/op delta
Template 284ms ± 1% 283ms ± 1% -0.29% (p=0.000 n=50+50)
Unicode 125ms ± 2% 125ms ± 1% ~ (p=0.445 n=49+49)
GoTypes 1.70s ± 1% 1.69s ± 1% -0.36% (p=0.000 n=50+50)
Compiler 124ms ± 1% 124ms ± 1% -0.31% (p=0.003 n=48+48)
SSA 12.7s ± 1% 12.7s ± 1% ~ (p=0.117 n=50+50)
Flate 172ms ± 1% 171ms ± 1% -0.55% (p=0.000 n=50+50)
GoParser 265ms ± 1% 264ms ± 1% -0.23% (p=0.000 n=47+48)
Reflect 653ms ± 1% 646ms ± 1% -1.12% (p=0.000 n=48+50)
Tar 246ms ± 1% 245ms ± 1% -0.41% (p=0.000 n=46+47)
XML 328ms ± 1% 327ms ± 1% -0.18% (p=0.020 n=46+50)
LinkCompiler 599ms ± 1% 598ms ± 1% ~ (p=0.237 n=50+49)
ExternalLinkCompiler 1.87s ± 1% 1.87s ± 1% -0.18% (p=0.000 n=50+50)
LinkWithoutDebugCompiler 365ms ± 1% 364ms ± 2% ~ (p=0.131 n=50+50)
[Geo mean] 490ms 488ms -0.32%
name old alloc/op new alloc/op delta
Template 38.8MB ± 1% 38.8MB ± 1% +0.16% (p=0.013 n=47+49)
Unicode 28.4MB ± 0% 28.4MB ± 0% ~ (p=0.512 n=46+44)
GoTypes 169MB ± 1% 169MB ± 1% ~ (p=0.628 n=50+50)
Compiler 23.2MB ± 1% 23.2MB ± 1% ~ (p=0.424 n=46+44)
SSA 1.55GB ± 0% 1.55GB ± 0% ~ (p=0.603 n=48+50)
Flate 23.7MB ± 1% 23.8MB ± 1% ~ (p=0.797 n=50+50)
GoParser 35.3MB ± 1% 35.3MB ± 1% ~ (p=0.932 n=49+49)
Reflect 85.0MB ± 0% 84.9MB ± 0% -0.05% (p=0.038 n=45+40)
Tar 34.4MB ± 1% 34.5MB ± 1% ~ (p=0.288 n=50+50)
XML 43.8MB ± 2% 43.9MB ± 2% ~ (p=0.798 n=46+49)
LinkCompiler 136MB ± 0% 136MB ± 0% ~ (p=0.750 n=50+50)
ExternalLinkCompiler 127MB ± 0% 127MB ± 0% ~ (p=0.852 n=50+50)
LinkWithoutDebugCompiler 84.1MB ± 0% 84.1MB ± 0% ~ (p=0.890 n=50+50)
[Geo mean] 70.4MB 70.4MB +0.01%
file before after Δ %
addr2line 4006004 4006012 +8 +0.000%
asm 4936863 4936919 +56 +0.001%
buildid 2594947 2594859 -88 -0.003%
cgo 4399702 4399806 +104 +0.002%
compile 22233139 22233107 -32 -0.000%
cover 4443681 4443785 +104 +0.002%
dist 3365902 3365806 -96 -0.003%
doc 3776175 3776231 +56 +0.001%
fix 3218624 3218552 -72 -0.002%
nm 3923345 3923329 -16 -0.000%
objdump 4295473 4295673 +200 +0.005%
pack 2390561 2390497 -64 -0.003%
pprof 12866419 12866275 -144 -0.001%
test2json 2587113 2587129 +16 +0.001%
trace 9609814 9609710 -104 -0.001%
vet 6790272 6791048 +776 +0.011%
total 106832751 106833455 +704 +0.001%
Updates #13379 (for arm64)
Change-Id: I07664ab0b978c66c0b18b8482222e9ba3772290d
Reviewed-on: https://go-review.googlesource.com/c/go/+/302853
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: eric fang <eric.fang@arm.com>
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
|
|
Constant of BITCON type can be moved into RSP by MOVD or MOVW instructions
directly, this CL enables this format of these two instructions.
For 32-bit ADDWop instructions with constant, rewrite the high 32-bit
to be a repetition of the low 32-bit, just as ANDWop instructions do,
so that we can optimize ADDW $bitcon, Rn, Rt as:
MOVW $bitcon, Rtmp
ADDW Rtmp, Rn, Rt
The original code is:
MOVZ $bitcon_low, Rtmp
MOVK $bitcon_high,Rtmp
ADDW Rtmp, Rn, Rt
Change-Id: I30e71972bcfd6470a8b6e6ffbacaee79d523805a
Reviewed-on: https://go-review.googlesource.com/c/go/+/289649
Trust: eric fang <eric.fang@arm.com>
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Frame pointers were already enabled on linux, darwin, ios,
but not freebsd, android, openbsd, netbsd.
But the space was reserved on all platforms, leading to
two different arm64 framepointer conditions in different
parts of the code, one of which had no name
(framepointer_enabled || GOARCH == "arm64",
which might have been "framepointer_space_reserved").
So on the disabled systems, the stack layouts were still
set up for frame pointers and the only difference was not
actually maintaining the FP register in the generated code.
Reduce complexity by just enabling the frame pointer
completely on all the arm64 systems.
This commit passes on freebsd, android, netbsd.
I have not been able to try it on openbsd.
This CL is part of a stack adding windows/arm64
support (#36439), intended to land in the Go 1.17 cycle.
This CL is, however, not windows/arm64-specific.
It is cleanup meant to make the port (and future ports) easier.
Change-Id: I83bd23369d24b76db4c6a648fa74f6917819a093
Reviewed-on: https://go-review.googlesource.com/c/go/+/288814
Trust: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
The runtime traceback code has its own definition of which functions
mark the top frame of a stack, separate from the TOPFRAME bits that
exist in the assembly and are passed along in DWARF information.
It's error-prone and redundant to have two different sources of truth.
This CL provides the actual TOPFRAME bits to the runtime, so that
the runtime can use those bits instead of reinventing its own category.
This CL also adds a new bit, SPWRITE, which marks functions that
write directly to SP (anything but adding and subtracting constants).
Such functions must stop a traceback, because the traceback has no
way to rederive the SP on entry. Again, the runtime has its own definition
which is mostly correct, but also missing some functions. During ordinary
goroutine context switches, such functions do not appear on the stack,
so the incompleteness in the runtime usually doesn't matter.
But profiling signals can arrive at any moment, and the runtime may
crash during traceback if it attempts to unwind an SP-writing frame
and gets out-of-sync with the actual stack. The runtime contains code
to try to detect likely candidates but again it is incomplete.
Deriving the SPWRITE bit automatically from the actual assembly code
provides the complete truth, and passing it to the runtime lets the
runtime use it.
This CL is part of a stack adding windows/arm64
support (#36439), intended to land in the Go 1.17 cycle.
This CL is, however, not windows/arm64-specific.
It is cleanup meant to make the port (and future ports) easier.
Change-Id: I227f53b23ac5b3dabfcc5e8ee3f00df4e113cf58
Reviewed-on: https://go-review.googlesource.com/c/go/+/288800
Trust: Russ Cox <rsc@golang.org>
Trust: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This creates space for a different kind of extension field
in LSym without making the struct any larger.
(There are many LSym, so we care about keeping the struct small.)
Change-Id: Ib16edb9e15f54c2a7351c8b875e19684058711e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/243943
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Currently we don't use sigaltstack on darwin/arm64, as is not
supported on iOS. However, it is supported on macOS. Use it.
(iOS remains unchanged.)
Change-Id: Icc154c5e2edf2dbdc8ca68741ad9157fc15a72ee
Reviewed-on: https://go-review.googlesource.com/c/go/+/256917
Trust: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
We currently use two fields to store the targets of branches.
Some phases use p.To.Val, some use p.Pcond. Rewrite so that
every branch instruction uses p.To.Val.
p.From.Val is also used in rare instances.
Introduce a Pool link for use by arm/arm64, instead of
repurposing Pcond.
This is a cleanup CL in preparation for some stack frame CLs.
Change-Id: If8239177e4b1ea2bccd0608eb39553d23210d405
Reviewed-on: https://go-review.googlesource.com/c/go/+/251437
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This reverts CL 243318.
Reason for revert: Seems to be crashing some builders.
Change-Id: I2ffc59bc5535be60b884b281c8d0eff4647dc756
Reviewed-on: https://go-review.googlesource.com/c/go/+/251169
Reviewed-by: Bryan C. Mills <bcmills@google.com>
|
|
We currently use two fields to store the targets of branches.
Some phases use p.To.Val, some use p.Pcond. Rewrite so that
every branch instruction uses p.To.Val.
p.From.Val is also used in rare instances.
Introduce a Pool link for use by arm/arm64, instead of
repurposing Pcond.
This is a cleanup CL in preparation for some stack frame CLs.
Change-Id: I9055bf0a1d986aff421e47951a1dedc301c846f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/243318
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
I think they are no longer experimental status. Might as well promote
them to permanent.
Change-Id: Id1259601b3dd2061dd60df86ee48080bfb575d2f
Reviewed-on: https://go-review.googlesource.com/c/go/+/249857
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
This has already been done for s390x, ppc64. This CL is for
all the other architectures.
Fixes #40796
Change-Id: Idd1816e057df63022d47e99fa06617811d8c8489
Reviewed-on: https://go-review.googlesource.com/c/go/+/248684
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Change-Id: I7e9ec2835f1a7d9821dff3e868aebf07fece8137
Reviewed-on: https://go-review.googlesource.com/c/go/+/223297
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
When there are both a synchronous preemption request (by
clobbering the stack guard) and an asynchronous one (by signal),
the running goroutine may observe the synchronous request first
in stack bounds check, and go to the path of calling morestack.
If the preemption signal arrives at this point before the call to
morestack, the goroutine will be asynchronously preempted,
entering the scheduler. When it is resumed, the scheduler clears
the preemption request, unclobbers the stack guard. But the
resumed goroutine will still call morestack, as it is already on
its way. morestack will, as there is no preemption request,
double the stack unnecessarily. If this happens multiple times,
the stack may grow too big, although only a small amount is
actually used.
To fix this, we mark the stack bounds check and the call to
morestack async-nonpreemptible, starting after the memory
instruction (mostly a load, on x86 CMP with memory).
Not done for Wasm as it does not support async preemption.
Fixes #35470.
Change-Id: Ibd7f3d935a3649b80f47539116ec9b9556680cf2
Reviewed-on: https://go-review.googlesource.com/c/go/+/207350
Reviewed-by: David Chase <drchase@google.com>
|
|
iOS does not support SA_ONSTACK. The signal handler runs on the
G stack. Any writes below the SP may be clobbered by the signal
handler (even without call injection). So we save LR after
decrementing SP on iOS.
Updates #35439.
Change-Id: Ia6d7a0669e0bcf417b44c031d2e26675c1184165
Reviewed-on: https://go-review.googlesource.com/c/go/+/206418
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
When the frame size is large, we generate
MOVD.P 0xf0(SP), LR
ADD $(framesize-0xf0), SP
This is problematic: after the first instruction, we have a
partial frame of size (framesize-0xf0). If we try to unwind the
stack at this point, we'll try to read the LR from the stack at
0(SP) (the new SP) as the frame size is not 0. But this slot does
not contain a valid LR.
Fix this by not changing SP in two instructions. Instead,
generate
MOVD (SP), LR
ADD $framesize, SP
This affects not only async preemption but also profiling. So we
change the generated instructions, instead of marking unsafe
point.
Change-Id: I4e78c62d50ffc4acff70ccfbfec16a5ccae17f24
Reviewed-on: https://go-review.googlesource.com/c/go/+/206057
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|