path: root/src/cmd/internal/obj/ppc64
Age | Commit message | Author
2021-04-05 | cmd/internal/obj/ppc64: simplify huge frame prologue | Austin Clements
CL 307010 for ppc64. I spent a long time trying to figure out how to use the carry bit from ADDCCC to further simplify this (like what we do on arm64), but gave up after I couldn't figure out how to access the carry bit without just adding more instructions. Change-Id: I6cad51b93616865b203cb16554f16121375aabbc Reviewed-on: https://go-review.googlesource.com/c/go/+/307149 Trust: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-03-29 | cmd/internal/obj/ppc64: remove bogus MOVBU optab entry | Paul E. Murphy
This was missed in https://golang.org/cl/303329 . It is another impossible usage of MOVBU as a load like "MOVBU 0(rX), rY, rZ" or "MOVBU rX(rB), rY, rZ". Change-Id: Ib3dd984b6424907498ed65b798649f0b990d50a7 Reviewed-on: https://go-review.googlesource.com/c/go/+/304471 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-03-24 | cmd/internal/obj: remove bogus load/store optab entries from ppc64 | Paul E. Murphy
No valid operation should match those removed by this patch. They kind of look as if they match X-form load/stores on ppc64, but the second argument is always ignored when translating to machine code. Similarly, it should be noted an X-form memory access encodes into an Addr which is classified as a ZOREG argument with a non-zero index, and a register type Addr. Change-Id: I1adbb020d1b2612b18949d0e7eda05dbb3e8a25c Reviewed-on: https://go-review.googlesource.com/c/go/+/303329 Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Carlos Eduardo Seo <carlos.seo@linaro.org>
2021-03-19 | cmd/internal/obj/ppc64: consolidate memory classifications | Paul E. Murphy
Several classifications exist only to help disambiguate an implied register (i.e $0/R0 as the implied second register argument when loading constants, or pseudo-registers used exclusively by the assembler front-end). The register determination is folded into getimpliedreg. The classifications and their related optab entries are removed or updated. Change-Id: Iffb167aa9fa57fbc1a537c79fbdfb36cb38f9d95 Reviewed-on: https://go-review.googlesource.com/c/go/+/301789 Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Cherry Zhang <cherryyz@google.com>
2021-03-15 | cmd/internal/obj: reorder ppc64 MOV* optab entries | Paul E. Murphy
These are always sorted and grouped during initialization of the actual opcode -> optab map generation. Thus, their initial location in optab is mostly aimed at readability. This cleanup is intended to ease reviewing of future patches which simplify, combine, or remove MOV* optab entries. Change-Id: I87583ed34fab79e0f625880f419d499939e2a9e1 Reviewed-on: https://go-review.googlesource.com/c/go/+/300612 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Trust: Emmanuel Odeke <emmanuel@orijtech.com>
2021-03-10 | cmd/internal/obj: remove param element from ppc64 optab | Paul E. Murphy
This is rarely used, and is implied based on the memory type of the operand. This is a step towards simplifying the MOV* pseudo opcodes on ppc64. Similarly, remove the bogus param value from AVMULESB. Change-Id: Ibad4d045ec6d8c5163a468b2db1dfb762ef674ee Reviewed-on: https://go-review.googlesource.com/c/go/+/300177 Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Trust: Ian Lance Taylor <iant@golang.org>
2021-03-09 | cmd/asm,cmd/compile: support 5 operand RLWNM/RLWMI on ppc64 | Paul E. Murphy
These instructions are actually 5 argument opcodes as specified by the ISA. Prior to this patch, the MB and ME arguments were merged into a single bitmask operand to work around the limitations of the ppc64 assembler backend. This limitation no longer exists. Thus, we can pass operands for these opcodes without having to merge the MB and ME arguments in the assembler frontend or compiler backend. Likewise, support for 4 operand variants is unchanged. Change-Id: Ib086774f3581edeaadfd2190d652aaaa8a90daeb Reviewed-on: https://go-review.googlesource.com/c/go/+/298750 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org> Trust: Carlos Eduardo Seo <carlos.seo@linaro.org>
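As a rough illustration of what "merging MB and ME into a single bitmask" means, the sketch below builds the 32-bit mask that a pair of MB/ME values selects, using the Power ISA convention that bit 0 is the most significant bit. The function name `maskFromMBME` is hypothetical and is not taken from the assembler; it only models the arithmetic, and assumes both arguments are in the range 0..31.

```go
package main

import "fmt"

// maskFromMBME is a hypothetical helper, not part of the Go assembler.
// It returns the 32-bit mask with ones from bit mb through bit me, where
// bit 0 is the most significant bit. When mb > me the mask wraps around
// both ends of the word. Both arguments are assumed to be in 0..31.
func maskFromMBME(mb, me uint) uint32 {
	left := uint32(0xFFFFFFFF) >> mb         // ones from bit mb through bit 31
	right := uint32(0xFFFFFFFF) << (31 - me) // ones from bit 0 through bit me
	if mb <= me {
		return left & right
	}
	return left | right
}

func main() {
	fmt.Printf("%#010x\n", maskFromMBME(24, 31)) // low byte
	fmt.Printf("%#010x\n", maskFromMBME(0, 7))   // high byte
	fmt.Printf("%#010x\n", maskFromMBME(28, 3))  // wrapping mask
}
```

With five operands the assembler can carry MB and ME directly, so a conversion of this kind is no longer needed on the way in.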
2021-03-04 | cmd/internal: Add 6 args to ppc64 optab | Paul E. Murphy
This is a preparatory patch to support 6 arg opcodes on POWER10, and simplify 5 arg opcode processing (e.g RLWNM and similar). This expands the optab structure, and renames a4 arguments to a6. No actual change in functionality is made. Change-Id: I785e4177778e4bf1326cf8e46e8aeaaa0e4d406b Reviewed-on: https://go-review.googlesource.com/c/go/+/295031 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Carlos Eduardo Seo <carlos.seo@linaro.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> Trust: Keith Randall <khr@golang.org>
2021-02-22 | cmd/internal: cleanup ppc64 optab structure | Paul E. Murphy
This is a no-functionality change to begin the process of supporting more than 6 operands. This rewrites the table to use named arguments, and removes default initialized argument values. The following sed regexes rewrote the table:
  s/{\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\)}/{as:\1,a1:\2,a2:\3,a3:\4,a4:\5,type_:\6,size:\7,param:\8}
  s/a[1-4]: C_NONE, //g
  s/, param: 0//
Change-Id: I5f4de9da75f2fb3964d625d6b4e2f1ce1e29cc47 Reviewed-on: https://go-review.googlesource.com/c/go/+/294189 Trust: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
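To make the first sed expression easier to follow, here is a minimal Go sketch of the same rewrite using the standard `regexp` package. The function name `nameFields` and the sample table line are made up for illustration; only the pattern and the field names (`as`, `a1`..`a4`, `type_`, `size`, `param`) come from the commit message. The second and third expressions, which drop `C_NONE` arguments and zero `param` values, are not reproduced here.

```go
package main

import (
	"fmt"
	"regexp"
)

// positional matches one eight-field positional struct literal. It is a Go
// translation of the first sed expression quoted in the commit message.
var positional = regexp.MustCompile(
	`\{([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)\}`)

// nameFields (a hypothetical helper) rewrites a positional literal into one
// with named fields, keeping each captured value exactly as written.
func nameFields(line string) string {
	return positional.ReplaceAllString(line,
		"{as:${1},a1:${2},a2:${3},a3:${4},a4:${5},type_:${6},size:${7},param:${8}}")
}

func main() {
	// Hypothetical sample entry; real optab lines may differ.
	fmt.Println(nameFields("{AMOVD, C_REG, C_NONE, C_NONE, C_REG, 1, 4, 0},"))
}
```

The captured values keep their leading spaces, so the output is not yet formatted the way the follow-up expressions expect; presumably a formatting pass ran in between, but that is an assumption.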
2021-02-19 | cmd/asm, cmd/link, runtime: introduce FuncInfo flag bits | Russ Cox
The runtime traceback code has its own definition of which functions mark the top frame of a stack, separate from the TOPFRAME bits that exist in the assembly and are passed along in DWARF information. It's error-prone and redundant to have two different sources of truth. This CL provides the actual TOPFRAME bits to the runtime, so that the runtime can use those bits instead of reinventing its own category. This CL also adds a new bit, SPWRITE, which marks functions that write directly to SP (anything but adding and subtracting constants). Such functions must stop a traceback, because the traceback has no way to rederive the SP on entry. Again, the runtime has its own definition which is mostly correct, but also missing some functions. During ordinary goroutine context switches, such functions do not appear on the stack, so the incompleteness in the runtime usually doesn't matter. But profiling signals can arrive at any moment, and the runtime may crash during traceback if it attempts to unwind an SP-writing frame and gets out-of-sync with the actual stack. The runtime contains code to try to detect likely candidates but again it is incomplete. Deriving the SPWRITE bit automatically from the actual assembly code provides the complete truth, and passing it to the runtime lets the runtime use it. This CL is part of a stack adding windows/arm64 support (#36439), intended to land in the Go 1.17 cycle. This CL is, however, not windows/arm64-specific. It is cleanup meant to make the port (and future ports) easier. Change-Id: I227f53b23ac5b3dabfcc5e8ee3f00df4e113cf58 Reviewed-on: https://go-review.googlesource.com/c/go/+/288800 Trust: Russ Cox <rsc@golang.org> Trust: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
2020-11-19 | cmd/compile,cmd/asm: fix function pointer call perf regression on ppc64 | Paul E. Murphy
by inserting hint when using bclrl. Using this instruction as subroutine call is not the expected default behavior, and as a result confuses the branch predictor. The default expected behavior is a conditional return from a subroutine. We can change this assumption by encoding a hint this is not a subroutine return. The regex benchmarks are a pretty good example of how much this hint can help generic ppc64le code on a power9 machine: name old time/op new time/op delta Find 606ns ± 0% 447ns ± 0% -26.27% FindAllNoMatches 309ns ± 0% 205ns ± 0% -33.72% FindString 609ns ± 0% 451ns ± 0% -26.04% FindSubmatch 734ns ± 0% 594ns ± 0% -19.07% FindStringSubmatch 706ns ± 0% 574ns ± 0% -18.83% Literal 177ns ± 0% 136ns ± 0% -22.89% NotLiteral 4.69µs ± 0% 2.34µs ± 0% -50.14% MatchClass 6.05µs ± 0% 3.26µs ± 0% -46.08% MatchClass_InRange 5.93µs ± 0% 3.15µs ± 0% -46.86% ReplaceAll 3.15µs ± 0% 2.18µs ± 0% -30.77% AnchoredLiteralShortNonMatch 156ns ± 0% 109ns ± 0% -30.61% AnchoredLiteralLongNonMatch 192ns ± 0% 136ns ± 0% -29.34% AnchoredShortMatch 268ns ± 0% 209ns ± 0% -22.00% AnchoredLongMatch 472ns ± 0% 357ns ± 0% -24.30% OnePassShortA 1.16µs ± 0% 0.87µs ± 0% -25.03% NotOnePassShortA 1.34µs ± 0% 1.20µs ± 0% -10.63% OnePassShortB 940ns ± 0% 655ns ± 0% -30.29% NotOnePassShortB 873ns ± 0% 703ns ± 0% -19.52% OnePassLongPrefix 258ns ± 0% 155ns ± 0% -40.13% OnePassLongNotPrefix 943ns ± 0% 529ns ± 0% -43.89% MatchParallelShared 591ns ± 0% 436ns ± 0% -26.31% MatchParallelCopied 596ns ± 0% 435ns ± 0% -27.10% QuoteMetaAll 186ns ± 0% 186ns ± 0% -0.16% QuoteMetaNone 55.9ns ± 0% 55.9ns ± 0% +0.02% Compile/Onepass 9.64µs ± 0% 9.26µs ± 0% -3.97% Compile/Medium 21.7µs ± 0% 20.6µs ± 0% -4.90% Compile/Hard 174µs ± 0% 174µs ± 0% +0.07% Match/Easy0/16 7.35ns ± 0% 7.34ns ± 0% -0.11% Match/Easy0/32 116ns ± 0% 97ns ± 0% -16.27% Match/Easy0/1K 592ns ± 0% 562ns ± 0% -5.04% Match/Easy0/32K 12.6µs ± 0% 12.5µs ± 0% -0.64% Match/Easy0/1M 556µs ± 0% 556µs ± 0% -0.00% Match/Easy0/32M 17.7ms ± 0% 17.7ms ± 0% +0.05% 
Match/Easy0i/16 7.34ns ± 0% 7.35ns ± 0% +0.10% Match/Easy0i/32 2.82µs ± 0% 1.64µs ± 0% -41.71% Match/Easy0i/1K 83.2µs ± 0% 48.2µs ± 0% -42.06% Match/Easy0i/32K 2.13ms ± 0% 1.80ms ± 0% -15.34% Match/Easy0i/1M 68.1ms ± 0% 57.6ms ± 0% -15.31% Match/Easy0i/32M 2.18s ± 0% 1.80s ± 0% -17.52% Match/Easy1/16 7.36ns ± 0% 7.34ns ± 0% -0.24% Match/Easy1/32 118ns ± 0% 96ns ± 0% -18.72% Match/Easy1/1K 2.46µs ± 0% 1.58µs ± 0% -35.65% Match/Easy1/32K 80.2µs ± 0% 54.6µs ± 0% -31.92% Match/Easy1/1M 2.75ms ± 0% 1.88ms ± 0% -31.66% Match/Easy1/32M 87.5ms ± 0% 59.8ms ± 0% -31.62% Match/Medium/16 7.34ns ± 0% 7.34ns ± 0% +0.01% Match/Medium/32 2.60µs ± 0% 1.50µs ± 0% -42.61% Match/Medium/1K 78.1µs ± 0% 43.7µs ± 0% -44.06% Match/Medium/32K 2.08ms ± 0% 1.52ms ± 0% -27.11% Match/Medium/1M 66.5ms ± 0% 48.6ms ± 0% -26.96% Match/Medium/32M 2.14s ± 0% 1.60s ± 0% -25.18% Match/Hard/16 7.35ns ± 0% 7.35ns ± 0% +0.03% Match/Hard/32 3.58µs ± 0% 2.44µs ± 0% -31.82% Match/Hard/1K 108µs ± 0% 75µs ± 0% -31.04% Match/Hard/32K 2.79ms ± 0% 2.25ms ± 0% -19.30% Match/Hard/1M 89.4ms ± 0% 72.2ms ± 0% -19.26% Match/Hard/32M 2.91s ± 0% 2.37s ± 0% -18.60% Match/Hard1/16 11.1µs ± 0% 8.3µs ± 0% -25.07% Match/Hard1/32 21.4µs ± 0% 16.1µs ± 0% -24.85% Match/Hard1/1K 658µs ± 0% 498µs ± 0% -24.27% Match/Hard1/32K 12.2ms ± 0% 11.7ms ± 0% -4.60% Match/Hard1/1M 391ms ± 0% 374ms ± 0% -4.40% Match/Hard1/32M 12.6s ± 0% 12.0s ± 0% -4.68% Match_onepass_regex/16 870ns ± 0% 611ns ± 0% -29.79% Match_onepass_regex/32 1.58µs ± 0% 1.08µs ± 0% -31.48% Match_onepass_regex/1K 45.7µs ± 0% 30.3µs ± 0% -33.58% Match_onepass_regex/32K 1.45ms ± 0% 0.97ms ± 0% -33.20% Match_onepass_regex/1M 46.2ms ± 0% 30.9ms ± 0% -33.01% Match_onepass_regex/32M 1.46s ± 0% 0.99s ± 0% -32.02% name old alloc/op new alloc/op delta Find 0.00B 0.00B 0.00% FindAllNoMatches 0.00B 0.00B 0.00% FindString 0.00B 0.00B 0.00% FindSubmatch 48.0B ± 0% 48.0B ± 0% 0.00% FindStringSubmatch 32.0B ± 0% 32.0B ± 0% 0.00% Compile/Onepass 4.02kB ± 0% 4.02kB ± 0% 0.00% 
Compile/Medium 9.39kB ± 0% 9.39kB ± 0% 0.00% Compile/Hard 84.7kB ± 0% 84.7kB ± 0% 0.00% Match_onepass_regex/16 0.00B 0.00B 0.00% Match_onepass_regex/32 0.00B 0.00B 0.00% Match_onepass_regex/1K 0.00B 0.00B 0.00% Match_onepass_regex/32K 0.00B 0.00B 0.00% Match_onepass_regex/1M 5.00B ± 0% 3.00B ± 0% -40.00% Match_onepass_regex/32M 136B ± 0% 68B ± 0% -50.00% name old allocs/op new allocs/op delta Find 0.00 0.00 0.00% FindAllNoMatches 0.00 0.00 0.00% FindString 0.00 0.00 0.00% FindSubmatch 1.00 ± 0% 1.00 ± 0% 0.00% FindStringSubmatch 1.00 ± 0% 1.00 ± 0% 0.00% Compile/Onepass 52.0 ± 0% 52.0 ± 0% 0.00% Compile/Medium 112 ± 0% 112 ± 0% 0.00% Compile/Hard 424 ± 0% 424 ± 0% 0.00% Match_onepass_regex/16 0.00 0.00 0.00% Match_onepass_regex/32 0.00 0.00 0.00% Match_onepass_regex/1K 0.00 0.00 0.00% Match_onepass_regex/32K 0.00 0.00 0.00% Match_onepass_regex/1M 0.00 0.00 0.00% Match_onepass_regex/32M 2.00 ± 0% 1.00 ± 0% -50.00% name old speed new speed delta QuoteMetaAll 75.2MB/s ± 0% 75.3MB/s ± 0% +0.15% QuoteMetaNone 465MB/s ± 0% 465MB/s ± 0% -0.02% Match/Easy0/16 2.18GB/s ± 0% 2.18GB/s ± 0% +0.10% Match/Easy0/32 276MB/s ± 0% 330MB/s ± 0% +19.46% Match/Easy0/1K 1.73GB/s ± 0% 1.82GB/s ± 0% +5.29% Match/Easy0/32K 2.60GB/s ± 0% 2.62GB/s ± 0% +0.64% Match/Easy0/1M 1.89GB/s ± 0% 1.89GB/s ± 0% +0.00% Match/Easy0/32M 1.89GB/s ± 0% 1.89GB/s ± 0% -0.05% Match/Easy0i/16 2.18GB/s ± 0% 2.18GB/s ± 0% -0.10% Match/Easy0i/32 11.4MB/s ± 0% 19.5MB/s ± 0% +71.48% Match/Easy0i/1K 12.3MB/s ± 0% 21.2MB/s ± 0% +72.62% Match/Easy0i/32K 15.4MB/s ± 0% 18.2MB/s ± 0% +18.12% Match/Easy0i/1M 15.4MB/s ± 0% 18.2MB/s ± 0% +18.12% Match/Easy0i/32M 15.4MB/s ± 0% 18.6MB/s ± 0% +21.21% Match/Easy1/16 2.17GB/s ± 0% 2.18GB/s ± 0% +0.24% Match/Easy1/32 271MB/s ± 0% 333MB/s ± 0% +23.07% Match/Easy1/1K 417MB/s ± 0% 648MB/s ± 0% +55.38% Match/Easy1/32K 409MB/s ± 0% 600MB/s ± 0% +46.88% Match/Easy1/1M 381MB/s ± 0% 558MB/s ± 0% +46.33% Match/Easy1/32M 383MB/s ± 0% 561MB/s ± 0% +46.25% Match/Medium/16 2.18GB/s ± 0% 
2.18GB/s ± 0% -0.01% Match/Medium/32 12.3MB/s ± 0% 21.4MB/s ± 0% +74.13% Match/Medium/1K 13.1MB/s ± 0% 23.4MB/s ± 0% +78.73% Match/Medium/32K 15.7MB/s ± 0% 21.6MB/s ± 0% +37.23% Match/Medium/1M 15.8MB/s ± 0% 21.6MB/s ± 0% +36.93% Match/Medium/32M 15.7MB/s ± 0% 21.0MB/s ± 0% +33.67% Match/Hard/16 2.18GB/s ± 0% 2.18GB/s ± 0% -0.03% Match/Hard/32 8.93MB/s ± 0% 13.10MB/s ± 0% +46.70% Match/Hard/1K 9.48MB/s ± 0% 13.74MB/s ± 0% +44.94% Match/Hard/32K 11.7MB/s ± 0% 14.5MB/s ± 0% +23.87% Match/Hard/1M 11.7MB/s ± 0% 14.5MB/s ± 0% +23.87% Match/Hard/32M 11.6MB/s ± 0% 14.2MB/s ± 0% +22.86% Match/Hard1/16 1.44MB/s ± 0% 1.93MB/s ± 0% +34.03% Match/Hard1/32 1.49MB/s ± 0% 1.99MB/s ± 0% +33.56% Match/Hard1/1K 1.56MB/s ± 0% 2.05MB/s ± 0% +31.41% Match/Hard1/32K 2.68MB/s ± 0% 2.80MB/s ± 0% +4.48% Match/Hard1/1M 2.68MB/s ± 0% 2.80MB/s ± 0% +4.48% Match/Hard1/32M 2.66MB/s ± 0% 2.79MB/s ± 0% +4.89% Match_onepass_regex/16 18.4MB/s ± 0% 26.2MB/s ± 0% +42.41% Match_onepass_regex/32 20.2MB/s ± 0% 29.5MB/s ± 0% +45.92% Match_onepass_regex/1K 22.4MB/s ± 0% 33.8MB/s ± 0% +50.54% Match_onepass_regex/32K 22.6MB/s ± 0% 33.9MB/s ± 0% +49.67% Match_onepass_regex/1M 22.7MB/s ± 0% 33.9MB/s ± 0% +49.27% Match_onepass_regex/32M 23.0MB/s ± 0% 33.9MB/s ± 0% +47.14% Fixes #42709 Change-Id: Ice07fec2de4c5b1302febf8c2978ae8c1e4fd3e5 Reviewed-on: https://go-review.googlesource.com/c/go/+/271337 Reviewed-by: Cherry Zhang <cherryyz@google.com> Trust: Lynn Boger <laboger@linux.vnet.ibm.com> Trust: Carlos Eduardo Seo <carlos.seo@linaro.org>
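The hint mentioned in this commit lives in the two-bit BH field of the `bclr[l]` instruction word. The sketch below assembles that XL-form word by hand to show where the field sits. The function `encodeBclr` is hypothetical and is not the assembler's code. The choice of BH=1 ("not a subroutine return") follows the Power ISA description of the field, and assuming that this is the exact value the Go toolchain emits is a guess.

```go
package main

import "fmt"

// encodeBclr is a hypothetical helper that assembles an XL-form bclr[l]
// word: primary opcode 19 in the top six bits, then BO, BI, the BH hint,
// extended opcode 16, and the LK bit.
func encodeBclr(bo, bi, bh uint32, link bool) uint32 {
	w := 19<<26 | (bo&31)<<21 | (bi&31)<<16 | (bh&3)<<11 | 16<<1
	if link {
		w |= 1 // LK=1: save the return address in LR
	}
	return w
}

func main() {
	// BO=20 means "branch always"; BI is then ignored.
	fmt.Printf("blr            %#08x\n", encodeBclr(20, 0, 0, false))
	fmt.Printf("bclrl, BH=0    %#08x\n", encodeBclr(20, 0, 0, true))
	fmt.Printf("bclrl, BH=1    %#08x\n", encodeBclr(20, 0, 1, true)) // assumed hint value
}
```

Only one bit of the instruction word changes, which is consistent with the commit describing this as a hint rather than a different instruction.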
2020-11-04 | cmd/internal/obj: add prologue_end DWARF stmt for ppc64 | Derek Parker
This patch adds a prologue_end statement to the DWARF information for the ppc64 arch. Prologue end is used by the Delve debugger in order to determine where to set a breakpoint to avoid the stacksplit prologue. Updates #36612 Change-Id: Ifb16c1476fe716a0bf493c5486d1d88ebe8d0253 GitHub-Last-Rev: 77a217206d529df8bf8d4ef10a5347b6ae524612 GitHub-Pull-Request: golang/go#42261 Reviewed-on: https://go-review.googlesource.com/c/go/+/266019 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com> Trust: Dmitri Shuralyov <dmitshur@golang.org>
2020-11-04 | cmd/asm: fix rlwnm reg,reg,const,reg encoding on ppc64 | Paul E. Murphy
The wrong value for the first reg parameter was selected. Likewise the wrong opcode was selected. This should match rlwnm (rrr type), not rlwinm (irr type). Similarly, fix the optab matching rules so clrlslwi does not match reg,reg,const,reg arguments. This is not a valid operand combination for clrlslwi. Fixes #42368 Change-Id: I4eb16d45a760b9fd3f497ef9863f82465351d39f Reviewed-on: https://go-review.googlesource.com/c/go/+/267421 Reviewed-by: Cherry Zhang <cherryyz@google.com> Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-10-16 | cmd/internal/obj/ppc64,cmd/asm/internal/asm/testdata: fix up ppc64 testcases | Lynn Boger
When a fix was made at the end of the last release related to NOPs, it was discovered that the ppc64.s testcase was out of date and contained comments that weren't being processed. Essentially the instructions in that test were being assembled but there was no verification that the encodings were correct. The ppc64enc.s file was mostly complete and included the valid encodings for verification. This change moves ppc64enc.s to ppc64.s and adds the instructions that were missing. This also adds a minor fix to asm9.go on the assembly of the addex that was discovered during this testing. Change-Id: Iaada1563b137849ad195fa88f32ecc9ab3e1e95f Reviewed-on: https://go-review.googlesource.com/c/go/+/260217 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-10-16 | cmd/internal/obj: move LSym.Func into LSym.Extra | Russ Cox
This creates space for a different kind of extension field in LSym without making the struct any larger. (There are many LSym, so we care about keeping the struct small.) Change-Id: Ib16edb9e15f54c2a7351c8b875e19684058711e5 Reviewed-on: https://go-review.googlesource.com/c/go/+/243943 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-06 | cmd/compile,cmd/internal/obj/ppc64: use mulli where possible | Lynn Boger
This adds support to allow the use of mulli when one of the multiply operands is a constant that fits in 16 bits. This especially helps in the case where this instruction appears in a loop since the load of the constant is not being moved out of the loop. Some improvements seen in compress/flate on power9:
Decode/Digits/Huffman/1e4       259µs ± 0%   261µs ± 0%   +0.57%  (p=1.000 n=1+1)
Decode/Digits/Huffman/1e5      2.43ms ± 0%  2.45ms ± 0%   +0.79%  (p=1.000 n=1+1)
Decode/Digits/Huffman/1e6      23.9ms ± 0%  24.2ms ± 0%   +0.86%  (p=1.000 n=1+1)
Decode/Digits/Speed/1e4         278µs ± 0%   279µs ± 0%   +0.34%  (p=1.000 n=1+1)
Decode/Digits/Speed/1e5        2.80ms ± 0%  2.81ms ± 0%   +0.29%  (p=1.000 n=1+1)
Decode/Digits/Speed/1e6        28.0ms ± 0%  28.1ms ± 0%   +0.28%  (p=1.000 n=1+1)
Decode/Digits/Default/1e4       278µs ± 0%   278µs ± 0%   +0.28%  (p=1.000 n=1+1)
Decode/Digits/Default/1e5      2.68ms ± 0%  2.69ms ± 0%   +0.19%  (p=1.000 n=1+1)
Decode/Digits/Default/1e6      26.6ms ± 0%  26.6ms ± 0%   +0.21%  (p=1.000 n=1+1)
Decode/Digits/Compression/1e4   278µs ± 0%   278µs ± 0%   +0.00%  (p=1.000 n=1+1)
Decode/Digits/Compression/1e5  2.68ms ± 0%  2.69ms ± 0%   +0.21%  (p=1.000 n=1+1)
Decode/Digits/Compression/1e6  26.6ms ± 0%  26.6ms ± 0%   +0.07%  (p=1.000 n=1+1)
Decode/Newton/Huffman/1e4       322µs ± 0%   312µs ± 0%   -2.84%  (p=1.000 n=1+1)
Decode/Newton/Huffman/1e5      3.11ms ± 0%  2.91ms ± 0%   -6.41%  (p=1.000 n=1+1)
Decode/Newton/Huffman/1e6      31.4ms ± 0%  29.3ms ± 0%   -6.85%  (p=1.000 n=1+1)
Decode/Newton/Speed/1e4         282µs ± 0%   269µs ± 0%   -4.69%  (p=1.000 n=1+1)
Decode/Newton/Speed/1e5        2.29ms ± 0%  2.20ms ± 0%   -4.13%  (p=1.000 n=1+1)
Decode/Newton/Speed/1e6        22.7ms ± 0%  21.3ms ± 0%   -6.06%  (p=1.000 n=1+1)
Decode/Newton/Default/1e4       254µs ± 0%   237µs ± 0%   -6.60%  (p=1.000 n=1+1)
Decode/Newton/Default/1e5      1.86ms ± 0%  1.75ms ± 0%   -5.99%  (p=1.000 n=1+1)
Decode/Newton/Default/1e6      18.1ms ± 0%  17.4ms ± 0%   -4.10%  (p=1.000 n=1+1)
Decode/Newton/Compression/1e4   254µs ± 0%   244µs ± 0%   -3.91%  (p=1.000 n=1+1)
Decode/Newton/Compression/1e5  1.85ms ± 0%  1.79ms ± 0%   -3.10%  (p=1.000 n=1+1)
Decode/Newton/Compression/1e6  18.0ms ± 0%  17.3ms ± 0%   -3.88%  (p=1.000 n=1+1)
Change-Id: I840320fab1c4bf64c76b001c2651ab79f23df4eb Reviewed-on: https://go-review.googlesource.com/c/go/+/259444 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com> Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
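The condition "a constant that fits in 16 bits" can be sketched as a range check on a signed 16-bit immediate, which is the operand form mulli takes. The helper name `fitsInt16` is hypothetical; the actual compiler rules may express the check differently.

```go
package main

import "fmt"

// fitsInt16 (a hypothetical helper) reports whether c survives a round
// trip through a signed 16-bit value, i.e. whether it could be used as
// the immediate operand of an instruction such as mulli.
func fitsInt16(c int64) bool {
	return c == int64(int16(c))
}

func main() {
	for _, c := range []int64{10, 32767, 32768, -32768, -32769} {
		fmt.Printf("%6d fits: %v\n", c, fitsInt16(c))
	}
}
```

When the check fails, the constant still has to be loaded into a register first, which is the case the commit message says was not being hoisted out of loops.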
2020-10-01 | cmd/compile,cmd/internal/obj/ppc64: fix some shift rules due to a regression | Lynn Boger
A recent change to improve shifts was generating some invalid cases when the rule was based on an AND. The extended mnemonics CLRLSLDI and CLRLSLWI only allow certain values for the operands and in the mask case those values were not being checked properly. This adds a check to those rules to verify that the 'b' and 'n' values used when an AND was part of the rule have correct values. There was a bug in some diag messages in asm9. The message expected 3 values but only provided 2. Those are corrected here also. The test/codegen/shift.go was updated to add a few more cases to check for the case mentioned here. Some of the comments that mention the order of operands in these extended mnemonics were wrong and those have been corrected. Fixes #41683. Change-Id: If5bb860acaa5051b9e0cd80784b2868b85898c31 Reviewed-on: https://go-review.googlesource.com/c/go/+/258138 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
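The operand restriction behind this fix can be modelled in a few lines. In the Power ISA, `clrlsldi` with clear count b and shift count n is only defined for n <= b <= 63: it clears the leftmost b bits and then shifts left by n. The function below is a hypothetical software model of that behaviour, not code from the compiler rules or from asm9.

```go
package main

import "fmt"

// clrlsldi is a hypothetical software model of the extended mnemonic:
// clear the leftmost b bits of x, then shift the result left by n.
// It reports false when the operands violate n <= b <= 63, which is
// the kind of case the rules must reject.
func clrlsldi(x uint64, b, n uint) (uint64, bool) {
	if n > b || b > 63 {
		return 0, false
	}
	return (x << b >> b) << n, true
}

func main() {
	v, ok := clrlsldi(0xFFFFFFFFFFFFFFFF, 56, 3)
	fmt.Printf("%#x %v\n", v, ok)

	_, ok = clrlsldi(1, 2, 3) // n > b: not a valid operand combination
	fmt.Println(ok)
}
```

An AND with a low-bit mask followed by a left shift only maps onto this form when the mask width and shift amount satisfy the same inequality.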
2020-09-28 | cmd/asm,cmd/compile,cmd/internal/obj/ppc64: add extswsli support on power9 | Lynn Boger
This adds support for the extswsli instruction which combines extsw followed by a shift. New benchmark demonstrates the improvement:
name      old time/op  new time/op  delta
ExtShift  1.34µs ± 0%  1.30µs ± 0%  -3.15%  (p=0.057 n=4+3)
Change-Id: I21b410676fdf15d20e0cbbaa75d7c6dcd3bbb7b0 Reviewed-on: https://go-review.googlesource.com/c/go/+/257017 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com> Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
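In Go terms, the pattern that extswsli covers is a sign extension of the low 32 bits followed by a left shift. The sketch below models that computation; the function name simply mirrors the mnemonic and is not part of any Go package.

```go
package main

import "fmt"

// extswsli is a hypothetical software model of the instruction: sign-extend
// the low 32 bits of x to 64 bits (what extsw does), then shift left by sh.
func extswsli(x uint64, sh uint) int64 {
	return int64(int32(x)) << sh
}

func main() {
	fmt.Println(extswsli(0xFFFFFFFF, 2))  // low word is -1
	fmt.Println(extswsli(0x100000003, 4)) // upper bits are discarded first
}
```

Before this change the same result presumably needed two instructions, one for the extension and one for the shift.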
2020-09-17 | cmd/compile: use combined shifts to improve array addressing on ppc64x | Lynn Boger
This change adds rules to find pairs of instructions that can be combined into a single shift. These instruction sequences are common in array addressing within loops. Improvements can be seen in many crypto packages and the hash packages. These are based on the extended mnemonics found in the ISA sections C.8.1 and C.8.2. Some rules in PPC64.rules were moved because the ordering prevented some matching. The following results were generated on power9.
hash/crc32:
CRC32/poly=Koopman/size=40/align=0     195ns ± 0%   163ns ± 0%  -16.41%
CRC32/poly=Koopman/size=40/align=1     200ns ± 0%   163ns ± 0%  -18.50%
CRC32/poly=Koopman/size=512/align=0   1.98µs ± 0%  1.67µs ± 0%  -15.46%
CRC32/poly=Koopman/size=512/align=1   1.98µs ± 0%  1.69µs ± 0%  -14.80%
CRC32/poly=Koopman/size=1kB/align=0   3.90µs ± 0%  3.31µs ± 0%  -15.27%
CRC32/poly=Koopman/size=1kB/align=1   3.85µs ± 0%  3.31µs ± 0%  -14.15%
CRC32/poly=Koopman/size=4kB/align=0   15.3µs ± 0%  13.1µs ± 0%  -14.22%
CRC32/poly=Koopman/size=4kB/align=1   15.4µs ± 0%  13.1µs ± 0%  -14.79%
CRC32/poly=Koopman/size=32kB/align=0   137µs ± 0%   105µs ± 0%  -23.56%
CRC32/poly=Koopman/size=32kB/align=1   137µs ± 0%   105µs ± 0%  -23.53%
crypto/rc4:
RC4_128   733ns ± 0%   650ns ± 0%  -11.32%  (p=1.000 n=1+1)
RC4_1K   5.80µs ± 0%  5.17µs ± 0%  -10.89%  (p=1.000 n=1+1)
RC4_8K   45.7µs ± 0%  40.8µs ± 0%  -10.73%  (p=1.000 n=1+1)
crypto/sha1:
Hash8Bytes     635ns ± 0%   613ns ± 0%  -3.46%  (p=1.000 n=1+1)
Hash320Bytes  2.30µs ± 0%  2.18µs ± 0%  -5.38%  (p=1.000 n=1+1)
Hash1K        5.88µs ± 0%  5.38µs ± 0%  -8.62%  (p=1.000 n=1+1)
Hash8K        42.0µs ± 0%  37.9µs ± 0%  -9.75%  (p=1.000 n=1+1)
There are other improvements found in golang.org/x/crypto which are all in the range of 5-15%. Change-Id: I193471fbcf674151ffe2edab212799d9b08dfb8c Reviewed-on: https://go-review.googlesource.com/c/go/+/252097 Trust: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2020-08-31 | cmd/compile,cmd/asm: simplify recording of branch targets, take 2 | Keith Randall
We currently use two fields to store the targets of branches. Some phases use p.To.Val, some use p.Pcond. Rewrite so that every branch instruction uses p.To.Val. p.From.Val is also used in rare instances. Introduce a Pool link for use by arm/arm64, instead of repurposing Pcond. This is a cleanup CL in preparation for some stack frame CLs. Change-Id: If8239177e4b1ea2bccd0608eb39553d23210d405 Reviewed-on: https://go-review.googlesource.com/c/go/+/251437 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-08-28 | Revert "cmd/compile,cmd/asm: simplify recording of branch targets" | Keith Randall
This reverts CL 243318. Reason for revert: Seems to be crashing some builders. Change-Id: I2ffc59bc5535be60b884b281c8d0eff4647dc756 Reviewed-on: https://go-review.googlesource.com/c/go/+/251169 Reviewed-by: Bryan C. Mills <bcmills@google.com>
2020-08-27 | cmd/compile,cmd/asm: simplify recording of branch targets | Keith Randall
We currently use two fields to store the targets of branches. Some phases use p.To.Val, some use p.Pcond. Rewrite so that every branch instruction uses p.To.Val. p.From.Val is also used in rare instances. Introduce a Pool link for use by arm/arm64, instead of repurposing Pcond. This is a cleanup CL in preparation for some stack frame CLs. Change-Id: I9055bf0a1d986aff421e47951a1dedc301c846f8 Reviewed-on: https://go-review.googlesource.com/c/go/+/243318 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-08-19 | cmd/asm,cmd/internal/obj/ppc64: add {l,st}xvx power9 instructions | Paul E. Murphy
These are the indexed vsx load operations with the same endian and alignment benefits of {l,st}vx. Likewise, clean up redundant comments in op{load,store}x and fix ISA 3.0 typos nearby. Change-Id: Ie1ace17c6150cf9168a834e435114028ff6eb07c Reviewed-on: https://go-review.googlesource.com/c/go/+/249025 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-08-13 | cmd/internal/obj/ppc64: don't remove NOP in assembler | Lynn Boger
Previously, the assembler removed NOPs from the Prog list in obj9.go. NOPs shouldn't be removed if they were added as an inline mark, as described in the issue below. Fixes #40689 Once the NOPs were left in the Prog list, some instructions were flagged as invalid because they had an operand which was not represented in optab. In order to preserve the previous assembler behavior, entries were added to optab for those operand cases. They were not flagged as errors before because the NOP instructions were removed before the code to check the valid opcode/operand combinations. Change-Id: Iae5145f94459027cf458e914d7c5d6089807ccf8 Reviewed-on: https://go-review.googlesource.com/c/go/+/247842 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Michael Munday <mike.munday@ibm.com> Reviewed-by: Keith Randall <khr@golang.org>
2020-07-08 | [dev.link] cmd/compile: make compiler-generated ppc64 TOC symbols static | Than McIntosh
Set the AttrStatic flag on compiler-emitted TOC symbols for ppc64; these symbols don't need to go into the final symbol table in Go binaries. This fixes a buglet introduced by CL 240539 that was causing failures on the aix builder. Change-Id: If8b63bcf6d2791f1ec5a0c371d2d11e806202fd2 Reviewed-on: https://go-review.googlesource.com/c/go/+/241637 Run-TryBot: Than McIntosh <thanm@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-06-18 | cmd/internal/obj/ppc64: update doc | Lynn Boger
This updates the ppc64 asm doc file, including information on updates to the objdump, correcting information on operand order, and adding some information on shifts. Change-Id: Ib8ed53eac86c2121ea5b657c361ad92aae31cb32 Reviewed-on: https://go-review.googlesource.com/c/go/+/238237 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-04-29 | cmd/compile,cmd/internal/obj/ppc64: use mod instructions on power9 | Lynn Boger
This updates the PPC64.rules file to use the MOD instructions that are available in power9. Prior to power9 this is done using a longer sequence with multiply and divide. Included in this change is removal of the REM* opcode variations that set the CC or OV bits since their settings are based on the DIV and are not appropriate for the REM. Change-Id: Iceed9ce33e128e1911c15592ee674276ce8ba3fa Reviewed-on: https://go-review.googlesource.com/c/go/+/229761 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
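The "longer sequence with multiply and divide" can be written out as plain arithmetic: divide, multiply the quotient back, and subtract. The sketch below shows that identity for signed 64-bit values. The name `remViaDiv` is hypothetical, and the divisor is assumed to be non-zero.

```go
package main

import "fmt"

// remViaDiv (a hypothetical helper) computes a remainder the way a target
// without a MOD instruction has to: quotient first, then a - q*b.
// The divisor b is assumed to be non-zero.
func remViaDiv(a, b int64) int64 {
	q := a / b       // divide
	return a - q*b   // multiply back and subtract
}

func main() {
	fmt.Println(remViaDiv(17, 5), 17%5)
	fmt.Println(remViaDiv(-17, 5), -17%5)
}
```

On power9 the three steps collapse into a single MOD instruction, which is what the updated rules select.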
2020-04-23 | cmd/asm,cmd/internal/obj/ppc64: update instructions and tests | Lynn Boger
This change adds some instructions that were missing from the ppc64 assembler, mostly power9 but a few others from earlier. Tests in cmd/asm for ppc64 were updated: ppc64.s includes the new instructions, and ppc64enc.s now includes not only the new instructions but most ppc64 opcodes to provide a more complete test of the ppc64 assembler. The ppc64 instruction set is used for linux/ppc64le, linux/ppc64, and aix/ppc64. Change-Id: I8695f89dbca06174847963f4ef869f2e584d5bbf Reviewed-on: https://go-review.googlesource.com/c/go/+/229479 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-15 | cmd/internal/obj/ppc64: add support for PCALIGN 32 | Lynn Boger
This adds support for the PCALIGN value 32. When this directive occurs code will be aligned to 32 bytes unless too many NOPs are needed, and then will fall back to 16 byte alignment. On Linux the function's alignment is promoted from 16 to 32 in functions where PCALIGN 32 appears. On AIX the function's alignment is left at 16 due to complexity with modifying its alignment, which means code will be aligned to at least 16, possibly 32 at times, which is still good. Test was updated to accept new value. Change-Id: I28e72d5f30ca472ed9ba736ddeabfea192d11797 Reviewed-on: https://go-review.googlesource.com/c/go/+/228258 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
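The fallback described here can be sketched as a small padding calculation. Everything named below is hypothetical: the commit message does not say how many NOPs count as "too many", so the limit is left as a parameter (`maxNops`) rather than guessed, and the functions are not the assembler's own.

```go
package main

import "fmt"

// padding returns how many bytes must be added to pc to reach the next
// multiple of align. align is assumed to be a power of two.
func padding(pc, align int64) int64 {
	return -pc & (align - 1)
}

// alignPad (a hypothetical helper) tries 32-byte alignment first and falls
// back to 16 bytes when the padding would need more than maxNops 4-byte
// NOPs. maxNops is an assumed knob; the real threshold is not stated in
// the commit message.
func alignPad(pc, maxNops int64) (pad, used int64) {
	pad = padding(pc, 32)
	if pad/4 <= maxNops {
		return pad, 32
	}
	return padding(pc, 16), 16
}

func main() {
	for _, pc := range []int64{36, 56, 64} {
		pad, used := alignPad(pc, 4)
		fmt.Printf("pc=%d pad=%d align=%d\n", pc, pad, used)
	}
}
```

Falling back to 16 still leaves the code aligned to the default function alignment, which matches the remark that AIX code is "aligned to at least 16".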
2020-04-13 | Revert "cmd/internal/obj/ppc64: add support for pcalign 32 on ppc64x" | Bryan C. Mills
This reverts CL 227775. Reason for revert: broke aix-ppc64 builder (https://build.golang.org/log/cf3b4f9fd09ee81f422a4b58488b9d0a2692c949). Change-Id: I2095bb2aadb5a4064eb89ad353012503faf15709 Reviewed-on: https://go-review.googlesource.com/c/go/+/228143 Run-TryBot: Bryan C. Mills <bcmills@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-10cmd/internal/obj/ppc64: add support for pcalign 32 on ppc64xLynn Boger
Previous PCALIGN support on ppc64x only accepted 8 and 16 byte alignment since the default function alignment was 16. Now that the function's alignment can be set to a larger value when needed, PCALIGN can accept 32. When this happens then the function's alignment will be changed to 32. Test has been updated to recognize this new value. Change-Id: If82c3cd50d7c686fcf8a9e819708b15660cdfa63 Reviewed-on: https://go-review.googlesource.com/c/go/+/227775 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-09cmd/internal/obj/ppc64: leverage AND operation to calculate remainderAndy Pan
Change-Id: I03e2a573eb778591071db4f783585a5d71a14c03 Reviewed-on: https://go-review.googlesource.com/c/go/+/227005 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
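The commit above has no body, so the exact expressions it touched are not shown here. As a general illustration of the technique named in its title: for a power-of-two divisor n, the remainder x % n equals x & (n-1). The helper name `remPow2` below is hypothetical.

```go
package main

import "fmt"

// remPow2 returns x % n for a power-of-two n by masking the low bits
// instead of dividing. It panics for any other n, because the identity
// x % n == x & (n-1) only holds for powers of two.
func remPow2(x, n uint64) uint64 {
	if n == 0 || n&(n-1) != 0 {
		panic("remPow2: n must be a power of two")
	}
	return x & (n - 1)
}

func main() {
	for _, pc := range []uint64{0, 4, 12, 20, 36} {
		fmt.Printf("pc=%d  pc%%16=%d  pc&15=%d\n", pc, pc%16, remPow2(pc, 16))
	}
}
```

This matters in an assembler mainly for alignment checks, where the divisor is always 8, 16, or 32.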
2020-03-24cmd/internal/obj/ppc64: fix PCALIGN on ppc64leLynn Boger
This fixes a potential issue with the previous implementation of PCALIGN on ppc64. Previously PCALIGN was processed inside of asmout and indicated the padding size by setting the value in the optab, changing it back after the alignment instructions were added. Now PCALIGN is processed outside of asmout, and optab is not changed. Change-Id: I8b0093a0e2b7e06176af27e05150d04ae2c55d60 Reviewed-on: https://go-review.googlesource.com/c/go/+/225198 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-03-13cmd/internal/obj/ppc64: clean up some opcodesLynn Boger
This does some clean up of the ppc64 opcodes to remove names from the opcode list that don't actually assemble. At one time names were added to this list to represent opcode "classes" to organize other opcodes that have the same set of operand combinations. Since this is not documented, it is confusing as to which opcodes can be used in an asm file and which can't, and which opcodes should be supported in the disassembler. It is clearer for the user if the list of Go opcodes are all opcodes that can be assembled with names that match the ppc64 opcode where possible. I found this when trying to use Go opcode XXLAND in an asm file which seems like it should map to ppc64 xxland but when used it gets this error:

  go tool asm test_xxland.s
  asm: bad r/r, r/r/r or r/r/r/r opcode XXLAND
  asm: assembly failed

This change removes the opcodes that are only used for opcode "classes" and fixes the case statement where they are referenced. This also fixes XXLAND and XXPERM which are opcodes that should assemble to their corresponding ppc64 opcode but do not. Change-Id: I52300db6b22f7f8b3dd3491c3f35a384b943352c Reviewed-on: https://go-review.googlesource.com/c/go/+/223138 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-11-27cmd/internal/obj: mark split-stack prologue nonpreemptibleCherry Zhang
When there are both a synchronous preemption request (by clobbering the stack guard) and an asynchronous one (by signal), the running goroutine may observe the synchronous request first in stack bounds check, and go to the path of calling morestack. If the preemption signal arrives at this point before the call to morestack, the goroutine will be asynchronously preempted, entering the scheduler. When it is resumed, the scheduler clears the preemption request, unclobbers the stack guard. But the resumed goroutine will still call morestack, as it is already on its way. morestack will, as there is no preemption request, double the stack unnecessarily. If this happens multiple times, the stack may grow too big, although only a small amount is actually used. To fix this, we mark the stack bounds check and the call to morestack async-nonpreemptible, starting after the memory instruction (mostly a load, on x86 CMP with memory). Not done for Wasm as it does not support async preemption. Fixes #35470. Change-Id: Ibd7f3d935a3649b80f47539116ec9b9556680cf2 Reviewed-on: https://go-review.googlesource.com/c/go/+/207350 Reviewed-by: David Chase <drchase@google.com>
2019-11-07cmd/internal/obj/ppc64: handle MOVDU for SP deltaCherry Zhang
If a MOVDU instruction is used with an offset of SP, the instruction changes SP therefore needs an SP delta, which is used for generating the PC-SP table for stack unwinding. MOVDU is frequently used for allocating the frame and saving the LR in the same instruction, so this is particularly useful. Change-Id: Icb63eb55aa01c3dc350ac4e4cff6371f4c3c5867 Reviewed-on: https://go-review.googlesource.com/c/go/+/205279 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-11-07cmd/compile, cmd/internal/obj/ppc64: mark unsafe pointsCherry Zhang
We'll use CTR as a scratch register for call injection. Mark code sequences that use CTR as unsafe for async preemption. Currently it is only used in LoweredZero and LoweredMove. It is unfortunate that they are nonpreemptible. But I think it is still better than using LR for call injection and marking all leaf functions nonpreemptible. Also mark the prologue of large frame functions nonpreemptible, as we write below SP. Change-Id: I05a75431499f3f4b2f23651a7b17f7fcf2afbe06 Reviewed-on: https://go-review.googlesource.com/c/go/+/203823 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-11-07cmd/compile, cmd/internal/obj/ppc64: use LR for indirect callsCherry Zhang
On PPC64, indirect calls can be made through LR or CTR. Currently both are used. This CL changes it to always use LR. For async preemption, to return from the injected call, we need an indirect jump back to the PC we preempted. This jump can be made through LR or CTR. So we'll have to clobber either LR or CTR. Currently, LR is used more frequently. In particular, for a leaf function, LR is live throughout the function. We don't want to make leaf functions nonpreemptible. So we choose CTR for the call injection. For code sequences that use CTR, if it is ok to use another register, change them to use it. Plus, it is a call so it will clobber LR anyway. It doesn't need to also clobber CTR (even without preemption). Change-Id: I07bd0e93b94a1a3aa2be2cd465801136165d8ab8 Reviewed-on: https://go-review.googlesource.com/c/go/+/203822 Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2019-11-04cmd/internal/obj/ppc64: add support for DQ-form instructionsCarlos Eduardo Seo
POWER9 (ISA 3.0) introduced a new format of load/store instructions to implement indexed load/store quadword, using an immediate value instead of a register index. This change adds support for this new instruction encoding and adds the new load/store quadword instructions (lxv/stxv) to the assembler. This change also adds the missing XX1-form loads/stores (halfword and byte) included in ISA 3.0. Change-Id: Ibcdf53c342d7a352d64a9403c2fe7b25be9c3b24 Reviewed-on: https://go-review.googlesource.com/c/go/+/200399 Run-TryBot: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2019-10-01cmd/internal/obj/ppc64: Fix ADUFFxxxx generation on aix/ppc64Clément Chigot
ADUFFCOPY and ADUFFZERO instructions weren't handled by rewriteToUseTOC. These instructions are considered as a simple branch except with -dynlink where they become an indirect call. Fixes #34604 Change-Id: I16ca6a152164966fb9cbf792219a8a39aad2b53b Reviewed-on: https://go-review.googlesource.com/c/go/+/197842 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-08-29cmd/internal/obj/ppc64: add support for vmrgow,vmrgewLynn Boger
This adds support for ppc64 instructions vmrgow and vmrgew which are needed for an improved implementation of chacha20. Change-Id: I967a2de54236bcc573a99f7e2b222d5a8bb29e03 Reviewed-on: https://go-review.googlesource.com/c/go/+/192117 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2019-06-25cmd/internal/obj/ppc64: add doc.goLynn Boger
Adding some details on writing Go assembler for ppc64. Change-Id: I46fc6b75ee6c36946f90280b2b670e0d32bcc6b1 Reviewed-on: https://go-review.googlesource.com/c/go/+/183837 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-03-20cmd/compile/internal, cmd/internal/obj/ppc64: generate new count trailing zeros instructions on POWER9Carlos Eduardo Seo
This change adds new POWER9 instructions for counting trailing zeros (CNTTZW/CNTTZD) to the assembler and generates them in SSA when GOPPC64=power9.

name                 old time/op  new time/op  delta
TrailingZeros-160    1.59ns ±20%  1.45ns ±10%  -8.81%  (p=0.000 n=14+13)
TrailingZeros8-160   1.55ns ±23%  1.62ns ±44%    ~     (p=0.593 n=13+15)
TrailingZeros16-160  1.78ns ±23%  1.62ns ±38%  -9.31%  (p=0.003 n=14+14)
TrailingZeros32-160  1.64ns ±10%  1.49ns ± 9%  -9.15%  (p=0.000 n=13+14)
TrailingZeros64-160  1.53ns ± 6%  1.45ns ± 5%  -5.38%  (p=0.000 n=15+13)

Change-Id: I365e6ff79f3ce4d8ebe089a6a86b1771853eb596 Reviewed-on: https://go-review.googlesource.com/c/go/+/167517 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2019-03-14cmd/internal/obj/ppc64: fix wrong register encoding in XX1-Form instructionsCarlos Eduardo Seo
A bug in the encoding of XX1-Form is flipping bit 31 of such instructions. This may result in register clobbering when using VSX instructions. This was not exposed before because we currently don't generate these instructions in SSA, and the asm files in which they are present aren't affected by register clobbering. This change fixes the bug and adds a testcase for the problem. Fixes #30112 Change-Id: I77b606159ae1efea33d2ba3e1c74b7fae8d5d2e7 Reviewed-on: https://go-review.googlesource.com/c/go/+/163759 Reviewed-by: Bryan C. Mills <bcmills@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Bryan C. Mills <bcmills@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-01-09cmd/dist, cmd/link, runtime: fix stack size when cross-compiling aix/ppc64Clément Chigot
This commit allows cross-compiling for aix/ppc64. The nosplit limit must be twice as large as on other platforms because of AIX syscalls. The stack limit, especially stackGuardMultiplier, was set by cmd/dist during the bootstrap and doesn't depend on the GOOS/GOARCH target. Fixes #29572 Change-Id: Id51e38885e1978d981aa9e14972eaec17294322e Reviewed-on: https://go-review.googlesource.com/c/157117 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-12-17cmd: improve aix/ppc64 new symbol addressingClément Chigot
This commit updates the new symbol addressing made for aix/ppc64 according to feedback given in CL 151039. Change-Id: Ic4eb9943dc520d65f7d084adf8fa9a2530f4d3f9 Reviewed-on: https://go-review.googlesource.com/c/151302 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-11-28cmd/asm,cmd/internal/obj/ppc64: add VPERMXOR to ppc64 assemblerLynn Boger
VPERMXOR is missing from the Go assembler for ppc64. It has the same format as VPERM. It was requested by an external user so they could write an optimized algorithm in asm. Change-Id: Icf4c682f7f46716ccae64e6ae3d62e8cec67f6c1 Reviewed-on: https://go-review.googlesource.com/c/151578 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2018-11-27cmd: fix symbols addressing for aix/ppc64Clément Chigot
This commit changes the code generated for addressing symbols on the AIX operating system. On AIX, every symbol access must be done via another symbol near the TOC, named TOC anchor or TOC entry. This TOC anchor is a pointer to the symbol address. During the Progedit function, when a symbol access is detected, its instructions are modified to create a load on its TOC anchor and retrieve the symbol. Change-Id: I00cf8f49c13004bc99fa8af13d549a709320f797 Reviewed-on: https://go-review.googlesource.com/c/151039 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-10-23cmd/asm/internal,cmd/internal/obj/ppc64: add alignment directive to asm for ppc64xLynn Boger
This adds support for an alignment directive that can be used within Go asm to indicate preferred code alignment for ppc64x. This is intended to be used with loops to improve performance. This change only adds the directive and aligns the code based on it. Follow up changes will modify asm functions for ppc64x that benefit from preferred alignment. Fixes #14935

Here is one example of the improvement in memmove when the directive is used on the loops in the code:

Memmove/64    8.74ns ± 0%  8.64ns ± 0%   -1.19%  (p=0.000 n=8+8)
Memmove/128   11.5ns ± 0%  11.0ns ± 0%   -4.35%  (p=0.000 n=8+8)
Memmove/256   23.0ns ± 0%  15.3ns ± 0%  -33.48%  (p=0.000 n=8+8)
Memmove/512   31.7ns ± 0%  31.8ns ± 0%   +0.32%  (p=0.000 n=8+8)
Memmove/1024  52.3ns ± 0%  43.9ns ± 0%  -16.10%  (p=0.000 n=8+8)
Memmove/2048  93.2ns ± 0%  76.2ns ± 0%  -18.24%  (p=0.000 n=8+8)
Memmove/4096   174ns ± 0%   141ns ± 0%  -18.97%  (p=0.000 n=8+8)

Change-Id: I200d77e923dd5d78c22fe3f8eb142a8fbaff57bf Reviewed-on: https://go-review.googlesource.com/c/144218 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-10-10cmd/internal/obj/ppc64: generate float 0 more efficiently on ppc64xLynn Boger
This change makes use of a VSX instruction to generate the float 0 value instead of generating a constant in memory and loading it from there. In the +0 case this uses 1 instruction instead of 2 and avoids a memory reference; in the -0 case it uses 2 instructions but still avoids the memory reference. Since this is done in the assembler for ppc64x, an update has been made to the assembler test. Change-Id: Ief7dddcb057bfb602f78215f6947664e8c841464 Reviewed-on: https://go-review.googlesource.com/c/139420 Reviewed-by: Michael Munday <mike.munday@ibm.com>