aboutsummaryrefslogtreecommitdiff
path: root/src/cmd/internal/obj
AgeCommit message (Collapse)Author
2017-06-23cmd/internal/obj/arm: fix wrong encoding of MULBBBen Shi
"MULBB R1, R2, R3" is encoded to 0xe163f182, which should be 0xe1630182. This patch fix it. fix #20764 Change-Id: I9d3c3ffa40ecde86638e5e083eacc67578caebf4 Reviewed-on: https://go-review.googlesource.com/46491 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-06-23cmd/internal/obj/arm: fix setting U bit in shifted register offset of MOVBSBen Shi
"MOVBS.U R0<<0(R1), R2" is assembled to 0xe19120d0 (ldrsb r2, [r1, r0]), but it is expected to be 0xe11120d0 (ldrsb r2, [r1, -r0]). This patch fixes it and adds more encoding tests. fixes #20701 Change-Id: Ic1fb46438d71a978dbef06d97494a70c95fcbf3a Reviewed-on: https://go-review.googlesource.com/45996 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-06-15runtime: restore arm assembly stubs for div/modKeith Randall
These are used by DIV[U] and MOD[U] assembly instructions. Add a test in the stdlib so we actually exercise linking to these routines. Update #19507 Change-Id: I0d8e19a53e3744abc0c661ea95486f94ec67585e Reviewed-on: https://go-review.googlesource.com/45703 Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-06-14runtime: remove unused arm assembly for div/modKeith Randall
Also add runtime· prefixes to the code that is still used. Fixes #19507 Change-Id: Ib6da6b2a9e398061d3f93958ee1258295b6cc33b Reviewed-on: https://go-review.googlesource.com/45699 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-06-13cmd/internal/obj/arm: fix MOVW to/from FPSRBen Shi
"MOVW FPSR, g" should be assembled to 0xeef1aa10, but actually 0xee30a110 (RFS). "MOVW g, FPSR" should be 0xeee1aa10, but actually 0xee20a110 (WFS). They should be updated to VFP forms, since the ARM back end doesn't support non-VFP floating points. The patch fixes them and adds more assembly encoding tests. fixes #20643 Change-Id: I3b29490337c6e8d891b400fcedc8b0a87b82b527 Reviewed-on: https://go-review.googlesource.com/45276 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-06-09cmd/internal/obj/arm: fix encoding of move register/immediate to CPSRBen Shi
"MOVW R1, CPSR" is assembled to 0xe129f001, which should be 0xe12cf001. "MOVW $255, CPSR" is assembled to 0xe329f0ff, which should be 0xe32cf0ff. This patch fixes them and adds more assembly encoding tests. fix #20626 Change-Id: Iefc945879ea774edf40438ce39f52c144e1501a1 Reviewed-on: https://go-review.googlesource.com/45170 Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-06-07cmd/internal/obj/arm: don't split instructions on NaClCherry Zhang
We insert guard instructions after each "dangerous" instruction to make NaCl's validator happy. This happens before asmout. If in asmout an instruction is split to two dangerous instructions, but only one guard instruction is inserted, the validation fails. Therefore don't split instructions on NaCl. Fixes #20595. Change-Id: Ie34f209bc7d907d6d16ecef6721f88420981ac01 Reviewed-on: https://go-review.googlesource.com/45021 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-06-05cmd/internal/obj/ppc64: fix MOVFL REG, CONSTIan Lance Taylor
The MOVFL instruction (which external PPC64 docs call mtcrf) can take either a CR register or a constant. It doesn't make sense to specify both, as the CR register implies the constant value. Specifying either a register or a constant is enforced by the implementation in the asmout method (case 69). However, the optab was providing a form that specified both a constant and a CR register, and was not providing a form that specified only a constant. This CL fixes the optab table to provide a form that takes only a constant. No test because I don't know where to write it. The next CL in this series will use the new instruction format. Change-Id: I8bb5d3ed60f483b54c341ce613931e126f7d7be6 Reviewed-on: https://go-review.googlesource.com/44732 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2017-06-05cmd/internal/obj/arm: fix constant decompositionBen Shi
There are two issues in constant decomposition. 1. A typo in "func immrot2s" blocks "case 107" of []optab be triggered. 2. Though "ADD $0xffff, R0, R0" is decomposed to "ADD $0xff00, R0, R0" and "ADD $0x00ff, R0, R0" as expected, "ADD $0xffff, R0" still uses the constant pool, which should be the same as "ADD $0xffff, R0, R0". This patch fixes them and adds more instruction encoding tests. fix #20516 Change-Id: Icd7bdfa1946b29db15580dcb429111266f1384c6 Reviewed-on: https://go-review.googlesource.com/44335 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-05-25cmd/internal/obj/arm: fix illegal forms of ARM VFP instructionBen Shi
"ADDF F0, R1, F2" is silently accepted by the arm assembler and assembled to the same binary code of "ADDF F0, F1, F2". So does "CMPF F0, R1". "ABSF F0, F1, F2" is also silently accepted and assembled to a different instruction. This patch reports those illegal forms and adds test cases. fix #20464 Change-Id: I88b80dc29de24c6266ac7bf7bce1578c5adbc68c Reviewed-on: https://go-review.googlesource.com/43931 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-05-22cmd/internal/obj/arm: report invalid .S/.P/.W suffix in ARM instructionsBen Shi
Many instructions can not have a .S suffix, such as MULS, SWI, CLZ, CMP, STREX and others. And so do .P and .W suffixes. Even wrong assembly code is generated for some instructions with invalid suffixes. This patch tries to simplify .S/.W/.P checks. And a wrong assembly test for arm is added. fixes #20377 Change-Id: Iba1c99d9e6b7b16a749b4d93ca2102e17c5822fe Reviewed-on: https://go-review.googlesource.com/43561 Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-05-18cmd/compile: output DWARF lexical blocks for local variablesAlessandro Arzilli
Change compiler and linker to emit DWARF lexical blocks in .debug_info section when compiling with -N -l. Version of debug_info is updated from DWARF v2 to DWARF v3 since version 2 does not allow lexical blocks with discontinuous PC ranges. Remaining open problems: - scope information is removed from inlined functions - variables records do not have DW_AT_start_scope attributes so a variable will shadow other variables with the same name as soon as its containing scope begins, even before its declaration. Updates #6913. Updates #12899. Change-Id: Idc6808788512ea20e7e45bcf782453acb416fb49 Reviewed-on: https://go-review.googlesource.com/40095 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-05-18cmd/internal/obj/arm: remove illegal form of the SWI instructionBen Shi
SWI only support "SWI $imm", but currently "SWI (Reg)" is also accepted. This patch fixes it. And more instruction tests are added to cmd/asm/internal/asm/testdata/arm.s fixes #20375 Change-Id: Id437d853924a403e41da9b6cbddd20d994b624ff Reviewed-on: https://go-review.googlesource.com/43552 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-05-16cmd/internal/obj/mips: add support of LLV, SCV, NOOP instructionsCherry Zhang
LLV and SCV are 64-bit load-linked and store-conditional. They were used in runtime as #define WORD. Change them to normal instruction form. NOOP is hardware no-op. It was written as WORD $0. Make a name for it for better disassembly output. Fixes #12561. Fixes #18238. Change-Id: I82c667ce756fa83ef37b034b641e8c4366335e83 Reviewed-on: https://go-review.googlesource.com/40297 Reviewed-by: Minux Ma <minux@golang.org> Run-TryBot: Minux Ma <minux@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-05-11cmd/internal/obj: continue to optimize ARM's constant poolBen Shi
Both Keith's https://go-review.googlesource.com/c/41612/ and and Ben's https://go-review.googlesource.com/c/41679/ optimized ARM's constant pool. But neither was complete. First, BIC was forgotten. 1. "BIC $0xff00ff00, Reg" can be optimized to "BIC $0xff000000, Reg BIC $0x0000ff00, Reg" 2. "BIC $0xffff00ff, Reg" can be optimized to "AND $0x0000ff00, Reg" 3. "AND $0xffff00ff, Reg" can be optimized to "BIC $0x0000ff00, Reg" Second, break a non-ARMImmRot to the subtraction of two ARMImmRots was left as TODO. 1. "ADD $0x00fffff0, Reg" can be optimized to "ADD $0x01000000, Reg SUB $0x00000010, Reg" 2. "SUB $0x00fffff0, Reg" can be optimized to "SUB $0x01000000, Reg ADD $0x00000010, Reg" This patch fixes them and issue #19844. The go1 benchmark shows improvements. name old time/op new time/op delta BinaryTree17-4 41.4s ± 1% 41.7s ± 1% +0.54% (p=0.000 n=50+49) Fannkuch11-4 24.7s ± 1% 25.1s ± 0% +1.70% (p=0.000 n=50+49) FmtFprintfEmpty-4 853ns ± 1% 852ns ± 1% ~ (p=0.833 n=50+50) FmtFprintfString-4 1.33µs ± 1% 1.33µs ± 1% ~ (p=0.163 n=50+50) FmtFprintfInt-4 1.40µs ± 1% 1.40µs ± 0% ~ (p=0.293 n=50+35) FmtFprintfIntInt-4 2.09µs ± 1% 2.08µs ± 1% -0.39% (p=0.000 n=50+49) FmtFprintfPrefixedInt-4 2.43µs ± 1% 2.43µs ± 1% ~ (p=0.552 n=50+50) FmtFprintfFloat-4 4.57µs ± 1% 4.42µs ± 1% -3.18% (p=0.000 n=50+50) FmtManyArgs-4 8.62µs ± 1% 8.52µs ± 0% -1.08% (p=0.000 n=50+50) GobDecode-4 101ms ± 1% 101ms ± 2% +0.45% (p=0.001 n=49+49) GobEncode-4 90.7ms ± 1% 91.1ms ± 2% +0.51% (p=0.001 n=50+50) Gzip-4 4.23s ± 1% 4.21s ± 1% -0.62% (p=0.000 n=50+50) Gunzip-4 623ms ± 1% 619ms ± 0% -0.63% (p=0.000 n=50+42) HTTPClientServer-4 721µs ± 5% 683µs ± 3% -5.25% (p=0.000 n=50+47) JSONEncode-4 251ms ± 1% 253ms ± 1% +0.54% (p=0.000 n=49+50) JSONDecode-4 941ms ± 1% 944ms ± 1% +0.30% (p=0.001 n=49+50) Mandelbrot200-4 49.3ms ± 1% 49.3ms ± 0% ~ (p=0.918 n=50+48) GoParse-4 47.1ms ± 1% 47.2ms ± 1% +0.18% (p=0.025 n=50+50) RegexpMatchEasy0_32-4 1.23µs ± 1% 1.24µs ± 1% +0.30% (p=0.000 n=49+50) RegexpMatchEasy0_1K-4 7.74µs ± 7% 7.76µs ± 5% ~ (p=0.888 n=50+50) RegexpMatchEasy1_32-4 1.32µs ± 1% 1.32µs ± 1% +0.23% (p=0.003 n=50+50) RegexpMatchEasy1_1K-4 10.6µs ± 2% 10.5µs ± 3% -1.29% (p=0.000 n=49+50) RegexpMatchMedium_32-4 2.19µs ± 1% 2.10µs ± 1% -3.79% (p=0.000 n=49+49) RegexpMatchMedium_1K-4 544µs ± 0% 545µs ± 0% ~ (p=0.123 n=41+50) RegexpMatchHard_32-4 28.8µs ± 0% 28.8µs ± 1% ~ (p=0.580 n=46+50) RegexpMatchHard_1K-4 863µs ± 1% 865µs ± 1% +0.31% (p=0.027 n=47+50) Revcomp-4 82.2ms ± 2% 82.3ms ± 2% ~ (p=0.894 n=48+49) Template-4 1.06s ± 1% 1.04s ± 1% -1.18% (p=0.000 n=50+49) TimeParse-4 7.25µs ± 1% 7.35µs ± 0% +1.48% (p=0.000 n=50+50) TimeFormat-4 13.3µs ± 1% 13.2µs ± 1% -0.13% (p=0.007 n=50+50) [Geo mean] 736µs 733µs -0.37% name old speed new speed delta GobDecode-4 7.60MB/s ± 1% 7.56MB/s ± 2% -0.46% (p=0.001 n=49+49) GobEncode-4 8.47MB/s ± 1% 8.42MB/s ± 2% -0.50% (p=0.001 n=50+50) Gzip-4 4.58MB/s ± 1% 4.61MB/s ± 1% +0.59% (p=0.000 n=50+50) Gunzip-4 31.2MB/s ± 1% 31.4MB/s ± 0% +0.63% (p=0.000 n=50+42) JSONEncode-4 7.73MB/s ± 1% 7.69MB/s ± 1% -0.53% (p=0.000 n=49+50) JSONDecode-4 2.06MB/s ± 1% 2.06MB/s ± 1% ~ (p=0.052 n=44+50) GoParse-4 1.23MB/s ± 0% 1.23MB/s ± 2% ~ (p=0.526 n=26+50) RegexpMatchEasy0_32-4 25.9MB/s ± 1% 25.9MB/s ± 1% -0.30% (p=0.000 n=49+50) RegexpMatchEasy0_1K-4 132MB/s ± 7% 132MB/s ± 6% ~ (p=0.885 n=50+50) RegexpMatchEasy1_32-4 24.2MB/s ± 1% 24.1MB/s ± 1% -0.22% (p=0.003 n=50+50) RegexpMatchEasy1_1K-4 96.4MB/s ± 2% 97.8MB/s ± 3% +1.36% (p=0.000 n=50+50) RegexpMatchMedium_32-4 460kB/s ± 0% 476kB/s ± 1% +3.43% (p=0.000 n=49+50) RegexpMatchMedium_1K-4 1.88MB/s ± 0% 1.88MB/s ± 0% ~ (all equal) RegexpMatchHard_32-4 1.11MB/s ± 0% 1.11MB/s ± 1% +0.34% (p=0.000 n=45+50) RegexpMatchHard_1K-4 1.19MB/s ± 1% 1.18MB/s ± 1% -0.34% (p=0.033 n=50+50) Revcomp-4 30.9MB/s ± 2% 30.9MB/s ± 2% ~ (p=0.894 n=48+49) Template-4 1.84MB/s ± 1% 1.86MB/s ± 2% +1.19% (p=0.000 n=48+50) [Geo mean] 6.63MB/s 6.65MB/s +0.26% Fixes #19844. Change-Id: I5ad16cc0b29267bb4579aca3dcc10a0b8ade1aa4 Reviewed-on: https://go-review.googlesource.com/42430 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-05-09cmd/internal/obj, cmd/link: fix st_other field on PPC64Ian Lance Taylor
In PPC64 ELF files, the st_other field indicates the number of prologue instructions between the global and local entry points. We add the instructions in the compiler and assembler if -shared is used. We were assuming that the instructions were present when building a c-archive or PIE or doing dynamic linking, on the assumption that those are the cases where the go tool would be building with -shared. That assumption fails when using some other tool, such as Bazel, that does not necessarily use -shared in exactly the same way. This CL records in the object file whether a symbol was compiled with -shared (this will be the same for all symbols in a given compilation) and uses that information when setting the st_other field. Fixes #20290. Change-Id: Ib2b77e16aef38824871102e3c244fcf04a86c6ea Reviewed-on: https://go-review.googlesource.com/43051 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-05-09cmd/internal/obj/arm64, cmd/compile: improve offset folding on ARM64Cherry Zhang
ARM64 assembler backend only accepts loads and stores with small or aligned offset. The compiler therefore can only fold small or aligned offsets into loads and stores. For locals and args, their offsets to SP are not known until very late, and the compiler makes conservative decision not folding some of them. However, in most cases, the offset is indeed small or aligned, and can be folded into load and store (but actually not). This CL adds support of loads and stores with large and unaligned offsets. When the offset doesn't fit into the instruction, it uses two instructions and (for very large offset) the constant pool. This way, the compiler doesn't need to be conservative, and can simply fold the offset. To make it work, the assembler's optab matching rules need to be changed. Before, MOVD accepts C_UAUTO32K which matches multiple of 8 between 0 and 32K, and also C_UAUTO16K, which may not be multiple of 8 and does not fit into MOVD instruction. The assembler errors in the latter case. This change makes it only matches multiple of 8 (or offsets within ±256, which also fits in instruction), and uses the large-or-unaligned-offset rule for things doesn't fit (without error). Other sized move rules are changed similarly. Class C_UAUTO64K and C_UOREG64K are removed, as they are never used. In shared library, load/store of global is rewritten to using GOT and temp register, which conflicts with the use of temp register for assembling large offset. So the folding is disabled for globals in shared library mode. Reduce cmd/go binary size by 2%. name old time/op new time/op delta BinaryTree17-8 8.67s ± 0% 8.61s ± 0% -0.60% (p=0.000 n=9+10) Fannkuch11-8 6.24s ± 0% 6.19s ± 0% -0.83% (p=0.000 n=10+9) FmtFprintfEmpty-8 116ns ± 0% 116ns ± 0% ~ (all equal) FmtFprintfString-8 196ns ± 0% 192ns ± 0% -1.89% (p=0.000 n=10+10) FmtFprintfInt-8 199ns ± 0% 198ns ± 0% -0.35% (p=0.001 n=9+10) FmtFprintfIntInt-8 294ns ± 0% 293ns ± 0% -0.34% (p=0.000 n=8+8) FmtFprintfPrefixedInt-8 318ns ± 1% 318ns ± 1% ~ (p=1.000 n=10+10) FmtFprintfFloat-8 537ns ± 0% 531ns ± 0% -1.17% (p=0.000 n=9+10) FmtManyArgs-8 1.19µs ± 1% 1.18µs ± 1% -1.41% (p=0.001 n=10+10) GobDecode-8 17.2ms ± 1% 17.3ms ± 2% ~ (p=0.165 n=10+10) GobEncode-8 14.7ms ± 1% 14.7ms ± 2% ~ (p=0.631 n=10+10) Gzip-8 837ms ± 0% 836ms ± 0% -0.14% (p=0.006 n=9+10) Gunzip-8 141ms ± 0% 139ms ± 0% -1.24% (p=0.000 n=9+10) HTTPClientServer-8 256µs ± 1% 253µs ± 1% -1.35% (p=0.000 n=10+10) JSONEncode-8 40.1ms ± 1% 41.3ms ± 1% +3.06% (p=0.000 n=10+9) JSONDecode-8 157ms ± 1% 156ms ± 1% -0.83% (p=0.001 n=9+8) Mandelbrot200-8 8.94ms ± 0% 8.94ms ± 0% +0.02% (p=0.000 n=9+9) GoParse-8 8.69ms ± 0% 8.54ms ± 1% -1.69% (p=0.000 n=8+10) RegexpMatchEasy0_32-8 227ns ± 1% 228ns ± 1% +0.48% (p=0.016 n=10+9) RegexpMatchEasy0_1K-8 1.92µs ± 0% 1.63µs ± 0% -15.08% (p=0.000 n=10+9) RegexpMatchEasy1_32-8 256ns ± 0% 251ns ± 0% -2.19% (p=0.000 n=10+9) RegexpMatchEasy1_1K-8 2.38µs ± 0% 2.09µs ± 0% -12.49% (p=0.000 n=10+9) RegexpMatchMedium_32-8 352ns ± 0% 354ns ± 0% +0.39% (p=0.002 n=10+9) RegexpMatchMedium_1K-8 106µs ± 0% 106µs ± 0% -0.05% (p=0.005 n=10+9) RegexpMatchHard_32-8 5.92µs ± 0% 5.89µs ± 0% -0.40% (p=0.000 n=9+8) RegexpMatchHard_1K-8 180µs ± 0% 179µs ± 0% -0.14% (p=0.000 n=10+9) Revcomp-8 1.20s ± 0% 1.13s ± 0% -6.29% (p=0.000 n=9+8) Template-8 159ms ± 1% 154ms ± 1% -3.14% (p=0.000 n=9+10) TimeParse-8 800ns ± 3% 769ns ± 1% -3.91% (p=0.000 n=10+10) TimeFormat-8 826ns ± 2% 817ns ± 2% -1.04% (p=0.050 n=10+10) [Geo mean] 145µs 143µs -1.79% Change-Id: I5fc42087cee9b54ea414f8ef6d6d020b80eb5985 Reviewed-on: https://go-review.googlesource.com/42172 Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com>
2017-05-06cmd/asm: fix operand order of ARM's MULA instructionBen Shi
As discussion in issue #19141, the addend should be the third argument of MULA. This patch fixes it in both the front end and the back end of the assembler. And also tests are added to the encoding test. Fixes #19141 Change-Id: Idbc6f338b8fdfcad97a135f27a98c5b375b27d43 Reviewed-on: https://go-review.googlesource.com/42028 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-05-05cmd/internal/obj/ppc64, cmd/link/internal/ppc64: Change function alignment to 16Carlos Eduardo Seo
The Power processor manual states that "Branches not from the last instruction of an aligned quadword and not to the first instruction of an aligned quadword cause inefficiencies in the IBuffer". This changes the function alignment from 8 to 16 bytes to comply with that. Fixes #18963 Change-Id: Ibce9bf8302110a86c6ab05948569af9ffdfcf4bb Reviewed-on: https://go-review.googlesource.com/36390 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
2017-05-02cmd/{asm,compile}: avoid zeroAuto clobbering flags on s390xMichael Munday
This CL modifies how MOV[DWHB] instructions that store a constant to memory are assembled to avoid them clobbering the condition code (flags). It also modifies zeroAuto to use MOVD instructions instead of CLEAR (which is assembled as XC). MOV[DWHB]storeconst ops also no longer clobbers flags. Note: this CL modifies the assembler so that it can no longer handle immediates outside the range of an int16 or offsets from SB, which reflects what the machine instructions support. The compiler doesn't need this capability any more and I don't think this affects any existing assembly, but it is easy to workaround if it does. Fixes #20187. Change-Id: Ie54947ff38367bd6a19962bf1a6d0296a4accffb Reviewed-on: https://go-review.googlesource.com/42179 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-05-02cmd/internal/obj: fix LSym.Type during compilation, not linkingJosh Bleecher Snyder
Prior to this CL, the compiler and assembler were sloppy about the LSym.Type for LSyms containing static data. The linker then fixed this up, converting Sxxx and SBSS to SDATA, and SNOPTRBSS to SNOPTRDATA if it noticed that the symbol had associated data. It is preferable to just get this right in cmd/compile and cmd/asm, because it removes an unnecessary traversal of the symbol table from the linker (see #14624). Do this by touching up the LSym.Type fixes in LSym.prepwrite and Link.Globl. I have confirmed by instrumenting the linker that the now-eliminated code paths were unreached. And an additional check in the object file writing code will help preserve that invariant. There was a case in the Windows linker, with internal linking and cgo, where we were generating SNOPTRBSS symbols with data. For now, convert those at the site at which they occur into SNOPTRDATA, just like they were. Does not pass toolstash-check, but does generate identical linked binaries. No compiler performance changes. Change-Id: I77b071ab103685ff8e042cee9abb864385488872 Reviewed-on: https://go-review.googlesource.com/40864 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Alex Brainman <alex.brainman@gmail.com> Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
2017-05-01cmd/internal/obj/x86: use LEAx rather than ADDx when calling DUFFxxxx via GOTMichael Hudson-Doyle
DUFFZERO on 386 is not marked as clobbering flags, but rewriteToUseGot rewrote "ADUFFZERO $offset" to "MOVL runtime.duffxxx@GOT, CX; ADDL $offset, CX; CALL CX" which does. Luckily the fix is easier than figuring out what the problem was: replace the ADDL $offset, CX with LEAL $offset(CX), CX. On amd64 DUFFZERO clobbers flags, on arm, arm64 and ppc64 ADD does not clobber flags and s390x does not use the duff functions, so I'm fairly confident this is the only fix required. I don't know how to write a test though. Change-Id: I69b0958f5f45771d61db5f5ecb4ded94e8960d4d Reviewed-on: https://go-review.googlesource.com/41821 Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-05-01cmd/internal/obj/x86: fix ANDPS encodingDamien Lespiau
ANDPS, like all others PS (Packed Single precision floats) instructions, need Ym: they don't use the 0x66 prefix. From the manual: NP 0F 54 /r ANDPS xmm1, xmm2/m128 NP meaning, quoting the manual: NP - Indicates the use of 66/F2/F3 prefixes (beyond those already part of the instructions opcode) are not allowed with the instruction. And indeed, the same instruction prefixed by 0x66 is ANDPD. Updates #14069 Change-Id: If312a6f1e77113ab8c0febe66bdb1b4171e41e0a Reviewed-on: https://go-review.googlesource.com/42090 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-28cmd/internal/objabi: shrink SymType down to a uint8Michael Hudson-Doyle
Now that it only takes small values. Change-Id: I08086d392529d8775b470d65afc2475f8d0e7f4a Reviewed-on: https://go-review.googlesource.com/42030 Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-28cmd/internal: remove SymKind values that are only checked for, never setMichael Hudson-Doyle
Change-Id: Id152767c033c12966e9e12ae303b99f38776f919 Reviewed-on: https://go-review.googlesource.com/40987 Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-04-27cmd/internal/obj: ARM, use immediates instead of constant pool entriesKeith Randall
When a constant doesn't fit in a single instruction, use two paired instructions instead of the constant pool. For example ADD $0xaa00bb, R0, R1 Used to rewrite to: MOV ?(IP), R11 ADD R11, R0, R1 Instead, do: ADD $0xaa0000, R0, R1 ADD $0xbb, R1, R1 Same number of instructions. Good: 4 less bytes (no constant pool entry) One less load. Bad: Critical path is one instruction longer. It's probably worth it to avoid the loads, they are expensive. Dave Cheney got us some performance numbers: https://perf.golang.org/search?q=upload:20170426.1 TL;DR mean 1.37% improvement. Change-Id: Ib206836161fdc94a3962db6f9caa635c87d57cf1 Reviewed-on: https://go-review.googlesource.com/41612 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-27cmd/internal/obj/arm64: fix encoding of conditionWei Xiao
The current code treats condition as special register and write its raw data directly into instruction. The fix converts the raw data into correct condition encoding. Also fix the operand catogery of FCCMP. Add tests to cover all cases. Change-Id: Ib194041bd9017dd0edbc241564fe983082ac616b Reviewed-on: https://go-review.googlesource.com/41511 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-27cmd/compile: add initial backend concurrency supportJosh Bleecher Snyder
This CL adds initial support for concurrent backend compilation. BACKGROUND The compiler currently consists (very roughly) of the following phases: 1. Initialization. 2. Lexing and parsing into the cmd/compile/internal/syntax AST. 3. Translation into the cmd/compile/internal/gc AST. 4. Some gc AST passes: typechecking, escape analysis, inlining, closure handling, expression evaluation ordering (order.go), and some lowering and optimization (walk.go). 5. Translation into the cmd/compile/internal/ssa SSA form. 6. Optimization and lowering of SSA form. 7. Translation from SSA form to assembler instructions. 8. Translation from assembler instructions to machine code. 9. Writing lots of output: machine code, DWARF symbols, type and reflection info, export data. Phase 2 was already concurrent as of Go 1.8. Phase 3 is planned for eventual removal; we hope to go straight from syntax AST to SSA. Phases 5–8 are per-function; this CL adds support for processing multiple functions concurrently. The slowest phases in the compiler are 5 and 6, so this offers the opportunity for some good speed-ups. Unfortunately, it's not quite that straightforward. In the current compiler, the latter parts of phase 4 (order, walk) are done function-at-a-time as needed. Making order and walk concurrency-safe proved hard, and they're not particularly slow, so there wasn't much reward. To enable phases 5–8 to be done concurrently, when concurrent backend compilation is requested, we complete phase 4 for all functions before starting later phases for any functions. Also, in reality, we automatically generate new functions in phase 9, such as method wrappers and equality and has routines. Those new functions then go through phases 4–8. This CL disables concurrent backend compilation after the first, big, user-provided batch of functions has been compiled. This is done to keep things simple, and because the autogenerated functions tend to be small, few, simple, and fast to compile. USAGE Concurrent backend compilation still defaults to off. To set the number of functions that may be backend-compiled concurrently, use the compiler flag -c. In future work, cmd/go will automatically set -c. Furthermore, this CL has been intentionally written so that the c=1 path has no backend concurrency whatsoever, not even spawning any goroutines. This helps ensure that, should problems arise late in the development cycle, we can simply have cmd/go set c=1 always, and revert to the original compiler behavior. MUTEXES Most of the work required to make concurrent backend compilation safe has occurred over the past month. This CL adds a handful of mutexes to get the rest of the way there; they are the mutexes that I didn't see a clean way to avoid. Some of them may still be eliminable in future work. In no particular order: * gc.funcsymsmu. The global funcsyms slice is populated lazily when we need function symbols for closures. This occurs during gc AST to SSA translation. The function funcsym also does a package lookup, which is a source of races on types.Pkg.Syms; funcsymsmu also covers that package lookup. This mutex is low priority: it adds a single global, it is in an infrequently used code path, and it is low contention. Since funcsyms may now be added in any order, we must sort them to preserve reproducible builds. * gc.largeStackFramesMu. We don't discover until after SSA compilation that a function's stack frame is gigantic. Recording that error happens basically never, but it does happen concurrently. Fix with a low priority mutex and sorting. * obj.Link.hashmu. ctxt.hash stores the mapping from types.Syms (compiler symbols) to obj.LSyms (linker symbols). It is accessed fairly heavily through all the phases. This is the only heavily contended mutex. * gc.signatlistmu. The global signatlist map is populated with types through several of the concurrent phases, including notably via ngotype during DWARF generation. It is low priority for removal. * gc.typepkgmu. Looking up symbols in the types package happens a fair amount during backend compilation and DWARF generation, particularly via ngotype. This mutex helps us to avoid a broader mutex on types.Pkg.Syms. It has low-to-moderate contention. * types.internedStringsmu. gc AST to SSA conversion and some SSA work introduce new autotmps. Those autotmps have their names interned to reduce allocations. That interning requires protecting types.internedStrings. The autotmp names are heavily re-used, and the mutex overhead and contention here are low, so it is probably a worthwhile performance optimization to keep this mutex. TESTING I have been testing this code locally by running 'go install -race cmd/compile' and then doing 'go build -a -gcflags=-c=128 std cmd' for all architectures and a variety of compiler flags. This obviously needs to be made part of the builders, but it is too expensive to make part of all.bash. I have filed #19962 for this. REPRODUCIBLE BUILDS This version of the compiler generates reproducible builds. Testing reproducible builds also needs automation, however, and is also too expensive for all.bash. This is #19961. Also of note is that some of the compiler flags used by 'toolstash -cmp' are currently incompatible with concurrent backend compilation. They still work fine with c=1. Time will tell whether this is a problem. NEXT STEPS * Continue to find and fix races and bugs, using a combination of code inspection, fuzzing, and hopefully some community experimentation. I do not know of any outstanding races, but there probably are some. * Improve testing. * Improve performance, for many values of c. * Integrate with cmd/go and fine tune. * Support concurrent compilation with the -race flag. It is a sad irony that it does not yet work. * Minor code cleanup that has been deferred during the last month due to uncertainty about the ultimate shape of this CL. PERFORMANCE Here's the buried lede, at last. :) All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop. First, going from tip to this CL with c=1 has almost no impact. name old time/op new time/op delta Template 195ms ± 3% 194ms ± 5% ~ (p=0.370 n=30+29) Unicode 86.6ms ± 3% 87.0ms ± 7% ~ (p=0.958 n=29+30) GoTypes 548ms ± 3% 555ms ± 4% +1.35% (p=0.001 n=30+28) Compiler 2.51s ± 2% 2.54s ± 2% +1.17% (p=0.000 n=28+30) SSA 5.16s ± 3% 5.16s ± 2% ~ (p=0.910 n=30+29) Flate 124ms ± 5% 124ms ± 4% ~ (p=0.947 n=30+30) GoParser 146ms ± 3% 146ms ± 3% ~ (p=0.150 n=29+28) Reflect 354ms ± 3% 352ms ± 4% ~ (p=0.096 n=29+29) Tar 107ms ± 5% 106ms ± 3% ~ (p=0.370 n=30+29) XML 200ms ± 4% 201ms ± 4% ~ (p=0.313 n=29+28) [Geo mean] 332ms 333ms +0.10% name old user-time/op new user-time/op delta Template 227ms ± 5% 225ms ± 5% ~ (p=0.457 n=28+27) Unicode 109ms ± 4% 109ms ± 5% ~ (p=0.758 n=29+29) GoTypes 713ms ± 4% 721ms ± 5% ~ (p=0.051 n=30+29) Compiler 3.36s ± 2% 3.38s ± 3% ~ (p=0.146 n=30+30) SSA 7.46s ± 3% 7.47s ± 3% ~ (p=0.804 n=30+29) Flate 146ms ± 7% 147ms ± 3% ~ (p=0.833 n=29+27) GoParser 179ms ± 5% 179ms ± 5% ~ (p=0.866 n=30+30) Reflect 431ms ± 4% 429ms ± 4% ~ (p=0.593 n=29+30) Tar 124ms ± 5% 123ms ± 5% ~ (p=0.140 n=29+29) XML 243ms ± 4% 242ms ± 7% ~ (p=0.404 n=29+29) [Geo mean] 415ms 415ms +0.02% name old obj-bytes new obj-bytes delta Template 382k ± 0% 382k ± 0% ~ (all equal) Unicode 203k ± 0% 203k ± 0% ~ (all equal) GoTypes 1.18M ± 0% 1.18M ± 0% ~ (all equal) Compiler 3.98M ± 0% 3.98M ± 0% ~ (all equal) SSA 8.28M ± 0% 8.28M ± 0% ~ (all equal) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 287k ± 0% 287k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 190k ± 0% 190k ± 0% ~ (all equal) XML 416k ± 0% 416k ± 0% ~ (all equal) [Geo mean] 660k 660k +0.00% Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, costs 5-10% more CPU time, and adds about 2% alloc. The allocation increase comes from allocating more ssa.Caches. name old time/op new time/op delta Template 202ms ± 3% 149ms ± 3% -26.15% (p=0.000 n=49+49) Unicode 87.4ms ± 4% 84.2ms ± 3% -3.68% (p=0.000 n=48+48) GoTypes 560ms ± 2% 398ms ± 2% -28.96% (p=0.000 n=49+49) Compiler 2.46s ± 3% 1.76s ± 2% -28.61% (p=0.000 n=48+46) SSA 6.17s ± 2% 4.04s ± 1% -34.52% (p=0.000 n=49+49) Flate 126ms ± 3% 92ms ± 2% -26.81% (p=0.000 n=49+48) GoParser 148ms ± 4% 107ms ± 2% -27.78% (p=0.000 n=49+48) Reflect 361ms ± 3% 281ms ± 3% -22.10% (p=0.000 n=49+49) Tar 109ms ± 4% 86ms ± 3% -20.81% (p=0.000 n=49+47) XML 204ms ± 3% 144ms ± 2% -29.53% (p=0.000 n=48+45) name old user-time/op new user-time/op delta Template 246ms ± 9% 246ms ± 4% ~ (p=0.401 n=50+48) Unicode 109ms ± 4% 111ms ± 4% +1.47% (p=0.000 n=44+50) GoTypes 728ms ± 3% 765ms ± 3% +5.04% (p=0.000 n=46+50) Compiler 3.33s ± 3% 3.41s ± 2% +2.31% (p=0.000 n=49+48) SSA 8.52s ± 2% 9.11s ± 2% +6.93% (p=0.000 n=49+47) Flate 149ms ± 4% 161ms ± 3% +8.13% (p=0.000 n=50+47) GoParser 181ms ± 5% 192ms ± 2% +6.40% (p=0.000 n=49+46) Reflect 452ms ± 9% 474ms ± 2% +4.99% (p=0.000 n=50+48) Tar 126ms ± 6% 136ms ± 4% +7.95% (p=0.000 n=50+49) XML 247ms ± 5% 264ms ± 3% +6.94% (p=0.000 n=48+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 39.3MB ± 0% +1.48% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.2MB ± 0% +1.19% (p=0.008 n=5+5) GoTypes 113MB ± 0% 114MB ± 0% +0.69% (p=0.008 n=5+5) Compiler 443MB ± 0% 447MB ± 0% +0.95% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.26GB ± 0% +0.89% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.9MB ± 1% +2.35% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 32.2MB ± 0% +1.59% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 78.9MB ± 0% +0.91% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.0MB ± 0% +1.80% (p=0.008 n=5+5) XML 42.4MB ± 0% 43.4MB ± 0% +2.35% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 379k ± 0% 378k ± 0% ~ (p=0.421 n=5+5) Unicode 322k ± 0% 321k ± 0% ~ (p=0.222 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.548 n=5+5) Compiler 4.12M ± 0% 4.11M ± 0% -0.14% (p=0.032 n=5+5) SSA 9.72M ± 0% 9.72M ± 0% ~ (p=0.421 n=5+5) Flate 234k ± 1% 234k ± 0% ~ (p=0.421 n=5+5) GoParser 316k ± 1% 315k ± 0% ~ (p=0.222 n=5+5) Reflect 980k ± 0% 979k ± 0% ~ (p=0.095 n=5+5) Tar 249k ± 1% 249k ± 1% ~ (p=0.841 n=5+5) XML 392k ± 0% 391k ± 0% ~ (p=0.095 n=5+5) From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%: name old time/op new time/op delta Template 203ms ± 3% 131ms ± 5% -35.45% (p=0.000 n=50+50) Unicode 87.2ms ± 4% 84.1ms ± 2% -3.61% (p=0.000 n=48+47) GoTypes 560ms ± 4% 310ms ± 2% -44.65% (p=0.000 n=50+49) Compiler 2.47s ± 3% 1.41s ± 2% -43.10% (p=0.000 n=50+46) SSA 6.17s ± 2% 3.20s ± 2% -48.06% (p=0.000 n=49+49) Flate 126ms ± 4% 74ms ± 2% -41.06% (p=0.000 n=49+48) GoParser 148ms ± 4% 89ms ± 3% -39.97% (p=0.000 n=49+50) Reflect 360ms ± 3% 242ms ± 3% -32.81% (p=0.000 n=49+49) Tar 108ms ± 4% 73ms ± 4% -32.48% (p=0.000 n=50+49) XML 203ms ± 3% 119ms ± 3% -41.56% (p=0.000 n=49+48) name old user-time/op new user-time/op delta Template 246ms ± 9% 287ms ± 9% +16.98% (p=0.000 n=50+50) Unicode 109ms ± 4% 118ms ± 5% +7.56% (p=0.000 n=46+50) GoTypes 735ms ± 4% 806ms ± 2% +9.62% (p=0.000 n=50+50) Compiler 3.34s ± 4% 3.56s ± 2% +6.78% (p=0.000 n=49+49) SSA 8.54s ± 3% 10.04s ± 3% +17.55% (p=0.000 n=50+50) Flate 149ms ± 6% 176ms ± 3% +17.82% (p=0.000 n=50+48) GoParser 181ms ± 5% 213ms ± 3% +17.47% (p=0.000 n=50+50) Reflect 453ms ± 6% 499ms ± 2% +10.11% (p=0.000 n=50+48) Tar 126ms ± 5% 149ms ±11% +18.76% (p=0.000 n=50+50) XML 246ms ± 5% 287ms ± 4% +16.53% (p=0.000 n=49+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 40.4MB ± 0% +4.21% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.9MB ± 0% +3.68% (p=0.008 n=5+5) GoTypes 113MB ± 0% 116MB ± 0% +2.71% (p=0.008 n=5+5) Compiler 443MB ± 0% 455MB ± 0% +2.75% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.27GB ± 0% +1.84% (p=0.008 n=5+5) Flate 25.3MB ± 0% 26.9MB ± 1% +6.31% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 33.2MB ± 0% +4.61% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 80.2MB ± 0% +2.53% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.9MB ± 0% +5.19% (p=0.008 n=5+5) XML 42.4MB ± 0% 44.6MB ± 0% +5.20% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 380k ± 0% 379k ± 0% -0.39% (p=0.032 n=5+5) Unicode 321k ± 0% 321k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.421 n=5+5) Compiler 4.12M ± 0% 4.14M ± 0% +0.52% (p=0.008 n=5+5) SSA 9.72M ± 0% 9.76M ± 0% +0.37% (p=0.008 n=5+5) Flate 234k ± 1% 234k ± 1% ~ (p=0.690 n=5+5) GoParser 316k ± 0% 317k ± 1% ~ (p=0.841 n=5+5) Reflect 981k ± 0% 981k ± 0% ~ (p=1.000 n=5+5) Tar 250k ± 0% 249k ± 1% ~ (p=0.151 n=5+5) XML 393k ± 0% 392k ± 0% ~ (p=0.056 n=5+5) Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time. The CPU time numbers matter, because when there are many concurrent compilation processes, that will impact the overall throughput. The numbers above are in many ways the best case scenario; we can take full advantage of all cores. Fortunately, the most common compilation scenario is incremental re-compilation of a single package during a build/test cycle. Updates #15756 Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea Reviewed-on: https://go-review.googlesource.com/40693 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-26cmd/internal/obj/x86: fix adcb r/mem8,reg8 encodingDamien Lespiau
Taken from the Intel Software Development Manual (of course, in the line below it's ADC DST, SRC; The opposite of the commit subject). 12 /r ADC r8, r/m8 We need 0x12 for the corresponding ytab line, not 0x10. {Ymb, Ynone, Yrb, Zm_r, 1}, Updates #14069 Change-Id: Id37cbd0c581c9988c2de355efa908956278e2189 Reviewed-on: https://go-review.googlesource.com/41857 Reviewed-by: Keith Randall <khr@golang.org>
2017-04-26cmd/internal/obj/ppc64: use MOVDU to update stack reg for leaf functions ↵Lynn Boger
where possible When the stack register is decremented to acquire stack space at the beginning of a function, a MOVDU should be used so it is done atomically, unless the size of the stack frame is too large for that instruction. The code to determine whether to use MOVDU or MOVD was checking if the function was a leaf and always generating MOVD when it was. The choice of MOVD vs. MOVDU should only depend on the stack frame size. This fixes that problem. Change-Id: I0e49c79036f1e8f7584179e1442b938fc6da085f Reviewed-on: https://go-review.googlesource.com/41813 Reviewed-by: Michael Munday <munday@ca.ibm.com>
2017-04-26cmd/internal: fix bug getting wrong indicator in DRconv()Fangming.Fang
Change-Id: I251ae497b0ab237d4b3fe98e397052394142d437 Reviewed-on: https://go-review.googlesource.com/41653 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-25cmd/internal/obj/x86: port the doasm comment to goDamien Lespiau
This comment is very useful but still refers to the C implementation. Adapting it for Go is fairly straightforward though. Change-Id: Ib6dde25f3a18acbce76bb3cffdc29f5ccf43c1f7 Reviewed-on: https://go-review.googlesource.com/41696 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-25cmd: fix the order that s390x operands are printed inMichael Munday
The assembler reordered the operands of some instructions to put the first operand into From3. Unfortunately this meant that when the instructions were printed the operands were in a different order than the assembler would expect as input. For example, 'MVC $8, (R1), (R2)' would be printed as 'MVC (R1), $8, (R2)'. Originally this was done to ensure that From contained the source memory operand. The current compiler no longer requires this and so this CL simply makes all instructions use the standard order for operands: From, Reg, From3 and finally To. Fixes #18295 Change-Id: Ib2b5ec29c647ca7a995eb03dc78f82d99618b092 Reviewed-on: https://go-review.googlesource.com/40299 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-25cmd/internal/obj/arm: use new form of MOVW introduced in ARMv7Ben Shi
As discussion in issue #18293, "MOVW $Imm-16, Reg" was introduced in ARMv7. It directly encoded the 16-bit immediate into the instruction instead of put it in the constant pool. This patch makes the arm assembler choose this form of MOVW if available. Besides 4 bytes are saved in the constant pool, the go1 benchmark test also shows a slight improvement. name old time/op new time/op delta BinaryTree17-4 42.7s ± 1% 42.7s ± 1% ~ (p=0.304 n=50+50) Fannkuch11-4 24.8s ± 1% 24.8s ± 0% ~ (p=0.757 n=50+49) FmtFprintfEmpty-4 875ns ± 1% 873ns ± 2% ~ (p=0.066 n=44+46) FmtFprintfString-4 1.43µs ± 1% 1.45µs ± 1% +1.68% (p=0.000 n=44+44) FmtFprintfInt-4 1.52µs ± 1% 1.52µs ± 1% +0.26% (p=0.009 n=41+45) FmtFprintfIntInt-4 2.19µs ± 1% 2.20µs ± 1% +0.76% (p=0.000 n=43+46) FmtFprintfPrefixedInt-4 2.56µs ± 2% 2.53µs ± 1% -1.03% (p=0.000 n=45+44) FmtFprintfFloat-4 4.41µs ± 1% 4.39µs ± 1% -0.52% (p=0.000 n=44+44) FmtManyArgs-4 9.02µs ± 2% 9.04µs ± 1% +0.27% (p=0.000 n=46+44) GobDecode-4 106ms ± 1% 106ms ± 1% ~ (p=0.310 n=45+43) GobEncode-4 88.1ms ± 2% 88.0ms ± 2% ~ (p=0.648 n=49+50) Gzip-4 4.31s ± 1% 4.27s ± 1% -1.01% (p=0.000 n=50+50) Gunzip-4 618ms ± 1% 608ms ± 1% -1.65% (p=0.000 n=45+47) HTTPClientServer-4 689µs ± 6% 692µs ± 4% +0.52% (p=0.038 n=50+47) JSONEncode-4 282ms ± 2% 280ms ± 1% -0.75% (p=0.000 n=46+43) JSONDecode-4 945ms ± 2% 940ms ± 1% -0.47% (p=0.000 n=47+47) Mandelbrot200-4 49.4ms ± 1% 49.3ms ± 1% ~ (p=0.163 n=45+45) GoParse-4 46.0ms ± 3% 45.5ms ± 2% -0.95% (p=0.000 n=49+40) RegexpMatchEasy0_32-4 1.29µs ± 1% 1.28µs ± 1% -0.14% (p=0.005 n=38+45) RegexpMatchEasy0_1K-4 7.92µs ± 8% 7.75µs ± 6% -2.12% (p=0.000 n=47+50) RegexpMatchEasy1_32-4 1.31µs ± 1% 1.31µs ± 0% ~ (p=0.282 n=45+48) RegexpMatchEasy1_1K-4 10.4µs ± 5% 10.4µs ± 3% ~ (p=0.771 n=50+49) RegexpMatchMedium_32-4 2.06µs ± 1% 2.07µs ± 1% +0.35% (p=0.001 n=44+49) RegexpMatchMedium_1K-4 533µs ± 1% 532µs ± 1% ~ (p=0.710 n=43+47) RegexpMatchHard_32-4 29.7µs ± 1% 29.6µs ± 1% -0.34% (p=0.002 n=43+46) RegexpMatchHard_1K-4 893µs ± 2% 885µs ± 1% -0.85% (p=0.000 n=50+45) Revcomp-4 85.6ms ± 4% 85.5ms ± 2% ~ (p=0.683 n=50+50) Template-4 1.05s ± 3% 1.04s ± 1% -1.06% (p=0.000 n=50+44) TimeParse-4 7.19µs ± 2% 7.11µs ± 2% -1.10% (p=0.000 n=48+46) TimeFormat-4 13.4µs ± 1% 13.5µs ± 1% ~ (p=0.056 n=46+49) [Geo mean] 747µs 745µs -0.28% name old speed new speed delta GobDecode-4 7.23MB/s ± 1% 7.22MB/s ± 1% ~ (p=0.062 n=45+39) GobEncode-4 8.71MB/s ± 2% 8.72MB/s ± 2% ~ (p=0.656 n=49+50) Gzip-4 4.50MB/s ± 1% 4.55MB/s ± 1% +1.03% (p=0.000 n=50+50) Gunzip-4 31.4MB/s ± 1% 31.9MB/s ± 1% +1.67% (p=0.000 n=45+47) JSONEncode-4 6.89MB/s ± 2% 6.94MB/s ± 1% +0.76% (p=0.000 n=46+43) JSONDecode-4 2.05MB/s ± 2% 2.06MB/s ± 2% +0.32% (p=0.017 n=47+50) GoParse-4 1.26MB/s ± 3% 1.27MB/s ± 1% +0.68% (p=0.000 n=50+48) RegexpMatchEasy0_32-4 24.9MB/s ± 1% 24.9MB/s ± 1% +0.13% (p=0.004 n=38+45) RegexpMatchEasy0_1K-4 129MB/s ± 7% 132MB/s ± 6% +2.34% (p=0.000 n=46+50) RegexpMatchEasy1_32-4 24.5MB/s ± 1% 24.4MB/s ± 1% ~ (p=0.252 n=45+48) RegexpMatchEasy1_1K-4 98.8MB/s ± 4% 98.7MB/s ± 3% ~ (p=0.771 n=50+49) RegexpMatchMedium_32-4 485kB/s ± 3% 480kB/s ± 0% -0.95% (p=0.000 n=50+38) RegexpMatchMedium_1K-4 1.92MB/s ± 1% 1.92MB/s ± 1% ~ (p=0.129 n=43+47) RegexpMatchHard_32-4 1.08MB/s ± 2% 1.08MB/s ± 1% +0.38% (p=0.017 n=46+46) RegexpMatchHard_1K-4 1.15MB/s ± 2% 1.16MB/s ± 1% +0.67% (p=0.001 n=50+49) Revcomp-4 29.7MB/s ± 4% 29.7MB/s ± 2% ~ (p=0.682 n=50+50) Template-4 1.85MB/s ± 3% 1.87MB/s ± 1% +1.04% (p=0.000 n=50+44) [Geo mean] 6.56MB/s 6.60MB/s +0.47% Change-Id: Ic2cca90133c27a08d9f1a23c65b0eed5fbd02684 Reviewed-on: https://go-review.googlesource.com/41190 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-20cmd/internal/obj: eliminate LSym.VersionJosh Bleecher Snyder
There were only two versions, 0 and 1, and the only user of version 1 was the assembler, to indicate that a symbol was static. Rename LSym.Version to Static, and add it to LSym.Attributes. Simplify call-sites. Passes toolstash-check. Change-Id: Iabd39918f5019cce78f381d13f0481ae09f3871f Reviewed-on: https://go-review.googlesource.com/41201 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-04-20cmd/internal/obj/x86: fix relocation offset for VEX encoded instructionsIlya Tocar
VEX encoded instructions don't have a REX byte, so for PC relative addressing we don't need to recalculate relocation offset. Fixes #19518 Change-Id: Icf5414962de4350d76fd220817498337f90614fc Reviewed-on: https://go-review.googlesource.com/38138 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-20cmd/compile: add rotates to PPC64.rulesLynn Boger
This updates PPC64.rules to include rules to generate rotates for ADD, OR, XOR operators that combine two opposite shifts that sum to 32 or 64. To support this change opcodes for ROTL and ROTLW were added to be used like the rotldi and rotlwi extended mnemonics. This provides the following improvement in sha3: BenchmarkPermutationFunction-8 302.83 376.40 1.24x BenchmarkSha3_512_MTU-8 98.64 121.92 1.24x BenchmarkSha3_384_MTU-8 136.80 168.30 1.23x BenchmarkSha3_256_MTU-8 169.21 211.29 1.25x BenchmarkSha3_224_MTU-8 179.76 221.19 1.23x BenchmarkShake128_MTU-8 212.87 263.23 1.24x BenchmarkShake256_MTU-8 196.62 245.60 1.25x BenchmarkShake256_16x-8 163.57 194.37 1.19x BenchmarkShake256_1MiB-8 199.02 248.74 1.25x BenchmarkSha3_512_1MiB-8 106.55 133.13 1.25x Fixes #20030 Change-Id: I484c56f48395d32f53ff3ecb3ac6cb8191cfee44 Reviewed-on: https://go-review.googlesource.com/40992 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-20cmd/internal/obj: split Link.hash into version 0 and 1Josh Bleecher Snyder
Though LSym.Version is an int, it can only have the value 0 or 1. Using that, split Link.hash into two maps, one for version 0 (which is far more common) and one for version 1. This lets use just the name for lookups, which is both faster and more compact. This matters because Link.hash map lookups are frequent, and will be contended once the backend is concurrent. name old time/op new time/op delta Template 194ms ± 3% 192ms ± 5% -1.46% (p=0.000 n=47+49) Unicode 84.5ms ± 3% 83.8ms ± 3% -0.81% (p=0.011 n=50+49) GoTypes 543ms ± 2% 545ms ± 4% ~ (p=0.566 n=46+49) Compiler 2.48s ± 2% 2.48s ± 3% ~ (p=0.706 n=47+50) SSA 5.94s ± 3% 5.98s ± 2% +0.55% (p=0.040 n=49+50) Flate 119ms ± 6% 119ms ± 4% ~ (p=0.681 n=48+47) GoParser 145ms ± 4% 145ms ± 3% ~ (p=0.662 n=47+49) Reflect 348ms ± 3% 344ms ± 3% -1.17% (p=0.000 n=47+47) Tar 105ms ± 4% 104ms ± 3% ~ (p=0.155 n=50+47) XML 197ms ± 2% 197ms ± 3% ~ (p=0.666 n=49+49) [Geo mean] 332ms 331ms -0.37% name old user-time/op new user-time/op delta Template 230ms ±10% 226ms ±10% -1.85% (p=0.041 n=50+50) Unicode 104ms ± 6% 103ms ± 5% ~ (p=0.076 n=49+49) GoTypes 707ms ± 4% 705ms ± 5% ~ (p=0.521 n=50+50) Compiler 3.30s ± 3% 3.33s ± 4% +0.76% (p=0.003 n=50+49) SSA 8.17s ± 4% 8.23s ± 3% +0.66% (p=0.030 n=50+49) Flate 139ms ± 6% 138ms ± 8% ~ (p=0.184 n=49+48) GoParser 174ms ± 5% 172ms ± 6% ~ (p=0.107 n=48+49) Reflect 431ms ± 8% 420ms ± 5% -2.57% (p=0.000 n=50+46) Tar 119ms ± 6% 118ms ± 7% -0.95% (p=0.033 n=50+49) XML 236ms ± 4% 236ms ± 4% ~ (p=0.935 n=50+48) [Geo mean] 410ms 407ms -0.67% name old alloc/op new alloc/op delta Template 38.7MB ± 0% 38.6MB ± 0% -0.29% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 29.7MB ± 0% -0.24% (p=0.008 n=5+5) GoTypes 113MB ± 0% 113MB ± 0% -0.29% (p=0.008 n=5+5) Compiler 462MB ± 0% 462MB ± 0% -0.12% (p=0.008 n=5+5) SSA 1.27GB ± 0% 1.27GB ± 0% -0.05% (p=0.008 n=5+5) Flate 25.2MB ± 0% 25.1MB ± 0% -0.37% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 31.6MB ± 0% ~ (p=0.056 n=5+5) Reflect 77.5MB ± 0% 77.2MB ± 0% -0.38% (p=0.008 n=5+5) Tar 26.4MB ± 0% 26.3MB ± 0% ~ (p=0.151 n=5+5) XML 41.9MB ± 0% 41.9MB ± 0% -0.20% (p=0.032 n=5+5) [Geo mean] 74.5MB 74.3MB -0.23% name old allocs/op new allocs/op delta Template 378k ± 1% 377k ± 1% ~ (p=0.690 n=5+5) Unicode 321k ± 0% 322k ± 0% ~ (p=0.595 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.310 n=5+5) Compiler 4.25M ± 0% 4.25M ± 0% ~ (p=0.151 n=5+5) SSA 9.84M ± 0% 9.84M ± 0% ~ (p=0.841 n=5+5) Flate 232k ± 1% 232k ± 0% ~ (p=0.690 n=5+5) GoParser 315k ± 1% 315k ± 1% ~ (p=0.841 n=5+5) Reflect 970k ± 0% 970k ± 0% ~ (p=0.841 n=5+5) Tar 248k ± 0% 248k ± 1% ~ (p=0.841 n=5+5) XML 389k ± 0% 389k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 724k 724k +0.01% Updates #15756 Change-Id: I2646332e89f0444ca9d5a41d7172537d904ed636 Reviewed-on: https://go-review.googlesource.com/41050 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-04-19cmd/internal/obj: remove unused obj.LinksymfmtDave Cheney
obj.Linksymfmt is no longer referenced by any packages in cmd/... Change-Id: Id4d9213d1577e13580b60755dbf7da313b17cb0e Reviewed-on: https://go-review.googlesource.com/41171 Run-TryBot: Dave Cheney <dave@cheney.net> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-19cmd/internal/obj/s390x: delete unused REGZERO constantMichael Munday
When we switched to SSA R0 was made allocatable and no longer holds zero on s390x. Change-Id: I1c752bb02da35462a535492379345fa9f4e12cb0 Reviewed-on: https://go-review.googlesource.com/41079 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-19cmd/internal/objabi: extract shared functionality from objMatthew Dempsky
Now only cmd/asm and cmd/compile depend on cmd/internal/obj. Changing the assembler backends no longer requires reinstalling cmd/link or cmd/addr2line. There's also now one canonical definition of the object file format in cmd/internal/objabi/doc.go, with a warning to update all three implementations. objabi is still something of a grab bag of unrelated code (e.g., flag and environment variable handling probably belong in a separate "tool" package), but this is still progress. Fixes #15165. Fixes #20026. Change-Id: Ic4b92fac7d0d35438e0d20c9579aad4085c5534c Reviewed-on: https://go-review.googlesource.com/40972 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-04-18cmd/internal/obj: un-embed FuncInfo field in LSymMatthew Dempsky
Automated refactoring using github.com/mdempsky/unbed (to rewrite s.Foo to s.FuncInfo.Foo) and then gorename (to rename the FuncInfo field to just Func). Passes toolstash-check -all. Change-Id: I802c07a1239e0efea058a91a87c5efe12170083a Reviewed-on: https://go-review.googlesource.com/40670 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-04-18cmd/internal/obj: unexport Link.HashJosh Bleecher Snyder
A prior CL eliminated the last reference to Ctxt.Hash from the compiler. Change-Id: Ic97ff84ed1a14e0c93fb0e8ec0b2617c3397c0e8 Reviewed-on: https://go-review.googlesource.com/40699 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-18cmd/internal/obj: rework gclocals handlingJosh Bleecher Snyder
The compiler handled gcargs and gclocals LSyms unusually. It generated placeholder symbols (makefuncdatasym), filled them in, and then renamed them for content-addressability. This is an important binary size optimization; the same locals information occurs over and over. This CL continues to treat these LSyms unusually, but in a slightly more explicit way, and importantly for concurrent compilation, in a way that does not require concurrent modification of Ctxt.Hash. Instead of creating gcargs and gclocals in the usual way, by creating a types.Sym and then an obj.LSym, we add them directly to obj.FuncInfo, initialize them in obj.InitTextSym, and deduplicate and add them to ctxt.Data at the end. Then the backend's job is simply to fill them in and rename them appropriately. Updates #15756 name old alloc/op new alloc/op delta Template 38.8MB ± 0% 38.7MB ± 0% -0.22% (p=0.016 n=5+5) Unicode 29.8MB ± 0% 29.8MB ± 0% ~ (p=0.690 n=5+5) GoTypes 113MB ± 0% 113MB ± 0% -0.24% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.24GB ± 0% -0.39% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.2MB ± 0% -0.43% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 31.7MB ± 0% -0.22% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 77.6MB ± 0% -0.80% (p=0.008 n=5+5) Tar 26.6MB ± 0% 26.3MB ± 0% -0.85% (p=0.008 n=5+5) XML 42.4MB ± 0% 41.9MB ± 0% -1.04% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 378k ± 0% 377k ± 1% ~ (p=0.151 n=5+5) Unicode 321k ± 1% 321k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% -0.47% (p=0.016 n=5+5) SSA 9.71M ± 0% 9.67M ± 0% -0.33% (p=0.008 n=5+5) Flate 233k ± 1% 232k ± 1% ~ (p=0.151 n=5+5) GoParser 316k ± 0% 315k ± 0% -0.49% (p=0.016 n=5+5) Reflect 979k ± 0% 972k ± 0% -0.75% (p=0.008 n=5+5) Tar 250k ± 0% 247k ± 1% -0.92% (p=0.008 n=5+5) XML 392k ± 1% 389k ± 0% -0.67% (p=0.008 n=5+5) Change-Id: Idc36186ca9d2f8214b5f7720bbc27b6bb22fdc48 Reviewed-on: https://go-review.googlesource.com/40697 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-04-17cmd/asm: detect invalid DS form offsets for ppc64xLynn Boger
While debugging a recent regression it was discovered that the assembler for ppc64x was not always generating the correct instruction for DS form loads and stores. When an instruction is DS form then the offset must be a multiple of 4, and if it isn't then bits outside the offset field were being incorrectly set resulting in unexpected and incorrect instructions. This change adds a check to determine when the opcode is DS form and then verifies that the offset is a multiple of 4 before generating the instruction, otherwise logs an error. This also changes a few asm files that were using unaligned offsets for DS form loads and stores. In the runtime package these were instructions intended to cause a crash so using aligned or unaligned offsets doesn't change that behavior. Change-Id: Ie3a7e1e65dcc9933b54de7a46a054da8459cb56f Reviewed-on: https://go-review.googlesource.com/40476 Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
2017-04-17cmd/asm, cmd/internal/obj/s390x, math: add LGDR and LDGR instructionsMichael Munday
The instructions allow moves between floating point and general purpose registers without any conversion taking place. Change-Id: I82c6f3ad9c841a83783b5be80dcf5cd538ff49e6 Reviewed-on: https://go-review.googlesource.com/38777 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-17cmd/internal/obj: unify Setuintxx and WriteIntJosh Bleecher Snyder
They do basically the same work. Setuintxx was only used in a single place, so eliminate it in favor of WriteInt. duintxxLSym's alignment rounding was not used in practice; change it into alignment assertion. Passes toolstash-check. No compiler performance changes. Change-Id: I0f7410cf2ccffbdc02ad796eaf973ee6a83074f8 Reviewed-on: https://go-review.googlesource.com/40863 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-17cmd/internal/obj: pretty-print LSym.Type when debuggingJosh Bleecher Snyder
We have a stringer for LSym.Type. Use it. Before: "".algarray t=31 size=224 After: "".algarray SBSS size=224 Change-Id: Ib4c7d2bc1dbe9943cf2a5dfa5d9f2d7fbd50b7f2 Reviewed-on: https://go-review.googlesource.com/40862 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-13cmd/internal/obj: cache dwarfSymJosh Bleecher Snyder
Follow-up to review feedback from mdempsky on CL 40507. Reduces mutex contention by about 1%. Change-Id: I540ea6772925f4a59e58f55a3458eff15880c328 Reviewed-on: https://go-review.googlesource.com/40575 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-13cmd/internal/obj: generate function DWARF symbols earlyJosh Bleecher Snyder
This removes a concurrent access of ctxt.Data. Updates #15756 Change-Id: Id017e90e47e093cd8825907f3853bb3d3bf8280d Reviewed-on: https://go-review.googlesource.com/40507 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>