path: root/src/internal/cpu/cpu.go
Age  Commit message  Author
2026-01-28  cmd/compile, simd: capture VAES instructions and fix AVX512VAES feature  Junyang Shao
Previously the code filtered out VAES-only instructions; this CL adds them back. It adds the VAES feature check following the Intel XED data:
XED_ISA_SET_VAES: vaes.7.0.ecx.9 # avx.1.0.ecx.28
This CL also found that the old AVX512VAES feature check was not checking the correct bits, and fixes it:
XED_ISA_SET_AVX512_VAES_128: vaes.7.0.ecx.9 aes.1.0.ecx.25 avx512f.7.0.ebx.16 avx512vl.7.0.ebx.31
XED_ISA_SET_AVX512_VAES_256: vaes.7.0.ecx.9 aes.1.0.ecx.25 avx512f.7.0.ebx.16 avx512vl.7.0.ebx.31
XED_ISA_SET_AVX512_VAES_512: vaes.7.0.ecx.9 aes.1.0.ecx.25 avx512f.7.0.ebx.16
The check is restricted to the strictest common set: it requires avx512vl even for the 512-bit instructions, although they do not require it.
Change-Id: I4e2f72b312fd2411589fbc12f9ee5c63c09c2e9a
Reviewed-on: https://go-review.googlesource.com/c/go/+/738500
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
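The bit positions quoted in this commit message can be turned into a small decoding sketch. This is not the code from internal/cpu: the constant names, the `vaesFeatures` struct, and `detectVAES` are hypothetical, the raw register values are passed in as parameters rather than read with CPUID, and the OS-support checks that real AVX detection also needs are left out.

```go
package main

import "fmt"

// Bit masks taken from the XED entries in the commit message
// (notation: name.leaf.subleaf.register.bit).
const (
	bitAES      = 1 << 25 // aes.1.0.ecx.25
	bitAVX      = 1 << 28 // avx.1.0.ecx.28
	bitVAES     = 1 << 9  // vaes.7.0.ecx.9
	bitAVX512F  = 1 << 16 // avx512f.7.0.ebx.16
	bitAVX512VL = 1 << 31 // avx512vl.7.0.ebx.31
)

// vaesFeatures is a hypothetical result type for this sketch.
type vaesFeatures struct {
	VAES       bool // VEX-encoded VAES: vaes + avx
	AVX512VAES bool // EVEX-encoded VAES: vaes + aes + avx512f + avx512vl
}

// detectVAES decodes the feature bits from raw CPUID register values:
// ecx1 is ECX of leaf 1, ebx7 and ecx7 are EBX and ECX of leaf 7, subleaf 0.
func detectVAES(ecx1, ebx7, ecx7 uint32) vaesFeatures {
	has := func(reg, bit uint32) bool { return reg&bit != 0 }
	vaes := has(ecx7, bitVAES)
	return vaesFeatures{
		VAES: vaes && has(ecx1, bitAVX),
		// Strictest common set: avx512vl is required even though the
		// 512-bit XED entry does not list it.
		AVX512VAES: vaes && has(ecx1, bitAES) &&
			has(ebx7, bitAVX512F) && has(ebx7, bitAVX512VL),
	}
}

func main() {
	// A made-up CPU that has AVX-512F but lacks AVX-512VL.
	f := detectVAES(bitAVX|bitAES, bitAVX512F, bitVAES)
	fmt.Printf("VAES=%v AVX512VAES=%v\n", f.VAES, f.AVX512VAES)
}
```

Requiring avx512vl for every width keeps a single flag sufficient for all three vector sizes, at the cost of rejecting a CPU that supports only the 512-bit form.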
2025-09-30  [dev.simd] cmd/compile, simd: add AES instructions  Junyang Shao
AVXAES is a composite feature set; Intel lists it as "AVXAES" in the XED data instead of separating its components. The tests will be in the next CL. Change-Id: I89c97261f2228b2fdafb48f63e82ef6239bdd5ca Reviewed-on: https://go-review.googlesource.com/c/go/+/706055 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2025-09-03  [dev.simd] all: merge master (4c4cefc) into dev.simd  Cherry Mui
Merge List: + 2025-09-03 4c4cefc19a cmd/gofmt: simplify logic to process arguments + 2025-09-03 925a3cdcd1 unicode/utf8: make DecodeRune{,InString} inlineable + 2025-09-03 3e596d448f math: rename Modf parameter int to integer + 2025-09-02 2a7f1d47b0 runtime: use one more address bit for tagged pointers + 2025-09-02 b09068041a cmd/dist: run racebench tests only in longtest mode + 2025-09-02 355370ac52 runtime: add comment for concatstring2 + 2025-09-02 1eec830f54 go/doc: linkify interface methods + 2025-08-31 7bba745820 cmd/compile: use generated loops instead of DUFFZERO on loong64 + 2025-08-31 882335e2cb cmd/internal/obj/loong64: add LDPTR.{W/D} and STPTR.{W/D} instructions support + 2025-08-31 d4b17f5869 internal/runtime/atomic: reset wrong jump target in Cas{,64} on loong64 + 2025-08-31 6a08e80399 net/http: skip redirecting in ServeMux when URL path for CONNECT is empty + 2025-08-29 8bcda6c79d runtime/race: add race detector support for linux/riscv64 + 2025-08-29 8377adafc5 cmd/cgo: split loadDWARF into two parts + 2025-08-29 a7d9d5a80a cmd/cgo: move typedefs and typedefList out of Package + 2025-08-29 1d459c4357 all: delete more windows/arm remnants + 2025-08-29 27ce6e4e26 cmd/compile: remove sign extension before MULW on riscv64 + 2025-08-29 84b070bfb1 cmd/compile/internal/ssa: make oneBit function generic + 2025-08-29 fe42628dae internal/cpu: inline DebugOptions + 2025-08-29 94b7d519bd net: update document on limitation of iprawsock on Windows + 2025-08-29 ba9e1ddccf testing: allow specify temp dir by GOTMPDIR environment variable + 2025-08-29 9f6936b8da cmd/link: disallow linkname of runtime.addmoduledata + 2025-08-29 89d41d254a bytes, strings: speed up TrimSpace + 2025-08-29 38204e0872 testing/synctest: call out common issues with tests + 2025-08-29 252c901125 os,syscall: pass file flags to CreateFile on Windows + 2025-08-29 53515fb0a9 crypto/tls: use hash.Cloner + 2025-08-28 13bb48e6fb go/constant: fix complex != unknown comparison + 2025-08-28 ba1109feb5 
net: remove redundant cgoLookupCNAME return parameter + 2025-08-28 f74ed44ed9 net/http/httputil: remove redundant pw.Close() call in DumpRequestOut + 2025-08-28 a9689d2e0b time: skip TestLongAdjustTimers in short mode on single CPU systems + 2025-08-28 ebc763f76d syscall: only get parent PID if SysProcAttr.Pdeathsig is set + 2025-08-28 7f1864b0a8 strings: remove redundant "runs" from string.Fields docstring + 2025-08-28 90c21fa5b6 net/textproto: eliminate some bounds checks + 2025-08-27 e47d88beae os: return nil slice when ReadDir is used with a file on file_windows + 2025-08-27 6b837a64db cmd/internal/obj/loong64: simplify buildop + 2025-08-27 765905e3bd debug/elf: don't panic if symtab too small + 2025-08-27 2ee4b31242 net/http: Ensure that CONNECT proxied requests respect MaxResponseHeaderBytes + 2025-08-27 b21867b1a2 net/http: require exact match for CrossSiteProtection bypass patterns + 2025-08-27 d19e377f6e cmd/cgo: make it safe to run gcc in parallel + 2025-08-27 49a2f3ed87 net: allow zero value destination address in WriteMsgUDPAddrPort + 2025-08-26 afc51ed007 internall/poll: remove bufs field from Windows' poll.operation + 2025-08-26 801b74eb95 internal/poll: remove rsa field from Windows' poll.operation + 2025-08-26 fa18c547cd syscall: sort Windows env block in StartProcess + 2025-08-26 bfd130db02 internal/poll: don't use stack-allocated WSAMsg parameters + 2025-08-26 dae9e456ae runtime: identify virtual memory layout for riscv64 + 2025-08-25 25c2d4109f math: use Trunc to implement Modf + 2025-08-25 4e05a070c4 math: implement IsInf using Abs + 2025-08-25 1eed4f32a0 math: optimize Signbit implementation slightly + 2025-08-25 bd71b94659 cmd/compile/internal: optimizing add+sll rule using ALSLV instruction on loong64 + 2025-08-25 ea55ca3600 runtime: skip doInit of plugins in runtime.main + 2025-08-25 9ae2f1fb57 internal/trace: skip async preempt off tests on low end systems + 2025-08-25 bbd5342a62 net: fix cgoResSearch + 2025-08-25 ed7f804775 os: set full 
name for Roots created with Root.OpenRoot + 2025-08-25 a21249436b internal/poll: use fdMutex to provide read/write locking on Windows + 2025-08-24 44c5956bf7 test/codegen: add Mul2 and DivPow2 test for loong64 + 2025-08-24 0aa8019e94 test/codegen: add Mul* test for loong64 + 2025-08-24 83420974b7 test/codegen: add sqrt* abs and copysign test for loong64 + 2025-08-23 f2db0dca0b net/http/httptest: redirect example.com requests to server + 2025-08-22 d86ec92499 internal/syscall/windows: increase internal Windows O_ flags values + 2025-08-22 9d3f7fda70 crypto/tls: fix quic comment typo + 2025-08-22 78a05c541f internal/poll: don't pass non-nil WSAMsg.Name with 0 namelen on windows + 2025-08-22 52c3f73fda runtime/metrics: improve doc + 2025-08-22 a076f49757 os: fix Root.MkdirAll to handle race of directory creation + 2025-08-22 98238fd495 all: delete remaining windows/arm code + 2025-08-21 1ad30844d9 cmd/asm: process forward jump to PCALIGN + 2025-08-21 13c082601d internal/poll: permit nil destination address in WriteMsg{Inet4,Inet6} + 2025-08-21 9b0a507735 runtime: remove remaining windows/arm files and comments + 2025-08-21 1843f1e9c0 cmd/compile: use zero register instead of specialized *zero instructions on loong64 + 2025-08-21 e0870a0a12 cmd/compile: simplify zerorange on loong64 + 2025-08-21 fb8bbe46d5 cmd/compile/internal/ssa: eliminate unnecessary extension operations + 2025-08-21 9632ba8160 cmd/compile: optimize some patterns into revb2h/revb4h instruction on loong64 + 2025-08-21 8dcab6f450 syscall: simplify execve handling on libc platforms + 2025-08-21 ba840c1bf9 cmd/compile: deduplication in the source code generated by mknode + 2025-08-21 fa706ea50f cmd/compile: optimize rule (x + x) << c to x << c+1 on loong64 + 2025-08-21 ffc85ee1f1 cmd/internal/objabi,cmd/link: add support for additional riscv64 relocations Change-Id: I3896f74b1a3cc0a52b29ca48767bb0ba84620f71
2025-08-29  internal/cpu: inline DebugOptions  Tobias Klauser
internal/cpu.DebugOptions is only ever set in runtime.cpuinit on unix-like platforms. DebugOptions itself is only used in MustHaveDebugOptionsSupport, so inline the GOOS check there. Change-Id: I6a35d6b8afcdadfc59585258002f53c20026116c Reviewed-on: https://go-review.googlesource.com/c/go/+/699775 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Tobias Klauser <tobias.klauser@gmail.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Florian Lehner <lehner.florian86@gmail.com>
2025-08-14  [dev.simd] all: merge master (924fe98) into dev.simd  Cherry Mui
Conflicts: - src/cmd/compile/internal/amd64/ssa.go - src/cmd/compile/internal/ssa/expand_calls.go - src/cmd/compile/internal/ssagen/ssa.go - src/internal/buildcfg/exp.go - src/internal/cpu/cpu.go - src/internal/cpu/cpu_x86.go - src/runtime/mkpreempt.go - src/runtime/preempt_amd64.go - src/runtime/preempt_amd64.s Merge List: + 2025-08-14 924fe98902 cmd/internal/obj/riscv: add encoding for compressed riscv64 instructions + 2025-08-13 320df537cc cmd/compile: emit classify instructions for infinity tests on riscv64 + 2025-08-13 ca66f907dd cmd/compile: use generated loops instead of DUFFCOPY on amd64 + 2025-08-13 4b1800e476 encoding/json/v2: cleanup error constructors + 2025-08-13 af8870708b encoding/json/v2: fix incorrect marshaling of NaN in float64 any + 2025-08-13 0a75e5a07b encoding/json/v2: fix wrong type with cyclic marshal error in map[string]any + 2025-08-13 de9b6f9875 cmd/pprof: update vendored github.com/google/pprof + 2025-08-13 674c5f0edd os/exec: fix incorrect expansion of ".." in LookPath on plan9 + 2025-08-13 9bbea0f21a cmd/compile: during regalloc, fixedreg values are always available + 2025-08-13 08eef97500 runtime/trace: fix documentation typo + 2025-08-13 2fe5d51d04 internal/trace: fix wrong scope for Event.Range or EvGCSweepActive + 2025-08-13 9fcb87c352 cmd/compile: teach prove about len's & cap's max based on the element size + 2025-08-13 9763ece873 cmd/compile: absorb NEGV into branch on loong64 + 2025-08-13 f10a82b76f all: update vendored dependencies [generated] + 2025-08-13 3bea95b277 cmd/link/internal/ld: remove OpenBSD buildid workaround + 2025-08-12 90b7d7aaa2 cmd/compile/internal: optimize multiplication use new operation 'ADDshiftLLV' on loong64 + 2025-08-12 1b263fc604 runtime/race: restore previous version of LLVM TSAN on macOS + 2025-08-12 b266318cf7 cmd/compile/internal/ssa: use BEQ/BNE to optimize the combination of XOR and EQ/NE on loong64 + 2025-08-12 adbf59525c internal/runtime/gc/scan: avoid -1 index when cache sizes unavailable + 
2025-08-12 4e182db5fc Revert "cmd/compile: use generated loops instead of DUFFCOPY on amd64" + 2025-08-12 d2b3c1a504 internal/trace: clarify which StateTransition events have stacks + 2025-08-12 f63e12d0e0 internal/trace: fix Sync.ClockSnapshot comment + 2025-08-12 8e317da77d internal/trace: remove unused StateTransition.id field + 2025-08-12 f67d8ff34a internal/trace/tracev2: adjust comment for consistency + 2025-08-12 fe4d445c36 internal/trace/tracev2: fix EvSTWBegin comment to include stack ID + 2025-08-12 750789fab7 internal/trace/internal/testgen: fix missing stacks nframes arg + 2025-08-12 889ab74169 internal/runtime/gc/scan: import scan kernel from gclab [green tea] + 2025-08-12 182336bf05 net/http: fix data race in client + 2025-08-12 f04421ea9a cmd/compile: soften test for 74788 + 2025-08-12 28aa529c99 cmd/compile: use generated loops instead of DUFFZERO on arm64 + 2025-08-12 ec9e1176c3 cmd/compile: use generated loops instead of DUFFCOPY on amd64 + 2025-08-12 d0a64f7969 Revert "cmd/compile/internal/ssa: Use transitive properties for len/cap" + 2025-08-12 00a7bdcb55 all: delete aliastypeparams GOEXPERIMENT + 2025-08-11 74421a305b Revert "cmd/compile: allow multi-field structs to be stored directly in interfaces" + 2025-08-11 c31359138c Revert "cmd/compile: allow StructSelect [x] of interface data fields for x>0" + 2025-08-11 7248995b60 Revert "cmd/compile: allow more args in StructMake folding rule" + 2025-08-11 caf9fc3ccd Revert "reflect: handle zero-sized fields of directly-stored structures correctly" + 2025-08-11 ce3f3e2ae7 cmd/link/internal/ld, internal/syscall/unix: use posix_fallocate on netbsd + 2025-08-11 3dbef65bf3 database/sql: allow drivers to override Scan behavior + 2025-08-11 2b804abf07 net: context aware Dialer.Dial functions + 2025-08-11 6abfe7b0de cmd/dist: require Go 1.24.6 as minimum bootstrap toolchain + 2025-08-11 691af6ca28 encoding/json: fix Indent trailing whitespace regression in goexperiment.jsonv2 + 2025-08-11 925149da20 
net/http: add example for CrossOriginProtection + 2025-08-11 cf4af0b2f3 encoding/json/v2: fix UnmarshalDecode regression with EOF + 2025-08-11 b096ddb9ea internal/runtime/maps: loop invariant code motion with h2(hash) by hand + 2025-08-11 a2431776eb net, os, file/filepath, syscall: use slices.Equal in tests + 2025-08-11 a7f05b38f7 cmd/compile: convert branch with zero to more optimal branch zero on loong64 + 2025-08-11 1718828c81 internal/sync: warn about incorrect unsafe usage in HashTrieMap + 2025-08-11 084c0f8494 cmd/compile: allow InlMark operations to be speculatively executed + 2025-08-10 a62f72f7a7 cmd/compile/internal/ssa: optimise more branches with SGTconst/SGTUconst on loong64 + 2025-08-08 fbac94a799 internal/sync: rename Store parameter from old to new + 2025-08-08 317be4cfeb cmd/compile/internal/staticinit: remove deadcode + 2025-08-08 bce5601cbb cmd/go: fix fips doc link + 2025-08-08 777d76c4f2 text/template: use sync.OnceValue for builtinFuncs + 2025-08-08 0201524c52 math: remove redundant infinity tests + 2025-08-08 dcc77f9e3c cmd/go: fix get -tool when multiple packages are provided + 2025-08-08 c7b85e9ddc all: update blog link + 2025-08-08 a8dd771e13 crypto/tls: check if quic conn can send session ticket + 2025-08-08 bdb2d50fdf net: fix WriteMsgUDPAddrPort addr handling on IPv4 sockets + 2025-08-08 768c51e368 internal/runtime/maps: remove unused var bitsetDeleted + 2025-08-08 b3388569a1 reflect: handle zero-sized fields of directly-stored structures correctly + 2025-08-08 d83b16fcb8 internal/bytealg: vector implementation of compare for riscv64 + 2025-08-07 dd3abf6bc5 internal/bytealg: optimize Index/IndexString on loong64 + 2025-08-07 73ff6d1480 cmd/internal/obj/loong64: change the immediate range of ALSL{W/WU/V} + 2025-08-07 f3606b0825 cmd/compile/internal/ssa: fix typo in LOONG64Ops.go comment + 2025-08-07 ee7bb8969a cmd/internal/obj/loong64: add support for FSEL instruction + 2025-08-07 1f7ffca171 time: skip TestLongAdjustTimers on plan9 (too 
slow) + 2025-08-06 8282b72d62 runtime/race: update darwin race syso + 2025-08-06 dc54d7b607 all: remove support for windows/arm + 2025-08-06 e0a1ea431c cmd/compile: make panicBounds stack frame smaller on ppc64 + 2025-08-06 2747f925dd debug/macho: support reading imported symbols without LC_DYSYMTAB + 2025-08-06 025d36917c cmd/internal/testdir: pass -buildid to link command + 2025-08-06 f53dcb6280 cmd/internal/testdir: unify link command + 2025-08-06 a3895fe9f1 database/sql: avoid closing Rows while scan is in progress + 2025-08-06 608e9fac90 go/types, types2: flip on position tracing + 2025-08-06 72e8237cc1 cmd/compile: allow more args in StructMake folding rule + 2025-08-06 3406a617d9 internal/bytealg: vector implementation of indexbyte for riscv64 + 2025-08-06 75ea2d05c0 internal/bytealg: vector implementation of equal for riscv64 + 2025-08-05 17a8be7117 crypto/sha512: use const table for key loading on loong64 + 2025-08-05 dda9d780e2 crypto/sha256: use const table for key loading on loong64 + 2025-08-05 5defe8ebb3 internal/chacha8rand: replace WORD with instruction VMOVQ + 2025-08-05 4c7362e41c cmd/internal/obj/loong64: add new instructions ALSL{W/WU/V} for loong64 + 2025-08-05 a552737418 cmd/compile: fold negation into multiplication on loong64 + 2025-08-05 e1fd4faf91 runtime: fix godoc comment for inVDSOPage + 2025-08-05 bcd25c79aa cmd/compile: allow StructSelect [x] of interface data fields for x>0 + 2025-08-05 b0945a54b5 cmd/dist, internal/platform: mark freebsd/riscv64 broken + 2025-08-05 55d961b202 runtime: save AVX2 and AVX-512 state on asynchronous preemption + 2025-08-05 af0c4fe2ca runtime: save scalar registers off stack in amd64 async preemption + 2025-08-05 e73afaae69 internal/cpu: add AVX-512-CD and DQ, and derived "basic AVX-512" + 2025-08-05 cef381ba60 runtime: eliminate global state in mkpreempt.go + 2025-08-05 c0025d5e0b go/parser: correct comment in expectedErrors + 2025-08-05 4ee0df8c46 cmd: remove dead code + 2025-08-05 a2c45f0eb1 runtime: 
test VDSO symbol hash values + 2025-08-05 cd55f86b8d cmd/compile: allow multi-field structs to be stored directly in interfaces + 2025-08-05 21ab0128b6 cmd/compile: remove support for old-style bounds check calls + 2025-08-05 802d056c78 cmd/compile: move ppc64 over to new bounds check strategy + 2025-08-05 a3295df873 cmd/compile/internal/ssa: Use transitive properties for len/cap + 2025-08-05 bd082857a5 doc: fix typo in go memory model doc + 2025-08-05 2b622b05a9 cmd/compile: remove isUintXPowerOfTwo functions + 2025-08-05 72147ffa75 cmd/compile: simplify isUintXPowerOfTwo implementation + 2025-08-05 26da1199eb cmd/compile: make isUint{32,64}PowerOfTwo implementations clearer + 2025-08-05 5ab9f23977 cmd/compile, runtime: add checkptr instrumentation for unsafe.Add + 2025-08-05 fcc036f03b cmd/compile: optimise float <-> int register moves on riscv64 Change-Id: Ie94f29d9b0cc14a52a536866f5abaef27b5c52d7
2025-08-12  internal/runtime/gc/scan: import scan kernel from gclab [green tea]  Michael Anthony Knyszek
This change imports the AVX512 GC scanning kernel from CL 593938 into a new package, internal/runtime/gc/scan. Credit to Austin Clements for most of this work. I did some cleanup, added support for more size classes to the expanders, and added more testing. I also restructured the code to make it easier and clearer to add new scan kernels for new architectures. For #73581. Change-Id: I76bcbc889fa6cad73ba0084620fae084a5912e6b Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64_avx512,gotip-linux-amd64_avx512-greenteagc Reviewed-on: https://go-review.googlesource.com/c/go/+/655280 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-08-05  internal/cpu: add AVX-512-CD and DQ, and derived "basic AVX-512"  Austin Clements
This adds detection for the CD and DQ sub-features of x86 AVX-512. Building on these, we also add a "derived" AVX-512 feature that bundles together the basic usable subset of subfeatures. Despite the F in AVX-512-F standing for "foundation", AVX-512-F+BW+DQ+VL together really form the basic usable subset of AVX-512 functionality. These have also all been supported together by almost every CPU, and are guaranteed by GOAMD64=v4, so there's little point in separating them out. This is a cherry-pick of CL 680899 from the dev.simd branch. Change-Id: I34356502bd1853ba2372e48db0b10d55cffe07a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/693396 Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Austin Clements <austin@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
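The "derived" feature described here can be sketched as a mask test over CPUID leaf 7 EBX. The bit positions are the architectural ones for AVX-512 F, DQ, CD, BW and VL; the constant and function names are hypothetical and not taken from internal/cpu, and the OS-support checks that real detection also performs are omitted.

```go
package main

import "fmt"

// AVX-512 sub-feature bits in EBX of CPUID leaf 7, subleaf 0.
const (
	avx512F  = 1 << 16
	avx512DQ = 1 << 17
	avx512CD = 1 << 28
	avx512BW = 1 << 30
	avx512VL = 1 << 31
)

// basicAVX512Mask bundles the subset the commit message calls the basic
// usable one: F+BW+DQ+VL. CD is detected separately and is not part of it.
const basicAVX512Mask = avx512F | avx512BW | avx512DQ | avx512VL

// hasBasicAVX512 reports whether every bit of the bundle is set in ebx7.
// The name is an assumption made for this sketch.
func hasBasicAVX512(ebx7 uint32) bool {
	return ebx7&basicAVX512Mask == basicAVX512Mask
}

func main() {
	// A made-up register value with F and CD only: the derived flag is off.
	fmt.Println(hasBasicAVX512(avx512F | avx512CD))
	// All four bundle members present: the derived flag is on.
	fmt.Println(hasBasicAVX512(avx512F | avx512BW | avx512DQ | avx512VL))
}
```

Callers that test the single derived flag avoid repeating the four-way conjunction at each use site.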
2025-08-04  [dev.simd] all: merge master (7a1679d) into dev.simd  Cherry Mui
Conflicts: - src/cmd/compile/internal/amd64/ssa.go - src/cmd/compile/internal/ssa/rewriteAMD64.go - src/internal/buildcfg/exp.go - src/internal/cpu/cpu.go - src/internal/cpu/cpu_x86.go - src/internal/goexperiment/flags.go Merge List: + 2025-08-04 7a1679d7ae cmd/compile: move s390x over to new bounds check strategy + 2025-08-04 95693816a5 cmd/compile: move riscv64 over to new bounds check strategy + 2025-08-04 d7bd7773eb go/parser: remove safePos + 2025-08-04 4b6cbc377f cmd/cgo/internal/test: use (syntactic) constant for C array bound + 2025-08-03 b2960e3580 cmd/internal/obj/loong64: add {V,XV}{BITCLR/BITSET/BITREV}[I].{B/H/W/D} instructions support + 2025-08-03 abeeef1c08 cmd/compile/internal/test: fix typo in comments + 2025-08-03 d44749b65b cmd/internal/obj/loong64: add [X]VLDREPL.{B/H/W/D} instructions support + 2025-08-03 d6beda863e runtime: add reference to debugPinnerV1 + 2025-08-01 4ab1aec007 cmd/go: modload should use a read-write lock to improve concurrency + 2025-08-01 e666972a67 runtime: deduplicate Windows stdcall + 2025-08-01 ef40549786 runtime,syscall: move loadlibrary and getprocaddress to syscall + 2025-08-01 336931a4ca cmd/go: use os.Rename to move files on Windows + 2025-08-01 eef5f8d930 cmd/compile: enforce that locals are always accessed with SP base register + 2025-08-01 e071617222 cmd/compile: optimize multiplication rules on loong64 + 2025-07-31 eb7f515c4d cmd/compile: use generated loops instead of DUFFZERO on amd64 + 2025-07-31 c0ee2fd4e3 cmd/go: explicitly reject module paths "go" and "toolchain" + 2025-07-30 a4d99770c0 runtime/metrics: add cleanup and finalizer queue metrics + 2025-07-30 70a2ff7648 runtime: add cgo call benchmark + 2025-07-30 69338a335a cmd/go/internal/gover: fix ModIsPrerelease for toolchain versions + 2025-07-30 cedf63616a cmd/compile: add floating point min/max intrinsics on s390x + 2025-07-30 82a1921c3b all: remove redundant Swiss prefixes + 2025-07-30 2ae059ccaf all: remove GOEXPERIMENT=swissmap + 2025-07-30 
cc571dab91 cmd/compile: deduplicate instructions when rewrite func results + 2025-07-30 2174a7936c crypto/tls: use standard chacha20-poly1305 cipher suite names + 2025-07-30 8330fb48a6 cmd/compile: move mips32 over to new bounds check strategy + 2025-07-30 9f9d7b50e8 cmd/compile: move mips64 over to new bounds check strategy + 2025-07-30 5216fd570e cmd/compile: move loong64 over to new bounds check strategy + 2025-07-30 89a0af86b8 cmd/compile: allow ops to specify clobbering input registers + 2025-07-30 5e94d72158 cmd/compile: simplify zerorange on arm64 + 2025-07-30 8cd85e602a cmd/compile: check domination of loop return in both controls + 2025-07-30 cefaed0de0 reflect: fix noswiss builder + 2025-07-30 3aa1b00081 regexp: fix compiling alternate patterns of different fold case literals + 2025-07-30 b1e933d955 cmd/compile: avoid extending when already sufficiently masked on loong64 + 2025-07-29 880ca333d7 cmd/compile: removing log2uint32 function + 2025-07-29 1513661dc3 cmd/compile: simplify logX implementations + 2025-07-29 bd94ae8903 cmd/compile: use unsigned power-of-two detector for unsigned mod + 2025-07-29 f3582fc80e cmd/compile: add unsigned power-of-two detector + 2025-07-29 f7d167fe71 internal/abi: move direct/indirect flag from Kind to TFlag + 2025-07-29 e0b07dc22e os/exec: fix incorrect expansion of "", "." and ".." 
in LookPath + 2025-07-29 25816d401c internal/goexperiment: delete RangeFunc goexperiment + 2025-07-29 7961bf71f8 internal/goexperiment: delete CacheProg goexperiment + 2025-07-29 e15a14c4dd sync: remove synchashtriemap GOEXPERIMENT + 2025-07-29 7dccd6395c cmd/compile: move arm32 over to new bounds check strategy + 2025-07-29 d79405a344 runtime: only deduct assist credit for arenas during GC + 2025-07-29 19a086f716 cmd/go/internal/telemetrystats: count goexperiments + 2025-07-29 aa95ab8215 image: fix formatting of godoc link + 2025-07-29 4c854b7a3e crypto/elliptic: change a variable name that have the same name as keywords + 2025-07-28 b10eb1d042 cmd/compile: simplify zerorange on amd64 + 2025-07-28 f8eae7a3c3 os/user: fix tests to pass on non-english Windows + 2025-07-28 0984264471 internal/poll: remove msg field from Windows' poll.operation + 2025-07-28 d7b4114346 internal/poll: remove rsan field from Windows' poll.operation + 2025-07-28 361b1ab41f internal/poll: remove sa field from Windows' poll.operation + 2025-07-28 9b6bd64e46 internal/poll: remove qty and flags fields from Windows' poll.operation + 2025-07-28 cd3655a824 internal/runtime/maps: fix spelling errors in comments + 2025-07-28 d5dc36af45 runtime: remove openbsd/mips64 related code + 2025-07-28 64ba72474d errors: omit redundant nil check in type assertion for Join + 2025-07-28 e151db3e06 all: omit unnecessary type conversions + 2025-07-28 4569255f8c cmd/compile: cleanup SelectN rules by indexing into args + 2025-07-28 94645d2413 cmd/compile: rewrite cmov(x, x, cond) into x + 2025-07-28 10c5cf68d4 net/http: add proper panic message + 2025-07-28 46b5839231 test/codegen: fix failing condmove wasm tests + 2025-07-28 98f301cf68 runtime,syscall: move SyscallX implementations from runtime to syscall + 2025-07-28 c7ed3a1c5a internal/runtime/syscall/windows: factor out code from runtime + 2025-07-28 e81eac19d3 hash/crc32: fix incorrect checksums with avx512+race + 2025-07-25 6fbad4be75 cmd/compile: remove 
no-longer-necessary call to calculateDepths + 2025-07-25 5045fdd8ff cmd/compile: fix containsUnavoidableCall computation + 2025-07-25 d28b27cd8e go/types, types2: use nil to represent incomplete explicit aliases + 2025-07-25 7b53d8d06e cmd/compile/internal/types2: add loaded state between loader calls and constraint expansion + 2025-07-25 374e3be2eb os/user: user random name for the test user account + 2025-07-25 1aa154621d runtime: rename scanobject to scanObject + 2025-07-25 41b429881a runtime: duplicate scanobject in greentea and non-greentea files + 2025-07-25 aeb256e98a cmd/compile: remove unused arg from gorecover + 2025-07-25 08376e1a9c runtime: iterate through inlinings when processing recover() + 2025-07-25 c76c3abc54 encoding/json: fix truncated Token error regression in goexperiment.jsonv2 + 2025-07-25 ebdbfccd98 encoding/json/jsontext: preserve buffer capacity in Encoder.Reset + 2025-07-25 91c4f0ccd5 reflect: avoid a bounds check in stack-constrained code + 2025-07-24 3636ced112 encoding/json: fix extra data regression under goexperiment.jsonv2 + 2025-07-24 a6eec8bdc7 encoding/json: reduce error text regressions under goexperiment.jsonv2 + 2025-07-24 0fa88dec1e time: remove redundant uint32 conversion in split + 2025-07-24 ada30b8248 internal/buildcfg: add ability to get GORISCV64 variable in GOGOARCH + 2025-07-24 6f6c6c5782 cmd/internal/obj: rip out argp adjustment for wrapper frames + 2025-07-24 7b50024330 runtime: detect successful recovers differently + 2025-07-24 7b9de668bd unicode/utf8: skip ahead during ascii runs in Valid/ValidString + 2025-07-24 076eae436e cmd/compile: move amd64 and 386 over to new bounds check strategy + 2025-07-24 f703dc5bef cmd/compile: add missing StringLen rule in prove + 2025-07-24 394d0bee8d cmd/compile: move arm64 over to new bounds check strategy + 2025-07-24 3024785b92 cmd/compile,runtime: remember idx+len for bounds check failure with less code + 2025-07-24 741a19ab41 runtime: move bounds check constants to 
internal/abi + 2025-07-24 ce05ad448f cmd/compile: rewrite condselects into doublings and halvings + 2025-07-24 fcd28070fe cmd/compile: add opt branchelim to rewrite some CondSelect into math + 2025-07-24 f32cf8e4b0 cmd/compile: learn transitive proofs for safe unsigned subs + 2025-07-24 d574856482 cmd/compile: learn transitive proofs for safe negative signed adds + 2025-07-24 1a72920f09 cmd/compile: learn transitive proofs for safe positive signed adds + 2025-07-24 e5f202bb60 cmd/compile: learn transitive proofs for safe unsigned adds + 2025-07-24 bd80f74bc1 cmd/compile: fold shift through AND for slice operations + 2025-07-24 5c45fe1385 internal/runtime/syscall: rename to internal/runtime/syscall/linux + 2025-07-24 592c2db868 cmd/compile: improve loopRotate to handle nested loops + 2025-07-24 dcb479c2f9 cmd/compile: optimize slice bounds checking with SUB/SUBconst comparisons + 2025-07-24 f11599b0b9 internal/poll: remove handle field from Windows' poll.operation + 2025-07-24 f7432e0230 internal/poll: remove fd field from Windows' poll.operation + 2025-07-24 e84ed38641 runtime: add benchmark for small-size memmory operation + 2025-07-24 18dbe5b941 hash/crc32: add AVX512 IEEE CRC32 calculation + 2025-07-24 c641900f72 cmd/compile: prefer base.Fatalf to panic in dwarfgen + 2025-07-24 d71d8aeafd cmd/internal/obj/s390x: add MVCLE instruction + 2025-07-24 b6cf1d94dc runtime: optimize memclr on mips64x + 2025-07-24 a8edd99479 runtime: improvement in memclr for s390x + 2025-07-24 bd04f65511 internal/runtime/exithook: fix a typo + 2025-07-24 5c8624a396 cmd/internal/goobj: make error output clear + 2025-07-24 44d73dfb4e cmd/go/internal/doc: clean up after merge with cmd/internal/doc + 2025-07-24 bd446662dd cmd/internal/doc: merge with cmd/go/internal/doc + 2025-07-24 da8b50c830 cmd/doc: delete + 2025-07-24 6669aa3b14 runtime: randomize heap base address + 2025-07-24 26338a7f69 cmd/compile: use better fatal message for staticValue1 + 2025-07-24 8587ba272e cmd/cgo: compare 
malloc return value to NULL instead of literal 0 + 2025-07-24 cae45167b7 go/types, types2: better error messages for certain type mismatches + 2025-07-24 2ddf542e4c cmd/compile: use ,ok return idiom for sparsemap.get + 2025-07-24 6505fcbd0a cmd/compile: use generics for sparse map + 2025-07-24 14f5eb7812 cmd/api: rerun updategolden + 2025-07-24 52b6d7f67a runtime: drop NetBSD kernel bug sysmon workaround fixed in NetBSD 9.2 + 2025-07-24 1ebebf1cc1 cmd/go: clean should respect workspaces + 2025-07-24 6536a93547 encoding/json/jsontext: preserve buffer capacity in Decoder.Reset + 2025-07-24 efc37e97c0 cmd/go: always return the cached path from go tool -n + 2025-07-23 98a031193b runtime: check TestUsingVDSO ExitError type assertion + 2025-07-23 6bb42997c8 doc/next: initialize + 2025-07-23 2696a11a97 internal/goversion: update Version to 1.26 + 2025-07-23 489868f776 cmd/link: scope test to linux & net.sendFile + 2025-07-22 71c2bf5513 cmd/compile: fix loclist for heap return vars without optimizations + 2025-07-22 c74399e7f5 net: correct comment for ListenConfig.ListenPacket + 2025-07-22 4ed9943b26 all: go fmt + 2025-07-22 1aaf7422f1 cmd/internal/objabi: remove redundant word in comment + 2025-07-21 d5ec0815e6 runtime: relax TestMemoryLimitNoGCPercent a bit + 2025-07-21 f7cc61e7d7 cmd/compile: for arm64 epilog, do SP increment with a single instruction + 2025-07-21 5dac42363b runtime: fix asan wrapper for riscv64 + 2025-07-21 e5502e0959 cmd/go: check subcommand properties + 2025-07-19 2363897932 cmd/internal/obj: enable got pcrel itype in fips140 for riscv64 + 2025-07-19 e32255fcc0 cmd/compile/internal/ssa: restrict architectures for TestDebugLines_74576 + 2025-07-18 0451816430 os: revert the use of AddCleanup to close files and roots + 2025-07-18 34b70684ba go/types: infer correct type for y in append(bytes, y...) 
+ 2025-07-17 66536242fc cmd/compile/internal/escape: improve DWARF .debug_line numbering for literal rewriting optimizations + 2025-07-16 385000b004 runtime: fix idle time double-counting bug + 2025-07-16 f506ad2644 cmd/compile/internal/escape: speed up analyzing some functions with many closures + 2025-07-16 9c507e7942 cmd/link, runtime: on Wasm, put only function index in method table and func table + 2025-07-16 9782dcfd16 runtime: use 32-bit function index on Wasm + 2025-07-16 c876bf9346 cmd/internal/obj/wasm: use 64-bit instructions for indirect calls + 2025-07-15 b4309ece66 cmd/internal/doc: upgrade godoc pkgsite to 01b046e + 2025-07-15 75a19dbcd7 runtime: use memclrNoHeapPointers to clear inline mark bits + 2025-07-15 6d4a91c7a5 runtime: only clear inline mark bits on span alloc if necessary + 2025-07-15 0c6296ab12 runtime: have mergeInlineMarkBits also clear the inline mark bits + 2025-07-15 397d2117ec runtime: merge inline mark bits with gcmarkBits 8 bytes at a time + 2025-07-15 7dceabd3be runtime/maps: fix typo in group.go comment (instrinsified -> intrinsified) + 2025-07-15 d826bf4d74 os: remove useless error check + 2025-07-14 bb07e55aff runtime: expand GOMAXPROCS documentation + 2025-07-14 9159cd4ec6 encoding/json: decompose legacy options + 2025-07-14 c6556b8eb3 encoding/json/v2: add security section to doc + 2025-07-11 6ebb5f56d9 runtime: gofmt after CL 643897 and CL 662455 + 2025-07-11 1e48ca7020 encoding/json: remove legacy option to EscapeInvalidUTF8 + 2025-07-11 a0a99cb22b encoding/json/v2: report wrapped io.ErrUnexpectedEOF + 2025-07-11 9d04122d24 crypto/rsa: drop contradictory promise to keep PublicKey modulus secret + 2025-07-11 1ca23682dd crypto/rsa: fix documentation formatting + 2025-07-11 4bc3373c8e runtime: turn off large memmove tests under asan/msan Change-Id: I1e32d964eba770b85421efb86b305a2242f24466
2025-07-24hash/crc32: add AVX512 IEEE CRC32 calculationKlaus Post
Benchmark:

goos: windows
goarch: amd64
pkg: hash/crc32
cpu: AMD Ryzen 9 9950X 16-Core Processor

benchmark                                       old MB/s   new MB/s   speedup
BenchmarkCRC32/poly=IEEE/size=15/align=0-32      1081.48    1089.42   1.01x
BenchmarkCRC32/poly=IEEE/size=15/align=1-32      1085.87    1082.61   1.00x
BenchmarkCRC32/poly=IEEE/size=40/align=0-32      2756.33    2752.37   1.00x
BenchmarkCRC32/poly=IEEE/size=40/align=1-32      2758.27    2756.99   1.00x
BenchmarkCRC32/poly=IEEE/size=512/align=0-32    18133.44   18076.52   1.00x
BenchmarkCRC32/poly=IEEE/size=512/align=1-32    18151.05   18055.41   0.99x
BenchmarkCRC32/poly=IEEE/size=1kB/align=0-32    19902.93   48581.07   2.44x
BenchmarkCRC32/poly=IEEE/size=1kB/align=1-32    19966.99   48393.25   2.42x
BenchmarkCRC32/poly=IEEE/size=4kB/align=0-32    21690.33   51679.25   2.38x
BenchmarkCRC32/poly=IEEE/size=4kB/align=1-32    21655.30   51731.22   2.39x
BenchmarkCRC32/poly=IEEE/size=32kB/align=0-32   22046.57   46406.90   2.10x
BenchmarkCRC32/poly=IEEE/size=32kB/align=1-32   21986.22   46250.66   2.10x

AVX512 is enabled above 1KB input size. This rather high limit is used because AVX512 may be slower to ramp up than the regular SSE4 implementation for smaller inputs. This is not reflected in the benchmarks, since consecutive calls mean the CPU is "hot".

The 'HasAVX512VPCLMULQDQ' name mirrors the one in golang.org/x/sys/cpu

Change-Id: Id23685d8e3cc412b6d397a7d70056844bdb79271
GitHub-Last-Rev: 6639f07b9febc7c96a7f3b402a2fd60f7be5e154
GitHub-Pull-Request: golang/go#74701
Reviewed-on: https://go-review.googlesource.com/c/go/+/689435
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
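The size cutoff described in this message can be pictured as a small dispatch function. This is only a sketch under stated assumptions: `wideThreshold`, `checksumIEEE`, and the `hasWide` flag are hypothetical names, the `>=` comparison is a guess based on the 1kB benchmark rows showing the speedup, and both branches call the standard library so the checksum is the same either way.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// wideThreshold mirrors the 1 KB cutoff mentioned in the commit message.
// The constant name and the exact comparison used below are assumptions.
const wideThreshold = 1024

// checksumIEEE reports the checksum and which path would have been taken.
// Both paths use crc32.ChecksumIEEE here; in a real implementation the
// "wide" path would be the AVX512 routine and "baseline" the SSE4 one.
func checksumIEEE(p []byte, hasWide bool) (uint32, string) {
	if hasWide && len(p) >= wideThreshold {
		return crc32.ChecksumIEEE(p), "wide"
	}
	return crc32.ChecksumIEEE(p), "baseline"
}

func main() {
	for _, n := range []int{512, 1024, 4096} {
		_, path := checksumIEEE(make([]byte, n), true)
		fmt.Printf("size=%d path=%s\n", n, path)
	}
}
```

Keeping the threshold in one named constant makes it easy to retune if measurements on cold CPUs suggest a different cutoff.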
2025-07-21[dev.simd] simd, internal/cpu: support more AVX CPU Feature checksJunyang Shao
This CL adds more checks. It also changes HasAVX512GFNI to check exactly GFNI instead of being a virtual feature. This CL copies its logic from x/sys/arch. Change-Id: I4612b0409b8a3518928300562ae08bcf123d53a7 Reviewed-on: https://go-review.googlesource.com/c/go/+/688276 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2025-07-01[dev.simd] internal/cpu: add GFNI feature checkJunyang Shao
This CL amends the HasAVX512 flag with a GFNI check. This is needed because our SIMD API supports Galois Field operations. Change-Id: I3e957b7b2215d2b7b6b8a7a0ca3e2e60d453b2e5 Reviewed-on: https://go-review.googlesource.com/c/go/+/685295 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-06-13[dev.simd] internal/cpu: add AVX-512-CD and DQ, and derived "basic AVX-512"Austin Clements
This adds detection for the CD and DQ sub-features of x86 AVX-512. Building on these, we also add a "derived" AVX-512 feature that bundles together the basic usable subset of subfeatures. Despite the F in AVX-512-F standing for "foundation", AVX-512-F+BW+DQ+VL together really form the basic usable subset of AVX-512 functionality. These have also all been supported together by almost every CPU, and are guaranteed by GOAMD64=v4, so there's little point in separating them out. Change-Id: I34356502bd1853ba2372e48db0b10d55cffe07a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/680899 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
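A derived feature of the kind described here can be sketched as a flag computed from the individual sub-feature flags. The struct and method names below are hypothetical stand-ins, and the set of inputs follows the F+BW+DQ+VL grouping named in the message; the actual definition in internal/cpu may include further sub-features.

```go
package main

import "fmt"

// x86Features is a hypothetical stand-in for a CPU feature struct.
type x86Features struct {
	HasAVX512F  bool
	HasAVX512BW bool
	HasAVX512DQ bool
	HasAVX512VL bool

	// HasAVX512 is derived: it is true only when every sub-feature
	// in the "basic usable subset" is present.
	HasAVX512 bool
}

// derive fills in the derived flag from the detected sub-features.
func (f *x86Features) derive() {
	f.HasAVX512 = f.HasAVX512F && f.HasAVX512BW && f.HasAVX512DQ && f.HasAVX512VL
}

func main() {
	full := x86Features{HasAVX512F: true, HasAVX512BW: true, HasAVX512DQ: true, HasAVX512VL: true}
	full.derive()

	partial := x86Features{HasAVX512F: true, HasAVX512BW: true}
	partial.derive()

	fmt.Println(full.HasAVX512, partial.HasAVX512)
}
```

Callers then test a single flag instead of repeating the conjunction at every use site.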
2025-05-21internal/cpu: add ARM64.HasSHA3Filippo Valsorda
For #69536 Change-Id: If237226ba03e282443b4fc90484968c903198cb1 Reviewed-on: https://go-review.googlesource.com/c/go/+/616715 Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Roland Shoemaker <roland@golang.org>
2025-05-01cmd/compile,internal/cpu,runtime: intrinsify math/bits.OnesCount on riscv64Joel Sing
For riscv64/rva22u64 and above, we can intrinsify math/bits.OnesCount using the CPOP/CPOPW machine instructions. Since the native Go implementation of OnesCount is relatively expensive, it is also worth emitting a check for Zbb support when compiled for rva20u64.

On a Banana Pi F3, with GORISCV64=rva22u64:

              │     oc.1     │                oc.2                 │
              │    sec/op    │   sec/op     vs base                │
OnesCount-8     16.930n ± 0%   4.389n ± 0%  -74.08% (p=0.000 n=10)
OnesCount8-8     5.642n ± 0%   5.016n ± 0%  -11.10% (p=0.000 n=10)
OnesCount16-8    9.404n ± 0%   5.015n ± 0%  -46.67% (p=0.000 n=10)
OnesCount32-8   13.165n ± 0%   4.388n ± 0%  -66.67% (p=0.000 n=10)
OnesCount64-8   16.300n ± 0%   4.388n ± 0%  -73.08% (p=0.000 n=10)
geomean          11.40n        4.629n       -59.40%

On a Banana Pi F3, compiled with GORISCV64=rva20u64 and with Zbb detection enabled:

              │     oc.3     │                oc.4                 │
              │    sec/op    │   sec/op     vs base                │
OnesCount-8     16.930n ± 0%   5.643n ± 0%  -66.67% (p=0.000 n=10)
OnesCount8-8     5.642n ± 0%   5.642n ± 0%        ~ (p=0.447 n=10)
OnesCount16-8   10.030n ± 0%   6.896n ± 0%  -31.25% (p=0.000 n=10)
OnesCount32-8   13.170n ± 0%   5.642n ± 0%  -57.16% (p=0.000 n=10)
OnesCount64-8   16.300n ± 0%   5.642n ± 0%  -65.39% (p=0.000 n=10)
geomean          11.55n        5.873n       -49.16%

On a Banana Pi F3, compiled with GORISCV64=rva20u64 but with Zbb detection disabled:

              │    oc.3     │                oc.5                 │
              │   sec/op    │   sec/op     vs base                │
OnesCount-8     16.93n ± 0%   29.47n ± 0%  +74.07% (p=0.000 n=10)
OnesCount8-8    5.642n ± 0%   5.643n ± 0%        ~ (p=0.191 n=10)
OnesCount16-8   10.03n ± 0%   15.05n ± 0%  +50.05% (p=0.000 n=10)
OnesCount32-8   13.17n ± 0%   18.18n ± 0%  +38.04% (p=0.000 n=10)
OnesCount64-8   16.30n ± 0%   21.94n ± 0%  +34.60% (p=0.000 n=10)
geomean         11.55n        15.84n       +37.16%

For hardware without Zbb, this adds ~5ns overhead, while for hardware with Zbb we achieve a performance gain of up to 11ns. It is worth noting that OnesCount8 is cheap enough that it is preferable to stick with the generic version in this case.
Change-Id: Id657e40e0dd1b1ab8cc0fe0f8a68df4c9f2d7da5 Reviewed-on: https://go-review.googlesource.com/c/go/+/660856 Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
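The runtime check described in this commit amounts to choosing between a hardware population count and a software fallback based on a detected feature flag. The following is a rough illustration only: `hasZbb`, `onesCountGeneric`, and `onesCount` are hypothetical names, and `bits.OnesCount64` stands in for the CPOP instruction that the compiler would emit directly.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hasZbb is a hypothetical flag that a runtime would set after probing
// the CPU for the Zbb extension.
var hasZbb bool

// onesCountGeneric is a portable software population count that clears
// the lowest set bit on each iteration.
func onesCountGeneric(x uint64) int {
	n := 0
	for x != 0 {
		x &= x - 1
		n++
	}
	return n
}

// onesCount dispatches on the feature flag. The "fast" branch uses
// bits.OnesCount64 as a stand-in for a single CPOP instruction.
func onesCount(x uint64) int {
	if hasZbb {
		return bits.OnesCount64(x)
	}
	return onesCountGeneric(x)
}

func main() {
	hasZbb = false
	slow := onesCount(0xf0f0)
	hasZbb = true
	fast := onesCount(0xf0f0)
	fmt.Println(slow, fast)
}
```

The branch on the flag is the overhead the message measures on hardware without Zbb; when the build target already guarantees the extension, the check can be dropped entirely.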
2025-03-11internal/bytealg: optimize Count{,String} in loong64Guoqi Chen
Benchmark on Loongson 3A6000 and 3A5000: goos: linux goarch: loong64 pkg: bytes cpu: Loongson-3A6000 @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | CountSingle/10 13.210n ± 0% 9.984n ± 0% -24.42% (p=0.000 n=15) CountSingle/32 31.970n ± 1% 7.205n ± 0% -77.46% (p=0.000 n=15) CountSingle/4K 4039.0n ± 0% 108.7n ± 0% -97.31% (p=0.000 n=15) CountSingle/4M 4158.9µ ± 0% 117.3µ ± 0% -97.18% (p=0.000 n=15) CountSingle/64M 68.641m ± 0% 2.585m ± 1% -96.23% (p=0.000 n=15) geomean 13.72µ 1.189µ -91.34% | bench.old | bench.new | | B/s | B/s vs base | CountSingle/10 722.0Mi ± 0% 955.2Mi ± 0% +32.30% (p=0.000 n=15) CountSingle/32 954.6Mi ± 1% 4235.4Mi ± 0% +343.68% (p=0.000 n=15) CountSingle/4K 967.2Mi ± 0% 35947.6Mi ± 0% +3616.64% (p=0.000 n=15) CountSingle/4M 961.8Mi ± 0% 34092.7Mi ± 0% +3444.71% (p=0.000 n=15) CountSingle/64M 932.4Mi ± 0% 24757.2Mi ± 1% +2555.24% (p=0.000 n=15) geomean 902.2Mi 10.17Gi +1054.77% goos: linux goarch: loong64 pkg: bytes cpu: Loongson-3A5000 @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | CountSingle/10 14.41n ± 0% 12.81n ± 0% -11.10% (p=0.000 n=15) CountSingle/32 36.230n ± 0% 9.609n ± 0% -73.48% (p=0.000 n=15) CountSingle/4K 4366.0n ± 0% 165.5n ± 0% -96.21% (p=0.000 n=15) CountSingle/4M 4464.7µ ± 0% 325.2µ ± 0% -92.72% (p=0.000 n=15) CountSingle/64M 75.627m ± 0% 8.307m ± 69% -89.02% (p=0.000 n=15) geomean 15.04µ 2.229µ -85.18% | bench.old | bench.new | | B/s | B/s vs base | CountSingle/10 661.8Mi ± 0% 744.4Mi ± 0% +12.49% (p=0.000 n=15) CountSingle/32 842.4Mi ± 0% 3176.1Mi ± 0% +277.03% (p=0.000 n=15) CountSingle/4K 894.7Mi ± 0% 23596.7Mi ± 0% +2537.34% (p=0.000 n=15) CountSingle/4M 895.9Mi ± 0% 12299.7Mi ± 0% +1272.88% (p=0.000 n=15) CountSingle/64M 846.3Mi ± 0% 7703.9Mi ± 41% +810.34% (p=0.000 n=15) geomean 823.3Mi 5.424Gi +574.68% Change-Id: Ie07592beac61bdb093470c524049ed494df4d703 Reviewed-on: https://go-review.googlesource.com/c/go/+/586055 Reviewed-by: Meidan Li <limeidan@loongson.cn> Reviewed-by: 
Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2025-02-05cpu/internal: provide runtime detection of RISC-V extensions on LinuxMark Ryan
Add a RISCV64 variable to cpu/internal that indicates both the presence of RISC-V extensions and performance information about the underlying RISC-V cores. The variable is only populated with non-false values on Linux. The detection code relies on the riscv_hwprobe syscall introduced in Linux 6.4. The patch can detect RVV 1.0 and whether the CPU supports fast misaligned accesses. It can only detect RVV 1.0 on a 6.5 kernel or later (without backports). Updates #61416 Change-Id: I2d8289345c885b699afff441d417cae38f6bdc54 Reviewed-on: https://go-review.googlesource.com/c/go/+/522995 Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: David Chase <drchase@google.com>
2024-11-12cmd/compile: optimize math/bits.OnesCount{16,32,64} implementation on loong64Guoqi Chen
Use Loong64's LSX instruction VPCNT to implement math/bits.OnesCount{16,32,64} and make it intrinsic. Benchmark results on loongson 3A5000 and 3A6000 machines: goos: linux goarch: loong64 pkg: math/bits cpu: Loongson-3A5000-HV @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | OnesCount 4.413n ± 0% 1.401n ± 0% -68.25% (p=0.000 n=10) OnesCount8 1.364n ± 0% 1.363n ± 0% ~ (p=0.130 n=10) OnesCount16 2.112n ± 0% 1.534n ± 0% -27.37% (p=0.000 n=10) OnesCount32 4.533n ± 0% 1.529n ± 0% -66.27% (p=0.000 n=10) OnesCount64 4.565n ± 0% 1.531n ± 1% -66.46% (p=0.000 n=10) geomean 3.048n 1.470n -51.78% goos: linux goarch: loong64 pkg: math/bits cpu: Loongson-3A6000 @ 2500.00MHz | bench.old | bench.new | | sec/op | sec/op vs base | OnesCount 3.553n ± 0% 1.201n ± 0% -66.20% (p=0.000 n=10) OnesCount8 0.8021n ± 0% 0.8004n ± 0% -0.21% (p=0.000 n=10) OnesCount16 1.216n ± 0% 1.000n ± 0% -17.76% (p=0.000 n=10) OnesCount32 3.006n ± 0% 1.035n ± 0% -65.57% (p=0.000 n=10) OnesCount64 3.503n ± 0% 1.035n ± 0% -70.45% (p=0.000 n=10) geomean 2.053n 1.006n -51.01% Change-Id: I07a5b8da2bb48711b896387ec7625145804affc8 Reviewed-on: https://go-review.googlesource.com/c/go/+/620978 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Meidan Li <limeidan@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-10-04internal/cpu: add CPU feature LAMCAS and LAM_BH detection on loong64Guoqi Chen
Change-Id: Ic5580c4ee006d87b3152ae5de7b25fb532c6a33f Reviewed-on: https://go-review.googlesource.com/c/go/+/612976 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Martin Möhrmann <martin@golang.org>
2024-08-29internal/cpu, runtime: make linux/loong64 HWCAP data availableWANG Xuerui
This can be used to toggle runtime usages of ISA extensions as such usages appear. Only the CRC32 bit is exposed for now, as the others are not going to be utilized in the standard library for a while. Change-Id: I774032ca84dc8bcf1c9f17558917315af07c7314 Reviewed-on: https://go-review.googlesource.com/c/go/+/482416 Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: xiaodong liu <teaofmoli@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2024-07-24internal/cpu: add DIT detection on arm64Roland Shoemaker
Add support for detecting the DIT feature on ARM64 processors. This mirrors https://go.dev/cl/597377, but using the platform specific semantics. Updates #66450 Change-Id: Ia107e3e3369de7825af70823b485afe2f587358e Reviewed-on: https://go-review.googlesource.com/c/go/+/598335 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
2024-07-22runtime: add ERMS-based memmove support for modern CPU platformsTangYang
The current memmove implementation uses REP MOVSB to copy data larger than 2KB when the useAVXmemmove global variable is false and the CPU supports the ERMS feature. This feature is currently only enabled on CPUs in the Sandy Bridge (Client), Sandy Bridge (Server), Ivy Bridge (Client), and Ivy Bridge (Server) microarchitectures. For modern Intel CPU microarchitectures that support the ERMS feature, such as Ice Lake (Server) and Sapphire Rapids, REP MOVSB achieves better performance than the AVX-based copy currently implemented in memmove.

Benchstat result:

goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz

                 │  ./old.txt  │              ./new.txt              │
                 │   sec/op    │   sec/op     vs base                │
Memmove/2048-2     25.24n ± 0%   24.27n ± 0%   -3.84% (p=0.000 n=10)
Memmove/4096-2     44.87n ± 0%   33.16n ± 1%  -26.11% (p=0.000 n=10)
geomean            33.65n        28.37n       -15.71%

                 │  ./old.txt   │              ./new.txt               │
                 │     B/s      │     B/s       vs base                │
Memmove/2048-2     75.56Gi ± 0%    78.59Gi ± 0%   +4.02% (p=0.000 n=10)
Memmove/4096-2     85.01Gi ± 0%   115.05Gi ± 1%  +35.34% (p=0.000 n=10)
geomean            80.14Gi         95.09Gi       +18.65%

Fixes #66958

Change-Id: I1fafd1b51a16752f83ac15047cf3b29422a79d5d
GitHub-Last-Rev: 89cf5af32b1b41e1499282058656a8a5c7aed359
GitHub-Pull-Request: golang/go#66959
Reviewed-on: https://go-review.googlesource.com/c/go/+/580735
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-05-15cmd/link: disallow pull-only linknamesCherry Mui
As mentioned in CL 584598, linkname is a mechanism that, when abused, can break API integrity and even safety of Go programs. CL 584598 is a first step to restrict the use of linknames, by implementing a blocklist. This CL takes a step further, tightening up the restriction by allowing linkname references ("pull") only when the definition side explicitly opts into it, by having a linkname on the definition (possibly to itself). This way, it is at least clear on the definition side that the symbol, despite being unexported, is accessed outside of the package. Unexported symbols without linkname can now be actually private. This is similar to the symbol visibility rule used by gccgo for years (which defines unexported non-linknamed symbols as C static symbols). As there can be pull-only linknames in the wild that may be broken by this change, we currently only enforce this rule for symbols defined in the standard library. Push linknames are added in the standard library to allow things to build. Linkname references to external (non-Go) symbols are still allowed, as their visibility is controlled by the C symbol visibility rules and enforced by the C (static or dynamic) linker. Assembly symbols are treated similarly to linknamed symbols. This is controlled by the -checklinkname linker flag, currently not enabled by default. A follow-up CL will enable it by default. Change-Id: I07344f5c7a02124dbbef0fbc8fec3b666a4b2b0e Reviewed-on: https://go-review.googlesource.com/c/go/+/585358 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Russ Cox <rsc@golang.org>
2023-11-15internal/cpu: detect support of AVX512Achille Roussel
Extracts changes that were submitted in other CLs to enable AVX512 detection, notably:
- https://go-review.googlesource.com/c/go/+/271521
- https://go-review.googlesource.com/c/go/+/379394
- https://go-review.googlesource.com/c/go/+/502476
This change adds properties to the cpu.X86 fields to enable runtime detection of AVX512, and the hasAVX512F, hasAVX512BW, and hasAVX512VL macros to support bypassing runtime checks in assembly code when GOAMD64=v4 is set. Change-Id: Ia7c3f22f1e66bf1de575aba522cb0d0a55ce791f Reviewed-on: https://go-review.googlesource.com/c/go/+/536257 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Martin Möhrmann <martin@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Martin Möhrmann <martin@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Martin Möhrmann <moehrmann@google.com> Commit-Queue: Martin Möhrmann <martin@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com>
2023-10-31internal/cpu: add comments to copied functionsapocelipes
Same as for other copied functions, such as stringsTrimSuffix in "os/executable_procfs.go". Change-Id: I9c9fbd75b009a5ae0e869cf1fddc77c0e08d9a67 GitHub-Last-Rev: 4c18865e15ede0f53121b6845a1879cdd70d1a38 GitHub-Pull-Request: golang/go#63704 Reviewed-on: https://go-review.googlesource.com/c/go/+/537056 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2023-10-31runtime: on arm32, detect whether we have sync instructionsKeith Randall
Make the choice of using these instructions dynamic (triggered by cpu feature detection) rather than static (triggered by GOARM setting). If GOARM>=7, we know we have them. For GOARM=5/6, dynamically dispatch based on auxv information. Update #17082 Update #61588 Change-Id: I8a50481d942f62cf36348998a99225d0d242f8af Reviewed-on: https://go-review.googlesource.com/c/go/+/525637 TryBot-Result: Gopher Robot <gobot@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Run-TryBot: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
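The static-versus-dynamic decision in this message can be summarized in a few lines. This is a minimal sketch, not the runtime's code: `useSyncInstructions` and its parameters are hypothetical, with `goarm` standing for the build-time GOARM value and `detected` for whatever the auxv-based probe reports.

```go
package main

import "fmt"

// useSyncInstructions decides whether the sync instructions may be used.
// goarm is the build-time GOARM setting; detected is the result of a
// runtime probe (for example from auxv information). Both names are
// assumptions made for this illustration.
func useSyncInstructions(goarm int, detected bool) bool {
	if goarm >= 7 {
		// Guaranteed by the build target, so no runtime check is needed.
		return true
	}
	// GOARM=5 or 6: rely on what was detected at startup.
	return detected
}

func main() {
	fmt.Println(useSyncInstructions(7, false)) // static guarantee
	fmt.Println(useSyncInstructions(6, true))  // dynamic detection
	fmt.Println(useSyncInstructions(5, false)) // fall back
}
```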
2023-04-25internal/cpu: add a detection for Neoverse(N2, V2) coresfanzha02
The memmove implementation relies on the variable runtime.arm64UseAlignedLoads to select the fastest code path. Considering that Neoverse N2 and V2 cores prefer aligned loads, this patch adds code to detect them for memmove performance. This patch also uses a new variable ARM64.IsNeoverse to represent all Neoverse cores, removing the more specific versions. Change-Id: I9e06eae01a0325a0b604ac6af1e55711dd6133f7 Reviewed-on: https://go-review.googlesource.com/c/go/+/487815 Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Fannie Zhang <Fannie.Zhang@arm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-11-08runtime internal/cpu: rename "Zeus" "NeoverseV1".Matthew Horsnell
Rename "Zeus" to "NeoverseV1" for the partnum 0xd40 to be consistent with the documentation of MIDR_EL1 as described in https://developer.arm.com/documentation/101427/0101/?lang=en Change-Id: I2e3d5ec76b953a831cb4ab0438bc1c403648644b Reviewed-on: https://go-review.googlesource.com/c/go/+/414775 Reviewed-by: Jonathan Swinney <jswinney@amazon.com> Auto-Submit: Ian Lance Taylor <iant@golang.org> Reviewed-by: Eric Fang <eric.fang@arm.com> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
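For context on the partnum mentioned above, the part number occupies bits [15:4] of MIDR_EL1 and the implementer code occupies bits [31:24]. The sketch below decodes those fields from a raw register value. The helper names and the sample value are made up for illustration; only the 0xd40 part number comes from the commit message, and 0x41 is Arm's implementer code.

```go
package main

import "fmt"

const (
	implementerARM = 0x41  // implementer code for Arm Limited
	partNeoverseV1 = 0xd40 // part number named in the commit message
)

// midrImplementer extracts bits [31:24] of a MIDR_EL1 value.
func midrImplementer(midr uint64) uint64 {
	return (midr >> 24) & 0xff
}

// midrPartNum extracts bits [15:4] of a MIDR_EL1 value.
func midrPartNum(midr uint64) uint64 {
	return (midr >> 4) & 0xfff
}

func main() {
	// Sample value constructed for this example: implementer 0x41,
	// variant 1, architecture 0xf, part number 0xd40, revision 1.
	midr := uint64(0x411fd401)

	isV1 := midrImplementer(midr) == implementerARM && midrPartNum(midr) == partNeoverseV1
	fmt.Printf("implementer=%#x part=%#x neoverseV1=%v\n",
		midrImplementer(midr), midrPartNum(midr), isV1)
}
```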
2022-08-15internal/cpu: detect sha-ni instruction support for AMD64ted
Addresses proposal #53084, required by the sha-256 change list developed for #50543. Change-Id: I5454d746fce069a7a4993d70dc5b0a5544f8eeaf Reviewed-on: https://go-review.googlesource.com/c/go/+/408794 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@google.com>
2022-08-09internal/cpu: add sha512 for arm64Meng Zhuo
The new M1 cpu (Apple) comes with sha512 hardware acceleration feature. Change-Id: I823d1e9b09b472bd21571eee75cc5314cd66b1ff Reviewed-on: https://go-review.googlesource.com/c/go/+/408836 Reviewed-by: Than McIntosh <thanm@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-10internal/cpu: report CPU if known on PPC64Paul E. Murphy
The PPC64 maintainers are testing on P10 hardware, so it is helpful to report the correct cpu, even if this information is not used elsewhere yet. Note, AIX will report the current CPU of the host system, so a POWER10 will not set the IsPOWER9 flag. This is existing behavior, and should be fixed in a separate patch. Change-Id: Iebe23dd96ebe03c8a1c70d1ed2dc1506bad3c330 Reviewed-on: https://go-review.googlesource.com/c/go/+/404394 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Heschi Kreinick <heschi@google.com>
2021-10-06internal/cpu: remove option to mark cpu features requiredMartin Möhrmann
With the removal of SSE2 runtime detection made in golang.org/cl/344350 we can remove this mechanism as there are no required features anymore. For making sure CPUs running a go program support all the minimal hardware requirements the go runtime should do feature checks early in the runtime initialization before it is likely any compiler emitted but unsupported instructions are used. This is already the case for e.g. checking MMX support on 386 arch targets. Change-Id: If7b1cb6f43233841e917d37a18314d06a334a734 Reviewed-on: https://go-review.googlesource.com/c/go/+/354209 Trust: Martin Möhrmann <martin@golang.org> Run-TryBot: Martin Möhrmann <martin@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com> Reviewed-by: Keith Randall <khr@golang.org>
2021-08-23all: replace runtime SSE2 detection with GO386 settingMartin Möhrmann
When GO386=sse2 we can assume sse2 to be present without a runtime check. If GO386=softfloat is set we can avoid the usage of SSE2 even if detected. This might cause a memcpy, memclr and bytealg slowdown of Go binaries compiled with softfloat on machines that support SSE2. Such setups are rare and should use GO386=sse2 instead if performance matters. On targets that support SSE2 we avoid the runtime overhead of dynamic cpu feature dispatch. The removal of runtime sse2 checks also allows simplifying internal/cpu further by removing handling of the required feature option as a followup after this CL. Change-Id: I90a853a8853a405cb665497c6d1a86556947ba17 Reviewed-on: https://go-review.googlesource.com/c/go/+/344350 Trust: Martin Möhrmann <martin@golang.org> Run-TryBot: Martin Möhrmann <martin@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2021-08-23runtime: use RDTSCP for instruction stream serialized read of TSCMartin Möhrmann
To ensure that all previous instructions have completed before the time stamp counter is read with RDTSC, an instruction sequence with instruction stream serializing properties is needed, one that guarantees waiting until all previous instructions have been executed. This does not necessarily mean waiting for all stores to be globally visible. This CL aims to replace vendor specific logic for determining the instruction sequence with CPU feature flag checks that are CPU vendor independent. On Intel, LFENCE has had the wanted properties at least since it was introduced together with SSE2 support. On AMD, instruction stream serializing LFENCE is supported by setting MSR C001_1029[1]=1 on AMD family 10h/12h/14h/15h/16h/17h processors. AMD family 0Fh/11h processors always support LFENCE as serializing. AMD plans support for this MSR and access to this bit for all future processors. Source: https://developer.amd.com/wp-content/resources/Managing-Speculation-on-AMD-Processors.pdf Reading the MSR to determine LFENCE properties is not always possible or reliable (hypervisors). The Linux kernel has relied on serializing LFENCE on AMD CPUs since a commit in July 2019: https://lkml.org/lkml/2019/7/22/295 and the MSR C001_1029 to enable serialization has been set by default with the Spectre v1 mitigations. Using MFENCE on AMD waits for previous instructions to have been executed but additionally flushes store buffers. To align the serialization properties without runtime detection of CPU manufacturers we can use the newer RDTSCP instruction, which waits until all previous instructions have been executed. RDTSCP is available on Intel since around 2008 and on AMD CPUs since around 2006. Support for RDTSCP can be checked independently of manufacturer by checking CPUID bits. Using RDTSCP is the default in Linux to read TSC in program order when the instruction is available. 
https://github.com/torvalds/linux/blob/e22ce8eb631bdc47a4a4ea7ecf4e4ba499db4f93/arch/x86/include/asm/msr.h#L231 Change-Id: Ifa841843b9abb2816f8f0754a163ebf01385306d Reviewed-on: https://go-review.googlesource.com/c/go/+/344429 Reviewed-by: Keith Randall <khr@golang.org> Trust: Martin Möhrmann <martin@golang.org> Run-TryBot: Martin Möhrmann <martin@golang.org> TryBot-Result: Go Bot <gobot@golang.org>
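The vendor-independent check mentioned in this message comes down to testing one CPUID bit: RDTSCP support is reported in bit 27 of EDX for extended leaf 0x80000001. Go cannot issue CPUID without assembly, so this sketch takes the register value as input. The function and constant names are hypothetical, and obtaining the EDX value is left to the caller.

```go
package main

import "fmt"

// cpuidExtRDTSCP is the bit in EDX of CPUID leaf 0x80000001 that
// reports RDTSCP support.
const cpuidExtRDTSCP = 1 << 27

// hasRDTSCP reports whether the given EDX value (as returned by CPUID
// with EAX=0x80000001) advertises the RDTSCP instruction. How the
// register value is obtained is outside the scope of this sketch.
func hasRDTSCP(edx uint32) bool {
	return edx&cpuidExtRDTSCP != 0
}

func main() {
	// Made-up register values for illustration.
	withBit := uint32(cpuidExtRDTSCP | 1<<20)
	withoutBit := uint32(1 << 20)

	fmt.Println(hasRDTSCP(withBit), hasRDTSCP(withoutBit))
}
```

Because the check looks only at a feature bit, the same code path works on Intel and AMD without inspecting the vendor string.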
2020-11-05internal/cpu: fix and cleanup ARM64 cpu feature fields and optionsMartin Möhrmann
Remove all cpu features from the ARM64 struct that are not initialized to reduce cache lines used and to avoid those features being accidentally used without actual detection if they are present. Add missing option to mask the CPUID feature. Change-Id: I94bf90c0655de1af2218ac72117ac6c52adfc289 Reviewed-on: https://go-review.googlesource.com/c/go/+/267658 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com> Trust: Martin Möhrmann <moehrmann@google.com>
2020-11-02runtime: improve memmove performance on arm64Jonathan Swinney
Replace the memmove implementation for moves of 17 bytes or larger with an implementation from ARM optimized software. The moves of 16 bytes or fewer are unchanged, but the registers used are updated to match the rest of the implementation. This implementation makes use of new optimizations: - software pipelined loop for large (>128 byte) moves - medium size moves (17..128 bytes) have a new implementation - address realignment when src or dst is unaligned - preference for aligned src (loads) or dst (stores) depending on CPU To support preference for aligned loads or aligned stores, a new CPU flag is added. This flag indicates that the detected micro architecture performs better with aligned loads. Some tested CPUs did not exhibit a significant difference and are left with the default behavior of realigning based on the destination address (stores). Neoverse N1 (Tested on Graviton 2) name old time/op new time/op delta Memmove/0-4 1.88ns ± 1% 1.87ns ± 1% -0.58% (p=0.020 n=10+10) Memmove/1-4 4.40ns ± 0% 4.40ns ± 0% ~ (all equal) Memmove/8-4 3.88ns ± 3% 3.80ns ± 0% -1.97% (p=0.001 n=10+9) Memmove/16-4 3.90ns ± 3% 3.80ns ± 0% -2.49% (p=0.000 n=10+9) Memmove/32-4 4.80ns ± 0% 4.40ns ± 0% -8.33% (p=0.000 n=9+8) Memmove/64-4 5.86ns ± 0% 5.00ns ± 0% -14.76% (p=0.000 n=8+8) Memmove/128-4 8.46ns ± 0% 8.06ns ± 0% -4.62% (p=0.000 n=10+10) Memmove/256-4 12.4ns ± 0% 12.2ns ± 0% -1.61% (p=0.000 n=10+10) Memmove/512-4 19.5ns ± 0% 19.1ns ± 0% -2.05% (p=0.000 n=10+10) Memmove/1024-4 33.7ns ± 0% 33.5ns ± 0% -0.59% (p=0.000 n=10+10) Memmove/2048-4 62.1ns ± 0% 59.0ns ± 0% -4.99% (p=0.000 n=10+10) Memmove/4096-4 117ns ± 1% 110ns ± 0% -5.66% (p=0.000 n=10+10) MemmoveUnalignedDst/64-4 6.41ns ± 0% 5.62ns ± 0% -12.32% (p=0.000 n=10+7) MemmoveUnalignedDst/128-4 9.40ns ± 0% 8.34ns ± 0% -11.24% (p=0.000 n=10+10) MemmoveUnalignedDst/256-4 12.8ns ± 0% 12.8ns ± 0% ~ (all equal) MemmoveUnalignedDst/512-4 20.4ns ± 0% 19.7ns ± 0% -3.43% (p=0.000 n=9+10) MemmoveUnalignedDst/1024-4 34.1ns ± 0% 35.1ns ± 
0% +2.93% (p=0.000 n=9+9)
MemmoveUnalignedDst/2048-4  61.5ns ± 0%  60.4ns ± 0%  -1.77% (p=0.000 n=10+10)
MemmoveUnalignedDst/4096-4  122ns ± 0%  113ns ± 0%  -7.38% (p=0.002 n=8+10)
MemmoveUnalignedSrc/64-4    7.25ns ± 1%  6.26ns ± 0%  -13.64% (p=0.000 n=9+9)
MemmoveUnalignedSrc/128-4   10.5ns ± 0%  9.7ns ± 0%  -7.52% (p=0.000 n=10+10)
MemmoveUnalignedSrc/256-4   17.1ns ± 0%  17.3ns ± 0%  +1.17% (p=0.000 n=10+10)
MemmoveUnalignedSrc/512-4   27.0ns ± 0%  27.0ns ± 0%  ~ (all equal)
MemmoveUnalignedSrc/1024-4  46.7ns ± 0%  35.7ns ± 0%  -23.55% (p=0.000 n=10+9)
MemmoveUnalignedSrc/2048-4  85.2ns ± 0%  61.2ns ± 0%  -28.17% (p=0.000 n=10+8)
MemmoveUnalignedSrc/4096-4  162ns ± 0%  113ns ± 0%  -30.25% (p=0.000 n=10+10)
name                        old speed  new speed  delta
Memmove/4096-4              35.2GB/s ± 0%  37.1GB/s ± 0%  +5.56% (p=0.000 n=10+9)
MemmoveUnalignedSrc/1024-4  21.9GB/s ± 0%  28.7GB/s ± 0%  +30.90% (p=0.000 n=10+10)
MemmoveUnalignedSrc/2048-4  24.0GB/s ± 0%  33.5GB/s ± 0%  +39.18% (p=0.000 n=10+9)
MemmoveUnalignedSrc/4096-4  25.3GB/s ± 0%  36.2GB/s ± 0%  +43.50% (p=0.000 n=10+7)
Cortex-A72 (Graviton 1)
name                        old time/op  new time/op  delta
Memmove/0-4                 3.06ns ± 3%  3.08ns ± 1%  ~ (p=0.958 n=10+9)
Memmove/1-4                 8.72ns ± 0%  7.85ns ± 0%  -9.98% (p=0.002 n=8+10)
Memmove/8-4                 8.29ns ± 0%  8.29ns ± 0%  ~ (all equal)
Memmove/16-4                8.29ns ± 0%  8.29ns ± 0%  ~ (all equal)
Memmove/32-4                8.19ns ± 2%  8.29ns ± 0%  ~ (p=0.114 n=10+10)
Memmove/64-4                18.3ns ± 4%  10.0ns ± 0%  -45.36% (p=0.000 n=10+10)
Memmove/128-4               14.8ns ± 0%  17.4ns ± 0%  +17.77% (p=0.000 n=10+10)
Memmove/256-4               21.8ns ± 0%  23.1ns ± 0%  +5.96% (p=0.000 n=10+10)
Memmove/512-4               35.8ns ± 0%  37.2ns ± 0%  +3.91% (p=0.000 n=10+10)
Memmove/1024-4              63.7ns ± 0%  67.2ns ± 0%  +5.49% (p=0.000 n=10+10)
Memmove/2048-4              126ns ± 0%  123ns ± 0%  -2.38% (p=0.000 n=10+10)
Memmove/4096-4              238ns ± 1%  243ns ± 1%  +1.93% (p=0.000 n=10+10)
MemmoveUnalignedDst/64-4    19.3ns ± 1%  12.0ns ± 1%  -37.49% (p=0.000 n=10+10)
MemmoveUnalignedDst/128-4   17.2ns ± 0%  17.4ns ± 0%  +1.16% (p=0.000 n=10+10)
MemmoveUnalignedDst/256-4   28.2ns ± 8%  29.2ns ± 0%  ~ (p=0.352 n=10+10)
MemmoveUnalignedDst/512-4   49.8ns ± 3%  48.9ns ± 0%  ~ (p=1.000 n=10+10)
MemmoveUnalignedDst/1024-4  89.5ns ± 0%  80.5ns ± 1%  -10.02% (p=0.000 n=10+10)
MemmoveUnalignedDst/2048-4  180ns ± 0%  127ns ± 0%  -29.44% (p=0.000 n=9+10)
MemmoveUnalignedDst/4096-4  347ns ± 0%  244ns ± 0%  -29.59% (p=0.000 n=10+9)
MemmoveUnalignedSrc/128-4   16.1ns ± 0%  21.8ns ± 0%  +35.40% (p=0.000 n=10+10)
MemmoveUnalignedSrc/256-4   24.9ns ± 8%  26.6ns ± 0%  +6.70% (p=0.015 n=10+10)
MemmoveUnalignedSrc/512-4   39.4ns ± 6%  40.6ns ± 0%  ~ (p=0.352 n=10+10)
MemmoveUnalignedSrc/1024-4  72.5ns ± 0%  83.0ns ± 1%  +14.44% (p=0.000 n=9+10)
MemmoveUnalignedSrc/2048-4  129ns ± 1%  128ns ± 1%  ~ (p=0.179 n=10+10)
MemmoveUnalignedSrc/4096-4  241ns ± 0%  253ns ± 1%  +4.99% (p=0.000 n=9+9)
Cortex-A53 (Raspberry Pi 3)
name                        old time/op  new time/op  delta
Memmove/0-4                 11.0ns ± 0%  11.0ns ± 1%  ~ (p=0.294 n=8+10)
Memmove/1-4                 29.6ns ± 0%  28.0ns ± 1%  -5.41% (p=0.000 n=9+10)
Memmove/8-4                 23.5ns ± 0%  22.1ns ± 0%  -6.11% (p=0.000 n=8+8)
Memmove/16-4                23.7ns ± 1%  22.1ns ± 0%  -6.59% (p=0.000 n=10+8)
Memmove/32-4                27.9ns ± 0%  27.1ns ± 0%  -3.13% (p=0.000 n=8+8)
Memmove/64-4                33.8ns ± 0%  31.5ns ± 1%  -6.99% (p=0.000 n=8+10)
Memmove/128-4               45.6ns ± 0%  44.2ns ± 1%  -3.23% (p=0.000 n=9+10)
Memmove/256-4               69.3ns ± 0%  69.3ns ± 0%  ~ (p=0.072 n=8+8)
Memmove/512-4               127ns ± 0%  110ns ± 0%  -13.39% (p=0.000 n=8+8)
Memmove/1024-4              222ns ± 0%  205ns ± 1%  -7.66% (p=0.000 n=7+10)
Memmove/2048-4              411ns ± 0%  366ns ± 0%  -10.98% (p=0.000 n=8+9)
Memmove/4096-4              795ns ± 1%  695ns ± 1%  -12.63% (p=0.000 n=10+10)
MemmoveUnalignedDst/64-4    44.0ns ± 0%  40.5ns ± 0%  -7.93% (p=0.000 n=8+8)
MemmoveUnalignedDst/128-4   59.6ns ± 0%  54.9ns ± 0%  -7.85% (p=0.000 n=9+9)
MemmoveUnalignedDst/256-4   98.2ns ±11%  90.0ns ± 1%  ~ (p=0.130 n=10+10)
MemmoveUnalignedDst/512-4   161ns ± 2%  145ns ± 1%  -9.96% (p=0.000 n=10+10)
MemmoveUnalignedDst/1024-4  281ns ± 0%  265ns ± 0%  -5.65% (p=0.000 n=9+8)
MemmoveUnalignedDst/2048-4  528ns ± 0%  482ns ± 0%  -8.73% (p=0.000 n=8+9)
MemmoveUnalignedDst/4096-4  1.02µs ± 1%  0.92µs ± 0%  -10.00% (p=0.000 n=10+8)
MemmoveUnalignedSrc/64-4    42.4ns ± 1%  40.5ns ± 0%  -4.39% (p=0.000 n=10+8)
MemmoveUnalignedSrc/128-4   57.4ns ± 0%  57.0ns ± 1%  -0.75% (p=0.048 n=9+10)
MemmoveUnalignedSrc/256-4   88.1ns ± 1%  89.6ns ± 0%  +1.70% (p=0.000 n=9+8)
MemmoveUnalignedSrc/512-4   160ns ± 2%  144ns ± 0%  -9.89% (p=0.000 n=10+8)
MemmoveUnalignedSrc/1024-4  286ns ± 0%  266ns ± 1%  -6.69% (p=0.000 n=8+10)
MemmoveUnalignedSrc/2048-4  525ns ± 0%  483ns ± 1%  -7.96% (p=0.000 n=9+10)
MemmoveUnalignedSrc/4096-4  1.01µs ± 0%  0.92µs ± 1%  -9.40% (p=0.000 n=8+10)
Change-Id: Ia1144e9d4dfafdece6e167c5e576bf80f254c8ab Reviewed-on: https://go-review.googlesource.com/c/go/+/243357 TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com> Reviewed-by: eric fang <eric.fang@arm.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-02internal/cpu: use anonymous struct for CPU feature varsTobias Klauser
Like in x/sys/cpu, use anonymous structs to declare the CPU feature vars instead of defining single-use types. Also, order the vars alphabetically. Change-Id: Iedd3ca51916e3cbb852d2aeed18b3a4c6613e778 Reviewed-on: https://go-review.googlesource.com/c/go/+/221757 Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com>
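The declaration style described in the commit above can be sketched as follows. This is a minimal illustration, not the actual internal/cpu source: the three field names are examples picked for the sketch, and the assignment in main stands in for the package's real feature detection.

```go
package main

import "fmt"

// X86 is declared with an anonymous struct type, so no single-use named
// type is needed. Fields are listed alphabetically, as the commit describes.
// The field set here is illustrative only.
var X86 struct {
	HasAES  bool
	HasAVX2 bool
	HasSSE3 bool
}

func main() {
	// Stand-in for real detection code, which would fill these in at startup.
	X86.HasAVX2 = true
	fmt.Println(X86.HasAES, X86.HasAVX2, X86.HasSSE3)
}
```

Callers read the fields directly (for example `cpu.X86.HasAVX2`), so the variable name alone serves as the namespace.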
2020-02-28internal/cpu: add MIPS64x feature detectionMeng Zhuo
Change-Id: Iacdad1758aa15e4703fccef38c08ecb338b95fd7 Reviewed-on: https://go-review.googlesource.com/c/go/+/200579 Run-TryBot: Meng Zhuo <mengzhuo1203@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-09-08all: fix typosAinar Garipov
Use the following (suboptimal) script to obtain a list of possible typos:
  #!/usr/bin/env sh
  set -x
  git ls-files |\
    grep -e '\.\(c\|cc\|go\)$' |\
    xargs -n 1\
    awk\
    '/\/\// { gsub(/.*\/\//, ""); print; } /\/\*/, /\*\// { gsub(/.*\/\*/, ""); gsub(/\*\/.*/, ""); }' |\
    hunspell -d en_US -l |\
    grep '^[[:upper:]]\{0,1\}[[:lower:]]\{1,\}$' |\
    grep -v -e '^.\{1,4\}$' -e '^.\{16,\}$' |\
    sort -f |\
    uniq -c |\
    awk '$1 == 1 { print $2; }'
Then, go through the results manually and fix the most obvious typos in the non-vendored code. Change-Id: I3cb5830a176850e1a0584b8a40b47bde7b260eae Reviewed-on: https://go-review.googlesource.com/c/go/+/193848 Reviewed-by: Robert Griesemer <gri@golang.org>
2019-04-30internal/cpu: add detection for the new ECDSA and EDDSA capabilities on s390xbill_ofarrell
This CL will check for the Message-Security-Assist Extension 9 facility which enables the KDSA instruction. Change-Id: I659aac09726e0999ec652ef1f5983072c8131a48 Reviewed-on: https://go-review.googlesource.com/c/go/+/174529 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-02-28internal/cpu: change s390x API to match x/sys/cpuMichael Munday
This CL changes the internal/cpu API to more closely match the public version in x/sys/cpu (added in CL 163003). This will make it easier to update the dependencies of vendored code. The most prominent renaming is from VE1 to VXE for the vector-enhancements facility 1. VXE is the mnemonic used for this facility in the HWCAP vector. Change-Id: I922d6c8bb287900a4bd7af70567e22eac567b5c1 Reviewed-on: https://go-review.googlesource.com/c/164437 Reviewed-by: Martin Möhrmann <moehrmann@google.com> Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-12-05crypto/elliptic: utilize faster z14 multiply/square instructions (when available)bill_ofarrell
In the s390x assembly implementation of the NIST P-256 curve, utilize faster multiply/square instructions introduced in the z14. These new instructions are designed for crypto and are constant time. The algorithm is unchanged except for faster multiplication when run on a z14 or later. On z13, the original multiplication (also constant time) is used. P-256 performance is critical in many applications, such as Blockchain.
name            old time     new time     delta
BaseMultP256    24396 ns/op  21564 ns/op  1.13x
ScalarMultP256  87546 ns/op  72813 ns/op  1.20x
Change-Id: I7e6d8b420fac56d5f9cc13c9423e2080df854bac Reviewed-on: https://go-review.googlesource.com/c/146022 Reviewed-by: Michael Munday <mike.munday@ibm.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Michael Munday <mike.munday@ibm.com>
2018-11-14internal/cpu: move GODEBUGCPU options into GODEBUGMartin Möhrmann
Change internal/cpu feature configuration to use GODEBUG=cpu.feature1=value,cpu.feature2=value... instead of GODEBUGCPU=feature1=value,feature2=value... . This is not a backwards compatibility breaking change since GODEBUGCPU was introduced in go1.11 as an undocumented compiler experiment. Fixes #28757 Change-Id: Ib21b3fed2334baeeb061a722ab1eb513d1137e87 Reviewed-on: https://go-review.googlesource.com/c/149578 Run-TryBot: Martin Möhrmann <martisch@uos.de> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
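To make the new syntax concrete, here is a rough sketch of pulling the `cpu.`-prefixed entries out of a GODEBUG-style string. The helper name `cpuOptions` is hypothetical, and the sketch uses the `strings` package for brevity; it is not how internal/cpu itself is written, only an illustration of the format `cpu.feature1=value,cpu.feature2=value`.

```go
package main

import (
	"fmt"
	"strings"
)

// cpuOptions returns the feature settings found in a GODEBUG-style string.
// Entries without the "cpu." prefix (other GODEBUG options) are ignored.
// Hypothetical helper for illustration only.
func cpuOptions(godebug string) map[string]string {
	opts := make(map[string]string)
	for _, field := range strings.Split(godebug, ",") {
		if !strings.HasPrefix(field, "cpu.") {
			continue // some other GODEBUG setting
		}
		name, value, ok := strings.Cut(strings.TrimPrefix(field, "cpu."), "=")
		if !ok || name == "" {
			continue // malformed entry; a real implementation might warn here
		}
		opts[name] = value
	}
	return opts
}

func main() {
	opts := cpuOptions("gctrace=1,cpu.avx2=off,cpu.sse3=on")
	fmt.Println(opts["avx2"], opts["sse3"])
}
```

Sharing one variable means CPU settings travel with every other debug option, at the cost of the extra prefix on each feature name.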
2018-10-29internal/cpu: remove unused and not required ppc64(le) feature detectionMartin Möhrmann
Minimum Go requirement for ppc64(le) architecture support is POWER8. https://github.com/golang/go/wiki/MinimumRequirements#ppc64-big-endian Reduce CPU features supported in internal/cpu to those needed to test minimum requirements and cpu feature kernel support for ppc64(le). Currently no internal/cpu feature variables are used to guard code from using unsupported instructions. The IsPower9 feature variable and detection is kept as it will soon be used to guard code execution. Reducing the set of detected CPU features for ppc64(le) makes implementing Go support for new operating systems easier as CPU feature detection for ppc64(le) needs operating system support (e.g. hwcap on Linux and getsystemcfg syscall on AIX). Change-Id: Ic4c17b31610970e481cd139c657da46507391d1d Reviewed-on: https://go-review.googlesource.com/c/145117 Run-TryBot: Martin Möhrmann <martisch@uos.de> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-24internal/cpu: add options and warnings for required cpu featuresMartin Möhrmann
Updates #27218 Change-Id: I8603f3a639cdd9ee201c4f1566692e5b88877fc4 Reviewed-on: https://go-review.googlesource.com/c/144107 Run-TryBot: Martin Möhrmann <martisch@uos.de> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-15internal/cpu: add invalid option warnings and support to enable cpu featuresMartin Möhrmann
This CL adds the ability to enable the cpu feature FEATURE by specifying FEATURE=on in GODEBUGCPU. Syntax support to enable cpu features is useful in combination with a preceding all=off to disable all but some specific cpu features. Example: GODEBUGCPU=all=off,sse3=on This CL implements printing of warnings for invalid GODEBUGCPU settings: - requests enabling features that are not supported with the current CPU - specifying values different than 'on' or 'off' for a feature - settings for unknown cpu feature names Updates #27218 Change-Id: Ic13e5c4c35426a390c50eaa4bd2a408ef2ee21be Reviewed-on: https://go-review.googlesource.com/c/141800 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
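The behaviour described in the commit above (left-to-right processing, `all=off` followed by selective `on`, and the three kinds of warning) might look roughly like the sketch below. The `feature` type, the `apply` function, and the warning texts are all assumptions made for this illustration; the real internal/cpu code is structured differently.

```go
package main

import (
	"fmt"
	"strings"
)

// feature is a hypothetical record of one CPU capability.
type feature struct {
	supported bool // detected on the current CPU
	enabled   bool // currently allowed to be used
}

// apply processes a setting such as "all=off,sse3=on" from left to right
// and returns one warning per invalid entry.
func apply(features map[string]*feature, setting string) []string {
	var warnings []string
	for _, field := range strings.Split(setting, ",") {
		name, value, ok := strings.Cut(field, "=")
		if !ok || (value != "on" && value != "off") {
			warnings = append(warnings, fmt.Sprintf("%q: value must be on or off", field))
			continue
		}
		enable := value == "on"
		if name == "all" {
			for _, f := range features {
				f.enabled = enable && f.supported
			}
			continue
		}
		f, known := features[name]
		switch {
		case !known:
			warnings = append(warnings, fmt.Sprintf("unknown cpu feature %q", name))
		case enable && !f.supported:
			warnings = append(warnings, fmt.Sprintf("cannot enable %q: not supported by this CPU", name))
		default:
			f.enabled = enable
		}
	}
	return warnings
}

func main() {
	features := map[string]*feature{
		"sse3": {supported: true, enabled: true},
		"avx2": {supported: false},
	}
	for _, w := range apply(features, "all=off,sse3=on,avx2=on,foo=off,sse3=maybe") {
		fmt.Println("warning:", w)
	}
	fmt.Println("sse3 enabled:", features["sse3"].enabled)
}
```

Because entries are applied in order, a later `sse3=on` overrides an earlier `all=off`, which is what makes the "disable everything except a few features" pattern work.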
2018-10-15internal/cpu: expose ARM feature flags for FMAAkhil Indurti
This change exposes feature flags needed to implement an FMA intrinsic on ARM CPUs via auxv's HWCAP bits. Specifically, it exposes HasVFPv4 to detect if an ARM processor has the fourth version of the vector floating point unit. The relevant instruction for this CL is VFMA, emitted in Go as FMULAD. Updates #26630. Change-Id: Ibbc04fb24c2b4d994f93762360f1a37bc6d83ff7 Reviewed-on: https://go-review.googlesource.com/c/126315 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com>
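Detecting a feature from HWCAP comes down to testing one bit of the word the kernel passes in the auxiliary vector. The sketch below assumes the VFPv4 bit is `1 << 16`, which is the value in the 32-bit ARM Linux hwcap header as far as I know; treat both the constant and the function name `hasVFPv4` as assumptions of this example rather than the package's actual code.

```go
package main

import "fmt"

// hwcapVFPv4 is the assumed position of the VFPv4 flag in the ARM Linux
// HWCAP word (1 << 16). Verify against the kernel headers before relying on it.
const hwcapVFPv4 = 1 << 16

// hasVFPv4 reports whether the given HWCAP word advertises VFPv4.
// Hypothetical helper; the real package stores the result in a feature variable.
func hasVFPv4(hwcap uint) bool {
	return hwcap&hwcapVFPv4 != 0
}

func main() {
	// Two made-up HWCAP values: one with bit 16 set, one without.
	fmt.Println(hasVFPv4(1<<16|1<<12), hasVFPv4(1<<12))
}
```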
2018-10-12internal/cpu: use 'off' for disabling cpu capabilities instead of '0'Martin Möhrmann
Updates #27218 Change-Id: I4ce20376fd601b5f958d79014af7eaf89e9de613 Reviewed-on: https://go-review.googlesource.com/c/141818 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-12internal/cpu: enable support for GODEBUGCPU in non-experimental buildsMartin Möhrmann
Enabling GODEBUGCPU without the need to set GOEXPERIMENT=debugcpu enables trybots and builders to run tests for GODEBUGCPU features in upcoming CLs that will implement the new syntax and features for non-experimental GODEBUGCPU support from proposal golang.org/issue/27218. Updates #27218 Change-Id: Icc69e51e736711a86b02b46bd441ffc28423beba Reviewed-on: https://go-review.googlesource.com/c/141817 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-08-24runtime: replace sys.CacheLineSize by corresponding internal/cpu const and varsMartin Möhrmann
sys here is runtime/internal/sys. Replace uses of sys.CacheLineSize for padding by cpu.CacheLinePad or cpu.CacheLinePadSize. Replace other uses of sys.CacheLineSize by cpu.CacheLineSize. Remove now unused sys.CacheLineSize. Updates #25203 Change-Id: I1daf410fe8f6c0493471c2ceccb9ca0a5a75ed8f Reviewed-on: https://go-review.googlesource.com/126601 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-08-24internal/cpu: add a CacheLinePadSize constantMartin Möhrmann
The new constant CacheLinePadSize can be used to compute best effort alignment of structs to cache lines. e.g. the runtime can use this in the locktab definition:
var locktab [57]struct {
	l   spinlock
	pad [cpu.CacheLinePadSize - unsafe.Sizeof(spinlock{})]byte
}
Change-Id: I86f6fbfc5ee7436f742776a7d4a99a1d54ffccc8 Reviewed-on: https://go-review.googlesource.com/131237 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>