| author | smasher164 <aindurti@gmail.com> | 2018-09-25 03:10:33 -0400 |
|---|---|---|
| committer | Keith Randall <khr@golang.org> | 2019-10-21 16:42:10 +0000 |
| commit | 7a6da218b191de13f4f3555c55aab958b09b66bd (patch) | |
| tree | 6f324979a21514735e80b78bb9ce3c5ad64ea72e /src/runtime | |
| parent | 50f4896b72d16b6538178c8ca851b20655075b7f (diff) | |
cmd/compile: add fma intrinsic for amd64
To permit ssa-level optimization, this change introduces an amd64 intrinsic
that generates the VFMADD231SD instruction for the fused-multiply-add
operation on systems that support it. System support is detected via
cpu.X86.HasFMA. A rewrite rule can then translate the generic ssa intrinsic
("Fma") to VFMADD231SD.
The benchmark compares the software implementation (old) with the intrinsic
(new).
| name | old time/op | new time/op | delta |
|---|---|---|---|
| Fma-4 | 27.2ns ± 1% | 1.0ns ± 9% | -96.48% (p=0.008 n=5+5) |
Updates #25819.
Change-Id: I966655e5f96817a5d06dff5942418a3915b09584
Reviewed-on: https://go-review.googlesource.com/c/go/+/137156
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Diffstat (limited to 'src/runtime')
| -rw-r--r-- | src/runtime/cpuflags.go | 1 |
| -rw-r--r-- | src/runtime/proc.go | 1 |
2 files changed, 2 insertions, 0 deletions
diff --git a/src/runtime/cpuflags.go b/src/runtime/cpuflags.go
index 1565afb93a..3e859a3516 100644
--- a/src/runtime/cpuflags.go
+++ b/src/runtime/cpuflags.go
@@ -23,6 +23,7 @@ var (
 	// TODO: deprecate these; use internal/cpu directly.
 	x86HasPOPCNT bool
 	x86HasSSE41  bool
+	x86HasFMA    bool
 
 	arm64HasATOMICS bool
 )
diff --git a/src/runtime/proc.go b/src/runtime/proc.go
index d7f55b6c64..c419dee771 100644
--- a/src/runtime/proc.go
+++ b/src/runtime/proc.go
@@ -514,6 +514,7 @@ func cpuinit() {
 	// to guard execution of instructions that can not be assumed to be always supported.
 	x86HasPOPCNT = cpu.X86.HasPOPCNT
 	x86HasSSE41 = cpu.X86.HasSSE41
+	x86HasFMA = cpu.X86.HasFMA
 
 	arm64HasATOMICS = cpu.ARM64.HasATOMICS
 }
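For readers unfamiliar with the operation being accelerated, the following is a minimal sketch of what "fused" means in practice. It assumes Go 1.14 or later, where the standard library exposes fused multiply-add as `math.FMA`. That name does not appear in this commit and is used here only for illustration, and the helper `compare` is hypothetical. The sketch shows that a fused `x*y + z` rounds once, while the separately rounded form can lose the low-order bits of the product. Whether the call compiles to VFMADD231SD or falls back to the software path depends on the CPU, but the numeric result should be the same either way.

```go
package main

import (
	"fmt"
	"math"
)

// compare is a hypothetical helper (not part of the commit). It evaluates
// x*y + z twice: once with a single rounding via math.FMA, and once with
// the product rounded to float64 before the addition.
func compare() (fused, unfused float64) {
	eps := math.Ldexp(1, -30) // 2^-30, exactly representable
	x := 1 + eps
	y := 1 - eps
	z := -1.0

	// The exact product is 1 - 2^-60, which does not fit in a float64
	// and rounds to 1 when computed on its own.
	fused = math.FMA(x, y, z)

	// The explicit float64 conversion forces the product to be rounded
	// first, which prevents the compiler from fusing the two operations.
	unfused = float64(x*y) + z
	return fused, unfused
}

func main() {
	fused, unfused := compare()
	fmt.Println("fused:  ", fused)   // -2^-60: the low bits of the product survive
	fmt.Println("unfused:", unfused) // 0: the product was rounded to 1 first
}
```

The explicit `float64(...)` conversion matters because the Go specification allows an implementation to fuse floating-point operations on its own. Without it, some architectures could compute both expressions identically.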
