| author | Wayne Zuo <wdvxdr@golangcn.org> | 2022-03-30 21:44:44 +0800 |
|---|---|---|
| committer | Emmanuel Odeke <emmanuel@orijtech.com> | 2022-04-04 04:01:17 +0000 |
| commit | a92ca515077e5cf54673eb8c5c2d9db4824330db (patch) | |
| tree | 7dc63db107f5cef14d2819196e7a85b082bc2dc3 /src/os/exec/exec_windows_test.go | |
| parent | ba6df85c7c94c7b26d4979e92fdb9ec7fa4cc1e4 (diff) | |
cmd/compile: use LZCNT instruction for GOAMD64>=3
LZCNT is similar to BSR, but the result of BSR(x) is undefined when
x == 0, so using LZCNT avoids a special case for zero input. For nonzero x,
LZCNTQ(x) == 63-BSRQ(x) and LZCNTL(x) == 31-BSRL(x).
According to https://www.agner.org/optimize/instruction_tables.pdf,
LZCNT instructions are much faster than BSR on AMD CPUs.
name              old time/op  new time/op  delta
LeadingZeros-8    0.91ns ± 1%  0.80ns ± 7%  -11.68%  (p=0.000 n=9+9)
LeadingZeros8-8   0.98ns ±15%  0.91ns ± 1%   -7.34%  (p=0.000 n=9+9)
LeadingZeros16-8  0.94ns ± 3%  0.92ns ± 2%   -2.36%  (p=0.001 n=10+10)
LeadingZeros32-8  0.89ns ± 1%  0.78ns ± 2%  -12.49%  (p=0.000 n=10+10)
LeadingZeros64-8  0.92ns ± 1%  0.78ns ± 1%  -14.48%  (p=0.000 n=10+10)
Change-Id: I125147fe3d6994a4cfe558432780408e9a27557a
Reviewed-on: https://go-review.googlesource.com/c/go/+/396794
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
Run-TryBot: Emmanuel Odeke <emmanuel@orijtech.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Diffstat (limited to 'src/os/exec/exec_windows_test.go')
0 files changed, 0 insertions, 0 deletions
