cmd/compile, runtime: intrinsify atomic And8 and Or8 on s390x - go - Fork of Go programming language with my patches.

author	Michael Munday <mike.munday@ibm.com>	2019-10-23 06:43:23 -0700
committer	Brad Fitzpatrick <bradfitz@golang.org>	2019-11-11 15:23:59 +0000
commit	b3885dbc93ceae1b12f7e80edd2696baf566edec (patch)
tree	c7f9c66dbe89b891b512347b0e56cf69d253a387 /src/encoding
parent	75c839af22a50cb027766ea54335e234dac32836 (diff)
download	go-b3885dbc93ceae1b12f7e80edd2696baf566edec.tar.xz

cmd/compile, runtime: intrinsify atomic And8 and Or8 on s390x

Intrinsify these functions to match other platforms. Update the
sequence of instructions used in the assembly implementations to
match the intrinsics.

Also, add a micro benchmark so we can more easily measure the
performance of these two functions:

name            old time/op  new time/op  delta
And8-8          5.33ns ± 7%  2.55ns ± 8%  -52.12%  (p=0.000 n=20+20)
And8Parallel-8  7.39ns ± 5%  3.74ns ± 4%  -49.34%  (p=0.000 n=20+20)
Or8-8           4.84ns ±15%  2.64ns ±11%  -45.50%  (p=0.000 n=20+20)
Or8Parallel-8   7.27ns ± 3%  3.84ns ± 4%  -47.10%  (p=0.000 n=19+20)

By using a 'rotate then xor selected bits' instruction combined with
either a 'load and and' or a 'load and or' instruction we can
implement And8 and Or8 with far fewer instructions. Replacing
'compare and swap' with atomic instructions may also improve
performance when there is contention.

Change-Id: I28bb8032052b73ae8ccdf6e4c612d2877085fa01
Reviewed-on: https://go-review.googlesource.com/c/go/+/204277
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
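The idea of performing an 8-bit And or Or through a 32-bit atomic operation can be illustrated with a small, portable Go sketch. This is not the code from this change: the names widenOr, widenAnd, or8 and and8 are hypothetical, the byte is addressed by an explicit index within a word instead of by pointer arithmetic, and a compare-and-swap loop stands in for the single 'load and or' / 'load and and' instruction so that the example runs on any platform. It only shows how the 8-bit operand might be widened into a 32-bit mask that leaves the other three bytes untouched.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// widenOr places val in byte byteIdx of a 32-bit mask, with zeros in the
// other bytes, so that OR-ing the mask in leaves those bytes unchanged.
// Byte 0 is the most significant byte, as on a big-endian machine.
func widenOr(val uint8, byteIdx uint) uint32 {
	shift := (3 - byteIdx%4) * 8
	return uint32(val) << shift
}

// widenAnd places val in byte byteIdx of a 32-bit mask, with ones in the
// other bytes, so that AND-ing with the mask leaves those bytes unchanged.
func widenAnd(val uint8, byteIdx uint) uint32 {
	shift := (3 - byteIdx%4) * 8
	return uint32(val)<<shift | ^(uint32(0xff) << shift)
}

// or8 ORs val into one byte of *word. The CAS loop is a portable stand-in
// (an assumption of this sketch) for a single atomic 32-bit OR instruction.
func or8(word *uint32, byteIdx uint, val uint8) {
	mask := widenOr(val, byteIdx)
	for {
		old := atomic.LoadUint32(word)
		if atomic.CompareAndSwapUint32(word, old, old|mask) {
			return
		}
	}
}

// and8 ANDs val into one byte of *word, using the same stand-in loop.
func and8(word *uint32, byteIdx uint, val uint8) {
	mask := widenAnd(val, byteIdx)
	for {
		old := atomic.LoadUint32(word)
		if atomic.CompareAndSwapUint32(word, old, old&mask) {
			return
		}
	}
}

func main() {
	w := uint32(0x11223344)
	or8(&w, 1, 0x0f)  // byte 1: 0x22 | 0x0f
	and8(&w, 3, 0xf0) // byte 3: 0x44 & 0xf0
	fmt.Printf("%#x\n", w)
}
```

Once the operand has been widened this way, the whole update is one 32-bit atomic operation with no retry loop, which is consistent with the message's expectation that avoiding 'compare and swap' may help under contention.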

Diffstat (limited to 'src/encoding')

0 files changed, 0 insertions, 0 deletions
