From 986bcc14ca19d030da571af9a83058025f250254 Mon Sep 17 00:00:00 2001
From: Austin Clements
Date: Wed, 20 Oct 2021 11:50:19 -0400
Subject: 14313-benchmark-format: add unit metadata

The predefined keys differ slightly from what I proposed in
golang/go#43744. I tried specifying {higher,lower}={better,worse}
like I originally proposed and it just got really messy. Turning it
around to better={higher,lower} means it's somewhat backwards from
what you might expect from English phrasing, but it lets us use a
single key because I don't think anyone is going to accidentally write
worse={higher,lower}, and this avoids any annoying questions about
what happens if a user specifies both "higher" and "lower".

For golang/go#43744.

Change-Id: I895914b179c291003e76f897cabbcbdb2381f163
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/357530
Reviewed-by: Michael Knyszek
---
 design/14313-benchmark-format.md | 49 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 46 insertions(+), 3 deletions(-)

diff --git a/design/14313-benchmark-format.md b/design/14313-benchmark-format.md
index 3fb3ef7..45944bd 100644
--- a/design/14313-benchmark-format.md
+++ b/design/14313-benchmark-format.md
@@ -89,7 +89,7 @@ the need to process custom output formats in future benchmarks.
 ## Proposal
 
 A Go benchmark data file is a UTF-8 textual file consisting of a sequence of lines.
-Configuration lines and benchmark result lines, described below,
+Configuration lines, benchmark result lines, and unit metadata lines, described below,
 have semantic meaning in the reporting of benchmark results.
 
 All other lines in the data file, including but not limited to
@@ -150,7 +150,7 @@ In the example, the CPU cost is reported per-operation
 and the throughput is reported per-second;
 neither is a total that depends on the number of iterations.
 
-### Value Units
+#### Value Units
 
 A value's unit string is expected to specify not only the measurement unit
 but also, as needed, a description of what is being measured.
@@ -167,7 +167,7 @@ and rescale known measurement units.
 For example, consistently large “ns/op” or “L1-miss-ns/op”
 might be rescaled to “ms/op” or “L1-miss-ms/op” for display.
 
-### Benchmark Name Configuration
+#### Benchmark Name Configuration
 
 In the current testing package, benchmark names correspond to Go identifiers:
 each benchmark must be written as a different Go function.
@@ -184,6 +184,49 @@ that slash-prefixed key=value pairs in the benchmark name are
 treated by benchmark data processors as per-benchmark
 configuration values.
 
+### Unit metadata
+
+When a benchmark reports units outside the standard units implemented
+by the testing package, it can be useful for tools to understand
+additional metadata about those units.
+
+A unit metadata line has the form
+
+    Unit <unit> <key>=<value> <key>=<value> ...
+
+The fields are separated by runs of space characters (as defined by
+`unicode.IsSpace`), and space characters are not allowed within unit,
+key, or value.
+Keys must not contain `=`.
+
+It is an error to specify different values for any given unit and key,
+even on different unit metadata lines.
+That is, once unit metadata is specified, it can't be overridden.
+Specifying the same value for a key multiple times is not an error.
+
+Unit metadata applies to all following benchmark result lines, though
+it is unspecified whether it applies to earlier benchmark result
+lines.
+This allows for stream-oriented processing of benchmark results.
+
+Keys are not constrained, but the following keys have predefined
+meanings:
+
+- `better={higher,lower}` indicates whether higher or lower values of
+  this unit are better (indicate an improvement).
+  By default, ns/op, B/op, and allocs/op are `better=lower`, and MB/s
+  is `better=higher`.
+  Other units do not assume a default.
+
+- `assume={nothing,exact}` indicates what statistical assumption to
+  make when considering distributions of values.
+  `nothing` means to make no statistical assumptions (e.g., use
+  non-parametric methods) and `exact` means to assume measurements are
+  exact (repeated measurement does not increase confidence).
+  The default is `nothing`.
+  In the future we may also support `normal`, but that's almost never
+  the right assumption for benchmarks.
+
 ### Example
 
 The benchmark output given in the background section above
-- 
cgit v1.3
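
The unit metadata rules added by this patch (fields separated by runs of space characters, keys without `=`, conflicting values treated as an error, repeated identical values allowed) can be illustrated with a small Go sketch. This is only one possible reading of the text, not the parser used by any existing tool: the `UnitMetadata` type and its `Add` method are hypothetical names, and the choice to treat a `Unit` line with no `key=value` field as an ordinary ignored line is an assumption the proposal text does not settle.

```go
package main

import (
	"fmt"
	"strings"
)

// UnitMetadata maps unit -> key -> value.
// The type and method names are hypothetical, chosen for this sketch.
type UnitMetadata map[string]map[string]string

// Add parses one input line. It reports whether the line was a unit
// metadata line and returns an error for malformed or conflicting metadata.
func (m UnitMetadata) Add(line string) (bool, error) {
	// strings.Fields splits on runs of unicode.IsSpace characters.
	fields := strings.Fields(line)
	// Assumption: a metadata line needs "Unit", a unit, and at least
	// one key=value field; anything else is treated as another kind of line.
	if len(fields) < 3 || fields[0] != "Unit" {
		return false, nil
	}
	unit := fields[1]
	for _, f := range fields[2:] {
		// Keys must not contain '=', so the first '=' ends the key.
		key, value, ok := strings.Cut(f, "=")
		if !ok || key == "" {
			return true, fmt.Errorf("unit %s: malformed pair %q", unit, f)
		}
		// Reading from a nil inner map is safe and yields seen == false.
		if old, seen := m[unit][key]; seen && old != value {
			return true, fmt.Errorf("unit %s: %s=%s conflicts with earlier %s=%s",
				unit, key, value, key, old)
		}
		if m[unit] == nil {
			m[unit] = make(map[string]string)
		}
		m[unit][key] = value
	}
	return true, nil
}

func main() {
	m := UnitMetadata{}
	for _, line := range []string{
		"Unit widgets/sec better=higher assume=nothing",
		"Unit widgets/sec better=higher", // repeating the same value is allowed
		"Unit widgets/sec better=lower",  // a different value is an error
	} {
		_, err := m.Add(line)
		fmt.Println(m["widgets/sec"], err)
	}
}
```

Because each line is checked only against metadata already seen, the sketch works in a single streaming pass, which matches the stream-oriented processing the patch calls out. Fields on a line are applied in order, so a line that fails partway through leaves its earlier pairs recorded; a stricter tool might validate the whole line first.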