14313-benchmark-format: add unit metadata

The predefined keys differ slightly from what I proposed in
golang/go#43744. I tried specifying {higher,lower}={better,worse} like I
originally proposed and it just got really messy. Turning it around to
better={higher,lower} means it's somewhat backwards from what you might
expect from English phrasing, but it lets us use a single key because I
don't think anyone is going to accidentally write worse={higher,lower},
and this avoids any annoying questions about what happens if a user
specifies both "higher" and "lower".

For golang/go#43744.

Change-Id: I895914b179c291003e76f897cabbcbdb2381f163
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/357530
Reviewed-by: Michael Knyszek <mknyszek@google.com>
author: Austin Clements <austin@google.com> 2021-10-20 11:50:19 -0400
committer: Austin Clements <austin@google.com> 2024-10-29 18:08:10 +0000
commit: 986bcc14ca19d030da571af9a83058025f250254 (patch)
tree: 5b60eac9a0b835860182e8b05d2aeffc51298786
parent: 1dd567da00eec3ccfbcbafb7a04f4297e7a69b48 (diff)
download: go-x-proposal-986bcc14ca19d030da571af9a83058025f250254.tar.xz
1 files changed, 46 insertions, 3 deletions
diff --git a/design/14313-benchmark-format.md b/design/14313-benchmark-format.md
index 3fb3ef7..45944bd 100644
--- a/design/14313-benchmark-format.md
+++ b/design/14313-benchmark-format.md
@@ -89,7 +89,7 @@ the need to process custom output formats in future benchmarks.
 ## Proposal
 
 A Go benchmark data file is a UTF-8 textual file consisting of a sequence of lines.
-Configuration lines and benchmark result lines, described below,
+Configuration lines, benchmark result lines, and unit metadata lines, described below,
 have semantic meaning in the reporting of benchmark results.
 
 All other lines in the data file, including but not limited to
@@ -150,7 +150,7 @@ In the example, the CPU cost is reported per-operation and the
 throughput is reported per-second; neither is a total that
 depends on the number of iterations.
 
-### Value Units
+#### Value Units
 
 A value's unit string is expected to specify not only the measurement unit
 but also, as needed, a description of what is being measured.
@@ -167,7 +167,7 @@ and rescale known measurement units.
 For example, consistently large “ns/op” or “L1-miss-ns/op”
 might be rescaled to “ms/op” or “L1-miss-ms/op” for display.
 
-### Benchmark Name Configuration
+#### Benchmark Name Configuration
 
 In the current testing package, benchmark names correspond to Go identifiers:
 each benchmark must be written as a different Go function.
@@ -184,6 +184,49 @@ that slash-prefixed key=value pairs in the benchmark name are
 treated by benchmark data processors as per-benchmark 
 configuration values.
 
+### Unit metadata
+
+When a benchmark reports units outside the standard units implemented
+by the testing package, it can be useful for tools to understand
+additional metadata about those units.
+
+A unit metadata line has the form
+
+	Unit <unit> <key>=<value> <key>=<value> ...
+
+The fields are separated by runs of space characters (as defined by
+`unicode.IsSpace`), and space characters are not allowed within unit,
+key, or value.
+Keys must not contain `=`.
+
+It is an error to specify different values for any given unit and key,
+even on different unit metadata lines.
+That is, once unit metadata is specified, it can't be overridden.
+Specifying the same value for a key multiple times is not an error.
+
+Unit metadata applies to all following benchmark result lines, though
+it is unspecified whether it applies to earlier benchmark results
+lines.
+This allows for stream-oriented processing of benchmark results.
+
+Keys are not constrained, but the following keys have predefined
+meanings:
+
+- `better={higher,lower}` indicates whether higher or lower values of
+  this unit are better (indicate an improvement).
+  By default, ns/op, B/op, and allocs/op are `better=lower`, and MB/s
+  is `better=higher`.
+  Other units do not assume a default.
+
+- `assume={nothing,exact}` indicates what statistical assumption to
+  make when considering distributions of values.
+  `nothing` means to make no statistical assumptions (e.g., use
+  non-parametric methods) and `exact` means to assume measurements are
+  exact (repeated measurement does not increase confidence).
+  The default is `nothing`.
+  In the future we may also support `normal`, but that's almost never
+  the right assumption for benchmarks.
+
 ### Example
 
 The benchmark output given in the background section above
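
For readers who want to see how the unit metadata rules in this change could be consumed, here is a minimal sketch in Go. It is not taken from the proposal or from any existing tool: the `UnitMeta` type and its `ParseUnitLine` and `Better` methods are hypothetical names chosen for illustration. The sketch assumes that a line with fewer than two fields is not a unit metadata line and that a field without `=` is an error, neither of which the text above specifies.

```go
package main

import (
	"fmt"
	"strings"
)

// UnitMeta is a hypothetical store of unit metadata: unit -> key -> value.
type UnitMeta map[string]map[string]string

// ParseUnitLine records the metadata from one "Unit <unit> <key>=<value> ..."
// line. ok is false if the line is not a unit metadata line at all.
func (m UnitMeta) ParseUnitLine(line string) (ok bool, err error) {
	// strings.Fields splits on runs of unicode.IsSpace characters,
	// which matches the field separator described in the proposal.
	fields := strings.Fields(line)
	if len(fields) < 2 || fields[0] != "Unit" {
		return false, nil
	}
	unit := fields[1]
	for _, f := range fields[2:] {
		// Cutting at the first "=" keeps "=" out of the key.
		key, value, found := strings.Cut(f, "=")
		if !found || key == "" {
			// Assumption: malformed fields are reported as errors.
			return true, fmt.Errorf("unit %s: malformed field %q", unit, f)
		}
		// A different value for the same unit and key is an error;
		// repeating the same value is allowed.
		if old, seen := m[unit][key]; seen && old != value {
			return true, fmt.Errorf("unit %s: %s=%s conflicts with earlier %s=%s",
				unit, key, value, key, old)
		}
		if m[unit] == nil {
			m[unit] = make(map[string]string)
		}
		m[unit][key] = value
	}
	return true, nil
}

// Better returns "higher", "lower", or "" (no default) for a unit,
// falling back to the defaults listed for the predefined "better" key.
func (m UnitMeta) Better(unit string) string {
	if v, ok := m[unit]["better"]; ok {
		return v
	}
	switch unit {
	case "ns/op", "B/op", "allocs/op":
		return "lower"
	case "MB/s":
		return "higher"
	}
	return ""
}

func main() {
	m := UnitMeta{}
	lines := []string{
		"Unit widgets/sec better=higher assume=exact",
		"Unit widgets/sec better=higher", // same value again: fine
		"Unit widgets/sec better=lower",  // different value: error
	}
	for _, l := range lines {
		if _, err := m.ParseUnitLine(l); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println("widgets/sec:", m.Better("widgets/sec"))
	fmt.Println("ns/op:", m.Better("ns/op"))
}
```

Because the store only ever rejects conflicting values and never overwrites, a processor built this way can read a file as a stream, which is the property the "can't be overridden" rule is meant to preserve.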