From 986bcc14ca19d030da571af9a83058025f250254 Mon Sep 17 00:00:00 2001
From: Austin Clements <austin@google.com>
Date: Wed, 20 Oct 2021 11:50:19 -0400
Subject: 14313-benchmark-format: add unit metadata

The predefined keys differ slightly from what I proposed in golang/go#43744. I
tried specifying {higher,lower}={better,worse} like I originally
proposed and it just got really messy. Turning it around to
better={higher,lower} means it's somewhat backwards from what you might
expect from English phrasing, but it lets us use a single key because
I don't think anyone is going to accidentally write
worse={higher,lower}, and this avoids any annoying questions about
what happens if a user specifies both "higher" and "lower".

For golang/go#43744.

Change-Id: I895914b179c291003e76f897cabbcbdb2381f163
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/357530
Reviewed-by: Michael Knyszek <mknyszek@google.com>
---
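Note, not part of the change itself: the rules this patch adds for unit
metadata lines (whitespace-separated fields, key=value pairs split at
the first "=", no overriding of an existing value) can be sketched in
Go roughly as follows. The names UnitMetadata and Add are hypothetical
and exist only for illustration. Treating only lines that start with
"Unit" in column 0 and name a unit as metadata lines, and accepting an
empty value, are assumptions the proposal text does not settle.

```go
package main

import (
	"fmt"
	"strings"
)

// UnitMetadata maps unit -> key -> value. Hypothetical type, for illustration.
type UnitMetadata map[string]map[string]string

// Add parses one line. It reports whether the line looked like a unit
// metadata line, and returns an error if the line is malformed or tries
// to change a value that was already recorded for the same unit and key.
func (m UnitMetadata) Add(line string) (bool, error) {
	// strings.Fields splits around runs of unicode.IsSpace characters.
	fields := strings.Fields(line)
	if !strings.HasPrefix(line, "Unit") || len(fields) < 2 || fields[0] != "Unit" {
		return false, nil // assumption: anything else is not unit metadata
	}
	unit := fields[1]

	// Stage the pairs first so a bad line leaves m unchanged.
	staged := map[string]string{}
	for _, f := range fields[2:] {
		// Keys must not contain '=', so the first '=' ends the key.
		i := strings.IndexByte(f, '=')
		if i <= 0 {
			return true, fmt.Errorf("unit %q: malformed field %q, want key=value", unit, f)
		}
		key, val := f[:i], f[i+1:]
		prev, seen := staged[key]
		if !seen {
			prev, seen = m[unit][key] // reading a nil inner map is safe
		}
		if seen && prev != val {
			return true, fmt.Errorf("unit %q: key %q already set to %q, cannot set %q", unit, key, prev, val)
		}
		staged[key] = val
	}

	if m[unit] == nil {
		m[unit] = map[string]string{}
	}
	for k, v := range staged {
		m[unit][k] = v
	}
	return true, nil
}
```

A stream-oriented reader could call Add on every input line and consult
the map when it reaches each benchmark result line. That matches the
"applies to all following result lines" wording without committing to
any behavior for earlier ones.
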
 design/14313-benchmark-format.md | 49 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 46 insertions(+), 3 deletions(-)
diff --git a/design/14313-benchmark-format.md b/design/14313-benchmark-format.md
index 3fb3ef7..45944bd 100644
--- a/design/14313-benchmark-format.md
+++ b/design/14313-benchmark-format.md
@@ -89,7 +89,7 @@ the need to process custom output formats in future benchmarks.
 ## Proposal
 
 A Go benchmark data file is a UTF-8 textual file consisting of a sequence of lines.
-Configuration lines and benchmark result lines, described below,
+Configuration lines, benchmark result lines, and unit metadata lines, described below,
 have semantic meaning in the reporting of benchmark results.
 
 All other lines in the data file, including but not limited to
@@ -150,7 +150,7 @@ In the example, the CPU cost is reported per-operation and the
 throughput is reported per-second; neither is a total that
 depends on the number of iterations.
 
-### Value Units
+#### Value Units
 
 A value's unit string is expected to specify not only the measurement unit
 but also, as needed, a description of what is being measured.
@@ -167,7 +167,7 @@ and rescale known measurement units.
 For example, consistently large “ns/op” or “L1-miss-ns/op”
 might be rescaled to “ms/op” or “L1-miss-ms/op” for display.
 
-### Benchmark Name Configuration
+#### Benchmark Name Configuration
 
 In the current testing package, benchmark names correspond to Go identifiers:
 each benchmark must be written as a different Go function.
@@ -184,6 +184,49 @@ that slash-prefixed key=value pairs in the benchmark name are
 treated by benchmark data processors as per-benchmark 
 configuration values.
 
+### Unit Metadata
+
+When a benchmark reports units outside the standard units implemented
+by the testing package, it can be useful for tools to understand
+additional metadata about those units.
+
+A unit metadata line has the form
+
+	Unit <unit> <key>=<value> <key>=<value> ...
+
+The fields are separated by runs of space characters (as defined by
+`unicode.IsSpace`), and space characters are not allowed within unit,
+key, or value.
+Keys must not contain `=`.
+
+It is an error to specify different values for any given unit and key,
+even on different unit metadata lines.
+That is, once unit metadata is specified, it can't be overridden.
+Specifying the same value for a key multiple times is not an error.
+
+Unit metadata applies to all following benchmark result lines, though
+it is unspecified whether it applies to earlier benchmark result
+lines.
+This allows for stream-oriented processing of benchmark results.
+
+Keys are not constrained, but the following keys have predefined
+meanings:
+
+- `better={higher,lower}` indicates whether higher or lower values of
+  this unit are better (indicate an improvement).
+  By default, ns/op, B/op, and allocs/op are `better=lower`, and MB/s
+  is `better=higher`.
+  Other units do not assume a default.
+
+- `assume={nothing,exact}` indicates what statistical assumption to
+  make when considering distributions of values.
+  `nothing` means to make no statistical assumptions (e.g., use
+  non-parametric methods) and `exact` means to assume measurements are
+  exact (repeated measurement does not increase confidence).
+  The default is `nothing`.
+  In the future we may also support `normal`, but that's almost never
+  the right assumption for benchmarks.
+
 ### Example
 
 The benchmark output given in the background section above
-- 
cgit v1.3