| author | Katie Hockman <katie@golang.org> | 2020-07-20 17:32:30 -0400 |
|---|---|---|
| committer | Katie Hockman <katie@golang.org> | 2020-07-21 20:06:32 +0000 |
| commit | 1b90177fe4e9f9f449760eea4fa7128521b09ab5 (patch) | |
| tree | 6e203fd2b77178b41ec74df6bf7dfd46c72a7f22 | |
| parent | baec2fd7b17109e7c17ed6f9f4fcdf1ccacd5a09 (diff) | |
| download | go-x-proposal-1b90177fe4e9f9f449760eea4fa7128521b09ab5.tar.xz | |
design/40307-fuzzing.md: add design draft
Change-Id: Idfd921426537e7fe2368b522f8df9d0cec1bae25
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/243947
Reviewed-by: Filippo Valsorda <filippo@golang.org>
| -rw-r--r-- | design/40307-fuzzing.md | 594 |
1 files changed, 594 insertions, 0 deletions
diff --git a/design/40307-fuzzing.md b/design/40307-fuzzing.md
new file mode 100644
index 0000000..cb0ad6d
--- /dev/null
+++ b/design/40307-fuzzing.md
@@ -0,0 +1,594 @@

# Design Draft: First Class Fuzzing

Author: Katie Hockman

Last updated: 2020-07-21

Discussion at https://golang.org/issue/40307

## Abstract

Systems built with Go must be secure and resilient.
Fuzzing can help with this, by allowing developers to identify and fix bugs,
empowering them to improve the quality of their code.
However, there is no standard way of fuzzing Go code today, and no
out-of-the-box tooling or support.
This proposal will create a unified fuzzing narrative which makes fuzzing a
first class option for Go developers.

## Background

Fuzzing is a type of automated testing which continuously manipulates inputs to
a program to find issues such as panics, bugs, or data races to which the code
may be susceptible.
These semi-random data mutations can discover new code coverage that existing
unit tests may miss, and uncover edge-case bugs which would otherwise go
unnoticed.
This type of testing works best when able to run more mutations quickly, rather
than fewer mutations intelligently.

Since fuzzing can reach edge cases which humans often miss, fuzz testing is
particularly valuable for finding security exploits and vulnerabilities.
Fuzz tests have historically been authored primarily by security engineers, and
hackers may use similar methods to find vulnerabilities maliciously.
However, writing fuzz targets needn’t be constrained to developers with security
expertise.
There is great value in fuzz testing all programs, including those
which may be more subtly security-relevant, especially those working with
arbitrary user input.

Other languages support and encourage fuzz testing.
[libFuzzer](https://llvm.org/docs/LibFuzzer.html) and
[AFL](https://lcamtuf.coredump.cx/afl/) are widely used, particularly with
C/C++, and AFL has identified vulnerabilities in programs like Mozilla Firefox,
Internet Explorer, OpenSSH, Adobe Flash, and more.
In Rust,
[cargo-fuzz](https://fitzgeraldnick.com/2020/01/16/better-support-for-fuzzing-structured-inputs-in-rust.html)
allows for fuzzing of structured data in addition to raw bytes, allowing for
even more flexibility with authoring fuzz targets.
Existing tools in Go, such as go-fuzz, have many [success
stories](https://github.com/dvyukov/go-fuzz#trophies), but there is no fully
supported or canonical solution for Go.
The goal is to make fuzzing a first-class experience, making it so easy that it
becomes the norm for Go packages to have fuzz targets.
Having fuzz targets available in a standard format makes it possible to use them
automatically in CI, or even as the basis for experiments with different
mutation engines.

There is strong community interest in this.
It’s the third most supported
[proposal](https://github.com/golang/go/issues/19109) on the issue tracker
(~500 +1s), with projects like [go-fuzz](https://github.com/dvyukov/go-fuzz)
(3.5K stars) and other community-led efforts that have been in the works for
several years.
Prototypes exist, but lack core features like robust module support, go command
integration, and integration with new [compiler
instrumentation](https://github.com/golang/go/issues/14565).

## Proposal

Support `Fuzz` functions in Go test files, making fuzzing a first class option
for Go developers through unified, end-to-end support.

## Rationale

One alternative would be to keep with the status quo and ask Go developers to
use existing tools, or build their own as needed.
Developers could use tools
like [go-fuzz](https://github.com/dvyukov/go-fuzz) or
[fzgo](https://github.com/thepudds/fzgo) (built on top of go-fuzz) to solve some
of their needs.
However, each existing solution involves more work than typical Go testing, and
is missing crucial features.
Fuzz testing shouldn’t be any more complicated, or any less feature-complete,
than other types of Go testing (like benchmarking or unit testing).
Existing solutions add extra overhead, such as custom command line tools and
separate test files or build tags, and they lack robust modules support and
testing/customization support from the standard library.

By making fuzzing easier for developers, we will increase the amount of Go code
that’s covered by fuzz tests.
This will have particularly high impact for heavily depended-upon or
security-sensitive packages.
The more Go code that’s covered by fuzz tests, the more bugs will be found and
fixed in the wider ecosystem.
These bug fixes matter for the stability and security of systems written in Go.

The best solution for Go in the long term is to have a feature-rich, fully
supported, unified narrative for fuzzing.
It should be just as easy to write fuzz targets as it is to write unit tests,
and developers shouldn’t need to learn custom workflows or new tools to build or
run these fuzz targets.
Along with the language support, we must provide documentation, tutorials, and
incentives for Go package owners to add fuzz tests to their packages.
This is a measurable goal: we can track the number of fuzz targets and the bug
fixes that result from this design.

Standardizing this also provides new opportunities for other tools to be built,
and for integration into existing infrastructure.
For example, this proposal creates consistency for building and running fuzz
targets, making it easier to build turnkey
[OSS-Fuzz](https://github.com/google/oss-fuzz) support.

In the long term, this design could start to replace existing table tests,
seamlessly integrating into the existing Go testing ecosystem.

Some motivations written or provided by members of the Go community:

* https://tiny.cc/why-go-fuzz
* [Around 400 documented bugs](https://github.com/dvyukov/go-fuzz#trophies)
  were found by owners of various open-source Go packages with go-fuzz.

## Compatibility

This proposal will not impact any current compatibility promises.
It is possible that there are existing `FuzzX` functions in yyy\_test.go files
today, and the go command will emit an error on such functions if they have an
unsupported signature.
This should be unlikely, however, since most existing fuzz tools don’t
support these functions within yyy\_test.go files.

## Implementation

There are several components to this proposal which are described below.
The big pieces to be supported in the MVP are:

* support for fuzzing built-in types, structs, and types which implement the
  BinaryMarshaler and BinaryUnmarshaler interfaces or the TextMarshaler and
  TextUnmarshaler interfaces,
* a new `testing.F` type,
* full `go` command support, and
* a tailored-to-Go fuzzing engine built using the [new compiler
  instrumentation](https://golang.org/issue/14565).

There is already a lot of existing work that has been done to support this, and
we should leverage as much of that as possible when building native support,
e.g. [go-fuzz](https://github.com/dvyukov/go-fuzz),
[fzgo](https://github.com/thepudds/fzgo).
Work for this will be done in a dev branch (e.g. dev.fuzzing) of the main Go
repository, led by Katie Hockman, with contributions from other members of the
Go team and members of the community as appropriate.

### Overview

The **fuzz target** is a `FuzzX` function in a test file. Each fuzz target has
its own corpus of inputs.

The **fuzz function** is the function that is executed for every seed or
generated corpus entry.

At the beginning of the fuzz target, a developer provides a “[seed
corpus](#seed-corpus)”.
This is an interesting set of inputs that will be tested using
[`go test`](#go-command) by default, and can provide a starting point for a
[mutation engine](#fuzzing-engine-and-mutator) if fuzzing.
The testing portion of the fuzz target is a function within an
[`f.Fuzz`](#fuzz-function) invocation.
This function is run for each input in the seed corpus.
If the developer is fuzzing this target, then an [on disk
corpus](#fuzzing-engine-managed-corpus) will be managed by the fuzzing engine,
and a mutator will generate new inputs to run against the testing function,
attempting to discover interesting inputs or [crashes](#crashers).

With the new support, a fuzz target will look something like this:

```
func FuzzMarshalFoo(f *testing.F) {
	// Seed the initial corpus
	inputs := []string{"cat", "DoG", "!mouse!"}
	for _, input := range inputs {
		f.Add(input, big.NewInt(100))
	}

	// Run the fuzz test
	f.Fuzz(func(a string, num *big.Int) {
		f.Parallel()
		if num.Sign() <= 0 {
			f.BadInput() // only test positive numbers
		}
		val, err := MarshalFoo(a, num)
		if err != nil {
			f.BadInput()
		}
		if val == nil {
			f.Fatal("MarshalFoo: val == nil, err == nil")
		}
		a2, num2, err := UnmarshalFoo(val)
		if err != nil {
			f.Fatalf("failed to unmarshal valid Foo: %v", err)
		}
		if num2 == nil {
			f.Fatal("UnmarshalFoo: num2 == nil, err == nil")
		}
		if a2 != a || num2.Cmp(num) != 0 {
			f.Error("UnmarshalFoo does not match the provided input")
		}
	})
}
```

### testing.F

Below is the list of methods on testing.F.

```
// Add will add the arguments to the seed corpus for the fuzz target. This
// cannot be called within the Fuzz function. The args must match those in
// the Fuzz function.
func (f *F) Add(args ...interface{})

// Cleanup registers a function to be called when the fuzz target completes.
func (f *F) Cleanup(fn func())

// Error marks the arguments as having failed the test, but continues execution
// for that set of arguments.
func (f *F) Error(args ...interface{})

// Errorf behaves the same as Error but formats its arguments according to the
// format, analogous to Printf.
func (f *F) Errorf(format string, args ...interface{})

// Fatal marks the arguments as having failed the test and stops its execution
// by calling runtime.Goexit (which then runs all deferred calls in the current
// goroutine). The fuzz target keeps executing with the next set of arguments.
func (f *F) Fatal(args ...interface{})

// Fatalf behaves the same as Fatal but formats its arguments according to the
// format, analogous to Printf.
func (f *F) Fatalf(format string, args ...interface{})

// Fuzz runs the fuzz function with the target. It runs fn in a separate
// goroutine. Only one call to Fuzz is allowed per fuzz target, and any
// subsequent calls will panic. It only returns once all arguments have been
// passed to the fuzz function.
func (f *F) Fuzz(fn interface{})

// Helper marks the calling function as a helper function.
func (f *F) Helper()

// Log formats its arguments using default formatting, analogous to Println,
// and records the text in the error log.
func (f *F) Log(args ...interface{})

// Logf formats its arguments according to the format, analogous to Printf, and
// records the text in the error log.
func (f *F) Logf(format string, args ...interface{})

// Name returns the name of the running fuzz target.
func (f *F) Name() string

// Parallel signals that multiple instances of the Fuzz function can be run in
// parallel on separate goroutines. This must be called within an f.Fuzz
// function.
func (f *F) Parallel()

// Skip marks the test as having been skipped and stops its execution by
// calling runtime.Goexit. Skip must be called before the Fuzz function.
func (f *F) Skip()

// BadInput indicates that the input has failed some pre-condition, and the
// rest of the test should be skipped. The current arguments will not be added
// to the corpus, nor will they be considered a crasher.
func (f *F) BadInput()
```

### Fuzz function

A fuzz target has two main sections: 1) initializing and seeding the corpus
and 2) the fuzz function, which is executed for every seed or generated corpus
entry.

1. The corpus generation is done first, and builds a seed corpus by calling
   `f.Add(...)` with interesting input values for the fuzz test.
   This should be fairly quick, since it runs every time the target runs,
   before any fuzzing begins. These inputs are run by default with `go test`.
1. The `f.Fuzz(...)` function is executed with the provided seed corpus.
   If this target is being fuzzed, then new inputs of the provided argument
   types will be continuously tested against the `f.Fuzz(...)` function.

The arguments to `f.Add(...)` and the parameters of the function passed to
`f.Fuzz(...)` must have the same types within the target, and there must be at
least one argument specified.
This will be ensured by a vet check.
Fuzzing of built-in types (e.g. simple types, maps, arrays) and of types which
implement the BinaryMarshaler/BinaryUnmarshaler or
TextMarshaler/TextUnmarshaler interfaces is supported.

Interfaces, functions, and channels will never be supported.

### Seed Corpus

The **seed corpus** is the user-specified set of inputs to a fuzz target which
will be run by default with `go test`.
These are a set of meaningful inputs to test the behavior of the package, as
well as a set of regression tests for any new bugs discovered by the fuzzing
engine.
This set of inputs is also used to “seed” the corpus used by the fuzzing engine
when mutating inputs to discover new code coverage.

The fuzz target will always look in the package’s testdata/ directory for an
existing seed corpus to use, if one exists.
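
The default `go test` behavior described above checks every seed entry against the fuzz function's parameter types and then passes each entry to the function once. Because `testing.F` is only proposed here, the sketch below approximates that behavior with plain reflection. `runSeedCorpus` and `callAndRecover` are hypothetical names, and each inner slice stands in for one `f.Add` call. Unlike the draft's `f.Fuzz`, the function runs on the calling goroutine, and only panics are treated as crashers, not `Error` or `Fatal` calls.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// runSeedCorpus is a hypothetical helper, not part of the proposed testing
// API. It checks that each seed entry (one f.Add call) matches the fuzz
// function's parameter types, then calls the function once per entry and
// reports any panic as a failure for that entry.
func runSeedCorpus(fn interface{}, seeds [][]interface{}) ([]error, error) {
	fv := reflect.ValueOf(fn)
	if fv.Kind() != reflect.Func {
		return nil, errors.New("fuzz function must be a func")
	}
	ft := fv.Type()
	if ft.NumIn() == 0 {
		return nil, errors.New("fuzz function must take at least one argument")
	}
	// Validate every entry before running any, as a vet check would.
	for i, seed := range seeds {
		if len(seed) != ft.NumIn() {
			return nil, fmt.Errorf("seed %d: got %d args, want %d", i, len(seed), ft.NumIn())
		}
		for j, arg := range seed {
			// reflect.TypeOf(nil) is nil, so an untyped nil never matches.
			if reflect.TypeOf(arg) != ft.In(j) {
				return nil, fmt.Errorf("seed %d, arg %d: got %T, want %v", i, j, arg, ft.In(j))
			}
		}
	}
	var failures []error
	for i, seed := range seeds {
		args := make([]reflect.Value, len(seed))
		for j, arg := range seed {
			args[j] = reflect.ValueOf(arg)
		}
		if err := callAndRecover(fv, args); err != nil {
			failures = append(failures, fmt.Errorf("seed %d %v: %w", i, seed, err))
		}
	}
	return failures, nil
}

// callAndRecover calls fn with args and converts a panic into an error.
func callAndRecover(fn reflect.Value, args []reflect.Value) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic: %v", r)
		}
	}()
	fn.Call(args)
	return nil
}

func main() {
	// A toy fuzz function that panics on negative numbers.
	fuzz := func(s string, n int) {
		if n < 0 {
			panic("negative n for " + s)
		}
	}
	failures, err := runSeedCorpus(fuzz, [][]interface{}{
		{"cat", 1},
		{"DoG", -1},
	})
	if err != nil {
		fmt.Println("invalid seed corpus:", err)
		return
	}
	for _, f := range failures {
		fmt.Println("crasher:", f)
	}
}
```

Validating all entries before running any keeps a type mismatch from being confused with a crasher, which mirrors the draft's intent that mismatches are caught statically rather than at fuzz time.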

The seed corpus can also be populated programmatically using `f.Add` within the
fuzz target.
Programmatic seed corpora make it easy to add new entries when support for new
things is added (for example, adding a new key type to a key parsing function),
saving the mutation engine a lot of work.
These entries can also be clearer to the developer when they break the build
after something changes.

Because `go test` runs the seed corpus by default, the fuzz target is exercised
on every test run, which keeps it from rotting, and those runs remain
reproducible and cacheable.

### Fuzzing Engine and Mutator

A new **coverage-guided fuzzing engine**, written in Go, will be built.
This fuzzing engine will be responsible for using compiler instrumentation to
understand coverage information, generating test arguments with a mutator, and
maintaining the corpus.

The **mutator** is responsible for working with a generator to mutate bytes to
be used as input to the fuzz target.

Take the following `f.Fuzz` arguments as an example.

```
	A   string   // N bytes
	B   int64    // 8 bytes
	Num *big.Int // M bytes
```

A generator will provide some bytes for each type, where the number of bytes
could be constant (e.g. 8 bytes for an int64) or variable (e.g. N bytes for a
string, likely with some upper bound).

For constant-length types, the number of bytes can be hard-coded into the
fuzzing engine, making generation simpler.

For variable-length types, the mutator is responsible for varying the number of
bytes requested from the generator.

These bytes then need to be converted to the types used by the `f.Fuzz`
function.
The string and other built-in types can be decoded directly.
For other types, this can be done using either
[`UnmarshalBinary`](https://pkg.go.dev/encoding?tab=doc#BinaryUnmarshaler)
or
[`UnmarshalText`](https://pkg.go.dev/encoding?tab=doc#TextUnmarshaler)
if implemented on the type.
If building a struct, it can also build exported fields recursively as needed.

#### Fuzzing engine managed corpus

An on disk corpus will be managed by the fuzzing engine and will live outside
the module.
New items can be added to the corpus in several ways, e.g. by adding them to the
seed corpus, or by the fuzzing engine (e.g. because of new code coverage).
The details of how the corpus is built and processed should be unimportant to
users.
This should be a technical detail that developers don’t need to understand in
order to seed a corpus or write a fuzz target.
However, a developer still has the ability to copy files directly into the on
disk corpus directory if they wish to do so.

Under the hood, the corpus will be several files, with each file holding one
“unit” of bytes that can be unmarshaled for use in the fuzz target.

_Examples:_

1: A fuzz target’s `f.Fuzz` function takes three arguments: a `string`, a
`myString`, and a `*big.Int`.

```
f.Fuzz(func(a string, b myString, num *big.Int) { ... })
```

There are three “units” that will need to be provided by the mutator, so three
files are written to the corpus for this fuzz target (one per input).
The string can be decoded directly by the mutator.
The `myString` and `*big.Int` types both need an `UnmarshalText` or
`UnmarshalBinary` implementation in order for the mutator to decode the
generator-provided bytes into the respective type.

2: A fuzz target’s `f.Fuzz` function takes one argument, a single `[]byte`.

```
f.Fuzz(func(b []byte) { ... })
```

This is the typical “non-structured fuzzing” approach, and one that many
existing Go fuzz targets currently implement.
There is only one “unit” to be provided by the mutator, so only one file per
corpus entry will be written.

#### Minification + Pruning

Corpus entries will be minified to the smallest input that causes the failure
where possible, and pruned wherever possible to remove corpus entries that don’t
add additional coverage.
If a developer manually adds input files to the corpus directory, the fuzzing
engine may change the file names in order to help with this.

### Crashers

A **crasher** is a panic found within `f.Fuzz(...)`, a data race, a call to
`Fatal`, or a call to `Error`.
By default, the fuzz target will stop after the first crasher is found, and a
crash report will be provided.
Crash reports will include the inputs that caused the crash and the resulting
error message or stack trace.
The crasher inputs will be written to the package's testdata/ directory as a
seed corpus entry.

Since this crasher is added to testdata/, which will then be run by default as
part of the seed corpus for the fuzz target, this can act as a test for the new
failure.
A user experience may look something like this:

1. A user runs `go test -fuzz=FuzzFoo`, and a crasher is found while fuzzing.
1. The arguments that caused the crash are added to a testdata directory within
   the package automatically.
1. A subsequent run of `go test` (even without `-fuzz=FuzzFoo`) will then hit
   this newly discovered failing condition, and continue to fail until the bug
   is fixed.

### Go command

Fuzz testing will only be supported in module mode, and if run in GOPATH mode
the fuzz targets will be ignored.

Fuzz targets will be in `*_test.go` files, and can be in the same file as Test
and Benchmark targets.
These test files can exist wherever `*_test.go` files can currently live, and do
not need to be in any fuzz-specific directory or have a fuzz-specific file name
or build tag.

A new environment variable will be added, `$GOFUZZCACHE`, which will default to
`$HOME/Library/Caches/go-test-fuzz`.
This directory will hold the mutator-managed corpus.
For example, the corpus for each fuzz target will be managed in a subdirectory
called `<module_name>/<pkg>/@corpus/<target_name>` where `<module_name>` will
follow module case-encoding and include the major version.

The default behavior of `go test` will be to build and run the fuzz targets
using the seed corpus only.
No special instrumentation will be needed, the mutation engine will not run,
and the test can be cached as usual.
This default mode **will not** run the existing on disk corpus against the fuzz
target.
This is to allow for reproducibility and cacheability for `go test` executions
by default.

In order to run a fuzz target with the mutation engine, `-fuzz` will take a
regexp which must match only one fuzz target.
In this situation, only the fuzz target will run (ignoring all other tests).
Only one package is allowed to be tested at a time in this mode.
The following flags will be added or have modified meaning:

```
-fuzz regexp
    Run the fuzz target matching the given regexp. Must match at most one fuzz
    target.
-keepfuzzing
    Keep running the target if a crasher is found. (default false)
-parallel
    Allow parallel execution of f.Fuzz functions that call f.Parallel.
    The value of this flag is the maximum number of f.Fuzz functions to run
    simultaneously within the given fuzz target. (default GOMAXPROCS)
-race
    Enable data race detection while fuzzing. (default true)
```

`go test` will not respect `-p` when running with `-fuzz`, as it doesn't make
sense to fuzz multiple packages at the same time.

## Open issues and future work

### Options

There are options that developers often need to fuzz effectively and safely.
These options will likely make the most sense on a target-by-target basis,
rather than as a `go test` flag.
Which options to make available still needs some investigation.
The options may be set in a few ways.

As a struct:

```
func FuzzFoo(f *testing.F) {
	f.FuzzOpts(&testing.FuzzOpts{
		MaxInputSize: 1024,
	})
	f.Fuzz(func(a string) {
		...
	})
}
```

Or individually as:

```
func FuzzFoo(f *testing.F) {
	f.MaxInputSize(1024)
	f.Fuzz(func(a string) {
		...
	})
}
```

### Dictionaries

Support accepting [dictionaries](https://llvm.org/docs/LibFuzzer.html#id31) when
seeding the corpus to guide the fuzzer.

### Instrument specific packages only

We might need a way to instrument only some packages for coverage, but there
isn’t enough data yet to be sure.
One example use case for this would be a fuzzing engine which is spending too
much time discovering coverage in the encoding/json parser, when it should
instead be focusing on coverage for some intended package.

There are also questions about whether or not this is possible with the current
compiler instrumentation available.
By runtime, the fuzz target will have already been compiled, so recompiling to
leave out (or only include) certain packages may not be feasible.

### Auto-generated unit test

A crash report may be able to provide a reproducible unit test.
The generated unit test can be copied directly into a `*_test.go` file, or can
be used as a good starting point for a regression test.
This test can be auto-generated from the fuzz target, with a few changes:

* Creating the vars to be used in the fuzz target will vary depending on the
  type. In the case of a simple type, it may be instantiated directly from the
  bytes. If, for example, the bytes are not printable, then they may be
  decoded from hex before being decoded using UnmarshalBinary.
* The name passed into `t.Run()` will be the name of the associated fuzz
  target.
* Each `f.BadInput()` will be turned into a `panic("unreachable code")`.
* Each `f.Add()` will be turned into a no-op on the LHS, and will keep the RHS
  (i.e. `_ = <args to f.Add()>`). This is to preserve any necessary setup, and
  can be removed at the developer’s discretion.
* All other `testing.F` functions will be converted to the equivalent
  `testing.T` function.

```
const corpus_w5fcysjrx7yqtD = `` // const name is the hash

func TestMarshalFoo(t *testing.T) {
	inputs := []string{"cat", "dog", "mouse", "bird", "turtle"}
	for _, input := range inputs {
		_, _ = input, big.NewInt(100)
	}
	var (
		a     string
		num   = new(big.Int) // must be non-nil before UnmarshalText
		other otherType
	)
	a = "badString"
	if num.UnmarshalText([]byte("10")) != nil {
		panic("failed to unmarshal num")
	}
	otherBytes, err := hex.DecodeString("a3fd12dd03df")
	if err != nil {
		panic("failed to decode other hex")
	}
	if other.UnmarshalBinary(otherBytes) != nil {
		panic("failed to unmarshal other")
	}
	// Run the fuzz test with these arguments
	t.Run("FuzzMarshalFoo", func(t *testing.T) {
		if num.Sign() <= 0 {
			panic("unreachable code") // only test positive numbers
		}
		val, err := MarshalFoo(a, num)
		if err != nil {
			panic("unreachable code") // bad input
		}
		if val == nil {
			t.Fatal("val == nil, err == nil")
		}
		a2, num2, err := UnmarshalFoo(val)
		if err != nil {
			t.Fatalf("failed to unmarshal valid Foo: %v", err)
		}
		if num2 == nil {
			t.Fatal("UnmarshalFoo succeeded but gave a nil num")
		}
		if a2 != a || num2.Cmp(num) != 0 {
			t.Error("UnmarshalFoo does not match the provided input")
		}
	})
}
```
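
The first bullet in the list above leaves open how each corpus unit becomes Go source in the generated test. The sketch below shows one way a generator might do it. It uses a hypothetical `unitSource` helper, and it assumes that "printable" means valid UTF-8 in which every rune passes `unicode.IsPrint`. It only emits the `[]byte` setup. The `UnmarshalText`/`UnmarshalBinary` call is left to the surrounding generated code, as in the `otherBytes` lines of the example test.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strconv"
	"unicode"
	"unicode/utf8"
)

// isPrintable reports whether b is valid UTF-8 made up only of printable runes.
func isPrintable(b []byte) bool {
	if !utf8.Valid(b) {
		return false
	}
	for _, r := range string(b) {
		if !unicode.IsPrint(r) {
			return false
		}
	}
	return true
}

// unitSource returns Go source that declares name as a []byte holding unit.
// Printable bytes become a quoted literal; anything else is emitted as a
// hex.DecodeString call followed by an error check.
func unitSource(name string, unit []byte) string {
	if isPrintable(unit) {
		return fmt.Sprintf("%s := []byte(%s)\n", name, strconv.Quote(string(unit)))
	}
	return fmt.Sprintf("%s, err := hex.DecodeString(%q)\n"+
		"if err != nil {\n\tpanic(%q)\n}\n",
		name, hex.EncodeToString(unit), "failed to decode "+name+" hex")
}

func main() {
	fmt.Print(unitSource("a", []byte("cat")))
	fmt.Print(unitSource("otherBytes", []byte{0xa3, 0xfd, 0x12}))
}
```

Falling back to hex keeps the generated file readable and valid Go even when a crasher contains control characters or invalid UTF-8.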
