| author | Katie Hockman <katie@golang.org> | 2020-07-22 16:18:13 -0400 |
|---|---|---|
| committer | Katie Hockman <katie@golang.org> | 2020-07-22 20:40:44 +0000 |
| commit | 522400e00c82921a68bed78611cfb0ff3c014140 (patch) | |
| tree | d7a67637ac42eaaf0af52496e524da6cba8e181e | |
| parent | db42f6798a584c30c667ac078dfd8d7daae457a4 (diff) | |
design/draft-fuzzing.md: clarify feedback request
Change-Id: I1c1c701b9606fba0cae4b94baefddb144a6f6cd6
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/244377
Reviewed-by: Katie Hockman <katie@golang.org>
| -rw-r--r-- | design/40307-fuzzing.md | 561 |
| -rw-r--r-- | design/draft-fuzzing.md | 568 |
2 files changed, 569 insertions, 560 deletions
diff --git a/design/40307-fuzzing.md b/design/40307-fuzzing.md index a1b66c9..c630ae9 100644 --- a/design/40307-fuzzing.md +++ b/design/40307-fuzzing.md @@ -1,560 +1 @@ -# Design Draft: First Class Fuzzing - -Author: Katie Hockman - -Last updated: 2020-07-21 - -Discussion at https://golang.org/issue/40307 - -## Abstract - -Systems built with Go must be secure and resilient. -Fuzzing can help with this, by allowing developers to identify and fix bugs, -empowering them to improve the quality of their code. -However, there is no standard way of fuzzing Go code today, and no -out-of-the-box tooling or support. -This proposal will create a unified fuzzing narrative which makes fuzzing a -first class option for Go developers. - -## Background - -Fuzzing is a type of automated testing which continuously manipulates inputs to -a program to find issues such as panics, bugs, or data races to which the code -may be susceptible. -These semi-random data mutations can discover new code coverage that existing -unit tests may miss, and uncover edge-case bugs which would otherwise go -unnoticed. -This type of testing works best when able to run more mutations quickly, rather -than fewer mutations intelligently. - -Since fuzzing can reach edge cases which humans often miss, fuzz testing is -particularly valuable for finding security exploits and vulnerabilities. -Fuzz tests have historically been authored primarily by security engineers, and -hackers may use similar methods to find vulnerabilities maliciously. -However, writing fuzz targets needn’t be constrained to developers with security -expertise. -There is great value in fuzz testing all programs, including those -which may be more subtly security-relevant, especially those working with -arbitrary user input. - -Other languages support and encourage fuzz testing. 
-[libFuzzer](https://llvm.org/docs/LibFuzzer.html) and -[AFL](https://lcamtuf.coredump.cx/afl/) are widely used, particularly with -C/C++, and AFL has identified vulnerabilities in programs like Mozilla Firefox, -Internet Explorer, OpenSSH, Adobe Flash, and more. -In Rust, -[cargo-fuzz](https://fitzgeraldnick.com/2020/01/16/better-support-for-fuzzing-structured-inputs-in-rust.html) -allows for fuzzing of structured data in addition to raw bytes, allowing for -even more flexibility with authoring fuzz targets. -Existing tools in Go, such as go-fuzz, have many [success -stories](https://github.com/dvyukov/go-fuzz#trophies), but there is no fully -supported or canonical solution for Go. -The goal is to make fuzzing a first-class experience, making it so easy that it -becomes the norm for Go packages to have fuzz targets. -Having fuzz targets available in a standard format makes it possible to use them -automatically in CI, or even as the basis for experiments with different -mutation engines. - -There is strong community interest for this. -It’s the third most supported -[proposal](https://github.com/golang/go/issues/19109) on the issue tracker (~500 -+1s), with projects like [go-fuzz](https://github.com/dvyukov/go-fuzz) (3.5K -stars) and other community-led efforts that have been in the works for several -years. -Prototypes exist, but lack core features like robust module support, go command -integration, and integration with new [compiler -instrumentation](https://github.com/golang/go/issues/14565). - -## Proposal - -Support `Fuzz` functions in Go test files, making fuzzing a first class option -for Go developers through unified, end-to-end support. - -## Rationale - -One alternative would be to keep with the status quo and ask Go developers to -use existing tools, or build their own as needed. 
-Developers could use tools -like [go-fuzz](https://github.com/dvyukov/go-fuzz) or -[fzgo](https://github.com/thepudds/fzgo) (built on top of go-fuzz) to solve some -of their needs. -However, each existing solution involves more work than typical Go testing, and -is missing crucial features. -Fuzz testing shouldn’t be any more complicated, or any less feature-complete, -than other types of Go testing (like benchmarking or unit testing). -Existing solutions add extra overhead such as custom command line tools, -separate test files or build tags, lack of robust modules support, and lack of -testing/customization support from the standard library. - -By making fuzzing easier for developers, we will increase the amount of Go code -that’s covered by fuzz tests. -This will have particularly high impact for heavily depended upon or -security-sensitive packages. -The more Go code that’s covered by fuzz tests, the more bugs will be found and -fixed in the wider ecosystem. -These bug fixes matter for the stability and security of systems written in Go. - -The best solution for Go in the long-term is to have a feature-rich, fully -supported, unified narrative for fuzzing. -It should be just as easy to write fuzz targets as it is to write unit tests. -Developers should be able to use existing tools for which they are already -familiar, with small variations to support fuzzing. -Along with the language support, we must provide documentation, tutorials, and -incentives for Go package owners to add fuzz tests to their packages. -This is a measurable goal, and we can track the number of fuzz targets and -resulting bug fixes resulting from this design. - -Standardizing this also provides new opportunities for other tools to be built, -and integration into existing infrastructure. -For example, this proposal creates consistency for building and running fuzz -targets, making it easier to build turnkey -[OSS-Fuzz](https://github.com/google/oss-fuzz) support. 
- -In the long term, this design could start to replace existing table tests, -seamlessly integrating into the existing Go testing ecosystem. - -Some motivations written or provided by members of the Go community: - -* https://tiny.cc/why-go-fuzz -* [Around 400 documented bugs](https://github.com/dvyukov/go-fuzz#trophies) - were found by owners of various open-source Go packages with go-fuzz. - -## Compatibility - -This proposal will not impact any current compatibility promises. -It is possible that there are existing `FuzzX` functions in yyy\_test.go files -today, and the go command will emit an error on such functions if they have an -unsupported signature. -This should however be unlikely, since most existing fuzz tools don’t -support these functions within yyy\_test.go files. - -## Implementation - -There are several components to this proposal which are described below. -The big pieces to be supported in the MVP are: support for fuzzing built-in -types, structs, and types which implement the BinaryMarshaler and -BinaryUnmarshaler interfaces or the TextMarshaler and TextUnmarshaler -interfaces, a new `testing.F` type, full `go` command support, and building a -tailored-to-Go fuzzing engine using the [new compiler -instrumentation](https://golang.org/issue/14565). - -There is already a lot of existing work that has been done to support this, and -we should leverage as much of that as possible when building native support, -e.g. [go-fuzz](https://github.com/dvyukov/go-fuzz), -[fzgo](https://github.com/thepudds/fzgo). -Work for this will be done in a dev branch (e.g. dev.fuzzing) of the main Go -repository, led by Katie Hockman, with contributions from other members of the -Go team and members of the community as appropriate. - -### Overview - -The **fuzz target** is a `FuzzX` function in a test file. Each fuzz target has -its own corpus of inputs. - -The **fuzz function** is the function that is executed for every seed or -generated corpus entry. 
- -At the beginning of the fuzz target, a developer provides a “[seed -corpus](#seed-corpus)”. -This is an interesting set of inputs that will be tested using <code>[go -test](#go-command)</code> by default, and can provide a starting point for a -[mutation engine](#fuzzing-engine-and-mutator) if fuzzing. -The testing portion of the fuzz target is a function within an -<code>[f.Fuzz](#fuzz-function)</code> invocation. -This function is run for each input in the seed corpus. -If the developer is fuzzing this target with the new `-fuzz` flag with `go -test`, then an [on disk corpus](#fuzzing-engine-managed-corpus) will be managed -by the fuzzing engine, and a mutator will generate new inputs to run against the -testing function, attempting to discover interesting inputs or -[crashes](#crashers). - -With the new support, a fuzz target will look something like this: - -``` -func FuzzMarshalFoo(f *testing.F) { - // Seed the initial corpus - inputs := []string{"cat", "DoG", "!mouse!"} - for _, input := range inputs { - f.Add(input, big.NewInt(100)) - } - - // Run the fuzz test - f.Fuzz(func(a string, num *big.Int) { - f.Parallel() - if num.Sign() <= 0 { - f.BadInput() // only test positive numbers - } - val, err := MarshalFoo(a, num) - if err != nil { - f.BadInput() - } - if val == nil { - f.Fatal("MarshalFoo: val == nil, err == nil") - } - a2, num2, err := UnmarshalFoo(val) - if err != nil { - f.Fatalf("failed to unmarshal valid Foo: %v", err) - } - if a2 == nil || num2 == nil { - f.Error("UnmarshalFoo: a==nil, num==nil, err==nil") - } - if a2 != a || !num2.Equal(num) { - f.Error("UnmarshalFoo does not match the provided input") - } - }) -} -``` - -### testing.F - -Below is the list of methods on testing.F. - -``` -// Add will add the arguments to the seed corpus for the fuzz target. This -// cannot be called within the Fuzz function. The args must match those in -// in the Fuzz function. 
-func (f *F) Add(args ...interface{}) - -// Cleanup registers a function to be called when the fuzz target completes. -func (f *F) Cleanup(fn func()) - -// Error marks the arguments as having failed the test, but continues execution -// for that set of arguments. -func (f *F) Error(args ...interface{}) - -// Errorf behaves the same as Error but formats its arguments according to the -// format, analogous to Printf. -func (f *F) Errorf(args ...interface{}) - -// Fatal marks the arguments as having failed the test and stops its execution -// by calling runtime.Goexit (which then runs all deferred calls in the current -// goroutine). The fuzz target keeps executing with the next set of arguments. -func (f *F) Fatal(args ...interface{}) - -// Fatalf behaves the same as Fatal but formats its arguments according to the -// format, analogous to Printf. -func (f *F) Fatalf(args ...interface{}) - -// Fuzz runs the fuzz function with the target. It runs fn in a separate -// goroutine. Only one call to Fuzz is allowed per fuzz target, and any -// subsequent calls will panic. It only returns once all arguments have been -// passed to the fuzz function. -func (f *F) Fuzz(fn interface{}) - -// Helper marks the calling function as a helper function. -func (f *F) Helper() - -// Log formats its arguments using default formatting, analogous to Println, -// and records the text in the error log. -func (f *F) Log(args ...interface{}) - -// Logf formats its arguments according to the format, analogous to Printf, and -// records the text in the error log. -func (f *F) Logf(args ...interface{}) - -// Name returns the name of the running fuzz target. -func (f *F) Name() - -// Parallel signals that multiple instances of the Fuzz function can be run in -// parallel on separate goroutines. This must be called within an f.Fuzz -// function. -func (f *F) Parallel() - -// Skip marks the test as having been skipped and stops its execution by -// calling runtime.Goexit. 
Skip must be called before the Fuzz function. -func (f *F) Skip() - -// BadInput indicates that the input has failed some pre-condition, and the -// rest of the test should be skipped. The args will not be added to the -// corpus, nor will they be considered a crasher. -func (f *F) BadInput() -``` - -### Fuzz function - -A fuzz function has two main sections: 1) initializing and seeding the corpus -and 2) the Fuzz function which is executed for every seed or generated corpus -entry. - -1. The corpus generation is done first, and builds a seed corpus by calling - `f.Add(...)` with interesting input values for the fuzz test. - This should be fairly quick, thus able to run before the fuzz testing - begins, every time it’s run. These inputs are run by default with `go test`. -1. The `f.Fuzz(...)` function is executed with the provided seed corpus. - If this target is being fuzzed, then new inputs of the provided argument - types will be continously tested against the `f.Fuzz(...)` function. - -The arguments to `f.Add(...)` and the function in `f.Fuzz(...)` must be the same -type within the target, and there must be at least one argument specified. -This will be ensured by a vet check. -Fuzzing of built-in types (e.g. simple types, maps, arrays) and types which -implement the BinaryMarshaler and TextMarshaler interfaces are supported. - -Interfaces, functions, and channels are not appropriate types to fuzz, so will -never be supported. - -### Seed Corpus - -The **seed corpus** is the user-specified set of inputs to a fuzz target which -will be run by default with go test. -These should be composed of meaningful inputs to test the behavior of the -package, as well as a set of regression inputs for any newly discovered bugs -identified by the fuzzing engine. -This set of inputs is also used to “seed” the corpus used by the fuzzing engine -when mutating inputs to discover new code coverage. 
- -The seed corpus can be populated programmatically using `f.Add` within the -fuzz target. -Programmatic seed corpuses make it easy to add new entries when support for new -things are added (for example adding a new key type to a key parsing function) -saving the mutation engine a lot of work. -These can also be more clear for the developer when they break the build when -something changes. - -The fuzz target will always look in the package’s testdata/ directory for an -existing seed corpus to use as well, if one exists. -This seed corpus will be in a directory of the form `testdata/<target_name>`, -with a file for each unit that can be unmarshaled for testing. - -_Examples:_ - -1: A fuzz target’s `f.Fuzz` function takes three arguments - -``` -f.Fuzz(func(a string, b myStruct, num *big.Int) {...}) - -type myStruct struct { - A, B string - num int -} -``` - -In this example, string is a built-in type, so can be decoded directly. -`*big.Int` implements `UnmarshalText`, so can also be unmarshaled directly. -However, `myStruct` does not implement `UnmarshalBinary` or `UnmarshalText` so -the struct is pieced together recursively from its exported types. That would -mean two sets of bytes will be written for this type, one for each of A and B. -In total, four files would be written, and four inputs can be mutated when -fuzzing. - -2: A fuzz target’s `f.Fuzz` function takes a single `[]byte` - -``` -f.Fuzz(func(b []byte) {...}) -``` - -This is the typical “non-structured fuzzing” approach. -There is only one set of bytes to be provided by the mutator, so only one file -will be written. - -### Fuzzing Engine and Mutator - -A new **coverage-guided fuzzing engine**, written in Go, will be built. -This fuzzing engine will be responsible for using compiler instrumentation to -understand coverage information, generating test arguments with a mutator, and -maintaining the corpus. 
- -The **mutator** is responsible for working with a generator to mutate bytes to -be used as input to the fuzz target. - -Take the following `f.Fuzz` arguments as an example. - -``` - A string // N bytes - B int64 // 8 bytes - Num *big.Int // M bytes -``` - -A generator will provide some bytes for each type, where the number of bytes -could be constant (e.g. 8 bytes for an int64) or variable (e.g. N bytes for a -string, likely with some upper bound). - -For constant-length types, the number of bytes can be hard-coded into the -fuzzing engine, making generation simpler. - -For variable-length types, the mutator is responsible for varying the number of -bytes requested from the generator. - -These bytes then need to be converted to the types used by the `f.Fuzz` -function. -The string and other built-in types can be decoded directly. -For other types, this can be done using either -<code>[UnmarshalBinary](https://pkg.go.dev/encoding?tab=doc#BinaryUnmarshaler)</code> -or -<code>[UnmarshalText](https://pkg.go.dev/encoding?tab=doc#TextUnmarshaler)</code> -if implemented on the type. -If building a struct, it can also build exported fields recursively as needed. - -#### Fuzzing engine managed corpus - -An on disk corpus will be managed by the fuzzing engine and will live outside -the module. -New items can be added to this corpus in several ways, e.g. as part of the seed -corpus, or by the fuzzing engine (e.g. because of new code coverage). - -The details of how the corpus is built and processed should be unimportant to -users. -This should be a technical detail that developers don’t need to understand in -order to seed a corpus or write a fuzz target. -Any existing files that a developer wants to include in the fuzz test should be -added to the seed corpus directory, `testdata/<target_name>`. 
- - -#### Minification + Pruning - -Corpus entries will be minified to the smallest input that causes the failure -where possible, and pruned wherever possible to remove corpus entries that don’t -add additional coverage. -If a developer manually adds input files to the corpus directory, the fuzzing -engine may change the file names in order to help with this. - -### Crashers - -A **crasher** is a panic found within `f.Fuzz(...)`, a race condition, a call to -`Fatal`, or a call to `Error`. -By default, the fuzz target will stop after the first crasher is found, and a -crash report will be provided. -Crash reports will include the inputs that caused the crash and the resulting -error message or stack trace. -The crasher inputs will be written to the package's testdata/ directory as a -seed corpus entry. - -Since this crasher is added to testdata/, which will then be run by default as -part of the seed corpus for the fuzz target, this can act as a test for the new -failure. -A user experience may look something like this: - -1. A user runs `go test -fuzz=FuzzFoo`, and a crasher is found while fuzzing. -1. The arguments that caused the crash are added to a testdata directory within - the package automatically. -1. A subsequent run of `go test` (even without `-fuzz=FuzzFoo`) will then hit - this newly discovering failing condition, and continue to fail until the bug - is fixed. - -### Go command - -Fuzz testing will only be supported in module mode, and if run in GOPATH mode -the fuzz targets will be ignored. - -Fuzz targets will be in *_test.go files, and can be in the same file as Test and -Benchmark targets. -These test files can exist wherever *_test.go files can currently live, and do -not need to be in any fuzz-specific directory or have a fuzz-specific file name -or build tag. - -A new environment variable will be added, `$GOFUZZCACHE`, which will default to -an appropriate cache directory on the developer's machine. 
-This directory will hold the mutator-managed corpus. -For example, the corpus for each fuzz target will be managed in a subdirectory -called `<module_name>/<pkg>/@corpus/<target_name>` where `<module_name>` will -follow module case-encoding and include the major version. - -The default behavior of `go test` will be to build and run the fuzz targets -using the seed corpus only. -No special instrumentation would be needed, the mutation engine would not run, -and the test can be cached as usual. -This default mode **will not** run the existing on disk corpus against the fuzz -target. -This is to allow for reproducibility and cacheability for `go test` executions -by default. - -In order to run a fuzz target with the mutation engine, `-fuzz` will take a -regexp which must match only one fuzz target. -In this situtation, only the fuzz target will run (ignoring all other tests). -Only one package is allowed to be tested at a time in this mode. -The following flags will be added or have modified meaning: - -``` --fuzz name - Run the fuzz target with the given regexp. Must match at most one fuzz - target. --keepfuzzing - Keep running the target if a crasher is found. (default false) --parallel - Allow parallel execution of f.Fuzz functions that call f.Parallel. - The value of this flag is the maximum number of f.Fuzz functions to run - simultaneously within the given fuzz target. (default GOMAXPROCS) --race - Enable data race detection while fuzzing. (default true) -``` - -`go test` will not respect `-p` when running with `-fuzz`, as it doesn't make -sense to fuzz multiple packages at the same time. - -## Open issues and future work - -### Naming scheme for corpus files - -There are several naming schemes for the corpus files which may be appropriate, -and the final decision is still undecided. 
- -Take the following example: - -``` -f.Fuzz(func(a string, b myStruct, num *big.Int) {...}) - -type myStruct struct { - A, B string - num int -} -``` - -For two corpus entries, this could be structured as follows: -* 0000001.string -* 0000001.myStruct.string -* 0000001.myStruct.string -* 0000001.big_int -* 0000002.string -* 0000002.myStruct.string -* 0000002.myStruct.string -* 0000002.big_int - - -### Options - -There are options that developers often need to fuzz effectively and safely. -These options will likely make the most sense on a target-by-target basis, -rather than as a `go test` flag. -Which options to make available still needs some investigation. -The options may be set in a few ways. - -As a struct: - -``` -func FuzzFoo(f *testing.F) { - f.FuzzOpts(&testing.FuzzOpts { - MaxInputSize: 1024, - }) - f.Fuzz(func(a string) { - ... - }) -``` - -Or individually as: - -``` -func FuzzFoo(f *testing.F) { - f.MaxInputSize(1024) - f.Fuzz(func(a string) { - ... - }) -} -``` - -### Dictionaries - -Support accepting [dictionaries](https://llvm.org/docs/LibFuzzer.html#id31) when -seeding the corpus to guide the fuzzer. - -### Instrument specific packages only - -We might need a way to specify to instrument only some packages for coverage, -but there isn’t enough data yet to be sure. -One example use case for this would be a fuzzing engine which is spending too -much time discovering coverage in the encoding/json parser, when it should -instead be focusing on coverage for some intended package. - -There are also questions about whether or not this is possible with the current -compiler instrumentation available. -By runtime, the fuzz target will have already been compiled, so recompiling to -leave out (or only include) certain packages may not be feasible.
\ No newline at end of file +Moved to [golang.org/s/draft-fuzzing-design](https://golang.org/s/draft-fuzzing-design). diff --git a/design/draft-fuzzing.md b/design/draft-fuzzing.md new file mode 100644 index 0000000..d71af45 --- /dev/null +++ b/design/draft-fuzzing.md @@ -0,0 +1,568 @@ +# Design Draft: First Class Fuzzing + +Author: Katie Hockman + +[golang.org/s/draft-fuzzing-design](https://golang.org/s/draft-fuzzing-design) + +This is a **Draft Design**, not a formal Go proposal, since it is a large +change that is still flexible. +The goal of circulating this draft design is to collect feedback to shape an +intended eventual proposal. + +For this change, we will use [a Go Reddit +thread](https://golang.org/s/draft-fuzzing-reddit) to manage Q&A, since Reddit's +threading support can easily match questions with answers and keep separate +lines of discussion separate. + +## Abstract + +Systems built with Go must be secure and resilient. +Fuzzing can help with this, by allowing developers to identify and fix bugs, +empowering them to improve the quality of their code. +However, there is no standard way of fuzzing Go code today, and no +out-of-the-box tooling or support. +This proposal will create a unified fuzzing narrative which makes fuzzing a +first class option for Go developers. + +## Background + +Fuzzing is a type of automated testing which continuously manipulates inputs to +a program to find issues such as panics, bugs, or data races to which the code +may be susceptible. +These semi-random data mutations can discover new code coverage that existing +unit tests may miss, and uncover edge-case bugs which would otherwise go +unnoticed. +This type of testing works best when able to run more mutations quickly, rather +than fewer mutations intelligently. + +Since fuzzing can reach edge cases which humans often miss, fuzz testing is +particularly valuable for finding security exploits and vulnerabilities. 
+Fuzz tests have historically been authored primarily by security engineers, and +hackers may use similar methods to find vulnerabilities maliciously. +However, writing fuzz targets needn’t be constrained to developers with security +expertise. +There is great value in fuzz testing all programs, including those +which may be more subtly security-relevant, especially those working with +arbitrary user input. + +Other languages support and encourage fuzz testing. +[libFuzzer](https://llvm.org/docs/LibFuzzer.html) and +[AFL](https://lcamtuf.coredump.cx/afl/) are widely used, particularly with +C/C++, and AFL has identified vulnerabilities in programs like Mozilla Firefox, +Internet Explorer, OpenSSH, Adobe Flash, and more. +In Rust, +[cargo-fuzz](https://fitzgeraldnick.com/2020/01/16/better-support-for-fuzzing-structured-inputs-in-rust.html) +allows for fuzzing of structured data in addition to raw bytes, allowing for +even more flexibility with authoring fuzz targets. +Existing tools in Go, such as go-fuzz, have many [success +stories](https://github.com/dvyukov/go-fuzz#trophies), but there is no fully +supported or canonical solution for Go. +The goal is to make fuzzing a first-class experience, making it so easy that it +becomes the norm for Go packages to have fuzz targets. +Having fuzz targets available in a standard format makes it possible to use them +automatically in CI, or even as the basis for experiments with different +mutation engines. + +There is strong community interest for this. +It’s the third most supported +[proposal](https://github.com/golang/go/issues/19109) on the issue tracker (~500 ++1s), with projects like [go-fuzz](https://github.com/dvyukov/go-fuzz) (3.5K +stars) and other community-led efforts that have been in the works for several +years. +Prototypes exist, but lack core features like robust module support, go command +integration, and integration with new [compiler +instrumentation](https://github.com/golang/go/issues/14565). 
+ +## Proposal + +Support `Fuzz` functions in Go test files, making fuzzing a first class option +for Go developers through unified, end-to-end support. + +## Rationale + +One alternative would be to keep with the status quo and ask Go developers to +use existing tools, or build their own as needed. +Developers could use tools +like [go-fuzz](https://github.com/dvyukov/go-fuzz) or +[fzgo](https://github.com/thepudds/fzgo) (built on top of go-fuzz) to solve some +of their needs. +However, each existing solution involves more work than typical Go testing, and +is missing crucial features. +Fuzz testing shouldn’t be any more complicated, or any less feature-complete, +than other types of Go testing (like benchmarking or unit testing). +Existing solutions add extra overhead such as custom command line tools, +separate test files or build tags, lack of robust modules support, and lack of +testing/customization support from the standard library. + +By making fuzzing easier for developers, we will increase the amount of Go code +that’s covered by fuzz tests. +This will have particularly high impact for heavily depended upon or +security-sensitive packages. +The more Go code that’s covered by fuzz tests, the more bugs will be found and +fixed in the wider ecosystem. +These bug fixes matter for the stability and security of systems written in Go. + +The best solution for Go in the long-term is to have a feature-rich, fully +supported, unified narrative for fuzzing. +It should be just as easy to write fuzz targets as it is to write unit tests. +Developers should be able to use existing tools for which they are already +familiar, with small variations to support fuzzing. +Along with the language support, we must provide documentation, tutorials, and +incentives for Go package owners to add fuzz tests to their packages. +This is a measurable goal, and we can track the number of fuzz targets and +resulting bug fixes resulting from this design. 
+ +Standardizing this also provides new opportunities for other tools to be built, +and integration into existing infrastructure. +For example, this proposal creates consistency for building and running fuzz +targets, making it easier to build turnkey +[OSS-Fuzz](https://github.com/google/oss-fuzz) support. + +In the long term, this design could start to replace existing table tests, +seamlessly integrating into the existing Go testing ecosystem. + +Some motivations written or provided by members of the Go community: + +* https://tiny.cc/why-go-fuzz +* [Around 400 documented bugs](https://github.com/dvyukov/go-fuzz#trophies) + were found by owners of various open-source Go packages with go-fuzz. + +## Compatibility + +This proposal will not impact any current compatibility promises. +It is possible that there are existing `FuzzX` functions in yyy\_test.go files +today, and the go command will emit an error on such functions if they have an +unsupported signature. +This should however be unlikely, since most existing fuzz tools don’t +support these functions within yyy\_test.go files. + +## Implementation + +There are several components to this proposal which are described below. +The big pieces to be supported in the MVP are: support for fuzzing built-in +types, structs, and types which implement the BinaryMarshaler and +BinaryUnmarshaler interfaces or the TextMarshaler and TextUnmarshaler +interfaces, a new `testing.F` type, full `go` command support, and building a +tailored-to-Go fuzzing engine using the [new compiler +instrumentation](https://golang.org/issue/14565). + +There is already a lot of existing work that has been done to support this, and +we should leverage as much of that as possible when building native support, +e.g. [go-fuzz](https://github.com/dvyukov/go-fuzz), +[fzgo](https://github.com/thepudds/fzgo). +Work for this will be done in a dev branch (e.g. 
dev.fuzzing) of the main Go +repository, led by Katie Hockman, with contributions from other members of the +Go team and members of the community as appropriate. + +### Overview + +The **fuzz target** is a `FuzzX` function in a test file. Each fuzz target has +its own corpus of inputs. + +The **fuzz function** is the function that is executed for every seed or +generated corpus entry. + +At the beginning of the fuzz target, a developer provides a “[seed +corpus](#seed-corpus)”. +This is an interesting set of inputs that will be tested using <code>[go +test](#go-command)</code> by default, and can provide a starting point for a +[mutation engine](#fuzzing-engine-and-mutator) if fuzzing. +The testing portion of the fuzz target is a function within an +<code>[f.Fuzz](#fuzz-function)</code> invocation. +This function is run for each input in the seed corpus. +If the developer is fuzzing this target with the new `-fuzz` flag with `go +test`, then an [on disk corpus](#fuzzing-engine-managed-corpus) will be managed +by the fuzzing engine, and a mutator will generate new inputs to run against the +testing function, attempting to discover interesting inputs or +[crashes](#crashers). 
+ +With the new support, a fuzz target will look something like this: + +``` +func FuzzMarshalFoo(f *testing.F) { + // Seed the initial corpus + inputs := []string{"cat", "DoG", "!mouse!"} + for _, input := range inputs { + f.Add(input, big.NewInt(100)) + } + + // Run the fuzz test + f.Fuzz(func(a string, num *big.Int) { + f.Parallel() + if num.Sign() <= 0 { + f.BadInput() // only test positive numbers + } + val, err := MarshalFoo(a, num) + if err != nil { + f.BadInput() + } + if val == nil { + f.Fatal("MarshalFoo: val == nil, err == nil") + } + a2, num2, err := UnmarshalFoo(val) + if err != nil { + f.Fatalf("failed to unmarshal valid Foo: %v", err) + } + if num2 == nil { + f.Fatal("UnmarshalFoo: num == nil, err == nil") + } + if a2 != a || num2.Cmp(num) != 0 { + f.Error("UnmarshalFoo does not match the provided input") + } + }) +} +``` + +### testing.F + +Below is the list of methods on testing.F. + +``` +// Add will add the arguments to the seed corpus for the fuzz target. This +// cannot be called within the Fuzz function. The args must match those in +// the Fuzz function. +func (f *F) Add(args ...interface{}) + +// Cleanup registers a function to be called when the fuzz target completes. +func (f *F) Cleanup(fn func()) + +// Error marks the arguments as having failed the test, but continues execution +// for that set of arguments. +func (f *F) Error(args ...interface{}) + +// Errorf behaves the same as Error but formats its arguments according to the +// format, analogous to Printf. +func (f *F) Errorf(format string, args ...interface{}) + +// Fatal marks the arguments as having failed the test and stops its execution +// by calling runtime.Goexit (which then runs all deferred calls in the current +// goroutine). The fuzz target keeps executing with the next set of arguments. +func (f *F) Fatal(args ...interface{}) + +// Fatalf behaves the same as Fatal but formats its arguments according to the +// format, analogous to Printf. 
+func (f *F) Fatalf(format string, args ...interface{}) + +// Fuzz runs the fuzz function with the target. It runs fn in a separate +// goroutine. Only one call to Fuzz is allowed per fuzz target, and any +// subsequent calls will panic. It only returns once all arguments have been +// passed to the fuzz function. +func (f *F) Fuzz(fn interface{}) + +// Helper marks the calling function as a helper function. +func (f *F) Helper() + +// Log formats its arguments using default formatting, analogous to Println, +// and records the text in the error log. +func (f *F) Log(args ...interface{}) + +// Logf formats its arguments according to the format, analogous to Printf, and +// records the text in the error log. +func (f *F) Logf(format string, args ...interface{}) + +// Name returns the name of the running fuzz target. +func (f *F) Name() string + +// Parallel signals that multiple instances of the Fuzz function can be run in +// parallel on separate goroutines. This must be called within an f.Fuzz +// function. +func (f *F) Parallel() + +// Skip marks the test as having been skipped and stops its execution by +// calling runtime.Goexit. Skip must be called before the Fuzz function. +func (f *F) Skip() + +// BadInput indicates that the input has failed some pre-condition, and the +// rest of the test should be skipped. The args will not be added to the +// corpus, nor will they be considered a crasher. +func (f *F) BadInput() +``` + +### Fuzz function + +A fuzz target has two main sections: 1) initializing and seeding the corpus +and 2) the Fuzz function which is executed for every seed or generated corpus +entry. + +1. The corpus generation is done first, and builds a seed corpus by calling + `f.Add(...)` with interesting input values for the fuzz test. + This should be fairly quick, thus able to run before the fuzz testing + begins, every time it’s run. These inputs are run by default with `go test`. +1. The `f.Fuzz(...)` function is executed with the provided seed corpus. 
+ If this target is being fuzzed, then new inputs of the provided argument + types will be continuously tested against the `f.Fuzz(...)` function. + +The arguments to `f.Add(...)` and the function in `f.Fuzz(...)` must have the same +types within the target, and there must be at least one argument specified. +This will be ensured by a vet check. +Fuzzing of built-in types (e.g. simple types, maps, arrays) and types which +implement the BinaryMarshaler and TextMarshaler interfaces is supported. + +Interfaces, functions, and channels are not appropriate types to fuzz, so will +never be supported. + +### Seed Corpus + +The **seed corpus** is the user-specified set of inputs to a fuzz target which +will be run by default with go test. +These should be composed of meaningful inputs to test the behavior of the +package, as well as a set of regression inputs for any newly discovered bugs +identified by the fuzzing engine. +This set of inputs is also used to “seed” the corpus used by the fuzzing engine +when mutating inputs to discover new code coverage. + +The seed corpus can be populated programmatically using `f.Add` within the +fuzz target. +Programmatic seed corpuses make it easy to add new entries when support for new +things is added (for example, adding a new key type to a key parsing function), +saving the mutation engine a lot of work. +They are also clearer for the developer, since they break the build when +something changes. + +The fuzz target will always look in the package’s testdata/ directory for an +existing seed corpus to use as well, if one exists. +This seed corpus will be in a directory of the form `testdata/<target_name>`, +with a file for each unit that can be unmarshaled for testing. 
+ +_Examples:_ + +1: A fuzz target’s `f.Fuzz` function takes three arguments + +``` +f.Fuzz(func(a string, b myStruct, num *big.Int) {...}) + +type myStruct struct { + A, B string + num int +} +``` + +In this example, string is a built-in type, so can be decoded directly. +`*big.Int` implements `UnmarshalText`, so can also be unmarshaled directly. +However, `myStruct` does not implement `UnmarshalBinary` or `UnmarshalText` so +the struct is pieced together recursively from its exported fields. That would +mean two sets of bytes will be written for this type, one for each of A and B. +In total, four files would be written, and four inputs can be mutated when +fuzzing. + +2: A fuzz target’s `f.Fuzz` function takes a single `[]byte` + +``` +f.Fuzz(func(b []byte) {...}) +``` + +This is the typical “non-structured fuzzing” approach. +There is only one set of bytes to be provided by the mutator, so only one file +will be written. + +### Fuzzing Engine and Mutator + +A new **coverage-guided fuzzing engine**, written in Go, will be built. +This fuzzing engine will be responsible for using compiler instrumentation to +understand coverage information, generating test arguments with a mutator, and +maintaining the corpus. + +The **mutator** is responsible for working with a generator to mutate bytes to +be used as input to the fuzz target. + +Take the following `f.Fuzz` arguments as an example. + +``` + A string // N bytes + B int64 // 8 bytes + Num *big.Int // M bytes +``` + +A generator will provide some bytes for each type, where the number of bytes +could be constant (e.g. 8 bytes for an int64) or variable (e.g. N bytes for a +string, likely with some upper bound). + +For constant-length types, the number of bytes can be hard-coded into the +fuzzing engine, making generation simpler. + +For variable-length types, the mutator is responsible for varying the number of +bytes requested from the generator. 
+ +These bytes then need to be converted to the types used by the `f.Fuzz` +function. +The string and other built-in types can be decoded directly. +For other types, this can be done using either +<code>[UnmarshalBinary](https://pkg.go.dev/encoding?tab=doc#BinaryUnmarshaler)</code> +or +<code>[UnmarshalText](https://pkg.go.dev/encoding?tab=doc#TextUnmarshaler)</code> +if implemented on the type. +If building a struct, it can also build exported fields recursively as needed. + +#### Fuzzing engine managed corpus + +An on disk corpus will be managed by the fuzzing engine and will live outside +the module. +New items can be added to this corpus in several ways, e.g. as part of the seed +corpus, or by the fuzzing engine (e.g. because of new code coverage). + +The details of how the corpus is built and processed should be unimportant to +users. +This should be a technical detail that developers don’t need to understand in +order to seed a corpus or write a fuzz target. +Any existing files that a developer wants to include in the fuzz test should be +added to the seed corpus directory, `testdata/<target_name>`. + + +#### Minification + Pruning + +Corpus entries will be minified to the smallest input that causes the failure +where possible, and pruned wherever possible to remove corpus entries that don’t +add additional coverage. +If a developer manually adds input files to the corpus directory, the fuzzing +engine may change the file names in order to help with this. + +### Crashers + +A **crasher** is a panic found within `f.Fuzz(...)`, a race condition, a call to +`Fatal`, or a call to `Error`. +By default, the fuzz target will stop after the first crasher is found, and a +crash report will be provided. +Crash reports will include the inputs that caused the crash and the resulting +error message or stack trace. +The crasher inputs will be written to the package's testdata/ directory as a +seed corpus entry. 
+ +Since this crasher is added to testdata/, which will then be run by default as +part of the seed corpus for the fuzz target, this can act as a test for the new +failure. +A user experience may look something like this: + +1. A user runs `go test -fuzz=FuzzFoo`, and a crasher is found while fuzzing. +1. The arguments that caused the crash are added to a testdata directory within + the package automatically. +1. A subsequent run of `go test` (even without `-fuzz=FuzzFoo`) will then hit + this newly discovered failing condition, and continue to fail until the bug + is fixed. + +### Go command + +Fuzz testing will only be supported in module mode, and if run in GOPATH mode +the fuzz targets will be ignored. + +Fuzz targets will be in *_test.go files, and can be in the same file as Test and +Benchmark targets. +These test files can exist wherever *_test.go files can currently live, and do +not need to be in any fuzz-specific directory or have a fuzz-specific file name +or build tag. + +A new environment variable will be added, `$GOFUZZCACHE`, which will default to +an appropriate cache directory on the developer's machine. +This directory will hold the mutator-managed corpus. +For example, the corpus for each fuzz target will be managed in a subdirectory +called `<module_name>/<pkg>/@corpus/<target_name>` where `<module_name>` will +follow module case-encoding and include the major version. + +The default behavior of `go test` will be to build and run the fuzz targets +using the seed corpus only. +No special instrumentation would be needed, the mutation engine would not run, +and the test can be cached as usual. +This default mode **will not** run the existing on disk corpus against the fuzz +target. +This is to allow for reproducibility and cacheability for `go test` executions +by default. + +In order to run a fuzz target with the mutation engine, `-fuzz` will take a +regexp which must match only one fuzz target. 
+In this situation, only the fuzz target will run (ignoring all other tests). +Only one package is allowed to be tested at a time in this mode. +The following flags will be added or have modified meaning: + +``` +-fuzz name + Run the fuzz target with the given regexp. Must match at most one fuzz + target. +-keepfuzzing + Keep running the target if a crasher is found. (default false) +-parallel + Allow parallel execution of f.Fuzz functions that call f.Parallel. + The value of this flag is the maximum number of f.Fuzz functions to run + simultaneously within the given fuzz target. (default GOMAXPROCS) +-race + Enable data race detection while fuzzing. (default true) +``` + +`go test` will not respect `-p` when running with `-fuzz`, as it doesn't make +sense to fuzz multiple packages at the same time. + +## Open issues and future work + +### Naming scheme for corpus files + +There are several naming schemes for the corpus files which may be appropriate, +and no final decision has been made. + +Take the following example: + +``` +f.Fuzz(func(a string, b myStruct, num *big.Int) {...}) + +type myStruct struct { + A, B string + num int +} +``` + +For two corpus entries, this could be structured as follows: +* 0000001.string +* 0000001.myStruct.string +* 0000001.myStruct.string +* 0000001.big_int +* 0000002.string +* 0000002.myStruct.string +* 0000002.myStruct.string +* 0000002.big_int + + +### Options + +There are options that developers often need to fuzz effectively and safely. +These options will likely make the most sense on a target-by-target basis, +rather than as a `go test` flag. +Which options to make available still needs some investigation. +The options may be set in a few ways. + +As a struct: + +``` +func FuzzFoo(f *testing.F) { + f.FuzzOpts(&testing.FuzzOpts{ + MaxInputSize: 1024, + }) + f.Fuzz(func(a string) { + ... + }) +} +``` + +Or individually as: + +``` +func FuzzFoo(f *testing.F) { + f.MaxInputSize(1024) + f.Fuzz(func(a string) { + ... 
+ }) +} +``` + +### Dictionaries + +Support accepting [dictionaries](https://llvm.org/docs/LibFuzzer.html#id31) when +seeding the corpus to guide the fuzzer. + +### Instrument specific packages only + +We might need a way to specify to instrument only some packages for coverage, +but there isn’t enough data yet to be sure. +One example use case for this would be a fuzzing engine which is spending too +much time discovering coverage in the encoding/json parser, when it should +instead be focusing on coverage for some intended package. + +There are also questions about whether or not this is possible with the current +compiler instrumentation available. +By runtime, the fuzz target will have already been compiled, so recompiling to +leave out (or only include) certain packages may not be feasible.
\ No newline at end of file
