diff options
| author | Damien Neil <dneil@google.com> | 2025-08-18 15:05:41 -0700 |
|---|---|---|
| committer | Austin Clements <austin@google.com> | 2025-08-27 08:57:52 -0700 |
| commit | 7e781a80ef493a16a6f04f7a49dfd1e8fdbfd8a9 (patch) | |
| tree | d2c0dbd72c4770fd6926d2a576953a937ba55811 /_content | |
| parent | f1d710aa6c79f6d9411faa11170b979c58544bf2 (diff) | |
| download | go-x-website-7e781a80ef493a16a6f04f7a49dfd1e8fdbfd8a9.tar.xz | |
_content/blog/testing-time.md: new blog post
Change-Id: I0312e2c60aa7d71e051b8dec788c28ae7ad648ad
Reviewed-on: https://go-review.googlesource.com/c/website/+/697177
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Austin Clements <austin@google.com>
Diffstat (limited to '_content')
| -rw-r--r-- | _content/blog/testing-time.md | 995 |
1 files changed, 995 insertions, 0 deletions
diff --git a/_content/blog/testing-time.md b/_content/blog/testing-time.md new file mode 100644 index 00000000..e386df55 --- /dev/null +++ b/_content/blog/testing-time.md @@ -0,0 +1,995 @@ +--- +title: Testing Time (and other asyncronicities) +date: 2025-08-26 +by: +- Damien Neil +tags: +- concurrency +- testing +- synctest +summary: A discussion of testing asyncronous code + and an exploration of the `testing/synctest` package. + Based on the GopherCon Europe 2025 talk with the same title. +--- + +In Go 1.24, we introduced the [`testing/synctest`](/pkg/testing/synctest) +package as an experimental package. +This package can significantly simplify writing tests for concurrent, +asynchronous code. +In Go 1.25, the `testing/synctest` package has graduated from experiment +to general availability. + +What follows is the blog version of my talk on +the [`testing/synctest`](/pkg/testing/synctest) package +at GopherCon Europe 2025 in Berlin. + +## What is an asynchronous function? + +A synchronous function is pretty simple. +You call it, it does something, and it returns. + +An asynchronous function is different. +You call it, it returns, and then it does something. + +As a concrete, if somewhat artificial, example, +the following `Cleanup` function is synchronous. +You call it, it deletes a cache directory, and it returns. + +``` +func (c *Cache) Cleanup() { + os.RemoveAll(c.cacheDir) +} +``` + +`CleanupInBackground` is an asynchronous function. +You call it, it returns, and the cache directory is deleted...sooner or later. + +``` +func (c *Cache) CleanupInBackground() { + go os.RemoveAll(c.cacheDir) +} +``` + +Sometimes an asynchronous function does something in the future. +For example, the `context` package's `WithDeadline` function +returns a context which will be canceled in the future. + +``` +package context + +// WithDeadline returns a derived context +// with a deadline no later than d. +func WithDeadline(parent Context, d time.Time) (Context, CancelFunc) +``` + +When I talk about testing concurrent code, +I mean testing these sorts of asynchronous operations, +both ones which use real time and ones which do not. + +## Tests + +A test verifies that a system behaves as we expect. +There's a lot of terminology describing types +of test--unit tests, integration tests, and so on--but +for our purposes here every kind of test reduces to three steps: + +1. Set up some initial conditions. +2. Tell the system under test to do something. +3. Verify the result. + +Testing a synchronous function is straightforward: + +- You call the function; +- the function does something and returns; +- you verify the result. + +Testing an asyncronous function, however, is tricky: + +- You call the function; +- it returns; +- you wait for it to finish doing whatever it does; +- you verify the result. + +If you don't wait for the correct amount of time, +you may find yourself verifying the result of an operation that hasn't happened yet +or has only happened partially. +This never ends well. + +Testing an asynchronous function is especially tricky +when you want to assert that something has *not* happened. +You can verify that the thing has not happened yet, +but how do you know with certainty that it isn't going to happen later? + +## An example + +To make things a little more concrete, +let's work with a real-world example. +Consider the `context` package's `WithDeadline` function again. + +``` +package context + +// WithDeadline returns a derived context +// with a deadline no later than d. +func WithDeadline(parent Context, d time.Time) (Context, CancelFunc) +``` + +There are two obvious tests to write for `WithDeadline`. + +1. The context is *not* canceled *before* the deadline. +2. The context *is* canceled *after* the deadline. + +Let's write a test. + +To keep the amount of code marginally less overwhelming, +we'll just test the second case: +After the deadline expires, the context is canceled. + +``` +func TestWithDeadlineAfterDeadline(t *testing.T) { + deadline := time.Now().Add(1 * time.Second) + ctx, _ := context.WithDeadline(t.Context(), deadline) + + time.Sleep(time.Until(deadline)) + + if err := ctx.Err(); err != context.DeadlineExceeded { + t.Fatalf("context not canceled after deadline") + } +} +``` + +This test is simple: + +1. Use `context.WithDeadline` to create a context with a deadline one second in the future. +2. Wait until the deadline. +3. Verify that the context is canceled. + +Unfortunately, this test obviously has a problem. +It sleeps until the exact moment the deadline expires. +Odds are good that the context has not been canceled yet by the time we examine it. +At best, this test will be very flaky. + +Let's fix it. + +``` +time.Sleep(time.Until(deadline) + 100*time.Millisecond) +``` + +We can sleep until 100ms after the deadline. +A hundred milliseconds is an eternity in computer terms. +This should be fine. + +Unfortunately, we still have two problems. + +First, this test takes 1.1 seconds to execute. +That's slow. +This is a simple test. +It should execute in milliseconds at the most. + +Second, this test is flaky. +A hundred milliseconds is an eternity in computer terms, +but on an overloaded continuous integration (CI) system +it isn't unusual to see pauses much longer than that. +This test will probably pass consistently on a developer's workstation, +but I would expect occasional failures in a CI system. + +## Slow or flaky: Pick two + +Tests that use real time are always slow or flaky. +Usually they're both. +If the test waits longer than necessary, it is slow. +If it doesn't wait long enough, it is flaky. +You can make the test more slow and less flaky, +or less slow and more flaky, +but you can't make it fast and reliable. + +We have a lot of tests in the `net/http` package which use this approach. +They're all slow and/or flaky, which is what started me down the road +which brings us here today. + +## Write synchronous functions? + +The simplest way to test an asyncronous function is not to do it. +Synchronous functions are easy to test. +If you can transform an asyncronous function into a synchronous one, +it will be easier to test. + +For example, if we consider our cache cleanup functions from earlier, +the synchronous `Cleanup` is obviously better than +the asynchronous `CleanupInBackground`. +The synchronous function is easier to test, +and the caller can easily start a new goroutine to run it in the background if needed. +As a general rule, +the higher up the call stack you can push your concurrency, +the better. + +``` +// CleanupInBackground is hard to test. +cache.CleanupInBackground() + +// Cleanup is easy to test, +// and easy to run in the background when needed. +go cache.Cleanup() +``` + + +Unfortunately, this sort of transformation isn't always possible. +For example, `context.WithDeadline` is an inherently asynchronous API. + +## Instrument code for testability? + +A better approach is to make our code more testable. + +Here's an example of what this might look like for our `WithDeadline` test: + +``` +func TestWithDeadlineAfterDeadline(t *testing.T) { + clock := fakeClock() + timeout := 1 * time.Second + deadline := clock.Now().Add(timeout) + + ctx, _ := context.WithDeadlineClock( + t.Context(), deadline, clock) + + clock.Advance(timeout) + context.WaitUntilIdle(ctx) + if err := ctx.Err(); err != context.DeadlineExceeded { + t.Fatalf("context not canceled after deadline") + } +} +``` + +Instead of using real time, we use a fake time implementation. +Using fake time avoids unnecessarily slow tests, +because we never wait around doing nothing. +It also helps avoid test flakiness, +since the current time only changes when the test adjusts it. + +There are various fake time packages out there, +or you can write your own. + +To use fake time, we need to modify our API to accept a fake clock. +I've added a `context.WithDeadlineClock` function here, +that takes an additional clock parameter: + +``` +ctx, _ := context.WithDeadlineClock( + t.Context(), deadline, clock) +``` + +When we advance our fake clock, we have a problem. +Advancing time is an asynchrounous operation. +Sleeping goroutines may wake up, +timers may send on their channels, +and timer functions may run. +We need to wait for that work to finish before we can test +the expected behavior of the system. + +I've added a `context.WaitUntilIdle` function here, +which waits for any background work related to a context to complete: + +``` +clock.Advance(timeout) +context.WaitUntilIdle(ctx) +``` + +This is a simple example, but it demonstrates +the two fundamental principles of writing testable concurrent code: + +1. Use fake time (if you use time). +2. Have some way to wait for quiescence, + which is a fancy way of saying + "all background activity has stopped and the system is stable". + +The interesting question, of course, is how we do this. +I've glossed over the details in this example because +there are some big downsides to this approach. + +It's hard. +Using a fake clock isn't difficult, +but identifying when background concurrent work is finished +and it is safe to examine the state of the system is. + +Your code becomes less idiomatic. +You can't use standard time package functions. +You need to be very careful to keep track of everything happening +in the background. + +You need to instrument not just your code, +but any other packages you use. +If you call any third-party concurrent code, +you're probably out of luck. + +Worst of all, it can be just about impossible +to retrofit this approach into an existing codebase. + +I attempted to apply this approach to Go's HTTP implementation, +and while I had some success at doing so in places, +the HTTP/2 server simply defeated me. +In particular, adding instrumentation to detect quiescence +without extensive rewriting proved infeasible, +or at least beyond my skills. + +## Horrible runtime hacks? + +What do we do if we can't make our code testable? + +What if instead of instrumenting our code, +we had a way to observe the behavior of the uninstrumented system? + +A Go program consists of a set of goroutines. +Those goroutines have states. +We just need to wait until all the goroutines have stopped running. + +Unfortunately, the Go runtime doesn't provide any way to tell what +those goroutines are doing. Or does it? + +The `runtime` package contains a function that gives us a stack trace +for every running goroutine, as well as their states. +This is text intended for human consumption, +but we could parse that output. +Could we use this to detect quiescence? + +Now, of course this is a terrible idea. +There is no guarantee that the format of these stack traces will be stable over time. +You should not do this. + +I did it. +And it worked. +In fact, it worked surprisingly well. + +With a simple implementation of a fake clock, +a small amount of instrumentation to keep track of what goroutines were part of the test, +and some horrifying abuse of `runtime.Stack`, +I finally had a way to write fast, reliable tests for the `http` package. + +The underlying implementation of these tests was horrible, +but it demonstrated that there was a useful concept here. + +## A better way + +Go may have built-in concurrency, +but testing programs that use that concurrency is hard. + +We're faced with an unfortunate choice: +We can write simple, idiomatic code, but it will be impossible to test quickly and reliably; +or we can write testable code, but it will be complicated and unidiomatic. + +So we asked ourselves what we can do to make this better. + +As we saw earlier, the two fundamental features required to write testable concurrent code are +fake time and a way to wait for quiescence. + +We need a better way to to wait for quiescence. +We should be able to ask the runtime when background goroutines have finished their work. +We also want to be able to limit the scope of this query to a single test, +so that unrelated tests do not interfere with each other. + +We also need better support for testing programs using fake time. + +It isn't hard to make a fake time implementation, +but code which uses an implementation like this is not idiomatic. + +Idiomatic code will use a `time.Timer`, +but it is not possible to create a fake `Timer`. +We asked ourselves whether we should provide a way for tests to +create a fake `Timer`, where the test controls when the timer fires. + +A testing implementation of time needs to define an entirely new version of the `time` package, +and pass that to every function that operates on time. +We considered whether we should define a common time interface, +in the same way that `net.Conn` is a common interface describing a network connection. + +What we realized, however, is that unlike network connections, +there is only one possible implementation of fake time. +A fake network may want to introduce latency or errors. +Time, in contrast, does only one thing: It moves forward. +Tests need to control the rate at which time progresses, +but a timer scheduled to fire ten seconds in the future +should always fire ten (possibly fake) seconds in the future. + +In addition, we don't want to upset the entire Go ecosystem. +Most programs today use functions in the time package. +We want to keep those programs not only working, +but idiomatic. + +This led to the conclusion that what we need is a way for a test to +tell the time package to use a fake clock, +in much the same way that the Go playground uses a fake clock. +Unlike the playground, +we need to limit the scope of that change to a single test. +(It may not be obvious that the Go playground uses a fake clock, +because we turn any fake delays into real delays on the front end, +but it does.) + +## The `synctest` experiment + +And so in Go 1.24 we introduced [`testing/synctest`](/pkg/testing/synctest), +a new, experimental package to simplify testing concurrent programs. +Over the months following the release of Go 1.24 +we gathered feedback from early adopters. +(Thank you to everyone who tried it out!) +We made a number of changes to address problems and shortcomings. +And now, in Go 1.25, we've released the `testing/synctest` package +as part of the standard library. + +It lets you run a function in what we're calling a "bubble". +Within the bubble, the time package uses a fake clock, +and the `synctest` package provides a function to wait for the bubble to quiesce. + +## The `synctest` package + +The `synctest` package contains just two functions. + +``` +package synctest + +// Test executes f in a new bubble. +// Goroutines in the bubble use a fake clock. +func Test(t *testing.T, f func(*testing.T)) + +// Wait waits for background activity in the bubble to complete. +func Wait() +``` + +[`Test`](/pkg/testing/synctest#Test) executes a function in a new bubble. + +[`Wait`](/pkg/testing/synctest#Wait) blocks until every goroutine in the bubble is blocked +waiting for some other goroutine in the bubble. +We call that state being "durably blocked". + +## Testing with synctest + +Let's look at an example of synctest in action. + +``` +func TestWithDeadlineAfterDeadline(t *testing.T) { + synctest.Test(t, func(t *testing.T) { + deadline := time.Now().Add(1 * time.Second) + ctx, _ := context.WithDeadline(t.Context(), deadline) + + time.Sleep(time.Until(deadline)) + synctest.Wait() + if err := ctx.Err(); err != context.DeadlineExceeded { + t.Fatalf("context not canceled after deadline") + } + }) +} +``` + +This might look a little familiar. +This is the naïve test for `context.WithDeadline` that we looked at earlier. +The only changes are that we've wrapped the test in +a `synctest.Test` call to execute it in a bubble +and we have added a `synctest.Wait` call. + +This test is fast and reliable. +It runs almost instantaneously. +It precisely tests the expected behavior of the system under test. +It also requires no modification of the `context` package. + +Using the `synctest` package, +we can write simple, idiomatic code +and test it reliably. + +This is a very simple example, of course, +but this is a real test of real production code. +If `synctest` had existed when the `context` package was written, +we would have had a much easier time writing tests for it. + +## Time + +Time in the bubble behaves much the same as the fake time in the Go playground. +Time starts at midnight, January 1, 2000 UTC. +If you need to run a test at some specific point in time for some reason, +you can just sleep until then. + +``` +func TestAtSpecificTime(t *testing.T) { + synctest.Test(t, func(t *testing.T) { + // 2000-01-01 00:00:00 +0000 UTC + t.Log(time.Now().In(time.UTC)) + + // This does not take 25 years. + time.Sleep(time.Until( + time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC))) + + // 2025-01-01 00:00:00 +0000 UTC + t.Log(time.Now().In(time.UTC)) + }) +} +``` + +Time only passes when every goroutine in the bubble has blocked. +You can think of the bubble as simulating an infinitely fast computer: +Any amount of computation takes no time. + +The following test will always print that zero seconds +of fake time have elapsed since the start of the test, +no matter how much real time has passed. + +``` +func TestExpensiveWork(t *testing.T) { + synctest.Test(t, func(t *testing.T) { + start := time.Now() + for range 1e7 { + // do expensive work + } + t.Log(time.Since(start)) // 0s + }) +} +``` + +In the next test, the `time.Sleep` call will return immediately, +rather than waiting for ten real seconds. +The test will always print that exactly ten fake seconds +have passed since the start of the test. + +``` +func TestSleep(t *testing.T) { + synctest.Test(t, func(t *testing.T) { + start := time.Now() + time.Sleep(10 * time.Second) + t.Log(time.Since(start)) // 10s + }) +} +``` + +## Waiting for quiescence + +The [`synctest.Wait`](/pkg/testing/synctest#Wait) function +lets us wait for background activity to complete. + +``` +func TestWait(t *testing.T) { + synctest.Test(t, func(t *testing.T) { + done := false + go func() { + done = true + }() + + // Wait for the above goroutine to finish. + synctest.Wait() + + t.Log(done) // true + }) +} +``` + +If we didn't have the `Wait` call in the above test, +we would have a race condition: +One goroutine modifies the `done` variable +while another reads from it without synchronization. +The `Wait` call provides that synchronization. + +You may be familiar with the `-race` test flag, +which enables the data race detector. +The race detector is aware of the synchronization provided by `Wait`, +and does not complain about this test. +If we forgot the `Wait` call, the race detector would correctly complain. + +The `synctest.Wait` function provides synchronization, +but the passage of time does not. + +In the next example, one goroutine writes to the `done` variable +while another sleeps for one nanosecond before reading from it. +It should be obvious that when run with a real clock outside a synctest bubble, +this code contains a race condition. +Inside a synctest bubble, +while the fake clock ensures that the goroutine completes before `time.Sleep` returns, +the race detector will still report the data race, +just like it would if this code were run outside a synctest bubble. + +``` +func TestTimeDataRace(t *testing.T) { + synctest.Test(t, func(t *testing.T) { + done := false + go func() { + done = true // write + }() + + time.Sleep(1 * time.Nanosecond) + + t.Log(done) // read (unsynchronized) + }) +} +``` + + +Adding a `Wait` call provides explicit synchronization and fixes the data race: + +``` +time.Sleep(1 * time.Nanosecond) +synctest.Wait() // synchronize +t.Log(done) // read +``` + +## Example: `io.Copy` + +Taking advantage of the synchronization provided by `synctest.Wait` allows us +to write simpler tests with less explicit synchronization. + +For example, consider this test of [`io.Copy`](/pkg/io#Copy). + +``` +func TestIOCopy(t *testing.T) { + synctest.Test(t, func(t *testing.T) { + srcReader, srcWriter := io.Pipe() + defer srcWriter.Close() + + var dst bytes.Buffer + go io.Copy(&dst, srcReader) + + data := "1234" + srcWriter.Write([]byte("1234")) + synctest.Wait() + + if got, want := dst.String(), data; got != want { + t.Errorf("Copy wrote %q, want %q", got, want) + } + }) +} +``` + +The `io.Copy` function copies data from an `io.Reader` to an `io.Writer`. +You might not immediately think of `io.Copy` as a concurrent function, +since it blocks until the copy has completed. +However, providing data to `io.Copy`'s reader is an asynchronous operation: + +- `Copy` calls the reader's `Read` method; +- `Read` returns some data; +- and the data is written to the writer at a later time. + +In this test, we are verifying that `io.Copy` writes new data to the writer +without waiting to fill its buffer. + +Looking at the test step by step, +we first create an `io.Pipe` to serve as the source `io.Copy` reads from: + +``` +srcReader, srcWriter := io.Pipe() +defer srcWriter.Close() +``` + +We call `io.Copy` in a new goroutine, +copying from the read end of the pipe into a `bytes.Buffer`: + +``` +var dst bytes.Buffer +go io.Copy(&dst, srcReader) +``` + +We write to the other end of the pipe, +and wait for `io.Copy` to handle the data: + +``` +data := "1234" +srcWriter.Write([]byte("1234")) +synctest.Wait() +``` + +Finally, we verify that the destination buffer contains the desired data: + +``` +if got, want := dst.String(), data; got != want { + t.Errorf("Copy wrote %q, want %q", got, want) +} +``` + +We don't need to add a mutex or other synchronization around the destination buffer, +because `synctest.Wait` ensures that it is never accessed concurrently. + +This test demonstrates a few important points. + +Even synchronous functions like `io.Copy`, +which do not perform additional background work after they return, +may exhibit asynchronous behaviors. + +Using `synctest.Wait`, we can test those behaviors. + +Note also that this test does not work with time. +Many asynchronous systems involve time, but not all. + +## Bubble exit + +The `synctest.Test` function waits for all goroutines in the bubble to exit +before returning. +Time stops advancing after the root goroutine (the goroutine started by `Test`) returns. + +In the next example, `Test` waits for the background goroutine to run and exit +before it returns: + +``` +func TestWaitForGoroutine(t *testing.T) { + synctest.Test(t, func(t *testing.T) { + go func() { + // This runs before synctest.Test returns. + }() + }) +} +``` + +In this example, we schedule a `time.AfterFunc` for a time in the future. +The bubble's root goroutine returns before that time is reached, +so the `AfterFunc` never runs: + +``` +func TestDoNotWaitForTimer(t *testing.T) { + synctest.Test(t, func(t *testing.T) { + time.AfterFunc(1 * time.Nanosecond, func() { + // This never runs. + }) + }) +} +``` + +In the next example, we start a goroutine that sleeps. +The root goroutine returns and time stops advancing. +The bubble is now deadlocked, +because `Test` is waiting for all goroutines in the bubble to finish +but the sleeping goroutine is waiting for time to advance. + +``` +func TestDeadlock(t *testing.T) { + synctest.Test(t, func(t *testing.T) { + go func() { + // This sleep never returns and the test deadlocks. + time.Sleep(1 * time.Nanosecond) + }() + }) +} +``` + +## Deadlocks + +The `synctest` package panics when a bubble is deadlocked +due to every goroutine in the bubble being durably blocked on +another goroutine in the bubble. + +``` +--- FAIL: Test (0.00s) +--- FAIL: TestDeadlock (0.00s) +panic: deadlock: main bubble goroutine has exited but blocked goroutines remain [recovered, repanicked] + +goroutine 7 [running]: +(stacks elided for clarity) + +goroutine 10 [sleep (durable), synctest bubble 1]: +time.Sleep(0x1) + /Users/dneil/src/go/src/runtime/time.go:361 +0x130 +_.TestDeadlock.func1.1() + /tmp/s/main_test.go:13 +0x20 +created by _.TestDeadlock.func1 in goroutine 9 + /tmp/s/main_test.go:11 +0x24 +FAIL _ 0.173s +FAIL +``` + +The runtime will print stack traces for every goroutine in the deadlocked bubble. + +When printing the status of a bubbled goroutine, +the runtime indicates when the goroutine is durably blocked. +You can see that the sleeping goroutine in this test is durably blocked. + +## Durable blocking + +"Durably blocking" is a core concept in synctest. + +A goroutine is durably blocked when it is not only blocked, +but when it can only be unblocked by another goroutine in the same bubble. + +When every goroutine in a bubble is durably blocked: + +1. `synctest.Wait` returns. +2. If there is no `synctest.Wait` call in progress, + fake time advances instantly to the next point that will wake a goroutine. +3. If there is no goroutine that can be woken by advancing time, + the bubble is deadlocked and the test fails. + +It is important for us to make a distinction between a goroutine which is merely blocked +and one which is *durably* blocked. +We don't want to declare a deadlock when a goroutine is temporarily blocked on +some event arising outside its bubble. + +Let's look at some ways in which a goroutine can block non-durably. + +### Not durably blocking: I/O (files, pipes, network connections, etc.) + +The most important limitation is that I/O is not durably blocking, +including network I/O. +A goroutine reading from a network connection may be blocked, +but it will be unblocked by data arriving on that connection. + +This is obviously true for a connection to some network service, +but it is also true for a loopback connection, +even when the reader and writer are both in the same bubble. + +When we write data to a network socket, +even a loopback socket, +the data is passed to the kernel for delivery. +There is a period of time between the write system call returning +and the kernel notifying the other side of the connection that data is available. +The Go runtime cannot distinguish between a goroutine blocked waiting for +data that is already in the kernel's buffers +and one blocked waiting for data that will not arrive. + +This means that tests of networked programs using synctest +usually cannot use real network connections. +Instead, they should use an in-memory fake. + +I'm not going to go over the process of creating a fake network here, +but the `synctest` package documentation contains +[a complete worked example](/pkg/testing/synctest#hdr-Example__HTTP_100_Continue) +of testing an HTTP client and server communicating over a fake network. + +### Not durably blocking: syscalls, cgo calls, anything that isn't Go + +Syscalls and cgo calls are not durably blocking. +We can only reason about the state of goroutines executing Go code. + +### Not durably blocking: Mutexes + +Perhaps surprisingly, mutexes are not durably blocking. +This is a decision born of practicality: +Mutexes are often used to guard global state, +so a bubbled goroutine will often need to acquire a mutex held outside its bubble. +Mutexes are highly performance-sensitive, +so adding additional instrumentation to them +risks slowing down non-test programs. + +We can test programs that use mutexes with synctest, +but the fake clock will not advance while a goroutine is blocked on mutex acquisition. +This hasn't posed a problem in any case we've encountered, +but it is something to be aware of. + +### Durably blocking: `time.Sleep` + +So what is durably blocking? + +`time.Sleep` is obviously durable, +since time can only advance when every goroutine in the bubble is durably blocked. + +### Durably blocking: send or receive on channels created in the same bubble + +Channel operations on channels created within the same bubble are durable. + +We make a distinction between bubbled channels (created in a bubble) +and unbubbled channels (created outside any bubble). +This means that a function using a global channel for synchronization, +for example to control access to a globally cached resource, +can be safely called from within a bubble. + +Trying to operate on a bubbled channel from outside its bubble is an error. + +### Durably blocking: `sync.WaitGroup` belonging to the same bubble + +We also associate `sync.WaitGroup`s with bubbles. + +`WaitGroup` doesn't have a constructor, +so we make the association with the bubble implicitly on the first call to `Go` or `Add`. + +As with channels, +waiting on a `WaitGroup` belonging to the same bubble is durably blocking, +and waiting on one from outside the bubble is not. +Calling `Go` or `Add` on a `WaitGroup` belonging to a different bubble is an error. + +### Durably blocking: `sync.Cond.Wait` + +Waiting on a `sync.Cond` is always durably blocking. +Waking up a goroutine waiting on a `Cond` in a different bubble is an error. + +### Durably blocking: `select{}` + +Finally, an empty select is durably blocking. +(A select with cases is durably blocking if all the operations in it are so.) + +That's the complete list of durably blocking operations. +It isn't very long, +but it's enough to handle almost all real-world programs. + +The rule is that a goroutine is durably blocked when it is blocked, +and we can guarantee that it can only be unblocked +by another goroutine in its bubble. + +In cases where it is possible to attempt to wake a bubbled goroutine from outside its bubble, +we panic. +For example, it is an error to operate on a bubbled channel from outside its bubble. + +## Changes from 1.24 to 1.25 + +We released an experimental version of the `synctest` package in Go 1.24. +To ensure that early adopters were aware of the experimental status of the package, +you needed to set a GOEXPERIMENT flag to make the package visible. + +The feedback we received from those early adopters was invaluable, +both in demonstrating that the package is useful +and in uncovering areas where the API needed work. + +These are some of the changes made between the experimental version +and the version released in Go 1.25. + +### Replaced Run with Test + +The original version of the API created a bubble with a `Run` function: + +``` +// Run executes f in a new bubble. +func Run(f func()) +``` + +It became clear that we needed a way to create a `*testing.T` +that is scoped to a bubble. +For example, `t.Cleanup` should run cleanup functions in the same bubble +they are registered in, not after the bubble exits. +We renamed `Run` to `Test` and made it create a `T` scoped to the lifetime +of the new bubble. + +### Time stops when a bubble's root goroutine returns + +We originally continued to advance time within a bubble for so long as +the bubble contained any goroutines waiting for future events. +This turned out to be very confusing when a long-lived goroutine never returned, +such as a goroutine reading forever from a `time.Ticker`. +We now stop advancing time when a bubble's root goroutine returns. +If the bubble is blocked waiting for time to advance, +this results in a deadlock and a panic which can be analyzed. + +### Removed cases where "durable" wasn't + +We cleaned up the definition of "durably blocking". +The original implementation had cases where a durably blocked goroutine could +be unblocked from outside the bubble. +For example, channels recorded whether they were created in a bubble, +but not which in which bubble they were created, +so one bubble could unblock a channel in a different bubble. +The current implementation contains no cases we know of +where a durably blocked goroutine can be unblocked from outside its bubble. + +### Better stack traces + +We made improvements to the information printed in stack traces. +When a bubble deadlocks, we by default now only print stacks for the goroutines in tha bubble. +Stack traces also clearly indicate which goroutines in a bubble are durably blocked. + +### Randomized events happening at the same time + +We made improvements to the randomization of events happening at the same time. +Originally, timers scheduled to fire at the same instant +would always do so in the order they were created. +This ordering is now randomized. + +## Future work + +We're pretty happy with the synctest package at the moment. + +Aside from the inevitable bug fixes, +we don't currently expect any major changes to it in the future. +Of course, with wider adoption it is always possible that we'll discover something +that needs doing. + +One possible area of work is to improve the detection of durably blocked goroutines. +It would be nice if we could make mutex operations durably blocking, +with a restriction that a mutex acquired in a bubble must be released +from within the same bubble. + +Testing networked code with synctest requires a fake network. +The `net.Pipe` function can create a fake `net.Conn`, +but there is currently no standard library function that creates +a fake `net.Listener` or `net.PacketConn`. +In addition, the `net.Conn` returned by `net.Pipe` is synchronous--every write blocks +until a read consumes the data--which is not representative of real network behavior. +Perhaps we should add a good fake implementations of common network interfaces +to the standard library. + +## Conclusion + +That's the `synctest` package. + +I can't say that it makes testing concurrent code simple, +because concurrency is never simple. +What it does is let you write the simplest possible concurrent code, +using idiomatic Go, +and the standard time package, +and then write fast, reliable tests for it. + +I hope you find it useful. |
