_content/blog/testing-time.md: new blog post

Change-Id: I0312e2c60aa7d71e051b8dec788c28ae7ad648ad Reviewed-on: https://go-review.googlesource.com/c/website/+/697177 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Austin Clements <austin@google.com>
author: Damien Neil <dneil@google.com> 2025-08-18 15:05:41 -0700
committer: Austin Clements <austin@google.com> 2025-08-27 08:57:52 -0700
commit: 7e781a80ef493a16a6f04f7a49dfd1e8fdbfd8a9 (patch)
tree: d2c0dbd72c4770fd6926d2a576953a937ba55811 /_content/blog
parent: f1d710aa6c79f6d9411faa11170b979c58544bf2 (diff)
download: go-x-website-7e781a80ef493a16a6f04f7a49dfd1e8fdbfd8a9.tar.xz
1 files changed, 995 insertions, 0 deletions
diff --git a/_content/blog/testing-time.md b/_content/blog/testing-time.md
new file mode 100644
index 00000000..e386df55
--- /dev/null
+++ b/_content/blog/testing-time.md
@@ -0,0 +1,995 @@
+---
+title: Testing Time (and other asyncronicities)
+date: 2025-08-26
+by:
+- Damien Neil
+tags:
+- concurrency
+- testing
+- synctest
+summary: A discussion of testing asyncronous code
+  and an exploration of the `testing/synctest` package.
+  Based on the GopherCon Europe 2025 talk with the same title.
+---
+
+In Go 1.24, we introduced the [`testing/synctest`](/pkg/testing/synctest)
+package as an experimental package.
+This package can significantly simplify writing tests for concurrent,
+asynchronous code.
+In Go 1.25, the `testing/synctest` package has graduated from experiment
+to general availability.
+
+What follows is the blog version of my talk on
+the [`testing/synctest`](/pkg/testing/synctest) package
+at GopherCon Europe 2025 in Berlin.
+
+## What is an asynchronous function?
+
+A synchronous function is pretty simple.
+You call it, it does something, and it returns.
+
+An asynchronous function is different.
+You call it, it returns, and then it does something.
+
+As a concrete, if somewhat artificial, example,
+the following `Cleanup` function is synchronous.
+You call it, it deletes a cache directory, and it returns.
+
+```
+func (c *Cache) Cleanup() {
+    os.RemoveAll(c.cacheDir)
+}
+```
+
+`CleanupInBackground` is an asynchronous function.
+You call it, it returns, and the cache directory is deleted...sooner or later.
+
+```
+func (c *Cache) CleanupInBackground() {
+    go os.RemoveAll(c.cacheDir)
+}
+```
+
+Sometimes an asynchronous function does something in the future.
+For example, the `context` package's `WithDeadline` function
+returns a context which will be canceled in the future.
+
+```
+package context
+
+// WithDeadline returns a derived context
+// with a deadline no later than d.
+func WithDeadline(parent Context, d time.Time) (Context, CancelFunc)
+```
+
+When I talk about testing concurrent code,
+I mean testing these sorts of asynchronous operations,
+both ones which use real time and ones which do not.
+
+## Tests
+
+A test verifies that a system behaves as we expect.
+There's a lot of terminology describing types
+of test--unit tests, integration tests, and so on--but
+for our purposes here every kind of test reduces to three steps:
+
+1. Set up some initial conditions.
+2. Tell the system under test to do something.
+3. Verify the result.
+
+Testing a synchronous function is straightforward:
+
+- You call the function;
+- the function does something and returns;
+- you verify the result.
+
+Testing an asyncronous function, however, is tricky:
+
+- You call the function;
+- it returns;
+- you wait for it to finish doing whatever it does;
+- you verify the result.
+
+If you don't wait for the correct amount of time,
+you may find yourself verifying the result of an operation that hasn't happened yet
+or has only happened partially.
+This never ends well.
+
+Testing an asynchronous function is especially tricky
+when you want to assert that something has *not* happened.
+You can verify that the thing has not happened yet,
+but how do you know with certainty that it isn't going to happen later?
+
+## An example
+
+To make things a little more concrete,
+let's work with a real-world example.
+Consider the `context` package's `WithDeadline` function again.
+
+```
+package context
+
+// WithDeadline returns a derived context
+// with a deadline no later than d.
+func WithDeadline(parent Context, d time.Time) (Context, CancelFunc)
+```
+
+There are two obvious tests to write for `WithDeadline`.
+
+1. The context is *not* canceled *before* the deadline.
+2. The context *is* canceled *after* the deadline.
+
+Let's write a test.
+
+To keep the amount of code marginally less overwhelming,
+we'll just test the second case:
+After the deadline expires, the context is canceled.
+
+```
+func TestWithDeadlineAfterDeadline(t *testing.T) {
+    deadline := time.Now().Add(1 * time.Second)
+    ctx, _ := context.WithDeadline(t.Context(), deadline)
+
+    time.Sleep(time.Until(deadline))
+
+    if err := ctx.Err(); err != context.DeadlineExceeded {
+        t.Fatalf("context not canceled after deadline")
+    }
+}
+```
+
+This test is simple:
+
+1. Use `context.WithDeadline` to create a context with a deadline one second in the future.
+2. Wait until the deadline.
+3. Verify that the context is canceled.
+
+Unfortunately, this test obviously has a problem.
+It sleeps until the exact moment the deadline expires.
+Odds are good that the context has not been canceled yet by the time we examine it.
+At best, this test will be very flaky.
+
+Let's fix it.
+
+```
+time.Sleep(time.Until(deadline) + 100*time.Millisecond)
+```
+
+We can sleep until 100ms after the deadline.
+A hundred milliseconds is an eternity in computer terms.
+This should be fine.
+
+Unfortunately, we still have two problems.
+
+First, this test takes 1.1 seconds to execute.
+That's slow.
+This is a simple test.
+It should execute in milliseconds at the most.
+
+Second, this test is flaky.
+A hundred milliseconds is an eternity in computer terms,
+but on an overloaded continuous integration (CI) system
+it isn't unusual to see pauses much longer than that.
+This test will probably pass consistently on a developer's workstation,
+but I would expect occasional failures in a CI system.
+
+## Slow or flaky: Pick two
+
+Tests that use real time are always slow or flaky.
+Usually they're both.
+If the test waits longer than necessary, it is slow.
+If it doesn't wait long enough, it is flaky.
+You can make the test more slow and less flaky,
+or less slow and more flaky,
+but you can't make it fast and reliable.
+
+We have a lot of tests in the `net/http` package which use this approach.
+They're all slow and/or flaky, which is what started me down the road
+which brings us here today.
+
+## Write synchronous functions?
+
+The simplest way to test an asyncronous function is not to do it.
+Synchronous functions are easy to test.
+If you can transform an asyncronous function into a synchronous one,
+it will be easier to test.
+
+For example, if we consider our cache cleanup functions from earlier,
+the synchronous `Cleanup` is obviously better than
+the asynchronous `CleanupInBackground`.
+The synchronous function is easier to test,
+and the caller can easily start a new goroutine to run it in the background if needed.
+As a general rule,
+the higher up the call stack you can push your concurrency,
+the better.
+
+```
+// CleanupInBackground is hard to test.
+cache.CleanupInBackground()
+
+// Cleanup is easy to test,
+// and easy to run in the background when needed.
+go cache.Cleanup()
+```
+
+
+Unfortunately, this sort of transformation isn't always possible.
+For example, `context.WithDeadline` is an inherently asynchronous API.
+
+## Instrument code for testability?
+
+A better approach is to make our code more testable.
+
+Here's an example of what this might look like for our `WithDeadline` test:
+
+```
+func TestWithDeadlineAfterDeadline(t *testing.T) {
+    clock := fakeClock()
+    timeout := 1 * time.Second
+    deadline := clock.Now().Add(timeout)
+
+    ctx, _ := context.WithDeadlineClock(
+        t.Context(), deadline, clock)
+
+    clock.Advance(timeout)
+    context.WaitUntilIdle(ctx)
+    if err := ctx.Err(); err != context.DeadlineExceeded {
+        t.Fatalf("context not canceled after deadline")
+    }
+}
+```
+
+Instead of using real time, we use a fake time implementation.
+Using fake time avoids unnecessarily slow tests,
+because we never wait around doing nothing.
+It also helps avoid test flakiness,
+since the current time only changes when the test adjusts it.
+
+There are various fake time packages out there,
+or you can write your own.
+
+To use fake time, we need to modify our API to accept a fake clock.
+I've added a `context.WithDeadlineClock` function here,
+that takes an additional clock parameter:
+
+```
+ctx, _ := context.WithDeadlineClock(
+    t.Context(), deadline, clock)
+```
+
+When we advance our fake clock, we have a problem.
+Advancing time is an asynchrounous operation.
+Sleeping goroutines may wake up,
+timers may send on their channels,
+and timer functions may run.
+We need to wait for that work to finish before we can test
+the expected behavior of the system.
+
+I've added a `context.WaitUntilIdle` function here,
+which waits for any background work related to a context to complete:
+
+```
+clock.Advance(timeout)
+context.WaitUntilIdle(ctx)
+```
+
+This is a simple example, but it demonstrates
+the two fundamental principles of writing testable concurrent code:
+
+1. Use fake time (if you use time).
+2. Have some way to wait for quiescence,
+   which is a fancy way of saying
+   "all background activity has stopped and the system is stable".
+
+The interesting question, of course, is how we do this.
+I've glossed over the details in this example because
+there are some big downsides to this approach.
+
+It's hard.
+Using a fake clock isn't difficult,
+but identifying when background concurrent work is finished
+and it is safe to examine the state of the system is.
+
+Your code becomes less idiomatic.
+You can't use standard time package functions.
+You need to be very careful to keep track of everything happening
+in the background.
+
+You need to instrument not just your code,
+but any other packages you use.
+If you call any third-party concurrent code,
+you're probably out of luck.
+
+Worst of all, it can be just about impossible
+to retrofit this approach into an existing codebase.
+
+I attempted to apply this approach to Go's HTTP implementation,
+and while I had some success at doing so in places,
+the HTTP/2 server simply defeated me.
+In particular, adding instrumentation to detect quiescence
+without extensive rewriting proved infeasible,
+or at least beyond my skills.
+
+## Horrible runtime hacks?
+
+What do we do if we can't make our code testable?
+
+What if instead of instrumenting our code,
+we had a way to observe the behavior of the uninstrumented system?
+
+A Go program consists of a set of goroutines.
+Those goroutines have states.
+We just need to wait until all the goroutines have stopped running.
+
+Unfortunately, the Go runtime doesn't provide any way to tell what
+those goroutines are doing. Or does it?
+
+The `runtime` package contains a function that gives us a stack trace
+for every running goroutine, as well as their states.
+This is text intended for human consumption,
+but we could parse that output.
+Could we use this to detect quiescence?
+
+Now, of course this is a terrible idea.
+There is no guarantee that the format of these stack traces will be stable over time.
+You should not do this.
+
+I did it.
+And it worked.
+In fact, it worked surprisingly well.
+
+With a simple implementation of a fake clock,
+a small amount of instrumentation to keep track of what goroutines were part of the test,
+and some horrifying abuse of `runtime.Stack`,
+I finally had a way to write fast, reliable tests for the `http` package.
+
+The underlying implementation of these tests was horrible,
+but it demonstrated that there was a useful concept here.
+
+## A better way
+
+Go may have built-in concurrency,
+but testing programs that use that concurrency is hard.
+
+We're faced with an unfortunate choice:
+We can write simple, idiomatic code, but it will be impossible to test quickly and reliably;
+or we can write testable code, but it will be complicated and unidiomatic.
+
+So we asked ourselves what we can do to make this better.
+
+As we saw earlier, the two fundamental features required to write testable concurrent code are
+fake time and a way to wait for quiescence.
+
+We need a better way to to wait for quiescence.
+We should be able to ask the runtime when background goroutines have finished their work.
+We also want to be able to limit the scope of this query to a single test,
+so that unrelated tests do not interfere with each other.
+
+We also need better support for testing programs using fake time.
+
+It isn't hard to make a fake time implementation,
+but code which uses an implementation like this is not idiomatic.
+
+Idiomatic code will use a `time.Timer`,
+but it is not possible to create a fake `Timer`.
+We asked ourselves whether we should provide a way for tests to
+create a fake `Timer`, where the test controls when the timer fires.
+
+A testing implementation of time needs to define an entirely new version of the `time` package,
+and pass that to every function that operates on time.
+We considered whether we should define a common time interface,
+in the same way that `net.Conn` is a common interface describing a network connection.
+
+What we realized, however, is that unlike network connections,
+there is only one possible implementation of fake time.
+A fake network may want to introduce latency or errors.
+Time, in contrast, does only one thing: It moves forward.
+Tests need to control the rate at which time progresses,
+but a timer scheduled to fire ten seconds in the future
+should always fire ten (possibly fake) seconds in the future.
+
+In addition, we don't want to upset the entire Go ecosystem.
+Most programs today use functions in the time package.
+We want to keep those programs not only working,
+but idiomatic.
+
+This led to the conclusion that what we need is a way for a test to
+tell the time package to use a fake clock,
+in much the same way that the Go playground uses a fake clock.
+Unlike the playground,
+we need to limit the scope of that change to a single test.
+(It may not be obvious that the Go playground uses a fake clock,
+because we turn any fake delays into real delays on the front end,
+but it does.)
+
+## The `synctest` experiment
+
+And so in Go 1.24 we introduced [`testing/synctest`](/pkg/testing/synctest),
+a new, experimental package to simplify testing concurrent programs.
+Over the months following the release of Go 1.24
+we gathered feedback from early adopters.
+(Thank you to everyone who tried it out!)
+We made a number of changes to address problems and shortcomings.
+And now, in Go 1.25, we've released the `testing/synctest` package
+as part of the standard library.
+
+It lets you run a function in what we're calling a "bubble".
+Within the bubble, the time package uses a fake clock,
+and the `synctest` package provides a function to wait for the bubble to quiesce.
+
+## The `synctest` package
+
+The `synctest` package contains just two functions.
+
+```
+package synctest
+
+// Test executes f in a new bubble.
+// Goroutines in the bubble use a fake clock.
+func Test(t *testing.T, f func(*testing.T))
+
+// Wait waits for background activity in the bubble to complete.
+func Wait()
+```
+
+[`Test`](/pkg/testing/synctest#Test) executes a function in a new bubble.
+
+[`Wait`](/pkg/testing/synctest#Wait) blocks until every goroutine in the bubble is blocked
+waiting for some other goroutine in the bubble.
+We call that state being "durably blocked".
+
+## Testing with synctest
+
+Let's look at an example of synctest in action.
+
+```
+func TestWithDeadlineAfterDeadline(t *testing.T) {
+    synctest.Test(t, func(t *testing.T) {
+        deadline := time.Now().Add(1 * time.Second)
+        ctx, _ := context.WithDeadline(t.Context(), deadline)
+
+        time.Sleep(time.Until(deadline))
+        synctest.Wait()
+        if err := ctx.Err(); err != context.DeadlineExceeded {
+            t.Fatalf("context not canceled after deadline")
+        }
+    })
+}
+```
+
+This might look a little familiar.
+This is the naïve test for `context.WithDeadline` that we looked at earlier.
+The only changes are that we've wrapped the test in
+a `synctest.Test` call to execute it in a bubble
+and we have added a `synctest.Wait` call.
+
+This test is fast and reliable.
+It runs almost instantaneously.
+It precisely tests the expected behavior of the system under test.
+It also requires no modification of the `context` package.
+
+Using the `synctest` package,
+we can write simple, idiomatic code
+and test it reliably.
+
+This is a very simple example, of course,
+but this is a real test of real production code.
+If `synctest` had existed when the `context` package was written,
+we would have had a much easier time writing tests for it.
+
+## Time
+
+Time in the bubble behaves much the same as the fake time in the Go playground.
+Time starts at midnight, January 1, 2000 UTC.
+If you need to run a test at some specific point in time for some reason,
+you can just sleep until then.
+
+```
+func TestAtSpecificTime(t *testing.T) {
+   synctest.Test(t, func(t *testing.T) {
+       // 2000-01-01 00:00:00 +0000 UTC
+       t.Log(time.Now().In(time.UTC))
+
+       // This does not take 25 years.
+       time.Sleep(time.Until(
+           time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)))
+
+       // 2025-01-01 00:00:00 +0000 UTC
+       t.Log(time.Now().In(time.UTC))
+   })
+}
+```
+
+Time only passes when every goroutine in the bubble has blocked.
+You can think of the bubble as simulating an infinitely fast computer:
+Any amount of computation takes no time.
+
+The following test will always print that zero seconds
+of fake time have elapsed since the start of the test,
+no matter how much real time has passed.
+
+```
+func TestExpensiveWork(t *testing.T) {
+   synctest.Test(t, func(t *testing.T) {
+       start := time.Now()
+       for range 1e7 {
+           // do expensive work
+       }
+       t.Log(time.Since(start)) // 0s
+   })
+}
+```
+
+In the next test, the `time.Sleep` call will return immediately,
+rather than waiting for ten real seconds.
+The test will always print that exactly ten fake seconds
+have passed since the start of the test.
+
+```
+func TestSleep(t *testing.T) {
+   synctest.Test(t, func(t *testing.T) {
+       start := time.Now()
+       time.Sleep(10 * time.Second)
+       t.Log(time.Since(start)) // 10s
+   })
+}
+```
+
+## Waiting for quiescence
+
+The [`synctest.Wait`](/pkg/testing/synctest#Wait) function
+lets us wait for background activity to complete.
+
+```
+func TestWait(t *testing.T) {
+   synctest.Test(t, func(t *testing.T) {
+       done := false
+       go func() {
+           done = true
+       }()
+
+       // Wait for the above goroutine to finish.
+       synctest.Wait()
+
+       t.Log(done) // true
+   })
+}
+```
+
+If we didn't have the `Wait` call in the above test,
+we would have a race condition:
+One goroutine modifies the `done` variable
+while another reads from it without synchronization.
+The `Wait` call provides that synchronization.
+
+You may be familiar with the `-race` test flag,
+which enables the data race detector.
+The race detector is aware of the synchronization provided by `Wait`,
+and does not complain about this test.
+If we forgot the `Wait` call, the race detector would correctly complain.
+
+The `synctest.Wait` function provides synchronization,
+but the passage of time does not.
+
+In the next example, one goroutine writes to the `done` variable
+while another sleeps for one nanosecond before reading from it.
+It should be obvious that when run with a real clock outside a synctest bubble,
+this code contains a race condition.
+Inside a synctest bubble,
+while the fake clock ensures that the goroutine completes before `time.Sleep` returns,
+the race detector will still report the data race,
+just like it would if this code were run outside a synctest bubble.
+
+```
+func TestTimeDataRace(t *testing.T) {
+   synctest.Test(t, func(t *testing.T) {
+       done := false
+       go func() {
+           done = true // write
+       }()
+
+       time.Sleep(1 * time.Nanosecond)
+
+       t.Log(done)     // read (unsynchronized)
+   })
+}
+```
+
+
+Adding a `Wait` call provides explicit synchronization and fixes the data race:
+
+```
+time.Sleep(1 * time.Nanosecond)
+synctest.Wait() // synchronize
+t.Log(done)     // read
+```
+
+## Example: `io.Copy`
+
+Taking advantage of the synchronization provided by `synctest.Wait` allows us
+to write simpler tests with less explicit synchronization.
+
+For example, consider this test of [`io.Copy`](/pkg/io#Copy).
+
+```
+func TestIOCopy(t *testing.T) {
+   synctest.Test(t, func(t *testing.T) {
+       srcReader, srcWriter := io.Pipe()
+       defer srcWriter.Close()
+
+       var dst bytes.Buffer
+       go io.Copy(&dst, srcReader)
+
+       data := "1234"
+       srcWriter.Write([]byte("1234"))
+       synctest.Wait()
+
+       if got, want := dst.String(), data; got != want {
+           t.Errorf("Copy wrote %q, want %q", got, want)
+       }
+   })
+}
+```
+
+The `io.Copy` function copies data from an `io.Reader` to an `io.Writer`.
+You might not immediately think of `io.Copy` as a concurrent function,
+since it blocks until the copy has completed.
+However, providing data to `io.Copy`'s reader is an asynchronous operation:
+
+- `Copy` calls the reader's `Read` method;
+- `Read` returns some data;
+- and the data is written to the writer at a later time.
+
+In this test, we are verifying that `io.Copy` writes new data to the writer
+without waiting to fill its buffer.
+
+Looking at the test step by step,
+we first create an `io.Pipe` to serve as the source `io.Copy` reads from:
+
+```
+srcReader, srcWriter := io.Pipe()
+defer srcWriter.Close()
+```
+
+We call `io.Copy` in a new goroutine,
+copying from the read end of the pipe into a `bytes.Buffer`:
+
+```
+var dst bytes.Buffer
+go io.Copy(&dst, srcReader)
+```
+
+We write to the other end of the pipe,
+and wait for `io.Copy` to handle the data:
+
+```
+data := "1234"
+srcWriter.Write([]byte("1234"))
+synctest.Wait()
+```
+
+Finally, we verify that the destination buffer contains the desired data:
+
+```
+if got, want := dst.String(), data; got != want {
+    t.Errorf("Copy wrote %q, want %q", got, want)
+}
+```
+
+We don't need to add a mutex or other synchronization around the destination buffer,
+because `synctest.Wait` ensures that it is never accessed concurrently.
+
+This test demonstrates a few important points.
+
+Even synchronous functions like `io.Copy`,
+which do not perform additional background work after they return,
+may exhibit asynchronous behaviors.
+
+Using `synctest.Wait`, we can test those behaviors.
+
+Note also that this test does not work with time.
+Many asynchronous systems involve time, but not all.
+
+## Bubble exit
+
+The `synctest.Test` function waits for all goroutines in the bubble to exit
+before returning.
+Time stops advancing after the root goroutine (the goroutine started by `Test`) returns.
+
+In the next example, `Test` waits for the background goroutine to run and exit
+before it returns:
+
+```
+func TestWaitForGoroutine(t *testing.T) {
+    synctest.Test(t, func(t *testing.T) {
+        go func() {
+            // This runs before synctest.Test returns.
+        }()
+    })
+}
+```
+
+In this example, we schedule a `time.AfterFunc` for a time in the future.
+The bubble's root goroutine returns before that time is reached,
+so the `AfterFunc` never runs:
+
+```
+func TestDoNotWaitForTimer(t *testing.T) {
+    synctest.Test(t, func(t *testing.T) {
+        time.AfterFunc(1 * time.Nanosecond, func() {
+            // This never runs.
+        })
+    })
+}
+```
+
+In the next example, we start a goroutine that sleeps.
+The root goroutine returns and time stops advancing.
+The bubble is now deadlocked,
+because `Test` is waiting for all goroutines in the bubble to finish
+but the sleeping goroutine is waiting for time to advance.
+
+```
+func TestDeadlock(t *testing.T) {
+    synctest.Test(t, func(t *testing.T) {
+        go func() {
+            // This sleep never returns and the test deadlocks.
+            time.Sleep(1 * time.Nanosecond)
+        }()
+    })
+}
+```
+
+## Deadlocks
+
+The `synctest` package panics when a bubble is deadlocked
+due to every goroutine in the bubble being durably blocked on
+another goroutine in the bubble.
+
+```
+--- FAIL: Test (0.00s)
+--- FAIL: TestDeadlock (0.00s)
+panic: deadlock: main bubble goroutine has exited but blocked goroutines remain [recovered, repanicked]
+
+goroutine 7 [running]:
+(stacks elided for clarity)
+
+goroutine 10 [sleep (durable), synctest bubble 1]:
+time.Sleep(0x1)
+	/Users/dneil/src/go/src/runtime/time.go:361 +0x130
+_.TestDeadlock.func1.1()
+	/tmp/s/main_test.go:13 +0x20
+created by _.TestDeadlock.func1 in goroutine 9
+	/tmp/s/main_test.go:11 +0x24
+FAIL	_	0.173s
+FAIL
+```
+
+The runtime will print stack traces for every goroutine in the deadlocked bubble.
+
+When printing the status of a bubbled goroutine,
+the runtime indicates when the goroutine is durably blocked.
+You can see that the sleeping goroutine in this test is durably blocked.
+
+## Durable blocking
+
+"Durably blocking" is a core concept in synctest.
+
+A goroutine is durably blocked when it is not only blocked,
+but when it can only be unblocked by another goroutine in the same bubble.
+
+When every goroutine in a bubble is durably blocked:
+
+1. `synctest.Wait` returns.
+2. If there is no `synctest.Wait` call in progress,
+   fake time advances instantly to the next point that will wake a goroutine.
+3. If there is no goroutine that can be woken by advancing time,
+   the bubble is deadlocked and the test fails.
+
+It is important for us to make a distinction between a goroutine which is merely blocked
+and one which is *durably* blocked.
+We don't want to declare a deadlock when a goroutine is temporarily blocked on
+some event arising outside its bubble.
+
+Let's look at some ways in which a goroutine can block non-durably.
+
+### Not durably blocking: I/O (files, pipes, network connections, etc.)
+
+The most important limitation is that I/O is not durably blocking,
+including network I/O.
+A goroutine reading from a network connection may be blocked,
+but it will be unblocked by data arriving on that connection.
+
+This is obviously true for a connection to some network service,
+but it is also true for a loopback connection,
+even when the reader and writer are both in the same bubble.
+
+When we write data to a network socket,
+even a loopback socket,
+the data is passed to the kernel for delivery.
+There is a period of time between the write system call returning
+and the kernel notifying the other side of the connection that data is available.
+The Go runtime cannot distinguish between a goroutine blocked waiting for
+data that is already in the kernel's buffers
+and one blocked waiting for data that will not arrive.
+
+This means that tests of networked programs using synctest
+usually cannot use real network connections.
+Instead, they should use an in-memory fake.
+
+I'm not going to go over the process of creating a fake network here,
+but the `synctest` package documentation contains
+[a complete worked example](/pkg/testing/synctest#hdr-Example__HTTP_100_Continue)
+of testing an HTTP client and server communicating over a fake network.
+
+### Not durably blocking: syscalls, cgo calls, anything that isn't Go
+
+Syscalls and cgo calls are not durably blocking.
+We can only reason about the state of goroutines executing Go code.
+
+### Not durably blocking: Mutexes
+
+Perhaps surprisingly, mutexes are not durably blocking.
+This is a decision born of practicality:
+Mutexes are often used to guard global state,
+so a bubbled goroutine will often need to acquire a mutex held outside its bubble.
+Mutexes are highly performance-sensitive,
+so adding additional instrumentation to them
+risks slowing down non-test programs.
+
+We can test programs that use mutexes with synctest,
+but the fake clock will not advance while a goroutine is blocked on mutex acquisition.
+This hasn't posed a problem in any case we've encountered,
+but it is something to be aware of.
+
+### Durably blocking: `time.Sleep`
+
+So what is durably blocking?
+
+`time.Sleep` is obviously durable,
+since time can only advance when every goroutine in the bubble is durably blocked.
+
+### Durably blocking: send or receive on channels created in the same bubble
+
+Channel operations on channels created within the same bubble are durable.
+
+We make a distinction between bubbled channels (created in a bubble)
+and unbubbled channels (created outside any bubble).
+This means that a function using a global channel for synchronization,
+for example to control access to a globally cached resource,
+can be safely called from within a bubble.
+
+Trying to operate on a bubbled channel from outside its bubble is an error.
+
+### Durably blocking: `sync.WaitGroup` belonging to the same bubble
+
+We also associate `sync.WaitGroup`s with bubbles.
+
+`WaitGroup` doesn't have a constructor,
+so we make the association with the bubble implicitly on the first call to `Go` or `Add`.
+
+As with channels,
+waiting on a `WaitGroup` belonging to the same bubble is durably blocking,
+and waiting on one from outside the bubble is not.
+Calling `Go` or `Add` on a `WaitGroup` belonging to a different bubble is an error.
+
+### Durably blocking: `sync.Cond.Wait`
+
+Waiting on a `sync.Cond` is always durably blocking.
+Waking up a goroutine waiting on a `Cond` in a different bubble is an error.
+
+### Durably blocking: `select{}`
+
+Finally, an empty select is durably blocking.
+(A select with cases is durably blocking if all the operations in it are so.)
+
+That's the complete list of durably blocking operations.
+It isn't very long,
+but it's enough to handle almost all real-world programs.
+
+The rule is that a goroutine is durably blocked when it is blocked,
+and we can guarantee that it can only be unblocked
+by another goroutine in its bubble.
+
+In cases where it is possible to attempt to wake a bubbled goroutine from outside its bubble,
+we panic.
+For example, it is an error to operate on a bubbled channel from outside its bubble.
+
+## Changes from 1.24 to 1.25
+
+We released an experimental version of the `synctest` package in Go 1.24.
+To ensure that early adopters were aware of the experimental status of the package,
+you needed to set a GOEXPERIMENT flag to make the package visible.
+
+The feedback we received from those early adopters was invaluable,
+both in demonstrating that the package is useful
+and in uncovering areas where the API needed work.
+
+These are some of the changes made between the experimental version
+and the version released in Go 1.25.
+
+### Replaced Run with Test
+
+The original version of the API created a bubble with a `Run` function:
+
+```
+// Run executes f in a new bubble.
+func Run(f func())
+```
+
+It became clear that we needed a way to create a `*testing.T`
+that is scoped to a bubble.
+For example, `t.Cleanup` should run cleanup functions in the same bubble
+they are registered in, not after the bubble exits.
+We renamed `Run` to `Test` and made it create a `T` scoped to the lifetime
+of the new bubble.
+
+### Time stops when a bubble's root goroutine returns
+
+We originally continued to advance time within a bubble for so long as
+the bubble contained any goroutines waiting for future events.
+This turned out to be very confusing when a long-lived goroutine never returned,
+such as a goroutine reading forever from a `time.Ticker`.
+We now stop advancing time when a bubble's root goroutine returns.
+If the bubble is blocked waiting for time to advance,
+this results in a deadlock and a panic which can be analyzed.
+
+### Removed cases where "durable" wasn't
+
+We cleaned up the definition of "durably blocking".
+The original implementation had cases where a durably blocked goroutine could
+be unblocked from outside the bubble.
+For example, channels recorded whether they were created in a bubble,
+but not which in which bubble they were created,
+so one bubble could unblock a channel in a different bubble.
+The current implementation contains no cases we know of
+where a durably blocked goroutine can be unblocked from outside its bubble.
+
+### Better stack traces
+
+We made improvements to the information printed in stack traces.
+When a bubble deadlocks, we by default now only print stacks for the goroutines in tha bubble.
+Stack traces also clearly indicate which goroutines in a bubble are durably blocked.
+
+### Randomized events happening at the same time
+
+We made improvements to the randomization of events happening at the same time.
+Originally, timers scheduled to fire at the same instant
+would always do so in the order they were created.
+This ordering is now randomized.
+
+## Future work
+
+We're pretty happy with the synctest package at the moment.
+
+Aside from the inevitable bug fixes,
+we don't currently expect any major changes to it in the future.
+Of course, with wider adoption it is always possible that we'll discover something
+that needs doing.
+
+One possible area of work is to improve the detection of durably blocked goroutines.
+It would be nice if we could make mutex operations durably blocking,
+with a restriction that a mutex acquired in a bubble must be released
+from within the same bubble.
+
+Testing networked code with synctest requires a fake network.
+The `net.Pipe` function can create a fake `net.Conn`,
+but there is currently no standard library function that creates
+a fake `net.Listener` or `net.PacketConn`.
+In addition, the `net.Conn` returned by `net.Pipe` is synchronous--every write blocks
+until a read consumes the data--which is not representative of real network behavior.
+Perhaps we should add a good fake implementations of common network interfaces
+to the standard library.
+
+## Conclusion
+
+That's the `synctest` package.
+
+I can't say that it makes testing concurrent code simple,
+because concurrency is never simple.
+What it does is let you write the simplest possible concurrent code,
+using idiomatic Go,
+and the standard time package,
+and then write fast, reliable tests for it.
+
+I hope you find it useful.
author	Damien Neil <dneil@google.com>	2025-08-18 15:05:41 -0700
committer	Austin Clements <austin@google.com>	2025-08-27 08:57:52 -0700
commit	7e781a80ef493a16a6f04f7a49dfd1e8fdbfd8a9 (patch)
tree	d2c0dbd72c4770fd6926d2a576953a937ba55811 /_content/blog
parent	f1d710aa6c79f6d9411faa11170b979c58544bf2 (diff)
download	go-x-website-7e781a80ef493a16a6f04f7a49dfd1e8fdbfd8a9.tar.xz