How to Measure Performance Impact of Code Changes in Go

in Programming & Dev, 2 years ago

Introduction

Go has built-in support for benchmarks. You can use them to compare the performance of several implementations of the same task, or simply to measure how long some code takes to run. Take generating a random number as an example:

import (
    "crypto/rand"
    "encoding/binary"
)

func RandUInt64() uint64 {
    data := make([]byte, 8)
    rand.Read(data) // error ignored for brevity
    return binary.BigEndian.Uint64(data)
}

func BenchmarkRandUint64(b *testing.B) {
    for i := 0; i < b.N; i++ {
        RandUInt64()
    }
}

Run go test -bench=RandUint64 -benchmem to get results in Go Benchmark Data Format.

BenchmarkRandUint64-12       1248550           895.4 ns/op         8 B/op          1 allocs/op

Suppose you want to keep an eye on the performance impact of code changes. You could write several implementations as separate functions and benchmark all of them, but that is a lot of work and doesn't scale to nested function calls.

The easier way is to compare benchmark results before and after a change. Given this simple data format, you may be tempted to write a CLI tool that parses benchmark outputs and prints statistical differences. The Go team has the same concern and that's exactly what they have implemented in the benchstat package.

benchstat

If you have been following the development of Go itself, you may have already seen benchmark results like this in its commit messages.

name                       old time/op    new time/op    delta
ByteReplacerWriteString-8    1.23µs ± 0%    2.16µs ± 1%   +75.31%  (p=0.000 n=10+10)

name                       old alloc/op   new alloc/op   delta
ByteReplacerWriteString-8    2.69kB ± 0%    0.00kB       -100.00%  (p=0.000 n=10+10)

name                       old allocs/op  new allocs/op  delta
ByteReplacerWriteString-8      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)

This output is generated by benchstat, a semi-official tool that computes and compares statistics about benchmarks. It can be installed with go install golang.org/x/perf/cmd/benchstat@latest.

Let's collect 5 runs of the benchmark and then run benchstat against the results.

$ go test -bench=RandUint64 -benchmem -count=5 >a.txt
$ benchstat a.txt
name           time/op
RandUint64-12  870ns ± 1%

name           alloc/op
RandUint64-12  8.00B ± 0%

name           allocs/op
RandUint64-12   1.00 ± 0%

It works. Now let's change some code, run a second round of benchmarks, and compare the two result sets with benchstat.

$ go test -bench=RandUint64 -benchmem -count=5 >b.txt
$ benchstat a.txt b.txt
name           old time/op    new time/op    delta
RandUint64-12     870ns ± 1%      21ns ± 2%   -97.55%  (p=0.008 n=5+5)

name           old alloc/op   new alloc/op   delta
RandUint64-12     8.00B ± 0%     0.00B       -100.00%  (p=0.008 n=5+5)

name           old allocs/op  new allocs/op  delta
RandUint64-12      1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)

There you go! The time per operation dropped by 97.55%, and the new implementation no longer allocates any memory.
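The article doesn't show the modified code, but one change that would produce numbers in this ballpark is swapping crypto/rand for math/rand, whose top-level Uint64 returns a value directly with no intermediate byte slice. A hypothetical sketch (note that math/rand is not cryptographically secure, so this trade-off only makes sense when secure randomness isn't required):

```go
package main

import (
	"fmt"
	"math/rand"
)

// RandUInt64 returns a pseudo-random uint64 without allocating:
// math/rand's Uint64 produces the value directly, so there is no
// 8-byte buffer to create and decode on every call.
func RandUInt64() uint64 {
	return rand.Uint64()
}

func main() {
	fmt.Println(RandUInt64())
}
```

Because the function body is a single call with no heap allocation, the benchmark reports 0 B/op and 0 allocs/op for it.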

Conclusion

This is how to measure the performance impact of code changes before they reach production. For changes that have already been deployed, you can use pprof to capture CPU profiles, or deploy a continuous profiling solution like parca.

Thanks for reading and see you next time!