Many modern workflows aren't amenable to interactive debugging. When a test fails in CI, or a component is failing in a distributed system, it's difficult or even impossible to attach a debugger. Most interactive debuggers need to stop the program to inspect the state, and stopping a program may cause it to be killed by a supervisor or trigger unacceptable cascading failures.
Record-and-replay debuggers like rr can avoid these problems. Recording an execution is much like producing a log. rr's recording overhead is low for many workloads, especially functional tests (which tend not to require high parallelism within a single test). Pernosco leverages rr recording to integrate into developer workflows in novel ways. In particular, Pernosco can integrate into CI to enable easy debugging of CI test failures.
We built Pernosco integration for GitHub and Travis CI. The user installs our Pernosco GitHub app.
The user triggers a Travis test run which fails. Pernosco receives the failure notification from GitHub and automatically reruns the failing test, with recording.
If Pernosco successfully reproduces the failure, it then builds the omniscient database and updates GitHub with a link to a Pernosco debugging session.
The user clicks on the "Details" link to enter the Pernosco debugger and diagnose the bug.
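The steps above can be summarized in a short sketch. Everything named here (FailureEvent, handle_failure, and the injected rerun_with_recording, build_database and post_link callables) is hypothetical and stands in for internals that this page does not describe. The sketch only captures the ordering of the steps and the fact that a link is published only when the failure reproduces.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FailureEvent:
    """Hypothetical failure notification: which commit failed in which repo."""
    repo: str
    commit: str


def handle_failure(
    event: FailureEvent,
    rerun_with_recording: Callable[[FailureEvent], Optional[str]],
    build_database: Callable[[str], str],
    post_link: Callable[[FailureEvent, str], None],
) -> bool:
    """Rerun the failing job under recording; publish a link only if it fails again.

    rerun_with_recording returns a recording identifier when the failure
    reproduced, or None when the rerun passed (nothing to debug).
    Returns True when a debugging link was published.
    """
    recording = rerun_with_recording(event)
    if recording is None:
        # The failure did not reproduce, so there is no session to offer.
        return False
    session_url = build_database(recording)
    post_link(event, session_url)
    return True
```

Passing the three stages in as callables keeps the sketch independent of any particular CI system or hosting service.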
We hope eventually all CI systems will support integrated debugging like this!
Pernosco processing adds some latency to the push/test cycle, but only when developers are already waiting for CI results. Compare this to the effort required to rerun and debug the test locally, or to multiple cycles of enabling extra logging and re-pushing to figure out the problem.
Pernosco is most effective when we can examine the entire tree of processes spawned by a CI job and identify the smallest set of processes needed to debug a particular failing test. Unfortunately, Travis doesn't distinguish build failures from test failures, and it doesn't identify finer-grained test results than the result of the entire job. Therefore we use heuristics to pick out processes for debugging; for example, we make independent recordings of the tests run by cargo test when that appears in a Travis configuration.
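To illustrate what a heuristic of this kind might look like, here is a minimal sketch that picks candidate test processes out of a flat list describing a job's process tree. The Proc record, the function names and the rule itself (take the direct children of a cargo test invocation and skip compiler processes) are assumptions made for illustration, not a description of Pernosco's actual heuristics.

```python
import os
from dataclasses import dataclass
from typing import List


@dataclass
class Proc:
    """Hypothetical record of one process observed during a CI job."""
    pid: int
    ppid: int
    argv: List[str]


def is_cargo_test(proc: Proc) -> bool:
    """True for an invocation like `cargo test ...`, ignoring the path to cargo."""
    return (
        len(proc.argv) >= 2
        and os.path.basename(proc.argv[0]) == "cargo"
        and proc.argv[1] == "test"
    )


def select_test_processes(procs: List[Proc]) -> List[Proc]:
    """Return direct children of `cargo test`, excluding rustc invocations.

    Assumption: each remaining child is a test binary that deserves
    its own independent recording.
    """
    cargo_pids = {p.pid for p in procs if is_cargo_test(p)}
    return [
        p
        for p in procs
        if p.ppid in cargo_pids
        and p.argv
        and os.path.basename(p.argv[0]) != "rustc"
    ]
```

A real job spawns many more kinds of helper processes (build scripts, linkers, shells), so a rule this simple would need refinement before it could be relied on.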
Many CI frameworks exist, and many projects have their own. Some work is required to integrate Pernosco with each one.
Firefox developers validate their changes before commit by pushing them to a Mozilla 'Try server' which runs automated tests on them. Pernosco detects test failures on the Try server, then reproduces, records and processes those failures, notifying developers by email when Pernosco sessions are available. Clicking on a link to debug is much more convenient than having to reproduce the test failure locally.
To simplify debugging, our test reproducer stops the recording as soon as the first test in a suite has failed. Sometimes developers want to debug a different test failure. To support that, we offer a dashboard where developers can view every failed test and select any of them individually for reproduction and debugging with Pernosco.
Our CI support primarily targets reproduction of deterministic test failures, and reruns each failing test suite just once by default. We also support rerunning a failed test suite multiple times (with rr chaos mode) to enable reproduction and debugging of intermittent test failures.
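The rerun policy can be sketched as a small loop. The helper names (record_command, reproduce), the injected run callable and the choice to enable chaos mode whenever more than one attempt is requested are assumptions for illustration; the only rr detail relied on is that rr record accepts a --chaos flag. The sketch also assumes the runner returns a nonzero exit code when the test fails.

```python
from typing import Callable, List, Optional


def record_command(test_cmd: List[str], chaos: bool) -> List[str]:
    """Build an rr command line that records test_cmd, optionally in chaos mode."""
    cmd = ["rr", "record"]
    if chaos:
        cmd.append("--chaos")
    return cmd + test_cmd


def reproduce(
    test_cmd: List[str],
    run: Callable[[List[str]], int],
    max_attempts: int = 1,
) -> Optional[int]:
    """Record test_cmd up to max_attempts times; return the attempt that failed.

    A single attempt is a plain recording (the deterministic default).
    Multiple attempts use chaos mode to provoke intermittent failures.
    Returns None if every attempt passed.
    """
    chaos = max_attempts > 1
    for attempt in range(1, max_attempts + 1):
        if run(record_command(test_cmd, chaos)) != 0:
            return attempt
    return None
```

In practice the run argument could be something like lambda cmd: subprocess.run(cmd).returncode, with test_cmd set to whatever command launches the failing suite.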