Implementing Pernosco Support For Github Actions

Posted by roc on 29 October 2021

Pernosco supports debugging failures in Github Actions tests. When a test fails, you get a link that takes you directly to a Pernosco debugging session for that failure.

(If you want Pernosco debugging for the Github Actions of your open-source project, contact us.)

Implementing this required reimplementing (some of) Github Actions workflow execution. We have just published our gha-runner implementation of Github Actions (as a Rust library). You can use gha-runner to run Github Actions workflows locally, and it has extension points to let projects like Pernosco modify how workflows are executed. gha-runner contains an example that lets you run Github Actions steps locally under strace, e.g.:

$ target/debug/examples/gha_local \
    --strace /tmp/strace.out \
    --image-path ghcr.io/catthehacker/ubuntu:act \
    Pernosco github-actions-test 6475d0f048a72996e3bd559cdd3763f53fe3d072 \
    .github/workflows/build.yml \
    "Build+test (stable, ubuntu-18.04)"
...
failures:
    one

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s

error: test failed, to rerun pass '--test one'
Step 'cargo test (debug; default features)' exited with code 101, aborting
$ head -5 /tmp/strace.out
557   execve("/root/.cargo/bin/cargo", ["cargo", "test"], 0x7ffcb6870200 /* 31 vars */) = 0
557   brk(NULL)                         = 0x5646cb4bb000
557   access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
557   access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
557   openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3

(Note: gha-runner is far from a complete implementation of Github Actions, but can run simple workflows.)

How Pernosco's Github Actions support works

When a Github Actions workflow fails in a Pernosco-supported project, Github notifies a Pernosco daemon. Our daemon analyzes the failed workflow (using gha-runner) to check whether the PERNOSCO_ENABLED environment variable is set in the failing workflow step (we don't want to reproduce and offer to debug build failures, for example). If it is set, the daemon spawns a task into our AWS ECS cluster to rerun the failed step under rr and hopefully reproduce the failure.

We use gha-runner to rerun all job steps up to and including the failing step. The steps before the failing step run normally, but the failing step is run under rr to get a recording. We have to run this on our own AWS instances because Github's own Azure instances don't provide virtualized hardware performance counters and thus don't support rr recording, and we don't want to require our users to set up self-hosted AWS runners (which can be difficult and expensive).

If we record a test failure, the recording is submitted to Pernosco for analysis. When the analysis is done and the debugger is ready, we update Github with the link to the debugger.
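The decision described above (skip failures that did not opt in, otherwise rerun every step up to the failing one and record only the last) can be sketched in a few lines of Rust. This is only an illustration: `Step`, `RunMode` and `plan_reproduction` are hypothetical names and not part of gha-runner's API, and the sketch assumes the variable is declared directly on the step, whereas real workflows can also set environment variables at the job or workflow level.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified model of a workflow step (not gha-runner's real types).
#[derive(Debug, Clone)]
struct Step {
    name: String,
    env: HashMap<String, String>,
}

/// How a step should be executed during the reproduction run.
#[derive(Debug, PartialEq)]
enum RunMode {
    Normal,
    UnderRr,
}

/// Returns the execution plan for reproducing a failure at `failing_index`,
/// or `None` if the failing step did not set PERNOSCO_ENABLED
/// (or the index is out of range).
fn plan_reproduction(steps: &[Step], failing_index: usize) -> Option<Vec<(String, RunMode)>> {
    let failing = steps.get(failing_index)?;
    if !failing.env.contains_key("PERNOSCO_ENABLED") {
        return None;
    }
    // Steps after the failing one are never rerun.
    let plan = steps[..=failing_index]
        .iter()
        .enumerate()
        .map(|(i, step)| {
            let mode = if i == failing_index {
                RunMode::UnderRr
            } else {
                RunMode::Normal
            };
            (step.name.clone(), mode)
        })
        .collect();
    Some(plan)
}

/// Small helper to build a step from a name and a list of env pairs.
fn step(name: &str, env: &[(&str, &str)]) -> Step {
    Step {
        name: name.to_string(),
        env: env
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
    }
}

fn main() {
    let steps = vec![
        step("checkout", &[]),
        step("cargo build", &[]),
        step("cargo test", &[("PERNOSCO_ENABLED", "1")]),
        step("upload artifacts", &[]),
    ];
    match plan_reproduction(&steps, 2) {
        Some(plan) => {
            for (name, mode) in plan {
                println!("{name}: {mode:?}");
            }
        }
        None => println!("failing step did not opt in; nothing to reproduce"),
    }
}
```

With this shape, a failure in the build step yields no plan at all, which matches the intent of not offering to debug build failures.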

Implementing gha-runner

The official Github Actions runner is written in C#, is extremely complicated, and communicates with Github's infrastructure using complex undocumented protocols, so it is unsuited to our purposes. act is much more suitable: it's a self-contained Go project that lets you run Github Actions locally, and it looks really good. However, we wanted a library that would integrate easily into our Rust project and that supports the extension points we need: customizable container-running backends, so we can run GHA jobs as ECS tasks, and the ability to inject rr into containers and run steps under rr. So we wrote our own. I'm grateful to the act author(s) for proving that standalone GHA runners are possible (at least for many workflows), and I was occasionally able to clarify GHA semantics by reading the act source code.
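To make the two extension points concrete, here is a rough sketch of what a pluggable container backend and an rr-wrapping hook could look like. None of these names (`ContainerBackend`, `LocalDocker`, `EcsTask`, `wrap_with_rr`) come from gha-runner; they are invented for illustration, and the backends only describe what they would launch rather than launching anything.

```rust
/// Hypothetical extension point: something that can run a command in a container.
/// This is NOT gha-runner's actual trait; it only illustrates the idea.
trait ContainerBackend {
    /// Describes what would be launched; a real backend would start the container.
    fn describe_launch(&self, image: &str, command: &[String]) -> String;
}

/// Backend that would run steps in a local Docker container.
struct LocalDocker;

/// Backend that would run steps as a task in an ECS cluster.
struct EcsTask {
    cluster: String,
}

impl ContainerBackend for LocalDocker {
    fn describe_launch(&self, image: &str, command: &[String]) -> String {
        format!("docker run {} {}", image, command.join(" "))
    }
}

impl ContainerBackend for EcsTask {
    fn describe_launch(&self, image: &str, command: &[String]) -> String {
        format!(
            "ECS task on cluster {}: image={} command={}",
            self.cluster,
            image,
            command.join(" ")
        )
    }
}

/// Hypothetical step hook: prefix the step's command so it runs under `rr record`.
fn wrap_with_rr(command: &[String]) -> Vec<String> {
    let mut wrapped = vec!["rr".to_string(), "record".to_string()];
    wrapped.extend_from_slice(command);
    wrapped
}

fn main() {
    let command = vec!["cargo".to_string(), "test".to_string()];
    let recorded = wrap_with_rr(&command);

    // The same step can be handed to either backend without changing the caller.
    let backends: Vec<Box<dyn ContainerBackend>> = vec![
        Box::new(LocalDocker),
        Box::new(EcsTask {
            cluster: "example-cluster".to_string(),
        }),
    ];
    for backend in &backends {
        println!(
            "{}",
            backend.describe_launch("ghcr.io/catthehacker/ubuntu:act", &recorded)
        );
    }
}
```

Keeping the backend behind a trait is what lets the workflow logic stay the same whether a step runs locally or as a remote task.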

gha-runner can run simple workflows but is currently missing a lot of functionality. For example, actions that bring their own containers are not currently supported, and neither are pre and post steps. All actions run in Docker containers and Docker nesting is not supported, so actions that run their own docker commands won't work. Support for expression syntax is currently quite limited. Most of these missing pieces can probably be implemented fairly easily.

We hope that gha-runner will be useful to other projects and welcome PRs to improve it.

Octocrab

gha-runner uses octocrab to access Github APIs. We're in the process of upstreaming some changes to octocrab; once that is complete we'll rebase gha-runner to upstream octocrab and release gha-runner on crates.io.

A public personal access token

Github Actions actions are passed a $GITHUB_TOKEN that they use to authenticate with Github for checkouts and other API calls. For public projects no authentication is normally needed, but a valid token must be provided nevertheless. To make gha-runner easy to use for public projects, we have created a personal access token that anyone can use: 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. This token has been granted no access rights, so it can only access public resources. For extra security, it belongs to the pernosco-unauthorized Github account, which itself has no access to anything and exists solely to own this token.
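A tool built on top of gha-runner might pick its token along these lines: use the caller's own token if one is provided, and fall back to the public token otherwise. This is a sketch for a hypothetical wrapper program; reading a `GITHUB_TOKEN` environment variable here is an assumption made for the example, not a description of how gha-runner or its gha_local example obtains a token.

```rust
use std::env;

/// The public, no-privileges token quoted in the text above.
const PUBLIC_TOKEN: &str = "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";

/// Prefer a caller-supplied token; fall back to the public one when the
/// supplied value is missing or blank.
fn choose_token(from_env: Option<String>) -> String {
    match from_env {
        Some(token) if !token.trim().is_empty() => token,
        _ => PUBLIC_TOKEN.to_string(),
    }
}

fn main() {
    // Assumption for this sketch: the caller's token, if any, is in GITHUB_TOKEN.
    let token = choose_token(env::var("GITHUB_TOKEN").ok());

    // Report only where the token came from, so private tokens are never printed.
    let source = if token == PUBLIC_TOKEN {
        "public fallback token"
    } else {
        "GITHUB_TOKEN environment variable"
    };
    println!("using {source}");
}
```

The fallback only helps for public repositories; private projects would still need to supply a token with suitable access.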