As mentioned in the introduction, the current state of Pernosco is not the last word in debugging. There are many more improvements we would like to make but haven't had time to yet, and there are surely many great ideas we haven't thought of that could build on our work.
There are many features that would clearly be beneficial.
Pernosco should display the values of variables as they change across source lines, so that changes in data over time can be visualized directly, instead of the user having to step the current moment in time through successive states.
Pernosco currently doesn't support any kind of prettyprinting in its views of data, but other debuggers support extensible prettyprinting so that users can supply custom visualizations of specific data types. Those extension frameworks aren't directly applicable to Pernosco because Pernosco data printing has richer semantics (e.g. every rendered value can be clicked on to yield a dataflow explanation). However, some framework for extensible prettyprinting is clearly needed; it should be powerful enough to express the extensions found in JsDbg, for example.
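To make the shape of such a framework concrete, here is a minimal sketch of an extensible prettyprinting registry, in the style of gdb's Python pretty printers. All names (`register_printer`, `render`, the dictionary-based value representation) are hypothetical, not Pernosco's actual API; a real Pernosco version would additionally attach a dataflow link to every rendered sub-value.

```python
# Hypothetical prettyprinting registry: user extensions register a
# renderer per type name, and the debugger dispatches on the type of
# the value being displayed.

PRINTERS = {}

def register_printer(type_name):
    """Associate a rendering function with a data type name."""
    def decorate(fn):
        PRINTERS[type_name] = fn
        return fn
    return decorate

@register_printer("std::vector")
def render_vector(value):
    # A real framework would emit structured output so each element
    # stays clickable for dataflow queries, not a flat string.
    elems = ", ".join(str(v) for v in value["elements"])
    return f"vector of length {value['length']}: [{elems}]"

def render(type_name, value):
    """Use a registered printer if one exists, else a default rendering."""
    printer = PRINTERS.get(type_name)
    return printer(value) if printer else repr(value)

print(render("std::vector", {"length": 3, "elements": [1, 2, 3]}))
```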
The notebook can be extended to capture more context around the notes. In general the notebook is a great platform for improving how users think about their debugging problems, individually and together. There is probably a lot to learn from what users record in their notebooks.
Supporting statically compiled languages such as Go would be pretty easy. Some internal refactoring would be needed to support goroutines.
Supporting simple interpreted languages such as Python would not be very hard. We can inspect the Python interpreter state and map that up to the source level.
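The kind of mapping involved can be illustrated from within Python itself. A debugger would read the same interpreter frame fields (the code object, the current line number, the link to the caller's frame) out of the recorded process's memory; here we use CPython's own introspection as a stand-in for that, so this is a sketch of the data involved rather than how Pernosco would actually obtain it.

```python
import sys

def python_level_stack():
    """Walk the interpreter's frame chain, extracting the source-level
    view a debugger would present: function name, file, line."""
    frame = sys._getframe()  # CPython-specific introspection
    stack = []
    while frame is not None:
        code = frame.f_code
        stack.append((code.co_name, code.co_filename, frame.f_lineno))
        frame = frame.f_back  # follow the chain toward the outermost caller
    return stack

def inner():
    return python_level_stack()

def outer():
    return inner()

for name, filename, line in outer():
    print(name, line)
```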
Supporting complex VMs with multi-tier JITs and moving GC would be hard, but is probably still feasible.
Pernosco is a natural platform for integrating dynamic analysis. For example, Valgrind/ASAN-style memory checking and dynamic race detection could be implemented as post-recording dynamic analyses in Pernosco's internal framework. Furthermore, the usability of dynamic bug-detection analysis would benefit tremendously from being supported by Pernosco's powerful debugging tools. This makes Pernosco a strategic platform for dynamic analysis; a dynamic analysis is more valuable integrated with Pernosco than standing alone.
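As a rough illustration of what a post-recording race analysis consumes, here is a minimal happens-before detector over an event trace, using vector clocks in the style of FastTrack but simplified to write-write races only. This is a textbook sketch, not Pernosco's implementation; the event names (`write`, `acquire`, `release`) are illustrative.

```python
from collections import defaultdict

class RaceDetector:
    """Simplified happens-before race detection over a recorded event
    trace. Tracks write-write races only; a real detector also handles
    reads and many more synchronization operations."""

    def __init__(self):
        self.clocks = defaultdict(lambda: defaultdict(int))  # thread -> vector clock
        self.lock_clocks = defaultdict(dict)                 # lock -> clock at last release
        self.last_write = {}                                 # addr -> (thread, clock snapshot)
        self.races = []

    def _tick(self, t):
        self.clocks[t][t] += 1

    def release(self, t, lock):
        # Publishing t's clock through the lock orders later acquirers after t.
        self._tick(t)
        self.lock_clocks[lock] = dict(self.clocks[t])

    def acquire(self, t, lock):
        # Join the releasing thread's clock into the acquirer's clock.
        for u, c in self.lock_clocks[lock].items():
            self.clocks[t][u] = max(self.clocks[t][u], c)
        self._tick(t)

    def write(self, t, addr):
        self._tick(t)
        prev = self.last_write.get(addr)
        if prev is not None:
            pt, pclock = prev
            # Race unless the previous write happened-before this one,
            # i.e. t has already observed pt's clock at the prior write.
            if pt != t and pclock[pt] > self.clocks[t][pt]:
                self.races.append((addr, pt, t))
        self.last_write[addr] = (t, dict(self.clocks[t]))

# Two unsynchronized writes to the same address: a race is reported.
d = RaceDetector()
d.write("T1", "x")
d.write("T2", "x")
print(d.races)

# The same writes ordered by a lock: no race.
d2 = RaceDetector()
d2.write("T1", "y")
d2.release("T1", "lock")
d2.acquire("T2", "lock")
d2.write("T2", "y")
print(d2.races)
```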
This applies to performance analysis as well as correctness. rr recording invalidates some kinds of performance analysis (e.g. inter-core interactions), but a lot of performance analysis is still meaningful because the application mostly executes the same code as without rr. We could, for example, replay the application under a cache simulator to estimate the cache behaviour of code, validate the simulation by cross-checking against measurements from hardware performance counters gathered during recording, and visualize the results in the context of the complete Pernosco debugging experience.
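A toy version of the cache-simulation idea shows how little machinery the core of such an analysis needs: replay feeds the simulator a stream of memory addresses and it counts hits and misses. The class and parameter names here are made up for illustration; a usable simulator would model associativity, multiple cache levels, and be validated against the recorded hardware counters as described above.

```python
class DirectMappedCache:
    """Toy direct-mapped cache simulator: feed it the trace of memory
    addresses a replay would produce, and count hits and misses."""

    def __init__(self, line_size=64, num_lines=512):
        self.line_size = line_size
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # one tag slot per cache line
        self.hits = self.misses = 0

    def access(self, addr):
        line = addr // self.line_size
        index = line % self.num_lines   # which cache slot this line maps to
        tag = line // self.num_lines    # identifies the line within the slot
        if self.tags[index] == tag:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[index] = tag      # evict whatever was there

cache = DirectMappedCache()
# Sequential scan over 64 cache lines with 8-byte accesses: one miss
# per line, then hits within the line.
for addr in range(0, 64 * 64, 8):
    cache.access(addr)
print(cache.hits, cache.misses)
```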
Beyond the obviously good ideas, there are more tantalizing possibilities that may or may not pay off.
Pernosco often has to decide what information is most relevant to the user, e.g. when selecting the stack frame to show when jumping to a new point in time, determining which alert to highlight at the start of a session, or ordering the results of the search box. Given data collected from user debugging sessions, we might be able to learn good heuristics for which results are most likely to be relevant for specific projects or even across all projects.
One of the most difficult debugging scenarios is explaining why something didn't happen. One way to attack this problem would be to compare an execution where something didn't happen with a "closely related" execution where it did.
Closely-related executions where one passes and the other fails are commonly available. For example, if a failure is reproduced by a minimized testcase, then a small change to the testcase will produce a passing execution. Alternatively, if the failure is a regression due to a small code change, the two different code versions produce closely-related pass/fail executions. If the failure is intermittent, then we have closely-related executions with the same code and testcase. Visualizing the differences between these executions seems likely to be valuable, especially if we can learn good relevance heuristics.