Structured Identifiers

Posted by roc on 4 October 2021

Source-level debuggers devote much interface real estate to program identifiers. To avoid ambiguity we often want to display fully qualified identifiers (e.g. including namespace/module names and the parameters of generic types), but these are often very long and unnecessarily verbose. Pernosco takes advantage of interactivity by eliding selected parts of complex identifiers; clicking on an elided part reveals the elided text (which may itself contain nested elided parts). This requires treating identifiers not as plain strings but as syntax trees, which we call structured identifiers. The need to obtain structured identifiers has implications for the design and implementation of debuginfo formats and demangling APIs.

The syntax tree for an identifier usually matches the structure of the AST generated when parsing the identifier in its programming language. For example, the syntax tree for mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::MaybeGlobalThisPolicy,mozilla::dom::binding_detail::ThrowExceptions> has nodes for the template parameters and namespace prefixes.
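To make the idea concrete, here is a minimal sketch of such a tree and of rendering it with elided parts. The Node class, its fields and the render function are hypothetical names invented for illustration; this is not Pernosco's internal representation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a structured identifier (hypothetical layout)."""
    name: str
    prefix: List[str] = field(default_factory=list)     # enclosing namespaces
    params: List["Node"] = field(default_factory=list)  # template arguments
    show_prefix: bool = False  # UI state: has the prefix been expanded?
    show_params: bool = False  # UI state: have the parameters been expanded?


def render(node: Node) -> str:
    """Render a node, replacing collapsed parts with '...'."""
    out = ""
    if node.prefix:
        if node.show_prefix:
            out += "::".join(node.prefix) + "::"
        else:
            out += "...::"
    out += node.name
    if node.params:
        if node.show_params:
            # Nested nodes keep their own elision state.
            out += "<" + ",".join(render(p) for p in node.params) + ">"
        else:
            out += "<...>"
    return out


ns = ["mozilla", "dom", "binding_detail"]
ident = Node("GenericMethod", ns, [
    Node("MaybeGlobalThisPolicy", ns),
    Node("ThrowExceptions", ns),
])

print(render(ident))      # ...::GenericMethod<...>
ident.show_params = True  # simulate a click on the elided parameters
print(render(ident))      # ...::GenericMethod<...::MaybeGlobalThisPolicy,...::ThrowExceptions>
```

Because each node carries its own expansion state, revealing the parameters of the outer type leaves the namespace prefixes of the inner types elided, which is the nesting behaviour described above.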

Obtaining structured identifiers by demangling

Languages with namespaces (which for our purposes here include features such as C++ classes, structs and unions, Ada packages and Rust modules) typically need to use fully qualified names for linking to avoid symbol collisions. Their compilers perform "name mangling" to convert fully qualified names into simple ASCII identifiers in object files. Fortunately for us, this mangling usually encodes enough tree structure for our purposes. Debuggers already demangle object-file identifiers to human-readable fully qualified names, so we just need to augment the demangler to emit an AST instead of a string.

Many demangling libraries exist, but they tend to focus on string emission and need work to add support for AST emission. For Pernosco we added AST emission to cpp-demangle (a Rust implementation of C++ demangling). GNU Ada has very simple demangling so we created a simple Ada demangler in Rust. The standard Rust demangler rustc-demangle only emits strings, but there is another library ast-demangle that produces ASTs from new (v0) Rust mangled names.
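As a rough illustration of demangling to an AST rather than a string, the sketch below handles only the simplest part of the GNU Ada scheme, in which the components of a qualified name are joined with double underscores. The Qualified class and both function names are hypothetical, and the sketch deliberately ignores overload suffixes, operators and the other special encodings a real demangler must handle. It is written in Python for brevity and is not the Rust demangler mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Qualified:
    """AST node: `name` qualified by an optional enclosing `parent`."""
    parent: Optional["Qualified"]
    name: str


def demangle_ada(symbol: str) -> Qualified:
    """Split a symbol on '__' and build a left-nested AST.

    Simplifying assumption: every '__' is a package separator.
    """
    node: Optional[Qualified] = None
    for part in symbol.split("__"):
        node = Qualified(node, part)
    assert node is not None  # str.split always yields at least one part
    return node


def to_string(node: Qualified) -> str:
    """Flatten the AST back into a dotted Ada-style name."""
    if node.parent is None:
        return node.name
    return to_string(node.parent) + "." + node.name


ast = demangle_ada("ada__text_io__put_line")
print(to_string(ast))  # ada.text_io.put_line
```

The string form is recoverable from the tree at any time, while the reverse direction would require re-parsing, which is why emitting the AST first is the more useful API.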

It would be helpful if future demangling APIs supported AST emission. It's fine for each demangler to produce its own AST format, as long as the AST definition is clear and captures basically the same information as a language parse of the fully qualified name. Pernosco post-processes language-specific ASTs into a language-neutral format for our internal use.

Obtaining structured identifiers from DWARF debuginfo

Sometimes Pernosco needs to display the names of types (e.g. as an annotation on a local variable name). In C++ at least, type names are not always available in mangled form, because there is no linkage symbol associated with a type. Instead we can reconstruct a fully qualified name for the type from the structure of DWARF debuginfo. For example, the DWARF tag for a C++ type is nested inside DWARF tags for its enclosing namespaces, so we can build a structured identifier for a DWARF type by examining its DWARF ancestors.
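The ancestor walk can be sketched as follows. The Die class here is a toy stand-in for a parsed DWARF debugging information entry, not the API of any real DWARF library; only the DW_TAG_* names are taken from DWARF itself. The placeholder used for unnamed scopes is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Die:
    """Toy stand-in for a DWARF debugging information entry."""
    tag: str
    name: Optional[str]
    parent: Optional["Die"] = None


# Tags whose names contribute a component to a qualified C++ name.
SCOPE_TAGS = {
    "DW_TAG_namespace",
    "DW_TAG_structure_type",
    "DW_TAG_class_type",
    "DW_TAG_union_type",
}


def qualified_path(die: Die) -> List[str]:
    """Collect name components from the outermost scope down to `die`."""
    parts = [die.name or "(anonymous)"]
    cur = die.parent
    # Stop at the first ancestor that is not a naming scope,
    # e.g. the compile unit.
    while cur is not None and cur.tag in SCOPE_TAGS:
        parts.append(cur.name or "(anonymous)")
        cur = cur.parent
    parts.reverse()
    return parts


cu = Die("DW_TAG_compile_unit", "example.cpp")
mozilla = Die("DW_TAG_namespace", "mozilla", cu)
dom = Die("DW_TAG_namespace", "dom", mozilla)
element = Die("DW_TAG_class_type", "Element", dom)

print("::".join(qualified_path(element)))  # mozilla::dom::Element
```

Keeping the result as a list of components, rather than joining it immediately, preserves the structure needed to elide individual prefixes later.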

This approach has problems. When a structured identifier contains an instance of a generic type, we want to capture its generic parameters in the AST. DWARF has two sources of information for a type's generic parameters: the type's name is a string containing a representation of those parameters (not mangled!), and there is explicit DWARF debuginfo describing each generic parameter. Interpreting the latter gets hairy; for example, in C++20 a template parameter can be a value of struct/class type, and DWARF gives us that value as a byte vector which we would have to interpret as that type and prettyprint. On the other hand, parsing the type name itself to a structured identifier means effectively having to parse the language's type syntax, which is extremely difficult for C++.
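To see why parsing the name string is fragile, consider the obvious shortcut of counting angle brackets. The function below is a deliberately naive sketch, and both input names are made-up examples rather than names taken from real debuginfo.

```python
from typing import List, Tuple


def split_template_args(name: str) -> Tuple[str, List[str]]:
    """Split 'Base<A,B<C>>' into ('Base', ['A', 'B<C>']) by bracket counting.

    Naive on purpose: it treats every '<' and '>' as a template bracket.
    """
    start = name.find("<")
    if start == -1 or not name.endswith(">"):
        return name, []
    args: List[str] = []
    depth = 0
    cur = ""
    for ch in name[start + 1:-1]:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        if ch == "," and depth == 0:
            # A comma outside nested brackets ends the current argument.
            args.append(cur.strip())
            cur = ""
        else:
            cur += ch
    args.append(cur.strip())
    return name[:start], args


# Works when angle brackets only ever delimit template arguments.
print(split_template_args("std::map<int, std::vector<char> >"))
# ('std::map', ['int', 'std::vector<char>'])

# Breaks when an argument is an expression containing '>':
# the two arguments come back fused into one.
print(split_template_args("X<(a>b), int>"))
# ('X', ['(a>b), int'])
```

Handling the second case correctly means distinguishing a comparison operator from a closing template bracket, which is where a bracket counter has to turn into a real parser for C++ type and expression syntax.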

Because of these difficulties, Pernosco doesn't solve this problem yet. I expect we'll eventually extract the generic parameters from DWARF and prettyprint them, but it would be easier if DWARF just gave us mangled names for all types so we didn't have to try to build our own structured identifiers for types by piecing together DWARF.

Conclusion

Parsing identifiers to an AST is very helpful for debugger interfaces. It's best done as part of identifier demangling, so demanglers should support this. It would be helpful for debuginfo to include mangled names for all types.