What if you could reduce the time it takes to link your program by 25%, reduce the memory it takes to link your program by 40%, and reduce the size of the binary by 50%, all by changing a compiler flag?[1] That's the power of "split DWARF", a compiler and debugger feature that uses a new format for the DWARF debugging information that's specifically designed to reduce the work the linker is required to do. Let's dive into how it works and what is required for you to benefit from it.
Compilers for languages like C++ produce debugging information in the DWARF format for consumption by downstream tools like debuggers, stack walkers, profilers, etc. When configured to enable interactive debugging (e.g. gcc's -g2), the binary contains information for all the lines of code, functions, variables, and types in the program. This adds up quickly, especially in languages that follow the zero-overhead principle or have "zero-cost abstractions". The "zero-overhead" refers to runtime overhead, but an abstraction that is compiled down to a minimal amount of machine code may still require the emission of large amounts of debugging information to correctly describe what that machine code does in terms of the functions and types of the higher-level language. In an optimized build of the Pernosco database builder (which is written in Rust) with debugging information enabled, 95% of the final binary size is the debugging information!
All of that debugging information has to be copied into the final binary by the linker. Some of it, such as the entries that link a function in the debugging information to the actual code in the binary, has to be adjusted by the linker as well based on how the executable code is ultimately laid out in the binary. Processing all of this information consumes CPU and especially memory during the linking phase.
During development, engineers spend a lot of time in an edit-build-run cycle. Even in a large project with hundreds or thousands of source files, a typical build step in this cycle requires recompiling only a small fraction of the source files. It might only require recompiling one! But the linker has to run every time, from scratch, and reprocess all of the object files into the final binary. It's not difficult for the linker to dominate the build step. Some large software projects even have a special development-only configuration that avoids linking the entire project together specifically to reduce latency in the edit-build-run cycle.
Split DWARF addresses this problem by allowing most debugging information to bypass the linker entirely. Some information, such as "function F starts at address X and ends at address Y", does have to be processed by the linker because it depends on relocations. That information is a relatively small part of the total debugging information though. The linker does not change the member variables of a class, or which register a variable lives in inside a given function. All of that bypasses the linker and is placed in a separate "DWARF object" or .dwo file. Debugging information that requires relocation is left in the traditional object file and processed by the linker normally. This is accomplished by introducing a level of indirection. Debugging information that needs to specify an address or other relocatable value is instead modified to contain an index into a table of addresses. This table of addresses forms the new .debug_addr section and is placed in the object file. The debugging information that has had all of its relocatable values replaced with the indices is then placed in the separate .dwo file and never looked at by the linker.
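To make the indirection concrete, here is a minimal Python sketch of the idea. It is a toy model, not the real DWARF encoding: the `FunctionInfo` class, the `link` and `function_range` helpers, the function names, and all addresses are hypothetical and chosen purely for illustration. The point it shows is that the entries destined for the .dwo file hold only table indices, so the only thing the "linker" has to touch is the small address table.

```python
from dataclasses import dataclass


@dataclass
class FunctionInfo:
    """Hypothetical stand-in for a debug-info entry describing one function."""
    name: str
    low_pc_index: int  # index into the address table, not an address
    size: int          # length in bytes; unaffected by relocation


# What would go into the .dwo file: no raw addresses anywhere.
dwo_entries = [
    FunctionInfo("parse", low_pc_index=0, size=0x40),
    FunctionInfo("render", low_pc_index=1, size=0x90),
]

# What stays in the object file: the address table (the .debug_addr role),
# holding section-relative offsets that still need relocating.
object_debug_addr = [0x0, 0x40]


def link(debug_addr, text_base):
    """Model of the linker's job: relocate only the address table."""
    return [text_base + offset for offset in debug_addr]


def function_range(entry, debug_addr):
    """Model of a debugger combining a .dwo entry with the linked table."""
    start = debug_addr[entry.low_pc_index]
    return start, start + entry.size


# Assumed load address of the code, picked arbitrarily for the example.
linked_debug_addr = link(object_debug_addr, text_base=0x401000)

ranges = {e.name: function_range(e, linked_debug_addr) for e in dwo_entries}
for name, (start, end) in ranges.items():
    print(f"{name}: {start:#x}-{end:#x}")
```

Note that `dwo_entries` is never modified by `link`. In this model, as in the scheme described above, the consumer is the one that pays for the indirection, by looking up each index in the relocated table when it needs an actual address.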
The biggest downside of this approach is that it requires the consumers of the debugging information to understand the new format and how to gather all the debugging information across the multiple files it now lives in. The Pernosco omniscient debugger is fully capable of handling the pre-standard GNU extension version of split DWARF as well as the standardized DWARF 5 version, in both the hosted and on-premises versions. Try debugging with Pernosco individual accounts or on-premises yourself today!
[1] Measurements taken by linking the Chromium browser on an EC2 c5d.9xl instance with and without split DWARF enabled.