Where Should the Debugger Set a Breakpoint?

Posted by khuey on 9 November 2021

When you search for executions of a function in Pernosco (or set a breakpoint in a more traditional debugger), how does the debugger decide where to stop? You might think it just stops at the first instruction in a function, but that's not true in general.

Debuggers actually stop immediately after the function's "prologue", and for good reason. At -O0, compilers generally do not produce accurate debug information for the function prologue, and stopping any earlier would lead to the debugger seeing incorrect parameter values. That, in turn, would produce the wrong results when evaluating conditional breakpoints. Finding the end of a function's prologue can be surprisingly difficult though. In this article I'll go through why this is an issue, and some of the heuristics involved in determining where to place a breakpoint.

What is a function prologue?

If you've ever looked at the disassembly of a compiled program, you've probably noticed some common instruction sequences at the beginnings of functions. These instructions are called the prologue of the function. Its job is to handle bookkeeping at function entry. These instructions will do things like set up the stack frame and move parameters from where the calling convention dictates they must be at the function boundary to where they live during the function body.

In the disassembled main function of this example program we can see some features that typically appear in function prologues on x86-64. The initial endbr64 instruction is a marker for a recent hardware security feature (Intel's Control-flow Enforcement Technology), telling it that this is a valid address for an indirect branch to jump to. After that, the push rbp; mov rbp, rsp sequence establishes the "frame pointer" in rbp. The sub rsp, 0x20 allocates 32 bytes of storage for this function on the stack (stacks grow downwards on x86). And finally, the values of the argc and argv variables, which the calling convention dictates are passed in the rdi and rsi registers, are copied onto the stack. Note that the addresses rbp - 0x20 and rbp - 0x14 are both in the 32 bytes of storage we just allocated.

Some or all of these instructions may be missing in an optimized build. The compiler can omit the frame pointer to free up rbp for other uses. If the function is a leaf function (a function that calls no other functions), the compiler can take advantage of the "red zone" (128 bytes of space below rsp on Linux x86-64) and avoid explicitly adjusting rsp. And some variables may not even get stack storage at all in an optimized build, if the optimizer can find space for them in registers.

What is wrong with debug information in the function prologue at -O0?

If we move to a point earlier in the function than where the debugger chose to stop, we can see at the bottom left that the values of the parameters argc and argv are no longer correct. This is because the compiler-provided debug info is incorrect in the function prologue.

Compilers on Linux and other Unix-like machines communicate information about the generated machine code to debuggers and other tools in the DWARF format. Running the dwarfdump tool on the example program here (and eliding a large amount of information irrelevant to this discussion) produces this output:

< 1><0x00000bf8>    DW_TAG_subprogram
                      DW_AT_name                  main
                      DW_AT_low_pc                0x0000173d
                      DW_AT_high_pc               <offset-from-lowpc>179
                      DW_AT_frame_base            len 0x0001: 9c: DW_OP_call_frame_cfa
< 2><0x00000c1a>      DW_TAG_formal_parameter
                        DW_AT_name                  argc
                        DW_AT_location              len 0x0002: 915c: DW_OP_fbreg -36
< 2><0x00000c29>      DW_TAG_formal_parameter
                        DW_AT_name                  argv
                        DW_AT_location              len 0x0002: 9150: DW_OP_fbreg -48
< 2><0x00000c38>      DW_TAG_variable
                        DW_AT_name                  message
                        DW_AT_location              len 0x0002: 9161: DW_OP_fbreg -31

The compiler is telling us that it compiled a subprogram/function named "main". It tells us the generated code starts at the offset 0x173d in the .text section of this binary, and that it continues for 179 bytes. It also tells us that this function has three variables: the parameters argc and argv and the local variable message. The locations of these values are provided as well, in the form of offsets from the "frame base register". The "frame base register" used to be an actual machine register, but with modern compilers (this was compiled with gcc 9) it's usually defined to be the Canonical Frame Address of the call frame, as it is here. The CFA in turn is defined to be the value of the stack pointer immediately before the call instruction that entered the function, which generally puts it 16 bytes above the frame pointer (which, if present, is generally in rbp). I have diagrammed the stack layout relative to both the CFA and rbp below, but the important thing to remember is that rbp-relative addresses can be converted to CFA-relative addresses by subtracting 16 bytes. (Note that I've written the rbp offsets in base 16 and the CFA offsets in base 10 because that's what the disassembler and dwarfdump tools do, respectively.)

Relative to CFA   Relative to RBP   Value
-8                + 0x8             Function return address (saved by the call instruction)
-16                 0               Caller's rbp (saved by the function prologue)
-24               - 0x8             Stack overflow canary (inserted by -fstack-protector)
-31               - 0xf             The local variable message
-36               - 0x14            The parameter argc
-48               - 0x20            The parameter argv
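The relationship between the two columns of the table can be checked mechanically. The following is only a small sketch, assuming the frame pointer is in use so that the CFA sits 16 bytes above rbp; the helper name rbp_to_cfa_offset is made up for illustration.

```python
# Assumption: a frame pointer is in use, so CFA = rbp + 16 as described above.
CFA_MINUS_RBP = 16


def rbp_to_cfa_offset(rbp_offset: int) -> int:
    """Convert an rbp-relative offset into the equivalent CFA-relative offset."""
    return rbp_offset - CFA_MINUS_RBP


# (rbp-relative offset, what lives there), taken from the table above.
ROWS = [
    (0x8, "function return address"),
    (0x0, "caller's rbp"),
    (-0x8, "stack overflow canary"),
    (-0xF, "message"),
    (-0x14, "argc"),
    (-0x20, "argv"),
]

for rbp_offset, what in ROWS:
    cfa_offset = rbp_to_cfa_offset(rbp_offset)
    print(f"rbp{rbp_offset:+#x} is CFA{cfa_offset:+d}: {what}")
```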

If we take another look at the assembly code, this makes sense. We see the value 1 being written to the address rbp - 0x14. This corresponds to the address CFA - 36, which is exactly where the compiler's debug info told us to find argc.
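The -36 in that comparison comes from the raw bytes dwarfdump printed next to DW_AT_location ("915c"). As a rough sketch of how a tool might decode them: 0x91 is the DW_OP_fbreg opcode, and the byte after it is the offset encoded as a signed LEB128 number. The function names here are illustrative, not taken from any particular debugger.

```python
DW_OP_FBREG = 0x91  # opcode value assigned by the DWARF specification


def decode_sleb128(data: bytes) -> int:
    """Decode a signed LEB128 value from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):       # high bit clear: this is the last byte
            if byte & 0x40:         # sign bit of the final 7-bit group
                result -= 1 << shift
            return result
    raise ValueError("truncated SLEB128 value")


def fbreg_offset(expr_hex: str) -> int:
    """Return the frame-base offset from a hex-encoded DW_OP_fbreg expression."""
    expr = bytes.fromhex(expr_hex)
    if not expr or expr[0] != DW_OP_FBREG:
        raise ValueError("not a DW_OP_fbreg expression")
    return decode_sleb128(expr[1:])


for name, raw in [("argc", "915c"), ("argv", "9150"), ("message", "9161")]:
    print(name, fbreg_offset(raw))
```

Running this over the three expressions in the dump reproduces the offsets dwarfdump printed: -36, -48, and -31.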

When compiling without optimizations (i.e. with -O0), compilers will generally allocate a stack slot for every variable in a function. This makes generating code much easier (there's no need to keep track of which variable is in which register, or to allocate registers efficiently) at the cost of producing much worse (i.e. slower) code. Because a variable is always in the same place, when the compiler emits debug information for an unoptimized function, it can simply say that "variable X is always at position Y on the stack". But this is false for one very important class of variables: function parameters. As we saw above, the calling convention dictates where parameters must be on function entry, and one of the tasks of the function prologue is to move parameters from their locations on entry to more suitable locations for the function body. Critically, neither gcc nor llvm-based compilers note this change of location in the debug info at -O0. They incorrectly claim that the parameter/variable is in the stack slot allocated for it all along.

This error in the debug info means that a debugger no longer just has to evaluate a variable, it also has to find the correct point at which to evaluate the variable.

How does a debugger find the end of the prologue?

Debuggers consult another part of the compiler-provided debug information called the line number tables. These tables contain the information needed to map machine code addresses to lines of source code. The relevant entries for this function in our test program are:

<pc>        [lno,col] NS BB ET PE EB IS= DI= uri: "filepath"
0x0000173d  [  61,57] NS
0x00001750  [  61,57] NS
0x0000175f  [  62, 8] NS
0x00001770  [  63, 9] NS
0x00001789  [  64, 7] NS
0x00001795  [  65, 7] NS
0x000017a1  [  66, 7] NS
0x000017ad  [  67, 3] NS
0x000017b3  [  68, 9] NS
0x000017bf  [  70,19] NS
0x000017c4  [  71,26] NS
0x000017c9  [  72, 7] NS
0x000017d5  [  73,10] NS
0x000017da  [  74, 1] NS
0x000017f0  [  74, 1] NS ET

Each line gives an address, a line number and a column number in the source code corresponding to that address, and zero or more flags. The most common flag, NS, stands for "new statement" and represents a statement in the source code. In 2005, DWARF 3 introduced a prologue end flag to the line number tables that debuggers use to convert between lines of source code and addresses in the program. This was intended to replace earlier heuristic-based approaches for finding the end of the prologue. This prologue end flag is represented in the dwarfdump output by PE. However, 16 years later, only llvm actually emits the PE flag, so we have to fall back to heuristics to find the prologue end for this program.

Heuristic #1: A second entry for the same line/column number signals the end of the prologue.

Before the PE flag was proposed, gcc chose to emit a row in the line number table for the start of the function and a second row for the same line/column for the end of the prologue. Because debuggers look for the highest address not exceeding the current address to find the correct row to use when translating an address to a source code location, repeating the line and column information for two addresses that really belong on the same line is harmless. This repetition can be interpreted by a debugger that recognizes it as signaling the end of the prologue.
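To see why the repeated row is harmless, here is a minimal sketch of the "highest address not exceeding the current address" lookup, using the rows from the dwarfdump output above. The function name and the tuple layout are assumptions made for this example; real debuggers keep richer row data.

```python
import bisect

# (address, line, column) rows copied from the line table above.
LINE_TABLE = [
    (0x173D, 61, 57), (0x1750, 61, 57), (0x175F, 62, 8), (0x1770, 63, 9),
    (0x1789, 64, 7), (0x1795, 65, 7), (0x17A1, 66, 7), (0x17AD, 67, 3),
    (0x17B3, 68, 9), (0x17BF, 70, 19), (0x17C4, 71, 26), (0x17C9, 72, 7),
    (0x17D5, 73, 10), (0x17DA, 74, 1),
]
SEQUENCE_END = 0x17F0  # address of the row flagged ET (end of sequence)


def row_for_address(table, end, pc):
    """Return the row with the highest address not exceeding pc, or None."""
    if not table or pc < table[0][0] or pc >= end:
        return None
    addresses = [row[0] for row in table]
    index = bisect.bisect_right(addresses, pc) - 1
    return table[index]


# Addresses on either side of the repeated row both map to line 61, column 57.
print(row_for_address(LINE_TABLE, SEQUENCE_END, 0x1741))
print(row_for_address(LINE_TABLE, SEQUENCE_END, 0x1754))
```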

In the above example we can see that gcc repeated entries for line 61, column 57, first for 0x173d, and later for 0x1750. That's gcc's way of telling debuggers that the prologue ends at 0x1750, and that's where they will set breakpoints.
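A debugger's version of this check might look roughly like the sketch below. It assumes the repeated row immediately follows the function's first row, as it does in this example, and it prefers an explicit PE flag when one is present. The Row type and function name are hypothetical, and real debuggers handle more edge cases than this.

```python
from typing import NamedTuple, Optional, Sequence


class Row(NamedTuple):
    address: int
    line: int
    column: int
    prologue_end: bool = False  # the PE flag, when the compiler emits it


def find_prologue_end(rows: Sequence[Row]) -> Optional[int]:
    """rows: line table rows for one function, sorted by address."""
    # An explicit PE flag wins if the compiler provided one.
    for row in rows:
        if row.prologue_end:
            return row.address
    # Otherwise look for gcc's convention: the second row repeats the
    # line/column of the function's first row.
    if len(rows) >= 2:
        first, second = rows[0], rows[1]
        if (first.line, first.column) == (second.line, second.column):
            return second.address
    return None


# The first few rows of the gcc-compiled example above.
GCC_ROWS = [Row(0x173D, 61, 57), Row(0x1750, 61, 57), Row(0x175F, 62, 8)]
print(hex(find_prologue_end(GCC_ROWS)))
```

When neither signal is present the function returns None, which is the cue to fall back to another heuristic.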

Heuristic #2: Examine the instructions present.

Not all compilers emit a second entry though. The Ada compiler, for example, tends not to. The second heuristic debuggers apply is to look for instructions that appear to be part of a prologue. On x86-64, the push rbp; mov rbp, rsp sequence (or this sequence preceded by an endbr64) is a dead giveaway for a prologue. Determining everything that belongs to the prologue would be too hard, but once it's been determined that a prologue exists at all, the debugger can choose to treat the next entry in the line number table as the end of the prologue.
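As an illustration of this kind of instruction matching, the sketch below recognizes only the one pattern mentioned here (an optional endbr64 followed by push rbp; mov rbp, rsp) by its machine code bytes, then picks the next line table address. The function names and the test addresses are made up for the example, and a real debugger recognizes many more prologue shapes.

```python
from typing import Optional, Sequence

ENDBR64 = bytes.fromhex("f30f1efa")
PUSH_RBP = bytes.fromhex("55")
# mov rbp, rsp has two equivalent encodings; accept either.
MOV_RBP_RSP = (bytes.fromhex("4889e5"), bytes.fromhex("488bec"))


def has_frame_pointer_prologue(code: bytes) -> bool:
    """True if code starts with [endbr64;] push rbp; mov rbp, rsp."""
    if code.startswith(ENDBR64):
        code = code[len(ENDBR64):]
    if not code.startswith(PUSH_RBP):
        return False
    return code[len(PUSH_RBP):].startswith(MOV_RBP_RSP)


def guess_prologue_end(code: bytes, line_addresses: Sequence[int],
                       start: int) -> Optional[int]:
    """If a prologue is present, use the next line table entry after start."""
    if not has_frame_pointer_prologue(code):
        return None
    later = [address for address in line_addresses if address > start]
    return min(later) if later else None


# Hand-assembled bytes for: endbr64; push rbp; mov rbp, rsp; sub rsp, 0x20
code = bytes.fromhex("f30f1efa" "55" "4889e5" "4883ec20")
# Hypothetical line table addresses for a function starting at 0x1000.
print(guess_prologue_end(code, [0x1000, 0x100C, 0x1020], 0x1000))
```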

Heuristic #3: Trust the compiler, if it looks trustworthy.

This is the debug information for the same program, only this time compiled with optimizations enabled.

< 1><0x00000c16>    DW_TAG_subprogram
                      DW_AT_name                  main
                      DW_AT_low_pc                0x00001220
                      DW_AT_high_pc               <offset-from-lowpc>174
                      DW_AT_frame_base            len 0x0001: 9c: DW_OP_call_frame_cfa
< 2><0x00000c38>      DW_TAG_formal_parameter
                        DW_AT_name                  argc
                        DW_AT_location              <loclist at .debug_loc+0x00000008 with 4 entries>
                        [ 0]<offset pair low-off: 0x00001220 addr 0x00001220 high-off: 0x00001263 addr 0x00001263>DW_OP_reg5
                        [ 1]<offset pair low-off: 0x00001263 addr 0x00001263 high-off: 0x000012c8 addr 0x000012c8>DW_OP_reg3
                        [ 2]<offset pair low-off: 0x000012c8 addr 0x000012c8 high-off: 0x000012c9 addr 0x000012c9>DW_OP_GNU_entry_value(DW_OP_reg5) DW_OP_stack_value
                        [ 3]<offset pair low-off: 0x000012c9 addr 0x000012c9 high-off: 0x000012ce addr 0x000012ce>DW_OP_reg3
< 2><0x00000c4c>      DW_TAG_formal_parameter
                        DW_AT_name                  argv
                        DW_AT_location              <loclist at .debug_loc+0x0000006b with 2 entries>
                        [ 0]<offset pair low-off: 0x00001220 addr 0x00001220 high-off: 0x0000122e addr 0x0000122e>DW_OP_reg4
                        [ 1]<offset pair low-off: 0x0000122e addr 0x0000122e high-off: 0x000012ce addr 0x000012ce>DW_OP_GNU_entry_value(DW_OP_reg4) DW_OP_stack_value
< 2><0x00000c60>      DW_TAG_variable
                        DW_AT_name                  message
                        DW_AT_location              len 0x0002: 9161: DW_OP_fbreg -31

The locations for the parameters are different now. They are using a DWARF feature called location lists, where instead of specifying that something is always at one location, the location is described as a function of the current machine code address. We see that the function in this optimized build starts at 0x1220, and that argc is in register 5 from 0x1220 to 0x1263. After that, it is in register 3 from 0x1263 to 0x12c8, and so on. Similarly argv starts off in register 4 before its location changes later in the function as well. Registers 5 and 4, in the DWARF numbering scheme, happen to be rdi and rsi respectively, exactly where the calling convention puts these values on function entry.
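Evaluating a location list amounts to finding the entry whose address range contains the current pc. Here is a minimal sketch using the argc entries from the dump above; the locations are kept as the strings dwarfdump printed rather than being evaluated, and location_at is a name invented for this example.

```python
from typing import Optional

# DWARF register numbers on x86-64 that appear in this example.
DWARF_REG_NAMES = {3: "rbx", 4: "rsi", 5: "rdi"}

# (low, high, location) entries for argc; the high address is exclusive.
ARGC_LOCLIST = [
    (0x1220, 0x1263, "DW_OP_reg5"),
    (0x1263, 0x12C8, "DW_OP_reg3"),
    (0x12C8, 0x12C9, "DW_OP_GNU_entry_value(DW_OP_reg5) DW_OP_stack_value"),
    (0x12C9, 0x12CE, "DW_OP_reg3"),
]


def location_at(loclist, pc: int) -> Optional[str]:
    """Return the location description covering pc, or None if there is none."""
    for low, high, location in loclist:
        if low <= pc < high:
            return location
    return None


# At function entry argc is still in DWARF register 5, i.e. rdi.
print(location_at(ARGC_LOCLIST, 0x1220), "=", DWARF_REG_NAMES[5])
```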

Recall that the motivation for skipping over the function prologue is that compilers tend to emit incorrect debug information for it. But if the compiler does appear to be tracking variable locations at the beginning of functions, a debugger can choose to trust it and simply stop at the first instruction in the function.
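What counts as "trustworthy" is a judgment call. One hypothetical criterion, sketched below, is that every parameter has a location list with an entry covering the function's first instruction. This is an assumption made for illustration, not a description of what any specific debugger checks, and both function names are invented.

```python
from typing import Dict, List, Optional, Tuple

LocList = List[Tuple[int, int, str]]  # (low, high, location), high exclusive


def tracks_entry_locations(low_pc: int,
                           params: Dict[str, Optional[LocList]]) -> bool:
    """True if every parameter has a location list covering function entry.

    A value of None stands for a single fixed location (e.g. DW_OP_fbreg),
    which is what -O0 builds emit. A function with no parameters passes
    trivially.
    """
    for loclist in params.values():
        if loclist is None:
            return False
        if not any(low <= low_pc < high for low, high, _ in loclist):
            return False
    return True


def breakpoint_address(low_pc: int, prologue_end: Optional[int],
                       params: Dict[str, Optional[LocList]]) -> int:
    """Stop at function entry if the debug info looks reliable there."""
    if tracks_entry_locations(low_pc, params):
        return low_pc
    return prologue_end if prologue_end is not None else low_pc


optimized = {
    "argc": [(0x1220, 0x1263, "DW_OP_reg5"), (0x1263, 0x12C8, "DW_OP_reg3")],
    "argv": [(0x1220, 0x122E, "DW_OP_reg4")],
}
unoptimized = {"argc": None, "argv": None}

print(hex(breakpoint_address(0x1220, None, optimized)))
print(hex(breakpoint_address(0x173D, 0x1750, unoptimized)))
```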

Conclusions

Even something as simple as setting a breakpoint on a function can be surprisingly complex. The compiler needs to emit information the debugger can use to map source level concepts such as files, lines of code, and functions, into machine level concepts such as addresses, registers, and stack locations. This information is not always correct or well-structured, and debuggers (including Pernosco) implement several heuristics to manage the problems.