When you search for executions of a function in Pernosco (or set a breakpoint in a more traditional debugger), how does the debugger decide where to stop? You might think it just stops at the first instruction in a function, but that's not true in general.
Debuggers actually stop immediately after the function's "prologue", and for good reason. At -O0, compilers generally do not produce accurate debug information for the function prologue, and stopping any earlier would lead to the debugger seeing incorrect parameter values. That, in turn, would produce the wrong results when evaluating conditional breakpoints. Finding the end of a function's prologue can be surprisingly difficult though. In this article I'll go through why this is an issue, and some of the heuristics involved in determining where to place a breakpoint.
If you've ever looked at the disassembly of a compiled program, you've probably noticed some common instruction sequences at the beginnings of functions. These instructions are called the prologue of the function. Its job is to handle bookkeeping at function entry. These instructions will do things like set up the stack frame and move parameters from where the calling convention dictates they must be at the function boundary to where they live during the function body.
In the disassembled main function of this example program we can see some features that typically appear in function prologues on x86-64. The initial endbr64 instruction is a marker telling a new hardware security feature (the indirect branch tracking part of Intel's Control-flow Enforcement Technology) that this is a valid address to jump to. After that, the push rbp; mov rbp, rsp sequence establishes the "frame pointer" in rbp. Then sub rsp, 0x20 allocates 32 bytes of storage for this function on the stack (stacks grow downwards on x86). And finally, the values of the argc and argv variables, which the calling convention dictates are passed in the rdi and rsi registers, are copied onto the stack. Note that the addresses rbp - 0x20 and rbp - 0x14 are both in the 32 bytes of storage we just allocated.
Some or all of these instructions may be missing in an optimized build. The compiler can omit the frame pointer to free up rbp for other uses. If the function is a leaf function (a function that calls no other functions), the compiler can take advantage of the "red zone" (128 bytes of space below rsp on Linux x86-64) and avoid explicitly adjusting rsp. And some variables may not even get stack storage at all in an optimized build, if the optimizer can find space for them in registers.
If we move to a point earlier in the function than where the debugger chose to stop, we can see at the bottom left that the values of the parameters argc and argv are no longer correct. This is because the compiler-provided debug info is incorrect in the function prologue.
Compilers on Linux and other Unix-like machines communicate information about the generated machine code to debuggers and other tools in the DWARF format. Running the dwarfdump tool on the example program here (and eliding a large amount of information irrelevant to this discussion) produces this output:
    < 1><0x00000bf8>  DW_TAG_subprogram
                        DW_AT_name        main
                        DW_AT_low_pc      0x0000173d
                        DW_AT_high_pc     <offset-from-lowpc>179
                        DW_AT_frame_base  len 0x0001: 9c: DW_OP_call_frame_cfa
    < 2><0x00000c1a>    DW_TAG_formal_parameter
                          DW_AT_name      argc
                          DW_AT_location  len 0x0002: 915c: DW_OP_fbreg -36
    < 2><0x00000c29>    DW_TAG_formal_parameter
                          DW_AT_name      argv
                          DW_AT_location  len 0x0002: 9150: DW_OP_fbreg -48
    < 2><0x00000c38>    DW_TAG_variable
                          DW_AT_name      message
                          DW_AT_location  len 0x0002: 9161: DW_OP_fbreg -31
The compiler is telling us that it compiled a subprogram/function named "main". It tells us the generated code starts at the offset 0x173d in the .text section of this binary, and that it continues for 179 bytes. It also tells us that this function has three variables: the parameters argc and argv and the local variable message. The locations of these values are provided as well, in the form of offsets from the "frame base register". The "frame base register" used to be an actual machine register, but with modern compilers (this was compiled with gcc 9) it's usually defined to be the Canonical Frame Address (CFA) of the call frame, as it is here. The CFA in turn is defined to be the value of the stack pointer immediately before the call instruction that entered the function, which generally puts it 16 bytes above the frame pointer (which, if present, is generally in rbp). I have diagrammed the stack layout relative to both the CFA and rbp below, but the important thing to remember is that rbp-relative addresses can be converted to CFA-relative addresses by subtracting 16 bytes. (Note that I've written the rbp offsets in base 16 and the CFA offsets in base 10 because that's what the disassembler and dwarfdump tools do, respectively.)
| Relative to CFA | Relative to RBP | Value |
|---|---|---|
| -8 | + 0x8 | Function return address (saved by the call instruction) |
| -24 | - 0x8 | Stack overflow canary (inserted by -fstack-protector) |
| -31 | - 0xf | The local variable message |
| -36 | - 0x14 | The parameter argc |
| -48 | - 0x20 | The parameter argv |
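The conversion between the two offset columns is plain arithmetic. Here is a minimal sketch, assuming the frame layout described above (an 8-byte return address and an 8-byte saved rbp, so rbp sits 16 bytes below the CFA). The function name is made up for illustration.

```python
# Assumption: a standard x86-64 frame with a frame pointer. The call
# instruction pushes an 8-byte return address and the prologue pushes the
# 8-byte saved rbp, so rbp == CFA - 16.
RBP_BELOW_CFA = 16


def rbp_to_cfa_offset(rbp_offset: int) -> int:
    """Convert an rbp-relative offset to the equivalent CFA-relative offset."""
    return rbp_offset - RBP_BELOW_CFA


# The rows of the stack layout table: (rbp-relative, CFA-relative).
layout = [(0x8, -8), (-0x8, -24), (-0xF, -31), (-0x14, -36), (-0x20, -48)]
for rbp_off, cfa_off in layout:
    assert rbp_to_cfa_offset(rbp_off) == cfa_off
```

This only holds for functions that keep a frame pointer. In an optimized build where rbp is used for something else, the fixed 16-byte relationship does not apply.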
If we take another look at the assembly code, this makes sense. We see the value 1 being written to the address rbp - 0x14. This corresponds to the address CFA - 36, which is exactly where the compiler's debug info told us to find argc.
When compiling without optimizations (i.e. with -O0), compilers will generally allocate a stack slot for every variable in a function. This makes generating code much easier (there's no need to keep track of which variable is in which register, or to allocate registers efficiently) at the cost of producing much worse (i.e. slower) code. Because a variable is always in the same place, when the compiler emits debug information for an unoptimized function, it can simply say that "variable X is always at position Y on the stack". But this is false for one very important class of variables: function parameters. As we saw above, the calling convention dictates where parameters must be on function entry, and one of the tasks of the function prologue is to move parameters from their locations on entry to more suitable locations for the function body. Critically, neither gcc nor llvm-based compilers note this change of location in the debug info at -O0. They incorrectly claim that the parameter/variable is in the stack slot allocated for it all along.
This error in the debug info means that a debugger no longer just has to evaluate a variable, it also has to find the correct point at which to evaluate the variable.
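To make the failure concrete, here is a toy model (not real debugger code) of what a debugger would read for argc at different points in the prologue. It assumes the -O0 behaviour described above: argc arrives in a register and is only copied to its stack slot partway through the prologue, while the debug info claims it is in the slot the whole time. The step names and the stale slot value are invented for illustration.

```python
# Arbitrary leftover bytes in the not-yet-initialised stack slot (hypothetical).
STALE = 0x7F3A


def run_prologue(argc_in_edi: int):
    """Yield (step, value a debugger reads for argc) after each prologue step.

    The debugger follows the -O0 debug info, so it always reads the stack
    slot, never the register the value actually arrived in.
    """
    slot = STALE
    yield "function entry", slot
    # push rbp; mov rbp, rsp; sub rsp, 0x20 do not write to the slot.
    yield "after frame setup", slot
    slot = argc_in_edi  # the store of edi to rbp - 0x14
    yield "after parameter store", slot


reads = dict(run_prologue(argc_in_edi=1))
for step, value in reads.items():
    print(f"{step}: argc reads as {value}")
```

In this model a conditional breakpoint such as argc == 1 evaluated at function entry would compare against the stale value and fail to trigger, even though the caller really did pass 1.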
Debuggers consult another part of the compiler-provided debug information called the line number tables. These tables contain the information needed to map machine code addresses to lines of source code. The relevant entries for this function in our test program are:
    <pc>        [lno,col] NS BB ET PE EB IS= DI= uri: "filepath"
    0x0000173d  [ 61,57] NS
    0x00001750  [ 61,57] NS
    0x0000175f  [ 62, 8] NS
    0x00001770  [ 63, 9] NS
    0x00001789  [ 64, 7] NS
    0x00001795  [ 65, 7] NS
    0x000017a1  [ 66, 7] NS
    0x000017ad  [ 67, 3] NS
    0x000017b3  [ 68, 9] NS
    0x000017bf  [ 70,19] NS
    0x000017c4  [ 71,26] NS
    0x000017c9  [ 72, 7] NS
    0x000017d5  [ 73,10] NS
    0x000017da  [ 74, 1] NS
    0x000017f0  [ 74, 1] NS ET
Each line gives an address, a line number and a column number in the source code corresponding to that address, and zero or more flags. The most common flag, NS, stands for "new statement" and represents a statement in the source code. In 2005, DWARF 3 introduced a prologue end flag to the line number tables that debuggers use to convert between lines of source code and addresses in the program. This was intended to replace earlier heuristic-based approaches for finding the end of the prologue. This prologue end flag is represented in the dwarfdump output by PE. However, 16 years later, only llvm actually emits the PE flag, so we have to fall back to heuristics to find the prologue end for this program.
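When the flag is present, using it is straightforward. The following is a rough sketch that parses rows in the textual layout of the dump above and looks for a PE flag within a function's address range. The regular expression is inferred from that dump rather than from any dwarfdump specification, and the function names are illustrative.

```python
import re

# Matches rows such as "0x00001750 [ 61,57] NS"; layout inferred from the dump.
ROW = re.compile(r"(0x[0-9a-fA-F]+)\s+\[\s*(\d+),\s*(\d+)\]\s*(.*)")


def parse_row(text):
    """Return (address, line, column, flags), or None if text is not a row."""
    m = ROW.match(text.strip())
    if m is None:
        return None
    addr, line, col, flags = m.groups()
    return int(addr, 16), int(line), int(col), flags.split()


def prologue_end_from_flag(rows, low_pc, high_pc):
    """Return the first address in [low_pc, high_pc) flagged PE, else None."""
    for addr, _line, _col, flags in rows:
        if low_pc <= addr < high_pc and "PE" in flags:
            return addr
    return None


# The first rows of the gcc-produced table: no PE flag anywhere.
gcc_rows = [parse_row(r) for r in (
    "0x0000173d [ 61,57] NS",
    "0x00001750 [ 61,57] NS",
    "0x0000175f [ 62, 8] NS",
)]
print(prologue_end_from_flag(gcc_rows, 0x173D, 0x173D + 179))
```

For the gcc table the function finds nothing, which is the situation where a debugger has to fall back to other approaches.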
Before the PE flag was proposed, gcc chose to emit a row in the line number table for the start of the function and a second row with the same line/column for the end of the prologue. Because debuggers look for the highest address not exceeding the current address to find the correct row to use when translating to source code locations, repeating the line and column information for two addresses that really belong on the same line is harmless. This repetition can be interpreted by a debugger that recognizes it as signaling the end of the prologue.
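The lookup rule just described (take the row with the highest address not exceeding the current address) can be sketched with a binary search. The rows are the first few from the gcc line table shown earlier. The function name is illustrative, and real debuggers also have to handle end-of-sequence rows, which this sketch ignores.

```python
from bisect import bisect_right

# (address, line, column) rows, sorted by address.
ROWS = [(0x173D, 61, 57), (0x1750, 61, 57), (0x175F, 62, 8), (0x1770, 63, 9)]
ADDRS = [row[0] for row in ROWS]


def row_for_pc(pc):
    """Return the row with the highest address not exceeding pc, or None."""
    i = bisect_right(ADDRS, pc)
    return ROWS[i - 1] if i > 0 else None


# Addresses on either side of the repeated row map to the same source position,
# which is why the repetition is harmless for ordinary lookups.
assert row_for_pc(0x1745)[1:] == (61, 57)
assert row_for_pc(0x1758)[1:] == (61, 57)
```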
In the above example we can see that gcc repeated entries for line 61, column 57, first for 0x173d, and later for 0x1750. That's gcc's way of telling debuggers that the prologue ends at 0x1750, and that's where they will set breakpoints.
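A sketch of this first heuristic follows. It assumes that only the row immediately after the function's entry row counts as the repeat, which is a conservative choice made here for illustration, not something taken from any particular debugger's source. The function name is hypothetical.

```python
def prologue_end_from_repeat(rows, low_pc):
    """Find the prologue end from a repeated (line, column) entry.

    rows is a list of (address, line, column) tuples sorted by address.
    Returns the address of the repeated row, or None if the row following
    the function's entry row has a different source position.
    """
    entry = next((r for r in rows if r[0] == low_pc), None)
    if entry is None:
        return None
    for addr, line, col in rows:
        if addr > low_pc:
            # Only the row directly after the entry row is considered.
            return addr if (line, col) == entry[1:] else None
    return None


# The first rows of the line table for main in the unoptimized build.
rows = [(0x173D, 61, 57), (0x1750, 61, 57), (0x175F, 62, 8), (0x1770, 63, 9)]
print(hex(prologue_end_from_repeat(rows, 0x173D)))  # prints 0x1750
```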
Not all compilers emit a second entry though. The Ada compiler, for example, tends not to. The second heuristic debuggers apply is to look for instructions that appear to be part of a prologue. On x86-64, the push rbp; mov rbp, rsp sequence (or this sequence preceded by an endbr64) is a dead giveaway for a prologue. Determining everything that belongs to the prologue would be too hard, but once it's been determined that a prologue exists at all, the debugger can choose to treat the next entry in the line number table as the end of the prologue.
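This second heuristic can be sketched as a byte-pattern check followed by a line table lookup. The byte values are the standard x86-64 encodings of these instructions (mov rbp, rsp has two equivalent encodings). The function names and the sample line table, which deliberately lacks a repeated row, are hypothetical.

```python
ENDBR64 = bytes.fromhex("f30f1efa")
PUSH_RBP = bytes.fromhex("55")
# Two equivalent encodings of mov rbp, rsp.
MOV_RBP_RSP = (bytes.fromhex("4889e5"), bytes.fromhex("488bec"))


def has_frame_setup(code: bytes) -> bool:
    """True if code starts with [endbr64;] push rbp; mov rbp, rsp."""
    if code.startswith(ENDBR64):
        code = code[len(ENDBR64):]
    if not code.startswith(PUSH_RBP):
        return False
    return code[1:4] in MOV_RBP_RSP


def prologue_end_from_pattern(code, rows, low_pc):
    """If a prologue is recognised, return the next line table address."""
    if not has_frame_setup(code):
        return None
    later = [addr for addr, _line, _col in rows if addr > low_pc]
    return min(later) if later else None


# endbr64; push rbp; mov rbp, rsp; sub rsp, 0x20
code = bytes.fromhex("f30f1efa" "55" "4889e5" "4883ec20")
# Hypothetical table from a compiler that does not repeat the entry row.
rows = [(0x173D, 61, 57), (0x175F, 62, 8), (0x1770, 63, 9)]
print(hex(prologue_end_from_pattern(code, rows, 0x173D)))  # prints 0x175f
```

Note that the check only decides whether a prologue exists. It makes no attempt to decode where the prologue finishes, which is why the line table is still needed.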
This is the debug information for the same program, only this time compiled with optimizations enabled.
    < 1><0x00000c16>  DW_TAG_subprogram
                        DW_AT_name        main
                        DW_AT_low_pc      0x00001220
                        DW_AT_high_pc     <offset-from-lowpc>174
                        DW_AT_frame_base  len 0x0001: 9c: DW_OP_call_frame_cfa
    < 2><0x00000c38>    DW_TAG_formal_parameter
                          DW_AT_name      argc
                          DW_AT_location  <loclist at .debug_loc+0x00000008 with 4 entries>
                            [ 0]<offset pair low-off: 0x00001220 addr 0x00001220 high-off: 0x00001263 addr 0x00001263>DW_OP_reg5
                            [ 1]<offset pair low-off: 0x00001263 addr 0x00001263 high-off: 0x000012c8 addr 0x000012c8>DW_OP_reg3
                            [ 2]<offset pair low-off: 0x000012c8 addr 0x000012c8 high-off: 0x000012c9 addr 0x000012c9>DW_OP_GNU_entry_value(DW_OP_reg5) DW_OP_stack_value
                            [ 3]<offset pair low-off: 0x000012c9 addr 0x000012c9 high-off: 0x000012ce addr 0x000012ce>DW_OP_reg3
    < 2><0x00000c4c>    DW_TAG_formal_parameter
                          DW_AT_name      argv
                          DW_AT_location  <loclist at .debug_loc+0x0000006b with 2 entries>
                            [ 0]<offset pair low-off: 0x00001220 addr 0x00001220 high-off: 0x0000122e addr 0x0000122e>DW_OP_reg4
                            [ 1]<offset pair low-off: 0x0000122e addr 0x0000122e high-off: 0x000012ce addr 0x000012ce>DW_OP_GNU_entry_value(DW_OP_reg4) DW_OP_stack_value
    < 2><0x00000c60>    DW_TAG_variable
                          DW_AT_name      message
                          DW_AT_location  len 0x0002: 9161: DW_OP_fbreg -31
The locations for the parameters are different now. They are using a DWARF feature called location lists, where instead of specifying that something is always at one location, the location is described as a function of the current machine code address. We see that the function in this optimized build starts at 0x1220, and that argc is in register 5 from 0x1220 to 0x1263. After that, it is in register 3 from 0x1263 to 0x12c8, and so on. Similarly argv starts off in register 4 before its location changes later in the function as well. Registers 5 and 4, in the DWARF numbering scheme, happen to be rdi and rsi respectively, exactly where the calling convention puts these values on function entry.
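Evaluating a location list amounts to finding the entry whose address range covers the current address. The sketch below transcribes the argc list from the dump above into a simplified representation. The tuple encoding of locations and the function name are invented for illustration, and the ranges are treated as half-open (the high address is the first address past the range).

```python
# Subset of the x86-64 DWARF register numbering used in this example.
DWARF_REG_NAMES = {3: "rbx", 4: "rsi", 5: "rdi"}

# (low, high, location) with high exclusive, transcribed from the argc loclist.
ARGC_LOCLIST = [
    (0x1220, 0x1263, ("reg", 5)),
    (0x1263, 0x12C8, ("reg", 3)),
    (0x12C8, 0x12C9, ("entry_value", 5)),
    (0x12C9, 0x12CE, ("reg", 3)),
]


def location_at(loclist, pc):
    """Return the location that applies at pc, or None if no entry covers it."""
    for low, high, loc in loclist:
        if low <= pc < high:
            return loc
    return None


kind, reg = location_at(ARGC_LOCLIST, 0x1220)
print(kind, DWARF_REG_NAMES[reg])  # prints: reg rdi
```

When no entry covers an address, a debugger has no location for the variable there and typically reports it as unavailable.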
Recall that the motivation for skipping over the function prologue is that compilers tend to emit incorrect debug information for it. But if the compiler does appear to be tracking variable locations at the beginning of functions, a debugger can choose to trust it and simply stop at the first instruction in the function.
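One way to express that decision is sketched below. It assumes that "appears to be tracking variable locations" means every parameter has a location list with an entry covering the function's first instruction. That criterion, the function name, and the fallback callback are all assumptions made for illustration, not a description of what any specific debugger does.

```python
def breakpoint_address(low_pc, param_loclists, find_prologue_end):
    """Pick the address at which to stop for a function breakpoint.

    param_loclists holds one item per parameter: a list of (low, high)
    address ranges if the parameter has a location list, or None if it has
    a single fixed location. find_prologue_end is a fallback callable that
    applies the prologue heuristics and returns an address.
    """
    tracked = bool(param_loclists) and all(
        ranges is not None and any(low <= low_pc < high for low, high in ranges)
        for ranges in param_loclists
    )
    if tracked:
        return low_pc  # trust the debug info and stop at the first instruction
    return find_prologue_end()


# Optimized build: argc and argv both have location lists starting at 0x1220.
optimized = [
    [(0x1220, 0x1263), (0x1263, 0x12C8), (0x12C8, 0x12C9), (0x12C9, 0x12CE)],
    [(0x1220, 0x122E), (0x122E, 0x12CE)],
]
print(hex(breakpoint_address(0x1220, optimized, lambda: None)))  # prints 0x1220

# Unoptimized build: fixed stack slots, so fall back to the heuristics.
print(hex(breakpoint_address(0x173D, [None, None], lambda: 0x1750)))  # prints 0x1750
```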
Even something as simple as setting a breakpoint on a function can be surprisingly complex. The compiler needs to emit information the debugger can use to map source-level concepts such as files, lines of code, and functions into machine-level concepts such as addresses, registers, and stack locations. This information is not always correct or well-structured, and debuggers (including Pernosco) implement several heuristics to manage the problems.