libdwarf
Your thoughts on the document?
A) Are the section and subsection titles on Main Page meaningful to you?
B) Are the titles on the Modules page meaningful to you?
Anything else you find misleading or confusing? Thanks in advance for any suggestions.
This document describes an interface to libdwarf, a library of functions to provide access to DWARF debugging information records, DWARF line number information, DWARF address range and global names information, weak names information, DWARF frame description information, DWARF static function names, DWARF static variables, and DWARF type information. In addition the library provides access to several object sections (created by compiler writers and for debuggers) related to debugging but not mentioned in any DWARF standard.
The document has long mentioned the "Unix International Programming Languages Special Interest Group" (PLSIG), under whose auspices the DWARF committee was formed around 1991. "Unix International" was disbanded in the 1990s and no longer exists.
The DWARF committee published DWARF2 July 27, 1993, DWARF3 in 2005, DWARF4 in 2010, and DWARF5 in 2017.
In the mid 1990s this document and the library it describes (which the committee never endorsed, having decided not to endorse or approve any particular library interface) were made available on the internet by Silicon Graphics, Inc.
In 2005 the DWARF committee began an affiliation with FreeStandards.org. In 2007 FreeStandards.org merged with The Linux Foundation. The DWARF committee dropped its affiliation with FreeStandards.org in 2007 and established the dwarfstd.org website.
Libdwarf can safely open multiple Dwarf_Debug pointers simultaneously, but all such Dwarf_Debug pointers must be opened within the same thread, and all libdwarf calls must be made from within that same thread.
Essentially every libdwarf call could involve dealing with an error (possibly data corruption in the object file). Here we explain the two main approaches the library provides (though we think only one of them is truly appropriate except in toy programs).
A) The suggested approach is to define a Dwarf_Error.
Then, in every call where there is a Dwarf_Error argument pass its address. For example:
The possible return values to res are, in general: DW_DLV_OK, DW_DLV_NO_ENTRY, and DW_DLV_ERROR.
If DW_DLV_ERROR is returned then error is set (by the library) to a pointer to important details about the error. If DW_DLV_NO_ENTRY or DW_DLV_OK is returned the error argument is ignored by the library.
Some functions cannot return all three of these values, as documented later for each function.
B) An alternative (not recommended) approach is to pass NULL to the error argument.
If your initialization provided an 'errhand' function pointer argument (see below) the library will call errhand if an error is encountered. (Your errhand function could simply exit if you so choose.)
The library will then return DW_DLV_ERROR, though you will have no way to identify what the error was. It could be a malloc failure, data corruption, an invalid argument to the call, or something else.
That is the whole picture. The library never calls exit() under any circumstances.
Each initialization call (for example)
has two arguments that appear nowhere else in the library.
If you use the suggested A) approach just pass NULL to both those arguments.
Note that dw_errarg is a pointer so one could create a struct with data of interest and use that pointer as the dw_errarg. Or one could put an integer in there or simply NULL, it just depends what you want to do in the Dwarf_Handler function you write.
If you wish to provide a dw_errhand, define a function (this first example is not a good choice as it terminates the application!).
and pass bad_dw_errhandler (as a function pointer, no parentheses). The Dwarf_Ptr argument is the value you passed in as dw_errarg, and can be anything. By doing an exit() you guarantee that your application abruptly stops. This is only acceptable in toy or practice programs.
A better dw_errhand function is
because it returns. The DW_DLV_ERROR code is returned from libdwarf and your code can do what it likes with the error situation.
If you do not wish to provide a dw_errhand, just pass both arguments as NULL.
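The difference between a handler that exits and one that returns can be sketched without linking libdwarf. This is a minimal sketch under stated assumptions: Sketch_Error, Sketch_Ptr, Sketch_Handler and sketch_failing_call are hypothetical stand-ins for the library's error type, Dwarf_Ptr, handler type and a failing library call, and struct err_stats is just one example of "data of interest" passed as the errarg.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not libdwarf declarations. */
typedef struct sketch_error_s { int code; } *Sketch_Error;
typedef void *Sketch_Ptr;
typedef void (*Sketch_Handler)(Sketch_Error err, Sketch_Ptr errarg);

/* Data of interest that the application hands over as the errarg. */
struct err_stats {
    int count;      /* how many times the handler ran */
    int last_code;  /* code carried by the most recent error */
};

/* A handler that records the error and returns, so the failing call
   can still hand an error code back to its caller. */
void recording_errhandler(Sketch_Error err, Sketch_Ptr errarg)
{
    struct err_stats *stats = (struct err_stats *)errarg;

    if (stats == NULL) {
        return;     /* the errarg may simply be NULL */
    }
    stats->count += 1;
    stats->last_code = (err != NULL) ? err->code : 0;
}

/* Models the behavior described in the text for a NULL error
   argument: call the handler if one was registered, then return
   the error code. The value 1 stands in for DW_DLV_ERROR. */
int sketch_failing_call(Sketch_Handler errhand, Sketch_Ptr errarg,
    int code)
{
    struct sketch_error_s e;

    assert(code > 0);   /* this sketch only models failures */
    e.code = code;
    if (errhand != NULL) {
        errhand(&e, errarg);
    }
    return 1;
}
```

Because the handler returns, the caller still sees the error result and decides what to do next; the struct passed as the errarg is the only channel through which the handler can report details.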
So let us examine a case where anything could happen. And here we are taking the recommended method of using a non-null Dwarf_Error*:
If res == DW_DLV_OK, then newdie is a DIE pointer and when appropriate we should do dwarf_dealloc_die(newdie).
If res == DW_DLV_NO_ENTRY, then newdie is not set and there is no error. In this case it means die was the last of a sibling list. The exact meaning of course depends on the call.
If res == DW_DLV_ERROR then something really bad happened. The only way to know what is to examine the *error as in
If it's a decently large program then you want to free any local memory and return res. In a small and unimportant program, print something and exit.
If you want to discard the error report then
That's all there is to it.
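The three outcomes can be illustrated with a small model that does not link libdwarf. It is a minimal sketch under stated assumptions: struct sketch_error, sketch_next_sibling and count_siblings are hypothetical, the sibling list is modeled as an int array in which 0 ends the list and a negative entry stands for corrupt data, and the numeric values given to the DW_DLV_ names are assumptions of the sketch (real code takes them from libdwarf.h).

```c
#include <assert.h>
#include <stddef.h>

/* Names as used in the text; the numeric values are assumptions. */
#define DW_DLV_NO_ENTRY (-1)
#define DW_DLV_OK        0
#define DW_DLV_ERROR     1

/* Hypothetical stand-in for the details an error pointer refers to. */
struct sketch_error {
    int         code;
    const char *msg;
};

/* Hypothetical stand-in for a sibling lookup. */
int sketch_next_sibling(const int *list, size_t len, size_t cur,
    size_t *next_out, struct sketch_error **err)
{
    static struct sketch_error corrupt = { 1, "corrupt sibling entry" };
    size_t next = cur + 1;

    assert(list != NULL && next_out != NULL);
    if (next >= len || list[next] == 0) {
        return DW_DLV_NO_ENTRY;     /* not an error; *next_out untouched */
    }
    if (list[next] < 0) {
        if (err != NULL) {
            *err = &corrupt;        /* details are set only on error */
        }
        return DW_DLV_ERROR;
    }
    *next_out = next;
    return DW_DLV_OK;
}

/* Caller-side pattern: count the siblings after entry 0 and hand any
   error back to the caller instead of exiting. */
int count_siblings(const int *list, size_t len, size_t *count_out,
    const char **why_out)
{
    struct sketch_error *error = NULL;
    size_t cur = 0;
    size_t count = 0;

    for (;;) {
        size_t next = 0;
        int res = sketch_next_sibling(list, len, cur, &next, &error);

        if (res == DW_DLV_ERROR) {
            *why_out = error->msg;  /* examine the error, then propagate */
            return res;
        }
        if (res == DW_DLV_NO_ENTRY) {
            break;                  /* cur was the last of the list */
        }
        cur = next;
        ++count;
    }
    *count_out = count;
    return DW_DLV_OK;
}
```

The caller's outputs are written only on success, which mirrors the rule that output arguments are meaningful only when DW_DLV_OK is returned.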
Line Table Registers
Please refer to the DWARF5 Standard for details. The line table registers are named in Section 6.2.2 State Machine Registers and are not much changed from DWARF2.
Certain functions on Dwarf_Line data return values for these 'registers' as these are the data available for debuggers and other tools to relate code addresses to source file locations.
DWARF defines (in each version of DWARF) sections which have a somewhat special character.
These are referenced from compilation units and other places and the Standard does not forbid blocks of random bytes at the start or end or between the areas referenced from elsewhere.
Sometimes compilers (or linkers) leave trash behind as a result of optimizations. If a lot of space is wasted that way it is a quality-of-implementation issue. But usually the wasted space, if any, is small.
Compiler writers or others may be interested in looking at these sections independently so libdwarf provides functions that allow reading the sections without reference to what references them.
Abbreviations can be read independently
Strings can be read independently
String Offsets can be read independently
Those functions allow starting at byte 0 of the section and provide a length so you can calculate the next section offset to call or refer to.
Usually that works fine. But if there is some random data somewhere outside of referenced areas the reader function may fail, returning DW_DLV_ERROR. Such an error is neither a compiler bug nor a libdwarf bug.
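The "start at byte 0 and use the returned length to find the next offset" idea can be sketched on a strings-style section held in memory. This is a minimal sketch, not libdwarf code: sketch_get_str and sketch_count_strings are hypothetical names, the section is modeled as back-to-back null-terminated strings, and the numeric values of the DW_DLV_ names are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Names as used in the text; the numeric values are assumptions. */
#define DW_DLV_NO_ENTRY (-1)
#define DW_DLV_OK        0
#define DW_DLV_ERROR     1

/* Hypothetical reader: yields the string that starts at 'offset' and
   its length (terminator not counted). */
int sketch_get_str(const char *section, size_t section_size,
    size_t offset, const char **str_out, size_t *len_out)
{
    const char *start = NULL;
    const char *end = NULL;

    assert(section != NULL && str_out != NULL && len_out != NULL);
    if (offset >= section_size) {
        return DW_DLV_NO_ENTRY;     /* walked off the end: done */
    }
    start = section + offset;
    end = memchr(start, '\0', section_size - offset);
    if (end == NULL) {
        return DW_DLV_ERROR;        /* stray bytes with no terminator */
    }
    *str_out = start;
    *len_out = (size_t)(end - start);
    return DW_DLV_OK;
}

/* Walk the whole section from byte 0, using each returned length to
   compute the next offset to ask for. */
int sketch_count_strings(const char *section, size_t section_size,
    size_t *count_out)
{
    size_t offset = 0;
    size_t count = 0;

    for (;;) {
        const char *s = NULL;
        size_t len = 0;
        int res = sketch_get_str(section, section_size, offset,
            &s, &len);

        if (res == DW_DLV_ERROR) {
            return res;
        }
        if (res == DW_DLV_NO_ENTRY) {
            break;
        }
        ++count;
        offset += len + 1;          /* skip the string and its NUL */
    }
    *count_out = count;
    return DW_DLV_OK;
}
```

The error branch corresponds to the situation described above: unreferenced bytes that do not follow the section's format make the independent reader fail even though nothing is wrong with the referenced data.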
In dealing with .debug_frame or .eh_frame there are a few related values that must be set unless one has relatively few registers in the target ABI (anything under 188 registers, see dwarf.h DW_FRAME_LAST_REG_NUM for this default).
The requirements stem from the design of the section. See the DWARF5 Standard for details.
Keep in mind that register values correspond to columns in the theoretical fully complete table of a row per pc and a column per register.
There is no time or space penalty in setting Undefined_Value, Same_Value, and CFA_Column much larger than the Table_Size.
Here are the five values.
Table_Size: This sets the number of columns in the theoretical table. It starts at DW_FRAME_LAST_REG_NUM, which defaults to 188. This is the only value you might need to change, since the defaults of the others are reasonably large.
Undefined_Value: A register number that means the register value is undefined, for example because a call clobbered the register. DW_FRAME_UNDEFINED_VAL defaults to 12288. There is no such column in the table.
Same_Value: A register number that means the register value is the same as the value at the call. Nothing can have clobbered it. DW_FRAME_SAME_VAL defaults to 12289. There is no such column in the table.
Initial_Value: The value must be either DW_FRAME_UNDEFINED_VAL or DW_FRAME_SAME_VAL to represent how most registers are to be thought of at a function call. This is a property of the ABI and instruction set. Specific frame instructions in the CIE or FDE will override this for registers not matching this value.
CFA_Column: A number for the CFA. Defined so we can use a register number to refer to it. DW_FRAME_CFA_COL defaults to 12290. There is no such column in the table. See libdwarf.h struct member rt3_cfa_rule or function dwarf_get_fde_info_for_cfa_reg3_b.
A set of functions allow these to be changed at runtime. The set should be called (if needed) immediately after initializing a Dwarf_Debug and before any other calls on that Dwarf_Debug. If just one value (for example, Table_Size) needs altering, then just call that single function.
For the library accessing frame data to work properly there are certain invariants that must be true once the set of functions have been called.
REQUIRED:
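The constraints that follow from the descriptions of the five values can be sketched as a consistency check. This is a minimal sketch, not libdwarf code: struct sketch_frame_config and both functions are hypothetical, the requirement that the three special values be pairwise distinct and strictly greater than Table_Size is this sketch's conservative reading of "no such column in the table", and the Initial_Value used in the defaults is only an example because the right choice depends on the ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the five values described above. */
struct sketch_frame_config {
    unsigned table_size;
    unsigned undefined_value;
    unsigned same_value;
    unsigned initial_value;
    unsigned cfa_column;
};

/* Defaults quoted in the text. The initial_value shown is an example
   only; the proper choice is a property of the ABI. */
struct sketch_frame_config sketch_frame_defaults(void)
{
    struct sketch_frame_config c = { 188, 12288, 12289, 12289, 12290 };
    return c;
}

/* Returns 1 if the values are mutually consistent, else 0. */
int sketch_frame_config_ok(const struct sketch_frame_config *c)
{
    assert(c != NULL);
    /* The special numbers must not name a real table column. */
    if (c->undefined_value <= c->table_size ||
        c->same_value <= c->table_size ||
        c->cfa_column <= c->table_size) {
        return 0;
    }
    /* They are markers, so they must be distinguishable. */
    if (c->undefined_value == c->same_value ||
        c->cfa_column == c->undefined_value ||
        c->cfa_column == c->same_value) {
        return 0;
    }
    /* The initial value is one of the two markers. */
    if (c->initial_value != c->undefined_value &&
        c->initial_value != c->same_value) {
        return 0;
    }
    return 1;
}
```

A check like this is cheap to run right after changing any of the values and before reading frame data.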
Each section consists of a header for a specific compilation unit (CU) followed by a set of tuples, each tuple consisting of an offset within that compilation unit followed by a null-terminated name string. The tuple set is ended by a 0,0 pair. The data for the next CU then follows, and so on.
The function set provided for each such section allows one to print all the section data as it literally appears in the section (with headers and tuples) or to treat it as a single array with CU data columns.
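The tuple layout can be illustrated with a small parser over bytes held in memory. This is a minimal sketch, not libdwarf code: struct sketch_name_entry and sketch_read_tuples are hypothetical, the CU header is assumed to have been consumed already, offsets are assumed to be 32 bits in host byte order, and a zero offset is treated as the end marker of the tuple set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One (offset, name) tuple. */
struct sketch_name_entry {
    uint32_t    die_offset;
    const char *name;       /* points into the section bytes */
};

/* Reads tuples from 'data' (the bytes that follow one CU header)
   until the zero offset that ends the set. Returns the number of
   tuples stored, or -1 if the data is truncated or 'max' is too
   small. On success *used_out is the number of bytes consumed,
   which is where the next CU's header would begin. */
int sketch_read_tuples(const unsigned char *data, size_t size,
    struct sketch_name_entry *out, int max, size_t *used_out)
{
    size_t pos = 0;
    int n = 0;

    assert(data != NULL && out != NULL && used_out != NULL);
    for (;;) {
        uint32_t off = 0;
        const unsigned char *nul = NULL;

        if (size - pos < sizeof(off)) {
            return -1;              /* no room for an offset */
        }
        memcpy(&off, data + pos, sizeof(off));
        pos += sizeof(off);
        if (off == 0) {
            break;                  /* end of this CU's tuple set */
        }
        nul = memchr(data + pos, 0, size - pos);
        if (nul == NULL || n >= max) {
            return -1;              /* unterminated name or out full */
        }
        out[n].die_offset = off;
        out[n].name = (const char *)(data + pos);
        ++n;
        pos = (size_t)(nul - data) + 1;
    }
    *used_out = pos;
    return n;
}
```

Calling such a function once per CU gives the literal per-CU view; appending the results of every call to one array gives the single-array view, with the CU identified by which call produced the entry.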
Each has a set of 6 functions.
The following four were defined in SGI/IRIX compilers in the 1990s but were never part of the DWARF standard.
It is not likely you will encounter these.
This most commonly happens with just-in-time compilation, when someone working on the code wants to debug this on-the-fly code in a situation where nothing can be written to disk, but DWARF can be constructed in memory.
For a simple example of this
But the libdwarf feature can be used in a wide variety of ways.
For example, the DWARF data could be kept in simple files of bytes on the internet. Or on the local net. Or if files can be written locally each section could be kept in a simple stream of bytes in the local file system.
Another example is a non-standard file system, or file format, with the intent of obfuscating the file or the DWARF.
For this to work the code generator must generate standard DWARF.
Overall the idea is a simple one: You write a small handful of functions and supply function pointers and code implementing the functions. These are part of your application or library, not part of libdwarf.
You set up a little bit of data with that code (all described below) and then you have essentially written the dwarf_init_path equivalent and you can access compilation units, line tables etc and the standard libdwarf function calls simply work.
Data you need to create involves these types. What follows describes how to fill them in and how to make them work for you.
Dwarf_Obj_Access_Section_a: Your implementation of an om_get_section_info function must simply fill in a few fields (leaving most zero) for libdwarf. The fields here are standard Elf, but for most you can just use the value zero. We assume here you will not be doing relocations at runtime.
as_name: Here you set a section name via the pointer. The section names must be names as defined in the DWARF standard, so if such do not appear in your data you have to create the strings yourself.
as_type: Just fill in zero.
as_flags: Just fill in zero.
as_addr: Fill in the address, in local memory, where the bytes of the section are.
as_offset: Just fill in zero.
as_size: Fill in the size, in bytes, of the section you are telling libdwarf about.
as_link: Just fill in zero.
as_info: Just fill in zero.
as_addralign: Just fill in zero.
as_entrysize: Just fill in one.
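The field rules above can be sketched as a single fill function. This is a minimal sketch under stated assumptions: struct sketch_section_info is a local stand-in that reuses the field names from the list (the field types are assumptions, and real code uses the declaration from libdwarf.h), while struct sketch_mem_section and sketch_fill_section_info are hypothetical application-side names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in using the field names listed above. */
struct sketch_section_info {
    const char *as_name;
    uint64_t    as_type;
    uint64_t    as_flags;
    uint64_t    as_addr;
    uint64_t    as_offset;
    uint64_t    as_size;
    uint64_t    as_link;
    uint64_t    as_info;
    uint64_t    as_addralign;
    uint64_t    as_entrysize;
};

/* Hypothetical description of one section the application holds in
   memory. */
struct sketch_mem_section {
    const char          *name;   /* a standard name, e.g. ".debug_info" */
    const unsigned char *bytes;  /* where the section bytes live */
    size_t               size;   /* number of bytes */
};

/* Fills in the fields as the list above describes: name, address and
   size are real, entry size is one, everything else is zero. */
void sketch_fill_section_info(const struct sketch_mem_section *src,
    struct sketch_section_info *out)
{
    assert(src != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    out->as_name = src->name;
    out->as_addr = (uint64_t)(uintptr_t)src->bytes;
    out->as_size = (uint64_t)src->size;
    out->as_entrysize = 1;
}
```

Zeroing the whole struct first keeps the function correct if more zero-valued fields are ever added to the stand-in.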
Dwarf_Obj_Access_Methods_a_s: The functions we need to access object data from libdwarf are declared here.
In these function pointer declarations 'void *obj' is intended to be a pointer (the object field in Dwarf_Obj_Access_Interface_s) that hides the library-specific and object-specific data that makes it possible to handle multiple object formats and multiple libraries. It's not required that one handles multiple such in a single libdwarf archive/shared-library (but not ruled out either). See dwarf_elf_object_access_internals_t and dwarf_elf_access.c for an example.
Usually the struct Dwarf_Obj_Access_Methods_a_s is statically defined and the function pointers are set at compile time.
The om_get_filesize member is new September 4, 2021. Its position is NOT at the end of the list. The member names all now have om_ prefix.
A typical executable or shared object is unlikely to have any section groups, and in that case what follows is irrelevant and unimportant.
COMDAT groups enable compilers and linkers to work together to eliminate blocks of duplicate DWARF and duplicate CODE.
Debug Fission allows compilers and linkers to separate large amounts of DWARF from the executable, shrinking disk space needed in the executable while allowing full debugging (which also applies to shared objects).
See the DWARF5 Standard, Section E.1 Using Compilation Units page 364.
To name such groups (defined later here) we add the following defines to libdwarf.h (the standard does not specify how to do any of this).
The DW_GROUPNUMBER_ values are used in libdwarf functions dwarf_init_path(), dwarf_init_path_dl() and dwarf_init_b(). In all those cases, unless you know your object file has some complexity, pass in DW_GROUPNUMBER_ANY.
To see section groups usage, see the example source:
The function interface declarations:
If an object file has multiple groups libdwarf will not reveal contents of the other groups. One must pass in another groupnumber to dwarf_init_path, meaning init a new Dwarf_Debug, to get libdwarf to access that group.
When opening a Dwarf_Debug the following applies:
If DW_GROUPNUMBER_ANY is passed in libdwarf will choose either of DW_GROUPNUMBER_BASE (1) or DW_GROUPNUMBER_DWO (2) depending on the object content. If both groups one and two are in the object libdwarf will choose DW_GROUPNUMBER_BASE.
If DW_GROUPNUMBER_BASE is passed in libdwarf will choose it if non-split DWARF is in the object, else the init call will return DW_DLV_NO_ENTRY.
If DW_GROUPNUMBER_DWO is passed in libdwarf will choose it if .dwo sections are in the object, else the init call will return DW_DLV_NO_ENTRY.
If a groupnumber greater than two is passed in libdwarf simply accepts it, whether any sections corresponding to that groupnumber exist or not.
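The selection rules above can be written down as a small decision function. This is a minimal model of the rules as stated, not the library's implementation: sketch_choose_group is hypothetical, the value 0 for DW_GROUPNUMBER_ANY and the numeric values of the DW_DLV_ names are assumptions of the sketch, and the result for DW_GROUPNUMBER_ANY when neither group one nor group two is present is not stated in the text, so it is modeled here as DW_DLV_NO_ENTRY.

```c
#include <assert.h>
#include <stddef.h>

/* Names as used in the text. BASE and DWO values come from the text;
   the other values are assumptions of this sketch. */
#define DW_DLV_NO_ENTRY     (-1)
#define DW_DLV_OK            0
#define DW_GROUPNUMBER_ANY   0
#define DW_GROUPNUMBER_BASE  1
#define DW_GROUPNUMBER_DWO   2

/* has_base: the object holds non-split DWARF (group one).
   has_dwo:  the object holds .dwo sections (group two). */
int sketch_choose_group(unsigned requested, int has_base, int has_dwo,
    unsigned *chosen_out)
{
    assert(chosen_out != NULL);
    switch (requested) {
    case DW_GROUPNUMBER_ANY:
        if (has_base) {             /* base wins when both are present */
            *chosen_out = DW_GROUPNUMBER_BASE;
            return DW_DLV_OK;
        }
        if (has_dwo) {
            *chosen_out = DW_GROUPNUMBER_DWO;
            return DW_DLV_OK;
        }
        return DW_DLV_NO_ENTRY;     /* assumption, see lead-in */
    case DW_GROUPNUMBER_BASE:
        if (!has_base) {
            return DW_DLV_NO_ENTRY;
        }
        *chosen_out = DW_GROUPNUMBER_BASE;
        return DW_DLV_OK;
    case DW_GROUPNUMBER_DWO:
        if (!has_dwo) {
            return DW_DLV_NO_ENTRY;
        }
        *chosen_out = DW_GROUPNUMBER_DWO;
        return DW_DLV_OK;
    default:
        /* Numbers above two are accepted without any check. */
        *chosen_out = requested;
        return DW_DLV_OK;
    }
}
```

The default branch shows why a COMDAT group number deserves care: nothing verifies that sections with that number exist.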
For information on groups, "dwarfdump -i" on an object file will show all section group information unless the object file is a simple standard object with no .dwo sections and no COMDAT groups (in which case the output will be silent on groups). Look for Section Groups data in the dwarfdump output. The groups information appears very early in the dwarfdump output.
Sections that are part of an Elf COMDAT GROUP are assigned a group number > 2. There can be many such COMDAT groups in an object file (but none in an executable or shared object). Each such COMDAT group will have a small set of sections in it and each section in such a group will be assigned the same group number by libdwarf.
Sections that are in a .dwp or .dwo object file are assigned DW_GROUPNUMBER_DWO.
Sections not part of a .dwp package file, a .dwo section, or a COMDAT group are assigned DW_GROUPNUMBER_BASE.
At least one compiler relies on relocations to identify COMDAT groups, but the compiler authors do not publicly document how this works so we ignore such (these COMDAT groups will result in libdwarf returning DW_DLV_ERROR).
Popular compilers and tools are using such sections. There is no detailed documentation that we can find (so far) on how the COMDAT section groups are used, so libdwarf is based on observations of what compilers generate.
There are, at present, two distinct approaches in use to put DWARF information into separate objects to significantly shrink the size of the executable.
One is macOS dSYM. It's a convention of placing the DWARF-containing object in a subdirectory tree.
The other is GNU debuglink and GNU debug_id. These are two distinct ways to provide names of alternative DWARF-containing objects elsewhere in a file system.
If one initializes a Dwarf_Debug object with dwarf_init_path() or dwarf_init_path_dl() appropriately libdwarf will automatically open the alternate object and report on the DWARF there.
libdwarf provides means to automatically read the alternate object (in place of the one named in the init call) or to suppress that and read the named object file.
Case 1:
If dw_true_path_out_buffer or dw_true_path_bufferlen are passed in as zero then the library will not look for an alternative object.
Case 2:
If dw_true_path_out_buffer passes a pointer to space you provide and dw_true_path_bufferlen passes in the length, in bytes, of the buffer, libdwarf will look for alternate DWARF-containing objects. We advise that the caller zero all the bytes in dw_true_path_out_buffer before calling.
If the alternate object name (with its null-terminator) is too long to fit in the buffer the call will return DW_DLV_ERROR with dw_error providing error code DW_DLE_PATH_SIZE_TOO_SMALL.
If the alternate object name fits in the buffer libdwarf will open and use that alternate file in the returned Dwarf_Debug.
It's up to callers to notice that dw_true_path_out_buffer now contains a string and callers will probably wish to do something with the string.
If the initial byte of dw_true_path_out_buffer is non-null when the call returns then an alternative object was found and opened.
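The buffer rules of Case 1 and Case 2 can be modeled in a few lines. This is a minimal sketch of the rules as described, not libdwarf code: sketch_report_true_path and sketch_alternate_was_opened are hypothetical, the search for an alternate object is replaced by a path handed in by the caller (NULL when nothing was found), and the numeric values of the DW_DLV_ names are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Names as used in the text; the numeric values are assumptions. */
#define DW_DLV_OK    0
#define DW_DLV_ERROR 1

/* Models what happens to the caller's buffer. 'alternate' stands for
   the path of an alternate object that was found, or NULL if none. */
int sketch_report_true_path(const char *alternate, char *buf,
    size_t buflen)
{
    size_t needed = 0;

    if (buf == NULL || buflen == 0) {
        return DW_DLV_OK;       /* Case 1: no search is requested */
    }
    if (alternate == NULL) {
        return DW_DLV_OK;       /* nothing found: buffer untouched */
    }
    needed = strlen(alternate) + 1;
    if (needed > buflen) {
        /* Models the DW_DLE_PATH_SIZE_TOO_SMALL case. */
        return DW_DLV_ERROR;
    }
    memcpy(buf, alternate, needed);
    assert(buf[needed - 1] == '\0');    /* terminator copied too */
    return DW_DLV_OK;
}

/* Caller-side test from the text: a non-null first byte means an
   alternate object was found and opened. It is only meaningful if
   the caller zeroed the buffer before the call. */
int sketch_alternate_was_opened(const char *buf, size_t buflen)
{
    return buf != NULL && buflen > 0 && buf[0] != '\0';
}
```

Zeroing the buffer before the call is what makes the first-byte test reliable, since the buffer is left untouched when no alternate object is used.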
The second function, dwarf_init_path_dl(), is the same as dwarf_init_path() except the _dl version has three additional arguments, as follows:
Pass in NULL or dw_dl_path_array, an array of pointers to strings with alternate GNU debuglink paths you want searched. For most people, passing in NULL suffices.
Pass in dw_dl_path_array_size, the number of elements in dw_dl_path_array.
Pass in dw_dl_path_source as NULL or a pointer to char. If non-null libdwarf will set it to one of three values:
DW_PATHSOURCE_basic which means the original input dw_path is the one opened in dw_dbg.
DW_PATHSOURCE_dsym which means a macOS dSYM object was found and is the one opened in dw_dbg. dw_true_path_out_buffer contains the dSYM object path.
DW_PATHSOURCE_debuglink which means a GNU debuglink or GNU debug-id path was found and names the one opened in dw_dbg. dw_true_path_out_buffer contains the object path.
GNU Debuglink-specific issue:
If GNU debuglink is present and considered by dwarf_init_path() or dwarf_init_path_dl() the library may be required to compute a 32bit crc (Cyclic Redundancy Check) on the file found via GNU debuglink.
For people who repeatedly build such objects the crc check is a waste of time, as they know the crc comparison will pass.
For such situations a special interface function lets the dwarf_init_path() or dwarf_init_path_dl() caller suppress the crc check without having any effect on anything else in libdwarf.
It might be used as follows (the same pattern applies to dwarf_init_path_dl() ) for any program that might do multiple dwarf_init_path() or dwarf_init_path_dl() calls in a single program execution.
This pattern ensures the crc check is suppressed for this single dwarf_init_path() or dwarf_init_path_dl() call while leaving the setting unchanged for further dwarf_init_path() or dwarf_init_path_dl() calls in the running program.
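The save, call, restore shape of that pattern can be sketched with stand-ins. This is a minimal sketch, not libdwarf code: sketch_suppress_crc, sketch_init and sketch_init_without_crc are hypothetical, and the sketch assumes that the suppression call returns the previous setting, which is what makes restoring it possible.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the library's program-wide crc setting. */
static int sketch_crc_suppressed = 0;

/* Stand-in for the suppression call: stores the new setting and
   returns the previous one (an assumption of this sketch). */
int sketch_suppress_crc(int suppress)
{
    int previous = sketch_crc_suppressed;

    sketch_crc_suppressed = suppress;
    return previous;
}

/* Stand-in for an init call. It reports whether the crc would have
   been computed so the effect of the setting is visible. */
int sketch_init(int *crc_computed_out)
{
    assert(crc_computed_out != NULL);
    *crc_computed_out = !sketch_crc_suppressed;
    return 0;
}

/* The pattern: suppress for one init call only, then put the earlier
   setting back so later init calls behave as before. */
int sketch_init_without_crc(int *crc_computed_out)
{
    int saved = sketch_suppress_crc(1);
    int res = sketch_init(crc_computed_out);

    sketch_suppress_crc(saved);
    return res;
}
```

Restoring the saved value rather than writing a fixed 0 keeps the pattern correct even if some other part of the program had already suppressed the check.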
We list these with newest first.
Changes 0.4.0 to 0.4.1
libdwarf accepts DW_AT_entry_pc in a compilation unit DIE as a base address for location lists (though it will prefer DW_AT_low_pc if present, per DWARF3). A particular compiler emits DW_AT_entry_pc in a DWARF2 object, requiring this change.
libdwarf adds dwarf_suppress_debuglink_crc() so that library callers can suppress crc calculations. This saves the time of the crc when building and testing the same objects over and over, at the cost of a little checking. Additionally, libdwarf now properly handles objects with only GNU debug-id or only GNU debuglink.
dwarfdump adds --show-args, an option to print its arguments and version. Without that new option the version and arguments are not shown. The output of -v (--version) is a little more complete.
dwarfdump adds --suppress-debuglink-crc, an option to avoid crc calculations when rebuilding and rerunning tests depending on GNU .note.gnu.buildid or .gnu_debuglink sections. The help text and the dwarfdump.1 man page are more specific in documenting --suppress-debuglink-crc and --no-follow-debuglink.
Changes 0.3.4 to 0.4.0
Removed the unused Dwarf_Error argument from dwarf_return_empty_pubnames() as the function can only return DW_DLV_OK. dwarf_xu_header_free() renamed to dwarf_dealloc_xu_header(). dwarf_gdbindex_free() renamed to dwarf_dealloc_gdbindex(). dwarf_loc_head_c_dealloc() renamed to dwarf_dealloc_loc_head_c().
dwarf_get_location_op_value_d() renamed to dwarf_get_location_op_value_c(), and 3 pointless arguments removed. The dwarf_get_location_op_value_d version and the three arguments were added for DWARF5 in libdwarf-20210528 but the change was a mistake. Now reverted to the previous version.
The .debug_names section interfaces have changed. Added dwarf_dnames_offsets() to provide details of facts useful in problems reading the section. dwarf_dnames_name() now does work and the interface was changed to make it easier to use.
Changes 0.3.3 to 0.3.4
Replaced the groff -mm based libdwarf.pdf with a libdwarf.pdf generated by doxygen and latex.
Added support for the meson build system.
Updated an include in libdwarfp source files. Improved doxygen documentation of libdwarf. Now 'make check -j8' and the like work correctly. Fixed a bug where reading a PE (Windows) object could fail for certain section virtual size values. Added initializers to two uninitialized local variables in dwarfdump source so a compiler warning cannot kill a --enable-wall build.
Added src/bin/dwarfexample/showsectiongroups.c so it is easy to see what groups are present in an object without all the other dwarfdump output.
Changes 20210528 to 0.3.3 (28 January 2022)
There were major revisions in going from date versioning to Semantic Versioning. Many functions were deleted and various functions changed their list of arguments. Many filenames changed. Include lists were simplified. Far too much changed to list here.