WWDC 2021: Symbolication - Beyond the basics

What follows is a detailed summary of the video named above, which belongs to a collection of WWDC session summaries.

The original video is available on the official Apple website (session 10211).

"Discover how you can achieve maximum performance and insightful debugging with your app. Symbolication is at the center of tools such as Instruments and LLDB to help bridge the layers between your application's runtime and your source code. Learn how this process works and the steps you can take to gain the most insight into your app."

This session covers the following topics:

The basics
Back to the file
Debug information
Tools & command lines

Most of the illustrations come from Apple's presentation and may be available in the Resources section of the Overview tab of each video.

Below, the underlined elements link directly to the corresponding moment in the WWDC video.

The basics #

Symbolication connects what happens in an app at runtime, expressed as memory addresses, to the lines of source code responsible for it.

Xcode and command-line tools can turn those raw addresses into a more readable form that reveals the file name and line number in the code that gave rise to the reported issue.

Symbolication is a two-step process: going back from runtime addresses to file addresses, then consulting the debug information. Both steps are detailed in the next sections.

Back to the file #

The connection between on-disk addresses and runtime addresses relies on segments: the linker groups the contents of the binary into segments, which the system loads into memory to run the app.

The Mach-O header contains load commands that describe the properties of each segment, and the system must read it before running an app (Mach-O is the format used for all executable binaries and libraries).
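To make the role of the load commands concrete, here is a minimal Python sketch that walks the load commands of a thin, little-endian, 64-bit Mach-O image and collects its segments. The struct layouts follow the public `mach_header_64` and `segment_command_64` definitions; the image itself is synthetic, built by the hypothetical `make_segment` helper purely for illustration, and universal (fat) binaries are not handled.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF   # magic number of a 64-bit little-endian Mach-O file
LC_SEGMENT_64 = 0x19       # load command describing one 64-bit segment


def parse_segments(data: bytes):
    """Return (name, vmaddr, vmsize, fileoff, filesize) for each LC_SEGMENT_64."""
    magic, _cputype, _cpusubtype, _filetype, ncmds, _sizeofcmds, _flags, _reserved = (
        struct.unpack_from("<IiiIIIII", data, 0)
    )
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin 64-bit little-endian Mach-O image")
    segments = []
    offset = 32  # size of mach_header_64
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmd == LC_SEGMENT_64:
            name, vmaddr, vmsize, fileoff, filesize = struct.unpack_from(
                "<16sQQQQ", data, offset + 8
            )
            segments.append(
                (name.rstrip(b"\0").decode(), vmaddr, vmsize, fileoff, filesize)
            )
        offset += cmdsize  # every load command states its own size
    return segments


def make_segment(name, vmaddr, vmsize, fileoff, filesize):
    """Build a synthetic segment_command_64 (72 bytes, no sections) for the demo."""
    return struct.pack("<II16sQQQQiiII", LC_SEGMENT_64, 72, name.encode(),
                       vmaddr, vmsize, fileoff, filesize, 0, 0, 0, 0)


# Synthetic image: an arm64 executable header followed by two segments.
commands = (make_segment("__PAGEZERO", 0, 0x100000000, 0, 0)
            + make_segment("__TEXT", 0x100000000, 0x4000, 0, 0x4000))
header = struct.pack("<IiiIIIII", MH_MAGIC_64, 0x0100000C, 0, 2, 2, len(commands), 0, 0)
image = header + commands

for segment in parse_segments(image):
    print(segment)
```

The `vmaddr` of each segment is the address assigned by the linker, which is the value the runtime address is compared against.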

Runtime addresses differ from linker addresses by a random value chosen by the kernel, called the ASLR slide (Address Space Layout Randomization): runtime address = linker address + slide.
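The arithmetic behind the slide can be sketched in a few lines of Python. The addresses below are invented for illustration; in practice the load address comes from the crash log or the running process, and the linker address from the segment's `vmaddr`.

```python
def aslr_slide(load_address: int, linker_address: int) -> int:
    """Slide = where the segment was actually loaded minus where the linker placed it."""
    return load_address - linker_address


def file_address(runtime_address: int, slide: int) -> int:
    """Translate a runtime address back to the address stored in the file."""
    return runtime_address - slide


# Hypothetical values: __TEXT linked at 0x100000000 but loaded at 0x10045c000.
slide = aslr_slide(0x10045C000, 0x100000000)
print(hex(slide))
print(hex(file_address(0x10045FB70, slide)))
```

Once the file address is known, the debug information described next can be consulted.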

Debug information #

When the app is built, Xcode generates debug information that links the file addresses to the source code. This information comes in several kinds, described below.

Function starts #

This debug information only records that a function exists at a given address, without its name; it is located through the LC_FUNCTION_STARTS load command of the Mach-O header.

A command-line tool can also list all the functions with their starting addresses.

When a crash log lacks function names, this debug information can still help: it gives the start address of the function containing the crashing address, and therefore the offset within that function.
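As a rough illustration of how function starts can be used, the sketch below decodes a function-starts payload and finds the function containing an address. It assumes the commonly documented encoding (a list of ULEB128 deltas, the first relative to the start of the __TEXT segment, terminated by zero); the payload bytes and addresses are made up for the example.

```python
import bisect


def read_uleb128(data: bytes, pos: int):
    """Decode one ULEB128 value starting at pos; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def decode_function_starts(payload: bytes, text_vmaddr: int):
    """Turn the delta-encoded payload into a sorted list of start addresses."""
    starts, address, pos = [], text_vmaddr, 0
    while pos < len(payload):
        delta, pos = read_uleb128(payload, pos)
        if delta == 0:  # a zero delta terminates the list
            break
        address += delta
        starts.append(address)
    return starts


def containing_function(starts, address):
    """Return (function start, offset into it), or None if before the first start.

    Function starts carry no end addresses, so an address past the last
    function is still attributed to that last function.
    """
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None
    return starts[index], address - starts[index]


# Hypothetical payload: deltas 0x3b70, 0x40, 0x80, then the terminator.
payload = bytes([0xF0, 0x76, 0x40, 0x80, 0x01, 0x00])
starts = decode_function_starts(payload, 0x100000000)
print([hex(s) for s in starts])
print(containing_function(starts, 0x100003BC4))
```

The result is the kind of "function start + offset" pair that appears in a crash log when no names are available.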

Nlist symbol table #

🎬 (16:46)

Unlike function starts, which only store addresses, this category of debug information supplies a complete structure of details for each entry, and an entry may be of three different types.

🎬 (17:30)

The direct symbols type gathers the methods and functions that are defined in the app itself.

The nm and symbols commands list the direct symbols of the project with readable names.

Note that not all methods and functions are necessarily listed as direct symbols.

To avoid wasting space on unneeded symbols, use the build settings that control how the app is stripped during the build.

The output of these commands makes it possible to tell whether a function was never among the direct symbols or was simply stripped.

🎬 (22:47)

The indirect symbols type gathers the methods and functions that are used from other frameworks or libraries.
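For readers curious about what an entry of this table looks like, here is a hedged Python sketch that parses `nlist_64` records (16 bytes each) against a string table. The layout and the `N_*` masks come from the public Mach-O headers, but labelling section-defined entries as "defined" and undefined ones as "undefined" is only this sketch's approximation of the session's direct and indirect symbols. The tables are synthetic.

```python
import struct

N_STAB = 0xE0  # if any of these bits are set, the entry is a debugging (stab) entry
N_TYPE = 0x0E  # mask for the type bits
N_UNDF = 0x00  # undefined: the symbol lives in another image
N_SECT = 0x0E  # defined in one of this image's sections


def parse_nlist(symtab: bytes, strtab: bytes):
    """Return (name, kind, value) for every 16-byte nlist_64 entry."""
    symbols = []
    for offset in range(0, len(symtab), 16):
        n_strx, n_type, _n_sect, _n_desc, n_value = struct.unpack_from(
            "<IBBHQ", symtab, offset
        )
        end = strtab.index(b"\0", n_strx)  # names are NUL-terminated
        name = strtab[n_strx:end].decode()
        if n_type & N_STAB:
            kind = "stab"
        elif n_type & N_TYPE == N_SECT:
            kind = "defined"
        elif n_type & N_TYPE == N_UNDF:
            kind = "undefined"
        else:
            kind = "other"
        symbols.append((name, kind, n_value))
    return symbols


# Synthetic tables: one function defined here, one imported from elsewhere.
strtab = b"\0_main\0_printf\0"
symtab = (struct.pack("<IBBHQ", 1, 0x0F, 1, 0, 0x100003B70)   # N_SECT | N_EXT
          + struct.pack("<IBBHQ", 7, 0x01, 0, 0, 0))          # N_UNDF | N_EXT

for symbol in parse_nlist(symtab, strtab):
    print(symbol)
```

The key difference from function starts is visible here: each entry carries a name in addition to an address.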

DWARF #

🎬 (24:19)

Originally designed along with ELF (Executable and Linkable Format), DWARF (sometimes expanded as "Debugging With Attributed Record Formats") is a debugging data format that provides the most detailed debug information.

🎬 (25:54)

A subprogram represents a defined function and is nested inside a compile unit, which represents a single source file.

This tree representation can also be inspected in depth in the output of a command-line tool.

🎬 (28:12)

The debug_line stream contains the line table program, which generates a mapping between addresses and the corresponding file and line of source code.

Browsing the tree is a good way to find the entry that matches a given address.

Running atos with the appropriate parameters leads to the same result.
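The address-to-line mapping described above can be pictured as a sorted table searched by address. The sketch below is a simplification, not a DWARF parser: the rows, the file name `main.swift` and the end address are all hypothetical, and a real line table carries more columns.

```python
import bisect
from typing import NamedTuple, Optional


class LineRow(NamedTuple):
    address: int  # first address covered by this row
    file: str
    line: int


def lookup_line(rows, address: int, end_address: int) -> Optional[LineRow]:
    """Return the row covering address; rows must be sorted by address.

    Each row covers addresses up to the next row; end_address (exclusive)
    closes the range of the last row.
    """
    if not rows or address < rows[0].address or address >= end_address:
        return None
    keys = [row.address for row in rows]
    return rows[bisect.bisect_right(keys, address) - 1]


# Hypothetical line table for one function.
rows = [
    LineRow(0x100003B70, "main.swift", 36),
    LineRow(0x100003B84, "main.swift", 37),
    LineRow(0x100003B98, "main.swift", 40),
]

print(lookup_line(rows, 0x100003B88, end_address=0x100003BB0))
```

An address that falls between two rows belongs to the earlier one, which is why a binary search on the row start addresses is enough.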

🎬 (30:03)

Inlining is a routine compiler optimization that replaces a function call with the body of the called function.

dwarfdump can reveal these relationships: each inlined subroutine appears as a child node of the function it was inlined into and refers back to the original function through its abstract origin attribute.
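To show why inlined subroutines matter for symbolication, the following sketch models the tree as nested address ranges and rebuilds the logical call stack for one address. The node names and ranges are hypothetical, and a real DWARF tree would identify each inlined function through its abstract origin rather than storing the name directly.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    low_pc: int
    high_pc: int  # exclusive
    children: list = field(default_factory=list)  # inlined subroutines


def frames_for(node: Node, address: int):
    """Return the function names covering address, innermost first."""
    if not (node.low_pc <= address < node.high_pc):
        return []
    for child in node.children:
        inner = frames_for(child, address)
        if inner:
            return inner + [node.name]
    return [node.name]


# Hypothetical tree: main contains make_number inlined, which contains pick_choice inlined.
pick_choice = Node("pick_choice", 0x3BA0, 0x3BB0)
make_number = Node("make_number", 0x3B90, 0x3BC0, [pick_choice])
main = Node("main", 0x3B70, 0x3C30, [make_number])

print(frames_for(main, 0x3BA4))
print(frames_for(main, 0x3B74))
```

A single machine address can therefore expand into several source-level frames, which is detail that function starts and the symbol table cannot provide.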

🎬 (31:54)

Tools & command lines #

Beyond the examples provided in the previous sections, the following tips make symbolication easier:

Optimize the build settings for debugging ⟹ 🎬
Check that the dSYM exists and list all the UUIDs of a specific dSYM ⟹ mdfind and symbols 🎬
Verify that the dSYM size doesn't exceed its maximum ⟹ 🎬
Check for DWARF validity ⟹ dwarfdump 🎬
Cross-check the UUID of the app build against the dSYM's ⟹ symbols 🎬
Consult the types of debug information ⟹ symbols 🎬
Inspect your entitlements and your code signing ⟹ codesign 🎬
Specify the architecture for Universal 2 apps ⟹ atos, nm, otool and symbols 🎬
Display the load commands located in the Mach-O header ⟹ otool 🎬
Evaluate the ASLR slide ⟹ atos 🎬
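The tip about Universal 2 apps follows from how a universal binary is laid out: it holds one Mach-O slice per architecture, so a tool has to be told which slice to inspect. As a rough illustration, the Python sketch below lists the slices recorded in a fat header. The layout follows the public `fat_header` and `fat_arch` definitions (big-endian, 32-bit variant); the header bytes are synthetic.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # magic number of a universal (fat) binary
CPU_NAMES = {0x01000007: "x86_64", 0x0100000C: "arm64"}


def list_slices(data: bytes):
    """Return (architecture, file offset, size) for each slice in a fat header."""
    magic, count = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a 32-bit fat header")
    slices = []
    for index in range(count):
        cputype, _cpusubtype, offset, size, _align = struct.unpack_from(
            ">iiIII", data, 8 + 20 * index  # each fat_arch entry is 20 bytes
        )
        slices.append((CPU_NAMES.get(cputype, hex(cputype)), offset, size))
    return slices


# Synthetic fat header describing two slices.
fat = (struct.pack(">II", FAT_MAGIC, 2)
       + struct.pack(">iiIII", 0x01000007, 3, 0x4000, 0x8000, 12)
       + struct.pack(">iiIII", 0x0100000C, 0, 0x10000, 0x8000, 14))

for arch in list_slices(fat):
    print(arch)
```

Each slice is a complete Mach-O image of its own, so addresses obtained from one architecture should not be looked up in the other.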