For example, when I come across library relocation problems, the first thing I do is run ldd on the executable. The ldd tool lists the dependent shared libraries that the executable requires, along with their paths if found.
On OS X though, here's what happens when you try to run ldd.
evil:~ mohit$ ldd /bin/ls
-bash: ldd: command not found
Not Found? But it's on all the common UNIX flavours. I wonder if objdump works.
$ objdump -x /bin/ls
-bash: objdump: command not found
Command not found. What's going on?
The problem is that unlike Linux, Solaris, HP-UX, and many other UNIX variants, OS X does not use ELF binaries. In addition, OS X does not ship with the GNU binutils and glibc packages, which are home to objdump and ldd respectively.
In order to get a list of dependencies for an executable on OS X, you need to use otool.
evil:~ mohit$ otool /bin/ls
otool: one of -fahlLtdoOrTMRIHScis must be specified
Usage: otool [-fahlLDtdorSTMRIHvVcXm] object_file ...
-f print the fat headers
-a print the archive header
-h print the mach header
-l print the load commands
-L print shared libraries used
-D print shared library id name
-t print the text section (disassemble with -v)
-p <routine name> start dissassemble from routine name
-s <segname> <sectname> print contents of section
-d print the data section
-o print the Objective-C segment
-r print the relocation entries
-S print the table of contents of a library
-T print the table of contents of a dynamic shared library
-M print the module table of a dynamic shared library
-R print the reference table of a dynamic shared library
-I print the indirect symbol table
-H print the two-level hints table
-v print verbosely (symbolicly) when possible
-V print disassembled operands symbolicly
-c print argument strings of a core file
-X print no leading addresses or headers
-m don't use archive(member) syntax
evil:~ mohit$ otool -L /bin/ls
/usr/lib/libncurses.5.4.dylib (compatibility version 5.4.0, current version 5.4.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 88.0.0)
Much better. I can see that /bin/ls references two dynamic libraries. Though, the filename extensions don't look at all familiar.
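If you want to use this output in a script, a small parser is enough. The following is a minimal sketch in Python, not part of otool itself: the function name parse_otool_libraries is my own, and the "/bin/ls:" banner line in the sample is an assumption about what otool prints before the list.

```python
import re

# Matches one dependency line of `otool -L` output, for example:
#   /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 88.0.0)
DEP_LINE = re.compile(
    r"^\s*(?P<path>\S.*?)\s+"
    r"\(compatibility version (?P<compat>[\d.]+), "
    r"current version (?P<current>[\d.]+)\)\s*$"
)


def parse_otool_libraries(text):
    """Return (path, compatibility_version, current_version) tuples."""
    deps = []
    for line in text.splitlines():
        match = DEP_LINE.match(line)
        if match:  # banner and blank lines do not match and are skipped
            deps.append((match["path"], match["compat"], match["current"]))
    return deps


# Sample text copied from the otool -L run above (banner line assumed).
sample = (
    "/bin/ls:\n"
    "\t/usr/lib/libncurses.5.4.dylib (compatibility version 5.4.0, current version 5.4.0)\n"
    "\t/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 88.0.0)\n"
)

for path, compat, current in parse_otool_libraries(sample):
    print(path, compat, current)
```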
I'm quite sure that many UNIX / Linux users have had similar experiences while working on OS X systems, so I decided to write a little on what I have learnt so far about OS X executable files.
The OS X Runtime Architecture
A runtime environment is a framework for code execution on OS X. It consists of a set of conventions that define how code is loaded, managed and executed. When an application is launched, the relevant runtime environment loads the program into memory, resolves references to external libraries, and prepares the code for execution.
OS X supports three runtime environments:
- Dyld Runtime Environment: The preferred runtime environment based on the dyld library manager.
- CFM Runtime Environment: A legacy environment inherited from OS 9. This is really designed for applications that want to use some of the newer OS X features, but have not been completely ported to dyld yet.
- The Classic Environment: This environment makes it possible for unmodified OS 9 (9.1 or 9.2) applications to run on OS X.
The Mach-O Executable File Format
In OS X, almost all files containing executable code, e.g., applications, frameworks, libraries, kernel extensions etc., are implemented as Mach-O files. Mach-O is a file format and an ABI (Application Binary Interface) that describes how an executable is to be loaded and run by the kernel. To be more specific, it tells the OS:
- Which dynamic loader to use.
- Which shared libraries to load.
- How to organize the process address space.
- Where the function entry-point is, and more.
To support the Dyld Runtime Environment, all files must be built using the Mach-O executable format.
How Mach-O Files are Organized
Mach-O files are divided into three regions: a header, a load commands region, and the raw segment data. The header and load commands regions describe the features, layout and other characteristics of the file, while the raw segment data region contains ranges of bytes that are referenced by the load commands.
To investigate and examine the various parts of Mach-O files, OS X comes with a useful program called otool located in /usr/bin.
In the following sections, we will use otool to learn more about how Mach-O files are organized.
To view the Mach-O header of a file, use the -h option of the otool command.
evil:~ mohit$ otool -h /bin/ls
magic cputype cpusubtype filetype ncmds sizeofcmds flags
0xfeedface 18 0 2 11 1608 0x00000085
The first thing specified in the header is the magic number. The magic number identifies the file as either a 32-bit or a 64-bit Mach-O file. It also identifies the endianness of the CPU that it was intended for. To decipher the magic number, have a look at /usr/include/mach-o/loader.h.
The header also specifies the target architecture for the file. This allows the kernel to ensure that the code is not run on a processor-type that it was not written for. For example, in the above output, cputype is set to 18, which is CPU_TYPE_POWERPC, as defined in /usr/include/mach/machine.h.
From these two entries alone, we can infer that this binary was intended for 32-bit PowerPC based systems.
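To make the magic number check concrete, here is a rough Python sketch that classifies the first four bytes of a file. The constants are the ones defined in loader.h and fat.h; the function name classify_magic and its return values are my own invention for illustration.

```python
# Magic numbers from /usr/include/mach-o/loader.h and fat.h.
MH_MAGIC = 0xFEEDFACE     # 32-bit Mach-O
MH_MAGIC_64 = 0xFEEDFACF  # 64-bit Mach-O
FAT_MAGIC = 0xCAFEBABE    # universal binary, always stored big-endian


def classify_magic(first_four_bytes):
    """Return (kind, byte_order) for the first four bytes of a file."""
    if len(first_four_bytes) != 4:
        raise ValueError("need exactly four bytes")
    # The same constant read in either byte order tells us the endianness.
    for byte_order in ("big", "little"):
        value = int.from_bytes(first_four_bytes, byte_order)
        if value == MH_MAGIC:
            return ("mach-o 32-bit", byte_order)
        if value == MH_MAGIC_64:
            return ("mach-o 64-bit", byte_order)
    if int.from_bytes(first_four_bytes, "big") == FAT_MAGIC:
        return ("universal", "big")
    return ("unknown", None)


print(classify_magic(bytes.fromhex("feedface")))  # PowerPC-style byte order
print(classify_magic(bytes.fromhex("cefaedfe")))  # same magic, byte-swapped
```

A file whose first bytes read fe ed fa ce is a 32-bit big-endian Mach-O file, which fits the PowerPC cputype shown above.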
Sometimes binaries can contain code for more than one architecture. These are known as Universal Binaries, and generally begin with an additional header called the fat_header. To examine the contents of the fat_header, use the -f switch of the otool command.
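As a rough illustration of what the fat_header holds, the Python sketch below lists the architecture slices in a universal binary. It assumes the layout from /usr/include/mach-o/fat.h (a big-endian magic and count, followed by one fat_arch record per slice). The function name list_fat_archs and the sample offsets and sizes are made up for the example.

```python
import struct

FAT_MAGIC = 0xCAFEBABE
CPU_NAMES = {7: "i386", 18: "ppc"}  # CPU_TYPE_I386, CPU_TYPE_POWERPC


def list_fat_archs(data):
    """Return (cpu_name, offset, size) for each slice of a universal binary."""
    # struct fat_header: magic, nfat_arch (always big-endian)
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a universal binary")
    archs = []
    for index in range(nfat_arch):
        # struct fat_arch: cputype, cpusubtype, offset, size, align (20 bytes)
        cputype, _subtype, offset, size, _align = struct.unpack_from(
            ">iiIII", data, 8 + 20 * index
        )
        archs.append((CPU_NAMES.get(cputype, str(cputype)), offset, size))
    return archs


# Synthetic header with two slices; offsets and sizes are invented.
fake = struct.pack(">II", FAT_MAGIC, 2)
fake += struct.pack(">iiIII", 18, 0, 4096, 30000, 12)
fake += struct.pack(">iiIII", 7, 3, 36864, 28000, 12)
print(list_fat_archs(fake))
```

Each slice found this way is a complete Mach-O file in its own right, starting at the given offset.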
The cpusubtype attribute specifies the exact model of the CPU, and is generally set to CPU_SUBTYPE_POWERPC_ALL or CPU_SUBTYPE_I386_ALL.
The filetype signifies how the file is to be aligned and used. It usually tells you if the file is a library, a standard executable, a core file etc. The filetype above equates to MH_EXECUTE, which signifies a demand paged executable file. Below is a snip from /usr/include/mach-o/loader.h that lists the different file-types as of this writing.
#define MH_OBJECT 0x1 /* relocatable object file */
#define MH_EXECUTE 0x2 /* demand paged executable file */
#define MH_FVMLIB 0x3 /* fixed VM shared library file */
#define MH_CORE 0x4 /* core file */
#define MH_PRELOAD 0x5 /* preloaded executable file */
#define MH_DYLIB 0x6 /* dynamically bound shared library */
#define MH_DYLINKER 0x7 /* dynamic link editor */
#define MH_BUNDLE 0x8 /* dynamically bound bundle file */
#define MH_DYLIB_STUB 0x9 /* shared library stub for static */
/* linking only, no section contents */
The next two attributes refer to the load commands section, and specify the number and size of the commands.
And finally, we have the flags, which specify various features that the kernel may use while loading and executing Mach-O files.
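Putting the header fields together, here is a small Python sketch that decodes the seven 32-bit fields of a 32-bit mach_header. It is fed the values otool printed for /bin/ls rather than the real file, and it assumes big-endian byte order as on PowerPC. The flag names are a subset of those in loader.h; parse_mach_header and the load_commands_end key are names I made up.

```python
import struct

MACH_HEADER_SIZE = 28  # seven 32-bit fields

FILETYPES = {
    0x1: "MH_OBJECT", 0x2: "MH_EXECUTE", 0x3: "MH_FVMLIB", 0x4: "MH_CORE",
    0x5: "MH_PRELOAD", 0x6: "MH_DYLIB", 0x7: "MH_DYLINKER", 0x8: "MH_BUNDLE",
    0x9: "MH_DYLIB_STUB",
}

# A subset of the flag bits defined in loader.h.
FLAG_BITS = {
    0x1: "MH_NOUNDEFS", 0x2: "MH_INCRLINK", 0x4: "MH_DYLDLINK",
    0x8: "MH_BINDATLOAD", 0x10: "MH_PREBOUND", 0x20: "MH_SPLIT_SEGS",
    0x40: "MH_LAZY_INIT", 0x80: "MH_TWOLEVEL", 0x100: "MH_FORCE_FLAT",
}


def parse_mach_header(data, byte_order=">"):
    """Decode a 32-bit mach_header into a dict of readable values."""
    fields = struct.unpack_from(byte_order + "IiiIIII", data, 0)
    magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags = fields
    return {
        "magic": hex(magic),
        "cputype": cputype,
        "cpusubtype": cpusubtype,
        "filetype": FILETYPES.get(filetype, hex(filetype)),
        "ncmds": ncmds,
        "sizeofcmds": sizeofcmds,
        "flags": [name for bit, name in sorted(FLAG_BITS.items()) if flags & bit],
        # The load commands follow the header directly.
        "load_commands_end": MACH_HEADER_SIZE + sizeofcmds,
    }


# Values taken from the otool -h output for /bin/ls shown earlier.
header = struct.pack(">IiiIIII", 0xFEEDFACE, 18, 0, 2, 11, 1608, 0x85)
for key, value in parse_mach_header(header).items():
    print(key, value)
```

With these inputs the flags value 0x85 breaks down into MH_NOUNDEFS, MH_DYLDLINK and MH_TWOLEVEL, and the load commands region ends at byte 1636 of the file.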
The load commands region contains a list of commands that tell the kernel how to load the various raw segments within the file. They basically describe how each segment is aligned, protected and laid out in memory.
To see the list of load commands within a file, use the -l switch of the otool command.
evil:~/Temp mohit$ otool -l /bin/ls
Load command 0
Load command 1
align 2^2 (4)
[ ___SNIPPED FOR BREVITY___ ]
Load command 4
name /usr/lib/dyld (offset 12)
Load command 5
name /usr/lib/libncurses.5.4.dylib (offset 24)
time stamp 1111407638 Mon Mar 21 07:20:38 2005
current version 5.4.0
compatibility version 5.4.0
Load command 6
name /usr/lib/libSystem.B.dylib (offset 24)
time stamp 1111407267 Mon Mar 21 07:14:27 2005
current version 88.0.0
compatibility version 1.0.0
Load command 7
Load command 8
Load command 9
Load command 10
r0 0x00000000 r1 0x00000000 r2 0x00000000 r3 0x00000000 r4 0x00000000
r5 0x00000000 r6 0x00000000 r7 0x00000000 r8 0x00000000 r9 0x00000000
r10 0x00000000 r11 0x00000000 r12 0x00000000 r13 0x00000000 r14 0x00000000
r15 0x00000000 r16 0x00000000 r17 0x00000000 r18 0x00000000 r19 0x00000000
r20 0x00000000 r21 0x00000000 r22 0x00000000 r23 0x00000000 r24 0x00000000
r25 0x00000000 r26 0x00000000 r27 0x00000000 r28 0x00000000 r29 0x00000000
r30 0x00000000 r31 0x00000000 cr 0x00000000 xer 0x00000000 lr 0x00000000
ctr 0x00000000 mq 0x00000000 vrsave 0x00000000 srr0 0x00001ac4 srr1 0x00000000
The above file has 11 load commands located directly below the header, numbered 0 to 10.
The first four commands (LC_SEGMENT), numbered 0 to 3, define how segments within the file are to be mapped into memory. A segment defines a range of bytes in the Mach-O binary, and can contain zero or more sections. We will talk more about segments later.
Load command 4 (LC_LOAD_DYLINKER) specifies which dynamic linker to use. This is almost always set to /usr/lib/dyld, which is the default OS X dynamic library linker.
Commands 5 and 6 (LC_LOAD_DYLIB) specify the shared libraries that this file links against. These are loaded by the dynamic loader specified in command 4.
Commands 7 and 8 (LC_SYMTAB, LC_DYSYMTAB) specify the symbol tables used by the file and the dynamic linker respectively. Command 9 (LC_TWOLEVEL_HINTS) contains the hint table for the two-level namespace.
And finally, command 10 (LC_UNIXTHREAD), defines the initial state of the main thread of the process. This command is only included in executable files.
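The walk through the load commands can be sketched in a few lines of Python. Every load command starts with a command number and a size, so a reader can hop from one to the next without understanding each type. This sketch assumes a 32-bit big-endian file whose commands start at byte 28, and it only decodes the two command types that carry a path name. The image it reads is synthetic, built from the names and versions shown in the otool -l output; walk_load_commands, read_cstring, version_string and pad4 are hypothetical helper names.

```python
import struct

LC_NAMES = {
    0x1: "LC_SEGMENT", 0x2: "LC_SYMTAB", 0x5: "LC_UNIXTHREAD",
    0xB: "LC_DYSYMTAB", 0xC: "LC_LOAD_DYLIB", 0xE: "LC_LOAD_DYLINKER",
    0x16: "LC_TWOLEVEL_HINTS",
}


def read_cstring(data, start, limit):
    """Read a NUL-terminated string that ends before `limit`."""
    end = data.find(b"\0", start, limit)
    if end == -1:
        end = limit
    return data[start:end].decode("ascii")


def version_string(packed):
    """Unpack the X.Y.Z encoding: 16 bits major, 8 minor, 8 patch."""
    return "%d.%d.%d" % (packed >> 16, (packed >> 8) & 0xFF, packed & 0xFF)


def walk_load_commands(data, ncmds, start=28):
    """Return (command_name, detail) for each load command."""
    results = []
    offset = start
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from(">II", data, offset)
        detail = None
        if cmd == 0xC:  # dylib_command: name offset, timestamp, versions
            name_off, _stamp, current, compat = struct.unpack_from(
                ">IIII", data, offset + 8
            )
            detail = (
                read_cstring(data, offset + name_off, offset + cmdsize),
                version_string(current),
                version_string(compat),
            )
        elif cmd == 0xE:  # dylinker_command: name offset only
            (name_off,) = struct.unpack_from(">I", data, offset + 8)
            detail = read_cstring(data, offset + name_off, offset + cmdsize)
        results.append((LC_NAMES.get(cmd, hex(cmd)), detail))
        offset += cmdsize  # cmdsize always leads to the next command
    return results


def pad4(raw):
    """Pad to a multiple of four bytes, as load commands are."""
    return raw + b"\0" * (-len(raw) % 4)


# Synthetic image: a blank 28-byte header followed by two commands.
dyld_name = pad4(b"/usr/lib/dyld\0")
lib_name = pad4(b"/usr/lib/libSystem.B.dylib\0")
image = (
    b"\0" * 28
    + struct.pack(">III", 0xE, 12 + len(dyld_name), 12) + dyld_name
    + struct.pack(">IIIIII", 0xC, 24 + len(lib_name), 24,
                  1111407267, 88 << 16, 1 << 16) + lib_name
)
for entry in walk_load_commands(image, 2):
    print(entry)
```

The name offsets of 12 and 24 used here are the same "offset 12" and "offset 24" values that otool reports next to each name.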
Segments and Sections
Most of the load commands mentioned above make references to segments within the file. A segment is a range of bytes within a Mach-O file that is mapped directly into virtual memory by the kernel and the dynamic linker. The header and load commands regions are considered to be the first segment of the file.
A typical OS X executable generally has five segments:
- __PAGEZERO : Located at virtual memory address 0 and has no protection rights. This segment occupies no space in the file, and causes access to NULL to immediately crash.
- __TEXT : Contains read-only data and executable code.
- __DATA : Contains writable data. These sections are generally marked copy-on-write by the kernel.
- __OBJC : Contains data used by the Objective C language runtime.
- __LINKEDIT : Contains raw data used by the dynamic linker.
To see the contents of a section, use the -s option with the otool command.
evil:~/Temp mohit$ otool -sv __TEXT __cstring /bin/ls
Contents of (__TEXT,__cstring) section
00006320 00000000 5f5f6479 6c645f6d 6f645f74
00006330 65726d5f 66756e63 73000000 5f5f6479
00006340 6c645f6d 616b655f 64656c61 7965645f
00006350 6d6f6475 6c655f69 6e697469 616c697a
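The hex words above are not very readable, so here is a small Python sketch that turns the dump back into text. It assumes the words are printed in file order (big-endian), which seems to hold for this PowerPC binary because the decoded bytes form sensible strings. decode_dump_line is a name I made up.

```python
def decode_dump_line(line):
    """Turn one line of the section dump into (address, raw_bytes)."""
    address, *words = line.split()
    # Assumption: each 8-digit word is big-endian, so the hex digits
    # are already in the order the bytes appear in the file.
    return int(address, 16), bytes.fromhex("".join(words))


# The four dump lines copied from the otool output above.
dump = """\
00006320 00000000 5f5f6479 6c645f6d 6f645f74
00006330 65726d5f 66756e63 73000000 5f5f6479
00006340 6c645f6d 616b655f 64656c61 7965645f
00006350 6d6f6475 6c655f69 6e697469 616c697a
"""

raw = b"".join(decode_dump_line(line)[1] for line in dump.splitlines())
# C strings are NUL-terminated, so split on zero bytes and drop the empties.
strings = [chunk.decode("ascii") for chunk in raw.split(b"\0") if chunk]
print(strings)
```

The second string comes out cut short only because the dump shown here stops after four lines.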
To disassemble the __text section, use the -tv switch.
evil:~/Temp mohit$ otool -tv /bin/ls
00001ac4 or r26,r1,r1
00001ac8 addi r1,r1,0xfffc
00001acc rlwinm r1,r1,0,0,26
00001ad0 li r0,0x0
00001ad4 stw r0,0x0(r1)
00001ad8 stwu r1,0xffc0(r1)
00001adc lwz r3,0x0(r26)
00001ae0 addi r4,r26,0x4
Within the __TEXT segment, there are four major sections:
- __text : The compiled machine code for the executable.
- __const : General constants data.
- __cstring : Literal string constants.
- __picsymbol_stub : Position-independent code stub routines used by the dynamic linker.
Running an Application
Now that we know what a Mach-O file looks like, let us see how OS X loads and runs an application.
When you run an application, the shell first calls the fork() system call. Fork creates a logical copy of the calling process (the shell) and schedules it for execution. This child process then calls the execve() system call providing the path of the program to be executed.
The kernel loads the specified file, and examines its header to verify that it is a valid Mach-O file. It then starts interpreting the load commands, replacing the child process's address space with segments from the file.
At the same time, the kernel also executes the dynamic linker specified by the binary, which proceeds to load and link all the dependent libraries. Once it has bound just enough symbols to start running the file, it calls the entry-point function.
The entry-point function is usually a standard function statically linked in from /usr/lib/crt1.o at build time. This function initializes the runtime environment and calls the executable's main() function.
The application is now running.
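The fork and execve steps at the start of this sequence are easy to try for yourself. Below is a minimal Python sketch of what a shell does when it launches a program, using the os module's thin wrappers around the same system calls. The function name spawn_and_wait and the exit code 127 for a failed exec are my own choices, the latter borrowed from common shell convention.

```python
import os


def spawn_and_wait(path, argv, env=None):
    """Fork, exec `path` in the child, and return the child's exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the new program.
        try:
            os.execve(path, argv, dict(os.environ) if env is None else env)
        finally:
            # Only reached if execve failed (for example, file not found).
            os._exit(127)
    # Parent (playing the shell): wait for the child to finish.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # child was killed by a signal


if __name__ == "__main__":
    print(spawn_and_wait("/bin/sh", ["sh", "-c", "exit 3"]))
```

Everything the kernel and dyld do with the Mach-O file happens inside that single execve() call; a successful execve never returns to the code that called it.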
The Dynamic Linker
The OS X dynamic linker, /usr/lib/dyld, is responsible for loading dependent shared libraries, importing the various symbols and functions, and binding them into the current process.
When the process is first started, all the linker does is import the shared libraries into the address space of the process. Depending on how the program was built, the actual binding may be performed at different stages of its execution.
- Immediately after loading, as in load-time binding.
- When a symbol is referenced, as in just-in-time binding.
- Before the process is even executed, as in pre-binding, an optimization technique.
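To get a feel for the difference between load-time and just-in-time binding, here is a toy model in Python. It is only an analogy and not how dyld is implemented: the "symbol" is a callable that looks up its real target on first use and remembers it afterwards. LazySymbol and resolve_from_math are hypothetical names, and Python's math module stands in for a shared library.

```python
import math


class LazySymbol:
    """Toy lazily bound symbol: resolve on first call, then reuse."""

    def __init__(self, name, resolver):
        self.name = name
        self._resolver = resolver  # plays the role of the dynamic linker
        self._target = None
        self.resolutions = 0       # how many times binding was performed

    def __call__(self, *args):
        if self._target is None:   # first reference: bind now
            self._target = self._resolver(self.name)
            self.resolutions += 1
        return self._target(*args)  # later calls skip the lookup


def resolve_from_math(name):
    # Stand-in "library": look the symbol up in Python's math module.
    return getattr(math, name)


sqrt = LazySymbol("sqrt", resolve_from_math)
print(sqrt.resolutions)          # nothing has been bound yet
print(sqrt(16.0), sqrt(25.0))    # the first call triggers the lookup
print(sqrt.resolutions)          # still only one lookup after two calls
```

Load-time binding would correspond to calling the resolver for every symbol up front, before the first line of the program runs.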
An application can only continue to run when all the required symbols and segments from all the different object files can be resolved. In order to find libraries and frameworks, the standard dynamic linker, /usr/lib/dyld, searches a predefined set of directories. To override these directories, or to provide fallback paths, the DYLD_LIBRARY_PATH or DYLD_FALLBACK_LIBRARY_PATH environment variables can be set to a colon-separated list of directories.
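As a rough sketch of how these two variables affect the search, the Python function below builds the list of locations that would be tried for a library. This is my simplified reading of the dyld man page and not dyld's actual code: it ignores frameworks and the other DYLD_ variables, and the default fallback list is taken from the man page as I understand it. candidate_paths and split_dirs are made-up names.

```python
import posixpath

# Default fallback directories, per my reading of the dyld man page.
DEFAULT_FALLBACK = ["$HOME/lib", "/usr/local/lib", "/lib", "/usr/lib"]


def split_dirs(value):
    """Split a colon-separated list, dropping empty entries."""
    return [directory for directory in value.split(":") if directory]


def candidate_paths(install_name, env):
    """Return the places to look for a library, in order (simplified)."""
    leaf = posixpath.basename(install_name)
    if "DYLD_FALLBACK_LIBRARY_PATH" in env:
        fallback = split_dirs(env["DYLD_FALLBACK_LIBRARY_PATH"])
    else:
        home = env.get("HOME", "")
        fallback = [d.replace("$HOME", home) for d in DEFAULT_FALLBACK]

    # 1. DYLD_LIBRARY_PATH directories, 2. the recorded path, 3. fallbacks.
    ordered = [posixpath.join(d, leaf)
               for d in split_dirs(env.get("DYLD_LIBRARY_PATH", ""))]
    ordered.append(install_name)
    ordered += [posixpath.join(d, leaf) for d in fallback]

    unique = []
    for path in ordered:  # drop repeats while keeping the order
        if path not in unique:
            unique.append(path)
    return unique


env = {"HOME": "/Users/mohit", "DYLD_LIBRARY_PATH": "/opt/lib:/tmp/lib"}
for path in candidate_paths("/usr/lib/libncurses.5.4.dylib", env):
    print(path)
```

Note how a directory in DYLD_LIBRARY_PATH is tried before the path recorded in the binary, which is why this variable is handy for testing a replacement library.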
As you can see, executing a process in OS X is a complex affair, and I have tried to cover as much as is necessary for a useful debugging session.
To learn more about Mach-O executables, otool, and the OS X kernel in general, here is a list of references that I would recommend:
Mac OS X ABI Mach-O File Format Reference
Executing Mach-O Files
Overview of Dynamic Libraries
The otool man page
The dyld man page
2006/03/28 - Looks like this article was Slashdotted and Dugg. It has been slightly modified since, thanks to a few readers who pointed out errors and typos within.
2006/03/28 - I have answered some of your questions and comments regarding this article here: Q&A: How OS X Executes Applications.