Post: Linking, Loaders and Shared Libraries

Notes on the following resource:

Intro

Binary: either a shared object or an executable

Linking Intro:

Compiler: the build starts with source files, which the compiler compiles into object files.
- Translation unit: a source file after the preprocessor has added all its includes
- Object file: a compiled translation unit
  - a container of different sections (e.g. .text, .data)
    - .bss: zero-initialized data, which takes no space in the file
Linker:
- Pulls together the sections of the different object files and concatenates them
- e.g. puts all .text (and all .data) from all object files together
Loader:
- Loads the different sections into memory as segments with the correct permissions
  - .data: r/w
  - .text: r/x
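A minimal sketch of where a compiler typically places different kinds of definitions, assuming GCC or Clang targeting ELF on Linux (the names are hypothetical; running readelf -S or objdump -t on the object file shows the actual placement):

```cpp
#include <cassert>

// Initialized, writable global: typically emitted into .data,
// which the loader maps read/write.
int initialized_counter = 42;

// Zero-initialized global: typically emitted into .bss; the object file
// only records its size, and the loader maps zero-filled memory for it.
int zeroed_buffer[1024];

// Machine code: emitted into .text, which the loader maps read/execute.
int bump_counter() {
    return ++initialized_counter;
}
```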

Shared library function call:

Normal: code calls a function in its own binary
Shared library: the shared library is mapped into the process's address space, and the code calls into it
- a shared library can itself call into other shared libraries

Linking with text relocations:

Purpose: call a function from another object file/shared library
In the .text section, the call targets an arbitrary placeholder address
- e.g. call 0x00000 at address 0x1000
In .reloc (like a TODO list for the linker), an entry states “find foo and write its address at 0x1000”
Disadvantages:
- Modifying the .text segment makes it unshareable between processes
- Relocation must be done once per call site instead of once per function
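To make the per-call-site cost concrete, here is a toy model, not a real object-file format: `TextRelocation` and `apply_text_relocations` are hypothetical names, and the "code" is just a vector of 64-bit words in which each call operand starts as 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// One relocation entry: "write the address of `symbol` at `offset`".
struct TextRelocation {
    std::size_t offset;   // index of the call operand inside the code
    std::string symbol;   // symbol whose address must be written there
};

// The "linker/loader" walks every relocation and writes the symbol's
// address straight into the code words. Returns how many were patched.
std::size_t apply_text_relocations(
        std::vector<std::uint64_t>& code,
        const std::vector<TextRelocation>& relocs,
        const std::unordered_map<std::string, std::uint64_t>& symbols) {
    std::size_t patched = 0;
    for (const TextRelocation& r : relocs) {
        code.at(r.offset) = symbols.at(r.symbol);  // a write into "text"
        ++patched;
    }
    return patched;
}
```

Three calls to foo mean three relocations and three modified code words, which is exactly why the patched pages can no longer be shared between processes.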

Global Offset Table (GOT, Linux):

Call the function indirectly through a placeholder memory slot instead of through an address patched into the code
- e.g. call [0x2000]: calls the function whose address is stored at 0x2000
.reloc: the address of the function only needs to be written into 0x2000 once, instead of at every call site
Disadvantage:
- can incur an extra indirection (a memory load) on each call
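The same idea with a GOT, sketched with C++ function pointers standing in for the real table (`got_slot_foo`, `foo_impl` and `loader_bind_foo` are hypothetical names): every call site loads its target from one writable slot, so binding foo takes a single write.

```cpp
#include <cassert>

using FooFn = int (*)(int);

// Toy GOT slot: the code never embeds foo's address, it always
// loads it from here (the "call [0x2000]" form).
FooFn got_slot_foo = nullptr;

// Two independent call sites sharing the same slot.
int call_site_a(int x) { return got_slot_foo(x); }
int call_site_b(int x) { return got_slot_foo(x) + 1; }

// The function that lives in "another binary".
int foo_impl(int x) { return x * 2; }

// The single relocation: the loader writes foo's address once.
void loader_bind_foo() { got_slot_foo = &foo_impl; }
```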

Linux import section

.dynamic:
- a list of the dynamic library names the binary depends on

.dynsym: the dynamic symbol table
- contains all the symbols the binary defines or uses
- the ones to be imported are marked UND (undefined)

Semantically, a Linux binary tells the loader:

“Here are all the libraries (listed in the .dynamic section) that I want you to map into the process.”
“Here is the bucket of symbols I want you to locate in any of the loaded libraries.”
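A toy model of that contract, assuming a simplified loader (`LoadedLibrary`, `resolve_imports` and the library names are hypothetical): the binary supplies libraries and undefined symbols as two separate lists, and each symbol is bound to the first loaded library that defines it.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// What a loaded library offers in this model: its name (as listed in
// the importing binary's .dynamic) and the symbols its .dynsym defines.
struct LoadedLibrary {
    std::string name;
    std::set<std::string> defined;
};

// For each UND symbol, take the first library in load order that
// defines it. Nothing says which library "should" provide a symbol.
std::map<std::string, std::string> resolve_imports(
        const std::vector<std::string>& undefined,
        const std::vector<LoadedLibrary>& loaded) {
    std::map<std::string, std::string> provider;
    for (const std::string& sym : undefined) {
        for (const LoadedLibrary& lib : loaded) {
            if (lib.defined.count(sym) != 0) {
                provider[sym] = lib.name;
                break;
            }
        }
    }
    return provider;  // a symbol missing here would be a load-time error
}
```

In this model a symbol defined by two libraries simply binds to whichever was loaded first, which is the behavior interposition relies on.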

Interposition

Interposition: overriding a symbol in one binary (executable / library) from another

Library Search Order:

Linux: breadth first
The loader starts with the executable, then the libraries the executable depends on, then the libraries those first-order libraries depend on, and so on.
If a third-degree library both uses and defines foo, the loader still starts the search at the executable rather than at that library
Anything earlier in the search order has a chance to interpose foo
-Bsymbolic: search the current library first, before the breadth-first search
LD_PRELOAD: the libraries listed in this environment variable are loaded before the executable's own dependencies
- The loader still searches the executable first, then the LD_PRELOAD libraries
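A sketch of that search order, assuming a simplified model in which every binary's dependencies are known up front (`DepGraph`, `lookup_order` and the library names are hypothetical, and -Bsymbolic is ignored):

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical dependency graph: binary -> the libraries it depends on.
using DepGraph = std::map<std::string, std::vector<std::string>>;

// Executable first, then LD_PRELOAD libraries, then dependencies
// breadth first. Each library appears only once.
std::vector<std::string> lookup_order(const std::string& exe,
                                      const std::vector<std::string>& preload,
                                      const DepGraph& deps) {
    std::vector<std::string> order;
    std::set<std::string> seen;
    std::deque<std::string> queue;
    auto enqueue = [&](const std::string& lib) {
        if (seen.insert(lib).second) queue.push_back(lib);
    };
    enqueue(exe);
    for (const std::string& lib : preload) enqueue(lib);
    while (!queue.empty()) {
        std::string cur = queue.front();
        queue.pop_front();
        order.push_back(cur);
        auto it = deps.find(cur);
        if (it == deps.end()) continue;
        for (const std::string& dep : it->second) enqueue(dep);
    }
    return order;
}
```

Combined with "first definition wins", any entry earlier in this list can interpose a symbol defined later, including a preloaded library overriding something in a deep dependency.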

Can a shared-library symbol be overridden from an executable?

Linux: yes

C++: operator new

This is what allows the default implementation of operator new to be overridden
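A minimal sketch of replacing the global operator new, assuming a Linux toolchain; `g_new_calls` is a hypothetical counter. Since the executable is searched first, this definition takes precedence over the default one from the C++ runtime library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Number of allocations routed through the replacement below.
std::size_t g_new_calls = 0;

// Defining this in the program replaces the default global operator new.
void* operator new(std::size_t size) {
    ++g_new_calls;
    if (size == 0) size = 1;  // must still return a unique non-null pointer
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc{};
}

// Replace the matching deallocation functions as well, so memory
// obtained from malloc is released with free.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }
```

The test below calls ::operator new directly rather than using a new-expression, because compilers are allowed to elide new/delete pairs that come from new-expressions.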

Symbol Resolution Time:

For executables: the linker does not allow undefined symbols.
- the linker checks that one of the libraries on the link line provides each symbol; the actual binding to an address still happens in the loader
For shared libraries: undefined symbols are allowed, because they might be defined in the executable or in another library loaded into the process
- This works because of the Linux breadth-first search
- --allow-shlib-undefined (ld option)

Process-wide singleton:

Linux: just put the singleton in the executable
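A sketch of such a singleton with a hypothetical `Registry` type. One assumption to check on your toolchain: a shared library can only bind to `registry()` if the executable exports it in its .dynsym, which may require linking the executable with -rdynamic.

```cpp
#include <cassert>

// Hypothetical process-wide state.
struct Registry {
    int next_id = 0;
};

// Defined once, in the executable. Libraries that import registry()
// resolve to this definition because the executable is searched first.
Registry& registry() {
    static Registry instance;  // initialized once; thread-safe since C++11
    return instance;
}
```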

Circular library dependencies:

Linux: yes
- shared libraries allow undefined symbols at link time, so each library can refer to the other's symbols

Weak vs Strong symbols:

strong symbols:
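- can override weak symbols of the same name
- if two shared objects define the same strong symbol, the loader uses the first one in search order (interposition); two strong definitions linked into the same binary are a link error instead
  - this is what allows overriding malloc
weak symbols:
- do not need a definition (an undefined weak symbol resolves to address 0)
- are usually declarations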
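A single-file sketch of both kinds of weak symbol, using the GCC/Clang `__attribute__((weak))` extension on ELF (`optional_plugin_init`, `log_prefix` and `init_plugins` are hypothetical names):

```cpp
#include <cassert>

// Weak declaration with no definition anywhere: instead of a link
// error, the symbol's address resolves to 0.
extern "C" int optional_plugin_init() __attribute__((weak));

// Weak definition: a default that a strong definition of the same
// name elsewhere in the link would override.
extern "C" const char* log_prefix() __attribute__((weak));
extern "C" const char* log_prefix() { return "default"; }

// Call the optional hook only if someone provided it.
int init_plugins() {
    return optional_plugin_init ? optional_plugin_init() : -1;
}
```

Linking in another object file that defines `optional_plugin_init` or a strong `log_prefix` would change both results without touching this file.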

Position Independent Code

What is not PIC:

A hard-coded function address is not PIC: if the binary is loaded at a different address, the call breaks
- call 0x789
Instruction-pointer-relative call: call a function at an offset from the current instruction pointer (rip)
- call rip-12
- No interposition: the loader cannot redirect the call to a different definition
- Used for hidden symbols
GOT (indirection): the call goes through a slot at a fixed offset in a table of function addresses
- If the binary is moved to another address, the .got entries must be updated with the new function addresses
  - the .got itself is not PIC (the loader patches it)
- The code segment is still PIC, because the offset from the code to its .got slot stays the same
- Each binary (the executable and every shared library) has its own .got
- Calls may go through the .got even inside a shared library:
  - the compiler doesn't know whether the library's own definition of the function will be interposed
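A toy arithmetic model of why the rip-relative form survives being loaded elsewhere (all offsets are hypothetical, and real x86-64 measures the displacement from the end of the call instruction rather than from its start):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout relative to the load base.
constexpr std::uint64_t kCallSite = 0x100;  // where the call instruction sits
constexpr std::uint64_t kCallee = 0x789;    // where the callee sits

// Non-PIC: an absolute target baked in assuming the binary sits at base 0.
std::uint64_t absolute_target(std::uint64_t /*base*/) { return kCallee; }

// PIC: the instruction stores only callee - call_site; the CPU adds it
// to the instruction's runtime address, so the base cancels out.
std::uint64_t relative_target(std::uint64_t base) {
    const std::int64_t disp = static_cast<std::int64_t>(kCallee - kCallSite);
    return base + kCallSite + disp;
}
```

Loaded at base 0, both forms hit the callee; loaded anywhere else, only the relative form does, while the absolute one would need a relocation.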

Building a PIC executable:

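Symbols defined in the executable cannot be interposed (the executable is always searched first)
-fpie: generates position-independent code for an executable, so it still works when loaded at a different memory address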

Lazy Binding

Is resolution only done on the first call?
- yes, by default
- -fno-plt opts out: calls load the target from the GOT directly, and the GOT is resolved at load time

First call to an unresolved symbol:

Calling f() compiles to: call procedure_lookup_stub_42
- 42 is an arbitrary identifier given to f
procedure_lookup_stub_42: jmp [got_slot_42]
- got_slot_42 initially holds the address of the instruction right after this jmp (procedure_lookup_stub_42 + 1 instruction)
  - so the jmp just jumps back into the stub
The next instruction in the stub is: push 42; jmp <loader resolver>
- it calls the resolver to resolve the f() symbol
The resolver finds f and overwrites got_slot_42 with f's address
After the first call, jmp [got_slot_42] goes to f directly

PLT (procedure linkage table): the table of all the procedure_lookup_stub entries

Call sites actually call into PLT slots

Each binary has its own PLT
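A C++ model of the first-call sequence described above, with a function pointer standing in for the GOT slot and ordinary functions for the PLT stub and the resolver (`plt_stub_42`, `got_slot_42`, `lazy_resolve_42`, `f_impl` and `resolver_calls` are hypothetical names; the real stub is a few machine instructions and the real resolver lives in the dynamic loader):

```cpp
#include <cassert>

using Fn = int (*)(int);

// The real f, living in "another library".
int f_impl(int x) { return x + 100; }

// How many times the slow path ran.
int resolver_calls = 0;

// Slow path: the "push 42; jmp <resolver>" part of the stub.
int lazy_resolve_42(int x);

// GOT slot: starts out pointing back into the stub's slow path.
Fn got_slot_42 = &lazy_resolve_42;

// PLT stub: always an indirect jump through the GOT slot.
int plt_stub_42(int x) { return got_slot_42(x); }

int lazy_resolve_42(int x) {
    ++resolver_calls;
    got_slot_42 = &f_impl;  // patch the slot: later calls go straight to f
    return f_impl(x);       // finish the call that triggered resolution
}
```

Calling `plt_stub_42` twice runs the resolver only once. The fact that `got_slot_42` must stay writable for this to work is also where the security concern comes from.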

Downside:

A possible security hole: the GOT stays writable, so an attacker who can write to memory may overwrite a slot to redirect calls

Function Pointers

The binary that defines a symbol has the symbol's address in its .dynsym
- the address of the symbol can be an offset into the PLT
A binary that imports the symbol from another binary gets the address from that .dynsym (through the loader)

Symbol Visibility

.dynsym: contains all global symbols and is visible to all other binaries
- by default, all global symbols are potentially exported
Symbols can be marked undefined (UND)
- these need to be “imported” by the loader from other binaries
Symbols have a visibility: default / protected / hidden
- hidden: the symbol is called with instruction-pointer-relative code
  - it is not in the .got, and other binaries don't know how to call it
- default: references to the symbol go through the .got
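A sketch using the GCC/Clang visibility attribute (`api_entry` and `internal_helper` are hypothetical names). Built as a shared library with -fPIC -shared, readelf --dyn-syms should list `api_entry` but not `internal_helper`; compiling with -fvisibility=hidden flips the default so that only explicitly marked symbols are exported.

```cpp
#include <cassert>

// Default visibility: exported through .dynsym, so other binaries can
// call it, and references to it may go through the .got/PLT.
extern "C" int api_entry(int x) __attribute__((visibility("default")));

// Hidden: kept out of .dynsym; other binaries cannot see or interpose
// it, so calls from inside the library can be direct rip-relative calls.
extern "C" int internal_helper(int x) __attribute__((visibility("hidden")));

int internal_helper(int x) { return x * 3; }
int api_entry(int x) { return internal_helper(x) + 1; }
```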


Klement Tan
