Post: Linking, Loaders and Shared Libraries
Notes on the following resources:
- Linkers, Loaders and Shared Libraries in Windows, Linux, and C++ - Ofek Shilon - CppCon 2023
- ELF interposition and -Bsymbolic
Intro
Binary: shared object and executable
Linking Intro:
- Compiler: the build starts with source files, which the compiler compiles into object files.
  - Translation unit: a source file after the preprocessor has expanded all of its includes
  - Object file: a compiled translation unit
    - a container of different sections (e.g. `.text`, `.data`)
    - `.data` vs `.bss`? (`.bss` holds zero-initialized data and takes no space in the file)
- Linker:
  - Pulls the sections from the object files together and concatenates them
    - all `.text` sections from all object files are put together, and likewise all `.data` sections
- Loader:
  - Loads the sections into memory as segments with the correct permissions
    - `.data`: r/w
    - `.text`: r/x
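The linker step above can be sketched as a toy model. This is only an illustration of "concatenate same-named sections", not how a real linker is implemented; `Sections`, `LinkedImage` and `link_objects` are hypothetical names made up for this note.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of an object file: section name -> raw bytes.
using Sections = std::map<std::string, std::vector<unsigned char>>;

struct LinkedImage {
    Sections sections;  // merged output sections
    // For each input object, the offset where its piece of a section starts.
    std::vector<std::map<std::string, std::size_t>> input_offsets;
};

// Concatenate same-named sections from every object file, in input order.
LinkedImage link_objects(const std::vector<Sections>& objects) {
    LinkedImage image;
    for (const Sections& object : objects) {
        std::map<std::string, std::size_t> offsets;
        for (const auto& entry : object) {
            std::vector<unsigned char>& out = image.sections[entry.first];
            offsets[entry.first] = out.size();  // where this object's piece starts
            out.insert(out.end(), entry.second.begin(), entry.second.end());
        }
        image.input_offsets.push_back(offsets);
    }
    return image;
}

// Usage: two object files, each with a .text and a .data section.
void demo_link_objects() {
    Sections a;
    a[".text"] = {0x55, 0xC3};
    a[".data"] = {1};
    Sections b;
    b[".text"] = {0xC3};
    b[".data"] = {2, 3};
    LinkedImage image = link_objects({a, b});
    assert(image.sections[".text"].size() == 3);    // 2 bytes from a, 1 from b
    assert(image.input_offsets[1][".text"] == 2);   // b's code starts after a's
}
```

The recorded offsets are the information a relocation needs later: once sections are merged, a function from the second object no longer sits at offset 0 of `.text`.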
Shared library function call:
- normal: code calls a function in its own binary
- shared library: the process maps the shared library into its own address space and calls into it.
  - a shared library is able to call other shared libraries
Linking text relocation:
- Purpose: call a function from another object file/shared library:
  - In the `.code` section it calls an arbitrary address
    - i.e. `call 0x00000` at address `0x1000`
  - In the `.reloc` (like a TODO for the linker) it states "find `foo` and write its address at `0x1000`"
- Disadvantages:
  - Modifying the `.code` segment makes the section unshareable between processes
  - Relocation needs to be done once per call site vs once per function
Global Offset Table (GOT, Linux):
- Call a function through a placeholder memory slot that holds its address, instead of through an address that has to be overwritten in the code
  - i.e. `call [0x2000]` calls the function whose address is stored at `0x2000`
  - `.reloc`: only needs to write the address of the function at `0x2000` instead of at every call site.
- Disadvantage:
  - could incur an extra indirection
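The "one slot instead of every call site" idea can be imitated in plain C++ with a function pointer. This is a rough analogy only: the real GOT is filled in by the linker/loader, and every name below (`got_slot_foo`, `relocate_foo`, `foo_v1`, `foo_v2`) is hypothetical.

```cpp
#include <cassert>

// Two candidate definitions of the same function (hypothetical names).
int foo_v1() { return 1; }
int foo_v2() { return 2; }

using Fn = int (*)();

// Stand-in for a GOT slot: one writable pointer shared by every call site.
Fn got_slot_foo = foo_v1;

// Three separate call sites; each is the equivalent of `call [got_slot_foo]`.
int call_site_a() { return got_slot_foo(); }
int call_site_b() { return got_slot_foo() * 10; }
int call_site_c() { return got_slot_foo() * 100; }

// The "relocation": a single write that retargets all call sites at once.
void relocate_foo(Fn target) { got_slot_foo = target; }

int sum_of_call_sites() { return call_site_a() + call_site_b() + call_site_c(); }

// Usage: one write to the slot changes what all three call sites reach.
void demo_got_slot() {
    relocate_foo(foo_v1);
    assert(sum_of_call_sites() == 111);
    relocate_foo(foo_v2);
    assert(sum_of_call_sites() == 222);
}
```

The code of the three call sites never changes, which is what keeps the code segment shareable; the price is the extra pointer load on each call.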
Linux import section
.dynamic:
* a list of dynamic library names it depends on
.dynsym: symbol table
* contains all the dynamic symbols the binary defines or references
* the ones to be imported are marked UND
Semantically, a Linux binary tells the loader:
- Here are all the libraries I would like to load (the `.dynamic` section); map them into the process
- Here is the bucket of symbols I want you to locate in any of the loaded libraries.
Interposition
Interposition: overriding a symbol in one binary (executable / library) from another
Library Search Order:
- Linux: breadth first
  - The loader starts from the executable, then the libraries the executable depends on, and then the libraries those first-order libraries depend on.
  - If a 3rd-degree library uses `foo` and defines `foo`, the loader still starts the search at the executable and only later reaches that library
    - Anything before the current library has a chance to interpose `foo`
  - Search in the current lib before the breadth-first search (`-Bsymbolic`)
- LD_PRELOAD: load the libraries in the env var before the libraries the executable depends on.
  - The loader will still search the executable first, then the libraries in `LD_PRELOAD`
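The search order above can be modelled in a few lines. This is a simplified sketch of the lookup rule only (no symbol versioning, no `-Bsymbolic`); `Binary`, `Process`, `search_order` and `resolve` are hypothetical names, and the library names are made up.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a loaded binary: what it needs and what it defines.
struct Binary {
    std::vector<std::string> needed;  // direct dependencies, in link order
    std::set<std::string> defines;    // symbols this binary defines
};

using Process = std::map<std::string, Binary>;

// Breadth-first order starting at the executable; preloaded libraries are
// placed right after the executable, ahead of its own dependencies.
std::vector<std::string> search_order(const Process& process,
                                      const std::string& executable,
                                      const std::vector<std::string>& preload = {}) {
    std::vector<std::string> order;
    std::set<std::string> seen;
    std::queue<std::string> pending;
    auto enqueue = [&](const std::string& name) {
        if (seen.insert(name).second) pending.push(name);
    };
    enqueue(executable);
    for (const std::string& lib : preload) enqueue(lib);
    while (!pending.empty()) {
        std::string name = pending.front();
        pending.pop();
        order.push_back(name);
        auto it = process.find(name);
        if (it == process.end()) continue;
        for (const std::string& dep : it->second.needed) enqueue(dep);
    }
    return order;
}

// The first binary in search order that defines the symbol wins; "" if none.
std::string resolve(const Process& process, const std::vector<std::string>& order,
                    const std::string& symbol) {
    for (const std::string& name : order) {
        auto it = process.find(name);
        if (it != process.end() && it->second.defines.count(symbol)) return name;
    }
    return "";
}

// Usage: libdeep.so defines foo, but libb.so is reached first and interposes it.
void demo_interposition() {
    Process process;
    process["app"] = {{"liba.so", "libb.so"}, {"main"}};
    process["liba.so"] = {{"libdeep.so"}, {"bar"}};
    process["libb.so"] = {{}, {"foo"}};
    process["libdeep.so"] = {{}, {"foo"}};
    assert(resolve(process, search_order(process, "app"), "foo") == "libb.so");
}
```

Passing a library in `preload` puts it ahead of everything except the executable, which is why a preloaded definition of `foo` would win over both `libb.so` and `libdeep.so` in this model.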
Can a shared-library symbol be overridden from an executable?
- Linux: yes
C++: `new`
- allows the default implementation of `operator new` to be overridden
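As a small illustration of that point, a program can supply its own global `operator new`, and the definition in the executable is used in place of the standard library's. The counter and the helper names (`g_new_calls`, `new_calls`, `demo_replaced_new`) are hypothetical, and this is a minimal sketch that ignores the aligned and array overloads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Counts calls to the replaced allocation function (hypothetical bookkeeping).
static std::size_t g_new_calls = 0;

// Replacement for the global allocation function. Defining it in the program
// overrides the default one shipped in the C++ runtime library.
void* operator new(std::size_t size) {
    ++g_new_calls;
    if (size == 0) size = 1;  // malloc(0) may return a null pointer
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc();
}

// Matching deallocation functions, since memory now comes from malloc.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

std::size_t new_calls() { return g_new_calls; }

// Usage: a direct call to the allocation function is counted exactly once.
void demo_replaced_new() {
    std::size_t before = g_new_calls;
    void* raw = ::operator new(16);
    ::operator delete(raw);
    assert(g_new_calls == before + 1);
}
```

The demo calls `::operator new` directly because compilers are allowed to optimize away a matched `new`/`delete` expression pair, which would make the count unreliable.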
Symbol Resolution Time:
- For executables: undefined symbols are not allowed by the linker.
  - the linker only checks that one of the libraries it links against defines the symbol; the loader does the actual binding at load time
- For libraries: undefined symbols are allowed because the symbols might be defined in the library/executable that imports it
  - This is because of the Linux breadth-first search
  - `--allow-shlib-undefined`
Process-wide singleton:
- Linux: just put the singleton in the executable
Circular library dependencies:
- Linux: yes
- Shared libraries allow undefined symbols at link time
Weak vs Strong symbols:
- strong symbols:
  - can override weak symbols of the same name
  - if two binaries define a strong symbol of the same name, the first one in search order is chosen (interposition)
    - this allows overriding of malloc
- weak symbols:
  - do not need a definition
  - are usually declarations
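A minimal sketch of both kinds of weak symbol, assuming GCC or Clang on an ELF target such as Linux (the `weak` attribute is a compiler extension, not standard C++). `optional_hook`, `default_level` and `call_hook_or` are hypothetical names.

```cpp
#include <cassert>

// Weak declaration with no definition anywhere in this program. The link
// still succeeds; the address is simply null if nobody provides the symbol.
extern "C" int optional_hook() __attribute__((weak));

// Weak definition: used unless another binary supplies a strong one.
extern "C" __attribute__((weak)) int default_level() { return 1; }

// Calls the hook only if some binary actually provided it.
int call_hook_or(int fallback) {
    if (optional_hook) return optional_hook();
    return fallback;
}

// Usage: nothing defines optional_hook here, so the fallback is returned.
void demo_weak_symbols() {
    assert(call_hook_or(-1) == -1);
    assert(default_level() == 1);
}
```

If another object file or library later defines a strong `optional_hook` or `default_level`, the same code picks those up without being recompiled.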
Position Independent Code
What is not PIC:
- A hard-coded function address is not PIC: if the binary is loaded at another address, the call will not work
  - `call 0x789`
- Instruction-pointer-relative call (PIC): calls a function at an offset from the current instruction pointer (`rip`)
  - `call rip-12`
  - Not interposable: the loader cannot redirect the call to another definition of the function
  - Used for `hidden` symbols
- GOT (indirection): the call goes through an offset into a table of function addresses
  - If the binary is moved to another address, the `.got` needs to be updated with the new function addresses
    - the `.got` itself is not PIC
  - The code segment is still PIC, as the offset into the `.got` stays the same
  - Does each binary have its own `.got`? Yes, every executable and shared library carries its own.
  - Calls inside a shared library may also go through the `.got`:
    - the compiler doesn't know whether the function definition in the shared library will be interposed
Building a PIC executable:
- Executable symbols cannot be interposed
- `-fpie`: lets the executable work when it is loaded at a different memory address.
Lazy Binding
- Is a symbol only resolved the first time it is called? Yes
- Can be bypassed with `-fno-plt`
First call to an unresolved symbol:
- calls `f()` => `call procedure_lookup_stub_42`
  - `42` is an arbitrary identifier given to `f`
- `procedure_lookup_stub_42: jmp [got_slot_42]`
  - `got_slot_42: procedure_lookup_stub_42 + 1`
  - `got_slot_42` initially points to the address right after this jump
- so the jump lands back on the instruction after `jmp [got_slot_42]`: `push 42; jmp <ldr resolver>`
  - this calls the resolver to resolve the `f()` symbol
- the resolver resolves the symbol and overwrites the `got_slot` with the address of the `f` it found
- After the first call, `jmp [got_slot_42]` goes to `f` directly
`plt`: the table of all the procedure_lookup_stubs
Actual function calls go into `plt` slots
`plt` slots are per binary
Downside:
- security risk: the GOT stays writable, so its entries can be overwritten
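The first-call sequence can be imitated with a function pointer that initially points at a resolver. This is an analogy in ordinary C++, not the real PLT machinery; `plt_f`, `got_slot_f`, `resolver_stub_f` and `resolver_runs` are hypothetical names.

```cpp
#include <cassert>

int f() { return 42; }  // the function the "loader" will eventually find

using Fn = int (*)();

int resolver_stub_f();           // forward declaration
static int g_resolver_runs = 0;  // how many times the resolver was entered

// GOT slot for f: it starts out pointing back into the stub, not at f.
Fn got_slot_f = resolver_stub_f;

// Second half of the stub (`push 42; jmp <ldr resolver>`): looks f up,
// patches the GOT slot, then completes the original call.
int resolver_stub_f() {
    ++g_resolver_runs;
    got_slot_f = f;  // overwrite the slot with the resolved address
    return got_slot_f();
}

// First half of the stub (`jmp [got_slot]`): every call to f enters here.
int plt_f() { return got_slot_f(); }

int resolver_runs() { return g_resolver_runs; }

// Usage: the resolver runs on the first call only.
void demo_lazy_binding() {
    got_slot_f = resolver_stub_f;  // reset to the unresolved state
    int runs_before = g_resolver_runs;
    assert(plt_f() == 42);
    assert(plt_f() == 42);
    assert(g_resolver_runs == runs_before + 1);
}
```

The sketch also shows the downside noted above: `got_slot_f` has to stay writable for the patching to work, so anything that can write to it can redirect every later call.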
Function Pointers
- The binary that defines the symbol will have the address of the symbol in `.dynsym`
  - The address of the symbol is the offset into the `plt`
- A binary that imports the symbol from another binary will load the address from `.dynsym`
Symbol Visibility
- `.dynsym`: contains all (global) symbols and is always visible to all other binaries
  - all global symbols are potentially exported
- Symbols can be marked `UNDEFINED`
  - these need to be "imported" by the loader from other binaries
- symbols have visibility: `default`/`protected`/`hidden`
  - `hidden`: called with instruction-pointer-relative code
    - not in the `.got`, and other binaries don't know how to call them
  - `default`: symbols are in the `.got`
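In source code, visibility is usually set per symbol with a compiler attribute. A minimal sketch, assuming GCC or Clang (the attribute is a compiler extension); `api_entry` and `internal_helper` are hypothetical names.

```cpp
#include <cassert>

// default: exported when this code is built into a shared library, so other
// binaries can bind to it and may interpose it.
__attribute__((visibility("default"))) int api_entry(int x);

// hidden: not exported, so other binaries cannot bind to it and the compiler
// can call it directly instead of going through the GOT.
__attribute__((visibility("hidden"))) int internal_helper(int x) { return x * 2; }

int api_entry(int x) { return internal_helper(x) + 1; }

// Usage: within the same binary both functions are callable as usual.
void demo_visibility() {
    assert(internal_helper(5) == 10);
    assert(api_entry(3) == 7);
}
```

Built as a shared library (for example with `g++ -shared -fPIC`), the dynamic symbol table shown by `nm -D` should list `api_entry` but not `internal_helper`; compiling with `-fvisibility=hidden` flips the default so that only symbols explicitly marked `default` are exported.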