Why do you need NOPs?

I’ve read a lot of things and taken both the OSCP and OSCE courses, yet I’ve never seen anyone really break out why we use NOPsleds. There are instances where they are used to line up shellcode to a particular offset, which is self-explanatory. However, there are cases where an exploit won’t work without them. Why is that?

Looking at the TRUN command in Vulnserver, it’s a relatively easy exploit. We begin crafting our overflow string, this time with no NOPs. We already established the EIP offset is 2003, and we’re using reverse shell shellcode generated by msfvenom. The only bad character identified is \x00, we we encode with the default x86/shikata_ga_nai encoder and specify the bad character. POC code for the exploit is available here.

No NOPs this time

In this instance, we are using the JMP ESP instruction located at 0x62501205 to jump to our shellcode. Before sending the exploit, ensure a breakpoint is set at this address. We send the exploit and step through it in the debugger. Take the jump to ESP and we see, we are properly lined up.

Sitting at ESP…

If we move forward one instruction at a time, the program crashes and shellcode does not execute. If we’re paying attention, we will see that part of our shellcode was overwritten with some weird instructions.

RET instruction is gone, shellcode overwritten…

So, back to the drawing board. Modify the string to add the nops in, and let’s look at execution.

NOPsled in place…

At this point we can see our shellcode at the end of the NOPs. If we continue execution, we will receive a shell. But at this point it’s important to step back and understand the process here.

Shellcode begins…

Starting at 0x00b7fa24 we see our shellcode, but what we actually see if the msfvenom decoding stub. The important part to note is the instruction at 0x00b7fa2b: FSTENV (28-Byte) PTR SS:[ESP-C]. Floating point instructions are used for placing EIP on the stack, which is useful for all sorts of reasons but for our purposes here in order to perform relative calculations for the decoder to work. If we advance execution to 0x00b7fa2f (but don’t execute this instruction) we can see this play out.

Did not crash this time…

So we use FXCH to manipulate the floating point registers and put the value of EIP into an FPU register. Then we execute the FSTENV (28-BYTE) PTR SS:[ESP-c] instruction, which dumps the floating point environment into memory. It dumps 28 bytes of data starting at ESP-C, which in this case would be 0x00b7fa00. If we look at the stack, the memory address that FXCH instruction was performed at is now at the top of the stack. The next instruction will pop this value into EBX, and now EBX will be used by the decoder for relative calculations. At this point, if we press F9 to continue execution (in Immunity…)

It works!

I find this very interesting to see in action. I happened on information about how these instructions worked when dealing with a stack alignment problem in another exploit. Understanding how the decoder worked could have saved me some time.

Vulnserver: LTER SEH Buffer Overflow

Vulnserver is an intentionally vulnerable application used for training exploit development. It consists of several commands, some vulnerable and some not, and the the user is intended to find and exploit these vulnerabilities. For many specific vulnerabilities, there are several ways to exploit them. In my preparation for the OSCE exam, I was able to find and exploit each command in turn. However, it wasn’t until reviewing the infamous HP NNM 7.5.1 exploit that I was able to exploit the LTER command in Vulnserver by overflowing the SEH address.

To start we find the crash. I used boofuzz for this, using a template found out on this blog site. The crash should occur fairly quickly.

boofuzz template for vulnserver

After fuzzing we replicate the crash manually by sending a metasploit pattern to identify the offset. We get the “LTER /.:/” prepend string from the fuzz results.

Sending a pattern of 5000 bytes

Ensure the application is open and attached to a debugger on the target machine. The application has crashed and we can see the MSF pattern overwriting several locations in memory. Inspecting the SEH chain shows us the SEH pointer is overwritten. Querying these values in the pattern_offset utility in MSF returns an offset of 3495 for NSEH, 3499 for SEH. Note, the offset for NSEH on Windows 2003 will be 3491, on Windows Vista it will be 3515. By sending a shorter buffer, you can overwrite EIP directly instead of overwriting the SEH pointer. This can be a useful exercise for dealing with character restrictions in a simpler problem.

SEH pointer is overwritten

Further testing shows that we have 28 bytes following SEH to test bad characters manually. While testing, we send the below string:

testing bad characters…

And we find some strange results:

Everything after 7F is mangled…

It wasn’t until doing the OSCE course that I really recognized what was happening here and some of the additional options I have here. The characters after 7F are being mangled, but testing reveals that they are being mangled in a predictable way, basically subtracting 7F from any value greater than 7F. The end result is that we are left with a more or less alpha-numeric character set to work with.

The next step is to identify a “pop pop ret” pointer in the essfunc.dll module that consists of allowed characters. We can use mona or findjmp.exe in order to find this address, or just search immunity for “pop r32 pop r32 ret” but this is the least readable of the three options. We add this address into our attack string at the SEH location.

pop pop ret found at 0x6250120b in essfunc.dll

We need more space to work with, 28 bytes is not enough to function with even without character restrictions. We can take a backwards jump, with a couple of modifications. For a normal backwards jump, we would use the opcodes eb XX, where XX is equal to the number of bytes we want to jump, minus 1, subtracted from 255 and converted to hex. So if we want to jump back 64 bytes we would use c0, 128 bytes would be 80. We can’t use either of these but if we use FF it will be converted to 80 when vulnserver.exe does it’s alpha-numeric conversion. Instead of using eb for a short jump, we can use 77 for a conditional short jump. This jump relies on the zero flag and the carry flag being unset. We can ensure they are set to zero by putting an operation in front of them, such as \x42 which translates to INC EDX. It would take a very unlikely set of circumstances for INC EDX to lead to the zero flag and carry flag being set.

Backwards alphanumeric jump

We set nseh equal to “\x42\x77\xff\x42” and add it to the attack string. If we send this string and follow it in the debugger, then take the jump, we arrive at about 127 bytes of code we can use.

Jump back into the buffer, giving us some breathing room…

Now that we have more space, we can sub-encode values. Sub encoding uses alpha-numeric values and SUB instructions to put specific values we need on the stack. As an example, we will take the value 0xe7ffe775 and sub encode it. First, subtract the value from 0xffffffff and then add 1. We get the value 1800188b. We can then break the bytes out into a table, as seen below. We must ensure that the values we pick add up to our bytes in the left column, but also that they are not in the list of bad characters for the application.

Table for manual sub encoding

To break this down, if we first zero out EAX, we can then perform these instructions (SUB EAX, 15521542; SUB EAX,015e0208; SUB EAX, 01500141) and then EAX will contain the value e7ffe775. Push this value on the stack, and now we have our decoded instructions on the stack. There are calculators online for doing this kind of encoding but it’s a good exercise to do it manually and learn how it is done.

My objective here was to jump all the way back to the beginning of the buffer so that I have 3000+ bytes to work with. The first thing we need to do is align the stack in our current buffer area. To do this we increase ESP by 1188 in order to place ESP right at the end of our current buffer. So whatever we push onto the stack will get executed. To do this, we push the value of ESP onto the stack, we pop it into EAX, we adjust it using sub encoding, and then we push the value of EAX onto the stack and pop it into ESP. Now, our stack is located at the end of our current buffer segment.

Sub encoded stack adjustment

From here, we want to jump backwards from the location of our stack to the beginning of our buffer. We do the math and see that we need to jump back 0xdb9, we would normally perform a near jump to FFFFF264, opcodes would translate to E9 64F2FFFF due to endianess. We will have to write this as two instructions on the stack to account for the uneven number of bytes, since we can only write 4 bytes at a time. We must zero out the EAX register before sub encoding instructions. We want to sub encode the values 64F2FFFF and E9414141 and push them on the stack in that order.

Sub encoding a near jump back to the beginning of our buffer

In this instance, I am using the same method muts used in the NNM exploit to zero out EAX. There are other methods, the one I normally like is pushing 41414141 onto the stack, popping it into EAX, and then XORing EAX with 41414141 again. If we execute these instruction, we see our jump get revealed at our stack location.

We’re jumping back and…

And now we are back at the beginning of the buffer and have a ton of buffer space for code execution. We’re actually 8 bytes into our buffer here, which is important to note for keeping our attack string aligned.

Now we’re at 00b7f23c

Now we have plenty of space. Msfvenom can generate alpha-numeric shellcode, so at this point we’re home free. Or almost. There are two other factors to consider. The first is that msfvenom-generated alpha-numeric shellcode is prepended by a non-alpha-numeric stub which sets the shellcode location in order to to operations relative to the start of the shellcode. This is a problem in this case. Offsec is nice enough to document the BufferRegister feature of msfvenom, which sets your buffer to a particular register. In order for this to work, we need to line up a register with our shellcode. So we adjust ESP using the same method as before and we can see below that our stack now lines up with our buffer, so the BufferRegister setting in msfvenom will work, our shellcode is waiting for us about 900 bytes and change away, and we should have a shell.

Lined up buffer.

You’ll note that this doesn’t work, and this is due to stack alignment. The stack must be aligned to a DWORD in x86 processors, which means that the memory address must be divisible by 4. Since ESP has been set to a place not evenly divisible by 4, instructions get confused and execution just gets messed up. So we adjust the stack to a location divisible by 4 and ensure we adjust all of our padding as well, and that’s it.

Adjusted shellcode

And…

It works.

I have the working code posted here. It should be noted that I chose ESP, but you can use any register, I believe

Backdooring Portable Executables: Code Caves and threading failure

To start off, let’s take care of the issue with the code cave. We don’t want to add another section to the PE, we want to modify the executable as little as possible to prevent possible detection. We can use the Memory Map tool in Immunity Debugger to manually review the sections for large, uninterrupted spaces of nulls. This blog shows a pretty succinct process for backdooring executables (and is my primary source for the next portion on threading) and introduces a tool called Cminer. Using Cminer on putty.exe, we find two potential caves:

Using Cminer to locate code caves in putty.exe of 350 bytes or more

These are actually rather small at 500ish bytes a piece, and may not work for encoded shellcode, requiring a jump from one cave to the other. But for unencoded shellcode, we are looking at about 350 bytes so this is plenty of space. Also note, manual inspection of the memory space will show that there is a ton more null space in the .xdata segment, it does not end with 0x4b8800. Whether this is some kind of issue with the tool or user error, I’m not sure. The good news is that if needed we can fit plenty of encoded shellcode in there. The bad news, I tested several different open source tools and none really provided reliable results. In fact, manual inspection revealed huge portions of null space in every section in the putty.exe PE, so, yeah I don’t know, probably a rabbit hole that it would be good to go down eventually and figure out how to accurately do this using automation. For right now, manual inspection is needed but Cminer did accurately provide two code cave locations so there is that. It should be noted that even though there are huge areas of null space, trying to overwrite some of those areas will cause the debugger to complain, so it’s a trial and error thing to find a suitable area.

So uhhh… about that nullspace ending at 0x004b8800…

From here it proceeds the same as with the previous example. Step into the code cave, save the program state, place the shellcode (this time generated using msfvenom with EXITFUNC=seh), adjust the stack, and close it out by loading the saved program state from the stack and proceeding with program execution.

For threading, I used a few different resources, the most easy to read was here, but was not successful. OSCE course starts in two days so I’m not going to spend anymore time on it, I’m close but it’s not quite working out.

Backdooring Portable Executables: Fixing execution

The best part of writing things down here is that inevitably 5 minutes later, I find someone with the same problem. When backdooring putty.exe, I came across an issue with the EXITFUNC code appended onto the end of msfvenom code. The code ends with a CALL EBP, which goes off into the shellcode and eventually to ntdll.KiFastSystemCallRet where it terminates the program.

We got a shell, but the program terminates here…

Not ideal. So one solution was to change that CALL EBP instruction to NOPs. Skipping that call allows execution to proceed, everything is right with the world, except the gnawing feeling that it’s just gross.

so dirty…

But hey, it works. Turns out though, Capt Meelo had the same issue and came up with a couple of solutions. His first solution was the same as the one up here, turn that CALL instruction into NOPs and the program executes as expected. Kind of reassuring to see that someone came up with the same fix, even if it’s not ideal. But his second solution was kind of a head-slapper moment. Just create the shellcode using the EXITFUNC=seh option:

generating shellcode

A closer inspection of the shellcode reveals they’re identical, as you’d expect since looking at the metasploit github shows you that EXITFUNC is determined by code tacked on to the end of the shellcode. The real difference is in a value moved into EBX at 0x004c3127 shown below.

Left: EXITFUNC=seh Right: EXITFUNC=none

So, lesson learned. If there’s an issue with a particular part of the shellcode, maybe try all the options associated with that shellcode first before manually NOPing out instructions. Using EXITFUNC=seh resumes execution like a charm. There is one remaining issue, though. Putting simple shellcode like popping calc will always work, but in the case of a reverse shell we find that the program does not execute unless a listener is configured on the correct port. This is not ideal. So on the agenda: redirecting to existing null space int he code rather than adding a section and fixing the last remaining execution issue.

Backdooring Portable Executables

Prepping for OSCE, lots of shellcoding and debugging going on. Shellcoding is particularly frustrating today so to change gears for a bit I’m going to write up backdooring PEs. Examples are x86, tested on Windows XP SP3, using the x86 putty executable available here. I also used Stud_PE, Immunity Debugger, and a great blog post over at Sector876 that helped me get the basics down. There’s a couple of different methods, let’s start with the first, we’ll add a section.

First, let’s start by opening the program in Stud_PE. Select the Sections tab and you can see all of the sections present in the executable. Right click and choose “New Section” and adjust the size. Ensure you’ve added enough space to fit your shellcode in, and select the option to fill the space with null bytes. After you save, run the executable to ensure it still functions and nothing is broken.

adding a section with Stud_PE

Open the executable in the debugger and inspect the sections, in Immunity this is easily found in the Memory Map tab. If you inspect the section, you should find it filled with null bytes. Note the memory address that the section starts at, in this instance we see 0x004c3000.

our new section, all filled with nulls

We will need to redirect program execution to this address, where we will eventually put shellcode. For now, note the existing instructions at the program’s entry point. We will need these for redirecting back to normal execution for the program.

our initial instructions

After saving these addresses, it is time to redirect execution. In Immunity we do this by right clicking the instruction and selecting Assemble, then inputting our jump instruction. We want to jump to 0x004c3000 in this case.

assembling our jmp instruction

After assembling the instruction should update in the debugger to exactly what we need.

jmp to…

Hit F7 in Immunity to move to the next instruction and you should see the new section we added, filled with nulls.

nulls, nulls everywhere

The first thing that should be done at this point is to save the executable. After saving, in the new section overwrite the first two instructions with PUSHAD and PUSHFD instructions. This saves the program state before we pass it our shellcode. Press F7 twice to move ahead and then note the location of ESP. We will need this location after the shellcode to adjust the stack prior to normal program execution. In this instance, ESP is at 0x0012FFA0 following the two instructions added.

ready for shellcode now

Use msfvenom to create shellcode. In this instance we’re creating a reverse tcp shell for Windows in hex format, no bad chars specified. In Immunity, this is placed in the executable by selecting enough memory locations to hold our shellcode starting with the instruction following the PUSHFD, then right clicking and performing a binary paste.

generating shellcode

Once the shellcode is pasted in, place a breakpoint on the last instruction of the shellcode and press F9 to continue execution. Note that if the shellcode is a reverse shell, unless you have a listener set up it will take execution off into an exception, and that you’ll have to close the listener to hit your breakpoint. Once at the breakpoint, note the address at ESP. In this case it is 0x0012fd9c.

ESP location after shellcode execution

The determine the ESP offset, we subtract the initial ESP value from the ending value, so 12ffa0 – 12fd9c, and our offset is 0x204. in order to get ESP back where it belongs for normal program execution, we need to add 0x204 to ESP, so we right click the instruction immediately following the shellcode and input ADD ESP, 0x204. The next two instructions should be POPFD followed by POPAD to resume the program’s saved state from before the shellcode.

Perfect! Or was it…

At this point, it doesn’t work. You get your shell, the program doesn’t open, you close your shell, the program crashes. Frustrating. What we want is for the program to open and for you to get your shell at the same time. And for it to not crash when we close out of our shell. First things first. The opening issue has already been solved for us.

The offending code

So we found it. It’s near the bottom, in the blog linked above he links to the metasploit github where you can find it for yourself. So we update it…

Easy enough

Still not quite right though, normal program execution doesn’t happen. Over at the metasploit github page, you can view the code for the exitfunc portion. This code is appended to the end of your msfvenom-generated code even if you choose EXITFUNC=none, right at the end. Check out this part is particular:

very last line of your shellcode

If you trace it in the debugger, your shellcode calls EBP which takes it off back into the shellcode and eventually out into ntdll.KiFastSystemCallRet which… ok, turning into a rabbit hole. Researching this, figuring out why it is doing this, is a priority. But for right now, quick and dirty, oh so dirty, just overwrite that call with NOPs and it works.

this is a gross solution and I should feel bad…

On the agenda is creating a backdoor using existing null space within the application and figuring out this business with the EXITFUNC in msfvenom.

program’s running…
shell’s working… it’s still gross though