Why do you need NOPs?

I’ve read a lot of things and taken both the OSCP and OSCE courses, yet I’ve never seen anyone really break out why we use NOPsleds. There are instances where they are used to line up shellcode to a particular offset, which is self-explanatory. However, there are cases where an exploit won’t work without them. Why is that?

Looking at the TRUN command in Vulnserver, it’s a relatively easy exploit. We begin crafting our overflow string, this time with no NOPs. We already established the EIP offset is 2003, and we’re using reverse shell shellcode generated by msfvenom. The only bad character identified is \x00, we we encode with the default x86/shikata_ga_nai encoder and specify the bad character. POC code for the exploit is available here.

No NOPs this time

In this instance, we are using the JMP ESP instruction located at 0x62501205 to jump to our shellcode. Before sending the exploit, ensure a breakpoint is set at this address. We send the exploit and step through it in the debugger. Take the jump to ESP and we see, we are properly lined up.

Sitting at ESP…

If we move forward one instruction at a time, the program crashes and shellcode does not execute. If we’re paying attention, we will see that part of our shellcode was overwritten with some weird instructions.

RET instruction is gone, shellcode overwritten…

So, back to the drawing board. Modify the string to add the nops in, and let’s look at execution.

NOPsled in place…

At this point we can see our shellcode at the end of the NOPs. If we continue execution, we will receive a shell. But at this point it’s important to step back and understand the process here.

Shellcode begins…

Starting at 0x00b7fa24 we see our shellcode, but what we actually see if the msfvenom decoding stub. The important part to note is the instruction at 0x00b7fa2b: FSTENV (28-Byte) PTR SS:[ESP-C]. Floating point instructions are used for placing EIP on the stack, which is useful for all sorts of reasons but for our purposes here in order to perform relative calculations for the decoder to work. If we advance execution to 0x00b7fa2f (but don’t execute this instruction) we can see this play out.

Did not crash this time…

So we use FXCH to manipulate the floating point registers and put the value of EIP into an FPU register. Then we execute the FSTENV (28-BYTE) PTR SS:[ESP-c] instruction, which dumps the floating point environment into memory. It dumps 28 bytes of data starting at ESP-C, which in this case would be 0x00b7fa00. If we look at the stack, the memory address that FXCH instruction was performed at is now at the top of the stack. The next instruction will pop this value into EBX, and now EBX will be used by the decoder for relative calculations. At this point, if we press F9 to continue execution (in Immunity…)

It works!

I find this very interesting to see in action. I happened on information about how these instructions worked when dealing with a stack alignment problem in another exploit. Understanding how the decoder worked could have saved me some time.