x86 Assembly 101: conditional jumps, logic and shiftings

According to the statistics I’m using as a reference, the next most common instruction is the comparison instruction, CMP. Before explaining it, though, it’s useful to introduce a couple of other instructions: ADD and SUB.

The ADD and SUB instructions

ADD was briefly introduced last time: it simply adds the source to the destination. SUB, likewise, subtracts the source from the destination. The syntax is the following:

ADD [Destination], [Source]

SUB [Destination], [Source]

Both operands can be registers but, as usual, at most one of them can be a memory location in r/m32 form; the source can be an immediate as well. Some examples:

ADD ESP, 4

SUB EAX, [ECX*8]

The execution of those two instructions modifies the value of the flags in the EFLAGS register. In particular it modifies the following flags:

  • Overflow Flag (OF): set to 1 if the signed result does not fit into the destination (too large or too small).
  • Sign Flag (SF): set to 1 if the result is negative, i.e. if the MSB is 1. This flag is set to 1 even if operating with unsigned values.
  • Zero Flag (ZF): set to 1 if the result is 0.
  • Carry Flag (CF): set to 1 if the unsigned result is too large to fit into the destination (or, for SUB, if a borrow was needed).
  • Adjust Flag (AF): set to 1 if there is a carry out of (or a borrow into) the low 4 bits of the result; it is used when working with BCD numbers.
  • Parity Flag (PF): set to 1 if the least significant byte of the result contains an even number of 1 bits.
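To make the flag rules above concrete, here is a minimal C sketch that models a 32-bit ADD and derives the six flags from the operands and the result. The `Flags` struct and the `add32` function are hypothetical names made up for this illustration; real code never computes the flags this way, since the CPU does it in hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the six status flags listed above. */
typedef struct {
    bool of, sf, zf, cf, af, pf;
} Flags;

/* Model of a 32-bit ADD: returns the sum and fills in the flags. */
static uint32_t add32(uint32_t dst, uint32_t src, Flags *f)
{
    uint32_t res = dst + src;              /* wraps modulo 2^32 */

    f->cf = res < dst;                     /* unsigned carry out of bit 31 */
    f->zf = res == 0;
    f->sf = (res >> 31) & 1;               /* copy of the MSB */
    /* signed overflow: both operands share a sign that the result lacks */
    f->of = ((~(dst ^ src) & (dst ^ res)) >> 31) & 1;
    f->af = ((dst ^ src ^ res) >> 4) & 1;  /* carry out of bit 3 */

    /* PF only looks at the low byte: set when it has an even number of 1s */
    uint8_t low = (uint8_t)res;
    low ^= low >> 4;
    low ^= low >> 2;
    low ^= low >> 1;
    f->pf = !(low & 1);

    return res;
}
```

For instance, adding 1 to 0x7FFFFFFF sets OF and SF but not CF, while adding 1 to 0xFFFFFFFF sets CF and ZF but not OF. Note also that 1 + 1 = 2 leaves PF clear although 2 is an even number, because the low byte contains a single 1 bit.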

The LEA instruction

The LEA (Load Effective Address) instruction is a very powerful one. Its purpose is to compute the effective address of the source operand and store it in the destination: the actual content of the memory location is not loaded. The syntax is the following:

LEA [Destination], [Source]

At first sight it could be mistaken for another MOV instruction, but it’s actually different, as it makes it possible to perform address computations like the following one:

LEA EAX, [EBX+4*ESI+5]

An important note: the LEA instruction is the only exception to the bracket rule, as it just evaluates the expression inside the brackets, without loading the content of the memory location.

This instruction is used by compilers not only for memory address computations, but also to optimize multiplications and additions, as in the following case, in which EBX is quickly multiplied by 18:

LEA EBX, [EBX*2]
LEA EBX, [EBX*8 + EBX]
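As a rough C sketch of what the two LEA instructions compute: an effective address has the form base + index*scale + displacement, where the scale is 1, 2, 4 or 8. The helper names `lea` and `times18` below are invented for this illustration.

```c
#include <stdint.h>

/* Effective address as LEA computes it: base + index*scale + displacement.
 * The scale is expected to be 1, 2, 4 or 8. No memory is ever read. */
static uint32_t lea(uint32_t base, uint32_t index, uint32_t scale, int32_t disp)
{
    return base + index * scale + (uint32_t)disp;
}

/* The two-instruction sequence from the text. */
static uint32_t times18(uint32_t ebx)
{
    ebx = lea(0, ebx, 2, 0);    /* LEA EBX, [EBX*2]       -> 2 * EBX       */
    ebx = lea(ebx, ebx, 8, 0);  /* LEA EBX, [EBX*8 + EBX] -> 9 * (2 * EBX) */
    return ebx;
}
```

The first step doubles the value and the second multiplies that by 9, which gives 18 times the original without using a multiply instruction.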

The JMP instruction

This instruction jumps unconditionally to another arbitrary code location by setting a new value for the EIP register. The destination address can be expressed as a displacement relative to the address of the instruction that follows the JMP, or as an absolute address. The syntax for JMP is:

JMP [Operand]

where [Operand] can be:

  • a register: the EIP will be loaded with the content of the register specified.
  • a short relative: a 1-byte displacement, sometimes indicated as JMP SHORT.
  • a near relative: a 4-byte displacement.
  • a memory location: the destination address will be read from that memory location.

A very VERY simple example involving the JMP instruction is the code generated by the compilation of:

int main(void)
{
    int a = 1;
    goto exit;
    a = a + 2;
exit:
    return a;
}

which is the following:

_main:
00401000  push        ebp
00401001  mov         ebp,esp
00401003  push        ecx
00401004  mov         dword ptr [ebp-4],1
0040100B  jmp         00401016
0040100D  mov         eax,dword ptr [ebp-4]
00401010  add         eax,2
00401013  mov         dword ptr [ebp-4],eax
exit:
00401016  mov         eax,dword ptr [ebp-4]
00401019  mov         esp,ebp
0040101B  pop         ebp
0040101C  ret

As you can see, the goto statement has been translated into a JMP instruction (wow, I generated an example for the ADD instruction as well). The JMP instruction is only 2 bytes long: this means (of course) that a 1-byte displacement has been used and that we’re dealing with a JMP SHORT. Again, it’s possible to observe how the space for the “a” local variable is reserved on the stack by using PUSH ECX.
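The target of a short jump can be worked out by hand: the signed 1-byte displacement is added to the address of the instruction that follows the JMP. Here is a small C sketch of that arithmetic; `jmp_short_target` is a hypothetical helper, and the displacement of 9 is inferred from the addresses in the listing (0x00401016 - 0x0040100D), not read from the raw bytes.

```c
#include <stdint.h>

/* Target of a short JMP: the signed 8-bit displacement is added to the
 * address of the instruction that follows the 2-byte JMP. */
static uint32_t jmp_short_target(uint32_t jmp_addr, int8_t disp)
{
    uint32_t next = jmp_addr + 2;           /* opcode byte + displacement byte */
    return next + (uint32_t)(int32_t)disp;  /* sign-extend, then add */
}
```

With the JMP at 0x0040100B and a displacement of 9, the result is 0x00401016, which is the address of the exit label in the listing. A negative displacement jumps backwards in the same way.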


The CMP instruction (and its cousin TEST)

At last, the CMP instruction: it accounts for about 5% of occurrences and compares two operands by subtracting the second one from the first one and setting the flags accordingly. CMP behaves almost exactly like SUB except that, in this case, the result is not stored. The syntax is the following:

CMP [Operand], [Operand]

where each operand can be a register or a memory location (but not both memory locations at the same time), and only the second one can be an immediate. Examples of the CMP instruction are:

CMP EAX, EBX

CMP ECX, DWORD PTR [EBP-8]

CMP DWORD PTR [ESI], 0

CMP EAX, 2

The TEST instruction, with about 3% of occurrences, is used less often than CMP but does more or less the same thing: instead of subtracting the two operands, it performs an AND between them and sets the flags accordingly, again without storing the result. Same syntax:

TEST [Operand], [Operand]

Once the comparison is performed, to use the information derived from it, it’s necessary to use one of the conditional jump instructions.
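The difference between the two instructions is easiest to see on the Zero Flag. The following C sketch uses two made-up helpers, `zf_after_cmp` and `zf_after_test`, that return the value ZF would have after each instruction; it only models ZF and ignores the other flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* ZF after "CMP a, b": the subtraction a - b is zero only when a == b. */
static bool zf_after_cmp(uint32_t a, uint32_t b)
{
    return (uint32_t)(a - b) == 0;
}

/* ZF after "TEST a, b": the AND is zero when a and b share no set bit. */
static bool zf_after_test(uint32_t a, uint32_t b)
{
    return (a & b) == 0;
}
```

So CMP answers "are the two values equal?", while TEST answers "is any bit of the mask set in the value?". Testing a register against itself sets ZF exactly when the register is 0.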

The Jcc instructions

This is a class of instructions called the conditional jump instructions. There are a LOT of them, and examining every one here would be really long, repetitive and mostly boring. The syntax resembles that of the JMP instruction, except that the operand can only be a relative displacement (not a register or a memory location), and some of them, luckily, are just aliases for other Jcc instructions. In any case, I’ll enumerate here the most used ones, with the flags each one checks:

  • JE [Operand] (if ZF is set) – jump if equal
  • JNE [Operand] (if ZF is not set) – jump if not equal
  • JZ [Operand] (if ZF is set) – same as JE – jump if zero
  • JNZ [Operand] (if ZF is not set) – same as JNE – jump if not zero
  • JS [Operand] (if SF is set) – jump if negative
  • JNS [Operand] (if SF is not set) – jump if not negative
  • JO [Operand] (if OF is set) – jump if overflow
  • JNO [Operand] (if OF is not set) – jump if not overflow
  • JC [Operand] (if CF is set) – jump if carry
  • JNC [Operand] (if CF is not set) – jump if not carry
  • JG [Operand] (if ZF is not set and SF==OF) – jump if greater (signed)
  • JGE [Operand] (if SF==OF) – jump if greater or equal (signed)
  • JL [Operand] (if SF!=OF) – jump if less (signed)
  • JLE [Operand] (if SF!=OF or ZF is set) – jump if less or equal (signed)
  • JB [Operand] (if CF is set) – jump if below (unsigned)
  • JBE [Operand] (if CF is set or ZF is set) – jump if below or equal (unsigned)
  • JA [Operand] (if CF is not set and ZF is not set) – jump if above (unsigned)
  • JAE [Operand] (if CF is not set) – jump if above or equal (unsigned)
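The signed and unsigned conditions can be checked against plain C comparisons. The sketch below models CMP on 32-bit values and expresses some of the jumps as predicates on the resulting flags; the `Flags` struct and all the function names are assumptions made for this example only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four flags the jumps above look at. */
typedef struct {
    bool zf, sf, cf, of;
} Flags;

/* Model of "CMP a, b": compute a - b and keep only the flags. */
static Flags cmp32(uint32_t a, uint32_t b)
{
    uint32_t res = a - b;
    Flags f;
    f.zf = res == 0;
    f.sf = (res >> 31) & 1;
    f.cf = a < b;                               /* a borrow was needed */
    /* signed overflow: operands differ in sign and the result's sign
     * differs from the sign of the first operand */
    f.of = (((a ^ b) & (a ^ res)) >> 31) & 1;
    return f;
}

/* Conditions: each returns true when the jump would be taken. */
static bool je (Flags f) { return f.zf; }
static bool jg (Flags f) { return !f.zf && f.sf == f.of; }
static bool jge(Flags f) { return f.sf == f.of; }
static bool jl (Flags f) { return f.sf != f.of; }
static bool jle(Flags f) { return f.zf || f.sf != f.of; }
static bool jb (Flags f) { return f.cf; }
static bool jbe(Flags f) { return f.cf || f.zf; }
static bool ja (Flags f) { return !f.cf && !f.zf; }
static bool jae(Flags f) { return !f.cf; }
```

Comparing 0xFFFFFFFF with 1 shows why both families exist: as unsigned numbers the first is above the second, so `ja` holds, but as signed numbers it is -1, so `jl` holds as well.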

The following application verifies how many parameters were passed in the command line and returns a different message according to the different situations:

#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc == 2)
    {
        printf("Well done");
    }
    else if (argc > 2)
    {
        printf("You entered too many parameters");
        return 1;
    }
    else
    {
        printf("Please enter at least a parameter");
        return 1;
    }

    return 2;
}

The generated code is the following one:

_main:
00401000  push        ebp
00401001  mov         ebp,esp
00401003  cmp         dword ptr [ebp+8],2 ; argc in ebp+8
00401007  jne         00401018
00401009  push        403000h
0040100E  call        00401050            ; printf: Well done
00401013  add         esp,4
00401016  jmp         00401048
00401018  cmp         dword ptr [ebp+8],2
0040101C  jle         00401034
0040101E  push        40300Ch
00401023  call        00401050            ; printf: too many
00401028  add         esp,4
0040102B  mov         eax,1
00401030  jmp         0040104D
00401032  jmp         00401048
00401034  push        40302Ch
00401039  call        00401050            ; printf: too few
0040103E  add         esp,4
00401041  mov         eax,1
00401046  jmp         0040104D
00401048  mov         eax,2
0040104D  pop         ebp
0040104E  ret

As seen, the argc variable is retrieved from the stack and directly compared to the immediate 2 at 0x00401003: if they are not equal, the flow is redirected to 0x00401018, where there is another comparison. I’d say that the flow here is pretty comprehensible. After each call to printf there’s an ADD instruction that cleans up the stack.
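One way to read the listing is to write the same control flow back in C using only goto, one statement per jump. This is just a sketch: `check_args` and the label names are invented here, and the unreachable jump at 0x00401032 is left out.

```c
#include <stdio.h>

/* Hypothetical helper that mirrors the control flow of the listing. */
static int check_args(int argc)
{
    int ret;

    if (argc != 2) goto not_two;      /* cmp [ebp+8],2 / jne 00401018 */
    printf("Well done");
    goto return_two;                  /* jmp 00401048 */
not_two:
    if (argc <= 2) goto too_few;      /* cmp [ebp+8],2 / jle 00401034 */
    printf("You entered too many parameters");
    ret = 1;                          /* mov eax,1 */
    goto done;                        /* jmp 0040104D */
too_few:
    printf("Please enter at least a parameter");
    ret = 1;                          /* mov eax,1 */
    goto done;                        /* jmp 0040104D */
return_two:
    ret = 2;                          /* mov eax,2 */
done:
    return ret;                       /* pop ebp / ret */
}
```

Note how each condition in the assembly is the opposite of the one in the C source: the compiler jumps away when the condition is false and falls through when it is true.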


The AND, OR and XOR instructions

These three instructions perform, of course, the bit-wise logic operations. They accept two operands and store the result in the first one:

AND/OR/XOR [Destination], [Source]

The destination and the source can be a register or an r/m32 memory address, and the source can be an immediate as well. Again, only one of the operands can be a memory address and not both at the same time. Examples are:

AND EAX, ECX

OR EDX, 0Fh

XOR EBX, EBX

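This last example (XOR EBX, EBX) is worth noting: it is commonly used to set a register to 0 instead of using the MOV instruction. This is done for two reasons:

  • it’s typically at least as fast as the MOV, and often faster
  • it takes fewer opcode bytes (2 bytes, versus 5 for a MOV with a 32-bit immediate)

So, “XOR reg, reg” leaves the register with the same value as “MOV reg, 0”. The one difference is that XOR also updates the flags, while MOV leaves them untouched.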

The NOT instruction (and its sister NEG)

And these are the missing logic instructions. The NOT instruction inverts every bit of the operand (one’s complement), while NEG performs the two’s complement negation of it. I’ve chosen to put them into another category because their syntax is different: they accept only one operand and store the result back into it:

NOT [Operand]

NEG [Operand]

Again, the operand can be a register or an r/m32 memory address (in which case the operand size has to be specified). Examples are:

NOT EAX
NEG DWORD PTR [ESI+EBX]
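The relationship between the two instructions can be sketched in C on unsigned 32-bit values: NEG is the same as NOT followed by adding 1. The function names `not32` and `neg32` are made up for this example.

```c
#include <stdint.h>

/* NOT: flip every bit (one's complement). */
static uint32_t not32(uint32_t x)
{
    return ~x;
}

/* NEG: two's complement negation, i.e. 0 - x, which equals ~x + 1. */
static uint32_t neg32(uint32_t x)
{
    return ~x + 1u;
}
```

For example, negating 1 gives 0xFFFFFFFF (the 32-bit representation of -1), while negating 0x80000000 gives 0x80000000 again, because the most negative value has no positive counterpart in 32 bits.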


The following program illustrates the use of some of the aforementioned logic instructions:

int main(void)
{
    int a = 0xdeadc0de;
    a ^= 0xbaadf00d;

    if (a & 0xcafefeed)
        return 1;
    else
        return 0;
}

which is translated into:

_main:
00401000  push        ebp
00401001  mov         ebp,esp
00401003  push        ecx
00401004  mov         dword ptr [ebp-4],0DEADC0DEh
0040100B  mov         eax,dword ptr [ebp-4]
0040100E  xor         eax,0BAADF00Dh
00401013  mov         dword ptr [ebp-4],eax
00401016  mov         ecx,dword ptr [ebp-4]
00401019  and         ecx,0CAFEFEEDh
0040101F  je          0040102A
00401021  mov         eax,1
00401026  jmp         0040102C
00401028  jmp         0040102C
0040102A  xor         eax,eax
0040102C  mov         esp,ebp
0040102E  pop         ebp
0040102F  ret

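I don’t actually know for sure why two JMP instructions were generated at 00401026 and 00401028 (the second one can never be reached; my guess is that the non-optimizing compiler emits one jump for the return and one to skip the else branch), but the reading of the program is pretty straightforward. Note the use of the XOR instruction at 0040102A to set EAX to 0 (this is the return 0).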
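To see which branch is actually taken, the two logic operations can be worked out by hand or with a few lines of C. In the sketch below, `after_xor` and `program_result` are hypothetical helpers that repeat the computation of the program using unsigned constants.

```c
#include <stdint.h>

/* Value of the local variable after "xor eax,0BAADF00Dh". */
static uint32_t after_xor(void)
{
    return 0xdeadc0deu ^ 0xbaadf00du;
}

/* Return value of the program: 1 when the AND leaves any bit set. */
static int program_result(void)
{
    uint32_t a = after_xor();
    /* "and ecx,0CAFEFEEDh" sets ZF when the result is 0, and "je" jumps
     * to the "return 0" path in that case */
    return (a & 0xcafefeedu) ? 1 : 0;
}
```

The XOR yields 0x640030D3 and the AND with 0xCAFEFEED yields 0x400030C1, which is not zero: ZF stays clear, the JE is not taken and the program returns 1.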


The SHL/SHR instructions

Other bit-wise instructions existing in x86 Assembly are the SHift Left/Right instructions, which shift the bits of the first operand to the left or to the right by the number of bits specified in the second one, padding the empty bit positions with zeroes. They correspond to the “<<” and “>>” operators in C/C++ (on unsigned values), so it’s just like multiplying or dividing the operand by a power of 2. The syntax is the usual one, with two operands:

SHL [Operand], [NumberOfBits]
SHR [Operand], [NumberOfBits]

The first operand can be a register or a memory address, while the second one can be an 8-bit immediate or the CL register. At the end of:

MOV EAX, 0Fh
MOV CL, 2
SHL EAX, CL

the EAX register will contain the value 0x3C, as it is 0xF * 4.

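The bits that are shifted away on the left or right side go into the CF flag, which ends up holding the last bit shifted out. So, at the end of:

MOV AL, 99h
SHL AL, 1

AL will be set to 0x32 and the CF flag will be set as well, because the top bit of 0x99 is shifted out of the 8-bit register. (With the 32-bit EAX in place of AL, the result would be 0x132 and CF would be clear.)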
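Whether a bit ends up in CF depends on the width of the operand. Here is a minimal C sketch of SHL on an 8-bit and on a 32-bit value, using a wider intermediate so that the bit shifted out stays visible. The names `shl8` and `shl32` are invented, and the sketch assumes a shift count between 1 and 7 for the 8-bit case and between 1 and 31 for the 32-bit case.

```c
#include <stdbool.h>
#include <stdint.h>

/* SHL on an 8-bit operand; count is assumed to be in the range 1..7. */
static uint8_t shl8(uint8_t value, unsigned count, bool *cf)
{
    unsigned wide = (unsigned)value << count;  /* keep the shifted-out bits */
    *cf = (wide >> 8) & 1;                     /* last bit pushed out of bit 7 */
    return (uint8_t)wide;
}

/* SHL on a 32-bit operand; count is assumed to be in the range 1..31. */
static uint32_t shl32(uint32_t value, unsigned count, bool *cf)
{
    uint64_t wide = (uint64_t)value << count;  /* keep the shifted-out bits */
    *cf = (wide >> 32) & 1;                    /* last bit pushed out of bit 31 */
    return (uint32_t)wide;
}
```

Shifting 0x99 left by one in 8 bits gives 0x32 with the carry set, while the same shift in 32 bits gives 0x132 with the carry clear, since nothing has fallen off the top of the register yet.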


Compilers optimize multiplications and divisions by powers of 2 by transforming them into shift operations. For example, the following C code:

int main(void)
{
    unsigned int a;

    a = 0xc0de;
    a /= 4;
    a *= 8;

    return a;
}

is translated into:

_main:
00401000  push        ebp
00401001  mov         ebp,esp
00401003  push        ecx
00401004  mov         dword ptr [ebp-4],0C0DEh
0040100B  mov         eax,dword ptr [ebp-4]
0040100E  shr         eax,2
00401011  mov         dword ptr [ebp-4],eax
00401014  mov         ecx,dword ptr [ebp-4]
00401017  shl         ecx,3
0040101A  mov         dword ptr [ebp-4],ecx
0040101D  mov         eax,dword ptr [ebp-4]
00401020  mov         esp,ebp
00401022  pop         ebp
00401023  ret

In this code, it’s clear how the division by 4 has been coded into the “SHR EAX, 2” instruction at 0x0040100E, and the multiplication by 8 is just a “SHL ECX, 3” at 0x00401017.
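The equivalence the compiler relies on can be checked directly in C. In this sketch, `with_shifts` follows the generated code and `with_arithmetic` follows the original source; both names are assumptions made for the example, and the equivalence shown is for unsigned values only.

```c
#include <stdint.h>

/* What the generated code does: shr eax,2 then shl ecx,3. */
static uint32_t with_shifts(uint32_t a)
{
    a >>= 2;   /* unsigned division by 4 */
    a <<= 3;   /* multiplication by 8 (modulo 2^32) */
    return a;
}

/* What the C source says. */
static uint32_t with_arithmetic(uint32_t a)
{
    a /= 4;
    a *= 8;
    return a;
}
```

For the value used in the program, 0xC0DE shifted right by 2 is 0x3037, and that shifted left by 3 is 0x181B8, which is what main returns. The two low bits lost in the division are not recovered by the multiplication.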


With all the knowledge acquired now, it is possible to encode for-loops in x86 Assembly: in fact only ADD and Jcc instructions are actually necessary to do this. The following simple for-loop

#include <stdio.h>

int main(void)
{
    unsigned int i;
    for (i = 0; i < 10; i++)
    {
        printf("%d\n", i);
    }

    return 0;
}

is encoded into:

_main:
00401000  push        ebp
00401001  mov         ebp,esp
00401003  push        ecx
00401004  mov         dword ptr [ebp-4],0      ; i = 0
0040100B  jmp         00401016
0040100D  mov         eax,dword ptr [ebp-4]
00401010  add         eax,1                    ; i++
00401013  mov         dword ptr [ebp-4],eax
00401016  cmp         dword ptr [ebp-4],0Ah    ; i >= 10 ?
0040101A  jae         00401030                 ; yep: leave the loop
0040101C  mov         ecx,dword ptr [ebp-4]
0040101F  push        ecx
00401020  push        403000h                  ; "%d\n"
00401025  call        dword ptr ds:[00402098h] ; printf
0040102B  add         esp,8                    ; printf stack cleaning
0040102E  jmp         0040100D
00401030  xor         eax,eax                  ; return 0
00401032  mov         esp,ebp
00401034  pop         ebp
00401035  ret

It was pretty messy to explain this function in words, so I decided to put comments in the code itself. Worth noting, anyway, is the JMP instruction at 0x0040100B that skips the i++ on the first iteration.
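The shape of the compiled loop becomes clearer when it is written back in C with goto statements, one per jump in the listing. This is only a sketch: `count_loop`, the labels and the `iterations` counter are additions made for the example (the counter is not in the original program and is there just so the result can be checked).

```c
#include <stdio.h>

/* Hypothetical rewrite of the for-loop following the listing's jumps. */
static unsigned int count_loop(void)
{
    unsigned int i;
    unsigned int iterations = 0;  /* not in the original: used for checking */

    i = 0;                        /* mov [ebp-4],0 */
    goto check;                   /* jmp 00401016: skip the first i++ */
increment:
    i = i + 1;                    /* add eax,1 */
check:
    if (i >= 10) goto done;       /* cmp [ebp-4],0Ah / jae 00401030 */
    printf("%u\n", i);            /* call printf */
    iterations++;
    goto increment;               /* jmp 0040100D */
done:
    return iterations;
}
```

The body runs for i from 0 to 9, so the function returns 10. As in the listing, the loop condition is inverted: the code jumps out when i is above or equal to 10 instead of staying in while i is below 10.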

I’d say it’s enough for now, so, see you next time!
