"Are you such a dreamer to put the world to rights?
I stay home forever
where 2 and 2
always makes a 5"
[Thom Yorke - 2 + 2 = 5]
I owe everything I know about x86 architecture to Xeno Kovah, a man who made his class videos and slides freely available to everyone, which is a noble act. In return for his great efforts, I decided to write this tutorial on x86 architecture and assembly and publish it for free, so everyone who is interested can learn and contribute.
About the Author
Arash TC is the main author and maintainer of this book. He is currently studying IT in Finland. He's got OSCP/OSCE and a long, long, long way ahead to become a pro. He appreciates readers' comments, criticism and contributions. His main interests are low level security and kernel internals.
The Holy Book of x86
This book/guide/tutorial/wiki is about assembly and x86 architecture. It's written by a low level security dude for low level security dudes. If you want to learn Assembly and its structure, reversing basics, Segmentation, Paging etc., keep on reading. I highly recommend you check the opensecuritytraining.info website and watch the Intro to x86 videos as you read this book. This book will teach you x86 architecture from the perspective of Information Security and Trusted Computing.
UPDATE from August 2017: now you can get the book in hard copy! Make sure to check it out: The Holy Book of x86 on Amazon
Below we present you a part of the book. Here you can download your copy: https://github.com/Captainarash/The_Holy_Book_of_X86
Registers are small storage areas built into the processor. They are volatile storage, so if you power off your PC, you're gonna lose the state of your registers. Intel Architecture defines 8 General Purpose Registers (GPRs): EAX, EBX, ECX, EDX, ESI, EDI, ESP and EBP.
Each of the registers above is 4 bytes long in the 32-bit version and 8 bytes long in the 64-bit version. Besides those General Purpose Registers, we have EIP (RIP for 64-bit), the Instruction Pointer, which holds the address of the next instruction to execute; and we have EFLAGS, a 32-bit register made up of individual flag bits, a register of registers if you like. Yeah, I know that might sound crazy, but I will explain it fully in later chapters.
EAX is mostly used to hold the value a function returns, and it is also used for lots of other purposes. You have to see the registers in action in order to recognize their usefulness in different scenarios. EBX is the base pointer for the data section and EDX is the I/O pointer, but let's save these conventions for later chapters. ECX is mostly used as a counter for repetitive instructions, e.g. a for loop. ESI and EDI are used as Source Index and Destination Index respectively, e.g. when copying a string. ESP is the stack pointer, which always points to the top of the current stack. EBP is the base pointer, which always (actually not always :D) points to the bottom of the current stack frame by convention. I have to mention that these are only conventions and you don't have to follow them exactly. They are simply there for the simplicity and readability of your code.
There are two very important concepts you need to know: Stack and Heap. I'm gonna tell you what the Stack is now and save the Heap for later chapters. The stack is a conceptual area of memory (RAM) which mostly holds a function's local variables. The stack is a Last-In-First-Out (LIFO) data structure, meaning that the first thing that is pushed onto the stack is the last thing that is gonna pop out. Imagine a bucket full of apples. The first apple that you put inside the bucket is the last one that you can pull out (of course if you don't just turn the bucket upside down :D). The stack grows down from higher memory addresses to lower memory addresses. For example, if the top of the stack is at address 0x7fff4444 (ESP), the next DWORD (4 bytes, remember?) that you push onto the stack decrements ESP by 4 bytes, and then ESP will point to 0x7fff4444 - 4 = 0x7fff4440. You will see this in greater detail in the next few paragraphs.
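To make the apple bucket and the address arithmetic concrete, here is a minimal Python sketch. It is only a toy model, not real assembly: the helper name esp_after_pushes is made up for this illustration, and the starting address is the example value used above.

```python
# Toy model only: a Python list stands in for the bucket of apples.
bucket = []
bucket.append("first apple")
bucket.append("second apple")
last_out = bucket.pop()          # LIFO: the last apple in is the first one out
print(last_out)                  # second apple

DWORD = 4                        # a DWORD is 4 bytes

def esp_after_pushes(esp, count, width=DWORD):
    """Return the value ESP would hold after `count` pushes of `width` bytes.

    Hypothetical helper: the stack grows toward lower addresses,
    so every push subtracts `width` from ESP.
    """
    return esp - count * width

start = 0x7fff4444               # example address from the text
print(hex(esp_after_pushes(start, 1)))   # 0x7fff4440
print(hex(esp_after_pushes(start, 3)))   # 0x7fff4438
```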
There are still some major things you need to know in order to fully understand even a simple "Hello World" program in Assembly. So stick with me and be patient.
Caller - Callee Convention
Caller Save Registers means that whenever you want to call a function, you save these registers (EAX, EDX, ECX for 32-bit or RAX, RDX, RCX for 64-bit) somehow, so that your data remains intact after execution is handed over to the function. That means the caller is responsible for saving these registers, because the function is free to modify the values held in them. The caller is also responsible for restoring the saved values when execution gets back to it.
Callee Save Registers means that when the function (the callee) needs more registers than those already saved by the caller, the callee is responsible for saving their values before going on to its actual work. The registers that the callee is responsible for are EBP, EBX, ESI, EDI (RBP, RBX, RSI, RDI for 64-bit). The callee must restore these saved values to their place before handing execution back to the caller. (The exact register sets depend on the calling convention in use, especially in 64-bit code, so treat these lists as the common case.)
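The division of labor can be sketched with a toy register file in Python. This is an illustration under assumptions, not real calling-convention code: the register values, the callee function and its return value of 42 are all invented for the example, and a Python list plays the role of the stack.

```python
# Toy register file; the values are arbitrary placeholders.
regs = {"eax": 1, "ecx": 2, "edx": 3, "ebx": 4, "esi": 5, "edi": 6, "ebp": 7}
stack = []  # a Python list standing in for the real stack

def callee():
    """Hypothetical function: uses EBX as scratch and returns 42 in EAX."""
    stack.append(regs["ebx"])       # callee saves the callee-save register it will use
    regs["ebx"] = 0xBAD             # scratch use of EBX
    regs["ecx"] = regs["edx"] = 0   # callee may freely trash caller-save registers
    regs["eax"] = 42                # return value goes in EAX
    regs["ebx"] = stack.pop()       # callee restores EBX before returning

def caller():
    # The caller still needs ECX and EDX after the call, so it saves them.
    stack.append(regs["ecx"])
    stack.append(regs["edx"])
    callee()
    result = regs["eax"]            # read the return value
    regs["edx"] = stack.pop()       # restore in reverse order (LIFO)
    regs["ecx"] = stack.pop()
    return result

result = caller()
print(result, regs["ecx"], regs["edx"], regs["ebx"])  # 42 2 3 4
```

Note that the caller does not bother saving EAX here, since it expects EAX to be overwritten with the return value.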
Here are some instructions for you, but before you begin you must know the basic syntax of an assembly instruction. We have 2 different notations for assembly: Intel notation and AT&T notation. In Intel notation, the instruction is followed first by the destination, then a comma, then the source. It looks like this:
instruction destination, source
In AT&T notation, the instruction is followed first by the source, then a comma, then the destination. Every register has a percent sign (%) prefixed to it. It looks like this: instruction %source, %destination
*** NOTE: The percent sign is only applied to registers. It is not applied to immediate values, which take a dollar sign ($) prefix instead.
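The operand swap between the two notations can be shown with a small Python sketch. It is deliberately naive and rests on assumptions: the function names are made up, it only handles two-operand instructions whose operands are one of the eight 32-bit registers or an immediate, and it ignores memory operands and AT&T size suffixes entirely. It also assumes the AT&T rule that immediates take a $ prefix.

```python
# The eight 32-bit general purpose registers.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "esp", "ebp"}

def att_operand(op):
    """Format one operand AT&T style: % for registers, $ for immediates."""
    op = op.strip().lower()
    if op in REGISTERS:
        return "%" + op
    return "$" + op  # assumption: anything that is not a register is an immediate

def intel_to_att(line):
    """Convert 'instr dest, src' (Intel) into 'instr src, dest' (AT&T)."""
    mnemonic, operands = line.strip().split(None, 1)
    dest, src = operands.split(",")
    return f"{mnemonic.lower()} {att_operand(src)}, {att_operand(dest)}"

print(intel_to_att("mov eax, ebx"))    # mov %ebx, %eax
print(intel_to_att("add esp, 0x10"))   # add $0x10, %esp
```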
The first instruction for you to learn is NOP. Better wipe that smile off your face and tell me: what does NOP do? The NOP instruction is this: XCHG eax, eax.
It conceptually does nothing, but behind the scenes it is actually doing something. It exchanges (XCHG, as you guessed) the value in EAX with EAX.
The PUSH instruction pushes a word, a dword or a quadword onto the stack (a byte-sized immediate is sign-extended to one of those sizes first). For this part of the tutorial I will only explain pushing a dword (4-byte value) onto the stack. The rest takes only seconds to understand once you get this. In order to fully understand what a PUSH instruction does, you have to see it demonstrated. Take the following instructions:
(1) PUSH 0x41414141
(2) PUSH 0x42424242
(3) PUSH 0x43434343
Assume ESP points to some address that holds the value 0xDEADBEEF (0) before the 3 lines above execute. Each PUSH instruction decrements ESP by 4 and then writes the value to the address ESP now points to, so after every PUSH the new ESP points to the value just pushed.
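The three pushes can be traced with a small Python model. This is a sketch under assumptions: the class name Stack32 is invented, memory is just a dictionary from address to DWORD, and the starting address 0x7fff4444 is borrowed from the earlier stack example since the text does not say where ESP starts here.

```python
class Stack32:
    """Toy 32-bit stack: a dict maps addresses to DWORD values."""

    def __init__(self, esp, top_value):
        self.esp = esp
        self.mem = {esp: top_value}   # (0) the value ESP points to initially

    def push(self, value):
        self.esp -= 4                          # ESP is decremented by 4 first...
        self.mem[self.esp] = value & 0xFFFFFFFF  # ...then the DWORD is stored there

s = Stack32(esp=0x7fff4444, top_value=0xDEADBEEF)  # assumed starting address
for value in (0x41414141, 0x42424242, 0x43434343):
    s.push(value)
    print(hex(s.esp), hex(s.mem[s.esp]))
# 0x7fff4440 0x41414141
# 0x7fff443c 0x42424242
# 0x7fff4438 0x43434343
```

The original 0xDEADBEEF is still sitting at 0x7fff4444; the pushes only added new values below it.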
The POP instruction is exactly the opposite of a PUSH instruction. It pops (moves) whatever value ESP is currently pointing at into a register and increments ESP by 4. In the demonstration below, assume EAX holds the value 0xDEADCE11 before the following 3 lines of assembly execute. By issuing a POP EAX instruction (1), the value at the address that ESP is pointing to at the time (CCCC or 0x43434343) is popped off the stack and shows up in the EAX register, and ESP is incremented by 4. Notice that popping a value off the stack does not destroy the popped value in memory. The instruction just copies it to the register and adds 4 to ESP.
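A single POP EAX can be modeled in Python the same way. This is a toy sketch with assumed addresses: the memory layout below is what the three example pushes would leave behind if the stack had started at 0x7fff4444, which is an assumption and not something the text specifies.

```python
# Assumed stack contents after the three example pushes (address -> DWORD).
mem = {
    0x7fff4444: 0xDEADBEEF,
    0x7fff4440: 0x41414141,
    0x7fff443c: 0x42424242,
    0x7fff4438: 0x43434343,
}
regs = {"esp": 0x7fff4438, "eax": 0xDEADCE11}

def pop(reg):
    """Copy the DWORD at ESP into `reg`, then add 4 to ESP."""
    regs[reg] = mem[regs["esp"]]
    regs["esp"] += 4

old_esp = regs["esp"]
pop("eax")
print(hex(regs["eax"]))   # 0x43434343
print(hex(regs["esp"]))   # 0x7fff443c
print(hex(mem[old_esp]))  # 0x43434343 (the popped value is still in memory)
```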