
                            Back To The Basics
                                by SPo0ky       


    I wrote this tutorial because some beginners who read the first two
  editions of our magazine told us that they have problems understanding the
  basics of assembly... In this tutorial I won't try to teach you any virus
  techniques. I'll only try to explain the most basic things of assembly,
  like how tasm/tlink work, registers, memory, interrupts,... as simply as
  possible.
  Maybe this sounds boring to you but if you want to code your own viruses
  you have to understand the basics.
  Also, this article will not fully teach you assembly! It will only help you
  to understand the basics. To really understand assembly you will need at
  least a few weeks of solid practice (= programming). I also suggest you buy
  a good book about Assembler and read many many many virus source codes
  (That's the way I learned Assembler).
  (Also - If you don't have a bookstore in your town you can use the online
   bookstore at http://intertain.com, they have many great books, and they
   are fast and cheap!)


 1. Why Assembler?

    There are pros and cons to using Assembler.
  Unlike HLLs (High Level Languages) such as Pascal, Basic or C++, in
  Assembler you have to tell the CPU every single step it has to execute,
  which means that writing a big (complex) Assembler program is very time
  consuming. That's why big programs are usually not written completely in
  Assembler; instead, Assembler parts are included in HLL programs.
  Another con of Assembler is that a program can only run on the CPU family
  it was written for, because each CPU family has its own instruction set
  (we will use the 80x86 instruction set, which is used in the IBM PC and
  compatibles). On the other hand, this gives you the chance to optimize the
  code for one specific CPU so that you can use all of its capabilities.
  The result can be extremely FAST and SMALL code.


 2. A simple assembly program

    Let's start with a short Assembler program. You don't have to understand
  what each instruction is used for yet, I'll explain that later.
  Just type this program into an ASCII editor (e.g. edit.com) and save it as
  example.asm.

    .model tiny
    .code
      org 100h

    start:
      mov ah,9h
      mov dx,offset message
      int 21h

      mov ax,4c00h
      int 21h

      message   db  'CodeBreakers Rule! ;-)',10,13,'$'
    end start


 3. Assembler and Linker (TASM and TLINK)

    Before I continue I'll show you how to use TASM and TLINK to turn such
  a file into an EXE or COM program.

  After you have saved the above program into a file (example.asm) you can
  type

    TASM EXAMPLE.ASM

  This will generate a so-called OBJECT FILE named EXAMPLE.OBJ.
  Basically this file contains the above code translated into binary
  (machine code), plus some information for the linker.

    Example:

      MOV AH,9h

    would be translated into 1011 0100 0000 1001
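
    To convince yourself that an instruction really is just a few bytes,
  here is a small sketch (my own illustration, not something you need for
  the example program): both lines below put the same two bytes (B4h and
  09h, the binary number shown above) into the program.

      mov ah,9h          ; the assembler translates this into B4h 09h
      db  0B4h,09h       ; the same two bytes, written by hand as raw data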

  This object file is not executable yet. To make an executable file (COM or
  EXE) we have to use a LINKER (TLINK).
  The linker sticks one or more object files together into one executable
  file.
  To link the example.obj file you just type:

    TLINK /t EXAMPLE.OBJ

  The /t switch tells the linker that it should produce a COM file; if you
  leave out /t you will get an EXE file. Anyway, now you should have a
  ready-to-run COM file... Just type EXAMPLE to start it!


 4. Registers

    Registers are extremely fast memory cells inside the CPU. They are
  used to address memory, to pass values to instructions and interrupts,...
  generally they are used to store "values".
  All registers of the 8086 can store 16 bits (= 2 bytes) of data, and some
  registers can be split into two 8 bit (= 1 byte) registers.

   Registers of the 8086 CPU:

     +--------------------+
     | AH  |  AL  >    AX |  -> Accumulator Register
     | BH  |  BL  >    BX |  -> Base Register
     | CH  |  CL  >    CX |  -> Count Register
     | DH  |  DL  >    DX |  -> Data Register
     +-----+--------------+
     |         SI         |  -> Source Index
     |         DI         |  -> Destination Index
     +--------------------+
     |         BP         |  -> Base Pointer
     |         SP         |  -> Stack Pointer
     +--------------------+
     |         CS         |  -> Code Segment
     |         DS         |  -> Data Segment
     |         ES         |  -> Extra Segment
     |         SS         |  -> Stack Segment
     +--------------------+
     |         IP         |  -> Instruction Pointer
     +--------------------+
     |         F          |  -> Flag-Register
     +--------------------+

   Only the registers AX, BX, CX and DX can be split into two parts:
   AH, AL, BH, BL, CH, CL, DH, DL. Each of them has only 8 bits instead of
   16! The H register is the high (upper) byte and the L register is the low
   (lower) byte of the 16 bit register.

   BTW -  8 bits are called a BYTE
         16 bits are called a WORD
         AH, AL, BH, BL, and so on, are called BYTE registers and all others
         (AX, BX, CX, DX,...) are called WORD registers.
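
   A small sketch of how the two halves fit together (the values are just
   made up for illustration):

       mov ax,1234h     ; AX = 1234h  ->  AH = 12h, AL = 34h
       mov al,0FFh      ; only the low byte changes:  AX = 12FFh
       mov ah,0         ; only the high byte changes: AX = 00FFh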


 5. MOV(e)

    Assembler (or rather the CPU) provides many instructions which allow you
  to manipulate (to change) the data stored in a register. One of the most
  important of them is MOV.
  Look back at our example program... we used MOV 3 times (MOV AH,9 /
  MOV DX, OFFSET MESSAGE / MOV AX,4C00H).
  You can change the data in a register by using: MOV <REG>, <DATA>, where
  <REG> is the 16 or 8 bit register you want to change, and <DATA> is the
  data you want to store in the register. Note that MOV copies: if <DATA> is
  another register, that register keeps its value.

    But MOV can't only be used to change the data in registers, it can also
  be used to change data stored at a certain location in MEMORY.
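
    A few sketches of what MOV can look like (the label 'counter' is only
  an assumed example variable, it is not part of our program):

      mov cx,10          ; number    -> register:  CX = 10
      mov bx,cx          ; register  -> register:  BX = 10, CX stays 10
      mov [counter],bx   ; register  -> memory:    the word at 'counter' = 10
      mov dl,'A'         ; character -> byte register: DL = 41h

      counter  dw  0     ; one WORD of memory (put it after the exit code)

  Both sides of a MOV must have the same size, so something like MOV AL,BX
  (byte <- word) will be refused by the assembler.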

 6. MEMORY

    When you execute our little example program it just displays text... now,
  how can the computer know WHICH text it should display? Again, look at the
  example program's source code:

    MOV AH, 9
    MOV DX, OFFSET MESSAGE
    INT 21H
    .
    .
    MESSAGE DB 'Some text...$'

    The first line is used to tell DOS that it should display text
  (function 9 in AH is used to display text). And the second line tells it
  where to look for the text in memory.
  Each byte in memory gets a number (an address), so the CPU knows exactly
  which byte it has to read/write.
  In the second line, "OFFSET MESSAGE" gives the address where the
  MESSAGE is stored in memory, and MOV stores it in DX (to display text
  MS-DOS requires that the address of the text is stored in the register
  DX!).

 Some examples:

  Let's say this is our memory:

 Offset:   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16 
 Data:   | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P |

  We want to get the data which is stored at offset 6 into register AH.
  Which instruction would be used?

  -> MOV AH, [6]

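  This would put the data at offset 6 (= 'F') into register AH. The '[' and
  ']' are very important. If you forget them it would put the number 6 into
  AH instead of 'F'!
  (Careful: some assemblers, MASM and as far as I know TASM in its default
  MASM mode, treat a bare number in brackets as a plain number anyway. If you
  really use a fixed number as the address, write MOV AH, DS:[6] to be safe.
  With a label, like [MESSAGE], you don't have this problem.)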

  Remember, AH is an 8 bit register (-> it can store only 1 byte or 8 bits).
  What would happen if we used AX (which is a 16 bit register) instead of
  AH?

  -> MOV AX, [6]

  AX would become 'GF'. Yes, not 'FG'! The byte at offset 6 ('F') goes into
  the low half AL, and the byte at offset 7 ('G') goes into the high half AH.
  The x86 always stores the low byte of a word first in memory (this is
  called "little endian"). That's not that important for you yet, but you
  should know it anyway...
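
  The same thing as a sketch you can assemble (the label 'letters' is my
  own assumption; since the table above starts counting at 1, offset 6 is
  the address of 'letters' plus 5):

      mov ah,[letters+5]            ; AH = 'F' (46h)
      mov ax,word ptr [letters+5]   ; AL = 'F' (46h), AH = 'G' (47h)
                                    ; -> AX = 4746h

      letters  db  'ABCDEFGHIJKLMNOP'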

 7. Interrupts

    It took me a very long time to find a (hopefully) good way to explain
  interrupts to a newbie! Finally I decided to use a simple example, the
  MS-DOS prompt.
  When you are at the MS-DOS prompt you can enter commands, and after you
  press RETURN the command gets executed. You could compare pressing the
  RETURN key with an interrupt. In assembler you fill the registers with
  values, then you execute the interrupt. The interrupt code then
  evaluates the values you put into the registers, decides which
  function it should execute,... and finally returns the results
  (in a register, on the screen or on your hdd,...)

    Ex.:
      MOV AH,9
      MOV DX,OFFSET MESSAGE
      INT 21H

    The first two lines of this example have already been explained above.
  The 3rd line executes an interrupt, interrupt 21h(ex).
  Interrupts are numbered from 0 to FFh, and each interrupt provides a
  different type of 'service'. For example:
    INT 21H is the MS-DOS interrupt. It provides basic DOS functions,
     like input/output of text and file functions (like open, read, write to
     files).
    INT 13H is the BIOS disk interrupt. It provides many disk access
     functions like reading/writing/formatting of disk sectors.
    INT 10H is the BIOS video interrupt. It allows you to use many
     functions to make nice graphics :-) It provides functions to change the
     video mode, to draw pixels onto the screen, to change the color of text,
     and so on...
  For a list of all interrupts and their functions download Ralf Brown's
  Interrupt List from http://www.cs.cmu.edu/afs/cs.cmu.edu/user/ralf/pub/WWW/
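
  To show that other interrupts work the same way (fill the registers, then
  INT), here is a sketch of a complete COM program that prints one letter
  through the BIOS video interrupt instead of DOS. Function 0Eh ("teletype
  output") expects the character in AL and the display page in BH:

    .model tiny
    .code
      org 100h

    start:
      mov ah,0Eh         ; INT 10h function 0Eh: write one character
      mov al,'A'         ; the character to display
      mov bh,0           ; display page 0
      int 10h

      mov ax,4c00h       ; exit to DOS, as in our example program
      int 21h
    end start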

 8. The sample program, step-by-step

  .model tiny
  -----------
    This is not code which will later be executed... it just tells TASM/TLINK
   that they should use one segment for the whole program (code and data
   together, which is what a COM file needs). There are also .model small,
   .model huge, etc,... but for small programs (= simple viruses ;) model
   tiny is enough.

  .code
  -----
    This tells TASM and TLINK that our executable code begins here. After
   this line we can begin to write our main program.

  org 100h
  --------
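    All COM files are loaded into memory at offset 100h (DOS uses the first
   100h bytes of the segment for itself). ORG doesn't move anything, it just
   tells the assembler at which offset the code will sit in memory, so that
   addresses like 'OFFSET MESSAGE' come out right.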

  start:
  ------
    This is just a label. It marks the first instruction of the program, and
   the 'end start' line at the bottom refers to it.

  MOV AH, 9
  ---------
    We want to display text... we will use the DOS interrupt to do so.
   INT 21h requires that we put the function number into register AH. So, to
   tell DOS that we want to display some text we 'MOV(E)' the number of
   the function used to display text (9) into register AH.

  MOV DX, OFFSET MESSAGE
  ----------------------
    DOS needs to know where to find the text it should display... Function 9
   of INT 21H expects this location in register DX. So we just get the
   address of the message with 'OFFSET MESSAGE' and move it into DX.

  INT 21H
  -------
    Now that we have 'collected' enough information (filled the registers
   with many stupid numbers) we can execute the interrupt, which will finally
   get the CPU to display some text for us. :-)

  MOV AX,4C00H
  ------------
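    Never forget the last two lines of this code! They are used to exit a
   program. If you forget them, the CPU just keeps going and tries to execute
   whatever bytes come next (here: our message text), and your program will
   most likely crash.
   DOS uses function 4C (in AH) to exit programs. The 00 (in AL) is the
   return code, and 00 means that we exit without an error.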

  INT 21H
  -------
    Now INT 21H will execute the function to exit the program...
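
    If you want to report an error to whoever started your program, put a
   number other than 00 into AL. A small sketch (the value 1 is just an
   example):

       mov ah,4Ch       ; function 4C: exit program
       mov al,1         ; return code 1
       int 21h

   In a batch file you could then test it with IF ERRORLEVEL 1.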

  message   db  'CodeBreakers Rule! ;-)',10,13,'$'
  ------------------------------------------------
    This is the message we WANT to display :-)
   '10,13' does the same as pressing return, it puts the cursor at the start
   of the next line (10 = line feed, 13 = carriage return).
   '$', this sign doesn't mean 'fast money' :) ... somehow DOS needs to know
   where to stop displaying text, and function 9 uses '$' for that. The '$'
   itself is not displayed.

  end start
  ---------
    This marks the end of the source file. The 'start' after END tells TASM
   at which label the program begins to execute.
   btw - This doesn't exit the program! To exit the program you still have to
   use function 4C with INT 21h.


  Ok, that's all for now... I know that this is a very basic tutorial, and
 that it wasn't written very well... but I hope that it answered at least
 a few of your questions! If you have any further questions feel free to use
 the message board on our homepage at www.codebreakers.org, or email me at
 spo0ky@thepentagon.com (spo0ky with zero! :)... Maybe I'll write a FAQ with
 specific questions for the 4th edition.

   --SPo0ky
