Tunneling Document #4
Development
of
Emulation Systems
Tunneling Documents #1, #2, #3, and #4 are all (c) 1997 PRINCE OF SADNESS and may not be modified without prior consent by the copyright holder, but may be reprinted and/or used as long as the correct copyright status of that document is stated, and that the medium in which my work is published must be free. All documents are read/used at your own risk. ICE and COS are (c) 1997 PRINCE OF SADNESS and may be modified as long as the base code of the modified system is acknowledged, and the copyright status of said base system be stated. ICE and COS may be used, as long as the copyright status of said systems be stated correctly, but in their compiled binary form, if no other copyrights are stated in the complete package which the ICE and/or COS systems are part of, then the copyright status of ICE and/or COS need not be stated, however all usage of ICE and COS must be free. ICE and COS systems are read/used at your own risk.
    .--------------.
    | Introduction |
    '--------------'
    Recently, emulation systems (aka Generic Decryption in the AV world) have
come into the limelight, especially in the AV marketing process under many
various names such as "Viral Instruction Code Emulation" and "Stryker", and
even though their usage by the AV is in a crippled form, this document will
take us into the wonderful world of emulation and its uses by the virogen.
    Emulation solves many problems of the tunneling process, while bringing in
many of its own.  Chief among them... emulation systems are CPU dependent, and
as such, I had to decide whether to give you a crippled XT emulation system to
explain, which would not run very well on higher 386+ computers... or give you
a complicated 386+ emulation system which would not run on computers lesser
than the 386, such as the XT.
    I have opted for the 386+ emulation system for many reasons.  First of all,
XT emulation has been done before... but full 386+ emulation hasn't.  Also, if
you know how to write a 386+ emulator, you can write an XT emulator, however
the reverse is not so true.  Finally, my XT emulator wasn't really that good,
and it was hard to test as my BIOS and DOS have 386+ instructions in them :)
 .----------------------------------------------------------------------------.
 |                                  Part 1                                    |
 |                                                                            |
 |                          Basic emulation overview                          |
 '----------------------------------------------------------------------------'
-----------------------------------------------------------------------------
Section 1:  What is emulation?
-----------------------------------------------------------------------------
    Creating an emulation system is really just the development of your own
software based CPU.  This virtual-CPU can then be used to run code in
your own completely protected environment.  This allows you to control every
facet of that code being run... as it is not really 'running', you are simply
emulating what WOULD happen if it was running under a real CPU... in accordance
with your own set of CPU rules.
    You should have realised by now, that single stepping through an interrupt
is the most reliable way to detect an original interrupt entrypoint, however a
major flaw inherent in single stepping is anti-tunneling code.  In an emulation
system however, it is as though you are single stepping through the interrupt
code... however you -CANNOT- be detected.  Of course, emulation itself is
nothing like single stepping :)
    So far you may be stupid enough to think emulation is some form of single
stepping or code tracing.  This is very far from the truth.  No code is
'executed' or 'emulated' in code tracing (except maybe JMP SHORT, etc), and
emulation has NOTHING to do with single step mode.  A computer could be devoid
of single step mode and it would be of no problem to the emulator.  Later on
however, you will learn that the emulation system you will be learning about
can actually EMULATE single step mode (however, by then of course, you will
understand more fully the concept of emulation).
    Also, do not be under the common assumption that in emulation, no code is
actually run.  This is untrue.  Code being emulated -IS- run... however under
the complete control of your emulation system :)  Code written to write some
text to the screen, or data to disk, will do so under an emulation system.  The
line seems to blur however when the AV talk about emulation systems, and their
emulation systems do not emulate such things that will write to the disk, or
the screen, etc.  Such a system is still emulation, it's just that the emulator
itself controls the code being emulated in this specific way.
-----------------------------------------------------------------------------
Section 2:  Emulation history
-----------------------------------------------------------------------------
    In the public view, usage of emulation in virogen began February 3rd 1995,
when Antigen [VLAD] released the first version of Antigen's Radical Tunneler
(ART).  Soon however, he released a new version (2.2) which was emulation in
its own right (whereas the first version was of somewhat less capability).
    ART 2.2 had many problems in its emulation, the least of which was that it
could only (barely) emulate an XT, however in general, in a tunneler, this is
enough.  Antigen did create more versions of ART (up to at least v4), however I
have not seen them so I cannot comment on whether he has fixed previous problems
or not.
    The idea of ART quickly inspired CyberGOD to create 'Tracer', a complete
emulation system with none of the bugs of ART.  Tracer was faster, smaller, and
more complete than ART (it supported some common extra 186+ instructions), and
also it came in nice modules to be compiled together, with a demonstration
module that used Tracer to become a primitive 'DEBUG' program.
    Unfortunately, Tracer keeps its secrets well hidden in complex intertwining
code, and neither I, nor many other people, can understand it.  ART had this
same problem to a much lesser degree (as I understood it in the end) :)  This
is a bad thing in that I cannot learn from Tracer, however it is a good thing
in that Tracer's structure will not influence my emulation system, and also
because I have learnt to comment properly so that my code does not become like
Tracer's in being unreadable :)
    And that's it.  Those are the only two emulation systems created for usage
in virogen by VX coders.  This is good, much room remains for expansion of the
emulation system, especially into the handling of new instruction sets, new
structures, and new uses (so far, they can be used for tunneling and mid-file
infection, however there are probably many more uses).
-----------------------------------------------------------------------------
Section 3:  Emulation system structure
-----------------------------------------------------------------------------
    There are 3 categories most emulation systems can be lumped into... some of
which are only present in the AV world, some in the application world, and some
in the world of virus creation.
    .------.
    | SCCE |
    '------'
    Self Contained Code Emulation works like a proper CPU.  An instruction is
fetched from memory, it is decoded by the SCCE, and passed to an appropriate
routine which will emulate the instruction, and the loop continues on the next
instruction.  The emulator would contain routines to decode the memory/register
addressing operands, and then a routine for every possible instruction on the
CPU being emulated.  As you can imagine, SCCE can become quite large in size...
and slow in speed.
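    The fetch/decode/emulate loop described above can be pictured with a tiny
sketch.  This is only an illustration in Python, not code from any of the
emulators discussed: the function name run_scce, the dictionary register
file, and the choice of just four real opcodes (NOP, INC AX, MOV AX,imm16 and
HLT) are all assumptions made for the example, and flags are ignored.

```python
def run_scce(code, max_steps=100):
    """Fetch, decode and emulate opcodes until HLT; return the register file.

    Hypothetical sketch: one dedicated routine per supported opcode.
    """
    regs = {"ax": 0, "ip": 0}

    def op_nop(regs):
        regs["ip"] += 1

    def op_inc_ax(regs):
        # INC AX, flags are not modelled in this sketch
        regs["ax"] = (regs["ax"] + 1) & 0xFFFF
        regs["ip"] += 1

    def op_mov_ax_imm16(regs):
        ip = regs["ip"]
        regs["ax"] = code[ip + 1] | (code[ip + 2] << 8)  # little-endian word
        regs["ip"] += 3

    handlers = {0x90: op_nop, 0x40: op_inc_ax, 0xB8: op_mov_ax_imm16}

    for _ in range(max_steps):
        opcode = code[regs["ip"]]          # fetch
        if opcode == 0xF4:                 # HLT ends the run
            return regs
        handler = handlers.get(opcode)     # decode
        if handler is None:
            raise ValueError("unsupported opcode %02Xh" % opcode)
        handler(regs)                      # emulate
    raise RuntimeError("step limit reached")


program = bytes([0xB8, 0x34, 0x12,  # MOV AX, 1234h
                 0x40,              # INC AX
                 0x90,              # NOP
                 0xF4])             # HLT
final = run_scce(program)
```

    Even this toy needs one routine per opcode, which is where the size of a
full SCCE comes from.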
    The SCCE of course, has its uses for advanced AV software.  With all of the
instructions being handled internally... the AV can make the emulator report
extensively on the actions of each instruction... allowing these reports to be
cross referenced with heuristic data and generic cleaning modules to create an
effective AV system.  Also, the AV can control memory and port access down to
the most minute detail... since it is handling address calculation/decoding,
etc, all internally.  This means it can prevent the virus from escaping its own
memory area... or at least... if the SCCE is designed securely ;)
    Unfortunately, the AV are too scared... or maybe just not competent enough
to code or realise the usage of such an emulation system, and often opt out to
create inferior LCE systems (described later).  The SCCE system is however, put
to good use on the Macintosh and similar machines, where an emulation system is
coded to provide an INTEL processor in the Macintosh environment, allowing DOS
and other OSes to run in a window.
    .-----.
    | BCE |
    '-----'
    Buffered Code Emulation, or BCE is a scaled down version of the SCCE, good
for usage in viruses due to its small size and faster speed in comparison to
SCCE.  This is obviously apparent, as all 3 emulation systems written by virus
coders (ART, Tracer, and ICE) use the BCE model to achieve emulation.
    In the BCE, an instruction is fetched from memory and compared against a
list of instructions which are 'special'.   If an instruction is not special,
it is decoded slightly to get its length, and then all such instructions are
routed to one small procedure which can generically emulate any instruction
which is not-special.  Special instructions, a small percentage of the complete
instruction set, are handled in specific small handlers.
    The BCE lessens the number of instructions it has to handle specifically by
routing the non-special instructions through a small generic handler, and by
doing this it reduces its size and increases its speed.  However, this is not
without its drawbacks, as it means you can't really restrict access to certain
memory areas or ports or anything like that, and you cannot create reports as
comprehensive as those an SCCE can provide.  However, those features aren't
needed in viruses, so that's okay.
    .-----.
    | LCE |
    '-----'
    Limited Code Emulation is somewhat like the level of emulation system used
in generic decryption as you know it.  An LCE is not really an emulator at all,
as it does not really 'emulate' instructions, it simply tracks the contents of
registers through a section of code, and maybe maintains a small list of memory
locations which were modified... or interrupts that were called, etc.
    The reason the LCE is used by the AV rather than the bigger, more complex
systems, is because even just bare minimum support of a few instructions can
take you a long way in decrypting primitive encrypted viruses, because viruses
use only a tiny portion of the total INTEL instruction set to decrypt their
main bodies.  By using an LCE, much overhead which occurs due to having to
handle the whole INTEL instruction set is lost, ending up in colossal speed
increases, at the sacrifice of not being able to handle complex decryptors.
    LCE can become useful when quick file scanning is needed as a small yet
decent LCE can be used to quickly check files for suspicious behaviour, whereas
using an SCCE algorithm on each file would be unbelievably slow.  Then, maybe,
if things looked suspicious in a file, the LCE could then start up some SCCE
code to check the file more fully.
-----------------------------------------------------------------------------
Section 4:  Emulation system considerations
-----------------------------------------------------------------------------
    Some problems are common to all emulation systems, biggest of which is the
slow speed at which they execute code compared to code executed under a real
CPU.  All emulation systems have major overheads, in that for each instruction
needing emulation, it will take hundreds of real CPU instructions to process
and decode and finally carry out the operations required by the instruction.
    Secondly, emulation systems are -BIG-, and the bigger they are, the faster
they run, and the smaller they are, the slower they run!  It all really depends
on design structure.  It is possible to create a relatively fast emulation
system, however it would be large.  To create a small emulation system means
you need to compress some opcode information which means more overhead for
instruction decoding which means slower execution.
    For the AV, they can use as much space as they want, however they need to
be really fast.  This is okay.  Virus coders need small emulation systems,
however they must also be fast so the user doesn't notice a difference in
computer speed.  Hence, the virus coder is in between a rock and a hard place
and sacrifices must be made in the design of the emulation system.
    Thirdly, there is the problem of WHAT processor to emulate.  The more you
can emulate, the more stable you are, however the bigger you will be.  In the
case of the virus, this is very bad, as it must be able to emulate things very
reliably so the user's computer does not crash, and remain small so the user
does not notice disk space disappearing.  You must decide whether to take the
risk of crashing and save space, or to be bigger and have less risk.
 .----------------------------------------------------------------------------.
 |                                  Part 2                                    |
 |                                                                            |
 |                           INTEL Complete Emulation                         |
 '----------------------------------------------------------------------------'
-----------------------------------------------------------------------------
Section 1:  Introducing the COS method
-----------------------------------------------------------------------------
    COS (complex opcode storage method) has come to replace the role of the
CMT (complex mask table) in both code tracers and emulation systems.  The COS
method offers compact storage for opcode information from the XT to Pentium
(possibly even MMX), in a format quicker to access than allowable under the
CMT, while also giving the COS decoder more flexibility in determining what to
do with opcodes in certain situations.
    To illustrate those points, the three tables below summarize what features
each type of opcode storage method provides, as well as relative speed in
returning opcode information, and efficiency of each method to store opcode
information.  Of course, these tables are only very rough.
  SPEED
  .----------------------------.---------------.--------------.--------------.
  | Loops per opcode           |    CMT 1.0    |   CMT 2.0    |      COS     |
  -----------------------------|---------------|--------------|---------------
  | Minimum                    |        0      |       0      |       0      |
  | Maximum                    |       80      |      39      |      32      |
  | Average                    |       60      |      30      |       3      |
  '----------------------------'---------------'--------------'--------------'
  SIZE
  .----------------------------.---------------.--------------.--------------.
  | Instruction set handled    |    CMT 1.0    |   CMT 2.0    |      COS     |
  -----------------------------|---------------|--------------|---------------
  | XT                         |      1/2k     |     1/3k     |     2/5k     |
  | 286                        |      3/4k     |     N/A      |     3/5k     |
  | 386                        |      N/A      |     N/A      |     3/4k     |
  | Pentium                    |      N/A      |     N/A      |     4/5k     |
  | Pentium (MMX?)             |      N/A      |     N/A      |       1k     |
  '----------------------------'---------------'--------------'--------------'
    ** Note that COS can be shrunken to handle the less complicated
       instruction sets of lower processors, however to store more
       complex instruction sets leads to only negligible variations
       in COS size
  FEATURES
  .------------------------------------------------.---------.---------.-----.
  | Features                                       | CMT 1.0 | CMT 2.0 | COS |
  -------------------------------------------------|---------|---------|------
  | Opcode length determination                    |   xxx   |   xxx   | xxx |
  | Opcode validity determination                  |         |         | xxx |
  | Repeat descriptors for compact table storage   |   xxx   |         | xxx |
  | Dedicated routine handling                     |         |   xxx   | xxx |
  | Completely variable CPU opcode storage         |         |         | xxx |
  '------------------------------------------------'---------'---------'-----'
    ** Note that CMT 2.0 is less capable than CMT 1.0, however this
       was done intentionally to speed up the processing of opcode
       information as seen in the relative speed table above
    .---------------------------.
    | COS table entry structure |
    '---------------------------'
            .---------------------- extra type identifier flag
            |                          0 = invalid opcode
            |                          1 = repeat entry
            |
            -----.----------------- group access number
            |    |
            |    | .--------.------ repetition count - 1
           .'. .-'-'-. .----'----.
    7   6   5   4   3   2   1   0
   '--.--' '--.--' '.' '----.----'
      |       |     |       '------ immediate data length
      |       |     |                  000 = none
      |       |     |                  001 = byte sized always
      |       |     |                  010 = word sized always
      |       |     |                  011 = doubleword sized always
      |       |     |                  100 = farword sized always
      |       |     |                  101 = byte or word
      |       |     |                  110 = word or doubleword
      |       |     |                  111 = doubleword or farword
      |       |     |
      |       |     '-------------- procedure flag
      |       |                         0 = generic routine
      |       |                         1 = dedicated routine
      |       |
      |       '-------------------- restriction type
      |                                 00 = none
      |                                 01 = word/doubleword value
      |                                      built into instruction
      |                                 10 = mod/M only
      |                                 11 = mod/R only
      |
      '---------------------------- opcode identification
                                       00 = plain opcode
                                       01 = extra type flag
                                       10 = group entry
                                       11 = modr/m opcode
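    The bit layout above can be unpacked with a few shifts and masks.  The
following Python sketch is only an illustration of the diagram, not the ICE
decoder itself: the function name unpack_descriptor, the dictionary it
returns and the field names are all made up for the example.

```python
# Hypothetical names for the 2-bit opcode identification field (bits 7-6).
KINDS = {0b00: "plain", 0b01: "extra", 0b10: "group", 0b11: "modrm"}


def unpack_descriptor(d):
    """Split one COS descriptor byte into named fields."""
    kind = KINDS[(d >> 6) & 0b11]
    if kind == "extra":
        if d & 0x20:                       # bit 5 set: repeat entry
            # bits 4-0 hold "repetition count - 1"
            return {"kind": "repeat", "count": (d & 0x1F) + 1}
        return {"kind": "invalid"}         # bit 5 clear: invalid opcode
    if kind == "group":
        return {"kind": "group",
                "group": (d >> 3) & 0b111,   # group access number
                "imm": d & 0b111}            # immediate data length code
    return {"kind": kind,
            "restriction": (d >> 4) & 0b11,  # restriction type
            "dedicated": bool(d & 0x08),     # procedure flag
            "imm": d & 0b111}                # immediate data length code
```

    For example, unpack_descriptor(0b01100011) describes a repeat entry
covering 4 opcodes, and unpack_descriptor(0b10011001) a group entry using
group table 3 with a byte sized immediate.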
    .------------------------.
    | COS table entry layout |
    '------------------------'
            Table layout:       Size    Description
                               '----'  '-----------'
                       optional byte    repeat descriptor
                                byte    opcode descriptor
                       optional word    dedicated routine address
    .------------.
    | COS tables |
    '------------'
    In COS, there is no longer one big table of opcode information, opcodes are
divided up into 3 sets of tables... NORMAL, EXTENDED, and GROUP.  Each table is
set out in exactly the same way, and as such the decoder may utilize one loop
to do all instruction location processing... giving speed and size increases in
decoders.
    All opcodes begin using the NORMAL tables with a size of 1.  If the opcode
is prefixed by an 0FH, it is categorized as an EXTENDED opcode, and begins with
a size of 2.  As the decoder processes the opcode and locates its entry in its
respective table... the descriptor of that opcode may point to a GROUP table,
at which time GROUP processing comes into effect (described later).
    .-----------------.
    | COS descriptors |
    '-----------------'
    There are 4 types of descriptor (characterized by the top 2 bits of the
descriptor itself, XXxxxxxx)... PLAIN, MODRM, EXTENDED, and GROUP.  PLAIN and
MODRM types are related and split up into the same sets of sections, however,
GROUP and EXTENDED codes have their own layout.
    EXTENDED descriptors (01Xxxxxx, the 'extra type' of the diagram above, not
to be confused with the EXTENDED tables) come in 2 forms, repeat entry and
invalid entry (specified by bit 5 of the descriptor).  An invalid descriptor
means that the opcode referenced by this table entry is invalid, and should be
treated as such, the instruction has a length of 0, and the other fields of the
invalid opcode entry are unused.
    A repeat descriptor (011xxxxx) means that this entry covers opcodes whose
numbers are from that table entry number, to that table entry number + xxxxx,
the x's being a number specified in the descriptor itself.  If the current
opcode's number is a number in that range, then following the repeat descriptor
is another descriptor, which is used in the table entry decoding procedure.
    GROUP descriptors (10xxxyyy) tell the decoder that the table entry for this
instruction is contained within the group tables.  The yyy section of the group
descriptor specifies an immediate data length which is decoded and added to the
total instruction length after decoding of the proper group table entry, and
the set of group tables to use is indicated with the xxx portion of the
descriptor.
    PLAIN opcodes simply have restriction, procedural, and immediate decoding
applied, whereas MODRM opcodes are just like PLAIN opcodes however they go
through an extra process of MODRM decoding.
    Restriction decoding handles some restrictive forms of MODRM.  If these
restriction bits are set to 00, there is no restriction, nothing happens.  If
the fields are 01, then instruction length must be incremented by 2, or if an
address-size prefix was present before the opcode, 4.  The forms of 10 and 11
restriction types can be used by a decoder to ensure further validity of the
instruction being processed, however it is not necessary.  10 means that the
instruction (of the MODRM type) may only specify a memory operand, while 11
means the instruction (of the MODRM type) may only specify a register operand.
    Procedural decoding is just to decide whether an opcode needs a dedicated
handler (ie: it is special) or if it can be handled by the generic opcode
handler.  If the procedure bit is set in PLAIN or MODRM descriptors, then a
word follows the descriptor with the address of a routine to call to handle
that opcode, otherwise the generic handler is used.
    Immediate data decoding is used to give instructions proper lengths.  In
types of immediate data length with only one type (ie: byte), this length is
added to the total instruction length.  In double types (ie: word/double), the
instruction length is increased by 2, UNLESS an operand-size prefix is present
before the opcode, at which time the instruction length is increased by 4 (in
total).
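    The length contributions of restriction and immediate decoding, as
described above, can be sketched like this.  It is a rough Python
illustration only.  The function names are invented, a farword is assumed to
be 6 bytes (as the fword label in TASM is), and it is assumed that the
operand-size prefix picks the larger size for ALL of the double types, since
the text only spells out the word/doubleword case.

```python
# Immediate data length codes with a single fixed size (in bytes).
# ASSUMPTION: farword = 6 bytes.
FIXED_IMMEDIATE = {0b000: 0, 0b001: 1, 0b010: 2, 0b011: 4, 0b100: 6}

# Double types: (size without prefix, size with operand-size prefix).
# ASSUMPTION: the prefix selects the larger size for every double type.
PAIRED_IMMEDIATE = {0b101: (1, 2), 0b110: (2, 4), 0b111: (4, 6)}


def immediate_length(imm_code, operand_size_prefix=False):
    """Bytes of immediate data implied by the 3-bit immediate length code."""
    if imm_code in FIXED_IMMEDIATE:
        return FIXED_IMMEDIATE[imm_code]
    small, large = PAIRED_IMMEDIATE[imm_code]
    return large if operand_size_prefix else small


def restriction_length(restriction, address_size_prefix=False):
    """Extra bytes implied by the 2-bit restriction type.

    Only type 01 (value built into the instruction) adds length;
    types 10 and 11 are validity checks and add nothing.
    """
    if restriction == 0b01:
        return 4 if address_size_prefix else 2
    return 0


# MOV AX, imm16 (opcode B8h): 1 opcode byte + word-or-doubleword immediate
mov_ax_imm = 1 + immediate_length(0b110)
# MOV AL, [offset] (opcode A0h): 1 opcode byte + built-in word address
mov_al_mem = 1 + restriction_length(0b01)
```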
    MODRM decoding is complex... and comes in 2 forms.  One handler decodes the
basic XT MODRM format... however a second MODRM decoder handles MODRM opcodes
prefixed by address-size or operand-size prefixes, as these are taken to mean
the opcode is of the 32-bit MODRM type, and may also include an SIB
(scale-index-base byte) to be handled.  (On a real CPU it is the address-size
prefix, 67h, that selects the 32-bit MODRM form.)  The internal handling of
MODRM is of no real concern to you.
    .-----------------.
    | COS table usage |
    '-----------------'
    In each of the COS tables, an opcode's descriptor is determined by the value
of the opcode.  In the NORMAL and EXTENDED COS tables, the first table entry is
for opcode 00, and the next, for opcode 01, etc.  However, certain descriptors
may cover a range of opcode numbers, by using repeat descriptors.
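    Walking a table laid out this way can be sketched as follows.  This is a
hedged Python illustration, not the ICE decoder: the name find_descriptor is
invented, the handler address is assumed to be stored low byte first, and a
repeat descriptor is assumed never to be followed by another repeat
descriptor.  Each pass of the loop corresponds to one of the 'loops per
opcode' counted in the speed table earlier.

```python
def find_descriptor(table, opcode):
    """Return (descriptor, handler_address_or_None) for `opcode`.

    `table` is a bytes object of entries, each one being:
    optional repeat descriptor, opcode descriptor, optional address word.
    """
    pos = 0
    first = 0                       # opcode number covered by this entry
    while pos < len(table):
        count = 1
        d = table[pos]
        pos += 1
        if (d >> 6) == 0b01 and (d & 0x20):   # repeat descriptor
            count = (d & 0x1F) + 1            # field holds count - 1
            d = table[pos]                    # the real descriptor follows
            pos += 1
        handler = None
        # only PLAIN (00) and MODRM (11) descriptors carry a procedure flag
        if (d >> 6) in (0b00, 0b11) and (d & 0x08):
            handler = table[pos] | (table[pos + 1] << 8)
            pos += 2
        if first <= opcode < first + count:
            return d, handler
        first += count
    raise LookupError("opcode %02Xh is past the end of the table" % opcode)


# A made-up 6-opcode table, purely for demonstration:
demo_table = bytes([
    0x00,              # opcode 0: plain, generic routine, no immediate
    0x62, 0xC0,        # opcodes 1-3: repeat x3 of a modr/m descriptor
    0x09, 0x34, 0x12,  # opcode 4: plain, dedicated routine at 1234h, byte imm
    0x40,              # opcode 5: invalid
])
```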
    COS GROUP tables are set out differently however.  There are 8 separate
GROUP tables, each containing the equivalent (with the number of single and
repeat opcode descriptors) of 8 table entries.  Which of these group tables,
0-7, to use for each opcode, is indicated in the 3 xxx bits of the group
descriptor (the group access code).
    An opcode is referenced into one of these tables by taking bits 3, 4 and 5
of the second byte of the opcode (the reg field of the MODRM byte); that value
corresponds to a table entry in that group table, which is where the 'real'
table entry for that opcode is.  Group tables cannot contain group descriptors.
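    Picking the group table entry is a one line job.  A minimal Python
sketch, with an invented function name, using two well known F7h group
instructions as examples:

```python
def group_entry_index(second_byte):
    """Entry number (0-7) inside a group table: bits 5-3 of the second byte."""
    return (second_byte >> 3) & 0b111


not_ax = group_entry_index(0xD0)   # F7 D0 = NOT AX, entry 2 of its group
neg_ax = group_entry_index(0xD8)   # F7 D8 = NEG AX, entry 3 of its group
```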
    .---------------------.
    | Default COS decoder |
    '---------------------'
    The COS decoder in ICE utilizes the full COS definition, EXCEPT handling of
restrictive opcode types 10 and 11 (you can add this if you like, however it is
of no real consequence to emulation).  Also, the COS decoder in ICE supports an
extension to the COS standard, which allows the usage of index tables which (at
the cost of 40 bytes) increase the speed of COS decoding eighty-fold.  Quite a
nice trade-off, don't you think?
    .----------------------------------------------.
    | Default COS decoder structure (very roughly) |
    '----------------------------------------------'
                       .-------------------.
                       | BCE passes opcode |
                       |      over to      |
       .---------------|  the COS decoder  -----------------.
       |               '-------------------'                |
       |                                          .---------------------.
    .--'--.                                       | Decoder recognizes  |
    | BCE |                                       | opcode as belonging |
    '-----'                                       | to either normal or |
       |                         .-----------------   extended tables   |
       |                         |                '-------------------.-'
       |                         |                                    |
       |               .-------------------.      .---------------------.
       |               | Normal tables are |      | Extended tables are |
       |               |     loaded for    |      | loaded for scanning |
       |               |      scanning     |      '-------------------.-'
       |               '---------.---------'                          |
       |                         |                                    |
       |                         '------------------.-----------------'
       |                               .-------------------------.
       |     .------------------.      |  Index tables are used  |
       |     | Group tables are |      | to provide offset into  |
       |     |    loaded for    -------| main database tables to |
       |     |     scanning     |      |   begin the process of  |
       |     '------------------'      |  opcode recognition at  |
       |              |                '------------.------------'
       |              |                             |
       |              |                   .------------------.
       |              |                   |   Table entries  |
       |              |                   | sorted as either |
       |              |                   | repeat or single |
       |              |                   |     entries      |
       |              |                   '---------.--------'
       |              |                             |
       |              |                .-------------------------.
       |              '-----------------   Opcode recognized as  |
       |                               |   normal or group type  |
       |                               '------------.------------'
       |          .---------------------------------'
       |  .----------------------.             .--------------------.
       |  | Opcode determined as |   invalid   |   Size of opcode   |
       |  |   valid or invalid   --------------| and opcode handler |
       |  '-.--------------------'             |  address given to  |
       |    |                                  |  emulation system  ----.
       |    | valid                            '--------------------'   |
       |    |                                             |             |
       |  .-------------------------.                     |             |
       |  | MODR/M length of opcode |           .---------'----------.  |
       |  |  determined, immediate  |           | Last minute fixups |  |
       |  |    length determined    ------------|     take place     |  |
       |  '-------------------------'           '--------------------'  |
       '----------------------------------------------------------------'
-----------------------------------------------------------------------------
Section 2:  ICE dispatcher and internals
-----------------------------------------------------------------------------
    The ICE dispatcher is the control centre of the emulation system.  It is
charged with various jobs, from emulating single step mode, loading opcodes to
be emulated, calculating their length, preparing for generic opcode emulation
if necessary, and calling opcode handlers.
    Being the centre of control in the emulator, the dispatcher also contains
the code necessary to determine the address of interrupt entrypoints, although
we will leave that code out until later on in the document.
    .-----------.
    | Registers |
    '-----------'
    In an emulator, a portion of memory is allocated to store the 'emulated'
registers.  Since an emulator needs the real CPU registers for its own usage,
the emulated code neither uses nor affects the real CPU registers; it affects
their counterparts in the emulated registers structure.  Some people get away
with pushing/popping the entire CPU registers onto/off the stack as needed,
however in theory both concepts are the same and this is easier to do.
; STRUC for our simulated CPU registers
;
struc ice_register_struc
      label _eax dword
      label _ax word
      label _al byte
            db 0
      label _ah byte
            db 0
            dw 0
      label _ebx dword
      label _bx word
      label _bl byte
            db 0
      label _bh byte
            db 0
            dw 0
      label _ecx dword
      label _cx word
      label _cl byte
            db 0
      label _ch byte
            db 0
            dw 0
      label _edx dword
      label _dx word
      label _dl byte
            db 0
      label _dh byte
            db 0
            dw 0
      label _edi dword
      label _di word
            dw 0
            dw 0
      label _esi dword
      label _si word
            dw 0
            dw 0
      label _ebp dword
      label _bp word
            dw 0
            dw 0
      label _csip dword
      label _ip word
            dw 0
      _cs   dw 0
      label _ssesp fword
      label _esp dword
      label _sp word
            dd 0
      _ss   dw 0
      _es   dw 0
      _ds   dw 0
      label _eflags dword
      label _flags word
            dd 0
ends ice_register_struc
ice_reg ice_register_struc <>   ; our new 32-bit registers structure
    Notice how there is no EIP in our emulated registers structure.  This is
because in real mode... the top half of the EIP is always 0... and so we just
ignore the top EIP half, and use simple IP addressing.  It is easier this way
anyway.
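    The overlapping labels in the structure above (_eax, _ax, _al and _ah all
naming parts of the same 4 bytes) can be mimicked in a higher level language
with one block of memory and several views onto it.  This Python sketch is
only an illustration: the class RegisterFile and its get/set methods are
invented, and only EAX and EBX are modelled.

```python
import struct


class RegisterFile:
    """Emulated registers kept in one little-endian byte block."""

    # name: (byte offset, struct format), mirroring the label overlap
    LAYOUT = {
        "eax": (0, "<I"), "ax": (0, "<H"), "al": (0, "<B"), "ah": (1, "<B"),
        "ebx": (4, "<I"), "bx": (4, "<H"), "bl": (4, "<B"), "bh": (5, "<B"),
    }

    def __init__(self):
        self.raw = bytearray(8)

    def get(self, name):
        offset, fmt = self.LAYOUT[name]
        return struct.unpack_from(fmt, self.raw, offset)[0]

    def set(self, name, value):
        offset, fmt = self.LAYOUT[name]
        struct.pack_into(fmt, self.raw, offset, value)


regs = RegisterFile()
regs.set("eax", 0x12345678)
low_word = regs.get("ax")      # low 16 bits of EAX
regs.set("ah", 0xFF)           # writing AH changes EAX as well
```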
    .--------.
    | Stacks |
    '--------'
    Just as we need a set of registers for the simulated CPU, we also need to
keep our emulation stack separate from the stack which will be used by the
emulated code.  This is so we do not corrupt old stack information (which
anti-tunneling stack tests look for), and do not get any conflict between our
data on the stack and the data of the emulated code.  To handle this, we create
a small area of memory for our personal stack space (which we will call the
internal stack) and a variable to keep track of our internal stack pointer.
    During normal emulator execution, we are using the internal stacks by
default.  If we need to switch to external stacks (the stacks used by the
emulated code), we simply save our internal stack pointer and reload SS and
ESP with the data in the emulated registers structure.  To switch back, we save
the SS and ESP in the emulated registers structure and load SS with CS and ESP
with the address in our internal stack pointer.
    Switching between internal and external stacks is handled through simple
macros as it just makes some areas of the ICE code easier to understand, with
one descriptive word rather than a few lines of hard to read code.
; STRUC for our internal 32-bit stacks
;
struc ice_stack_struc
      internal_esp   dd 0
      switch dw 0
          label bottom
      dw 50h dup(0)
          label top
ends ice_stack_struc
ice_internal_stack ice_stack_struc <>
; MACROs, used for internal/external stack switching
;
macro ice_switch_to_internal_stack
        mov [cs:ice_reg._ss], ss
        mov [cs:ice_reg._esp], esp  ; save external stack address
        mov [cs:ice_internal_stack.switch], cs
        mov ss, [cs:ice_internal_stack.switch]
        mov esp, [cs:ice_internal_stack.internal_esp]
                ; set stack to internal stack address
        endm
macro ice_switch_to_external_stack
        mov [cs:ice_internal_stack.internal_esp], esp
                                    ; save internal stack offset
        mov ss, [cs:ice_reg._ss]
        mov esp, [cs:ice_reg._esp]  ; set stack to external stack address
        endm
    But this is not the end of stack discussion.  We will constantly be needing
quick access to the parameters on the external stacks... for pushing and
popping.  On a 386, you can push/pop word or doubleword values, and as such we
create 4 routines to handle all the possible stack access we could need.
; 16-bit external stack push from AX
;
proc ice_external_push_16 near
        push es
        push edi
        les edi, [ds:ice_reg._ssesp]
        dec di
        dec di
        mov [es:di], ax
        mov [ds:ice_reg._sp], di
        pop edi
        pop es
        ret
endp ice_external_push_16
; 16-bit external stack pop into AX
;
proc ice_external_pop_16 near
        cld
        push ds
        push esi
        lds esi, [ds:ice_reg._ssesp]
        lodsw
        mov [cs:ice_reg._sp], si
        pop esi
        pop ds
        ret
endp ice_external_pop_16
; 32-bit external stack push from EAX
;
proc ice_external_push_32 near
        push es
        push edi
        les edi, [ds:ice_reg._ssesp]
        sub edi, 4
        mov [es:edi], eax
        mov [ds:ice_reg._esp], edi
        pop edi
        pop es
        ret
endp ice_external_push_32
; 32-bit external stack pop into EAX
;
proc ice_external_pop_32 near
        cld
        push ds
        push esi
        lds esi, [ds:ice_reg._ssesp]
        lodsd
        mov [cs:ice_reg._esp], esi
        pop esi
        pop ds
        ret
endp ice_external_pop_32
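    The four routines above all do the same job: move a value between a
register and the memory at the emulated SS:SP, then write the new stack pointer
back into the emulated registers structure.  A rough Python model follows (not
ICE code; the Regs class, the mem array and the function names are assumptions
for illustration, and it wraps SP at 64K for both sizes, which is a
simplification of the 32-bit routines):

```python
class Regs:
    def __init__(self, ss, sp):
        self.ss, self.sp = ss, sp

def linear(seg, off):
    # real-mode address: segment * 16 + 16-bit offset
    return (seg << 4) + (off & 0xFFFF)

def external_push(mem, regs, value, size=2):
    regs.sp = (regs.sp - size) & 0xFFFF          # move the pointer first
    addr = linear(regs.ss, regs.sp)
    value &= (1 << (8 * size)) - 1
    mem[addr:addr + size] = value.to_bytes(size, "little")

def external_pop(mem, regs, size=2):
    addr = linear(regs.ss, regs.sp)
    value = int.from_bytes(mem[addr:addr + size], "little")
    regs.sp = (regs.sp + size) & 0xFFFF          # then release the space
    return value

mem = bytearray(0x110000)        # model of real-mode memory
regs = Regs(ss=0x2000, sp=0x0100)
external_push(mem, regs, 0x1234)              # like ice_external_push_16
external_push(mem, regs, 0xDEADBEEF, size=4)  # like ice_external_push_32
```

    Note that only the emulated SS:SP is touched.  The emulator's own stack
never sees the data, which is the whole point of the internal/external split.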
    .-------------------------.
    | Finally, the dispatcher |
    '-------------------------'
    The first thing needing attention inside the dispatcher is the simulation
of single step mode.  Now that may sound a little weird... but you must realize
that we are trying to simulate a proper CPU here... we cannot allow REAL single
step mode to be run because that would give us away as an emulator!  This was a
big problem in ART: it did not handle single step mode, and as such anything
which used it would find itself single stepping through ART rather than its
own code.
    So, to begin with, we check the emulated flags register to see if the TF is
set (and therefore single step mode is on).  If it is set, we branch to a piece
of code to emulate an INT 1.  This INT 1 is only emulated inside the code being
emulated... we do not actually go into INT 1 ourselves.  Our INT emulation code
simply clears the emulated flags' TF and IF, pushes the emulated flags, CS, and
IP, onto the external stack, and sets the emulated CS and IP to point to the
INT 1 address.  This is an exact emulation of single step mode being done by
the CPU.
proc ice_tf_handler near
        xor ax, ax
        mov [ds:ice_opcode_length], ax
        mov ah, 1
        call ice_int_x
        jmp ice_tf_handled
endp ice_tf_handler
proc ice_dispatch near
        test [byte high word ds:ice_reg._flags], 1
        jnz ice_tf_handler          ; check for TF in emulated flags
ice_tf_handled:
    The ICE_INT_X procedure takes the interrupt to be emulated in AH, and the
number to add to the emulated IP register in the ICE_OPCODE_LENGTH variable.
This is because when handling a normal interrupt, the return IP will be 2 bytes
AFTER the INT instruction.  However, since we are emulating single step mode,
we need the IP to point back to the original instruction, so we set the
variable to 0.  You'll see how that works later on.
    Next, we save the value of the emulated IP register in another variable
before we begin processing of the opcode.  This processing will require removal
of segment override prefixes.  However, later on, we may need the beginning of
the FULL instruction rather than of just the raw opcode.
        mov ax, [ds:ice_reg._ip]
        mov [ds:ice_original_ip], ax; save address of _IP before prefix removal
                                    ; begins
    Now that we have that all sorted out, we need to begin the gruelling task
of override removal.  What is required, is the removal and storage of any
opcode overrides found before the instruction we are needing to emulate.  For
our purposes, the opcode overrides we need to handle are the segment override
prefixes, the repeat override prefixes, the LOCK prefix (ignored), and address
size and operand size 386+ prefixes.
    To achieve this siphoning, we first clear our 4 separate opcode storage
variables to 0 (they are all a byte long, as there can only be one valid prefix
of each type MAXIMUM... and this is the last prefix (ie: REP REPNE MOVSB, REP
is ignored)), and then check for each type of override in sequence.  If any are
found, the override is stored and the complete override recognition process
begins again (so we can trap things like REP CS: REPNE), except without the
variable clearing.
        xor eax, eax
        mov [ds:ice_overrides], eax ; clear prefix variables
                                    ; (they are 4 one byters, stored in a row,
                                    ; so we use one doubleword move to clear)
ice_segment_removal:
        les di, [ds:ice_reg._csip]  ; ES:DI=instruction to emulate
ice_breakpoint:
        mov ax, [es:di]             ; get opcode
        mov bx, ax
        and al, 011100111b
        cmp al, 000100110b
        mov al, bl
        je ice_segment_removal_process
        and al, 0feh
        cmp al, 064h
        je ice_segment_removal_process
        cmp al, 0f2h
        je ice_repeat_removal_process
        mov al, bl
        cmp al, 66h
        je ice_operand_removal_process
        cmp al, 67h
        je ice_address_removal_process
        cmp al, 0f0h
        je ice_removal_jump
ice_decode_begin:
                        ...
ice_address_removal_process:
        mov [ds:ice_address_override], al
        jmp ice_removal_jump
ice_operand_removal_process:
        mov [ds:ice_operand_override], al
        jmp ice_removal_jump        ; repeat override removal process
ice_repeat_removal_process:
        mov [ds:ice_repeat_override], bl
        jmp ice_removal_jump        ; repeat override removal process
ice_segment_removal_process:
        mov [ds:ice_segment_override], bl
ice_removal_jump:
        inc [ds:ice_reg._ip]        ; increment IP
        jmp ice_segment_removal     ; repeat override removal process
    The reason the code has been set out so strangely, rather than having the
jumps inline, etc, is because a conditional jump not taken is faster than a
conditional jump taken.  Since overrides aren't really THAT common, it's faster
to have the jumps only occur, and go slowly, if overrides are found. Speed is
very important in the dispatcher as it is used the most often (equal with the
COS decoder).
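    If the mask-and-compare tricks in the removal loop are hard to follow, here
is the same logic as a Python sketch.  It is a model only (function and
dictionary names are invented here, and IP wrap-around is ignored): each prefix
type keeps the LAST byte seen, LOCK is skipped without being stored, and the
loop ends at the first byte which is not a prefix.

```python
def is_old_segment_prefix(b):
    # (b AND 0E7h) == 26h matches ES:, CS:, SS: and DS: in one test
    return (b & 0xE7) == 0x26

def strip_prefixes(code, ip):
    """Return (overrides, ip of the raw opcode)."""
    found = {"segment": 0, "repeat": 0, "operand": 0, "address": 0}
    while True:
        b = code[ip]
        if is_old_segment_prefix(b) or (b & 0xFE) == 0x64:   # FS:, GS:
            found["segment"] = b
        elif (b & 0xFE) == 0xF2:                             # REPNE, REP
            found["repeat"] = b
        elif b == 0x66:
            found["operand"] = b
        elif b == 0x67:
            found["address"] = b
        elif b == 0xF0:                                      # LOCK: ignored
            pass
        else:
            return found, ip
        ip += 1

# REP CS: REPNE MOVSB, the example from the text
overrides, opcode_ip = strip_prefixes(bytes([0xF3, 0x2E, 0xF2, 0xA4]), 0)
```

    Here the first REP is overwritten by the later REPNE, and the returned
position points at the MOVSB byte itself.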
    Now that we have a pure opcode, we simply call the COS decoder with that
opcode!  However, the COS decoder needs special registers set up and returns
opcode information in a certain way as detailed below.
; registers modified : AX, BX, CX, DX, SI, BP
; registers untouched: DI, SP, ES, DS, SS
; Requires:           AX holds opcode to scan through table
;                     segment of COS tables in DS
;                     ES:DI points to raw opcode
;                     DF clear (direction flag)
; Returns:            CX                = instruction length
;                     ice_opcode_length = instruction length
;                     ice_handler       = opcode handler address
;
    As you can see, the only value we return is the length of the instruction
in CX... but we also save a copy of the instruction length, and the address of
the procedure to call to handle the opcode.  Both of these are saved in this
way for reasons you will see later.
    Armed with this information, you may think we are ready to emulate the
instruction.  However, opcodes need to be loaded into the second part of a
special buffer, which is 4 bytes long, followed by 16 more bytes.  We must
first clear both parts of the buffer with NOPs.  Then, using the length in CX,
we REP MOVSB the code from the emulated CS:IP to our second buffer (the one
which is 16 bytes long).
ice_decode_begin:
        cld
        push bx             ; save original opcode
        mov [ds:ice_current_opcode], ax
        call ice_decoder    ; scan opcode through COS decoder
        push cx             ; save length to copy
        lds si, [ds:ice_reg._csip]
        push cs
        pop es
        mov di, (offset ice_override_buffer)
        mov cx, 5
        mov eax, 90909090h
        rep stosd           ; clear execution buffer with NOP instructions
        pop cx              ; restore length of instruction to copy
        mov di, (offset ice_opcode_buffer)
        rep movsb           ; copy instruction to be emulated into execution
                            ; buffer
    What has just been done, is the opcode to be emulated (minus overrides) has
been copied into a buffer, the remainder of which is filled with NOPs, and
prefixed by a 4 byte NOP buffer.  These 2 buffers are used by the generic
opcode handler.  Even if an opcode uses a special handler, sometimes those
special handlers access information in these buffers, or even call the generic
opcode handler outright.  This is why we -ALWAYS- load the opcode up into the
buffers.
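    The buffer layout just described can be sketched in a few lines of Python.
This is a model under stated assumptions, not the ICE source: the function name
is hypothetical, and the 4 + 16 byte sizes are taken from the text above.

```python
NOP = 0x90
OVERRIDE_LEN = 4     # room for the prefixes, filled in later
OPCODE_LEN = 16      # room for the instruction itself

def load_execution_buffer(instruction):
    """Return the 20-byte buffer: NOPs everywhere, opcode at offset 4."""
    if len(instruction) > OPCODE_LEN:
        raise ValueError("instruction too long for the opcode buffer")
    buf = bytearray([NOP]) * (OVERRIDE_LEN + OPCODE_LEN)
    buf[OVERRIDE_LEN:OVERRIDE_LEN + len(instruction)] = instruction
    return buf

buf = load_execution_buffer(bytes([0x89, 0xD8]))    # MOV AX, BX
```

    Everything which is not part of the instruction stays as NOP, so the whole
buffer can be executed from its first byte to its last without harm.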
    Now that we have things ready for the generic opcode handler, we must set
up some registers for the special opcode handlers.  DS must equal CS, ES:DI
must point to the raw opcode of the instruction being emulated, AX must hold
the actual opcode being emulated, the variable ICE_COMMUNICATION must be
cleared, and DL must hold the value of the ICE_OPERAND_OVERRIDE variable (which
means DL=0 if no 386 operand size override is present, otherwise it will be
nonzero).  Then we place a call to the address stored by the COS decoder.
ice_copy_complete:
        push cs
        pop ds
        pop ax                          ; original opcode, saved earlier
        les di, [ds:ice_reg._csip]
        mov dl, [ds:ice_operand_override]
        mov [ds:ice_communication], 0   ; clear communication area
        ; On entry to opcode handlers
        ;   AX    = opcode of instruction
        ;   ES:DI = instruction address
        ;   DS    = CS
        ;   DL    = ice operand override
        call [ds:ice_handler]           ; call opcode handler
    On return from the opcode handler, we can now increment the emulated IP
register with the length of the instruction which was saved by the COS decoder
earlier on.  However, some instructions emulated such as INT and JMP don't need
any instruction length to be added to the IP once they have finished handling
the instruction themselves.  In these cases, the special procedure handling the
opcode sets the instruction length to 0 before returning to the dispatcher.
        mov ax, [ds:ice_opcode_length]
        add [ds:ice_reg._ip], ax        ; increment IP by instruction length
    And now, before we loop back to the start of the dispatcher, we do another
small check for single step mode handling.  If the ICE_COMMUNICATION variable
has changed, this means we must skip one pass of the TF checking code.  It will
change after things like an IRET or POPF where the TF turns from clear to set
(in which case, in the emulation of single step mode, on return from the INT 1,
the emulator has time to emulate the instruction before the next INT 1 is
emulated), or when the SS register gets changed (the CPU always skips single
step mode for one pass after SS is changed so one can modify the SP too).
        cmp [ds:ice_communication], 1
        jb ice_dispatch         ; default restart condition, clear old prefixes
                                ; and do TF check
        jmp ice_tf_handled      ; special POPF/IRET condition, skip checking
endp ice_dispatch
    And this ends our dispatcher.  To see how all the code pieces fit together,
look in Part 3 where the complete ICE source is.  Note how the spaghetti code
is actually an optimization to avoid taking conditional jumps wherever
possible.  You may also want to check out how the COS decoder works...
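    To see the effect of the ICE_COMMUNICATION flag on the dispatcher loop,
here is a toy trace generator in Python.  It is only a model (nothing here is
ICE code, and it assumes the TF stays set for the whole run): each instruction
says whether it asks for the next TF check to be skipped.

```python
def dispatch_trace(instructions, tf_set):
    """instructions: list of (name, skips_next_tf_check) pairs."""
    trace = []
    skip_tf_check = False            # plays the role of ICE_COMMUNICATION
    for name, skips_next in instructions:
        if tf_set and not skip_tf_check:
            trace.append("INT 1")    # emulated single step trap
        trace.append(name)
        skip_tf_check = skips_next   # set by the opcode handler
    return trace

program = [("mov ss, ax", True), ("mov sp, bx", False), ("nop", False)]
trace = dispatch_trace(program, tf_set=True)
```

    In this run no INT 1 lands between the SS load and the SP load, so the
emulated code never gets trapped with a half-changed stack.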
-----------------------------------------------------------------------------
Section 3:  ICE generic opcode handler
-----------------------------------------------------------------------------
    Now for the good stuff... the very thing which separates SCCE from the BCE,
the generic opcode handler.  Any opcodes which cannot run under the generic
opcode handler are specified as 'special' instructions... and they are then
handled separately.  Special instructions usually modify the CS or IP, and this
cannot be done in the generic opcode handler so those instructions are special.
    Okay, now, to understand how a generic opcode emulator works, it helps to
understand an overview of what we have to do.  To put it in the simplest terms
possible... we simply load the real CPU registers with the registers from the
emulated registers structure... execute the copy of the instruction we have
saved in our internal buffers while switched to the external stack... then save
all the CPU registers back into the emulated registers structure, and switch
back to internal stacks.  That's the simple overview, now for the detail.
    Okay, first, we load up the CPU registers with the registers from the
emulated registers structure... except for CS, IP, SS, ESP.  SS and ESP are
loaded using the special stack switching macro, so that we don't corrupt our
internal stack pointer.  We also must load up the eflags register, but to do
so, we need to save a temporary copy of the flags, and then mask off the TF bit
in the original copy of the flags, before loading them into the real CPU flags
register.  The reason we do this is so that single-stepping won't take over
control in our emulation routine, as we are already emulating it separately.
    Later on when we have to save the flags back into the emulated registers
structure, we will have lost track of whether the TF is set or clear.  This is
where the saved copy of the flags comes in handy, as we simply OR the saved
TF against the TF of the flags in the emulated registers structure, and we have
the proper flags back!  All of the instructions which can check/modify the TF
use special opcode handlers, so our TF will never change in the generic opcode
handler, and our saved TF will always be valid.
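    The flag juggling boils down to two small bit operations.  A Python sketch
(function names are made up for illustration; TF is bit 8 of the flags):

```python
TF = 0x0100     # trap flag, bit 8

def flags_for_real_cpu(emulated_flags):
    """Split off the TF before the flags go into the real CPU."""
    saved_tf = emulated_flags & TF
    return emulated_flags & ~TF, saved_tf

def flags_after_execution(cpu_flags, saved_tf):
    """OR the saved TF back into whatever the instruction left behind."""
    return (cpu_flags & ~TF) | saved_tf

to_load, saved = flags_for_real_cpu(0x0302)       # TF and IF set
restored = flags_after_execution(0x0246, saved)   # instruction changed flags
```

    The real CPU never sees the TF set, yet the emulated flags come out of the
round trip with it intact.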
    With the flags handled (sort of), we must now copy the original overrides
from their variables to the 4 bytes of NOP prefixing the 16 byte buffer which
our instruction to be emulated was copied into earlier.  We must be careful
however, when we put the overrides in place, that there is no NOP space
between the overrides or between the overrides and the beginning of the
instruction being emulated.
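    One way to picture this packing, as a Python sketch: the overrides which
are present get pushed up against the END of the 4 byte area, so any leftover
NOPs sit in front of them.  The function name and the order of the prefixes are
assumptions made for illustration, not taken from the ICE source.

```python
NOP = 0x90

def pack_overrides(overrides):
    """overrides: the 4 stored prefix bytes, 0 meaning 'not present'."""
    present = [b for b in overrides if b]
    return bytes([NOP] * (4 - len(present)) + present)

packed = pack_overrides([0x2E, 0x00, 0x66, 0x00])   # CS: and operand size
```

    A NOP between a prefix and its opcode would make the prefix apply to the
NOP instead, which is why the gap must be at the front.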
    Once this has been done, we can load up all the CPU registers from the
emulated registers structure and switch to external stacks.  Right after the
stack switch code, the 2 buffers (override and instruction) are sitting
there... and the CS:IP runs right into them.  But they don't contain data: due
to all our fixing them up, they contain a proper opcode and prefixes, and so
are executable code (which is why we filled redundant space with NOP rather
than 0).  If you don't understand that, you will see how it works later.
    Once the instruction has executed, we switch to internal stacks and save
all the CPU registers (including the flags) back into the emulated registers
structure.  We then touch up the saved copy of the flags, and return to the
dispatcher.  The opcode has been successfully 'emulated'.
    .----------------------.
    | Infamous CS problems |
    '----------------------'
    With all that done, there are some slight problems with generic opcode
handlers which must be fixed to provide proper generic emulation.  Basically,
when a CS override is encountered... when the instruction is 'run' in our
protected environment (emulated), it will be referencing OUR CS rather than the
proper CS.  To fix this, in the beginning of our handler, we check to see if a
CS: override is present, and if so, we change it to a DS:.  Then, when loading
up the CPU registers from the emulated registers structure, we set DS to the
CS: of the code we are emulating.  Later, on storage of the CPU registers to
the emulated registers structure, we don't save the DS back (as it has been
changed by us), and switch the saved DS: override back to CS:.
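    The basic CS: fix can be summed up as one decision, shown here as a Python
sketch (names are hypothetical, and this covers only the plain case, not the
ES: variant used for the special instructions):

```python
CS_PREFIX = 0x2E
DS_PREFIX = 0x3E

def prepare_segment_override(segment_override, emu_cs, emu_ds):
    """Return (prefix to execute with, value for the real DS, save DS back?)."""
    if segment_override == CS_PREFIX:
        # run the instruction with DS: and point DS at the emulated CS
        return DS_PREFIX, emu_cs, False
    return segment_override, emu_ds, True

swapped = prepare_segment_override(0x2E, emu_cs=0x1000, emu_ds=0x2000)
plain = prepare_segment_override(0x26, emu_cs=0x1000, emu_ds=0x2000)
```

    The third value is the important one: after a swap, the real DS holds the
emulated CS and must not be written back as if it were the emulated DS.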
    This itself presents a problem however, in instructions such as
        LDS AX, [CS:100]    and
        MOV DS, [CS:100]    and
        MOV [CS:100], DS    and
        MOV [DS:100], CS    and
        MOV [CS:100], CS
    In the first 2 cases, DS must be saved back to the emulated registers
structure as it is changed by the emulated instructions as well.  In the 3rd
case however, DS itself can't be used because it would be stored to memory in
incorrect form (holding the emulated CS instead of the proper DS value).  The
4th and 5th cases just won't work at all.
    To handle this problem, all LDS instructions, and instructions which
involve segment registers with CS: overrides, are re-routed through the COS
decoder to special handlers.  Then, for the 1st, 2nd, and 3rd cases, a special
portion of the generic opcode handler is called, which instead of swapping the
CS: override with a DS: override and loading DS... swaps CS: with ES: and loads
up ES.  Then, in these special cases, the opcodes will decode properly (DS will
be saved back into the emulated registers structure, and ES will not be).
    For the 4th and 5th cases however, the answer is more complex and not to do
with overrides... the special handlers for those opcodes will be covered in a
later section of the document.
    Note that I have not given you any code for all this in here, as it would
just be repeating everything in Part 3 of the document where the complete
generic opcode handler source can be found (the procedure is called
ice_generic).
-----------------------------------------------------------------------------
Section 4:  ICE special opcode handlers (basic)
-----------------------------------------------------------------------------
    We'll start with the most basic opcode handlers... just to give you an idea
of what is needed in special opcode handlers.  Later, in the next section, we
will cover the more advanced handlers.
    .-----.
    | AAM |
    '-----'
    Some emulation systems do not handle the undocumented variant of AAM, which
takes an arbitrary divisor byte and so can cause a divide-by-0 exception.  AAM
usually has an opcode of D40A, and when in the form of D400, will always issue
INT 0.  We emulate this in our special opcode handler, unless the AAM is
'normal' in which case we pass it through the generic opcode handler.
proc ice_aam near
        or ah, ah
        jz ice_div_exception        ; emulate a DIV exception
        jmp ice_generic             ; emulate AAM generically
endp ice_aam
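    For reference, what AAM does with its second byte can be modelled in a few
lines of Python.  This is a sketch only: the DivideError exception stands in
for the emulated INT 0, and the names are invented here.

```python
class DivideError(Exception):
    """Stands in for the emulated INT 0."""

def aam(al, divisor=10):
    """Return (AH, AL) after AAM with the given immediate byte."""
    if divisor == 0:
        raise DivideError("AAM with an immediate of 0")
    return al // divisor, al % divisor

normal = aam(53)             # D4 0A: plain AAM
variant = aam(0xFF, 16)      # D4 10: undocumented base-16 form
```

    Only the divisor of 0 needs the special handler.  Every other value is a
perfectly harmless instruction which the real CPU can run for us.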
    .------------.
    | POP segreg |
    '------------'
    The only POP segreg instruction we must handle is POP SS, in which case we
must skip single step handler checking on the next instruction pass.  We do
this by setting the ICE_COMMUNICATION variable to 1, and then continuing on
with the generic opcode handler handling the POP segreg instruction itself.
proc ice_pop_segreg near
        cmp al, 17h
        jne ice_pop_segreg_exit     ; is it POP SS?
        inc [ds:ice_communication]  ; if so, skip single step handler on return
ice_pop_segreg_exit:
        jmp ice_generic             ; use generic handler for opcode anyway
endp ice_pop_segreg
    .-------------.
    | PUSH segreg |
    '-------------'
    Just as before... there is only one PUSH segreg instruction we must handle,
and that is PUSH CS (we do not need to handle POP CS because it is handled by
the COS decoder as an extended instruction prefix).  With PUSH CS, there are 2
variants we must handle, the 16-bit version, where we simply use our external
stack push procedure to push the emulated CS value onto the external stack...
but also the 32-bit version, where we must use methods to determine the unknown
top half of the CS register and push it, combined with the emulated CS, onto
the external stack (in double-word form).
proc ice_push_segreg near
        cmp al, 0eh
        jne ice_generic             ; not PUSH CS?  exit!
        db 66h
        push cs
        pop eax
        mov ax, [ds:ice_reg._cs]    ; determine the complete emulated CS
        or dl, dl
        jnz ice_push_segreg_32      ; go to 32-bit version if operand size
                                    ; prefix is present
        call ice_external_push_16   ; push 16-bit emulated CS
        ret
ice_push_segreg_32:
        call ice_external_push_32   ; push 32-bit emulated CS
        ret
endp ice_push_segreg
    .------------.
    | MOV segreg |
    '------------'
    There are two forms of this we must handle... both MOV with segreg as a
source, and MOV with segreg as a destination.  Of these, we must handle all
references to CS, references to DS, and references to SS.
    For MOV SS, where SS is the destination, we must set ICE_COMMUNICATION to
skip the next single step check... to emulate the CPU.  This is so an INT 1 is
not called while the emulated SP is possibly incorrect as SS was just changed,
in which case things would get corrupted.
    For MOV DS, of either form, we must call the ICE_GENERIC_PROCESS_ES label
to initiate generic opcode handling for these instructions.  This will fix
problems with the instructions which use DS while a CS override is present. We
do not need to check for the CS override, because this is done in the generic
opcode handler.
    For "MOV CS, ?", we must emulate an invalid opcode exception, as this is an
invalid opcode :)  For the alternate "MOV ?, CS" instruction however, things
become more complex.
    First, if we find a "MOV AX, CS" or "MOV EAX, CS" instruction, we just
emulate this straight out by calculating the emulated CS and overwriting eAX in
the emulated registers structure with this value.
    If it is not of this form however, we convert the instruction to a "MOV ?,
eAX" instruction in the generic opcode handler execution buffer.  Then, we save
the emulated eAX register on the stack, and replace the copy in the emulated
registers structure with the calculated emulated CS value.  Then we -CALL- the
generic opcode handler, and on return, we return eAX to its original value.
    The reason we handle the "MOV eAX, CS" instructions separately, is because
if we didn't, then when we convert the instruction it will become "MOV eAX,
eAX", and then on return from the generic opcode handler, the eAX will be
replaced with its original value... and the saved CS value we just moved into
it would be lost.
proc ice_mov_segreg_source near
        and ah, 111000b
        cmp ah, 1000b
        je ice_mov_regmem_cs        ; MOV ?, CS
endp ice_mov_segreg_source
proc ice_mov_segreg_destination near
        and ah, 111000b
        cmp ah, 1000b
        je ice_invalid_opcode       ; MOV CS, ?
        cmp ah, 11000b
        je ice_generic_process_es   ; MOV DS instructions
        cmp ax, 1000010001110b
        jne ice_mov_segreg_exit
        inc [ds:ice_communication]  ; MOV SS, ?
ice_mov_segreg_exit:
        jmp ice_generic             ; handle the rest generically
endp ice_mov_segreg_destination
proc ice_mov_regmem_cs near
        cmp [byte high word ds:ice_current_opcode], 11001000b
        je ice_mov_ax_cs
        mov [byte ds:ice_opcode_buffer], 89h
        and [byte ds:ice_opcode_buffer+1], 11000111b
        push [ds:ice_reg._eax]      ; save _EAX
        xor eax, eax
        mov ax, [ds:ice_reg._cs]
        mov [ds:ice_reg._eax], eax  ; _EAX = _CS
        call ice_generic            ; emulate it
        pop [ds:ice_reg._eax]       ; restore _EAX
        ret                         ; exit
endp ice_mov_regmem_cs
proc ice_mov_ax_cs near
        xor eax, eax
        mov ax, [ds:ice_reg._cs]
        cmp [ds:ice_operand_override], 0
        jne ice_mov_ax_cs_32
        mov [ds:ice_reg._ax], ax    ; _AX = _CS
        ret
endp ice_mov_ax_cs
proc ice_mov_ax_cs_32 near
        mov [ds:ice_reg._eax], eax  ; _EAX = 0000 shl 16 + _CS
        ret
endp ice_mov_ax_cs_32
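    The conversion of "MOV ?, CS" into "MOV ?, eAX" is just two byte patches,
which the following Python sketch mirrors.  It is a model only (the function
name is made up): opcode 8C with reg field 1 becomes opcode 89 with reg field
0, while the mod and r/m bits are left alone.

```python
def rewrite_mov_from_cs(opcode, modrm):
    """8C /1 (MOV r/m, CS)  ->  89 /0 (MOV r/m, eAX)."""
    if opcode != 0x8C or (modrm >> 3) & 7 != 1:
        raise ValueError("not a MOV r/m, CS instruction")
    if modrm == 0b11001000:
        raise ValueError("MOV eAX, CS is handled on its own")
    return 0x89, modrm & 0b11000111     # clear the reg field: reg = eAX

patched = rewrite_mov_from_cs(0x8C, 0x0E)    # MOV [disp16], CS
```

    Because the r/m bits survive untouched, the patched instruction writes to
exactly the same destination as the original would have.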
    .------------.
    | PUSHF/POPF |
    '------------'
    PUSHF and POPF are relatively easy to handle... we simply use our external
stack access procedures to move the flags to/from the stack... in their 16-bit
versions by default or, in the case of an operand size prefix override, in
32-bit form.
    However, in the POPF instruction, we must check for a change in the state
of the trap flag... if it changes from clear to set (0 to 1), then we set the
ICE_COMMUNICATION variable to 1 to skip the TF checking code for one pass... as
this is what the CPU does.
proc ice_pushf near
        mov eax, [ds:ice_reg._eflags]   ; get the flags
        or dl, dl
        jnz ice_pushfd
        call ice_external_push_16   ; push them onto external stack (word)
        ret
proc ice_pushfd near
        call ice_external_push_32   ; push them onto external stack (double)
        ret
endp ice_pushfd
endp ice_pushf
proc ice_popf near
        mov bx, [ds:ice_reg._flags] ; get a copy of the flags
        or dl, dl
        jnz ice_popfd
        call ice_external_pop_16    ; get the new copy of the flags
        mov [ds:ice_reg._flags], ax ; save them into the emulated flags
        jmp ice_popf_single_step
proc ice_popfd near
        call ice_external_pop_32        ; get the new copy of the flags
        mov [ds:ice_reg._eflags], eax   ; save them into the emulated flags
ice_popf_single_step:
        and bh, 1
        jnz ice_popf_exit           ; exit if TF was originally SET
        and ah, 1
        jz ice_popf_exit            ; exit if TF is still CLEAR
        inc [ds:ice_communication]  ; TF transition from OFF-ON, skip TF check
                                    ; for one instruction pass
ice_popf_exit:
        ret                         ; POPF emulation finished
endp ice_popfd
endp ice_popf
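    The two tests at the end of the POPF handler amount to one condition, shown
here as a Python sketch (the function name is hypothetical):

```python
TF = 0x0100     # trap flag, bit 8

def popf_skips_next_tf_check(old_flags, new_flags):
    """True only when the TF goes from clear to set."""
    return not (old_flags & TF) and bool(new_flags & TF)

turned_on = popf_skips_next_tf_check(0x0002, 0x0102)
stayed_on = popf_skips_next_tf_check(0x0102, 0x0102)
```

    Only the clear-to-set transition asks the dispatcher to skip a pass.  If
the TF was already set, single stepping simply carries on as before.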
    .-------------.
    | LOOP??/JCXZ |
    '-------------'
    LOOP and JCXZ instructions are easy to handle... all that really needs to
be noted is that, instead of calculating 8-bit IP offsets in the case that
short jumps follow through... we use the code of the short conditional jump
procedure.  That procedure will be discussed in the advanced handler section.
proc ice_loop near      ; DEC CX, JNZ X
        or dl, dl
        jnz ice_loop_ecx
        dec [ds:ice_reg._cx]
        jnz ice_jmp_conditional_short_follow
        ret
ice_loop_ecx:           ; DEC ECX, JNZ X
        dec [ds:ice_reg._ecx]
        jnz ice_jmp_conditional_short_follow
        ret
endp ice_loop
proc ice_loope near
        test [byte low word ds:ice_reg._flags], 1000000b
        jnz ice_loop    ; use normal LOOP procedure if ZF set
        jmp ice_loop_dec    ; decrement eCX anyway
endp ice_loope
proc ice_loopne near
        test [byte low word ds:ice_reg._flags], 1000000b
        jz ice_loop     ; use normal LOOP procedure if ZF clear
        jmp ice_loop_dec    ; decrement eCX anyway
endp ice_loopne
proc ice_loop_dec near
        or dl, dl
        jnz ice_loope_ecx
        dec [ds:ice_reg._cx]    ; decrement CX
        ret
ice_loope_ecx:
        dec [ds:ice_reg._ecx]   ; decrement ECX
        ret
endp ice_loop_dec
proc ice_jcxz near
        mov eax, [ds:ice_reg._ecx]
        or dl, dl
        jnz ice_jcxz_ecx
        or ax, ax       ; follow short jump if CX was 0
        jz ice_jmp_conditional_short_follow
        ret
ice_jcxz_ecx:
        or eax, eax     ; follow short jump if ECX was 0
        jz ice_jmp_conditional_short_follow
        ret
endp ice_jcxz
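    As a cross-check of the handlers above, here is the LOOP family written out
as a Python model.  It is a sketch only and the names are invented: the counter
is always decremented, and the jump is followed when the counter is not 0 and,
for LOOPE/LOOPNE, when the ZF condition also holds.

```python
def loop_step(count, kind="loop", zf=False, width=16):
    """Return (new counter, follow the short jump?)."""
    count = (count - 1) & ((1 << width) - 1)    # DEC eCX, wraps at 0
    follow = count != 0
    if kind == "loope":
        follow = follow and zf
    elif kind == "loopne":
        follow = follow and not zf
    return count, follow

def jcxz_step(count):
    """JCXZ follows the jump only when the counter is already 0."""
    return count == 0

last_pass = loop_step(1)                         # counter hits 0: fall through
wrapped = loop_step(0)                           # 0 wraps to FFFF: jump
zf_fail = loop_step(5, kind="loope", zf=False)   # still decrements
```

    The last case is why the LOOPE and LOOPNE handlers jump to ICE_LOOP_DEC
when their flag test fails: the counter changes whether or not we jump.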
    .------------------.
    | INT instructions |
    '------------------'
    INT instructions only need to be handled in 16-bit form, as there are no
32-bit equivalents, at least, in real mode anyway.  Note how the opcode length
is set to 0 on interrupt executions once they have been emulated so as not to
mess with the emulated IP on return to the dispatcher.  Also note how the main
interrupt execution procedure accepts the interrupt to be emulated in AH, and
adds the original instruction length to the emulated return IP address on the
external stack.
proc ice_into near
        mov ah, 4
        test [byte high word ds:ice_reg._flags], 1000b
        jnz ice_int_x       ; emulate interrupt if emulated overflow flag set
        ret                 ; else just skip the interrupt
endp ice_into
proc ice_int_3 near
        mov ah, 3           ; emulate INT 3 instruction (length of 1 already in
                            ; the ice_opcode_length variable)
proc ice_int_x near
        xchg ax, bx         ; BX holds interrupt to emulate
        mov ax, [ds:ice_reg._flags]
        call ice_external_push_16   ; save emulated flags on external stack
        mov ax, [ds:ice_reg._cs]
        call ice_external_push_16   ; save emulated CS on external stack
        mov ax, [ds:ice_reg._ip]
        add ax, [ds:ice_opcode_length]
        call ice_external_push_16   ; save emulated return IP on external stack
        and [byte high word ds:ice_reg._flags], 11111100b
                                    ; clear emulated IF and TF
        xor ax, ax
        mov di, ax              ; DI = 0
        mov al, bh              ; AL = INT to emulate
        shl ax, 2               ; AX = INT * 4
        xchg ax, di
        mov es, ax              ; ES = 0, DI = INT * 4
        mov ax, [word es:di]    ; get offset of interrupt code
        mov [ds:ice_reg._ip], ax; update emulated IP
        mov ax, [word es:di+2]  ; get segment of interrupt code
        mov [ds:ice_reg._cs], ax; update emulated CS
        xor ax, ax
        mov [ds:ice_opcode_length], ax  ; clear opcode length as IP is already
                                        ; set properly
        ret
endp ice_int_x
endp ice_int_3
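    The same sequence of steps, as a Python model against a fake memory array.
Nothing here is ICE code: the dictionary of registers, the mem array and the
function name are assumptions for illustration.  The opcode_length argument
plays the role of the ICE_OPCODE_LENGTH variable (0 for the single step case).

```python
TF_IF = 0x0300      # trap flag (bit 8) and interrupt flag (bit 9)

def emulate_int(mem, regs, vector, opcode_length):
    def push16(value):
        regs["sp"] = (regs["sp"] - 2) & 0xFFFF
        addr = (regs["ss"] << 4) + regs["sp"]
        mem[addr:addr + 2] = (value & 0xFFFF).to_bytes(2, "little")

    push16(regs["flags"])                   # flags as they were BEFORE
    push16(regs["cs"])
    push16(regs["ip"] + opcode_length)      # return address
    regs["flags"] &= ~TF_IF                 # clear emulated TF and IF
    entry = vector * 4                      # vector table at 0000:0000
    regs["ip"] = int.from_bytes(mem[entry:entry + 2], "little")
    regs["cs"] = int.from_bytes(mem[entry + 2:entry + 4], "little")

mem = bytearray(0x110000)
mem[4:8] = bytes([0x34, 0x12, 0x70, 0x00])      # INT 1 -> 0070:1234
regs = {"cs": 0x1000, "ip": 0x0100, "ss": 0x2000, "sp": 0x0100,
        "flags": 0x0302}
emulate_int(mem, regs, vector=1, opcode_length=0)
```

    With a length of 0 the pushed IP points back at the instruction which has
not been emulated yet, which is what a handler for INT 1 expects to find.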
    .--------------.
    | RET families |
    '--------------'
    Some RET instructions are easier to handle than others... due to their
16-bit and 32-bit natures.  Some RET instructions have a word value following
them to be added to eSP.  Also, in the case of 32-bit RET instructions, you
must make sure the return address is valid, in that the top half of the return
IP must be 0, otherwise a protection fault must be emulated.
    Strangely enough, in real mode, there is a 32-bit version of IRET, however
there is no corresponding 32-bit version of INT, as it just always uses the
normal 16-bit INT.  This is possibly due to memory manager interference, and
may not be for all computers.  But, shrug, who cares?
proc ice_ret_near_value
        mov bx, [es:di+1]       ; get value to add to eSP
        jmp ice_ret_near_skip
endp ice_ret_near_value
proc ice_ret_near
        xor bx, bx              ; value to add to eSP is 0
ice_ret_near_skip:
        or dl, dl
        jnz ice_ret_near_32     ; 32-bit RET NEAR
        call ice_external_pop_16; get new IP
        mov [ds:ice_reg._ip], ax; set new IP
        jmp ice_ret_exit
ice_ret_near_32:
        call ice_external_pop_32    ; get new IP
        cmp eax, 10000h
        jnb ice_ret_exception       ; emulate exception if invalid return IP
        mov [ds:ice_reg._ip], ax    ; set new IP
endp ice_ret_near
proc ice_ret_exit near
        dec [ds:ice_opcode_length]  ; instruction length = 0, for dispatcher
        or dl, dl
        jnz ice_retn_exit_32        ; 32-bit RET, update ESP instead
        add [ds:ice_reg._sp], bx    ; update SP
        ret
ice_retn_exit_32:
        xor eax, eax
        mov ax, bx
        add [ds:ice_reg._esp], eax  ; update ESP
        ret
endp ice_ret_exit
proc ice_ret_exception near
        call ice_external_push_32   ; for protection fault in RETs... we must
                                    ; have a valid return address... and since
                                    ; what we have here is an invalid one....
                                    ; set the stack back to normal first
        jmp ice_protection_fault
endp ice_ret_exception
proc ice_ret_far_value near
        mov bx, [es:di+1]       ; get value to add to eSP
        jmp ice_ret_far_skip
endp ice_ret_far_value
proc ice_ret_far near
        xor bx, bx              ; value to add to eSP is 0
ice_ret_far_skip:
        or dl, dl
        jnz ice_ret_far_32      ; 32-bit RET FAR
        call ice_external_pop_16
        mov [ds:ice_reg._ip], ax; save new IP
        call ice_external_pop_16
        mov [ds:ice_reg._cs], ax; save new CS
        jmp ice_ret_exit
endp ice_ret_far
proc ice_ret_far_32 near
        call ice_external_pop_32; get new IP
        cmp eax, 10000h
        jnb ice_ret_exception   ; emulate exception if it's invalid
        mov [ds:ice_reg._ip], ax; save new IP
        call ice_external_pop_32
        mov [ds:ice_reg._cs], ax; save new CS
        jmp ice_ret_exit
endp ice_ret_far_32
proc ice_iret
        dec [ds:ice_opcode_length]  ; set opcode length to 0
        or dl, dl
        jnz ice_iret_32             ; use 32-bit IRET
        call ice_external_pop_16
        mov [ds:ice_reg._ip], ax    ; save new IP
        call ice_external_pop_16
        mov [ds:ice_reg._cs], ax    ; save new CS
        jmp ice_popf                ; emulate POPF
ice_iret_32:
        call ice_external_pop_32    ; get new IP
        cmp eax, 10000h
        jnb ice_ret_exception       ; emulate exception if it's invalid
        mov [ds:ice_reg._ip], ax    ; set new IP
        call ice_external_pop_32
        mov [ds:ice_reg._cs], ax    ; set new CS
        jmp ice_popf                ; emulate POPF[D]
endp ice_iret
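    The rule shared by all the RET handlers above (pop 2 or 4 bytes, refuse a
32-bit return IP above 0FFFFh, then add the optional word value) can be
modelled in a few lines of Python.  This is a sketch under assumptions: the
names are invented for the illustration, the stack is passed in as a plain
byte buffer, and SP is treated as 16-bit in both operand sizes.

```python
class ProtectionFault(Exception):
    """Hypothetical stand-in for emulating the general protection fault."""


def ret_near(stack, sp, operand32, imm16=0):
    """Return (new_ip, new_sp) for RET NEAR, or raise ProtectionFault.

    stack     -- bytes-like view of the emulated stack segment
    sp        -- emulated stack pointer before the RET
    operand32 -- True when an operand size prefix selects the 32-bit form
    imm16     -- the word following RET imm16, 0 for a plain RET
    """
    size = 4 if operand32 else 2
    value = int.from_bytes(stack[sp:sp + size], "little")
    if operand32 and value >= 0x10000:
        # Invalid return IP: fault before the stack pointer is changed, which
        # is why the ICE listing pushes the value back first.
        raise ProtectionFault(hex(value))
    new_sp = (sp + size + imm16) & 0xFFFF
    return value & 0xFFFF, new_sp
```

    RET FAR and IRET follow the same pattern, with an extra pop for CS (and
for the flags in the case of IRET).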
    .------------.
    | Commentary |
    '------------'
    As you can see, the basic opcode handlers for ICE are very simple... and
there is probably some slight room for optimization, especially in the case of
combining 16-bit and 32-bit code, which is the crux of most confusion.
    Anyway, with the basic handler concepts out of the way, we now move on to
the remaining few handlers which are slightly more complex.  Actually, most in
the next section aren't really complex at all... however I lumped them there
just because I felt like it.
    What are you doing here reading this?  Get to the next section!
-----------------------------------------------------------------------------
Section 5:  ICE special opcode handlers (advanced)
-----------------------------------------------------------------------------
    Good.  You're here.  In this section, we cover JMP SHORT instructions,
conditional jump instructions, JMP/CALL instructions with direct values,
JMP/CALL instructions with indirect values, BOUND, and DIV handling.  They are
all slightly complex due to the difference between their 16-bit and 32-bit
forms.
    .--------------------------.
    | JMP (short, conditional) |
    '--------------------------'
    Through the COS database tables, JMP SHORT is re-routed to point to the
'follow conditional jump' section of code.  This then brings us to the handling
of conditional jumps (with short, long, and very long displacements).
    For efficient JMP handling, we use the concept of self-modifying code.  We
copy the first byte of the jump instruction to emulate (which holds the details
of WHAT type of jump it is), and use our own displacement to point to a section
of code which emulates the following of a conditional jump.  If the conditional
jump falls through, then the special handler exits and the IP is updated by the
dispatcher to point past the conditional jump instruction.
    Look at the code (note the jump to clear instruction prefetch).
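proc ice_jmp_conditional_short near
        mov [byte ds:ice_jmp_conditional_short_modify], al
                                    ; patch our Jcc with the emulated opcode
        db 0ebh, 00                 ; JMP $+2, flushes the prefetch queue
        mov ebx, [ds:ice_reg._eflags]
        and bh, 11111110b           ; keep TF clear in the real flags
        push ebx
        popfd                       ; load emulated flags for the test
ice_jmp_conditional_short_modify:
        jc ice_jmp_conditional_short_follow ; opcode byte is overwritten
        ret                         ; not taken, dispatcher skips the opcode
ice_jmp_conditional_short_follow:
        mov al, [es:di+1]           ; get 8-bit displacement
        cbw                         ; sign extend it
        add [ds:ice_reg._ip], ax    ; dispatcher adds the opcode length
        ret
endp ice_jmp_conditional_short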
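proc ice_jmp_conditional_long near
        mov [word ds:ice_jmp_conditional_long_modify], ax
                                    ; patch in the emulated 0Fh, 8xh opcode
        db 0ebh, 00                 ; JMP $+2, flushes the prefetch queue
        mov ebx, [ds:ice_reg._eflags]
        and bh, 11111110b           ; keep TF clear in the real flags
        push ebx
        popfd                       ; load emulated flags for the test
ice_jmp_conditional_long_modify:
        dw 0fh                      ; placeholder for the 2 opcode bytes
        dw 1                        ; displacement of 1, skips the RET below
        ret                         ; not taken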
ice_jmp_conditional_long_follow:
        or dl, dl
        jnz ice_jmp_conditional_long_32
        mov ax, [es:di+2]
        add [ds:ice_reg._ip], ax
        ret
endp ice_jmp_conditional_long
proc ice_jmp_conditional_long_32 near
        xor eax, eax
        mov ax, [ds:ice_opcode_length]
        add ax, [ds:ice_reg._ip]
        add eax, [es:di+2]
        cmp eax, 10000h
        jnb ice_protection_fault
        mov [ds:ice_reg._ip], ax
        ret
endp ice_jmp_conditional_long_32
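    If you do not like self-modifying code, the same decision can be made by
testing the emulated flags directly.  Note that this is a different technique
from the one ICE uses above, shown only for comparison.  The Python sketch
below covers the 2 byte Jcc rel8 forms (70h to 7Fh) and not JMP SHORT; all
names are invented for the illustration.

```python
CF, PF, ZF, SF, OF = 0x001, 0x004, 0x040, 0x080, 0x800


def condition_met(cc, flags):
    """Evaluate the 4-bit condition code in the low nibble of a Jcc opcode."""
    cf, pf, zf, sf, of = (bool(flags & m) for m in (CF, PF, ZF, SF, OF))
    # Even codes in order: O, B, Z, BE, S, P, L, LE.  Odd codes negate them.
    base = (of, cf, zf, cf or zf, sf, pf, sf != of, zf or (sf != of))[cc >> 1]
    return base != bool(cc & 1)


def sign_extend(value, bits):
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def jcc_short_target(ip, opcode, disp8, flags):
    """Return the emulated IP after a Jcc rel8 located at `ip`."""
    next_ip = (ip + 2) & 0xFFFF             # falls through by default
    if condition_met(opcode & 0x0F, flags):
        return (next_ip + sign_extend(disp8, 8)) & 0xFFFF
    return next_ip
```

    The table lookup costs more bytes than patching one opcode byte, which is
presumably why the self-modifying version wins when size matters.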
    .-------------------.
    | JMP/CALL (direct) |
    '-------------------'
    These are all easy enough to handle... just note how we continue to check
the 32-bit versions so as to have valid IPs (or, if the IP is not valid,
emulating a general protection fault), and that we keep clearing the opcode
length to 0... except in the case of 16-bit JMP/CALL NEAR DIRECT, in which case
the opcode length isn't touched because it forms a part of the new IP.
    Note that this is not the only method of handling direct JMP/CALL, as there
is another way which can be used in conjunction with indirect JMP/CALL
handling, which will save 1 kilobyte of space!  However... it has problems...
discussed in the next section.
proc ice_direct_call_far near
        or dl, dl
        jnz ice_direct_call_far_32
        mov ax, [ds:ice_reg._cs]
        call ice_external_push_16
        mov ax, [ds:ice_reg._ip]
        add ax, 5
        call ice_external_push_16
proc ice_direct_jmp_far near
        or dl, dl
        jnz ice_direct_jmp_far_32
        mov ax, [word es:di+1]
        mov [ds:ice_reg._ip], ax
        mov ax, [word es:di+3]
        mov [ds:ice_reg._cs], ax
        dec [ds:ice_opcode_length]
        ret
endp ice_direct_jmp_far
endp ice_direct_call_far
proc ice_direct_call_far_32 near
        cmp [word high dword es:di+1], 0
        jnz ice_protection_fault
        db 66h
        push cs
        pop eax
        mov ax, [ds:ice_reg._cs]
        call ice_external_push_32
        xor eax, eax
        mov ax, [ds:ice_reg._ip]
        add ax, 7
        call ice_external_push_32
proc ice_direct_jmp_far_32 near
        mov eax, [es:di+1]
        cmp eax, 10000h
        jnb ice_protection_fault
        mov [ds:ice_reg._ip], ax
        mov ax, [word es:di+5]
        mov [ds:ice_reg._cs], ax
        dec [ds:ice_opcode_length]
        ret
endp ice_direct_jmp_far_32
endp ice_direct_call_far_32
proc ice_direct_call_near near
        or dl, dl
        jnz ice_direct_call_near_32
        mov ax, [ds:ice_reg._ip]
        add ax, 3
        call ice_external_push_16
proc ice_direct_jmp_near near
        or dl, dl
        jnz ice_direct_jmp_near_32
        mov ax, [es:di+1]
        add [ds:ice_reg._ip], ax
        ret
endp ice_direct_jmp_near
endp ice_direct_call_near
proc ice_direct_call_near_32 near
        xor eax, eax
        mov ax, [ds:ice_reg._ip]
        add ax, 5
        push eax
        add eax, [es:di+1]
        cmp eax, 10000h
        pop eax
        jnb ice_protection_fault
        call ice_external_push_32
proc ice_direct_jmp_near_32 near
        xor eax, eax
        mov ax, [ds:ice_reg._ip]
        add eax, [es:di+1]
        add eax, 5
        cmp eax, 10000h
        jnb ice_protection_fault
        mov [ds:ice_reg._ip], ax
        dec [ds:ice_opcode_length]
        ret
endp ice_direct_jmp_near_32
endp ice_direct_call_near_32
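    The arithmetic behind the near forms is easier to see outside assembly.
This Python sketch (names invented, instruction lengths of 3 and 5 bytes
assumed for JMP/CALL NEAR without counting prefixes) shows why only the 32-bit
form can fault: the 16-bit sum simply wraps inside the segment.

```python
class ProtectionFault(Exception):
    """Hypothetical stand-in for emulating the general protection fault."""


def near_jump_target(ip, disp, operand32):
    """Return the new IP for a direct near JMP/CALL located at `ip`.

    disp is the raw displacement as read from the instruction (unsigned).
    """
    if operand32:
        # 32-bit form: opcode + rel32 = 5 bytes, sum taken modulo 2^32.
        target = (ip + 5 + disp) & 0xFFFFFFFF
        if target >= 0x10000:
            raise ProtectionFault(hex(target))
        return target
    # 16-bit form: opcode + rel16 = 3 bytes, sum wraps at 64k.
    return (ip + 3 + disp) & 0xFFFF
```

    For a CALL the return address (ip plus the instruction length) is pushed
first, and as in the listing above the push must only happen once the target
is known to be valid.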
    .-----------------------------------.
    | INDIRECTS (JMP, CALL, BOUND, DIV) |
    '-----------------------------------'
    I've decided to leave the hardest for the very last... special instructions
which can use indirect operands, in which you have multiple choices about how
to handle them, all as complex as each other and with various speed, size, and
reliability trade-offs :(
    The first method, which is 100% reliable, is spending 2k or more on
manually decoding the MODRM fields of these opcodes in both 16-bit and 32-bit
forms.  It's slow, and it's a bitch.  You could possibly work out some sort of
table format for this... or maybe not.  I did not bother with this possibility,
as although it is viable for 16-bit MODRM, with 32-bit MODRM and SIB bytes it
is just hopeless.
    The second method is to modify the instructions in the generic opcode
handler to calculate the address being referenced, which is faster and smaller
than the first method, and can be 100% reliable.  Unfortunately, it still takes
up a lot of code, especially with the 32-bit variants, however it can be used
for ALL indirect instructions, so you do save a little space.
    The third method is to hook i0 for DIV, i5 for BOUND, and generically
execute the instruction.  Then, if your handler gets executed, you unhook and
generically emulate the exception interrupt.  However, this leaves you open to
anti-emulation code which will, for instance, use DIV using the value at 0:0,
which will no longer be there since you hooked it, etc.  It is small, and not
100% effective but still reliable enough to use.
    Then, for JMP/CALL access, you could single step through the individual
JMP/CALL instruction, and then you will record the CS+IP and fix up the old
part of the stack which was destroyed by the i1.  This is very effective in
that -ALL- direct and indirect JMP/CALL instructions can use the SAME
procedure... bringing down the complete ICE size to 2k, major space savings
considering it is normally about 3k.
    Unfortunately, this 3rd method also has the problem of instructions
accessing the values in the IVT at vector 1 (ie: CALL [FAR 0:4] for emulating
an interrupt), some of which can be avoided but most of which cannot, which
means this procedure is... reliable enough for use as you can mask indirect
JMP/CALLs to i1... however unreliable in that using only part of the address at
i1 will screw you up (and this could be done by some debuggers, possibly).
    So, as you can see, the only real choices are options 2 and 3, the
question is whether you are willing to sacrifice an extra 512 bytes to be
reliable, or save 512 bytes and skimp out on properly handling things.  Note
that also, in the third method, since you are single stepping, if there is a
faulty 32-bit JMP/CALL instruction, then you cannot emulate an exception,
whereas you can if you use the second method.
    Decisions, decisions :)
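    To give an idea of what the first method involves, here is a Python sketch
of manual MODRM decoding for the 16-bit addressing forms only.  It is not taken
from ICE; the function and table names are invented, and the 32-bit forms with
SIB bytes (the part described above as hopeless) are left out on purpose.

```python
# Registers summed for each r/m value in 16-bit addressing.
BASES_16 = (("bx", "si"), ("bx", "di"), ("bp", "si"), ("bp", "di"),
            ("si",), ("di",), ("bp",), ("bx",))


def effective_address_16(code, regs):
    """Decode a 16-bit MODRM operand.

    code -- bytes starting at the MODRM byte
    regs -- dict of 16-bit register values
    Returns (offset, default_segment, bytes_used).  The offset and segment
    are None when mod = 11b, because the operand is then a register.
    """
    modrm = code[0]
    mod, rm = modrm >> 6, modrm & 7
    if mod == 3:
        return None, None, 1
    if mod == 0 and rm == 6:                # special case: bare disp16
        return int.from_bytes(code[1:3], "little"), "ds", 3
    disp, used = 0, 1
    if mod == 1:                            # sign extended disp8
        disp = code[1] - 0x100 if code[1] & 0x80 else code[1]
        used = 2
    elif mod == 2:                          # disp16
        disp = int.from_bytes(code[1:3], "little")
        used = 3
    offset = (sum(regs[r] for r in BASES_16[rm]) + disp) & 0xFFFF
    segment = "ss" if "bp" in BASES_16[rm] else "ds"
    return offset, segment, used
```

    A segment override prefix, if present, replaces the default segment that
is returned here.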
    Here is the code to handle all direct/indirect JMP/CALL instructions using
the third method, however in the full example source code I use the second
method.  If you want to swap the methods over, you must remove the DIRECT and
INDIRECT JMP/CALL handling code (which was shown above), and point all indirect
and direct jmp near/far instructions to ice_indirect_jmp.  The indirect and
direct call near instructions go to ice_indirect_calln and the indirect and
direct call far instructions go to ice_indirect_callf.  This is done by
modifying the COS tables.  These routines could stand to be optimized slightly,
as using them does slow down emulation quite a bit.
proc ice_indirect_calln near
        or dl, dl
        jnz ice_indirect_calln_32
        mov ax, 8
        jmp ice_indirect
ice_indirect_calln_32:
        mov ax, 0ah
        jmp ice_indirect
endp ice_indirect_calln
proc ice_indirect_callf near
        or dl, dl
        jnz ice_indirect_callf_32
        mov ax, 0ah
        jmp ice_indirect
ice_indirect_callf_32:
        mov ax, 0eh
        jmp ice_indirect
endp ice_indirect_callf
proc ice_indirect_jmp near
        mov ax, 6
endp ice_indirect_jmp
proc ice_indirect near
        les edi, [ds:ice_reg._ssesp]
        sub di, ax
        push [dword es:di]
        push [word es:di+4]
        push di
        mov ax, [ds:ice_original_ip]
        mov [ds:ice_reg._ip], ax
        xor ax, ax
        mov es, ax
        les di, [dword es:4]
        push [dword es:di]
        push [word es:di+4]
        mov [byte es:di], 0eah
        mov [word es:di+1], offset ice_int_1
        mov [word es:di+3], cs
        mov [ds:ice_indirect_saved], 2
        mov ebx, [ds:ice_reg._ebx]
        mov ecx, [ds:ice_reg._ecx]
        mov edx, [ds:ice_reg._edx]
        mov edi, [ds:ice_reg._edi]
        mov esi, [ds:ice_reg._esi]
        mov ebp, [ds:ice_reg._ebp]
        mov es, [ds:ice_reg._es]
        mov ds, [ds:ice_reg._ds]        ; registers loaded (except eAX)
        ice_switch_to_external_stack    ; stack loaded
        cli
        pushf
        pop ax
        or ah, 1
        push ax
        mov eax, [cs:ice_reg._eax]      ; load eAX now
        popf                            ; turn on single step mode
        jmp [dword cs:ice_reg._csip]    ; do it
ice_indirect_return:
        ice_switch_to_internal_stack    ; internal stack is now on ;)
        push cs
        pop ds
        xor ax, ax
        mov es, ax
        les di, [dword es:4]
        pop [word es:di+4]
        pop [dword es:di]               ; restore patched INT 1 handler bytes
        call ice_external_pop_32
        mov [ds:ice_reg._csip], eax
        call ice_external_pop_16
        mov es, [ds:ice_reg._ss]
        pop di
        pop [word es:di+4]
        pop [dword es:di]
        xor ax, ax
        mov [ds:ice_opcode_length], ax
        ret
endp ice_indirect
proc ice_int_1 far
        dec [cs:ice_indirect_saved]
        jz ice_indirect_return          ; don't activate too early
        iret
endp ice_int_1
    Here are the procedures to do BOUND/DIV using the interrupt hooking method,
which is much more reliable than the above procedure for JMP/CALL handling.
This is using the third method, and will not be used in the full source code,
as it uses the second method which all BOUND/DIV/JMP/CALL instructions can go
through.  To swap code sections, after replacing the JMP/CALL procedures as
above, and fixing up the COS tables, point IDIV and DIV to the ice_div and the
BOUND instruction to ice_bound.
proc ice_bound near
        mov di, (5*4)
        call ice_bound_div_execute
        jnz ice_bound_exception
        ret
ice_bound_exception:
        mov ah, 5
        jmp ice_fault_execute
endp ice_bound
proc ice_int far
        inc [word cs:ice_indirect_saved]
        iret
endp ice_int
proc ice_div near
        xor di, di
        call ice_bound_div_execute
        jnz ice_div_exception
        ret
ice_div_exception:
        xor ax, ax
        jmp ice_fault_execute
endp ice_div
proc ice_bound_div_execute near
        xor ax, ax
        mov [ds:ice_indirect_saved], ax
        mov es, ax
        push [dword es:di]
        push di
        mov [word es:di], offset ice_int
        mov [es:di+2], cs
        call ice_generic
        pop di
        xor ax, ax
        mov es, ax
        pop [dword es:di]
        cmp [ds:ice_indirect_saved], ax
        ret
endp ice_bound_div_execute
                  To see the 2nd method, refer to the full
                   source code in part 3 of this document.
    .------.
    | LOCK |
    '------'
    I discussed the complexity of LOCK handlers earlier, but since I've already
written (and scrapped) a routine to handle LOCK instructions, I've included it
just for educational purposes.  It looks complex, and has no comments, so is
most probably not bug free.  I drew up the table below to help me determine
which instructions are able to be prefixed by LOCK, and used it while coding my
LOCK handler.  If a LOCK prefixes any instruction not on this list, then you
emulate an invalid opcode exception.
    Set 1: normal set of instructions
    Set 2: all extended instructions
        VALID OPCODES:      Set 1              Set 2
                           '-----'            '-----'
            BT   mem, op    .                   A3h
            BTS  mem, op    .                   ABh
            BTR  mem, op    .                   B3h
            BTC  mem, op    <-- grp8            BBh
            XCHG mem, op    <-- 86h
            XCHG reg, mem   <-- 87h
            ADD  mem, op    .                   00h, 01h
            ADC  mem, op    .                   10h, 11h
            AND  mem, op    .                   20h, 21h
            OR   mem, op    .                   08h, 09h
            SBB  mem, op    .                   18h, 19h
            SUB  mem, op    .                   28h, 29h
            XOR  mem, op    <-- 80h, 81h, 83h   30h, 31h
            DEC  mem        .
            INC  mem        <-- grp 4
            NEG  mem        .
            NOT  mem        <-- grp 3
    Before we go onto the ICE_LOCK procedure I'll quickly describe how to
include it into the ICE emulation system.  To use it, you must include the
procedure itself in the source file and remove the LOCK opcode siphoner from
the beginning of the dispatcher.  Then, at the end of the dispatcher where the
ICE_COMMUNICATION variable is checked, you must add another check for it to be
equal to 2, and if it is you continue with the opcode siphoning but SKIP the
section which CLEARS the opcodes (ie: je ICE_SEGMENT_REMOVAL).  Finally, you
must update the COS table to set the 'use a procedure' bit, then add a word
variable after it pointing to this ICE_LOCK procedure.
proc ice_lock_override near
        inc di
        mov ax, [es:di]
proc ice_lock near
        cmp al, 2eh
        je ice_lock_override
        cmp al, 3eh
        je ice_lock_override
        cmp al, 26h
        je ice_lock_override
        cmp al, 36h
        je ice_lock_override
        cmp al, 0f2h
        je ice_lock_override
        cmp al, 0f3h
        je ice_lock_override
        cmp al, 66h
        je ice_lock_override
        cmp al, 67h
        je ice_lock_override
        cmp al, 0fh
        je ice_lock_extended
        cmp al, 86h             ; XCHG, the opcode is in AL (AH holds MODRM)
        je ice_lock_testmem
        cmp al, 87h
        je ice_lock_testmem
        cmp al, 0feh
        je ice_lock_grp4
        cmp al, 0f6h
        je ice_lock_grp3
        cmp al, 0f7h
        je ice_lock_grp3
        cmp al, 0
        je ice_lock_testmem
        cmp al, 1
        je ice_lock_testmem
        cmp al, 10h
        je ice_lock_testmem
        cmp al, 11h
        je ice_lock_testmem
        cmp al, 20h
        je ice_lock_testmem
        cmp al, 21h
        je ice_lock_testmem
        cmp al, 30h
        je ice_lock_testmem
        cmp al, 31h
        je ice_lock_testmem
        cmp al, 08h
        je ice_lock_testmem
        cmp al, 09h
        je ice_lock_testmem
        cmp al, 18h
        je ice_lock_testmem
        cmp al, 19h
        je ice_lock_testmem
        cmp al, 28h
        je ice_lock_testmem
        cmp al, 29h
        je ice_lock_testmem
        cmp al, 80h
        jb ice_invalid_opcode
        cmp al, 83h
        ja ice_invalid_opcode
        jmp ice_lock_testmem
ice_lock_grp3:
        push ax
        and ah, 111000b
        cmp ah, 10000b
        je ice_lock_grp3_okay
        cmp ah, 11000b
        je ice_lock_grp3_okay
        pop ax
        jmp ice_invalid_opcode
ice_lock_grp3_okay:
        pop ax
        jmp ice_lock_testmem
ice_lock_grp4:
        push ax
        and ah, 111000b
        cmp ah, 0
        je ice_lock_grp4_okay
        cmp ah, 1000b
        je ice_lock_grp4_okay
        pop ax
        jmp ice_invalid_opcode
ice_lock_grp4_okay:
        pop ax
        jmp ice_lock_testmem
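ice_lock_extended:
        cmp ah, 0a3h
        je ice_lock_extended_modrm
        cmp ah, 0b3h
        je ice_lock_extended_modrm
        cmp ah, 0abh
        je ice_lock_extended_modrm
        cmp ah, 0bbh
        je ice_lock_extended_modrm
        cmp ah, 0bah
        jne ice_invalid_opcode
        mov ah, [es:di+2]       ; MODRM follows the two opcode bytes
ice_lock_grp8:
        test ah, 100000b        ; only /4 to /7 (BT, BTS, BTR, BTC) are valid
        jz ice_invalid_opcode
        jmp ice_lock_testmem
ice_lock_extended_modrm:
        mov ah, [es:di+2]       ; MODRM follows the two opcode bytes
ice_lock_testmem: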
        and ah, 11000000b
        cmp ah, 11000000b
        je ice_invalid_opcode
        mov [ds:ice_communication], 2
        ret
endp ice_lock
endp ice_lock_override
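    Since the routine above is hard to check by eye, here is the same table
written as a Python predicate, handy for testing a handler against.  It is a
sketch: the names are invented, it follows the table in this section as printed
(so only FEh is covered for INC/DEC), and it assumes the byte string passed in
starts right after the LOCK prefix and is long enough to hold a MODRM byte.

```python
PREFIXES = {0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65, 0x66, 0x67, 0xF2, 0xF3}
ARITH = {0x00, 0x01, 0x08, 0x09, 0x10, 0x11, 0x18, 0x19,
         0x20, 0x21, 0x28, 0x29, 0x30, 0x31}


def lock_allowed(code):
    """Return True if the instruction in `code` may carry a LOCK prefix."""
    i = 0
    while code[i] in PREFIXES:              # skip any further prefixes
        i += 1
    op = code[i]
    if op == 0x0F:                          # extended set
        op2, modrm = code[i + 1], code[i + 2]
        if op2 in (0xA3, 0xAB, 0xB3, 0xBB):
            ok = True
        elif op2 == 0xBA:                   # grp8: /4../7 are BT, BTS, BTR, BTC
            ok = ((modrm >> 3) & 7) >= 4
        else:
            return False
    else:                                   # normal set
        modrm = code[i + 1]
        reg = (modrm >> 3) & 7
        if op in ARITH or op in (0x86, 0x87):
            ok = True
        elif 0x80 <= op <= 0x83:            # grp1: everything except CMP (/7)
            ok = reg != 7
        elif op in (0xF6, 0xF7):            # grp3: NOT (/2) and NEG (/3)
            ok = reg in (2, 3)
        elif op == 0xFE:                    # grp4: INC (/0) and DEC (/1)
            ok = reg in (0, 1)
        else:
            return False
    return ok and (modrm >> 6) != 3         # the operand must be memory
```

    Anything this returns False for is where the handler should emulate the
invalid opcode exception.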
-----------------------------------------------------------------------------
Section 6:  Notes on ICE
-----------------------------------------------------------------------------
    So how well does ICE fare?  Well, it can vary between 2k and 3.5k depending
on what type of procedures you use to handle indirect instructions, and whether
you include the LOCK procedure or not (probably not a good idea, mine is prone
to bugs, because why fix it if I won't use it?).  The included source however,
is an average of 3k.
    Is that good or bad?  Well, the XT tracers are generally between 1.5k and
2k... and since you can get ICE down to 2k it is -FUCKING- good!  Also, ICE
could really be optimized quite a bit... many of the special opcode handlers
are nowhere near as optimized as possible :)  However, you must make sure as
you decrease size you don't decrease speed too.
    What about COS?  The checks for invalid MODR/M combinations were removed
from the COS decoder simply because we don't really need them, however the COS
tables -ARE- set up with the MODR/M restrictions set.  The COS decoder provided
is quite excellent actually in its usage of index tables to speed processing of
the COS tables, however it could possibly be optimized.
    As for the general BCE design, there are many other generic opcode handlers
which are much smaller and faster than mine.  ICE's BCE could be redesigned to
be smaller and faster too, although it would probably require you to alter many
other parts of ICE to work with the new modifications.
    And how well does ICE work?  I can emulate 32-bit TBSCAN for DOS under it,
so I suppose it is good enough ;)  I can also emulate PKZIP/ARJ/RAR/etc under
it, as well as things like IRG#8 magazine reader, and other DOS programs like
SCANDISK and DEFRAG (but you cannot access floppy disks with it, emulators are
too slow for this, the disks time out).
    There -ARE- a few minor bugs in ICE... which I cannot find.  ICE will not
run Manifest from QEMM (MFT.EXE) nor MSD.EXE nor QPEG 386 (QPV.EXE), and all
seem to be hanging on the same problem opcode, which I suspect is there due to
some emulation bug because it's not a valid opcode :)  ICE -USED- to be able to
run MSD.EXE just fine... however somewhere along the line it stopped working.
    I decided to release ICE anyway as it is probably a miniature bug... not
worth holding up the finish of my glorious tunneling series for :)  If anyone
can find the bug... tell me!  I've gone half insane (and deaf, listening to
music while I code) trying to find it!
    Uhh, anyway, like I said, ICE is a first generation product, there are no
other 386+ emulators written for viruses out there at the moment.  Note that
the COS tables only work for 386 opcodes, and that the reason I call ICE a 386+
emulation system is because the COS tables can be, if you can find the opcode
information, updated to include even Pentium instructions (I just do not have
those opcode lists however).  Note I said you only have to update the TABLES,
not COS or the decoder... which is why COS is so neat ;)  Well, you might have
to change the COS definition a little bit for Pentium, as I think there are
64-bit MODR/M instructions?  Maybe soon a COS v2 will be needed?  :)
    For the world's first (virogen) 386+ emulator, ICE does a damn good job, but
just like anything, it can be improved.  I'm sure you'll all bring out ICEs of
your own (probably looking nothing like mine), assuming anyone wants to use an
emulation system at all.  More on that later.
    Time for the full source code (mmmm mmm) :)
    Normally I give you an example program to display tunneled i13 and i21
vectors, however in this document I shall do things a little differently, being
the last one and all.  This source will emulate its own residency, and after
that, -EVERYTHING- is being emulated, hence the reason your computer slows down
to a crawl :)
    To convert ICE to tunnel things, you would simply set the correct registers
in the emulated registers structure, set the CS:IP properly, set the stack to
point to the return emulation address, and add in code to the ICE dispatcher to
check for the emulated CS:IP to point back to your virus (so when control is
returned from the interrupt being emulated) at which time control is passed
straight from ICE to your virus.  Also, code would be added to ICE to save the
emulated CS:IP address when the original interrupt entrypoint was detected.
    For test purposes however, remember, loading ANYTHING after running even
this current program source will be emulated... so a DIR is emulated, the INTs
it makes are emulated, -EVERYTHING-.  So you can even load (given enough time)
your favourite AV program and see how it doesn't notice it's under complete
control of ICE.  Imagine that power used in your next virus...
      .--------------------------------------------------------------.
      |           IMPORTANT IMPORTANT IMPORTANT IMPORTANT            |
      ----------------------------------------------------------------
      | ICE WILL -NOT- RUN UNDER ANY SORT OF MEMORY MANAGER, BECAUSE |
      | BCE EMULATORS CAN NOT SUPPORT PROTECTED MODE SWITCHING.  FOR |
      | TUNNELING HOWEVER, NORMAL INTERRUPTS WILL NOT CAUSE A MEMORY |
      | MANAGER TO SWITCH BETWEEN PROCESSOR MODES,  HENCE THE REASON |
      | EMULATORS WILL WORK UNDER MEMORY MANAGERS IN TUNNELERS    :) |
      '--------------------------------------------------------------'
 .----------------------------------------------------------------------------.
 |                                  Part 3                                    |
 |                                                                            |
 |                           ICE complete source                              |
 '----------------------------------------------------------------------------'
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
; ICE, the INTEL complex emulator
;
; tasm /m9 ice.asm
; tlink /3 ice
;
ideal
p386
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
; Macros, used for internal/external stack switching
;
macro ice_switch_to_internal_stack
        mov [cs:ice_reg._ss], ss
        mov [cs:ice_reg._esp], esp  ; save external stack address
        mov [cs:ice_internal_stack.switch], cs
        mov ss, [cs:ice_internal_stack.switch]
        mov esp, [cs:ice_internal_stack.internal_esp]
                ; set stack to internal stack address
        endm
macro ice_switch_to_external_stack
        mov [cs:ice_internal_stack.internal_esp], esp
                                    ; save internal stack offset
        mov ss, [cs:ice_reg._ss]
        mov esp, [cs:ice_reg._esp]  ; set stack to external stack address
        endm
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
; STACK
segment stackers para stack 'stack'
        dw 050h dup (?)
ends stackers
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
; Segment definition... where all our code/data is stored
;
segment ice para public 'code'
        assume cs:ice, ds:ice, es:nothing, ss:stackers
proc ice_setup near
        xor ax, ax
        mov ds, ax
        les ax, [ds:21h*4]
        push cs
        pop ds
        mov [ds:ice_reg._cs], es
        mov [ds:ice_reg._ip], ax
        mov [ds:ice_reg._ah], 31h
        mov [ds:ice_reg._dx], 100h
        mov ax, (offset ice_return)
        pushf
        push cs
        push ax
        mov [ds:ice_reg._ss], ss
        mov [ds:ice_reg._esp], esp
        push cs
        pop ss
        mov esp, (offset ice_internal_stack.top)
        cli
        pushfd
        pop [dword ds:ice_reg._eflags]
        and [byte high word ds:ice_reg._flags], 11111100b
        jmp ice_dispatch
ice_return:
        mov ax, 4c00h
        int 21h
endp ice_setup
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
; ICE Dispatcher
;
proc ice_tf_handler near
        xor ax, ax
        mov [ds:ice_opcode_length], ax
        mov ah, 1
        call ice_int_x
        jmp ice_tf_handled
endp ice_tf_handler
ice_address_removal_process:
        mov [ds:ice_address_override], al
        jmp ice_removal_jump
ice_operand_removal_process:
        mov [ds:ice_operand_override], al
        jmp ice_removal_jump        ; repeat override removal process
ice_repeat_removal_process:
        mov [ds:ice_repeat_override], bl
        jmp ice_removal_jump        ; repeat override removal process
ice_segment_removal_process:
        mov [ds:ice_segment_override], bl
ice_removal_jump:
        inc [ds:ice_reg._ip]        ; increment IP
        jmp ice_segment_removal     ; repeat override removal process
proc ice_dispatch near
        test [byte high word ds:ice_reg._flags], 1
        jnz ice_tf_handler          ; check for TF in emulated flags
ice_tf_handled:
        mov ax, [ds:ice_reg._ip]
        mov [ds:ice_original_ip], ax; save address of _IP before prefix removal
                                    ; begins
        xor eax, eax
        mov [ds:ice_overrides], eax ; clear prefix variables
                                    ; (they are 4 one byters, stored in a row,
                                    ; so we use one doubleword move to clear)
ice_segment_removal:
        les di, [ds:ice_reg._csip]  ; ES:DI=instruction to emulate
ice_breakpoint:
        mov ax, [es:di]             ; get opcode
        mov bx, ax
        and al, 011100111b
        cmp al, 000100110b
        mov al, bl
        je ice_segment_removal_process
        and al, 0feh
        cmp al, 064h
        je ice_segment_removal_process
        cmp al, 0f2h
        je ice_repeat_removal_process
        mov al, bl
        cmp al, 66h
        je ice_operand_removal_process
        cmp al, 67h
        je ice_address_removal_process
        cmp al, 0f0h
        je ice_removal_jump
ice_decode_begin:
        cld
        push bx             ; save original opcode
        mov [ds:ice_current_opcode], ax
        call ice_decoder    ; scan opcode through COS decoder
        push cx             ; save length to copy
        lds si, [ds:ice_reg._csip]
        push cs
        pop es
        mov di, (offset ice_override_buffer)
        mov cx, 5
        mov eax, 90909090h
        rep stosd           ; clear execution buffer with NOP instructions
        pop cx              ; restore length of instruction to copy
        mov di, (offset ice_opcode_buffer)
        rep movsb           ; copy instruction to be emulated into execution
                            ; buffer
ice_copy_complete:
        push cs
        pop ds
        pop ax                          ; original opcode, saved earlier
        les di, [ds:ice_reg._csip]
        mov dl, [ds:ice_operand_override]
        mov [ds:ice_communication], 0   ; clear communication area
        ; On entry to opcode handlers
        ;   AX    = opcode of instruction
        ;   ES:DI = instruction address
        call [ds:ice_handler]           ; call opcode handler
        cli
        mov ax, [ds:ice_opcode_length]
        add [ds:ice_reg._ip], ax        ; increment IP by instruction length
        cmp [ds:ice_communication], 1
        jb ice_dispatch         ; default restart condition, clear old prefixes
                                ; and do TF check
        jmp ice_tf_handled      ; special POPF/IRET condition, skip checking
endp ice_dispatch
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
; ICE COS decoder
;
ice_decoder_extended:
        inc cx          ; increment instruction length
        inc di          ; increment pointer to point to rest of instruction
        mov al, ah
        mov bx, (offset ice_extended_layout)
        mov si, (offset ice_tables._extended)
        jmp ice_decoder_normal_middle
proc ice_decoder near
        xor cx, cx                      ; clear instruction length
        cmp al, 0fh
        je ice_decoder_extended
ice_decoder_normal:
        mov bx, (offset ice_normal_layout)
        mov si, (offset ice_tables._normal)
ice_decoder_normal_middle:
        and ax, 11110000b
        mov dx, ax
        shr al, 4
        add ax, bx
        xchg ax, si
        xor bx, bx
        mov bl, [ds:si]
        add ax, bx
        xchg ax, si
ice_decoder_setup:
        mov ax, [es:di] ; load opcode to compare with table numbers
        mov ah, 0       ; clear top half as it's junk
ice_decoder_loop:
        mov bl, [ds:si]
        and bl, 11100000b
        cmp bl, 01100000b               ; is repeat flag set?
        jne ice_decoder_single          ; no, handle it as a single entry
ice_decoder_repeat:
        mov bl, [ds:si]
        and bl, 11111b                  ; get repeat length
        inc bx                          ; make real repeat length
                        ; get number of opcodes covered by this repeat entry
        add dx, bx      ; table entry = table entry + repeat entries
        inc si          ; point to 'real' opcode entry
        cmp ax, dx      ; is our opcode covered by repeater?
        jb ice_decoder_match    ; yes, decode entry
        jmp ice_decoder_nomatch
ice_decoder_single:
        cmp ax, dx      ; does opcode = table entry?
        je ice_decoder_match    ; yes, decode entry
        inc dx          ; increment table entry number
ice_decoder_nomatch:
        test [byte ds:si], 1000b    ; is procedure entry set?
        jz ice_decoder_skip_entry_easy
        push ax
        mov al, [ds:si]
        and al, 11000000b
        cmp al, 10000000b
        pop ax
        je ice_decoder_skip_entry_easy  ; invalid if group flag set
        inc si
        inc si          ; fixup pointer to skip procedure address
ice_decoder_skip_entry_easy:
        inc si          ; point to next table entry
        jmp ice_decoder_loop            ; test next entry against opcode
endp ice_decoder
proc ice_decoder_groups near
        call ice_decoder_immediates     ; calculate immediates
        mov al, [ds:si]
        and ax, 111000b         ; get group access number
        shr al, 3               ; right-align it
        add ax, (offset ice_groups_layout)
        xchg ax, si
        mov ax, (offset ice_tables._groups)
        xor bx, bx
        mov bl, [byte ds:si]
        add ax, bx
        xchg ax, si             ; get group table address
        mov al, [es:di+1]
        and ax, 111000b
        shr al, 3               ; index into group entry
        xor dx, dx              ; clear table entry number
        jmp ice_decoder_loop    ; decode !
endp ice_decoder_groups
proc ice_decoder_invalid near
        xor cx, cx                      ; length of 0
        mov [ds:ice_handler], offset ice_invalid_opcode
                                        ; use invalid opcode handler
        ret
endp ice_decoder_invalid
proc ice_decoder_fixup_test near
        and ah, 111000b
        jnz ice_decoder_fixup_over
        cmp al, 0f6h
        je ice_decoder_fixup_byte
        cmp [ds:ice_operand_override], 0
        je ice_decoder_fixup_word
        inc cx  ; DWORD fixup
        inc cx
ice_decoder_fixup_word:
        inc cx  ; WORD fixup
ice_decoder_fixup_byte:
        inc cx  ; BYTE fixup
        jmp ice_decoder_fixup_over
endp ice_decoder_fixup_test
proc ice_decoder_match near
        mov bl, [ds:si]         ; get table entry
        and bl, 11000000b
        cmp bl, 10000000b       ; mask for group entry flag
        je ice_decoder_groups   ; convert decoding for group tables
        mov bl, [ds:si]
        and bl, 11100000b
        cmp bl, 01000000b
        je ice_decoder_invalid  ; invalid opcode
        mov bp, (offset ice_generic)    ; use generic opcode handler by default
        test [byte ds:si], 1000b
        jz ice_decoder_match_no_handler
        mov bp, [ds:si+1]       ; use special opcode handler
ice_decoder_match_no_handler:
        mov ax, [es:di]
        cmp al, 0f6h
        je ice_decoder_fixup_test
        cmp al, 0f7h
        je ice_decoder_fixup_test
        cmp al, 0c8h
        je ice_decoder_fixup_byte   ; note that the extended C8h instruction
                                    ; (0FC8) is invalid and won't come by here
                                    ; so this won't stuff it up
ice_decoder_fixup_over:
        mov bl, [ds:si]         ; get table entry again
        and bl, 11110000b       ; get header bits of table entry
        jz ice_decoder_plain            ; just a plain old opcode
        cmp bl, 00010000b
        je ice_decoder_special_address  ; special address opcode
endp ice_decoder_match
proc ice_decoder_modrm near
        inc cx
        mov bl, [es:di+1]
        mov al, bl
        cmp [ds:ice_address_override], 0
        jne ice_decoder_modrm_32        ; use 32-bit MODR/M calculations
        and al, 11000111b
        cmp al, 110b
        je ice_decoder_modrm_big        ; mod=00, r/m=110: direct address = two
        and al, 11000000b
        jz ice_decoder_plain            ; mod=00: memory, no displacement
        cmp al, 01000000b
        je ice_decoder_modrm_small      ; mod=01: 8-bit displacement = one
        cmp al, 10000000b
        jne ice_decoder_plain           ; mod=11: register = no addition
                                        ; mod=10: 16-bit displacement = two
ice_decoder_modrm_big:
        inc cx
ice_decoder_modrm_small:
        inc cx
        jmp ice_decoder_plain
endp ice_decoder_modrm
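; Worked examples for the 16-bit MODR/M length rules above.  This is only an
; illustrative sketch: the byte sequences are standard 8086 encodings picked
; for the example, not entries taken from the ICE tables.  "extra" is the
; number of displacement bytes counted after the MODR/M byte itself.
;
;   8B 07           mov ax, [bx]          mod=00 r/m=111   extra = 0
;   8B 06 34 12     mov ax, [1234h]       mod=00 r/m=110   extra = 2
;   8B 47 10        mov ax, [bx+10h]      mod=01           extra = 1
;   8B 87 34 12     mov ax, [bx+1234h]    mod=10           extra = 2
;   8B C3           mov ax, bx            mod=11           extra = 0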
proc ice_decoder_modrm_32 near
        and al, 11000000b
        cmp al, 11000000b
        je ice_decoder_plain            ; register = no addition
        mov al, bl
        and al, 111b
        cmp al, 100b
        jne ice_decoder_modrm_sib
        inc cx          ; account for Scale/Index/Base byte
        mov al, bl
        and al, 11000000b
        jnz ice_decoder_modrm_sib
        mov al, [es:di+2]
        and al, 111b
        cmp al, 101b
        je ice_decoder_modrm_four_32
ice_decoder_modrm_sib:
        mov al, bl
        and al, 11000111b
        cmp al, 101b
        je ice_decoder_modrm_four_32    ; 32-bit displacement = four addition
        and al, 11000000b
        jz ice_decoder_plain            ; no addition
        cmp al, 10000000b
        jb ice_decoder_modrm_one_32     ; small displacement = one addition
ice_decoder_modrm_four_32:              ; 32-bit displacements
        add cx, 3
ice_decoder_modrm_one_32:               ; 8-bit displacements
        inc cx
        jmp ice_decoder_plain           ; go to immediate data length decoder
endp ice_decoder_modrm_32
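; Worked examples for the 32-bit MODR/M rules above, as they would apply once
; a 67h address-size prefix has been removed.  Again an illustrative sketch:
; the encodings are standard 386 forms chosen for the example, and "extra"
; counts the bytes after the MODR/M byte (SIB byte plus displacement).
;
;   8B 00                  mov ax, [eax]                  mod=00 r/m=000       extra = 0
;   8B 05 78 56 34 12      mov ax, [12345678h]            mod=00 r/m=101       extra = 4
;   8B 04 98               mov ax, [eax+ebx*4]            SIB, base=EAX        extra = 1
;   8B 04 9D 78 56 34 12   mov ax, [ebx*4+12345678h]      SIB, mod=00 base=101 extra = 5
;   8B 44 98 10            mov ax, [eax+ebx*4+10h]        SIB, mod=01          extra = 2
;   8B 84 98 78 56 34 12   mov ax, [eax+ebx*4+12345678h]  SIB, mod=10          extra = 5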
proc ice_decoder_special_address near
        inc cx
        inc cx                          ; word memory address
        cmp [byte ds:ice_address_override], 0
        je ice_decoder_plain
        inc cx
        inc cx                  ; doubleword memory address
endp ice_decoder_special_address
proc ice_decoder_plain near
        call ice_decoder_immediates     ; calculate immediates
        inc cx                          ; instruction size + 1
        mov [ds:ice_opcode_length], cx  ; save opcode length
        mov [ds:ice_handler], bp        ; save opcode handler address
        ret
endp ice_decoder_plain
proc ice_decoder_immediates near
        mov al, [ds:si]
        and ax, 111b
        shl al, 1
        add ax, (offset ice_immediates_table)
        cmp [ds:ice_operand_override], 0
        je ice_decoder_immediates_conversion
        inc ax
ice_decoder_immediates_conversion:
        xchg ax, si
        add cl, [ds:si]
        xchg ax, si
        ret
endp ice_decoder_immediates
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
proc ice_mov_segreg_source near
        and ah, 111000b
        cmp ah, 1000b
        je ice_mov_regmem_cs        ; MOV ?, CS
endp ice_mov_segreg_source
proc ice_mov_segreg_destination near
        and ah, 111000b
        cmp ah, 1000b
        je ice_invalid_opcode       ; MOV CS, ?
        cmp ah, 11000b
        je ice_generic_process_es   ; MOV DS instructions
        cmp ax, 1000010001110b      ; 108Eh: AL = 8Eh, reg field = 010b (SS)
        jne ice_mov_segreg_exit
        inc [ds:ice_communication]  ; MOV SS, ?
ice_mov_segreg_exit:
        jmp ice_generic             ; handle the rest generically
endp ice_mov_segreg_destination
proc ice_mov_regmem_cs near
        cmp [byte high word ds:ice_current_opcode], 11001000b
        je ice_mov_ax_cs
        mov [byte ds:ice_opcode_buffer], 89h
        and [byte ds:ice_opcode_buffer+1], 11000111b
        push [ds:ice_reg._eax]      ; save _EAX
        xor eax, eax
        mov ax, [ds:ice_reg._cs]
        mov [ds:ice_reg._eax], eax  ; _EAX = _CS
        call ice_generic            ; emulate it
        pop [ds:ice_reg._eax]       ; restore _EAX
        ret                         ; exit
endp ice_mov_regmem_cs
proc ice_mov_ax_cs near
        xor eax, eax
        mov ax, [ds:ice_reg._cs]
        cmp [ds:ice_operand_override], 0
        jne ice_mov_ax_cs_32
        mov [ds:ice_reg._ax], ax    ; _AX = _CS
        ret
endp ice_mov_ax_cs
proc ice_mov_ax_cs_32 near
        mov [ds:ice_reg._eax], eax  ; _EAX = 0000 shl 16 + _CS
        ret
endp ice_mov_ax_cs_32
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
proc ice_fault_execute near
        xchg bl, ah
        xor ax, ax
        mov [ds:ice_opcode_length], ax
        mov ax, [ds:ice_original_ip]
        mov [ds:ice_reg._ip], ax
        xchg ah, bl
        jmp ice_int_x
endp ice_fault_execute
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
proc ice_protection_fault near
        mov ah, 13              ; yes, 13, not 13h
        jmp ice_fault_execute
endp ice_protection_fault
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
proc ice_invalid_opcode near
        mov ah, 6
        jmp ice_fault_execute
endp ice_invalid_opcode
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
proc ice_indirect near
        mov al, 10001011b
        and ah, 11000111b
        mov [word ds:ice_opcode_buffer], ax
        push [ds:ice_reg._eax]
        call ice_generic
        mov eax, [ds:ice_reg._eax]
        mov [ds:ice_indirect_saved], eax
        pop [ds:ice_reg._eax]
        mov ax, [ds:ice_current_opcode]
        cmp al, 62h
        je ice_indirect_second
        and ah, 111000b
        cmp ah, 110000b
        je ice_div
        cmp ah, 111000b
        je ice_div
        mov bl, [ds:ice_operand_override]
        cmp ah, 100000b
        je ice_indirect_jmp_near
        cmp ah, 010000b
        jne ice_indirect_second
ice_indirect_call_near:
        or bl, bl
        jnz ice_indirect_call_near_32
        mov ax, [ds:ice_reg._ip]
        add ax, [ds:ice_opcode_length]
        call ice_external_push_16
ice_indirect_jmp_near:
        or bl, bl
        jnz ice_indirect_jmp_near_32
        mov ax, [word low dword ds:ice_indirect_saved]
        mov [ds:ice_reg._ip], ax
        xor ax, ax
        mov [ds:ice_opcode_length], ax
        ret
ice_indirect_call_near_32:
        cmp [word high dword ds:ice_indirect_saved], 0
        jne ice_protection_fault
        xor eax, eax
        mov ax, [ds:ice_reg._ip]
        add ax, [ds:ice_opcode_length]
        call ice_external_push_32
ice_indirect_jmp_near_32:
        mov eax, [ds:ice_indirect_saved]
        cmp eax, 10000h
        jnb ice_protection_fault
        mov [ds:ice_reg._ip], ax
        xor ax, ax
        mov [ds:ice_opcode_length], ax
        ret
ice_indirect_second:
        push [ds:ice_reg._eax]
        mov [byte ds:ice_opcode_buffer], 8dh
        call ice_generic
        mov eax, [ds:ice_reg._eax]
        pop [ds:ice_reg._eax]
        cmp [ds:ice_address_override], 0
        jnz ice_indirect_second_32
        xchg ax, di
        mov al, [ds:ice_segment_override]
        mov bx, [word low dword ds:ice_indirect_saved]
        cmp al, 26h
        je ice_indirect_es
        cmp al, 2eh
        je ice_indirect_cs
        cmp al, 36h
        je ice_indirect_ss
        cmp al, 64h
        je ice_indirect_fs
        cmp al, 65h
        je ice_indirect_gs
ice_indirect_ds:
        mov es, [ds:ice_reg._ds]
        cmp bx, [es:di]
        je ice_indirect_third
ice_indirect_ss:
        mov es, [ds:ice_reg._ss]
        cmp bx, [es:di]
        je ice_indirect_third
ice_indirect_cs:
        mov es, [ds:ice_reg._cs]
        cmp bx, [es:di]
        je ice_indirect_third
ice_indirect_es:
        mov es, [ds:ice_reg._es]
        cmp bx, [es:di]
        je ice_indirect_third
ice_indirect_fs:
        push fs
        pop es
        cmp bx, [es:di]
        je ice_indirect_third
ice_indirect_gs:
        push gs
        pop es
        cmp bx, [es:di]
        jne ice_protection_fault
ice_indirect_third:
        mov cx, [es:di+2]
        mov ax, [ds:ice_current_opcode]
        cmp al, 62h
        je ice_bound
        and ah, 111000b
        cmp ah, 101000b
        je ice_indirect_jmp_far
ice_indirect_call_far:
        mov ax, [ds:ice_reg._cs]
        call ice_external_push_16
        mov ax, [ds:ice_reg._ip]
        add ax, [ds:ice_opcode_length]
        call ice_external_push_16
ice_indirect_jmp_far:
        mov [ds:ice_reg._cs], cx
        mov ax, [word low dword ds:ice_indirect_saved]
        mov [ds:ice_reg._ip], ax
        xor ax, ax
        mov [ds:ice_opcode_length], ax
        ret
ice_indirect_second_32:
        cmp eax, 10000h
        jnb ice_protection_fault
        xchg eax, edi
        mov al, [ds:ice_segment_override]
        mov ebx, [ds:ice_indirect_saved]
        cmp al, 26h
        je ice_indirect_es_32
        cmp al, 2eh
        je ice_indirect_cs_32
        cmp al, 36h
        je ice_indirect_ss_32
        cmp al, 64h
        je ice_indirect_fs_32
        cmp al, 65h
        je ice_indirect_gs_32
ice_indirect_ds_32:
        mov es, [ds:ice_reg._ds]
        cmp ebx, [es:edi]
        je ice_indirect_third_32
ice_indirect_ss_32:
        mov es, [ds:ice_reg._ss]
        cmp ebx, [es:edi]
        je ice_indirect_third_32
ice_indirect_cs_32:
        mov es, [ds:ice_reg._cs]
        cmp ebx, [es:edi]
        je ice_indirect_third_32
ice_indirect_es_32:
        mov es, [ds:ice_reg._es]
        cmp ebx, [es:edi]
        je ice_indirect_third_32
ice_indirect_fs_32:
        push fs
        pop es
        cmp ebx, [es:edi]
        je ice_indirect_third_32
ice_indirect_gs_32:
        push gs
        pop es
        cmp ebx, [es:edi]
        jne ice_protection_fault
ice_indirect_third_32:
        mov ecx, [es:di+4]
        mov ax, [ds:ice_current_opcode]
        cmp al, 62h
        je ice_bound_32
        and ah, 111000b
        cmp ah, 101000b
        je ice_indirect_jmp_far_32
ice_indirect_call_far_32:
        db 66h
        push cs
        pop eax
        mov ax, [ds:ice_reg._cs]
        call ice_external_push_32
        xor eax, eax
        mov ax, [ds:ice_reg._ip]
        add ax, [ds:ice_opcode_length]
        call ice_external_push_32
ice_indirect_jmp_far_32:
        mov [ds:ice_reg._cs], cx
        mov ax, [word low dword ds:ice_indirect_saved]
        mov [ds:ice_reg._ip], ax
        xor ax, ax
        mov [ds:ice_opcode_length], ax
        ret
endp ice_indirect
proc ice_div near
        mov ebx, [ds:ice_indirect_saved]
        cmp al, 0f6h
        jne ice_div_word
ice_div_byte:
        or bl, bl
        jz ice_div_exception
ice_div_okay:
        mov ax, [ds:ice_current_opcode]
        mov [word ds:ice_opcode_buffer], ax
        jmp ice_generic
ice_div_word:
        cmp [ds:ice_operand_override], 0
        jnz ice_div_dword
        or bx, bx
        jnz ice_div_okay
ice_div_exception:
        xor ax, ax
        jmp ice_fault_execute
ice_div_dword:
        or ebx, ebx
        jz ice_div_exception
        jmp ice_div_okay
endp ice_div
proc ice_bound near
        push [ds:ice_reg._ax]
        push cx
        mov al, 89h
        and ah, 111000b
        or ah, 11000000b
        mov [word ds:ice_opcode_buffer], ax
        mov [word ds:ice_opcode_buffer+2], 9090h
        call ice_generic
        mov ax, [ds:ice_reg._ax]
        mov bx, [word low dword ds:ice_indirect_saved]
        pop cx
        pop [ds:ice_reg._ax]
        cmp ax, bx
        jl ice_bound_triggered      ; BOUND compares its operands as signed values
        cmp ax, cx
        jg ice_bound_triggered
        ret
ice_bound_triggered:
        mov ah, 5
        jmp ice_fault_execute
endp ice_bound
proc ice_bound_32 near
        push [ds:ice_reg._eax]
        push ecx
        mov al, 89h
        and ah, 111000b
        or ah, 11000000b
        mov [word ds:ice_opcode_buffer], ax
        mov [dword ds:ice_opcode_buffer+2], 90909090h
        call ice_generic
        mov eax, [ds:ice_reg._eax]
        mov ebx, [ds:ice_indirect_saved]
        pop ecx
        pop [ds:ice_reg._eax]
        cmp eax, ebx
        jl ice_bound_triggered      ; BOUND compares its operands as signed values
        cmp eax, ecx
        jg ice_bound_triggered
        ret
endp ice_bound_32
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
proc ice_direct_call_far near
        or dl, dl
        jnz ice_direct_call_far_32
        mov ax, [ds:ice_reg._cs]
        call ice_external_push_16
        mov ax, [ds:ice_reg._ip]
        add ax, 5
        call ice_external_push_16
proc ice_direct_jmp_far near
        or dl, dl
        jnz ice_direct_jmp_far_32
        mov ax, [word es:di+1]
        mov [ds:ice_reg._ip], ax
        mov ax, [word es:di+3]
        mov [ds:ice_reg._cs], ax
        dec [ds:ice_opcode_length]
        ret
endp ice_direct_jmp_far
endp ice_direct_call_far
proc ice_direct_call_far_32 near
        cmp [word high dword es:di+1], 0
        jnz ice_protection_fault
        db 66h
        push cs
        pop eax
        mov ax, [ds:ice_reg._cs]
        call ice_external_push_32
        xor eax, eax
        mov ax, [ds:ice_reg._ip]
        add ax, 7
        call ice_external_push_32
proc ice_direct_jmp_far_32 near
        mov eax, [es:di+1]
        cmp eax, 10000h
        jnb ice_protection_fault
        mov [ds:ice_reg._ip], ax
        mov ax, [word es:di+5]
        mov [ds:ice_reg._cs], ax
        dec [ds:ice_opcode_length]
        ret
endp ice_direct_jmp_far_32
endp ice_direct_call_far_32
proc ice_direct_call_near near
        or dl, dl
        jnz ice_direct_call_near_32
        mov ax, [ds:ice_reg._ip]
        add ax, 3
        call ice_external_push_16
proc ice_direct_jmp_near near
        or dl, dl
        jnz ice_direct_jmp_near_32
        mov ax, [es:di+1]
        add [ds:ice_reg._ip], ax
        ret
endp ice_direct_jmp_near
endp ice_direct_call_near
proc ice_direct_call_near_32 near
        xor eax, eax
        mov ax, [ds:ice_reg._ip]
        add ax, 5
        push eax
        add eax, [es:di+1]
        cmp eax, 10000h
        pop eax
        jnb ice_protection_fault
        call ice_external_push_32
proc ice_direct_jmp_near_32 near
        xor eax, eax
        mov ax, [ds:ice_reg._ip]
        add eax, [es:di+1]
        add eax, 5
        cmp eax, 10000h
        jnb ice_protection_fault
        mov [ds:ice_reg._ip], ax
        dec [ds:ice_opcode_length]
        ret
endp ice_direct_jmp_near_32
endp ice_direct_call_near_32
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
proc ice_into near
        mov ah, 4
        test [byte high word ds:ice_reg._flags], 1000b
        jnz ice_int_x       ; emulate interrupt if emulated overflow flag set
        ret                 ; else just skip the interrupt
endp ice_into
proc ice_int_3 near
        mov ah, 3           ; emulate INT 3 instruction (length of 1 already in
                            ; the ice_opcode_length variable)
proc ice_int_x near
        xchg ax, bx         ; BX holds interrupt to emulate
        mov ax, [ds:ice_reg._flags]
        call ice_external_push_16   ; save emulated flags on external stack
        mov ax, [ds:ice_reg._cs]
        call ice_external_push_16   ; save emulated CS on external stack
        mov ax, [ds:ice_reg._ip]
        add ax, [ds:ice_opcode_length]
        call ice_external_push_16   ; save emulated return IP on external stack
        and [byte high word ds:ice_reg._flags], 11111100b
                                    ; clear emulated IF and TF
        xor ax, ax
        mov di, ax              ; DI = 0
        mov al, bh              ; AL = INT to emulate
        shl ax, 2               ; AX = INT * 4
        xchg ax, di
        mov es, ax              ; ES = 0, DI = INT * 4
        mov ax, [word es:di]    ; get offset of interrupt code
        mov [ds:ice_reg._ip], ax; update emulated IP
        mov ax, [word es:di+2]  ; get segment of interrupt code
        mov [ds:ice_reg._cs], ax; update emulated CS
        xor ax, ax
        mov [ds:ice_opcode_length], ax  ; clear opcode length as IP is already
                                        ; set properly
        ret
endp ice_int_x
endp ice_int_3
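; Worked example of the vector lookup above (an illustration only, with an
; arbitrarily chosen interrupt number): for INT 21h, BH = 21h after the XCHG,
; so DI = 21h * 4 = 0084h and ES = 0000h.  The word at 0000:0084h becomes the
; emulated IP and the word at 0000:0086h becomes the emulated CS.  FLAGS, CS
; and the return IP have already been pushed in that order, so a later IRET
; pops IP, CS and FLAGS.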
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
proc ice_ret_near_value
        mov bx, [es:di+1]       ; get value to add to eSP
        jmp ice_ret_near_skip
endp ice_ret_near_value
proc ice_ret_near
        xor bx, bx              ; value to add to eSP is 0
ice_ret_near_skip:
        or dl, dl
        jnz ice_ret_near_32     ; 32-bit RET NEAR
        call ice_external_pop_16; get new IP
        mov [ds:ice_reg._ip], ax; set new IP
        jmp ice_ret_exit
ice_ret_near_32:
        call ice_external_pop_32    ; get new IP
        cmp eax, 10000h
        jnb ice_ret_exception       ; emulate exception if invalid return IP
        mov [ds:ice_reg._ip], ax    ; set new IP
endp ice_ret_near
proc ice_ret_exit near
        dec [ds:ice_opcode_length]  ; instruction length = 0, for dispatcher
        or dl, dl
        jnz ice_retn_exit_32        ; 32-bit operand size: update ESP instead
        add [ds:ice_reg._sp], bx    ; update SP
        ret
ice_retn_exit_32:
        xor eax, eax
        mov ax, bx
        add [ds:ice_reg._esp], eax  ; update ESP
        ret
endp ice_ret_exit
proc ice_ret_exception near
        call ice_external_push_32   ; a protection fault in a RET must leave a
                                    ; valid stack behind, so push the invalid
                                    ; return address back onto the external
                                    ; stack before raising the fault
        jmp ice_protection_fault
endp ice_ret_exception
proc ice_ret_far_value near
        mov bx, [es:di+1]       ; get value to add to eSP
        jmp ice_ret_far_skip
endp ice_ret_far_value
proc ice_ret_far near
        xor bx, bx              ; value to add to eSP is 0
ice_ret_far_skip:
        or dl, dl
        jnz ice_ret_far_32      ; 32-bit RET FAR
        call ice_external_pop_16
        mov [ds:ice_reg._ip], ax; save new IP
        call ice_external_pop_16
        mov [ds:ice_reg._cs], ax; save new CS
        jmp ice_ret_exit
endp ice_ret_far
proc ice_ret_far_32 near
        call ice_external_pop_32; get new IP
        cmp eax, 10000h
        jnb ice_ret_exception   ; emulate exception if it's invalid
        mov [ds:ice_reg._ip], ax; save new IP
        call ice_external_pop_32
        mov [ds:ice_reg._cs], ax; save new CS
        jmp ice_ret_exit
endp ice_ret_far_32
proc ice_iret
        dec [ds:ice_opcode_length]  ; set opcode length to 0
        or dl, dl
        jnz ice_iret_32             ; use 32-bit IRET
        call ice_external_pop_16
        mov [ds:ice_reg._ip], ax    ; save new IP
        call ice_external_pop_16
        mov [ds:ice_reg._cs], ax    ; save new CS
        jmp ice_popf                ; emulate POPF
ice_iret_32:
        call ice_external_pop_32    ; get new IP
        cmp eax, 10000h
        jnb ice_ret_exception       ; emulate exception if it's invalid
        mov [ds:ice_reg._ip], ax    ; set new IP
        call ice_external_pop_32
        mov [ds:ice_reg._cs], ax    ; set new CS
        jmp ice_popf                ; emulate POPF[D]
endp ice_iret
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
proc ice_pushf near
        mov eax, [ds:ice_reg._eflags]   ; get the flags
        or dl, dl
        jnz ice_pushfd
        call ice_external_push_16   ; push them onto external stack (word)
        ret
proc ice_pushfd near
        call ice_external_push_32   ; push them onto external stack (double)
        ret
endp ice_pushfd
endp ice_pushf
proc ice_popf near
        mov bx, [ds:ice_reg._flags] ; get a copy of the flags
        or dl, dl
        jnz ice_popfd
        call ice_external_pop_16    ; get the new copy of the flags
        mov [ds:ice_reg._flags], ax ; save them into the emulated flags
        jmp ice_popf_single_step
proc ice_popfd near
        call ice_external_pop_32        ; get the new copy of the flags
        mov [ds:ice_reg._eflags], eax   ; save them into the emulated flags
ice_popf_single_step:
        and bh, 1
        jnz ice_popf_exit           ; exit if TF was originally SET
        and ah, 1
        jz ice_popf_exit            ; exit if TF is now CLEAR
        inc [ds:ice_communication]  ; TF transition from OFF to ON, skip TF check
                                    ; for one instruction pass
ice_popf_exit:
        ret                         ; POPF emulation finished
endp ice_popfd
endp ice_popf
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
proc ice_loop near      ; DEC CX, JNZ X
        or dl, dl
        jnz ice_loop_ecx
        dec [ds:ice_reg._cx]
        jnz ice_jmp_conditional_short_follow
        ret
ice_loop_ecx:           ; DEC ECX, JNZ X
        dec [ds:ice_reg._ecx]
        jnz ice_jmp_conditional_short_follow
        ret
endp ice_loop
proc ice_loope near
        test [byte low word ds:ice_reg._flags], 1000000b
        jnz ice_loop    ; use normal LOOP procedure if ZF set
        jmp ice_loop_dec    ; decrement eCX anyway
endp ice_loope
proc ice_loopne near
        test [byte low word ds:ice_reg._flags], 1000000b
        jz ice_loop     ; use normal LOOP procedure if ZF clear
        jmp ice_loop_dec    ; decrement eCX anyway
endp ice_loopne
proc ice_loop_dec near
        or dl, dl
        jnz ice_loope_ecx
        dec [ds:ice_reg._cx]    ; decrement CX
        ret
ice_loope_ecx:
        dec [ds:ice_reg._ecx]   ; decrement ECX
        ret
endp ice_loop_dec
proc ice_jcxz near
        mov eax, [ds:ice_reg._ecx]
        or dl, dl
        jnz ice_jcxz_ecx
        or ax, ax       ; follow short jump if CX was 0
        jz ice_jmp_conditional_short_follow
        ret
ice_jcxz_ecx:
        or eax, eax     ; follow short jump if ECX was 0 (emulated ECX is in EAX)
        jz ice_jmp_conditional_short_follow
        ret
endp ice_jcxz
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
proc ice_aam near
        or ah, ah
        jz ice_div_exception        ; emulate a DIV exception
        jmp ice_generic             ; emulate AAM generically
endp ice_aam
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
proc ice_pop_segreg near
        cmp al, 17h
        jne ice_pop_segreg_exit     ; is it POP SS?
        inc [ds:ice_communication]  ; if so, skip single step handler on return
ice_pop_segreg_exit:
        jmp ice_generic             ; use generic handler for opcode anyway
endp ice_pop_segreg
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
proc ice_jmp_conditional_short near
        mov [byte ds:ice_jmp_conditional_short_modify], al
        db 0ebh, 00
        mov ebx, [ds:ice_reg._eflags]
        and bh, 11111110b
        push ebx
        popfd
ice_jmp_conditional_short_modify:
        jc ice_jmp_conditional_short_follow
        ret
ice_jmp_conditional_short_follow:
        mov al, [es:di+1]
        cbw
        add [ds:ice_reg._ip], ax
        ret
endp ice_jmp_conditional_short
proc ice_jmp_conditional_long near
        mov [byte high word ds:ice_jmp_conditional_long_modify], ah
        db 0ebh, 00
        mov ebx, [ds:ice_reg._eflags]
        and bh, 11111110b
        push ebx
        popfd
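ice_jmp_conditional_long_modify:
        dw 0fh          ; bytes 0Fh, 00h: the 00h is patched above with the
                        ; condition byte (8xh) of the emulated Jcc
        dw 1            ; rel16 = +1, so a taken jump skips the one-byte RET
        ret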
ice_jmp_conditional_long_follow:
        or dl, dl
        jnz ice_jmp_conditional_long_32
        mov ax, [es:di+2]
        add [ds:ice_reg._ip], ax
        ret
endp ice_jmp_conditional_long
proc ice_jmp_conditional_long_32 near
        xor eax, eax
        mov ax, [ds:ice_opcode_length]
        add ax, [ds:ice_reg._ip]
        add eax, [es:di+2]
        cmp eax, 10000h
        jnb ice_protection_fault
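        mov [ds:ice_reg._ip], ax
        xor ax, ax
        mov [ds:ice_opcode_length], ax  ; length is already included in the new
                                        ; IP, so the dispatcher must not add it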
        ret
endp ice_jmp_conditional_long_32
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
proc ice_push_segreg near
        cmp al, 0eh
        jne ice_generic             ; not PUSH CS?  exit!
        db 66h
        push cs
        pop eax
        mov ax, [ds:ice_reg._cs]    ; determine the complete emulated CS
        or dl, dl
        jnz ice_push_segreg_32      ; go to 32-bit version if operand size
                                    ; prefix is present
        call ice_external_push_16   ; push 16-bit emulated CS
        ret
ice_push_segreg_32:
        call ice_external_push_32   ; push 32-bit emulated CS
        ret
endp ice_push_segreg
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
; 16-bit external stack push from AX
;
proc ice_external_push_16 near
        push es
        push edi
        les edi, [ds:ice_reg._ssesp]
        dec di
        dec di
        mov [es:di], ax
        mov [ds:ice_reg._sp], di
        pop edi
        pop es
        ret
endp ice_external_push_16
; 16-bit external stack pop into AX
;
proc ice_external_pop_16 near
        cld
        push ds
        push esi
        lds esi, [ds:ice_reg._ssesp]
        lodsw
        mov [cs:ice_reg._sp], si
        pop esi
        pop ds
        ret
endp ice_external_pop_16
; 32-bit external stack push from EAX
;
proc ice_external_push_32 near
        push es
        push edi
        les edi, [ds:ice_reg._ssesp]
        sub edi, 4
        mov [es:edi], eax
        mov [ds:ice_reg._esp], edi
        pop edi
        pop es
        ret
endp ice_external_push_32
; 32-bit external stack pop into EAX
;
proc ice_external_pop_32 near
        cld
        push ds
        push esi
        lds esi, [ds:ice_reg._ssesp]
        lodsd
        mov [cs:ice_reg._esp], esi
        pop esi
        pop ds
        ret
endp ice_external_pop_32
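; Worked example for the external stack helpers above (the register values
; are assumed purely for illustration): with emulated SS:SP = 1000h:0100h,
; ice_external_push_16 stores AX at 1000h:00FEh and sets _SP to 00FEh.  A
; following ice_external_pop_16 reads that word back from 1000h:00FEh and
; returns _SP to 0100h.  The 32-bit versions move the pointer by 4 instead
; of 2.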
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
proc ice_generic_process_es near
        cmp [ds:ice_segment_override], 2eh
        jne ice_generic_main
        mov [ds:ice_cs_swapped], 2
        mov [ds:ice_segment_override], 26h
proc ice_generic near
        cmp [ds:ice_segment_override], 2eh
        jne ice_generic_main
        mov [ds:ice_cs_swapped], 1
        mov [ds:ice_segment_override], 3eh
ice_generic_main:
        push [ds:ice_reg._flags]
        and [byte high word ds:ice_reg._flags], 11111110b
        push ds
        pop es
        std
        mov si, (offset ice_overrides+3)
        mov di, (offset ice_override_buffer+3)
        lodsb
        or al, al
        jz ice_generic_no_segment
        stosb
ice_generic_no_segment:
        lodsb
        or al, al
        jz ice_generic_no_repeat
        stosb
ice_generic_no_repeat:
        lodsb
        or al, al
        jz ice_generic_no_operand
        stosb
ice_generic_no_operand:
        lodsb
        or al, al
        jz ice_generic_no_address
        stosb
ice_generic_no_address:
        mov eax, [ds:ice_reg._eax]
        mov ebx, [ds:ice_reg._ebx]
        mov ecx, [ds:ice_reg._ecx]
        mov edx, [ds:ice_reg._edx]
        mov edi, [ds:ice_reg._edi]
        mov esi, [ds:ice_reg._esi]
        mov ebp, [ds:ice_reg._ebp]
        mov es, [ds:ice_reg._es]
        mov ds, [ds:ice_reg._ds]
        cmp [cs:ice_cs_swapped], 0
        je ice_generic_swapped
        cmp [cs:ice_cs_swapped], 2
        je ice_generic_swap_es
ice_generic_swap_ds:
        mov ds, [cs:ice_reg._cs]
        jmp ice_generic_swapped
ice_generic_swap_es:
        mov es, [cs:ice_reg._cs]
ice_generic_swapped:
        push [cs:ice_reg._eflags]
        popfd
        ice_switch_to_external_stack
align 4
ice_override_buffer db 4 dup (90h)
ice_opcode_buffer db 10h dup (90h)
        ice_switch_to_internal_stack
        pushfd
        pop [cs:ice_reg._eflags]
        cmp [cs:ice_cs_swapped], 0
        je ice_generic_save_both
        cmp [cs:ice_cs_swapped], 1
        je ice_generic_restore_ds
        mov es, [cs:ice_reg._es]
        jmp ice_generic_save_both
ice_generic_restore_ds:
        mov ds, [cs:ice_reg._ds]
ice_generic_save_both:
        mov [cs:ice_reg._ds], ds
        push cs
        pop ds
        mov [ds:ice_reg._es], es
        mov [ds:ice_reg._eax], eax
        mov [ds:ice_reg._ebx], ebx
        mov [ds:ice_reg._ecx], ecx
        mov [ds:ice_reg._edx], edx
        mov [ds:ice_reg._edi], edi
        mov [ds:ice_reg._esi], esi
        mov [ds:ice_reg._ebp], ebp
        cmp [ds:ice_cs_swapped], 0
        je ice_generic_exit
        mov [ds:ice_segment_override], 2eh
ice_generic_exit:
        pop ax                          ; flags word saved on entry
        and ah, 1                       ; isolate the original TF bit
        or [byte high word ds:ice_reg._flags], ah ; and put it back
        mov [ds:ice_cs_swapped], 0
proc ice_skip near
        ret
endp ice_skip
endp ice_generic
endp ice_generic_process_es
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
; STRUC definitions
;
; STRUC for our internal 32-bit stacks
;
struc ice_stack_struc
      internal_esp   dd 0
      switch dw 0
          label bottom
      dw 50h dup(0)
          label top
ends ice_stack_struc
; STRUC for immediate tables
;
struc ice_immediates_table_struc
    db 0, 0
    db 1, 1
    db 2, 2
    db 4, 4
    db 6, 6
    db 1, 2
    db 2, 4
    db 4, 6
ends ice_immediates_table_struc
; STRUC for group layouts
;
struc ice_groups_layout_struc
        db (offset ice_tables._group_0 - offset ice_tables._groups)
        db (offset ice_tables._group_1 - offset ice_tables._groups)
        db (offset ice_tables._group_2 - offset ice_tables._groups)
        db (offset ice_tables._group_3 - offset ice_tables._groups)
        db (offset ice_tables._group_4 - offset ice_tables._groups)
        db (offset ice_tables._group_5 - offset ice_tables._groups)
        db (offset ice_tables._group_6 - offset ice_tables._groups)
        db (offset ice_tables._group_7 - offset ice_tables._groups)
ends ice_groups_layout_struc
; STRUC for extended layouts
;
struc ice_extended_layout_struc
        db (offset ice_tables._extended_0 - offset ice_tables._extended)
        db (offset ice_tables._extended_1 - offset ice_tables._extended)
        db (offset ice_tables._extended_2 - offset ice_tables._extended)
        db (offset ice_tables._extended_3 - offset ice_tables._extended)
        db (offset ice_tables._extended_4 - offset ice_tables._extended)
        db (offset ice_tables._extended_5 - offset ice_tables._extended)
        db (offset ice_tables._extended_6 - offset ice_tables._extended)
        db (offset ice_tables._extended_7 - offset ice_tables._extended)
        db (offset ice_tables._extended_8 - offset ice_tables._extended)
        db (offset ice_tables._extended_9 - offset ice_tables._extended)
        db (offset ice_tables._extended_a - offset ice_tables._extended)
        db (offset ice_tables._extended_b - offset ice_tables._extended)
        db (offset ice_tables._extended_c - offset ice_tables._extended)
        db (offset ice_tables._extended_d - offset ice_tables._extended)
        db (offset ice_tables._extended_e - offset ice_tables._extended)
        db (offset ice_tables._extended_f - offset ice_tables._extended)
ends ice_extended_layout_struc
; STRUC for normal layouts
;
struc ice_normal_layout_struc
        db (offset ice_tables._normal_0 - offset ice_tables._normal)
        db (offset ice_tables._normal_1 - offset ice_tables._normal)
        db (offset ice_tables._normal_2 - offset ice_tables._normal)
        db (offset ice_tables._normal_3 - offset ice_tables._normal)
        db (offset ice_tables._normal_4 - offset ice_tables._normal)
        db (offset ice_tables._normal_5 - offset ice_tables._normal)
        db (offset ice_tables._normal_6 - offset ice_tables._normal)
        db (offset ice_tables._normal_7 - offset ice_tables._normal)
        db (offset ice_tables._normal_8 - offset ice_tables._normal)
        db (offset ice_tables._normal_9 - offset ice_tables._normal)
        db (offset ice_tables._normal_a - offset ice_tables._normal)
        db (offset ice_tables._normal_b - offset ice_tables._normal)
        db (offset ice_tables._normal_c - offset ice_tables._normal)
        db (offset ice_tables._normal_d - offset ice_tables._normal)
        db (offset ice_tables._normal_e - offset ice_tables._normal)
        db (offset ice_tables._normal_f - offset ice_tables._normal)
ends ice_normal_layout_struc
; STRUC for our simulated CPU registers
;
struc ice_register_struc
      label _eax dword
      label _ax word
      label _al byte
            db 0
      label _ah byte
            db 0
            dw 0
      label _ebx dword
      label _bx word
      label _bl byte
            db 0
      label _bh byte
            db 0
            dw 0
      label _ecx dword
      label _cx word
      label _cl byte
            db 0
      label _ch byte
            db 0
            dw 0
      label _edx dword
      label _dx word
      label _dl byte
            db 0
      label _dh byte
            db 0
            dw 0
      label _edi dword
      label _di word
            dw 0
            dw 0
      label _esi dword
      label _si word
            dw 0
            dw 0
      label _ebp dword
      label _bp word
            dw 0
            dw 0
      label _csip dword
      label _ip word
            dw 0
      _cs   dw 0
      label _ssesp fword
      label _esp dword
      label _sp word
            dd 0
      _ss   dw 0
      _es   dw 0
      _ds   dw 0
      label _eflags dword
      label _flags word
            dd 0
ends ice_register_struc
; STRUC for COS database tables
;
struc ice_tables_struc
    label _normal unknown
        label _normal_0 unknown
        label _normal_1 unknown
        label _normal_2 unknown
        label _normal_3 unknown
            db 063h, 0c0h, 001h, 006h, 000h, 008h
            dw offset ice_pop_segreg
            db 063h, 0c0h, 001h, 006h, 008h
            dw offset ice_push_segreg
            db 000h
        label _normal_4 unknown
        label _normal_5 unknown
            db 06fh, 000h
        label _normal_6 unknown
            db 000h, 000h, 0e8h
            dw offset ice_indirect
            db 0f0h, 063h, 000h, 006h, 0c6h, 001h, 0c1h, 063h, 000h
        label _normal_7 unknown
            db 06fh, 009h
            dw offset ice_jmp_conditional_short
        label _normal_8 unknown
            db 081h, 086h, 040h, 081h, 067h, 0c0h, 0c8h
            dw offset ice_mov_segreg_source
            db 0e0h, 0c8h
            dw offset ice_mov_segreg_destination
            db 0c0h
        label _normal_9 unknown
            db 069h, 000h, 008h
            dw offset ice_direct_call_far
            db 000h, 008h
            dw offset ice_pushf
            db 008h
            dw offset ice_popf
            db 000h, 000h
        label _normal_a unknown
            db 063h, 010h, 063h, 000h, 001h, 006h, 065h, 000h
        label _normal_b unknown
            db 067h, 001h, 067h, 006h
        label _normal_c unknown
            db 089h, 089h, 008h
            dw offset ice_ret_near_value
            db 008h
            dw offset ice_ret_near
            db 0e0h, 0e8h
            dw offset ice_generic_process_es
            db 0c1h, 0c6h, 002h, 000h, 008h
            dw offset ice_ret_far_value
            db 008h
            dw offset ice_ret_far
            db 008h
            dw offset ice_int_3
            db 009h
            dw offset ice_int_x
            db 008h
            dw offset ice_into
            db 008h
            dw offset ice_iret
        label _normal_d unknown
            db 063h, 088h, 009h
            dw offset ice_aam
            db 001h, 040h, 000h, 067h, 0c0h
        label _normal_e unknown
            db 009h
            dw offset ice_loopne
            db 009h
            dw offset ice_loope
            db 009h
            dw offset ice_loop
            db 009h
            dw offset ice_jcxz
            db 063h, 001h, 00eh
            dw offset ice_direct_call_near
            db 00eh
            dw offset ice_direct_jmp_near
            db 008h
            dw offset ice_direct_jmp_far
            db 009h
            dw offset ice_jmp_conditional_short_follow
            db 063h, 000h
        label _normal_f unknown
            db 000h, 040h, 000h, 000h, 008h
            dw offset ice_skip
            db 000h, 090h, 090h, 065h, 000h, 098h, 0a0h
    label _extended unknown
        label _extended_0 unknown
            db 0a8h, 0b0h, 0c0h, 0c0h, 040h, 040h, 000h, 068h, 040h
        label _extended_1 unknown
            db 06fh, 040h
        label _extended_2 unknown
            db 064h, 0f0h, 040h, 0f0h, 068h, 040h
        label _extended_3 unknown
        label _extended_4 unknown
        label _extended_5 unknown
        label _extended_6 unknown
        label _extended_7 unknown
            db 06fh, 040h
        label _extended_8 unknown
            db 06fh, 00eh
            dw offset ice_jmp_conditional_long
        label _extended_9 unknown
            db 06fh, 0c0h
        label _extended_a unknown
            db 000h,000h,040h,0c0h,0c1h,0c0h,040h,040h,000h,000h,040h,0c0h,0c1h,0c0h,040h,0c0h
        label _extended_b unknown
            db 040h,040h,0e0h,0c0h,0e0h,0e0h,0c0h,0c0h,040h,040h,0b9h,064h,0c0h
        label _extended_c unknown
        label _extended_d unknown
        label _extended_e unknown
        label _extended_f unknown
            db 06fh, 040h
    label _groups unknown
        label _group_0 unknown
            db 067h, 0c0h
        label _group_1 unknown
            db 065h, 0c0h, 040h, 0c0h
        label _group_2 unknown
            db 0c0h, 040h, 063h, 0c0h, 061h, 0c8h
            dw offset ice_indirect
        label _group_3 unknown
            db 0c0h, 0c0h, 065h, 040h
        label _group_4 unknown
            db 0c0h, 0c0h, 063h, 0c8h
            dw offset ice_indirect
            db 0c0h, 040h
        label _group_5 unknown
            db 065h, 0c0h, 040h, 040h
        label _group_6 unknown
            db 064h, 0c0h, 040h, 0c0h, 040h
        label _group_7 unknown
            db 063h, 040h, 063h, 0c0h
ends ice_tables_struc
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
align 4
ice_tables ice_tables_struc <>
align 4
ice_normal_layout ice_normal_layout_struc <>
align 4
ice_extended_layout ice_extended_layout_struc <>
align 4
ice_groups_layout ice_groups_layout_struc <>
align 4
ice_immediates_table ice_immediates_table_struc <>
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
align 4
ice_reg ice_register_struc <>
align 4
ice_internal_stack ice_stack_struc <>
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
align 4
ice_indirect_saved dd 0
label ice_overrides dword
    ice_repeat_override  db 0
    ice_segment_override db 0
    ice_address_override db 0
    ice_operand_override db 0
ice_current_opcode dw 0
ice_opcode_length dw 0
ice_original_ip dw 0
ice_handler dw 0
ice_cs_swapped db 0
ice_communication db 0
; =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
ends ice
    end ice_setup
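    For readers who find the TASM hard to follow, here is a rough Python model
of ice_register_struc and of the 16-bit external push/pop helpers from the
listing above.  It is only a sketch under assumptions: the class and function
names are made up for illustration, the offsets are worked out from the
db/dw/dd sizes in the STRUC, and memory is a plain dictionary of bytes rather
than real segments.

```python
import struct

# Byte offsets inside ice_register_struc, worked out from the STRUC above.
# The Python names are this sketch's own, not part of ICE.
OFFSETS = {
    "eax": 0, "ebx": 4, "ecx": 8, "edx": 12,
    "edi": 16, "esi": 20, "ebp": 24,
    "ip": 28, "cs": 30,       # _csip: IP word, then CS word
    "esp": 32, "ss": 36,      # _ssesp: ESP dword, then SS word
    "es": 38, "ds": 40, "eflags": 42,
}
STRUC_SIZE = 46
FORMATS = {1: "<B", 2: "<H", 4: "<I"}


class RegFile:
    """Flat byte image of the emulated registers, like ice_reg."""

    def __init__(self):
        self.raw = bytearray(STRUC_SIZE)

    def get(self, name, size):
        return struct.unpack_from(FORMATS[size], self.raw, OFFSETS[name])[0]

    def put(self, name, size, value):
        mask = (1 << (8 * size)) - 1
        struct.pack_into(FORMATS[size], self.raw, OFFSETS[name], value & mask)


def linear(segment, offset):
    """Real-mode address: segment * 16 + 16-bit offset."""
    return (segment << 4) + (offset & 0xFFFF)


def external_push_16(regs, memory, value):
    """Model of ice_external_push_16: SP -= 2, store word at SS:SP."""
    sp = (regs.get("esp", 2) - 2) & 0xFFFF          # dec di / dec di
    ss = regs.get("ss", 2)
    memory[linear(ss, sp)] = value & 0xFF
    memory[linear(ss, sp + 1)] = (value >> 8) & 0xFF
    regs.put("esp", 2, sp)                          # only _sp is written back


def external_pop_16(regs, memory):
    """Model of ice_external_pop_16: load word at SS:SP, SP += 2."""
    sp = regs.get("esp", 2)
    ss = regs.get("ss", 2)
    low = memory.get(linear(ss, sp), 0)
    high = memory.get(linear(ss, sp + 1), 0)
    regs.put("esp", 2, (sp + 2) & 0xFFFF)           # lodsw advances SI by 2
    return low | (high << 8)
```

    Note how the 16-bit helpers only ever touch the low word of the emulated
ESP, just as the assembly writes _sp and leaves the rest of _esp alone.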
 .----------------------------------------------------------------------------.
 |                                  Part 4                                    |
 |                                                                            |
 |                            Anti-emulation methods                          |
 '----------------------------------------------------------------------------'
    Emulation is supposed to be the be-all and end-all of tunneling methods,
but emulation, just like all other tunneling methods, is not perfect, and is
actually quite far from perfect, as you will soon discover.
    .-------------------------------------.
    | Anti-emulation with invalid opcodes |
    '-------------------------------------'
    The easiest way to detect a stupid emulator is to hook the invalid opcode
interrupt and execute an invalid opcode.  Of course, this will not work on 8086
processors as they just hang on invalid opcodes, however on a 286+, normally,
your hooked i6 routine will be executed.  ICE emulates the i6 exception
properly, however some other emulation systems may not do this.  Some emulation
systems (such as those used in AV) might even just abort straight away on
invalid opcodes... nice eh?  Protected mode instructions in real-mode have the
same effect, however they look legitimate so they might not be able to be
flagged by AV products heuristically like they do with other invalid opcodes.
    .-------------------------.
    | Anti-emulation with AAM |
    '-------------------------'
    This is an easy way to stop even good BCE and SCCE systems.  Normally, AAM
will come in the opcode form of D40A... however by changing the opcode to D400,
the AAM instruction, if emulated in a BCE, will cause a divide by 0 exception,
whose interrupt you can hook before emulation of the instruction.
    In an SCCE system however, the emulator may only check the first byte of
the opcode and, if it matches, emulate a proper D40A AAM opcode.  In this case,
if you clear a flag before execution of the opcode, and then set the flag
inside your divide-by-0 exception handler... after that opcode, if you check
the flag and it isn't set, you're definitely under an emulation system.
    The only problem with this particular method is that some clone CPUs may
execute AAM in exactly the manner of an SCCE, and may not execute the divide by
0 exception.  Another problem is that the emulation system designers saw this
trick coming, and emulate your exception handler, which is what ICE does, or
hook the divide-by-0 exception for themselves like ART does (this has problems,
see the INDIRECT handlers section for more detail).
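    Seen from the emulator designer's side, the fix is simply to honour the
second opcode byte.  The Python below is a minimal sketch of that idea, not
ICE's actual ice_aam handler: the function names and the DivideError class are
invented for illustration, and a real emulator would dispatch to the emulated
interrupt 0 handler where this sketch raises an exception.

```python
class DivideError(Exception):
    """Stands in for the CPU's divide error exception (interrupt 0)."""


def aam(al, imm8=0x0A):
    """Emulate D4 ib: AH = AL / imm8, AL = AL mod imm8.

    Returns (ah, al).  Raises DivideError when imm8 is zero, which is the
    case an emulator has to route to the emulated interrupt 0 handler.
    """
    al &= 0xFF
    imm8 &= 0xFF
    if imm8 == 0:
        raise DivideError("AAM with a zero immediate")
    return al // imm8, al % imm8


def decode_and_run_aam(code, al):
    """Decode a two-byte AAM from `code`, honouring the second byte."""
    if code[0] != 0xD4:
        raise ValueError("not an AAM opcode")
    return aam(al, code[1])
```

    An emulator that hard-codes the divisor as 0AH would return a result for
D400 instead of raising, which is exactly the difference described above.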
    .---------------------------.
    | Anti-emulation with FLAGS |
    '---------------------------'
    Another little trick is to set INTEL undefined bits in the flags register
which aren't used by any current processor.  If the emulation system is stupid,
it might allow you to set some of the bitfields which INTEL has left undefined,
and wouldn't normally allow you to change.
    ICE has this problem, however it has it in a different way which you have
to check for specifically to catch out.  Although your PUSHF is copied directly
into the emulated register structure, once inside the generic opcode emulation
routine, your newly set flags are corrected by the CPU.  So to detect ICE one
would have to emulate PUSHF/POPF directly after each other to detect the flaws
in the flags :)
    This is a very good way to detect SCCE systems, as they may not correct the
flags like a BCE does.  Unfortunately, clone CPUs and even newer INTEL CPUs may
use these undefined fields for their own purposes and allow them to be set and
cleared at will.  This may also hang the computer anyway :)  As such, this is
not a very good emulation system detection method.
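    For an emulator, the cure is to filter every flags value the same way the
CPU would before storing it.  The sketch below assumes a 386-class CPU in real
mode, where bit 1 always reads back as 1 and bits 3, 5 and 15 always read back
as 0.  Other CPUs differ, as noted above, and the names used here are made up
for the example.

```python
# Assumed layout for a 386-class CPU in real mode: bit 1 always reads as 1,
# and bits 3, 5 and 15 always read as 0.  Other CPUs differ.
FIXED_ONE = 0x0002
WRITABLE = 0x7FD5


def normalise_flags(value):
    """Return the 16-bit FLAGS image such a CPU would report back."""
    return (value & WRITABLE) | FIXED_ONE


def emulated_popf(regs, value):
    """Store a popped FLAGS word the way the CPU would, not verbatim."""
    regs["flags"] = normalise_flags(value & 0xFFFF)
    return regs["flags"]
```

    With this in place, a following PUSHF sees the same value under emulation
as it would on the assumed CPU.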
    .------------------------------.
    | Anti-emulation with hardware |
    '------------------------------'
    The most exploitable problems with emulators are usually in the form of
hardware tricks.  Generally, under debuggers and emulation systems (especially
in AV software), hardware interrupts are completely disabled, which means if
you hook yourself into something like the timer interrupt, i8, and then go into
a never-ending loop... then your i8 will never run and the emulator will crash
or abort.
    Another trick is to hook i76, the hard disk interrupt... and issue some
sort of file processing/disk interrupt.  If the system is 286+ and has a hard
disk, your interrupt will be executed by the hardware.  If you're under a
stupid SCCE system, you could even break out this way, even if it uses its own
disk/file handling procedures.  This technique will also work on BCE systems.
    The problem comes in when emulators disable hardware interrupts entirely at
all times, in which case your i76 and/or i8 will never be emulated.  Emulators
might even emulate an i8 from time to time just to make things seem normal for
the clock.  However, in this state, NO keyboard or disk access will function
whatsoever.  For certain uses however, such as in a tunneler, you could disable
hardware interrupts temporarily as they shouldn't be needed in handling a
simple interrupt.  However, then you are open to detection :)
    This problem is common to ALL forms of BCE... and is pretty impossible to
avoid... as you cannot just hook certain vectors, as you could be detected.
Even if you hid yourself... you couldn't protect yourself from all stos/movs
instructions and the like... and even then, some programs such as Windows and
DESQView reprogram the PIC (Programmable Interrupt Controller) to point those
IRQs into other places, and you CANNOT (without using DPMI services) work out
where they now point.
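    The "emulate an i8 from time to time" idea can be pictured with a toy
model.  This Python sketch is purely illustrative: the guest is a plain
function rather than real opcodes, and run_guest, wait_for_tick and timer_hook
are hypothetical names.  It only shows why an emulator that never delivers a
timer tick runs into its step limit on code that waits for one.

```python
def run_guest(step, max_steps, tick_every=None, on_tick=None):
    """Run step(state) until it returns False or max_steps is reached.

    If tick_every is given, on_tick(state) is called every that many steps
    to stand in for an emulated timer interrupt (i8).
    Returns the number of steps executed.
    """
    state = {"ticks": 0}
    for n in range(1, max_steps + 1):
        if not step(state):
            return n
        if tick_every and n % tick_every == 0:
            on_tick(state)
    return max_steps


def wait_for_tick(state):
    """Guest code that spins until its timer hook has run at least once."""
    return state["ticks"] == 0      # keep looping while no tick was seen


def timer_hook(state):
    """The guest's i8 hook: just count the tick."""
    state["ticks"] += 1
```

    Called as run_guest(wait_for_tick, 1000), the loop only ends at the step
limit.  With tick_every=100 and on_tick=timer_hook it ends shortly after the
first injected tick.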
    .----------------------------.
    | Exploiting CPU differences |
    '----------------------------'
    An easy way to detect an emulation system is to find out what processor you
are running on.  If you are on an 8086... then if you cause a DIV exception...
the return address should point to the instruction AFTER the one which caused
the exception.  On a 286+ however, it will point to the instruction CAUSING the
exception.  In this way, if the emulation system emulates the wrong divide-by-0
exception, you've caught it out.
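    In emulator terms, this means the return address pushed for the divide
exception has to depend on the CPU being emulated.  A minimal Python sketch,
assuming the emulator knows the faulting instruction's IP and length (the
function name and the "8086"/"286+" labels are chosen for this example only):

```python
def divide_error_return_ip(cpu, ip, length):
    """IP pushed for interrupt 0 when the instruction at `ip` faults.

    cpu is "8086" or "286+" (labels chosen for this sketch).
    """
    if cpu == "8086":
        return (ip + length) & 0xFFFF   # points AFTER the instruction
    if cpu == "286+":
        return ip & 0xFFFF              # points AT the instruction
    raise ValueError("unknown cpu model: %r" % (cpu,))
```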
    .----------------------.
    | Exploiting 386+ bugs |
    '----------------------'
    The 386+ instruction set is immensely complex and it's very hard (if not
impossible) to support all instructions down to the slightest quirk (ICE
doesn't completely support indirect instructions for example).  You see, coding
an emulation system is hard work, and designers may cut corners by not properly
checking the EIP in CALL/JMP instructions, or by not handling LOCK instructions
fully, etc, in which case you can catch them out by hooking general protection
fault and invalid opcode exception handlers and doing some tricky opcodes.
    .------------.
    | Conclusion |
    '------------'
    Anti-emulation does exist.  As you can see, there are many problems for the
emulation system designer... and to fix those problems he/she must take risks
of being detected by smart code.  ICE can be made close to impossible to break
out of with hardware interrupts, however it then loses its power to process
interrupts properly!
    It's all about trade-offs :)
 .----------------------------------------------------------------------------.
 |                                  Part 5                                    |
 |                                                                            |
 |                            Generic anti-tunneling                          |
 '----------------------------------------------------------------------------'
    Since this -IS- supposed to be a document about tunneling, and not just
emulation, I've decided to show you how all tunneling systems, including those
of the emulation type, can be made completely and utterly useless with just a
few opcodes.  Neat, huh?  :)
     +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
                  Generic anti-tunneler/virus mechanism
       Calling code              Standard handler
    .----------------.         .--------------------.
    | ....           |      .->|PUSHF               |
    | INT xx         -------'  |CMP [CS:VAR], FFH   |
    | ....           |<-----.  |JNE ALERT           |
    '----------------'      |  |INC [CS:VAR]        |
                            |  |POPF                |        User Application
                            |  |CALL FAR INT_HANDLER-----.    Interrupt Code
 .------------------------- |->|PUSHF               |    | .------------------.
 |                          |  |CMP [CS:VAR], 0     |    | |Normal application|
 |                          |  |JNE ALERT           |    '>|code for handling |
 |.-------------------------|--|DEC [CS:VAR]        |-------    interrupts    |
 ||    JMP handler          |  |POPF                |      '------------------'
 || .----------------.      '---RETF 2              |
 |'>|PUSHF           |         '--------------------'
 |  |CMP [CS:VAR], 0 |          Kernel Code          Hidden handler
 |  |JNE ALERT       -.      .----------------.    .----------------.
 |  |MOV [CS:VAR], 1 ||      | Proper kernel  | .->|CALL TEST_CNTR  -----.
 |  |CALL RESTORE_JMP|<-.    |   interrupt    | |  |INC [CS:VAR]    |<---|-.
 |  |CALL FAR "KC"   ------->|  instructions  --'  |POPF            |    | |
 |.>|PUSHF           || |    '----------------'    |JMP ORIG_HANDLER--.  | |
 || |CMP [CS:VAR], 2 || |                          '----------------' |  | |
 || |JNE ALERT       -- |  .---------------.                          |  | |
 || |MOV [CS:VAR], 0 || |  |               |                          |  | |
 || |CALL OUR_JMP    |<-|--'  RESTORE_JMP  '----.      OUR_JMP        |  | |
 || |POPF            || | .-------------------. | .----------------.  |  | |
 '|--RETF 2          || | | Restore original  | | |Overwrite bytes |  |  | |
  | '----------------'| '>| bytes from the KC | '>|at KC entrypoint|  |  | |
  |                   |   | entrypoint        |   |with a FAR JMP  |  |  | |
  |                   |   '-------------------'   '----------------'  |  | |
  |                   |                                               |  | |
  |                   |     More Kernel Code                          |  | |
  '-------------------------------------------------------------------'  | |
                      |                                                  | |
          ALERT       |       Test Center                                | |
      .-------------. |  .----------------------.<-----------------------' |
      | [CS:VAR]=0  | |  |   Here, we test for  |       [CS:VAR]=1         |
      | (tunneler)  |<'  |  [CS:VAR]   and bad  |     and safe functions   |
      |     or      |<---- interrupt functions  ---------------------------'
      |bad functions|    '----------------------'
      | (evil code) |
      '-------------'
     +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
    In case you don't understand that... let me explain.  What you have there,
is a complex intertwining of interrupt handlers, inserted into different places
along the interrupt chain.  Each handler modifies this variable in some way,
and if -ANY- of the interrupt handlers are not called, then an interrupt has
been executed in a non-standard way, which indicates a tunneling program has
been activated (they are -ALL- called in normal program execution).
    .----------------.
    | Initialization |
    '----------------'
    When your program loads up, it first grabs the entrypoint of the interrupt
it wants to keep a watch on (through standard tunneling techniques if it is not
the first to load up), and overwrites the first few bytes there to form a FAR
JMP to its own code handler (described later).
    Next, your program hooks a 'secret' interrupt vector.  For i13 (on 286+
systems), i76 is called by the hardware every time i13 is finished.  In DOS,
i21 calls i2A sometime during execution.  We hook whichever of these we want,
and then we finally hook the vector itself through modification of the IVT.
With this done, we set our internal variable to FFH, set our memory up so we
stay resident, and then we exit.
    .---------.
    | Level 1 |
    '---------'
    Level 1 is the standard interrupt hook.  On each exit from this hook, it
sets an internal variable to FFH... and on entry to this hook, it checks to
make sure the variable -IS- FFH.  If the variable is not FFH, then some program
has accessed other levels of the interrupt chain bypassing this hook, at which
time we tell the user.  If the variable is okay, we set it to 0, and pass
control over to the standard interrupt code following ours, which would
normally be our level 2 hook.  The level 2 hook, on exit, sets the internal
variable to 0, we check to make sure it is zero (alerting the user if it
isn't), and then set it to FFH, before exiting our hook to the calling program
code.
    .---------.
    | Level 2 |
    '---------'
    Level 2 is the interrupt splice we set up.  On entry to this hook, we check
that our internal variable is set to 0 (by the Level 1 handler).  If it is not
0, we alert the user to a tunneling presence.  If it is 0, we set it to 1,
restore the original bytes of the interrupt handler we overwrote, and then
emulate an interrupt call to the address of that interrupt handler.  On exit
from our hook, we check the internal variable is set to 2 (for reasons you'll
discover later), and if it isn't, we alert the user.  If it is 2, we set it
back to 0 and exit our hook, transferring control to our Level 1 handler, which
will check that the value 0 is set, alerting the user otherwise.
    .---------.
    | Level 3 |
    '---------'
    Level 3 is our secret interrupt hook.  First it checks our other hooks have
been processed, by making sure the internal variable is set to 1.  If it isn't,
we alert the user, and if it is, we increment it to 2 (later to be checked by
the Level 2 handler).  Here we can also check to see if any bad functions are
being processed... however generally, in the level 3 hook, the function has
already been executed and all you could do is alert the user and halt the
computer.
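    The three levels boil down to a small state machine on the internal
variable: FFH -> 0 -> 1 -> 2 -> 0 -> FFH.  The Python below is a toy model of
that sequence, not resident DOS code: interrupts are ordinary method calls,
ALERT is an exception, and the Chain class with its hidden_hooked switch is
invented for the illustration.

```python
IDLE = 0xFF   # value of VAR between interrupts


class Tunneled(Exception):
    """Raised where the diagram jumps to ALERT."""


class Chain:
    def __init__(self):
        self.var = IDLE
        self.hidden_hooked = True     # is our 'secret' vector still in place?
        self.log = []

    def expect(self, wanted, where):
        if self.var != wanted:
            raise Tunneled("%s saw VAR=%02X, wanted %02X"
                           % (where, self.var, wanted))

    def level1(self):
        """Standard IVT hook."""
        self.expect(IDLE, "level 1 entry")
        self.var = 0                  # INC [CS:VAR]: FFh -> 0
        self.level2()
        self.expect(0, "level 1 exit")
        self.var = IDLE               # DEC [CS:VAR]: 0 -> FFh

    def level2(self):
        """FAR JMP splice at the kernel entrypoint."""
        self.expect(0, "level 2 entry")
        self.var = 1
        self.kernel()
        self.expect(2, "level 2 exit")
        self.var = 0

    def kernel(self):
        """The original interrupt code, which calls the hidden interrupt."""
        self.log.append("kernel")
        if self.hidden_hooked:
            self.level3()

    def level3(self):
        """Hook on the 'secret' interrupt (i76 or i2A in the text)."""
        self.expect(1, "level 3")
        self.var = 2                  # INC [CS:VAR]: 1 -> 2
```

    A normal INT corresponds to level1() and leaves the variable at FFH.
Entering at level2() or kernel() directly, or clearing hidden_hooked before
level1(), each trips one of the checks, which is the point of interlocking the
three hooks.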
    .-----------------.
    | Why so complex? |
    '-----------------'
    Why the complexity?  Why can't you just hook the secret hidden interrupt
and check things from there?  Well, the reason is that some nasties (grin) kill
those interrupt vectors before using interrupts, which means your code won't be
called.  In this way, with the JMP FAR handler, a security alarm will be set
off if a tunneler tries to do this :)
    Of course, if you just had the JMP FAR interrupt controller... you could
check for nasty functions from there.  Not a bad idea... unless a nasty program
saves the bytes at the interrupt entrypoint and will restore them from time to
time if they change ;)  Of course, that wouldn't be very common... and could
actually be quite bad for networking programs, etc.  Sigh, oh well.
    Finally, we all know a standard interrupt hook alone is not good enough to
stop any virus out there, as it will invariably just tunnel past the routine
altogether!  Either way, even if some parts of the code presented above are
slightly redundant, it will generically detect any and all tunneling attempts
(or at least... it will detect a tunneler has gone through the interrupt
vectors as the program which used the tunneler calls what it thinks is the
original interrupt entrypoint).
    There is a large possibility for false alarms however... should a program
which has hooked into the early chain of command use interrupts to do its
processing while in the middle of handling an interrupt.  But to stop these
false alarms there are ways you can check if there really was a tunneling
attempt or if a proper INT was executed... or you can inoculate certain
programs from causing an alarm.
    There are many ways to do it :)
    Oh well, that may not all be correct but you get the idea, right?
 .----------------------------------------------------------------------------.
 |                                  Part 6                                    |
 |                                                                            |
 |                                Conclusion                                  |
 '----------------------------------------------------------------------------'
   HURRAH!  HURRAH!  HURRAH!  HURRAH!  HURRAH!  HURRAH!  HURRAH!  HURRAH!
                  .--------------------------------------.
                  |             YOU ARE A                |
                  |           TUNNELING GOD              |
                  '--------------------------------------'
    Yes, that's right, you have reached the end of my series of documents on
tunneling!  You have reached the status of a tunneling GOD... and hopefully,
judging from how fast my documents have spread into magazines, web sites, and
personal collections, and from how much people have liked them... you will be
joined in your tunneling GOD status by a lot of other virus coders.  There can
never be too many to help in the war against the AV :)
    Sigh, what can I say?  It has surely been an interesting journey down the
path of tunneling methods.  Now that I have reached the end, I can surely say
that I know more than I will ever need to really know about tunneling, and that
tunneling, although it has its uses, is not worth spending so much time on when
there are so many other important things to learn.  However, now that you and I
have mastered tunneling we can move on to other things, which is good.
    With the tunneling series finally over and done with, it is time to tell
you about some other projects I have on the horizon.  First of all, is an
excellent document discussing whether viruses are 'alive' or not.  It gets into
some quite philosophical ideas and questions, which are sure to stimulate you,
or at the very least, make you think twice next time you create or destroy a
virus.  I also have another document in mind on virus technology, the pros and
cons, where it is at the moment, and where it is headed for the future.  It
also covers some ground on the new fully polymorphic (metamorphic) viruses, and
how emulation technology used in AV software will cope with it and other virus
technologies.
    Looks like this is going to be another interesting year of documents.
                                    Prince Of Sadness [Immortal Riot/Genesis]