5. COM?

Well, we've covered the process of producing a first .COM file in some detail. But what exactly IS a .COM program?

DOS .COM Programs

The .COM type of program was the first (and only) type of program that DOS 1.0 understood. It wasn't until DOS 2.0 that a new type of program called a .EXE was invented. DOS 1.0 was a floppy-based system and these floppies didn't hold a lot of data (about 320kbyte), so there was no real worry about very big programs at that point. DOS 2.0 arrived with support for the hard disk in the IBM PC/XT. By then there was more memory on a typical PC, disk space had expanded to some 10Mbyte, and the need for a new type of program had become clearer.

In DOS 1.0, a very simple set of rules applied to .COM programs. When a program started, it owned all of the available memory in the computer. If it tried to use more than was present, it wouldn't run right, of course. But there was no memory management in DOS 1.0 and so any program running could simply do whatever it wanted to do, use whatever memory it wanted to use. DOS 1.0 simply kept track of the first available memory location it could use to load up programs. When DOS would run a .COM program, it would simply prepare a special area called the Program Segment Prefix (PSP) in the first 256 bytes of this available memory and then blindly load the entire .COM file starting just after that point. After loading the .COM file (as simple data bytes) into the memory just after the PSP, DOS would set up a few register values, place a return address (to return to DOS) onto the stack, and then jump straight into the program -- always starting that program at the first byte after the PSP. Since 256 bytes corresponds to 0100h in hexadecimal, this is the offset address that was always used. The CS (code segment) register would always point to the beginning of the available memory, though, which was at the beginning of the PSP.
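The load steps just described can be modeled in a few lines. What follows is only a sketch in Python, not DOS code: the name load_com is made up for illustration, a full 64K segment is assumed to be available, the PSP contents are left as zeros (only its size is modeled), and the return word pushed on the stack is taken to be 0, as described for SP later in this section.

```python
SEGMENT_BYTES = 0x10000   # one full 64K segment (assumed fully available)
PSP_BYTES = 0x100         # the PSP occupies the first 256 bytes

def load_com(image: bytes):
    """Hypothetical model of loading a .COM image into one segment."""
    # Leave room for the PSP in front and the pushed word at the top.
    if PSP_BYTES + len(image) + 2 > SEGMENT_BYTES:
        raise ValueError("image does not fit in a single 64K segment")

    memory = bytearray(SEGMENT_BYTES)

    # The file is copied, uninterpreted, to the first byte after the PSP.
    memory[PSP_BYTES:PSP_BYTES + len(image)] = image

    # The stack starts at the top of the segment; push one word of 0.
    sp = SEGMENT_BYTES - 2
    memory[sp:sp + 2] = (0).to_bytes(2, "little")

    # Offsets are relative to the start of the PSP, where CS points.
    registers = {"IP": PSP_BYTES, "SP": sp}
    return memory, registers

memory, registers = load_com(bytes([0xC3]))   # a one-byte image: RET
print(f"IP={registers['IP']:04X}h SP={registers['SP']:04X}h")
```

The point of the model is only that nothing in the file is examined: whatever byte comes first in the file ends up at offset 0100h, and that is where execution begins.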

When DOS 2.0 arrived, it included memory management features. To remain compatible with DOS 1.0's style of running .COM programs, DOS 2.0 would first allocate the largest available block of memory (often, the only one) and assign it to the .COM program. It would also prepare the first 256 bytes of that block as the PSP and load the .COM file immediately after that point, much as DOS 1.0 did. The main issue with .COM programs running on newer versions of DOS is that all of the available memory (or at least, the largest block of memory available) is allocated to the .COM program. So, if the .COM program tries to use any of the newer DOS memory management functions to allocate more memory, it usually gets a "no way" response. For this reason, .COM programs will sometimes return the unused part of their allocation to DOS when they first start up. We didn't do that in our early examples, because it would have greatly complicated my explanation of them.

The registers that DOS sets up for .COM programs are:

CS, DS, ES, SS: These are the four segment registers available to the earlier x86 CPUs (DOS programs do not usually deal with the newer FS and GS segment registers.) They are all initialized by DOS (when used with an offset of 0) to point to the beginning of the Program Segment Prefix (PSP.)
IP: This is the instruction pointer and represents the offset address used to run programs. For .COM programs, this starting address is 0100h, so the IP is always set to 0100h when a .COM program first starts. (The IP register is always paired with the CS segment register to determine the complete address for the running code.)
SP: This is the stack pointer and represents the offset address of the current point in the stack area. For .COM programs, this value is usually 0FFFEh when the program starts, which is the highest value possible once a word of 0 has been pushed. However, if DOS knows that less than 65536 bytes are available for the .COM program, it will set the SP register to the end of the actual available memory before pushing a word of 0 and then running the .COM program. So the SP register isn't always 0FFFEh -- it just usually is these days, with very large DOS program areas.
BX:CX: This register pair is usually set to the size of the .COM file, treating BX as the upper 16 bits of a 32-bit value. However, I haven't tested what happens when the .COM file is close to or larger than 65536 bytes, so I'd recommend testing this before relying on it.
AX, DX, SI, DI: These registers are set to 0 when the .COM program starts. I wouldn't rely on this behavior, though. Just set them as you need them and don't count on them being zero when DOS starts the .COM program.
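The arithmetic behind the SP and BX:CX entries above is easy to check by hand. Here is a small Python sketch of it. The function names are made up for illustration, and the SP rule is simply my reading of the description above (start at the end of available memory, capped at 64K, then push one word), not something taken from DOS itself.

```python
def initial_sp(available_bytes: int) -> int:
    """SP after DOS pushes its word of 0 (hypothetical helper)."""
    # The stack top is the end of available memory, capped at one segment.
    top = min(available_bytes, 0x10000)
    # Pushing one word moves SP down by 2; SP is a 16-bit register.
    return (top - 2) & 0xFFFF

def file_size_from(bx: int, cx: int) -> int:
    """Combine BX:CX into one 32-bit value, BX being the upper 16 bits."""
    return ((bx & 0xFFFF) << 16) | (cx & 0xFFFF)

print(f"{initial_sp(640 * 1024):04X}h")   # plenty of memory available
print(f"{initial_sp(0x4000):04X}h")       # only 16K available
print(file_size_from(0x0000, 0x1234))     # a 4660-byte file
```

With a full segment available the first call gives the usual 0FFFEh, and with only 16K available the stack starts just under 4000h instead.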

Summary of .COM Programs

DOS 1.0 only knew how to run .COM programs. In this first version of DOS, there were also no memory management functions and no concept of allocated or free blocks of memory. .COM programs were free to use any or all of the available memory when they ran. Their stack was simply set up at the end of their program segment (there was only one such segment) or the end of memory, whichever came first, and then they were simply started. DOS just loaded the .COM program into the first available memory segment and ran it.

DOS 1.0 would function on a machine with only 16k of RAM -- the first PC from IBM provided a minimum of 16k, with an option to increase this to 64k provided on the motherboard and up to 256k with a memory expansion card.

With the advent of the IBM PC/XT, standard RAM was increased to 128k, with an option to increase this to 256k on the motherboard and up to 640k with memory expansion cards. The XT also added a 10Mbyte hard disk. Microsoft came out with DOS version 2.0 to accommodate this new machine and included a number of new features, including extended support for larger disks, the .EXE program type, and memory allocation functions used to support them.

For backward compatibility, while also conforming to the requirements of the new memory allocation functions, DOS versions 2.0 and later allocate the largest free memory block (usually, this means all of memory) when starting .COM programs. This was the safer way to proceed under the new guidelines.

.COM programs which want to release unused memory back to DOS (under DOS versions 2.0 and later) need to add code designed for that purpose. For example, if a .COM program wishes to use DOS to load and run another program, it will first need to free up some of its memory.
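The DOS call normally used to shrink a program's block (INT 21h, function 4Ah) takes the new size in BX, counted in 16-byte paragraphs from the start of the block, which for a .COM program is the start of the PSP. The only real work is deciding how many paragraphs to keep. The Python sketch below shows that calculation. The helper name and the 256-byte stack allowance are my own assumptions for illustration, and a real program would also have to move its stack inside the kept area before shrinking.

```python
PSP_BYTES = 0x100        # the PSP sits at the front of the block
PARAGRAPH_BYTES = 16     # DOS measures memory blocks in 16-byte paragraphs

def paragraphs_to_keep(com_size: int, stack_bytes: int = 256) -> int:
    """Paragraphs a .COM program must keep (hypothetical helper).

    com_size    -- bytes of code and data loaded after the PSP
    stack_bytes -- room reserved for a relocated stack (an assumed value)
    """
    total = PSP_BYTES + com_size + stack_bytes
    # Round up so a partly used paragraph is not given away.
    return (total + PARAGRAPH_BYTES - 1) // PARAGRAPH_BYTES

print(paragraphs_to_keep(1000))
```

For a 1000-byte program with a 256-byte stack, that is 1512 bytes in all, which rounds up to 95 paragraphs.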

The .COM file format is really quite simple. DOS doesn't interpret it; it just loads it into memory and starts it. The first byte in the .COM file is the first byte of the program that runs when DOS starts it.
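To make this concrete, here is a small Python sketch that lists where each byte of a .COM file ends up once it is loaded. The five bytes are a hand-assembled example of my own (MOV AX,4C00h followed by INT 21h, which asks DOS 2.0 or later to end the program), not something from the earlier chapters, and the function name is made up.

```python
# Hand-assembled example image (an assumption for illustration):
#   B8 00 4C   mov ax, 4C00h   ; AH=4Ch (terminate), AL=0 (return code)
#   CD 21      int 21h         ; call DOS
TINY_COM = bytes([0xB8, 0x00, 0x4C, 0xCD, 0x21])

def listing(image: bytes, load_offset: int = 0x100):
    """Pair each file byte with the offset it occupies after loading."""
    return [f"{load_offset + i:04X}h: {b:02X}" for i, b in enumerate(image)]

for line in listing(TINY_COM):
    print(line)
```

File offset 0 lands at 0100h, file offset 1 at 0101h, and so on. There is no header to skip and nothing to relocate.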

It's time to talk a little about the Program Segment Prefix (PSP.)

Last updated: Thursday, July 08, 2004 15:02