

It's important to have a little understanding of the ML assembler in
order to see what the linker does. Probably the easiest way to think of the ML
assembler is as a tool that turns your assembler source code into literal strings of
bytes, just like an ASCII message. The difference is that ML builds the string from its
knowledge of how each instruction needs to be coded in binary form (machine
code). And rather than making each instruction a separate string of its own, ML turns whole
sequences of instructions into single literal strings.
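To make the "literal string of bytes" idea concrete, here is a minimal sketch in Python (it is only a model, not anything ML provides). The three encodings in the table are real 8086 opcodes, but the table itself, the name assemble, and the exact-text lookup are my own simplifications for illustration.

```python
# Toy "assembler": a hand-written table of three real 8086 encodings.
# This only models the idea; ML's real encoder is far more general.
ENCODINGS = {
    "mov ah, 9": bytes([0xB4, 0x09]),  # B4 ib = MOV AH, imm8
    "int 21h":   bytes([0xCD, 0x21]),  # CD ib = INT imm8
    "ret":       bytes([0xC3]),        # C3    = near RET
}

def assemble(lines):
    """Concatenate the encodings of all lines into ONE byte string."""
    out = bytearray()
    for line in lines:
        out += ENCODINGS[line.strip().lower()]
    return bytes(out)

code = assemble(["mov ah, 9", "int 21h", "ret"])
print(" ".join(f"{b:02x}" for b in code))  # b4 09 cd 21 c3
```

Notice that three instructions went in, but only one string of five bytes came out.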
How does ML know when to start or stop a string, then? The simple
answer is: whenever you change memory segments. When you entered a line with a
directive that said .code, the assembler started a new literal string. If you
then give the directive that says .data, the assembler will terminate the
earlier string and start up a new one. These strings go into the object file I
mentioned earlier. It's the job of the linker to read these object files and put these
strings together into something meaningful when building the final program.
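The start-and-stop rule can be sketched in a few lines of Python. This is only a model under my own assumptions: the function name split_into_strings is hypothetical, only the .code and .data directives are recognized, and the chunks hold source lines rather than assembled bytes so the splitting is easy to see.

```python
# Assumed mapping from the simplified directives to segment names.
DIRECTIVE_TO_SEGMENT = {".code": "_TEXT", ".data": "_DATA"}

def split_into_strings(lines):
    """Return a list of (segment_name, [source lines]) chunks."""
    chunks = []
    for line in lines:
        text = line.strip()
        if text.lower() in DIRECTIVE_TO_SEGMENT:
            # A segment change ends the current string and starts a new one.
            chunks.append((DIRECTIVE_TO_SEGMENT[text.lower()], []))
        elif chunks and text:
            # Ordinary lines join the string that is currently open.
            # (Lines before the first directive are ignored in this model.)
            chunks[-1][1].append(text)
    return chunks

source = [".data", "msg db 'Hi$'", ".code", "mov ah, 9", "int 21h",
          ".data", "count dw 0"]
chunks = split_into_strings(source)
for name, body in chunks:
    print(name, body)
```

Switching back to .data a second time produces a second, separate _DATA string. Joining the two is left to the linker.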
It's now time to discuss a little of what the linker actually does
for us. Understanding this will help you write your programs well.
Linking with LINK
One of the tasks the linker handles is collecting up these strings
that the ML assembler has written into the object file(s) and putting them into some
order. For this part of its task, you can imagine that the linker has a tablet of blank
pages it can write on. There is nothing on them when it starts up. Just empty pages.
Each of these pages can be labeled at the top with the name of a memory segment. No two
pages can have the same name. Each page is "very long" so it can hold everything
the linker needs to put there.
The linker then grabs up the first object file mentioned (if there
is more than one, it often doesn't matter what order they are given) and starts to read
through it. Each string is labeled with the name of the memory segment it belongs to. The
linker searches the pages it has and sees if that memory segment is already mentioned at
the top of any one of them. If not, the linker selects the next available blank page and
heads it with the name of this newly discovered segment, placing the name at the top where
it can be easily found. Then, the linker simply appends the literal string of bytes it
found in the object file to this page. Think of it as a kind of accountant's page, with
ruled lines -- one per byte. The linker (acting as an accountant) simply takes each byte
in sequence and places its value on the next line, numbering each line as it goes. The
first line with the first byte is numbered 0. It continues this process as it reads the
first object file. Then it closes that file and opens the next one and repeats this for
each string there, too.
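Here is a small Python sketch of that page-filling process, under my own assumptions: an object file is modeled as a plain list of (segment name, bytes) pairs, and the name collect_pages is hypothetical. A real object file carries much more than this.

```python
def collect_pages(object_files):
    """Append every string to the page named after its segment.

    object_files: a list of object files, each a list of (segment, bytes).
    Returns a dict mapping segment name -> bytearray (the "page").
    """
    pages = {}
    for obj in object_files:
        for segment, data in obj:
            # Start a new blank page the first time a segment name is seen.
            page = pages.setdefault(segment, bytearray())
            # Row numbers are simply indexes into the page, starting at 0.
            page.extend(data)
    return pages

first = [("_TEXT", b"\xb4\x09"), ("_DATA", b"Hi$")]
second = [("_TEXT", b"\xcd\x21")]
pages = collect_pages([first, second])
for name, page in pages.items():
    print(name, page.hex())
```

The two _TEXT strings came from different object files, yet they end up back to back on the same page.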
And so it goes. In the end, the linker will have one or more of
these partly filled pages at hand. When that happens, the linker simply writes each of
these pages, one at a time, to the final executable program file. There are some other
details that complicate the linker's life a little.
For example, sometimes one of those literal strings in the object
file will also carry a special notation telling the linker that it can't simply be appended
to the memory segment, but has to be placed on that page starting exactly
at a given row. In cases like this, the linker may need to keep track of an empty 'hole'
on the page. It can fill this hole, if it wants, with other strings that don't have to be
placed in an exact spot and are short enough to fit into that hole.
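One way to picture the hole bookkeeping is the Python sketch below. It is a simplification of my own: the names place_at and place_anywhere are hypothetical, holes are padded with zero bytes (an assumption), and a string may only be pinned at or beyond the current end of the page.

```python
def place_at(page, row, data, holes):
    """Put data on the page starting exactly at `row`, recording any gap."""
    if row < len(page):
        raise ValueError("row already occupied (not handled in this sketch)")
    if row > len(page):
        holes.append((len(page), row))           # half-open range [start, end)
        page.extend(b"\x00" * (row - len(page)))  # assumed filler byte
    page.extend(data)

def place_anywhere(page, data, holes):
    """Use the first hole that is big enough, else append. Returns the row."""
    for i, (start, end) in enumerate(holes):
        if end - start >= len(data):
            page[start:start + len(data)] = data
            holes[i] = (start + len(data), end)   # shrink the hole
            return start
    row = len(page)
    page.extend(data)
    return row

page, holes = bytearray(b"\x90\x90"), []
place_at(page, 8, b"\xaa\xbb", holes)             # leaves a hole at rows 2..7
r1 = place_anywhere(page, b"\x01\x02\x03", holes)  # fits in the hole
r2 = place_anywhere(page, b"\x04" * 4, holes)      # too long, goes at the end
print(r1, r2, holes)
```

The three-byte string lands in the hole, while the four-byte one no longer fits in what is left and is appended instead.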
Another complication happens when one of these literal strings isn't
quite so... literal. It might have a spot with a few consecutive bytes that are marked as
'unknown' and need to be adjusted by the linker. These are usually spots reserved for a
memory address (segment or offset part) that wasn't known by ML when it assembled the
code, but where the linker will eventually be able to figure it out as it finishes reading
all the object files. In these cases, ML will place a note there telling the linker the
name of the label of which it didn't know the value and leave it to the linker to later
'fix up.' To fix these, the linker must keep a separate list of such places that
need adjusting. In the normal course of adding strings to various pages in its book
(memory segments), the linker also writes down any labels (named byte entries) it sees as
it processes them. Once it has finished reading all the strings from all the
object files, the linker goes back to its list of "fix ups", looks up each
referenced label, and patches the right bytes into the spot that was left for them.
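The patching step might be sketched in Python like this. The names apply_fixups, labels, and fixups are hypothetical, and I assume every fix-up is a plain 16-bit little-endian offset. Real fix-up records carry more detail than this model does.

```python
def apply_fixups(page, labels, fixups):
    """Patch each placeholder with the 16-bit value of its label.

    labels: dict of label name -> row number (collected while filling pages)
    fixups: list of (row of the placeholder, label name)
    """
    for row, name in fixups:
        address = labels[name]                       # known only after all files are read
        page[row:row + 2] = address.to_bytes(2, "little")

# BA iw is MOV DX, imm16; the two zero bytes are the 'unknown' spot.
code = bytearray(b"\xba\x00\x00\xb4\x09\xcd\x21")
fixups = [(1, "msg")]        # note left by the assembler (hypothetical form)
labels = {"msg": 0x0012}     # assumed value the linker eventually works out
apply_fixups(code, labels, fixups)
print(code.hex())            # ba1200b409cd21
```

Only the two placeholder bytes change. Everything else in the string stays exactly as the assembler wrote it.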
Also, it's possible to tell the linker you want to make a composite
page from one or more of the other named memory segment pages. This is what I think of as
a GROUP page.
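As a rough picture of a composite page, the Python sketch below just lays the member pages end to end and remembers where each one starts. The name build_group and the second segment name EXTRA are hypothetical, and I ignore alignment and everything else a real linker would have to consider.

```python
def build_group(pages, members):
    """Lay the member pages end to end; return the composite and each base row."""
    group = bytearray()
    bases = {}
    for name in members:
        bases[name] = len(group)   # row in the composite where this page begins
        group.extend(pages[name])
    return bytes(group), bases

pages = {"_DATA": b"Hi$", "EXTRA": b"\x01\x02"}   # EXTRA is a made-up segment
group, bases = build_group(pages, ["_DATA", "EXTRA"])
print(group, bases)
```

A byte that sat at row 0 of the EXTRA page now sits at row 3 of the composite page.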
Let's look at a printout of the object file, lesson02.obj:
This is what the internals of an object file might look like (and do
look like, in this case). If you look in there, you'll see two entries marked LEDATA.
These are those 'strings' I have been talking about. In one case, it's code from the code
segment we wrote. In the other case, it's the actual string we wanted to display on the
screen. But they are written into the object file as two separate strings. Also note that
the linker page name, the memory segment in other words, is also mentioned on these LEDATA
headers; one as _TEXT and the other as _DATA. Those are the named pages, so to speak,
where those strings go. There is also a "FIXUPP" record. This is one of those
entries designed to cause the linker to 'patch up' one of the LEDATA strings. See what you
can figure out by reading this. Compare it with what you can see in the listing file
generated by ML. It's not necessarily important, but it's fun.
Summary
When you write your assembly code, you can choose to place code,
constants, or variables into different memory segments that the linker will manage for
you. Later, as you learn more about the use of segments in the assembler, keep in mind
what I've said here about the linker. The main thing to remember is that different
segments are like different pages of paper. You can instruct the assembler and the linker
to place code on any of them you want, any time you want, in any file you want. They will
do the work of making sure that things get put where you say to put them. (Whenever you
use a .model directive, in fact, the assembler will automatically create a
segment called _TEXT.)
For very simple programs, you'll probably just use two segments,
like the _TEXT and _DATA mentioned above. Your code will get placed onto the _TEXT page
and your constants and variables will get placed onto the _DATA page. For a lot of useful
programs, that's all you really need.
Last updated: Thursday, July 08, 2004 22:11