Behind the Scenes on a Web Page
by angelazaharia
Have you ever wondered what exactly happens when you go on the Internet, type (or click on) a URL, and access a website with your browser? How do all those images, text, multimedia special effects (and let's not forget the ads here!) "magically" appear on your screen? It's all rather mysterious, isn't it? Wanna take a lookie-see "behind the scenes?" That is what this article is all about.
First, let's mention a few truths here and throw in some hooks: very few websites are actually profitable (making enough money, or even any money at all, to be in the black).
That is why most dot-com sites throw all sorts of ads and/or pop-up banners at you. But wait, have you ever noticed how all of those advertisements are on top of the page and are the first thing to appear (be downloaded)? Have you ever monitored how many cookies an average website writes onto your HD? Ever heard of companies such as DoubleClick, Aureate, Akamai? If yes, do you know what they do to make money? When you use a search engine, do you ever wonder why all the links you find on page one are major commercial companies' sites? Weren't you surprised even a little bit when advertisements tailor-made to fit what you were looking at began to pop up on your screen? All these questions, eh?
Here are the tools I will be using to unveil all those "secrets."
An ordinary web browser (Netscape, not Internet Explorer); EditPad (a freeware text editor, like Windoze's Notepad but of course it does a lot more); a good firewall such as WRQ @Guard (an oldie but goodie); and my brain.
I will use @Guard's wonderful logging capabilities and dashboard window to monitor all the connections my web browser will make in the course of my investigation, no matter how short-lived they may be, hehehe.
The website I will be looking at is www.wired.com/news/technology from Wired Magazine, a tech news site which I read almost daily. For this session, I will be accepting all ads, cookies, Java, JavaScript, ActiveX, and everything else they throw at me. I activate @Guard's dashboard window and I am ready to begin!
I start Netscape, click on the www.wired.com/news/technology link, and immediately begin checking my connections by refreshing the dashboard window. Here is what appears:
Executable    State          Remote                   Local  Port  Sent  Received
NETSCAPE.EXE  Connected/Out  a1112.g.akamai.net:http  myPC   2372  371   503
NETSCAPE.EXE  Connected/Out  a1112.g.akamai.net:http  myPC   2373  368   582
NETSCAPE.EXE  Connected/Out  lubid.lycos.com:http     myPC   2374  350   419
Hmmmm... Rather interesting, isn't it? Let's go over each part and explain what we are looking at exactly:
NETSCAPE.EXE is the browser, of course.
Connected/Out means Netscape is reaching out and connecting right now.
Remote is the remote server Netscape is connected to; in this case it's two servers, a1112.g.akamai.net and lubid.lycos.com, both using server port http (port 80).
Local is my PC, and Port is the port being used on my PC (in this case three ports: 2372, 2373, and 2374).
Sent and Received are the numbers of bytes my PC sent to and received from each server.
Anything jumping at you already? I sure hope so! I do not remember asking to connect to either a1112.g.akamai.net or lubid.lycos.com, but rather to www.wired.com/news/technology. So who/what are those places and more importantly why am I connecting to them and why am I sending and receiving data to/from them? (Small as it may be - 371 bytes is next to nothing.)
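If you'd rather let the computer ask "who are those places?" for you, here is a minimal Python sketch. It assumes you have copied the dashboard rows as plain text in the same column order as above, and it uses a crude rule of thumb (the last two parts of a host name, such as akamai.net or lycos.com) to decide what counts as the same site. That rule is an assumption and it gets country domains such as .co.uk wrong, so treat this as an illustration, not a privacy tool.

```python
# Hypothetical sketch: flag dashboard connections that go to some other
# organization than the site you actually asked for.

DASHBOARD = """\
NETSCAPE.EXE Connected/Out a1112.g.akamai.net:http myPC 2372 371 503
NETSCAPE.EXE Connected/Out a1112.g.akamai.net:http myPC 2373 368 582
NETSCAPE.EXE Connected/Out lubid.lycos.com:http myPC 2374 350 419
"""


def base_domain(host: str) -> str:
    # Crude assumption: the last two labels name the organization
    # (this is wrong for names like example.co.uk).
    return ".".join(host.lower().split(".")[-2:])


def third_party_rows(dashboard: str, requested_host: str) -> list:
    """Return (host, local port, bytes sent, bytes received) for every row
    whose remote host does not belong to requested_host's domain."""
    wanted = base_domain(requested_host)
    flagged = []
    for line in dashboard.splitlines():
        fields = line.split()
        if len(fields) != 7 or not fields[4].isdigit():
            continue  # skip header lines and rows with missing columns
        _exe, _state, remote, _local, port, sent, received = fields
        host = remote.split(":")[0]  # drop the ":http" service name
        if base_domain(host) != wanted:
            flagged.append((host, int(port), int(sent), int(received)))
    return flagged


if __name__ == "__main__":
    for host, port, sent, received in third_party_rows(DASHBOARD, "www.wired.com"):
        print(f"{host} (local port {port}): sent {sent} bytes, received {received} bytes")
```

Against www.wired.com all three rows are flagged; ask about www.lycos.com instead and only the two Akamai rows remain.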
Oops, and since I told Netscape to: "Warn me before accepting any Cookies" I get this lovely message on my screen:
The server www.wired.com wishes to set a cookie that will automatically be sent to any server in the domain wired.com. The name and value of the cookie are: p_uniqueid=7s42L2dLf04XY6gr3B. This cookie will persist until Thu Dec 31 15:59:11 2037. Do you wish to allow the cookie to be set?
Wow, this cookie will be "alive" on my HD for a loooong time, won't it? Not to worry, I love cookies and I eat them every day, making sure none are left on my HD. So I click Yes. But did you notice in the message how that cookie will be read by any server that's part of Wired.com? We will come back to that part later.
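Here is roughly how that "any server in the domain wired.com" rule works, as a simplified Python sketch. It follows the domain-matching rule of the current cookie standard (RFC 6265) rather than Netscape's own 2001 code, and the host names news.wired.com and notwired.com are made up for the example.

```python
def domain_matches(host: str, cookie_domain: str) -> bool:
    """Simplified cookie domain match (after RFC 6265, section 5.1.3).

    A cookie for "wired.com" goes to wired.com itself and to any host that
    ends in ".wired.com". The leading dot some browsers store (".wired.com")
    is ignored. IP addresses and public-suffix checks are not handled here.
    """
    host = host.lower()
    domain = cookie_domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


for host in ["www.wired.com", "news.wired.com", "wired.com", "notwired.com", "lubid.lycos.com"]:
    print(f"{host:16} gets the wired.com cookie: {domain_matches(host, 'wired.com')}")
```

The dot in `"." + domain` is what keeps a look-alike name such as notwired.com from receiving the cookie.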
Let's now save the HTML code of the web page and look at it. To do that in Netscape, I go to File -> Save As (or Ctrl+S) -> Save. The name of the page is: technology.html
Oh, wait, while talking to you, another connection appears, so let's hurry and look at it by refreshing the dashboard window again. The new connection is connection number four:
Executable    State          Remote                   Local  Port  Sent  Received
NETSCAPE.EXE  Connected/Out  a1112.g.akamai.net:http  myPC   2372  371   503
NETSCAPE.EXE  Connected/Out  a1112.g.akamai.net:http  myPC   2373  368   582
NETSCAPE.EXE  Connected/Out  lubid.lycos.com:http     myPC   2374  350   419
NETSCAPE.EXE  Ctd/UNKNOWN    local host               myPC   0     0
It stays active for a second and then it's gone. Hehe, that was just an ad Wired was trying to get by me, but I'm too clever for them, and I simply threw it right back into their faces using my hosts file. That's what local host means. I will talk about the hosts file at the end of this article.
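A quick word on why the hosts file trick works: on a typical setup your PC looks a name up in the hosts file before asking a DNS server, so a line that maps an ad server's name to 127.0.0.1 (the local host) sends the browser's request straight back to your own machine. The Python sketch below reads hosts-file text and reports whether a name is blocked that way. The entries are hypothetical examples (the DoubleClick names show up later in this article), and the "first entry wins" rule is an assumption noted in the code.

```python
# Hypothetical hosts-file contents for illustration only.
HOSTS_TEXT = """\
# Lines starting with '#' are comments
127.0.0.1   localhost
127.0.0.1   ad.doubleclick.net
127.0.0.1   ln.doubleclick.net  m.doubleclick.net   # several names per line
"""

LOOPBACK = {"127.0.0.1", "::1"}


def parse_hosts(text: str) -> dict:
    """Map each host name to the address the hosts file gives it."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Assumption: if a name is listed twice, the first entry is used.
            mapping.setdefault(name.lower(), address)
    return mapping


def is_blocked(host: str, mapping: dict) -> bool:
    """True if the hosts file sends this name back to the local machine."""
    return mapping.get(host.lower()) in LOOPBACK


if __name__ == "__main__":
    hosts = parse_hosts(HOSTS_TEXT)
    for name in ["m.doubleclick.net", "www.wired.com"]:
        print(name, "blocked" if is_blocked(name, hosts) else "resolved normally")
```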
Let's continue studying. Using EditPad, I open the saved HTML code of technology.html and scroll down. Aha! There it is! Almost right at the top, in the section marked <!-- THIS IS THE NEW NAV BAR -->, I see multiple references to both the mysterious Lycos and Akamai.
Here are a few of them:
<a href="http://www.lycos.com/network/" target=_top>
and
<img src="http://a1112.g.akamai.net/7/1112/492/03312000/static.wired.com/news/images/lycos_logo_3.gif" width=116 height=19 alt="The Lycos Network" border=0>
<a href="http://www.lycos.com/">Lycos Home</a>
<a href="http://www.lycos.com/sitemap.asp">
<a href="http://my.lycos.com/">My Lycos</a>
<img src="http://a1112.g.akamai.net/7/1112">
The details of all the above gibberish don't really matter.
What's important is that they include Lycos and Akamai. Let's just mark those obvious web addresses: http://www.lycos.com/network/, http://www.lycos.com/, and http://my.lycos.com/
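Picking those addresses out by eye works, but Python's standard html.parser module can list every href and src for you, along with the host each one points to. Here is a minimal sketch run on a few of the tags quoted above; feeding it the text of your own saved page instead is left to you.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# A few of the tags quoted above.
SNIPPET = """
<img src="http://a1112.g.akamai.net/7/1112/492/03312000/static.wired.com/news/images/lycos_logo_3.gif" width=116 height=19 alt="The Lycos Network" border=0>
<a href="http://www.lycos.com/">Lycos Home</a>
<a href="http://my.lycos.com/">My Lycos</a>
"""


class LinkCollector(HTMLParser):
    """Remember (tag, url) for every href or src attribute in the page."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.found.append((tag, value))


def linked_hosts(html: str) -> list:
    """Return (tag, host, url) for each link. For img src the browser contacts
    that host right away; for a href only when you click."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [(tag, urlsplit(url).hostname, url) for tag, url in collector.found]


if __name__ == "__main__":
    for tag, host, url in linked_hosts(SNIPPET):
        print(f"<{tag}> {host}")
```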
So now it is beginning to make some sense, isn't it? Every time I go to www.wired.com/news/technology, I also connect to this bunch of other websites. Lycos.com appears to be one of the main servers for this domain. I have done some info digging previously, and I know Wired is part of the large Lycos Corporation, which also includes free web hosting services such as www.tripod.lycos.com and angelfire.lycos.com, search engines (hotbot.lycos.com), and various other "free" Internet services such as web page building tools.
Remember what my cookie said? It will be read by all the Wired (Lycos) domains, which means that if I am a frequent visitor to a few of their sites, they will have a rather detailed report of what I like to look at and what I like to do online just by tracking me with their cookies. Visiting those websites, you can see they are international, with servers in just about every major country in the world. Spider webs indeed!
Now, let's look at the Akamai part and see how they fit into this puzzle:
<img src="http://a1112.g.akamai.net/7/1112/492/03312000/static.wired.com/news/images/lycos_logo_3.gif" width=116 height=19 alt="The Lycos Network" border=0>
img src means image source. Its web address matches exactly what the dashboard window showed:
Remote                   Local  Port  Sent  Received
a1112.g.akamai.net:http  myPC   2372  371   503
a1112.g.akamai.net:http  myPC   2373  368   582
Reading the HTML Akamai code further, it becomes clear what its function is. Akamai keeps Wired images on its servers, and when we click on a Wired site, our browsers read the HTML code and also connect to the Akamai server to get the images from there. Very interesting, isn't it?
Bet you didn't know that, eh? Akamai hosts often-requested images and other data from hundreds of sites on their ring of servers scattered around the world. What's even more interesting is Akamai does all this "free of charge." How do you think they make their money, eh? I will leave that little puzzle for you to figure out.
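Notice that the Akamai address carries the original Wired location inside it: after a run of numeric path segments comes static.wired.com/news/images/lycos_logo_3.gif. The same pattern shows up in the @Guard logs further down (…/20010825/www.wired.com/news/images/mail2.gif). Judging only from those URLs, and not from any Akamai documentation, a sketch that recovers the embedded origin could look like this:

```python
from urllib.parse import urlsplit


def embedded_origin(akamai_url: str):
    """Return the origin host/path tucked inside an Akamai-style image URL.

    Assumption, inferred only from the URLs on this page: the path starts with
    a run of purely numeric segments (e.g. 7/1112/492/03312000), followed by
    the original host name and path.
    """
    segments = urlsplit(akamai_url).path.strip("/").split("/")
    for i, segment in enumerate(segments):
        if not segment.isdigit():
            return "/".join(segments[i:])
    return None  # nothing but numbers: no origin found


if __name__ == "__main__":
    url = ("http://a1112.g.akamai.net/7/1112/492/03312000/"
           "static.wired.com/news/images/lycos_logo_3.gif")
    print(embedded_origin(url))
```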
Going through the HTML code, I see numerous references to Akamai. Just for the fun of it, I count them and come up with 36 times the Akamai server got contacted to serve an image to me. Doing the same for Lycos, I find 33 references.
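Counting references by hand gets old fast. A minimal sketch with a regular expression and collections.Counter tallies the absolute http:// addresses in a page by organization, using a crude last-two-labels rule (akamai.net, lycos.com) that is an assumption and misfires on country domains. The sample is three of the tags quoted earlier; run it on your own saved page to get your own tallies, which may differ from a hand count depending on what you count as a "reference".

```python
import re
from collections import Counter

# Matches the host part of any absolute http:// or https:// address.
URL_HOST = re.compile(r"https?://([A-Za-z0-9.-]+)")

SAMPLE = """
<a href="http://www.lycos.com/">Lycos Home</a>
<a href="http://my.lycos.com/">My Lycos</a>
<img src="http://a1112.g.akamai.net/7/1112">
"""


def references_by_domain(html: str) -> Counter:
    counts = Counter()
    for host in URL_HOST.findall(html):
        # Crude assumption: the last two labels identify the organization.
        counts[".".join(host.lower().split(".")[-2:])] += 1
    return counts


if __name__ == "__main__":
    for domain, n in references_by_domain(SAMPLE).most_common():
        print(domain, n)
```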
Let's now look at my @Guard's logs and see what extra info we can dig from them. Here is @Guard's Web History Event Log, showing more sites my browser made a connection with:
8/25/01 10:47:17.227 http://lubid.lycos.com/one.asp?site=wired.lycos.com&ord=825356
8/25/01 10:46:56.857 http://www.wired.com/news/technology/
As you can see, the ord=825356 part of ?site=wired.lycos.com&ord=825356 seems to start with the date (8/25), but I'm not sure what the rest means.
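The part after the ? is an ordinary query string, so Python's urllib.parse can split it into names and values. This only takes the URL apart; what lubid.lycos.com actually does with site and ord can't be seen from the outside.

```python
from urllib.parse import urlsplit, parse_qs

LOGGED = "http://lubid.lycos.com/one.asp?site=wired.lycos.com&ord=825356"

parts = urlsplit(LOGGED)
params = parse_qs(parts.query)  # every value comes back as a list

print(parts.hostname, parts.path)  # lubid.lycos.com /one.asp
for name, values in params.items():
    print(f"{name} = {values[0]}")  # site = wired.lycos.com, then ord = 825356
```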
Here is @Guard's Web Connections Event Log, showing the sites my browser made a connection with:
8/25/01 10:47:16.510 Connection: www.wired.com: http from [myPC]: 2368, 283 bytes sent, 43118 bytes received, 22.053 elapsed time
2368 is the port my PC used, 283 is the number of bytes my PC sent, 43118 is the number of bytes my PC received, and 22.053 is how long the connection lasted (presumably in seconds).
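Those numbers also say something about the connection speed: dividing bytes received by elapsed time gives the effective download rate. Here is the arithmetic as a tiny Python sketch that pulls the values out of the log line. The regular expression is written for this one line only, and treating the elapsed time as seconds is an assumption.

```python
import re

LOG_LINE = ("8/25/01 10:47:16.510 Connection: www.wired.com: http from [myPC]: 2368, "
            "283 bytes sent, 43118 bytes received, 22.053 elapsed time")

# Assumption: this pattern fits this one line; other @Guard log lines
# may be laid out differently.
PATTERN = re.compile(r"(\d+) bytes sent, (\d+) bytes received, ([\d.]+) elapsed time")

match = PATTERN.search(LOG_LINE)
sent, received, elapsed = int(match.group(1)), int(match.group(2)), float(match.group(3))

# Assumption: the elapsed time is in seconds.
bytes_per_second = received / elapsed
kbps = bytes_per_second * 8 / 1000  # 8 bits per byte

print(f"{bytes_per_second:.0f} bytes/s, about {kbps:.1f} kbps")
```

With these numbers it prints 1955 bytes/s, about 15.6 kbps, less than half of the 33,600 bps modem mentioned further down. Since the elapsed time may include moments when the connection sat idle, this understates what the line itself can do.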
Most eye opening is the Privacy Event Log, showing just about every connection established while the web page's data (the images) was being transferred:
8/25/01 10:47:16.630 Allowed User-Agent: Mozilla/4.08 [en] (Win95; U;Nav) sent to http://lubid.lycos.com/one.asp?site=wired.lycos.com&ord=825356
8/25/01 10:47:16.630 Blocked Referer: http://www.wired.com/news/technology/ sent to http://lubid.lycos.com/one.asp?site=wired.lycos.com&ord=825356
8/25/01 10:47:16.623 Allowed User-Agent: Mozilla/4.08 [en] (Win95; U;Nav) sent to http://a1112.g.akamai.net/7/1112/492/20010825/www.wired.com/news/images/mail2.gif
8/25/01 10:47:16.623 Blocked Referer: http://www.wired.com/news/technology/ sent to http://a1112.g.akamai.net/7/1112/492/20010825/www.wired.com/news/images/mail2.gif
8/25/01 10:47:16.547 Allowed User-Agent: Mozilla/4.08 [en] (Win95; U;Nav) sent to http://a1112.g.akamai.net/7/1112/492/20010825/www.wired.com/news/images/w_button.gif
8/25/01 10:47:16.547 Blocked Referer: http://www.wired.com/news/technology/ sent to http://a1112.g.akamai.net/7/1112/492/20010825/www.wired.com/news/images/w_button.gif
8/25/01 10:46:54.478 Allowed User-Agent: Mozilla/4.08 [en] (Win95; U;Nav) sent to http://www.wired.com/news/technology/
Oops, I guess I told @Guard to block the Referer header on a few of those requests, hehe. Oh well...
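What @Guard "Blocked" in those entries is the Referer header, the part of each request that tells the server which page you were on (here, http://www.wired.com/news/technology/). Below is a minimal Python sketch of that kind of filter. It works on a plain dictionary of request headers rather than any real firewall's API, and its rule (strip Referer when the request goes to a different organization, judged by the last two labels of the host name) is a hypothetical policy, not @Guard's actual one.

```python
from urllib.parse import urlsplit


def organization(url: str) -> str:
    # Crude assumption: the last two labels of the host name identify the
    # organization (wrong for names like example.co.uk).
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def strip_third_party_referer(url: str, headers: dict) -> dict:
    """Return a copy of the request headers without Referer when the request
    goes to a different organization than the page you are reading.

    Hypothetical policy for illustration; @Guard's real rules may differ.
    Header names are matched exactly ("Referer") to keep the sketch short,
    although HTTP itself treats them case-insensitively.
    """
    cleaned = dict(headers)
    referer = cleaned.get("Referer")
    if referer and organization(referer) != organization(url):
        del cleaned["Referer"]
    return cleaned


if __name__ == "__main__":
    headers = {
        "User-Agent": "Mozilla/4.08 [en] (Win95; U;Nav)",
        "Referer": "http://www.wired.com/news/technology/",
    }
    ad_request = "http://lubid.lycos.com/one.asp?site=wired.lycos.com&ord=825356"
    print(strip_third_party_referer(ad_request, headers))
```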
Now, let's try accessing the exact same site again, but this time with the @Guard firewall turned off, just to see if anything different happens. I will again be using Netscape, so I can watch the connections as they appear on Netscape's status bar in the lower-left corner.
I go through the same steps and keep a constant eye on the bottom left part of Netscape. This time, along with the expected Akamai and Lycos I notice something different, something I haven't seen before:
Connect: Contacting Host: ln.doubleclick.net/ad...
Transferring data from: http://ln.doubleclick.net/ad...
Connect: Contacting Host: ln.doubleclick.net/ad...
Transferring data from: http://ln.doubleclick.net/ad...
Connect: Contacting Host: ln.doubleclick.net/ad...
Transferring data from: http://ln.doubleclick.net/ad...
then:
Connect: Contacting Host: ad.doubleclick.net/ad...
Transferring data from: http://ad.doubleclick.net/ad...
Connect: Contacting Host: ad.doubleclick.net/ad...
Transferring data from: http://ad.doubleclick.net/ad...
and finally:
Connect: Contacting Host: m.doubleclick.net/ad...
Transferring data from:
Connect: Contacting Host: m.doubleclick.net/ad...
Transferring data from:
Connect: Contacting Host: m.doubleclick.net/ad...
Transferring data from:
The connections last for one or two seconds at most.
(Note: Here is a secret I failed to mention before. I run on a painfully s-l-o-w 33,600 bps modem connection, which lets me observe everything that happens in kinda slow motion. People using 56 kbps modems, DSL, cable, or T1 lines won't be able to see what I see, because everything happens very fast for them. This is one instance where slow speed pays off!)
Intrigued, I go back to the technology.html file and search for the ln.doubleclick.net string first and, again, I find numerous references such as:
<a href="http://ln.doubleclick.net/jump/wn.ln/technology;h=net;sz=468x60;ptile=1;pos=1;!category=adult;ord=2215222830?" target=_top>
and
<img height=60 src="http://ln.doubleclick.net/ad/wn.ln/technology;h=net;sz=468x60;ptile=1;pos=1;!category=adult;ord=2215222830?">
How interesting! Besides connecting to ln.doubleclick.net, they also send images (<img height=60 src=...) from their server http://ln.doubleclick.net/ad/wn.ln to my PC.
Care to guess what kind of images those might be? Well, DoubleClick is notorious for its ads! In fact, a big stink was raised last year when it was found out that they had begun combining their ads with cookies, thus tracking and compiling detailed reports on everyone who is stupid enough to even click on an ad. Just for the fun of it, I again counted how many times my browser had to connect to doubleclick.net to receive all the images. This time it was only seven times. Well, I guess that's better than 36 times! Yeah, right!
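The DoubleClick address packs its settings into the path, separated by semicolons rather than the usual ? and &. A small Python sketch can split them apart. Only sz=468x60 can be checked against the page itself: it agrees with the height=60 in the img tag, and 468x60 is the classic full-size banner. What h, ptile, pos, !category and ord mean is not visible from here, so the code simply lists them.

```python
from urllib.parse import urlsplit

AD_URL = ("http://ln.doubleclick.net/ad/wn.ln/technology;h=net;sz=468x60;"
          "ptile=1;pos=1;!category=adult;ord=2215222830?")


def ad_tag_params(url: str):
    """Split a semicolon-style ad URL into its leading path and key=value pairs.

    urlsplit keeps the semicolons in the path; the trailing "?" just starts
    an empty query string.
    """
    path = urlsplit(url).path
    section, *pairs = path.split(";")
    params = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        params[key] = value
    return section, params


if __name__ == "__main__":
    section, params = ad_tag_params(AD_URL)
    print(section)
    for key, value in params.items():
        print(f"  {key} = {value}")
    width, height = (int(n) for n in params["sz"].split("x"))
    print("banner size:", width, "x", height)
```

The jump address in the <a> tag has the same layout, with /jump/ in place of /ad/.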
Let's play with the DoubleClick ad now and see if we can learn anything interesting from it. On the web page I run my mouse over it and carefully watch Netscape's status bar. Here is what I get:
http://ln.doubleclick.net/click;3215854;0-0;1;3630096;1-468|60;0|0|0;;%3f http://music.lycos.com/features/pd...
...and the rest of the address runs off the right edge of my screen. Again that Lycos appears, eh?
Almost like it's following us everywhere we wanna go! Wanna grab the whole string from the HTML code? Betcha million bux I can find it in there, hehe. No? Didn't think so either.
What the hell, I say, let's click on it and see what happens and where it leads us. Immediately, I begin to see the same messages as before, over and over and over again:
Connect: Contacting Host: ln.doubleclick.net/ad...
Transferring data from: http://ln.doubleclick.net/ad...
and then I am sent to http://music.lycos.com/features/pdiddy/.
I guess Lycos is in the music biz too, selling/giving away free MP3s, etc. with that music.lycos.com website. I patiently wait until the page has loaded. Then since I don't care to get any P. Diddy material, I use the Back button to go to the original Wired page. And the ad has now changed. Hmmm...
Since I simply love punishment, I again click on the ad, and now I am sent to:
http://www-3.ibm.com/e-business/lp/innov3/innov3_flat.html?formId=15&P_Site=S03&P_Campaign=101C4E02&P_Creative=koustuv&c=Innovations_W3&n=koustuv&r=lycos&t=ad&P_Vanity=
And when I go back to Wired, I am not surprised to see that the ad has changed again.
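The IBM address is another long query string, and a short Python sketch can list the parameters whose values mention a given name, a quick way to spot where the referral came from. The names r, t, P_Campaign and P_Creative read like referrer, type and campaign tags, but that reading is a guess; the code only reports what is in the URL.

```python
from urllib.parse import urlsplit, parse_qsl

IBM_URL = ("http://www-3.ibm.com/e-business/lp/innov3/innov3_flat.html?formId=15&P_Site=S03"
           "&P_Campaign=101C4E02&P_Creative=koustuv&c=Innovations_W3&n=koustuv&r=lycos"
           "&t=ad&P_Vanity=")


def params_mentioning(url: str, needle: str) -> list:
    """List the (name, value) query parameters whose value contains needle."""
    query = urlsplit(url).query
    # keep_blank_values=True so empty parameters such as P_Vanity= are kept
    pairs = parse_qsl(query, keep_blank_values=True)
    return [(name, value) for name, value in pairs if needle.lower() in value.lower()]


print(params_mentioning(IBM_URL, "lycos"))  # [('r', 'lycos')]
```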
Noticed all those Lycos references all over the place in all the URL links? Finally, I check the cookie file in the C:\Program Files\Netscape\default\ folder. Here is the full text of the cookie I allowed in earlier:
.lycos.com TRUE / FALSE 2147403541 lubid 010000508BD395FD04483A B11D7000BD0D1400000000
There are those Lycos and Lubid names yet again. Funny, eh? Lycos, Lycos, Lycos, Lycos, everywhere, even if it was a Wired cookie!
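That line is in Netscape's cookies.txt format: the domain, a TRUE/FALSE flag for whether every host in the domain may read the cookie, the path, a TRUE/FALSE secure flag, the expiry time in seconds since January 1, 1970 (UTC), and then the cookie's name and value. Here is a minimal Python sketch that decodes it. The fields are tab-separated in the real file; the sketch splits on any whitespace and keeps everything after the name as the value, space included.

```python
from datetime import datetime, timezone

LINE = (".lycos.com TRUE / FALSE 2147403541 lubid "
        "010000508BD395FD04483A B11D7000BD0D1400000000")


def parse_cookie_line(line: str) -> dict:
    """Decode one cookies.txt line into named fields."""
    domain, subdomains, path, secure, expires, name, value = line.split(None, 6)
    return {
        "domain": domain,
        "include_subdomains": subdomains == "TRUE",
        "path": path,
        "secure": secure == "TRUE",
        "expires": datetime.fromtimestamp(int(expires), tz=timezone.utc),
        "name": name,
        "value": value,
    }


cookie = parse_cookie_line(LINE)
print(cookie["name"], "expires", cookie["expires"].isoformat())
```

It prints `lubid expires 2038-01-18T04:59:01+00:00`, about a day short of January 19, 2038, the latest moment a signed 32-bit Unix timestamp can represent.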
Let's review everything we have learned so far: When we click on an ordinary web page to access it, our browser reads the HTML code of that web page and most likely it also opens numerous other short-lived back door connections to various other web servers which contain the images and the ads for the original website.
Usually, an average web page will contact between four and nine other servers and get data from them. The most common (the ones I know of) are Akamai, which "serves" images, and DoubleClick, which serves both ads (in the form of images) and cookies embedded into the ads. All of this surreptitious activity can easily be spotted with a good firewall and a bit of patience.
Are you starting to feel a little uncomfortable now, seeing all these "behind the scenes" activities happening just to read one lousy web page? Personally, all that connecting to multiple servers and sending and receiving data from/to them makes me highly annoyed because I know exactly what DoubleClick and Akamai do. Numerous articles have already been written about DoubleClick, so I don't have to repeat them here.
To summarize: to survive the collapse of the NASDAQ, most commercial bastards on the Internet have been trying to find various new ways to make money. They throw as many ads at us as possible and try to compile a very detailed record of all of our online activities using cookies, ads, web bugs, Java, JavaScript, and other known and unknown means.
Internet companies serving "content" (be it news, information, etc.) get into contracts with sleazebags such as DoubleClick, Akamai, and others, and create databases out of every bit of information they can squeeze about you and your surfing habits. Do you know how many people are monitoring, logging, classifying everything you are doing online right now? Isn't privacy important to you?
Personally, I say that anyone who monitors you without your permission is your enemy. I say we must fight them with everything we got including but not limited to: knowledge of how our PCs and all of our software work, a good firewall, and last but not least our brains!
Don't kid yourself: Those clowns don't have any shame or remorse. All the very juicy information they collect about you is later sold for a lot of money to different companies that may be interested in this kind of stuff (trust me, there are a lot).
Go ahead and check what your favorite web page is doing behind your back. Betcha you will be surprised.