Your Eyes Have Just Been Sold
by docburton
I read angelazaharia's article "Behind the Scenes on a Web Page" in 18:4 and thought it would be a good idea to add what I know about ad serving and DoubleClick specifically.
I used them for my ads for a while and was amazed at how simple yet violating their technology can be. Also, you should check out the manual floating around with instructions on how to work every machine and some other goodies. So do some research for the specifics.
In this article I'll give you an overview of what DoubleClick does, how they do it, and some of the potential dangers and weaknesses.
The Business
A web company puts a few lines of HTML on their page that point to ad.doubleclick.net and wham! The user's attention has been sold to the highest bidder.
Those "tags" allow either DoubleClick or the web company, if they choose, to exploit their users in a variety of ways. The most obvious is the "targeted" ad, which allows them to sell space either very specifically ("Searched for the 49ers at 10 am and lives in the 94111 ZIP Code") to very broadly ("one of the sports sites") depending on who's buying.
The main technology that is used throughout the company is called Dynamic Advertising, Reporting, and Targeting (DART). Other ad serving products that we looked at work similarly to this so I'll be more big-picture with this article but still give you the guts of how it works. Also, they sometimes use other technologies for collecting suckers since they've taken over all of their competition. But DART is the most prevalent and the one we'll concentrate on.
There's also an email business that uses some of the techniques below, but email clients support GIFs, JPEGs, Flash, and the like less consistently than browsers do, so making and tracking an ad is much more difficult. However, as we'll see later, privacy goes out the window far more easily than with just a browser.
The Structure of the Ad
When you go to your favorite web page there are several calls made in order to gather all of the information.
The first is obviously the HTML of the page itself, and next come all of the images and objects needed to complete the picture. Out of laziness or ignorance the ads usually go to another domain.
In our case it was ad.doubleclick.net. This opens up a whole mess of issues on privacy since images bring with them cookies (a big mistake made during the protocol creation days) and two companies who've never heard of each other can accidentally share information about their users just because they both use the same third party. Not to mention, you asked the site for info about a certain topic and, next thing you know, they've commoditized your "eyeballs."
The most basic style of ad is the link/image combo. This allows for JPEGs and GIFs only and is composed of an <a href=> and an <img src=>. What you will usually see is an animated GIF somewhere on the page that can click through to a URL (first passing through DoubleClick).
The other styles of tags all allow "rich media," meaning they can add HTML, JavaScript, Java, etc. into that banner. How do they do this? By nesting HTML in such a way that your browser will pick out the most sophisticated ad it can handle. (Sometimes they'll sniff what browser you have and just give you a tag that your browser can understand.)
I'm not going to show you every type so grab the source whenever you see an ad that's more than a simple image and you'll see what I mean. The other tags are made up of: <iframe> calls that only Internet Explorer 4 and above will use; JavaScript calls for Netscape 3 and above; <ilayer/layer> calls for Netscape 4 and above; and <frame> calls for just about every browser.
They are nested so that if, for example, you don't understand JavaScript, then your browser will pick up the <noscript> section with just a link and image. What this means is that through the trick of a layer, <iframe>, JavaScript, or <frame> that ad will be much more powerful, and usually more annoying.
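To make the nesting concrete, here is a minimal sketch that assembles a two-level tag: a JavaScript call with a <noscript> link/image fallback. The function name, the example site string, and the exact layout are my own assumptions for illustration, not a copy of a real DoubleClick tag. The adj, jump, and ad path keywords are the request types covered in the URL breakdown later in this article.

```python
from html import escape

AD_HOST = "http://ad.doubleclick.net"

def nested_tag(site_zone: str, size: str, ord_value: str) -> str:
    """Build a hypothetical nested tag: script first, link/image as fallback."""
    tail = f"{site_zone};sz={size};ord={ord_value}?"
    # Escape the URLs so they are safe inside HTML attributes.
    adj = escape(f"{AD_HOST}/adj/{tail}", quote=True)
    jump = escape(f"{AD_HOST}/jump/{tail}", quote=True)
    img = escape(f"{AD_HOST}/ad/{tail}", quote=True)
    return (
        f'<script src="{adj}"></script>\n'
        # Browsers without JavaScript fall through to the plain link and image.
        f'<noscript><a href="{jump}"><img src="{img}"></a></noscript>'
    )

print(nested_tag("sports.lycos/baseball", "468x60", "12345"))
```

A real tag would likely wrap this in further <iframe> or layer levels, but the fallback idea is the same at every level.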
The Call
I'll now give you a loose structure of the network behind every ad delivery.
There are two main types of servers used: ad servers and media servers, with hundreds or thousands of each around the world living usually in data centers. That initial call to ad.doubleclick.net ends up at a dispatcher who will then pick the ad server that is closest to you based on your IP and their network map. (You may have noticed that you get a ping every now and then from DoubleClick. What they are trying to do is figure out where you are and test the fastest times to their servers.)
The ad server will then take all of the info about you and what you are doing (don't worry, we'll go into detail about that later) and decide which ads to ram down your throat. Once that's decided it will pass the connection along to a media server which has all of the images, HTML, class files, and other objects to form the ad.
With rich media the mouseover will often contain m.doubleclick.net followed by a complicated string, since the ad has already been chosen and resolved.
The media servers work similarly to Akamai's servers. They are basically a lot of computers sitting "at the edge of the net" that will send you an image or whatever, usually before the rest of the page loads. All of this happens (including ad selection and delivery) within a few milliseconds, which is why someone like Wired would be tempted to give their ad space or even their regular images away to another company to deliver.
What They Know and the Cookie
There isn't too much info in the cookie, even though it is usually the source of privacy flare-ups, and blocking it only somewhat helps.
Some of the things you'll find in it are an ID (since cookies are attached to browsers, a different browser seems like a different user) and sometimes info about which ads you saw recently (in case there is a limit on how many times you should see a certain ad, or an order that they want you to see the ads in).
The ID is the most important part. When you first come across a DoubleClick ad you are assigned this ID, they record your IP, and begin the process of looking you up. Then when you return some time later and send that ID with your ad request they'll be able to tell where you are in detail.
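Here is a rough sketch of the two jobs the cookie does: carrying an ID and remembering which ads you've seen so a cap can be enforced. The cookie is modeled as a plain dict, and the field names ("id", "seen") and the cap of 3 are assumptions of mine, not DoubleClick's actual cookie format.

```python
import uuid
from typing import Dict, List, Optional

def ensure_id(cookie: Dict) -> str:
    """Assign an ID on first contact; return the same one on later visits."""
    if "id" not in cookie:
        cookie["id"] = uuid.uuid4().hex
    return cookie["id"]

def pick_ad(cookie: Dict, candidates: List[str], cap: int = 3) -> Optional[str]:
    """Return the first candidate ad this browser has seen fewer than `cap` times."""
    seen = cookie.setdefault("seen", {})
    for ad in candidates:
        if seen.get(ad, 0) < cap:
            seen[ad] = seen.get(ad, 0) + 1
            return ad
    return None  # every candidate has hit its cap

cookie = {}
print(ensure_id(cookie))
print([pick_ad(cookie, ["ford", "chevy"], cap=2) for _ in range(5)])
```

Note that deleting the cookie just resets the ID and the counts. It does nothing about the IP lookup described next.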
How do they do it?
They claim a variety of ways but the main one is taking your IP and reversing it to find the domain it came from. Now, that domain can tell you a lot of info. Are you using a small provider that you thought was so private?
Well, they'll just look them up and see that they are in Nowheresville, U.S.A. (pop. 9) which means ZIP Code 12345 and area code 678. Think that using a major ISP is any better? Nope. Chances are they've reserved a range of IPs for that local phone number you just dialed into and you're in the same boat. So now DoubleClick has your provider, country, state, city, ZIP code, and area code.
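The reverse lookup itself takes only a few lines. This is a sketch under assumptions: it uses the standard library's socket.gethostbyaddr, and the "keep the last two labels" rule for guessing the provider's domain is a naive shortcut of mine that fails on names like example.co.uk. What database they match the domain against is not something I can show.

```python
import socket
from typing import Optional

def domain_guess(hostname: str) -> str:
    """Naive guess at the provider's domain: keep the last two labels."""
    labels = hostname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])

def reverse_domain(ip: str) -> Optional[str]:
    """Reverse an IP to a hostname, then reduce it to a domain guess."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:
        return None  # no PTR record, or the lookup failed
    return domain_guess(hostname)

print(domain_guess("dialup-12.sfo.example.net"))
```

From there it is a table lookup from domain (or IP range) to provider, city, ZIP code, and area code.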
Surfing at work? Even better for them. Not only do they know where that company is located, but what industry it's in, how many employees work there, the size of its annual revenue, etc. All of this is public info but they just have the department to put it all together and exploit you when you're trying to research Cap'n Crunch cereal. Think because you blocked cookies that you're safe? Well, they'll just look up your IP then, and get most of the same info.
The last major piece is all the stuff in the header of your ad request. The browser type and version, operating system, date, and time of day may all be of interest to an advertiser. Running Opera on Red Hat Linux? I'm sure Microsoft would love to send some offers your way. All these things are easily sold to advertisers by checking a box.
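Pulling the browser and OS out of the request header is crude string matching. The sketch below is my own simplified version with a handful of made-up rules. Real User-Agent sniffing needs many more cases, and the sample strings are only illustrative.

```python
def sniff(user_agent: str) -> dict:
    """Very rough browser/OS guess from a User-Agent header."""
    ua = user_agent.lower()
    # Order matters: Opera and IE both tend to include "Mozilla" in the string.
    if "opera" in ua:
        browser = "opera"
    elif "msie" in ua:
        browser = "msie"
    elif "mozilla" in ua:
        browser = "netscape"
    else:
        browser = "unknown"
    if "windows" in ua:
        os_name = "windows"
    elif "linux" in ua:
        os_name = "linux"
    elif "mac" in ua:
        os_name = "mac"
    else:
        os_name = "unknown"
    return {"browser": browser, "os": os_name}

print(sniff("Opera/6.0 (Linux 2.4.18 i686; U)"))
```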
What the Website Told Them About You
Now let's break down the long string that comes after: ad.doubleclick.net/
The first is the type of ad requested between the two slashes. For the link/image combo you'll see "jump" meaning it's a link so send back a redirect and "ad" meaning only send an image.
The others allow you to have rich media (JavaScript, HTML, pop-ups) and there are four of them: "adi" means send whatever the ad is in the form of an <iframe> (including wrapped images); "adj" means send that ad in JavaScript form (a bunch of document.writes); "adl" puts it in a layer; and "adf" requests the ad in a frame wrap.
It's the way that you ask for an ad using this code that will determine what form the ad takes.
Now for the fun stuff.
The next string before the slash tells the ad server which company's ad bank (and often what section of their site) to grab the ads from.
Sometimes it's obvious who they are (lycos.com) or it's a subsidiary (wn.ln I assume stands for Wired Network at Lycos Network) that ultimately points back to the larger company. Other times it's more specific (sports.lycos) or cryptic (sp.ln).
I've noticed that the naming conventions vary but are usually straightforward. This is the first narrowing of the pool of ads.
After that you have the second major category between the slash and the semi-colon, such as: /baseball;
Then the scary part begins. From that point on the company can stick anything in there in the form of: this=that
I used CGI to put demographic and page info into the tags (all for a paycheck). Searching for "cars" and got a Ford ad? That's because it says search=cars in the DoubleClick URL.
Watch out if you're registered on a site because they could put gender=female or g=f or even x=1 all to say that you're a lady.
Use your imagination as to what kind of privacy you can lose when thousands of the sites you go to all store this info in the same set of DoubleClick servers.
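On the website's side, stuffing this info into the tag is trivial. A minimal sketch, assuming the key-values are joined with semicolons (which matches the /baseball; and ;ord= patterns in this article) and that values get URL-encoded. The function name and the sample keys are hypothetical.

```python
from urllib.parse import quote

def build_tag(ad_type: str, site: str, zone: str, targeting: dict) -> str:
    """Assemble an ad request URL from the site, zone, and this=that pairs."""
    pairs = ";".join(
        f"{quote(str(key), safe='')}={quote(str(value), safe='')}"
        for key, value in targeting.items()
    )
    url = f"http://ad.doubleclick.net/{ad_type}/{site}/{zone}"
    return f"{url};{pairs}" if pairs else url

# Whatever the site knows about you can ride along in the request.
print(build_tag("adj", "sports.lycos", "baseball", {"search": "cars", "g": "f"}))
```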
To be complete, you also almost always have sz= for the size of the ad, some version of tile= for the number of the ad on the page (to avoid redundancy or to send more than one together), and !category= to avoid certain types of ads such as adult.
The line ends with ord=[some number]? where the number is a random value that might be generated per page per person. I used a timestamp, but the number itself is meaningless: all it does is make sure that your browser (or proxy server) isn't going to cache the ad and that, when you click, the click goes with the correct ad.
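Putting the pieces together, you can pull apart any tag you find in a page's source. This is a sketch that assumes the layout described above (type, site, zone, then semicolon-separated pairs ending in a question mark). The sample URL is made up, and real tags may deviate.

```python
from urllib.parse import urlsplit

def parse_tag(url: str) -> dict:
    """Split an ad request URL into type, site, zone, and key-value pairs."""
    path = urlsplit(url).path.lstrip("/")  # the trailing "?" is not part of the path
    head, *pairs = path.split(";")
    parts = head.split("/", 2) + ["", "", ""]  # pad in case a piece is missing
    params = {}
    for pair in pairs:
        if "=" in pair:
            key, value = pair.split("=", 1)
            params[key] = value
    return {"type": parts[0], "site": parts[1], "zone": parts[2], "params": params}

sample = ("http://ad.doubleclick.net/adj/sports.lycos/baseball;"
          "search=cars;sz=468x60;tile=1;ord=12345?")
print(parse_tag(sample))
```

Run it over the tags on a few sites you're registered with and see what they are passing along about you.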
The ads, if they are rich media, can actually use any of this info in the ad itself. So, you may see an ad that says, "Hi, Bradley Peterson" or "Check out the weather in San Francisco." Be aware that rich media is sometimes just a piece of text that is blended in with the rest of the page. If your search results first go to an ad serving company then some advertiser probably bought the word you just searched for. You'll see this technique used in a lot of "advertorials."
What They Know About You
When the ad server sends your browser to a media server it counts an "impression," meaning you saw the ad.
When you click on an ad (and why would you ever want to do that?) it first goes to DoubleClick who counts the click, then gives you a 302 redirect to the advertiser's site. At that end there may or may not be "web bugs," pixel-sized clear images, to track how much farther you go into their site.
For example they may be on the product info page or the "Thanks for being a sucker and buying my crap" page. The advertiser then knows that one million people were annoyed by their ad, one thousand were stupid enough to click, one hundred almost bought their crap, and one sucker actually did. They can also find out this info based on all of the targetable stuff I mentioned above. For example, Kmart might be interested that people who search for George Bush usually buy guns.
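The counting behind those numbers can be sketched in a few lines: count the impression when the ad is served, count the click, then hand back a 302 to the advertiser. The function names, the in-memory counters, and the advertiser URL are hypothetical. The real system obviously logs far more per event.

```python
from collections import Counter

impressions = Counter()
clicks = Counter()

def serve(ad_id: str) -> None:
    """Count an impression when the ad is handed to the browser."""
    impressions[ad_id] += 1

def click(ad_id: str, destinations: dict):
    """Count the click, then redirect to the advertiser with a 302."""
    clicks[ad_id] += 1
    return 302, {"Location": destinations[ad_id]}

def click_rate(ad_id: str) -> float:
    """Clicks divided by impressions for one ad."""
    shown = impressions[ad_id]
    return clicks[ad_id] / shown if shown else 0.0

for _ in range(1000):
    serve("ad42")
print(click("ad42", {"ad42": "http://advertiser.example/landing"}))
print(click_rate("ad42"))
```

With the figures above (one thousand clicks on one million impressions) the click rate also works out to 0.001, or one tenth of one percent.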
In their network business they'll make a lot of use out of the web bugs. They know that you went to a website about sports and later, when you're on a site about cars, they'll show you a sports ad. It can get very specific, like seeing an ad for a scanner you were looking at a week ago on a totally different site.
To sum up, the info that is recorded about you (and targeted) whenever you see an ad is: any search words you used; domain and type (.edu, etc); your industry; your company's size; demographic info you've given; your geography; time and day; browser info; service provider; OS; and section of the site you're in.
More Evil in the Future
DoubleClick bought the largest junk-mail company in the world and is trying to combine what these scum know about you (just about every credit card purchase, what telemarketers you responded to, etc.) with everything DoubleClick knows about you online.
The way they'll do this is by using a web bug on pages where you input personal information. Check those pages where you put in your credit card, address, SSN, or even last name, for a DoubleClick pixel.
They may be linking up the ID in the cookie with the entire junkmail database. They'll then use your info to give you a really targeted ad. Bought a printer at the RadioShack around the corner recently? Well, now you're going to get a lot of ads for ink cartridges.
Email is the most susceptible, and they have a huge spam business. Very often you give your name and address when you sign up for email lists, register with a company, or whatever, and then your web surfing is linked to all this info. Pay no attention to their privacy statements on this: if they say they won't do something, it means that they haven't figured out how yet.
Potential Weaknesses
One of the weaknesses of the tags is found when they use JavaScript to deliver rich media.
If the ad servers go down or are slow, the entire page will be frozen since browsers can't render around JavaScript. A server that has a hard time seeing DoubleClick will not be able to deliver the regular page (at least for the Netscape users). We got hammered on this more than once when no one could use our site because of slow ads.
As far as DoS attacks, good luck.
They are on several different backbones and have routers that are fairly intelligent with load balancing and so on, so it will be difficult, although it has certainly been done before. Also, they supposedly eliminate from reports any spiders, bugs, etc. by using an algorithm of "too much, too fast." This doesn't mean that the ad won't be delivered, it will, just that the web company won't be charged by DoubleClick. That could be useful.
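I don't know what their actual "too much, too fast" algorithm looks like, but one plausible shape is a sliding window per client: serve every request, and flag the ones over a threshold as not billable. The class name, the threshold, and the window below are all guesses for illustration.

```python
from collections import defaultdict, deque

class TooMuchTooFast:
    """Flag requests from a client that exceed max_hits within window seconds."""

    def __init__(self, max_hits: int = 10, window: float = 1.0):
        self.max_hits = max_hits
        self.window = window
        self.hits = defaultdict(deque)  # client -> recent request timestamps

    def billable(self, client: str, now: float) -> bool:
        recent = self.hits[client]
        # Forget timestamps that have fallen out of the window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        recent.append(now)
        return len(recent) <= self.max_hits

f = TooMuchTooFast(max_hits=3, window=1.0)
print([f.billable("10.0.0.1", t) for t in (0.0, 0.1, 0.2, 0.3, 5.0)])
```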
Blocking them: there is software that basically refuses to make a call to doubleclick.net. This is effective in some ways but not all companies use the DoubleClick domain (for Public Relations reasons) and simply use IP masking. Probably a more effective way to block their ads is to sniff out the signature tag designs or look for the patterns such as: ;ord=
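Matching on the tag's shape instead of the domain could look something like this. The regular expression is my own guess at a reasonable signature (one of the request types from earlier followed somewhere by ;ord=). It will miss variants and may catch innocent URLs, so treat it as a starting point.

```python
import re

# A request type from the article, then anything up to the ;ord= cache-buster.
AD_TAG = re.compile(r"/(?:ad|adi|adj|adl|adf|jump)/[^\s\"'<>]*;ord=", re.IGNORECASE)

def looks_like_ad(url: str) -> bool:
    """True if the URL matches the tag pattern, whatever domain it uses."""
    return bool(AD_TAG.search(url))

print(looks_like_ad("http://ads.example.com/adj/sp.ln/baseball;sz=468x60;ord=123?"))
print(looks_like_ad("http://example.com/index.html"))
```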
To shut down their email business, complain to the RBL who will put their email servers (and there are a lot of them) onto a black hole list that a lot of major companies use to block spam.
Also, their reverse domain lookup isn't close to reliable. If you work for a company that is based in Canada but you're in Florida, it appears that you're in Canada. Oddly enough AOL confuses them as well, since they only use a few IPs for everyone - it all looks like it comes from a remote part of Virginia. I'm not going to tell you to switch to AOL, but rather check out that box which translates your cookie ID to an IP and then looks up your personal info. That's a big weak spot for the company.
If you work for a company that advertises anywhere on the net, chances are they use DoubleClick somewhere. Get access to that account and give DoubleClick a "rich media" ad that will do some very nasty things across a variety of sites. Most of the time the person who actually enters your ad won't even bother to look at what the code does and might even give you access to change it yourself.
So, target your audience using the above criteria and send them a JavaScript that erases or leaves a message in their DoubleClick cookie. Or tell them that you'd rather host the image (have them redirect it to you) and make some anti-DoubleClick ads (be sure to create a different image with the same name for when the complaints find their way back to you). A rich media ad is practically an entire web page so use your imagination. Some sites even give their advertisers the entire frame straight out!
More Info
I've always found Customer Support to be very helpful and they will answer most of your questions about how it all works.
Don't worry about being a client. The turnover in most Internet companies is so frequent that you can just pick a major, or better yet, minor company (just look for the ad.doubleclick.net on the page) and tell them you are a new "trafficker" or that the webmaster is out sick and you're trying to figure out how this ad stuff works.
If you're unlucky enough to work for a company that uses them, then find a way into their training class where they tell you all about how everything works, as well as what's the best way to exploit the technology and people's trust to make money. (I found the teachers to be very helpful when it came to the design of the DoubleClick network, type of routers used, etc.)
Well, I hope this gave you an overview of what online ad companies do and how they do it. It's up to us to explore their structure more (there is plenty of leaked info around) and point out to them the weaknesses in their system. Maybe throw a little civil disobedience in there too to let them know that you are not a person who is willingly exploited so that some huge company can sell you crap that you don't need.
Good Luck!
Shout outs to blabpuppet, wiccanwarrior, KarMaKid, and the rest of the Avila crew.