Cloaking and You
----------------
Written by Grandpa Hackman
Friday, 18 November 2005

It sounds so mysterious. Cloaking. Like something with ominous gray overtones. Why is it done? How is it done? What is the future of cloaking? This article will attempt to answer those questions. I won't be able to include the actual instruction booklet to tell you how to cloak, but I can tell you what it's about and how to find out more. Also, this article describes website cloaking, not the kind often used by hackers to hide themselves when going after roots.

To fully understand what it's all about, an explanation of search engine optimization is in order. Priorities. That's what it's about. Money.

If you're a website owner, you have many options available to you for advertising your site. Advertising your site is essential. After all, you may have the finest mousetrap on the planet, but if nobody knows about it, you're not going to catch any mice. There are not as many options as in "the good ole days", but there are still a lot of ways to advertise. There are "surf engines" like ts25 (one of the best for the buck), double-opt-in safelists, pay-per-click, etc. If you're hyping a website, you probably use all of these options and then some.

But it's hard to beat the "targeted traffic" you get for free from a search engine. Or should I say, "You can't beat the targeted traffic you get for free from a search engine." This is because your potential customer came to you actually looking for your product. He's 80% sold before he gets to your page. He needs and wants this widget and actually went to the trouble to search for it. Not only is this a very special customer likely to buy, buy, buy, but it didn't cost you a dime. Most other methods have some cost involved, and on top of that, the customers are not "targeted": they just happen to be interested in the subject line of your email, or your pretty webpage struck their fancy as it flashed across their screen. They still don't know that they want or need your product; they're just temporarily mesmerized. Now it's your job to "sell" them your product, which is much tougher than just supplying what they're already looking for.

OK, so now we've discussed why you would want to be listed on the search engines. What is involved in getting listed? Well, with most of them, not a whole lot. Yahoo wants $299/yr. just to "think" about listing you, with no guarantee you'll actually get listed. What a deal. Fortunately, there are ways to "trick" Yahoo into listing you without paying that outrageous graft. Most of the rest will find you with their spiders. But this can take months, and most site owners are a little more anxious than that. Again, there are ways to speed up this process, for free even. Well, it involves effort on your part, but at least no capital outlay.

And then there is the issue of ranking. It really doesn't mean much if you get listed on a search engine and you're back on page 68. Nobody is going to see you, NOBODY. So not only is it important to get listed, it is mandatory that you get a good ranking. Otherwise, your goal of making money is not going to be realized. Google states on their information pages: "Google doesn't accept payment for inclusion (known as 'paid inclusion') of sites in our index, nor for improving the rank of sites in our results." This appears to be the case. Google claims that their ranking system is based upon "a complex algorithm" designed to prevent fraud and to give ALL sites an opportunity for a good listing, regardless of financial clout.
All well and good. Then they go on to tell you: "It's also possible that we're not able to crawl your site due to technical reasons. A few of the most common ones are listed below:

- Your pages were unavailable when we tried to crawl them.
- Your pages are dynamically generated.
- You employ doorway pages.
- Your pages use frames."

And that's the problem. Except for the first instance, you may very well be employing one of those methods and have trouble getting listed. And you probably have very good reasons to write a dynamically generated page, for example. These tend to be infinitely more work to produce, and for all your labor you are rewarded with a page that Google won't list.

Which brings us full circle to cloaking. Cloaking is a method that allows you to serve one set of pages to the search engines and another set to humans, or to make other divisions according to data sent by the requester. Here's how it works: when you (or a search engine) request a page, an HTTP request is sent, and the server can see information like this:

```
GET /index.htm
HOST: www.yoursite.com
USER_AGENT: Mozilla/4.71 (Windows 98;US) Opera 5.50 [en]
REFERER: http://some_refering_site.com
REMOTE_ADDR: 123.122.111.006
```

By watching the USER_AGENT and/or REMOTE_ADDR fields, you can dynamically produce pages styled to the viewer. By keeping simple lists/databases of the known major spider IP ranges, you can easily pick them out. In the case of a search engine, you give them what they want: a page optimized with keywords, no frames, etc. In the case of the humans, you give them what they want: a spiffy-looking, easy-to-navigate page. It's a win-win. The search engines all claim to abhor this technique, but they won't absolutely ban it because they'd have to dump the most profitable pages they have.

There can be many reasons why you would want to work in the gray area of cloaking. You can use cloaking to serve appropriate pages to different regions of the globe. Or have one set of pages for dialup customers and another set for broadband visitors. Many various reasons.

SearchEngineWorld has a great forum on "search engine spider identification". At that site you can find the IP address ranges of most of the major spiders, and many other interesting facts. If you haven't made CGI pages before, you can learn to make pages that serve content according to criteria obtained from the requesting party. There's also software for sale that claims to do the cloaking for you. It's also interesting to note the number of "unknown" spiders that are out there, seemingly up to no good. Really, they're up to something, but what?

Four of the most basic and common methods of cloaking are: User Agent, IP, IP Delivery, and IP Redirection.

User Agent: This is the simplest form of cloaking. Most major search engines set the user agent variable to the name of their search engine spider. They also put the word "Mozilla" in their user agents to "fool" you into thinking that a Netscape or IE browser is hitting your site. You can easily write a script that serves the optimized page when "Mozilla" is absent from the user agent, or when it recognizes a known spider name, and serves the regular page to everyone else.

IP Cloaking: Most web servers expose a REMOTE_ADDR variable that lets you check the IP address of the site visitor. Using a list of known spider IP ranges, you can write a script that identifies a spider by its IP address and delivers HTML designed for exactly that spider. A minimal sketch of both checks follows.
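Here's what that looks like in practice: a minimal CGI sketch in Python combining the User Agent and IP checks just described. The spider names, IP prefixes, and filenames (optimized.html, human.html) are illustrative placeholders, not real, current data; pull genuine spider lists from a resource like the SearchEngineWorld forum mentioned above.

```python
#!/usr/bin/env python3
# Minimal sketch of User Agent and IP cloaking as a CGI script.
# All spider names, IP prefixes, and filenames are placeholders.

import os

# Substrings that mark known spiders in the user agent (examples only).
SPIDER_NAMES = ["googlebot", "slurp", "msnbot"]

# Hypothetical IP prefixes for known spider ranges; real ranges come
# from spider-identification resources, not from this script.
SPIDER_IP_PREFIXES = ["66.249.", "64.68."]

def is_spider():
    user_agent = os.environ.get("HTTP_USER_AGENT", "").lower()
    remote_addr = os.environ.get("REMOTE_ADDR", "")

    # User Agent cloaking: a recognized spider name, or no "mozilla"
    # at all, suggests a robot rather than a browser.
    if any(name in user_agent for name in SPIDER_NAMES):
        return True
    if "mozilla" not in user_agent:
        return True

    # IP cloaking: match the visitor's address against known ranges.
    return any(remote_addr.startswith(p) for p in SPIDER_IP_PREFIXES)

def serve(path):
    print("Content-Type: text/html")
    print()
    with open(path) as f:
        print(f.read())

if __name__ == "__main__":
    if is_spider():
        serve("optimized.html")  # keyword-rich, frame-free spider page
    else:
        serve("human.html")      # the spiffy page for real visitors
```

Drop something like this in your cgi-bin and every request gets sorted into "spider" or "human" before a single byte of the page goes out. The same test works whether you deliver the page directly or redirect, which is exactly the difference between the next two methods.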
IP Delivery: This actually describes the method of delivering the HTML once you have determined whether you are dealing with a spider or a human. It simply serves up the appropriate page on the spot.

IP Redirection: In this method, the "spider page" IS the initial doorway page. If it is indeed a spider visiting, you are done. Otherwise, the requester is redirected to a page designed for humans. (A sketch of this approach closes out the article.)

There are many outstanding resources on the web to guide you towards effective cloaking. You should also be aware that, at least publicly, the search engines frown on the practice of cloaking. For this reason, if you are caught doing it, you may be penalized. For example, Google will lower your page ranking, and some search engines will expel you from their database entirely. So if you do decide to cloak, don't get caught. Learning to cloak websites is a mighty addition to the hacker's bag of tricks.
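And since I promised a sketch of IP Redirection: here the doorway page itself is what gets served by default, and anything that doesn't look like a spider is bounced with an HTTP redirect to the page meant for people. As before, the crude is_spider() check, doorway.html, and the destination URL are placeholders, not a recipe.

```python
#!/usr/bin/env python3
# Minimal sketch of IP Redirection as a CGI script: the doorway page
# IS the spider page; humans get a 302 redirect to the real site.
# The check, filenames, and URL below are placeholders.

import os

HUMAN_URL = "http://www.yoursite.com/welcome.html"

def is_spider():
    # Crude stand-in check; a real script would use the fuller
    # user agent and IP tests sketched earlier.
    user_agent = os.environ.get("HTTP_USER_AGENT", "").lower()
    return "mozilla" not in user_agent

if is_spider():
    # A spider: serve the keyword-optimized doorway page and stop.
    print("Content-Type: text/html")
    print()
    with open("doorway.html") as f:
        print(f.read())
else:
    # A human: redirect to the page designed for people.
    print("Status: 302 Found")
    print("Location: " + HUMAN_URL)
    print()
```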