Cloaking and You
----------------
Written by Grandpa Hackman
Friday, 18 November 2005

It sounds so mysterious. Cloaking. Like something with ominous gray overtones. Why is it done? How is it done? What is the future of cloaking? This article will attempt to answer those questions. I won't be able to include the actual instruction booklet to tell you how to cloak, but I can tell you what it's about and how to find out more. Also, this article describes website cloaking, not the kind often used by hackers to hide themselves when going after roots.

To fully understand what it's all about, an explanation of search engine optimization is in order. Priorities. That's what it's about. Money.

If you're a website owner, you have many options available to you for advertising your site. Advertising your site is essential. After all, you may have the finest mousetrap on the planet, but if nobody knows about it, you're not going to catch any mice. There are not as many options as in "the good ole days", but there are still a lot of ways to advertise. There are "surf engines" like ts25 (one of the best for the buck), double-opt-in safelists, pay-per-click, etc. If you're hyping a website, you probably use all of these options and then some.

But it's hard to beat the "targeted traffic" you get for free from a search engine. Or should I say, "You can't beat the targeted traffic you get for free from a search engine." This is because your potential customer came to you actually looking for your product. He's 80% sold before he gets to your page. He needs and wants this widget and actually went to the trouble to search for it. Not only is this a very special customer likely to buy, buy, buy, but it didn't cost you a dime. Most other methods have some cost involved, and on top of that, the customers are not "targeted": they just happen to be interested in the subject line of your email, or your pretty webpage struck their fancy as it flashed across their screen. They still don't know that they want or need your product; they're just temporarily mesmerized. Now it's your job to "sell" them your product, which is much tougher than just supplying what they're already looking for.

OK, so now we've discussed why you would want to be listed on the search engines. What is involved in getting listed? Well, with most of them, not a whole lot. Yahoo wants $299/yr. just to "think" about listing you, with no guarantee you'll actually get listed. What a deal. Fortunately, there are ways to "trick" Yahoo into listing you without paying that outrageous graft. Most of the rest will find you with their spiders. But this can take months, and most site owners are a little more anxious than that. Again, there are ways to speed up this process, for free even. Well, it involves effort on your part, but at least no capital outlay.

And then there is the issue of ranking. It really doesn't mean much if you get listed on a search engine and you're back on page 68. Nobody is going to see you, NOBODY. So not only is it important to get listed, it is mandatory that you get a good ranking. Otherwise, your goal of making money is not going to be realized. Google states on their information pages: "Google doesn't accept payment for inclusion (known as 'paid inclusion') of sites in our index, nor for improving the rank of sites in our results." This appears to be the case. Google claims that their ranking system is based upon "a complex algorithm" designed to prevent fraud and to give ALL sites an opportunity for a good listing, regardless of financial clout.
All well and good. Then they go on to tell you: "It's also possible that we're not able to crawl your site due to technical reasons. A few of the most common ones are listed below:

- Your pages were unavailable when we tried to crawl them.
- Your pages are dynamically generated.
- You employ doorway pages.
- Your pages use frames."

And that's the problem. Except for the first instance, you may very well be employing one of those methods and have trouble getting listed. And you probably have very good reasons to write a dynamically generated page, for example. These tend to be infinitely more work to produce, and for all your labor you are rewarded with a page that Google won't list.

Which brings us full circle to cloaking. Cloaking is a method that allows you to serve one set of pages to the search engines and another set to humans, or to make other divisions according to data sent by the requester. Here's how it works: when you (or a search engine) request a page, an HTTP request is sent, and the server can see information like this:

```
GET /index.htm
HOST: www.yoursite.com
USER_AGENT: Mozilla/4.71 (Windows 98;US) Opera 5.50 [en]
REFERER: http://some_refering_site.com
REMOTE_ADDR: 123.122.111.006
```

By watching the USER_AGENT and/or REMOTE_ADDR fields, you can dynamically produce pages styled to the viewer. By keeping simple lists/databases of the known major spider IP ranges, you can easily pick them out. In the case of a search engine, you give them what they want: a page optimized with keywords, no frames, etc. In the case of the humans, you give them what they want: a spiffy-looking, easy-to-navigate page. It's a win-win. The search engines all claim to abhor this technique, but they won't absolutely ban it because they'd have to dump the most profitable pages they have.

There can be many reasons why you would want to work in the gray area of cloaking. You can use cloaking to serve appropriate pages to different regions of the globe. Or have one set of pages for dialup customers and another set for broadband visitors. Many various reasons.

SearchEngineWorld has a great forum on "search engine spider identification". At that site you can find the IP address ranges of most of the major spiders, and many other interesting facts. If you haven't made CGI pages before, you can learn to make pages that serve content according to criteria obtained from the requesting party. There's also software for sale that claims to do the cloaking for you. It's also interesting to note the number of "unknown" spiders that are out there, seemingly up to no good. Really, they're up to something, but what?

Four of the most basic and common methods of cloaking are: User Agent, IP, IP Delivery, and IP Redirection.

User Agent: This is the simplest form of cloaking. Most major search engines set the user agent variable to the name of their search engine spider. They also put the word "Mozilla" in their user agents to "fool" you into thinking that a Netscape or IE browser is hitting your site. You can easily write a script that serves the optimized page when "Mozilla" is absent from the user agent, or when it recognizes a known spider name, and serves the regular page to everyone else.

IP Cloaking: Most web servers expose a REMOTE_ADDR variable that lets you check the IP address of the site visitor. Using a list of known spider IP ranges, you can write a script that identifies a spider by its IP address and delivers HTML designed for exactly that spider. A minimal sketch of both checks follows.
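Here's what that looks like in practice: a minimal CGI sketch in Python combining the User Agent and IP checks just described. The spider names, IP prefixes, and filenames (optimized.html, human.html) are illustrative placeholders, not real, current data; pull genuine spider lists from a resource like the SearchEngineWorld forum mentioned above.

```python
#!/usr/bin/env python3
# Minimal sketch of User Agent and IP cloaking as a CGI script.
# All spider names, IP prefixes, and filenames are placeholders.

import os

# Substrings that mark known spiders in the user agent (examples only).
SPIDER_NAMES = ["googlebot", "slurp", "msnbot"]

# Hypothetical IP prefixes for known spider ranges; real ranges come
# from spider-identification resources, not from this script.
SPIDER_IP_PREFIXES = ["66.249.", "64.68."]

def is_spider():
    user_agent = os.environ.get("HTTP_USER_AGENT", "").lower()
    remote_addr = os.environ.get("REMOTE_ADDR", "")

    # User Agent cloaking: a recognized spider name, or no "mozilla"
    # at all, suggests a robot rather than a browser.
    if any(name in user_agent for name in SPIDER_NAMES):
        return True
    if "mozilla" not in user_agent:
        return True

    # IP cloaking: match the visitor's address against known ranges.
    return any(remote_addr.startswith(p) for p in SPIDER_IP_PREFIXES)

def serve(path):
    print("Content-Type: text/html")
    print()
    with open(path) as f:
        print(f.read())

if __name__ == "__main__":
    if is_spider():
        serve("optimized.html")  # keyword-rich, frame-free spider page
    else:
        serve("human.html")      # the spiffy page for real visitors
```

Drop something like this in your cgi-bin and every request gets sorted into "spider" or "human" before a single byte of the page goes out. The same test works whether you deliver the page directly or redirect, which is exactly the difference between the next two methods.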
IP Delivery: This actually describes the method of delivering the HTML once you have determined whether you are dealing with a spider or a human. It simply serves up the appropriate page on the spot.

IP Redirection: In this method, the "spider page" IS the initial doorway page. If it is indeed a spider visiting, you are done. Otherwise, the requester is redirected to a page designed for humans. (A sketch of this approach closes out the article.)

There are many outstanding resources on the web to guide you towards effective cloaking. You should also be aware that, at least publicly, the search engines frown on the practice of cloaking. For this reason, if you are caught doing it, you may be penalized. For example, Google will lower your page ranking, and some search engines will expel you from their database entirely. So if you do decide to cloak, don't get caught. Learning to cloak websites is a mighty addition to the hacker's bag of tricks.
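And since I promised a sketch of IP Redirection: here the doorway page itself is what gets served by default, and anything that doesn't look like a spider is bounced with an HTTP redirect to the page meant for people. As before, the crude is_spider() check, doorway.html, and the destination URL are placeholders, not a recipe.

```python
#!/usr/bin/env python3
# Minimal sketch of IP Redirection as a CGI script: the doorway page
# IS the spider page; humans get a 302 redirect to the real site.
# The check, filenames, and URL below are placeholders.

import os

HUMAN_URL = "http://www.yoursite.com/welcome.html"

def is_spider():
    # Crude stand-in check; a real script would use the fuller
    # user agent and IP tests sketched earlier.
    user_agent = os.environ.get("HTTP_USER_AGENT", "").lower()
    return "mozilla" not in user_agent

if is_spider():
    # A spider: serve the keyword-optimized doorway page and stop.
    print("Content-Type: text/html")
    print()
    with open("doorway.html") as f:
        print(f.read())
else:
    # A human: redirect to the page designed for people.
    print("Status: 302 Found")
    print("Location: " + HUMAN_URL)
    print()
```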