// Making A Digital Copy Of The Fone Book For Fun Or Pleasure //
// by nwbell //
// phile version: 1.0.0 //
// http://www.oldskoolphreak.com

INTRO: What is this, and where'd it come from?
--------------------------------------------------

First off, I ought to mention that this phile is based on the work of 813-Error (superpages_scanning.txt from oldskoolphreak.com). The original phile showed how to use output from Superpages.com's residential white pages search to make a "don't call" list for your wardialer. But I thought it would be cool to take it a step further: why not keep the names and addresses, and make myself a searchable fone book for phree?

I did this all on a Mac running OS X (if you don't have a Mac, you should - go out and buy/beg/borrow/steal one right now). Since it's a Mac, the shell commands were run in Terminal.app (duh), the find-and-replace type crap was done in TextEdit (if you're more leet than I am in straight UNIX, go ahead and do it your own way), and the database that it all went into was FileMaker (again, your choice).

If you're doing this on a *NIX box, you probably don't need me to tell you what to do. It's all pretty simple. If you're one of those punks that's using a Windoze box, 813-Error's phile also tells you where to get wget and grep for your platform, and you really ought to read it before you try this. Or you could just get a real computer ;)

Oh, and before I forget: gotta give a shoutout to Fatman from Fatcorp - remember, buddy, beige boxes don't have to be beige! (I like my yellow one)

STEP ONE: Set up wget to grab the right data
--------------------------------------------------

The Superpages site uses a certain syntax for reverse lookup requests. It is as follows:

## http://directory.superpages.com/wp/results.jsp?SRC=&STYPE=WR&PS=xx&A=npa&X=nxx&P=yyyy&PI=z

The important parts to know are:

[PS=xx]   This is the number of records per page. 60 is the maximum; I used 50 (easier to work with).
[A=npa]   This is the NPA (area code) of the number(s) you're looking up. In this case, it's 320.
[X=nxx]   This is the exchange you're looking in. My example uses 679.
[P=yyyy]  This is the last four digits of the fone number you're looking up. As far as wildcards go, using an asterisk in place of a digit is OK (ex. 99**, *001, ****).
[PI=z]    This is the page number that you wish to see. For instance, searching for 320-679-0*** would bring up around 1000 listings, and at 50 per page, that's 10 pages. So to get all the pages, I would have to specify "1", "2", ... "10" one at a time.

Also worth noting is that the site will never return more than 1000 results (that's why I broke it down into groups of 1000 numbers below).

So we'll make a textfile containing all the URLs, to avoid typing a ton of commands (I was going for the whole exchange: 10 groups of 10 pages, so that's around 100 pages!). I called it "wgetthese.txt", and added each of the 100 lines using copy-and-paste in TextEdit. No special formatting is required; just put one URL on each line:

## http://directory.superpages.com/wp/results.jsp?SRC=&STYPE=WR&PS=50&A=320&X=679&P=0***&PI=1
## http://directory.superpages.com/wp/results.jsp?SRC=&STYPE=WR&PS=50&A=320&X=679&P=0***&PI=2
## http://directory.superpages.com/wp/results.jsp?SRC=&STYPE=WR&PS=50&A=320&X=679&P=0***&PI=3
## ...
## http://directory.superpages.com/wp/results.jsp?SRC=&STYPE=WR&PS=50&A=320&X=679&P=9***&PI=9
## http://directory.superpages.com/wp/results.jsp?SRC=&STYPE=WR&PS=50&A=320&X=679&P=9***&PI=10

Next, we open a shell and grab the HTML files using wget (-i reads the URLs from the textfile, -E tacks an .html extension onto each saved page, and -w 5 waits five seconds between requests):

## wget -i wgetthese.txt -E -w 5

Then we make a folder and put all those HTML files in it (so we don't whack anything else by accident when we strip the extra formatting off them):

## mkdir files; mv *html files/

STEP TWO: Grep out all the extra crap
--------------------------------------------------

Now we have to make a textfile with the grep patterns that we will use to root out just the names, addresses, and phone numbers (actually, we don't have to do it with a textfile, but I like to). My sample is called "patterns.txt", and it looks like this:

## <B>
## <br>
## (320) 679 - 
## <div style="font-style: italic; color: #BBBBBB">

If you look at the HTML, it makes more sense. Because this isn't XML, nothing labels the fields, so the only way I can identify the name line is by the fact that it's bold (HTML tag <B>); the address line always ends with a forced line break (HTML tag <BR>), and the "no address available" message is always formatted a certain way (HTML <div> tag). I know this isn't the cleanest way to do this, but it's easily cured later when we do the find-and-replace.

Time to do the first grep-ing:

## cd files; grep -h -f ../patterns.txt * > ../raw.txt

So now we've got a file containing names, addresses, phone numbers, and Mapquest links. But that won't do! Let's grep again to strip the links:

## grep -h -v "<a href" ../raw.txt > ../raw2.txt; rm ../raw.txt

STEP THREE: Clean up the results
--------------------------------------------------

Now we've got a file of just names, addresses, and phone numbers (raw2.txt). From here, we only need to strip the excess HTML using find-and-replace in TextEdit (or any old word processor), and we're set.
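If you'd rather not do all that find-and-replace by hand, here's a rough sketch of one way to do it from the shell instead. It is not the method used in this phile: it swaps the TextEdit steps for a little awk state machine, and it assumes every listing in raw2.txt looks the way those steps imply (a <B>name</B> line, then the address or the grey "not available" note, then a "(320) 679 - nnnn" number). The function name tabify and the output filename fonebook.txt are made up for this example, so check the output against a few listings in your own raw2.txt before trusting it.

```shell
# Hypothetical helper (not part of the original phile): turn raw2.txt from
# STEP TWO into name<TAB>address<TAB>number lines without hand editing.
# ASSUMPTION: each listing is a <B>name</B> line, then the address (or the
# grey "not available" note), then a "(npa) nxx - nnnn" number, which is
# the layout the TextEdit find-and-replace steps imply.
tabify() {
  awk '
    # Replace every HTML tag with a space, squeeze whitespace, and trim.
    function clean(s) {
      gsub(/<[^>]*>/, " ", s)
      gsub(/[ \t]+/, " ", s)
      sub(/^ /, "", s)
      sub(/ $/, "", s)
      return s
    }
    # Join two address pieces with a space, skipping empty ones.
    function glue(a, b) {
      if (a == "") return b
      if (b == "") return a
      return a " " b
    }
    # A bold line starts a new listing.
    /<B>/ { name = clean($0); addr = ""; next }
    # Inside a listing: collect address text until the number shows up.
    name != "" {
      if (match($0, /[(][0-9][0-9][0-9][)] [0-9][0-9][0-9] - [0-9][0-9][0-9][0-9]/)) {
        num = substr($0, RSTART, RLENGTH)
        addr = glue(addr, clean(substr($0, 1, RSTART - 1)))
        print name "\t" addr "\t" num
        name = ""    # listing done; ignore stray lines until the next <B>
      } else {
        addr = glue(addr, clean($0))
      }
    }
  ' "$1"
}
```

Paste it into Terminal.app (or a script you source), cd to the folder that holds raw2.txt, and run:

## tabify raw2.txt > fonebook.txt

Under the assumed layout, junk that falls between listings (like the English/Español footer or the wildcard tip) never gets printed, because lines are only collected after a <B> name line and up to its number.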
In this example I'm looking to end up with tab-delimited data; if you need it some other way, feel free to get creative.

If you're using something other than TextEdit, know that I'm making a few assumptions. I assume that your WP's find-and-replace allows you to search for ANY block of text you can paste into it (including tabs, carriage returns, big blocks, etc.). I also assume that your find-and-replace feature still works when you leave the 'replace with' field empty (for deleting blocks throughout a file).

Here we go...

Get rid of the bold tag and extra spaces before the name:
## Find: " <B>"
## Replace: ""

Make the bold closing tag and the BR tag go away, and replace them with a tab:
## Find: "</B>[return] <br> "
## Replace: "[tab]"

Replace the formatting tags between the address and phone number with a tab:
## Find: "<br>[return] <br><font face="geneva,arial,helvetica" size="-1">"
## Replace: "[tab]"

Remove the font closing tags after the phone number:
## Find: "</font>"
## Replace: ""

Replace the formatting before the 'address not available' string with a tab:
## Find: " <div style="font-style: italic; color: #BBBBBB">"
## Replace: "[tab]"

Replace the formatting after the 'address not available' string with another tab:
## Find: "</div> <br><font face="geneva,arial,helvetica" size="-1">"
## Replace: "[tab]"

Get rid of the last chunks of HTML that are scattered all around the file:
## Find: "<br>[return] <font COLOR="#FFFFFF" FACE="Arial, Helvetica, sans-serif" SIZE="2">English / <a HREF="http://espanol.superpages.com"><font COLOR="#FFFFFF" FACE="Arial, Helvetica, sans-serif" SIZE="2">Español</a><br>[return] <br>[return] <br><img SRC="http://img.superpages.com/images-yp/sp/images/spacer.gif" WIDTH="620" HEIGHT="1" ALT=""><br></div>[return] <br></td>"
## Replace: ""

## Find: " <li>Use the wild card character*</b> if you are unsure of a particular digit.<br>For example, 203-555-12*2 will find 203-555-1212 (as well as numbers ending with 02, 22, 32, 42, 52, 62, 72, 82, and 92).[return]"
## Replace: ""

Yay - now you've got a big fat tab-delimited fonebook! From here you can either just grep it or search it in your WP when you want to find something, or you can do like me and put it in a database for even more phreaky phun.

STEP FOUR: Do the database
--------------------------------------------------

I used FileMaker 7, but you can do this in whatever database you want. Because I plan to add a bunch of other exchanges later, I organized it so that each NPA gets its own database and each exchange gets its own table inside it.

First, I made a new database phile (which I called 320.fp7). Then I made a new table (called 679). The table was set up with three fields: name, address, and number.

In the 679 table of the 320 database, I used the Import Records command and selected the raw2.txt file we just cleaned up, being sure to set its type as Tab-Delimited. I mapped the first item to 'name', the second to 'address', and the third to 'number'. Then I clicked Import.

About twenty seconds later, wham! A fully searchable fonebook for 320-679-xxxx! Plus, it can be sorted eight ways to Sunday (to look for patterns, etc.), exported in a dozen or more formats (handy for batch-harassment jobs and junkmail/telemarketing type chores), and all sorts of other cool schtuff.

CONCLUSION: That was pretty nifty, huh?
--------------------------------------------------

Well, now you know how to make your own fonebooks by exploiting the simplicity of Verizon's online directory. If you've got a better way to do any of the stuff I did here (especially the find-and-replace crap), please let me know - my email address is nwbell at fatcorp.moramv.com, or you might find me on Cal's forums (http://cal.phonelosers.org/).

Have fun, stay out of trouble, and keep phreaking alive!!!

nwbell