mcquery.js - A Web Scraper for Disc Golf Players

by Brenden Hyde

Disclaimer: I am not a corporate shill for Innova or any other company; I'm just an obsessive fanboy.

Introduction

Each summer my girlfriend and I seem to have a new obsession.

This year it's disc golf.  In addition to playing every day, we also watch the professionals play tournaments on YouTube.  The highest-rated player in the world right now is Paul McBeth, who is sponsored by Innova.

As part of his sponsorship, Innova makes a special golf disc for him called the "McPro Roc3."  It's a highly sought-after disc because of its limited releases, and it always sells out extremely quickly.  I had tried and failed several times to manually keep tabs on their sales, but I was always too late to order one.

To add insult to injury, there were near-constant brag posts on Reddit by those who were lucky enough to snag one.  They sold on eBay for around double the price, but I didn't want one badly enough to pay the mark-up.

That's why I decided to write a simple web scraper to keep tabs on the site for me.

Planning

I am no developer by trade, but I had dabbled in Node.js through a YouTube series called "The Net Ninja," so I decided to use that.

With the language chosen, I started to plan out what I wanted the app to do, which on paper was only two things:

1.)  Periodically check the McPro Roc3 sales page for available inventory.

2.)  If there was inventory, send me a text message with a link to the page.

Preparation

I used Debian for my Linux distribution, but any distribution should work.

First, I downloaded Node.js from nodejs.org; the download includes Node itself as well as the Node Package Manager (npm).

Installing them is beyond the scope of this article, but there are plenty of tutorials to get up and running.

In addition to Node.js and npm, I used three external Node.js modules called "Request", "Cheerio", and "Twilio".

The Request module let me make "HTTP GET" requests to Innova's website to download the product page, and the Cheerio module let me parse through the HTML to find the section that could tell me whether or not they had inventory.  Twilio is a paid SMS gateway service that requires an account, and it allowed my scraper to text me when my precious disc became available.
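Cheerio's role here boils down to one lookup: find the paragraph whose class list contains the token "availability" and read its full class attribute, which is what the selector p[class~='availability'] in mcquery.js does.  To show the idea without installing anything, here is a dependency-free sketch.  The helper name findAvailabilityClass is hypothetical, and its regular expression is a far more brittle stand-in for a real HTML parser like Cheerio, not a replacement for it:

```javascript
// Hypothetical helper: returns the class attribute of the first <p> whose
// class list contains the exact token "availability", or '' if none is found.
// The regex is a rough stand-in for Cheerio's p[class~='availability'] selector.
function findAvailabilityClass(html) {
  var re = /<p\b[^>]*\bclass\s*=\s*["']([^"']*)["']/gi;
  var match;
  while ((match = re.exec(html)) !== null) {
    // Split the attribute into tokens, mimicking the ~= (word match) selector
    var classes = match[1].trim().split(/\s+/);
    if (classes.indexOf('availability') !== -1) {
      return match[1];
    }
  }
  return '';
}

var sample = '<div><p class="availability out-of-stock">Out of stock</p></div>';
console.log(findAvailabilityClass(sample)); // availability out-of-stock
```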

To install the external modules, I ran these commands:

$ npm init -f
$ npm install twilio --save
$ npm install cheerio --save
$ npm install request --save

The first command created my "package.json" file (more on that in a minute).

The "--save" option in the commands adds each module to the "dependencies" section of the package.json file.

This in turn makes it easier for others to run your app, since npm can install every dependency in one step.  To install all dependencies from someone else's "package.json" file, just change to the directory containing that file and run:

$ npm install
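The "dependencies" section that "--save" maintains is plain JSON, so the list of modules "npm install" will fetch can be read back with nothing more than JSON.parse.  The helper below is a hypothetical illustration using this project's three modules; npm itself does much more, such as resolving the version ranges and installing nested dependencies:

```javascript
// Hypothetical helper: list the "name@range" entries recorded under
// "dependencies" in the text of a package.json file.
function listDependencies(packageJsonText) {
  var pkg = JSON.parse(packageJsonText);
  var deps = pkg.dependencies || {}; // a package may declare no dependencies
  return Object.keys(deps).map(function (name) {
    return name + '@' + deps[name];
  });
}

var example = JSON.stringify({
  name: 'mcquery.js',
  dependencies: { cheerio: '^0.20.0', request: '^2.73.0', twilio: '^2.9.1' }
});
console.log(listDependencies(example).join(', '));
// cheerio@^0.20.0, request@^2.73.0, twilio@^2.9.1
```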

Beyond installing those modules, there was some actual programming involved, but describing that here would be boring.  I have added some comments in the code section that hopefully explain my rationale.

Outcome

I programmed the mcquery.js scraper over the course of four to five hours at work (please don't tell my boss), and when I got home, I excitedly explained the concept to my girlfriend and initialized the program for a demo run with this command:

$ node mcquery.js

It was written to check the inventory status every 30 seconds and to spit out the result to my command line.  It was a never-ending, scrolling terminal that looked eerily like an homage to The Shining:

availability out-of-stock
availability out-of-stock
availability out-of-stock
availability out-of-stock
availability out-of-stock
availability out-of-stock
availability out-of-stock

I expected to see a lot more where that came from since there were often month-long gaps between releases, so I left the program running and walked away.

About 20 minutes later I felt the buzz of a text message in my pocket, and when I checked my phone, I saw those glorious words:

McPro Roc3s are in stock again! (hyperlink to site)

At first I assumed my app had malfunctioned, since I had received dozens of messages like that throughout my testing that day.  I went to the site, and sure enough my scraper had done its job!  I unashamedly bought one in every color and laughed an evil victory laugh.

Obsessive?  Yes.  Unnecessary?  Probably (considering their Twitter feed has the same information).

But who has the disc now?  Who.  Has.  The.  Disc.  Now?!

Give Me the Codez

mcquery.js:

// BEGINNING OF mcquery.js
var request = require('request');
var cheerio = require('cheerio');
var twilio = require('twilio');

var accountSid = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'; // replace with real SID
var authToken = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'; // replace with real authToken
var client = new twilio.RestClient(accountSid, authToken);
var requrl = 'http://proshop.innovadiscs.com/mcpro-roc3.html';
var textStr = "McPro Roc3s are in stock again!\n" + requrl;
var status = ''; // last availability class read from the page

function rocquest(err, resp, body) { // callback: parses the page request() downloaded
    if (!err && resp.statusCode == 200) {
        var $ = cheerio.load(body);
        // find the <p> whose class list includes "availability",
        // e.g. class="availability out-of-stock"
        $("p[class~='availability']").each(function() {
            status = this.attribs.class;
            console.log(status);
        });
    }
}

var sendAlert = function() { // sends text message when discs are in stock
    client.messages.create({
        body: textStr,
        to: '+19705552600',  // text this number (change as needed)
        from: '+19704202600' // from a valid Twilio number
    }, function(err, message) {
        if (err) {
            console.error(err.message);
        } else {
            console.log(message.sid);
        }
    });
};

var mcquery = function() { // check status and send text if available
    request(requrl, rocquest); // asynchronous: rocquest updates status later,
    // so this check sees the status from the previous request
    if (status.includes('availability in-stock')) {
        clearInterval(loop);
        sendAlert();
    }
};

var loop = setInterval(mcquery, 30000); // starts the program by invoking a loop

// END OF mcquery.js
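Because request() is asynchronous, the status check in mcquery runs before rocquest has parsed the page it just asked for, so each check actually acts on the previous response.  The script still works, just up to one 30-second interval late.  One alternative arrangement moves the check into the callback.  This is a sketch under stated assumptions: watchStock and its fetchPage, parseStatus, and notify parameters are hypothetical names standing in for request(), the Cheerio lookup, and Twilio's client.messages.create(), so it can run without any of those modules:

```javascript
// Hypothetical restructuring: the stock check runs inside the fetch callback,
// so it always acts on the page that was just downloaded.
// fetchPage(url, cb), parseStatus(html) and notify(text) are injected stand-ins.
function watchStock(url, fetchPage, parseStatus, notify, intervalMs) {
  var done = false; // guards against a second alert from an overlapping request
  var timer = setInterval(function () {
    fetchPage(url, function (err, body) {
      if (done) return;
      if (err) {
        console.error(err.message);
        return;
      }
      var status = parseStatus(body);
      console.log(status);
      // Token check against a class list such as "availability in-stock"
      if (status.split(/\s+/).indexOf('in-stock') !== -1) {
        done = true;
        clearInterval(timer);
        notify('McPro Roc3s are in stock again!\n' + url);
      }
    });
  }, intervalMs);
  return timer;
}

// Demo with fakes: the "page" is out of stock twice, then in stock.
var pages = [
  '<p class="availability out-of-stock">',
  '<p class="availability out-of-stock">',
  '<p class="availability in-stock">'
];
var fetchCount = 0;
watchStock(
  'http://proshop.innovadiscs.com/mcpro-roc3.html',
  function (url, cb) { cb(null, pages[Math.min(fetchCount++, pages.length - 1)]); },
  function (html) { var m = /class="([^"]*)"/.exec(html); return m ? m[1] : ''; },
  function (text) { console.log('ALERT: ' + text); },
  50
);
```

The "done" flag matters only when a slow response arrives after the interval has already been cleared; without it, two in-flight requests could both trigger a text.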

package.json:

{
  "name": "mcquery.js",
  "version": "1.0.0",
  "description": "Query Innova's Pro Shop for new McPro Roc3 discs",
  "main": "mcquery.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "Brenden Hyde",
  "license": "ISC",
  "dependencies": {
    "cheerio": "^0.20.0",
    "request": "^2.73.0",
    "twilio": "^2.9.1"
  }
}

README:

mcquery.js Version 0.1.0

DESCRIPTION
mcquery.js is used to check the stock status of McPro Roc3 discs in the Innova 
Pro Shop.

USAGE
1.)  Install nodejs from nodejs.org and link the node and npm bin files to your
     $PATH environment variable.
2.)  cd to the directory where mcquery.js is located.
3.)  Type "npm install" to install program dependencies (twilio, request, 
     cheerio).
4.)  Type "node mcquery.js" to run the program. This will spit out the 
     availability status to the console every 30 seconds by default;
     this default can be changed in the setInterval() call that creates
     loop by replacing "30000" with the appropriate number of milliseconds.
     Once the availability changes to "availability in-stock," a text is
     sent to the phone number referenced in the sendAlert() function.

NOTES
1.) The mcquery function checks Innova's Pro Shop for the
    status of the McPro Roc3. It uses the external "request"
    module with a URL and callback function as parameters.

2.) rocquest is the callback function for mcquery (2nd
    parameter). This function uses the external "cheerio" module
    to do jQuery-style HTML DOM parsing to find the paragraph with
    the availability status (in-stock or out-of-stock).

3.) The status variable is a global variable that is
    modified by rocquest and is a string that either says
    "availability in-stock" or "availability out-of-stock".

4.) The status is set each time mcquery() is called. This
    is done periodically because the status will change at
    some point from "out-of-stock" to "in-stock". Therefore,
    mcquery needs to be called repeatedly in a loop.

5.) setInterval is the function that will call mcquery every
    n milliseconds. n is set to 30000 milliseconds (30 seconds)
    by default. It will need to be cleared when the status no
    longer requires updating (i.e., the disc is in stock).

6.) loop holds the timer handle that setInterval returns
    when it is set up to call mcquery. As the name implies,
    it represents a loop which will run until it has served
    its purpose.

7.) clearInterval is the function that will stop
    setInterval from repeatedly running mcquery. It takes
    only one parameter, in this case "loop".

8.) sendAlert is the function which will send a text message to me
    (you?) once the status is set to "in-stock". It only needs to be
    called once, right after the loop is stopped with clearInterval.
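Notes 5 through 7 describe a pattern that is easy to see in isolation: setInterval returns a handle (in Node.js, a Timeout object), and passing that handle to clearInterval stops the repeats.  A minimal sketch, where counting to three is a hypothetical stand-in for the disc coming into stock:

```javascript
// The handle returned by setInterval plays the role of "loop" in mcquery.js.
// Reaching three ticks is a hypothetical stand-in for "the disc is in stock".
var ticks = 0;
var loop = setInterval(function () {
  ticks += 1;
  console.log('tick ' + ticks);
  if (ticks === 3) {
    clearInterval(loop); // no further callbacks run after this one
    console.log('stopped');
  }
}, 100);
```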

Acknowledgment

The code from Smitha Milli's YouTube video on web scraping with Cheerio and Request makes up a large chunk of my program.

It's available at www.youtube.com/watch?v=LJHpm0J688Y.
