sheet.codes demo: scraping data

For SEOs and data-driven site managers, getting the latest information about products or other valuable data sources is important for keeping your content fresh and engaging for readers and search engines. To get that information at regular, automated intervals, you may have to scrape it. However, not everyone has the dev team resources to build scraping tools. As marketers, we have access to two tools that can help us create and automate simple scraping jobs: Google Sheets and Google Apps Script.

The sheet.codes plugin is designed to work with any Google Apps Script functions that display information in the cells of your Google Sheets spreadsheet. Just create your Google Apps Script function and use your sheet.codes shortcodes like normal.
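To make that concrete, here’s a minimal sketch of a custom function whose output lands in a cell (the name LASTUPDATED is made up just for illustration); once a cell holds the value, a shortcode can reference it like any other cell:

//a trivial custom function: type =LASTUPDATED() into any cell to display its output there
function LASTUPDATED() {
  return 'Updated ' + new Date().toDateString();
}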

Let’s use the travel niche as an example (sample spreadsheet).

Let’s say you have pages on your travel website about every country around the world. Each of those pages is filled with articles on what to do, where to stay, and more. In today’s environment, a crucial component of each page could be alerts for the travel restrictions and advisories currently in place for that destination. The U.S. State Department maintains an active feed of travel advisories for every country in the world, with the current advisory level and other details you could also pull into your pages. You can find the feed here.

Starting with the spreadsheet setup, we’re keeping it pretty simple – I grabbed the URLs for specific travel advisory pages for a handful of countries and listed them in column B of our sample spreadsheet linked above. Remember to reserve row 1 for your column headers, which you’ll use to identify the column you want to pull information from in your sheet.codes shortcodes.
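For reference, here’s roughly the layout this walkthrough assumes (the header names and example countries are placeholders; the script below only cares about column positions):

      A          B            C                  D
1     Country    URL          Latest Advisory    Advisory Details
2     France     https://…    (script output)    (script output)
3     Japan      https://…    (script output)    (script output)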

Once you have input all of your URLs, select them and navigate to ‘Data > Named ranges’ to give your set of URLs a unique name. We will reference this named range in our Google Apps Script function later.

Once you have your URLs listed, navigate to ‘Tools > Script editor’ in your Google Sheets menu. This will open up the Google Apps Script editor, which is where we will write a custom function for scraping the information we need from the travel advisory pages for each country.

First, give your Apps Script project a name.

Next, click the ‘+’ symbol next to Libraries. We are going to hook up a script library called Cheerio to help us with our web scraping (I had already added Cheerio when I took the screenshot below, so your list won’t show it yet).

Next, head to this GitHub page to grab the Script ID that you will enter in the modal field provided. Then click ‘Add’ to add the script library to your Google Apps Script.

Okay, now we are ready to write our scraping function.

We need to grab two elements off each page – the date of the travel advisory and the specific travel advisory status, both found at the top of each page.

Using the URL Fetch service native to Google Apps Script to visit each page and Cheerio to parse the HTML, we are able to target the elements on each web page that get us what we need. Each country page uses the same template, so the CSS classes that we need to target are consistent for each element.

Here’s the code for the function, along with comments:

//declare your function and give it a name
function TravelAdvisoriesFunction() {

  //get our named range of URLs to scrape, and the sheet where we will output results
  var range = SpreadsheetApp.getActive().getRangeByName('CountryURLs');
  var values = range.getValues();
  var scrapesheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Sheet1');

  //optionally, log your array of URLs to the execution log to check that you are grabbing everything correctly
  Logger.log(values);

  //loop through each URL and grab the text on the page
  values.forEach(function(row, i) {
    //getValues() returns each row as an array, so the URL itself is row[0]
    var response = UrlFetchApp.fetch(row[0]);
    var $ = Cheerio.load(response.getContentText());

    //grab the specific elements on the page that we want
    var LatestAdvisory = $('.tsg-rwd-emergency-alertheader-type-frame-typebox').first().text().trim();
    var AdvisoryDetails = $('.tsg-rwd-emergency-alertheader-title').first().text().trim();

    //write the values to row i + 2 (row 1 holds our headers), columns C and D
    scrapesheet.getRange(i + 2, 3, 1, 2).setValues([[LatestAdvisory, AdvisoryDetails]]);

    //optionally, log each result to the execution log to check progress
    Logger.log(LatestAdvisory);
    Logger.log(AdvisoryDetails);
  });
}

After you finish writing your function, click ‘Save’, then ‘Run’, and you should see results begin to populate in your execution log and your Google Sheets spreadsheet, like below.

Obviously, we need to do some cleaning on the ‘LatestAdvisory’ field, but that’s pretty easy with some trimming functions.
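For example, here’s a minimal cleanup sketch. It assumes the stray characters are mostly extra whitespace and line breaks left over from the page markup; adjust the pattern to match whatever actually lands in your cells. You could apply it inside the function just before writing the values:

//collapse runs of whitespace (including line breaks) into single spaces, then trim the ends
var CleanedAdvisory = LatestAdvisory.replace(/\s+/g, ' ').trim();

Or, if you’d rather clean up inside the sheet itself, a formula like =TRIM(REGEXREPLACE(C2, "\s+", " ")) does the same thing in place.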

Okay, now to set it to run on an automatic timer so that we can pull fresh information every day.

To set up this timer, navigate to the Triggers section of Google Apps Script, denoted by the stopwatch. In the bottom right corner, click the blue ‘Add Trigger’ button.

Here, you can configure a time-driven trigger to automatically run your function at certain intervals each day, hour, or other time frame. In the example above, I set a timer to run the travel advisories function every day at 6am and to notify immediately if it fails.
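If you’d rather create the same schedule from code instead of the trigger UI, the ScriptApp service built into Google Apps Script can do it. Here’s a minimal sketch (the helper name createDailyTrigger is just for illustration); run it once from the script editor and the trigger will appear in your Triggers list:

//run this once from the script editor to install a daily 6am trigger
function createDailyTrigger() {
  ScriptApp.newTrigger('TravelAdvisoriesFunction')
    .timeBased()
    .everyDays(1)
    .atHour(6)
    .create();
}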

Now, fresh travel advisory data will be scraped into your spreadsheet every day, and you can use sheet.codes shortcodes to pull it into the country hub pages on your WordPress site.

And that’s about it. A simple scraper with Google Apps Script, Google Sheets, and the sheet.codes plugin to get fresh information to your site every day. And this is just one simple example.

Have any other cool use cases with scraping data that you want to share? Drop me a line.