White Hat Link Building with Scrapebox
Scrapebox is a useful SEO backlinking tool that lets us pull lists of relevant websites to acquire links from. It has two primary functions: scraping search engine results and automating comment posting.
Despite the negative attention it’s garnered from the SEO community as a black hat tool, Scrapebox can be a huge benefit to any backlinking campaign without darkening our hat color. We’ll ignore the automatic comment poster due to the spammy nature of the function and focus on scraping search engines.
Scrapebox's interface is divided into four panels.
The application allows us to choose which keywords we want to search for, and then scrapes the search engine results pages for those queries. It also allows us to use advanced operators (also known as Google dorks) to further customize our queries. You can add thousands of keywords, and Scrapebox will combine each of them with your footprint.
The harvester has two parts: the first lets us customize our footprint with Google dorks, and the second is where we enter the keywords we want to search for (the results of which are scraped).
In the example above, we're instructing Scrapebox to search for the terms "fitness", "workout routines", and "workout regimen" (2). Using Google dorks in our footprint (1), we're pulling websites whose URLs contain the ".edu" extension, which is reserved for academic institutions and generally carries higher domain authority than ordinary websites, along with the phrase "write for us" in the page text, which surfaces websites offering guest posts.
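The keyword-plus-footprint combination Scrapebox performs behind the scenes can be sketched in a few lines of Python. The footprint and keywords below mirror the example above; the `build_queries` helper is our own name for the idea, not part of Scrapebox itself.

```python
# Sketch of how Scrapebox combines one fixed footprint with a
# keyword list: each keyword is appended to the footprint,
# producing one search query per keyword for the harvester.

def build_queries(footprint, keywords):
    """Append each keyword to the fixed footprint."""
    return [f"{footprint} {kw}" for kw in keywords]

footprint = 'site:.edu "write for us"'
keywords = ["fitness", "workout routines", "workout regimen"]

for query in build_queries(footprint, keywords):
    print(query)
# -> site:.edu "write for us" fitness   (and likewise for the other two keywords)
```

With thousands of keywords loaded, this same expansion is what turns a single footprint into thousands of distinct search queries.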
To use Scrapebox safely, you'll need to set up proxies. A proxy is a substitute IP address that Scrapebox appears to be scraping from. If we don't set up proxies, the search engine being scraped will throttle our IP and stop our scrape; if continuously abused, it will eventually ban our IP. Why?
Scrapebox rapidly queries search engines thousands of times and scrapes the URLs that are returned. Google was built to serve information to people, not for this sort of automated data retrieval. While Bing is notoriously relaxed about IP banning (probably because it wants to show investors that more searches are being performed in its engine, even if they aren't authentic), Google is notoriously, and increasingly, strict about its scraping policy.
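The reason proxies keep a scrape alive is rotation: each outgoing query appears to come from a different address, so no single IP trips the engine's rate limit. A minimal sketch of that idea, assuming a pre-loaded proxy list (the addresses below are placeholders, not real proxies):

```python
from itertools import cycle

# Placeholder proxy pool -- in practice these come from Scrapebox's
# proxy harvester or a paid provider, not these invented addresses.
proxies = [
    "203.0.113.10:8080",
    "203.0.113.11:8080",
    "203.0.113.12:8080",
]

rotation = cycle(proxies)  # loop over the pool endlessly

def next_proxy():
    """Hand each outgoing query a different proxy so no single
    IP hammers the search engine and gets throttled or banned."""
    return next(rotation)

for query_number in range(5):
    print(query_number, next_proxy())
```

Spreading thousands of queries across a large pool keeps each individual address under the threshold that triggers throttling.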
Scrapebox’s Built In Proxy Harvester
In the bottom-left panel, select Manage.
A screen will pop up; select Harvest.
This will pull up a list of sources Scrapebox will harvest proxies from. Press All, which selects every source, and then Start.
This will take a minute or two. Once it's completed, select Apply.
We just harvested thousands of proxies, but remember that these are "public" proxies, meaning other people online are using them too. Because of this, many of them have been burnt and will reveal our IP, halting our scrape and potentially resulting in a ban.
Select Test Proxies > All Proxies to make sure they’ll keep us protected.
This will take some time. Once it's completed, hit the Filter button, choose "Keep Google Proxies," and then "Save selected proxies to Scrapebox."
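Conceptually, the filter step is simple: discard every proxy that failed the anonymity test and keep the rest. A small sketch of that logic, using an invented data shape for the test results (Scrapebox's internal format is not documented here):

```python
# Sketch of the "Keep Google Proxies" filter: given per-proxy test
# results, keep only the proxies that stayed anonymous against
# Google. The dict shape below is our own, for illustration only.

test_results = [
    {"proxy": "203.0.113.10:8080", "google_passed": True},
    {"proxy": "203.0.113.11:8080", "google_passed": False},  # burnt
    {"proxy": "203.0.113.12:8080", "google_passed": True},
]

def keep_google_proxies(results):
    """Return only the proxies that passed the anonymity test."""
    return [r["proxy"] for r in results if r["google_passed"]]

print(keep_google_proxies(test_results))
# -> ['203.0.113.10:8080', '203.0.113.12:8080']
```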
This loads the proxies that passed Google's test and kept our IP address anonymous. Now that our proxies are loaded and our footprint is customized, we're ready to scrape. But before we do, let's configure Scrapebox's Page Authority add-on, which makes a Moz API call for each URL we load into it, returns each page's authority score, and re-sorts the list accordingly.
Sorting Results By Page Authority
Exit the proxy manager and you'll see the four-panel interface, now loaded with our filtered proxies. Earlier I mentioned a useful technique: using Scrapebox to find relevant niche websites that offer backlinks. Now let's filter our results further by sorting the scraped URLs by their page authority.
In the Scrapebox interface, select Tools from the menu across the top of the screen, then press Page Authority.
The following page will open. For this, we need a Moz API key, which we pass into Scrapebox in the format provided on the Moz account setup page.
Once the key is added, we can load a list of URLs we've scraped into the Page Authority add-on; it will make an API call to Moz for each URL and return the list in ascending or descending order. You can also use the add-on independently of your scrapes: if you have a text file of URLs whose page authority you want to know, you can upload it to the add-on on its own.
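What the add-on does under the hood can be sketched as one batched metrics call followed by a sort. Note the hedges: the endpoint URL, auth header, and response field names below are assumptions for illustration, so check Moz's Links API documentation for the current details before relying on them.

```python
import json
import urllib.request

# Assumed Moz endpoint -- verify against Moz's Links API docs.
MOZ_ENDPOINT = "https://lz.moz.com/v2/url_metrics"

def fetch_metrics(urls, api_token):
    """POST a batch of URLs to Moz and return the parsed metrics.
    (Never called in this sketch -- it needs a real API token.)"""
    body = json.dumps({"targets": urls}).encode()
    req = urllib.request.Request(
        MOZ_ENDPOINT,
        data=body,
        headers={"x-moz-token": api_token,  # assumed auth scheme
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]

def sort_by_page_authority(results, descending=True):
    """Re-sort scraped URLs by their Page Authority score."""
    return sorted(results, key=lambda r: r["page_authority"],
                  reverse=descending)

# Made-up scores standing in for a real API response:
sample = [
    {"page": "https://example.edu/write-for-us", "page_authority": 38},
    {"page": "https://another.edu/guest-posts", "page_authority": 52},
]
for row in sort_by_page_authority(sample):
    print(row["page"], row["page_authority"])
```

Sorting descending puts the highest-authority guest-post targets at the top of the list, which is where outreach effort pays off first.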