White Hat Link Building with Scrapebox - Patrick Harris



Scrapebox is a useful SEO backlinking tool that lets us pull lists of relevant websites to acquire links from. It has two primary functions: it scrapes search engine results and automates comment posting.


Despite the negative attention it’s garnered from the SEO community as a black hat tool, Scrapebox can be a huge benefit to any backlinking campaign without darkening our hat color. We’ll ignore the automatic comment poster due to the spammy nature of the function and focus on scraping search engines.


Scrapebox has 4 panels in its interface:



[Screenshot: Scrapebox four-panel interface overview]




The application lets us choose which keywords we want to search for, then scrapes the search engine results pages for those queries. It also supports advanced operators (also known as Google dorks) to further customize our queries. You can add thousands of keywords, and Scrapebox will combine each of them with your footprint.



The harvester has two parts: the first lets us customize our footprint with Google dorks, and the second is where we choose which keywords we want to search for (the results of which are scraped).








In the example above, we’re instructing Scrapebox to search for the terms “fitness”, “workout routines”, and “workout regimen” (2). Using Google dorks in our footprint (1), we’re pulling websites whose URLs contain the “.edu” extension, which is designated for academic institutions and generally carries higher domain authority than ordinary sites, and whose text contains the phrase “write for us”, which surfaces websites offering guest posts.
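Conceptually, combining the footprint (1) with the keyword list (2) just produces one query per keyword. A minimal Python sketch of that combination step (the function name and exact dork syntax here are illustrative, not Scrapebox internals):

```python
def build_queries(footprint, keywords):
    """Append each keyword to the footprint, producing one search query per keyword."""
    return [f"{footprint} {kw}" for kw in keywords]

# Illustrative footprint: .edu sites whose text contains "write for us"
footprint = 'site:.edu "write for us"'
keywords = ["fitness", "workout routines", "workout regimen"]

for query in build_queries(footprint, keywords):
    print(query)
# site:.edu "write for us" fitness
# site:.edu "write for us" workout routines
# site:.edu "write for us" workout regimen
```

Each of those query strings is what gets submitted to the search engine, one results page at a time.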




Using Proxies



To use Scrapebox safely, you’ll need to set up proxies. A proxy is an intermediary IP address that Scrapebox appears to scrape from, masking your own. If you don’t set up proxies, the search engine being scraped will throttle your IP and stop your scrape; if abused continuously, it will eventually ban your IP. Why?


Scrapebox rapidly queries search engines thousands of times and scrapes the URLs that are returned. Google wants to provide information to people and wasn’t built for this sort of automated data retrieval. While Bing is notoriously relaxed about IP banning (probably because it wants to show investors that more searches are being performed on its engine, even if they aren’t authentic), Google is notoriously and increasingly strict about its scraping policy.
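To make the mechanics concrete, here is a rough sketch of routing a throttled query through a proxy, in the shape the Python `requests` library expects. The proxy address is a placeholder from the TEST-NET documentation range, and the live request is left commented out; this is a sketch of the idea, not Scrapebox’s implementation:

```python
def proxy_config(host, port):
    """Build the proxies mapping that the requests library expects."""
    url = f"http://{host}:{port}"
    return {"http": url, "https": url}

# Placeholder address (TEST-NET range), not a real working proxy.
proxies = proxy_config("203.0.113.7", 8080)

# A throttled query loop would look roughly like this (requires `requests`):
# import time, requests
# for query in ["fitness", "workout routines", "workout regimen"]:
#     requests.get("https://www.google.com/search",
#                  params={"q": query}, proxies=proxies, timeout=10)
#     time.sleep(5)  # pause between queries; rapid-fire requests get throttled
```

Scrapebox does the same thing at much higher volume, which is exactly why it rotates through many proxies instead of one.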



Scrapebox’s Built-In Proxy Harvester


In the bottom-left panel of the interface, select Manage.



[Screenshot: Scrapebox proxy manager]



A screen will pop up; select Harvest.





This will pull a list of sources Scrapebox will harvest proxies from. Press All to select every source, then press Start.



[Screenshot: Scrapebox proxy harvester sources]



This will take a minute or two. Once it’s completed, select Apply.



[Screenshot: harvested proxies with the Apply button]



We just harvested thousands of proxies, but remember that these are “public” proxies, meaning other people online are using them too. Because of this, many of them have been burned and will reveal our IP, halting our scrape and potentially resulting in a ban.



Select Test Proxies > All Proxies to make sure they’ll keep us protected.
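Under the hood, this test boils down to “keep each proxy that still passes a live check.” A small sketch of that filtering logic, with the check function injected so the example runs offline (the addresses and the stub check are illustrative, not real proxy data):

```python
def filter_working(proxies, check):
    """Keep only the proxies for which the supplied check function returns True.

    In practice, `check` would make a test request through the proxy and
    confirm it responds without exposing the real IP; it is injected here
    so the filtering logic itself can be demonstrated without a network.
    """
    return [p for p in proxies if check(p)]

# Illustrative: pretend only one of three harvested proxies still works.
harvested = ["203.0.113.7:8080", "203.0.113.9:3128", "203.0.113.21:80"]
alive = {"203.0.113.9:3128"}

working = filter_working(harvested, lambda p: p in alive)
print(working)  # ['203.0.113.9:3128']
```

Scrapebox runs this kind of check against every harvested proxy, which is why the test pass takes a while.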



[Screenshot: testing proxies in Scrapebox]



This will take some time. Once it’s completed, hit the Filter button and choose “Keep Google Proxies”, then “Save selected proxies to Scrapebox.”


This will load up the proxies that passed Google’s test and kept our IP address anonymous. Now that our proxies are loaded and our footprint is customized, we’re ready to scrape. But before we do, let’s configure Scrapebox’s Page Authority add-on, which makes a Moz API call for each URL we load into it, returns their page authorities, and re-sorts the list accordingly.



Sorting Results By Page Authority



Exit the proxy manager and you’ll see the four-panel interface now loaded with our filtered proxies. In the example above, I mentioned a useful technique: using Scrapebox to find relevant niche websites that offer backlinks. Now let’s filter our results further by sorting the scraped URLs by their page authority.


In the Scrapebox interface, select Tools from the menu across the top of the screen, then choose Page Authority.



[Screenshot: Page Authority under the Tools menu]



The following page will open. For this, we need a Moz API key, which we pass into Scrapebox in the format provided on Moz’s account setup page.



[Screenshot: Page Authority add-on with the Moz API key field]



Once the key is added, we can load a list of scraped URLs into the Page Authority add-on; it will make an API call to Moz for each URL and return the list in ascending or descending order. You can also use the add-on separately from your scrapes: if you have a text file of URLs whose page authority you want to know, you can upload it to the add-on directly.
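The call-then-sort step the add-on performs can be sketched in a few lines. The sorting function below works on plain dicts so it runs offline; the commented request shape follows the Moz Links API v2 (`url_metrics` endpoint with a `page_authority` field in each result), but verify the endpoint and field names against your own Moz account docs. The sample records are made-up scores, not real Moz data:

```python
def sort_by_page_authority(metrics, descending=True):
    """Sort Moz URL-metric records by their page authority score."""
    return sorted(metrics, key=lambda m: m.get("page_authority", 0),
                  reverse=descending)

# The API call itself would look roughly like this (requires `requests`;
# MOZ_ACCESS_ID / MOZ_SECRET_KEY are your own credentials):
# import requests
# resp = requests.post("https://lsapi.seomoz.com/v2/url_metrics",
#                      json={"targets": urls},
#                      auth=(MOZ_ACCESS_ID, MOZ_SECRET_KEY))
# records = resp.json()["results"]

# Illustrative data only, not real Moz scores:
records = [
    {"page": "example.edu/write-for-us", "page_authority": 54},
    {"page": "fitness.example.com/guest-post", "page_authority": 31},
    {"page": "campus.example.edu/blog", "page_authority": 47},
]
for r in sort_by_page_authority(records):
    print(r["page"], r["page_authority"])
# example.edu/write-for-us 54
# campus.example.edu/blog 47
# fitness.example.com/guest-post 31
```

Sorting descending puts the highest-authority prospects at the top of your outreach list, which is exactly how the add-on’s output is typically used.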

