How easy is it to crawl for expired domains? In this video I introduce my cloud-based domain crawler, designed and tested by SEO pros.
For more information, visit http://www.pbnlab.com
Welcome to PBN Lab!
In this video I’m going to give you a quick crash course on how to set up a new job, then we’ll have a quick look at the results page.
Once you log in after activating your account, you’ll be presented with the dashboard.
I think you’ll agree with me that the dashboard is very bare and somewhat minimalist. That’ll change soon enough, and the big space down the bottom here is where the live job stats will appear.
In the red panel you can see an overview of your crawl jobs, whether they’re running, queued, waiting or completed.
And in the black panel is a quick overview of the domains you’ve found.
Let’s get started by clicking Add New Job.
The first step in the process is giving the new job a name. It can be anything you like, and ideally should reflect what you’re about to search for - because you’ll see this job name in the results page.
In step 2, we’ll be adding a list of seed URLs - a list of distinct web sites or pages that the crawler will use to begin its crawl.
There are 3 simple ways to load in a list of seed URLs.
The first and easiest is by using a keyword search, which utilises current data from the Google index, and we source and pay for that data legitimately using Google’s CSE API.
The second option allows you to manually enter (or paste) seed URLs, one URL per line.
The third option allows you to paste raw HTML, which we’ll parse for links programmatically. It’s a powerful, but more advanced method.
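The transcript doesn’t show how the raw-HTML option works under the hood, but the general idea of pulling links out of pasted HTML can be sketched with Python’s standard-library parser. This is just an illustration, not PBN Lab’s actual code; a production parser would also resolve relative URLs and filter out non-http schemes.

```python
# Sketch of extracting links from pasted raw HTML, in the spirit of the
# third seed-URL option. Standard library only; illustrative, not the
# service's real implementation.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag encountered."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(raw_html):
    """Return all anchor hrefs found in a raw HTML string."""
    parser = LinkExtractor()
    parser.feed(raw_html)
    return parser.links
```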
For this basic demo, I’ll use the keyword search, since it will deliver us great results with ease.
Enter the keywords you’d like to use - just like when you’re genuinely searching Google. You can put them in double quotes and include other semantic terms that the Google search recognises.
My niche site targets a buying keyword for cycling shoes, so I’ll use the search phrase “bradley wiggins” - Bradley Wiggins was the Tour de France winner in 2012. We won’t get into WHAT to search for right now, because that’s a huge topic in itself.
By specifying a country, the Google API will return results more specific to that geographic region - but keep in mind that doesn’t mean they’ll necessarily be sites from that region.
We’ll leave that at the default of USA.
By specifying a year, the Google API will return top results dated from that year onward. While it doesn’t always vary the returned results greatly, it does at times help yield different results.
I’ll set that to 2012.
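To make the keyword, country and year options above more concrete, here is a rough sketch of how a request to Google’s Custom Search JSON API (the CSE API mentioned earlier) might be assembled. The API key and search-engine ID are placeholders for credentials from your own Google setup, and this is only my interpretation of the service’s behaviour, not its actual code.

```python
# Sketch of building a Google Custom Search (CSE) API request URL from a
# keyword phrase, country and year - an illustration of the options shown
# in the job setup, not PBN Lab's real implementation.
from urllib.parse import urlencode

API_KEY = "YOUR_API_KEY"        # placeholder credential
CX = "YOUR_SEARCH_ENGINE_ID"    # placeholder programmable-search-engine ID

def build_search_url(query, country="us", year=None):
    """Assemble a customsearch/v1 request URL for the given keywords."""
    params = {"key": API_KEY, "cx": CX, "q": query, "gl": country}
    if year:
        # Bias results toward pages dated within the given year.
        params["sort"] = f"date:r:{year}0101:{year}1231"
    return "https://www.googleapis.com/customsearch/v1?" + urlencode(params)

def extract_seed_urls(response_json):
    """Pull the result links out of a decoded CSE API response body."""
    return [item["link"] for item in response_json.get("items", [])]
```

Fetching `build_search_url("bradley wiggins", year=2012)` and passing the decoded JSON to `extract_seed_urls` would yield the kind of seed list the job queues up.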
After the web service pulls data back from Google, we’ll get to see the list of seed URLs. You can click out and take a look at some of the pages, but take my word for it: that’s really not worth your time. Just let it rip and click Next to queue the job.
If we head back to the dashboard now, we’ll see the live status of the current job.
There’s a lot of data here to look at as the crawler goes through the various stages of the process. Initially we’ll see the stats of the crawl itself, with the main points of interest here being the total URLs indexed, the crawl rate in URLs per minute and how much time is remaining.
After the crawl completes, it’ll quickly establish a distinct list of domains from the broken URLs, then look them up to verify their availability. And then the most important part: fetching the metrics for the available domains and returning all of that crucial data back here so you can review the results.
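As a rough illustration of that post-crawl stage (again, my own sketch, not PBN Lab’s code): reducing a pile of broken URLs to distinct domains is a simple dedupe over hostnames, while a genuine availability check would go through WHOIS or a registrar API - a missing DNS record is only a weak hint.

```python
# Sketch of the post-crawl step: reduce broken URLs to distinct domains,
# then hint at an availability check. Illustrative only.
import socket
from urllib.parse import urlparse

def distinct_domains(broken_urls):
    """Return the unique hostnames (minus a leading 'www.') from a URL list."""
    domains = set()
    for url in broken_urls:
        host = urlparse(url).hostname
        if host:
            domains.add(host[4:] if host.startswith("www.") else host)
    return sorted(domains)

def maybe_available(domain):
    """Crude hint: no DNS record *may* mean unregistered.
    Only a WHOIS / registrar lookup is authoritative."""
    try:
        socket.gethostbyname(domain)
        return False   # resolves, so it's in use
    except socket.gaierror:
        return True    # no DNS record; verify via WHOIS before trusting this
```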
There’s nothing you need to do, it’s just showing you what it’s up to.
I’ll pause the recording now for the next few minutes, and then we’ll take a quick look at the job results view, and how we can use it to quickly review the domains we found in the crawl.
So we’re back. After about 20 seconds of our own time setting up the job, and maybe 10 minutes in total waiting for the whole job to complete, I’ve got 71 available domains - the crawler indexed 304 thousand URLs and found about 7,945 distinct domains along the way.
Let’s go check them out!