I love Xenu.
Xenu Link Sleuth is a free but invaluable piece of software for crawling any website and finding errors. The tool can also generate a Google Sitemap or a GraphViz visualization to help you fully understand what the site structure looks like.
But Xenu has its limits. Some critical details that would add immense value to the data Xenu provides cannot easily be found by ordinary means! That makes a case for tools like SEOSpyder (for Mac) or Screaming Frog SEO Spider (limited free version available).
All these tools, including Xenu Link Sleuth, run from your desktop computer. That means they consume your bandwidth and computing resources :). The bandwidth does matter once I start crawling a website with 100k+ links. Even with 60 parallel threads, Xenu Link Sleuth can take anywhere between 40 and 120 minutes, depending on the health of my poor broadband connection and the speed of the website.
Moreover, repeated scanning of a site can get your IP banned for suspicious activity.
With those problems in the background, it is no surprise that I was delighted to find a completely free tool that goes one better than Xenu Link Sleuth. It can even match a few things that the paid tools can do, and all that without using your internet connection.
The tool in question is the Google Sitemap Generator by Internet Marketing Ninjas.
All you have to do is –
- Provide the URL of the home page
- Select the number of pages
- Click the “Ninja Check” button to let it rip
For crawling more than 5000 pages the tool needs the following two lines to be added to the robots.txt file.
User-agent: NinjaBot
Allow: /
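Before starting a big crawl, it can be worth confirming that your robots.txt really lets NinjaBot in. Here is a minimal sketch using Python's standard `urllib.robotparser`. The sample robots.txt content, the `/private/` rule, the `OtherBot` name and the helper name `allows_agent` are made up for illustration; only the two NinjaBot lines come from the tool's requirement above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everyone is kept out of /private/,
# but NinjaBot gets the two lines the tool asks for.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: NinjaBot
Allow: /
"""


def allows_agent(robots_txt: str, agent: str, path: str = "/") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("NinjaBot", "OtherBot"):
        print(agent, allows_agent(ROBOTS_TXT, agent, "/private/page"))
```

Note that a robots.txt with no matching rules allows everything by default, so this check is mostly useful for catching an older `Disallow` rule that would otherwise apply to the crawler.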
For sites that you don’t own, you can just split the crawl and go over different categories separately.
There are a couple of other useful options –
- You can also let the tool use a different user agent (e.g. Googlebot, Chrome browser, etc.)
- Specify “Crawler Limitations” to include or exclude certain strings
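To get a feel for what include/exclude strings do to a crawl, here is a rough sketch in Python. It assumes plain substring matching, which may not be exactly how the tool applies its "Crawler Limitations"; the function name `filter_urls` and the example.com URLs are hypothetical.

```python
def filter_urls(urls, include=(), exclude=()):
    """Keep URLs that contain at least one `include` string (if any are given)
    and none of the `exclude` strings. Matching is simple substring matching."""
    kept = []
    for url in urls:
        if include and not any(text in url for text in include):
            continue  # include list given, but nothing matched
        if any(text in url for text in exclude):
            continue  # an excluded string matched
        kept.append(url)
    return kept


if __name__ == "__main__":
    urls = [
        "https://example.com/category/seo/xenu-review",
        "https://example.com/category/seo/page/2",
        "https://example.com/category/tools/sitemaps",
        "https://example.com/tag/crawler",
    ]
    # Crawl only the "seo" category and skip paginated archive pages.
    for url in filter_urls(urls, include=["/category/seo/"], exclude=["/page/"]):
        print(url)
```

The same idea covers splitting a large site into several smaller crawls: run one pass per category string instead of one pass over everything.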
The tool goes through the given website and categorizes all links as valid web pages/posts (those that return a 200 OK HTTP status code), 301/302 external or internal redirects, and broken links from a variety of causes.
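The grouping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's actual logic: the function name `classify_link`, the category labels, and the choice to treat any 4xx/5xx code or a missing response as "broken" are my assumptions.

```python
from urllib.parse import urlparse


def classify_link(status, url, site_host):
    """Put a crawled link into one of the report buckets.

    `status` is the HTTP status code, or None if the request failed outright
    (timeout, DNS error, and so on). `site_host` is the host being crawled.
    """
    scope = "internal" if urlparse(url).hostname == site_host else "external"
    if status == 200:
        return "valid page"
    if status in (301, 302):
        return scope + " redirect"
    if status is None or status >= 400:
        return "broken link"
    return "other"  # codes this sketch does not try to interpret


if __name__ == "__main__":
    samples = [
        (200, "https://example.com/about"),
        (301, "https://example.com/old-post"),
        (302, "https://partner.example.org/promo"),
        (404, "https://example.com/missing"),
        (None, "https://example.com/timeout"),
    ]
    for status, url in samples:
        print(status, url, "->", classify_link(status, url, "example.com"))
```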
You will find a tonne of useful information in the output:
- All URLs in the website, with the number of internal links and the anchor text used by those links
- Title of each page, meta description and meta tags
- Internal and external links per page
- Image links with alt tags
- Details about all affiliate links in one place
- Page size
- Google Authorship information for each post
- Categories and tags used in the website
Since crawling is a tedious activity, you can provide an email address where the report can be sent. This is completely optional – I chose to wait and check on the tool.
If you want to check all errors within the website, find any potential SEO issues due to external/internal linking, or just want to check out what the competition is doing – this tool sure comes in handy at the irresistible cost of $0.
Do you know of any similar tools that can help? Comment, and let us know!