Welcome to part 5 of my 7 1/2 part guide to Google Webmaster Tools. In this post I will be covering everything under the Crawl section.
This area of Google Webmaster Tools contains a lot of advanced features used to control the search visibility of your website and provides you with information on how often your site is crawled. The first area in this section is called Crawl Errors.
I’m not going to spend too much time on this section since I covered it in Part 1, so I will just do a quick review. This section shows us a list of errors that Google has found when crawling our websites within the last 90 days. We are able to download that list, which includes server codes, URLs and the date detected, providing you with more information to diagnose and solve those issues. Recently added features include smartphone and Googlebot-Mobile crawl errors. From here you can also see data on DNS, server connectivity and robots.txt.
Crawl Stats
This area under Crawl shows all Googlebot activity on our website for the past 90 days. There are three graphs we are presented with when we first land on this page: Pages crawled per day, Kilobytes downloaded per day and Time spent downloading a page (in milliseconds).
Rolling over a graph will provide us with the date it was crawled and information pertaining to that day. This could mean how many pages were crawled that day, how many kilobytes were downloaded or how much time was spent downloading a page. To the right of these charts are information highlights such as High, Average and Low. This makes it a lot easier if we need to diagnose any issues with site load time. If there are any issues with our site, Webmaster Tools allows us to Fetch as Google to see how Googlebot sees the page.
Fetch as Google
The third link down under Crawl is Fetch as Google. Here we have the ability to submit pages to Google’s index as well as see how Google’s spider sees the page.
From here we have a few options to choose from when we want to see how Google sees our website. The first step is to enter a URL path into the text box. For example, if your website is http://www.example.com, you only need to enter the page you are interested in seeing; this is what comes after .com/. Let’s say we are interested in /sample.html; we would enter sample.html in the space provided and click FETCH. After we click fetch we are prompted with a pending message, then Your request was completed successfully. Here is what it looks like:
Since I don’t currently have the page /sample.html on my website, this tool is going to return an error message of Not found. From doing SEO we know that Not found means a 404 error.
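As a quick illustrative sketch (this uses Python’s standard library and has nothing to do with Webmaster Tools itself), you can see how the numeric status codes map to the human-readable messages that show up in the Fetch Status column:

```python
from http import HTTPStatus

# Map a few common HTTP status codes to the standard phrases
# you will see reported as fetch statuses.
for code in (200, 301, 404, 500):
    status = HTTPStatus(code)
    print(code, status.phrase)
# → 200 OK
# → 301 Moved Permanently
# → 404 Not Found
# → 500 Internal Server Error
```

So a Not found message is just the standard phrase for status code 404.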
Messages in the Fetch Status column are hyperlinks, clicking them will take us to a page that outlines the status in more detail. Here is a small sample of what you would see on a not found error page:
Fetch as Google
This is how Googlebot fetched the page.
URL: http://joerega.com/sample.html
Date: Thursday, March 6, 2014 at 4:23:54 AM PST
Googlebot Type: Web
Download Time (in milliseconds): 2016
HTTP/1.1 404 Not Found
Date: Thu, 06 Mar 2014 12:24:20 GMT
Notice the 404 Not Found message above. Since this page does not exist, there is no HTML source code to download. It’s safe to assume that this page never existed, or was removed, and should be properly redirected. Jumping back to the main Fetch as Google page, there are a few other areas to discuss. First, you may have noticed that there was a number under Fetches remaining; this number is 500, meaning you are allowed to use this feature for 500 URLs per week. After entering a URL (that exists) and clicking Fetch as Google, the message that appears under Fetch status will say Success and have a button appear next to it that says Submit to index. This will allow us to submit a page to Google’s index faster than waiting for their spiders to crawl our site. If we select the option to submit a URL to Google’s index, we are prompted with two options: Submit just this URL and Submit this URL and all linked URLs.
Notice that the Fetch status on the new URL now says Success and has Submit to index next to it. Clicking Submit to index will cause a pop-up to appear giving us two options. Here we can choose to submit just the one URL, or this URL plus all linked pages. The latter should only be used if there were major updates to your website; this is why that option is limited to 10 per week.
Once submitted, the Submit to index button will now say URL submitted to index. We can then explore the other fetch options that are available. Entering another page or directory into the fetch bar, we see that there are five options total: Web, Mobile XHTML/WML, Mobile cHTML, Mobile Smartphone – New and Mobile Smartphone – Deprecated. The first option, Web, we already saw in action by entering the URL above and submitting it to the index. The other options are used to see how your page will perform on various mobile devices. In an effort to save time and not get too far off course, I will simply mention that they are used to test the different types of markup used for mobile development. Now, to do a complete 180, let’s discuss another area in Webmaster Tools where we can tell if Google and other search engines are ignoring URLs or directories on your site.
Before changing over to Search Console, Google added a feature to this section called Fetch and Render. This allows us to see our web pages as Google perceives them.
This section is now called Blocked Resources and has moved under Google Index, covered in Part 4 of this guide.
Blocked URLs
Blocked URLs in Webmaster Tools allows us to see if our robots.txt is working properly and blocking the content we don’t want indexed. Be careful how you configure your robots.txt, because it’s very easy to overlook something and block an entire section of your website, or your entire website, from Google (yes, I’ve seen this happen). Here is what a basic robots.txt looks like:
User-agent: *
Disallow:
The asterisk ( * ) after User-agent: says that all search engines are welcome to crawl the site. Disallow: on the next line, left blank, says that we are not disallowing search engines from crawling any part of our site. If we were to include a forward slash after Disallow:, this would block our entire website from all search engines.
User-agent: *
Disallow: /

This is what a robots.txt looks like when blocking an entire site.
In this section we are able to test our robots.txt before it’s uploaded to check for any errors. Here is a screenshot of how it looks:
Currently we can see that no search engines are blocked and my website is open for indexing. Further down this page we see an area where we can test blocking pages before they go live in our robots.txt. Here I used the page /test.html (which doesn’t really exist on my site, by the way) to show the results.
The page /test.html would be successfully blocked if I were to add this to my robots.txt. Similar to the Fetch as Google section mentioned above, we are able to test our robots.txt against other Google user-agents such as Googlebot-Mobile, Googlebot-Image, Mediapartners-Google (used to determine AdSense content) and Adsbot-Google (used to determine AdWords landing page quality). Some people choose to add their XML sitemap to the robots.txt as well; while it’s not required, it’s also not harmful to do so. Picture the robots.txt as a roadmap to your website. Search engines will stop here first to determine what pages can be crawled and indexed. They already know to look for an XML sitemap, but adding the line Sitemap: http://www.example.com/sitemap.xml may bring peace of mind to the webmaster. Sitemaps themselves can cause complicated problems if they contain a mistake, which is why I’m glad that Webmaster Tools provides a section for them as well.
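Putting those pieces together, a robots.txt that welcomes all crawlers, blocks the one test page, and points to a sitemap would look something like this (the example.com domain and /test.html page are just placeholders):

```text
User-agent: *
Disallow: /test.html

Sitemap: http://www.example.com/sitemap.xml
```

Everything except /test.html remains open for crawling, and the Sitemap: line simply tells crawlers where to find the XML sitemap.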
As of Wednesday, July 16, there is an easier way to test a website’s robots.txt file using the robots.txt testing tool.
Robots.txt Testing Tool
We are now able to test the performance of our robots.txt before we upload it and make it live. Think of this as a spell check for technical issues. Here we are able to see what sections of our sites are blocked, intentionally or not, and are able to add new fields manually to test how they will work. New Disallow: tags that are added correctly will be highlighted green while incorrect ones will have a red highlight.
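You can reproduce a basic version of this kind of check offline with Python’s standard urllib.robotparser module. This is only a rough stand-in for Google’s tool (and /test.html is the same made-up page as before), but it shows the idea of testing a rule before it goes live:

```python
from urllib import robotparser

# A draft robots.txt that blocks a single (hypothetical) page.
rules = """\
User-agent: *
Disallow: /test.html
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot is refused the blocked page but allowed everything else.
print(rp.can_fetch("Googlebot", "http://www.example.com/test.html"))   # → False
print(rp.can_fetch("Googlebot", "http://www.example.com/about.html"))  # → True
```

Google’s tester does more (syntax highlighting, per-agent testing), but the pass/fail logic for a Disallow: rule is the same.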
I highly recommend you test any intended updates here first before publishing them, because I don’t want to see an entire website blocked from the search engines (believe me, this happens more than you’d expect).
Sitemaps
Sitemaps are an important part of a website. While HTML sitemaps are geared towards humans, XML sitemaps cater to machines; more specifically, search engines. Google Webmaster Tools allows us to upload multiple sitemaps and will alert us to any errors they may contain or develop. We are able to see, in this section as well as in Google Index, how many pages are indexed.
Without spending too much time on sitemaps themselves, I will only cover the basics here today and how we upload them to Webmaster Tools.
The second to last option in the left-nav under Crawl is Sitemaps. Clicking on it will bring us to the following page:
This is easy to visualize since I currently don’t have many pages on my website. The first step in entering a sitemap is to click the red button in the upper-left corner that says ADD/TEST SITEMAP. As soon as that button is clicked we see the following window appear:
An XML sitemap generally lives at the root of the domain as a file. What does that mean? It means it can usually be found here: http://www.example.com/sitemap.xml. Some websites will have site-map.xml or sitemap.xml.gz; these formats work just the same. After you enter the file name in the space provided, click Submit Sitemap. This will add your sitemap to Webmaster Tools and provide you with the image you see above: how many pages are indexed versus how many have been submitted.
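For reference, a minimal XML sitemap (again using the placeholder example.com domain) follows the sitemaps.org protocol and looks something like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2014-03-06</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
```

Only the loc element is required for each url entry; lastmod, changefreq and priority are optional hints.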
If your website is new and recently launched, you may only see one column for Submitted. That’s perfectly normal as Google has not indexed any of your pages yet.
Before submitting your sitemap you may want to test it to see if there are any errors. In lieu of clicking Submit Sitemap, click Test Sitemap and you will be provided feedback on the health of your XML sitemap.
Once your Sitemap is submitted and the URLs are indexed, you can click on the second tab that says Web Pages.
This page will let you know if there are any issues with your Sitemaps. I say Sitemaps because you can submit more than one in this section. Many webmasters will upload separate Sitemaps for different pieces of content: one for images, one for videos, etc.
URL Parameters
This is another technical section of Google Webmaster Tools and should be used very carefully. Changing the parameters of our URLs may severely affect how our sites are crawled and indexed.
URL Parameters work like this: let’s say we have an eCommerce website that sells shoes. One of our URLs may look like this: http://www.example.com/mens?category=sneakers&nike. There may also be another version of the URL that looks like this: http://www.example.com/mens-nike-sneakers.html. To avoid any duplicate content issues with our website, we can use this section to show Google the main URL we would like indexed. Again, I am going to stress the importance of being extremely careful in this section. Google goes into greater detail about this and you can read more here.
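To make the idea of a “parameter” concrete, here is a small Python sketch that pulls the query parameters out of a URL. I have tidied the hypothetical shoe-store URL slightly so that the brand is an explicit key/value pair (brand=nike):

```python
from urllib.parse import urlsplit, parse_qs

url = "http://www.example.com/mens?category=sneakers&brand=nike"

# Split off the query string and break it into key/value pairs.
parts = urlsplit(url)
params = parse_qs(parts.query)

# These key/value pairs are what the URL Parameters tool lets you
# tell Google to ignore or treat as significant when crawling.
print(params)  # → {'category': ['sneakers'], 'brand': ['nike']}
```

Telling Google that a parameter like category doesn’t change page content is exactly the kind of setting that, if wrong, can knock legitimate pages out of the index, hence the caution above.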
We covered a lot of material in this post today, from diagnosing crawl errors to learning how Google sees our website. The Crawl section of Webmaster Tools is very useful when we need to dig deeper into our site’s health.