Google Search Console – A Comprehensive Guide – Part 5

SEO - Google Webmaster Tools

Welcome to part 5 of my 7 1/2 part guide to Google Webmaster Tools. In this post I will be covering everything under the Crawl section.

Crawl

This area of Google Webmaster Tools contains a lot of advanced features used to control the search visibility of your website and provides you with information on how often your site is crawled. The first area in this section is called Crawl Errors.

Crawl Errors

I’m not going to spend too much time on this section since I covered it in Part 1, so I will just do a quick review. This section shows us a list of errors that Google has found when crawling our websites within the last 90 days. We are able to download that list, which includes server response codes, URLs and the date each error was detected, giving you more information to diagnose and solve those issues. Recently added features include Smartphone and Googlebot-Mobile crawl errors. From here you can also see data on DNS, server connectivity and robots.txt issues.

Crawl Stats

This area under Crawl shows all Googlebot activity on our website for the past 90 days. There are three graphs we are presented with when we first land on this page: Pages crawled per day, Kilobytes downloaded per day and Time spent downloading a page (in milliseconds).

Webmaster Tools Crawl Stats
Google Webmaster Tools Crawl Stats Screenshot

Rolling over a graph shows the date and the data for that day: how many pages were crawled, how many kilobytes were downloaded or how much time was spent downloading a page. To the right of these charts are highlights such as High, Average and Low, which makes it a lot easier to diagnose any issues with site load time. If there are any issues with our site, Webmaster Tools allows us to Fetch as Google to see how Googlebot sees your page.
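
If you want a rough, do-it-yourself feel for what the Kilobytes downloaded and Time spent downloading numbers mean, a few lines of Python can time a single page fetch. This is only a minimal sketch under the assumption that you swap in a page of your own; the URL below is a placeholder, and Googlebot’s own measurements will differ from anything you record locally.

import time
import urllib.request

# Time one page download as a rough, local comparison point for the
# "Time spent downloading a page" graph. The URL is a placeholder.
url = "http://www.example.com/"

start = time.time()
with urllib.request.urlopen(url, timeout=30) as response:
    body = response.read()
elapsed_ms = (time.time() - start) * 1000

print("Status:", response.status)
print("Downloaded %.1f KB in %.0f ms" % (len(body) / 1024, elapsed_ms))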

Fetch as Google

The third link down under Crawl is Fetch as Google. Here we have the ability to submit pages to Google’s index as well as see how Google’s spider sees the page.

Webmaster Tools Fetch as Google
Google Webmaster Tools Fetch as Google Screenshot

From here we have a few options to choose from when we want to see how Google sees our website. The first step is to enter a URL path into the text box. For example, if your website is http://www.example.com, you only need to enter the page you are interested in seeing; this is what comes after .com/. Let’s say we are interested in /sample.html; we would enter sample.html in the space provided and click FETCH. After we click fetch, we are prompted with a pending message and then Your request was completed successfully. Here is what it looks like:

webmaster tools fetch as Google screenshot 2
Google Webmaster Tools Fetch as Google Screenshot 2

Since I don’t currently have the page /sample.html on my website, this tool is going to return an error message of Not found. From doing SEO we know that not found is a 404 error.

webmaster tools fetch as Google screenshot 3
Google Webmaster Tools Fetch as Google Screenshot 3

Messages in the Fetch Status column are hyperlinks; clicking them will take us to a page that outlines the status in more detail. Here is a small sample of what you would see on a Not found error page:

Fetch as Google

This is how Googlebot fetched the page.
URL: http://joerega.com/sample.html
Date: Thursday, March 6, 2014 at 4:23:54 AM PST
Googlebot Type: Web
Download Time (in milliseconds): 2016
HTTP/1.1 404 Not Found
Date: Thu, 06 Mar 2014 12:24:20 GMT

Notice the 404 Not Found message above. Since this page does not exist, there is no HTML source code to download. It’s safe to assume that this page never existed, or was removed, and should be properly redirected.

Jumping back to the main Fetch as Google page, there are a few other areas to discuss. First, you may have noticed the number under Fetches remaining; this number is 500, meaning you are allowed to use this feature for 500 URLs per week. After entering a URL (that exists) and clicking Fetch as Google, the message that appears under Fetch status will say Success, and a button will appear next to it that says Submit to index. This allows us to submit a page to Google’s index faster than waiting for its spiders to crawl our site. If we select the option to submit a URL to Google’s index, we are prompted with two options: Submit just this URL and Submit this URL and all linked URLs.
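
If you would rather spot 404s in bulk before spending your fetches, a short script can check status codes directly. This is a minimal sketch and the URLs are placeholders for pages on your own site; it only reports the HTTP response, not how Googlebot renders the page.

import urllib.error
import urllib.request

# Report the HTTP status code for each URL; the list is a placeholder.
urls = [
    "http://www.example.com/",
    "http://www.example.com/sample.html",
]

for url in urls:
    try:
        with urllib.request.urlopen(url, timeout=15) as response:
            print(url, response.status)               # e.g. 200
    except urllib.error.HTTPError as e:
        print(url, e.code)                            # e.g. 404 Not Found
    except urllib.error.URLError as e:
        print(url, "connection error:", e.reason)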

webmaster tools fetch as Google screenshot 5
Google Webmaster Tools Fetch as Google Screenshot 5

Notice that the Fetch status on the new URL now says Success and has Submit to index next to it. Clicking Submit to index will cause a pop-up to appear giving us two options. Here we can choose to submit just this one URL or this URL plus all linked pages. The latter should only be used if there were major updates to your website, which is why that option is limited to 10 per week.

webmaster tools fetch as Google screenshot 4
Google Webmaster Tools Fetch as Google Screenshot 4

Once submitted, the Submit to index button will change to say URL submitted to index. We can then explore the other fetch options that are available. Entering another page or directory into the fetch bar, we see that there are five options in total: Web, Mobile XHTML/WML, Mobile cHTML, Mobile Smartphone – New and Mobile Smartphone – Deprecated. We already saw how the first option, Web, works when we entered the URL above and submitted it to the index. The other options are used to see how your page will perform on various mobile devices; in an effort to save time and not get too far off course, I will just mention that they test the different types of markup used for mobile development. Now, to do a complete 180, let’s discuss another area in Webmaster Tools where we can tell if Google and other search engines are being blocked from URLs or directories on your site.

***Update 9/24/2015

Before Webmaster Tools changed over to Search Console, Google added a feature to this section called Fetch and Render. This allows us to see our web pages as Google perceives them.

For example, if there are any blocked resources, such as a CSS or JavaScript file, the page may render differently for Googlebot than it would for you or me.

Fetch and render - Google Search Console

Blocked URLs

***Update 9/24/2015

This section is now called Blocked Resources and has moved under Google Index, covered in Part 4 of this guide. ***

Blocked URLs in Webmaster Tools allows us to see if our robots.txt is working correctly and blocking the content we don’t want indexed. Be careful how you configure your robots.txt, because it’s very easy to overlook something and block an entire section of your website, or even your whole website, from Google (yes, I’ve seen this happen). Here is what a basic robots.txt looks like:

User-agent: *
Disallow:

The asterisk (*) after User-agent: says that all search engines are welcome to crawl the site. The empty Disallow: on the next line says that we are not disallowing search engines from crawling any part of the site. If we were to include a forward slash after Disallow:, that would block our entire website from all search engines:

User-agent: *
Disallow: /

This is what a robots.txt looks like when blocking an entire site.

In this section we are able to test our robots.txt before it’s uploaded to check for any errors. Here is a screenshot of how it looks:

webmaster tools blocked urls screenshot 1
Google Webmaster Tools Blocked URLs Screenshot 1

Currently we can see that no search engines are blocked and my website is open for indexing. Further down this page there is an area where we can test blocking pages before the changes go live in our robots.txt. Here I used the page /test.html (which doesn’t really exist on my site, by the way) to show the results.

webmaster tools blocked urls screenshot 2
Google Webmaster Tools Blocked URLs Screenshot 2

The page /test.html would be successfully blocked if I were to add this rule to my robots.txt. Similar to the Fetch as Google section mentioned earlier, we are able to test our robots.txt against other Google user-agents such as Googlebot-Mobile, Googlebot-Image, Mediapartners-Google (used to determine AdSense content) and Adsbot-Google (used to determine AdWords landing page quality).

Some people choose to add their XML sitemap to the robots.txt as well; while it’s not required, it’s also not harmful to do so. Picture the robots.txt as a roadmap to your website: search engines will stop here first to determine what pages can be crawled and indexed. They already know to look for an XML sitemap, but adding the line Sitemap: http://www.example.com/sitemap.xml may bring peace of mind to the webmaster. Sitemaps themselves can become very complicated if there is a mistake, which is why I’m glad that Webmaster Tools provides a section for them as well.

 

**addendum**

As of Wednesday, July 16, there is an easier way to test a website’s robots.txt file: the robots.txt Testing Tool.

Robots.txt Testing Tool

We are now able to test the behavior of our robots.txt before we upload it and make it live. Think of this as a spell check for technical issues. Here we can see which sections of our sites are blocked, intentionally or not, and we can add new directives manually to test how they will work. New Disallow: lines that are added correctly will be highlighted green, while incorrect ones will be highlighted red.

I highly recommend you test any intended updates here first before publishing them, because I don’t want to see an entire website blocked from the search engines (believe me, this happens more often than you’d expect).
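
If you like a second opinion outside of Webmaster Tools, Python’s built-in urllib.robotparser can answer the same allowed/blocked questions against a draft robots.txt. The sketch below reuses the /test.html rule from the example above, with example.com as a placeholder domain:

import urllib.robotparser

# Draft rules mirroring the /test.html example above (placeholder content).
rules = """
User-agent: *
Disallow: /test.html
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Check the same path against a few Google user-agents.
for agent in ["Googlebot", "Googlebot-Mobile", "Googlebot-Image"]:
    allowed = parser.can_fetch(agent, "http://www.example.com/test.html")
    print(agent, "allowed" if allowed else "blocked")

All three agents fall under the wildcard rule here, so each one reports blocked; adding agent-specific User-agent: sections to the draft would change the results accordingly.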

 

Sitemaps

Sitemaps are an important part of a website. While HTML sitemaps are geared towards humans, XML sitemaps cater to machines; more specifically, search engines. Google Webmaster Tools allows us to upload multiple sitemaps and will alert us to any errors they contain or develop. We are able to see, in this section as well as in Google Index, how many pages are indexed.

Without spending too much time on sitemaps themselves, I will only cover the basics here today and how we upload them to Webmaster Tools.

The second to last option in the left-nav under Crawl is Sitemaps. Clicking on it will bring us to the following page:

webmaster tools sitemaps screenshot 1
Google Webmaster Tools Sitemaps Screenshot 1

 

This is easy to visualize since I currently don’t have many pages on my website. The first step in entering a sitemap is to click the red button in the upper-left corner that says ADD/TEST SITEMAP. As soon as that button is clicked, we see the following window appear:

webmaster tools sitemaps screenshot 3
Google Webmaster Tools Sitemaps Screenshot 3

An XML sitemap generally lives as a file at the root of the domain. What does that mean? It means it can usually be found here: http://www.example.com/sitemap.xml. Some websites will have site-map.xml or sitemap.xml.gz; these formats work just the same. After you enter the file name in the space provided, click Submit Sitemap. This will add your sitemap to Webmaster Tools and provide you with the view you see above: how many pages are indexed versus how many have been submitted.
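
If you are building a small sitemap by hand, a few lines of Python can write a valid file for you. This is a minimal sketch: the two page URLs are placeholders, and larger sites will usually rely on their CMS or a crawler to generate the list of pages.

import xml.etree.ElementTree as ET

# Write a minimal sitemap.xml; the page URLs are placeholders.
urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")

for page in ["http://www.example.com/", "http://www.example.com/about.html"]:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = page

ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)

Upload the resulting file to the root of your domain so it lives at http://www.example.com/sitemap.xml, matching the location described above.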

If your website is new and recently launched, you may only see one column for Submitted. That’s perfectly normal as Google has not indexed any of your pages yet.

Before submitting your sitemap, you may want to test it to see if there are any errors. In lieu of clicking Submit Sitemap, click Test Sitemap and you will be given feedback on the health of your XML sitemap.

Once your Sitemap is submitted and the URLs are indexed, you can click on the second tab that says Web Pages.

webmaster tools sitemaps screenshot 2
Google Webmaster Tools Sitemaps Screenshot 2

This page will let you know if there are any issues with your Sitemaps. I say Sitemaps because you can submit more than one in this section. Many webmasters will upload separate Sitemaps for different pieces of content: one for images, one for videos, etc.

 

URL Parameters

This is another technical section of Google Webmaster Tools and should be used very carefully. Changing how the parameters in our URLs are handled may severely affect how our sites are crawled and indexed.

URL Parameters work like this: let’s say we have an eCommerce website that sells shoes. One of our URLs may look like this: http://www.example.com/mens?category=sneakers&nike. There may also be another version of the URL that looks like this: http://www.example.com/mens-nike-sneakers.html. To avoid any duplicate content issues on our website, we can use this section to show Google the main URL we would like indexed. Again, I am going to stress the importance of being extremely careful in this section. Google goes into greater detail about this and you can read more here.
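
To see why parameterized URLs cause duplicate content, here is a minimal Python sketch; the parameter names and session ID are invented for illustration, and the URL Parameters tool itself is configured inside Webmaster Tools rather than in code.

from urllib.parse import urlsplit, urlunsplit

# Two parameterized variations that serve the same product listing
# (placeholder URLs and parameters).
variants = [
    "http://www.example.com/mens?category=sneakers&brand=nike",
    "http://www.example.com/mens?brand=nike&category=sneakers&sessionid=123",
]

def strip_query(url):
    # Dropping the query string shows how many distinct pages remain.
    scheme, netloc, path, _query, _fragment = urlsplit(url)
    return urlunsplit((scheme, netloc, path, "", ""))

print({strip_query(u) for u in variants})   # only one canonical path is left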

 

Conclusion

We covered a lot of material in this post today, from diagnosing crawl errors to learning how Google sees our website. The Crawl section of Webmaster Tools is very useful when we need to dig deeper into our site’s health.

 

Back to Part 4

Proceed to Part 6

About the Author:

An SEO in NYC with a penchant for the technical side of things. Father, Husband, Novice Photographer and Music Lover.

2 comments

  1. Dee Dee

    This article is very helpful! It answered many of my questions. I do have one question left though. Any idea what’s included in “pages crawled per day”? In GWT, I have a site where Google currently has about 700 pages indexed and the sitemap only contains about 1,100 pages. But the crawl stats for pages crawled per day spiked to 17,000 one day last month and then stayed at about 5,000 pages crawled per day for a whole week. Is there any explanation for the number jumping so high? There aren’t even that many pages on the site. I thought maybe the “pages crawled per day” might include more than just website pages? Thanks.

    • Hi Dee Dee,
      Sorry for the delay in response, I’m just seeing this now. The pages crawled per day figure looks at each page and the links therein. So, if your Home page has 20 links in its navigation, it will crawl those 20 pages, as many times as they’re linked to.
      The goal is to increase the number of pages crawled (this will show an increase in content) and reduce the site’s load time.
      I hope this was helpful!
      –Joe
