In late July, Google added Index Status to Webmaster Tools to help site owners better understand how many pages are indexed on their websites. In addition, Index Status can help webmasters diagnose indexation problems, which can be caused by redirects, canonicalization issues, duplicate content, or security problems. Until now, many webmasters relied on less-than-optimal methods for determining true indexation, such as running site: commands against a domain, subdomain, or subdirectory. This was a maddening exercise for many SEOs, since the number shown could radically change (and quickly).
So, Index Status was a welcome addition to Webmaster Tools. That said, I’m getting a lot of questions about what the reports mean, how to analyze the data, and how to diagnose potential indexation problems. So that’s exactly what I’m going to address in this post. I’ll introduce the reports and then explain how to use that data to better understand your site’s indexation. Note, it’s important to understand that Index Status doesn’t necessarily answer questions. Instead, it might raise red flags and prompt more questions. Unfortunately, it won’t tell you where the indexation problems reside on your site. That’s up to you and your team to figure out.
The Index Status reports are under the “Health” tab in Google Webmaster Tools. The default report (or “Basic” report) will show you a trending graph of total pages indexed for the past year. This report alone can signal potential problems. For most sites, you should see a steady increase in indexation over time. For example, this is a normal indexation graph:
But what about a trending graph that shows spikes and valleys? If you see something like the graph below, it very well could mean you are experiencing indexation issues. Notice how the indexation graph spikes, then drops, only to spike again. There may be legitimate reasons why this is happening, based on changes you made to your site. But if you have no idea why your indexation is spiking, then further site analysis is required to understand what’s going on. Once again, this is why SEO Audits are so powerful.
Now it’s time to dig into the advanced report, which definitely provides more data. When you click the “Advanced” tab, you’ll see four trending lines in the graph. The data includes:
- Total Indexed
- Ever Crawled
- Not Selected
- Blocked by Robots
“Total indexed” is the same data we saw in the basic report. “Ever crawled” shows the total number of pages ever crawled by Google (the cumulative total). “Not selected” includes the total number of pages that were not selected for indexing, either because they look extremely similar to other pages or because they redirect to other pages. I’ll cover “Not selected” in more detail below. And “Blocked by robots” is just that, pages that you are choosing to block. Note, those are pages you are hopefully choosing to block… More about that below.
What You Can Learn From Index Status
When you analyze the advanced report, you might notice some strange trending right off the bat. For example, if you see the number of pages blocked by robots.txt spike, then you know someone added new directives. One of my clients had that number jump from 0 to 20,000+ URLs in a short period of time. Again, if you want this to happen, then that’s totally fine. But if this surprises you, then you should dig deeper.
Depending on how you structure a robots.txt file, you can easily block important URLs from being crawled and indexed. It would be smart to analyze your robots.txt directives to make sure they are accurate. Speak with your developers to better understand the changes that were made, and why. You never know what you are going to find.
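To illustrate how easily this can happen, here’s a hypothetical robots.txt (the paths are made up for this example) where one overly broad directive blocks far more than intended:

```
User-agent: *
# Intended: keep internal search results out of the index
Disallow: /search/

# Accidental: with no trailing slash, this blocks EVERY URL
# beginning with /p, including /products/ and /pricing/
Disallow: /p
```

Since Disallow rules are matched as URL prefixes, a single missing slash can take out thousands of important pages, and you’d see it as a spike in the “Blocked by robots” line.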
The Red Flag of “Not Selected”
If you notice a large number of pages that fall under “Not selected”, then that could also signal potential problems. Note, depending on the type of website you have, it might be completely normal to see a larger number of “Not selected” pages than indexed pages. It’s natural for Google to run into some redirects and non-canonical URLs while crawling your site. And that’s especially the case with ecommerce sites or large publishers.
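For example, an ecommerce product page often resolves at several URLs, with sorting and tracking parameters being typical culprits. A rel=canonical tag in the head of each variant tells Google which version to index, and the variants would then typically show up under “Not selected” (the URLs here are hypothetical):

```
<!-- On http://www.example.com/widgets?sort=price&sessionid=123 -->
<link rel="canonical" href="http://www.example.com/widgets" />
```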
But that number should not be extreme… For example, if you see the number of pages flagged as “Not selected” suddenly spike to 100K pages when you only have 1,500 pages indexed, then you might have a new technical issue on your hands. Maybe each page on your site is resolving at multiple URLs based on a coding change. That would yield many “Not selected” pages. Or maybe you implemented thousands of redirects without realizing it. Those would fall under “Not selected” as well.
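If you suspect pages are resolving at multiple URLs, a quick script can help you check. Here’s a minimal sketch (the helper name, URL list, and tracking parameters are my own assumptions for illustration, not anything from Webmaster Tools) that collapses common URL variants so you can see how many distinct pages a list of crawled URLs really represents:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical set of parameters that create duplicate URLs on many sites
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}

def canonicalize(url):
    """Collapse common variants (www, trailing slash, scheme,
    tracking parameters) into one canonical form."""
    parts = urlsplit(url.lower())
    host = parts.netloc[4:] if parts.netloc.startswith("www.") else parts.netloc
    path = parts.path.rstrip("/") or "/"
    query = urlencode([(k, v) for k, v in parse_qsl(parts.query)
                       if k not in TRACKING_PARAMS])
    return urlunsplit(("http", host, path, query, ""))

# Three crawled URLs that are really the same page
urls = [
    "http://www.example.com/widgets/",
    "https://example.com/widgets?utm_source=news",
    "http://example.com/widgets",
]
print(len({canonicalize(u) for u in urls}))  # -> 1 distinct page
```

If a list of URLs pulled from your server logs or a crawl collapses down to a fraction of its size, you’ve likely found a source of “Not selected” inflation.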
Index Status can also flag potential hacking scenarios. If you notice the number of pages indexed spike or drop significantly, then it could mean that someone (or some bot) is adding or deleting pages from your site. For example, someone might be adding pages to your site that link out to a number of other websites delivering malware. Or maybe they are inserting rich anchor text links to other risky sites from newly-created pages on your site. You get the picture.
Again, these reports don’t answer your questions, they prompt you to ask more. Take the data and speak with your developers. Find out what has changed on the site, and why. If you are still baffled, then have an SEO audit completed. As you can guess, these reports would be much more useful if the problematic URLs were listed. That would provide actionable data right within the Index Status reports in Google Webmaster Tools. My hope is that Google adds that data some day.
Bonus Tip: Use Annotations to Document Site Changes
For many websites, change is a constant occurrence. If you are rolling out new changes to your site on a regular basis, then you need a good way to document those changes. One way of doing this is by using annotations in Google Analytics. Using annotations, you can add notes for a specific date that are shared across users of the GA profile. I use them often when changes are made SEO-wise. Then it’s easier to identify why certain changes in your reporting are happening. So, if you see strange trending in Index Status, then double check your annotations. The answer may be sitting right in Google Analytics. :)
Summary – Analyzing Your Index Status
I think the moral of the story here is that normal trending can indicate strong SEO health. You want to see gradual increases in indexation over time. That said, not every site will show that natural increase. There may be spikes and valleys as technical changes are made to a website. So, it’s important to analyze the data to better understand the number of pages that are indexed, how many are being blocked by robots.txt, and how many are not selected based on redirects or canonical issues. What you find might be completely expected, which would be good. But, you might be uncovering a serious issue that’s inhibiting important pages from being crawled and indexed. And that can be a killer SEO-wise.