Sinister 404s – The Hidden SEO Danger of Returning The Wrong Header Response Code [Case Study]

Hidden SEO Danger 404 Response Code

A few weeks ago, I was contacted by a small business owner about my SEO services. And what started out as a simple check of a website turned into an interesting case study about hidden SEO dangers. The company has been in business for a long time (30+ years), and the owner was looking to boost the site’s SEO performance over the long-term. From the email and voicemail I received, it sounded like they were struggling to rank well across important target queries and wanted to address that ASAP. I also knew they were running AdWords to provide air cover for SEO (which is smart, but definitely not a long-term plan for their business).

Unfortunately, my schedule has been crazy and I knew I couldn’t take them on as a longer-term client. But, I still wanted to quickly check out their website to get a better feel for what was going on. And it took me about three minutes to notice a massive problem (one that is killing their efforts to rank for many queries). And that’s a shame because they probably should rank for those keywords based on their history, services, content, etc.

Surfacing a Giant SEO Problem
As I browsed the site, I noticed they had a good amount of content for a small business. The site had a professional design, it was relatively clean from a layout perspective, and provided strong content about their business, their history, news about the organization, the services they provided, and more.

But then it hit me. Actually, it was staring me right in the face. I noticed a small 404 icon when hitting one of their service pages (via the Redirect Path Chrome extension). OK, so that's odd… The page renders fine, the content and design show up perfectly, but the page 404s (returning a Page Not Found error). It's like the opposite of a soft 404, where the page looks like a 404 but actually returns a 200 code. In this situation, the page looks like a 200 but returns a 404 instead. I guess you could call it a "soft 200".

404 Header Response Code in Redirect Path Chrome Extension

So I started to visit other pages on the site and more 404 header response codes followed. Actually, almost every single page on the site was throwing a 404 header response code. Holy cow, the initial 404 was just the tip of the iceberg.

After seeing 404s pop up all over the site, I quickly decided to crawl the website via Screaming Frog. I wanted to see how widespread the problem was. And it turns out my initial assessment was spot on. Almost every page on the site returned a 404 header response code. The only pages that didn't were the homepage and some pdfs. But every other page, including the services pages, news pages, about page, contact, etc., returned a 404.

Header Response Codes in Screaming Frog

If you're familiar with SEO, then you know how this problem can impact a website. But for those of you unfamiliar with 404s and how they impact SEO, I'll provide a quick rundown next. Then I'll jump back to the story.

What is a 404 Header Response Code?
Every time a webpage is requested, the server returns a header response code. There are many codes that can be returned, but there are some standard ones you'll come across. For example, 200 means the request succeeded (OK), 301 is a permanent redirect, 302 is a temporary redirect, 403 is forbidden, 404 means the page was not found, and 500 is an internal server error.

Header response codes are extremely important to understand for SEO. If you want a webpage indexed, then you definitely want it to return a 200 response code (which again, means OK, the request has succeeded). But if the page returns a 404, then that tells the engines that the page was not found and that it should be removed from the index. Yes, read that last line again. 404s basically inform Google and Bing that the page is gone and that it can be removed from each respective index. That means it will have no shot of ranking for target keywords.

And from an inbound links perspective, 404s are a killer. If a page 404s, then it cannot benefit from any inbound links pointing at the url. And the domain itself cannot benefit either (at an aggregate level). So 404s will get urls removed from Google’s index and can hamper your link equity (at the url level and at the domain level). Not good, to say the least.

Side Note: Checking Response Codes
Based on what I’ve explained, some of you reading this post might be wondering how to easily check your header response codes. And you definitely should. I won’t cover the process in detail in this post, but I will point you in the right direction. There are several tools to choose from and I’ll include a few below.

You can use Fetch as Google in Google Webmaster Tools to check the response sent to Googlebot (which includes the header response code). You can also use a browser plugin like Web Developer Tools or Redirect Path to quickly check header response codes on a url by url basis.

Web Developer Plugin Header Response Code
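
And if you're comfortable with a little scripting, you can spot check a handful of urls yourself. Here's a minimal sketch in Python (the urls are placeholders, and it assumes the requests library is installed):

```python
import requests

# A handful of urls to spot check (placeholders - swap in your own pages)
urls = [
    "http://www.example.com/",
    "http://www.example.com/services/",
    "http://www.example.com/about-us/",
]

for url in urls:
    # allow_redirects=False so you see the first response code returned,
    # not the code of the final destination after any redirects.
    # Swap requests.head for requests.get if a server handles HEAD poorly.
    response = requests.head(url, allow_redirects=False, timeout=10)
    print(response.status_code, url)
```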

Fetch as Google and browser plugins are great, but they only let you process one url at a time. But what if you wanted to check your entire site in one shot? For situations like that, you could use a tool that crawls an entire website (or sections of a site). For example, you could use Xenu or Screaming Frog for small to medium sized sites and then a tool like Deep Crawl for larger-scale sites. All three will return a boatload of information about your pages, including the header response codes. Now back to the case study.

Dangerous, But Invisible to the Naked Eye
Remember, the entire site was returning 404 header response codes, other than the homepage and a few pdfs. But this 404 situation was sinister since the webpages looked like they resolved ok. You didn’t see a standard 404 page, but instead, you saw the actual page and content. But, the pages were actually 404ing and not being indexed. Like I said, it was a sinister problem.

Based on what I just explained, you could tell why an SMB owner would be baffled and simply not understand why their website wasn’t ranking well. They could see their site, their content, the various pages resolving, but they couldn’t see the underlying problem. Header response codes are hidden to the naked eye, and most people don’t even realize they are being returned at all. But the response code returned is critically important for how the search engines process your webpages.

Swingers Find Hidden 404s

My Response – “You’re At SEO Defcon 2”
This was a tough situation for me. I absolutely wanted to help the business longer-term, but couldn’t based on my schedule. But I absolutely wanted to make sure they understood the problem I came across while quickly checking out their website.

So I crafted a quick email explaining that I couldn’t help them at this time, but that I found a big problem on their site. As quickly and concisely as I could, I explained the 404 situation, provided a few screenshots, and explained they should get in touch with their designer, developer, or hosting provider to rectify the situation ASAP. That means ensuring their webpages return the proper header response codes. Basically, I told them that if their webpages should be indexed, then they should return a 200 header response code and not the 404s being returned now.

I hit “Send” and the ball was in their court.

Their Response – “We hear you and we’re on the right track – we think.”
I heard back from the business owner who explained they started working with someone to rectify the problem. They clearly didn’t know this was going on and they were hoping to have the situation fixed soon.

But as of today, the problem is still there. The site still returns 404 header response codes on almost every page. That’s unfortunate, since again, the pages returning a 404 have no chance at all of ranking in search and cannot help them from a link equity standpoint. The pages aren’t indexed and the site is basically telling Google and Bing to not index any of the core pages on the site.

I’m going to keep an eye on the situation to see when the changes take hold. And I hope that’s soon. It’s a great example of how hidden technical dangers can destroy SEO.

Opening Up The Site – How Will The Engines Respond?
My hope is that when the pages return the proper response codes that Google and Bing will begin indexing the pages and ranking them appropriately. And that will help on several levels. The website can drive more prospective customers via organic search, while the business can probably pull back on AdWords spend. And the site can grow its power from an inbound link standpoint as well, now that the pages are being indexed properly.

But as I often say about SEO, it's all about the execution. If they don't implement the necessary changes, then their situation will remain as-is. I'll try to update this post if the situation improves.

Summary – Know Your Header Response Codes
Although hidden to the naked eye, header response codes are critically important for SEO. The right codes will enable the engines to properly crawl and index your webpages, while the wrong codes could lead to SEO disaster. I recommend checking your site today (via both manual checks and a crawl). You might find you’re in the clear with 200s, but you also might find some sinister 404s. So check now.

GG

 

How To Identify A Mobile Rankings Demotion Using The New Search Analytics Report in Google Webmaster Tools

Search Impact Reporting in Google Webmaster Tools

{Update: The Search Impact report was renamed to "Search Analytics" during the beta. Some screenshots below still show "Search Impact", but the report in Google Webmaster Tools is now labeled "Search Analytics".}

April 21, 2015 is an important date. That’s the day when Google will begin using mobile friendliness as a ranking signal. There’s been a lot of talk about how that’s actually going to work, how much of an impact it will have, etc. Well, more and more information has been surfacing over the past few days about the changes.

For example, Gary Illyes spoke at length at SMX West about the new mobile UX algo and provided some outstanding information. Jennifer Slegg wrote up a recap of that session, which I highly recommend reading. She provided some amazing nuggets of information, including details about mobile friendly techniques, how the algo will handle specific urls, whether 4/21 is a hard date for the rollout, whether Google is building a mobile index (it is), and more.

So, as 4/21 quickly approaches, many webmasters are working hard to get their sites in order from a mobile UX standpoint. As documented by John Mueller and Gary Illyes (and really Google itself), you can use any of three options for providing a mobile-friendly version of your website: responsive design, dynamic serving, or a separate mobile site. I've seen all three techniques work well for clients, so the path you choose should be based on your own site and business. But definitely move quickly… April 21 will be here before you know it.

 

The *Current* Smartphone Rankings Demotion – A Glimpse Into the Future
Many people don’t realize this, but Google already has a smartphone rankings demotion in place for specific situations. For example, when there are faulty redirects from the desktop version of the content to the mobile version, or if there are other mobile-only errors.

I caught one of those situations in the wild and wrote a two-part case study about it. I first detailed the problems I saw on Electronista.com and then documented the improvements in rankings and traffic once the problems were fixed. Based on what Gary Illyes and John Mueller have both said about the mobile UX algo, it sounds like the new algorithm will work in a very similar fashion to the current smartphone rankings demotion. Therefore, I definitely recommend you review the two-part case study.

Checking For Faulty Mobile Redirects

For example, the current smartphone rankings demotion works on a url by url basis. Just because you have faulty redirects or mobile-only errors does not mean the entire domain should suffer (algorithmically). The desktop urls are unaffected (which makes absolute sense). And importantly, the algorithm runs in real-time and impacts urls during the normal crawling process.

That means urls can be demoted as Google comes across mobile problems, but the demotion can also be lifted as Google crawls the urls and notices that the problems are fixed. And that’s exactly what I saw with the smartphone rankings demotion situations I have helped with.

 

Checking Mobile Rankings and The (New) Search Analytics Report
Google is currently testing a new search queries report in Google Webmaster Tools (called the Search Analytics report). Note, the report used to be called “Search Impact”, but was changed during the alpha. I have been testing the new version of the Search Analytics reporting and it provides some great functionality beyond what the current Search Queries reporting provides. I plan to write more about that soon, but for now, let’s focus on the mobile friendliness algorithm rolling out on 4/21.

There are six dimensions you can segment your data by in the new Search Analytics reporting. One of those dimensions is “Devices”. Using this report, you can filter data by desktop, mobile, and tablet. See below:

The Devices Dimension in The Search Impact Reporting

But don’t get fooled by the simplicity of the default report. By combining dimensions, you can view some elaborate reports that tell you a lot in a short amount of time.

When working on a smartphone rankings demotion (the current algo in place), I had to identify queries where a site ranked well in the desktop results, and then jump to the search queries reporting using the "mobile" filter for search property. When doing this for a large number of queries, it can get monotonous quickly.

But the new Search Analytics report comes to the rescue and provides a nifty way to see side by side rankings when comparing desktop to mobile. Below, I’m going to show you how to quickly run this report to see a side by side comparison of clicks and average position by query. By doing so, you can quickly identify a smartphone rankings demotion. That’s for the current smartphone rankings demotion, and should work for the new mobile UX algo rolling out on 4/21/15. Let’s jump into the report.

 

How To Check Rankings By Device
First, if you’re not part of the alpha testing program, then you won’t be able to access the Search Analytics report. But don’t fear, I can only imagine that Google wants to roll it out prior to 4/21/15 (based on the device reporting I’m showing you in this post).

To access the reporting, click “Search Traffic” and then “Search Analytics” in the left-side menu:

Accessing The Search Impact Reporting in Google Webmaster Tools

The default view will show you clicks for the past 30 days. The first thing you need to do is click the “Queries” dimension. That will present all of the queries your site ranks for during the timeframe you selected.

Using The Queries Dimension In The Search Impact Reporting

Next, click the filter dropdown underneath "Devices", which should say "No filter" (since there isn't a filter in place yet). Click the dropdown and then select "Compare devices".

Filtering By Device In The Search Impact Reporting

Keep “Desktop VS. Mobile” as the selection and then click “Compare”.

Comparing By Device In The Search Impact Reporting

You should now see a comparison of clicks per query for both desktop and mobile. That’s great, but we need to know how the site ranks for each query across both desktop and mobile. To see that, click the checkbox for the “Avg. Position” metric.  This will add average position for each query to the report.

Adding The Average Position Metric In The Search Impact Reporting

To view more queries than the default ten, you can use the dropdown at the top of the report. For example, you can show up to 500 rows in the report in Google Webmaster Tools.

Now you can start checking rankings for queries across both desktop and mobile. Don’t expect them to be exactly the same for every query… But they should be close. For example, the first three listed below are very close (two are identical and one is off by just .1).

Comparing Average Position by Query In The Search Impact Reporting

In my experience, when you have a smartphone rankings demotion, there will be a clear difference. For example, some smartphone rankings will be 10+ positions lower (or even non-existent in certain situations). So, if you see rows like the following, then you might have a problem.

Identifying a Rankings Difference In The Search Impact Reporting
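
And if you're comparing hundreds of queries, eyeballing the table gets tedious. If you can export (or copy) the comparison into a CSV, a short script can flag the big gaps for you. Here's a minimal sketch in Python/pandas; the file name and column names are assumptions, so rename them to match whatever your export actually contains:

```python
import pandas as pd

# Assumes the desktop vs. mobile comparison has been saved to a CSV.
# The file name and column names below are assumptions - adjust to your export.
df = pd.read_csv("search-analytics-desktop-vs-mobile.csv")

# Positive gap = mobile ranks worse than desktop for that query
df["position_gap"] = df["Mobile Avg. Position"] - df["Desktop Avg. Position"]

# Flag queries where mobile is 10+ positions worse (a possible demotion)
demoted = df[df["position_gap"] >= 10].sort_values("position_gap", ascending=False)
print(demoted[["Query", "Desktop Avg. Position", "Mobile Avg. Position", "position_gap"]])
```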

 

How To Identify Problems and Lift The Smartphone Rankings Demotion
If you find that there is a smartphone rankings demotion in place, then you should run to the "Mobile Usability" reporting in Google Webmaster Tools. Google will list the problems it encountered while crawling your site. I highly recommend fixing those mobile usability issues ASAP.

Mobile Usability Reporting in Google Webmaster Tools

You can also use the mobile friendly test via the Google Developers site. That will also highlight problems on a url by url basis.
https://www.google.com/webmasters/tools/mobile-friendly/

Using Google's Mobile Friendly Test

You can also check the crawl errors reporting in Google Webmaster Tools to see if there are smartphone errors or faulty redirects.

Smartphone Crawl Errors in Google Webmaster Tools

And you can crawl your site as Googlebot for Smartphones to check how your site is handling requests for the desktop pages (if you have mobile redirects in place). Doing so can surface problems sitting below the surface that are sometimes hard to pick up manually.

Crawl As Googlebot for Smartphones
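
If you can't run a full crawl, you can at least spot check a few desktop urls with a smartphone user-agent to see where they end up. Here's a minimal sketch in Python (the urls are placeholders, and the user-agent string is just an example; swap in the current Googlebot for Smartphones string from Google's documentation if you want to mimic Google's mobile crawler):

```python
import requests

# Example smartphone user-agent (a placeholder - not the official
# Googlebot for Smartphones string, so double-check Google's docs)
MOBILE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) "
    "AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12A366 Safari/600.1.4"
)

desktop_urls = [
    "http://www.example.com/some-desktop-page/",
    "http://www.example.com/another-desktop-page/",
]

for url in desktop_urls:
    response = requests.get(url, headers={"User-Agent": MOBILE_UA},
                            allow_redirects=True, timeout=10)
    # A desktop url that sends every mobile visitor to the mobile homepage
    # (instead of the equivalent mobile page) is a faulty redirect.
    print(url, "->", response.url, response.status_code)
```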

 

Summary – The Search Analytics Report Can Make An Impact
We all knew that mobile UX would become a ranking signal at some point, but now we have a specific date from Google for the rollout (4/21/15). When the new mobile algo launches, many will be wondering if they have been impacted, if their website dropped in rankings, and which urls are causing problems. As I demonstrated above, the new Search Analytics reporting can help webmasters identify problems quickly and efficiently by comparing rankings across desktop and mobile.

If you don’t have access to the Search Analytics reporting yet, don’t worry. Again, I believe Google is going to roll this out before the 4/21 deadline. That would make complete sense, since the “Devices” dimension could prove to be extremely helpful when a smartphone rankings demotion is in place. One thing is for sure. The changes rolling out on (or around) April 21 will be fascinating to analyze. Google said this change will have a “significant impact” on the smartphone search results. And that impact can translate into many lost visitors, conversions, and revenue. Good luck.

GG

 

XML Sitemaps – 8 Facts, Tips, and Recommendations for the Advanced SEO

XML Sitemaps for Advanced SEOs

After publishing my last post about dangerous rel canonical problems, I started receiving a lot of questions about other areas of technical SEO. One topic in particular that seemed to generate many questions was how to best use and set up xml sitemaps for larger and more complex websites.

Sure, in its most basic form, webmasters can provide a list of urls that they want the search engines to crawl and index. Sounds easy, right? Well, for larger and more complex sites, the situation is often not so easy. And if the xml sitemap situation spirals out of control, you can end up feeding Google and Bing thousands, hundreds of thousands, or millions of bad urls. And that’s never a good thing.

While helping clients, it’s not uncommon for me to audit a site and surface serious errors with regard to xml sitemaps. And when that’s the case, websites can send Google and Bing mixed signals, urls might not get indexed properly, and both engines can end up losing trust in your sitemaps. And as Bing’s Duane Forrester once said in this interview with Eric Enge:

“Your Sitemaps need to be clean. We have a 1% allowance for dirt in a Sitemap. If we see more than a 1% level of dirt, we begin losing trust in the Sitemap.”

Clearly that’s not what you want happening…

So, based on the technical SEO work I perform for clients, including conducting many audits, I decided to list some important facts, tips, and answers for those looking to maximize their xml sitemaps. My hope is that you can learn something new from the bullets listed below, and implement changes quickly.

 

1. Use RSS/Atom and XML For Maximum Coverage
This past fall, Google published a post on the webmaster central blog about best practices for xml sitemaps. In that post, they explained that sites should use a combination of xml sitemaps and RSS/Atom feeds for maximum coverage.

XML sitemaps should contain all of the canonical urls on your site, while RSS/Atom feeds should contain the latest additions or recently updated urls. XML sitemaps will contain many urls, whereas RSS/Atom feeds will only contain a limited set of new or recently changed urls.

RSS/Atom Feed and XML Sitemaps

So, if you have new urls (or recently updated urls) that you want Google to prioritize, then use both xml sitemaps and RSS/Atom feeds. Google says by using RSS, it can help them “keep your content fresher in its index”. I don’t know about you, but I like the idea of Google keeping my content fresher. :)

Also, it's worth noting that Google recommends maximizing the number of urls per xml sitemap. For example, don't cut up your xml sitemaps into many smaller files (if possible). Instead, use the space you have in each sitemap to include all of your urls. If you don't, Google explains, "it can impact the speed and efficiency of crawling your urls." I recommend reading Google's post to learn how to best use xml sitemaps and RSS/Atom feeds to maximize your efforts. By the way, you can include up to 50K urls per sitemap, and each sitemap must be less than 10MB uncompressed.
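
To make the 50K limit concrete, here's a minimal sketch in Python of how you might fill each sitemap file to capacity instead of splitting your urls into lots of small files (the function and file names are just for illustration, and it leaves the RSS/Atom side out entirely):

```python
import math
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50000  # each file must also stay under 10MB uncompressed

def write_sitemaps(urls, filename_prefix="sitemap"):
    """Write canonical urls into as few sitemap files as possible."""
    num_files = max(1, math.ceil(len(urls) / MAX_URLS_PER_SITEMAP))
    for i in range(num_files):
        chunk = urls[i * MAX_URLS_PER_SITEMAP:(i + 1) * MAX_URLS_PER_SITEMAP]
        entries = "\n".join(
            "  <url><loc>{}</loc></url>".format(escape(u)) for u in chunk
        )
        xml = ('<?xml version="1.0" encoding="UTF-8"?>\n'
               '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
               '{}\n'
               '</urlset>\n').format(entries)
        with open("{}-{}.xml".format(filename_prefix, i + 1), "w") as f:
            f.write(xml)

write_sitemaps(["http://www.example.com/", "http://www.example.com/about/"])
```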

 

2. XML Sitemaps By Protocol and Subdomain
I find a lot of webmasters are confused by protocol and subdomains, and both can end up impacting how urls in sitemaps get crawled and indexed.

URLs included in xml sitemaps must use the same protocol and subdomain as the sitemap itself. This means that https urls should not be included in a sitemap that's served over http. It also means that urls on sample.domain.com should not be located in the sitemap on www.domain.com. So on and so forth.

XML Sitemaps and Protocol and Subdomains

 

This is a common problem when sites employ multiple subdomains or have sections using both https and http (like ecommerce retailers). And then of course there are the many sites switching to https for all urls that haven't updated their xml sitemaps to reflect the change. My recommendation is to check your xml sitemaps reporting today, while also manually checking the sitemaps. You might just find issues that you can fix quickly.
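
If you want to script that manual check, here's a minimal sketch in Python that flags any url in a sitemap whose protocol or subdomain doesn't match the sitemap itself (the sitemap url is a placeholder):

```python
import requests
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_URL = "http://www.example.com/sitemap.xml"  # placeholder
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

sitemap_parts = urlparse(SITEMAP_URL)
root = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).content)

for loc in root.findall("sm:url/sm:loc", NS):
    url_parts = urlparse(loc.text.strip())
    # Flag urls whose scheme (http/https) or host (subdomain) doesn't match
    if (url_parts.scheme, url_parts.netloc) != (sitemap_parts.scheme, sitemap_parts.netloc):
        print("Mismatch:", loc.text.strip())
```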

 

3. Dirty Sitemaps – Hate Them, Avoid Them
When auditing sites, I often crawl the xml sitemaps myself to see what I find. And it’s not uncommon to find many urls that resolve with non-200 header response codes. For example, urls that 404, 302, 301, return 500s, etc.

Dirty XML Sitemaps

You should only provide canonical urls in your xml sitemaps. You should not provide non-200 header response code urls (or non-canonical urls that point to other urls). The engines do not like “dirty sitemaps” since they can send Google and Bing on a wild goose chase throughout your site. For example, imagine driving Google and Bing to 50K urls that end up 404ing, redirecting, or not resolving. Not good, to say the least.

Remember Duane’s comment from earlier about “dirt” in sitemaps. The engines can lose trust in your sitemaps, which is never a good thing SEO-wise. More about crawling your sitemaps later in this post.

 

4. View Trending in Google Webmaster Tools
Many SEOs are familiar with xml sitemaps reporting in Google Webmaster Tools, which can help surface various problems, while also providing important indexation statistics. Well there’s a hidden visual gem in the report that’s easy to miss. The default view will show the number of pages submitted in your xml sitemaps and the number indexed. But if you click the “sitemaps content” box for each category, you can view trending over the past 30 days. This can help you identify bumps in the road, or surges, as you make changes.

For example, check out the trending below. You can see the number of images submitted and indexed drop significantly over a period of time, only to climb back up. You would definitely want to know why that happened, so you can avoid problems down the line. Sending this to your dev team can help them identify potential problems that can build over time.

XML Sitemaps Trending in Google Webmaster Tools

 

5. Using Rel Alternate in Sitemaps for Mobile URLs
When using mobile urls (like m.), it’s incredibly important to ensure you have the proper technical SEO setup. For example, you should be using rel alternate on the desktop pages pointing to the mobile pages, and then rel canonical on the mobile pages pointing back to the desktop pages.

Although not an approach I often push for, you can provide rel alternate annotations in your xml sitemaps. The annotations look like this:

Rel Alternate in XML Sitemaps
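
For reference, here's roughly what that annotation looks like for a desktop page with an m. equivalent. It's a minimal sketch based on Google's documented format (the urls are placeholders), with a quick Python check that the markup is well formed:

```python
import xml.etree.ElementTree as ET

# Desktop url in <loc>, mobile equivalent in xhtml:link (placeholder urls)
SITEMAP_SNIPPET = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>http://www.example.com/page-1/</loc>
    <xhtml:link rel="alternate"
                media="only screen and (max-width: 640px)"
                href="http://m.example.com/page-1/" />
  </url>
</urlset>
"""

ET.fromstring(SITEMAP_SNIPPET)  # raises ParseError if the markup is malformed
```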

 

It’s worth noting that you should still add rel canonical to the source code of your mobile pages pointing to your desktop pages.

 

6. Using hreflang in Sitemaps for Multi-Language Pages
If you have pages that target different languages, then you are probably already familiar with hreflang. Using hreflang, you can tell Google which pages should target which languages. Then Google can surface the correct pages in the SERPs based on the language/country of the person searching Google.

Similar to rel alternate, you can either provide the hreflang code in a page’s html code (page by page), or you can use xml sitemaps to provide the hreflang code. For example, you could provide the following hreflang attributes when you have the same content targeting different languages:

Hreflang in XML Sitemaps

Just be sure to include a separate <loc> element for each url that contains alternative language content (i.e. all of the sister urls should be listed in the sitemap via a <loc> element).
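
Here's a minimal sketch of what those annotations can look like for a page that exists in English and German. It follows Google's documented sitemap format, but the urls are placeholders; note how each url entry repeats the full set of alternates, including itself:

```python
import xml.etree.ElementTree as ET

# hreflang annotations for an English/German pair (placeholder urls)
SITEMAP_SNIPPET = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>http://www.example.com/english/</loc>
    <xhtml:link rel="alternate" hreflang="en" href="http://www.example.com/english/" />
    <xhtml:link rel="alternate" hreflang="de" href="http://www.example.com/deutsch/" />
  </url>
  <url>
    <loc>http://www.example.com/deutsch/</loc>
    <xhtml:link rel="alternate" hreflang="en" href="http://www.example.com/english/" />
    <xhtml:link rel="alternate" hreflang="de" href="http://www.example.com/deutsch/" />
  </url>
</urlset>
"""

ET.fromstring(SITEMAP_SNIPPET)  # raises ParseError if the markup is malformed
```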

 

7. Testing XML Sitemaps in Google Webmaster Tools
Last, but not least, you can test your xml sitemaps or other feeds in Google Webmaster Tools. Although easy to miss, there is a red “Add/Test Sitemap” button in the upper right-hand corner of the Sitemaps reporting page in Google Webmaster Tools.

Test XML Sitemaps in Google Webmaster Tools

When you click that button, you can add the url of your sitemap or feed. Once you click “Test Sitemap”, Google will provide results based on analyzing the sitemap/feed. Then you can rectify those issues before submitting the sitemap. I think too many webmasters use a “set it and forget it” approach to xml sitemaps. Using the test functionality in GWT, you can nip some problems in the bud. And it’s simple to use.

Results of XML Sitemaps Test in Google Webmaster Tools

 

8. Bonus: Crawl Your XML Sitemap Via Screaming Frog
In SEO, you can either test and know, or read and believe. As you can probably guess, I’m a big fan of the former… For xml sitemaps, you should test them thoroughly to ensure all is ok. One way to do this is to crawl your own sitemaps. By doing so, you can identify problematic tags, non-200 header response codes, and other little gremlins that can cause sitemap issues.

One of my favorite tools for crawling sitemaps is Screaming Frog (which I have mentioned many times in my previous posts). By setting the crawl mode to “list mode”, you can crawl your sitemaps directly. Screaming Frog natively handles xml sitemaps, meaning you don’t need to convert your xml sitemaps into another format before crawling (which is awesome).

Crawling Sitemaps in Screaming Frog

Screaming Frog will then load your sitemap and begin crawling the urls it contains. In real-time, you can view the results of the crawl. And if you have Graph View up and running during the crawl, you can visually graph the results as the crawler collects data. I love that feature. Then it’s up to you to rectify any problems that are surfaced.

Graph View in Screaming Frog
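
And if you'd rather script the check yourself (or double check a crawl), here's a minimal sketch in Python that pulls the urls out of a sitemap and records each header response code (the sitemap url is a placeholder):

```python
import requests
import xml.etree.ElementTree as ET

SITEMAP_URL = "http://www.example.com/sitemap.xml"  # placeholder
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).content)
urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]

dirty = []
for url in urls:
    # allow_redirects=False so 301s/302s are reported as such, not followed
    status = requests.head(url, allow_redirects=False, timeout=10).status_code
    if status != 200:
        dirty.append((status, url))

print("{} of {} urls did not return a 200:".format(len(dirty), len(urls)))
for status, url in dirty:
    print(status, url)
```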

 

Summary – Maximize and Optimize Your XML Sitemaps
As I’ve covered throughout this post, there are many ways to use xml sitemaps to maximize your SEO efforts. Clean xml sitemaps can help you inform the engines about all of the urls on your site, including the most recent additions and updates. It’s a direct feed to the engines, so it’s important to get it right (and especially for larger and more complex websites).

I hope my post provided some helpful nuggets of sitemap information that enable you to enhance your own efforts. I recommend setting some time aside soon to review, crawl, audit, and then refine your xml sitemaps. There may be some low-hanging fruit changes that can yield nice wins. Now excuse me while I review the latest sitemap crawl. :)

GG

 

How To Get More Links, Crawl Errors, Search Queries, and More By Verifying Directories in Google Webmaster Tools

Verify by Directory in Google Webmaster Tools

In my opinion, it's critically important to verify your website in Google Webmaster Tools (GWT). By doing so, you can receive information directly from Google as it crawls and indexes your website. There are many reports in GWT that can help identify various problems SEO-wise. For example, you can check the crawl errors report to surface problems Googlebot is encountering while crawling your site. You can check the HTML improvements section to view problems with titles, descriptions, and other metadata. You can view your inbound links as picked up by Google (more on that soon). You can check xml sitemaps reporting to view warnings, errors, and the indexed to submitted ratio. And you can view indexation by directory via Index Status (forget about a site: command; Index Status shows your true indexation number).

In addition to the reporting you receive in GWT, Google will communicate with webmasters via “Site Messages”. Google will send messages when it experiences problems crawling a website, when it picks up errors or other issues, and of course, if you’ve received a manual action (penalty). That’s right, Google will tell you when your site has been penalized. It’s just another important reason to verify your website in GWT.

Limit On Inbound Links for Sites With Large Profiles
And let’s not forget about links. Using Google Webmaster Tools, you can view and download the inbound links leading to your site (as picked up by Google). And in a world filled with Penguins, manual actions, and potential negative SEO, it’s extremely important to view your inbound links, and often. Sure, there’s a limit of ~100K links that you can download from GWT, which can be limiting for larger and more popular sites, but I’ll cover an important workaround soon. And that workaround doesn’t just apply to links. It applies to a number of other reports too.

When helping larger websites with SEO, it’s not long before you run into the dreaded limit problem with Google Webmaster Tools. The most obvious limit is with inbound links. Unfortunately, there’s a limit of ~100K links that you can download from GWT. For most sites, that’s not a problem. But for larger sites, that can be extremely limiting. For example, I’m helping one site now with 9M inbound links. Trying to hunt down link problems at the site-level is nearly impossible via GWT with a link profile that large.

Inbound Links in Google Webmaster Tools

 

When you run into this problem, third party tools can come in very handy, like Majestic SEO, ahrefs, and Open Site Explorer. And you should also download your links from Bing Webmaster Tools, which is another great resource SEO-wise. But when you are dealing with a Google problem, it’s optimal to have link data directly from Google itself.

So, how do you overcome the link limit problem in GWT? Well, there’s a workaround that I’m finding many webmasters either don’t know about or haven’t implemented yet – verification by directory.

Verification by Directory to the Rescue
If you’ve been following along, then you can probably see some issues with GWT for larger, complex sites. On the one hand, you can get some incredible data directly from Google. But on the other hand, larger sites inherently have many directories, pages, and links to deal with, which can make your job analyzing that data harder to complete.

This is why I often recommend verifying by directory for clients with larger and more complex websites. It’s a great way to dig deep into specific areas of a website. As mentioned earlier, I’ve found that many business owners don’t even know you can verify by directory!  Yes, you can, and I recommend doing that today (even if you have a smaller site, but have distinct directories of content you monitor). For example, if you have a blog, you can verify the blog subdirectory in addition to your entire site. Then you can view reporting that’s focused on the blog (versus muddying up the reporting with data from outside the blog).

Add A Directory in Google Webmaster Tools

And again, if you are dealing with an inbound links problem, then isolating specific directories is a fantastic way to proceed to get granular links data. There’s a good chance the granular reporting by directory could surface new unnatural links that you didn’t find via the site-level reporting in GWT. The good news is that verifying your directories will only take a few minutes. Then you’ll just need to wait for the reporting to populate.

Which Reports Are Available For Directories?
I’m sure you are wondering which reports can be viewed by subdirectory. Well, many are available by directory, but not all. Below, you can view the reports in GWT that provide granular data by directory.

  • Search Queries
  • Top Pages (within Search Queries reporting)
  • Links to Your Site
  • Index Status
  • Crawl Errors (by device type)
  • HTML Improvements
  • Internal Links
  • International Targeting (New!)
  • Content Keywords
  • Structured Data

 

GWT Reporting by Directory – Some Examples

Indexation by Directory
Let’s say you’re having a problem with indexation. Maybe Google has only indexed 60% of your total pages for some reason. Checking the Index Status report is great, but doesn’t give you the information you need to isolate the problem.  For example, you want to try and hunt down the specific areas of the site that aren’t indexed as heavily as others.

If you verify your subdirectories in GWT, then you can quickly check the Index Status report to view indexation by directory. Based on what you find, you might dig deeper to see what’s going on in specific areas of your website. For example, running crawls of that subdirectory via several tools could help uncover potential problems. Are there roadblocks you are throwing up for Googlebot, are you mistakenly using the meta robots tag in that directory, is the directory blocked by robots.txt, is your internal linking weaker in that area, etc? Viewing indexation by directory is a logical first step to diagnosing a problem.

How To View Index Status by Directory in Google Webmaster Tools

 

Search Queries by Directory
Google Webmaster Tools provides the search queries (keywords) that returned pages from your website in the search results (over the past 90 days). Now that we live in a "not provided" world, the search queries reporting is important to analyze and export on a regular basis. You can view impressions, clicks, CTR, and average position for each query in the report.

But checking search queries at the site level can be a daunting task in Google Webmaster Tools. What if you wanted to view the search query data for a specific section instead? If you verify by directory, then all of the search query data will be limited to that directory. That includes impressions, clicks, CTR, and average position for queries leading to content in that directory only.

In addition, the “Top Pages” report will only contain the top pages from that directory. Again, this quickly enables you to hone in on content that’s receiving the most impressions and clicks.

And if you feel like there has been a drop in performance for a specific directory, then you can click the "with change" button to view the change in impressions, clicks, CTR, and average position for the directory. Again, the more granular you can get, the better your chances of diagnosing problems.

How To View Search Query Reporting by Directory in Google Webmaster Tools

 

Links by Directory
I started explaining more about this earlier, and it’s an extremely important example. When you have a manual action for unnatural links, you definitely want to see what Google is seeing. For sites with large link profiles, GWT is not ideal. You can only download ~100K links, and those can be watered down by specific pieces of content or sections (leaving other important sections out in the cold).

When you verify by directory, the “links to your site” section will be focused on that specific directory. And that’s huge for sites trying to get a better feel for their link profile, unnatural links, etc. You can see domains linking to your content in a specific directory, your most linked content, and of course, the actual links. And you can download the top ~100K links directly from the report.

In addition, if you are trying to get a good feel for your latest links (like if you’re worried about negative SEO), then you can download the most recent links picked up by Google by clicking the “Download latest links” button.  That report will be focused on the directory at hand, versus a site-level download.

I’m not saying this is perfect, because some directories will have many more links than 100K. But it’s much stronger than simply downloading 100K links at the site-level.

How To View Inbound Links by Directory in Google Webmaster Tools

 

Crawl Errors By Directory
If you are trying to analyze the health of your website, then the Crawl Errors reporting is extremely helpful to review. But again, this can be daunting with larger websites (as all pages are reported at the site-level). But if you verify by directory, the crawl errors reporting will be focused on a specific directory. And that can help you identify problems quickly and efficiently.

In addition, you can view crawl errors reporting by Google crawler. For example, Googlebot versus Googlebot for Smartphones versus Googlebot-mobile for Feature Phones. By drilling into crawl errors by directory, you can start to surface problems at a granular level. This includes 404s, 500s, Soft 404s, and more.

How To View Crawl Errors by Directory in Google Webmaster Tools

Summary – Get Granular To View More Google Webmaster Tools Data
Verifying your website in Google Webmaster Tools is extremely important on several levels (as documented above). But verifying by directory is also important, as it enables you to analyze specific parts of a website on a granular basis. I hope this post convinced you to set up your core directories in GWT today.

To me, it’s critically important to hunt down SEO problems as quickly as possible. The speed at which you can identify, and then rectify, those problems can directly impact your overall SEO health (and traffic to your site). In addition, analyzing granular reporting can help surface potential problems in a much cleaner way than viewing site-wide data. And that’s why verifying subdirectories is a powerful way to proceed (especially for large and complex sites).  So don’t hesitate. Go and verify your directories in Google Webmaster Tools now. More data awaits.

GG

 

 

Rap Genius Recovery: Analyzing The Keyword Gains and Losses After The Google Penalty Was Lifted

Rap Genius Recovers From Google Penalty

On Christmas Day, Rap Genius was given a heck of a gift from Google.  A penalty that sent their rankings plummeting faster than an anvil off the Eiffel tower.  The loss in traffic has been documented heavily as many keywords dropped from page one to page five and beyond.  And many of those keywords used to rank in positions #1 through #3 (or prime real estate SEO-wise).  Once the penalty was in place, what followed was a huge decrease in visits from Google organic, since most people don’t even venture to page two and beyond.  It’s like Siberia for SEO.

Gaming Links
So what happened that Google had to tear itself away from eggnog and a warm fire to penalize a lyrics website on Christmas Day? Rap Genius was gaming links, and badly. No, not just badly, but with such disregard for the consequences that they were almost daring Google to take action. That is, until Matt Cutts learned of the matter and took swift action on Rap Genius.

That was Christmas Day. Ho, ho, ho. You get coal in your lyrical stocking. I won't go nuts here explaining the ins and outs of what they were doing. That's been documented heavily across the web. In a nutshell, they were exchanging tweets for links. If bloggers added a list of rich anchor text links to their posts, then Rap Genius would tweet links to their content. The bloggers got a boatload of traffic and Rap Genius got links (and a lot of them using rich anchor text like {artist} + {song} + lyrics). Here's a quick screenshot of one page breaking the rules:

Rap Genius Unnatural Links

A 10 Day Penalty – LOL
Now, I help a lot of companies with algorithmic hits and manual actions.  Many of the companies contacting me for help broke the rules and are seeking help in identifying and then rectifying their SEO problems.  Depending on the situation, recovery can take months of hard work (or longer).  From an unnatural links standpoint, you need to analyze the site’s link profile, flag unnatural links, remove as many as you can manually, and then disavow the rest.  If you only have 500 links leading to your site, this can happen relatively quickly.  If you have 5 million, it can be a much larger and nastier project.

Rap Genius has 1.5 million links showing in Majestic’s Fresh Index.  And as you start to drill into the anchor text leading to the site, there are many questionable links.  You can reference their own post about the recovery to see examples of what I’m referring to.  Needless to say, they had a lot of work to do in order to recover.

So, you would think that it would take some time to track down, remove, and then disavow the unnatural links that caused them so much grief.  And then they would need to craft a serious reconsideration request documenting how they broke the rules, how they fixed the problem, and of course, offer a sincere apology for what they did (with a guarantee they will never do it again).   Then Google would need to go through the recon request, check all of the removals and hard work, and then decide whether the manual action should be lifted, or if Rap Genius had more work to do.  This should take at least a few weeks, right?  Wrong.  How about 10 days.

Rap Genius Recovers After 10 Days

Only 10 days after receiving a manual action, Rap Genius is back in Google. As you can guess, the SEO community was not exactly thrilled with the news. Screams of special treatment rang through the twitterverse, as Rap Genius explained that Google helped them, to some degree, understand how best to tackle the situation (and what to target). Believe me, that's rare. Really rare…

Process for Removing and Disavowing Links
Rap Genius wrote a post about the recovery on January 4th, which included the detailed process for identifying and then dealing with unnatural links.  They had thousands of links to deal with, beginning with a master list of 178K.  From that master list, they started to drill into specific domains to identify unnatural links.   Once they did, Rap Genius removed what they could and disavowed the rest using Google’s Disavow Tool.   Following their work, Google removed the manual action on January 4th and Rap Genius was back in Google.

But many SEOs wondered how much they came back, especially since Rap Genius had to nuke thousands of links.  And many of those links were to deeper pages with rich anchor text.  Well, I’ve been tracking the situation from the start, checking which keywords dropped during the penalty, and now tracking which ones returned to high rankings after the penalty was lifted.  I’ll quickly explain the process I used for tracking rankings and then provide my findings.

My Process for Analyzing Rankings (With Some Nuances)
When the penalty was first applied to Rap Genius, I quickly checked SEMRush to view the organic search trending and to identify keywords that were “lost” and ones that “declined”.  Rap Genius ranks for hundreds of thousands of keywords according to SEMRush and its organic search reporting identified a 70K+ keyword loss based on the penalty.

Note, you can't compare third party tools to a website's own analytics reporting, and SEMRush won't cover every keyword leading to the site. But for larger sites with a lot of volume, SEMRush is a fantastic tool for viewing the gains and losses for a specific domain. I've found it to be extremely thorough and accurate.

The lost and declined keywords that SEMRush reported lined up with my manual checks. Those keywords definitely took a plunge, with Rap Genius appearing on page five or beyond. And as I mentioned earlier, that's basically Siberia for organic search.

When the penalty was lifted, I used the same process for checking keywords, but this time I checked the "new" and "improved" categories. The reporting showed 43K+ keywords in the "new" category, which means those keywords did not rank the last time SEMRush checked those queries.

I also used Advanced Web Ranking to check 500 of the top keywords that were ranking prior to the penalty (and that dropped after the manual action was applied).  The keywords I checked were all ranking in the top ten prior to the penalty.  Once the penalty was lifted, I ran the rankings for those keywords.  I wanted to see how much of an improvement there was for the top 500 keywords.

Then I dug into the data based on both SEMRush and Advanced Web Ranking to see what I could find.  I have provided my findings below.   And yes, this is a fluid situation, so rankings could change.  But we have at least a few days of data now.  Without further ado, here’s what I found.

 

Branded Keywords
This was easy. Branded keywords that were obliterated during the penalty returned quickly with strong rankings.  This was completely expected.  For example, if you search for rap genius, rapgenius, or any variation, the site now ranks at the top of the search results.  And the domain name ranks with sitelinks. No surprises here.

Rap Genius Branded Keywords

Category Keywords
For category keywords, like “rap lyrics”, “favorite song lyrics”, and “popular song lyrics”, I saw mixed results after the recovery.  For example, the site now ranks #1 for “rap lyrics”, which makes sense, but does not rank well for “favorite song lyrics” and “popular song lyrics”.  And it ranked well for each of those prior to the penalty.  Although specific song lyric queries are a driving force for rap genius (covered soon), category keywords can drive a lot of volume.  It’s clear that the site didn’t recover for a number of key category keywords.

Rap Genius Category Keywords

 

Artist Keywords
I noticed that the site ranked for a lot of artists prior to the penalty (just the artist name with no modifiers).  For example, “kirko bangz”, “lil b”, etc.  Similar to what I saw with category keywords, I saw mixed results with artists.  Searching for the two artists I listed above does not yield high rankings anymore, when they both ranked on page one prior to the penalty.  Some increased in rankings, but not to page one.  For example, “2 chainz” ranks #12 after the penalty was lifted.  But it was MIA when the penalty was in effect.  Another example is “Kendrick Lamar”, which Rap Genius ranked #8 for prior to the penalty.  The site is not ranking well at all for that query now.  So again, it seems that Rap Genius recovered for some artist queries, but not all.

Rap Genius Artist Keywords

Lyrics Keywords
Based on my research, I could clearly see the power of {song} + lyrics queries for Rap Genius. It's a driving force for the site. And Rap Genius is now ranking again for many of those queries. When the penalty was first lifted, I started checking a number of those queries and saw Rap Genius back on page one, and sometimes #1. But when I started checking at scale, you could definitely see that not all keywords returned to high rankings.

Rap Genius High Rankings for Lyrics Keywords

For example, "hallelujah lyrics", "little things lyrics", and "roller coaster lyrics" are still off of page one. Then there are keywords that skyrocketed back up the charts, I mean search rankings. For example, "swimming pool lyrics", "marvins room lyrics", and "not afraid lyrics" all bounced back after the penalty was lifted, after being buried. So, it seems that many song lyrics keywords returned, but some still rank on page two and beyond.

Rap Genius Low Rankings for Lyrics Keywords

What About Keywords That Were Gamed?
I'm sure some of you are wondering how Rap Genius fared for keywords that were gamed via unnatural links. For example, "22 two's lyrics" yields extremely strong rankings for Rap Genius, even though it was one of the songs gamed via the link scheme. Actually, Rap Genius ranks twice in the top 5. Go figure.

Rap Genius Rankings for Gamed Links - Jay Z

Ditto for “timbaland know bout me”, which was also one of the songs that made its way into the spammy list of links at the end of articles and posts.  Rap Genius ranks #3 right now.

Rap Genius Rankings for Gamed Links - Timbaland

And then there’s Justin Bieber, which I can’t cover with just one sentence.  Rap Genius currently ranks on page 3 for “Justin Bieber song lyrics”, when it used to rank #8!  And then “Justin Bieber baby lyrics” now ranks #12 on page 2, when it used to rank #8.  But for “Justin Bieber lyrics”, Rap Genius is #10, on page one.

Rap Genius Rankings for Justin Bieber Lyrics

Overall, I saw close to 100 Justin Bieber keywords pop back into the top few pages of Google after the penalty was lifted.  But, many were not on page one anymore… I saw many of those keywords yield rankings on page two or beyond for Rap Genius.  See the screenshot below:

Rap Genius Keywords for Justin Bieber

 

Summary – Rap Genius Recovers, But The Scars Remain
So there you have it.  A rundown of where Rap Genius is after the penalty was lifted.  Again, I can’t see every keyword that was lost or gained during the Christmas Day fiasco, but I could see enough of the data.  It seems that Rap Genius came back strong, but not full-blast.  I saw many keywords return, but still a number that remain buried in Google.

But let's face it, a 10 day penalty is a slap on the wrist for Rap Genius. They now have a clean(er) platform back, and can build on that platform. That's a lot better than struggling for months (or longer) with horrible rankings. As I explained earlier, too many business owners aren't as lucky as Rap Genius. 10 days and help from Google can speed up the recovery process. That's for sure.

I’ll end with one more screenshot to reinforce the fact that Rap Genius is back.  And it’s a fitting query. :)

Rap Genius I'm Sorry

GG