Sinister 404s – The Hidden SEO Danger of Returning The Wrong Header Response Code [Case Study]

Hidden SEO Danger 404 Response Code

A few weeks ago, I was contacted by a small business owner about my SEO services. And what started out as a simple check of a website turned into an interesting case study about hidden SEO dangers. The company has been in business for a long time (30+ years), and the owner was looking to boost the site’s SEO performance over the long term. From the email and voicemail I received, it sounded like they were struggling to rank well across important target queries and wanted to address that ASAP. I also knew they were running AdWords to provide air cover for SEO (which is smart, but definitely not a long-term plan for their business).

Unfortunately, my schedule has been crazy and I knew I couldn’t take them on as a longer-term client. But I still wanted to quickly check out their website to get a better feel for what was going on. And it took me about three minutes to notice a massive problem (one that is killing their efforts to rank for many queries). And that’s a shame, because they probably should rank for those keywords based on their history, services, content, etc.

Surfacing a Giant SEO Problem
As I browsed the site, I noticed they had a good amount of content for a small business. The site had a professional design, it was relatively clean from a layout perspective, and provided strong content about their business, their history, news about the organization, the services they provided, and more.

But then it hit me. Actually, it was staring me right in the face. I noticed a small 404 icon when hitting one of their service pages (via the Redirect Path Chrome extension). OK, so that’s odd… The page renders fine, the content and design show up perfectly, but the page 404s (returning a Page Not Found error). It’s like the opposite of a soft 404, where the page looks like a 404 but actually returns a 200 code. In this situation, the page looks like a 200 but returns a 404 instead. I guess you could call it a “soft 200.”

404 Header Response Code in Redirect Path Chrome Extension

So I started to visit other pages on the site and more 404 header response codes followed. Actually, almost every single page on the site was throwing a 404 header response code. Holy cow, the initial 404 was just the tip of the iceberg.

After seeing 404s pop up all over the site, I quickly decided to crawl the website via Screaming Frog. I wanted to see how widespread the problem was. And it turns out my initial assessment was spot on. Almost every page on the site returned a 404 header response code. The only pages that didn’t were the homepage and some PDFs. But every other page, including the services pages, news pages, about page, contact page, etc., returned a 404.

Header Response Codes in Screaming Frog

For those of you familiar with SEO, then you know how this problem can impact a website. But for those of you unfamiliar with 404s and how they impact SEO, I’ll provide a quick rundown next. Then I’ll jump back to the story.

What is a 404 Header Response Code?
Every time a webpage is requested, the server returns a header response code. There are many that can be returned, but there are some standard codes you’ll come across. For example, 200 means the page returned OK, 301 is a permanent redirect, 302 is a temporary redirect, 500 is an internal server error, 403 is forbidden, and 404 means page not found.

Header response codes are extremely important to understand for SEO. If you want a webpage indexed, then you definitely want it to return a 200 response code (which again, means OK, the request has succeeded). But if the page returns a 404, then that tells the engines that the page was not found and that it should be removed from the index. Yes, read that last line again. 404s basically inform Google and Bing that the page is gone and that it can be removed from each respective index. That means it will have no shot of ranking for target keywords.

And from an inbound links perspective, 404s are a killer. If a page 404s, then it cannot benefit from any inbound links pointing at the url. And the domain itself cannot benefit either (at an aggregate level). So 404s will get urls removed from Google’s index and can hamper your link equity (at the url level and at the domain level). Not good, to say the least.

Side Note: Checking Response Codes
Based on what I’ve explained, some of you reading this post might be wondering how to easily check your header response codes. And you definitely should. I won’t cover the process in detail in this post, but I will point you in the right direction. There are several tools to choose from and I’ll include a few below.

You can use Fetch as Google in Google Webmaster Tools to check the response sent to Googlebot (which includes the header response code). You can also use a browser plugin like Web Developer Tools or Redirect Path to quickly check header response codes on a url by url basis.

Web Developer Plugin Header Response Code
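And if you’re comfortable with a quick script, you can also check response codes programmatically. Here’s a minimal sketch using Python’s requests library (the urls are placeholders, and allow_redirects=False reports the code each url itself returns rather than following redirects):

```python
import requests

# Placeholder urls - swap in the pages you want to check.
urls = [
    "http://www.example.com/",
    "http://www.example.com/services/",
    "http://www.example.com/about/",
]

for url in urls:
    # HEAD keeps the request lightweight; allow_redirects=False reports
    # the code the url itself returns instead of following 301s/302s.
    response = requests.head(url, allow_redirects=False, timeout=10)
    print(response.status_code, url)
```

A 200 is what you want to see for indexable pages; anything else (301, 302, 404, 500) deserves a closer look. Note that some servers mishandle HEAD requests, so if the results look odd, swap in requests.get.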

Fetch as Google and browser plugins are great, but they only let you process one url at a time (and hand-rolled scripts only scale so far). But what if you wanted to check your entire site in one shot? For situations like that, you could use a tool that crawls an entire website (or sections of a site). For example, you could use Xenu or Screaming Frog for small to medium-sized sites and then a tool like Deep Crawl for larger-scale sites. All three will return a boatload of information about your pages, including the header response codes. Now back to the case study.

Dangerous, But Invisible to the Naked Eye
Remember, the entire site was returning 404 header response codes, other than the homepage and a few PDFs. But this 404 situation was sinister since the webpages looked like they resolved OK. You didn’t see a standard 404 page; instead, you saw the actual page and content. But the pages were actually 404ing and not being indexed. Like I said, a sinister problem.

Based on what I just explained, you can see why an SMB owner would be baffled and simply not understand why their website wasn’t ranking well. They could see their site, their content, and the various pages resolving, but they couldn’t see the underlying problem. Header response codes are hidden to the naked eye, and most people don’t even realize they are being returned at all. But the response code returned is critically important for how the search engines process your webpages.

Swingers Find Hidden 404s

My Response – “You’re At SEO Defcon 2”
This was a tough situation for me. I absolutely wanted to help the business longer-term, but couldn’t based on my schedule. But I did want to make sure they understood the problem I came across while quickly checking out their website.

So I crafted a quick email explaining that I couldn’t help them at this time, but that I found a big problem on their site. As quickly and concisely as I could, I explained the 404 situation, provided a few screenshots, and explained they should get in touch with their designer, developer, or hosting provider to rectify the situation ASAP. That means ensuring their webpages return the proper header response codes. Basically, I told them that if their webpages should be indexed, then they should return a 200 header response code and not the 404s being returned now.

I hit “Send” and the ball was in their court.

Their Response – “We hear you and we’re on the right track – we think.”
I heard back from the business owner who explained they started working with someone to rectify the problem. They clearly didn’t know this was going on and they were hoping to have the situation fixed soon.

But as of today, the problem is still there. The site still returns 404 header response codes on almost every page. That’s unfortunate, since again, the pages returning a 404 have no chance at all of ranking in search and cannot help them from a link equity standpoint. The pages aren’t indexed and the site is basically telling Google and Bing to not index any of the core pages on the site.

I’m going to keep an eye on the situation to see when the changes take hold. And I hope that’s soon. It’s a great example of how hidden technical dangers can destroy SEO.

Opening Up The Site – How Will The Engines Respond?
My hope is that when the pages return the proper response codes that Google and Bing will begin indexing the pages and ranking them appropriately. And that will help on several levels. The website can drive more prospective customers via organic search, while the business can probably pull back on AdWords spend. And the site can grow its power from an inbound link standpoint as well, now that the pages are being indexed properly.

But as I often say about SEO, it’s all about the execution. If they don’t implement the necessary changes, then their situation will remain as-is. I’ll try and update this post if the situation improves.

Summary – Know Your Header Response Codes
Although hidden to the naked eye, header response codes are critically important for SEO. The right codes will enable the engines to properly crawl and index your webpages, while the wrong codes could lead to SEO disaster. I recommend checking your site today (via both manual checks and a crawl). You might find you’re in the clear with 200s, but you also might find some sinister 404s. So check now.

GG

 

How To Identify A Mobile Rankings Demotion Using The New Search Impact Report in Google Webmaster Tools

Search Impact Reporting in Google Webmaster Tools

April 21, 2015 is an important date. That’s the day when Google will begin using mobile friendliness as a ranking signal. There’s been a lot of talk about how that’s actually going to work, how much of an impact it will have, etc. Well, more and more information has been surfacing over the past few days about the changes.

For example, Gary Illyes spoke at SMX West heavily about the new mobile UX algo and provided some outstanding information. Jennifer Slegg wrote up a recap of that session, which I highly recommend reading. She provided some amazing nuggets of information, including information about mobile-friendly techniques, how the algo will handle specific urls, whether 4/21 is a hard date for the rollout, whether Google is building a mobile index (which they are), and more.

So, as 4/21 quickly approaches, many webmasters are working hard to get their sites in order from a mobile UX standpoint. As documented by John Mueller and Gary Illyes (and really Google itself), you can use any of three options for providing a mobile-friendly version of your website: responsive design, dynamic serving, or a separate mobile site. I’ve seen all three techniques work well for clients, so the path you choose should be based on your own site and business. But definitely move quickly… April 21 will be here before you know it.

 

The *Current* Smartphone Rankings Demotion – A Glimpse Into the Future
Many people don’t realize this, but Google already has a smartphone rankings demotion in place for specific situations. For example, when there are faulty redirects from the desktop version of the content to the mobile version, or if there are other mobile-only errors.

I caught one of those situations in the wild and wrote a two-part case study about it. I first detailed the problems I saw on Electronista.com and then documented the improvements in rankings and traffic once the problems were fixed. Based on what Gary Illyes and John Mueller have both said about the mobile UX algo, it sounds like the new algorithm will work in a very similar fashion to the current smartphone rankings demotion. Therefore, I definitely recommend you review the two-part case study.

Checking For Faulty Mobile Redirects

For example, the current smartphone rankings demotion works on a url by url basis. Just because you have faulty redirects or mobile-only errors does not mean the entire domain should suffer (algorithmically). In addition, the desktop urls are unaffected (which makes absolute sense). And, importantly, the algorithm runs in real-time and will impact urls during the normal crawling process.

That means urls can be demoted as Google comes across mobile problems, but the demotion can also be lifted as Google crawls the urls and notices that the problems are fixed. And that’s exactly what I saw with the smartphone rankings demotion situations I have helped with.

 

Checking Mobile Rankings and The (New) Search Impact Report
Google is currently testing a new search queries report in Google Webmaster Tools (called the Search Impact report). I have been testing the alpha version of the Search Impact reporting and it provides some great functionality beyond what the current Search Queries reporting provides. I plan to write more about that soon, but for now, let’s focus on the mobile friendliness algorithm rolling out on 4/21.

There are six dimensions you can segment your data by in the new Search Impact reporting. One of those dimensions is “Devices”. Using this report, you can filter data by desktop, mobile, and tablet. See below:

The Devices Dimension in The Search Impact Reporting

But don’t be fooled by the simplicity of the default report. By combining dimensions, you can view some elaborate reports that tell you a lot in a short amount of time.

When working on a smartphone rankings demotion (the current algo in place), I had to identify queries where a site ranked well in the desktop results, and then jump to the search queries reporting using the “mobile” filter for search property. When doing this for a large number of queries, it could easily get monotonous.

But the new Search Impact report comes to the rescue and provides a nifty way to see side-by-side rankings when comparing desktop to mobile. Below, I’m going to show you how to run this report to see a side-by-side comparison of clicks and average position by query. By doing so, you can quickly identify a smartphone rankings demotion. That works for the current smartphone rankings demotion, and it should work for the new mobile UX algo rolling out on 4/21/15. Let’s jump into the report.

 

How To Check Rankings By Device
First, if you’re not part of the alpha testing program, then you won’t be able to access the Search Impact report. But don’t fear: I can only imagine that Google wants to roll it out prior to 4/21/15 (based on the device reporting I’m showing you in this post).

To access the reporting, click “Search Traffic” and then “Search Impact” in the left-side menu:

Accessing The Search Impact Reporting in Google Webmaster Tools

The default view will show you clicks for the past 30 days. The first thing you need to do is click the “Queries” dimension. That will present all of the queries your site ranks for during the timeframe you selected.

Using The Queries Dimension In The Search Impact Reporting

Next, click the filter dropdown underneath “Devices”, which should say “No filter” (since there isn’t a filter in place yet). Click the dropdown and then select “Compare devices”.

Filtering By Device In The Search Impact Reporting

Keep “Desktop VS. Mobile” as the selection and then click “Compare”.

Comparing By Device In The Search Impact Reporting

You should now see a comparison of clicks per query for both desktop and mobile. That’s great, but we need to know how the site ranks for each query across both desktop and mobile. To see that, click the checkbox for the “Avg. Position” metric. This will add average position for each query to the report.

Adding The Average Position Metric In The Search Impact Reporting

To view more queries than the default ten, you can use the dropdown at the top of the report. For example, you can show up to 500 rows in the report in Google Webmaster Tools.

Now you can start checking rankings for queries across both desktop and mobile. Don’t expect them to be exactly the same for every query… but they should be close. For example, the first three listed below are very close (two are identical and one is off by just 0.1).

Comparing Average Position by Query In The Search Impact Reporting

In my experience, when you have a smartphone rankings demotion, there will be a clear difference. For example, some smartphone rankings will be 10+ positions lower (or even non-existent in certain situations). So, if you see rows like the following, then you might have a problem.

Identifying a Rankings Difference In The Search Impact Reporting
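If you’d rather check for that gap in bulk than eyeball each row, you could export the desktop and mobile query data and diff the positions with a short script. Here’s a minimal sketch (assuming two hypothetical CSV exports with query and avg_position columns):

```python
import csv

def load_positions(path):
    # Build {query: average position} from an export
    # (assumed columns: "query" and "avg_position").
    with open(path, newline="", encoding="utf-8") as f:
        return {row["query"]: float(row["avg_position"]) for row in csv.DictReader(f)}

desktop = load_positions("desktop_queries.csv")  # hypothetical export
mobile = load_positions("mobile_queries.csv")    # hypothetical export

# Flag queries where the mobile position is 10+ spots worse than desktop,
# or where the query doesn't rank on mobile at all.
for query, d_pos in sorted(desktop.items()):
    m_pos = mobile.get(query)
    if m_pos is None or m_pos - d_pos >= 10:
        print(f"{query}: desktop {d_pos}, mobile {m_pos if m_pos is not None else 'not ranking'}")
```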

 

How To Identify Problems and Lift The Smartphone Rankings Demotion
If you find that there is a smartphone rankings demotion in place, then you should run to the “Mobile Usability” reporting in Google Webmaster Tools. Google lists the problems it encountered while crawling your site. I highly recommend fixing those mobile usability issues ASAP.

Mobile Usability Reporting in Google Webmaster Tools

You can also use the mobile friendly test via the Google Developers site. That will also highlight problems on a url by url basis.
https://www.google.com/webmasters/tools/mobile-friendly/

Using Google's Mobile Friendly Test

You can also check the crawl errors reporting in Google Webmaster Tools to see if there are smartphone errors or faulty redirects.

Smartphone Crawl Errors in Google Webmaster Tools

And you can crawl your site as Googlebot for Smartphones to check how your site is handling requests for the desktop pages (if you have mobile redirects in place). Doing so can surface problems sitting below the surface that are sometimes hard to pick up manually.

Crawl As Googlebot for Smartphones
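You can also spot-check a desktop url manually by requesting it with a smartphone Googlebot user-agent and watching what the server does. A minimal sketch (the url is a placeholder, and the user-agent string approximates the one Google documented for Googlebot for Smartphones at the time):

```python
import requests

# Approximation of Google's documented smartphone Googlebot user-agent.
UA = ("Mozilla/5.0 (iPhone; CPU iPhone OS 8_3 like Mac OS X) AppleWebKit/600.1.4 "
      "(KHTML, like Gecko) Version/8.0 Mobile/12F70 Safari/600.1.4 "
      "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)")

url = "http://www.example.com/some-desktop-page/"  # placeholder url
r = requests.get(url, headers={"User-Agent": UA}, allow_redirects=False, timeout=10)

# A 200 means no mobile redirect fired; a 301/302 shows where it points.
print(r.status_code, r.headers.get("Location", "(no redirect)"))
```

A faulty redirect would show up here as a 301/302 pointing at the mobile homepage rather than the equivalent mobile page.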

 

Summary – The Search Impact Report Can Make An Impact
We all knew that mobile UX would become a ranking signal at some point, but now we have a specific date from Google for the rollout (4/21/15). When the new mobile algo launches, many will be wondering if they have been impacted, if their website dropped in rankings, and which urls are causing problems. As I demonstrated above, the new Search Impact reporting can help webmasters quickly and efficiently identify problems by comparing rankings across desktop and mobile.

If you don’t have access to the Search Impact reporting yet, don’t worry. Again, I believe Google is going to roll this out before the 4/21 deadline. That would make complete sense, since the “Devices” dimension could prove to be extremely helpful when a smartphone rankings demotion is in place. One thing is for sure. The changes rolling out on (or around) April 21 will be fascinating to analyze. Google said this change will have a “significant impact” on the smartphone search results. And that impact can translate into many lost visitors, conversions, and revenue. Good luck.

GG

 

When The Hammer Falls – Analyzing Lyrics in the Google SERPs and Its Impact on Traffic [Case Study]

Summary: In the fall of 2014, both Bing and Google began surfacing song lyrics directly in the search engine results pages (SERPs). Since users could now find lyrics immediately in the SERPs, many wondered what would happen to lyrics websites that provided the same information but required a click through to view the lyrics. This post provides findings from analyzing three large-scale lyrics websites to determine the traffic impact of lyrics in the SERPs.

Song Lyrics Displayed In The Google Search Results


Introduction
In April of 2014, I picked up a major algorithm update that heavily impacted lyrics web sites. The drop in traffic to many key players in the niche was substantial, with some losing 60%+ of their Google organic traffic overnight. For those of you familiar with Panda or Penguin hits, you know what this looks like.

Lyrics Web Sites Hit By Google Algorithm Update in April of 2014

I ended up digging in heavily and analyzing the drop across the entire niche. I reviewed a number of lyrics sites across several countries that got hit and wrote a post covering my findings (linked to above). After writing that post, I had a number of lyrics sites reach out to me for more information. They wanted to know more about what I surfaced, what the problems could be, and if I could help rectify the situation. It was a fascinating algo hit to analyze and I absolutely wanted to take on the challenge of helping the sites recover. So I began helping several of the lyrics sites that were heavily impacted.

2014 – A Crazy Year for Lyrics Sites
I took on several of the lyrics sites as clients and began heavily analyzing and auditing the negative impact. That included performing a deep crawl analysis of each site, a heavy-duty technical SEO analysis, and a thorough content analysis, while also using every tool in my arsenal to surface SEO-related problems.

I won’t sugarcoat my findings: there were many problems I surfaced across content, technical SEO, and even links (in certain situations). It was hard to say if the specific update in April was Panda, a separate algo update that hammered lyrics sites, or something else. But I approached the situation by covering as many bases as I could. Each remediation plan was extensive and covered many ways to tackle the problems I surfaced. As time went on, and as many changes were implemented, the sites started to recover. Some recovered sooner than others, while other sites took many more months to surge back.

Lyrics Website Recovering During Panda Update

On that note, many of the large lyrics sites have ridden the Panda roller coaster for a long time. And that’s common for large-scale websites that haven’t focused on Panda-proofing their sites. Over time, insidious thin content builds on the site like a giant layer of bamboo. And as the bamboo thickens, Panda smells dinner. And before you know it, boom, Panda hits the site (and for these sites, it hit them hard).

After recovering, each site would hold its collective breath while subsequent Panda updates rolled out. Based on the lyrics websites I have assisted, only one has fallen again to Panda. The others have remained out of the gray area and are doing well traffic-wise. Unfortunately, the one that fell saw only a temporary recovery, and it had recovered relatively quickly (almost too quickly). Quick recoveries are rare when you’re dealing with Panda (it typically takes months before you see a major surge after being pummeled), so I found that specific recovery odd. The site surged during the 9/5 update and then got hammered again during the cloaked 10/24 update. And Panda has not rolled out since 10/24/14, so we’re still waiting to see if the site comes back.

Lyrics Website Temporary Recovery from Panda

But enough about Panda for now. Actually, Google Panda could pale in comparison to what showed up in late fall 2014. We all knew it was possible, considering Google’s ambition to provide more and more data in the search engine results pages (SERPs). But it’s another story when you actually see it happen. I’m referring to the search engines adding lyrics directly in the SERPs. You know, when someone searches for song lyrics, and boom, the lyrics show up right in the desktop or mobile SERPs. No click through needed. I’ll cover how this unfolded next.


Lyrics Show Up in the SERPs
Bing was the first to add lyrics in the SERPs on October 7, 2014. That was the first bomb dropped on lyrics sites. It was a small bomb, considering it was only showing in Bing in the United States and Bing has approximately 19.7% market share (according to comScore Dec 2014 stats). Bing also drives Yahoo search (organic and paid), but lyrics are not showing in Yahoo yet.

Lyrics in Bing SERPs

But the writing was on the wall. Lyrics were coming to Google, and sooner rather than later. When lyrics hit Bing, I sent emails to all of my lyrics clients explaining the situation and providing screenshots and sample searches. Not every song would yield lyrics in the SERPs, but this was still a major event for the lyrics industry.

Next up was the first move by Google. On October 24, 2014, if you searched for a specific song, Google began providing a YouTube video with some song and artist information at the top of the SERPs. And near the bottom of that unit was a line or two from the lyrics and then a link to Google Play for the full lyrics. Whoa, so Google was beginning their assault on lyrics by simply linking to Google Play to view the lyrics. Again, I immediately emailed my clients and explained the situation, knowing lyrics were coming to the main SERPs soon.

Lyrics in Google SERPs Linking To Google Play

 

December 19, 2014 – The Hammer Falls
And then this happened:

Lyrics in Google SERPs Finally Arrive on December 19, 2014

And here was my Google+ share, which ended up getting a lot of attention:

Google Plus Share of Lyrics in the Google SERPs

 

I shared this screenshot of Google including lyrics directly in the SERPs, and the G+ post got noticed, a lot. That share was mentioned on a number of prominent websites, including Search Engine Roundtable, TechCrunch, Billboard, and more.

To clarify what was happening search-wise, on December 19, 2014 Google began showing song lyrics for users in the United States, and only for certain songs. I’m assuming the limit on songs and geography was based on licensing, so this doesn’t impact every song available. I’ll cover more about the impact of those limitations soon when I dig into some stats, but it’s an important note.

For example, if you search for “bang bang lyrics” in the United States, you get this:

Bang Bang Lyrics in US Google SERPs

But if you search for “you shook me all night long lyrics”, you won’t see lyrics in the SERPs. Clearly Google doesn’t have the rights to present the lyrics to all AC/DC songs, but it does for “Bang Bang”.

You Shook Me All Night Long Without Lyrics in US Google SERPs

And by the way, that’s just the desktop search results. This is also happening in mobile search, in the United States, and for certain songs. Talk about dominating the mobile SERPs: check out the screenshot below. Whereas on desktop you get the lyrics but still (typically) see links to lyrics websites above the fold, mobile is another story.

Check out the search for “bang bang lyrics” on my smartphone:

Bang Bang Lyrics in the Mobile U.S. Google SERPs

Can you see the massive difference? It’s just lyrics, and nothing else. And to add insult to injury, the percentage of users searching for lyrics skews heavily mobile. And that makes sense. Those users are on the go, hear a song, want to know the lyrics, and simply search on their phones. Or they are in a situation where their phone is their computer, so their searches will always be mobile.

Mobile Heavy Queries for Lyrics Globally

 

Death to Lyrics Websites?
Based on what I’ve explained so far, you know that Panda loves taking a bite out of lyrics web sites and you also know that both Google and Bing are providing lyrics directly in the SERPs (in the US and for certain songs). And you might guess that all of this means absolute death for lyrics websites. But wait, does it? I wouldn’t jump to conclusions just yet. There are definitely nuances to this situation that require further analysis and exploration.

For example, how much of a hit have the lyrics sites taken based on lyrics in the SERPs? How much traffic dropped for each song that yields lyrics in the SERPs? Was there an impact just in the United States or around the world too? And what about the difference between desktop and mobile? All of these were great questions, and I was eager to find answers.

So, I reached out to several of my lyrics clients and asked if I could analyze the changes and document the data in this post (anonymously, of course). The post isn’t meant to focus on the sites in particular, but instead on the impact that “lyrics in the SERPs” has made on their traffic. The lyrics websites I’ve been helping all generate revenue via advertising, so a massive drop in traffic means a massive drop in revenue. It’s pretty much that simple. That’s why Panda strikes fear in every lyrics website owner and why lyrics in the SERPs can strip away visits, pageviews, and ad dollars. It’s a new one-two punch from Google.


Analyzing Three Large-Scale Lyrics Websites
Three of my clients were nice enough to let me move forward with the analysis. And I greatly appreciate having clients that are awesome and willing to let me analyze and share this data. The three sites I analyzed for this post are large-scale lyrics sites. Combined, they drive more than 30 million visits from Google organic per month and have approximately 6 million lyrics pages indexed. And as I explained earlier, a lot of that traffic is from users on mobile devices. Approximately 40-50% of all Google organic traffic is from mobile devices (across all three sites).

Process:
My goal with the analysis was to understand the impact of lyrics in the SERPs from a click-through and traffic standpoint. I dug into search queries driving traffic over time to all three sites while also checking impressions and clicks in the SERPs (via Google Webmaster Tools, both desktop and mobile). Then I also checked Google Analytics to determine the change in traffic levels to song pages since the lyrics hit the SERPs.

For example, if a query saw a similar number of impressions since the launch of lyrics in the SERPs, but clicks dropped off a cliff, then I could dig in to analyze the SERPs for that query (both desktop and mobile). I found some interesting examples for sure, which I’ll cover below.

An example of stable or increasing impressions, but clicks dropping off a cliff: 

Google Webmaster Tools Impressions and Clicks for Lyrics Queries
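To make the pattern concrete, here’s a hedged sketch of the math for a single query (the numbers are hypothetical, not pulled from the actual sites):

```python
# Hypothetical GWT totals for one lyrics query, 30 days before
# vs. 30 days after lyrics hit the SERPs on 12/19/14.
before = {"impressions": 12000, "clicks": 3100}
after = {"impressions": 11800, "clicks": 1350}

impressions_change = (after["impressions"] - before["impressions"]) / before["impressions"]
clicks_change = (after["clicks"] - before["clicks"]) / before["clicks"]

print(f"Impressions: {impressions_change:+.1%}")  # roughly flat (-1.7%)
print(f"Clicks: {clicks_change:+.1%}")            # off a cliff (-56.5%)
```

When impressions hold steady but clicks crater like that, the ranking didn’t change; the SERP did.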

 

Caveats:
My analysis measured the impact right after lyrics hit the SERPs (from December 19, 2014 through the end of January 2015). The holidays were mixed in, which I tried to account for as best I could. Some of the lyrics sites saw steady traffic during the holidays, while one dipped and then returned as the New Year approached. The songs I analyzed and documented were not holiday-focused songs; I made sure to isolate songs that would not be impacted by the holidays. Also, Google Webmaster Tools data was sometimes wonky. I’m sure that’s no surprise to many of you working heavily in SEO, but it’s worth noting. I tried my best to exclude songs where the data looked strange.

Google Webmaster Tools & Advanced Segmentation in GA
When I began my analysis, I quickly found out that the straight reporting in both Google Webmaster Tools and Google Analytics wouldn’t suffice. Overall Google organic traffic wouldn’t help, since lyrics only rolled out in the SERPs in the United States. When checking traffic since the rollout, you really couldn’t see much overall change. But the devil is in the details as they say. So I used the functionality available to me in both GWT and GA to slice and dice the data. And that greatly helped me understand the impact of lyrics in the SERPs.

In Google Webmaster Tools, the search queries reporting enables you to filter the results. This was incredibly helpful, as I was able to isolate traffic from the United States and also view web versus mobile traffic. But there was another nifty filter I used that really helped. You see, many people visit lyrics websites for the meaning of the lyrics, and not just to see the lyrics. For example, “take me to church meaning” or “meaning of hallelujah lyrics”.

The reason I wanted to weed those queries out is that, as of now, Google does not provide lyrics in the SERPs for “meaning”-focused queries. And that’s good for my clients, by the way. So by adding the filters per site, I was able to isolate songs that could be impacted.

Filtering GWT Search Queries by Search Property, Location, and Negative Query:

Google Webmaster Tools Filters for Property, Location, and Query

After setting the filters, I was able to search for queries that yielded relatively stable impressions but saw a drop in clicks and click-through rate. And I always kept an eye on average position to make sure it didn’t drop heavily.

From a Google Analytics standpoint, I ran into a similar problem. Top-level statistics wouldn’t cut it. I needed Google organic traffic from the United States only. And then I wanted both desktop and mobile Google organic traffic from the United States only (separated). That’s where the power of advanced segments comes in.

I built segments for desktop Google organic traffic from the United States and mobile Google organic traffic from the United States. Activating these segments isolated that traffic and enabled me to identify trends and changes based on those segments alone. By the way, I wrote a tutorial for how to use segments to analyze Panda hits. You should check that out if you aren’t familiar with segments in GA. You’ll love them, believe me.

Filtering Google Organic Traffic from the United States in GA Using Segments:

Google Analytics Segments for U.S. Desktop Google Organic Traffic

 

So, with the right tools and filters in place, I began to dig in. It was fascinating to analyze the queries leading to all three sites now that lyrics hit the SERPs. I cover what I found next. By the way, this post focuses on Google and not Bing. I might write up another post focused on Bing’s lyrics in the SERPs, but I wanted to start with Google.


The Impact of Lyrics in the SERPs – The Data
With multiple computers up and running, two phones, and two tablets, I got to work. I wanted to find queries and songs that typically drove traffic to the three sites and that now yielded lyrics in the SERPs. And then I wanted to see what happened once those lyrics hit the SERPs: the impact on clicks, traffic, etc. I have documented a number of examples below. There are many more, but I wanted to provide just a sampling. Here we go…

 

Spill The Wine Lyrics by War
Google Organic Desktop US Traffic Down 73%
Google Organic Mobile US Traffic Down 65%
GWT Clicks Down 56%

 

Sister Ray Lyrics by The Velvet Underground
Google Organic Desktop US Traffic Down 73%
Google Organic Mobile US Traffic Down 56%
GWT Clicks Down 84%

 

Rude Lyrics by Magic!
Google Organic Desktop US Traffic Down 41%
Google Organic Mobile US Traffic Down 32%
GWT Clicks Down 55%

 

Bang Bang Lyrics by Jessie J, Nicki Minaj and Ariana Grande
Google Organic Desktop US Traffic Down 32%
Google Organic Mobile US Traffic Down 47%
GWT Clicks Down 66%

 

Fireproof Lyrics by One Direction
Google Organic Desktop US Traffic Down 44%
Google Organic Mobile US Traffic Down 40%
GWT Clicks Down 29%

 

All of Me Lyrics by John Legend
Google Organic Desktop US Traffic Down 39%
Google Organic Mobile US Traffic Down 14%
GWT Clicks Down 61%

 

Country Road Lyrics by John Denver
Google Organic Desktop US Traffic Down 62%
Google Organic Mobile US Traffic Down 45%
GWT Clicks Down 36%

 

Come Sail Away Lyrics by Styx
Google Organic Desktop US Traffic Down 43%
Google Organic Mobile US Traffic Down 27%
GWT Clicks Down 55%

 

Midnight Special Lyrics by Huddie William Ledbetter
Google Organic Desktop US Traffic Down 53%
Google Organic Mobile US Traffic Down 85%
GWT Clicks Down 33%

 

Comfortably Numb Lyrics by Pink Floyd
Google Organic Desktop US Traffic Down 46%
Google Organic Mobile US Traffic Down 17%
GWT Clicks Down 43%

 

Yes, There’s A Serious Impact
As you can see from the statistics above, both desktop and mobile traffic to the song pages dropped significantly since lyrics hit the SERPs (for songs that yield lyrics in the SERPs). Again, these songs showed stable impressions during the timeframe, yet saw large drops in clicks from the SERPs, and in subsequent traffic to the three lyrics sites I analyzed.

Some users were clearly getting what they wanted when searching for lyrics and finding that information in the SERPs. And in mobile search, the lyrics take up the entire results page. So it’s no surprise to see some mobile numbers absolutely plummet after lyrics hit the SERPs.


What Could Lyrics Sites Do?
Above, I provided a sampling of what I saw while analyzing the impact of lyrics in the U.S. Google SERPs. Clearly there’s a large impact. The good news for lyrics sites is that there are several core factors helping them right now.

  • This is only in the United States.
  • The lyrics only trigger when the query is structured in certain ways. For example, “magic rude lyrics” yields lyrics, whereas “rude lyrics magic” does not. Also, if additional words are entered in the query, lyrics will not be shown (like “meaning”, which I explained earlier).
  • Not all songs are impacted (yet). I found many examples of songs that did not yield lyrics in the SERPs. Again, this is probably due to licensing issues.

If you look at the overall traffic numbers for the sites I analyzed (and the other sites I have access to), Google organic traffic overall has not been heavily impacted. Taking all global Google organic traffic into account, and across all songs, you clearly don’t see the huge drops I showed you for the songs listed above. That said, this is still a grave situation for many lyrics sites. The content they have licensed and provided on their sites is now being surfaced directly in the SERPs. If this expands to more songs, more countries, and additional queries, then it can have a massive impact on their businesses. Actually, it could very well end their businesses.

Moving forward, lyrics sites need to up their game from a functionality and value proposition standpoint. If Google can easily add lyrics to the SERPs, then lyrics sites need to keep driving forward with what Google can’t do (at least for now). They should develop new functionality, strengthen community engagement, provide member benefits, include more data and media for artists and songs, provide a killer mobile experience, etc.

Remember, there are many people searching for additional information related to songs. For example, people want to know the meaning of lyrics and seem to enjoy the community discussion around what each lyric means. And lyrics don’t trigger in the SERPs for those queries (yet).

And then you have the next generation of devices, social networks, messaging apps, gaming consoles, connected cars, etc. I would start thinking about how people are going to search for lyrics across new devices and in new environments. That’s a new frontier and it would be smart to begin building and testing lyrics applications that can work in those new environments. Mobile, wearables, voice search, cars, etc. provide a wealth of opportunity for business owners focused on music. It just takes the right ideas, time, resources, and of course, money.

But I’ll stop there. I think that topic can be an entire post and this one is getting too long already. :)

 

Summary – Moving Forward With (Expanding) Lyrics in the SERPs
In the short-term, it’s hard to say how this will expand. Google and Bing might drop the effort and keep things as-is, or they could keep expanding lyrics in the SERPs until every song and every country is covered.

Based on the current song and geography limits in Google and Bing, lyrics websites are still surviving, and especially for searches outside the United States. It will be interesting to watch this space over time, especially since I have several clients adapting to the new lyrics world as I write this post.

From an SEO standpoint, between Google Panda and content surfacing in the SERPs, lyrics web sites are fighting a battle on two fronts. If it’s not Panda attacking the site one night, it’s the Knowledge Graph pushing song lyrics front and center in the SERPs. And in this day and age, wars are won by technology, not brute strength. So lyrics sites need to up their engineering prowess, think two to three steps ahead of the industry, and then execute quickly and at a very high level.

That’s how they can survive and prosper in the coming years. Of course, that’s until we have a Google chip implanted in our brains that instantly provides the lyrics to every song ever written, from around the world, since the beginning of time. Think about that for a second.

GG

 

Insidious Thin Content on Large-Scale Websites and Its Impact on Google Panda

Insidious Thin Content and Google Panda

If you’ve read some of my case studies in the past, then you know Panda can be a real pain in the neck for large-scale websites. For example, publishers, ecommerce retailers, directories, and other websites often have tens of thousands, hundreds of thousands, or millions of pages indexed. When sites grow that large, with many categories, directories, and subdomains, content can easily get out of control. For example, I sometimes surface problematic areas of a website that clients didn’t even know existed! There’s usually a gap of silence on the web conference when I present situations like that. But once everyone realizes that low quality content is in fact present, then we can proceed with how to rectify the problems at hand.

And that’s how you beat Panda. Surfacing content quality problems and then quickly fixing those problems. And if companies don’t surface and rectify those problems, then they remain heavily impacted by Panda. Or even worse, they can go in and out of the gray area of Panda. That means they can get hit, recover to a degree, get hit again, recover, etc. It’s a maddening place to live SEO-wise.

The Insidious Thin Content Problem
The definition of insidious is:
“proceeding in a gradual, subtle way, but with harmful effects”

And that’s exactly how thin content can increase over time on large-scale websites. The problem usually doesn’t rear its ugly head in one giant blast (although that can happen). Instead, it can gradually increase over time as more and more content is added or edited, technical changes are made, new updates get pushed to the website, new partnerships are formed, etc. And before you know it, boom, you’ve got a huge thin content problem and Panda is knocking on the door. Or worse, it has already knocked down your door.

So, based on recent Panda audits, I wanted to provide three examples of how an insidious thin content problem can get out of control on larger-scale websites. My hope is that you can review these examples and then apply the same model to your own business.

 

Insidious Thin Content: Example 1
During one recent audit, I ended up surfacing a number of pages that seemed rogue. For example, they weren’t linked to from many other pages on the site, didn’t contain the full site template, and only contained a small amount of content. And the content didn’t really have any context about why it was there, what users were looking at, etc. I found that very strange.

Thin Content with No Site Template

So I dug into that issue, and started surfacing more and more of that content. Before I knew it, I was up to 4,100 pages of that content! Yes, there were over four thousand rogue, thin pages based on that one find.

To make matters even worse, when checking how Google was crawling and indexing that content, you could quickly see major problems. Using both fetch and render in Google Webmaster Tools and checking the cache of the pages revealed Google couldn’t see most of the content. So the thin pages were even thinner than I initially thought. They were essentially blank to Google.

Thin Content and Content Won't Render

When I brought this up to my client, they did realize the pages were present on the site, but they didn’t understand the potential impact Panda-wise. After I explained more about how Panda works, and how thin content equates to giant pieces of bamboo, they totally got it.

I explained that they should either immediately 404 that content or noindex it. And if they wanted to quicken that process a little, then 410 the content. Basically, if the pages should not be on the site for users or Google, then 404 or 410 them. If the pages are beneficial for users for some reason, then noindex the content using the meta robots tag.
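For reference, the noindex option is just a single tag in the head of each thin page (a minimal sketch below); for the 404/410 route, no markup is needed, since the header response code itself does the work:

```html
<!-- Keeps the page available for users, but tells the engines
     not to index it: -->
<meta name="robots" content="noindex">
```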

So, with one finding, my client will nuke thousands of pages of thin content from their website (which had been hammered by Panda). That will surely help, and it’s only one finding based on a number of core problems I surfaced on the site during my audit. Again, the problem didn’t manifest itself overnight. Instead, it took years of this type of content building on the site. And before they knew it, Panda came and hammered the site. Insidious.

 

Insidious Thin Content: Example 2
In another audit I recently conducted, I kept surfacing thin pages that basically provided third-party videos (often YouTube videos embedded in the page). So you had very little original content and then just a video. After digging into the situation, I found many pages like this. At this time, I estimate there could be as many as one thousand pages like this on the site. And I still need to analyze more of the site and the crawl, so it could be even worse…

Now, the web site has been around for a long time, so it’s not like all the thin video pages popped up overnight. The site produces a lot of content, but would continually supplement stronger content with this quick approach that yielded extremely thin and unoriginal content. And as time went on, the insidious problem yielded a Panda attack (actually, multiple Panda attacks over time).

Thin Video Pages and Google Panda

Note, this was not the only content quality problem the site suffered from. By the way, it’s never just one problem that causes a Panda attack. I’ve always said that Panda has many tentacles and that low quality content can mean several things. Whenever I perform a deep crawl analysis and audit on a severe Panda hit, I often surface a number of serious problems. This was just one that I picked up during the audit, but it’s an important find.

By the way, checking Google organic traffic to these pages revealed a major decrease in traffic over time… Even Google was sending major signals to the site that it didn’t like the content. So there are many thin video pages indexed, but almost no traffic. Running a Panda report showing the largest drop in traffic to Google organic landing pages after a Panda hit reveals many of the thin video pages in the list. It’s one of the reasons I recommend running a Panda report once a site has been hit. It’s loaded with actionable data.

So now I’m working with my client to identify all pages on the site that can be categorized as thin video pages. Then we need to determine which are OK (there aren’t many), which are truly low quality, which should be noindexed, and which should be nuked. And again, this was just one problem… there are a number of other content quality problems riddling the site.

 

Insidious Thin Content: Example 3

During another Panda project, I surfaced an interesting thin content problem. And it’s one that grew over time to create a pretty nasty situation. I found many urls that simply provided a quick update about a specific topic. Those updates were typically just a few lines of content, all within a specific category. The posts were extremely thin… sometimes only a paragraph or two without any images, visuals, links to more content, etc.

Thin Quick Updates and Google Panda

Upon digging into the entire crawl, I found over five thousand pages that fit this category of thin content. Clearly this was a contributing factor to the significant Panda hit the site experienced. So I’m working with my client on reviewing the situation and making the right decision with regard to handling that content. Most of the content will be noindexed versus being removed, since there are reasons outside of SEO that need to be taken into account. For example, partnerships, contractual obligations, etc.

Over time, you can see that some of these pages actually used to rank well and drive organic search traffic from Google. That’s probably due to the authority of the site. I’ve seen that many times since 2011, when Panda first rolled out. A site builds enormous SEO power and then starts pumping out thinner, lower-quality content. And then that content ends up ranking well. And when users hit the thin content from Google, they bounce off the site quickly (and often back to the search results). In aggregate, low user engagement, high bounce rates, and low dwell time can be a killer Panda-wise. Webmasters need to avoid that situation like the plague. You can read my case study about “6 months with Panda” to learn more about that situation.

 

Summary – Stopping The Insidious Thin Content Problem is Key For Panda Recovery
So there you have it. Three quick examples of insidious thin content problems on large-scale websites. They often don’t pop up overnight, but instead, they grow over time. And before you know it, you’ve got a thick layer of bamboo on your site attracting the mighty Panda. By the way, there are many other examples of insidious thin content that I’ve come across during my Panda work and I’ll try and write more about this problem soon. I think it’s incredibly important for webmasters to understand how the problem can grow, the impact it can have, and how to handle the situation.

In the meantime, I’ll leave you with some quick advice. My recommendation to any large-scale website is to truly understand your content now, identify any Panda risks, and take action sooner rather than later. It’s much better to be proactive and handle thin content in the short-term than to deal with a major Panda hit after the fact. By the way, the last Panda update was on 10/24, and I’m fully expecting another one shortly. Google rolled out an update last year on 1/11/14, so we are definitely due. I’ll be sure to communicate what I’m seeing once the update rolls out.

GG

 

 

XML Sitemaps – 8 Facts, Tips, and Recommendations for the Advanced SEO

XML Sitemaps for Advanced SEOs

After publishing my last post about dangerous rel canonical problems, I started receiving a lot of questions about other areas of technical SEO. One topic in particular that seemed to generate many questions was how to best use and set up xml sitemaps for larger and more complex websites.

Sure, at its most basic, an xml sitemap is just a list of urls that you want the search engines to crawl and index. Sounds easy, right? Well, for larger and more complex sites, the situation is often not so easy. And if the xml sitemap situation spirals out of control, you can end up feeding Google and Bing thousands, hundreds of thousands, or millions of bad urls. And that’s never a good thing.

While helping clients, it’s not uncommon for me to audit a site and surface serious errors with regard to xml sitemaps. And when that’s the case, websites can send Google and Bing mixed signals, urls might not get indexed properly, and both engines can end up losing trust in your sitemaps. And as Bing’s Duane Forrester once said in this interview with Eric Enge:

“Your Sitemaps need to be clean. We have a 1% allowance for dirt in a Sitemap. If we see more than a 1% level of dirt, we begin losing trust in the Sitemap.”

Clearly that’s not what you want happening…

So, based on the technical SEO work I perform for clients, including conducting many audits, I decided to list some important facts, tips, and answers for those looking to maximize their xml sitemaps. My hope is that you can learn something new from the bullets listed below, and implement changes quickly.

 

1. Use RSS/Atom and XML For Maximum Coverage
This past fall, Google published a post on the webmaster central blog about best practices for xml sitemaps. In that post, they explained that sites should use a combination of xml sitemaps and RSS/Atom feeds for maximum coverage.

XML sitemaps should contain all of the canonical urls on your site, while RSS/Atom feeds should contain only the latest additions or recently updated urls. In other words, sitemaps will contain many urls, whereas feeds will contain a limited set of new or recently changed ones.

RSS/Atom Feed and XML Sitemaps

So, if you have new urls (or recently updated urls) that you want Google to prioritize, then use both xml sitemaps and RSS/Atom feeds. Google says by using RSS, it can help them “keep your content fresher in its index”. I don’t know about you, but I like the idea of Google keeping my content fresher. :)

Also, it’s worth noting that Google recommends maximizing the number of urls per xml sitemap. For example, don’t cut up your xml sitemaps into many smaller files (if possible). Instead, use the space you have in each sitemap to include all of your urls. If you don’t, Google explains, “it can impact the speed and efficiency of crawling your urls.” I recommend reading Google’s post to learn how to best use xml sitemaps and RSS/Atom feeds to maximize your efforts. By the way, you can include 50K urls per sitemap, and each sitemap must be less than 10MB uncompressed.
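For reference, the core structure is simple. Here’s a minimal sketch of an xml sitemap with placeholder urls (each file can hold up to 50K urls):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/page-1/</loc>
    <lastmod>2015-02-01</lastmod>
  </url>
  <url>
    <loc>http://www.example.com/page-2/</loc>
    <lastmod>2015-01-15</lastmod>
  </url>
  <!-- ...up to 50,000 canonical urls per sitemap file... -->
</urlset>
```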

 

2. XML Sitemaps By Protocol and Subdomain
I find a lot of webmasters are confused by protocol and subdomains, and both can end up impacting how urls in sitemaps get crawled and indexed.

URLs included in an xml sitemap must use the same protocol and subdomain as the sitemap itself. This means that https urls should not be included in a sitemap delivered via http. It also means that urls on sample.domain.com should not be located in the sitemap on www.domain.com. So on and so forth.

XML Sitemaps and Protocol and Subdomains
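To make that concrete, here’s a quick sketch of the mismatches to avoid (assuming the sitemap itself lives at http://www.example.com/sitemap.xml):

```xml
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- OK: same protocol and subdomain as the sitemap itself. -->
  <url><loc>http://www.example.com/fine/</loc></url>

  <!-- Mismatch: an https url in a sitemap served over http. -->
  <url><loc>https://www.example.com/secure/</loc></url>

  <!-- Mismatch: a different subdomain than the sitemap's. -->
  <url><loc>http://sample.example.com/other/</loc></url>
</urlset>
```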

 

This is a common problem when sites employ multiple subdomains or have sections using both https and http (like ecommerce retailers). And then, of course, we have many sites switching to https for all urls that haven’t updated their xml sitemaps to reflect the change. My recommendation is to check your xml sitemaps reporting today, while also manually checking the sitemaps. You might just find issues that you can fix quickly.

 

3. Dirty Sitemaps – Hate Them, Avoid Them
When auditing sites, I often crawl the xml sitemaps myself to see what I find. And it’s not uncommon to find many urls that return non-200 header response codes. For example, urls that 404, 302, 301, return 500s, etc.

Dirty XML Sitemaps

You should only provide canonical urls in your xml sitemaps. You should not provide non-200 header response code urls (or non-canonical urls that point to other urls). The engines do not like “dirty sitemaps” since they can send Google and Bing on a wild goose chase throughout your site. For example, imagine driving Google and Bing to 50K urls that end up 404ing, redirecting, or not resolving. Not good, to say the least.

Remember Duane’s comment from earlier about “dirt” in sitemaps. The engines can lose trust in your sitemaps, which is never a good thing SEO-wise. More about crawling your sitemaps later in this post.

 

4. View Trending in Google Webmaster Tools
Many SEOs are familiar with the xml sitemaps reporting in Google Webmaster Tools, which can help surface various problems while also providing important indexation statistics. Well, there’s a hidden visual gem in the report that’s easy to miss. The default view will show the number of pages submitted in your xml sitemaps and the number indexed. But if you click the “sitemaps content” box for each category, you can view trending over the past 30 days. This can help you identify bumps in the road, or surges, as you make changes.

For example, check out the trending below. You can see the number of images submitted and indexed drop significantly over a period of time, only to climb back up. You would definitely want to know why that happened, so you can avoid problems down the line. Sending this to your dev team can help them identify potential problems that can build over time.

XML Sitemaps Trending in Google Webmaster Tools

 

5. Using Rel Alternate in Sitemaps for Mobile URLs
When using mobile urls (like an m. subdomain), it’s incredibly important to ensure you have the proper technical SEO setup. For example, you should be using rel alternate on the desktop pages pointing to the mobile pages, and then rel canonical on the mobile pages pointing back to the desktop pages.

Although not an approach I often push for, you can provide rel alternate annotations in your xml sitemaps. The annotations look like this:

Rel Alternate in XML Sitemaps
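In case the screenshot is hard to make out, here’s a minimal sketch of the annotation based on Google’s documented format (the urls are placeholders; the desktop url lists its m. counterpart):

```xml
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>http://www.example.com/page-1/</loc>
    <xhtml:link rel="alternate"
                media="only screen and (max-width: 640px)"
                href="http://m.example.com/page-1/" />
  </url>
</urlset>
```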

 

It’s worth noting that you should still add rel canonical to the source code of your mobile pages pointing to your desktop pages.

 

6. Using hreflang in Sitemaps for Multi-Language Pages
If you have pages that target different languages, then you are probably already familiar with hreflang. Using hreflang, you can tell Google which pages should target which languages. Then Google can surface the correct pages in the SERPs based on the language/country of the person searching Google.

Similar to rel alternate, you can either provide the hreflang annotations in a page’s html code (page by page), or you can provide them via xml sitemaps. For example, you could provide the following hreflang attributes when you have the same content targeting different languages:

Hreflang in XML Sitemaps
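And here’s a minimal sketch of that setup based on Google’s documented sitemap format (placeholder urls for English and German versions of the same content):

```xml
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>http://www.example.com/english/</loc>
    <xhtml:link rel="alternate" hreflang="en" href="http://www.example.com/english/" />
    <xhtml:link rel="alternate" hreflang="de" href="http://www.example.com/deutsch/" />
  </url>
  <url>
    <loc>http://www.example.com/deutsch/</loc>
    <xhtml:link rel="alternate" hreflang="en" href="http://www.example.com/english/" />
    <xhtml:link rel="alternate" hreflang="de" href="http://www.example.com/deutsch/" />
  </url>
</urlset>
```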

Just be sure to include a separate <loc> element for each url that contains alternative language content (i.e. all of the sister urls should be listed in the sitemap via a <loc> element).

 

7. Testing XML Sitemaps in Google Webmaster Tools
Last, but not least, you can test your xml sitemaps or other feeds in Google Webmaster Tools. Although easy to miss, there is a red “Add/Test Sitemap” button in the upper right-hand corner of the Sitemaps reporting page in Google Webmaster Tools.

Test XML Sitemaps in Google Webmaster Tools

When you click that button, you can add the url of your sitemap or feed. Once you click “Test Sitemap”, Google will provide results based on analyzing the sitemap/feed. Then you can rectify any issues before submitting the sitemap. I think too many webmasters use a “set it and forget it” approach to xml sitemaps. Using the test functionality in GWT, you can nip some problems in the bud. And it’s simple to use.

Results of XML Sitemaps Test in Google Webmaster Tools

 

8. Bonus: Crawl Your XML Sitemap Via Screaming Frog
In SEO, you can either test and know, or read and believe. As you can probably guess, I’m a big fan of the former… For xml sitemaps, you should test them thoroughly to ensure all is ok. One way to do this is to crawl your own sitemaps. By doing so, you can identify problematic tags, non-200 header response codes, and other little gremlins that can cause sitemap issues.

One of my favorite tools for crawling sitemaps is Screaming Frog (which I have mentioned many times in my previous posts). By setting the crawl mode to “list mode”, you can crawl your sitemaps directly. Screaming Frog natively handles xml sitemaps, meaning you don’t need to convert your xml sitemaps into another format before crawling (which is awesome).

Crawling Sitemaps in Screaming Frog

Screaming Frog will then load your sitemap and begin crawling the urls it contains. In real-time, you can view the results of the crawl. And if you have Graph View up and running during the crawl, you can visually graph the results as the crawler collects data. I love that feature. Then it’s up to you to rectify any problems that are surfaced.

Graph View in Screaming Frog
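And if you’d rather script a similar check, here’s a minimal sketch in Python (it assumes a standard urlset sitemap and uses the requests library; the sitemap url is a placeholder):

```python
import requests
import xml.etree.ElementTree as ET

SITEMAP_URL = "http://www.example.com/sitemap.xml"  # placeholder
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Download the sitemap and pull out every <loc> url.
sitemap = requests.get(SITEMAP_URL, timeout=10)
urls = [loc.text.strip() for loc in ET.fromstring(sitemap.content).findall(".//sm:loc", NS)]

# Report any url that doesn't return a clean 200 (the "dirt" Bing
# warned about earlier in this post).
for url in urls:
    code = requests.head(url, allow_redirects=False, timeout=10).status_code
    if code != 200:
        print(code, url)
```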

 

Summary – Maximize and Optimize Your XML Sitemaps
As I’ve covered throughout this post, there are many ways to use xml sitemaps to maximize your SEO efforts. Clean xml sitemaps can help you inform the engines about all of the urls on your site, including the most recent additions and updates. It’s a direct feed to the engines, so it’s important to get it right (and especially for larger and more complex websites).

I hope my post provided some helpful nuggets of sitemap information that enable you to enhance your own efforts. I recommend setting some time aside soon to review, crawl, audit, and then refine your xml sitemaps. There may be some low-hanging fruit that yields nice wins. Now excuse me while I review the latest sitemap crawl. :)

GG