Wednesday, February 4th, 2015

When The Hammer Falls – Analyzing Lyrics in the Google SERPs and Its Impact on Traffic [Case Study]

Summary: In the fall of 2014, both Bing and Google began surfacing song lyrics directly in the search engine results pages (SERPs). Since users could now find lyrics immediately in the SERPs, many wondered what would happen to lyrics websites that provided the same information, but required a click through to view the lyrics. This post provides findings from analyzing three large-scale lyrics web sites to determine the traffic impact of lyrics in the SERPs.

Song Lyrics Displayed In The Google Search Results

In April of 2014, I picked up a major algorithm update that heavily impacted lyrics web sites. The drop in traffic to many key players in the niche was substantial, with some losing 60%+ of their Google organic traffic overnight. For those of you familiar with Panda or Penguin hits, you know what this looks like.

Lyrics Web Sites Hit By Google Algorithm Update in April of 2014

I ended up digging in heavily and analyzing the drop across the entire niche. I reviewed a number of lyrics sites across several countries that got hit and wrote a post covering my findings (linked to above). After writing that post, I had a number of lyrics sites reach out to me for more information. They wanted to know more about what I surfaced, what the problems could be, and if I could help rectify the situation. It was a fascinating algo hit to analyze and I absolutely wanted to take on the challenge of helping the sites recover. So I began helping several of the lyrics sites that were heavily impacted.

2014 – A Crazy Year for Lyrics Sites
I took on several of the lyrics sites as clients and began heavily analyzing and auditing the negative impact. That included performing a deep crawl analysis of each site, a heavy-duty technical SEO analysis, a thorough content analysis, while also using every tool in my arsenal to surface SEO-related problems.

I won’t sugarcoat my findings: there were many problems I surfaced, across content, technical SEO, and even links (in certain situations). It was hard to say if the specific update in April was Panda, a separate algo update that hammered lyrics sites, or something else. But I tackled the situation by covering as many bases as I could. Each remediation plan was extensive and covered many ways to tackle the problems I surfaced. As time went on, and many changes were implemented, the sites started to recover. Some recovered sooner than others, while other sites took many more months to surge back.

Lyrics Website Recovering During Panda Update

On that note, many of the large lyrics sites have ridden the Panda roller coaster for a long time. And that’s common for large-scale websites that haven’t focused on Panda-proofing their web sites. Over time, insidious thin content builds on the site like a giant layer of bamboo. And as the bamboo thickens, Panda smells dinner. And before you know it, boom, Panda hits the site (and for these sites, it hit them hard).

After recovering, the sites would hold their collective breath while subsequent Panda updates rolled out. Based on the lyrics web sites I have assisted, only one has fallen again to Panda. The others have remained out of the gray area and are doing well traffic-wise. Unfortunately, one lyrics web site I was helping saw only a temporary recovery after bouncing back relatively quickly (almost too quickly). Quick recoveries are rare when you’re dealing with Panda, so I did find that specific recovery odd. It typically takes months before you see a major surge after being pummeled by Panda. The site surged during the 9/5 update and then got hammered again during the cloaked 10/24 update. And Panda has not rolled out since 10/24/14, so we’re still waiting to see if the site comes back.

Lyrics Website Temporary Recovery from Panda

But enough about Panda for now. Actually, Google Panda could pale in comparison to what showed up in late fall 2014. We all knew it was possible, considering Google’s ambition to provide more and more data in the search engine results pages (SERPs). But it’s another story when you actually see it happen. I’m referring to the search engines adding lyrics directly in the SERPs. You know, when someone searches for song lyrics, and boom, the lyrics show up right in the desktop or mobile SERPs. No click through needed. I’ll cover how this unfolded next.

Lyrics Show Up in the SERPs
Bing was the first to add lyrics in the SERPs on October 7, 2014. That was the first bomb dropped on lyrics sites. It was a small bomb, considering it was only showing in Bing in the United States and Bing has approximately 19.7% market share (according to comScore Dec 2014 stats). Bing also drives Yahoo search (organic and paid), but lyrics are not showing in Yahoo yet.

Lyrics in Bing SERPs

But the writing was on the wall. Lyrics were coming to Google, and sooner rather than later. When lyrics hit Bing, I sent emails to all of my lyrics clients explaining the situation and providing screenshots and sample searches. Not every song would yield lyrics in the SERPs, but this was still a major event for the lyrics industry.

Next up was the first move by Google. On October 24, 2014, if you searched for a specific song, Google began providing a YouTube video with some song and artist information at the top of the SERPs. And near the bottom of that unit was a line or two from the lyrics and then a link to Google Play for the full lyrics. Whoa, so Google was beginning their assault on lyrics by simply linking to Google Play to view the lyrics. Again, I immediately emailed my clients and explained the situation, knowing lyrics were coming to the main SERPs soon.

Lyrics in Google SERPs Linking To Google Play


December 19, 2014 – The Hammer Falls
And then this happened:

Lyrics in Google SERPs Finally Arrive on December 19, 2014

And here was my Google+ share, which ended up getting a lot of attention:

Google Plus Share of Lyrics in the Google SERPs


I shared this screenshot of Google including lyrics directly in the SERPs, and the G+ post got noticed, a lot. That share was mentioned on a number of prominent websites, including Search Engine Roundtable, TechCrunch, Billboard, and more.

To clarify what was happening search-wise, on December 19, 2014 Google began showing song lyrics for users in the United States, and only for certain songs. I’m assuming the limit on songs and geography was based on licensing, so this doesn’t impact every song available. I’ll cover more about the impact of those limitations soon when I dig into some stats, but it’s an important note.

For example, if you search for “bang bang lyrics” in the United States, you get this:

Bang Bang Lyrics in US Google SERPs

But if you search for “you shook me all night long lyrics”, you won’t see lyrics in the SERPs. Clearly Google doesn’t have the rights to present the lyrics to all AC/DC songs, but it does for “Bang Bang”.

You Shook Me All Night Long Without Lyrics in US Google SERPs

And by the way, that’s for the desktop search results. This is also happening in mobile search, in the United States, and for certain songs. Talk about dominating the mobile SERPs; check out the screenshot below. On desktop, you get the lyrics but still see links to lyrics websites above the fold (typically). Mobile is another story.

Check out the search for “bang bang lyrics” on my smartphone:

Bang Bang Lyrics in the Mobile U.S. Google SERPs

Can you see the massive difference? It’s just lyrics, and nothing else. And to add insult to injury, searches for lyrics skew heavily mobile. And that makes sense. Those users are on the go, hear a song, want to know the lyrics, and simply search on their phones. Or, they are in a situation where their phone is their computer, so their searches will always be mobile.

Mobile Heavy Queries for Lyrics Globally


Death to Lyrics Websites?
Based on what I’ve explained so far, you know that Panda loves taking a bite out of lyrics web sites and you also know that both Google and Bing are providing lyrics directly in the SERPs (in the US and for certain songs). And you might guess that all of this means absolute death for lyrics websites. But wait, does it? I wouldn’t jump to conclusions just yet. There are definitely nuances to this situation that require further analysis and exploration.

For example, how much of a hit have the lyrics sites taken based on lyrics in the SERPs? How much traffic dropped for each song that yields lyrics in the SERPs? Was there an impact just in the United States or around the world too? And what about the difference between desktop and mobile? All of these were great questions, and I was eager to find answers.

So, I reached out to several of my lyrics clients and asked if I could analyze the changes and document the data in this post (anonymously of course). The post isn’t meant to focus on the sites in particular, but instead, focus on the impact that “lyrics in the SERPs” have made to their traffic. The lyrics websites I’ve been helping generate revenue via advertising, so a massive drop in traffic means a massive drop in revenue. It’s pretty much that simple at this point. That’s why Panda strikes fear in every lyrics web site owner and why lyrics in the SERPs can strip away visits, pageviews, and ad dollars. It’s a new one-two punch from Google.

Analyzing Three Large-Scale Lyrics Websites
Three of my clients were nice enough to let me move forward with the analysis. And I greatly appreciate having clients that are awesome, and are willing to let me analyze and share that data. The three sites I analyzed for this post are large-scale lyrics sites. Combined, they drive more than 30 million visits from Google organic per month and have approximately 6 million lyrics pages indexed. And as I explained earlier, a lot of that traffic is from users on mobile devices. Approximately 40-50% of all Google organic traffic is from mobile devices (across all three sites).

My goal with the analysis was to understand the impact of lyrics in the SERPs from a click-through and traffic standpoint. I dug into search queries driving traffic over time to all three sites while also checking impressions and clicks in the SERPs (via Google Webmaster Tools, both desktop and mobile). Then I also checked Google Analytics to determine the change in traffic levels to song pages since the lyrics hit the SERPs.

For example, if a query saw a similar number of impressions since the launch of lyrics in the SERPs, but clicks dropped off a cliff, then I could dig in to analyze the SERPs for that query (both desktop and mobile). I found some interesting examples for sure, which I’ll cover below.

An example of stable or increasing impressions, but clicks dropping off a cliff: 

Google Webmaster Tools Impressions and Clicks for Lyrics Queries


My analysis measured the impact right after lyrics hit the SERPs (from December 19, 2014 through the end of January 2015). The holidays were mixed in, which I tried to account for the best I could. Some of the lyrics sites saw steady traffic during the holidays, while one dipped and then returned as the New Year approached. The songs I analyzed and documented were not holiday-focused songs. I made sure to try and isolate songs that would not be impacted by the holidays. Also, Google Webmaster Tools data was sometimes wonky. I’m sure that’s no surprise to many of you working heavily in SEO, but it’s worth noting. I tried my best to exclude songs where the data looked strange.

Google Webmaster Tools & Advanced Segmentation in GA
When I began my analysis, I quickly found out that the straight reporting in both Google Webmaster Tools and Google Analytics wouldn’t suffice. Overall Google organic traffic wouldn’t help, since lyrics only rolled out in the SERPs in the United States. When checking traffic since the rollout, you really couldn’t see much overall change. But the devil is in the details as they say. So I used the functionality available to me in both GWT and GA to slice and dice the data. And that greatly helped me understand the impact of lyrics in the SERPs.

In Google Webmaster Tools, the search queries reporting enables you to filter the results. This was incredibly helpful, as I was able to isolate traffic from the United States and also view web versus mobile traffic. But there was another nifty filter I used that really helped. You see, many people visit lyrics websites for the meaning of the lyrics, and not just to see the lyrics. For example, “take me to church meaning” or “meaning of hallelujah lyrics”.

The reason I wanted to weed those queries out is that, as of now, Google does not provide the lyrics in the SERPs for “meaning” focused queries. And that’s good for my clients by the way. So by adding the filters per site, I was able to isolate songs that could be impacted.

Filtering GWT Search Queries by Search Property, Location, and Negative Query:

Google Webmaster Tools Filters for Property, Location, and Query

After setting the filters, I was able to search for queries that yielded relatively stable impressions, but saw a drop in clicks and click through rate. And I always kept an eye on average position to make sure it didn’t drop heavily.
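As a rough sketch of that check, assuming the filtered search query data for a site has been exported for a "before" and an "after" period: the class name, thresholds, and sample numbers below are hypothetical and are not taken from the client data. The idea is simply to flag queries where impressions held steady, clicks fell sharply, and average position did not slip much, while skipping "meaning" queries.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    impressions: int
    clicks: int
    avg_position: float


def pct_change(before: float, after: float) -> float:
    """Percent change from before to after (negative means a drop)."""
    if before == 0:
        return 0.0
    return (after - before) / before * 100.0


def flag_impacted(before, after, max_impr_shift=15.0, min_click_drop=30.0,
                  max_position_slip=1.0, exclude=("meaning",)):
    """Return (query, click % change) pairs, biggest click drop first.

    All thresholds are illustrative assumptions, not values from the post.
    """
    flagged = []
    for query, old in before.items():
        new = after.get(query)
        # Skip queries missing from the later period and "meaning" style queries.
        if new is None or any(term in query for term in exclude):
            continue
        impressions = pct_change(old.impressions, new.impressions)
        clicks = pct_change(old.clicks, new.clicks)
        slip = new.avg_position - old.avg_position  # positive = ranking got worse
        if (abs(impressions) <= max_impr_shift
                and clicks <= -min_click_drop
                and slip <= max_position_slip):
            flagged.append((query, round(clicks, 1)))
    return sorted(flagged, key=lambda pair: pair[1])


# Made-up numbers, for illustration only.
before = {
    "bang bang lyrics": QueryStats(10000, 2000, 3.0),
    "take me to church meaning": QueryStats(5000, 1000, 2.0),
    "some other lyrics": QueryStats(8000, 1600, 4.0),
}
after = {
    "bang bang lyrics": QueryStats(10500, 700, 3.2),
    "take me to church meaning": QueryStats(5100, 400, 2.0),
    "some other lyrics": QueryStats(8100, 1550, 4.0),
}
print(flag_impacted(before, after))
```

Any query this flags still needs a manual look at the desktop and mobile SERPs, since a click drop alone doesn't prove lyrics are showing for that query.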

From a Google Analytics standpoint, I ran into a similar problem. Top-level statistics wouldn’t cut it. I needed Google organic traffic from the United States only. And then I wanted both Desktop and Mobile Google organic traffic from the United States only (separated). That’s where the power of advanced segments comes in.

I built segments for Desktop Google organic traffic from the United States and Mobile Google organic traffic from the United States. By activating these segments, my reporting isolated that traffic and enabled me to identify trends and changes based on those segments alone. By the way, I wrote a tutorial for how to use segments to analyze Panda hits. You should check that out if you aren’t familiar with segments in GA. You’ll love them, believe me.

Filtering Google Organic Traffic from the United States in GA Using Segments:

Google Analytics Segments for U.S. Desktop Google Organic Traffic
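The same segment logic can be sketched outside of GA, assuming session-level rows have been exported with source, medium, country, device, and landing page fields. The field names and values below are assumptions about such an export, not GA's actual schema. Each segment is just a set of conditions that must all be true.

```python
from collections import Counter


def in_segment(session: dict, device: str) -> bool:
    """True for Google organic sessions from the United States on one device type."""
    return (session["source"] == "google"
            and session["medium"] == "organic"
            and session["country"] == "United States"
            and session["device"] == device)


def sessions_by_page(sessions, device: str) -> Counter:
    """Count sessions per landing page inside the segment."""
    counts = Counter()
    for session in sessions:
        if in_segment(session, device):
            counts[session["landing_page"]] += 1
    return counts


# Hypothetical rows, for illustration only.
sessions = [
    {"source": "google", "medium": "organic", "country": "United States",
     "device": "desktop", "landing_page": "/lyrics/a"},
    {"source": "google", "medium": "organic", "country": "United States",
     "device": "mobile", "landing_page": "/lyrics/a"},
    {"source": "google", "medium": "organic", "country": "Canada",
     "device": "desktop", "landing_page": "/lyrics/a"},
    {"source": "bing", "medium": "organic", "country": "United States",
     "device": "desktop", "landing_page": "/lyrics/a"},
    {"source": "google", "medium": "organic", "country": "United States",
     "device": "desktop", "landing_page": "/lyrics/b"},
]
print(sessions_by_page(sessions, "desktop"))
print(sessions_by_page(sessions, "mobile"))
```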


So, with the right tools and filters in place, I began to dig in. It was fascinating to analyze the queries leading to all three sites now that lyrics hit the SERPs. I cover what I found next. By the way, this post focuses on Google and not Bing. I might write up another post focused on Bing’s lyrics in the SERPs, but I wanted to focus on Google to start.

The Impact of Lyrics in the SERPs – The Data
With multiple computers up and running, two phones, and two tablets, I began to dig in. I wanted to find queries and songs that typically drove traffic to the three sites that now yielded lyrics in the SERPs. And then I wanted to see what happened once those lyrics hit the SERPs, the impact on clicks, traffic, etc. I have documented a number of examples below. By the way, there are many more examples, but I wanted to just provide a sampling below. Here we go…


Spill The Wine Lyrics by War
Google Organic Desktop US Traffic Down 73%
Google Organic Mobile US Traffic Down 65%
GWT Clicks Down 56%


Sister Ray Lyrics by The Velvet Underground
Google Organic Desktop US Traffic Down 73%
Google Organic Mobile US Traffic Down 56%
GWT Clicks Down 84%


Rude Lyrics by Magic!
Google Organic Desktop US Traffic Down 41%
Google Organic Mobile US Traffic Down 32%
GWT Clicks Down 55%


Bang Bang Lyrics by Jessie J, Nicki Minaj and Ariana Grande
Google Organic Desktop US Traffic Down 32%
Google Organic Mobile US Traffic Down 47%
GWT Clicks Down 66%


Fireproof Lyrics by One Direction
Google Organic Desktop US Traffic Down 44%
Google Organic Mobile US Traffic Down 40%
GWT Clicks Down 29%


All of Me Lyrics by John Legend
Google Organic Desktop US Traffic Down 39%
Google Organic Mobile US Traffic Down 14%
GWT Clicks Down 61%


Country Road Lyrics by John Denver
Google Organic Desktop US Traffic Down 62%
Google Organic Mobile US Traffic Down 45%
GWT Clicks Down 36%


Come Sail Away Lyrics by Styx
Google Organic Desktop US Traffic Down 43%
Google Organic Mobile US Traffic Down 27%
GWT Clicks Down 55%


Midnight Special Lyrics by Huddie William Ledbetter
Google Organic Desktop US Traffic Down 53%
Google Organic Mobile US Traffic Down 85%
GWT Clicks Down 33%


Comfortably Numb Lyrics by Pink Floyd
Google Organic Desktop US Traffic Down 46%
Google Organic Mobile US Traffic Down 17%
GWT Clicks Down 43%


Yes, There’s A Serious Impact
As you can see from the statistics above, both desktop and mobile traffic to the song pages dropped significantly since lyrics hit the SERPs (for songs that yield lyrics in the SERPs). Again, these songs showed stable impressions during the timeframe, yet showed large drops in clicks from the SERPs, and subsequent traffic to the three lyrics sites I analyzed.

Some users were clearly getting what they wanted when searching for lyrics and finding that information in the SERPs. And in mobile search, the lyrics take up the entire results page. So it’s no surprise to see some mobile numbers absolutely plummet after lyrics hit the SERPs.

What Could Lyrics Sites Do?
Above, I provided a sampling of what I saw while analyzing the impact of lyrics in the U.S. Google SERPs. Clearly there’s a large impact. The good news for lyrics sites is that there are several core factors helping them right now.

  • This is only in the United States.
  • The lyrics only trigger when the query is structured in certain ways. For example, “magic rude lyrics” yields lyrics where “rude lyrics magic” does not. Also, if additional words are entered in the query, lyrics will not be shown (like “meaning” which I explained earlier.)
  • Not all songs are impacted (yet). I found many examples of songs that did not yield lyrics in the SERPs. Again, this is probably due to licensing issues.

If you look at the overall traffic numbers for the sites I analyzed (and the other sites I have access to), Google organic traffic overall has not been heavily impacted. Taking all global Google organic traffic into account, and across all songs, you clearly don’t see the huge drop like I showed you for the songs listed above. That said, this is still a grave situation for many lyrics sites. The content they have licensed and provided on their sites is now being surfaced directly in the SERPs. If this expands to more songs, more countries, and for additional queries, then it can have a massive impact on their businesses. Actually, it could very well end their businesses.

Moving forward, lyrics sites need to up their game from a functionality and value proposition standpoint. If Google can easily add lyrics to the SERPs, then lyrics sites need to keep driving forward with what Google can’t do (at least for now). They should develop new functionality, strengthen community engagement, provide member benefits, include more data and media for artists and songs, provide a killer mobile experience, etc.

Remember, there are many people searching for additional information related to songs. For example, people want to know the meaning of lyrics and seem to enjoy the community engagement about learning what each lyric means. And lyrics don’t trigger in the SERPs for those queries (yet).

And then you have the next generation of devices, social networks, messaging apps, gaming consoles, connected cars, etc. I would start thinking about how people are going to search for lyrics across new devices and in new environments. That’s a new frontier and it would be smart to begin building and testing lyrics applications that can work in those new environments. Mobile, wearables, voice search, cars, etc. provide a wealth of opportunity for business owners focused on music. It just takes the right ideas, time, resources, and of course, money.

But I’ll stop there. I think that topic can be an entire post and this one is getting too long already. :)


Summary – Moving Forward With (Expanding) Lyrics in the SERPs
In the short-term, it’s hard to say how this will expand. Google and Bing might drop the effort and keep things as-is, or they could keep expanding lyrics in the SERPs until every song and every country is covered.

Based on the current song and geography limits in Google and Bing, lyrics websites are still surviving, and especially for searches outside the United States. It will be interesting to watch this space over time, especially since I have several clients adapting to the new lyrics world as I write this post.

From an SEO standpoint, between Google Panda and content surfacing in the SERPs, lyrics web sites are fighting a battle on two fronts. If it’s not Panda attacking the site one night, it’s the Knowledge Graph pushing song lyrics front and center in the SERPs. And in this day and age, wars are won by technology, not brute strength. So lyrics sites need to up their engineering prowess, think two to three steps ahead of the industry, and then execute quickly and at a very high level.

That’s how they can survive and prosper in the coming years. Of course, that’s until we have a Google chip implanted in our brains that instantly provides the lyrics to every song ever written, from around the world, since the beginning of time. Think about that for a second.



Friday, January 23rd, 2015

Insidious Thin Content on Large-Scale Websites and Its Impact on Google Panda

Insidious Thin Content and Google Panda

If you’ve read some of my case studies in the past, then you know Panda can be a real pain in the neck for large-scale websites. For example, publishers, ecommerce retailers, directories, and other websites that often have tens of thousands, hundreds of thousands, or millions of pages indexed. When sites grow that large, with many categories, directories, and subdomains, content can easily get out of control. For example, I sometimes surface problematic areas of a website that clients didn’t even know existed! There’s usually a gap of silence on the web conference when I present situations like that. But once everyone realizes that low quality content is in fact present, then we can proceed with how to rectify the problems at hand.

And that’s how you beat Panda. Surfacing content quality problems and then quickly fixing those problems. And if companies don’t surface and rectify those problems, then they remain heavily impacted by Panda. Or even more maddening, they can go in and out of the gray area of Panda. That means they can get hit, recover to a degree, get hit again, recover, etc. It’s a maddening place to live SEO-wise.

The Insidious Thin Content Problem
The definition of insidious is:
“proceeding in a gradual, subtle way, but with harmful effects”

And that’s exactly how thin content can increase over time on large-scale websites. The problem usually doesn’t rear its ugly head in one giant blast (although that can happen). Instead, it can gradually increase over time as more and more content is added and edited, technical changes are made, new updates get pushed to the website, new partnerships are formed, etc. And before you know it, boom, you’ve got a huge thin content problem and Panda is knocking on the door. Or worse, it’s already knocked down your door.

So, based on recent Panda audits, I wanted to provide three examples of how an insidious thin content problem can get out of control on larger-scale websites. My hope is that you can review these examples and then apply the same model to your own business.


Insidious Thin Content: Example 1
During one recent audit, I ended up surfacing a number of pages that seemed rogue. For example, they weren’t linked to from many other pages on the site, didn’t contain the full site template, and only contained a small amount of content. And the content didn’t really have any context about why it was there, what users were looking at, etc. I found that very strange.

Thin Content with No Site Template

So I dug into that issue, and started surfacing more and more of that content. Before I knew it, I was up to 4,100 pages of that content! Yes, there were over four thousand rogue, thin pages based on that one find.

To make matters even worse, when checking how Google was crawling and indexing that content, you could quickly see major problems. Using both fetch and render in Google Webmaster Tools and checking the cache of the pages revealed Google couldn’t see most of the content. So the thin pages were even thinner than I initially thought. They were essentially blank to Google.

Thin Content and Content Won't Render

When bringing this up to my client, they did realize the pages were present on the site, but didn’t understand the potential impact Panda-wise. After explaining more about how Panda works, and how thin content equates to giant pieces of bamboo, they totally got it.

I explained that they should either immediately 404 that content or noindex it. And if they wanted to quicken that process a little, then 410 the content. Basically, if the pages should not be on the site for users or Google, then 404 or 410 them. If the pages are beneficial for users for some reason, then noindex the content using the meta robots tag.
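That decision can be written down as a tiny helper. This is only a sketch under the assumptions stated in the comments: the function name and flags are hypothetical, and whether a page is "useful to users" is a judgment call made during the audit, not something code can decide.

```python
NOINDEX_TAG = '<meta name="robots" content="noindex">'


def remediation(useful_to_users: bool, speed_up_removal: bool = False):
    """Return (http_status, meta_robots_tag_or_None) for a thin page.

    useful_to_users: the page should stay live for visitors, so keep
        serving it (200) but add a meta robots noindex tag.
    speed_up_removal: for pages being removed, answer 410 (Gone)
        instead of 404 (Not Found).
    """
    if useful_to_users:
        return 200, NOINDEX_TAG
    return (410 if speed_up_removal else 404), None


print(remediation(useful_to_users=True))
print(remediation(useful_to_users=False))
print(remediation(useful_to_users=False, speed_up_removal=True))
```

The noindex tag goes in the page's head section and only works if the page can still be crawled.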

So, with one finding, my client will nuke thousands of pages of thin content from their website (which had been hammered by Panda). That will sure help and it’s only one finding based on a number of core problems I surfaced on the site during my audit. Again, the problem didn’t manifest itself overnight. Instead, it took years of this type of content building on the site. And before they knew it, Panda came and hammered the site. Insidious.


Insidious Thin Content: Example 2
In another audit I recently conducted, I kept surfacing thin pages that basically provided third party videos (which were often YouTube videos embedded in the page). So you had very little original content and then just a video. After digging into the situation, I found many pages like this. At this time, I estimate there could be as many as one thousand pages like this on the site. And I still need to analyze more of the site and crawl, so it could be even worse…

Now, the web site has been around for a long time, so it’s not like all the thin video pages popped up overnight. The site produces a lot of content, but would continually supplement stronger content with this quick approach that yielded extremely thin and unoriginal content. And as time went on, the insidious problem yielded a Panda attack (actually, multiple Panda attacks over time).

Thin Video Pages and Google Panda

Note, this was not the only content quality problem the site suffered from. It never is just one problem that causes a Panda attack by the way. I’ve always said that Panda has many tentacles and that low quality content can mean several things. Whenever I perform a deep crawl analysis and audit on a severe Panda hit, I often surface a number of serious problems. This was just one that I picked up during the audit, but it’s an important find.

By the way, checking Google organic traffic to these pages revealed a major decrease in traffic over time… Even Google was sending major signals to the site that it didn’t like the content. So there are many thin video pages indexed, but almost no traffic. Running a Panda report showing the largest drop in traffic to Google organic landing pages after a Panda hit reveals many of the thin video pages in the list. It’s one of the reasons I recommend running a Panda report once a site has been hit. It’s loaded with actionable data.
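For readers who want to approximate that kind of report with exported data, here is a minimal sketch. It assumes two dictionaries of Google organic sessions per landing page, one for a period before the hit and one for an equal-length period after. The function name and sample numbers are hypothetical, and this is not the exact report described above.

```python
def largest_drops(before: dict, after: dict, top_n: int = 10):
    """Rank landing pages by absolute loss of sessions, worst first.

    Returns (page, sessions_before, sessions_after, change) tuples.
    Pages missing from the later period count as zero sessions.
    """
    rows = []
    for page, old in before.items():
        new = after.get(page, 0)
        rows.append((page, old, new, new - old))
    rows.sort(key=lambda row: row[3])  # most negative change first
    return rows[:top_n]


# Made-up numbers, for illustration only.
before_hit = {"/video/clip-1": 900, "/guide/deep-dive": 1200, "/video/clip-2": 400}
after_hit = {"/video/clip-1": 50, "/guide/deep-dive": 1100}

for row in largest_drops(before_hit, after_hit, top_n=2):
    print(row)
```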

So now I’m working with my client to identify all pages on the site that can be categorized as thin video pages. Then we need to determine which are ok (there aren’t many), which are truly low quality, which should be noindexed, and which should be nuked. And again, this was just one problem… there are a number of other content quality problems riddling the site.
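A first pass at finding candidates can be automated before the manual review. The sketch below assumes you already have the HTML of each crawled page. The 150-word threshold and the YouTube iframe check are assumptions chosen for illustration, and a flagged page still needs a human decision on whether it is ok, noindexed, or removed.

```python
from html.parser import HTMLParser


class PageScan(HTMLParser):
    """Counts visible words and YouTube iframe embeds in one HTML page."""

    def __init__(self):
        super().__init__()
        self.words = 0
        self.video_embeds = 0
        self._skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "iframe":
            src = dict(attrs).get("src") or ""
            if "youtube.com/embed" in src or "youtube-nocookie.com/embed" in src:
                self.video_embeds += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words += len(data.split())


def is_thin_video_page(html: str, max_words: int = 150) -> bool:
    """True when the page embeds a video and has very little text around it."""
    scan = PageScan()
    scan.feed(html)
    scan.close()
    return scan.video_embeds > 0 and scan.words < max_words


sample = ('<html><body><h1>Funny clip</h1><p>Watch this.</p>'
          '<iframe src="https://www.youtube.com/embed/abc123"></iframe>'
          '</body></html>')
print(is_thin_video_page(sample))
```

Note that this counts every word in the markup, including navigation and footer text, so on a real site template the threshold would need adjusting.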


Insidious Thin Content: Example 3

During another Panda project, I surfaced an interesting thin content problem. And it’s one that grew over time to create a pretty nasty situation. I surfaced many urls that simply provided a quick update about a specific topic. Those updates were typically just a few lines of content all within a specific category. The posts were extremely thin… and were sometimes only a paragraph or two without any images, visuals, links to more content, etc.

Thin Quick Updates and Google Panda

Upon digging into the entire crawl, I found over five thousand pages that fit this category of thin content. Clearly this was a contributing factor to the significant Panda hit the site experienced. So I’m working with my client on reviewing the situation and making the right decision with regard to handling that content. Most of the content will be noindexed versus being removed, since there are reasons outside of SEO that need to be taken into account. For example, partnerships, contractual obligations, etc.

Over time, you can see that some of these pages actually used to rank well and drive organic search traffic from Google. That’s probably due to the authority of the site. I’ve seen that many times since 2011 when Panda first rolled out. A site builds enormous SEO power and then starts pumping out thinner, lower-quality content. And then that content ends up ranking well. And when users hit the thin content from Google, they bounce off the site quickly (and often back to the search results). In aggregate, low user engagement, high bounce rates, and low dwell time can be a killer Panda-wise. Webmasters need to avoid that situation like the plague. You can read my case study about “6 months with Panda” to learn more about that situation.


Summary – Stopping The Insidious Thin Content Problem is Key For Panda Recovery
So there you have it. Three quick examples of insidious thin content problems on large-scale websites. They often don’t pop up overnight, but instead, they grow over time. And before you know it, you’ve got a thick layer of bamboo on your site attracting the mighty Panda. By the way, there are many other examples of insidious thin content that I’ve come across during my Panda work and I’ll try and write more about this problem soon. I think it’s incredibly important for webmasters to understand how the problem can grow, the impact it can have, and how to handle the situation.

In the meantime, I’ll leave you with some quick advice. My recommendation to any large-scale website is to truly understand your content now, identify any Panda risks, and take action sooner rather than later. It’s much better to be proactive and handle thin content in the short-term versus dealing with a major Panda hit after the fact. By the way, the last Panda update was on 10/24, and I’m fully expecting another one soon. Google rolled out an update last year on 1/11/14, so we are definitely due for one soon. I’ll be sure to communicate what I’m seeing once the update rolls out.




Monday, December 29th, 2014

XML Sitemaps – 8 Facts, Tips, and Recommendations for the Advanced SEO

XML Sitemaps for Advanced SEOs

After publishing my last post about dangerous rel canonical problems, I started receiving a lot of questions about other areas of technical SEO. One topic in particular that seemed to generate many questions was how to best use and set up xml sitemaps for larger and more complex websites.

Sure, in its most basic form, webmasters can provide a list of urls that they want the search engines to crawl and index. Sounds easy, right? Well, for larger and more complex sites, the situation is often not so easy. And if the xml sitemap situation spirals out of control, you can end up feeding Google and Bing thousands, hundreds of thousands, or millions of bad urls. And that’s never a good thing.

While helping clients, it’s not uncommon for me to audit a site and surface serious errors with regard to xml sitemaps. And when that’s the case, websites can send Google and Bing mixed signals, urls might not get indexed properly, and both engines can end up losing trust in your sitemaps. And as Bing’s Duane Forrester once said in this interview with Eric Enge:

“Your Sitemaps need to be clean. We have a 1% allowance for dirt in a Sitemap. If we see more than a 1% level of dirt, we begin losing trust in the Sitemap.”

Clearly that’s not what you want happening…

So, based on the technical SEO work I perform for clients, including conducting many audits, I decided to list some important facts, tips, and answers for those looking to maximize their xml sitemaps. My hope is that you can learn something new from the bullets listed below, and implement changes quickly.


1. Use RSS/Atom and XML For Maximum Coverage
This past fall, Google published a post on the webmaster central blog about best practices for xml sitemaps. In that post, they explained that sites should use a combination of xml sitemaps and RSS/Atom feeds for maximum coverage.

Xml sitemaps should contain all canonical urls on your site, while RSS/Atom feeds should contain the latest additions or recently updated urls. XML sitemaps will contain many urls, where RSS/Atom feeds will only contain a limited set of new or recently changed urls.

RSS/Atom Feed and XML Sitemaps

So, if you have new urls (or recently updated urls) that you want Google to prioritize, then use both xml sitemaps and RSS/Atom feeds. Google says by using RSS, it can help them “keep your content fresher in its index”. I don’t know about you, but I like the idea of Google keeping my content fresher. :)

Also, it’s worth noting that Google recommends maximizing the number of urls per xml sitemap. For example, don’t cut up your xml sitemaps into many smaller files (if possible). Instead, use the space you have in each sitemap to include all of your urls. If you don’t, Google explains that “it can impact the speed and efficiency of crawling your urls.” I recommend reading Google’s post to learn how to best use xml sitemaps and RSS/Atom feeds to maximize your efforts. By the way, you can include 50K urls per sitemap and each sitemap must be less than 10MB uncompressed.
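To make those limits concrete, here’s a minimal Python sketch that packs a list of urls into as few sitemap files as the limits allow. The 50K and 10MB numbers come from the paragraph above. The function name and the example.com urls are made up for illustration, and the size check is a simple byte count, not anything official.

```python
from xml.sax.saxutils import escape

MAX_URLS = 50_000                # urls per sitemap, per the limit above
MAX_BYTES = 10 * 1024 * 1024     # 10MB uncompressed, per the limit above

HEADER = ('<?xml version="1.0" encoding="UTF-8"?>\n'
          '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
FOOTER = "</urlset>\n"


def build_sitemaps(urls):
    """Pack urls into as few sitemap documents as the limits allow."""
    base_size = len(HEADER) + len(FOOTER)
    sitemaps, entries, size = [], [], base_size
    for url in urls:
        entry = f"  <url><loc>{escape(url)}</loc></url>\n"
        entry_size = len(entry.encode("utf-8"))
        # Only start a new file when the current one is actually full.
        if entries and (len(entries) >= MAX_URLS or size + entry_size >= MAX_BYTES):
            sitemaps.append(HEADER + "".join(entries) + FOOTER)
            entries, size = [], base_size
        entries.append(entry)
        size += entry_size
    if entries:
        sitemaps.append(HEADER + "".join(entries) + FOOTER)
    return sitemaps
```

For example, 120K short urls would end up in three files (50K, 50K, and 20K) instead of dozens of small ones.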


2. XML Sitemaps By Protocol and Subdomain
I find a lot of webmasters are confused by protocol and subdomains, and both can end up impacting how urls in sitemaps get crawled and indexed.

URLs included in xml sitemaps must use the same protocol and subdomain as the sitemap itself. This means that https urls should not be included in an http sitemap. This also means that urls on one subdomain cannot be included in a sitemap located on a different subdomain. So on and so forth.

XML Sitemaps and Protocol and Subdomains


This is a common problem when sites employ multiple subdomains or they have sections using https and http (like ecommerce retailers). And then of course we have many sites starting to switch to https for all urls, but haven’t changed their xml sitemaps to reflect the changes. My recommendation is to check your xml sitemaps reporting today, while also manually checking the sitemaps. You might just find issues that you can fix quickly.
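If you want to check this manually, a few lines of Python can flag the mismatches. This is only a sketch: the function name is hypothetical, the example.com urls are placeholders, and it simply compares the protocol and hostname of each url against the sitemap’s own location.

```python
from urllib.parse import urlsplit


def mismatched_urls(sitemap_url, urls):
    """Return urls whose protocol or hostname differs from the sitemap's."""
    sitemap = urlsplit(sitemap_url)
    expected = (sitemap.scheme, sitemap.hostname)
    mismatches = []
    for url in urls:
        parts = urlsplit(url)
        if (parts.scheme, parts.hostname) != expected:
            mismatches.append(url)
    return mismatches


if __name__ == "__main__":
    found = mismatched_urls(
        "http://www.example.com/sitemap.xml",
        [
            "http://www.example.com/products/",   # same protocol and subdomain
            "https://www.example.com/checkout/",  # https url in an http sitemap
            "http://blog.example.com/post-1/",    # different subdomain
        ],
    )
    for url in found:
        print("Mismatch:", url)
```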


3. Dirty Sitemaps – Hate Them, Avoid Them
When auditing sites, I often crawl the xml sitemaps myself to see what I find. And it’s not uncommon to find many urls that resolve with non-200 header response codes. For example, urls that 404, 302, 301, return 500s, etc.

Dirty XML Sitemaps

You should only provide canonical urls in your xml sitemaps. You should not provide non-200 header response code urls (or non-canonical urls that point to other urls). The engines do not like “dirty sitemaps” since they can send Google and Bing on a wild goose chase throughout your site. For example, imagine driving Google and Bing to 50K urls that end up 404ing, redirecting, or not resolving. Not good, to say the least.

Remember Duane’s comment from earlier about “dirt” in sitemaps. The engines can lose trust in your sitemaps, which is never a good thing SEO-wise. More about crawling your sitemaps later in this post.
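To put a number on “dirt”, here’s a rough Python sketch. It assumes you already have the header response code for each url in the sitemap (from a crawl, for example) and, optionally, the rel canonical href found on each page. The function names are hypothetical, and the 1% threshold is simply the Bing figure from Duane’s quote, not a published Google number.

```python
DIRT_THRESHOLD = 0.01  # the 1% allowance from Duane's quote (Bing)


def dirty_urls(status_by_url, canonical_by_url=None):
    """Return sitemap urls that are non-200 or canonicalized elsewhere."""
    canonical_by_url = canonical_by_url or {}
    dirty = []
    for url, status in status_by_url.items():
        # If no canonical was recorded, treat the url as self-canonical.
        canonical = canonical_by_url.get(url, url)
        if status != 200 or canonical != url:
            dirty.append(url)
    return dirty


def dirt_ratio(status_by_url, canonical_by_url=None):
    """Share of sitemap urls that count as dirt (0.0 for an empty sitemap)."""
    if not status_by_url:
        return 0.0
    return len(dirty_urls(status_by_url, canonical_by_url)) / len(status_by_url)
```

If a sitemap holds 200 urls and three of them 404, redirect, or canonicalize to another page, the ratio is 1.5%, which is already over that 1% line.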


4. View Trending in Google Webmaster Tools
Many SEOs are familiar with xml sitemaps reporting in Google Webmaster Tools, which can help surface various problems, while also providing important indexation statistics. Well there’s a hidden visual gem in the report that’s easy to miss. The default view will show the number of pages submitted in your xml sitemaps and the number indexed. But if you click the “sitemaps content” box for each category, you can view trending over the past 30 days. This can help you identify bumps in the road, or surges, as you make changes.

For example, check out the trending below. You can see the number of images submitted and indexed drop significantly over a period of time, only to climb back up. You would definitely want to know why that happened, so you can avoid problems down the line. Sending this to your dev team can help them identify potential problems that can build over time.

XML Sitemaps Trending in Google Webmaster Tools


5. Using Rel Alternate in Sitemaps for Mobile URLs
When using mobile urls (like m.), it’s incredibly important to ensure you have the proper technical SEO setup. For example, you should be using rel alternate on the desktop pages pointing to the mobile pages, and then rel canonical on the mobile pages pointing back to the desktop pages.

Although not an approach I often push for, you can provide rel alternate annotations in your xml sitemaps. The annotations look like this:

Rel Alternate in XML Sitemaps


It’s worth noting that you should still add rel canonical to the source code of your mobile pages pointing to your desktop pages.
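As a rough illustration, here’s a minimal Python sketch that writes a sitemap with rel alternate annotations for mobile urls. It follows the xhtml:link format as I understand it from Google’s documentation. The function name, the example.com urls, and the media query value are placeholders, so adjust them for your own setup.

```python
from xml.sax.saxutils import escape, quoteattr

# Example media query only. Use whatever matches your mobile breakpoint.
MEDIA = "only screen and (max-width: 640px)"


def mobile_sitemap(pairs):
    """pairs: iterable of (desktop_url, mobile_url) tuples."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"',
        '        xmlns:xhtml="http://www.w3.org/1999/xhtml">',
    ]
    for desktop, mobile in pairs:
        lines += [
            "  <url>",
            f"    <loc>{escape(desktop)}</loc>",
            f'    <xhtml:link rel="alternate" media={quoteattr(MEDIA)}'
            f" href={quoteattr(mobile)} />",
            "  </url>",
        ]
    lines.append("</urlset>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(mobile_sitemap([
        ("http://www.example.com/page-1/", "http://m.example.com/page-1/"),
    ]))
```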


6. Using hreflang in Sitemaps for Multi-Language Pages
If you have pages that target different languages, then you are probably already familiar with hreflang. Using hreflang, you can tell Google which pages should target which languages. Then Google can surface the correct pages in the SERPs based on the language/country of the person searching Google.

Similar to rel alternate, you can either provide the hreflang code in a page’s html code (page by page), or you can use xml sitemaps to provide the hreflang code. For example, you could provide the following hreflang attributes when you have the same content targeting different languages:

Hreflang in XML Sitemaps

Just be sure to include a separate <loc> element for each url that contains alternative language content (i.e. all of the sister urls should be listed in the sitemap via a <loc> element).
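Here’s a minimal Python sketch of that idea. Each group of sister urls gets one url entry per language, and every entry lists all of the alternates (including itself). The function name, the language codes, and the example.com urls are placeholders for illustration only.

```python
from xml.sax.saxutils import escape, quoteattr


def hreflang_sitemap(groups):
    """groups: list of {hreflang code: url} dicts, one dict per set of sister urls."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"',
        '        xmlns:xhtml="http://www.w3.org/1999/xhtml">',
    ]
    for group in groups:
        # One <url> entry with its own <loc> for every sister url...
        for url in group.values():
            lines.append("  <url>")
            lines.append(f"    <loc>{escape(url)}</loc>")
            # ...and each entry lists every language version in the group.
            for lang, alternate in group.items():
                lines.append(
                    f'    <xhtml:link rel="alternate" hreflang={quoteattr(lang)}'
                    f" href={quoteattr(alternate)} />"
                )
            lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(hreflang_sitemap([
        {"en": "http://www.example.com/english/",
         "de": "http://www.example.com/deutsch/"},
    ]))
```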


7. Testing XML Sitemaps in Google Webmaster Tools
Last, but not least, you can test your xml sitemaps or other feeds in Google Webmaster Tools. Although easy to miss, there is a red “Add/Test Sitemap” button in the upper right-hand corner of the Sitemaps reporting page in Google Webmaster Tools.

Test XML Sitemaps in Google Webmaster Tools

When you click that button, you can add the url of your sitemap or feed. Once you click “Test Sitemap”, Google will provide results based on analyzing the sitemap/feed. Then you can rectify those issues before submitting the sitemap. I think too many webmasters use a “set it and forget it” approach to xml sitemaps. Using the test functionality in GWT, you can nip some problems in the bud. And it’s simple to use.

Results of XML Sitemaps Test in Google Webmaster Tools


8. Bonus: Crawl Your XML Sitemap Via Screaming Frog
In SEO, you can either test and know, or read and believe. As you can probably guess, I’m a big fan of the former… For xml sitemaps, you should test them thoroughly to ensure all is ok. One way to do this is to crawl your own sitemaps. By doing so, you can identify problematic tags, non-200 header response codes, and other little gremlins that can cause sitemap issues.

One of my favorite tools for crawling sitemaps is Screaming Frog (which I have mentioned many times in my previous posts). By setting the crawl mode to “list mode”, you can crawl your sitemaps directly. Screaming Frog natively handles xml sitemaps, meaning you don’t need to convert your xml sitemaps into another format before crawling (which is awesome).

Crawling Sitemaps in Screaming Frog

Screaming Frog will then load your sitemap and begin crawling the urls it contains. In real-time, you can view the results of the crawl. And if you have Graph View up and running during the crawl, you can visually graph the results as the crawler collects data. I love that feature. Then it’s up to you to rectify any problems that are surfaced.
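If you’d rather script a quick check yourself, the first step is just pulling the urls out of the sitemap. Here’s a small Python sketch using the standard library. The function name is hypothetical, and it only extracts the loc values (it doesn’t request them), so you would still need to crawl the resulting list to see the response codes.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_bytes):
    """Return ("index" or "urlset", [loc values]) for a sitemap document."""
    root = ET.fromstring(xml_bytes)
    # A sitemap index lists other sitemaps. A urlset lists page urls.
    kind = "index" if root.tag.endswith("sitemapindex") else "urlset"
    path = "sm:sitemap/sm:loc" if kind == "index" else "sm:url/sm:loc"
    locs = [loc.text.strip() for loc in root.findall(path, NS) if loc.text]
    return kind, locs


if __name__ == "__main__":
    sample = b"""<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>http://www.example.com/</loc></url>
      <url><loc>http://www.example.com/about/</loc></url>
    </urlset>"""
    print(sitemap_locs(sample))
```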

Graph View in Screaming Frog


Summary – Maximize and Optimize Your XML Sitemaps
As I’ve covered throughout this post, there are many ways to use xml sitemaps to maximize your SEO efforts. Clean xml sitemaps can help you inform the engines about all of the urls on your site, including the most recent additions and updates. It’s a direct feed to the engines, so it’s important to get it right (and especially for larger and more complex websites).

I hope my post provided some helpful nuggets of sitemap information that enable you to enhance your own efforts. I recommend setting some time aside soon to review, crawl, audit, and then refine your xml sitemaps. There may be some low-hanging fruit changes that can yield nice wins. Now excuse me while I review the latest sitemap crawl. :)



Tuesday, December 9th, 2014

6 Dangerous Rel Canonical Problems Based on Crawling 11M+ Pages in 2014

Dangerous Rel Canonical Problems

Based on helping clients with Panda work, Penguin problems, SEO technical audits, etc., I end up crawling a lot of websites. In 2014, I estimate that I crawled over eleven million pages while helping clients. And during those crawls, I often pick up serious technical problems inhibiting the SEO performance of the sites in question.

For example, surfacing response code issues, redirects, thin content, duplicate content, metadata problems, mobile issues, and more.  And since those problems often lie below the surface, they can sit unidentified and unresolved for a long time. It’s one of the reasons I believe SEO technical audits are the most powerful deliverable in all of SEO.

Last week, I found an interesting comment from John Mueller in a Google Webmaster Hangout video. He was speaking about the canonical url tag and explained that Google needs to process rel canonical as a second or third step (at 48:30 in the video). He explained that processing rel canonical signals is not part of the crawling process, but instead, it’s handled down the line. And that’s one reason you can see urls indexed that are canonicalized to other pages. It’s not necessarily a problem, but gives some insight into how Google handles rel canonical.

When analyzing my tweets a few days later, I noticed that specific tweet got a lot of eyeballs and engagement.

Tweet About Rel Canonical and John Mueller of Google


That got me thinking that there are probably several other questions about rel canonical that are confusing webmasters. Sure, Google published a post covering some common rel canonical problems, but that doesn’t cover all of the issues webmasters can face. So, based on crawling over eleven million pages in 2014, I figured I would list some dangerous rel canonical issues I’ve come across (along with how to rectify them). My hope is that some readers can leave this post and make changes immediately. Let’s jump in.


1. Canonicalizing Many URLs To One
When auditing websites I sometimes come across situations where entire sections of content are being canonicalized to one url. The sections might contain dozens of urls (or more), but the site is using the canonical url tag on every page in the section pointing to one other page on the site.

If the site is canonicalizing many pages to one, then it will have little chance of ranking for any of the content on the canonicalized pages. All of the indexing properties will be consolidated to the url used in the canonical url tag (in the href). Rel canonical is meant to handle very similar content at more than one url, and was not meant for handling many pages of unique content pointing to one other page.

When explaining this to clients, they typically didn’t understand the full ramifications of implementing a many to one rel canonical strategy. By the way, the common reason for doing this is to try and boost the rankings of the most important pages on the site. For example, webmasters believe that if they canonicalize 60 pages in a section to the top-level page, then that top-level page will be the all-powerful url ranking in the SERPs. Unfortunately, while they are doing that, they strip away any possibility of the canonicalized pages ranking for the content they hold. And on larger sites, this can turn ugly quickly.

Rel Canonical Many URLs to One
If you have unique pages with valuable content, then do not canonicalize them to other pages… Let those pages be indexed, optimize the pages for the content at hand, and make sure you can rank for all of the queries that relate to that content. When you take the long tail of SEO into account, those additional pages with unique content can drive many valuable visitors to your site via organic search. Don’t underestimate the power of the long tail.


2. Daisy Chaining Rel Canonical
When using the canonical url tag, you want to avoid daisy chaining hrefs. For example, if you were canonicalizing page2.htm to page1.htm, but page1.htm is then canonicalized to page3.htm, then you are sending very strange signals to the engines. To clarify, I’m not referring to actual redirects (like 301s or 302s), but instead, I’m talking about the hrefs used in the canonical url tag.

Here’s an example:
page2.htm includes the following: <link rel="canonical" href="page1.htm" />
But page1.htm includes this: <link rel="canonical" href="page3.htm" />

Daisy Chaining Rel Canonical

While conducting SEO audits, I’ve seen this botched many times, even beyond the daisy chaining. Sometimes page3.htm doesn’t even exist, sometimes it redirects via 301s or 302s, etc.

Overall, don’t send mixed signals to the engines about which url is the canonical one. If you say it’s page1.htm but then tell the engines that it’s page3.htm once they crawl page1.htm, and then botch page3.htm in a variety of ways, you might experience some very strange ranking problems. Be clear and direct via rel canonical.
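Daisy chains are easy to surface once you have the canonical href for each crawled page. Here’s a minimal Python sketch that follows the hrefs and reports how many hops it takes to reach the end. The function name and the dictionary format are my own assumptions for illustration. The input would come from your own crawl data.

```python
def resolve_canonical(url, canonical_of):
    """Follow rel canonical hrefs starting at url.

    canonical_of maps each crawled url to the href in its canonical url tag.
    Returns (final_url, hops, looped). More than one hop is a daisy chain.
    """
    seen = [url]
    current = url
    while True:
        target = canonical_of.get(current)
        # No tag found (or a self-referencing tag) ends the chain.
        if target is None or target == current:
            return current, len(seen) - 1, False
        # Pointing back to a url we already visited means a loop.
        if target in seen:
            return target, len(seen), True
        seen.append(target)
        current = target


if __name__ == "__main__":
    canonical_of = {"page2.htm": "page1.htm", "page1.htm": "page3.htm"}
    final_url, hops, looped = resolve_canonical("page2.htm", canonical_of)
    if hops > 1 or looped:
        print(f"Daisy chain: page2.htm reaches {final_url} after {hops} hops")
```

Any url that takes more than one hop should be updated to point directly at the final canonical url.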


3. Using The Non-Canonical Version
This situation is a little different, but can cause problems nonetheless. I actually just audited a site that used this technique across 2.1M pages. Needless to say, they will be making changes asap. In this scenario, a page is referencing a non-canonical version of the original url via the canonical url tag.  But the non-canonical version actually redirects back to the original url.

For example:
page1.htm includes this: <link rel="canonical" href="page1.htm?id=46" />
But page1.htm?id=46 redirects back to page1.htm

Rel Canonical to Non-Canonical Version of URL

So in a worst-case scenario, this is implemented across the entire site and can impact many urls. Now, Google views rel canonical as a hint and not a directive. So there’s a chance Google will pick up this error and rectify the issue on its end. But I wouldn’t bank on that happening. I would fix rel canonical to point to the actual canonical urls on the site versus non-canonical versions that redirect to the original url (or somewhere else).


4. No Rel Canonical + The Use of Querystring Parameters
This one is simple. I often find websites that haven’t implemented the canonical url tag at all. For some smaller and less complex sites, this isn’t a massive problem. But for larger, more complex sites, this can quickly get out of control.

As an example, I recently audited a website that heavily used campaign tracking parameters (both from external campaigns and from internal promotions). By the way, don’t use campaign tracking parameters on internal promotions… they can cause massive tracking problems. Anyway, many of those urls were getting crawled and indexed. And depending on how many campaigns were set up, some urls had many non-canonical versions being crawled and indexed.

Not Using Rel Canonical With Campaign Parameters

By implementing the canonical url tag, you could signal to the engines that all of the variations of urls with querystring parameters should be canonicalized to the original, canonical url. But without rel canonical in place, you run the risk of diluting the strength of the urls in question (as many different versions can be crawled, indexed, and linked to from outside the site).

Imagine 500K urls indexed with 125K duplicate urls also indexed. And for some urls, maybe there are five to ten duplicates per page. You can see how this can get out of control. It’s easy to set up rel canonical programmatically (either via plugins or your own server-side code). Set it up today to avoid a situation like what I listed above.
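As a minimal sketch of the server-side approach, the Python below strips campaign tracking parameters from a url and builds the canonical url tag for it. It assumes the only parameters you want to drop are the Google Analytics utm_ ones. The function names are hypothetical, and your own list of tracking parameters will probably be longer.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only utm_ campaign parameters are tracking-only on this site.
TRACKING_PREFIXES = ("utm_",)


def canonical_url(url):
    """Drop tracking parameters (and any fragment) from url."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url):
    """Build the canonical url tag for the page at url."""
    href = html.escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


if __name__ == "__main__":
    print(canonical_tag(
        "http://www.example.com/page1.htm?utm_source=newsletter&utm_medium=email"
    ))
```

Parameters that actually change the content (like a product id) are kept, so be careful about which ones you add to the list.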


5. Canonical URL Tag Not Present on Mobile Urls (m. or other)
Mobile has been getting a lot of attention recently (yes, understatement of the year). When clients are implementing an m. approach to mobile handling, I make sure to pay particular attention to the bidirectional annotations on both the desktop and mobile urls. And to clarify, I’m not just referring to a specific m. setup. It can be any mobile urls that your site is using (redirecting from the desktop urls to mobile urls).

For example, Google recommends you add rel alternate on your desktop urls pointing to your mobile urls and then rel canonical on your mobile urls pointing back to your desktop urls.

Not Using Rel Canonical With Mobile URLs

This ensures Google understands that the pages are the same and should be treated as one. Without the correct annotations in place, you are hoping Google understands the relationship between the desktop and mobile pages. But if it doesn’t, you could be providing many duplicate urls on your site that can be crawled and indexed. And on larger-scale websites (1M+ pages), this can turn ugly.
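To check that the annotations really are bidirectional, you can compare the two sides after a crawl. Here’s a rough Python sketch. It assumes you’ve collected the rel alternate href from each desktop page and the rel canonical href from each mobile page. The function name and the dictionary format are hypothetical.

```python
def annotation_problems(desktop_alternate, mobile_canonical):
    """Find broken desktop/mobile annotation pairs.

    desktop_alternate: {desktop url: mobile url from its rel alternate}
    mobile_canonical:  {mobile url: url from its rel canonical}
    Returns a list of (url, problem) tuples.
    """
    problems = []
    # Every rel alternate should be answered by a rel canonical pointing back.
    for desktop, mobile in desktop_alternate.items():
        canonical = mobile_canonical.get(mobile)
        if canonical is None:
            problems.append((mobile, "missing rel canonical"))
        elif canonical != desktop:
            problems.append((mobile, f"rel canonical points to {canonical}"))
    # Every rel canonical should point at a desktop page with rel alternate.
    for mobile, canonical in mobile_canonical.items():
        if desktop_alternate.get(canonical) != mobile:
            problems.append((canonical, "missing or mismatched rel alternate"))
    return problems
```

An empty list means every desktop url and mobile url point at each other as expected.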

Also, contrary to what many think, separate mobile urls can work extremely well for websites (versus responsive or adaptive design). I have a number of clients using mobile urls and the sites rank extremely well across engines. You just need to make sure the relationship is sound from a technical standpoint.


6. Rel Canonical to a 404 (or Noindexed Page)
The last scenario I’ll cover can be a nasty one. This problem often lies undetected until pages start falling out of the index and rankings start to plummet. If a site contains urls that use rel canonical pointing to a 404 or a noindexed page, then the site will have little shot of ranking for the content on those canonicalized pages. You are basically telling the engines that the true, canonical url is a 404 (not found), or a page you don’t want indexed (a page that uses the meta robots tag containing “noindex”).

I had a company reach out to me once during the holidays freaking out because their organic search traffic plummeted. After quickly auditing the site, it was easy to see why. All of their core pages were using rel canonical pointing to versions of that page that returned 404 header response codes. The site (which had over 10M pages indexed) was giving Google the wrong information, and in a big way.

Rel Canonical Pointing to 404 or Noindexed Page
Once the dev team implemented the change, organic search traffic began to surge. As more and more pages sent the correct signals to Google, and Google indexed and ranked the pages correctly, the site regained its traffic. For an authority site like this one, it only took a week or two to regain its rankings and traffic. But without changing the flawed canonical setup, I’m not sure it would ever surge back.

Side Note: This is why I always recommend checking changes in a staging environment prior to pushing them live. Letting your SEO review all changes before they hit the production site is a smart way to avoid potential disaster.
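This is the type of check that’s easy to automate against a staging crawl. Here’s a minimal Python sketch that flags pages whose canonical target doesn’t return a 200 or is noindexed. The function name and the input format are assumptions for illustration. You would fill the dictionaries from your own crawl export.

```python
def bad_canonical_targets(canonical_of, page_info):
    """Flag pages whose rel canonical target can't be indexed.

    canonical_of: {url: href from its canonical url tag}
    page_info:    {url: (header response code, has_noindex)}
    Returns {url: description of the problem}.
    """
    problems = {}
    for url, target in canonical_of.items():
        info = page_info.get(target)
        if info is None:
            problems[url] = "canonical target was not crawled"
            continue
        status, noindex = info
        if status != 200:
            # Covers 404s as well as targets that redirect (301s, 302s).
            problems[url] = f"canonical target returns {status}"
        elif noindex:
            problems[url] = "canonical target is noindexed"
    return problems
```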


Summary – Don’t Botch Rel Canonical
I’ve always said that you need a solid SEO structure in order to rank well across engines. In my opinion, SEO technical audits are worth their weight in gold (and especially for larger-scale websites.) Rel canonical is a great example of an area that can cause serious problems if not handled correctly. And it often lies below the surface, wreaking havoc by sending mixed signals to the engines.

My hope is that the scenarios listed above can help you identify, and then rectify canonical url problems riddling your website. The good news is that the changes are relatively easy to implement once you identify the problems. My advice is to keep rel canonical simple, send clear signals, and be consistent across your website. If you do that, good things can happen. And that’s exactly what you want SEO-wise.



Wednesday, November 26th, 2014

Panda Analysis Using Google Analytics Segments – How To Isolate Desktop, Mobile, and Tablet Traffic From Google

Segments in Google Analytics to Isolate Traffic

In previous posts about Panda analysis, I’ve mentioned the importance of understanding the content that users are visiting from Google organic. Since Google is measuring user engagement, hunting down those top landing pages can often reveal serious content quality problems.

In addition, I’ve written about understanding the devices being used to access your site from the search results. For example, what’s the breakdown of users by desktop, mobile, and tablets from Google organic? If 50% of your visits are from smartphones, then you absolutely need to analyze your site through that lens. If not, you can miss important problems that users are experiencing while visiting your website. And if left unfixed, those problems can lead to a boatload of horrible engagement signals being sent to Google. And that can lead to serious Panda problems.

Panda Help Via Segments in Google Analytics
So, if you want to analyze your content by desktop, mobile, and tablet users through a Panda lens, what’s the best way to achieve that? Well, there’s an incredibly powerful feature in Google Analytics that I find many webmasters simply don’t use. It’s called segmentation and enables you to slice and dice your traffic based on a number of dimensions or metrics.

Segments are non-destructive, meaning that you can apply them to your data and not affect the source of the data. Yes, that means you can’t screw up your reporting. :) In addition, you can apply new segments to previous traffic (they are backwards compatible). So you can build a new segment today and apply it to traffic from six months ago, or longer.

For our purposes today, I’m going to walk you through how to quickly build three new segments. The segments will isolate Google organic traffic from desktop users, mobile users, and tablet users. Then I’ll explain how to use the new segments while analyzing Panda hits.


How To Create Segments in Google Analytics
When you fire up Google Analytics, the “All Sessions” segment is automatically applied to your reporting. So yes, you’ve already been using segments without even knowing it. If you click the “All Sessions” segment, you’ll see a list of additional segments you can choose.

Google Analytics All Sessions Segment

You might be surprised to see a number of segments have been built for you already. They are located in the “System” category (accessed via the left side links). For example, “Direct Traffic”, “AdWords”, “Organic Traffic”, and more.

Google Analytics System Segments


We are going to build custom segments by copying three system segments and then adding more dimensions. We’ll start by creating a custom segment for mobile traffic from Google organic.

1. Access the system segments by clicking “All Sessions” and then clicking the link labeled “System” (located on the left side of the UI).


Google Analytics System Segments


2. Scroll down and find the “Mobile Traffic” segment. To the far right, click the “Actions” dropdown. Then choose “Copy” from the list.


Copying a System Segment in Google Analytics


3. The segment already has “Device Category”, “exactly matches”, and “mobile” as the condition. We are going to add one more condition to the list, which is Google organic traffic. Click the “And” button on the far right. Then choose “Acquisition” and then “Source/Medium” from the dimensions list. Then choose “exactly matches” and select “google/organic” from the list. Note, autocomplete will list the top sources of traffic once you place your cursor in the text box.


Creating a Segment by Adding Conditions


4. Name your segment “Mobile Google Organic” by using the text box labeled “Segment Name” at the top of the window. It’s easy to miss.


Name a Custom Segment in Google Analytics


5. Then click “Save” at the bottom of the create segment window.


Save a Custom Segment in Google Analytics


Congratulations! You just created a custom segment.


Create The Tablet Traffic Segment
Now repeat the process listed above to create a custom segment for tablet traffic from Google organic.  You will begin with the system segment for “Tablet Traffic” and then copy it. Then you will add a condition for Google organic as the source and medium.


Desktop Traffic (Not a default system segment.)
I held off on explaining the “Desktop Traffic” segment, since there’s an additional step in creating one. For whatever reason, there’s not a system segment for isolating desktop traffic. So, you need to create this segment differently. Don’t worry, it’s still easy to do.

We’ll start with the “Mobile Traffic” segment in the “System” list, copy it, and then refine the condition.

1. Click “All Sessions” and then find “Mobile Traffic” in the “System” list. Click “Actions” to the far right and then click “Copy”.


Copying a System Segment in Google Analytics


2. The current condition is set for “Device Category” exactly matching “mobile”. We’ll simply change mobile to “desktop”. Delete “mobile” and start typing “desktop”. Then just select the word “desktop” as it shows up.

Creating a Desktop Segment in Google Analytics


3. Since we want Desktop traffic from Google Organic, we need to add another condition. You can do this by clicking “And” to the far right, selecting “Acquisition”, and then “Source/Medium” from the dropdown. Then select “exactly matches” and enter “google/organic” in the text box. Remember, autocomplete will list the top sources of traffic as you start to type.


Creating a Google Organic Desktop Segment in Google Analytics


4. Name your segment “Desktop Google Organic” and then click “Save” at the bottom of the segment window to save your new custom segment.


Quickly Check Your Segments
OK, at this point you should have three new segments for Google organic traffic from desktop, mobile, and tablets. To ensure you have these segments available, click “All Sessions” at the top of your reporting, and click the “Custom” link on the left. Scroll down and make sure you have all three new segments. Remember, you named them “Desktop Google Organic”, “Mobile Google Organic”, and “Tablet Google Organic”.

If you have them, then you’re good to go. If you don’t, read through the instructions again and create all three segments.
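If it helps to think about what a segment is actually doing, here’s a tiny Python sketch that applies the same two conditions to rows of exported session data. This is not how Google Analytics works internally. The field names and the “google / organic” value (with spaces) are assumptions about how an export might look, so match them to your own data.

```python
# Assumption: the source/medium value appears like this in the export.
GOOGLE_ORGANIC = "google / organic"


def apply_segment(rows, device_category, source_medium=GOOGLE_ORGANIC):
    """Keep rows where both conditions match exactly (the "And" in the UI)."""
    return [
        row for row in rows
        if row["deviceCategory"] == device_category
        and row["sourceMedium"] == source_medium
    ]


def sessions_by_landing_page(rows):
    """Total sessions per landing page for the rows in a segment."""
    totals = {}
    for row in rows:
        page = row["landingPage"]
        totals[page] = totals.get(page, 0) + row["sessions"]
    return totals
```

Since the filter never changes the original rows, you can apply as many segments as you like to the same data, which is the non-destructive idea described earlier.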


Run Panda Reports by Segment
In the past, I’ve explained the importance of running a Panda report in Google Analytics for identifying problematic content. A Panda report isolates landing pages from Google organic that have dropped substantially after a Panda hit. Well, now that you have segments for desktop, mobile, and tablet traffic from Google organic, you can run Panda reports by segment.

For example, click “All Sessions” at the top of your reporting and select “Mobile Google Organic” from the “All” or “Custom” categories. Then visit your “Landing Pages” report under “Behavior” and “Site Content” in the left side menu in GA. Since you have a specific segment active in Google Analytics, the reporting you see will be directly tied to that segment (and filter out any other traffic).

Creating a Google Panda Report Using Custom Segments


Then follow the directions in my previous post to run and export the Panda report. You’ll end up with an Excel spreadsheet highlighting top landing pages from mobile devices that dropped significantly after the Panda hit. Then you can dig deeper to better understand the content quality (or engagement) problems impacting those pages.
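The core arithmetic behind that report is simple. Here’s a minimal Python sketch that compares sessions per landing page before and after a Panda hit for one segment. The function name, the minimum session count, and the 50% drop threshold are my own placeholders, so tune them for your site.

```python
def panda_report(before, after, min_sessions=100, min_drop=0.5):
    """List landing pages that dropped substantially after the hit.

    before/after: {landing page: sessions for the segment in each timeframe}
    Returns (page, sessions before, sessions after, percent drop) tuples,
    largest drop first.
    """
    report = []
    for page, previous in before.items():
        # Skip pages with too little traffic to draw conclusions from.
        if previous < min_sessions:
            continue
        current = after.get(page, 0)
        drop = (previous - current) / previous
        if drop >= min_drop:
            report.append((page, previous, current, round(drop * 100, 1)))
    return sorted(report, key=lambda row: row[3], reverse=True)
```

Run it once per segment (desktop, mobile, and tablet) and compare the lists. Pages that only show up in the mobile list are worth testing on a phone first.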

Combine with Adjusted Bounce Rate (ABR)
User engagement matters for Panda. I’ve documented that point many times in my previous posts about Panda analysis, remediation, and recovery. The more poor engagement signals you send Google, the more bamboo you are building up. And it’s only a matter of time before Panda comes knocking.

So, when analyzing user engagement, many people jump to the almighty Bounce Rate metric to see what’s going on. But here’s the problem. Standard Bounce Rate is flawed. Someone could spend five minutes reading a webpage on your site, leave, and it’s considered a bounce. But that’s not how Google sees it. That would be considered a “long click” to Google and would be absolutely fine.

And this is where Adjusted Bounce Rate shines. If you aren’t familiar with ABR, then read my post about it (including how to implement it). Basically, Adjusted Bounce Rate takes time on page into account and can give you a much stronger view of actual bounce rate. Once you implement ABR, you can check bounce rates for each of the segments you created earlier (and by landing page). Then you can find high ABR pages by segment (desktop, mobile, and tablet traffic).
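To show the difference in numbers, here’s a small Python sketch that calculates both rates from the same sessions. It only illustrates the arithmetic and is not the tracking change you make to implement ABR. The function name is hypothetical, and the 30 second threshold is an assumption, so pick a time that makes sense for your content.

```python
def bounce_rates(sessions, threshold_seconds=30):
    """Return (standard bounce rate, adjusted bounce rate).

    sessions: list of (pageviews, seconds spent on the landing page).
    """
    total = len(sessions)
    if total == 0:
        return 0.0, 0.0
    # Standard: every single-page session is a bounce.
    standard = sum(1 for pages, _ in sessions if pages == 1)
    # Adjusted: a single-page session only counts if the visitor left quickly.
    adjusted = sum(
        1 for pages, seconds in sessions
        if pages == 1 and seconds < threshold_seconds
    )
    return standard / total, adjusted / total


if __name__ == "__main__":
    sample = [(1, 300), (1, 5), (2, 40), (1, 10)]
    standard, adjusted = bounce_rates(sample)
    print(f"Standard: {standard:.0%}, Adjusted: {adjusted:.0%}")
```

In the sample, the visitor who spent five minutes reading a single page counts as a bounce in the standard rate, but not in the adjusted one.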

Combining Adjusted Bounce Rate with Custom Segments


Check Devices By Segment (Smartphones and Tablets)
In addition to running a Panda report, you can also check the top devices being used by people searching Google and visiting your website. Then you can analyze that data to see if there are specific problems per device. And if it’s a device that’s heavily used by people visiting your site from Google organic, then you could uncover serious problems that might lie undetected by typical audits.

GA’s mobile reporting is great, but the default reporting is not by traffic source. But using your new segments, you could identify top devices by mobile and tablet traffic from Google organic. And that’s exactly what you need to see when analyzing Panda hits.

Analyzing Devices with Custom Segments in Google Analytics

For example, imagine you saw very high bounce rates (or adjusted bounce rates) for iPad users visiting from Google organic. Or maybe your mobile segment reveals very low engagement from Galaxy S5 users. You could then test your site via those specific devices to uncover rendering problems, usability problems, etc.


Summary – Isolate SEO Problems Via Google Analytics Segments
After reading this post, I hope you are ready to jump into Google Analytics to create segments for desktop, mobile, and tablet traffic from Google organic. Once you do, you can analyze all of your reporting through the lens of each segment. And that can enable you to identify potential problems impacting your site from a Panda standpoint. I recommend setting up those segments today and digging into your reporting. You might just find some amazing nuggets of information. Good luck.



Monday, October 27th, 2014

Penguin 3.0 Analysis – Penguin Tremors, Recoveries, Fresh Hits, and Crossing Algorithms

Penguin 3.0 Analysis and Findings

Oct 17, 2014 was an important date for many SEOs, webmasters, and business owners. Penguin, which we’ve been waiting over an entire year for, started to roll out. Google’s Gary Illyes explained at SMX East that Penguin 3.0 was imminent, that it would be a “delight” for webmasters, that it would be a new algorithm, and more. So we all eagerly awaited the arrival of Penguin 3.0.

There were still many questions about the next version of Penguin. For example, why has it taken so long to update Penguin, would there be collateral damage, would it actually have new signals, would it roll out more frequently, and more?  So when we saw the first signs of Penguin rolling out, many of us dug in and began to analyze both recoveries and fresh hits. I had just gotten back from SES Denver, where I was presenting about Panda and Penguin, so the timing was interesting to say the least. :)

Since the algorithm is rolling out slowly, I needed enough time and data to analyze the initial update, and then subsequent tremors. And I’m glad I waited ten days to write a post, since there have been several interesting updates already. Now that we’re ten days into the rollout, and several tremors have occurred, I believe I have enough data to write my first post about Penguin 3.0. And it’s probably the first of several as Penguin continues to roll out globally.

“Mountain View, We Have a Problem”
Based on the long delay of Penguin, it was clear that Google was having issues with the algo. Nobody knows exactly what the problems were, but you can guess that the results during testing were less than optimal. The signature of previous Penguin algorithms has been extremely acute up to now. It targeted spammy inbound links on low quality websites. Compare that to an extremely complex algorithm like Panda, and you can see clear differences…

But Panda is about on-site content, which makes it less susceptible to tampering. Penguin, on the other hand, is about external links. And those links can be manipulated. The more Penguin updates that rolled out, the more data you could gain about its signature. And that can lead to very nasty things happening. For example, launching negative SEO campaigns, adding any website to a host of low quality sites that have been previously impacted by Penguin, etc. All of that can muddy the algorithm waters, which can lead to a lot of collateral damage. I won’t harp on negative SEO in this post, but I wanted to bring it up. I do believe that had a big impact on why Penguin took so long to roll out.

My Goal With This Post
I’m going to quickly provide bullets listing what we know so far about Penguin 3.0 and then jump to my findings based on the first ten days of the rollout. I want to explain what I’ve seen in the Penguin trenches, including recoveries, fresh hits, and other interesting tidbits I’ve seen across my travels. In addition, I want to explain the danger of crossing algorithms, which is going on right now. I’ll explain more about Penguin, Panda, and Pirate all roaming the web at the same time, and the confusion that can cause. Let’s dig in.

Here’s what we know so far about Penguin 3.0:

  • Penguin 3.0 started rolling out on 10/17 and was officially announced on 10/21.
  • It’s a global rollout.
  • It’s a refresh and not an update. New signals have not been added. You can read more about the differences between a refresh and update from Marie Haynes.
  • It will be a slow and steady rollout that can take weeks to complete. More about Penguin tremors soon.
  • There was more international impact initially. Then I saw an uptick in U.S. impact during subsequent Penguin tremors.
  • Google has been very quiet about the update. That’s a little strange given the magnitude of Penguin 3.0, how long we have waited, etc. I cover more about the future of Penguin later in this post.


10 Days In – Several Penguin Tremors Already
We are now ten days into the Penguin 3.0 rollout. Based on the nature of this update, I didn’t want to write a post too quickly. I wanted more data, the ability to track many sites during the rollout in order to gauge the impact, fresh hits, and recoveries. And that’s exactly what I’ve done since early Saturday, October 18. Penguin began rolling out the night before and there’s been a lot of movement since then.

When Penguin first rolled out, it was clear to me that it would be a slow and steady rollout. I said that from the beginning. I knew there was potential for disaster (from Google’s standpoint), so there was no way they would roll it out globally all at one time. Instead, I believed they would start rolling out Penguin, heavily analyze the SERPs, adjust the algo where needed, and then push more updates and expand. If you’ve been following my writing over the past few years, then you know I call this phenomenon “tremors”. I have seen this often with Panda, and especially since Panda 4.0. Those tremors were even confirmed by Google’s John Mueller.

Specifically with Penguin, I have seen several tremors since the initial rollout on 10/17. There was significant movement on 10/22, and then I saw even more movement on 10/24. Some sites seeing early recovery saw more impact during the subsequent tremors, while other sites saw their first impact from Penguin during those later tremors.

For example, one client I helped with both Panda and Penguin jumped early on Friday 10/24. You can see their trending below. They are up 48% since Friday:

Penguin 3.0 Recovery During Tremor

That’s awesome, and was amazing to see (especially for the business owner). They have worked very hard over the past year to clean up the site on several fronts, including content, links, mobile, etc. It’s great to see that hard work pay off via multiple algorithm updates (they recovered from Panda in May during Panda 4.0 and now during Penguin 3.0.) It’s been a good year for them for sure. :)

Moving forward, I fully expect to see more tremors as the global rollout continues. That can mean sites seeing fresh impact, while others see more movement beyond the first date that Penguin 3.0 impacted their sites. For example, a site may recover or get hit on 10/17, but see movement up or down during subsequent tremors. We’ve already seen this happen and it will continue throughout the rollout.

More Recoveries During Penguin 3.0
For those battling Penguin for a long time (some since Penguin 2.0 on May 22, 2013), this was a much-anticipated update. Some companies I’ve been helping have worked hard over the past 12-18 months to clean up their link profiles. That means nuking unnatural links and using the disavow tool heavily to rid their site of spammy links.

For those of you unfamiliar with link cleanup, the process is tedious, painful, and time consuming. And of course, you can have the nasty replicating links problem, which I have seen many times with spammy directories. That’s when unnatural links replicate across other low quality directories. Websites I’ve been helping with this situation must continually analyze and clean their link profiles. You simply can’t get rid of the problem quickly or easily. It’s a nasty reminder to never go down the spammy linkbuilding path again.

For example, here’s a site that had hundreds of spammy links pop up in the fall of 2014. They had no idea this was going on… 

Penguin 3.0 and New Spammy Links


When sites that have been working hard to rectify their link problems experience a Penguin recovery, it’s an amazing feeling. Some of the sites I’ve been helping have seen a nice bounce-back via Penguin 3.0. I’ll quickly cover two of those recoveries below.

The first is an ecommerce retailer that unfortunately took a dangerous path a few years ago. They hired several SEO companies over a number of years and each ended up building thousands of spammy links. It’s a similar story that’s been seen many times since Penguin first arrived. You know, an SMB trying to compete in a tough space, ends up following the wrong strategy, does well in the short-term, and then gets pummeled by Penguin.

The site was not in good shape when they first contacted me. So we tackled the unnatural link profile head on. I heavily analyzed their link profile, flagged many spammy links, they had a small team working on link removals, and whatever couldn’t be removed was disavowed. We updated the disavow file several times over a four to five month period.

But, and this is a point too many Penguin victims will be familiar with, we were done with link cleanup work in the spring of 2014! Yes, we had done everything we could, but simply needed a Penguin refresh or update. Surely that would happen soon, right?… No way. We had to wait until October 17, 2014 for that to happen. The good news is that this site saw positive impact immediately. You can see the increase in impressions and clicks below starting on 10/17. And Google organic traffic is up 52% since Penguin rolled out.

Penguin 3.0 Recovery on 10/17/14


The next recovery I’ll quickly explain started on 10/17 and saw subsequent increases during the various Penguin tremors I mentioned earlier. They saw distinct movement on 10/17, 10/22, and then 10/25. The site saw a pretty big hit from Penguin 2.0 and then another significant hit from Penguin 2.1 (where Google turned up the dial). The website’s link profile was riddled with exact match anchor text from low quality sites.

The site owner actually removed or nofollowed a good percentage of unnatural links. You can see the impact below. Notice the uptick in trending during the various tremors I mentioned.

Penguin 3.0 Recovery During Tremors


A Reality Check – Some Websites Left Hanging But Rollout Is Not Complete
I must admit, though, I know of several companies that are still waiting for Penguin recovery that should recover during Penguin (to some level). They worked hard just like the companies I listed above. They cleaned up their link profiles, heavily used the disavow tool, worked tirelessly to fix their Penguin problem, but have not seen any impact yet from Penguin 3.0. And many other companies have been complaining about the same thing. But again, Google said the full rollout could take weeks to complete… so it’s entirely possible that they will recover, but at some point over the next few weeks.


A Note About Disavow Errors
It’s worth noting that one client of mine battling Penguin made a huge mistake leading up to Penguin 3.0. They decided to update their disavow file in late September (without my help), and the file contained serious errors. They didn’t catch that upon submission. I ended up noticing something strange in the email from Google Webmaster Tools regarding the number of domains being disavowed. The total number of domains being recorded by GWT was a few hundred fewer than what was listed in the disavow file prior to the latest submission. And those missing few hundred domains encompassed thousands of spammy links. I contacted my client immediately and they rectified the disavow file errors quickly and re-uploaded it.

The website has not recovered yet (although it absolutely should to some level). I have no idea if that disavow glitch threw off Penguin, or if this site is simply waiting for a Penguin tremor to recover. But it’s worth noting.
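
A quick sanity check before uploading can catch this type of problem. Below is a minimal Python sketch that counts the entries in a disavow file and flags lines that look malformed. The disavow format accepts comment lines starting with #, domain: entries, and full URLs. The specific checks for what counts as "suspect" are my own assumptions, and the sample domains are hypothetical.

```python
from urllib.parse import urlparse


def check_disavow(text):
    """Split disavow file text into domain entries, URL entries, and suspect lines."""
    domains, urls, suspect = set(), set(), []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are skipped
        if line.lower().startswith("domain:"):
            domain = line[len("domain:"):].strip().lower()
            # Assumption: a domain entry should be a bare hostname
            # (no scheme, no path, no spaces).
            if domain and "/" not in domain and " " not in domain and "." in domain:
                domains.add(domain)
            else:
                suspect.append((number, raw))
            continue
        parsed = urlparse(line)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            urls.add(line)
        else:
            suspect.append((number, raw))
    return domains, urls, suspect


# Hypothetical file contents, for illustration only.
SAMPLE = """# spammy directories found during the audit
domain:spammy-directory.example
domain: http://bad-links.example/
http://article-farm.example/post-1
doman:typo.example
"""

domains, urls, suspect = check_disavow(SAMPLE)
print(f"{len(domains)} domains, {len(urls)} URLs, {len(suspect)} suspect lines")
for number, raw in suspect:
    print(f"line {number}: {raw}")
```

If the domain count here doesn't match the number reported in the confirmation email from GWT, dig into the file before assuming everything was processed.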


Fresh Penguin Hits
Now let’s move to the negative side of Penguin 3.0. There have been many fresh hits since 10/17 and I’ve been heavily analyzing those drops. It didn’t take long to see that the same old link tactics were being targeted (similar to previous versions of Penguin). And my research supports that Penguin 3.0 was a refresh and not a new algorithm.

For example, exact match anchor text links from spammy directories, article marketing, comment spam, forum spam, etc. Every fresh hit I analyzed yielded a horrible link profile using these tactics. These were clear Penguin hits… I could tell just by looking at the anchor text distribution that they were in serious Penguin danger.

For example, here’s the anchor text distribution for a site hit by Penguin 3.0. Notice all of the exact match anchor text?

Anchor Text Distribution for Fresh Penguin 3.0 Hit

For those of you new to SEO, this is not what a natural link profile looks like. Typically, there is little exact match anchor text, brand terms show up heavily, URLs are used to link to pages, generic phrases appear, etc. If your top twenty anchor text terms are filled with exact match or rich anchor text, then you are sending “fresh fish” signals to Google. And Google will respond by sending a crew of Penguins your way. The end result will not be pretty.

Hit Penguin 3.0
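
To check your own anchor text distribution, here is a minimal Python sketch that works on a list of anchors exported from your link tool of choice. The brand_terms and money_terms lists are hypothetical inputs you would supply yourself, the categories are my own rough buckets, and the sample anchors are made up. It is a rough gauge, not a Penguin detector.

```python
from collections import Counter


def anchor_distribution(anchors, top_n=20):
    """Return the top anchors as (text, count, share of all links)."""
    counts = Counter(a.strip().lower() for a in anchors if a.strip())
    total = sum(counts.values())
    return [(text, n, n / total) for text, n in counts.most_common(top_n)]


def classify(anchor, brand_terms, money_terms):
    """Bucket one anchor as url, brand, exact/rich, or generic/other."""
    text = anchor.strip().lower()
    if text.startswith(("http://", "https://", "www.")):
        return "url"
    if any(b in text for b in brand_terms):
        return "brand"
    if any(m in text for m in money_terms):
        return "exact/rich"
    return "generic/other"


def category_shares(anchors, brand_terms, money_terms):
    """Return {category: share of all links} across the whole profile."""
    labels = Counter(classify(a, brand_terms, money_terms)
                     for a in anchors if a.strip())
    total = sum(labels.values())
    return {label: n / total for label, n in labels.items()}


# Made-up profile for illustration: heavy on exact match anchor text.
anchors = (["cheap blue widgets"] * 6 + ["Acme Widgets"] * 2
           + ["http://www.acme.example/"] + ["click here"])
shares = category_shares(anchors, brand_terms=["acme"],
                         money_terms=["blue widgets"])
for label, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label}: {share:.0%}")
```

If the exact/rich bucket dominates your top twenty anchors, that's the situation I described above.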


Crazy Gets Crazier
I must admit that some fresh hits stood out, and not in a good way. For example, I found one site that started its spammy linkbuilding just two days after Penguin 2.1 rolled out in October of 2013! Holy cow… the business owner didn’t waste any time, right? Either they didn’t know about Penguin or they were willing to take a huge risk. Regardless, that site got destroyed by Penguin 3.0.

I could keep showing you fresh hit information, but unfortunately, you would get bored. They all look similar… spammy links from low quality sites using exact match anchor text. Many of the hits I analyzed were Grade-A Penguin food. It’s like the sites lobbed a softball at Penguin, and Google knocked it out of the park.


Next Update & Frequency?
At SMX East, Gary Illyes explained that the new Penguin algorithm was structured in a way where Google could update Penguin more frequently (similar to Panda). All signs point to a refresh with Penguin 3.0, so I’m not sure we’ll see Penguin updating regularly (beyond the rollout). That’s unfortunate, since we waited over one year to see this refresh…

Also, John Mueller was asked during a webmaster hangout if Penguin would update more frequently. He responded that the “holiday season is approaching and they wouldn’t want to make such a fuss”. If that’s the case, then we are looking at January as the earliest date for the next Penguin refresh or update. So, we have a minimum of three to four months before we see a Penguin refresh or update. And it could very well take longer, given Google’s track record with the Penguin algorithm. It wouldn’t shock me to see the next update in the Spring of 2015.

Check John’s comments at 46:45:


Important – The Crossing of Algorithm Updates (Penguin, Panda, and Pirate)
In the past, I have explained the confusion that can occur when Google rolls out multiple algorithm updates around the same time. The algorithm sandwich from April of 2012 is a great example. Google rolled out Panda, Penguin, and then another Panda refresh all within 10 days. It caused massive confusion and some sites were even hit by both algos. I called that “Pandeguin” and wrote about it here.

Well, we are seeing that again right now. Penguin 3.0 rolled out on 10/17, the latest version of Pirate rolled out late last week, and I’m confident we saw a Panda tremor starting late in the day on Friday 10/24. I had several clients dealing with Panda problems see impact late on 10/24 (starting around 5PM ET).

A bad Panda hit starting late on 10/24:

When Panda and Penguin Collide
A big Panda recovery starting at the same time: 

When Panda and Penguin Collide


I can see the Panda impact based on the large amount of Panda data I have access to (across sites, categories, and countries). But the average business owner does not have access to that data. And Google will typically not confirm Panda tremors. So, if webmasters saw impact on Friday (and I’m sure many have), then serious confusion will ensue. Were they hit by Penguin, Panda, or for some sites dealing with previous DMCA issues, was it actually Pirate?

Update: I now have even more data backing a Panda tremor late on 10/24. I had Paul Macnamara and  Michael Vittori explain they are seeing the same thing. They also provided screenshots of trending for both sites. You can see with Michael’s that the site got hit during the 9/5 Panda update, but recovered on Friday. Paul’s screenshot shows a clear uptick on 10/25 on a site impacted by Panda (no Penguin or Pirate impact at all).
Another Panda recovery during the 10/24 tremor.


Another Panda recovery during the 10/24 tremor.

And this underscores a serious problem for the average webmaster. If you work on fixing your site based on the wrong algorithm, then you will undoubtedly spin your SEO wheels. I’ve seen this many times over the years, and spinning wheels do nothing but waste money, time, and resources.

If you saw impact this past week, you need to make sure you know which algorithm update impacted your site. It’s not easy, when three external algos are roaming the web all at one time. But it’s important to analyze your situation, your search history, and determine what you need to do in order to recover.
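
As a starting point, you can line up the date your Google organic traffic shifted against the dates covered in this post. Here is a minimal Python sketch of that lookup. The event list only contains dates I mentioned above (Pirate is left out since I didn't give an exact date), and the one-day window is my own assumption. A date match is a lead to investigate, not proof.

```python
from datetime import date

# Dates taken from this post. Pirate is omitted because no exact date is given.
KNOWN_EVENTS = {
    date(2014, 9, 5): ["Panda update"],
    date(2014, 10, 17): ["Penguin 3.0 initial rollout"],
    date(2014, 10, 22): ["Penguin 3.0 tremor"],
    date(2014, 10, 24): ["Penguin 3.0 tremor", "Panda tremor"],
}


def candidate_updates(shift_date, window_days=1):
    """List known events within window_days of the date traffic shifted."""
    matches = []
    for event_date, names in sorted(KNOWN_EVENTS.items()):
        if abs((shift_date - event_date).days) <= window_days:
            matches.extend((event_date, name) for name in names)
    return matches


# A site that moved on 10/25 matches two different algorithms.
for event_date, name in candidate_updates(date(2014, 10, 25)):
    print(event_date.isoformat(), name)
```

Notice that a shift around 10/24 maps to both Penguin and Panda. That's exactly the confusion I'm describing, and it's why you need to dig into links, content, and search history before choosing a remediation plan.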

A Note About Negative SEO
I couldn’t write a post about Penguin 3.0 without mentioning negative SEO. The fear with this latest update was that negative SEO would rear its ugly head. Many thought that the heavy uptick in companies building spammy links to their competitors would cause serious collateral damage.

Theoretically, that can definitely happen (and there are a number of claims of negative SEO since 10/17). Let’s face it, Penguin’s signature is not complicated to break down. So if someone built spammy links to their competitors on sites targeted by Penguin, then those sites could possibly get hit by subsequent Penguin refreshes. Many in the industry (including myself) believe this is one of the reasons it has taken so long for Google to roll out Penguin 3.0. I’m sure internal testing revealed serious collateral damage.

But here’s the problem with negative SEO… it’s very hard to prove that NSEO is the culprit (for most sites). I’ve received many calls since Penguin first rolled out in 2012 with business owners claiming they never set up spammy links that got them hit. But when you dig into the situation, you can often trace the spammy link trail back to someone tied to the company.

That might be a marketing person, agency, SEO company, PR agency, intern, etc.  You can check out my Search Engine Watch column titled Racing Penguin to read a case study of a company that thought negative SEO was at work, when in fact, it was their own PR agency setting up the links. So, although we’ve heard complaints of negative SEO with Penguin 3.0, it’s hard to say if those are accurate claims.

Negative SEO and Penguin 3.0


Penguin 3.0 Impact – What Should You Do Next?

  • If you have been negatively impacted by Penguin 3.0, my advice remains consistent with previous Penguin hits. You need to download all of your inbound links from a number of sources, analyze those links, flag unnatural links, and then remove/disavow them. Then you need to wait for a Penguin refresh or update. That can be months from now, but I would start soon. You never know when the next Penguin update will be…
  • On the flip side, if you have just recovered from a Penguin hit, then you should create a process for checking your links on a monthly basis. Make sure new spammy links are not being built. I have seen spammy links replicate in the past… so it’s important to fully understand your latest links. I wrote a blog post covering how to do this on Search Engine Watch (linked to above). I recommend reading that post and implementing the monthly process.
  • And if you are unsure of which algorithm update impacted your site, then speak with as many people familiar with algo updates as possible. You need to make sure you are targeting the right one with your remediation plan. But as I mentioned earlier, there are three external algos in the wild now (with Penguin, Panda, and Pirate). This inherently brings a level of confusion for webmasters seeing impact.
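
The monthly link check from the second bullet can be sketched in Python as follows. It merges link exports from several sources, reduces them to linking domains, and returns only the domains you haven't already reviewed or disavowed. The function names and sample URLs are hypothetical, and the sketch assumes each export is simply a list of full linking URLs (including http:// or https://).

```python
from urllib.parse import urlparse


def linking_domains(urls):
    """Reduce a list of linking page URLs to a set of lowercase hostnames."""
    domains = set()
    for url in urls:
        host = urlparse(url.strip()).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host:  # URLs without a scheme have no netloc and are skipped
            domains.add(host)
    return domains


def new_domains_to_review(current_exports, previous_domains, disavowed_domains):
    """Merge several link exports and return domains not seen or disavowed before."""
    current = set()
    for export in current_exports:
        current |= linking_domains(export)
    return sorted(current - set(previous_domains) - set(disavowed_domains))


# Hypothetical exports from two link tools, for illustration only.
tool_a = ["http://www.spam-dir.example/widgets", "http://blog.partner.example/review"]
tool_b = ["https://spam-dir.example/other-page", "http://new-directory.example/listing"]

to_review = new_domains_to_review(
    [tool_a, tool_b],
    previous_domains={"blog.partner.example"},
    disavowed_domains={"spam-dir.example"},
)
print(to_review)
```

Anything that shows up in that list each month deserves a manual look. Flagging a domain as unnatural is still a human judgment call.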


Summary – Penguin 3.0 and Beyond
That’s what I have for now. Again, I plan to write more posts soon about the impact of Penguin 3.0, the slow and steady rollout, interesting cases that surface, and more. In the meantime, I highly recommend analyzing your reporting heavily over the next few weeks. And that’s especially the case since multiple algos are running at the same time. It’s a crazy situation, and underscores the complexity of today’s SEO environment. So strap on your SEO helmets, grab a bottle of Tylenol, and fire up Google Webmaster Tools. It’s going to be an interesting ride.




Monday, September 29th, 2014

Panda 4.1 Analysis and Findings – Affiliate Marketing, Keyword Stuffing, Security Warnings, and Deception Prevalent

Panda 4.1 Analysis and Findings

On Tuesday, September 23, Google began rolling out a new Panda update. Pierre Far from Google announced the update on Google+ (on Thursday) and explained that some new signals have been added to Panda (based on user and webmaster feedback). The latter point is worth its own blog post, but that’s not the focus of my post today. Pierre explained that the new Panda update will result in a “greater diversity of high-quality small- and medium-sized sites ranking higher”. He also explained that the new signals will “help Panda identify low-quality content more precisely”.

I first spotted the update late on 9/23 when some companies I have been helping with major Panda 4.0 hits absolutely popped. They had been working hard since May of 2014 on cleaning up their sites from a content quality standpoint, dealing with aggressive ad tactics, boosting credibility on their sites, etc. So it was amazing to see the surge in traffic due to the latest update.

Here are two examples of recovery during Panda 4.1. Both clients have been making significant changes over the past several months:

Panda 4.1 Recovery

Panda 4.1 Recovery Google Webmaster Tools

As a side note, two of my clients made the Searchmetrics winners list, which was released on Friday. :)

A Note About 4.1
If you follow me on Twitter, then you already know that I hate using the 4.1 tag for this update. I do a lot of Panda work and have access to a lot of Panda data. That enables me to see unconfirmed Panda updates (and tremors).  There have been many updates since Panda 4.0, so this is not the only Panda update since May 20, 2014. Not even close actually.

I’ve written heavily about what I called “Panda tremors”, which was confirmed by John Mueller of Google. Also, I’ve done my best to write about subsequent Panda updates I have seen since Panda 4.0 here on my blog and on my Search Engine Watch column. By the way, the latest big update was on 9/5/14, which impacted many sites across the web. I had several clients I’ve been helping with Panda hits recover during the 9/5 update.

My main point here is that 4.1 should be called something else, like 4.75. :) But since Danny Sullivan tagged it as Panda 4.1, and everybody is using that number, then I’ll go with it. The name isn’t that important anyway. The signature of the algo is, and that’s what I’m focused on.


Panda 4.1 Analysis Process
When major updates get rolled out, I tend to dig in full blast and analyze the situation. And that’s exactly what I did with Panda 4.1. There were several angles I took while analyzing P4.1, based on the recoveries and fresh hits I know of (and have been part of).

So, here is the process I used, which can help you understand how and why I came up with the findings detailed in this post.

1. First-Party Known Recoveries
These are recoveries I have been guiding and helping with. They are clients of mine and I know everything that was wrong with their websites, content, ad problems, etc. And I also know how well changes were implemented, if they stuck, how user engagement changed during the recovery work, etc. And of course, I know the exact level of recovery seen during Panda 4.1.

2. Third-Party Known Recoveries
These are sites I know recovered, but I’m not working with directly. Therefore, I use third party tools to help identify increases in rankings, which landing pages jumped in the rankings, etc. Then I would analyze those sites to better understand the current content surging, while also checking the previous drops due to Panda to understand their initial problems.

3. First-Party Known Fresh Hits
Based on the amount of Panda work I do, I often have a number of companies reach out to me with fresh Panda hits. Since these are confirmed Panda hits (large drops in traffic starting when P4.1  rolled out), I can feel confident that I’m reviewing a site that Panda 4.1 targeted. Since Tuesday 9/23, I have analyzed 21 websites (Update: now 42 websites) that have been freshly hit by Panda 4.1. And that number will increase by the end of this week. More companies are reaching out to me with fresh Panda hits… and I’ve been neck deep in bamboo all weekend.

4. Third-Party Unconfirmed Fresh Hits
During my analysis, I often come across other websites in a niche with trending that reveals a fresh Panda hit. Now, third party tools are not always accurate, so I don’t hold as much confidence in those fresh hits.  But digging into them, identifying the lost rankings, the landing pages that were once ranking, the overall quality of the site, etc., I can often identify serious Panda candidates (sites that should have been hit). I have analyzed a number of these third-party unconfirmed fresh hits during my analysis over the past several days.


Panda 4.1 Findings
OK, now that you have a better understanding of how I came up with my findings, let’s dig into actual P4.1 problems. I’ll start with a note about the sinister surge and then jump into the findings. Also, it’s important to understand that not all of the sites were targeted by new signals. There are several factors that can throw off identifying new signals, such as when the sites were started, how the sites have changed over time, how deep in the gray area of Panda they were, etc. But the factors listed below are important to understand, and avoid. Let’s jump in.


Sinister Surge Reared Its Ugly Head
Last year I wrote a post on Search Engine Watch detailing the sinister surge in traffic prior to an algorithm hit. I saw that phenomenon so many times since February of 2011 that I wanted to make sure webmasters understood this strange, but deadly situation. After I wrote that post, I had many people contact me explaining they have seen the exact same thing. So yes, the surge is real, it’s sinister, and it’s something I saw often during my latest analysis of Panda 4.1.

By the way, the surge is sinister since most webmasters think they are surging in Google for the right reasons, when in fact, Google is dishing out more traffic to problematic content and gaining a stronger feel for user engagement. And if you have user engagement problems, then you are essentially feeding the mighty Panda “Grade-A” bamboo. It’s not long after the surge begins that the wave crashes and traffic plummets.

Understanding the surge now isn’t something that can help Panda 4.1 victims (since they have already been hit). But this can help anyone out there that sees the surge and wonders why it is happening. If you question content quality on your website, your ad situation, user engagement, etc., and you see the surge, deal with it immediately. Have an audit completed, check your landing pages from Google organic, your adjusted bounce rate, etc. Make sure users are happy. If they aren’t, then Panda will pay you a visit. And it won’t be a pleasant experience.

The Sinister Surge Before Panda Strikes


Affiliate Marketers Crushed
I analyzed a number of affiliate websites that got destroyed during Panda 4.1. Now, I’ve seen affiliate marketers get pummeled for a long time based on previous Panda updates, so it’s interesting that some affiliate sites that have been around for a while just got hit by Panda 4.1. Some sites I analyzed have been around since 2012 and just got hit now.

For example, there were sites with very thin content ranking for competitive keywords while their primary purpose was driving users to partner websites (like Amazon and other ecommerce sites). The landing pages only held a small paragraph up top and then listed affiliate links to Amazon (or other partner websites). Many of the pages did not contain useful information and it was clear that the sites were gateways to other sites where you could actually buy the products. I’ve seen Google cut out the middleman a thousand times since February of 2011 when Panda first rolled out, and it seems Panda 4.1 upped the aggressiveness on affiliates.

I also saw affiliate sites that had pages ranking for target keywords, but when you visited those pages the top affiliate links were listed first, pushing down the actual content that users were searching for. So when you are looking for A, but hit a page containing D, E, F, and G, with A being way down the page, you probably won’t be very happy. Clearly, the webmaster was trying to make as much money as possible by getting users to click through the affiliate links. Affiliate problems plus deception is a killer combination. More about deception later in the post.

Panda 4.1 and Affiliate Marketing

Affiliates with Blank and/or Broken Pages
I came across sites with top landing pages from Google organic that were broken or blank. Talk about a double whammy… the sites were at risk already with pure affiliate content. But driving users to an affiliate site with pages that don’t render or break is a risky proposition for sure. I can tell you with almost 100% certainty that users were quickly bouncing back to the search results after hitting these sites. And I’ve mentioned many times before how low dwell time is a giant invitation to the mighty Panda.

Blank Affiliate Pages and Panda 4.1
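
If you want to spot-check your own top landing pages for this problem, here is a minimal Python sketch that classifies a page as broken, blank, or ok based on its HTTP status code and how much visible text it contains. Fetching the pages is left out. The 200-character threshold and the labels are my own assumptions, and a page that renders its content via JavaScript could be mislabeled as blank.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything inside script and style tags."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def classify_page(status_code, html, min_chars=200):
    """Return 'broken' for error statuses, 'blank' for near-empty pages, else 'ok'."""
    if status_code >= 400:
        return "broken"
    parser = _TextExtractor()
    parser.feed(html)
    # Collapse whitespace so padding doesn't count as content.
    text = " ".join(" ".join(parser.parts).split())
    return "blank" if len(text) < min_chars else "ok"


print(classify_page(404, ""))
print(classify_page(200, "<html><body><script>var x = 1;</script></body></html>"))
```

Run your top landing pages from Google organic through a check like this, and then view anything flagged on real devices.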

Doorway Pages + Affiliate Are Even Worse
I also analyzed several sites hit by Panda 4.1 that held many doorway pages (thin pages over-optimized for target keywords). And once you hit those pages, there were affiliate links weaved throughout the content. So there were two problems here. First, you had over-optimized pages, which can get you hit. Second, you had low-quality affiliate pages that jumped users to partner websites to take action. That recipe clearly caused the sites in question to get hammered.  More about over-optimization next.


Keyword Stuffing and Doorway Pages
There seemed to be a serious uptick in sites employing keyword stuffing hit by Panda 4.1. Some pages were completely overloaded in the title tag, metadata, and in the body of the page. In addition, I saw several examples of sites using local doorway pages completely over-optimized and keyword stuffed.

For example, using {city} + {target keyword} + {city} + {second target keyword} + {city} + {third target keyword} in the title. And then using those keywords heavily throughout the page.

And many of the pages did not contain high quality content. Instead, they were typically thin without useful information. Actually, some contained just an image with no copy. And then there were pages with duplicate content, just targeted to a different geographic location.

The websites I analyzed were poorly-written, hard to read through, and most people would probably laugh off the page as being written for search engines. I know I did. The days of stuffing pages and metadata with target keywords are long gone. And it’s interesting to see Panda 4.1 target a number of sites employing this tactic.

Panda 4.1 and Keyword Stuffing

Panda 4.1 and Keyword Density
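
The {city} + {target keyword} title pattern I described above is easy to flag programmatically. Here is a minimal Python sketch that reports any term repeated several times in a title tag. The threshold of three repeats is my own assumption, and the sample titles are made up. This is a crude check for obvious stuffing, not a reflection of how Panda evaluates pages.

```python
import re
from collections import Counter


def repeated_terms(title, min_repeats=3):
    """Return {term: count} for terms appearing at least min_repeats times in a title."""
    words = re.findall(r"[a-z0-9']+", title.lower())
    counts = Counter(words)
    return {word: n for word, n in counts.items() if n >= min_repeats}


# Made-up titles for illustration.
stuffed = "Denver Plumber | Denver Emergency Plumbing | Denver Drain Cleaning"
natural = "Emergency Plumbing Services in Denver | Acme Plumbing"

print(repeated_terms(stuffed))
print(repeated_terms(natural))
```

You could run the same function against headings or body copy, but you would want to exclude common words first.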

Side Note About Human Beings:
It’s worth reiterating something I often tell Panda victims I’m helping. Actually, I just mentioned this in my latest Search Engine Watch column (which coincidentally went live the day after P4.1 rolled out!). Have neutral third parties go through your website and provide feedback. Most business owners are too close to their own sites, content, ad setup, etc. Real people can provide real feedback, and that input could save your site from a future Panda hit.

I analyzed several sites hit by Panda 4.1 with serious ad problems. For example, there were floating ads throughout the content that were not organized in any way, ads blended with content so closely that it was hard to decipher what was an ad and what was content, etc.

I mentioned deception in the past, especially when referring to Panda 4.0, but I saw this again during 4.1. If you are running ads heavily on your site, then you absolutely need to make sure there is clear distinction between content and ads. If you are blending them so closely that users mistakenly click ads thinking it was content, then you are playing Russian roulette with Panda.

Panda 4.1 and Deception

Users hate being deceived, and it can lead to them bouncing off the site, reporting your site to organizations focused on security, or to Google itself. They can also publicly complain to others via social networks, blogging, etc. And by the way, Google can often pick that up too (if those reviews and complaints are public.) And if that happens, then you can absolutely get destroyed by Panda. I’ve seen it many times over the years, while seeing it more and more since Panda 4.0.

Deception is bad. Do the right thing. Panda is always watching.


Content Farms Revisited
I can’t believe I came across this in 2014, but I did. I saw several sites that were essentially content farms that got hammered during Panda 4.1. They were packed with many (and sometimes ridiculous) how-to articles. I think many people in digital marketing understand that Panda was first created to target sites like this, so it’s hard to believe that people would go and create more… years after many of those sites had been destroyed. But that’s what I saw!

To add to the problems, the sites had barebones designs, were unorganized, wove ads and affiliate links throughout the content, etc. Some even copied how-to articles (or just the steps) from other prominent websites.

Now, to be fair to Google, several of the sites were started in 2014, so Google needed some time to better understand user engagement, the content, ad situation, etc. But here’s the crazy thing. Two of those sites surged with Panda 4.0. My reaction: “Whhaatt??” Yes, the sites benefitted somehow during the massive May 20 update. That’s a little embarrassing for Google, since those clearly aren’t the sites it wants rising in the rankings…

Incorrect Panda 4.0 Surge

But that was temporary, as Panda 4.1 took care of the sites (although late in my opinion). So, if you are thinking about creating a site packed with ridiculous how-to articles, think again. And it goes without saying that you shouldn’t copy content from other websites. The combination will surely get you hit by Panda. I just hope Google is quicker next time with the punishment.

Security Warnings, Popup Ads, and Forced Downloads
There were several sites I analyzed that had been flagged by various security and trust systems. For example, several sites were flagged as providing adware, spyware, or containing viruses. I also saw several of the sites using egregious popups when first hitting the site, forcing downloads, etc.

And when Panda focuses on user engagement, launching aggressive popups and forcing downloads is like hanging fresh bamboo in the center of your website and ringing the Panda dinner bell. First, users hate popups, especially when it’s the first impression of your site. Second, they are fearful of any downloads, let alone ones you are forcing them to execute. And third, security messages in Firefox, Chrome, antivirus applications, WOT, etc. are not going to help matters.

Trust and credibility are important factors for avoiding Panda hits. Cross the line and you can send strong signals to Google that users are unhappy with your site. And bad things typically ensue.

Panda 4.1 Security Problems

Next Steps:
Needless to say, Panda 4.1 was a big update and many sites were impacted. Just like Panda 4.0, I’ve seen some incredible recoveries during 4.1, while also seeing some horrible fresh hits. Some of my clients saw near-full recoveries, while other sites pushing the limits of spamming got destroyed (dropping by 70%+).

I have included some final bullets below for those impacted by P4.1. My hope is that victims can begin the recovery process, while those seeing recovery can make sure the surge in traffic remains.

  • If you have been hit by Panda 4.1, then run a Panda report to identify top content that was negatively impacted. Analyzing that content can often reveal glaring problems.
  • Have an audit conducted. They are worth their weight in gold. Some webmasters are too close to their own content to objectively identify problems that need to be fixed.
  • Have real people go through your website and provide real feedback. Don’t accept sugarcoated feedback. It won’t help.
  • If you have recovered, make sure the surge in traffic remains. Follow the steps listed in my latest Search Engine Watch column to make sure you aren’t feeding Google the same (or similar) problems that got you hit in the first place.
  • Understand that Panda recovery takes time. You need to first make changes, then Google needs to recrawl those changes (over time), and then Google needs to measure user engagement again. This can take months. Be patient.
  • Understand that there isn’t a silver Panda bullet. I usually find a number of problems contributing to Panda attacks during my audits. Think holistically about user engagement and then factor in the various problems surfaced during an audit.
  • Last, but most importantly, understand that Panda is about user happiness. Make sure user engagement is strong, users are happy with your content, and they don’t have a poor experience while traversing your website. Don’t deceive them, don’t trick them into clicking ads, and make a great first impression. If you don’t, those users can direct their feedback to Panda. And he can be a tough dude to deal with.
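
One way to approach the first bullet is to compare Google organic sessions per landing page before and after the update date. The sketch below is only an illustration of that idea. The function name `panda_report`, the input shape (landing page mapped to sessions, as you might export from your analytics package), and the 100-session floor are all assumptions, not a prescribed report format.

```python
def panda_report(before, after, min_sessions=100):
    """Rank landing pages by lost Google organic sessions.

    before / after: dicts mapping landing page -> sessions for two
    equal-length date ranges on either side of the update date.
    min_sessions is an arbitrary floor (an assumption) that skips pages
    with too little traffic to judge.
    """
    rows = []
    for page, old in before.items():
        if old == 0 or old < min_sessions:
            continue
        new = after.get(page, 0)  # a page missing afterwards counts as zero
        change_pct = (new - old) / old * 100
        if change_pct < 0:
            rows.append((page, old, new, round(change_pct, 1)))
    # Largest absolute loss first, so the biggest problems surface at the top
    rows.sort(key=lambda row: row[1] - row[2], reverse=True)
    return rows


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only
    before = {"/how-to-a": 1200, "/guide-b": 800, "/about": 40, "/tool-c": 500}
    after = {"/how-to-a": 300, "/guide-b": 760, "/about": 10, "/tool-c": 650}
    for row in panda_report(before, after):
        print(row)
```

Sorting by absolute loss rather than percentage keeps high-traffic pages at the top, which is usually where the review of content problems should start.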


Summary – Panda 4.1 Reinforces That Users Rule
So there you have it. Findings based on analyzing a number of websites impacted by Panda 4.1. I will try and post more information as I get deeper into Panda 4.1 recovery work. Similar to other major algorithm updates, I’m confident we’ll see Panda tremors soon, which will bring recoveries, temporary recoveries, and more hits. Strap on your SEO helmets. It’s going to be an interesting ride.



Wednesday, September 17th, 2014

How To Check If Google Analytics Is Firing On Android Devices Using Remote Debugging With Chrome [Tutorial]

How To Debug Google Analytics on Mobile Devices

We all know that having a strong analytics setup is important. Marketing without measurement is a risky proposition for sure. But in a multi-device world, it’s not as easy to make sure your setup is accurately tracking what you need – or tracking at all. And if your analytics code isn’t firing properly across smartphones, tablets, and desktop computers, your data will be messy, incomplete, and inaccurate. And there’s nothing that drives a marketer crazier than flawed data.

A few weeks ago, Annie Cushing tweeted a quick question to her followers asking how everyone was testing their Google Analytics setup via mobile devices. This is something many digital marketers grapple with, especially when you are trying to track down problems. For example, I do a lot of algorithm update work and often dig into the analytics setup for a site to ensure we are seeing the full drop in traffic, conversion, revenue, etc.

My knee-jerk response was to check real-time reporting in Google Analytics while accessing specific pages to ensure those visits were being tracked, in addition to events. That could work, but it’s not as granular or isolated as you would want. I also mentioned to Annie that using a Chrome extension like User Agent Switcher could help. That wouldn’t document the firing of analytics code, but would let you see the source code when accessing a webpage via a specific type of smartphone or tablet. But again, you couldn’t see the actual firing of the code or the events being tracked. And that’s obviously an important aspect to debugging analytics problems.

A Solution – Remote Debugging on Android with Chrome
So I did what I typically do when I run into a tricky situation. I find a solution! And for Android devices, I found a solid one. Many of you might be familiar with Chrome Developer Tools (on your desktop computer). It holds some outstanding functionality for debugging websites and web applications. But although it’s extremely helpful for debugging desktop webpages, it didn’t really address the problem at hand (out of the box), since we want to debug mobile devices.

So I started to research the issue and that’s when I came across a nifty technique which would allow you to connect your Android device to your desktop computer and then debug the Chrome tabs running on your mobile device from your desktop computer. And since I could use Chrome Developer Tools to debug the tabs on my desktop computer, I could check to see if Google Analytics was indeed firing when accessing webpages via my Android device. Awesome.

So, I spent some time testing this out and it does work. Sure, I had to jump through some hoops to get it to run properly, but it finally did work. Below I’ll cover what you’ll need to test this out for yourself and how to overcome some of the problems I encountered. Let’s get started.


What You’ll Need
In order to debug GA code running on your mobile device, you’ll need the proper setup both on your desktop computer and on your Android device. In its simplest form, you’ll need:

  • Chrome installed on your desktop (version 32 or later).
  • Android 4.0 or later.
  • A USB Cable to connect your device to your computer.
  • Android SDK {this will not be required for some of you, but others might need to install it. More on that situation below}.

If you run into the problems I ran into, you’ll need the Android SDK installed. I already had it installed since I’ve been testing various Android functionality and code, so it wasn’t a big deal. But you might need to install it on your own. I wouldn’t run to do that just yet, though. If the straight setup works for you, then run with it. If not, then you might need to install the Android SDK.

If you are confident you have the necessary setup listed above, then you can move to the tutorial listed below. I’ll walk you through how to debug Chrome tabs running on your mobile device via Chrome on your desktop computer. And yes, we’ll be isolating Google Analytics code firing on our Android devices to ensure you are tracking what you need.

How To Debug Google Analytics on Your Android Device – Step-By-Step Instructions

  1. Enable USB Debugging on Your Android Device
    Access your settings on your Android device and click Developer Options. On my device, that was located in the “More” grouping of my settings and under System Manager. If you don’t see Developer Options, then you need to enable it. You can do that by accessing Settings, tapping About Phone or About Device and tapping Build Number seven times. Yes, that sounds extremely cryptic, but that’s what you need to do. Once you do, Developer Options will show up under System Manager in your phone’s settings.

    Enable USB Debugging on Android Device

    Then you can check the box to enable USB Debugging on your device. You will need to do this in order to debug Google Analytics in Chrome on your device.

  2. Enable USB Discovery in Chrome (on your desktop)
    Next, type chrome:inspect in a new tab in Chrome on your desktop. Ensure “Discover USB devices” is checked on this screen.

    Enable USB Discovery in Chrome Desktop
  3. Connect Your Phone To Your Computer via USB
  4. Allow USB Debugging
    When you connect your phone to your computer, you should see a dialog box on your phone that asks you if you want to allow USB debugging. Click OK. Note, if you don’t see this dialog box, debugging your mobile device from Chrome on your desktop will not work. I provide instructions for getting around this problem later in the tutorial. If you are experiencing this problem, hop down to that section now.

    Allow USB Debugging on Android Device
  5. Fire up Chrome On Your Mobile Device
    Start Chrome on your Android device and access a webpage (any webpage you want to debug).
  6. Inspect With Chrome on your Desktop
    Once you open a webpage in Chrome on your mobile device, access Chrome on your desktop and visit chrome:inspect. Once you do, you should see your device listed and the various tabs that are open in Chrome on your Android device.

    Inspect Chrome Tabs on Desktop Computer
  7. Click Inspect To Debug The Mobile Tab
    When you click “inspect”, you can use Chrome Developer Tools on your desktop to debug the mobile web view. You can use all of the functionality in Chrome Developer Tools to debug the webpage open on your mobile device.
  8. Click the Network Tab in Chrome Developer Tools
    By accessing the Network Tab, you can view all network activity based on the webpage you have loaded in Chrome on your mobile device. That includes any resources that are requested by the webpage. Then reload the webpage on your mobile device to ensure you are seeing all resources.
  9. First Check for GA.js
    When you load a webpage on your mobile device, many resources will be listed in the network tab. But you should look for ga.js to see if the Google Analytics snippet is being loaded. Tip: You can use the search box and enter “ga.js” to filter all resources by that string. It’s an easy way to isolate what you are looking for.

    Check for ga.js in Network Tab in Developer Tools
  10. Next Check for utm.gif
    After checking for ga.js, you should look for the tracking pixel that’s sent to GA named utm.gif (it shows up in the request list as __utm.gif). If that is listed in the network panel, then your mobile webpage is tracking properly (at least basic tracking). Again, you can use the search box to filter by utm.gif.

    Check for utm.gif in Network Tab in Developer Tools
  11. Bonus: Advanced Tracking
    If you are firing events from mobile webpages, then you can see them listed here as well. For example, you can see an event being fired when a user stays on the page for more than 30 seconds below. So for this situation, we know that pageviews are accurately being tracked and the time on page event is being tracked via mobile. Nice.

    Check event tracking in Chrome for Android
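
The checks in steps 9 through 11 can also be scripted once you have a list of request URLs copied from the Network tab. The following is a minimal sketch, not part of the workflow above: the function names and the sample URLs are hypothetical. It assumes the classic ga.js tracker, which requests the pixel as `__utm.gif` and marks event hits with a `utmt=event` query parameter (a hit with no `utmt` parameter is treated as a pageview).

```python
from urllib.parse import urlparse, parse_qs


def classify_ga_request(url):
    """Label one request URL taken from the Network tab.

    Returns "library" for the ga.js script, "pageview", "event", or
    "other" for __utm.gif hits, and None for unrelated resources.
    """
    parsed = urlparse(url)
    filename = parsed.path.rsplit("/", 1)[-1]
    if filename == "ga.js":
        return "library"
    if filename == "__utm.gif":
        params = parse_qs(parsed.query)
        if "utmt" not in params:
            return "pageview"  # no type parameter means a page hit
        if params["utmt"][0] == "event":
            return "event"
        return "other"  # e.g. transaction or item hits
    return None


def summarize(urls):
    """Count GA-related requests by label."""
    counts = {"library": 0, "pageview": 0, "event": 0, "other": 0}
    for url in urls:
        label = classify_ga_request(url)
        if label is not None:
            counts[label] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical URLs standing in for what the Network tab would list
    sample = [
        "http://www.google-analytics.com/ga.js",
        "http://www.google-analytics.com/__utm.gif?utmwv=5.5.7&utmp=%2Fsome-page",
        "http://www.google-analytics.com/__utm.gif?utmwv=5.5.7&utmt=event"
        "&utme=5(Engagement*TimeOnPage*30s)",
        "http://www.example.com/styles.css",
    ]
    print(summarize(sample))
```

If the summary shows the library loading but zero pageviews, the snippet is present but the tracking call is probably not firing on that device.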


A Note About Troubleshooting
I mentioned earlier that if you don’t see the “Allow USB Debugging” dialog on your mobile device when you connect your phone to your computer, then this setup won’t work for you. It didn’t initially work for me. After doing some digging around, I found the legacy workflow for remote debugging on Android.

By following the steps listed below, I finally got the prompt to show up on my mobile device. Then I was able to debug open Chrome tabs on my Android device.


  1. Install the Android SDK (if you don’t already have it installed)
    You can learn more about the SDK here and download the necessary files.
  2. Kill the ADB Server
    Use a command prompt to access the “platform-tools” folder in the SDK directory and then issue the following command: adb kill-server. Note, you should use the cd command to change directory to the folder containing adb. That’s the platform-tools folder in your Android SDK directory.

    Kill ADB Server
  3. Revoke USB Debugging on Your Android Device
    Disconnect your phone from your computer. Then go back to Developer Options on your Android phone and tap Revoke USB debugging authorization.

    Revoke USB Debugging
  4. Start the ADB Server
    Now you must restart the adb server. Use a command prompt, access the platform-tools folder again, and enter the following command: adb start-server.

    Start ADB Server
  5. Reconnect Your Device To Your Computer
    Once you reconnect your device, you should see the “Allow USB Debugging” dialog box. Click “OK” and you should be good to go. This will enable you to debug Chrome tabs running on your mobile device via Chrome running on your desktop.
  6. Open Chrome on Your Android Device
    Go ahead and open a webpage that you want to debug in Chrome on your Android phone. Once it’s loaded in Chrome in Android, you can follow the instructions listed earlier for using the network panel to debug the GA setup.
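
The legacy reset above boils down to two adb commands with a manual step in between. As a rough sketch, the sequence could be scripted as shown here. The helper names `build_reset_steps`, `parse_adb_devices`, and `run_reset` are hypothetical, and the platform-tools path is a placeholder you would replace with your own. The sketch also runs `adb devices` at the end; on recent adb versions that command lists a connected phone as `unauthorized` until the “Allow USB Debugging” prompt has been accepted.

```python
import os
import subprocess


def build_reset_steps(platform_tools_dir):
    """Return the legacy reset sequence as (kind, payload) tuples."""
    adb = os.path.join(platform_tools_dir, "adb")
    return [
        ("run", [adb, "kill-server"]),
        ("manual", "Disconnect the phone, then tap Revoke USB debugging "
                   "authorization in Developer Options."),
        ("run", [adb, "start-server"]),
        ("manual", "Reconnect the phone and accept the Allow USB Debugging prompt."),
        ("run", [adb, "devices"]),
    ]


def parse_adb_devices(output):
    """Map device serial -> state from the text printed by `adb devices`."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and "* daemon ..." status messages
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def run_reset(platform_tools_dir):
    """Walk through the steps, pausing for the manual ones. Requires adb."""
    last_output = ""
    for kind, payload in build_reset_steps(platform_tools_dir):
        if kind == "manual":
            input(payload + " Press Enter when done. ")
        else:
            result = subprocess.run(payload, capture_output=True, text=True, check=True)
            last_output = result.stdout
    return parse_adb_devices(last_output)
```

Calling `run_reset` with the path to your own platform-tools folder returns a dictionary of serial numbers and states. A state of `device` means the phone is authorized and should now appear when you inspect from Chrome on your desktop.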


Summary – Know When Google Analytics is Firing on Mobile Devices
So there you have it. There is a way to debug the actual firing of GA code on your Android devices and it works well. Sure, you may need to go the extra mile, use the legacy workflow, and install the Android SDK, but you should be able to get it working. And once you do, you’ll never have to guess if GA is really working on Android devices. You’ll know if it is by debugging your Chrome tabs on Android via Chrome running on your desktop. Good luck.




Tuesday, September 9th, 2014

Panda Update on Friday September 5, 2014

Panda Update on 9/5/14

My last blog post explained that Panda is now running in near-real-time and what that means for webmasters and business owners. Well, that was perfect timing as Panda just made another trip around the web as kids head back to school and the NFL kicks in.

I’ve seen multiple Panda clients see recovery starting on Friday 9/5. And some of the clients had been seriously impacted by our cute, black and white friend in the past. Two sites, in particular, saw drops of 60%+ from previous Panda updates.

Here are a few screenshots from companies seeing impact from the 9/5/14 Panda update:

Panda Recovery on 9/5/14


Another Panda Recovery on 9/5/14


Panda is Starting The School Year Out Right
Teachers always say that hard work can lead to success. And it seems the schoolyard Panda feels the same way. The clients seeing the biggest spikes in traffic have done a lot of hard work Panda-wise.

Over the past few months, massive Panda problems were uncovered from a content quality standpoint. That included finding thin content, duplicate content, low-quality content, and scraped content, while also identifying ad problems and technical problems that were impacting content quality and user engagement.

The user experience across each site was poor to say the least and the changes they have made (and are actively implementing) are improving the overall quality of their websites. And that’s exactly what you need to do in order to see positive Panda movement.

A Note About Temporary Recoveries (or Tests)
I recently wrote a post about temporary Panda recoveries, which I have seen several of over the past month or so. It’s interesting to note that two sites that just bounced back had seen temporary Panda recoveries in the past month. Now, we don’t know if they were truly temporary recoveries or simply tests of a future Panda update that ended up getting rolled back. But since Friday 9/5, both of those sites have spiked again. Let’s hope these recoveries stick.

Temporary Panda Recovery


Beyond temporary recoveries, other websites battling Panda saw serious spikes in Google organic traffic starting on Friday 9/5. And like I said earlier, they had gotten hammered by Panda in the past. It’s awesome to see them bounce back.

For example, one site is up 85% and another is up 71%. Nice increases to say the least.

Panda Recovery Percentage in GA


Summary – Everybody’s Working for the Weekend (Including Panda)
As I explained earlier, Panda is now near-real-time and the days of waiting for monthly Panda updates are gone. The fact of the matter is that you can see impact at any point during the month (or even multiple times per month). So, if you’ve been impacted by Panda in the past, then check your reporting now. Friday might have been a very good day for you. And on the flip side (for those facing the Panda music for the first time), you might see a frightening drop in Google organic traffic. One thing is for sure… with the mighty Panda roaming the web in near-real-time, it’s never been more important to keep a close eye on content quality. Panda sure is.

So get ready for the next update. I’m confident it’s not far away. Actually, it might be just around the corner.




Tuesday, September 2nd, 2014

Google Panda Running Regularly Since P4.0, Approaches Near-Real-Time

Google Panda Running Regularly

In June of 2013, I wrote about the maturing of Google’s Panda algorithm and how it started to roll out monthly over a ten-day period. Google also explained at that time that they wouldn’t be confirming future Panda updates. In my post, I explained how the combination of monthly updates, over ten days, with no confirmation, could lead to serious webmaster confusion. Getting hit by Panda was already confusing enough for webmasters (when they knew it was Panda). Now sites could get hit during a ten-day period, any month, without confirmation from Google about what hit them.

So the monthly updates went on, I picked up a number of them, and yes, it was confusing for many. I received plenty of emails from business owners wondering why they experienced drops during those unconfirmed updates. In case you’re wondering, I could pick up those unconfirmed updates since I help a lot of companies with Panda and I have access to a lot of Panda data. More about that soon. But the average webmaster could not easily pick up those updates, which led to serious confusion and frustration. And that’s the situation we were in until May of 2014.

And Along Came Panda 4.0
This went on until Panda 4.0, which was a huge update released on May 20, 2014. Google did announce the update for several reasons. First, it was a new Panda algorithm. Second, they knew it was HUGE and would impact many websites (and some aggressively).

Everything about the update was big. There were huge recoveries and massive new hits. You can read my previous posts about Panda 4.0 to learn more about the update. But that’s not the focus of this post. Something else has been going on since Panda 4.0, and it’s critically important to understand.

After Panda 4.0 rolled out on May 20, 2014, I noticed that sites impacted by the algorithm update were seeing continual “tremors”. Sites that were hit were seeing more drops every week or so and sites that experienced recovery also saw tremors during those dates (slight increases during those intervals). Moving forward, I also started to see sites reverse direction during some of the tremors. Some that saw recovery saw slight decreases and others that were hit saw slight increases. It was fascinating to analyze.

I reached out to Google’s John Mueller via G+ to see if he could shed some light on the situation. Well, he did, and I documented his response in my Search Engine Watch column soon after. John explained that Google doesn’t have a fixed schedule for algorithm updates like Panda. They could definitely tweak the algo to get the desired results and roll it out more frequently. That was big news, and confirmed the tremors I was seeing.

Google's John Mueller Clarifies Panda Tremors

John also explained more about Panda in a recent Google Webmaster Office Hours Hangout (from August 15, 2014). Here’s a quote from John:

“I believe Panda is a lot more regular now, so that’s probably happening fairly regularly.”

And based on what I’ve been seeing across websites impacted by Panda, he’s not kidding. You can see the video below (starting at 21:40).

Since Panda 4.0, I’ve seen tremors almost weekly. And guess what? They really haven’t stopped. So it seems they aren’t temporary adjustments to Panda, but instead, this could be the new way that Panda roams the web. Yes, that would mean we are in the age of a near-real-time Panda. And that can be both amazing and horrifying for webmasters.


What I’ve Seen Since Panda 4.0
I mentioned that I have access to a lot of Panda data. That’s because I’ve helped a lot of companies with Panda since February of 2011, while also having new companies reach out to me about fresh Panda hits. This enables me to see recoveries with companies that are working hard to rectify content quality problems, while also seeing new Panda hits. This combination enables me to document serious Panda activity on certain dates.

Since Panda 4.0 rolled out, I have consistently seen tremors (almost weekly). I have seen companies continue to increase, continue to decrease, fluctuate up and down, and I have also documented temporary recoveries. Below, I’ll show you what some of the tremors look like and then I’ll explain what this all means.

Panda Tremors – Example
Example of Panda Tremors


Panda Tremors – Example
Second Example of Panda Tremors


Temporary Panda Recovery During Tremors


Another Temporary Panda Recovery During Tremors
Example of Temporary Panda Recovery During Tremor


Fresh Bamboo and The Near-Real-Time Panda Algo
So, what does this all mean for webmasters and business owners? Well, it means that Panda is rolling out often, and sites can be impacted more frequently than before. That’s huge news for any webmaster dealing with a Panda problem. In the past, you would have to wait for a monthly Panda update to run before you could see recovery (or further decline). Now you can see impact much more frequently. Again, this is big.

That’s why I have seen sites fluctuate almost weekly since Panda 4.0. Some have stabilized, while others continue to dance with the mighty Panda. And the temporary recoveries emphasize an important point. If you haven’t completed enough Panda recovery work, you might see what looks to be recovery, only to get hammered again (and quickly). It’s one of the reasons I explain to Panda victims that they need to move quickly and implement serious changes based on a thorough Panda audit. If not, they are setting themselves up to continually see declines, or worse, see a misleading temporary recovery, only to get smoked again.

Summary – The Good and the Bad of The Near-Real-Time Panda
As I explained above, it looks like a new phase of Panda has begun. As someone neck deep in Panda work, it’s fascinating to analyze. With the mighty Panda roaming the web in near-real-time, websites can see ups and downs throughout the month. They can get hit, or recover, or even see both in one month. That’s why it’s never been more important to address content quality problems on your website. As always, my recommendation is to focus on user engagement, nuke thin and low quality content, remove deceptive tactics, and win the Panda game.

Let’s face it, Panda has upped its game. Have you?