Archive for the ‘seo’ Category

Friday, January 23rd, 2015

Insidious Thin Content on Large-Scale Websites and Its Impact on Google Panda

Insidious Thin Content and Google Panda

If you’ve read some of my case studies in the past, then you know Panda can be a real pain in the neck for large-scale websites. For example, publishers, ecommerce retailers, directories, and other websites often have tens of thousands, hundreds of thousands, or millions of pages indexed. When sites grow that large, with many categories, directories, and subdomains, content can easily get out of control. For example, I sometimes surface problematic areas of a website that clients didn’t even know existed! There’s usually a gap of silence on the web conference call when I present situations like that. But once everyone realizes that low quality content is in fact present, then we can proceed with how to rectify the problems at hand.

And that’s how you beat Panda. Surfacing content quality problems and then quickly fixing those problems. And if companies don’t surface and rectify those problems, then they remain heavily impacted by Panda. Or even more maddening, they can go in and out of the gray area of Panda. That means they can get hit, recover to a degree, get hit again, recover, etc. It’s a maddening place to live SEO-wise.

The Insidious Thin Content Problem
The definition of insidious is:
“proceeding in a gradual, subtle way, but with harmful effects”

And that’s exactly how thin content can increase over time on large-scale websites. The problem usually doesn’t rear its ugly head in one giant blast (although that can happen). Instead, it can gradually increase over time as more content is added and edited, technical changes are made, new updates get pushed to the website, new partnerships are formed, etc. And before you know it, boom, you’ve got a huge thin content problem and Panda is knocking on the door. Or worse, it’s already knocked down your door.

So, based on recent Panda audits, I wanted to provide three examples of how an insidious thin content problem can get out of control on larger-scale websites. My hope is that you can review these examples and then apply the same model to your own business.


Insidious Thin Content: Example 1
During one recent audit, I ended up surfacing a number of pages that seemed rogue. For example, they weren’t linked to from many other pages on the site, didn’t contain the full site template, and only contained a small amount of content. And the content didn’t really have any context about why it was there, what users were looking at, etc. I found that very strange.

Thin Content with No Site Template

So I dug into that issue, and started surfacing more and more of that content. Before I knew it, I was up to 4,100 pages of that content! Yes, there were over four thousand rogue, thin pages based on that one find.

To make matters even worse, when checking how Google was crawling and indexing that content, you could quickly see major problems. Using both fetch and render in Google Webmaster Tools and checking the cache of the pages revealed Google couldn’t see most of the content. So the thin pages were even thinner than I initially thought. They were essentially blank to Google.

Thin Content and Content Won't Render

When bringing this up to my client, they did realize the pages were present on the site, but didn’t understand the potential impact Panda-wise. After explaining more about how Panda works, and how thin content equates to giant pieces of bamboo, they totally got it.

I explained that they should either immediately 404 that content or noindex it. And if they wanted to quicken that process a little, then 410 the content. Basically, if the pages should not be on the site for users or Google, then 404 or 410 them. If the pages are beneficial for users for some reason, then noindex the content using the meta robots tag.
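For example, if the pages should remain on the site for users, a minimal sketch of the noindex approach (placed in the head of each thin page):

```html
<meta name="robots" content="noindex" />
```

Once Google recrawls the pages and processes the tag, those urls should start dropping out of the index. And for pages being removed entirely, a 410 header response code signals the removal is permanent (versus a standard 404 “not found”), which can speed that process along.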

So, with one finding, my client will nuke thousands of pages of thin content from their website (which had been hammered by Panda). That will surely help, and it’s only one finding based on a number of core problems I surfaced on the site during my audit. Again, the problem didn’t manifest itself overnight. Instead, it took years of this type of content building on the site. And before they knew it, Panda came and hammered the site. Insidious.


Insidious Thin Content: Example 2
In another audit I recently conducted, I kept surfacing thin pages that basically provided third party videos (often YouTube videos embedded in the page). So you had very little original content and then just a video. After digging into the situation, I found many pages like this. At this time, I estimate there could be as many as one thousand pages like this on the site. And I still need to analyze more of the site crawl, so it could be even worse…

Now, the website has been around for a long time, so it’s not like all the thin video pages popped up overnight. The site produces a lot of content, but would continually supplement stronger content with this quick approach, which yielded extremely thin and unoriginal content. And as time went on, the insidious problem yielded a Panda attack (actually, multiple Panda attacks over time).

Thin Video Pages and Google Panda

Note, this was not the only content quality problem the site suffered from. It’s never just one problem that causes a Panda attack, by the way. I’ve always said that Panda has many tentacles and that low quality content can mean several things. Whenever I perform a deep crawl analysis and audit on a severe Panda hit, I often surface a number of serious problems. This was just one that I picked up during the audit, but it’s an important find.

By the way, checking Google organic traffic to these pages revealed a major decrease in traffic over time… Even Google was sending major signals to the site that it didn’t like the content. So there are many thin video pages indexed, but almost no traffic. Running a Panda report showing the largest drop in traffic to Google organic landing pages after a Panda hit reveals many of the thin video pages in the list. It’s one of the reasons I recommend running a Panda report once a site has been hit. It’s loaded with actionable data.

So now I’m working with my client to identify all pages on the site that can be categorized as thin video pages. Then we need to determine which are ok (there aren’t many), which are truly low quality, which should be noindexed, and which should be nuked. And again, this was just one problem… there are a number of other content quality problems riddling the site.


Insidious Thin Content: Example 3

During another Panda project, I surfaced an interesting thin content problem. And it’s one that grew over time to create a pretty nasty situation. I surfaced many urls that simply provided a quick update about a specific topic. Those updates were typically just a few lines of content all within a specific category. The posts were extremely thin… and were sometimes only a paragraph or two without any images, visuals, links to more content, etc.

Thin Quick Updates and Google Panda

Upon digging into the entire crawl, I found over five thousand pages that fit this category of thin content. Clearly this was a contributing factor to the significant Panda hit the site experienced. So I’m working with my client on reviewing the situation and making the right decision with regard to handling that content. Most of the content will be noindexed versus being removed, since there are reasons outside of SEO that need to be taken into account. For example, partnerships, contractual obligations, etc.

Over time, you can see that some of these pages actually used to rank well and drive organic search traffic from Google. That’s probably due to the authority of the site. I’ve seen that many times since 2011 when Panda first rolled out. A site builds enormous SEO power and then starts pumping out thinner, lower-quality content. And then that content ends up ranking well. And when users hit the thin content from Google, they bounce off the site quickly (and often back to the search results). In aggregate, low user engagement, high bounce rates, and low dwell time can be a killer Panda-wise. Webmasters need to avoid that situation like the plague. You can read my case study about “6 months with Panda” to learn more about that situation.


Summary – Stopping The Insidious Thin Content Problem is Key For Panda Recovery
So there you have it. Three quick examples of insidious thin content problems on large-scale websites. They often don’t pop up overnight, but instead, they grow over time. And before you know it, you’ve got a thick layer of bamboo on your site attracting the mighty Panda. By the way, there are many other examples of insidious thin content that I’ve come across during my Panda work and I’ll try and write more about this problem soon. I think it’s incredibly important for webmasters to understand how the problem can grow, the impact it can have, and how to handle the situation.

In the meantime, I’ll leave you with some quick advice. My recommendation to any large-scale website is to truly understand your content now, identify any Panda risks, and take action sooner rather than later. It’s much better to be proactive and handle thin content in the short-term versus dealing with a major Panda hit after the fact. By the way, the last Panda update was on 10/24, and I’m fully expecting another one soon. Google rolled out an update last year on 1/11/14, so we are definitely due for another. I’ll be sure to communicate what I’m seeing once the update rolls out.




Monday, December 29th, 2014

XML Sitemaps – 8 Facts, Tips, and Recommendations for the Advanced SEO

XML Sitemaps for Advanced SEOs

After publishing my last post about dangerous rel canonical problems, I started receiving a lot of questions about other areas of technical SEO. One topic in particular that seemed to generate many questions was how to best use and set up xml sitemaps for larger and more complex websites.

Sure, in its most basic form, webmasters can provide a list of urls that they want the search engines to crawl and index. Sounds easy, right? Well, for larger and more complex sites, the situation is often not so easy. And if the xml sitemap situation spirals out of control, you can end up feeding Google and Bing thousands, hundreds of thousands, or millions of bad urls. And that’s never a good thing.

While helping clients, it’s not uncommon for me to audit a site and surface serious errors with regard to xml sitemaps. And when that’s the case, websites can send Google and Bing mixed signals, urls might not get indexed properly, and both engines can end up losing trust in your sitemaps. And as Bing’s Duane Forrester once said in this interview with Eric Enge:

“Your Sitemaps need to be clean. We have a 1% allowance for dirt in a Sitemap. If we see more than a 1% level of dirt, we begin losing trust in the Sitemap.”

Clearly that’s not what you want happening…

So, based on the technical SEO work I perform for clients, including conducting many audits, I decided to list some important facts, tips, and answers for those looking to maximize their xml sitemaps. My hope is that you can learn something new from the bullets listed below, and implement changes quickly.


1. Use RSS/Atom and XML For Maximum Coverage
This past fall, Google published a post on the webmaster central blog about best practices for xml sitemaps. In that post, they explained that sites should use a combination of xml sitemaps and RSS/Atom feeds for maximum coverage.

Xml sitemaps should contain all canonical urls on your site, while RSS/Atom feeds should contain the latest additions or recently updated urls. XML sitemaps will contain many urls, whereas RSS/Atom feeds will only contain a limited set of new or recently changed urls.

RSS/Atom Feed and XML Sitemaps

So, if you have new urls (or recently updated urls) that you want Google to prioritize, then use both xml sitemaps and RSS/Atom feeds. Google says by using RSS, it can help them “keep your content fresher in its index”. I don’t know about you, but I like the idea of Google keeping my content fresher. :)

Also, it’s worth noting that Google recommends maximizing the number of urls per xml sitemap. For example, don’t cut up your xml sitemaps into many smaller files (if possible). Instead, use the space you have in each sitemap to include all of your urls. If you don’t, Google explains, “it can impact the speed and efficiency of crawling your urls.” I recommend reading Google’s post to learn how to best use xml sitemaps and RSS/Atom feeds to maximize your efforts. By the way, you can include 50K urls per sitemap and each sitemap must be less than 10MB uncompressed.
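For reference, a bare-bones xml sitemap is just a urlset of canonical urls (the urls below are hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/page1.htm</loc>
    <lastmod>2014-12-20</lastmod>
  </url>
  <url>
    <loc>http://www.example.com/page2.htm</loc>
    <lastmod>2014-12-28</lastmod>
  </url>
</urlset>
```

Your RSS/Atom feed would then carry only the newest or most recently updated urls, with accurate timestamps.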


2. XML Sitemaps By Protocol and Subdomain
I find a lot of webmasters are confused by protocol and subdomains, and both can end up impacting how urls in sitemaps get crawled and indexed.

URLs included in xml sitemaps must use the same protocol and subdomain as the sitemap itself. This means that https urls should not be included in a sitemap delivered via http. This also means that urls on one subdomain cannot be located in a sitemap on another subdomain. So on and so forth.

XML Sitemaps and Protocol and Subdomains


This is a common problem when sites employ multiple subdomains or they have sections using https and http (like ecommerce retailers). And then of course we have many sites starting to switch to https for all urls, but haven’t changed their xml sitemaps to reflect the changes. My recommendation is to check your xml sitemaps reporting today, while also manually checking the sitemaps. You might just find issues that you can fix quickly.
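To make this concrete, here’s a sketch using hypothetical urls. A sitemap located at http://www.example.com/sitemap.xml should only contain urls matching that protocol and subdomain:

```xml
<!-- OK: same protocol and subdomain as the sitemap itself -->
<loc>http://www.example.com/page1.htm</loc>

<!-- Not OK in this sitemap: different protocol (https) -->
<loc>https://www.example.com/checkout.htm</loc>

<!-- Not OK in this sitemap: different subdomain (m.) -->
<loc>http://m.example.com/page1.htm</loc>
```

The https urls and the m. urls would each belong in their own sitemaps, located on the matching protocol and subdomain.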


3. Dirty Sitemaps – Hate Them, Avoid Them
When auditing sites, I often crawl the xml sitemaps myself to see what I find. And it’s not uncommon to find many urls that resolve with non-200 header response codes. For example, urls that 404, 302, 301, return 500s, etc.

Dirty XML Sitemaps

You should only provide canonical urls in your xml sitemaps. You should not provide non-200 header response code urls (or non-canonical urls that point to other urls). The engines do not like “dirty sitemaps” since they can send Google and Bing on a wild goose chase throughout your site. For example, imagine driving Google and Bing to 50K urls that end up 404ing, redirecting, or not resolving. Not good, to say the least.

Remember Duane’s comment from earlier about “dirt” in sitemaps. The engines can lose trust in your sitemaps, which is never a good thing SEO-wise. More about crawling your sitemaps later in this post.


4. View Trending in Google Webmaster Tools
Many SEOs are familiar with xml sitemaps reporting in Google Webmaster Tools, which can help surface various problems, while also providing important indexation statistics. Well, there’s a hidden visual gem in the report that’s easy to miss. The default view will show the number of pages submitted in your xml sitemaps and the number indexed. But if you click the “sitemaps content” box for each category, you can view trending over the past 30 days. This can help you identify bumps in the road, or surges, as you make changes.

For example, check out the trending below. You can see the number of images submitted and indexed drop significantly over a period of time, only to climb back up. You would definitely want to know why that happened, so you can avoid problems down the line. Sending this to your dev team can help them identify potential problems that can build over time.

XML Sitemaps Trending in Google Webmaster Tools


5. Using Rel Alternate in Sitemaps for Mobile URLs
When using mobile urls (like m.), it’s incredibly important to ensure you have the proper technical SEO setup. For example, you should be using rel alternate on the desktop pages pointing to the mobile pages, and then rel canonical on the mobile pages pointing back to the desktop pages.

Although not an approach I often push for, you can provide rel alternate annotations in your xml sitemaps. The annotations look like this:

Rel Alternate in XML Sitemaps
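A sketch of the annotation, based on Google’s documented format (urls are hypothetical):

```xml
<url>
  <loc>http://www.example.com/page1.htm</loc>
  <xhtml:link rel="alternate"
      media="only screen and (max-width: 640px)"
      href="http://m.example.com/page1.htm" />
</url>
```

Note that the urlset element must declare the xhtml namespace (xmlns:xhtml="http://www.w3.org/1999/xhtml") for the annotation to validate.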


It’s worth noting that you should still add rel canonical to the source code of your mobile pages pointing to your desktop pages.
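For example (hypothetical urls), each mobile page would include the following in its head:

```html
<link rel="canonical" href="http://www.example.com/page1.htm" />
```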


6. Using hreflang in Sitemaps for Multi-Language Pages
If you have pages that target different languages, then you are probably already familiar with hreflang. Using hreflang, you can tell Google which pages should target which languages. Then Google can surface the correct pages in the SERPs based on the language/country of the person searching Google.

Similar to rel alternate, you can either provide the hreflang code in a page’s html code (page by page), or you can use xml sitemaps to provide the hreflang code. For example, you could provide the following hreflang attributes when you have the same content targeting different languages:

Hreflang in XML Sitemaps

Just be sure to include a separate <url> entry (with its own <loc> element) for each url that contains alternative language content (i.e. all of the sister urls should be listed in the sitemap, each repeating the full set of hreflang annotations).
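A sketch of the annotations for an English and German pair (hypothetical urls), using the xhtml:link format Google documents for sitemaps:

```xml
<url>
  <loc>http://www.example.com/page1.htm</loc>
  <xhtml:link rel="alternate" hreflang="en"
      href="http://www.example.com/page1.htm" />
  <xhtml:link rel="alternate" hreflang="de"
      href="http://www.example.com/de/page1.htm" />
</url>
```

The German url would then get its own <url> entry repeating the same two annotations, and the urlset element must declare the xhtml namespace (xmlns:xhtml="http://www.w3.org/1999/xhtml").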


7. Testing XML Sitemaps in Google Webmaster Tools
Last, but not least, you can test your xml sitemaps or other feeds in Google Webmaster Tools. Although easy to miss, there is a red “Add/Test Sitemap” button in the upper right-hand corner of the Sitemaps reporting page in Google Webmaster Tools.

Test XML Sitemaps in Google Webmaster Tools

When you click that button, you can add the url of your sitemap or feed. Once you click “Test Sitemap”, Google will provide results based on analyzing the sitemap/feed. Then you can rectify those issues before submitting the sitemap. I think too many webmasters use a “set it and forget it” approach to xml sitemaps. Using the test functionality in GWT, you can nip some problems in the bud. And it’s simple to use.

Results of XML Sitemaps Test in Google Webmaster Tools


8. Bonus: Crawl Your XML Sitemap Via Screaming Frog
In SEO, you can either test and know, or read and believe. As you can probably guess, I’m a big fan of the former… For xml sitemaps, you should test them thoroughly to ensure all is ok. One way to do this is to crawl your own sitemaps. By doing so, you can identify problematic tags, non-200 header response codes, and other little gremlins that can cause sitemap issues.

One of my favorite tools for crawling sitemaps is Screaming Frog (which I have mentioned many times in my previous posts). By setting the crawl mode to “list mode”, you can crawl your sitemaps directly. Screaming Frog natively handles xml sitemaps, meaning you don’t need to convert your xml sitemaps into another format before crawling (which is awesome).

Crawling Sitemaps in Screaming Frog

Screaming Frog will then load your sitemap and begin crawling the urls it contains. In real-time, you can view the results of the crawl. And if you have Graph View up and running during the crawl, you can visually graph the results as the crawler collects data. I love that feature. Then it’s up to you to rectify any problems that are surfaced.

Graph View in Screaming Frog


Summary – Maximize and Optimize Your XML Sitemaps
As I’ve covered throughout this post, there are many ways to use xml sitemaps to maximize your SEO efforts. Clean xml sitemaps can help you inform the engines about all of the urls on your site, including the most recent additions and updates. It’s a direct feed to the engines, so it’s important to get it right (and especially for larger and more complex websites).

I hope my post provided some helpful nuggets of sitemap information that enable you to enhance your own efforts. I recommend setting some time aside soon to review, crawl, audit, and then refine your xml sitemaps. There may be some low-hanging fruit changes that can yield nice wins. Now excuse me while I review the latest sitemap crawl. :)



Tuesday, December 9th, 2014

6 Dangerous Rel Canonical Problems Based on Crawling 11M+ Pages in 2014

Dangerous Rel Canonical Problems

Based on helping clients with Panda work, Penguin problems, SEO technical audits, etc., I end up crawling a lot of websites. In 2014, I estimate that I crawled over eleven million pages while helping clients. And during those crawls, I often pick up serious technical problems inhibiting the SEO performance of the sites in question.

For example, surfacing response code issues, redirects, thin content, duplicate content, metadata problems, mobile issues, and more.  And since those problems often lie below the surface, they can sit unidentified and unresolved for a long time. It’s one of the reasons I believe SEO technical audits are the most powerful deliverable in all of SEO.

Last week, I found an interesting comment from John Mueller in a Google Webmaster Hangout video. He was speaking about the canonical url tag and explained that Google needs to process rel canonical as a second or third step (at 48:30 in the video). He explained that processing rel canonical signals is not part of the crawling process, but instead, it’s handled down the line. And that’s one reason you can see urls indexed that are canonicalized to other pages. It’s not necessarily a problem, but gives some insight into how Google handles rel canonical.

When analyzing my tweets a few days later, I noticed that specific tweet got a lot of eyeballs and engagement.

Tweet About Rel Canonical and John Mueller of Google


That got me thinking that there are probably several other questions about rel canonical that are confusing webmasters. Sure, Google published a post covering some common rel canonical problems, but that doesn’t cover all of the issues webmasters can face. So, based on crawling over eleven million pages in 2014, I figured I would list some dangerous rel canonical issues I’ve come across (along with how to rectify them). My hope is that some readers can leave this post and make changes immediately. Let’s jump in.


1. Canonicalizing Many URLs To One
When auditing websites I sometimes come across situations where entire sections of content are being canonicalized to one url. The sections might contain dozens of urls (or more), but the site is using the canonical url tag on every page in the section pointing to one other page on the site.

If the site is canonicalizing many pages to one, then it will have little chance of ranking for any of the content on the canonicalized pages. All of the indexing properties will be consolidated to the url used in the canonical url tag (in the href). Rel canonical is meant to handle very similar content at more than one url, and was not meant for handling many pages of unique content pointing to one other page.

When I explained this to clients, they typically didn’t understand the full ramifications of implementing a many-to-one rel canonical strategy. By the way, the common reason for doing this is to try and boost the rankings of the most important pages on the site. For example, webmasters believe that if they canonicalize 60 pages in a section to the top-level page, then that top-level page will be the all-powerful url ranking in the SERPs. Unfortunately, while they are doing that, they strip away any possibility of the canonicalized pages ranking for the content they hold. And on larger sites, this can turn ugly quickly.

Rel Canonical Many URLs to One
If you have unique pages with valuable content, then do not canonicalize them to other pages… Let those pages be indexed, optimize the pages for the content at hand, and make sure you can rank for all of the queries that relate to that content. When you take the long tail of SEO into account, those additional pages with unique content can drive many valuable visitors to your site via organic search. Don’t underestimate the power of the long tail.


2. Daisy Chaining Rel Canonical
When using the canonical url tag, you want to avoid daisy chaining hrefs. For example, if you were canonicalizing page2.htm to page1.htm, but page1.htm is then canonicalized to page3.htm, then you are sending very strange signals to the engines. To clarify, I’m not referring to actual redirects (like 301s or 302s), but instead, I’m talking about the hrefs used in the canonical url tag.

Here’s an example:
page2.htm includes the following: <link rel="canonical" href="page1.htm" />
But page1.htm includes this: <link rel="canonical" href="page3.htm" />

Daisy Chaining Rel Canonical

While conducting SEO audits, I’ve seen this botched many times, even beyond the daisy chaining. Sometimes page3.htm doesn’t even exist, sometimes it redirects via 301s or 302s, etc.

Overall, don’t send mixed signals to the engines about which url is the canonical one. If you say it’s page1.htm but then tell the engines that it’s page3.htm once they crawl page1.htm, and then botch page3.htm in a variety of ways, you might experience some very strange ranking problems. Be clear and direct via rel canonical.


3. Using The Non-Canonical Version
This situation is a little different, but can cause problems nonetheless. I actually just audited a site that used this technique across 2.1M pages. Needless to say, they will be making changes asap. In this scenario, a page is referencing a non-canonical version of the original url via the canonical url tag.  But the non-canonical version actually redirects back to the original url.

For example:
page1.htm includes this: <link rel="canonical" href="page1.htm?id=46" />
But page1.htm?id=46 redirects back to page1.htm

Rel Canonical to Non-Canonical Version of URL

So in a worst-case scenario, this is implemented across the entire site and can impact many urls. Now, Google views rel canonical as a hint and not a directive. So there’s a chance Google will pick up this error and rectify the issue on its end. But I wouldn’t bank on that happening. I would fix rel canonical to point to the actual canonical urls on the site versus non-canonical versions that redirect to the original url (or somewhere else).


4. No Rel Canonical + The Use of Querystring Parameters
This one is simple. I often find websites that haven’t implemented the canonical url tag at all. For some smaller and less complex sites, this isn’t a massive problem. But for larger, more complex sites, this can quickly get out of control.

As an example, I recently audited a website that heavily used campaign tracking parameters (both from external campaigns and from internal promotions). By the way, don’t use campaign tracking parameters on internal promotions… they can cause massive tracking problems. Anyway, many of those urls were getting crawled and indexed. And depending on how many campaigns were set up, some urls had many non-canonical versions being crawled and indexed.

Not Using Rel Canonical With Campaign Parameters

By implementing the canonical url tag, you could signal to the engines that all of the variations of urls with querystring parameters should be canonicalized to the original, canonical url. But without rel canonical in place, you run the risk of diluting the strength of the urls in question (as many different versions can be crawled, indexed, and linked to from outside the site).

Imagine 500K urls indexed with 125K duplicate urls also indexed. And for some urls, maybe there are five to ten duplicates per page. You can see how this can get out of control. It’s easy to set up rel canonical programmatically (either via plugins or your own server-side code). Set it up today to avoid a situation like what I listed above.
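For example, each parameterized version should reference the clean url via the canonical url tag (hypothetical url and parameters):

```html
<!-- Served at http://www.example.com/page1.htm?src=promo&camp=fall -->
<link rel="canonical" href="http://www.example.com/page1.htm" />
```

Since the tag references the parameter-free url, every campaign variation that gets crawled or linked to consolidates its signals to the one canonical url.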


5. Canonical URL Tag Not Present on Mobile Urls (m. or other)
Mobile has been getting a lot of attention recently (yes, understatement of the year). When clients are implementing an m. approach to mobile handling, I make sure to pay particular attention to the bidirectional annotations on both the desktop and mobile urls. And to clarify, I’m not just referring to a specific m. setup. It can be any mobile urls that your site is using (redirecting from the desktop urls to mobile urls).

For example, Google recommends you add rel alternate on your desktop urls pointing to your mobile urls and then rel canonical on your mobile urls pointing back to your desktop urls.

Not Using Rel Canonical With Mobile URLs
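A sketch of the bidirectional annotations (hypothetical urls):

```html
<!-- On the desktop url (www.example.com/page1.htm): -->
<link rel="alternate"
    media="only screen and (max-width: 640px)"
    href="http://m.example.com/page1.htm" />

<!-- On the mobile url (m.example.com/page1.htm): -->
<link rel="canonical" href="http://www.example.com/page1.htm" />
```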

This ensures Google understands that the pages are the same and should be treated as one. Without the correct annotations in place, you are hoping Google understands the relationship between the desktop and mobile pages. But if it doesn’t, you could be providing many duplicate urls on your site that can be crawled and indexed. And on larger-scale websites (1M+ pages), this can turn ugly.

Also, contrary to what many think, separate mobile urls can work extremely well for websites (versus responsive or adaptive design). I have a number of clients using mobile urls and the sites rank extremely well across engines. You just need to make sure the relationship is sound from a technical standpoint.


6. Rel Canonical to a 404 (or Noindexed Page)
The last scenario I’ll cover can be a nasty one. This problem often lies undetected until pages start falling out of the index and rankings start to plummet. If a site contains urls that use rel canonical pointing to a 404 or a noindexed page, then the site will have little chance of ranking for the content on those canonicalized pages. You are basically telling the engines that the true, canonical url is a 404 (not found), or a page you don’t want indexed (a page that uses the meta robots tag containing “noindex”).

I had a company reach out to me once during the holidays freaking out because their organic search traffic plummeted. After quickly auditing the site, it was easy to see why. All of their core pages were using rel canonical pointing to versions of that page that returned 404 header response codes. The site (which had over 10M pages indexed) was giving Google the wrong information, and in a big way.

Rel Canonical Pointing to 404 or Noindexed Page
Once the dev team implemented the change, organic search traffic began to surge. As more and more pages sent the correct signals to Google, and Google indexed and ranked the pages correctly, the site regained its traffic. For an authority site like this one, it only took a week or two to regain its rankings and traffic. But without changing the flawed canonical setup, I’m not sure it would ever surge back.

Side Note: This is why I always recommend checking changes in a staging environment prior to pushing them live. Letting your SEO review all changes before they hit the production site is a smart way to avoid potential disaster.


Summary – Don’t Botch Rel Canonical
I’ve always said that you need a solid SEO structure in order to rank well across engines. In my opinion, SEO technical audits are worth their weight in gold (and especially for larger-scale websites). Rel canonical is a great example of an area that can cause serious problems if not handled correctly. And it often lies below the surface, wreaking havoc by sending mixed signals to the engines.

My hope is that the scenarios listed above can help you identify, and then rectify canonical url problems riddling your website. The good news is that the changes are relatively easy to implement once you identify the problems. My advice is to keep rel canonical simple, send clear signals, and be consistent across your website. If you do that, good things can happen. And that’s exactly what you want SEO-wise.



Wednesday, November 26th, 2014

Panda Analysis Using Google Analytics Segments – How To Isolate Desktop, Mobile, and Tablet Traffic From Google

Segments in Google Analytics to Isolate Traffic

In previous posts about Panda analysis, I’ve mentioned the importance of understanding the content that users are visiting from Google organic. Since Google is measuring user engagement, hunting down those top landing pages can often reveal serious content quality problems.

In addition, I’ve written about understanding the devices being used to access your site from the search results. For example, what’s the breakdown of users by desktop, mobile, and tablets from Google organic? If 50% of your visits are from smartphones, then you absolutely need to analyze your site through that lens. If not, you can miss important problems that users are experiencing while visiting your website. And if left unfixed, those problems can lead to a boatload of horrible engagement signals being sent to Google. And that can lead to serious Panda problems.

Panda Help Via Segments in Google Analytics
So, if you want to analyze your content by desktop, mobile, and tablet users through a Panda lens, what's the best way to achieve that? Well, there's an incredibly powerful feature in Google Analytics that I find many webmasters simply don't use. It's called segmentation, and it enables you to slice and dice your traffic based on a number of dimensions or metrics.

Segments are non-destructive, meaning that you can apply them to your data without affecting the source of the data. Yes, that means you can't screw up your reporting. :) In addition, you can apply new segments to previous traffic (they work retroactively). So you can build a new segment today and apply it to traffic from six months ago, or longer.

For our purposes today, I’m going to walk you through how to quickly build three new segments. The segments will isolate Google organic traffic from desktop users, mobile users, and tablet users. Then I’ll explain how to use the new segments while analyzing Panda hits.


How To Create Segments in Google Analytics
When you fire up Google Analytics, the “All Sessions” segment is automatically applied to your reporting. So yes, you’ve already been using segments without even knowing it. If you click the “All Sessions” segment, you’ll see a list of additional segments you can choose.

Google Analytics All Sessions Segment

You might be surprised to see a number of segments have been built for you already. They are located in the “System” category (accessed via the left side links). For example, “Direct Traffic”, “AdWords”, “Organic Traffic”, and more.

Google Analytics System Segments


We are going to build custom segments by copying three system segments and then adding more dimensions. We’ll start by creating a custom segment for mobile traffic from Google organic.

1. Access the system segments by clicking “All Sessions” and then clicking the link labeled “System” (located on the left side of the UI).


Google Analytics System Segments


2. Scroll down and find the “Mobile Traffic” segment. To the far right, click the “Actions” dropdown. Then choose “Copy” from the list.


Copying a System Segment in Google Analytics


3. The segment already has "Device Category", "exactly matches", and "mobile" as the condition. We are going to add one more condition to the list, which is Google organic traffic. Click the "And" button on the far right. Then choose "Acquisition" and then "Source / Medium" from the dimensions list. Then choose "exactly matches" and select "google / organic" from the list. Note: autocomplete will list the top sources of traffic once you place your cursor in the text box.


Creating a Segment by Adding Conditions


4. Name your segment “Mobile Google Organic” by using the text box labeled “Segment Name” at the top of the window. It’s easy to miss.


Name a Custom Segment in Google Analytics


5. Then click “Save” at the bottom of the create segment window.


Save a Custom Segment in Google Analytics


Congratulations! You just created a custom segment.
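If you want to sanity-check exactly what a segment like this isolates, you can reproduce its two conditions against a raw export of session data. Here's a minimal sketch in Python using pandas. The data and column names are purely illustrative assumptions that mirror GA's "Device Category" and "Source / Medium" dimensions:

```python
import pandas as pd

# Hypothetical session-level export; column names mirror GA's
# "Device Category" and "Source / Medium" dimensions.
sessions = pd.DataFrame([
    {"deviceCategory": "mobile",  "sourceMedium": "google / organic",  "landingPage": "/a"},
    {"deviceCategory": "desktop", "sourceMedium": "google / organic",  "landingPage": "/b"},
    {"deviceCategory": "mobile",  "sourceMedium": "(direct) / (none)", "landingPage": "/a"},
])

# The same two conditions as the "Mobile Google Organic" segment:
# Device Category exactly matches "mobile" AND
# Source/Medium exactly matches "google / organic".
mobile_google_organic = sessions[
    (sessions["deviceCategory"] == "mobile")
    & (sessions["sourceMedium"] == "google / organic")
]

print(len(mobile_google_organic))  # 1 matching session
```

Note how both conditions are joined with AND, exactly like clicking the "And" button in the segment builder.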


Create The Tablet Traffic Segment
Now repeat the process listed above to create a custom segment for tablet traffic from Google organic. You will begin by copying the system segment for "Tablet Traffic". Then you will add a condition for Google organic as the source and medium.


Desktop Traffic (Not a default system segment.)
I held off on explaining the “Desktop Traffic” segment, since there’s an additional step in creating one. For whatever reason, there’s not a system segment for isolating desktop traffic. So, you need to create this segment differently. Don’t worry, it’s still easy to do.

We’ll start with the “Mobile Traffic” segment in the “System” list, copy it, and then refine the condition.

1. Click "All Sessions" and then find "Mobile Traffic" in the "System" list. Click "Actions" to the far right and then click "Copy".


Copying a System Segment in Google Analytics


2. The current condition is set for “Device Category” exactly matching “mobile”. We’ll simply change mobile to “desktop”. Delete “mobile” and start typing “desktop”. Then just select the word “desktop” as it shows up.

Creating a Desktop Segment in Google Analytics


3. Since we want desktop traffic from Google organic, we need to add another condition. You can do this by clicking "And" to the far right, selecting "Acquisition", and then "Source / Medium" from the dropdown. Then select "exactly matches" and enter "google / organic" in the text box. Remember, autocomplete will list the top sources of traffic as you start to type.


Creating a Google Organic Desktop Segment in Google Analytics


4. Name your segment “Desktop Google Organic” and then click “Save” at the bottom of the segment window to save your new custom segment.


Quickly Check Your Segments
OK, at this point you should have three new segments for Google organic traffic from desktop, mobile, and tablets. To ensure you have these segments available, click “All Sessions” at the top of your reporting, and click the “Custom” link on the left. Scroll down and make sure you have all three new segments. Remember, you named them “Desktop Google Organic”, “Mobile Google Organic”, and “Tablet Google Organic”.

If you have them, then you’re good to go. If you don’t, read through the instructions again and create all three segments.


Run Panda Reports by Segment
In the past, I’ve explained the importance of running a Panda report in Google Analytics for identifying problematic content. A Panda report isolates landing pages from Google organic that have dropped substantially after a Panda hit. Well, now that you have segments for desktop, mobile, and tablet traffic from Google organic, you can run Panda reports by segment.

For example, click “All Sessions” at the top of your reporting and select “Mobile Google Organic” from the “All” or “Custom” categories. Then visit your “Landing Pages” report under “Behavior” and “Site Content” in the left side menu in GA. Since you have a specific segment active in Google Analytics, the reporting you see will be directly tied to that segment (and filter out any other traffic).

Creating a Google Panda Report Using Custom Segments


Then follow the directions in my previous post to run and export the Panda report. You’ll end up with an Excel spreadsheet highlighting top landing pages from mobile devices that dropped significantly after the Panda hit. Then you can dig deeper to better understand the content quality (or engagement) problems impacting those pages.
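The spreadsheet step can also be scripted. Here's a rough sketch (not the exact report from my previous post) that compares two exports of the segmented Landing Pages report, one pre-hit and one post-hit. All column names, numbers, and the 50% drop threshold are illustrative assumptions:

```python
import pandas as pd

# Illustrative pre/post exports of the Landing Pages report,
# run with the "Mobile Google Organic" segment applied.
pre = pd.DataFrame({"landingPage": ["/a", "/b", "/c"], "sessions": [1000, 500, 50]})
post = pd.DataFrame({"landingPage": ["/a", "/b", "/c"], "sessions": [300, 480, 45]})

# Join the two exports on landing page and compute the percent change.
report = pre.merge(post, on="landingPage", suffixes=("_pre", "_post"))
report["pct_change"] = (
    (report["sessions_post"] - report["sessions_pre"]) / report["sessions_pre"] * 100
)

# Flag pages that dropped substantially (the -50% cutoff is arbitrary here).
dropped = report[report["pct_change"] <= -50].sort_values("pct_change")
print(dropped["landingPage"].tolist())  # ['/a']
```

The pages that survive the filter are your candidates for deeper content quality analysis.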

Combine with Adjusted Bounce Rate (ABR)
User engagement matters for Panda. I’ve documented that point many times in my previous posts about Panda analysis, remediation, and recovery. The more poor engagement signals you send Google, the more bamboo you are building up. And it’s only a matter of time before Panda comes knocking.

So, when analyzing user engagement, many people jump to the almighty Bounce Rate metric to see what’s going on. But here’s the problem. Standard Bounce Rate is flawed. Someone could spend five minutes reading a webpage on your site, leave, and it’s considered a bounce. But that’s not how Google sees it. That would be considered a “long click” to Google and would be absolutely fine.

And this is where Adjusted Bounce Rate shines. If you aren’t familiar with ABR, then read my post about it (including how to implement it). Basically, Adjusted Bounce Rate takes time on page into account and can give you a much stronger view of actual bounce rate. Once you implement ABR, you can check bounce rates for each of the segments you created earlier (and by landing page). Then you can find high ABR pages by segment (desktop, mobile, and tablet traffic).
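To make the difference between the two metrics concrete, here's a minimal sketch of the math behind an adjusted bounce rate, computed over hypothetical session data. The 30-second cutoff mirrors a common ABR setup, but the threshold is entirely configurable (my ABR post covers the actual tracking implementation):

```python
# Each tuple: (pageviews_in_session, seconds_on_page). Illustrative data.
sessions = [(1, 5), (1, 45), (1, 300), (2, 20), (1, 10)]

THRESHOLD = 30  # seconds; a common ABR cutoff, but an assumption here

# Standard bounce: any single-page session, even a five-minute read.
standard_bounces = sum(1 for pv, t in sessions if pv == 1)

# Adjusted bounce: single-page session AND under the time threshold,
# so a long engaged visit to one page no longer counts as a bounce.
adjusted_bounces = sum(1 for pv, t in sessions if pv == 1 and t < THRESHOLD)

total = len(sessions)
print(f"Standard bounce rate: {standard_bounces / total:.0%}")
print(f"Adjusted bounce rate: {adjusted_bounces / total:.0%}")
```

Notice how the 300-second session counts as a bounce under the standard metric but not under ABR, which is much closer to how Google would treat a "long click".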

Combining Adjusted Bounce Rate with Custom Segments


Check Devices By Segment (Smartphones and Tablets)
In addition to running a Panda report, you can also check the top devices being used by people searching Google and visiting your website. Then you can analyze that data to see if there are specific problems per device. And if it’s a device that’s heavily used by people visiting your site from Google organic, then you could uncover serious problems that might lie undetected by typical audits.

GA’s mobile reporting is great, but the default reporting is not by traffic source. But using your new segments, you could identify top devices by mobile and tablet traffic from Google organic. And that’s exactly what you need to see when analyzing Panda hits.

Analyzing Devices with Custom Segments in Google Analytics

For example, imagine you saw very high bounce rates (or adjusted bounce rates) for iPad users visiting from Google organic. Or maybe your mobile segment reveals very low engagement from Galaxy S5 users. You could then test your site on those specific devices to uncover rendering problems, usability problems, etc.
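To illustrate the device check, here's a minimal sketch that computes bounce rate per device from a hypothetical export of the mobile segment. The device names, data, and both thresholds are purely illustrative:

```python
import pandas as pd

# Illustrative session export for the "Mobile Google Organic" segment.
sessions = pd.DataFrame({
    "device": ["Apple iPad", "Apple iPad", "Galaxy S5",
               "Galaxy S5", "Galaxy S5", "Nexus 5"],
    "bounced": [True, False, True, True, True, False],
})

# Bounce rate (mean of the boolean column) and session count per device.
by_device = sessions.groupby("device")["bounced"].agg(["mean", "count"])
by_device = by_device.rename(columns={"mean": "bounce_rate", "count": "sessions"})

# Devices worth testing by hand: high bounce rate on meaningful volume.
suspects = by_device[(by_device["bounce_rate"] >= 0.75) & (by_device["sessions"] >= 3)]
print(suspects.index.tolist())  # ['Galaxy S5']
```

The devices that surface are the ones to load your site on and check for rendering or usability problems.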


Summary – Isolate SEO Problems Via Google Analytics Segments
After reading this post, I hope you are ready to jump into Google Analytics to create segments for desktop, mobile, and tablet traffic from Google organic. Once you do, you can analyze all of your reporting through the lens of each segment. And that can enable you to identify potential problems impacting your site from a Panda standpoint. I recommend setting up those segments today and digging into your reporting. You might just find some amazing nuggets of information. Good luck.



Monday, October 27th, 2014

Penguin 3.0 Analysis – Penguin Tremors, Recoveries, Fresh Hits, and Crossing Algorithms

Penguin 3.0 Analysis and Findings

Oct 17, 2014 was an important date for many SEOs, webmasters, and business owners. Penguin, which we had been waiting on for over a year, started to roll out. Google's Gary Illyes explained at SMX East that Penguin 3.0 was imminent, that it would be a "delight" for webmasters, that it would be a new algorithm, and more. So we all eagerly awaited the arrival of Penguin 3.0.

There were still many questions about the next version of Penguin. For example, why had it taken so long to update Penguin? Would there be collateral damage? Would it actually have new signals? Would it roll out more frequently? So when we saw the first signs of Penguin rolling out, many of us dug in and began to analyze both recoveries and fresh hits. I had just gotten back from SES Denver, where I was presenting about Panda and Penguin, so the timing was interesting to say the least. :)

Since the algorithm is rolling out slowly, I needed enough time and data to analyze the initial update, and then subsequent tremors. And I’m glad I waited ten days to write a post, since there have been several interesting updates already. Now that we’re ten days into the rollout, and several tremors have occurred, I believe I have enough data to write my first post about Penguin 3.0. And it’s probably the first of several as Penguin continues to roll out globally.

“Mountain View, We Have a Problem”
Based on the long delay of Penguin, it was clear that Google was having issues with the algo. Nobody knows exactly what the problems were, but you can guess that the results during testing were less than optimal. Up to now, the signature of the Penguin algorithm has been extremely narrow: it targeted spammy inbound links on low quality websites. Compare that to an extremely complex algorithm like Panda, and you can see clear differences…

But Panda is about on-site content, which makes it less susceptible to tampering. Penguin, on the other hand, is about external links. And those links can be manipulated. The more Penguin updates that rolled out, the more data you could gain about its signature. And that can lead to very nasty things happening. For example, launching negative SEO campaigns, adding any website to a host of low quality sites that have been previously impacted by Penguin, etc. All of that can muddy the algorithm waters, which can lead to a lot of collateral damage. I won’t harp on negative SEO in this post, but I wanted to bring it up. I do believe that had a big impact on why Penguin took so long to roll out.

My Goal With This Post
I’m going to quickly provide bullets listing what we know so far about Penguin 3.0 and then jump to my findings based on the first ten days of the rollout. I want to explain what I’ve seen in the Penguin trenches, including recoveries, fresh hits, and other interesting tidbits I’ve seen across my travels. In addition, I want to explain the danger of crossing algorithms, which is going on right now. I’ll explain more about Penguin, Panda, and Pirate all roaming the web at the same time, and the confusion that can cause. Let’s dig in.

Here’s what we know so far about Penguin 3.0:

  • Penguin 3.0 started rolling out on 10/17 and was officially announced on 10/21.
  • It’s a global rollout.
  • It’s a refresh and not an update. New signals have not been added. You can read more about the differences between a refresh and update from Marie Haynes.
  • It will be a slow and steady rollout that can take weeks to complete. More about Penguin tremors soon.
  • There was more international impact initially. Then I saw an uptick in U.S. impact during subsequent Penguin tremors.
  • Google has been very quiet about the update. That’s a little strange given the magnitude of Penguin 3.0, how long we have waited, etc. I cover more about the future of Penguin later in this post.


10 Days In – Several Penguin Tremors Already
We are now ten days into the Penguin 3.0 rollout. Based on the nature of this update, I didn't want to write a post too quickly. I wanted more data and the ability to track many sites during the rollout in order to gauge the impact, fresh hits, and recoveries. And that's exactly what I've done since early Saturday, October 18. Penguin began rolling out the night before and there's been a lot of movement since then.

When Penguin first rolled out, it was clear to me that it would be a slow and steady rollout. I said that from the beginning. I knew there was potential for disaster (from Google's standpoint), so there was no way they would roll it out globally all at once. Instead, I believed they would start rolling out Penguin, heavily analyze the SERPs, adjust the algo where needed, and then push more updates and expand. If you've been following my writing over the past few years, then you know I call this phenomenon "tremors". I have seen this often with Panda, and especially since Panda 4.0. Those tremors were even confirmed by Google's John Mueller.

Specifically with Penguin, I have seen several tremors since the initial rollout on 10/17. There was significant movement on 10/22, and then I saw even more movement on 10/24. Some sites seeing early recovery saw more impact during the subsequent tremors, while other sites saw their first impact from Penguin during those later tremors.

For example, one client I helped with both Panda and Penguin jumped early on Friday 10/24. You can see their trending below. They are up 48% since Friday:

Penguin 3.0 Recovery During Tremor

That’s awesome, and was amazing to see (especially for the business owner). They have worked very hard over the past year to clean up the site on several fronts, including content, links, mobile, etc. It’s great to see that hard work pay off via multiple algorithm updates (they recovered from Panda in May during Panda 4.0 and now during Penguin 3.0.) It’s been a good year for them for sure. :)

Moving forward, I fully expect to see more tremors as the global rollout continues. That can mean sites seeing fresh impact, while others see more movement beyond the first date that Penguin 3.0 impacted their sites. For example, a site may recover or get hit on 10/17, but see movement up or down during subsequent tremors. We’ve already seen this happen and it will continue throughout the rollout.

More Recoveries During Penguin 3.0
For those battling Penguin for a long time (some since Penguin 2.0 on May 22, 2013), this was a much-anticipated update. Some companies I’ve been helping have worked hard over the past 12-18 months to clean up their link profiles. That means nuking unnatural links and using the disavow tool heavily to rid their site of spammy links.

For those of you unfamiliar with link cleanup, the process is tedious, painful, and time-consuming. And of course, you can have the nasty replicating links problem, which I have seen many times with spammy directories. That's when unnatural links replicate across other low quality directories. Websites I've been helping with this situation must continually analyze and clean their link profiles. You simply can't get rid of the problem quickly or easily. It's a nasty reminder to never go down the spammy linkbuilding path again.

For example, here’s a site that had hundreds of spammy links pop up in the fall of 2014. They had no idea this was going on… 

Penguin 3.0 and New Spammy Links


When sites that have been working hard to rectify their link problems experience a Penguin recovery, it’s an amazing feeling. Some of the sites I’ve been helping have seen a nice bounce-back via Penguin 3.0. I’ll quickly cover two of those recoveries below.

The first is an ecommerce retailer that unfortunately took a dangerous path a few years ago. They hired several SEO companies over a number of years and each ended up building thousands of spammy links. It's a similar story that's been seen many times since Penguin first arrived. You know, an SMB tries to compete in a tough space, ends up following the wrong strategy, does well in the short term, and then gets pummeled by Penguin.

The site was not in good shape when they first contacted me. So we tackled the unnatural link profile head on. I heavily analyzed their link profile, flagged many spammy links, they had a small team working on link removals, and whatever couldn’t be removed was disavowed. We updated the disavow file several times over a four to five month period.

But, and this is a point too many Penguin victims will be familiar with, we were done with link cleanup work in the spring of 2014! Yes, we had done everything we could, but simply needed a Penguin refresh or update. Surely that would happen soon, right?… No way. We had to wait until October 17, 2014 for that to happen. The good news is that this site saw positive impact immediately. You can see the increase in impressions and clicks below starting on 10/17. And Google organic traffic is up 52% since Penguin rolled out.

Penguin 3.0 Recovery on 10/17/14


The next recovery I’ll quickly explain started on 10/17 and saw subsequent increases during the various Penguin tremors I mentioned earlier. They saw distinct movement on 10/17, 10/22, and then 10/25. The site saw a pretty big hit from Penguin 2.0 and then another significant hit from Penguin 2.1 (where Google turned up the dial). The website’s link profile was riddled with exact match anchor text from low quality sites.

The site owner actually removed or nofollowed a good percentage of unnatural links. You can see the impact below. Notice the uptick in trending during the various tremors I mentioned.

Penguin 3.0 Recovery During Tremors


A Reality Check – Some Websites Left Hanging But Rollout Is Not Complete
I must admit, though, I know of several companies that worked hard just like the ones listed above and should see at least some recovery from Penguin 3.0, but haven't yet. They cleaned up their link profiles, heavily used the disavow tool, and worked tirelessly to fix their Penguin problems, yet have seen no impact so far. And many other companies have been complaining about the same thing. But again, Google said the full rollout could take weeks to complete… so it's entirely possible that they will recover at some point over the next few weeks.


A Note About Disavow Errors
It’s worth noting that one client of mine battling Penguin made a huge mistake leading up to Penguin 3.0. They decided to update their disavow file in late September (without my help), and the file contained serious errors. They didn’t catch that upon submission. I ended up noticing something strange in the email from Google Webmaster Tools regarding the number of domains being disavowed. The total number of domains being recorded by GWT was a few hundred less than what was listed in the disavow file prior to the latest submission. And those extra few hundred domains encompass thousands of spammy links. I contacted my client immediately and they rectified the disavow file errors quickly and re-uploaded it.

The website has not recovered yet (although it absolutely should to some level). I have no idea if that disavow glitch threw off Penguin, or if this site is simply waiting for a Penguin tremor to recover. But it’s worth noting.


Fresh Penguin Hits
Now let’s move to the negative side of Penguin 3.0. There have been many fresh hits since 10/17 and I’ve been heavily analyzing those drops. It didn’t take long to see that the same old link tactics were being targeted (similar to previous versions of Penguin). And my research supports that Penguin 3.0 was a refresh and not a new algorithm.

For example, exact match anchor text links from spammy directories, article marketing, comment spam, forum spam, etc. Every fresh hit I analyzed yielded a horrible link profile using these tactics. These were clear Penguin hits… I could tell just by looking at the anchor text distribution that they were in serious Penguin danger.

For example, here’s the anchor text distribution for a site hit by Penguin 3.0. Notice all of the exact match anchor text?

Anchor Text Distribution for Fresh Penguin 3.0 Hit

For those of you new to SEO, this is not what a natural link profile looks like. Typically, there is little exact match anchor text; brand terms show up heavily, bare URLs are used to link to pages, generic phrases appear, etc. If your top twenty anchor text terms are filled with exact match or rich anchor text, then you are sending "fresh fish" signals to Google. And Google will respond by sending a crew of Penguins your way. The end result will not be pretty.
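One quick way to gut-check a profile is to compute what share of anchors are exact match money terms. Here's a minimal sketch over an exported anchor text list. The anchors and target keywords are entirely made up for illustration:

```python
from collections import Counter

# Illustrative anchor text export for a backlink profile.
anchors = [
    "cheap blue widgets", "cheap blue widgets", "cheap blue widgets",
    "buy blue widgets", "buy blue widgets",
    "Acme Widgets", "www.acme-widgets.com", "click here",
]

counts = Counter(anchors)

# Hypothetical list of money keywords the site targets.
exact_match_terms = {"cheap blue widgets", "buy blue widgets"}

# Share of all anchors that are exact match money terms.
exact_match_share = sum(counts[t] for t in exact_match_terms) / len(anchors)
print(f"Exact match share: {exact_match_share:.1%}")
```

In this toy profile, exact match terms make up well over half of all anchors: the kind of distribution that screams Penguin food. A natural profile would be dominated by brand terms and bare URLs instead.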

Hit Penguin 3.0


Crazy Gets Crazier
I must admit that some fresh hits stood out, and not in a good way. For example, I found one site that started its spammy linkbuilding just two days after Penguin 2.1 rolled out in October of 2013! Holy cow… the business owner didn’t waste any time, right? Either they didn’t know about Penguin or they were willing to take a huge risk. Regardless, that site got destroyed by Penguin 3.0.

I could keep showing you fresh hit information, but unfortunately, you would get bored. They all look similar… spammy links from low quality sites using exact match anchor text. Many of the hits I analyzed were Grade-A Penguin food. It’s like the sites lobbed a softball at Penguin, and Google knocked it out of the park.


Next Update & Frequency?
At SMX East, Gary Illyes explained that the new Penguin algorithm was structured in a way where Google could update Penguin more frequently (similar to Panda). All signs point to a refresh with Penguin 3.0, so I’m not sure we’ll see Penguin updating regularly (beyond the rollout). That’s unfortunate, since we waited over one year to see this refresh…

Also, John Mueller was asked during a webmaster hangout if Penguin would update more frequently. He responded that the "holiday season is approaching and they wouldn't want to make such a fuss". If that's the case, then we are looking at January as the earliest date for the next Penguin refresh or update. So, we have a minimum of three to four months before we see one. And it could very well take longer, given Google's track record with the Penguin algorithm. It wouldn't shock me to see the next update in the spring of 2015.

Check John’s comments at 46:45:


Important – The Crossing of Algorithm Updates (Penguin, Panda, and Pirate)
In the past, I have explained the confusion that can occur when Google rolls out multiple algorithm updates around the same time. The algorithm sandwich from April of 2012 is a great example: Google rolled out Panda, Penguin, and then another Panda refresh, all within ten days. It caused massive confusion and some sites were even hit by both algos. I called that "Pandeguin" and wrote about it here.

Well, we are seeing that again right now. Penguin 3.0 rolled out on 10/17, the latest version of Pirate rolled out late last week, and I’m confident we saw a Panda tremor starting late in the day on Friday 10/24. I had several clients dealing with Panda problems see impact late on 10/24 (starting around 5PM ET).

A bad Panda hit starting late on 10/24:

When Panda and Penguin Collide
A big Panda recovery starting at the same time: 

When Panda and Penguin Collide


I can see the Panda impact based on the large amount of Panda data I have access to (across sites, categories, and countries). But the average business owner does not have access to that data. And Google will typically not confirm Panda tremors. So, if webmasters saw impact on Friday (and I’m sure many have), then serious confusion will ensue. Were they hit by Penguin, Panda, or for some sites dealing with previous DMCA issues, was it actually Pirate?

Update: I now have even more data backing a Panda tremor late on 10/24. Paul Macnamara and Michael Vittori have both explained that they are seeing the same thing. They also provided screenshots of trending for both sites. You can see in Michael's that the site got hit during the 9/5 Panda update, but recovered on Friday. Paul's screenshot shows a clear uptick on 10/25 on a site impacted by Panda (no Penguin or Pirate impact at all).
Another Panda recovery during the 10/24 tremor.


Another Panda recovery during the 10/24 tremor.

And this underscores a serious problem for the average webmaster. If you work on fixing your site based on the wrong algorithm, then you will undoubtedly spin your SEO wheels. I've seen this many times over the years, and spinning wheels do nothing but waste money, time, and resources.

If you saw impact this past week, you need to make sure you know which algorithm update impacted your site. That's not easy when three external algos are roaming the web at the same time. But it's important to analyze your situation and your search history, and determine what you need to do in order to recover.

A Note About Negative SEO
I couldn’t write a post about Penguin 3.0 without mentioning negative SEO. The fear with this latest update was that negative SEO would rear its ugly head. Many thought that the heavy uptick in companies building spammy links to their competitors would cause serious collateral damage.

Theoretically, that can definitely happen (and there are a number of claims of negative SEO since 10/17). Let’s face it, Penguin’s signature is not complicated to break down. So if someone built spammy links to their competitors on sites targeted by Penguin, then those sites could possibly get hit by subsequent Penguin refreshes. Many in the industry (including myself) believe this is one of the reasons it has taken so long for Google to roll out Penguin 3.0. I’m sure internal testing revealed serious collateral damage.

But here’s the problem with negative SEO… it’s very hard to prove that NSEO is the culprit (for most sites). I’ve received many calls since Penguin first rolled out in 2012 with business owners claiming they never set up spammy links that got them hit. But when you dig into the situation, you can often trace the spammy link trail back to someone tied to the company.

That might be a marketing person, agency, SEO company, PR agency, intern, etc.  You can check out my Search Engine Watch column titled Racing Penguin to read a case study of a company that thought negative SEO was at work, when in fact, it was their own PR agency setting up the links. So, although we’ve heard complaints of negative SEO with Penguin 3.0, it’s hard to say if those are accurate claims.

Negative SEO and Penguin 3.0


Penguin 3.0 Impact – What Should You Do Next?

  • If you have been negatively impacted by Penguin 3.0, my advice remains consistent with previous Penguin hits. You need to download all of your inbound links from a number of sources, analyze those links, flag unnatural links, and then remove/disavow them. Then you need to wait for a Penguin refresh or update. That can be months from now, but I would start soon. You never know when the next Penguin update will be…
  • On the flip side, if you have just recovered from a Penguin hit, then you should create a process for checking your links on a monthly basis. Make sure new spammy links are not being built. I have seen spammy links replicate in the past… so it’s important to fully understand your latest links. I wrote a blog post covering how to do this on Search Engine Watch (linked to above). I recommend reading that post and implementing the monthly process.
  • And if you are unsure of which algorithm update impacted your site, then speak with as many people familiar with algo updates as possible. You need to make sure you are targeting the right one with your remediation plan. But as I mentioned earlier, there are three external algos in the wild now (with Penguin, Panda, and Pirate). This inherently brings a level of confusion for webmasters seeing impact.


Summary – Penguin 3.0 and Beyond
That’s what I have for now. Again, I plan to write more posts soon about the impact of Penguin 3.0, the slow and steady rollout, interesting cases that surface, and more. In the meantime, I highly recommend analyzing your reporting heavily over the next few weeks. And that’s especially the case since multiple algos are running at the same time. It’s a crazy situation, and underscores the complexity of today’s SEO environment. So strap on your SEO helmets, grab a bottle of Tylenol, and fire up Google Webmaster Tools. It’s going to be an interesting ride.




Monday, September 29th, 2014

Panda 4.1 Analysis and Findings – Affiliate Marketing, Keyword Stuffing, Security Warnings, and Deception Prevalent

Panda 4.1 Analysis and Findings

On Tuesday, September 23, Google began rolling out a new Panda update. Pierre Far from Google announced the update on Google+ (on Thursday) and explained that some new signals have been added to Panda (based on user and webmaster feedback). The latter point is worth its own blog post, but that's not the focus of my post today. Pierre explained that the new Panda update will result in a "greater diversity of high-quality small- and medium-sized sites ranking higher". He also explained that the new signals will "help Panda identify low-quality content more precisely".

I first spotted the update late on 9/23 when some companies I have been helping with major Panda 4.0 hits absolutely popped. They had been working hard since May of 2014 on cleaning up their sites from a content quality standpoint, dealing with aggressive ad tactics, boosting credibility on their sites, etc. So it was amazing to see the surge in traffic due to the latest update.

Here are two examples of recovery during Panda 4.1. Both clients have been making significant changes over the past several months:

Panda 4.1 Recovery

Panda 4.1 Recovery Google Webmaster Tools

As a side note, two of my clients made the Searchmetrics winners list, which was released on Friday. :)

A Note About 4.1
If you follow me on Twitter, then you already know that I hate using the 4.1 tag for this update. I do a lot of Panda work and have access to a lot of Panda data. That enables me to see unconfirmed Panda updates (and tremors).  There have been many updates since Panda 4.0, so this is not the only Panda update since May 20, 2014. Not even close actually.

I’ve written heavily about what I called “Panda tremors”, which was confirmed by John Mueller of Google. Also, I’ve done my best to write about subsequent Panda updates I have seen since Panda 4.0 here on my blog and on my Search Engine Watch column. By the way, the latest big update was on 9/5/14, which impacted many sites across the web. I had several clients I’ve been helping with Panda hits recover during the 9/5 update.

My main point here is that 4.1 should be called something else, like 4.75. :) But since Danny Sullivan tagged it as Panda 4.1, and everybody is using that number, then I’ll go with it. The name isn’t that important anyway. The signature of the algo is, and that’s what I’m focused on.


Panda 4.1 Analysis Process
When major updates get rolled out, I tend to dig in full blast and analyze the situation. And that’s exactly what I did with Panda 4.1. There were several angles I took while analyzing P4.1, based on the recoveries and fresh hits I know of (and have been part of).

So, here is the process I used, which can help you understand how and why I came up with the findings detailed in this post.

1. First-Party Known Recoveries
These are recoveries I have been guiding and helping with. They are clients of mine and I know everything that was wrong with their websites, content, ad problems, etc. And I also know how well changes were implemented, if they stuck, how user engagement changed during the recovery work, etc. And of course, I know the exact level of recovery seen during Panda 4.1.

2. Third-Party Known Recoveries
These are sites I know recovered, but that I'm not working with directly. Therefore, I use third-party tools to identify increases in rankings, which landing pages jumped in the rankings, etc. Then I analyze those sites to better understand the content that's currently surging, while also checking the previous Panda-driven drops to understand their initial problems.

3. First-Party Known Fresh Hits
Based on the amount of Panda work I do, I often have a number of companies reach out to me with fresh Panda hits. Since these are confirmed Panda hits (large drops in traffic starting when P4.1  rolled out), I can feel confident that I’m reviewing a site that Panda 4.1 targeted. Since Tuesday 9/23, I have analyzed 21 websites (Update: now 42 websites) that have been freshly hit by Panda 4.1. And that number will increase by the end of this week. More companies are reaching out to me with fresh Panda hits… and I’ve been neck deep in bamboo all weekend.

4. Third-Party Unconfirmed Fresh Hits
During my analysis, I often come across other websites in a niche with trending that reveals a fresh Panda hit. Now, third party tools are not always accurate, so I don't hold as much confidence in those fresh hits. But by digging into them, identifying the lost rankings, the landing pages that were once ranking, the overall quality of the site, etc., I can often identify serious Panda candidates (sites that should have been hit). I have analyzed a number of these third-party unconfirmed fresh hits during my analysis over the past several days.


Panda 4.1 Findings
OK, now that you have a better understanding of how I came up with my findings, let’s dig into actual P4.1 problems. I’ll start with a note about the sinister surge and then jump into the findings. Also, it’s important to understand that not all of the sites were targeted by new signals. There are several factors that can throw off identifying new signals, such as when the sites were started, how the sites have changed over time, how deep in the gray area of Panda they were, etc. But the factors listed below are important to understand, and avoid. Let’s jump in.


Sinister Surge Reared Its Ugly Head
Last year I wrote a post on Search Engine Watch detailing the sinister surge in traffic prior to an algorithm hit. I had seen that phenomenon so many times since February of 2011 that I wanted to make sure webmasters understood this strange, but deadly, situation. After I wrote that post, I had many people contact me explaining they had seen the exact same thing. So yes, the surge is real, it's sinister, and it's something I saw often during my latest analysis of Panda 4.1.

By the way, the surge is sinister since most webmasters think they are surging in Google for the right reasons, when in fact, Google is dishing out more traffic to problematic content and gaining a stronger feel for user engagement. And if you have user engagement problems, then you are essentially feeding the mighty Panda “Grade-A” bamboo. It’s not long after the surge begins that the wave crashes and traffic plummets.

Understanding the surge now isn't something that can help Panda 4.1 victims (since they have already been hit). But this can help anyone out there that sees the surge and wonders why it is happening. If you question content quality on your website, your ad situation, user engagement, etc., and you see the surge, deal with it immediately. Have an audit completed, check your landing pages from Google organic, your adjusted bounce rate, etc. Make sure users are happy. If they aren't, then Panda will pay you a visit. And it won't be a pleasant experience.

The Sinister Surge Before Panda Strikes


Affiliate Marketers Crushed
I analyzed a number of affiliate websites that got destroyed during Panda 4.1. Now, I've seen affiliate marketers get pummeled by previous Panda updates for a long time, so it's interesting that some affiliate sites that had been around for a while (some since 2012) only just got hit by Panda 4.1.

For example, there were sites with very thin content ranking for competitive keywords while their primary purpose was driving users to partner websites (like Amazon and other ecommerce sites). The landing pages only held a small paragraph up top and then listed affiliate links to Amazon (or other partner websites). Many of the pages did not contain useful information and it was clear that the sites were gateways to other sites where you could actually buy the products. I’ve seen Google cut out the middleman a thousand times since February of 2011 when Panda first rolled out, and it seems Panda 4.1 upped the aggressiveness on affiliates.

I also saw affiliate sites that had pages ranking for target keywords, but when you visited those pages the top affiliate links were listed first, pushing down the actual content that users were searching for. So when you are looking for A, but hit a page containing D, E, F, and G, with A being way down the page, you probably won’t be very happy. Clearly, the webmaster was trying to make as much money as possible by getting users to click through the affiliate links. Affiliate problems plus deception is a killer combination. More about deception later in the post.

Panda 4.1 and Affiliate Marketing

Affiliates with Blank and/or Broken Pages
I came across sites with top landing pages from Google organic that were broken or blank. Talk about a double whammy… the sites were at risk already with pure affiliate content. But driving users to an affiliate site with pages that don’t render or break is a risky proposition for sure. I can tell you with almost 100% certainty that users were quickly bouncing back to the search results after hitting these sites. And I’ve mentioned many times before how low dwell time is a giant invitation to the mighty Panda.

Blank Affiliate Pages and Panda 4.1

Doorway Pages + Affiliate Are Even Worse
I also analyzed several sites hit by Panda 4.1 that held many doorway pages (thin pages over-optimized for target keywords). And once you hit those pages, there were affiliate links weaved throughout the content. So there were two problems here. First, you had over-optimized pages, which can get you hit. Second, you had low-quality affiliate pages that jumped users to partner websites to take action. That recipe clearly caused the sites in question to get hammered.  More about over-optimization next.


Keyword Stuffing and Doorway Pages
There seemed to be a serious uptick in sites employing keyword stuffing hit by Panda 4.1. Some pages were completely overloaded in the title tag, metadata, and in the body of the page. In addition, I saw several examples of sites using local doorway pages completely over-optimized and keyword stuffed.

For example, using {city} + {target keyword} + {city} + {second target keyword} + {city} + {third target keyword} in the title. And then using those keywords heavily throughout the page.

And many of the pages did not contain high quality content. Instead, they were typically thin, without useful information. Actually, some contained just an image with no copy. And then there were pages with duplicate content, just targeted to a different geographic location.

The websites I analyzed were poorly written, hard to read through, and most people would probably laugh off the pages as being written for search engines. I know I did. The days of stuffing pages and metadata with target keywords are long gone. And it's interesting to see Panda 4.1 target a number of sites employing this tactic.
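
To make the doorway pattern concrete, here's a crude, illustrative check (not any tool Google uses) that counts how many times a city name and target keywords repeat in a title tag. The titles, city, and keywords below are hypothetical examples:

```python
import re

def doorway_title_score(title, city, keywords):
    """Crude keyword-stuffing check for a location doorway title.

    Counts how many times the city name and each target keyword
    appear; legitimate titles rarely repeat them more than once.
    """
    text = title.lower()
    city_hits = len(re.findall(re.escape(city.lower()), text))
    keyword_hits = sum(
        len(re.findall(re.escape(k.lower()), text)) for k in keywords
    )
    return city_hits + keyword_hits

# A stuffed doorway title vs. a reasonable one (hypothetical):
stuffed = "Chicago Plumber | Plumbing Chicago | Chicago Drain Cleaning"
clean = "Smith & Sons: Plumbing and Drain Cleaning in Chicago"
kws = ["plumber", "plumbing", "drain cleaning"]

print(doorway_title_score(stuffed, "Chicago", kws))  # stuffed scores higher
print(doorway_title_score(clean, "Chicago", kws))
```

A high score isn't proof of spam on its own, but titles that repeat the city three times are exactly the kind of thing a human reviewer would laugh off.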

Panda 4.1 and Keyword Stuffing

Panda 4.1 and Keyword Density

Side Note About Human Beings:
It's worth reiterating something I often tell Panda victims I'm helping. Actually, I just mentioned this in my latest Search Engine Watch column (which coincidentally went live the day after P4.1 rolled out!). Have neutral third parties go through your website and provide feedback. Most business owners are too close to their own sites, content, ad setup, etc. Real people can provide real feedback, and that input could save your site from a future Panda hit.

Ad Problems and Deception
I analyzed several sites hit by Panda 4.1 with serious ad problems. For example, floating ads throughout the content that weren't organized in any way, ads blended with the content so closely that it was hard to decipher what was an ad and what was content, etc.

I mentioned deception in the past, especially when referring to Panda 4.0, but I saw this again during 4.1. If you are running ads heavily on your site, then you absolutely need to make sure there is clear distinction between content and ads. If you are blending them so closely that users mistakenly click ads thinking they are content, then you are playing Russian roulette with Panda.

Panda 4.1 and Deception

Users hate being deceived, and it can lead to them bouncing off the site, reporting your site to organizations focused on security, or to Google itself. They can also publicly complain to others via social networks, blogging, etc. And by the way, Google can often pick that up too (if those reviews and complaints are public). And if that happens, then you can absolutely get destroyed by Panda. I've seen it many times over the years, while seeing it more and more since Panda 4.0.

Deception is bad. Do the right thing. Panda is always watching.


Content Farms Revisited
I can’t believe I came across this in 2014, but I did. I saw several sites that were essentially content farms that got hammered during Panda 4.1. They were packed with many (and sometimes ridiculous) how-to articles. I think many people in digital marketing understand that Panda was first created to target sites like this, so it’s hard to believe that people would go and create more… years after many of those sites had been destroyed. But that’s what I saw!

To add to the problems, the sites had barebones designs, were disorganized, wove ads and affiliate links throughout the content, etc. Some even copied how-to articles (or just the steps) from other prominent websites.

Now, to be fair to Google, several of the sites were started in 2014, so Google needed some time to better understand user engagement, the content, the ad situation, etc. But here's the crazy thing. Two of those sites surged with Panda 4.0. My reaction: "Whhaatt??" Yes, the sites benefitted somehow during the massive May 20 update. That's a little embarrassing for Google, since it's clearly not the type of content they want rising in the rankings…

Incorrect Panda 4.0 Surge

But that was temporary, as Panda 4.1 took care of the sites (although late in my opinion). So, if you are thinking about creating a site packed with ridiculous how-to articles, think again. And it goes without saying that you shouldn’t copy content from other websites. The combination will surely get you hit by Panda. I just hope Google is quicker next time with the punishment.

Security Warnings, Popup Ads, and Forced Downloads
There were several sites I analyzed that had been flagged by various security and trust systems. For example, several sites were flagged as providing adware or spyware, or containing viruses. I also saw several of the sites launching egregious popups when users first hit the site, forcing downloads, etc.

And when Panda focuses on user engagement, launching aggressive popups and forcing downloads is like hanging fresh bamboo in the center of your website and ringing the Panda dinner bell. First, users hate popups, especially when they're the first impression of your site. Second, users are fearful of any downloads, let alone ones you are forcing them to execute. And third, security warnings in Firefox, Chrome, antivirus applications, WOT, etc. are not going to help matters.

Trust and credibility are important factors for avoiding Panda hits. Cross the line and you can send strong signals to Google that users are unhappy with your site. And bad things typically ensue.

Panda 4.1 Security Problems

Next Steps:
Needless to say, Panda 4.1 was a big update and many sites were impacted. Just like Panda 4.0, I’ve seen some incredible recoveries during 4.1, while also seeing some horrible fresh hits. Some of my clients saw near-full recoveries, while other sites pushing the limits of spamming got destroyed (dropping by 70%+).

I have included some final bullets below for those impacted by P4.1. My hope is that victims can begin the recovery process, while those seeing recovery can make sure the surge in traffic remains.

  • If you have been hit by Panda 4.1, then run a Panda report to identify top content that was negatively impacted. Analyzing that content can often reveal glaring problems.
  • Have an audit conducted. They are worth their weight in gold. Some webmasters are too close to their own content to objectively identify problems that need to be fixed.
  • Have real people go through your website and provide real feedback. Don’t accept sugarcoated feedback. It won’t help.
  • If you have recovered, make sure the surge in traffic remains. Follow the steps listed in my latest Search Engine Watch column to make sure you aren’t feeding Google the same (or similar) problems that got you hit in the first place.
  • Understand that Panda recovery takes time. You need to first make changes, then Google needs to recrawl those changes (over time), and then Google needs to measure user engagement again. This can take months. Be patient.
  • Understand that there isn’t a silver Panda bullet. I usually find a number of problems contributing to Panda attacks during my audits. Think holistically about user engagement and then factor in the various problems surfaced during an audit.
  • Last, but most importantly, understand that Panda is about user happiness. Make sure user engagement is strong, users are happy with your content, and they don’t have a poor experience while traversing your website. Don’t deceive them, don’t trick them into clicking ads, and make a great first impression. If you don’t, those users can direct their feedback to Panda. And he can be a tough dude to deal with.
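
The first bullet above (running a report on top content that dropped) can be sketched roughly like this. The pages and session counts are hypothetical; in practice the data would come from your analytics exports, not hard-coded dictionaries:

```python
# Hypothetical Google organic sessions per landing page, before and
# after the update date.
before = {"/guide-a": 5200, "/reviews/widget": 3100, "/blog/post-1": 900}
after = {"/guide-a": 1100, "/reviews/widget": 2900, "/blog/post-1": 850}

def panda_report(before, after, drop_threshold=0.4):
    """Return pages whose organic traffic dropped more than the threshold,
    sorted by severity of the drop."""
    hits = []
    for page, prior in before.items():
        current = after.get(page, 0)
        drop = (prior - current) / prior
        if drop >= drop_threshold:
            hits.append((page, round(drop, 2)))
    return sorted(hits, key=lambda x: -x[1])

print(panda_report(before, after))
```

The pages this surfaces are the ones to scrutinize first for thin content, deception, ad problems, and weak engagement.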


Summary – Panda 4.1 Reinforces That Users Rule
So there you have it. Findings based on analyzing a number of websites impacted by Panda 4.1. I will try and post more information as I get deeper into Panda 4.1 recovery work. Similar to other major algorithm updates, I’m confident we’ll see Panda tremors soon, which will bring recoveries, temporary recoveries, and more hits. Strap on your SEO helmets. It’s going to be an interesting ride.



Tuesday, September 9th, 2014

Panda Update on Friday September 5, 2014

Panda Update on 9/5/14

My last blog post explained that Panda is now running in near-real-time and what that means for webmasters and business owners. Well, that was perfect timing as Panda just made another trip around the web as kids head back to school and the NFL kicks in.

I’ve seen multiple Panda clients see recovery starting on Friday 9/5. And some of the clients had been seriously impacted by our cute, black and white friend in the past. Two sites, in particular, saw drops of 60%+ from previous Panda updates.

Here are a few screenshots from companies seeing impact from the 9/5/14 Panda update:

Panda Recovery on 9/5/14


Another Panda Recovery on 9/5/14


Panda is Starting The School Year Out Right
Teachers always say that hard work can lead to success. And it seems the schoolyard Panda feels the same way. The clients seeing the biggest spikes in traffic have done a lot of hard work Panda-wise.

Over the past few months, massive Panda problems were uncovered from a content quality standpoint. That included finding thin content, duplicate content, low-quality content, and scraped content, while also identifying ad problems and technical problems that were impacting content quality and user engagement.

The user experience across each site was poor to say the least and the changes they have made (and are actively implementing) are improving the overall quality of their websites. And that’s exactly what you need to do in order to see positive Panda movement.

A Note About Temporary Recoveries (or Tests)
I recently wrote a post about temporary Panda recoveries, which I have seen several of over the past month or so.  It’s interesting to note that two sites that just bounced back had seen temporary Panda recoveries in the past month. Now, we don’t know if they were truly temporary recoveries or simply tests of a future Panda update that ended up getting rolled back. But since Friday 9/5, both of those sites have spiked again. Let’s hope these recoveries stick.

Temporary Panda Recovery


Beyond temporary recoveries, other websites battling Panda saw serious spikes in Google organic traffic starting on Friday 9/5. And like I said earlier, they had gotten hammered by Panda in the past. It’s awesome to see them bounce back.

For example, one site is up 85% and another is up 71%. Nice increases to say the least.

Panda Recovery Percentage in GA


Summary – Everybody’s Working for the Weekend (Including Panda)
As I explained earlier, Panda is now near-real-time and the days of waiting for monthly Panda updates are gone. The fact of the matter is that you can see impact at any point during the month (or even multiple times per month). So, if you’ve been impacted by Panda in the past, then check your reporting now. Friday might have been a very good day for you. And on the flip side (for those facing the Panda music for the first time), you might see a frightening drop in Google organic traffic. One thing is for sure… with the mighty Panda roaming the web in near-real-time, it’s never been more important to keep a close eye on content quality. Panda sure is.

So get ready for the next update. I’m confident it’s not far away. Actually, it might be just around the corner.




Tuesday, September 2nd, 2014

Google Panda Running Regularly Since P4.0, Approaches Near-Real-Time

Google Panda Running Regularly

In June of 2013 I wrote about the maturing of Google’s Panda algorithm and how it started to roll out monthly over a ten day period. Google also explained at that time that they wouldn’t be confirming future Panda updates. In my post, I explained how the combination of monthly updates, over ten days, with no confirmation, could lead to serious webmaster confusion. Getting hit by Panda was already confusing enough for webmasters (when they knew it was Panda). Now sites could get hit during a ten day period, any month, without confirmation from Google about what hit them.

So the monthly updates went on, I picked up a number of them, and yes, it was confusing for many. I received plenty of emails from business owners wondering why they experienced drops during those unconfirmed updates. In case you’re wondering, I could pick up those unconfirmed updates since I help a lot of companies with Panda and I have access to a lot of Panda data. More about that soon. But the average webmaster could not easily pick up those updates, which led to serious confusion and frustration. And that’s the situation we were in until May of 2014.

And Along Came Panda 4.0
This went on until Panda 4.0, which was a huge update released on May 20, 2014. Google did announce this update, for two reasons. First, it was a new Panda algorithm. Second, they knew it was HUGE and would impact many websites (and some aggressively).

Everything about the update was big. There were huge recoveries and massive new hits. You can read my previous posts about Panda 4.0 to learn more about the update. But that’s not the focus of this post. Something else has been going on since Panda 4.0, and it’s critically important to understand.

After Panda 4.0 rolled out on May 20, 2014, I noticed that sites impacted by the algorithm update were seeing continual “tremors”. Sites that were hit were seeing more drops every week or so and sites that experienced recovery also saw tremors during those dates (slight increases during those intervals). Moving forward, I also started to see sites reverse direction during some of the tremors. Some that saw recovery saw slight decreases and others that were hit saw slight increases. It was fascinating to analyze.

I reached out to Google’s John Mueller via G+ to see if he could shed some light on the situation. Well, he did, and I documented his response in my Search Engine Watch column soon after. John explained that Google doesn’t have a fixed schedule for algorithm updates like Panda. They could definitely tweak the algo to get the desired results and roll it out more frequently. That was big news, and confirmed the tremors I was seeing.

Google's John Mueller Clarifies Panda Tremors

John also explained more about Panda in a recent Google Webmaster Office Hours Hangout (from August 15, 2014). Here's a quote from John:

“I believe Panda is a lot more regular now, so that’s probably happening fairly regularly.”

And based on what I've been seeing across websites impacted by Panda, he's not kidding. You can see the video below (starting at 21:40).

Since Panda 4.0, I've seen tremors almost weekly. And guess what? They really haven't stopped. So it seems they aren't temporary adjustments to Panda, but instead, this could be the new way that Panda roams the web. Yes, that would mean we are in the age of a near-real-time Panda. And that can be both amazing and horrifying for webmasters.


What I’ve Seen Since Panda 4.0
I mentioned that I have access to a lot of Panda data. That’s because I’ve helped a lot of companies with Panda since February of 2011, while also having new companies reach out to me about fresh Panda hits. This enables me to see recoveries with companies that are working hard to rectify content quality problems, while also seeing new Panda hits. This combination enables me to document serious Panda activity on certain dates.

Since Panda 4.0 rolled out, I have consistently seen tremors (almost weekly). I have seen companies continue to increase, continue to decrease, fluctuate up and down, and I have also documented temporary recoveries. Below, I’ll show you what some of the tremors look like and then I’ll explain what this all means.
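
As a rough illustration of what I mean by tremors, here's a simple sketch that flags weeks where Google organic traffic swung sharply versus the prior week. The threshold and weekly session counts are hypothetical, not anything Google publishes:

```python
def find_tremors(weekly_sessions, threshold=0.15):
    """Flag weeks where traffic swung more than the threshold versus
    the prior week. Returns (week_index, fractional_change) pairs."""
    tremors = []
    for i in range(1, len(weekly_sessions)):
        prev, cur = weekly_sessions[i - 1], weekly_sessions[i]
        change = (cur - prev) / prev
        if abs(change) >= threshold:
            tremors.append((i, round(change, 2)))
    return tremors

# Hypothetical weekly Google organic sessions after a Panda hit:
weeks = [10000, 9800, 7200, 7400, 9100, 9000]
print(find_tremors(weeks))
```

A site riding the tremors would show several flagged weeks in a row, in both directions, rather than one clean drop or recovery.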

Panda Tremors – Example
Example of Panda Tremors


Panda Tremors – Example
Second Example of Panda Tremors


Temporary Panda Recovery During Tremors
Temporary Panda Recovery During Tremors


Another Temporary Panda Recovery During Tremors
Example of Temporary Panda Recovery During Tremor


Fresh Bamboo and The Near-Real-Time Panda Algo
So, what does this all mean for webmasters and business owners? Well, it means that Panda is rolling out often, and sites can be impacted more frequently than before. That’s huge news for any webmaster dealing with a Panda problem. In the past, you would have to wait for a monthly Panda update to run before you could see recovery (or further decline). Now you can see impact much more frequently. Again, this is big.

That’s why I have seen sites fluctuate almost weekly since Panda 4.0. Some have stabilized, while others continue to dance with the mighty Panda. And the temporary recoveries emphasize an important point. If you haven’t completed enough Panda recovery work, you might see what looks to be recovery, only to get hammered again (and quickly). It’s one of the reasons I explain to Panda victims that they need to move quickly and implement serious changes based on a thorough Panda audit. If not, they are setting themselves up to continually see declines, or worse, see a misleading temporary recovery, only to get smoked again.

Summary – The Good and the Bad of The Near-Real-Time Panda
As I explained above, it looks like a new phase of Panda has begun. As someone neck deep in Panda work, it’s fascinating to analyze. With the mighty Panda roaming the web in near-real-time, websites can see ups and downs throughout the month. They can get hit, or recover, or even see both in one month. That’s why it’s never been more important to address content quality problems on your website. As always, my recommendation is to focus on user engagement, nuke thin and low quality content, remove deceptive tactics, and win the Panda game.

Let’s face it, Panda has upped its game. Have you?



Wednesday, August 13th, 2014

Affiliate Marketer Attacked by Panda 4.0 Sees Temporary Recovery, Gets Hit Again 5 Days Later [Case Study]

Panda Temporary Recovery Case Study

Panda 4.0 arrived in late May with a fury not seen by many previous updates. It was a HUGE update and many sites were decimated by P4.0. Most businesses reaching out to me after the May 20 update saw drops of 50%+, with some losing 80% of their Google organic search traffic overnight. And on the flip side, recoveries were strong too. There were some companies I was helping with past Panda attacks that saw increases of 200%+, with some seeing over 400% increases. Like I said, everything about Panda 4.0 was big.

Panda Games – The Rundown
A few weeks ago, I was analyzing a Panda tremor and saw some very interesting movement across sites I have been helping. More to come on that front, but that’s not the focus of this post today. That same day, a business owner reached out to me explaining that he saw serious fluctuations on a site of his that was crushed by Panda 4.0. Needless to say between what I was seeing, and what he had just explained, I was interested for sure.

So I asked how much of a recovery he saw during the latest Panda tremor, and what I heard shocked me – “Close to a full recovery.”  Whoa, not many have recovered from Panda 4.0 yet, so now he had my attention. Since my schedule has been insane, I didn’t have time to dig in too much at that point. I was planning to, but just couldn’t during that timeframe.

But then I heard back from the business owner the following week. I was at the Jersey Shore on vacation when a giant wave crashed at my feet (both literally and figuratively).  The business owner’s email read, “FYI, I just lost all of the gains from the recovery last week”.  Once again, my reaction was “Whoa…” :)

So to quickly recap what happened, a site that got crushed by Panda 4.0 ended up recovering during a Panda tremor (in late July), only to get hammered again five days later. By the way, it was a near-full recovery during the five day stint (regaining 75% of its Google organic search traffic). In addition, I’ve been analyzing other Panda 4.0 sites that were impacted during the late July 2014 update (which I plan to cover in future blog posts).  It was big tremor.

Quick Note About Temporary Recoveries:
It’s worth noting that I have seen other Panda victims see increases in Google organic traffic during the recovery phase (almost like the site is being tested). I’ve seen this during Panda work since 2011. I’ll explain more about that phenomenon soon, but I wanted to bring it up now since this site did see a temporary recovery.

Digging In
If you know me at all, you know what came next. I fired up my Keurig and dug into the site. With a cup of Jet Fuel and Black Tiger in me, I wanted to know all I could about this interesting Panda 4.0 case study. In this post, I’ll explain more about the temporary recovery, the factors that led to the Panda hit, why I think the site saw a temporary recovery, and end with some key learnings that are important for any business owners dealing with Panda 4.0 attacks to understand.  Let’s go.

Panda Factors
Although I want to focus on the temporary recovery, let's quickly cover the initial Panda 4.0 hit. The site is small, with fewer than 60 pages indexed. It's a site covering an extremely focused niche, and it's on a partial match domain (PMD). After analyzing the site, here are what I believe to be the core factors that led to the Panda hit.

Heavy Affiliate Content:
Looking through the history of the site reveals an increase in content in 2013, and much of the site's content became affiliate-driven. The site was heavily linking out to products tied to the niche (and some of those were followed affiliate links). So there was a lot of traffic arriving on the site that was quickly going out. That's never a good situation from a Panda standpoint. Also, the other content funneled visits to the affiliate pages, where the site could have a greater chance at converting those visits into potential sales down the line. And of course, followed affiliate links should be nofollowed.
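
As a quick sanity check on your own site, you can scan page markup and flag affiliate links that are missing rel="nofollow". Here's a minimal sketch using Python's standard HTML parser; the affiliate host list and sample markup are hypothetical:

```python
from html.parser import HTMLParser

# Hypothetical list of partner domains to check for.
AFFILIATE_HOSTS = ("amazon.com",)

class FollowedAffiliateFinder(HTMLParser):
    """Collect affiliate links that are missing rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href", "")
        rel = attrs.get("rel", "")
        if any(host in href for host in AFFILIATE_HOSTS) and "nofollow" not in rel:
            self.followed.append(href)

html = (
    '<a href="https://www.amazon.com/dp/X?tag=site-20">Buy</a>'
    '<a rel="nofollow" href="https://www.amazon.com/dp/Y?tag=site-20">Buy</a>'
)
finder = FollowedAffiliateFinder()
finder.feed(html)
print(finder.followed)  # only the followed affiliate link is flagged
```

In practice you would feed this every page from a crawl of your site, but even a spot check of top landing pages can surface the problem quickly.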

I can’t tell you how many affiliate marketers have reached out to me after getting smoked by Panda since February of 2011. If you aren’t providing a serious value-add, then there’s a strong chance of getting crushed. I’ve seen it a thousand times. That’s a nice segue to the next factor – engagement.

Low Engagement, High Bounce Rates
I’ve mentioned many times in Panda blog posts the importance of strong engagement. Google has several ways to measure user engagement, but one of the easiest ways is via dwell time. If someone clicks through a search result on Google, visits a page, and quickly clicks back to the search results, that’s a pretty clear signal that the user didn’t find what they wanted (or that they didn’t have a positive user experience). Low dwell time is a giant invitation to the mighty Panda.

Checking standard bounce rates for top landing pages leading up to the Panda attack revealed extremely high percentages. Many of the pages had 90% or higher bounce rates. I wish the site had implemented Adjusted Bounce Rate (ABR), but it didn’t. ABR is a much stronger view of actual bounce rate that takes time on page into account. That said, many top landing pages with 90%+ bounce rates is not good.
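
For clarity, here's a small sketch of the difference between standard and adjusted bounce rate, computed from hypothetical session data (ABR itself is typically implemented with a timed event fired by your analytics tracking code, not calculated after the fact like this):

```python
# Each session: (pages_viewed, seconds_on_page). Hypothetical data.
sessions = [(1, 5), (1, 12), (1, 95), (2, 40), (1, 8), (3, 120)]

def bounce_rates(sessions, engaged_after=30):
    """Standard bounce rate counts every single-page session. Adjusted
    bounce rate only counts single-page sessions shorter than the
    engagement cutoff (30 seconds is a common choice)."""
    single = [s for s in sessions if s[0] == 1]
    standard = len(single) / len(sessions)
    adjusted = len([s for s in single if s[1] < engaged_after]) / len(sessions)
    return round(standard, 2), round(adjusted, 2)

print(bounce_rates(sessions))
```

The gap between the two numbers is the point: a visitor who reads one page for three minutes is a "bounce" in the standard metric but an engaged visitor in ABR terms.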

High Bounce Rates Before Panda Struck

No Frills Design, Broken HTML
The site itself did not help build credibility. It was a basic WordPress design with few credibility-building elements. There were no clear signs of who ran the site, which company owned it, etc. It was basically a shell WordPress site of the kind you've seen a million times. The "About" page was just a paragraph and didn't inform the user about who was actually writing the content, who was behind the site, etc. By the way, I find that about pages like that make matters worse, not better.

In addition, there were several pages with broken HTML, where raw markup (like unrendered HTML tags) was showing up on the page itself.
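You can hunt for this kind of broken markup programmatically by extracting the visible text of a page and scanning it for tag-like fragments. A rough sketch using Python's standard-library HTML parser (escaped tags like `&lt;div&gt;` unescape into the visible text, which is exactly what renders literally on a broken page):

```python
import re
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collects the text a browser would render as visible content.
    With the default convert_charrefs=True, escaped entities like
    &lt; arrive in handle_data already converted to < characters."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

TAG_LIKE = re.compile(r"</?\w+[^>]*>")

def find_leaked_markup(html_source):
    """Return tag-like fragments that appear inside visible text,
    i.e. markup a visitor would actually see on the page."""
    parser = VisibleTextExtractor()
    parser.feed(html_source)
    text = "".join(parser.chunks)
    return TAG_LIKE.findall(text)
```

Run it across a crawl of your top landing pages and any non-empty result is a page worth a manual look.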

Broken HTML and Design and Google Panda

When you are trying to drive strong engagement, trust definitely matters. The less people trust the site and the company behind the content, the less chance you have of retaining them. And again, the more users that jump back to the search results, the more virtual bamboo you are piling up.

Deceptive Ads (Cloaked)
During my analysis, I found ads throughout the content that were very similar in style and design to the content itself. So, it was easy to mistake the ads for actual content, which could trick users into clicking them. I've seen this a number of times while analyzing Panda attacks (especially Panda 4.0). In addition, this is even called out in the latest version of Google's Search Quality Rater Guidelines.

Deceptive Ads and Panda

I’ve found deception to be an important factor in recent Panda hits, so ads that are cloaked as content can be extremely problematic. Remember, SEOs and digital marketers might pick them up pretty quickly, but we’re not the majority of users browsing the web. Think about what the average person would do if they found those ads… Many would have no idea they were ads and not content. And they sure wouldn’t be happy landing on some advertiser’s website after clicking them.

Exact Match Anchor Text Links
Mixed throughout the content were many exact match anchor text links (EMATs), either pointing to the affiliate pages mentioned before or to off-site authority sites. For example, a typical landing page would link heavily to Amazon pages, but also to Wikipedia pages. I've seen this tactic used in the past with other Panda and Phantom victims (and I've even seen it during Penguin analysis).

Typically, the thought process is that if Google sees a site linking to authority sites, then it might trust that site (the linking site) more. But this also creates a pattern that's easy to pick up. It's not natural to continually link to Wikipedia from many places on your site, and Google's algorithms can probably detect the trend when taking all outbound links into account. And the fact that many of the links used exact match anchor text didn't help (the links throughout the pages tended to look over-optimized and somewhat spammy).
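To spot that kind of pattern at scale, you could tally outbound anchor text across a crawl of the site and look for the same phrase repeated heavily. A minimal sketch using only the standard library (the sample pages and phrases are hypothetical):

```python
from collections import Counter
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs from a page."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href", "")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def anchor_text_counts(pages_html):
    """Tally anchor text across a set of pages; heavy repeats of the
    same exact-match phrase are worth a closer manual look."""
    counts = Counter()
    for html_source in pages_html:
        collector = AnchorCollector()
        collector.feed(html_source)
        for _, text in collector.links:
            if text:
                counts[text.lower()] += 1
    return counts
```

A natural site rarely produces a top-heavy distribution where one commercial phrase dominates the tally.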

Authorship Backfiring
While analyzing the site, I noticed many of the top landing pages had authorship implemented. But when checking out the author, I got a feeling he wasn’t real. Sure, there was a G+ profile set up, and even other social accounts, but something didn’t feel right about the author.

And using reverse image lookup in Google images, I pulled up the same photo being used elsewhere on the web. In addition, it looked like a stock photo. The one used on the site I was analyzing was cropped to throw off the look (which helped make it look more unique).

So, if I had questions about the author, you better believe Google did too. Add questionable authorship to the other factors listed above, and you can see how the credibility factor for this site was pushing it into the gray area of Panda. The author in the photo might as well have been holding a piece of bamboo.

The Surge, The Hit, The Temporary Recovery, and Subsequent Hit
Below, I’ll quickly detail what happened as the site experienced a roller coaster ride across the giant Panda coaster.

The Index Status report revealed a doubling of pages indexed heading into 2014. My guess is that more content was added to cast a wider net from an affiliate marketing standpoint. And again, many of those pages had affiliate links to Amazon to buy various products. That new content worked (in the short term). Google organic traffic increased nicely on the site.

Then the site experienced the misleading and sinister surge that I wrote about in my Search Engine Watch column. In March of 2014, the site spiked in Google. Many different keywords related to the niche were driving traffic to the site. But unfortunately, that traffic was all leading to the problems I mentioned earlier.

Surge of Traffic Before Panda Attack

The surge I mentioned enabled Google to gain a lot of engagement data from real users. And if you have content quality problems, usability problems, ad problems, etc., then you are feeding Panda a lot of bamboo. And that can easily lead to a Panda attack.

And that’s what happened during Panda 4.0. The wave crashed and the site lost 86% of its Google organic traffic overnight. Yes, 86%. Many of the keywords that the site picked up during the surge were lost during Panda 4.0. The landing pages that were once driving a boatload of organic search traffic dropped off a cliff visits-wise. One page in particular dropped by 96% when you compared post-Panda to pre-Panda (with 30 days of data). That’s a serious hit and speaks volumes about how Google was viewing the website.

Interesting Note – Money Term Untouched
While analyzing the keywords that dropped, it was interesting to see that the site’s money keyword was not impacted at all during Panda 4.0 (or even the second hit which I’ll cover shortly). That keyword, which is also in the domain name, stayed as-is. It’s hard to say why that was the case, but it was. Checking trending throughout the roller coaster ride reveals steady impressions, clicks, and average position.

Money Keyword Unaffected by Panda

July 22, 2014 – The Temporary Recovery
Then along came Tuesday, July 22. The site absolutely spiked with what looked to be a near-full Panda recovery. The site jumped up to 75% of its original traffic levels from Google organic.

Temporary Panda Recovery on July 22, 2014

Checking the keywords that surged back, they matched up very well with the keywords from pre-Panda 4.0. There was clearly a Panda update pushed out, although it was hard to say if it was a Panda tremor (minor tweaks) or something larger. It’s worth noting that I saw other sites dealing with Panda 4.0 hits show serious movement on this day. For example, one large site saw almost a full recovery (from a major Panda 4.0 hit).

July 27, 2014 – It was nice while it lasted.
Well, that was fast. It seems yet another Panda tremor came rolling through, and the site lost all of its gains. I'll cover more about that shortly, but it's important to note that the site dropped back to its post-Panda 4.0 levels. So, the temporary recovery lasted about five days. That's a tough pill to swallow for the business owner, but looking at the situation objectively, it makes a lot of sense.

Second Panda Hit After Temporary Recovery

This situation underscores an important point about Panda recovery. You need to make serious changes in order to see long-term improvement. Band-aids and lack of action will get you nowhere. Or worse, it could yield a misleading, temporary recovery that gets your hopes up, only to come crashing down again. Let’s explore the temporary recovery in more detail.

Temporary Recoveries and Panda Tests
I mentioned earlier that I’ve seen Panda victims experience short bumps in Google organic traffic during the recovery phase. I even documented it in one of my Panda recovery case studies. It’s almost like Google is giving the site a second chance, testing user engagement, analyzing the new traffic, etc. And if it likes what it sees, the recovery could stick. In the case study I just mentioned, the site ended up recovering just a few weeks after the temporary bump occurred.

So, will this website experience a similar recovery? You never know, but I doubt it. The site that ended up recovering long-term made massive changes based on a deep Panda audit. They should have recovered (even quicker than they did in my opinion). The site I just analyzed hasn’t made any changes at all, so I doubt it will recover in its current state.

Key Learnings
I’ll end this post with some key learnings based on what I’ve seen with Panda recovery, tremors, etc. If you are struggling with Panda recovery, or if you are helping others with Panda recovery, then the following bullets are important to understand.

  • Google can, and will, push out minor Panda updates (which I call Panda tremors). Sites can recover during those updates to various degrees. For example, I saw a large-scale Panda 4.0 victim experience a near-full recovery during the July 22 update.
  • Small websites can get hammered by Panda too. I know there’s often a lot of focus on large-scale websites with many pages indexed, but I’ve analyzed and helped a number of small sites with Panda hits. Panda is size-agnostic.
  • When a website stirs up a serious Panda cocktail, it can experience a misleading surge in traffic, followed by a catastrophic Panda attack. Understanding the factors that can lead to a Panda hit is extremely important. You should avoid them like the plague.
  • Be ready for Panda tests. When Google tests your site again, make sure you are ready from a content, ad, and engagement standpoint. Do the right things Panda-wise so you can pass with flying colors. If not, don’t bank on a recovery sticking. It might just be temporary…
  • Once again, I found deception and trickery contribute to a Panda hit. Cloaked ads, questionable authorship, heavy affiliate linking, and more led to this Panda attack. If you deceive users, expect a visit from the mighty Panda. And no, it probably won’t be pleasant.
  • In some situations, money terms may not be affected by Panda. In this case study, the core money term was not impacted at all. It remained steady throughout the ups and downs. But as documented above, that didn't stop the site from experiencing a massive drop in Google organic traffic (86%).

Summary: Long-Term Panda Changes = Long-Term Panda Wins
First, I’m glad you made it to the end of this post (I know it was getting long). Second, I hope you found this Panda case study interesting. It was definitely fascinating to analyze. I’ve helped many companies with Panda attacks since February of 2011 and this case had some very interesting aspects to it. As usual, my hope is this situation can help some of you dealing with Panda attacks better understand the fluctuations you are seeing over time. Panda can be a confusing topic for sure.

If there are a few core things you should remember leaving this post, it's that temporary recoveries can happen, implementing the right Panda changes over time is extremely important, Google can test your site during the recovery phase, and organic search traffic can come and go like the wind. Just make sure you're ready when the Panda comes knocking.




Tuesday, July 22nd, 2014

How To Get More Links, Crawl Errors, Search Queries, and More By Verifying Directories in Google Webmaster Tools

Verify by Directory in Google Webmaster Tools

In my opinion, it’s critically important to verify your website in Google Webmaster Tools (GWT). By doing so, you can receive information directly from Google as it crawls and indexes your website. There are many reports in GWT that can help identify various problems SEO-wise. For example, you can check the crawl errors report to surface problems Googlebot is encountering while crawling your site. You can check the HTML improvements section to view problems with titles, descriptions, and other metadata. You can view your inbound links as picked up by Google (more on that soon). You can check xml sitemaps reporting to view warnings, errors, and the indexed to submitted ratio. You can view indexation by directory via Index Status (forget about a site command, index status enables you to view your true indexation number).

In addition to the reporting you receive in GWT, Google will communicate with webmasters via “Site Messages”. Google will send messages when it experiences problems crawling a website, when it picks up errors or other issues, and of course, if you’ve received a manual action (penalty). That’s right, Google will tell you when your site has been penalized. It’s just another important reason to verify your website in GWT.

Limit On Inbound Links for Sites With Large Profiles
And let’s not forget about links. Using Google Webmaster Tools, you can view and download the inbound links leading to your site (as picked up by Google). And in a world filled with Penguins, manual actions, and potential negative SEO, it’s extremely important to view your inbound links, and often. Sure, there’s a limit of ~100K links that you can download from GWT, which can be limiting for larger and more popular sites, but I’ll cover an important workaround soon. And that workaround doesn’t just apply to links. It applies to a number of other reports too.

When helping larger websites with SEO, it's not long before you run into the dreaded limit problem in Google Webmaster Tools. The most obvious limit is the ~100K cap on inbound link downloads mentioned above. For most sites, that's not a problem. But for larger sites, it can be extremely limiting. For example, I'm helping one site now with 9M inbound links. Trying to hunt down link problems at the site-level via GWT is nearly impossible with a link profile that large.

Inbound Links in Google Webmaster Tools


When you run into this problem, third party tools can come in very handy, like Majestic SEO, ahrefs, and Open Site Explorer. And you should also download your links from Bing Webmaster Tools, which is another great resource SEO-wise. But when you are dealing with a Google problem, it’s optimal to have link data directly from Google itself.

So, how do you overcome the link limit problem in GWT? Well, there’s a workaround that I’m finding many webmasters either don’t know about or haven’t implemented yet – verification by directory.

Verification by Directory to the Rescue
If you’ve been following along, then you can probably see some issues with GWT for larger, complex sites. On the one hand, you can get some incredible data directly from Google. But on the other hand, larger sites inherently have many directories, pages, and links to deal with, which can make your job analyzing that data harder to complete.

This is why I often recommend verifying by directory for clients with larger and more complex websites. It's a great way to dig deep into specific areas of a website. As mentioned earlier, I've found that many business owners don't even know you can verify by directory! Yes, you can, and I recommend doing it today (even if you have a smaller site with distinct directories of content you monitor). For example, if you have a blog, you can verify the blog subdirectory in addition to your entire site. Then you can view reporting that's focused on the blog (versus muddying up the reporting with data from outside the blog).

Add A Directory in Google Webmaster Tools

And again, if you are dealing with an inbound links problem, then isolating specific directories is a fantastic way to proceed to get granular links data. There’s a good chance the granular reporting by directory could surface new unnatural links that you didn’t find via the site-level reporting in GWT. The good news is that verifying your directories will only take a few minutes. Then you’ll just need to wait for the reporting to populate.

Which Reports Are Available For Directories?
I’m sure you are wondering which reports can be viewed by subdirectory. Well, many are available by directory, but not all. Below, you can view the reports in GWT that provide granular data by directory.

  • Search Queries
  • Top Pages (within Search Queries reporting)
  • Links to Your Site
  • Index Status
  • Crawl Errors (by device type)
  • HTML Improvements
  • Internal Links
  • International Targeting (New!)
  • Content Keywords
  • Structured Data


GWT Reporting by Directory – Some Examples

Indexation by Directory
Let’s say you’re having a problem with indexation. Maybe Google has only indexed 60% of your total pages for some reason. Checking the Index Status report is great, but doesn’t give you the information you need to isolate the problem.  For example, you want to try and hunt down the specific areas of the site that aren’t indexed as heavily as others.

If you verify your subdirectories in GWT, then you can quickly check the Index Status report to view indexation by directory. Based on what you find, you might dig deeper to see what’s going on in specific areas of your website. For example, running crawls of that subdirectory via several tools could help uncover potential problems. Are there roadblocks you are throwing up for Googlebot, are you mistakenly using the meta robots tag in that directory, is the directory blocked by robots.txt, is your internal linking weaker in that area, etc? Viewing indexation by directory is a logical first step to diagnosing a problem.
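The first two checks are easy to script once you have a list of URLs from the weak directory. Here's a small sketch using Python's standard library; the robots.txt rules and the meta tags are hypothetical examples, not taken from any real site:

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for the site being diagnosed.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""

def robots_blocked(path, robots_txt=ROBOTS_TXT):
    """True if the path is disallowed for all crawlers by robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("*", path)

# Rough check for a meta robots tag carrying noindex. A regex is a
# sketch, not a full HTML parse; attribute order is assumed here.
META_NOINDEX = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex',
    re.IGNORECASE,
)

def meta_noindexed(html_source):
    """True if the page source carries a robots meta tag with noindex."""
    return bool(META_NOINDEX.search(html_source))
```

Run both checks over the URLs in the under-indexed directory: pages blocked by robots.txt or carrying an accidental noindex explain a weak Index Status number very quickly.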

How To View Index Status by Directory in Google Webmaster Tools


Search Queries by Directory
Google Webmaster Tools provides search queries (keywords) that have returned pages on your website (over the past 90 days). Now that we live in a “not provided” world, the search queries reporting is important to analyze and export on a regular basis. You can view impressions, clicks, CTR, and average position for each query in the report.

But checking search queries at the site level can be a daunting task in Google Webmaster Tools. What if you wanted to view the search query data for a specific section instead? If you verify by directory, then all of the search query data will be limited to that directory. That includes impressions, clicks, CTR, and average position for queries leading to content in that directory only.

In addition, the “Top Pages” report will only contain the top pages from that directory. Again, this quickly enables you to hone in on content that’s receiving the most impressions and clicks.

And if you feel like there has been a drop in performance for a specific directory, then you can click the “with change” button to view the change in impressions, clicks, CTR, and average position for the directory. Again, the more granular you can get, the more chance of diagnosing problems.

How To View Search Query Reporting by Directory in Google Webmaster Tools


Links by Directory
I started explaining more about this earlier, and it’s an extremely important example. When you have a manual action for unnatural links, you definitely want to see what Google is seeing. For sites with large link profiles, GWT is not ideal. You can only download ~100K links, and those can be watered down by specific pieces of content or sections (leaving other important sections out in the cold).

When you verify by directory, the “links to your site” section will be focused on that specific directory. And that’s huge for sites trying to get a better feel for their link profile, unnatural links, etc. You can see domains linking to your content in a specific directory, your most linked content, and of course, the actual links. And you can download the top ~100K links directly from the report.

In addition, if you are trying to get a good feel for your latest links (like if you’re worried about negative SEO), then you can download the most recent links picked up by Google by clicking the “Download latest links” button.  That report will be focused on the directory at hand, versus a site-level download.

I’m not saying this is perfect, because some directories will have many more links than 100K. But it’s much stronger than simply downloading 100K links at the site-level.

How To View Inbound Links by Directory in Google Webmaster Tools


Crawl Errors By Directory
If you are trying to analyze the health of your website, then the Crawl Errors reporting is extremely helpful to review. But again, this can be daunting with larger websites (as all pages are reported at the site-level). But if you verify by directory, the crawl errors reporting will be focused on a specific directory. And that can help you identify problems quickly and efficiently.

In addition, you can view crawl errors reporting by Google crawler. For example, Googlebot versus Googlebot for Smartphones versus Googlebot-mobile for Feature Phones. By drilling into crawl errors by directory, you can start to surface problems at a granular level. This includes 404s, 500s, Soft 404s, and more.
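After exporting crawl errors for a directory property, a few lines of code can summarize which error classes dominate. A sketch with hypothetical headers and data (not the exact GWT export format):

```python
import csv
import io
from collections import Counter

# Hypothetical crawl errors export for a /store/ directory property.
CRAWL_ERRORS_EXPORT = """url,response_code
http://www.mysite.com/store/item-1,404
http://www.mysite.com/store/item-2,404
http://www.mysite.com/store/checkout,500
http://www.mysite.com/store/old-page,soft404
"""

def errors_by_type(csv_text):
    """Count crawl errors per response type so the biggest problem
    class in the directory stands out immediately."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row["response_code"]] += 1
    return counts

summary = errors_by_type(CRAWL_ERRORS_EXPORT)
```

A directory dominated by 404s points at broken internal links or removed content, while a pile of soft 404s often signals thin pages: exactly the kind of granular diagnosis this verification approach enables.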

How To View Crawl Errors by Directory in Google Webmaster Tools

Summary – Get Granular To View More Google Webmaster Tools Data
Verifying your website in Google Webmaster Tools is extremely important on several levels (as documented above). But verifying by directory is also important, as it enables you to analyze specific parts of a website on a granular basis. I hope this post convinced you to set up your core directories in GWT today.

To me, it’s critically important to hunt down SEO problems as quickly as possible. The speed at which you can identify, and then rectify, those problems can directly impact your overall SEO health (and traffic to your site). In addition, analyzing granular reporting can help surface potential problems in a much cleaner way than viewing site-wide data. And that’s why verifying subdirectories is a powerful way to proceed (especially for large and complex sites).  So don’t hesitate. Go and verify your directories in Google Webmaster Tools now. More data awaits.