Insidious Thin Content on Large-Scale Websites and Its Impact on Google Panda

Glenn Gabe

algorithm-updates, google, seo

Insidious Thin Content and Google Panda

If you’ve read some of my case studies in the past, then you know Panda can be a real pain the neck for large-scale websites. For example, publishers, ecommerce retailers, directories, and other websites that often have tens of thousands, hundreds of thousands, or millions of pages indexed. When sites grow that large, with many categories, directories, and subdomains, content can easily get out of control. For example, I sometimes surface problematic areas of a website that clients didn’t even know existed! There’s usually a gap of silence on the web conference when I present situations like that. But once everyone realizes that low quality content is in fact present, then we can proceed with how to rectify the problems at hand.

And that’s how you beat Panda. Surfacing content quality problems and then quickly fixing those problems. And if companies don’t surface and rectify those problems, then they remain heavily impacted by Panda. Or even more maddening, they can go in and out of the gray area of Panda. That means they can get hit, recover to a degree, get hit again, recover, etc. It’s a maddening place to live SEO-wise.

The Insidious Thin Content Problem
The definition of insidious is:
“proceeding in a gradual, subtle way, but with harmful effects”

And that’s exactly how thin content can increase over time on large-scale websites. The problem usually doesn’t rear its ugly head in one giant blast (although that can happen). Instead, it can gradually increase over time as more and more content is added, edited, technical changes are made, new updates get pushed to the website, new partnerships formed, etc. And before you know it, boom, you’ve got a huge thin content problem and Panda is knocking on the door. Or worse, it’s already knocked down your door.

So, based on recent Panda audits, I wanted to provide three examples of how an insidious thin content problem can get out of control on larger-scale websites. My hope is that you can review these examples and then apply the same model to your own business.

 

Insidious Thin Content: Example 1
During one recent audit, I ended up surfacing a number of pages that seemed rogue. For example, they weren’t linked to from many other pages on the site, didn’t contain the full site template, and only contained a small amount of content. And the content didn’t really have any context about why it was there, what users were looking at, etc. I found that very strange.

Thin Content with No Site Template

So I dug into that issue, and started surfacing more and more of that content. Before I knew it, I was up to 4,100 pages of that content! Yes, there were over four thousand rogue, thin pages based on that one find.

To make matters even worse, when checking how Google was crawling and indexing that content, you could quickly see major problems. Using both fetch and render in Google Webmaster Tools and checking the cache of the pages revealed Google couldn’t see most of the content. So the thin pages were even thinner than I initially thought. They were essentially blank to Google.

Thin Content and Content Won't Render

When bringing this up to my client, they did realize the pages were present on the site, but didn’t understand the potential impact Panda-wise. After explaining more about how Panda works, and how thin content equates to giant pieces of bamboo, they totally got it.

I explained that they should either immediately 404 that content or noindex it. And if they wanted to quicken that process a little, then 410 the content. Basically, if the pages should not be on the site for users or Google, then 404 or 410 them. If the pages are beneficial for users for some reason, then noindex the content using the meta robots tag.

So, with one finding, my client will nuke thousands of pages of thin content from their website (which had been hammered by Panda). That will sure help and it’s only one finding based on a number of core problems I surfaced on the site during my audit. Again, the problem didn’t manifest itself overnight. Instead, it took years of this type of content building on the site. And before they knew it, Panda came and hammered the site. Insidious.

 

Insidious Thin Content: Example 2
In another audit I recently conducted, I kept surfacing thin pages that basically provided third party videos (which were often YouTube videos embedded in the page). So you had very little original content and then just a video. After digging into the situation, I found many pages like this. At this time, I estimate there could be as many as one thousand pages like this on the site. And I still need to analyze more of the site and crawl, so it could be even worse…

Now, the web site has been around for a long time, so it’s not like all the thin video pages popped up overnight. The site produces a lot of content, but would continually supplement stronger content with this quick approach that yielded extremely thin and unoriginal content. And as time went on, the insidious problem yielded a Panda attack (actually, multiple Panda attacks over time).

Thin Video Pages and Google Panda

Note, this was not the only content quality problem the site suffered from. It never is just one problem that causes a Panda attack by the way. I’ve always said that Panda has many tentacles and that low quality content can mean several things. Whenever I perform a deep crawl analysis and audit on a severe Panda hit, I often surface a number of serious problems. This was just one that I picked up during the audit, but it’s an important find.

By the way, checking Google organic traffic to these pages revealed a major decrease in traffic over time… Even Google was sending major signals to the site that it didn’t like the content. So there are many thin video pages indexed, but almost no traffic. Running a Panda report showing the largest drop in traffic to Google organic landing pages after a Panda hit reveals many of the thin video pages in the list. It’s one of the reasons I recommend running a Panda report once a site has been hit. It’s loaded with actionable data.

So now I’m working with my client to identify all pages on the site that can be categorized as thin video pages. Then we need to determine which are ok (there aren’t many), which are truly low quality, which should be noindexed, and which should be nuked. And again, this was just one problem… there are a number of other content quality problems riddling the site.

 

Insidious Thin Content: Example 3

During another Panda project, I surfaced an interesting thin content problem. And it’s one that grew over time to create a pretty nasty situation. I surfaced many urls that simply provided a quick update about a specific topic. Those updates were typically just a few lines of content all within a specific category. The posts were extremely thin… and were sometimes only a paragraph or two without any images, visuals, links to more content, etc.

Thin Quick Updates and Google Panda

Upon digging into the entire crawl, I found over five thousand pages that fit this category of thin content. Clearly this was a contributing factor to the significant Panda hit the site experienced. So I’m working with my client on reviewing the situation and making the right decision with regard to handling that content. Most of the content will be noindexed versus being removed, since there are reasons outside of SEO that need to be taken into account. For example, partnerships, contractual obligations, etc.

Over time, you can see that some of these pages actually used to rank well and drive organic search traffic from Google. That’s probably due to the authority of the site. I’ve seen that many times since 2011 when Panda first rolled out. A site builds enormous SEO power and then starts pumping out thinner, lower-quality content.  And then that content ends up ranking well. And when users hit the thin content from Google, they bounce off the site quickly (and often back to the search results). In aggregate, low user engagement, high bounce rates, and low dwell time can be a killer Panda-wise. Webmasters need to avoid that situation like the plague. You can read my case study about “6 months with Panda” to learn more about that situation.

 

Summary – Stopping The Insidious Thin Content Problem is Key For Panda Recovery
So there you have it. Three quick examples of insidious thin content problems on large-scale websites. They often don’t pop up overnight, but instead, they grow over time. And before you know it, you’ve got a thick layer of bamboo on your site attracting the mighty Panda. By the way, there are many other examples of insidious thin content that I’ve come across during my Panda work and I’ll try and write more about this problem soon. I think it’s incredibly important for webmasters to understand how the problem can grow, the impact it can have, and how to handle the situation.

In the meantime, I’ll leave you with some quick advice. My recommendation to any large-scale website is to truly understand your content now, identify any Panda risks, and take action sooner than later. It’s much better to be proactive and handle thin content in the short-term versus dealing with a major Panda hit after the fact. By the way, the last Panda update was on 10/24, and I’m fully expecting another one soon. Google rolled out an update last year on 1/11/14, so we are definitely due for one soon. I’ll be sure to communicate what I’m seeing once the update rolls out.

GG