Can Duplicate Content Hurt My Website’s Google Rankings?

The official word from Matt Cutts of Google is that the “typical white hat site does not have to worry about 1-3 versions of an article on their site.”

Now, I am a fan of Matt Cutts, though at times I wish he would think a bit more laterally. He comes out with these statements without really appreciating their impact. In this case, the statement has caused a lot of grief for many webmasters, who have become lax about checking their sites for duplicate content because they consider themselves white hat and therefore assume duplicate content will never be a problem for them. Nothing could be more wrong. But more on that later; let me explain duplicate content first.

Duplicate Content – What is it?

Google and other search engines don’t wish to have their indexes filled with what is essentially the same content. They don’t want website “a”, which is publishing an article about dogs, to have position one on Google, with website “b”, publishing the SAME article, in position two, and so on. To prevent this problem, Google will have a stab at choosing what it sees as the best version to show, usually the first one it indexed, but not always.

Duplicate Content is created in many different ways.

These are but a few:

  • Retailers publishing the standard manufacturer’s description when they put their products up on their ecommerce website.
  • Syndication of articles.
  • Websites accidentally getting both the www and non-www versions of their site indexed.
  • Websites accidentally publishing different URLs with the same content. (The most common problem – it’s endemic to many off-the-shelf content management systems and shopping carts.)
  • Websites accidentally publishing their content on a subdomain as well as the main domain.
  • Websites having too little different content on a page – not duplicate content per se, it’s more like “too similar content”.
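
To make the www/non-www and "different URLs, same content" cases above concrete, here is a minimal sketch of how you might group a list of crawled URLs that probably point at the same page. It is only an illustration: the list of ignored query parameters is a made-up assumption, the function names are hypothetical, and this is in no way how Google itself decides what is a duplicate.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: query parameters that do not change what the page shows.
# Adjust this set for your own CMS or shopping cart.
IGNORED_PARAMS = {"sessionid", "sort", "utm_source", "utm_medium", "utm_campaign"}


def normalize_url(url):
    """Reduce a URL to a simplified form so trivially different URLs compare equal."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # note: this also drops any port number
    if host.startswith("www."):
        host = host[4:]  # treat www and non-www as the same site
    path = parts.path.rstrip("/") or "/"  # ignore trailing slashes
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    # The scheme is forced to "http" so http/https variants also compare equal.
    return urlunsplit(("http", host, path, urlencode(kept), ""))


def group_duplicate_urls(urls):
    """Return {normalized_url: [original urls]} for every group with more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize_url(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}
```

Feeding this a crawl export or the URLs from your server logs gives a quick list of candidate duplicates to check by hand. It only compares URLs, not page content, so treat each group as a lead rather than a verdict.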

So Matt says it’s not a problem, why are you saying it is?

My problem with Matt’s statement is that it:

  • Directly contradicts Google’s Quality Guidelines: “Don't create multiple pages, subdomains, or domains with substantially duplicate content.”
  • Doesn’t go far enough in detailing what happens to a site whose products are duplicated 100 times over because of a poor shopping system. Such a site will invariably have its pages left out of Google’s cache. At the time of writing this article, such an example can be seen here http://www.google.com.au/search?sourceid=navclient&ie=UTF-8&rlz=1T4GGLJ_en&q=site%3aincense.com.au. This site has every page duplicated many times over, and as a result has nothing in the Google site index but the home page.
  • Doesn’t explain that Google may decide to keep the wrong version of a duplicate page in its index. The page may in time become a lesser value page than the one(s) Google does not keep in its index, and thus not rank as highly.
  • Ignores that Google rewards sites with greatly different content on their pages, so a site with many copies of the same content cannot expect to gain such rewards.
  • Contradicts Google’s advice not to repeat large blocks of text such as large copyright statements. Accidental duplicate content would appear in exactly the same way, especially if there is a database adding random data to the page, making the “duplicate” pages in fact appear to Google as only “near duplicates”.
  • Overlooks that other websites may link to the different duplicate copies. Since Google only keeps one version of the duplicated web pages in its index, it may not count the incoming links to the ignored pages. Incoming links invariably help rankings, unless they are made to pages Google is ignoring.
  • Glosses over the many other situations where the way Google treats duplicate content makes having it dangerous for a webmaster, whether they are white hat or not. With respect, Matt, there are so many exceptions to your general rule that making the generalization was, to many, a disservice.
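
The “near duplicates” point above can be illustrated with a small sketch. One common way to compare two pieces of text is to break each into overlapping word sequences ("shingles") and measure how much the two sets overlap (Jaccard similarity). This is an assumption-laden illustration, not a description of Google's method: the shingle size of 3 words and the 0.8 threshold are arbitrary choices, and the function names are hypothetical.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short text: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=3):
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


def find_near_duplicates(pages, threshold=0.8):
    """pages maps URL -> page text. Return (url_a, url_b, score) for similar pairs.

    The 0.8 threshold is an arbitrary assumption; tune it on your own content.
    """
    urls = sorted(pages)
    results = []
    for i, url_a in enumerate(urls):
        for url_b in urls[i + 1:]:
            score = jaccard_similarity(pages[url_a], pages[url_b])
            if score >= threshold:
                results.append((url_a, url_b, score))
    return results
```

Two product pages that differ only by a database-generated reference number would score high here without being byte-for-byte identical, which is exactly the situation described above. Comparing every pair of pages is slow on large sites, so this sketch is best suited to spot checks on a few hundred pages.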

Summary

  1. If your website has content that is duplicate or near duplicate to the content of another site, expect to pay a rankings price. Forget trying to just add reviews, user feedback or other tricks; you are simply minimising damage, not avoiding it.
  2. If something has gone awry with your content management system and it is creating duplicate content, the problem will not be visible to you or to people using the site, but it will be to Google. Google will view your site as diluting its message with many similar pages, and your site will suffer lower rankings on Google as a result.
  3. If you have somehow managed to accidentally publish your content on multiple domains or subdomains, your site is breaching Google’s webmaster guidelines and is likely suffering as a result. Again, there is a good chance you have done it and don't even know it.

At the SEO Guys, we are experts at spotting duplicate and near duplicate content, as well as determining its cause and how you can fix your site so the problem goes away. In many cases, we have seen our fixes result in a huge swing in Google rankings within a week of fixing the problem.