The larger and older the website you're working on, the harder it becomes to find duplicate content and thin content.

Add external duplication to (too much) internal duplication and your site can be demoted or even completely filtered out in search results.

Add too much thin content, especially in combination with internal and external duplication, and you're looking at a slap on the wrist from Google's Panda algorithm.

How To Find Thin Content And Duplicate Content

While Google has said that short content doesn't equal thin content, word count is a reasonable first proxy for spotting what might be thin content.

For duplicate content you could just do a "site:" query in Google, feeding it a phrase from a page. That's good enough for a one-page spot check, but what if you want to find duplicate content across your whole site? And what if you want to catch not only exact duplicates but near duplicates as well?
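For the one-page spot check, the query can be assembled in a few lines. This is only a minimal sketch: `site_query_url` is a hypothetical helper name and `example.com` is a placeholder domain. It wraps the phrase in quotes so Google matches it exactly and restricts the search to one site.

```python
from urllib.parse import urlencode


def site_query_url(domain: str, phrase: str) -> str:
    """Build a Google search URL for an exact phrase limited to one domain."""
    # Drop any quotes inside the phrase so the exact-match quoting stays intact.
    cleaned = phrase.replace('"', "").strip()
    query = f'site:{domain} "{cleaned}"'
    return "https://www.google.com/search?" + urlencode({"q": query})


# Hypothetical usage: paste the printed URL into a browser.
print(site_query_url("example.com", "the harder it becomes to find duplicate content"))
```

If more than one URL comes back for a distinctive sentence, that sentence lives on several pages of the site.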

In both cases we're using the excellent website crawler A1 Website Analyzer, which makes a best-effort word count of each page's content and can give feedback on content similarity. With this crawler it's easy to find:

  • Thin content
  • Duplicate titles
  • Duplicate headers (H1 and H2)
  • Duplicate descriptions
  • Similar content

Crawling The Website

When you first start A1 Website Analyzer, the first thing you need to do is crawl the website you wish to inspect:
a1wa-crawl-start

Depending on the data you are interested in, you can configure data collection options in the "Scan website | Data collection" tab.

This will give a result that looks like this:
a1wa-crawl-done

After the scan has finished, you can select which columns to show in the program and in the Excel file it can export:
a1wa-data-columns

You can also set various filters to only show the pages you are interested in:
a1wa-pick-filters

You can even select predefined columns/filters "reports":
a1wa-pick-report

Note: To see all options you can switch off "Simplified easy mode":
a1wa-easy-mode

(For a full description of how to crawl a site with A1 Website Analyzer, click here)

Finding Pages With Thin Content

A good technique for finding pages with thin / shallow content is to look for pages in your website that have a significantly lower text-to-code ratio than the others. You can then rewrite them as needed.

a1wa-thin-content
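To make the idea of a text-to-code ratio concrete, here is a minimal sketch using only the Python standard library. It is an assumption about how such a ratio can be computed (visible text length divided by total HTML length), not a description of how A1 Website Analyzer calculates its own figure. The helper names are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def visible_text(html: str) -> str:
    """Return the page text with markup removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def text_to_code_ratio(html: str) -> float:
    """Share of the HTML source (0.0 to 1.0) that is visible text."""
    if not html:
        return 0.0
    return len(visible_text(html)) / len(html)


sample = "<html><head><style>p{}</style></head><body><p>Hello world</p></body></html>"
print(f"{text_to_code_ratio(sample):.2%}")
```

The absolute number matters less than the comparison: a page whose ratio is far below that of its siblings on the same template is the one worth opening.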

Besides the text/code ratio there's also pure word count:

image

You can sort and filter as you like, export to Excel and filter there, or use the quick report presets:

image
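If you prefer to work on the exported data outside the program, the same filtering can be sketched in Python. The column names `URL` and `Word count`, the sample rows, and the 100-word cutoff below are all assumptions for illustration; match them to the columns you actually exported, and remember that a short page is only a candidate for review, not automatically thin.

```python
import csv
import io

# Hypothetical export data; real column names depend on the columns you selected.
SAMPLE_EXPORT = """URL,Word count
https://example.com/,950
https://example.com/tag/blue,40
https://example.com/about,310
https://example.com/tag/red,35
"""


def thin_pages(csv_text, max_words=100, url_col="URL", count_col="Word count"):
    """Return (url, word_count) pairs at or below max_words, shortest first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    thin = [
        (row[url_col], int(row[count_col]))
        for row in rows
        if int(row[count_col]) <= max_words
    ]
    return sorted(thin, key=lambda item: item[1])


for url, words in thin_pages(SAMPLE_EXPORT):
    print(f"{words:>5}  {url}")
```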

Finding Pages With Duplicate Titles, Headers And Descriptions

By simply selecting one of the built-in reports, which configure which columns are visible and which filters are active, you can quickly see where you have duplicate or similar content.

Duplicate page titles:
a1wa-duplicate-titles

Duplicate H1 page headers:
a1wa-duplicate-headers

Duplicate page descriptions:
a1wa-duplicate-descriptions

Again, you can also sort by these columns inside A1 Website Analyzer, or export to a spreadsheet and work from there.
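When working from an export, finding duplicates comes down to grouping pages by the field in question. The sketch below assumes each page is a dictionary with hypothetical keys `url`, `title` and `h1`; it normalizes whitespace and letter case so that trivially different values still land in the same group.

```python
from collections import defaultdict


def find_duplicates(pages, field):
    """Map each normalized field value to the URLs sharing it (2 or more only)."""
    groups = defaultdict(list)
    for page in pages:
        # Collapse whitespace and ignore case so "Shoes " and "shoes" match.
        value = " ".join(page.get(field, "").split()).casefold()
        if value:
            groups[value].append(page["url"])
    return {value: urls for value, urls in groups.items() if len(urls) > 1}


# Hypothetical sample data standing in for an exported crawl.
PAGES = [
    {"url": "/shoes", "title": "Shoes | Example Shop", "h1": "Shoes"},
    {"url": "/shoes?page=2", "title": "Shoes | Example Shop", "h1": "Shoes"},
    {"url": "/hats", "title": "Hats | Example Shop", "h1": "Hats"},
]

for field in ("title", "h1"):
    print(field, find_duplicates(PAGES, field))
```

The same function works for meta descriptions if you add that field to each page record.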

Finding Pages With Similar Content

A1 Website Analyzer has a unique feature that gives visual feedback on which pages have similar content.

It is still experimental but it works quite well for many websites, and it is an additional tool in the toolbox for identifying possible problems in a website.

Before you initiate a scan, you have to enable the option "Perform keyword analysis of all pages" found in the "Scan website | Data collection" tab.

After the scan has finished, you can sort the results. The program will then try to group pages with "similar" content together. These "similar" groupings go beyond simply determining whether pages are exact duplicates.

a1wa-similar-content

The highlighted pages have a huge overlap at the start of their content.
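How the program scores similarity internally is not documented here, so the following is not its algorithm. It is a minimal sketch of one common way to catch near duplicates: split each page's text into overlapping three-word "shingles" and compare the sets with Jaccard similarity. The function names, the sample texts and the 0.5 threshold are assumptions.

```python
from itertools import combinations


def shingles(text, size=3):
    """Return the set of overlapping word n-grams in the text."""
    words = text.casefold().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_pairs(pages, threshold=0.5):
    """Return (url1, url2, score) for page pairs at or above the threshold."""
    sets = {url: shingles(text) for url, text in pages.items()}
    pairs = []
    for (u1, s1), (u2, s2) in combinations(sets.items(), 2):
        score = jaccard(s1, s2)
        if score >= threshold:
            pairs.append((u1, u2, round(score, 2)))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Hypothetical page texts: two product variants and one unrelated page.
PAGE_TEXTS = {
    "/a": "red widget with free shipping and a two year warranty",
    "/b": "blue widget with free shipping and a two year warranty",
    "/c": "contact our support team by phone or email",
}

print(similar_pairs(PAGE_TEXTS))
```

Comparing every pair of pages grows quadratically with the number of pages, so on a large site this is better run on a suspicious subset, such as pages that share a template.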

How To Fix Duplicate & Thin Content

Common solutions for fixing these problems include:

  • Merge the content of multiple thin pages into one solid page. Redirect the old, thin pages to this URL.
  • Use the canonical tag to point multiple pages to one preferred URL. This is particularly relevant if your pages contain small variations, e.g. different versions of the same product. (See also: What is the Difference between 301 Redirect and Canonical Attribute)
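Both fixes boil down to a mapping from many URLs to one. As a rough sketch with hypothetical helper names and placeholder URLs, the first function below produces the `<link rel="canonical">` element to place in the head of each variant page, and the second builds an old-to-new table that you could translate into redirect rules for your own server.

```python
from html import escape


def canonical_tag(canonical_url):
    """Return the <link> element that points a variant page at its main URL."""
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


def redirect_map(merged_url, old_urls):
    """Map each retired thin page to the merged page (skipping the target itself)."""
    return {old: merged_url for old in old_urls if old != merged_url}


# Hypothetical URLs for illustration only.
print(canonical_tag("https://example.com/widget?a=1&b=2"))
print(redirect_map("/guide", ["/tip-1", "/tip-2", "/guide"]))
```

How the redirect table is applied depends on the server or CMS in use, which is outside the scope of this sketch.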

Note that ensuring pages have unique titles, headings, and meta descriptions, and that these match the content, is a big part of conversion-centered SEO.

About the Author: Ruud Hein

I love helping to make web sites make it. From the ground up if needed. CSS challenges, server-side scripting, user and device friendly JavaScript tricks search engines have no problems with. Tracking how the sites perform and then figuring out how to make that performance and the tracking better. I'm passionate about information. No matter how often I trim my feeds in my feed readers (yes, I use more than one), I always have a couple of hundred in there covering topics ranging from design to usability, from SEO to SEM, from life hacks to productivity blogs, from.... Well, you get the idea, I guess. Knowledge and information management is close to my heart. Has to be with the amount of information I track. My "trusted system" is usually in flux but always at hand and fully searchable. My paid passion job at Search Engine People sees me applying my passions and knowledge to a wide array of problems, ones I usually experience as challenges. It's good to have you here: pleased to meet you!