A new tool has been added to the SEO's arsenal to deal with duplicate content created by URL parameters.
The tool is specific to parameter handling within Google's search and can be found in Google Webmaster Tools.
URL Parameters
URL parameters customize our web experience, from passing page ids to organizing our shopping search results with variables like size, price or color.
URL parameters can present SEO challenges. Search engines treat every distinct URL as a separate page and may index each one.
URL parameters are a common culprit when it comes to duplicate content because they make the same content appear on "different" pages.
For example, these are likely to have very similar content:
- https://example.com/widgets/blue/
- https://example.com/?product=widget&color=blue
- https://example.com/?product=widget&color=blue&price=all
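To see why the last two URLs are near-duplicates, it can help to break the query strings apart. The following is a minimal sketch using Python's standard urllib.parse module; the helper names query_params and extra_params are made up for this illustration and are not part of any Google tool.

```python
from urllib.parse import urlsplit, parse_qsl


def query_params(url):
    """Return the URL's query parameters as a dict (last value wins)."""
    return dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def extra_params(base_url, other_url):
    """Return parameters of other_url that are missing or different in base_url."""
    base = query_params(base_url)
    return {k: v for k, v in query_params(other_url).items() if base.get(k) != v}


a = "https://example.com/?product=widget&color=blue"
b = "https://example.com/?product=widget&color=blue&price=all"

print(query_params(a))     # {'product': 'widget', 'color': 'blue'}
print(extra_params(a, b))  # {'price': 'all'}
```

The only difference between the two URLs is price=all, which most likely leaves the listing unchanged. That is the kind of parameter that tends to create duplicate pages.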
Fortunately, there are ways to deal with this challenge, a number of which have been discussed at length here on SEP.
Ways To Deal With Parameter-caused Duplicate Content
Duplicate content caused by parameters can be handled through methods such as:
- Canonical link tags (rel="canonical")
- Noindex meta tags
- Robots.txt blocking
- URL Rewriting and 301 Redirects
Google Webmaster Tools' New Parameter Tool
To find it, click the Site Configuration tab, which expands to reveal URL Parameters as the last option.
Once you've located the URL Parameters section in Webmaster Tools, you'll likely notice a list of your common parameters already compiled and waiting for your input. Smaller sites may have only a few parameters, if any at all, while large or ecommerce sites could have hundreds.
The parameter handling tool allows you to give additional information on your parameters and inform Google of the role and importance of different parameters.
Describing Your Parameters
Google gives you several options to choose from for each parameter, based on the role you'd like it to play and the importance of its indexation. Below are the options you're presented with for each parameter.
Question 1: Does this parameter change page content seen by the user?
Option 1: Yes: Changes, reorders, or narrows page content
If you choose yes, another set of options will open up allowing you to further describe that parameter.
Option 2: No: Doesn't affect page content (ex: tracks usage)
If you select no, Googlebot will know to only crawl and index one representative URL.
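As a rough illustration of what "one representative URL" means, the sketch below strips parameters that are assumed not to change content and groups the URLs that collapse to the same address. This is only a model of the idea, not Googlebot's actual logic; the ref parameter and both function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def representative_url(url, no_effect_params):
    """Drop parameters assumed not to affect content and rebuild the URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in no_effect_params
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_by_representative(urls, no_effect_params):
    """Map each representative URL to the original URLs that collapse to it."""
    groups = defaultdict(list)
    for url in urls:
        groups[representative_url(url, no_effect_params)].append(url)
    return dict(groups)


# "ref" is a hypothetical tracking parameter used only for this example.
urls = [
    "https://example.com/?product=widget&ref=newsletter",
    "https://example.com/?product=widget&ref=twitter",
    "https://example.com/?product=widget",
]
groups = group_by_representative(urls, {"ref"})
print(list(groups))  # ['https://example.com/?product=widget']
```

All three URLs collapse to a single address, so only one version needs to be crawled and indexed.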
Question 2: How does this parameter affect page content?
Your available options are:
a. Sorts: This parameter changes the order in which content is presented.
b. Narrows: This parameter filters the content on the page.
c. Specifies: This parameter determines the set of content displayed on a page.
d. Translates: This parameter displays a translated version of the content.
e. Paginates: This parameter displays a specific page of a long listing or article.
f. Other: This parameter changes content in ways other than those described above.
Question 3: Which URLs with this parameter should Googlebot crawl?
Lastly, you can indicate which URLs containing the parameter should be crawled with these options:
a. Let Googlebot Decide
b. Every URL
c. Only URLs with value: (user-defined)
d. No URLs
These options are fairly self-explanatory, giving you distinct choices for how Googlebot should crawl URLs that contain the parameter.
Common Questions with Parameter Handling
If I have canonical tags, a robots.txt or other methods of controlling parameters already in place, what settings should I choose?
If you have already established a way of handling your parameters, then you don't necessarily need to do anything with this new tool. However, if you wish to customize certain parameters, you can leave the others as Let Googlebot Decide and Google should take your prior parameter handling methods into account.
What do you do if there are multiple parameters on the same page that you want to handle in different ways?
If there are parameters you wish never to index, choose the No URLs option. This will override any other settings when that particular parameter is present. Otherwise, you can select the option to crawl only URLs with a certain value, indicating the pages you would like crawled. In general, more restrictive settings will override less restrictive ones.
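One way to picture how several settings combine is the sketch below. It is a simplified model of the rule described above, in which the most restrictive setting wins, and not a description of how Googlebot is implemented. The Crawl enum, the allowed_by_settings function and the sessionid parameter are all hypothetical, and Let Googlebot Decide is treated the same as Every URL purely to keep the example short.

```python
from enum import Enum
from urllib.parse import urlsplit, parse_qsl


class Crawl(Enum):
    LET_GOOGLEBOT_DECIDE = "Let Googlebot Decide"
    EVERY_URL = "Every URL"
    ONLY_VALUE = "Only URLs with value"
    NO_URLS = "No URLs"


def allowed_by_settings(url, settings):
    """Return False as soon as any parameter in the URL is excluded.

    settings maps a parameter name to a (Crawl, wanted_value) pair.
    Parameters without an entry default to Let Googlebot Decide.
    """
    default = (Crawl.LET_GOOGLEBOT_DECIDE, None)
    for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        mode, wanted = settings.get(name, default)
        if mode is Crawl.NO_URLS:
            return False  # the most restrictive setting always wins
        if mode is Crawl.ONLY_VALUE and value != wanted:
            return False
    return True


# Assumed settings: crawl only price=all, never crawl URLs with sessionid.
settings = {
    "price": (Crawl.ONLY_VALUE, "all"),
    "sessionid": (Crawl.NO_URLS, None),
}

print(allowed_by_settings("https://example.com/?product=widget&color=blue&price=all", settings))  # True
print(allowed_by_settings("https://example.com/?product=widget&price=low", settings))             # False
print(allowed_by_settings("https://example.com/?product=widget&price=all&sessionid=abc", settings))  # False
```

The third URL has an acceptable price value but is still excluded, because the No URLs setting on sessionid overrides everything else.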
Will the new URL Parameters tool conflict with other canonical tags?
According to Google, it won't conflict, and it's fine to use both if you want to be very thorough.
With the rapidly growing web and incredible amounts of content, parameter handling is an effective effort by Google to reduce redundant crawling of pages, making more efficient use of Googlebot resources to crawl and index as much as possible. Using this new tool, you can accurately describe your parameters, reduce duplicate content and improve crawling efficiency across your sites.
What about parameters that are part of a file, such as an image file or theme file? I see a lot of parameters in there that are part of my theme files and I’m not sure if I should be blocking them or not. The parameters in question are: “h”, “w”, “src”, etc.
Hi Joe,
That’s a great question. There are many sites, particularly with WordPress, that load scripts, images, includes and a number of other files, often using variables. The answer, however, is not a simple yes or no; it depends on the content you’re loading.
Some ways you can determine this would be to take the url (with parameters), view it in your browser and ask:
– Is there content showing that you want indexed?
– Is the content unique, or is the parameter something that does not change the content? In which case you may have a cleaner url to index.
An example of where it might be loading something you wouldn’t want to block would be an iframe loading unique content. For your specific examples of h, w and src they’re likely something to do with height and width which won’t change the unique content but rather the layout. If you try the quick test above you’ll have a confident answer.
For most cases with theme files, functions, plugins, and the like they simply add functionality and blocking them from the index won’t affect your content. It’s important however to ensure that other content that you do want to be indexed does not use the same variable.
If you’re unsure, it’s generally better to leave it set to ‘Let Googlebot Decide’ and Google will index any relevant unique content.
Hope that helps,
Kris