In a previous post I outlined how to implement rel=canonical HTTP headers using ASP.NET C#, ASP VBScript and PHP. However, there is a much simpler way to do this via the .htaccess file. Before we get into the code, let's recap why you might want to use an HTTP header canonical link, and in what situations you could use one.
Back in June Google announced that its web search now supports a rel=canonical link relationship in HTTP headers. This is useful for both SEOs and webmasters who are combating the likes of PDF documents ranking higher than their preferred HTML pages. If your company publishes documents in both HTML and PDF format, the HTTP header canonical link lets you tell Google which is the preferred, or main, format that it should rank. This is even more essential if you are using Analytics to track traffic to your website, as PDF downloads aren't tracked in Analytics. If your PDF documents rank higher than your HTML pages, you're potentially missing out on traffic and on accurate Analytics metrics.
For many people, HTTP headers are an area of development that is all too often overlooked. Take a 404 page, for example: do you check that your 404 page serves the correct HTTP header response? It's important to know how to check response headers and what they mean, but that's for a different post. Google's 404 page is a good example of one that serves the correct 404 response header.
The Code
By far the easiest way to implement an HTTP header canonical link is via the .htaccess file. This is because the file can be accessed via FTP and the majority of developers know what they are doing with a .htaccess file. If you are unsure of what you are doing, I suggest you steer clear of the .htaccess file, as you can quite easily make a mess of things.
In the following example we are going to use the .htaccess file to place an HTTP header canonical link from a PDF file called white-paper.pdf to an HTML file called white-paper.html. In this case both files are placed in the root of the domain:
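# The Header directive requires Apache's mod_headers module.
<FilesMatch "^white-paper\.pdf$">
# Point search engines at the HTML version of this PDF.
Header set Link '<https://yoursite.com/white-paper.html>; rel="canonical"'
</FilesMatch>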
To test that everything is working correctly, use a plugin such as Live HTTP Headers for Firefox. If you need to implement a large number of canonical links, then your best bet is to use the methods in my previous post, as a .htaccess file can become very bloated using this method. It's important to gain control of and manage all of the files on your website, and using rel=canonical HTTP headers is a great way of achieving this.
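If you would rather check the header from a script than from a browser plugin, here is a minimal Python sketch of one way to do it. The function names canonical_from_link_header and fetch_canonical are my own, not part of any library, and the parsing is deliberately simple: it assumes the Link header follows the `<url>; rel="..."` shape used above and does not handle quoted parameters that contain commas or angle brackets.

```python
import re
import urllib.request

# Each link in a Link header looks like: <url>; param="value"; ...
# Capture the URL, then everything up to the next "<" as its parameters.
LINK_RE = re.compile(r'<([^>]*)>([^<]*)')
REL_RE = re.compile(r'rel\s*=\s*"?([^";,]*)"?', re.IGNORECASE)


def canonical_from_link_header(link_header):
    """Return the rel="canonical" target from a Link header value, or None.

    The target is returned exactly as sent, so a stray space inside the
    angle brackets stays visible instead of being silently cleaned up.
    """
    if not link_header:
        return None
    for target, params in LINK_RE.findall(link_header):
        rel = REL_RE.search(params)
        # rel may hold several space-separated relation types.
        if rel and "canonical" in rel.group(1).lower().split():
            return target
    return None


def fetch_canonical(url):
    """Send a HEAD request to url and return its canonical link, or None."""
    # HEAD avoids downloading the PDF body; we only need the headers.
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        # A response may carry more than one Link header; join them.
        links = response.headers.get_all("Link") or []
    return canonical_from_link_header(", ".join(links))
```

Calling fetch_canonical("https://yoursite.com/white-paper.pdf") (a placeholder URL, as in the example above) should return the HTML address if the .htaccess rule is being applied, and None if the header is missing.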
Thanks Alex, really helpful and useful post.
One more question: can I use an absolute URL, or is a relative URL a must?
FilesMatch "http://www.example.com/pdf/white-paper.pdf"