So you already understand the importance of proper URL normalization within a site's structure. Great! But where do you go from there? The .htaccess file allows you to override certain decisions that the server makes in the background. What this means for SEOs is that we can shape the information the server sends back to the client in a way that both search engines and users find superior.
Your Basic 301
301 is the HTTP status code for a permanent redirect. Thus, a 301 command invoked by htaccess lets search engines know that any authority/links accumulated by the old URL should be directed to the new URL forever. 301 redirects are the bread and butter of URL normalization.
The Syntax
The 301 redirect is extremely simple:
redirect 301 /relative/path/to/file.php https://www.yoursite.com/path/to/new/file.php
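As a rough illustration, a couple of page-level redirects might look like the sketch below. The paths here are made up for the example, and yoursite.com is the same placeholder domain used above.

```apache
# Hypothetical paths: swap in your own old and new URLs.
# The first argument is the old URL path (it must start with a slash);
# the second is the URL of the new location.
Redirect 301 /about-us.html https://www.yoursite.com/about/
Redirect 301 /products/widget.php https://www.yoursite.com/shop/widget.php
```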
The Need for More Power
Redirecting pages using the standard 301 command shown above is extremely simple when you have a small number of pages to consider.
However, once the rules become more complex, such as a directory or site transfer, the redirect 301 command becomes extremely time-consuming.
Creating a redirect 301 for thousands of pages can easily take days of boring grunt work.
The solution? The Apache Rewrite tool.
A Word on Regular Expressions
While the following examples in the rest of the article may be slightly modified to fit your needs, I highly encourage you to learn regular expressions. They aren't nearly as difficult as they seem.
For a starting point, you can check out my introduction to regular expressions guide, and then grab some more tutorials from Regular-Expressions.info.
Additionally, you can view my more advanced guide to RegEx, covering back references, quantifiers, and anchors.
A Simple Example Using the Rewrite
Before you declare any RewriteRules, you need to tell .htaccess to turn on the Rewrite Engine.
Place the following near the top of your .htaccess file:
RewriteEngine On
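If mod_rewrite is not loaded on your host, a bare RewriteEngine line in .htaccess will make every request fail with a server error. A common defensive pattern, sketched here on the assumption that you would rather have the rules silently skipped than take the site down, is to wrap them in an IfModule block:

```apache
# Everything inside this block is ignored if mod_rewrite is not loaded.
<IfModule mod_rewrite.c>
    RewriteEngine On
    # RewriteCond / RewriteRule lines go here
</IfModule>
```

The trade-off is that a missing module then goes unnoticed, so your redirects may quietly not happen. Whether that is preferable depends on your setup.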
The syntax for a RewriteRule is as follows:
RewriteRule url_pattern file_reference
The url_pattern is a regular expression matched against the requested URL; the file_reference is what the server should serve, or redirect to, when the pattern matches. Let's define these more concretely with an example.
1. Change File Extension
For example, let's assume your old server only let you serve up plain html files. Now that you've moved to a host that doesn't use floppy disks, you've decided to implement some php. You had a ton of html files that are now php files, and you don't want to incur any duplicate content issues or lose the link juice those html pages had garnered. The solution:
RewriteBase /
RewriteRule ^(.*)\.html$ $1.php [R=301,L]
So now all requests to www.yoursite.com/whatever.html get redirected to www.yoursite.com/whatever.php.
Explanation: Let's start off with the URL pattern.
^(.*)\.html$
Now section by section:
^
The caret symbol matches the start of a string. In our case, the string is the URL. Rarely do you exclude the caret; without the caret, you can introduce ambiguities.
(.*)
The dot operator for regular expressions matches any character. The star quantifier that follows means "0 or more instances". Thus .* together yields "any character, 0 or more times". By "0 or more", I do not imply that the same character has to be repeated: .* matches the strings aaaaa and 23e2323. Any character, any number of times.
The parentheses around the dot and * tell the engine to store (remember) these matched characters in what we call a back reference (a bit like a variable). This allows us to (re)use these characters later, which is extremely useful!
\.html$
We know that .html is an extension. However, as defined above, the dot operator stands for any character. So the pattern ".html" to a regular expression engine can be "9html" or "lhtml", anything! By using the escape character \, we tell the engine that we want to literally match a period.
The dollar sign matches the end of the string. The combination of the caret and dollar sign ensures our pattern matches the entire URL and not just a substring of the URL.
Now that we've finished the pattern, let's take a look at the file reference.
$1.php
The parentheses in the URL pattern helped us capture all characters up until the .html extension. We are able to access these back references through the dollar sign: $1 refers to the first back reference, $2 to the second, and so on.
Now files such as website.com/file.html will reference website.com/file.php.
But in this example, we don't want to stop at the reference. If we left off the flags at the end, the code as is would simply display the contents of file.php though the URL is still file.html. That's just duplicate content! By placing the [R=301,L] flags at the end of the RewriteRule (with no space after the comma, or Apache will reject the rule), we give the signal to actually redirect to file.php and have that reside in the URL.
This might seem a bit complex at first, but after familiarizing yourself with basic regular expressions, you will easily understand rewrite rules such as this one.
More Rewrite Snippets
2. Force www in URLs
RewriteCond %{HTTP_HOST} !^www\.yourwebsite\.com [NC]
RewriteRule ^(.*)$ https://www.yourwebsite.com/$1 [R=301,L]
3. URLs Without Extensions Should Be Executed via PHP
RewriteRule ^$ index.php
RewriteRule ^((?!(\.|\.php)).)*$ $0.php
For example, website.com/work will load website.com/work.php, but the URL will not have the extension (and extensionless URLs are sexy!)
4. Redirect index.php to the Root
Options +FollowSymLinks
DirectoryIndex index.php
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
RewriteRule ^index\.php$ https://www.yourwebsite.com/ [R=301,L]
5. Using Rewrite With WordPress
Let's say you're using WordPress and you want a pretty URL to represent listing books by author on a books page.
WordPress executes its own rewrite rules that route everything back to index.php, so many of your rewrite rules will not work because you're probably trying to rewrite a URL that has already been rewritten! The workaround is to use the pagename GET parameter of index.php.
RewriteRule ^author/(.+)$ index.php?pagename=books&author=$1
So the URL will appear as www.buybooks.com/author/king, but the server will load the php file as if www.buybooks.com/index.php?pagename=books&author=king was entered. This means you have access to the values via $_GET on the books page in your theme! Pretty handy for a WP developer wanting to organize information.
The world of Regular Expressions is quite fascinating. Rewrites allow your sites to become both more organized AND more flexible!
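To recap, here is a rough sketch of how snippets 1, 2, and 4 might sit together in a single .htaccess file. Treat it as a starting point rather than a drop-in config: yourwebsite.com is the placeholder domain from the snippets, the ordering (host redirect first) is my own assumption about what is usually sensible, and the IfModule wrapper is optional.

```apache
# Sketch only: yourwebsite.com is a placeholder domain.
Options +FollowSymLinks
DirectoryIndex index.php

<IfModule mod_rewrite.c>
    RewriteEngine On

    # Snippet 2: send any other host name to the www host.
    RewriteCond %{HTTP_HOST} !^www\.yourwebsite\.com [NC]
    RewriteRule ^(.*)$ https://www.yourwebsite.com/$1 [R=301,L]

    # Snippet 4: a directly requested /index.php goes to the root.
    RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
    RewriteRule ^index\.php$ https://www.yourwebsite.com/ [R=301,L]

    # Snippet 1: old .html URLs permanently redirect to .php.
    # A leading slash is used here instead of relying on RewriteBase.
    RewriteRule ^(.*)\.html$ /$1.php [R=301,L]
</IfModule>
```

Each redirecting rule carries the L flag so that processing stops as soon as one of them matches, and the browser's follow-up request is then checked against the rules from the top again.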