Let's face it. You wish your website ranked in the number one spot on Google. Yes, there are more important things, like overall online visibility and ROI. And yes, there are other major players, like Yahoo! and Bing. But today we are going to focus on Google. In fact, I am going to make a bold statement and say that anyone who tells you they don't care about ranking high in Google is lying.
Software robots, or bots, called spiders build lists of the words they find on websites. This process is called crawling. Spiders crawl through web pages and index what they find, and they follow every link within each page. This practice of following links lets spiders travel quickly across the web, indexing pages as they go.
There are millions of websites and billions of web pages. Because of this, Google has sophisticated algorithms that determine how much time a spider can spend on your site. In order for your site to be displayed in the results pages, it is important that the spiders properly (and fully) index your website.
Spider-Friendly Checklist
For those of you well versed in all things SEO, nothing here will be new to you. For the rest of you newbs out there (we all have to start somewhere), please keep in mind that some factors carry more weight than others, and that they are listed in no particular order. In my humble opinion, it is beneficial to execute all of the items listed below. Every little bit helps. With constantly changing algorithms, it is imperative to have all the bases (and basics) covered.
Individually, each of the tasks below won't have a huge impact. Collectively, they will help your site rank, especially if you're in a niche, not-so-competitive industry (like crumb rubber in Penticton). If you are in a competitive industry (like New York real estate), these tactics are small (but necessary) stepping stones to compete with the big boys. At the end of the day, no matter how competitive your industry is, if spiders are unable to index your site, you won't be found in Google. Simple as that.
All of the items below can (and do) have blog posts of their own describing each task in detail. For the sake of brevity, each tactic is described here in basic, top-level terms.
- Create good site architecture and link structure: Two to three clicks to reach a destination (i.e. important product or service landing pages) from the homepage is optimal. If a spider has to crawl too deep, it may never get to those pages. Also, be sure you don't have any orphan pages (i.e. pages that no other page on your site links to). This seems glaringly obvious, but it happens.
- Avoid the use of dynamic URLs: A dynamic URL is a URL built from script parameters rather than plain English, for example: www.mysite.com/product.php?id=4&ses=aa. One of the problems associated with dynamic URLs is that too many parameters can cause a spider trap, where a spider gets stuck in an endless loop of machine-generated URLs. What you want instead is a URL that has your chosen keywords written in plain English. This is referred to as a canonical URL. Dynamic URLs are handy for tracking things, so if you do need to use them, set up a mod_rewrite rule so that spiders see the canonical URL; a minimal sketch follows below. Another tip: use hyphens instead of underscores. Spiders read words joined by underscores as all one word, whereas words separated by hyphens are read as separate words.
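Here's roughly what that rewrite rule might look like in an Apache .htaccess file. The script name, parameter, and keyword URL are all hypothetical; adjust them to your own setup.

```
# Hypothetical sketch: serve the plain-English URL /widgets/blue-widget/
# from the dynamic script that actually builds the page (product.php?id=4)
RewriteEngine On
RewriteRule ^widgets/blue-widget/?$ /product.php?id=4 [L]
```

Visitors (and spiders) only ever see the keyword-rich version, while your tracking-friendly dynamic URL keeps working behind the scenes.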
- Beware of duplicate content issues: Duplicate content can waste valuable spider time. There are instances where it is unavoidable, so ensure that it is dealt with correctly. In many cases, the best solution is to use 301 redirects to point all of the duplicate URLs to one page. Your date with spidey is limited, so make sure you're giving him new content to index; it's a waste of time having a spider index pages it already has. Also, left to its own devices, Google gets to pick which of the duplicated pages to index, and it just might pick the wrong one. There are more advanced options to consider as well. One is adding a canonical tag (a page-level meta tag) to specify which version is the canonical page (aka the plain-English URL); a sample follows below. The downfall is that spiders have to crawl the page first to read the tag, so it's not necessarily maximizing your time with Google's spider. Google recently added another option: in Webmaster Tools, you can tell Google's robots to ignore certain dynamic parameters and have the spiders crawl only the canonical version of a page. The benefit is that it can reduce crawling of unnecessary pages and free up bandwidth for other pages to get crawled.
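For the curious, here's roughly what those two fixes look like; the URLs are hypothetical.

```
# .htaccess: 301-redirect a duplicate URL to the one page you want indexed
Redirect 301 /old-widget-page/ http://www.mysite.com/widgets/blue-widget/
```

```html
<!-- Canonical tag, placed in the <head> of every duplicate version -->
<link rel="canonical" href="http://www.mysite.com/widgets/blue-widget/" />
```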
- Create a robots.txt file: This file gives you an opportunity to tell spiders which parts of your site are not important for them to check out (such as the folders where your images are contained). This helps ensure you're not wasting valuable face time having unnecessary files checked. Robots.txt can also tell spiders which pages they shouldn't check, which helps avoid duplicate content issues. Be careful, though: an incorrect robots.txt can make your site uncrawlable. A bare-bones example follows below.
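A bare-bones robots.txt lives at the root of your domain and looks something like this (the folder names here are hypothetical):

```
# Applies to all crawlers
User-agent: *
Disallow: /images/
Disallow: /print-versions/
# Bonus: point crawlers at your XML sitemap
Sitemap: http://www.mysite.com/sitemap.xml
```

One cautionary note: a stray "Disallow: /" on its own blocks your entire site, which is exactly the uncrawlable scenario you want to avoid.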
- Generate an XML sitemap: It's true that Google will eventually find your site and spider it. It's also true that this can take some time if you have a brand new site with little or no external links pointing towards it. Submitting an XML sitemap to Google helps speed up the process and is generally considered good practice. A minimal example follows below.
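A minimal sitemap.xml, per the sitemaps.org protocol, is just a list of your URLs (hypothetical ones shown here):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.mysite.com/</loc>
  </url>
  <url>
    <loc>http://www.mysite.com/widgets/blue-widget/</loc>
  </url>
</urlset>
```

Submit it through Google Webmaster Tools, or reference it from your robots.txt as shown earlier.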
- Utilize on-page SEO tactics: Not all on-page tactics are equal. However, algorithms inevitably change, so covering all the bases is encouraged. On-page tactics help those busy little spiders, as they build their lists of words, distinguish which keywords are central to each page on your site.
Your goal is to create content that people will want to read and share with others (and hopefully link to). Use the following tactics in a way that keeps your keywords sounding as natural as possible (including the use of modifiers and synonyms of the selected phrase); a simple HTML sketch pulling them together follows the list.
- Use your most important keywords at the front of your page title.
- Use relevant keywords in the H1 tag for page headlines.
- Adjust your internal linking structure so that you are linking using relevant anchor text.
- Label images and photos with your targeted keywords only if relevant (i.e. no unnecessary keyword stuffing).
- There is no magical formula for how many times you should repeat your selected keyword phrase. However, it's safe to say that using it at least two or three times throughout the body copy makes sense.
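To make that concrete, here's a stripped-down sketch of a page targeting the hypothetical phrase "blue widgets", with the tactics above in place:

```html
<html>
<head>
  <!-- Most important keywords at the front of the page title -->
  <title>Blue Widgets - Handmade Widgets | MySite</title>
</head>
<body>
  <!-- Relevant keywords in the H1 headline -->
  <h1>Handmade Blue Widgets</h1>
  <p>Our blue widgets are cut from recycled crumb rubber. If blue isn't
     your colour, browse our <a href="/widgets/red-widgets/">red widgets</a>
     instead.</p>
  <!-- Image labelled with the targeted keyword, no stuffing -->
  <img src="/images/blue-widget.jpg" alt="Handmade blue widget" />
</body>
</html>
```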
- Ensure your pages have a fast download time: Google says that spiders will crawl as many pages as they can without overwhelming your server. Most often, they only crawl a portion of your site before moving on. There is a direct correlation between page download time and how many pages get crawled that day. Make sure your pages are not too big and load quickly; one easy win is sketched below.
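One easy (if server-dependent) win is compressing your text resources. On Apache with mod_deflate enabled, a line like this in .htaccess shrinks HTML, CSS, and JavaScript on the wire; check that your host supports the module first.

```
# Hypothetical sketch: gzip-compress text resources to speed up downloads
AddOutputFilterByType DEFLATE text/html text/css application/javascript
```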
- Be careful when using Flash: Yes, it is true that spiders have come a long way in their ability to index Flash. At present, Google's Flash algorithms extract text and links only, which sounds all good. However, Google's spiders will not crawl or index any Flash executed using JavaScript (which a lot of Flash relies on). At this point in time, it is still best practice to be careful about using Flash for integral parts of your website (i.e. links, navigation, and important content); one defensive approach is sketched below.
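One defensive approach is to embed Flash with plain HTML fallback content inside the object tag, so spiders (and visitors without Flash) still get indexable text and links. A rough sketch, with hypothetical file names:

```html
<object type="application/x-shockwave-flash" data="/flash/banner.swf"
        width="600" height="200">
  <!-- Spiders and Flash-less visitors see this indexable content instead -->
  <h2>Handmade Blue Widgets</h2>
  <p><a href="/widgets/">Browse our full widget catalogue</a></p>
</object>
```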
- Create custom 404 pages: If a spider hits a 404 page (i.e. page not found) with no links on it, that's the end of the visit; it has nowhere to go. A custom 404 page ensures there are links for the spider to follow so it can continue indexing your site; a one-line server setup is sketched below.
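On Apache, pointing the server at your custom page is one line in .htaccess (the file name is hypothetical). Use a local path rather than a full URL, so the server still returns a true 404 status code; a full URL makes Apache redirect and report a 302 instead.

```
# Hypothetical sketch: serve a custom, link-filled page on "not found"
ErrorDocument 404 /404.html
```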
- Add your site to Google Webmaster Tools: Google will let you know when things are wrong with your site, like crawl errors or duplicate meta information. This information helps you keep your site Google friendly.
- Do not hide content behind logins if you want it to be seen: If a user needs to log in to view content, then so does the spider. This means that any content hidden behind a login will not get indexed.
- Do not require cookies or session IDs to view your site: Our friendly crawlers do not have the ability to accept cookies or session IDs. If your site requires them, it won't get indexed by spiders. There are parts of a site where this is okay, like the checkout, and parts where it's bad, like your homepage.
And there you have it: A solid platform upon which to build a successful (and well spidered) website. What are some of your tips for maximizing your time with Google's spiders? We'd love it if you'd share them.
Stephanie Woods is a freelance SEO/SEM consultant (www.stephwoodsseo.wordpress.com) in Kelowna, BC, Canada. You can follow her on Twitter (www.twitter.com/steph_woods).