10 Things Google Wished You Knew

Still trying to read Google's hidden signals and figure out their true meaning?

Sometimes a cigar is just a cigar and a kiss just a kiss. Here are 10 "take them at face-value" things Google wished you knew.

1. There is no duplicate content penalty

"Let's put this to bed once and for all, folks: There's no such thing as a 'duplicate content penalty.'"
-- Susan Moskwa, Webmaster Trends Analyst, in Demystifying the "duplicate content penalty"

Scraping others' content can get you in trouble, sure, but having on-site, accidental, non-malicious duplicate content does not earn you minus points or a penalty.

On larger, dynamic sites, duplicate content can create a near-infinite number of pages to crawl, while Google assigns each site a finite amount of crawl time depending on its importance. Wasting Googlebot's time on your site by feeding it duplicate content? Almost as good an idea as trying to get a penalty.
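
To make that concrete, here is a minimal sketch (the URLs and parameter names are invented for illustration) of how one page can balloon into many crawlable addresses once session IDs and sort options leak into internal links, and how stripping those parameters before links are rendered keeps the duplicates out of Googlebot's path:

    // Hypothetical sketch: normalize internal link URLs so one page
    // isn't exposed under dozens of parameter combinations.
    // The parameter names below are assumptions, not a standard.
    const DUPLICATE_PARAMS = ["sessionid", "sort", "ref"];

    function normalizeInternalUrl(rawUrl: string): string {
      const url = new URL(rawUrl);
      for (const param of DUPLICATE_PARAMS) {
        url.searchParams.delete(param);
      }
      return url.toString();
    }

    // All three variants collapse to https://example.com/widgets/blue-widget
    console.log(normalizeInternalUrl("https://example.com/widgets/blue-widget?sessionid=abc123"));
    console.log(normalizeInternalUrl("https://example.com/widgets/blue-widget?sort=price&ref=home"));
    console.log(normalizeInternalUrl("https://example.com/widgets/blue-widget"));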

2. We see your NOSCRIPT & raise you a "yeah, right"

"One of the problems with noscript is - as others have mentioned - that it's been abused quite a bit by spammers, so search engines might treat it with some suspicion. So if this is really important content, then I wouldn't rely on all search engines treating your noscript elements in the same way as normal, visible, static content on your pages."
-- John Mueller, Webmaster Trends Analyst, in Best way to include static content in dynamic pages?

The only thing missing from John Mu's statement is a "wink wink, nudge nudge" after that "might." Cold hard fact: content in <noscript> loses almost all of its value.

If you want to cater to people who have JavaScript turned off, the way to do it is with unobtrusive JavaScript.
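
A minimal sketch of what that looks like in practice (the element ids and class name here are invented for illustration): the content ships in the regular HTML for every visitor and crawler, and a small script layered on top only adds behaviour, so nothing of value is hidden behind JavaScript and <noscript> never enters the picture.

    // Unobtrusive JavaScript / progressive enhancement, as a rough sketch.
    // The ids "more-reviews" and "reviews" are made-up examples.
    document.addEventListener("DOMContentLoaded", () => {
      const toggle = document.getElementById("more-reviews");
      const reviews = document.getElementById("reviews");
      if (!toggle || !reviews) return;

      // Without JavaScript the link simply jumps to the #reviews anchor,
      // which is already rendered in the page's HTML.
      toggle.addEventListener("click", (event) => {
        event.preventDefault();
        reviews.classList.toggle("expanded"); // purely cosmetic enhancement
      });
    });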

3. We don't do meta keywords

"Our web search (the well-known search at Google.com that hundreds of millions of people use each day) disregards keyword metatags completely. They simply don't have any effect in our search ranking at present."
-- Matt Cutts, Search Quality Team, in Google does not use the keywords meta tag in web ranking

It's not just for search ranking that Google ignores the meta keywords tag; it doesn't even use it for retrieval. Whatever you put in your meta keywords is 100% useless to Google.

4. Use the 503 "Away" Server Code & We'll Be Back in Under 24 Hours (but you can still serve content)

"The interesting thing about a 503 HTTP result code (or most others) is that you can serve normal content to your users and it will only be recognized by those that explicitly watch out for the result codes, usually only search engine crawlers. [] return the 503 HTTP result code with your "we're currently closed" message, so that users can see the message, but search engine crawlers know to ignore the content and to come back another day (in practice, they'll probably try to come back sooner than that)."
-- John Mueller, Webmaster Trends Analyst, in Can I restrict Google from crawling my site on a specific day of the week?

Any "we're not serving content at the moment" kind of situation, like upgrading your CMS or other site work, should see your server returning a 503 status code; you can still show visitors regular content if needed.

5. There is no supplemental index (anymore)

"Now we're coming to the next major milestone in the elimination of the artificial difference between indices: rather than searching some part of our index in more depth for obscure queries, we're now searching the whole index for every query."
-- Yonatan Zunger, Search Quality Team, in The Ultimate Fate of Supplemental Results

The supplemental index was a necessary part of the old disk-based index. Inserting new information into the index and then re-sorting was an "expensive" operation. Part of the solution was to insert only the important documents "right away" into the main index and push the less important ones into a supplemental index.

Nowadays Google's complete index of the web is stored in memory, not on disk, and parts of the index can be decompressed at will. Inserting new documents and applying new ranking systems (read: re-sorting) is super easy and can be done in real time.

6. We care about valid HTML NOT!

"Seriously... I don't want to discourage anyone from validating their site; however, unless it's REALLY broken, we're likely going to be able to spider it pretty decently. []

Being more specific:  I'm betting that in the vast majority of cases in which folks have indexing or ranking concerns, the core issue is NOT that their site doesn't perfectly validate"
-- Adam Lasnik, speaking as webmaster liaison, in Is W3C validation really essential for Google to list my site?

To wilfully disregard good, clean, valid code is economic insanity: where valid code is inherently cross-browser, cross-device and cross-platform compatible, bad code leaves you either missing out on opportunities that should have been yours from the start or spending extra bucks, time and time again, simply to catch up.

While all that is true, valid code isn't one of Google's 200 ranking factors.

7. We adhere to the robots protocol (except crawl-delay)

"[]the reason that Google doesn't support crawl-delay is because way too many people accidentally mess it up. For example, they set crawl-delay to a hundred thousand, and, that means you get to crawl one page every other day or something like that.

We have even seen people who set a crawl-delay such that we'd only be allowed to crawl one page per month. What we have done instead is provide throttling ability within Webmaster Central []"
-- Matt Cutts, Search Quality Team, in Eric Enge Interviews Google's Matt Cutts

If you really feel the need to set crawl-delay, it might be time to look into another host, server, or server setup. If your setup can run into serious problems when a large number of page requests arrive in rapid succession, you're likely unable to deal with a promoted blog post going viral or getting linked to by huge traffic drivers like Techmeme, Lifehacker, or the New York Times.

8. The cached version of your page doesn't correspond to our last crawl

"In general, we do not always update the cached page every time that we crawl a page. Especially when the page does not significantly change, we may opt to just keeping the old date on it."
-- John Mueller, Webmaster Trends Analyst, in Google cache of index page does not change

Regardless of how many times a day or year Googlebot comes by, the cached version of a given page on your site isn't always updated with every crawl.

9. TLD trumps hosting location for geo-targeting

"if your site has a geographic TLD/ccTLD (like .co.nz) then we will not use the location of the server as well. Doing that would be a bit confusing, we can't really "average" between New Zealand and the USA... At any rate, if you are using a ccTLD like .co.nz you really don't have to worry about where you're hosting your website, the ccTLD is generally a much stronger signal than the server's location could ever be."
-- John Mueller, Webmaster Trends Analyst, in hosting server IP address importance to SEO

An "oh cool" remark for a lot of people, we're sure. Got the domain name extension that goes with the country you want to talk to? No worries about where you're hosting. Of course, if you do want to target other countries, then you have some work to do.

10. Pages blocked in robots.txt can still get PR

"a page that is blocked by robots.txt can still accrue PageRank. In the old days, ebay.com blocked Google in robots.txt, but we still wanted to be able to return ebay.com for the query [ebay], so uncrawled urls can accumulate PageRank and be shown in our search results."
-- Matt Cutts, Search Quality Team, in PageRank sculpting