How Search Really Works: The Keyword Density Myth

Ruud Hein

17 years ago

This post is part of an ongoing series: How Search Really Works.
Last week: Keyword Stuffing.

What is Keyword Density?

Keyword Density is a function, a calculation, of keyword frequency.

It's calculated as number of occurrences divided by number of words and is usually expressed as a percentage.
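As a rough illustration of that calculation, here is a minimal Python sketch. The function name `keyword_density`, the simple tokenizer, and the sample sentence are assumptions made for this example; they are not part of any search engine or SEO tool.

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Occurrences of `keyword` divided by total word count, as a percentage."""
    # Naive tokenizer (an assumption): lowercase runs of letters, digits, apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)


# Hypothetical sample text: 8 words, 2 of which are "cheap".
sample = "Cheap flights to Paris. Book cheap flights today."
density = keyword_density(sample, "cheap")
print(f"{density:.1f}%")  # 2 / 8 words -> 25.0%
```

Note that the result depends only on the document itself: nothing in the calculation knows about any other document.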

What is Keyword Density Used For?

Nothing much, really.

Keyword density can help in readability calculations.

Keyword density is also sometimes used as a simplified way to introduce local keyword weight, but the two should never be confused.

Why don't Search Engines use Keyword Density?

Search engines deal with calculations that say something about words in a document in relation to the index it appears in.

Keyword density says something about words in a document in relation to the document itself. It doesn't help you to compare and thus sort or rank a set of documents.

Frequency <> Relevance

The fact is that frequency in and of itself doesn't equate to relevance.

The word "the" is the most commonly used English word: it appears with the highest frequency. If a search engine calculated relevance as frequency, every document in its index would have "the" as its topic.

Likewise, the word "time" is the most commonly used English noun. Ranking by frequency would make a multitude of documents relevant to "time" before anything else.
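A small Python sketch makes the point concrete. The helper `most_frequent_word` and the sample sentence are made up for this illustration: the text is plainly about bicycles, yet a pure frequency count picks a different "topic".

```python
import re
from collections import Counter


def most_frequent_word(text: str) -> str:
    """Return the single most frequent word, using a naive tokenizer."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(1)[0][0]


# Hypothetical document about bicycles.
doc = (
    "The history of the bicycle: the first bicycle had no pedals, "
    "and the rider pushed the ground."
)
print(most_frequent_word(doc))  # "the" occurs 5 times, "bicycle" only twice
```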

Keyword Weight

To make sense of word occurrences in a document a search engine has to see those words in the context of its index.

This is done by calculating the overall importance of words both in the document and in the index.

This importance is called term weight.

To calculate the importance of a word in a document, three variables are needed:

local weight: a calculation based on keyword frequency in this document. This variable can be calculated in many ways but not as a straightforward count of how many times the word appears in the document.
global weight: a calculation based on the number of documents in the index divided by the number of documents that contain the keyword.
normalization: a calculation designed to remove the unfair advantages and disadvantages of document length. Usually you work to express the end values between 0 and 1.
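To show how the three variables can fit together, here is a hedged Python sketch of one textbook combination. Every concrete choice in it is an assumption made for illustration, since no search engine has disclosed its own: local weight is taken as 1 + log(frequency), global weight as log(documents in index / documents with the term), and normalization as division by the vector length of the document's weights, which keeps the results between 0 and 1. The three-document "index" is invented.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def term_weights(docs: list[str]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document in a tiny index."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    result = []
    for tokens in tokenized:
        counts = Counter(tokens)
        raw = {
            # local weight (assumed: log-scaled count) * global weight (assumed: log ratio)
            term: (1 + math.log(count)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Normalization (assumed: vector length) removes the document-length advantage.
        length = math.sqrt(sum(w * w for w in raw.values()))
        result.append({t: (w / length if length else 0.0) for t, w in raw.items()})
    return result


# Hypothetical three-document index.
corpus = [
    "the bicycle shop sells the best bicycle tires",
    "the weather is nice and the sky is clear",
    "the shop is closed",
]
weights = term_weights(corpus)
print(weights[0]["the"], weights[0]["bicycle"], weights[0]["shop"])
```

In the first document "the" and "bicycle" both occur twice, so their keyword density is identical. Under these assumed functions, "the" gets a weight of 0 because it appears in every document, while "bicycle" outweighs "shop", which also appears elsewhere in the index.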

None of the search engines have ever disclosed which published or unpublished scales they use for local weight or global weight.

The goal is to get high values for terms (words or phrases) that occur often in the relevant documents but rarely in the index as a whole.

Keyword Density Myth Summary

Search engines use term weight to rank documents by relevance.

Term weight is calculated from the results of two other calculations, local weight and global weight, and is then normalized for document length.

Without knowing the function used for local weight we can't calculate it -- but we do know that it's not just pure keyword frequency.

Without knowing the size of the index, the number of documents that contain the term, and the function used for global weight, we can't calculate it.

Using keyword density to estimate weight or relevance is therefore utterly useless. It's like being given only the height of a three-dimensional object and being asked not only for its volume, but also whether it is larger or smaller than any other, unseen object in a collection you know nothing about.

Hungry for more? I recommend The Keyword Density of Non-Sense.