Friday, August 27, 2010
Keyword Strength from a More Rigorous Perspective
With this week's release came a new look and new behavior for Compendium's Keyword Strength Meter, a widget that appears in several places in our application for indicating the quality of keyword usage in a body of content.
For one thing, we migrated the algorithm from the browser to the server, so that it is computed and updated on draft save operations. While this sacrifices the immediate feedback that many customers have come to love, it allows us to provide this feature for all of our clients, including those that have very large target keyword pools. It also allows us to expose the keyword strength algorithm as a web service API call.
The meter itself got a makeover that replaces the shifting gradient with a progress bar. The new look is more accessible to the color blind.
What hasn't changed is the math behind the meter.
Every once in a while, we get a question about how the keyword strength score is computed. I've written previously about the objectives of the algorithm, how it attempts to find a balance between diverse keyword usage and detrimental keyword stuffing, but that's a pretty high-level discussion.
When I talk with team members about the meter's algorithm, I've always downplayed the complexity, because it's always seemed like a pretty straightforward calculation. As an amusement, I decided to recast the algorithm using more precise terminology, resembling what a mathematician or computer scientist might use. Here is what I wound up with.
Keyword Strength
For the purposes of this discussion, a token is a contiguous sequence of characters within a string that contains no whitespace.
Let T be a string of characters consisting of whitespace-delimited tokens.
Let K be a vector of n strings containing whitespace-delimited tokens.
Let K_i denote the i-th element of K.
Let there be a function of x that computes the number of tokens present in the string x.
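As a minimal sketch of the token count, assuming "whitespace" means whatever Python's `str.split()` treats as whitespace; the name `count_tokens` is hypothetical and not taken from the original algorithm:

```python
def count_tokens(x: str) -> int:
    """Return the number of whitespace-delimited tokens in x.

    Assumption: any run of whitespace (spaces, tabs, newlines)
    separates tokens, and leading/trailing whitespace is ignored.
    """
    return len(x.split())


if __name__ == "__main__":
    print(count_tokens("keyword strength  meter"))
```

Note that punctuation stays attached to its token under this definition, so "meter." and "meter" are different tokens.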
For the purposes of defining M below, appearance will be determined by a case-insensitive character comparison.
Let a function of K_i and T return n_i, the number of times K_i appears in T.
Let a second function of K_i and T return 1 if K_i appears anywhere in T and 0 otherwise.
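The two matching functions can be sketched as follows. This is only an illustration under stated assumptions: a keyword phrase is taken to "appear" when its tokens match a consecutive run of tokens in T after lowercasing, and overlapping matches are each counted. The original may instead use a plain substring comparison. The names `count_occurrences` and `appears` are hypothetical.

```python
def count_occurrences(phrase: str, text: str) -> int:
    """Count how many times phrase appears in text (case-insensitive).

    Assumption: matching is done token by token, so the phrase must
    line up with whole whitespace-delimited tokens of the text.
    """
    needle = phrase.lower().split()
    haystack = text.lower().split()
    if not needle:
        return 0
    width = len(needle)
    # Slide a window of `width` tokens across the text.
    return sum(
        1
        for start in range(len(haystack) - width + 1)
        if haystack[start:start + width] == needle
    )


def appears(phrase: str, text: str) -> int:
    """Return 1 if phrase appears anywhere in text, and 0 otherwise."""
    return 1 if count_occurrences(phrase, text) > 0 else 0


if __name__ == "__main__":
    sample = "Blog software beats other BLOG software"
    print(count_occurrences("blog software", sample), appears("crm", sample))
```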
Let a function of K and T compute the concentration, a measure of how much of string T is made up of the tokens in K.
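One plausible reading of concentration, offered purely as an assumption, is the share of T's tokens that are covered by keyword phrase matches: for each phrase, its match count times its token length, summed and divided by the token count of T. The sketch below implements that reading; the helper `count_phrase` and the function `concentration` are hypothetical names, and the actual formula may differ.

```python
from typing import Sequence


def count_phrase(phrase: str, text: str) -> int:
    """Case-insensitive count of token-aligned matches of phrase in text."""
    needle = phrase.lower().split()
    haystack = text.lower().split()
    if not needle:
        return 0
    width = len(needle)
    return sum(
        haystack[i:i + width] == needle
        for i in range(len(haystack) - width + 1)
    )


def concentration(keywords: Sequence[str], text: str) -> float:
    """Assumed formula: keyword-covered tokens divided by total tokens.

    Overlapping phrases are counted independently, so the result can
    exceed 1.0 when keyword phrases overlap one another.
    """
    total = len(text.split())
    if total == 0:
        return 0.0
    covered = sum(count_phrase(k, text) * len(k.split()) for k in keywords)
    return covered / total


if __name__ == "__main__":
    print(concentration(["blog software"], "our blog software is great blog software"))
```

Under this reading, a high value signals keyword stuffing and a very low value signals that the keywords are barely used.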
Let s be a three-element vector of scoring functions that have the following formulas:
The constants in these formulas, including C_opt, are adjustable parameters.
Let the weights form a three-element vector of values between 0 and 1, such that:
Then the keyword strength of a string T relative to a vector of keyword phrase strings K is determined by the dot product of the scoring function and weight vectors.
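The final step is an ordinary dot product. A minimal sketch, assuming the three scoring functions have already been evaluated to plain numbers; the sample scores and weights below are made up for illustration, and `keyword_strength` is a hypothetical name:

```python
from typing import Sequence


def keyword_strength(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Dot product of the score vector and the weight vector."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    # Each weight is stated to lie between 0 and 1.
    if any(w < 0 or w > 1 for w in weights):
        raise ValueError("each weight must be between 0 and 1")
    return sum(s * w for s, w in zip(scores, weights))


if __name__ == "__main__":
    # Illustrative values only, not the production parameters.
    example_scores = [0.8, 0.5, 1.0]
    example_weights = [0.5, 0.3, 0.2]
    print(keyword_strength(example_scores, example_weights))  # approximately 0.75
```

Because each weight scales one scoring function, shifting weight between the three entries changes how much each aspect of keyword usage contributes to the final score.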