Posted by Jeff_Baker
In January of 2018 Brafton began a massive organic keyword targeting campaign, amounting to over 90,000 words of blog content being published.
Did it work?
Well, yeah. We doubled the number of total keywords we rank for in less than six months. By using our advanced keyword research and topic writing process, published earlier this year, we also increased our organic traffic by 45% and the number of keywords ranking in the top ten results by 130%.
But we got a whole lot more than just traffic.
From planning to execution and performance tracking, we meticulously logged every aspect of the project. I’m talking blog word count, MarketMuse performance scores, on-page SEO scores, days indexed on Google. You name it, we recorded it.
As a byproduct of this nerdery, we were able to draw juicy correlations between our target keyword rankings and variables that can affect and predict those rankings. But specifically for this piece...
How well keyword research tools can predict where you will rank.
A little background
We created a list of keywords we wanted to target in blogs based on optimal combinations of search volume, organic keyword difficulty scores, SERP crowding, and searcher intent.
We then wrote a blog post targeting each individual keyword. We intended for each new piece of blog content to rank for the target keyword on its own.
With our keyword list in hand, my colleague and I manually created content briefs explaining how we would like each blog post written to maximize the likelihood of ranking for the target keyword. Here’s an example of a typical brief we would give to a writer:
Between mid-January and late May, we ended up writing 55 blog posts, each targeting one of 55 unique keywords. Fifty of those blog posts ended up ranking in the top 100 Google results.
We then paused and took a snapshot of each URL’s Google ranking position for its target keyword and its corresponding organic difficulty scores from Moz, SEMrush, Ahrefs, SpyFu, and KW Finder. We also took the PPC competition scores from the Keyword Planner Tool.
Our intention was to draw statistical correlations between our keyword rankings and each tool’s organic difficulty score. With this data, we were able to report on how accurately each tool predicted where we would rank.
This study is uniquely scientific, in that each blog had one specific keyword target. We optimized the blog content specifically for that keyword. Therefore every post was created in a similar fashion.
Do keyword research tools actually work?
We use them every day, on faith. But has anyone ever actually asked, or better yet, measured how well keyword research tools report on the organic difficulty of a given keyword?
Today, we are doing just that. So let’s cut through the chit-chat and get to the results...
While Moz wins top-performing keyword research tool, note that any keyword research tool with organic difficulty functionality will give you an advantage over flipping a coin (or using Google Keyword Planner Tool).
As you will see in the following paragraphs, we have run each tool through a battery of statistical tests to ensure that we painted a fair and accurate representation of its performance. I’ll even provide the raw data for you to inspect for yourself.
Let’s dig in!
The Pearson Correlation Coefficient
Yes, statistics! For those of you currently feeling panicked and lobbing obscenities at your screen, don’t worry — we’re going to walk through this together.
In order to understand the relationship between two variables, our first step is to create a scatter plot chart.
Below is the scatter plot for our 50 keyword rankings compared to their corresponding Moz organic difficulty scores.
We start with a visual inspection of the data to determine if there is a linear relationship between the two variables. Ideally, for each tool, you would expect to see the X variable (keyword ranking) increase proportionately with the Y variable (organic difficulty). Put simply, if the tool is working, the higher the keyword difficulty, the less likely you are to rank in a top position, and vice versa.
This chart is all fine and dandy; however, it’s not very scientific. This is where the Pearson Correlation Coefficient (PCC) comes into play.
Phew. Still with me?
So each of these scatter plots will have a corresponding PCC score that will tell us how well each tool predicted where we would rank, based on its keyword difficulty score.
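For anyone who wants to see what the PCC actually computes, here is a minimal Python sketch of the standard Pearson formula. The function name `pearson_r` and the sample numbers are made up for illustration; they are not data from this study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: summed products of each point's deviations from the means.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the square roots of the summed squared deviations.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return co_deviation / (spread_x * spread_y)

# Made-up illustration data, NOT figures from the study.
rankings = [3, 8, 15, 22, 40, 61]      # X: ranking position
difficulty = [18, 25, 22, 35, 41, 38]  # Y: a tool's difficulty score
print(round(pearson_r(rankings, difficulty), 3))
```

A result near +1 means difficulty rises in step with ranking position, a result near 0 means the two have no linear relationship, and a result near -1 means they move in opposite directions.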
We will use the following table from statisticshowto.com to interpret the PCC score for each tool:
Coefficient Correlation R Score | Key |
---|---|
+.70 or higher | Very strong positive relationship |
+.40 to +.69 | Strong positive relationship |
+.30 to +.39 | Moderate positive relationship |
+.20 to +.29 | Weak positive relationship |
+.01 to +.19 | No or negligible relationship |
0 | No relationship [zero correlation] |
-.01 to -.19 | No or negligible relationship |
-.20 to -.29 | Weak negative relationship |
-.30 to -.39 | Moderate negative relationship |
-.40 to -.69 | Strong negative relationship |
-.70 or higher | Very strong negative relationship |
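The table above is easy to turn into a lookup. The sketch below is one hypothetical way to do it in Python; the function name `describe_relationship` is invented here, and it assumes the bands are continuous (so a value like .295, which falls between two rows, is treated as weak).

```python
def describe_relationship(r):
    """Map a Pearson r to the verbal label from the interpretation table."""
    if r == 0:
        return "No relationship [zero correlation]"
    size = abs(r)
    direction = "positive" if r > 0 else "negative"
    # Assumption: bands are treated as continuous, using each row's lower bound.
    if size >= 0.70:
        return f"Very strong {direction} relationship"
    if size >= 0.40:
        return f"Strong {direction} relationship"
    if size >= 0.30:
        return f"Moderate {direction} relationship"
    if size >= 0.20:
        return f"Weak {direction} relationship"
    return "No or negligible relationship"

print(describe_relationship(0.55))
print(describe_relationship(-0.25))
```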
In order to visually understand what some of these relationships would look like on a scatter plot, check out these sample charts from Laerd Statistics.
And here are some examples of charts with their correlating PCC scores (r):
The closer the numbers cluster towards the regression line in either a positive or negative slope, the stronger the relationship.
That was the tough part - you still with me? Great, now let’s look at each tool’s results.
Test 1: The Pearson Correlation Coefficient
Now that we've all had our statistics refresher course, we will take a look at the results, in order of performance. We will evaluate each tool’s PCC score, the statistical significance of the data (P-val), the strength of the relationship, and the percentage of keywords the tool was able to find and report keyword difficulty values for.
In order of performance:
#1: Moz
Revisiting Moz’s scatter plot, we observe a tight grouping of results relative to the regression line with few moderate outliers.
Moz Organic Difficulty Predictability | |
---|---|
PCC | 0.412 |
P-val | .003 (P<0.05) |
Relationship | Strong |
% Keywords Matched | 100.00% |
Moz came in first with the highest PCC of .412. As an added bonus, Moz grabs data on keyword difficulty in real time, rather than from a fixed database. This means that you can get a keyword difficulty score for any keyword.
In other words, Moz was able to generate keyword difficulty scores for 100% of the 50 keywords studied.
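If you want to sanity-check a P-val yourself, it can be estimated from the PCC and the number of keywords alone. The sketch below assumes a standard two-tailed t-test on the correlation with n - 2 degrees of freedom; that is the usual approach, but it is not confirmed as the method used for this study. The function names are hypothetical, and the t-distribution area is approximated numerically so that nothing beyond the Python standard library is needed.

```python
import math

def t_statistic(r, n):
    """t-statistic for a Pearson r measured on n paired observations."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def two_tailed_p(t, df, steps=2000):
    """Approximate two-tailed p-value for Student's t (steps must be even)."""
    # Normalizing constant of the t density, via log-gamma for stability.
    coeff = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    coeff /= math.sqrt(df * math.pi)

    def pdf(x):
        return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

    # Simpson's rule for the area under the density between 0 and |t|.
    upper = abs(t)
    h = upper / steps
    total = pdf(0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    # Both tails together are whatever lies outside (-|t|, |t|).
    return 2 * (0.5 - area)

# Moz's reported figures: PCC of .412 across 50 keywords.
t = t_statistic(0.412, 50)
p = two_tailed_p(t, df=50 - 2)
print(f"t = {t:.2f}, p = {p:.4f}")
```

Under those assumptions, Moz's figures should land close to the .003 reported in the table, comfortably under the 0.05 cutoff.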
#2: SpyFu
Visually, SpyFu shows a fairly tight clustering amongst low difficulty keywords, and a couple moderate outliers amongst the higher difficulty keywords.
SpyFu Organic Difficulty Predictability | |
---|---|
PCC | 0.405 |
P-val | .01 (P<0.05) |
Relationship | Strong |
% Keywords Matched | 80.00% |
SpyFu came in right under Moz with a 1.7% weaker PCC (.405). However, the tool had the biggest problem with keyword matching: only 40 of 50 keywords produced keyword difficulty scores.
#3: SEMrush
SEMrush would certainly benefit from a couple mulligans (a second chance to perform an action). The Correlation Coefficient is very sensitive to outliers, which pushed SEMrush’s score down to third (.364).
SEMrush Organic Difficulty Predictability | |
---|---|
PCC | 0.364 |
P-val | .01 (P<0.05) |
Relationship | Moderate |
% Keywords Matched | 92.00% |
Further complicating the research process, only 46 of 50 keywords had keyword difficulty scores associated with them, and many of those had to be found through SEMrush’s “phrase match” feature individually, rather than through the difficulty tool.
Digging around for the data made the process more laborious.
#4: KW Finder
KW Finder definitely could have benefitted from more than a few mulligans with numerous strong outliers, coming in right behind SEMrush with a score of .360.
KW Finder Organic Difficulty Predictability | |
---|---|
PCC | 0.360 |
P-val | .01 (P<0.05) |
Relationship | Moderate |
% Keywords Matched | 100.00% |
Fortunately, the KW Finder tool had a 100% match rate without any trouble digging around for the data.
#5: Ahrefs
Ahrefs comes in fifth by a large margin at .316, barely clearing the “weak relationship” range, which tops out at .29.
Ahrefs Organic Difficulty Predictability | |
---|---|
PCC | 0.316 |
P-val | .03 (P<0.05) |
Relationship | Moderate |
% Keywords Matched | 100% |
On a positive note, the tool seems to be very reliable with low difficulty scores (notice the tight clustering for low difficulty scores), and matched all 50 keywords.
#6: Google Keyword Planner Tool
Before you ask, yes, SEO companies still use the paid competition figures from Google’s Keyword Planner Tool (and other tools) to assess organic ranking potential. As you can see from the scatter plot, there is in fact no linear relationship between the two variables.
Google Keyword Planner Tool Organic Difficulty Predictability | |
---|---|
PCC | 0.045 |
P-val | Statistically insignificant/no linear relationship |
Relationship | Negligible/None |
% Keywords Matched | 88.00% |
SEO agencies still using KPT for organic research (you know who you are!) — let this serve as a warning: You need to evolve.
Test 1 summary
For scoring, we will use a ten-point scale and score every tool relative to the highest-scoring competitor. For example, if the second highest score is 98% of the highest score, the tool will receive a 9.8. As a reminder, here are the results from the PCC test:
And the resulting scores are as follows:
Tool | PCC Test |
---|---|
Moz | 10 |
SpyFu | 9.8 |
SEMrush | 8.8 |
KW Finder | 8.7 |
Ahrefs | 7.7 |
KPT | 1.1 |
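The ten-point scoring is simple arithmetic: divide each tool's PCC by the best PCC and multiply by ten. Here is a short Python sketch that applies it to the PCC values reported above; the function name `relative_scores` is made up for this example, and rounding to one decimal place is an assumption about how the scores were rounded.

```python
def relative_scores(pcc_by_tool):
    """Score each tool out of 10 relative to the highest PCC in the group."""
    best = max(pcc_by_tool.values())
    # Assumption: scores are rounded to one decimal place.
    return {tool: round(10 * pcc / best, 1) for tool, pcc in pcc_by_tool.items()}

# PCC values as reported in the Test 1 tables.
pcc_results = {
    "Moz": 0.412,
    "SpyFu": 0.405,
    "SEMrush": 0.364,
    "KW Finder": 0.360,
    "Ahrefs": 0.316,
    "KPT": 0.045,
}

for tool, score in relative_scores(pcc_results).items():
    print(f"{tool}: {score}")
```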
Moz takes the top position for the first test, followed closely by SpyFu (with an 80% match rate caveat).
Test 2: Adjusted Pearson Correlation Coefficient
Let’s call this the “Mulligan Round.” In this round, assuming sometimes things just go haywire and a tool just flat-out misses, we will remove the three most egregious outliers from each tool’s data.
Here are the adjusted results for the Mulligan Round:
Adjusted Scores (3 Outliers removed) | PCC | Difference (+/-) |
---|---|---|
SpyFu | 0.527 | 0.122 |
SEMrush | 0.515 | 0.150 |
Moz | 0.514 | 0.101 |
Ahrefs | 0.478 | 0.162 |
KW Finder | 0.470 | 0.110 |
Keyword Planner Tool | 0.189 | 0.144 |
As noted in the original PCC test, some of these tools really took a big hit from major outliers. Specifically, Ahrefs and SEMrush benefited the most from having their outliers removed, adding .162 and .150 to their scores, respectively, while Moz benefited the least from the adjustments.
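To make the Mulligan Round concrete, here is a rough Python sketch of one way to drop three outliers and recompute the PCC. The method for picking outliers is an assumption: this version removes the three points that sit farthest from the least-squares regression line, which may not match how the outliers were chosen for the study. All function names and the sample numbers are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Least-squares slope and intercept for y as a function of x."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def drop_worst_outliers(xs, ys, k=3):
    """Remove the k points with the largest distance from the fitted line."""
    slope, intercept = fit_line(xs, ys)
    residuals = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
    # Keep the indices of the points with the smallest residuals.
    keep = sorted(range(len(xs)), key=lambda i: residuals[i])[: len(xs) - k]
    keep.sort()  # restore the original ordering
    return [xs[i] for i in keep], [ys[i] for i in keep]

# Made-up illustration data, NOT figures from the study.
rankings = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
difficulty = [2, 4, 21, 8, 10, 0, 14, 16, 38, 20]

trimmed_x, trimmed_y = drop_worst_outliers(rankings, difficulty, k=3)
print("before:", round(pearson_r(rankings, difficulty), 3))
print("after: ", round(pearson_r(trimmed_x, trimmed_y), 3))
```

Because the PCC is so sensitive to outliers, trimming even a few far-off points can move the score a lot, which is exactly the effect the adjusted table shows.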
For those of you crying out, “But this is real life, you don’t get mulligans with SEO!”, never fear, we will make adjustments for reliability at the end.
Here are the updated scores at the end of round two:
Tool | PCC Test | Adjusted PCC | Total |
---|---|---|---|
SpyFu | 9.8 | 10 | 19.8 |
Moz | 10 | 9.7 | 19.7 |
SEMrush | 8.8 | 9.8 | 18.6 |
KW Finder | 8.7 | 8.9 | 17.6 |
Ahrefs | 7.7 | 9.1 | 16.8 |
KPT | 1.1 | 3.6 | 4.7 |
SpyFu takes the lead! Now let’s jump into the final round of statistical tests.
Test 3: Resampling
Because there has never been a study of keyword research tools at this scale, we wanted to ensure that we explored multiple ways of looking at the data.
Big thanks to Russ Jones, who put together an entirely different model that answers the question: "What is the likelihood that the keyword difficulty of two randomly selected keywords will correctly predict the relative position of rankings?"
He randomly selected two keywords from the list, along with their associated difficulty scores.
Let’s assume one tool says that the difficulties are 30 and 60, respectively. What is the likelihood that the article written for the keyword scored 30 ranks higher than the article written for the keyword scored 60? He then repeated the same test 1,000 times.
He also threw out examples where the two randomly selected keywords shared the same rankings, or data points were missing. Here was the outcome:
source https://moz.com/blog/ranking-keyword-research-tools
Resampling |
% Guessed correctly |
---|
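The resampling procedure described above can be sketched in Python roughly as follows. This is a guess at the mechanics, not Russ's actual code. The function name `pairwise_accuracy` and the sample data are hypothetical, and two details are assumptions: discarded pairs still use up one of the 1,000 draws, and a pair with identical difficulty scores counts as a miss because the tool cannot separate them.

```python
import random

def pairwise_accuracy(keywords, trials=1000, seed=None):
    """Share of random keyword pairs whose difficulty order matches rank order.

    keywords: list of (difficulty, ranking) tuples; None marks missing data.
    """
    rng = random.Random(seed)
    usable = 0
    correct = 0
    for _ in range(trials):
        (diff_a, rank_a), (diff_b, rank_b) = rng.sample(keywords, 2)
        # Throw out pairs with missing data or identical rankings.
        if None in (diff_a, rank_a, diff_b, rank_b) or rank_a == rank_b:
            continue
        usable += 1
        # Correct when the lower-difficulty keyword holds the better
        # (numerically smaller) ranking position. Assumption: tied
        # difficulty scores count as a miss.
        if diff_a != diff_b and (diff_a < diff_b) == (rank_a < rank_b):
            correct += 1
    if usable == 0:
        raise ValueError("no usable keyword pairs were drawn")
    return correct / usable

# Made-up illustration data, NOT figures from the study.
sample = [(30, 4), (60, 25), (45, 12), (20, 18), (None, 7), (55, 9)]
print(f"{pairwise_accuracy(sample, seed=1):.1%} guessed correctly")
```

A tool that carries no signal should hover around 50% here, the same as flipping a coin, so anything meaningfully above that suggests the difficulty score is doing real work.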