Last week, we had the opportunity to see Moz founder Rand Fishkin hold court at a session called “Why Great Marketers Must Be Great Skeptics.” While the presentation as a whole was terrific (as one would expect from Fishkin), one section in particular caught our attention:

The difference between correlation and causation.

For data scientists and analysts, this is a hugely important distinction. Without understanding the difference between correlation and causation, CEOs, Marketing VPs and Sales Managers can make decisions they believe are correct but that are actually wrong, and incredibly costly.

Don’t make those mistakes. Read on for a primer on the vast differences between correlation and causation, and how to spot them.

Definitions

  • Correlation – a mutual relation of two or more things. In statistics and data analysis, the definition is refined further; correlation (typically expressed as a number) describes the size and direction of a relationship between two or more variables.

  • Causation – indicates that one event is the result of the occurrence of the other event. It is also known as “cause and effect”: the second event is understood as a direct consequence of the first.
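To make the definition of correlation concrete, here is a minimal sketch of the Pearson correlation coefficient, the most common way of expressing correlation as a single number. The function name `pearson_r` and the dial and pipeline figures are hypothetical, invented purely for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Returns a value in [-1, 1]: the sign gives the direction of the linear
    relationship and the magnitude gives its strength.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance term and the two variance terms (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly figures (average daily dials per rep, pipeline in $k),
# invented for illustration only.
dials = [100, 100, 110, 120, 120]
pipeline = [50, 52, 55, 61, 60]
print(round(pearson_r(dials, pipeline), 3))
```

A value close to +1 here says only that dials and pipeline rose together in this made-up data. On its own, it says nothing about whether the extra dials caused the extra pipeline.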

Why is this important in data analysis for sales and marketing?

Sales and marketing managers analyze data to draw actionable insights from the things they are doing, so that they can tweak them to be more efficient or effective – “How effective is this campaign in producing leads?” “What happens when my prospecting reps increase their daily call volumes by 20%?”

Mistaking a merely correlated effect for a causal one leads to the wrong decision, which in turn sets off a domino effect of subsequent decisions built on that flawed analysis. If you take one step down the wrong path, each subsequent step (based on your initial, incorrect choice of path) will lead you further and further away from where you want to be.

In his presentation, Rand Fishkin emphasized that the best marketers and Sales VPs are skeptics when it comes to data analysis. He also broke down the difference between what he considered bad skeptics and good skeptics:

  • Bad skeptics don’t question what’s truly causal and what’s merely correlated. When running experiments or analysis, they assume that one belief-reinforcing data point is evidence enough, without seeking additional validation (or invalidation).

  • Good skeptics know the difference between the two, and that correlation doesn’t imply causation. They seek to uncover the reasons behind the underlying results. They are not afraid to have their hypotheses proven wrong; they even welcome it!

Let’s look at a sales example. Say each rep on the outbound prospecting team at a sales organization is charged with making 100 dials a day. The sales manager, looking to grow the sales pipeline, decides to increase this daily quota by 20%: each rep is now expected to make 120 calls a day.

A week after implementing this new daily quota, the pipeline has grown immensely, surging by 20% over the previous week’s pipeline generation. The sales manager is thrilled! It seems that there is a direct correlation between making more calls and generating more pipeline. The proud sales manager goes to the CEO, reporting that he caused the sales pipeline to grow by 20% because he increased his reps’ daily call quota by 20%.

But what if the increase in pipeline had nothing to do with the increase in quota at all? Maybe the growing pipeline can be attributed to a new scrubbed list that was also introduced the same week as the new quota goals. Perhaps it was this list that was the real cause of the increased pipeline, not the more demanding quota.

A bad skeptic would see that first data point seemingly affirming his hypothesis (“More dials will lead to more pipeline!”) and take it to the bank, and to the CEO. A good skeptic, on the other hand, would note this correlation and drill down to find the real causal variables, if there are any.

This might include A/B testing smaller quotas with the same list, larger quotas with different lists, and having both the best and the underperforming reps work these lists. Only by testing, and testing again, will he be able to determine the true cause of the growing pipeline and how to repeat it in the future.
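One way to structure those tests is a 2×2 factorial design: randomly assign reps to every combination of quota and list, then compare average results for each factor. The sketch below assumes that design. The `results` figures are hypothetical, invented for illustration, and `main_effect` is a helper written for this example, not part of any library. A real analysis would also check whether differences of this size could plausibly arise by chance (for example, with an ANOVA) before acting on them.

```python
from statistics import mean

# Hypothetical pipeline ($k) generated per rep in one test week, keyed by
# (daily dial quota, lead list). Invented numbers for illustration only;
# reps are assumed to have been randomly assigned to the four cells.
results = {
    (100, "old"): [40, 44, 38, 42],
    (120, "old"): [41, 43, 40, 42],
    (100, "new"): [50, 47, 52, 49],
    (120, "new"): [51, 49, 50, 50],
}


def main_effect(results, factor_index, low, high):
    """Average outcome at the `high` level of one factor minus the average at
    its `low` level, pooling over every level of the other factor."""
    high_vals = [v for key, vals in results.items() if key[factor_index] == high for v in vals]
    low_vals = [v for key, vals in results.items() if key[factor_index] == low for v in vals]
    return mean(high_vals) - mean(low_vals)


quota_effect = main_effect(results, 0, 100, 120)   # 120 dials vs 100 dials
list_effect = main_effect(results, 1, "old", "new")  # new list vs old list
print(f"quota effect: {quota_effect:+.2f}, list effect: {list_effect:+.2f}")
```

In this made-up data, switching to the new list accounts for almost all of the lift, while raising the quota barely moves the average. A simple before-and-after comparison, where both changes landed in the same week, could never separate the two.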


One of the biggest dangers in data analysis is that a single mistake compounds into every future decision built on it. The key, then, is to be a good enough skeptic early on, and to know the difference between correlation and causation, so that you don’t make that first mistake or leap of assumption in the first place.

Become a good skeptic. Know the differences between correlation and causation.