So for my first blog post, I thought I’d touch on an issue that comes up again and again when communicating data insight to fundraisers, or indeed to any “data customers”. It’s so important that I used to have a sign above my desk saying “There’s no such thing as an average donor”. And when I’ve been asked to interview candidates, one of the questions I use to separate the wheat from the chaff is “What’s the difference between mean, mode and median?” Technically it should be trivial (although you’d be amazed at how many analysts don’t know), but it also helps illustrate a candidate’s ability to communicate data concepts to a lay person.
Anyhow, here goes:
The mean
While all three are technically “averages”, the mean is what is normally understood as an average. But that doesn’t allow us to simply say “oh, it’s the average”! The mean is the sum of the value under scrutiny across all cases in the set, divided by the number of cases. For instance, the mean of 2,6,7 is (2+6+7)/3 = 15/3 = 5. It’s particularly useful because it can (in most cases) scale with volume and combine with other factors. For instance, you can calculate the total value of gifts by multiplying the average gift by the number of gifts, whether that number is 100 or 1000.
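Here is a minimal Python sketch of that scaling property, using the standard-library `statistics` module. The gift values simply reuse the 2, 6, 7 example and are illustrative only. The projection to 1,000 gifts assumes those future gifts share the same average.

```python
from statistics import mean

# Illustrative gift values (the 2, 6, 7 example above)
gifts = [2, 6, 7]

average_gift = mean(gifts)          # (2 + 6 + 7) / 3 = 5
total = average_gift * len(gifts)   # mean × count recovers the total of 15

# The same property lets you project a total for a different volume,
# assuming those gifts share this average
projected_total_1000 = average_gift * 1000   # 5000

print(average_gift, total, projected_total_1000)
```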
Note that the mean won’t always be a whole number, and this can make it misleading. If the quantity being averaged is always a whole number, you can end up with an “average case” that doesn’t exist. As a famous example, the average number of legs is a little below 2. It’s, sadly, about 1.999, because of amputees. But no one has 1.999 legs, and making socks for the “average (mean) person” would be nonsense.
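A quick sketch of the legs example follows. The population is hypothetical: 1,000 people, one of whom has one leg. The proportions are invented to reproduce the 1.999 figure and are not real data.

```python
from statistics import mean

# Hypothetical population: 999 people with two legs, one person with one leg
legs = [2] * 999 + [1]

average_legs = mean(legs)   # 1999 / 1000 = 1.999

# Nobody in the data actually has the "average" number of legs
print(average_legs, average_legs in legs)
```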
The mode
Which takes us nicely to the mode. The mode simply means the most common value, and it’s often described as the typical one. So the modal number of legs is 2. It is, however, very narrow, in that it ignores everyone not in that category. This is particularly a problem when the distribution is not symmetric. For instance, it’s probably true that the most common number of readers for an academic paper, once published, is zero. Which leads us to a (modal) average of zero. Which isn’t very helpful! In fundraising, many distributions aren’t symmetric. Gifts in particular are skewed right (most gifts are small, with a long tail of larger ones). For this reason, it’s very unusual for people to mean mode (sorry 😉 ) when they say average; its use is quite idiosyncratic.
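The sketch below compares the mode with the mean on two invented datasets. The first is the hypothetical legs population from before. The second is a made-up set of readership counts for ten papers, shaped to mimic the skew described above. `statistics.mode` returns the single most common value.

```python
from statistics import mean, mode

# Hypothetical legs population, as before
legs = [2] * 999 + [1]
modal_legs = mode(legs)            # 2: the typical case

# Hypothetical readership counts for ten published papers (right-skewed)
readers = [0, 0, 0, 0, 1, 2, 3, 5, 8, 40]
modal_readers = mode(readers)      # 0: true, but not very helpful
mean_readers = mean(readers)       # 59 / 10 = 5.9, pulled up by one popular paper

print(modal_legs, modal_readers, mean_readers)
```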
The median
A further problem with the mean is how it treats outliers. Because it preserves those arithmetic properties, it gives a lot of weight to very large values. The mean of 9,10,10,11,60 is 20. There may well be good reason why the ‘60’ shouldn’t be able to drag the average so far from what looks like the typical answer. And it feels odd that 4 out of 5 of the values are “below average”. (Odd, but not necessarily wrong: see here for a mathematical discussion of this.)
The median (although it actually predates the mean) addresses this somewhat, by taking the middle value once the values are ranked. For many people this intuitively feels right: there will be an equal number above and below average. So the median of the group above is 10. (If the set has an even number of values, there is no single middle value, so we take the arithmetic mean of the two middle ones. For instance, 8,9,10,12,14,15 has a median of (10+12)/2 = 11.) Note that, again, this approach could produce a figure that isn’t a whole number, even though 11 happens to be one. For this reason, some people make further adjustments, or simply pick one of the two middle values (the 10 or the 12 here).
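Both examples can be checked with Python’s `statistics` module. The standard-library functions `median_low` and `median_high` implement the “just pick the 10 or the 12” option for even-sized sets.

```python
from statistics import mean, median, median_low, median_high

values = [9, 10, 10, 11, 60]
average = mean(values)                                  # 100 / 5 = 20
below_average = sum(1 for v in values if v < average)   # 4 of the 5 values
middle = median(values)                                 # 10: the middle value once ranked

even = [8, 9, 10, 12, 14, 15]
even_median = median(even)     # 11.0: mean of the two middle values, (10 + 12) / 2
lower = median_low(even)       # 10: pick the lower middle value
upper = median_high(even)      # 12: pick the upper middle value

print(average, below_average, middle)
print(even_median, lower, upper)
```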
While intuitive, the median does require some care. First, it doesn’t allow the arithmetic freedom that the mean does. You can’t multiply the median by the number of individuals to get the total value, except in special cases (for example, a symmetric distribution, where the median equals the mean).
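The following sketch shows the median failing to reconstruct a total on a skewed set and succeeding on a symmetric one. Both gift lists are hypothetical and were chosen only to make the contrast obvious.

```python
from statistics import median

# Hypothetical right-skewed gifts: mostly small, one large
skewed = [5, 5, 10, 20, 160]
skewed_estimate = median(skewed) * len(skewed)   # 10 * 5 = 50
skewed_total = sum(skewed)                       # 200, so the estimate is far off

# Hypothetical symmetric set: median equals mean, so median × count works here
symmetric = [6, 7, 8, 9, 10]
symmetric_estimate = median(symmetric) * len(symmetric)  # 8 * 5 = 40
symmetric_total = sum(symmetric)                         # 40

print(skewed_estimate, skewed_total)
print(symmetric_estimate, symmetric_total)
```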
Second, the median loses information. Once you’ve reduced the values to a median, you have lost everything about the other values (outliers, skew, etc.), so you can’t draw on that information later. See below for more detail on this.
Other measures
Unsurprisingly, that’s not the end of the story. An early average, no longer used, was the mid-range, which simply takes the arithmetic mean of the highest and lowest figures. In the examples above, the mid-range of 8,9,10,12,14,15 would be (8+15)/2 = 23/2, or 11.5. (One reason for this, and for the use of the mean in the median, is that the idea of an average of two values, “splitting the difference”, existed long before it was extended into a calculation for a set of any size.)
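Python’s `statistics` module has no mid-range function, so `mid_range` below is a hypothetical helper written just for this illustration. The second call reuses the earlier outlier example. Because the mid-range looks only at the two extremes, a single outlier moves it even further than it moves the mean.

```python
def mid_range(values):
    """Hypothetical helper: arithmetic mean of the smallest and largest values."""
    return (min(values) + max(values)) / 2

example = mid_range([8, 9, 10, 12, 14, 15])   # (8 + 15) / 2 = 11.5
with_outlier = mid_range([9, 10, 10, 11, 60])  # (9 + 60) / 2 = 34.5, versus a mean of 20

print(example, with_outlier)
```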
You might also hear, occasionally, of geometric and arithmetic averages. Both are kinds of mean. The arithmetic mean is the one described above, where the values are added. The geometric mean instead multiplies the values together and then takes the nth root, where n is the number of values. It is relatively rare, but it is useful for growth and trends. Consider the example of a value that doubled and then halved, i.e. it went from 100 to 200 and back to 100. The growth was x2, then x0.5. The arithmetic average of these is (2+0.5)/2 = 1.25, implying 25% growth per step, which makes no sense because the value ended where it started. The geometric average is sqrt(2*0.5) = sqrt(1) = 1, i.e. no growth per step, which is correct. This concept, which underlies CAGR (compound annual growth rate), is complicated and probably worthy of another, later, post.
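Here is a small sketch of the doubling-then-halving example. It assumes Python 3.8 or later, which provides `statistics.fmean` (a float arithmetic mean) and `statistics.geometric_mean`.

```python
from statistics import fmean, geometric_mean

# Growth factors: 100 -> 200 (x2), then 200 -> 100 (x0.5)
factors = [2.0, 0.5]

arithmetic = fmean(factors)           # 1.25: implies 25% growth per step
geometric = geometric_mean(factors)   # 1.0: no growth per step

# Applying the geometric mean once per step reproduces the real end value
start = 100
end_value = start * geometric ** len(factors)   # back to 100

print(arithmetic, geometric, end_value)
```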
Warnings
As I say, the use of geometric averages, and so the likelihood of confusion, is rare, but there are a few much more common traps.
Average of Averages
First of all, an average summarises a lot of data, and so loses information. That means you can’t tell from the average alone whether it rests on a large, stable base of data or on a few spurious one-off values. Statistical concepts such as variance, standard deviation and standard error try to address this. But too often I’ve seen people taking averages of averages. Imagine two sets of data: {2,6} and {6,7,8,9,10}. The mean of the first is 4, and of the second 8. (Because these sets are symmetric, the mean in this instance equals the median, but you can almost never arithmetically manipulate medians, so we are talking about means here.) If you take the average of the two averages, you get 6. But although there’s much more information in the second set, it is given the same weight as the first. The average of the combined group is in fact 48/7, which is nearly 7.
This sort of mistake happens all too often in fundraising when trying to work out the average gift across different RFV segments.
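The sketch below reproduces the two-set example. It also shows one standard fix: weight each group’s mean by its size, which gives the same answer as pooling the raw values. The same idea applies to segments. You would weight each segment’s average gift by its number of gifts, or equivalently sum the segment totals and divide by the total count.

```python
from statistics import mean

set_a = [2, 6]
set_b = [6, 7, 8, 9, 10]

# The trap: each group's mean counts equally, regardless of group size
average_of_averages = mean([mean(set_a), mean(set_b)])   # (4 + 8) / 2 = 6

# The true mean of the combined group
pooled = mean(set_a + set_b)                             # 48 / 7, about 6.86

# Weighting each group's mean by its size recovers the pooled mean
weighted = (mean(set_a) * len(set_a) + mean(set_b) * len(set_b)) / (len(set_a) + len(set_b))

print(average_of_averages, round(pooled, 2), round(weighted, 2))
```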
Percentage changes
A second trap involves percentage changes. Returning to the geometric average example, the percentage changes were +100% then -50%. Averaging those values directly gives (+100% - 50%)/2 = +25% per step, even though the value ended where it started. To take a salutary example from outside fundraising: if your pension goes up by 10% and then down by 10%, it ends up at 99% of its original value.
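A short sketch of both cases follows. The starting value of 100 is arbitrary. Percentage changes compound by multiplication, so the sketch applies each change as a factor of (1 + change) rather than adding the changes.

```python
# Pension: +10% then -10%, applied as multiplicative factors
pension = 100 * (1 + 0.10) * (1 - 0.10)    # about 99, i.e. 99% of the original

# +100% then -50%: naively averaging the percentages suggests +25% per step...
changes = [1.00, -0.50]
naive_average_change = sum(changes) / len(changes)   # 0.25

# ...but compounding them shows the value ends exactly where it started
value = 100
for change in changes:
    value *= 1 + change                    # 100 -> 200 -> 100

print(round(pension, 2), naive_average_change, value)
```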