Sum of us use means too
Sometimes, it feels like I might get a reputation as a nay-sayer: querying the wonderful world of data submissions.
It is always good to see others with a similar bent. When it also relates to how data are provided, presented and used, it is doubly gratifying.
As the title (punningly?) suggests, means are a way of expressing data. They also come with caveats: they may well be suitable in certain circumstances, but they can hide a lot too. Lintott and Mathews (Biodiversity and Conservation, open access, 2017) have looked at the use of means in EIAs (Environmental Impact Assessments). They are not impressed.
They note the risks in using means with skewed data sets, or without any description of the range of variation.
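To make that concrete, here is a minimal sketch in Python. The nightly counts are invented purely for illustration (they are not from the paper), and the variable names are my own. One busy night is enough to drag the mean well away from what a typical night looks like.

```python
from statistics import mean, median

# Hypothetical bat-pass counts for ten survey nights (made-up numbers):
# nine quiet nights and one very busy one.
passes = [0, 1, 0, 2, 1, 0, 3, 1, 0, 92]

mean_passes = mean(passes)      # pulled upwards by the single busy night
median_passes = median(passes)  # reflects the typical night
lowest, highest = min(passes), max(passes)

print(f"mean   = {mean_passes:.1f} passes per night")
print(f"median = {median_passes:.1f} passes per night")
print(f"range  = {lowest} to {highest}")
```

With these made-up figures the mean comes out at 10 passes per night and the median at 1. A report quoting only "an average of 10" would describe a night that never actually happened.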
Looking at bat data, they are clear in their concerns:
"we show that EIAs frequently summarise these data using the mean or fail to define the term ‘average’. This can lead to the systematic misinterpretation of evidence which has serious implications for assessing risk. There is therefore a pressing need for guidance to specify data processing techniques so that planning decisions are made on a firm evidence-base. By ensuring that data processing is systematic and transparent it will result in mitigation decisions and conservation strategies that are cost-effective and proportionate to the predicted degree of risk".
Unpicking that a little for bats (their preferred species group), they also suggest:
"the mean is unlikely to be a good summary of data that are skewed or highly clustered. For example bat activity is known to vary depending on factors such as temperature, seasonality, wind speed, and insect availability (Fischer et al. 2009); and for rare species where few data can be collected, the estimated mean will depend heavily on a small number of data points. For example, independent assessments showed that nightly bat activity did not conform to statistical normality at any of the 46 wind farm sites surveyed as part of the UK National Bats and Wind Turbines project (Mathews et al. 2016)."
Think about that a little, then look at the next finding:
"The presentation of inappropriate summary statistics also hinders the ability of research scientists to use the data on which the EcIA is based (e.g. to assess the effectiveness of mitigation for major road developments). A good environmental assessment should disclose all relevant information to allow the significance of the environmental effects of the proposed development to be determined (Elkin and Smith 1988). It is evident that presenting just the mean number of bat passes fails to meet this standard. We therefore suggest that in many circumstances presenting the median and inter-quartile range would give a better understanding of usual activity/abundance. Similarly, presenting information on the relative frequency of occurrences of ecological importance (e.g. the number of nights a rare bat is encountered at a proposed development site) would give a better understanding of the likely consequences of the development than just presenting the mean. "
There is a bit more, but you get the gist. When academics start querying the evidence base and the data manipulation in practical terms, that is an alarm bell.
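For anyone wondering what the suggested alternative might look like in practice, the following is a rough sketch of the summary the authors recommend in the passage quoted above: the median, the inter-quartile range, and the number of nights on which a species was recorded. The function name `summarise`, the dictionary keys and the counts are all hypothetical, and the quartile method ("inclusive") is just one of several reasonable conventions, not something the paper specifies.

```python
from statistics import median, quantiles

def summarise(nightly_counts):
    """Summarise nightly counts without relying on the mean."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(nightly_counts, n=4, method="inclusive")
    # Number of nights on which the species was recorded at all.
    nights_present = sum(1 for count in nightly_counts if count > 0)
    return {
        "median": median(nightly_counts),
        "iqr": (q1, q3),
        "nights_present": nights_present,
        "nights_surveyed": len(nightly_counts),
    }

# Hypothetical counts for a rare species over twelve nights (made-up numbers).
rare_species = [0, 0, 0, 4, 0, 0, 0, 0, 1, 0, 0, 0]

summary = summarise(rare_species)
print(summary)
```

For a rare species like this invented one, the median and both quartiles are zero, which is exactly why the count of nights with a record matters: "recorded on 2 of 12 nights" tells a planner something that neither the mean nor the median does on its own.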
Hopefully, someone out there is listening.
Lintott and Mathews (2017) Basic mathematical errors may make ecological assessments unreliable. Biodiversity and Conservation. ISSN 0960-3115. Available from: http://eprints.uwe.ac.uk/33747