Uncertainty, Precision, and Power • Statistical Considerations for Monitoring and Sampling Field Manual

It is important to consider how to reduce the uncertainty (and increase precision) in statistical analyses of survey/monitoring data. Practically, there are two components to this: 1) increasing the information content in the dataset; and 2) reducing non-relevant variation (noise) during the collection process. Addressing the first component, increasing information content, begins by using an efficient survey/monitoring design, such as a spatially balanced design. More information means that the signal in the data can be identified more easily. The second component (noise reduction) refers to that part of the variation in the data that is induced by, for example, performing assays slightly differently each time. This includes variation arising from the measurement tools used (e.g. CATAMI for image classification; Althaus et al., 2015). This type of noise can be reduced by adhering to well-defined, repeatable measurement protocols and classification schemes so that two or more measurements on the same sample will generate identical, or at least very similar, observations. See the gear-specific chapters in this field manual package for detailed advice on reducing measurement noise. As an example, consider the scoring of an AUV mission. Scoring more images from the mission (less subsetting) and using a higher density of points within each image will reduce measurement variation. There are diminishing returns though, with each additional point yielding a smaller improvement than the last (Perkins et al., 2016). What constitutes irrelevant noise depends on the objectives of the study, but additional sources of noise can be subtle and can encompass issues such as taxonomic inconsistency and inclusion of non-target species or individuals of a wide variety of life-stages. The latter may occur from sampling pelagic species in transit to the targeted habitat.
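The diminishing returns from scoring more points per image can be illustrated with a minimal sketch. It assumes, purely for illustration, that points within an image are scored independently, so that the estimate of proportion cover has a binomial standard error; real points within one image are unlikely to be fully independent. The function name and the value of `true_cover` are hypothetical and are not taken from Perkins et al. (2016).

```python
import math


def cover_standard_error(true_cover, n_points):
    """Binomial standard error of a proportion-cover estimate.

    Assumes n_points are scored independently within the image
    (an illustrative simplification).
    """
    return math.sqrt(true_cover * (1.0 - true_cover) / n_points)


# Hypothetical true proportion of the image covered by the class of interest.
true_cover = 0.3

for n_points in (5, 10, 25, 50, 100, 200):
    se = cover_standard_error(true_cover, n_points)
    print(f"{n_points:>4} points per image: standard error = {se:.3f}")
```

Under this assumption the standard error shrinks with the square root of the number of points, so quadrupling the scoring effort only halves the measurement error.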

For some novel measurement platforms, measurement/scoring techniques are still being assessed and these updates should be incorporated where possible. Examples of this process are Perkins et al. (2016) for scoring AUV images and Schobernd et al. (2014) for scoring BRUV deployments. We stress, though, that whilst noise reduction is important, it is not the only consideration, and that particular care should be taken to maintain protocols within already established monitoring programs, or to calibrate new protocols against old ones. In addition to reducing ‘noise’, this will ensure that, for example, time-series do not get ‘broken’ and that data are directly comparable in time and space without unfortunate confounding due to a change in sampling methodology or other factors discussed above.

Most chapters in this field manual package are variations on the noise-reduction theme, as they provide a foundation for reducing variation between and within surveys. In particular, if adhered to, they will help minimise, or possibly even eliminate, inherent systematic variation (bias) between different surveys or within a monitoring program. This will increase the utility of combining data from different surveys (as bias between the data sets will be minimised). We have unfortunately come across long-term studies that could not be used to estimate trends in the target species because of inconsistencies in sampling design and implementation (Hosack and Lawrence, 2013).

Any approach to reducing variance in the sample statistics should be welcomed whole-heartedly, so long as it does not introduce confounding with any spatial/temporal signals or other important trends. This includes processes to eliminate obvious sources of measurement variation (e.g. non-uniform gear deployment, faulty measurement equipment, poor laboratory practices) and data entry errors. In well conducted studies, under most circumstances, measurement variation is likely to be relatively small compared to the variation in the ecological processes that are being sampled. It follows that exorbitant amounts of time should not be spent on perfecting each measurement, especially not if the cost of perfection is a substantial reduction in the number of samples taken. Often a much richer sample is obtained (in terms of signal to noise) by taking more, slightly noisier, samples than by taking fewer, more precise ones. Unfortunately, we are aware of no rules-of-thumb to guide researchers with this issue.
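The trade-off between more samples and more precise samples can be sketched with a small calculation. It assumes independent samples whose total variance is the sum of an ecological component and a measurement component, so that the standard error of the sample mean is the square root of the total variance divided by the number of samples. All numbers below are hypothetical and chosen only to illustrate the point; they are not a rule-of-thumb.

```python
import math


def se_of_mean(sd_ecological, sd_measurement, n_samples):
    """Standard error of the sample mean when ecological and measurement
    variation are independent and additive (an illustrative assumption)."""
    total_variance = sd_ecological ** 2 + sd_measurement ** 2
    return math.sqrt(total_variance / n_samples)


# Hypothetical scenario: ecological variation dominates measurement variation.
sd_ecological = 10.0

# Option A: fewer samples, each measured very carefully.
careful = se_of_mean(sd_ecological, sd_measurement=1.0, n_samples=20)

# Option B: twice as many samples, each measured with three times the error.
quick = se_of_mean(sd_ecological, sd_measurement=3.0, n_samples=40)

print(f"Fewer, precise samples: SE of mean = {careful:.2f}")
print(f"More, noisier samples:  SE of mean = {quick:.2f}")
```

In this made-up scenario the larger, noisier sample gives the more precise estimate of the mean, because the extra measurement error is small relative to the ecological variation that both options share.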

In certain situations, it may be pragmatic to alter the measurement process. For example, in the rare situations where the cost of performing the measurement assay is large compared to the cost of collecting the ecological material, it may be useful to combine a number of samples prior to measurement. However, it is important to realise what is being lost in this case: the ability to understand the variability between samples within the same assay group. This is often a limitation if, for example, the combined samples originated in quite different environmental conditions. It should also be noted that, in analyses, the combined sample is the sampling unit, not the original samples that contribute to it. Another situation where measurement variation can be reduced is when the assay is both cheap and noisy. In this situation, it can be beneficial to use sub-samples (sometimes confusingly called replicates), which are repeated measurements on the same biological material. When sub-samples are utilised, the analyst can partition variation into measurement and ecological components.
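Partitioning variation using sub-samples can be sketched as follows. This is one simple way to do it, not necessarily the method an analyst would choose in practice: a balanced one-way random-effects ANOVA with method-of-moments estimates, applied to simulated data. The function names, the normal distributions and the parameter values are all assumptions made for illustration.

```python
import random
import statistics


def simulate_subsampled_survey(n_samples, n_subsamples, sd_ecological,
                               sd_measurement, rng):
    """Simulate n_samples ecological samples, each measured n_subsamples times."""
    data = []
    for _ in range(n_samples):
        # True value of this sample (ecological variation around a mean of 50).
        true_value = rng.gauss(50.0, sd_ecological)
        # Repeated noisy measurements of the same biological material.
        data.append([rng.gauss(true_value, sd_measurement)
                     for _ in range(n_subsamples)])
    return data


def variance_components(data):
    """Method-of-moments estimates for a balanced design.

    Returns (ecological_variance, measurement_variance).
    """
    k = len(data[0])  # sub-samples per sample (assumed equal for all samples)
    sample_means = [statistics.fmean(row) for row in data]
    # Within-sample mean square estimates the measurement variance.
    ms_within = statistics.fmean([statistics.variance(row) for row in data])
    # Between-sample mean square has expectation: measurement + k * ecological.
    ms_between = k * statistics.variance(sample_means)
    ecological_var = max(0.0, (ms_between - ms_within) / k)
    return ecological_var, ms_within


rng = random.Random(2024)
survey = simulate_subsampled_survey(n_samples=200, n_subsamples=3,
                                    sd_ecological=5.0, sd_measurement=2.0,
                                    rng=rng)
eco_var, meas_var = variance_components(survey)
print(f"Estimated ecological variance:  {eco_var:.1f} (simulated with 25.0)")
print(f"Estimated measurement variance: {meas_var:.1f} (simulated with 4.0)")
```

Without the sub-samples only the sum of the two components could be estimated, which is the information that is lost when each sample is measured once.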

Some design experts advise that a power analysis be performed before any survey effort is undertaken. Recall that a power analysis calculates the probability that the survey will be able to detect a difference if there actually is one (a true positive). This is undoubtedly a good thing to do when there is a clear hypothesis to be tested, a meaningful effect size can be stipulated and detected, and an (at least moderately) accurate estimate of variance is available prior to the study. However, this is not always the case. It has been observed that power analyses are often performed without great thought, leading to (perhaps) overly large stipulated sample sizes (e.g. Mapstone, 1995), probably larger than any reasonable budget will allow. The arguments outlined in Mapstone (1995) are, to us, quite compelling as they make a researcher undertaking a power analysis think critically about the relative environmental/economic/political costs of making a poor decision. Sometimes it will be more important to guard against false-negative (type II) errors than against false-positive (type I) errors. Such a situation could occur if the cost of falsely declaring _non-_significance (e.g. a real impact goes undetected and continues) is larger than that of falsely declaring significance (e.g. declaring impact may result in closure of a factory or imposing fishing quotas). This is quite contrary to many applications of hypothesis testing in other areas of science.
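To make the definition of power and the trade-off between the two error types concrete, here is a minimal sketch of an approximate power calculation for comparing two group means. It uses a two-sided z-test and assumes independent, normally distributed observations with a known, common standard deviation; these are strong assumptions that field data may well violate. The function name and all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def approx_power_two_groups(effect_size, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size: true difference between the two group means.
    sd: common standard deviation of observations (assumed known).
    """
    standard_normal = NormalDist()
    se_difference = sd * math.sqrt(2.0 / n_per_group)
    z_critical = standard_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(effect_size) / se_difference
    # Probability that the test statistic lands in either rejection region.
    return (standard_normal.cdf(shift - z_critical)
            + standard_normal.cdf(-shift - z_critical))


# Hypothetical survey: detect a difference of 4 units when sd is 10.
for alpha in (0.01, 0.05, 0.10):
    power = approx_power_two_groups(effect_size=4.0, sd=10.0,
                                    n_per_group=50, alpha=alpha)
    print(f"alpha (type I rate) = {alpha:.2f}: power = {power:.2f}, "
          f"type II rate = {1.0 - power:.2f}")
```

For a fixed sample size, accepting a higher type I error rate lowers the type II error rate, which is why the relative costs of the two kinds of wrong decision matter when choosing the error rates.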

If a power analysis is undertaken, then there is some general advice that we offer to marine ecologists. First, don’t blindly follow textbook recipes for power analyses. They make some strong assumptions that are unlikely to be met in ecology (e.g. normality of observations, independence of observations, and constancy of variance in space and/or time). Second, be prepared to do a lot of homework about the sizes of the components of variation that you are likely to observe: “How much overdispersion is there in your study region?” “Is spatial autocorrelation likely?” “What analysis methods are intended to be used?”

It is our opinion that a very useful, and often not too difficult, method for assessing power is to use simulation. There have recently been attempts to provide simplified R-based tools for this process (Green and MacLeod, 2016, for mixed models), and these show promise. The simulation approach consists of a small number of steps: 1) simulate some data under the alternative hypothesis (incorporating the effect that is being considered), 2) analyse the data and see if there is a significant effect, and 3) repeat steps 1) and 2) many times. The proportion of analyses (of simulated data) that produce a significant result gives an estimate of the power of the test. The simulation approach is not without drawbacks though, and many of these are shared with all power analyses. Primarily, the simulation model describes a simplified version of reality, which is likely to be less ‘noisy’ than real data. The reduction in noise stems from events that the model does not account for, such as storms, unusual recruitment events, and so on. Regardless, power analyses are widely used and the simulation approach has been used in many places, including the marine realm (Foster et al., 2014; Perkins et al., 2017). Power is not the only piece of information that can come from the simulation though. In particular, simulation can be used to evaluate how sample size and study design affect more general monitoring objectives (e.g. the ability to estimate parameters in a model or predict future data).
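The three simulation steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the approach of any of the cited tools: counts are simulated as overdispersed (negative-binomial) data for a control and an impact group, and each simulated data set is analysed with a simple two-sided test that uses a normal approximation to a Welch-type statistic. A real analysis would use the model intended for the actual survey data. All function names and parameter values are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist


def poisson_draw(mean, rng):
    """Poisson draw via Knuth's algorithm (adequate for the small means used here)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def overdispersed_count(mean, dispersion, rng):
    """Negative-binomial count as a gamma-Poisson mixture.

    Variance is mean + mean**2 / dispersion, so smaller dispersion is noisier.
    """
    rate = rng.gammavariate(dispersion, mean / dispersion)
    return poisson_draw(rate, rng)


def is_significant(group_a, group_b, alpha):
    """Two-sided test of equal means using a normal approximation."""
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    if se == 0.0:
        return False
    z_stat = (statistics.fmean(group_a) - statistics.fmean(group_b)) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z_stat)))
    return p_value < alpha


def simulated_power(mean_control, mean_impact, dispersion, n_per_group,
                    alpha=0.05, n_sims=500, seed=1):
    rng = random.Random(seed)
    n_significant = 0
    for _ in range(n_sims):  # Step 3: repeat steps 1 and 2 many times.
        # Step 1: simulate data under the alternative hypothesis.
        control = [overdispersed_count(mean_control, dispersion, rng)
                   for _ in range(n_per_group)]
        impact = [overdispersed_count(mean_impact, dispersion, rng)
                  for _ in range(n_per_group)]
        # Step 2: analyse the simulated data and record whether it is significant.
        if is_significant(control, impact, alpha):
            n_significant += 1
    # The proportion of significant analyses is the power estimate.
    return n_significant / n_sims


for n_per_group in (10, 20, 40, 80):
    power = simulated_power(mean_control=10.0, mean_impact=6.0,
                            dispersion=2.0, n_per_group=n_per_group)
    print(f"n per group = {n_per_group:>2}: estimated power = {power:.2f}")
```

Because the data-generating step is explicit, features such as overdispersion can be built in directly, and the same loop can record other quantities (for example, the spread of the estimated difference) to assess objectives other than power.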