In all areas of science, and wherever statistical methods are applied, representative samples are typically obtained by randomly selecting units from a wider population (e.g. Thompson, 2012; Smith et al., 2017; Tillé and Wilhelm, 2017). Random sampling ensures that the information contained in the sample is generalisable to the population from which it was obtained (Fisher, 1925). Using some form of random sampling therefore makes the data representative and able to answer many types of research questions (see Table 2.1).
An alternative, which is unfortunately common in marine ecology, is to select sites based on other (non-random) properties. These properties could include how convenient a site is to sample, or what a researcher expects to find there. This is called ‘ad-hoc’, ‘opportunistic’, ‘haphazard’, ‘judgemental’, ‘purposeful’, or ‘convenience’ sampling. While at first glance this approach appears to be efficient, it in fact diminishes the ability to answer questions about the population as a whole. This limits the researcher to questions involving the specific sample only, that is, descriptive and exploratory questions (unless non-testable assumptions are made). The reader is referred to Smith et al. (2017) and Dobson et al. (2020) for recent discussions on this topic in ecology.
The implication here is immediate and clear: researchers should randomise the sampling process if they expect the patterns observed in the sample to hold in the population. Researchers should not routinely perform haphazard sampling. Of course, there may be situations where a particular location appears so interesting that it is appended to a randomised survey design, but its data can only be included in the analysis with additional (strong) assumptions and/or a more complex analysis. The randomisation process is particularly important for monitoring programs where data from multiple surveys (through time and/or space) are combined.
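The contrast between haphazard and randomised site selection can be illustrated with a small simulation. The sketch below is purely illustrative and rests on assumptions that are not part of this manual: a synthetic population of 100 candidate sites along a transect, a response that increases with distance from the access point, and a 'convenience' rule that always takes the nearest sites. The function names are hypothetical.

```python
import random
import statistics

# Synthetic, purely illustrative population (an assumption, not real data):
# 100 candidate sites along a transect, indexed by distance from the access
# point. The response at each site is assumed to equal its index, so it
# increases steadily with distance.
population = list(range(100))
population_mean = statistics.mean(population)  # 49.5

def convenience_sample(sites, n):
    """Pick the n sites closest to the access point (non-random)."""
    return sites[:n]

def random_sample(sites, n, rng):
    """Pick n sites by simple random sampling without replacement."""
    return rng.sample(sites, n)

rng = random.Random(2024)  # fixed seed so the sketch is reproducible
n_sites = 20
n_surveys = 2000

# The convenience rule always returns the same 20 nearest sites.
convenience_mean = statistics.mean(convenience_sample(population, n_sites))

# Repeat the randomised survey many times and average the sample means.
random_means = [
    statistics.mean(random_sample(population, n_sites, rng))
    for _ in range(n_surveys)
]
average_random_mean = statistics.mean(random_means)

print(f"population mean:            {population_mean}")
print(f"convenience sample mean:    {convenience_mean}")
print(f"average random-sample mean: {average_random_mean:.2f}")
```

Under these assumptions the convenience sample misses the population mean by a fixed amount however often the survey is repeated, whereas the random-sample means scatter around the population mean. Real surveys will rarely have such a clean gradient, so the size of the discrepancy here should not be read as typical.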
An important side-effect of randomisation is that a researcher must specify the statistical population under study. Formally, for surveying geographic areas, the population is the collection of potential survey locations from which a random sample is taken, often called a sample frame in the literature. The formal specification of the sample frame is important because it defines the extent to which the results are legitimately generalisable. A sample frame may be delimited by some combination of: spatial extent, depth, habitat type, season and the type of sample that the selected gear can adequately collect. Generalisation beyond the sample frame requires assumptions, often quite strong ones, that the processes outside the sample frame are identical to those within it. It is best to try to avoid these assumptions by expanding the sample frame prior to undertaking the survey.
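As a minimal sketch of these ideas, the following example builds an explicit sample frame and then draws a simple random sample from it. Everything specific here is an assumption made for illustration: the 10 × 10 grid of candidate cells, the toy depth function, the 50 m depth limit, and the function names `build_sample_frame` and `draw_random_sample` are hypothetical, and simple random sampling is only one of several possible randomised designs.

```python
import random

def toy_depth(x, y):
    """Hypothetical depth (in metres) of grid cell (x, y); illustration only."""
    return 10 * x + y

def build_sample_frame(x_range, y_range, depth_of, max_depth):
    """List every candidate cell whose depth is within the frame's limit.

    The returned list is the sample frame: the full set of locations that
    could be selected, and therefore the set to which results generalise.
    """
    return [
        (x, y)
        for x in x_range
        for y in y_range
        if depth_of(x, y) <= max_depth
    ]

def draw_random_sample(frame, n, seed=None):
    """Draw n distinct sites from the frame by simple random sampling."""
    rng = random.Random(seed)  # a recorded seed makes the draw repeatable
    return rng.sample(frame, n)

# Frame: cells of a 10 x 10 grid no deeper than 50 m (assumed gear limit).
frame = build_sample_frame(range(10), range(10), toy_depth, max_depth=50)
sites = draw_random_sample(frame, n=8, seed=1)

print(f"{len(frame)} cells in the sample frame")
print("selected sites:", sites)
```

Writing the frame down as an explicit list makes its boundaries visible: cells deeper than the assumed limit are simply absent, so no result from the selected sites can be extended to them without further assumptions.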