Simple randomisation – randomly scattering sampling locations through space – is not necessarily an efficient approach, and in many circumstances a large number of samples is necessary to obtain acceptably precise estimates of population parameters (e.g. Tillé and Wilhelm, 2017). This potential inefficiency is one of the reasons that haphazard sampling can initially, although mistakenly, appear quite attractive. There are, however, ways to address inefficiency and to generate designs that require fewer samples and resources. Various researchers have proposed statistically valid restrictions on the randomisation process. In the environmental sciences this discussion has ultimately led to several forms of spatially balanced designs (Stevens and Olsen, 2004; Dobbie et al., 2008; Grafström et al., 2012; Grafström, 2012; Grafström and Tillé, 2013; Grafström, 2013; Robertson et al., 2013; Brown et al., 2015; Foster et al., 2017; Tillé and Wilhelm, 2017), with similar ideas known as ‘spatial coverage designs’ (Royle and Nychka, 1998; Brus et al., 1999, 2006; Minasny and McBratney, 2006; Walvoort et al., 2010) and ‘even sampling designs’ (Chen et al., 2012).

A spatially balanced design can be seen as an extreme form of stratification (Stevens and Olsen, 2004) that aims to reduce the frequency of placing samples close to each other (relative to simple randomisation). This process improves efficiency by reducing the amount of spatial autocorrelation between data, which implies that each sample provides as much unique information as possible (Grafström and Tillé, 2013). Additionally, spatially balanced designs are more efficient than other types of randomised designs as they tend to increase balance on many environmental variables (also known as covariates), where balance means that the sample’s covariate mean is equal to the population’s covariate mean (Grafström, 2013). This is more than just stratifying for important environmental gradients, as stratification does not ensure balance unless balance is explicitly accounted for. Even if balance is sought in stratification, the simple randomisation process within strata lacks efficiency, can complicate analyses, and can be wasteful of ‘degrees of freedom’ in the analysis (reducing analytical power, where relevant). In summary, spatially balanced designs are used to enhance efficiency so that the greatest amount of information is obtained from any given number of sample locations (compared to other forms of randomisation).
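
The effect of spatial balance on sample spacing can be illustrated with a small sketch. It is not the algorithm of any particular package: it assumes a unit-square study area and uses a randomly started Halton sequence (the quasi-random sequence behind balanced acceptance sampling; Robertson et al., 2013) as a stand-in for a spatially balanced design. All function names are hypothetical.

```python
import math
import random

def halton(index, base):
    """Element `index` (1-based) of the van der Corput sequence in `base`."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += fraction * (index % base)
        index //= base
        fraction /= base
    return result

def halton_points(n, start=0):
    """n consecutive points of the 2-D Halton sequence (bases 2 and 3)."""
    return [(halton(start + i, 2), halton(start + i, 3)) for i in range(1, n + 1)]

def mean_nearest_neighbour(points):
    """Average distance from each point to its closest neighbour."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

rng = random.Random(1)  # seeded only so that the sketch is repeatable
n = 50

# Simple randomisation: independent uniform locations in the unit square.
simple = [(rng.random(), rng.random()) for _ in range(n)]

# Stand-in for a spatially balanced design: Halton points from a random start.
balanced = halton_points(n, start=rng.randrange(10_000))

print(f"simple random: {mean_nearest_neighbour(simple):.3f}")
print(f"Halton-based : {mean_nearest_neighbour(balanced):.3f}")
```

A larger mean nearest-neighbour distance indicates fewer sites placed close together. The Halton-based sample will typically score higher, although the size of the gap depends on the random start and on the sample size.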

This type of efficiency is not the only consideration though: logistical considerations often impose practical constraints. Take for example baited remote underwater video (BRUV) surveys where there are often multiple BRUV cameras, each of which must undergo a ‘soak time’ before the data can be collected. In this case, it is inefficient to sit at a single station during the soak time of each and every BRUV deployment (not sampling and not travelling to other sample locations). Instead, it may be better to make multiple deployments in a spatial ‘cluster’ with all the available BRUVs sampling simultaneously. Similar arguments can be made for sampling gear that takes considerable time to deploy and retrieve (e.g. SCUBA), where multiple transects can be swum in a single dive. This type of design is known as a cluster design in the sampling literature (Thomson, 2012) and has been successfully used for marine sampling (e.g. Lawrence et al., 2015; Hill et al., 2018). The locations of the clusters can still be spatially balanced (Lawrence et al., 2015), which maintains spatial balance over the survey area. We suggest trying to make the number of clusters as large as possible, especially if there is a trade-off between the number of clusters and the size of those clusters. In some situations, like the BRUV example, the number within each cluster may be naturally specified by the number of BRUV cameras that are available or can be safely stored on the vessel. All design decisions have implications for analysis, and using clustered (nested) designs is no exception. When analysing clustered data, there should be some accounting for within-cluster correlation. This can be achieved using a cluster random effect, or by using a geostatistical model (e.g. Diggle and Ribeiro, 2007; Banerjee et al., 2004).
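
A clustered layout of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not a published method: the study area is the unit square, cluster centres are spread with a Halton sequence as a stand-in for a spatially balanced selection, and deployments are placed evenly on a ring around each centre. The function `cluster_design`, the ring layout and the radius are all hypothetical choices.

```python
import math

def halton(index, base):
    """Element `index` (1-based) of the van der Corput sequence in `base`."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += fraction * (index % base)
        index //= base
        fraction /= base
    return result

def cluster_design(n_clusters, per_cluster, radius, start=0):
    """Spread cluster centres with a Halton sequence, then ring each centre
    with `per_cluster` deployments at distance `radius`."""
    design = []
    for c in range(1, n_clusters + 1):
        cx, cy = halton(start + c, 2), halton(start + c, 3)
        for k in range(per_cluster):
            angle = 2.0 * math.pi * k / per_cluster
            design.append({
                "cluster": c,  # grouping label, needed later for the analysis
                "x": cx + radius * math.cos(angle),
                "y": cy + radius * math.sin(angle),
            })
    return design

# Hypothetical survey: 8 clusters of 4 BRUVs, each 0.02 units from its centre.
design = cluster_design(n_clusters=8, per_cluster=4, radius=0.02, start=17)
for row in design[:4]:
    print(row)
```

Keeping the `cluster` label with every deployment matters because it is the grouping factor for a cluster random effect at the analysis stage. The sketch does not check whether deployments near the boundary fall outside the study area; a real design would need to handle that.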

Some researchers will know spatially balanced designs as ‘GRTS’ (for generalized random tessellation stratified; Stevens and Olsen, 2004), but GRTS is just one type of spatially balanced design. It is a good design approach and it is the prime reason that spatially balanced designs are gaining popularity. However, it is not the most spatially balanced design, which implies that it is also not the most efficient (Grafström et al., 2012; Robertson et al., 2013; Foster et al., 2017). That said, the differences in relative performance between the various spatially balanced design types are minor. In our experience, computational methods for GRTS, via the *spsurvey* R-package (Kincaid and Olsen, 2016), can be cumbersome, time-consuming and in some ways inflexible. Experienced GRTS users can legitimately continue using it, as the efficiency cost is not large and they have already overcome many of the more cumbersome aspects. However, we recommend that new users start with *MBHdesign*.

While we focus here on spatial balance, many (but not all) of the algorithms for producing spatial balance can be applied to sampling situations that involve more than just 2-dimensional space. In particular, the algorithms implemented in *MBHdesign* are equally applicable to space-time scenarios and even space-depth-time ones (where a 3-dimensional volume, such as a water mass, is sampled over time). In fact, the algorithms scale well with dimensions, and there is no limiting dimensionality, except what is practical in the application.

The efficiencies of spatially balanced designs can be further improved by increasing the probability of selecting sites (sampling locations) where the sampling variable is thought to have greater variance (e.g. Godambe and Joshi, 1965; Brewer et al., 1988; Chambers, 2011; Grafström and Tillé, 2013). Here, we use the term ‘site’ to mean the location where a single deployment is made. We note that others may have slightly different definitions. This is achieved by altering the so-called inclusion probabilities of each potential site. Inclusion probabilities specify the chance of each site being randomly chosen to be part of the survey and they can be chosen on the basis of data from a pilot study or from other sources (e.g. literature on similar species and/or regions). A very low inclusion probability (near zero) will imply that the site will almost never be sampled, whereas a site with very high inclusion probability will be chosen much more often. The inclusion probabilities are prescribed by the survey designer to indicate where the sampling effort should be placed (see Grafström and Tillé, 2013, for more information on how to perform this task).
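
One way to combine unequal inclusion probabilities with spatial spread is acceptance sampling along a quasi-random sequence, in the spirit of balanced acceptance sampling (Robertson et al., 2013). The sketch below is an illustration only: the unit-square study area, the function `weighted_halton_sample` and the east-west weight surface are assumptions made for the example, not part of any package.

```python
import random

def halton(index, base):
    """Element `index` (1-based) of the van der Corput sequence in `base`."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += fraction * (index % base)
        index //= base
        fraction /= base
    return result

def weighted_halton_sample(n, weight, max_weight, start=0):
    """Walk a 3-D Halton sequence (bases 2, 3, 5) and keep the location (x, y)
    whenever the third coordinate falls below weight(x, y) / max_weight."""
    sample, index = [], start
    while len(sample) < n:
        index += 1
        x, y, u = halton(index, 2), halton(index, 3), halton(index, 5)
        if u < weight(x, y) / max_weight:
            sample.append((x, y))
    return sample

def weight(x, y):
    """Hypothetical relative weight: variance assumed to rise towards x = 1."""
    return 1.0 + 3.0 * x

rng = random.Random(7)  # seeded only so that the sketch is repeatable
sample = weighted_halton_sample(200, weight, max_weight=4.0,
                                start=rng.randrange(10_000))

east = sum(1 for x, _ in sample if x >= 0.5)
print(f"{east} of {len(sample)} sites fall in the eastern half")
```

Sites are accepted more often where the weight is high, so more of the sample lands in the eastern half, while the underlying sequence keeps the accepted sites spread out.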

In ecology, where univariate biological variables often have an increasing mean-variance relationship (e.g. through Taylor’s power law; Taylor, 1961), this equates to increasing inclusion probabilities in locations where the variable being sampled is expected to have high abundance (noting again that this is often the motivation for judgemental samples, but here we embed auxiliary information on abundance within a strictly probabilistic framework). If no prior knowledge about the variable under study exists (such knowledge may come from previous surveys or a pilot study), then the inclusion probabilities should be equal.
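
The link between expected abundance and inclusion probability can be made concrete with a short sketch. Taylor’s power law states that variance = a × mean^b. The sketch assumes, purely for illustration, that inclusion probabilities are set proportional to the implied standard deviation and scaled to sum to the sample size n. The function name, the default values of `a` and `b`, and the expected abundances are all hypothetical.

```python
import math

def inclusion_probs_from_means(expected_means, n, a=1.0, b=2.0):
    """Inclusion probabilities proportional to the standard deviation implied
    by Taylor's power law (variance = a * mean ** b), scaled to sum to n."""
    sds = [math.sqrt(a * m ** b) for m in expected_means]
    total = sum(sds)
    probs = [n * sd / total for sd in sds]
    if max(probs) > 1.0:
        raise ValueError("an inclusion probability exceeds 1; "
                         "reduce n or flatten the weights")
    return probs

# Hypothetical expected abundances at five candidate sites (e.g. from a pilot study).
expected_means = [1.0, 2.0, 4.0, 8.0, 5.0]
probs = inclusion_probs_from_means(expected_means, n=2)

for mean, p in zip(expected_means, probs):
    print(f"expected abundance {mean:4.1f} -> inclusion probability {p:.2f}")
```

With no prior information every expected mean is the same, and the function returns equal inclusion probabilities, which matches the default recommended above.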

Special consideration is required in situations where there are multiple outcome variables to be measured, such as the sampling of multiple different community types or multiple species. In these cases, the inclusion probabilities should reflect ‘combined usefulness’. For sampling multiple communities/species this means that each community/species should be effectively sampled and that the combined inclusion probabilities should reflect this. Whilst the ‘combined usefulness’ concept is vague, it should reflect the combined utility of each sampling site to each component of the multivariate observation. In situations where the multivariate observations are independent or even negatively correlated, as communities occupying different habitats might be, inclusion probabilities may be increased for each different habitat. The net result of this process may be an inclusion probability surface that is quite even, and so equal inclusion probabilities may be a good default.
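
A toy calculation shows why the combined surface can end up nearly even. The text does not prescribe how ‘combined usefulness’ should be computed, so the site-by-site average used here is only one possible choice, and the function name and the two example surfaces are hypothetical.

```python
def combine_inclusion_probs(per_variable_probs, n):
    """Average the per-variable surfaces site by site, then rescale so the
    combined inclusion probabilities sum to the sample size n."""
    n_sites = len(per_variable_probs[0])
    averaged = [
        sum(surface[i] for surface in per_variable_probs) / len(per_variable_probs)
        for i in range(n_sites)
    ]
    total = sum(averaged)
    return [n * p / total for p in averaged]

# Two hypothetical communities occupying opposite ends of a habitat gradient.
seagrass = [0.8, 0.6, 0.4, 0.2]
reef = [0.2, 0.4, 0.6, 0.8]

combined = combine_inclusion_probs([seagrass, reef], n=2)
print(combined)
```

Because the two surfaces are negatively associated, their average is flat, which is the situation in which equal inclusion probabilities are a sensible default.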

Altering inclusion probabilities requires the identification of one or more measured covariates (available at the time of design) that can be used to guide the variation in inclusion probabilities. It is beneficial only in situations where the inclusion probabilities are related to the sampling variable. When inclusion probabilities *do not* have this relationship, altering them will cause a *loss* of efficiency (lower precision) relative to equal inclusion probabilities. We caution against using too many covariates in the design stage and point out that using equal inclusion probabilities is a conservative and usually adequate approach. In fact, fewer covariates are better in many ways. The simple reason is that if covariates are used to define the design then they must also be used in the analysis, as the design is conditional on these covariates (see Gelman et al., 2013, and Foster et al., 2017, for discussion). This means that precious ‘degrees of freedom’ must then be used to estimate potentially non-helpful parameters, which has the effect of increasing analysis complexity and reducing the discrimination ability of the analysis. So, the survey designer must weigh up the anticipated reduction in variation due to incorporating the covariate against the necessity to use more terms in the model. When there are multiple sample variables of interest, altering the inclusion probabilities should be considered carefully, as altering the probabilities to reduce the variation in one variable may be at the expense of others.

The concepts of stratification and altered inclusion probabilities are almost, but not quite, identical in situations where stratification is applicable. However, at the cost of being conceptually more sophisticated, the inclusion probability concept is more general and more flexible. The reasoning for the equivalence is that the inclusion probabilities can be designed to match the stratification, so that *on average* the specified number of survey sites is chosen within each stratum, but this is not guaranteed for every randomised design. In contrast, all stratified designs will have the specified number of survey sites within each stratum. To us, this is not a large difference and the ability to spatially balance the design is likely to lead to bigger benefits. We therefore recommend altering inclusion probabilities with spatial balance in preference to formal stratification. However, stratification is not a bad option and is more efficient than simple randomisation (when the stratification is meaningful). We note that the *spsurvey* software that implements GRTS allows for stratification *and* spatial balance by balancing within each spatially contiguous stratum.
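
The near-equivalence can be checked numerically. In the sketch below, every site in a stratum receives inclusion probability (sites wanted in the stratum) / (sites available in the stratum), so the expected number of selected sites per stratum matches the stratified allocation. Independent site-by-site draws are used as the simplest possible randomisation; this is an assumption made for brevity and is not a spatially balanced design. The strata names and sizes are hypothetical.

```python
import random
from collections import Counter

# Hypothetical strata: number of candidate sites and desired sample size in each.
strata_sizes = {"shallow": 200, "mid": 100, "deep": 50}
allocation = {"shallow": 10, "mid": 10, "deep": 5}

# One record per candidate site, labelled by stratum.
sites = [(stratum, i) for stratum, size in strata_sizes.items() for i in range(size)]

# Inclusion probability that mimics the stratification.
incl_prob = {s: allocation[s] / strata_sizes[s] for s in strata_sizes}

# Expected number of selected sites per stratum: sum of inclusion probabilities.
expected = {s: incl_prob[s] * strata_sizes[s] for s in strata_sizes}
print("expected:", expected)

def draw_counts(rng):
    """Select each site independently with its inclusion probability and
    return the realised number of selected sites per stratum."""
    counts = Counter()
    for stratum, _ in sites:
        if rng.random() < incl_prob[stratum]:
            counts[stratum] += 1
    return counts

rng = random.Random(42)  # seeded only so that the sketch is repeatable
for _ in range(3):
    print("realised:", dict(draw_counts(rng)))
```

The expected counts equal the allocation, whereas the realised counts vary from draw to draw. A formally stratified design would return exactly the allocated number in every stratum every time.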

When planning marine monitoring programs, the ability to incorporate any existing sites will often be advantageous, especially when those sites are part of a random sample. An example is when certain sites are mandatorily sampled to achieve regulatory compliance or where sites must be sampled in the future to demonstrate compliance. In the NESP Marine Biodiversity Hub, methodology was developed to incorporate these *legacy* sites into a spatially balanced design. Legacy sites (or historical, reference or sentinel sites) are sites that have been sampled in the past and that the researcher wants to re-visit as part of the upcoming survey for comparability, or sites that must be sampled in the future, for example to quantify the effects of decommissioning oil and gas platforms in situ. Readers are referred to Foster et al. (2017) for details. Briefly, however, spatial balance is achieved by setting the inclusion probability of legacy sites to one and adjusting inclusion probabilities in the proximity of legacy sites downwards, so that new samples are less likely to be placed very near them.
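
The downward adjustment near legacy sites can be sketched as below. This is not the method of Foster et al. (2017), whose details are given in that paper. It is a minimal illustration that assumes a Gaussian-shaped down-weighting with a hypothetical bandwidth `sigma`, applied to a grid of candidate sites in the unit square and rescaled so that the probabilities for new sites sum to the number of new sites wanted.

```python
import math

def adjust_for_legacy(sites, probs, legacy_sites, sigma, n_new):
    """Shrink inclusion probabilities close to legacy sites, then rescale so
    that the probabilities of the candidate (new) sites sum to n_new."""
    adjusted = []
    for (x, y), p in zip(sites, probs):
        for lx, ly in legacy_sites:
            dist_sq = (x - lx) ** 2 + (y - ly) ** 2
            # Factor is 0 at the legacy site and approaches 1 far away from it.
            p *= 1.0 - math.exp(-dist_sq / (2.0 * sigma ** 2))
        adjusted.append(p)
    total = sum(adjusted)
    return [n_new * p / total for p in adjusted]

# Candidate sites: centres of a 10 x 10 grid, starting from equal weights.
sites = [((i + 0.5) / 10, (j + 0.5) / 10) for i in range(10) for j in range(10)]
equal = [1.0] * len(sites)

# Hypothetical legacy sites; they are always sampled (inclusion probability one),
# so only the surface for the new sites is computed here.
legacy_sites = [(0.25, 0.25), (0.75, 0.65)]
new_probs = adjust_for_legacy(sites, equal, legacy_sites, sigma=0.1, n_new=10)

print(f"smallest new-site probability: {min(new_probs):.3f}")
print(f"largest new-site probability : {max(new_probs):.3f}")
```

The choice of `sigma` controls how far the influence of a legacy site extends. The value used here is arbitrary and would need to be set with the spacing of the sample in mind.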

## Software

There are many pieces of software that will generate spatially balanced designs, most of which are based on different algorithms. For monitoring the marine environment, we developed a dedicated piece of software, the R-package *MBHdesign*. It is intended to be easy to use and tailored to common situations in marine ecology. It can make designs spatially balanced around existing legacy sites (see Foster et al., 2017) and can design surveys for sampling platforms that are transect-based (see Foster et al., 2019). We will use *MBHdesign* in the example to follow.