Often, when performing spatial analysis, one may need to execute some type of sampling across space. For example, one may need to sample locations across a geographically continuous surface (think soils, anything weather related, etc.). A spatial random sample can be used to select locations without bias. With a simple python script one can develop a spatial random sample with relative ease. In this post I will cover a few definitions, provide a code sample, and discuss some additional points.
First, a few definitions:
Random Number: A number chosen as if by chance from some specified distribution such that selection of a large set of these numbers reproduces the underlying distribution.
Statistical Randomness: A numeric sequence is said to be statistically random when it contains no recognizable patterns or regularities; sequences such as the results of an ideal dice roll, or the digits of π exhibit statistical randomness.
Simple Random Sample: A sample in which every element in the population has an equal chance of being selected.
Second, what is a spatial random sample?
Spatial Random Sample: Locations obtained by choosing x-coordinates and y-coordinates at random (p. 58). Any points that do not intersect the landform will be dropped from the list of random points.
Third, give me some python code to do this!
import os, random from time import strftime f = open("C:\\Data\\output\\spatial_random_sample.csv", 'w') #How many points will be generated numpoints = random.randint(0,1000) # Create the bounding box #set longitude values - Y values minx = -180 maxx = 180 #set latitude values - X values miny = -23.5 maxy = 23.5 print "Start Time:", strftime("%a, %d %b %Y %H:%M:%S") #Print the column headers print >>f, "ID",",","X",",","Y" for x in range(0,numpoints): print >>f, x,",", random.uniform(minx,maxx),",", random.uniform(miny,maxy) f.close() print "Script Complete, Hooray!", numpoints, "random points generated" print "End Time:", strftime("%a, %d %b %Y %H:%M:%S")
This quick, dirty and very simple script does a few things. First, it creates a csv file in a local directory, and by using the ‘w’ mode the file will be created if it doesn’t exist and will be overwritten every time the code is run (so be careful).
Next, the code selects a random number of points to be generated. In this case it will be a random integer between zero and 1,000. The user will then set the bounding box for which the points will be contained by. If using ArcPy and ArcGIS the user could easily set the bounding box to that of a particular layer. In this example, it is simply 180,-180 and the approximate Tropic of Cancer and Tropic of Capricorn.
The next block of code will generate the random number of points in the specified ranges and print them to a csv file. The output is fairly straight forward: three columns, an ID field and X and Y. The user can open the file in OpenOffice as they could any other csv file.
Well, that’s great. With this data the user can easily visualize it in Quantum using the Add Delimited Text Layer tool from the Layer menu. Since the output was formatted with X and Y fields the tool will populate itself:
Once the user clicks OK the points will be added to the map. From there the user can export the data to any number of formats and perform their analysis.
As you can see it is pretty easy to generate random points with the script. In fact, ArcMap and Quantum have tools that will do this, but both run much slower than just creating a simple spatial random sample as demonstrated here, as they have many more options than this simple script. Also, the Arc version will only work if the user has ArcEditor or the spatial analyst extension. The folks at SpatialEcology also have a tool that will do this within ArcMap as well, and I am sure there are other tools out there as well.
But before we wrap this up, here are a couple notes:
- This is a simple example, and not intended to be an “end-all, be-all example”.
- Python generates psuedo-random values
- The points that are generated have an equal chance of being created, meaning that whatever is being sampled with those coordinates has an equal chance of being selected as well.
- The script presented here does not check against any boundaries, only a bounding box.
- The above code can easily be extended to work within ArcPy and ArcGIS. I can post the code later on if there is interest.
You might be interested in the R statistical software package “spsurvey”, which is used for the design and analysis of spatially-balanced probabilistic surveys. Besides providing good spatial coverage, it uses a local neighborhood variance estimator that tends to give smaller variance estimates than other estimators. Many environmental protection agencies, both federal and state, use such survey designs. This is a link to a web site describing such surveys: http://www.epa.gov/nheerl/arm/designpages/monitdesign/monit_intro.htm
The 2004 paper by Stevens and Olsen, “Spatially-Balanced Sampling of Natural Resources”, describes the survey methods and can be found here:
http://www.epa.gov/nheerl/arm/bibliography.htm