Spatial Random Sample

Often, when performing spatial analysis, one may need to execute some type of sampling across space. For example, one may need to sample locations across a geographically continuous surface (think soils, anything weather-related, etc.). A spatial random sample can be used to select locations without bias. With a simple Python script one can develop a spatial random sample with relative ease. In this post I will cover a few definitions, provide a code sample, and discuss some additional points.

First, a few definitions:

Random Number: A number chosen as if by chance from some specified distribution such that selection of a large set of these numbers reproduces the underlying distribution.

Statistical Randomness: A numeric sequence is said to be statistically random when it contains no recognizable patterns or regularities; sequences such as the results of an ideal dice roll, or the digits of π exhibit statistical randomness.

Simple Random Sample: A sample in which every element in the population has an equal chance of being selected.
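
As a quick, hedged illustration of a simple random sample in Python (the list of site IDs below is made up purely for the example):

import random

# A hypothetical population of 100 sample site IDs
population = range(1, 101)

# Draw a simple random sample of 10 sites - every site has an
# equal chance of being selected
sites = random.sample(population, 10)
print(sites)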

Second, what is a spatial random sample?

Spatial Random Sample: Locations obtained by choosing x-coordinates and y-coordinates at random (p. 58). Any points that do not intersect the landform will be dropped from the list of random points.  

Third, give me some python code to do this!

import random
from time import strftime

f = open("C:\\Data\\output\\spatial_random_sample.csv", 'w')

# How many points will be generated
numpoints = random.randint(0, 1000)

# Create the bounding box
# Set longitude values - X values
minx = -180
maxx = 180

# Set latitude values - Y values
miny = -23.5
maxy = 23.5

print "Start Time:", strftime("%a, %d %b %Y %H:%M:%S")

# Print the column headers
print >>f, "ID", ",", "X", ",", "Y"
for x in range(0, numpoints):
    print >>f, x, ",", random.uniform(minx, maxx), ",", random.uniform(miny, maxy)
f.close()

print "Script Complete, Hooray!", numpoints, "random points generated"
print "End Time:", strftime("%a, %d %b %Y %H:%M:%S")

This quick, dirty and very simple script does a few things. First, it creates a csv file in a local directory, and by using the ‘w’ mode the file will be created if it doesn’t exist and will be overwritten every time the code is run (so be careful).

Next, the code selects a random number of points to be generated, in this case a random integer between zero and 1,000. The user then sets the bounding box within which the points will be contained. If using ArcPy and ArcGIS, the user could easily set the bounding box to that of a particular layer. In this example it is simply -180 to 180 degrees longitude, with latitude bounded by the approximate Tropic of Capricorn and Tropic of Cancer (-23.5 to 23.5).
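
For example, if working in ArcPy, the bounding box could be pulled from a layer's extent rather than hard-coded. A minimal, hedged sketch (the shapefile path below is a placeholder, not a real dataset):

import arcpy

# Use the extent of an existing layer as the bounding box
# ("C:\\Data\\roads.shp" is a hypothetical path - substitute your own layer)
desc = arcpy.Describe("C:\\Data\\roads.shp")
minx, maxx = desc.extent.XMin, desc.extent.XMax
miny, maxy = desc.extent.YMin, desc.extent.YMax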

The next block of code will generate the random number of points in the specified ranges and print them to a CSV file. The output is fairly straightforward: three columns, an ID field, X, and Y. The user can open the file in OpenOffice as they could any other CSV file.
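
One note on the output: because each comma is passed to print as a separate item, the print statement adds a space around every delimiter. If cleaner output is wanted, Python's standard csv module handles the formatting; a small alternative sketch (same placeholder output path as above):

import csv, random

numpoints = 1000
minx, maxx = -180, 180
miny, maxy = -23.5, 23.5

# 'wb' mode is the Python 2 convention for writing csv files
f = open("C:\\Data\\output\\spatial_random_sample.csv", 'wb')
writer = csv.writer(f)
writer.writerow(["ID", "X", "Y"])
for i in range(numpoints):
    writer.writerow([i, random.uniform(minx, maxx), random.uniform(miny, maxy)])
f.close()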

Well, that’s great. The user can easily visualize this data in Quantum using the Add Delimited Text Layer tool from the Layer menu. Since the output was formatted with X and Y fields, the tool will populate itself:

Once the user clicks OK the points will be added to the map.  From there the user can export the data to any number of formats and perform their analysis.

As you can see, it is pretty easy to generate random points with this script. In fact, ArcMap and Quantum have tools that will do this, but both run much slower than the simple spatial random sample demonstrated here, since they offer many more options than this basic script. Also, the Arc version will only work if the user has ArcEditor or the Spatial Analyst extension. The folks at SpatialEcology have a tool that will do this within ArcMap too, and I am sure there are other tools out there as well.

But before we wrap this up, here are a couple notes:

  • This is a simple example, and not intended to be an “end-all, be-all example”.
  • Python generates pseudo-random values
  • The points that are generated have an equal chance of being created, meaning that whatever is being sampled with those coordinates has an equal chance of being selected as well.
  • The script presented here does not check against any boundaries, only a bounding box (see the sketch after this list for one way to add such a check).
  • The above code can easily be extended to work within ArcPy and ArcGIS.  I can post the code later on if there is interest.
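
If points need to fall inside an actual boundary rather than just a bounding box, one common approach is rejection sampling: generate a candidate inside the bounding box, keep it only if it falls within the boundary polygon, and repeat until enough points have been accepted. A minimal sketch using a plain ray-casting point-in-polygon test (the triangle below is a made-up example boundary, not real data):

import random

def point_in_polygon(x, y, poly):
    # Ray-casting test: count how many polygon edges a horizontal ray
    # from (x, y) crosses; an odd count means the point is inside
    inside = False
    j = len(poly) - 1
    for i in range(len(poly)):
        xi, yi = poly[i]
        xj, yj = poly[j]
        if ((yi > y) != (yj > y)) and \
           (x < (xj - xi) * (y - yi) / (yj - yi) + xi):
            inside = not inside
        j = i
    return inside

# Hypothetical boundary polygon (a simple triangle) and its bounding box
boundary = [(-10.0, -10.0), (10.0, -10.0), (0.0, 15.0)]
minx, maxx = -10.0, 10.0
miny, maxy = -10.0, 15.0

points = []
while len(points) < 100:
    x = random.uniform(minx, maxx)
    y = random.uniform(miny, maxy)
    if point_in_polygon(x, y, boundary):
        points.append((x, y))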

GISDoctor Spatial Analysis Post Series

There once was a well-known GIS blog post that compared geographic information systems to word processors. No matter what you think about the post, we will always need people who are skilled at “writing” and have something to “write” about.

As I have said before, and will say again, if you are using GIS technologies you should have a grasp of the fundamentals. You wouldn’t write a paper or a report without a grasp of the basics of the topic or a knowledge of writing in general. So, to improve the world’s GIS grammar (or at least my own), I will be posting a number of spatial analysis related topics over the course of the next few months. Here are a few of the topics I will cover:

  • Data classification schemes
  • Understanding spatial random samples
  • Topology, from a spatial point of view
  • The basics of projections
  • Avoiding false accuracy
  • Using root mean square
  • Geary’s c and Moran’s I
  • The First Law of Geography
  • Spatial autocorrelation
  • and many more…

I’ll use a variety of software, data, and problems to explain these topics, in order to expose the reader to the broad language of GIS.

Apple and OSM – The Year of OpenStreetMap Continues

The year of OpenStreetMap continues.  You have probably heard by now that Apple is now using a mix of TIGER data and OSM tiles in their mapping application.  As I said a couple weeks ago, 2012 is the year of OpenStreetMap, and this change for Apple, who had been using Google’s mapping data, is the biggest switch to date.

As I have said before, when large, well established organizations switch to these open data sources it can have a major impact on the open data movement, and Apple is probably as big as it gets.   However, Apple could derail the momentum that is the Year of OpenStreetMap!

The rumor on the street (haha, get the pun!) is that Apple is using an older set of tiles and TIGER data (yes, that TIGER).  These older datasets aren’t perfect and anyone who has ever taken a GIS course knows that TIGER data should be used for reference purposes only, and not in a global application that will potentially have millions of users.  Now, why would Apple be using this older data?  Are we seeing a beta product while they get ready to push new tiles out soon?  Do they not have any well trained geographers or GIS pros working for them who know about data quality?  Are they not taking their mapping applications seriously?

If OpenStreetMap data is to be successfully integrated into an application the users of that application will need to trust the quality of the data.  If the most influential tech company in the world messes this up it could impact who joins the OSM movement next, and perhaps set the movement back.

For more details on the switch and the data issues check out what SlashGeo had to say, James Fee’s comments, and this article from Geek.com.

A few motivated individuals have created some really great mash-ups that display the new Apple tiles. Check them out for yourself to compare what currently exists in OSM and what Apple has published:

And one last comment. Apple’s map visualization scheme is horrible. Of all the great basemaps out on the web, Apple designs a visualization scheme that just screams 2001. Maybe it’s being optimized for mobile devices, but as a trained cartographer I think it looks bad.

Full disclosure: I am not an Apple person. I have a Dell laptop, a Samsung phone, and an old iPod.

Time to Learn Code, GIS Pros!

Check out this great article from Adena Schutzberg at Directions Magazine from earlier this week, “Should All GIS Users Learn to Code?”

Adena’s answer to this question? Yes, they should, and I totally agree. In fact, I’ve been saying it for a while. GIS users don’t need to be experts in multiple languages, but they should be able to create, understand, and dissect code at some level. I also believe that any self-respecting college or university department that teaches GIS should include some programming requirement, whether it is a semester-long course or integrated into an advanced GIS course. By learning to code the GIS user becomes not only more flexible in the workplace, but also more valuable to their employer.

So, if you have a few minutes check out Adena’s article.  A lot of great points!

The Year of OpenStreetMap Continues

On February 28th, 2012 (yesterday), I posted a blog titled “2012 – The Year of OpenStreetMap”.

In the post I said that “the next “big” mapping application that hits the market will have an OSM back-end.” Well, well, well, talk about timing. Today, FourSquare announced that they will be switching to OSM for their mapping back-end using MapBox. Pretty cool.

If you get a minute take a gander at the comments in the FourSquare blog post.  There is a mixed reaction to the switch, which I can understand.  There are also a number of really good suggestions that FourSquare could take a look at too.  Just like when any social network incorporates a design change there will be some push back from the users.  But, with time the OSM footprint will improve, as the community will grow, mature, and produce better maps.  People seem to forget that early on Google Maps wasn’t perfect and had data gaps as well.  FourSquare would be smart to somehow encourage OSM mapping parties and promote what is now being called the OpenStreetMap movement!

February 28, 2012. You heard it here first. 2012 – The Year of OpenStreetMap. What will be next?

2012 – The Year of OpenStreetMap? Yes.

OpenStreetMap has been in the news a lot lately, and rightfully so.

Has the geospatial world reached the tipping point? Are users, developers, and society as a whole now more accepting of open-source spatial information? Are we now confident in the crowd sourced masterpiece that is OSM?

Yes, yes, and yes.

So, now two full months into 2012 I’m calling it.  2012 is the year of OpenStreetMap.  But why now?  I think it is due to a few reasons:

  • Quality and Coverage Improvements: When OSM started many parts of the world were under-mapped, but once the community of users developed, so did the maps. Over time, and with great publicity during certain global events, the coverage and quality of the maps drastically improved. In 2012 the data in OSM is equal to, or better than, that of well-known web mapping tools. For example, check out the coverage for North Korea.
  • Development of the Contributor Community: When I first learned of OSM several years ago I was skeptical of random people creating this global street map, just like I was skeptical of Wikipedia. Well, I was proven wrong (I’m still skeptical of Wikipedia…). Even though there have been instances of tampering with OSM, its contributors have proven to be a consistent and reliable source of quality data. I often spot check locations that I am familiar with to see if anything is amiss or needs to be updated, and thankfully I rarely have to make edits. The growing and dedicated user community has really driven the quality, which is a great thing!
  • Credibility: Credibility is tough to earn, but through the efforts of users, developers, and the map-using public, many reputable organizations now trust the data available in OSM. As OSM’s credibility grows, a wider variety of well-known organizations will start to use its data. I’m guessing the next “big” mapping application that hits the market will have an OSM back-end.
  • The Paywall: If you had a choice between spending something on a service or spending nothing on a very comparable (or perhaps better) service, which would you select? I would pick the equally good free service. You’ll see this with OSM.

 

So, there you have it.  I think you’ll hear a whole lot more out of OSM in 2012, whether it is about new and exciting applications built using their data, or companies switching their services from one of the major players to OSM.

There it is, my reason for calling 2012 the Year of OpenStreetMap… two months late 🙂

Now, go host a mapping party!

One Million Points

I am working on a couple projects and I need to generate some random points across defined bounding boxes.  I have the basic code worked out in python and I am testing the results.  Just for kicks here is a million points generated in Python to a csv file (in 10 seconds) and drawn (rather quickly) in Quantum.  Sweet.

One Million Points

I’ll share the code on Thursday.
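
In the meantime, here is a rough, hedged sketch of how such a timing test might look (this is not the exact script mentioned above, and the output path is just a placeholder):

import random, time

numpoints = 1000000
start = time.time()

# Write a million random lat/long points straight to a CSV file
f = open("C:\\Data\\output\\one_million_points.csv", 'w')
f.write("ID,X,Y\n")
for i in range(numpoints):
    f.write("%d,%f,%f\n" % (i, random.uniform(-180, 180), random.uniform(-90, 90)))
f.close()

print("Generated %d points in %.1f seconds" % (numpoints, time.time() - start))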

 

TileMill for Windows

There has been a lot of buzz around TileMill lately. With the new Windows versions recently released, I figured I would give it a try. I’ve seen demonstrations of the open-source application at WhereCamp Boston, and I was excited to use the Windows version, as I had never used the releases for other operating systems.

Before I started mapping I grabbed some point data from GeoNames, downloading the cities under 5000 table. The table contains about 45,700 records representing cities around the world. As you can see from a screenshot of the data loaded into Quantum, the data are already in lat/long, making them readily usable for web mapping.

After loading the GeoNames csv file into TileMill I followed the crash course and built this simple map:

As you can see I didn’t do much in terms of styling the map, but for testing purposes I played around with adding additional styles and worked with the built in features.  Once I was happy with my test map I exported a web-friendly png file to disk that came out like this:

What I liked:

  • I really, really like Carto. One of my biggest frustrations with traditional GIS software is the difficulty of quickly styling a map, or experimenting with the design on the fly. I love how a user, with some experience, can quickly and effectively style a map.
  • The user interface, especially the projects window, is well designed. I really appreciate the clean, fast, and intuitive feel of the program. Sometimes simple is better.
  • The manual is easily accessible from the main application; I went back to it a couple of times during my experimentation.
  • The tutorial was easy to follow, even when using my own data.
  • The tool was able to handle the 45k+ points easily. Next, I’d like to try a complex polygon dataset.
  • The export tool allowed the user to set the pixel size, extent, and image format quickly and easily.

What I didn’t like:

  • The program crashed on me a couple times, probably due to me rushing and not reading instructions correctly.
  • Start-up on my machine (Windows 7, 64-bit, plenty of processing power) was close to 30 seconds.
  • If you are unfamiliar with CSS, Carto might seem a little awkward at first, but with some practice it is pretty easy to understand.
  • The install package on their website ran on my machine, but when I started the program a cmd window would open and then close. After reading through their forums I found that others had the same problem. The folks at TileMill provided another install package that worked on my Windows 7 machine.

One final thing: a great addition would be an IntelliSense-like feature in TileMill for Carto.

Overall, I thought it was a great tool, and I will use it again. If you have some free time, or are looking for a new way to map your data, check it out.