23 July 2015

Data Horrors

"The great tragedy of science -- the slaying of a beautiful hypothesis by an ugly fact."  Thomas H. Huxley.

Sometimes, though, you have to pay attention to just how ugly the observation (fact) is.  And even more to how ugly a collection of observations is.  At a science fair I judged a couple of years ago, one student described his methods for keeping the experiment, which had to stay untouched while it was running, out of reach of his young brother.  This student has a firm grasp of the ugliness of data and of trying to collect it.  I gave him high marks.

I also mentioned a story or two I knew of data collection challenges.  I'll share them and some others here, and invite you to add your own.

One family of ocean data comes from buoys floating on top of the ocean.  A lot of the ocean is far from land, therefore far from perches for birds.  Sea gulls and other birds are often grateful for the lovely perches we're putting out for them.  Unfortunately, it does not help the accuracy of your wind speed measurements to have a bird sitting on your gauge.  Birds sitting on the solar panel reduce the energy available and the recharge rate, which can lead to data outages while the battery recharges.  Guano is great for fertilizer, but wreaks havoc on the accuracy of your temperature, pressure, and moisture readings.

Walrus don't mind taking a rest every now and then either.  They're not normally a threat to wind speed measurement (which is at the top of the buoy).  But we also want to get wave measurements -- how high are they, how fast are they, what direction are they going.  Having a walrus or two on your buoy slows its ability to respond, and may suppress the peaks of the measured waves.

On land, your instrument enclosures (the Stevenson Screen for instance) provide a nice place for bees, wasps, and small birds to nest.  Squirrels like to play with them too.  A beehive next to your thermometer does not help its accuracy.

Back at sea, I once got a call about a problem buoy.  It was reporting extremely high temperatures near noon because the paint had been stripped during a storm, and the now-bare metal was reflecting sunlight onto the marine thermometer.

That should get you started for remembering your own horror stories about data collection.

I recently saw someone on the web taking the line that if data wasn't perfect, you should throw out everything from that instrument or site.  Well, no.  If you did that, you'd never have any data to work with.  For my examples, you mostly just ignore the data from the period when you've got a walrus infestation.  But there are other kinds of things which affect your observing, and which you might be able to compensate for.
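
As a rough illustration of "ignore the bad period, keep the rest", here is a minimal Python sketch. Everything in it is hypothetical: the function name `drop_flagged`, the hourly wave heights, and the flagged interval are invented for the example and do not come from any real buoy or processing system.

```python
from datetime import datetime

def drop_flagged(readings, bad_periods):
    """Keep readings whose timestamp falls outside every flagged period.

    readings    -- list of (timestamp, value) pairs
    bad_periods -- list of (start, end) pairs, both ends inclusive
    """
    kept = []
    for when, value in readings:
        in_bad_period = any(start <= when <= end for start, end in bad_periods)
        if not in_bad_period:
            kept.append((when, value))
    return kept

# Hypothetical hourly wave heights in metres (values invented for illustration).
readings = [
    (datetime(2015, 7, 1, 0), 1.9),
    (datetime(2015, 7, 1, 1), 2.1),
    (datetime(2015, 7, 1, 2), 0.6),  # walrus aboard
    (datetime(2015, 7, 1, 3), 0.5),  # walrus aboard
    (datetime(2015, 7, 1, 4), 2.0),
]

# Assumed: someone has already identified when the walrus was on the buoy.
bad_periods = [(datetime(2015, 7, 1, 2), datetime(2015, 7, 1, 3))]

good = drop_flagged(readings, bad_periods)
print(len(good), "of", len(readings), "readings kept")
```

The hard part in practice is knowing when the bad period was, which this sketch simply assumes. The point is only that the rest of the record survives.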

5 comments:

Catmando said...

" A beehive next to your thermometer does not help its accuracy."

Apian heat island anyone?

afeman said...

There must be a marine biologist or two horrified that you're calling those walruses!

Robert Grumbine said...

Sea lions? Otters? I'm pretty sure they aren't iguanas.

I have a couple of field biologist friends. It took them quite a while to believe that 'little brown bird' really was the most precise I could be for many of my id efforts.

Anyhow, the field people do tell stories of walrus.

Anonymous said...

This question is OT but I hope you don't mind. I'm a non-science layperson with an interest in the topic of global warming. Reading comments on another blog, a sceptic linked to a couple of graphs on the WUWT site purporting to be from NOAA showing US temperature plots. Both graphs show increasing trend lines but also "average" temperature lines which are flat. I hope I don't sound too dumb, but I thought the trend WAS a plot of the average temperature. I'm confused as to why they're different. Can you please explain it to me in simple terms? Thanks.

Robert Grumbine said...

Anon:
One of the first things to observe is that you're interested in _global_ warming, but the WUWT figures are for _US_ temperature. The US (all 50 states) is right about 2% of the surface of the globe.

I don't know what's up over there, but it seems like a good chance for me to take a look at the data from NOAA myself, both global and the 'contiguous US' (lower 48 states -- even less than 2% of the globe since Alaska is excluded). I retrieved the US data from http://www.ncdc.noaa.gov/cag/time-series/us/110/00/tavg/12/07/1895-2015.csv?base_prd=true&firstbaseyear=1901&lastbaseyear=2000

And the globe from http://www.ncdc.noaa.gov/monitoring-references/faq/anomalies.php (monthly data, global, land and ocean, csv format)

The WUWT notes often look only at the US. I think it's a bad idea to try to understand the globe by ignoring 98% of it. Beyond that, there's another reason not to rely only on subsets of the globe when trying to understand the global climate: not only may one small area be unrepresentative of the average over the rest of the world, it is also much more variable. It's the same effect as with free throws. I'm a poor shot, but a short run of me making all my free throws would not be too surprising; sometimes I get lucky for a few in a row. In the long haul (climate of the globe), bet on the NBA player over me.
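
To put rough numbers on the free throw analogy, here is a small Python sketch using the binomial distribution. The shooting percentages (50% for the poor shot, 80% for the professional) are assumptions picked for illustration, not measured values, and the function names are made up for this example. It says nothing about temperature data directly; it only shows how a short run hides a real difference that a long run exposes.

```python
from math import comb

def prob_makes(n, k, p):
    """Binomial probability of exactly k makes in n shots, per-shot chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_amateur_at_least_ties(n, p_amateur, p_pro):
    """Chance the amateur makes at least as many of n shots as the pro."""
    total = 0.0
    for a in range(n + 1):          # amateur makes a shots
        for b in range(a + 1):      # pro makes b <= a shots
            total += prob_makes(n, a, p_amateur) * prob_makes(n, b, p_pro)
    return total

P_AMATEUR = 0.5  # assumed success rate for a poor shot
P_PRO = 0.8      # assumed success rate for a professional

# A perfect 3-for-3 from the poor shot: 0.5 ** 3 = 0.125, one time in eight.
print("poor shot goes 3 for 3:", prob_makes(3, 3, P_AMATEUR))

# The longer the run, the less often the poor shot keeps up with the pro.
for n in (3, 10, 100):
    chance = prob_amateur_at_least_ties(n, P_AMATEUR, P_PRO)
    print(n, "shots: poor shot at least ties with probability", chance)
```

With these assumed percentages, the poor shot matches or beats the professional fairly often over 3 shots and almost never over 100.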

I'll flesh this out with some proper graphs and figures, in terms of looking at what we can tell from the two data sets. Should appear ... let's say Tuesday so that I'll be sure to have it in hand.