Find good data sets

A data set is a collection of data points in some kind of structure, like points on a map or a table in an Excel spreadsheet. If you have a research interest and want to learn more, you might analyze a data set created by someone else to help you understand. For example, if you are interested in the health of marine life in Bellingham Bay, you might look for a scientist’s or government agency’s data set on water sample quality to study it for yourself.

Data sets can be public or private, and they vary in quality. Tableau, a data analysis company, advises keeping these two things in mind.

You won’t find what you’re looking for. You might find something close, but be flexible.

You’ll have to clean up the data. People aren’t always neat and tidy when they collect information, and you may have to clean up messy documents or spreadsheets.
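If you are comfortable with a little code, cleanup like this can be sketched in Python. The example below is only a minimal illustration, not a complete cleaning process: the sample rows, the column names (site, reading, date), and the helper name clean_rows are all made up for this sketch. It trims stray spaces, skips empty rows, uses one spelling for each site name, and marks unreadable numbers as missing.

```python
import csv
import io

# Hypothetical messy export: stray spaces, inconsistent capitalization,
# an empty row, and an unreadable reading. All values are made up.
MESSY_CSV = """site,  Reading ,date
 Bellingham Bay ,7.9,2020-01-05
bellingham bay,8.1 ,2020-01-12
,,
BELLINGHAM BAY,n/a,2020-01-19
"""


def clean_rows(text):
    """Return a list of tidy dictionaries from messy CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    cleaned = []
    for raw in reader:
        # Trim stray whitespace from column names and values.
        row = {key.strip().lower(): (value or "").strip()
               for key, value in raw.items()}
        # Skip rows where every cell is empty.
        if not any(row.values()):
            continue
        # Use one spelling for each category name.
        row["site"] = row["site"].title()
        # Turn readings into numbers; mark unreadable ones as None.
        try:
            row["reading"] = float(row["reading"])
        except ValueError:
            row["reading"] = None
        cleaned.append(row)
    return cleaned


rows = clean_rows(MESSY_CSV)
for row in rows:
    print(row)
```

Marking an unreadable value as missing, rather than guessing at it, keeps the cleaned data honest about what the original source actually recorded.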

A good data set has these qualities:

  • Has information close to what you need
  • Has details, and is not summarized or too high-level
  • Measures a couple of things
  • Is consistent in naming categories or measures
  • Is usable (not too messy or too big)
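A few of these qualities can be checked quickly once a data set is loaded. The sketch below is one possible way to do that in Python, under some assumptions: the sample rows are invented, and quick_check is a hypothetical helper written for this example. It reports how big the data set is, which columns it has, how many cells are blank, and whether a category is spelled more than one way.

```python
from collections import Counter

# Made-up sample rows standing in for a downloaded data set.
SAMPLE_ROWS = [
    {"site": "Bellingham Bay", "reading": "7.9"},
    {"site": "bellingham bay", "reading": "8.1"},
    {"site": "Bellingham Bay", "reading": ""},
]


def quick_check(rows, category_column):
    """Summarize size, columns, blank cells, and category spellings."""
    # Every column name that appears in any row.
    columns = sorted({key for row in rows for key in row})
    # How many blank cells each column has.
    blanks = Counter(key for row in rows
                     for key, value in row.items() if value == "")
    # How often each spelling of the category appears.
    spellings = Counter(row[category_column] for row in rows)
    return {
        "row_count": len(rows),
        "columns": columns,
        "blank_cells": dict(blanks),
        "category_spellings": dict(spellings),
    }


report = quick_check(SAMPLE_ROWS, "site")
for name, value in report.items():
    print(name, value)
```

More than one spelling for the same category, or many blank cells, suggests the data set will need cleanup before you can use it.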

Open data sets are widely available, so keep the question "What’s good and not good data?" in mind when you look at what’s out there. Here are a few places to start looking.

The US government provides a lot of data sets at data.gov, from winning lottery numbers to crime data to climate science.

Google Datasets links to a variety of data sets, although not all of them are free.

This open data repository list is also a great place to look; it covers a ton of subject areas.

WCC Library also has access to data sets through OneSearch.

