Bringing in Data and Publicly Available Data Packages

This week we discussed how we bring in data, forms of data, good sources for help, and some packages that pull in publicly available data.

First of all, we talked about RStudio (https://www.rstudio.com/). RStudio is a great interface for using R, and it also offers some “point and click” methods of bringing in data. The “Import Dataset” button in the top right pane of the RStudio interface lets you import data either from a local file on your computer or from a web address.

Data can also be brought in through code. A good resource for ways to import specific types of data is this Quick-R page: http://www.statmethods.net/input/importingdata.html. The most common data type that people work with is .csv files, which are imported using the “read.csv()” function. Base R cannot read Excel files directly, so to read one you need an add-on package such as “xlsx”.
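As a minimal sketch of what a read.csv() call looks like: the file contents and column names below are made up for illustration, and the file is written to a temporary location first so that the example runs on its own.

```r
# Write a small example file to a temporary location (hypothetical data,
# included only so the example is self-contained).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("site,count", "A,3", "B,5"), csv_path)

# read.csv() returns a data frame; with header = TRUE the first row
# of the file is used for the column names.
mydata <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

str(mydata)   # inspect the column types
head(mydata)  # look at the first few rows
```

With your own data, csv_path would simply be the path to your .csv file.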

If you want to read data from a website, which the point and click method in RStudio also lets you do, there are many ways to do it in code. Two common ways are using the “RCurl” package or the “data.table” package. Examples of that code are below. Remember, to use a package you first need to have it installed (“buying the book”), and then you need to call the library command in each session to load it (“taking the book off the shelf”).

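library(RCurl)
# getURL() returns the contents of the file as a single character string
myfile <- getURL('https://sakai.unc.edu/access/content/group/3d1eb92e-7848-4f55-90c3-7c72a54e7e43/public/data/bycatch.csv', ssl.verifyhost=FALSE, ssl.verifypeer=FALSE)
# parse that string into a data frame
bycatch <- read.csv(text = myfile)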

library(data.table)
# fread() downloads and parses the file in one step
mydat <- fread('http://www.stats.ox.ac.uk/pub/datasets/csb/ch11b.dat')
head(mydat)
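
The install-then-load pattern described above can be sketched as follows. The helper name load_or_install is hypothetical (it is not part of any package), and the sketch assumes a CRAN mirror is available whenever a package actually has to be installed. It is demonstrated here on “tools”, which ships with R, so nothing is downloaded.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # "Buying the book": only needed once per machine
    install.packages(pkg)
  }
  # "Taking the book off the shelf": needed in every new R session
  library(pkg, character.only = TRUE)
}

# "tools" is part of the standard R installation, so this call only loads it.
load_or_install("tools")
```

For the examples above you would call the helper with “RCurl” and “data.table” instead.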

Some packages that we discussed which make use of publicly available data are:

Out of these packages, EcoRetriever is the hardest to install. You must first install the Retriever program from http://www.data-retriever.org/, then install the ecoretriever package. This will allow you to program queries of the data available at data-retriever.org.

An example of using one of these packages, the dataRetrieval package, which is automatically accessed through the ‘EGRET’ package, can be found here: r_for_hydrology_script. This script is from R Working Group contributor Tung Nguyen.

In addition to those packages, there were questions about economic and social science data sources. Here are some packages and resources that I tracked down which have data specific to those fields: