CEREO Data

Intro to R

This week was our introduction to R lecture. This lecture was a brief overview of what R is and how to use it, with the goal of making the newer R users in our community literate in R syntax. For this session, users needed to have downloaded both R and RStudio.

First we discussed what R is, along with some good resources for learning it. The first resource comes from the Software Carpentry team: https://swcarpentry.github.io/r-novice-gapminder/. The second, from fellow R group member Rachel Olsson, is an introduction for new users that she designed for one of her labs but that applies here as well. Make sure to download both the Lab1 Walkthrough and the Floral_diversity dataset.

The R script for our session (in .txt), along with the notes we added to it in class: IntroRScript
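For anyone who missed the session, here is a short, generic sketch of the kind of basic syntax an intro lecture covers. It is not taken from IntroRScript, and the variable names and numbers are made up for illustration.

```r
# Assignment uses <- ; c() combines values into a vector
counts <- c(4, 8, 15, 16)

# Arithmetic is vectorized: it is applied element by element
doubled <- counts * 2

# Square brackets index from 1, not 0
first_count <- counts[1]

# Functions are called with parentheses; arguments can be named
avg <- mean(counts, na.rm = TRUE)

# A data frame holds named columns of equal length
df <- data.frame(plot = c("A", "B", "C", "D"), count = counts)
df$count              # extract one column with $
df[df$count > 10, ]   # keep only the rows where count exceeds 10
```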

Bringing in Data and Publicly Available Data Packages

This week we discussed how to bring in data, forms of data, good sources for help, and some packages that pull in publicly available data.

First of all, we talked about RStudio (https://www.rstudio.com/). RStudio is a great interface for using R, and it also offers some “point and click” methods for bringing in data. The “Import Dataset” button in the top right pane of the RStudio interface lets you bring in data either from a local file on your computer or from a web URL.

Data can also be brought in through code. A good resource for ways to import specific types of data is this Quick R page: http://www.statmethods.net/input/importingdata.html. The most common file type people work with is the .csv file, which is imported with the read.csv() function. To read an Excel file, you can use the “xlsx” package.
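A minimal, self-contained sketch of read.csv(): it writes a tiny made-up CSV to a temporary file and reads it back, so it runs without any data of your own. In practice you would pass the path to your own file instead, for example read.csv("mydata.csv") (a hypothetical file name).

```r
# Write a small, made-up CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("plot,treatment,count",
             "A,control,12",
             "B,burned,7"), csv_path)

# read.csv() treats the first row as column names (header = TRUE by default)
plots <- read.csv(csv_path)

str(plots)    # check column names and the type R guessed for each column
nrow(plots)   # number of data rows; the header row is not counted
```

Running str() right after an import is a quick way to catch columns that came in as text when you expected numbers.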

There are many ways to read data directly from a website, something the point-and-click method in RStudio can also do. Two common ones use the “RCurl” package or the “data.table” package; example code for both is below. Remember, to use a package you must first install it with install.packages() (“buying the book”) and then load it with the library() command (“taking the book off the shelf”).

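# getURL() downloads the file as one text string; the ssl.* = FALSE options skip certificate checks
library(RCurl)
myfile <- getURL('https://sakai.unc.edu/access/content/group/3d1eb92e-7848-4f55-90c3-7c72a54e7e43/public/data/bycatch.csv', ssl.verifyhost=FALSE, ssl.verifypeer=FALSE)
mydata <- read.csv(text = myfile)  # parse that text into a data frame
head(mydata)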

# fread() reads a delimited file straight from a URL and detects the separator itself
library(data.table)
mydat <- fread('http://www.stats.ox.ac.uk/pub/datasets/csb/ch11b.dat')
head(mydat)
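The “buy the book, then take it off the shelf” pattern can be wrapped in a small helper so a script installs a package only when it is missing. This is a sketch: use_package() is a hypothetical name, not a function from any package, and the CRAN mirror URL used as the default is an assumption you can swap for your preferred mirror.

```r
# Hypothetical helper (not part of any package): install `pkg` only if it is
# missing ("buying the book"), then attach it ("taking the book off the shelf").
use_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  library(pkg, character.only = TRUE)
  invisible(pkg)
}

# "stats" ships with R, so this call only attaches it
use_package("stats")
```

For the examples above you would call use_package("RCurl") and use_package("data.table"). Installing is usually needed only once per computer, while library() must be run in every new R session.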

Some packages we discussed that make use of publicly available data are:

Of these packages, EcoRetriever is the hardest to install. You must first install the Retriever program from http://www.data-retriever.org/ and then install the ecoretriever package. This allows you to write queries in R against the data available at data-retriever.org.

An example that uses one of these packages, dataRetrieval, which is accessed automatically through the EGRET package, can be found here: r_for_hydrology_script. This script is from R Working Group contributor Tung Nguyen.

In addition to those packages, there were questions about economic and social science data sources. Here are some packages and resources I tracked down that have data specific to those fields:

Informational Meeting

Today we discussed the format of the R group and potential topics for the rest of the term.

The group is built around short, 10 to 15 minute lessons on specific topics, given by group participants. Topics of interest this term are:

Mapping
Graphing
Survival Estimation
Writing/creating packages
Social Science specific packages
Data management
Bayesian Statistics

We will cover these and more topics this term.

Next week’s session will be “Data Sourcing and R Studio”. We will cover online sources of data and some cool packages for getting that data, along with a brief intro to RStudio shortcuts and use. If you do not have RStudio, please download it from https://www.rstudio.com/.