
Research Profile, N. Potter: Impact of Climate on Agriculture

Today Nicholas discussed his research on the impact of climate on agriculture. Specifically, he explained how he used the "matching" package to construct future values of economic variables for land parcels.

The future predictions for climate variables came from model output Nicholas already had access to. His economic variables were more difficult, as they result not only from the possible crops but also from regional variables. The matching package lets a researcher identify the observations in a data set that are most similar, in n dimensions, to the predicted values. The future economic value for a parcel is then taken to be the past value observed at its matching site.
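The idea can be sketched in a few lines of base R. This is an illustration, not Nicholas's actual script: the column names and values below are invented, and the real analysis uses the matching package rather than this hand-rolled nearest-neighbor search.

```r
# Illustrative sketch of matching: for each parcel's predicted future
# climate, find the historical parcel closest in n-dimensional space and
# carry its observed economic value forward. (Invented data and names.)

# Historical parcels: climate covariates plus an observed economic value
hist_parcels <- data.frame(
  temp   = c(10, 12, 15, 20),
  precip = c(800, 650, 500, 300),
  value  = c(1200, 1100, 900, 400)   # observed economic value per parcel
)

# Predicted future climate for two parcels (no economic value yet)
future <- data.frame(temp = c(14, 19), precip = c(520, 320))

# Standardize both sets with the historical means/sds so the
# dimensions are comparable before measuring distance
mu  <- colMeans(hist_parcels[, c("temp", "precip")])
s   <- apply(hist_parcels[, c("temp", "precip")], 2, sd)
hist_z   <- scale(hist_parcels[, c("temp", "precip")], center = mu, scale = s)
future_z <- scale(future, center = mu, scale = s)

# For each future parcel, index of the nearest historical parcel
# (squared Euclidean distance in the standardized space)
nearest <- apply(future_z, 1, function(f) {
  which.min(colSums((t(hist_z) - f)^2))
})

# The future economic value is borrowed from the matched historical parcel
future$value <- hist_parcels$value[nearest]
future
```

The matching package generalizes this to many covariates and offers other distance metrics (e.g. Mahalanobis), but the core operation is the same nearest-neighbor lookup.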

Some specifics on this matching method, and references, can be found at this site: http://potterzot.com/research/rgroup.html#3

The .csv file and a text version of the R script can be found below:

sample_data

RP_Potter_matching


Troubleshooting: Data importing, database IDs, and Plotly tables

#### Issue 1 ####
# Importing data using the "point and click" method in RStudio leads to a
# different data structure than read.csv() does.
# How do we make the read.csv() version match?

# Input with point and click
str(WeedAbundance)

# Input with the read.csv() command
dat <- read.csv("WeedAbundance.csv")
str(dat)

# Let's make the overall structure match
WA <- data.frame(WeedAbundance)
str(WA)

# Let's make the specific column match in type
WA$Location <- as.factor(WA$Location)
str(WA)
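The same mismatch can also be handled at import time. A small self-contained sketch (the `Location`/`Count` columns here are stand-ins for the real WeedAbundance data): since R 4.0, `read.csv()` leaves character columns as character, and asking for factors up front avoids the `as.factor()` step afterwards.

```r
# Sketch: control column types at import instead of converting afterwards.
# (Temporary file with invented columns standing in for WeedAbundance.csv.)
f <- tempfile(fileext = ".csv")
write.csv(data.frame(Location = c("A", "B"), Count = c(3, 7)),
          f, row.names = FALSE)

dat1 <- read.csv(f)                           # Location stays character (R >= 4.0)
dat2 <- read.csv(f, stringsAsFactors = TRUE)  # Location becomes a factor

str(dat1)
str(dat2)
```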


#### Issue 2 ####

# Two different names, one column.
# We want to consolidate into a single named variable and keep a record of
# the index so that we can still move between databases.

m["Vendor.WSUID"] <- stringr::str_extract(m$Comment, "(?i)VENDOR #\\d+")
m["VID"] <- stringr::str_extract(m$Comment, "(?i)V#\\d+")

d <- c("Vendor #3452345", "V#23245234")
lapply(d, function(i) { strsplit(i, "#")[[1]][[2]] })

# Consolidate: keep whichever ID format a given row has
m["VendorID"] <- ifelse(is.na(m$Vendor.WSUID), m$VID, m$Vendor.WSUID)
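For completeness, the same extraction can be done without stringr. This is a base-R sketch with invented comment strings, not the session's actual data: one regex strips both prefixes down to the bare numeric ID so the two formats collapse into a single join key.

```r
# Base-R sketch: reduce both "Vendor #" and "V#" formats to the bare
# numeric ID so records can be joined across the two databases.
# (Invented example strings; non-matching rows would come back unchanged.)
comments <- c("Paid Vendor #3452345 in June", "invoice V#23245234 pending")

ids <- sub(".*?(?:Vendor #|V#)(\\d+).*", "\\1", comments,
           ignore.case = TRUE, perl = TRUE)
ids
```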


#### Issue 3####

#Indexing tables output from Plotly.

#This issue was addressed after our session by contributor Julia Piaskowski, who solved the error: https://cereo.wsu.edu/2017/09/21/indexing-tables-plotly-output/


Indexing Tables From Plotly output

Submitted by our working group member Julia Piaskowski. We discussed this a bit in Troubleshooting, but it's relevant to anyone using Plotly. Thanks, Julia!

Plotly is an enormously handy set of interactive plotting functions developed for R, Python, and D3.js. One very useful feature of plotly is "event_data", where users can click on or select parts of the plot for more information or to create new plots from the selected output. There are several types of event_data, but I am focusing on "plotly_selected", where a user selects several points using a lasso or box selection tool.

I am using plotly in R/Shiny to create an interactive plot. The idea seemed simple: generate a bivariate scatterplot colored by a factor in the data set. Users should be able to select a region of interest, and the full data from those observations will be presented in a table. However, the indexing was not working with the usual mydata[rows, columns] method; it worked for some observations but not others.

The problem was that the data were being unstacked by the grouping variable used to color the plot, essentially creating a separate row-by-column object for each level of the grouping variable.

The solution was slightly complicated, requiring both subsetting the object AND the plotly event_data. A small example is shown below. A bigger demo is at: https://jpiaskowski.shinyapps.io/cherry_gebv_xplorr/

library(shiny)
library(plotly)

# Make data
set.seed(123)
mydata <- data.frame(Name = replicate(10, paste(sample(LETTERS, 3), collapse = "")),
                     a = 1:10, b = sample(1:100, 10), c = rep(c("up", "down"), 5),
                     d = paste0("secret_data", 1:10))

# Generate the Display (ultra simple in this example)

ui <- fluidPage(
  fluidRow(
    column(6, tags$h3("Basic Plot"), plotlyOutput("Plot1", height = "300px")),
    column(6, tags$h3("Selected Output"), tableOutput("clickTable"))
  ))

# server info
server <- function(input, output){

  output$Plot1 <- renderPlotly({
    plot_ly(mydata, x = ~a, y = ~b, color = ~c, source = "subset",
            type = "scatter", mode = "markers") %>%
      layout(dragmode = "select")
  })

  output$clickTable <- renderTable({

    event.data.select <- event_data("plotly_selected", source = "subset")

    if (is.null(event.data.select)) {
      return("Choose individuals by selecting a region in the scatterplot")
    } else {

      # Get the index from each group.
      # This is where the indexing was failing previously. The script must
      # reference both the group variable (the "curveNumber") AND the
      # observation (the "pointNumber"). Adding to the confusion, plotly goes
      # against R convention and starts its indexing at zero: hence the "0"
      # for the "up" group, and why a value of 1 is added to each "pointNumber".

      up.group <- subset(mydata, c == "up")[subset(event.data.select, curveNumber == 0)$pointNumber + 1, ]
      down.group <- subset(mydata, c == "down")[subset(event.data.select, curveNumber == 1)$pointNumber + 1, ]

      # Combine into a single table and return it
      table.subset <- rbind(up.group, down.group)
      table.subset
    }

  })
}

shinyApp(ui = ui, server = server)

Word Clouds

Today Stephanie Labou spoke about text mining and word clouds. Our script today used the “tm” and “wordcloud” packages.

Some questions came up at our session, primarily: can you mine PDFs? Yes, you can! You can find more information about reading text from PDFs here and here, and here is the online book about text mining in R (including sentiment analysis and n-grams).
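Before the plotting step, the tm pipeline boils the text down to a table of word frequencies. A package-free sketch of that core step (the example sentences are invented): the resulting `freq` table is exactly the kind of input that `wordcloud::wordcloud(names(freq), freq)` draws.

```r
# Minimal, package-free sketch of the preprocessing behind a word cloud:
# lowercase, strip punctuation, drop stopwords, count word frequencies.
# (Invented example text; tm does this more robustly on a real corpus.)
text <- c("Data skills matter.", "Text mining turns text into data!")

words <- unlist(strsplit(tolower(gsub("[[:punct:]]", "", text)), "\\s+"))
stopwords <- c("the", "a", "into", "and")
words <- words[!words %in% stopwords & nchar(words) > 0]

freq <- sort(table(words), decreasing = TRUE)
freq
```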

Here is some example text to run with the script: data_skills_ms , sourced from Stephanie Labou’s paper

Here is the script: WordCloudScript!

Intro to R

This week was our introduction to R lecture. This lecture was a brief overview of what R is and how to use it, with the goal of making the newer R users in our community literate in R syntax. For this session users needed to have downloaded both R and RStudio.

First we discussed what R is, along with some great resources that people can use to learn R. The first resource comes from the Software Carpentry team: https://swcarpentry.github.io/r-novice-gapminder/. The second is from one of our fellow R group members, Rachel Olsson, and is an introduction for new users designed for one of her labs, but applicable here. Make sure to download both the Lab1 Walkthrough and the Floral_diversity dataset.

The R script for our session (in .txt), along with the notes we added to it in class: IntroRScript

Package Intro: Multivariate Time Series – MAR models

This week Dr. Steve Katz will discuss multivariate time series analysis using the MARSS package. There is some supplementary material for this talk:

packages needed: MAR1 and MARSS

An example of using MAR1 and MARSS on ecological data: R demo supplement 20130305

The package user guide to help orient you with the MARSS package

https://cran.r-project.org/web/packages/MARSS/vignettes/UserGuide.pdf

Research Profile: GLMM and Predictions

Tomorrow PhD candidate Zoe Hanley will discuss generalized linear mixed models (GLMMs) in R and making prediction maps for wolf distribution. Necessary packages are:

library(glmmADMB) #Generalized Linear Mixed Modeling (GLMMs). Includes zero-inflated distributions.
#Use download instructions from: http://glmmadmb.r-forge.r-project.org/
library(graphics) #temporal autocorrelation graphs
library(lattice) #PACK vs. YEAR graphs
library(bbmle) #AIC table
library(plyr) #create cross-validation progress bar

The data and script can be found below:

RGroup_MAP

RGroup_GLMM

HanleyRProfile


Packrat Package – managing package versions

This week CEREO's Stephanie Labou introduced us to the packrat package. Packrat is a relatively new package that assists collaboration and code functionality by maintaining and standardizing the package versions used in a project. Depending on their level of experience, R users may not have run into this issue before, but it is a persistent problem in the R ecosystem. Due to the dynamic and open nature of the software, changes and improvements to packages can tweak the way certain functions interact, making old code buggy or obsolete. Packrat is an attempt to control for this.

Packrat, in essence, creates a large zip file with all of the libraries and settings used for a project. Users then send this entire file to their collaborators and collaborators load packages and libraries from that zip file. This ensures that the versions of packages used are the same across all collaborators. Within packrat, each folder is essentially its own project, with its own packages – packrat folders are created within the working directory when the creation command is called.

The first step in using packrat is to create, or "bundle," your libraries, as shown in the script below. In addition, the script uses the "::" syntax to call commands. The double colon specifies exactly which package's command is being used. Some packages have commands with the same name, and whichever package is loaded last masks the identically named commands from the others. This is why people's scripts sometimes include notes about the loading order of packages.
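The masking problem is easy to demonstrate with a clash that exists in base R itself: both stats and dplyr export a function called `filter()`. The sketch below sticks to base packages so it runs anywhere; the point is that the `::` prefix pins down the intended function regardless of load order.

```r
# Why "::" matters: stats::filter() and dplyr::filter() share a name.
# If dplyr is loaded after stats, a bare filter() call runs dplyr's
# version; the "::" prefix always calls the one you mean.
x <- c(1, 2, 3, 4, 5)

# stats::filter() here is a centered 3-point moving average,
# not a row filter: NA at the ends, means in the middle.
ma <- stats::filter(x, rep(1/3, 3), sides = 2)
ma
```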

PackratBundling

Once the packrat package has bundled the libraries for a project, you can then send the entire file to a collaborator. To re-create this on your own computer open a brand new R session and then follow the script below, which will unbundle the packrat file created in the above script:

PackratUnbundling

Once you are working within a packrat session there are some useful commands to know. One is sessionInfo(), which shows which version of R and of each loaded package you are running. There is also a way to install older versions of packages, which is useful if you want to create a new packrat project but realize your current packages are too new. Information on how to do that can be found here.

Additionally, the scripts provided by Stephanie do an excellent job of annotating, or commenting, on the code. This is especially important when working with collaborators, but is also important when working solo as it makes it easier to troubleshoot issues. Good annotations can help users determine if issues are code issues, are package related (and can therefore be addressed with packrat), or are (rarely) issues with versions of R. R version errors are harder to fix, and are not addressed by the packrat package. But! As Dr. Katz said during this session: “there is a long conversation to be had about strategies in programming for another time.”

Enjoy packrat!

PCA and Atmospheric Research

Today Tsengel Nergui showed us how she used principal component analysis (PCA) in her atmospheric research. The script and data provided are an excellent example of a PCA application. Tsengel discusses not only the interpretation of the results but also some of the standardization one can do prior to PCA.

In the discussion portion of the session we talked about how a conceptual understanding of PCA can be broken into two philosophies: calculating the eigenvalues or focusing on the dissimilarity matrix. Both lead to the same place, but some researchers may find one or the other strategy more compelling. PCA, and indeed other multivariate approaches in R, are very clearly explained in Manly's Multivariate Statistical Methods: A Primer. The 4th edition has a website that includes example data and script for R. Another good resource is the R package vegan.
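The two views meet in base R's `prcomp()`. A small sketch with invented atmospheric-sounding variables (not Tsengel's data): after standardizing, the squared component standard deviations from `prcomp()` are exactly the eigenvalues of the correlation matrix, so the "eigenvalue" and "decomposition of the (dis)similarity structure" philosophies give the same numbers.

```r
# Sketch tying the two PCA views together (invented data, not the
# session's): with scale. = TRUE, prcomp() works from the correlation
# matrix, and its sdev^2 are that matrix's eigenvalues.
set.seed(42)
dat <- data.frame(o3 = rnorm(50), temp = rnorm(50), wind = rnorm(50))

pca <- prcomp(dat, center = TRUE, scale. = TRUE)  # standardize first

eig <- eigen(cor(dat))$values   # eigen-decomposition view
all.equal(pca$sdev^2, eig)      # same quantities, two routes

summary(pca)  # proportion of variance explained per component
```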

In addition to discussing PCA, we also discussed loading JPEGs in R. This is very simple to do with the jpeg package.
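For those without the jpeg package installed, here is a package-free sketch of the drawing half: `jpeg::readJPEG()` returns a height x width x 3 array of values in [0, 1], and base graphics' `rasterImage()` will draw any such array, so a synthetic gradient stands in for a real photo below.

```r
# Package-free sketch of displaying an image in R. jpeg::readJPEG()
# would return an array like this one; rasterImage() (base graphics)
# draws it onto an existing plot.
img <- array(0, dim = c(50, 50, 3))
img[, , 1] <- matrix(seq(0, 1, length.out = 50), 50, 50)               # red ramp
img[, , 3] <- matrix(seq(1, 0, length.out = 50), 50, 50, byrow = TRUE) # blue ramp

out <- tempfile(fileext = ".pdf")
pdf(out)
plot(0:1, 0:1, type = "n", axes = FALSE, xlab = "", ylab = "")
rasterImage(img, 0, 0, 1, 1)   # draw the array across the plot region
dev.off()
```

Swapping in a real file is one line: `img <- jpeg::readJPEG("photo.jpg")`.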

This talk will require the following packages:

library(stats)
library(plyr)    # plyr must be called before dplyr
library(dplyr)
library(ggplot2)
library(jpeg)    # readJPEG()
# rasterImage() comes with base graphics

Necessary script and data below:

Rsession_MixedBag2_tsengel

BEL116_hourly_O3_met_2012Summer