CEREO February 2017

PCA and Atmospheric Research

Today Tsengel Nergui showed us how she used Principal Component Analysis (PCA) in her atmospheric research. The script and data provided show an excellent example of a PCA application. Tsengel discussed not only the interpretation of the results, but also some of the standardization that one can do prior to PCA.

In the discussion portion of the session we talked about how a conceptual understanding of PCA can be approached from two philosophies: calculating the eigenvalues or focusing on the dissimilarity matrix. Both lead to the same place, but some researchers may find one strategy more compelling than the other. PCA, and indeed other multivariate approaches in R, are very clearly explained in Manly’s Multivariate Statistical Methods: A Primer. The 4th edition has a website that includes example data and scripts for R. Another good resource is the R package vegan.
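The two views can be compared directly in base R. The sketch below is not Tsengel's script: it uses the built-in USArrests data as a stand-in, assumes the variables are standardized first, and takes Euclidean distance as the dissimilarity measure. Under those assumptions the eigenvalue route and the dissimilarity route should give the same component variances and the same scores, up to the sign of each axis.

```r
# Stand-in data: USArrests ships with R (it is not the ozone data from the session).
X <- scale(USArrests)   # center each column and scale it to unit variance

# View 1: eigen-decomposition of the correlation matrix.
eig <- eigen(cor(USArrests))

# prcomp() arrives at the same components through a singular value decomposition.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# The variance of each principal component equals the matching eigenvalue.
same_variances <- isTRUE(all.equal(pca$sdev^2, eig$values))

# View 2: classical scaling (principal coordinates) of a Euclidean distance matrix.
mds <- cmdscale(dist(X), k = 2)

# The coordinates match the first two PCA scores, apart from possible sign flips.
same_scores <- isTRUE(all.equal(abs(unname(pca$x[, 1:2])), abs(unname(mds))))

print(c(same_variances = same_variances, same_scores = same_scores))
```

With a non-Euclidean dissimilarity (for example the ecological distances offered in vegan) the second route no longer reproduces PCA, which is one reason some researchers prefer to think in terms of the dissimilarity matrix.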

In addition to discussing PCA, we also discussed loading JPEG images in R. This is very simple to do with the jpeg package.
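A minimal sketch of loading and drawing a JPEG follows. It assumes the jpeg package is installed and uses the small example image that ships with that package as a stand-in for your own file.

```r
library(jpeg)   # install.packages("jpeg") if it is not installed yet

# Example image bundled with the jpeg package; replace with the path to your own file.
img_path <- system.file("img", "Rlogo.jpg", package = "jpeg")

# readJPEG() returns a numeric array (height x width x channels) with values in [0, 1].
img <- readJPEG(img_path)
print(dim(img))

# Set up an empty plot whose unit square has the same shape as the image,
# then draw the image with the base graphics function rasterImage().
plot(c(0, 1), c(0, 1), type = "n", xlab = "", ylab = "", axes = FALSE,
     asp = nrow(img) / ncol(img))
rasterImage(img, xleft = 0, ybottom = 0, xright = 1, ytop = 1)
```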

This talk will require the following packages:

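library(stats)
library(plyr)    # plyr must be loaded before dplyr
library(dplyr)
library(ggplot2)
library(jpeg)    # readJPEG() reads a JPEG file into an array
# rasterImage() draws the image; it is a base graphics function, so no package is needed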

Necessary script and data below:

Rsession_MixedBag2_tsengel

BEL116_hourly_O3_met_2012Summer

 

 

 

 

 

WSU Student Water Club

There is a new Water Club at WSU for undergraduate and graduate students interested in water-related sciences. It offers a hands-on opportunity to experience each other’s research, share in new opportunities, and socialize in a relaxed environment.

The club will largely be run by graduate and undergraduate students, meaning that you are in control of its mission, function, vitality, and visibility. Potential activities include, but are not limited to, student-run research projects, opportunities to get involved in and acquainted with water-related studies across the sciences and humanities, professional development, social activities, and volunteer opportunities. Above all, this will be a group run by students and for students. No prior background in water-related study is required!

If you have any further questions, please do not hesitate to email Michael Meyer (michael.f.meyer@wsu.edu) or Julie Padowski (julie.padowski@wsu.edu).

 

 

2017 EcoFS Summer Course Flyer

Caribbean Ecosystem Field Studies
* Study, snorkel & SCUBA dive on the Caribbean coral reef of Mexico *
    May 21 – June 10  or  June 14 – July 4
Colorado Ecosystem Field Studies
* Study, camp, & hike in the Rocky Mountains of Colorado *
    June 18 – July 8  or  July 15 – August 4
For all course information visit the website: EcoFS.org

 

 

High Performance Computing and R – WSU’s Kamiak Cluster

This week we had a guest speaker, Jeff White from IT, who discussed accessing the Kamiak High Performance Computer on campus (slides can be found here). We also discussed creating .csv files and getting that data into R.

Kamiak may be accessed by any student with approved access. Access can be set up by contacting the Center for Institutional Research Computing (CIRC), which runs Kamiak, through its Service Desk. You will need to make an account first, and your adviser or project PI will need to vouch for you.

Kamiak is a large computer, or “cluster”, of smaller computers that work in tandem. Kamiak is a Linux system. Functionally, that means you access it through what is called the “secure shell”, or ssh. This is an interface that communicates with the computer remotely: you open it on your personal computer and can then run programs and software on Kamiak. It is not a point-and-click system, but rather one that is driven by typed commands, in this case Linux shell commands. Information on how to install or open ssh software on your own computer can be found here: https://hpc.wsu.edu/users-guide/terminal-ssh/.

Once you have ssh running and an active Kamiak account, you log into the computer using your WSU credentials. There are a vast number of commands you can use to communicate with the computer. Here is a good resource for learning Linux in general, which goes over both the shell and how to write scripts to run programs: http://linuxcommand.org/. Jeff used a number of quick commands in his lecture, which I have summarized below and on our Resources page.

On Kamiak, the primary way of submitting and managing “jobs” (programs the computer is running) is through scheduling software called “slurm”. The following commands all start with an “s” because they are slurm-specific commands. They are not generic Linux commands, though in many cases those work too. For more information see the entire Training PDF.

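sinfo #shows the partitions and nodes and whether they are available to use
sbatch #submits a job script (.sh file) to the scheduler
scontrol #views or changes job details. Example: scontrol show job 345
scancel #cancels jobs. Example, to cancel job number 345: scancel 345
sq #shows all of your running or pending jobs (the standard slurm command for this is squeue -u <username>)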

#Other commands
idev #opens an interactive session for running programs without writing a .sh script and submitting it to the computer
cat slurm-<job number>.out #prints the output file of a specific job, assuming slurm's default output file name. Example: cat slurm-345.out

In general, Kamiak and other Linux systems work like this: you write a “script”, basically a set of commands for the computer to carry out on its own, submit that script to the computer, and look at the results afterward. These script files are .sh files and can be written in a number of different programs, called text editors. A basic one that is relatively simple, and that can be used to create and edit files directly on Kamiak through the “vim” command, is vim. Once you have written the instructions into your .sh file, you move the file, and any associated data, to Kamiak and tell Kamiak to run it. Kamiak will run it as commanded and save the output wherever you have directed it. A great example of running a file, and of a simple Kamiak .sh script, can be found on the Kamiak website here.
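For R users, the program that the .sh script launches is usually an R script run non-interactively, for example with Rscript. The sketch below shows what such an R script might look like. The file name analysis.R, the OUT_DIR variable, and the placeholder computation are all made up for illustration; SLURM_JOB_ID is the environment variable slurm sets inside a running job.

```r
# analysis.R (hypothetical name): meant to be started from a job script, e.g.
#   Rscript analysis.R
# rather than run interactively.

# slurm sets SLURM_JOB_ID inside a job; fall back to "local" when testing elsewhere.
job_id <- Sys.getenv("SLURM_JOB_ID", unset = "local")

# Placeholder computation standing in for the real analysis.
set.seed(1)
result <- data.frame(x = 1:10, y = cumsum(rnorm(10)))

# Save results to a file, since nobody is watching the console during a batch job.
# OUT_DIR is a made-up variable for this sketch; a temporary folder is the fallback.
out_dir  <- Sys.getenv("OUT_DIR", unset = tempdir())
out_file <- file.path(out_dir, paste0("result_", job_id, ".csv"))
write.csv(result, out_file, row.names = FALSE)

# Anything printed here ends up in the job's output file.
cat("Wrote", nrow(result), "rows to", out_file, "\n")
```

Tagging the output file with the job number keeps results from repeated runs from overwriting each other.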

There are a few different ways to move files to and from Kamiak. For Mac or Linux users it can be done relatively easily, as there are built-in programs (such as scp and sftp) that let you transfer files. For Windows users a great program to use is WinSCP. You can use it either through the command line or through a point-and-click interface. All of these programs work the same way: you first connect to Kamiak from your computer, then move files, then disconnect.

Here is an example of creating a .csv file in R, then moving it to Kamiak, on a Windows computer. Mac and Linux users will have similar experiences.

Creating the File
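A minimal sketch of this step, using a small made-up data frame and a temporary folder. The column names and values are purely illustrative. Reading the file back with read.csv() is also how the data gets into R again once the file is on Kamiak.

```r
# Hypothetical example data; any data frame is written the same way.
obs <- data.frame(
  site  = c("A", "B", "C"),
  ozone = c(31.5, 28.2, 40.1),
  stringsAsFactors = FALSE
)

# A bare file name such as "ozone_example.csv" would be written to the working
# directory (see getwd()); a temporary folder is used here instead.
csv_file <- file.path(tempdir(), "ozone_example.csv")
write.csv(obs, csv_file, row.names = FALSE)   # row.names = FALSE avoids an extra index column

# Read the file back in to check that it round-trips.
obs2 <- read.csv(csv_file, stringsAsFactors = FALSE)
str(obs2)
```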

Connecting to Kamiak to transfer the file using WinSCP

Connecting to Kamiak. Note the name of Kamiak and the port number.

Moving the file

Using R on Kamiak

When using R on Kamiak it is important to create a default location in your own home directory for packages to install to. Our own Tung Nguyen has written up how to do this, and it is on the Kamiak website: https://hpc.wsu.edu/r/
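The general idea can be sketched in a few lines of base R. This is not the procedure from the Kamiak guide: the helper name use_personal_library and the default folder ~/R/library are assumptions made for illustration, so follow the guide for the recommended location.

```r
# Create a personal package library (if needed) and put it first on R's search path.
# The default folder is an assumption for this sketch, not Kamiak's recommendation.
use_personal_library <- function(lib = file.path(path.expand("~"), "R", "library")) {
  if (!dir.exists(lib)) {
    dir.create(lib, recursive = TRUE)
  }
  # install.packages() installs into the first entry of .libPaths() by default.
  .libPaths(c(lib, .libPaths()))
  invisible(lib)
}

# Usage, with a temporary folder so the example leaves nothing behind:
lib <- use_personal_library(file.path(tempdir(), "Rlib"))
print(.libPaths()[1])
```

Because the change made by .libPaths() lasts only for the current session, the call would need to be repeated at the top of each script or placed in a startup file.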

 

 

2017 Summer Specialty Conference

Abstracts deadline: Feb 6, 2017

June 25-28, 2017, Tysons Corner, VA, Sheraton Tysons Hotel

The theme of this year’s meeting is Climate Change Solutions: Collaborative Science, Policy and Planning for Sustainable Water Management. Responding to climate change is complicated by the scale, complexity and inherent uncertainty of the problem.

For more information

High Resolution Graphics, Memory Issues, and WSU security

Today in our troubleshooting session we addressed exporting high resolution figures from RStudio. Because RStudio's export options do not allow for a resolution increase, we use code that relies on built-in R functionality, which also works great for those who just use native R rather than RStudio. The script does not require any packages, as it uses base R functions, but it does require the working directory to be the desired destination folder. Otherwise you’ll have to search your hard drive for the file! The script for that is here: ExportingHighResolutionFigures
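The linked script is not reproduced here. As a rough sketch of the usual base R approach, one can open a file-based graphics device with an explicit resolution, draw the plot, and close the device. The file name, figure size, and 300 dpi setting below are illustrative choices, and the built-in pressure data set is only a stand-in.

```r
# A bare file name would be written to the working directory (see getwd());
# a temporary folder is used here so the example leaves nothing behind.
out_file <- file.path(tempdir(), "figure_300dpi.png")

# Open a PNG device: size in inches, resolution in pixels per inch.
png(filename = out_file, width = 6, height = 4, units = "in", res = 300)

# Everything plotted from here on is drawn into the file, not the plot window.
plot(pressure$temperature, pressure$pressure,
     xlab = "Temperature", ylab = "Pressure", type = "b")

# Close the device; the file is not complete until this is called.
dev.off()
```

Other base devices such as tiff() and jpeg() take the same width, height, units, and res arguments.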

We also discussed memory issues in R. R is not very streamlined as far as memory use, so there are a few tricks we can use to help it be more efficient. The first is the ls() function. This lists the products, or objects, you currently have stored within your R session. The more objects you have, and the larger they are, the more memory you are using. The ls() command shows the same objects that are easily viewed in RStudio in the Environment panel (top right in the default layout).

If you would like to remove any objects from the environment you can use the rm() command. Place the name of the object you would like to remove within the parentheses and it will be deleted from the environment.

An additional tool for reclaiming memory is the gc() command. “gc” stands for “garbage collector” and, while it doesn’t delete any objects, it frees memory that is still associated with deleted or altered objects.
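The three commands can be tried together in a short session. The object name big and its size are arbitrary choices for illustration.

```r
# Create a deliberately large object: one million random numbers, roughly 8 MB.
big <- matrix(rnorm(1e6), ncol = 100)
print(object.size(big), units = "MB")

# List the objects currently stored in the session.
print(ls())

# Remove the object by name; it disappears from the environment.
rm(big)

# Run the garbage collector so the memory the object used can be released.
invisible(gc())

# Confirm that the object is gone.
print(exists("big"))
```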

Lastly, we discussed a persistent issue at WSU with accessing data from external sources. The current workaround, if you are using a Windows OS, is to tell R which internet connection method to use. R’s default method is currently not working for some data retrieval, so it is necessary to have R use Internet Explorer’s internet functions instead (WSU’s security allows Explorer to get data). The code to do this is setInternet2(TRUE).