The CEREO Newsletter for May 2015 is now available; you can view it here.
Today Tsengel Nergui showed us how she used Principal Component Analysis (PCA) in her atmospheric research. The script and data provided show an excellent example of PCA application. Tsengel discusses not only the interpretation of the results, but also some of the standardization that one can do prior to PCA.
In the discussion portion of the session we talked about how a conceptual understanding of PCA can be broken into two philosophies: calculating the eigenvalues or focusing on the dissimilarity matrix. Both lead to the same place, but some researchers may find one or the other strategy more compelling. PCA, and indeed other multivariate approaches in R, are very clearly explained in Manly’s Multivariate Statistical Methods: A Primer. The 4th edition has a website that includes example data and scripts for R. Another good resource is the R package vegan.
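To make the two views concrete, here is a minimal sketch in base R. It uses the built-in USArrests data set as a stand-in (it is not Tsengel's data or script), and it assumes that the "dissimilarity matrix" view can be read as principal coordinates analysis on Euclidean distances, which is one common interpretation.

```r
# Built-in example data (a stand-in, not the data from the session)
data(USArrests)

# Standardize each variable to mean 0 and sd 1, then run PCA
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# View 1: eigenvalues of the correlation matrix
eig <- eigen(cor(USArrests))

# The component variances from prcomp() should match the eigenvalues
all.equal(pca$sdev^2, eig$values)

# View 2: principal coordinates from a Euclidean dissimilarity matrix
# (assumed reading of the "dissimilarity matrix" philosophy)
pco <- cmdscale(dist(scale(USArrests)), k = 2)

# Scores should agree with the first two PCs, up to the sign of each axis
all.equal(abs(unname(pco)), abs(unname(pca$x[, 1:2])))

summary(pca)    # proportion of variance explained by each component
pca$rotation    # loadings of each variable on each component
```

Setting scale. = TRUE is the standardization step; without it, variables measured on larger scales dominate the first components.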
In addition to discussing PCA, we also discussed loading JPEGs in R. This is very simple to do with the jpeg package.
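As a rough illustration, the sketch below reads a JPEG and draws it with rasterImage(). It assumes the jpeg package is installed, and it writes a small synthetic image to a temporary file so that the example does not depend on any particular picture; in practice you would point readJPEG() at your own file.

```r
library(jpeg)  # assumes the jpeg package is installed

# Create a small synthetic grayscale image as a stand-in for a real photo
path <- tempfile(fileext = ".jpg")
gradient <- matrix(seq(0, 1, length.out = 100 * 100), nrow = 100)
writeJPEG(gradient, target = path, quality = 0.9)

# Read the JPEG back in; a grayscale image comes back as a numeric matrix
img <- readJPEG(path)
dim(img)

# Set up an empty plot and draw the image into the unit square
plot(c(0, 1), c(0, 1), type = "n", xlab = "", ylab = "", asp = 1)
rasterImage(img, xleft = 0, ybottom = 0, xright = 1, ytop = 1)
```

rasterImage() is part of base graphics, so only readJPEG() needs the extra package.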
This talk will require the following packages:
library(stats)
library(plyr) # plyr must be called before dplyr
library(dplyr)
library(ggplot2)
library(jpeg) # for reading JPEGs; rasterImage() is part of base graphics
Necessary script and data below:
There is a new Water Club at WSU. It gives undergraduate and graduate students interested in water-related sciences a hands-on opportunity to experience each other’s research, share in new opportunities, and socialize in a relaxed environment.
The club will largely be run by graduate and undergraduate students, meaning that you are in control of its mission, function, vitality and visibility. Potential activities may include, but are not limited to, student-run research projects, opportunities to get involved in and acquainted with water-related studies across the sciences and humanities, professional development, social activities, and volunteer opportunities. Above all, this will be a group run by students and for students. No prior background in water-related study is required!
Caribbean Ecosystem Field Studies
* Study, snorkel & SCUBA dive on the Caribbean coral reef of Mexico *
May 21 – June 10 or June 14 – July 4
Colorado Ecosystem Field Studies
* Study, camp, & hike in the Rocky Mountains of Colorado *
June 18 – July 8 or July 15 – August 4
For all course information visit the website: EcoFS.org
This week we had a guest speaker, Jeff White from IT, who discussed accessing the Kamiak High Performance Computer on campus (slides can be found here). We also discussed creating .csv files and getting that data into R.
Kamiak may be accessed by any student with approved access. Access can be set up by contacting the Service Desk of CIRC, the Center for Institutional Research Computing, which runs Kamiak. You will need to make an account first, and your adviser or project PI will need to vouch for you.
Kamiak is a large computer, or “cluster” of smaller computers which work in tandem. Kamiak is a Linux system. Functionally, that means you access it through what is called the “secure shell”, or ssh. This is an interface that communicates with the computer remotely: you open it on your personal computer and can then run programs and software on Kamiak. It is not a point-and-click system; instead, you work by typing commands at the Linux command line. Information on how to install or open ssh software on your own computer can be found here: https://hpc.wsu.edu/users-guide/terminal-ssh/.
Once you have ssh running and an active Kamiak account, you log into the computer using your WSU credentials. There are a vast number of commands you can use to communicate with the computer. Here is a good resource for learning Linux in general, which goes over both the shell and how to write scripts to run programs: http://linuxcommand.org/. Jeff used a number of quick commands in his lecture, which I have summarized below and on our Resources page.
On Kamiak, the primary way of managing “jobs” (programs the computer is running) is through a scheduling software called “slurm”. The following commands all start with an “s” because they are slurm-specific commands. They are not generic Linux commands, though in many cases those work too. For more information see the entire Training PDF.
sinfo #shows what nodes (CPUs) are available to use
sbatch #submits a job script
scontrol #shows job details. Example: scontrol show job 345
scancel #cancels jobs. Example, to cancel job number 345: scancel 345
sq #shows all of your running or pending jobs
#Other commands
idev #opens up an interactive session to run programs without writing a .sh script and submitting it to the computer
cat slurm-JOBID.out #prints the output file of a specific job (slurm's default file name). Example: cat slurm-345.out
In general, on Kamiak and other Linux systems you write a “script”, basically a set of commands for the computer to carry out on its own, then submit that script to the computer and look at the results afterward. These script files are .sh files and can be written in a number of different programs, called text editors. A basic one that is relatively simple, and that lets you create and edit files directly on Kamiak, is vim (opened with the “vim” command). Once you have written the instructions into your .sh file, you move the file, and any associated data, to Kamiak and tell Kamiak to run it. Kamiak will run it as commanded, and the output will be saved wherever you have directed it. A great example of running a file, and of a simple Kamiak .sh script, can be found on the Kamiak website here.
There are a few different ways to move files to and from Kamiak. For Mac or Linux users it is relatively easy, as there are built-in programs that let you transfer files. For Windows users a great program to use is WinSCP, which can be used either through the command line (aka the code) or through a point-and-click interface. All of these programs work the same way: you first connect to Kamiak from your computer, then move files, then disconnect.
Here is an example of creating a .csv file in R, then moving it to Kamiak, on a Windows computer. Mac and Linux users will have similar experiences.
Creating the File
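As a minimal sketch of this step, the code below builds a small data frame and writes it to a .csv file. The column names, values, and file name are made up for illustration, and the file is written to R's temporary directory; in practice you would choose a folder you can easily find from WinSCP.

```r
# Hypothetical example data: three sampling sites and a measurement
samples <- data.frame(
  site = c("A", "B", "C"),
  discharge_cms = c(1.2, 3.4, 2.8)
)

# Write the data frame to a .csv file without the row-number column
csv_path <- file.path(tempdir(), "samples.csv")
write.csv(samples, csv_path, row.names = FALSE)

# Read the file back in to confirm it was written as expected
check <- read.csv(csv_path)
print(check)
```

Setting row.names = FALSE keeps R from adding an unnamed first column of row numbers, which tends to cause confusion when the file is read back in on another machine.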
Connecting to Kamiak to transfer the file using WinSCP
Connecting to Kamiak, note the name of Kamiak and the port number.
Moving the file
Using R on Kamiak
When using R on Kamiak it is important to create a default space in your own home directory for packages to install to. Our own Tung Nguyen has created one for us that is on the Kamiak website: https://hpc.wsu.edu/r/
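The general idea can be sketched in base R as follows. This is not necessarily the approach described on the Kamiak page; the folder name is hypothetical, and a temporary directory is used here so the example runs anywhere. On Kamiak you would substitute a folder inside your home directory.

```r
# Hypothetical library folder; on Kamiak, use a folder in your home directory
lib_dir <- file.path(tempdir(), "R_libs")
dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)

# Put the personal folder first in R's library search path
.libPaths(c(lib_dir, .libPaths()))

# The first entry is where install.packages() installs by default
.libPaths()[1]

# Packages can then be installed into that folder, for example:
# install.packages("vegan", lib = lib_dir)
```

The change made by .libPaths() lasts only for the current session, so it needs to be repeated (or set in a startup file) each time R is started.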
Nitrogen: At the Nexus Between Food Security and Sustainability is a virtual symposium that will be held on March 8-9, 2017 for 3 hours each day.
The symposium will start at 8AM Pacific Time US
Abstracts deadline: Feb 6, 2017
June 25-28, 2017 – Tysons Corner, VA. Sheraton Tysons Hotel
The theme of this year’s meeting is Climate Change Solutions: Collaborative Science, Policy and Planning for Sustainable Water Management. Responding to climate change is complicated by the scale, complexity and inherent uncertainty of the problem.
Today in our troubleshooting session we addressed exporting high-resolution figures using RStudio. Because RStudio does not allow for a resolution increase, we use code that relies on built-in R functionality, and it works just as well for those who use native R rather than RStudio. The script does not require any packages, as it uses base R functions, but it does require the working directory to be the desired destination folder; otherwise you’ll have to search your hard drive for the file! The script for that is here: ExportingHighResolutionFigures
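For readers without the linked script, here is a minimal sketch of one base R way to do this, using the png() graphics device. It is not the script from the session; the file name, figure size, and the built-in cars data are placeholders chosen for illustration.

```r
# The file is written to the current working directory
out_file <- "figure_300dpi.png"

# Open a PNG device: 6 x 4 inches at 300 dots per inch
png(out_file, width = 6, height = 4, units = "in", res = 300)

# Draw the figure (built-in cars data used as a placeholder)
plot(cars, main = "Stopping distance vs. speed")

# Close the device; the file is not complete until this is called
dev.off()

# Show the full path so the file is easy to find
normalizePath(out_file)
```

The res argument controls the resolution, and giving the size in inches keeps text and symbols at a sensible scale as the resolution goes up.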
We also discussed memory issues in R. R is not very streamlined as far as memory use, so there are a few tricks we can use to help it be more efficient. The first is the ls() function, which lists the products or objects currently stored in your R session. The more products you have, the more memory you are using. The ls() command shows the same products that are easily viewed in RStudio in the Environment panel, which sits at the top right in the default layout.
If you would like to remove any products from the environment you can use the rm() command. Place the name of the object you would like to remove within the parentheses and it will be deleted from the environment.
An additional tool for reclaiming memory is the gc() command. “gc” stands for “garbage collection” and, while it doesn’t delete any products, it frees memory that was associated with deleted or altered products.
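The three commands can be tried together in a short base R session like the one sketched below. The object names are made up for illustration.

```r
# Create one large and one small object (hypothetical names)
big <- numeric(1e6)   # one million numbers, roughly 8 MB
small <- 1:10

# List the objects currently stored in the session
ls()

# Check how much memory the large object takes up
print(object.size(big), units = "MB")

# Remove the large object from the environment
rm(big)
exists("big")

# Ask R to release memory that is no longer in use;
# gc() also returns a summary table of current memory use
gc()
```

Several objects can be removed at once by separating their names with commas, for example rm(a, b).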
Lastly, we discussed a persistent issue at WSU with accessing data from external sources. The current workaround, if you are using a Windows OS, is to tell R which internet connection method to use. R’s default method is currently not working for some data retrieval, so it is necessary to have R use Internet Explorer’s internet functions and settings instead (WSU’s security allows Explorer to get data). The code to do this is setInternet2(TRUE).