Skip to main content Skip to navigation
CEREO Computing

Packrat Package – managing package versions

This week CEREO’s Stephanie Labou introduced us to the packrat package. Packrat is a relatively new package that assists collaboration and functionality of code by maintain and standardizing package versions used in a project. Depending on the level of experience, R users may not have ran into this issue before but it is a persistent problem with the R system. Due to the dynamic and open nature of the software, changes and improvements to packages can tweek the way that certain functions interact, making old code buggy or obsolete. Packrat is an attempt to control for this.

Packrat, in essence, creates a large zip file with all of the libraries and settings used for a project. Users then send this entire file to their collaborators and collaborators load packages and libraries from that zip file. This ensures that the versions of packages used are the same across all collaborators. Within packrat, each folder is essentially its own project, with its own packages – packrat folders are created within the working directory when the creation command is called.

The first step in using packrat is to create, or “bundle” your libraries. This is shown in the script below. In addition, the script below uses the “::” syntax to call commands. The double colon symbol is a way to specify exactly which packages commands are being used. This is because some packages have commands with the same name – whichever package is loaded last will overwrite the identically named commands from the other one. This is why people’s script may sometimes have notes about the loading order of packages.

PackratBundling

Once the packrat package has bundled the libraries for a project, you can then send the entire file to a collaborator. To re-create this on your own computer open a brand new R session and then follow the script below, which will unbundle the packrat file created in the above script:

PackratUnbundling

Once you are working within a packrat session there are some useful commands to know. One is sessionInfo() which shows what versions of things you have loaded. There is also a way to install older versions of packages – this is useful if you want to create a new packrat project but you realize your current packages are too new. Information on how to do that can be found here.

Additionally, the scripts provided by Stephanie do an excellent job of annotating, or commenting, on the code. This is especially important when working with collaborators, but is also important when working solo as it makes it easier to troubleshoot issues. Good annotations can help users determine if issues are code issues, are package related (and can therefore be addressed with packrat), or are (rarely) issues with versions of R. R version errors are harder to fix, and are not addressed by the packrat package. But! As Dr. Katz said during this session: “there is a long conversation to be had about strategies in programming for another time.”

Enjoy packrat!

High Performance Computing and R – WSU’s Kamiak Cluster

This week we had a guest speaker, Jeff White from IT, who discussed accessing the Kamiak High Performance Computer on campus (slides can be found here). We also discussed creating .csv files and getting that data into R.

Kamiak is a computer may be accessed by any student with an approved access. Access can be set up by contacting CIRC, the Center for Institutional Research Computing, which runs Kamiak through their Service Desk. You will need to make an account first and your adviser or project PI will need to vouch for you.

Kamiak is a large computer, or “cluster” of smaller computers which work in tandem. Kamiak is a Linux system – what that means functionally is you access it through what is called the “secure shell”, or ssh. This is an interface which communicates with the computer remotely, so you load it up on your personal computer and then can run programs and software on the Kamiak computer. It is not a point and click system, but rather one that is done by coding, in this case Linux. Information on how to install or open ssh software onto your own computer can be found here: https://hpc.wsu.edu/users-guide/terminal-ssh/.

Once you have the ssh running, and an active Kamiak account, you log into the computer using your WSU credentials. There are a vast number of commands you can use to communicate with the computer – here is a good resource for learning Linux in general, which goes over both the “secure shell” and how to write scripts to run programs: http://linuxcommand.org/.  From Jeff’s lecture there were a number of quick commands that he used which I have summarized below and on our Resources page.

On Kamiak, the primary way of navigating files and “jobs” (programs the computer is running) is through using a scheduling software called “slurm”. The following commands all have an “s” in the front because they refer to slurm specific commands – they are not generic linux commands, though in many cases those work too. For more information see the entire Training PDF.

sinfo #shows what CPUs are available to use
sbatch #creates job
scontrol #shows jobs
scancel #cancels jobs. Example, to cancel job humner 345: scancel 345
sq #shows all of your running or pending jobs. 

#Other commands
idev #opens up an interactive interface to run programs without an writing .sh script and submitting it to the computer 
cat slurm -Job #looks at a specific job number. Example: cat slurm -345

In general, Kamiak and Linux systems work where you write a “script”, basically a set of commands for the computer to do on its own, and then you submit that script to the computer and look at the results after. These script files are .sh files and can be written in a number of different programs, called text editors. A basic one that is relatively simple, and which can be edited and created in Kamiak through the “vim” command, is vim. Once you have written the instructions into your .sh file, you move the file, and any associated data, to Kamiak and you tell Kamiak to run it. Kamiak will run it as commanded and then the output will be saved where ever you have directed it to save. A great example of running a file, and of a simple Kamiak .sh script, can be found on the Kamiak website here.

To move files to and from  Kamiak there are a few different ways. For Mac or Linux users it can be done relatively easily as there are built in programs that let you transfer files. For Windows users a great program to use is WinScp. This program lets you use it either through the command line (aka the code) or through a point and click interface. All of these programs work where you first connect to Kamiak from your computer, then move files, then disconnect.

Here is an example of creating a .csv file in R, then moving it to Kamiak, on a Windows computer. Mac and Linux users will have similar experiences.

Creating the File

Connecting to Kamiak to transfer the file using WinScp

Connecting to Kamiak, note the name of Kamiak and the port number.

Moving the file

Using R on Kamiak

When using R on Kamiak it is important to create a default space for packages to install to on your own home directory. Our own Tung Nguyen has created one for us that is on the Kamiak website: https://hpc.wsu.edu/r/