Prep Work #3 - MARCC and Troubleshooting R - Practical Genomics Workshop

Learning Objectives

Continue practicing genomics data manipulation using R
- Review loading and examining a data.frame
- Find data of interest using boolean vectors
- Troubleshoot common error messages
Install R packages
Practice using Unix on MARCC
- Log in using ssh and two-factor authentication
- Transfer a file using scp
- Run a program available through module load

Tasks

Subset and wrangle RNA-seq data in R ( preview )
- Download assignment ( prepwork-may.Rmd )
  - Make sure to save file with extension .Rmd and not .Rmd.txt
- Save work as prepwork-may-lastname.Rmd and email to your TA
Install R packages
- Install each package one at a time, testing that everything is ok with library( "___" )
  - i.e. No Error : there is no package called "___"
- From CRAN e.g. install.packages( "package" )
  - ggplot2, dplyr, gplots, devtools
- From Bioconductor e.g. biocLite( "package" )
  - annotatr, rtracklayer, Homo.sapiens, bumphunter, RTopper, reactome.db, GenomicRanges
  - Note: Load the biocLite() function once per R session via source( "https://bioconductor.org/biocLite.R" )
- From GitHub e.g. install_github( "name/repo" )
  - vqv/ggbiplot
  - Note: Load the install_github() function via library( "devtools" )
Setup and test Google Authenticator
Transfer a file from MARCC
- Instruct your laptop to retrieve text.txt from MARCC and store in on your Desktop in a folder called PG2018 (scroll right to see entire scp command … ends in PG2018)
  - Note: Windows use md and pscp instead of mkdir and scp
```
# On your **laptop**, change to the "Desktop" directory
cd Desktop

# Make a directory called PG2018 if it doesn't already exist
mkdir PG2018

# Secure copy a file from MARCC to your PG2018 directory
scp username@jhu.edu@gateway2.marcc.jhu.edu:work/test.txt PG2018
```
- View the contents of Desktop/PG2018/test.txt

Run a program on MARCC

Log into MARCC, connect to a compute node, and build a genome index using Bowtie2

# Use your username, which should resemble lastname-temp
ssh gateway2.marcc.jhu.edu -l username@jhu.edu

# Connect to compute node interactively
srun -p shared -c 1 --mem 1024 -t 30 --pty bash

# Switch to high performance file system
cd scratch

# Create personal directory for genome datasets
mkdir genomes
cd genomes

# Make a local working copy of chr20 from human hg19
cp ~/work/genomes/chr20.fa.gz .
ls -l
gunzip chr20.fa.gz
ls -l

# Load program
module load bowtie2/2.2.5

# Index genome ... ~2 min
bowtie2-build chr20.fa chr20
ls -l