Prep Work #3 - MARCC and Troubleshooting R

Learning Objectives

  1. Continue practicing genomics data manipulation using R
    • Review loading and examining a data.frame
    • Find data of interest using boolean vectors
    • Troubleshoot common error messages
  2. Install R packages
  3. Practice using Unix on MARCC
    • Log in using ssh and two-factor authentication
    • Transfer a file using scp
    • Run a program available through module load

Tasks

  1. Subset and wrangle RNA-seq data in R ( preview )
    • Download assignment ( prepwork-may.Rmd )
      • Make sure to save file with extension .Rmd and not .Rmd.txt
    • Save work as prepwork-may-lastname.Rmd and email to your TA
  2. Install R packages
    • Install each package one at a time, then confirm that it loads with library( "___" )
      • i.e. you should not see Error : there is no package called "___"
    • From CRAN e.g. install.packages( "package" )
      • ggplot2, dplyr, gplots, devtools
    • From Bioconductor e.g. biocLite( "package" )
      • annotatr, rtracklayer, Homo.sapiens, bumphunter, RTopper, reactome.db, GenomicRanges
      • Note: Load the biocLite() function once per R session via source( "https://bioconductor.org/biocLite.R" )
    • From GitHub e.g. install_github( "name/repo" )
      • vqv/ggbiplot
      • Note: Load the install_github() function via library( "devtools" )
  3. Set up and test Google Authenticator
  4. Transfer a file from MARCC
    • Instruct your laptop to retrieve test.txt from MARCC and store it on your Desktop in a folder called PG2018 (scroll right to see the entire scp command, which ends in PG2018)
      • Note: On Windows, use md and pscp instead of mkdir and scp
    # On your **laptop**, change to the "Desktop" directory
    cd Desktop
    
    # Make a directory called PG2018 (skip this step if it already exists)
    mkdir PG2018
    
    # Secure copy a file from MARCC to your PG2018 directory
    scp username@jhu.edu@gateway2.marcc.jhu.edu:work/test.txt PG2018
    • View the contents of Desktop/PG2018/test.txt
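One way to confirm that the transfer worked is to check that the file exists and is non-empty before printing it. The sketch below assumes a Mac or Linux terminal. `check_transfer` is a hypothetical helper name chosen for illustration, not a command provided by scp, MARCC, or the course.

```shell
# Hypothetical helper (not part of scp or MARCC): confirm that a
# transferred file exists and is non-empty, then print its contents.
check_transfer() {
  if [ -s "$1" ]; then
    echo "OK: $1"
    cat "$1"
  else
    echo "Missing or empty: $1" >&2
    return 1
  fi
}

# Example, run from the Desktop directory after the scp step:
# check_transfer PG2018/test.txt
```

If the helper reports a missing or empty file, re-run the scp command and check the path after the colon for typos.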
  5. Run a program on MARCC
    # Use your username, which should resemble lastname-temp
    ssh gateway2.marcc.jhu.edu -l username@jhu.edu
    
    # Connect to compute node interactively
    srun -p shared -c 1 --mem 1024 -t 30 --pty bash
    
    # Switch to high performance file system
    cd scratch
    
    # Create personal directory for genome datasets
    mkdir genomes
    cd genomes
    
    # Make a local working copy of chr20 from human hg19
    cp ~/work/genomes/chr20.fa.gz .
    ls -l
    gunzip chr20.fa.gz
    ls -l
    
    # Load program
    module load bowtie2/2.2.5
    
    # Index genome ... ~2 min
    bowtie2-build chr20.fa chr20
    ls -l
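After bowtie2-build finishes, the final `ls -l` should list the new index files next to chr20.fa. As a rough sketch, the loop below checks for the six `.bt2` files that bowtie2-build normally writes for a small genome, using the same `chr20` prefix as the command above. `check_index` is a hypothetical helper name. The suffix list assumes a small index, since large genomes produce `.bt2l` files instead.

```shell
# Hypothetical helper (not part of bowtie2): check that all six
# small-index files exist and are non-empty for a given prefix.
check_index() {
  prefix="$1"
  missing=0
  for suffix in 1 2 3 4 rev.1 rev.2; do
    if [ ! -s "$prefix.$suffix.bt2" ]; then
      echo "Missing: $prefix.$suffix.bt2" >&2
      missing=1
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "Index $prefix looks complete"
  fi
  return "$missing"
}

# Example, run from the genomes directory after bowtie2-build:
# check_index chr20
```

A missing file usually means that indexing was interrupted, for example because the 30-minute interactive session ended, in which case bowtie2-build can simply be run again.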