Learning Objectives
- Continue practicing genomics data manipulation using R
- Review loading and examining a data.frame
- Find data of interest using boolean vectors
- Troubleshoot common error messages
- Install R packages
- Practice using Unix on MARCC
- Log in using ssh and two-factor authentication
- Transfer a file using scp
- Run a program available through `module load`
Tasks
- Subset and wrangle RNA-seq data in R (preview)
  - Download the assignment (`prepwork-may.Rmd`)
    - Make sure to save the file with the extension `.Rmd`, not `.Rmd.txt`
    - Save your work as `prepwork-may-lastname.Rmd` and email it to your TA
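If your browser saved the download as `prepwork-may.Rmd.txt`, the name can also be fixed from a terminal. This is a minimal sketch, assuming a Unix-like shell (macOS or Linux). `fix_rmd_name` is a hypothetical helper, not part of the course materials, and `smith` stands in for your last name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the course materials): rename a downloaded
# assignment to the submission name, dropping a stray ".txt" if one was added.
# Usage: fix_rmd_name <downloaded-file> <lastname>
fix_rmd_name() {
  local file="$1" lastname="$2"
  local base="${file%.txt}"   # "prepwork-may.Rmd.txt" -> "prepwork-may.Rmd"
  case "$base" in
    *.Rmd) ;;                 # extension is correct once ".txt" is stripped
    *) echo "Not an .Rmd file: $file" >&2; return 1 ;;
  esac
  # "prepwork-may.Rmd" + "smith" -> "prepwork-may-smith.Rmd"
  local target="${base%.Rmd}-${lastname}.Rmd"
  mv -- "$file" "$target" && printf '%s\n' "$target"
}
```

For example, `fix_rmd_name prepwork-may.Rmd.txt smith` renames the file and prints `prepwork-may-smith.Rmd`.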
- Install R packages
  - Install each package one at a time, testing that it loads with `library( "___" )`
    - i.e., `library()` should not report `there is no package called "___"`
    - From CRAN, e.g. `install.packages( "package" )`
      - ggplot2, dplyr, gplots, devtools
    - From Bioconductor, e.g. `biocLite( "package" )`
      - annotatr, rtracklayer, Homo.sapiens, bumphunter, RTopper, reactome.db, GenomicRanges
      - Note: Load the `biocLite()` function once per R session via `source( "https://bioconductor.org/biocLite.R" )`
      - Note: Bioconductor 3.8 and later replace `biocLite()` with `BiocManager::install( "package" )`
    - From GitHub, e.g. `install_github( "name/repo" )`
      - vqv/ggbiplot
      - Note: Load the `install_github()` function via `library( "devtools" )`
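The same install-then-test routine can be scripted from a terminal with `Rscript`. This is a minimal sketch for the CRAN packages, assuming `Rscript` is on your `PATH` and using the `https://cloud.r-project.org` mirror (an assumption; any CRAN mirror should work). The function names are hypothetical. Because `install.packages()` only warns when an install fails, each package is confirmed with `library()`, whose error makes `Rscript` exit with a non-zero status.

```shell
#!/usr/bin/env bash
# Sketch: install CRAN packages one at a time, confirming each one loads before
# moving on. Function names are hypothetical; the CRAN mirror is an assumption.
CRAN_MIRROR="https://cloud.r-project.org"

# Succeeds only if R can load the package (a library() error -> non-zero exit).
check_pkg() {
  Rscript -e "library(\"$1\")" >/dev/null 2>&1
}

# Install one package, then test it with library().
install_and_check() {
  Rscript -e "install.packages(\"$1\", repos = \"$CRAN_MIRROR\")" &&
    check_pkg "$1"
}

# Install packages in order, stopping at the first one that fails to load.
install_cran_list() {
  local pkg
  for pkg in "$@"; do
    if ! install_and_check "$pkg"; then
      echo "Problem installing or loading: $pkg" >&2
      return 1
    fi
    echo "OK: $pkg"
  done
}
```

For example, `install_cran_list ggplot2 dplyr gplots devtools` installs the CRAN packages above and stops at the first one that does not load, so you can troubleshoot it before continuing.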
- Set up and test Google Authenticator
- Transfer a file from MARCC
  - Instruct your laptop to retrieve `test.txt` from MARCC and store it on your Desktop in a folder called `PG2018`
    - Note: On Windows, use `md` and `pscp` instead of `mkdir` and `scp`
      # On your laptop, change to the "Desktop" directory
      cd Desktop
      # Make a directory called PG2018 if it doesn't already exist
      mkdir -p PG2018
      # Secure copy a file from MARCC to your PG2018 directory
      scp username@jhu.edu@gateway2.marcc.jhu.edu:work/test.txt PG2018
    - View the contents of `Desktop/PG2018/test.txt`
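To check the transfer from a terminal rather than by opening the file, here is a minimal sketch. The `verify_transfer` helper is hypothetical, and the usage below assumes you run it from `Desktop`, the directory that contains `PG2018`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that a copied file arrived and is non-empty,
# then print its contents. Returns non-zero if the file is missing or empty.
verify_transfer() {
  local f="$1"
  if [ -s "$f" ]; then
    cat -- "$f"
  else
    echo "Missing or empty: $f (re-run the scp command?)" >&2
    return 1
  fi
}
```

For example, `verify_transfer PG2018/test.txt` prints the file if the copy succeeded and reports an error otherwise.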
- Run a program on MARCC
  - Log into MARCC, connect to a compute node, and build a genome index using Bowtie2
    # Log in to MARCC; use your username, which should resemble lastname-temp
    ssh gateway2.marcc.jhu.edu -l username@jhu.edu
    # Connect to a compute node interactively
    # (shared partition, 1 CPU core, 1024 MB memory, 30-minute time limit)
    srun -p shared -c 1 --mem 1024 -t 30 --pty bash
    # Switch to the high-performance scratch file system
    cd scratch
    # Create a personal directory for genome datasets
    mkdir genomes
    cd genomes
    # Make a local working copy of chr20 from human hg19
    cp ~/work/genomes/chr20.fa.gz .
    ls -l
    # Decompress the FASTA file
    gunzip chr20.fa.gz
    ls -l
    # Load the Bowtie2 program
    module load bowtie2/2.2.5
    # Build the index with the prefix "chr20" (takes ~2 min)
    bowtie2-build chr20.fa chr20
    ls -l
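`bowtie2-build` writes its index as six files that share the prefix given as its last argument (here `chr20`). This is a minimal sketch for checking that all six were written. It assumes the small-index `.bt2` suffixes used for a genome of this size; very large genomes get `.bt2l` files instead, which this sketch does not check. `check_bt2_index` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that bowtie2-build wrote all six small-index
# files (.1.bt2 ... .rev.2.bt2) for a prefix, and that none of them is empty.
check_bt2_index() {
  local prefix="$1" missing=0 ext
  for ext in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
    if [ ! -s "${prefix}.${ext}" ]; then
      echo "Missing or empty: ${prefix}.${ext}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run `check_bt2_index chr20 && echo "Index looks complete"` in the `genomes` directory after `bowtie2-build` finishes.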