Functional Annotation of ANimal Genomes (FAANG) Project
— A coordinated international action to accelerate Genome to Phenome

Bioinformatics Training Workshop @ PAG2020

Functional Annotations of Animal Genomes Workshop
Notice: Participants who do not meet the prerequisites will start at 2:00 PM on January 8, 2020. Advanced participants, who already have basic knowledge of and experience with the UNIX command line and the R programming language, are expected to start at 8:00 AM on January 9, 2020. To prepare for the workshop, you can teach yourself some basic Unix and R using the resources listed in the appendix.
Day 1: Jan 8, 2020 (2:00 PM - 5:30 PM)
2:00-2:20 Registration and welcome
2:20-3:20 Introduction to UNIX command line (Lecture and hands-on session)
  • Intro to the UNIX operating system (ssh, terminal, file system)
  • Handle paths and files (ls, cp, cd, mkdir, rm, find, touch, mv, ln)
  • View content of files (less, more, head, tail, vi, cat)
  • Search content of files (grep, wildcards)
  • Get help (man)
  • File permissions (chmod)
  • Processes and jobs
  • Resource management (df, du, quota, ...)
  • Other goodies (gzip/unzip, tar, zcat, cut, paste, join, file, diff, history)
3:20-3:30 Break
3:30-5:30 Introduction to R (Lecture and hands-on session)
  • R/RStudio installation and setup
  • Data structure and manipulation (vector, list, array/matrix, data frame/table)
  • Data input and output
  • Get help
  • Flow control (if/else, for, while, repeat, break, next)
  • Functions
  • Intro to object-oriented programming in R (S3/S4 classes, reference classes)
  • R package management
  • Different modes of running R scripts (interactive and batch)
Day 2: Jan 9, 2020 (8:00 AM - 6:30 PM)
8:00-8:15 Registration and welcome
8:15-9:15 Advanced Unix command line (Lecture and hands-on session)
  • Regular expressions, sed, awk, xargs
  • Pipes, redirection, shell scripting
  • Large data management: download, copy, backup (scp, ftp, ascp, rsync)
  • Software installation and management (module, configuration file, ...)
  • HPC job management systems (one or two of SLURM, SGE, PBS, LSF, ...)
9:15-9:25 Break
9:25-10:55 Exploratory data analysis and data visualization in R (Lecture and hands-on session)
  • Descriptive statistics (frequency, correlation, statistical tests, PCA, MDS, hierarchical clustering)
  • R graphics (base, ggplot2): all types of plots, including heatmaps
10:55-11:05 Break
11:05-12:00 Introduction to some very useful R/Bioconductor packages (mainly lecture, with some hands-on work)
  • readr, data.table
  • tidyverse, dplyr, reshape2, tidyr, stringr, lubridate
  • ggvis, plotly, htmlwidgets, googleVis, threejs
  • lme4/nlme, survival, caret
  • shiny, rmarkdown, slidify
  • For more useful packages, see https://awesome-r.com/
12:00 PM-1:00 PM Lunch
1:00-2:00 Introduction to NGS technologies, common types of genomics data, and handling tools (Lecture)
  • Illumina and other sequencing technologies
  • fastq, fasta, BAM, SAM, bed, gtf, gff, bigwig, ...
  • Read QC: FASTQC, MultiQC
  • Short read aligners, SAMtools, picardtools, bedtools
  • Visualization: IGV, UCSC genome browser
2:00-2:10 Break
2:10-3:20 RNA-seq data analysis (I) (Lecture) (REF)
  • Experimental design, RNA-seq data characteristics
  • Read QC (FASTQC, MultiQC)
  • Optional preprocessing: trimming (Trimmomatic/Trim Galore), error correction (Rcorrector)
  • Alignment (STAR, HISAT, GMAP/GSNAP, RSEM) or pseudo-alignment (Kallisto, Salmon)
  • Post-alignment QC (QoRTs)
  • Optional post-alignment processing: multi-mapper assignment (MMR)
  • Count summary (featureCounts)
  • Exploratory analysis (PCA, hierarchical clustering)
  • Differential expression analysis (statistical model selection): DESeq2/edgeR, Voom
  • Gene ontology and pathway analysis
  • Network analysis
3:20-3:30 Break
3:30-5:30 RNA-seq data analysis (II) (Hands-on session)
  • Exercises with toy data for QC and mapping and a real-life count table for DEG analysis; discussion
5:30-6:30 ChIP-seq data analysis (I) (Lecture) (REF)
  • Experimental design, data characteristics
  • Read QC (FASTQC, MultiQC)
  • Optional preprocessing: trimming (Trimmomatic/Trim Galore), error correction (Rcorrector)
  • Alignment (bwa-mem, bowtie/bowtie2)
  • Post-alignment QC (picard tools, deeptools, ChIPQC, SPP)
  • Peak calling (MACS2, MUSIC/BCP)
  • Peak annotation (ChIPSeeker, GREAT)
Day 3: Jan 10, 2020 (8:00 AM - 12:00 PM)
8:00-10:00 ChIP-seq data analysis (II) (Hands-on session)
  • Exercises, discussion
10:00-10:10 Break
10:10-11:00 ATAC-seq data analysis (Lecture, with analysis scripts provided)
  • Experimental design, data characteristics
  • Read QC (FASTQC, MultiQC)
  • Optional preprocessing: trimming (Trimmomatic/Trim Galore), error correction (Rcorrector)
  • Alignment (bwa-mem, bowtie/bowtie2)
  • Post-alignment QC (picard tools, deeptools, ATACseqQC) (REF)
  • Peak calling (MACS2)
  • Peak annotation (ChIPSeeker, GREAT)
11:00-12:00 NGS data and metadata management (Lecture and demo) [Instructor: Peter Harrison @ EBI]
  • FAANG data and metadata submission
  • Querying and downloading legacy NGS data from the ENA/SRA databases
APPENDIX: Resources for preparing yourself to meet the prerequisites

This Bioinformatics Training Workshop was financially supported by USDA NIFA grant 2020-67015-30982, as well as the Swine, Horse, Poultry and Bovine NRSP8 Genome Coordinators.

© 2014-2022 FAANG Consortium. Contact: FAANG@iastate.edu