Welcome to The Carpentries Etherpad!

This pad is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents.

Use of this service is restricted to members of The Carpentries community; this is not for general purpose use (for that, try https://etherpad.wikimedia.org).

Users are expected to follow our code of conduct: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html

All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/

----------------------------------------------------------------------------
Genomics Workshop - University of Georgia
May 18-19, 2022
8:00 am - 4:00 pm EST
Instructors: Alejandro De Santiago, Tiago Pereira, Jason Wallace
Lesson Materials: https://datacarpentry.org/genomics-workshop/
----------------------------------------------------------------------------

URLs - Day 2

TRIMMOMATIC COMMAND
    trimmomatic PE SRR2584863_1.fastq.gz SRR2584863_2.fastq.gz \
        SRR2584863_1.trim.fastq.gz SRR2584863_1un.trim.fastq.gz \
        SRR2584863_2.trim.fastq.gz SRR2584863_2un.trim.fastq.gz \
        SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:NexteraPE-PE.fa:2:40:15

BWA
    bwa mem ref_genome/ecoli_rel606.fasta trimmed_fastq_small/SRR2584863_1.trim.sub.fastq trimmed_fastq_small/SRR2584863_2.trim.sub.fastq > SRR2584863.aligned.sam
    samtools view -S -b results/sam/SRR2584866.aligned.sam > results/bam/SRR2584866.aligned.bam
    https://datacarpentry.org/wrangling-genomics/04-variant_calling/index.html

Metadata - https://raw.githubusercontent.com/datacarpentry/wrangling-genomics/gh-pages/files/Ecoli_metadata_composite.csv

FASTQ Files (full)
* ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR258/004/SRR2589044/SRR2589044_1.fastq.gz
* ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR258/004/SRR2589044/SRR2589044_2.fastq.gz
* ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR258/003/SRR2584863/SRR2584863_1.fastq.gz
* ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR258/003/SRR2584863/SRR2584863_2.fastq.gz
* ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR258/006/SRR2584866/SRR2584866_1.fastq.gz
* ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR258/006/SRR2584866/SRR2584866_2.fastq.gz

Illumina Adaptors: ~/.miniconda3/pkgs/trimmomatic-0.38-0/share/trimmomatic-0.38-0/adapters/NexteraPE-PE.fa

E. coli genome:
* ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/017/985/GCA_000017985.1_ASM1798v1/GCA_000017985.1_ASM1798v1_genomic.fna.gz

FASTQ Files (short)
https://ndownloader.figshare.com/files/14418248

Fancy script: https://raw.githubusercontent.com/datacarpentry/wrangling-genomics/gh-pages/files/run_variant_calling.sh

Integrative Genomics Viewer
https://software.broadinstitute.org/software/igv/download

----------------------------------------------------------------------------
AWS Instances (please write your name next to one of these)

Username: dcuser
password: data4Carp
shell access: ssh dcuser@your-instance

ec2-34-239-169-102.compute-1.amazonaws.com: Jason Wallace
ec2-44-200-40-187.compute-1.amazonaws.com: Alejandro De Santiago
ec2-3-237-41-62.compute-1.amazonaws.com: Tiago Pereira
ec2-34-204-176-247.compute-1.amazonaws.com: Kheeman Kwon
ec2-3-236-114-42.compute-1.amazonaws.com Nolan Kemppinen
ec2-3-239-118-17.compute-1.amazonaws.com Maya Salcedo
ec2-100-24-210-136.compute-1.amazonaws.com Colton Meinecke
ec2-3-238-49-59.compute-1.amazonaws.com Shiva Makaju
ec2-44-192-24-250.compute-1.amazonaws.com Hannah Choi
ec2-3-235-186-82.compute-1.amazonaws.com Yasin Topcu
ec2-34-239-150-198.compute-1.amazonaws.com Rick Field
ec2-44-192-77-15.compute-1.amazonaws.com Summer Blanco
ec2-3-236-107-96.compute-1.amazonaws.com Austin Hart
dcuser@ec2-3-238-182-94.compute-1.amazonaws.com Yun-Ching Wendy Tsai
ec2-3-230-155-100.compute-1.amazonaws.com Qian Feng
dcuser@ec2-3-239-81-92.compute-1.amazonaws.com Shufan Zhang
ec2-3-238-177-184.compute-1.amazonaws.com Karthick Chennakesavan
ec2-3-238-94-0.compute-1.amazonaws.com Kathirvel Maruthai
ec2-44-200-87-188.compute-1.amazonaws.com Francisco de Blas
ec2-35-168-111-93.compute-1.amazonaws.com ec2-44-200-142-167.compute-1.amazonaws.com -> Duplicate ec2-35-168-111-93.compute-1.amazonaws.com -> Duplicate ec2-44-200-142-167.compute-1.amazonaws.com David Forgacs
ec2-34-204-192-30.compute-1.amazonaws.com Susan Ihejirika
ec2-34-238-191-231.compute-1.amazonaws.com Mingyu Wang
dcuser@ec2-3-235-40-86.compute-1.amazonaws.com Ehsan Suez
dcuser@ec2-35-170-77-179.compute-1.amazonaws.com Yin Wang
dcuser@ec2-44-200-27-223.compute-1.amazonaws.com Shuang Yang
ec2-3-227-2-177.compute-1.amazonaws.com Huifang Xu
dcuser@ec2-3-215-186-66.compute-1.amazonaws.com Haifeng Zhang
ec2-34-204-188-98.compute-1.amazonaws.com Yanbing Wang
ec2-3-238-121-54.compute-1.amazonaws.com Samuel Manthi
ec2-3-236-210-110.compute-1.amazonaws.com Brady O'Boyle
ec2-44-197-195-247.compute-1.amazonaws.com Ray Parcon
ec2-44-200-66-77.compute-1.amazonaws.com Francisco de Blas
ec2-3-239-86-196.compute-1.amazonaws.com Scott Clem
ec2-35-153-143-146.compute-1.amazonaws.com
ec2-44-200-187-50.compute-1.amazonaws.com Fiifi_DAdzie
ec2-3-238-248-62.compute-1.amazonaws.com Usha Bhatta
ec2-35-170-75-62.compute-1.amazonaws.com Anne Frances Jarrell

----------------------------------------------------------------------------
Rules for Formatting Your Metadata
- Each sample has its own row
- Clear names
- No spaces in the column names

Submission sheet: https://datacarpentry.org/organization-genomics/files/sample_submission.txt
Sequencing results: https://datacarpentry.org/organization-genomics/files/sequencing_results_metadata.txt

Sample BioProject: PRJNA294072
Sub-project: PRJNA295606
European Nucleotide Archive: https://www.ebi.ac.uk/ena/browser/home
Accession: SRR2589044

######

Tab completion - When typing in the terminal, hit [tab] to auto-complete file/directory names
Recent commands - Press [up arrow] to scroll through recent commands (good for repeating/modifying them)

Shell commands:
[ctrl + c] - Stop the current command (useful if it gets stuck or is taking too long)
clear - Clear your terminal
pwd - Print Working Directory
ls - List files/directories (-F flag appends a type indicator to names, such as / for directories; -l flag gives "long" format with extra info)
man [command] - Show the Manual for [command]
cd - Change Directory
echo - Print text to the terminal
cat - "Concatenate" (but really just prints file contents unless you redirect the output elsewhere)
head - Print the first few lines of a file
tail - Print the last few lines of a file
cp - Copy a file
mkdir - Make a new directory
chmod - Change permissions ("mode") of a file (useful for making things read-only)
rm - Delete (remove) files
grep - Search within a file
[command] > [file] - Redirection; take output of [command] and save it to [file]
>> - Appending; works like redirection, except output is added to the end of the file instead of overwriting it
[command1] | [command2] - Pipe. Output from [command1] is used as input for [command2]
wc - Count the lines, words, and bytes in a file (often used with the "-l" flag for just the number of lines)
history - Show the commands you've run
less - Browse through a file
basename - Get the base name (=no directory structure) of a file
    Can also strip the end of the file name, so "basename backup/SRR09867.fq .fq" returns "SRR09867"
curl - Download a file given a URL; prints to console by default
wget - Download a file given a URL; saves to file by default
scp - Secure copy. Copy a file from a remote location to a local file
    Usage: scp username@server:/path/to/file /local/path/
    Example: scp dcuser@ec2-34-239-169-102.compute-1.amazonaws.com:/home/dcuser/shell_data/untrimmed_fastq/SRR097977.fastq ~/Downloads

#### Post-Lunch Exercises ###

Exercise #1 - Find all commands from your history where you used the 'grep' command and save them to a file
    history | grep 'grep' > grep_command.txt

Exercise #2
* List all of the files in /usr/bin that start with the letter ‘c’.
    Can we do this?
    ls /usr/bin | grep "^c"    -> Works, but simpler answer: ls /usr/bin/c*
* List all of the files in /usr/bin that contain the letter ‘a’.
    ls /usr/bin/*a*
* List all of the files in /usr/bin that end with the letter ‘o’.
    ls /usr/bin/*o

Exercise #3
Starting in the shell_data/untrimmed_fastq/ directory, do the following:
- Make sure that you have deleted your backup directory and all files it contains.
- Create a backup of each of your FASTQ files using cp. (Note: You’ll need to do this individually for each of the two FASTQ files. We haven’t learned yet how to do this with a wildcard.)
- Use a wildcard to move all of your backup files to a new backup directory.
- Change the permissions on all of your backup files to be write-protected.

    mkdir backup
    cp SRR097977.fastq SRR097977_backup.fastq
    cp SRR098026.fastq SRR098026_backup.fastq
    mv *_backup.fastq backup/
    chmod -w backup/*

URLs:
ftp://ftp.ensemblgenomes.org/pub/release-37/bacteria/species_EnsemblBacteria.txt
https://datacarpentry.org/shell-genomics/06-organization/index.html

Is this metadata properly formatted?
Yes xxxxxxxxx No x*XxcpxxxYx

Is the first sequence good quality?
yes Xx xxxxxxxxxxxxxxXxxx no

Is the last sequence good quality?
yes xXxxxxxxxxxxxxxxXxxx No

I have a question. "cat */summary.txt > ~/dc_workshop/docs/fastqc_summaries.txt" works but "cat *summary.txt > ~/dc_workshop/docs/fastqc_summaries.txt" doesn't work. Why so? [nicely explained!]

Which samples fail at least one of the FastQC quality tests? And which tests?
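
A small sketch for the two FastQC questions above (the cat wildcard puzzle and finding failed tests). It builds a throwaway mock layout in a temporary directory. The sample names (sampleA, sampleB) and their PASS/FAIL results are made up for illustration and are not workshop data. It assumes bash-style globbing, where an unmatched pattern is passed to the command as-is.

```shell
# Mock layout (hypothetical names, not real workshop results):
#   sampleA_fastqc/summary.txt
#   sampleB_fastqc/summary.txt
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir sampleA_fastqc sampleB_fastqc

# FastQC writes summary.txt as tab-separated lines: status, test name, file name.
printf 'PASS\tBasic Statistics\tsampleA.fastq\nFAIL\tPer base sequence quality\tsampleA.fastq\n' > sampleA_fastqc/summary.txt
printf 'PASS\tBasic Statistics\tsampleB.fastq\nPASS\tPer base sequence quality\tsampleB.fastq\n' > sampleB_fastqc/summary.txt

# "*/summary.txt": the * matches each directory name and the / steps into it,
# so this finds one summary.txt inside every sample directory.
cat */summary.txt > fastqc_summaries.txt

# "*summary.txt": the * never matches a "/", so the shell only looks in the
# current directory for names ending in "summary.txt". Nothing here matches
# (note that fastqc_summaries.txt ends in "summaries.txt"), so the command fails.
ls *summary.txt || echo "no file in this directory matches *summary.txt"

# Show every failed test; the third column says which sample it belongs to.
grep FAIL fastqc_summaries.txt
```

On the real workshop data, the same grep can be pointed at ~/dc_workshop/docs/fastqc_summaries.txt.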