Welcome to The Carpentries Etherpad!

This pad is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents.

Use of this service is restricted to members of The Carpentries community; this is not for general purpose use (for that, try https://etherpad.wikimedia.org).

Users are expected to follow our code of conduct: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html

All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/


 ----------------------------------------------------------------------------
Participants
Endre Sebestyén - Semmelweis University
  1. Lisa Tietze - PhD candidate, NTNU (analyze own data set)
  2. Eirini Tsirvouli - PhD @ NTNU (analyze own data, maybe apply to be trained as a Data Carpentry trainer)
  3. Iva Pitelkova, Head Engineer at Tromsø Museum, UiT (genome skimming on plant chloroplasts)
  4. May Khider - PhD candidate @ NTNU dept. Biotechnology (would like to be more comfortable with programming software and apply knowledge from workshop to my own project)
  5. Hannah Schweitzer - UiT The Arctic University of Norway
  6. Jonathan Bramsiepe UiO - Plant RNAseq and DNAseq datasets 
  7. Katrine Bjerkan - UiO- Analysis of own data and future work- both plant RNAseq and DNAseq 
  8. Guy Hindley - PhD Candidate, UiO - NORMENT - Psychiatric genetics department. Feel more comfortable handling genomic data (psych GWAS sumstats predominantly)
  9. Abel Gizaw - Postdoc at UiO
  10. Abush Zinaw- Visiting PhD student at UiO
  11. Didac Vidal Pineiro - Researcher at UiO
  12. amit sharma, Researcher, NTNU
  13. Bisa: Researcher at FBA, Nord University, Bodø
  14. Alexandra Jonsson - PhD at UiO. Single cell transcriptomics in cod
  15. Nur : researcher at Norwegian Institute of Public Health 
  16. Prabin, NMBU
  17. Juline, Nord
  18. Mingyi, UiO, genome methylation analysis
Data Carpentry Genomics Workshop

https://datacarpentry.org/genomics-workshop/setup.html

Part 1: Project Organization and Management for Genomics
https://datacarpentry.org/organization-genomics/

Sequencing experiments done: ++++++_++
Sent RNA with sample info to seq-facilities and got back fastq files; some basic bioinformatic analysis was done at the seq-facility
Plant DNA for barcoding: we needed to include the plant taxon ID and the concentration of the samples together with our samples; we now do in-house sequencing of aDNA amplicon libraries and also genome skimming on plant chloroplasts
I have sent DNA fragments and plasmids for sequencing, never genomes
I have done 16S/18S targeted sequencing in house on our own MiSeq, and I have sent out metagenomics sequencing to a NovaSeq. For sending out metagenomics you need concentrated DNA; I have analyzed all metagenomes in house. I have never done eukaryotic genomes.
Metadata: how samples were prepared for sequencing and when, which conditions samples were grown in (if applicable) or where they were taken from/isolated, what type of data (RNA/DNA, which organism)

Planning for NGS Projects
https://datacarpentry.org/organization-genomics/02-project-planning/index.html

Formatting problems in spreadsheets and how to deal with them
https://datacarpentry.org/spreadsheet-ecology-lesson/02-common-mistakes/index.html

A Quick Guide to Organizing Computational Biology Projects
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000424

A bigger collection of papers, and tutorials on how to start with bioinformatics projects, what to learn, etc
https://github.com/esebesty/bioinf_starter_pack

NCBI SRA
https://www.ncbi.nlm.nih.gov/sra/

EBI ENA
https://www.ebi.ac.uk/ena/browser/home

Introduction to the Command Line for Genomics

Instructor
Luca Di Stasio

Helpers
Tadeu
Kari 
Ali

Participants
  1. Eirini Tsirvouli
  2. Abel Gizaw
  3. Lisa Tietze
  4. Jonathan
  5. Abush Zinaw
  6. Fleur
  7. Katrine Bjerkan
  8. Bisa
  9. amit
  10. Hannah
  11. May
  12. Prabin
  13. Nur
  14. Guy
  15. Iva

Windows +++++++++
Mac++
Linux +

Both windows and linux (two computers) OK+
mac+

How often do you use the shell?
never used +++++
once a year+++++
once a month+
once a week +

every day++
never

Amazon instances:

INSTRUCTIONS        
substitute the links below for the ec2.... links in the example        
        
username: dcuser        
pw: data4Carp        
            
shell access        
$ ssh dcuser@ec2-12-345-678-90.compute-1.amazonaws.com        
        
RStudio server        
visit ec2-12-345-678-90.compute-1.amazonaws.com:8787 in your browser, use same username/pw        
        
Same user/pw for all instances. 

Learner Instances
ec2-3-237-39-84.compute-1.amazonaws.com - Eirini Tsirvouli
ec2-18-210-23-240.compute-1.amazonaws.com - Abel Gizaw
ec2-35-175-223-241.compute-1.amazonaws.com - Lisa Tietze
ec2-35-172-203-120.compute-1.amazonaws.com - Jonathan
ec2-3-236-114-246.compute-1.amazonaws.com - Abush Zinaw
ec2-3-226-122-33.compute-1.amazonaws.com - Fleur
ec2-3-236-56-11.compute-1.amazonaws.com - Katrine Bjerkan
ec2-18-204-56-16.compute-1.amazonaws.com - Bisa
ec2-3-236-118-202.compute-1.amazonaws.com - amit
ec2-3-227-211-96.compute-1.amazonaws.com - Hannah
ec2-3-236-85-209.compute-1.amazonaws.com - May
ec2-3-236-85-109.compute-1.amazonaws.com - Prabin
ec2-3-237-13-115.compute-1.amazonaws.com - Nur
ec2-35-175-120-232.compute-1.amazonaws.com - Guy
ec2-3-226-244-206.compute-1.amazonaws.com - Adrian
ec2-3-235-75-241.compute-1.amazonaws.com - Alexandra
ec2-3-238-112-102.compute-1.amazonaws.com - Iva
ec2-35-172-230-97.compute-1.amazonaws.com
ec2-3-238-124-61.compute-1.amazonaws.com
ec2-3-235-174-82.compute-1.amazonaws.com - Benedicte
ec2-3-85-241-133.compute-1.amazonaws.com - Elisa
ec2-34-229-132-87.compute-1.amazonaws.com
ec2-3-238-84-202.compute-1.amazonaws.com
ec2-3-236-252-88.compute-1.amazonaws.com
ec2-18-206-16-63.compute-1.amazonaws.com - Mingyi
ec2-3-215-22-202.compute-1.amazonaws.com
ec2-3-236-249-145.compute-1.amazonaws.com - Ali
ec2-100-26-176-28.compute-1.amazonaws.com - Endre
ec2-3-236-45-0.compute-1.amazonaws.com - Kari
ec2-3-234-208-96.compute-1.amazonaws.com - Tadeu


/home/dcuser/shell_data/sra_metadata
we want to go to dcuser
how can we do that?

 ..

cd ../..
cd .. && cd .. (run cd .. twice; note the space between cd and ..)

cd /home/dcuser

$ cd ..
$ pwd
/home/dcuser/shell_data
$ cd ..
$ pwd
/home/dcuser
$
cd ../../


Guy$ pwd
/home/dcuser/shell_data
Guy$ cd /home/dcuser
Guy$ pwd
/home/dcuser

what's the difference between > and >> ? 
ls --> myFolderContent.txt
ls -l --> myFolderContent.txt

> creates the file (or overwrites it if it already exists) and writes the list into it, while >> appends the new content to the end of the file.
 I am not sure I see the difference
ls -l
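
A small sketch that may make the difference between > and >> easier to see. The file name demo.txt and the temporary folder are made up for this illustration and are not part of the workshop data:

```shell
# Work in a throwaway folder so nothing in your home folder is touched.
workdir=$(mktemp -d)
cd "$workdir"

echo "first"  > demo.txt    # > creates demo.txt and writes "first" into it
echo "second" > demo.txt    # > again: the old content is replaced by "second"
echo "third"  >> demo.txt   # >> appends: "second" stays, "third" is added below

cat demo.txt
```

After these three commands demo.txt holds two lines, second and third; "first" is gone because the second > overwrote it.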

Feedback
What went well
What should be improved


Introduction to the Command Line for Genomics


Instructor
Luca Di Stasio

Helpers
Tadeu
Kari 
Ali

Participants
  1. Eirini 
  2. Benedicte Garmann-Johnsen
  3. Jonathan
  4. Katrine
  5. May
  6. Bisa
  7. Lisa
  8. Fleur
  9. Abel Gizaw
  10. Guy
  11. Alexandra
  12. Abush
  13. amit
  14. Hannah
  15. Iva
  16. Adrian

mv -  what does it do?
rename firstScript.sh to test.sh
mv firstScript.sh test.sh
rename, or move a file into a different directory... mv firstScript.sh test.sh
mv myfirstscript.sh test.sh
mv myfirstScript.sh test.sh
$ mv ~/scripts/myfirstScript.sh ~/scripts/test.sh - move and change the name of the file

$ mv myfirstScript.sh test.sh
$ ls
firstExample.txt   myFolderContent.txt  mytext.text
firstExample.txtĈ  myHistory.txt        test.sh
mv is used to move a file or folder to a different location, or to rename it.
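
A short sketch of the two uses of mv, run in a throwaway folder. The folder name archive is made up for this illustration:

```shell
# Set up a throwaway folder with one empty script file.
workdir=$(mktemp -d)
cd "$workdir"
mkdir scripts archive
touch scripts/myfirstScript.sh

# Use 1: rename. Same folder, new name.
mv scripts/myfirstScript.sh scripts/test.sh
ls scripts

# Use 2: move. New folder, same name (the trailing / marks a folder).
mv scripts/test.sh archive/
ls archive
```

Unlike cp, mv leaves no copy behind: after the second command the scripts folder is empty.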

create a folder named "scriptsBackup" in dcuser
copy (cp) test.sh into scriptsBackup
$ mkdir scriptsBackup
$ cp test.sh scriptsBackup
mkdir ~/scriptsBackup and cp test.sh ~/scriptsBackup/
mkdir scriptsBackup
cp test.sh scriptsBackup
mkdir scriptsBackup to create the scriptsBackup directory, then cp scripts/test.sh scriptsBackup
mkdir ScriptsBackup
cp ~/scripts/test.sh ~/ScriptsBackup/

$ cd ..
$ pwd
/home/dcuser
$ ls
R  r_data  scripts  shell_data
$ mkdir scriptsBackup
$ ls
$ cp ~/scripts/test.sh ~/scriptsBackup
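
The backup exercise can be sketched in a throwaway folder as follows. This is only an illustration: the temporary folder stands in for /home/dcuser, and the content of test.sh is made up:

```shell
# Throwaway stand-in for the home folder, with a small script to back up.
workdir=$(mktemp -d)
cd "$workdir"
mkdir scripts
echo 'echo "hello"' > scripts/test.sh

# -p makes mkdir quiet if the folder already exists, so the line can be re-run.
mkdir -p scriptsBackup

# cp leaves the original in place and puts a copy into the backup folder.
cp scripts/test.sh scriptsBackup/

ls scripts scriptsBackup
```

Both folders should now list test.sh.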
for each file in ../shell_data/untrimmed_fastq/ read the first 2 lines and save them to a file called seq_info.txt
for filename in *.fastq; do head -n 2 ${filename} >> seq_info.txt; done

$ for filename in *.fastq
> do
> head -n 2 ${filename} >> seq_info.txt
> done
$ cat seq_info.txt

for filename in *.fastq; do head -n 2 ${filename}; done > seq_info.txt

for filename in *.fastq; do head -n 2 ${filename} >> seq_info.txt; done; ls
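
The loop answers differ mainly in where the redirection sits. A sketch of the two working placements, using two made-up toy files (sampleA.fastq and sampleB.fastq, not the workshop data) in a throwaway folder:

```shell
# Two tiny made-up FASTQ files in a throwaway folder.
workdir=$(mktemp -d)
cd "$workdir"
printf '@read1\nACGT\n+\nIIII\n' > sampleA.fastq
printf '@read2\nTTGA\n+\nIIII\n' > sampleB.fastq

# Placement 1: >> inside the loop appends on every pass.
# Running this loop a second time would add the same lines again.
for filename in *.fastq
do
    head -n 2 "${filename}" >> seq_info_append.txt
done

# Placement 2: > after "done" collects the output of the whole loop once.
# Running it again simply rewrites the file.
for filename in *.fastq
do
    head -n 2 "${filename}"
done > seq_info_once.txt

cat seq_info_once.txt
```

A single > inside the loop would not work for this task, because each pass would overwrite what the previous pass wrote.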

create a script "dataAnalysis.sh" in the folder scripts
for each file in ../shell_data/untrimmed_fastq/ read the first 2 lines and save them to a file called seq_info.txt, the file seq_info.txt is in the same ../shell_data/untrimmed_fastq/ folder
nano dataAnalysis.sh (for filename in *.fastq; do head -n 2 ${filename} >> seq_info.txt; done)
chmod +x dataAnalysis.sh
./dataAnalysis.sh
cd ~/shell_data/untrimmed_fastq
for filename in *.fastq
do
head -n 2 ${filename} >> seq_info.txt
done


/home/dcuser/shell_data/untrimmed_fastq
$ for filename in *.fastq; do head -n 2 $filename >> seq_info.text; done

$ cat seq_info.text
@SRR097977.1 209DTAAXX_Lenski2_1_7:8:3:710:178 length=36
TATTCTGCCATAATGAAATTCGCCACTTGTTAGTGT
@SRR098026.1 HWUSI-EAS1599_1:2:1:0:968 length=35
NNNNNNNNNNNNNNNNCNNNNNNNNNNNNNNNNNN
$ ls
seq_info.text  SRR097977.fastq  SRR098026.fastq

/home/dcuser
$ cd scripts
$ ls
dataAnalysis.sh   firstExample.txtĈ    myHistory.txt  scriptsBackup
firstExample.txt  myFolderContent.txt  mytext.text    test.sh

How can I now move the seq_info.text to dataAnalysis.sh???

$ nano  dataAnalysis.sh
$ mv dataAnalysis.sh ~/scripts

nano dataAnalysis.sh (for filename in *.fastq; do head -n 2 ${filename} >> seq_info.txt; done)
mv ~/scripts/dataAnalysis.sh ~/shell_data/untrimmed_fastq/

nano dataAnalysis.sh (for filename in ~/shell_data/untrimmed_fastq/*.fastq; do head -n 2 ${filename} >> ~/shell_data/untrimmed_fastq/seq_info.txt; done)

seq_info.txt should be saved in the folder ~/results
write a single line of code that creates the folder ~/results, executes the script and verifies the result of the script

mkdir ~/results && bash ~/scripts/dataAnalysis.sh && cat ~/results/seq_info.txt

nano ~/scripts/dataAnalysis.sh (for filename in ~/shell_data/untrimmed_fastq/*.fastq; do head -n 2 ${filename} >>~/results/seq_info.txt; done)
mkdir ~/results && ~/scripts/dataAnalysis.sh && cat ~/results/seq_info.txt
mkdir results && cd results && bash ~/scripts/dataAnalysis.sh && ls && cat seq_info.txt

Guy$ nano ~/scripts/dataAnalysis.sh
cd ~/shell_data/untrimmed_fastq
for filename in *.fastq
do
head -n 2 ${filename} >> ~/results/seq_info.txt
done
Guy$ mkdir ~/results && ~/scripts/dataAnalysis.sh && cat ~/results/seq_info.txt
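
A self-contained sketch of the whole exercise, under some assumptions: a temporary folder stands in for ~, the two .fastq files are made-up toy data, and the script takes the project folder as its first argument ($1) instead of hard-coding ~. None of this is the official lesson solution:

```shell
# Throwaway project folder standing in for the home folder.
base=$(mktemp -d)
mkdir -p "$base/shell_data/untrimmed_fastq" "$base/scripts"
printf '@read1\nACGT\n+\nIIII\n' > "$base/shell_data/untrimmed_fastq/sampleA.fastq"
printf '@read2\nTTGA\n+\nIIII\n' > "$base/shell_data/untrimmed_fastq/sampleB.fastq"

# Write the script. The quoted 'EOF' keeps $1 and ${filename} literal
# so they are only expanded when the script itself runs.
cat > "$base/scripts/dataAnalysis.sh" <<'EOF'
#!/bin/bash
# $1 is the project folder (a hypothetical argument used in this sketch)
for filename in "$1"/shell_data/untrimmed_fastq/*.fastq
do
    head -n 2 "${filename}" >> "$1/results/seq_info.txt"
done
EOF

# The single line: && runs the next command only if the previous one succeeded.
mkdir "$base/results" && bash "$base/scripts/dataAnalysis.sh" "$base" && cat "$base/results/seq_info.txt"
```

If mkdir fails (for example because results already exists), the script and cat are skipped, which is why the one-liner only works as written on the first run.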

ftp://ftp.ensemblgenomes.org/pub/release-37/bacteria/species_EnsemblBacteria.txt

Data Wrangling and Processing for Genomics
https://datacarpentry.org/wrangling-genomics/
https://datacarpentry.org/genomics-workshop/setup.html

Assessing Read Quality  
https://datacarpentry.org/wrangling-genomics/02-quality-control/index.html

Sequencing read quality check

fastqc code and documentation
https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Guy$ tail  -n 4 SRR2584863_1.fastq
CTGCAATACCACGCTGATCTTTCACATGATGTAAGAAAAGTGGGATCAGCAAACCGGGTGCTGCTGTGGCTAGTTGCAGCAAACCATGCAGTGAACCCGCCTGTGCTTCGCTATAGCCGTGACTGATGAGGATCGCCGGAAGCCAGCCAA
+
CCCFFFFFHHHHGJJJJJJJJJHGIJJJIJJJJIJJJJIIIIJJJJJJJJJJJJJIIJJJHHHHHFFFFFEEEEEDDDDDDDDDDDDDDDDDCDEDDBDBDDBDDDDDDDDDBDEEDDDD7@BDDDDDD>AA>?B?<@BDD@BDC?BDA?

tail -n 4 SRR2584863_1.fastq 

tail -n 4 *863_1.fastq
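
Why -n 4: every FASTQ record takes exactly four lines (header, sequence, +, quality), so tail -n 4 prints the last read of the file. A sketch with a made-up two-read file (toy.fastq is not part of the workshop data):

```shell
# A made-up FASTQ file with two reads, four lines each.
workdir=$(mktemp -d)
cd "$workdir"
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nFFFF\n' > toy.fastq

# Last record of the file: header, sequence, +, quality.
tail -n 4 toy.fastq

# The same four-line rule gives the number of reads: lines divided by 4.
line_count=$(wc -l < toy.fastq)
echo "reads: $((line_count / 4))"
```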

$ cat ~/dc_workshop/docs/fastqc_summaries.txt

$ cat */summary.txt > ~/dc_workshop/docs/fastqc_summaries.txt
$ ls
$ cat ~/dc_workshop/docs/fastqc_summaries.txt

grep "FAIL" fastqc_summaries.txt | sort

cat fastqc_summaries.txt | sort
How can you sort so that it kicks everything out of the list that has a PASS?
Check the "man grep" command. grep has a -v parameter that gets you everything that does NOT match a pattern, and afterwards you can sort the result
for example: grep -v PASS fastqc_summaries.txt | sort
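
A sketch of the grep -v idea on a made-up summary file. The four lines below only imitate the tab-separated layout of a FastQC summary.txt (status, module, file name); sampleA.fastq and the statuses are invented for this illustration:

```shell
# A made-up summary file in a throwaway folder.
workdir=$(mktemp -d)
cd "$workdir"
printf 'PASS\tBasic Statistics\tsampleA.fastq\n'             > fastqc_summaries.txt
printf 'WARN\tSequence Length Distribution\tsampleA.fastq\n' >> fastqc_summaries.txt
printf 'FAIL\tPer base sequence quality\tsampleA.fastq\n'    >> fastqc_summaries.txt
printf 'PASS\tAdapter Content\tsampleA.fastq\n'              >> fastqc_summaries.txt

# -v inverts the match: keep every line that does NOT contain PASS, then sort.
grep -v PASS fastqc_summaries.txt | sort

# -c counts matching lines instead of printing them.
grep -c FAIL fastqc_summaries.txt
```

Filtering first and sorting second means sort only has to handle the lines that are left.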

$ cat ~/dc_workshop/docs/fastqc_summaries.txt

Trimming and Filtering
https://datacarpentry.org/wrangling-genomics/03-trimming/index.html



Data Wrangling and Processing for Genomics - part 2
https://datacarpentry.org/wrangling-genomics/

Instructor
Endre Sebestyén

Helpers
Tadeu
Kari
Ali

Participants
  1. Lisa Tietze
  2. Hannah
  3. May
  4. Jonathan
  5. Bisa
  6. Berihun
  7. Abel Gizaw
  8. Benedicte
  9. Fleur
  10. Iva
  11. Abush
  12. amit
  13. Adrian
  14. Alexandra
  15. Katrine
  16. Eirini
  17. Nur


Variant Calling Workflow  
https://datacarpentry.org/wrangling-genomics/04-variant_calling/index.html

BWA documentation
http://bio-bwa.sourceforge.net/bwa.shtml

The SAM/BAM file format specification
https://samtools.github.io/hts-specs/SAMv1.pdf

samtools documentation
http://www.htslib.org/doc/samtools.html

bcftools documentation
http://www.htslib.org/doc/bcftools.html


What is the real name of the reference genome?

head data/ref_genome/ecoli_rel606.fasta
>CP000819.1 Escherichia coli B str. REL606, complete genome

CP000819.1 Escherichia coli B str. REL606, complete genome

How many variants are there in the vcf file?
Use grep and wc
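
One possible way to combine grep and wc, sketched on a made-up file. toy.vcf and its three variant lines are invented for this illustration; only the rule that VCF header lines start with # is taken from the format itself:

```shell
# A made-up VCF: two header lines (starting with #) and three variant lines.
workdir=$(mktemp -d)
cd "$workdir"
printf '##fileformat=VCFv4.2\n'                              > toy.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'     >> toy.vcf
printf 'CP000819.1\t100\t.\tC\tT\t200\t.\t.\n'               >> toy.vcf
printf 'CP000819.1\t250\t.\tG\tA\t180\t.\t.\n'               >> toy.vcf
printf 'CP000819.1\t900\t.\tA\tG\t210\t.\t.\n'               >> toy.vcf

# Drop the header lines (-v inverts the match, ^# means "starts with #"),
# then count the lines that remain: one line per variant.
grep -v "^#" toy.vcf | wc -l
```

On the toy file this counts 3 variants. For the workshop data, the same pipeline would be pointed at the real .vcf file instead of toy.vcf.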

IGV download
http://software.broadinstitute.org/software/igv/download

IGV setup done
Endre
Lisa
Kari
Iva
Tadeu
Abel
Jonathan
Katrine
May
Fleur

IGV files to download
https://www.dropbox.com/sh/hhmh6j4b212b5dd/AADIfAdTsfo5rs5I_nWeuM1pa?dl=0

Intro to Cloud computing:
https://datacarpentry.org/cloud-genomics/03-verifying-instance/index.html

Introduction to Cloud Computing for Genomics
https://datacarpentry.org/cloud-genomics/

Post workshop survey
https://carpentries.typeform.com/to/UgVdRQ?slug=2020-10-21-nord-online