Welcome to Software Carpentry Etherpad!

This pad is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents.

Use of this service is restricted to members of the Software Carpentry and Data Carpentry community; this is not for general purpose use (for that, try etherpad.wikimedia.org).

Users are expected to follow our code of conduct: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html

All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/

Software Carpentry Code of Conduct
https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html

UNESCO Standard of Conduct
http://users.ictp.it/~staff/downloads/CODE_EN.pdf

DINNER FOR TONIGHT 
Raffaele al california ristorante pizzeria at 19:30
Bus map 
https://www.google.co.uk/maps/dir/45.7071795,13.7142384/Raffaele+al+california+ristorante+pizzeria,+Viale+Miramare,+303,+34136+Trieste+TS/@45.701725,13.7162727,15z/data=!3m1!4b1!4m17!1m6!3m5!1s0x477b13306f567993:0x20b109da192dd917!2sRaffaele+al+california+ristorante+pizzeria!8m2!3d45.6950801!4d13.7358166!4m9!1m1!4e1!1m5!1m1!1s0x477b13306f567993:0x20b109da192dd917!2m2!1d13.7358166!2d45.6950801!3e0
If you walk it (about 40 minutes) it is best to pretty much follow the bus route (i.e. walk along Str. Costiera) - don't go into Miramare park as it closes at 7. 


MATERIALS FOR THE SCHOOL 
Slides for the courses that are not on a github repository can be found here
https://drive.google.com/drive/folders/1PvfPCOVdbdOIowcfQAhnTguuwXJnfx8U?usp=sharing

Concept document for CODATA-RDA Schools
https://drive.google.com/open?id=1AwMILol4ZpSJwLzcPDT_XwAyC4dZy-Km

If you are interested in becoming a Certified Software Carpentry Instructor, please look at the following sites:
https://carpentries.org/become-instructor/
https://carpentries.github.io/instructor-training/

Self-learning session - please go to 
https://authorcarpentry.github.io/orcid-profile/00-orcid-profile.html
and do Exercises 1 and 2. Please complete this before Thursday, 9 August because we will use your ORCID bio in the Reproducible Reporting lesson.

RDM survey
https://docs.google.com/forms/d/1qQlo-OgUssEZ_ADBdCpUiHrB9JXKNjUnEt2KJkn3zBQ/edit?ts=5b645696

PDF for Machine learning
http://indico.ictp.it/event/8329/session/7/contribution/29/material/0/0.pdf


Data file
http://files.grouplens.org/datasets/movielens/ml-latest-small.zip

_____________________________________________________________________________________________________________________________________

Software Carpentry                                                              CODATA-RDA School of Research Data Science
ICTP Conference Room                                                       Mon-Wed, Aug 06-08                    
Workshop website                                                                https://orchid00.github.io/2018-08-06-ICTP/
 

Welcome to Software Carpentry 
We will use this Etherpad to share links and snippets of code, take notes, ask and answer questions, and whatever else comes to mind.
The page displays a screen with three major parts:

	* The left side holds today's notes: please edit these as we go along.
	* The top right side shows the names of users who are logged in: please add your name and pick the color that best reflects your mood and personality.
	* The bottom right is a real time chat window for asking questions of the instructor and your fellow learners.

Instructors
	* Marko Vidak, ELIXIR-SI and Faculty of Medicine, University of Ljubljana
	* Hugh Shanahan
	* Paula Andrea Martinez, ELIXIR-BE
	* Louise Bezuidenhout - Institute for Science, Innovation and Society, University of Oxford

Helpers
	*      Gail Clement   

Participants
Please add yourself to the participant list below:
	* Abramowicz Tomasz
	* Atul Saini
	* Bayode Taye
	* Maphuti Betty Ledwaba
	* Bouchra Chaouni
	* Benhrif  Oussama 
	* Caroline Franco
	*  Mesfin Diro
	* Cristina Chavez
	* Caterina Cevallos
	* Gianluca Coidessa
	* Denise Brito
	* Herbert Nguruwe
	* Joel Defo
	* KIVOUILA SAMINOU Luce Evrard
	* Caroline Ajilogba
	* Laterza, Simone
	* Sule, Mary-Jane
	* Ashok Rai
	* Lilian Juma
	* Simisani Kelaotswe
	* Boutaina Ettetuani
	* Chadia Ed-driouch
	* Vivek Ananth
	* Dagim Yoseph
	* Neema Mduma
	* Mansour Esmaeili
	* Naushin Nower
	* Mojtaba Khodadadi
	* Barbara Glover
	* Samuel Terkper Ahuno
	* Solomon Gizaw
	* Mohamed Shaltout
	* Gustavo Andre Santos 
	* Gabriela Tenorio 
	* Ngabo Desire
	* Addo David
	* Sina Ayangbenro
	* Chinedu Obieze
	* Ekpe Okorafor
	* Fatiha Merazka
	* Najmejh Mirian

Friendly and collaborative active participants
Participants are expected to follow our code of conduct: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html

Public content page
All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/ 

Pre-survey
Make sure you have filled in the Software Carpentry workshop pre-survey so that we know your prior knowledge before the workshop:
https://www.surveymonkey.com/r/swc_pre_workshop_v1?workshop_id=2018-08-06-ICTP

Sticky Notes: Use these on the back of your laptop to signal us if we're going too fast or too slow. Use one color to indicate that you're happy and everything is fine, and use the other color to indicate that you're having difficulty keeping up or that you need someone to stop by to help you. Tell us what you need.

Icebreaker: Turn to a partner and introduce yourself by name, one word about your research (e.g. 'microbes', 'dogs', 'vectors', 'stars') and one thing you're proud of that you made. It could be a bookshelf, a curry, a 3D plot, a piece of software, your bed this morning, just something you did that you're proud of. 
Write your partner's name, one word about their research, and the thing they're proud of below.

	* A
	* Benhrif  Oussama : Bioinformatics : Development of new Genomic Data Analysis Algorithm
	* C
	* Sule, Mary-Jane : Cloud, development of a trusted cloud model

|-----------------------|
| Twitter hash        |
| #datatrieste18  | 
|-----------------------|

Open and Responsible Science: Introductory lecture
https://drive.google.com/file/d/1mbGXLeM-EXe1aMVhjvhlD_S_Obox0Dmj/view?usp=sharing 

Bash Shell 
==========

Setup: Please see the workshop website for required software. In addition, you need to download some files to follow this lesson. Please follow the instructions here: http://swcarpentry.github.io/shell-novice/setup.html

Prerequisites
If you have stored files on a computer at all and recognize the word “file” and either “directory” or “folder” (two common words for the same thing), you’re ready for this lesson.
If you’re already comfortable manipulating files and directories, searching for files with grep and find, and writing simple loops and scripts, you probably won’t learn much from this lesson. Some options so that you don't get bored:
	* look around to see if anyone near you needs help
	* do all of the challenge exercises listed below, and continue on to exercises in succeeding lesson episodes ( http://swcarpentry.github.io/shell-novice/  )
	* observe how the lesson is taught and how a workshop is run, in furthering the noble goal of becoming a Carpentry instructor yourself

Feedback
Minute cards http://tiny.cc/triestecards
https://www.surveymonkey.com/r/swc_post_workshop_v1?workshop_id=2018-08-06-ICTP


┏━━━━•❅•°•❈•°•❅•━━━━┓
❍   WELCOME DAY2     ❍
┗━━━━•❅•°•❈•°•❅•━━━━┛

Unix shell

Lessons and follow-up exercises:
http://swcarpentry.github.io/shell-novice/

Presentation with slides - this is a concise version of lessons available at http://swcarpentry.github.io/shell-novice/

Presentation:
https://docs.google.com/presentation/d/1qpAkfsczLngzeKoXFh9uM_EAkB6MNxhCyjBwwGXH-Uc/edit?usp=sharing

The slides also contain a brief description of ELIXIR and its activities. There is also a link to the ELIXIR-SI e-Learning platform (EeLP) where more Unix training courses are available for registered users: https://elixir.mf.uni-lj.si/?lang=en

Git
===

http://swcarpentry.github.io/git-novice/

Cheat sheet of Git 
https://services.github.com/on-demand/downloads/github-git-cheat-sheet.pdf


R session
========
Materials: recommended reading after the session
http://swcarpentry.github.io/swc-releases/2016.06/r-novice-gapminder/index.html

Slides R community: tinyurl.com/Rcommunityslides


Do you relate? http://swcarpentry.github.io/swc-releases/2016.06/r-novice-gapminder/fig/bad_layout.png
How to improve?
Project management

Create a folder called RProjects under Documents

Exercise 1 - New RStudio Project (4 min)

		* RStudio menu (top left corner): click File menu button,
		* Then New Project
		* Click New Directory
		* Click New Project
		* In Directory name type the name of your project, e.g. Rfoundation (Browse and select a folder in which to locate your project, i.e. the RProjects folder)
		* Lastly, click the Create Project button

Lesson Instructions
http://swcarpentry.github.io/swc-releases/2016.06/r-novice-gapminder/02-project-intro/


     RProjects
     |_ RIntro
		|_ data
		|_ scripts
		|_ figures
		
Why Project management?
"Managing your projects in a reproducible fashion doesn't just make your science reproducible, it makes your life easier."

Documents/RProjects/RIntro

R is case sensitive
  - camelCase
  - snake_case
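
A quick illustration of what case sensitivity means in practice. This is only a minimal sketch; the variable names are made up for this example and do not come from the lesson.

```r
# Names that differ only in case refer to different objects
lifeExpMean <- 59.5     # camelCase
life_exp_mean <- 59.5   # snake_case
LifeExpMean <- 70       # differs from lifeExpMean only in the first letter

# The two camelCase-looking names hold different values
same_value <- identical(lifeExpMean, LifeExpMean)

# No object was created under the all-lowercase spelling
lowercase_exists <- exists("lifeexpmean")
```

Whichever style you pick, using it consistently makes typos like these easier to spot.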


R style guide : 
    - http://adv-r.had.co.nz/Style.html
    - https://github.com/r-lib/styler


functions fun:
    
data 
Gapminder dataset in csv format, both links are the same dataset
 - http://tiny.cc/SWCgapminder
 - https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-pages/data/gapminder-FiveYearData.csv


┏━━━━•❅•°•❈•°•❅•━━━━┓
❍   WELCOME DAY3   ❍
┗━━━━•❅•°•❈•°•❅•━━━━┛


R session
========

If you are using your own computer make sure you have the following
R Packages installed
install.packages("tidyverse")
install.packages("rmarkdown")
install.packages("knitr")

ggplot2

# minimum layers of ggplot
ggplot(data = gapminder, aes( x = x, y = y )) +
    geom_point()

# adding shape, shape = column

# adding colour
ggplot(data = gapminder, aes( x = x, y = y , colour = column)) +
     geom_point()

# custom scale in the x axis
unique_years <- unique(gapminder$year) # one axis break per year in the data
ggplot(data = gapminder, aes(x = year, y = pop, colour = continent)) +
        geom_point() +
        scale_x_continuous(breaks = unique_years)
        
        
ggplot(data = gapminder, aes(x = year, y = pop, colour = continent)) +
    geom_point() +
    scale_x_continuous(breaks = unique_years) +
    scale_y_continuous(breaks = c(0, 100000000, 200000000, 500000000, 1000000000),
                    labels = c(0, "100 mi", "200 mi", "500 mi", "1 billion"))
 
 # creating a histogram of lifeExp coloured by continent
 
 
 # creating a histogram of each continent lifeExp
ggplot(______, aes(____)) +
   geom____() +
   facet_wrap(~ continent)
 
 # create a histogram by continent and a new theme
ggplot(data = gapminder, aes(x = lifeExp, fill = continent)) +
    geom_histogram(bins = 12) +
    facet_wrap(~ continent) +
    theme_dark() 
    
    #Create a line plot of year and lifeExp coloured by continent
    # add custom labels with labs()
    
# skeleton of a new function
function_name <- function(arguments) {
    # body of the function
}

# create a function to subset countries by the first letter
countriesByLetter <- function(){
  }
  
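# fl is a vector of first letters, e.g. c("A", "Z")
countriesByLetter <- function(fl){
    # first letter of every country name
    first_letters <- substr(gapminder$country, 1, 1)
    my_countries <- gapminder[first_letters %in% fl, ]
    return(my_countries)
}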
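
To try the idea without loading gapminder, here is a small self-contained sketch. The data frame `toy` and the function name `countriesByLetterToy` are made up for illustration; unlike the exercise function, this version takes the data as an argument.

```r
# Toy stand-in for the gapminder data frame (values are made up)
toy <- data.frame(
  country = c("Chile", "Cuba", "Kenya", "Italy"),
  pop = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Keep the rows whose country name starts with one of the letters in fl
countriesByLetterToy <- function(data, fl) {
  first_letters <- substr(data$country, 1, 1)
  data[first_letters %in% fl, ]
}

c_and_k <- countriesByLetterToy(toy, c("C", "K"))
c_and_k$country   # "Chile" "Cuba" "Kenya"
```

Passing the data in as an argument keeps the function independent of whatever happens to be in the global environment.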

 
 # Copy the function countriesByLetter and
 # make a new function plotCountriesByLetter
 # add some code to make the previous plot
 # and save the plot in your figures folder as a .png
 
 
 # Tip to save the plot
 names <- paste0(fl, collapse = "_")
 paste0("figures/Countries_by_letter_" , names, ".png")
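
 The tip above can be sketched end to end as follows. The letters in `fl` are chosen only for illustration, and `name_part` is used instead of `names` so that the base function names() is not hidden.

```r
fl <- c("A", "Z")   # example letters, chosen only for illustration

# Join the letters with underscores, then build the file path
name_part <- paste0(fl, collapse = "_")
file_name <- paste0("figures/Countries_by_letter_", name_part, ".png")
file_name   # "figures/Countries_by_letter_A_Z.png"

# With ggplot2 loaded and a figures folder present, the last plot could then be saved with:
# ggsave(filename = file_name)
```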
 
 
  # ten minutes: talk with somebody and show them how you solved the last exercise 
  # or where you got stuck, and solve the problem together


Everyone loves emojis

 #before
:D <-+ +++++
:)   <-+++++++++++
:|   <- +++++++
:(   <- ++++
:S  <-+++

#after
:D <- +++++++++++++++
:)   <-+++++++
:|   <-+++++
:(   <-+
:S  <-++++

AWESOME!!

Summary so far
Today we have learned to use ggplot2, the grammar of graphics.
We have learned to write reproducible code with functions.
Now we are going to learn dplyr, the grammar of data manipulation.


gapminder %>%
  select(starts_with("c"))

select(gapminder, starts_with("c"))

# R conditionals
== equal to
!= not equal to
> greater than
< less than
>= greater than or equal to
<= less than or equal to
& and
| or
gapminder %>%                 
   filter(continent == "Africa" ) %>%                   
   select(year, country, lifeExp)
                    
#another option
# select African countries for lifeExp, country and year,
af <- gapminder[gapminder$continent == "Africa", ] %>% select(lifeExp, country, year)
nrow(af)
write_csv(af, "data/african-countries.csv") # the data folder must already exist


Final challenge
Challenge yourself to create a report with the gapminder dataset. Answer at least one question using the dataset, or as many as you want.
https://raw.githubusercontent.com/orchid00/Report_example/master/report/blankReport.Rmd 


Feedback
Minute cards http://tiny.cc/triestecards

Extra links
=========
	* R first session August 7: https://github.com/orchid00/Report_example/blob/master/scripts/001_firstSteps.R
	* R morning session August 8: https://github.com/orchid00/Report_example/blob/master/scripts/002_plots.R
	* R data structures image https://github.com/orchid00/R4da/blob/master/img/Rdatasctructures.png
	* Gapminder website https://www.gapminder.org/ when you have 15 minutes you can take the test of common knowledge http://forms.gapminder.org/s3/test-2018
	* R community slides tinyurl.com/Rcommunityslides
	* Shiny tutorial https://bioinformatics-core-shared-training.github.io/shiny-bioinformatics/tutorial
	* Connecting Git with Rstudio from zero http://happygitwithr.com/
	* 10 minutes markdown tutorial https://commonmark.org/help/tutorial/
	* If you want to learn github-flavored markdown in 3 minutes: https://guides.github.com/features/mastering-markdown/ 
	* R script sections: https://support.rstudio.com/hc/en-us/articles/200484568-Code-Folding-and-Sections
	* RStudio keyboard shortcuts: Alt+Shift+K
	* Named colours: http://www.stat.columbia.edu/%7Etzheng/files/Rcolor.pdf
	* ggplot2 colour palettes: http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/
	* https://www.rstudio.com/resources/cheatsheets/
	* http://www.ggplot2-exts.org/gallery/
	* https://www.r-graph-gallery.com/
	* http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html
	* Cool plugin to "manually" edit your ggplots directly from Rstudio: https://github.com/calligross/ggthemeassist
      You can install it the same way as an R package
      This is how it works: https://raw.githubusercontent.com/calligross/ggthemeassist/master/examples/ggThemeAssist2.gif
	* Also, this package allows you to obtain 'publication ready plots' and it's built on top of ggplot2: http://www.sthda.com/english/rpkgs/ggpubr/ **Cool feature**: if you have 2 or more groups that you are comparing and plotting (e.g. with boxplots, violin plots, etc.) this includes a built-in function to do the statistics between the groups (for instance t-tests) and add the lines and significance values to the plot: http://www.sthda.com/english/rpkgs/ggpubr/reference/stat_compare_means.html
	* https://rmarkdown.rstudio.com/developer_document_templates


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Ethics exercise 

A Command Line Interface (CLI) such as Bash requires training and practice. There is an added difficulty for non-English speakers in that the commands were originally developed by English speakers. Most users find Graphical User Interfaces (GUIs) very intuitive. On the other hand, CLIs give much more control and require much less development.

In the future it is likely that you will be teaching CLI to colleagues and students.  Yesterday we learnt that one of the key elements of Responsible Conduct of Research is being a responsible mentor and colleague - this includes being aware of how cultural and linguistic differences can cause learning challenges for those we interact with.  Please take 5 minutes to reflect on the following question: if you are going to train non-English speakers who have only used GUIs to use Bash and other shell languages, what steps would you take to make sure that they do not experience any unnecessary learning/use challenges?

┏━━━━•❅•°•❈•°•❅•━━━━┓
❍   WELCOME DAY4 ❍
┗━━━━•❅•°•❈•°•❅•━━━━┛

Open Science recap and survey intro - https://drive.google.com/open?id=1F0az3c9_3TkZp9g_ZC6ciifbfDEsaVU-

Data Management Plan slides - https://drive.google.com/open?id=1ZfYAh5QGIgK2UEUBWsCJpJAum2P2gle2

DMP exercise - https://repositorian.github.io/DOAJ_Exercise 

Feedback: http://tiny.cc/triestecards


AuthorCarpentry & Reproducibility Reporting

- Lesson is here: https://authorcarpentry.github.io/DT2018/
- Student files are here: https://github.com/AuthorCarpentry/DTSTUDENT2018
- Rendered final report on the web is here: 

The student files needed for the lab sessions are in a github repository at https://github.com/AuthorCarpentry/DTSTUDENT2018
1. Fork this repository into your own GitHub account, renaming the folder from DTSTUDENT2018 to your name
2. Clone the repository you just made in your account to your desktop
3. In RStudio, create a new project from the repository now on your desktop. This will make sure the student files are in the working directory (so all of the paths to data or image files work properly)

Adding more features to your YAML Heading

output:
  html_document:
    code_folding: hide
    css: custom.css
    number_sections: yes
    toc: yes
    toc_depth: 2
    toc_float: yes
    theme: readable
    highlight: kate


┏━━━━•❅•°•❈•°•❅•━━━━┓
❍         WELCOME DAY 5           ❍
┗━━━━•❅•°•❈•°•❅•━━━━┛

Intro to RDM - https://drive.google.com/open?id=1LEYLkTmPvUfamYDzMfQfX1SsHFl99ZPa
FAIR data - https://drive.google.com/open?id=12o-uybNjyOGTcIn54X1faJgoV3Ukpuqw
FAIR discussion - https://drive.google.com/open?id=1hEmIzPR3uoSiKFYQzzlcmxf9dTHuTQTi
Open and Responsible Science part 2 - https://drive.google.com/file/d/1BA8aBAl65c3p9JY9sW-z_DVIZAziFGKb/view?usp=sharing
Ethics handout - https://drive.google.com/file/d/1mK2HY2rt0TElGwRem5WU3tnyPNPPMxlA/view?usp=sharing 
electronic Journal Delivery System - http://ejds.ictp.it/
INASP materials - https://drive.google.com/open?id=16spdJPJDOcXmuhoJcoDXPvthBo_oXedu

-----------------------------------------------------
AuthorCarpentry & Reproducibility Reporting - Friday, August 10, 2018

Add a dynamic parameter to your reproducible report! (and receive a Knitr sticker!!!)

This example makes the Institution you work for a changeable variable that you can select from a pick list when you knit.

Step 1. Add the following to the bottom of your YAML header (but *above* the  --- )

params: (make sure this is flush left)
  institution: (indented two spaces, should line up with 'r' in params above)
    choices: (two more indented spaces)
    - International Centre for Theoretical Physics (same indent as 'choices', here and for all of the lines below here)
    - CODATA-RDA Summer School
    - Elsevier Inc
    - California Institute of Technology
    - Add another Institution here, either your own or one you have always wanted to work for! :-)
    input: select
    label: 'Institution:'
    value: Pick one of the choices above to serve as a default
    
Step 2. 

Replace all occurrences of 'Institution' in your document with the following code that will fill in whatever value you select from the picklist. Hint:  One occurrence of  your Institution may occur in the YAML, within the 'author' field.  Make sure the code is surrounded by backticks, not quotes.
    
`r params$institution`

Step 3. 
Save your work 

Step 4. 
On the Knit button, select 'Knit with parameters'
You should then see a picklist appear that allows you to select from the Institutions you included in the YAML

Step 5. If all works properly, your report will now include your selected Institution!

Step 6. Answer the following question here on the Etherpad (this earns you a knitr hex sticker!):

How might you apply the parameterization feature in a reproducible report about your own research? 
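
To see what the inline code in Step 2 does, here is a minimal sketch that can be run in the console. It assumes that `params` behaves like a named list inside the knitted document; the list is built by hand here as a stand-in, and the file name in the last comment is hypothetical.

```r
# Stand-in for the params list that R Markdown builds from the YAML header
params <- list(institution = "CODATA-RDA Summer School")

# The inline code `r params$institution` inserts this value into the text
sentence <- paste("This report was prepared at", params$institution)
sentence   # "This report was prepared at CODATA-RDA Summer School"

# Rendering from the console instead of the Knit button (file name is hypothetical):
# rmarkdown::render("report.Rmd", params = list(institution = "Elsevier Inc"))
```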

-----------------------------------------------------

AuthorCarpentry: Licensing and ORCID

	* CODATA-RDA Legal Interoperability of Research Data
		* Interest Group: https://rd-alliance.org/groups/rdacodata-legal-interoperability-ig.html
		* Principles & Guidelines: https://www.rd-alliance.org/rda-codata-legal-interoperability-research-data-principles-and-implementation-guidelines-now
		* Reading List  (Zotero database): https://www.zotero.org/groups/1757514/legalinteropdata
	* Creative Commons Choose a License https://creativecommons.org/choose/
	* ORCID (Open Researcher and Contributor ID): https://orcid.org/
	* CrossCite (For any DOI, get the perfect citation in the style you desire): https://crosscite.org/


-----------------------------------------------------
AuthorCarpentry: Citation Styles in Your RMarkdown document

Step 1. Pick your desired citation style from the Citation Style Repository (courtesy of Zotero): https://www.zotero.org/styles?q=biomed-central

Step 2. Click on the hyperlink for the style you want and save the file with extension .csl to the RStudio project folder on your computer

Step 3. This is an xml file (xml is another markup language). Pandoc will read this file and create inline citations and references in the reference list according to this style.

Step 4. In the YAML header, add the line csl: and insert the exact path and name of your file.csl

Step 5. Knit to the output of your choice and note the change in formatting for your citations and reference list!

-----------------------------------------------------


YOUR GIT IDs


francocarol
@asocrai
c0ra
mjaysule
Daisy1984
neylicious
mesfind
CObieze
Ayansina
samkiv
Dengabo
dagimy
TayeB2018
Solomon2018
Akorfa
sahuno
CarolineNWU
Lilian9
joelthegrace
gabytv10
drabrito
@mkhm
@scienception
Simisani

#To install jrGgplot2 (on your laptop)
install.packages("drat")
drat::addRepo("jr-packages")
install.packages("jrGgplot2",dependencies = TRUE)


┏━━━━•❅•°•❈•°•❅•━━━━┓
❍         WELCOME DAY 9          ❍
┗━━━━•❅•°•❈•°•❅•━━━━┛

# Hands-On Exercise: 
# Implementing a Basic Recommender Engine for Movies
#
# Instructor: Dr. Ekpe Okorafor
# August 14th, 2018
# DataTrieste18


install.packages("data.table")
install.packages("ggplot2")
install.packages("recommenderlab")

# Exercise 1
movies = read.csv("/Users/ekpe/recommender/ml-latest-small/movies.csv") #load movies.csv file
str(movies) #list the structure of movies

ratings = read.csv("/Users/ekpe/recommender/ml-latest-small/ratings.csv") #load ratings.csv file
str(ratings) #list the structure of ratings

# Exercise 2
library(ggplot2)
plot <- ggplot(ratings, aes(x = rating)) + geom_histogram()
plot

# Exercise 3
library(data.table)
movgen <- as.data.frame(movies$genres, stringsAsFactors=FALSE)
movgen2 <- as.data.frame(tstrsplit(movgen[,1], '[|]', type.convert = TRUE), stringsAsFactors = FALSE)
colnames(movgen2) <- c(1:7)
head(movgen2, n=4)

# Exercise 4
movgen_list <- c("Action", "Adventure", "Animation", "Children", "Comedy", "Crime","Documentary", "Drama", "Fantasy","Film-Noir", "Horror", "Musical", "Mystery","Romance","Sci-Fi", "Thriller", "War", "Western")
movgen_matrix <- matrix(0,9126,18) #empty matrix
movgen_matrix[1,] <- movgen_list #set first row to genre list
colnames(movgen_matrix) <- movgen_list #set column names to genre list

#iterate through matrix
for (i in 1:nrow(movgen2)) {
  for (c in 1:ncol(movgen2)) {
    genmat_col = which(movgen_matrix[1,] == movgen2[i,c])
    movgen_matrix[i+1,genmat_col] <- 1
  }
}

#convert into dataframe
movgen_matrix2 <- as.data.frame(movgen_matrix[-1,], stringsAsFactors=FALSE) #remove first row, which was the genre list
for (c in 1:ncol(movgen_matrix2)) {
  movgen_matrix2[,c] <- as.integer(movgen_matrix2[,c])
} #convert from characters to integers
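
The same genre matrix can also be built without the nested loops. The block below is a minimal sketch on three made-up rows, not the instructor's method; `genres` and `genre_list` are illustrative stand-ins for movies$genres and movgen_list.

```r
genres <- c("Adventure|Comedy", "Drama", "Comedy|Drama")   # made-up rows
genre_list <- c("Adventure", "Comedy", "Drama")

# Split each string on the literal "|" character
split_genres <- strsplit(genres, "|", fixed = TRUE)

# One row per movie, one column per genre, 1 where the movie has that genre
genre_matrix <- t(sapply(split_genres, function(g) as.integer(genre_list %in% g)))
colnames(genre_matrix) <- genre_list
genre_matrix
```

Because the result is already an integer matrix, no conversion from characters is needed afterwards.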

# Exercise 5
binary_ratings <- ratings
for (i in 1:nrow(binary_ratings)){
  if (binary_ratings[i,3] > 3){
    binary_ratings[i,3] <- 1
  }
  else{
    binary_ratings[i,3] <- -1
  }
}
head(binary_ratings, n=7)

library(reshape2) #provides dcast() for data frames
binary_ratings2 <- dcast(binary_ratings, movieId~userId, value.var = "rating", na.rm=FALSE)
for (i in 1:ncol(binary_ratings2)){
  binary_ratings2[which(is.na(binary_ratings2[,i]) == TRUE),i] <- 0
}
binary_ratings2 = binary_ratings2[,-1] #remove movieIds col. Rows are movieIds, cols are userIds
dim(binary_ratings2)

#Remove rows that are not rated from movies dataset
unique_movieIds <- (unique(movies$movieId)) #9125
unique_ratings <- (unique(ratings$movieId)) #9066
movies2 <- movies[-which((unique_movieIds %in% unique_ratings) == FALSE),]
rownames(movies2) <- NULL
dim(movies2)

#Remove rows that are not rated from movgen_matrix2
movgen_matrix3 <- movgen_matrix2[-which((unique_movieIds %in% unique_ratings) == FALSE),]
rownames(movgen_matrix3) <- NULL
dim(movgen_matrix3)

#Calculate dot product for User Profiles
result = matrix(0,18,671)
for (c in 1:ncol(binary_ratings2)){
  for (i in 1:ncol(movgen_matrix3)){
    result[i,c] <- sum((movgen_matrix3[,i]) * (binary_ratings2[,c]))
  }
}
#Convert to Binary scale
for (i in 1:nrow(result)){
  for (j in 1:ncol(result)) {
    if (result[i,j] < 0){
      result[i,j] <- 0
    }
    else {
      result[i,j] <- 1
    }
  }
}
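
The two loops above compute, for each genre and user, the sum over movies of genre flag times binary rating, and then set negative sums to 0 and the rest to 1. The sketch below shows the same calculation as a matrix product on made-up numbers; `G` and `B` are small stand-ins for movgen_matrix3 and binary_ratings2.

```r
# Rows are movies. G has one column per genre, B one column per user.
G <- matrix(c(1, 0,
              1, 1,
              0, 1), nrow = 3, byrow = TRUE)    # 3 movies x 2 genres
B <- matrix(c( 1, -1,
               1,  0,
              -1, -1), nrow = 3, byrow = TRUE)  # 3 movies x 2 users

# genres x users: entry [i, c] is sum(G[, i] * B[, c])
profile <- t(G) %*% B

# Negative sums become 0, everything else becomes 1
profile_binary <- ifelse(profile < 0, 0, 1)
profile_binary
```

On the real data the inputs are data frames, so they would need as.matrix() first; that step is an assumption and is not shown in the exercise.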

# Exercise 6
result2 <- result[,1] #First user's profile (rows of result are genres, columns are users)
sim_mat <- rbind.data.frame(result2, movgen_matrix3)
sim_mat <- data.frame(lapply(sim_mat,function(x){as.integer(x)})) #convert data to type integer

#Calculate Jaccard distance between user profile and all movies
library(proxy) #run install.packages("proxy") first if it is not installed
sim_results <- dist(sim_mat, method = "Jaccard")
sim_results <- as.data.frame(as.matrix(sim_results[1:9066]))
rows <- which(sim_results == min(sim_results))
#Recommended movies
movies2[rows,] #movies2 has the same row order as movgen_matrix3
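
For intuition, the Jaccard distance between two 0/1 vectors can be computed by hand. This is a minimal sketch of the standard definition (1 minus shared genres over genres present in either); it is not the proxy package's code, and the vectors are made up.

```r
jaccard_distance <- function(a, b) {
  both <- sum(a == 1 & b == 1)     # genres present in both vectors
  either <- sum(a == 1 | b == 1)   # genres present in at least one
  1 - both / either                # NaN if neither vector has any genre
}

user_profile <- c(1, 1, 0, 0)
movie_a <- c(1, 1, 0, 0)   # same genres as the profile
movie_b <- c(1, 0, 1, 0)   # shares one genre out of three in total

d_a <- jaccard_distance(user_profile, movie_a)   # 0
d_b <- jaccard_distance(user_profile, movie_b)   # 2/3
```

The movies with the smallest distance to the user profile are the ones recommended.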

#Exercise 7
library(recommenderlab)
recommender_models <- recommenderRegistry$get_entries(dataType = "realRatingMatrix")
names(recommender_models) # Display models
lapply(recommender_models, "[[", "description") # Describe models
recommender_models$UBCF_realRatingMatrix$parameters # List parameters of model

library(reshape2)
#Create ratings matrix. Rows = userId, Columns = movieId
ratingmat <- dcast(ratings, userId~movieId, value.var = "rating", na.rm=FALSE)
ratingmat <- as.matrix(ratingmat[,-1]) #remove userIds

#Convert rating matrix into a recommenderlab sparse matrix
ratingmat <- as(ratingmat, "realRatingMatrix")

#Normalize the data
ratingmat_norm <- normalize(ratingmat)

#Create Recommender Model. "UBCF" stands for User-Based Collaborative Filtering
recommender_model <- Recommender(ratingmat_norm, method = "UBCF", param=list(method="Cosine",nn=30))
recom <- predict(recommender_model, ratingmat[1], n=10) #Obtain top 10 recommendations for 1st user in dataset
recom_list <- as(recom, "list") #convert recommenderlab object to readable list

#Obtain recommendations
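recom_result <- character(10)
for (i in c(1:10)){
  #the recommended items are labelled by movieId, so look the title up by movieId, not by row number
  recom_result[i] <- as.character(movies$title[movies$movieId == as.integer(recom_list[[1]][i])])
}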
recom_result
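
Looking up titles from a vector of recommended ids can also be done in one step with match(). This is a sketch on a made-up table; `movies_toy` and `recommended_ids` are illustrative stand-ins for movies and recom_list[[1]].

```r
movies_toy <- data.frame(
  movieId = c(1, 5, 10),
  title = c("Movie A", "Movie B", "Movie C"),
  stringsAsFactors = FALSE
)

# Ids arrive as character labels, in recommendation order
recommended_ids <- c("10", "1")

# match() gives the row of each id; ids that are not found give NA
rows_found <- match(as.integer(recommended_ids), movies_toy$movieId)
titles <- movies_toy$title[rows_found]
titles   # "Movie C" "Movie A"
```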


# ---- Exercise 5
# Author: Mojtaba Khodadadi
# github: @mkhm
# stackoverflow: 3454902
binary_ratings <- ratings
head(ratings)
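# as.integer() truncates, so ratings of 4 and above become 1 and all others 0
# note: a rating of 3.5 therefore ends up as -1 here, while the loop version (rating > 3) gives 1
binary_ratings$rating <- as.integer(ratings$rating/4.0)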
head(binary_ratings)
where = which(binary_ratings[,3] == 0)
binary_ratings[where,3] <- -1
head(binary_ratings)
ratings
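
A vectorised way to apply the rule from the loop version of Exercise 5 (a rating above 3 becomes 1, anything else becomes -1) is ifelse(). The sketch below uses a made-up data frame `ratings_toy` rather than the MovieLens file.

```r
ratings_toy <- data.frame(
  userId = 1,
  movieId = 1:5,
  rating = c(0.5, 3, 3.5, 4, 5)
)

binary_toy <- ratings_toy
# TRUE where the rating is above 3, then mapped to 1 / -1 in one step
binary_toy$rating <- ifelse(ratings_toy$rating > 3, 1, -1)
binary_toy$rating   # -1 -1 1 1 1
```

This avoids looping over every row, which is noticeably slow on the full ratings table.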


┏━━━━•❅•°•❈•°•❅•━━━━┓
❍         WELCOME DAY 10           ❍
┗━━━━•❅•°•❈•°•❅•━━━━┛


Materials for today are in the Artificial Neural Network folder in the materials folder 
https://drive.google.com/drive/folders/1PvfPCOVdbdOIowcfQAhnTguuwXJnfx8U?usp=sharing

You can deposit your presentations at 
https://drive.google.com/drive/folders/1vRnGy-YX1XMWxciYml0-yzvnuMPw77a4?usp=sharing

Presentation on RDA overview and implementation of RDA outputs to make research workflows FAIR
https://docs.google.com/presentation/d/1dA4mJ2_rFskoo8TE9wmvyTulqYaEpx_fR0Qkzmss_u4/edit?usp=sharing

┏━━━━•❅•°•❈•°•❅•━━━━┓
❍         WELCOME DAY 11           ❍
┗━━━━•❅•°•❈•°•❅•━━━━┛

If you are using a Windows laptop, you need to install a terminal client program with ssh. Examples of software you can use (and freely download) are 
https://putty.org
and
http://cmder.net

the second one looks nicer though I've never used it - Hugh

DOSAR (Distributed Organization for Scientific and Academic Research) - https://opensciencegrid.org/dosar/ and http://www.dosar.org/ 
Schedule - https://opensciencegrid.org/dosar/DataTrieste2018/School/
Materials - https://opensciencegrid.org/dosar/DataTrieste2018/Materials/ 

┏━━━━•❅•°•❈•°•❅•━━━━┓
❍         WELCOME DAY 12           ❍
┗━━━━•❅•°•❈•°•❅•━━━━┛

EGI Computational Infrastructures (Continued) 
Cloud and Jupyter Notebooks
https://documents.egi.eu/document/3349 

Jupyter notebook url after registration

https://training.fedcloud-tf.fedcloud.eu


####################################################
IoT/Big Data

https://goo.gl/57f4DT
Also, here is the link to download the zip file for the docker image: https://github.com/scaledaction/sentiment-analysis/archive/master.zip