Welcome to Software Carpentry Etherpad!

This pad is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents.

Use of this service is restricted to members of the Software Carpentry and Data Carpentry community; this is not for general purpose use (for that, try etherpad.wikimedia.org).

Users are expected to follow our code of conduct: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html

All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/

Software Carpentry Code of Conduct https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html
UNESCO Standard of Conduct http://users.ictp.it/~staff/downloads/CODE_EN.pdf

DINNER FOR TONIGHT
Raffaele al california ristorante pizzeria at 19:30
Bus map https://www.google.co.uk/maps/dir/45.7071795,13.7142384/Raffaele+al+california+ristorante+pizzeria,+Viale+Miramare,+303,+34136+Trieste+TS/@45.701725,13.7162727,15z/data=!3m1!4b1!4m17!1m6!3m5!1s0x477b13306f567993:0x20b109da192dd917!2sRaffaele+al+california+ristorante+pizzeria!8m2!3d45.6950801!4d13.7358166!4m9!1m1!4e1!1m5!1m1!1s0x477b13306f567993:0x20b109da192dd917!2m2!1d13.7358166!2d45.6950801!3e0
If you walk (about 40 minutes), it is best to follow the bus route (i.e. walk along Str. Costiera). Don't go into Miramare park, as it closes at 7.
MATERIALS FOR THE SCHOOL
Slides for the courses that are not on a github repository can be found here https://drive.google.com/drive/folders/1PvfPCOVdbdOIowcfQAhnTguuwXJnfx8U?usp=sharing
Concept document for CODATA-RDA Schools https://drive.google.com/open?id=1AwMILol4ZpSJwLzcPDT_XwAyC4dZy-Km

If you are interested in becoming a Certified Software Carpentry Instructor please look at the following sites:
https://carpentries.org/become-instructor/
https://carpentries.github.io/instructor-training/

Self-learning session - please go to https://authorcarpentry.github.io/orcid-profile/00-orcid-profile.html and do Exercises 1 and 2. Please complete this before Thursday, 9 August because we will use your ORCID bio in the Reproducible Reporting lesson.

RDM survey https://docs.google.com/forms/d/1qQlo-OgUssEZ_ADBdCpUiHrB9JXKNjUnEt2KJkn3zBQ/edit?ts=5b645696

PDF for Machine learning http://indico.ictp.it/event/8329/session/7/contribution/29/material/0/0.pdf
Data file http://files.grouplens.org/datasets/movielens/ml-latest-small.zip

____________________________________________________________

Software Carpentry
CODATA-RDA School of Research Data Science
ICTP Conference Room
Mon-Wed, Aug 06-08
Workshop website https://orchid00.github.io/2018-08-06-ICTP/

Welcome to Software Carpentry
We will use this Etherpad to share links and snippets of code, take notes, ask and answer questions, and whatever else comes to mind.
The page displays a screen with three major parts:
* The left side holds today's notes: please edit these as we go along.
* The top right side shows the names of users who are logged in: please add your name and pick the color that best reflects your mood and personality.
* The bottom right is a real time chat window for asking questions of the instructor and your fellow learners.

Instructors
* Marko Vidak, ELIXIR-SI and Faculty of Medicine, University of Ljubljana
* Hugh Shanahan
* Paula Andrea Martinez, ELIXIR-BE
* Louise Bezuidenhout - Institute for Science, Innovation and Society, University of Oxford

Helpers
* Gail Clement

Participants
Please add yourself to the participant list below:
* Abramowicz Tomasz
* Atul Saini
* Bayode Taye
* Maphuti Betty Ledwaba
* Bouchra Chaouni
* Benhrif Oussama
* Caroline Franco
* Mesfin Diro
* Cristina Chavez
* Caterina Cevallos
* Gianluca Coidessa
* Denise Brito
* Herbert Nguruwe
* Joel Defo
* KIVOUILA SAMINOU Luce Evrard
* caroline Ajilogba
* Laterza, Simone
* Sule, Mary-Jane
* Ashok Rai
* Lilian Juma
* Simisani Kelaotswe
* Boutaina Ettetuani
* chadia Ed-driouch
* Vivek Ananth
* Dagim Yoseph
* Neema Mduma
* Mansour Esmaeili
* Naushin Nower
* Mojtaba Khodadadi
* Barbara Glover
* SAmuel Terkper Ahuno
* Solomon Gizaw
* Mohamed Shaltout
* Gustavo Andre Santos
* Gabriela Tenorio
* Ngabo Desire
* Addo David
* Sina Ayangbenro
* Chinedu Obieze
* Ekpe Okorafor
* Fatiha Merazka
* Najmejh Mirian

Friendly and collaborative active participants
Participants are expected to follow our code of conduct: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html

Public content page
All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/

Pre-survey
Make sure you have filled in the Software Carpentry workshop pre-survey so we know your prior knowledge before the workshop https://www.surveymonkey.com/r/swc_pre_workshop_v1?workshop_id=2018-08-06-ICTP

Sticky Notes: Use these on the back of your laptop to signal us if we're going too fast or too slow.
Use one color to indicate that you're happy and everything is fine, and use the other color to indicate that you're having difficulty keeping up or that you need someone to stop by to help you. Tell us what you need.

Icebreaker: Turn to a partner and introduce yourself by name, one word about your research (e.g. 'microbes', 'dogs', 'vectors', 'stars') and one thing you're proud of that you made. It could be a bookshelf, a curry, a 3D plot, a piece of software, your bed this morning, just something you did that you're proud of. Write your partner's name, one word about their research, and the thing they're proud of below.
* A
* Benhrif Oussama : Bioinformatics : Development of new Genomic Data Analysis Algorithm
* C
* Sule, Mary-Jane : Cloud, development of a trusted cloud model

|-----------------------|
| Twitter hash          |
| #datatrieste18        |
|-----------------------|

Open and Responsible Science: Introductory lecture https://drive.google.com/file/d/1mbGXLeM-EXe1aMVhjvhlD_S_Obox0Dmj/view?usp=sharing

Bash Shell
==========
Setup: Please see the workshop website for required software. In addition, you need to download some files to follow this lesson. Please follow the instructions here: http://swcarpentry.github.io/shell-novice/setup.html

Prerequisites
If you have stored files on a computer at all and recognize the word “file” and either “directory” or “folder” (two common words for the same thing), you’re ready for this lesson. If you’re already comfortable manipulating files and directories, searching for files with grep and find, and writing simple loops and scripts, you probably won’t learn much from this lesson.
Some options so that you don't get bored:
* look around to see if anyone near you needs help
* do all of the challenge exercises listed below, and continue on to exercises in succeeding lesson episodes ( http://swcarpentry.github.io/shell-novice/ )
* observe how the lesson is taught and how a workshop is run, in furthering the noble goal of becoming a Carpentry instructor yourself

Feedback
Minute cards http://tiny.cc/triestecards
https://www.surveymonkey.com/r/swc_post_workshop_v1?workshop_id=2018-08-06-ICTP

┏━━━━•❅•°•❈•°•❅•━━━━┓
   ❍ WELCOME DAY2 ❍
┗━━━━•❅•°•❈•°•❅•━━━━┛

Unix shell
Lessons and follow-up exercises: http://swcarpentry.github.io/shell-novice/
Presentation with slides - this is a concise version of lessons available at http://swcarpentry.github.io/shell-novice/
Presentation: https://docs.google.com/presentation/d/1qpAkfsczLngzeKoXFh9uM_EAkB6MNxhCyjBwwGXH-Uc/edit?usp=sharing
The slides also contain a brief description of ELIXIR and its activities. There is also a link to the ELIXIR-SI e-Learning platform (EeLP) where more Unix training courses are available for registered users: https://elixir.mf.uni-lj.si/?lang=en

Git
===
http://swcarpentry.github.io/git-novice/
Cheat sheet of Git https://services.github.com/on-demand/downloads/github-git-cheat-sheet.pdf

R session
========
Materials: recommended reading after the session http://swcarpentry.github.io/swc-releases/2016.06/r-novice-gapminder/index.html
Slides R community: tinyurl.com/Rcommunityslides

Do you relate? http://swcarpentry.github.io/swc-releases/2016.06/r-novice-gapminder/fig/bad_layout.png
How to improve? Project management
Create a folder called RProjects under Documents

Exercise 1 - New Rstudio Project (4 min)
* RStudio menu (top left corner): click the File menu button
* Then New Project
* Click New Directory
* Click New Project
* In Directory name type the name of your project, e.g. Rfoundation (Browse and select a folder where to locate your project, i.e.
the RProjects folder)
* Lastly, click the Create Project button

Lesson Instructions http://swcarpentry.github.io/swc-releases/2016.06/r-novice-gapminder/02-project-intro/

RProjects
|_ RIntro
   |_ data
   |_ scripts
   |_ figures

Why Project management?
"Managing your projects in a reproducible fashion doesn't just make your science reproducible, it makes your life easier."

Documents/RProjects/RIntro

R is case sensitive
- camelCase
- snake_case

R style guide:
- http://adv-r.had.co.nz/Style.html
- https://github.com/r-lib/styler

functions
fun:
data

Gapminder dataset in csv format, both links are the same dataset
- http://tiny.cc/SWCgapminder
- https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-pages/data/gapminder-FiveYearData.csv

┏━━━━•❅•°•❈•°•❅•━━━━┓
   ❍ WELCOME DAY3 ❍
┗━━━━•❅•°•❈•°•❅•━━━━┛

R session
========
If you are using your own computer make sure you have the following R Packages installed
install.packages("tidyverse")
install.packages("rmarkdown")
install.packages("knitr")

ggplot2
# minimum layers of ggplot
ggplot(data = gapminder, aes(x = x, y = y)) +
  geom_point()

# adding shape, shape = column
# adding colour
ggplot(data = gapminder, aes(x = x, y = y, colour = column)) +
  geom_point()

# custom scale in the x axis
# unique_years is not defined in these notes; assumed here to be the distinct years in the data
unique_years <- unique(gapminder$year)
ggplot(data = gapminder, aes(x = year, y = pop, colour = continent)) +
  geom_point() +
  scale_x_continuous(breaks = unique_years)

ggplot(data = gapminder, aes(x = year, y = pop, colour = continent)) +
  geom_point() +
  scale_x_continuous(breaks = unique_years) +
  scale_y_continuous(breaks = c(0, 100000000, 200000000, 500000000, 1000000000),
                     labels = c(0, "100 mi", "200 mi", "500 mi", "1 billion"))

# creating a histogram of lifeExp coloured by continent
# creating a histogram of each continent lifeExp
ggplot(______, aes(____)) +
  geom____() +
  facet_wrap(~ continent)

# create a histogram by continent and a new theme
ggplot(data = gapminder, aes(x = lifeExp, fill = continent)) +
  geom_histogram(bins = 12) +
  facet_wrap(~ continent) +
  theme_dark()

# Create a line plot of year and lifeExp coloured by continent
# add custom labels with labs()

# create a new function frame
function_name <- function (....){
  ......
}

# create a function to subset countries by the first letter
countriesByLetter <- function(){
}

# fl is a vector of first letters, e.g. c("A", "B")
countriesByLetter <- function(fl){
  # first letter of every country name
  first_letters <- substr(as.character(gapminder$country), start = 1, stop = 1)
  my_countries <- gapminder[first_letters %in% fl, ]
  return(my_countries)
}

# Copy the function countriesByLetter and
# make a new function plotCountriesByLetter
# add some code to make the previous plot
# and save the plot in your figures folder as a .png
# Tip to save the plot
names <- paste0(fl, collapse = "_")
paste0("figures/Countries_by_letter_" , names, ".png")

# ten minutes: talk with somebody and show them how you solved the last exercise
# or where you got stuck, and solve the problem together

Everyone loves emojis
#before
:D <-+ +++++
:) <-+++++++++++
:| <- +++++++
:( <- ++++
:S <-+++
#after
:D <- +++++++++++++++
:) <-+++++++
:| <-+++++
:( <-+
:S <-++++
AWESOME!!

Summary so far
Today we have learned to use ggplot2, the grammar of graphics
We have learned to write reproducible code with functions
Now we are going to learn the grammar of data manipulation

dplyr
gapminder %>% select(starts_with("c"))
select(gapminder, starts_with("c"))

# R conditionals
== equals
!= different
> greater
< lower
>=
<=
& and
| or

gapminder %>%
  filter(continent == "Africa") %>%
  select(year, country, lifeExp)

#another option
# select African countries for lifeExp, country and year
af <- gapminder[gapminder$continent == "Africa", ] %>%
  select(lifeExp, country, year)
nrow(af)
write_csv(af, path = "data/african-countries.csv")

Final challenge
Challenge yourself to create a report with the gapminder dataset.
Answer at least one question using the dataset, or as many as you want.
tail -f --lines=500 simple.dag.dagman.out
https://raw.githubusercontent.com/orchid00/Report_example/master/report/blankReport.Rmd

Feedback
Minute cards http://tiny.cc/triestecards

Extra links
=========
* R first session August 7: https://github.com/orchid00/Report_example/blob/master/scripts/001_firstSteps.R
* R morning session August 8: https://github.com/orchid00/Report_example/blob/master/scripts/002_plots.R
* R data structures image https://github.com/orchid00/R4da/blob/master/img/Rdatasctructures.png
* Gapminder website https://www.gapminder.org/ when you have 15 minutes you can take the test of common knowledge http://forms.gapminder.org/s3/test-2018
* R community slides tinyurl.com/Rcommunityslides
* Shiny tutorial https://bioinformatics-core-shared-training.github.io/shiny-bioinformatics/tutorial
* Connecting Git with Rstudio from zero http://happygitwithr.com/
* 10 minutes markdown tutorial https://commonmark.org/help/tutorial/
* If you want to learn github-flavored markdown in 3 minutes: https://guides.github.com/features/mastering-markdown/
* R script sections: https://support.rstudio.com/hc/en-us/articles/200484568-Code-Folding-and-Sections
* RStudio shortcuts: Alt+Shift+K
* colour names http://www.stat.columbia.edu/%7Etzheng/files/Rcolor.pdf
* ggplot2 color palettes: http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/
* https://www.rstudio.com/resources/cheatsheets/
* http://www.ggplot2-exts.org/gallery/
* https://www.r-graph-gallery.com/
* http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html
* Cool plugin to "manually" edit your ggplots directly from Rstudio: https://github.com/calligross/ggthemeassist
  You can install it the same way as an R package
  This is how it works: https://raw.githubusercontent.com/calligross/ggthemeassist/master/examples/ggThemeAssist2.gif
* Also, this package
allows you to obtain 'publication ready plots' and it's built on top of ggplot2: http://www.sthda.com/english/rpkgs/ggpubr/
  **Cool feature**: if you have 2 or more groups that you are comparing and plotting (e.g. with boxplots, violin plots, etc.) this includes a built-in function to do the statistics between the groups (for instance t-tests) and add the lines and significance values to the plot: http://www.sthda.com/english/rpkgs/ggpubr/reference/stat_compare_means.html
* https://rmarkdown.rstudio.com/developer_document_templates

------------------------------------------------------------

Ethics exercise
A Command Line Interface (CLI) such as Bash requires training and practice. There is an added difficulty for non-English speakers in that the commands were originally developed by English speakers. Most users find Graphical User Interfaces (GUIs) very intuitive. On the other hand, CLIs give much more control and require much less development. In the future it is likely that you will be teaching CLI to colleagues and students. Yesterday we learnt that one of the key elements of Responsible Conduct of Research is being a responsible mentor and colleague - this includes being aware of how cultural and linguistic differences can cause learning challenges for those we interact with.
Please take 5 minutes to reflect on the following question: if you are going to be training non-English speakers who have only used GUIs to use Bash and other shell languages, what steps would you take to make sure that they did not experience any unnecessary learning/use chal

┏━━━━•❅•°•❈•°•❅•━━━━┓
   ❍ WELCOME DAY4 ❍
┗━━━━•❅•°•❈•°•❅•━━━━┛

Open Science recap and survey intro - https://drive.google.com/open?id=1F0az3c9_3TkZp9g_ZC6ciifbfDEsaVU-
Data Management Plan slides - https://drive.google.com/open?id=1ZfYAh5QGIgK2UEUBWsCJpJAum2P2gle2
DMP exercise - https://repositorian.github.io/DOAJ_Exercise
Feedback: http://tiny.cc/tr8iestecards

AuthorCarpentry & Reproducibility Reporting
- Lesson is here: https://authorcarpentry.github.io/DT2018/
- Student files are here: https://github.com/AuthorCarpentry/DTSTUDENT2018
- Rendered final report on the web is here:

The student files needed for the lab sessions are in a github repository at https://github.com/AuthorCarpentry/DTSTUDENT2018
1. Fork this repository into your own GitHub account, renaming the folder from DTSTUDENT2018 to your name
2. Clone the repository you just made in your account to your desktop
3. In RStudio, create a new project from the repository now on your desktop.
This will make sure the student files are in the working directory (so all of the paths to data or image files work properly)

Adding more features to your YAML Heading
output:
  html_document:
    code_folding: hide
    css: custom.css
    number_sections: yes
    toc: yes
    toc_depth: 2
    toc_float: yes
    theme: readable
    highlight: kate

┏━━━━•❅•°•❈•°•❅•━━━━┓
   ❍ WELCOME DAY 5 ❍
┗━━━━•❅•°•❈•°•❅•━━━━┛

Intro to RDM - https://drive.google.com/open?id=1LEYLkTmPvUfamYDzMfQfX1SsHFl99ZPa
FAIR data - https://drive.google.com/open?id=12o-uybNjyOGTcIn54X1faJgoV3Ukpuqw
FAIR discussion - https://drive.google.com/open?id=1hEmIzPR3uoSiKFYQzzlcmxf9dTHuTQTi
Open and Responsible Science part 2 - https://drive.google.com/file/d/1BA8aBAl65c3p9JY9sW-z_DVIZAziFGKb/view?usp=sharing
Ethics handout - https://drive.google.com/file/d/1mK2HY2rt0TElGwRem5WU3tnyPNPPMxlA/view?usp=sharing
electronic Journal Delivery System - http://ejds.ictp.it/
INASP materials - https://drive.google.com/open?id=16spdJPJDOcXmuhoJcoDXPvthBo_oXedu

-----------------------------------------------------
AuthorCarpentry & Reproducibility Reporting - Friday August 9, 2017

Add a dynamic parameter to your reproducible report! (and receive a Knitr sticker!!!)
This example makes the Institution you work for a changeable variable that you can select from a pick list when you knit.

Step 1. Add the following to the bottom of your YAML header (but *above* the --- )
params:            # make sure this is flush left
  institution:     # indented two spaces, should line up with 'r' in params above
    choices:       # two more indented spaces
    - International Centre for Theoretical Physics   # same indent as 'choices', here and for all of the lines below here
    - CODATA-RDA Summer School
    - Elsevier Inc
    - California Institute of Technology
    - Add another Institution here, either your own or one you have always wanted to work for! :-)
    input: select
    label: 'Institution:'
    value: Pick one of the choices above to serve as a default

Step 2.
Replace all occurrences of 'Institution' in your document with the following code that will fill in whatever value you select from the picklist. Hint: One occurrence of your Institution may occur in the YAML, within the 'author' field. Make sure the code is surrounded by backticks, not quotes.
`r params$institution`
Step 3. Save your work
Step 4. On the Knit button, select 'Knit with parameters'. You should then see a picklist appear that allows you to select from the Institutions you included in the YAML
Step 5. If all works properly, your report will now include your selected Institution!
Step 6. Answer the following question here on the Etherpad (this earns you a knitr hex sticker!): How might you apply the parameterization feature in a reproducible report about your own research?

-----------------------------------------------------
AuthorCarpentry: Licensing and ORCID
* CODATA-RDA Legal Interoperability of Research Data
  * Interest Group: https://rd-alliance.org/groups/rdacodata-legal-interoperability-ig.html
  * Principles & Guidelines: https://www.rd-alliance.org/rda-codata-legal-interoperability-research-data-principles-and-implementation-guidelines-now
  * Reading List (Zotero database): https://www.zotero.org/groups/1757514/legalinteropdata
* Creative Commons Choose a License https://creativecommons.org/choose/
* ORCID (Online Researcher Identifier): https://orcid.org/
* CrossCite (For any DOI, get the perfect citation in the style you desire): https://crosscite.org/

-----------------------------------------------------
AuthorCarpentry: Citation Styles in Your RMarkdown document
Step 1. Pick your desired citation style from the Citation Style Repository (courtesy of Zotero): https://www.zotero.org/styles?q=biomed-central
Step 2. Click on the hyperlink for the style you want and save the file with extension .csl to the RStudio project folder on your computer
Step 3. This is an xml file (xml is another markup language).
Pandoc will read this file and create inline citations and references in the reference list according to this style.
Step 4. In the YAML header, add the line csl: and insert the exact path and name of your file.csl
Step 5. Knit to the output of your choice and note the change in formatting for your citations and reference list!

-----------------------------------------------------
YOUR GIT ID's
francocarol
@asocrai
c0ra
mjaysule
Daisy1984
neylicious
mesfind
CObieze
Ayansina
samkiv
Dengabo
dagimy
TayeB2018
Solomon2018
Akorfa
sahuno
CarolineNWU
Lilian9
joelthegrace
gabytv10
drabrito
@mkhm
@scienception
Simisani

#To install jrGgplot2 (on your laptop)
install.packages("drat")
drat::addRepo("jr-packages")
install.packages("jrGgplot2", dependencies = TRUE)

┏━━━━•❅•°•❈•°•❅•━━━━┓
   ❍ WELCOME DAY 9 ❍
┗━━━━•❅•°•❈•°•❅•━━━━┛

# Hands-On Exercise:
# Implementing a Basic Recommender Engine for Movies
#
# Instructor: Dr. Ekpe Okorafor
# August 14th, 2018
# DataTrieste18

install.packages("data.table")
install.packages("ggplot2")
install.packages("recommenderlab")

# Exercise 1
movies = read.csv("/Users/ekpe/recommender/ml-latest-small/movies.csv") #load movies.csv file
str(movies) #list the structure of movies
ratings = read.csv("/Users/ekpe/recommender/ml-latest-small/ratings.csv") #load ratings.csv file
str(ratings) #list the structure of ratings

# Exercise 2
library(ggplot2)
plot <- ggplot(ratings, aes(x = rating)) + geom_histogram()
plot

# Exercise 3
library(data.table)
movgen <- as.data.frame(movies$genres, stringsAsFactors=FALSE)
movgen2 <- as.data.frame(tstrsplit(movgen[,1], '[|]', type.convert = TRUE), stringsAsFactors = FALSE)
colnames(movgen2) <- c(1:7)
head(movgen2, n=4)

# Exercise 4
movgen_list <- c("Action", "Adventure", "Animation", "Children", "Comedy", "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror", "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western")
movgen_matrix <- matrix(0,9126,18) #empty matrix
movgen_matrix[1,] <- movgen_list
#set first row to genre list
colnames(movgen_matrix) <- movgen_list #set column names to genre list

#iterate through matrix
for (i in 1:nrow(movgen2)) {
  for (c in 1:ncol(movgen2)) {
    genmat_col = which(movgen_matrix[1,] == movgen2[i,c])
    movgen_matrix[i+1,genmat_col] <- 1
  }
}

#convert into dataframe
movgen_matrix2 <- as.data.frame(movgen_matrix[-1,], stringsAsFactors=FALSE) #remove first row, which was the genre list
for (c in 1:ncol(movgen_matrix2)) {
  movgen_matrix2[,c] <- as.integer(movgen_matrix2[,c])
} #convert from characters to integers

# Exercise 5
binary_ratings <- ratings
for (i in 1:nrow(binary_ratings)){
  if (binary_ratings[i,3] > 3){
    binary_ratings[i,3] <- 1
  } else {
    binary_ratings[i,3] <- -1
  }
}
head(binary_ratings, n=7)

binary_ratings2 <- dcast(binary_ratings, movieId~userId, value.var = "rating", na.rm=FALSE)
for (i in 1:ncol(binary_ratings2)){
  binary_ratings2[which(is.na(binary_ratings2[,i]) == TRUE),i] <- 0
}
binary_ratings2 = binary_ratings2[,-1] #remove movieIds col. Rows are movieIds, cols are userIds
dim(binary_ratings2)

#Remove rows that are not rated from movies dataset
unique_movieIds <- (unique(movies$movieId)) #9125
unique_ratings <- (unique(ratings$movieId)) #9066
movies2 <- movies[-which((unique_movieIds %in% unique_ratings) == FALSE),]
rownames(movies2) <- NULL
dim(movies2)

#Remove rows that are not rated from movgen_matrix2
movgen_matrix3 <- movgen_matrix2[-which((unique_movieIds %in% unique_ratings) == FALSE),]
rownames(movgen_matrix3) <- NULL
dim(movgen_matrix3)

#Calculate dot product for User Profiles (rows = 18 genres, cols = 671 users)
result = matrix(0,18,671)
for (c in 1:ncol(binary_ratings2)){
  for (i in 1:ncol(movgen_matrix3)){
    result[i,c] <- sum((movgen_matrix3[,i]) * (binary_ratings2[,c]))
  }
}

#Convert to Binary scale
for (i in 1:nrow(result)){
  for (j in 1:ncol(result)) {
    if (result[i,j] < 0){
      result[i,j] <- 0
    } else {
      result[i,j] <- 1
    }
  }
}

# Exercise 6
result2 <- result[,1] #First user's profile (users are the columns of result, so take column 1)
sim_mat <- rbind.data.frame(result2, movgen_matrix3)
sim_mat <-
data.frame(lapply(sim_mat,function(x){as.integer(x)})) #convert data to type integer

#Calculate Jaccard distance between user profile and all movies
library(proxy)
sim_results <- dist(sim_mat, method = "Jaccard")
sim_results <- as.data.frame(as.matrix(sim_results[1:9066]))
rows <- which(sim_results == min(sim_results))
#Recommended movies
movies[rows,]

#Exercise 7
library(recommenderlab)
recommender_models <- recommenderRegistry$get_entries(dataType = "realRatingMatrix")
names(recommender_models) # Display models
lapply(recommender_models, "[[", "description") # Describe models
recommender_models$UBCF_realRatingMatrix$parameters # List parameters of model

library(reshape2)
#Create ratings matrix. Rows = userId, Columns = movieId
ratingmat <- dcast(ratings, userId~movieId, value.var = "rating", na.rm=FALSE)
ratingmat <- as.matrix(ratingmat[,-1]) #remove userIds

#Convert rating matrix into a recommenderlab sparse matrix
ratingmat <- as(ratingmat, "realRatingMatrix")

#Normalize the data
ratingmat_norm <- normalize(ratingmat)

#Create Recommender Model.
# "UBCF" stands for User-Based Collaborative Filtering
recommender_model <- Recommender(ratingmat_norm, method = "UBCF", param=list(method="Cosine",nn=30))
recom <- predict(recommender_model, ratingmat[1], n=10) #Obtain top 10 recommendations for 1st user in dataset
recom_list <- as(recom, "list") #convert recommenderlab object to readable list

#Obtain recommendations
recom_result <- matrix(0,10)
for (i in c(1:10)){
  recom_result[i] <- movies[as.integer(recom_list[[1]][i]),1]
}
recom_result

# ---- Exercise 5
# Author: Mojtaba Khodadadi
# github: @mkhm
# stackoverflow: 3454902
binary_ratings <- ratings
head(ratings)
binary_ratings$rating <- as.integer(ratings$rating/4.0)
head(binary_ratings)
where = which(binary_ratings[,3] == 0)
binary_ratings[where,3] <- -1
head(binary_ratings)
ratings

┏━━━━•❅•°•❈•°•❅•━━━━┓
   ❍ WELCOME DAY 10 ❍
┗━━━━•❅•°•❈•°•❅•━━━━┛

Materials for today are in the Artificial Neural Network folder in the materials folder https://drive.google.com/drive/folders/1PvfPCOVdbdOIowcfQAhnTguuwXJnfx8U?usp=sharing
You can deposit your presentations at https://drive.google.com/drive/folders/1vRnGy-YX1XMWxciYml0-yzvnuMPw77a4?usp=sharing
Presentation on RDA overview and implementation of RDA outputs to make research workflows FAIR https://docs.google.com/presentation/d/1dA4mJ2_rFskoo8TE9wmvyTulqYaEpx_fR0Qkzmss_u4/edit?usp=sharing

┏━━━━•❅•°•❈•°•❅•━━━━┓
   ❍ WELCOME DAY 11 ❍
┗━━━━•❅•°•❈•°•❅•━━━━┛

If you are using a Windows laptop then you need to install a terminal client program with ssh.
Examples of software you can use (and freely download) are https://putty.org and http://cmder.net (the second one looks nicer though I've never used it - Hugh)

DOSAR (Distributed Organization for Scientific and Academic Research) - https://opensciencegrid.org/dosar/ and http://www.dosar.org/
Schedule - https://opensciencegrid.org/dosar/DataTrieste2018/School/
Materials - https://opensciencegrid.org/dosar/DataTrieste2018/Materials/

┏━━━━•❅•°•❈•°•❅•━━━━┓
   ❍ WELCOME DAY 12 ❍
┗━━━━•❅•°•❈•°•❅•━━━━┛

EGI Computational Infrastructures (Continued)
Cloud and Jupyter Notebooks https://documents.egi.eu/document/3349
Jupyter notebook url after registration https://training.fedcloud-tf.fedcloud.eu

####################################################
IoT/Big Data https://goo.gl/57f4DT
Also, here is the link to download the zip file for the docker image: https://github.com/scaledaction/sentiment-analysis/archive/master.zip
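
If you prefer to fetch the sentiment-analysis zip file from a terminal rather than the browser, something like the following may work. This is only a sketch: it assumes curl and unzip are installed, and the local file name (ARCHIVE_FILE) and the function name (fetch_archive) are made up for illustration; only the URL comes from the notes above.

```shell
# URL taken from the notes; the other names below are illustrative assumptions.
ARCHIVE_URL="https://github.com/scaledaction/sentiment-analysis/archive/master.zip"
ARCHIVE_FILE="sentiment-analysis-master.zip"

fetch_archive() {
    # -L follows redirects, -o writes the download to the given file name
    curl -L -o "$ARCHIVE_FILE" "$ARCHIVE_URL" || return 1
    # unpack quietly into the current directory
    unzip -q "$ARCHIVE_FILE"
}
```

Nothing is downloaded until you type fetch_archive; afterwards, ls should show the unpacked folder next to the zip file.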