Welcome to The Carpentries Etherpad!

This pad is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents.

Use of this service is restricted to members of The Carpentries community; this is not for general purpose use (for that, try https://etherpad.wikimedia.org).

Users are expected to follow our code of conduct: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html

All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/

----------------------------------------------------------------------------

*NWU DC Workshop 22 May 2023
Website: https://nwu-eresearch.github.io/2023-05-22-NWU-DC/

*R and RStudio download links: https://datacarpentry.org/R-ecology-lesson/#software-setup

*Day 4 Attendance: Name/email/affiliation/twitter
1. Martin Dreyer, martin.dreyer@nwu.ac.za, NWU, @Amfdrey
2. Rozeena Arif, r.arif.1@research.gla.ac.uk, University of Glasgow
3. Marzieh Behrouz, marzieh.behrouz@gmail.com
4. Melize Meyer, melize.m99@gmail.com, NWU
5. Guillaume De Swardt, guillaumedeswardt53@gmail.com
6. Carol Mmakola, carolmmakola@gmail.com

*Data export:
# export to data folder
write_csv(surveys_complete, file = "data/surveys_complete.csv")

*Exercises

*Exercise 4
Use what you just learned to create a scatter plot of weight over species_id with the plot types showing in different colors. Is this a good way to show this type of data?
Feedback: I believe this is a better way to visually depict the weight ranges for each species ID.

*Exercise 3
How many animals were caught in each plot_type surveyed?
Use group_by() and summarize() to find the mean, min, and max hindfoot length for each species (using species_id). Also add the number of observations (hint: see ?n).
surveys %>%
  filter(!is.na(hindfoot_length)) %>%
  group_by(species_id) %>%
  summarize(mean_hindfoot_length = mean(hindfoot_length),
            min_hindfoot_length = min(hindfoot_length),
            max_hindfoot_length = max(hindfoot_length),
            n = n())

What was the heaviest animal measured in each year? Return the columns year, genus, species_id, and weight.
surveys %>%
  filter(!is.na(weight)) %>%
  group_by(year) %>%
  filter(weight == max(weight)) %>%
  select(year, genus, species_id, weight)

*Exercise 2
Create a new data frame from the surveys data that meets the following criteria: it contains only the species_id column and a new column called hindfoot_cm containing the hindfoot_length values (currently in mm) converted to centimeters. In this hindfoot_cm column, there are no NAs and all values are less than 3.
* Hint: think about how the commands should be ordered to produce this data frame!
surveys_hindfoot_cm <- surveys %>%
  filter(!is.na(hindfoot_length)) %>%
  mutate(hindfoot_cm = hindfoot_length / 10) %>%
  filter(hindfoot_cm < 3) %>%
  select(species_id, hindfoot_cm)

*Exercise 1
1. Using pipes, subset the surveys data to include animals collected before 1995 and retain only the columns year, sex, and weight.
Feedback: 21486 rows with 3 columns
Feedback: 21486 obs. of 3 variables

*Day 3 Attendance: Name/email/affiliation/twitter
1. Martin Dreyer, martin.dreyer@nwu.ac.za, NWU, @Amfdrey
2. Melize Meyer, melize.m99@gmail.com, NWU
3. Guillaume De Swardt, guillaumedeswardt53@gmail.com
4. Rozeena Arif, r.arif.1@research.gla.ac.uk, University of Glasgow
5. Victory Samuel, samuelvictory4@gmail.com, NWU
6. Marzieh Behrouz, marzieh.behrouz@gmail.com
7. Innocent N Mthembu, mthembui@unizulu.ac.za, Unizulu
8. Carol Mmakola, carolmmakola@gmail.com

*Data for R:
download.file(url = "https://ndownloader.figshare.com/files/2292169",
              destfile = "data_raw/portal_data_joined.csv")

*Exercises

*Exercise 5
* Rename “F” and “M” to “female” and “male” respectively.
* Now that we have renamed the factor level to “undetermined”, can you recreate the barplot such that “undetermined” is first (before “female”)?

*Exercise 4
1. Change the columns taxa and genus in the surveys data frame into a factor.
2. Using the functions you learned before, can you find out…
* How many rabbits were observed?
* How many different genera are in the genus column?
Feedback:
1. 75
2. 26

*Exercise 3
1. Using this vector of heights in inches, create a new vector, heights_no_na, with the NAs removed.
heights <- c(63, 69, 60, 65, NA, 68, 61, 70, 61, 59, 64, 69, 63, 63, NA, 72, 65, 64, 70, 63, 65)
2. Use the function median() to calculate the median of the heights vector.
3. Use R to figure out how many people in the set are taller than 67 inches.
Feedback: median = 64

*Exercise 2
* What will happen in each of these examples? (hint: use class() to check the data type of your objects):
num_char <- c(1, 2, 3, "a")
num_logical <- c(1, 2, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
tricky <- c(1, 2, 3, "4")
* Why do you think it happens?
* Because of the quotation marks (inverted commas), R uses the character type in three of the cases. In the second case it treats the vector as numeric and converts TRUE to 1, because TRUE and FALSE are treated as 1 and 0 in the backend.

*Feedback
What are the values after each statement in the following?
mass <- 47.5            # mass?
age <- 122              # age?
mass <- mass * 2.0      # mass?
age <- age - 20         # age?
mass_index <- mass/age  # mass_index?

*Feedback
1. mass = 47.5
2. age = 122
3. mass = 95
4. age = 102
5. mass_index = 0.9313725
Feedback: mass 47.5, age 122, mass 95, age 102, mass_index 0.9313725
Feedback: mass*2 = 95, age-20 = 102

*Day 2 Attendance: Name/email/affiliation/twitter
1. Martin Dreyer, martin.dreyer@nwu.ac.za, NWU, @Amfdrey
2. Rozeena Arif, r.arif.1@research.gla.ac.uk, University of Glasgow, UK
3. Oluwadara Omotayo, alamuoluwadara@gmail.com, NWU
4. Melize Meyer, melize.m99@gmail.com, NWU
5. Victory Samuel, samuelvictory4@gmail.com, NWU
6. Guillaume De Swardt, guillaumedeswardt53@gmail.com
7. Marzieh Behrouz, marzieh.behrouz@gmail.com
8. Innocent N Mthembu, Mthembui@unizulu.ac.za, University of Zululand
9. Carol Mmakola, carolmmakola@gmail.com
10. Winny Nekesa Akullo, nekesawinny@gmail.com, NSSF

*OpenRefine: https://datacarpentry.org/OpenRefine-ecology-lesson/03-exploring-data.html

*Exercises

*Exercise
1. Using faceting, find out how many years are represented in the census.
2. Which years have the most and least observations?
3. Is the column formatted as Number, Date, or Text?
Feedback: 1. 16; 2. 1993 (least), 1978 (most); 3. Text
Feedback: 1. 16; 2. most = 1978, least = 1993; 3. Text

*Exercise
The dataset includes other numeric columns that we will explore in this exercise:
* period - Unique number assigned to each survey period
* plot - Plot number the animal was caught on, from 1 to 24
* recordID - Unique record ID number to facilitate quick reference to a particular entry
Transform the columns period, plot, and recordID from text to numbers.
1. How does changing the format change the faceting display for the yr column?
2. Can all columns be transformed to numbers?
*Feedback
1. It displays a bar chart for the year column, representing how often each year occurs.
2. No, it generates an error message for text columns, as it cannot find any numeric digits to convert them into numbers.
*Feedback
2. No

*Exercise
1. For a column you transformed to numbers, edit one or two cells, replacing the numbers with text (such as abc) or blank (no number or text).
2. Use the pulldown menu to apply a numeric facet to the column you edited. The facet will appear in the left panel.
3. Notice that there are several checkboxes in this facet: Numeric, Non-numeric, Blank, and Error. Below these are counts of the number of cells in each category. You should see checks for Non-numeric and Blank if you changed some values.
4. Experiment with checking or unchecking these boxes to select subsets of your data.
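The numeric-facet categories above can be roughly mimicked in R, the language used on Days 3 and 4. This is a minimal sketch and does not reproduce OpenRefine's own logic. The column values are hypothetical, and the mapping of Numeric, Non-numeric and Blank onto R tests is an assumption. OpenRefine's Error category has no direct counterpart here.

```r
# Hypothetical values for a "plot" column after two cells were edited
# to text ("abc") and blank (""), as in the exercise above.
plot_text <- c("2", "3", "abc", "", "24", "7")

# as.numeric() converts text that looks like a number; anything else becomes NA.
# suppressWarnings() hides R's "NAs introduced by coercion" warning.
plot_num <- suppressWarnings(as.numeric(plot_text))

# Approximate the facet's checkboxes (assumed mapping, see lead-in)
is_blank <- plot_text == ""
is_numeric <- !is.na(plot_num)
is_non_numeric <- !is_blank & !is_numeric

facet_counts <- c(numeric = sum(is_numeric),
                  non_numeric = sum(is_non_numeric),
                  blank = sum(is_blank))
print(facet_counts)

# Keep only the non-numeric cells, like ticking only "Non-numeric" in the facet
print(plot_text[is_non_numeric])
```

The sketch also shows why not every column can become numeric: text such as abc has no numeric value, so R turns it into NA instead of a number.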
*Day 1 Attendance: Name/email/affiliation/twitter
1. Martin Dreyer, martin.dreyer@nwu.ac.za, NWU, @Amfdrey
2. Melize Meyer, melize.m99@gmail.com, NWU, NA
3. Marzieh Behrouz, marzieh.behrouz@gmail.com, Shiraz University of Technology
4. Victory Samuel, samuelvictory4@gmail.com, NWU
5. Ruth Adebayo, adebayoruth101@gmail.com
6. Innocent N Mthembu, Mthembui@unizulu.ac.za
7. Carol Mmakola, carolmmmakola@gmail.com
8. Rozeena Arif, r.arif.1@research.gla.ac.uk, University of Glasgow

*Data Cleaning with OpenRefine
Data: https://datacarpentry.org/OpenRefine-ecology-lesson/data/Portal_rodents_19772002_simplified.csv
Lesson: https://datacarpentry.org/OpenRefine-ecology-lesson/03-exploring-data.html

*Data Organisation in Spreadsheets

*Dates as data
Challenge: pulling month, day and year out of dates
* Let’s create a tab called dates in our data spreadsheet and copy the ‘plot 3’ table from the 2014 tab (that contains the problematic dates).
* Let’s extract month, day and year from the dates in the Date collected column into new columns. For this we can use the following built-in Excel functions:
  * YEAR()
  * MONTH()
  * DAY()

Challenge: pulling hour, minute and second out of the current time
The current time and date are best retrieved using the functions NOW(), which returns the current date and time, and TODAY(), which returns the current date. The results will be formatted according to your computer’s settings.
1. Extract the year, month and day from the current date and time string returned by the NOW() function.
2. Calculate the current time using NOW()-TODAY().
3. Extract the hour, minute and second from the current time using the functions HOUR(), MINUTE() and SECOND().
4. Press F9 to force the spreadsheet to recalculate the NOW() function, and check that it has been updated.
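For comparison, the same extractions can be sketched in R, the language used on Days 3 and 4. This is a rough analogue, not the spreadsheet functions themselves. The date below is a hypothetical placeholder for one value from the Date collected column. Sys.time() and Sys.Date() are assumed to play the roles of NOW() and TODAY().

```r
# Hypothetical placeholder for one value from the "Date collected" column
date_collected <- as.Date("2014-01-24")

# Analogues of YEAR(), MONTH() and DAY(): format() returns text,
# so as.integer() converts each part to a number.
year  <- as.integer(format(date_collected, "%Y"))
month <- as.integer(format(date_collected, "%m"))
day   <- as.integer(format(date_collected, "%d"))

# Analogues of NOW() (current date and time) and TODAY() (current date)
now   <- Sys.time()
today <- Sys.Date()

# Analogues of HOUR(), MINUTE() and SECOND() on the current time.
# as.POSIXlt() splits a date-time into components in the local time zone.
now_parts <- as.POSIXlt(now)
hour   <- now_parts$hour
minute <- now_parts$min
second <- floor(now_parts$sec)  # $sec includes fractions of a second

cat("Collected:", year, month, day, "\n")
cat("Today:", format(today), "time:", hour, minute, second, "\n")
```

Unlike a spreadsheet cell, `now` does not update by itself. Calling Sys.time() again plays roughly the role of pressing F9.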