*Databases and SQL workshop * *This EtherPad: *pad.carpentries.org/202 5 - 03 - 26 -uio Software Carpentry code of conduct: *https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html Install DB browser for SQLite (if you haven't already): *https://sqlitebrowser.org/dl/ If you don't have admin rights on your laptop, you can download a "portable" version of DB browser that does not require admin to install: https://sqlitebrowser.org/blog/portableapp-for-3-12-0-release-now-available/ Download the database survey.db (if you haven't already): *https://swcarpentry.github.io/sql-novice-survey/files/survey.db Today's Lesson https://swcarpentry.github.io/sql-novice-survey/ *Today's program (subject to change) *09.00: Intro, Selecting Data, Sorting and Removing Duplicates *10.30: Coffee *10.45: Filtering, Calculating new values, Missing Data (lesson) *12.00: Lunch break - In this building: behind the coffee shop; or across the plaza *13.00: Missing Data (excercises), Aggregation *14.30: Coffee *14.45: Combining Data, Data Hygiene, Creating and Modifying Data *16.00: Wrap-up *Instructors Nicholas David Yardley Ball, Kyrre Traavik Låberg *Notes The six basic clauses of an SQL SELECT statement: SELECT column_1, column_2 ... FROM table_1 INNER JOIN table_2 ON join_condition ... WHERE filter_condition_1 ... GROUP BY column_1 ... HAVING filter_condition_after_aggregation ... ORDER BY column ... Alan Beaulieu, Learning SQL (O'Reilly) -> available online through UiO Bibliothek, focussed on MySQL For future workshops: move the relation diagram from the Combining data section to the beginning of the workshop (I agree) SELECT columns FROM tables WHERE filter_conditions *Commenting in DB Browser -- One line comment. /* Multi line comments */ *Episode 4 - Calculating New Values: A note on the last excercise (Selecting Major Site Identifiers): BEWARE, THIS REVEALS THE SOLUTION TO THE EXERCISE, SO DON'T PEEK UNTIL YOU'VE TRIED TO SOLVE IT There are two major ways of selecting distinct values in a set: distinct (as shown in the exercise) *SELECT DISTINCT substr(site, 1, instr(site, '-') - 1) AS MajorSite FROM Visited; group by (which we will talk about in Lesson 6 - Aggregation) *SELECT substr(site, 1, instr(site, '-') - 1) AS MajorSite FROM Visited GROUP BY MajorSite; Arguably GROUP BY is the most robust, as DISTINCT is not supported by some database systems. As a rule of thumb DISTINCT removes duplicates, and is a quick fix if you just want this, whereas GROUP BY lets you use aggregate functions and add more sophistication to your query in the future. ---- UNION is also never explained until it is used in the excercise UNIONS. If you want to learn more about the use of UNION than what is explained in the excercise, there is a good primer here: https://www.simplilearn.com/tutorials/sql-tutorial/sql-union *Episode 6 - Aggregation A comment on this section: "We used count(reading) here, but we could just as easily have counted quant or any other field in the table, or even used count(*), since the function doesn’t care about the values themselves, just how many values there are": This is not always true: Using count(*) is different from using count on specific columns, count(person), etc. if there are missing (NULL) values. Try for example: *SELECT COUNT(*) FROM Survey; = 21 *SELECT COUNT(person) FROM Survey; = 19 *SELECT COUNT(quant) FROM Survey; = 21 *Episode 7 - Combining Data: Example of JOIN structure where you have several tables. It can fast become a confusing structure to keep track of. Method 1: *SELECT Site.lat, Site.long, Visited.dated, Survey.quant, Survey.reading *FROM Site JOIN Visited JOIN Survey *ON Site.name = Visited.site *AND Visited.id = Survey.taken *AND Visited.dated IS NOT NULL; Method 2: *SELECT Site.lat, Site.long, Visited.dated, Survey.quant, Survey.reading *FROM Site *JOIN Visited ON Site.name=Visited.site *JOIN Survey ON Visited.id=Survey.taken *WHERE Visited.dated IS NOT NULL; These two methods do the same thing, use the structure that you find more intuitive. ------- The solution for the excercise 'Listing Radiation Readings' can be a bit confusing. The requirement is to extract the 'rad' readings from the site 'DR-1'. The solution suggests that a JOIN between tables Site, Visited and Survey is performed. However this is not necessary. The table Site is only useful to extract the coordinates of those sites, and the site id is already recorded in the Visited table. So a JOIN between Visited and Survey is enough: *SELECT Survey.reading *FROM Visited *JOIN Survey *ON Visited.id = Survey.taken *WHERE Survey.quant = 'rad' and Visited.site = 'DR-1'; *Joining two tables with columns of different data types (integer vs text) SELECT * FROM table1 JOIN table2 ON table1.id = CAST(table2.id AS INTEGER); OR SELECT * FROM table1 JOIN table2 ON CAST(table1.id AS TEXT) = table2.id; *Contacts: Instructors: Nicholas David Yardley Ball, Kyrre Traavik Låberg email: n.d.y.ball@imv.uio.no, kyrretl@ub.uio.no *UiO Carpentries: *contact-us@carpentry.uio.no *UiO Central Database support Databasedrift https://www.usit.uio.no/om/organisasjon/iti/dbd/ *Reference and glossary: https://swcarpentry.github.io/sql-novice-survey/reference.html *POST WORKSHOP -Please fill out this Post Workshop Survey: https://carpentries.typeform.com/to/UgVdRQ?slug=2025-03-26-uio&typeform-source=uio-carpentry.github.io -Do the excercises you didn't have time to do during the course *JOIN US Contemplate becoming a helper in a Software Carpentry Course *https://www.ub.uio.no/english/writing-publishing/dsc/carpentry-uio/ Do you have questions? Write them below: