Welcome to The Carpentries Etherpad!
This pad is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents.
Use of this service is restricted to members of The Carpentries community; this is not for general purpose use (for that, try etherpad.wikimedia.org).
Users are expected to follow our code of conduct: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html
All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/
Workshop Twitter hashtag: #LibCarpMoHub
Slack: https://missourihub.slack.com/
## Agenda
### Day 1
* Introductions
* Workshop Instructors and Helpers
* Missouri Hub
* Introduce yourself to two neighbors
* Logistics
* About Library Carpentry
* Introduction to Working with Data (Regular Expressions)
* The UNIX Shell
### Day 2
* Logistics reminder
* Introduction to Git
* OpenRefine
## Attending
Anna Oates/Federal Reserve Bank of St Louis/Anna.Oates@stls.frb.org/@annaoates
Heather Moulaison Sandy - University of Missouri iSchool - moulaisonhe@missouri.edu
Tori Lyons/Logan University/victoria.lyons@logan.edu
Brianna Chatmon/Stephens/University of Missouri/bctkd@mail.missouri.edu
Shannon Mawhiney/Missouri State University/smawhiney@missouristate.edu
Carol Clark/Saint Louis Art Museum/carol.clark@slam.org/@GLAMdatacarol
AJ Robinson/ Washington University/ robinson.a@wustl.edu
Jessica Kleekamp / Washington University in St. Louis / jkleekamp@wustl.edu
Marcy Vana / Washington University School of Medicine / vanam@wustl.edu
Jenny Bossaller/University of Missouri/bossallerj@missouri.edu
Drew Kupsky - Saint Louis University - drew.kupsky@slu.edu
Stephanie Chinn / Missouri University of Science & Technology / garvin@mst.edu
Jamillah Boyd University of Missouri St. Louis @jamillahboyd
Maze Ndukum / Washington University School of Medicine / ndukummaze@wustl.edu
Dylan Martin / Lincoln University of Missouri / martind2@lincolnu.edu
todd quinn / University of New Mexico
Levi Dolan / University of Missouri / ljd437@mail.missouri.edu
Anne Cox/State Historical Society of Missouri/coxan@shsmo.org/@woozymoose
Daron Dierkes / The Missouri Historical Society / ddierkes@mohistory.org
Chris Sorensen/Washington University School of Medicine/sorensenc@wustl.edu
Matt Butler / Missouri State Library / matthew.butler@sos.mo.gov
Evan Sprague /Washington University School of Medicine / e
Dorris Scott/ Washington University in St. Louis/d.scott@wustl.edu/@Dorris_Scott
Amanda Sprochi/MU/ sprochia@health.missouri.edu
Katherine Leonard / University of Missouri / knln9c@mail.missouri.edu
Felicity Dykas / University of Missouri - dykasf@missouri.edu
Dave LaCrone / Kansas City Public Library / davidlacrone@kclibrary.org
Emily Stenberg / Washington University in St. Louis / emily.stenberg@wustl.edu
## Introduction to Working with Data (Regular Expressions)
Used in programming to match patterns (and replace)
"Finding a needle in a haystack"
Often used with text/code
You can plug it in and use it in text/code editors, scripts, OpenRefine! (Another that is often cited, brackets.io)
Also known as RegEx or regex
We will start off using:
https://regexr.com/
> Cheatsheet (let's first walk through this)
[A-Z] a range: matches any single character from A to Z inclusive, capital letters only
[a-z] a range: matches any single character from a to z inclusive, lower case only
\w matches any word character (letters, digits, and underscore)
+ matches one or more of the preceding token (\w in this case)
\d matches a digit
\s matches whitespace
\s{2,} matches two or more whitespace characters
\b\w{5}\b matches words with exactly 5 characters
.* matches any character 0 or more times, as many as possible (greedy)
.*? matches any character 0 or more times, as few as possible (lazy)
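A minimal sketch of a few of these tokens in action on the command line, assuming GNU grep (the `\w`, `\s` and `\b` shorthands are not available in every grep). The sample sentence is made up, and `grep -o` prints each match on its own line.

```bash
export LC_ALL=C   # keeps [A-Z] strictly to capital letters

# Made-up sample text, not part of the lesson data (note the double space).
sample='Data  carpentry helps Libraries clean messy files'

# \b\w{5}\b : whole words that are exactly five characters long.
five_letter=$(printf '%s\n' "$sample" | grep -oE '\b\w{5}\b')
echo "$five_letter"

# [A-Z]\w+ : a capital letter followed by one or more word characters.
capitalized=$(printf '%s\n' "$sample" | grep -oE '[A-Z]\w+')
echo "$capitalized"

# \s{2,} : two or more whitespace characters in a row (-c counts matching lines).
double_space_lines=$(printf '%s\n' "$sample" | grep -cE '\s{2,}')
echo "$double_space_lines"
```

Pasting the same patterns into regexr with the same sentence should highlight the same words.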
So, what is ^[Oo]rgani.e\b going to match?
At the start of a line Organize or organize or Organise or organise
Organize, organize, organise, Organise
Organize, Organize organice
Organize with lower or upper case beginning in either US English or otherwise, or any other character to make a nonsense word. Must begin with all characters in the RegEx and end with the boundary
Organize, organize, Organipe, organibe... (the ^ means the line must start with O or o)
Lines that start with organize spelled with any character where the z is (upper or lowercase o)
Organize upper or lower case at beginning of line spelled in British or American English
Organize, organize, Organise, organise
What will the regular expression ^[Oo]rgani.e\w* match?
Organize
organizen organizea
Organized, organisers
At the start of a line Organizes, Organized, organized or organises, organiked, organike
organibe234234jweljaltjq234oj2oqfjoaw4vnq3o
Start of the line has O or o followed by rgani, any character, anything after e
Organizedly
Starting with O or o, matching rgani, any character, matching e, with any zero or more characters after
organise343443435353353
What will the regular expression \b[Oo]rgani.e\b|\b[Oo]rgani.e\w{1}\b match?
Start of a line Organi/organi, seventh character anything, followed by an e, OR
Orangile
organize or organizer, but not organizers
Organize upper or lower case British or American spelling or organiz/ser/d/ something
Starts with entire expression, leading with O or o, matching rgani, any character, ending with e OR Starts with entire expression, leading with O or o, matching rgani, has e, ending on boundary matching one of any character
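One way to check the three organize questions is to run the patterns against a small word list. This is a sketch assuming GNU grep; the word list is made up, with one word per line so that `^` anchors at each word, and `grep -c` counts the lines that match.

```bash
export LC_ALL=C

# Made-up word list, one word per line.
words='Organize
organise
organike
organized
organizers
reorganize'

# \b right after the e: only the bare word (any character in the z/s slot).
exact=$(printf '%s\n' "$words" | grep -cE '^[Oo]rgani.e\b')

# \w* also allows any run of word characters after the e.
with_suffix=$(printf '%s\n' "$words" | grep -cE '^[Oo]rgani.e\w*')

# Alternation: the bare word, or the word plus exactly one more character.
one_extra=$(printf '%s\n' "$words" | grep -cE '\b[Oo]rgani.e\b|\b[Oo]rgani.e\w{1}\b')

echo "$exact $with_suffix $one_extra"   # prints: 3 5 4
```

The first pattern stops at Organize, organise and organike; the second adds organized and organizers; the third accepts organized but not organizers. None of them match reorganize.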
Find all of the words starting with Comm or comm that are plural.
[Cc]omm\w*s\b
\b[Cc]omm\w*s\b
\b[Cc]omm\w*s\b
\b[Cc]omm\w+s\b
\b[Cc]omm.*s\b (this includes spaces) = 23 results
Isolating email addresses from the Software Carpentry Code of Conduct
\w*@\w*\.[a-z]*
\b\w+@\w+\.\w+\b
\b\w*@\w*\.\w*\b
\b\w*@\w*\.\w{3}\b
\b\w+@ (missed part of one)
\b\w+@\w+\.\w+\b
\b\w+@\w+\.[A-Za-z]+
\w+\@\w+\.\w+
@\w*\.
[\w]+@
\w+@\w+\....
\w+@\w*\.
\w+@\w+\.\w+\b
\w|
\b\w*@\w*\.\w*\b
[\w]@
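A short sketch of why some of the attempts above "missed part of one" address: `\w` does not match a hyphen or a dot. It assumes GNU grep, and the text and addresses are invented for illustration only.

```bash
export LC_ALL=C

# Hypothetical text and addresses, not taken from the Code of Conduct.
text='Report incidents to coc@example.org or write to safety-team@example.org today.'

# \w+ stops at the hyphen, so the second address loses its first half.
word_only=$(printf '%s\n' "$text" | grep -oE '\b\w+@\w+\.\w+\b')
echo "$word_only"

# A character class that also allows dots, underscores and hyphens.
wider=$(printf '%s\n' "$text" | grep -oE '[[:alnum:]._-]+@[[:alnum:].-]+\.[[:alpha:]]+')
echo "$wider"
```

The second pattern is still only an approximation of what a valid email address can look like, but it is usually enough for pulling addresses out of a document.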
NOTE ON THE CHEAT SHEETS
Using parentheses around an expression turns it into a group. When using regex to replace strings, you can group the strings you are matching, and then reference the groups with the dollar sign ($1, $2, ...). You'll see an example of this in the markdown-to-html example.
Examples of ways used:
In OpenRefine to change formatting in a column
on shell, looping to change file names
MarcEdit
to fix/standardize wonky date formats
To find and replace very specific strings
Another helpful regular expressions cheatsheet:
http://www.cbs.dtu.dk/courses/27610/regular-expressions-cheat-sheet-v2.pdf
For our exercises, we will use The Carpentries Code of Conduct:
https://raw.githubusercontent.com/libcce/lc-lesson-materials/master/code-of-conduct.md (copy to regexr) Ctrl-A to select all, Ctrl-C to copy
Regular expression challenges using The Carpentries Code of Conduct:
1. Find different spellings of organize
2. Change the Markdown links to HTML links/tags
3. Change the bold Markdown headings (with HTML break tags) to HTML heading tags
4. Change the Markdown headings to HTML headings
Start here as group:
- Switch the ISO dates to USA format (mm-dd-yyyy)
Expression: (\d{4})-(\d{2})-(\d{2})
Replace with: $2-$3-$1
RegEx Exercise - Possible Answers
- Find different spellings of organize
organi[zs]\w+
\borgani.e
(organi).(e) and then to replace the s or z <<$1z$2>>, this way leaves the endings (-er) untouched
- Change the Markdown links to HTML links/tags?
[link text](https://www.google.com)
(\[)(.*?)(\]\s*\()(.*?)(\))
<a href="$4">$2</a>
OR
\[(.*)\]\((.*)\)
<a href="$2">$1</a>
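- Change the bold Markdown headings (with HTML break tags) to HTML heading tags
(\*\*)(.*?)(\*\*)
<h2>$2</h2> (or whichever heading level you need)
- Change the Markdown headings to HTML headings
(\#{2,4}\s*)(.*?)($)
<h2>$2</h2> (adjust the level to the number of # characters)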
- Switch the ISO dates to USA format
(\d{4})(\-)(\d{2})(\-)(\d{2})
$3/$5/$1
OR
Expression: (\d{4})-(\d{2})-(\d{2})
Replace with: $2-$3-$1
\* matches the asterisk (*) - you need to escape it with a backslash "\"
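The same group-and-replace idea also works outside regexr. A minimal sketch with `sed -E`, where groups are written `\1`, `\2`, ... instead of `$1`, `$2`, ...; the sample date is made up, and the link is the one from the exercise above.

```bash
# ISO date (yyyy-mm-dd) to US order (mm-dd-yyyy): three groups, reordered.
us_date=$(echo 'Updated 2019-11-21' \
  | sed -E 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\2-\3-\1/g')
echo "$us_date"

# Markdown link to an HTML anchor: group 1 is the link text, group 2 the URL.
# [^]]* means "anything except a closing bracket", so the match cannot run
# past the end of the link text.
html_link=$(echo '[link text](https://www.google.com)' \
  | sed -E 's/\[([^]]*)\]\(([^)]*)\)/<a href="\2">\1<\/a>/g')
echo "$html_link"
```

Keeping the brackets and parentheses outside the groups means the replacement only has to mention the two pieces worth keeping.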
Exercise 2
Go here https://www.thetranscriptionpeople.com.au/2015/04/14/a-humorous-look-at-how-punctuation-can-change-meaning/ and copy the main block of text.
Practice making all of the sentences either with the Oxford comma, or all of them without
More info/exercises:
https://librarycarpentry.github.io/lc-data-intro/04-regular-expressions/index.html
You might have found this helpful if I had thought to paste this in beforehand:
Regular Expressions Cheat Sheet https://docs.google.com/document/d/1P-LAtXb5S8F_tIE9TQOT2qP5ItEwMnR81AxK519T5Ig/edit
## The UNIX Shell
### Data Files
You need to download some files to follow this lesson:
https://raw.githubusercontent.com/librarycarpentry/lc-shell/gh-pages/data/shell-lesson.zip
1. Download shell-lesson.zip and move the file to your Desktop.
2. Unzip/extract the file (ask your instructor if you need help with this step). You should end up with a new folder called shell-lesson on your Desktop.
Note: Something to look out for, sometimes there are issues in getting to the shell lesson folder via Windows Git Bash
Go to directory, right click, open/select Git Bash or cd /c/...
First, if you were unable to install Git Bash then try this:
https://console.cloud.google.com/cloudshell/editor?shellonly=true&pli=1
wget github.com/LibraryCarpentry/lc-shell/raw/gh-pages/data/shell-lesson.zip
unzip shell-lesson.zip
### About the Shell
Before we had graphical interfaces, we had command-line interfaces
The UNIX shell dates back to the 1970s
Historical flowchart: https://en.wikipedia.org/wiki/File:Unix_history-simple.svg
Further reading
The unix shell: https://en.wikipedia.org/wiki/Unix_shell
Unix-like: https://en.wikipedia.org/wiki/Unix-like
What is unix?: A brief introduction to unix: https://www.softwaretestinghelp.com/unix-introduction/
A beginner's guide to the unix command line: https://www.osc.edu/supercomputing/unix-cmds
Command line crash course: https://learnpythonthehardway.org/book/appendixa.html
Use cases
- Programming, data science work, research computing
- Wrangling with and cleaning lots of data/files
- Example: ORCID data dump via Figshare
https://orcid.figshare.com/articles/ORCID_Public_Data_File_2018/7234028
- Example: Mining journal article PDFs at the European Southern Observatory
https://www.eso.org/sci/libraries/telbib_methodology.html
HELP
- `man ls`
- `help COMMAND` (Bash built-ins only, e.g. `help cd`)
- Google it
- Explain Shell: https://explainshell.com/explain?cmd=ls+-lh
- TL;DR: https://tldr.sh/
- Basic UNIX Commands: https://www.tjhsst.edu/~dhyatt/superap/unixcmd.html
- For Windows: http://man.he.net/
- `COMMAND --help`
- Example: `ls --help`
TEXT EDITORS
- nano - default
- Notepad++ (Windows): https://notepad-plus-plus.org/
- If using Notepad ++, here is a cheatsheet for keyboard shortcuts: http://www.cheat-sheets.org/saved-copy/Notepad++_Cheat_Sheet.pdf
- Sublime (macOS, Linux): https://www.sublimetext.com/
- Atom (macOS): https://atom.io/
- Editpad Pro (Windows): https://www.editpadpro.com/download.html
- Visual Studio Code (Windows, macOS, Linux): https://code.visualstudio.com/
- IntelliJ (Windows, macOS, Linux): https://www.jetbrains.com/idea/
### Working with files and directories
https://drive.google.com/file/d/12N579z4FgK9yM3m4j0XBkQQxKdcJk62s/view?usp=sharing
https://raw.githubusercontent.com/librarycarpentry/lc-shell/gh-pages/data/shell-lesson.zip
KEY COMMANDS
`pwd` - print working directory
`cd` - change directory
`ls` - list directory contents
- helpful flags/options: `ls -l` and `ls -lh`
Columns in `ls -l` output: permissions / link count / user / group / size / date / folder or filename
`d` at the start of the permissions stands for directory / `chmod 777 FILE` sets permissions / `@` marks extended attributes (macOS) / `ls -a` shows hidden files
QUIZ
* What flags do you use to list contents of a directory in long listing format and sort by modification date, newest first?
* And how can you order by file size?
* How can you see hidden files?
ANSWERS
`ls -lt` (order by mod date)
`ls -lS` (order by file size)
`ls -a` (do not ignore entries starting with .)
`pwd`
`mkdir firstdirectory`
`cd firstdirectory`
`cd ..`
`ls -lh`
`cat` - concatenate files and print on the standard output (in other words open and print a file to screen)
type `82 + [TAB]`
`cat 829-0.txt`
QUIZ
* What is the title of 829-0?
cp gulliver.txt gulliver-backup.txt
cp gulliver.txt gulliver_backup.txt
ANSWER
GULLIVER’S TRAVELS
`head` - output the first part of files (first 10 lines)
`head 829-0.txt`
`tail` - output the last part of files (last 10 lines)
`tail 829-0.txt`
QUIZ
TASK: Create a for loop that prints the name, first line, last line of each text (.txt) file in the current directory.
for file in *.txt; do echo "$file"; head -n 1 "$file"; tail -n 1 "$file"; done
for filename in *.txt; do echo $filename; head -n 1 $filename; tail -n 1 $filename; done
for file in *.txt; do echo "$file"; head -n 1 $file; tail -n 1 $file; done
for file in *.txt ; do echo "$file" ; head -n 1 "$file" ; tail -n 1 "$file" ; done
for file in *.txt; do echo "$file"; head -n 1 "$file"; tail -n 1 "$file"; done
* How can you return the first 20 lines of 829-0.txt?
* How can you return the last 30 lines of 829-0.txt?
ANSWER
`head -n20 829-0.txt`
`tail -n30 829-0.txt`
Example: Sometimes files are too big to open, and head and tail can be a lightweight way to peek inside or to get header information in an automated way.
`less` - allows you to scroll/page through file
`less 829-0.txt`
Navigating output
`spacebar` to page, up and down arrows, `q` to quit
`mv` - move (rename) files
`mv 829-0.txt gulliver.txt`
QUIZ
What is the title of 33504-0.txt and can you rename it to its title.txt?
ANSWER
Opticks and `mv 33504-0.txt opticks.txt`
mv renames a file; cp copies a file and places the copy under a new file name wherever you want it to go
QUIZ
Can you create backup files of the two titles above in a "backup" folder naming the files by adding "_backup.txt"?
ANSWER
mkdir backup
cp gulliver.txt backup/gulliver_backup.txt
cp opticks.txt backup/opticks_backup.txt
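With more than a couple of files, the same backup can be done in a loop. A sketch under the assumption that every file to back up ends in .txt; it runs in a throwaway directory with stand-in files, so nothing in shell-lesson is touched.

```bash
# Work in a temporary directory so no real files are changed.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in files; in the workshop these would be the renamed texts.
echo 'GULLIVER' > gulliver.txt
echo 'OPTICKS' > opticks.txt

mkdir backup
for file in *.txt
do
  # ${file%.txt} strips the .txt ending so _backup can go in front of it.
  cp "$file" "backup/${file%.txt}_backup.txt"
done

ls backup
```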
Wildcards
What does * do?
QUIZ
How can we use this wildcard to match + list all the .txt files?
ANSWER
ls *.txt
How can you see the history of your commands?
- You can use the up and down arrow keys
- You can use history
Use `!NUMBER` (for example `!42`) to re-run a specific command from the history list
You can also redirect output of your history to a text file
history > history.txt
For a taste of Shell programming, let's create a variable which holds a value:
NAME=Groot
And let's print out to the command line:
echo "I am $NAME"
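A small sketch of how quoting changes what happens to the variable. The variable names `double` and `single` are only there to capture the two results.

```bash
NAME=Groot                      # no spaces around the = sign

double=$(echo "I am $NAME")     # double quotes: $NAME is replaced by its value
single=$(echo 'I am $NAME')     # single quotes: the text is kept literally

echo "$double"
echo "$single"
```

Double quotes are usually what you want around variables, especially for file names that may contain spaces.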
Create a number of files quickly using touch
touch a.txt b.txt c.txt d.txt
Now for the scripting!
We will create a Bash script on the command line
For our script we are going to loop through all the text files
And we are going to print the file name
Then we are going to finish
$ for filename in *.txt
> do
> echo $filename
> done
Before our exercise, how can we create and edit a file on the command line?
Let's use the command line tool "nano." You can use an alternative text editor, such as Notepad++, Sublime, Atom, etc.
nano myfile.txt
Editor screen appears
Write "This is my file!"
Ctrl + X to exit
Answer "y" (yes) to save the changes
Enter to confirm the file name
EXERCISE
- Create a file called myscript.sh
- Add a similar for loop to the myscript file
- In this loop you will print the file name to screen
- And you will print the first and last 5 lines of the file to screen
- Then you will end the loop
Note: Add...
#!/bin/bash
# My first script
... to the top of the file
ANSWER:
myscript.sh
#!/bin/bash
# My first script
for filename in *.txt
do
echo $filename
head -n 5 $filename
tail -n 5 $filename
done
To make the file executable, run `chmod +x FILENAME`
Run the script:
./myscript.sh (needs the executable bit) OR bash myscript.sh
ABOUT #! https://www.in-ulm.de/~mascheck/various/shebang/
TIP: Ctrl + C to quit when in infinite loop
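The whole exercise can be sketched end to end without opening an editor. This assumes a throwaway directory with two made-up text files; the script is written with a here-document instead of nano, and variables are quoted so file names with spaces would also work.

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two made-up text files to loop over (7 lines and 3 lines).
printf 'line %s\n' 1 2 3 4 5 6 7 > a.txt
printf 'row %s\n' 1 2 3 > b.txt

# Write the script; the quoted 'EOF' keeps $filename from being expanded now.
cat > myscript.sh <<'EOF'
#!/bin/bash
# My first script
for filename in *.txt
do
  echo "$filename"
  head -n 5 "$filename"
  tail -n 5 "$filename"
done
EOF

chmod +x myscript.sh           # after this, ./myscript.sh also works
output=$(bash myscript.sh)     # bash myscript.sh works with or without chmod
echo "$output"
```

For files shorter than five lines, `head` and `tail` simply print the whole file, so short files show up twice in the output.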
Navigate to shell-lesson dir
`ls -lh`
`wc` - word count
`wc *.tsv` (see words and lines)
QUIZ
What options are available to you in wc?
`wc -l *.tsv > lengths.txt`
`cat lengths.txt`
Piping
Two or more commands connect together via a "|"
Order is -> | -> | ...
`wc -l *.tsv | sort`
`wc -l *.tsv | sort -r`
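A sketch of a slightly longer pipeline that finds the file with the fewest lines. The three .tsv files are made up and live in a temporary directory; `awk '{print $2}'` is an extra step, not from the lesson, that keeps only the file name column.

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up .tsv files with 3, 1 and 2 lines.
printf 'a\nb\nc\n' > long.tsv
printf 'a\n' > short.tsv
printf 'a\nb\n' > medium.tsv

# wc counts lines, sort -n orders numerically (smallest first),
# head keeps the first row, awk prints just the file name.
shortest=$(wc -l *.tsv | sort -n | head -n 1 | awk '{print $2}')
echo "$shortest"

# Same pipeline, redirected into a file instead of the screen.
wc -l *.tsv | sort -n | head -n 1 > shortest.txt
```

Plain `sort` compares text character by character, so `sort -n` is the safer choice whenever the first column is a number.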
-------
EXERCISES
We have our wc -l *.tsv | sort -n | head -n 1 pipeline. What would happen if you piped this into cat?
wc -l *.tsv | sort -n | head -n 1 | cat
5375 2014-02-02_JA-britain.tsv
wc -l *.tsv | sort -n | head -n 1
-w == words - can be used to get the word count for words (e.g., wc -w FILENAME)
Know the 10 files that contain the most words.
wc -w * | sort -n | head -n 10
wc -w *.tsv | sort -nr | head -n 11
wc -w * | sort -n | tail -n 11
wc -w *.tsv | sort -nr | head -n 11
wc -w *.tsv | sort -n | tail -n 11
wc -w *.tsv | sort -n | tail -n 11
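Several answers above ask for 11 rows rather than 10. A sketch of why: when `wc` is given more than one file it adds a summary row, which sorts as the largest number. The files here are made up, and the row is assumed to be labelled "total" (the English/C locale).

```bash
export LC_ALL=C
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up files with 1, 2 and 3 words.
echo 'one' > a.tsv
echo 'one two' > b.tsv
echo 'one two three' > c.tsv

# The largest number belongs to the summary row, not to a file.
first_row=$(wc -w *.tsv | sort -nr | head -n 1 | awk '{print $2}')
echo "$first_row"

# Drop the summary row before keeping the top two files.
top_two=$(wc -w *.tsv | sort -nr | grep -v ' total$' | head -n 2 | awk '{print $2}')
echo "$top_two"
```

Filtering the summary row out first means `head -n 10` really does return ten files.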
-c - grep flag that counts the number of matching lines in each file
-o - grep flag that returns only the matched string, not the complete line which contains the match
1. Search for all case sensitive instances of a word of your choice in all derived .tsv files. Print your results to the shell.
grep -wc speculation *.tsv
grep -w Egypt *.tsv
grep -w Dakota *.tsv
grep -w history *.tsv
grep -w Foo *.tsv
grep -w Washington *.tsv
grep -w social *.tsv
grep -w Africa *.tsv
grep -wc battle *.tsv
$ grep -w India *.tsv
$ grep -wc history 2014-01-31_JA-africa.tsv
grep -w war *.tsv
grep -w colonial *.tsv
grep -w virgina *.tsv
2. Move up to the shell-lesson directory. Search for all case sensitive instances of a word of your choice in the "America" and "Africa" .tsv files in the shell-lesson directory. Print your results to the shell.
grep -w colonial *a.tsv
grep -w Dakota 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv
grep -w history 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv
grep -w speculation 2014-01-31_JA-america.tsv 2014-01-31_JA-africa.tsv
grep -w social 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv
grep -w Washington 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv
grep -w war 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv
$ grep -w water 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv
grep -w slavery *a.tsv
3. Count all case sensitive instances of a word of your choice in the ‘America’ and ‘Africa’ .tsv files in this directory. Print your results to the shell.
grep -c bear 2014-01-31_JA-a*.tsv
grep -wc colonial *a.tsv
grep -cw Dakota 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv
grep -wc Washington 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv
grep -c speculation 2014-01-31_JA-america.tsv 2014-01-31_JA-africa.tsv
grep -c bear 2014-01-31_JA-a*.tsv
grep -wc social 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv
grep -wc history 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv
grep -ci Connecticut *a.tsv
grep -c war 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv
grep -wc Bird 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv
$ grep -wc fruit 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv
4. Count all case insensitive instances of that word in the ‘America’ and ‘Africa’ .tsv files in this directory. Print your results to the shell.
grep -cwi Dakota 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv
grep -cwi speculation 2014-01-31_JA-america.tsv 2014-01-31_JA-africa.tsv
grep -wci colonial *a.tsv
grep -wci social 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv
grep -wci history 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv
grep -wci washington 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv
grep -wci Bird 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv
grep -cwi war 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv
grep -cwi Nevada 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv
5. Search for all case insensitive instances of that word in the ‘America’ and ‘Africa’ .tsv files in this directory. Print your results to a file results/[wordOfChoice].tsv.
grep -wi Dakota 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv > results/Dakota.tsv
grep -wi colonial *a.tsv > results/colonial.tsv
grep -i social 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv > results/social.tsv
grep -iw speculation 2014-01-31_JA-america.tsv 2014-01-31_JA-africa.tsv > results/speculation.tsv
grep -i history *a.tsv > results/history.tsv
grep -wi Dakota 2014-01-31_JA-africa.tsv *a.tsv > Dakota-i.tsv
grep -wi history 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv > results/history.tsv
grep -wi washington 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv > results/Washington.tsv
grep -wi Bird 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv > ./results/Bird.txt
grep -wi war 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv > results/war.tsv
grep -wi Nevada 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv
6. Search for all case insensitive instances of that whole word in the ‘America’ and ‘Africa’ .tsv files in this directory. Print your results to a file results/[wordOfChoice]-i.tsv.
grep -wi Dakota 2014-01-31_JA-africa.tsv *a.tsv > Dakota-i.tsv
grep -wi history 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv > results/history.tsv
grep -wi 'washington' 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv > results/Washington.tsv
grep -i war 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv > results/war-i.tsv
$ grep -iw war *a.tsv > results/war-i.tsv
grep -wi social *a.tsv > results/social-i.tsv
grep -wi Nevada 2014-01-31_JA-africa.tsv 2014-01-31_JA-america.tsv >results/Nevada.tsv
7. Use regular expressions to find all ISSN numbers (four digits followed by hyphen followed by four digits) in 2014-01_JA.tsv and print the results to a file results/issns.tsv. Note that you might have to use the -E flag (or -P with some versions of grep, e.g. with Git Bash on Windows.).
grep -E '\d{4}-\d{4}' 2014-01_JA.tsv > ./results/issns.tsv (the {4} repetition count is extended regular expression syntax, so grep needs -E to recognize it; \d works with -E only in some versions of grep, so use [0-9] or -P if it finds nothing)
grep -P '\d{4}-\d{4}' 2014-01_JA.tsv > results/issns.tsv
grep -P '\d{4}-\d{4}' 2014-01_JA.tsv > results/issns.tsv (-P recognizes the pattern of digits xxxx-xxxx)
grep -P '\d{4}-\d{4}' 2014-01_JA.tsv > results/issns.tsv
grep -E '\d{4}-\d{4}' 2014-01_JA.tsv > results/issns.tsv
grep -E '\d{4}-\d{4}' 2014-01_JA.tsv > results/issns.tsv
grep -P '\d{4}\-\d{4}' 2014-01_JA.tsv > results/issns.tsv
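A sketch of a version that avoids the `-E` versus `-P` question by writing `[0-9]` instead of `\d`. The sample rows and ISSNs are invented stand-ins for 2014-01_JA.tsv, and `-o` is an optional extra that keeps only the ISSN rather than the whole line.

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up tab-separated rows; the ISSNs are invented.
printf 'Journal A\t1234-5678\t2014\n' > sample.tsv
printf 'Journal B\tno issn\t2014\n' >> sample.tsv
printf 'Journal C\t8765-4321\t2013\n' >> sample.tsv

mkdir results
# [0-9] works wherever -E does; -o keeps only the matched text.
grep -oE '[0-9]{4}-[0-9]{4}' sample.tsv > results/issns.tsv
cat results/issns.tsv
```

Leaving out `-o` gives the full matching rows instead, which is what the answers above produce.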
QUIZ
How would you get the file with the lowest number of lines?
And how can you save that to a txt file?
ANSWER
TIP: If you wanted to append a date stamp to the end of the file you just created:
date >> topsort.txt
`grep` - print lines matching a pattern
grep is probably one of the most useful command line tools for searching for matches within files/directories
`grep 1999 *.tsv`
By the way, how could we have redirected the output to a txt file?
`grep -c 1999 *.tsv` (lists number of matches per file)
`grep -c revolution *.tsv` (lists case sensitive search)
`grep -ci revolution *.tsv` (lists insensitive)
Try other keywords here, like America or German
Here is an example that I used at the European Southern Observatory to find instrument names in context:
`grep -C 2 'HARPS' *` (get two lines for context of match)
QUIZ
How would you search for the China journal in the same files?
ANSWER
`grep -iwE 'fr[ae]nc[eh]' *.tsv` (flags i insensitive, w word, E expression)
Try to find variations of organize
How can we tell if the number of matches has changed with our regex?
`grep -o 'needle' haystack | wc -l`
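A sketch of the difference between counting matching lines with `-c` and counting every occurrence with `-o | wc -l`. The haystack file is made up; `tr -d ' '` only removes the padding some versions of `wc` add.

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up haystack: "war" appears three times, on two lines.
printf 'war and war\npeace\ncivil war\n' > haystack.txt

# -c counts lines that contain at least one match.
matching_lines=$(grep -c 'war' haystack.txt)

# -o prints each match on its own line, so wc -l counts occurrences.
occurrences=$(grep -o 'war' haystack.txt | wc -l | tr -d ' ')

echo "lines: $matching_lines, occurrences: $occurrences"   # prints: lines: 2, occurrences: 3
```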
EXERCISE
How would we find ISSNs in 2014-01_JA.tsv using grep and regex, and redirect the output to an issns.tsv file?
We can walk through this together and write it on the board...
ANSWER
EXERCISE
Combine what you learned of the for loop with using grep to find the word counts of names in gulliver.txt...
ANSWER
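One possible approach, as a sketch rather than the official answer: loop over a list of names and count each one with `grep -ow`. The text file and the list of names are made-up stand-ins; in the workshop you would run the loop in the directory that holds gulliver.txt.

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up stand-in for gulliver.txt.
printf 'Gulliver sailed to Lilliput.\nIn Lilliput, Gulliver met the Emperor.\nGulliver left.\n' > gulliver.txt

# Hypothetical list of names; swap in the ones you care about.
for name in Gulliver Lilliput Emperor
do
  # -o prints every match on its own line, -w matches whole words only.
  count=$(grep -ow "$name" gulliver.txt | wc -l | tr -d ' ')
  echo "$name: $count"
done > name_counts.txt

cat name_counts.txt
```

Putting the redirect after `done` sends the output of the whole loop to one file.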
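Quick demo of sed, which allows you to replace words in a file:
less diary.html
Look inside and replace one word (foo) with another (bar)
sed -i '' 's/Daddy/Mommy/g' diary.html (macOS/BSD sed; with GNU sed on Linux or Git Bash use sed -i 's/Daddy/Mommy/g' diary.html)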
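A sketch of a variant that behaves the same on macOS, Linux and Git Bash: instead of editing in place with `-i`, write the result to a new file. The one-line diary.html here is a made-up stand-in, and diary_new.html is a hypothetical output name.

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up stand-in for diary.html.
echo '<p>Daddy went out. Daddy came back.</p>' > diary.html

# s/old/new/g replaces every occurrence on each line; the result goes to a
# new file, so the original stays untouched and no -i flag is needed.
sed 's/Daddy/Mommy/g' diary.html > diary_new.html
cat diary_new.html
```

Keeping the original also makes it easy to compare the two files before deleting anything.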
## Introduction to Git
Why would you want to learn this?
Examples:
Democratic databases: science on GitHub Scientists are turning to a software–development site to share data and code.
https://www.nature.com/news/democratic-databases-science-on-github-1.20719
Making Code Citeable
https://guides.github.com/activities/citable-code/
Journal of Open Source Software
https://joss.theoj.org/
Our path to better science in less time using open data science tools
https://www.nature.com/articles/s41559-017-0160
FAIR Data Action Plan (Issues for Community Feedback)
https://github.com/FAIR-Data-EG/Action-Plan/issues
Code4Lib Community Statement in Support of Chris Bourg
https://github.com/code4lib/c4l18-keynote-statement
Conference Websites
https://libcce.github.io/TriangleJupyter/
Library Carpentry Lessons
https://github.com/LibraryCarpentry
Carpentries Workshop & Lesson Templates
https://github.com/carpentries/workshop-template
https://carpentries.github.io/lesson-example/setup.html
### Go to GitHub
https://github.com/
Can search all public github repositories.
### Sign up (create an account)
Once you've done this, copy and paste the link to your account here:
<< For example, mine is https://github.com/annaoates - Anna Oates>>
https://github.com/ddierkes - Daron Dierkes, Missouri Historical Society
https://github.com/lacrone - Dave LaCrone
https://github.com/emrsster - Emily Stenberg, WashU
https://github.com/drewkupsky - Drew Kupsky, Saint Louis University
https://github.com/brichatmon/KC -Brianna Chatmon, Stephens College, SISLT Student, University of Missouri
https://github.com/leonardstl - Katherine Leonard, University of Missouri
https://github.com/firbolg - Levi Dolan
https://github.com/genreina - Shannon Mawhiney, MSU
https://github.com/annecox - Anne Cox, SHSMO
https://github.com/dykasf/ - Felicity Dykas, U of Missouri
https://github.com/carolaclark - Carol Clark, Saint Louis Art Museum
https://github.com/butlermt - Matt Butler, Missouri State Library
https://github.com/asprochi Amanda Sprochi MU
https://github.com/bossjen - Jenny Bossaller, University of MO
https://github.com/hlmoulaison heather moulaison sandy
https://github.com/dmart423 - Dylan Martin, Lincoln University of Missouri
https://github.com/jkleekamp - Jessica Kleekamp, Washington University in St. Louis
https://github.com/stephchinn - Stephanie Chinn, Missouri University of Science & Technology
https://github.com/cjsorensen15 - Chris Sorensen, Washington University School of Medicine
https://github.com/deniceadkins Denice Adkins, School of Information Science & Learning Technologies, University of Missouri
https://github.com/vanamarc - Marcy Vana, Washington University School of Medicine
https://github.com/momiji15 - Dorris Scott, Washington University in St. Louis
https://github.com/mightylibrarian - Todd Quinn, University of New Mexico
https://github.com/EvanSprague - Evan Sprague, Washington University School of Medicine
https://github.com/robinsonaj AJ Robinson, Washington University
https://github.com/ndukumm, Maze Ndukum, Washington University School of Medicine
https://github.com/toribethlyons Tori Lyons, Logan University
https://github.com/jamillahboyd Jamillah Boyd, University of Missouri St. Louis
Readme file is the first thing one sees when they find your repository. Readme files should give enough information about your repository contents.
#### Connect to local
`git config --global user.name "Your Name"`
`git config --global user.email "your@email"`
`git config --global core.editor "nano -w"`
https://help.github.com/en/github/getting-started-with-github/set-up-git
### Using Git
`mkdir REPONAME`
`cd REPONAME`
`git init`
`git status`
`git add FILENAME`
`git commit -m 'ADDMESSAGE'`
https
ssh
`git remote add origin GITHUBURL`
`git remote -v`
`git push -u origin master`
`git diff`
`git log`
`git push`
`git pull`
git commands cheatsheet : https://education.github.com/git-cheat-sheet-education.pdf
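The local part of the sequence above can be sketched as one script, assuming git is installed. The repository name demo-repo, the identity values and the README text are placeholders, and the identity is set for this repository only (no `--global`). Nothing is pushed, so no GitHub account is needed.

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

git init -q demo-repo          # demo-repo is a hypothetical REPONAME
cd demo-repo || exit 1

# Placeholder identity, stored in this repository's own config.
git config user.name "Your Name"
git config user.email "your@email"

echo '# Demo' > README.md
git status --short             # README.md shows up as untracked (??)
git add README.md
git commit -q -m 'Add README'

commit_count=$(git log --oneline | wc -l | tr -d ' ')
echo "$commit_count"
```

From here, `git remote add origin GITHUBURL` followed by a first push would connect the local repository to GitHub.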
### Post-it exercise (forking, branching, merging w/ cats)
https://guides.github.com/introduction/flow/
forking - creating a copy in your github account
branching - create a separate line of work (a branch) in a repository, for example in your fork, and make your changes there
pull request - ask to add your changes to the original repository
merging - move the changes from the branches to the original repository
#### GitHub Pages
`git checkout -b gh-pages`
`git push --set-upstream origin gh-pages`
Same as: `git push -u origin gh-pages`
GitHub Pages
Navigate to REPONAME
Settings
Go to settings to turn on GitHub Pages
Lots of Jekyll themes to choose from
Can use the HackMD tool to explore:
https://hackmd.io/
Share here: https://hackmd.io/s/rJmqomaTS
Can edit in HackMD and cut/paste into github md files.
Markdown cheat sheet: https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet
- Fork a repository from someone in the room (colleague GitHub repos to choose from are above)
Click the "Fork" button at the top to put a copy of the repo in your GitHub account
- Create a Pull Request from your fork to suggest a text/format change that can be merged by the owner
## GitHub Pages Resources
You can add a theme to spruce up your GitHub Page!
Jekyll Themes: https://jekyllthemes.io/
Hugo Themes: https://themes.gohugo.io/
## GitHub Troubleshooting
https://github.com/momiji15/gittingit/blob/master/howto.md
## OpenRefine
OpenRefine Cheat Sheet: https://docs.google.com/document/d/1RJVPyAChehfeVEd2DoltL6mm-N7NppHhUkha1Et2_gs/edit
File you need to download: https://github.com/LibraryCarpentry/lc-open-refine/raw/gh-pages/data/doaj-article-sample.csv (In Safari, right click and select download linked file; in Chrome and Firefox, right click and select save link as)
Remember that all of our instructions are available here: https://librarycarpentry.org/lc-open-refine/
## Checklist
* Logistics - restrooms, water fountain, emergency exits, emergency contact
* Review Code of Conduct with learners (https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html) We are all learners!
* Can you see the text (make it bigger?), can you hear us (speak louder?), can you slow down?, any other access questions?
* Schedule: Intro to Data, Shell (day 1), Git, OpenRefine (day 2) https://libcce.github.io/2019-01-07-MTSU/ (coffee breaks at 10:30 and 2:30, lunch at noon)
* Remind learners to use sticky notes to give feedback
* Get feedback at lunch and end of each day using sticky notes
* Collect attendee names
* Lessons are online at https://librarycarpentry.org/lessons/