Morning Workshop: Regular Expressions (Digital Humanities Summer Institute #DHSI18)

DHSI Morning Workshop: Regular Expressions by John Simpson

Description: Regular expressions are a powerful tool for searching text to find patterns of characters. They are often used to extract postal codes, phone numbers, and emails from large sets of documents, and when combined with a little bit of scripting they can turn tedious and error-prone work done “by hand” into fast, effective, and automatic searching. In this workshop you will learn the basic syntax for regular expressions and deploy them to extract useful information in cases where doing it “by hand” would be tedious.

Point your browser to https://regex101.com/ and to gutenberg.org/ebooks/13, the text version of The Hunting of the Snark.

Most of the workshop should be discussion and dialogue.

cwrc.ca/rsc-src

Regex is good for matching patterns of characters.

A PDF document in the background is a lot of XML. A lot of that is not helpful, lots of XML vomit on individual lines, but regex can be used to zoom in on a particular piece of tex...
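To make the extraction use case from the description concrete, here is a minimal Python sketch that pulls emails, phone numbers, and postal codes out of a string. It is not from the workshop itself: the three patterns, the `extract` helper, and the sample text are all assumptions made for illustration, and the patterns are deliberately simplified (a Canadian-style postal code and a ten-digit North American phone number). Real documents would need looser or stricter patterns.

```python
import re

# Simplified, illustrative patterns (assumptions, not the workshop's own).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Canadian-style postal code: letter digit letter, optional space or hyphen, digit letter digit.
POSTAL = re.compile(r"\b[A-Za-z]\d[A-Za-z][ -]?\d[A-Za-z]\d\b")
# Ten-digit phone number in three groups separated by a hyphen, dot, or space.
PHONE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")


def extract(text):
    """Return every match of each pattern found in text, grouped by kind."""
    return {
        "emails": EMAIL.findall(text),
        "phones": PHONE.findall(text),
        "postal_codes": POSTAL.findall(text),
    }


# Made-up sample text for demonstration.
sample = "Write to snark@example.org or call 250-555-0199. Mail: Victoria, BC V8P 5C2."
result = extract(sample)
print(result)
```

The same patterns can be pasted into https://regex101.com/ to see which parts of a test string each piece of the expression matches.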