Digital Humanities Summer Institute #DHSI18 Day 1 [Afternoon] Making Choices About Your Data
Paige Morgan and Yvonne Lam [ #wrangledata ]
Are you in it for the process or the product? Make sure you, your tenure committee, and your chair are on the same page.
Example: the Old Bailey Online is one of the most successful projects. It has over 300 years of records from London's criminal court, and you can search across all sorts of facets. The project has several controlled vocabularies for offenses, verdicts, sentences, etc. This successful project was funded by UK granting agencies that award grants in the high six figures to low seven figures (pounds, not dollars). That's the kind of money it takes for a source project.
Depending on where you get your DH news, you'll hear about different types and aspects of DH. Twitter: cool projects, "I'm looking for this kind of tool," "omg this tool is failing," small projects, and people struggling with DH. You won't hear that from the elite and official sources. If you read mainly elite, official sources, you might think your first or second step is an NEH grant. This is highly unlikely. Disabuse yourself of the expectation that this is how DH works. This is important for those starting out.
You don't need to apologize for not programming or for not wanting to. (Yvonne is a professional programmer.) MEALS and science and technology studies: technology shapes how we think. If you program, you look at problems in a certain way and reach for certain methods. There's a feeling that those methods are somehow more official and real (Vannevar Bush: the best tech experience is a direct brain dump without physicality). The idea that everyone should have to program, or that your work is less real if you don't: no, that is not inherent or true. Nothing about programming makes it more special than any other tool. Look at your data, especially at the beginning, to see what tools would be useful.
[Nonrelational: SPARQL queries RDF.]
Even if you can't ask a specialist question in specialist language, knowing your data will help you get an answer faster (instead of the question landing on the Pile of Shame of things that take too much time to do). What will help you answer questions: What data do you have? What format is it in? Is it a modern format? How long a period does it span? How much data do you have?
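For born-digital or digitized material, the "what format?" and "how much?" questions can be partly answered mechanically. Below is a minimal Python sketch, assuming your files sit in one local folder and that the file extension is a reasonable stand-in for format. `scan` and `summarize` are hypothetical helper names for illustration, not part of any DH tool.

```python
import os
from collections import Counter
from pathlib import Path


def scan(root):
    """Yield (relative_path, size_in_bytes) for every file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            yield os.path.relpath(full, root), os.path.getsize(full)


def summarize(entries):
    """Tally file count and total bytes per format, using the extension as a proxy."""
    counts = Counter()
    sizes = Counter()
    for path, size in entries:
        # Normalize case so ".PDF" and ".pdf" count as the same format.
        ext = Path(path).suffix.lower() or "(no extension)"
        counts[ext] += 1
        sizes[ext] += size
    return counts, sizes


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    counts, sizes = summarize(scan(root))
    for ext, n in counts.most_common():
        print(f"{ext:16} {n:6d} files {sizes[ext]:12d} bytes")
```

Run it as `python inventory.py path/to/your/material` (the script name is arbitrary). It won't tell you how long a period the material spans; that usually lives in the content or metadata, not the file system.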
Activity: Create Your Data (Group Work Notes)
Part 1: Take an object that is part of a collection and create data for the collection.
Basic metadata: author, category of food, URLs, website host (Food Network), ingredients, cook time, servings, prep time, total time, equipment, images (y/n), photo credit, nutrition facts included or not, nutrition facts
Questions people might ask of the data: number of ingredients, reviews/how good is it, chef name, professional chef or home cook, servings, category of food (breakfast, lunch, dinner, dessert), particular ingredient, calories/nutrition info
Part 2: Add a second object that expands the scope of the collection, so the data model needs to be revised.
Added a physical banana.
Add format to metadata (paper, consumable banana itself). How does it change what collection is about? Catalog a consumable. Recipes and kitchen inventory? CSA shares in a farm? Lots of ways to organize it? Species of banana?
Compare the banana's calories to the published nutritional value, because bananas differ in size.
Structure of data: relational tables so we can cross-reference.
Possible RQs: What is assumed to be in a normal pantry, and what isn't? Ingredients. How are recipes constructed? Do a real banana's calories match the calories reported by the recipe? (We might not provide that info, but others could add it to the data for their own use.)
Basic metadata: format,
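The move to relational tables, with format added to the metadata, can be sketched with Python's built-in `sqlite3` module. This is a hypothetical schema for illustration: the table and column names and the sample items are assumptions, not the model the group actually built. A join table lets the same ingredient be cross-referenced across a web recipe and a physical banana.

```python
import sqlite3

# Hypothetical schema: catalogued items of any format, a table of ingredients,
# and a join table linking them so ingredients can be cross-referenced.
SCHEMA = """
CREATE TABLE item (
    id      INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    format  TEXT NOT NULL,   -- e.g. 'web', 'paper', 'physical'
    host    TEXT             -- e.g. 'Food Network'; NULL for a physical object
);
CREATE TABLE ingredient (
    id    INTEGER PRIMARY KEY,
    name  TEXT UNIQUE NOT NULL
);
CREATE TABLE item_ingredient (
    item_id        INTEGER REFERENCES item(id),
    ingredient_id  INTEGER REFERENCES ingredient(id),
    PRIMARY KEY (item_id, ingredient_id)
);
"""


def build_catalog():
    """Create an in-memory database with the hypothetical schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def add_item(db, title, fmt, host=None, ingredients=()):
    """Insert one item and link it to its ingredients, creating them if needed."""
    cur = db.execute(
        "INSERT INTO item (title, format, host) VALUES (?, ?, ?)", (title, fmt, host)
    )
    item_id = cur.lastrowid
    for name in ingredients:
        db.execute("INSERT OR IGNORE INTO ingredient (name) VALUES (?)", (name,))
        (ing_id,) = db.execute(
            "SELECT id FROM ingredient WHERE name = ?", (name,)
        ).fetchone()
        db.execute("INSERT INTO item_ingredient VALUES (?, ?)", (item_id, ing_id))
    return item_id


def items_with(db, ingredient):
    """Return (title, format) for every item linked to the given ingredient."""
    return db.execute(
        """SELECT item.title, item.format FROM item
           JOIN item_ingredient ON item_ingredient.item_id = item.id
           JOIN ingredient ON ingredient.id = item_ingredient.ingredient_id
           WHERE ingredient.name = ?
           ORDER BY item.title""",
        (ingredient,),
    ).fetchall()


if __name__ == "__main__":
    db = build_catalog()
    # Hypothetical sample rows standing in for the recipe and the banana.
    add_item(db, "Example recipe", "web", host="Food Network", ingredients=["banana", "flour"])
    add_item(db, "Banana", "physical", ingredients=["banana"])
    print(items_with(db, "banana"))
```

With ingredients in their own table rather than a text field on each recipe, adding a new kind of object (the banana) only needs a new `format` value, and queries like "everything involving banana" work across formats.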
---
It's useful to be defamiliarized from your data. Think about the different choices you make with data that is not yours. Some data is ephemeral (the banana won't last). Sometimes things break your expectations, and your standard workflow does not work.
-------
Longer Intros: What Data are You Working With? Why?
What is your data/material?
How much do you have now?
What format is it in? (images? PDFs? plain text? Something else?)
Is there more data you are hoping to incorporate?
What sort of questions are you interested in asking?
What is your goal with this data? Process? Product? Success on the job market?
Challenges:
Feeling behind on the tech, so much to know!
Migration to Samvera
Publication complications with NPS data
Being sensitive to native peoples, Chumash and possible options including Mukurtu