
Showing posts with the label data

Access 2018 Conference Day 2 Afternoon Sessions #AccessYMH

Integrating Digital Humanities Into the Web of Scholarship with SHARE: An Exploration of Requirements Joanne Paterson osf.io/pkvtu Today: going to talk about SHARE, ways to use it, integrating DH scholarship, emerging themes, and initial thoughts. What is SHARE? A schema-agnostic approach to aggregating diverse metadata. Community open source initiative. Scholars are doing various things; how can we bring all that together so we can see their body of work and the things related to it? ARL initiative started in 2013. Aggregates metadata. Looks at the research cycle and the various outputs of research. To aggregate metadata, they put out a call for someone to help them build this, and the Center for Open Science answered it. (OSF looks at the research workflow and allows you to collaborate with others and share easily.) OSF is free and open project support; you can work privately or publicly. SHARE harvested datasets from wherever they're open, plus metadata about scholarly research: Scholars Portal, figshare...
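The notes describe SHARE as a schema-agnostic aggregator of metadata harvested from many sources. As a rough illustration only, the Python sketch below shows one simple way an aggregator might accept records in each source's own field names and map them into a shared shape while keeping the original record intact. The source names (`repo_a`, `repo_b`), the field names, and the `WorkRecord` type are hypothetical; this is not SHARE's actual schema, pipeline, or API.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical common record shape; NOT SHARE's real data model.
@dataclass
class WorkRecord:
    title: str
    creators: list[str]
    source: str
    raw: dict[str, Any]  # harvested record kept as-is, so no source-specific detail is lost
    identifiers: list[str] = field(default_factory=list)

# Hypothetical per-source field maps: each source names the same idea differently.
FIELD_MAPS = {
    "repo_a": {"title": "dc_title", "creators": "dc_creator", "identifiers": "dc_identifier"},
    "repo_b": {"title": "name", "creators": "authors", "identifiers": "doi"},
}

def as_list(value: Any) -> list:
    """Wrap single values in a list so every source yields list-valued fields."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def normalize(source: str, raw: dict[str, Any]) -> WorkRecord:
    """Map one harvested record into the common shape using its source's field map."""
    fmap = FIELD_MAPS[source]
    return WorkRecord(
        title=str(raw.get(fmap["title"], "")).strip(),
        creators=as_list(raw.get(fmap["creators"])),
        source=source,
        raw=raw,
        identifiers=as_list(raw.get(fmap["identifiers"])),
    )

def aggregate(harvests: list[tuple[str, dict[str, Any]]]) -> list[WorkRecord]:
    """Normalize (source, record) pairs harvested from several repositories."""
    return [normalize(source, raw) for source, raw in harvests]
```

In this sketch, adding a new source means adding a field map rather than changing `WorkRecord`, and keeping `raw` is one way to avoid discarding whatever a given schema records that the common shape does not.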

Digital Humanities Summer Institute #DHSI18 Day 4 [Morning] Making Choices About Your Data

Making Choices About Your Data Digital Humanities Summer Institute #DHSI18 Day 4 (Morning) Paige Morgan and Yvonne Lam Housekeeping. The Shock of the Old (D. Edgerton): people adopt new tech, but the old runs alongside it for many reasons. (I'd also recommend The Diffusion of Innovations on top of this.) Class discussion of D'Ignazio and Klein's "Feminist Data Visualization." Colonialist legacies of tabular data. Even JSON's tree structure is hierarchical. What should we use? Jacqueline Wernimont's Numbered Lives: Life and Death in Quantum Media. Donella Meadows: leverage points. Instead of massive interventions (which can be the most temporary and ephemeral), smaller interventions can be the most powerful. Bethany's piece on "The Eternal September" of DH: when do you stop explaining yourself to newcomers without establishing gatekeeping? People have to discover things for themselves and figure out where they are. --------------------------- Visi...

Digital Humanities Summer Institute #DHSI18 Day 3 [Afternoon] Making Choices About Your Data

Making Choices About Your Data Digital Humanities Summer Institute #DHSI18 Day 3 (Afternoon) Paige Morgan and Yvonne Lam Article: "Against Cleaning." When you clean data, you commit to giving certain answers. How do I make this material discoverable and allow it to intersect more clearly with discoveries being made in this field? You may feel like you need to tune your data so it gives specific answers. But the goal is not to get the project to spit out answers for people; it is to give answers that help people rethink. What is the info I want to surface for people, and how do I get my data to surface it? [Much more concern here among the digital humanists for how *others* are going to use data than in my experience with social science, where we collect our data to answer our questions, then fin. Kudos to DH folks!] Expansion without growth: scalability. Who is your audience? Who is relying on your workflow or on decisions you made that you can't explain? T...

Digital Humanities Summer Institute #DHSI18 Day 3 [Morning] Making Choices About Your Data

Making Choices About Your Data Digital Humanities Summer Institute #DHSI18 Day 3 (Morning) Paige Morgan and Yvonne Lam Standardized rights statements: http://rightsstatements.org/en/ Agenda: controlled vocabularies; working with OpenRefine; free work time; lunch; reading: "Against Cleaning"; free work time. Tomorrow: meeting with FemDH. Controlled vocabulary: a set of carefully chosen words and phrases used to help structure and define information so that it can be easily returned in a search or parsed by analysis programs. May be the basis for taxonomies and ontologies; can be hierarchical or restricted in various ways. Ex) Pizza vocabulary: Crust (deep dish, crispy); Sauce (marinara, alfredo, olive oil); Cheese (mozzarella, provolone, parmesan); Veggies (mushrooms, green peppers, onions, tomatoes, olives); Meat. We can say every pizza must have a crust, must have one or more sauces, etc. Can add another layer and say there are 'veggie pizzas' and 'm...
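To make the pizza example concrete, here is a small Python sketch of a controlled vocabulary with the two rules from the notes: a pizza must have a crust and one or more sauces. Reading "must have a crust" as "exactly one crust" is an assumption, as are the function names `validate` and `is_veggie`. The notes don't list any meat terms, so that facet is left empty here, which means any meat term gets flagged until someone adds it.

```python
# Controlled vocabulary from the class pizza example: facet -> allowed terms.
VOCAB: dict[str, set[str]] = {
    "crust": {"deep dish", "crispy"},
    "sauce": {"marinara", "alfredo", "olive oil"},
    "cheese": {"mozzarella", "provolone", "parmesan"},
    "veggies": {"mushrooms", "green peppers", "onions", "tomatoes", "olives"},
    "meat": set(),  # terms not listed in the notes; left empty on purpose
}

def validate(pizza: dict[str, list[str]]) -> list[str]:
    """Return a list of problems; an empty list means the record follows the vocabulary."""
    problems = []
    for facet, terms in pizza.items():
        if facet not in VOCAB:
            problems.append(f"unknown facet: {facet}")
            continue
        for term in terms:
            # Compare case-insensitively so "Provolone" and "provolone" match.
            if term.lower() not in VOCAB[facet]:
                problems.append(f"{term!r} is not an allowed {facet}")
    # Structural rules (assumed reading: exactly one crust, at least one sauce).
    if len(pizza.get("crust", [])) != 1:
        problems.append("a pizza must have exactly one crust")
    if not pizza.get("sauce"):
        problems.append("a pizza must have one or more sauces")
    return problems

def is_veggie(pizza: dict[str, list[str]]) -> bool:
    """A second layer of classification: a 'veggie pizza' has no meat terms."""
    return not pizza.get("meat")
```

Running `validate` over every row before analysis is one way to catch spelling variants (e.g. "mozarella") that would otherwise split one category into two.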

Digital Humanities Summer Institute #DHSI18 Day 2 [Afternoon] Making Choices About Your Data

Making Choices About Your Data Digital Humanities Summer Institute #DHSI18 Day 2 Afternoon Paige Morgan and Yvonne Lam Find and share datasets at: Figshare; Humanities CORE repository; Data is Plural; Twitter (#dataset); GitHub. Documentation. Data dictionaries: a record of what the data is and isn't supposed to do, with definitions and usage. Similar to a codebook, which is used more by folks working with coding languages that define different functions (how was it done in this experiment?). Data dictionaries do the same for the humanities: what are your categories meant to cover? Workflows: a set of instructions/rules (doesn't need to be a table; can be a list of what to do for each thing and what not to do); see smartdraw.com. For tomorrow: OpenRefine (openrefine.org) is free and works on Windows and Linux (use 2.8, not the beta).
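A data dictionary can also live alongside the data in a machine-checkable form. The Python sketch below is one hypothetical way to do that: each entry records what a field covers and what it doesn't, plus (optionally) its permitted categories, and a checker lists columns or values the dictionary does not document. The `FieldSpec` type, the poetry-dataset fields, and `undocumented_values` are illustrative assumptions, not a standard format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FieldSpec:
    name: str
    definition: str                       # what the field is meant to record
    excludes: str                         # what the field is NOT meant to record
    allowed: Optional[frozenset] = None   # None = free text; otherwise the permitted categories

# Hypothetical entries for a small poetry dataset.
DATA_DICTIONARY = [
    FieldSpec(
        name="genre",
        definition="Verse form as stated on the title page",
        excludes="Forms inferred by the encoder",
        allowed=frozenset({"sonnet", "ode", "elegy"}),
    ),
    FieldSpec(
        name="place",
        definition="Place of publication, transcribed as printed",
        excludes="Modernized or standardized place names",
    ),
]

def undocumented_values(rows, dictionary):
    """Return (field, value) pairs that the data dictionary does not cover."""
    specs = {spec.name: spec for spec in dictionary}
    issues = []
    for row in rows:
        for name, value in row.items():
            spec = specs.get(name)
            if spec is None:
                issues.append((name, value))  # column missing from the dictionary
            elif spec.allowed is not None and value not in spec.allowed:
                issues.append((name, value))  # value outside the documented categories
    return issues
```

The `excludes` entry is the part that answers "what are your categories meant to cover?" for a human reader; the checker only enforces the parts that can be enforced.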

Digital Humanities Summer Institute #DHSI18 Day 2 [Morning] Making Choices About Your Data

Making Choices About Your Data Digital Humanities Summer Institute #DHSI18 Day 2 Morning Paige Morgan and Yvonne Lam Clean data vs. tidy data. Cleaner data is grouped into the fewest 'boxes' (categories) possible, which makes the data more interoperable and legible to their agencies. Think 'race/ethnicity': either a few checkboxes/labels, or an open field where folks can write in anything at all (which makes running analysis difficult). Ambiguity and complexity. Ambiguity: how does having more or less ambiguity in your data/project affect where the work goes? Limited categories are legible and understandable to others. If you are studying something that manifests differently among categories, you'd need the 'messier', more detailed data. Trade-off: machine parsable = less accurate representation of complexity; non-machine-parsable = more accurate representation of complexity. Book recommendation: Sorting Things Out - death causes and diseases data. Dataset originated for people working on merchant ...
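The trade-off between a few clean boxes and messier, more detailed data shows up even in a tiny example. The Python sketch below collapses hypothetical occupation entries from historical records into broad categories; the mapping, category names, and function names are assumptions for illustration. Cordwainers (who made new shoes) and cobblers (who repaired them) were historically distinct trades, a distinction the broad "shoemaking" box erases.

```python
from collections import Counter

# Hypothetical mapping from detailed entries to a few broad 'boxes'.
OCCUPATIONS = {
    "weaver": "textile trade",
    "spinner": "textile trade",
    "cordwainer": "shoemaking",
    "cobbler": "shoemaking",
}

def collapse(entries: list[str], mapping: dict[str, str], other: str = "Other") -> list[str]:
    """Map each detailed entry to its broad category; anything unmapped goes to `other`."""
    return [mapping.get(e.strip().lower(), other) for e in entries]

def compare(entries: list[str], mapping: dict[str, str]) -> tuple[Counter, Counter]:
    """Count the detailed values and the collapsed categories side by side."""
    detailed = Counter(e.strip().lower() for e in entries)
    broad = Counter(collapse(entries, mapping))
    return detailed, broad

entries = ["Weaver", "spinner", "Cordwainer", "cobbler", "weaver", "lamplighter"]
detailed, broad = compare(entries, OCCUPATIONS)
print(len(detailed), "detailed values ->", len(broad), "categories")  # 5 detailed values -> 3 categories
```

The broad counts are easier to compare across datasets, but the "Other" bucket and the merged trades are exactly where the more detailed data would have carried information.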

Digital Humanities Summer Institute #DHSI18 Day 1 [Afternoon] Making Choices About Your Data

Digital Humanities Summer Institute #DHSI18 Day 1 Afternoon Paige Morgan and Yvonne Lam [ #wrangledata ] Are you in it for the process or the product? Need to be sure you, your tenure committee, and your chair are on the same page. Ex: the Old Bailey Online is one of the most successful projects. Records from London's central criminal court from 1674 to 1913. Can search all sorts of facets. Project has several controlled vocabularies for offenses, verdicts, sentences, etc. This successful project was funded by UK granting agencies that grant in the high six figures to low seven figures (pounds, not dollars); that's the kind of money it takes for a source project. Depending on where you get news of DH from, you'll hear about different types and aspects of DH. Twitter: cool projects, 'I'm looking for this kind of tool', 'omg this tool is failing', small projects and struggling with DH. You're not going to hear that from the elite and official sources; if reading mainly elite official sources, your first o...

Digital Humanities Summer Institute #DHSI18 Day 1 [Morning] Making Choices About Your Data

Digital Humanities Summer Institute #DHSI18 Day 1 Morning Paige Morgan and Yvonne Lam [ #wrangledata ] Goals: a spreadsheet of data and metadata you can take to a librarian or developer; a clearer idea of what research questions you can ask of your data; a better sense of what tools would be a good fit for your data, or what you would need to do to your data to make it work better with certain tools; the start of specific plans about work that you want to do ON your data. So much depends on what you're going to prioritize, because you are not going to learn all the things at once. Encouragement to think carefully, realistically, and generously with ourselves about setting goals for what we're going to learn. Goals, milestones, FemTechNet MEALS Framework. The idea is to poke a little bit at assumptions we have about how technologies work and good ways of using them, and what's an acceptable thing to apply technology to (mostly discussing digital tech). Not only is there this idea of ho...

IRPE Colloquium: Predicting Graduation for First Time Full-time Freshmen at CSUCI

IRPE Colloquium: Predicting Graduation for First Time Full-time Freshmen at CSUCI Kristin Jordan (SOC, IRPE) & Jared Barton (ECON) 4/23/18 Where to find info: see this IRPE research brief. Characteristics at admission predict graduation: HS GPA, SAT scores, high school curriculum, race/ethnicity, income, parent education. Looking at the underrepresented minority (URM) student achievement gap, the income gap (Pell grant or not), and first-generation college student or not. Use info at admissions to explain past graduation rates and also to forecast and understand future graduation rates. Goals: examine which characteristics predict student success, and decompose achievement gaps into what we can explain and what is left unexplained. Gaps measured one characteristic at a time are overstated because students appear in more than one category. Achievement gap characteristics are correlated with other known (positive) predictors of graduation. ...
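The notes don't say which method was used to split gaps into explained and unexplained parts. One standard technique for this in economics is the Blinder-Oaxaca decomposition, sketched below as background rather than as the presenters' method. It assumes two groups $A$ and $B$ (e.g. Pell and non-Pell students), a linear model of graduation $Y$ on admission characteristics $X$ (with $\bar{X}_g$ the row vector of group means, including a 1 for the intercept), and coefficients $\hat{\beta}_g$ fit separately by OLS for each group.

```latex
\begin{aligned}
\bar{Y}_A - \bar{Y}_B
  &= \bar{X}_A\hat{\beta}_A - \bar{X}_B\hat{\beta}_B \\
  &= \underbrace{(\bar{X}_A - \bar{X}_B)\,\hat{\beta}_B}_{\text{explained by characteristics}}
   + \underbrace{\bar{X}_A\,(\hat{\beta}_A - \hat{\beta}_B)}_{\text{unexplained}}
\end{aligned}
```

The first line holds because OLS with an intercept passes through each group's means; the second adds and subtracts $\bar{X}_A\hat{\beta}_B$. The split depends on which group's coefficients serve as the reference ($\hat{\beta}_B$ here).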

Citation Metrics and Altmetrics: A Brief Overview (Computers in Libraries 2018)

Citation Metrics and Altmetrics: A Brief Overview Elaine Lasda, Associate Librarian, University at Albany Proprietary resources: Clarivate Analytics, Scopus, PlumX (bought by Elsevier, used by Scopus). When Web of Science isn't enough or available. Free resources (see resource guide online at conference site). Citation/bibliometric tools. Dimensions.ai is a free citation database, very similar to Scopus. Kopernio: one-stop document retrieval browser add-on. Clarivate Analytics: vanity assessment, but can help identify hot topics and the bleeding edge of research. Tough to search for a specific journal or researcher. Journalmetrics.com: Scopus' CiteScore, a journal metric that is a ratio like the journal impact factor but includes items other than scholarly peer-reviewed articles: editorials, conference proceedings, review articles. Percentile rank, citation counts, SNIP, SJR (SNIP is supposed to correct for disciplinary differences in impact factor, but only thing that corrects for it reall...
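Both CiteScore and the journal impact factor are ratios of recent citations to recent publications; they differ mainly in the citation window and in which document types count in the denominator. The Python sketch below computes that generic ratio. Using a 3-year window with all document types for CiteScore reflects how it was defined around 2018 (the methodology has since been revised), and the 2-year, "citable items" denominator follows the impact factor definition; the function name and the journal figures in the usage example are hypothetical.

```python
def citation_ratio(citations: dict[int, int], documents: dict[int, int],
                   year: int, window: int) -> float:
    """Citations received in `year` by items published in the `window` years before it,
    divided by the number of items published in those same years.

    citations: publication year -> citations those items received in `year`
    documents: publication year -> number of items counted in the denominator
    """
    years = range(year - window, year)
    cites = sum(citations.get(y, 0) for y in years)
    docs = sum(documents.get(y, 0) for y in years)
    return cites / docs if docs else 0.0

# Hypothetical journal: citations received in 2018, keyed by the cited items' publication year.
cites_2018 = {2015: 30, 2016: 40, 2017: 50}
all_documents = {2015: 20, 2016: 20, 2017: 20}   # every document type (CiteScore-style)
citable_items = {2016: 15, 2017: 15}             # articles and reviews only (impact-factor-style)

print(citation_ratio(cites_2018, all_documents, 2018, window=3))  # 2.0
print(citation_ratio(cites_2018, citable_items, 2018, window=2))  # 3.0
```

The same citations yield different scores depending on the window and on whether editorials and similar items enlarge the denominator. In the impact-factor-style call, the numerator still counts citations to every item while the denominator counts only citable items, a commonly noted asymmetry.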

UC DLFx 2018: Defining and Sustaining Digital Collection and Scholarship Services

UC DLFx 2018 Defining and Sustaining Digital Collection and Scholarship Services Zoe Borovsky (UCLA), Mary Elings, Erik Mitchell (UCB), Laura Smart (UCI), Carl G. Stahmer (UCD), Stacey Reardon (UCB) Dialogic open space: panelists will introduce. Framing questions: What current use cases demonstrate a need for DS? Who are we missing? What are the demographics of the folks we're serving? How are digital outputs changing our collection and preservation strategies, and what changes do we need to make in the future? What additional or redeployed resources and labor will be required to provide necessary services? Are current and imagined services sustainable compared to traditional library services? https://ds.lib.ucdavis.edu/ucdlf has the questionnaire. There's no one definition of digital scholarship; we're seeing work with different groups on campus, but we're not sure how best to work with each group. Is this a question of expertise, or of the infrastructure that we're providing? Different v...

UC DLFx 2018: Combined Session E Notes

Launching the Digital Lifecycle Program at UCB Lynne Grigsby (Head, Library IT) and Erik Mitchell, AUL/DCS Major strategic initiative through the Library's strategic planning process. Over the last year and a half, developing the current digitization strategy. Digital Lifecycle Program Mission and Goals: To convert and publish the UCB collections on a massive scale. (To digitize content and shepherd all digitally converted assets of the library through their lifecycle to serve the mission of the library.) Goal 1: Create and manage digitized assets. Goal 2: Have a formal, ongoing, and sustainable digitization program that focuses on the entire digital lifecycle. Goal 3: Provide effective access and widespread dissemination of content. How do you effect this change from ad hoc, project-based digitization to high throughput? Convert: - Respect and identify multiple digitization streams and make sure they don't conflict (wax cylinder digitization, vendor sourced 2D material digitization, ...