Access 2017 Conference Day 1 #accessYXE Notes Sessions 4-7

Session 4: Excavating the 80s: Strategies for Restoring Digital Artifacts From the First Era of Personal Computing - John Durno

"Avoiding technological quicksand" Rothenberg presentation review: hard copy, standards, computer museums, format migration, emulation. Ultimately he argued all but emulation would be of limited utility.

David Bearman's "Reality and Chimeras in the Preservation of Electronic Records" in 1999 D-Lib. OAIS Functional Model. Rothenberg's emulation still at work - see Internet Archive, see code4lib paper (missed cite), Preserving and Emulating Digital Art Objects (Cornell white paper). Usually case-based choosing whatever works, not prescripted.

Case Study 1: AtariWriter. No modern software can read or convert the files, but if you can play games on an old platform, someone has probably written an emulator for it. Locate, install, and configure an open source emulator. Tracking down old software is not usually that difficult, though the legality is uncertain; copying abandonware for purposes of retrieval and study may fall under fair use...maybe. The reality is that in most cases you don't want to read old documents that way, reconfiguring the emulator every time. But it's easy to configure the emulator to bridge printer output to the host operating system, so documents could be printed to PDF. Even if you think format migration is the best way to go, sometimes the only way to get there is via emulation.

Case 2: WordStar. Opening files one by one and converting them is tedious, and batch converters for WordStar are scarce. You can fall back on standards: 7-bit ASCII. But WordStar repurposed the high bit, setting it on the last letter of each word, which makes those characters unreadable as plain text. Perl can run a batch job to strip it (sketch below). You lose formatting converting to ASCII, but most old files didn't have much formatting, especially correspondence and personal papers.
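
The speaker used Perl for the batch job; here is a minimal sketch of the same high-bit stripping in Ruby (filenames hypothetical; dot commands and other control codes would still need cleanup):

```ruby
# Clear the high bit WordStar sets on the last letter of each word,
# recovering 7-bit ASCII. Formatting is lost, as noted above.
# Usage: ruby wordstar_strip.rb letters/*.ws
ARGV.each do |path|
  text = File.binread(path).bytes.map { |b| (b & 0x7F).chr }.join
  File.write("#{path}.txt", text)
end
```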

Case 3: Telidon 699. In 2015, gave a presentation on retrieving images from this format. Initial success with an emulation-based approach, but a few hundred more files didn't work so well: the problem files had been developed in an earlier version of the format. Consumer-grade computers couldn't render Telidon natively, so specialized peripherals were developed solely to render Telidon images. Given the ultimate failure of Telidon to gain traction, that hardware is almost impossible to track down, but it's almost the only way to make the files intelligible short of writing a decoder from scratch. Luck: located a functional Telidon 699 decoder at the Spark Radio Museum outside of Vancouver, along with a technician. You need a moving-image format to capture the medium's unique qualities. How to transfer images across generations of technology? Recorded off a CRT using a video camera (similar to the way the original creators did it). Glenn E. Howarth image: Moon Meat: Vegetarian Nightmare.

Even if you agree that hardware preservation is hard, you can't get away from it: when someone brings you old floppies, the reality is that the drives needed to read them hook up over SCSI and haven't had drivers updated since Windows 98. The debate between emulation and migration is a red herring; different classes of material require different handling. The requirements of practice can confound and complicate the requirements of theory.

Session 5: Opening the DAMS: Open Systems, Open Data, and Open Collaboration with Samvera at UVic - Dean Seeman and Lisa Goddard

Samvera (formerly called Hydra) is a toolset that sits on top of the Fedora repository (Fedora 4); Samvera is the pathway to Fedora. Samvera is a complex stack of open source components. The simple way of installing it is Hyku (Hydra-in-a-Box), the closest thing to a turnkey, pregenerated application you can install in a sandbox. UVic branded its Hyku instance as Vault (vault.library.uvic.ca); it's not in production yet, but they are starting to populate it.

CONTENTdm was the workhorse but doesn't align well with strategic objectives, which include the ability to roll the system out to faculty for their research projects. Example: the Grants menu, listing the in-kind value faculty can add to grant applications (http://tiny.cc/uvicgrants). Currently you have to have a web client, which is a real obstacle to broad collaboration. They need a system that supports multitenancy: faculty won't want just a library asset system, but their own system, display, permissions, etc. They also need a tool for working with images (IIIF is supported by Samvera, which is especially important for annotating images). Using the Spotlight exhibit platform: objects in the DAMS get pulled into exhibits where narratives and content can be built around them. Both use Ruby and a similar architecture.

They wanted more control over development of the system, rolling out features when needed rather than waiting for long development cycles. Help faculty, but use their funding to fund new features, instead of leaving small siloed pieces of software all over campus with no benefit beyond the initial project. They need a data store for all emerging digital activities, optimized for academic libraries. Built-in versioning, automated checksums, and audits that can be run against those checksums to ensure files haven't degraded in non-obvious ways. Globally unique identifiers make audits easier. Globally web-addressable.
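
A minimal sketch of the kind of fixity audit described, assuming checksums recorded at ingest (paths and values here are hypothetical):

```ruby
require 'digest'

# Recompute each file's checksum and compare it against the value stored
# at ingest; a mismatch flags silent, non-obvious degradation.
manifest = {
  'objects/letter-001.tif' => '9f86d081884c7d65...' # hypothetical stored SHA-256
}

manifest.each do |path, expected|
  actual = Digest::SHA256.file(path).hexdigest
  puts "#{path}: #{actual == expected ? 'ok' : 'FIXITY FAILURE'}"
end
```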

Fedora 4 is based on the LDP (Linked Data Platform) specification and exposes standard, well-documented APIs for create-read-update-delete requests. The Portland Common Data Model is supported. They hope the linked open data architecture will result in data that is web-native, taking advantage of web protocols. Systems are not permanent, but we hope our metadata are. An external triplestore can be plugged into Fedora, letting us see our data in other contexts and see how it links, or should link, to other datasets.
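
Because the API is LDP, those create-read-update-delete requests are plain HTTP. A sketch against a hypothetical local Fedora 4 endpoint:

```ruby
require 'net/http'
require 'uri'

base = URI('http://localhost:8080/rest/')  # hypothetical Fedora 4 REST endpoint

# Create: POST to a container mints a new resource; Fedora returns its URI.
created  = Net::HTTP.post(base, '', 'Content-Type' => 'text/turtle')
resource = URI(created['Location'])

# Read: GET the resource back as RDF.
puts Net::HTTP.get(resource)

# Delete: remove the resource with a DELETE request.
Net::HTTP.start(resource.host, resource.port) { |http| http.delete(resource.path) }
```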

Advantages from a cataloging perspective: why change systems? After eight years on CONTENTdm, there's got to be a better way. It gives a chance to rethink metadata, to get our feet wet with linked data practically, and agency.

Rethink metadata: analyzed CONTENTdm and found 52 collections and 162 unique local fields, 89 of which were used only once across the entire system: inconsistent (see the sketch below). They need an application profile, borrowed from Europeana, etc. Know the metadata you have and how you have used it in the past, and make informed decisions when designing the profile. Beyond properties, let's talk values and consistency: controlled vocabularies, RDF, and identifiers with URIs to make linked data happen. Going for interoperability, giving up some specificity for it. A practical apology for linked data: global identifiers > local.
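
A sketch of the kind of field audit behind those numbers, assuming one CONTENTdm CSV export per collection (the file layout is hypothetical):

```ruby
require 'csv'

# Tally how many collections use each local field, then flag the one-off
# fields that make cross-collection searching inconsistent.
counts = Hash.new(0)
Dir.glob('exports/*.csv') do |file|
  CSV.read(file, headers: true).headers.each { |field| counts[field] += 1 }
end

counts.select { |_field, n| n == 1 }.each_key { |field| puts "used once: #{field}" }
puts "#{counts.size} unique fields total"
```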

Agency: often we don't get a say in how systems are developed. What principles can we enact? We need people and machines to migrate and create good data. Humans with varying technical skill need to help create good data.

Machine data creation -> Human Judgment Required -> Machine Consumption + Human Consumption

How to enact this? Field/property mapping from CONTENTdm fields to Samvera fields. Content migration mapping (values): used "Vaultify", which tries to assign values and syntax automatically, drawing on controlled vocabularies as much as possible. Fine for migration, but how do you marry humans and machines for metadata completion going forward? Autocomplete, retrieving controlled vocabularies and URIs, helping standardize syntax and other content where possible. See the IMLS grant initiative with the University of Houston.
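
Not UVic's actual "Vaultify" code, but a sketch of the idea: map CONTENTdm fields onto the application profile's properties and swap in a controlled-vocabulary URI wherever a value matches (field names, property names, and URIs all hypothetical):

```ruby
# CONTENTdm field name -> profile property
FIELD_MAP = { 'Creator' => :creator, 'Date Original' => :date_created,
              'Subject' => :subject }.freeze

# Reconciliation table from legacy string values to vocabulary URIs
VOCAB = { 'victoria (b.c.)' => 'http://example.org/vocab/victoria-bc' }.freeze

def vaultify(record)
  record.each_with_object({}) do |(field, value), mapped|
    property = FIELD_MAP[field] or next        # drop unmapped local fields
    mapped[property] = VOCAB.fetch(value.downcase, value) # prefer a URI when one matches
  end
end

vaultify('Creator' => 'Smith, Jane', 'Subject' => 'Victoria (B.C.)')
# => { creator: "Smith, Jane", subject: "http://example.org/vocab/victoria-bc" }
```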

Q&A: researchers are happy to have the help, because they're not in it to write software and can focus on their specific research questions; it also provides in-kind value for grants, and an opportunity to talk about digital preservation anyway.

Personal note: yes, get on this. Relevant to what we want to do at CSUCI.

Q&A, provenance issue: do these things become part of the Library's preservation and intellectual responsibility? Onboarding projects into the DAMS is preferred because the library gets input into the metadata and the technology. At least if you can work with researchers on metadata models from the beginning, you get material that is more easily preservable. Yes, UVicLib will take responsibility for those digital materials; whether they'll be indexed remains to be seen.

Session 6: The Way Leads to PushMi-Pullyu, a Lightweight Approach to Managing Content Flow for Repository Preservation at UofA Libraries - Weiwei Shi, Shane Murnaghan, & Matt Barnett

Pushmi-Pullyu is named for the creature in Doctor Dolittle. It's a Ruby app running behind the firewall that pulls content from the Fedora repository in response to user action, constructs a lightweight archival information package, and pushes it into OpenStack Swift.

Stack: ERA (institutional repository): Ruby on Rails, currently based on Sufia 6.2 (a Samvera head)
Fedora 4 repository: open source system for management and [...]

OpenStack Swift for long-term preservation storage: a highly available, distributed, consistent object store, with versioning, internal audits, quarantine, etc. to ensure the integrity of preserved objects.

Preservation commitment / preservation plan: gold, silver, and bronze tiers. They need lightweight tools to meet the baseline requirements of that commitment.

AIP (Archival Information Package): consists of content information and preservation description (content, metadata, packaging, and one or more files to capture a comprehensive image of the object). They need a lightweight AIP, without much investment, while still figuring out different targets but still meeting the baseline requirement. Lightweight AIP diagram: the object's content, metadata, thumbnails, and logs contained in a bag (in a tar file) with a manifest, stored as a Swift object with name-value pairs: project, project ID...
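
A sketch of that lightweight AIP push, assuming an already-bagged directory and a Swift endpoint and token (all names hypothetical); Swift stores the name-value pairs as X-Object-Meta-* headers:

```ruby
require 'net/http'
require 'uri'

# Tar up the bag (content files, metadata, thumbnails, logs + manifests).
system('tar', '-cf', 'aip.tar', 'bag/') or abort 'tar failed'

uri = URI('https://swift.example.org/v1/AUTH_era/preservation/noid-123.tar')
req = Net::HTTP::Put.new(uri)
req['X-Auth-Token']          = ENV.fetch('SWIFT_TOKEN')
req['X-Object-Meta-Project'] = 'ERA'     # name-value pairs on the Swift object
req.body = File.binread('aip.tar')

Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
```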

Why PushmiPullyu? They want future integration with Archivematica but need to fulfill today's preservation commitment (maybe not capturing more than the essential information). There are also IT security restrictions: direct access from the repository's public-facing network to preservation storage puts preserved content at risk, since someone could hack through the public side to attack preserved files.

PMPY development: the goal was to create the simplest thing that would work.
YAGNI development (You Aren't Gonna Need It) - no complex logging, no complex reporting, no complex web app to check status, as little custom development as possible
Fedora provides a messaging queue implemented with the Java Message Service (JMS), and PMPY could in theory trigger the preservation event from it. But it creates noise: Fedora's JMS sounds great, yet a single ingest produces cascades of saves and updates, almost 70 messages, as background jobs create characterization, derivatives, DOIs, etc. Which message do you want to preserve your item on? Going up to the Rails layer instead provides callbacks: anything that happens in the model can trigger code, so they use Rails "after_save" callbacks on the Item model.
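
A minimal sketch of that hook (the queue wrapper is hypothetical; in their stack the model would be a Sufia/ActiveFedora object):

```ruby
class Item < ActiveFedora::Base
  # Fire once per Rails-level save instead of listening to Fedora's noisy
  # JMS stream; the queue decides when the item actually gets pulled.
  after_save :flag_for_preservation

  private

  def flag_for_preservation
    PreservationQueue.touch(id)  # hypothetical wrapper around the queue
  end
end
```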

New problem: items are saved multiple times during ingest. They decided they really needed a priority queue, where the least recently updated items have the highest priority for preservation. Many complex queueing solutions support this, like RabbitMQ and others (missed one).

Redis sets guarantee an item appears once and only once no matter how many times it is added; Redis sorted sets add the ordering (more, ask for slides).
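
A sketch of the queue with the redis gem (names hypothetical): re-adding an item just updates its score, so it appears exactly once, and the lowest score, i.e. the least recently updated item, is served first:

```ruby
require 'redis'

QUEUE = 'preservation'
redis = Redis.new

# Every save re-scores the item with "now"; sorted-set semantics keep
# one entry per id no matter how many times it is added.
def touch(redis, id)
  redis.zadd(QUEUE, Time.now.to_f, id)
end

# Pop the least recently updated item (lowest score) for preservation.
def pop(redis)
  id, _score = redis.zrange(QUEUE, 0, 0, with_scores: true).first
  redis.zrem(QUEUE, id) if id
  id
end
```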

PMPY development, system design: the IR pushes into the priority queue; PMPY grabs from it and pushes into Swift. Software development: used GitHub with the ZenHub extension for project planning and management. Agile: sprint planning, sprint backlogs, sprint demos, standups, retrospectives; four weeks, four full-time developers. Continuous integration and style consistency (Hound, Rubocop, and EditorConfig help check and maintain consistent style on every commit). Working in Ruby, they could leverage others' work, writing less custom code and more tests. Log monitoring uses a paid service called Rollbar, useful for monitoring apps.

Next: gemify AIP creation and Swift ingestion; extend to file-system-based ingestion; extend components to other platforms in the Library requiring preservation.

Session 7: "No, We Can't Just Script It" - Danielle Robichaud & Sara Allen
Migrating archival data: archival data and archival description history, with a case study at the University of Waterloo. They do script a lot of data transformations rather than manually copying everything, but this talk focused on the factors that make that work difficult. Archives ≠ libraries, and archival data ≠ library data. All descriptions are original work; it's the first time the material has been described within the archival environment. There is no copy cataloging, so archivists need to spend time researching collections, reading about them, and writing about them, and they get emotionally involved with and protective of the data. Records are organic and interrelated. There is no 1:1 relationship between a descriptive record and the object itself (a record can describe one thing in a folder of things, a box of things, etc.). Archives don't usually describe individual photos because they usually don't have a mandate to do so; they describe some level of the hierarchy and depend on researchers to drill down and find individual items. Individual items get described only with funding from a donor, or by deciding (somewhat arbitrarily) what is important, or for legal reasons, etc.

Complex data, often fragile, not easy to replicate, stored somewhere on an archivist's hard drive. Cataloging has been working with shared standards for decades; archival standards adoption has been slow and uneven, with internal systems that are sometimes collection-specific, sometimes institution-specific. Timeline: 1990, the Canadian Rules for Archival Description (RAD) released; mid-90s, database solutions begin to replace paper records; 1995, HYPER RAD, a hyperlinked, easier-to-use version of RAD, released; late 90s-00s, better databases and specialized archival management software; 1998, EAD brings XML to archives, the first time archival data is machine-readable; 2001, the ICA OSARIS report recommends a standardized open source tool for encoding archival finding aids; July 2008, ICA-AtoM 1.0 beta released; November 2008, ICA-AtoM 1.0.4-beta adds support for RAD. Atherton quote from "Automation and the Dignity of the Archivist": "Just to mention the words 'computer' or 'automation' in some circles is to invite cold suspicious stares of hostility, making one feel as though he has said something dirty."

Photographic negatives collection: 2 million negatives, one of the most heavily used collections. Recent pilot of Islandora: the Waterloo Digital Library. It illustrates the challenges, especially when it comes to the descriptions available to work with. The information provided by photographers was not created with researchers at the other end in mind: a title and the date shot, but no indication of whether or when an image appeared in the paper, or whether it ran in colour or black and white. There is a complete disconnect between the archival description available and what researchers expect to find. When thinking about migrating digitized images, the obvious entry point is to duplicate the file-level record, but that doesn't work: it's confusing, and it suggests staff screwed up when the title doesn't jibe with the contents. How do you describe item-level images, and how do you introduce keywords that are meaningful to staff and end users? Designations of whether an image ran in the paper or not, a brief description, and a clear indication that the information was lifted from the newspaper rather than being original work by the library. Scripting was not helpful in this instance, but batch XML file creation was (sketch below). Archivists are often expected to understand libraries; a history of being undervalued and having their expertise questioned may make archivists cagey. Listen to archivists' asks for help instead of assuming they haven't heard of OCR.
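
Scripting couldn't write the descriptions, but it could stamp out the files. A hypothetical sketch of batch XML creation from a spreadsheet of negative metadata (the schema and column names are invented for illustration, not Waterloo's):

```ruby
require 'csv'
require 'rexml/document'

# One descriptive XML file per digitized negative, generated from a CSV.
CSV.foreach('negatives.csv', headers: true) do |row|
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new('1.0', 'UTF-8')
  record = doc.add_element('record')
  record.add_element('title').text      = row['title']
  record.add_element('dateShot').text   = row['date_shot']
  record.add_element('ranInPaper').text = row['ran_in_paper']
  record.add_element('note').text =
    'Description taken from the published newspaper, not original archival work.'
  File.open("records/#{row['id']}.xml", 'w') { |f| doc.write(f, 2) }
end
```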