Monday, October 30, 2017

ASIST 2017 Panel: Standards and Best Practices Related to the Publication, Exchange, and Usage of Open Data


Mark Needleman (co-chair, ASIST Standards Committee)
They intentionally keep a record of NISO votes on the ASIST site and intentionally make it public (ASIST site, under About).

Standards and Toward Best Practices Related to the Publication, Exchange, and Usage of Shareable (Not Necessarily Open) Data
Jane Greenberg

Data Sharing in open environments.

Data sharing advantages: a more complete picture, ROI, more data, more experts, data reuse, better insights into Big Data. Open data efforts: DRYAD, DataONE, DFC (DataNet Federation Consortium), RDA (Research Data Alliance).

Standards for open data and data sharing: editable pages; a rich section 5.1 on metadata standards for open environments.
Not as active: a directory of metadata standards for open data; interdisciplinary, open on GitHub. Came from [?].

Have you: Deposited data in a repository? Deposited sensitive or restricted data into a repository? Drexel repository: the repository statement says they can reproduce and distribute. But what about closed data?

Closed environments: "A Licensing Model and Ecosystem for Data Sharing" (NSF spoke). Intel Collaborative Cancer Cloud (CCC); Collaborative Genomics Cloud (CGC); FICO. Barriers to sharing data with sensitive info: complex regulations; data lifecycle; licensing agreements; technical and systematic aspects of security; rights and privacy for sensitive information; incentives (why would someone go to the effort to share sensitive information?).

Need to take steps: where are the standards, and what should we be doing? There is still merit in sharing, but in these environments there is no sharing without a legal agreement, which involves lawyers and takes time and money. Companies want to share data for universities to do data visualization; six months later, only a partially written agreement. Sometimes the data you get isn't the data you actually wanted, and researchers move on and don't want to be bothered.

A Licensing Model and Ecosystem for Data Sharing: part of the Northeast Big Data Hub, with central hubs, smaller initiatives and working groups, spokes and rings (see the NSF page). They have a spoke (LicModEco), a collaboration between [?]-cell at MIT and [?]. 3 goals: a licensing framework generator, so you don't have to reinvent the wheel, tied to a data sharing platform. Building off of a system called DataHub. Initial proposal: it is more important in software to have standards. Solve 80% of the problem.

Where do standards fit? What metadata standards exist for access and rights and for workflow? Can we borrow from this? Existing metadata and rights standards
- METS (a digital library container format)
- ODRL (Open Digital Rights Language)
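To make the rights-standards discussion concrete, here is a minimal sketch of what an ODRL 2.2 policy looks like in its JSON-LD serialization, built as a plain Python dict. All URIs and party names below are hypothetical placeholders, not identifiers from the talk.

```python
import json

# Sketch of an ODRL 2.2 policy as JSON-LD, built as a plain Python dict.
# All URIs below are hypothetical placeholders.
policy = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Agreement",                      # a concluded data-sharing agreement
    "uid": "http://example.com/policy/7",      # hypothetical policy identifier
    "permission": [{
        "target": "http://example.com/dataset/clinical-sample",  # hypothetical dataset
        "action": "use",
        "assigner": "http://example.com/org/data-provider",
        "assignee": "http://example.com/org/university-lab",
        "constraint": [{
            "leftOperand": "purpose",
            "operator": "eq",
            "rightOperand": "research"         # use restricted to research purposes
        }]
    }]
}

print(json.dumps(policy, indent=2))
```

Machine-readable policies like this are what would let a licensing framework generator emit agreements that systems, not just lawyers, can interpret.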

Connecting with other initiatives - Rights Data Integration Project (RDI), UK Copyright Hub, Research Data Alliance (legal interoperability interest group, RDA/NISO privacy task force)

FAIR (Findable, Accessible, Interoperable, Reusable) talks about [?]

Licensing Model and Ecosystem for Data Sharing at MIT.
Enabling Seamless Data Sharing in Industry and Academia (2017) - collect agreements, build a trusted platform, good metadata.

Representatives from industry, government, and academia agreed to collect data sharing agreements to understand what is accessible and how one might build an underlying ontology for the system. Gathered data sharing agreements where sensitive data is involved and the agreement has worked out. More detailed breakdown of classes from which attributes are defined, getting down to single-term concepts or identifiers (ontologizing). Have done some NLTK work for parsing terms.

Difficult to collect, because folks are not interested in sharing their agreements.
Conclusions and next steps: there are many different efforts in the rights area that are useful, with good work in the open data environment AND in closed/restricted environments. In the rights area, the FAIR principles speak to the broader topic of data sharing. Community building has been crucial. See the Metadata Research Center team.

Standards and Best Practices Related to the Publication, Exchange, and Usage of Open Data: Data on the Web and Image Based Resources
Marcia Zeng

Data on the web, and image-based resources. Publishing and sharing data on the web. The openness and flexibility of the web create new opportunities and challenges. Data publishers and consumers don't necessarily know each other, or what each other are doing. Can we not reinvent the wheel? Importance of following the best practices: it contributes to trustworthiness and reuse via structural metadata, descriptive metadata, access info, data quality info, provenance info, usage info, licensing info.

W3C Recommendation of January 2017: Data on the Web Best Practices. This standard shows you what the best practices are; each has a "why we need this" as well as intended outcomes, a possible approach, and benefits (with benchmarks of the benefits). Benefits to publishers [see slides]. Can we trust this data? Best practices can guide your answer to that.

Web Annotation conveys info about a resource or an association between resources (comments, tags, a blog post about a news article). There is a web annotation framework and a Web Annotation data model.
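As a sketch of what the Web Annotation Data Model looks like in practice, here is a minimal annotation (a plain-text comment targeting a news article) in its JSON-LD form, built as a Python dict. The `id` and `target` URLs are hypothetical.

```python
import json

# Sketch of a W3C Web Annotation: a body (the comment) linked to a target
# (the resource being annotated). URLs are hypothetical.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/anno/1",           # hypothetical annotation id
    "type": "Annotation",
    "body": {
        "type": "TextualBody",
        "value": "A comment about this article",  # the annotation content
        "format": "text/plain"
    },
    "target": "http://example.org/news/article1"  # the annotated resource
}

print(json.dumps(annotation, indent=2))
```

The same body/target split covers tags, highlights, and cross-resource associations, which is what makes the model reusable across applications.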

Other notable W3C recommendations include Webmention (notify any URL when you mention it on your site), Linked Data Notifications, and Subresource Integrity.

IIIF (International Image Interoperability Framework): image-based resources on the web. Books, archives, newspapers, sheet music, maps, architecture, scrolls, STEM imagery, manuscripts. Four APIs: Image, Presentation, [? missed this one], and Content Search. Everyone uses different software, so facilitate distributed access over standard APIs. How can we cite and give credit to the image (e.g., high vs. low image resolution)? IIIF gives a URI for each level (region, size, mirroring, rotation, quality). The IIIF Image API requires a standard format. Not just 2D; 3D objects can also be referenced, from any angle/side, etc. Images can be shared and compared.
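The "URI for each level" idea comes from the IIIF Image API URI pattern, which encodes region, size, rotation (including mirroring), quality, and format directly in the URL path. A small sketch, with a hypothetical server base and image identifier:

```python
# Sketch of the IIIF Image API URI pattern:
#   {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
# The base URL and identifier below are hypothetical. ("max" is the
# Image API 3.0 full-size keyword; 2.x used "full" for size as well.)

def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Assemble an IIIF Image API request URL from its path parameters."""
    return f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

# The whole image at maximum size:
full = iiif_image_url("https://example.org/iiif", "page-001")

# A 200x200 region starting at (100,100), scaled to 50%,
# mirrored and rotated 90 degrees ("!90"):
detail = iiif_image_url("https://example.org/iiif", "page-001",
                        region="100,100,200,200", size="pct:50", rotation="!90")

print(full)
print(detail)
```

Because every view is just a URL, a specific crop or resolution can be cited, credited, and fetched by any client, which is the citation point made above.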

The Presentation API: collection - manifest - sequence - canvas - content. One example shows 16 institutions each sharing their own image, so you can compare features across images provided by different institutions. Allows for transcriptions of text but also annotations. Any existing image server can adopt this API.
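The manifest/sequence/canvas/content hierarchy can be sketched as a nested structure. Below is a minimal, hypothetical IIIF Presentation API 2.x manifest for a one-page book, as a Python dict mirroring the JSON-LD; all URLs and labels are placeholders.

```python
import json

# Sketch of the IIIF Presentation API 2.x hierarchy:
# manifest -> sequence -> canvas -> content (an image painted onto the canvas).
# URLs and labels are hypothetical.
manifest = {
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@id": "https://example.org/iiif/book1/manifest",
    "@type": "sc:Manifest",
    "label": "Example Book",
    "sequences": [{
        "@type": "sc:Sequence",            # the order of the canvases
        "canvases": [{
            "@id": "https://example.org/iiif/book1/canvas/p1",
            "@type": "sc:Canvas",          # an abstract page/surface
            "label": "p. 1",
            "height": 2000,
            "width": 1500,
            "images": [{                   # content painted onto the canvas
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "resource": {
                    "@id": "https://example.org/iiif/page-001/full/max/0/default.jpg",
                    "@type": "dctypes:Image"
                },
                "on": "https://example.org/iiif/book1/canvas/p1"
            }]
        }]
    }]
}

print(json.dumps(manifest, indent=2))
```

Because the content is attached to the canvas as an annotation, transcriptions and scholarly comments can be layered on the same canvas alongside the image itself.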

Learn more at IIIF's YouTube channel; go there to see implementation demos.
All of the major systems now support IIIF. (Question: what about Omeka? Unanswered.)
[Do I need a Linux machine?]

Update on EMR and EHR Standards: Finally Starting to Come Together?
Bob Kasenchak

Landscape for medical data
EMR: a single practice's digital version of the chart.

EHR: theoretically, all of a patient's medical info.

Sharing health info is complicated. It is stored in disparate data structures, not all interoperable; there are legal and privacy concerns; some data has to be anonymized. HIPAA regulations govern which data have to be anonymized before they can be shared and give guidelines on sharing. Different countries have different restrictions and requirements. Classic info management problems: storage, preservation, how to push updates to a centralized record out to all the different users. Plus the problem of the quality of data entered by humans at the source.

Opportunity for standards for Electronic Health Info. Many exist, but HL7 is becoming generally accepted, along with FHIR.

But some other standards: ASTM, ISO, openEHR, SMART.

HL7 (Health Level Seven International), ANSI accredited. Becoming accepted because the AMA is pushing for it, but adoption is slow and uneven. FHIR is a data format, an XML platform. EHR- and EMR-agnostic: it doesn't care what system you use. The goal of FHIR is to help organizations standardize their data and make it available. Supposed to keep patient needs primary by minimizing data issues. Features RESTful APIs. By adopting it, you can provide interoperable medical data over the web regardless of platform and need/use case.
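The RESTful, platform-agnostic style described above boils down to predictable URLs per resource type plus plain JSON (or XML) resources with a declared `resourceType`. A minimal sketch, with a hypothetical server base URL and made-up patient data:

```python
import json

# Sketch of FHIR's RESTful style: a "read" interaction is GET [base]/[type]/[id],
# and resources are self-describing documents. Base URL and data are hypothetical.
BASE = "https://fhir.example.org/r4"

def read_url(resource_type, resource_id):
    """URL for a FHIR read interaction: GET [base]/[type]/[id]."""
    return f"{BASE}/{resource_type}/{resource_id}"

# A minimal Patient resource as it might come back from such a request:
patient = {
    "resourceType": "Patient",
    "id": "123",
    "name": [{"family": "Example", "given": ["Pat"]}],
    "birthDate": "1970-01-01"
}

print(read_url("Patient", "123"))
print(json.dumps(patient, indent=2))
```

Because any conformant server exposes the same interaction pattern and resource shapes, a consuming app does not need to know which EHR or EMR product sits behind the URL.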

How long until "flavors" of FHIR and HL7 appear? XML is extensible by definition, which leads to the SKOS problem: so many flavors and extensions that consuming apps can often no longer ingest SKOS from other organizations without significant pain (scripting, time, money, frustration). Can you use SKOS? Which one? (Atypon, Temis, HighWire, SKOS 1, SKOS-XL)

Data standards life cycle: plan, acquire, process, analyze, preserve, publish, share. What's missing: standards change as they proliferate and are adopted, and extensibility creates a circle. How do extensible standards remain interoperable?
