[Metadatalibrarians] Code4Lib Journal Issue 36 and Call for New Editors

Ruth Kitchin Tillman ruthtillman at gmail.com
Fri Apr 21 07:58:49 PDT 2017

The new issue of the Code4Lib Journal is now available:


The table of contents is below.  As you are reading, also know that we are
looking for editors to join the Code4Lib Journal Editorial Committee.  What
does it mean to join the editorial committee?  Read more about our process
and structure (http://journal.code4lib.org/process-and-structure) and/or
ask one of the current members of the editorial committee (
http://journal.code4lib.org/editorial-committee).  Interested?  Send a
letter to journal at code4lib.org and address these two questions:

  1) What is your vision for the Code4Lib Journal? Why are you interested
in it?

  2) How can you contribute to the Code4Lib Journal, i.e. what do you have
to offer?

In the meantime, enjoy issue 36!

Editorial: Reflecting on the success and risks to the Code4Lib Journal

Peter E. Murray

Linked Data is People: Building a Knowledge Graph to Reshape the Library
Staff Directory <http://journal.code4lib.org/articles/12320>

Jason A. Clark and Scott W. H. Young

One of our greatest library resources is people. Most libraries have staff
directory information published on the web, yet most of this data is
trapped in local silos, PDFs, or unstructured HTML markup. With this in
mind, the library informatics team at Montana State University (MSU)
Library set a goal of remaking our people pages by connecting the local
staff database to the Linked Open Data (LOD) cloud. In pursuing linked data
integration for library staff profiles, we have realized two primary use
cases: improving the search engine optimization (SEO) for people pages and
creating network graph visualizations. In this article, we will focus on
the code to build this library graph model as well as the linked data
workflows and ontology expressions developed to support it. Existing linked
data work has largely centered around machine-actionable data and
improvements for bots or intelligent software agents. Our work demonstrates
that connecting your staff directory to the LOD cloud can reveal
relationships among people in dynamic ways, thereby raising staff
visibility and bringing an increased level of understanding and
collaboration potential for one of our primary assets: the people that make
the library happen.

Recommendations for the application of Schema.org to aggregated Cultural
Heritage metadata to increase relevance and visibility to search engines:
the case of Europeana <http://journal.code4lib.org/articles/12330>

Richard Wallis, Antoine Isaac, Valentine Charles, and Hugo Manguinhas

Europeana provides access to more than 54 million cultural heritage objects
through its portal Europeana Collections. It is crucial for Europeana to be
recognized by search engines as a trusted authoritative repository of
cultural heritage objects. Indeed, even though its portal is the main entry
point, most Europeana users come to it via search engines.

Europeana Collections is fuelled by metadata describing cultural objects,
represented in the Europeana Data Model (EDM). This paper presents the
research and consequent recommendations for publishing Europeana metadata
using the Schema.org vocabulary and best practices. Schema.org metadata is
embedded in HTML to be consumed by search engines to power rich services
(such as Google Knowledge Graph). Schema.org is an open and widely adopted
initiative (used by over 12 million domains) backed by Google, Bing,
Yahoo!, and Yandex, for sharing metadata across the web. It underpins the
emergence of new web techniques, such as so-called Semantic SEO.

Our research addressed the representation of the embedded metadata as part
of the Europeana HTML pages and sitemaps so that the re-use of this data
can be optimized.

The practical objective of our work is to produce a Schema.org
representation of Europeana resources described in EDM that is as rich as
possible and tailored to Europeana’s realities and user needs, as well as
to the search engines and their users.

Autoload: a pipeline for expanding the holdings of an Institutional
Repository enabled by ResourceSync

James Powell, Martin Klein and Herbert Van de Sompel

Providing local access to locally produced content is a primary goal of the
Institutional Repository (IR). Guidelines, requirements, and workflows are
among the ways in which institutions attempt to ensure this content is
deposited and preserved, but some content is always missed. At Los Alamos
National Laboratory, the library implemented a service called LANL Research
Online (LARO), to provide public access to a collection of publicly
shareable LANL researcher publications authored between 2006 and 2016. LARO
exposed the fact that we have full text for only about 10% of eligible
publications for this time period, despite a review and release requirement
that ought to have resulted in a much higher deposition rate. This
discovery motivated a new effort to discover and add more full text content
to LARO. Autoload attempts to locate and harvest items that were not
deposited locally, but for which archivable copies exist. Here we describe
the Autoload pipeline prototype and how it aggregates and utilizes Web
services including Crossref, SHERPA/RoMEO, and oaDOI as it attempts to
retrieve archivable copies of resources. Autoload employs a bootstrapping
mechanism based on the ResourceSync standard, a NISO standard for data
replication and synchronization. We implemented support for ResourceSync
atop the LARO Solr index, which exposes metadata contained in the local IR.
This allowed us to utilize ResourceSync without modifying our IR. We close
with a brief discussion of other uses we envision for our ResourceSync-Solr
implementation, and describe how a new effort called Signposting can
replace cumbersome screen scraping with a robust autodiscovery path to
content which leverages the Web protocol.

Outside The Box: Building a Digital Asset Management Ecosystem for
Preservation and Access <http://journal.code4lib.org/articles/12342>

Andrew Weidner, Sean Watkins, Bethany Scott, Drew Krewer, Anne Washington,
Matthew Richardson

The University of Houston (UH) Libraries made an institutional commitment
in late 2015 to migrate the data for its digitized cultural heritage
collections to open source systems for preservation and access:
Hydra-in-a-Box, Archivematica, and ArchivesSpace. This article describes
the work that the UH Libraries implementation team has completed to date,
including open source tools for streamlining digital curation workflows,
minting and resolving identifiers, and managing SKOS vocabularies. These
systems, workflows, and tools, collectively known as the Bayou City Digital
Asset Management System (BCDAMS), represent a novel effort to solve common
issues in the digital curation lifecycle and may serve as a model for other
institutions seeking to implement flexible and comprehensive systems for
digital preservation and access.

Medici 2: A Scalable Content Management System for Cultural Heritage
Datasets <http://journal.code4lib.org/articles/12317>

Constantinos Sophocleous, Luigi Marini, Ropertos Georgiou, Mohammed
Elfarargy, Kenton McHenry

Digitizing large collections of Cultural Heritage (CH) resources and
providing tools for their management, analysis and visualization is
critical to CH research. A key element in achieving the above goal is to
provide user-friendly software offering an abstract interface for
interaction with a variety of digital content types. To address these
needs, the Medici content management system is being developed in a
collaborative effort between the National Center for Supercomputing
Applications (NCSA) at the University of Illinois at Urbana-Champaign,
Bibliotheca Alexandrina (BA) in Egypt, and the Cyprus Institute (CyI). The
project is pursued in the framework of European Project “Linking Scientific
Computing in Europe and Eastern Mediterranean 2” (LinkSCEEM2) and supported
by work funded through the U.S. National Science Foundation (NSF), the U.S.
National Archives and Records Administration (NARA), the U.S. National
Institutes of Health (NIH), the U.S. National Endowment for the Humanities
(NEH), the U.S. Office of Naval Research (ONR), the U.S. Environmental
Protection Agency (EPA) as well as other private sector efforts.

Medici is a Web 2.0 environment integrating analysis tools for the
auto-curation of un-curated digital data, allowing automatic processing of
input (CH) datasets, and visualization of both data and collections. It
offers a simple user interface for dataset preprocessing, previewing,
automatic metadata extraction, user input of metadata and provenance
support, storage, archiving and management, representation and
reproduction. Building on previous experience (Medici 1), NCSA and CyI are
working towards the improvement of the technical, performance and
functionality aspects of the system. The current version of Medici (Medici
2) is the result of these efforts. It is a scalable, flexible, robust
distributed framework with wide data format support (including 3D models
and Reflectance Transformation Imaging-RTI) and metadata functionality. We
provide an overview of Medici 2’s current features supported by
representative use cases as well as a discussion of future development.

An Interactive Map for Showcasing Repository Impacts

Hui Zhang and Camden Lopez

Digital repository managers rely on usage metrics such as the number of
downloads to demonstrate research visibility and impacts of the
repositories. Increasingly, they find that current tools such as
spreadsheets and charts are ineffective for revealing important elements of
usage, including reader locations, and for attracting the targeted
audiences. This article describes the design and development of a
readership map that provides an interactive, near-real-time visualization
of actual visits to an institutional repository using data from Google
Analytics. The readership map exhibits the global impacts of a repository
by displaying the city of every view or download together with the title of
the scholarship being read and a hyperlink to its page in the repository.
We will discuss project motivation and development issues such as
authentication with the Google API, metadata integration, performance tuning,
and data privacy.
