[Metadatalibrarians] Code4Lib Journal Issue 47 (apologies for cross posting)
Péter Király
kirunews at gmail.com
Tue Feb 18 00:30:57 PST 2020
I am happy to announce that Issue 47 of the Code4Lib Journal is now available.
https://journal.code4lib.org/
Table of contents:
Scraping BePress: Downloading Dissertations for Preservation
Stephen Zweibel
https://journal.code4lib.org/articles/15016
This article describes our process of developing a script to automate
the downloading of documents and secondary materials from our library’s
BePress repository. Our objective was to collect the full archive of
dissertations and associated files from our repository onto a local
disk, both for potential future applications and to build out a
preservation system.
Unlike at some institutions, our students submit directly into
BePress, so we did not have a separate repository of the files; and
the backup of BePress content that we had access to was not in an
ideal format (for example, it included “withdrawn” items and did not
effectively isolate electronic theses and dissertations). Perhaps more
importantly, the fact that BePress was not SWORD-enabled and lacked a
robust API or batch export option meant that we needed to develop a
data-scraping approach that would allow us to both extract files and
have metadata fields populated. Using a CSV of all of our records
provided by BePress, we wrote a script to loop through those records
and download their documents, placing them in directories according to
a local schema. We dealt with over 3,000 records and about three times
that many items, and now have an established process for retrieving
our files from BePress. Details of our experience and code are
included.
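As a rough illustration of the CSV-driven loop described in this abstract (not the authors' script), the sketch below maps each record to a target path without fetching anything. The column names (record_id, publication_date, fulltext_url), the example.org URLs, and the year/record directory layout are all assumptions made for the example.

```python
import csv
import io
from pathlib import Path


def plan_downloads(csv_text, root="etd_archive"):
    """Map each CSV record to (url, target path) without fetching anything."""
    plan = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Hypothetical local schema: <root>/<year>/<record id>/<file name>
        year = row["publication_date"][:4]
        filename = row["fulltext_url"].rstrip("/").split("/")[-1]
        target = Path(root) / year / row["record_id"] / filename
        plan.append((row["fulltext_url"], target))
    return plan


# Invented sample rows; a real export would have many more columns.
sample = (
    "record_id,publication_date,fulltext_url\n"
    "1001,2019-05-01,https://example.org/etd/1001/thesis.pdf\n"
    "1002,2018-12-15,https://example.org/etd/1002/dissertation.pdf\n"
)

for url, target in plan_downloads(sample):
    print(url, "->", target.as_posix())
```

Separating the planning step from the actual download makes it possible to review the resulting directory layout before any files are requested.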
Persistent identifiers for heritage objects
Lukas Koster
https://journal.code4lib.org/articles/14978
Persistent identifiers (PIDs) are essential for getting access and
referring to library, archive and museum (LAM) collection objects in a
sustainable and unambiguous way, both internally and externally.
Heritage institutions need a universal policy for the use of PIDs in
order to have an efficient digital infrastructure at their disposal
and to achieve optimal interoperability, leading to open data, open
collections and efficient resource management.
Here the discussion is limited to PIDs that institutions can assign
to objects they own or administer themselves. PIDs for people,
subjects etc. can be used by heritage institutions, but are generally
managed by other parties.
The first part of this article consists of a general theoretical
description of persistent identifiers. First of all, I discuss the
questions of what persistent identifiers are and what they are not,
and what is needed to administer and use them. The most commonly used
existing PID systems are briefly characterized. Then I discuss the
types of objects PIDs can be assigned to. This section concludes with
an overview of the requirements that apply if PIDs are also to be used
for linked data.
The second part examines current infrastructural practices, and
existing PID systems with their advantages and shortcomings. Based on
these practical issues and the pros and cons of existing PID systems, a
list of requirements for PID systems is presented, which is then used to
address a number of practical considerations. This section concludes
with a number of recommendations.
Dimensions & VOSViewer Bibliometrics in the Reference Interview
Brett Williams
https://journal.code4lib.org/articles/14964
The VOSviewer software provides easy access to bibliometric mapping
using data from Dimensions, Scopus and Web of Science. The properly
formatted and structured citation data, and the ease with which it can
be exported, open up new avenues for use during citation searches and
reference interviews. This paper details specific techniques for using
advanced searches in Dimensions, exporting the citation data, and
drawing insights from the maps produced in VOSviewer. These search
techniques and data export practices are fast and accurate enough to
build into reference interviews for graduate students, faculty, and
post-PhD researchers. The search results derived from them are
accurate and allow a more comprehensive view of citation networks
embedded in ordinary complex Boolean searches.
Automating Authority Control Processes
Stacey Wolf
https://journal.code4lib.org/articles/15014
Authority control is an important part of cataloging since it helps
provide consistent access to names, titles, subjects, and genre/forms.
There are a variety of methods for providing authority control,
ranging from manual, time-consuming processes to automated processes.
However, the automated processes often seem out of reach for small
libraries when it comes to using a pricey vendor or expert cataloger.
This paper introduces ideas on how to handle authority control using a
variety of tools, both paid and free. The author describes how their
library handles authority control; compares vendors and programs that
can be used to provide varying levels of authority control; and
demonstrates authority control using MarcEdit.
Managing Electronic Resources Without Buying into the Library Vendor Singularity
James Fournie
https://journal.code4lib.org/articles/14955
Over the past decade, the library automation market has faced
continuing consolidation. Many vendors in this space have pushed
towards monolithic and expensive Library Services Platforms. Other
vendors have taken “walled garden” approaches which force vendor
lock-in due to lack of interoperability. For these reasons and others,
many libraries have turned to open-source Integrated Library Systems
(ILSes) such as Koha and Evergreen. These systems offer more
flexibility and interoperability options, but tend to be developed
with a focus on public libraries and legacy print resource
functionality. They lack tools important to academic libraries such as
knowledge bases, link resolvers, and electronic resource management
systems (ERMs). Several open-source ERM options exist, including CORAL
and FOLIO. This article analyzes the current state of these and other
options for libraries considering supplementing their open-source ILS
either alone, hosted or in a consortial environment.
Shiny Fabric: A Lightweight, Open-source Tool for Visualizing and
Reporting Library Relationships
Atalay Kutlay, Cal Murgu
https://journal.code4lib.org/articles/14938
This article details the development and functionalities of an
open-source application called Fabric. Fabric is a simple-to-use
application that renders library data in the form of network graphs
(sociograms). Fabric is built in R using the Shiny package and is
meant to offer an easy-to-use alternative to other software, such as
Gephi and UCInet. In addition to being user friendly, Fabric can run
locally as well as on a hosted server. This article discusses the
development process and functionality of Fabric, use cases at the New
College of Florida’s Jane Bancroft Cook Library, as well as plans for
future development.
Analyzing and Normalizing Type Metadata for a Large Aggregated Digital Library
Joshua D. Lynch, Jessica Gibson, and Myung-Ja Han
https://journal.code4lib.org/articles/14995
The Illinois Digital Heritage Hub (IDHH) gathers and enhances metadata
from contributing institutions around the state of Illinois and
provides this metadata to the Digital Public Library of America (DPLA)
for greater access. The IDHH helps contributors shape their metadata
to the standards recommended and required by the DPLA in part by
analyzing and enhancing aggregated metadata. In late 2018, the IDHH
undertook a project to address a particularly problematic field, Type
metadata. This paper walks through the project, detailing the process
of gathering and analyzing metadata using the DPLA API and OpenRefine,
data remediation through XSL transformations in conjunction with local
improvements by contributing institutions, and the DPLA ingestion
system’s quality controls.
Scaling IIIF Image Tiling in the Cloud
Yinlin Chen, Soumik Ghosh, Tingting Jiang, James Tuttle
https://journal.code4lib.org/articles/14933
The International Archive of Women in Architecture, established at
Virginia Tech in 1985, collects books, biographical information, and
published materials from nearly 40 countries that are divided into
around 450 collections. In order to provide public access to these
collections, we built an application using the IIIF APIs to
pre-generate image tiles and manifests which are statically served in
the AWS cloud. We established an automatic image processing pipeline
using a suite of AWS services to implement microservices in Lambda and
Docker. By doing so, we reduced the processing time for terabytes of
images from weeks to days.
In this article, we describe our serverless architecture design and
implementations, elaborate on the technical solution for integrating
multiple AWS services with other techniques into the application, and
describe our streamlined and scalable approach to handling extremely
large image datasets. Finally, we show the significantly improved
performance compared to traditional processing architectures, along
with a cost evaluation.
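To give a sense of what pre-generating image tiles involves, here is a minimal sketch (not the authors' pipeline) that enumerates tile regions in the style of the IIIF Image API region and size parameters. The function name, the 512-pixel tile size, and the scale factors are assumptions chosen for the example.

```python
import math


def iiif_tile_regions(width, height, tile=512, scale_factors=(1, 2)):
    """List (scale, x, y, w, h, output_width) for every tile of an image."""
    regions = []
    for s in scale_factors:
        step = tile * s  # area of the full-size image covered by one tile
        for y in range(0, height, step):
            for x in range(0, width, step):
                # Clip tiles on the right and bottom edges to the image bounds
                w = min(step, width - x)
                h = min(step, height - y)
                regions.append((s, x, y, w, h, math.ceil(w / s)))
    return regions


# Invented dimensions: a 1024 x 768 image, 512-pixel tiles, two zoom levels.
for s, x, y, w, h, out_w in iiif_tile_regions(1024, 768):
    # Region and size in the form used in IIIF Image API request paths
    print(f"{x},{y},{w},{h}/{out_w},/0/default.jpg")
```

Because the tile count grows with both image size and the number of zoom levels, this kind of enumeration is the part of the work that lends itself to being spread across many parallel workers.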
Where Do We Go From Here: A Review of Technology Solutions for
Providing Access to Digital Collections
Kelli Babcock, Sunny Lee, Jana Rajakumar, Andy Wagner
https://journal.code4lib.org/articles/15000
The University of Toronto Libraries is currently reviewing technology
to support its Collections U of T service. Collections U of T provides
search and browse access to 375 digital collections (and over 203,000
digital objects) at the University of Toronto Libraries. Digital
objects typically include special collections material from the
university as well as faculty digital collections, all with unique
metadata requirements. The service is currently supported by
IIIF-enabled Islandora, with one Fedora back end and multiple Drupal
sites per parent collection (see attached image). Like many
institutions making use of Islandora, UTL is now confronted with
Drupal 7 end of life and has begun to investigate a migration path
forward. This article will summarise the Collections U of T functional
requirements and lessons learned from our current technology stack. It
will go on to outline our research to date for alternate solutions.
The article will review both emerging micro-service solutions, as well
as out-of-the-box platforms, to provide an overview of the digital
collection technology landscape in 2019. Note that our research is
focused on reviewing technology solutions for providing access to
digital collections, as preservation services are offered through
other services at the University of Toronto Libraries.
Péter Király
Coordinating Editor