[Metadatalibrarians] Code4Lib Journal -- Issue 37 Published (apologies for cross posting)

Sara Amato sara.amato at gmail.com
Wed Jul 19 10:33:46 PDT 2017

Issue 37 of the Code4Lib Journal is now available at:

*Table of Contents*

*Editorial: Welcome New Editors, What We Know About Who We Are, and
Submission Pro Tip!*
Sara Amato
Want to see your work in C4LJ? Here’s a pro tip!

*A Practical Starter Guide on Developing Accessible Websites*
Cynthia Ng and Michael Schofield
There is growing concern about the accessibility of the online content and
services provided by libraries and public institutions. While many articles
cover legislation, general benefits, and common opportunities to improve
web accessibility on the surface (e.g., alt tags), few articles discuss web
accessibility in more depth, and when they do, they are typically not
specific to library web services. This article aims to fill that gap by
providing practical best practices and code.

*Recount: Revisiting the 42nd Canadian Federal Election to Evaluate the
Efficacy of Retroactive Tweet Collection*
Anthony T. Pinter and Ben Goldman
In this paper, we report the development and testing of a methodology for
collecting tweets from periods beyond the Twitter API’s seven-to-nine day
limitation. To accomplish this, we used Twitter's advanced search feature
to find tweets older than that limit, and then used
JavaScript to automatically scan the resulting webpage for tweet IDs. These
IDs were then rehydrated (tweet metadata retrieved) using twarc. To examine
the efficacy of this method for retrospective collection, we revisited the
case study of the 42nd Canadian Federal Election. Using comparisons between
the two datasets, we found that our methodology does not produce as robust
results as real-time streaming, but that it might be useful as a starting
point for researchers or collectors. We close by discussing the
implications of these findings.
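
The scraping step described above (the authors used JavaScript; Python is used here for consistency with the other sketches) can be illustrated with a regular expression that pulls tweet status IDs out of a saved search-results page. The `data-tweet-id` attribute name and the sample markup are assumptions for illustration, not the authors' actual code:

```python
import re

# Tweet status IDs are long digit strings; on Twitter's search results
# pages of that era they appeared in data-tweet-id attributes. Both the
# attribute name and the sample HTML below are illustrative assumptions.
TWEET_ID_RE = re.compile(r'data-tweet-id="(\d+)"')

def extract_tweet_ids(html: str) -> list:
    """Return unique tweet IDs in page order."""
    seen, ids = set(), []
    for tweet_id in TWEET_ID_RE.findall(html):
        if tweet_id not in seen:
            seen.add(tweet_id)
            ids.append(tweet_id)
    return ids

sample = (
    '<div class="tweet" data-tweet-id="654321098765432109"></div>'
    '<div class="tweet" data-tweet-id="654321098765432110"></div>'
    '<div class="tweet" data-tweet-id="654321098765432109"></div>'
)
print(extract_tweet_ids(sample))
# The harvested IDs would then be rehydrated with twarc, e.g.:
#   twarc hydrate ids.txt > tweets.jsonl
```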

*Extending Omeka for a Large-Scale Digital Project*
Haley Antell, Joe Corall, Virginia Dressler, Cara Gilgenbach
In September 2016, the department of Special Collections and Archives, Kent
State University Libraries, received a Digital Dissemination grant from the
National Historical Publications and Records Commission (NHPRC) to digitize
roughly 72,500 pages from the May 4 collection, which documents the May
1970 shootings of thirteen students by Ohio National Guardsmen at Kent
State University. This article will highlight the project team’s efforts to
adapt the Omeka instance with modifications to the interface and ingestion
processes to assist the efforts of presenting unique archival collections
online, including an automated method to create folder level links on the
relevant finding aids upon ingestion; implementing open source Tesseract to
provide OCR to uploaded files; automated PDF creation from the raw image
files using Ghostscript; and integrating Mirador to present a folder level
display to reflect archival organization as it occurs in the physical
collections. These adaptations, which have been shared via GitHub, will be
of interest to other institutions looking to present archival material in
similar ways online.

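The Tesseract-OCR and Ghostscript-PDF steps mentioned in the abstract might be chained roughly as below. The flags shown are standard usage for the two tools, assumed rather than taken from the project's actual scripts; Tesseract writes a searchable PDF per page, and Ghostscript concatenates the pages into one folder-level PDF:

```python
import subprocess
from pathlib import Path

def ocr_and_combine(pages, out_pdf):
    """Build commands for a folder of page images: Tesseract emits a
    searchable PDF per page, then Ghostscript concatenates them into a
    single folder-level PDF. Flags are standard (assumed) usage."""
    cmds, page_pdfs = [], []
    for page in pages:
        base = page.with_suffix("")          # strip .tif
        cmds.append(["tesseract", str(page), str(base), "pdf"])
        page_pdfs.append(f"{base}.pdf")
    cmds.append(["gs", "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite",
                 f"-sOutputFile={out_pdf}"] + page_pdfs)
    return cmds

cmds = ocr_and_combine([Path("f1_p1.tif"), Path("f1_p2.tif")],
                       Path("folder1.pdf"))
for cmd in cmds:
    print(" ".join(cmd))                     # inspect before running
    # subprocess.run(cmd, check=True)        # requires tesseract and gs
```
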
*Annotation-based enrichment of Digital Objects using open-source frameworks*
Marcus Emmanuel Barnes, Natkeeran Ledchumykanthan, Kim Pham, Kirsta Stapelfeldt
The W3C Web Annotation Data Model, Protocol, and Vocabulary unify
approaches to annotations across the web, enabling their aggregation,
discovery, and persistence over time. In addition, new JavaScript libraries
provide the ability for users to annotate multi-format content. In this
paper, we describe how we have leveraged these developments to provide
annotation features alongside Islandora’s existing preservation, access,
and management capabilities. We also discuss our experience developing with
the Web Annotation Model as an open web architecture standard, as well as
our approach to integrating mature external annotation libraries. The
resulting software (the Web Annotation Utility Module for Islandora)
accommodates annotation across multiple formats. This solution can be used
in various digital scholarship contexts.
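
A minimal annotation under the W3C Web Annotation Data Model looks like the sketch below: a textual body targeting a rectangular region of an image via a Media Fragments selector. The object IRIs are placeholders, not real Islandora URLs:

```python
import json

# Minimal W3C Web Annotation: a plain-text comment on a region of an
# image. The id and source IRIs are placeholders for illustration.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/anno/1",
    "type": "Annotation",
    "body": {
        "type": "TextualBody",
        "value": "Handwritten marginal note",
        "format": "text/plain",
    },
    "target": {
        "source": "http://example.org/islandora/object/demo:1/datastream/OBJ",
        "selector": {
            "type": "FragmentSelector",
            "conformsTo": "http://www.w3.org/TR/media-frags/",
            "value": "xywh=120,85,340,200",   # x, y, width, height in pixels
        },
    },
}

print(json.dumps(annotation, indent=2))
```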

*The FachRef-Assistant: Personalised, subject specific, and transparent
stock management*
Eike T. Spielberg, Frank Lützenkirchen
We present in this paper a personalized web application for the weeding of
printed resources: the FachRef-Assistant. It offers an extensive range of
tools for evidence based stock management, based on the thorough analysis
of usage statistics. Special attention is paid to the criteria
individualization, transparency of the parameters used, and generic
functions. Currently, it is designed to work with the Aleph system from
Ex Libris, but efforts were made to keep the application as generic as
possible. For example, all procedures specific to the local library system
have been collected in one Java package. The inclusion of library specific
properties such as collections and systematics has been designed to be
highly generic as well by mapping the individual entries onto an in-memory
database. Hence, simple adaptation of the package and the mappings would
render the FachRef-Assistant compatible with other library systems.

The personalization of the application allows for the inclusion of subject
specific usage properties as well as of variations between different
collections within one subject area. The parameter sets used to analyse the
stock and to prepare weeding and purchase proposal lists are included in
the output XML-files to facilitate a high degree of transparency,
objectivity and reproducibility.
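
The abstract does not give the scoring formula the FachRef-Assistant uses, but evidence-based weeding from usage statistics can be sketched with hypothetical criteria: flag titles with few loans per year of ownership and no loan in the last several years. Every threshold and field below is an assumption for illustration:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Title:
    call_number: str
    loans: int                 # total loans since acquisition
    acquired: date
    last_loan: Optional[date]  # None if never loaned

def weeding_candidates(titles, max_loans_per_year=0.5, min_idle_years=5,
                       today=date(2017, 7, 1)):
    """Hypothetical criteria (not the FachRef-Assistant's actual formula):
    few loans per year of ownership AND idle for at least N years."""
    out = []
    for t in titles:
        years_held = max((today - t.acquired).days / 365.25, 1.0)
        idle_years = ((today - t.last_loan).days / 365.25
                      if t.last_loan else years_held)
        if (t.loans / years_held <= max_loans_per_year
                and idle_years >= min_idle_years):
            out.append(t.call_number)
    return out

stock = [
    Title("QA76.1 A1", loans=1, acquired=date(2000, 1, 1),
          last_loan=date(2003, 5, 2)),
    Title("QA76.2 B2", loans=40, acquired=date(2010, 1, 1),
          last_loan=date(2017, 6, 1)),
]
print(weeding_candidates(stock))   # the long-idle, little-used title
```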

*The Semantics of Metadata: Avalon Media System and the Move to RDF*
Juliet L. Hardesty and Jennifer B. Young
The Avalon Media System (Avalon) provides access and management for digital
audio and video collections in libraries and archives. The open source
project is led by the libraries of Indiana University Bloomington and
Northwestern University and is funded in part by grants from The Andrew W.
Mellon Foundation and Institute of Museum and Library Services.

Avalon is based on the Samvera Community (formerly Hydra Project) software
stack and uses Fedora as the digital repository back end. The Avalon
project team is in the process of migrating digital repositories from
Fedora 3 to Fedora 4 and incorporating metadata statements using the
Resource Description Framework (RDF) instead of XML files accompanying the
digital objects in the repository. The Avalon team has worked on the
migration path for technical metadata and is now working on the migration
paths for structural metadata (PCDM) and descriptive metadata (from MODS
XML to RDF). This paper covers the decisions made to begin using RDF for
software development and offers a window into how Semantic Web technology
functions in the real world.
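
The shape of a MODS-XML-to-RDF migration can be sketched with the standard library alone. The toy record and the element-to-predicate mapping below are simplified assumptions, not Avalon's actual descriptive-metadata mapping:

```python
import xml.etree.ElementTree as ET

MODS_NS = "{http://www.loc.gov/mods/v3}"
DCTERMS = "http://purl.org/dc/terms/"

# A toy MODS record; real Avalon records are far richer.
mods_xml = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Commencement Address, 1970</title></titleInfo>
  <originInfo><dateIssued>1970</dateIssued></originInfo>
</mods>"""

def mods_to_triples(xml_text, subject):
    """Yield (subject, predicate, object) triples for a few MODS fields.
    The mapping table is an illustrative assumption."""
    root = ET.fromstring(xml_text)
    paths = {
        f"{MODS_NS}titleInfo/{MODS_NS}title": DCTERMS + "title",
        f"{MODS_NS}originInfo/{MODS_NS}dateIssued": DCTERMS + "issued",
    }
    for path, predicate in paths.items():
        for el in root.findall(path):
            yield (subject, predicate, el.text)

triples = list(mods_to_triples(mods_xml, "http://example.org/media/1"))
for s, p, o in triples:
    print(f'<{s}> <{p}> "{o}" .')   # N-Triples-style output
```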

*OpeNumisma: A Software Platform Managing Numismatic Collections with A
Particular Focus On Reflectance Transformation Imaging*
Avgoustinos Avgousti, Andriana Nikolaidou, Ropertos Georgiou
This paper describes OpeNumisma, a reusable web-based platform focused on
digital numismatic collections. The platform provides an innovative merge
of digital imaging and data management systems that offer great new
opportunities for research and the dissemination of numismatic knowledge
online. A unique feature of the platform is the application of Reflectance
Transformation Imaging (RTI), a computational photographic method that
offers tremendous possibilities for image analysis and numismatic research.
This computational photography technique allows the user to observe minor
details in the browser, invisible to the naked eye, simply by moving the
computer mouse rather than handling the actual object. The first successful
implementation of OpeNumisma has been the creation of a digital library for
the medieval coins from the collection of the Bank of Cyprus Cultural
Foundation.
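
The relighting behind an RTI viewer can be sketched with the classic Polynomial Texture Map model (Malzbender et al.), which stores six coefficients per pixel and evaluates a biquadratic in the projected light direction. The coefficients below are made up for illustration; real ones are fitted from dozens of photographs taken under known light positions:

```python
def ptm_luminance(coeffs, lu, lv):
    """Evaluate the PTM biquadratic for one pixel:
        L = a0*lu^2 + a1*lv^2 + a2*lu*lv + a3*lu + a4*lv + a5
    (lu, lv) is the light direction projected into the image plane."""
    a0, a1, a2, a3, a4, a5 = coeffs
    return a0*lu*lu + a1*lv*lv + a2*lu*lv + a3*lu + a4*lv + a5

# Fabricated per-pixel coefficients, for illustration only.
coeffs = (-0.2, -0.2, 0.0, 0.3, 0.1, 0.6)

# Moving the virtual light (as a mouse-driven RTI viewer does) changes
# the rendered luminance of the same pixel:
print(ptm_luminance(coeffs, 0.0, 0.0))   # light overhead
print(ptm_luminance(coeffs, 0.8, 0.0))   # light raking from the side
```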

*DuEPublicA: Automated bibliometric reports based on the University
Bibliography and external citation data*
Eike T. Spielberg
This paper describes a web application to generate bibliometric reports
based on the University Bibliography and the Scopus citation database. Our
goal is to offer an alternative to easy-to-prepare automated reports from
commercial sources. These often suffer from an incomplete coverage of
publication types and a difficult attribution to people, institutes and
universities. Using our University Bibliography as the source to select
relevant publications solves both problems. As it is a local system,
maintained and set up by the library, we can include every publication type
we want. As the University Bibliography is linked to the identity
management system of the university, it enables an easy selection of
publications for people, institutes and the whole university.

The program is designed as a web application, which collects publications
from the University Bibliography, enriches them with citation data from
Scopus and performs three kinds of analyses:
1. A general analysis (number and type of publications, publications per
year etc.),
2. A citation analysis (average citations per publication, h-index,
uncitedness), and
3. An affiliation analysis (home and partner institutions).
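
Two of the citation indicators named above have short, standard definitions and can be computed directly from a list of per-publication citation counts (the sample counts are fabricated):

```python
def h_index(citations):
    """Largest h such that at least h publications have >= h citations."""
    h = 0
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h

def uncitedness(citations):
    """Share of publications with zero citations."""
    return sum(1 for c in citations if c == 0) / len(citations)

cites = [12, 7, 5, 3, 1, 0, 0, 0]   # fabricated example counts
print(h_index(cites))               # prints 3
print(uncitedness(cites))           # prints 0.375
```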

We tried to keep the code highly generic, so that the inclusion of other
databases (Web of Science, IEEE) or other bibliographies is easily
feasible. The application is written in Java and XML and uses XSL
transformations and LaTeX to generate bibliometric reports as HTML pages
and in PDF format. Warnings and alerts are automatically included if the
citation analysis covers only a small fraction of the publications from the
University Bibliography. In addition, we describe a small tool that helps
to collect author details for an analysis.

*New Metadata Recipes for Old Cookbooks: Creating and Analyzing a Digital
Collection Using the HathiTrust Research Center Portal*
Gioia Stevens
The Early American Cookbooks digital project is a case study in analyzing
collections as data using HathiTrust and the HathiTrust Research Center
(HTRC) Portal. The purposes of the project are to create a freely
available, searchable collection of full-text early American cookbooks
within the HathiTrust Digital Library, to offer an overview of the scope
and contents of the collection, and to analyze trends and patterns in the
metadata and the full text of the collection. The digital project has two
basic components: a collection of 1450 full-text cookbooks published in the
United States between 1800 and 1920 and a website to present a guide to the
collection and the results of the analysis.

This article will focus on the workflow for analyzing the metadata and the
full-text of the collection. The workflow will cover: 1) creating a
searchable public collection of full-text titles within the HathiTrust
Digital Library and uploading it to the HTRC Portal, 2) analyzing and
visualizing legacy MARC data for the collection using MarcEdit, OpenRefine
and Tableau, and 3) using the text analysis tools in the HTRC Portal to
look for trends and patterns in the full text of the collection.
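
One small piece of step 2, extracting publication years from legacy MARC for visualization, can be sketched with the standard library: the publication year occupies positions 07-10 of the MARC 008 fixed field. The 008 strings below are fabricated, not records from the cookbook collection:

```python
import csv, io
from collections import Counter

# Fabricated MARC 008 fixed fields; positions 07-10 hold Date 1.
records_008 = [
    "861121s1886    nyu           000 0 eng d",
    "900514s1896    mau           000 0 eng d",
    "750325s1886    ilu           000 0 eng d",
]

def pub_year(field_008):
    """Return the four-character Date 1 (positions 07-10) of an 008."""
    return field_008[7:11]

decade_counts = Counter(pub_year(f)[:3] + "0s" for f in records_008)

# Emit a small CSV, the kind of tidy table Tableau or OpenRefine expects.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["decade", "titles"])
for decade, n in sorted(decade_counts.items()):
    writer.writerow([decade, n])
print(buf.getvalue())
```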

*Countering Stryker’s Punch: Algorithmically Filling the Black Hole*
Michael J. Bennett
Two current digital image editing programs are examined in the context of
filling in missing visual image data from hole-punched United States Farm
Security Administration (FSA) negatives. Specifically, Photoshop’s
Content-Aware Fill feature and GIMP’s Resynthesizer plugin are evaluated
and contrasted against comparable images. A possible automated workflow
geared towards large scale editing of similarly hole-punched negatives is
also explored. Finally, potential future research based upon this study’s
results is proposed in the context of leveraging previously-enhanced,
image-level metadata.
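
Content-Aware Fill and Resynthesizer both synthesize plausible texture into the punched region; a deliberately naive stand-in, diffusing each hole pixel from the mean of its known 4-neighbours, shows the basic mask-and-fill structure such a workflow automates (this is a toy, not either tool's algorithm):

```python
def fill_holes(img, mask):
    """Naive diffusion fill: each hole pixel takes the mean of its known
    4-neighbours; passes repeat until no hole remains reachable. A toy
    stand-in for Content-Aware Fill / Resynthesizer."""
    h, w = len(img), len(img[0])
    img = [row[:] for row in img]
    holes = {(y, x) for y in range(h) for x in range(w) if mask[y][x]}
    while holes:
        filled = set()
        for (y, x) in holes:
            vals = [img[ny][nx]
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1))
                    if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in holes]
            if vals:
                img[y][x] = sum(vals) / len(vals)
                filled.add((y, x))
        if not filled:          # isolated region, nothing more to do
            break
        holes -= filled
    return img

# A 3x3 grey patch with a punched-out centre pixel.
grey = [[100, 100, 100],
        [100,   0, 100],
        [100, 100, 100]]
punch = [[False, False, False],
         [False, True,  False],
         [False, False, False]]
print(fill_holes(grey, punch)[1][1])   # prints 100.0
```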
