[Metadatalibrarians] Metadata entry software

Klausner, Kim Kim.Klausner at ucsf.edu
Tue Nov 24 10:13:25 PST 2009


Rich Murray at Duke suggested I ask this group the following question (I looked through nine months of the archives and didn't see the issue covered, but my apologies if it has been):

Is it possible to share web-based, open source software for metadata entry?

Background to my question:
I'm an archivist by training but for the past three years have been managing a digital library (http://legacy.library.ucsf.edu) of 10.4 million tobacco industry documents (55 million pages).  We used customized, proprietary software when we created the library in 2002 with 5 million documents and metadata provided by the tobacco companies.  In 2006 we switched to open-source software partly because the original software couldn't handle the documents which we had OCRed in 2004.  The Legacy Tobacco Documents Library (LTDL) is heavily used by an international tobacco control research community because of its robust search and retrieval software.

In 2005 we created the Drug Industry Document Archive using the software we are now using with LTDL.  While it has only 2,500 documents, there are potentially thousands, if not millions, of documents that may become available as a result of courts unsealing documents produced in litigation against drug companies.  We'd like to add these documents so people can understand the mechanisms by which the pharma industry conducts research, disseminates results, and manufactures, prices and markets its products.  Here is the problem: how will we create metadata for the documents without funds to pay staff or a vendor?  Some documents may come with some metadata from the lawyers but many may not.  At a minimum, we need fields for title, document date, author, document type, and Bates number (a number stamped on each page of a document used in litigation).
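
To make the minimum field list above concrete, here is a rough Python sketch of what a single record might look like. The class name, field names, and the blank-field check are assumptions made for illustration only; they do not describe any existing LTDL or Drug Industry Document Archive software.

```python
from dataclasses import dataclass, asdict
from datetime import date
from typing import List


@dataclass
class DocumentRecord:
    """Hypothetical minimum metadata record; all names are illustrative."""
    title: str
    document_date: date
    author: str
    document_type: str
    bates_number: str  # number stamped on each page of a litigation document


def missing_fields(record: DocumentRecord) -> List[str]:
    """Return the names of text fields that were left blank."""
    return [
        name
        for name, value in asdict(record).items()
        if isinstance(value, str) and not value.strip()
    ]


# Example: a record entered without an author.
record = DocumentRecord(
    title="Sales memo",
    document_date=date(1998, 3, 4),
    author="",
    document_type="memo",
    bates_number="ABC0001234",
)
print(missing_fields(record))
```

A check like this could flag incomplete entries for review, though how strictly to enforce each field would depend on what the source documents actually contain.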

We're thinking about using a crowd-sourcing model: creating a web-based metadata entry module that can be used by classes of university students who are studying the issues covered in the documents.  Obviously, we'd need to build in quality assurance (QA).  I thought that this concept might be of interest to other university-based archives/libraries with large-scale metadata creation needs.  But I'm wondering whether it would be possible for others to integrate any software we create into their collection management systems or, alternatively, whether we could use what someone else has created, if anyone has already done this software development.

Any thoughts, on or off list, would be appreciated.  Sorry if I've gone on a bit long here.

Kim Klausner
Industry Documents Digital Libraries Manager
University of California, San Francisco
530 Parnassus Avenue, Room 115
San Francisco, CA 94143-0840
(415) 514-0507
kim.klausner at ucsf.edu


