[Metadatalibrarians] Tool to validate batches of XML documents?
Clay Redding
clay at monarchos.com
Thu Nov 6 09:43:08 PST 2008
Hi Chris,
I tend to just use a shell script. The one below is for Unix, Linux, or
Mac OS X, and uses bash and xmllint from the libxml2 package. If you
use Windows, you could do the same thing with a batch file, perhaps
with something like Saxon doing the validation of each file (schema
validation needs a schema-aware edition of Saxon). The script could
also be customized for MarkLogic: for example, if a file validates,
post it over WebDAV to the MarkLogic server for ingest.
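A minimal sketch of that MarkLogic variation, assuming bash, xmllint and curl are installed and that the server has a WebDAV app server that accepts HTTP PUT. The server URL and credentials are hypothetical placeholders, upload_if_valid is a hypothetical helper name, and the authentication scheme is left to curl's --anyauth negotiation rather than assumed:

```bash
#!/bin/bash
# Sketch: validate one MARCXML file and upload it only if it validates.
# HYPOTHETICAL values: WEBDAV_URL, WEBDAV_USER (adjust to your server).
SCHEMA="${SCHEMA:-http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd}"
WEBDAV_URL="${WEBDAV_URL:-http://marklogic.example.org:8005/marcxml}"  # hypothetical
WEBDAV_USER="${WEBDAV_USER:-user:password}"                            # hypothetical

# upload_if_valid FILE
#   If FILE validates against $SCHEMA, PUT it to $WEBDAV_URL/<basename>.
#   Otherwise report it on stderr and return 1 without uploading.
upload_if_valid() {
    local f=$1
    # xmllint exits non-zero on parse errors and on schema validation errors;
    # its messages (including "validates") are discarded here.
    if xmllint --noout --schema "$SCHEMA" "$f" 2>/dev/null; then
        curl --fail --silent --show-error --anyauth --user "$WEBDAV_USER" \
             --upload-file "$f" "$WEBDAV_URL/$(basename "$f")"
    else
        echo "INVALID: $f" >&2
        return 1
    fi
}
```

Usage: `for f in ./marcxml/*.marc.xml; do upload_if_valid "$f"; done`. Invalid files are named on stderr and never reach the server.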
###########
#!/bin/bash
# Change to the directory that contains your MARCXML files
cd ./marcxml || exit 1
# Loop over all MARCXML files -- the wildcard matches files whose names
# end in .marc.xml
for marc in *.marc.xml
do
    # Validate each MARCXML file against the schema. xmllint prints
    # "<file> validates" for a good file and the parse or validation
    # errors for a bad one, all on stderr.
    xmllint --noout --schema http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd "$marc"
done
###########
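Since the point is to find out which records failed, a variant that prints only the names of files that are not well-formed or don't validate may be handier. This is a sketch: list_invalid is a hypothetical helper name, and it relies on xmllint exiting with a non-zero status when a file fails to parse or validate:

```bash
#!/bin/bash
# list_invalid DIR SCHEMA
#   Print the path of every *.marc.xml file in DIR that is not well-formed
#   or does not validate against SCHEMA. Returns 1 if any file failed,
#   0 otherwise (including when DIR has no matching files).
list_invalid() {
    local dir=$1 schema=$2 f status=0
    for f in "$dir"/*.marc.xml; do
        [ -e "$f" ] || continue    # glob matched nothing: skip the literal pattern
        # Discard xmllint's messages; only the exit status matters here.
        if ! xmllint --noout --schema "$schema" "$f" 2>/dev/null; then
            echo "$f"
            status=1
        fi
    done
    return "$status"
}
```

Usage: `list_invalid ./marcxml http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd > invalid-records.txt`. Because the error details are discarded, rerun xmllint by hand on a listed file to see why it failed. Passing a local copy of MARC21slim.xsd also avoids fetching the schema once per file.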
Good luck,
Clay
On Nov 6, 2008, at 12:22 PM, Schwartz, Christine wrote:
> Is anyone aware of an XML editor or tool that can identify invalid XML
> documents in batches?
>
> I receive batches of MARCXML records from our outsourcers and they
> occasionally use invalid characters by mistake. Using a screen shot of
> the files, I painstakingly go through the batches to identify by hand
> which MARCXML records did not go into our Mark Logic development
> server.
>
> Thanks,
>
> Chris
>
> Christine Schwartz
> Metadata Librarian
> Princeton Theological Seminary Libraries
> christine.schwartz at ptsem.edu
> (609) 497-7938
>
> _______________________________________________
> Metadatalibrarians mailing list
> Metadatalibrarians at lists.monarchos.com
> http://lists.monarchos.com/listinfo.cgi/metadatalibrarians-monarchos.com