John Kunze is an Associate Director at the University of California Curation Center in the California Digital Library. With a background in computer science and mathematics, his current work focuses on dataset curation, "data papers", creating long-term durable information object references using ARK identifiers and the N2T resolver, archiving websites, and promoting lightweight Dublin Core "Kernel" metadata. Parts of his work have been supported by the National Science Foundation, the Gordon and Betty Moore Foundation, and the Library of Congress. Previously, he contributed heavily to the standardization (RFCs, NISO specifications) of URLs, Dublin Core metadata, and the Z39.50 search and retrieval protocol. In an earlier life he created UC Berkeley's first campus-wide information system, which was an early rival and client of the World Wide Web. Before that he was a BSD Unix hacker whose work survives in today's Linux and Apple systems.
The deluge of data artifacts produced by data-intensive research presents a huge and complex problem. Little of this data is shared, re-used, or preserved in the scientific record. Because it is "unpublished" in traditional scholarly terms, neither libraries nor scientists know how to deal with it. At the California Digital Library (CDL) we're nibbling away at several fronts to try to shrink the "data curation" problem to a more manageable size. A survey of efforts on these fronts touches upon data papers, data citation, identifier and repository services, repository federation, and data management planning.