ALA TechSource

Annual 2009: Catalogers Look to the Future

Submitted by Shirley Lincicum on July 31, 2009 - 10:58am

ALA Publishing staff working on RDA: Resource Description and Access are watching for library innovations that build on bibliographic records. Shirley Lincicum has offered some fantastic coverage of a technology that had catalogers excited at this year's Annual Conference.

Shirley has been a professional cataloger in academic libraries for 15 years. She is Librarian/Associate Professor at Western Oregon University in Monmouth, Oregon, and is currently on sabbatical, researching and writing about next-generation cataloging systems and institutional repositories. You can read about her findings at http://shirley.alptown.com/blog/. Shirley received her Library and Information Science degree from the University of Illinois Graduate School of Library and Information Science. To contact her, visit her page at http://www.wou.edu/~lincics/.

-Dan Freeman

Resource Description and Access (RDA) is scheduled for release in November 2009 and was the hot topic of discussion for catalogers attending the 2009 ALA Annual Conference. One session at Annual also discussed RDA, but it did so within a broader context, which made it a conference highlight. Bringing together four recognized leaders to discuss the emergence of linked data on the Web and the role the library community can play in realizing the Semantic Web, the event drew a standing-room-only crowd and offered a compelling glimpse of what is likely to be the future of cataloging.

Eric Miller, President and co-founder of Zepheira, opened things up with an overview of the current state of linked data development. Miller defined linked data as "a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs." He emphasized sharing and connecting data as the key elements of this concept. According to Miller, thousands of organizations and individuals are currently participating in creating linked data, and the amount of linked data available has increased tremendously over the past six months.

Miller listed several principles that govern linked data:

  • URIs represent “things”: people, places, concepts, departments

  • Using HTTP-compliant URIs makes data more accessible

  • When serving URIs, deliver useful, reusable information

  • Leverage standards (RDF, SKOS, etc.)

  • Add context. It's all about connecting and creating meaningful relationships between data.
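Miller's principles can be made concrete with a small sketch. It assumes hypothetical `example.org` HTTP URIs for a book and a person, borrows real properties from Dublin Core terms and FOAF (leveraging standards), links the book to its creator by URI rather than by a text string (adding context), and serializes the result as N-Triples, one of the simple RDF formats a server could return when someone looks up one of those URIs.

```python
# Minimal sketch: statements about two "things", each identified by an HTTP URI,
# serialized as N-Triples. The example.org URIs are hypothetical placeholders;
# the Dublin Core terms and FOAF property URIs are the published vocabulary terms.

DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"


def iri(value: str) -> str:
    """Render a URI reference in N-Triples syntax: <...>."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Render a plain string literal, escaping characters N-Triples reserves."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


book = "http://example.org/id/book/1"       # hypothetical URI for a book
author = "http://example.org/id/person/42"  # hypothetical URI for a person

triples = [
    (iri(book), iri(DCTERMS + "title"), literal("An Example Title")),
    (iri(book), iri(DCTERMS + "creator"), iri(author)),  # link to another "thing"
    (iri(author), iri(FOAF + "name"), literal("A. Person")),
]

print(to_ntriples(triples))
```

Because the creator is a URI rather than a name string, anyone else who publishes data about `http://example.org/id/person/42` is automatically talking about the same person.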

He argued that the Web itself is becoming the basic architecture for building applications. Linked data applications don't run on the Web; they are applications of the Web. Users increasingly want their data back, and they want it back their way. With linked data, users are no longer limited to searching based on relationships that have been pre-defined by application developers, database designers, or librarians; users can create and search based on relationships that are meaningful to them. Miller's company, Zepheira, is currently working with the Library of Congress to create Recollection, a new platform intended to provide more useful tools and processes for sharing diverse content across the myriad collections covered by the LC Digital Preservation Program. Recollection will let users create new views of existing data, combine data sets in customizable ways, and build communities around the data, collaborating to curate and connect collections. Zepheira has also launched Freemix, a new social networking application designed to allow users to mix and share data.
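Miller's point that users can combine data sets and follow relationships of their own choosing can be illustrated with a toy example. This is a hypothetical sketch, not Recollection or Freemix code: compact names such as `ex:photo/7` stand in for full HTTP URIs, the two small data sets are invented, and they merge by simple set union because both use the same URI, `ex:person/42`. A user-chosen chain of predicates is then followed across both.

```python
# Hypothetical sketch: two independently published data sets that use the same
# URI for a person can be merged by set union, and a user can then follow
# whichever chain of relationships matters to them.
# Compact names like "ex:photo/7" stand in for full HTTP URIs.

collection_a = {  # e.g. a library's digital collection (invented data)
    ("ex:photo/7", "dcterms:creator", "ex:person/42"),
    ("ex:photo/7", "dcterms:title", "Harbor at dusk"),
}
collection_b = {  # e.g. a biographical data set published elsewhere (invented data)
    ("ex:person/42", "ex:bornIn", "ex:place/9"),
    ("ex:person/42", "foaf:name", "A. Person"),
}

# Shared URIs join the two graphs; no schema negotiation is needed.
merged = collection_a | collection_b


def objects(graph, subject, predicate):
    """Return every object linked to `subject` via `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def follow(graph, start, path):
    """Follow a user-chosen chain of predicates, starting from `start`."""
    nodes = {start}
    for predicate in path:
        nodes = {o for n in nodes for o in objects(graph, n, predicate)}
    return nodes


# "Where was the creator of this photo born?" Neither data set answers this alone.
print(follow(merged, "ex:photo/7", ["dcterms:creator", "ex:bornIn"]))
```

The relationship path is supplied at query time by the user rather than fixed in advance by a database designer, which is the shift Miller described.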

In closing, Miller noted that in the linked data environment, credibility is more important than ever before. Libraries are trusted institutions with a wealth of experience in organizing and managing information resources. The library community needs to position itself to leverage this reputation and take a larger role in the development of linked data applications. Linked data has arrived, and the library community cannot afford to be left behind.

Diane Hillmann's presentation addressed this question: Are Libraries Ready for Linked Data? Her answer was a resounding yes! Linked data is all about relationships. Libraries have been concerned with expressing relationships between information objects for a very long time, and we now understand that we must use machine-based methods if we want to do a really good job. Traditional cataloging provides attribute = value pairs, for example Title = [value] or Author = [value]. These attributes are embedded within a record that has an identifier. Because they don't have independent identifiers, attributes cannot be referenced outside the context of a record.

Linked data is based on a model of triples consisting of subject, predicate, and object, which permits the assignment of identifiers at the attribute level. Identifiers can also be assigned to relationships between attributes. Hillmann is currently involved in building a registry that maintains and serves relationship identifiers: http://metadataregistry.org/. Though the RDA Appendix is overlooked by many, she finds it extremely valuable because it defines more than 200 attribute relationships. A vocabulary based on RDA should be completely registered within a few weeks, and it will be freely accessible to support linked data applications implemented by libraries and others. The registry, combined with applications and tools such as those being developed in conjunction with the eXtensible Catalog project, constitutes essential infrastructure for the library community to become more actively engaged in both using and creating linked data.
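The difference between the two models Hillmann contrasted can be sketched in a few lines. In this hypothetical example a record is a Python dict of attribute = value pairs, with an identifier only for the record as a whole; converting it to triples gives the described resource its own URI and gives each attribute a property URI of its own, so any single statement can be referenced outside the record. The `example.org` URIs stand in for registered element identifiers of the kind metadataregistry.org serves; they are not real RDA element URIs.

```python
# Hypothetical sketch contrasting record-based and triple-based description.

# Record model: attribute = value pairs that only have meaning inside the record.
record = {
    "id": "rec-001",  # only the record as a whole has an identifier
    "Title": "An Example Title",
    "Author": "Person, A.",
}

# Hypothetical property URIs standing in for registered element identifiers.
PROPERTY_URIS = {
    "Title": "http://example.org/elements/title",
    "Author": "http://example.org/elements/creator",
}


def record_to_triples(rec, base="http://example.org/resource/"):
    """Turn each attribute = value pair into a (subject, predicate, object) triple."""
    subject = base + rec["id"]
    return [
        (subject, PROPERTY_URIS[attr], value)
        for attr, value in rec.items()
        if attr != "id"
    ]


for triple in record_to_triples(record):
    print(triple)
```

In the record, "Title" is just a label the local system understands; in the triples, `http://example.org/elements/title` is an identifier that other applications can look up and reuse.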

Jennifer Bowen provided an overview of the eXtensible Catalog (XC) project and described how XC supports linked data. One of the primary goals of the project is to build open source software that supports reuse of MARC-encoded library metadata in an extensible environment. Though it has added to the cost of development, XC has been designed specifically to support linked data. XC metadata is based on the FRBR model, and it supports a level of granularity similar to MARC. One of the challenges in developing XC was that the developers could not wait for RDA to be formally released, but because the software has been developed to support the same models and principles that form the basis for RDA, it should be relatively easy to implement full support for RDA when the time finally comes. XC also facilitates metadata harvesting via OAI-PMH and transformation of Dublin Core (DC) metadata. The XC application profile is being developed in accordance with the guidelines for DC application profiles, though it does not mandate the inclusion of DC Metadata Initiative (DCMI) terms. XC requires that terms be defined in RDF, and it is designed to utilize metadataregistry.org. XC incorporates terms from several namespaces and defines 37 custom elements in its own namespace. Some of the custom elements mirror elements defined in other metadata schemes that are not yet registered, such as RDA and MARC. One of XC's biggest strengths is that it enables experimentation. It provides Web-based tools that support harvesting, troubleshooting, transformation, and enhancement of metadata outside the context of existing legacy systems. Librarians can explore new approaches to managing metadata with no danger of permanently corrupting or destroying data stored in legacy systems.
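The OAI-PMH harvesting step that XC supports can be sketched as follows. This is not XC's code; it only illustrates the protocol, assuming a hypothetical repository at `http://repository.example.org/oai`. The `ListRecords` verb, the `metadataPrefix=oai_dc` parameter, and the namespace URIs come from the OAI-PMH 2.0 specification and Simple Dublin Core; the sample response is invented and trimmed (it omits the required `responseDate` and `request` elements) so the parsing is easy to follow.

```python
# Minimal sketch of OAI-PMH harvesting: build a ListRecords request URL and
# extract Dublin Core titles from a response. Repository URL and sample data
# are hypothetical; verb, parameter, and namespaces follow OAI-PMH 2.0.
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL for an OAI-PMH ListRecords request."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def dc_titles(response_xml):
    """Map each harvested record's OAI identifier to its dc:title values."""
    root = ET.fromstring(response_xml)
    result = {}
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [t.text for t in rec.iterfind("oai:metadata/oai_dc:dc/dc:title", NS)]
        result[identifier] = titles
    return result


# Invented, trimmed response used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://repository.example.org/oai"))
print(dc_titles(SAMPLE))
```

A real harvester would also send the request (for example with `urllib.request.urlopen`) and keep issuing requests with the `resumptionToken` returned in each response until the list is complete; this sketch leaves both out.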

Bowen summarized the next steps for the XC project:

  • Finalize schema and registry of elements

  • Publish application profile

  • Identify and define metadata elements for user-generated metadata

  • Enable schema data to be harvested as RDF

Rebecca Guenther described efforts currently underway at the Library of Congress to make controlled vocabularies available as linked data. The Library of Congress Vocabularies Service was created to facilitate the development and maintenance of LC's vocabularies and to make them freely available to both libraries and the broader Web community. The service provides comprehensive information about the vocabularies in addition to exposing the vocabularies themselves as linked data. Most vocabularies will be represented using the Simple Knowledge Organization System (SKOS), an RDF application that was recently finalized by the W3C. Currently, LCSH is the only vocabulary available, but others will be offered in the future, including LC Name authorities. The service also offers bulk download of data in RDF format. Now that the service is officially up and running, LC plans to advocate for its use and solicit user feedback more actively. Also still to come: a mechanism for updating data as changes are made in the underlying vocabularies, and an OWL schema for LCSH that will provide greater granularity and a means of expressing facets, since SKOS lacks this capability.
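To show what "representing a vocabulary using SKOS" means in practice, here is a small sketch. The `skos:` terms (`Concept`, `prefLabel`, `altLabel`, `broader`) are the real W3C SKOS vocabulary; the concept URIs and headings are illustrative placeholders, not actual LCSH data or id.loc.gov identifiers. Each heading becomes a concept with its own URI, variant forms become alternate labels, and broader-term references become links between concept URIs.

```python
# Minimal sketch of subject headings as SKOS concepts. The skos: and rdf:type
# URIs are real; the concept URIs and labels are illustrative, not LCSH data.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def concept(uri, pref_label, alt_labels=(), broader=()):
    """Return the (subject, predicate, object) triples describing one skos:Concept."""
    triples = [(uri, RDF_TYPE, SKOS + "Concept"),
               (uri, SKOS + "prefLabel", pref_label)]
    triples += [(uri, SKOS + "altLabel", label) for label in alt_labels]  # variant forms
    triples += [(uri, SKOS + "broader", b) for b in broader]  # links to other concepts
    return triples


graph = (
    concept("http://example.org/subjects/1", "Cataloging",
            alt_labels=["Cataloguing"],
            broader=["http://example.org/subjects/2"])
    + concept("http://example.org/subjects/2", "Library science")
)


def pref_labels(graph, uri):
    """Return the preferred label(s) of a concept."""
    return [o for s, p, o in graph if s == uri and p == SKOS + "prefLabel"]


def broader_labels(graph, uri):
    """Follow skos:broader links one step and return the broader concepts' labels."""
    targets = [o for s, p, o in graph if s == uri and p == SKOS + "broader"]
    return [label for t in targets for label in pref_labels(graph, t)]


print(broader_labels(graph, "http://example.org/subjects/1"))
```

Because the broader term is referenced by URI rather than by its text string, a change to a heading's label does not break the links that point to it.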

Linked data is a promising new platform for creating and sharing exactly the type of metadata that catalogers have been creating and managing for decades. Services like metadataregistry.org and the LC Vocabularies Service, as well as the tools developed through the XC project, are critical pieces of infrastructure that offer librarians the opportunity to leverage their expertise and share it with the broader Web community. After years of uncertainty and hype, catalogers now have real support to transform the way they work and make ever more valuable contributions to the evolving information space on the Web. Are catalogers ready to face the challenges associated with mastering linked data technology? I know I am.

UPDATE: Slides for all presentations are now available here: http://wikis.ala.org/annual2009/index.php/Grassroots_Programs