PURL rearchitecture underway

Fifteen years ago, I was involved in a series of discussions at the IETF regarding technical standards for identifying resources in decentralized environments. Acronyms such as “URI”, “URN”, “URL”, “URC” (and occasionally the phrase “you are kidding?!”) were constantly thrown around, with much heated debate regarding the requirements, protocols, semantics and capabilities of each of these technologies. URLs (Uniform Resource Locators … the global identifiers that start with ‘http:’, ‘mailto:’, ‘ftp:’, etc.) were becoming increasingly prevalent as people found the immediate feedback of merging “clickability” with global “addressability” attractive. The other standards, however, were not so lucky.

At that time, the library community was starting to focus on issues related to cataloging and managing Web resources. Relaxing link integrity (404 file not found) and making it easier for people to create these URLs was one of the reasons the Web had succeeded where other hyperlinking systems had failed. This relaxation, however, caused problems for effectively cataloging, managing and relating resources. Spending time, effort and money to catalog a resource, only to find that it was no longer available because it had been moved, was a serious issue for individuals in the library community focused on describing and providing effective access to relevant digital resources.

A solution to this problem was developed by OCLC: PURLs, or Persistent URLs. PURLs provide a level of indirection that allows the underlying Web addresses of resources to change over time without negatively affecting systems that depend on them. Persistence is not a technological issue as much as one of social and organizational commitment. The PURL software provided the simple technological solution, and was made available for others to use, but it was OCLC running the purl.org service that provided the organizational commitment that helped make it possible for others to create and share persistent identifiers.
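The level of indirection described above can be illustrated with a small sketch. This is not the PURL software itself, just a toy in-memory model: the `PurlRegistry` class, its method names, the example paths and URLs, and the choice of a 302 status code are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PurlRegistry:
    """Toy in-memory table mapping persistent paths to current target URLs.

    Hypothetical illustration only; not the actual PURL server design.
    """

    targets: dict = field(default_factory=dict)

    def register(self, purl_path: str, target_url: str) -> None:
        """Create a persistent path that points at the resource's current address."""
        self.targets[purl_path] = target_url

    def update(self, purl_path: str, new_target_url: str) -> None:
        """Re-point an existing persistent path after the resource has moved."""
        if purl_path not in self.targets:
            raise KeyError(f"unknown PURL path: {purl_path}")
        self.targets[purl_path] = new_target_url

    def resolve(self, purl_path: str) -> tuple:
        """Return (status, headers) the way a redirecting resolver might.

        The 302 status is an assumption chosen for this sketch.
        """
        target = self.targets.get(purl_path)
        if target is None:
            return 404, {}
        return 302, {"Location": target}


# Usage: the persistent path stays the same while the target changes.
registry = PurlRegistry()
registry.register("/net/example/report", "http://old.example.org/report.html")
first = registry.resolve("/net/example/report")

registry.update("/net/example/report", "http://new.example.org/docs/report.html")
second = registry.resolve("/net/example/report")
```

Anything that cited `/net/example/report` keeps working after the move, because only the registry entry changed. The code does nothing for the harder part noted above: someone still has to commit to keeping the registry running and up to date.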

OCLC has been running the purl.org service for more than 12 years; there are very few services I can think of that can make such a claim. The library community has in many ways been ahead of the general curve in managing data. A barrier to weaving these ideas into other, non-library applications, however, is that the code behind this service has largely stayed the same for the past 12 years as well. I’m quite pleased to note that this is in the process of changing. More specifically, the following press release explains what’s going on.

DUBLIN, Ohio, July 11, 2007 – OCLC Online Computer Library Center, Inc. and Zepheira, LLC announced today that they will work together to rearchitect OCLC’s Persistent URL (PURL) service to more effectively support the management of a “Web of data.”

This re-architecture will not only make it easier for folks to embed PURLs within existing applications; the software will also be updated to reflect the current understanding of Web architecture as defined by the World Wide Web Consortium (W3C). The new software will provide the ability to permanently identify networked information resources, such as Web documents, as well as non-networked resources such as people, organizations, concepts and scientific data. This capability will represent an important step forward in the adoption of a machine-processable “Web of data” enabled by the Semantic Web.

One of the most important principles for Zepheira is that developments towards a Semantic Web can be carefully tuned and scaled to meet the immediate needs of businesses, while valuable experience from solving enterprise needs can bring focus to Semantic Web efforts. I’m pleased to see this work underway and very much look forward to what capabilities the new PURL work will help enable in the next 12 years (and beyond).