Potlach Feast

-

February 22, 2008

happy birthday RDF!

Filed under: semantic web — em @ 9:13 pm

Misha pretty much summed it all up in his post

The RDF Model and Syntax Specification became a W3C Recommendation nine years ago today!

Resource Description Framework (RDF)
Model and Syntax Specification
W3C Recommendation 22 February 1999
http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

Best wishes to all members of the original W3C RDF Model and Syntax Working Group and to all those who have built on top of the foundations we created.

Seems like only yesterday ;-)

-

February 18, 2008

active purls

Filed under: semantic web, business — em @ 4:14 pm

Stu Weibel’s post on ‘List Making Meets Redirection’ prompted me to comment on some of the Active PURL work (PURLs with associated services) we at Zepheira have been developing. Example ‘Active PURLs’ might be notification to publishers of problems with target URLs (basically a link-checker for PURLs), notification to readers of updates to target PURLs (a “what’s new” feed for PURLs), etc. More specifically, the architecture allows for an open marketplace to grow around such associations between PURLs (or PURL patterns) and services.

While I only touched briefly on this work in my comment, David Wood has expanded on this in his blog and given additional context on the potential business applicability of this approach.

Perhaps the most interesting use of Active PURLs to enterprises might be the ability to provide standardized RDF metadata about SOA Web Services as well as relational databases. UDDI is so broken, we might as well fix it with existing SemWeb standards. That is not a new idea, but the application of Active PURLs to the problem is.

Applying the lessons and standards of the web back inside the enterprise makes sense for managing evolution, supporting collaboration and more effectively delivering products and services. More and more businesses are starting to realize the true benefit of being *in* the Web, not just on it.

-

August 17, 2007

purlz for the people

Filed under: semantic web, business — em @ 8:42 am

For a long time I’ve been thinking how useful it would be to give PURLs to people as a key part of managing evolving social networks. And now that the new PURL work we’re doing at Zepheira (which is downright scary-good due in part to a rock-solid engineering team and the use of NetKernel as a key underlying technology) will include support for identification of non-document resources, this will soon be possible.

Recently, Brian has been reflecting on his building of some very cool FOAF tools. And now what’s even cooler is that it looks like Brian is on the case …

This exercise has also inspired me to make some progress on my goal to create some good tools to lower the bar to FOAF usage. I am going to leverage the PURLS work that we are doing for the OCLC. This will allow us to create permanent, resolvable names for ourselves that transcend where we currently hang our hats*. This will allow the networks to be more resilient. As many links as I am finding, there have been a ton of broken links (presumably people who have moved on) that would have enriched the result set even further!

I suspect he’ll have a production ready system in place by sometime tomorrow ;-)

-

July 11, 2007

PURL rearchitecture underway

Filed under: semantic web, libraries, business — em @ 2:21 pm

15 years ago, I was involved in a series of discussions at the IETF regarding technical standards related to identifying resources in decentralized environments. Acronyms such as “URI”, “URN”, “URL”, “URC” (and occasionally the phrase “you are kidding?!”) were constantly thrown around with much heated debate regarding requirements, protocols, semantics and capabilities for each of these technologies. URLs (Uniform Resource Locators … the global identifiers that start with ‘http:’, ‘mailto:’, ‘ftp:’, etc) were increasingly becoming prevalent as people found the immediate feedback of merging “clickability” and global “addressability” an attractive one. The other standards, however, were not so lucky.

At that time, the library community was starting to focus on issues related to cataloging and managing Web resources. Relaxing link integrity (404 file not found) and making it easier for people to create these URLs was one of the reasons the Web had succeeded where other hyperlinking systems had failed. This relaxation, however, caused problems for effective cataloging, managing and relating of resources. Spending time, effort and money to do this only to find the resource is no longer available because it had been moved, etc. was a serious issue to individuals in the library community focused on describing and providing effective access to relevant digital resources.

A solution to this problem was developed by OCLC: PURLs - Persistent URLs. PURLs provide a level of indirection that allows the underlying Web addresses of resources to change over time without negatively affecting systems that depend on them. Persistence is not a technological issue as much as one of social and organizational commitment. The PURL software provided the simple technological solution, and made it available for others to use, but it was OCLC running the purl.org service that provided the organizational commitment that helped make it possible for others to create and share persistent identifiers.

OCLC has been running the purl.org services for more than 12 years; there are very few services I can think of that can make such a claim. The Library community has in many ways been ahead of the general curve for managing data. A barrier to weaving these ideas into various other non-library applications, however, is that the code behind this service has largely been the same for the past 12 years as well. I’m quite pleased to note, however, that this is in the process of changing. More specifically, the following press release explains what’s going on.
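To make the indirection concrete, here is a minimal sketch in Python. It is not the purl.org software: the `PurlResolver` class, the paths and the example.org URLs are all hypothetical, and the 302 status is simply assumed as a typical redirect response.

```python
from dataclasses import dataclass, field


@dataclass
class PurlResolver:
    """Toy in-memory resolver (hypothetical): persistent path -> current target URL."""

    targets: dict = field(default_factory=dict)

    def register(self, purl_path: str, target_url: str) -> None:
        # Create or update the mapping; the persistent path itself never changes.
        self.targets[purl_path] = target_url

    def resolve(self, purl_path: str) -> tuple:
        # Return a (status, location) pair, the way an HTTP redirect would.
        target = self.targets.get(purl_path)
        if target is None:
            return (404, None)
        return (302, target)  # 302 is an assumption for this sketch


resolver = PurlResolver()
resolver.register("/example/report", "http://old-host.example.org/report.html")
first = resolver.resolve("/example/report")

# The document moves; only the mapping is updated, not the links people hold.
resolver.register("/example/report", "http://new-host.example.org/docs/report.html")
second = resolver.resolve("/example/report")
```

Anything that cited `/example/report` keeps working after the move. The technical part is that small; keeping the table maintained for a decade or more is the organizational commitment.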

DUBLIN, Ohio, July 11, 2007—OCLC Online Computer Library Center, Inc. and Zepheira, LLC announced today that they will work together to rearchitect OCLC’s Persistent URL (PURL) service to more effectively support the management of a “Web of data.”

This re-architecture will not only make it easier for folks to embed PURLs within existing applications, it will also be updated to reflect the current understanding of Web architecture as defined by the World Wide Web Consortium (W3C). This new software will provide the ability to permanently identify networked information resources, such as Web documents, as well as non-networked resources such as people, organizations, concepts and scientific data. This capability will represent an important step forward in the adoption of a machine-processable “Web of data” enabled by the Semantic Web.

One of the most important principles for Zepheira is that developments towards a Semantic Web can be carefully tuned and scaled to meet the immediate needs of businesses, while valuable experience from solving enterprise needs can bring focus to Semantic Web efforts. I’m pleased to see this work underway and very much look forward to what capabilities the new PURL work will help enable in the next 12 years (and beyond).

-

June 6, 2007

Recombinant Data

Filed under: semantic web — em @ 1:02 pm

Over the past several years, I’ve occasionally used the phrase “recombinant data” when talking about the Semantic Web. Recently at the Semantic Technologies 2007 conference I attempted to give this term a definition during one of my talks:

the ability to rapidly recombine, reform, re-factor and reuse data from different applications to address a particular task, need or objective

- Eric Miller, President Zepheira “The Business of Recombinant Data”

It’s not quite right, but close…

Passing this definition through a syllabic minimization filter yields: “when it comes to data - write once, use often”.

The talk went on to demonstrate the benefit of recombinant data by using various practical tools the Simile folks have been developing to solve specific use-cases, and from there connected these examples back to real-world problems that enterprises are grappling with in terms of more efficient, flexible means of supporting data integration and ultimately effective business intelligence. And while I’m admittedly biased in my assessment, it seemed to go over extremely well, re-echoing the point once again that showing rather than telling helps people understand the power of recombinant data.

-

April 8, 2007

Zepheira

Filed under: semantic web, business — em @ 5:41 pm

Several folks have asked me privately about Zepheira … more specifically what we do, how to pronounce it and where the name came from. The following is a poor attempt to minimize future inquiries :)

Ok, first, what is it… the home page captures some of this:

Zepheira provides solutions to integrate, navigate and manage data across personal, group and enterprise boundaries to save time and money. Our team are experts in applying Semantic Web standards and knowledge management technologies to address your specific data integration challenges.

Not quite a full answer, but close. For those interested in more details, Zepheira’s services page may be additionally useful.

Secondly… how to pronounce it:

zepheira : ze-fear’-a

see… that wasn’t so hard :)

And finally, where did the name come from…

Naming things in general is difficult - companies even more so.

I’m fortunate enough to be part of a team of industry leaders who have come together to help provide effective solutions to address various data integration, collaboration and knowledge exchange challenges. During the discussion on what to call ourselves, we found ourselves discussing a wide range of inter-related topics including philosophy, values, experiences, lessons learned and future goals. James Lipton’s Inside the Actors Studio questions, care of Bernard Pivot, we found equally insightful :) In the process of sharing our individual views on various subjects we collectively recognized that while each of us is a respected industry leader in various areas, we also shared a deep passion for the arts. We found that everyone’s ‘hobby’ was incredibly artistic in nature - pottery, woodworking, sculpting, music, poetry, martial arts, weaving / fabric, photography, etc.

I own only one piece of “serious” art (something I paid money for). It’s a painting from an Armenian artist named Vakhtang. My wife and I stumbled upon this piece independently in Sausalito and it holds a special place in our hearts. I walk past it every day and every day I pause briefly to breathe it in. It’s a bit of a daily ritual of mine; viewing this piece makes me pause, reflect and subsequently feel a bit better about life, the universe and everything.

The painting is called ‘Zepheira’.

It was one of those ‘ah ha!’ moments that we as a team had and it came together instantly: “Zepheira, ‘The Art of Data’”. It simply felt right.

-

November 2, 2006

miles davis timeline

Filed under: semantic web, music — em @ 12:08 pm

I’m a jazz fan. The combination of vinyl, low watt tube amplification, high efficiency speakers and jazz works for me. Spinning some Miles Davis in particular I really enjoy.

Ok… enough on that. A couple of months ago, I spent a few minutes combining some of the tools that we’re building in the Simile project together to show the value of accessing data that is “behind” web pages and viewing this data in new and interesting ways. The Miles Davis discography timeline is a quick example of this (as well as a painful reminder of all of the Miles Davis albums I don’t have!).

For the unfortunate souls who don’t know who Miles Davis was, the BBC sums it up nicely…

“American jazz trumpeter and bandleader. He was in at the birth of hard bop, ‘cool’ jazz, modal jazz and jazz/rock fusion. For much of his life he struggled with racial prejudice and drug addiction. An icon of the modern age.” - BBC Artist Profile of Miles Davis.

The Miles Davis discography data (rdf/xml) and corresponding scraper that extracts the data from the BBC site are available for those interested in learning more. The Timeline tool is another excellent creation from David Huynh and provided by the Simile project.

-

October 28, 2005

the rdf.net challenger - piggy bank

Filed under: semantic web — em @ 8:08 pm

Tim Bray’s latest post reminded me of his rdf.net challenge. Working with various companies on RDF and Semantic Web related tools / products I had forgotten this was still in play. As it still is, I’m interested :) . More specifically, I’d like to offer Piggy Bank (and various other tools in the Simile toolkit) as a challenger.

Reading the (cleverly subjective) criteria of the original challenge


OK, I’m prepared to put my domain name where my mouth is. Herewith the RDF.net challenge: To the first person or organization that presents me with an RDF-based app that I actually want to use on a regular basis (at least once per day), and which has the potential to spread virally, I hereby promise to sign over the domain name RDF.net.

I’ve been using Piggy Bank daily for almost 6 months. Since the latest release a few weeks ago, the stability, speed and performance increases have made this an indispensable tool for me to collect, tag and manage “stuff” that is important to me. Piggy Bank helps me manage contacts, images, news items of interest, scholarly articles, teleconference details, events, web sites of interest, etc. - basically anything I find useful (to date I have about 3000 things I find useful). I figure I save anywhere from 20-30 minutes a day using Piggy-Bank. This gives me a couple more hours a week I can spend with my family - that to me is a killer app!


You can call me an idealist, but I think the Web is terribly metadata-thin, and I think that when we start to bring on board metadata-rich knowledge monuments such as WorldCat and some of the Thomson holdings, we’re damn well going to need a good clean efficient way to pump the metadata back and forth.

I’d argue the Web is actually quite metadata-rich (but my views of the Web go beyond the notion of hyperlinked documents and include the data that is often behind them). The results one gets from searching Monster.com, the listing of available apartments via appartment.com or the nearest Starbucks coffee shops based on my zip code are all examples of metadata. The problem isn’t the lack of metadata per se, but the different ways of encoding this information, the ambiguity of terms used to describe this data and the lack of common protocols and interfaces for accessing this data directly. Being able to integrate the data that comes from these sites opens up a whole new set of end-user possibilities (the insomniacs reading this post that would love to see a map of the “show me all of the technology jobs that pay > ‘X’ with apartments that cost < ‘Y’ near coffee shops in city ‘Z’” can see the benefit of integrating this data instantly :) )
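As a rough illustration of what that integration looks like once the data is out from behind the pages, here is a small Python sketch. The records, field names and the `find_matches` function are all made up for this example, and “near” is simplified to “in the same city”; real sites would each need their own extraction step first.

```python
def find_matches(jobs, apartments, coffee_shops, city, min_salary, max_rent):
    """Return (job titles, apartment addresses) in `city` meeting the thresholds.

    Returns two empty lists when the city has no coffee shop at all.
    """
    if not any(shop["city"] == city for shop in coffee_shops):
        return [], []
    good_jobs = [
        job["title"]
        for job in jobs
        if job["city"] == city and job["salary"] > min_salary
    ]
    good_apartments = [
        apt["address"]
        for apt in apartments
        if apt["city"] == city and apt["rent"] < max_rent
    ]
    return good_jobs, good_apartments


# Hypothetical records, standing in for data scraped from three separate sites.
jobs = [
    {"title": "Developer", "city": "Z", "salary": 90000},
    {"title": "Tester", "city": "Z", "salary": 50000},
    {"title": "Architect", "city": "Q", "salary": 120000},
]
apartments = [
    {"address": "1 Main St", "city": "Z", "rent": 900},
    {"address": "2 High St", "city": "Z", "rent": 2000},
]
coffee_shops = [{"name": "Corner Cafe", "city": "Z"}]

matches = find_matches(jobs, apartments, coffee_shops, "Z", 80000, 1000)
```

The join itself is trivial; the hard part is getting three independent sources to agree on what “city”, “salary” and “rent” mean, which is exactly the encoding and ambiguity problem described above.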

While I believe the semantic web standards (RDF, OWL, SPARQL) are key in helping address these problems in general, Piggy Bank is more focused on making the management and reuse of this data transparent to the user. Its focus is to empower the end-user and make it increasingly easy to access this ‘raw data’ and use / re-use, manage, integrate and share this data with others.

library

Tim mentioned OCLC’s WorldCat in his original challenge (which apparently has reached 1 billion holdings - congrats!). As a few people know, I previously worked at OCLC for 13 years and have a special place in my heart for libraries. OCLC has recently launched something they call Open WorldCat which provides web access to this data. The problem is (quite selfishly) it doesn’t quite do what *I* want. One of the itches I’ve always wanted to scratch was being able to find libraries that had *all* of the particular items I’m looking for. It’s frustrating to go to a particular library to get “green eggs and ham” and “hop on pop” but somewhere else to get “one fish, two fish” (my son likes Dr. Seuss… what can I say). Making this data available on the web however is the key to opening up new end-user applications. Using Solvent and a couple hours trapped in an airport, for example, I exposed this data to Piggy Bank. Using Piggy Bank’s ability to combine information and integrate this with 3rd party services (e.g. Google maps), I can overlay the results from my queries on a map and create an end-user application that shows the libraries nearest me that have *all* of the books I’m looking for. Using Solvent I’ve created a similar scraper for NCBI’s Pubmed (which has access to 15 million citations from MEDLINE and other life science journals for biomedical articles back to the 1950s). It took me a couple hours to build my first scraper. It took me half that time to build my second. And if others find either one of these useful, it now will take them a couple of minutes to plug this in to their piggy bank.
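The “libraries that have *all* of my books” question boils down to a subset test once the holdings are available as data. A minimal Python sketch follows; the library names and holdings are invented for illustration, and this is not how Piggy Bank or Open WorldCat implement it.

```python
def libraries_with_all(holdings, wanted):
    """Return the libraries whose holdings include every wanted title."""
    wanted_set = set(wanted)
    return sorted(
        library
        for library, titles in holdings.items()
        if wanted_set <= set(titles)  # subset test: every wanted title is held
    )


# Hypothetical holdings data, as it might look after scraping.
holdings = {
    "Library A": {"Green Eggs and Ham", "Hop on Pop"},
    "Library B": {"Green Eggs and Ham", "Hop on Pop", "One Fish, Two Fish"},
    "Library C": {"One Fish, Two Fish"},
}
wanted = ["Green Eggs and Ham", "Hop on Pop", "One Fish, Two Fish"]

one_stop = libraries_with_all(holdings, wanted)
```

Here only Library B qualifies. Putting the result on a map is then a matter of attaching coordinates to each library that survives the filter.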

Oh, and by the way - the folks really wanting to “show me all of the technology jobs that pay > ‘X’ with apartments that cost < ‘Y’ near coffee shops in city ‘Z’” can do this now - check out the list of scrapers in Simile’s semantic bank.

The decentralized ‘plug in’ architecture of piggy-bank scrapers is a practical stop gap for services that don’t provide their data in RDF. Check out citeseer if you’d like a quick sense of a site that does - it’s far more useful to be able to “bookmark” and share individual article level metadata than traditional web page results. For content providers, if you think providing RSS feeds helps draw people to your site, providing common interfaces to RDF data will make that look trivial in comparison.

There is lots of data on the Web. Piggy Bank simply gives me the ability to start to use it. I can’t imagine using my computer without access to a browser. Now I can’t imagine using my browser without piggy bank.


I think the RDF model is the right way to think about this kind of stuff, and I firmly believe that the killer app is lurking in the weeds out there …

The simile team has been focused on getting work done rather than anything else. Now and again, we should remind ourselves to come out from behind the weeds…

Did I win? :)

-

August 23, 2005

search to find

Filed under: semantic web — em @ 9:09 pm

News.com picked up on some of the new work that’s going on in the academic search / information management space in their article Academia’s quest for the ultimate search tool, citing Berkeley’s new interdisciplinary department focus on search technology, CMU’s Javelin work focused on Question Answering search technology and MIT’s Simile project.

I particularly like the succinct point MacKenzie makes regarding the benefits of the semantic web architecture that Simile has developed:


A generalized data archive lets you make data work together in ways you couldn’t before

MIT’s START system, based on formalizing / mining metadata composed of natural language phrases and sentences, is I think another one that’s worth mentioning in this space. Opening up an RDF interface to this data would, I think, be particularly interesting.

One search engine can’t do everything. Different search engines / strategies will be more effective at addressing different tasks. Being able to expose the data behind these services and allow individuals / organizations the means to tie together this data will be key.

-

August 3, 2005

on connecting things…

Filed under: semantic web — em @ 7:38 pm

(reconstructed from wayback)

Talks over on the Simile list have moved into the realm of bibliographic citations and of the best way of describing people. FRBR has been mentioned as well as IFLA’s FRAR work for authority records in this context. I’m particularly encouraged by the more recent work of Ian Davis and Richard Newman in this area in grounding FRBR in RDF.

I very much respect the FRBR work and I believe the instantiation of FRBR in RDF is an important step for weaving libraries into the Web and letting folks outside of the library community know that the libraries still know a thing or two regarding the modeling and management of information :) . I’d very much like to see this work move forward and I’m interested in learning more about how to help.

From the perspective of project Simile (where this discussion in part is taking place), however, I’m slightly less interested in the “best” way of describing things (e.g. People) and more interested in how to start operationalizing the contextual linking of these things together. I believe there are some relatively simple steps that might be taken to achieve a very powerful network effect.

Here is an example …

hubmed has wrapped pubmed and provided (among many things) an RDF representation of the corresponding bibliographic data. This is an important step for “connecting things” in the biomedical and life sciences community. Here is an example of one of these records ( HTML, RDF/XML)

By itself, the article in RDF form is not really helpful. That said, having it in RDF makes it easier to connect this with other data sets. To illustrate this example, I’ve added this RDF data to the Semantic Bank and used this tool to help connect interesting bits and pieces from several servers.

One of the first things one may notice looking at this record is that you’ll see the authors listed as (anonymous items). This is one of the reasons why I’m of the opinion that a “default value” that’s included by the data providers would be useful.

If you get past the debug-view of the interface, another thing you may notice (choose ‘Show Referers’) is the fact that this article is a “supporting Article” for an Observation and that there is another article that supports this Observation as well. Further, this Observation is one of several “supporting Evidence” (again choose ‘Show Referers’) that is associated with the Amyloid Hypothesis which is related to Alzheimer’s Disease.

Some of this data comes from pubmed (articles), some comes from scientific communities (in the above case, the Amyloid Hypothesis is from Alzforum). Through the Semantic Web we can begin to see the various potentials of using a common framework to draw connections among various “things” of interest. In this specific case of the life sciences community, I think this community is very close to not only connecting people to people, people to articles, articles to journals, etc. but articles to hypotheses, hypotheses to diseases, genes, proteins, etc. And ultimately connecting the dots between diseases and drugs.
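A small Python sketch of the chain just described, using plain (subject, predicate, object) tuples rather than a real RDF store. The identifiers and predicate names are invented for illustration and are not the actual hubmed, Alzforum or Semantic Bank vocabularies.

```python
# Hypothetical triples mirroring the article -> observation -> hypothesis -> disease chain.
TRIPLES = [
    ("observation:1", "supportingArticle", "article:A"),
    ("observation:1", "supportingArticle", "article:B"),
    ("hypothesis:amyloid", "supportingEvidence", "observation:1"),
    ("hypothesis:amyloid", "relatedTo", "disease:alzheimers"),
]


def referrers(triples, node):
    """Subjects that point at `node` (roughly what 'Show Referers' surfaces)."""
    return sorted({s for s, _, o in triples if o == node})


def connected(triples, start):
    """Everything reachable from `start`, following links in either direction."""
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for s, _, o in triples:
            for here, there in ((s, o), (o, s)):
                if here == current and there not in seen:
                    seen.add(there)
                    frontier.append(there)
    return seen


article_referrers = referrers(TRIPLES, "article:A")
reachable = connected(TRIPLES, "article:A")
```

Starting from a single article, the traversal reaches the second supporting article, the hypothesis and the disease, even though those statements would come from different sources. That only works because every source refers to the same “thing” by the same identifier.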

There are many paths one may take to make this connection and the path for one may not be the same as one that works for another. Providing the ability for people to create new connections among data and share this with others is key. A community focused on a particular goal, task or interest coupled with a framework for representing, sharing and integrating data is a powerful combination.

Small but important steps will help facilitate this goal. On the technology side, more tools like Connotea, Simile, etc. are required. From the content side however, common means of referencing ‘things’ that are real (people, places, articles, genes, proteins, etc.) and from there, agreement on a common means for describing these resources (RDF) are still required. Common protocols and interfaces to this data will be needed as well. This is where technologies such as SPARQL will be increasingly critical. Folks over in Nature and Hubmed seem to “get it” and are good examples of a growing awareness in the “interconnectedness of things”.

There continues to be a lot of focus on the “best” way to describe things. I don’t want this to stop. My hope is, however, that people will begin to place an equal if not greater value on the contextualization of these things they’re hoping to describe. As we weave a web of data, I believe how things connect will prove more valuable.


This work is licensed under a Creative Commons License.