Extracting and reifying RDF from XML

Serializing RDF and edge-labelled graphs in XML

One of the issues in serializing RDF to XML, and in extracting RDF from arbitrary or colloquial XML, is that the XML object model (DOM) is a node-labelled graph, whereas the RDF object model forms an edge-labelled graph.
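
The difference can be made concrete with a small sketch in Python, using the standard xml.etree.ElementTree module. The tiny document and the "#/1" name given to its root node are made up for illustration only.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<person><name>John</name></person>")

# Node-labelled view (XML): every node carries its own label, the element
# name, while the parent/child edges between nodes are anonymous.
node_labels = [element.tag for element in doc.iter()]

# Edge-labelled view (RDF): the label sits on the edge (the predicate) that
# connects a subject node to an object node or literal.
edges = [("#/1", child.tag, child.text) for child in doc]

print(node_labels)
print(edges)
```

In the first view "name" is a node; in the second it is the label of an edge, and the node it leads to is just the literal "John".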

Several mechanisms have been proposed to simplify the RDF syntax. After implementing Sergey Melnik's simplified RDF syntax using Dan Connolly's rdfp.xsl as a base, I have subsequently implemented Tim Berners-Lee's strawman syntax:

"The major difference between this syntax and RDF 1.0 M&S is that RDF edges correspond to elements, and RDF nodes are implicit. It is basically as the M&S syntax with parseType=resource is a default."

This proposal, with its attendant implementation, has the following properties:

  1. Uses rdf:parseType='Resource' as default
  2. Does not add to current rdf vocabulary
  3. Implements XLink2RDF proposal (now with extended links)
  4. Implements rdf:aboutEach, rdf collections and bagID
  5. Transforms to <rdf:Statement><rdf:predicate .../>...</rdf:Statement> form
  6. Transforms colloquial XML into RDF Statements
  7. *** Transformation of the output of a transformation results in reification
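
Properties 1 and 6 amount to reading each child element as an edge and each nested element as an implicit node. A minimal sketch of that reading follows, written in Python rather than XSLT and not derived from rdfExtractify.xsl. The function names, the rule that inserts '#' between namespace and local name, and the naming of implicit nodes by child position are assumptions for illustration. Attributes on property elements, rdf:type, rdf:resource, rdf:aboutEach and XLink handling are all left out.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def qname_to_uri(tag):
    """Turn ElementTree's '{namespace}local' form into a single URI.

    Assumption: a '#' is inserted when the namespace does not already
    end in '#' or '/'.
    """
    if not tag.startswith("{"):
        return tag
    namespace, local = tag[1:].split("}", 1)
    separator = "" if namespace.endswith(("#", "/")) else "#"
    return namespace + separator + local

def walk_edges(node, subject, path, triples):
    """Treat every child element of `node` as an edge leaving `subject`."""
    for position, edge in enumerate(node, start=1):
        predicate = qname_to_uri(edge.tag)
        child_path = f"{path}/{position}"
        if len(edge) == 0:
            # No nested elements: the text content is a literal object.
            triples.append((subject, predicate, (edge.text or "").strip()))
        else:
            # Nested elements: the object is an implicit node, named here
            # by its position in the document, and we recurse into it.
            implicit_node = "#" + child_path
            triples.append((subject, predicate, implicit_node))
            walk_edges(edge, implicit_node, child_path, triples)
    return triples

def extract_document(root):
    """Extract (subject, predicate, object) tuples from a described element."""
    subject = root.get(f"{{{RDF}}}about")
    triples = [(subject, RDF + "type", qname_to_uri(root.tag))]
    return walk_edges(root, subject, "/1", triples)

doc = ET.fromstring(
    '<t:person xmlns:t="http://www.openhealth.org/types" '
    'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'rdf:about="http://www.openhealth.org/people/JohnDoe.xml">'
    '<t:name><t:first>John</t:first><t:last>Doe</t:last></t:name>'
    '<t:SSN>000-11-1234</t:SSN>'
    '</t:person>'
)
triples = extract_document(doc)
for triple in triples:
    print(triple)
```

The tuples do not distinguish literals from resources; a fuller version would need to carry that distinction, as the rdf:object elements of the real output do.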

The XSLT implementation

The current XSLT implementation incorporates Jason Diamond's original rdf.xsl (nice work!). I have placed a copy at http://www.openhealth.org/RDF/rdf.xsl which is <xsl:include>d into the current implementation:

 http://www.openhealth.org/RDF/rdfExtractify.xsl

To use it, also download http://www.openhealth.org/RDF/rdf.xsl into the same directory.

 

How to name anonymous class instances?

In the Melnik proposal, class instances are named by use of the rdf:instance attribute. When we indicate that an element maps to a class instance through use of the rdf:type attribute, the instance appears to be anonymous, but is it? As in the XLink2RDF proposal, nodes can be named using XPointer fragment identifiers. An XPointer can be generated for an element using several techniques:

  1. 'Bare names' Value of ID or rdf:ID attribute e.g. #foo
  2. ChildSeq e.g. #/1/1/1
  3. XPath indices e.g. #xpointer(/foo[1]/bar[2])
  4. XPath attributes e.g. #xpointer(/foo[@bar='123']/baz[@bop='whatever'])
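
As a rough illustration of techniques 1 and 2, the sketch below derives a fragment identifier for an element using Python's standard xml.etree.ElementTree. It is not taken from rdfExtractify.xsl. The function names child_seq and fragment_for are hypothetical, and only an rdf:ID attribute is treated as a bare name here, since recognising DTD-declared ID attributes would need a validating parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def child_seq(root, target):
    """Return the child sequence of `target`, e.g. '/1/2', or None."""
    def walk(element, path):
        if element is target:
            return path
        # Iterating an Element yields its child elements in document order.
        for position, child in enumerate(element, start=1):
            found = walk(child, f"{path}/{position}")
            if found is not None:
                return found
        return None
    return walk(root, "/1")  # the document element is always step 1

def fragment_for(root, target):
    """Prefer a bare name taken from rdf:ID; fall back to a child sequence."""
    bare_name = target.get(f"{{{RDF_NS}}}ID")
    if bare_name:
        return "#" + bare_name
    return "#" + child_seq(root, target)

doc = ET.fromstring(
    '<person xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<name><first>John</first></name>'
    '<address rdf:ID="home"/>'
    '</person>'
)
name, address = doc[0], doc[1]
first = name[0]

print(fragment_for(doc, name))
print(fragment_for(doc, first))
print(fragment_for(doc, address))
```

A child sequence is cheap to compute but changes whenever elements are inserted or reordered, so a bare name is the more stable choice when an ID is available.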

An example XML document using simplified RDF syntax

<t:person
    rdf:about="http://www.openhealth.org/people/JohnDoe.xml"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:t="http://www.openhealth.org/types">
    <t:name rdf:type="PersonName">
        <t:first>John</t:first>
        <t:last>Doe</t:last>
    </t:name>
    <t:pid t:entity="NEMC">123-45-6789</t:pid>
    <t:SSN>000-11-1234</t:SSN>
    <t:patient rdf:type="Role">
        <t:primary-care-physician rdf:resource=".../DrJones.xml" />
    </t:patient>
    <t:address rdf:type="Address" loc="home">
        <t:street>750 Washington Street</t:street>
        <t:city>Boston</t:city>
        <t:state>MA</t:state>
    </t:address>
</t:person>

And transformed via rdfExtractify:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
      <rdf:subject rdf:resource="http://www.openhealth.org/people/JohnDoe.xml"/>
      <rdf:object rdf:resource="http://www.openhealth.org/types#person"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.openhealth.org/types#name"/>
      <rdf:subject rdf:resource="http://www.openhealth.org/people/JohnDoe.xml"/>
      <rdf:object rdf:resource="#/1/1"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
      <rdf:subject rdf:resource="#/1/1"/>
      <rdf:object rdf:resource="PersonName"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.openhealth.org/types#first"/>
      <rdf:subject rdf:resource="#/1/1"/>
      <rdf:object>John</rdf:object>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.openhealth.org/types#last"/>
      <rdf:subject rdf:resource="#/1/1"/>
      <rdf:object>Doe</rdf:object>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.openhealth.org/types#pid"/>
      <rdf:subject rdf:resource="http://www.openhealth.org/people/JohnDoe.xml"/>
      <rdf:object rdf:resource="#/1/2"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
      <rdf:subject rdf:resource="#/1/2"/>
      <rdf:object rdf:resource="http://www.openhealth.org/types#pid"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.openhealth.org/types#entity"/>
      <rdf:subject rdf:resource="#/1/2"/>
      <rdf:object>NEMC</rdf:object>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#value"/>
      <rdf:subject rdf:resource="#/1/2"/>
      <rdf:object>123-45-6789</rdf:object>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.openhealth.org/types#SSN"/>
      <rdf:subject rdf:resource="http://www.openhealth.org/people/JohnDoe.xml"/>
      <rdf:object>000-11-1234</rdf:object>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.openhealth.org/types#patient"/>
      <rdf:subject rdf:resource="http://www.openhealth.org/people/JohnDoe.xml"/>
      <rdf:object rdf:resource="#/1/4"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
      <rdf:subject rdf:resource="#/1/4"/>
      <rdf:object rdf:resource="Role"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.openhealth.org/types#primary-care-physician"/>
      <rdf:subject rdf:resource="#/1/4"/>
      <rdf:object rdf:resource="http://www.openhealth.org/people/DrJones.xml"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
      <rdf:subject rdf:resource="http://www.openhealth.org/people/DrJones.xml"/>
      <rdf:object rdf:resource="http://www.openhealth.org/types#primary-care-physician"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.openhealth.org/types#address"/>
      <rdf:subject rdf:resource="http://www.openhealth.org/people/JohnDoe.xml"/>
      <rdf:object rdf:resource="#/1/5"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
      <rdf:subject rdf:resource="#/1/5"/>
      <rdf:object rdf:resource="Address"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="file:/D:/rdf/test.xml#loc"/>
      <rdf:subject rdf:resource="#/1/5"/>
      <rdf:object>home</rdf:object>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.openhealth.org/types#street"/>
      <rdf:subject rdf:resource="#/1/5"/>
      <rdf:object>750 Washington Street</rdf:object>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.openhealth.org/types#city"/>
      <rdf:subject rdf:resource="#/1/5"/>
      <rdf:object>Boston</rdf:object>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.openhealth.org/types#state"/>
      <rdf:subject rdf:resource="#/1/5"/>
      <rdf:object>MA</rdf:object>
   </rdf:Statement>
</rdf:RDF>

And the reified result:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
      <rdf:subject rdf:resource="#/1/1"/>
      <rdf:object rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate"/>
      <rdf:subject rdf:resource="#/1/1"/>
      <rdf:object rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#subject"/>
      <rdf:subject rdf:resource="#/1/1"/>
      <rdf:object rdf:resource="http://www.openhealth.org/people/JohnDoe.xml"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#object"/>
      <rdf:subject rdf:resource="#/1/1"/>
      <rdf:object rdf:resource="http://www.openhealth.org/types#person"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
      <rdf:subject rdf:resource="#/1/2"/>
      <rdf:object rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate"/>
      <rdf:subject rdf:resource="#/1/2"/>
      <rdf:object rdf:resource="http://www.openhealth.org/types#name"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#subject"/>
      <rdf:subject rdf:resource="#/1/2"/>
      <rdf:object rdf:resource="http://www.openhealth.org/people/JohnDoe.xml"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#object"/>
      <rdf:subject rdf:resource="#/1/2"/>
      <rdf:object rdf:resource="#/1/1"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
      <rdf:subject rdf:resource="#/1/3"/>
      <rdf:object rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate"/>
      <rdf:subject rdf:resource="#/1/3"/>
      <rdf:object rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#subject"/>
      <rdf:subject rdf:resource="#/1/3"/>
      <rdf:object rdf:resource="#/1/1"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#object"/>
      <rdf:subject rdf:resource="#/1/3"/>
      <rdf:object rdf:resource="PersonName"/>
   </rdf:Statement>
   ...

   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#object"/>
      <rdf:subject rdf:resource="#/1/13"/>
      <rdf:object rdf:resource="http://www.openhealth.org/people/DrJones.xml"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
      <rdf:subject rdf:resource="#/1/14"/>
      <rdf:object rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate"/>
      <rdf:subject rdf:resource="#/1/14"/>
      <rdf:object rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#subject"/>
      <rdf:subject rdf:resource="#/1/14"/>
      <rdf:object rdf:resource="http://www.openhealth.org/people/DrJones.xml"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#object"/>
      <rdf:subject rdf:resource="#/1/14"/>
      <rdf:object rdf:resource="http://www.openhealth.org/types#primary-care-physician"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
      <rdf:subject rdf:resource="#/1/15"/>
      <rdf:object rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate"/>
      <rdf:subject rdf:resource="#/1/15"/>
      <rdf:object rdf:resource="http://www.openhealth.org/types#address"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#subject"/>
      <rdf:subject rdf:resource="#/1/15"/>
      <rdf:object rdf:resource="http://www.openhealth.org/people/JohnDoe.xml"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#object"/>
      <rdf:subject rdf:resource="#/1/15"/>
      <rdf:object rdf:resource="#/1/5"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
      <rdf:subject rdf:resource="#/1/16"/>
      <rdf:object rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate"/>
      <rdf:subject rdf:resource="#/1/16"/>
      <rdf:object rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#subject"/>
      <rdf:subject rdf:resource="#/1/16"/>
      <rdf:object rdf:resource="#/1/5"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#object"/>
      <rdf:subject rdf:resource="#/1/16"/>
      <rdf:object rdf:resource="Address"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
      <rdf:subject rdf:resource="#/1/17"/>
      <rdf:object rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate"/>
      <rdf:subject rdf:resource="#/1/17"/>
      <rdf:object rdf:resource="file:/D:/rdf/test.xml#loc"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#subject"/>
      <rdf:subject rdf:resource="#/1/17"/>
      <rdf:object rdf:resource="#/1/5"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#object"/>
      <rdf:subject rdf:resource="#/1/17"/>
      <rdf:object>home</rdf:object>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
      <rdf:subject rdf:resource="#/1/18"/>
      <rdf:object rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate"/>
      <rdf:subject rdf:resource="#/1/18"/>
      <rdf:object rdf:resource="http://www.openhealth.org/types#street"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#subject"/>
      <rdf:subject rdf:resource="#/1/18"/>
      <rdf:object rdf:resource="#/1/5"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#object"/>
      <rdf:subject rdf:resource="#/1/18"/>
      <rdf:object>750 Washington Street</rdf:object>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
      <rdf:subject rdf:resource="#/1/19"/>
      <rdf:object rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate"/>
      <rdf:subject rdf:resource="#/1/19"/>
      <rdf:object rdf:resource="http://www.openhealth.org/types#city"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#subject"/>
      <rdf:subject rdf:resource="#/1/19"/>
      <rdf:object rdf:resource="#/1/5"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#object"/>
      <rdf:subject rdf:resource="#/1/19"/>
      <rdf:object>Boston</rdf:object>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
      <rdf:subject rdf:resource="#/1/20"/>
      <rdf:object rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate"/>
      <rdf:subject rdf:resource="#/1/20"/>
      <rdf:object rdf:resource="http://www.openhealth.org/types#state"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#subject"/>
      <rdf:subject rdf:resource="#/1/20"/>
      <rdf:object rdf:resource="#/1/5"/>
   </rdf:Statement>
   <rdf:Statement>
      <rdf:predicate rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#object"/>
      <rdf:subject rdf:resource="#/1/20"/>
      <rdf:object>MA</rdf:object>
   </rdf:Statement>
</rdf:RDF>
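
The pattern in the reified listing is regular: each statement of the first output is named by its position under rdf:RDF (#/1/1, #/1/2, ...) and is then described by four statements giving its type, predicate, subject and object. A minimal Python sketch of that step follows, working on plain (subject, predicate, object) tuples instead of the XML form. The function name reify is hypothetical, and the code is not derived from rdfExtractify.xsl.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(triples):
    """Describe each (subject, predicate, object) tuple by four new tuples."""
    reified = []
    for position, (subject, predicate, obj) in enumerate(triples, start=1):
        # Name the statement by its position, as in the listing above.
        statement = f"#/1/{position}"
        reified.append((statement, RDF + "type", RDF + "Statement"))
        reified.append((statement, RDF + "predicate", predicate))
        reified.append((statement, RDF + "subject", subject))
        reified.append((statement, RDF + "object", obj))
    return reified

triples = [
    (
        "http://www.openhealth.org/people/JohnDoe.xml",
        RDF + "type",
        "http://www.openhealth.org/types#person",
    ),
]
reified = reify(triples)
for statement in reified:
    print(statement)
```

Because the result is again a list of statements, it can be passed through reify a second time, quadrupling the number of statements at each pass.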


Comments are welcome

Jonathan Borden

jonathan@openhealth.org

September 21, 2000