A space such as Cartesian space has a set number of dimensions and other defining characteristics. The behavior of objects within the space is defined by a set of rules, e.g. Newtonian mechanics.
Other spaces may have a different number of dimensions and operate under a different set of rules, such as quantum mechanics and string theory.
A Data Space is defined by an architecture. Such an architecture can be visualized by a graph of nodes connected by arcs. Such graphical representations can be generated from schemata.
XML documents have often been described as having a treelike structure. More generally, data structures can be described by graphs, and such structures are easy to visualize. We can think of an XML document as having either a shallow or a deep arborization. A document consisting of a root element whose content is a single long string is the shallowest possible arborization. One might imagine a shape transformation that matches the string against a regular expression pattern, so that the string is transformed into a hierarchy of nodes and values. Figure 1 demonstrates a regular expression representing a DNA sequence which is transformed into a hierarchy using terms such as "gene" and "snp". This vocabulary is in the domain of molecular biology. Such a pattern or transform represents genetic information at two levels of abstraction.
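Figure 1 is not reproduced here. The following Java sketch therefore uses a made-up pattern to illustrate the same kind of shape transformation. Purely for illustration, it assumes that a "gene" is an ATG start codon followed by whole codons up to the first in-frame stop codon. The `spacer` element name and the `DnaShapeTransform` class are hypothetical, and the "snp" level of Figure 1 is not modeled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DnaShapeTransform {
    // Hypothetical pattern: an ATG start codon, any number of codons (lazily),
    // then the first in-frame stop codon TAA, TAG or TGA.
    static final Pattern GENE = Pattern.compile("ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)");

    // Turns a flat sequence string (a shallow arborization) into a two-level
    // hierarchy: a <sequence> root containing <gene> and <spacer> children.
    static String toXml(String dna) {
        StringBuilder out = new StringBuilder("<sequence>");
        Matcher m = GENE.matcher(dna);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                // text between genes becomes a spacer node
                out.append("<spacer>").append(dna, last, m.start()).append("</spacer>");
            }
            out.append("<gene>").append(m.group()).append("</gene>");
            last = m.end();
        }
        if (last < dna.length()) {
            out.append("<spacer>").append(dna, last, dna.length()).append("</spacer>");
        }
        return out.append("</sequence>").toString();
    }
}
```

For example, `toXml("CCATGAAATAAGG")` yields `<sequence><spacer>CC</spacer><gene>ATGAAATAA</gene><spacer>GG</spacer></sequence>`. The single string has become a small tree.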
Just as Cartesian space defines a point by its x, y and z coordinates, a tree is described by nodes having parents and children. A path locates a node in a tree by starting at the root and traversing child nodes until the desired node is reached.
In XML, the XPath Recommendation describes such a hierarchical identification mechanism, which has become quite popular, perhaps due to its natural fit with XML.
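As a small illustration, the JDK's standard `javax.xml.xpath` API can evaluate such a location path directly. The sample document and the `PathLookup` helper are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class PathLookup {
    // Evaluates an XPath location path against an XML string and returns the
    // string value of the first node selected (empty string if none).
    static String select(String xml, String path) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(path, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad path or document: " + path, e);
        }
    }
}
```

For `<sequence><gene id='g1'>ATG</gene><gene id='g2'>TTT</gene></sequence>`, the path `/sequence/gene[2]/@id` selects `g2`, much as a coordinate triple selects a point.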
A popular misconception is that a URL directly references an HTML document. Rather, a URI (a URL being a type of URI) identifies a resource, and that resource may be rendered at some point in time as an HTML document. This abstract resource is understood as a point in URI space.
URI Space is organized in a hierarchical fashion. At the root level, URIs are partitioned by scheme, such as "http", "ftp", "mid" or "urn". Individual schemes may be hierarchical or non-hierarchical, and the general syntax is defined in the famous RFC 2396. Many URI schemes use path syntaxes to identify resources.
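The JDK's `java.net.URI` class, whose documentation cites RFC 2396, exposes this partitioning directly. The scheme is the top-level division, and `isOpaque()` separates non-hierarchical URIs from hierarchical ones that carry a path. The `UriSpace` helper is a hypothetical illustration.

```java
import java.net.URI;

class UriSpace {
    // Reports where a URI sits in URI space: its scheme (the root-level
    // partition) and either its path or its opaque scheme-specific part.
    static String describe(String s) {
        URI uri = URI.create(s); // throws IllegalArgumentException on bad syntax
        if (uri.isOpaque()) {
            // e.g. urn: URIs: absolute, and the scheme-specific part has no leading '/'
            return uri.getScheme() + " (opaque): " + uri.getSchemeSpecificPart();
        }
        return uri.getScheme() + " (hierarchical): " + uri.getPath();
    }
}
```

`describe("http://www.rddl.org/natures")` returns `http (hierarchical): /natures`. `describe("urn:isbn:0451450523")` returns `urn (opaque): isbn:0451450523`.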
The Resource Directory Description Language (RDDL) is an attempt to define the format of a document suitable to be resolved (or rendered) as a result of dereferencing a namespace name URI. The RDDL format defines resources as simple XLinks having a nature and a purpose.
The nature of a RDDL resource is a URI which is the xlink:role of the simple XLink which describes the resource with respect to the namespace. The nature of a resource is a statement made about the resource itself.
The purpose of a RDDL resource is a URI which is the xlink:arcrole of the simple XLink which describes the resource with respect to the namespace. The purpose of a resource is a statement made about the purpose of the resource with respect to the namespace.
RDDL space thus identifies resources by both a nature and purpose much as Cartesian space identifies points by both an x and y coordinate.
If the sole reason to have namespaces is to prevent name clashes, then the XML Namespace mechanism is not the simplest way of creating a namespace. This is the source of the controversy. It is widely acknowledged that the authors of the XML Namespaces Recommendation had other things in mind when XML Namespaces were created. Yet the W3C specification does not address this (perhaps because each member of the committee had a different idea of what an XML Namespace might be able to accomplish).
If we look at what an XML Namespace ought to be rather than what the W3C specification actually states, then the W3C XML Namespace mechanism starts to make a lot more sense.
Viewed syntactically, XML Namespaces accomplish the objective of partitioning XML element and attribute names in what is actually a logical and well-designed fashion.
This viewpoint is controversial but can be supported. The XML Namespaces Recommendation uses URIs as namespace names. Many have argued that a hierarchical naming mechanism, such as the one Sun's Java programming language uses for package names, would have been simpler. Perhaps this is true, and arguably the Java package naming mechanism could have been made to work. Nevertheless, URIs turn out to be a terrific and well-supported way to create unique names, and they have some important advantages over Java package names when we extend the XML Namespace mechanism.
As a space, a namespace is associated with a set of rules that define the space. Such rules are the "Laws of Physics" of the particular namespace.
Such rules may be encoded as a set of schemas, stylesheets and executable code that define the properties of the namespace. The designer of the namespace should be free to describe the namespace using appropriate tools, yet we wish to have a common format with which to document namespaces for both human and machine use.
On December 30, 2000, in response to yet another debate on the xml-dev mailing list regarding whether a namespace URI ought to resolve to anything, Tim Bray proposed that XHTML plus a bunch of XLinks would make a suitable format. By January 2, 2001 the RDDL acronym and the basics of the RDDL format had been formed (http://www.rddl.org), and by January 7, 2001 the first parser was available.
RDDL allows the description of many types of schemas related to a namespace including DTDs, XML Schema, CSS, RELAX, TREX, RDF Schema and Schematron.
In keeping with a long tradition, the RDDL specification is itself a RDDL document, and it points to resources describing the RDDL format, including a DTD:
http://www.rddl.org/rddl-xhtml.dtd
This snippet describes the rddl:resource element:
<!ELEMENT rddl:resource (#PCDATA | %Flow.mix;)* >
<!ATTLIST rddl:resource
id ID #IMPLIED
xml:lang NMTOKEN #IMPLIED
xmlns:rddl CDATA #FIXED "http://www.rddl.org/"
%xlink.simple.attrib;
%xlink.namespace.attrib;
>
The content of the rddl:resource element is defined similarly to that of the (X)HTML div element, and the element may be placed anywhere a div element may appear in an XHTML document. The content is typically a human-readable description of the resource. Providing an id attribute allows a fragment identifier to index the rddl:resource. The xml:lang attribute specifies the language of the content.
The rddl:resource element is defined as a simple XLink with the following attributes:
xl:type (simple) #FIXED "simple"
xl:arcrole CDATA #IMPLIED
xl:role CDATA "http://www.rddl.org/#resource"
xl:href CDATA #IMPLIED
xl:title CDATA #IMPLIED
xl:show (none) #FIXED "none"
xl:actuate (none) #FIXED "none"
xl:label CDATA #IMPLIED
The RDDL nature, or XLink role, of a resource describes the nature of the resource. This is a property of the resource, not of the namespace.
The nature is often the namespace URI of the root element of the related resource when the related resource is XML. When the related resource is not XML, the nature may be the well known URI describing its MIME media type.
A description of some useful natures is found at http://www.rddl.org/natures. For example, the nature of an XML Schema document is http://www.w3.org/2001/XMLSchema, the nature of a Schematron schema is http://www.ascc.net/xml/schematron and the nature of an XSLT stylesheet is http://www.w3.org/1999/XSL/Transform.
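The rule for choosing a nature can be sketched in Java, assuming the media type of the related resource is already known. The `NatureGuesser` class is hypothetical, and so is the `mimeNaturePrefix` parameter. The caller supplies whatever well-known URI base it uses for MIME media types, because that base is not fixed here.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

class NatureGuesser {
    // Hypothetical helper. For XML media types the nature is the namespace URI
    // of the root element; otherwise (or if the root has no namespace) it is
    // mimeNaturePrefix + mediaType, with the prefix chosen by the caller.
    static String natureOf(byte[] content, String mediaType, String mimeNaturePrefix) {
        if (mediaType.endsWith("/xml") || mediaType.endsWith("+xml")) {
            try {
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true); // needed for getNamespaceURI()
                Element root = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(content))
                        .getDocumentElement();
                if (root.getNamespaceURI() != null) {
                    return root.getNamespaceURI();
                }
            } catch (Exception e) {
                throw new IllegalArgumentException("content is not well-formed XML", e);
            }
        }
        return mimeNaturePrefix + mediaType;
    }
}
```

An XML Schema document whose root is `xs:schema` thus gets the nature `http://www.w3.org/2001/XMLSchema`. A CSS file falls through to the media-type branch.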
The purpose of a RDDL resource is defined by the namespace and describes the purpose of the resource with respect to the namespace. It is a property of the namespace, not of the resource. A list of some useful purposes is found at http://www.rddl.org/purposes/.
When an XML Schema is intended to be used for schema validation, its purpose might be http://www.rddl.org/purposes#schema-validation. A namespace may use several schemas for schema validation, in which case it can assign a more descriptive purpose to each schema.
For example:
<rddl:resource
xlink:title="TREX Schema"
xlink:arcrole="http://www.rddl.org/purposes#schema-validation"
xlink:role="http://www.thaiopensource.com/trex"
xlink:href="xhtml-rddl.trex"
>
<h3>7.10 TREX</h3>
<p>A TREX Schema <a href="xhtml-rddl.trex">xhtml-rddl.trex
</a> for RDDL</p>
</rddl:resource>
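A namespace-aware DOM parser is enough to read natures and purposes out of a RDDL document. Below is a sketch using the JDK's `javax.xml.parsers` API. The `RddlReader` class is hypothetical. Because no DTD is loaded, defaulted attributes such as the `xl:role` default shown earlier are not filled in, and missing attributes come back as empty strings.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RddlReader {
    static final String RDDL_NS = "http://www.rddl.org/";
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    // Returns one row per rddl:resource: {nature, purpose, href, title}.
    static List<String[]> read(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            NodeList nodes = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagNameNS(RDDL_NS, "resource");
            List<String[]> rows = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                rows.add(new String[] {
                    e.getAttributeNS(XLINK_NS, "role"),    // nature
                    e.getAttributeNS(XLINK_NS, "arcrole"), // purpose
                    e.getAttributeNS(XLINK_NS, "href"),
                    e.getAttributeNS(XLINK_NS, "title")
                });
            }
            return rows;
        } catch (Exception ex) {
            throw new IllegalArgumentException("not a parseable RDDL document", ex);
        }
    }
}
```

Applied to the TREX example above, it returns one row. That row's nature is `http://www.thaiopensource.com/trex` and its purpose is `http://www.rddl.org/purposes#schema-validation`.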
Resources and URIs
The term resource is defined in RFC 2396, yet much confusion remains. It is commonly thought that a resource is a document on the Web, but this is untrue: a resource is merely a point in URI space, no more and no less. In common usage, however, a resource is represented by a document having a MIME media type. Via HTTP content negotiation, for example, different documents may be retrieved as representations of a single resource, i.e. by dereferencing a single URI.
In practice, content negotiation is not always a useful mechanism for discovering properties of a particular resource. A namespace name, being a URI, suggests that a namespace may be considered a resource (or at least may be represented by one). Through RDDL, the RDDL document represents a namespace as a set of resources, each having a nature and a purpose. Thus each resource, while being a point in URI Space, can be described by its own unique RDDL Space. At its simplest, each unique resource has a set of natures.
A namespace is defined by a set of tuples, each containing an id, a purpose and a resource.
By this formalism, each RDDL resource describes two RDF triples. The first has the namespace URI as its subject, the purpose as its predicate and the href as its object. The second has the href as its subject, rdf:type as its predicate and the nature as its object.
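Here is a minimal sketch of this mapping that emits the two triples as N-Triples lines. The `RddlTriples` helper is hypothetical. It assumes that the RDDL document is retrieved from the namespace URI itself, so a relative href is resolved against that URI.

```java
import java.net.URI;
import java.util.List;

class RddlTriples {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    // The two triples implied by one RDDL resource:
    //   <namespace> <purpose> <href> .
    //   <href> rdf:type <nature> .
    static List<String> triples(String namespaceUri, String nature, String purpose, String href) {
        // assumption: relative hrefs are relative to the namespace URI
        String absHref = URI.create(namespaceUri).resolve(href).toString();
        return List.of(
            "<" + namespaceUri + "> <" + purpose + "> <" + absHref + "> .",
            "<" + absHref + "> <" + RDF_TYPE + "> <" + nature + "> ."
        );
    }
}
```

For the TREX resource with namespace `http://www.rddl.org/`, the second triple states that `http://www.rddl.org/xhtml-rddl.trex` has type `http://www.thaiopensource.com/trex`.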
It is often implicit that the purpose of an XSLT document is to define a transform from one format to another. In RDDL, the namespace being defined may serve as the nature of the originating format and the purpose of the XSLT resource may be implicitly understood to be the nature of the format being transformed into.
For example the following resource:
<rddl:resource
xl:role="http://www.w3.org/1999/XSL/Transform"
xl:arcrole="http://purl.org/rss/1.0"
xl:href="toRSS.xsl"
>
</rddl:resource>
In this case the purpose of the XSLT resource is to transform into RSS, and the resulting document has a nature of http://purl.org/rss/1.0. Such a convention allows a software agent to piece together a series of transforms. For example, suppose we want to transform from format A to format C. The RDDL description of the namespace for A may contain an XSLT transform into B, and the RDDL description of the namespace for B may contain an XSLT transform into C. One could then transform serially: A -> B -> C.
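Finding such a chain is a shortest-path search over a graph whose nodes are natures and whose arcs are XSLT resources. The breadth-first search below is a sketch. The `TransformPlanner` class is hypothetical, and it assumes the arcs have already been collected from the relevant RDDL documents.

```java
import java.util.*;

class TransformPlanner {
    // source nature -> (target nature -> XSLT href), as gathered from RDDL
    private final Map<String, Map<String, String>> edges = new HashMap<>();

    // One XSLT resource: the namespace is the source, the arcrole the target.
    void addTransform(String fromNs, String toNature, String xsltHref) {
        edges.computeIfAbsent(fromNs, k -> new LinkedHashMap<>()).put(toNature, xsltHref);
    }

    // Breadth-first search for the shortest chain of stylesheets from -> to.
    // Returns the hrefs in the order they must be applied, or empty if none.
    Optional<List<String>> plan(String from, String to) {
        Map<String, String> prevNature = new HashMap<>(); // nature -> predecessor
        Map<String, String> viaHref = new HashMap<>();    // nature -> XSLT used to reach it
        Deque<String> queue = new ArrayDeque<>();
        queue.add(from);
        prevNature.put(from, null);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(to)) {
                LinkedList<String> chain = new LinkedList<>();
                for (String n = to; prevNature.get(n) != null; n = prevNature.get(n)) {
                    chain.addFirst(viaHref.get(n));
                }
                return Optional.of(chain);
            }
            for (Map.Entry<String, String> e : edges.getOrDefault(current, Map.of()).entrySet()) {
                if (!prevNature.containsKey(e.getKey())) { // not yet visited
                    prevNature.put(e.getKey(), current);
                    viaHref.put(e.getKey(), e.getValue());
                    queue.add(e.getKey());
                }
            }
        }
        return Optional.empty();
    }
}
```

Suppose the planner knows `A -> B` via `AtoB.xsl` and `B -> C` via `BtoC.xsl`, where all names are placeholders. Then `plan(A, C)` yields `[AtoB.xsl, BtoC.xsl]`. Breadth-first order ensures the fewest transforms are applied.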
Software is often classified according to its nature (perhaps the language it is written in) and its purpose. Such a classification may allow a system to assemble components of a similar nature for a particular purpose.
Java Resource
This example demonstrates the incorporation of a Java resource into a namespace. In this case the purpose of the resource is to serve as the implementation of an XSLT extension library associated with the namespace.
<rddl:resource
xl:role="...application/java-archive"
xl:arcrole="...purposes/software#xslt-extension"
xl:href="thisNS-xslt-extension.jar"
>
<p>The xslt extensions bound to this namespace are packaged in a JAR</p>
</rddl:resource>
RDDLURL
The idea of a distributed package defined by a namespace is implemented using a simple API. The RDDLURL class implements the indirection through the RDDL document to obtain namespace-related resources by nature and purpose.
package org.rddl;
class RDDLURL {
public RDDLURL(String nsURI,
String nature,
String purpose);
public InputStream getInputStream();
static Namespace getNamespace(String nsURI);
}
Interface Namespace
In this API a namespace is defined by the interface:
public interface Namespace {
SortedMap getResourcesFromNature(String nature);
SortedMap getResourcesFromPurpose(String purpose);
SortedMap getResourcesFromHref(String href);
SortedMap getResourcesFromTitle(String title);
SortedMap getResourcesFromLang(String lang);
SortedMap getResourcesFromIdRange(String id0, String id1);
Resource getResourceFromId(String id);
... }
This interface serves as a definition of an XML Namespace which is of practical use to software. A given nature or purpose defines a set of namespace related resources. Similarly the URI of a particular sub-resource is indicated by its href, yet a given sub-resource may be referenced by several RDDL resources (e.g. associated with different natures, purposes, ids etc.). As a simple XLink, each RDDL resource has a title, which serves as a short human readable description, and may be associated with an xml:lang attribute, which serves to distinguish the intended language of the documentation, or perhaps to distinguish specific sub-resources according to sub-resource language. Each resource may be given an id which serves as the target of a fragment identifier. Such ids allow URI references (e.g. URI + "#" + id) to target particular RDDL resources within a RDDL document. This is particularly useful when the id is the name of the resource being identified.
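To make these lookups concrete, here is a hypothetical in-memory index in the spirit of the `Namespace` interface. It is not the org.rddl implementation. The `InMemoryNamespace` and `Res` types are illustrative, only a few lookups are shown, and the id range is assumed to be half-open like `SortedMap.subMap`.

```java
import java.net.URI;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Predicate;

class InMemoryNamespace {
    // Hypothetical stand-in for a RDDL resource (not org.rddl.Resource).
    static final class Res {
        final String id, nature, purpose, href;

        Res(String id, String nature, String purpose, String href) {
            this.id = id;
            this.nature = nature;
            this.purpose = purpose;
            this.href = href;
        }
    }

    private final String nsUri;
    private final SortedMap<String, Res> byId = new TreeMap<>();

    InMemoryNamespace(String nsUri) {
        this.nsUri = nsUri;
    }

    void add(Res r) {
        byId.put(r.id, r);
    }

    // Every lookup returns a SortedMap keyed, and therefore ordered, by id.
    private SortedMap<String, Res> select(Predicate<Res> test) {
        SortedMap<String, Res> out = new TreeMap<>();
        byId.forEach((id, r) -> {
            if (test.test(r)) out.put(id, r);
        });
        return out;
    }

    SortedMap<String, Res> getResourcesFromNature(String nature) {
        return select(r -> r.nature.equals(nature));
    }

    SortedMap<String, Res> getResourcesFromPurpose(String purpose) {
        return select(r -> r.purpose.equals(purpose));
    }

    // Assumed half-open range [id0, id1), as with SortedMap.subMap.
    SortedMap<String, Res> getResourcesFromIdRange(String id0, String id1) {
        return byId.subMap(id0, id1);
    }

    // URI reference naming the RDDL resource itself: namespace URI + "#" + id.
    String uriOf(Res r) {
        return nsUri + "#" + r.id;
    }

    // Absolute URI of the related sub-resource: href resolved against the namespace URI.
    URI hrefOf(Res r) {
        return URI.create(nsUri).resolve(r.href);
    }
}
```

Indexing by id keeps every result in a stable order. In this sketch the same href may appear under several ids, matching the text's point that a sub-resource may be referenced by several RDDL resources.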
Interface Resource
public interface Resource {
String getPurpose();
String getNature();
String getHref();
String getTitle();
String getLang();
String getId();
String getBaseURI();
Container getContainer();
}
The RDDL API definition of a Resource contains the expected RDDL attributes. Note that every Resource may have a specific base URI, and that every Resource may be associated with a Container.
Interface Container
public interface Container extends Namespace {
Resource getResourceFromURI(String uri);
SortedMap getResourcesFromURIRange(String uri0,String uri1);
void addResource(Resource r);
}
A Container is a relatively simple extension of a Namespace with the ability to associate a Resource with an arbitrary URI, whereas all Resources in a particular Namespace are assumed to be based on the namespace name URI. Such resources are named by the composition of the namespace URI and the resource id as the fragment identifier of the resulting URI reference (e.g. nsURI#foo).
Namespaces from a semantic viewpoint
RDDL allows XML Namespaces to be used for "Semantic Web" applications by binding URIs to particular human-readable text and machine-readable code. A simple "Semantic Web" code snippet is found in the RDDLClassLoader Java class, which loads Java classes bound to particular XML Namespace URIs and is described in full below.
RDDLClassLoader
import java.net.URL;
import java.util.Iterator;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.Vector;
// Namespace, Resource and RDDLURL are assumed to be in the same package (org.rddl)

public class RDDLClassLoader extends java.net.URLClassLoader
{
    static final String STR_NATURE_JAVA = "http://www.rddl.org/natures#java";
    static final String STR_NATURE_JAR = "http://www.rddl.org/natures#JAR";

    public RDDLClassLoader(String nsUrl, String purposeURI)
        throws java.io.IOException, org.xml.sax.SAXException
    {
        super(buildUriList(nsUrl, purposeURI));
    }
It is easy to see that this class is simply a java.net.URLClassLoader whose URL list is built from the URLs of the resources in a RDDL directory that have a particular purpose. (The relevant natures of these resources are either "java" or "JAR".)
The static buildUriList method implements the functionality of the class:
    protected static URL[] buildUriList(
        String uri,
        String purposeURI)
        throws java.io.IOException, org.xml.sax.SAXException
    {
        // first get the RDDL Namespace for the URI
        Namespace ns = RDDLURL.getNamespace(uri);
        // merge the resources having the STR_NATURE_JAVA and STR_NATURE_JAR natures
        SortedMap javaRes = ns.getResourcesFromNature(STR_NATURE_JAVA);
        TreeMap ress = new TreeMap(javaRes);
        ress.putAll(ns.getResourcesFromNature(STR_NATURE_JAR));
        // collect the hrefs of the resources having the desired purpose
        Vector strArr = new Vector();
        Iterator iter = ress.values().iterator();
        while (iter.hasNext())
        {
            Resource res = (Resource) iter.next();
            if (purposeURI.equals(res.getPurpose()))
                strArr.addElement(res.getHref());
        }
        // convert to URL[], resolving each href against the namespace URI
        URL baseURL = new URL(uri);
        URL[] urls = new URL[strArr.size()];
        for (int i = 0; i < urls.length; i++)
        {
            urls[i] = new URL(baseURL, (String) strArr.elementAt(i));
        }
        return urls;
    }
}
Efficiency Concerns
Concerns raised about RDDL/namespace applications generally fall into two categories:
Frequently resolving namespace URIs, such as that of XML Schema, would create a bottleneck, just as frequently downloading HTML DTDs would. Thankfully, the Web has evolved mechanisms to deal with such issues, including caches and local catalog indirection. Many applications will hardwire schemas, such as those for HTML, and so work around such bottlenecks.
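Both mitigations can be combined in a few lines. In this sketch, a local catalog is consulted first, and any other namespace URI is dereferenced once and then served from a cache. The `NamespaceResolver` class and its `fetch` function are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class NamespaceResolver<T> {
    private final Map<String, T> catalog;    // local copies, e.g. hardwired HTML schemas
    private final Function<String, T> fetch; // network dereference, assumed expensive
    private final ConcurrentHashMap<String, T> cache = new ConcurrentHashMap<>();

    NamespaceResolver(Map<String, T> catalog, Function<String, T> fetch) {
        this.catalog = catalog;
        this.fetch = fetch;
    }

    T resolve(String nsUri) {
        // 1. local catalog indirection: never touches the network
        T local = catalog.get(nsUri);
        if (local != null) {
            return local;
        }
        // 2. cache: computeIfAbsent applies fetch at most once per key;
        //    a null result is not stored, so fetch should not return null
        return cache.computeIfAbsent(nsUri, fetch);
    }
}
```

Here a repeated lookup of the same namespace URI reaches the network only once, and catalogued namespaces never reach it at all.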
Regarding the RDDL indirection, how and when an application binds to a resource is an implementation issue and need not impose significant actual overhead. Approaches toward dealing with these issues include namespace precompilation.
RDDL HTTP Extension Framework
There is no particular need for a RDDL document to be downloaded to the client. RDDL defines a set of HTTP Extension Framework headers which allow server-side indirection.
For example the following HTTP Request can be used to request the appropriate XML Schema used to validate the namespace:
GET / HTTP/1.1
Host: www.rddl.org
Opt: "http://www.rddl.org/httpext"; ns=11
11-Nature: http://www.w3.org/2001/XMLSchema
11-Purpose: http://www.rddl.org/purposes#schema-validation
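Such a request could be built with the `java.net.http` client available in JDK 11 and later, as sketched below. The `RddlHttpExtension` helper is hypothetical. The request is only built, not sent, and the `Host` header is derived from the request URI rather than set explicitly.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class RddlHttpExtension {
    // Builds (but does not send) a request asking the server to resolve the
    // namespace to the resource with the given nature and purpose. The "Opt"
    // header declares the extension and binds it to the header prefix "11".
    static HttpRequest build(String nsUri, String nature, String purpose) {
        return HttpRequest.newBuilder(URI.create(nsUri))
                .header("Opt", "\"http://www.rddl.org/httpext\"; ns=11")
                .header("11-Nature", nature)
                .header("11-Purpose", purpose)
                .GET()
                .build();
    }
}
```

The server, not the client, then performs the RDDL indirection and can return the XML Schema directly.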
Putting it all together
Mixing together terms from different namespaces in a single document or collection of documents raises particular challenges which we are just starting to understand.
Biomedical information serves as an example of the intermingling of namespace specific vocabularies. Many groups and organizations have developed specialized lexicons which might be documented at individual namespaces. For example "BioML" for genetics/bioinformatics, "CPT" codes for surgery/procedures, "SNOMED" for pathology, "ICD" for medicine and "DICOM" for radiology.
An "electronic medical record" might be expected to contain documents coded in each of these lexicons.
In order to make sense of such apparently disparate information a set of term equivalencies might be created via a set of hyperlinks.
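If each term is identified by its namespace URI and local name, such equivalence links behave like unions in a disjoint-set structure. The sketch below uses `javax.xml.namespace.QName` for the terms. The `TermEquivalence` class is hypothetical, and the example namespace URIs used with it are placeholders that do not correspond to the actual BioML, CPT, SNOMED, ICD or DICOM vocabularies.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;

class TermEquivalence {
    // Union-find over namespace-qualified terms: each equivalence link
    // merges the classes of its two endpoint terms.
    private final Map<QName, QName> parent = new HashMap<>();

    private QName find(QName t) {
        parent.putIfAbsent(t, t);
        QName p = parent.get(t);
        if (!p.equals(t)) {
            p = find(p);
            parent.put(t, p); // path compression
        }
        return p;
    }

    // Record a link declaring two terms equivalent.
    void link(QName a, QName b) {
        parent.put(find(a), find(b));
    }

    // True if a chain of links connects the two terms.
    boolean equivalent(QName a, QName b) {
        return find(a).equals(find(b));
    }
}
```

Because equivalence is transitive, a link from vocabulary A to B and another from B to C makes the A and C terms equivalent without a direct link between them.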
Much as we might view a namespace as defining the shape of information, the activity of analyzing documents which contain linkages between terms in multiple namespaces may be helped by considering the shape of ontologies.