Knowledge Bus and RDF

John F. Sowa (sowa@west.poly.edu)
Thu, 21 May 1998 02:58:41 -0400

Previous message: Piek Vossen: "Top 40 Concepts"

At the NCITS T2 meeting last week, there were presentations on two
important examples of the use of ontologies in computer applications.
The first was on the Knowledge Bus by Bill Andersen from DoD, and the
second was on the Resource Description Framework (RDF) by Frank Olken
from the Lawrence Berkeley Laboratory. Either directly or indirectly,
both topics are related to Cyc, but they are just as relevant to any
ontology that may be developed by or be incorporated in the ontology
work we are considering.

Knowledge Bus:

Bill Andersen's talk on the Knowledge Bus was a preview of a paper that
will be presented at the 5th KRDB Workshop in Seattle on 31 May 1998.
The title is "Knowledge Bus: Generating Application-Focused Databases
from Large Ontologies" by B. J. Peterson, W. A. Andersen, and J. Engel.
A PostScript version of it can be downloaded from

http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-10/

The Knowledge Bus is a system that generates database definitions and
programming interfaces (APIs) from the Cyc ontology. It doesn't map the
entire Cyc knowledge base into a database, but only the subset that is
accessible from a specific context or _microtheory_. Instead of the
500,000 or so axioms of Cyc, it extracts about 5,000 that are relevant
to some application domain.

In this case, the Knowledge Bus was used to "develop databases for
the Department of Defense, which are now in operational use in complex
decision-support applications." The APIs are the Java class definitions
and interfaces, which are generated automatically from the Cyc ontology.
The programming details in the Java methods are filled in by a human
programmer, but they use straightforward programming techniques that
might someday be automated.

Cyc is used only in developing and testing the ontology and the
associated axioms. Cyc is not involved in the operational system,
which uses Java programs and a deductive database query engine, XSB.
The XSB system is a Prolog-like engine with well-founded semantics
that was developed at SUNY Stony Brook.

One interesting point is that Cyc uses full first-order logic with
default reasoning, but XSB only supports the Horn-clause subset of FOL
for deduction. As it turns out, about 98% of the FOL axioms in Cyc
are already in Horn-clause form, from which they can be automatically
translated to XSB rules. The other 2% of the axioms are not thrown away;
instead, they are used as integrity checks on the database.
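
As a rough illustration of that split, the following Python sketch
separates Horn clauses (at most one positive literal) from non-Horn
clauses. The clause encoding, the predicate names, and the helper names
is_horn and partition are assumptions made for this example; none of
them come from Cyc, XSB, or the Knowledge Bus paper.

```python
from typing import FrozenSet, Iterable, List, Tuple

# A literal is (is_positive, atom); a clause is a disjunction of literals.
Literal = Tuple[bool, str]
Clause = FrozenSet[Literal]


def is_horn(clause: Clause) -> bool:
    """A clause is Horn if it contains at most one positive literal."""
    return sum(1 for positive, _ in clause if positive) <= 1


def partition(clauses: Iterable[Clause]) -> Tuple[List[Clause], List[Clause]]:
    """Split clauses into (Horn rules, non-Horn integrity checks)."""
    rules: List[Clause] = []
    checks: List[Clause] = []
    for clause in clauses:
        (rules if is_horn(clause) else checks).append(clause)
    return rules, checks


# Hypothetical axioms, written in clause form:
#   Battalion(x) -> Unit(x)           ==  ~Battalion(x) v Unit(x)
#   Unit(x) -> Active(x) v Reserve(x) ==  ~Unit(x) v Active(x) v Reserve(x)
AXIOMS: List[Clause] = [
    frozenset({(False, "Battalion(x)"), (True, "Unit(x)")}),
    frozenset({(False, "Unit(x)"), (True, "Active(x)"), (True, "Reserve(x)")}),
]

RULES, CHECKS = partition(AXIOMS)
```

In this toy case the first axiom lands in RULES, where it could be
rewritten as a Prolog-style rule, and the second, which has a
disjunctive conclusion, lands in CHECKS.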

Computationally, that approach is significant: Horn-clause deductions,
as in Prolog, are highly efficient, but full FOL theorem proving may
take an exponential amount of time. The non-Horn 2% of the axioms are
not used for deduction, but for integrity checks, which can also be done
efficiently: the truth or falsity of any fixed FOL statement can be
evaluated against a given database in time polynomial in the size of
the database by the equivalent of an ordinary SQL query. Although full
FOL may be inefficient for
arbitrary deductions, it can still be used efficiently for other kinds
of applications. That is a point I have been emphasizing for years:
efficiency depends primarily on what you do with the logic and only
secondarily on the structure of the logical formulas.
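
To make the integrity-check idea concrete, here is a minimal sketch
using Python's standard sqlite3 module. The tables unit, active, and
reserve, the sample rows, and the constraint itself are hypothetical;
they only show how a non-Horn statement such as "every unit is active
or in reserve" can be tested against stored facts with one SQL query
rather than by theorem proving.

```python
import sqlite3
from typing import List


def build_example_db() -> sqlite3.Connection:
    """Create a small in-memory database with made-up facts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE unit    (name TEXT PRIMARY KEY);
        CREATE TABLE active  (name TEXT PRIMARY KEY);
        CREATE TABLE reserve (name TEXT PRIMARY KEY);
        """
    )
    conn.executemany(
        "INSERT INTO unit VALUES (?)", [("alpha",), ("bravo",), ("charlie",)]
    )
    conn.executemany("INSERT INTO active VALUES (?)", [("alpha",)])
    conn.executemany("INSERT INTO reserve VALUES (?)", [("bravo",)])
    return conn


def violations(conn: sqlite3.Connection) -> List[str]:
    """Return the units that falsify: for all x, Unit(x) -> Active(x) or Reserve(x).

    The statement holds in the database exactly when this list is empty.
    """
    query = """
        SELECT u.name
        FROM unit AS u
        WHERE NOT EXISTS (SELECT 1 FROM active  AS a WHERE a.name = u.name)
          AND NOT EXISTS (SELECT 1 FROM reserve AS r WHERE r.name = u.name)
        ORDER BY u.name
    """
    return [row[0] for row in conn.execute(query)]


CONN = build_example_db()
VIOLATIONS = violations(CONN)
```

The check only reads the stored facts and derives nothing new, which is
why a disjunctive conclusion causes no difficulty here.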

Some people have advocated a restricted version of logic for specifying
ontologies. However, that seems to be short-sighted because we cannot
know in advance what users will want to do with the ontologies. The
experience with Knowledge Bus shows that automated tools can extract
an efficiently computable subset from an ontology stated in full FOL.
The ontology developers should provide as much knowledge as they can
in whatever notation is appropriate for the domain experts. Then the
application developers can select whatever subset they need and translate
it to any form their tools require.

RDF:

Frank Olken's talk was about the Resource Description Framework (RDF),
which evolved from the Meta Content Framework (MCF), which was developed
at Apple by R. V. Guha, the former associate director of Cyc, who is now
at Netscape. One of the other people involved at Apple was Larry Tesler,
who was the coauthor of the first paper that Roger Schank published
on his conceptual dependency theory (at IJCAI in 1969). Given that
heritage, it is not surprising that RDF happens to be a semantic network
that could be translated directly to a subset of conceptual graphs.
RDF has now been adopted by the W3 consortium as the primary language
for specifying resources on the Internet.

Following is a brief description of an RDF database by Guha et al.:

> 1. a set of labels, also referred to as property types

> 2. a set of nodes

> 3. a set of arcs where each arc is a triple consisting of two nodes
> (the source and target) and a label. Arcs are also referred to as
> properties. Often, we will refer to an arc with a certain source
> as a _property of that source_. Similarly we will refer to the
> target of the arc as the _value of the property_.

Following is an example from the RDF specification:

> An RDF expression is represented pictorially in text with nodes
> in '[...]' and arcs in '--...-->' as follows:

> [resource R]---PropertyType P-->[value V].

> This is read "V is the value of the property type P for resource R";
> or left-to-right, "R has property type P with value V." Consider
> as a simple example the statement:

> Ora Lassila is the author of the resource http://www.w3.org/People/Lassila

> This statement can be represented as follows:

> [http://www.w3.org/People/Lassila]---Author-->"Ora Lassila"

> where the notation '[URI]' denotes the node representing the resource
> identified by URI and quotation marks (") denote an atomic value.

All of this happens to look like a version of the linear notation
for conceptual graphs. In fact, RDF is essentially the "simple graph"
subset of CGs, which was defined in _Conceptual Structures_ as CGs
with no negations, nested contexts, or quantifiers other than the
default existential.
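
The node, label, and arc description quoted above maps naturally onto
a small data structure. The following Python sketch is only an
illustration under that reading: the names Arc and Graph, the literal
flag, and the methods are assumptions of this example and are not part
of the RDF drafts or of any RDF library.

```python
from typing import List, NamedTuple, Set


class Arc(NamedTuple):
    source: str    # node for the resource
    label: str     # property type
    target: str    # node or atomic value
    literal: bool  # True if target is an atomic value rather than a node


class Graph:
    """A set of arcs; nodes and labels are derived from the arcs."""

    def __init__(self) -> None:
        self.arcs: Set[Arc] = set()

    def add(self, source: str, label: str, target: str, literal: bool = False) -> None:
        self.arcs.add(Arc(source, label, target, literal))

    def labels(self) -> Set[str]:
        return {arc.label for arc in self.arcs}

    def values(self, source: str, label: str) -> List[str]:
        """Values of the property `label` for the resource `source`."""
        return sorted(
            arc.target
            for arc in self.arcs
            if arc.source == source and arc.label == label
        )

    def linear(self) -> List[str]:
        """Render each arc in the bracket-and-arrow notation quoted above."""
        lines = []
        for arc in sorted(self.arcs):
            target = f'"{arc.target}"' if arc.literal else f"[{arc.target}]"
            lines.append(f"[{arc.source}]---{arc.label}-->{target}")
        return lines


GRAPH = Graph()
GRAPH.add("http://www.w3.org/People/Lassila", "Author", "Ora Lassila", literal=True)
```

Every arc asserts only that some relation holds between two nodes, with
no negation or nesting, which is the sense in which such a graph stays
within the "simple graph" subset.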

What makes RDF important is not its theoretical sophistication, but the
fact that it has been adopted by the W3 consortium, which is supported
by all the big players, including IBM, Netscape, Microsoft, etc.
Technical reports that describe RDF and related topics can be viewed
or downloaded from the W3 web site: http://www.w3.org/TR/

For a two-page introduction to RDF at the "executive summary" level,

http://www.w3.org/TR/NOTE-rdf-simple-intro

For the current working draft of the RDF definition and syntax,

http://www.w3.org/TR/WD-rdf-syntax/

For an older paper on MCF by Guha et al., which is now obsolete for
the details, but is more interesting for the underlying rationale,

http://www.w3.org/TR/NOTE-MCF-XML/

For the latest working draft on RDF schemas by Guha of Netscape and
Andrew Layman of Microsoft,

http://www.w3.org/TR/WD-rdf-schema/
