Bill Mark position paper

Tom Gruber <>
Full-Name: Tom Gruber
Message-id: <2878500126-6078726@KSL-Mac-69>
Date: Wed, 20 Mar 91  15:22:06 PST
From: Tom Gruber <>
To: Shared KB working group <>
Subject: Bill Mark position paper
%\input newfigures
\title{Sharing Knowledge Bases}
\author{William Mark \\
{\small Information and Computing Sciences} \\
{\small Lockheed Palo Alto Research Labs} \\
{\small 3251 Hanover St. O/96-01 B/254E} \\
{\small Palo Alto, CA 94304} \\
{\small (415)354-5236 \ \ FAX (415)354-5235}}



Interest in shared knowledge bases has recently intensified due to advances in
knowledge representation technology, the high visibility of the Cyc experiment at MCC, and
widespread recognition that the current piecemeal approach to building knowledge bases is a
barrier to further advances in some important areas of AI.

Discussions about sharing knowledge often seem to assume
that we have knowledge to share right now, that we will
inevitably build much more knowledge that is worth sharing, that we know how we will share it,
and that the primary emphasis should be on opening the communication channels between the various
groups building knowledge bases.

My belief is that we {\em can} have lots of
knowledge to share, but only if we start building it to be sharable.
Knowledge can be worth sharing for a variety of reasons: it may be a repository of
problem solving know-how; an integral record
that supports reasoning about a set of decisions (e.g., that comprise some design);
a medium of communication to be used by
people cooperating to solve a problem; and so on.  I think that we don't know much at all about how
we will share knowledge, but I suspect that the different reasons for sharing knowledge will
require different technologies in their support.  Finally, I think that
communication among different knowledge-base-centered groups is an important issue for the future,
but that it isn't the primary issue at the moment.

\section{Issues and Recommendations}

\subsection{Issue 1: Explicit Assumptions}

Sharing any sort of knowledge relies on shared implicit assumptions (because it's too onerous to be
explicit about everything).   I think that the critical problem
in sharing knowledge bases (and in building large-scale knowledge bases in general) is 
being {\em explicit enough} about underlying assumptions to be fairly sure that users
(humans and programs) will interpret the knowledge in the same way most of the time.
This is clearly the famous ``ontology'' issue, but I'm stressing the problem of finding the right
balance of explicitness and implicitness that allows (promotes) sharing, but doesn't bog down
either the creators or the sharers.

{\bf Recommendations} \ I think that we really don't know the answers here; we need to
experiment -- early and often.  Some experimentation has already been done, but there has been too
much emphasis on ``getting it right'', and not enough on ``sharing it, whatever it is''.  Getting
it right is going to take a long time, and will slow us down too much if we make it a
prerequisite.  The main thing is to develop some knowledge bases with represented assumptions,
and to experiment with sharing them.

I suggest that we could gain a lot by trying to hammer out an
agreed-upon set of assumptions using pencil and paper, and that it's quite feasible to begin that
way. However, as soon as the
knowledge base gets large, we will need automated help, which brings up Issue 2.

\subsection{Issue 2: Representation Software}

Constructing sharable knowledge bases will require a sustained effort by more than one person
over a considerable period of time.  There must be some automatic means (or at least some very
effective guidance) for {\em accumulating} the knowledge, i.e., taking knowledge provided in
incomplete ``pieces'' and fitting it into the framework of existing knowledge.
We need ways to continually determine and enforce the ramifications of changes
(e.g., additions) to the knowledge base with respect to assumptions in order to make sure that
incoming knowledge shares the right assumptions and is thus cumulative, not simply

{\bf Recommendations} \ I think that we need to start building shared knowledge bases using the
same knowledge representation system in order to see what the problems really are.  By a knowledge
representation system, I mean an actual software system that makes (and enforces) a set of
commitments about how knowledge should be represented and reasoned about.  I really don't think we
have a choice.  Since it is too onerous to encode all of the
assumptions explicitly (even if we knew them all), we have to rely on unstated
knowledge and underlying mechanisms.  Unstated knowledge is the Issue 1 problem: with
experience we will learn what absolutely has to be stated and what can be left implicit.  But
unstated underlying mechanisms we
can address more directly.

We cannot use someone else's underlying mechanisms without understanding them, and we have no way
of rigorously understanding them (they would have to be axiomatized at a level of detail
that would require vast labor, the axiomatization would have to be understandable to all parties,
and it would have to be kept up to date as the system evolves).  Therefore, I think that we need
to use the same KRS so that {\em ipso facto} we share the same underlying mechanisms.

Since I think that we need to
start building knowledge bases now, we must start by making the best use
we can of an existing KRS.  But I also think that we should start working on a common
representation system that is specifically designed to enable sharing, i.e., that is specifically
designed for the incremental input of new knowledge and enforcement of commitments.  We should be
willing to forgo the relatively small (with the exception of Cyc) investment in
current software.  I think that advances in KR technology over the next
few years will make it obsolete anyway.
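
As a rough illustration of what ``incremental input of new knowledge and enforcement of commitments'' could mean in software, here is a minimal sketch in Python.  Everything in it is an assumption made for the example: the triple encoding of facts, the names {\tt Commitment} and {\tt KnowledgeBase}, and the single-valued constraint are hypothetical, and it is not meant to describe any existing KRS.  It shows only the basic idea that each incoming piece of knowledge is checked against explicitly represented assumptions before it is accumulated.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A fact is a (subject, relation, value) triple. This encoding is an
# assumption made for the sketch, not something the paper prescribes.
Fact = Tuple[str, str, str]


@dataclass
class Commitment:
    """An explicitly represented assumption that accepted facts must respect."""
    name: str
    # Returns True if the new fact is acceptable given the existing facts.
    check: Callable[[Fact, List[Fact]], bool]


@dataclass
class KnowledgeBase:
    commitments: List[Commitment]
    facts: List[Fact] = field(default_factory=list)

    def add(self, fact: Fact) -> List[str]:
        """Add a fact only if no commitment is violated.

        Returns the names of the violated commitments (empty on success).
        """
        violated = [c.name for c in self.commitments
                    if not c.check(fact, self.facts)]
        if not violated:
            self.facts.append(fact)
        return violated


def single_valued(relation: str) -> Commitment:
    """Hypothetical commitment: a subject has at most one value for `relation`."""
    def check(new: Fact, existing: List[Fact]) -> bool:
        subj, rel, val = new
        if rel != relation:
            return True
        return all(not (s == subj and r == rel and v != val)
                   for s, r, v in existing)
    return Commitment(f"single-valued:{relation}", check)


kb = KnowledgeBase(commitments=[single_valued("material")])
first = kb.add(("bracket-7", "material", "aluminum"))
second = kb.add(("bracket-7", "material", "steel"))  # conflicts with the first
```

In this sketch the second addition is rejected and the caller is told which commitment it violated.  A real system would need far richer commitments and some way of revising them, which is exactly where I expect the hard problems to show up.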

\subsection{Issue 3: ``Indexing''}

Sharing of large knowledge
bases, by people or programs, requires some concept of {\em indexing}, i.e., a means to
find relevant knowledge without having to examine it in detail.

{\bf Recommendations} \ Believe it or not, I think that we need to start making a serious
attempt to use learning and case-based reasoning technology in organizing our knowledge
bases.  This work addresses the automatic (or semi-automatic) construction of indices without
requiring a complete understanding of the knowledge that is being indexed.  In the
meantime, we may get some mileage out of hypertext (but not much, because the complexity of the
index structures will require more principled construction and management techniques). 
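
To make the notion of an index concrete, the following Python sketch builds the simplest kind I can think of: a word-level inverted index over textual descriptions of knowledge base entries.  This is deliberately {\em not} the learning or case-based approach recommended above; it is a plain keyword index, and the entry identifiers, the sample texts, and the function names are all hypothetical.  It illustrates only the basic property wanted from any index: finding candidate entries without examining each one in detail.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_index(entries: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased word to the ids of the entries that mention it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for entry_id, text in entries.items():
        for word in text.lower().split():
            # Strip simple punctuation so "load." and "load" index together.
            index[word.strip(".,;:()")].add(entry_id)
    return index


def lookup(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the sorted ids of entries that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Hypothetical entries standing in for descriptions of KB contents.
entries = {
    "e1": "Bracket design decision: use aluminum for weight.",
    "e2": "Fatigue analysis of aluminum bracket under load.",
    "e3": "Supplier contact list for fasteners.",
}
index = build_index(entries)
both = lookup(index, "aluminum bracket")
one = lookup(index, "fasteners")
```

An index like this matches only on surface vocabulary, which is why I expect it to give out quickly on large knowledge bases and why more principled construction techniques are needed.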

\subsection{Issue 4: Interchange}

If enough progress is made on the previous issues, there will soon be significant bodies of
knowledge worth sharing, but ``trapped'' in the individual knowledge representation systems
advocated above (unless everyone agrees to use the same common KRS, which I doubt).  I think that
it's worth some effort now to develop knowledge representation formalisms capable of communicating
knowledge among different knowledge representation systems.

{\bf Recommendations} \ I think that the current Interlingua
effort is on track here.

\subsection{Issue 5: ``Packages''}

It is almost surely the case that there is an intimate coupling between the representation 
of knowledge and the use for which it is intended.

{\bf Recommendations} \   We need research in techniques for 
analyzing knowledge in terms of task dependencies, and ``packaging'' the knowledge along with
its task-dependent reasoning mechanisms in order to make it usefully sharable.  I think that this
area is getting far too little attention (at least in the U.S.).

\subsection{Issue 6: Staking Out Territory}

Along with the technical issues, there are some significant sociological ones: Who builds the
knowledge bases, and who has access to them? How will the effort to build large-scale knowledge
bases affect knowledge representation research as a whole? How will intellectual property rights
be handled? And so on.

The sociological issue of most immediate import is staking out territory: if we (as a
community) don't seize the initiative, we will be overwhelmed by the much larger
activities in related areas like PDES/STEP and the database world.  These activities won't solve
our problem, but they may make decisions that have the effect of reducing our flexibility.

{\bf Recommendations} \  We must stake out this territory (i.e., how to represent and
exchange knowledge) before we find ourselves being told what the ``rules'' are.  The only way to
do it is to achieve some tangible results in knowledge base sharing.  

Unfortunately, in my view, the part of this issue that is getting most of
the attention is the other side: fear that our activities in the
knowledge base sharing area will result in restrictive standards and exclusionary research
policies for our own community.  My recommendation here is that we deal with this side of the issue
fairly, but treat it as the red herring I really think it is.

Good ideas emerge and evolve over time.  Like any good
capitalist, I think that something like a marketplace is a good way to encourage the best ideas to
emerge.  The current complaint is that we don't have a free marketplace: government funding is
setting the research agenda.  I really don't think that this is a problem: {\em getting}
government research funding is pretty much a free market activity, and (more important)
government research funding usually results in (at most) a convincing demonstration of
feasible technology.