Ontology for EDI (was Frames...)

fritz@rodin.wustl.edu (Fritz Lehmann)
Date: Wed, 21 Sep 94 08:43:58 CDT
From: fritz@rodin.wustl.edu (Fritz Lehmann)
Message-id: <9409211343.AA04251@rodin.wustl.edu>
To: cg@cs.umn.edu, edi-new@tegsun.harvard.edu, srkb@cs.umbc.edu
Subject: Ontology for EDI (was Frames...)
Cc: agc@scs.leeds.ac.uk, pdoudna@aol.com, phayes@cs.uiuc.edu

     Patrick Hayes, answering my earlier message on Frames and EDI, wrote:
----begin quote----
However, long experience suggests that the idea of there being a single
correct real-world ontology is overoptimistic. Almost any concept you can
think of, even very 'basic' ones, can be described perfectly correctly in
several different incompatible ways. I am just finishing a survey of ways
of describing time, for example, which gives at least three fundamentally
incompatible views of the structure of the timeline, each with several
subcases; and each of these can be axiomatised in various ways, with
different collections of relations and objects. Is this one ontology or
several ontologies? When we get to such things as a purchase-order, the
number of ways it can be described probably runs into four or five figures.
This is one of the problems with the formalisms we have, I think: they
force one to take a stance on issues which are not really important to the
task in hand and which later get in the way. (Is a fitted carpet IN a room
or PART OF the room? How about the paint on the wall?)
----end quote----

     I agree completely, but it is not obvious what actually 
causes the problem.  I have two candidate causes:

     1. An obstacle to integrating differing ontologies
(which I have long insisted is a necessary ongoing task) is
the supposed need for precise logical equivalence between 
concepts as defined in both systems.  We do not demand this
of natural language translation; if I say "the table" to
a Frenchman, there will very likely be some borderline
cases where my concept diverges from his "la table",
but almost all instances of one concept will also be instances 
of the other.   There is a very large overlap, even if the precise,
painstaking logical definitions (in terms of shared low-level
primitives) are logically inequivalent.  Similarly, for the 
inconsistent time models surveyed by Pat, I surmise that they
would yield different results in commonsense situations only
in extreme borderline cases (as where one event-interval ends
at EXACTLY the same time as another one begins -- down past
the Planck time-length).  For 99.99999999% of cases they may
be quite compatible.  I attempt a formal solution in an article
with A. G. Cohn called "The EGG/YOLK Reliability Hierarchy:
Semantic Data Integration Using Sorts with Prototypes" to
appear in this year's CIKM-94 Proceedings (ACM Press).  Others
have used statistics-based and machine-learning "partial
agreement" notions for this kind of problem.  In fact I 
assume that few concepts _within_ a language are understood
in precisely the same way by different people.  Still, we manage.
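The "very large overlap" idea can be made concrete with a toy extensional comparison. This is a sketch under stated assumptions, not the EGG/YOLK method of the CIKM-94 paper: each concept is treated as a predicate over a small sample of objects, agreement is scored as the Jaccard ratio of the two extensions, and the items on which the definitions disagree are reported as borderline cases. The `Item` features, the sample, and both `table_en`/`table_fr` definitions are hypothetical, invented only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    # Hypothetical features; frozen=True makes items hashable so they can go in sets.
    name: str
    has_flat_top: bool
    has_legs: bool
    used_for_eating_or_working: bool


def table_en(x: Item) -> bool:
    # Hypothetical "English speaker" definition of TABLE.
    return x.has_flat_top and x.has_legs


def table_fr(x: Item) -> bool:
    # Hypothetical "French speaker" definition: also requires a use.
    return x.has_flat_top and x.has_legs and x.used_for_eating_or_working


def overlap(concept_a, concept_b, items):
    """Jaccard ratio of the two extensions, plus the items where they disagree."""
    a = {x for x in items if concept_a(x)}
    b = {x for x in items if concept_b(x)}
    union = a | b
    jaccard = len(a & b) / len(union) if union else 1.0
    borderline = sorted(x.name for x in a ^ b)  # symmetric difference
    return jaccard, borderline


items = [
    Item("dining table", True, True, True),
    Item("desk", True, True, True),
    Item("workbench", True, True, True),
    Item("plant stand", True, True, False),
    Item("lamp", False, False, False),
]

score, borderline = overlap(table_en, table_fr, items)
print(f"{score:.2f}", borderline)  # 0.75 ['plant stand']
```

The two definitions are logically inequivalent, yet they agree on every item except the one borderline case; with a realistic sample the ratio would be far closer to 1.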

     2.  Supposed ontological disagreement may arise solely
from people's disagreements about the way they use words.
(This applies more to the higher (messier)
ontological levels than to the basic-level primitives like
time.)  If we had no preconceived linguistic notions, and 
had only the existing formally defined "more primitive"
ontology available, then we would either accept a defined concept,
or else define a new one that fits our need.  The trouble 
comes when somebody says "Hey, that's not what I mean by the
word 'enclosure' -- you're defining it wrong!"  It is
useful to remember that we need not define concepts precisely;
we can specify incidence relations that constrain the
hierarchy of concepts: "A cattle auction, a prostitute working,
a selling of candy at a candy store, a real estate deal for
cash, a lemonade stand transaction, a common-stock issue,
a lawyer charging a client, and a Deutschmarks-for-Dollars
exchange are all instances of SALEXYZ." Instead of fully defining 
SALEXYZ we have only constrained future refinements of
SALEXYZ's definition -- they must cover the list of 
examples.   Again, a problem will arise if someone
complains that "That's not what I mean by a 'sale'."
The answer is "Well that's what's pre-defined; if you want
something different then define it yourself."  Most
differences which arise this way will again be borderline
cases (e.g. the currency exchange rather than the candy store).
In the field of knowledge acquisition, and in the CYC
project, much thought has gone into reconciling different
conceptualizations of a domain.  It may be that the true
disagreements on ontology are few, and that most of the 
problem is with the intended meanings of words.
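Constraining SALEXYZ by required instances, rather than defining it outright, can be sketched as a coverage check on candidate refinements. This is a minimal illustration assuming each example is reduced to three hypothetical boolean features (`transfers_goods`, `transfers_service`, `paid_in_money`); the feature encoding and both candidate definitions are invented here, and only the list of examples comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    name: str
    transfers_goods: bool    # hypothetical: goods or assets change hands
    transfers_service: bool  # hypothetical: a service is performed
    paid_in_money: bool      # hypothetical: consideration is money


# Required instances of SALEXYZ (from the text); the feature values are assumed.
REQUIRED = [
    Transaction("cattle auction", True, False, True),
    Transaction("prostitute working", False, True, True),
    Transaction("candy-store sale", True, False, True),
    Transaction("real estate deal for cash", True, False, True),
    Transaction("lemonade stand transaction", True, False, True),
    Transaction("common-stock issue", True, False, True),
    Transaction("lawyer charging a client", False, True, True),
    Transaction("Deutschmarks-for-Dollars exchange", True, False, True),
]


def uncovered(definition, required=REQUIRED):
    """Names of required instances that a candidate refinement excludes."""
    return [t.name for t in required if not definition(t)]


def goods_for_money(t: Transaction) -> bool:
    # Candidate refinement 1 (hypothetical): goods exchanged for money.
    return t.transfers_goods and t.paid_in_money


def value_for_money(t: Transaction) -> bool:
    # Candidate refinement 2 (hypothetical): goods or services exchanged for money.
    return (t.transfers_goods or t.transfers_service) and t.paid_in_money


print(uncovered(goods_for_money))  # ['prostitute working', 'lawyer charging a client']
print(uncovered(value_for_money))  # []
```

A refinement that leaves any required instance uncovered is rejected; any refinement that covers the whole list is admissible, however its finer details are later settled.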

     The current EDI standards (X12 and EDIFACT) give
no _definitions_ at all for most concepts.  It is baffling
to see, in EDIFACT for example, that 8249:1 (Equipment
Status: Continental) is "self-explanatory".  I do not
find it so!  In many cases a short word or phrase is
given in English; sometimes it's obvious what is meant,
sometimes not.  In most of the understandable cases,
it seems that a conceptual definition (using a good
stock of formally defined concepts) is feasible and 
would be very helpful to the uninitiated simply as a
form of documentation, quite apart from the machine-processing
and automated integration capabilities it would enable.

                          Yours truly,   Fritz Lehmann
GRANDAI Software, 4282 Sandburg Way, Irvine, CA 92715, U.S.A.
Tel:(714)-733-0566  Fax:(714)-733-0506  fritz@rodin.wustl.edu