real ontologies as emergent phenomena

"Steven R. Newcomb" <srn@techno.com>

Date: Wed, 2 Aug 1995 12:27:43 -0400
From: "Steven R. Newcomb" <srn@techno.com>
Message-id: <199508021627.MAA00692@bruno.techno.com>
To: srkb@cs.umbc.edu, cg@cs.umn.edu
cc: michel@hightext.com
Subject: real ontologies as emergent phenomena
Sender: owner-srkb@cs.umbc.edu
Precedence: bulk

First note is from Steve Newcomb to Fritz Lehmann.
Second note is Fritz's reply.  Third note is a new one
from Steve Newcomb.

********************************************************************************
Steve Newcomb -> Fritz Lehmann
********************************************************************************

To: fritz@rodin.wustl.edu
cc: biezunski
In-reply-to: <9508011632.AA08571@rodin.wustl.edu> (fritz@rodin.wustl.edu)
Subject: Re: Transaction models; ontological "tools"
--text follows this line--

[Fritz Lehmann:]

>      Yes, I already downloaded the "Introduction and Base Module"
> and "Semantic Declaration Module" chapters of CApH [Conventions for
> the Application of HyTime].  I like very much the illustrations I've
> seen -- 1 through 5.  I'm still reading these sections.  I want to
> know where the "content" part of it is, meaning particular sets of
> defined semantic links.

I'm going to answer your question, but please first bear with
me through the intervening rant.  I've been working up this
rant for some time, and I want your reaction to it, if any.

It's the same problem everywhere, I'm afraid: it's expensive and
difficult to do that work and it's hard to see the payoff.  However,
the payoff can be quite tangible, even in the near term, in the
context of certain information assets -- documents like lawbooks and
other repositories of highly technical, highly interconnected complex
information.  Indeed, there are people and organizations who, right
now and today, would gladly spend millions to generate useful
ontologies if they could see a way to get there from where they are
right now.  Where they are right now is a place where they are
drowning in electronic documents (or, even worse, in paper documents)
that they can't live without and they can't comfortably live with,
either.  Four homely examples of such infoglut are (1) the endless
accumulation of caselaw; (2) technical information needed to maintain
complex hardware, especially weapon systems; (3) business
decision-support information, especially information to guide
investment decisions; and (4) information needed in order to
process insurance claims.

Human beings generate documents (a "document" is here defined as "any
information intended for human perception in any combination of
media").  Comparatively few people generate machine-processable
"knowledge" -- and even those few actually generate far more
document-type information than machine-processable knowledge.  So, as
a practical matter, if we want to have lots of machine-processable
knowledge available, we are going to have to find a way to make such
knowledge a necessary by-product of making more ordinary kinds of
documents more useful.

So we have to start with what we have: documents.  A radically
document-centric view pervades all of SGML, HyTime and CApH; these
standards are designed to serve the needs of information asset owners
and exploiters, and they don't kowtow to anybody else's needs,
desires, or convenience. [Aside: system vendors, for example, hate
these standards because they destroy the dependency users otherwise
have on vendors.  Aside from the fact that these standards are not
easy to implement, implementing them generally does not put any fancy,
salesworthy new graphics on the screen.]

A document-centric approach, which adds machine-processable knowledge
to particular documents as a way of enhancing the value of such
documents to ordinary users, is probably a viable way to get things
moving, in a broad way, toward the creation of ontologies.  When there
is a broad enough array of machine-processable ontologies in
existence, and if they all conform to a reasonably small number of
representation standards to some useful degree, then we may hope for
an "emergence" phenomenon after some critical level of functional
interconnection and complexity is reached.  What might emerge?
Personally, I hope for the emergence of a constantly evolving, highly
dynamic super-ontology that is the kind of orderly and yet
unpredictable system being studied and modeled at places like the
Santa Fe Institute under the rubric of "Complexity" research.

Frankly, I question the whole notion of deriving some sort of Ultimate
(i.e., Static) Ontology From First Principles.  I don't think there is
any way to fund the development of such an animal; and I'm not at all
sure it would be a good idea even if it could be funded.  A living
super-ontology consisting of constantly evolving sub-ontologies that
demonstrably work, adapt, and survive in their various domains is,
however, not only attainable but also preferable.  Therefore, for
my own part, I would like to develop an ontology or two about things I
really know well (like HyTime!) as enhancements to (or integral parts
of) ordinary documents, and I would like many others to do similar
work in whatever areas they understand well.

Interconnections between ordinary documents and other ordinary
documents can accumulate naturally as hypertext links, so long as the
meaning of each link is clear, and there are no prior artificial
boundaries placed on what links are allowed to mean.  These
interconnections will be useful immediately for ordinary browsing and
lookup operations.  But if sufficient care, rigor, precision of design
and self-documentation is used in the creation of such links now, they
will greatly aid the growth and development of machine-processable
ontologies later.  Much of the CApH work is intended to provide
guidance to those who wish to invest in the development of such
high-function, long-lived webs of hyperlinks.
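
To make the idea of self-documenting links concrete, here is a minimal
sketch in Python.  It is not HyTime or CApH syntax; the class names,
link types, and document addresses are all hypothetical, invented only
for illustration.  It assumes each link carries a declared type, and
each type carries a prose statement of its own meaning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkType:
    """A kind of link, carrying its own statement of what it means."""
    name: str
    meaning: str


@dataclass(frozen=True)
class Link:
    """A typed connection between two addressable document components."""
    link_type: LinkType
    source: str  # address of the component the link starts from
    target: str  # address of the component the link points to


def links_of_type(links, type_name):
    """Select links by declared type; no free-text interpretation is needed."""
    return [link for link in links if link.link_type.name == type_name]


# Hypothetical link types and addresses, invented for this example.
CITES = LinkType("cites", "The source passage relies on the target as authority.")
OVERRULES = LinkType("overrules", "The source decision overrules the target decision.")

links = [
    Link(CITES, "case-b#para-12", "case-a#holding"),
    Link(OVERRULES, "case-c#holding", "case-a#holding"),
    Link(CITES, "case-c#para-3", "case-b#para-12"),
]

overruling = links_of_type(links, "overrules")
```

The same links serve a browser today (follow a link, read what its
type means) and could feed an ontology later, because the type of each
link is data rather than something buried in surrounding prose.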

Phew!  Rant over, now I'll answer your question.

We have developed very few semantic-bearing relationships (link
types), and the ones we have developed have been primarily
illustrative.  However, Michel Biezunski (michel@hightext.com, who is
now editing the CApH documents) has for some time been developing what
may pass for an ontology (actually a CApH "topic map") in the context
of our perennial project to create a useful instructional document
about SGML and HyTime.  This is a topic that we feel qualified to
develop an ontology about.  (We have no plans to develop an ontology
of common sense; we obviously lack that quality ourselves.)

So "where's the beef -- er, I mean `content'?"  I think it's all
around us, in the form of ordinary documents, just waiting to be
connected together and then to participate in some sort of emergent
phenomenon.  Now that HyTime gives us the ability to address it all
(and therefore to make arbitrary links among all of its components and
subcomponents), we can start experimenting with such things as
handwritten domain-specific ontologies, and more automatic approaches
using classifier systems and genetic algorithms that will operate in
the realm of ideas.

Best regards,

-Steve

P.S. Do you think I should post this to the srkb list?

***************************************************************
*          Steven R. Newcomb | President                      *
*     direct +1 716 389 0964 | TechnoTeacher, Inc.            *
*       main +1 716 389 0961 | (courier: 3800 Monroe Avenue,  *
*        fax +1 716 389 0960 |  Pittsford, NY 14534-1330 USA) *
*   Internet: srn@techno.com | P.O. Box 23795                 *
*        FTP: ftp.techno.com | Rochester, New York 14692-3795 *
* WWW: http://www.techno.com | USA                            *
***************************************************************



********************************************************************************
Fritz Lehmann -> Steve Newcomb
********************************************************************************

Date: Tue, 1 Aug 95 19:58:36 CDT
From: fritz@rodin.wustl.edu (Fritz Lehmann)
To: srn@techno.com
Subject: Re: Transaction models; ontological "tools"

Dear Steve,

     I definitely think you should post your "rant" to the
srkb@cs.umbc.edu and cg@cs.umn.edu lists.  I AGREE with almost
all that you say. Highly structured documents, marked-up as
much as possible with somewhat "semantic" links, will be the
best springboard from which to launch real, conceptual ontologies.

     In fact, Knowledge Research in Tarzana, Cal. is already building
massive semantic networks and taxonomies directly from scanned and other
government manuals (mostly highly structured military documents).

     My lecture at the ontology workshop at IJCAI next month
will be on the "linking them all together" problem.  With
widespread SGML and HTML documents accumulating fast, the day
is approaching faster (with a little help from automated partial
natural language processing).  An important thing, though, is to
have a system that encourages authors to put in MORE tags, especially
some agreed-upon semantic tags.  The DocBook DTD for example has a lot
of fairly specific-meaning tags.

                          Yours truly,   Fritz Lehmann
GRANDAI Software, 4282 Sandburg Way, Irvine, CA 92715, U.S.A.
Tel: +1(714)-856-0671            email: fritz@rodin.wustl.edu
=============================================================

********************************************************************************
Steve Newcomb -> Fritz Lehmann, et alii
********************************************************************************


[Fritz Lehmann:]
> An important thing, though, is to
> have a system that encourages authors to put in MORE tags, especially
> some agreed-upon semantic tags.  The DocBook DTD for example has a lot
> of fairly specific-meaning tags.

The CApH approach is to retreat to a more abstract level than a DTD.
CApH defines a meta-DTD which provides the means to explain what is
meant by any particular arbitrary markup in any particular arbitrary
context(s).
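
As a rough illustration only (this is not how CApH expresses it, and
the element names, contexts, and meanings are hypothetical), the
notion of explaining what a given piece of markup means in a given
context can be pictured as a lookup keyed on both the element and the
context in which it appears:

```python
# Hypothetical declarations: (enclosing element, element) -> what the markup means.
# A context of None is the fallback when no context-specific declaration applies.
MEANINGS = {
    ("glossary", "term"): "the term being defined",
    ("procedure", "term"): "a tool or part named in a maintenance step",
    (None, "term"): "a word used in a technical sense",
}


def meaning_of(element, context=None):
    """Return the declared meaning of `element`, preferring a context-specific one."""
    if (context, element) in MEANINGS:
        return MEANINGS[(context, element)]
    return MEANINGS.get((None, element), "undeclared")
```

The point of the sketch is that the same tag can be given different
declared meanings in different contexts without changing the DTD that
defines the tag.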

The requirements on which the CApH design is based include:

* A requirement that CApH representations are sufficiently explicit
  and sufficiently flexible that writers will be able to cooperate,
  via some sort of network application, on the development of
  ontologies that can guide and govern their collective work.  One can
  imagine an application that quizzes an author about the topics
  covered by and related to a particular paragraph that an author has
  just written.  The act of connecting a paragraph (and/or any of its
  contents) to the web would involve satisfying a semi-sentient
  program (and, ultimately, the specialist in charge of keeping the
  web self-consistent and optimally interconnected) that the
  connections made (the relations expressed) are consistent,
  appropriate, and sufficient.

* A requirement that every concept and every assumption that governs a
  topic map (or any markup, or any other component of any document)
  can be explicitly described and explained, that all such
  descriptions and explanations can be automatically retrieved, and
  that, to the greatest degree possible, machine processing will not
  require natural language processing of the
  explanations/descriptions.  For now, the idea is to use computing
  machinery to enhance the usefulness of documents to their owners and
  users, and to control infoglut by enhanced filtering functionality.
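
The two requirements above can be sketched as a toy program.  This is
a minimal illustration under stated assumptions, not the CApH design:
the TopicMap class, its method names, and the sample topics and
paragraph identifiers are all hypothetical.  It assumes that topics
and relation types must be declared, each with a prose explanation,
before any paragraph may be connected to them.

```python
class TopicMap:
    """Toy topic map: every topic and relation type carries its own explanation."""

    def __init__(self):
        self.topics = {}       # topic name -> prose explanation
        self.relations = {}    # relation type name -> prose explanation
        self.occurrences = []  # (paragraph_id, relation, topic) triples

    def declare_topic(self, name, explanation):
        self.topics[name] = explanation

    def declare_relation(self, name, explanation):
        self.relations[name] = explanation

    def connect(self, paragraph_id, relation, topic):
        """Attach a paragraph to a topic, refusing anything undeclared."""
        problems = []
        if relation not in self.relations:
            problems.append(f"undeclared relation: {relation}")
        if topic not in self.topics:
            problems.append(f"undeclared topic: {topic}")
        if problems:
            raise ValueError("; ".join(problems))
        self.occurrences.append((paragraph_id, relation, topic))

    def explain(self, name):
        """Retrieve the stored explanation of a topic or relation type."""
        if name in self.topics:
            return self.topics[name]
        return self.relations[name]

    def paragraphs_about(self, topic, relation=None):
        """Filter paragraphs by structured data alone; explanations are never parsed."""
        return [p for (p, r, t) in self.occurrences
                if t == topic and (relation is None or r == relation)]


# Hypothetical sample data, invented for this example.
tm = TopicMap()
tm.declare_topic("hytime-addressing", "How HyTime locates components of documents.")
tm.declare_relation("defines", "The paragraph gives the definition of the topic.")
tm.declare_relation("mentions", "The paragraph refers to the topic in passing.")
tm.connect("ch2-para4", "defines", "hytime-addressing")
tm.connect("ch5-para1", "mentions", "hytime-addressing")
```

Here connect() stands in, very crudely, for the program that must be
satisfied before a connection is accepted; explain() shows that every
explanation can be retrieved automatically; and paragraphs_about()
filters without any natural language processing of the explanations.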

********************************************************************************

Old CApH drafts can be found in ftp.techno.com//pub/CApH/docs.  New
drafts that conform to the corrected HyTime Standard will be available
soon; the editor is Michel Biezunski (michel@hightext.com).
