EDR's current activities

spillers@VNET.IBM.COM
Fri, 31 Oct 97 12:41:41 PST

Dear Hideo Miyoshi and Takano Ogino,
Thank you for your report describing EDR's current work and thank you for
the work you have done to match EDR's upper nodes with WordNet synsets.
I am very sorry you will not be able to attend the ANSI meeting, but please
be assured that your work will be discussed and I am certain that the
committee members will be as pleased as I am with the amount and quality
of your contribution to the Reference Ontology.

I am posting your note and your paper presented at IJCAI-97 to the ontology
list (onto-std@hpp.stanford.edu) so that committee members will be able to
review it before the meeting.

Best regards,
Bob Spillers

===============================================================================

From: miyoshi@edr.co.jp
To: spillers@vnet.ibm.com
Cc: ogino@edr.co.jp, miyoshi@edr.co.jp, hovy@isi.edu
Subject: EDR's current activities
Date: Thu, 30 Oct 97 18:10:32 +0900

Dear Dr. Bob Spillers,

We are very sorry, but EDR members cannot attend the
upcoming meeting:

ANSI Ad Hoc Working Group on Ontology
Meeting at Stanford University
Thursday November 6, 1997

Therefore, we would like to report our current activities
on matching the EDR Concept Classification Dictionary
with WordNet.

(1) We have matched EDR's upper nodes, those within three levels
of the top, with WordNet synsets.
We reported the results to you on 25 July 1996.
The results were also presented at the IJCAI-97 Workshop.

(2) We are now matching EDR's upper nodes within five levels of
the top (about 1700 nodes) with WordNet synsets. This work is
computer-assisted and uses the following method:

For each of EDR's roughly 1700 upper nodes:
1) get the Japanese words corresponding to the node
2) get the English words corresponding to the Japanese words
obtained in 1)
3) remove useless words from the English word set in 2)
4) search for WordNet synsets that include the words of 3), and
get the candidates
5) determine the corresponding synset, taking the direct upper
and lower nodes in both EDR and WordNet into account

We hope to finish this work by the end of fiscal 1997 (March, 1998).

Attached is the paper which we presented at the IJCAI-97 Workshop
on Ontologies and Multilingual NLP.

Best regards,

Takano Ogino
Hideo Miyoshi

Japan Electronic Dictionary Research Institute, LTD. (EDR)
Daini-Abe Bldg., 78-1, Kanda-Sakumagashi,
Chiyoda-ku, Tokyo 101, JAPAN
tel: +81-3-3851-5521 fax: +81-3-3851-5840
email: {miyoshi, ogino}@edr.co.jp

========================

%%%% ijcai97-submit.tex
%% \typeout{IJCAI'97 Submission Instructions for Authors}

% This is the instructions for authors for IJCAI'97.
\documentstyle[ijcai97]{article}
\setcounter{secnumdepth}{3}

% The file ijcai97.sty is the style file for IJCAI'97.
% The preparation of these files was supported by Schlumberger Palo Alto
% Research, AT\&T Bell Laboratories, and Morgan Kaufmann Publishers.
% Shirley Jowell, of Morgan Kaufmann Publishers, and Peter F.
% Patel-Schneider, of AT\&T Bell Laboratories collaborated on their
% preparation.

% These instructions can be modified and used in other conferences as long
% as credit to the authors and supporting agencies is retained, this notice
% is not changed, and further modification or reuse is not restricted.
% Neither Shirley Jowell nor Peter F. Patel-Schneider can be listed as
% contacts for providing assistance without their prior permission.

% To use for other conferences, change references to files and the
% conference appropriate and use other authors, contacts, publishers, and
% organizations.
% Also change the deadline and address for returning papers and the length and
% page charge instructions.
% Put where the files are available in the appropriate places.

\title{An Experiment on Matching EDR Concept Classification Dictionary
with WordNet}
% \author{Takano Ogino \and Hideo Miyoshi \and Fumihito Nishino \and
% Masahiro Kobayashi \and Jun'ichi Tsujii}

\author{Takano Ogino, Hideo Miyoshi, Fumihito Nishino, Masahiro Kobayashi \\
Japan Electronic Dictionary Research Institute, LTD. \\
{\small Daini-Abe Bldg., 78-1, Kanda-Sakumagashi,
Chiyoda-ku, Tokyo 101, JAPAN} \\
{\bf Jun'ichi Tsujii} \\
School of Science, University of Tokyo \\
{\small Hongo 7-3-1, Bunkyo-ku, Tokyo 113, Japan} }

\begin{document}

\maketitle

\begin{abstract}

This paper outlines the EDR Electronic
Dictionary and describes
the ongoing project of matching the EDR Concept
Classification Dictionary with WordNet. We are
matching the two ontologies using two approaches, one in each direction.
We also present
the problems encountered in matching, with examples.

\end{abstract}

\section{Introduction}

Multimedia information such as documents is being digitized on a large scale
and distributed over computer networks, of which the Internet is the typical
example. We are about to experience large social changes, such as
electronic commerce and virtual companies organized over computer
networks.
In such an environment,
information processing technologies such as information retrieval
are becoming more and more important.
% In order to accomplish
% intelligent information retrieval systems, a knowledge base of a
% world, ontologies, are indispensable.
To build intelligent information retrieval
systems, an ontology, a kind of knowledge base of the
world, is indispensable. Here we regard an ontology as a
subset of a knowledge base; a knowledge base also
includes broader knowledge, such as causal relationships
between events and encyclopedic knowledge.
Many organizations
have developed their own ontologies, and there is a strong demand
to unify and merge those individual ontologies into a standard one.

At EDR we are conducting an experiment on matching the EDR Concept
Classification Dictionary with WordNet, as a member of the ``ANSI Ad-Hoc Group
for Ontology Standards'' and under a grant from Prof.
Jun'ichi Tsujii of the University of Tokyo.

This paper describes the ongoing project of matching EDR Concept
Classification Dictionary and WordNet, especially the procedures of
matching and the problems in performing it.

The final goal of our project is to unify the various ontologies
into one, taking both the shared and the differing classifications into
consideration.

\section{Activities for Ontology Standardization}

The ANSI Ad-Hoc Group for Ontology Standards has been conducting
research and discussions on ontology
standardization \cite{Hovy:ontology,Sawyer:min-march,Sawyer:min-sept}. The
attendees come
from universities, manufacturers, and government organizations. Most
of them work on the research and development of information
retrieval, machine translation, and artificial intelligence. EDR has
been a member of the group since March 1996 and has been working on linking
the EDR Concept Classification Dictionary and WordNet \cite{Miller:wordnet}.

The purpose of the group's meetings was to establish a conforming ontology.
The discussion covered topics such as the description format, the
establishment of a core basic ontology, and which ontologies should
be linked. It was agreed that WordNet, which is free to use, would
serve as the base and would be linked with PanGloss, CYC
\cite{Hovy:ontology}, and the EDR Concept Classification Dictionary.

At EDR, the top 126 intermediate
nodes of the EDR
Concept Classification Dictionary, which are published in \cite{EDRTG:V1-5},
have been matched manually with WordNet. EDR has been working on
a ``cross reference between WordNet and EDR
Concept Classification Dictionary''. The final goal of this project
is the inter-linkage of WordNet and the EDR
Concept Classification Dictionary. Focusing on the problems of matching
different ontologies, we are carrying out analyses from the following
viewpoints.

\begin{enumerate}
\item the differences of basic strategies for constructing ontologies
depending on the application systems:

The EDR Concept Classification Dictionary was built with many
application systems in mind, such as syntactic analysis and
information retrieval. For example, in
the EDR Concept Classification
Dictionary ``paperknife'' is classified as both ``cutlery''
and ``stationery''. The concept ``cutlery'' depends strongly
on a specific attribute and is used effectively in syntactic analysis.
The concept ``stationery'' is a classification by
field of use and is used effectively in information retrieval.

% WordNet does not have such intermediate nodes.

\item the differences of the concepts depending on the specific languages
used for constructing ontologies:

For example, the Japanese word {\em bijin} (beauty) is used both as a
noun (a beautiful woman) and as an attribute (to be beautiful). So in
EDR Concept Classification
Dictionary the concept node corresponding to {\em bijin}
is classified as both ``person'' and ``attribute'' by
multiple-inheritance. On the other hand, the WordNet synset
\{beauty, sweetheart, peach, \ldots \}
has only ``person'' as an upper concept.

% \item the differences between the ideas of the intermediate nodes of
% EDR Concept Classification Dictionary and synsets of WordNet
\end{enumerate}

\section{An Outline of EDR Concept Classification Dictionary}

\subsection{The Development of EDR Electronic Dictionary}

The EDR Electronic Dictionary \cite{Miyoshi-et-al:coling} was
developed for advanced natural
language processing. It consists of five types of large-scale
dictionaries: the Word Dictionary, the Concept Dictionary, the
Bilingual Dictionary, the Co-occurrence Dictionary, and the Technical
Terminology Dictionary. It is the product of a nine-year project (from
fiscal 1986 to fiscal 1994). The EDR Electronic Dictionary integrates
the relationships between lexical entries and their concepts in the
form of a concept hierarchy and semantic relations, together with
a corpus database from which the lexical and conceptual information
was extracted.

\subsection{EDR Concept Dictionary}

The EDR Concept Dictionary contains the 400,000 concepts listed in the
Japanese and English Word Dictionaries, which have 200,000 words each.

The role of the Concept Dictionary is to provide the data required for
computer processing of the semantic contents, or concepts,
expressed in natural language sentences, such as:

\begin{itemize}
\item Generating appropriate semantic representations for sentences
\item Determining the similarity (equivalence) of semantic contents
\item Converting a semantic content into a similar (equivalent) content
\end{itemize}

For this reason, the Concept Dictionary contains three types of
subdictionaries:

\begin{enumerate}
\item Headconcept Dictionary
\item Concept Classification Dictionary
\item Concept Description Dictionary
\end{enumerate}

The Headconcept Dictionary contains the explication of each concept. A
headconcept is a word whose meaning is close to the meaning of
the concept.

The Concept Classification Dictionary contains the set of pairs of
concepts that have a super-sub (is\_a) relation. For example, the
super-concepts of ``school'' are ``organization,'' ``building,'' and
``function.'' The sub-concepts of ``school'' are ``elementary school,''
``university,'' and so forth.

The Concept Description Dictionary contains the set of
pairs of concepts that have certain semantic relations other than
super-sub relations. The following eight semantic relations are used:
\begin{verbatim}
object agent goal implement
a-object place scene cause
\end{verbatim}
The ``a-object'' is an object of a particular attribute.

\subsection{EDR Concept Classification Dictionary}

The Concept Classification Dictionary classifies all the concepts in
it by their meaning. A polysemous word can be categorized into several
word classification groups. Classifying concepts instead of words reduces
the number of items that would otherwise be categorized more than once.

The Concept Classification Dictionary contains the classification
of concepts that have a super-sub relation. Following the
classifications used in thesauri and other reference
sources, concepts are arranged in a tree structure. (In
the EDR Concept Classification Dictionary, multiple inheritance is
allowed, so one concept can have more than one
super-concept. In such a case, the classification is not actually a
``tree structure''. However, the term ``tree structure''
is still used for the sake of convenience.)

Currently the number of concept groups (intermediate nodes) is
about 6,000 and the maximum depth of the tree is sixteen.

In the Concept Classification Dictionary, each pair of
concepts in an immediate super-sub relation is
registered as one record. The concept classification is expressed by
listing all the super-sub concept pairs. When all the
``branches'' that make up the tree structure are listed, the concept
classification is fully expressed.
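
As an illustration only, the pair records described above can be held in a
small in-memory index. The following Python sketch does not reproduce the EDR
record format: the concept labels stand in for concept IDs, the pairs are the
``school'' example given earlier in this section, and the helper names
\verb|build_index| and \verb|depth| are hypothetical.

\begin{verbatim}
```python
from collections import defaultdict

# Hypothetical records: (super-concept, sub-concept) pairs.
# Labels stand in for EDR concept IDs for readability.
PAIRS = [
    ("organization", "school"),
    ("building", "school"),
    ("function", "school"),
    ("school", "elementary school"),
    ("school", "university"),
]


def build_index(pairs):
    """Index the pair records in both directions."""
    supers, subs = defaultdict(set), defaultdict(set)
    for sup, sub in pairs:
        supers[sub].add(sup)
        subs[sup].add(sub)
    return supers, subs


def depth(concept, supers):
    """Longest chain of is_a links from a root to concept.

    Assumes the pair list contains no cycles.
    """
    parents = supers.get(concept, ())
    if not parents:
        return 0
    return 1 + max(depth(p, supers) for p in parents)


supers, subs = build_index(PAIRS)
```
\end{verbatim}

With these records, ``school'' has three super-concepts, which is the
multiple-inheritance case noted above, so the structure is a directed acyclic
graph and not a strict tree.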

\section{Methods of Matching EDR Concept Classification Dictionary and
WordNet}

At EDR, the intermediate nodes of the EDR Concept
Classification Dictionary are being matched with WordNet synsets.
The matching is carried out using the following two approaches:
\begin{description}
\item[Approach1:] For every upper intermediate node of the EDR Concept
Classification Dictionary, search for the corresponding
WordNet synset.
\item[Approach2:] For every WordNet synset, search for the corresponding
node of the EDR Concept Classification Dictionary.
\end{description}

Ideally the two approaches would produce the same result. However,
because the EDR Concept Classification Dictionary and WordNet were not
constructed on the same basic ideas, some mismatches are
expected to arise. We have therefore adopted matching in both
directions. Example1 and Example2 are actual cases
of matching. In the examples, CID stands for a concept ID and CE
stands for a concept explication in the EDR Concept Dictionary.
\newline
{\em Example1:}\ A case where the two concepts coincide:

\begin{verbatim}
EDR's Concept
CID = 30f6b0
CE = a human being

WordNet synset
{person, individual, someone, mortal, soul}
\end{verbatim}

{\em Example2:}\ A case where the two concepts do not coincide:

\begin{verbatim}
EDR's Concept
CID = 3aae71
CE = a person defined by his/her relation

WordNet synset
{communicator,lover,leader,acquaintance}
{friend,bedfellow,appointee}
{appointment,defender}
{guardian,protector,peer}
{compeer,client,follower,friend,life}
{namesake,neighbor,neighbour,ward}
\end{verbatim}

The detailed matching processes of the two approaches are as follows:

\subsection{Approach1: Matching From EDR to WordNet}
\begin{description}
\item[step1-1:]Make a list of the words that correspond to the
upper nodes (depth less than 4) of the EDR Concept
Classification Dictionary.
\item[step1-2:]For every word in the list from step1-1, search for the
WordNet synsets that contain it. If the word is Japanese, its
English equivalents are looked up in the EDR JE Bilingual
Dictionary and used instead. The words that
correspond to a single
intermediate node of the EDR Concept Classification Dictionary are
often located in different WordNet synsets, and some
of the EDR words
are not contained in any WordNet synset.
In those cases, frequency is used to determine the nearest
synset.

\end{description}
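The two steps above can be sketched in Python as follows. This is a minimal
sketch under stated assumptions, not the procedure actually used at EDR: the
bilingual dictionary and the synsets are toy data with hypothetical entries
and identifiers, and ``frequency'' is read here as the number of node words
that fall into each synset.

\begin{verbatim}
```python
from collections import Counter

# Toy stand-ins for the EDR JE Bilingual Dictionary and for
# WordNet; all entries and synset IDs are hypothetical.
JE_DICT = {
    "hito": ["person", "human"],
    "ningen": ["human", "mortal"],
}
SYNSETS = {
    "person.1": {"person", "individual", "someone",
                 "mortal", "soul"},
    "homo.1": {"homo", "man", "human being", "human"},
}


def to_english(words, je_dict):
    """Replace Japanese words by their English equivalents."""
    english = []
    for word in words:
        english.extend(je_dict.get(word, [word]))
    return english


def nearest_synset(node_words, je_dict, synsets):
    """Vote for the synset that contains the most node words."""
    votes = Counter()
    for word in to_english(node_words, je_dict):
        for synset_id, members in synsets.items():
            if word in members:
                votes[synset_id] += 1
    # No vote at all means no candidate synset was found.
    return votes.most_common(1)[0][0] if votes else None
```
\end{verbatim}

For the hypothetical node words \verb|["hito", "ningen", "soul"]|, three of
the translated words fall into \verb|person.1| and two into \verb|homo.1|, so
\verb|person.1| is returned. A word that occurs in no synset simply casts no
vote.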

\subsection{Approach2: Matching From WordNet to EDR}
\begin{description}
\item[step2-1:]Pick up the words contained in the
synsets, starting from the top. The data are stored in four WordNet files,
one per part of speech (NOUN, VERB, ADJ, ADV).
\item[step2-2:]For each word in a synset, search for its location in the
EDR Concept Classification Dictionary. The English
words in the synsets are translated into the corresponding Japanese
words using the EDR EJ Bilingual Dictionary before the search.
If the original English words are themselves included in the EDR
Concept Classification Dictionary, they can be used without
translation.
\item[step2-3:]If the words contained in a WordNet synset
correspond to different nodes of the EDR Concept
Classification Dictionary, frequency and similarity calculations are
used, and the final decision is made by a human.
\end{description}
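A minimal Python sketch of step2-2 and step2-3 follows. It assumes toy data:
the romanized Japanese words and their links to nodes are hypothetical, and
only the concept IDs are taken from the examples in this paper. The similarity
calculation is omitted, so the sketch only counts how often each node is
reached and flags ambiguous cases for the human decision.

\begin{verbatim}
```python
from collections import Counter

# Hypothetical stand-ins for the EDR EJ Bilingual Dictionary
# and for a word-to-node index of the EDR Concept
# Classification Dictionary.
EJ_DICT = {
    "organization": ["soshiki"],
    "meeting": ["kaigi", "shuukai"],
}
EDR_NODES = {
    "soshiki": ["30f746"],
    "kaigi": ["444614"],
    "shuukai": ["444614", "3cfacc"],
}


def rank_nodes(synset_words, ej_dict, edr_nodes):
    """Count how often each EDR node is reached from a synset."""
    counts = Counter()
    for word in synset_words:
        # Use the English word directly if EDR indexes it,
        # otherwise go through its Japanese translations.
        if word in edr_nodes:
            keys = [word]
        else:
            keys = ej_dict.get(word, [])
        for key in keys:
            counts.update(edr_nodes.get(key, []))
    ranked = counts.most_common()
    # More than one candidate node: leave it to a human.
    needs_review = len(ranked) > 1
    return ranked, needs_review
```
\end{verbatim}

Returning the whole ranked list, not just the top node, keeps the
lower-ranked candidates visible to the person who makes the final choice.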

\section{The Differences between EDR Concept Classification Dictionary
and WordNet}

This section describes interim results of the ongoing matching experiments,
which follow the approaches described in Section 4.
Here we show the results of Approach1 (matching from EDR to
WordNet), in which we manually matched the top 126 nodes of the EDR
Concept Classification Dictionary with WordNet synsets.

The relations between the two ontologies are classified into the
following four types:
\begin{description}
\item[type1($=$)] The referents of the EDR concept coincide with
the referents of the WordNet synset.
\item[type2($>$)] The EDR concept has a wider range of referents than
the WordNet synset.
\item[type3($<$)] The EDR concept has a narrower range of referents than
the WordNet synset.
\item[type4($*$)] The EDR concept cannot be linked to any WordNet synset.
\end{description}
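Read as comparisons between sets of referents, the four types can be sketched
in Python as follows. This is an idealization: in the experiment the judgement
was made manually, and referents cannot normally be enumerated. The function
name \verb|relation_type| is hypothetical, and treating a partial overlap as
``$*$'' is an assumption of the sketch.

\begin{verbatim}
```python
def relation_type(edr_refs, wn_refs):
    """Classify one EDR concept against one WordNet synset."""
    edr_refs, wn_refs = set(edr_refs), set(wn_refs)
    if edr_refs == wn_refs:
        return "="  # type1: the referents coincide
    if edr_refs > wn_refs:
        return ">"  # type2: EDR covers a wider range
    if edr_refs < wn_refs:
        return "<"  # type3: EDR covers a narrower range
    # Disjoint or partly overlapping sets: treated as type4
    # in this sketch (an assumption, see the text above).
    return "*"
```
\end{verbatim}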

Type2, type3, and type4 are the problematic cases in matching.
Here we show some examples of each type.

{\em Example3.1:} type2

\begin{verbatim}
EDR's Concept
CID = 3aa912
CE = a subject of an action

WordNet synset
> {group, grouping}
> {conveyance,transport}
> {artifact, artefact}
> {mechanical device}
\end{verbatim}

{\em Example3.2:} type1

\begin{verbatim}
EDR's Concept
CID = 30f746
CE = organization

WordNet synset
= {organization}
\end{verbatim}

{\em Example3.3:} type3

\begin{verbatim}
EDR's Concept
CID = 3cfacc
CE = group of people

WordNet synset
< {people}
\end{verbatim}

{\em Example3.4:} type2

\begin{verbatim}
EDR's Concept
CID = 3f960d
CE = human race

WordNet synset
> {world, human race, humanity, humankind,
mankind, man}
> {nation, nationality, land, country,
a people}
> {social group}
\end{verbatim}

{\em Example3.5:} type1

\begin{verbatim}
EDR's Concept
CID = 444614
CE = meeting/conference

WordNet synset
= {meeting}
\end{verbatim}

{\em Example3.6:} type2

\begin{verbatim}
EDR's Concept
CID = 3aa930
CE = object/thing that moves independently

WordNet synset
> {mechanical device}
> {conveyance,transport}
\end{verbatim}

Example4 shows a case where the same concept is placed
at a different location in each ontology. The symbol ``$=>$''
stands for a sub-relation in the WordNet synset hierarchy.

{\em Example4:}

\begin{verbatim}
EDR's Concept Hierarchy
1-3 a subject of an action (CID=3aa912)
1-3-1 organization (CID=30f746)
1-3-2 group of people (CID=3cfacc)
1-3-3 human race (CID=3f960d)
1-3-4 meeting (CID=444614)
1-3-5 thing that moves independently
(CID=3aa930)

WordNet Synset Hierarchy
{people}
=> {age group, age bracket}
=> {aged}
=> {young, youth}
=> {blind}
=> {blood}
=> {brave}
:
:
=> {nation, nationality, land,
country, a people}
\end{verbatim}

In Example4, EDR's ``group of people'' (CID = 3cfacc) corresponds
to WordNet's \{people\}, and EDR's ``human race'' (CID = 3f960d)
corresponds to WordNet's \{nation, nationality, land, country,
a people\}.
The two EDR concepts are in a sister relation, whereas the two
WordNet synsets are in a parent-child relation.
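
A structural mismatch of the kind seen in Example4 can be detected
mechanically once the links are recorded. The Python sketch below is an
illustration under simplifying assumptions: each concept has a single parent
(multiple inheritance is ignored), the synset labels are shortened, and the
function names are hypothetical. The parent links and the two links between
the ontologies are those of Example4.

\begin{verbatim}
```python
# Parent links taken from Example4; synset labels shortened.
EDR_PARENT = {"3cfacc": "3aa912", "3f960d": "3aa912"}
WN_PARENT = {"nation": "people"}

# Links chosen in Example4: EDR concept ID -> WordNet synset.
LINKS = {"3cfacc": "people", "3f960d": "nation"}


def is_ancestor(a, b, parent):
    """True if a lies on the chain of parents above b."""
    node = parent.get(b)
    while node is not None:
        if node == a:
            return True
        node = parent.get(node)
    return False


def sibling_mismatches(edr_parent, wn_parent, links):
    """EDR sisters whose synsets are ancestor and descendant."""
    found = []
    cids = sorted(links)
    for i, x in enumerate(cids):
        for y in cids[i + 1:]:
            px, py = edr_parent.get(x), edr_parent.get(y)
            if px is None or px != py:
                continue  # not sisters in EDR
            sx, sy = links[x], links[y]
            if (is_ancestor(sx, sy, wn_parent)
                    or is_ancestor(sy, sx, wn_parent)):
                found.append((x, y))
    return found
```
\end{verbatim}

Applied to the data above, the sketch reports the pair 3cfacc and 3f960d,
which is the mismatch described in Example4.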
\newline

Example5 shows a type4 case, where the EDR concept cannot be
linked to any WordNet synset. EWD is a set of example English
words linked to the original EDR concepts.

{\em Example5:}

\begin{verbatim}
EDR's Concept Hierarchy

4-1 physical location/actual location/actual
space (CID=3aa938)
4-1-1 * place (physical) defined by its
function (CID=30f749)
EWD={a building lot, reclaimed land}
4-1-2 * place (physical) defined by its
shape (CID=4449d9)
EWD={a slope, a precipice}
4-1-3 * place (physical) defined by its
reputation or by an evaluation
(CID=444a86)
EWD={a paradise}
4-1-4 * place (physical) defined by its
condition or a condition
(CID=30f75c)
EWD={Eisbahn, the shade, wasteland}
\end{verbatim}

Of the top 126 EDR nodes, 41 are type1,
20 are type2, 47 are type3, and 18 are type4.
The causes of the matching problems are as follows:
\begin{enumerate}
\item Although the EDR concept has a corresponding WordNet synset,
its location in each ontology differs (Example4).
\item Differences in classification leave an EDR concept
without a corresponding WordNet synset (Example5).
\item Differences in classification give similar concepts
different coverage of referents (Example3).
\end{enumerate}

\section{Conclusion}

This paper has presented the ongoing project of matching EDR Concept
Classification Dictionary and WordNet, especially the procedures of
matching and the problems in performing it.

We hope that, by clarifying the parts common to the ontologies and the parts
specific to each, we can construct a similar unified ontology by merging,
starting from any of the existing ontologies.

We also hope that the unified standard ontology will contribute to
major advances in natural language processing technologies, such
as intelligent information retrieval and semantic and contextual
understanding.

%% This section was initially prepared using BibTeX. The .bbl file was
%% placed here later
%\bibliography{publications}
%\bibliographystyle{named}
%% The file named.bst is a bibliography style file for BibTeX 0.99c
\begin{thebibliography}{}

\bibitem[\protect\citeauthoryear{EDR}{1996}]{EDRTG:V1-5}
EDR.
\newblock {\em EDR Electronic Dictionary Version 1.5 Technical Guide}.
\newblock EDR TR2-007, 1996.

\bibitem[\protect\citeauthoryear{Hovy}{1996}]{Hovy:ontology}
Eduard Hovy.
\newblock {\em Creating a Large Ontology}.
\newblock ANSI Ad Hoc Group on Ontology, Stanford University, September 1996.

\bibitem[\protect\citeauthoryear{Miller \bgroup \em et al.\egroup
}{1993}]{Miller:wordnet}
George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross,
Katherine Miller, and Randee Tengi.
\newblock {\em Five Papers on WordNet}.
\newblock Cognitive Science Laboratory, Princeton University.
\newblock CSL Report 43, 1993.

\bibitem[\protect\citeauthoryear{Miyoshi \bgroup \em et al.\egroup
}{1996}]{Miyoshi-et-al:coling}
Hideo Miyoshi, Kenji Sugiyama, Masahiro Kobayashi, and Takano Ogino.
\newblock {\em An Overview of the EDR Electronic Dictionary and the Current
Status of Its Utilization}.
\newblock Proceedings of COLING-96, August 1996.

\bibitem[\protect\citeauthoryear{Sawyer}{1996a}]{Sawyer:min-march}
Submitted by Steve Sawyer.
\newblock {\em Minutes of the ANSI Ad Hoc Group on Ontology
Standards} Held at IBM Santa Teresa Laboratory.
\newblock March 1996.

\bibitem[\protect\citeauthoryear{Sawyer}{1996b}]{Sawyer:min-sept}
Submitted by Steve Sawyer.
\newblock {\em Minutes of the ANSI Ad Hoc Committee On Ontology
Standards} Held at Stanford's Knowledge Systems Laboratory.
\newblock September 1996.

\end{thebibliography}

\end{document}