1 Sirma AI Labs, naso@sirma.bg
2 Linguistic Modelling Lab., Bulgarian Academy of Sciences,
kivs@bgcict.acad.bg
ABSTRACT
A mapping of the EuroWordNet Top Ontology into the Upper Cyc Ontology is presented.
The mapping is expressed as a CycL microtheory encoding of the
EuroWordNet Top Ontology, because it cannot be made by means
of equivalence and subsumption relations alone. However, we also provide a simplified
relational view that is sufficient for many purposes.
The mapping will facilitate a better understanding of both ontologies.
It could also be used as a tool for linking the actual lexical items
in the wordnets for the EWN-covered languages to the knowledge represented
in the Cyc KB.
In this paper we will be concerned with mapping the EuroWordNet Top Ontology (see [Vossen 98]) to the Upper Cyc Ontology (see [Cycorp 97]). A direct and at the same time formally correct mapping via equivalence and subsumption relations is impossible because of the complexity of the Upper Cyc Ontology. This is why the mapping is expressed in terms of a CycL (the knowledge representation language of Cyc) microtheory encoding of the EuroWordNet Top Ontology. We also provide, however, a simplified relational view of the mapping that is sufficient for many purposes. The purpose of the mapping is threefold: (1) to ensure linking between the concepts in the two ontologies; (2) to serve as a tool for linking the actual lexical items in several languages to the knowledge represented in the Cyc KB; (3) to provide more detailed semantic context information for the lexical items in EuroWordNet.
A set of base concepts (BC) is developed in order to unify the conceptual knowledge represented among the different wordnets. The base concepts were selected from the resources available in each language according to their importance, defined by two criteria: (1) the number of relations connected to the concept and (2) its position in the hierarchy of concepts. The set of "language independent" base concepts in the ILI is produced by merging (in a complex way) the sets for each language. Additionally, the set of BC is grouped into coherent clusters by means of the Top Ontology (EwnTO), which comprises 64 concepts defining the fundamental semantic classes. Because of the way the BC were chosen, it is expected that all words could also be classified under the semantic features of the EwnTO via their relations to the BC.
The Upper Cyc Ontology (UCO) (see [Cycorp 97]) is the publicly available part of the Cyc® Knowledge Base and is devoted to the representation of language independent encyclopaedic knowledge. The Upper Cyc Ontology consists of about 3000 constants. Many of the constants are defined as unary predicates and could be viewed as concepts. Others denote relations, logical operators and so forth. The name of each Cyc® constant begins with the string #$. The set of constants in UCO is hierarchically organized by means of two structuring relations, #$isa (instance of) and #$genls (generalization of collections).
The important point about UCO is that, according to its developers, it contains enough concepts for each new concept to be properly classified within its structure. It should therefore be a good candidate for an upper ontology.
;;; #$EuroWordnetMt
(#$genlMt #$EuroWordnetMt #$BaseKB)
(#$comment #$EuroWordnetMt "#$EuroWordnetMt
microtheory ...")
;;; #$EuroWordnetTCType
(#$ist #$EuroWordnetMt (#$isa #$EuroWordnetTCType #$PredicateCategory))
(#$ist #$EuroWordnetMt (#$comment #$EuroWordnetTCType
"Collection of the EuroWordnet top-concepts. Each TC is a Cyc predicate."))
The highest concept in EwnTO (Top) is stated to be the same as the top of the UCO hierarchy:
;;; #$TopTC
(#$ist #$EuroWordnetMt (#$equals #$TopTC #$Thing))
(#$ist #$EuroWordnetMt (#$isa #$TopTC #$EuroWordnetTCType))
In the following assertions we leave implicit the part which states that the assertion holds in the #$EuroWordnetMt microtheory, namely (#$ist #$EuroWordnetMt ASSERTION). Below are some of the definitions of the auxiliary predicates used in the mapping:
;;; #$exactType
(#$genlPreds #$exactType #$isa)
(#$genls #$MappingPredicate #$TaxonomicSlotForCollections)
(#$equivalent (#$exactType ?COL ?TYPE)
  (#$and (#$equivalent (#$isa ?X ?TYPE) (#$genls ?X ?COL))
         (#$isa ?COL ?TYPE)))
#$exactType is a specialization of the #$isa predicate which relates two collections such that the second collection, ?TYPE, is a collection of collections that contains all sub-collections of ?COL (including ?COL itself) and only them. Using this predicate we can state the equivalence of a part of the collection hierarchy and a collection of collections, i.e. of a concept and a concept type.
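The definition of #$exactType can be checked mechanically on a small example. The following Python sketch is only an illustration under stated assumptions: the toy hierarchy (collections A to E), the helper names and the finite-universe reading of the quantification over ?X are hypothetical and are not part of the mapping or of Cyc.

```python
# Toy model of #$exactType; the hierarchy and all names are hypothetical.

def genls_closure(direct):
    """Reflexive-transitive closure of direct (sub, super) genls pairs."""
    nodes = {n for pair in direct for n in pair}
    closure = {(n, n) for n in nodes} | set(direct)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def exact_type(col, type_members, direct_genls):
    """True if type_members is exactly the set of sub-collections of col
    (col included), mirroring the two conjuncts of the definition."""
    closure = genls_closure(direct_genls)
    subcollections = {sub for (sub, sup) in closure if sup == col}
    return col in type_members and set(type_members) == subcollections


# Hypothetical hierarchy: B and C specialize A; D is unrelated to A.
DIRECT = [("B", "A"), ("C", "B"), ("D", "E")]

print(exact_type("A", {"A", "B", "C"}, DIRECT))       # all sub-collections of A
print(exact_type("A", {"A", "B"}, DIRECT))            # C is missing
print(exact_type("A", {"A", "B", "C", "D"}, DIRECT))  # D is not below A
```

Only the first call satisfies the definition: the second type omits a sub-collection, and the third contains a collection from outside the sub-hierarchy.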
;;; #$specificType
(#$isa #$specificType #$DefaultMonotonicPredicate)
(#$isa #$specificType #$TaxonomicSlotForCollections)
(#$equivalent (#$specificType ?COL ?TYPE)
  (#$implies (#$isa ?X ?TYPE) (#$genls ?X ?COL)))
Sometimes we need to state that a given collection contains only some collections from the sub-hierarchy. For this purpose we use the predicate #$specificType.
In the above assertions #$equivalent is a logical operator easily definable in CycL.
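As an illustration of the weaker condition imposed by #$specificType, and of how #$equivalent reduces to implication in both directions, here is a minimal Python sketch. It assumes a finite toy hierarchy; the collection names and function names are hypothetical and do not come from UCO.

```python
# Toy model of #$specificType and #$equivalent; all names are hypothetical.

def implies(p, q):
    """Material implication."""
    return (not p) or q


def equivalent(p, q):
    """Biconditional, defined as implication in both directions."""
    return implies(p, q) and implies(q, p)


def subcollections(col, direct_genls):
    """Every collection below col in the genls hierarchy, col included."""
    found = {col}
    frontier = [col]
    while frontier:
        current = frontier.pop()
        for sub, sup in direct_genls:
            if sup == current and sub not in found:
                found.add(sub)
                frontier.append(sub)
    return found


def specific_type(col, type_members, direct_genls):
    """True if every member of the type is a sub-collection of col.
    The type may leave out some sub-collections of col."""
    below = subcollections(col, direct_genls)
    return all(member in below for member in type_members)


# Hypothetical hierarchy: B and C specialize A; D is unrelated to A.
DIRECT = [("B", "A"), ("C", "A"), ("D", "E")]

print(specific_type("A", {"B"}, DIRECT))       # a partial cover is accepted
print(specific_type("A", {"B", "D"}, DIRECT))  # D lies outside the sub-hierarchy
print(equivalent(True, False))
```

The first call succeeds although C is not a member of the type, which is exactly what distinguishes this predicate from #$exactType.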
Four concepts are defined one step below the 1stOrderEntity concept: Origin, Form, Composition and Function. They determine the main aspects of each thing in time and space and correspond to the Qualia structure presented in [Pustejovsky 95]. In our view these concepts are more like attributes of existing objects than object classes or orthogonal dimensions of explanation. We therefore encode them as classes of predicates rather than as predicates. The next layer under these four concepts defines the concepts representing the actual values of these attributes. Here we give an example of the encoding of one of these concepts in #$EuroWordnetMt:
;;; #$1stOrderEntityTC
(#$equals #$1stOrderEntityTC #$SomethingExisting)
(#$isa #$1stOrderEntityTC #$EuroWordnetTCType)
;;; #$1stOrderEntityTCType
(#$isa #$1stOrderEntityTCType #$PredicateCategory)
(#$genls #$1stOrderEntityTCType #$EuroWordnetTCType)
(#$exactType #$1stOrderEntityTC #$1stOrderEntityTCType)
;;; #$OriginTCType
(#$genls #$OriginTCType #$1stOrderEntityTCType)
;;; #$NaturalTC
(#$equals #$NaturalTC #$NaturalTangibleStuff)
(#$isa #$NaturalTC #$OriginTCType)
This example demonstrates the simplest mapping: the case where there exists a Cyc constant with the same meaning as the EWN Top Concept.
The hierarchy below 2ndOrderEntity is similar. The first level defines two dimensions (qualia) for the characteristics of a situation: SituationType and SituationComponent. The former divides situations into dynamic and static, while the latter clusters situations according to the presence of a specific aspect in the description of the situation content. Thus we can follow the above pattern in the definition of these concepts as well.
;;; #$SituationTypeTCType
(#$genls #$SituationTypeTCType #$2ndOrderEntityTCType)
;;; #$SituationComponentTCType
(#$genls #$SituationComponentTCType #$2ndOrderEntityTCType)
Below these we often have a more complicated mapping. For example, we need to define MentalTC as follows:
;;; #$MentalTC
(#$equals #$MentalTC (#$UnionFn #$MentalEvent #$MentalAttribute))
Here an EWN top concept is represented as a disjunction of two UCO constants because UCO doesn't contain a common concept for dynamic and static situations that also accounts for the mental aspect of a situation.
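Read extensionally, the union used for MentalTC means that anything belonging to either argument collection belongs to the top concept. The short Python sketch below illustrates this reading only; the instance names are hypothetical placeholders, and `union_fn` is an illustrative helper, not a Cyc API.

```python
# Extensional reading of a union collection; instance names are hypothetical.

MENTAL_EVENT = {"deciding-01", "recalling-01"}   # dynamic mental situations
MENTAL_ATTRIBUTE = {"contentment-01"}            # static mental situations


def union_fn(*extents):
    """Extent of a union: whatever is an instance of at least one argument."""
    result = set()
    for extent in extents:
        result |= set(extent)
    return result


MENTAL_TC = union_fn(MENTAL_EVENT, MENTAL_ATTRIBUTE)

# Each argument collection is subsumed by the union.
print(MENTAL_EVENT <= MENTAL_TC)
print(MENTAL_ATTRIBUTE <= MENTAL_TC)
```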
A more interesting case of classification is the mapping of EWN's PartTC. In UCO there are some constants devoted to distinguishing special kinds of parts, such as parts of organisms, parts of buildings and others, but there is no general definition of a part. Therefore all we can point out are examples of parts, and we can say that in principle only individual things can constitute a part of something. Here is the mapping definition:
;;; #$PartTC
(#$genls #$PartTC #$Individual)
(#$genls #$OrganismPart #$PartTC)
(#$genls #$CellPart #$PartTC)
(#$genls #$PartOfBuilding #$PartTC)
(#$isa #$PartTC #$CompositionTCType)
This definition reflects another characteristic of EwnTO which we will discuss in detail below. As a part of the Composition dimension (or qualia), PartTC could be applied to many entities below 1stOrderEntity, but its value is not significant for many of the concepts: "It is not the case that all persons will be classified as Parts because they may be part of group." ([Vossen 98]) The notion of "intentional" significance is important in the EWN classification of word meanings, but it is very hard to represent on a general level in Cyc. We decided to leave this concept underspecified.
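One way to see what the underspecified PartTC definition does and does not license is to treat it as a pair of bounds: #$Individual from above and the three known kinds of parts from below. The Python sketch below works under that assumption. The links marked hypothetical ("Leaf", "Person") and the status labels are illustrative additions and are not assertions of the mapping or of UCO.

```python
# PartTC bounded from above and below; hypothetical entries are marked.

GENLS = [
    ("PartTC", "Individual"),        # upper bound, from the mapping
    ("OrganismPart", "PartTC"),      # lower bounds, from the mapping
    ("CellPart", "PartTC"),
    ("PartOfBuilding", "PartTC"),
    ("Leaf", "OrganismPart"),        # hypothetical
    ("Person", "Individual"),        # hypothetical
]


def ancestors(col, direct_genls):
    """All collections above col via genls, col included."""
    found = {col}
    frontier = [col]
    while frontier:
        current = frontier.pop()
        for sub, sup in direct_genls:
            if sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found


def part_status(col, direct_genls=GENLS):
    """Classify col relative to the two bounds on PartTC."""
    above = ancestors(col, direct_genls)
    if "PartTC" in above:
        return "part"
    if "Individual" in above:
        return "undetermined"
    return "outside upper bound"


for name in ("Leaf", "Person", "SomeCollection"):
    print(name, part_status(name))
```

Anything that falls between the bounds stays "undetermined", which corresponds to leaving the concept underspecified rather than classifying, for instance, all persons as parts.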
A harder problem than those described above is the mapping of the top
concept TimeTC. It is defined in [Vossen 98] as: "Situations
in which duration or time plays a significant role; Static yesterday, day,
pass, long, period, Dynamic e.g. begin, end, last, continue." This gives
us a hint as to how to constrain the concept from above by the disjunction
of two UCO constants (#$Event or #$StaticSituation),
both of which have a temporal aspect. However, this is still too general
to cover the EwnTO meaning of TimeTC, which doesn't include many of the events
and states that are classified under #$Event or #$StaticSituation.
The only appropriate concept in UCO is #$TemporalRelation,
but it is more specific and the mapping is problematic because it is not
a specialization of #$SituationType.
Mismatching taxonomic structure
Some of the problems arise because of differences between the formal definitions in UCO and the EWN Top Ontology (with respect to the base taxonomic relations) for concepts whose glosses match. In those cases the mapping was made according to the glosses.
Let us take as an example MoneyRepresentationTC,
which is subsumed in EWN by RepresentationTC. They are mapped
into UCO as follows:
(#$equals #$RepresentationTC #$InformationBearingObject)
(#$equals #$MoneyRepresentationTC #$TenderObject)
However it is NOT true in UCO that
(#$genls #$TenderObject #$InformationBearingObject)
The reason for this is the different structuring of the conceptualizations
used by the creators of the two ontologies. Thus #$Currency
(a specialization of #$TenderObject) is an #$InformationBearingObject
(IBO) but doesn't cover some of the meanings of MoneyRepresentationTC,
for example "shares". The closest UCO concept for the latter is #$Stock,
but it is a specialization of #$SalesAgreement and it is
not declared as a kind of #$TenderObject. The reason
for this is that #$Stock covers only the abstract aspect
of the stock without its material (paper) medium, which is represented
by #$StockCertificate. Unfortunately the latter is not
declared to be a specialization of #$TenderObject. The mismatch
could be partially corrected if #$TenderObject were
classified in UCO as a kind of IBO. Such an assertion would be defensible
because each instance of #$TenderObject could play
this role. However, its correctness and relevance
are arguable. Finally, the comparison would be easier if #$StockCertificate
were classified as a #$TenderObject, but this is not
obviously right for some kinds of #$Stock.
This was a typical example, in which we mapped MoneyRepresentation to #$TenderObject following the matching glosses. Our motivation for this decision was that the knowledge enterers using the ontologies (especially those with a linguistic background) can be expected to give preference to the meaning stated in the gloss. It is also a fact that the formal meaning encoded by the taxonomic relations is just a small fraction of what is described in the gloss.
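A mismatch of this kind can be detected by checking, for every subsumption pair in EWN, whether the images of the two concepts are also related by #$genls in UCO. The following Python sketch shows one possible check. It assumes that the only #$genls links available are the ones named in this section, which is of course a tiny fragment of UCO, and the function names are hypothetical.

```python
# Check whether EWN subsumption survives an exact mapping into UCO.
# Only the genls links mentioned in the text are modelled here.

EWN_SUBSUMES = [("MoneyRepresentationTC", "RepresentationTC")]  # (narrower, broader)

MAPPING = {
    "RepresentationTC": "InformationBearingObject",
    "MoneyRepresentationTC": "TenderObject",
}

UCO_GENLS = [
    ("Currency", "TenderObject"),
    ("Currency", "InformationBearingObject"),
    ("Stock", "SalesAgreement"),
]


def is_specialization(sub, sup, direct_genls):
    """True if sup is reachable from sub by following genls links upwards."""
    seen = set()
    frontier = [sub]
    while frontier:
        current = frontier.pop()
        if current == sup:
            return True
        if current in seen:
            continue
        seen.add(current)
        frontier.extend(s for c, s in direct_genls if c == current)
    return False


def unpreserved(ewn_pairs, mapping, direct_genls):
    """EWN subsumption pairs whose images are not related by genls."""
    return [
        (narrow, broad)
        for narrow, broad in ewn_pairs
        if not is_specialization(mapping[narrow], mapping[broad], direct_genls)
    ]


print(unpreserved(EWN_SUBSUMES, MAPPING, UCO_GENLS))
```

On this fragment the check reports the MoneyRepresentationTC and RepresentationTC pair, since nothing places #$TenderObject below #$InformationBearingObject.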
Concept Types in UCO rather than Concepts
Some UCO constants stand for concept types (collections of collections) rather than for concepts themselves. One such concept type is the constant #$PositionType. It represents the collection of all concepts (predicates) about occupations (OccupationTC), but it is not a concept itself. The mapping is even harder when such a concept type covers just part of the sub-concepts of a top concept. Such is the case, for example, between #$SocialAttributeType and SocialTC. The former covers only the static situations clustered under SocialTC.
The mapping between Time and #$TemporalRelation (continuing the example above) is even more interesting because the latter constant does not represent a concept: it is a concept type or, in other words, a meta-concept. That means that the concepts related to it (like #$after) are not its specializations (sub-concepts). They are its instances, which in Cyc is expressed via (#$isa #$after #$TemporalRelation) rather than by one of the subsumption predicates (#$genls, #$genlPreds, etc.). In UCO there are many other cases like this, where a meaning is conceptualized as a concept type rather than as a concept.
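The practical consequence of the instance versus specialization distinction is that a query which follows only subsumption links will never reach the concepts grouped under a concept type. A minimal Python sketch of this point follows. It records just two facts restated from the text, and the function name and result labels are hypothetical.

```python
# Instance links and subsumption links are kept apart, as in Cyc.
# Both facts restate assertions mentioned in the text.

ISA = {("after", "TemporalRelation")}      # #$after is an instance of the type
GENLS = {("Currency", "TenderObject")}     # #$Currency specializes the concept


def direct_link(constant, target):
    """Report which kind of direct link, if any, is recorded."""
    if (constant, target) in ISA:
        return "instance"
    if (constant, target) in GENLS:
        return "specialization"
    return None


print(direct_link("after", "TemporalRelation"))
print(direct_link("Currency", "TenderObject"))

# A search restricted to subsumption links does not find #$after.
print(("after", "TemporalRelation") in GENLS)
```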
With respect to the above phenomena, we need the following relations between EWN top concepts and UCO constants to be used in the mapping:
Mapping Relations
| Relation name | Encoding in CycL | Comment |
| exact mapping | (#$equals EWNTC CYCC) | |
| more general in Cyc | (#$genls EWNTC CYCC) | |
| more specific in Cyc | (#$genls CYCC EWNTC) | |
| instance of | (#$exactType EWNTC CYCC) | otherwise equivalent but encoded as a concept type (rather than concept) in UCO |
| instance of, more general in Cyc | (#$isa EWNTC CYCC) | EWNTC is more general than each of the instances of CYCC |
| instance of, more specific in Cyc | (#$specificType EWNTC CYCC) | EWNTC is more specific than some of the instances of CYCC |
| qualia for | (#$genls EWNTC CYCC) | EWNTC is qualia (attribute type) for instances of CYCC |
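The table can be read as a set of templates that turn a relation name and a pair of constants into a CycL assertion. The Python sketch below renders such assertions as plain strings. The dictionary restates the table; the function name `encode` and its `microtheory` parameter are hypothetical conveniences, and no Cyc API is involved.

```python
# Render the mapping relations of the table as CycL assertion strings.

TEMPLATES = {
    "exact mapping": "(#$equals {ewn} {cyc})",
    "more general in Cyc": "(#$genls {ewn} {cyc})",
    "more specific in Cyc": "(#$genls {cyc} {ewn})",
    "instance of": "(#$exactType {ewn} {cyc})",
    "instance of, more general in Cyc": "(#$isa {ewn} {cyc})",
    "instance of, more specific in Cyc": "(#$specificType {ewn} {cyc})",
    "qualia for": "(#$genls {ewn} {cyc})",
}


def encode(relation, ewn_tc, cyc_constant, microtheory=None):
    """Build the assertion for one mapping; wrap it in #$ist if a
    microtheory name is given."""
    body = TEMPLATES[relation].format(ewn="#$" + ewn_tc, cyc="#$" + cyc_constant)
    if microtheory is None:
        return body
    return "(#$ist #$" + microtheory + " " + body + ")"


print(encode("exact mapping", "NaturalTC", "NaturalTangibleStuff"))
print(encode("instance of, more specific in Cyc", "OccupationTC",
             "PositionType", microtheory="EuroWordnetMt"))
```

Note that "more general in Cyc" and "qualia for" share the same encoding, so the relation name cannot be recovered from the assertion alone.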
Missing subsumption relations in UCO
There are subsumption relations that are not precise in UCO. For example, Occupation could be directly mapped to #$IntendedFunction. On the other hand, #$PositionType (which has no relation to #$IntendedFunction) is still relevant to Occupation because all of its instances are specializations of Occupation. In this case we included #$PositionType as a constant that is not "the Mapping" but is still relevant to Occupation. This case is additionally complicated by the fact that #$PositionType itself is a concept type rather than a concept. The mapping is:
(#$equals #$OccupationTC #$IntendedFunction)
(#$specificType #$OccupationTC #$PositionType)
The mapping defined here shows the level of compatibility between the EWN top ontology and UCO. It became clear that there are important aspects that could not be properly covered in UCO. We should say that the quality of this mapping could suffer for two main reasons: (1) the complexity of UCO, which the authors cannot claim to fully understand; (2) the underspecification of some of the EWN top concepts, especially some of those below 1stOrderEntity. As a result, it can be expected that some of the mappings could be improved.
The mapping we have constructed will be used in two future works. First,
it will be extended with the base concepts of EuroWordNet and then used
for the creation of a Bulgarian lexical knowledge base connected to EuroWordNet
base concepts and thus to UCO. Second, we envisage using the mapping
for text analysis based on the idea of lexical chains (see [Hirst and
St-Onge 1998]), which will be used to determine the right ontology chunks
assigned to the text along the lines of [Kiryakov and Simov 1999].
[Cycorp 97] Cyc® Public Ontology.
[Fellbaum 98] Fellbaum, Christiane (editor), WORDNET: an electronic lexical database. MIT Press, 1998.
[Hirst and St-Onge 1998] Graeme Hirst and David St-Onge. Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms. In [Fellbaum 98]. MIT Press, 1998.
[Kiryakov and Simov 1999] Atanas Kiryakov and Kiril Simov. Ontologically Supported Semantic Matching. In the proceedings of NoDaLiDa'99 conference. Trondheim, Norway, 1999.
[Pustejovsky 95] James Pustejovsky. Generative Lexicon. The MIT Press. Cambridge, MA, 1995.
[Vossen 98] Piek Vossen (ed.). EuroWordNet General Document. Version 3, Final, July 19, 1999.
| EWN Top Concept | Cyc constant | Relation type |
| 1stOrderEntity (exact mapping) | #$SomethingExisting | |
| 2ndOrderEntity (exact mapping) | #$Situation | |
| 3rdOrderEntity (exact mapping) | #$PropositionalInformationThing | |
| Agentive (gap) | #$Event | much more general in Cyc |
| Animal (exact mapping) | #$NonPersonAnimal | |
| Artifact (exact mapping) | #$Artifact | |
| BoundedEvent (difference) | #$Event | more general in Cyc |
| | #$TemporalObjectType | instance of, much more general in Cyc |
| Building (exact mapping) | #$Building | |
| Cause (gap) | #$Event | much more general in Cyc |
| Comestible (exact mapping) | #$FoodAndDrink | |
| Communication (difference) | #$Situation | much more general in Cyc |
| | #$Communicating | more specific in Cyc, it requires exchange of information between at least two agents |
| | #$ibtHasInfoAbout | more specific in Cyc, covers "to be about" sense of "communicate", but only for non-abstract #$InformationBearingThings |
| | #$propositionalInfoAbout | more specific in Cyc, covers "to be about" sense of "communicate" for abstract things, e.g. theories |
| Composition (qualia) | #$SomethingExisting | qualia for |
| Condition (difference) | #$Situation | much more general in Cyc |
| | #$WeatherAttribute | more specific in Cyc |
| | #$PhysiologicalCondition | more specific in Cyc |
| | #$TangibleStuffStateType | instance of, more specific in Cyc |
| | #$StateOfMatter-SolidLiquidGaseous | more specific in Cyc |
| Container (exact mapping) | #$ContainerProduct | seems a bit more specific in Cyc, but it is not specified formally enough in EWN to compare precisely |
| Covering (gap) | #$SomethingExisting | much more general in Cyc |
| Creature (gap) | #$BiologicalLivingObject | much more general in Cyc; missing, contradictory in EWN |
| Dynamic (exact mapping) | #$Event | |
| Existence (exact mapping) | #$CreationOrDestructionEvent | |
| Experience (difference) | #$Situation | much more general in Cyc |
| | #$Perceiving | more specific in Cyc, covers only the physical experiences, but not mental ones such as "desire" |
| | #$FeelingAttribute | more specific in Cyc, covers only the mental experiences, but not physical ones such as "hear" |
| Form (qualia) | #$SomethingExisting | qualia for |
| Function (qualia) | #$SomethingExisting | qualia for |
| | #$IntendedFunction | more specific in Cyc |
| | #$Role | more specific in Cyc |
| Furniture (exact mapping) | #$FurniturePiece | |
| Garment (exact mapping) | #$ClothingItem | |
| Gas (exact mapping) | #$GaseousTangibleThing | |
| Group (exact mapping) | #$Group | |
| Human (exact mapping) | #$Person | |
| ImageRepresentation (exact mapping) | #$VisualInformationSource | |
| Instrument (difference) | #$SomethingExisting | much more general in Cyc |
| | #$PhysicalDevice | more specific in Cyc, but it is too underspecified in EWN to compare properly |
| LanguageRepresentation (exact mapping) | #$TextualMaterial | |
| Liquid (exact mapping) | #$LiquidTangibleThing | |
| Living (exact mapping) | #$BiologicalLivingObject | |
| Location (difference) | #$Situation | more general in Cyc |
| | #$SpatialPredicate | instance of, more specific in Cyc, covers Location + Static |
| | #$MovementEvent | more specific in Cyc, covers Location + Dynamic |
| Manner (difference) | #$Situation | more general in Cyc |
| | #$ScriptPerformanceAttribute | more specific in Cyc, covers the Static situations with Manner aspect |
| | #$LocomotionEvent | more specific in Cyc |
| Mental (difference) | #$Situation | more general in Cyc |
| | #$MentalAttribute | more specific in Cyc, covers Mental + Static |
| | #$MentalEvent | more specific in Cyc, covers Mental + Dynamic |
| Modal (exact mapping) | #$ModalRelationship | |
| MoneyRepresentation (exact mapping) | #$TenderObject | |
| Natural (exact mapping) | #$NaturalTangibleStuff | |
| Object (exact mapping) | #$ExistingObjectType | instance of |
| Occupation (exact mapping) | #$PositionType | instance of |
| Origin (qualia) | #$SomethingExisting | qualia for |
| Part (difference) | #$Individual | much more general in Cyc |
| | #$PartOfBuilding | more specific in Cyc |
| | #$CellPart | more specific in Cyc |
| | #$OrganismPart | more specific in Cyc |
| Phenomenal (gap) | #$Event | much more general in Cyc |
| Physical (difference) | #$Situation | |
| | #$PhysicalAttribute | more specific in Cyc, covers the Static + Physical situations |
| | #$PhysicalEvent | more specific in Cyc, covers the Dynamic + Physical situations |
| Place (exact mapping) | #$Place | |
| Plant (exact mapping) | #$PlantBLO | |
| Possession (difference) | #$Situation | more general in Cyc |
| | #$ChangeInUserRights | more specific in Cyc, covers Dynamic + Possession |
| | #$userRightsRelation | more specific in Cyc, partially covers Static + Possession |
| | #$hasOwnershipIn | much more specific in Cyc, should be related in a way to #$userRightsRelation, but it is not in UCO |
| | #$UserRightsAttribute | more specific in Cyc, partially covers Static + Possession |
| Property (problematic) | #$AttributeValue | it is not a #$Situation in Cyc |
| | #$StaticSituation | more general in Cyc |
| Purpose (exact mapping) | #$PurposefulAction | |
| Quantity (gap) | #$Situation | more general in Cyc |
| Relation (problematic) | #$Relationship | it is not a #$Situation in Cyc |
| | #$StaticSituation | more general in Cyc |
| Representation (exact mapping) | #$InformationBearingObject | |
| SituationComponent (qualia) | #$Situation | qualia for |
| SituationType (qualia) | #$Situation | qualia for |
| Social (difference) | #$Situation | more general in Cyc |
| | #$SocialOccurrence | more specific in Cyc, covers Social + Dynamic |
| | #$SocialAttributeType | instance of, more specific in Cyc, covers Social + Static |
| Software (exact mapping) | #$ComputerProgram | in Cyc it is an IBO, i.e. a tangible object that bears information that could be interpreted as a computer program. In EWN it is not determined that it is tangible |
| Solid (exact mapping) | #$SolidTangibleThing | |
| Static (exact mapping) | #$StaticSituation | |
| | #$Relationship | more specific in Cyc, not related to #$StaticSituation in Cyc |
| | #$AttributeValue | more specific in Cyc, not related to #$StaticSituation in Cyc |
| Stimulating (gap) | #$Event | much more general in Cyc |
| Substance (exact mapping) | #$ExistingStuffType | |
| Time (problematic) | #$TemporalRelation | instance of, more specific in Cyc, partially covers Time + Static, it is not a situation type in Cyc |
| | #$StaticSituation | much more general in Cyc |
| | #$Event | much more general in Cyc |
| Top (exact mapping) | #$Thing | |
| UnboundedEvent (difference) | #$Event | more general in Cyc |
| | #$TemporalStuffType | instance of, much more general in Cyc |
| Usage (gap) | #$Situation | more general in Cyc |
| | #$ConsumingFoodOrDrink | more specific in Cyc |
| Vehicle (exact mapping) | #$TransportationDevice-Vehicle | |