Confidential IGC Design Document

The Knowledge Representation Language of Webmind

Version 1.5
March 23, 2001
Pei Wang
Cate Hartley

 

What is new in this version:

•  question and command as sentence, to support query and schema
•  to only use <sentence>, not <text>, as argument of relation
•  to move <mathRelation> under <relation>
•  ExtensionalSimilarity as built-in relation
•  before/after as built-in relation
•  property as special relation: exactly one value
•  “symmetric difference” renamed as “symmetricDifference”
•  <logicalOperator> changed from “&&, ||, !” to “AND, OR, NOT”
•  ordered-AND operator (for schema)
•  change the range of <confidence> from (0, 1) to [0, 1]
•  use “term” as concept name, and distinguish it from “word”
•  anonymous term

 

1. Introduction

Webmind needs a knowledge representation language, mainly for its communication with the other (human or computer) systems.  In this document, such a language, KNOW (Knowledge Norm Of Webmind), is defined.

The following are some possible use cases of KNOW:

To serve these purposes, KNOW should be designed to keep some subtle balance:

In its relation to Webmind, KNOW should closely correspond to how knowledge is represented within Webmind (i.e., the nodes and links), so that its sentences can be easily integrated into PsyNet.  On the other hand, KNOW should be different from Psynese (in which the actual nodes and links are used) because many nodes/links, as well as variable values in them, do not represent domain knowledge, but represent various kinds of self-knowledge, and knowledge about domain knowledge.  For example, nodes like UserNode, links like CountLink, and variables like inferenceTrail should not be directly represented in KNOW.  Also, KNOW should be independent of implementation details; that is, even if the same semantic relation can be implemented in different ways, it should remain the same in KNOW.

In its relation to the users and external knowledge sources, the language should be independent of Webmind as much as possible.  This is because we should not force the user to know Webmind well in order to feed knowledge into it, nor can we assume that the existing knowledge sources are Webmind-like.  Instead, we want the structure and expressive power of KNOW to be similar to those of natural languages, though we don't want it to be bound to any specific natural language, such as English.

Based on this understanding, a context-free grammar for KNOW is defined in the following sections.  This version is still incomplete.  The people who work with it (in the near future, most relevant work will be in natural language, psyche, machine learning, reason, and psycore) should report cases where knowledge cannot be properly represented in the current KNOW, or where KNOW sentences cannot be properly converted into links and nodes, so that the language can be revised accordingly.

Many related representation issues are discussed in Knowledge Representation for Inference.

 

2. The Grammar of KNOW

The grammar is written with the following notations:

Here comes the grammar:

<text> ::= {<sentence>}+
<sentence> ::= <relation> [<strength> <confidence>] [<type>]
    | <logicalOperator> {<sentence>}+
<type> ::= . | ? | !
<relation> ::= <builtInRelation>
    | <userDefinedRelation>
    | <mathRelationName> <formula> <formula>
    | <logicalOperator> {<relation>}+
<builtInRelation> ::= Inheritance <argument> {<argument>}+
    | Similarity <argument> {<argument>}+
    | Implication <sentence> {<sentence>}+
    | Equivalence <sentence> {<sentence>}+
    | Instance <argument> {<argument>}+
    | Membership <argument> {<argument>}+
    | Subset <argument> {<argument>}+
    | ExtensionalSimilarity <argument> {<argument>}+
    | Property <argument> <argument> <argument>
    | NumericProperty <argument> <argument> <measurement>
    | RelativeProperty <argument> <argument> <argument> [<measurement>]
    | Time <argument> <argument>
    | NumericTime <argument> <measurement>
    | RelativeTime <argument> <argument> [<measurement>]
    | Location <argument> <argument>
    | RelativeLocation <argument> <argument> <argument> [<measurement>]
    | Association <argument> <argument>
    | PartOf <argument> {<argument>}+
    | Own <argument> <argument>
    | Believe <argument> <sentence>
    | Want <argument> <sentence>
    | Before <argument> [<measurement>]
    | After <argument> [<measurement>]
<userDefinedRelation> ::= <relationName> {<argument>}+
<measurement> ::= <value> <unit>
<argument> ::= term
    | word
    | number
    | <variable>
    | <existentialVariable>
    | <sentence>
    | <modifier> <argument>
    | <argumentSet>
<argumentSet> ::= {<argument>}+
<formula> ::= number
    | EmptySet
    | <variable>
    | <existentialVariable>
    | <mathOperator> {<formula>}+
<variable> ::= term
<existentialVariable> ::= term [<dependency>]+
<dependency> ::= {<variable>}+
<mathRelationName> ::= == | != | > | < | >= | <=
<mathOperator> ::= + | - | * | /
    | exponent | integer division | intersection | union | difference | symmetricDifference
    | set | list | powerSet | subsetUnion
<logicalOperator> ::= AND | OR | NOT | Ordered-AND
<relationName>: term
<modifier>: term or number
<unit>: term
<value>: real number
<strength>: real number in [0, 1]
<confidence>: real number in [0, 1]


The outermost <text> tag should have three attributes:  source, translator, and dateTranslated.  The source field may be a reference to the source document used to create the encoding (example: mizar or cyc), or it may have the value created (which would imply that the translator has encoded knowledge from her/his own knowledge without the benefit of any particular source).  The translator field holds the name of the person doing the XML encoding, though this could potentially be a reference to a translation script in some cases.  If a script is referenced in the translator field, it's probably best to let the contents of the field be a URL pointing to the script's location on our Intelligenesis intranet (presently hosted at https://snowcrash.intelligenesis.net).  The dateTranslated field is the date on which the XML encoding was completed.  It should be updated whenever changes are made to the content.

An example:   <text source="mizar axioms:13" translator="charlie derr" dateTranslated="06/18/2000">


Sentences generated according to this formal grammar can be saved in two forms: in XML or in plain text.

For now, in the XML version, each non-terminal item in the grammar corresponds to a tag with the same name.  For example, the knowledge "John gave Mary a book written by himself" will become

<text>
  <sentence>
    <relation> <userDefinedRelation> <relationName> give </relationName>
                        <argument> John </argument> <argument> Mary </argument> <argument> book1 </argument>
                    </userDefinedRelation>
    </relation> <strength> 1.0 </strength> <confidence> 0.9 </confidence>
  </sentence>
  <sentence>
    <relation> <builtInRelation> Instance
                        <argument> book1 </argument> <argument> book </argument>
                    </builtInRelation>
    </relation> <strength> 1.0 </strength> <confidence> 0.9 </confidence>
  </sentence>
  <sentence>
    <relation> <userDefinedRelation> <relationName> author </relationName>
                        <argument> book1 </argument> <argument> John </argument>
                    </userDefinedRelation>
    </relation> <strength> 1.0 </strength> <confidence> 0.9 </confidence>
  </sentence>
</text>

Here strength and confidence take default values.
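As a rough illustration of the tag-per-non-terminal convention, the following Python sketch builds one <sentence> inside a <text> element with the standard library's xml.etree.ElementTree.  The helper name user_relation_sentence and the attribute values are assumptions made for this example only; they are not part of KNOW or of Webmind.

```python
import xml.etree.ElementTree as ET

def user_relation_sentence(name, args, strength, confidence):
    """Build a <sentence> element holding one user-defined relation (hypothetical helper)."""
    sentence = ET.Element("sentence")
    relation = ET.SubElement(sentence, "relation")
    udr = ET.SubElement(relation, "userDefinedRelation")
    ET.SubElement(udr, "relationName").text = name
    for arg in args:
        ET.SubElement(udr, "argument").text = arg
    ET.SubElement(sentence, "strength").text = str(strength)
    ET.SubElement(sentence, "confidence").text = str(confidence)
    return sentence

# The three attributes of the outermost <text> tag (values are illustrative only).
text = ET.Element("text", {"source": "created",
                           "translator": "example translator",
                           "dateTranslated": "03/23/2001"})
text.append(user_relation_sentence("give", ["John", "Mary", "book1"], 1.0, 0.9))

xml_string = ET.tostring(text, encoding="unicode")
print(xml_string)
```

The sketch covers only user-defined relations; built-in relations, which mix the relation name with child tags, would need a slightly different helper.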

In the future, a DTD will be defined for KNOW to reduce the number of tags in the XML files.

Another way is to represent KNOW text as plain text, given the following conventions:

(1) Put each <sentence> in a [ ].
(2) If a <text> contains more than one <sentence>, put them in a { }, otherwise just use the <sentence>.
(3) Use an underscore("_") instead of whitespace when including a multi-word phrase as an argument or relation name.
(4) Separate adjacent items by a space.
(5) If an argument is preceded by a series of modifiers, put all of them in ( ).
(6) Put an argument set in <>.
(7) Represent a <variable> (universal variable by default) by preceding the variable name with *_.
(8) Represent an <existentialVariable> by preceding the variable name with ?_.
(9) If an <existentialVariable> has dependencies, they should immediately follow the existential variable, and be enclosed in [].

Consequently, the same text becomes:

{ [give John Mary book1 1.0 0.9] [Instance book1 book 1.0 0.9] [author book1 John 1.0 0.9]}
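Conventions (1), (2), (3), (4), and (6) can be sketched as a small Python serializer.  The Sentence class and the function names are hypothetical, chosen only for this illustration; modifiers and variables (conventions 5, 7, 8, and 9) are left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Sentence:
    """One relation with an optional truth value (hypothetical in-memory form)."""
    name: str
    args: list
    strength: Optional[float] = None
    confidence: Optional[float] = None

def render(item):
    """Render a sentence, an argument set (Python list), or a single token."""
    if isinstance(item, Sentence):
        parts = [render(item.name)] + [render(a) for a in item.args]
        if item.strength is not None and item.confidence is not None:
            parts += [str(item.strength), str(item.confidence)]
        return "[" + " ".join(parts) + "]"                      # convention (1)
    if isinstance(item, list):
        return "<" + " ".join(render(a) for a in item) + ">"    # convention (6)
    return "_".join(str(item).split())                          # convention (3)

def render_text(sentences):
    """Wrap several sentences in { }; a single sentence is returned as is (convention 2)."""
    rendered = [render(s) for s in sentences]
    return rendered[0] if len(rendered) == 1 else "{" + " ".join(rendered) + "}"

example = render_text([
    Sentence("give", ["John", "Mary", "book1"], 1.0, 0.9),
    Sentence("Instance", ["book1", "book"], 1.0, 0.9),
    Sentence("author", ["book1", "John"], 1.0, 0.9),
])
print(example)
```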
 

3. Explanation and Examples

In the following, the grammar rules will be explained one by one, with examples (in plain text format) when necessary.

A text is the largest unit of knowledge and it contains a sequence of sentences.

Example: { [give John Mary book1 1.0 0.9] [Instance book1 book 1.0 0.9] [author book1 John 1.0 0.9]}
 

A sentence is either simple, containing a single relation, or compound, formed from multiple sentences.

When <strength> and <confidence> are omitted from an encoding, a default truth value is used.  The default truth value for math relations, and for relations embedded in a higher-order relation, should be <1,1>.  The default truth value for empirical relations should be <1,c>, where c is a system parameter (initially this value should be set near 0.9).

For example, [Implication [Inheritance A B] [Inheritance C D]] 

should be understood as [Implication [Inheritance A B 1.0 1.0] [Inheritance C D 1.0 1.0] 1.0 0.9]
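The default rule can be sketched in Python as follows.  The Sentence class, the function fill_defaults, and the constant names are assumptions made for this illustration; the value 0.9 stands in for the system parameter c.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_EMPIRICAL_CONFIDENCE = 0.9   # system parameter c (assumed initial value)
MATH_RELATIONS = {"==", "!=", ">", "<", ">=", "<="}

@dataclass
class Sentence:
    """One relation with an optional truth value (hypothetical in-memory form)."""
    name: str
    args: list
    strength: Optional[float] = None
    confidence: Optional[float] = None

def fill_defaults(sentence, embedded=False):
    """Return a copy of `sentence` in which omitted truth values are filled in."""
    # Sentences used as arguments are embedded in a higher-order relation.
    args = [fill_defaults(a, embedded=True) if isinstance(a, Sentence) else a
            for a in sentence.args]
    if sentence.strength is not None and sentence.confidence is not None:
        s, c = sentence.strength, sentence.confidence       # explicitly given
    elif embedded or sentence.name in MATH_RELATIONS:
        s, c = 1.0, 1.0                                     # <1,1>
    else:
        s, c = 1.0, DEFAULT_EMPIRICAL_CONFIDENCE            # <1,c>
    return Sentence(sentence.name, args, s, c)

result = fill_defaults(Sentence("Implication", [
    Sentence("Inheritance", ["A", "B"]),
    Sentence("Inheritance", ["C", "D"]),
]))
print(result)
```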

There are 3 types of sentences: statements (the default, can be marked with "."), questions (marked with "?"), and commands (marked with "!").

Examples: 

"Apples are red"

[Property color apple red].

"Did John give Mary a book?"

[give John Mary book1]?  [Instance book1 book].
 

A relation can also be either simple or compound.

A simple relation is either built-in or user-defined.  The former uses reserved words of KNOW as the relation type, while the latter can have any relation type.  A math relation is a special kind of built-in relation.

The distinction between the two is made according to Webmind design considerations. A builtInRelation can be recognized and processed by special-purpose code in Webmind, while a userDefinedRelation is only used in reasoning by the default mechanism in the inference engine.

Since builtInRelations have hard-wired meaning in Webmind, they are usually processed more efficiently for special purposes.  On the other hand, processing too many relations in this way will decrease the flexibility and learning ability of the system.

Both types of relation consist of a relation name, followed by a non-empty argument list.  When users enter relations into the system, names of builtInRelations can be selected from a given list.  All the other names are taken as userDefinedRelations.  The order of the arguments does matter in most relations.  To specify the desired order, examples are used once a relation name is chosen or given.  We don't want to specify the argument structure by limiting the category of each argument, nor do we want to name the arguments.

Example: [give John Mary book1]

(in the following examples, the strength and confidence will be omitted.)
 

In general, all built-in relations can be put into three categories:

    (1) Inheritance relations, including Inheritance, Similarity, Implication, and Equivalence.  These are the basic logical relations, by which the other relations can be represented.  Variations of them include Instance, Membership, Subset, and ExtensionalSimilarity.  Subset and Membership are pure extensional relations that apply more to the domains of mathematics, or to situations defined by local contexts, than they do to general knowledge; hence their use in practice would be fairly limited.  Read Knowledge Representation for Inference for definitions of them.

    (2) Property relations, including Property, NumericProperty, RelativeProperty, Time, NumericTime, RelativeTime, Location, and RelativeLocation.  "Property" is a special kind of relation, and serves as a modifier of an item.  For the current purpose, we assume each property of an individual has a unique value.  Though the boundary between "relation" and "property" is fuzzy, it still makes sense, and such a line can be drawn in most cases.  For example, "Apple is red" can be seen as a relation among "apple", "color", and "red", but it is more natural to see it as assigning a value (red) to a property (color) of an object (apple).  By contrast, "John gave Mary a book" should not be put in this way.  Among the above relations, Property, NumericProperty, and RelativeProperty are the primary types, while the others are special cases of the three, where the "property" is either time or place.

    (3) Special relations, including Association, PartOf, Own, Believe, and Want. These relations have nothing special from a logical point of view, but since they appear very often in Webmind, as well as in human knowledge, they are given special treatment to improve the efficiency of the system.

All built-in relations will be described one by one in the following.
 

An Inheritance relation will become an InheritanceLink, indicating that the first argument is a special case of the second.

Example: [Inheritance bird animal]

When taking more than two arguments, [Inheritance a1, a2, ..., an] is identical to [Inheritance a1, a2], [Inheritance a2, a3], ..., [Inheritance an-1, an], with the same truth value for each relation.
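The multi-argument shorthand, which this document also uses for Similarity, Implication, Equivalence, PartOf, Membership, and Subset, can be sketched in Python.  The function name expand_chain and the tuple layout are assumptions made for this illustration.

```python
def expand_chain(name, args, truth=None):
    """Split [name a1 a2 ... an] into [name a1 a2], [name a2 a3], ..., [name an-1 an].

    Every resulting relation carries the same truth value as the original.
    """
    if len(args) < 2:
        raise ValueError("a chained relation needs at least two arguments")
    return [(name, args[i], args[i + 1], truth) for i in range(len(args) - 1)]

pairs = expand_chain("Subset",
                     ["integers", "rational_numbers", "real_numbers"],
                     truth=(1.0, 1.0))
for pair in pairs:
    print(pair)
```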

A Similarity relation will become a pair of SimilarityLinks, indicating that the two arguments are similar to each other (so their order doesn't matter).

Example: [Similarity dog cat]

When taking more than two arguments, [Similarity a1, a2, ..., an] is identical to [Similarity a1, a2], [Similarity a2, a3], ..., [Similarity an-1, an], with the same truth value for each relation.

An Implication relation will become a pair of ImplicationLinks, indicating that the first text is a sufficient condition of the second.  Here two texts are used, rather than arguments in general, because an implication relation is a higher-order relation between sentences.

Example: [Implication [give John Mary book1] [Own Mary book1]]

When taking more than two arguments, [Implication a1, a2, ..., an] is identical to [Implication a1, a2], [Implication a2, a3], ..., [Implication an-1, an], with the same truth value for each relation.

An Equivalence relation will become a pair of EquivalenceLinks, indicating that the two texts are sufficient conditions of each other (therefore the order of the texts doesn't matter).  Again, the arguments are texts, not words or phrases.

Example: [Equivalence [give John Mary book1] [receive Mary John book1]]

When taking more than two arguments, [Equivalence a1, a2, ..., an] is identical to [Equivalence a1, a2], [Equivalence a2, a3], ..., [Equivalence an-1, an], with the same truth value for each relation.

A Property relation indicates that an object (second argument) has a property (first argument) with a known value (third argument).  If the system contains two such relations that differ only in the last argument, some conflict resolution action will be taken.

Example: [Property color apple red]
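Detecting such a conflict can be sketched in Python as below.  This only finds Property relations that disagree on the value; how the conflict is then resolved is not specified in this document, so no resolution step is shown.  The function name and the tuple layout are assumptions made for this illustration.

```python
from collections import defaultdict

def find_property_conflicts(relations):
    """Group Property relations by (property, object); report keys with several values."""
    values = defaultdict(set)
    for name, prop, obj, value in relations:
        if name == "Property":
            values[(prop, obj)].add(value)
    return {key: vals for key, vals in values.items() if len(vals) > 1}

conflicts = find_property_conflicts([
    ("Property", "color", "apple", "red"),
    ("Property", "color", "apple", "green"),
    ("Property", "shape", "Earth", "round"),
])
print(conflicts)
```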
 

A NumericProperty relation indicates that an object (second argument) has a property (first argument) with a known numeric value (measurement), where <measurement> consists of a number and a unit.

Example: [NumericProperty height  John 1.75 meter]
 

A RelativeProperty relation indicates that an object (second argument) has a larger value in a property (first argument) than another object (third argument), with their difference as an optional argument.  The implementation of this relation type is not yet decided: it can either become a special PropertyLink, or a Property of a relation.

Example: [RelativeProperty height John Tom 0.2 meter]
 

The Time relation is a special case of Property, with "time" as the property, which is no longer indicated by an argument, but by the relation name.  The first argument may be a text.

Example: [Time [give John Mary book1] yesterday]
 

Similar to the above, NumericTime is a special case of NumericProperty, with "time" as the property to be measured.

Example: [NumericTime day 24 hour]
 

Similar to the above, RelativeTime is a special case of RelativeProperty, with "time" as the property to be measured.

Example: [RelativeTime November December 1 month]

Before is a special case of RelativeTime, with the reserved word "now" as the implicit and omitted second argument.  It is used to refer to past events.  An optional measurement value may be included.

Example: [Before [eat John dinner] 1 hour]

After is a special case of RelativeTime, with the reserved word "now" as the implicit and omitted second argument.  It is used to refer to future events.  An optional measurement value may be included.

Example: [After [marry Jim Jane] 1 month]

Location is a special case of “event”, with "location" as the property, which is no longer indicated by an argument, but by the relation name.  The first argument may be a text.  It’s valid to use Location as a higher-order relation.

Example: [Location Intelligenesis New_York_City]
 

Similar to the above, RelativeLocation is a special case of RelativeProperty, with "location" as the property to be measured.  However, it still needs the third argument to give the direction of the relation (such as above/below, left/right, and so on).

Example: [RelativeLocation [meet John Mary] Central_Park south 1 mile]
 

An Association relation indicates that the first argument is associated with the second one (in an unspecified way), and it will become a pair of AssociativeLinks.

Example: [Association doctor nurse]
 

A PartOf relation indicates that the first argument is part of the second argument, and it will become a PartOfLink and a ContainLink.

So far there is no special code for this relation except in the NL module.

Example: [PartOf head body]

When taking more than two arguments, [PartOf a1, a2, ..., an] is identical to [PartOf a1, a2], [PartOf a2, a3], ..., [PartOf an-1, an], with the same truth value for each relation.

An Own relation indicates that the first argument is the owner of the second argument.

Example: [Own Mary book1]
 

Since "believe" is a frequently used relation, it will become a PropertyLink (BeliefLink) from a system (human or computer) to a text.  When the details are not important, the same KNOW relation can also be used to represent semantic relations such as "know", "guess", "see", "hear", and so on, with different default truth values.

Example: [Believe Pei {[Property shape Earth flat] [Property made_of Moon cheese]}]
 

Want is similar to Believe, except that the text indicates relations that the system wants to be true at a future time.  This relation will be used to represent goals, hopes, desires, and so on.  To be general, the second argument must be a text, not a word or phrase; therefore Want is also a higher-order relation here (which is one of the several senses of "want").

Example: [Want Mary [give John Mary book1]]
 

Membership is an ordered relation.  When used with just two arguments, it signifies that the first argument is a member of the second.  When used with n (for n>2) arguments, it signifies that argument1 is a member of argument2, argument2 is a member of argument3, ..., and argumentn-1 is a member of argumentn.  This relation may be used for any set-theoretic relations.  It may also be useful to represent "sets" of internal PsyNet entities (nodes and links).  For use cases where one of the <argument>s is actually an <argumentSet>, see the definition below for <argumentSet>.  Basically, it's as if the membership relation is separately applied to each member of the <argumentSet>.

Examples:  [Membership pi real_numbers]  [Membership <4 5 8 20000023> natural_numbers]
 

Subset is an ordered relation.  When used with just two arguments, it signifies that the first argument is a subset of the second.  When used with n (for n>2) arguments, it signifies that argument1 is a subset of argument2, argument2 is a subset of argument3, ..., and argumentn-1 is a subset of argumentn.  This relation may be used for any set-theoretic relations.  For use cases where one of the <argument>s is actually an <argumentSet>, see the definition below for <argumentSet>.  Basically, it's as if the subset relation is separately applied to each member of the <argumentSet>.

Examples:  [Subset natural_numbers integers] [Subset integers rational_numbers real_numbers]
 

Instance is an ordered relation.  When used with just two arguments, it signifies that the first argument is an instance of the second.  This relation may also be thought of as "member inheritance".  When used with more than 2 arguments, it follows the same pattern as Membership and Subset.  The same holds when any of the <argument>s is actually an <argumentSet>.

Examples:

            [Instance Jerry mouse]
 

A user-defined relation will become multiple-target InheritanceLinks in Webmind.  It is necessary to confirm that every relationName is used with the same argument structure, which specifies the number, type, and order of the arguments.

Example: [give John Mary book1]
 

A math operator may be used in formulas to represent the standard mathematical operations.  More operators may be added to this list in the future if necessary.  The first four operators have their usual arithmetic meaning.  Exponent takes two arguments and represents "y to the x power", where y is the first argument and x is the second.  Integer division is "exact division", which applies only to the natural numbers, and yields a (whole number) quotient and a remainder.  Intersection, union, difference, and symmetric difference are the expected operations when the arguments are sets.  The set operator creates a set out of whatever arguments are given to it.  The list operator is exactly the same except that order matters in the argument list, i.e.  [ == (set a b) (set b a) ]  but  [ != (list a b) (list b a) ].  The powerSet operator takes only one argument and returns all the subsets of that argument.  The subsetUnion operator (probably soon to be renamed) also takes only one argument (a set), and returns the union of the members of the elements of that set.
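The less familiar operators can be illustrated with Python's built-in set type.  The function names below are hypothetical stand-ins for powerSet, subsetUnion, and integer division; they are meant as a sketch of the intended semantics, not as Webmind code.

```python
from itertools import chain, combinations

def power_set(s):
    """powerSet: all subsets of `s`, each returned as a frozenset."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return {frozenset(sub) for sub in subsets}

def subset_union(set_of_sets):
    """subsetUnion: the union of the members of the elements of a set of sets."""
    return set().union(*set_of_sets)

def integer_division(a, b):
    """Integer division on natural numbers: returns (quotient, remainder)."""
    return divmod(a, b)

print(power_set({1, 2}))
print(subset_union({frozenset({1, 2}), frozenset({2, 3})}))
print(integer_division(17, 5))
print({1, 2, 3} ^ {3, 4})            # symmetricDifference
print({"a", "b"} == {"b", "a"})      # set: order does not matter
print(["a", "b"] != ["b", "a"])      # list: order matters
```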

Example:  (* 18 9)
 

A logical operator takes sentence(s) as its arguments.  It will be placed in curly brackets, with the argument sentence(s) following.  NOT is a unary operator, and can only have a single sentence argument.  NOT should be used with higher-order relations; for first-order relations, negation should be represented by the appropriate truth value.  AND and OR may have two or more sentence arguments.  Ordered-AND can be used in a schema to indicate the order in which commands are executed.

Example:  {Implication [Instance *_personX human] [OR [Instance *_personX male] [Instance *_personX female]]}
 

A math relation takes two arguments and represents the relationship between them.  Because it is a mathematical structure, the relation will always carry a truth value <s,c> of <1,1> (at least in every case we've been able to envision so far).

Examples:  (!= 4 8)    (== 2 (/ 4 2))

A formula corresponds to the expected mathematical construct.  It is a container in which we can put a number, a variable, or some combination that uses <mathOperator>s to link them together.

Examples:  (9.12)   (-5)  (0)   (+ 4 5)  (- 8 (/ 9 3))
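To make the prefix notation concrete, the following Python sketch evaluates formulas and math relations written in the parenthesized plain-text form used in the examples.  It is an illustration under stated assumptions: only the four arithmetic operators and the six math relations are handled, variables and set operators are left out, and all function names are hypothetical.

```python
import operator
from functools import reduce

MATH_OPERATORS = {"+": operator.add, "-": operator.sub,
                  "*": operator.mul, "/": operator.truediv}
MATH_RELATIONS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt,
                  "<": operator.lt, ">=": operator.ge, "<=": operator.le}

def tokenize(text):
    """Split the text into parentheses, operator names, and numbers."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Consume one formula from the front of `tokens`; return a float or a nested list."""
    token = tokens.pop(0)
    if token != "(":
        return float(token)
    items = []
    while tokens[0] != ")":
        head = tokens[0]
        if not items and (head in MATH_OPERATORS or head in MATH_RELATIONS):
            items.append(tokens.pop(0))      # leading operator or relation name
        else:
            items.append(parse(tokens))      # number or nested formula
    tokens.pop(0)                            # discard the closing ")"
    return items

def evaluate(tree):
    """Evaluate a parsed formula (to a float) or math relation (to a bool)."""
    if isinstance(tree, float):
        return tree
    head, *rest = tree
    if isinstance(head, str) and head in MATH_OPERATORS:
        return reduce(MATH_OPERATORS[head], [evaluate(r) for r in rest])
    if isinstance(head, str) and head in MATH_RELATIONS:
        left, right = rest
        return MATH_RELATIONS[head](evaluate(left), evaluate(right))
    return evaluate(head)                    # a parenthesized single value, e.g. (9.12)

def evaluate_text(text):
    return evaluate(parse(tokenize(text)))

print(evaluate_text("(- 8 (/ 9 3))"))
print(evaluate_text("(== 2 (/ 4 2))"))
```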

A <variable> is a universal variable (the name of a universal variable should be preceded by "*_"), which is used to represent a variable in a relation that is not bound, constrained, or limited by any other statement in the expression.

Example:

"Asia is the largest continent"

{[Instance Asia continent] [Implication [Instance *_continent1 continent] [RelativeProperty size *_continent1 Asia]]}

An existential variable (whose name should be preceded by "?_") represents a variable in a relation that is somehow constrained by, or dependent on, some other variable.  When constrained by another variable (which would mean that there exist dependencies), it usually comes from an expression/theorem/formula which has a "there exists <something> such that <something else>".  In this case, any variables which appear in the <something else> part of the expression will be existentialVariables, and the dependency(ies) will be upon any variable(s) that appear in the <something> part of the expression.

Example:

"What did John give Mary?"

[give John Mary ?_gift]?

A dependency is what we use to represent the constraints on an existential variable.  Dependencies (when they exist) will always be universal variable(s).

An argument can either be a simple one or a complex one.

A simple argument may be a word, a phrase, or a number.

A complex argument can be either a KNOW text or an argument modified by a modifier, where the modifier itself has the form of a simple argument.

A modifier usually corresponds to a property, with the property name omitted.  For example, "red apple" corresponds to a kind of apple whose "color" property has "red" as its value.  In cases where the property type is unclear, or is not a commonly agreed upon property type, modifiers may be treated as Inheritance relations, where the modifier then represents the category of all such things.

Example: [Inheritance orchid beautiful]

An argument set will cause the relation to be duplicated for each argument in the set, with the same truth value.

Example: [Inheritance <A B> C] is identical to [Inheritance A C] and [Inheritance B C], while [Inheritance A <B C>] is identical to [Inheritance A B] and [Inheritance A C].

Please note that an "argument set" is not the same as multiple arguments.  For example, [Inheritance A B C] is identical to [Inheritance A B] and [Inheritance B C].
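The duplication rule for argument sets can be sketched in Python as follows.  The ArgSet marker class and the function name are hypothetical.  The behavior when more than one argument of the same relation is an argument set is not stated in this document; the sketch assumes that every combination is generated.

```python
from itertools import product

class ArgSet(tuple):
    """Marks a group of arguments written as <a b ...> in plain-text KNOW."""

def expand_argument_sets(name, args):
    """Duplicate the relation once for each member of every argument set."""
    # A plain argument contributes one choice; an ArgSet contributes one per member.
    choices = [tuple(a) if isinstance(a, ArgSet) else (a,) for a in args]
    # Assumption: several argument sets in one relation expand to all combinations.
    return [(name, *combo) for combo in product(*choices)]

left = expand_argument_sets("Inheritance", [ArgSet(("A", "B")), "C"])
right = expand_argument_sets("Inheritance", ["A", ArgSet(("B", "C"))])
print(left)
print(right)
```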
 

A user-defined relation name can be any English word or phrase.

Example: [like John Stacy]

A unit, as used in NumericProperty, should be an English word or phrase that is a measurement unit.

Example: [NumericProperty length foot 12 inch]
 

A property value can be any real number.
 

The strength of a sentence is defined in the same way as the strength of a link in Webmind.  Various default values can be used for different verbal expressions, such as "usually", "hardly", "is", "is not", and so on.
 

The confidence of a sentence is defined in the same way as the confidence of a link in Webmind (see Knowledge Representation for Inference).  Various default values can be used for different verbal expressions, such as "I guess", "in fact", and so on.  Usually the extreme values (0 for no evidence, 1 for complete evidence) are not allowed for empirical knowledge.  In the current design, confidence 0 is used in questions, and confidence 1 is used for definitional and conventional knowledge, such as in mathematics.
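These range rules can be sketched as a small Python check.  The function name and the three kind labels ("empirical", "question", "definitional") are assumptions made for this illustration, and since the text says the extreme values are only "usually" excluded for empirical knowledge, the check reports problems rather than rejecting input.

```python
def check_truth_value(strength, confidence, kind="empirical"):
    """Return a list of problems; an empty list means the pair looks acceptable."""
    problems = []
    if not 0.0 <= strength <= 1.0:
        problems.append("strength must be in [0, 1]")
    if not 0.0 <= confidence <= 1.0:
        problems.append("confidence must be in [0, 1]")
    elif kind == "empirical" and confidence in (0.0, 1.0):
        problems.append("empirical knowledge should not use confidence 0 or 1")
    elif kind == "question" and confidence != 0.0:
        problems.append("questions use confidence 0")
    elif kind == "definitional" and confidence != 1.0:
        problems.append("definitional knowledge uses confidence 1")
    return problems

print(check_truth_value(1.0, 0.9))
print(check_truth_value(1.0, 1.0))
print(check_truth_value(1.0, 1.0, kind="definitional"))
```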

 

In this version, we distinguish 3 types of terminals in KNOW.

A (normal) term is the name of a node (concept). It should be a string of characters starting with a letter.  Example: bird, red_apple.

An anonymous term is the internal ID of an unnamed node (concept).  It should be a string of characters starting with the symbol "_".  Example: _bird125, _red_apple03.

A word is an English word or phrase, quoted between " ".  Example: "bird", "red apple".

A number is just a numerical value.  Example: 103.45, 0037.
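A plain-text token can be classified along these lines.  The Python sketch below also recognizes the "*_" and "?_" variable prefixes from the plain-text conventions in Section 2.  The function name and the regular expression used for numbers are assumptions made for this illustration.

```python
import re

def classify_token(token):
    """Guess the kind of a single plain-text KNOW token."""
    if token.startswith("*_"):
        return "variable"
    if token.startswith("?_"):
        return "existentialVariable"
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return "word"
    if re.fullmatch(r"[+-]?\d+(\.\d+)?", token):   # assumed number syntax
        return "number"
    if token.startswith("_"):
        return "anonymous term"
    if token[:1].isalpha():
        return "term"
    return "unknown"

for token in ["bird", "_bird125", '"red apple"', "103.45", "*_personX", "?_gift"]:
    print(token, "->", classify_token(token))
```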