Loglish

Ben Goertzel

Introduction

The goal of this brief document is to describe a new language, which I call Loglish.

How can a whole language be described in such a brief document? The answer is that it’s not really all that new: rather, it’s a synthesis of two existing languages, the natural language English and the constructed language Lojban. Thus, Loglish might be considered a “semi-natural” language. I’m not sure whether other languages in this category exist.

The term “Loglish” has often been used in the Lojban literature as a pejorative, to refer to usages of Lojban that are overly English-like, thus violating the Lojbanic ethic of cultural neutrality. That usage of the term “Loglish” is a bit different from the usage I promote here.

The goal of Loglish is to make a language that maintains the elegant and mathematically correct logical structure of Lojban, but is much easier to learn and use.

Pragmatically, my main goal is to create a language that will make it relatively easy for a variety of humans to communicate with AI systems that are below human level in their general intelligence, but exceed human beings in some forms of logical inference and other capabilities.

More generally, Loglish seems to capture what is really interesting about Lojban, and to avoid what’s really annoying about Lojban (the lack of a full vocabulary; and the time-consuming and tedious nature of learning a large unfamiliar vocabulary in a language that lacks good teaching materials or the possibility of learning via immersion in a community of speakers). Of course, I realize that Loglish lacks some of the unique beauty and “cool value” of Lojban, but it maintains the primary purpose of Lojban, which is to have a language whose syntax and semantics are based on predicate logic, and whose words tend to have relatively unambiguous meanings. From the perspective of human/AI communication, these aspects of Lojban are what really matters.

This document is an initial conceptual overview rather than a complete language specification; there are probably plenty of omissions, errors, and so forth.

This document assumes the reader has at least a rudimentary understanding of Lojban; see e.g. my article www.goertzel.org/new_research/lojban_AI.pdf and references therein.

Finally, I note that Loglish as described here is not the first instantiation of the idea of merging English vocabulary with Lojban syntax. There is an entry on “Anglish” in the Lojban Wiki site which discusses such attempts and dismisses them rather summarily; and I am told there were earlier attempts in the pre-Lojban era of Loglan. I don’t know the details of these prior attempts, however, and my view is that with constructed language as with many other things, the devil is in the details. Lojban itself works reasonably well because of the thoughtful details that went into its construction, which set it apart from more facile attempts to make humanly-facile versions of predicate logic. Similarly, if Loglish works (as I believe it will, though only experience will tell) it will be because the details have been worked out thoughtfully and properly. The failure of prior attempts at English/Lojban or /Loglan hybrids doesn’t tell you much about the prospects for Loglish. Among other factors differentiating Loglish from prior similar attempts is Loglish’s use of WordNet and FrameNet to aid in automating the parsing process; without some mechanism like this, Loglish would not support automated parsing and would lose much of its appeal.

I have discussed Loglish a bit with members of the Lojban community, and the main objections are:

Loglish is not as elegant and pure as Lojban (true enough)
Loglish would prove very difficult to speak, because the use of English words would constantly push the speaker psychologically to use English grammar and violate the logic-based Loglish syntax

I believe the second complaint is unfounded, but the only way to tell for sure is to write a Loglish parser, practice creating sentences that it can parse, and see how difficult that is. Personally, I suspect it will be quite easy for anyone familiar with both English and predicate logic to get the knack. Some people have drawn an analogy between Loglish and restricted forms of English such as Basic English (which retains full English syntax but with restricted vocabulary), which are indeed difficult for people to speak (it’s hard to stay within the arbitrary restrictions); but I don’t think the analogy is such a good one.

Initial, Rough Language Specification

The basic definition of Loglish is as follows:

The Lojban cmavo are retained
The basic Lojban sentence structure is retained
Lojban vocabulary, aside from cmavo, is replaced by a combination of

English nouns/verbs/adverbs/adjectives, in their root forms
special compounds formed from these English words (these compounds represent English word-senses and are called “sense-compounds”)

There are also a couple of special rules needed to preserve the unambiguity of parsing:

English words or names that are characterologically equivalent to Lojban cmavo should generally be omitted, or else need to be prefaced by a special prefix like “eng” (for instance if one wants to refer to “Poe” in speech without confusing it with the cmavo “po”, one can simply refer to “eng Poe”)
English collocations should be written with an underscore in the middle (e.g. “white_house” for the place the US President lives)

By “root form”, it is meant e.g. that English verbs used must be in the infinitival form; English nouns must be singular. For instance, “run” not “runs” or “ran”; “pig” not “pigs.”

Also, negative words are not allowed (e.g. “unambiguous” is not allowed, instead one must say “na ambiguous”, using the Lojban negation operator). Lojban constructions should be used in place of “and” and “or” as well (to avoid the ambiguity of the English words, e.g. English “and” may be either Boolean or sequential).
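The collision rule above can be illustrated with a very small sketch. It assumes a hand-picked subset of Lojban cmavo that happen to be spelled like English words (a real tool would load the full cmavo list), and it only catches exact spelling matches, not sound-alikes such as “Poe”. The function name escape_collisions is hypothetical.

```python
# Tiny illustrative subset of Lojban cmavo that are spelled like English words.
# Assumption: a real implementation would load the complete cmavo list instead.
CMAVO_SUBSET = {"go", "do", "me", "no", "be"}


def escape_collisions(english_words):
    """Insert the 'eng' prefix before any English word spelled like a cmavo."""
    out = []
    for word in english_words:
        if word.lower() in CMAVO_SUBSET:
            out.append("eng")
        out.append(word)
    return out


print(escape_collisions(["go"]))       # the colliding word gets an 'eng' prefix
print(escape_collisions(["proceed"]))  # a non-colliding word is left unchanged
```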

The subtlest part of Loglish syntax is the sense-compound mechanism: “special compounds formed from English words,” built using the new Loglish cmavo “qui” and the Lojban cmavo “fi’o”.[1]

Qui is a cmavo used in Loglish to create words with unambiguous senses, as in the example:

(English word) qui (English word)

The second English word in the compound is a sense-specifier. In some cases one might want two modifiers, using the form

(English word) qui (English word) qui (English word)

For instance, one might say “doctor qui medical” to refer to a doctor, but to specify which sense of the word “doctor” is intended (a medical doctor). Or, one might say, “doctor qui alter” to indicate the verb “doctor” in the sense of “alter with an intent to deceive.” Note that there is some flexibility in what words one uses as sense-specifiers: for instance, one could just as well say “doctor qui medical qui person” so as to be sure the listener doesn’t misinterpret “doctor qui medical” as meaning “doctor qui cure” (the act of medically doctoring).

Sense compounds are an area where Loglish leaves the speaker some flexibility regarding the balance between concision and precision. It is possible to ignore sense-specifiers altogether and just use English vocabulary straight up within Loglish, with all its beautiful and frustrating ambiguity. But it’s also possible to use sense-specifiers to make Loglish words just as crisp and unambiguous as Lojban words – or more so.
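As a rough illustration of the “X qui Y qui Z” form described above, the following sketch splits a sense-compound into its head word and its sense-specifiers. The names SenseCompound and parse_sense_compound are hypothetical, and the sketch assumes each specifier is a single word.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SenseCompound:
    head: str                         # the English word being disambiguated
    specifiers: Tuple[str, ...] = ()  # zero or more sense-specifiers


def parse_sense_compound(text: str) -> SenseCompound:
    """Split 'X qui Y qui Z' into head X and specifiers (Y, Z)."""
    tokens = text.split()
    if not tokens or tokens[0] == "qui":
        raise ValueError("a sense-compound must start with an English word")
    head, rest = tokens[0], tokens[1:]
    if len(rest) % 2 != 0:
        raise ValueError("each 'qui' must be followed by exactly one word")
    specifiers = []
    # Walk the remainder as (marker, word) pairs: 'qui medical', 'qui person', ...
    for marker, word in zip(rest[0::2], rest[1::2]):
        if marker != "qui" or word == "qui":
            raise ValueError(f"expected 'qui <word>', got '{marker} {word}'")
        specifiers.append(word)
    return SenseCompound(head, tuple(specifiers))


print(parse_sense_compound("doctor qui medical qui person"))
```

A bare word such as “doctor” parses to a compound with no specifiers, which corresponds to using English vocabulary “straight up.”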

Finally, there is the important issue of the argument-structure of Loglish content words. This is handled via three mechanisms: some conventions, the “fi’o” connective, and a dictionary.

First, the conventions:

Nouns are interpreted to have one argument, which is interpreted as a member of the category denoted by the noun
Adjectives/adverbs are taken to have two arguments: the first is the entity modified by the adjective/adverb, the second is the extent to which the modification holds
Intransitive verbs are interpreted to have at least one argument, which is interpreted as the argument of the predicate represented by the verb
Transitive verbs are interpreted to have at least two arguments, the subject and then the object
Ditransitive verbs are interpreted to have three arguments, and conventions must be made for each of these cases, e.g.

give x y z may be interpreted as “x gives y to z”
take x y z may be interpreted as “x takes y from z”

A rule of thumb is that the agent comes first, the recipient comes last, and the object comes in between.
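The conventions above might be encoded as a small lookup table, as in the sketch below. The role names (“member”, “modified”, “extent”, “source”, and so on) are my own labels for illustration, not part of any specification, and the dictionary holds only the two ditransitive verbs mentioned above.

```python
# Default argument roles by part of speech, following the conventions above.
# Role names are illustrative assumptions.
DEFAULT_ROLES = {
    "noun": ("member",),
    "adjective": ("modified", "extent"),
    "adverb": ("modified", "extent"),
    "intransitive": ("subject",),
    "transitive": ("subject", "object"),
}

# Per-verb conventions for ditransitive verbs (only the two examples given).
DITRANSITIVE_ROLES = {
    "give": ("agent", "object", "recipient"),
    "take": ("agent", "object", "source"),
}


def label_arguments(word, pos, args):
    """Pair positional arguments with role names for the given word."""
    if pos == "ditransitive":
        roles = DITRANSITIVE_ROLES.get(word)
    else:
        roles = DEFAULT_ROLES.get(pos)
    if roles is None:
        raise KeyError(f"no argument convention known for {word!r} ({pos})")
    if len(args) > len(roles):
        raise ValueError("extra arguments need an explicit position marker")
    return dict(zip(roles, args))


print(label_arguments("give", "ditransitive", ["x", "y", "z"]))
```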

Next, the “fi’o” connective is intended to be used somewhat similarly to “qui”, but with a different purpose: rather than specifying a sense, it specifies an argument position. For instance,

la ben cu proceed fi’o to lo store

indicates that Ben is going to the store (note that we use “proceed” because it’s nicer than “eng go”, which is necessary rather than “go” since the latter is a Lojban cmavo as well as an English word), whereas

la ben cu proceed fi’o from lo store

indicates that Ben is going from the store. In the case of tanru, fi’o can be placed before the first item in the tanru, e.g.

la ben cu proceed fi’o food lo store

for “Ben goes to the food store.” Of course, qui and fi’o may be used together, as in

la ben cu proceed fi’o to lo field qui grass

le electron cu proceed fi’o to lo field qui magnet

la Ben cu murder lo chicken fi’o weapon lo pliers

Example 7

English

John discussed red-black trees in the oak tree

Loglish

puku vi le oak tree qui botany la John discuss le red-black tree qui computing

(A tree in computer science refers to a certain type of data structure; and a red-black tree is a particular kind of tree.)

(“puku” is Lojban for “earlier than now.”)

WordNet and FrameNet for Loglish Parsing

One of the strengths of Lojban is that it allows exact automated parsing via a standard parsing algorithm similar to those used for computer languages or toy formal languages. Loglish, on the other hand, should allow automated parsing but not quite as accurately or as elegantly as Lojban. Loglish parsing will require recourse to automated linguistic resources such as WordNet and FrameNet, and even so parsing accuracy won’t be 100%, though I suspect that 99% accuracy will be achievable.

Basically, Loglish parsing will be just like Lojban parsing with the addition of algorithms for using WordNet to resolve word-sense ambiguity, and FrameNet to resolve argument-position ambiguity. These algorithms may be very simple, e.g. for sense-disambiguation

"Resolve 'X qui Y' to the sense of X whose WordNet definition has the smallest semantic distance to Y."

and for argument-position-disambiguation

"Given 'X fi’o Z [article] Y', assign Y to the FrameNet argument position of X whose description has the smallest semantic distance to Z. If Z is not in FrameNet, then find the entry in FrameNet with the smallest semantic distance to Z, and use it in place of Z in the preceding process."

I think it's reasonable in Loglish to adopt the convention that the first WordNet sense of a word is taken as a default, and doesn't need to be specified.
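The sense-disambiguation rule quoted above, together with the first-sense default, can be sketched as follows. This is only an illustration: the glosses in TOY_GLOSSES are made up rather than taken from WordNet, and toy_distance is a crude stand-in (0 if the specifier occurs in the gloss, 1 otherwise) for a real WordNet-based semantic distance.

```python
# Hypothetical glosses, ordered as a sense inventory would order them.
TOY_GLOSSES = {
    "doctor": [
        "a licensed medical practitioner",
        "alter and make impure with intent to deceive",
    ],
}


def toy_distance(gloss, specifier):
    """Stand-in distance: 0.0 if the specifier occurs in the gloss, else 1.0."""
    return 0.0 if specifier.lower() in gloss.lower().split() else 1.0


def resolve_sense(word, specifier=None, glosses=TOY_GLOSSES):
    """Return (sense_index, gloss) for 'word qui specifier'."""
    senses = glosses[word]
    if specifier is None:
        return 0, senses[0]  # convention: the first sense is the default
    # min() returns the first minimal element, so ties fall back to sense 0.
    index = min(range(len(senses)),
                key=lambda i: toy_distance(senses[i], specifier))
    return index, senses[index]


print(resolve_sense("doctor", "medical"))
print(resolve_sense("doctor", "alter"))
```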

Also, one may adopt the convention that, when a later-than-first sense of a word is specified, then the next uses of the word are taken to have this other sense, until another sense is specified for the word.

So, in

la Ben cu murder lo chicken qui coward fi’o weapon lo pliers

la Dr. Who cu resurrect lo chicken

the "chicken" in the second sentence would be automatically taken as "chicken qui coward" rather than "chicken qui bird".
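One minimal way to model this carry-over convention is a small context object that remembers the last sense-specifier given for each word. The class name SenseContext is hypothetical, and returning None to mean “default first sense” is an assumption of this sketch.

```python
class SenseContext:
    """Remembers the most recently specified sense of each word."""

    def __init__(self):
        self._current = {}  # word -> last sense-specifier used with it

    def use(self, word, specifier=None):
        """Record a use of `word`; return the specifier now in force.

        None means no sense has been specified yet, i.e. the default
        (first) sense applies.
        """
        if specifier is not None:
            self._current[word] = specifier
        return self._current.get(word)


ctx = SenseContext()
ctx.use("chicken", "coward")   # "... lo chicken qui coward ..."
print(ctx.use("chicken"))      # the later bare "chicken" inherits the sense
```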

In some empirical work studying humans interacting with natural language processing systems, we found it is quite rare for a word to be used in more than one sense in the same context (though these experiments involved descriptive rather than poetic discourse).

It’s worth noting that the process of computing semantic distances using WordNet is not fanciful at all and has already been implemented by several scientists. For some links into the literature, see: www.d.umn.edu/~tpederse/tools.htm, http://www.citeulike.org/user/schaal/article/266098. In our own work with WordNet, my own software team wound up brewing our own code for this, since none of these approaches were ideal.

As an example of FrameNet, let’s look at the FrameNet entry for the verb “kill.” Note that "kill" is a simple two-placed verb in English syntax, but it has more than one argument in English semantics, and the FrameNet entry reflects this. As one may observe at http://framenet.icsi.berkeley.edu/index.php, the following case-roles may be associated with the English verb "kill":

Case-role       Type
---------------------------
Cause           Core
Degree          Peripheral
Depictive       Extra-Thematic
Instrument      Peripheral
Killer          Core
Manner          Peripheral
Means           Peripheral
Place           Peripheral
Purpose         Extra-Thematic
Reason          Extra-Thematic
Result          Extra-Thematic
Time            Peripheral
Victim          Core

Of course, if Loglish speakers become familiar with the habits used in FrameNet to label argument-positions, this will make the Loglish parser’s job a lot easier in terms of argument-position disambiguation.

As an example of the use of FrameNet in the parsing process, suppose someone says

la Ben cu murder lo chicken fi’o weapon lo pliers

This means that "Ben murders chickens using pliers as a weapon."

Now, suppose there is no FrameNet entry for "murder." Then the Loglish parser needs to find the closest semantic match in FrameNet, using WordNet-based semantic distance. Suppose this match is "kill". Then the parser must use WordNet similarity to figure out that "weapon" matches to "instrument" not "agent" or "patient" (a particularly easy problem since "Ben" and "chicken" are filling the "agent" and "patient" slots anyway, but it won't always be this easy).
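The two steps just described (fall back to the nearest verb that has a frame, then pick the closest unfilled role) can be sketched with toy data. Everything in the tables below is an assumption made for illustration: the role list is a subset of the case-roles listed earlier for "kill", and the distances are hand-written numbers standing in for a WordNet-based measure.

```python
# Toy stand-ins for FrameNet and for WordNet-based semantic distance.
TOY_FRAMES = {"kill": ["Killer", "Victim", "Instrument", "Place", "Time"]}
TOY_NEAREST_FRAME = {"murder": "kill"}  # assumed closest verb with a frame
TOY_DISTANCE = {
    ("weapon", "Instrument"): 0.2,
    ("weapon", "Killer"): 0.7,
    ("weapon", "Victim"): 0.8,
}


def resolve_role(verb, tag, filled=()):
    """Choose the argument position of `verb` that best matches `tag`."""
    # Step 1: if the verb has no frame, use its nearest semantic match.
    frame_verb = verb if verb in TOY_FRAMES else TOY_NEAREST_FRAME[verb]
    # Step 2: among roles not already filled, take the closest one to the tag.
    candidates = [r for r in TOY_FRAMES[frame_verb] if r not in filled]
    return min(candidates, key=lambda r: TOY_DISTANCE.get((tag, r), 1.0))


# "la Ben cu murder lo chicken fi’o weapon lo pliers"
print(resolve_role("murder", "weapon", filled=("Killer", "Victim")))
```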

Note that the person making the sentence was being fuzzy in a natural-language-ish way, via using "weapon" instead of "instrument." But inside the Loglish parser, this fuzziness is processed out via using WordNet semantic distance calculations, resulting in the semantic parse (expressed as a series of predicate logic relationships)

murder_1(Ben_1, chicken_1, pliers_1)

Inheritance(murder_1, murder)

Inheritance(Ben_1, Ben)

Inheritance(chicken_1, chicken)

Inheritance(pliers_1, pliers)

Agent(murder_1, Ben_1)

Patient(murder_1, chicken_1)

Instrument(murder_1, pliers_1)
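For concreteness, here is a small sketch that generates relations of the kind listed above from a predicate and its role assignments. The function name semantic_parse and the fixed “_1” instance suffix are assumptions of the sketch; a real parser would need to allocate fresh instance indices.

```python
def semantic_parse(predicate, roles):
    """Build predicate-logic relations for one sentence.

    `roles` is an ordered list of (role_name, word) pairs.
    """
    def inst(word):
        return f"{word}_1"  # assumption: first instance of every word

    args = [inst(word) for _, word in roles]
    lines = [f"{inst(predicate)}({', '.join(args)})"]
    lines.append(f"Inheritance({inst(predicate)}, {predicate})")
    lines += [f"Inheritance({inst(word)}, {word})" for _, word in roles]
    lines += [f"{role}({inst(predicate)}, {inst(word)})" for role, word in roles]
    return lines


for line in semantic_parse(
    "murder",
    [("Agent", "Ben"), ("Patient", "chicken"), ("Instrument", "pliers")],
):
    print(line)
```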

Of course the sentence "Ben kills chickens with a pliers" is parseable unproblematically by a host of existing English parsers, so Loglish adds little value here. But note that even here it does add some value because an ordinary English parser would produce two parses corresponding to

[Ben kills chickens] with a pliers

vs.

Ben kills [chickens with a pliers]

Common sense tells us to choose the first of these two options but an automated parser may lack this common sense. The Loglish version eliminates the syntactic ambiguity of the English version, which is exactly what it's supposed to do.

As an aside, one suspects that the need to specify word-sense using qui would push Loglish speakers to habitually use less ambiguous English words.

For instance

ko get lo tape

is ambiguous because "tape" could be the sticky kind or the music kind, so one could specify it using

ko get lo tape qui music

but it's easier to just say

ko get lo cassette

Also, English idiomatic collocations like “rub_out” for “kill”, while technically usable within Loglish, will tend to be deprecated because of their potential to lead to parsing ambiguity.

Referencing in Loglish

Next I will discuss an area where the current Lojban syntax seems inadequate, and it seems preferable to introduce a novel mechanism within Loglish, slightly different from anything within Lojban. In fact I believe the mechanism discussed would be a valuable addition to Lojban as well, but that’s a separate issue and I don’t intend to argue it here.

The basic idea is that there is one referential word in Loglish – “it” – and then a reference-target-indicator “quu” which gives a qualitative indication of the referent of a given instance of “it,” intended to narrow down the scope of the reference resolution process.

For instance, in my intended usage, you could say

la Dr. Benjamin Goertzel cu proceed lo playground. It quu man cu kill lo dog. It cu eat lo cat.

In this case, "it" is defined to refer to "Dr. Benjamin Goertzel", not to "man" generically. The "man" qualifier following the "quu" is intended to merely guide the listener's mind toward the right antecedent for the pronoun. It's not intended to explicitly define the pronoun. So, basically

it quu man

is the rough equivalent of the English "he", and

it quu woman

is the rough equivalent of the English "she".

It seems there is no equivalent of "quu" in this sense in Lojban; the Lojban referential mechanisms seem a bit more awkward to me (and have also been complained about by some in the Lojban community).

Note that subsequent uses of "it" may usefully be assumed to keep referring back to the same referent, until another use of "it" coupled with "quu" occurs. This is why no “quu” is needed in the third sentence of the above example.
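The referencing behaviour described above can be sketched as a small resolver. The category table passed to the constructor is hypothetical background knowledge (in practice it might come from WordNet), and choosing the most recent matching mention is an assumption of this sketch rather than something the text prescribes.

```python
class ReferenceResolver:
    """Resolves 'it' and 'it quu X' against entities mentioned so far."""

    def __init__(self, categories):
        self.categories = categories  # entity -> set of category words
        self.mentions = []            # entities in order of mention
        self.current = None           # referent that a bare 'it' points to

    def mention(self, entity):
        self.mentions.append(entity)

    def resolve_it(self, hint=None):
        """Resolve 'it' (hint=None) or 'it quu <hint>'."""
        if hint is not None:
            # Search backwards for the latest mention that fits the hint.
            for entity in reversed(self.mentions):
                if hint in self.categories.get(entity, ()):
                    self.current = entity
                    break
            else:
                raise LookupError(f"no antecedent matches {hint!r}")
        if self.current is None:
            raise LookupError("bare 'it' used before any 'it quu X'")
        return self.current


resolver = ReferenceResolver({
    "Dr. Benjamin Goertzel": {"man", "person"},
    "playground": {"place"},
    "dog": {"animal"},
})
resolver.mention("Dr. Benjamin Goertzel")
resolver.mention("playground")
print(resolver.resolve_it("man"))  # "It quu man cu kill lo dog."
resolver.mention("dog")
print(resolver.resolve_it())       # "It cu eat lo cat." keeps the same referent
```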

Terminators

One syntactical loose end has been left dangling. In the above discussion, the cmavo qui, fi’o and quu have been used with single words following them, so no issues of termination have come up. However, one might want to use a complex construct following any one of these. With this in mind it seems necessary to introduce corresponding terminators, which I’ll name quik, fi’ok and quuk (although the cmavo fi’o already has a terminator in Lojban). So for instance one could specify

set qui nonfoundational math quik

versus

set qui traditional math quik

to describe the differing senses of “set” in nonfoundational versus traditional set theory. This mechanism will not really be needed very often, as in this case e.g. one could just as well say

set qui nonfoundational

but the longer construction is better in a sense because it provides the parser with more guidance in the disambiguation process.
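A sketch of how a tokenizer might read a specifier with or without its terminator is given below. It assumes a simple rule that the text does not spell out: if a “quik” appears later with no intervening “qui”, everything up to it is the specifier; otherwise the specifier is the single next word. The function name read_specifier is hypothetical.

```python
def read_specifier(tokens, start):
    """Read the specifier that follows the 'qui' at tokens[start].

    Returns (specifier_words, index_of_next_unread_token).
    """
    if tokens[start] != "qui":
        raise ValueError("expected 'qui' at the given position")
    rest = tokens[start + 1:]
    if not rest or rest[0] == "quik":
        raise ValueError("'qui' must be followed by at least one word")
    if "quik" in rest:
        end = rest.index("quik")
        if "qui" not in rest[:end]:
            # Terminated form: everything up to 'quik' is the specifier.
            return rest[:end], start + end + 2
    # Unterminated form: exactly one word follows 'qui'.
    return rest[:1], start + 2


print(read_specifier("set qui nonfoundational math quik".split(), 1))
print(read_specifier("set qui nonfoundational".split(), 1))
```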

Loglish Parser Output

At the time of writing no Loglish parser exists, but I believe it is highly desirable to write one and experiment with it. This brief section gives some indication of what the output of the initial Loglish parser should look like. Of course, subsequent Loglish parsers may become quite involved with various interactive features, but initially it seems appropriate to keep things simple.

An appropriate output format would seem to be:

Sentence
List of pairs of the form (X qui Y, WordNet gloss corresponding to chosen sense of X)
List of entries of the form “X: Y, W” or “X: fi’o Z [article] Y, W”, where W is the FrameNet argument label corresponding to the chosen argument-position of X into which Y has been placed, based on Z or on default rules
List of pairs of the form (it quu X, antecedent chosen based on X)
Parse tree of the sentence (in ASCII form)

For instance, a typical parse report might look like

It quu man cu kill lo dog fi’o weapon lo stick qui wood

SENSES

man - male human

dog – canine animal

weapon – implement used for attacking

stick – piece of wood formed from the branch of a tree

wood – substance making up the trunk or branches of a tree

ARGUMENTS

kill: It quu man, Killer

kill: lo dog, Victim

kill: fi’o weapon lo stick qui wood, Instrument

REFERENCES

It quu man, Ben
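A report in roughly this layout could be assembled as in the sketch below. The function name format_report and its argument shapes are hypothetical, the ASCII parse tree is omitted, and a plain hyphen is used as the separator in the SENSES section.

```python
def format_report(sentence, senses, arguments, references):
    """Render a plain-text parse report.

    senses:     list of (word, gloss) pairs
    arguments:  list of (predicate, phrase, role) triples
    references: list of (pronoun_phrase, antecedent) pairs
    """
    lines = [sentence, "", "SENSES"]
    lines += [f"{word} - {gloss}" for word, gloss in senses]
    lines += ["", "ARGUMENTS"]
    lines += [f"{pred}: {phrase}, {role}" for pred, phrase, role in arguments]
    lines += ["", "REFERENCES"]
    lines += [f"{pronoun}, {antecedent}" for pronoun, antecedent in references]
    return "\n".join(lines)


print(format_report(
    "It quu man cu kill lo dog",
    senses=[("man", "male human"), ("dog", "canine animal")],
    arguments=[("kill", "It quu man", "Killer"), ("kill", "lo dog", "Victim")],
    references=[("It quu man", "Ben")],
))
```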

If the parse report doesn’t look correct then in the initial version the user’s only recourse will be to reformulate the sentence. In a more sophisticated version the user might have the ability to modify the dictionaries used within the parser to make his sentence parse right (thus creating customized versions of WordNet and FrameNet), but of course this leads to a lot of complexities, because the customized versions created by various users must be merged and then validated to function effectively.

Concluding Discussion

According to the lojban.org wiki site, the main goals for Lojban are the following:

1. Lojban is designed to be used by people in communication with each other, and possibly in the future with computers.
2. Lojban is designed to be culturally neutral.
3. Lojban grammar is based on the principles of logic.
4. Lojban has an unambiguous grammar.
5. Lojban has phonetic spelling, and unambiguous resolution of sounds into words.
6. Lojban is simple compared to natural languages; it is easy to learn.
7. Lojban's 1300 root words can be easily combined to form a vocabulary of millions of words.
8. Lojban is regular; the rules of the language are without exception.
9. Lojban attempts to remove restrictions on creative and clear thought and communication.
10. Lojban has a variety of uses, ranging from the creative to the scientific, from the theoretical to the practical.

Loglish, as defined here, fails totally at goals 5 and 7, and fails partially at goal 2 (it has no syntactic cultural bias but has vocabulary-based cultural bias inherited from English), but fulfills the other goals of Lojban.

On the plus side, however, I believe it will be much easier to learn than Lojban (thus much exceeding Lojban at goal 6), because of the small amount of new vocabulary to be learned. Furthermore -- and this is the main point -- it should be far easier to learn to use Loglish to the point of true fluency, due to the immediate availability of a large, familiar vocabulary.

As noted in the Introduction above, the main open question regarding Loglish is how easy it really turns out to be for humans to speak it, without getting tangled up in English-syntax vs. Lojban-syntax confusions. Another open question is how high the accuracy of Loglish parsing can be pushed, via judicious tuning of the use of WordNet and FrameNet.

[1] Thanks to Jorge Llambias for pointing out the best cmavo to use here