Notes:

1) This file contains the first few chapters of Digital Intuition, which constitute a "general overview" of the Webmind AI project. The full Digital Intuition manuscript is available only to selected individuals, and only under NDA.

2) A note on formatting. This manuscript exists in M$ Word format. When I saved the excerpt to HTML from Word, it produced a lovely 2Meg file. So I decided to save it to .txt instead, losing formatting but cutting the bytage by a factor of 10. Soon enough, a lovelier version will be posted, based on a more careful hand-conversion to HTML.







Digital Intuition

A Conceptual Overview of the Webmind AI Engine

Version 2

Primarily authored by
Ben Goertzel

Major contributions by
Pei Wang

Substantial contributions also by
Cassio Pennachin, Stephan Vladimir Bugaj,
Cate Hartley, Jeff Pressing, Anton Kolonin,
Lucio de Souza Coelho, Matt Iklé,
Len Trigg, Karin Verspoor

Based on ideas, designs & software cooperatively developed by
the roughly 45 members of
the AI Development and Research Divisions of Webmind Inc.


Contents

Preface

1. History and Current Status of the Webmind AI Project

I. Conceptual Foundations

2. A Brief and Biased History of AI
3. Mind as a Web of Pattern
4. Key Aspects of Webmind AI
5. Knowledge Representation Issues
6. A Hybrid Approach to Natural Language Processing
7. Experiential Interaction
8. Experiential Interactive Learning
9. Emergence Within and Between Minds

II. Formal Models of System Components

10. Knowledge Representation for Inference
11. First-Order Inference
12. Higher-Order Inference
13. Probabilistic Term Logic
14. Importance Updating
15. Halos and Wanderers
16. Evolutionary Programming
17. Inference Control
18. Schema Learning
19. Feature Structure Parsing
20. Numerical Data Analysis

III. System Design

21. High-Level Architecture Issues
22. Architecture Overview
23. Webworld

IV. Applications

24. Exportation of Document Indexing Rules
25. A Formal Treatment of Rule Exportation
26. Financial Market Prediction
27. Future Applications

Appendix A. KNOW, Webmind's Knowledge Representation Language








1

History and Current Status
of the Webmind AI Engine Project

Ben Goertzel

1. Introduction

The Webmind AI Engine project is, in many important ways, the most ambitious initiative in the history of the AI discipline. Unlike most researchers and engineers working in the AI field, we in the AI Development Division of Webmind Inc. are actually making a serious attempt to create a truly intelligent computer program, in the short term. We have detailed software designs, and detailed engineering and testing and teaching plans, and a highly competent team of roughly 45 scientists and software engineers and testers executing these plans. We've been at it since mid-1998, and we estimate that within 1-3 years from the time I'm writing this (March 2001), we will complete the creation of a program that can hold highly intelligent (though not necessarily fully human-like) English conversations, talking to us about its own creative discoveries and ideas regarding the digital data that is its world.

This intelligent conversational program will be WAE 1.0, version 1.0 of the Webmind AI Engine. It will focus on information retrieval and financial analysis, conversing about information it has read on the Internet and in document archives, financial market movements, its own self, and its own creative thoughts and discoveries relating to these areas. Within 1-3 years after this, we believe, we will deliver the AI Engine 2.0, which will possess an understanding of mathematics, and the ability to optimize and modify its own source code, thus continually improving its own intelligence in parallel with our own AI engineering work.

This brief chapter summarizes the AI Engine project: its history, its future, and key elements of the software design and the conceptual thinking that has gone into it. It is written for readers who know a fair bit about AI: technical AI terms and concepts are introduced freely, without explanation. The rest of the book covers the same ground, but with more background and more detail.

Of course, "1-3 years from real AI" and "1-3 years more to fully self-modifying AI" are very gutsy claims, similar to other claims that have been made (and not fulfilled) throughout the history of AI. But we believe that, due to the combination of advances in computer hardware and software with advances in various aspects of cognitive science, real AI is now genuinely possible - and that we know how to achieve it, and are substantially advanced along the path to this goal.

I don't expect that this book, in itself, will convince every reader that the AI Engine is what we say it is. At least, though, I hope it will convince you that we have a plausible approach to constructing real AI, without significant omissions or mistakes. To really convince the skeptic, of course, nothing short of a real working and completed AI system will suffice. We're working on it, and we'll be there soon enough.


2. Webmind Inc. and the AI Development Division

Before launching into discussion of the AI Engine itself, it may be useful to say a few words about the context in which the construction of this software system proceeded.

The Webmind AI Engine was created in the AI Development Division of Webmind, Inc., a start-up software firm incorporated in August 1997 as Intelligenesis Corporation. Webmind Inc. had a main business office in New York, small engineering offices in New Zealand and Australia, and a large engineering office in Belo Horizonte, Brazil. At the time of writing, April 2001, Webmind Inc. is in the midst of filing for bankruptcy, and I'm seeking funding to start a successor firm carrying on the Webmind Inc. AI work. Several dozen of the AI Development staff are working unpaid, continuing the development and productization of the AI code.

On a very general strategic level, Webmind Inc.'s mission was to create an "intelligence infrastructure" for the Internet. The vision was simple: the Internet of the future will be an immensely intelligent system, displaying powerful synergies with human intelligence, and whatever company owns the Internet's intelligence will own the largest share of the Internet economy.

From a scientific view, the core of the Webmind Inc. vision is the Webmind AI Engine: a series of increasingly intelligent software releases culminating in a conversational system that reads all the data on the Internet, assists in restructuring the data on the Internet to be more useful to human beings, and ultimately posts the new information that it creates. From a business view, on the other hand, an aspect of the Webmind Inc. vision at least as important is the creation and sale of products that bring digital intelligence in various forms to websites and intranets. Currently the products depend on the AI Engine technology in a variety of ways; and as the AI Engine matures, we expect them to become more and more thoroughly AI Engine dependent.

In March 2001, the firm encompassed the following divisions:

* The AI Development Division, concerned with developing the WAE, and delivering interim software along the way in the form of releases of the Webmind IR Engine and Webmind Conversation Engine
* The Market Predictor group, doing financial trading in a joint venture with a small group of investors, using a text-based nonlinear prediction system that was prototyped within the AI Engine
* The Text Categorization group, located in New Zealand, developing specific AI technology for text categorization in a highly customer-focused way
* Application Development and Product Marketing. This portion of the company is devoted to creating and marketing products - currently information retrieval applications carrying out functions such as text categorization, search, and entity extraction. These products use tools from the AI Engine along with other more conventional techniques
* Sales and Solutions Delivery, concerned with finding customers for our products and helping them to integrate our innovative AI software into their businesses
* Operations, including a robust IT staff able to support our demanding R&D and product groups

Though the main office was in New York, Webmind was an international company, with engineering offices in Belo Horizonte (Brazil), Melbourne, and Hamilton (New Zealand), and a small business office in Silicon Valley. The AI Development Division consisted of roughly 30 people in Brazil, 14 in New York, and 1 in Melbourne. There were also a handful of software testers devoted to the AI codebase, located in Brazil. The Brazilian staff consisted primarily of expert object-oriented software engineers and computer scientists. The New York staff consisted primarily of scientists in various relevant areas: cognitive science, computer science, linguistics, physics, mathematics. However, the breakdown of responsibilities between offices was not at all rigid: a lot of software engineering was done in New York, and a lot of conceptual thinking was done in Brazil. The smooth integration of high-quality software engineering with innovative scientific research was one of the noteworthy and uncommon aspects of the Webmind Inc. AI Development Division. We put a fair amount of work into developing a software process that was friendly to the needs of experimental ground-breaking research, but also guaranteed the production of high-quality, efficient, testable and modifiable code.

The AI Development Division was managed by Cassio Pennachin, who was also the President and founder of Webmind Brazil. Cassio was the lead software architect of the AI Engine, whereas conceptual leadership was provided by Ben Goertzel and Pei Wang, together with Cassio and a loose group of others including Karin Verspoor and her team of computational linguists, Jeff Pressing (a physicist/cognitive-scientist who is our guru on such things as prediction and causation), and John Cleary and a group of his former students in New Zealand (experts on categorization technology).

At time of writing, active efforts are underway to resurrect Webmind Inc. post-bankruptcy, but this is not the place to delve into such topics.

3. History of the AI Engine

The AI Engine design has evolved considerably over the 3 years since Webmind Inc. received its seed funding, due to a natural and productive feedback between theory and practice. This section recounts the evolution of the system, in a way that, we hope, will give the AI-educated reader a good sense of our current state of development and our future prospects.

On the highest level, the conceptual basis of the AI Engine is a complexity-science-oriented theory of mind called the "psynet model," described loosely in Ben Goertzel's previously published research monographs (and somewhat more precisely in some of his unpublished papers). In the psynet model, mind is envisioned as a self-organizing network of actors recognizing and creating patterns in each other, giving rise to emergent network-wide patterns. Phenomena like perception, action, cognition, memory and learning are theoretically framed in these terms - in terms of complexity, self-organization, and emergence in a network of pattern-focused actors. The theory seeks to identify the key structures and dynamics of mind as separate from the structures and dynamics used for their physical implementation. Rather than a rigorous scientific theory of mind, the psynet model is a conceptual framework, worked out more fully in some areas than in others. It would be compatible with a vast number of possible AI systems.

In 1994, I made an initial attempt to implement aspects of the psynet model of mind in Haskell, a functional programming language. This initial system was very simple - a dynamic network of simple computational actors recognizing patterns amongst each other, creating new agents embodying these patterns, and interacting with the user. Because it was implemented in Haskell, the system could support only a small network of actors, and partly for this reason it unfortunately failed to demonstrate interesting behaviors. From this experimentation, it became clear that a much larger and more diverse dynamic semantic network was going to be needed in order to give rise to emergent intelligence as hypothesized by the psynet model. A more industrial-strength programming language was required, and the system would have to be implemented using distributed processing on a cluster of powerful SMP machines. Furthermore, it was anticipated that a larger pool of weaker machines would need to be used for "background processing" - an idea that has led, these days, to the AI Engine adjunct called Webworld, a peer-to-peer network of mini-AI-Engines. The nodes and links of the Internet would be the key to supporting the nodes and links of the mind. Thus the name "Webmind" was born.

The first real attempt at a WAE was made in 1997. On the surface it was somewhat similar to the current AI Engine: a dynamic semantic network, consisting of nodes and links of many different types, with creative agents of various types that wandered among nodes along links, building new links and nodes based on patterns they had recognized. It was implemented in Java, a language that seemed to strike a middle way between elegance and practicality.
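
The network-plus-agents structure described above can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not the Webmind code: the class and function names, the "inheritance" and "similar" link types, and the pattern-recognition rule (feature overlap, measured by Jaccard similarity, between nodes reachable in two steps) are all hypothetical choices made for this example. It shows only the division of labor between a passive network of typed nodes and links and an active agent that wanders along links and adds new ones.

```python
import random
from dataclasses import dataclass, field


# Illustrative sketch only: class names, link types and the similarity
# rule are hypothetical, not taken from the Webmind codebase.
@dataclass
class Node:
    name: str
    features: frozenset  # stand-in for whatever content a node carries


@dataclass
class SemanticNetwork:
    nodes: dict = field(default_factory=dict)
    links: set = field(default_factory=set)  # (source, link_type, target)

    def add_node(self, name, features):
        self.nodes[name] = Node(name, frozenset(features))

    def add_link(self, source, link_type, target):
        self.links.add((source, link_type, target))

    def neighbors(self, name):
        """Nodes one link away from `name`, ignoring link direction."""
        outgoing = {t for (s, _, t) in self.links if s == name}
        incoming = {s for (s, _, t) in self.links if t == name}
        return outgoing | incoming


def jaccard(a, b):
    """Shared features divided by total distinct features, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def wander(net, start, steps, threshold, rng):
    """Walk the network from `start`. At each stop, compare the current
    node with every node reachable in two steps and add a 'similar' link
    when their feature overlap reaches `threshold`; then move to a random
    neighbor (chosen from a sorted list so a seeded rng is reproducible)."""
    current = start
    for _ in range(steps):
        here = net.nodes[current]
        two_hop = set()
        for middle in net.neighbors(current):
            two_hop |= net.neighbors(middle)
        two_hop.discard(current)
        for name in sorted(two_hop):
            if jaccard(here.features, net.nodes[name].features) >= threshold:
                net.add_link(current, "similar", name)
        options = sorted(net.neighbors(current))
        if not options:
            break
        current = rng.choice(options)


net = SemanticNetwork()
net.add_node("dog", {"animal", "pet", "fur"})
net.add_node("cat", {"animal", "pet", "fur", "whiskers"})
net.add_node("mammal", {"animal"})
net.add_node("stock", {"finance"})
net.add_link("dog", "inheritance", "mammal")
net.add_link("cat", "inheritance", "mammal")
wander(net, start="dog", steps=10, threshold=0.5, rng=random.Random(0))
```

Starting from "dog", the agent reaches "cat" through "mammal", finds that they share three of four features, and adds a "similar" link; "mammal" overlaps too little to qualify, and the unconnected "stock" node is never visited. A real system would use far richer pattern recognition; the sketch is only meant to show agents building structure in the network they move through.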

Implementation of the system picked up steam in mid-1998, when the company obtained funding that allowed the chief technical founders (Ben Goertzel and Ken Silverman) to quit their day jobs and hire more programmers. This AI Engine was an interesting one, and before long we had found substantial success using this early version to recognize correlations between trends in the news and movements in the financial markets. But practical experimentation with initial versions of the system soon raised two significant concerns. First, making the distributed processing infrastructure work effectively was going to be a very big job in itself. Second, using just a small assortment of generic node types, link types and creative agents was not going to work, given the processing power and memory constraints we were facing (even with the distributed processing in place).

To address the second of these two concerns, in late summer 1998 we moved to a multi-modular design, in which we distinguished the core system (the distributed, dynamic semantic network) from modules corresponding to different aspects of intelligence. Each module contained its own specialized node types and, occasionally, its own link types and creative agents. For example, there were modules for reasoning, natural language, categorization, evolutionary learning, numerical data analysis, and psyche (a term we use to encompass feelings and motivations). The focus of our work became the creation of nodes, links and creative agents specialized for various aspects of intelligence, but able to interact freely with the corresponding actors specialized for other aspects of intelligence.

This shift in engineering focus represented a significant change in philosophy - not a refutation of the original "psynet model of mind" perspective, but a significant augmentation of it. While the basic model of mind as a self-organizing, nonlinearly-dynamical semantic network of pattern-recognizing and pattern-creating actors still appeared to be in principle workable, we realized that it was not in itself a sufficient principle for the creation of a thinking machine. A human brain is born with a lot of specialized "wiring" for various types of intelligence (linguistic, visual, temporal, and so forth), and similarly, we found ourselves "wiring" our digital brain in various specialized ways, yet without sacrificing the complete adaptability of the system and its potential for self-organizing emergent general intelligence.

With this shift in focus, the project became more integrative in nature. We began to make extensive use of evolutionary programming, inspired by and going beyond John Koza's seminal work in that area. We integrated specialized techniques for numerical data analysis: prediction, association and causation finding, trend analysis, etc., developed by Jeff Pressing, myself, and others. Most critically, we created a reasoning module inspired in large measure by the Non-Axiomatic Reasoning System (NARS) developed by Pei Wang (the firm's first paid employee) over the past decade. (The AI Engine reasoning module now has two versions, one that uses NARS and the other that uses our own Probabilistic Term Logic system, which is somewhat similar to NARS but is founded on probability theory.) The use of neural-network-like activation spreading between nodes also became more sophisticated, as we developed two separate activation spreading processes to deal with attention allocation and the detection of semantic associations.
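
For readers unfamiliar with activation spreading, a single update step can be sketched as follows. This is a generic spreading-activation step, not either of the AI Engine's actual processes: it assumes, hypothetically, that each node keeps part of its activation, passes the rest along its outgoing links in proportion to link weight, and that all activation then decays by a constant factor. The function name and the `spread` and `decay` parameters are illustrative.

```python
def spread_activation(activation, links, spread=0.5, decay=0.9):
    """One step of a generic (hypothetical) spreading-activation update.

    activation: dict mapping node name -> current activation
    links: dict mapping node name -> list of (target, weight) pairs;
           weights leaving a node are assumed to sum to 1
    spread: fraction of a node's activation passed along its links
    decay: factor applied to every node's activation after spreading
    """
    new = {}
    for node, a in activation.items():
        out = links.get(node, [])
        # Nodes with no outgoing links keep all of their activation.
        kept = a if not out else (1.0 - spread) * a
        new[node] = new.get(node, 0.0) + kept
        # The passed-on share is split among targets by link weight.
        for target, weight in out:
            new[target] = new.get(target, 0.0) + spread * a * weight
    # Global decay keeps total activation from growing without bound.
    return {node: decay * a for node, a in new.items()}


links = {
    "dog": [("cat", 0.6), ("mammal", 0.4)],
    "cat": [("dog", 0.5), ("mammal", 0.5)],
}
state = spread_activation({"dog": 1.0}, links)
```

With these made-up weights, one step from an all-"dog" state leaves roughly 0.45 on "dog", 0.27 on "cat" and 0.18 on "mammal", so attention drifts toward related nodes while the total slowly decays. Iterating the step, and re-injecting activation wherever new input arrives, gives the basic shape of an attention-allocation process.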

By mid-1999, we had created a number of sophisticated modules, all interacting in the context of one dynamic semantic network. But serious software engineering problems loomed: the original core system had not been designed to support all these modules, and needed a ground-up rewrite. During fall 1999 and early 2000, Cassio and his Brazilian object-oriented software gurus worked with Ken and me (we had designed the original core) to create a new core system, and integrate the modules with the new core.

In the meantime, the market prediction work continued very successfully, and we realized that we could build a simpler system embodying the key AI processes needed for text-based nonlinear market prediction, without the overhead (or full emergent intelligence) of the full AI Engine. The Webmind Market Predictor product was born.

In early 1999, we also began doing categorization of texts drawn from financial message boards using AI Engine technology, and in this context learned a similar lesson: Reasonably good text classification could be done using techniques simpler, and easier to tune, than the full AI Engine. We began working toward the current Webmind Classification System product, which uses some fairly standard machine-learning categorization techniques, together with innovative methods for producing feature vectors representing documents (using objects extracted from the AI Engine codebase, along with other methods). In 1998, we had assumed that Webmind Inc.'s products would be based directly on the AI Engine, but as we began to understand how large the task of creating the AI Engine was, we became more interested in creating simpler products that leveraged aspects of the AI Engine's intelligence in highly-focused ways.

In terms of fundamental AI Engine development, natural language processing, at this stage, became the largest thorn in our side. We had been experimenting with unsupervised language learning, but this had failed to work adequately (as has traditionally been the case). We turned to supervised learning of linguistic rules from linguistic resources such as the Penn Treebank, XTag, morphological databases and so forth, but of course we knew this was only a partial solution. "Wiring in" knowledge in this way is only acceptable if the system has a way to adapt the knowledge based on its own learning, and unsupervised language learning did not seem adequate for this purpose.

We thus realized that we would have to expand the "Experiential Interactive Learning" aspect of our system. Language learning had to be integrated with the learning of cultural patterns of cognition, and this learning had to proceed through interaction with other minds in a shared, perceptual/manipulable environment. We created a mechanism by which Baby Webmind could interact with us in a simple simulated world, in which it could participate with us in various interactions with files, directories, financial data series, and other digital objects. It could then ground its linguistic knowledge in non-linguistic social interactions, just as a human child does when learning language.

Along with the Baby Webmind emphasis came a new focus on action as well as perception. We developed what we call the "schema" framework, a kind of program-execution framework implemented in terms of nodes and links and other actors; and worked out how schema could be learned by a combination of evolutionary programming and inference.

It took us a while to find a framework for representing and manipulating syntax that was compatible both with supervised learning from external sources and with experiential interactive learning. During the second half of 2000, we finally found this, in the form of our own version of lexicalized feature structure grammar. Finally, we had a natural language module that made sense in terms of the other modules of our digital mind, and in terms of the two modes of language acquisition that we had chosen to use.

Throughout 2000, our confidence in the finality of our AI design grew significantly. For the first time, we could review any textbook on cognitive science or human psychology, run through every aspect of mind mentioned there, and explain in detail how we accounted for that. We had designed a complete mind system, with the diverse specialization of the human brain, as well as the creative self-organizing flexibility. And, as the end of 2000 approached, we had written nearly all of the code needed to support this.

At the start of 2001, we completed what we called "WAE 0.5" - for the first time, an AI Engine incorporating all the modules, working together sensibly in a functioning distributed core. Millions of nodes, billions of links, dozens of types of cognitive processing. Well, there were only two small problems. First, hundreds of complexly interacting parameters made the system very difficult to tune. Second, the performance of the system still wasn't anywhere near what we wanted it to be: the system needed to be drastically sped up, and its memory usage significantly reduced.

For a moment we thought we might have to rebuild the core again, but as it turns out, the object-oriented design of the 2000 core is sound enough that this is not necessary. Instead, we are embarking on a program of intensive efficiency-oriented rearchitecture without altering the basic object structure and conceptual framework of the system. We have arrived at a number of simple yet radical design changes that, according to our experiments, should improve the speed of the system by several orders of magnitude. These changes will also make the parameter optimization problem significantly less severe. This rearchitecture process will take 3-6 months, and will proceed in parallel with basic AI work in areas like experiential interactive learning, language generation, and schema learning using hybridized evolution/inference. Most of these optimizations could not have been done a year earlier, because they depend on the specific makeup of the AI modules, which only became really "final" during 2000.

The AI Engine 0.5 is not adequate performance-wise to be used inside most kinds of products. It's too slow, and requires too many machines. However, its intelligence can be used to enhance the performance of information retrieval products, using a technique we call "rule export." One thing this engine is very good at is identifying relationships between various words and various concepts. It is able to export rules describing these relationships - in general, or in a particular domain characterized by a particular set of documents. Think of the exported rules as a kind of superintelligent, optimized thesaurus. The exported rules can be used within lightweight products to produce indices ("feature vectors") for documents, and these indices can be used for applications like text categorization, search and market prediction.
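
A lightweight product consuming exported rules might use them roughly as follows. This Python sketch assumes, purely for illustration, that an exported rule is a weighted link from a word to a concept; that format is hypothetical, since the actual rule format is not described here, and the rule table and strengths below are invented. The sketch turns a document into a concept feature vector by summing rule strengths over its words, then compares vectors by cosine similarity, a standard choice for search and categorization rather than anything specific to Webmind.

```python
import math
import re
from collections import Counter

# Hypothetical exported rules: word -> [(concept, strength), ...].
# The format and the numbers are invented for this example.
RULES = {
    "stock": [("finance", 0.9), ("market", 0.7)],
    "shares": [("finance", 0.8), ("market", 0.6)],
    "bank": [("finance", 0.7)],
    "river": [("geography", 0.8)],
}


def feature_vector(text, rules):
    """Index a document: each word adds the strength of every concept
    an exported rule links it to. Words without rules are ignored."""
    vector = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        for concept, strength in rules.get(word, []):
            vector[concept] += strength
    return dict(vector)


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(weight * v.get(concept, 0.0) for concept, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


doc = feature_vector("Stock and shares rallied today.", RULES)
query = feature_vector("bank", RULES)
score = cosine(doc, query)
```

The document never contains the word "bank", so a plain keyword match against that query would miss it; the rule-derived vectors still match on the shared "finance" concept. This is the sense in which exported rules can act as an optimized thesaurus inside a product that never runs the AI Engine itself.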

We believe that the concept of a "real AI Engine" exporting rules and other data structures to be used by simpler, less computationally intensive software systems is a critical one. "Expert systems," simple software systems powered by sets of human-created rules, are well-known. Getting humans to explicitly state the rules by which they carry out intelligent acts in various domains, however, is not a trivial task. So much knowledge is tacit. By comparison, using an intelligent, albeit extremely expensive, AI Engine to generate these rules has advantages, the chief one being that unlike human brains, the AI Engine's brain is "transparent": once it has learned something, one can look into its mind and see how it is doing that thing. Sometimes there are hard problems involved in figuring out how the system does something, but they are not as hard as scanning and analyzing the real-time dynamics inside the human brain.

To sum up, the state of the project in March 2001 is as follows:

* the conceptual design for the AI Engine is well specified, and apparently complete
* an in-principle adequate platform for the deployment of this full conceptual design (WM 0.5) is now implemented
* In theory, about 85-90% of the code required for "real AI" is now written, and most of what is left to be done is testing, tuning, teaching, wiring-in of specific knowledge, and refactoring for efficiency.
* Rigorous empirical testing of various system components, performing text analysis tasks both together and separately, is now seriously underway, with test plans created that will carry us far into the future.
* The system is now exporting rules embodying relationships between concepts, for use for document and query indexing within IR products

The work of the AI Development Division, at this stage, is then divided into three categories:

* Work on the "Webmind IR Engine", a successor to the AI Engine 0.5 providing a wider spectrum of functions supporting products in the IR domain
* Work on the "Webmind Conversation Engine" - direct work on the path toward real AI, chiefly involving experiential interactive learning, schema learning, and the training and tuning of the system's psyche
* Work on core systems and services that benefit both AI Engines

There is also some other work being done around the fringes, building towards the WAE 2.0. We are converting Mizar, a formalization of much classical and modern mathematics, into KNOW (our knowledge representation language, which maps immediately into AI Engine nodes and links). This will form the foundation for the system's understanding of mathematical structures, which will form the basis for its understanding of its own algorithms and data structures, which will form the basis for the intelligent self-modification of AI Engine 2.0.

4. Intended Delivery Schedule

At the start of 2001 we bifurcated the development effort in the AI Development Division, aiming toward two separate though overlapping goals: a Webmind IR Engine and a Webmind Conversation Engine.

Very rough schedules for these two engines were worked out prior to the dissolution of Webmind Inc., and are given below. Of course, all dates given here are now off, and will be re-calculated once the funding situation for the post-bankruptcy Webmind Inc. becomes clearer.

* WM IR Engine 1.0, to be delivered Summer 2001, a solid platform providing intelligent text-analysis support to information retrieval products (principally search, categorization and entity extraction)

* WM Conversation Engine 0.7, to be delivered Fall 2001, carrying out simple, somewhat intelligent conversations in the domains of file manipulation and corporate information, using KNOW rather than English as its language

* WM IR Engine 2.0, to be delivered December 2001, adding robust natural language query processing, document summarization, and other functionalities yet to be specified, based on business needs

* WM Conversation Engine 0.8, to be delivered February 2002, providing simple English language conversation

* There will be a very small "maverick psynetized NL team" pushing toward the difficult goal of getting NL conversation in place for WM Conversation Engine 0.7. This is not considered part of the mainstream of WCE 0.7 development, for fear that it may distract from KNOW conversation development. However, it's understood that if this goal is achieved, the value of WCE 0.7 will be greatly enhanced.

* WM IR Engine 3.0, to be delivered Summer 2002, providing additional functionalities to be specified based on business needs

* WM AI Engine 1.0 (incorporating both IR and Conversation), to be delivered Fall 2002, providing intelligent (but not necessarily extremely human-like) conversation, in addition to the full spectrum of IR functions. I'll still call this guy WM 1.0 for short.

* WM AI Engine 2.0, to be delivered December 2003, incorporating mathematical theorem-proving (beginning from the Mizar database) and automatic analysis and optimization of the system's own source-code


5. Obstacles on the Path Ahead

The three big challenges that we seem to face in moving from AI Engine 0.5 to AI Engine 1.0 are:

* computational (space and time) efficiency
* getting knowledge into the system to accelerate experiential learning
* parameter tuning for intelligent performance

Efficiency-wise, our experience so far indicates that the AI Engine 0.5 architecture is probably not going to be sufficiently efficient (in either speed or memory) to allow the full exercise of all the code that's been written within it - in either the real-world IR context or the real AI context. Thus a rearchitecture is in progress, based on the same essential object model. Essentially, the solidification of the conceptual design of the system over the last year allows us to make a variety of optimizations that specialize some of the very general structures in the current system for the specific kinds of processes we are actually asking these structures to carry out. The current system is written for extreme generality, and it has allowed us to experimentally design and implement a wide variety of AI processes (although, for efficiency reasons, not to test all of them in realistic situations, or in interesting combinations). Now that, through this experimental process, we have learned specifically what kinds of AI processes we want, we can morph the system into something more specifically tailored to carry out these processes effectively. The current architecture does seem flexible enough that it will almost surely be possible to move to a more efficient architecture gradually, without abandoning the current general software framework or rewriting all the code.

Regarding getting knowledge into the system, we are embarking on several related efforts:

* Conversion of structured database data into KNOW format for import into WAE (this is for declarative knowledge)
* Human encoding of common sense knowledge in KNOW (this is for declarative knowledge)
* Human encoding of actions (both external actions like file manipulations, and internal cognitive actions) using "schema programs" written in MindScript (this is for procedural knowledge)
* The Baby Webmind user interface, enabling knowledge acquisition through experiential learning (this helps with both declarative and procedural knowledge)
* Creation of training datasets so that schema operating in various parts of the system can be trained via supervised learning; the details of course differ for different parts of the system (this is for procedural knowledge only)

Finally, regarding parameter optimization, there have been several major obstacles to effective work in this area so far:

* Slowness of the system makes the testing required for automatic parameter optimization unacceptably slow
* The interaction between various parameters is difficult to sort out
* Complexity of the system makes debugging difficult, so that parameter tuning and debugging end up being done simultaneously

One of the consequences of the system rearchitecture proposed here would be to make parameter optimization significantly easier, both through improving system speed, and through the creation of various system components each involving fewer parameters. Although one part of the new system (the AttentionalFocus, or AF) will be almost as hard to tune as the current system, even here the problem will be simplified by the fact that the AF's parameters also appear in simpler, easier-to-tune components with fewer parameters apiece. Default settings for AF parameters will be obtainable from these simpler components, and will then have to adapt themselves to take into account emergent phenomena in AF parameter space.

Summing up the directions proposed in these three problem areas (efficiency, knowledge acquisition, and parameter tuning), one general observation is that, at this stage of our design work, analogies to the human mind/brain are playing less and less of a role, whereas the realities of computer hardware and of machine learning testing and training procedures are playing more and more of a role. In a larger sense, this presumably means that while analogies to the human mind helped us gain a conceptual understanding of how AI has to be done, now that we have this understanding we can keep the conceptual picture fixed, and vary the underlying implementation and teaching procedures in ways that have less to do with humans and more to do with computers.

Obstacles on the Path to AI Engine 2.0

Finally, while the above issues are the ones that currently preoccupy us, it's also worth briefly noting the obstacles we expect to face in getting from AI Engine 1.0 to AI Engine 2.0, once the current problems are overcome.

The key goal with AI Engine 2.0 is for the system to be able to fully understand its own source code, so it can improve itself through its own reasoning and make itself progressively more intelligent. In theory, this could lead to an exponential acceleration of system intelligence over time. The two obstacles faced in turning AI Engine 1.0 into such a system are:

* the creation of appropriate "inference control schema" for the particular types of higher-order inference involved in mathematical reasoning and program optimization
* the entry of relevant knowledge into the system.

The control schema problem appears to be solvable through supervised learning, in which the system is incrementally led through progressively more difficult problems in these areas (basically, this means we will teach the system these things, as is done with humans).

The knowledge entry problem is trickier, and has two parts:

* giving the system a good view into its Java implementation
* giving the system a good knowledge of algorithms and data structures (without which it can't understand why its code is structured as it is).

Giving the system a meaningful view into Java requires mapping Java code into a kind of abstract "state transition graph," a difficult problem that fortunately has been solved by some of our friends at Supercompilers LLC, in the course of their work creating a Java supercompiler. Giving the system a knowledge of algorithms and data structures could be done by teaching the system to read mathematics and computer science papers, but we suspect this is a trickier task than it may seem, because these papers are a specialized form of human discourse, not as formal as they appear at first glance. In order to jump-start the system's understanding of scientific literature in these areas, we believe it will be useful to explicitly encode knowledge about algorithms and data structures in the Mizar formalized mathematics language, from which it can then be directly translated into AI Engine nodes and links. (This is a project that we would undertake now, if we were faced with an infinite-human-resources situation!)

6. Architecture and Dynamics Overview

In spite of its simple conceptual foundations, the AI Engine is a large and complex system. Some of the reasons for this complexity were reviewed in the History section above. Here we will give a quick overview of the system architecture, which surely will raise more questions than it will resolve in the mind of any educated reader, but will hopefully at least get across a general idea of what kind of system we're building.

From a very abstract, mathematical point of view, the WAE consists of the following conceptual/mathematical entities:
* Atomic actors (representing e.g. words, concepts, numerical data sets, URLs and other pointers to outside entities)
* Composite actors, grouping other actors
* Atomic actions, both external and internal (basically, transformations/creations/deletions of internal objects)
* Composite actions, grouping other actions
* Data channels between actions
* n-ary relations (joining objects, sets or relations)
* Conjunctions and disjunctions of relations
The effective deployment of these entities on a distributed network of SMP (symmetric multiprocessor) machines is the job of the Webmind Core. The specialization of these mathematical notions into actors that do useful things in important contexts is the job of psycore, which is implemented on top of the core, and of the various modules implemented on top of psycore.

How are these mathematical structures embodied in software? This is of course a long and complicated story, and only the most superficial highlights will be presented here. There is a collection of code that embodies a general system of software actors, living on a network of SMP machines. This is the Webmind Core, which deals with such things as Lobes (groups of actors living on a single machine), Messages sent between actors, and so forth. Then, within this, there is something called "psycore," a very general dynamic semantic network implemented on top of the core.
Psycore is founded on the two most basic actor types in the AI Engine: the Node and the Link. A Node represents a concept, process, percept or action. A Link represents a relationship - it may be a relationship between Nodes, or a relationship between Links. Links typically come with numerical "truth values," including a strength (how strong is the relationship the link represents) and a confidence (how sure is the system that the strength it's assigned is correct). The other actor types in psycore exist to support Nodes and Links - to group them in various ways, and to allow them to send messages to each other and to create new actors relating each other in various ways. Psycore is not the whole AI Engine architecture, but it is the crux of the system, the "secret sauce" that makes the system unique. Because it is a fundamental "mind network", it is often called the Psynet.
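To make the Node/Link distinction concrete, here is a minimal Python sketch (not drawn from the AI Engine's Java codebase) of nodes, links whose targets may be nodes or other links, and a two-component truth value. The class names, the "Inheritance" and "LearnedFromText" link types, and the [0, 1] range assumed for strength and confidence are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TruthValue:
    """Hypothetical two-component truth value; both fields assumed to lie in [0, 1]."""
    strength: float    # how strong the relationship is
    confidence: float  # how sure the system is that the strength is correct

    def __post_init__(self) -> None:
        for name in ("strength", "confidence"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


@dataclass(eq=False)  # identity semantics: two nodes with the same name stay distinct
class Node:
    """A concept, process, percept or action."""
    name: str


@dataclass(eq=False)
class Link:
    """A typed relationship whose targets may be Nodes or other Links."""
    link_type: str
    targets: List[Union[Node, "Link"]]
    tv: TruthValue


cat, animal = Node("cat"), Node("animal")
cat_is_animal = Link("Inheritance", [cat, animal], TruthValue(0.9, 0.8))
# A link about a link: a hypothetical record of where the first relationship came from.
provenance = Link("LearnedFromText", [cat_is_animal], TruthValue(1.0, 0.5))
```

Allowing a Link to target another Link is what lets the system hold knowledge about its own relationships, such as where a belief came from.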

The first step to specializing the general psycore framework is to introduce some basic link types, which are all special kinds of n-ary relations. These fall into several classes: logical, associative, causal, and dataflow. Various node types may then be introduced, each one possessing various types of links and various types of link-building agents.

"Activation," the WAE form of energy, spreads through the network between nodes and links according to neural-net-like dynamics. Activation spreading, in itself, is goal-less and spontaneous, driven purely by the complex nonlinear dynamics of Peircean generalized association. But this isn't the whole story: a substantial percentage of system activity is goal-directed. A goal activates things that help achieve it, and also records, as lasting knowledge, which things helped achieve it. Thus activation spreading, in large part, works in the service of activating things that can help fulfill basic system goals.
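The sketch below illustrates goal-driven activation spreading under simple assumptions of our own: at each step, goal nodes receive a fixed stimulus, every node keeps a fraction `1 - decay` of its activation, and the rest is split among its outgoing links, scaled by link strength. The function and its parameters are hypothetical; the AI Engine's actual activation equations are not given here, and the sketch omits the recording of which things helped achieve a goal.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbor, link strength)]


def spread_activation(links: Graph, activation: Dict[str, float], goals: Set[str],
                      decay: float = 0.5, goal_stimulus: float = 1.0,
                      steps: int = 3) -> Dict[str, float]:
    """Leaky spreading activation with goal nodes acting as constant energy sources."""
    act = defaultdict(float, activation)
    for _ in range(steps):
        for goal in goals:  # goals inject energy every step
            act[goal] += goal_stimulus
        new = defaultdict(float)
        for node, a in act.items():
            new[node] += a * (1.0 - decay)  # part of the activation stays put
            out = links.get(node, [])
            for neighbor, strength in out:
                # The rest is split among outgoing links, scaled by link strength.
                new[neighbor] += a * decay * strength / len(out)
        act = new
    return dict(act)


links = {"goal": [("helper", 1.0)], "helper": [("other", 0.5)]}
result = spread_activation(links, {}, {"goal"}, steps=2)
```

In this toy graph the node directly linked to the goal ends up far more active than the node two hops away, which is the qualitative behaviour described above.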

All this is a general framework for learning and knowledge representation - basically, a flexible and extensible hybrid "semantic/neural network." In principle, this could be a mind in itself - but not a plausibly efficient one. Thus, there is a modular structure built on top of this general framework, involving special nodes, links and actions oriented toward special domains and special types of learning. But each module has only limited differentiation: it has its own parameters and its own distribution of node and link types, but it must use the same basic structures and dynamics as the other modules, as supplied by the core.

Finally, what exactly are WAE's goals? Just as humans are built to want to eat, have sex, drink, exercise, etc., so WAE is built to want to answer queries effectively. To assure this, each time it answers a query, it should receive an assessment of the perceived quality of its response. This may be explicit (e.g., the user may be asked), or it may be implicit (e.g., the client may count the number of times the human user needed to rephrase his question to get the desired answer). We all know how hard it is to guess what humans want, which means that answering human questions is certainly a "complex goal." It is subtle enough to require all the modules to work together, utilizing the services of the mind OS cooperatively to build a diverse self-organizing actor system with emergent structures that are not only complex but adaptive.
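As a hedged illustration of how explicit and implicit feedback might be folded into a single reward signal, the sketch below uses an explicit user rating when one exists and otherwise penalizes each rephrasing. The linear penalty and the default `penalty` of 0.25 are arbitrary assumptions, not WAE's actual assessment method.

```python
from typing import Optional


def response_quality(rephrasings: int, explicit_rating: Optional[float] = None,
                     penalty: float = 0.25) -> float:
    """Score one query/answer exchange in [0, 1] (hypothetical scheme)."""
    if explicit_rating is not None:  # the user was asked directly
        if not 0.0 <= explicit_rating <= 1.0:
            raise ValueError("explicit_rating must be in [0, 1]")
        return explicit_rating
    if rephrasings < 0:
        raise ValueError("rephrasings must be non-negative")
    # Implicit signal: each rephrasing suggests the previous answer missed the mark.
    return max(0.0, 1.0 - penalty * rephrasings)
```

For example, an answer accepted after two rephrasings scores 0.5, while an explicit rating always takes precedence over the implicit count.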

Modules of Mind
The nodes and links in psycore are not uniform; they are of many different types, and the different types are grouped into various modules.
The need for modularity is not surprising from a neurophysiological perspective. The brain has hundreds of specialized parts devoted to tasks such as visual perception, smell, language, episodic memory, and so forth. Each of these parts is composed of neurons which share certain fundamental features, but each also has its unique features and capabilities that scientists are only beginning to understand. Similarly, when a WAE is running on a computer, different parts of the computer's memory are assigned to different tasks. Each of these parts of the computer's memory draws on the psycore for its basic organizational framework, and on more specialized modules for advanced capabilities.

To support this dynamic specialization, the AI Engine is divided into modules, each one containing nodes and links pertaining to a certain kind of mental processing. Each of the modules, broadly speaking, is specialized for recognizing and forming a particular kind of pattern. And all the different kinds of nodes and links can learn from each other; the real intelligence of the system lies here, in the dynamic knowledge that emerges from the interactions of different species of nodes and links. This is the essence of the AI Engine's mind: its patterns create and recognize patterns in themselves and in the world in order to achieve their complex goals.

Here we'll give a quick laundry list of modules, without going into great detail on any of them. Each module contains various types of actors: nodes, links, wanderers, stimuli, and other lower-level actors that live inside nodes and links.

There's a numerics module, containing data processing actors that recognize patterns in tables of numbers, using a variety of algorithms, some standard, some innovative. The DataNode embodies nonlinear data analysis methods, and it recognizes subtle patterns that are typically missed by ordinary data mining and financial analysis software.

There's a Natural Language Processing ("natlang") module, which deals with human language processing. Most simply, the natlang module represents texts as TextNodes, linking down to WordNodes representing words in the text, and other nodes and links representing linguistic feature structures, facts, concepts and ideas in the text. It has text processing actors that recognize key features and concepts in text, drawing relationships between texts and other texts, between texts and people, between texts and numerical data sets. These actors process vast amounts of text with a fair amount of understanding and a lot of speed.

On the other hand, the natlang module also contains reading actors, which are used to study important texts in detail. They proceed through each text slowly, applying semantic processing schema that build a mental model of the relationships in the text, much as a human reader does. These reading actors really draw the AI Engine's full set of semantic relationships into play, every time they read a text. The nodes in the natural language module carry out both semantic and syntactic analysis. The NL system takes in text, parses it, and outputs the parsed text as nodes and links.

As important as language understanding is, however, it is not all-powerful. Relations learned through the natural language system are not intrinsically "understood" by the system; they represent purely formal knowledge. The grounding module, on the other hand, contains schema that allow the system to derive knowledge directly from its environment. One particular kind of grounding actor is the textual-numerical correlation actor, which recognizes patterns joining texts and numerical data files together. These actors are used by the Webmind Market Predictor when it finds the concepts in news that drive the financial markets.

The ingestion of linguistic as well as numerical data is mediated via the short-term memory module. The most recent items read in through a given input stream are stored in short-term memory, and the various intercombinations of these items are explicitly represented in a temporary way. This system is crucial, among other things, for disambiguation of linguistic terms.
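A minimal sketch of a short-term memory of this kind, assuming a fixed-capacity buffer, pairwise "intercombinations" of the items it holds, and a toy sense inventory for disambiguation. The `SENSES` table, the sense labels and the overlap-counting rule are hypothetical illustrations, not the module's actual design.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Set


class ShortTermMemory:
    """Fixed-capacity buffer of recent input items plus their temporary pairings."""

    def __init__(self, capacity: int = 5) -> None:
        self.items: deque = deque(maxlen=capacity)  # oldest items fall out automatically
        self.pairs: Set[frozenset] = set()

    def add(self, item: str) -> None:
        self.items.append(item)
        # Rebuild the temporary "intercombinations" of whatever is currently held.
        self.pairs = {frozenset(p) for p in combinations(self.items, 2)}

    def disambiguate(self, word: str, senses: Dict[str, Dict[str, Set[str]]]) -> str:
        """Pick the sense of `word` whose cue words overlap most with STM contents."""
        context = set(self.items) - {word}
        options = senses[word]
        return max(options, key=lambda sense: len(options[sense] & context))


SENSES = {"bank": {"bank/finance": {"money", "loan", "deposit"},
                   "bank/river": {"river", "water", "shore"}}}

stm = ShortTermMemory(capacity=3)
for w in ["the", "river", "bank"]:
    stm.add(w)
```

Here "bank" resolves to the river sense because "river" is still in the buffer; after a few more words such as "money" and "loan" push it out, the same query resolves to the financial sense.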

There's a category module, containing actors that group other actors together according to measures of association, and form new nodes representing these groupings. This, remember, is a manifestation of the basic principle of the dual network.
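One simple way to realize this kind of grouping, sketched below under our own assumptions, is to join every pair of actors whose association strength meets a threshold and to treat each connected group as a new category node. This is plain threshold clustering with union-find, not necessarily the method the category module uses; the data and the `Category(...)` naming are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def form_categories(assoc: Dict[FrozenSet[str], float],
                    threshold: float = 0.5) -> List[Set[str]]:
    """Group actors whose pairwise association meets `threshold` (union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for pair, strength in assoc.items():
        a, b = tuple(pair)
        root_a, root_b = find(a), find(b)
        if strength >= threshold and root_a != root_b:
            parent[root_a] = root_b  # merge the two groups

    groups: Dict[str, Set[str]] = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    # Only multi-member groups become new category nodes.
    return [g for g in groups.values() if len(g) > 1]


assoc = {
    frozenset({"cat", "dog"}): 0.8,
    frozenset({"dog", "wolf"}): 0.6,
    frozenset({"cat", "stock"}): 0.1,
    frozenset({"stock", "bond"}): 0.9,
}
categories = form_categories(assoc)
category_nodes = ["Category(" + ", ".join(sorted(g)) + ")" for g in categories]
```

The weak cat/stock association falls below the threshold, so two categories emerge: one for the animals and one for the financial instruments.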

There are learning actors that recognize subtle patterns among other actors and embody these as new actors. These span various modules, including the reason module, containing logical inference wanderers that reason according to a form of probabilistic logic; and the automata module, containing AutomatonNodes that carry out evolutionary learning according to genetic programming, a simulation of the way species reproduce and evolve.

In the user module there are actors that model users' minds, observing what users do, and recording and learning from this information - these are UserNodes and their associated Wanderers. There are actors that moderate specific interactions with users, such as conversations, or interactions on a graphical user interface. And in the self module there are self actors, wanderers and stimuli that help the SelfNode study its own structure and dynamics, and set and pursue its own goals.

There are QueryNodes in the query module, embodying schema that mediate interactions with human queriers. WAE's query processing is integrated with the rest of its mental activity, just as, for a human being, question-answering is not so different from purely internal thought processing. When a query (be it a series of key words, a paragraph of natural language, or a series of commands requesting particular data operations) is entered into the system, a node is created for it, the query node sends out mobile actors, and these actors create new links joining it to other nodes. Activity related to the query node spreads through the Psynet, and after a certain period of time, the nodes with the highest activity relevant to this particular activation process are collected and returned as the answers to the query. The distinction between activity due to a particular query and activity due to general Psynet thought processes or other queries is maintained via an innovative, proprietary technique of "parallel thought processes," which allows the system to do one thing the human mind cannot: carry out hundreds or thousands of simultaneous trains of thought, and keep them all straight!
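The query flow described above can be sketched as follows. Keeping a separate activation table per thought process is purely our hypothetical stand-in for the proprietary "parallel thought processes" technique, whose details are not given here; linking query keywords directly replaces the mobile actors, and `Psynet`, `query` and the example graph are illustrative names only.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple


class Psynet:
    """Toy semantic network with one activation table per thought process."""

    def __init__(self) -> None:
        self.links: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
        # activation[process_id][node]: energy belonging to one train of thought only
        self.activation: Dict[int, Dict[str, float]] = {}

    def link(self, a: str, b: str, strength: float) -> None:
        self.links[a].append((b, strength))
        self.links[b].append((a, strength))

    def query(self, pid: int, keywords: List[str], steps: int = 2,
              decay: float = 0.5, k: int = 3) -> List[str]:
        qnode = f"Query:{pid}"
        for kw in keywords:  # stand-in for mobile actors building links from the query node
            if kw in self.links:
                self.link(qnode, kw, 1.0)
        act: Dict[str, float] = defaultdict(float, {qnode: 1.0})
        for _ in range(steps):  # spread activation for a fixed period
            new: Dict[str, float] = defaultdict(float)
            for node, a in act.items():
                new[node] += a * (1.0 - decay)
                out = self.links.get(node, [])
                for neighbor, strength in out:
                    new[neighbor] += a * decay * strength / len(out)
            act = new
        self.activation[pid] = dict(act)  # kept apart from every other process
        candidates = [n for n in act if not n.startswith("Query:") and n not in keywords]
        return heapq.nlargest(k, candidates, key=act.get)


net = Psynet()
net.link("jaguar", "car", 0.9)
net.link("jaguar", "cat", 0.9)
net.link("engine", "car", 0.9)
answers = net.query(1, ["jaguar", "engine"])
```

For the query "jaguar engine", "car" receives activation from both keywords and ranks above "cat"; a second query under a different process id writes to its own table and leaves the first one untouched.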

Each of the actors (nodes, links, mobile agents) involved in these modules has in itself only a small amount of intelligence, sometimes no more than that you might see in competing AI products. Psycore is a platform in which they can all work together, learning from each other and rebuilding each other, creating an intelligence in the whole that is vastly greater than the sum of the intelligences of the parts.

What the psycore provides is a common dynamic data structure for all these different specialized pattern recognition and formation schemes to work together on. It achieves its true value when the different specialized schemes actually work together in harmony, helping each other learn every step of the way. The emergent intelligence that you obtain in this way outweighs by far the mechanical inefficiency of using a common dynamic data structure.

Experiential Learning

So far we've mostly been talking about the Engine's internals - how various processes inside the system work, how they interact, and so forth. But ultimately, it's crucial that the AI Engine is embedded in the outside world - the Net is not only its brain, it's also its world. How does WAE experience? How does it learn from experience? Intelligence is all about achieving complex goals in complex environments, which means that sensing and understanding the environment is centrally important.

Experiencing and learning from experience uses all the mechanisms described above - language, reasoning and activation spreading, self and emergence, and on and on and on.

It also uses another level of structure: the division of nodes into function-specific node groupings. Most of the psynet running at any given time consists of what we can think of as a medium-term memory node group. But interacting with the world requires three specialized node groups in addition: one for perception, one for action, and one for short-term memory. The dynamics here are roughly what you'd expect: stuff comes in through perception and goes out through action, and short-term memory is the gateway for most if not all interactions between perception/action and long-term memory.

The short-term memory is host to many of the nodes constituting what we call "WAE's Psyche": its goals, contexts, feelings and motivations; and schema that it commonly uses to achieve certain goals in certain contexts, including the goal of maximizing its own happiness, and the goal of maximizing user happiness insofar as possible.

MindServers

The last ingredient of our AI architecture has emerged over the last 6 months as a result of our need to improve the space and time efficiency of the system. We realized that the psycore framework, with freely interpenetrating nodes and links of various types carrying out various processes, actually provided more generality than was required by 80% of all mental processes. Of course, the remaining 20% of mental processes are crucial - these are the smartest processes, the ones requiring "dynamic, free-flowing, focused attention." But we realized we had to rearchitect the system so that only the processes really requiring the full power of psycore are run in psycore, and the others are run using a host of more specialized and efficient software mechanisms called MindServers.

Thus we now have a centralized "Mind Database" containing complete information about all the mind objects (nodes and links) in the system. Most of the system's learning processes require only partial information about mind objects, or require information about only some mind objects rather than all of them. Thus, surrounding the Mind DB, we have a collection of specialized learning processors or "Mind Servers." Mind Servers come in two species:

* Psycore-based: These represent knowledge internally using nodes and links, but have specialized scheduling of processes.
* Non-psycore-based: These represent knowledge internally using domain-specific knowledge representations.

In either case, a Mind Server comes with a process that builds its own internal image of the knowledge from the Mind DB that it needs. Because of their special-purpose nature, the knowledge images inside non-core-based MindServers may be much more specialized than that inside the Mind DB, thus achieving greater space and time efficiency. Note that the whole framework is still based on the Webmind Core, which provides a general framework for networked software actors. The Webmind Core was originally built to support psycore, but now it also supports a richer "society of mind"-ish AI architecture with psycore at the center.
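Here is a minimal sketch of a non-psycore MindServer building its specialized image from the Mind DB, assuming the Mind DB can be viewed as a flat table of link records. `MindDB`, `LinkRecord`, `AssociationMindServer` and the link type names are hypothetical. The server keeps only associative links, drops the confidence values it does not need, and stores the rest as a compact adjacency map.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class LinkRecord:
    """One row of the (hypothetical) central Mind DB."""
    link_type: str
    source: str
    target: str
    strength: float
    confidence: float


class MindDB:
    def __init__(self, records: Iterable[LinkRecord]) -> None:
        self.records = list(records)

    def select(self, link_types: Set[str]) -> List[LinkRecord]:
        return [r for r in self.records if r.link_type in link_types]


class AssociationMindServer:
    """Non-psycore MindServer: keeps only what association-finding needs."""
    LINK_TYPES = {"Associative"}

    def __init__(self, db: MindDB) -> None:
        # Internal image: source -> [(target, strength)]; confidences are dropped.
        self.image: Dict[str, List[Tuple[str, float]]] = {}
        for r in db.select(self.LINK_TYPES):
            self.image.setdefault(r.source, []).append((r.target, r.strength))

    def strongest_associate(self, node: str) -> Optional[str]:
        best = max(self.image.get(node, []), key=lambda t: t[1], default=None)
        return best[0] if best is not None else None


db = MindDB([
    LinkRecord("Associative", "cat", "dog", 0.7, 0.9),
    LinkRecord("Associative", "cat", "mouse", 0.9, 0.6),
    LinkRecord("Causal", "rain", "wet", 0.8, 0.8),
])
server = AssociationMindServer(db)
```

The design point is that the image holds a fraction of the Mind DB's information in a layout suited to one task, which is where the space and time savings come from.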

There are many different MindServers; the following is a partial list:

* Context Formation
* Association-Finding
* Inference
* Higher-Order Inference
* Prediction
* Causal Inference
* Linguistic Feature Structure Learning
* Genetic Programming

Mind Servers can take care of most, but by no means all, mental processing. What they leave out is - precisely the crux of the mind! Because, for those aspects of intelligence that require constant real-time interpenetration of different kinds of learning, psycore is needed. One can think of the psycore portion of the system - still by far the largest part of the system - as the system's AttentionalFocus. Unlike the MindServers, it contains a general representation of knowledge allowing simultaneous use of all the types of knowledge contained in the Mind DB.

Finally, along with the MindServers, we have added specialized system lobes corresponding to aspects of experiential interaction, as mentioned above. Real-time conversation processing (in KNOW, our formal-logical knowledge representation language, or English) is based on specialized lobes carrying out STM, language comprehension, language production, query processing, and related functions. Text processing uses a similar system.

Webworld

An even deeper level of background processing than is provided by the MindServers is provided by Webworld. Webworld is a software system that helps the AI Engine with speculative learning processes that may take a long time, but are ultimately needed for the system's increasing intelligence.

Webworld is a sister software system to the AI Engine, sharing some of the same codebase, but serving a complementary function. A Webworld lobe is a much lighter-weight version of a WAE lobe, which can live on a single-processor machine with a modest amount of RAM, and potentially a slow connection to other machines. Webworld lobes host actors just like WAE lobes, and they exchange actors and messages with other Webworld lobes and with AI Engines. AI Engines can dispatch non-real-time, non-data-intensive "background thinking" processes to Webworld, thus immensely enhancing the processing power at their disposal. Webworld is a key part of the Webmind Inc. vision of an intelligent Internet. It allows the AI Engine's intelligence to effectively colonize the entire Net, rather than remaining restricted to small clusters of sufficiently powerful machines.

In studying the practical applications of Webworld within the AI Engine, it becomes clear that there are two quite distinct use cases.

One is what we may call the high-bandwidth use case. In this case, we can assume we have a number of Webworld locations all of which have high-bandwidth access to a central Mind Db. This occurs, for example, when all the Webworld locations are on a single LAN, which also contains an AI Engine.

In this case, one can use Webworld locations to do pretty sophisticated things. For example, one can do schema evolution, where a population of schema resides in each location, and fitness evaluation of the schema involves inference, which gathers relevant data from the central Mind Db. GP operations, and inference operations, are carried out in the Webworld location, but the location must make frequent calls to the Mind Db to gather relevant links on which to perform inference.

The other case is the low-bandwidth use case. In this case, we assume that each Webworld location must do its own processing without frequent access to externally-stored data. Appropriate tasks for this kind of Webworld location would be more mathematically-oriented problems like:

* GP-based parameter optimization (given a bunch of data from the time server regarding the indicator values of the system under different parameter values, find the best parameter values for achieving given indicator values).
* Plan optimization (taking plans created by the inference engine and making them more compact and efficient, using graph theory algorithms)
* Creation of predictive models for system parameters (using ESP, etc., based on data from the time server)
* Clustering for category node formation. (A collection of nodes is characterized by feature vectors, which are exported to a Webworld location for clustering using a standard algorithm, e.g. a Weka method.)
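For the clustering task in the last item, the exported feature vectors could be handled by any standard algorithm. The sketch below uses plain k-means (Lloyd's algorithm) in Python rather than a Weka method, purely to keep it self-contained; the feature vectors and parameters are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def kmeans(vectors: List[Sequence[float]], k: int, iters: int = 20,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's k-means: returns a cluster label per vector and the centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]  # k distinct starting points
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, v in enumerate(vectors):
            labels[i] = min(range(k), key=lambda c: sum((a - b) ** 2
                                                        for a, b in zip(v, centroids[c])))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical feature vectors for four nodes, exported to a Webworld location.
features = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans(features, k=2)
```

Only the vectors travel to the Webworld location and only the labels come back, which is why this task suits the low-bandwidth case.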

Any "deep AI" problem requires background knowledge that is highly memory-intensive, and hence is not appropriate for a low-bandwidth Webworld situation.

Many of the low-bandwidth-appropriate problems are optimization problems that can be approached using GP, and schema learning, a key high-bandwidth Webworld application, also relies on GP. The first priority with Webworld is thus to get distributed GP working as effectively as possible.
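Distributed evolutionary computation is commonly organized as an island model, in which separate populations evolve independently and periodically exchange their best individuals. The sketch below applies that idea to the GP-based parameter optimization task listed above, with each island standing in for one Webworld location. For brevity it evolves fixed-length parameter vectors with a genetic algorithm rather than program trees, and `indicator_fit` is a hypothetical stand-in for a fitness computed from time-server data.

```python
import random
from typing import Callable, List

Vector = List[float]


def evolve_islands(fitness: Callable[[Vector], float], dim: int, islands: int = 4,
                   pop: int = 20, gens: int = 40, migrate_every: int = 10,
                   sigma: float = 0.3, seed: int = 1) -> Vector:
    """Island-model genetic algorithm; each island stands in for one Webworld location."""
    rng = random.Random(seed)
    pops = [[[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(pop)]
            for _ in range(islands)]

    def next_generation(population: List[Vector]) -> List[Vector]:
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop // 2]  # truncation selection; parents survive (elitism)
        children = []
        while len(parents) + len(children) < pop:
            a, b = rng.sample(parents, 2)
            # Uniform crossover followed by Gaussian mutation of every gene.
            child = [rng.choice(genes) + rng.gauss(0.0, sigma) for genes in zip(a, b)]
            children.append(child)
        return parents + children

    for gen in range(1, gens + 1):
        pops = [next_generation(p) for p in pops]  # islands evolve independently
        if gen % migrate_every == 0:
            # Ring migration: each island's worst individual is replaced by
            # a copy of the previous island's best.
            bests = [max(p, key=fitness) for p in pops]
            for i, p in enumerate(pops):
                p.sort(key=fitness)
                p[0] = list(bests[i - 1])
    return max((ind for p in pops for ind in p), key=fitness)


def indicator_fit(params: Vector) -> float:
    """Hypothetical: reward parameters that bring the system close to target (1, -2)."""
    x, y = params
    return -((x - 1.0) ** 2 + (y + 2.0) ** 2)


best = evolve_islands(indicator_fit, dim=2)
```

Between migrations each island needs no communication at all, and a migration sends only one small vector per island, which is what makes this layout plausible for slow, loosely connected machines.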

Summary

The architecture of the system has evolved over time in what seems to us a very natural way. We have made many discoveries that you just don't make until you actually start building and experimenting with an integrative, large-scale, self-organizing AI system. While we are sure that our learning curve is not over, we feel that our conceptual design and our system architecture are now both sufficiently complete to lead us to our goal of creating a program that can hold an intelligent conversation about itself and information it has read. We expect that our near-future discoveries will consist largely of ways to make particular components more efficient, and of intelligent "control schema" representing high-level habits of thought (affecting not the system structure but the system dynamics).

In review, the architecture consists of

* The Webmind Core, a general "distributed actor system" framework, on top of which all other system components are implemented
* Psycore, a dynamic semantic network of flexibly-defined nodes and links
* An assemblage of AI modules, each containing specific types of nodes and links devoted to a particular aspect of mental function: reason, NLP, GP, psyche, user modeling, etc.
* An assemblage of MindServers, interacting via a common MindDB, each carrying out a particular aspect of mental function in isolation, for cases where the overhead of full inter-process interpenetration is not needed
* A real-time conversation-processing subsystem, carrying out language comprehension and generation and short-term memory, based on a specialized variant of psycore
* Webworld, a peer-to-peer distributed processing framework to which psycore nodes and MindServers can dispatch difficult long-term problems for solution

This seems to us essentially the only possible way to induce contemporary hardware and software to give rise to the self-organizing emergent network of patterns that is the mind.

In terms of practical applications of the system, it seems that the first wave of real-AI-empowered software applications will not use the AI Engine directly, but will rather use rules of various sorts exported from the AI Engine. This is because the AI Engine itself uses a lot of processing power, so that even if a working intelligent computer conversationalist is created, it may only be able to talk to one person at a time, and may require hundreds of thousands of dollars' worth of hardware. There may be a limited market for "intelligent digital gurus" like this as research tools, but for the mass market, in the short term, the key will be the exportation of rules and other data structures that simpler, less computationally costly software systems can use to dramatically enhance their performance.






I
Conceptual Background







2
A Brief and Biased History of AI

Ben Goertzel

with some help from Ted Goertzel


This chapter began as part of a popularization about the AI Engine that my father Ted Goertzel and I were writing in 1998. We decided to put the popularization on hold until the system itself was ready to be launched into the world in a sufficiently impressive way. Maybe this year!


1. Competitors in the Race to Real AI (and the Lack Thereof)

Given that there exists a major subdiscipline of computer science called "Artificial Intelligence," one might expect there to be many strong competitors in the race to create a real AI system. But in fact this is not the case. Most of the field of AI is not directly concerned with the design and engineering of real AI systems at all, but rather with various narrowly defined subproblems of the problem of creating digital intelligence. The presupposition of this work is that solving these subproblems, in isolation, contributes significantly toward solving the overall problem of creating real AI. While this is of course true to a certain extent, our experience with the AI Engine suggests that it is not as true as is commonly believed. In many cases, the best approach to implementing an aspect of mind in isolation is very different from the best way to implement this same aspect of mind in the framework of an integrated, self-organizing AI system.

Quite honestly, at the present time we don't consider there to be any very serious competitors in the race toward real AI. Without being too egomaniacal about it, there is simply no evidence that anyone else has a serious and comprehensive design for a digital mind. However we do realize that there is bound to be more than one approach to creating real AI, and we are always open to learning from the experiences of other teams with similar ambitious goals.

Perhaps the closest thing we have to a real competitor on the real AI front is Artificial Intelligence Enterprises (www.a-i.com), a small Israeli company whose engineering group is run by Jason Hutchens, a former colleague of mine from the University of Western Australia in Perth. They are a direct competitor in that they are seeking to create a conversational AI system somewhat similar to the Webmind Conversation Engine. However, they have a very small team and are focusing on statistical-learning-based language comprehension and generation rather than on deep cognition, semantics, and so forth. If this team were given a significant infusion of more diverse AI expertise, and many more engineers, it could become a serious threat. At best, however, they would be 3-5 years behind us, as their initial design is certainly no more advanced than ours was in 1997.

Another project that competes less directly is Katsunori Shimohara and Hugo de Garis's Artificial Brain project, initiated at ATR in Japan (see http://citeseer.nj.nec.com/1572.html) and continued at Starlab in Brussels, and Genotype Inc. in Boulder, Colorado. This is an attempt to create a hardware platform (the CBM, or CAM-Brain Machine) for real AI using Field-Programmable Gate Arrays to implement genetic programming evolution of neural networks. We view this fascinating work as somewhat similar to the work on the Connection Machine undertaken at Danny Hillis's Thinking Machines Corp. - the focus is on the hardware platform, and there is not a well-articulated understanding of how to use this hardware platform to give rise to real intelligence. It is quite possible that the CBM could be used inside WAE, as a special-purpose genetic programming MindServer; but the CBM and the conceptual framework underlying it appear not to be adequate to support the full diversity of processing needed to create an artificial mind.

A project that once would have appeared to be competitive with ours, but changed its goals well before Webmind Inc. was formed, is the well-known CYC project (www.cyc.com). This began as an attempt to create true AI by encoding all common sense knowledge in first-order predicate logic. They produced a somewhat useful knowledge database and a fairly ordinary inference engine, but appear to have no R&D program aimed at creating autonomous, creative interactive intelligence.

Another previous contender who has abandoned the race for true AI is Danny Hillis, founder of Thinking Machines Corporation. This firm focused on the creation of an adequate hardware platform for building real artificial intelligence - a massively parallel, quasi-brain-like machine called the Connection Machine (Hillis, 1987). However, their pioneering hardware work was not matched with a systematic effort to implement a truly intelligent program embodying all the aspects of the mind. The magnificent hardware design vision was not matched by an equally grand and detailed mind design vision. And at this point, of course, the Connection Machine hardware has been rendered obsolete by developments in conventional computer hardware and network computing.

On the other hand, the well-known Cog project at MIT is aiming toward building real AI in the long run, but their path to real AI involves gradually building up to cognition after first getting animal-like perception and action to work via "subsumption architecture robotics." This approach might eventually yield success, but only after decades.

Of course, there are hundreds of other AI engineering projects under way at universities and companies throughout the world, but nearly all of them involve building specialized AI systems restricted to one aspect of the mind, rather than creating an overall intelligent system.

Why is the field of AI this way? Why are there so few projects directly aimed at actually creating an autonomous, creative, digital intelligence?

I believe that, ultimately, the main culprit in the history of AI has been the lack of adequate hardware. It seems to me that hardly anybody has ever seriously tried to build a whole mind -- a computer system that can observe the world around it, act in the world around it, remember information, recognize patterns in the world and in itself, and create new structures inside itself in order to help it better achieve its goals. Presumably no one has tried to do this because the computer resources available have always been blatantly inadequate to support such a program. It seems to me that, lacking the computer resources to build a whole mind, researchers have typically focused on one or another particular aspect of the mind, and tried to push this aspect as far as it could go. Lacking a perceptual environment to embed their AI systems in, researchers have built reasoning and memory programs with essentially no perceptual systems: programs that act entirely on the basis of logical rules, with no direct sensory link to the world. And they have theorized that something substantial about mind can be learned this way. Lacking the ability to create neural nets with a billion neurons, researchers have proposed that neural nets with 10,000 neurons can be made to do something mind-like. Lacking the ability to build specialized hardware supporting all aspects of intelligence, they have built specialized hardware supporting one particular aspect of intelligence (say, evolution of neural networks according to training data) and asked how far this one aspect can be pushed. And so on.

To build a comprehensive system, with perception, action, memory, and the ability to conceive of new ideas and to study itself, is not a simple thing. Necessarily, such a system consumes a lot of computer memory and processing power, and is difficult to program and debug because each of its parts gains its meaning largely from its interaction with the other parts. Yet, is this not the only approach that can possibly succeed at achieving the goal of a real thinking machine?

We now have, for the first time, hardware barely adequate to support a comprehensive AI system. Moore's law and the advance of high-bandwidth networking mean that the situation is going to keep getting better and better. However, we are stuck with a body of AI theory that has excessively adapted itself to the era of weak computers, and that is consequently divided into a set of narrow perspectives, each focusing on a particular aspect of the mind. In order to make real AI work, I believe, we need to take an integrative perspective, focusing on

* The creation of a "mind OS" that embodies the basic nature of mind, and allows specialized mind structures and algorithms dealing with specialized aspects of mind to happily coexist
* The implementation of a diversity of mind structures and algorithms ("mind modules") on top of this mind OS
* The encouragement of emergence among these specialized modules, so that the system as a whole is coherently responsive to its goals

This, together with a precise plan for how the mind OS should work, based on a study of the philosophy of mind, is the core of the Webmind vision.

In the remainder of this chapter, I'll briefly review some large-scale trends in previous AI work, with a focus on how they fit into this picture of the history of AI, and how they contribute to the WAE. Then, in the next chapter, I'll start over from scratch, and begin with the philosophy of mind that leads to the WAE - that tells us how to build the "mind OS" and encourage the emergence that I believe real AI demands.

2. Nets versus Rules

When I first started studying AI in the mid-1980's, it seemed that AI researchers were fairly clearly divided into two camps, the neural net camp and the logic-based or rule-based camp. This isn't quite so true anymore, but in reviewing the history of AI, it's an interesting place to start. Both of these camps wanted to make AI by simulating human intelligence, but they focused on very different aspects of human intelligence. One modeled the brain, the other modeled the mind.

The neural net approach starts with neurons, the nerve cells the brain is made of. It tries to simulate the ways in which these cells are linked together, and in which they achieve cooperative behaviors by nonlinearly spreading electrical activity among one another and modulating one another's chemical properties. Its conceptual roots go back to Norbert Wiener's 1948 book "Cybernetics: Or Control and Communication in the Animal and the Machine," an amazing book for its time, in which it was shown for the first time that the same mathematical principles could be used to understand both man-made electrical control systems and biological systems like bodies and brains.

Rule-based models, on the other hand, try to simulate the mind's ability to make logical, rational decisions, without asking how the brain does this biologically. They trace back to a century of revolutionary developments in mathematical logic, culminating in the realization that Leibniz's dream of a complete logical formalization of all knowledge is actually achievable in principle, although very difficult in practice.

To almost any observer not caught up in one side or the other of the debate, it's obvious that both of these ways of looking at the mind are extremely limited. True intelligence requires more than following carefully defined rules, and it also requires more than random links between a few thousand artificial neurons. The WAE incorporates aspects of neural nets and also of logic-based AI, although it doesn't use either one in a conventional way.

Neural Nets

The landmark work that marks the start of neural network theory is that of the cyberneticists Warren McCulloch and Walter Pitts in the early 1940's, along with the psychological theories Donald Hebb published in his 1949 book The Organization of Behavior. These early researchers created the "neural net view of the mind." In this view, the stuff of mind is patterns of electrical flow among neurons. Collections of tightly interlinked neurons - what Hebb called "cell assemblies" -- lead to distinct, repeatable patterns of flow. Electrical charge passes through neural networks according to nonlinear threshold rules, and on a slower time scale, modifies the properties of the synaptic connections making up the networks - the links between neurons. Via synaptic modification (Hebbian learning), distinct, repeatable patterns of flow create collections of tightly interlinked neurons. Everything is patterns of connection, patterns of electrical flow.
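
The two mechanisms in this picture, threshold firing on a fast time scale and synaptic modification on a slower one, can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions (binary units, a fixed threshold, and the plain Hebbian rule in which a weight grows by a small rate whenever its input and the unit's output are active together); the function names are hypothetical, and it is neither a biological model nor code from the WAE.

```python
def fires(inputs, weights, threshold):
    """Binary threshold unit: output 1 if the weighted input sum reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def hebbian_step(inputs, weights, threshold, rate=0.1):
    """One Hebbian update: strengthen each synapse whose input was active
    at the same time as the unit fired; other synapses are left unchanged."""
    out = fires(inputs, weights, threshold)
    new_weights = [w + rate * x * out for x, w in zip(inputs, weights)]
    return out, new_weights


# Presenting the same pattern repeatedly strengthens exactly the synapses it uses,
# so that this pattern of flow becomes easier to re-trigger.
weights = [0.5, 0.5, 0.5]
pattern = [1, 1, 0]
for _ in range(5):
    _, weights = hebbian_step(pattern, weights, threshold=1.0)
```

Because this rule only ever increases weights, practical Hebbian models add decay or normalization to keep weights bounded; that is omitted here for brevity.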

What McCulloch and Pitts did in their first research paper on neural networks was to prove that a simple neural network model could do anything - could serve as a "universal computer." At this stage computers were still basically an idea, but Alan Turing and others had worked out the theory of how computers could work, and McCulloch and Pitts' work, right there at the beginning, linked these ideas with brain science.
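
The flavor of the McCulloch-Pitts result can be seen by building logic gates out of threshold units. In the sketch below the weights and thresholds are chosen by hand (our own choices, not taken from the 1943 paper) to give AND, OR and NOT; since those three gates suffice to express any Boolean function, small networks of such units can compute any Boolean function, shown here for XOR. Universality in the full Turing-machine sense also needs unbounded memory, which this sketch does not address.

```python
def threshold_unit(weights, threshold):
    """Return a McCulloch-Pitts style unit: fires (1) when the weighted sum reaches the threshold."""
    def unit(*inputs):
        return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0
    return unit


AND = threshold_unit([1, 1], 2)   # fires only when both inputs are on
OR = threshold_unit([1, 1], 1)    # fires when at least one input is on
NOT = threshold_unit([-1], 0)     # inhibitory weight: fires only when the input is off


def xor(a, b):
    """XOR as a two-layer network of threshold units: (a OR b) AND NOT (a AND b)."""
    return AND(OR(a, b), NOT(AND(a, b)))
```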

The overall behavior of a neural network is given by the pattern of interconnections between the various neurons, and by the values of weights that are assigned to the various connections. But, given the nonlinear nature of the threshold dynamics, there's no easy mathematical way to tell what kind of behavior a certain interconnection pattern and weight distribution are going to lead to. There's no easy mathematical way to tell how the weights should adapt themselves to provide learning, and it's not so easy to figure out how the brain does it - neuroscientists still don't know.

Modern work on neural nets falls basically into two camps. There are attractor neural nets, which are complex dynamical systems, used as associative memories and problem-solving algorithms. And, much more popularly, there are feedforward or "backprop" neural networks, which learn patterns mapping inputs to outputs. These are used in various practical applications, ranging from financial prediction to automobiles' on-board diagnostic computers. In the WAE we use different methods to recognize input/output patterns, but mostly for convenience - we could use feedforward neural nets if we wanted to; we've experimented with them a bit but have found other similar methods that are less brainlike but more efficient. In my view, the use of neurons in this way doesn't really embody any of the essence of the brain's intelligence - there are other ways of doing the same thing that don't look like the brain but work a little bit better.

Real-world neural net engineering gets quite complex. For instance, to get optimal performance for OCR, instead of one neural net, researchers have constructed modular nets with numerous subnetworks. Each subnetwork learns something very specific via backpropagation, and then the subnetworks are linked together into an overall meta-network. One can train a single network to recognize a given feature of a character, say a descender, or an ascender coupled with a great deal of whitespace, or a collection of letters with little whitespace and no ascenders or descenders. But it is hard to train a single network to do several different things, say, to recognize letters with ascenders only, letters with descenders only, letters with both ascenders and descenders, and letters with neither. Thus, instead of one large network, it pays to break things up into a collection of smaller networks in a hierarchical architecture. If the network learned how to break itself up into smaller pieces, one would have a very impressive system; currently, however, the subnets are carefully engineered by humans.

The WAE has "nodes" that are a bit like neurons - they have a threshold rule in them - and "links" that are a bit like synapses, the connections between neurons in the brain. But the WAE's nodes contain a lot more information than neurons; they more closely represent huge groups of neurons, as will be made clear in Chapter 2. And the WAE's links have more to them than the links in neural net models: they're not just conduits for simulated electricity, but have specific meanings and are formed by specialized actors that recognize these meanings. In short, like backprop neural nets, the WAE takes the brain as an inspiration in some ways, but does not attempt to model the brain. But while backprop neural nets use the brain as an inspiration for how to map inputs into outputs, the WAE takes it as an inspiration for how to construct a whole mind.

Rules

Rule-based AI programs aren't based on self-organizing networks of autonomous elements like neurons or nodes, but rather on systems of simple logical rules. Intelligence is reduced to following orders. You don't try to deal with the emergence of mind from brain - you just try to look at what mind does, and write the simplest possible programs that will emulate this behavior. The approach is refreshingly direct, and teaches us a lot about the complexity of human behaviors that we take for granted and that we consider very very simple - like holding a conversation or solving a puzzle or even walking across the room. Much of what it teaches us, however, is that it's really hard to boil down intelligent behaviors into sets of rules - the sets of rules are huge and variegated, and the crux of intelligence becomes the dynamic learning of rules rather than the particular rules themselves.

One famous early rule-based program was the General Problem Solver (GPS) of Allen Newell, J.C. Shaw and Herbert Simon, which was not that general at all, but was capable of solving a variety of simple puzzles, for example cryptarithmetic puzzles like DONALD + GERALD = ROBERT. [To solve this, assign a distinct digit to each letter so that the sum comes out correctly.] What GPS was doing was taking an overall goal - solving a puzzle - and breaking it down into subgoals. It then tried to solve the subgoals, breaking them down into subgoals if necessary, until it got subgoals small enough that it could deal with them in some direct way, like by enumerating all possible values some letter could take in a cryptarithmetic puzzle. This same basic logic is used now in much bigger and better rule-based AI programs, for example SOAR, originally developed by John Laird, Allen Newell and Paul Rosenbloom and still under ongoing development.
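
The idea of reducing a goal to subgoals until each can be handled directly can be illustrated on the puzzle itself. The sketch below is our own backtracking solver, not GPS's means-ends analysis: the overall goal (make the sum correct) is split into one subgoal per column, processed right to left with the carry, and each subgoal is attacked by enumerating the digits still available for its letters. A failed subgoal reports back so that an earlier choice can be revised. The function names are hypothetical.

```python
def solve_cryptarithm(addend1, addend2, total):
    """Find distinct digits for the letters so that addend1 + addend2 == total.

    Assumes the three words have equal length (true for DONALD + GERALD = ROBERT).
    Returns a dict mapping each letter to its digit, or None if there is no solution.
    """
    columns = list(zip(reversed(addend1), reversed(addend2), reversed(total)))
    no_zero = {addend1[0], addend2[0], total[0]}  # leading letters cannot be 0

    def options(letter, mapping):
        # An assigned letter has exactly one option; otherwise any unused digit is allowed.
        if letter in mapping:
            return [mapping[letter]]
        used = set(mapping.values())
        return [d for d in range(10)
                if d not in used and not (d == 0 and letter in no_zero)]

    def solve_column(i, carry, mapping):
        # Subgoal i: make column i (counted from the right) add up, given the incoming carry.
        if i == len(columns):
            return mapping if carry == 0 else None
        top, bottom, result = columns[i]
        for d_top in options(top, mapping):
            m1 = {**mapping, top: d_top}
            for d_bottom in options(bottom, m1):
                m2 = {**m1, bottom: d_bottom}
                next_carry, digit = divmod(d_top + d_bottom + carry, 10)
                if digit not in options(result, m2):
                    continue  # this column's subgoal fails; try other digits
                found = solve_column(i + 1, next_carry, {**m2, result: digit})
                if found is not None:
                    return found
        return None  # every choice failed: the caller backtracks


    return solve_column(0, 0, {})


def word_value(word, mapping):
    """Read a word as a decimal number under a letter -> digit mapping."""
    return int("".join(str(mapping[ch]) for ch in word))


solution = solve_cryptarithm("DONALD", "GERALD", "ROBERT")
```

Working column by column is what keeps the search small: most partial assignments are rejected as soon as a single column fails, long before all ten letters have been assigned.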

This business of goal and subgoals is important to the WAE - we have something called a GoalNode, and we have processes called schema that can break goals contained in GoalNodes into subgoals. The basic algorithm of GPS and SOAR is something that's necessary for the mind, but it doesn't have to be done in as rigid a way as these programs do it. In fact, doing it in such a rigid way is tremendously destructive. But to make it flexible, you need the goal and subgoal management part of the mind to interact with the other parts of the mind. The system has to be able to flexibly determine which of its processes are effective for achieving which of its goals in what contexts - and for this it needs reasoning and association-finding and long-term memory. And it has to be able to use context-dependent, intuitive reasoning to figure out what goals to split into subgoals in what way in what situation. Basically GPS and SOAR and this whole line of AI research are a result of taking one aspect of the mind - goal-directed, problem-solving behavior - and extracting it from the rest of the mind. Unfortunately, when you extract it from the rest of the mind, this aspect of thinking isn't all that useful, because it has no way to control itself in a context-dependent way.

Another famous rule-based AI program was BACON, which was basically a data mining tool for extracting algebraic patterns from scientific data. Herbert Simon once claimed that a four-to-five-hour run of BACON corresponds to "not more than one human scientific lifetime." Douglas Hofstadter, in his book Metamagical Themas, suggested that one run of BACON actually corresponds to about one second of a human scientist's life work. I think that Hofstadter's estimate, though perhaps a little skimpy, is much closer to the mark. Only a very small percentage of scientific work is composed of BACON-style data crunching.
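
What "BACON-style data crunching" looks like can be suggested with a toy version of one of BACON's best-known results, the rediscovery of Kepler's third law. The sketch below is a brute-force simplification of our own: the real BACON built new terms step by step, for example forming ratios or products of variables that rise or fall together, whereas here we simply search small integer exponents a and b for which D^a * P^b stays nearly constant across planets. The data are approximate textbook values of mean orbital distance D (astronomical units) and period P (years); the 1% tolerance is an arbitrary assumption.

```python
from itertools import product

# Approximate mean distance from the Sun (AU) and orbital period (years).
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.86),
    "Saturn": (9.537, 29.46),
}


def nearly_constant(values, tolerance=0.01):
    """True if the spread of the values is small relative to their mean."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / abs(mean) < tolerance


def find_invariant(data, max_power=3):
    """Return the first exponent pair (a, b), with a > 0, such that D**a * P**b is nearly constant."""
    for a, b in product(range(1, max_power + 1), range(-max_power, max_power + 1)):
        terms = [d ** a * p ** b for d, p in data.values()]
        if nearly_constant(terms):
            return a, b
    return None
```

Here `find_invariant(PLANETS)` returns `(3, -2)`, that is, D^3 / P^2 is roughly constant, which is Kepler's third law. As the text argues, the hard parts of science (choosing the variables, deciding what data matter) happen before a search like this ever runs.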

In the WAE, we actually use stuff like BACON - though vastly more sophisticated. We call this aspect of the WAE's thinking "machine learning" or "data mining" - it's discussed in detail in Chapters 8 and 11. Recognizing patterns in vast amounts of data is a very important part of the mind, but it's only part of the mind. The WAE learns rules explaining why humans like some messages or e-mails better than others, using methods not that different from BACON's. But, the real trick there is in mapping the messages or e-mails into numbers that data mining methods can deal with. This involves understanding the meanings of various words and phrases and expressions. Also, there's the matter of deciding what data to look at, which is done by the general association-finding mechanisms in the WAE's mind. And there's reasoning which brings general background knowledge into the process, as opposed to pure data mining which is just pattern-finding. Bringing associations and reasoning into the picture, you need long-term memory, which opens a whole big and beautiful can of worms. Pattern finding is crucial, but it's only a little part of the picture.

Rule-based AI - "symbolic" AI -- has had plenty of practical successes. But every single one of these successes has resulted from specialized tricks, rather than flexible intelligence. One term for this is "brittleness." Or, you could call it remarkable literal-mindedness. These programs are a lot like WordPerfect, DOS 6.0, or a pocket calculator -- they do what they're told, and not one little bit more. If they're programmed to deal with one context, then that's what they'll deal with; not in a million years will they generalize their knowledge to something totally different.

There was one famous program that contained logical definitions of everyday words. An "arch" was defined as "Three blocks, A, B and C, so that C is supported by A and B, and A and B do not touch." This is all very well for playing with blocks, but what will the program do when it gets to Arches National Park in Utah ... or builds arches out of modeling clay? On the other hand, show a clever three-year-old human an arch made of blocks, and she'll immediately recognize a rock arch as a member of the "arch" category. It won't occur to her that a rock arch can't be naturally decomposed into three blocks A, B and C. Children, unlike expensive research computers, are anything but brittle: even their bones are flexible!
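
The literal-mindedness of such a definition is easy to reproduce. In the hypothetical sketch below, the scene encoding and the function name are our own assumptions, not the original program's: a scene is a set of named parts plus "supports" and "touches" relations, and the rule is applied exactly as stated.

```python
def is_arch(parts, supports, touches):
    """Literal rule: three blocks A, B, C such that C is supported by A and B,
    and A and B do not touch.

    parts: set of part names
    supports: set of (supporter, supported) pairs
    touches: set of frozensets {x, y} for parts in contact
    """
    if len(parts) != 3:
        return False  # anything not made of exactly three blocks is rejected outright
    for c in parts:
        a, b = sorted(parts - {c})
        if ((a, c) in supports and (b, c) in supports
                and frozenset((a, b)) not in touches):
            return True
    return False


block_arch = ({"left", "right", "top"},
              {("left", "top"), ("right", "top")},
              {frozenset(("left", "top")), frozenset(("right", "top"))})
rock_arch = ({"rock"}, set(), set())  # a natural arch: one continuous piece of stone
```

The rule accepts the block arch and rejects the rock arch, because nothing in it says what an arch is for or what it looks like; it only checks the three-block pattern.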

Some people have tried to get around the brittleness problem by providing the computer with so much information that it could answer any possible contingency. The most ambitious project in this direction was Doug Lenat's Cyc project, mentioned above, which has been going since 1984. Cyc is focused on trying to build a program with common sense. The Cyc team is mainly focused on encoding millions of items of data, so that the program can know everything an eight-year-old kid knows. "Cyc" was originally short for "Encyclopedia," but they found that the knowledge they needed was quite different from that found in encyclopedias. It was everyday knowledge you could get by asking a small child, perhaps more like that in a dictionary. For example, the Cyc definition of "skin" goes like this:

"A (piece of) skin serves as outer protective and tactile sensory covering for (part of) an animal's body. This is the collection of all pieces of skin. Some examples include #$TheGoldenFleece (representing an entire skin of an animal) and (#$BodyPartFn #$YulBrynner #$Scalp) (representing a small portion of his skin)."

The Cyc definition of happiness is:

The enjoyment of pleasurable satisfaction that goes with well-being, security, effective accomplishments, or satisfied wishes. As with all #$FeelingAttributeTypes, this is a #$Collection: the set of all possible amounts of happiness one can feel. One instance of #$Happiness is 'extremely happy'; another is 'just a little bit happy'.

This is an attempt to solve the common sense problem that we see when playing with chat bots like ELIZA - these chat bots have no common sense, they have no idea what words mean. Cyc is based on getting humans to tell computers what words mean.

It's interesting stuff, but you have to ask: how much do the logical definitions in Cyc really overlap with the kind of information contained in the mind of an eight-year-old child? We humans aren't even explicitly aware of much of the information we use to make sense of the world. A human's notion of happiness or skin is much bigger, more disorderly and messier than these definitions. These kinds of general abstract definitions may be inferred in the human mind from a whole lot of smaller-scale, practical patterns recognized involving skin and happiness, but they're not the be-all and end-all. In dealing with most practical situations involving skin and happiness, we don't refer to this kind of abstraction at all, but use the more specialized patterns that the general conclusions were derived from.

Basically, Cyc tried to divorce information from learning, but it can't be done. A mind can only make intelligent use of information that it has figured out for itself. Despite sixteen years of programming, Cyc has never succeeded in emulating an eight-year-old child. Nor has anyone yet found much use for a CD-ROM full of formal, logical definitions of common sense information.

In fairness to Doug Lenat, I must say that he is now working from a computational-psychology perspective that has something in common with my approach. He has a reasonably solid theory of general heuristics, problem-solving rules that are abstract enough to apply to any context whatsoever, and his company, Cycorp, is in some limited ways a competitor to Webmind Inc., developing intelligent text analysis techniques. His pre-Cyc programs AM and EURISKO applied his general heuristics to mathematics and science respectively. Both of these programs were moderately successful, exemplars in their field, but far from true intelligence. They lack a holistic view of the mind. Getting the problem-solving rules right means virtually nothing, because problem-solving rules gain their psychological meaning from their interaction with other parts of the mind. If the other parts aren't even there, the problem solving is bound to be sterile.

EURISKO won a naval fleet design contest two years in a row, until the rules were changed to prohibit computer programs from entering. And it also received a patent for designing a three-dimensional semiconductor junction. But when looked at carefully, even EURISKO's triumphs appear simplistic and mechanical. Consider EURISKO's most impressive achievement, the 3-D semiconductor junction. The novelty here is that the two logic functions "Not both A and B" and "A or B" are both done by the same junction, the same device. One could build a 3-D computer by appropriately arranging a bunch of these junctions in a cube.

But how did EURISKO make this invention? The crucial step was to apply the following general-purpose heuristic: "When you have a structure which depends on two different things, X and Y, try making X and Y the same thing." The discovery, albeit an interesting one, came right out of the heuristic. This is a far cry from the systematic intuition of a talented human inventor, which synthesizes dozens of different heuristics in a complex, situation-appropriate way.
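
The heuristic is mechanical enough to write down directly, which is part of the point being made. In this hypothetical sketch, a design is just a mapping from roles to the components that fill them, and the heuristic proposes every variant in which two distinct components are merged into one; the representation and names are illustrative assumptions, not EURISKO's actual data structures.

```python
from itertools import combinations


def merge_two_dependencies(design):
    """Heuristic: if a structure depends on two different things X and Y,
    try making X and Y the same thing.

    design: dict mapping each role to the component that fills it.
    Returns one candidate design per unordered pair of distinct components,
    with every occurrence of the second component replaced by the first.
    """
    components = sorted(set(design.values()))
    candidates = []
    for x, y in combinations(components, 2):
        candidates.append({role: (x if part == y else part) for role, part in design.items()})
    return candidates


# Two logic functions implemented by two separate junctions (cf. the text's example).
junction_design = {"nand_function": "junction_1", "or_function": "junction_2"}
proposals = merge_two_dependencies(junction_design)
```

The single proposal assigns both functions to `junction_1`. Generating it is trivial; deciding whether such a device can be physically realized, and whether it is worth building, is where the real intelligence lies.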

By way of contrast, think about the Croatian-born inventor Nikola Tesla, probably the greatest inventor in recent history, who developed a collection of highly idiosyncratic thought processes for analyzing electricity. These led him to a steady stream of brilliant inventions, from alternating current to radio to robotic control. But not one of his inventions can be traced to a single "rule" or "heuristic." Each stemmed from far more subtle intuitive processes, such as the visualization of magnetic field lines, and the physical metaphor of electricity as a fluid. And each involved the simultaneous conception of many interdependent components.

EURISKO may have good general-purpose heuristics, but what it lacks is the ability to create its own specific-context heuristics based on everyday life experience. And this is precisely because it has no everyday life experience: no experience of human life, and no autonomously discovered, body-centered digital life either. It has no experience with fluids, so it will never decide that electricity is like a fluid. It has never played with blocks or repaired a bicycle or prepared an elaborate meal, nor has it experienced anything analogous in its digital realm ... so it has no experience with building complex structures out of multiple interlocking parts, and it will never understand what is involved in this. EURISKO pushes the envelope of rule-based AI; it is just about as flexible as a rule-based program can ever get. But it is not flexible enough. In order to get programs capable of context-dependent learning, it seems to be necessary to write programs which self-organize, if not exactly as the brain does, then at least as drastically as the brain does.

Beyond the Nets versus Rules Dichotomy

"Nets versus rules" is an adequate way to view the history of AI from 30,000 feet, but of course, this perspective comes nowhere near to getting across the wild diversity of innovation you find by looking at the papers of individual researchers, including those way out of the mainstream. Here I will mention just three random bits of AI work which have had a particular influence on the WAE.

There was the work of John Andreae at the University of Canterbury in Christchurch, New Zealand. He wrote a nice little system called PURR-PUSS which learned to interact with you statistically. One of his students was John Cleary, who was one of the machine learning gurus at Waikato University in Hamilton, New Zealand, where I taught for a year. John has been working for Webmind Inc. on our machine learning module for over a year now, and he recommended to us a few of his best students, who now make up our Hamilton office. We're not exactly emulating PURR-PUSS in the WAE, but the statistical learning methods that it embodied are there in our machine learning module and our reasoning system, and the emphasis on interactive learning that Andreae advocated lives on in our current "Baby Webmind" project.

Then there is the idea of genetic algorithms - doing AI by simulating evolution rather than the brain. There are papers on the topic going back to the late 60's, but until the early 1990's this area of research was still extremely obscure. By the mid-90's it was a well-recognized area of computer science and I was doing research into the mathematics of genetic algorithms, studying questions such as "Why is evolution involving sexual reproduction more efficient than evolution involving asexual reproduction only?" Although the details are different, evolutionary AI is similar in spirit to neural net AI - you're dealing with a complex, self-organizing system that gives results in a holistic way, where each part of the system doesn't necessarily have a meaning in itself but only in the context of the behavior of the whole. In The Evolving Mind, I wrote a bit about the relation between evolutionary programming in AI and Edelman's theories of evolution in the brain. It turns out you can model the brain as an evolutionary system, with special constraints that make it a bit different from evolving ecosystems or genetic algorithms in AI. We have an evolution module in the WAE, which is used for two things: as one among many machine learning methods for finding patterns in data (along with feedforward neural nets and purely statistical methods); and as one of two ways of learning schema for perceiving and acting (the other being probabilistic logical inference).
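
For readers who have not seen one, a genetic algorithm fits in a few dozen lines. This is a generic textbook-style sketch under our own assumptions, not the WAE's evolution module: bit-string genomes, the toy "OneMax" fitness (count the 1 bits), tournament selection, one-point crossover standing in for sexual reproduction, bit-flip mutation, and elitism so the best genome is never lost. The `crossover` flag switches to purely asexual (mutation-only) reproduction.

```python
import random


def one_max(genome):
    """Toy fitness function: the number of 1 bits in the genome."""
    return sum(genome)


def tournament(pop, rng):
    """Tournament selection: return the fitter of two randomly chosen genomes."""
    a, b = rng.sample(pop, 2)
    return a if one_max(a) >= one_max(b) else b


def evolve(length=20, pop_size=30, generations=40, crossover=True,
           mutation_rate=0.02, seed=0):
    """Run a simple generational GA; return the best fitness seen in each generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=one_max, reverse=True)
        history.append(one_max(pop[0]))
        next_pop = [pop[0][:]]  # elitism: the current best survives unchanged
        while len(next_pop) < pop_size:
            child = tournament(pop, rng)[:]
            if crossover:
                # "Sexual" reproduction: splice the child with a second parent at a random cut.
                other = tournament(pop, rng)
                cut = rng.randrange(1, length)
                child = child[:cut] + other[cut:]
            # Mutation: flip each bit independently with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return history
```

Comparing `evolve(crossover=True)` with `evolve(crossover=False)` gives a crude version of the sexual-versus-asexual question; with a single random seed any difference is anecdotal, and a real comparison would average over many runs and fitness functions.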

Finally, there is the notion of a multi-agent system (MAS). The AI Engine is indeed a multi-agent system, but it differs in some important ways from the MAS's typically studied.

In most multi-agent software systems, system control is decentralized, and a common metaphor is the economy. In such models, the agents themselves are responsible for most of the control. They bid for resources from their environment. They decide when to move to a new location. In some cases, they even have permission to alter the environment so their needs will be better fulfilled. In the WAE, this model isn't entirely appropriate because of the need for overall coordination of system functions. Most multi-agent software systems are more like a society than a mind. The WAE mixes central control with decentralized, creative dynamics, which makes optimizing its parameters and controlling its behavior inordinately complex.

In the WAE, rather than being done in a purely decentralized way, control is largely decentralized on a high level (between major system components), but centralized within each of these components (which may themselves contain thousands to billions of semi-autonomous actors). Control between and within high-level components is done by specialized Actors in the system, which will measure the system's behavior, and, according to some high level goals given by the system's cognitive processes, drive the system towards states that allow better performance under the necessary constraints of memory and processor usage, as well as attention restriction.

This is a major philosophical point, but it's one that we feel confident we've handled correctly. The important point is that Marvin Minsky's metaphor of a "Society of Mind" is not actually correct, though it's evocative and useful in some ways. A mind is not a society, although it has many society-like aspects: as compared to a society, it's much more coherently focused in its behavior. Some of this coherent focus is emergent, but one of the factors that allows such focus to emerge is a modicum of overall system control. In the brain, there are many examples of this kind of overall control, including various hormonal systems affecting neurochemical activity, and older, more primitive parts of the brain that control our basic motivations and their manifestations in everyday behavior. In the WAE, centralized overall control is provided by the Homeostatic Controller, which regulates parameters (more like a hormonal system), and the Attention Broker, which regulates heavyweight tasks (more like the primitive brain, choosing between actions according to current motivations).
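
To make this division of labor concrete, here is a deliberately simplified, hypothetical sketch of the two kinds of central control just described. Everything in it is an assumption for illustration: the class and function names, the proportional feedback rule on the homeostatic side (for instance, nudging a parameter to keep some measured quantity such as memory usage near a target), and the greedy highest-priority-within-budget rule on the attention side. It is not the WAE's actual Homeostatic Controller or Attention Broker.

```python
from dataclasses import dataclass


@dataclass
class HomeostaticController:
    """Nudges one system parameter so that a measured quantity tracks a target
    (a proportional feedback rule, loosely analogous to a hormonal system)."""
    target: float
    gain: float = 0.5

    def adjust(self, parameter: float, measured: float) -> float:
        # Measured value above target -> lower the parameter; below target -> raise it.
        return parameter - self.gain * (measured - self.target)


def broker_attention(tasks, budget):
    """Greedily choose heavyweight tasks: highest priority first, within a cost budget.

    tasks: list of (name, priority, cost) tuples, where priority stands in for
    how strongly the system's current goals favor the task.
    Returns the names of the chosen tasks, in the order they were chosen.
    """
    chosen, spent = [], 0.0
    for name, priority, cost in sorted(tasks, key=lambda t: t[1], reverse=True):
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen
```

For example, with tasks `("parse", 0.9, 3)`, `("infer", 0.8, 4)` and `("evolve", 0.5, 2)` and a budget of 6, the broker picks `parse`, skips `infer` because it no longer fits, and then picks `evolve`. A real controller would need damping, many interacting parameters and learned priorities; the sketch only shows where such central decisions would sit.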

3. The Importance of Embodiment

When neural nets were being dissed in the early 70's, not everyone was optimistic about the potential of rule-based AI. In 1972, an era in which ELIZA was receiving a lot of attention, a philosopher named Hubert Dreyfus wrote a book called What Computers Can't Do, which was a vicious attack on AI. Dreyfus argued that artificial intelligence researchers were fundamentally misguided, that they could never achieve their objectives with the methods they were using. Dreyfus preached the importance of body-centered learning, and the close connection between logic, emotion and intuition. Without a body, Dreyfus argued, without feelings, there can be no real generalization of special-case ideas. Based on these philosophical considerations, he predicted that AI would be a failure.

In 1992, Dreyfus re-released the book with the title What Computers Still Can't Do. The new Introduction brims over with insolence. But his exultant crowing is not quite persuasive. He was right about the limitations of the AI programs of the 1960s and 1970s. But the observers who thought it was just a matter of time and resources have also been proven correct in many cases. Dreyfus, for example, ridiculed a late-1960's prediction by Frank Rosenblatt that computers would soon be able to take dictation, just as a human secretary can. Although this prediction didn't come true as quickly as Rosenblatt had thought, some fairly good programs are available today for this purpose, relying in large part on a neural net architecture to learn each user's speech patterns.

Dreyfus's critique of AI, in the first edition, was too strong. He appeared to believe that detailed simulation of the human body was the only possible path to AI, and he argued that this would be impossible without simulating the biology of the brain and the rest of the body. Actually, the human brain is only one intelligent system, and a great deal can be accomplished without replicating the details of its biology. But Dreyfus's arguments posed a serious challenge to AI theorists: how to design a machine that can simulate body-based, emotion-based conceptual generalization? I believe that Dreyfus was essentially correct that, if this is impossible, AI cannot work. A physical body just like ours is not required: an AI entity could have a virtual body, enabling it to interact in a rich and autonomous way with a virtual world. And emotions need not be controlled by biological neurotransmitters, they can come out of complex digital dynamics. But the point is, unless one has a computing system that is large, complex and autonomous, with integrated sensory, memory and action systems interacting with a rich environment and forming a self system, it will never develop the ability to generalize from one domain to another. The ability to generalize is learned through general experience, and general experience is gained by exploring a world.

In designing the WAE I took Dreyfus's critique to heart. Of course, I didn't try to replicate the human body as he thought was necessary. Instead, I bypassed his critique by designing a huge, self-organizing system, which lives in the perceptual world of the Internet and understands that its body is made up of Java objects living in the RAM of certain machines. It is a nonhuman, embodied social actor. Dreyfus didn't try very hard to imagine an embodied, social intelligence without a human-like body, but his ideas certainly leave room for such a thing. His problem was not with AI but with the attempt to build a mind that operates in a vacuum, instead of synergistically with a self and a world.


4. Toward a Middle Way

I've presented a dichotomy between symbolic and connectionist AI - rule-based and neural-net AI. I've pointed out that a lot of cool AI doesn't fit into this framework at all, things like statistical machine learning and genetic algorithms. Now I'm going to dig my hole even deeper by arguing that the distinction between symbolic and connectionist AI is actually a lot fuzzier than most AI gurus realize.

This is a key issue because I often like to say that the WAE synthesizes connectionist and symbolic AI. While this is a true statement, it glosses over the peculiar vagueness of the notions of "symbolic" and "connectionist" themselves. When you get deeply into these concepts, you realize that this classical dichotomy is not quite framed correctly in most discussions on AI. There is a valid distinction between AI that is inspired by the brain, and AI that is inspired by conscious reasoning and problem-solving behavior. But the distinction between "symbolic" and "connectionist" knowledge representation is not as clear as it's usually thought to be.

Classically, the distinction is that in a symbolic system, meanings of concepts are kept in special localized data structures like rules, whereas in a neural-net-like connectionist system, meanings of concepts are distributed throughout the network. Also, in a symbolic system the dynamics of the system can be easily understood in terms of what individual rules do, whereas in a connectionist system the dynamics can basically only be understood holistically, in terms of what the whole system is doing.

But in reality the difference isn't so clear. For example, one branch of symbolic AI is "semantic networks." In a semantic network you have nodes that represent concepts and links representing relations between concepts. Suppose you have a semantic network in which there is a node representing "floor." This is, obviously, symbolic in the classic sense. The meaning of the "floor" node is localized. But wait - is it really?

In some semantic-network-based AI systems, all the relations are made up by people. But others use reasoning to build relationships: they learn, for example, that floors must be solid, because people walk on floors and people can only walk on solid things. In a system like this, relations are built from other relations, and so the meaning of the "floor" node may be contained in its relations to other nodes, i.e. its connections to other nodes. And the formation of these connections may have been based on the connections of the other nodes to yet other nodes, etc. etc. etc.
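To make this concrete, here is a minimal Python sketch of a toy semantic network in which one link is derived by a single hand-written rule rather than entered directly. The class, the relation names (walks_on, can_only_walk_on, has_property) and the rule are hypothetical illustrations, not WAE code; the point is only that the derived link, and hence part of the "floor" node's meaning, comes from other links.

```python
class SemanticNetwork:
    """A toy semantic network: nodes are strings, links are (relation, source, target) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, relation, source, target):
        self.triples.add((relation, source, target))

    def forward_chain(self):
        """Apply one illustrative rule until no new links appear:
        walks_on(A, X) and can_only_walk_on(A, P)  =>  has_property(X, P)
        """
        while True:
            derived = set()
            for rel1, agent, surface in self.triples:
                if rel1 != "walks_on":
                    continue
                for rel2, agent2, prop in self.triples:
                    if rel2 == "can_only_walk_on" and agent2 == agent:
                        derived.add(("has_property", surface, prop))
            new_links = derived - self.triples
            if not new_links:
                return
            self.triples |= new_links

    def meaning(self, node):
        """Here a node's 'meaning' is simply the sorted list of links that touch it."""
        return sorted(t for t in self.triples if node in (t[1], t[2]))


net = SemanticNetwork()
net.add("walks_on", "people", "floor")            # entered by hand
net.add("can_only_walk_on", "people", "solid")    # entered by hand
net.forward_chain()                               # derives has_property(floor, solid)
print(net.meaning("floor"))
# [('has_property', 'floor', 'solid'), ('walks_on', 'people', 'floor')]
```

In a larger network the same loop would keep firing as derived links enable further derivations, which is how the meaning of one node comes to depend on the whole.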

What this means is that, in a semantic network formed by iterative reasoning rather than by expert rule creation, each element of knowledge (each node) actually represents the result of a holistic dynamic. It has meaning in itself -- a link to our socially constructed concept "floor" -- but internally its meaning is its relation to other things, each of which is only defined by the other things it is related to, etc.; so that the meaning of the part is only truly describable in terms of the whole.

On the other hand, suppose one has a neural network in which memories are represented as attractors (a Hopfield Net, or Attractor Neural Network, in the lingo). Then the meaning of a link between two nodes in this network mainly consists of the attractors that its presence triggers. Yet there's also a clear local interpretation: if the weight of the link is large and positive, then the two nodes it connects exist together in a lot of attractors, i.e., they're contextually similar. If the weight of the link is large and negative, this means that the two nodes rarely co-exist in an attractor -- they're contextually opposite. Whether the nodes have immediate symbolic meaning or not depends on the application -- in typical attractor neural network applications, they do, each one being a perceptible part of some useful attractor.
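The local reading of link weights can be checked with the standard Hebbian (outer-product) storage rule for a Hopfield network, sketched below in Python. This is the textbook rule, assumed here for illustration rather than taken from the WAE; units take values +1 or -1, and a weight comes out positive when two units usually agree across the stored patterns and negative when they usually disagree.

```python
def hebbian_weights(patterns):
    """Outer-product (Hebbian) weights for a Hopfield-style network.

    patterns: list of equal-length lists with entries +1 or -1.
    w[i][j] averages p[i] * p[j] over the stored patterns, so it is positive
    when units i and j usually agree and negative when they usually disagree.
    """
    n, m = len(patterns[0]), len(patterns)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # no self-connections
                w[i][j] = sum(p[i] * p[j] for p in patterns) / m
    return w


def recall(w, state, steps=10):
    """Synchronous threshold updates; a stored pattern acts as an attractor."""
    s = list(state)
    for _ in range(steps):
        s = [1 if sum(w[i][j] * s[j] for j in range(len(s))) >= 0 else -1
             for i in range(len(s))]
    return s


a = [1, 1, 1, 1, -1, -1, -1, -1]
b = [1, 1, -1, -1, 1, 1, -1, -1]
w = hebbian_weights([a, b])
print(w[0][1], w[0][7], w[0][2])   # 1.0 -1.0 0.0: similar, opposite, unrelated

noisy = [-1] + a[1:]               # pattern a with its first unit flipped
print(recall(w, noisy) == a)       # True
```

The same weight matrix carries both readings: locally, each weight summarizes co-occurrence; globally, the weights together define the attractors that recall falls into.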

The point is, in both classic symbolic and classic connectionist knowledge representation systems, one has a mix of locally and holistically defined meaning. The mix may be different in different knowledge representation systems, but there is no rigid division between the two. This fact is important in understanding the WAE, which intermixes "symbolic" style and "connectionist" style knowledge representations freely.

Of course, there are extremes of symbolic AI and extremes of connectionism. There are logic based AI systems that don't have nearly the holistic-meaning aspect of a reasoning-updated semantic network as I've described above. And, there are connectionist learning systems -- e.g. backpropagation neural nets -- in which the semantics of links are way less transparent than in the attractor neural net example I've given above. But this is also an interesting point. I believe that, of all the techniques in symbolic AI, the ones that are most valuable are the ones that verge most closely on global, holistic knowledge representation; and of all the techniques in connectionist AI, the ones that are most valuable are the ones that verge most closely on localized knowledge representation. This is because real intelligence only comes about when the two kinds of knowledge representation intersect, interact and build on each other.

I'm certainly not alone in coming to the conclusion that the middle way is where it's at. For instance, Gerald Edelman, a Nobel Prize-winning biologist, proposed a theory of "neuronal group selection" or Neural Darwinism, which describes how the brain constructs larger-scale networks called "maps" out of neural modules, and selects between these maps in an evolutionary manner, in order to find maps of optimum performance. And Marvin Minsky, the champion of rule-based AI, has moved in an oddly similar direction, proposing a "Society of Mind" theory in which mind is viewed as a kind of society of actors or processes that send messages to each other and form alliances into temporary working groups.

Minsky's and Edelman's ideas differ on many details. Edelman thinks rule-based AI is claptrap of the worst possible kind. Minsky still upholds the rule-based paradigm, though he now admits that it may sometimes be productive to model the individual "actors" or "processes" of the mind using neural nets; he does not believe the self-organization of the population of mind actors as a whole is important. But even so, the Society of Mind theory and the Neural Darwinism approach are both indicative of a shift toward a new view of the structure of intelligence, one which I believe is fundamentally correct.

What Minsky and Edelman share is a focus on the intermediate level of process dynamics. They are both looking above neurons and below rigid rational rules, and trying to find the essence of mind in the interactions of large numbers of middle-level psychological processes. I believe this is the correct perspective, in large part because I think it is how the human mind works. Of course, it is difficult to open up the human brain and test one's hypotheses. But brain scientists are making enormous strides, and there will undoubtedly be exciting new findings from them, which I expect to resonate nicely with developments in artificial intelligence. In computer science, the way to prove a theory is to translate it into computer code and get it running on available computer hardware, which is what we're doing with the WAE.

5. The WAE as a Synthetic Solution

In working out the initial WAE design, I was adamant about avoiding the oversimplifications of both the neural net and rule-based models. The neural network approach portrayed the brain as unrealistically unstructured and implausibly dependent on self-organization and learning, with little or no large-scale order. The rule-based approach portrayed the mind as unrealistically orderly, as implausibly dependent upon logical reasoning, with little or no chaotic, deeply trial-and-error-based self-organization. This approach totally misses the point about logic and rules: no particular system of rules is all that important to mind; the crucial thing is the ability to conceive new systems of rules to match new kinds of situations.

In order to understand how to go beyond these simplifications, I looked deep into the philosophy of mind, as will be described in the following chapter. From this fundamental perspective, I arrived at a conceptual design for a mind OS. Mind, I concluded, is an amazingly simple thing. Mind is a network of actors that act on each other, send messages to each other, and transform each other. Many of these actors are concerned with recognizing patterns in each other, and with achieving system goals in this way. The freeness and looseness of actor intercreation must be preserved in order that adaptivity and creativity may flourish, but there must nevertheless be actors specialized to carry out various tasks; otherwise, given finite computational resources, the mind will end up being equally stupid in all areas. The emergent dynamics of the whole actor system is essential to its intelligence, and this becomes yet more complex when one introduces specialized actor types.
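As a rough illustration of the "mind OS" idea, the Python sketch below implements a toy network of actors that send messages to each other, with one actor that recognizes a recurring pattern in the message traffic and creates a new actor to stand for it. All class names are hypothetical, and the sketch leaves out everything that makes the real design hard (resource limits, many specialized actor types, emergent dynamics); it shows only the bare message-passing and actor-creation structure.

```python
from collections import deque


class Actor:
    """A minimal actor: a name, an inbox, and a react() hook."""

    def __init__(self, name):
        self.name = name
        self.inbox = deque()

    def react(self, message, mind):
        """Subclasses may send messages or create new actors here."""


class Relay(Actor):
    """Forwards every message it receives to a fixed target actor."""

    def __init__(self, name, target):
        super().__init__(name)
        self.target = target

    def react(self, message, mind):
        mind.send(self.target, message)


class PatternSpotter(Actor):
    """Counts message contents; when one recurs, creates an actor standing for it."""

    def __init__(self, name, threshold=2):
        super().__init__(name)
        self.threshold = threshold
        self.counts = {}

    def react(self, message, mind):
        self.counts[message] = self.counts.get(message, 0) + 1
        if self.counts[message] == self.threshold:
            mind.add(Actor("pattern:" + message))


class Mind:
    """Holds the actors and delivers messages between them in discrete steps."""

    def __init__(self):
        self.actors = {}

    def add(self, actor):
        self.actors[actor.name] = actor

    def send(self, target, message):
        if target in self.actors:
            self.actors[target].inbox.append(message)

    def step(self):
        # Collect everything queued so far; messages sent while reacting
        # are delivered on the next step.
        batch = []
        for actor in list(self.actors.values()):
            while actor.inbox:
                batch.append((actor, actor.inbox.popleft()))
        for actor, message in batch:
            actor.react(message, self)


mind = Mind()
mind.add(PatternSpotter("spotter"))
mind.add(Relay("relay", "spotter"))
for word in ["floor", "wall", "floor"]:
    mind.send("relay", word)
mind.step()   # relay forwards the three messages to spotter
mind.step()   # spotter sees "floor" twice and creates a new actor
print(sorted(mind.actors))   # ['pattern:floor', 'relay', 'spotter']
```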

How does this "self-organizing network of actors" view fit into the history of AI, and into the traditional dichotomy between rules and neural nets? The key here, I believed, was to focus on the intermediate level of brain/mind organization: larger than the neuron, smaller than the abstract logical rules. In brain terms, I was most impressed by research on neural modules: clusters of tens or hundreds of thousands of neurons, each performing individual functions in an integrated way. One module might detect edges of forms in the visual field, another might contribute to the conjugation of verbs. The network of neural modules is a network of primitive mental processes rather than a network of non-psychological, low-level cells (neurons). The key lies in the way the modules are connected to each other, and the way they process information collectively. Neural modules, it seemed, were a good approximation to my "mind-actors." They lived at a level that was low enough to embody nonlinear self-organizing dynamics, yet high enough to be mindlike rather than physical-world-like, and to have clear semantic meaning.

Of course, the brain is highly complex, and this conceptual perspective on brain structure comes nowhere near to capturing the brain's full diversity. For example, the cortical columns that are the basis for many ideas of neural groupings of this size are only found in primary cortices, not secondary or tertiary regions. This suggests a different structure for aggregates in the pure input/output (sensorimotor) domains from natural aggregates in the interactive, highly integrative parts of the cortices. But in spite of this very real diversity, it seems clear that the modular-grouping idea is at least a guide to understanding brain structure, worthwhile to consider as a high-level inspiration for AI design.

Consequently, I initially designed the WAE with this middle level of the brain's organization in mind. The AI Engine is not a neural net in the strict sense, but it embodies many aspects of neural networks. Compared to typical neural net AI programs, it's a level further removed from neurological detail. The basic unit of a neural network is the "formal neuron," a computer model of a biological neuron. The basic unit of the WAE, on the other hand, is a Java object called a node, which I think of as corresponding to a cluster of 10,000 to 100,000 neurons. These nodes interact with each other in many of the same ways that the neurons in a neural net program do.

Each of the WAE's nodes has rules programmed into it, which limit and shape what it does, depending on its node type. We haven't waited for each node in the WAE to figure out everything on its own, we've given each of them instructions based on our theories of how the mind operates. So the WAE includes both the logic of rule-based AI and the self-organizing capabilities of neural net AI.
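The combination described above, neural-net-like interaction plus built-in rules that depend on node type, can be sketched as spreading activation in which each node type supplies its own update rule. The node types, decay values and update formula below are hypothetical illustrations chosen for clarity, not the WAE's actual node dynamics (the WAE's nodes are Java objects; Python is used here only for brevity).

```python
from dataclasses import dataclass, field

# Hypothetical per-type rule: the fraction of its own activation a node keeps
# from one step to the next. Names and values are illustrative only.
DECAY_BY_TYPE = {"concept": 0.5, "word": 0.2}


@dataclass
class Node:
    name: str
    kind: str                                   # node type, selects the rule
    activation: float = 0.0
    links: dict = field(default_factory=dict)   # target name -> link weight


def spread(nodes):
    """One synchronous step of spreading activation with type-specific decay."""
    incoming = {name: 0.0 for name in nodes}
    for node in nodes.values():
        for target, weight in node.links.items():
            incoming[target] += weight * node.activation
    for name, node in nodes.items():
        kept = DECAY_BY_TYPE[node.kind] * node.activation
        node.activation = max(0.0, min(1.0, kept + incoming[name]))


nodes = {
    "floor": Node("floor", "concept", 1.0, {"solid": 0.8, "floor-word": 0.5}),
    "solid": Node("solid", "concept"),
    "floor-word": Node("floor-word", "word"),
}
spread(nodes)
print({name: n.activation for name, n in nodes.items()})
# {'floor': 0.5, 'solid': 0.8, 'floor-word': 0.5}
```

Swapping in a different rule per node type changes behavior without changing the shared interaction mechanism, which is the division of labor the paragraph describes.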

And interestingly, although the initial AI Engine design was based on an attempt to mimic intermediate-level brain structure, during the course of our work we've found ourselves forced to imitate high-level brain structure a little more closely. I'm talking about the fact that, in moving from the AI Engine 0.5 release to the 1.0 release, we have introduced a great deal of functional specialization among our artificial brain's components. Just as the brain has specialized regions devoted to vision processing, language processing, temporal perception, and so forth, so does the current AI Engine design embody specialized subsystems for language processing, time processing, attention allocation, and various other tasks. At the same time, we have veered even further away from simulating the neuron-level of the brain, modifying the dynamics of the nodes and links of the AI Engine to look progressively less and less like straightforward neural net equations, based on the results of experimentation.

This, I believe, is also how the brain/mind works. The brain is more than a network of neurons connected according to simple patterns, and the mind is more than an assemblage of clever algorithms or logical transformation rules. Artificial intelligence research has a history of attempts to ignore this fact, for the sake of getting interesting semi-intelligent behavior out of insufficiently powerful computers. A lot has been learned through the process, but what has not been learned is how to build a true thinking machine. Rather, we have mostly learned what intelligence is not. Intelligence is not following prescribed deductive or heuristic rules, like Deep Blue or EURISKO; intelligence is not the adaptation of synapses in response to environmental feedback, as in Hebbian or backpropagation neural nets. Intelligence involves these things, but what intelligence really is, in my view, is something different: the self-organization and mutual intercreation of a network of processes, embodying perception, action, memory and reasoning in a unified way, and guiding an autonomous system in its interactions with a rich, flexible environment. That is what the WAE is all about.

In conclusion, I'll stress once again the key role that diversity plays in the mind. The essence of mind is very simple, but this essence, a self-organizing "mind OS" of intertransforming actors, is only half the story; the other half of mind is the specialized modules and learning methods that run on the OS. Because of this, one of the keys to building the WAE has been creating a community of AI experts with different points of view, and expertise on different parts of the mind, who are all reasonably willing to listen to each other. Each part of the mind is different, and requires a different way of thinking, and it's very easy to assume that the way of thinking that comes naturally for the part of the mind you've been working on is the only correct way. In developing the WAE, we've had my initial design, which was firmly based on ideas of self-organization and emergence. We've had Jeff Pressing's approach to mind based on cognitive psychology and his approach to numerical data understanding based on nonlinear dynamics. We've had Pei Wang's approach to reasoning, based on his own peculiar Peirce-inspired version of probabilistic logic. We've integrated evolutionary programming, inspired by but not entirely based on John Koza's work in this area, and statistical learning as developed by our Kiwi team, descended from the thinking of John Cleary, John Andreae, and others. And this is just a partial list. The WAE framework gives general data structures and dynamics in which a variety of different approaches to AI can coexist and interact and learn from each other.





3

Mind as a Web of Pattern

Ben Goertzel

(plus a paragraph here and there by Jeff Pressing)


1. Introduction
I've explained above how I think the field of AI came to be preoccupied with things other than actually creating artificial intelligence. Some people have a different diagnosis: They don't think AI is possible at all. They think that a digital computer program can never really be intelligent, because it lacks some magical chemical, physical or spiritual ingredient that humans possess. I meet fewer people with this attitude now than I did fifteen years ago, but they still exist. In fact, some folks with this attitude have been productive Webmind Inc. employees.
The fact that there is no consensus among educated, intelligent people as to whether computational intelligence is possible indicates that even in a book focused on a practical AI system, some attention to philosophical foundations is worthwhile. In this chapter I'll give this aspect of AI its due, reviewing various philosophical issues associated with computational intelligence, and explaining how they've been answered in the context of the WAE.
Of course, I don't expect to convince every reader that I have all the answers regarding questions such as the nature of intelligence, mind and consciousness. But at least, I believe I can show that there is a consistent and sensible conceptual framework underlying the AI Engine, which includes plausible answers to all these questions as well as to more practical questions of AI design.

2. Is AI Possible?
When I first started thinking about AI, it wasn't obvious to me that it was possible. But I convinced myself with two arguments, one scientific, one philosophical.
The scientific argument went like this: Admittedly, the human brain is the only definitive example of intelligence that we know, and it doesn't look like it's executing algorithms: it is a largely incomprehensible mass of self-organizing electrochemical processes. However, assuming that these electrochemical processes obey the laws of quantum physics, they can be explained in terms of a system of differential equations derived from quantum theory. And any such system of differential equations may be approximated, to within any desired degree of accuracy, by a computable function. Therefore, anyone who claims that the human mind cannot be understood in terms of computation is either 1) denying that the laws of quantum physics, or any similar mathematical laws, apply to the brain; or 2) denying that any degree of understanding of the brain will yield an understanding of the human mind. Neither of these alternatives seemed reasonable to me (and neither of them seems reasonable now). (There's a little more to it than this: one can consider potential quantum gravity phenomena, but that would take us too far afield.)
And the philosophical argument for the possibility of AI went like this: One can divide the universe into two categories: the describable and the indescribable. According to the Church-Turing Thesis, everything that is describable is computable. And everything that is indescribable is algorithmically random, and indistinguishable from the pseudorandom by any finite entity. So, the universe consists, at most, of computation plus randomness. Intelligence is part of the universe. QED.
So, suppose you accept these arguments, or others, and you believe that AI is possible. A couple of other questions then arise.
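The step from "a system of differential equations" to "a computable function" can be illustrated on a toy scale. The Python sketch below uses explicit Euler integration, the simplest numerical method, on the equation dy/dt = -y, doubling the number of steps until the answer is within a requested tolerance of the exact solution. It is only an illustration of the approximation principle, assumed here for concreteness; it says nothing about how one would actually model brain dynamics, and for equations without a known closed-form solution the error has to be bounded analytically rather than by direct comparison.

```python
import math


def euler(f, y0, t_end, n_steps):
    """Approximate y(t_end) for dy/dt = f(t, y), y(0) = y0, with explicit Euler steps."""
    h = t_end / n_steps
    t, y = 0.0, y0
    for _ in range(n_steps):
        y += h * f(t, y)
        t += h
    return y


def steps_needed(f, y0, t_end, exact, tol):
    """Double the step count until the Euler estimate is within tol of the exact value."""
    n = 1
    while abs(euler(f, y0, t_end, n) - exact) > tol:
        n *= 2
    return n


def decay(t, y):
    return -y   # dy/dt = -y, whose exact solution is y(t) = exp(-t)


exact = math.exp(-1.0)
for tol in (1e-1, 1e-2, 1e-3, 1e-4):
    n = steps_needed(decay, 1.0, 1.0, exact, tol)
    print(f"tolerance {tol:g}: {n} steps, estimate {euler(decay, 1.0, 1.0, n):.6f}")
```

Each tenfold tightening of the tolerance costs roughly ten times as many steps here, which is the sense in which "any desired degree of accuracy" is reachable: more computation buys more precision.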
First, how do you do it? How do you create AI?
Second, what does AI mean exactly? What is the criterion by which we can judge whether a digital system is intelligent?
Of course, these two questions interrelate. In the next section of this chapter I'll explore the second question: What is intelligence? This question naturally leads into a few more philosophical questions: What is mind? What is consciousness? Here I'll give fairly high-level conceptual answers to these questions, referring the reader who wants mathematical or philosophical details to my past publications on these topics. The goal is to give just enough conceptual background to provide a meaningful philosophical context for the discussion of mind design in the chapters to follow.

3. What is Intelligence?

Intelligence doesn't mean precisely simulating human intelligence. The WAE doesn't do that, and it would be unreasonable to expect it to, given that it lacks a human body. The Turing Test, "write a computer program that can simulate a human in a text-based conversational interchange," serves to make the theoretical point that intelligence is defined by behavior rather than by mystical qualities, so that if a program could act like a human, it should be considered as intelligent as a human. But it is not useful as a guide for practical AI development.

I'm not going to propose a specific IQ test for the WAE or other computer programs. This might be an interesting task, but it can't even be approached until there are a lot of intelligent computer programs of the same type. IQ tests work fairly well within a single culture, and much worse across cultures - how much worse will they work across species, or across different types of computer programs, which may well be as different as different species of animals? What is needed right now is something much more basic than an IQ test: a working, practical understanding of the nature of intelligence, which can be used as an intuitive guide for work on the development of intelligent machines.

My own working definition of intelligence builds on various ideas from psychology and engineering, as documented in (Goertzel, 1993, 2000). I believe that intelligence is best understood as follows:

Intelligence is the ability to achieve complex goals in a complex environment

The greater the total complexity of the set of goals that the organism can achieve, the more intelligent it is. (Of course, there are mathematical issues in how one takes this sum total, but I won't delve into these here. There is also the question of how to quantify the notion of complexity, which I'll return to briefly below.)
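Read in the most literal way, the definition suggests a score like the one below. This is only one naive formalization, assumed here for illustration; it deliberately glosses over the two issues just mentioned, namely how the sum should be taken and how complexity should be measured. Here G(S, E) stands for the set of goals that system S can achieve in environment E, and C(g) for some measure of the complexity of goal g.

```latex
I(S, E) \;=\; \sum_{g \in G(S, E)} C(g)
```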

Note that this definition of intelligence is purely behavioral: it doesn't specify any particular experiences or structures or processes as characteristic of intelligent systems. I think this is as it should be. Intelligence is something systems display; how they achieve it under the hood is another story. It may well be that certain structures and processes and experiences are necessary aspects of any sufficiently intelligent system.
My guess is that the science of 2050 will contain laws of the form: Any sufficiently intelligent system has got to have this list of structures and has got to manifest this list of processes. But this is another point, not necessary for understanding how to design an intelligent system.

In conclusion, then: When I say that the WAE is an intelligent system, what I mean is that it is capable of achieving a variety of complex goals in the complex environment that is the Internet. To go beyond this fairly abstract statement, one has to specify something about what kinds of goals and environments one is interested in. In the case of biological intelligence, the key goals are survival of the organism and its DNA (the latter represented by the organism's offspring and its relatives). In the WAE's case, the goals that WAE 1.0 is expected to achieve are:

1. Predicting economic, financial, political and consumer data based on diverse numerical data and on concepts expressed in news

2. Conversing with humans in simple English, with the goal not of simulating human conversation, but of expressing its insights and inferences to humans, and gathering information and ideas from them

3. Learning the preferences of humans and AI systems, and providing them with information in accordance with their preferences. Clarifying their preferences by asking them questions about them and responding to their answers.

4. Communicating with other WAEs, similar to its conversations with humans, but using a WAE-only language called Psynese

5. Composing knowledge files containing its insights, inferences and discoveries, expressed in XML or in simple English

6. Reporting on its own state, and modifying its parameters based on its self-analysis to optimize its achievement of its other goals

This is what a WAE instance needs to do in order to survive, in order to keep humans and other WAE instances happy with it so that it stays alive. It is by no means all the WAE will ever be able to do, but it's a start. Subsequent versions of the WAE are expected to offer enhanced conversational fluency, and enhanced abilities at knowledge creation, including theorem proving and scientific discovery and the composition of knowledge files consisting of complex discourses.

Are these goals complex enough that the WAE should be called intelligent? Ultimately this is a subjective decision. My belief is, yes. This is not a chess program or a medical diagnosis program, which is capable in one narrow area and ignorant of the world at large. This is a program that studies itself and interacts with others, that ingests information from the world around it and thinks about this information, coming to its own conclusions and guiding its internal and external actions accordingly.

Whether the WAE is smarter than or stupider than humans is not a very interesting question. My own sense is that the first version will be significantly stupider than humans overall though smarter in many particular domains; but that within a couple of years there may be a version that is competitive with humans in terms of overall intelligence; and within 10 years there will probably be a version dramatically smarter than humans overall, with a much more refined design running on much more powerful hardware. But it's not clear to me how relevant my own subjective judgment is, in assessing the intelligence of another type of being. I'm content to make it as smart as possible.

4. What is Mind?

If intelligence is the achieving of complex goals in complex environments, then what is "mind"?

Of course, many philosophers have addressed this question. My favorite ideas in this domain come from three very different thinkers: Charles S. Peirce, Buddha, and Friedrich Nietzsche. I've also been much inspired by the emerging discipline of complexity science, and its vision of the mind as a complex, adaptive self-organizing system.

One of my favorite passages in the history of philosophy is where Peirce says that: