Science, Probability and Human Nature:

A Sociological/Computational/Probabilist Philosophy of Science

Ben Goertzel

January 18, 2004

Introduction

In this essay I present a somewhat novel approach to the philosophy of science, to which I give the awkward name “sociological/computational probabilism” (SCP)[1].

My main goal in formulating this approach has been to develop a philosophical perspective that does justice to both the relativism and sociological embeddedness of science, and the objectivity and rationality of science. I have also been motivated by an interest in clarifying the relationship between probability theory and the validation of scientific theories.

The SCP philosophy has its roots in several different places, most notably:

contemporary mathematical learning theory, particularly probability theory and the theory of algorithmic information
Imre Lakatos’ and Paul Feyerabend’s conflicting but related philosophies of science

It draws ideas from these sources, but adheres precisely to none of them.

Two main issues are addressed here (though the SCP framework may be used to address other issues as well). First: What is a workable and justifiable way to compare two scientific theories, or two scientific approaches? And secondly and more briefly: what modes of thinking, what types of cognitive dynamics, should occur and be reinforced inside the mind of the scientist, in order for science to progress effectively?

Finally, I would like to emphasize the relative modesty of my objectives here. Feyerabend (1975) has argued that the only universal rule characterizing high-quality scientific progress is “anything goes.” Any other rule that anyone proposes, he suggests, is going to have some exception somewhere. I think this is quite true. On the other hand, this kind of slipperiness is not unique to the philosophy of science; it also holds true in other sciences such as biology and linguistics. The goal in these “semi-exact” sciences is to find principles that are – to borrow a phrase from computer science -- “probably approximately correct.” That is, one wants conclusions that come very close to being true in nearly all cases – but one accepts that various sorts of exceptions will exist. My ambition here is to outline a philosophy of science that is simple and clear, and that provides a probably approximately correct model of the human endeavor we call science.

What Is This Thing Called Science?

What is this thing called science? Feyerabend correctly points out that science is a sociological phenomenon – one among many types of activity carried out by human beings. My approach to defining science takes this rather obvious observation as an explicit starting-point. This is different from many approaches to defining science that are found in the philosophy-of-science literature, which are more abstract and impersonal in nature.

Begin with the assumption that there is a community of human beings, each of whom has their own notion of reality. I then posit the existence of something called the Master Observation Set (MOS): a massive compendium of all scientific datasets to be considered in the formulation of scientific theories by individuals in this community. Currently the MOS is a hypothetical entity, but the increasing habit in the scientific community of posting observational data on the Internet is rapidly making it a palpable reality.

The nature of the data in the MOS will be explored a little later, in the course of our discussion of computational probabilism, where we will think of the MOS as consisting of datasets which contain a combination of “raw data” and “metadata.” But let us leave this issue for later.

The MOS may be considered as a “fuzzy set,” where the degree of membership of a dataset in the MOS is defined as the average, over all people in the community, of the extent to which each person believes that the dataset is accurate. The key point is that each of these humans in the community basically agrees that the data records and metadata in the Master Observation Set reflect reasonably accurate observations of their own realities. There may be disputes over the accuracy of particular observations or datasets, but so long as these don’t involve the bulk of observations in the MOS, things are still okay.
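The fuzzy-membership definition can be made concrete with a minimal sketch. The representation of beliefs as numbers in [0, 1], the function name mos_membership, and all the example values below are assumptions introduced purely for illustration; the essay itself does not fix any of them.

```python
from statistics import mean


def mos_membership(beliefs):
    """Degree of membership of one dataset in the MOS.

    `beliefs` maps each community member to that person's degree of belief,
    in [0, 1], that the dataset is accurate (a hypothetical encoding).
    """
    if not beliefs:
        raise ValueError("the community must contain at least one person")
    for person, belief in beliefs.items():
        if not 0.0 <= belief <= 1.0:
            raise ValueError(f"belief of {person!r} must lie in [0, 1]")
    # Membership is the plain average of individual beliefs.
    return mean(beliefs.values())


# Made-up numbers: a widely trusted dataset and a contested one.
yeast_counts = {"alice": 0.9, "bob": 0.8, "carol": 1.0}
meditation_reports = {"alice": 0.1, "bob": 0.0, "carol": 0.8}

trusted = mos_membership(yeast_counts)
contested = mos_membership(meditation_reports)
```

Restricting the dictionary to a sub-community (for instance, only "carol" in the second example) changes the membership degree, which is one way to picture how a dataset can belong to the MOS of one community and not another.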

Given this background, I can now define what I mean by a “scientific theory.” Namely, a scientific theory is a set of procedures with inputs and outputs specified as follows:

The inputs are drawn from the set of subsets of the MOS
The output is a prediction about some subset of the MOS

The relation between the output subset B and the input subset A may be defined by some proposition. For instance, A could be data regarding the state of a population of yeast cells in a Petri dish at a point in time T and a temperature of 30 degrees Celsius; B could be data regarding the state of the same population of yeast cells one minute later, assuming the temperature in the meantime has been gradually decreased to 0 degrees Celsius.

Different scientific theories provide procedures dealing with different sorts of subsets A and predicting things about different sorts of subsets B. A scientific “Theory of Everything” would try to make the best possible prediction about any subset B based on any subset A. No single, coherent scientific Theory of Everything currently exists; but the body of modern science as a whole may be considered as an attempt at such.

In less technical terms, what I am saying is that, to be scientific, ultimately a theory must boil down to a set of prescriptions for predicting some almost-universally-accepted-as-real data from some other almost-universally-accepted-as-real data. Most of the time this data is quantitative, but it certainly doesn’t have to be. It just happens that, among humans, quantitative data seems to have an easier time fitting into the category of “almost universally accepted as real.” Data represented in terms of words tends to be slipperier and less likely to achieve the standard of near-universal perceived validity.

As already emphasized, I have intentionally defined science in terms of a social community, rather than in terms of an objective reality. This is in line with both Feyerabend and Lakatos’s emphasis on science as a social activity. The problem with defining science in terms of an objective reality is obvious: the definition of objective reality is deeply bound up with particular scientific theories. Do we mean classical objective reality, quantum objective reality or something else? Whatever else science is, we know it is a social activity, and can define it as such without being preferential to any particular scientific theory about the nature of the universe, or any particular research programme.

The main topic of this essay is how to judge the quality of a scientific theory or a scientific approach. However, in pursuing this topic we will also implicitly touch a related subject, which is how to judge the quality of approaches to understanding the universe in general – be they scientific or not. Feyerabend criticized Lakatos’s philosophy of science for not explaining why science is superior to witchcraft or theology. This seemed problematic to Feyerabend not because he believed in the superiority of science – in fact he did not – but because Lakatos did. Lakatos would have liked his philosophy of science to explain the superiority of rationality to other modes of social and individual cognition. On this topic we conclude that, in a sense, what Lakatos desired is partially achievable. One can apply some of the same approaches used to compare scientific theories, to compare nonscientific approaches. Furthermore, the sociological definition of science allows us to consider the sense in which some theories traditionally viewed as nonscientific may be considered as scientific in some important senses after all.

An interesting example of a theory that is not traditionally considered scientific is the Buddhist philosophy of consciousness (Stcherbatsky, 1958). This philosophy is extremely refined, delineating numerous types of consciousness and their interrelationship; it has much of the rigor and detail that is typically associated with science. Arguably, it has more rigor and detail than some social sciences do today, and more than biology did before the molecular-biology revolution. However, it fails the above definition of science, unless – and this is an important unless – one restricts the community of humans involved to those who have achieved a certain level of “spiritual advancement” in the Buddhist religion. It makes many statements about phenomena that are not in the overall human Master Observation Set, but are rather specifically thought to be observable only by spiritually advanced human beings. This is a theory that is scientific with respect to the community of the Buddhistically advanced, who can test and check its statements, and unscientific with respect to the rest of us.

Now, one may argue that this isn’t so different from a theory making statements about phenomena regarding underground caves on the dark side of the moon, where most humans will never go. Are theories about caves on the dark side of the moon scientific with respect to humanity as a whole, or only with respect to those humans who actually visit the dark side of the moon and crawl through the caves and see what’s there? Here things get interesting, and subtle. The key question is whether the nonmoonwalking humans back on Earth agree to admit the data on the interior of the caves on the dark side of the moon into the Master Observation Set. If they don’t agree to admit the data – perhaps they believe there’s a government conspiracy to fake this data to hide the aliens who live in the caves up there – then theories about this data don’t qualify as scientific with regard to the community of humans, but only with regard to the community of moonwalking humans. So the difference between this case and the Buddhist psychology case is, basically, that nearly all humans are willing to accept the data collected by astronauts observing the caves on the dark side of the moon; whereas most humans are not willing to accept the data collected by Buddhist monks in the course of their meditations.

So Buddhist psychology is scientific, but only with respect to a limited community. This means the ideas developed here may all be applied to it, so long as one restricts all consideration to the relevant community. But as we will discover, the actual situation is a little stronger than that. Some of the approaches we will discuss for evaluating scientific theories and approaches may also be used to evaluate things like Buddhist psychology and witchcraft, even in the context of communities where these things fail the definition of science given here.

A Naïve Computational/Probabilist Approach

Having defined what science is, we may now approach the question of defining what makes one scientific theory better than another. I’ll start by presenting a point of view that I call “naïve computational probabilism.” This is a philosophy of science, motivated by recent research in algorithmic information theory, which is mathematically sophisticated and elegant but – for reasons to be explored in the following section -- historically and psychologically naïve.

The basic idea is that a scientific theory should be judged on three criteria:

What is its scope of applicability?
How simple is it?
How accurate are its predictions, on average?

Scope, simplicity, predictivity. The ideal is an extremely simple theory with extremely accurate predictions and broad scope. In practice one must strike a balance, sacrificing some simplicity and scope in order to get more explanatory value.

The accuracy of predictions of a scientific theory T may be formalized in many different ways. For instance, one may begin by defining a mathematical metric (distance measure) on the Master Observation Set, and then proceed by averaging -- over all pairs (A,B) where A and B are subsets of the MOS that lie in the scope of the theory – the distance between T’s predictions of B based on A and the actual state of B.
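As one possible reading of this averaging procedure, the sketch below computes the mean distance between a theory's predictions and the observed outcomes over a list of (A, B) pairs. The Euclidean metric, the encoding of observation subsets as tuples of numbers, and the toy "doubling" theory are all assumptions chosen for illustration; any metric on the MOS could be substituted.

```python
from statistics import mean


def euclidean(u, v):
    """Distance between two equal-length tuples of measurements."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def mean_prediction_error(theory, pairs, distance=euclidean):
    """Average distance between theory(A) and the observed B.

    `pairs` lists the (A, B) pairs that lie in the scope of the theory.
    """
    if not pairs:
        raise ValueError("the theory's scope contains no (A, B) pairs")
    return mean(distance(theory(a), b) for a, b in pairs)


def doubling_theory(a):
    """Toy theory: every quantity in B is twice the matching quantity in A."""
    return tuple(2.0 * x for x in a)


# Made-up observations: two exact hits and two misses of 0.5 each.
pairs_in_scope = [
    ((1.0, 2.0), (2.0, 4.0)),
    ((3.0,), (6.0,)),
    ((1.0,), (2.5,)),
    ((4.0,), (7.5,)),
]
average_error = mean_prediction_error(doubling_theory, pairs_in_scope)
```

A lower value means more accurate predictions on average; comparing two theories this way only makes sense over pairs that lie in the scope of both.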

The simplicity of a theory T may be formalized by assuming that all theories are expressed in some particular formal language, designed for interpretation by some abstract machine M. One may then look at the length of the statement of the theory T and the amount of time T takes to make predictions about subsets of the MOS that lie in its scope. These two criteria of space and time complexity may then be averaged in various ways.
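A rough sketch of the two complexity measures follows. It assumes, hypothetically, that a theory is given as a written statement plus an executable prediction procedure, that length is counted in bytes, that time is measured as wall-clock seconds per prediction, and that the two are combined by a weighted sum. The weights are arbitrary placeholders, and the weighted sum is only one of the "various ways" of averaging mentioned above.

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Theory:
    statement: str  # the theory written out in the chosen formal language
    predict: Callable[[Sequence[float]], Sequence[float]]


def space_cost(theory):
    """Length of the theory's statement, in bytes."""
    return len(theory.statement.encode("utf-8"))


def time_cost(theory, inputs, repeats=100):
    """Average wall-clock seconds per prediction over the given inputs."""
    start = time.perf_counter()
    for _ in range(repeats):
        for a in inputs:
            theory.predict(a)
    return (time.perf_counter() - start) / (repeats * len(inputs))


def complexity(theory, inputs, space_weight=1.0, time_weight=1e6):
    """Weighted sum of space and time cost (the weights are illustrative)."""
    return (space_weight * space_cost(theory)
            + time_weight * time_cost(theory, inputs))


doubling = Theory(statement="B = 2 * A",
                  predict=lambda a: [2.0 * x for x in a])
sample_inputs = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
score = complexity(doubling, sample_inputs)
```

Wall-clock timing depends on the machine running the code, so a more careful version would count steps of the abstract machine M instead.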

This approach to assessing the quality of scientific theories is rooted in algorithmic information theory and statistical decision theory. It connects nicely with recent developments in theoretical computer science. For instance, if one combines it with Marcus Hutter’s (2003) mathematical formalization of the notion of intelligence, one arrives at the conclusion that intelligence may be achieved by learning good scientific theories.

The downside of this approach, on the other hand, is that its practical applicability is limited by the relative lack of formality of real scientific theories and real scientific practice. We will review the nature and consequences of this informality in an abstracted form, in the following section.

However, I believe that there is some deep truth at the kernel of this “naïve computational/probabilist approach” nonetheless – and after reviewing the various problems that one confronts in applying these abstract ideas to actual science, we will return to similar themes again, in the context of a neo-Lakatosian analysis of research programmes, and finally in the context of a “pragmatic unified approach” to the philosophy of science.

The Pragmatic Failures of Probabilism

Two of Lakatos’s philosophy-of-science concepts strike me as particularly important and useful. The first is his highly savvy critique of “probabilism” – the perspective that competing scientific theories may be compared against each other by assessing which one is more probable based on the given evidence. The second is his notion of “progression versus regression” as a way of comparing scientific research programmes, even when these research programmes use very different languages and present incommensurable views of reality.

Let us begin with Lakatos’s issues with probability theory as applied to the validation of scientific theories. After brushing aside some fallacious arguments made by earlier philosophers to the effect that science is nonprobabilistic in principle, Lakatos (1999) poses the very cogent objection that, although a probabilist perspective on science may be correct in principle, it is utterly infeasible in practice. The problem is that to properly apply probability theory to a scientific theory, that theory must be completely formalized. And, as Feyerabend has argued in detail, even relatively simple and standard scientific theories like Newtonian mechanics are far from completely formalized when applied in practice. (In fact, even pure mathematics is far from completely formalized when applied in practice; the Mizar[2] project has illustrated this in a fascinating way by carrying out a complete formalization of a reasonably large subset of modern mathematics.) Using Newtonian mechanics in reality requires a substantial phase of “setting up the problem,” in which informal reasoning is used to match aspects of real-world phenomena to variables and equations in the theory. And this informal “setting up the problem” phase is carried out quite differently for Newtonian mechanics than for quantum mechanics, for example – it truly is “part of the theory.”

In principle, it may seem, no such theory-dependent setting-up phase should be necessary. It should be possible to define a set of “empirical observations” in a theory-independent way, and then assess various theories in terms of their predictions on this set of empirical observations. Feyerabend, however, argues that this is not possible because observation is so theory-dependent that different theories may differ on what is being observed in a given real-world situation. This is an excellent point, worth exploring a little – although our conclusion will be that it doesn’t hold up completely.

Within the context of contemporary science, there is fairly good agreement on what constitutes an observation. General relativity, classical mechanics and quantum field theory are “incommensurable” theories in the sense that they provide mutually contradictory conceptual and formal perspectives on the world, and there is no clear and universal way of taking a phenomenon as expressed in the language of one of these theories, and translating it into the language of the others. But even so, it seems the task of defining a theory-independent set of laboratory observations for comparing these theories would not lead to any significant conceptual problems (though it would be a Herculean endeavor, to be sure). But there are subtle conceptual issues here – of which Feyerabend was well aware – which arise when one analyzes the situation carefully.

Let’s suppose that – as envisioned above -- we created a Master Observation Set, consisting of the outputs of all scientific laboratory instruments of all kinds that have ever been used. We could then perform predictive experiments, of the form: Feed a given scientific theory one portion of the Master Observation Set, and see how well it predicts some other related portion. The same predictive experiments could be performed for different scientific theories, thus providing a common basis of comparison. The expected prediction error of each theory could be tabulated, thus making the probabilist foundation of science a reality.

Applied to fundamental physics theories, what would we learn from this exercise? We would find that classical mechanics fails to correctly predict a lot of observations involving microscopic phenomena and some involving macroscopic phenomena (involving gravity, light and electromagnetism, for example). It would also be found mathematically and computationally intractable in very many cases. General relativity would be found to fail on some microscopic phenomena, and to be mathematically and computationally intractable for a huge number of microscopic and macroscopic phenomena. Quantum field theory would be found erroneous for some gravitational phenomena, and would be found mathematically and computationally intractable for anything except microscopic systems or very simple macroscopic ones.

Certainly, some issues of theory-dependence of observation would arise in constructing the Master Observation Set, but it seems to me that they would not be so terribly severe. For example, it’s true that (as Feyerabend notes) general relativists tend to focus on those anomalies in Newtonian mechanics that general relativity can address, ignoring the other anomalies. However, this is theory-dependence of focus; the general relativists would not deny the admission of these other Newtonian anomalies into the Master Observation Set. Similarly, some modern physicists who question the current estimate of the mass of the top quark, suggest that the Fermilab scientists who have produced this estimate have erred in deciding which observations in their laboratory constitute “quark observation events” and which constitute “noise.” They contend that the classification of events versus noise has been done with a specific view towards obtaining a specific value for the mass of the top quark by studying these events. This may perhaps be the case – but even if it is, the problem would be avoided by adding all the raw output of the Fermilab equipment to the Master Observation Set.

Another problem that must be dealt with is the fact that the application of a scientific theory to a scientific dataset within the Master Observation Set is not fully automated. Each dataset comes with some “metadata” telling what the dataset measures. For example, a dataset giving the expression levels of genes observed in a tissue sample from some organism consists of two parts: a table of numbers indicating genes’ names and their quantitative expression levels, and some metadata telling the conditions under which the data was gathered: the type of organism, its age, the temperature of the room, the type of microarrayer used to gather the data, etc. To take a single representative example, in the case of gene expression data there is a standard format called MIAME[3]. Before a dataset can be analyzed according to some scientific theory, some work must be done, using the metadata to cast the data in the language of the scientific theory. Feyerabend argues, in effect, that this metadata-interpretation process is informal and is a critical part of scientific theories. He suggests that because it is informal, this metadata-interpretation may potentially be done differently by different scientists operating within the same theoretical tradition.

It seems to me that the metadata problem is an important one, but is not fatal to the probabilistic program. If a universal metadata language is agreed upon by adherents to various theories, then one can salvage the probabilistic program by one of two strategies:

Define a scientific theory as a kind of activity carried out by its human adherents in order to predict one set of data from another. The notion of probabilistic validation is perfectly useful in this case, just as much as if a theory is considered as purely formal without any human component.
Define a scientific theory to consist of a theory as traditionally conceived, plus a set of formal mappings that translate metadata (expressed in the universal metadata language) into theoretical expressions.

Either one of these expansions of the ordinary definition of scientific theory would seem to work, as a way of salvaging the probabilist program.

The only problem remaining, then, is the main one that Lakatos complained about. He argued that, even if this kind of probabilistic approach to comparing scientific theories is theoretically possible, it is pragmatically absurd.

This is an interesting point to reconsider in the context of modern technology. With more and more scientific datasets placed online for public scrutiny – indeed, many high-quality journals now make this a precondition for publication – my proposed Master Observation Set isn’t nearly so farfetched now as it would have seemed in Lakatos’s day.

But clearly, the formation of the MOS as a database is not the major problem here. The bigger issue is one of combinatorial explosion and computation time. There is no feasible way to survey all possible subsets A and B of the MOS, and ask how well a given theory predicts B from A. The only way to work around this seems to be to define meaningful samplings of the set of all subset-pairs of the MOS. For instance, imitating crossvalidation in statistics, one can divide the MOS into N equally-sized subsets, and then iteratively leave each subset out, seeing how well a theory can predict the left-out subset from the rest. Once one starts doing things like this, a subtle kind of theory-dependence can creep in, because different ways of subdividing the MOS may lead different theories to perform better! However, this is not such a severe problem; and it seems that, in the future, some form of crossvalidation on an online MOS may well be feasible – and may provide a good way of assessing scientific theories.
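The leave-one-subset-out procedure can be sketched as follows. The toy MOS (a list of (x, y) records), the two toy theories, and the use of absolute error are assumptions made for illustration only; a real MOS would hold heterogeneous datasets with metadata rather than number pairs.

```python
from statistics import mean


def cross_validated_error(records, theory, n_folds):
    """Leave each of n_folds equal-sized subsets out in turn.

    `theory(rest, xs)` sees the retained records plus only the inputs `xs`
    of the held-out records, and returns one prediction per input.
    """
    if n_folds < 2 or len(records) % n_folds != 0:
        raise ValueError("records must split into at least two equal folds")
    size = len(records) // n_folds
    fold_errors = []
    for i in range(n_folds):
        held_out = records[i * size:(i + 1) * size]
        rest = records[:i * size] + records[(i + 1) * size:]
        predictions = theory(rest, [x for x, _ in held_out])
        fold_errors.append(
            mean(abs(p - y) for p, (_, y) in zip(predictions, held_out)))
    return mean(fold_errors)


def proportional_theory(train, xs):
    """Toy theory: y is proportional to x, with the slope estimated from data."""
    slope = mean(y / x for x, y in train)
    return [slope * x for x in xs]


def constant_theory(train, xs):
    """Toy theory: y is always the average of the y values seen so far."""
    level = mean(y for _, y in train)
    return [level for _ in xs]


# A made-up miniature MOS in which y = 3x holds exactly.
toy_mos = [(float(x), 3.0 * x) for x in range(1, 7)]
proportional_error = cross_validated_error(toy_mos, proportional_theory, 3)
constant_error = cross_validated_error(toy_mos, constant_theory, 3)
```

The folds here are contiguous slices of the record list. Shuffling or regrouping the records before splitting is exactly the kind of choice through which the theory-dependence described above could enter.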

The statistical issue of “overfitting” also arises here: ideally, one would like theories to make predictions of subsets of the MOS that were not used in the creation of the theory. Lakatos observes that this is a major criterion often used in the assessment of scientific theories. Frequently but not invariably, this takes the form of a theory predicting observations that were not known at the time of the theory’s creation.

Altogether, my conclusion is that a probabilistic approach to comparing scientific theories is not only possible-in-principle – it may become possible-in-practice once existing technologies and data-publication practices advance a little further.

However, the possibility of a probabilistic approach is one issue – and the issue of whether science actually works this way, in practice, even in an approximate sense, is quite another. Lakatos’s claim is that, even though probabilism is possible in principle, it is so computationally intractable that it essentially bears no relationship to the pragmatics of scientific-theory assessment. Unfortunately, I think this is almost the case.

The Prevalence of Biased Probabilistic Inference Heuristics

What are the most important ways in which real-world scientific-theory-validation differs from the idealized view of science as cross-validation testing on the MOS that I have presented above? One big difference, as noted above, is that the translation from metadata to theoretical formulations is nearly always left informal in practice. However, there is another difference that is even more essential. Because of the often large effort involved in applying scientific theories to observations to obtain predictions about other observations, a significant amount of merit tends to be given to “predicted predictions” on the MOS.

That is, suppose a scientific theory is determined – through detailed calculations or computer simulations -- to make prediction P for the values of observation-set B, based on the values of observation-set A. Then, this knowledge will be used to extrapolate inferentially, and guess the predictions P’ made for the values of other observation-sets B’, based on other observation-sets A’. These predictions P’ will then be compared to reality, even though they haven’t been derived in exquisite detail like the predictions P. This inferential extrapolation is typically carried out, not using generic probabilistic inference, but using heuristic inference methods supplied by the theory itself. In fact this is a major part of any decent scientific theory – the tools it provides allowing the scientist to analyze a new situation without taking a detailed-data-analysis approach.

So, in reality, because we can’t do cross-validation or detailed out-of-sample testing on the MOS, we’re always looking at approximations – which wouldn’t be so troubling, if it weren’t for the fact that the approximations are computed using approximation techniques that are part of the theories being tested! This is the sort of reason that science begins to look like a kind of opportunistic anarchism, as in Feyerabend’s philosophy. It’s the reason why Lakatos is basically right about probabilism – until we can really construct the MOS and deploy vast amounts of computational power to assessing competing theories using crossvalidation, we had better use some method besides probability theory to compare scientific theories.

Research Programmes

These observations about the practical limitations of probability theory lead us directly into Lakatos’s theory of “research programmes.” Since each scientific research programme provides its own heuristic short-cuts to true probabilistic validation, and since truly accurate crossvalidation or out-of-sample validation on the MOS is infeasible, some other way of assessing the quality of a research programme is required – and Lakatos actually had some concrete suggestions in this direction. Note the language I’m using here – we’re now talking about assessing the quality of a research programme, not an individual theory. This is essential.

For example, we’re now talking about assessing quantum mechanics, not quantum mechanics’ explanation of the helium atom. Of course, one can often compare two theories of some particular phenomenon – say, competing explanations of the behavior of liquid helium. But this comparison nearly always takes place within a common research programme (quantum electrodynamics, in the case of liquid helium). Sometimes, a cross-research-programme analysis of some particular phenomenon is possible, but this can’t be counted on. Generally, if one has two incommensurable research programmes – two research programmes that “speak different languages” in a fundamental sense, like quantum versus Newtonian mechanics, or Freudian versus behavioral psychology – then one faces major problems in doing specific-example-based comparisons. There are difficulties of formalization and validation such as those mentioned above … and then one also faces problems of the theory-driven selection of illustrative examples.

But how can one assess the quality of a research programme except by assessing the ability of its component theories to make predictions about the MOS? Lakatos suggests a distinction between “progressive” and “regressive” research programmes. In essence, a research programme is progressive if it is creatively and adaptively growing in response to new information.

I like to formalize this as follows: A research programme is progressive if, when confronted with a significant amount of new data added to the MOS, it can generally predict this data either

without modification, or else
with modifications that are relatively simple compared to the complexity of the new data added

A regressive research programme, on the other hand, deals with qualitatively novel datasets by means of modifications that are equal to or greater than the new data itself in complexity, or else by tactically decreasing its scope to avoid the problems encountered.
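The progressive/regressive criterion can be illustrated with a crude sketch. It assumes, purely for illustration, that both the modification to the programme and the new data are available as text, and it uses zlib-compressed length as a stand-in for "complexity". This proxy is my own assumption rather than a measure proposed in the essay, and the example strings are invented.

```python
import zlib


def description_length(text):
    """Compressed size in bytes, used here as a rough complexity proxy."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def classify_response(modification, new_data, scope_reduced=False):
    """Label one response of a research programme to new MOS data."""
    if scope_reduced:
        # Tactically shrinking scope counts as regressive.
        return "regressive"
    if not modification:
        # The new data was predicted without any modification.
        return "progressive"
    if description_length(modification) < description_length(new_data):
        return "progressive"
    # A modification as complex as the data itself (or more) is regressive.
    return "regressive"


# Invented example: 200 new observations and two candidate responses.
new_data = "; ".join(f"t={t}, x={(t * t) % 97}" for t in range(200))
simple_fix = "add a drag term proportional to velocity"
lookup_table = new_data  # "explaining" the data by memorising it

simple_label = classify_response(simple_fix, new_data)
lookup_label = classify_response(lookup_table, new_data)
```

A lookup table that merely restates the new observations is as complex as the data it absorbs, so it falls on the regressive side of the criterion even though it "predicts" every point.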

Of course, a research programme may be progressive at some points in its history and regressive at other points. Predicting the future progressiveness or regressiveness of a research programme is itself a difficult problem of probabilistic inference! So it may seem that, from a practical perspective, we have done nothing but push the problem of probabilistic inference up from the level of scientific theory accuracy to the level of research programme progressiveness. There is some justice to this criticism – however, I think the correct response is that, in some ways, judging research programmes’ progressiveness is an easier problem.

The idea that judging research programmes may in a sense be easier than creating scientific theories, is related to various results from modern computer science. For example, there is a computational technique called the Bayesian Optimization Algorithm (Pelikan, 2002), which solves optimization problems by maintaining a population of candidate solutions. Each candidate in the population is evaluated as to its “fitness” – its efficacy at solving the optimization problem – and a probabilistic model is then constructed, embodying patterns that are common to many of the fit candidates and not many of the unfit candidates. These patterns are then used to generate new candidates, which are more likely than random candidates to be fit, because they embody patterns known to characterize fit candidates. The problem with this algorithm is that finding a good probabilistic model of the population is not so easy – in a sense, one has replaced one’s original optimization problem with a new one: finding an optimal model of the fit elements in the population. The beauty of the algorithm, however, is that this new, higher-level optimization problem is actually considerably easier. It is easier to recognize patterns characterizing reasonably-good solutions – once one has some reasonably-good solutions to study – than to find good solutions in the first place. This is analogous to the situation in the evolution of science, as one sees by mapping BOA “candidates” into “scientific research programmes” and BOA “models” to “heuristics indicating scientific research programme quality.” It’s easier to find patterns characterizing scientific research programme quality – as Lakatos has done, based on many examples of both high and low quality research programmes – than to find quality scientific research programmes directly.
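The evaluate, model, and resample loop described here can be sketched with a deliberately simplified stand-in. The code below is not Pelikan's BOA, which learns a Bayesian network over the fit candidates; it is a univariate estimation-of-distribution algorithm whose "model" is just one independent probability per bit. The toy fitness function (counting ones in a bit string) and all parameter values are assumptions chosen for illustration.

```python
import random


def univariate_eda(n_bits=20, pop_size=60, n_select=20, generations=30, seed=0):
    """Maximise the number of ones in a bit string by modelling fit candidates."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # initial model: every bit equally likely 0 or 1
    best = None
    for _ in range(generations):
        # Generate candidates from the current probabilistic model.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        # Evaluate fitness and keep the fittest candidates.
        population.sort(key=sum, reverse=True)
        fit = population[:n_select]
        if best is None or sum(fit[0]) > sum(best):
            best = fit[0]
        # Rebuild the model from patterns shared by the fit candidates:
        # here, the per-bit frequency of ones, clamped to keep some diversity.
        probs = [min(0.95, max(0.05, sum(ind[i] for ind in fit) / n_select))
                 for i in range(n_bits)]
    return best, probs


best_candidate, final_model = univariate_eda()
```

In the analogy drawn above, the candidates play the role of research programmes and the per-bit model plays the role of heuristics for recognising programme quality: the model is learned from examples of good candidates and then steers the search for further ones.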

A good example of a research programme shrinking scope in response to challenging new data can be found within my own primary research area, artificial intelligence. The contemporary field of AI is dominated by a perspective that I call “narrow AI,” which holds that the problem of engineering and describing software programs with general intelligence can be addressed via the process of engineering and describing software programs carrying out highly specific tasks. AI researcher Danny Hillis expressed this perspective with his assertion that intelligence consists of “a lot of little things.” On the other hand, there is an alternate research programme that I call AGI (Artificial General Intelligence), which holds that the essence of intelligence is holistic, and hence that the correct theories about how general intelligence works will not have much to do with software programs narrowly tailored to highly specific tasks. In the early days of narrow AI (the 1960’s and 1970’s), researchers sought to construct narrow AI programs carrying out highly ambitious tasks, but their theories failed to accurately predict the behavior of these programs – specifically, the programs worked far less intelligently than the theories suggested they would. The result has been, not to discard the narrow-AI approach, but simply to stop trying to make ambitious software programs – thus, in practice, narrowing the scope of the narrow-AI research programme to “the explanation of the behavior of software programs carrying out specialized tasks.” For instance, in the domain of automated theorem proving (a subspecialty of AI), few researchers seek to create truly autonomous theorem-proving software anymore; the scope has narrowed to the study and creation of software that proves theorems in a semi-automated way, with significant human aid.

This narrowing of scope of the narrow-AI research programme seems to me indicative of regressiveness. However, narrow-AI advocates would argue that this regression is only temporary – that after the problems within the newly narrower scope have been solved, the narrow-AI research programme will be ready to broaden its focus again. After enough “little things” have been fully understood, then perhaps the project of piecing together little things to make a big thing – an artificial general intelligence – will seem as simple as the original narrow-AI theorists thought it would be. On the other hand, the AGI research programme has many fewer practical achievements to its name than its narrow-AI competitor, yet it is clearly far more progressive, as it is explaining more and more phenomena each year, and continually leading to more and more interesting software systems. This example illustrates the difficulty of applying these philosophy-of-science ideas to make normative judgments about practical situations. Predicting whether a regressive research programme will become progressive again, or whether an early-stage, speculative progressive research programme will indeed flourish and blossom – these are not easy problems (though I have my own strong opinion regarding the narrow-AI/AGI case; see Goertzel, 2002).

Ptolemaic astronomy, with its epicycles, is the classic example of a regressive research programme generating modifications whose complexity is qualitatively “too large” relative to the new data being incorporated. Feyerabend argues against Ptolemaic astronomy as an example of overcomplexity, on the grounds that the Copernican perspective added other kinds of complexity that were even more complex than the Ptolemaic epicycles. Feyerabend has a good point regarding the relativity of simplicity-assessment – however, I ultimately think he’s incorrect, and that, relative to human judgment, Ptolemaic astronomy is just plain more complicated.

This example illustrates a major problem with my formalization of the progressive/regressive distinction: it’s dependent upon the measure of simplicity one defines. This may seem to be a hopelessly subjective issue. However, the subjectivity may be at least partially gotten under control by an appeal to algorithmic information theory, as was done above in the context of the naïve computational/probabilist approach. If one defines a computational model (using, say, a kind of abstract computer such as a Turing machine) of what a “theory” is – say, if one defines a theory as a computer program running on some particular abstract machine – then one can define simplicity based on algorithmic information theory or its variants. A simple entity is one that runs fast and takes up little memory. A modification to a scientific theory is simple if it requires little code and doesn’t make calculations with the theory humongously slower. There is some arbitrariness in the weighting of space versus time complexity here, and in practice scientists seem to weight space complexity much more – i.e., brief theories are considered high-quality, whereas no one minds much if a theory entails horrifically complex calculations even to explain phenomena that intuitively appear quite simple.
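To make the space-versus-time weighting concrete, here is a rough Python sketch of one possible simplicity score. It rests on several assumptions of mine: true algorithmic information is uncomputable, so the compressed size of a theory's program text is used as a crude proxy for code length; the step counts are supplied by hand rather than measured; and time is charged only logarithmically, in the spirit of Levin-style measures, which is one way to weight space far more heavily than time. The two "theories" are hypothetical strings, not real astronomical models.

```python
import math
import zlib


def description_length_bits(source: str) -> int:
    """Crude, computable stand-in for program length: zlib-compressed size in bits."""
    return 8 * len(zlib.compress(source.encode("utf-8"), 9))


def complexity(source: str, steps: int, time_weight: float = 1.0) -> float:
    """Space cost in bits plus a logarithmically discounted time cost."""
    return description_length_bits(source) + time_weight * math.log2(max(steps, 1))


def modification_cost(old, new, time_weight: float = 1.0) -> float:
    """Extra complexity incurred by replacing theory `old` with theory `new`.

    Each theory is a hypothetical (source_text, step_count) pair.
    """
    return complexity(*new, time_weight) - complexity(*old, time_weight)


# Hypothetical theories, written as program text plus an assumed step count.
circular = ("position = orbit(radius, period, t)", 1_000)
with_epicycle = (
    "position = orbit(radius, period, t) + orbit(epi_radius, epi_period, t)",
    2_000,
)
```

Under this scoring, `modification_cost(circular, with_epicycle)` is dominated by the extra program text, while doubling the step count adds only a single bit. Raising `time_weight` shifts the balance toward penalizing slow theories, which is exactly the arbitrary choice noted above.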

Without this kind of appeal to an “objective” measure of simplicity, it seems, the theory of research programmes is doomed. Thus in our definition of science, it would probably be wise to insert a clause stating that the individuals involved, as well as agreeing on the probable reality of most of the Master Observation Set, should roughly agree on what’s simple and what’s complicated. Given this addition to the definition of “science,” we may say that scientific theories come along with a method for validation by definition.

It’s worth noting that, in this approach, a research programme cannot be validated in the abstract: it has to be validated as a methodology for action adopted by a particular set of intelligent agents (primarily humans, at the moment). This is because, in order for the theory to get modified to deal with new situations, someone has got to do the modifying.

A Practical Amalgam

Returning to the naïve computational/probabilist approach defined earlier, we may now ask: What have we managed to salvage? In essence, we have discarded the probabilistic-accuracy criterion as being too difficult to assess in practice, and replaced it with a shift up from the level of theories to the level of research programmes; but we have not managed to get rid of the computational-simplicity aspect. We are still talking about simplicity, not in the context of simplicity of theories, but in the context of simplicity of modification to research programmes.

What is the conclusion about the comparison of different scientific theories and approaches? It seems that reality dictates a mixed approach.

In the case of theories existing within the same research programme, or research programmes that are not too incommensurable, the naïve computational/probabilist approach is closer to reality. We really do compare theories based on which ones make more accurate predictions within their pertinent domains.
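For commensurable theories, this kind of comparison amounts to scoring each theory on observations it was not fitted to. The Python sketch below illustrates the idea under heavy simplifying assumptions: the "observations" are a small synthetic list of (x, y) pairs standing in for a shared observation set, the two "theories" are a constant predictor and a least-squares line, and accuracy is mean squared error on a held-out tail of the data. All names and numbers are illustrative.

```python
def fit_constant(train):
    """Theory A: every outcome equals the average outcome seen so far."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Theory B: outcomes follow a straight line (ordinary least squares)."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def held_out_error(fit, observations, n_train):
    """Fit on the first n_train observations, score on the remainder."""
    train, test = observations[:n_train], observations[n_train:]
    predict = fit(train)
    return sum((predict(x) - y) ** 2 for x, y in test) / len(test)


# Synthetic observations: a linear trend plus a small alternating disturbance.
observations = [(x, 2.0 * x + 1.0 + (0.3 if x % 2 else -0.3)) for x in range(20)]

error_constant = held_out_error(fit_constant, observations, n_train=14)
error_line = held_out_error(fit_line, observations, n_train=14)
```

Here both theories agree on what the observations are and on how error should be measured, so the lower held-out error settles the matter. It is precisely this shared footing that is missing when research programmes are strongly incommensurable.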

On the other hand, when research programmes are too strongly incommensurable, the probabilistic approach is bollixed in practice by issues relating to different interpretations of the same data, and different heuristics for estimating predictivity in various contexts. One must resort to a cruder, more high-level approach: the comparison of the quality of the research programmes in general. However, in order to compare the progressiveness of two research programmes, one requires a standard of “simplicity” that spans theories within both of the programmes.

Furthermore, the same approach used to compare incommensurable scientific research programmes may be used to compare nonscientific belief systems. The probabilist aspect of the ideas presented above is peculiar to scientific theories – but the notion of progressive vs. regressive is not tied to probabilism. One may assess and compare the progressiveness of two different religions, for example. The question of who will perceive this as valuable, however, is another issue.

The Lakatosian notion of progressiveness embodies the scientific ideal of “progress” – which, as Feyerabend points out, is not a universal idea. Aristotelian science, for example, was more attuned to the ideal of stability – of finding a collection of ideas that would explain the universe adequately, once and for all. The belief systems of precivilized tribes also tend to be oriented toward stability rather than progress.

And so, Feyerabend is correct that there is no “objective” way to compare scientific theories or research programmes. Different research programmes breed different intuitive notions of simplicity, and hence there is subjectivity in the calculation of which research programme requires the more complex modification to deal with a new dataset. There is also a value judgment implicit in the assessment that “progressiveness” and the ability to gracefully incorporate new information is a good thing.

The dependence on the value judgment that “progress is good” doesn’t worry me much. But the dependence on a subjective measure of simplicity is more troubling. If the probabilistic assessment of theory quality is impossible due to computational intractability, and the assessment of research programme progressiveness depends on a subjective simplicity measure -- then can science progress? Must we abandon, with Feyerabend, the vision of science as a progression through a series of better and better research programmes?

I think not. The key lies in the definition of science as a human enterprise.

Human Nature

When David Hume treated the problem of induction (in A Treatise of Human Nature), he noted a familiar infinite regress. Suppose we predict the future from the past using some predictive methodology – then how do we know this predictive methodology is workable? We know this because in the past the predictive methodology seemed to work. But how do we know the predictive methodology will continue to work in the future? Well, we have to use some predictive methodology to govern this prediction… Hume pointed out that humans do not suffer from this regress except in pathological cases, because our “human nature” provides a way out. Essentially, he argued, we are hard-wired to use certain base-level predictive methodologies, and we do so because our brains tell us to, independently of any abstract reasoning we may carry out.

Combining Hume with Darwin, we see that this pushes the problem of determining the hard-wired predictive methodology out of the domain of human psychology and into the domain of evolution by natural selection. And how did natural selection come up with this hard-wired predictive methodology for us? The same way that natural selection comes up with everything else: by chance-guided physical self-organization. We reason inductively the way we do because we can’t help it; chance-guided self-organization made us this way and further self-organizing dynamics has reified it.

Similarly, the problem of the progress of science is solved by the fact that we humans have an innate sense of simplicity. Our sense of simplicity is guided by the beliefs and theories that we hold, and the scientific research programmes or other organized traditions that we work within – but only in pathological cases is a scientist’s sense of simplicity completely dominated by the research programme in which he works. There is a universal human sense of simplicity, about which our individual senses of simplicity cluster – and it is this that allows us, eventually, to arrive at virtual consensus on which research programmes are progressive and which are regressive. Individual, belief-guided variation in the sense of simplicity allows some people to carry on a long time with research programmes that others believe are clearly regressive – but eventually, if a research programme’s ongoing data-driven modifications get too perversely complicated according to the innate human sense of simplicity, even the true believers’ intuitions rebel and the research programme is recognized as regressive.

One can see this happening in social science today, with the Marxist research programme. Marxist true believers have a deeply Marx-influenced sense of simplicity, so that there are many modifications to Marxism that appear to them reasonably simple and elegant, although other humans tend to view them as overcomplicated and unconvincing “special pleading.” However, as the modifications get more and more perversely complex according to the innate human sense of simplicity, more and more Marxists are impelled to abandon the faith and move on to different ways of looking at the world. Clearly, the probabilistic aspect has also played a role here: Marxism, on the face of it, would seem to have made many wrong predictions about major world situations. However, the Marxist theoretical framework is extremely fertile, and it always has “good” explanations for why its predictions weren’t wrong after all. Marxism provides many powerful examples of the ability of research programmes to influence human perceptions of the world. What has caused Marxism to gradually dwindle in support, in my view, is not just its failure to predict, but at least as critically, its failure to come up with humanly simple modifications to allow it to explain newly observed data.

We see from this analysis the profound truth of Feyerabend’s assertion that science is first and foremost a human enterprise. To nonhuman intelligences, with a different innate sense of simplicity (no Humean “human nature”!), the human Master Observation Set might well lead to an entirely different set of hypotheses. These intelligences might look at our theories and find them interesting but a bit perversely complicated; and we might think the same thing about their own theories of the same data. Without the common ground of human intuition-about-simplicity, we’d have no way to resolve the dispute except using the massively computationally intensive probabilist approach – cross-validation and out-of-sample testing on the MOS.

Cognitive Analogues and Enablers of the Scientific Enterprise

Now, in this final section, I will shift focus somewhat: away from the broad currents of scientific change, and into the mind of the individual scientist. I will briefly and somewhat cursorily explore the question: What sorts of thought processes are maximally conducive to scientific progress as discussed above? This is interesting from the point of view of theoretical psychology, and it’s also interesting from the point of view of artificial intelligence. One of the long-standing goals of AI research is to produce artificial scientists. It stands to reason that this quest should be informed by a proper understanding of the nature of the scientific enterprise.

Informed by Lakatos, Feyerabend and others, and using the language of probability theory and computation theory, we have reformulated the philosophy of science in a novel way. What does this tell us about cognition, on the individual level? As it turns out, all the conclusions drawn above may be ported from the “scientific community” level to the “individual mind” level, if one merely takes the step of replacing the Master Observation Set with an Individual Observation Set – the set of observations that an individual human mind has either made itself, or has learned through symbolic means (e.g. language) and believed.

In place of scientific theories, we have predictive procedures in the individual mind: procedures that take some information in the IOS and make predictions about other information in the IOS. Again, the quality of a procedure may be measured probabilistically. And again, for computational reasons, much of this quality-assessment is done via inference rather than direct evaluation on observations in the IOS. And again, much of this inference is heuristically guided by theories held by the mind.

The psychological analogue of a research programme is a belief system, a tightly interconnected network of ideas, including a set of related and interlocking procedures for predicting parts of the IOS based on other parts. As we have noted above, on the societal level research programmes are merely special kinds of human belief systems, and their progressiveness may be studied in about the same way as one would do for other belief systems (such as religions).

The theory-dependence of observation is extremely marked in human beings. It penetrates down to the lower levels of the perceptual cortex. We have a remarkable capacity to see what we want to see, and remember what we want to be the case. A famous and very low-level example of this is the blind spot that we would each see in front of our nose, if our brains didn’t perform interpolation so as to convince us that our perceived world, at a given instant, doesn’t have a hole in it (Maturana and Varela, 1992).

And the theory-dependence of human observation and memory does, in fact, make it difficult for us to objectively compare belief systems. We humans are very good at not seeing things that don’t match our beliefs. There are many examples of this in the history of science – scientists ignoring “anomalous” data points for example – but there are even more examples in other domains of human life, such as politics.

Human belief systems have a strong propensity to persist over time, because of their capability to bias observation and memory in their favor, and their ability to conceive inference control heuristics that bias probability estimates in their favor. In Chaotic Logic I described this propensity as a kind of “psychological immune system.”

Counterbalancing this tendency of belief systems to persist, however, is the human mind/brain’s quest for progress. We humans intrinsically value both novelty and simplicity. And these two values, as we have seen, lead to the possibility of a Lakatosian assessment of the progressiveness of research programmes and other belief systems. We want our thought-systems to grow to encompass newly observed situations, and we desire simplicity of understanding, according to our individual variations on our innate human senses of novelty and simplicity. These two desires cause the human mind – in many cases – to allocate attention away from belief-systems that have become regressive in the Lakatosian sense. And of course, this dynamic on the individual cognitive level is what is responsible for the same dynamic on the sociological level, as is seen in the history of science.

This analysis makes clear one way in which science differs from most other human belief systems. The human mind has a tendency to grow persistent belief-systems, but it also – via its innate quest for novelty and simplicity – has a tendency to seek progressive belief systems. What is unusual about science is the way it values the quest for novelty and conceptual simplicity over the quest for stability and persistence.

We thus reach the rather obvious conclusion that individuals who value stability and persistence may be good scientists, but they will be unlikely to create new research programmes. The progress from one research programme to the next is always made by a human being who, psychologically, values novelty and conceptual simplicity over stability. (Of course, human psychology is complex, and the same individual may value novelty in one domain and stability in another – at this juncture one may introduce the notion of “subpersonalities” (Goertzel, 1997) and reformulate the previous statement to say that new research programmes are created by individuals with powerful subpersonalities that value novelty and conceptual simplicity over stability.) And this means that science is much more likely to progress in cultures that reinforce these values than in more “steady-state” oriented societies.

Finally, what does this tell us about the engineering of scientifically accomplished AI systems? The main lesson is that a purely analytical, probability-calculation-based approach will probably never be workable. It will run into the same problems that characterize naïve computational probabilism in the philosophy of science. All the needed probabilities can never be calculated, so heuristics must be used – but these heuristics inevitably wind up being theory-driven, and hence they do not provide an objective way of comparing theories. Probabilistic theory assessment must be augmented by a more system-theoretic sort of theory assessment, such as is provided by the notion of progressiveness. And the notion of simplicity is key here. If an AI program finds different things simple than humans do, it is going to find different scientific theories preferable. This may be a good thing, in some contexts; we may wish AI scientists to derive theories no humans would ever think of. On the other hand, if its innate notion of simplicity is too different from ours, an AI may find itself unable to learn effectively from human scientific knowledge – it may be able to absorb the abstract theoretical statements, but not the intuitions that surround them and guide their use.

References

Feyerabend, Paul (1975). Against Method. London: Verso.
Goertzel, Ben (1994). Chaotic Logic. New York: Plenum.
Goertzel, Ben (1997). From Complexity to Creativity. New York: Plenum.
Goertzel, Ben (2002). The Path to Posthumanity. Online at http://www.agiri.org/path/
Hutter, Marcus (2003). A Gentle Introduction to the Universal Algorithmic Agent AIXI. Technical Report IDSIA-01-03.
Lakatos, Imre and Paul Feyerabend and Matteo Motterlini (1999). For and Against Method. Chicago: University of Chicago Press.
Maturana, Humberto and Francisco Varela (1992). The Tree of Knowledge. Boston: Shambhala.
Pelikan, Martin (2002). Bayesian Optimization Algorithm: From Single Level to Hierarchy. Ph.D. dissertation, Dept. of Computer Science, University of Illinois at Urbana-Champaign.
Stcherbatsky, Th. (1958). Buddhist Logic. The Hague, Netherlands: Mouton & Co.

[1] The ideas given here resemble some that I presented in my 1994 book Chaotic Logic, which also discusses Lakatos, the history of science, and the dynamics of human belief systems; but the treatment here introduces a number of important new ideas.

[3] http://www.mged.org/Workgroups/MIAME/miame_checklist.html