
“My plan for today:
1. Pick up dry cleaning.
2. Go to dentist.
3. Think up brilliant idea.”
Good luck with that third bullet. Big ideas can’t be planned like growing tomatoes in one’s garden. We stumble upon ideas, and although we can sometimes recall how we got there, we could not have anticipated the discovery in advance. That’s why grant proposals never wrap up as, “And via following this four-part plan, I will have arrived at a ground-breaking new hypothesis by year three.”
Three impossible thoughts before breakfast we can manage, but one great idea before dinner we cannot.
Unplanned ideas are often best illustrated by ‘Eureka!”, or ‘Aha!’, moments, like Einstein’s clock tower moment that sparked his special relativity, or Archimedes’ bathtub water-displacement idea.
Why are great ideas so unanticipatable?
Perhaps ideas cannot be planned because of some peculiarity of our psychology. Had our brains evolved differently, perhaps we would never have Eureka moments.
On the other hand, what if it is much deeper than that? What if the unplannability of ideas is due to the nature of ideas, not our brains at all? What if the computer brain, Hal, from 2001: A Space Odyssey were to say, “Something really cool just occurred to me, Dave!”
In the late 1990s I began work on a new notion of computing which I called “self-monitoring” computation. Rather than having a machine simply follow an algorithm, I required that a machine also “monitor itself.” What this meant was that the the machine must at all stages report how close it is to finishing its work. And, I demanded that the machine’s report not merely be a probabilistic guess, but a number that gets lower on each computation step.
What was the point of these machines? I was hoping to get a handle on the unanticipatability of ideas, and to understand the extent to which Eureka moments are found for any sophisticated machine.
If a problem could be solved via a self-monitoring machine, then that machine would come to a solution without a Eureka moment. But, I wondered, perhaps I would be able to prove that some problems are more difficult to monitor than others. And, perhaps I would be able show that some problems are not monitorable at all -– and thus their solutions necessitate Eureka moments.
On the basis of my description of self-monitoring machines above, one might suspect that I demanded that the machine’s “self-monitoring report” be the number of steps left in the algorithm. But that would require machines to know exactly how many steps they need to finish an algorithm, and that wouldn’t allow machines to compute much.
Instead, the notion of “number” in the self-monitoring report is more subtle (concerning something called “transfinite ordinal numbers”), and can be best understood by your and my favorite thing…
Committee meetings.
Imagine you have been placed on a committee, and must meet weekly until some task is completed. If the task is easy, you may be able to announce at the first meeting that there will be exactly, say, 13 meetings. Usually, however, it will not be possible to know how many meetings will be needed.
Instead, you might announce at the first meeting that there will be three initial meetings, and that at the third meeting the committee will decide how many more meetings will be needed. That one decision about how many more meetings to allow gives the committee greater computational power.
Now the committee is not stuck doing some fixed number of meetings, but can, instead, have three meetings to decide how many meetings it needs. This decision about how many more meetings to have is a “first-order decision.”
And committees can be much more powerful than that.
Rather than deciding after three meetings how many more meetings there will be, you can announce that at the end of those decided-upon number of meetings, you will allow yourself one more first-order decision about how many meetings there will be. The decision in this case is to allow two first-order decisions about meetings (the first occurring after three initial meetings).
You are now beginning to see how you as the committee head could allow the committee any number of first-order decisions about more meetings. And the more first-order decisions allowed, the more complicated the task the committee can handle.
Even with all these first-order decisions, committees can get themselves yet more computational power by allowing themselves second-order decisions, which concern how many first-order decisions the committee will be allowed to have. So, you could decide that on the seventh meeting the committee will undertake a second-order decision, i.e., a decision about how many first-order decisions it will allow itself.
And once you realize you are allowed second-order decisions, why not use third-order decisions (about the number of second-order decisions to allow yourself), or fourth-order decisions, and so on.
Committees who follow a protocol of this kind will always be able to report how close they are to finishing their work. Not “close” in the sense of the exact number of meetings. But “close” in the sense of the number of decisions left at all the different levels. And, after each meeting, the report of how close they are to finishing always gets lower.
And when such a committee does finish, the fact that it finished (and solved whatever problem it was tasked) will not have come as a surprise to itself. Instead, you as committee chair will say, “We’re done, as we foresaw from our previous meetings.”
My self-monitoring machines carry out their self-monitoring in the same fashion as in the committee examples I just gave. (See the little appendix at the end for some examples.)
What does this have to do with the Eureka moment!?
Some problems are harder to self-monitor than others, in the sense of requiring a higher tier in the self-monitoring hierarchy just mentioned. Such problems are possible to solve while self-monitoring –- and thus possible to solve without a Eureka moment -– but may simply be too difficult to monitor.
Thus, one potential reason for why a machine has an ‘Aha!’ moment is that it simply fails to monitor itself, perhaps because it is too taxing to do so at the required level (even though the problem was in principle monitorable). Such Eureka moments could potentially have been made without Eureka moments.
Here, though, is the surprising bit that I proved…
Of all the problems that machines can solve, only a fraction of them are monitorable at all.
The class of problems that are monitorable turns out to be a computationally meager class compared to the entire set of problems within the power of machines.
Therefore, most of the interesting problems that exist cannot be solved without a Eureka moment!
What does this mean for our creative efforts?
It means you have to be patient.
When you are carrying out idea-creation efforts, you are implementing some kind of program, and odds are good it may not be monitorable even in principle. And even if it is monitorable, you are likely to have little or no idea at which level to monitor it. (A problem being monitorable doesn’t mean it is obvious how to do so.)
The scary part of idea-mongering is that you don’t know if you will ever get another idea. And even if an oracle told you that there will be one, you have no way of knowing how long it will take.
It takes a sort of inner faith to allow yourself to work months or years on idea generation, with no assurance there will be a pay-off!
But what is the alternative? The space of problems for which you can gauge how close you are to solving it is meager.
I’d rather leave the door open to a great idea that comes with no assurance than be assured I will have a meager idea. You can keep your nilla wafer –– I’m rolling the dice for raspberry cheesecake!
+++++++++++++++++++++++++++++++++
(The journal article on this is here http://www.changizi.com/ord.pdf, but I warn you it is eminently unreadable! An unpublished, readable paper I wrote on it back then can also be found here: http://www.changizi.com/Aha.pdf )
+++++++++++++++++++++++++++++++++
Appendix: Some examples of self-monitoring machines doing computations
For example, suppose the machine can add 1 on each step. Then a self-monitoring machine can compute the function “y=x+7” via allowing itself only seven steps, or “meetings”. No matter the input x, it just adds 1 at each step, and it will be done.
To handle “y=2x”, a machine must allow itself one (first-order) decision, which will be to allow itself x steps, and add 1, x many times, starting from x. (This corresponds to having a self-monitoring level of omega, the first transfinite ordinal. For “y=kx”, the level would be omega * (k-1).)
In order to monitor “y=x^2” (i.e., “x squared”) it no longer suffices to allow oneself some fixed number of first-order decisions. One needs x many first-order decisions, and what x is changes depending on the input. So now the machine needs one second-order decision about how many first-order decisions it needs. Upon receiving x=17 as input, the machine will decide that it needs 16 more first-order decisions, and its first first-order decision will be to allow itself 17 steps (to add one) before making its next first-order decision. (This corresponds to transfinite ordinal omega squared. If the equation were “y=x^2 + k”, for example, the ordinal would be omega^2 + k.)
This hierarchy keeps going, to omega^omega, to omega^omega^omega, and so on.
~~~
Mark Changizi is Professor of Human Cognition at 2AI, and the author of The Vision Revolution (Benbella Books) and the upcoming book Harnessed: How Language and Music Mimicked Nature and Transformed Ape to Man (Benbella Books).
This piece first appeared July 6, 2010, at Science 2.0.



The reason you can’t plan for a new idea is quite simple. To plan for it, you’d have to know what you were planning for — and if you knew that, you would already have the idea in mind.
New ideas are produced by causal processes, but calling such processes “programs” carries a misleading suggestion. Programs are procedures for producing a certain outcome. If you don’t know what the outcome is supposed to be, you can’t write a program for it. (Of course, after the fact, one might write a program to produce the same, now known, result.)
A better analogy might be a kaleidoscope. New patterns fall into place through causal processes, but the results are unplanned. Many new combinations of pieces of glass (or ideas) are not so interesting, but a few may be beautiful. Of course, one has to be prepared to recognize the beauty of a new combination, especially a new combination of ideas.
Hi William. It’s actually possible to write programs that search for hypotheses to explain some set of data. That is, one can build “scientist machines”, that try their best to be the scientist, and come to hypotheses. …hypotheses the programmer him/herself cannot possibly have in mind, because the “scientist program / machine” is potentially picking out one hypothesis from infinitely many possible ones. In theoretical computer science we derive theorems about the space of all such machines, for example. So it’s not true that you have to know what you’re planning for. -Mark
Hi Mark,
Interesting post. It seems vaguely related to a thought experiment I’ve played around with other the years. It’s about why some very promising Aha! moments don’t always work out.
Here’s the version I published here:
http://new-savanna.blogspot.com/2010/05/theyre-at-it-again-hacking-human-mind.html
Here’s an analogy that I find useful in thinking about this business of always seeing The Solution around the next corner. The analogy is based on those old Christmas tree lights that were wired in parallel. When one light went out the whole string went out. To get the lights working again you had to test each bulb individually until you found one that was broken. Then you’d replace it in the string, turn the switch, and – if it was the only defective bulb – the lights would go on. If they didn’t, then you had to keep on looking until you found another broken bulb. And so forth. It was a little tedious, but not difficult.
Think of the problem of building an artificial human mind as consisting of, say, 100,000 little problems. Each of them is a light bulb, they’re all connected into one string, and that string is wired in series. If even one bulb is missing or defective, the string won’t light. In the analogy the string’s initial state is that won’t light. We don’t know what bulb or bulbs is causing the problem or even how many are defective. In fact, a thousand bulbs are defective and they’re randomly scattered along the string. Research, then, is the process of testing each bulb to see whether or not it works. If it doesn’t, you replace it.
So, finding a broken bulb – one in a hundred – is a good thing. It means we’ve got a chance to get things right. Just replace the bulb. When the string doesn’t light . . . Drat, failed again!
There’s more work to do. But the failure doesn’t necessarily mean that the folks who thought they were finally going to bust the piñata were blowing smoke up our collective posterior fundament. Perhaps they were, perhaps they weren’t. Chances are that they’ve really figured something out, and that their techniques are valuable. But, the problem is bigger than they know, than we know. Or can imagine.
Nice piece on the future of AI, and all. We’re a millenium away from interestingly creative machines, I think. …and from reverse-engineering the brain.
Mark,
You are both right and wrong as to the eventual computational ability to monitor an event or series of events. (Heisenberg anyone?)
Monitoring an event only becomes impossible if you are trying, or needing, to get an absolute answer. (Schrodinger anyone?)
However “reality” is not binary (absolute anything) but is rather an analogue process where the Ying and Yang is determined by the probability of the Ying and Yang much like the statistical nature of string theory. The statistical macro-reality becomes an aggregate of the statistical micro-reality.
An illustrative metaphor of this is the processing of images by the photoreceptors in the eye where you have 100 photoreceptors for every optic nerve fiber. The four inner neural layers of the retina serve as a “biological circuit board” to filter the photoreceptor stimuli with the emphasis on detecting motion and distance. Only the significant (threatening) stimuli from the photoreceptor clusters are sent along the optic nerve fiber to the brain, and the brain itself further filters those stimuli to determine which are significant (such as threats) or which are noise (such as Rush Limbaugh).
Eventually, that cascade process will be emulated by a “computer” (e.g. the “neural net” of the character Data in Star Trek) so that it will be possible for non-biological entities (computers) to have concept processing and filtering.
As I have told my sons, Skynet (from the “Terminator” movies) is not science fiction but rather is predictive history. The only good news for me is that I likely I won’t be alive when it happens 50 years from now.
However, despite the billions of people that might die as the robots try to take over, we will eventually defeat them because we can still do something they cannot do: behave irrationally.
That irrationality will allow the survivors to figure out how to get the robots to pull the plug on themselves, since the key to their impending domination is overwhelming force and logic.
(And what is even scarier, as I am proof-reading this, is that despite my sarcasm i am very serious about this. Thank you for letting me share.)
[...] it is intrinsically impossible to know how close one is to reaching the end. For these puzzles, sudden breakthroughs—aha moments—are in fact logically required rather than due to some quirk of human [...]