1068 Fulton Av Sunnyvale, CA 94089-1505
I show that quantum systems can rapidly solve some problems for which finite state machines require a non- polynomially large time. This class of problems is closely related to the class of problems that animals can solve rapidly and effortlessly, but are intractable for computers by all known algorithms.
Penrose[7] and many others [8, 9, 10] argue from practical considerations, Godel's theorem, and on philosophical grounds, that consciousness or awareness is non-algorithmic and so cannot be generated by a system that can be described by classical physics, such as a conventional computer, but could perhaps be generated by a system requiring a quantum (Hilbert space) description. Penrose suspects that aspects of quantum physics not yet understood might be needed to explain consciousness. In this paper we shall see that only known quantum physics is needed to explain perception.
Bialek[11, 12] and Frolich[13] suggested on very different grounds that cells process information using quantum mechanical processes. Frolich suggested a class of mechanisms that might enable them to do this despite the high temperature and large size of biological membranes and macromolecules. Deutch[14] showed that quantum systems can solve some problems that computers cannot solve in polynomial time, but he did not show that quantum systems could solve perception problems. Penrose conjectured that some areas where animals are superior to computers are of this class, but did not find any examples. Bialek[15] argued that perception is inherently non-polynomial if done algorithmically, and therefore neurons must be doing something remarkable, but he did not show that quantum mechanics would enable them to do this.
This paper will show that quantum systems can also rapidly solve perception problems, closing the gap between Bialek's argument and Deutch's result, and demonstrating Penrose's conjecture. This result supports the idea that animals perceive by processing sensory information quantum mechanically in hilbert spaces corresponding to many strongly coupled degrees of freedom.
The fact that animals perceive effortlessly has lead to a widespread belief that some polynomial time algorithm must exist, yet no significant progress has been made in the search for an efficient perception algorithm.
A number of perception problems have been studied very thoroughly, in particular the target acquisition problem in radar and sonar, and the visual problem of inferring objects from two dimensional data.
An algorithm that could perceive in polynomial time is called direct perception (DP), or bottom up perception. Such algorithms unsuccessfully attempt to construct object descriptions (top level) from immediate data (bottom level). The algorithms that do work are called indirect perception (IP), or top down perception. Such algorithms start at the top level (objects) and search for a match with the immediate data. Such algorithms take non-polynomial time [2, 3], for they must try an non-polynomial number of object hypotheses.
Many top down algorithms are not strictly top down - they start from both top and bottom and look for a match in the middle, but this does not change the character of the algorithm.
Bottom up algorithms do not work. Ullman[16] gives examples of situations where perception must be purely top down because all local or explicit information is suppressed, scrambled or misleading, so that bottom up processing has nothing to start from. Gregory[17], made the same argument long before these acronyms were coined, using the example of a dalmatian against a background of spots. In Gregory's lucid terminology we perceive by forming object hypotheses that fit the data.
Kanade[5] showed that when we attempt to generalize the polyhedral labeling problem it no longer has a unique solution. (The polyhedral labeling problem is a special case of the problem of forming a 2 1/2 D image, which is the problem of identifying contours in an image and labeling them as silhouette, convex, concave, groove, or mere change in surface albedo.) His result means that even when there is local and explicit data in an image, this is not sufficient to form a visual perception. You also need knowledge of what objects are likely. This led him to perform the negative chair experiment: He constructed an unlikely unfamiliar object and had people look at it. They misperceived it even though it was right in front of them. This experiment showed that not only is light and shade insufficient in itself to construct 3D or 2 1/2 D descriptions of the image, but even with small angle stereoscopic and small angle apparent motion data, it is still insufficient. Object perception is primary. We do not see three dimensionally and infer objects from the three dimensional information. We do not see what we think we see - we perceive it by forming hypotheses. Gregory[17] made the same argument using the example of reversed masks, but many people argued that this was a special case. Kanade[5] showed the same phenomenon occurs with any complex object. This shows that it is pointless to do anything elaborate to the immediate data without an object hypothesis.
The results for the target acquisition problem are similar to the visual perception problem: If the target and background signals are comparable then the only way to extract the target signal is to find the correct object hypothesis. You cannot find the correct object hypothesis by extracting the target signal first. This is a tractable problem if you are only interested in a single class of objects with a single orientation, for example a specific type of interceptor on an intercept course at full speed, but it is an intractable problem if you are interested in many different possible targets that can be travelling in many different possible directions at many different speeds, with several similar moving objects present at once, and several similar interfering radars, yet bats solve this problem with the effortless rapidity that animals always show when using their most important senses for normal survival purposes.
In some areas of AI, such as chess, search is an acceptable algorithm because we merely want a good sequence, not the best sequence, and this can be achieved in polynomial time, but in perception we want the right object hypothesis, not merely a good object hypothesis.
Thus the problem of perceiving efficiently is equivalent to the problem of efficiently minimizing a class of many variable functions. Almost everyone who has confronted this problem has argued, from the fact that animals perceive rapidly, that it must be sufficient to find a good minimum, not the global minimum, but it all cases tried so far, algorithms that stop before finding the global minimum have been unsuccessful, as one would expect from the nature of the problem.
Any classical system performing such a minimization can only sample the system locally in phase space, thus if the function is irregular and general the system must find a non-polynomially large number of local minima before it finds the global minimum. This the reason why a chess playing computer must explicitly generate an enormous number of sequences and evaluate each one, and a perceiving computer must explicitly generate a non-polynomially large number of object hypotheses and evaluate each one, but this is not how we play chess and this is not how we perceive.
If the function to be minimized is completely general then the problem is non-polynomial (NP) complete. It is likely that animals and computers are both equally incapable of solving large NP complete problems. This leads us to expect that the class of functions corresponding to the class of perception problems is not general, but has some special property that enables animals to get a handle on it.
We shall see that the perception problem corresponds to the problem of finding the global minimum of a function of many variables, where the global minimum is much deeper than any other minimum.
One may easily show that a quantum system with a potential corresponding to such a function may be rapidly brought to the ground state by cooling where the ground state is not localized, followed by adiabatic cooling so that the ground state becomes substantially localized in the well containing the global minimum, whereas a classical system with the corresponding hamiltonian requires nonpolynomially large time to reach the corresponding ground state by cooling; for a classical system the ground state is always localized in the well so the system will rattle randomly from one local minimum to another, until it finally hits the well containing the global minimum; the number of local minima will be exponential in the number of degrees of freedom.
For this to work in polynomial time the momentum terms in the initial hamiltonian must be large enough relative to the potential terms to prevent substantial localization, which requires that the significant degrees of freedom of the system be initially far in the quantum domain. The change in state will remain adiabatic during the change in hamiltonian if the local minima are sufficiently shallow relative to the global minimum that the ground state is not significantly localized in local minima during the change.
We can deduce that the global minimum is much deeper than any local minimum from the way in which these functions are constructed.
The function to be minimized is a measure of the discrepancy between the perception and the immediate sensory data. We wish to minimize the function with respect to a set of variables that constitute a parse tree describing the external world assumed to be generating the immediate data (the object hypothesis). The grammar of the parse tree is a model of the world and the way in which the world interacts with the senses. There will be a single deep well because the immediate data has much higher apparent entropy than the parse tree data. In other words the immediate data is not noise, it has internal consistency in that it is capable of being generated from a much smaller parse tree. Conversely, if the immediate data was indistinguishable from noise, there would be no dominant global well. The probability that a second dissimilar parse tree will have a comparably good fit to the data varies exponentially with the difference between the apparent entropies of the parse tree and the immediate data.
This result is also supported by the fact that programs that truncate their search produce flagrant errors, suggesting that almost right interpretations are rare, and by the fact that genuinely ambiguous images (images with two distinct interpretations) seldom occur in nature but only occur when contrived by artists, indicating that cases where the two deepest wells are of comparable depth are very rare in nature. (We can ignore the very common case - fitting a stimulus of small apparent entropy, for example a Rorschach blot, with a complex perception.)
The mechanism I have described (adiabatically cooling the ground state) is an equilibrium mechanism that can only work for very small systems or at very low temperatures. There are however a few small loopholes in this argument: