Neural Networks

A Brief Introduction

The human mind is possibly the least well understood natural phenomenon, and its complexities are one of the last great frontiers of scientific knowledge remaining today. Gaining insight into the functioning of the mind is difficult for a variety of reasons. Like any complex system where all parts are constantly interacting, the reductionist approach of studying each component in isolation is of limited value, though it can provide some information. Additionally, only so much can be learned by observing the brain from the outside, but ethical considerations make it impossible to delve around the inside of a functioning mind, due to the often destructive nature of tampering. While this is changing due to advances in medical imaging techniques, such as CAT scans and MRIs, there is still information they cannot provide, and the cost can be prohibitive. Thus, most research is confined to taking observations of the inputs and outputs of the brain, and developing plausible models for the resulting behavior, which could be implemented in the structures observed in brains obtained from the deceased. The systems required to account for observed behavior are generally quite complex and not amenable to designing by hand. To get around this, researchers noted that modeling the brain would be easier if instead of trying to create a model of the brain out of thin air, one tried to develop one by simulating the actual processes at work in a natural brain. Out of this line of inquiry was born the field of neural networks, which seeks to create simplified models of all or parts of the brain by mimicking its natural structure and learning processes.

Neural networks is a rich and prolific field, regularly yielding new understanding about how the mind works as well as practical applications that allow for the automation of tasks previously thought quite difficult for machines to accomplish. Most research projects in neural networks seek to either emulate the activity of a certain part of the brain, and thus provide hints as to how the actual brain is organized, or, more commonly, seek to develop a system capable of completing a certain task, either to investigate the sort of solution that results and compare it to those of humans or other organisms, or because a system capable of performing this task is desired. This work will examine in moderate detail a variety of different projects in neural nets, showing the extent and direction of current research.

The Nature and History of Neural Networks

An artificial neural network is a computational device based upon an abstract model of natural neurons and their interconnections. In a natural brain, there exist many neurons that function as computational elements. Information processing occurs at each neuron based on the inputs it receives from other neurons which are connected to it. The signals received may be either excitatory or inhibitory, and the neuron determines whether or not to send a signal itself based on some complex function of these inputs. In an artificial neural network, many simplifying assumptions are made to create a manageable system. First, the signals transmitted between neurons consist of simple scalar numbers, and transmission along the links is the only way neurons communicate; there are no analogs to hormones and other complex messaging systems found in actual biology. Second, each neuron has no internal memory; that is, unlike biological neurons, which can be affected by whatever molecules are in their cytoplasm, there are no hidden variables inside a neuron that may cause it to act differently on identical input at different times. A neuron's output is determined solely by some mathematical function of its current weighted inputs, with the weight being dependent on the particular link. However, these weightings, the connections themselves, and the function to determine the output for a given neuron may change over time based on some algorithm, depending on the type of network in question. In most systems used, the activation function is actually a function of the sum of the weighted inputs, and is typically non-linear. 
The most commonly used activation functions are threshold functions, which are zero if the input is less than some constant and one if the input is greater than that constant; sigmoid functions, which are like threshold functions with a smooth rather than an abrupt transition; and Gaussian functions, where activation peaks at some input value and falls off as the input moves away from that value.
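The three families of activation function, and the way a neuron applies one to the sum of its weighted inputs, can be sketched in Python as follows. This is only an illustration: the function names, the default parameter values, and the choice to output one when the input exactly equals the threshold are assumptions, not details taken from any particular system.

```python
import math

def threshold(x, theta=0.0):
    """Step function: 0 below the constant theta, 1 otherwise.
    Treating x == theta as 1 is an arbitrary choice made for this sketch."""
    return 1.0 if x >= theta else 0.0

def sigmoid(x):
    """Smooth, differentiable counterpart of the step function."""
    return 1.0 / (1.0 + math.exp(-x))

def gaussian(x, center=0.0, width=1.0):
    """Peaks at `center` and falls off as x moves away from it."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))

def neuron_output(inputs, weights, activation):
    """Apply an activation function to the sum of the weighted inputs."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return activation(net)
```

For instance, neuron_output([1, 1], [0.5, 0.5], threshold) sums the weighted inputs to 1.0 and passes that value through the step function.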

The first work in neural networks was done in the early 1940s by Warren McCulloch and Walter Pitts (Fausett 1994). They implemented logical functions, such as AND and OR, via relatively simple hand-crafted networks that made use of threshold functions. Their later work broke ground in important research areas, such as recognizing patterns even after rotation and translation. In 1949, Donald Hebb came up with the important Hebbian learning principle, in which connected neurons that are simultaneously excited strengthen the link between themselves. His ideas were later improved upon, most notably by the addition of the principle that neurons which are simultaneously inhibited should also increase the strength of their connection.
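To make these early ideas concrete, the following Python sketch builds AND and OR from a single threshold unit in the spirit of McCulloch and Pitts, and shows a Hebbian weight update. The specific weights and thresholds are hand-picked assumptions, and the use of bipolar (+1/-1) activities to capture the later extension of Hebb's rule is one common formulation rather than the only one.

```python
def mp_unit(inputs, weights, theta):
    """Threshold unit: fires (1) when the weighted sum reaches theta."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0

def logical_and(a, b):
    # Both inputs must be active to reach the threshold of 2.
    return mp_unit([a, b], [1, 1], theta=2)

def logical_or(a, b):
    # A single active input is enough to reach the threshold of 1.
    return mp_unit([a, b], [1, 1], theta=1)

def hebb_update(weight, pre, post, rate=1.0):
    """Hebbian update with bipolar activities (+1 excited, -1 inhibited).
    The link is strengthened when both units are excited or both are
    inhibited; in this formulation it is weakened when they disagree."""
    return weight + rate * pre * post
```

With bipolar activities, hebb_update(0, -1, -1) increases the weight just as hebb_update(0, 1, 1) does, which is the behavior added by the later refinement of Hebb's principle.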

In the 1950s and 1960s research in the field flourished, giving rise to the first golden age of neural networks. Even John von Neumann, who is generally considered to be the father of modern computing, and after whom traditional architectures are named von Neumann Machines, was interested in the neural network approach to computation and in the modeling of the human brain. Through the work of Frank Rosenblatt and others, it was this time that gave birth to the perceptron, the first major class of neural networks studied. A perceptron is a single neuron that accepts some number of inputs and attempts to calculate some function of these inputs as an output. Importantly, the perceptron was the first neural network to actually learn. It adjusted the weights of its inputs based on whether or not the output was correct, as gauged by some standard, until it matched the function as well as possible. Of particular interest was that if some set of weights existed that would calculate a given function, then a perceptron was guaranteed to find them. However, in 1969, Minsky and Papert published an influential book, Perceptrons, showing that simple perceptrons could only learn functions that are linearly separable, which excludes most interesting problems. After this, interest in the field dropped off sharply, as did funding.
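The perceptron learning procedure can be sketched in a few lines of Python. This is a minimal illustration rather than Rosenblatt's original formulation: the function names, the zero initial weights, the learning rate, and the epoch limit are all assumptions chosen for brevity. The two data sets contrast a linearly separable function (AND) with one that is not (XOR).

```python
def predict(weights, bias, x):
    """Output 1 when the weighted sum plus bias is positive, else 0."""
    net = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if net > 0 else 0

def train_perceptron(samples, epochs=20, rate=1.0):
    """Adjust weights only when the output is wrong; stop once an
    entire pass over the samples produces no errors."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            delta = target - predict(weights, bias, x)
            if delta != 0:
                errors += 1
                weights = [w + rate * delta * xi for w, xi in zip(weights, x)]
                bias += rate * delta
        if errors == 0:
            break
    return weights, bias

AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_weights, and_bias = train_perceptron(AND_DATA)
xor_weights, xor_bias = train_perceptron(XOR_DATA)
```

Trained on AND_DATA, the loop settles on weights that classify all four cases correctly. Trained on XOR_DATA, it never reaches an error-free pass, because no single line separates the two classes.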

Despite the sudden drought of resources for the few researchers who were not discouraged, many important advances were made after this point. Bernard Widrow and Marcian Hoff developed another learning rule, the delta rule, which attempts to minimize overall error better than the perceptron learning rule, leading to an improved ability for a network to generalize based on the examples on which it was trained. Their networks were known as ADALINE, for Adaptive Linear systems. Modified and extended versions of these have been used to solve a variety of problems. Other researchers working throughout the proverbial dark ages of neural networks were Teuvo Kohonen, who pioneered work with self-organizing maps for memorizing and recalling information, and James Anderson, who developed a method to truncate output during the learning process to prevent the network from becoming unstable, which has been used in medical diagnosis systems. Also, Stephen Grossberg and Gail Carpenter invented another form of self-organizing network known as Adaptive Resonance Theory, which relies on identifying core sets of features that all members of a category must share.
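The difference between the delta rule and the perceptron rule is easiest to see in code. In the hedged Python sketch below, the weight change is proportional to the error measured on the unit's linear output, before any thresholding, so the weights keep moving toward a least-squares fit even when every example is already classified correctly. The function names, learning rate, epoch count, and the bipolar AND data set are assumptions made for illustration.

```python
def linear_output(weights, bias, x):
    """Net input of the unit, with no threshold applied."""
    return bias + sum(w * xi for w, xi in zip(weights, x))

def train_delta_rule(samples, epochs=200, rate=0.05):
    """Widrow-Hoff (least mean squares) updates on the linear output."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - linear_output(weights, bias, x)
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
            bias += rate * error
    return weights, bias

def mean_squared_error(samples, weights, bias):
    total = sum((t - linear_output(weights, bias, x)) ** 2 for x, t in samples)
    return total / len(samples)

# Bipolar encoding of AND: -1 stands for false, +1 for true.
BIPOLAR_AND = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]

initial_error = mean_squared_error(BIPOLAR_AND, [0.0, 0.0], 0.0)
delta_weights, delta_bias = train_delta_rule(BIPOLAR_AND)
final_error = mean_squared_error(BIPOLAR_AND, delta_weights, delta_bias)
```

Thresholding the trained linear output at zero then gives the classification.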

While it had been shown that perceptrons were inadequate for most functions, multilayer networks, which contained several groups of neurons, with the outputs from one group used as inputs for the next, were much more powerful. In fact, it was known that any function which could be computed by an ordinary von Neumann-style computer could be done with a three-layer neural network as well. Unfortunately, determining which network could do this calculation was the problem. There was no general learning algorithm for multilayer networks, so there wasn't much that could be done with them, as setting them up by hand was nearly impossible and would consume far too much time. In 1974, Werbos discovered an algorithm for training multi-layer networks, but it failed to gain much publicity. This method was independently rediscovered in 1985 by David Parker and 1986 by LeCun, but did not come to be widely known until it was refined and publicized by David Rumelhart and James McClelland. This Backpropagation algorithm was basically an extension of the perceptron learning algorithm where it was determined what amount of the error in the output was due to each of the inputs, and that error was sent back to the neurons that provided the input. In this way, each neuron knew what its own error was, and could adjust its input weights based on the perceptron learning rule. This does require a differentiable activation function, ruling out threshold functions, but that is not a problem, because sigmoid functions work just as well. It should be noted that this learning algorithm has no basis whatsoever in biology, as information percolates in only one direction through the axons and dendrites of biological neurons.
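The core of backpropagation can be sketched for a small network with one hidden layer. The Python below is an illustrative sketch under stated assumptions, not the formulation of any of the authors named above: the layer sizes, learning rate, epoch count, random initialization, and all function names are choices made for this example. It shows the two essential steps, namely computing an error term at the output and sending a share of it back along each link to the hidden units.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_network(n_inputs, n_hidden, rng):
    """Random initial weights in [-1, 1]; the sizes are illustrative."""
    return {
        "w_hidden": [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                     for _ in range(n_hidden)],
        "b_hidden": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w_out": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b_out": rng.uniform(-1, 1),
    }

def forward(net, x):
    hidden = [sigmoid(b + sum(w * xi for w, xi in zip(ws, x)))
              for ws, b in zip(net["w_hidden"], net["b_hidden"])]
    output = sigmoid(net["b_out"] +
                     sum(w * h for w, h in zip(net["w_out"], hidden)))
    return hidden, output

def train_step(net, x, target, rate):
    hidden, output = forward(net, x)
    # Error term at the output; output * (1 - output) is the sigmoid's slope.
    delta_out = (target - output) * output * (1.0 - output)
    # Share of the error attributed to each hidden unit, passed back
    # along its outgoing link (using the weights before they are updated).
    delta_hidden = [delta_out * w * h * (1.0 - h)
                    for w, h in zip(net["w_out"], hidden)]
    net["w_out"] = [w + rate * delta_out * h
                    for w, h in zip(net["w_out"], hidden)]
    net["b_out"] += rate * delta_out
    for j, dh in enumerate(delta_hidden):
        net["w_hidden"][j] = [w + rate * dh * xi
                              for w, xi in zip(net["w_hidden"][j], x)]
        net["b_hidden"][j] += rate * dh

def total_error(net, samples):
    return sum((t - forward(net, x)[1]) ** 2 for x, t in samples)

def train(net, samples, epochs=5000, rate=0.5):
    for _ in range(epochs):
        for x, target in samples:
            train_step(net, x, target, rate)

XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
```

A typical use is to build a network with make_network(2, 3, random.Random(0)), call train on XOR_DATA, and compare total_error before and after. How close the error gets to zero depends on the random starting weights.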

While backpropagation was the key discovery that led to a renaissance in neural networks in the late 1980s, other important developments also contributed. Physicist John Hopfield, together with AT&T researcher David Tank, developed the network named after him that makes use of fixed weights but changing activations, and which can serve either as an associative memory or as a means of finding solutions to constraint-satisfaction problems, such as the infamous travelling salesman problem, which involves finding the shortest route that visits all cities on a given map. Kunihiko Fukushima and others at NHK Laboratories of Tokyo developed the neocognitron, a self-organizing network capable of recognizing rotated and translated characters. Nondeterministic networks, such as the Boltzmann machine, made use of probability density functions and uncertain activity to achieve better results, incorporating important ideas like simulated annealing and Bayesian probability theory. Also of importance were the growing dissatisfaction with so-called "good old-fashioned AI", the symbolic approach to artificial intelligence, which had failed to live up to the grandiose promises made a decade earlier, and the increasing availability of low-cost, high-speed computing equipment for simulating neural networks. The ever-quickening pace of hardware development made it quite feasible to simulate whatever neural network one desired using a von Neumann architecture, rather than having to laboriously build an actual physical neural network. Then, ironically, as the 1990s came along, many feared that the pace of development in computer hardware would slow down or stop within the next decade or two due to the physical limit on chip size imposed by the size of atoms. In response, some labs have turned to researching optical computational devices, which are best suited for use as neural networks due to their physical structure (Wasserman 1989).
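The associative-memory use of a Hopfield-style network can be sketched briefly in Python. In this sketch the weights are set once from the stored patterns and then held fixed, while the units' activations are updated repeatedly until they stop changing. The outer-product storage rule, the bipolar (+1/-1) encoding, the sequential update order, and the function names are assumptions chosen for a compact illustration.

```python
def store_patterns(patterns):
    """Build a symmetric weight matrix from bipolar patterns.
    Each pair of units that agree in a pattern gets a stronger link."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no unit is connected to itself
                    weights[i][j] += p[i] * p[j]
    return weights

def recall(weights, state, max_sweeps=10):
    """Update one unit at a time until a full sweep changes nothing."""
    state = list(state)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            net = sum(weights[i][j] * state[j] for j in range(n))
            new_value = 1 if net >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state

stored = [1, -1, 1, -1, 1, -1, 1, -1]
hopfield_weights = store_patterns([stored])
noisy = [-1] + stored[1:]  # same pattern with its first element flipped
recovered = recall(hopfield_weights, noisy)
```

Here the corrupted input settles back onto the stored pattern, which is the sense in which the network acts as a memory addressed by content.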

Networks for Visual Perception

One of the most studied parts of the human nervous system is that dealing with vision, for a variety of reasons. Since much of the apparatus for vision is located in the eye, it presents a discrete system that communicates with the brain via the well-defined interface of the optic nerve, making at least the early stages of vision easier to isolate and study. Secondly, vision is fairly concrete compared to most other tasks. It has a definite objective, to determine what objects are in front of the viewer. Providing constant, reproducible, objective stimuli for vision is much simpler than for most other tasks, since a picture does the trick nicely. Also of importance is the fact that we humans place so much emphasis on vision. It is our primary means of getting information about the world, and we tend to emphasize it over our other senses much of the time. Thus, it is no wonder that we would like to understand vision better. One way to do this would be to attempt to construct functional replicas of the human visual system that exhibited similar characteristics in terms of both achievements and mistakes. As part of this project, one would gain a way of providing computers with meaningful visual input of the world, something generally considered quite important to developing machine intelligence. Many efforts within the field of neural networks seek to develop some sort of vision system that parallels the human one.

One such project is the recent Receptive-Field Laterally Interconnected Synergetically Self-Organizing Map (RF-LISSOM) model of the primary visual cortex (Miikkulainen 1997b, 1998). It has been known for several decades that the general structure of the primary visual cortex consists of a proximity-preserving map of the retina with orientation and ocular dominance columns of neurons. Recently, it has become clear that these have a high degree of lateral interconnects to other columns, which appears to be very important to optical activity. Additionally, studies of adults who have undergone trauma indicate that this organization of the primary visual cortex can radically adjust itself when compensating for damage to part of the visual system. The RF-LISSOM model is a simplified version of the human visual cortex, where each entire column of neurons is modeled as a single computational unit, to make things computationally feasible. There exists an input surface that serves as a black-and-white retina. Each of the units in the cortex receives input from a block of units in the retina surrounding its position. Its output is determined by the inputs from the retina and by both short-range excitatory and long-range inhibitory connections to other units in the cortex. The cortex was initially set up randomly, then trained by repeated exposure to patterns consisting of lines at various angles. During the training process, which required a Cray T3D massively parallel supercomputer, modified Hebbian learning was used to reinforce links between units that were responding similarly and prune those between neurons whose outputs were unrelated.

The result was a self-organized network where certain clusters of units responded strongly to lines of certain orientations, and only very weakly to other orientations. The resulting map showed groupings of units very similar to those found in the human visual cortex. Features such as pinwheel centers, where orientation changes 180° about a point, linear zones, which show smooth transitions to neighboring orientations, and fractures, which are discontinuous changes in orientation, were present in both the RF-LISSOM model and in actual human cortices. Mathematical analysis showed that the network encoded data more efficiently than a comparable network with fixed rather than self-organized lateral connections, indicating that self-organization is actually beneficial to the functioning of the human visual system, and not a hindrance to be worked around. Additionally, the network showed surprising robustness, being able to compensate completely for small simulated lesions, and fairly well for large ones.

Interestingly, the network showed effects similar to those that create certain categories of optical illusions in humans, known as tilt aftereffects. If a person observes a group of lines at a particular angle for a prolonged period, then looks at another group of lines that is slightly tilted with respect to the first, they will perceive them as being more tilted away from the original set. If instead they look second at a group that is tilted far away from the original group, the new group is perceived as being less tilted away from the original group. When exposed to tilted lines and allowed to settle into a stable state, the network showed just such tilt aftereffects when exposed to lines immediately afterwards, with units for tilts somewhat away from the original tilt responding more strongly than they otherwise would.

The amazing similarities between this very simple yet self-organized system and the human visual system would seem to indicate that some fundamental process is at work in both, and that this simple model is sufficient to capture some of the essence of the primary visual cortex's organizing principle. As has been well known for some time, one of the most interesting features of neural networks is their ability not only to do complex tasks, but to make very human-like errors while doing them.

Networks for Performing Tasks and Motion Control

Most neural networks research focuses on the creation of networks to process data, generally through some form of recognition or classification, but sometimes through the computing of some function of the input. However, a living organism is much more complex than this. It not only has to make sense of its input, but then must decide upon a course of action that will achieve its goals, take an action that affects its environment in a potentially unpredictable manner, and reevaluate its position. The creation of entire neural network intelligent agents is significantly more complex than the previously discussed tasks, often requiring various connected subnets, each devoted to a specific task. However, researchers have still managed to develop a variety of networks that solve either real-world tasks, such as backing up a semi-trailer truck into a loading dock (Nguyen 1989), or tasks in simulated worlds, like predator evasion or prey tracking.

In general, complex tasks like these benefit from more advanced learning techniques. One common approach is to start with a highly simplified version of the task and, once the network learns to do it, gradually present it with harder and harder tasks until the desired task is reached. After all, one would hardly expect a child to produce a well-argued essay without ever having been taught how to read first. Another approach is to evolve networks that solve the problem by using artificial selection on populations of networks.

Control of robotic arms is a very interesting problem. From a research point of view, an agent moving, albeit in a limited sense, through a varying environment with which it must interact constitutes a very complex problem that more or less covers most of the important low-level interactions with the world. From a practical viewpoint, a great many products are assembled by robotic arms in factories, and getting them to work right is essential. The traditional method of training robotic arms was to have them memorize exactly the movements a human operator put them through, then repeat them perfectly every time. This works, but only if the environment is exactly the same every time, which, while achievable on an assembly line, can't always be done. In his work, Moriarty (1996) details a scheme for using a neural network to control an industry-standard OSCAR-6 robot arm to avoid obstacles and reach a random target based on visual and sensory input. Specifically, he makes use of neuro-evolution to produce robust controller networks.

The evolution was done through the SANE package, which creates a population of neurons with certain weights and a population of ways of putting the neurons together into networks. Many of each type are generated, networks are built based on the recipes for using the neurons, and each resulting network is tested on the given task. Those which are more successful are duplicated and modified slightly, in a process known as mutation. Those which are less successful are discarded. The entire process is then repeated with the new sets of neurons and recipes. While the initial networks are random and achieve little, over the course of successive generations they rapidly evolve to become quite good at the task in question.
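The generate, test, duplicate, and mutate cycle can be illustrated with a deliberately simplified Python sketch. It is not the SANE package: it evolves a single population of complete weight vectors rather than separate populations of neurons and network recipes, and the toy task (a threshold unit computing OR), the population size, the mutation strength, and all names are assumptions made for illustration.

```python
import random

def evolve(fitness, genome_length, rng, pop_size=20, generations=30, sigma=0.5):
    """Keep the better half each generation and refill the population
    with slightly mutated copies of the survivors."""
    population = [[rng.uniform(-1, 1) for _ in range(genome_length)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        survivors = ranked[: pop_size // 2]       # less successful are discarded
        children = [[gene + rng.gauss(0, sigma) for gene in parent]
                    for parent in survivors]      # duplicates with mutation
        population = survivors + children
    return max(population, key=fitness), history

OR_CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

def or_fitness(genome):
    """Number of OR cases a threshold unit with these weights gets right."""
    w1, w2, bias = genome
    correct = 0
    for (a, b), target in OR_CASES:
        output = 1 if w1 * a + w2 * b + bias > 0 else 0
        correct += int(output == target)
    return correct

best_genome, fitness_history = evolve(or_fitness, 3, random.Random(1))
```

Because the survivors are carried over unchanged, the best fitness in the population can never decrease from one generation to the next, which is one simple way to keep good solutions from being lost to mutation.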

The OSCAR-6 arm contains six joints, with a gripping "hand" at the end, and a camera as well. The networks to be used receive input about the hand's current position and the distance to any objects the camera senses in any direction. Since the task of positioning the hand at a given location without bumping into anything is rather complex, it was broken down into two stages. The first stage gives a rough approximation, and seeks to get in the vicinity of the right position, but focuses on not hitting anything while doing so. The second stage worries less about obstacle avoidance, as there shouldn't be any obstacles between it and the goal by then, and concentrates on fine-tuning the hand's location. This division of labor allows for more specialization, and the combined effect is better than that which can be evolved in reasonable time from a single network of similar size.

On tasks consisting of avoiding a single randomly placed cube 30 cm on each side, the robot arm was able to get within 1 cm of the randomly located target, which is considered sufficient for industrial use. Arms using traditional control mechanisms hit the obstacle over 11% of the time, as opposed to under 2% for the one using the neural net. Clearly, without any overt guidance from the designers of the system as to how to do it, a network had evolved that was quite efficient at obstacle avoidance.

In many real-world situations, the exact environment that needs to be dealt with is not known ahead of time, and new, unpredictable factors may periodically emerge that need to be handled. One approach to interacting with such dynamic environments using neural networks is to employ a technique known as online neuro-evolution (Agogino 1999). In traditional, or offline, neuro-evolution, a population of neural networks is evolved to cope with a given task or set of tasks, then fixed and allowed to do its work. In online neuro-evolution, the population of networks is allowed to keep evolving even after it is put into use, allowing it to adapt better to its current environment, and develop strategies for coping with new challenges.

Agogino's experiment involved an artificial world with a home base and two to four gold mines. There was a population of 30 neural networks known as peons, each of which had as its task to start at the home base and reach a gold mine as quickly as possible, whereupon it would be transported back to the base. To complicate matters, there were predators that killed the peons on contact. A peon had sensors that told it whether or not there was a gold mine or a predator in each of the four quadrants about it. It was capable of moving in any direction, and had no memory. Initially, there were sixteen different scenarios involving all possible combinations of four different placements of the mines with four different behaviors for the predator. The fitness of a peon was based on how many times it managed to get to a mine before being eaten, and how long it lived.

An initial random population was trained on each of the sixteen scenarios until its performance ceased to improve on any of them. The performance of this population was then compared to that of a population that was allowed to continue evolving online and adapting specifically to the scenario at hand, and the online population did significantly better. Even when the online group started with a random population and not one that had already evolved offline, it still outperformed the offline group within minutes. When both were tried on a simple scenario (based on the intelligence of the predator), then switched to a harder one once the online group had enough time to specialize for the simple one, the online group still did no worse than the offline group right after the switch (even though one might think that it would lose its ability to deal with the harder scenario due to specialization) and rapidly evolved to outperform the offline group by an even bigger margin than before the switch. When both were tried first on an easy scenario, then switched to a new scenario not in the original sixteen (with a new mode of predator behavior), the offline group performed dismally, but the online group adapted very well within seconds.

These results show that by allowing a neural network to continue to train and adapt itself to its environment even after it has achieved optimal results on test cases and been put into actual use, one can achieve a surprising degree of robustness and versatility that improves performance even on known cases, and leads to the ability to deal with new and unexpected cases as well. This sort of behavior is vital to any sort of interaction involving the often chaotic real world, and is a step towards the creation of autonomous systems that can operate without any human supervision, such as those that would be needed for the unmanned exploration of other planets.

Networks for Language Recognition

One task that is of particular interest is the understanding of normal human language. From a research point of view, language is viewed as one of the more complex things that humans deal with on a regular basis, and many feel that it is one of the more important characteristics that helps to set us apart from less intelligent animals. To construct an artificial neural network that displayed an advanced understanding of language and was able to process sentences and come up with appropriate replies could provide great insight into the mechanisms underlying human comprehension of language. From a practical point of view, if machines were able to understand human language, our interactions with them could become much more streamlined and easier, particularly for the layperson. For these reasons, a large bulk of the research currently being done in neural networks, particularly that funded by corporations, is in the realm of language comprehension.

Understanding ordinary human language would at first appear to be a highly symbolic activity that could be achieved by rigorous parsing of sentences and referencing words to their meanings. However, people often make use of expressions the listener does not know, use words inappropriately, mumble, repeat themselves, and add in extraneous filler such as "um", "like", and "you know", but we still manage to communicate with each other just fine. If our mind were relying on a purely symbolic approach, we would probably have a hard time dealing with such things, and might not even generate such sentences in the first place. Thus, this would seem to imply that at least some level of the human ability to speak does not operate on rigid principles, and might best be modeled by an artificial neural network.

In attempting to build a network for understanding language, one must first decide how to represent a word. In a subsymbolic representation, no one neuron in the network stands for anything. It is only as a group that any meaning can be assigned to their states. This also causes the information representation to have the property of being holographic, meaning that any one part can be recreated fairly well given the rest, providing great robustness against noise. Additionally, information stored holographically leads to automatic generalization. Concepts that are similar (by whatever criteria gave rise to the representation) will have similar representations, and thus if one of them stimulates the network in a certain manner, the other will do so in a very similar manner.
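The idea that similar concepts receive similar representations can be made concrete with a small Python sketch. The four-component vectors below are invented purely for illustration and are not taken from any trained network; cosine similarity is used here as one common, assumed way to compare such patterns of activity.

```python
import math

def cosine_similarity(u, v):
    """1.0 for vectors pointing the same way, near 0 for unrelated ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical distributed representations: no single component means
# "animal" or "food"; only the pattern as a whole carries the meaning.
REPRESENTATIONS = {
    "dog":   [0.9, 0.8, 0.1, 0.2],
    "cat":   [0.8, 0.9, 0.2, 0.1],
    "pasta": [0.1, 0.2, 0.9, 0.8],
}

dog_cat = cosine_similarity(REPRESENTATIONS["dog"], REPRESENTATIONS["cat"])
dog_pasta = cosine_similarity(REPRESENTATIONS["dog"], REPRESENTATIONS["pasta"])
```

Because the vectors for "dog" and "cat" nearly coincide, any unit that sums weighted inputs will respond to them in almost the same way, which is the source of the automatic generalization described above.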

One important task in understanding language is that of case-role assignment, which is the job of figuring out what part each noun in a sentence plays. Depending on the language, this may be indicated by word order, auxiliary words, or some alteration of the noun, such as a suffix or vowel change. Additionally, semantic clues based on the meanings and usual roles of words can aid in this task.

SPEC is a neural network designed to perform this case-role assignment task on arbitrarily complex sentences but with only a highly limited vocabulary of thirty words (Miikkulainen 1997a). The subsymbolic representation of each word is developed throughout the training process using a technique known as FGREP, which was developed earlier and creates representations whose similarity reflects how similarly the words are used in sentences. This leads to representations that group words into categories, which need not be mutually exclusive (such as chicken, which is similar to both other foods and other animals).

The SPEC network takes as its input a sequence of words, one at a time, and outputs at each step its current best guess for which words fill the case-roles of Agent, Act, Patient, Instrument, and Modifier in the current clause. Internally, it consists of three separate interconnected modules: a parser, a segmenter, and a stack. The parser's job is to add the current input word to the representation of the current clause and to determine what word is occupying each case-role based on this internal representation. The stack is used to remember information about previous clauses while examining an embedded clause, and the segmenter determines where the clause boundaries are and directs the stack to save and restore the parser's representation of a clause.

After training with just 100 different sentences making use of only two-level tail embedding and two-level center embedding, SPEC was able to correctly assign case roles in 98,100 distinct sentences making use of up to four levels of tail and/or center embedding. To test the limitations of its behavior, noise was added to intentionally degrade the representation in the stack. Even with this change, it was still correct 94% of the time. When it failed, it was usually with deep center embeddings, with which people typically have trouble as well. Interestingly, it would always pick another noun, never accidentally picking a verb, showing that it had a clear distinction between the two classes of words. Furthermore, the noun it picked was usually one that had appeared before in the sentence, and not just one at random. Also, the tighter the semantic restrictions on what nouns could go in a given spot, the higher the chance of SPEC getting it right. Even on those it got wrong, it usually chose a noun that made sense. (That is, if the correct assignment is "boy chased cat", it might err with "boy chased dog", but not "boy chased pasta".) Such mistakes are surprisingly similar to those people typically make.

However, just because something works similarly to the way humans do does not mean that humans are necessarily operating in the same manner. In his work, Mayberry (1999) demonstrates that a differently structured parsing system can produce similar results. His SARDSRN makes use of a parser nearly identical to that in SPEC and a self-organizing map to store information about how specific words usually relate to each other, to ease the memory burden on the parser. The result performs very well, though not quite at the same level as SPEC. Neither one is obviously more biologically plausible than the other, as one could draw analogies between SPEC's stack and a phonological loop, or between SARDSRN's map and an associative memory of grammar.

While determining case-role assignment is by no means trivial, it is still a far cry from fully comprehending and responding to sentences. An attempt to perform this much more complex task in a limited domain is found in the DISCERN system (Miikkulainen 2000). DISCERN is designed to read in short stories of a few sentences that describe stereotypical series of events, such as going to a restaurant, shopping, or taking a plane trip. The stories themselves are created from template-like scripts, but not all details are always included. After having read a story, DISCERN remembers the facts contained therein and can answer simple natural-language queries about the story, making inferences if necessary to fill in details that were not explicit. It is also capable of providing a paraphrase of the original story, including all the details it infers.

DISCERN itself is organized into several discrete interconnected modules to handle subtasks, including a lexicon, episodic memory, sentence parser, sentence generator, story parser, story generator, cue former, and answer producer. The lexicon has two layers, which organize vocabulary based on similarity in spelling and pronunciation, and on similarity in meaning. The episodic memory is a hierarchy of feature maps that contain increasingly specific levels of detail, ranging from the general category of a story to the specific details known. The other units are responsible for breaking down input and feeding it to these two, or reassembling their output into coherent English sentences.

DISCERN's limited ability to infer is based upon the fact that certain facts in the stories have strong correlations. If told, "John left a big tip," and later asked, "Did John like his meal?", the system is able to answer yes. These correlations are not hard-coded, but rather learned through the observation of sample stories, allowing for a great deal of flexibility. Due to the organization of its episodic memory, distinct facts are easier to remember than similar facts, since they occupy different regions of the memory. For example, after reading a story about John's meal at Denny's, facts about Suzie's meal at Hunan Palace are more likely to be obscured than those about Fred's flight to Dallas. If asked questions about the meals at this point, it may respond with facts about the incorrect meal due to confusion, not unlike people asked to remember several similar things. Interestingly, if DISCERN is asked a question that contains an incorrect detail, it will still answer. For instance, if told a story where John eats a steak at Ma Maison, and then asked who ate lobster at Ma Maison, the system will reply with John. Even though some of the facts don't match, enough do to trigger the correct memory, and the system dismisses the incongruities as irrelevant mistakes. Likewise, a person in such a situation will generally assume that they remember the choice of food incorrectly, that the questioner is the one mistaken, or that one of them simply misheard the other, and ignore the discrepancy with no more than a passing thought. Only if they know multiple similar facts which could be the one the questioner wants to know will they take more time to find out which is the one actually meant.
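The lobster example can be pictured as retrieval by best partial match. The following is a minimal symbolic sketch under stated assumptions, not DISCERN's implementation: DISCERN stores stories in feature maps over distributed representations, whereas here each story is a plain dictionary, the role names and Suzie's dish are hypothetical, and the scoring simply counts matching fields.

```python
# Hypothetical stored stories, each a mapping from role to filler.
# The role names and the "dumplings" filler are invented for illustration.
MEMORY = [
    {"script": "restaurant", "who": "John", "food": "steak", "place": "Ma Maison"},
    {"script": "restaurant", "who": "Suzie", "food": "dumplings", "place": "Hunan Palace"},
    {"script": "travel", "who": "Fred", "place": "Dallas"},
]

def overlap(cue, story):
    """Count how many role/filler pairs in the cue also appear in the story."""
    return sum(1 for role, filler in cue.items() if story.get(role) == filler)

def recall(cue):
    """Return the stored story that matches the cue on the most fields."""
    return max(MEMORY, key=lambda story: overlap(cue, story))

# "Who ate lobster at Ma Maison?" The food is wrong, but the script and
# the place still match John's story better than any other.
question = {"script": "restaurant", "food": "lobster", "place": "Ma Maison"}
answer = recall(question)["who"]
print(answer)
```

The mismatched detail lowers the score of the right story without pushing any other story ahead of it, so the question is still answered. Two stories that share most of their fields would score nearly the same, which is one way to picture the confusion between similar meals.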

While DISCERN is yet another toy system that only works in a ridiculously specialized domain, it can easily be adapted and trained to any other domain, and expanded to deal with much larger and more general domains. In the areas where it does work, the level of performance it achieves is quite good. Additionally, it exhibits certain behaviors that are almost humanlike, indicating that people would probably feel comfortable interacting with it; we are, after all, used to dealing with each other's foibles, but those of alien machines might throw us off. With some work, its promise of a practical system to understand, remember, and infer information could well be realized.


The study of neural networks is an incredibly diverse field. It covers theoretical research that attempts to recreate the workings of biological systems as well as the corporate-driven solving of purely pragmatic problems, and intertwines through every category of cognition from language to motor control to vision. As more and more artificial intelligence researchers run up against the wall of knowledge acquisition that presents the largest barrier to achieving their goals, many come to view manufacturing an intelligence by hand as a fundamentally intractable approach, and turn to the self-organization and learning properties of neural networks to carry out further research. New advances in medical technology let us take ever better glimpses of the workings of the human mind without disturbing a thing. The insights we gain from such imaging techniques give us more accurate guesses for where to start modeling consciousness, and provide physical evidence that actual minds do or do not work as neural network models predict, allowing for the refinement of the model. The continuing advances in computer technology allow for the simulation of ever more complex networks, which may eventually allow us to exceed even the complexity of the human mind. Even if current lines of development run up against physical limits within the next few decades, information gained from developing neural networks could serve as a springboard for alternative computing technologies like optical processors or biocomputers. The constant shrinking and cheapening of computing devices will allow for their integration into all sorts of devices, and neural networks will be able to provide these gadgets with learning abilities to adapt to an individual user. The practical applications of neural networks will help to ensure that funding for research in the area will not dry up any time soon, allowing it to remain at the cutting edge of developments in cognitive science.


Agogino, Adrian, Kenneth Stanley, and Risto Miikkulainen. "Real-Time Interactive Neuro-Evolution" in Neural Processing Letters. Boston; Dordrecht; London: Kluwer, 1999.

Fausett, Laurene. Fundamentals of Neural Networks: Architectures, Algorithms, and Applications. Upper Saddle River, New Jersey: Prentice Hall, 1994.

Mayberry, Marshall R. and Risto Miikkulainen. "SARDSRN: A Neural Network Shift-Reduce Parser" in Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI-99, Stockholm, Sweden). Denver: Morgan Kaufmann, 1999.

Miikkulainen, Risto. (1997a) "Natural Language Processing with Subsymbolic Neural Networks" in Neural Network Perspectives on Cognition and Adaptive Robotics, 120-139. Bristol, UK; Philadelphia: Institute of Physics Press, 1997.

_____ (1997b), James A. Bednar, Yoonsuck Choe, and Joseph Sirosh. "Self-organization, Plasticity, and Low-level Visual Phenomena in a Laterally Connected Map Model of the Primary Visual Cortex" in R. L. Goldstone, P. G. Schyns and D. L. Medin (editors), Psychology of Learning and Motivation, volume 36: Perceptual Learning, 257-308. San Diego, CA: Academic Press, 1997.

_____ (1998), James A. Bednar, Yoonsuck Choe, and Joseph Sirosh. "A Self-Organizing Neural Network Model of the Visual Cortex" in Proceedings of the Fifth International Conference on Neural Information Processing (ICONIP 98, Kitakyushu, Japan).

_____. (2000) "Text and Discourse Understanding: The DISCERN System" in R. Dale, H. Moisl, and H. Somers (editors), A Handbook of Natural Language Processing Techniques and Applications for the Processing of Languages as Text. New York: Marcel Dekker, 2000.

Moriarty, D. E. and Risto Miikkulainen. "Evolving Obstacle Avoidance Behavior in a Robot Arm" in P. Maes, M. Mataric, J.-A. Meyer, and J. Pollack (editors), From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior (SAB96, Cambridge, MA). Cambridge, MA: MIT Press, 1996.

Nguyen, D. and B. Widrow. "The Truck Backer-Upper: An Example of Self-Learning in Neural Networks" in Proceedings of the International Joint Conference on Neural Networks. Piscataway, NJ: IEEE, 1989.

Wasserman, P. D. "Optical Neural Networks" in Neural Computing: Theory and Practice. New York: Van Nostrand Reinhold, 1989.


© Andrés Santiago Pérez-Bergquist, All rights reserved. The reproduction of this work, by any means electronic, physical, or otherwise, in whole or in part, except for the purposes of review or criticism, without the express written consent of the author, is strictly prohibited. All references to copyrighted and/or trademarked names and ideas held by other individuals and/or corporations should not be considered a challenge to said copyrights and trademarks.
