Neural networks are used widely to study computation in biological nervous systems, as well as to perform real-time control tasks, such as in autonomous robots. In this approach, model parameters, like synaptic weights, are usually optimized numerically, so that the network has the right input-output properties. However, since information processing in neural networks is usually highly distributed, how they function often remains a mystery. Deriving computational rules from optimized neural networks is called "rule extraction." While methods exist for rule extraction from static, feed-forward networks, much less is known about dynamic recurrent networks. That is the subject of this talk.
I'll do this in the context of a particular control problem: chemotaxis in the nematode C. elegans. This is a tiny, soil-dwelling worm, shown on the left. It's interesting for computational neuroscience because it has exactly 302 neurons, and we know the morphology of every neuron and the connectivity of the entire nervous system.
On the right I'm showing its chemotaxis behavior. The ring represents the border of a Petri dish 8.5 cm in diameter, containing a Gaussian-shaped field of chemical attractant, e.g., Cl- with a peak concentration of 1 mM. Four worms were placed in the dish at the asterisks, and allowed to move freely for 5 minutes. After initial transient movements, each worm oriented nearly directly up the gradient, and dwelled at the center.
An essential aspect of this control problem is that the worm senses the chemical concentration at only a single point at the tip of its nose, so it can't get any information about the gradient by sitting still. Instead it must move through the environment and make temporal comparisons. Thus, to perform chemotaxis the nervous system almost certainly computes a temporal derivative.
This behavior is also interesting because we know quite a bit about the neural circuit for chemotaxis. It involves about 40 neurons, including 11 pairs of chemosensory neurons (one pair is shown here), 4 pairs of interneurons, and 5 pairs of motor neurons, which innervate dorsal and ventral neck muscles. (Dorsal and ventral because the worm actually creeps on its side.) This is a recurrent network, because it has both feed-forward and feed-back connections.
Electrophysiological recordings from chemosensory neurons indicate that these neurons are functionally isopotential, and that they do not fire action potentials (Goodman and Lockery, unpublished). Morphological similarities between sensory neurons and other neurons suggest that this is also true for other neurons in C. elegans.
To model a behavior like chemotaxis, we have to model not only the nervous system, but also the body.
On the left is a simplified model of the nematode body, showing the location of the nose (x,y) at which the chemical concentration is sensed, and the velocity vector v, pointed in a direction theta. From behavioral experiments (Pierce and Lockery, unpublished) we know that during chemotaxis nematodes move at nearly constant speed, about 1/5 of a body length per second. Thus the rates of change of the x and y coordinates are given by the first two equations. The rate of change of theta is determined by the degree of bend in the neck, which is determined by the relative tension in the dorsal and ventral neck muscles, which in turn is determined by the relative voltage of the corresponding dorsal and ventral motor neurons. This is shown in the third equation.
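As a rough sketch of the body model just described, the three equations can be stepped forward in time with a simple Euler update. The proportional form of the turning equation, the names `gain` and `step_body`, and the numerical values (a speed of 0.02 cm/s, corresponding to 1/5 of an assumed 1 mm body length per second) are assumptions for illustration, not values taken from the slides.

```python
import math

def step_body(x, y, theta, v_dorsal, v_ventral, speed=0.02, gain=1.0, dt=0.1):
    """Advance the nose position (x, y) and heading theta by one Euler step.

    Assumed form of the body equations:
        dx/dt     = speed * cos(theta)
        dy/dt     = speed * sin(theta)
        dtheta/dt = gain * (v_dorsal - v_ventral)
    `speed` is in cm/s and `gain` is a hypothetical constant that converts
    the motor neuron voltage difference into a rate of turning.
    """
    x_new = x + speed * math.cos(theta) * dt
    y_new = y + speed * math.sin(theta) * dt
    theta_new = theta + gain * (v_dorsal - v_ventral) * dt
    return x_new, y_new, theta_new

# Equal motor neuron voltages give a straight path; unequal voltages turn the worm.
straight = step_body(0.0, 0.0, 0.0, v_dorsal=1.0, v_ventral=1.0)
turning = step_body(0.0, 0.0, 0.0, v_dorsal=2.0, v_ventral=1.0)
```

Because the speed is held constant, the only quantity the network controls in this sketch is the rate of turning.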
On the right is a simplified model of the chemotaxis network. It shares certain generic features with the biological network I just showed you. It has 1 chemosensory neuron, which senses the chemical concentration at the tip of the nose, 3 interneurons, and 2 motor neurons. The dynamics of each neuron is described by the equation on the right, where Vi is the voltage of the ith neuron. In this model voltage (not firing rate) is the neuron state variable, since these cells don't fire action potentials. For the purposes of rule extraction, this model has linear voltage dependence, so the network equations can be solved analytically.
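The neuron equation itself appears only on the slide, so the following is one plausible linear form consistent with the description, not necessarily the equation shown. The time constants \tau_i and the sensory coupling s_i (nonzero only for the chemosensory neuron) are assumed symbols; w_{ij} and \bar{V}_j are the synaptic weights and the presynaptic voltages at which there is no synaptic transmission.

```latex
\tau_i \frac{dV_i}{dt} = -V_i + \sum_{j} w_{ij}\,\bigl(V_j - \bar{V}_j\bigr) + s_i\, C(t)
```

Every term on the right-hand side is linear in the voltages, which is what would allow the network equations to be solved analytically.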
To summarize the worm model, the job of the network is to control the rate of turning as a function of chemical concentration. For this to occur, we must optimize the parameters wij (synaptic weights) and Vbarj (the presynaptic voltage at which there is no synaptic transmission). We use a simple training algorithm called simulated annealing.
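A generic simulated annealing loop might look like the sketch below. The cooling schedule, the Gaussian proposal step, and all parameter names are assumptions for illustration; the talk does not specify which variant was used. The function maximizes a user-supplied `fitness` over a flat list of parameters (for the worm, these would be the wij and Vbarj).

```python
import math
import random

def simulated_annealing(fitness, params, n_steps=2000, t_start=1.0, t_end=1e-3,
                        step_size=0.1, seed=0):
    """Maximize fitness(params) by simulated annealing (illustrative sketch)."""
    rng = random.Random(seed)
    current = list(params)
    f_current = fitness(current)
    best, f_best = list(current), f_current
    for k in range(n_steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        temp = t_start * (t_end / t_start) ** (k / max(n_steps - 1, 1))
        # Propose a random perturbation of every parameter.
        candidate = [p + rng.gauss(0.0, step_size) for p in current]
        f_candidate = fitness(candidate)
        delta = f_candidate - f_current
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, f_current = candidate, f_candidate
            if f_current > f_best:
                best, f_best = list(current), f_current
    return best, f_best

# Toy usage: the fitness peaks at params = [1, 1].
toy_fitness = lambda p: -sum((x - 1.0) ** 2 for x in p)
best_params, best_fitness = simulated_annealing(toy_fitness, [0.0, 0.0])
```

Accepting occasional downhill moves early on, while the temperature is high, is what lets the search escape poor local optima before it settles.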
The optimization is based on a particular fitness function, the time average of the chemical concentration at the worm's location. This way, a worm that goes directly to the center of the dish and remains there will have a high fitness. The fitness of a particular model worm is averaged over several initial conditions to obtain an overall fitness.
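The fitness described above can be sketched as follows, assuming each simulated track is a list of (x, y) nose positions sampled at uniform time steps. The Gaussian field parameters (`peak`, `sigma`) and the function names are illustrative assumptions.

```python
import math

def concentration(x, y, peak=1.0, sigma=2.0):
    """Gaussian attractant field centered on the dish (assumed peak in mM, sigma in cm)."""
    return peak * math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))

def track_fitness(track):
    """Time average of the concentration along one track of uniformly sampled positions."""
    return sum(concentration(x, y) for x, y in track) / len(track)

def overall_fitness(tracks):
    """Average the single-track fitness over several initial conditions."""
    return sum(track_fitness(track) for track in tracks) / len(tracks)

# A worm that sits at the peak scores higher than one that stays near the edge.
at_peak = [(0.0, 0.0)] * 10
near_edge = [(4.0, 0.0)] * 10
```

Averaging over several starting positions and headings discourages solutions that only work from one lucky initial condition.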
On the left are the real worm tracks you already saw, with the green stuff representing the Gaussian-shaped field of attractant. On the right is the result of one optimization. After initial transient movements, each initial condition led to oriented movement up the gradient, and persistent dwelling at the peak. I'd like to ask you to remember the basic shape of these tracks, because you're going to see them again: after the initial transient movements, there's a leftward arch, and then a clover-like pattern in the center when you look at all four tracks.
These are the equations you already saw. Since the network equations are linear in voltage, they can be solved analytically using standard methods. Taking the difference of dorsal (D) and ventral (V) motor neuron voltages, the rate of turning is equal to a constant bias, plus a convolution of the input C(t') with a kernel k(t-t'). This kernel is also the impulse response of the linear network, and can be computed analytically from the network parameters.
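Written out, the analytic solution described above has the following form. The symbol \Omega for the constant bias is an assumed name; k and C are the kernel and the input as defined in the text.

```latex
\frac{d\theta}{dt} = \Omega + \int_{t_0}^{t} k(t - t')\, C(t')\, dt'
```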
Now here is the most important step. According to the convolution, the rate of turning depends on the entire history of the input; see how the integral is over t', which runs from the initial time t0, to the time t at which we compute the rate of turning. But notice from the figure that the kernel decays to zero in a few seconds, which is very short compared to the several minutes over which chemotaxis behavior occurs. Thus in practice, the rate of turning only depends on the recent history of the input. This feature of finite memory is a generic property of all controllable linear systems.
The relatively short history dependence of the integrand k(t-t')C(t') suggests a Taylor series expansion of the input C(t') about the time t. A major simplification arises from the fact that, in a Taylor series like this, C and its derivatives are evaluated at t, not t'. So when you make the substitution, C and its derivatives come out of the integral, and you end up with a new expression, in which the rate of turning is equal to a constant bias, plus a linear sum of C and its time derivatives. The expansion coefficients zn are constants, and can be computed analytically. We call the boxed result a derivative expansion of the analytic solution for the rate of turning. This is the extracted rule. In general, an expansion like this is most useful if one can get away with keeping only the first few terms. This is the case for this network, as is shown on the next slide.
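The substitution can be spelled out as below. This is a sketch of the standard calculation, with \Omega as an assumed symbol for the bias; the sign convention for the coefficients z_n on the slide may differ. The last step uses the fact that the kernel decays within a few seconds, so the upper limit t - t_0 can be replaced by infinity.

```latex
C(t') = \sum_{n=0}^{\infty} \frac{1}{n!}\, C^{(n)}(t)\, (t' - t)^{n}

\int_{t_0}^{t} k(t - t')\, C(t')\, dt'
  = \sum_{n=0}^{\infty} C^{(n)}(t)\, \frac{(-1)^{n}}{n!} \int_{0}^{t - t_0} k(s)\, s^{n}\, ds
  \qquad (s = t - t')

\frac{d\theta}{dt} \approx \Omega + \sum_{n=0}^{\infty} z_n\, C^{(n)}(t),
\qquad
z_n = \frac{(-1)^{n}}{n!} \int_{0}^{\infty} k(s)\, s^{n}\, ds
```

Each z_n is a moment of the kernel, which is why the coefficients are constants that follow directly from the network parameters.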
At the top is the extracted rule for this network, where I've computed the bias and the z coefficients numerically. If we keep terms only through zeroth order, i.e., the constant bias and the C term, then the model worm does not perform chemotaxis. But as soon as we include the first order, or dC/dt term, we get chemotaxis behavior very much like that of the neural network. If we include the second order term we get nearly the same result, so it appears that our derivative expansion converges by first order, at least for this network. The dominance of the first order term makes perfect sense, since for a point sensor moving at constant speed in two dimensions, there is a very simple relation between the first derivative dC/dt and the spatial gradient, and chemotaxis is defined as oriented movement in a gradient. The remarkable fact is that this network evolved to compute dC/dt implicitly, by virtue of optimizing with the fitness function shown earlier.
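Convergence by first order can be checked numerically on a toy example. The sketch below assumes a single-exponential kernel k(s) = amplitude * exp(-s / tau) and a slowly varying sinusoidal input, both chosen for illustration rather than taken from the optimized network, and omits the constant bias. It compares the full convolution with the expansion truncated at zeroth and at first order.

```python
import math

def expansion_coefficient(n, amplitude, tau):
    """z_n = (-1)^n / n! * integral_0^inf k(s) s^n ds for k(s) = amplitude * exp(-s / tau).

    Since integral_0^inf s^n exp(-s / tau) ds = n! * tau^(n + 1), the factorials cancel.
    """
    return (-1) ** n * amplitude * tau ** (n + 1)

def convolution(t, signal, amplitude, tau, memory=20.0, ds=1e-3):
    """Midpoint-rule estimate of integral_0^memory k(s) * signal(t - s) ds."""
    n_steps = int(memory / ds)
    total = 0.0
    for i in range(n_steps):
        s = (i + 0.5) * ds
        total += amplitude * math.exp(-s / tau) * signal(t - s)
    return total * ds

def compare_orders(t, amplitude=1.0, tau=1.0, omega=0.1):
    """Return (full convolution, zeroth-order rule, first-order rule) at time t."""
    signal = lambda u: math.sin(omega * u)
    d_signal = lambda u: omega * math.cos(omega * u)
    exact = convolution(t, signal, amplitude, tau)
    zeroth = expansion_coefficient(0, amplitude, tau) * signal(t)
    first = zeroth + expansion_coefficient(1, amplitude, tau) * d_signal(t)
    return exact, zeroth, first

exact, zeroth, first = compare_orders(30.0)
```

In this toy case the input changes slowly compared with the kernel's decay time (omega * tau = 0.1), so the first-order rule tracks the full convolution much more closely than the zeroth-order rule does.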
I can even tell you how the extracted rule is working. The C term is small, especially at the start. The behavior arises primarily from a competition between the first derivative and the constant bias. Remember, to get oriented movement up the gradient, we need the rate of turning to be near zero when the concentration is increasing. This occurs here because the first derivative and bias terms have opposite signs. In fact, if you put in numbers for the speed of the worm and the steepness of the gradient, you will find that this cancellation is nearly exact. But I don't want to tell you that the C term does nothing. On the right is the result of omitting the C term, and you can see a slight effect near the center of the dish. But the effect is subtle, and certainly not required.
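The cancellation described above can be made explicit. For a point sensor moving at constant speed v, the chain rule relates dC/dt to the spatial gradient. Here \phi, an assumed symbol, is the angle between the heading and the gradient direction, and \Omega is an assumed name for the bias. Neglecting the small C term:

```latex
\frac{dC}{dt} = \mathbf{v} \cdot \nabla C = v\, |\nabla C| \cos\phi

\frac{d\theta}{dt} \approx \Omega + z_1\, v\, |\nabla C| \cos\phi

\frac{d\theta}{dt} \approx 0 \ \text{at}\ \phi = 0
\quad \Longleftrightarrow \quad
\Omega \approx -\, z_1\, v\, |\nabla C|
```

Since the steepness of a Gaussian field varies with position, a condition like this can only hold approximately, for a representative gradient steepness. When the worm heads away from the gradient direction, the two terms no longer cancel and the worm turns.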
This competition between the first derivative dC/dt and the constant bias, described above, defines one strategy for chemotaxis. We have already explained why the first derivative is important, but what about the bias? Further optimizations and analysis show that there exists another strategy for which the bias is zero, and the competition occurs instead between the first derivative dC/dt and the C term. If higher order terms are unimportant, then these two classes of strategies exhaust the chemotaxis strategies available to a linear, deterministic network.
In conclusion, we've derived a method for extracting computational rules from continuous-time, linear recurrent networks. We did this in the context of a particular control problem, nematode chemotaxis, and found two distinct strategies. For all networks studied, we found that the derivative expansion converged by first order, and we explained that this makes sense for chemotaxis since it is defined as oriented movement in a gradient. We suspect, therefore, that a similar approach will work equally well for the broad class of control problems involving orientation in a spatial gradient, as well as other control problems for which the relevant information is contained in the first few time derivatives of the input.
Of course, we want to be able to extract computational rules from nonlinear networks, especially since nonlinearities abound in the real nematode nervous system. Work is now underway in this direction.
The authors thank J. Pierce and T. Morse for helpful discussions. This work is supported by NIMH MH11373, NIMH MH51383, NSF IBN 9458102, ONR N00014-94-1-0642, the Sloan Foundation, and the Searle Scholars Program.