US20150302296A1 - Plastic action-selection networks for neuromorphic hardware - Google Patents

Plastic action-selection networks for neuromorphic hardware

Info

Publication number
US20150302296A1
US20150302296A1 (application US13/896,110)
Authority
US
United States
Prior art date
Legal status
Abandoned
Application number
US13/896,110
Inventor
Corey M. THIBEAULT
Narayan Srinivasa
Current Assignee
HRL Laboratories LLC
Original Assignee
HRL Laboratories LLC
Priority date
Filing date
Publication date
Application filed by HRL Laboratories LLC
Priority to US13/896,110
Priority to PCT/US2013/041451 (published as WO2014088634A1)
Assigned to HRL LABORATORIES, LLC. Assignors: SRINIVASA, NARAYAN; THIBEAULT, COREY M.
Priority to US14/293,928 (published as US9349092B2)
Publication of US20150302296A1
Status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08 Learning methods
    • G06N 20/00 Machine learning


Abstract

A neural model for reinforcement-learning and for action-selection includes a plurality of channels, a population of input neurons in each of the channels, a population of output neurons in each of the channels, each population of input neurons in each of the channels coupled to each population of output neurons in each of the channels, and a population of reward neurons in each of the channels. Each channel of a population of reward neurons receives input from an environmental input, and is coupled only to output neurons in a channel that the reward neuron is part of. If the environmental input for a channel is positive, the corresponding channel of a population of output neurons are rewarded and have their responses reinforced, otherwise the corresponding channel of a population of output neurons are punished and have their responses attenuated.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • The present application is related to and claims priority to U.S. Provisional Patent Application Ser. No. 61/732,590 filed on Dec. 3, 2012, which is hereby incorporated by reference in its entirety.
  • STATEMENT REGARDING FEDERAL FUNDING
  • This invention was made under U.S. Government contract DARPA SyNAPSE HR0011-09-C-0001. The U.S. Government has certain rights in this invention.
  • TECHNICAL FIELD
  • This disclosure relates to neural networks, and in particular to neural networks capable of action-selection and reinforcement-learning.
  • BACKGROUND
  • In the prior art, neural networks capable of action-selection have been well characterized, as have those that demonstrate reinforcement-learning. However, in the prior art, action-selection and reinforcement-learning algorithms present complex solutions to the distal reward problem, which are not easily amenable to hardware implementations.
  • Barr, D., P. Dudek, J. Chambers, and K. Gurney describe in “Implementation of multi-layer leaky integrator networks on a cellular processor array,” International Joint Conference on Neural Networks (IJCNN), August 2007, pp. 1560-1565, a model of the basal ganglia on a neural processor array. The software neural model was capable of performing action selection. However, Barr et al. did not describe any inherent mechanisms for reinforcement-learning, and the micro-channels of the basal ganglia were predefined.
  • Merolla, P., J. Arthur, F. Akopyan, N. Imam, R. Manohar, and D. Modha describe in “A digital neurosynaptic core using embedded crossbar memory with 45 pJ per spike in 45 nm,” IEEE Custom Integrated Circuits Conference (CICC), September 2011, pp. 1-4, a neuromorphic processor capable of playing a game of pong against a human opponent. However, the network was constructed off-line and, once programmed on the hardware, remained static.
  • What is needed is a neural network that implements action-selection and reinforcement-learning and that can be more readily implemented with hardware. The embodiments of the present disclosure answer these and other needs.
  • SUMMARY
  • In a first embodiment disclosed herein, a neural model for reinforcement-learning and for action-selection comprises a plurality of channels, a population of input neurons in each of the channels, a population of output neurons in each of the channels, each population of input neurons in each of the channels coupled to each population of output neurons in each of the channels and a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein each channel of reward neurons is coupled only to output neurons in a channel that the reward neuron is part of, wherein if the environmental input for a channel is positive, the corresponding channel of a population of output neurons are rewarded and have their responses reinforced, and wherein if the environmental input for a channel is negative, the corresponding channel of a population of output neurons are punished and have their responses attenuated.
  • In another embodiment disclosed herein, a neural model for reinforcement-learning and for action-selection comprises a plurality of channels, a population of input neurons in each of the channels, a population of output neurons in each of the channels, each population of input neurons in each of the channels coupled to each population of output neurons in each of the channels, a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein each channel of reward neurons is coupled only to output neurons in a channel that the reward neuron is part of, and a population of inhibition neurons in each of the channels, wherein each population of inhibition neurons receive an input from a population of output neurons in a same channel that the population of inhibition neurons is part of, and wherein a population of inhibition neurons in a channel has an output to output neurons in every other channel except the channel of which the inhibition neurons are part of, wherein if the environmental input to a population of reward neurons for a channel is positive, the corresponding channel of a population of output neurons are rewarded and have their responses reinforced, and wherein if the environmental input to a population of reward neurons for a channel is negative, the corresponding channel of a population of output neurons are punished and have their responses attenuated.
  • In yet another embodiment disclosed herein, a basal ganglia neural network model comprises a plurality of channels, a population of cortex neurons in each of the channels, a population of striatum neurons in each of the channels, each population of striatum neurons in each of the channels coupled to each population of cortex neurons in each of the channels, a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein each channel of reward neurons is coupled only to striatum neurons in a channel that the reward neuron is part of, and a population of Substantia Nigra pars reticulata (SNr) neurons in each of the channels, wherein each population of SNr neurons is coupled only to a population of striatum neurons in a channel that the SNr neurons are part of, wherein if the environmental input to a population of reward neurons for a channel is positive, the corresponding channel of a population of striatum neurons are rewarded and have their responses reinforced, wherein if the environmental input to a population of reward neurons for a channel is negative, the corresponding channel of a population of striatum neurons are punished and have their responses attenuated, and wherein each population of SNr neurons is tonically active and is suppressed by inhibitory afferents of striatum neurons in a channel that the SNr neurons are part of.
  • These and other features and advantages will become further apparent from the detailed description and accompanying figures that follow. In the figures and description, numerals indicate the various features, like numerals referring to like features throughout both the drawings and the description.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 shows a neural network in accordance with the present disclosure;
  • FIG. 2 shows another neural network with lateral inhibition in accordance with the present disclosure;
  • FIG. 3 shows a basal ganglia neural network in accordance with the present disclosure;
  • FIGS. 4A to 4C show an example of a reward-learning scenario in accordance with the present disclosure;
  • FIGS. 5A to 5F show an example of synaptic weights for a neural network in accordance with the present disclosure;
  • FIG. 6 is a diagram showing a pong style virtual environment in accordance with the present disclosure;
  • FIGS. 7A to 7C, 8A to 8C, and 9A to 9C illustrate results for the pong style virtual environment of FIG. 6 for different spatial widths and time spans in accordance with the present disclosure; and
  • FIG. 10 illustrates the overall accuracy for the model with a spatial width of 0.025 in accordance with the present disclosure.
  • DETAILED DESCRIPTION
  • In the following description, numerous specific details are set forth to clearly describe various specific embodiments disclosed herein. One skilled in the art, however, will understand that the presently claimed invention may be practiced without all of the specific details discussed below. In other instances, well known features have not been described so as not to obscure the invention.
  • The combination of action-selection and reinforcement-learning in biological entities is essential for successfully adapting and thriving in any environment. This is also true for the successful operation of intelligent agents. Presented here are the design and implementation of biologically inspired action-selection/reinforcement-learning networks for the control of an agent by a neuromorphic processor.
  • Embodied modeling can be described as the coupling of computational biology and engineering. Historically, strategies for embedding artificial intelligence have failed to result in agents with truly emergent properties. Because of this, it is still unreasonable to deploy a robotic entity and expect it to learn from its environment the way biological entities can. Similarly, neural models require complex and varied input signals in order to accurately replicate the activity observed in vivo. One method for creating these complex stimuli is to immerse a model in a real or virtual environment capable of providing feedback.
  • Conceptually, action selection is the arbitration of competing signals. In the mammalian nervous system, the complex circuitry of the basal ganglia (BG) is active in gating the information flow in the frontal cortex by appropriately selecting between input signals. This selection mechanism can affect everything from simple actions all the way up to complex behaviors and cognitive processing. Although overly simplified, it can be helpful to relate the BG to a circuit multiplexer that actively connects inputs to outputs based on the current system state.
  • Reinforcement or reward learning (RL) is the reinforcement of actions or decisions that maximize the positive outcome of those choices. This is similar to instrumental conditioning, where stimulus-response trials result in reinforcement of responses that are rewarded and attenuation of those that are not. Reinforcement-learning in a neural network is an ideal alternative to supervised learning algorithms. Where supervised learning requires an intelligent teaching signal that must have a detailed understanding of the task, reinforcement learning can develop independent of the task without any prior knowledge. Only the quality of the output signal in response to the input signal and the current contextual state of the network is needed.
  • In an embodiment according to the present disclosure, neurons within a neural network may be modeled by a Leaky-Integrate and Fire (LIF) model. The LIF model is defined by equation 1.
  • C_m dV/dt = −g_leak (V − E_rest) + I    (1)
  • where
  • C_m is the membrane capacitance,
  • I is the sum of external and synaptic currents,
  • g_leak is the conductance of the leak channels, and
  • E_rest is the resting membrane potential.
  • As the current input into the model neuron is increased, the membrane voltage will proportionally increase until a threshold voltage is reached. At this point an action potential is fired and the membrane voltage is reset to the resting value. The neuron model is then placed in a refractory period of 2 milliseconds during which no changes in membrane voltage are allowed. If the current is removed before the threshold is reached, the voltage decays back to E_rest. The LIF model is one of the least computationally intensive neural models but is still capable of replicating many aspects of neural activity.
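  • As an illustration only, the following is a minimal sketch of Euler-integrating equation 1 in Python, using the 1 ms time step and 2 ms refractory period described in this disclosure and the C_m and E_rest values of Table 1; the threshold voltage and leak conductance are not listed in the tables, so the values shown here are assumptions.

```python
def simulate_lif(currents, dt=1.0, C_m=1.0, g_leak=0.1, E_rest=0.0,
                 V_thresh=20.0, t_refrac=2.0):
    """Euler integration of the LIF neuron of equation 1.

    currents : input current (external + synaptic) at each time step
    dt       : time step in ms (the disclosure integrates with a 1 ms step)
    C_m      : membrane capacitance in pF (1.0 per Table 1)
    g_leak   : leak conductance (assumed value; not listed in the tables)
    E_rest   : resting potential in mV (0.0 per Table 1)
    V_thresh : firing threshold in mV (assumed value; not listed in the tables)
    t_refrac : refractory period in ms (2 ms per the description)
    """
    V = E_rest
    refractory = 0.0
    spike_times = []
    for step, I in enumerate(currents):
        if refractory > 0.0:
            refractory -= dt            # membrane is held during the refractory period
            continue
        V += dt * (-g_leak * (V - E_rest) + I) / C_m   # equation 1
        if V >= V_thresh:
            spike_times.append(step * dt)              # action potential fired
            V = E_rest                                 # reset to the resting value
            refractory = t_refrac
    return spike_times

# Example: constant drive of 2.5 for 100 ms
# print(simulate_lif([2.5] * 100))
```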
  • The connections between neurons, or synapses, are modeled as conductance-based synapses. The general form of this influence is defined as equation 2.

  • g_syn = g_max · g_eff · (V − E_syn)    (2)
  • where
  • g_max is the maximum conductance for that particular class of synapse,
  • g_eff is the current synaptic efficacy, bounded in [0, g_effmax], and
  • E_syn is the reversal potential for that particular class of synapse.
  • To simulate the buffering and re-uptake of neurotransmitters, the influence that a presynaptic action potential has on a neuron can be decayed based on a specified time constant. This process is abstracted using equation 3.
  • τ_syn dg_i^syn/dt = −g_i^syn + W_ji δ(t − t_j)    (3)
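  • A minimal sketch of one Euler step of equation 3 is shown below; taking the contribution of the delta-function term over a step as W_ji/τ_syn is a literal reading of the equation and is an assumption about the intended convention, as is the helper itself.

```python
def step_synaptic_activation(g_i, spiked_presyn, W, tau_syn, dt=1.0):
    """One Euler step of equation 3 for the synaptic activation onto neuron i.

    g_i           : current synaptic activation g_i^syn
    spiked_presyn : indices j of presynaptic neurons that fired in this step
    W             : mapping of presynaptic index j -> weight W_ji
    tau_syn       : decay time constant in ms (Table 1 lists 5 ms excitatory,
                    100 ms inhibitory)
    """
    g_i += dt * (-g_i) / tau_syn           # exponential decay between spikes
    for j in spiked_presyn:
        g_i += W[j] / tau_syn              # impulse from the delta-function term
    return g_i
```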
  • Learning at the synaptic level is achieved through the spike-timing dependent plasticity rules described in Song, S., K. D. Miller, and L. F. Abbott, “Competitive Hebbian Learning through Spike-timing Dependent Synaptic Plasticity,” Nature Neuroscience, vol. 3, no. 9, 2000, pp. 919-926, as shown in equation 4.

  • g_eff → g_eff + g_effmax F(Δt)    (4)
  • where
  • Δt = t_pre − t_post
    F(Δt) = A_+ e^(Δt/τ_+) if Δt < 0
    F(Δt) = −A_− e^(−Δt/τ_−) if Δt ≥ 0
    if (g_eff < 0) then g_eff → 0; if (g_eff > g_effmax) then g_eff → g_effmax
  • The global parameter values that may be used in one embodiment are presented in Table 1; a worked example of the plasticity update of equation 4 using these values follows the table. The governing equations are numerically integrated using Euler integration with a 1 millisecond (ms) time step.
  • TABLE 1
    Global model parameters.

    Parameter   Value
    C_m         1.0 (pF)
    τ_ge        5.0 (ms)
    τ_gi        100.0 (ms)
    E_exc       0.0 (mV)
    E_inh       −80.0 (mV)
    V_rest      0.0 (mV)
    A_+         0.025
    A_−         0.026
    τ_+         20.0 (ms)
    τ_−         20.0 (ms)
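  • The following is a minimal sketch of the weight update of equation 4 using the A_+, A_−, τ_+, and τ_− values of Table 1. The exponential window is taken from the Song, Miller and Abbott rule cited above; if the disclosure intends a different F(Δt), only that function changes.

```python
import math

# STDP parameters from Table 1
A_PLUS, A_MINUS = 0.025, 0.026
TAU_PLUS, TAU_MINUS = 20.0, 20.0   # ms

def stdp_update(g_eff, t_pre, t_post, g_effmax=1.0):
    """Apply equation 4 to a single synapse for one pre/post spike pairing."""
    dt = t_pre - t_post
    if dt < 0:                                  # pre before post: potentiation
        F = A_PLUS * math.exp(dt / TAU_PLUS)
    else:                                       # post before (or with) pre: depression
        F = -A_MINUS * math.exp(-dt / TAU_MINUS)
    g_eff = g_eff + g_effmax * F
    return min(max(g_eff, 0.0), g_effmax)       # clip to [0, g_effmax]

# Example: a pre spike at 95 ms followed by a post spike at 100 ms potentiates
# print(stdp_update(0.25, t_pre=95.0, t_post=100.0))
```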
  • FIGS. 1 to 3 show three different neural network embodiments. Initially, each of these networks has no knowledge or inherent understanding of their environment. The behavior is learned through feedback from the environment in the form of reward and punishment signals encoded as either random or structured spike events. These signals strengthen or weaken the synaptic connections between neurons; reinforcing the appropriate action.
  • The first model, shown in FIG. 1, is a simple feed-forward network that consists entirely of excitatory neurons arranged into N channels. Each of the N channels has a population of input neurons 12, a population of output neurons 14, and a population of reward neurons 16.
  • In one embodiment the populations of input neurons 12 are connected with equal probability and equal conductance to all of the populations of output neurons 14, ensuring that there is no inherent bias to a particular input-output pair. In another embodiment, the populations of input neurons 12 are connected randomly to the populations of output neurons 14. This embodiment is particularly important to large-scale implementations of these networks as well as afferent limitations imposed by a neuromorphic architecture.
  • Each channel of a population of input neurons 12 is connected to each channel of a population of output neurons 14 by synapses 18. One set of parameters that may be used with the model of FIG. 1 is presented in Table 2; a connectivity sketch using these parameters follows the table. The synapse connections 18 between input neurons 12 and output neurons 14 are randomly created from the entire input neuron 12 population to ensure that there is no bias between input and output channels.
  • Reward neurons 16 receive input from environmental inputs 20, which may be sensed from the environment. Each channel of reward neurons is coupled to only one corresponding channel of output neurons 14 via synapses 22. If the environmental inputs for a channel are positive, the corresponding channel of output neurons 14 are rewarded and have their responses reinforced. If the environmental inputs for a channel are negative, the corresponding channel of output neurons 14 are punished and have their responses attenuated.
  • The input neurons 12, the output neurons 14 and the reward neurons 16 may be modeled by the Leaky-Integrate and Fire (LIF) model defined by equation 1. The synapses 18 and 22 may be modeled by the spike-timing dependent plasticity (STDP) of equation 4.
  • TABLE 2
    Parameters for the excitatory only network.

    A. Neuron parameters
    Neural Region   Neurons Per Channel
    Input           3
    Output          3
    Reward          1

    B. Connections
    Source → Destination   Synaptic Conductance (g_max)·(g_eff)   Number of Incoming Connections (total)
    Input → Output         (10.0)·(0.25)                          15
    Reward → Input         (10.0)·(1.0)                           1
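  • The sketch below wires the N-channel feed-forward network of FIG. 1 as a plain connection list, using the Table 2 population sizes and conductances. The choice of N, the list-of-tuples representation, and reading the 15 incoming connections as being per output neuron are assumptions; the table also lists a Reward → Input row, while the sketch follows the prose and couples each reward population to the output population of its own channel.

```python
import itertools
import random

INPUT_PER_CH, OUTPUT_PER_CH, REWARD_PER_CH = 3, 3, 1   # Table 2A

def build_feedforward_network(n_channels=5, incoming=15, seed=0):
    """Connectivity of the FIG. 1 network as (pre, post, g_max, g_eff) tuples."""
    rng = random.Random(seed)
    inputs  = {c: [f"in{c}_{i}"  for i in range(INPUT_PER_CH)]  for c in range(n_channels)}
    outputs = {c: [f"out{c}_{i}" for i in range(OUTPUT_PER_CH)] for c in range(n_channels)}
    rewards = {c: [f"rew{c}_{i}" for i in range(REWARD_PER_CH)] for c in range(n_channels)}
    all_inputs = list(itertools.chain.from_iterable(inputs.values()))
    synapses = []
    for c in range(n_channels):
        for post in outputs[c]:
            # Plastic input -> output synapses drawn from the entire input
            # population, so no input/output channel pair is favored (Table 2B).
            for pre in rng.sample(all_inputs, k=min(incoming, len(all_inputs))):
                synapses.append((pre, post, 10.0, 0.25))
            # Reward -> output of the same channel only (per the prose).
            for pre in rewards[c]:
                synapses.append((pre, post, 10.0, 1.0))
    return synapses, inputs, outputs, rewards
```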
  • FIG. 2 shows another neural network, with lateral inhibition between the output populations, in accordance with the present disclosure. The neural network of FIG. 2 creates an on-center off-surround network where the most active population suppresses the other output populations. Not only is this a more biologically realistic network, but it also offers more control in the selection process. One set of parameters for this model may be the parameters shown in Table 3; a sketch of the inhibitory wiring follows the table. A key aspect of this neural network is the diffuse connections of the inhibition neurons 36. Each channel of a population of inhibition neurons 36 projects to every other channel of output neurons 32, excluding the channel of which the population of inhibition neurons 36 is a part.
  • The neural network of FIG. 2 has N channels. Each of the N channels has a population of input neurons 30, a population of output neurons 32, a population of reward neurons 34, and a population of inhibition neurons 36. Each channel of a population of input neurons 30 is connected to each channel of a population of output neurons 32 by synapses 38.
  • In one embodiment the populations of input neurons 30 are connected with equal probability and equal conductance to all of the populations of output neurons 32, ensuring that there is no inherent bias to a particular input-output pair. In another embodiment, the synapse connections 38 between the populations of input neurons 30 and the populations of output neurons 32 are connected randomly from the entire population of input neurons 30.
  • Each channel of a population of reward neurons 34 receives inputs from environmental inputs 40, which may be sensed from the environment. Each channel of a population of reward neurons 34 is coupled to only one corresponding channel of a population of output neurons 32 via synapses 42. If the environmental inputs for a channel are positive, the corresponding channel of output neurons 32 are rewarded and have their responses reinforced. If the environmental inputs for a channel are negative, the corresponding channel output neurons 32 are punished and have their responses attenuated.
  • Each channel of a population of output neurons 32 are connected by synapses 46 to a corresponding channel of a population of inhibition neurons 36. The inhibition neurons 36 in a channel are coupled via synapses 44 to output neurons 32 in every other channel; however the inhibition neurons 36 in a channel are not coupled to output neurons 32 of the channel of which the inhibition neurons 36 are part of.
  • As the responses from the output neurons 32 of a channel of which the inhibition neurons 36 are part of increase, the inhibition neurons 36 may via the synapses 44 inhibit the responses from output neurons 32 in every other channel.
  • The input neurons 30, the output neurons 32, the reward neurons 34, and the inhibition neurons 36 may be modeled by the Leaky-Integrate and Fire (LIF) model defined by equation 1. The synapses 38, 42, 44 and 46 may be modeled by the spike-timing dependent plasticity (STDP) of equation 4.
  • TABLE 3
    Parameters for the lateral-inhibition network.

    A. Neuron parameters
    Neural Region   Neurons Per Channel
    Input           3
    Output          3
    Inhibition      3
    Reward          1

    B. Connections
    Source → Destination   Synaptic Conductance (g_max)·(g_eff)   Number of Incoming Connections (total)
    Input → Output         (10.0)·(0.25)                          15
    Output → Inhibition    (10.0)·(1.0)                           15
    Inhibition → Output    (10.0)·(1.0)                           15
    Reward → Input         (10.0)·(1.0)                           1
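  • The diffuse inhibitory wiring of FIG. 2 can be layered on top of the feed-forward sketch above; the conductances are taken from Table 3, and the helper and its dict inputs are again only an illustration, not the disclosed implementation.

```python
def lateral_inhibition_synapses(outputs, inhibitions, g_max=10.0, g_eff=1.0):
    """Diffuse inhibition of FIG. 2.

    outputs, inhibitions : dicts mapping channel index -> list of neuron ids.
    Each output population drives the inhibition population of its own channel
    (synapses 46); each inhibition population projects to the output neurons of
    every other channel (synapses 44) but never back onto its own channel.
    """
    synapses = []
    for c, inh_pop in inhibitions.items():
        for pre in outputs[c]:                        # Output -> Inhibition, same channel
            for post in inh_pop:
                synapses.append((pre, post, g_max, g_eff))
        for other, out_pop in outputs.items():        # Inhibition -> Output, other channels only
            if other == c:
                continue
            for pre in inh_pop:
                for post in out_pop:
                    synapses.append((pre, post, g_max, g_eff))
    return synapses
```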
  • FIG. 3 shows a basal ganglia (BG) neural network in accordance with the present disclosure. The neural network of FIG. 3 emulates the physiological activity of the BG direct pathway, where the Substantia Nigra pars reticulata (SNr) neurons 56 are tonically active, firing around 30 Hz. The substantia nigra is part of the basal ganglia, and the pars reticulata is part of the substantia nigra. The basal activity of the SNr neurons 56 is suppressed by the inhibitory afferents of the striatum neurons 52, resulting in a disinhibitory mechanism of action. Learning occurs between the cortex neurons 50 and the striatum neurons 52 to develop the appropriate input-output channel combinations. One set of parameters that may be used with this model is shown in Table 4.
  • TABLE 4
    Parameters for the basal ganglia direct pathway.

    A. Neuron parameters
    Neural Region                            Neurons Per Channel
    Cortex (Ctx)                             4
    Striatum (Str)                           3
    Substantia Nigra pars reticulata (SNr)   3
    Excitatory                               9
    Reward                                   6

    B. Connections
    Source → Destination   Synaptic Conductance   Number of Incoming Connections (per channel)
    Ctx → Str              0.1                    4
    Str → Str (diffuse)    10.0                   3
    Excitatory → SNr       0.08                   3
    Str → SNr              10.0                   3
    Reward → Str           10.0                   6
  • Physiologically, the SNr neurons 56 are tonically active. However, the LIF neuron of equation 1 is not capable of replicating that spontaneous activity. To compensate, a Poisson random excitatory input 68 is injected into the SNr neuron populations 56. In addition, low-level uniform random noise may be injected into the network.
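  • A minimal sketch of the Poisson random excitatory drive 68 is given below; the rate needed to produce the roughly 30 Hz tonic SNr activity is not stated in the disclosure, so the rate here is a free parameter.

```python
import random

def poisson_spike_train(rate_hz, duration_ms, dt=1.0, seed=None):
    """Poisson spike times (ms), one Bernoulli draw per time bin."""
    rng = random.Random(seed)
    p_spike = rate_hz * dt / 1000.0       # spike probability per dt-millisecond bin
    return [step * dt for step in range(int(duration_ms / dt)) if rng.random() < p_spike]

# Example: one second of excitatory drive for one SNr population
# drive = poisson_spike_train(rate_hz=100.0, duration_ms=1000.0, seed=1)
```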
  • The neural network of FIG. 3 has N channels. Each of the N channels has a population of cortex neurons 50, a population of striatum neurons 52, a population of reward neurons 54, and a population of SNr neurons 56. Each channel of cortex neurons 50 is connected to each striatum neuron channel by synapses 58.
  • In one embodiment the populations of cortex neurons 50 are connected with equal probability and equal conductance to all of the populations of striatum neurons 52, ensuring that there is no inherent bias to a particular cortex-striatum pair. In another embodiment, the populations of cortex neurons 50 are connected randomly to the populations of striatum neurons 52.
  • The population of striatum neurons 52 in a channel is connected to the population of striatum neurons 52 in every other channel by synapses 60.
  • Reward neurons 54 receive input from environmental inputs 62, which may be sensed from the environment. Each channel of reward neurons 54 is coupled, via synapses 64, only to the corresponding channel of striatum neurons 52 of which the reward neurons 54 are part. If the environmental inputs for a channel are positive, the corresponding channel of striatum neurons 52 are rewarded and have their responses reinforced. If the environmental inputs for a channel are negative, the corresponding channel of striatum neurons 52 are punished and have their responses attenuated.
  • Each channel of striatum neurons 52 is connected by synapses 66 only to a corresponding channel of SNr neurons 56. A Poisson random excitatory input 68 is injected into each channel of SNr neurons 56.
  • The cortex neurons 50, the striatum neurons 52, the reward neurons 54, and the SNr neurons 56 may be modeled by the Leaky-Integrate and Fire (LIF) model defined by equation 1. The synapses 58, 60, 64 and 66 may be modeled by the spike-timing dependent plasticity (STDP) of equation 4.
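  • The disclosure does not spell out how the selected action is read out of the FIG. 3 network, but one plausible interpretation of the disinhibitory mechanism is that the channel whose SNr population is most suppressed below its tonic rate is the selected one; the sketch below is only that interpretation, not a disclosed readout.

```python
def selected_channel(snr_rates):
    """Pick the channel whose SNr population fires least, i.e. the channel whose
    tonic inhibition of downstream targets has been most strongly released.

    snr_rates : dict mapping channel index -> mean SNr firing rate in Hz
    """
    return min(snr_rates, key=snr_rates.get)

# Example: channel 2 is selected because its SNr output is most suppressed
# print(selected_channel({0: 29.0, 1: 31.0, 2: 12.0, 3: 28.0}))
```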
  • Learning in these networks is driven by a conditioned stimulus injection. Stereotyped spiking signals may be sent to an input population and to all of the reward populations. The timing of the signal is delayed for the target channel so that the synaptic learning between the input population and the desired output population is potentiated, while all other channels are depressed. The timing of these signals is dependent on the values chosen in equation 4. Punishment signals can be injected by removing the delay from the target reward population and suppressing the activity of the other output populations.
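  • The sketch below builds the per-channel reward spike times for a single reward event; the 10 ms delay and the helper itself are illustrative assumptions, since the disclosure only says the timing depends on the constants chosen in equation 4.

```python
def reward_spike_schedule(input_spike_times, n_channels, target_channel, delay_ms=10.0):
    """Reward-population spike times per channel for one reward event.

    The target channel's reward spikes are delayed relative to the conditioned
    input spikes so its input -> output synapses see pre-before-post pairings and
    are potentiated under equation 4, while the undelayed channels are depressed.
    For a punishment the disclosure instead removes this delay from the target
    channel and suppresses the other output populations (not shown here).
    """
    return {c: [t + (delay_ms if c == target_channel else 0.0) for t in input_spike_times]
            for c in range(n_channels)}
```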
  • This is only one way of exploiting the architecture of these networks to create arbitrary input/output combinations. Any Hebbian, actor-critic, reward-modulated or distal-reward learning rule can be applied to achieve the same modulation of the synaptic weights.
  • Similarly, the LIF neuron is only an example of a neural model that can be used. Any mathematical model capable of integrating multiple signals and converting that into discrete time events can be employed in these networks.
  • Finally, the specific connectivity is not crucial to the performance; increasing the number of connections per cell can improve the stability and plasticity.
  • The model of FIG. 1 has been implemented under the constraints of an initial memristor based neuromorphic processor. An example reward-learning scenario is illustrated in FIGS. 4A-4C. FIG. 4A shows an activity rate map of the example scenario. The activity was calculated using a moving Gaussian weighted window. FIG. 4B shows a spike raster of the input populations. FIG. 4C shows a spike raster of the output populations.
  • The stages are marked by the letters in the center of FIG. 4A. FIGS. 5A-5F show the synaptic weights at 0 sec., 10 sec., 11 sec., 21 sec., 22 sec, and 33 sec., respectively.
  • In stage A, the network is initialized with all input/output connections having a synaptic USE value of 0.25, as illustrated in FIG. 5A by the heat map of the average weights between input/output populations.
  • In stage B, a Poisson random input is injected into consecutive channels for 10 seconds to establish the basal activity of the network. The resulting average synaptic weight matrix is shown in FIG. 5B.
  • In stage C, alternating reward signals are sent to establish single input/output pairs. The weight matrix is now dominated by the diagonal shown in FIG. 5C.
  • In stage D, the repeated Poisson input signals from B., above, are injected for 10 seconds. After this, the weight matrix shown in FIG. 5D demonstrates further potentiation of the established input/output pairs and a continued depression of the other connections.
  • In stage E, an opposite set of input/output associations are established using alternating reward signals. For stable retraining of the network the reward protocol needs to be about twice as long as the original training. The new weight matrix is shown in FIG. 5E.
  • In stage F, 10 seconds of the repeated Poisson inputs illustrate the newly established input/output pairs in FIG. 5F.
  • To illustrate the lateral inhibition network, a pong style virtual environment was implemented. FIG. 6 is a mock-up of that environment. The position of the puck 70 in the game space is sent to a number of discretized neural channels. Each of these channels in essence represents a vertical column of the game board. The inputs are Poisson random spiking events with a rate defined by a Gaussian curve, described below. This provides a noisy input signal with overlap between channels. The network signals, through a winner-takes-all mechanism, the position of the paddle 72.
  • Initially, the network has no knowledge or inherent understanding of how to play the game. The behavior is learned through feedback provided as reward and punishment signals encoded as random spike events. The stimulus into the network is determined by the location of the puck 70 relative to each of the spatial channels. The signal strength for each spatial channel is computed by sampling a Gaussian function based on the location of the channel. The location of the puck 70 on the map determines the peak amplitude and center of a Gaussian function defined as

  • f_xc(X*) = a·e^(−(x_c − X*)^2 / (2c^2))    (1)
  • where
  • a is a peak amplitude of the Gaussian function,
  • b is a center of the Gaussian function,
  • c is a spatial width of the Gaussian function, and
  • Xc is the non-dimensional location of the channel.
  • The peak amplitude and Gaussian center are defined as

  • a = Y*·R_max    (2)

  • b = X*    (3)
  • where
  • Y* is the non-dimensional location of the puck in the y dimension,
  • Rmax is the maximum input stimulus in spikes/s, and
  • X* is the non-dimensional location of the puck in the x dimension.
  • This is visualized in FIG. 7 for a spatial width, c, of 0.05. The reward or punishment to the network arrives when the puck 70 reaches the bottom of the game board 74. FIG. 7A shows an example stimulus map for two spatial channels. FIG. 7B shows a stimulus overlap between two consecutive spatial channels. FIG. 7C shows an example stimulus for different locations of the puck 70.
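  • A minimal sketch of this stimulus encoding is given below; the placement of each channel at the centre of its column and the maximum rate R_max are assumptions, and c defaults to the 0.05 spatial width visualized in FIG. 7.

```python
import math

def channel_rates(x_puck, y_puck, n_channels, c=0.05, r_max=100.0):
    """Per-channel Poisson input rate (spikes/s) for the pong environment.

    x_puck, y_puck : non-dimensional puck coordinates (X*, Y*) in [0, 1]
    c              : spatial width of the Gaussian
    r_max          : maximum input stimulus in spikes/s (assumed value)
    """
    a = y_puck * r_max                       # peak amplitude, a = Y* . R_max
    b = x_puck                               # Gaussian centre, b = X*
    rates = []
    for ch in range(n_channels):
        x_c = (ch + 0.5) / n_channels        # channel at the centre of its column (assumed)
        rates.append(a * math.exp(-((x_c - b) ** 2) / (2.0 * c ** 2)))
    return rates

# Example: a puck near the left edge drives the leftmost channels hardest
# print(channel_rates(x_puck=0.2, y_puck=0.9, n_channels=8))
```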
  • FIGS. 8 and 9 show the results for a spatial width, c, of 0.025 at FIG. 8A 0-25 sec., FIG. 8B 50-75 sec., and FIG. 8C 125-150 sec. FIG. 10 shows the overall accuracy for the model with a spatial width, c, of 0.025.
  • The neural networks of FIGS. 1-3 may be implemented with passive and active electronic components including transistors, resistors, and capacitors. The neural networks may also be implemented with computers or processors. One type of processor that may be used is a memristor-based neuromorphic processor.
  • Having now described the invention in accordance with the requirements of the patent statutes, those skilled in this art will understand how to make changes and modifications to the present invention to meet their specific requirements or conditions. Such changes and modifications may be made without departing from the scope and spirit of the invention as disclosed herein.
  • The foregoing Detailed Description of exemplary and preferred embodiments is presented for purposes of illustration and disclosure in accordance with the requirements of the law. It is not intended to be exhaustive nor to limit the invention to the precise form(s) described, but only to enable others skilled in the art to understand how the invention may be suited for a particular use or implementation. The possibility of modifications and variations will be apparent to practitioners skilled in the art. No limitation is intended by the description of exemplary embodiments which may have included tolerances, feature dimensions, specific operating conditions, engineering specifications, or the like, and which may vary between implementations or with changes to the state of the art, and no limitation should be implied therefrom. Applicant has made this disclosure with respect to the current state of the art, but also contemplates advancements and that adaptations in the future may take into consideration of those advancements, namely in accordance with the then current state of the art. It is intended that the scope of the invention be defined by the Claims as written and equivalents as applicable. Reference to a claim element in the singular is not intended to mean “one and only one” unless explicitly so stated. Moreover, no element, component, nor method or process step in this disclosure is intended to be dedicated to the public regardless of whether the element, component, or step is explicitly recited in the Claims. No claim element herein is to be construed under the provisions of 35 U.S.C. Sec. 112, sixth paragraph, unless the element is expressly recited using the phrase “means for . . . ” and no method or process step herein is to be construed under those provisions unless the step, or steps, are expressly recited using the phrase “comprising the step(s) of . . . . ”

Claims (21)

What is claimed is:
1. A neural model for reinforcement-learning and for action-selection comprising:
a plurality of channels;
a population of input neurons in each of the channels;
a population of output neurons in each of the channels, each population of input neurons in each of the channels coupled to each population of output neurons in each of the channels; and
a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein each channel of reward neurons is coupled only to output neurons in a channel that the reward neuron is part of;
wherein if the environmental input for a channel is positive, the corresponding channel of a population of output neurons are rewarded and have their responses reinforced; and
wherein if the environmental input for a channel is negative, the corresponding channel of a population of output neurons are punished and have their responses attenuated.
2. The neural model of claim 1 wherein each population of output neurons in each of the channels are coupled to each population of input neurons in each of the channels by a synapse having spike-timing dependent plasticity behaving according to

g_eff → g_eff + g_effmax F(Δt)
where
Δt = t_pre − t_post
F(Δt) = A_+ e^(Δt/τ_+) if Δt < 0
F(Δt) = −A_− e^(−Δt/τ_−) if Δt ≥ 0
if (g_eff < 0) then g_eff → 0; if (g_eff > g_effmax) then g_eff → g_effmax.
3. The neural model of claim 1 wherein each population of input neurons, each population of output neurons, and each population of reward neurons are modeled with a Leaky-Integrate and Fire (LIF) model behaving according to
C_m dV/dt = −g_leak (V − E_rest) + I
where
Cm is the membrane capacitance,
I is the sum of external and synaptic currents,
gleak is the conductance of the leak channels, and
Erest is the reversal potential for that particular class of synapse.
4. The neural model of claim 1 wherein the populations of input neurons are connected with equal probability and equal conductance to all of the populations of output neurons.
5. The neural model of claim 1 wherein the populations of input neurons are connected randomly to the populations of output neurons.
6. The neural model of claim 1 wherein the neural model is implemented with a memristor based neuromorphic processor.
7. A neural model for reinforcement-learning and for action-selection comprising:
a plurality of channels;
a population of input neurons in each of the channels;
a population of output neurons in each of the channels, each population of input neurons in each of the channels coupled to each population of output neurons in each of the channels;
a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein each channel of reward neurons is coupled only to output neurons in a channel that the reward neuron is part of; and
a population of inhibition neurons in each of the channels, wherein each population of inhibition neurons receives an input from a population of output neurons in a same channel that the population of inhibition neurons is part of, and wherein a population of inhibition neurons in a channel has an output to output neurons in every other channel except the channel that the inhibition neurons are part of;
wherein if the environmental input to a population of reward neurons for a channel is positive, the corresponding channel's population of output neurons is rewarded and has its responses reinforced; and
wherein if the environmental input to a population of reward neurons for a channel is negative, the corresponding channel's population of output neurons is punished and has its responses attenuated.
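A hedged sketch of the connectivity pattern recited in claim 7, using hypothetical population labels rather than any particular simulator API; only the wiring topology is taken from the claim:

from itertools import product

N_CHANNELS = 4
populations = {
    ch: {"input": f"in_{ch}", "output": f"out_{ch}",
         "reward": f"rew_{ch}", "inhib": f"inh_{ch}"}
    for ch in range(N_CHANNELS)
}

connections = []
# Every input population projects to every output population, across all channels.
for src, dst in product(range(N_CHANNELS), repeat=2):
    connections.append((populations[src]["input"], populations[dst]["output"], "plastic"))
for ch in range(N_CHANNELS):
    # Reward neurons project only to the output population of their own channel.
    connections.append((populations[ch]["reward"], populations[ch]["output"], "plastic"))
    # Output neurons drive the inhibition population of their own channel.
    connections.append((populations[ch]["output"], populations[ch]["inhib"], "excitatory"))
    # Inhibition neurons project to the output populations of every other channel.
    for other in range(N_CHANNELS):
        if other != ch:
            connections.append((populations[ch]["inhib"], populations[other]["output"], "inhibitory"))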
8. The neural model of claim 7 wherein:
each population of output neurons in each of the channels is coupled to each population of input neurons in each of the channels by a synapse having spike-timing dependent plasticity;
each channel of reward neurons is coupled to output neurons by a synapse having spike-timing dependent plasticity;
the input to each population of inhibition neurons from a population of output neurons in a same channel that the population of inhibition neurons is part of is provided by a synapse having spike-timing dependent plasticity; and
the output from each population of inhibition neurons in a channel is coupled to output neurons in every other channel, except the channel that the inhibition neurons are part of, by a synapse having spike-timing dependent plasticity;
wherein the spike-timing dependent plasticity of each synapse behaves according to

g_eff → g_eff + g_effmax · F(Δt)
where
Δt = t_pre − t_post,
F(Δt) = A_+ · exp(Δt/τ_+) if Δt < 0,
F(Δt) = −A_− · exp(−Δt/τ_−) if Δt ≥ 0,
if (g_eff < 0) then g_eff → 0, and
if (g_eff > g_effmax) then g_eff → g_effmax.
9. The neural model of claim 7 wherein each population of input neurons, each population of output neurons, each population of reward neurons, and each population of inhibition neurons are modeled with a Leaky-Integrate and Fire (LIF) model behaving according to
C_m · dV/dt = −g_leak · (V − E_rest) + I,
where
C_m is the membrane capacitance,
I is the sum of external and synaptic currents,
g_leak is the conductance of the leak channels, and
E_rest is the reversal potential for that particular class of synapse.
10. The neural model of claim 7 wherein the populations of input neurons are connected with equal probability and equal conductance to all of the populations of output neurons.
11. The neural model of claim 7 wherein the populations of input neurons are connected randomly to the populations of output neurons.
12. The neural model of claim 7 wherein, as the response from output neurons of the channel that a population of inhibition neurons is part of increases, the inhibition neurons inhibit the responses from populations of output neurons in every other channel.
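A toy, rate-level illustration of this cross-channel inhibition (the linear subtraction and the weight value are assumptions for exposition; the claim itself describes spiking populations):

def apply_cross_channel_inhibition(output_rates, w_inh=0.4):
    """Each channel's output rate is reduced by inhibition that is driven by
    the summed activity of all *other* channels."""
    inhibited = []
    for ch, rate in enumerate(output_rates):
        incoming = sum(r for other, r in enumerate(output_rates) if other != ch)
        inhibited.append(max(0.0, rate - w_inh * incoming))
    return inhibited

print(apply_cross_channel_inhibition([30.0, 12.0, 5.0]))
# The strongest channel retains most of its response; the weaker channels are
# pushed toward zero.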
13. The neural model of claim 7 wherein the neural model is implemented with a memristor based neuromorphic processor.
14. A basal ganglia neural network model comprising:
a plurality of channels;
a population of cortex neurons in each of the channels;
a population of striatum neurons in each of the channels, each population of striatum neurons in each of the channels coupled to each population of cortex neurons in each of the channels;
a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein each channel of reward neurons is coupled only to striatum neurons in the channel that the reward neurons are part of; and
a population of Substantia Nigra pars reticulata (SNr) neurons in each of the channels, wherein each population of SNr neurons is coupled only to a population of striatum neurons in a channel that the SNr neurons are part of;
wherein if the environmental input to a population of reward neurons for a channel is positive, the corresponding channel's population of striatum neurons is rewarded and has its responses reinforced;
wherein if the environmental input to a population of reward neurons for a channel is negative, the corresponding channel's population of striatum neurons is punished and has its responses attenuated; and
wherein each population of SNr neurons is tonically active and is suppressed by inhibitory afferents of striatum neurons in a channel that the SNr neurons are part of.
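A toy, rate-level sketch of the disinhibition recited in claim 14: the SNr population fires tonically and is suppressed by striatal afferents of its own channel, so the channel whose striatum responds most strongly silences its SNr output. The numeric values are assumptions for illustration only:

SNR_TONIC_RATE = 30.0   # Hz, assumed baseline firing of SNr
W_STRIATUM_SNR = 0.8    # assumed inhibitory weight from striatum onto SNr

def snr_output(striatum_rate):
    """Tonic SNr rate minus same-channel striatal inhibition, floored at zero."""
    return max(0.0, SNR_TONIC_RATE - W_STRIATUM_SNR * striatum_rate)

for ch, striatum_rate in enumerate([40.0, 10.0, 0.0]):
    print(f"channel {ch}: SNr rate = {snr_output(striatum_rate):.1f} Hz")
# The channel with the strongest striatal response drives its SNr toward zero,
# i.e. that channel's action is released (selected).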
15. The basal ganglia neural network model of claim 14 wherein:
each population of cortex neurons in each of the channels is coupled to each population of striatum neurons in each of the channels by a synapse having spike-timing dependent plasticity;
each population of striatum neurons in a channel is coupled to striatum neurons in every other channel by a synapse having spike-timing dependent plasticity;
each channel of reward neurons is coupled to a population of striatum neurons in a same channel by a synapse having spike-timing dependent plasticity;
each population of SNr neurons is coupled to a population of striatum neurons in a same channel that the population of SNr neurons is part of by a synapse having spike-timing dependent plasticity; and
wherein the spike-timing dependent plasticity of each synapse behaves according to

g_eff → g_eff + g_effmax · F(Δt)
where
Δt = t_pre − t_post,
F(Δt) = A_+ · exp(Δt/τ_+) if Δt < 0,
F(Δt) = −A_− · exp(−Δt/τ_−) if Δt ≥ 0,
if (g_eff < 0) then g_eff → 0, and
if (g_eff > g_effmax) then g_eff → g_effmax.
16. The basal ganglia neural network model of claim 14 wherein each population of cortex neurons, each population of striatum neurons, each population of reward neurons, and each population of SNr neurons are modeled with a Leaky-Integrate and Fire (LIF) model behaving according to
C_m · dV/dt = −g_leak · (V − E_rest) + I,
where
C_m is the membrane capacitance,
I is the sum of external and synaptic currents,
g_leak is the conductance of the leak channels, and
E_rest is the reversal potential for that particular class of synapse.
17. The basal ganglia neural network model of claim 14 wherein the populations of cortex neurons are connected with equal probability and equal conductance to all of the populations of striatum neurons.
18. The basal ganglia neural network model of claim 14 wherein the populations of cortex neurons are connected randomly to the populations of striatum neurons.
19. The basal ganglia neural network model of claim 14 wherein a Poisson random excitation is injected into the populations of SNr neurons.
20. The basal ganglia neural network model of claim 14 wherein uniform random noise is injected into the populations of SNr neurons.
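A short sketch of the two kinds of background drive mentioned in claims 19 and 20; the rate, time step, and amplitude are assumptions, and the Bernoulli-per-bin construction is a common approximation of a Poisson spike train rather than anything specified in the disclosure:

import random

def poisson_spikes(rate_hz, dt_ms, n_steps, seed=0):
    """Bernoulli-per-bin approximation of a Poisson spike train (claim 19)."""
    rng = random.Random(seed)
    p = rate_hz * dt_ms / 1000.0
    return [rng.random() < p for _ in range(n_steps)]

def uniform_noise_current(amplitude, n_steps, seed=1):
    """Uniform random current samples in [0, amplitude) (claim 20)."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, amplitude) for _ in range(n_steps)]

# Example: 1 s of 5 Hz Poisson drive at 1 ms resolution for one SNr neuron.
drive = poisson_spikes(rate_hz=5.0, dt_ms=1.0, n_steps=1000)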
21. The basal ganglia neural network model of claim 14 wherein the basal ganglia neural network model is implemented with a memristor based neuromorphic processor.
US13/896,110 2012-12-03 2013-05-16 Plastic action-selection networks for neuromorphic hardware Abandoned US20150302296A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US13/896,110 US20150302296A1 (en) 2012-12-03 2013-05-16 Plastic action-selection networks for neuromorphic hardware
PCT/US2013/041451 WO2014088634A1 (en) 2012-12-03 2013-05-16 Neural model for reinforcement learning
US14/293,928 US9349092B2 (en) 2012-12-03 2014-06-02 Neural network for reinforcement learning

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261732590P 2012-12-03 2012-12-03
US13/896,110 US20150302296A1 (en) 2012-12-03 2013-05-16 Plastic action-selection networks for neuromorphic hardware

Related Child Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2013/041451 Continuation WO2014088634A1 (en) 2012-12-03 2013-05-16 Neural model for reinforcement learning

Publications (1)

Publication Number Publication Date
US20150302296A1 true US20150302296A1 (en) 2015-10-22

Family

ID=50883848

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/896,110 Abandoned US20150302296A1 (en) 2012-12-03 2013-05-16 Plastic action-selection networks for neuromorphic hardware
US14/293,928 Active US9349092B2 (en) 2012-12-03 2014-06-02 Neural network for reinforcement learning

Family Applications After (1)

Application Number Title Priority Date Filing Date
US14/293,928 Active US9349092B2 (en) 2012-12-03 2014-06-02 Neural network for reinforcement learning

Country Status (2)

Country Link
US (2) US20150302296A1 (en)
WO (1) WO2014088634A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9940574B1 (en) * 2013-06-26 2018-04-10 Hrl Laboratories, Llc System and method to control a model state of a neuromorphic model of a brain
US10650307B2 (en) 2016-09-13 2020-05-12 International Business Machines Corporation Neuromorphic architecture for unsupervised pattern detection and feature learning
US20230267336A1 (en) * 2022-02-18 2023-08-24 MakinaRocks Co., Ltd. Method For Training A Neural Network Model For Semiconductor Design
US11863221B1 (en) * 2020-07-14 2024-01-02 Hrl Laboratories, Llc Low size, weight and power (swap) efficient hardware implementation of a wide instantaneous bandwidth neuromorphic adaptive core (NeurACore)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10733500B2 (en) 2015-10-21 2020-08-04 International Business Machines Corporation Short-term memory using neuromorphic hardware
US10552731B2 (en) 2015-12-28 2020-02-04 International Business Machines Corporation Digital STDP synapse and LIF neuron-based neuromorphic system
US10891543B2 (en) 2015-12-28 2021-01-12 Samsung Electronics Co., Ltd. LUT based synapse weight update scheme in STDP neuromorphic systems
US11592817B2 (en) * 2017-04-28 2023-02-28 Intel Corporation Storage management for machine learning at autonomous machines
EP3593294A1 (en) * 2017-06-28 2020-01-15 Deepmind Technologies Limited Training action selection neural networks using apprenticeship
US11307562B2 (en) * 2019-11-04 2022-04-19 Honeywell International Inc. Application of simple random search approach for reinforcement learning to controller tuning parameters
US11500337B2 (en) 2019-11-04 2022-11-15 Honeywell International Inc. Method and system for directly tuning PID parameters using a simplified actor-critic approach to reinforcement learning
US20220114439A1 (en) * 2020-10-08 2022-04-14 Here Global B.V. Method, apparatus, and system for generating asynchronous learning rules and/architectures
CN114521904B (en) * 2022-01-25 2023-09-26 中山大学 Brain electrical activity simulation method and system based on coupled neuron group

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8781983B2 (en) * 2009-12-29 2014-07-15 Knowmtech, Llc Framework for the evolution of electronic neural assemblies toward directed goals
US20100217145A1 (en) * 2006-06-09 2010-08-26 Bracco Spa Method of processing multichannel and multivariate signals and method of classifying sources of multichannel and multivariate signals operating according to such processing method
DE102007017259B4 (en) * 2007-04-12 2009-04-09 Siemens Ag Method for computer-aided control and / or regulation of a technical system
US8943008B2 (en) * 2011-09-21 2015-01-27 Brain Corporation Apparatus and methods for reinforcement learning in artificial neural networks
US20140025613A1 (en) * 2012-07-20 2014-01-23 Filip Ponulak Apparatus and methods for reinforcement learning in large populations of artificial spiking neurons

Also Published As

Publication number Publication date
US20140344202A1 (en) 2014-11-20
US9349092B2 (en) 2016-05-24
WO2014088634A1 (en) 2014-06-12

Similar Documents

Publication Publication Date Title
US9349092B2 (en) Neural network for reinforcement learning
Hong et al. Training spiking neural networks for cognitive tasks: A versatile framework compatible with various temporal codes
US9418331B2 (en) Methods and apparatus for tagging classes using supervised learning
US20100198765A1 (en) Prediction by single neurons
US9558442B2 (en) Monitoring neural networks with shadow networks
Lin et al. Relative ordering learning in spiking neural network for pattern recognition
KR20170031695A (en) Decomposing convolution operation in neural networks
KR20160084401A (en) Implementing synaptic learning using replay in spiking neural networks
Wang et al. A modified error function for the backpropagation algorithm
US20140012789A1 (en) Problem solving by plastic neuronal networks
US9361545B2 (en) Methods and apparatus for estimating angular movement with a single two dimensional device
Davies et al. A forecast-based STDP rule suitable for neuromorphic implementation
Zidan et al. Temporal learning using second-order memristors
Zhang et al. Hmsnn: hippocampus inspired memory spiking neural network
Bi et al. Avoiding the local minima problem in backpropagation algorithm with modified error function
Chivers How to train an all-purpose robot: DeepMind is tackling one of the hardest problems for AI
EP2939187A1 (en) Neural model for reinforcement learning
Christophe et al. Pattern recognition with spiking neural networks: a simple training method.
US8112372B2 (en) Prediction by single neurons and networks
Mokhtar et al. Hippocampus-inspired spiking neural network on FPGA
Fernando From blickets to synapses: Inferring temporal causal networks by observation
CN112884125A (en) Transfer reinforcement learning control method, system, storage medium, device and application
Hourdakis et al. Computational modeling of cortical pathways involved in action execution and action observation
Robinson et al. Laguerre-volterra identification of spike-timing-dependent plasticity from spiking activity: A simulation study
Rafe et al. Exploration Of Encoding And Decoding Methods For Spiking Neural Networks On The Cart Pole And Lunar Lander Problems Using Evolutionary Training

Legal Events

Date Code Title Description
AS Assignment

Owner name: HRL LABORATORIES, LLC, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:THIBEAULT, COREY M.;SRINIVASA, NARAYAN;REEL/FRAME:030432/0145

Effective date: 20130507

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION