CN104823205A

CN104823205A - Neural model for reinforcement learning

Info

Publication number: CN104823205A
Application number: CN201380063033.4A
Authority: CN
Inventors: Corey M. Thibeault; Narayan Srinivasa
Original assignee: HRL Laboratories LLC
Current assignee: HRL Laboratories LLC
Priority date: 2012-12-03
Filing date: 2013-05-16
Publication date: 2015-08-05
Anticipated expiration: 2033-05-16
Also published as: CN104823205B; EP2939187A1; EP2939187A4

Abstract

A neural model for reinforcement-learning and for action-selection includes a plurality of channels, a population of input neurons in each of the channels, a population of output neurons in each of the channels, each population of input neurons in each of the channels coupled to each population of output neurons in each of the channels, and a population of reward neurons in each of the channels. Each channel of a population of reward neurons receives input from an environmental input, and is coupled only to output neurons in a channel that the reward neuron is part of. If the environmental input for a channel is positive, the corresponding channel of a population of output neurons are rewarded and have their responses reinforced, otherwise the corresponding channel of a population of output neurons are punished and have their responses attenuated.

Description

Neural model for reinforcement learning

Cross-reference to related applications

This application relates to and claims priority from U.S. Provisional Patent Application No. 61/732,590, filed December 3, 2012, which is incorporated herein by reference in its entirety. This application also relates to and claims priority from U.S. Non-provisional Patent Application No. 13/896,110, filed May 16, 2013, which is incorporated herein by reference in its entirety.

Statement regarding federal funding

This invention was made with government support under U.S. Government contract "DARPA SyNAPSE HR0011-09-C-0001". The government has certain rights in the invention.

Technical field

This disclosure relates to neural networks, and in particular to neural networks capable of action selection and reinforcement learning. The technology disclosed herein includes a plastic action-selection network for neuromorphic hardware.

Background

In the prior art, neural networks capable of action selection have been well characterized, as have those demonstrating reinforcement learning. However, in the prior art, action-selection and reinforcement-learning algorithms present complex solutions to the distal reward problem that are not easily amenable to hardware implementation.

Barr, D., P. Dudek, J. Chambers and K. Gurney, in "Implementation of multi-layer leaky integrator networks on a cellular processor array", International Joint Conference on Neural Networks (IJCNN), August 2007, pp. 1560-1565, describe a model of the basal ganglia on a neural processor array. The software neural model was capable of performing action selection. However, Barr et al. did not describe any inherent mechanism for reinforcement learning, and the micro-channels of the basal ganglia were predefined.

Merolla, P., J. Arthur, F. Akopyan, N. Imam, R. Manohar and D. Modha, in "A digital neurosynaptic core using embedded crossbar memory with 45pJ per spike in 45nm", IEEE Custom Integrated Circuits Conference (CICC), September 2011, pp. 1-4, describe a neuromorphic processor capable of playing a game of pong against a human opponent. However, that network was constructed off-line and, once programmed on the hardware, remained static.

What is needed is a neural network that implements action selection and reinforcement learning and that can be more readily implemented in hardware. The embodiments of the present disclosure answer these and other needs.

Summary

In a first embodiment disclosed herein, a neural model for reinforcement learning and action selection comprises: a plurality of channels; a population of input neurons in each of the channels; a population of output neurons in each of the channels, each population of input neurons in each of the channels coupled to each population of output neurons in each of the channels; and a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein the reward neurons of each channel are coupled only to the output neurons in the channel of which the reward neurons are a part; wherein, if the environmental input for a channel is positive, the output neurons of the corresponding channel are rewarded and have their responses reinforced; and wherein, if the environmental input for a channel is negative, the output neurons of the corresponding channel are punished and have their responses attenuated.

In another embodiment disclosed herein, a neural model for reinforcement learning and action selection comprises: a plurality of channels; a population of input neurons in each of the channels; a population of output neurons in each of the channels, each population of input neurons in each of the channels coupled to each population of output neurons in each of the channels; a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein the reward neurons of each channel are coupled only to the output neurons in the channel of which the reward neurons are a part; and a population of inhibition neurons in each of the channels, wherein each population of inhibition neurons receives input from the population of output neurons in the same channel of which that population of inhibition neurons is a part, and wherein the population of inhibition neurons in a channel has an output to the output neurons in every other channel except the channel of which that population of inhibition neurons is a part; wherein, if the environmental input to the population of reward neurons for a channel is positive, the population of output neurons of the corresponding channel is rewarded and has its responses reinforced; and wherein, if the environmental input to the population of reward neurons for a channel is negative, the population of output neurons of the corresponding channel is punished and has its responses attenuated.

In another embodiment disclosed herein, a basal ganglia neural network model comprises: a plurality of channels; a population of cortical neurons in each of the channels; a population of striatal neurons in each of the channels, each population of striatal neurons in each of the channels coupled to each population of cortical neurons in each of the channels; a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein the reward neurons of each channel are coupled only to the striatal neurons in the channel of which the reward neurons are a part; and a population of substantia nigra pars reticulata (SNr) neurons in each of the channels, wherein each population of SNr neurons is coupled only to the population of striatal neurons in the channel of which the SNr neurons are a part; wherein, if the environmental input to the population of reward neurons for a channel is positive, the population of striatal neurons of the corresponding channel is rewarded and has its responses reinforced; wherein, if the environmental input to the population of reward neurons for a channel is negative, the population of striatal neurons of the corresponding channel is punished and has its responses attenuated; and wherein each population of SNr neurons is tonically active and is suppressed by inhibitory afferents from the striatal neurons in the channel of which the SNr neurons are a part.

These and other features and advantages will become further apparent from the detailed description and accompanying figures that follow. In the figures and description, numerals indicate the various features, and like numerals refer to like features throughout both the drawings and the description.

Brief description of the drawings

Fig. 1 shows a neural network in accordance with the present disclosure;

Fig. 2 shows another neural network, with lateral inhibition, in accordance with the present disclosure;

Fig. 3 shows a basal ganglia neural network in accordance with the present disclosure;

Figs. 4A to 4C show an example of a reinforcement-learning scenario in accordance with the present disclosure;

Figs. 5A to 5F show an example of the synaptic weights of a neural network in accordance with the present disclosure;

Fig. 6 is a diagram of a pong-style virtual environment in accordance with the present disclosure;

Figs. 7-9 show results for different spatial widths and time spans for the pong-style virtual environment of Fig. 6 in accordance with the present disclosure;

Fig. 10 shows the overall accuracy of the model for a spatial width of 0.025 in accordance with the present disclosure.

Detailed description

In the following description, numerous specific details are set forth to clearly describe the various specific embodiments disclosed herein. One skilled in the art will understand, however, that the presently claimed invention may be practiced without all of the specific details discussed below. In other instances, well-known features have not been described so as not to obscure the invention.

The combination of action selection and reinforcement learning in biological entities is essential for successfully adapting and thriving in any environment. The same is true for the successful operation of intelligent agents. Presented here are the design and implementation of biologically inspired action-selection/reinforcement networks for the control of an agent by a neuromorphic processor.

Embodied modeling can be described as the coupling of computational biology and engineering. Historically, strategies for embedding artificial intelligence have failed to result in agents with truly emergent properties. As a result, it is still unreasonable to deploy a robotic entity and expect it to learn from its environment the way biological entities can. Similarly, neural models require complex and varied input signals in order to accurately replicate the activity observed in vivo. One method for creating these complex stimuli is to immerse a model in a real or virtual environment capable of providing feedback.

Conceptually, action selection is the arbitration of competing signals. In the mammalian nervous system, the complex circuitry of the basal ganglia (BG) is active in gating the flow of information in the frontal cortex by appropriately selecting between input signals. This selection mechanism can affect everything from simple actions up to complex behaviors and cognitive processing. Although overly simplified, it can be helpful to relate the BG to a circuit multiplexer that actively connects inputs to outputs based on the current system state.

Reinforcement or reward learning (RL) is the reinforcement of actions or decisions that maximize the positive outcome of those choices. This is similar to instrumental conditioning, in which stimulus-response trials result in reinforcement of responses that are rewarded and attenuation of those that are not. Reinforcement learning in a neural network is an ideal alternative to supervised learning algorithms. Whereas supervised learning requires an intelligent teaching signal with a detailed understanding of the task, reinforcement learning can develop independently of the task, without any prior knowledge. Only the quality of the output signal in response to the input signal and the current contextual state of the network is needed.

In one embodiment according to the present disclosure, the neurons in the neural networks are modeled with a leaky integrate-and-fire (LIF) model. The LIF model is defined by Equation 1.

C_{m} \frac{dV}{dt} = -g_{leak} (V - E_{rest}) + I \qquad (1)

where

C_m is the membrane capacitance,

I is the sum of the external and synaptic currents,

g_leak is the conductance of the leak channels,

E_rest is the reversal potential for that particular class of synapse.

As the current input into the model neuron increases, the membrane voltage increases proportionally until a threshold voltage is reached. At that point an action potential is fired and the membrane voltage is reset to its resting value. The model neuron then enters a refractory period of 2 milliseconds, during which no change in the membrane voltage is allowed. If the current is removed before the threshold is reached, the voltage decays to E_rest. The LIF model is one of the least computationally intensive neural models, yet it is still capable of replicating many aspects of neural activity.
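The behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not the disclosed implementation: the numeric defaults (`c_m`, `g_leak`, `e_rest`, `v_thresh`) are placeholder assumptions because the values of Table 1 are not reproduced in this text, and the function name `simulate_lif` is hypothetical. Only the form of Equation 1, the 2 ms refractory period and the 1 ms Euler step come from the description.

```python
def simulate_lif(current, c_m=1.0, g_leak=0.1, e_rest=-65.0,
                 v_thresh=-50.0, t_refrac_ms=2.0, dt_ms=1.0):
    """Forward-Euler integration of C_m dV/dt = -g_leak (V - E_rest) + I.

    `current` holds one input-current sample per time step.
    All numeric defaults are illustrative assumptions, not Table 1 values.
    Returns (voltage trace, list of step indices at which spikes occurred).
    """
    refrac_steps = int(round(t_refrac_ms / dt_ms))
    v = e_rest
    refrac_left = 0
    trace, spikes = [], []
    for step, i_in in enumerate(current):
        if refrac_left > 0:
            # Refractory period: the membrane voltage is held fixed.
            refrac_left -= 1
        else:
            v += dt_ms * (-g_leak * (v - e_rest) + i_in) / c_m
            if v >= v_thresh:
                spikes.append(step)
                v = e_rest            # reset after the action potential
                refrac_left = refrac_steps
        trace.append(v)
    return trace, spikes


if __name__ == "__main__":
    _, spike_steps = simulate_lif([2.0] * 100)
    print("spike steps:", spike_steps)
```

With these assumed values a constant input of 2.0 drives the neuron over threshold repeatedly, whereas an input of 1.0 settles below threshold and decays back to `e_rest` once the input is removed.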

The connections between neurons, or synapses, are modeled by conductance-based synapses. The general form of this influence is defined by Equation 2.

g_{syn} = g_{max} \cdot g_{eff} \cdot (V - E_{syn}) \qquad (2)

where

g_max is the maximum conductance for that particular class of synapse,

g_eff is the current synaptic efficacy, in the range [0, g_effmax],

E_syn is the reversal potential for that particular class of synapse.

To model the buffering and reuptake of neurotransmitters, the influence that a presynaptic action potential has on a neuron can be decayed based on a specified time constant. This process can be abstracted using Equation 3.

\tau_{syn} \frac{d g_{i}^{syn}}{dt} = -g_{i}^{syn} + \sum W_{ji} \, \delta(t - t_{j}) \qquad (3)
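As a rough sketch of how Equations 2 and 3 might be evaluated at each 1 ms step, the following Python fragment computes the synaptic term of Equation 2 and advances the decaying trace of Equation 3 by one forward-Euler step. The function names are hypothetical. Treating each delta term as an instantaneous jump of W_ji / tau_syn follows from integrating Equation 3 across a spike; that discretization is an assumption made here and is not stated in the text.

```python
def synaptic_term(v, g_max, g_eff, e_syn):
    """Equation 2: g_max * g_eff * (V - E_syn)."""
    return g_max * g_eff * (v - e_syn)


def step_synaptic_trace(g_syn, arriving_weights, tau_syn_ms, dt_ms=1.0):
    """Advance Equation 3 by one forward-Euler step.

    `arriving_weights` lists W_ji for every presynaptic spike that arrives
    during this step. Each spike is applied as a jump of W_ji / tau_syn
    (an assumed discretization of the delta function).
    """
    g_syn += dt_ms * (-g_syn / tau_syn_ms)        # exponential decay
    g_syn += sum(arriving_weights) / tau_syn_ms   # spike-driven jumps
    return g_syn


if __name__ == "__main__":
    g = 0.0
    for step in range(5):
        spikes_now = [5.0] if step == 0 else []
        g = step_synaptic_trace(g, spikes_now, tau_syn_ms=10.0)
        print(step, round(g, 4))
```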

Learning at the synaptic level is achieved through a spike-timing-dependent plasticity (STDP) rule, such as that described in Song, S., K. D. Miller and L. F. Abbott (2000), "Competitive Hebbian Learning through Spike-timing Dependent Synaptic Plasticity", Nature Neuroscience (9): 919-926, as shown in Equation 4.

g_{eff} \rightarrow g_{eff} + g_{effmax} F(\Delta t) \qquad (4)

where

\Delta t = t_{pre} - t_{post}

F(\Delta t) = \begin{cases} A_{+} e^{\Delta t / \tau_{+}} \\ A_{-} e^{\Delta t / \tau_{-}} \end{cases}

if g_{eff} < 0, then g_{eff} \rightarrow 0

if g_{eff} > g_{effmax}, then g_{eff} \rightarrow g_{effmax}.
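The update of Equation 4, including the two clipping rules, can be sketched as follows. The branch conditions of F are not legible in this text, so the sketch assumes the convention of the cited Song et al. (2000) paper: potentiation when the presynaptic spike precedes the postsynaptic spike (Δt < 0), and depression with a decaying exponential otherwise. The amplitudes and time constants are placeholder assumptions, and the function names are hypothetical.

```python
import math


def stdp_window(delta_t_ms, a_plus=0.01, a_minus=0.0105,
                tau_plus_ms=20.0, tau_minus_ms=20.0):
    """STDP window F(dt) with dt = t_pre - t_post.

    ASSUMPTION: branch conditions and the sign of the depression branch
    follow Song et al. (2000); parameter values are illustrative only.
    """
    if delta_t_ms < 0:
        # Pre before post: potentiation.
        return a_plus * math.exp(delta_t_ms / tau_plus_ms)
    # Post before (or coincident with) pre: depression.
    return -a_minus * math.exp(-delta_t_ms / tau_minus_ms)


def apply_stdp(g_eff, delta_t_ms, g_effmax=1.0, **window_params):
    """Equation 4 followed by clipping of g_eff to [0, g_effmax]."""
    g_eff = g_eff + g_effmax * stdp_window(delta_t_ms, **window_params)
    return min(max(g_eff, 0.0), g_effmax)


if __name__ == "__main__":
    print(apply_stdp(0.5, -5.0))   # pre leads post: efficacy increases
    print(apply_stdp(0.5, 5.0))    # post leads pre: efficacy decreases
```

Clipping after every update keeps the efficacy inside the range stated for g_eff, which is what prevents repeated rewards from growing a weight without bound.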

The global parameter values that can be used in one embodiment are presented in Table 1. The governing equations are numerically integrated using Euler integration with a time step of 1 millisecond (ms).

Table 1: Global model parameters

Figs. 1-3 illustrate three different neural network embodiments. Initially, each of these networks has no knowledge or inherent understanding of its environment. Its behavior is learned through feedback from the environment in the form of reward and punishment signals, which are encoded as either random or structured spike events. These signals strengthen or weaken the synaptic connections between neurons, thereby reinforcing the appropriate action.

The first model, shown in Fig. 1, is a purely feed-forward network consisting entirely of excitatory neurons arranged in N channels. Each of the N channels has a population of input neurons 12, a population of output neurons 14, and a population of reward neurons 16.

In one embodiment, each population of input neurons 12 is connected to all populations of output neurons 14 with equal probability and equal conductance, ensuring that there is no inherent bias toward a particular input-output pair. In another embodiment, each population of input neurons 12 is connected randomly to each population of output neurons 14. This embodiment is particularly important for large-scale implementations of these networks, as well as for the afferent limitations imposed by neuromorphic systems.

The input neurons 12 of each channel are connected to the output neurons 14 of each channel by synapses 18. One set of parameters that can be used with the model of Fig. 1 is presented in Table 2. The synaptic connections 18 between input neurons 12 and output neurons 14 are created randomly from the entire input neuron population 12, ensuring that there is no bias between input and output channels.

Reward neurons 16 receive input, as sensed from the environment, from environmental input 20. The reward neurons of each channel are coupled only to the output neurons 14 of the corresponding channel by synapses 22. If the environmental input for a channel is positive, the output neurons 14 of the corresponding channel are rewarded and have their responses reinforced. If the environmental input for a channel is negative, the output neurons 14 of the corresponding channel are punished and have their responses attenuated.
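The wiring of Fig. 1 can be illustrated with a small connectivity builder. This is a sketch under stated assumptions: the function name, the dictionary representation, the connection probability `p_connect`, and the choice of all-to-all reward-to-output wiring inside a channel are all hypothetical. What it takes from the description is that input-to-output synapses 18 are drawn with equal probability and equal initial efficacy across every channel pair, while reward synapses 22 never leave their own channel.

```python
import random


def build_feedforward(n_channels, neurons_per_pop, p_connect, g_init, seed=0):
    """Build the two synapse sets of the feed-forward network of Fig. 1.

    Keys are (source_channel, source_index, target_channel, target_index);
    values are the initial synaptic efficacy.
    """
    rng = random.Random(seed)

    # Synapses 18: every input neuron may reach every output neuron in any
    # channel, with the same probability and the same initial efficacy.
    input_to_output = {}
    for in_ch in range(n_channels):
        for in_idx in range(neurons_per_pop):
            for out_ch in range(n_channels):
                for out_idx in range(neurons_per_pop):
                    if rng.random() < p_connect:
                        input_to_output[(in_ch, in_idx, out_ch, out_idx)] = g_init

    # Synapses 22: reward neurons project only within their own channel
    # (all-to-all inside the channel is an assumption of this sketch).
    reward_to_output = {
        (ch, r_idx, ch, out_idx): g_init
        for ch in range(n_channels)
        for r_idx in range(neurons_per_pop)
        for out_idx in range(neurons_per_pop)
    }
    return input_to_output, reward_to_output


if __name__ == "__main__":
    syn18, syn22 = build_feedforward(n_channels=3, neurons_per_pop=4,
                                     p_connect=0.5, g_init=0.25)
    print(len(syn18), "input-output synapses;", len(syn22), "reward synapses")
```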

Input neurons 12, output neurons 14, and reward neurons 16 are modeled with the leaky integrate-and-fire (LIF) model defined by Equation 1. Synapses 18 and 22 are modeled with the spike-timing-dependent plasticity (STDP) model of Equation 4.

Table 2: Parameters of the excitatory network

Fig. 2 illustrates another neural network in accordance with the present disclosure, with lateral inhibition between the output populations. The network of Fig. 2 creates an on-center off-surround network in which the most active population suppresses the other output populations. Not only is this a more biologically realistic network, it also offers more control over the selection process. One set of parameters for this model may be those presented in Table 3. A key aspect of the network is the diffuse connectivity of the inhibition neurons 36. The population of inhibition neurons 36 of each channel projects to the output neurons 32 of every other channel, excluding the channel of which that population of inhibition neurons 36 is a part.

The neural network of Fig. 2 has N channels. Each of the N channels has a population of input neurons 30, a population of output neurons 32, a population of reward neurons 34, and a population of inhibition neurons 36. The population of input neurons 30 of each channel is connected to the population of output neurons 32 of each channel by synapses 38.

In one embodiment, each population of input neurons 30 is connected to all populations of output neurons 32 with equal probability and equal conductance, ensuring that there is no inherent bias toward a particular input-output pair. In another embodiment, the synaptic connections 38 between each population of input neurons 30 and each population of output neurons 32 are created randomly from the entire input neuron population 30.

The reward neurons 34 of each channel receive input, as sensed from the environment, from environmental input 40. The reward neurons 34 of each channel are coupled only to the output neurons 32 of the corresponding channel by synapses 42. If the environmental input for a channel is positive, the output neurons 32 of the corresponding channel are rewarded and have their responses reinforced. If the environmental input for a channel is negative, the output neurons 32 of the corresponding channel are punished and have their responses attenuated.

The output neurons 32 of each channel are connected to the inhibition neurons 36 of the corresponding channel by synapses 46. The inhibition neurons 36 in a channel are coupled by synapses 44 to the output neurons 32 in every other channel, but the inhibition neurons 36 in a channel are not coupled to the output neurons 32 of the channel of which those inhibition neurons 36 are a part.

As the response of the output neurons 32 in the channel of which the inhibition neurons 36 are a part increases, the inhibition neurons 36 suppress, through synapses 44, the responses of the output neurons 32 in every other channel.
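The lateral-inhibition wiring can be summarized with a short, rate-based caricature. The sketch below is not a spiking simulation: it assumes that each inhibition population is driven in proportion to its own channel's output activity, and `inhibitory_gain` is a made-up parameter. It only illustrates the connectivity rule that inhibition reaches every channel except its own, and the resulting winner-take-all tendency.

```python
def inhibitory_targets(n_channels):
    """Map each channel to the channels its inhibition neurons project to."""
    return {
        src: [dst for dst in range(n_channels) if dst != src]
        for src in range(n_channels)
    }


def net_output_drive(excitation, inhibitory_gain):
    """Subtract cross-channel inhibition from each channel's excitation.

    ASSUMPTION: inhibition scales linearly with the source channel's
    excitation; negative results are clamped to zero.
    """
    drive = list(excitation)
    for src, targets in inhibitory_targets(len(excitation)).items():
        for dst in targets:
            drive[dst] -= inhibitory_gain * excitation[src]
    return [max(0.0, d) for d in drive]


if __name__ == "__main__":
    # The most active channel survives; the weaker channels are suppressed.
    print(net_output_drive([10.0, 4.0, 2.0], inhibitory_gain=0.5))
```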

Input neurons 30, output neurons 32, reward neurons 34, and inhibition neurons 36 are modeled with the leaky integrate-and-fire (LIF) model defined by Equation 1. Synapses 38, 42, 44, and 46 are modeled with the spike-timing-dependent plasticity (STDP) model of Equation 4.

Table 3: Parameters of the lateral-inhibition network

Fig. 3 illustrates a basal ganglia (BG) neural network in accordance with the present disclosure. The neural network of Fig. 3 emulates the physiological activity of the BG direct pathway, in which the substantia nigra pars reticulata (SNr) neurons 56 are tonically active and fire at about 30 Hz. The substantia nigra is part of the basal ganglia, and the pars reticulata is part of the substantia nigra. The basal activity of the SNr neurons 56 is controlled by inhibitory afferents from the striatal neurons 52, which results in a disinhibitory mechanism of action. Learning occurs between the cortical neurons 50 and the striatal neurons 52 to develop the appropriate input-output channel combinations. One set of parameters that can be used with this model is presented in Table 4.

Table 4: Parameters of the basal ganglia direct pathway

Physiologically, the SNr neurons 56 are tonically active. However, the LIF neuron of Equation 1 is not capable of replicating that spontaneous activity. To compensate, a Poisson random excitatory input 68 is injected into the SNr neurons 56. In addition, low-level uniform random noise may be injected into the network.
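One common way to generate such a Poisson drive on a fixed 1 ms grid is to draw an independent Bernoulli sample per step with probability rate × dt. The sketch below uses that approximation; the function name and the choice of 30 Hz for the injected input are assumptions made for illustration (the text gives about 30 Hz as the SNr firing rate, not the rate of input 68).

```python
import random


def poisson_spike_train(rate_hz, duration_ms, dt_ms=1.0, seed=None):
    """Return the step indices at which a Poisson-like source spikes.

    Uses the per-step Bernoulli approximation, which is reasonable as
    long as rate_hz * dt_ms / 1000 is much smaller than 1.
    """
    rng = random.Random(seed)
    p_spike = rate_hz * dt_ms / 1000.0
    n_steps = int(duration_ms / dt_ms)
    return [step for step in range(n_steps) if rng.random() < p_spike]


if __name__ == "__main__":
    train = poisson_spike_train(rate_hz=30.0, duration_ms=10_000.0, seed=1)
    print("mean rate (Hz):", len(train) / 10.0)
```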

The neural network of Fig. 3 has N channels. Each of the N channels has a population of cortical neurons 50, a population of striatal neurons 52, a population of reward neurons 54, and a population of SNr neurons 56. The cortical neurons 50 of each channel are connected to each channel of striatal neurons by synapses 58.

In one embodiment, each population of cortical neurons 50 is connected to all populations of striatal neurons 52 with equal probability and equal conductance, ensuring that there is no inherent bias toward a particular cortical-striatal pair. In another embodiment, each population of cortical neurons 50 is connected randomly to each population of striatal neurons 52.

The population of striatal neurons 52 in a channel is connected to the population of striatal neurons 52 in every other channel by synapses 60.

Reward neurons 54 receive input, as sensed from the environment, from environmental input 62. The reward neurons 54 of each channel are coupled by synapses 64 only to the striatal neurons 52 of the channel of which the reward neurons 54 are a part. If the environmental input for a channel is positive, the striatal neurons 52 of the corresponding channel are rewarded and have their responses reinforced. If the environmental input for a channel is negative, the striatal neurons 52 of the corresponding channel are punished and have their responses attenuated.

The striatal neurons 52 of each channel are connected only to the SNr neurons 56 of the corresponding channel by synapses 66. The Poisson random excitatory input 68 is injected into the SNr neurons 56 in each channel.

Cortical neurons 50, striatal neurons 52, reward neurons 54, and SNr neurons 56 are modeled with the leaky integrate-and-fire (LIF) model defined by Equation 1. Synapses 58, 60, 64, and 66 are modeled with the spike-timing-dependent plasticity (STDP) model of Equation 4.

Learning in these networks is driven by conditioned stimulus injection. Fixed spike signals can be sent to the input populations and to all of the reward populations. The timing of the signal to the target channel is delayed so that the synapses between the input population and the desired output population are potentiated through learning, while those of all other channels are depressed. The timing of these signals depends on the values chosen in Equation 4. A punishment signal is injected by removing the delay from the target reward population and suppressing the activity of the other output populations.
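The role of the delay can be made concrete with a toy calculation. The sketch below assumes, for illustration only, that a reward spike makes the output population of its channel fire at the moment it arrives, and that the STDP window follows the Song et al. (2000) convention with placeholder amplitudes and time constant. Under those assumptions a delayed reward produces a pre-before-post pairing (potentiation) in the target channel, while an undelayed reward in the other channels produces depression. All names and values are hypothetical.

```python
import math


def weight_change(t_pre_ms, t_post_ms, a_plus=0.01, a_minus=0.0105, tau_ms=20.0):
    """STDP weight change for one pre/post spike pairing (assumed window)."""
    delta_t = t_pre_ms - t_post_ms
    if delta_t < 0:
        return a_plus * math.exp(delta_t / tau_ms)
    return -a_minus * math.exp(-delta_t / tau_ms)


def conditioning_updates(n_channels, target, input_time_ms=0.0,
                         reward_delay_ms=5.0):
    """Weight change on the input-to-output synapses of each output channel.

    ASSUMPTION: the output population fires when its reward spike arrives.
    Only the target channel's reward spike is delayed after the input spike.
    """
    updates = []
    for channel in range(n_channels):
        delay = reward_delay_ms if channel == target else 0.0
        updates.append(weight_change(input_time_ms, input_time_ms + delay))
    return updates


if __name__ == "__main__":
    print(conditioning_updates(n_channels=3, target=1))
```

Calling `conditioning_updates` with `reward_delay_ms=0.0` mimics the punishment case in this toy setting: the target channel then receives the same depressing update as the others.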

This is only one way of developing these network architectures to create arbitrary input/output combinations. Any Hebbian, actor-critic, reward-modulated, or distal-reward learning rule can be applied to achieve the same modulation of the synaptic weights.

Likewise, the LIF neuron is only one example of a neuron model that can be used. Any mathematical model capable of integrating multiple signals and converting them into discrete time events can be used in these networks.

Finally, the specific connectivity is not critical to performance. Increasing the number of connections per cell can improve stability and plasticity.

The model of Fig. 1 was implemented under the constraints of an initial memristor-based neuromorphic processor. Figs. 4A-4C show an example reinforcement-learning scenario. Fig. 4A shows the activity rate map of the example scenario. Activity was calculated using a moving Gaussian-weighted window. Fig. 4B shows the spike raster of the input populations. Fig. 4C shows the spike raster of the output populations.

Each stage is marked by a letter in the center of Fig. 4A. Figs. 5A-5F show the synaptic weights at 0, 10, 11, 21, 22, and 33 seconds, respectively.

In stage A, the network is initialized with all input/output connections having a synaptic USE value of 0.25, as shown in Fig. 5A by the heat map of the average weights between input/output populations.

In stage B, Poisson random inputs are injected into consecutive channels for 10 seconds to establish the basal activity of the network. The resulting average synaptic weight matrix is shown in Fig. 5B.

In stage C, alternating reward signals are sent to establish single input/output pairs. The weight matrix is now dominated by the diagonal shown in Fig. 5C.

In stage D, the repeated Poisson input signals of stage B are injected for 10 seconds. After this, the weight matrix shown in Fig. 5D shows further potentiation of the established input/output pairs and continued depression of the other connections.

In stage E, an opposite set of input/output associations is established using alternating reward signals. For stable retraining of the network, the reward protocol needs to be twice as long as the original training. The new weight matrix is shown in Fig. 5E.

In stage F, 10 seconds of repeated Poisson inputs illustrate the newly established input/output pairs in Fig. 5F.

To illustrate the lateral-inhibition network, a pong-style virtual environment was implemented. Fig. 6 is a mock-up of this environment. The position of the ball 70 in the game space is sent to a number of discretized neural channels. Each of these channels in effect represents a vertical column of the game board. The inputs are Poisson random spike events with a rate defined by the Gaussian curve described below. This provides a noisy input with overlap between channels. The network signals the position of the paddle 72 through a winner-take-all mechanism.

Initially, the network has no knowledge or inherent understanding of how to play the game. The behavior is learned through feedback provided as reward and punishment signals encoded as random spike events. The stimulus into the network is determined by the position of the ball 70 relative to each of the spatial channels. The signal strength for each spatial channel is computed by sampling a Gaussian function based on the position of the channel. The position of the ball 70 on the board determines the peak amplitude and the center of the Gaussian function, which is defined as follows:

f_{X_{c}}(X^{*}) = a e^{-\left( (X_{c} - X^{*})^{2} / 2 c^{2} \right)}

where

a is the peak amplitude of the function,

b is the center of the function,

c is the spatial width of the Gaussian function,

X_c is the non-dimensional position of the channel.

The peak amplitude and the Gaussian center are defined as follows:

a = Y^{*} \cdot R_{max} \qquad (2)

b = X^{*} \qquad (3)

where

Y* is the non-dimensional position of the ball in the y dimension,

R_max is the maximum input stimulus in spikes per second,

X* is the non-dimensional position of the ball in the x dimension.
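The mapping from ball position to per-channel stimulus rates can be sketched directly from these definitions. In the sketch below the function name, the number of channels, the value of `r_max`, and the placement of channel centers at evenly spaced positions in [0, 1] are assumptions; the default width `c = 0.05` is one of the widths the description reports using.

```python
import math


def channel_rates(ball_x, ball_y, n_channels, c=0.05, r_max=100.0):
    """Input rate (spikes per second) for each spatial channel.

    ball_x, ball_y: non-dimensional ball position (X*, Y*) in [0, 1].
    ASSUMPTION: channel k is centered at (k + 0.5) / n_channels.
    """
    a = ball_y * r_max   # peak amplitude, a = Y* . R_max
    b = ball_x           # Gaussian center, b = X*
    rates = []
    for k in range(n_channels):
        x_c = (k + 0.5) / n_channels
        rates.append(a * math.exp(-((x_c - b) ** 2) / (2.0 * c ** 2)))
    return rates


if __name__ == "__main__":
    for k, rate in enumerate(channel_rates(0.25, 1.0, n_channels=10)):
        print(f"channel {k}: {rate:8.3f} spikes/s")
```

Because neighboring channels sample the same Gaussian, a ball located between two channel centers drives both of them, which is the overlap between channels that the description mentions.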

This is visualized in Fig. 7, where the spatial width c is 0.05. Reward and punishment of the network occur when the ball 70 reaches the bottom of the game board 74. Fig. 7A shows example stimulus maps for two spatial channels. Fig. 7B shows the stimulus overlap between two consecutive spatial channels. Fig. 7C shows example stimuli for different positions of the ball 70.

Figs. 8 and 9 show the results for a spatial width c of 0.025 at 0-25 seconds (Fig. 8A), 50-75 seconds (Fig. 8B), and 125-150 seconds (Fig. 8C). Fig. 10 shows the overall accuracy of the model for a spatial width c of 0.025.

The neural networks of Figs. 1-3 can be implemented with passive and active electronic components, including transistors, resistors, and capacitors. The neural networks can also be implemented by a computer or a processor. One type of processor that may be used is a memristor-based neuromorphic processor.

Having now described the invention in accordance with the requirements of the patent statutes, those skilled in the art will understand how to make changes and modifications to the present invention to meet their specific requirements or conditions. Such changes and modifications may be made without departing from the scope and spirit of the invention as disclosed herein.

The foregoing detailed description of exemplary and preferred embodiments is presented for purposes of illustration and disclosure in accordance with the requirements of the law. It is not intended to be exhaustive nor to limit the invention to the precise form(s) described, but only to enable others skilled in the art to understand how the invention may be suited for a particular use or implementation. The possibility of modifications and variations will be apparent to practitioners skilled in the art. No limitation is intended by the description of exemplary embodiments, which may have included tolerances, feature dimensions, specific operating conditions, engineering specifications, or the like, and which may vary between implementations or with changes to the state of the art, and no limitation should be implied therefrom. Applicant has made this disclosure with respect to the current state of the art, but also contemplates advancements, and adaptations in the future may take those advancements into consideration, namely in accordance with the then-current state of the art. It is intended that the scope of the invention be defined by the claims as written and equivalents as applicable. Reference to a claim element in the singular is not intended to mean "one and only one" unless explicitly so stated. Moreover, no element, component, method, or process step in this disclosure is intended to be dedicated to the public, regardless of whether the element, component, or step is explicitly recited in the claims. No claim element herein is to be construed under the provisions of 35 U.S.C. Section 112, sixth paragraph, unless the element is expressly recited using the phrase "means for ...", and no method or process step herein is to be construed under those provisions unless the step or steps are expressly recited using the phrase "comprising the step(s) of ...".

At least the following concepts are disclosed herein:

Concept 1. A neural model for reinforcement learning and action selection, the neural model comprising:

a plurality of channels;

a population of input neurons in each of the channels;

a population of output neurons in each of the channels, each population of input neurons in each of the channels coupled to each population of output neurons in each of the channels; and

a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein the reward neurons of each channel are coupled only to the output neurons in the channel of which the reward neurons are a part;

wherein, if the environmental input for a channel is positive, the output neurons of the corresponding channel are rewarded and have their responses reinforced; and

wherein, if the environmental input for a channel is negative, the output neurons of the corresponding channel are punished and have their responses attenuated.

Concept 2. The neural model of Concept 1, wherein each population of output neurons in each channel is coupled to each population of input neurons in each channel by synapses having spike-timing-dependent plasticity that behaves as follows:

g_{eff} \rightarrow g_{eff} + g_{effmax} F(\Delta t)

where

\Delta t = t_{pre} - t_{post}

F(\Delta t) = \begin{cases} A_{+} e^{\Delta t / \tau_{+}} \\ A_{-} e^{\Delta t / \tau_{-}} \end{cases}

if g_{eff} < 0, then g_{eff} \rightarrow 0

if g_{eff} > g_{effmax}, then g_{eff} \rightarrow g_{effmax}.

Concept 3. The neural model of Concept 1, wherein each population of input neurons, each population of output neurons, and each population of reward neurons is modeled with a leaky integrate-and-fire (LIF) model that behaves according to:

C_{m} \frac{dV}{dt} = - g_{leak} (V - E_{rest}) + I

where

C_{m} is the membrane capacitance,

I is the sum of external and synaptic currents,

g_{leak} is the conductance of the leak channels, and

E_{rest} is the reversal potential for that particular class of synapse.
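
By way of non-limiting illustration only, the LIF equation of Concept 3 can be integrated numerically with a forward-Euler step. The concept gives only the sub-threshold dynamics; the firing threshold `v_thresh`, the reset value `v_reset`, the time step, and every default parameter value below are assumptions introduced for this sketch, as is the function name `simulate_lif`.

```python
def simulate_lif(current, c_m=1.0, g_leak=0.1, e_rest=-65.0,
                 v_thresh=-50.0, v_reset=-65.0, dt=0.1, steps=1000):
    """Integrate C_m dV/dt = -g_leak (V - E_rest) + I with forward Euler.

    Returns the list of spike times. Threshold and reset are ASSUMED,
    since the concept specifies only the membrane equation.
    """
    v = e_rest
    spike_times = []
    for step in range(steps):
        dv_dt = (-g_leak * (v - e_rest) + current) / c_m
        v += dt * dv_dt
        if v >= v_thresh:
            spike_times.append(step * dt)
            v = v_reset  # ASSUMPTION: instantaneous reset after a spike
    return spike_times
```

With these illustrative values, a constant current of 1.0 settles at E_rest + I / g_leak = -55, below the assumed threshold, so no spike is produced, whereas a current of 5.0 drives the membrane past the threshold repeatedly.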

Concept 4. The neural model of Concept 1, wherein each population of input neurons is connected to all of the populations of output neurons with equal probability and equal conductance.

Concept 5. The neural model of Concept 1, wherein each population of input neurons is randomly connected to each population of output neurons.
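
By way of non-limiting illustration only, the connectivity of Concepts 4 and 5 can be sketched as a random wiring in which every input-to-output pair is connected with the same probability and every created synapse starts at the same conductance. The function name `connect_populations`, the dictionary representation, and the default values are hypothetical and are not taken from the disclosure.

```python
import random


def connect_populations(n_inputs, n_outputs, probability=0.5,
                        conductance=1.0, seed=0):
    """Return {(input_index, output_index): conductance} for random wiring.

    Every candidate pair is tested with the same probability, and every
    synapse that is created receives the same initial conductance.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    synapses = {}
    for i in range(n_inputs):
        for j in range(n_outputs):
            if rng.random() < probability:
                synapses[(i, j)] = conductance
    return synapses
```

Because no output population is favored at the start, any later preference of one channel over another would have to come from learning rather than from the initial wiring.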

Concept 6. The neural model of Concept 1, wherein the neural model is implemented with a memristor-based neuromorphic processor.

Concept 7. A neural model for reinforcement learning and action selection, the neural model comprising:

a plurality of channels;

a population of input neurons in each of the channels;

a population of output neurons in each of the channels, each population of input neurons in each of the channels being coupled to each population of output neurons in each of the channels;

a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein the reward neurons of each channel are coupled only to output neurons in the channel to which the reward neurons belong;

a population of inhibitory neurons in each of the channels, wherein each population of inhibitory neurons receives an input from the population of output neurons in the same channel to which that population of inhibitory neurons belongs, and wherein the population of inhibitory neurons in one channel has an output to the output neurons in every other channel, excluding the channel to which that population of inhibitory neurons belongs;

wherein, if the input to the population of reward neurons for a channel is positive, the population of output neurons of the corresponding channel is rewarded and has its responses reinforced; and

wherein, if the input to the population of reward neurons for a channel is negative, the population of output neurons of the corresponding channel is punished and has its responses attenuated.

Concept 8. The neural model of Concept 7, wherein:

each population of output neurons in each of the channels is coupled to each population of input neurons in each of the channels by synapses having spike-timing-dependent plasticity;

each reward neuron is coupled to output neurons by a synapse having spike-timing-dependent plasticity;

the input to each population of inhibitory neurons from the population of output neurons in the same channel to which that population of inhibitory neurons belongs is provided by synapses having spike-timing-dependent plasticity; and

the output of each population of inhibitory neurons in a channel is coupled to the output neurons in every other channel, excluding the channel to which that population of inhibitory neurons belongs, by synapses having spike-timing-dependent plasticity;

wherein the spike-timing-dependent plasticity of each synapse behaves according to:

g_{eff} \rightarrow g_{eff} + g_{effmax} F(\Delta t)

where

\Delta t = t_{pre} - t_{post}

F(\Delta t) = \begin{cases} A_{+} e^{\Delta t / \tau_{+}} \\ A_{-} e^{\Delta t / \tau_{-}} \end{cases}

if (g_{eff} < 0), then g_{eff} \rightarrow 0

if (g_{eff} > g_{effmax}), then g_{eff} \rightarrow g_{effmax}.

Concept 9. The neural model of Concept 7, wherein each population of input neurons, each population of output neurons, each population of reward neurons, and each population of inhibitory neurons is modeled with a leaky integrate-and-fire (LIF) model that behaves according to:

C_{m} \frac{dV}{dt} = - g_{leak} (V - E_{rest}) + I

where

C_{m} is the membrane capacitance,

I is the sum of external and synaptic currents,

g_{leak} is the conductance of the leak channels, and

E_{rest} is the reversal potential for that particular class of synapse.

Concept 10. The neural model of Concept 7, wherein each population of input neurons is connected to all of the populations of output neurons with equal probability and equal conductance.

Concept 11. The neural model of Concept 7, wherein each population of input neurons is randomly connected to each population of output neurons.

Concept 12. The neural model of Concept 7, wherein, when the response of the output neurons in the channel to which a population of inhibitory neurons belongs increases, the inhibitory neurons suppress the responses of the populations of output neurons in every other channel.
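
By way of non-limiting illustration only, the cross-channel suppression of Concept 12 can be sketched with firing rates in place of spikes. The rate-based simplification, the function name `apply_lateral_inhibition`, and the `inhibition_gain` parameter are assumptions made for this sketch; the disclosed model uses spiking inhibitory populations.

```python
def apply_lateral_inhibition(output_rates, inhibition_gain=0.5):
    """Return output rates after each channel is inhibited by all others.

    The inhibitory population of channel k is driven by channel k's own
    outputs and projects to every channel except k.
    """
    total = sum(output_rates)
    inhibited = []
    for rate in output_rates:
        # Inhibition reaching this channel comes from the other channels only.
        from_others = inhibition_gain * (total - rate)
        # ASSUMPTION: rates cannot fall below zero.
        inhibited.append(max(0.0, rate - from_others))
    return inhibited
```

With an illustrative input of `[10.0, 2.0, 1.0]`, the strongest channel is only mildly reduced while the weaker channels are driven to zero, which is the winner-take-all behavior that supports action selection.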

Concept 13. The neural model of Concept 7, wherein the neural model is implemented with a memristor-based neuromorphic processor.

Concept 14. A basal ganglia neural network model comprising:

a plurality of channels;

a population of cortical neurons in each of the channels;

a population of striatal neurons in each of the channels, each population of striatal neurons in each of the channels being coupled to each population of cortical neurons in each of the channels;

a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein the reward neurons of each channel are coupled only to striatal neurons in the channel to which the reward neurons belong; and

a population of substantia nigra pars reticulata (SNr) neurons in each of the channels, wherein each population of SNr neurons is coupled only to the population of striatal neurons in the channel to which the SNr neurons belong;

wherein, if the input to the population of reward neurons for a channel is positive, the population of striatal neurons of the corresponding channel is rewarded and has its responses reinforced;

wherein, if the input to the population of reward neurons for a channel is negative, the population of striatal neurons of the corresponding channel is punished and has its responses attenuated;

wherein each population of SNr neurons is tonically active and is suppressed by inhibitory afferents from the striatal neurons in the channel to which the SNr neurons belong.

Concept 15. The basal ganglia neural network model of Concept 14, wherein:

each population of cortical neurons in each of the channels is coupled to each population of striatal neurons in each of the channels by synapses having spike-timing-dependent plasticity;

each population of striatal neurons in each of the channels is coupled to the striatal neurons in every other channel by synapses having spike-timing-dependent plasticity;

the reward neurons of each channel are coupled to the population of striatal neurons in the same channel by synapses having spike-timing-dependent plasticity;

each population of SNr neurons is coupled to the population of striatal neurons in the same channel to which that population of SNr neurons belongs by synapses having spike-timing-dependent plasticity; and

g_{eff} \rightarrow g_{eff} + g_{effmax} F(\Delta t)

where

\Delta t = t_{pre} - t_{post}

F(\Delta t) = \begin{cases} A_{+} e^{\Delta t / \tau_{+}} \\ A_{-} e^{\Delta t / \tau_{-}} \end{cases}

if (g_{eff} < 0), then g_{eff} \rightarrow 0

if (g_{eff} > g_{effmax}), then g_{eff} \rightarrow g_{effmax}.

Concept 16. The basal ganglia neural network model of Concept 14, wherein each population of cortical neurons, each population of striatal neurons, each population of reward neurons, and each population of SNr neurons is modeled with a leaky integrate-and-fire (LIF) model that behaves according to:

C_{m} \frac{dV}{dt} = - g_{leak} (V - E_{rest}) + I

where

C_{m} is the membrane capacitance,

I is the sum of external and synaptic currents,

g_{leak} is the conductance of the leak channels, and

E_{rest} is the reversal potential for that particular class of synapse.

Concept 17. The basal ganglia neural network model of Concept 14, wherein each population of cortical neurons is connected to all of the populations of striatal neurons with equal probability and equal conductance.

Concept 18. The basal ganglia neural network model of Concept 14, wherein each population of cortical neurons is randomly connected to each population of striatal neurons.

Concept 19. The basal ganglia neural network model of Concept 14, wherein a Poisson random stimulus is injected into each population of SNr neurons.

Concept 20. The basal ganglia neural network model of Concept 14, wherein uniform random noise is injected into each population of SNr neurons.
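
By way of non-limiting illustration only, the two kinds of random drive named in Concepts 19 and 20 can be generated as follows. The Poisson stimulus is approximated by an independent Bernoulli draw in each time step, which is reasonable only when the rate multiplied by the step is much smaller than one. The function names, the time step, and the seeds are hypothetical and are not taken from the disclosure.

```python
import random


def poisson_spike_train(rate_hz, duration_s, dt_s=0.001, seed=0):
    """Return spike times (seconds) of an approximate Poisson process.

    Each step of length dt_s emits a spike with probability rate_hz * dt_s.
    """
    rng = random.Random(seed)
    steps = int(round(duration_s / dt_s))
    p_spike = rate_hz * dt_s
    return [step * dt_s for step in range(steps) if rng.random() < p_spike]


def uniform_noise_current(low, high, steps, seed=0):
    """Return one uniformly distributed noise sample per time step."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(steps)]
```

Either sequence could be supplied as the external part of the current I driving an SNr population, for example to keep those neurons tonically active in the absence of striatal inhibition.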

Concept 21. The basal ganglia neural network model of Concept 14, wherein the basal ganglia neural network model is implemented with a memristor-based neuromorphic processor.

Claims (amended under Article 19 of the Treaty)

1. A neural model for reinforcement learning and action selection, the neural model comprising:

a plurality of channels;

a population of input neurons in each of the channels;

a population of output neurons in each of the channels, each population of input neurons in each of the channels being coupled to each population of output neurons in each of the channels;

a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein the reward neurons of each channel are coupled only to output neurons in the channel to which the reward neurons belong;

2. The neural model of claim 1, wherein each population of output neurons in each of the channels is coupled to each population of input neurons in each of the channels by synapses having spike-timing-dependent plasticity that behaves according to:

g_{eff} \rightarrow g_{eff} + g_{effmax} F(\Delta t)

where

\Delta t = t_{pre} - t_{post}

F(\Delta t) = \begin{cases} A_{+} e^{\Delta t / \tau_{+}} \\ A_{-} e^{\Delta t / \tau_{-}} \end{cases}

if (g_{eff} < 0), then g_{eff} \rightarrow 0

if (g_{eff} > g_{effmax}), then g_{eff} \rightarrow g_{effmax}.

3. The neural model of claim 1, wherein each population of input neurons, each population of output neurons, and each population of reward neurons is modeled with a leaky integrate-and-fire (LIF) model that behaves according to:

C_{m} \frac{dV}{dt} = - g_{leak} (V - E_{rest}) + I

where

C_{m} is the membrane capacitance,

I is the sum of external and synaptic currents,

g_{leak} is the conductance of the leak channels, and

E_{rest} is the reversal potential.

4. The neural model of claim 1, wherein each population of input neurons is connected to all of the populations of output neurons with equal probability and equal conductance.

5. The neural model of claim 1, wherein each population of input neurons is randomly connected to each population of output neurons.

6. The neural model of claim 1, wherein the neural model is implemented with a memristor-based neuromorphic processor.

7. A neural model for reinforcement learning and action selection, the neural model comprising:

a plurality of channels;

a population of input neurons in each of the channels;

a population of inhibitory neurons in each of the channels, wherein each population of inhibitory neurons receives an input from the population of output neurons in the same channel to which that population of inhibitory neurons belongs, and wherein the population of inhibitory neurons in one channel has an output to the output neurons in every other channel, excluding the channel to which that population of inhibitory neurons belongs;

wherein, if the environmental input to the population of reward neurons for a channel is positive, the population of output neurons of the corresponding channel is rewarded and has its responses reinforced; and

wherein, if the environmental input to the population of reward neurons for a channel is negative, the population of output neurons of the corresponding channel is punished and has its responses attenuated.

8. The neural model of claim 7, wherein:

each population of output neurons in each of the channels is coupled to each population of input neurons in each of the channels by synapses having spike-timing-dependent plasticity;

the reward neurons in each channel are coupled to output neurons by synapses having spike-timing-dependent plasticity;

the input to each population of inhibitory neurons from the population of output neurons in the same channel to which that population of inhibitory neurons belongs is provided by synapses having spike-timing-dependent plasticity; and

the output of each population of inhibitory neurons in a channel is coupled to the output neurons in every other channel, excluding the channel to which that population of inhibitory neurons belongs, by synapses having spike-timing-dependent plasticity;

g_{eff} \rightarrow g_{eff} + g_{effmax} F(\Delta t)

where

\Delta t = t_{pre} - t_{post}

F(\Delta t) = \begin{cases} A_{+} e^{\Delta t / \tau_{+}} \\ A_{-} e^{\Delta t / \tau_{-}} \end{cases}

if (g_{eff} < 0), then g_{eff} \rightarrow 0

if (g_{eff} > g_{effmax}), then g_{eff} \rightarrow g_{effmax}.

9. The neural model of claim 7, wherein each population of input neurons, each population of output neurons, each population of reward neurons, and each population of inhibitory neurons is modeled with a leaky integrate-and-fire (LIF) model that behaves according to:

C_{m} \frac{dV}{dt} = - g_{leak} (V - E_{rest}) + I

where

C_{m} is the membrane capacitance,

I is the sum of external and synaptic currents,

g_{leak} is the conductance of the leak channels, and

E_{rest} is the reversal potential.

10. The neural model of claim 7, wherein each population of input neurons is connected to all of the populations of output neurons with equal probability and equal conductance.

11. The neural model of claim 7, wherein each population of input neurons is randomly connected to each population of output neurons.

12. The neural model of claim 7, wherein, when the response of the output neurons in the channel to which a population of inhibitory neurons belongs increases, the inhibitory neurons suppress the responses of the populations of output neurons in every other channel.

13. The neural model of claim 7, wherein the neural model is implemented with a memristor-based neuromorphic processor.

14. A basal ganglia neural network model comprising:

a plurality of channels;

a population of cortical neurons in each of the channels;

a population of striatal neurons in each of the channels, each population of striatal neurons in each of the channels being coupled to each population of cortical neurons in each of the channels;

a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein the reward neurons of each channel are coupled only to striatal neurons in the channel to which the reward neurons belong; and

a population of substantia nigra pars reticulata (SNr) neurons in each of the channels, wherein each population of SNr neurons is coupled only to the population of striatal neurons in the channel to which the SNr neurons belong;

wherein, if the environmental input to the population of reward neurons for a channel is positive, the population of striatal neurons of the corresponding channel is rewarded and has its responses reinforced;

wherein, if the environmental input to the population of reward neurons for a channel is negative, the population of striatal neurons of the corresponding channel is punished and has its responses attenuated;

wherein each population of SNr neurons is tonically active and is suppressed by inhibitory afferents from the striatal neurons in the channel to which the SNr neurons belong.

15. The basal ganglia neural network model of claim 14, wherein:

each population of cortical neurons in each of the channels is coupled to each population of striatal neurons in each of the channels by synapses having spike-timing-dependent plasticity;

each population of striatal neurons in each of the channels is coupled to the striatal neurons in every other channel by synapses having spike-timing-dependent plasticity;

the reward neurons of each channel are coupled to the population of striatal neurons in the same channel by synapses having spike-timing-dependent plasticity;

each population of SNr neurons is coupled to the population of striatal neurons in the same channel to which that population of SNr neurons belongs by synapses having spike-timing-dependent plasticity; and

g_{eff} \rightarrow g_{eff} + g_{effmax} F(\Delta t)

where

\Delta t = t_{pre} - t_{post}

F(\Delta t) = \begin{cases} A_{+} e^{\Delta t / \tau_{+}} \\ A_{-} e^{\Delta t / \tau_{-}} \end{cases}

if (g_{eff} < 0), then g_{eff} \rightarrow 0

if (g_{eff} > g_{effmax}), then g_{eff} \rightarrow g_{effmax}.

16. The basal ganglia neural network model of claim 14, wherein each population of cortical neurons, each population of striatal neurons, each population of reward neurons, and each population of SNr neurons is modeled with a leaky integrate-and-fire (LIF) model that behaves according to:

C_{m} \frac{dV}{dt} = - g_{leak} (V - E_{rest}) + I

where

C_{m} is the membrane capacitance,

I is the sum of external and synaptic currents,

g_{leak} is the conductance of the leak channels, and

E_{rest} is the reversal potential.

17. The basal ganglia neural network model of claim 14, wherein each population of cortical neurons is connected to all of the populations of striatal neurons with equal probability and equal conductance.

18. The basal ganglia neural network model of claim 14, wherein each population of cortical neurons is randomly connected to each population of striatal neurons.

19. The basal ganglia neural network model of claim 14, wherein a Poisson random stimulus is injected into each population of SNr neurons.

20. The basal ganglia neural network model of claim 14, wherein uniform random noise is injected into each population of SNr neurons.

21. The basal ganglia neural network model of claim 14, wherein the basal ganglia neural network model is implemented with a memristor-based neuromorphic processor.
