CN103886367B - A kind of bionic intelligence control method - Google Patents
- Publication number: CN103886367B
- Application number: CN201410101272.0A
- Authority
- CN
- China
- Prior art keywords
- delta
- function
- orientation
- formula
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Feedback Control In General (AREA)
- Manipulator (AREA)
Abstract
The present invention relates to a bionic intelligent control method. Traditional robot control methods reach only a limited level of intelligence: the robot cannot autonomously adapt to unknown environments, can hardly acquire the ability to handle complex tasks from simple experience, and cannot complete tasks through self-learning. To address these problems, the present invention simulates the biological sensorimotor nervous system from a bionic point of view and incorporates the operant conditioning mechanism into the design of the sensorimotor system. By replicating the sensorimotor system, the invention reproduces biological motor-neural cognition, which helps to simulate biological cognitive mechanisms and thus improve the cognitive ability of robots. The added operant conditioning function expresses the closed-loop feedback relation between "perception" and "motion" in the sensorimotor system, so the system can exhibit self-learning behavior similar to that of living organisms and raise the intelligence level of the robot.
Description
Technical field
The present invention relates to intelligent control, and in particular to a bionic intelligent control method that has an operant conditioning function and can simulate the sensorimotor system.
Technical background
Although robots designed by traditional methods have achieved great success in relieving humans of physical labor and in reducing human work in hazardous environments, their limited level of intelligence restricts the further spread and deepening of their applications. For example, they cannot adapt to complex, changeable, and unknown environments, and they can only perform specific tasks without autonomously developing new abilities. To raise the intelligence level of robots toward human cognitive ability, a new field of artificial intelligence and robotics, cognitive robotics, was born. Cognitive robotics involves neurophysiology, psychology, computer science, brain science, and other disciplines; its research draws nourishment from the existing theories of these related fields, and its development will in turn promote the progress of those disciplines and further affect related areas of the national economy. Against this disciplinary background, the present invention proposes a bionic intelligent control method that has an operant conditioning function and can simulate the sensorimotor system.
In 1938 the famous American psychologist B. F. Skinner first proposed the concept of operant conditioning and thereby founded operant conditioning theory. He borrowed Pavlov's concept of "reinforcement" and reshaped its meaning, dividing reinforcement into positive reinforcement and negative reinforcement: positive reinforcement increases the probability that the organism responds to a stimulus, while negative reinforcement increases the responses by which the organism eliminates the stimulus. Skinner's operant conditioning theory captures the link between the perception and the motion of living organisms: perception produces a reaction, and the reaction affects the probability of subsequent motion. This is the core of Skinner's theory.
" perception-motion " is the basis of biological motion neuro-cognitive.Written by Lyons, France the first university Jeannerod professor
" Motor Cognition:What Actions Tell to the Self " nervus motorius that describes of system for the first time recognize
The theoretical system known.In the theoretical system that nervus motorius is cognitive, perception and action play key player, as far back as last century five
In the ten's, Sperry just insists that " perception-action " ring (Perception-Action Cycle) is that nervous system runs
Basic logic.Therefore, replicate " perception-motion " system thus reappear biological motion neuro-cognitive, be to explore Cognition Mechanism, understanding
Cognitive behavior and then the reliable thinking of raising human-subject test.
Following this line of thought, the present invention proposes a bionic intelligent control method that has an operant conditioning function and can simulate the sensorimotor system, thereby improving the cognition and intelligence level of robots. Among related patents, the invention patents with application numbers CN200910086990.4 and CN200910089263.3 propose operant conditioning automaton models based on automata theory and discuss the application of these models to bionic autonomous learning control. The patent with application number 201410055115.0 proposes a robot obstacle-avoidance method based on Skinner's operant conditioning principle, realizing autonomous navigation of a robot without a tutor signal; the patent with application number 201310656943.5 applies the operant conditioning principle to the field of image processing, improving its precision and efficiency. None of these models, however, refers to biological motor cognition, and none incorporates operant conditioning into the design of a sensorimotor system. To date, no patent similar to the present invention has been found.
Summary of the invention
The level of intelligence reachable by traditional robot control methods is limited: the robot cannot autonomously adapt to unknown environments, can hardly acquire the ability to handle complex tasks from simple experience, and cannot complete tasks through self-learning. To address these problems of traditional control methods, the present invention proposes a bionic intelligent control method. The method simulates the biological sensorimotor nervous system from a bionic point of view and draws on operant conditioning theory, so that the system has an operant conditioning function, can better simulate the self-learning behavior of living organisms, and gives the robot good adaptability and a higher level of intelligence.
As shown in Fig. 1, the bionic intelligent control method proposed by the present invention runs iteratively according to the following steps:
Step 1, build the neural network model of the sensorimotor system; determine the number of neurons in each layer, i.e. the sensing layer, the hidden layer, and the motion layer; give the weight matrices W1 and W2 random values in [0, 1]; determine the initial perception state; and set the learning rate.
A 3-layer feedforward neural network N3[l, m, n] expresses the relation between perception and motion, as shown in Fig. 2. The input layer contains l neurons that represent the perceived information in coded form and constitute the so-called "sensing layer". The hidden layer contains m neurons that compute and process the information transmitted by the sensing layer. The two weight matrices W1 and W2, between the input layer and the hidden layer and between the hidden layer and the output layer, functionally simulate the information-processing center of the biological sensorimotor system. The output layer contains n neurons that represent the n actions of the action set and constitute the "motion layer". Information propagates in feed-forward fashion, flowing through the sensing layer, the hidden layer, and the motion layer, realizing the mapping from perception to motion.
Biological perception information includes external perception (such as vision, hearing, touch, taste, and smell) and internal perception (such as the perception of hunger or thirst). Although the receptors involved differ, the perception information they carry is transmitted and processed in the nervous system in a unified binary-coded form. Therefore, the sensing layer in the present invention does not distinguish between the two types of perception information.
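The layered structure of step 1 can be sketched in code. The patent's transfer functions (formulas (1)–(5)) are not reproduced in this text, so the logistic sigmoid used below is only an illustrative assumption, and the layer sizes shown are the values used in embodiment 1:

```python
import numpy as np

# Layer sizes follow the notation N3[l, m, n]; the values are embodiment 1's.
rng = np.random.default_rng(0)
l, m, n = 1, 100, 3                     # sensing, hidden, motion neurons
W1 = rng.uniform(0.0, 1.0, (l, m))      # sensing -> hidden weights in [0, 1]
W2 = rng.uniform(0.0, 1.0, (m, n))      # hidden -> motion weights in [0, 1]

def sigmoid(x):
    # Assumed transfer function; the patent's formulas (1)-(5) are not
    # reproduced in this text.
    return 1.0 / (1.0 + np.exp(-x))

def forward(s, W1, W2):
    """Feed-forward pass: sensing layer -> hidden layer -> motion layer."""
    s_bar = sigmoid(np.asarray(s, dtype=float))  # sensing-layer output
    h_bar = sigmoid(s_bar @ W1)                  # hidden-layer output
    m_bar = sigmoid(h_bar @ W2)                  # motion-layer output
    return s_bar, h_bar, m_bar
```

The forward pass realizes the perception-to-motion mapping of step 3 under these assumptions.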
Step 2, input the perception information of the current state into the sensing layer.
Step 3, compute the mapping from perception to motion according to the operating algorithm of the feedforward neural network.
Step 3.1, compute the sensing-layer output, with the following formula:
where s_i and s̄_i are respectively the input and output of the i-th sensing-layer neuron.
Step 3.2, compute the hidden-layer output, with the following formula:
where h_j and h̄_j are respectively the input and output of the j-th hidden-layer neuron, and w_ij is the connection weight between the i-th sensing-layer neuron and the j-th hidden-layer neuron.
Step 3.3, compute the motion-layer output, with the following formula:
where m_o and m̄_o are respectively the input and output of the o-th motion-layer neuron, and w_jo is the connection weight between the j-th hidden-layer neuron and the o-th motion-layer neuron.
Step 4, compute the action-set probabilities from the motion-layer output:
where p_o is the probability of the o-th action and |m̄_o| denotes the absolute value of m̄_o.
Step 5, execute the action of maximum probability according to the winner-take-all principle.
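Steps 4 and 5 can be sketched as follows. The normalization by summed absolute values follows the description of formula (6); the function names are illustrative:

```python
import numpy as np

def action_probabilities(m_bar):
    """Formula (6) as described: each action's probability is its
    motion-layer output magnitude divided by the sum of magnitudes."""
    a = np.abs(np.asarray(m_bar, dtype=float))
    return a / a.sum()

def winner_take_all(p):
    """Step 5: select the action of maximum probability."""
    return int(np.argmax(p))
```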
Step 6, compute the state transfer after the action is executed and record the new state.
Step 7, compute the difference of the negative ideality values before and after the state transfer.
The so-called negative ideality is a concept for evaluating the state corresponding to the perception information. In a given application, the larger the gap between the state corresponding to the perception information and the ideal state, the larger the negative ideality value, and vice versa. Let the state before the action be s_old, the state after it be s_new, and the negative ideality function be ε; then the difference of the negative ideality values before and after the state transfer is:

Δε = ε(s_new) − ε(s_old)    (7)
Step 8, compute the orientation function value.
The orientation function δ simulates the natural orientation of living organisms and reflects the degree to which the organism tends toward a state change. Consistent with the biological concept of orientation: when δ > 0 the orientation is positive, the state tends to change toward the ideal state, and system performance improves; when δ < 0 the orientation is negative and system performance tends to worsen; when δ = 0 the orientation is zero and system performance does not change. The orientation function value is computed as follows:
where δ_ij is the orientation function value after state s_i executes action a_k and transfers to state s_j, and Δε_ij = ε(s_j) − ε(s_i).
The orientation function δ is continuous on its domain of definition and is a monotonically decreasing function of Δε_ij; its absolute value increases monotonically with the absolute value of Δε_ij. When Δε_ij > 0, the negative ideality increases and system performance tends to worsen, so δ < 0, and the larger Δε_ij is, the smaller δ becomes. Conversely, when Δε_ij < 0, the negative ideality decreases and system performance tends to improve, so δ > 0, and the smaller Δε_ij is, the larger δ becomes. When Δε_ij = 0, the negative ideality is unchanged, system performance does not change, and thus δ = 0.
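Formula (8) itself is not reproduced in this text, but the stated properties pin down the shape of δ. A minimal sketch, assuming −tanh as one admissible choice:

```python
import math

def orientation(d_eps):
    """An illustrative orientation function delta(d_eps).  The patent's
    formula (8) is not reproduced here; -tanh is one function satisfying
    every stated property: continuous, monotonically decreasing in
    d_eps, |delta| growing with |d_eps|, and delta(0) = 0."""
    return -math.tanh(d_eps)
```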
Step 9, with reference to the gradient-descent error back-propagation algorithm, adjust the weights according to formulas (9)–(12):
Step 9.1, compute the weight changes between the motion layer and the hidden layer, with the formula:

w_jo(k+1) = w_jo(k) + Δw_jo    (10)

where η is the learning rate, usually a constant in [0, 1], which sets the speed at which the weights change, i.e. the learning speed.
Step 9.2, compute the weight changes between the hidden layer and the sensing layer, with the formula:

w_ij(k+1) = w_ij(k) + Δw_ij
When δ > 0, system performance tends to improve, and Δw_jo and Δw_ij are positive; w_jo and w_ij then increase, the corresponding action output increases, and the probability of that action being selected increases accordingly. When δ < 0, Δw_jo and Δw_ij are negative; w_jo and w_ij then decrease, the corresponding action output decreases, and its selection probability decreases accordingly. When δ = 0, Δw_jo and Δw_ij are zero; w_jo and w_ij are then unchanged, the corresponding action output is unchanged, and the selection probability also remains unchanged. This realizes the operant conditioning function.
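Formulas (9)–(12) are likewise not reproduced in this text. The sketch below is a hedged stand-in for the orientation-driven update, consistent with the behavior described above: δ > 0 raises, δ < 0 lowers, and δ = 0 leaves unchanged the weights feeding the chosen action (the sigmoid-derivative gradients are an assumption carried over from the assumed transfer function):

```python
import numpy as np

def update_weights(W1, W2, s_bar, h_bar, m_bar, chosen, delta, eta=0.1):
    """Sketch of step 9: move the weights feeding the chosen action in
    the direction of the orientation value delta, scaled by
    sigmoid-derivative local gradients (backprop-style)."""
    g_o = m_bar[chosen] * (1.0 - m_bar[chosen])        # motion-layer gradient
    dW2 = eta * delta * g_o * h_bar                    # hidden -> chosen action
    g_h = h_bar * (1.0 - h_bar) * W2[:, chosen] * g_o  # back-propagated term
    dW1 = eta * delta * np.outer(s_bar, g_h)           # sensing -> hidden
    W2 = W2.copy()
    W2[:, chosen] += dW2
    W1 = W1 + dW1
    return W1, W2
```

With positive layer outputs, the sign of every weight change follows the sign of δ, which is exactly the operant conditioning behavior stated in step 9.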
Step 10, take the new state as the current state.
Step 11, judge whether the termination condition is satisfied; if it is, terminate the program; otherwise, go to step 2.
Compared with the prior art, the present invention has the following advantages. It reproduces biological motor-neural cognition by replicating the sensorimotor system, which helps to simulate biological cognitive mechanisms and thereby improves robot cognition. The added operant conditioning function expresses the closed-loop feedback relation between "perception" and "motion" in the sensorimotor system, so the system can exhibit self-learning behavior similar to that of living organisms and raise the intelligence level of the robot. The invention can be used to realize bionic autonomous control of robots; it is simple and has high engineering application value.
Description of the drawings
Fig. 1 is the flow chart of the method of the present invention;
Fig. 2 is the system structure diagram of the present invention;
Fig. 3 shows the action probability curves of the robot pigeon in embodiment 1;
Fig. 4 shows the curve of the coded value corresponding to the perception state of the robot pigeon in embodiment 1;
Fig. 5 is a top view of the structure of the "wheeled circular robot" used in embodiment 2;
Fig. 6 shows the simulation environment and the experimental results of embodiment 2.
Detailed description of the invention
The method is described further below with reference to the drawings and embodiments. The system structure of the present invention is shown in Fig. 2, and the flow chart for implementing the method is shown in Fig. 1.
Embodiment 1: a robot pigeon with learning ability, imitating Skinner's pigeon experiment.
Skinner's pigeon experiment is one of the animal experiments Skinner designed to study and verify operant conditioning theory. In this experiment a pigeon is placed in a box facing three buttons: red, yellow, and blue. Pecking the red button gives the pigeon food; pecking the yellow button produces no stimulus; pecking the blue button gives an electric shock. Skinner found that at the beginning the pigeon pecks the three buttons roughly equally often, but after the experiment has run for some time the pigeon pecks the red button far more often than the yellow and blue buttons, thereby verifying the correctness of operant conditioning theory. The present embodiment reproduces Skinner's pigeon experiment to show that the invention can simulate animal learning behavior and exhibits strong self-learning ability.
The method steps are as follows:
1. Initialize. Let the number of sensing-layer neurons be l = 1, representing one of the three perception states "hungry", "half-full", and "full": a sensing-layer input of 1 means the perception state is "hungry", 2 means "half-full", and 3 means "full". Let the number of motion-layer neurons be n = 3, representing the three actions "peck the red button", "peck the yellow button", and "peck the blue button". The number of hidden-layer neurons m is set to 100 according to formula (13). Give the weight matrices W1 and W2 random values in [0, 0.05]; the initial state is "hungry".
2. Input the perception information of the current state into the sensing layer.
3. Compute the mapping from perception to motion according to the operating algorithm of the feedforward neural network, as in formulas (1)–(5).
4. Compute the action-set probability distribution from the motion-layer output according to formula (6).
5. Select and execute the action of maximum probability according to the winner-take-all principle.
6. Compute the state transfer after this action according to the following state-transition matrix (Table 1) and record it as the new state.
Table 1. Robot pigeon state-transition matrix
7. Define the negative ideality value of each state as shown in Table 2, then compute the difference of the negative ideality values before and after the state transfer according to formula (7).
Table 2. Negative ideality value of each state

State                   | Full | Half-full | Hungry
Negative ideality value | -10  | 0         | 10
8. Take formula (14), which satisfies the stated requirements on the orientation function, as the computation of the orientation function value, where Δε_ij = ε_j − ε_i is the change of the negative ideality value after the robot transfers from state s_i to state s_j, and δ_ij is the orientation function value of the state transfer.
9. With reference to the gradient-descent error back-propagation algorithm, adjust the weights according to formulas (9)–(12), with learning rate η = 0.1.
10. Take the new state as the current state.
11. Take reaching 100 learning iterations as the termination condition; if it is satisfied, terminate the program; otherwise, go to step 2.
As shown in Fig. 3, at the start of learning the robot pigeon chooses buttons to peck at random, so the probabilities of the three actions are equal. As learning proceeds, the probability of pecking the red button gradually increases while the probabilities of pecking the yellow and blue buttons gradually decrease, with the blue-button probability falling faster. After 100 learning iterations the probability of pecking the red button has approached 1 and the probabilities of pecking the yellow and blue buttons approach 0, showing that the robot pigeon has learned the optimal action and that operant conditioning has been established. As the robot pigeon gradually learns the optimal action, its perceived state also gradually tends toward the ideal, as shown in Fig. 4: at the beginning the robot pigeon is hungry; as learning proceeds it gradually learns to peck the red button, so its state changes toward "full"; after about 40 learning iterations the robot pigeon is stably full. The learning of the robot pigeon exhibits the formation of the biological "perception-action" function: the robot pigeon obtains its current state (hungry, half-full, or full) through internal perception, and when an executed action changes the perception state, the action probabilities of the robot pigeon change accordingly. Thus action can change perception, and perception in turn feeds back on action, realizing a closed loop of perception and action and finally embodying the biological characteristics of self-learning, adaptive intelligence.
Embodiment 2: a robot worm with learning ability.
Wiener observes in "Cybernetics" that a certain form of vision-neuromuscular feedback system is particularly important even in animals as low as worms. To let a machine exhibit the habits and behavior of a worm, and thereby discuss the similarity of control mechanisms in animals and machines, Wiener conceived a virtual machine worm in "Cybernetics". Although the machine worm he conceived differs from a real worm in form and composition, both have a similar sensation-motion system. This embodiment reproduces Wiener's machine worm through a negative phototaxis experiment, with the system model proposed by the invention acting as the nervous system of the robot worm.
Since this embodiment is not bionic in form, the implementation object is set as a wheeled circular robot with a light-intensity sensor. The robot has a radius of 5 cm; its running gear uses a two-wheel differential motion chassis, with wheels w_L and w_R on the left and right sides driven by DC servo motors, and a passive universal wheel w_F at the rear. A simplified diagram of the robot's mechanical structure is shown in Fig. 5.
The experimental environment is a square space of 4 m × 4 m, with a point light source placed at the geometric center (2, 2), as shown in Fig. 6. In the environment the light intensity is distributed radially and uniformly with distance from the source. If the coordinates of the light source are (x_light, y_light), then the illumination intensity Φ at coordinate (x, y) is computed according to formula (15), where Φ_max is the maximum light intensity, i.e. the intensity at the source, set to 10 units in this embodiment, and K is the light-intensity coefficient, set to 1 in this embodiment. Clearly, the light intensity at a location is proportional to Φ_max and inversely proportional to the square of the distance between that location and the source.
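Formula (15) itself is not reproduced in this text. The sketch below matches the stated properties (Φ_max at the source, inverse-square falloff with distance); the +1 in the denominator is an assumption added to keep the value at the source itself finite and equal to Φ_max:

```python
def light_intensity(x, y, x_light=2.0, y_light=2.0, phi_max=10.0, K=1.0):
    """Assumed form of formula (15): intensity proportional to phi_max
    and falling off with the squared distance to the source."""
    d2 = (x - x_light) ** 2 + (y - y_light) ** 2
    return phi_max / (1.0 + K * d2)
```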
The robot worm starts from the light source with a forward speed of 10 cm/s. Under these conditions the experiment is carried out according to the flow chart of Fig. 1, namely:
1. Initialize. Let the number of sensing-layer neurons be l = 1, with its input representing the perceived light intensity. Let the number of motion-layer neurons be n = 6, representing the six angles by which the robot may turn before advancing, i.e. [0°, 60°, 120°, 180°, 240°, 300°]; all angles are relative to the current heading, so 60° means rotating 60° counterclockwise from the current heading, and so on. The number of hidden-layer neurons m is set to 10 according to formula (13). Give the weight matrices W1 and W2 random values in [0, 0.1]; the initial state of the robot is the position of the light source, i.e. coordinate (2, 2).
2. Compute the light intensity according to formula (15) and input the perception information of the current state into the sensing layer.
3. Compute the mapping from perception to motion according to the operating algorithm of the feedforward neural network, as in formulas (1)–(5).
4. Compute the action-set probability distribution from the motion-layer output according to formula (6).
5. Select and execute the action of maximum probability according to the winner-take-all principle.
6. Compute the state transfer after this action according to formula (16) and record it as the new state.
where x_new and y_new are the horizontal and vertical coordinates of the robot after the action is selected, and similarly x_old and y_old are its coordinates before the selection; v is the moving speed of the robot, t_s is the sampling time of the robot's sensors, and θ_k is the radian value of the selected direction in the polar coordinate system whose pole is the robot's center and whose polar axis is the direction of advance.
7. The robot worm in this embodiment is negatively phototactic: it prefers darkness and avoids light. The light-intensity formula (15) is therefore taken as the computation of the negative ideality value (with K = 1), and the difference of the negative ideality values before and after the state transfer is then computed according to formula (7).
8. Compute the orientation function value according to formula (14).
9. With reference to the gradient-descent error back-propagation algorithm, adjust the weights according to formulas (9)–(12), with learning rate η = 0.5.
10. Take the new state as the current state.
11. Take the robot worm reaching the boundary of the environment as the termination condition; if it is satisfied, terminate the program; otherwise, go to step 2.
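Formula (16) in step 6 above is not reproduced in this text. Under the stated kinematics (turn counterclockwise by the selected angle relative to the current heading, then advance v·t_s along the new heading), a sketch might read as follows; t_s = 1 s is an assumed sampling time, while v = 0.10 m/s is the speed stated above:

```python
import math

def state_transfer(x_old, y_old, heading, turn_deg, v=0.10, t_s=1.0):
    """Sketch of formula (16): rotate counterclockwise by one of the six
    allowed angles relative to the current heading, then advance v*t_s
    along the new heading."""
    theta = heading + math.radians(turn_deg)   # new heading in radians
    x_new = x_old + v * t_s * math.cos(theta)
    y_new = y_old + v * t_s * math.sin(theta)
    return x_new, y_new, theta
```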
As shown in Fig. 6, the robot worm starts from the light source and moves along a path away from the light until it reaches the boundary of the environment, successfully simulating the negative phototactic behavior of a biological worm. In this process the robot worm exhibits a "sensation-motion" system like that of biological worms: it reacts to the perceived light-intensity signal, then selects the optimal action according to its biological orientation (avoiding light), and the selected action changes the state of the agent so that the perception information changes with it. Such a "perception-action" closed loop is established on the basis of operant conditioning. It can be seen from Fig. 6 that the initial trajectory of the robot worm does not strictly move away from the light, and there are episodes of turning back; but as the number of learning iterations increases, the action "advance toward where the light is weaker" satisfies the organism's inherent orientation and its probability keeps increasing, which shows in the experimental result as the trajectory gradually extending toward weaker light, with the turning-back phenomenon disappearing. This result shows that under the repeated action of "stimulus-response-reinforcement" the robot worm has established operant conditioning and exhibits light-avoiding behavior. The learned light-avoiding behavior reflects that the robot worm has achieved adaptation to the environment through self-learning, which demonstrates the effectiveness of the method in improving robot cognition and intelligence.
Claims (4)
1. A bionic intelligent control method, characterized in that the biological sensorimotor nervous system is simulated from a bionic point of view and the operant conditioning mechanism is incorporated into the design of the sensorimotor system, so that the system can better simulate the self-learning behavior of living organisms; the method comprises the following steps:
Step 1: build the neural network model of the perception-motion system; determine the number of neurons in each layer (sensing layer, hidden layer, and motion layer); take random values in [0, 1] for the weight matrices W1 and W2; determine the initial perception state; and set the learning rate.
A 3-layer feedforward neural network N3[l, m, n] expresses the relation between perception and motion. The input layer contains l neurons that encode the perceived information and constitute the sensing layer. The hidden layer contains m neurons that compute and process the information transmitted from the sensing layer. The two weight matrices, W1 between the input layer and the hidden layer and W2 between the hidden layer and the output layer, functionally simulate the information-processing center of the biological perception-motion system. The output layer contains n neurons, representing the n actions in the action set, and constitutes the motion layer. Information propagates in a feedforward manner, flowing forward through the sensing layer, hidden layer, and motion layer, realizing the mapping from perception to motion.
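As an illustration, the forward pass of steps 1 and 3 can be sketched as follows. The layer sizes, the sigmoid transfer function, and the example perception vector are assumptions made for this sketch; the claim itself fixes only the three-layer feedforward structure and the random [0, 1] initialization of W1 and W2.

```python
import numpy as np

# Minimal sketch of the 3-layer feedforward network N3[l, m, n]. The sigmoid
# transfer and the concrete sizes below are illustrative assumptions.
rng = np.random.default_rng(0)

l, m, n = 4, 6, 3                      # sensing, hidden, motion layer sizes (example)
W1 = rng.random((l, m))                # sensing -> hidden weights, random in [0, 1]
W2 = rng.random((m, n))                # hidden  -> motion weights, random in [0, 1]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(s):
    """Propagate a perception vector through sensing, hidden, and motion layers."""
    s_out = sigmoid(s)                 # sensing-layer output (assumed transfer)
    h_out = sigmoid(s_out @ W1)        # hidden-layer output
    m_out = sigmoid(h_out @ W2)        # motion-layer output, one value per action
    return s_out, h_out, m_out

s = np.array([1.0, 0.0, 1.0, 0.0])     # example binary-coded perception state
s_out, h_out, m_out = forward(s)
print(m_out.shape)                     # (3,) -- one output per action
```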
Step 2: input the perception information of the current state into the sensing layer.
Step 3: compute the mapping from perception to motion according to the operation algorithm of the feedforward neural network.
Step 3.1: compute the sensing-layer output; the formula is as follows:
In the formula, s_i and s_i' are respectively the input and the output of the i-th sensing-layer neuron.
Step 3.2: compute the hidden-layer output; the formula is as follows:
In the formula, h_j and h_j' are respectively the input and the output of the j-th hidden-layer neuron, and w_ij denotes the connection weight between the i-th sensing-layer neuron and the j-th hidden-layer neuron.
Step 3.3: compute the motion-layer output; the formula is as follows:
In the formula, m_o and m_o' are respectively the input and the output of the o-th motion-layer neuron, and w_jo denotes the connection weight between the j-th hidden-layer neuron and the o-th motion-layer neuron.
Step 4: according to the motion-layer output, compute the action-set probabilities; the formula is as follows:
In the formula, p_o denotes the probability corresponding to the o-th action, and |m_o'| denotes the absolute value of m_o'.
Step 5: execute the action of maximal probability according to the winner-take-all principle.
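Steps 4 and 5 can be sketched as follows, assuming only what the claims state: the motion-layer outputs are normalized by their absolute values into a probability distribution, and the winner-take-all rule executes the action of maximal probability.

```python
import numpy as np

def action_probabilities(m_out):
    """Step 4 sketch: normalize |motion-layer outputs| into probabilities."""
    a = np.abs(m_out)
    return a / a.sum()

def select_action(m_out):
    """Step 5 sketch: winner-take-all -- pick the action of maximal probability."""
    return int(np.argmax(action_probabilities(m_out)))

m_out = np.array([0.2, -0.7, 0.1])     # example motion-layer outputs
p = action_probabilities(m_out)
print(select_action(m_out))            # action 1 wins: |-0.7| is the largest magnitude
```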
Step 6: compute the state transfer after the action is executed and record the new state.
Step 7: compute the difference of the negative ideality values before and after the state transfer.
The so-called negative ideality is a concept for evaluating the state corresponding to the perception information: in a given application situation, the larger the gap between the state corresponding to the perception information and the ideal state, the larger the negative ideality value, and vice versa. Let the state before the action is executed be s_old, the state after execution be s_new, and the negative ideality function be ε; then the difference of the negative ideality values before and after the state transfer is:
Δε = ε(s_new) − ε(s_old)
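A toy illustration of step 7, in which the negative ideality ε(s) is taken, as an assumption for this example only, to be the light intensity at state s, so that weaker light means a state closer to the ideal:

```python
# Negative ideality for a toy light field: ε(s) is the light intensity at
# state s (an assumption for this example; weaker light = closer to ideal).
light_intensity = {"bright": 0.9, "dim": 0.4, "dark": 0.1}

def epsilon(state):
    return light_intensity[state]

# Step 7: difference of negative ideality before and after the state transfer.
delta_eps = epsilon("dim") - epsilon("bright")
print(delta_eps < 0)   # True: the move toward dimmer light is favourable
```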
Step 8: compute the orientation function value. The orientation function δ simulates the orientation of organisms in nature and reflects the organism's tendency toward a state change. Consistent with the biological concept of orientation: when δ > 0, the orientation is positive, meaning the state tends toward the ideal state and the system performance tends to improve; when δ < 0, the orientation is negative, meaning the system performance tends to worsen; when δ = 0, the orientation is zero, meaning the system performance does not change.
The computing formula of the orientation function value is expressed as follows:
In the formula, δ_ij denotes the orientation function value after state s_i is transferred to state s_j by executing action a_k, and Δε_ij = ε(s_j) − ε(s_i).
The orientation function δ is continuous on its interval of definition and monotonically decreasing in Δε_ij, and its absolute value increases monotonically with the absolute value of Δε_ij. When Δε_ij > 0, the negative ideality increases and the system performance tends to worsen, so δ < 0, and the larger Δε_ij is, the smaller δ is; conversely, when Δε_ij < 0, the negative ideality decreases and the system performance tends to improve, so δ > 0, and again the larger Δε_ij is, the smaller δ is. When Δε_ij = 0, the negative ideality is unchanged, the system performance tends not to change, and thus δ = 0.
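The claim constrains the orientation function rather than fixing a closed form. One concrete function satisfying every stated property (continuity, monotone decrease in Δε, δ(0) = 0, |δ| increasing with |Δε|) is the negative hyperbolic tangent; this choice is an illustration, not the patented formula:

```python
import math

# -tanh as an example orientation function: continuous, monotonically
# decreasing in Δε, zero at Δε = 0, with |δ| growing as |Δε| grows.
def orientation(delta_eps):
    return -math.tanh(delta_eps)

print(orientation(0.0))   # 0.0 -- zero orientation: no performance change
print(orientation(1.0))   # negative -- negative ideality grew, performance worsens
print(orientation(-1.0))  # positive -- negative ideality shrank, performance improves
```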
Step 9: with reference to the error back-propagation algorithm based on negative gradient descent, adjust the weights according to formulas (9)-(12):
Step 9.1: compute the weight changes between the motion layer and the hidden layer; the formulas are as follows:
w_jo(k+1) = w_jo(k) + Δw_jo (10)
In the formulas, η denotes the learning rate, usually a constant in the range [0, 1], which sets the speed of weight change, i.e., the learning speed.
Step 9.2: compute the weight changes between the hidden layer and the sensing layer; the formulas are as follows:
w_ij(k+1) = w_ij(k) + Δw_ij (12)
When δ > 0, indicating that the system performance tends to improve, Δw_jo and Δw_ij are positive, so w_jo and w_ij increase, the corresponding action output increases, and the probability of selecting that action increases accordingly. When δ < 0, Δw_jo and Δw_ij are negative, so w_jo and w_ij decrease, the corresponding action output decreases, and the probability of selecting that action decreases accordingly. When δ = 0, Δw_jo and Δw_ij are zero, so w_jo and w_ij remain unchanged, the corresponding action output is unchanged, and the selection probability also remains unchanged. The operant conditioning reflex function is thus realized.
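A single weight-update step in the spirit of step 9 can be sketched as follows. The exact gradient terms of formulas (9) and (11) are not reproduced here; the sketch only illustrates the sign behaviour described above, in which the orientation value δ plays the role of the back-propagated error:

```python
import numpy as np

def update_weights(W1, W2, s_out, h_out, action, delta, eta=0.1):
    """Nudge the weights on the executed action's path with the sign of delta.
    Illustrative simplification of formulas (9)-(12), not the exact gradients."""
    W1, W2 = W1.copy(), W2.copy()
    W2[:, action] += eta * delta * h_out            # hidden -> motion weights
    W1 += eta * delta * np.outer(s_out, h_out)      # sensing -> hidden weights
    return W1, W2

W1 = np.full((2, 3), 0.5)                           # 2 sensing, 3 hidden neurons
W2 = np.full((3, 2), 0.5)                           # 3 hidden, 2 motion neurons
s_out = np.array([1.0, 0.0])                        # active sensing neuron
h_out = np.array([0.5, 0.5, 0.5])

nW1, nW2 = update_weights(W1, W2, s_out, h_out, action=0, delta=1.0)
print(nW2[:, 0])   # each entry grew above 0.5: positive δ reinforces the action
```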
Step 10: make the new state the current state.
Step 11: judge whether the termination condition is satisfied; if it is satisfied, terminate; otherwise, go to step 2.
2. The bionic intelligence control method according to claim 1, characterized in that the sensing layer described in step 1 does not distinguish between external perception information and internal perception information; both are transmitted and processed in the nervous system in a unified binary-coded form.
3. The bionic intelligence control method according to claim 1, characterized in that the method of computing the orientation function value in step 8 is as follows:
The orientation function δ simulates the orientation of organisms in nature and reflects the organism's tendency toward a state change. Consistent with the biological concept of orientation: when δ > 0, the orientation is positive, meaning the state tends toward the ideal state and the system performance tends to improve; when δ < 0, the orientation is negative, meaning the system performance tends to worsen; when δ = 0, the orientation is zero, meaning the system performance does not change.
The computing formula of the orientation function value is expressed as follows:
In the formula, δ_ij denotes the orientation function value after state s_i is transferred to state s_j, and Δε_ij = ε(s_j) − ε(s_i).
The orientation function δ is continuous on its interval of definition and monotonically decreasing in Δε_ij, and its absolute value increases monotonically with the absolute value of Δε_ij. When Δε_ij > 0, the negative ideality increases and the system performance tends to worsen, so δ < 0, and the larger Δε_ij is, the smaller δ is; conversely, when Δε_ij < 0, the negative ideality decreases and the system performance tends to improve, so δ > 0, and again the larger Δε_ij is, the smaller δ is. When Δε_ij = 0, the negative ideality is unchanged, the system performance tends not to change, and thus δ = 0.
4. The bionic intelligence control method according to claim 1, characterized in that the method of adjusting the weights in step 9 is as follows:
(1) compute the weight changes between the motion layer and the hidden layer; the formulas are as follows:
w_jo(k+1) = w_jo(k) + Δw_jo
In the formulas, η denotes the learning rate, usually a constant in the range [0, 1], which sets the speed of weight change, i.e., the learning speed.
(2) compute the weight changes between the hidden layer and the sensing layer; the formulas are as follows:
w_ij(k+1) = w_ij(k) + Δw_ij
When δ > 0, the system performance tends to improve, Δw_jo and Δw_ij are positive, so w_jo and w_ij increase, the corresponding action output increases, and the probability of selecting that action increases accordingly. When δ < 0, Δw_jo and Δw_ij are negative, so w_jo and w_ij decrease, the corresponding action output decreases, and the probability of selecting that action decreases accordingly. When δ = 0, Δw_jo and Δw_ij are zero, so w_jo and w_ij remain unchanged, the corresponding action output is unchanged, and the selection probability also remains unchanged. The operant conditioning reflex function is thus realized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410101272.0A CN103886367B (en) | 2014-03-18 | 2014-03-18 | A kind of bionic intelligence control method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103886367A CN103886367A (en) | 2014-06-25 |
CN103886367B true CN103886367B (en) | 2016-08-17 |
Family
ID=50955250
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410101272.0A Expired - Fee Related CN103886367B (en) | 2014-03-18 | 2014-03-18 | A kind of bionic intelligence control method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103886367B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104614988B (en) * | 2014-12-22 | 2017-04-19 | 北京工业大学 | Cognitive and learning method of cognitive moving system with inner engine |
CN105045260A (en) * | 2015-05-25 | 2015-11-11 | 湖南大学 | Mobile robot path planning method in unknown dynamic environment |
CN105824251B (en) * | 2016-05-18 | 2018-06-15 | 重庆邮电大学 | It is a kind of based on neural network it is bionical become warm behavioral approach |
CN109212975B (en) * | 2018-11-13 | 2021-05-28 | 北方工业大学 | Cognitive learning method with development mechanism for perception action |
CN112558605B (en) * | 2020-12-06 | 2022-12-16 | 北京工业大学 | Robot behavior learning system based on striatum structure and learning method thereof |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102200787A (en) * | 2011-04-18 | 2011-09-28 | 重庆大学 | Robot behaviour multi-level integrated learning method and robot behaviour multi-level integrated learning system |
CN102915039A (en) * | 2012-11-09 | 2013-02-06 | 河海大学常州校区 | Multi-robot combined target searching method of animal-simulated space cognition |
CN102930252A (en) * | 2012-10-26 | 2013-02-13 | 广东百泰科技有限公司 | Sight tracking method based on neural network head movement compensation |
Non-Patent Citations (1)
Title |
---|
"具有操作条件反射机能的人工感觉运动系";黄静等;《控制理论与应用》;20150531;第32卷(第5期);全文 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103886367B (en) | A kind of bionic intelligence control method | |
Caligiore et al. | TRoPICALS: a computational embodied neuroscience model of compatibility effects. | |
Cangelosi et al. | Cognitive robotics | |
CN105700526A (en) | On-line sequence limit learning machine method possessing autonomous learning capability | |
Clark | The humanness of artificial non-normative personalities | |
Aberšek et al. | Virtual teacher: Cognitive approach to e-learning material | |
Cohen et al. | Social babbling: The emergence of symbolic gestures and words | |
CN105205533A (en) | Development automatic machine with brain cognition mechanism and learning method of development automatic machine | |
Lu et al. | An autonomous learning mobile robot using biological reward modulate STDP | |
Dimitrievska et al. | Behavior models of emotion-featured robots: A survey | |
Jian et al. | Research on intelligent cognitive function enhancement of intelligent robot based on ant colony algorithm | |
Hou et al. | Autonomous driving at the handling limit using residual reinforcement learning | |
CN116805158A (en) | Development network model for learning consciousness in biological brain | |
Tzafestas | An Introduction to Robophilosophy Cognition, Intelligence, Autonomy, Consciousness, Conscience, and Ethics | |
Yan et al. | Path Planning for Mobile Robot's Continuous Action Space Based on Deep Reinforcement Learning | |
Hilleli et al. | Toward deep reinforcement learning without a simulator: An autonomous steering example | |
Grüneberg et al. | An approach to subjective computing: A robot that learns from interaction with humans | |
Pecyna et al. | A deep neural network for finger counting and numerosity estimation | |
CN104570738A (en) | Robot track tracing method based on Skinner operant conditioning automata | |
Baldassarre et al. | Action-outcome contingencies as the engine of open-ended learning: computational models and developmental experiments | |
Weng | A model for auto-programming for general purposes | |
Dreyfus | Totally model-free learned skillful coping | |
Levinson et al. | Automatic language acquisition by an autonomous robot | |
Feng et al. | Robot intelligent communication based on deep learning and TRIZ ergonomics for personalized healthcare | |
He et al. | Evaluation and Analysis of the Intervention Effect of Systematic Parent Training Based on Computational Intelligence on Child Autism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20160817 |