CN103886367B - A bionic intelligent control method - Google Patents

A bionic intelligent control method

Info

Publication number
CN103886367B
CN103886367B (application CN201410101272.0A)
Authority
CN
China
Prior art keywords
delta
function
orientation
formula
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410101272.0A
Other languages
Chinese (zh)
Other versions
CN103886367A (en)
Inventor
阮晓钢
黄静
于乃功
魏若岩
范青武
朱晓庆
肖尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201410101272.0A priority Critical patent/CN103886367B/en
Publication of CN103886367A publication Critical patent/CN103886367A/en
Application granted granted Critical
Publication of CN103886367B publication Critical patent/CN103886367B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Feedback Control In General (AREA)
  • Manipulator (AREA)

Abstract

The present invention relates to a bionic intelligent control method. Traditional robot control methods reach only a limited level of intelligence: a robot cannot autonomously adapt to unknown environments, can hardly acquire the ability to handle complex tasks from simple experience, and cannot complete tasks through self-learning. To address these problems, the present invention simulates the biological sensorimotor nervous system from a bionic perspective and incorporates the operant conditioning mechanism into the design of the sensorimotor system. By replicating the sensorimotor system, the invention reproduces biological motor-neural cognition, which aids the simulation of biological cognitive mechanisms and thereby improves the cognitive capability of robots. The added operant conditioning function establishes the mutually influencing closed feedback loop between "perception" and "action" in the sensorimotor system, so that the system can exhibit self-learning behavior similar to that of living organisms and the robot's level of intelligence is improved.

Description

A bionic intelligent control method
Technical field
The present invention relates to intelligent control, and specifically to a bionic intelligent control method that possesses an operant conditioning function and can simulate the sensorimotor system.
Technical background
Although robots designed by traditional methods have achieved great success in relieving humans of physical labor and reducing human exposure to hazardous working environments, their limited level of intelligence restricts the further spread and deepening of their applications. For example, they cannot adapt to complex, changeable, and unknown environments, and they can only perform specific tasks without autonomously developing new abilities. In order to raise the intelligence of robots so that it eventually approaches human cognitive ability, a new direction has emerged in artificial intelligence and robotics: cognitive robotics. Cognitive robotics involves neurophysiology, psychology, computer science, brain science, and other fields; its research necessarily draws nourishment from the existing theories of these related disciplines, and its development will in turn promote their progress, thereby further influencing related areas of the national economy. Against this disciplinary background, the present invention proposes a bionic intelligent control method that possesses an operant conditioning function and can simulate the sensorimotor system.
In 1938, the famous American psychologist B. F. Skinner first proposed the concept of operant conditioning and thereby founded operant conditioning theory. He borrowed Pavlov's concept of "reinforcement" and reformed its connotation, dividing reinforcement into positive reinforcement and negative reinforcement: positive reinforcement increases the probability that an organism responds to a stimulus, while negative reinforcement increases the responses that eliminate the stimulus. Skinner's operant conditioning theory reflects the link between an organism's perception and its actions: perception produces a response, and the response in turn affects the probability of the action. This is the core of Skinner's theory.
" perception-motion " is the basis of biological motion neuro-cognitive.Written by Lyons, France the first university Jeannerod professor " Motor Cognition:What Actions Tell to the Self " nervus motorius that describes of system for the first time recognize The theoretical system known.In the theoretical system that nervus motorius is cognitive, perception and action play key player, as far back as last century five In the ten's, Sperry just insists that " perception-action " ring (Perception-Action Cycle) is that nervous system runs Basic logic.Therefore, replicate " perception-motion " system thus reappear biological motion neuro-cognitive, be to explore Cognition Mechanism, understanding Cognitive behavior and then the reliable thinking of raising human-subject test.
Following this line of thought, the present invention proposes a bionic intelligent control method that possesses an operant conditioning function and can simulate the sensorimotor system, thereby improving the cognition and intelligence of robots. Related patents include the invention patents with application numbers CN200910086990.4 and CN200910089263.3, which, based on automata theory, respectively propose operant conditioning automaton models and discuss the application of these models in bionic autonomous learning control. The patent with application number 201410055115.0 proposes a robot obstacle-avoidance method based on Skinner's operant conditioning principle, realizing autonomous cruising of a robot without a tutor signal. The patent with application number 201310656943.5 applies the operant conditioning principle to image processing, improving its precision and efficiency. However, none of the above models refer to biological motor-neural cognition, and none incorporate operant conditioning into the design of a sensorimotor system. At present, no patent similar to the present invention has been found on record.
Summary of the invention
The level of intelligence that traditional robot control methods can reach is limited: a robot cannot autonomously adapt to an unknown environment, can hardly acquire the ability to handle complex tasks from simple experience, and cannot complete tasks through self-learning. To address these problems of traditional control methods, the present invention proposes a bionic intelligent control method. The method simulates the biological sensorimotor nervous system from a bionic perspective while drawing on operant conditioning theory, giving the system an operant conditioning function so that it can better simulate (exhibit) biological self-learning behavior and the robot shows good adaptability and a higher level of intelligence.
As shown in Fig. 1, the bionic intelligent control method proposed by the present invention runs recursively according to the following steps:
Step 1: Build the neural network model of the sensorimotor system; determine the number of neurons in each layer (sensing layer, hidden layer, and motor layer); initialize the weight matrices W1 and W2 with random values in [0, 1]; determine the initial perception state; and set the learning rate.
A 3-layer feedforward neural network N3[l, m, n] expresses the relation between perception and action, as shown in Fig. 2. The input layer contains l neurons, which represent the perceived information in coded form and constitute the "sensing layer". The hidden layer contains m neurons, which compute and process the information passed from the sensing layer. The two weight matrices W1 (between the input layer and the hidden layer) and W2 (between the hidden layer and the output layer) functionally simulate the information-processing center of the biological sensorimotor system. The output layer contains n neurons, which represent the n actions in the action set and constitute the "motor layer". Information propagates in a feed-forward manner, flowing through the sensing layer, hidden layer, and motor layer in turn, realizing the mapping from perception to action.
Biological perception includes exteroception (e.g., vision, hearing, touch, taste, smell) and interoception (e.g., the sensations of hunger and thirst). Although the receptors involved differ, the perceptual information they carry is transmitted and processed in the nervous system in a unified binary-coded form. Therefore, the sensing layer in the present invention makes no distinction between types of perceptual information.
Step 2: Input the perceptual information of the current state into the sensing layer.
Step 3: Compute the mapping from perception to action according to the feedforward neural network's operation algorithm.
Step 3.1: Compute the sensing-layer output with the formula:
$s_i^* = s_i$   (1)
where $s_i$ and $s_i^*$ are the input and output of the i-th sensing-layer neuron, respectively.
Step 3.2: Compute the hidden-layer output with the formulas:
$h_j = \sum_i w_{ij} s_i^*$   (2)
$h_j^* = \dfrac{1}{1 + e^{-h_j}}$   (3)
where $h_j$ and $h_j^*$ are the input and output of the j-th hidden-layer neuron, and $w_{ij}$ is the connection weight between the i-th sensing-layer neuron and the j-th hidden-layer neuron.
Step 3.3: Compute the motor-layer output with the formulas:
$m_o = \sum_j w_{jo} h_j^*$   (4)
$m_o^* = m_o$   (5)
where $m_o$ and $m_o^*$ are the input and output of the o-th motor-layer neuron, and $w_{jo}$ is the connection weight between the j-th hidden-layer neuron and the o-th motor-layer neuron.
Step 4: From the motor-layer output, compute the action-selection probabilities:
$p_o = \dfrac{|m_o^*|}{\sum_o |m_o^*|}$   (6)
where $p_o$ is the probability of the o-th action and $|m_o^*|$ denotes the absolute value of $m_o^*$.
Step 5: Execute the action with the largest probability according to the winner-take-all principle.
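Steps 3–5 can be sketched in Python as follows. The function names, array shapes, and example sizes (l = 2, m = 4, n = 3) are our own illustrative choices; only the formulas (1)–(6) come from the text:

```python
import numpy as np

def forward(s, W1, W2):
    """Map a sensing-layer input vector to motor-layer outputs.

    Implements formulas (1)-(5): identity sensing layer, sigmoid
    hidden layer, linear motor layer.
    """
    s_star = s                              # (1) s_i* = s_i
    h = W1.T @ s_star                       # (2) h_j = sum_i w_ij * s_i*
    h_star = 1.0 / (1.0 + np.exp(-h))       # (3) sigmoid activation
    m_star = W2.T @ h_star                  # (4), (5) m_o* = m_o
    return m_star, h_star

def action_probabilities(m_star):
    """Formula (6): probabilities proportional to |m_o*|."""
    a = np.abs(m_star)
    return a / a.sum()

# minimal usage: l=2 sensing neurons, m=4 hidden, n=3 actions
rng = np.random.default_rng(0)
W1 = rng.uniform(0.0, 1.0, size=(2, 4))
W2 = rng.uniform(0.0, 1.0, size=(4, 3))
m_star, h_star = forward(np.array([1.0, 0.5]), W1, W2)
p = action_probabilities(m_star)
chosen = int(np.argmax(p))                  # step 5: winner-take-all
```

The winner-take-all rule of step 5 reduces to an argmax over the probability vector; ties would need a tie-breaking convention, which the text does not specify.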
Step 6: Compute the state transfer after the action is executed and record the new state.
Step 7: Compute the difference in negative ideality before and after the state transfer.
The so-called negative ideality is a concept for evaluating the state corresponding to the perceptual information. In a given application scenario, the larger the gap between the state corresponding to the perceptual information and the ideal state, the larger the negative ideality value, and vice versa. Let the state before the action be $s_{old}$, the state after it be $s_{new}$, and the negative ideality function be ε; then the difference in negative ideality across the state transfer is:
$\Delta\varepsilon = \varepsilon(s_{new}) - \varepsilon(s_{old})$   (7)
Step 8: Compute the orientation function value.
The orientation function δ simulates orientation in living organisms and reflects an organism's tendency with respect to state changes. Consistent with the biological concept of orientation: when δ > 0 (positive orientation), the state change tends toward the ideal state and system performance improves; when δ < 0 (negative orientation), system performance tends to worsen; when δ = 0 (zero orientation), system performance is unchanged.
The orientation function value is computed as:
$\delta_{ij} = \delta(\Delta\varepsilon_{ij}) \begin{cases} > 0, & \Delta\varepsilon_{ij} < 0 \\ = 0, & \Delta\varepsilon_{ij} = 0 \\ < 0, & \Delta\varepsilon_{ij} > 0 \end{cases}$   (8)
where $\delta_{ij}$ is the orientation function value after state $s_i$ executes action $a_k$ and transfers to state $s_j$, and $\Delta\varepsilon_{ij} = \varepsilon(s_j) - \varepsilon(s_i)$.
The orientation function δ is continuous on its domain and monotonically decreasing in $\Delta\varepsilon_{ij}$; its absolute value increases monotonically with $|\Delta\varepsilon_{ij}|$. When $\Delta\varepsilon_{ij} > 0$, negative ideality has increased and system performance tends to worsen, so δ < 0, and the larger $\Delta\varepsilon_{ij}$ is, the smaller δ is. Conversely, when $\Delta\varepsilon_{ij} < 0$, negative ideality has decreased and system performance tends to improve, so δ > 0, and δ still decreases as $\Delta\varepsilon_{ij}$ increases. When $\Delta\varepsilon_{ij} = 0$, negative ideality is unchanged, system performance tends not to change, and thus δ = 0.
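Formula (8) constrains only the sign and monotonicity of δ, so any function meeting those constraints will do. A negated hyperbolic tangent is one illustrative choice (our own, not prescribed by the text):

```python
import math

def orientation(delta_eps):
    """One possible orientation function delta(de).

    Satisfies formula (8): positive when de < 0 (negative ideality
    drops, performance improves), zero at de = 0, negative when
    de > 0; continuous, monotonically decreasing, with |delta|
    increasing in |de|.
    """
    return -math.tanh(delta_eps)
```

Bounding δ in (-1, 1), as tanh does, also keeps the weight updates of step 9 within a predictable range, though the text imposes no such bound.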
Step 9: With reference to the negative-gradient-descent error backpropagation algorithm, adjust the weights according to formulas (9)–(12).
Step 9.1: Compute the weight changes between the motor layer and the hidden layer:
$\Delta w_{jo} = \delta \cdot \eta \cdot h_j^*$   (9)
$w_{jo}(k+1) = w_{jo}(k) + \Delta w_{jo}$   (10)
where η is the learning rate, usually a constant in [0, 1], which sets the speed of weight change, i.e., the learning speed.
Step 9.2: Compute the weight changes between the hidden layer and the sensing layer:
$\Delta w_{ij} = \delta \cdot \eta \cdot w_{jo}(k) \cdot h_j^*(1 - h_j^*) \cdot s_i$   (11)
$w_{ij}(k+1) = w_{ij}(k) + \Delta w_{ij}$   (12)
When δ > 0, system performance tends to improve; then $\Delta w_{ij} = \delta \cdot \eta \cdot w_{jo}(k) \cdot h_j^*(1 - h_j^*) \cdot s_i > 0$ and $\Delta w_{jo} = \delta \cdot \eta \cdot h_j^* > 0$, so $w_{jo}$ and $w_{ij}$ increase, the corresponding action output increases, and its selection probability increases accordingly. When δ < 0, $\Delta w_{jo}$ and $\Delta w_{ij}$ are negative, so $w_{jo}$ and $w_{ij}$ decrease, the corresponding action output decreases, and its selection probability decreases accordingly. When δ = 0, $\Delta w_{jo}$ and $\Delta w_{ij}$ are zero, so $w_{jo}$ and $w_{ij}$ are unchanged, the corresponding action output is unchanged, and its selection probability remains constant. The operant conditioning function is thereby realized.
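The weight adjustment of formulas (9)–(12) can be sketched as follows. Unlike standard backpropagation, the error signal is replaced by the scalar orientation value δ. The reading that only the executed action's output weights are updated is our interpretation of the index o, so treat this as an assumption:

```python
import numpy as np

def update_weights(W1, W2, s, h_star, delta, o, eta=0.1):
    """Orientation-driven weight update of formulas (9)-(12).

    W1: sensing->hidden weights, shape (l, m)
    W2: hidden->motor weights, shape (m, n)
    s: sensing-layer input, shape (l,)
    h_star: hidden-layer output, shape (m,)
    delta: scalar orientation value
    o: index of the executed action (our assumption)
    eta: learning rate in [0, 1]
    """
    w_jo_k = W2[:, o].copy()                          # w_jo(k), old values
    dw_jo = delta * eta * h_star                      # (9)  dw_jo = delta*eta*h_j*
    dw_ij = delta * eta * np.outer(
        s, w_jo_k * h_star * (1.0 - h_star))          # (11) uses w_jo(k)
    W2[:, o] = w_jo_k + dw_jo                         # (10)
    W1 += dw_ij                                       # (12)
    return W1, W2

# minimal usage: l=2, m=3, n=2, positive orientation reinforces action 0
W1 = np.full((2, 3), 0.5)
W2 = np.full((3, 2), 0.5)
s = np.array([1.0, 1.0])
h_star = np.array([0.6, 0.6, 0.6])
W1, W2 = update_weights(W1, W2, s, h_star, delta=1.0, o=0, eta=0.1)
```

Note that formula (11) uses the pre-update value $w_{jo}(k)$, which is why the sketch copies the column before applying formula (10).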
Step 10: Make the new state the current state.
Step 11: Judge whether the termination condition is satisfied; if so, terminate the program; otherwise, go to Step 2.
Compared with the prior art, the present invention has the following advantages. By replicating the sensorimotor system, the present invention reproduces biological motor-neural cognition, which aids the simulation of biological cognitive mechanisms and thereby improves the cognitive capability of robots. The added operant conditioning function establishes the mutually influencing closed feedback loop between "perception" and "action" in the sensorimotor system, so that the system can exhibit self-learning behavior similar to that of living organisms and the robot's level of intelligence improves. The present invention can be used to realize bionic autonomous control of robots; it is simple and has considerable engineering application value.
Description of the drawings
Fig. 1 is the flow chart of the method of the present invention;
Fig. 2 is the system structure diagram of the present invention;
Fig. 3 shows the action-probability curves of the robot pigeon in Embodiment 1;
Fig. 4 shows the curve of the coded value corresponding to the robot pigeon's perceived state in Embodiment 1;
Fig. 5 is a top view of the structure of the "wheeled circular robot", the implementation object of Embodiment 2;
Fig. 6 shows the simulated environment and experimental results of Embodiment 2.
Detailed description of the invention
The method is further described below with reference to the accompanying drawings and embodiments.
The system structure of the present invention is shown in Fig. 2; the flow chart for realizing the method of the present invention is shown in Fig. 1.
Embodiment 1: a robot pigeon with learning ability, imitating Skinner's pigeon experiment.
Skinner's pigeon experiment is one of the animal experiments Skinner designed to study and verify operant conditioning theory. In the experiment, a pigeon is placed in a box facing three buttons colored red, yellow, and blue. Pecking the red button yields food; pecking the yellow button brings no stimulus; pecking the blue button delivers an electric shock. Skinner found that at the beginning the pigeon pecked the three buttons roughly equally often, but after the experiment had run for a while, the pigeon pecked the red button clearly more often than the yellow and blue buttons, verifying the correctness of operant conditioning theory. This embodiment reproduces Skinner's pigeon experiment to show that the present invention can simulate animal learning behavior and exhibits strong self-learning ability.
The method proceeds as follows:
1. Initialize. Set the number of sensing-layer neurons l = 1, representing one of the three perception states "hungry", "half-full", and "full"; a sensing-layer input of 1 means the perceived state is "hungry", 2 means "half-full", and 3 means "full". Set the number of motor-layer neurons n = 3, representing the three actions "peck the red button", "peck the yellow button", and "peck the blue button". The number of hidden neurons m is set to 100 according to formula (13). Initialize the weight matrices W1 and W2 with random values in [0, 0.05]; the initial state is "hungry".
$m = l + n + a, \quad 1 \le a \le 100$   (13)
2. Input the perceptual information of the current state into the sensing layer.
3. Compute the mapping from perception to action according to the feedforward neural network's operation algorithm, as in formulas (1)–(5).
4. From the motor-layer output, compute the action probability distribution according to formula (6).
5. Select and execute the action with the largest probability according to the winner-take-all principle.
6. Compute the state transfer after this action according to the state-transition matrix (shown in Table 1) and record the result as the new state.
Table 1. State-transition matrix of the robot pigeon
7. Define the negative ideality of each state as in Table 2, then compute the difference in negative ideality across the state transfer according to formula (7).
Table 2. Negative ideality of each state

State:              Full    Half-full    Hungry
Negative ideality:  -10     0            10
8. In accordance with the requirements of formula (8), take formula (14) as the computation of the orientation function value:
$\delta_{ij} = \delta(\Delta\varepsilon_{ij}) \begin{cases} > 0, & \Delta\varepsilon_{ij} < 0 \\ = 0, & \Delta\varepsilon_{ij} = 0 \\ < 0, & \Delta\varepsilon_{ij} > 0 \end{cases}$   (14)
where $\Delta\varepsilon_{ij} = \varepsilon_j - \varepsilon_i$ is the change in negative ideality after the robot transfers from state $s_i$ to state $s_j$, and $\delta_{ij}$ is the orientation function value of the state transfer.
9. With reference to the gradient-descent error backpropagation algorithm, adjust the weights according to formulas (9)–(12) with learning rate η = 0.1.
10. Make the new state the current state.
11. Take reaching 100 learning iterations as the termination condition; if satisfied, terminate the program; otherwise, go to step 2.
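The loop above can be sketched end to end. Several things in this sketch are our own assumptions, not the patent's: the body of Table 1 is not reproduced in this text, so the transition rule below is a plausible stand-in (red feeds the pigeon, yellow does nothing, blue worsens its state); the orientation function is an illustrative tanh; and the hidden layer is shrunk from 100 to 10 neurons for brevity. The point is the loop structure of steps 1–11, not a reproduction of Fig. 3:

```python
import numpy as np

# States 0="hungry", 1="half-full", 2="full"; actions 0=red, 1=yellow, 2=blue.
NEG_IDEALITY = {0: 10.0, 1: 0.0, 2: -10.0}          # Table 2

def transition(state, action):
    """Stand-in for Table 1 (its body is not reproduced in the text)."""
    if action == 0:                                  # red: food
        return min(state + 1, 2)
    if action == 2:                                  # blue: shock
        return max(state - 1, 0)
    return state                                     # yellow: no stimulus

def run_trial(n_steps=100, eta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    l, m, n = 1, 10, 3                               # hidden layer shrunk for the sketch
    W1 = rng.uniform(0.0, 0.05, (l, m))
    W2 = rng.uniform(0.0, 0.05, (m, n))
    state = 0                                        # start hungry
    for _ in range(n_steps):
        s = np.array([float(state + 1)])             # encoded perception: 1/2/3
        h = 1.0 / (1.0 + np.exp(-(W1.T @ s)))        # formulas (2)-(3)
        m_out = W2.T @ h                             # formulas (4)-(5)
        p = np.abs(m_out) / np.abs(m_out).sum()      # formula (6)
        a = int(np.argmax(p))                        # step 5: winner-take-all
        new_state = transition(state, a)             # step 6
        d_eps = NEG_IDEALITY[new_state] - NEG_IDEALITY[state]   # formula (7)
        delta = -np.tanh(d_eps)                      # illustrative orientation fn
        w_jo = W2[:, a].copy()
        W2[:, a] = w_jo + delta * eta * h                        # (9)-(10)
        W1 += delta * eta * np.outer(s, w_jo * h * (1 - h))      # (11)-(12)
        state = new_state
    return state, p

state, p = run_trial()
```

Whether a given run converges to "peck red" as in Fig. 3 depends on the random initialization and the stand-in transition table; the sketch only demonstrates the perception-action-reinforcement cycle.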
As shown in Fig. 3, at the start of learning the robot pigeon pecks buttons at random, so the probabilities of the three actions are equal. As learning proceeds, the probability of pecking the red button gradually increases while the probabilities of pecking the yellow and blue buttons gradually decrease, with the blue button's probability falling faster. After 100 learning iterations, the probability of pecking the red button approaches 1 while those of the yellow and blue buttons approach 0, showing that the robot pigeon has learned the optimal action and the operant conditioning reflex has been established. As the robot pigeon gradually learns the optimal action, its perceived state likewise tends toward the ideal, as shown in Fig. 4: at the start the robot pigeon is hungry; as learning proceeds it gradually learns to peck the red button, so its state moves toward "full"; after about 40 learning iterations it stabilizes in the satiated state. The robot pigeon's learning exhibits the formation of the biological "perception-action" function: the robot pigeon obtains its current state (hungry, half-full, or full) through interoception, and when an action changes the perceived state, the action probabilities change accordingly. Thus action can change perception, and perception in turn feeds back to influence action, realizing a closed loop of perception and action and finally embodying the biological traits of self-learning and adaptive intelligence.
Embodiment 2: a machine worm with learning ability.
Wiener wrote in "Cybernetics" that some form of vision-neuromuscular feedback system is particularly important even in the lowest grades of the animal kingdom, such as worms. In order to let a machine display the habits and behavior of a worm, and thereby discuss the similarity between the control mechanisms of animals and machines, Wiener conceived a virtual machine worm in "Cybernetics". Although the machine worm he conceived differs from a real worm in form and composition, both have a similar sensation-motor system. This embodiment reproduces Wiener's machine worm through a negative-phototaxis experiment, with the system model proposed by the invention acting as the machine worm's nervous system.
Since this embodiment is not bionic in form, the implementation object is set as a wheeled circular robot equipped with a light intensity sensor. The robot's radius is 5 cm; its locomotion mechanism is a two-wheel differential-drive chassis, with wheels wL and wR mounted on its left and right sides and driven by DC servo motors, and a passive universal wheel wF at the rear. A simplified diagram of the robot's mechanical structure is shown in Fig. 5.
The experimental environment is a 4 m × 4 m square space with a point light source placed at the geometric center (2, 2), as shown in Fig. 6. In the environment the light intensity is radially distributed according to the distance from the source: if the source's coordinates are $(x_{light}, y_{light})$, the illumination intensity $\Phi(x, y)$ at coordinate $(x, y)$ is computed according to formula (15):
$\Phi(x, y) = \dfrac{\Phi_{max}}{1 + K[(x - x_{light})^2 + (y - y_{light})^2]}$   (15)
where $\Phi_{max}$ is the maximum light intensity, i.e., the intensity at the source, set to 10 units in this embodiment, and K is the light intensity coefficient, set to 1. Clearly, the intensity at a point is proportional to $\Phi_{max}$ and inversely related to the square of the point's distance from the source.
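The displayed form of formula (15) does not survive cleanly in this text, so the function below implements a reconstruction consistent with the description: intensity equals Φmax at the source and falls off with the square of the distance. The "1 +" term in the denominator is our assumption, added so the intensity is finite and maximal at the source:

```python
def light_intensity(x, y, x_light=2.0, y_light=2.0, phi_max=10.0, k=1.0):
    """Reconstructed formula (15): Phi = Phi_max / (1 + K * d^2).

    phi_max = 10 and k = 1 are the values stated in the embodiment;
    the '1 +' term is our assumption.
    """
    d2 = (x - x_light) ** 2 + (y - y_light) ** 2
    return phi_max / (1.0 + k * d2)
```

With these defaults, the intensity is 10 units at the source (2, 2) and decreases monotonically toward the boundary of the 4 m × 4 m space.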
The machine worm starts at the light source and advances at 10 cm/s. Under this setup, the experiment is carried out according to the flow chart of Fig. 1, namely:
1. Initialize. Set the number of sensing-layer neurons l = 1, its input representing the perceived light intensity. Set the number of motor-layer neurons n = 6, representing six headings the robot can turn to before advancing, i.e., [0°, 60°, 120°, 180°, 240°, 300°], all measured from the current direction of advance; for example, 60° means rotating 60° counterclockwise from the current heading, and so on. The number of hidden neurons m is set to 10 according to formula (13). Initialize W1 and W2 with random values in [0, 0.1]; the robot's initial state is the light source position, i.e., coordinate (2, 2).
2. Compute the light intensity according to formula (15) and input the perceptual information of the current state into the sensing layer.
3. Compute the mapping from perception to action according to the feedforward neural network's operation algorithm, as in formulas (1)–(5).
4. From the motor-layer output, compute the action probability distribution according to formula (6).
5. Select and execute the action with the largest probability according to the winner-take-all principle.
6. Compute the state transfer after this action according to formula (16) and record it as the new state.
$x_{new} = x_{old} + v \cdot t_s \cos\theta_k$
$y_{new} = y_{old} + v \cdot t_s \sin\theta_k$, $\quad k = 1, 2, \ldots, 6$   (16)
where $x_{new}$ and $y_{new}$ are the robot's coordinates after the action is selected and, similarly, $x_{old}$ and $y_{old}$ are its coordinates before the selection; v is the robot's moving speed, $t_s$ is the sensor sampling period, and $\theta_k$ is the radian value of the selected direction in the polar coordinate system whose pole is the robot's center and whose polar axis is the direction of advance.
7. The machine worm in this embodiment is negatively phototactic: it likes darkness and avoids light. Therefore take the light intensity formula (15) as the computation of negative ideality (with K = 1), then compute the difference in negative ideality across the state transfer according to formula (7).
8. Compute the orientation function value according to formula (14).
9. With reference to the gradient-descent error backpropagation algorithm, adjust the weights according to formulas (9)–(12) with learning rate η = 0.5.
10. Make the new state the current state.
11. Take the machine worm's arrival at the boundary of the environment as the termination condition; if satisfied, terminate the program; otherwise, go to step 2.
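Step 6's state transfer (formula (16)) amounts to a single kinematic Euler step. In this sketch, v = 0.10 m/s matches the stated pace, while the sampling period t_s = 1 s is our assumed value (the text does not give one), and θk is treated as an absolute heading rather than a heading relative to the current direction of advance, for simplicity:

```python
import math

def step_position(x_old, y_old, theta_k, v=0.10, t_s=1.0):
    """Formula (16): advance the robot by v * t_s along heading theta_k.

    theta_k is in radians; t_s = 1 s is an assumed sampling period.
    """
    x_new = x_old + v * t_s * math.cos(theta_k)
    y_new = y_old + v * t_s * math.sin(theta_k)
    return x_new, y_new
```

For example, one step at heading 0 from the light source (2, 2) moves the robot to (2.1, 2.0), i.e., 10 cm along the polar axis.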
As shown in Fig. 6, the machine worm starts at the light source and moves along a path away from it until it reaches the boundary of the environment, successfully simulating (exhibiting) the negative phototaxis of a biological worm. In this process the machine worm displays a "sensation-motor" system like that of a biological worm: it reacts to the perceived intensity information, selects the optimal action according to its biological orientation (light avoidance), and the chosen action changes the agent's state so that the perceived information changes with it. Such a "perception-action" closed loop is built on the basis of operant conditioning. As can be seen from Fig. 6, the machine worm's initial trajectory does not strictly move away from the light and sometimes doubles back; but as the number of learning iterations grows, the action "advance toward weaker light" satisfies the organism's inherent orientation, so its probability keeps rising. In the experimental results this appears as the machine worm's trajectory gradually extending toward the weaker light, with the doubling-back phenomenon disappearing. The results show that under the repeated action of "stimulus-response-reinforcement", the machine worm establishes an operant conditioning reflex and exhibits light-avoiding behavior. The learned light avoidance reflects that the machine worm has achieved adaptation to the environment through self-learning, demonstrating the effectiveness of this method in improving robots' cognition and level of intelligence.

Claims (4)

1. A bionic intelligent control method, characterized in that it simulates the biological sensorimotor nervous system from a bionic perspective and incorporates the operant conditioning mechanism into the design of the sensorimotor system, enabling the system to better simulate biological self-learning behavior; said method comprising the following steps:
Step 1: build the neural network model of the sensorimotor system; determine the number of neurons in each layer (sensing layer, hidden layer, and motor layer); initialize the weight matrices W1 and W2 with random values in [0, 1]; determine the initial perception state; and set the learning rate;
A 3-layer feedforward neural network N3[l, m, n] expresses the relation between perception and action; the input layer contains l neurons, which represent the perceived information in coded form and constitute the sensing layer; the hidden layer contains m neurons, which compute and process the information passed from the sensing layer; the two weight matrices W1 (between the input layer and the hidden layer) and W2 (between the hidden layer and the output layer) functionally simulate the information-processing center of the biological sensorimotor system; the output layer contains n neurons, which represent the n actions in the action set and constitute the motor layer; information propagates in a feed-forward manner, flowing through the sensing layer, hidden layer, and motor layer in turn, realizing the mapping relation from perception to action;
Step 2: input the perceptual information of the current state into the sensing layer;
Step 3: compute the mapping from perception to action according to the feedforward neural network's operation algorithm;
Step 3.1: compute the sensing-layer output with the formula:
$s_i^* = s_i$
where $s_i$ and $s_i^*$ are the input and output of the i-th sensing-layer neuron, respectively;
Step 3.2: compute the hidden-layer output with the formulas:
$h_j = \sum_i w_{ij} s_i^*$
$h_j^* = \dfrac{1}{1 + e^{-h_j}}$
where $h_j$ and $h_j^*$ are the input and output of the j-th hidden-layer neuron, and $w_{ij}$ is the connection weight between the i-th sensing-layer neuron and the j-th hidden-layer neuron;
Step 3.3: compute the motor-layer output with the formulas:
$m_o = \sum_j w_{jo} h_j^*$
$m_o^* = m_o$
where $m_o$ and $m_o^*$ are the input and output of the o-th motor-layer neuron, and $w_{jo}$ is the connection weight between the j-th hidden-layer neuron and the o-th motor-layer neuron;
Step 4: from the motor-layer output, compute the action probabilities with the formula:
$p_o = \dfrac{|m_o^*|}{\sum_o |m_o^*|}$
where $p_o$ is the probability of the o-th action and $|m_o^*|$ denotes the absolute value of $m_o^*$;
Step 5: execute the action with the largest probability according to the winner-take-all principle;
Step 6: compute the state transfer after the action is executed and record the new state;
Step 7: compute the difference in negative ideality before and after the state transfer;
The so-called negative ideality is a concept for evaluating the state that the perception information corresponds to. In a given application situation, the larger the gap between the state corresponding to the perception information and the ideal state, the larger the negative ideality value, and vice versa. If the state before the action is executed is s_old, the state after execution is s_new, and the negative ideality function is ε, then the difference of the negative ideality values before and after the state transition is:

\Delta\varepsilon = \varepsilon(s_{new}) - \varepsilon(s_{old})
Step 8, compute the orientation function value. The orientation function δ simulates orientation in biology and reflects an organism's tendency toward a state change. Consistent with the biological concept of orientation: when δ > 0 the orientation is positive, indicating that the state change tends toward the ideal state and that system performance tends to improve; when δ < 0 the orientation is negative, indicating that system performance tends to worsen; when δ = 0 the orientation is zero, indicating that system performance is unchanged;
The computation of the orientation function value is expressed as follows:

\delta_{ij} = \delta(\Delta\varepsilon_{ij}) \begin{cases} > 0, & \Delta\varepsilon_{ij} < 0 \\ = 0, & \Delta\varepsilon_{ij} = 0 \\ < 0, & \Delta\varepsilon_{ij} > 0 \end{cases} \qquad (8)

In the formula, δ_ij denotes the orientation function value after state s_i executes action a_k and transfers to state s_j, and Δε_ij = ε(s_j) − ε(s_i);
The orientation function δ is continuous on its interval of definition and is a monotonically decreasing function of Δε_ij; its absolute value increases monotonically with the absolute value of Δε_ij. When Δε_ij > 0, the negative ideality increases and system performance tends to worsen, so δ < 0, and the larger Δε_ij is, the smaller the orientation function δ is. Conversely, when Δε_ij < 0, the negative ideality decreases and system performance tends to improve, so δ > 0, and the smaller Δε_ij is, the larger the orientation function δ is. When Δε_ij = 0, the negative ideality is unchanged, system performance tends not to change, and thus the orientation function δ = 0;
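The patent fixes only the properties of δ (continuous, monotonically decreasing in Δε, zero at zero, with |δ| growing with |Δε|); one admissible choice satisfying all of them is δ(Δε) = −tanh(Δε), sketched here as an assumption rather than the patent's own definition:

```python
import math

def orientation(delta_eps):
    """An illustrative orientation function: continuous, monotonically
    decreasing, zero at zero, |delta| increasing with |delta_eps|."""
    return -math.tanh(delta_eps)
```

Any other function with the same monotonicity and sign pattern (e.g. a clipped linear ramp) would serve equally well.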
Step 9, adjust the weights according to formulas (9)~(12), with reference to the error back-propagation algorithm with negative gradient descent:
Step 9.1, compute the weight changes between the motion layer and the hidden layer with the formulas:

\Delta w_{jo} = \delta \cdot \eta \cdot h_j^* \qquad (9)

w_{jo}(k+1) = w_{jo}(k) + \Delta w_{jo} \qquad (10)

In the formulas, η denotes the learning rate, usually a constant in the range [0, 1], used to set the speed at which the weights change, i.e. the learning speed;
Step 9.2, compute the weight changes between the hidden layer and the perception layer with the formulas:

\Delta w_{ij} = \delta \cdot \eta \cdot w_{jo}(k) \cdot h_j^*(1 - h_j^*) \cdot s_i \qquad (11)

w_{ij}(k+1) = w_{ij}(k) + \Delta w_{ij} \qquad (12)
When δ > 0, system performance tends to improve; Δw_jo and Δw_ij are positive, so w_jo and w_ij increase, the corresponding action output increases, and the probability of that action being selected increases accordingly. When δ < 0, Δw_jo and Δw_ij are negative, so w_jo and w_ij decrease, the corresponding action output decreases, and the probability of that action being selected decreases accordingly. When δ = 0, Δw_jo and Δw_ij are zero, so w_jo and w_ij remain unchanged, the corresponding action output is unchanged, and the selection probability also remains unchanged. The operant conditioning reflex function is thus realized;
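Formulas (9) to (12) can be sketched in matrix form as below. Note that formula (11) leaves the motion index o free in w_jo(k); summing the back-propagated term over o is an assumption of this sketch, not something the patent states:

```python
import numpy as np

def update_weights(W1, W2, s, h_star, delta, eta):
    """Apply formulas (9)-(12): the scalar orientation value delta scales
    every weight change (there is no per-output error signal)."""
    # (9): dw_jo = delta * eta * h_j*, identical for every motion neuron o
    dW2 = delta * eta * np.tile(h_star[:, None], (1, W2.shape[1]))
    # back-propagated term: sum over o of w_jo(k) * h_j*(1 - h_j*)
    back = W2.sum(axis=1) * h_star * (1.0 - h_star)
    dW1 = delta * eta * np.outer(s, back)   # (11): dw_ij = delta*eta*[...] * s_i
    return W1 + dW1, W2 + dW2               # (10), (12)
```

With δ = 0 the weights are returned unchanged, matching the zero-orientation case described above.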
Step 10, make the new state the current state;
Step 11, judge whether the termination condition is satisfied; if so, terminate; otherwise, go to Step 2.
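Steps 1 through 11 can be tied together on a toy task. Everything task-specific below is an assumption for illustration only: a 1-D integer state in [−3, 3] with ideal state 0, negative ideality ε(s) = |s|, two move actions, one-hot state coding, δ(Δε) = −tanh(Δε), and a simplified per-action reading of the step-9 update:

```python
import numpy as np

def run_controller(steps=200, eta=0.5, seed=0):
    """End-to-end sketch of the control loop (steps 1-11) on an assumed
    1-D task: integer state in [-3, 3], ideal state 0, eps(s) = |s|."""
    rng = np.random.default_rng(seed)
    l, m, n = 7, 5, 2                             # step 1: network N3[l, m, n]
    W1 = rng.normal(0.0, 0.1, (l, m))
    W2 = rng.normal(0.0, 0.1, (m, n))
    moves = (-1, 1)                               # action set: move left / right
    s = 3                                         # start away from the ideal state
    trajectory = [s]
    for _ in range(steps):
        code = np.zeros(l)
        code[s + 3] = 1.0                         # step 2: one-hot perception coding
        h = 1.0 / (1.0 + np.exp(-(code @ W1)))    # step 3: forward pass
        m_out = h @ W2
        p = np.abs(m_out) / (np.abs(m_out).sum() + 1e-12)  # step 4: probabilities
        a = int(np.argmax(p))                     # step 5: winner-take-all
        s_new = max(-3, min(3, s + moves[a]))     # step 6: state transition
        d_eps = abs(s_new) - abs(s)               # step 7: change in negative ideality
        delta = -np.tanh(d_eps)                   # step 8: orientation value
        W2[:, a] += delta * eta * h               # step 9.1: formulas (9)-(10)
        W1[s + 3, :] += delta * eta * W2[:, a] * h * (1.0 - h)  # step 9.2: (11)-(12)
        s = s_new                                 # step 10: new state becomes current
        trajectory.append(s)
        if s == 0:                                # step 11: termination condition
            break
    return trajectory
```

Actions that reduce |s| produce positive δ and are reinforced; actions that increase it are weakened, which is the operant-conditioning loop the claims describe.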
2. The bionic intelligence control method according to claim 1, characterized in that the perception layer described in step 1 makes no distinction between external perception information and internal perception information; both are transmitted and processed in the nervous system in a unified binary coded form.
3. The bionic intelligence control method according to claim 1, characterized in that the method of computing the orientation function value in step 8 is as follows:
The orientation function δ simulates orientation in biology and reflects an organism's tendency toward a state change. Consistent with the biological concept of orientation: when δ > 0 the orientation is positive, indicating that the state change tends toward the ideal state and that system performance tends to improve; when δ < 0 the orientation is negative, indicating that system performance tends to worsen; when δ = 0 the orientation is zero, indicating that system performance is unchanged;
The computation of the orientation function value is expressed as follows:

\delta_{ij} = \delta(\Delta\varepsilon_{ij}) \begin{cases} > 0, & \Delta\varepsilon_{ij} < 0 \\ = 0, & \Delta\varepsilon_{ij} = 0 \\ < 0, & \Delta\varepsilon_{ij} > 0 \end{cases}

In the formula, δ_ij denotes the orientation function value after state s_i transfers to state s_j, and Δε_ij = ε(s_j) − ε(s_i);
The orientation function δ is continuous on its interval of definition and is a monotonically decreasing function of Δε_ij; its absolute value increases monotonically with the absolute value of Δε_ij. When Δε_ij > 0, the negative ideality increases and system performance tends to worsen, so δ < 0, and the larger Δε_ij is, the smaller the orientation function δ is. Conversely, when Δε_ij < 0, the negative ideality decreases and system performance tends to improve, so δ > 0, and the smaller Δε_ij is, the larger the orientation function δ is. When Δε_ij = 0, the negative ideality is unchanged, system performance tends not to change, and thus the orientation function δ = 0.
4. The bionic intelligence control method according to claim 1, characterized in that the method of adjusting the weights in step 9 is as follows:
(1) compute the weight changes between the motion layer and the hidden layer with the formulas:

\Delta w_{jo} = \delta \cdot \eta \cdot h_j^*

w_{jo}(k+1) = w_{jo}(k) + \Delta w_{jo}

In the formulas, η denotes the learning rate, usually a constant in the range [0, 1], used to set the speed at which the weights change, i.e. the learning speed;
(2) compute the weight changes between the hidden layer and the perception layer with the formulas:

\Delta w_{ij} = \delta \cdot \eta \cdot w_{jo}(k) \cdot h_j^*(1 - h_j^*) \cdot s_i

w_{ij}(k+1) = w_{ij}(k) + \Delta w_{ij}

When δ > 0, system performance tends to improve; Δw_jo and Δw_ij are positive, so w_jo and w_ij increase, the corresponding action output increases, and the probability of that action being selected increases accordingly. When δ < 0, Δw_jo and Δw_ij are negative, so w_jo and w_ij decrease, the corresponding action output decreases, and the probability of that action being selected decreases accordingly. When δ = 0, Δw_jo and Δw_ij are zero, so w_jo and w_ij remain unchanged, the corresponding action output is unchanged, and the selection probability also remains unchanged. The operant conditioning reflex function is thus realized.
CN201410101272.0A 2014-03-18 2014-03-18 A kind of bionic intelligence control method Expired - Fee Related CN103886367B (en)


Publications (2)

Publication Number Publication Date
CN103886367A CN103886367A (en) 2014-06-25
CN103886367B true CN103886367B (en) 2016-08-17


