CN103886367B - A kind of bionic intelligence control method - Google Patents
- Publication number: CN103886367B
- Application number: CN201410101272.0A
- Authority
- CN
- China
- Prior art keywords
- delta
- function
- orientation
- formula
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Feedback Control In General (AREA)
- Manipulator (AREA)
Abstract
The present invention relates to a bionic intelligent control method. Traditional robot control methods reach only a limited level of intelligence: the robot cannot autonomously adapt to unknown environments, can hardly acquire the ability to handle complex tasks from simple experience, and cannot complete tasks through self-learning. To address these problems, the present invention simulates the biological sensorimotor nervous system from a bionic point of view and incorporates the operant conditioning mechanism into the design of the sensorimotor system. By replicating the sensorimotor system, the invention reproduces biological motor-neural cognition, which helps to simulate biological cognitive mechanisms and thus improve the cognitive ability of robots. The added operant conditioning function expresses the closed-loop feedback relation between "perception" and "motion" in the sensorimotor system, so the system can exhibit self-learning behavior similar to that of living organisms and raise the intelligence level of the robot.
Description
Technical field
The present invention relates to intelligent control, and in particular to a bionic intelligent control method that has an operant conditioning function and can simulate the sensorimotor system.
Technical background
Although robots designed by traditional methods have achieved great success in relieving humans of physical labor and in reducing human work in hazardous environments, their limited level of intelligence restricts the further spread and deepening of their applications. For example, they cannot adapt to complex, changeable, and unknown environments, and they can only perform specific tasks without autonomously developing new abilities. To raise the intelligence level of robots toward human cognitive ability, a new field of artificial intelligence and robotics, cognitive robotics, was born. Cognitive robotics involves neurophysiology, psychology, computer science, brain science, and other disciplines; its research draws nourishment from the existing theories of these related fields, and its development will in turn promote the progress of those disciplines and further affect related areas of the national economy. Against this disciplinary background, the present invention proposes a bionic intelligent control method that has an operant conditioning function and can simulate the sensorimotor system.
In 1938 the famous American psychologist B. F. Skinner first proposed the concept of operant conditioning and thereby founded operant conditioning theory. He borrowed Pavlov's concept of "reinforcement" and reshaped its meaning, dividing reinforcement into positive reinforcement and negative reinforcement: positive reinforcement increases the probability that the organism responds to a stimulus, while negative reinforcement increases the responses by which the organism eliminates the stimulus. Skinner's operant conditioning theory captures the link between the perception and the motion of living organisms: perception produces a reaction, and the reaction affects the probability of subsequent motion. This is the core of Skinner's theory.
" perception-motion " is the basis of biological motion neuro-cognitive.Written by Lyons, France the first university Jeannerod professor
" Motor Cognition:What Actions Tell to the Self " nervus motorius that describes of system for the first time recognize
The theoretical system known.In the theoretical system that nervus motorius is cognitive, perception and action play key player, as far back as last century five
In the ten's, Sperry just insists that " perception-action " ring (Perception-Action Cycle) is that nervous system runs
Basic logic.Therefore, replicate " perception-motion " system thus reappear biological motion neuro-cognitive, be to explore Cognition Mechanism, understanding
Cognitive behavior and then the reliable thinking of raising human-subject test.
Following this line of thought, the present invention proposes a bionic intelligent control method that has an operant conditioning function and can simulate the sensorimotor system, thereby improving the cognition and intelligence level of robots. Among related patents, the invention patents with application numbers CN200910086990.4 and CN200910089263.3 propose operant conditioning automaton models based on automata theory and discuss the application of these models to bionic autonomous learning control. The patent with application number 201410055115.0 proposes a robot obstacle-avoidance method based on Skinner's operant conditioning principle, realizing autonomous navigation of a robot without a tutor signal; the patent with application number 201310656943.5 applies the operant conditioning principle to the field of image processing, improving its precision and efficiency. None of these models, however, refers to biological motor cognition, and none incorporates operant conditioning into the design of a sensorimotor system. To date, no patent similar to the present invention has been found.
Summary of the invention
The level of intelligence reachable by traditional robot control methods is limited: the robot cannot autonomously adapt to unknown environments, can hardly acquire the ability to handle complex tasks from simple experience, and cannot complete tasks through self-learning. To address these problems of traditional control methods, the present invention proposes a bionic intelligent control method. The method simulates the biological sensorimotor nervous system from a bionic point of view and draws on operant conditioning theory, so that the system has an operant conditioning function, can better simulate the self-learning behavior of living organisms, and gives the robot good adaptability and a higher level of intelligence.
As shown in Fig. 1, the bionic intelligent control method proposed by the present invention runs iteratively according to the following steps:
Step 1, build the neural network model of the sensorimotor system; determine the number of neurons in each layer, i.e. the sensing layer, the hidden layer, and the motion layer; give the weight matrices W1 and W2 random values in [0, 1]; determine the initial perception state; and set the learning rate.
A 3-layer feedforward neural network N3[l, m, n] expresses the relation between perception and motion, as shown in Fig. 2. The input layer contains l neurons that represent the perceived information in coded form and constitute the so-called "sensing layer". The hidden layer contains m neurons that compute and process the information transmitted by the sensing layer. The two weight matrices W1 and W2, between the input layer and the hidden layer and between the hidden layer and the output layer, functionally simulate the information-processing center of the biological sensorimotor system. The output layer contains n neurons that represent the n actions of the action set and constitute the "motion layer". Information propagates in feed-forward fashion, flowing through the sensing layer, the hidden layer, and the motion layer, realizing the mapping from perception to motion.
Biological perception information includes external perception (such as vision, hearing, touch, taste, and smell) and internal perception (such as the perception of hunger or thirst). Although the receptors involved differ, the perception information they carry is transmitted and processed in the nervous system in a unified binary-coded form. Therefore, the sensing layer in the present invention does not distinguish between the two types of perception information.
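The layered structure of step 1 can be sketched in code. The patent's transfer functions (formulas (1)–(5)) are not reproduced in this text, so the logistic sigmoid used below is only an illustrative assumption, and the layer sizes shown are the values used in embodiment 1:

```python
import numpy as np

# Layer sizes follow the notation N3[l, m, n]; the values are embodiment 1's.
rng = np.random.default_rng(0)
l, m, n = 1, 100, 3                     # sensing, hidden, motion neurons
W1 = rng.uniform(0.0, 1.0, (l, m))      # sensing -> hidden weights in [0, 1]
W2 = rng.uniform(0.0, 1.0, (m, n))      # hidden -> motion weights in [0, 1]

def sigmoid(x):
    # Assumed transfer function; the patent's formulas (1)-(5) are not
    # reproduced in this text.
    return 1.0 / (1.0 + np.exp(-x))

def forward(s, W1, W2):
    """Feed-forward pass: sensing layer -> hidden layer -> motion layer."""
    s_bar = sigmoid(np.asarray(s, dtype=float))  # sensing-layer output
    h_bar = sigmoid(s_bar @ W1)                  # hidden-layer output
    m_bar = sigmoid(h_bar @ W2)                  # motion-layer output
    return s_bar, h_bar, m_bar
```

The forward pass realizes the perception-to-motion mapping of step 3 under these assumptions.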
Step 2, input the perception information of the current state into the sensing layer.
Step 3, compute the mapping from perception to motion according to the operating algorithm of the feedforward neural network.
Step 3.1, compute the sensing-layer output, with the following formula:
where s_i and s̄_i are respectively the input and output of the i-th sensing-layer neuron.
Step 3.2, compute the hidden-layer output, with the following formula:
where h_j and h̄_j are respectively the input and output of the j-th hidden-layer neuron, and w_ij is the connection weight between the i-th sensing-layer neuron and the j-th hidden-layer neuron.
Step 3.3, compute the motion-layer output, with the following formula:
where m_o and m̄_o are respectively the input and output of the o-th motion-layer neuron, and w_jo is the connection weight between the j-th hidden-layer neuron and the o-th motion-layer neuron.
Step 4, compute the action-set probabilities from the motion-layer output:
where p_o is the probability of the o-th action and |m̄_o| denotes the absolute value of m̄_o.
Step 5, execute the action of maximum probability according to the winner-take-all principle.
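Steps 4 and 5 can be sketched as follows. The normalization by summed absolute values follows the description of formula (6); the function names are illustrative:

```python
import numpy as np

def action_probabilities(m_bar):
    """Formula (6) as described: each action's probability is its
    motion-layer output magnitude divided by the sum of magnitudes."""
    a = np.abs(np.asarray(m_bar, dtype=float))
    return a / a.sum()

def winner_take_all(p):
    """Step 5: select the action of maximum probability."""
    return int(np.argmax(p))
```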
Step 6, compute the state transfer after the action is executed and record the new state.
Step 7, compute the difference of the negative ideality values before and after the state transfer.
The so-called negative ideality is a concept for evaluating the state corresponding to the perception information. In a given application, the larger the gap between the state corresponding to the perception information and the ideal state, the larger the negative ideality value, and vice versa. Let the state before the action be s_old, the state after it be s_new, and the negative ideality function be ε; then the difference of the negative ideality values before and after the state transfer is:

Δε = ε(s_new) − ε(s_old)    (7)
Step 8, compute the orientation function value.
The orientation function δ simulates the natural orientation of living organisms and reflects the degree to which the organism tends toward a state change. Consistent with the biological concept of orientation: when δ > 0 the orientation is positive, the state tends to change toward the ideal state, and system performance improves; when δ < 0 the orientation is negative and system performance tends to worsen; when δ = 0 the orientation is zero and system performance does not change. The orientation function value is computed as follows:
where δ_ij is the orientation function value after state s_i executes action a_k and transfers to state s_j, and Δε_ij = ε(s_j) − ε(s_i).
The orientation function δ is continuous on its domain of definition and is a monotonically decreasing function of Δε_ij; its absolute value increases monotonically with the absolute value of Δε_ij. When Δε_ij > 0, the negative ideality increases and system performance tends to worsen, so δ < 0, and the larger Δε_ij is, the smaller δ becomes. Conversely, when Δε_ij < 0, the negative ideality decreases and system performance tends to improve, so δ > 0, and the smaller Δε_ij is, the larger δ becomes. When Δε_ij = 0, the negative ideality is unchanged, system performance does not change, and thus δ = 0.
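Formula (8) itself is not reproduced in this text, but the stated properties pin down the shape of δ. A minimal sketch, assuming −tanh as one admissible choice:

```python
import math

def orientation(d_eps):
    """An illustrative orientation function delta(d_eps).  The patent's
    formula (8) is not reproduced here; -tanh is one function satisfying
    every stated property: continuous, monotonically decreasing in
    d_eps, |delta| growing with |d_eps|, and delta(0) = 0."""
    return -math.tanh(d_eps)
```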
Step 9, with reference to the gradient-descent error back-propagation algorithm, adjust the weights according to formulas (9)–(12):
Step 9.1, compute the weight changes between the motion layer and the hidden layer, with the formula:

w_jo(k+1) = w_jo(k) + Δw_jo    (10)

where η is the learning rate, usually a constant in [0, 1], which sets the speed at which the weights change, i.e. the learning speed.
Step 9.2, compute the weight changes between the hidden layer and the sensing layer, with the formula:

w_ij(k+1) = w_ij(k) + Δw_ij
When δ > 0, system performance tends to improve, and Δw_jo and Δw_ij are positive; w_jo and w_ij then increase, the corresponding action output increases, and the probability of that action being selected increases accordingly. When δ < 0, Δw_jo and Δw_ij are negative; w_jo and w_ij then decrease, the corresponding action output decreases, and its selection probability decreases accordingly. When δ = 0, Δw_jo and Δw_ij are zero; w_jo and w_ij are then unchanged, the corresponding action output is unchanged, and the selection probability also remains unchanged. This realizes the operant conditioning function.
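Formulas (9)–(12) are likewise not reproduced in this text. The sketch below is a hedged stand-in for the orientation-driven update, consistent with the behavior described above: δ > 0 raises, δ < 0 lowers, and δ = 0 leaves unchanged the weights feeding the chosen action (the sigmoid-derivative gradients are an assumption carried over from the assumed transfer function):

```python
import numpy as np

def update_weights(W1, W2, s_bar, h_bar, m_bar, chosen, delta, eta=0.1):
    """Sketch of step 9: move the weights feeding the chosen action in
    the direction of the orientation value delta, scaled by
    sigmoid-derivative local gradients (backprop-style)."""
    g_o = m_bar[chosen] * (1.0 - m_bar[chosen])        # motion-layer gradient
    dW2 = eta * delta * g_o * h_bar                    # hidden -> chosen action
    g_h = h_bar * (1.0 - h_bar) * W2[:, chosen] * g_o  # back-propagated term
    dW1 = eta * delta * np.outer(s_bar, g_h)           # sensing -> hidden
    W2 = W2.copy()
    W2[:, chosen] += dW2
    W1 = W1 + dW1
    return W1, W2
```

With positive layer outputs, the sign of every weight change follows the sign of δ, which is exactly the operant conditioning behavior stated in step 9.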
Step 10, take the new state as the current state.
Step 11, judge whether the termination condition is satisfied; if it is, terminate the program; otherwise, go to step 2.
Compared with the prior art, the present invention has the following advantages. It reproduces biological motor-neural cognition by replicating the sensorimotor system, which helps to simulate biological cognitive mechanisms and thereby improves robot cognition. The added operant conditioning function expresses the closed-loop feedback relation between "perception" and "motion" in the sensorimotor system, so the system can exhibit self-learning behavior similar to that of living organisms and raise the intelligence level of the robot. The invention can be used to realize bionic autonomous control of robots; it is simple and has high engineering application value.
Description of the drawings
Fig. 1 is the flow chart of the method of the present invention;
Fig. 2 is the system structure diagram of the present invention;
Fig. 3 shows the action probability curves of the robot pigeon in embodiment 1;
Fig. 4 shows the curve of the coded value corresponding to the perception state of the robot pigeon in embodiment 1;
Fig. 5 is a top view of the structure of the "wheeled circular robot" used in embodiment 2;
Fig. 6 shows the simulation environment and the experimental results of embodiment 2.
Detailed description of the invention
The method is described further below with reference to the drawings and embodiments. The system structure of the present invention is shown in Fig. 2, and the flow chart for implementing the method is shown in Fig. 1.
Embodiment 1: a robot pigeon with learning ability, imitating Skinner's pigeon experiment.
Skinner's pigeon experiment is one of the animal experiments Skinner designed to study and verify operant conditioning theory. In this experiment a pigeon is placed in a box facing three buttons: red, yellow, and blue. Pecking the red button gives the pigeon food; pecking the yellow button produces no stimulus; pecking the blue button gives an electric shock. Skinner found that at the beginning the pigeon pecks the three buttons roughly equally often, but after the experiment has run for some time the pigeon pecks the red button far more often than the yellow and blue buttons, thereby verifying the correctness of operant conditioning theory. The present embodiment reproduces Skinner's pigeon experiment to show that the invention can simulate animal learning behavior and exhibits strong self-learning ability.
The method steps are as follows:
1. Initialize. Let the number of sensing-layer neurons be l = 1, representing one of the three perception states "hungry", "half-full", and "full": a sensing-layer input of 1 means the perception state is "hungry", 2 means "half-full", and 3 means "full". Let the number of motion-layer neurons be n = 3, representing the three actions "peck the red button", "peck the yellow button", and "peck the blue button". The number of hidden-layer neurons m is set to 100 according to formula (13). Give the weight matrices W1 and W2 random values in [0, 0.05]; the initial state is "hungry".
2. Input the perception information of the current state into the sensing layer.
3. Compute the mapping from perception to motion according to the operating algorithm of the feedforward neural network, as in formulas (1)–(5).
4. Compute the action-set probability distribution from the motion-layer output according to formula (6).
5. Select and execute the action of maximum probability according to the winner-take-all principle.
6. Compute the state transfer after this action according to the following state-transition matrix (Table 1) and record it as the new state.
Table 1. Robot pigeon state-transition matrix
7. Define the negative ideality value of each state as shown in Table 2, then compute the difference of the negative ideality values before and after the state transfer according to formula (7).
Table 2. Negative ideality value of each state

State                   | Full | Half-full | Hungry
Negative ideality value | -10  | 0         | 10
8. Take formula (14), which satisfies the stated requirements on the orientation function, as the computation of the orientation function value, where Δε_ij = ε_j − ε_i is the change of the negative ideality value after the robot transfers from state s_i to state s_j, and δ_ij is the orientation function value of the state transfer.
9. With reference to the gradient-descent error back-propagation algorithm, adjust the weights according to formulas (9)–(12), with learning rate η = 0.1.
10. Take the new state as the current state.
11. Take reaching 100 learning iterations as the termination condition; if it is satisfied, terminate the program; otherwise, go to step 2.
As shown in Fig. 3, at the start of learning the robot pigeon chooses buttons to peck at random, so the probabilities of the three actions are equal. As learning proceeds, the probability of pecking the red button gradually increases while the probabilities of pecking the yellow and blue buttons gradually decrease, with the blue-button probability falling faster. After 100 learning iterations the probability of pecking the red button has approached 1 and the probabilities of pecking the yellow and blue buttons approach 0, showing that the robot pigeon has learned the optimal action and that operant conditioning has been established. As the robot pigeon gradually learns the optimal action, its perceived state also gradually tends toward the ideal, as shown in Fig. 4: at the beginning the robot pigeon is hungry; as learning proceeds it gradually learns to peck the red button, so its state changes toward "full"; after about 40 learning iterations the robot pigeon is stably full. The learning of the robot pigeon exhibits the formation of the biological "perception-action" function: the robot pigeon obtains its current state (hungry, half-full, or full) through internal perception, and when an executed action changes the perception state, the action probabilities of the robot pigeon change accordingly. Thus action can change perception, and perception in turn feeds back on action, realizing a closed loop of perception and action and finally embodying the biological characteristics of self-learning, adaptive intelligence.
Embodiment 2: a robot worm with learning ability.
Wiener observes in "Cybernetics" that a certain form of vision-neuromuscular feedback system is particularly important even in animals as low as worms. To let a machine exhibit the habits and behavior of a worm, and thereby discuss the similarity of control mechanisms in animals and machines, Wiener conceived a virtual machine worm in "Cybernetics". Although the machine worm he conceived differs from a real worm in form and composition, both have a similar sensation-motion system. This embodiment reproduces Wiener's machine worm through a negative phototaxis experiment, with the system model proposed by the invention acting as the nervous system of the robot worm.
Since this embodiment is not bionic in form, the implementation object is set as a wheeled circular robot with a light-intensity sensor. The robot has a radius of 5 cm; its running gear uses a two-wheel differential motion chassis, with wheels w_L and w_R on the left and right sides driven by DC servo motors, and a passive universal wheel w_F at the rear. A simplified diagram of the robot's mechanical structure is shown in Fig. 5.
The experimental environment is a square space of 4 m × 4 m, with a point light source placed at the geometric center (2, 2), as shown in Fig. 6. In the environment the light intensity is distributed radially and uniformly with distance from the source. If the coordinates of the light source are (x_light, y_light), then the illumination intensity Φ at coordinate (x, y) is computed according to formula (15), where Φ_max is the maximum light intensity, i.e. the intensity at the source, set to 10 units in this embodiment, and K is the light-intensity coefficient, set to 1 in this embodiment. Clearly, the light intensity at a location is proportional to Φ_max and inversely proportional to the square of the distance between that location and the source.
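Formula (15) itself is not reproduced in this text. The sketch below matches the stated properties (Φ_max at the source, inverse-square falloff with distance); the +1 in the denominator is an assumption added to keep the value at the source itself finite and equal to Φ_max:

```python
def light_intensity(x, y, x_light=2.0, y_light=2.0, phi_max=10.0, K=1.0):
    """Assumed form of formula (15): intensity proportional to phi_max
    and falling off with the squared distance to the source."""
    d2 = (x - x_light) ** 2 + (y - y_light) ** 2
    return phi_max / (1.0 + K * d2)
```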
The robot worm starts from the light source with a forward speed of 10 cm/s. Under these conditions the experiment is carried out according to the flow chart of Fig. 1, namely:
1. Initialize. Let the number of sensing-layer neurons be l = 1, with its input representing the perceived light intensity. Let the number of motion-layer neurons be n = 6, representing the six angles by which the robot may turn before advancing, i.e. [0°, 60°, 120°, 180°, 240°, 300°]; all angles are relative to the current heading, so 60° means rotating 60° counterclockwise from the current heading, and so on. The number of hidden-layer neurons m is set to 10 according to formula (13). Give the weight matrices W1 and W2 random values in [0, 0.1]; the initial state of the robot is the position of the light source, i.e. coordinate (2, 2).
2. Compute the light intensity according to formula (15) and input the perception information of the current state into the sensing layer.
3. Compute the mapping from perception to motion according to the operating algorithm of the feedforward neural network, as in formulas (1)–(5).
4. Compute the action-set probability distribution from the motion-layer output according to formula (6).
5. Select and execute the action of maximum probability according to the winner-take-all principle.
6. Compute the state transfer after this action according to formula (16) and record it as the new state.
where x_new and y_new are the horizontal and vertical coordinates of the robot after the action is selected, and similarly x_old and y_old are its coordinates before the selection; v is the moving speed of the robot, t_s is the sampling time of the robot's sensors, and θ_k is the radian value of the selected direction in the polar coordinate system whose pole is the robot's center and whose polar axis is the direction of advance.
7. The robot worm in this embodiment is negatively phototactic: it prefers darkness and avoids light. The light-intensity formula (15) is therefore taken as the computation of the negative ideality value (with K = 1), and the difference of the negative ideality values before and after the state transfer is then computed according to formula (7).
8. Compute the orientation function value according to formula (14).
9. With reference to the gradient-descent error back-propagation algorithm, adjust the weights according to formulas (9)–(12), with learning rate η = 0.5.
10. Take the new state as the current state.
11. Take the robot worm reaching the boundary of the environment as the termination condition; if it is satisfied, terminate the program; otherwise, go to step 2.
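Formula (16) in step 6 above is not reproduced in this text. Under the stated kinematics (turn counterclockwise by the selected angle relative to the current heading, then advance v·t_s along the new heading), a sketch might read as follows; t_s = 1 s is an assumed sampling time, while v = 0.10 m/s is the speed stated above:

```python
import math

def state_transfer(x_old, y_old, heading, turn_deg, v=0.10, t_s=1.0):
    """Sketch of formula (16): rotate counterclockwise by one of the six
    allowed angles relative to the current heading, then advance v*t_s
    along the new heading."""
    theta = heading + math.radians(turn_deg)   # new heading in radians
    x_new = x_old + v * t_s * math.cos(theta)
    y_new = y_old + v * t_s * math.sin(theta)
    return x_new, y_new, theta
```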
As shown in Fig. 6, the robot worm starts from the light source and moves along a path away from the light until it reaches the boundary of the environment, successfully simulating the negative phototactic behavior of a biological worm. In this process the robot worm exhibits a "sensation-motion" system like that of biological worms: it reacts to the perceived light-intensity signal, then selects the optimal action according to its biological orientation (avoiding light), and the selected action changes the state of the agent so that the perception information changes with it. Such a "perception-action" closed loop is established on the basis of operant conditioning. It can be seen from Fig. 6 that the initial trajectory of the robot worm does not strictly move away from the light, and there are episodes of turning back; but as the number of learning iterations increases, the action "advance toward where the light is weaker" satisfies the organism's inherent orientation and its probability keeps increasing, which shows in the experimental result as the trajectory gradually extending toward weaker light, with the turning-back phenomenon disappearing. This result shows that under the repeated action of "stimulus-response-reinforcement" the robot worm has established operant conditioning and exhibits light-avoiding behavior. The learned light-avoiding behavior reflects that the robot worm has achieved adaptation to the environment through self-learning, which demonstrates the effectiveness of the method in improving robot cognition and intelligence.
Claims (4)
1. A bionic intelligent control method, characterized in that the biological sensorimotor nervous system is simulated from a bionic point of view and the operant conditioning mechanism is incorporated into the design of the sensorimotor system, so that the system can better simulate the self-learning behavior of living organisms; the method comprises the following steps:
Step 1: build the neural network model of the perception-motion system; determine the number of neurons in each layer (sensing layer, hidden layer, and motion layer); take random values in [0, 1] for the weight matrices W1 and W2; determine the initial perception state; and set the learning rate.
A 3-layer feedforward neural network N3[l, m, n] expresses the relation between perception and motion. The input layer contains l neurons that encode the perceived information and constitute the sensing layer. The hidden layer contains m neurons that compute and process the information transmitted from the sensing layer. The two weight matrices, W1 between the input layer and the hidden layer and W2 between the hidden layer and the output layer, functionally simulate the information-processing center of the biological perception-motion system. The output layer contains n neurons, representing the n actions in the action set, and constitutes the motion layer. Information propagates in a feedforward manner, flowing forward through the sensing layer, hidden layer, and motion layer, realizing the mapping from perception to motion.
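As an illustration, the forward pass of steps 1 and 3 can be sketched as follows. The layer sizes, the sigmoid transfer function, and the example perception vector are assumptions made for this sketch; the claim itself fixes only the three-layer feedforward structure and the random [0, 1] initialization of W1 and W2.

```python
import numpy as np

# Minimal sketch of the 3-layer feedforward network N3[l, m, n]. The sigmoid
# transfer and the concrete sizes below are illustrative assumptions.
rng = np.random.default_rng(0)

l, m, n = 4, 6, 3                      # sensing, hidden, motion layer sizes (example)
W1 = rng.random((l, m))                # sensing -> hidden weights, random in [0, 1]
W2 = rng.random((m, n))                # hidden  -> motion weights, random in [0, 1]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(s):
    """Propagate a perception vector through sensing, hidden, and motion layers."""
    s_out = sigmoid(s)                 # sensing-layer output (assumed transfer)
    h_out = sigmoid(s_out @ W1)        # hidden-layer output
    m_out = sigmoid(h_out @ W2)        # motion-layer output, one value per action
    return s_out, h_out, m_out

s = np.array([1.0, 0.0, 1.0, 0.0])     # example binary-coded perception state
s_out, h_out, m_out = forward(s)
print(m_out.shape)                     # (3,) -- one output per action
```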
Step 2: input the perception information of the current state into the sensing layer.
Step 3: compute the mapping from perception to motion according to the operation algorithm of the feedforward neural network.
Step 3.1: compute the sensing-layer output; the formula is as follows:
In the formula, s_i and s_i' are respectively the input and the output of the i-th sensing-layer neuron.
Step 3.2: compute the hidden-layer output; the formula is as follows:
In the formula, h_j and h_j' are respectively the input and the output of the j-th hidden-layer neuron, and w_ij denotes the connection weight between the i-th sensing-layer neuron and the j-th hidden-layer neuron.
Step 3.3: compute the motion-layer output; the formula is as follows:
In the formula, m_o and m_o' are respectively the input and the output of the o-th motion-layer neuron, and w_jo denotes the connection weight between the j-th hidden-layer neuron and the o-th motion-layer neuron.
Step 4: according to the motion-layer output, compute the action-set probabilities; the formula is as follows:
In the formula, p_o denotes the probability corresponding to the o-th action, and |m_o'| denotes the absolute value of m_o'.
Step 5: execute the action of maximal probability according to the winner-take-all principle.
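Steps 4 and 5 can be sketched as follows, assuming only what the claims state: the motion-layer outputs are normalized by their absolute values into a probability distribution, and the winner-take-all rule executes the action of maximal probability.

```python
import numpy as np

def action_probabilities(m_out):
    """Step 4 sketch: normalize |motion-layer outputs| into probabilities."""
    a = np.abs(m_out)
    return a / a.sum()

def select_action(m_out):
    """Step 5 sketch: winner-take-all -- pick the action of maximal probability."""
    return int(np.argmax(action_probabilities(m_out)))

m_out = np.array([0.2, -0.7, 0.1])     # example motion-layer outputs
p = action_probabilities(m_out)
print(select_action(m_out))            # action 1 wins: |-0.7| is the largest magnitude
```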
Step 6: compute the state transfer after the action is executed and record the new state.
Step 7: compute the difference of the negative ideality values before and after the state transfer.
The so-called negative ideality is a concept for evaluating the state corresponding to the perception information: in a given application situation, the larger the gap between the state corresponding to the perception information and the ideal state, the larger the negative ideality value, and vice versa. Let the state before the action is executed be s_old, the state after execution be s_new, and the negative ideality function be ε; then the difference of the negative ideality values before and after the state transfer is:
Δε = ε(s_new) − ε(s_old)
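A toy illustration of step 7, in which the negative ideality ε(s) is taken, as an assumption for this example only, to be the light intensity at state s, so that weaker light means a state closer to the ideal:

```python
# Negative ideality for a toy light field: ε(s) is the light intensity at
# state s (an assumption for this example; weaker light = closer to ideal).
light_intensity = {"bright": 0.9, "dim": 0.4, "dark": 0.1}

def epsilon(state):
    return light_intensity[state]

# Step 7: difference of negative ideality before and after the state transfer.
delta_eps = epsilon("dim") - epsilon("bright")
print(delta_eps < 0)   # True: the move toward dimmer light is favourable
```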
Step 8: compute the orientation function value. The orientation function δ simulates the orientation of organisms in nature and reflects the organism's tendency toward a state change. Consistent with the biological concept of orientation: when δ > 0, the orientation is positive, meaning the state tends toward the ideal state and the system performance tends to improve; when δ < 0, the orientation is negative, meaning the system performance tends to worsen; when δ = 0, the orientation is zero, meaning the system performance does not change.
The computing formula of the orientation function value is expressed as follows:
In the formula, δ_ij denotes the orientation function value after state s_i is transferred to state s_j by executing action a_k, and Δε_ij = ε(s_j) − ε(s_i).
The orientation function δ is continuous on its interval of definition and monotonically decreasing in Δε_ij, and its absolute value increases monotonically with the absolute value of Δε_ij. When Δε_ij > 0, the negative ideality increases and the system performance tends to worsen, so δ < 0, and the larger Δε_ij is, the smaller δ is; conversely, when Δε_ij < 0, the negative ideality decreases and the system performance tends to improve, so δ > 0, and again the larger Δε_ij is, the smaller δ is. When Δε_ij = 0, the negative ideality is unchanged, the system performance tends not to change, and thus δ = 0.
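The claim constrains the orientation function rather than fixing a closed form. One concrete function satisfying every stated property (continuity, monotone decrease in Δε, δ(0) = 0, |δ| increasing with |Δε|) is the negative hyperbolic tangent; this choice is an illustration, not the patented formula:

```python
import math

# -tanh as an example orientation function: continuous, monotonically
# decreasing in Δε, zero at Δε = 0, with |δ| growing as |Δε| grows.
def orientation(delta_eps):
    return -math.tanh(delta_eps)

print(orientation(0.0))   # 0.0 -- zero orientation: no performance change
print(orientation(1.0))   # negative -- negative ideality grew, performance worsens
print(orientation(-1.0))  # positive -- negative ideality shrank, performance improves
```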
Step 9: with reference to the error back-propagation algorithm based on negative gradient descent, adjust the weights according to formulas (9)-(12):
Step 9.1: compute the weight changes between the motion layer and the hidden layer; the formulas are as follows:
w_jo(k+1) = w_jo(k) + Δw_jo (10)
In the formulas, η denotes the learning rate, usually a constant in the range [0, 1], which sets the speed of weight change, i.e., the learning speed.
Step 9.2: compute the weight changes between the hidden layer and the sensing layer; the formulas are as follows:
w_ij(k+1) = w_ij(k) + Δw_ij (12)
When δ > 0, indicating that the system performance tends to improve, Δw_jo and Δw_ij are positive, so w_jo and w_ij increase, the corresponding action output increases, and the probability of selecting that action increases accordingly. When δ < 0, Δw_jo and Δw_ij are negative, so w_jo and w_ij decrease, the corresponding action output decreases, and the probability of selecting that action decreases accordingly. When δ = 0, Δw_jo and Δw_ij are zero, so w_jo and w_ij remain unchanged, the corresponding action output is unchanged, and the selection probability also remains unchanged. The operant conditioning reflex function is thus realized.
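A single weight-update step in the spirit of step 9 can be sketched as follows. The exact gradient terms of formulas (9) and (11) are not reproduced here; the sketch only illustrates the sign behaviour described above, in which the orientation value δ plays the role of the back-propagated error:

```python
import numpy as np

def update_weights(W1, W2, s_out, h_out, action, delta, eta=0.1):
    """Nudge the weights on the executed action's path with the sign of delta.
    Illustrative simplification of formulas (9)-(12), not the exact gradients."""
    W1, W2 = W1.copy(), W2.copy()
    W2[:, action] += eta * delta * h_out            # hidden -> motion weights
    W1 += eta * delta * np.outer(s_out, h_out)      # sensing -> hidden weights
    return W1, W2

W1 = np.full((2, 3), 0.5)                           # 2 sensing, 3 hidden neurons
W2 = np.full((3, 2), 0.5)                           # 3 hidden, 2 motion neurons
s_out = np.array([1.0, 0.0])                        # active sensing neuron
h_out = np.array([0.5, 0.5, 0.5])

nW1, nW2 = update_weights(W1, W2, s_out, h_out, action=0, delta=1.0)
print(nW2[:, 0])   # each entry grew above 0.5: positive δ reinforces the action
```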
Step 10: make the new state the current state.
Step 11: judge whether the termination condition is satisfied; if it is satisfied, terminate; otherwise, go to step 2.
2. The bionic intelligence control method according to claim 1, characterized in that the sensing layer described in step 1 does not distinguish between external perception information and internal perception information; both are transmitted and processed in the nervous system in a unified binary-coded form.
3. The bionic intelligence control method according to claim 1, characterized in that the method of computing the orientation function value in step 8 is as follows:
The orientation function δ simulates the orientation of organisms in nature and reflects the organism's tendency toward a state change. Consistent with the biological concept of orientation: when δ > 0, the orientation is positive, meaning the state tends toward the ideal state and the system performance tends to improve; when δ < 0, the orientation is negative, meaning the system performance tends to worsen; when δ = 0, the orientation is zero, meaning the system performance does not change.
The computing formula of the orientation function value is expressed as follows:
In the formula, δ_ij denotes the orientation function value after state s_i is transferred to state s_j, and Δε_ij = ε(s_j) − ε(s_i).
The orientation function δ is continuous on its interval of definition and monotonically decreasing in Δε_ij, and its absolute value increases monotonically with the absolute value of Δε_ij. When Δε_ij > 0, the negative ideality increases and the system performance tends to worsen, so δ < 0, and the larger Δε_ij is, the smaller δ is; conversely, when Δε_ij < 0, the negative ideality decreases and the system performance tends to improve, so δ > 0, and again the larger Δε_ij is, the smaller δ is. When Δε_ij = 0, the negative ideality is unchanged, the system performance tends not to change, and thus δ = 0.
4. The bionic intelligence control method according to claim 1, characterized in that the method of adjusting the weights in step 9 is as follows:
(1) compute the weight changes between the motion layer and the hidden layer; the formulas are as follows:
w_jo(k+1) = w_jo(k) + Δw_jo
In the formulas, η denotes the learning rate, usually a constant in the range [0, 1], which sets the speed of weight change, i.e., the learning speed.
(2) compute the weight changes between the hidden layer and the sensing layer; the formulas are as follows:
w_ij(k+1) = w_ij(k) + Δw_ij
When δ > 0, the system performance tends to improve, Δw_jo and Δw_ij are positive, so w_jo and w_ij increase, the corresponding action output increases, and the probability of selecting that action increases accordingly. When δ < 0, Δw_jo and Δw_ij are negative, so w_jo and w_ij decrease, the corresponding action output decreases, and the probability of selecting that action decreases accordingly. When δ = 0, Δw_jo and Δw_ij are zero, so w_jo and w_ij remain unchanged, the corresponding action output is unchanged, and the selection probability also remains unchanged. The operant conditioning reflex function is thus realized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410101272.0A CN103886367B (en) | 2014-03-18 | 2014-03-18 | A kind of bionic intelligence control method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103886367A CN103886367A (en) | 2014-06-25 |
CN103886367B true CN103886367B (en) | 2016-08-17 |
Family
ID=50955250
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410101272.0A Expired - Fee Related CN103886367B (en) | 2014-03-18 | 2014-03-18 | A kind of bionic intelligence control method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103886367B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104614988B (en) * | 2014-12-22 | 2017-04-19 | 北京工业大学 | Cognitive and learning method of cognitive moving system with inner engine |
CN105045260A (en) * | 2015-05-25 | 2015-11-11 | 湖南大学 | Mobile robot path planning method in unknown dynamic environment |
CN105824251B (en) * | 2016-05-18 | 2018-06-15 | 重庆邮电大学 | It is a kind of based on neural network it is bionical become warm behavioral approach |
CN109212975B (en) * | 2018-11-13 | 2021-05-28 | 北方工业大学 | Cognitive learning method with development mechanism for perception action |
CN112558605B (en) * | 2020-12-06 | 2022-12-16 | 北京工业大学 | Robot behavior learning system based on striatum structure and learning method thereof |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102200787A (en) * | 2011-04-18 | 2011-09-28 | 重庆大学 | Robot behaviour multi-level integrated learning method and robot behaviour multi-level integrated learning system |
CN102915039A (en) * | 2012-11-09 | 2013-02-06 | 河海大学常州校区 | Multi-robot combined target searching method of animal-simulated space cognition |
CN102930252A (en) * | 2012-10-26 | 2013-02-13 | 广东百泰科技有限公司 | Sight tracking method based on neural network head movement compensation |
Non-Patent Citations (1)
Title |
---|
"具有操作条件反射机能的人工感觉运动系";黄静等;《控制理论与应用》;20150531;第32卷(第5期);全文 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103886367B (en) | A kind of bionic intelligence control method | |
Caligiore et al. | TRoPICALS: a computational embodied neuroscience model of compatibility effects. | |
Cangelosi et al. | Cognitive robotics | |
CN105700526A (en) | On-line sequence limit learning machine method possessing autonomous learning capability | |
Clark | The humanness of artificial non-normative personalities | |
Aberšek et al. | Virtual teacher: Cognitive approach to e-learning material | |
Cohen et al. | Social babbling: The emergence of symbolic gestures and words | |
CN105205533A (en) | Development automatic machine with brain cognition mechanism and learning method of development automatic machine | |
Lu et al. | An autonomous learning mobile robot using biological reward modulate STDP | |
Dimitrievska et al. | Behavior models of emotion-featured robots: A survey | |
Jian et al. | Research on intelligent cognitive function enhancement of intelligent robot based on ant colony algorithm | |
Hou et al. | Autonomous driving at the handling limit using residual reinforcement learning | |
CN116805158A (en) | Development network model for learning consciousness in biological brain | |
Tzafestas | An Introduction to Robophilosophy Cognition, Intelligence, Autonomy, Consciousness, Conscience, and Ethics | |
Yan et al. | Path Planning for Mobile Robot's Continuous Action Space Based on Deep Reinforcement Learning | |
Hilleli et al. | Toward deep reinforcement learning without a simulator: An autonomous steering example | |
Grüneberg et al. | An approach to subjective computing: A robot that learns from interaction with humans | |
Pecyna et al. | A deep neural network for finger counting and numerosity estimation | |
CN104570738A (en) | Robot track tracing method based on Skinner operant conditioning automata | |
Baldassarre et al. | Action-outcome contingencies as the engine of open-ended learning: computational models and developmental experiments | |
Weng | A model for auto-programming for general purposes | |
Dreyfus | Totally model-free learned skillful coping | |
Levinson et al. | Automatic language acquisition by an autonomous robot | |
Feng et al. | Robot intelligent communication based on deep learning and TRIZ ergonomics for personalized healthcare | |
He et al. | Evaluation and Analysis of the Intervention Effect of Systematic Parent Training Based on Computational Intelligence on Child Autism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20160817 |