CN105205533B - Developmental automaton with a brain cognitive mechanism and its learning method - Google Patents

Developmental automaton with a brain cognitive mechanism and its learning method

Info

Publication number
CN105205533B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510628233.0A
Other languages
Chinese (zh)
Other versions
CN105205533A (en)
Inventor
任红格
史涛
向迎帆
李福进
李冬梅
霍美杰
徐少彬
刘为民
张春磊
尹瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China University of Science and Technology
Original Assignee
North China University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China University of Science and Technology
Priority to CN201510628233.0A
Publication of CN105205533A
Application granted
Publication of CN105205533B
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Feedback Control In General (AREA)

Abstract

The present invention relates to a developmental automaton with a brain cognitive mechanism and its learning method, and belongs to the technical field of intelligent robots. The developmental automaton comprises an internal state set, a system output set, an internal operation behavior set, a state transition equation, a reward signal, a system evaluation function, a system action selection probability, and a dopamine response differential signal. Built on the learning automaton framework, the invention provides a mathematical model with strong generalization ability and a wide range of application for the autonomous development process of a system. The method combines the sensorimotor system with an intrinsic motivation mechanism, which improves the self-learning and adaptive abilities of the system and realizes intelligence in the true sense.

Description

Developmental automaton with a brain cognitive mechanism and its learning method
Technical field
The present invention relates to a developmental automaton with a brain cognitive mechanism and its learning method, and belongs to the technical field of intelligent robots.
Background art
Learning and memory are the essence of the intelligent behavior of humans and animals. The many skills of humans and animals form and develop gradually as their nervous systems learn and organize themselves. Studying and simulating the neural activity and self-regulating mechanisms of humans and animals, and endowing intelligent robots with them, is an important research topic in artificial intelligence and control science.
In 1996, J. Weng first proposed the idea of autonomous intelligent development for robots. He held that an agent should, on the basis of simulating the human brain and under the control of an intrinsic developmental program, develop its intelligence by interacting with an unknown environment through sensors and effectors. Brooks et al. emphasized that a robot gradually develops its intelligence through interactive learning with teachers and the environment. Drawing on neuroscience research, they proposed computational models that simulate regions such as the prefrontal lobe of the cerebral cortex, the hypothalamus, and the hippocampus of humans and animals in order to handle challenges in complex environments, which also involves the sensorimotor system. Early cognitive development starts from the formation and development of the coordination mechanism of the sensorimotor system, while the sensorimotor system is in turn continuously coordinated and refined as intrinsic motivation forms and develops. The neuroscience literature indicates that, during learning in humans and animals, the cerebral cortex, the basal ganglia, and the cerebellum work in parallel, each in its own characteristic way. With respect to movement, the cerebellum and the basal ganglia lie on either side of the motor signal pathway running from the cerebral cortex to the spinal cord, and they take part in the initiation and control of any behavioral action.
Among related patents, the invention patent with application number CN200910086990.4 proposed an operant automaton model based on automata theory and applied it to the autonomous learning control of robots. The patent with application number CN201310656943.5 applied the principle of operant conditioning to image processing, effectively improving the accuracy and speed with which the system processes images. The patent with application number 201410101272.0 proposed a bionic intelligent control method to address the low learning efficiency and poor adaptability of traditional robots, effectively raising the level of robot intelligence. Application number 201410163756.8 proposed an autonomous intelligent development cloud robot system based on cloud computing, which effectively lightens the burden of compute-intensive tasks on the robot and also enables knowledge sharing among different robots. However, none of the above patents involves a learning system that simulates the cognitive mechanism of the human brain.
Summary of the invention
To address the above technical problem, the present invention takes the biological sensorimotor system as its theoretical basis and introduces the intrinsic motivation mechanism from psychology to drive learning. It provides a developmental automaton with a brain cognitive mechanism and its learning method, which improve the autonomous developmental cognitive ability of robots.
The developmental automaton with a brain cognitive mechanism comprises an internal state set, a system output set, an internal operation behavior set, a state transition equation, a reward signal, a system evaluation function, a system action selection probability, and a dopamine response differential signal:
(1) $SC = [s_1, s_2, \ldots, s_j]$ denotes the finite internal state set, corresponding to the sensory cortex of the cerebral cortex; $s_j$ denotes the j-th state, and j is the number of internal states.
(2) $MC = [y_1, y_2, \ldots, y_i]$ denotes the system output set, corresponding to the motor cortex of the cerebral cortex; $y_i$ denotes the i-th output, and i is the number of outputs.
(3) $CbA = [a_1, a_2, \ldots, a_k]$ denotes the internal operation behavior set, corresponding to the cerebellar region; $a_k$ is the k-th internal action, and k is the number of internal actions.
(4) $f: s(t) \times a(t) \rightarrow s(t+1)$ is the state transition equation: the state $s(t+1)$ at time t+1 is jointly determined by the state $s(t)$ and the operation behavior $a(t)$ at time t, and is generally determined by the environment or a model.
(5) $r(t) = r(s(t), a(t))$ denotes the reward signal obtained after the system, in internal state $s(t)$ at time t, takes the internal operation behavior $a(t)$ and the state transfers to $s(t+1)$; it corresponds to the thalamic sensation issued by the thalamus.
(6) The input signal from the cerebral cortex consists of two parts, sensory cortex information and motor cortex information, which together serve as the input of the striatum. Therefore:
$$CC = \{SC, MC\} \tag{1}$$
The striatum mainly serves as the evaluation mechanism that predicts how good the orientation of the organism's operation is; more specifically, it is the evaluation mechanism for the quality of the orientation produced by the intrinsic motivation mechanism. The system evaluation function is defined as follows:
$$BG_{strio}(t) = r(t+1) + \gamma r(t+2) + \gamma^{2} r(t+3) + \ldots \tag{2}$$
where $\gamma \in [0, 1]$ is the discount factor. Owing to the intrinsic motivation mechanism, the evaluation function $BG_{strio}$ of the system gradually approaches 0, which ensures that the system ultimately reaches a stable state. Define η as the orientation core of the intrinsic motivation mechanism; its main function is to guide the direction of autonomous cognition. The orientation core η takes values in $[\eta_{\min}, \eta_{\max}]$, that is, between the function values of the best and the worst orientation. The intrinsic motivation orientation function in the striatum is then defined by formula (3):
$$\eta(t) = \frac{1 - e^{-\lambda BG_{strio}(t)}}{1 + e^{-\lambda BG_{strio}(t)}} \tag{3}$$
where λ is the parameter of the orientation function. The difference between the orientation function values at two adjacent moments is defined as $\theta(t) = \eta(t) - \eta(t-1)$ and is used to judge the degree of orientation of the system: if $\theta(t) > 0$, the orientation value at time t is larger than at time t-1; conversely, if $\theta(t) < 0$, the orientation value at time t is smaller than at time t-1.
(7) In the learning process of the basal ganglia, the matrix of the striatum mainly performs action selection. The most important feature of a learning process driven by the intrinsic motivation mechanism is that the action to execute is selected according to its probability. The Boltzmann probability rule, which is well known, is used to implement the action selection function of the matrix and thereby the probabilistic selection mechanism of the learning automaton. First define:
$$A = \mathrm{Boltz}_T\{E(s, a_k),\ k = 1, 2, \ldots, m\} \;\Leftrightarrow\; p(a = a_k) = \frac{e^{E(s, a_k)/T}}{\sum_{k=1}^{m} e^{E(s, a_k)/T}} \tag{4}$$
where m denotes the m-th internal action, so the sum runs over all m internal actions; A denotes the Boltzmann probability rule; and $p(a = a_k)$ denotes the selection probability of action $a_k$.
Following the definition in formula (4), the system action selection probability output by the striatal matrix, $BG_{matrix}(s, a)$, is used in place of $p(a = a_k)$, and substituting formula (2) into formula (4) gives formula (5):
$$BG_{matrix}(s, a) = \frac{e^{BG_{strio}(SC(t), a_k)/T}}{\sum_{k=1}^{m} e^{BG_{strio}(SC(t), a_k)/T}} \tag{5}$$
where T is the temperature constant, which expresses how random the action selection is: the larger T is, the more random the selection, and the smaller T is, the less random the selection. As T gradually tends to zero, the selection probability of the action with the largest evaluation value $BG_{strio}(SC(t), a_k)$ gradually tends to 1. The value of T in the system decreases over time, which reflects that the system's empirical knowledge gradually increases during learning and that the system gradually evolves from an unstable system into a stable one;
(8) The dopamine released by the substantia nigra pars compacta serves as the guiding signal for action evaluation. It is used to improve behavior so that actions yield the maximum future reward, and thus to obtain more accurate action execution. The evaluation function determined by the striatum at time t+1 is:
$$BG_{strio}(t+1) = r(t+2) + \gamma r(t+3) + \gamma^{2} r(t+4) + \ldots \tag{6}$$
Combining formula (2) and formula (6) gives formula (7):
$$BG_{strio}(t) = r(t+1) + \gamma BG_{strio}(t+1) \tag{7}$$
This shows that, at time t, the evaluation function $BG_{strio}(t)$ can be expressed through the evaluation function $BG_{strio}(t+1)$ at time t+1. However, because of the errors present in the early stage of prediction, the value of $BG_{strio}(t)$ expressed through the evaluation value $BG_{strio}(t+1)$ is not equal to the actual value. The reward information output by the thalamus and the output of the striatum therefore need to be processed in the substantia nigra pars compacta, which releases dopamine $SN_{DPA}$ to adjust the evaluation value. The dopamine response differential signal is given by formula (8):
$$SN_{DPA} = r(t+1) + \gamma BG_{strio}(t+1) - BG_{strio}(t) \tag{8}$$
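The quantities defined in items (6) to (8) can be sketched in a few lines of Python. This is a minimal illustration and not the patented implementation: the function names, the truncation of the infinite sum in formula (2) to a finite reward list, and the default λ = 1 are assumptions made for the example. Only γ = 0.9 is taken from the embodiment.

```python
import math


def evaluation_value(future_rewards, gamma=0.9):
    """Formula (2), truncated: r(t+1) + gamma*r(t+2) + gamma^2*r(t+3) + ...

    future_rewards[0] is r(t+1). Truncating the infinite sum to a finite
    list is an assumption made for this sketch.
    """
    return sum((gamma ** n) * r for n, r in enumerate(future_rewards))


def orientation(bg_strio, lam=1.0):
    """Formula (3): (1 - exp(-lam*BG)) / (1 + exp(-lam*BG)).

    Written as tanh(lam*BG/2), which is algebraically identical and avoids
    overflow for large negative arguments. lam=1.0 is an assumed default.
    """
    return math.tanh(0.5 * lam * bg_strio)


def boltzmann_probabilities(bg_values, temperature):
    """Formula (5): Boltzmann (softmax) probabilities with temperature T."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(bg_values)  # subtract the maximum for numerical stability
    weights = [math.exp((v - top) / temperature) for v in bg_values]
    total = sum(weights)
    return [w / total for w in weights]


def dopamine_signal(reward_next, bg_next, bg_now, gamma=0.9):
    """Formula (8): SN_DPA = r(t+1) + gamma*BG(t+1) - BG(t)."""
    return reward_next + gamma * bg_next - bg_now
```

When the evaluation values are exact, formula (7) holds and `dopamine_signal` returns zero; a nonzero value measures the prediction error that the text says must be corrected. Lowering `temperature` in `boltzmann_probabilities` concentrates the probability on the action with the largest evaluation value, which matches the described behavior as T tends to zero.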
The learning method of the developmental automaton with a brain cognitive mechanism comprises the following steps:
(1) Initialization: set the initial iteration step t = 0 and the number of learning iterations to $step_{max}$, and initialize the parameters and synaptic weights, so that all initial internal operation behaviors have the same probability of being executed when the experiment starts;
(2) perceive the current state $SC(t)$;
(3) compute the evaluation function $BG_{strio}(t)$ in the striatum and, because of the intrinsic motivation mechanism, compute the orientation function $\eta(t)$ from the current value of $BG_{strio}(t)$;
(4) according to the orientation quality, compute the action selection probability $BG_{matrix}(s, a)$ of the striatal matrix by formula (5), and have the cerebellum execute action $a(t)$;
(5) according to the state transition equation, the state transfers from $SC(t)$ to $SC(t+1)$;
(6) the thalamus issues the immediate reward $r(t)$ and triggers the dopamine response to adjust the evaluation value;
(7) the motor cortex of the brain outputs action $y(t)$;
(8) repeat steps (2) to (7) until $t = step_{max}$, at which point learning ends.
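The eight steps above can be sketched as a tabular learning loop. The sketch makes several assumptions the patent text does not state: the evaluation values are stored in a table indexed by state and action, the evaluation value is adjusted by adding `learning_rate` times the dopamine signal, the next-state value is taken as the maximum over actions, the temperature decays geometrically, and η is computed from the best evaluation value of the current state and only recorded. All names and default parameters except γ = 0.9 are hypothetical.

```python
import math
import random


def learn(transition, reward, n_states, n_actions, step_max=2000,
          gamma=0.9, learning_rate=0.1, t_start=1.0, t_decay=0.995,
          t_min=0.05, lam=1.0, seed=0):
    """Run the learning loop; return the evaluation table and theta(t) history."""
    rng = random.Random(seed)
    # Step (1): zero evaluation values give equal initial action probabilities.
    bg_strio = [[0.0] * n_actions for _ in range(n_states)]
    temperature = t_start
    state = 0  # step (2): the perceived state (assumed to start at state 0)
    eta_prev = 0.0
    theta_history = []
    for _ in range(step_max):
        # Step (3): evaluation values of the current state, orientation eta(t)
        # and its change theta(t) = eta(t) - eta(t-1).
        values = bg_strio[state]
        eta = math.tanh(0.5 * lam * max(values))
        theta_history.append(eta - eta_prev)
        eta_prev = eta
        # Step (4): Boltzmann action selection, formula (5).
        top = max(values)
        weights = [math.exp((v - top) / temperature) for v in values]
        action = rng.choices(range(n_actions), weights=weights, k=1)[0]
        # Step (5): state transition.
        next_state = transition(state, action)
        # Step (6): immediate reward and dopamine signal, formula (8).
        r = reward(state, action)
        sn_dpa = r + gamma * max(bg_strio[next_state]) - bg_strio[state][action]
        bg_strio[state][action] += learning_rate * sn_dpa  # assumed update rule
        # Step (7): in this toy setting the motor output y(t) is the action itself.
        state = next_state
        # Lower T over time so that selection becomes less random.
        temperature = max(t_min, temperature * t_decay)
    return bg_strio, theta_history
```

For example, with a single state and two actions where only the second action is rewarded, `learn(lambda s, a: 0, lambda s, a: float(a), n_states=1, n_actions=2)` ends with a higher evaluation value for the rewarded action.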
Compared with the prior art, the developmental automaton with a brain cognitive mechanism and its learning method provided by the present invention first of all, being built on the learning automaton framework, provide a mathematical model with strong generalization ability and a wide range of application for the autonomous development process of a system. Second, the method combines the sensorimotor system with the intrinsic motivation mechanism, which improves the self-learning and adaptive abilities of the system and realizes intelligence in the true sense.
Brief description of the drawings
Fig. 1 is a structural diagram of the system of the present invention;
Fig. 2 is a flow chart of the learning process of the present invention;
Fig. 3 shows the response curves of each state in the balance control of the coaxial two-wheeled robot of the embodiment;
Fig. 4 shows the simulated evaluation function and error curves in the balance control of the coaxial two-wheeled robot of the embodiment;
Fig. 5 shows the simulation results of the disturbance-rejection experiment of the embodiment;
Fig. 6 compares the evaluation function curves of the learning method of the embodiment and the traditional learning automaton method;
Fig. 7 compares the error curves of the learning method of the embodiment and the traditional learning automaton method.
Detailed description of the embodiments
The invention is further described below with reference to the accompanying drawings and a specific embodiment.
A coaxial two-wheeled robot is taken as the embodiment. The system structure is shown in Fig. 1, and learning follows the step flow of Fig. 2.
A nonholonomic two-wheeled self-balancing robot is an inherently unstable system. Before it can perform any motion, the robot must first be able to keep its balance, so posture balance is the primary condition for motion control of a coaxial two-wheeled robot. To verify the validity, robustness, and superiority of the developmental automaton with a brain cognitive mechanism proposed by the present invention, this embodiment takes a coaxial two-wheeled robot as its object and studies how the robot, in an unknown environment, finally acquires the balance-control skill through autonomous learning.
During the experiment the four output quantities of the robot must satisfy the following conditions: the angular velocities of the right and left wheels, $\theta_r$ and $\theta_l$, are less than 3.489 rad/s, the tilt angle of the body satisfies α < 0.1744 rad, and the angular velocity of the robot's swing rod satisfies β < 3.489 rad/s. The discount factor is γ = 0.9 and the sampling time is 0.01 s. In each experiment, when the number of trials of the robot exceeds 1000, or the number of balance steps in a single trial exceeds 20000, the learning of the robot is stopped and another experiment is started. If the robot can still keep its balance after 20000 steps in a single trial, it is considered to have learned the balance-control skill. After each failed trial, the initial state and the weights are reset to random values within a certain range, and learning starts again.
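The limits and stopping rules of the experimental protocol can be written down as constants and small checks. This is a sketch under stated assumptions: the function names are hypothetical, and comparing absolute values is an assumption, since the text only states that each quantity is "less than" its limit.

```python
MAX_WHEEL_RATE = 3.489     # rad/s, limit on theta_r and theta_l
MAX_TILT = 0.1744          # rad, limit on the body tilt angle alpha
MAX_ROD_RATE = 3.489       # rad/s, limit on the swing-rod angular velocity beta
SAMPLE_TIME = 0.01         # s per learning step
MAX_TRIALS = 1000          # trials per experiment before restarting
MAX_BALANCE_STEPS = 20000  # steps of balance that count as success


def within_limits(theta_r, theta_l, alpha, beta):
    """True while all four outputs satisfy the stated limits.

    Using absolute values (symmetric limits) is an assumption.
    """
    return (abs(theta_r) < MAX_WHEEL_RATE and abs(theta_l) < MAX_WHEEL_RATE
            and abs(alpha) < MAX_TILT and abs(beta) < MAX_ROD_RATE)


def skill_learned(balance_steps):
    """A trial counts as success once balance is held for 20000 steps."""
    return balance_steps >= MAX_BALANCE_STEPS


def steps_to_seconds(steps):
    """Convert a number of learning steps to elapsed time."""
    return steps * SAMPLE_TIME
```

With the 0.01 s sampling time, the 20000-step success criterion corresponds to 200 s of continuous balance.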
Experiment 1: Balance control experiment
In an unknown environment without disturbances, the robot learns continuously using the method proposed by the present invention. After 42 trials it completes the experiment on the 43rd trial, having learned the balance-control skill after about 220 steps, i.e. about 2.2 s, which demonstrates its fast autonomous learning ability and the effectiveness of the present invention. The response curves of each state variable, and the evaluation function and error curves, for the first 3000 steps of the simulation are shown in Fig. 3 and Fig. 4.
Experiment 2: Disturbance-rejection experiment
In actual operation, the input and output signals of a system are always disturbed to some extent by external noise, and inaccurate detection devices introduce errors into the state variables. To simulate a real environment, a pulse signal with an amplitude of 25 is therefore added to each input state variable once the robot has kept its balance for 9800 steps after learning balance control. If the robot withstands the pulse disturbance and keeps its balance, the experiment is considered successful and shows that the present invention has a certain robustness. Fig. 5 shows the output response of each state after the pulse signal is added. It can be seen that after 200 steps, i.e. about 2 s, the robot returns to the equilibrium position.
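The disturbance test can be sketched as a pulse injection plus a settling check. The pulse step (9800) and amplitude (25) come from the text; the one-sample pulse width, the settling tolerance, and both function names are assumptions made for illustration.

```python
def apply_pulse(state, step, pulse_step=9800, amplitude=25.0):
    """Add a one-sample pulse to every state variable at pulse_step."""
    if step == pulse_step:
        return [x + amplitude for x in state]
    return list(state)


def recovery_steps(trace, pulse_step, tolerance):
    """Steps after the pulse until every state stays within tolerance.

    trace is a list of state vectors, one per step. Returns None if the
    states have not settled by the end of the trace.
    """
    last_bad = None
    for step in range(pulse_step, len(trace)):
        if any(abs(x) > tolerance for x in trace[step]):
            last_bad = step
    if last_bad is None:
        return 0
    if last_bad == len(trace) - 1:
        return None
    return last_bad + 1 - pulse_step
```

Applied to a recorded state trace, `recovery_steps` gives the kind of figure quoted in the experiment, where the robot returns to equilibrium about 200 steps after the pulse.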
Experiment 3: Comparison between the present embodiment and a traditional learning automaton
Because the present invention introduces the intrinsic motivation mechanism to drive the autonomous learning of the robot, it helps to reduce the error of the system and to speed up the convergence of the algorithm. To demonstrate the superiority of the present invention, balance control experiments were carried out on the coaxial two-wheeled robot using the traditional learning automaton algorithm and the present invention respectively, and the experimental results were analyzed. The parameter settings of the two algorithms were identical in the experiments. Fig. 6 and Fig. 7 compare the evaluation function and error curves of the two algorithms over the first 2000 steps. Fig. 6 shows that the present invention completes the learning of the balance-control skill in about 220 steps, i.e. 2.2 s, whereas the traditional learning automaton method needs about 600 steps, i.e. 6 s, which demonstrates that the convergence speed of the present invention is better than that of the traditional learning automaton method. Fig. 7 shows that the error range of the present invention is better than that of the traditional learning automaton method, which is more conducive to the stability of the system.
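As a quick check of the reported figures, the step counts can be converted to time with the 0.01 s sampling time stated for the embodiment. The symbols $N$ and $T_s$ and the final ratio are introduced here only for illustration and do not appear in the original text.

```latex
t = N \cdot T_s, \qquad
220 \times 0.01\,\mathrm{s} = 2.2\,\mathrm{s}, \qquad
600 \times 0.01\,\mathrm{s} = 6\,\mathrm{s}, \qquad
\frac{600}{220} \approx 2.7
```

On these numbers the method of the invention converges in a bit more than one third of the steps needed by the traditional learning automaton.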

Claims (2)

  1. A developmental automaton with a brain cognitive mechanism, characterized in that it comprises an internal state set, a system output set, an internal operation behavior set, a state transition equation, a reward signal, a system evaluation function, a system action selection probability, and a dopamine response differential signal;
    (1) $SC = [s_1, s_2, \ldots, s_j]$ denotes the finite internal state set, corresponding to the sensory cortex of the cerebral cortex; $s_j$ denotes the j-th state, and j is the number of internal states;
    (2) $MC = [y_1, y_2, \ldots, y_i]$ denotes the system output set, corresponding to the motor cortex of the cerebral cortex; $y_i$ denotes the i-th output, and i is the number of outputs;
    (3) $CbA = [a_1, a_2, \ldots, a_k]$ denotes the internal operation behavior set, corresponding to the cerebellar region; $a_k$ is the k-th internal action, and k is the number of internal actions;
    (4) $f: s(t) \times a(t) \rightarrow s(t+1)$ is the state transition equation, that is, the state $s(t+1)$ at time t+1 is jointly determined by the state $s(t)$ and the operation behavior $a(t)$ at time t, and is determined by the environment or a model;
    (5) $r(t) = r(s(t), a(t))$ denotes the reward signal obtained after the system, in internal state $s(t)$ at time t, takes the internal operation behavior $a(t)$ and the state transfers to $s(t+1)$; it corresponds to the thalamic sensation issued by the thalamus;
    (6) the input signal from the cerebral cortex consists of two parts, sensory cortex information and motor cortex information, which together serve as the input of the striatum, therefore:
    $$CC = \{SC, MC\} \tag{1}$$
    The striatum mainly serves as the evaluation mechanism that predicts how good the orientation of the organism's operation is, and more specifically as the evaluation mechanism for the quality of the orientation produced by the intrinsic motivation mechanism; the system evaluation function is defined as follows:
    $$BG_{strio}(t) = r(t+1) + \gamma r(t+2) + \gamma^{2} r(t+3) + \ldots \tag{2}$$
    where $\gamma \in [0, 1]$ is the discount factor; owing to the intrinsic motivation mechanism, the evaluation function $BG_{strio}$ of the system gradually approaches 0, which ensures that the system ultimately reaches a stable state; η is defined as the orientation core of the intrinsic motivation mechanism, whose main function is to guide the direction of autonomous cognition; the orientation core η takes values in $[\eta_{\min}, \eta_{\max}]$, that is, between the function values of the best and the worst orientation; the intrinsic motivation orientation function in the striatum is then defined by formula (3):
    $$\eta(t) = \frac{1 - e^{-\lambda BG_{strio}(t)}}{1 + e^{-\lambda BG_{strio}(t)}} \tag{3}$$
    where λ is the parameter of the orientation function; the difference between the orientation function values at two adjacent moments is defined as $\theta(t) = \eta(t) - \eta(t-1)$ and is used to judge the degree of orientation of the system: if $\theta(t) > 0$, the orientation value at time t is larger than at time t-1; conversely, if $\theta(t) < 0$, the orientation value at time t is smaller than at time t-1;
    (7) in the learning process of the basal ganglia, the matrix of the striatum mainly performs action selection; the most important feature of a learning process driven by the intrinsic motivation mechanism is that the action to execute is selected according to its probability; the Boltzmann probability rule, which is well known, is used to implement the action selection function of the matrix and thereby the probabilistic selection mechanism of the learning automaton; first define:
    $$A = \mathrm{Boltz}_T\{E(s, a_k),\ k = 1, 2, \ldots, m\} \;\Leftrightarrow\; p(a = a_k) = \frac{e^{E(s, a_k)/T}}{\sum_{k=1}^{m} e^{E(s, a_k)/T}} \tag{4}$$
    where m denotes the m-th internal action, so the sum runs over all m internal actions; A denotes the Boltzmann probability rule; and $p(a = a_k)$ denotes the action selection probability;
    following the definition in formula (4), the system action selection probability output by the striatal matrix, $BG_{matrix}(s, a)$, is used in place of $p(a = a_k)$, and substituting formula (2) into formula (4) gives formula (5):
    $$BG_{matrix}(s, a) = \frac{e^{BG_{strio}(SC(t), a_k)/T}}{\sum_{k=1}^{m} e^{BG_{strio}(SC(t), a_k)/T}} \tag{5}$$
    where T is the temperature constant, which expresses how random the action selection is: the larger T is, the more random the selection, and the smaller T is, the less random the selection; as T gradually tends to zero, the selection probability of the action with the largest evaluation value $BG_{strio}(SC(t), a_k)$ gradually tends to 1; the value of T in the system decreases over time, which reflects that the system's empirical knowledge gradually increases during learning and that the system gradually evolves from an unstable system into a stable one;
    (8) the dopamine released by the substantia nigra pars compacta serves as the guiding signal for action evaluation and is used to improve behavior so that actions yield the maximum future reward, and thus to obtain more accurate action execution; the evaluation function determined by the striatum at time t+1 is:
    $$BG_{strio}(t+1) = r(t+2) + \gamma r(t+3) + \gamma^{2} r(t+4) + \ldots \tag{6}$$
    combining formula (2) and formula (6) gives formula (7):
    $$BG_{strio}(t) = r(t+1) + \gamma BG_{strio}(t+1) \tag{7}$$
    this shows that, at time t, the evaluation function $BG_{strio}(t)$ can be expressed through the evaluation function $BG_{strio}(t+1)$ at time t+1; however, because of the errors present in the early stage of prediction, the value of $BG_{strio}(t)$ expressed through the evaluation value $BG_{strio}(t+1)$ is not equal to the actual value, so the reward information output by the thalamus and the output of the striatum need to be processed in the substantia nigra pars compacta, which releases dopamine $SN_{DPA}$ to adjust the evaluation value; the dopamine response differential signal is given by formula (8):
    $$SN_{DPA} = r(t+1) + \gamma BG_{strio}(t+1) - BG_{strio}(t) \tag{8}$$
  2. The developmental automaton with a brain cognitive mechanism according to claim 1, characterized in that its learning method comprises the following steps:
    (1) initialization: set the initial iteration step t = 0 and the number of learning iterations to $step_{max}$, and initialize the parameters and synaptic weights, so that all initial internal operation behaviors have the same probability of being executed when the experiment starts;
    (2) perceive the current state $SC(t)$;
    (3) compute the evaluation function $BG_{strio}(t)$ in the striatum and, because of the intrinsic motivation mechanism, compute the orientation function $\eta(t)$ from the current value of $BG_{strio}(t)$;
    (4) according to the orientation quality, compute the action selection probability $BG_{matrix}(s, a)$ of the striatal matrix by formula (5), and have the cerebellum execute action $a(t)$;
    (5) according to the state transition equation, the state transfers from $SC(t)$ to $SC(t+1)$;
    (6) the thalamus issues the immediate reward $r(t)$ and triggers the dopamine response to adjust the evaluation value;
    (7) the motor cortex of the brain outputs action $y(t)$;
    (8) repeat steps (2) to (7) until $t = step_{max}$, at which point learning ends.
CN201510628233.0A 2015-09-29 2015-09-29 Developmental automaton with a brain cognitive mechanism and its learning method Expired - Fee Related CN105205533B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510628233.0A CN105205533B (en) 2015-09-29 2015-09-29 Developmental automaton with a brain cognitive mechanism and its learning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510628233.0A CN105205533B (en) 2015-09-29 2015-09-29 Developmental automaton with a brain cognitive mechanism and its learning method

Publications (2)

Publication Number Publication Date
CN105205533A CN105205533A (en) 2015-12-30
CN105205533B true CN105205533B (en) 2018-01-05

Family

ID=54953202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510628233.0A Expired - Fee Related CN105205533B (en) 2015-09-29 2015-09-29 Developmental automaton with a brain cognitive mechanism and its learning method

Country Status (1)

Country Link
CN (1) CN105205533B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105690392B (en) * 2016-04-14 2017-11-28 苏州大学 Motion planning and robot control method and apparatus based on actor reviewer's method
CN105824251B (en) * 2016-05-18 2018-06-15 重庆邮电大学 It is a kind of based on neural network it is bionical become warm behavioral approach
CN106598058A (en) * 2016-12-20 2017-04-26 华北理工大学 Intrinsically motivated extreme learning machine autonomous development system and operating method thereof
CN107894715A (en) * 2017-11-13 2018-04-10 华北理工大学 The cognitive development method of robot pose path targetpath optimization
CN108646550B (en) * 2018-04-03 2022-03-22 江苏江荣智能科技有限公司 Multi-agent formation method based on behavior selection
CN109002887A (en) * 2018-08-10 2018-12-14 华北理工大学 The heuristic curiosity cognitive development system of biology and its operation method
CN109212975B (en) * 2018-11-13 2021-05-28 北方工业大学 Cognitive learning method with development mechanism for perception action
CN112558605B (en) * 2020-12-06 2022-12-16 北京工业大学 Robot behavior learning system based on striatum structure and learning method thereof
CN113255765B (en) 2021-05-25 2024-03-19 南京航空航天大学 Cognitive learning method based on brain mechanism

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7668796B2 (en) * 2006-01-06 2010-02-23 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Automata learning algorithms and processes for providing more complete systems requirements specification by scenario generation, CSP-based syntax-oriented model construction, and R2D2C system requirements transformation
CN101673354A (en) * 2009-06-12 2010-03-17 北京工业大学 Operant conditioning reflex automatic machine and application thereof in control of biomimetic autonomous learning
CN101599137A (en) * 2009-07-15 2009-12-09 北京工业大学 Autonomous operant conditioning reflex automat and the application in realizing intelligent behavior

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
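A study on autonomous learning mechanism of Cognitive robot; Shi Tao; Control and Decision Conference; 20150525; full text *
Research on robust bionic learning algorithm in balance control for the robot; Shi Tao; Control and Decision Conference; 20150525; full text *
An intrinsic-motivation-driven FRBF network autonomous learning algorithm; Ren Hongge (任红格); Journal of Hebei United University (Natural Science Edition) (《河北联合大学学报(自然科学版)》); 20150731; Vol. 37 (No. 3); full text *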

Also Published As

Publication number Publication date
CN105205533A (en) 2015-12-30

Similar Documents

Publication Publication Date Title
CN105205533B (en) Developmental automaton with a brain cognitive mechanism and its learning method
Newell Change in movement and skill: Learning, retention, and transfer
Casellato et al. Adaptive robotic control driven by a versatile spiking cerebellar network
Tamosiunaite et al. Learning to pour with a robot arm combining goal and shape learning for dynamic movement primitives
Holland et al. Robots with internal models a route to machine consciousness?
Caligiore et al. TRoPICALS: a computational embodied neuroscience model of compatibility effects.
Salt et al. Parameter optimization and learning in a spiking neural network for UAV obstacle avoidance targeting neuromorphic processors
Caligiore et al. Integrating reinforcement learning, equilibrium points, and minimum variance to understand the development of reaching: a computational model.
CN107894715A (en) The cognitive development method of robot pose path targetpath optimization
Gumbsch et al. Autonomous identification and goal-directed invocation of event-predictive behavioral primitives
Weng et al. Modulation for emergent networks: Serotonin and dopamine
Zhang et al. Overview of deep reinforcement learning improvements and applications
CN103886367B (en) A kind of bionic intelligence control method
John et al. Modelling 3D saccade generation by feedforward optimal control
CN112405542B (en) Musculoskeletal robot control method and system based on brain inspiring multitask learning
Hilleli et al. Toward deep reinforcement learning without a simulator: An autonomous steering example
Guo et al. WWN-9: Cross-domain synaptic maintenance and its application to object groups recognition
Houbre et al. Balancing exploration and exploitation: a neurally inspired mechanism to learn sensorimotor contingencies
Preux et al. A generic architecture for adaptive agents based on reinforcement learning
CN112525194A (en) Cognitive navigation method based on endogenous and exogenous information of hippocampus-striatum
Ren et al. A computational model of cognitive development for the motor skill learning from curiosity
CN112766317A (en) Neural network weight training method based on memory playback and computer equipment
CN109002887A (en) The heuristic curiosity cognitive development system of biology and its operation method
Saquib Visual Tracking with Spiking Neural Networks in an Oculomotor Controller for a Biomimetic Model of the Eye
Vaidya et al. Reducing catastrophic forgetting in self organizing maps with internally-induced generative replay (student abstract)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180105

Termination date: 20180929
