CN101673354A - Operant conditioning reflex automatic machine and application thereof in control of biomimetic autonomous learning - Google Patents


Info

Publication number
CN101673354A
CN101673354A (application CN200910086990A)
Authority
CN
China
Prior art keywords
state
ocm
time
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200910086990A
Other languages
Chinese (zh)
Inventor
阮晓钢
郜园园
蔡建羡
陈静
戴丽珍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN200910086990A priority Critical patent/CN101673354A/en
Publication of CN101673354A publication Critical patent/CN101673354A/en
Pending legal-status Critical Current


Abstract

The invention provides an operant conditioning automaton model and, based on this model, designs a biomimetic autonomous learning control method. For the problem of controlling systems in the natural world, a biomimetic self-organizing learning approach is used to design an operant conditioning automaton model that can describe, simulate, and design systems with self-organizing functions (including autonomous learning and self-adaptation), so that bionics and psychology are applied effectively to system control. With the operant conditioning automaton model (OCM), an operation (control quantity) is first selected at random according to the current input and state of the system; operations with higher probability values, i.e., those with a better operant tendency, are more likely to be chosen. After the control is applied, the resulting state is observed and the control result is output to the outside. An orientation unit then evaluates the post-control state and modifies the probability values in the rule set, so that behaviors with a better operant tendency are continually acquired and a better behavior is more easily selected the next time. In this way autonomous control is finally realized.

Description

Operant conditioning automaton and its application in biomimetic autonomous learning control
Technical field
The present invention relates to a biomimetic machine based on the operant conditioning principle (Operant Conditioning Automaton, hereinafter abbreviated OCM). It uses computer technology, automatic control technology, bionics, psychology, and biology to realize biomimetic autonomous learning control.
Background art
The present invention is based on Skinner's theory of operant conditioning, which differs from Pavlov's classical conditioning. Classical conditioning is the process in which a response is elicited by a conditioned stimulus; its formula is S → R, the response is innate, and the stimulus acts as a reinforcement presented before the behavior. Operant conditioning, in contrast, first requires a certain operant response, which is then reinforced; its formula is R → S, the response is acquired, the reinforcement appears after the behavior, and its purpose is to make the subject learn the specific behavior desired by the experimenter. Accordingly, Skinner further distinguished two kinds of learning: classical-conditioning learning and operant-conditioning learning. The two kinds are equally important, but the reinforcing stimulus of operant conditioning has a clear purpose and is better suited to teaching the subject a specific behavior.
The automaton model of the present invention is built on the basis of the finite-state automaton. A general finite state machine is a five-tuple FSM = {A, Z, S, f, g}, where (1) A is the finite set of input symbols; (2) S is the finite set of (internal) state symbols, with s(0) ∈ S the initial state; (3) Z is the finite set of output (accepting) symbols; (4) f: S × A → S is the state transition function; and (5) g: S → Z is the output function.
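As an illustration only, a minimal Python sketch of this five-tuple follows; the two-state toggle example and all names in it are illustrative assumptions, not part of the patent.

class FSM:
    def __init__(self, A, S, Z, f, g, s0):
        self.A, self.S, self.Z = A, S, Z   # input, state, and output alphabets
        self.f, self.g = f, g              # f: S x A -> S, g: S -> Z
        self.s = s0                        # initial state s(0)

    def step(self, a):
        """Consume one input symbol, update the state, and return the output."""
        self.s = self.f[(self.s, a)]
        return self.g[self.s]

# Two-state toggle: every "tick" flips the state and outputs it.
fsm = FSM(A={"tick"}, S={"off", "on"}, Z={0, 1},
          f={("off", "tick"): "on", ("on", "tick"): "off"},
          g={"off": 0, "on": 1}, s0="off")
print([fsm.step("tick") for _ in range(4)])   # [1, 0, 1, 0]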
At present, similar invention patents are mainly based on finite-state automata or cellular automata. The cellular automata employed are mainly used to study general phenomena such as information transmission, computation, construction, growth, replication, and competition, but they have not been applied well to simulating the perceptual and cognitive behavior of animals. Examples are application (patent) No. 200610119136.X, entitled "Edge detection algorithm based on cellular automata", and application (patent) No. 200810031543.4, entitled "Multi-agent dynamic multi-target collaborative tracking method based on finite-state automata". No patent concerning an operant conditioning automaton and its applications has yet been found.
The present invention proposes an operant conditioning automaton model and, based on this model, designs a biomimetic autonomous learning control method. The objective of the invention is to show experimentally, with the Skinner pigeon experiment, that the method reproduces the operant conditioning learning mechanism, and to confirm, with the inverted pendulum control problem, the feasibility of using the method for model-free control of certain continuous-state control systems.
Summary of the invention
The present invention differs from traditional control methods in that it is based on the operant conditioning learning mechanism. Following the principle of the automaton, and aimed at the Skinner pigeon experiment and the inverted pendulum balance-control problem, it uses a biomimetic self-organizing (including self-learning and self-adaptive) learning method to design an operant conditioning automaton model that can be used to describe, simulate, and design systems with self-organizing (including self-learning and self-adaptive) functions, thereby effectively applying bionics, psychology, and biology to system control and realizing biomimetic autonomous learning control.
The operant conditioning automaton of the present invention is an eight-tuple
OCM = <A, S, O, Z, R, f, ψ, δ>,
where
(1) the input symbol set of the OCM: A = {a_j | j = 0, 1, 2, ..., n_A}, a_j being the j-th input symbol of the OCM;
(2) the internal state set of the OCM: S = {s_i | i = 0, 1, 2, ..., n_S}, s_i being the i-th state symbol of the OCM;
(3) the internal operation set of the OCM: O = {o_k | k = 1, 2, ..., n_O}, o_k being the k-th operation symbol of the OCM;
(4) the output symbol set of the OCM: Z = {z_m | m = 0, 1, 2, ..., n_Z}, z_m being the m-th output symbol of the OCM;
(5) the rule set of the OCM: R = {r_ijk | i ∈ {0, 1, 2, ..., n_S}; j ∈ {0, 1, 2, ..., n_A}; k ∈ {1, 2, ..., n_O}}, each element r_ijk ∈ R representing a random "condition-operation" rule
r_ijk: s_i × a_j → o_k (p_ijk)
i.e., when the OCM is in state s_i (∈ S) with input a_j (∈ A), it executes operation o_k (∈ O) with probability p_ijk, where p_ijk = p(o_k | s_i ∩ a_j) is the probability that the OCM executes operation o_k in state s_i with input a_j, also called the excitation probability of rule r_ijk.
(6) the state space equations of the OCM:
f:  f_S: S(t) × A(t) × O(t) → S(t+1)
    f_Z: S(t) × A(t) × O(t) → Z(t+1)
where f_S is the state transition equation of the OCM: the state s(t+1) (∈ S) of the OCM at time t+1 is determined by the state s(t) (∈ S), the input a(t) (∈ A), and the operation o(t) (∈ O) at time t, and is independent of the states, inputs, and operations before time t; moreover, f_S may be unknown, but the result of a state transition is observable by the OCM itself. f_Z is the output equation of the OCM: the output z(t+1) (∈ Z) at time t+1 is determined by the state s(t), the input a(t), and the operation o(t) at time t, and is independent of the states, inputs, and operations before time t; the output of the OCM is observable by the outside world.
(7) the state orientation function of the OCM: ψ: S × A → [h, q], where h is defined as the worst orientation function value and q as the best orientation function value (orientation is defined here in the biological sense: the direction in which the environment drives biological evolution, i.e., the orientation of the organism). The values of h and q can be chosen according to the concrete object being handled. For any state s_i (∈ S) and input a_j (∈ A), ψ_ij = ψ(s_i, a_j) is the expectation value of the OCM for state s_i and input a_j; if ψ_ij < 0, s_i is called a negatively oriented state of the OCM under input a_j; if ψ_ij = 0, a zero-oriented state; if ψ_ij > 0, a positively oriented state.
(8) the operant conditioning learning rule δ of the OCM:
If the state of the OCM at time t is s(t) = s_a ∈ S and the input is a(t) = a_b ∈ A, an operation o(t) = o_c ∈ O is chosen according to the random "condition-operation" rules in the set R, and after the operation is executed the state at time t+1 is observed to be s(t+1) = s_d ∈ S. Then, based on the operant conditioning principle, the excitation probabilities p_abk (k = 1, 2, ..., n_O) of the random "condition-operation" rules in R are adjusted according to
δ:  p_abk(t+1) = p_abk(t) - ξ(Δψ_abc)·p_abk(t)   for all k ≠ c
    p_abk(t+1) = maxmin(p_abk(t+1), 0, 1)        for all k ≠ c
    p_abc(t+1) = 1 - Σ_{k≠c} p_abk(t+1)
where Δψ_abc = ψ(s_d, a_b) - ψ(s_a, a_b) is the change of the orientation function value after the OCM, in state s_a (∈ S) with input a_b (∈ A), executes operation o_c (∈ O) and the state changes to s_d (∈ S); this change is used to judge the quality of the operation. ξ(·) is a monotonically increasing function with ξ(x) = 0 if and only if x = 0; it is parameterized by r, the total number of operation rules, and λ, the learning rate, i.e., the speed of each learning iteration. p_abc(t) (a ∈ {0, 1, 2, ..., n_S}; b ∈ {0, 1, 2, ..., n_A}; c ∈ {1, 2, ..., n_O}) is the value at time t of the probability p(o_c | s_a ∩ a_b) that the OCM, in state s_a (∈ S) with input a_b (∈ A), executes operation o_c (∈ O). When Δψ_abc < 0, the orientation function value after executing o_c (∈ O) and transferring to s_d (∈ S) has become smaller, i.e., the orientation has worsened; then p_abc(t+1) < p_abc(t), meaning that the probability of choosing o_c (∈ O) at the next moment decreases. When Δψ_abc = 0, the orientation function value is unchanged, i.e., the orientation is unchanged; then p_abc(t+1) = p_abc(t), and the probability of choosing o_c (∈ O) at the next moment is unchanged. When Δψ_abc > 0, the orientation function value has become larger, i.e., the orientation has improved; then p_abc(t+1) > p_abc(t), and the probability of choosing o_c (∈ O) at the next moment increases. Here maxmin(p_abk(t+1), 0, 1) sets p_abk(t+1) = 1 when p_abk(t+1) > 1 and p_abk(t+1) = 0 when p_abk(t+1) < 0, which guarantees p_abk(t+1) ∈ [0, 1], and Σ_{k=1}^{n_O} p_abk(t+1) = 1, i.e., the probabilities of the different operations under the same input and the same state sum to 1. When t → ∞, if p_abc(t) → 1, then operation o_c (∈ O) is the optimal behavior in state s_a (∈ S) with input a_b (∈ A). In general a learning iteration count Tf or an optimal-behavior probability threshold p_ε is given, and learning stops when the iteration count is reached or when, in some state s_a (∈ S) with input a_b (∈ A), the probability p_abc(t) of executing operation o_c (∈ O) satisfies p_abc(t) ≥ p_ε; p_ε ∈ [0.7, 1] is set according to the actual system environment and is usually taken as p_ε = 0.9.
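As an illustration of the learning rule δ, a minimal Python sketch follows. The exact form of ξ is only characterized above (monotonically increasing, zero only at zero); the linear form λ·x/r used here, and all function and variable names, are assumptions for illustration rather than the patent's reference implementation.

import numpy as np

def xi(dpsi, lam=0.05, r=3):
    """Assumed form of xi: monotonically increasing and xi(0) = 0."""
    return lam * dpsi / r

def delta_update(p_ab, c, dpsi, lam=0.05, r=3):
    """One application of the rule delta to the excitation probabilities.

    p_ab : array of p(o_k | s_a, a_b) over the operations k
    c    : index of the operation that was actually executed
    dpsi : change of the orientation function value caused by the transition
    """
    p = p_ab.astype(float)
    for k in range(len(p)):
        if k != c:
            # unselected operations move opposite to the orientation change,
            # clipped to [0, 1] as maxmin(., 0, 1) prescribes
            p[k] = np.clip(p[k] - xi(dpsi, lam, r) * p[k], 0.0, 1.0)
    p[c] = np.clip(1.0 - (p.sum() - p[c]), 0.0, 1.0)   # renormalize the executed rule
    return p / p.sum()                                 # guard so the row stays a distribution

# If the orientation improved (dpsi > 0), the executed operation gains probability.
print(delta_update(np.array([1/3, 1/3, 1/3]), c=0, dpsi=1.0))

With dpsi = +1 and three rules, the executed operation's probability rises from 1/3 to about 0.344 while the other two fall slightly, matching the qualitative behavior described above.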
An important feature of the present invention is that it simulates the biological operant conditioning mechanism and therefore has biomimetic self-organizing functions, including self-learning and self-adaptation, and can be used to describe, simulate, and design various self-organizing systems.
The technical scheme of the present invention is shown in Fig. 1 and Fig. 2.
The steps of the method of the present invention are as follows:
(1) Set the initial conditions of the experiment. Give the initial state s(0) of the OCM, the initial input a(0) of the OCM, the learning rate λ, and the initial excitation probability p_ijk(0) = 1/r of each random "condition-operation" rule r_ijk (i ∈ {0, 1, 2, ..., n_S}; j ∈ {0, 1, 2, ..., n_A}; k ∈ {1, 2, ..., n_O}) in R; give the number of learning iterations Tf or the optimal-behavior probability threshold p_ε. λ, Tf, and p_ε are determined by the experimental requirements and the environment; typically λ = 0.05, Tf = 1000, p_ε = 0.9.
(2) Randomly select and execute an operation. According to the state s(t) ∈ S and input a(t) ∈ A of the OCM at time t and the values p_ijk(t) at time t of the excitation probabilities of the random "condition-operation" rules r_ijk (i ∈ {0, 1, 2, ..., n_S}; j ∈ {0, 1, 2, ..., n_A}; k ∈ {1, 2, ..., n_O}) in R, randomly select an operation o(t) ∈ O at time t following the probability distribution p_ijk(t) of the operations under the current state. If the state of the OCM at time t is s(t) = s_a, the input is a(t) = a_b, and the chosen operation at time t is o(t) = o_c, the state of the OCM makes a transition according to the state transition equation f_S: S(t) × A(t) × O(t) → S(t+1).
(3) Operant conditioning. If the state s(t+1) = s_d ∈ S is observed at time t+1, the operant conditioning unit δ adjusts the excitation probability of the random "condition-operation" rule r_abc; its value at time t+1 is
δ:  p_abk(t+1) = p_abk(t) - ξ(Δψ_abc)·p_abk(t)   for all k ≠ c
    p_abk(t+1) = maxmin(p_abk(t+1), 0, 1)        for all k ≠ c
    p_abc(t+1) = 1 - Σ_{k≠c} p_abk(t+1)
where maxmin(p_abk(t+1), 0, 1) sets p_abk(t+1) = 1 when p_abk(t+1) > 1 and p_abk(t+1) = 0 when p_abk(t+1) < 0, guaranteeing p_abk(t+1) ∈ [0, 1], and Σ_{k=1}^{n_O} p_abk(t) = 1.
(4) Output Z(t+1) to the outside world through the output equation f_Z: S(t) × A(t) × O(t) → Z(t+1) of the system.
(5) Repeat steps (2)-(4) until the number of learning iterations Tf is reached or p_abc(t+1) > p_ε, at which point the experiment stops; a sketch of this loop is given below.
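A minimal Python sketch of steps (1)-(5) as a generic loop follows, reusing the delta_update sketch above; the env_step callback (which plays the role of f_S and also supplies the next input) and all names are assumptions for illustration only.

import numpy as np

def ocm_learn(env_step, orientation, n_states, n_inputs, n_ops,
              s0=0, a0=0, lam=0.05, Tf=1000, p_eps=0.9, seed=0):
    """Generic OCM learning loop, a sketch of steps (1)-(5).

    env_step(s, a, c) -> (s_next, a_next): plays the role of f_S and also
        supplies the next input, which the patent leaves to the environment.
    orientation(s, a): the state orientation function psi.
    """
    rng = np.random.default_rng(seed)
    P = np.full((n_states, n_inputs, n_ops), 1.0 / n_ops)   # step (1): p_ijk(0) = 1/r
    s, a = s0, a0
    for t in range(Tf):
        c = rng.choice(n_ops, p=P[s, a])                    # step (2): draw an operation
        s_next, a_next = env_step(s, a, c)
        dpsi = orientation(s_next, a) - orientation(s, a)
        P[s, a] = delta_update(P[s, a], c, dpsi, lam=lam, r=n_ops)   # step (3)
        if P[s, a].max() >= p_eps:                          # step (5): a behavior dominates
            break
        s, a = s_next, a_next                               # step (4): output / advance
    return P

The two embodiments below differ only in the env_step, orientation, and parameter choices that would be passed to such a loop.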
The flow chart of the method of the present invention is shown in Fig. 3.
The advantage of the present invention is that it can simulate and imitate the adaptability of natural life to changing circumstances, giving machine life thinking, memory, and learning capabilities. A machine life form with cognitive behavior or cognitive ability can thus not only change its own behavior but also improve it, making the behavior of machine life more biomimetic and more intelligent.
Description of drawings
Fig. 1 is a structural diagram of the present invention.
In Fig. 1: 1 input symbol set, 2 internal state set, 3 internal operation set, 4 output symbol set, 5 random "condition-operation" rule set, 6 state space unit, 7 state orientation function, 8 operant conditioning learning rule.
Fig. 2 is the application structure block diagram of the present patent.
Fig. 3 is the flow chart of the method of the present patent.
Fig. 4 shows the counts of the three behaviors in the Skinner pigeon experiment: 1 red button, 2 yellow button, 3 blue button.
Fig. 5 shows the iterative-learning simulation of the Skinner pigeon experiment over 1000 training iterations: (a) the behavior counts and (b) the evolution of the operation probabilities. 1 red button, 2 yellow button, 3 blue button.
Fig. 6 (a) is the deflection-angle curve and Fig. 6 (b) the deflection angular-velocity curve of the inverted pendulum balance-control experiment with the deterministic model.
Fig. 7 (a) is the deflection-angle curve and Fig. 7 (b) the deflection angular-velocity curve of the inverted pendulum balance-control experiment with the stochastic model.
Embodiments
Embodiment one: the Skinner operant conditioning pigeon experiment, shown in Figs. 4 and 5.
The training goal of the Skinner pigeon experiment is to make the pigeon learn the operant behavior of pecking the red button. When it pecks the red button it obtains food (a positive reinforcement stimulus); when it pecks the yellow button there is no stimulus; when it pecks the blue button it receives an electric shock (a negative reinforcement stimulus). The experiment is carried out with the Skinner operant conditioning automaton model, as shown in Figs. 1, 2, and 3.
First, a simplified discrete mathematical model of the pigeon experiment is given. Suppose the pigeon has three states: the hungry state, the half-hungry state, and the not-hungry state. When the pigeon is hungry, giving it food moves it to the half-hungry state, while giving it no food or giving it an electric shock leaves it hungry; the output is the pigeon's state at that moment. When the pigeon is half-hungry, giving it food moves it to the not-hungry state, while giving it no food or an electric shock moves it back to the hungry state; the output is the pigeon's state at that moment. When the pigeon is not hungry, giving it food keeps it not hungry, giving it no food moves it to the half-hungry state, and giving it an electric shock moves it to the hungry state; the output is the pigeon's state at that moment. The state transition equation f_S: S(t) × A(t) × O(t) → S(t+1) of the model is therefore:
f(s_0, a_0, o_1) = s_1    f(s_1, a_0, o_1) = s_2    f(s_2, a_0, o_1) = s_2
f(s_0, a_1, o_2) = s_0    f(s_1, a_1, o_2) = s_0    f(s_2, a_1, o_2) = s_1
f(s_0, a_2, o_3) = s_0    f(s_1, a_2, o_3) = s_0    f(s_2, a_2, o_3) = s_0
The input symbol set of the pigeon experiment is A = {a_0, a_1, a_2}, where a_0 means giving the pigeon food when it pecks the red button, a_1 means giving no food when it pecks the yellow button, and a_2 means an electric-shock stimulus when it pecks the blue button. The state set is S = {s_0, s_1, s_2}, where s_0 is the hungry state, s_1 the half-hungry state, and s_2 the not-hungry state. The operation set is O = {o_1, o_2, o_3}, where o_1 means the pigeon pecks the red button, o_2 the yellow button, and o_3 the blue button; at the beginning the pigeon pecks the red, yellow, and blue buttons at random. The rule set R is r_ijk: s_i × a_j → o_k (p_ijk), i.e., when the pigeon is in state s_i (∈ S) with input a_j (∈ A) it executes operation o_k (∈ O) with probability p_ijk, where p_ijk = p(o_k | s_i ∩ a_j) is the probability of executing o_k in state s_i with input a_j, also called the excitation probability of rule r_ijk. The discrete state orientation function of the pigeon is set to ψ: S × A → {-1, 0, 1, 2, 3}, with the concrete values:
ψ_00(s_0, a_0) = 1    ψ_10(s_1, a_0) = 2    ψ_20(s_2, a_0) = 3
ψ_01(s_0, a_1) = 0    ψ_11(s_1, a_1) = 1    ψ_21(s_2, a_1) = 2
ψ_02(s_0, a_2) = -1   ψ_12(s_1, a_2) = 0    ψ_22(s_2, a_2) = 1
ψ_00(s_0, a_0) = 1 means that when the pigeon is hungry and is given food, its state orientation function value is relatively large, namely 1;
ψ_01(s_0, a_1) = 0 means that when the pigeon is hungry and is given no food, its state orientation function value is 0;
ψ_02(s_0, a_2) = -1 means that when the pigeon is hungry and is given an electric shock, its state orientation function value is relatively small, namely -1.
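For illustration, this simplified pigeon model can be written as two lookup tables; the Python sketch below repeats the values above, and the dict representation itself is an assumption.

# f_S(s, a, o) -> next state; states: 0 hungry, 1 half-hungry, 2 not hungry;
# inputs: 0 food, 1 nothing, 2 shock; operations: 1 red, 2 yellow, 3 blue.
F = {(0, 0, 1): 1, (1, 0, 1): 2, (2, 0, 1): 2,   # peck red    -> food
     (0, 1, 2): 0, (1, 1, 2): 0, (2, 1, 2): 1,   # peck yellow -> nothing
     (0, 2, 3): 0, (1, 2, 3): 0, (2, 2, 3): 0}   # peck blue   -> shock

# state orientation function psi(s_i, a_j)
PSI = {(0, 0): 1, (1, 0): 2, (2, 0): 3,
       (0, 1): 0, (1, 1): 1, (2, 1): 2,
       (0, 2): -1, (1, 2): 0, (2, 2): 1}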
The basic steps of the pigeon experiment are as follows:
(1) Set the initial conditions of the experiment. The initial input is set to giving the pigeon food, a_0, and the initial state is set to the hungry state s_0. The initial probability of the pigeon pecking each of the three buttons is set to 1/3, i.e., at the beginning the pigeon has an equal chance of pecking each button. The learning rate is λ = 0.05 and the optimal-behavior probability threshold is p_ε = 0.97.
(2) Randomly select and execute an operation. Suppose at time t the observed state of the pigeon is s_a ∈ S, the input is a_b ∈ A, and the state orientation function value is ψ_ab ∈ ψ. According to the random "condition-operation" rules in R, an operation o_c ∈ O is chosen following the probability distribution p_ijk(t) of the operations at time t. After o_c ∈ O is executed, the state of the pigeon makes a transition according to the state transition function f_S: S(t) × A(t) × O(t) → S(t+1), i.e., according to
f(s_0, a_0, o_1) = s_1    f(s_1, a_0, o_1) = s_2    f(s_2, a_0, o_1) = s_2
f(s_0, a_1, o_2) = s_0    f(s_1, a_1, o_2) = s_0    f(s_2, a_1, o_2) = s_1
f(s_0, a_2, o_3) = s_0    f(s_1, a_2, o_3) = s_0    f(s_2, a_2, o_3) = s_0
Here f(s_0, a_0, o_1) = s_1 means that when the pigeon is hungry, chooses to peck the red button, and is given food, its state changes to half-hungry; f(s_1, a_1, o_2) = s_0 means that when the pigeon is half-hungry, chooses the yellow button, and is given no food, its state changes to hungry; f(s_2, a_2, o_3) = s_0 means that when the pigeon is not hungry, chooses the blue button, and is given an electric shock, its state changes to hungry.
The output function is defined as f_Z: z_m = s_i, m = i, i ∈ {0, 1, 2}, with output set Z = {z_0, z_1, z_2}, z_0 = s_0, z_1 = s_1, z_2 = s_2. After the pigeon makes its transition at time t, its state at time t+1 is s_d ∈ S, and the state orientation function value of the pigeon at time t+1 is ψ_db ∈ ψ.
(3) Operant conditioning. According to the change of the pigeon's state orientation function value ψ, i.e., Δψ_abc = ψ(s_d, a_b) - ψ(s_a, a_b), the operant conditioning unit δ adjusts the excitation probability of the random "condition-operation" rule r_abc. Here δ is
δ:  p_abk(t+1) = p_abk(t) - ξ(Δψ_abc)·p_abk(t)   for all k ≠ c
    p_abk(t+1) = maxmin(p_abk(t+1), 0, 1)        for all k ≠ c
    p_abc(t+1) = 1 - Σ_{k≠c} p_abk(t+1)
where Δψ_abc is the change of the orientation function value after the pigeon, in state s_a (∈ S) with input a_b (∈ A), executes operation o_c (∈ O) and the state changes to s_d (∈ S); this change is used to judge the quality of the operation. ξ(·) is a monotonically increasing function with ξ(x) = 0 if and only if x = 0; r is the total number of operation rules and λ is the learning rate, i.e., the speed of each learning iteration; here r = 3 and λ = 0.05. p_abc(t) (a ∈ {0, 1, 2, ..., n_S}; b ∈ {0, 1, 2, ..., n_A}; c ∈ {1, 2, ..., n_O}) is the value at time t of the probability p(o_c | s_a ∩ a_b) that the pigeon, in state s_a (∈ S) with input a_b (∈ A), executes operation o_c (∈ O). When Δψ_abc < 0, the orientation function value after executing o_c (∈ O) and transferring to s_d (∈ S) has become smaller, i.e., the orientation has worsened; then p_abc(t+1) < p_abc(t), meaning that the probability of choosing o_c (∈ O) at the next moment decreases, i.e., this behavior becomes less likely to be selected under this input and state. When Δψ_abc = 0, the orientation function value is unchanged, i.e., the orientation is unchanged; then p_abc(t+1) = p_abc(t), and the probability of choosing o_c (∈ O) at the next moment is unchanged. When Δψ_abc > 0, the orientation function value has become larger, i.e., the orientation has improved; then p_abc(t+1) > p_abc(t), and the probability of choosing o_c (∈ O) at the next moment increases.
Specifically, if at time t the pigeon is in the hungry state s_0 and, following the excitation probability p_001(t) = p(o_1 | s_0 ∩ a_0) = 0.55 in the rule set R, it chooses the operation o_1 of pecking the red button and is given food a_0, then by the pigeon's state transition equation f(s_0, a_0, o_1) = s_1 its state at the next moment becomes the half-hungry state s_1. The orientation function value at this moment, ψ_10(s_1, a_0) = 2, is larger than the previous hungry-state value ψ_00(s_0, a_0) = 1, so p_001(t+1) > p_001(t), and the probability of choosing to peck the red button increases at the next learning step.
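This step can be checked numerically with the sketches above; the two unselected probabilities (0.25 and 0.20) are assumed here, since only p_001(t) = 0.55 is given in the text.

import numpy as np

dpsi = PSI[(1, 0)] - PSI[(0, 0)]                   # psi_10 - psi_00 = 2 - 1 = +1
p_new = delta_update(np.array([0.55, 0.25, 0.20]), c=0, dpsi=dpsi)
print(p_new[0] > 0.55)                             # True: pecking red is reinforced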
Here maxmin(p_abk(t+1), 0, 1) sets p_abk(t+1) = 1 when p_abk(t+1) > 1 and p_abk(t+1) = 0 when p_abk(t+1) < 0, which guarantees p_abk(t+1) ∈ [0, 1], and Σ_{k=1}^{n_O} p_abk(t+1) = 1, i.e., the probabilities of the different operations under the same input and the same state sum to 1. When t → ∞, if p_abc(t) → 1, then operation o_c (∈ O) is the optimal behavior in state s_a (∈ S) with input a_b (∈ A).
(4) Output to the outside world. The output function is defined as f_Z: z_m = s_i, i = 0, 1, 2, m = i. The state at time t+1 is output externally according to the output set Z = {z_0, z_1, z_2}, z_0 = s_0, z_1 = s_1, z_2 = s_2.
(5) Judge whether the stop condition of the experiment is reached. When p_abc = p(o_c | s_a ∩ a_b) > p_ε, the pigeon is considered to have learned an optimal operant behavior, and from then on it keeps selecting this optimal behavior under this state and input until the iteration count Tf is reached. Otherwise steps (2)-(4) are repeated until the condition is satisfied; a sketch of running this procedure on the simplified pigeon model is given below.
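A Python sketch of the run follows, reusing the F and PSI tables and the ocm_learn and delta_update sketches above; mapping the next input to the consequence of the pecked button (red gives food a_0, yellow gives a_1, blue gives the shock a_2) is an assumption made here to close the loop.

def pigeon_step(s, a, c):
    """One environment step: peck button c, receive its consequence as the next input."""
    a_next = c                      # consequence index matches the button index
    o = c + 1                       # operations are numbered o_1..o_3 in the tables
    return F[(s, a_next, o)], a_next

P = ocm_learn(pigeon_step, lambda s, a: PSI[(s, a)],
              n_states=3, n_inputs=3, n_ops=3, lam=0.05, Tf=1000, p_eps=0.97)
print(P[0, 0])                      # the probability of pecking red should come to dominate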
The results show that, with the above operant conditioning automaton model, after some time the number of times the pigeon pecks the red button is clearly higher than the number of times it pecks the other two buttons; see Fig. 4. Fig. 5 shows the iterative-learning simulation of the pigeon experiment, from which the formation of the pigeon's operant conditioning learning can be seen.
Embodiment two: the balance-control experiment of a single inverted pendulum, shown in Figs. 6 and 7.
The goal of inverted pendulum control is to apply a force u (the control quantity) to the cart base, i.e., an element of the operation symbol set O with u = o_k, k = 1, 2, ..., n_O, so that the pole does not fall over, that is, so that it never exceeds a pre-defined range of deviation from the vertical. The control experiment uses the Skinner operant conditioning automaton model, as shown in Figs. 1, 2, and 3.
The inverted pendulum can be described by the following equation of motion:
θ̈ = [m(m+M)gl / ((M+m)I + Mml²)]·θ - [ml / ((M+m)I + Mml²)]·u
where I = (1/12)mL² and l = (1/2)L. Substituting u = o_k, k = 1, 2, ..., n_O, into this equation gives
θ̈ = [m(m+M)gl / ((M+m)I + Mml²)]·θ - [ml / ((M+m)I + Mml²)]·o_k,  k = 1, 2, ..., n_O.
Using the Euler method for numerical approximation, the inverted pendulum system can be simulated with the difference equations
θ(t+1) = θ(t) + τ·θ̇(t)
θ̇(t+1) = θ̇(t) + τ·θ̈(t)
where the time step τ is generally set to 0.02 seconds; the inverted pendulum system given above is obviously a deterministic system. To show that the method based on the operant conditioning automaton model is equally applicable to model-free control of continuous stochastic systems, i.e., that f_S may be unknown, a noise signal is introduced into the above deterministic model to form a stochastic inverted pendulum model; in the simulation the second difference equation above is replaced by
θ̇(t+1) = θ̇(t) + τ·θ̈(t) + d
where d is a random noise, here uniformly distributed on [-1.5, 1.5].
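For illustration, a Python sketch of these dynamics follows. The parameter values repeat those given later in this embodiment (g = 9.8 m/s², cart mass 1.0 kg, pole mass 0.1 kg, pole half-length 0.5 m); treating L in I = mL²/12 as the full pole length of 1 m, and all function and variable names, are assumptions.

import numpy as np

g, M, m = 9.8, 1.0, 0.1        # gravity, cart mass, pole mass
L = 1.0                        # full pole length, taking the stated half-length 0.5 m
l, I = L / 2, m * L**2 / 12    # as defined above: l = L/2, I = (1/12) m L^2
tau = 0.02                     # integration step in seconds

def theta_ddot(theta, u):
    """Linearized angular acceleration of the pole under control force u."""
    denom = (M + m) * I + M * m * l**2
    return (m * (m + M) * g * l / denom) * theta - (m * l / denom) * u

def pendulum_step(theta, theta_dot, u, noise=0.0, rng=None):
    """One Euler step; with noise > 0 this is the stochastic model above."""
    d = rng.uniform(-noise, noise) if (rng is not None and noise > 0) else 0.0
    theta_next = theta + tau * theta_dot
    theta_dot_next = theta_dot + tau * theta_ddot(theta, u) + d
    return theta_next, theta_dot_next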
The output function is defined as f_Z: z_m = s_i, m = i, i ∈ {0, 1, 2}, with output set Z = {z_0, z_1, z_2}, z_0 = s_0, z_1 = s_1, z_2 = s_2.
The input symbol set of the inverted pendulum experiment is A = {a_0}, where the input consists of θ, the angle by which the pole deviates from the vertical, and θ̇, the angular velocity of that deviation. The state set is S = {s_0, s_1, s_2}, where s_0 means the inverted pendulum is poorly controlled, s_1 moderately controlled, and s_2 well controlled. The output set is Z = {z_0, z_1, z_2}, where z_0 means the control effect of the inverted pendulum is poor, z_1 moderate, and z_2 good, i.e., the control requirement is met. The operation set is O = {o_1, o_2, o_3}, where o_1 means applying a force to the right on the cart base, o_2 applying a force close to zero, and o_3 applying a force to the left. The rule set R is r_ijk: s_i × a_j → o_k (p_ijk), i.e., when the inverted pendulum is in state s_i (∈ S) with input a_j (∈ A) it executes operation o_k (∈ O) with probability p_ijk, where p_ijk = p(o_k | s_i ∩ a_j) is the probability of executing o_k in state s_i with input a_j, also called the excitation probability of rule r_ijk. The state orientation function is ψ: S × A → {0, 1, 2}, with ψ_00(s_0, a_0) = 0, ψ_10(s_1, a_0) = 1, ψ_20(s_2, a_0) = 2.
The basic steps of the inverted pendulum control experiment are as follows:
(1) Set the initial conditions of the experiment. The gravitational acceleration is g = 9.8 m/s², the cart mass is M = 1.0 kg, the pole mass is m = 0.1 kg, and the pole half-length is L = 0.5 m. The deflection-angle range is set to θ ∈ [-0.1, +0.1], together with a corresponding range for the angular velocity θ̇. The deflection angle is defined here to be positive when the pendulum leans to the left and negative when it leans to the right; likewise, the angular velocity is positive when directed to the left and negative when directed to the right. The initial input is a_0(0), taking θ(0) = 5° = 0.087, where the angle value is converted to radians, together with the corresponding initial angular velocity. The initial state is s(0) = s_0: the state is s_0 (poorly controlled) when θ ∈ [-0.1, -0.03] or θ ∈ [+0.03, +0.1], s_1 (moderately controlled) when θ ∈ (-0.03, -0.005) or θ ∈ (+0.005, +0.03), and s_2 (well controlled) when θ ∈ [-0.005, +0.005]; this discretisation is sketched in code below. The three operating forces of the inverted pendulum, i.e., the control quantities, are O = {o_1, o_2, o_3} = {-5, 0.1, 5}; the initial probability of selecting each of the three forces is 1/3; the number of learning iterations is Tf = 1000; the learning rate is λ = 0.02; the total number of operation rules is r = 3; and the optimal-behavior probability threshold is p_ε = 0.95.
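The state discretisation and the operation set of step (1) can be written as a small helper; the bin edges and forces repeat those above, while the function itself is an assumption for illustration.

def pendulum_state(theta):
    """Map the deflection angle (rad) to the OCM state index."""
    if abs(theta) <= 0.005:
        return 2           # s_2: well controlled
    if abs(theta) < 0.03:
        return 1           # s_1: moderately controlled
    return 0               # s_0: poorly controlled

FORCES = [-5.0, 0.1, 5.0]  # operation set O = {o_1, o_2, o_3} as control forces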
(2) Randomly select and execute an operation. Suppose at time t the observed state of the inverted pendulum is s_a ∈ S, the input is a_b ∈ A, and the state orientation function value is ψ_ab ∈ ψ. An operation o_c ∈ O is chosen according to the random "condition-operation" rules in R, and after the operation is executed the state makes a transition according to the state transition equation f_S: S(t) × A(t) × O(t) → S(t+1).
The inverted pendulum state transition can be described by the equation of motion
θ̈ = [m(m+M)gl / ((M+m)I + Mml²)]·θ - [ml / ((M+m)I + Mml²)]·o_k,  k = 1, 2, ..., n_O,
and the inverted pendulum system is simulated with the difference equations
θ(t+1) = θ(t) + τ·θ̇(t)
θ̇(t+1) = θ̇(t) + τ·θ̈(t)
with the time step τ generally set to 0.02 seconds; the system given above is obviously deterministic. The stochastic inverted pendulum model can also be used for the simulation, in which case the second difference equation is replaced by
θ̇(t+1) = θ̇(t) + τ·θ̈(t) + d
where d is a random noise, here uniformly distributed on [-1.5, 1.5].
The output function is defined as f_Z: z_i = s_i, i = 0, 1, 2, with output set Z = {z_0, z_1, z_2}, z_0 = s_0, z_1 = s_1, z_2 = s_2. After the inverted pendulum moves according to its equation of motion at time t, its state at time t+1 is s_d ∈ S, and the state orientation function value of the inverted pendulum at time t+1 is ψ_db ∈ ψ.
(3) Operant conditioning. According to the change of the inverted pendulum's state orientation function value ψ, i.e., Δψ_abc = ψ(s_d, a_b) - ψ(s_a, a_b), the operant conditioning unit δ adjusts the excitation probability of the random "condition-operation" rule r_abc. Here δ is
δ:  p_abk(t+1) = p_abk(t) - ξ(Δψ_abc)·p_abk(t)   for all k ≠ c
    p_abk(t+1) = maxmin(p_abk(t+1), 0, 1)        for all k ≠ c
    p_abc(t+1) = 1 - Σ_{k≠c} p_abk(t+1)
where Δψ_abc is the change of the orientation function value before and after the inverted pendulum, in state s_a (∈ S) with input a_b (∈ A), executes operation o_c (∈ O) and the state changes to s_d (∈ S); this change is used to judge the quality of the operation. ξ(·) is a monotonically increasing function with ξ(x) = 0 if and only if x = 0; r is the total number of operation rules and λ is the learning rate, i.e., the speed of each learning iteration; here r = 3 and λ = 0.02. p_abc(t) (a ∈ {0, 1, 2, ..., n_S}; b ∈ {0, 1, 2, ..., n_A}; c ∈ {1, 2, ..., n_O}) is the value at time t of the probability p(o_c | s_a ∩ a_b) that the inverted pendulum, in state s_a (∈ S) with input a_b (∈ A), executes operation o_c (∈ O). When Δψ_abc < 0, the orientation function value after executing o_c (∈ O) and transferring to s_d (∈ S) has become smaller, i.e., the orientation has worsened; then p_abc(t+1) < p_abc(t), meaning that the probability of choosing o_c (∈ O) at the next moment decreases. When Δψ_abc = 0, the orientation function value is unchanged, i.e., the orientation is unchanged; then p_abc(t+1) = p_abc(t), and the probability of choosing o_c (∈ O) at the next moment is unchanged. When Δψ_abc > 0, the orientation function value has become larger, i.e., the orientation has improved; then p_abc(t+1) > p_abc(t), meaning that the probability of choosing o_c (∈ O) at time t+1 increases.
Specifically, using the deterministic inverted pendulum model for the simulation: suppose at time t the input is θ(t) = 0.046, with the pendulum leaning to the left and the deflection angular acceleration directed to the right, so that the state of the inverted pendulum is the well-controlled state s_2. If, following the excitation probability p_203(t) = p(o_3 | s_2 ∩ a_0) = 0.335 in the rule set, the chosen operation is o_3, i.e., a push to the left, then from the inverted pendulum equation of motion
θ̈ = [m(m+M)gl / ((M+m)I + Mml²)]·θ - [ml / ((M+m)I + Mml²)]·o_k,  k = 1, 2, ..., n_O,
and the difference equations θ(t+1) = θ(t) + τ·θ̇(t), θ̇(t+1) = θ̇(t) + τ·θ̈(t), one obtains θ(t+1) = 0.031 at time t+1, i.e., the state of the inverted pendulum at time t+1 becomes the moderately-controlled state s_1. The orientation function value at this moment, ψ_10(s_1, a_0) = 1, is smaller than the value ψ_20(s_2, a_0) = 2 at time t, so p_203(t+1) < p_203(t); at the next learning iteration the probability of selecting the left-push operation o_3 under input a_0 and state s_2 therefore decreases, and correspondingly the probabilities of selecting the other two operations increase.
Here maxmin(p_abk(t+1), 0, 1) sets p_abk(t+1) = 1 when p_abk(t+1) > 1 and p_abk(t+1) = 0 when p_abk(t+1) < 0, which guarantees p_abk(t+1) ∈ [0, 1], and Σ_{k=1}^{n_O} p_abk(t+1) = 1, i.e., the probabilities of the different operations under the same input and the same state sum to 1. When t → ∞, if p_abc(t) → 1, then operation o_c (∈ O) is the optimal behavior in state s_a (∈ S) with input a_b (∈ A).
(4) Output to the outside world. The output function is defined as f_Z: z_m = s_i, i = 0, 1, 2, m = i. The state at time t+1 is output externally according to the output set Z = {z_0, z_1, z_2}, z_0 = s_0, z_1 = s_1, z_2 = s_2.
(5) Judge whether the stop condition of the experiment is reached. When, at time t+1, |θ| ≤ 0.005, the corresponding angular-velocity condition is satisfied, and p_abc(t+1) = p(o_c | s_a ∩ a_b) > 0.95, the inverted pendulum is considered able to realize its autonomous balance control through learning; from then on it keeps selecting operation o_c under this state and input until the iteration count Tf is reached. Otherwise steps (2)-(4) are repeated until the condition is satisfied; a sketch combining the earlier code fragments into such a control loop is given below.
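Tying the sketches above together, a minimal control loop follows. Holding the input at a_0, using ψ(s_i, a_0) = i from above, starting from θ(0) = 0.087 with zero initial angular velocity, and omitting any episode reset are simplifying assumptions; this is not the patent's reference implementation.

import numpy as np

rng = np.random.default_rng(0)
P = np.full((3, 1, 3), 1.0 / 3)              # p_ijk(0) = 1/3, single input a_0
theta, theta_dot = 0.087, 0.0                # initial 5-degree deflection
for t in range(1000):                        # Tf = 1000 learning iterations
    s = pendulum_state(theta)
    c = rng.choice(3, p=P[s, 0])             # step (2): draw a control force
    theta, theta_dot = pendulum_step(theta, theta_dot, FORCES[c],
                                     noise=1.5, rng=rng)        # stochastic model
    dpsi = pendulum_state(theta) - s         # psi(s_i, a_0) = i, so dpsi is the index change
    P[s, 0] = delta_update(P[s, 0], c, dpsi, lam=0.02, r=3)      # step (3)
print(P)                                     # learned operation preferences per state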
Fig. 6 and Fig. 7 show that, under the same conditions, the method based on the operant conditioning automaton model successfully controls the balance of the inverted pendulum with both the deterministic model and the stochastic model. Clearly, the introduction of random noise increases the learning difficulty of the stochastic model: on average, each trial achieves autonomous balance control of the inverted pendulum only after about 8800 iterations.

Claims (2)

1. An operant conditioning automaton (hereinafter abbreviated OCM), characterized in that it is an eight-tuple
OCM = <A, S, O, Z, R, f, ψ, δ>,
where
(1) the input symbol set of the OCM: A = {a_j | j = 0, 1, 2, ..., n_A}, a_j being the j-th input symbol of the OCM;
(2) the internal state set of the OCM: S = {s_i | i = 0, 1, 2, ..., n_S}, s_i being the i-th state symbol of the OCM;
(3) the internal operation set of the OCM: O = {o_k | k = 1, 2, ..., n_O}, o_k being the k-th operation symbol of the OCM;
(4) the output symbol set of the OCM: Z = {z_m | m = 0, 1, 2, ..., n_Z}, z_m being the m-th output symbol of the OCM;
(5) the rule set of the OCM: R = {r_ijk | i ∈ {0, 1, 2, ..., n_S}; j ∈ {0, 1, 2, ..., n_A}; k ∈ {1, 2, ..., n_O}}, each element r_ijk ∈ R representing a random "condition-operation" rule
r_ijk: s_i × a_j → o_k (p_ijk)
i.e., when the OCM is in state s_i (∈ S) with input a_j (∈ A), it executes operation o_k (∈ O) with probability p_ijk, where p_ijk = p(o_k | s_i ∩ a_j) is the probability that the OCM executes operation o_k in state s_i with input a_j, also called the excitation probability of rule r_ijk;
(6) the state space equations of the OCM:
f:  f_S: S(t) × A(t) × O(t) → S(t+1)
    f_Z: S(t) × A(t) × O(t) → Z(t+1)
where f_S is the state transition equation of the OCM: the state s(t+1) (∈ S) of the OCM at time t+1 is determined by the state s(t) (∈ S), the input a(t) (∈ A), and the operation o(t) (∈ O) at time t, and is independent of the states, inputs, and operations before time t; f_S is unknown, but the result of a state transition is observable by the OCM itself; f_Z is the output equation of the OCM: the output z(t+1) (∈ Z) at time t+1 is determined by the state s(t), the input a(t), and the operation o(t) at time t, and is independent of the states, inputs, and operations before time t; the output of the OCM is observable by the outside world;
(7) the state orientation function of the OCM: ψ: S × A → [h, q], where h is defined as the worst orientation function value and q as the best orientation function value; for any state s_i (∈ S) and input a_j (∈ A), ψ_ij = ψ(s_i, a_j) is the expectation value of the OCM for state s_i and input a_j; if ψ_ij < 0, s_i is called a negatively oriented state of the OCM under input a_j; if ψ_ij = 0, a zero-oriented state; if ψ_ij > 0, a positively oriented state;
(8) the operant conditioning learning rule δ of the OCM:
if the state of the OCM at time t is s(t) = s_a ∈ S and the input is a(t) = a_b ∈ A, an operation o(t) = o_c ∈ O is chosen according to the random "condition-operation" rules in the set R, and after the operation is executed the state at time t+1 is observed to be s(t+1) = s_d ∈ S, then, based on the operant conditioning principle, the excitation probabilities p_abk (k = 1, 2, ..., n_O) of the random "condition-operation" rules in R are adjusted according to
δ:  p_abk(t+1) = p_abk(t) - ξ(Δψ_abc)·p_abk(t)   for all k ≠ c
    p_abk(t+1) = maxmin(p_abk(t+1), 0, 1)        for all k ≠ c
    p_abc(t+1) = 1 - Σ_{k≠c} p_abk(t+1)
where Δψ_abc is the change of the orientation function value after the OCM, in state s_a (∈ S) with input a_b (∈ A), executes operation o_c (∈ O) and the state changes to s_d (∈ S), and this change is used to judge the quality of the operation; ξ(·) is a monotonically increasing function with ξ(x) = 0 if and only if x = 0; r is the total number of operation rules and λ is the learning rate, i.e., the speed of each learning iteration; p_abc(t) (a ∈ {0, 1, 2, ..., n_S}; b ∈ {0, 1, 2, ..., n_A}; c ∈ {1, 2, ..., n_O}) is the value at time t of the probability p(o_c | s_a ∩ a_b) that the OCM, in state s_a (∈ S) with input a_b (∈ A), executes operation o_c (∈ O); when Δψ_abc < 0, the orientation function value after executing o_c (∈ O) and transferring to s_d (∈ S) has become smaller, i.e., the orientation has worsened, so p_abc(t+1) < p_abc(t) and the probability of choosing o_c (∈ O) at the next moment decreases; when Δψ_abc = 0, the orientation function value is unchanged, i.e., the orientation is unchanged, so p_abc(t+1) = p_abc(t) and the probability of choosing o_c (∈ O) at the next moment is unchanged; when Δψ_abc > 0, the orientation function value has become larger, i.e., the orientation has improved, so p_abc(t+1) > p_abc(t) and the probability of choosing o_c (∈ O) at the next moment increases; maxmin(p_abk(t+1), 0, 1) sets p_abk(t+1) = 1 when p_abk(t+1) > 1 and p_abk(t+1) = 0 when p_abk(t+1) < 0, guaranteeing p_abk(t+1) ∈ [0, 1], and Σ_{k=1}^{n_O} p_abk(t+1) = 1, i.e., the probabilities of the different operations under the same input and the same state sum to 1; when t → ∞, if p_abc(t) → 1, operation o_c (∈ O) is the optimal behavior in state s_a (∈ S) with input a_b (∈ A); learning stops when the number of iterations is reached or when, in some state s_a (∈ S) with input a_b (∈ A), the probability p_abc(t) of executing operation o_c (∈ O) satisfies p_abc(t) ≥ p_ε, with p_ε ∈ [0.7, 1].
2. The application of the operant conditioning automaton according to claim 1 in biomimetic autonomous learning control, characterized in that it comprises the following steps:
(1) setting the initial conditions of the experiment: giving the initial state s(0) of the OCM, the initial input a(0) of the OCM, the learning rate λ, and the initial excitation probability p_ijk(0) = 1/r of each random "condition-operation" rule r_ijk (i ∈ {0, 1, 2, ..., n_S}; j ∈ {0, 1, 2, ..., n_A}; k ∈ {1, 2, ..., n_O}) in R; giving the number of learning iterations Tf or the optimal-behavior probability threshold p_ε;
(2) randomly selecting and executing an operation: according to the state s(t) ∈ S and input a(t) ∈ A of the OCM at time t and the values p_ijk(t) at time t of the excitation probabilities of the random "condition-operation" rules r_ijk (i ∈ {0, 1, 2, ..., n_S}; j ∈ {0, 1, 2, ..., n_A}; k ∈ {1, 2, ..., n_O}) in R, randomly selecting an operation o(t) ∈ O at time t following the probability distribution p_ijk(t) of the operations under the current state; if the state of the OCM at time t is s(t) = s_a, the input is a(t) = a_b, and the chosen operation at time t is o(t) = o_c, the state of the OCM makes a transition according to the state transition equation f_S: S(t) × A(t) × O(t) → S(t+1);
(3) operant conditioning: if the state s(t+1) = s_d ∈ S is observed at time t+1, the operant conditioning unit δ adjusts the excitation probability of the random "condition-operation" rule r_abc; its value at time t+1 is
δ:  p_abk(t+1) = p_abk(t) - ξ(Δψ_abc)·p_abk(t)   for all k ≠ c
    p_abk(t+1) = maxmin(p_abk(t+1), 0, 1)        for all k ≠ c
    p_abc(t+1) = 1 - Σ_{k≠c} p_abk(t+1)
where maxmin(p_abk(t+1), 0, 1) sets p_abk(t+1) = 1 when p_abk(t+1) > 1 and p_abk(t+1) = 0 when p_abk(t+1) < 0, guaranteeing p_abk(t+1) ∈ [0, 1], and Σ_{k=1}^{n_O} p_abk(t) = 1;
(4) outputting Z(t+1) to the outside world through the output equation f_Z: S(t) × A(t) × O(t) → Z(t+1) of the system;
(5) repeating steps (2)-(4) until the number of learning iterations Tf is reached or p_abc(t+1) > p_ε, at which point the experiment stops.
CN200910086990A 2009-06-12 2009-06-12 Operant conditioning reflex automatic machine and application thereof in control of biomimetic autonomous learning Pending CN101673354A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910086990A CN101673354A (en) 2009-06-12 2009-06-12 Operant conditioning reflex automatic machine and application thereof in control of biomimetic autonomous learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200910086990A CN101673354A (en) 2009-06-12 2009-06-12 Operant conditioning reflex automatic machine and application thereof in control of biomimetic autonomous learning

Publications (1)

Publication Number Publication Date
CN101673354A true CN101673354A (en) 2010-03-17

Family

ID=42020573

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910086990A Pending CN101673354A (en) 2009-06-12 2009-06-12 Operant conditioning reflex automatic machine and application thereof in control of biomimetic autonomous learning

Country Status (1)

Country Link
CN (1) CN101673354A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103792846A (en) * 2014-02-18 2014-05-14 北京工业大学 Robot obstacle avoidance guiding method based on Skinner operating condition reflection principle
CN103792846B (en) * 2014-02-18 2016-05-18 北京工业大学 Based on the robot obstacle-avoiding air navigation aid of Skinner operant conditioning reflex principle
CN105094124A (en) * 2014-05-21 2015-11-25 防灾科技学院 Method and model for performing independent path exploration based on operant conditioning
CN104614988B (en) * 2014-12-22 2017-04-19 北京工业大学 Cognitive and learning method of cognitive moving system with inner engine
CN104614988A (en) * 2014-12-22 2015-05-13 北京工业大学 Cognitive and learning method of cognitive moving system with inner engine
CN104570738A (en) * 2014-12-30 2015-04-29 北京工业大学 Robot track tracing method based on Skinner operant conditioning automata
CN105205533A (en) * 2015-09-29 2015-12-30 华北理工大学 Development automatic machine with brain cognition mechanism and learning method of development automatic machine
CN105205533B (en) * 2015-09-29 2018-01-05 华北理工大学 Development automatic machine and its learning method with brain Mechanism of Cognition
CN109154798A (en) * 2016-05-09 2019-01-04 1Qb信息技术公司 For improving the method and system of the strategy of Stochastic Control Problem
CN109212975A (en) * 2018-11-13 2019-01-15 北方工业大学 A kind of perception action cognitive learning method with developmental mechanism
CN111580392A (en) * 2020-07-14 2020-08-25 江南大学 Finite frequency range robust iterative learning control method of series inverted pendulum
CN111736471A (en) * 2020-07-14 2020-10-02 江南大学 Iterative feedback setting control and robust optimization method of rotary inverted pendulum
CN111580392B (en) * 2020-07-14 2021-06-15 江南大学 Finite frequency range robust iterative learning control method of series inverted pendulum

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20100317