CN101599137A - Autonomous operant conditioning automaton and its application in realizing intelligent behavior - Google Patents


Info

Publication number
CN101599137A
CN101599137A (publication) · CNA2009100892633A / CN200910089263A (application)
Authority
CN
China
Prior art keywords
aoc
state
probability
moment
condition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2009100892633A
Other languages
Chinese (zh)
Inventor
阮晓钢
戴丽珍
蔡建羡
陈静
郜园园
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CNA2009100892633A priority Critical patent/CN101599137A/en
Publication of CN101599137A publication Critical patent/CN101599137A/en


Abstract

An autonomous operant conditioning automaton and its application in realizing intelligent behavior, belonging to the field of bionics. The autonomous operant conditioning automaton (AOC) is a discrete computational model of an autonomous automaton, mainly comprising: an operation set, a state set, a set of "condition-operation" rules, observable state transitions, and an operant conditioning learning law. A behavior entropy based on the AOC state orientation values is defined, and a recursive operating procedure of the AOC is prescribed. The key feature of the AOC is that it simulates the biological mechanism of operant conditioning and therefore possesses bionic self-organizing capability, including self-learning and adaptation. It can be used to describe, simulate, and design various self-organizing systems, and in particular to describe, simulate, and plan the various intelligent behaviors of robotic systems.

Description

Autonomous operant conditioning automaton and its application in realizing intelligent behavior
Technical field
The present invention relates to an automaton, specifically a bionic robot model based on the principle of operant conditioning.
Background technology
Automaton models for learning systems date back to the 1960s, when they were termed learning automata. In the years since, work has mainly varied the structure of the learning automaton to satisfy different application requirements; such automata generally have both inputs and outputs. The present invention is a self-organizing system based on Skinner's theory of operant conditioning, with self-learning and adaptive capability. Beginning in the late 1920s, Skinner conducted experimental studies of animal learning and identified two forms of learning: classical conditioning, which shapes an organism's respondent behavior, and operant conditioning, which shapes its operant behavior. Western scholars regard these two reflexes as two different association processes: classical conditioning is an S–R association, while operant conditioning is an R–S association.
Over roughly the last decade, academic attention to autonomous systems has grown year by year, as has the total volume of literature on them. The present invention is an autonomous automaton; unlike a non-autonomous automaton, its output needs no external command to drive it and is produced by the automaton according to its own needs. Related patents — such as application 98115560.X, "User-operated automatic computer and screen generating method for a user-operated automatic computer", and application 200710071071.0, "Accelerated regular-expression matching method based on deterministic finite automata with memory" — all realize their functions through interaction between an automaton and the external environment. To date, no autonomous operant conditioning automaton has appeared.
The present invention proposes an abstract self-organizing model based on Skinner's theory of operant conditioning, used to describe, simulate, and design various self-organizing systems so that they exhibit self-learning and adaptive characteristics, and in particular to describe, simulate, and plan the various intelligent behaviors of robotic systems.
Summary of the invention
The invention provides an autonomous operant conditioning automaton that can be used to describe, simulate, and design systems with self-organizing (including self-learning and adaptive) capability.
The operant conditioning automaton of the present invention is a nine-tuple, comprising: a discrete time, an operation-symbol set, an internal state set, a "condition-operation" rule set, a state transition function, an orientation function, an operant conditioning learning law, an operation entropy, and an initial state; a recursive operating procedure of the AOC is also prescribed. The key feature of the AOC is that it simulates the biological mechanism of operant conditioning, thereby possessing bionic self-organizing capability, including self-learning and adaptation; it can be used to describe, simulate, and design various self-organizing systems with interactive capability.
A general finite-state automaton is a five-tuple FA = {A, Z, S, f, g}, where A is a finite input alphabet, S is a finite (internal) state set (s(0) ∈ S being the initial state), Z is a finite output (accepting-state) alphabet, f: S × A → S is the state transition function, and g: S → Z is the output function. A finite-state automaton FA is a non-autonomous system.
The operation symbols of an AOC are not equivalent to the input symbols of a finite-state automaton FA: in the AOC the operation symbols represent the AOC's own internal operations, whereas in the FA the input symbols represent external commands; in this sense the AOC and the FA are not equivalent. The operation-symbol set Ω of the AOC is not the input alphabet of the FA but the set of the AOC's internal operations, while the input alphabet of the FA is the set of instructions the outside world may supply. The AOC has no output alphabet and hence, naturally, no output function; yet as an autonomous system the AOC still needs to produce output, since an autonomous system can, and indeed must, act on its environment or on the objective world. Viewed through the state-space formulation, an output is a combination of states, or of states and operations; the internal state set of the AOC therefore itself serves as a kind of output alphabet, and the state of the AOC is observable. "The state of the AOC is observable" means that the AOC itself possesses receptors and can detect changes in its own state; it does not mean that the outside world can observe these quantities. An autonomous automaton also needs output, but this output requires no driving external command: the automaton produces it according to its own needs.
Compared with a non-autonomous automaton, an autonomous automaton has the advantage that its output needs no driving external command: its actions on the environment are chosen according to its own needs. Even if the external environment changes, the autonomous automaton can continue to operate as usual, whereas a non-autonomous automaton must change its structural model or parameters to adapt to the change. A non-autonomous system can always be converted into an autonomous one, so an autonomous operant conditioning automaton corresponding to any given non-autonomous operant conditioning automaton can always be found; the autonomous operant conditioning automaton is thus more widely applicable.
In information theory, entropy measures the uncertainty of an event: the more information a system holds, the more regular its architecture, the more complete its function, and the smaller its entropy. Using the notion of entropy, the metering, transmission, conversion, and storage of information can be studied theoretically. The present invention introduces the notion of operation entropy and proves the convergence of the AOC operation entropy ψ(t). Since the process of self-organization is a process of absorbing information, of drawing in negative entropy, of removing uncertainty, this convergence demonstrates the self-organizing character of the AOC: the AOC genuinely possesses self-learning and adaptive capability.
The present invention proposes an autonomous operant conditioning automaton and uses it to simulate Skinner's animal experiments, demonstrating that the automaton realizes the learning mechanism of operant conditioning; it is also used to realize balance control of a two-wheeled self-balancing robot, illustrating that the AOC can be used to plan the various intelligent behaviors of robotic systems.
The automaton of the present invention is the nine-tuple autonomous operant conditioning automaton:

AOC = <t, Ω, S, Γ, δ, ε, η, ψ, s_0>
Wherein
(1) Discrete time of the AOC: t ∈ {0, 1, 2, ..., n_t}; t = 0 is the initial moment of the AOC;
(2) Operation-symbol set of the AOC: Ω = {α_k | k = 1, 2, ..., n_Ω}, α_k being the k-th operation symbol of the AOC;
(3) State set of the AOC: S = {s_i | i = 0, 1, 2, ..., n_S}, s_i being the i-th state of the AOC;
(4) Operating-rule set of the AOC: Γ = {r_ik(p) | p ∈ P; i ∈ {0, 1, 2, ..., n_S}; k ∈ {1, 2, ..., n_Ω}}. The random "condition-operation" rule r_ik(p): s_i → α_k(p) means that when the AOC is in state s_i ∈ S it executes operation α_k ∈ Ω with probability p ∈ P, where p = p_ik = p(α_k | s_i) is the probability that the AOC executes α_k given state s_i, and P denotes the set of the p_ik;
(5) State transition function of the AOC: δ: S(t) × Ω(t) → S(t+1). The state s(t+1) ∈ S of the AOC at time t+1 is determined by the state s(t) ∈ S and the operation α(t) ∈ Ω at time t, independently of states and operations before time t; the transition process determined by δ may be known or unknown, but the result of each state transition is observable;
(6) Orientation function of the AOC: ε: S → E = {ε_i | i = 0, 1, 2, ..., n_S}, where ε_i = ε(s_i) ∈ E is the orientation value of state s_i ∈ S;
(7) Operant conditioning learning law of the AOC, η, which simulates the biological mechanism of operant conditioning and adjusts the execution probability p ∈ P of the operating rules r_ik(p) ∈ Γ (a numerical sketch of this update follows the nine-tuple definition below). Suppose the state at time t is s(t), operation α(t) ∈ Ω is executed, and state s(t+1) is observed at time t+1. According to Skinner's theory of operant conditioning, if ε(s(t+1)) − ε(s(t)) < 0 then p(α(t) | s(t)) tends to decrease; conversely, if ε(s(t+1)) − ε(s(t)) > 0 then p(α(t) | s(t)) tends to increase. Concretely, suppose the AOC is in state s(t) = s_i at time t and currently selects operation α(t) = α_k, and, by the state transition function, the next state is s(t+1) = s_j. Simulating the biological mechanism of operant conditioning, the probability of the current operation at the next moment t+1 changes: it increases by Δ on its original value, where Δ depends on the orientation increment — the larger the orientation value, the better the result of the operation and the larger Δ — while the probability of each remaining operation at t+1 decreases correspondingly, the decreases summing exactly to Δ; the amount subtracted from each remaining operation is Δ multiplied by the proportion its probability at the previous moment contributed to the sum of the probabilities of all operations other than the one selected at time t. This guarantees that the probabilities of the operations sum to 1 at every moment. Stated more formally: when s(t) = s_i, α(t) = α_k and s(t+1) = s_j, then

p_ik(t+1) = p_ik(t) + Δ, and, for every other operation, p_iu(t+1) = p_iu(t) − Δ·ξ,

where u is any index from 1 to n_Ω not equal to k; p_ik(t) is the value at time t of the probability that the AOC executes α_k ∈ Ω in state s_i ∈ S, and p_ik(t+1) its value at time t+1; Δ = a·f(ε⃗_ij)·(1 − p_ik(t)), with 0 ≤ p_ik + Δ ≤ 1; ε⃗_ij = ε(s_j) − ε(s_i) is the increment of the orientation value; f is a monotonically increasing function satisfying f(x) = 0 if and only if x = 0; a is the learning rate; ξ = p_iu(t)/Σ_{v≠k} p_iv(t), where v ranges over all indices from 1 to n_Ω not equal to k and Σ_{v≠k} p_iv(t) is the value at time t of the sum of the probabilities of executing the operations other than α_k in state s_i; p_iu(t) and p_iu(t+1) are the values at times t and t+1 of the probability that the AOC executes α_u ∈ Ω in state s_i ∈ S;
(8) Operation entropy of the AOC: ψ: P × E → R⁺, where R⁺ is the set of positive real numbers. The operation entropy ψ(t) at time t is the weighted sum over the states of the operation entropies under each state s_i:

ψ(t) = ψ(Ω(t)|S) = Σ_{i=0}^{n_S} p_i ψ_i(t) = Σ_{i=0}^{n_S} p(s_i) ψ_i(Ω(t)|s_i),

determined by the operation-probability sets and the orientation values under each state s(t) = s_i at time t. ψ_i(t) is the operation entropy of the AOC in state s_i:

ψ_i(t) = ψ_i(Ω(t)|s_i) = −Σ_{k=1}^{n_Ω} p_ik log₂ p_ik = −Σ_{k=1}^{n_Ω} p(α_k|s_i) log₂ p(α_k|s_i).

Knowing the operation entropy under each state and forming the weighted sum gives the operation entropy of the AOC at time t:

ψ(t) = −Σ_{i=0}^{n_S} p(s_i) Σ_{k=1}^{n_Ω} p(α_k|s_i) log₂ p(α_k|s_i).

If the operation entropy ψ(t) of the AOC becomes smaller and smaller and tends to its minimum as t → ∞, then ψ(t) converges. The AOC is a self-organizing system based on Skinner's operant conditioning theory, with self-learning and adaptive capability; the process of self-organization is a process of absorbing information, of drawing in negative entropy, of removing uncertainty. To establish the self-organizing character of the AOC, we therefore need to prove the convergence of the operation entropy ψ(t).
(9) Initial state of the AOC: s_0 = s(0) ∈ S.
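A numerical illustration of the learning law η in item (7) may help fix ideas. The following Python fragment is our own sketch, not part of the patent; it assumes f is the identity, as in the embodiments below, and the function name and array layout are illustrative conventions:

    import numpy as np

    def update_probabilities(p, i, k, eps_inc, a):
        """One step of the operant conditioning learning law eta.

        p       : (n_states, n_ops) array with p[i, k] = p(alpha_k | s_i)
        i, k    : state at time t and index of the executed operation
        eps_inc : orientation increment, eps(s_j) - eps(s_i)
        a       : learning rate; f is taken as the identity here
        """
        delta = a * eps_inc * (1.0 - p[i, k])                    # Delta = a * f(eps) * (1 - p_ik)
        delta = float(np.clip(delta, -p[i, k], 1.0 - p[i, k]))   # keep 0 <= p_ik + Delta <= 1
        others = [u for u in range(p.shape[1]) if u != k]
        rest = p[i, others].sum()                                # sum_{v != k} p_iv(t)
        if rest > 0.0:
            p[i, others] -= delta * p[i, others] / rest          # p_iu(t+1) = p_iu(t) - Delta * xi_u
        p[i, k] += delta
        return p

Because the amounts subtracted from the other operations are proportional to their shares ξ_u and total exactly Δ, each row of p remains a probability distribution after every update, as item (7) requires.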
A key feature of the present invention is that it simulates the biological mechanism of operant conditioning, thereby possessing bionic self-organizing capability, including self-learning and adaptation; it can be used to describe, simulate, and design various self-organizing systems.
The autonomous operant conditioning automaton AOC of the present invention runs recursively according to the following program steps (a Python sketch of this loop follows the list):
(1) Initialization: set t = 0; choose the initial state s(0) of the AOC at random; fix the learning rate a; set the initial operation probabilities p_ik(0) = 1/n_Ω (i = 0, 1, 2, ..., n_S; k = 1, 2, ..., n_Ω); fix the stopping time T_f;
(2) Operation selection: according to the rule r_ik(p): s_i → α_k(p) in the "condition-operation" rule set Γ — i.e. in state s_i ∈ S the AOC executes operation α_k ∈ Ω with probability p ∈ P, where p = p_ik = p(α_k | s_i) is the probability of executing α_k given state s_i — randomly select the operation α(t) ∈ Ω for the current state s(t) ∈ S;
(3) Operation execution: at time t, the AOC in state s(t) ∈ S executes the operation α(t) ∈ Ω selected in the previous step, and the current state undergoes the transition δ(s(t), α(t)) = δ(s_i, α_k);
(4) State observation: by the state transition function δ: S(t) × Ω(t) → S(t+1), the result of the transition is fully observable: there exists j ∈ {0, 1, 2, ..., n_S} such that s(t+1) = s_j;
(5) Operant conditioning: executing an operation at time t not only shifts the state of the AOC; the execution probability of each operation at the next moment also changes. The execution probabilities p ∈ P of the rules r_ik(p) ∈ Γ are adjusted according to the learning law η: with s(t) = s_i and α(t) = α_k at time t, the operation probabilities at time t+1 are updated as

p_ik(t+1) = p_ik(t) + Δ, p_iu(t+1) = p_iu(t) − Δ·ξ (u ≠ k),

where Δ = a·f(ε⃗_ij)·(1 − p_ik(t)) and 0 ≤ p_ik + Δ ≤ 1; ε⃗_ij = ε(s_j) − ε(s_i) is the increment of the orientation value; a is the learning rate; ξ = p_iu(t)/Σ_{v≠k} p_iv(t);
(6) Operation-entropy calculation: from the defined formula ψ(t) = −Σ_{i=0}^{n_S} p(s_i) Σ_{k=1}^{n_Ω} p(α_k|s_i) log₂ p(α_k|s_i), compute the operation entropy at time t, where p(s_i) is the value at time t of the occurrence probability of state s_i ∈ S and p(α_k|s_i) is the value at time t of the probability of executing α_k ∈ Ω in state s_i ∈ S;
(7) Recursion: if t+1 ≤ T_f, set t = t+1 and repeat steps (2)-(7);
(8) When t+1 > T_f, halt.
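The eight steps above translate directly into a simulation loop. The sketch below is again our illustration under stated assumptions, reusing update_probabilities from the previous fragment; since the patent does not say how the state-occurrence probabilities p(s_i) in step (6) are obtained, empirical visit frequencies are used as a stand-in:

    import numpy as np

    def run_aoc(delta_fn, orient_fn, n_states, n_ops, a=0.01, T_f=1000, s0=0, rng=None):
        """Recursive AOC run following steps (1)-(8).

        delta_fn(i, k) -> j : the observable state transition delta
        orient_fn(i, j)     : orientation increment for the transition s_i -> s_j
        """
        rng = rng or np.random.default_rng()
        p = np.full((n_states, n_ops), 1.0 / n_ops)       # (1) p_ik(0) = 1/n_Omega
        s, visits, entropies = s0, np.zeros(n_states), []
        for t in range(T_f):
            k = rng.choice(n_ops, p=p[s])                 # (2) select operation from p(.|s)
            j = delta_fn(s, k)                            # (3)+(4) execute and observe s(t+1)
            p = update_probabilities(p, s, k, orient_fn(s, j), a)   # (5) conditioning
            visits[s] += 1
            with np.errstate(divide="ignore", invalid="ignore"):
                plogp = np.where(p > 0, p * np.log2(p), 0.0)
            p_state = visits / visits.sum()               # empirical stand-in for p(s_i)
            psi_i = -plogp.sum(axis=1)                    # per-state operation entropy
            entropies.append(float(p_state @ psi_i))      # (6) psi(t)
            s = j                                         # (7) recurse while t+1 <= T_f
        return p, entropies                               # (8) halt

The embodiments below can all be driven by this one loop, differing only in their transition tables and orientation increments.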
The flow chart of the method of the invention is shown in Fig. 2.
Description of drawings
Fig. 1: structural schematic of the autonomous operant conditioning automaton of the present invention, where t is the discrete time (1), Ω is the set of operations α_k (k = 1, 2, ..., n_Ω) (2), S is the set of states s_i (i = 0, 1, 2, ..., n_S) (3), δ is the state transition function (4), Γ is the set of "condition-operation" rules r_ik (i ∈ {0, 1, ..., n_S}; k ∈ {1, 2, ..., n_Ω}) (5), ε is the orientation function (6), η is the conditioning learning law (7), ψ is the behavior entropy (8), and s_0 is the initial state (9).
Fig. 2: program flow chart of the autonomous operant conditioning automaton AOC;
Fig. 3: operation-behavior probability curves of the white mouse;
Fig. 4: operation-entropy curve of the white-mouse experiment;
Fig. 5: operation-behavior probability curves of the robotic pigeon;
Fig. 6: operation-entropy curve of the robotic-pigeon experiment;
Fig. 7: operation-behavior probability curves of the two-wheeled self-balancing robot in the upright state, tilt angle θ = 0°;
Fig. 8: operation-behavior probability curves of the robot for tilt angles 0° < θ < 12°;
Fig. 9: operation-behavior probability curves of the robot at tilt angle θ = 12°;
Fig. 10: operation-behavior probability curves of the robot for tilt angles −12° < θ < 0°;
Fig. 11: operation-behavior probability curves of the robot at tilt angle θ = −12°;
Fig. 12: operation-entropy curve of the two-wheeled self-balancing robot experiment;
Embodiments
Embodiment one: a minimal system — a white mouse with learning ability, simulating Skinner's white-mouse experiment. Briefly, Skinner's experiment runs as follows: a white mouse is placed in a Skinner box fitted with a lever, the box being constructed to exclude outside stimuli as far as possible. The mouse moves freely in the box; whenever it presses the lever, a pellet of food drops into the tray beneath and the mouse can eat. A device outside the box records the animal's actions. The mouse learns to press the lever again and again, obtaining food rewards through its own behavior. This embodiment realizes Skinner's experiment with the autonomous operant conditioning automaton. The mouse has two operation behaviors: pressing the lever, α_1, and not pressing it, α_2; that is, the operation set is Ω = {α_1, α_2}, with probabilities denoted p_1 and p_2 respectively. Its state set is S = {s_0, s_1}, where s_0 denotes hunger and s_1 non-hunger. Its rule set is Γ = {r_ik(p) | p ∈ P; i ∈ {0, 1}; k ∈ {1, 2}}, where the random "condition-operation" rule r_ik(p): s_i → α_k(p) means that in state s_i ∈ S the AOC executes α_k ∈ Ω with probability p = p_ik = p(α_k|s_i). Its state transition function δ: S(t) × Ω(t) → S(t+1) is, concretely:

s_0 × α_1 → s_1, s_0 × α_2 → s_0, s_1 × α_1 → s_1, s_1 × α_2 → s_0.

Its orientation function is ε: S → E = {ε_i | i = 0, 1}, with ε_i = ε(s_i) ∈ E the orientation value of state s_i ∈ S; here Δ = a × ε⃗_ij × (1 − p_1), where a is the learning rate and ε⃗_ij = ε(s_j) − ε(s_i) is the increment of the orientation value. The probabilities of the two behaviors are both 0.5 at the initial moment. Whenever the mouse presses the lever it obtains a reward, and the probability of pressing increases, so the mouse is more likely to press the lever at the next moment; the probability is updated according to the operant conditioning learning law η. Through learning step after step, the probability p_1 that the mouse presses the lever grows steadily. With learning rate a = 0.01, after about 668 learning steps the mouse has learned to press the lever to obtain food; as Fig. 3 readily shows, p_1 finally tends to 1. During the experiment the operation entropy at each moment was computed from the defined formula ψ(t) = −Σ_{i=0}^{n_S} p(s_i) Σ_{k=1}^{n_Ω} p(α_k|s_i) log₂ p(α_k|s_i); as time passes, ψ(t) becomes smaller and smaller and tends to its minimum as t → ∞ (see Fig. 4), showing that ψ(t) converges. Since the AOC is a self-organizing system based on Skinner's operant conditioning theory, and self-organization is a process of absorbing information, drawing in negative entropy, and removing uncertainty, the proven convergence of ψ(t) demonstrates the self-organizing character of the AOC.
The concrete implementation steps of this experiment are as follows:
(1) Initialization: set t = 0; choose the initial state s(0) of the AOC at random; learning rate a = 0.01; initial operation probabilities p_ik(0) = 0.5 (i = 0, 1; k = 1, 2); stopping time T_f = 1000;
(2) Operation selection: according to the "condition-operation" rule set Γ = {r_ik(p) | p ∈ P; i ∈ {0, 1}; k ∈ {1, 2}}, where the random rule r_ik(p): s_i → α_k(p) means that in state s_i ∈ S the AOC executes α_k ∈ Ω with probability p = p_ik = p(α_k|s_i), randomly select the operation α(t) ∈ Ω for the current state s(t) ∈ S;
(3) Operation execution: at time t, the AOC in state s(t) ∈ S executes the selected operation α(t) ∈ Ω, and the current state transitions according to δ: S(t) × Ω(t) → S(t+1), concretely:

s_0 × α_1 → s_1, s_0 × α_2 → s_0, s_1 × α_1 → s_1, s_1 × α_2 → s_0;
(4) State observation: by the state transition function δ: S(t) × Ω(t) → S(t+1), whether or not the transition process is known, its result is fully observable: there exists j ∈ {0, 1} such that s(t+1) = s_j;
(5) Operant conditioning: executing an operation at time t not only shifts the state of the AOC; the execution probability of each operation at the next moment also changes. With s(t) = s_i and α(t) = α_k, the operation probabilities at time t+1 are updated by the learning law η:

p_ik(t+1) = p_ik(t) + Δ, p_iu(t+1) = p_iu(t) − Δ·ξ (u ≠ k),

where Δ = a·ε⃗_ij·(1 − p_ik(t)) and 0 ≤ p_ik + Δ ≤ 1; ε⃗_ij = ε(s_j) − ε(s_i) is the increment of the orientation value; a is the learning rate; ξ = p_iu(t)/Σ_{v≠k} p_iv(t);
(6) Operation-entropy calculation: from the defined formula ψ(t) = −Σ_{i=0}^{n_S} p(s_i) Σ_{k=1}^{n_Ω} p(α_k|s_i) log₂ p(α_k|s_i), compute the operation entropy at time t, where p(s_i) is the value at time t of the occurrence probability of state s_i ∈ S and p(α_k|s_i) is the value at time t of the probability of executing α_k ∈ Ω in state s_i ∈ S;
(7) Recursion: if t+1 ≤ T_f, set t = t+1 and repeat steps (2)-(7);
(8) When t+1 > T_f, halt.
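With run_aoc from the sketch above, the white-mouse experiment reduces to a few lines. The patent does not list numerical orientation values for this embodiment, so ε(s_0) = 0, ε(s_1) = 1 below is an assumption chosen so that reaching the fed state is positively oriented:

    # Operations: 0 = press lever (alpha_1), 1 = do not press (alpha_2);
    # states: 0 = hungry (s_0), 1 = not hungry (s_1).
    eps = [0.0, 1.0]                                   # assumed orientation values
    rat_delta = lambda i, k: 1 if k == 0 else 0        # pressing feeds, not pressing starves
    rat_orient = lambda i, j: eps[j] - eps[i]
    p, entropy = run_aoc(rat_delta, rat_orient, n_states=2, n_ops=2, a=0.01, T_f=1000)
    print(p[:, 0])   # probability of pressing the lever; tends toward 1, as in Fig. 3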
Embodiment two: a robotic pigeon with learning ability, simulating Skinner's pigeon experiment. In this experiment the pigeon obtains food when it pecks the red button (positive reinforcement), receives no stimulus when it pecks the yellow button, and receives an electric shock when it pecks the blue button (negative reinforcement). At the beginning the pigeon pecks the red, yellow, and blue buttons at random; after a while it pecks the red button far more often than the other two. The robotic pigeon is defined as an autonomous operant conditioning automaton with 3 operations and 3 states. Its operation set is Ω = {α_0, α_1, α_2}, the elements being pecking the red button α_0, pecking the yellow button α_1, and pecking the blue button α_2, with probabilities denoted p_0, p_1, p_2 respectively. Its state set is S = {s_0, s_1, s_2}: the zero-hunger (non-hungry) state s_0, the half-hunger state s_1, and the hunger state s_2. The state transition rules are:

δ(s_0, α_0) = s_0   δ(s_0, α_1) = s_1   δ(s_0, α_2) = s_1
δ(s_1, α_0) = s_0   δ(s_1, α_1) = s_2   δ(s_1, α_2) = s_2
δ(s_2, α_0) = s_1   δ(s_2, α_1) = s_2   δ(s_2, α_2) = s_2

shown in tabular form as Table 1 below. Orientation increments ε⃗_ij ∈ {0, ±0.5, ±1.0} are assigned to the transitions, with Δ = a × ε⃗_ij × (1 − p_0): s_0 → s_0: zero orientation (ε⃗_00 = 0); s_0 → s_1: zero orientation (ε⃗_01 = 0); s_1 → s_0: positive orientation (ε⃗_10 = 0.5); s_1 → s_2: negative orientation (ε⃗_12 = −0.5); s_2 → s_1: positive orientation (ε⃗_21 = 1.0); s_2 → s_2: negative orientation (ε⃗_22 = −1.0). By the operant conditioning learning law η, when the current operation is rewarded (ε⃗_ij > 0) its execution probability tends to increase and the execution probabilities of the other operations decrease correspondingly; when it is neither rewarded nor punished (ε⃗_ij = 0) all operation probabilities remain unchanged; when it is punished (ε⃗_ij < 0) its execution probability tends to decrease and the execution probabilities of the other operations increase correspondingly. The initial probability of each operation is 1/3. After roughly 5000 learning steps the robotic pigeon essentially pecks only the red button, no longer pecking the yellow or blue buttons; as can be seen from Fig. 5, the probability p_0 of pecking the red button tends to 1, while the probability p_1 of pecking the yellow button and the probability p_2 of pecking the blue button both tend to 0.
Table 1. State transitions of the robotic pigeon

        α_0 (peck red)   α_1 (peck yellow)   α_2 (peck blue)
s_0     s_0              s_1                 s_1
s_1     s_0              s_2                 s_2
s_2     s_1              s_2                 s_2
During the experiment the operation entropy was computed at every moment from the defined formula ψ(t) = −Σ_{i=0}^{n_S} p(s_i) Σ_{k=1}^{n_Ω} p(α_k|s_i) log₂ p(α_k|s_i); as time passes, ψ(t) becomes smaller and smaller and tends to its minimum as t → ∞ (see Fig. 6), showing that ψ(t) converges. Since the AOC is a self-organizing system based on Skinner's operant conditioning theory, and self-organization is a process of absorbing information, drawing in negative entropy, and removing uncertainty, the proven convergence of ψ(t) again demonstrates the self-organizing character of the AOC.
The concrete implementation steps of this experiment are as follows:
(1) Initialization: set t = 0; choose the initial state s(0) of the AOC at random; learning rate a = 0.01; initial operation probabilities p_ik(0) = 1/3 (i = 0, 1, 2; k = 0, 1, 2); stopping time T_f = 5000;
(2) Operation selection: according to the "condition-operation" rule set Γ = {r_ik(p) | p ∈ P; i ∈ {0, 1, 2}; k ∈ {0, 1, 2}}, where the random rule r_ik(p): s_i → α_k(p) means that in state s_i ∈ S the AOC executes α_k ∈ Ω with probability p = p_ik = p(α_k|s_i), randomly select the operation α(t) ∈ Ω for the current state s(t) ∈ S;
(3) Operation execution: at time t, the AOC in state s(t) ∈ S executes the selected operation α(t) ∈ Ω, and the current state transitions according to δ: S(t) × Ω(t) → S(t+1), concretely:

δ(s_0, α_0) = s_0   δ(s_0, α_1) = s_1   δ(s_0, α_2) = s_1
δ(s_1, α_0) = s_0   δ(s_1, α_1) = s_2   δ(s_1, α_2) = s_2
δ(s_2, α_0) = s_1   δ(s_2, α_1) = s_2   δ(s_2, α_2) = s_2;
(4) State observation: by the state transition function δ: S(t) × Ω(t) → S(t+1), whether or not the transition process is known, its result is fully observable: there exists j ∈ {0, 1, 2} such that s(t+1) = s_j;
(5) Operant conditioning: executing an operation at time t not only shifts the state of the AOC; the execution probability of each operation at the next moment also changes. With s(t) = s_i and α(t) = α_k, the operation probabilities at time t+1 are updated by the learning law η:

p_ik(t+1) = p_ik(t) + Δ, p_iu(t+1) = p_iu(t) − Δ·ξ (u ≠ k),

where Δ = a·ε⃗_ij·(1 − p_ik(t)) and 0 ≤ p_ik + Δ ≤ 1; ε⃗_ij = ε(s_j) − ε(s_i) is the increment of the orientation value; a is the learning rate; ξ = p_iu(t)/Σ_{v≠k} p_iv(t);
(6) Operation-entropy calculation: from the defined formula ψ(t) = −Σ_{i=0}^{n_S} p(s_i) Σ_{k=1}^{n_Ω} p(α_k|s_i) log₂ p(α_k|s_i), compute the operation entropy at time t, where p(s_i) is the value at time t of the occurrence probability of state s_i ∈ S and p(α_k|s_i) is the value at time t of the probability of executing α_k ∈ Ω in state s_i ∈ S;
(7) Recursion: if t+1 ≤ T_f, set t = t+1 and repeat steps (2)-(7);
(8) When t+1 > T_f, halt.
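The robotic-pigeon experiment maps onto the same run_aoc loop; in this sketch the transition table follows Table 1 and the per-transition orientation increments follow the text, with transitions not listed there carrying increment 0 (our reading):

    # Operations: 0 = peck red, 1 = peck yellow, 2 = peck blue;
    # states: 0 = not hungry, 1 = half hungry, 2 = hungry.
    PIGEON_TRANS = [[0, 1, 1],
                    [0, 2, 2],
                    [1, 2, 2]]
    PIGEON_ORIENT = {(1, 0): 0.5, (1, 2): -0.5, (2, 1): 1.0, (2, 2): -1.0}  # others zero
    p, entropy = run_aoc(lambda i, k: PIGEON_TRANS[i][k],
                         lambda i, j: PIGEON_ORIENT.get((i, j), 0.0),
                         n_states=3, n_ops=3, a=0.01, T_f=5000)
    print(p[:, 0])   # probability of pecking red; rises toward 1 in the hungry states (cf. Fig. 5)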
Embodiment three: balance control of a two-wheeled self-balancing robot realized by the autonomous operant conditioning automaton. The two-wheeled upright robot moves freely left and right on level ground; when the tilt angle exceeds ±12° the robot loses balance. The state set of the AOC designed for this task is built from the robot's tilt angle and comprises 6 states — θ = 0°, 0° < θ < 12°, θ = 12°, −12° < θ < 0°, θ = −12°, and |θ| > 12° — denoted s_0, s_1, s_2, s_3, s_4, s_5 respectively, so that S = {s_0, s_1, s_2, s_3, s_4, s_5}. Its operation set is Ω = {α_0, α_1, α_2}, comprising not moving α_0, moving right α_1, and moving left α_2. Its state transition rules are as follows:
δ(s_0, α_0) = s_0   δ(s_0, α_1) = s_3   δ(s_0, α_2) = s_1
δ(s_1, α_0) = s_2   δ(s_1, α_1) = s_0   δ(s_1, α_2) = s_2
δ(s_2, α_0) = s_5   δ(s_2, α_1) = s_1   δ(s_2, α_2) = s_5
δ(s_3, α_0) = s_4   δ(s_3, α_1) = s_4   δ(s_3, α_2) = s_0
δ(s_4, α_0) = s_5   δ(s_4, α_1) = s_5   δ(s_4, α_2) = s_3
(see Table 2). Orientation increments ε⃗_ij ∈ {0, ±0.5, ±1.0} are assigned to the transitions, with Δ = a × ε⃗_ij × (1 − p_ik), where p_ik is the probability that the robot executes operation α_k in state s_i: s_0 → s_0: zero orientation (ε⃗_00 = 0); s_0 → s_3: zero orientation (ε⃗_03 = 0); s_0 → s_1: zero orientation (ε⃗_01 = 0); s_1 → s_0: positive orientation (ε⃗_10 = 1.0); s_1 → s_2: negative orientation (ε⃗_12 = −0.5); s_2 → s_1: positive orientation (ε⃗_21 = 1.0); s_2 → s_5: negative orientation (ε⃗_25 = −1.0); s_3 → s_4: negative orientation (ε⃗_34 = −0.5); s_3 → s_0: positive orientation (ε⃗_30 = 1.0); s_4 → s_5: negative orientation (ε⃗_45 = −1.0); s_4 → s_3: positive orientation (ε⃗_43 = 1.0). The probabilities are continually updated according to the operant conditioning learning law η. Each initial probability is 1/3; after roughly 1500 learning steps the robot maintains its balance, selecting in each of the first five states, with probability close to 1, the operation that drives θ toward 0°, as can be seen from Figs. 7-11. During the experiment the operation entropy was computed at every moment from the defined formula ψ(t) = −Σ_{i=0}^{n_S} p(s_i) Σ_{k=1}^{n_Ω} p(α_k|s_i) log₂ p(α_k|s_i); as time passes, ψ(t) becomes smaller and smaller and tends to its minimum as t → ∞ (see Fig. 12), showing that ψ(t) converges. Since the AOC is a self-organizing system based on Skinner's operant conditioning theory, and self-organization is a process of absorbing information, drawing in negative entropy, and removing uncertainty, the proven convergence of ψ(t) again demonstrates the self-organizing character of the AOC.
Table 2. State transitions and orientation mechanism of the two-wheeled self-balancing robot (orientation increments ε⃗_ij in parentheses)

        α_0 (stay)    α_1 (move right)   α_2 (move left)
s_0     s_0 (0)       s_3 (0)            s_1 (0)
s_1     s_2 (−0.5)    s_0 (+1.0)         s_2 (−0.5)
s_2     s_5 (−1.0)    s_1 (+1.0)         s_5 (−1.0)
s_3     s_4 (−0.5)    s_4 (−0.5)         s_0 (+1.0)
s_4     s_5 (−1.0)    s_5 (−1.0)         s_3 (+1.0)
The concrete implementation steps of this experiment are as follows:
(1) Initialization: set t = 0; choose the initial state s(0) of the AOC at random; learning rate a = 0.01; initial operation probabilities p_ik(0) = 1/3 (i = 0, 1, ..., 4; k = 0, 1, 2); stopping time T_f = 1500;
(2) Operation selection: according to the "condition-operation" rule set Γ = {r_ik(p) | p ∈ P; i ∈ {0, 1, 2, 3, 4}; k ∈ {0, 1, 2}}, where the random rule r_ik(p): s_i → α_k(p) means that in state s_i ∈ S the AOC executes α_k ∈ Ω with probability p = p_ik = p(α_k|s_i), randomly select the operation α(t) ∈ Ω for the current state s(t) ∈ S;
(3) Operation execution: at time t, the AOC in state s(t) ∈ S executes the selected operation α(t) ∈ Ω, and the current state transitions according to δ: S(t) × Ω(t) → S(t+1), concretely:

δ(s_0, α_0) = s_0   δ(s_0, α_1) = s_3   δ(s_0, α_2) = s_1
δ(s_1, α_0) = s_2   δ(s_1, α_1) = s_0   δ(s_1, α_2) = s_2
δ(s_2, α_0) = s_5   δ(s_2, α_1) = s_1   δ(s_2, α_2) = s_5
δ(s_3, α_0) = s_4   δ(s_3, α_1) = s_4   δ(s_3, α_2) = s_0
δ(s_4, α_0) = s_5   δ(s_4, α_1) = s_5   δ(s_4, α_2) = s_3;
(4) State observation: by the state transition function δ: S(t) × Ω(t) → S(t+1), whether or not the transition process is known, its result is fully observable: there exists j ∈ {0, 1, 2, 3, 4, 5} such that s(t+1) = s_j;
(5) Operant conditioning: executing an operation at time t not only shifts the state of the AOC; the execution probability of each operation at the next moment also changes. With s(t) = s_i and α(t) = α_k, the operation probabilities at time t+1 are updated by the learning law η:

p_ik(t+1) = p_ik(t) + Δ, p_iu(t+1) = p_iu(t) − Δ·ξ (u ≠ k),

where Δ = a·ε⃗_ij·(1 − p_ik(t)) and 0 ≤ p_ik + Δ ≤ 1; ε⃗_ij = ε(s_j) − ε(s_i) is the increment of the orientation value; a is the learning rate; ξ = p_iu(t)/Σ_{v≠k} p_iv(t). Here the optimal operation differs from state to state, so the probability of each operation must be maintained for every state — 15 probabilities in all;
(6) Operation-entropy calculation: from the defined formula ψ(t) = −Σ_{i=0}^{n_S} p(s_i) Σ_{k=1}^{n_Ω} p(α_k|s_i) log₂ p(α_k|s_i), compute the operation entropy at time t, where p(s_i) is the value at time t of the occurrence probability of state s_i ∈ S and p(α_k|s_i) is the value at time t of the probability of executing α_k ∈ Ω in state s_i ∈ S;
(7) Recursion: if t+1 ≤ T_f, set t = t+1 and repeat steps (2)-(7);
(8) When t+1 > T_f, halt.
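The balance-control experiment fits the same run_aoc loop. The transition table and increments in this sketch follow Table 2; what happens after a fall (state s_5) is not specified in the text, so the reset-to-upright row below is our assumption:

    # Operations: 0 = stay, 1 = move right, 2 = move left; states s_0..s_5 by tilt angle
    # (0 = upright, ..., 5 = fallen, |theta| > 12 degrees).
    ROBOT_TRANS = [[0, 3, 1],
                   [2, 0, 2],
                   [5, 1, 5],
                   [4, 4, 0],
                   [5, 5, 3],
                   [0, 0, 0]]                           # assumed reset after a fall
    ROBOT_ORIENT = {(1, 0): 1.0, (1, 2): -0.5, (2, 1): 1.0, (2, 5): -1.0,
                    (3, 4): -0.5, (3, 0): 1.0, (4, 5): -1.0, (4, 3): 1.0}  # others zero
    p, entropy = run_aoc(lambda i, k: ROBOT_TRANS[i][k],
                         lambda i, j: ROBOT_ORIENT.get((i, j), 0.0),
                         n_states=6, n_ops=3, a=0.01, T_f=1500)
    print(p[:5])     # per-state operation probabilities after learning (cf. Figs. 7-11)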

Claims (2)

1. An autonomous operant conditioning automaton, hereinafter AOC, being a nine-tuple:

AOC = <t, Ω, S, Γ, δ, ε, η, ψ, s_0>

wherein:
(1) the discrete time of the AOC: t ∈ {0, 1, 2, ..., n_t}; t = 0 is the initial moment of the AOC;
(2) the operation-symbol set of the AOC: Ω = {α_k | k = 1, 2, ..., n_Ω}, α_k being the k-th operation symbol of the AOC;
(3) the state set of the AOC: S = {s_i | i = 0, 1, 2, ..., n_S}, s_i being the i-th state of the AOC;
(4) the operating-rule set of the AOC: Γ = {r_ik(p) | p ∈ P; i ∈ {0, 1, 2, ..., n_S}; k ∈ {1, 2, ..., n_Ω}}, where the random "condition-operation" rule r_ik(p): s_i → α_k(p) means that when the AOC is in state s_i ∈ S it executes operation α_k ∈ Ω with probability p ∈ P, p = p_ik = p(α_k | s_i) being the probability that the AOC executes α_k given state s_i and P denoting the set of the p_ik;
(5) the state transition function of the AOC: δ: S(t) × Ω(t) → S(t+1), whereby the state s(t+1) ∈ S at time t+1 is determined by the state s(t) ∈ S and the operation α(t) ∈ Ω at time t, independently of states and operations before time t; the transition process determined by δ may be known or unknown, but the result of each state transition is observable;
(6) the orientation function of the AOC: ε: S → E = {ε_i | i = 0, 1, 2, ..., n_S}, where ε_i = ε(s_i) ∈ E is the orientation value of state s_i ∈ S;
(7) the operant conditioning learning law of the AOC, η, which simulates the biological mechanism of operant conditioning and adjusts the execution probability of the operating rules r_ik(p) ∈ Γ: suppose the state at time t is s(t) = s_i, operation α(t) = α_k ∈ Ω is executed, and state s(t+1) = s_j is observed at time t+1; then the operation probabilities at time t+1 are updated as

p_ik(t+1) = p_ik(t) + Δ, p_iu(t+1) = p_iu(t) − Δ·ξ (u ≠ k),

where p_ik(t) is the value at time t of the probability that the AOC executes α_k ∈ Ω in state s_i ∈ S, and p_ik(t+1) its value at time t+1; Δ = a·f(ε⃗_ij)·(1 − p_ik(t)), with 0 ≤ p_ik + Δ ≤ 1; ε⃗_ij = ε(s_j) − ε(s_i) is the increment of the orientation value; f is a monotonically increasing function satisfying f(x) = 0 if and only if x = 0; a is the learning rate; ξ = p_iu(t)/Σ_{v≠k} p_iv(t), where u is any index from 1 to n_Ω not equal to k, v ranges over all such indices, and Σ_{v≠k} p_iv(t) is the value at time t of the sum of the probabilities of executing the operations other than α_k in state s_i ∈ S; p_iu(t) and p_iu(t+1) are the values at times t and t+1 of the probability that the AOC executes α_u ∈ Ω in state s_i ∈ S;
(8) the operation entropy of the AOC: ψ: P × E → R⁺, R⁺ being the set of positive real numbers; the operation entropy ψ(t) at time t is the weighted sum over the states of the operation entropies under each state s_i:

ψ(t) = ψ(Ω(t)|S) = Σ_{i=0}^{n_S} p_i ψ_i(t) = Σ_{i=0}^{n_S} p(s_i) ψ_i(Ω(t)|s_i),

determined by the operation-probability sets and the orientation values under each state s(t) = s_i at time t; ψ_i(t) is the operation entropy of the AOC in state s_i:

ψ_i(t) = ψ_i(Ω(t)|s_i) = −Σ_{k=1}^{n_Ω} p_ik log₂ p_ik = −Σ_{k=1}^{n_Ω} p(α_k|s_i) log₂ p(α_k|s_i);

knowing the operation entropy under each state and forming the weighted sum gives the operation entropy of the AOC at time t: ψ(t) = −Σ_{i=0}^{n_S} p(s_i) Σ_{k=1}^{n_Ω} p(α_k|s_i) log₂ p(α_k|s_i), where p(s_i) is the value at time t of the occurrence probability of state s_i ∈ S and p(α_k|s_i) is the value at time t of the probability of executing α_k ∈ Ω in state s_i ∈ S;
(9) the initial state of the AOC: s_0 = s(0) ∈ S.
2. The autonomous operant conditioning automaton AOC according to claim 1, characterized in that it runs recursively according to the following program steps:
(1) Initialization: set t = 0; choose the initial state s(0) of the AOC at random; fix the learning rate a; set the initial operation probabilities p_ik(0) = 1/n_Ω (i = 0, 1, 2, ..., n_S; k = 1, 2, ..., n_Ω); fix the stopping time T_f;
(2) Operation selection: according to the rule r_ik(p): s_i → α_k(p) in the "condition-operation" rule set Γ — i.e. in state s_i ∈ S the AOC executes α_k ∈ Ω with probability p = p_ik = p(α_k|s_i) — randomly select the operation α(t) ∈ Ω for the current state s(t) ∈ S;
(3) Operation execution: at time t, the AOC in state s(t) ∈ S executes the operation α(t) ∈ Ω selected in the previous step, and the current state undergoes the transition δ(s(t), α(t)) = δ(s_i, α_k);
(4) State observation: by the state transition function δ: S(t) × Ω(t) → S(t+1), the result of the transition is fully observable: there exists j ∈ {0, 1, 2, ..., n_S} such that s(t+1) = s_j;
(5) Operant conditioning: executing an operation at time t not only shifts the state of the AOC; the execution probability of each operation at the next moment also changes, the execution probabilities p ∈ P of the rules r_ik(p) ∈ Γ being adjusted according to the learning law η: with s(t) = s_i and α(t) = α_k, the operation probabilities at time t+1 are updated as

p_ik(t+1) = p_ik(t) + Δ, p_iu(t+1) = p_iu(t) − Δ·ξ (u ≠ k),

where Δ = a·f(ε⃗_ij)·(1 − p_ik(t)) and 0 ≤ p_ik + Δ ≤ 1; ε⃗_ij = ε(s_j) − ε(s_i) is the increment of the orientation value; a is the learning rate; ξ = p_iu(t)/Σ_{v≠k} p_iv(t);
(6) Operation-entropy calculation: from the defined formula ψ(t) = −Σ_{i=0}^{n_S} p(s_i) Σ_{k=1}^{n_Ω} p(α_k|s_i) log₂ p(α_k|s_i), compute the operation entropy at time t, where p(s_i) is the value at time t of the occurrence probability of state s_i ∈ S and p(α_k|s_i) is the value at time t of the probability of executing α_k ∈ Ω in state s_i ∈ S;
(7) Recursion: if t+1 ≤ T_f, set t = t+1 and repeat steps (2)-(7);
(8) When t+1 > T_f, halt.
CNA2009100892633A 2009-07-15 2009-07-15 Autonomous operant conditioning automaton and its application in realizing intelligent behavior Pending CN101599137A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2009100892633A CN101599137A (en) 2009-07-15 2009-07-15 Autonomous operant conditioning automaton and its application in realizing intelligent behavior


Publications (1)

Publication Number Publication Date
CN101599137A 2009-12-09

Family

ID=41420574

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2009100892633A Pending CN101599137A (en) Autonomous operant conditioning automaton and its application in realizing intelligent behavior

Country Status (1)

Country Link
CN (1) CN101599137A (en)


Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103792846A (en) * 2014-02-18 2014-05-14 北京工业大学 Robot obstacle avoidance guiding method based on Skinner operating condition reflection principle
CN103792846B (en) * 2014-02-18 2016-05-18 北京工业大学 Based on the robot obstacle-avoiding air navigation aid of Skinner operant conditioning reflex principle
CN105094124A (en) * 2014-05-21 2015-11-25 防灾科技学院 Method and model for performing independent path exploration based on operant conditioning
CN104614988B (en) * 2014-12-22 2017-04-19 北京工业大学 Cognitive and learning method of cognitive moving system with inner engine
CN104614988A (en) * 2014-12-22 2015-05-13 北京工业大学 Cognitive and learning method of cognitive moving system with inner engine
CN104570738A (en) * 2014-12-30 2015-04-29 北京工业大学 Robot track tracing method based on Skinner operant conditioning automata
CN105205533A (en) * 2015-09-29 2015-12-30 华北理工大学 Development automatic machine with brain cognition mechanism and learning method of development automatic machine
CN105205533B (en) * 2015-09-29 2018-01-05 华北理工大学 Development automatic machine and its learning method with brain Mechanism of Cognition
WO2017114130A1 (en) * 2015-12-31 2017-07-06 深圳光启合众科技有限公司 Method and device for obtaining state of robot
CN106926236A (en) * 2015-12-31 2017-07-07 深圳光启合众科技有限公司 The method and apparatus for obtaining the state of robot
CN106926236B (en) * 2015-12-31 2020-06-30 深圳光启合众科技有限公司 Method and device for acquiring state of robot
CN108846477A (en) * 2018-06-28 2018-11-20 上海浦东发展银行股份有限公司信用卡中心 A kind of wisdom brain decision system and decision-making technique based on reflex arc
CN108846477B (en) * 2018-06-28 2022-06-21 上海浦东发展银行股份有限公司信用卡中心 Intelligent brain decision system and decision method based on reflection arcs
CN109212975A (en) * 2018-11-13 2019-01-15 北方工业大学 A kind of perception action cognitive learning method with developmental mechanism
CN111464707A (en) * 2020-03-30 2020-07-28 中国建设银行股份有限公司 Outbound call processing method, device and system


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20091209