CN101599137A - Autonomous operant conditioning reflex automaton and its application in realizing intelligent behavior


Info

Publication number
CN101599137A
CN101599137A
Authority
CN
China
Legal status
Pending
Application number
CNA2009100892633A
Other languages
Chinese (zh)
Inventor
阮晓钢
戴丽珍
蔡建羡
陈静
郜园园
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CNA2009100892633A
Publication of CN101599137A

Abstract

An autonomous operant conditioning reflex automaton and its application in realizing intelligent behavior belong to the field of bionics. The autonomous operant conditioning reflex automaton (AOC) is a discrete computational model of an autonomous automaton. It mainly comprises an operation set, a state set, a set of "condition-operation" rules, observable state transitions, and an operant conditioning learning law; in addition, a behavior entropy based on the AOC state orientation values is defined and a recursive operating procedure of the AOC is specified. A key feature of the AOC is that it simulates the operant conditioning mechanism of organisms, and it therefore has bionic self-organizing functions, including self-learning and self-adaptation. It can be used to describe, simulate and design various self-organizing systems, and in particular to describe, simulate and plan various intelligent behaviors of robot systems.

Description

Autonomous operant conditioning reflex automaton and its application in the realization of intelligent behaviors
Technical Field
The invention relates to an automaton, in particular to a bionic automaton based on the principle of operant conditioning.
Background
Automaton models of learning systems were developed in the 1960s and are referred to as learning automata; over the years, different application requirements have essentially been met by changing the structure of the learning automaton, which typically has both an input and an output. The present invention is a self-organizing system based on Skinner's theory of operant conditioning and has self-learning and self-adaptation capabilities. Beginning in the 20th century, the study of animal learning identified two forms of learning: one is classical conditioning, which shapes the respondent behavior of an organism; the other is operant conditioning, which shapes the operant behavior of an organism. Western scientists regard these two kinds of reflexes as two distinct association processes: classical conditioning is an S-R (stimulus-response) association process, whereas operant conditioning is an R-S (response-stimulus) association process.
In recent decades, academic interest in autonomous systems has grown year by year, and the amount of literature related to autonomous systems has likewise increased year by year. The invention is an autonomous automaton; unlike a non-autonomous automaton, its output does not need to be driven by external instructions and is produced by the automaton according to its own needs. Related patents include: "Customer operation type automaton and picture generating method" (application No. 98115560.X) and "Regular expression matching acceleration method based on memory deterministic finite automaton" (application No. 200710071071.0); in both, the automaton interacts with the external environment to realize a certain function. To date, no autonomous operant conditioning reflex automaton has been available.
The invention provides an abstract self-organizing model based on Skinner's operant conditioning theory, which is used for describing, simulating and designing various self-organizing systems so that they exhibit self-learning and self-adaptive characteristics.
Disclosure of Invention
The invention provides an autonomous operant conditioning reflex automaton which can be used for describing, simulating and designing self-organizing systems (including self-learning and self-adaptation).
The autonomous operant conditioning reflex automaton of the present invention is a nine-tuple comprising: a discrete time, an operation symbol set, an internal state set, a random "condition-operation" rule set, a state transition function, an orientation function, an operant conditioning learning law, an operating entropy and an initial state; a recursive operating procedure of the AOC is also specified. The AOC is characterized in that it simulates the operant conditioning mechanism of organisms, and therefore has bionic self-organizing functions, including self-learning and self-adaptation, and can be used to describe, simulate and design various self-organizing systems with interaction capabilities.
A general finite state automaton is a five-tuple FA = {A, Z, S, f, g}, where A denotes a finite input symbol set, S a finite (internal) state symbol set (s(0) ∈ S being the initial state), Z a finite output (acceptance state) symbol set, f: S × A → S the state transition function, and g: S → Z the output function. The finite state automaton FA is a non-autonomous system.
The AOC and the finite state automaton FA are not equivalent, in the sense that the operation symbols of the AOC are not equivalent to the input symbols of the FA: the operation symbols represent the internal operations of the AOC, whereas the input symbols of the FA represent external instructions. The operation symbol set Ω of the AOC is not the input symbol set of the FA but the set of internal operations of the AOC, while the input symbol set of the FA is in fact the set of instructions that may be input from outside. The AOC has no output symbol set and, naturally, no output function. Does the AOC, as an autonomous system, need an output symbol set and an output function? An autonomous system may indeed also need to act on the environment or the objective world. Viewed in the form of a state-space equation, however, the output is a combination of states, or a combination of states and operations; in this sense the internal state set of the AOC is itself its output symbol set, and the states of the AOC are observable. "The states of the AOC are observable" means that the AOC itself has receptors and can detect changes of its own states; it does not mean that the outside world can observe these quantities. The autonomous automaton also produces output, but this output does not need to be driven by external instructions; it is produced by the automaton according to its own needs.
Compared with a non-autonomous automaton, the autonomous automaton has the advantage that its output does not need to be driven by external instructions and it can act on the environment according to its own needs; that is, even if the external environment changes, the autonomous automaton can still work normally, whereas a non-autonomous automaton must change its structural model or parameters to adapt to changes in the external environment. A non-autonomous system can always be converted into an autonomous system, so for any non-autonomous operant conditioning reflex automaton a corresponding autonomous operant conditioning reflex automaton AOC can always be found. The autonomous operant conditioning reflex automaton AOC therefore has a wider range of application.
In information theory, entropy can be used as a measure of the uncertainty of an event: the larger the amount of information, the more regular the structure, the more complete the function, and the smaller the entropy. Using the concept of entropy, the measurement, transmission, transformation and storage of information can be studied theoretically. The invention introduces the concept of operating entropy and proves the convergence of the AOC operating entropy ψ(t); the self-organizing process of the system is a process of absorbing information, absorbing negative entropy and eliminating uncertainty, which clarifies the self-organizing characteristics of the AOC and shows that the AOC has self-learning and self-adaptation capabilities.
The invention provides an autonomous operant conditioning reflex automaton; Skinner's animal experiments are simulated to show that the automaton realizes the mechanism of operant conditioning learning, and balance control of a two-wheeled self-balancing robot is also realized, which shows that the AOC can be used to design various intelligent behaviors of robot systems.
The automaton of the invention is a nine-tuple autonomous operant conditioning reflex automaton:

AOC = <t, Ω, S, Γ, δ, ε, η, ψ, s0>
wherein
(1) Discrete time of the AOC: t ∈ {0, 1, 2, …, n_t}; t = 0 is the starting time of the AOC;
(2) Operation symbol set of the AOC: Ω = {α_k | k = 1, 2, …, n_Ω}; α_k is the kth operation symbol of the AOC;
(3) State set of the AOC: S = {s_i | i = 0, 1, 2, …, n_S}; s_i is the ith state of the AOC;
(4) Operation rule set of the AOC: Γ = {r_ik(p) | p ∈ P; i ∈ {0, 1, 2, …, n_S}; k ∈ {1, 2, …, n_Ω}}. The random "condition-operation" rule r_ik(p): s_i → α_k(p) means that, under the condition that the AOC is in state s_i ∈ S, the operation α_k ∈ Ω is carried out with probability p ∈ P, where p = p_ik = p(α_k | s_i), i.e. the probability of carrying out operation α_k under the condition that the AOC is in state s_i; P denotes the set of the p_ik;
(5) State transition function of the AOC: δ: S(t) × Ω(t) → S(t+1). The state s(t+1) ∈ S of the AOC at time t+1 is determined by the state s(t) ∈ S and the operation α(t) ∈ Ω at time t, independently of the states and operations before time t. The state transition process determined by δ may be known or unknown, but the result of the state transition can be observed;
(6) Orientation function of the AOC: ε: S → E = {ε_i | i = 0, 1, 2, …, n_S}; ε_i = ε(s_i) ∈ E is the orientation value of state s_i ∈ S;
(7) Operant conditioning learning law of the AOC:

η: p_{ik}(t+1) = p_{ik}(t) + \Delta, \quad p_{iu}(t+1) = p_{iu}(t) - \Delta\xi \quad (u \neq k),

which adjusts the implementation probability p ∈ P of the operation rule r_ik(p) ∈ Γ. Assume that the state at time t is s(t), an operation α(t) ∈ Ω is carried out, and the state s(t+1) is observed at time t+1. According to Skinner's theory of operant conditioning, if ε(s(t+1)) − ε(s(t)) < 0 then p(α(t) | s(t)) tends to decrease, whereas if ε(s(t+1)) − ε(s(t)) > 0 then p(α(t) | s(t)) tends to increase. At time t the AOC is in state s(t) = s_i and the currently selected operation is α(t) = α_k; according to the state transition function, the state at the next instant is s(t+1) = s_j. Simulating the operant conditioning mechanism of living beings, the probability of the current operation at the next instant, i.e. at time t+1, is changed: its value is increased by Δ, where Δ is related to the orientation value ε (the larger the orientation increment, the better the result of the operation and the larger Δ). The probabilities of the remaining operations at time t+1 are each decreased by an amount, and the sum of these decreases is exactly Δ; each remaining operation's decrease is its share of the sum of the probabilities of the operations other than the one selected at time t, multiplied by Δ. This ensures that at every instant the probabilities of selecting the individual operations sum to 1. More formally: when s(t) = s_i, α(t) = α_k and s(t+1) = s_j, then p_ik(t+1) = p_ik(t) + Δ, and the probabilities of the other operations are updated as p_iu(t+1) = p_iu(t) − Δξ, where u denotes any value from 1 to n_Ω not equal to k. Here p_ik(t) is the value at time t of the probability of carrying out operation α_k ∈ Ω under the condition that the AOC is in state s_i ∈ S, and p_ik(t+1) is the corresponding value at time t+1; Δ = a·f(ε⃗_ij)·(1 − p_ik(t)) and 0 ≤ p_ik + Δ ≤ 1; ε⃗_ij = ε(s_j) − ε(s_i) is the increment of the orientation value; f(x) is a monotonically increasing function satisfying f(x) = 0 if and only if x = 0; a is the learning rate; ξ = p_iu(t) / Σ_{v≠k} p_iv(t), where v denotes all values from 1 to n_Ω not equal to k, and Σ_{v≠k} p_iv(t) is the value at time t of the sum of the probabilities of carrying out the operations α_v ∈ Ω (v ≠ k) under the condition that the AOC is in state s_i ∈ S; p_iu(t) is the value at time t of the probability of carrying out operation α_u ∈ Ω under the condition that the AOC is in state s_i ∈ S, and p_iu(t+1) is the corresponding value at time t+1.
(8) Operating entropy of the AOC: ψ: P × E → R⁺, where R⁺ is the set of positive real numbers. The operating entropy ψ(t) of the AOC at time t is the weighted sum, over the states s_i, of the operating entropies under the condition that the state at time t is s_i:

\psi(t) = \psi(\Omega(t) \mid S) = \sum_{i=0}^{n_S} p_i\,\psi_i(t) = \sum_{i=0}^{n_S} p(s_i)\,\psi_i(\Omega(t) \mid s_i),

determined by the set of operation probabilities and the orientation function under the condition s(t) = s_i at time t. ψ_i(t) is the operating entropy of the AOC under the condition of state s_i:

\psi_i(t) = \psi_i(\Omega(t) \mid s_i) = -\sum_{k=1}^{n_\Omega} p_{ik} \log_2 p_{ik} = -\sum_{k=1}^{n_\Omega} p(\alpha_k \mid s_i) \log_2 p(\alpha_k \mid s_i).

Knowing the operating entropy of each state and summing these with weights gives the operating entropy of the AOC at time t:

\psi(t) = -\sum_{i=0}^{n_S} p_i \sum_{k=1}^{n_\Omega} p_{ik} \log_2 p_{ik} = -\sum_{i=0}^{n_S} p(s_i) \sum_{k=1}^{n_\Omega} p(\alpha_k \mid s_i) \log_2 p(\alpha_k \mid s_i).

If the operating entropy ψ(t) of the AOC becomes smaller over time and tends to its minimum as t → ∞, the operating entropy ψ(t) of the AOC is convergent. The AOC is a self-organizing system based on Skinner's theory of operant conditioning and has self-learning and self-adaptation capabilities. The self-organizing process of the system is a process of absorbing information, absorbing negative entropy and eliminating uncertainty. To elucidate the self-organizing properties of the AOC, the convergence of the AOC operating entropy ψ(t) must be demonstrated.
(9) Initial state of the AOC: s0 = s(0) ∈ S.
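To make the operant conditioning learning law η and the operating entropy ψ defined above concrete, the following Python sketch shows one possible implementation of the probability update and of the entropy computation. It is a minimal sketch under stated assumptions: the monotonically increasing function is taken as f(x) = x (as in the embodiments below), and the function names update_probabilities and operating_entropy are illustrative, not part of the patent.

import math

def update_probabilities(p, i, k, eps_inc_ij, a):
    """Operant conditioning learning law (eta), a minimal sketch.

    p          : dict mapping a state index to its list of operation probabilities p_ik
    i, k       : the state at time t and the operation carried out in it
    eps_inc_ij : orientation increment eps(s_j) - eps(s_i) of the observed transition
    a          : learning rate
    Assumes the monotone function f(x) = x, as in the embodiments.
    """
    p_ik = p[i][k]
    delta = a * eps_inc_ij * (1.0 - p_ik)          # Delta = a * f(eps_ij) * (1 - p_ik)
    delta = max(-p_ik, min(delta, 1.0 - p_ik))     # keep 0 <= p_ik + Delta <= 1
    rest = sum(q for v, q in enumerate(p[i]) if v != k)
    for u in range(len(p[i])):
        if u == k:
            p[i][u] = p_ik + delta
        else:
            xi = p[i][u] / rest if rest > 0 else 0.0
            p[i][u] -= delta * xi                  # probabilities in state s_i still sum to 1

def operating_entropy(p, state_prob):
    """psi(t) = -sum_i p(s_i) * sum_k p(alpha_k|s_i) * log2 p(alpha_k|s_i)."""
    return -sum(
        state_prob[i] * sum(q * math.log2(q) for q in row if q > 0)
        for i, row in p.items()
    )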
The invention is characterized in that the operation conditioned reflex mechanism of the simulation organism has bionic self-organization function, including self-learning and self-adapting function, and can be used for describing, simulating and designing various self-organization systems.
The autonomous operant conditioning reflex automaton AOC of the invention operates recursively according to the following steps:
(1) Initialization: set t = 0, give the initial state s(0) of the AOC at random, give the learning rate a, and set the initial operation probabilities p_ik(0) = 1/n_Ω (i = 0, 1, 2, …, n_S; k = 1, 2, …, n_Ω); give the stopping time T_f;
(2) Operation selection: according to the rules r_ik(p): s_i → α_k(p) in the "condition-operation" rule set Γ, i.e. under the condition that the AOC is in state s_i ∈ S the operation α_k ∈ Ω is carried out with probability p = p_ik = p(α_k | s_i), an operation α(t) ∈ Ω is selected at random given the AOC state s(t) ∈ S;
(3) Operation execution: at time t the AOC is in state s(t) ∈ S; the operation α(t) ∈ Ω selected in the previous step is carried out, and the current state makes a transition according to δ(s(t), α(t)) = δ(s_i, α_k);
(4) State observation: according to the state transition function δ: S(t) × Ω(t) → S(t+1) of the AOC, the result of the state transition is fully observable, i.e. there exists j ∈ {0, 1, 2, …, n_S} such that s(t+1) = s_j;
(5) Operant conditioning: when the operation is carried out at time t, not only does the state of the AOC make a transition, but the probability of carrying out each operation at the next instant also changes. According to the operant conditioning learning law η, the implementation probability p ∈ P of the operation rule r_ik(p) ∈ Γ is adjusted: if at time t s(t) = s_i and α(t) = α_k, the operation probabilities at time t+1 are updated as p_ik(t+1) = p_ik(t) + Δ and p_iu(t+1) = p_iu(t) − Δξ (u ≠ k), where Δ = a·f(ε⃗_ij)·(1 − p_ik(t)) and 0 ≤ p_ik + Δ ≤ 1; ε⃗_ij = ε(s_j) − ε(s_i) is the increment of the orientation value; a is the learning rate; ξ = p_iu(t) / Σ_{v≠k} p_iv(t);
(6) Operating entropy computation: according to the defined operating entropy formula

\psi(t) = -\sum_{i=0}^{n_S} p(s_i) \sum_{k=1}^{n_\Omega} p(\alpha_k \mid s_i) \log_2 p(\alpha_k \mid s_i),

the operating entropy at time t is calculated, where p(s_i) is the value at time t of the probability that state s_i ∈ S occurs, and p(α_k | s_i) is the value at time t of the probability that operation α_k ∈ Ω is carried out under the condition that the AOC is in state s_i ∈ S;
(7) Recursion: if t + 1 ≤ T_f, set t = t + 1 and repeat steps (2)-(7);
(8) When t + 1 > T_f, stop.
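The recursive procedure above can be sketched as a simulation loop that reuses the hypothetical helpers update_probabilities and operating_entropy from the previous sketch. Two interface choices are assumptions of the sketch rather than part of the patent: the orientation increment is supplied per observed transition (eps_inc), because the embodiments below list the increments ε⃗_ij directly rather than per-state orientation values, and the state weighting p(s_i) in the entropy is taken as uniform.

import random

def run_aoc(states, n_ops, delta, eps_inc, a=0.01, t_f=1000, s0=None):
    """Recursive operating procedure of the AOC, a minimal sketch.

    states  : indices of the modelled states (those that keep operation probabilities)
    n_ops   : number of operations
    delta   : dict (state, operation) -> next state            (state transition function)
    eps_inc : dict (state, next state) -> orientation increment eps(s_j) - eps(s_i)
    """
    # (1) initialization: uniform operation probabilities p_ik(0) = 1/n_Omega, random initial state
    p = {i: [1.0 / n_ops] * n_ops for i in states}
    s = random.choice(states) if s0 is None else s0
    entropy_curve = []
    for t in range(t_f):
        # (2) select an operation at random according to the current probabilities p(alpha_k | s_i)
        k = random.choices(range(n_ops), weights=p[s])[0]
        # (3)-(4) carry out the operation and observe the resulting state
        s_next = delta[(s, k)]
        # (5) operant conditioning: adjust the operation probabilities of state s
        update_probabilities(p, s, k, eps_inc[(s, s_next)], a)
        # (6) operating entropy, here with a uniform state weighting (an assumption)
        entropy_curve.append(operating_entropy(p, {i: 1.0 / len(states) for i in states}))
        # (7) recursion; if the transition leaves the modelled states (e.g. the robot falls),
        #     restart from a random modelled state (an assumption about how a run is restarted)
        s = s_next if s_next in p else random.choice(states)
    # (8) stop after T_f steps
    return p, entropy_curve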
The flow chart of the method of the invention is shown in FIG. 2.
Drawings
FIG. 1 is a schematic structural diagram of an autonomous operation conditional reflection automaton according to the present invention;
t is the discrete time (1), Ω is the set of operations α_k (k = 1, 2, …, n_Ω) (2), S is the set of states s_i (i = 0, 1, 2, …, n_S) (3), δ is the state transition function (4), Γ is the set of "condition-operation" rules r_ik (i ∈ {0, 1, 2, …, n_S}; k ∈ {1, 2, …, n_Ω}) (5), ε is the orientation function (6), η is the operant conditioning learning law (7), ψ is the operating (behavior) entropy (8), and s0 is the initial state (9).
FIG. 2 is a flow chart of the operating procedure of the autonomous operant conditioning reflex automaton AOC;
FIG. 3 is the operation behavior probability curve of the mouse experiment;
FIG. 4 is a graph of the operating entropy of the mouse experiment;
FIG. 5 is an operation behavior probability curve of a machine pigeon;
FIG. 6 is an operation entropy curve of a machine pigeon experiment;
FIG. 7 is the operation behavior probability curve of the two-wheeled self-balancing robot in the upright state, i.e. when the deflection angle θ = 0°;
FIG. 8 is the operation behavior probability curve of the two-wheeled self-balancing robot when 0° < θ < 12°;
FIG. 9 is the operation behavior probability curve of the two-wheeled self-balancing robot when the deflection angle θ = 12°;
FIG. 10 is the operation behavior probability curve of the two-wheeled self-balancing robot when −12° < θ < 0°;
FIG. 11 is the operation behavior probability curve of the two-wheeled self-balancing robot when the deflection angle θ = −12°;
FIG. 12 is the operating entropy curve of the two-wheeled self-balancing robot experiment;
Examples
The first embodiment is as follows: a minimal system, a learning mouse, simulating Skinner's rat experiment. Skinner's rat experiment, briefly: a white mouse is placed in a Skinner box fitted with a lever; the box is constructed so as to exclude external stimuli as far as possible. The mouse can move freely in the box; when the lever is pressed, a portion of food falls into a tray below, and the mouse can eat it. A device outside the box records the animal's actions. The mouse learns to press the lever repeatedly and obtains the food reward through its own actions. This Skinner rat experiment is realized here by the autonomous operant conditioning reflex automaton. The white mouse has two operation behaviors, pressing the lever (α1) and not pressing the lever (α2), i.e. the operation set Ω = {α1, α2}, with probabilities denoted p1 and p2 respectively. Its state set is S = {s0, s1}, where s0 denotes the hungry state and s1 the non-hungry state. The operation rule set is Γ = {r_ik(p) | p ∈ P; i ∈ {0, 1}; k ∈ {1, 2}}; the random "condition-operation" rule r_ik(p): s_i → α_k(p) means that, under the condition that the AOC is in state s_i ∈ S, the operation α_k ∈ Ω is carried out with probability p ∈ P, p = p_ik = p(α_k | s_i), i.e. the probability of carrying out operation α_k under the condition that the AOC is in state s_i. Its state transition function δ: S(t) × Ω(t) → S(t+1) is, specifically:
δ(s0 × α1) = s1, δ(s0 × α2) = s0, δ(s1 × α1) = s1, δ(s1 × α2) = s0. Its orientation function is ε: S → E = {ε_i | i = 0, 1}, ε_i = ε(s_i) ∈ E being the orientation value of state s_i ∈ S, and Δ = a × ε⃗_ij × (1 − p1) is defined, where a is the learning rate and ε⃗_ij = ε(s_j) − ε(s_i) is the increment of the orientation value. The probabilities of the two behaviors at the initial instant are both 0.5. The mouse obtains a reward whenever it presses the lever, so the probability of pressing the lever increases each time, i.e. the probability that the white mouse selects lever pressing at the next instant increases; the probabilities are updated according to the operant conditioning learning law η, and with repeated learning the probability p1 of the white mouse selecting lever pressing becomes larger and larger. The learning rate of the experiment is a = 0.01; after 668 learning steps the mouse has learned to press the lever to obtain food, and it is easily seen from FIG. 3 that the probability p1 that the mouse presses the lever finally tends to 1. During the experiment, the operating entropy at each instant is calculated according to the defined operating entropy formula

\psi(t) = -\sum_{i=0}^{n_S} p(s_i) \sum_{k=1}^{n_\Omega} p(\alpha_k \mid s_i) \log_2 p(\alpha_k \mid s_i);

the operating entropy ψ(t) of the AOC becomes smaller and smaller with time and tends to its minimum as t → ∞ (see FIG. 4), which shows that the operating entropy ψ(t) of the AOC is convergent. The AOC is a self-organizing system based on Skinner's theory of operant conditioning and has self-learning and self-adaptation capabilities. The self-organizing process of the system is a process of absorbing information, absorbing negative entropy and eliminating uncertainty. Having demonstrated the convergence of the AOC operating entropy ψ(t), the self-organizing property of the AOC is also elucidated.
The specific implementation steps of the experiment are as follows:
(1) Initialization: set t = 0, give the initial state s(0) of the AOC at random, set the learning rate a = 0.01, set the initial operation probabilities p_ik(0) = 0.5 (i = 0, 1; k = 1, 2), and set the stopping time T_f = 1000;
(2) Operation selection: according to the "condition-operation" rule set Γ = {r_ik(p) | p ∈ P; i ∈ {0, 1}; k ∈ {1, 2}}, where the random "condition-operation" rule r_ik(p): s_i → α_k(p) means that under the condition that the AOC is in state s_i ∈ S the operation α_k ∈ Ω is carried out with probability p = p_ik = p(α_k | s_i), an operation α(t) ∈ Ω is selected at random given the AOC state s(t) ∈ S;
(3) Operation execution: at time t the AOC is in state s(t) ∈ S; the operation α(t) ∈ Ω selected in the previous step is carried out, and the current state makes a transition according to δ: S(t) × Ω(t) → S(t+1), specifically δ(s0 × α1) = s1, δ(s0 × α2) = s0, δ(s1 × α1) = s1, δ(s1 × α2) = s0;
(4) State observation: according to the state transition function δ: S(t) × Ω(t) → S(t+1) of the AOC, the state transition process may be known or unknown, but the result of the state transition is fully observable, i.e. there exists j ∈ {0, 1} such that s(t+1) = s_j;
(5) Operant conditioning: when the operation is carried out at time t, not only does the state of the AOC make a transition, but the probability of carrying out each operation at the next instant also changes. According to the operant conditioning learning law η, the implementation probability p ∈ P of the operation rule r_ik(p) ∈ Γ is adjusted: if at time t s(t) = s_i and α(t) = α_k, the operation probabilities at time t+1 are updated as p_ik(t+1) = p_ik(t) + Δ and p_iu(t+1) = p_iu(t) − Δξ (u ≠ k), where Δ = a·f(ε⃗_ij)·(1 − p_ik(t)) and 0 ≤ p_ik + Δ ≤ 1; ε⃗_ij = ε(s_j) − ε(s_i) is the increment of the orientation value; a is the learning rate; ξ = p_iu(t) / Σ_{v≠k} p_iv(t);
(6) Operating entropy computation: according to the defined operating entropy formula

\psi(t) = -\sum_{i=0}^{n_S} p(s_i) \sum_{k=1}^{n_\Omega} p(\alpha_k \mid s_i) \log_2 p(\alpha_k \mid s_i),

the operating entropy at time t is calculated, where p(s_i) is the value at time t of the probability that state s_i ∈ S occurs, and p(α_k | s_i) is the value at time t of the probability that operation α_k ∈ Ω is carried out under the condition that the AOC is in state s_i ∈ S;
(7) Recursion: if t + 1 ≤ T_f, set t = t + 1 and repeat steps (2)-(7);
(8) When t + 1 > T_f, stop.
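Under the assumptions of the sketches above, the mouse experiment of the first embodiment might be configured as follows. The state and operation encodings and the numerical orientation values (ε(s0) = 0 for hungry, ε(s1) = 1 for non-hungry, giving the transition increments below) are illustrative choices; the patent gives no numerical ε values for this example.

# First embodiment (Skinner's rat experiment), a hypothetical configuration for run_aoc.
# States: 0 = hungry (s0), 1 = not hungry (s1).
# Operations: 0 = press the lever (alpha1), 1 = do not press the lever (alpha2).
states = [0, 1]
n_ops = 2
delta = {(0, 0): 1, (0, 1): 0,       # pressing the lever leads to the non-hungry state
         (1, 0): 1, (1, 1): 0}
eps_inc = {(0, 1): 1.0, (0, 0): 0.0,   # assumed increments from eps(s0) = 0, eps(s1) = 1
           (1, 1): 0.0, (1, 0): -1.0}
p, entropy = run_aoc(states, n_ops, delta, eps_inc, a=0.01, t_f=1000)
print(round(p[0][0], 2))             # probability of pressing the lever when hungry approaches 1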
Example two: a machine pigeon with learning ability, simulating Skinner's pigeon experiment. In this experiment the pigeon is fed when it pecks the red button (positive reinforcement), receives no stimulus when it pecks the yellow button, and receives an electric shock when it pecks the blue button (negative reinforcement); at the beginning the pigeon pecks the red, yellow and blue buttons at random. After a while, the pigeon pecks the red button significantly more often than the other two buttons. A 3-operation, 3-state autonomous operant conditioning reflex automaton is defined for the machine pigeon. Its operation set Ω = {α0, α1, α2} has as elements pecking the red button (α0), pecking the yellow button (α1) and pecking the blue button (α2), with probabilities denoted p0, p1 and p2 respectively. Its state set is S = {s0, s1, s2}, i.e. the zero-hunger (non-hungry) state s0, the half-hungry state s1 and the hungry state s2. The state transition rules are:
δ(s0×α0)=s0 δ(s0×α1)=s1 δ(s0×α2)=s1
δ(s1×α0)=s0 δ(s1×α1)=s2 δ(s1×α2)=s2
δ(s2×α0)=s1 δ(s2×α1)=s2 δ(s2×α2)=s2
These are shown in tabular form in Table 1 below. Its orientation function is ε: S → E = {ε_i | i = 0, ±0.5, ±1}, ε_i = ε(s_i) ∈ E being the orientation value of state s_i ∈ S, and Δ = a × ε⃗_ij × (1 − p0) is defined. The orientation of the transitions is: s0→s0: zero orientation (ε⃗_00 = 0); s0→s1: zero orientation (ε⃗_01 = 0); s1→s0: positive orientation (ε⃗_10 = 0.5); s1→s2: negative orientation (ε⃗_12 = −0.5); s2→s1: positive orientation (ε⃗_21 = 1.0); s2→s2: negative orientation (ε⃗_22 = −1.0). According to the operant conditioning learning law η, when the current operation is rewarded (ε⃗_ij > 0) its implementation probability tends to increase and the implementation probabilities of the other operations decrease correspondingly; when the current operation is neither rewarded nor punished (ε⃗_ij = 0) the probabilities of all operations remain unchanged; when the current operation is punished (ε⃗_ij < 0) its implementation probability tends to decrease and the implementation probabilities of the other operations increase correspondingly. The initial probability of each operation is 1/3; after about 5000 learning steps the machine pigeon essentially pecks only the red button and no longer the yellow and blue buttons, and it can be seen from FIG. 5 that the probability p0 of the machine pigeon pecking the red button tends to 1 while the probability p1 of pecking the yellow button and the probability p2 of pecking the blue button tend to 0.
TABLE 1 State transitions of the machine pigeon

state \ operation:   α0 (peck red)   α1 (peck yellow)   α2 (peck blue)
s0 (not hungry):     s0              s1                 s1
s1 (half hungry):    s0              s2                 s2
s2 (hungry):         s1              s2                 s2
During the experiment, the operating entropy at each instant is calculated according to the defined operating entropy formula

\psi(t) = -\sum_{i=0}^{n_S} p(s_i) \sum_{k=1}^{n_\Omega} p(\alpha_k \mid s_i) \log_2 p(\alpha_k \mid s_i);

the operating entropy ψ(t) of the AOC becomes smaller and smaller with time and tends to its minimum as t → ∞ (see FIG. 6), which shows that the operating entropy ψ(t) of the AOC is convergent. The AOC is a self-organizing system based on Skinner's theory of operant conditioning and has self-learning and self-adaptation capabilities. The self-organizing process of the system is a process of absorbing information, absorbing negative entropy and eliminating uncertainty. Having demonstrated the convergence of the AOC operating entropy ψ(t), the self-organizing property of the AOC is also elucidated.
The specific implementation steps of the experiment are as follows:
(1) Initialization: set t = 0, give the initial state s(0) of the AOC at random, set the learning rate a = 0.01, set the initial operation probabilities p_ik(0) = 1/3 (i = 0, 1, 2; k = 0, 1, 2), and set the stopping time T_f = 5000;
(2) Operation selection: according to the "condition-operation" rule set Γ = {r_ik(p) | p ∈ P; i ∈ {0, 1, 2}; k ∈ {0, 1, 2}}, where the random "condition-operation" rule r_ik(p): s_i → α_k(p) means that under the condition that the AOC is in state s_i ∈ S the operation α_k ∈ Ω is carried out with probability p = p_ik = p(α_k | s_i), an operation α(t) ∈ Ω is selected at random given the AOC state s(t) ∈ S;
(3) Operation execution: at time t the AOC is in state s(t) ∈ S; the operation α(t) ∈ Ω selected in the previous step is carried out, and the current state makes a transition according to δ: S(t) × Ω(t) → S(t+1), specifically:
δ(s0×α0)=s0 δ(s0×α1)=s1 δ(s0×α2)=s1
δ(s1×α0)=s0 δ(s1×α1)=s2 δ(s1×α2)=s2
δ(s2×α0)=s1 δ(s2×α1)=s2 δ(s2×α2)=s2
(4) State observation: according to the state transition function δ: S(t) × Ω(t) → S(t+1) of the AOC, the state transition process may be known or unknown, but the result of the state transition is fully observable, i.e. there exists j ∈ {0, 1, 2} such that s(t+1) = s_j;
(5) Operant conditioning: when the operation is carried out at time t, not only does the state of the AOC make a transition, but the probability of carrying out each operation at the next instant also changes. According to the operant conditioning learning law η, the implementation probability p ∈ P of the operation rule r_ik(p) ∈ Γ is adjusted: if at time t s(t) = s_i and α(t) = α_k, the operation probabilities at time t+1 are updated as p_ik(t+1) = p_ik(t) + Δ and p_iu(t+1) = p_iu(t) − Δξ (u ≠ k), where Δ = a·f(ε⃗_ij)·(1 − p_ik(t)) and 0 ≤ p_ik + Δ ≤ 1; ε⃗_ij = ε(s_j) − ε(s_i) is the increment of the orientation value; a is the learning rate; ξ = p_iu(t) / Σ_{v≠k} p_iv(t);
(6) Operating entropy computation: according to the defined operating entropy formula

\psi(t) = -\sum_{i=0}^{n_S} p(s_i) \sum_{k=1}^{n_\Omega} p(\alpha_k \mid s_i) \log_2 p(\alpha_k \mid s_i),

the operating entropy at time t is calculated, where p(s_i) is the value at time t of the probability that state s_i ∈ S occurs, and p(α_k | s_i) is the value at time t of the probability that operation α_k ∈ Ω is carried out under the condition that the AOC is in state s_i ∈ S;
(7) Recursion: if t + 1 ≤ T_f, set t = t + 1 and repeat steps (2)-(7);
(8) When t + 1 > T_f, stop.
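Example two can likewise be expressed as data for the run_aoc sketch. The orientation increments are taken directly from the transition orientations listed above; the state and operation encodings are illustrative.

# Example two (machine pigeon), a hypothetical configuration for run_aoc.
# States: 0 = not hungry (s0), 1 = half hungry (s1), 2 = hungry (s2).
# Operations: 0 = peck red (alpha0), 1 = peck yellow (alpha1), 2 = peck blue (alpha2).
states = [0, 1, 2]
n_ops = 3
delta = {(0, 0): 0, (0, 1): 1, (0, 2): 1,
         (1, 0): 0, (1, 1): 2, (1, 2): 2,
         (2, 0): 1, (2, 1): 2, (2, 2): 2}
eps_inc = {(0, 0): 0.0, (0, 1): 0.0,
           (1, 0): 0.5, (1, 2): -0.5,
           (2, 1): 1.0, (2, 2): -1.0}
p, entropy = run_aoc(states, n_ops, delta, eps_inc, a=0.01, t_f=5000)
print([round(x, 2) for x in p[2]])   # pecking the red button dominates in the hungry state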
Example three: balance control of a two-wheeled self-balancing robot is realized through an autonomous operant conditioning reflex automaton. The two-wheeled upright robot can move freely to the left and right on flat ground; when the deflection angle exceeds ±12° the robot loses its balance. The state set of the AOC designed for this purpose is based on the robot deflection angle and comprises 6 states: θ = 0°, 0° < θ < 12°, θ = 12°, −12° < θ < 0°, θ = −12° and |θ| > 12°, denoted s0, s1, s2, s3, s4 and s5 respectively; thus the state set is S = {s0, s1, s2, s3, s4, s5}. Its operation set Ω = {α0, α1, α2} comprises not moving (α0), moving to the right (α1) and moving to the left (α2). The state transition rules are as follows:
δ(s0×α0)=s0 δ(s0×α1)=s3 δ(s0×α2)=s1
δ(s1×α0)=s2 δ(s1×α1)=s0 δ(s1×α2)=s2
δ(s2×α0)=s5 δ(s2×α1)=s1 δ(s2×α2)=s5
δ(s3×α0)=s4 δ(s3×α1)=s4 δ(s3×α2)=s0
δ(s4×α0)=s5 δ(s4×α1)=s5 δ(s4×α2)=s3
These are shown in Table 2. Its orientation function is ε: S → E = {ε_i | i = 0, ±0.5, ±1}, ε_i = ε(s_i) ∈ E being the orientation value of state s_i ∈ S, and Δ = a × ε⃗_ij × (1 − p_ik) is defined, where p_ik denotes the probability that the robot carries out operation α_k in state s_i. The orientation of the transitions is: s0→s0: zero orientation (ε⃗_00 = 0); s0→s3: zero orientation (ε⃗_03 = 0); s0→s1: zero orientation (ε⃗_01 = 0); s1→s0: positive orientation (ε⃗_10 = 1.0); s1→s2: negative orientation (ε⃗_12 = −0.5); s2→s1: positive orientation (ε⃗_21 = 1.0); s2→s5: negative orientation (ε⃗_25 = −1.0); s3→s4: negative orientation (ε⃗_34 = −0.5); s3→s0: positive orientation (ε⃗_30 = 1.0); s4→s5: negative orientation (ε⃗_45 = −1.0); s4→s3: positive orientation (ε⃗_43 = 1.0). The probabilities are continuously updated according to the operant conditioning learning law η. The initial probability of each operation is 1/3; after about 1500 learning steps the robot selects the good operation in each state with probability close to 1 and keeps its balance, and in each of the first five states it generally selects the operation that drives θ towards 0°, as can be seen from FIGS. 7-11. During the experiment, the operating entropy at each instant is calculated according to the defined operating entropy formula

\psi(t) = -\sum_{i=0}^{n_S} p(s_i) \sum_{k=1}^{n_\Omega} p(\alpha_k \mid s_i) \log_2 p(\alpha_k \mid s_i);

the operating entropy ψ(t) of the AOC becomes smaller and smaller with time and tends to its minimum as t → ∞ (see FIG. 12), which shows that the operating entropy ψ(t) of the AOC is convergent. The AOC is a self-organizing system based on Skinner's theory of operant conditioning and has self-learning and self-adaptation capabilities. The self-organizing process of the system is a process of absorbing information, absorbing negative entropy and eliminating uncertainty. Having demonstrated the convergence of the AOC operating entropy ψ(t), the self-organizing property of the AOC is also elucidated.
TABLE 2 State transitions and orientation mechanism of the two-wheeled self-balancing robot (next state, with orientation increment ε⃗ in parentheses)

state \ operation:      α0 (stay)     α1 (move right)   α2 (move left)
s0 (θ = 0°):            s0 (0)        s3 (0)            s1 (0)
s1 (0° < θ < 12°):      s2 (−0.5)     s0 (1.0)          s2 (−0.5)
s2 (θ = 12°):           s5 (−1.0)     s1 (1.0)          s5 (−1.0)
s3 (−12° < θ < 0°):     s4 (−0.5)     s4 (−0.5)         s0 (1.0)
s4 (θ = −12°):          s5 (−1.0)     s5 (−1.0)         s3 (1.0)
The specific implementation steps of the experiment are as follows:
(1) Initialization: set t = 0, give the initial state s(0) of the AOC at random, set the learning rate a = 0.01, set the initial operation probabilities p_ik(0) = 1/3 (i = 0, 1, …, 4; k = 0, 1, 2), and set the stopping time T_f = 1500;
(2) Operation selection: according to the "condition-operation" rule set Γ = {r_ik(p) | p ∈ P; i ∈ {0, 1, 2, 3, 4}; k ∈ {0, 1, 2}}, where the random "condition-operation" rule r_ik(p): s_i → α_k(p) means that under the condition that the AOC is in state s_i ∈ S the operation α_k ∈ Ω is carried out with probability p = p_ik = p(α_k | s_i), an operation α(t) ∈ Ω is selected at random given the AOC state s(t) ∈ S;
(3) Operation execution: at time t the AOC is in state s(t) ∈ S; the operation α(t) ∈ Ω selected in the previous step is carried out, and the current state makes a transition according to δ: S(t) × Ω(t) → S(t+1), specifically:
δ(s0×α0)=s0 δ(s0×α1)=s3 δ(s0×α2)=s1
δ(s1×α0)=s2 δ(s1×α1)=s0 δ(s1×α2)=s2
δ(s2×α0)=s5 δ(s2×α1)=s1 δ(s2×α2)=s5
δ(s3×α0)=s4 δ(s3×α1)=s4 δ(s3×α2)=s0
δ(s4×α0)=s5 δ(s4×α1)=s5 δ(s4×α2)=s3
a transfer occurs;
(4) State observation: according to the state transition function δ: S(t) × Ω(t) → S(t+1) of the AOC, the state transition process may be known or unknown, but the result of the state transition is fully observable, i.e. there exists j ∈ {0, 1, 2, 3, 4, 5} such that s(t+1) = s_j;
(5) Operant conditioning: when the operation is carried out at time t, not only does the state of the AOC make a transition, but the probability of carrying out each operation at the next instant also changes. According to the operant conditioning learning law η, the implementation probability p ∈ P of the operation rule r_ik(p) ∈ Γ is adjusted: if at time t s(t) = s_i and α(t) = α_k, the operation probabilities at time t+1 are updated as p_ik(t+1) = p_ik(t) + Δ and p_iu(t+1) = p_iu(t) − Δξ (u ≠ k), where Δ = a·f(ε⃗_ij)·(1 − p_ik(t)) and 0 ≤ p_ik + Δ ≤ 1; ε⃗_ij = ε(s_j) − ε(s_i) is the increment of the orientation value; a is the learning rate; ξ = p_iu(t) / Σ_{v≠k} p_iv(t). Here the optimal operation differs from state to state, so the probabilities of the different operations are maintained for each state separately, 15 probabilities in total;
(6) Operating entropy computation: according to the defined operating entropy formula

\psi(t) = -\sum_{i=0}^{n_S} p(s_i) \sum_{k=1}^{n_\Omega} p(\alpha_k \mid s_i) \log_2 p(\alpha_k \mid s_i),

the operating entropy at time t is calculated, where p(s_i) is the value at time t of the probability that state s_i ∈ S occurs, and p(α_k | s_i) is the value at time t of the probability that operation α_k ∈ Ω is carried out under the condition that the AOC is in state s_i ∈ S;
(7) Recursion: if t + 1 ≤ T_f, set t = t + 1 and repeat steps (2)-(7);
(8) When t + 1 > T_f, stop.
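Example three might be configured for the run_aoc sketch as follows. State 5 (|θ| > 12°, balance lost) has no outgoing transitions in Table 2, so the sketch's behaviour of restarting from a random modelled state after reaching it is an assumption about how the experiment is restarted, not something stated in the patent.

# Example three (two-wheeled self-balancing robot), a hypothetical configuration for run_aoc.
# States: 0..4 are the deflection-angle states s0..s4; 5 = |theta| > 12 deg (balance lost, s5).
# Operations: 0 = stay (alpha0), 1 = move right (alpha1), 2 = move left (alpha2).
states = [0, 1, 2, 3, 4]
n_ops = 3
delta = {(0, 0): 0, (0, 1): 3, (0, 2): 1,
         (1, 0): 2, (1, 1): 0, (1, 2): 2,
         (2, 0): 5, (2, 1): 1, (2, 2): 5,
         (3, 0): 4, (3, 1): 4, (3, 2): 0,
         (4, 0): 5, (4, 1): 5, (4, 2): 3}
eps_inc = {(0, 0): 0.0, (0, 3): 0.0, (0, 1): 0.0,
           (1, 0): 1.0, (1, 2): -0.5,
           (2, 1): 1.0, (2, 5): -1.0,
           (3, 4): -0.5, (3, 0): 1.0,
           (4, 5): -1.0, (4, 3): 1.0}
p, entropy = run_aoc(states, n_ops, delta, eps_inc, a=0.01, t_f=1500)
print([round(x, 2) for x in p[1]])   # in state s1 the robot learns to move right (toward theta = 0)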

Claims (2)

1. An autonomous operant conditioning reflex automaton, AOC for short, which is a nine-tuple:
$AOC = \langle t, \Omega, S, \Gamma, \delta, \varepsilon, \eta, \psi, s_0 \rangle$
wherein
(1) Discrete time of the AOC: t ∈ {0, 1, 2, …, n_t}, where 0 is the starting time of the AOC;
(2) Operation symbol set of the AOC: Ω = {α_k | k = 1, 2, …, n_Ω}, where α_k is the k-th operation symbol of the AOC;
(3) State set of the AOC: S = {s_i | i = 0, 1, 2, …, n_S}, where s_i is the i-th state of the AOC;
(4) Operation rule set of the AOC: Γ = {r_ik(p) | p ∈ P; i ∈ {0, 1, 2, …, n_S}; k ∈ {0, 1, 2, …, n_Ω}}; the random "condition-operation" rule r_ik(p): s_i → α_k(p) means that the AOC, under the condition that it is in state s_i ∈ S, implements operation α_k ∈ Ω with probability p ∈ P, where p = p_ik = p(α_k|s_i) is the probability that the AOC performs operation α_k under the condition that it is in state s_i; P denotes the set of the p_ik;
(5) State transfer function of the AOC: δ: S(t) × Ω(t) → S(t+1); the state s(t+1) ∈ S of the AOC at time t+1 is determined by the state s(t) ∈ S at time t and the operation α(t) ∈ Ω at time t, regardless of the states and operations before time t; the state transition process determined by δ may be known or unknown, but the result of the state transition can be observed;
(6) Orientation function of the AOC: ε: S → E = {ε_i | i = 0, 1, 2, …, n_S}, where ε_i = ε(s_i) ∈ E is the orientation value of state s_i ∈ S;
(7) Operant conditioning learning law of the AOC: it simulates the biological operant conditioning reflex mechanism and adjusts the implementation probability of the operating rule r_ik(p) ∈ Γ. Assuming that the state at time t is s(t) = s_i, the operation α(t) = α_k ∈ Ω is carried out, and s(t+1) = s_j is observed at time t+1, then the operation probability at time t+1 is updated according to the learning-law formula (given in the original only as an equation image); wherein p_ik(t) is the probability that the AOC performs operation α_k ∈ Ω under the condition that its state is s_i ∈ S at time t, and p_ik(t+1) is that probability at time t+1; 0 ≤ p_ik + Δ ≤ 1; $\vec{\varepsilon}_{ij} = \varepsilon(s_j) - \varepsilon(s_i)$ is the increment of the orientation value; the function appearing in the update (also shown only as an image) is monotonically increasing and satisfies the displayed condition if and only if x = 0; a is the learning rate; $\xi = p_{iu}(t) / \sum_{v \neq k} p_{iv}(t)$, where u denotes any index between 0 and n_Ω not equal to k and v runs over all indices between 0 and n_Ω not equal to k, so that the denominator is the sum of the probabilities that the AOC performs the operations α_v ∈ Ω (v ≠ k) under the condition s_i ∈ S at time t; p_iu(t) is the probability that the AOC performs operation α_u ∈ Ω under the condition s_i ∈ S at time t, and p_iu(t+1) is that probability at time t+1;
(8) Operating entropy of the AOC: ψ: P × E → R+, where R+ is the set of positive real numbers; the operating entropy ψ(t) of the AOC at time t is the weighted sum, over the states s_i, of the operating entropies under each state:

$\psi(t) = \psi(\Omega(t)|S) = \sum_{i=0}^{n_S} p_i \psi_i(t) = \sum_{i=0}^{n_S} p(s_i) \psi_i(\Omega(t)|s_i)$

It is determined by the operation probability set and the orientation value set under the condition that the state at time t is s(t) = s_i; ψ_i(t) is the operating entropy of the AOC under the condition that its state is s_i:

$\psi_i(t) = \psi_i(\Omega(t)|s_i) = -\sum_{k=1}^{n_\Omega} p_{ik} \log_2 p_{ik} = -\sum_{k=1}^{n_\Omega} p(\alpha_k|s_i) \log_2 p(\alpha_k|s_i)$

Knowing the operating entropy of each state and summing them with these weights gives the operating entropy of the AOC at time t:

$\psi(t) = -\sum_{i=0}^{n_S} p_i \sum_{k=1}^{n_\Omega} p_{ik} \log_2 p_{ik} = -\sum_{i=0}^{n_S} p(s_i) \sum_{k=1}^{n_\Omega} p(\alpha_k|s_i) \log_2 p(\alpha_k|s_i)$

wherein p(s_i) is the probability that AOC state s_i ∈ S occurs at time t, and p(α_k|s_i) is the probability that the AOC performs operation α_k ∈ Ω under the condition s_i ∈ S at time t;
(9) Initial state of the AOC: s_0 = s(0) ∈ S.
2. The autonomous operant conditioning reflex automaton AOC according to claim 1, characterized in that it runs recursively according to the following procedure:
(1) Initialization: set t = 0, give the initial state s(0) of the AOC at random, give the learning rate a, and set the initial operation probabilities p_ik(0) = 1/n_Ω (i = 0, 1, 2, …, n_S; k = 1, 2, …, n_Ω); give the halting time T_f;
(2) Select an operation: in the "condition-operation" rule set Γ, the rule r_ik(p): s_i → α_k(p) means that the AOC, under the condition that its state is s_i ∈ S, implements operation α_k ∈ Ω with probability p ∈ P, where p = p_ik = p(α_k|s_i) is the probability that the AOC performs operation α_k when its state is s_i; with the AOC state being s(t) ∈ S, an operation α(t) ∈ Ω is selected at random according to these probabilities;
(3) Implement the operation: at time t the AOC is in state s(t) ∈ S; the operation α(t) ∈ Ω selected in the previous step is performed, and the current state is transferred according to δ(s(t), α(t)) = δ(s_i, α_k);
(4) Observe the state: according to the state transfer function of the AOC, δ: S(t) × Ω(t) → S(t+1), the result of the state transition is fully observable, i.e. there exists j ∈ {0, 1, 2, …, n_S} such that s(t+1) = s_j;
(5) Operant conditioning reflection: when the operation is performed at time t, not only is the state of the AOC transferred, but the probability of performing each operation at the next time also changes; according to the operant conditioning learning law of claim 1, the implementation probability p ∈ P of the operating rule r_ik(p) ∈ Γ is adjusted; with s(t) = s_i and α(t) = α_k at time t, the operation probability at time t+1 is updated according to the learning-law formula; wherein 0 ≤ p_ik + Δ ≤ 1; $\vec{\varepsilon}_{ij} = \varepsilon(s_j) - \varepsilon(s_i)$ is the increment of the orientation value; a is the learning rate; $\xi = p_{iu}(t) / \sum_{v \neq k} p_{iv}(t)$;
(6) Calculate the operation entropy: according to the defined operation-entropy formula $\psi(t) = -\sum_{i=0}^{n_S} p_i \sum_{k=1}^{n_\Omega} p_{ik} \log_2 p_{ik} = -\sum_{i=0}^{n_S} p(s_i) \sum_{k=1}^{n_\Omega} p(\alpha_k|s_i) \log_2 p(\alpha_k|s_i)$, calculate the operation entropy at time t, where p(s_i) is the probability that AOC state s_i ∈ S occurs at time t and p(α_k|s_i) is the probability that the AOC performs operation α_k ∈ Ω under the condition s_i ∈ S at time t (a numerical illustration of this formula is given after the claims);
(7) Recursive transition: if t+1 ≤ T_f, then set t = t+1 and repeat steps (2) to (7);
(8) Halt: when t+1 > T_f, the machine stops.
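
The operation-entropy formula used in (8) of claim 1 and step (6) of claim 2 can be checked numerically with the small sketch below; the state probabilities p(s_i) and conditional operation probabilities p(α_k|s_i) are illustrative values only, not data from the patent.

import numpy as np

# Numerical check of psi(t) = -sum_i p(s_i) sum_k p(a_k|s_i) log2 p(a_k|s_i)
# for 5 states and 3 operations; all probability values are made up.
p_state = np.array([0.3, 0.2, 0.2, 0.2, 0.1])          # p(s_i) at time t, sums to 1
p_op = np.array([[0.8, 0.1, 0.1],                      # p(a_k|s_i), one row per state, rows sum to 1
                 [0.1, 0.8, 0.1],
                 [0.1, 0.1, 0.8],
                 [1/3, 1/3, 1/3],
                 [0.6, 0.2, 0.2]])

psi_i = -(p_op * np.log2(p_op)).sum(axis=1)            # per-state operating entropy psi_i(t)
psi = float((p_state * psi_i).sum())                   # weighted sum psi(t)
print(psi_i.round(4), round(psi, 4))

A concentrated row such as [0.8, 0.1, 0.1] contributes a small psi_i, while the uniform row [1/3, 1/3, 1/3] contributes the maximum log2(3), so a decreasing psi(t) over time indicates that the AOC is committing to one preferred operation in each state.
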
CNA2009100892633A 2009-07-15 2009-07-15 Autonomous operant conditioning reflex automat and the application in realizing intelligent behavior Pending CN101599137A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2009100892633A CN101599137A (en) 2009-07-15 2009-07-15 Autonomous operant conditioning reflex automat and the application in realizing intelligent behavior

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNA2009100892633A CN101599137A (en) 2009-07-15 2009-07-15 Autonomous operant conditioning reflex automat and the application in realizing intelligent behavior

Publications (1)

Publication Number Publication Date
CN101599137A true CN101599137A (en) 2009-12-09

Family

ID=41420574

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2009100892633A Pending CN101599137A (en) 2009-07-15 2009-07-15 Autonomous operant conditioning reflex automat and the application in realizing intelligent behavior

Country Status (1)

Country Link
CN (1) CN101599137A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103792846A (en) * 2014-02-18 2014-05-14 北京工业大学 Robot obstacle avoidance guiding method based on Skinner operating condition reflection principle
CN103792846B (en) * 2014-02-18 2016-05-18 北京工业大学 Based on the robot obstacle-avoiding air navigation aid of Skinner operant conditioning reflex principle
CN105094124A (en) * 2014-05-21 2015-11-25 防灾科技学院 Method and model for performing independent path exploration based on operant conditioning
CN104614988B (en) * 2014-12-22 2017-04-19 北京工业大学 Cognitive and learning method of cognitive moving system with inner engine
CN104614988A (en) * 2014-12-22 2015-05-13 北京工业大学 Cognitive and learning method of cognitive moving system with inner engine
CN104570738A (en) * 2014-12-30 2015-04-29 北京工业大学 Robot track tracing method based on Skinner operant conditioning automata
CN105205533A (en) * 2015-09-29 2015-12-30 华北理工大学 Development automatic machine with brain cognition mechanism and learning method of development automatic machine
CN105205533B (en) * 2015-09-29 2018-01-05 华北理工大学 Development automatic machine and its learning method with brain Mechanism of Cognition
WO2017114130A1 (en) * 2015-12-31 2017-07-06 深圳光启合众科技有限公司 Method and device for obtaining state of robot
CN106926236A (en) * 2015-12-31 2017-07-07 深圳光启合众科技有限公司 The method and apparatus for obtaining the state of robot
CN106926236B (en) * 2015-12-31 2020-06-30 深圳光启合众科技有限公司 Method and device for acquiring state of robot
CN108846477A (en) * 2018-06-28 2018-11-20 上海浦东发展银行股份有限公司信用卡中心 A kind of wisdom brain decision system and decision-making technique based on reflex arc
CN108846477B (en) * 2018-06-28 2022-06-21 上海浦东发展银行股份有限公司信用卡中心 Intelligent brain decision system and decision method based on reflection arcs
CN109212975A (en) * 2018-11-13 2019-01-15 北方工业大学 A kind of perception action cognitive learning method with developmental mechanism
CN111464707A (en) * 2020-03-30 2020-07-28 中国建设银行股份有限公司 Outbound call processing method, device and system

Similar Documents

Publication Publication Date Title
CN101599137A (en) Autonomous operant conditioning reflex automat and the application in realizing intelligent behavior
Lyu et al. SDRL: interpretable and data-efficient deep reinforcement learning leveraging symbolic planning
US9613310B2 (en) Neural network learning and collaboration apparatus and methods
Mishra et al. Prediction and control with temporal segment models
US11762679B2 (en) Information processing device, information processing method, and non-transitory computer-readable storage medium
JP2020204803A (en) Learning method and program
EP3587045A1 (en) Method and device for the computer-aided determination of control parameters for favourable handling of a technical system
CN116560239B (en) Multi-agent reinforcement learning method, device and medium
Agrawal The task specification problem
CN114063446A (en) Method for controlling a robot device and robot device controller
CN101673354A (en) Operant conditioning reflex automatic machine and application thereof in control of biomimetic autonomous learning
Sacks et al. Learning to optimize in model predictive control
Santucci et al. Intrinsic motivation mechanisms for competence acquisition
Hussein et al. Towards Trust-Aware Human-Automation Interaction: An Overview of the Potential of Computational Trust Models.
Seurin et al. Don't do what doesn't matter: Intrinsic motivation with action usefulness
Luo et al. RLIF: Interactive Imitation Learning as Reinforcement Learning
EP3793785B1 (en) Method and device for the computer-aided determination of control parameters for favourable handling of a technical system
Paudel Learning for robot decision making under distribution shift: A survey
Stahlhut et al. Interaction is more beneficial in complex reinforcement learning problems than in simple ones
Laezza Robot Learning for Manipulation of Deformable Linear Objects
Hong et al. Adversarial exploration strategy for self-supervised imitation learning
Varkey et al. Learning robotic grasp using visual-tactile model
Rahim et al. Genetically evolved action selection mechanism in a behavior-based system for target tracking
US20210397143A1 (en) Autonomous self-learning system
US20210390377A1 (en) Autonomous self-learning system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20091209