CN105205533A - Developmental automaton with brain cognitive mechanism and learning method thereof

Info

Publication number: CN105205533A
Application number: CN201510628233.0A
Authority: CN
Inventors: 任红格; 史涛; 向迎帆; 李福进; 李冬梅; 霍美杰; 徐少彬; 刘为民; 张春磊; 尹瑞
Original assignee: North China University of Science and Technology
Current assignee: North China University of Science and Technology
Priority date: 2015-09-29
Filing date: 2015-09-29
Publication date: 2015-12-30
Anticipated expiration: 2035-09-29
Also published as: CN105205533B

Abstract

The invention relates to a developmental automaton with a brain cognitive mechanism and a learning method thereof, and belongs to the technical field of intelligent robots. The developmental automaton comprises an internal state set, a system output set, an internal operation behavior set, a state transition equation, a reward signal, a system evaluation function, a system action selection probability and a dopamine response differential signal. With the learning automaton as the basic framework, the developmental automaton and its learning method provide a mathematical model with strong generalization ability and a wide range of application for the autonomous development process of a system. The method combines the sensorimotor system with an intrinsic motivation mechanism, improves the self-learning and self-adaptive capacity of the system, and achieves intelligence in the true sense.

Description

Developmental automaton with brain cognitive mechanism and learning method thereof

Technical Field

The invention relates to a developmental automaton with a brain cognitive mechanism and a learning method thereof, and belongs to the technical field of intelligent robots.

Background Art

Learning and memory are the essence of intelligent behavior in humans and animals, whose skills form and develop gradually as the nervous system learns and organizes itself. Studying and simulating this neural activity and its self-regulation mechanism, and endowing an intelligent robot with them, is therefore an important research subject in artificial intelligence and control science.

In 1996, J. Weng first proposed the concept of autonomous mental development for robots. He held that, on the basis of simulating the human brain, an agent should develop its mental abilities under the control of an intrinsic developmental program through the interaction of its sensors and effectors with an unknown environment. Brooks et al. emphasized that a robot's intelligence develops gradually through interactive learning with teachers and the environment and, drawing on neuroscience research, proposed a computational model that simulates areas such as the prefrontal lobe, hypothalamus and hippocampus in the cerebral cortex of humans and animals to handle complex problems in complex environments; this work also involves the sensorimotor system. Initial cognitive development begins with the development of the coordination mechanisms of the sensorimotor system, which are in turn continually coordinated and refined as intrinsic motivation develops. The neuroscience literature shows that during learning in humans and animals, the cerebral cortex, basal ganglia and cerebellum work in parallel, each in its own specific way. In human and animal movement, the cerebellum and basal ganglia lie on either side of the pathway that carries motor signals between the cerebral cortex and the spinal cord, and take part in the initiation and control of every behavioral action.

Related patents include the following. The invention patent with application No. CN200910086990.4 builds on automata theory, proposes an operant conditioning automaton model, and applies the model to the autonomous learning control of a robot. The patent with application No. CN201310656943.5 applies the principle of operant conditioning to image processing, which improves the accuracy and speed with which the system processes images. The patent with application No. 201410101272.0 addresses the low learning efficiency and poor adaptability of traditional robots; it provides a bionic intelligent control method and raises the intelligence level of the robot. The patent with application No. 201410163756.8 provides an autonomous mental development cloud robot system based on cloud computing, which reduces the burden on the robot when executing computation-intensive tasks and also allows knowledge to be shared among different robots. However, none of the above patents relates to a learning system that simulates the cognitive mechanisms of the human brain.

Disclosure of Invention

To address the above technical problems, the invention takes the biological sensorimotor system as its theoretical basis, introduces the intrinsic motivation mechanism from psychology to drive learning, and provides a developmental automaton with a brain cognitive mechanism and a learning method thereof, improving the autonomous developmental cognitive ability of a robot.

The developmental automaton with the brain cognitive mechanism comprises an internal state set, a system output set, an internal operation behavior set, a state transition equation, a reward signal, a system evaluation function, a system action selection probability and a dopamine response difference signal;

(1) The internal state set is a finite set of internal states, corresponding to the sensory cortex of the cerebral cortex. Each element of the set is one internal state, and the size of the set is the number of internal states.

(2) The system output set corresponds to the motor cortex of the cerebral cortex. Each element of the set is one output, and the size of the set is the number of outputs.

(3) The internal operation behavior set corresponds to the cerebellar region. Each element of the set is one internal operation behavior, and the size of the set is the number of internal operation behaviors.

(4) The state transition equation specifies that the state at the next moment is jointly determined by the state at the current moment and the operation behavior taken at the current moment; it is generally given by the environment or by a model.

(5) The reward signal is the reward obtained when the system, in a given internal state at a given moment, takes an internal operation behavior and transfers to a new state; it corresponds to the sensation produced by the thalamus.
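As an illustration only, the five components listed in items (1) to (5) can be held in a small container type. The sketch below is not taken from the patent: the class name, field names and the three-position toy task are all hypothetical, chosen just to show how a state set, output set, behavior set, transition function and reward function fit together.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class DevelopmentalAutomaton:
    """Hypothetical container for components (1)-(5)."""
    states: Sequence[str]                      # internal state set (sensory cortex)
    outputs: Sequence[str]                     # system output set (motor cortex)
    actions: Sequence[str]                     # internal operation behavior set (cerebellum)
    transition: Callable[[str, str], str]      # (state, action) -> next state
    reward: Callable[[str, str, str], float]   # (state, action, next state) -> reward


# Hypothetical toy task: three positions on a line, with "centre" as the goal.
ORDER = ["left", "centre", "right"]


def toy_transition(state: str, action: str) -> str:
    shift = 1 if action == "push_right" else -1
    index = ORDER.index(state) + shift
    return ORDER[max(0, min(index, len(ORDER) - 1))]  # clamp at the ends


def toy_reward(state: str, action: str, next_state: str) -> float:
    return 0.0 if next_state == "centre" else -1.0


automaton = DevelopmentalAutomaton(
    states=tuple(ORDER),
    outputs=("wheel_torque",),
    actions=("push_left", "push_right"),
    transition=toy_transition,
    reward=toy_reward,
)
```

Keeping the transition and reward as plain callables mirrors the statement that the transition is "given by the environment or by a model": either can be swapped in without touching the rest of the structure.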

(6) The signal from the cerebral cortex contains two parts, sensory cortex information and motor cortex information, both of which serve as inputs to the striatum, and thus:

（1）

The striosome is mainly an evaluation mechanism that predicts the orientation of the organism's actions, and hence the orientation of the intrinsic motivation mechanism. The system evaluation function is defined as follows:

（2）

where the equation contains a discount factor. Owing to the intrinsic motivation mechanism, the system evaluation function gradually approaches 0, which ensures that the system finally reaches a stable state. An orientation kernel is defined as the core of the intrinsic motivation mechanism; its main function is to guide the direction of autonomous cognition. The range of the orientation kernel is bounded by the function value of the best orientation and the function value of the worst orientation. The motivation orientation function within the striosome is then defined as shown in equation (3):

（3）

where the equation contains the parameters of the orientation function. The difference between the orientation function values at two adjacent moments is used to judge the degree of orientation of the system: the sign of this difference indicates whether the orientation value at one moment is larger or smaller than the orientation value at the adjacent moment.
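The comparison of orientation values at two adjacent moments can be illustrated with a few lines of code. This is only a sketch: the form of the orientation function itself is not reproduced here, so the function below takes two already-computed orientation values, and the convention "current minus previous" is an assumption.

```python
def orientation_trend(previous_value: float, current_value: float) -> str:
    """Classify the change in orientation between two adjacent moments.

    Assumption: the difference is taken as current minus previous.
    """
    difference = current_value - previous_value
    if difference > 0:
        return "larger"      # orientation value grew since the previous moment
    if difference < 0:
        return "smaller"     # orientation value shrank since the previous moment
    return "unchanged"
```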

(7) During basal ganglia learning, the matrix of the striatum mainly performs action selection. One of the most important features of learning driven by the intrinsic motivation mechanism is that the action to execute is selected according to probability. The Boltzmann probability rule is adopted to realize the behavior selection function of the matrix, thereby realizing the probabilistic selection mechanism of the learning automaton. First, define:

（4）

Using the definition in equation (4), the system action selection probability output by the striatal matrix is expressed by equation (5):

（5）

where the temperature constant represents the degree of randomness of action selection: the larger the temperature, the more random the selection, and the smaller the temperature, the less random the selection. As the temperature gradually approaches zero, the selection probability of the corresponding action gradually approaches 1. The temperature is set to decrease gradually over time, which represents the knowledge that the system gradually accumulates during learning, so that it evolves from an unstable system into a stable system;
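The Boltzmann selection rule and the decreasing temperature described in item (7) can be sketched as follows. This is a generic softmax-with-temperature example, not the patent's equations (4) and (5): the per-action "preference" values stand in for whatever quantity equation (4) defines, and the exponential decay schedule is an assumption, since the text only says the temperature decreases over time.

```python
import math
import random


def boltzmann_probabilities(preferences, temperature):
    """Return selection probabilities proportional to exp(preference / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    best = max(preferences)
    # Subtracting the maximum keeps exp() from overflowing; the ratios are unchanged.
    weights = [math.exp((p - best) / temperature) for p in preferences]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(preferences, temperature, rng=random):
    """Draw one action index according to the Boltzmann probabilities."""
    probabilities = boltzmann_probabilities(preferences, temperature)
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


def annealed_temperature(initial_temperature, decay, step):
    """Hypothetical schedule: exponential decay of the temperature with the step count."""
    return initial_temperature * decay ** step
```

With a high temperature the probabilities are close to uniform, so the automaton explores; with a temperature near zero almost all probability mass falls on the highest-preference action, which matches the behavior described in the text.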

(8) Dopamine released by the substantia nigra pars compacta serves as a guiding signal for action evaluation: it reinforces the behavior that yields the maximum future reward, so that actions are executed more accurately. The evaluation function determined by the striosome at a given moment is:

（6）

combining equation (2) and equation (6) yields equation (7):

（7）

This shows that the evaluation function at one moment can be expressed in terms of the evaluation function at the adjacent moment. However, because of the prediction error present in the initial stage, the value expressed in this way is not equal to the actual value. The reward information output by the thalamus and by the striatum therefore has to be processed in the substantia nigra pars compacta, which releases dopamine to adjust the evaluation value. The dopamine response differential signal is represented by equation (8):

（8）。
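The relationship described in item (8), where a prediction at one moment is corrected using the reward and the prediction at the adjacent moment, has the same shape as a temporal-difference error. The sketch below uses the standard temporal-difference form as an assumption; the patent's own equations (6) to (8) are not reproduced here and may differ in sign conventions or indexing. The learning rate and the dictionary-based value table are likewise hypothetical.

```python
def td_error(reward, value_next, value_current, discount):
    """Standard temporal-difference error: reward + discount * V(next) - V(current)."""
    return reward + discount * value_next - value_current


def update_value(values, state, next_state, reward, discount, learning_rate):
    """Move the stored evaluation of `state` a small step along the TD error."""
    delta = td_error(reward, values[next_state], values[state], discount)
    values[state] += learning_rate * delta
    return delta
```

For example, with `values = {"a": 0.0, "b": 1.0}`, a reward of 0.5, a discount of 0.9 and a learning rate of 0.1, the error for the transition from `"a"` to `"b"` is 0.5 + 0.9 × 1.0 − 0.0 = 1.4, and the stored value of `"a"` moves from 0.0 to 0.14.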

The learning method of the developmental automaton with a brain cognitive mechanism comprises the following steps:

(1) Initialization: set the initial value of the iterative learning step counter and the total number of learning iterations, and initialize all parameters and synaptic weights; at the start of the experiment, every initial internal operation behavior has the same execution probability;

(2) Perceive the current state;

(3) Compute the evaluation function in the striosome; because of the intrinsic motivation mechanism, compute the orientation function from the current evaluation value;

(4) According to the orientation, compute by formula the behavior selection probability of the striatal matrix and the action executed by the cerebellum;

(5) According to the state transition equation, the state transfers to the next state;

(6) Obtain the immediate reward from the thalamus and trigger the dopamine response to adjust the evaluation value;

(7) Output the motion from the motor cortex;

(8) Repeat steps (2) to (7) until the step counter reaches the set number of learning iterations, at which point learning ends.
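The eight steps above can be sketched as a single loop. The following is a minimal illustration under several assumptions, not the patent's implementation: the environment is a hypothetical five-state chain with the target in the middle, the orientation function is omitted because its form is not reproduced here, action selection uses a generic Boltzmann rule over per-action preferences, and the evaluation value is adjusted with a standard temporal-difference error.

```python
import math
import random


def learn(num_steps=2000, discount=0.9, learning_rate=0.1,
          initial_temperature=1.0, decay=0.998, seed=0):
    rng = random.Random(seed)
    states = range(5)        # hypothetical chain of 5 states; state 2 is the target
    actions = (-1, 0, 1)     # hypothetical internal operation behaviors
    # Step (1): zero values and equal preferences, so every action starts equally likely.
    value = {s: 0.0 for s in states}
    preference = {(s, a): 0.0 for s in states for a in actions}
    state = rng.choice(list(states))
    for step in range(num_steps):
        # Step (2): perceive the current state (held in `state`).
        # Steps (3)-(4): Boltzmann selection over the current preferences,
        # with a temperature that decays over time down to a small floor.
        temperature = max(initial_temperature * decay ** step, 0.05)
        best = max(preference[(state, a)] for a in actions)
        weights = [math.exp((preference[(state, a)] - best) / temperature)
                   for a in actions]
        action = rng.choices(actions, weights=weights, k=1)[0]
        # Step (5): state transition, clamped to the ends of the chain.
        next_state = min(max(state + action, 0), 4)
        # Step (6): immediate reward, then a temporal-difference adjustment
        # of both the evaluation value and the preference for the chosen action.
        reward = 0.0 if next_state == 2 else -1.0
        delta = reward + discount * value[next_state] - value[state]
        value[state] += learning_rate * delta
        preference[(state, action)] += learning_rate * delta
        # Step (7): the chosen action is the output; step (8): repeat until num_steps.
        state = next_state
    return value, preference
```

Because every reward in this toy task is 0 or −1, the stored values stay between −1/(1 − discount) and 0, which gives a simple sanity check on the update rule.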

Compared with the prior art, the developmental automaton with a brain cognitive mechanism and its learning method provided by the invention, first, take the learning automaton as the basic framework and provide a mathematical model with strong generalization ability and a wide range of application for the autonomous development process of a system; second, combine the sensorimotor system with the intrinsic motivation mechanism, improve the self-learning and self-adaptive capacity of the system, and realize intelligence in the true sense.

Drawings

FIG. 1 is a block diagram of the system of the present invention;

FIG. 2 is a flow chart of the learning process of the present invention;

FIG. 3 is a response curve of the balance control states of the two-wheeled robot according to the embodiment;

FIG. 4 is a simulation curve of the balance control evaluation function and error of the two-wheeled robot according to the embodiment;

FIG. 5 shows simulation results of the anti-interference test of the embodiment;

FIG. 6 is a graph comparing evaluation function curves of the learning method of the embodiment and the conventional learning automaton method;

FIG. 7 is a graph comparing error curves of the learning method of the embodiment with those of the conventional learning automaton method.

Detailed Description

The invention is further described with reference to the following figures and detailed description.

In the two-wheeled robot embodiment, the system structure is shown in FIG. 1, and learning proceeds according to the flow of steps shown in FIG. 2.

A nonholonomic two-wheeled self-balancing robot is an inherently unstable system. Before it can perform any motion, the robot must be able to keep its own balance, so posture balance is the primary condition for motion control of a two-wheeled robot. To verify the effectiveness, robustness and superiority of the developmental automaton with a brain cognitive mechanism provided by the invention, this embodiment takes a two-wheeled robot as the object of study and examines how the robot eventually acquires motor skills through autonomous learning in an unknown environment.

During the experiment the robot has four outputs, which must satisfy the following conditions: the angular velocities of the left and right wheels are both less than 3.489 rad/s, and the tilt angle of the body (in rad) and the pendulum angular velocity of the robot (in rad/s) stay within their respective limits. A discount factor is used, and the sampling time is 0.01 s. In each experiment, when the number of attempts exceeds 1000, or the number of balanced steps in one attempt exceeds 20000, learning is stopped and another experiment is started. If the robot still maintains balance after 20000 steps in one attempt, it is considered to have learned the balance control skill. After each failed attempt, the initial state and all weights are reset to random values within a certain range, and learning starts again.
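The experimental protocol just described (a cap of 1000 attempts, success at 20000 balanced steps, a wheel speed limit of 3.489 rad/s) can be sketched as follows. This is an illustration under assumptions: the limits are applied to absolute values, the tilt and tilt-rate limits are passed in as parameters because their values are not given in the text, and `run_attempt` is a hypothetical callback that returns how many balanced steps one attempt achieved.

```python
SAMPLE_TIME_S = 0.01        # sampling time from the embodiment
MAX_ATTEMPTS = 1000         # attempt cap per experiment
MAX_STEPS = 20000           # balanced steps that count as success
WHEEL_SPEED_LIMIT = 3.489   # rad/s, limit on each wheel's angular velocity


def within_limits(left_speed, right_speed, tilt, tilt_rate,
                  tilt_limit, tilt_rate_limit):
    """Check the four outputs; tilt limits are caller-supplied (not in the text)."""
    return (abs(left_speed) < WHEEL_SPEED_LIMIT
            and abs(right_speed) < WHEEL_SPEED_LIMIT
            and abs(tilt) < tilt_limit
            and abs(tilt_rate) < tilt_rate_limit)


def run_experiment(run_attempt):
    """Return the index of the first successful attempt, or None if all fail.

    `run_attempt(attempt_index)` is a hypothetical callback returning the
    number of balanced steps reached in that attempt.
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if run_attempt(attempt) >= MAX_STEPS:
            return attempt
    return None
```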

Experiment 1: balance control experiment

In an unknown environment without interference, the robot applies the method provided by the invention and learns continuously. After 42 failed attempts, it learns the balance control skill on the 43rd attempt in about 220 steps, i.e. about 2.2 s, which demonstrates the fast autonomous learning capability of the robot and the effectiveness of the invention. The response curves of each state quantity over the first 3000 steps, and the evaluation function and error curves from the simulation, are shown in FIG. 3 and FIG. 4.

Experiment 2: anti-interference experiment

When the system actually runs, its input and output signals are always disturbed to some extent by external noise, and inaccuracy of the detection devices introduces a certain error into the state quantities. To simulate such a real environment, once the robot has learned balance control and has kept its balance for 9800 steps, a pulse signal with an amplitude of 25 is added to each input state quantity. If the robot withstands the pulse disturbance and keeps its balance, the experiment is considered successful and the invention is shown to have a certain robustness. FIG. 5 shows the output response of each state after the pulse signal is added; after about 200 steps, i.e. about 2 s, the robot reaches the equilibrium position again.
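The disturbance injection used in this experiment can be sketched in a few lines. The step index 9800 and the amplitude 25 come from the text; treating the pulse as lasting a single sampling step is an assumption, since the pulse duration is not stated, and the function name is hypothetical.

```python
def apply_impulse(state, step, impulse_step=9800, amplitude=25.0):
    """Add the pulse to every state quantity at the chosen step (assumed one step long)."""
    if step == impulse_step:
        return [x + amplitude for x in state]
    return list(state)
```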

Experiment 3: comparison experiment of the present embodiment and the conventional learning automata

The invention introduces an intrinsic motivation mechanism to drive the robot to learn autonomously, which helps reduce the error of the system and improve the convergence rate of the algorithm. To demonstrate the superiority of the invention, a balance control experiment is carried out on the two-wheeled robot with the conventional learning automaton algorithm and with the invention, and the results are analyzed. The parameters of the two algorithms are set to the same values in the experiment. FIG. 6 and FIG. 7 compare the evaluation function curves and the error curves of the two algorithms over the first 2000 steps. FIG. 6 shows that the invention completes learning of the balance control skill in about 220 steps, i.e. 2.2 s, whereas the conventional learning automaton method needs about 600 steps, i.e. 6 s, which shows that the invention converges faster than the conventional learning automaton method. FIG. 7 shows that the error amplitude of the invention is smaller than that of the conventional learning automaton method, which is more favorable to the stability of the system.
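The step counts and times quoted in the experiments follow directly from the 0.01 s sampling time. The short sketch below only reproduces that arithmetic; the function name is hypothetical and the ratio is derived from the two reported step counts, not a figure stated in the text.

```python
SAMPLE_TIME_S = 0.01  # sampling time stated in the embodiment


def steps_to_seconds(steps, sample_time_s=SAMPLE_TIME_S):
    """Convert a number of control steps into elapsed time."""
    return steps * sample_time_s


proposed_s = steps_to_seconds(220)        # steps reported for the invention
conventional_s = steps_to_seconds(600)    # steps reported for the conventional method
ratio = conventional_s / proposed_s       # how many times longer the conventional method takes
print(f"{proposed_s:.1f} s vs {conventional_s:.1f} s, ratio {ratio:.2f}")
```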

Claims

1. A developmental automaton with a brain cognitive mechanism, characterized by comprising an internal state set, a system output set, an internal operation behavior set, a state transition equation, a reward signal, a system evaluation function, a system action selection probability and a dopamine response difference signal;

(1) the internal state set is a finite set of internal states, corresponding to the sensory cortex of the cerebral cortex; each element of the set is one internal state, and the size of the set is the number of internal states;

(2) the system output set corresponds to the motor cortex of the cerebral cortex; each element of the set is one output, and the size of the set is the number of outputs;

(3) the internal operation behavior set corresponds to the cerebellar region; each element of the set is one internal operation behavior, and the size of the set is the number of internal operation behaviors;

(4) the state transition equation specifies that the state at the next moment is jointly determined by the state at the current moment and the operation behavior taken at the current moment, and is generally given by the environment or by a model;

(5) the reward signal is the reward obtained when the system, in a given internal state at a given moment, takes an internal operation behavior and transfers to a new state; it corresponds to the sensation produced by the thalamus;

（1）

（2）

（3）

where the equation contains the parameters of the orientation function; the difference between the orientation function values at two adjacent moments is used to judge the degree of orientation of the system, the sign of this difference indicating whether the orientation value at one moment is larger or smaller than the orientation value at the adjacent moment;

（4）

（5）

where the temperature constant represents the degree of randomness of action selection: the larger the temperature, the more random the selection, and the smaller the temperature, the less random the selection; as the temperature gradually approaches zero, the selection probability of the corresponding action gradually approaches 1; the temperature is set to decrease gradually over time, which represents the knowledge that the system gradually accumulates during learning, so that it evolves from an unstable system into a stable system;

（6）