CN107894715A - The cognitive development method of robot pose path targetpath optimization - Google Patents


Info

Publication number
CN107894715A
CN107894715A (application CN201711117326.2A)
Authority
CN
China
Prior art keywords
strio
state
robot
orientation
equation
Prior art date
Legal status
Withdrawn
Application number
CN201711117326.2A
Other languages
Chinese (zh)
Inventor
任红格
史涛
宫海洋
李福进
李军
尹瑞
徐少彬
赵传松
杜建
王玮
Current Assignee
North China University of Science and Technology
Original Assignee
North China University of Science and Technology
Priority date
Filing date
Publication date
Application filed by North China University of Science and Technology filed Critical North China University of Science and Technology
Priority to CN201711117326.2A priority Critical patent/CN107894715A/en
Publication of CN107894715A publication Critical patent/CN107894715A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/048Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators using a predictor
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/12Target-seeking control

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • General Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention provides a cognitive development algorithm (CBCLA) for robot pose path target-trajectory optimization. The algorithm is divided into eight parts: a finite internal state set, the output set of the system, a set of internal operating behaviors, a state transition equation, the internal state of the system at time t, an evaluation function, the action-selection probability output of the striatal matrix, and the dopaminergic signal. These can be represented by an eight-element tuple: CBCLA = {SC, MC, Cb_A, f, r(t), BG_strio, BG_matrix, SN_DPA}. The invention simulates the neural activity of the biological sensorimotor system, takes a learning automaton as its framework, and combines the characteristic that an intrinsic-motivation mechanism drives an organism to learn autonomously. Applying this cognitive development algorithm to mobile-robot path planning lets the robot develop through autonomous learning in an unknown environment, gradually master motion-balance control skills, and achieve real-time tracking of the target.

Description

Cognitive development method for robot posture path target track optimization
The technical field is as follows:
The invention belongs to the technical field of intelligent robots, and relates to a robot pose path target-trajectory optimization simulation method.
Technical background:
Human and animal intelligence is embodied in learning and memory: the various skills of humans and animals form slowly and develop gradually as the nervous system learns and organizes itself. Studying and simulating human and animal neural activity and self-regulation mechanisms, and endowing them to intelligent robots, is an important research subject of artificial intelligence and control science. In 1996, J. Weng first proposed the concept of autonomous mental development for robots, arguing that an agent should develop its mental ability by interacting with unknown environments through sensors and effectors under the control of an intrinsic developmental program, on the basis of a simulated human brain. Brooks et al. emphasized that interactive learning between robots, teachers, and the environment gradually develops a robot's intelligence, and, combining research theories from neuroscience, proposed computational models that simulate areas of the human and animal cerebral cortex such as the prefrontal lobes, hypothalamus, and hippocampus to handle complex problems in complex environments; this work also involves the sensorimotor system. Initial cognitive development begins with the growth of the coordination mechanisms of the sensorimotor system, which are in turn continually coordinated and refined as intrinsic motivation develops.
The neuroscience literature shows that during human and animal learning, the cerebral cortex, basal ganglia, and cerebellum work in parallel, each in its own specific way. In motor control, the cerebellum and basal ganglia sit on either side of the route along which motor signals travel between the cerebral cortex and the spinal cord, and participate in the initiation and control of every behavioral action.
Many scholars conducted related research as early as the 1980s. In 2000, Moren et al. proposed a system combining emotion and behavior selection based on Mowrer's two-process learning theory. In 2007, Wang et al. put forward an intelligent model based on artificial emotion, built on the human brain's emotion circuit, and applied it to an inverted-pendulum system, which successfully learned balance-control skills. In 2010, Barto et al., taking reinforcement learning as the theoretical framework from an evolutionary perspective, adopted active learning driven by intrinsic motivation and greatly improved the learning efficiency of the agent. In 2013, Oudeyer et al., starting from an organism's self-aware exploration and combining it with the idea of intrinsic motivation, proposed a learning machine based on system state-transition error and realized active exploratory learning of unknown environments. In 2014, Shen Xianzhang et al. proposed research on and application of hierarchical reinforcement learning with a potential-action model, studied the potential-action model of obstacles, and applied it, combined with hierarchical reinforcement learning, to robot path planning in obstacle environments.
The biology literature concludes that the sensorimotor systems of humans and animals contain mechanisms whose motivation is linked to intrinsic targets, called intrinsic-motivation mechanisms. Such a mechanism is a learning mechanism that takes the sensorimotor system as its theoretical basis and is guided by the organism's curiosity and orientation. Aiming at the problem that traditional machine learning cannot learn continuously, and inspired by this research, the invention combines the sensorimotor system with an intrinsic-motivation mechanism, simulates the cognitive behavior of an organism, takes a learning automaton as the basic framework, and proposes a cognitive development algorithm based on the cerebellum-basal ganglia-cerebral cortex loop. Using this algorithm, a two-wheeled robot gradually masters motion-balance control skills through interaction with an unknown environment and achieves real-time cognitive development, thus fulfilling the goal of continuous machine learning.
Among related patents, the invention patent with application publication No. CN106403948A discloses a three-dimensional flight-path planning method for a power-transmission-line inspection unmanned aerial vehicle.
The invention patent with application publication No. CN106557844A discloses a welding-robot path planning method based on a clustering-guided multi-objective particle-swarm optimization technique, which establishes a D-H parameter model of the welding robot, obtains an obstacle-avoidance path through a geometric obstacle-avoidance strategy, and performs Cartesian-space trajectory planning on that path.
However, neither of the above patents explores in practice how to endow robots with the neural activity and self-regulation mechanisms of humans and animals.
The invention content is as follows:
Aiming at the learning of continuous behaviors by a two-wheeled robot, the invention provides a robot control method that simulates the human psychological cognitive mechanism and brain neural activity. It enables the robot to perform bionic cognitive development by imitating the thinking mode of a cognitive development algorithm (CBCLA) based on the human cerebellum-basal ganglia-cerebral cortex loop, applies this method to mobile-robot path planning research, and thus provides a cognitive development method for optimizing the robot's pose path target trajectory, which specifically comprises the following:
A cognitive development method for optimizing a robot pose path target trajectory combines with the robot the thinking of a cognitive development algorithm (CBCLA) based on the human cerebellum-basal ganglia-cerebral cortex loop. The cognitive development process of the robot is divided into eight parts: a finite internal state set, the output set of the system, a set of internal operating behaviors, a state transition equation, the internal state of the system at time t, an evaluation function, the action-selection probability output of the striatal matrix, and the dopaminergic signal. Their mutual relation can be represented by an eight-element tuple:
CBCLA = {SC, MC, Cb_A, f, r(t), BG_strio, BG_matrix, SN_DPA}
1) SC = [s_1, s_2, ..., s_j] is the finite internal state set, corresponding to the sensory cortex in the cerebral cortex; s_j denotes the jth state and j is the number of internal states;
2) MC = [y_1, y_2, ..., y_i] is the output set of the system, corresponding to the motor cortex in the cerebral cortex; y_i denotes the ith output and i is the number of outputs;
3) Cb_A = [a_1, a_2, ..., a_k] is the set of internal operating behaviors, corresponding to the cerebellar region; a_k is the kth internal action and k is the number of internal actions;
4) f: s(t) × a(t) → s(t+1) is the state transition equation; that is, the state s(t+1) at time t+1 is determined jointly by the state s(t) and the operating behavior a(t) at time t, and is generally determined by the environment or a model;
5) r(t) = r(s(t), a(t)) denotes the reward signal received after the system, in internal state s(t) at time t, takes internal operating action a(t) and transitions to s(t+1); it corresponds to the thalamic signal emitted by the thalamus;
6) BG_strio is the evaluation function; the striosome serves mainly as an evaluation mechanism that predicts the orientation of the organism's movement, and further as an evaluation mechanism for the orientation of the intrinsic-motivation mechanism;
7) BG_matrix is the action-selection probability output of the striatal matrix; the matrix compartment of the striatum mainly performs the action-selection function in the learning process of the basal ganglia;
8) SN_DPA is the dopaminergic signal; as a guiding incentive for behavior evaluation, it strengthens the behavioral representation of the unknown maximal reward formed by the incentive, leading to accurate execution of the action.
From the state transition equation f: s(t) × a(t) → s(t+1), the external state s(t+1) ∈ S at time t+1 is always determined by the external state s(t) ∈ S and the external agent action a(t) ∈ A at time t, and is independent of external states and agent actions before time t.
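As an illustrative sketch only (the patent gives no implementation), the eight-element tuple and the Markov transition above could be carried by a small container like the following; all concrete types, and the `step` helper, are assumptions, not part of the patent:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class CBCLA:
    """Container mirroring CBCLA = {SC, MC, Cb_A, f, r(t), BG_strio, BG_matrix, SN_DPA}."""
    SC: List[str]                 # finite internal state set (sensory cortex)
    MC: List[str]                 # output set of the system (motor cortex)
    Cb_A: List[str]               # internal operating-behavior set (cerebellum)
    f: Callable[[str, str], str]  # state transition equation f: s(t) x a(t) -> s(t+1)
    r: Callable[[str, str], float]  # reward signal r(s(t), a(t))
    BG_strio: Dict[Tuple[str, str], float] = field(default_factory=dict)  # evaluation function
    BG_matrix: Dict[str, float] = field(default_factory=dict)  # action-selection probabilities
    SN_DPA: float = 0.0           # dopaminergic adjustment signal

    def step(self, s: str, a: str) -> Tuple[str, float]:
        """Apply one Markov transition: s(t+1) depends only on s(t) and a(t)."""
        return self.f(s, a), self.r(s, a)
```

A toy instance with a hard-wired transition, e.g. `CBCLA(SC=["s1", "s2"], MC=["y1"], Cb_A=["a1"], f=lambda s, a: "s2", r=lambda s, a: 1.0)`, makes the Markov property concrete: `step` consults only the current state and action.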
The input signal of the cerebral cortex described above contains two parts, sensory-cortex information and motor-cortex information, which together serve as the input to the striatum; therefore:
CC={SC,MC} (1)
The invention defines the evaluation function BG_strio as the discounted sum of future reward signals:
BG_strio(t) = r(t+1) + γ·r(t+2) + γ²·r(t+3) + ... (2)
where γ ∈ [0,1] is the discount factor. Owing to the intrinsic-motivation mechanism, the evaluation function BG_strio of the system gradually approaches 0, which ensures that the system finally reaches a stable state. η is defined as the orientation core of the intrinsic-motivation mechanism; its main function is to guide the direction of autonomous cognition. The value range of the orientation core η is generally defined as [η_min, η_max], i.e., between the best and worst orientation function values. The motion orientation function within the striosome is then defined as shown in equation (3), where λ is a parameter of the orientation function. The difference between the orientation function at two adjacent moments is defined as θ(t) = η(t) − η(t−1) and judges the orientation degree of the system: if θ(t) > 0, the orientation value at time t is larger than that at time t−1; conversely, if θ(t) < 0, the orientation value at time t is smaller than that at time t−1.
The invention adopts the Boltzmann probability rule to realize both the action-selection function of the matrix and the probability-selection mechanism of the learning automaton. Using the definition in equation (4), the action-selection probability output of the striatal matrix can be expressed as equation (5):
BG_matrix(a_j) = exp(BG_strio(SC(t), a_j)/T) / Σ_k exp(BG_strio(SC(t), a_k)/T) (5)
where T is a temperature constant that expresses the randomness of action selection: the larger T is, the more random the selection; conversely, the smaller T is, the more deterministic the selection. As T gradually approaches zero, the selection probability of the action with the largest BG_strio(SC(t), a_j) approaches 1. The value of T in the system decreases gradually with time.
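A minimal sketch of the Boltzmann (softmax) selection rule described above, assuming the striosome values BG_strio(SC(t), a_j) are given as a plain list; the function names and the max-subtraction stability trick are illustrative choices, not from the patent:

```python
import math
import random

def matrix_action_probabilities(values, T):
    """Boltzmann probabilities over BG_strio(SC(t), a_j) values (equation (5)).

    Large T -> near-uniform (random) selection; small T -> near-greedy selection.
    """
    m = max(values)  # subtract the max for numerical stability; result unchanged
    exps = [math.exp((v - m) / T) for v in values]
    z = sum(exps)
    return [e / z for e in exps]

def select_action(values, T, rng=random.random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = matrix_action_probabilities(values, T)
    u, acc = rng(), 0.0
    for j, p in enumerate(probs):
        acc += p
        if u <= acc:
            return j
    return len(probs) - 1
```

With `values = [1.0, 0.0]`, a small temperature such as `T = 0.01` makes the first action nearly certain, while a very large `T` makes the two probabilities nearly equal, matching the role of T described in the text.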
The evaluation function determined by the striosome at time t+1 is:
BG_strio(t+1) = r(t+2) + γ·r(t+3) + γ²·r(t+4) + ... (6)
Combining equation (2) and equation (6) yields equation (7):
BG_strio(t) = r(t+1) + γ·BG_strio(t+1) (7)
This shows that the evaluation function BG_strio(t) at time t can be predicted from the evaluation function BG_strio(t+1) at time t+1. However, because of the error present in the initial stage of prediction, the value of BG_strio(t) represented through the evaluation value BG_strio(t+1) is not equal to the actual value. The reward messages output by the thalamus and the striosome therefore need to be processed in the substantia nigra pars compacta, which releases the dopaminergic signal SN_DPA to adjust the evaluation value; this can be expressed by equation (8):
SN_DPA = r(t+1) + γ·BG_strio(t+1) − BG_strio(t) (8)
The invention simulates the neural activity of the biological sensorimotor system, takes a learning automaton as its framework, combines the characteristic that an intrinsic-motivation mechanism drives an organism to learn autonomously, and provides a cognitive development algorithm for optimizing the target trajectory of the robot's pose path.
Description of the drawings:
FIG. 1 is a diagram of an algorithmic control structure according to the present invention;
FIG. 2 is a cognitive development robot frame;
FIG. 3 is a graph of response output for each state;
FIG. 4 is a graph of evaluation function versus error simulation;
FIG. 5 shows simulation results of an anti-interference experiment;
FIG. 6 is a comparison graph of the merit functions of the CBCLA algorithm and the classical LA algorithm;
FIG. 7 is an error contrast diagram of the CBCLA algorithm and the classical LA algorithm.
Detailed description of the preferred embodiments
Aiming at the learning of continuous behaviors by a two-wheeled robot, the invention provides a robot-based cognitive development method that simulates the human psychological cognitive mechanism and brain neural activity, built on the thinking activity of a cognitive development algorithm (CBCLA) based on the human cerebellum-basal ganglia-cerebral cortex loop, and applies it to mobile-robot path planning research. The robot gradually masters motion-balance control skills through autonomous learning and development in an unknown environment, and achieves real-time tracking of the target.
According to this idea, a cognitive development method for optimizing a robot pose path target trajectory is created. The method combines with the robot the thinking of a cognitive development algorithm (CBCLA) based on the human cerebellum-basal ganglia-cerebral cortex loop. The cognitive development process of the robot is divided into eight parts: a finite internal state set, the output set of the system, a set of internal operating behaviors, a state transition equation, the internal state of the system at time t, an evaluation function, the action-selection probability output of the striatal matrix, and the dopaminergic signal. Their mutual relation can be represented by an eight-element tuple:
CBCLA = {SC, MC, Cb_A, f, r(t), BG_strio, BG_matrix, SN_DPA}
the specific meanings of the respective parts are as follows:
(1) SC = [s_1, s_2, ..., s_j] is the finite internal state set, corresponding to the sensory cortex in the cerebral cortex; s_j denotes the jth state and j is the number of internal states.
(2) MC = [y_1, y_2, ..., y_i] is the output set of the system, corresponding to the motor cortex in the cerebral cortex; y_i denotes the ith output and i is the number of outputs.
(3) Cb_A = [a_1, a_2, ..., a_k] is the set of internal operating behaviors, corresponding to the cerebellar region; a_k is the kth internal action and k is the number of internal actions.
(4) f: s(t) × a(t) → s(t+1) is the state transition equation; that is, the state s(t+1) at time t+1 is determined jointly by the state s(t) and the operating behavior a(t) at time t, and is generally determined by the environment or a model.
(5) r(t) = r(s(t), a(t)) denotes the reward signal received after the system, in internal state s(t) at time t, takes internal operating action a(t) and transitions to s(t+1); it corresponds to the thalamic signal emitted by the thalamus.
The input signal of the cerebral cortex contains two parts, sensory-cortex information and motor-cortex information, which together serve as the input to the striatum; therefore:
CC={SC,MC} (1)
(6) BG_strio is the evaluation function; the striosome serves mainly as an evaluation mechanism that predicts the orientation of the organism's actions, and further as an evaluation mechanism for the orientation of the intrinsic-motivation mechanism. The evaluation function is defined as the discounted sum of future reward signals:
BG_strio(t) = r(t+1) + γ·r(t+2) + γ²·r(t+3) + ... (2)
where γ ∈ [0,1] is the discount factor. Owing to the intrinsic-motivation mechanism, the evaluation function BG_strio of the system gradually approaches 0, ensuring that the system finally reaches a stable state. We define η as the orientation core of the intrinsic-motivation mechanism; its main function is to guide the direction of autonomous cognition. The value range of the orientation core η is generally defined as [η_min, η_max], i.e., between the best and worst orientation function values. The motion orientation function within the striosome is then defined as shown in equation (3).
The difference between the orientation function at two adjacent moments is defined as θ(t) = η(t) − η(t−1), which determines the orientation degree of the system: if θ(t) > 0, the orientation value at time t is larger than that at time t−1; conversely, if θ(t) < 0, the orientation value at time t is smaller than that at time t−1.
(7) BG_matrix is the action-selection probability output of the striatal matrix; the matrix compartment of the striatum mainly performs the action-selection function in the learning process of the basal ganglia. One of the most important features of learning driven by an intrinsic-motivation mechanism is that the action to perform is chosen according to the magnitude of a probability. The Boltzmann probability rule is adopted to realize the action-selection function of the matrix, and thereby the probability-selection mechanism of the learning automaton. Using the definition in equation (4), the action-selection probability output of the striatal matrix is expressed as equation (5):
BG_matrix(a_j) = exp(BG_strio(SC(t), a_j)/T) / Σ_k exp(BG_strio(SC(t), a_k)/T) (5)
where T is a temperature constant expressing the randomness of action selection: the larger T is, the more random the selection; the smaller T is, the more deterministic the selection. As T gradually approaches zero, the selection probability of the action with the largest BG_strio(SC(t), a_j) approaches 1. The value of T in the system decreases gradually with time, meaning that the system accumulates ever more knowledge during learning and gradually evolves from an unstable system into a stable one.
(8) SN_DPA is the dopaminergic signal, which serves as a guiding incentive for behavior evaluation; it strengthens the behavioral representation of the unknown maximal reward formed by the incentive, leading to more accurate execution of the action. The evaluation function determined by the striosome at time t+1 is:
BG_strio(t+1) = r(t+2) + γ·r(t+3) + γ²·r(t+4) + ... (6)
Combining equation (2) and equation (6) yields equation (7):
BG_strio(t) = r(t+1) + γ·BG_strio(t+1) (7)
This shows that the evaluation function BG_strio(t) at time t can be predicted from the evaluation function BG_strio(t+1) at time t+1. However, because of the error present in the initial stage of prediction, the value of BG_strio(t) represented through the evaluation value BG_strio(t+1) is not equal to the actual value. The reward messages output by the thalamus and the striosome therefore need to be processed in the substantia nigra pars compacta, which releases the dopaminergic signal SN_DPA to adjust the evaluation value, expressed by equation (8):
SN_DPA = r(t+1) + γ·BG_strio(t+1) − BG_strio(t) (8)
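Equation (8) is, in reinforcement-learning terms, a temporal-difference error. The following sketch computes it directly; the learning rate used to fold SN_DPA back into the evaluation value is an assumption the patent does not state:

```python
def dopamine_signal(r_next, bg_strio_t, bg_strio_next, gamma=0.9):
    """Equation (8): SN_DPA = r(t+1) + gamma*BG_strio(t+1) - BG_strio(t).

    gamma = 0.9 matches the discount factor used in the experiments;
    everything else is an illustrative sketch.
    """
    return r_next + gamma * bg_strio_next - bg_strio_t

def adjust_evaluation(bg_strio_t, sn_dpa, alpha=0.1):
    """Move the striosome evaluation toward its target by the dopaminergic
    error. The learning rate alpha is an assumption, not given in the patent."""
    return bg_strio_t + alpha * sn_dpa
```

When the prediction is exact, i.e. BG_strio(t) = r(t+1) + γ·BG_strio(t+1) as in equation (7), the signal is zero and no adjustment occurs; any mismatch produces a proportional correction.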
Proof of convergence of the algorithm of the invention:
For convenience of the proof, the evaluation function BG_strio(t) output by the striosome is written as J(t), as in equation (9):
BG_strio(t) = J(t) (9)
Apply the iterative algorithm in a Markov environment. If, for any state-action pair (s, a), the absolute value |r(s, a)| of the instantaneous reward and the iterative initial value J_0(s, a) are bounded, 0 ≤ γ < 1 (n being the number of iterations), and every state-action pair (s, a) is adjusted an unlimited number of times as n approaches infinity, then J_n(s, a) eventually tends to the optimal value J*(s, a) with probability 1.
Proof: consider the absolute value of the difference between the evaluation function of any state-action pair (s, a) and its optimum:
|J_n(s, a) − J*(s, a)| = |γ·max_{a'} J_{n−1}(s', a') − γ·max_{a'} J*(s', a')| ≤ γ·max_{s'', a'} |J_{n−1}(s'', a') − J*(s'', a')| (10)
where s' and a' are the state and action after the transition and s'' is an arbitrary state of the secondary transition. Let the maximum estimation error of the evaluation function at the nth iteration be:
ΔJ_n = max_{s,a} |J_n(s, a) − J*(s, a)| (11)
Then:
ΔJ_n ≤ γ·ΔJ_{n−1} ≤ γ^n·ΔJ_0 (12)
Because |r(s, a)| and the initial values are bounded, ΔJ_0 is bounded, so for every (s, a) the error ΔJ_n approaches 0 as n approaches infinity. Hence the evaluation function of the cognitive development algorithm based on the cerebellum-basal ganglia-cerebral cortex loop converges as n → ∞, at which point the system is in an equilibrium steady state.
The method combines dynamic programming with knowledge of animal physiology, thereby realizing reward-driven online machine learning. The cognitive development algorithm is applied to mobile-robot path planning research: the robot gradually masters motion-balance control skills through autonomous learning and development in an unknown environment, and achieves real-time tracking of the target.
The invention is further illustrated with reference to the following figures and embodiments.
Fig. 1 shows the control structure of the algorithm according to the invention; control proceeds in the order shown in the figure. Fig. 2 shows the cognitive development robot framework, which corresponds to the state quantities shown in Fig. 1. The balance of the robot is controlled first, because a robot capable of self-balancing is the premise of the experiment.
To verify the effectiveness, robustness, and superiority of the proposed cognitive development algorithm based on the cerebellum-basal ganglia-cerebral cortex loop (CBCLA), a two-wheeled robot was used as the experimental subject to study how the robot finally learns motor skills through autonomous learning in an unknown environment.
The robot has four output quantities that must satisfy corresponding conditions during the experiment: the angular velocities θ_r and θ_l of the right and left wheels are both less than 3.489 rad/s, the inclination angle α of the body is less than 0.1744 rad, and the angular velocity β of the robot's pendulum is less than 3.489 rad/s. The discount factor is γ = 0.9 and the sampling time is 0.01 s. The criterion for the robot achieving self-balance is maintaining balance for 20000 steps in one trial. Learning is considered failed if the number of trials exceeds 1000 without any trial reaching 20000 balance steps. After each failed trial, the initial state and each weight are re-assigned random values within a certain range, and the next round of learning begins.
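The trial bookkeeping described above (success at 20000 balance steps, failure after 1000 trials, re-randomized restarts) might be organized as in this sketch; `policy`, `simulate_step`, and `reset` are placeholder hooks for the learner and the robot model, not part of the patent:

```python
def run_trials(policy, simulate_step, reset, max_trials=1000, success_steps=20000):
    """Run balance trials until one trial holds balance for `success_steps`.

    policy(state) -> action; simulate_step(state, action) -> (state, fell);
    reset() -> fresh randomized initial state. All three are assumed hooks.
    """
    for trial in range(1, max_trials + 1):
        state = reset()  # re-randomize initial state (and weights) after a failure
        for _step in range(success_steps):
            state, fell = simulate_step(state, policy(state))
            if fell:  # a state quantity left its admissible range
                break
        else:
            return trial, success_steps  # balanced for the full trial: success
    return None  # learning failed within max_trials trials
```

With the patent's 0.01 s sampling time, the reported 220 steps to learn balance correspond to about 2.2 s of simulated time.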
(1) Balance-control experiment: in an unknown, interference-free environment, the robot adopts the CBCLA algorithm proposed herein. Through continuous learning, after 42 exploratory trials it completed the experiment on the 43rd trial, learning the balance-control skill in about 220 steps, i.e., about 2.2 s, demonstrating the algorithm's fast autonomous-learning capability and effectiveness. Fig. 3 shows the response curves of each state quantity for the first 3000 steps of the experiment, and Fig. 4 shows the evaluation-function and error simulation curves for the first 3000 steps.
(2) Anti-interference experiment: during actual operation, the input and output signals of the system are more or less disturbed by external noise, or inaccuracy of the detection device causes a certain error in the state quantities. To simulate the actual environment, once the robot had learned balance control and maintained it for 9800 steps, a pulse signal with an amplitude of 25 was added to each input state quantity. If the robot can withstand the disturbance of the pulse signal and keep its balance, the experiment is considered successful and the CBCLA algorithm proposed herein is shown to have a certain robustness. Fig. 5 shows the output response of each state after the pulse signal is added; it can be seen that after about 200 steps (i.e., 2 s) the robot reaches the equilibrium position again.
(3) Algorithm comparison experiment: the algorithm introduces an intrinsic-motivation mechanism to drive the robot's autonomous learning, which reduces the system error and improves the algorithm's convergence rate. To demonstrate the superiority of the CBCLA algorithm, a balance-control experiment was carried out on the two-wheeled robot with both the classic learning automaton (LA) algorithm and the CBCLA algorithm, and the results were analyzed. The parameter settings of the two algorithms were identical. The evaluation function BG_strio determines whether the corresponding system can reach stability. Comparing the simulation curves of the LA and CBCLA algorithms in Fig. 6, the evaluation functions show that the CBCLA algorithm completes the learning of the balance-control skill in about 220 steps (i.e., 2.2 s), whereas the classic LA algorithm needs about 600 steps (i.e., 6 s), proving that the convergence speed of the CBCLA algorithm is superior to that of the classic LA algorithm. The error SN_DPA reflects the stability of the system; the error comparison of the two algorithms in Fig. 7 shows that the error amplitude of the CBCLA algorithm is smaller than that of the classic LA algorithm, which benefits the stability of the system.

Claims (6)

1. A cognitive development method for optimizing a robot posture path target track combines a cognitive development algorithm (CBCLA) thinking of a robot based on cerebellum-basal nucleus-cerebral cortex loop with the robot, the cognitive development process of the robot is divided into eight parts which are respectively a limited internal state set, a system output set and an internal operation behavior set, a state transition equation, a system internal state at the time t, an evaluation function, an action selection probability output of a striatum matrix and dopaminergic energy, and the mutual correlation of the eight parts can be represented by an eight-element array:
CBCLA={SC,MC,Cb A ,f,r(t),BG strio ,BG matrix ,SN DPA }
1) SC = [s_1, s_2, ..., s_j] is the finite internal state set, corresponding to the sensory cortex in the cerebral cortex; s_j denotes the j-th state and j is the number of internal states;
2) MC = [y_1, y_2, ..., y_i] is the output set of the system, corresponding to the motor cortex in the cerebral cortex; y_i denotes the i-th output and i is the number of outputs;
3) Cb_A = [a_1, a_2, ..., a_k] is the set of internal operation behaviors, corresponding to the cerebellar region; a_k denotes the k-th internal action and k is the number of internal actions;
4) s(t) × a(t) → s(t+1) is the state transition equation, i.e. the state s(t+1) at time t+1 is jointly determined by the state s(t) at time t and the operation behavior a(t), and is generally determined by the environment or a model;
5) r(t) = r(s(t), a(t)) denotes the reward signal received after the system takes the internal operation action a(t) in internal state s(t) at time t and transitions to s(t+1), corresponding to the sensory signal relayed by the thalamus;
6) BG_strio is the evaluation function; the striosome is mainly an evaluation mechanism for predicting the movement orientation of the organism, and here serves as the orientation evaluation mechanism of the intrinsic motivation mechanism;
7) BG_matrix is the action selection probability output of the striatal matrix; the matrix compartment of the striatum mainly performs the action selection function in the learning process of the basal ganglia;
8) SN_DPA is the dopaminergic signal, which serves as a guiding incentive for behavior evaluation; by reinforcing the behaviors associated with the unknown maximum reward, it leads to accurate execution of actions.
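For illustration only, the eight-element tuple of claim 1 can be sketched as a data structure. The field names mirror the symbols in the claim; the dataclass layout, the toy dynamics, and the dictionary representations of BG_strio and BG_matrix are assumptions, not part of the patented method.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class CBCLA:
    SC: List[int]                    # finite internal state set (sensory cortex)
    MC: List[float]                  # system output set (motor cortex)
    Cb_A: List[int]                  # internal operation behavior set (cerebellum)
    f: Callable[[int, int], int]     # state transition: s(t+1) = f(s(t), a(t))
    r: Callable[[int, int], float]   # reward signal r(s(t), a(t)) (thalamic relay)
    BG_strio: Dict[Tuple[int, int], float] = field(default_factory=dict)  # evaluation function
    BG_matrix: Dict[int, List[float]] = field(default_factory=dict)       # action probabilities
    SN_DPA: float = 0.0              # dopaminergic prediction-error signal

# Toy instantiation: two states, two actions, trivial parity dynamics.
model = CBCLA(
    SC=[0, 1],
    MC=[0.0, 1.0],
    Cb_A=[0, 1],
    f=lambda s, a: (s + a) % 2,
    r=lambda s, a: 1.0 if (s + a) % 2 == 0 else 0.0,
)
```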
2. The cognitive development method of robot pose-path target trajectory optimization of claim 1, wherein: in s(t) × a(t) → s(t+1), according to the state transition equation f, the state s(t+1) ∈ SC at time t+1 is always determined by the state s(t) ∈ SC and the agent action a(t) ∈ Cb_A at time t, and is independent of the states and agent actions before time t.
3. The cognitive development method of robot pose-path target trajectory optimization of claim 1, wherein: the input signal from the cerebral cortex contains two parts, sensory cortex information and motor cortex information respectively, which together serve as the input to the striatum, and thus:
CC={SC,MC} (1)
4. The cognitive development method of robot pose-path target trajectory optimization of claim 1, wherein: the evaluation function BG_strio is defined as the discounted sum of future reward signals:
BG_strio(t) = Σ_{k=0}^{∞} γ^k·r(t+k+1)   (2)
wherein γ ∈ [0,1] is a discount factor; owing to the orientation property of the intrinsic motivation mechanism, the evaluation function BG_strio of the system gradually approaches 0, thereby ensuring that the system finally reaches a stable state. η is defined as the orientation kernel of the intrinsic motivation mechanism, whose main function is to guide the direction of autonomous cognition; the value range of the orientation kernel η is generally defined as [η_min, η_max], i.e. between the worst and the best orientation function values, and the motor orientation function in the striosome is defined as shown in equation (3).
The orientation degree of the system is determined by defining the difference of the orientation function at two adjacent moments as θ(t) = η(t) − η(t−1): if θ(t) > 0, the orientation value at time t is larger than that at time t−1; if θ(t) < 0, the orientation value at time t is smaller than that at time t−1; if θ(t) = 0, the orientation value is unchanged.
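As a minimal sketch of the orientation test above (the function name and return labels are illustrative assumptions, not terms from the claims):

```python
def orientation_trend(eta_t: float, eta_prev: float) -> str:
    """Classify theta(t) = eta(t) - eta(t-1) as in claim 4."""
    theta = eta_t - eta_prev
    if theta > 0:
        return "improved"    # orientation value at t larger than at t-1
    if theta < 0:
        return "worsened"    # orientation value at t smaller than at t-1
    return "unchanged"       # orientation value unchanged
```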
5. The cognitive development method of robot pose-path target trajectory optimization of claim 1, wherein: the Boltzmann probability rule is adopted to realize the action selection function of the matrix and the probability selection mechanism of the learning automaton, with the state-action evaluation BG_strio(SC(t), a_j) first defined as in equation (4); according to that definition, the action selection probability output of the striatal matrix can be expressed by equation (5):
BG_matrix(SC(t), a_j) = exp(BG_strio(SC(t), a_j)/T) / Σ_k exp(BG_strio(SC(t), a_k)/T)   (5)
wherein T is a temperature constant representing the randomness of action selection: the larger T is, the more random the action selection; conversely, the smaller T is, the more deterministic the action selection. As T gradually approaches zero, the selection probability of the action with the largest BG_strio(SC(t), a_j) gradually approaches 1; in the system, the value of T gradually decreases over time.
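The Boltzmann selection rule of claim 5 can be sketched as follows; this is an illustrative implementation under the standard softmax formulation, and the max-subtraction for numerical stability is an implementation choice not stated in the claims.

```python
import math
import random

def boltzmann_probabilities(values, T):
    """Action probabilities under the Boltzmann rule of claim 5.

    values: BG_strio(SC(t), a_j) for each candidate action a_j
    T: temperature; large T -> near-uniform (random) choice,
       T -> 0 -> the max-valued action's probability -> 1.
    """
    m = max(values)  # subtract the max so exp() cannot overflow
    exps = [math.exp((v - m) / T) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def select_action(values, T, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(values, T)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]
```

Annealing T toward zero over time, as the claim describes, moves the policy from exploratory (near-uniform) to exploitative (near-greedy).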
6. The cognitive development method of robot pose-path target trajectory optimization of claim 4, wherein: the evaluation function determined by the striosome at time t+1 is:
BG_strio(t+1) = Σ_{k=0}^{∞} γ^k·r(t+k+2)   (6)
Combining equation (2) and equation (6) yields equation (7):
BG_strio(t) = r(t+1) + γ·BG_strio(t+1)   (7)
This shows that the evaluation function BG_strio(t) at time t can be represented by the evaluation function BG_strio(t+1) at time t+1. However, because of the prediction error present in the initial stage, the value represented by r(t+1) + γ·BG_strio(t+1) is not equal to the actual value of BG_strio(t); therefore, the reward message output by the thalamus and the evaluation output by the striosome are processed in the substantia nigra pars compacta, which releases the dopaminergic signal SN_DPA used to adjust the evaluation value, expressed by equation (8):
SN_DPA = r(t+1) + γ·BG_strio(t+1) − BG_strio(t)   (8).
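Equations (7) and (8) amount to a temporal-difference correction, which can be sketched as follows. Only equation (8) appears in the claim; the update function and its learning rate alpha are illustrative assumptions about how the dopaminergic signal might adjust the evaluation value.

```python
def dopamine_signal(r_next, bg_next, bg_now, gamma):
    """Equation (8): SN_DPA = r(t+1) + gamma * BG_strio(t+1) - BG_strio(t)."""
    return r_next + gamma * bg_next - bg_now

def update_evaluation(bg_now, sn_dpa, alpha):
    # Hypothetical correction step: move the striosome evaluation toward the
    # dopamine-adjusted target; alpha is a learning rate not given in the claims.
    return bg_now + alpha * sn_dpa
```

When the prediction is exact, equation (7) makes SN_DPA zero and the evaluation value is left unchanged.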
CN201711117326.2A 2017-11-13 2017-11-13 The cognitive development method of robot pose path targetpath optimization Withdrawn CN107894715A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711117326.2A CN107894715A (en) 2017-11-13 2017-11-13 The cognitive development method of robot pose path targetpath optimization


Publications (1)

Publication Number Publication Date
CN107894715A true CN107894715A (en) 2018-04-10

Family

ID=61805123

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711117326.2A Withdrawn CN107894715A (en) 2017-11-13 2017-11-13 The cognitive development method of robot pose path targetpath optimization

Country Status (1)

Country Link
CN (1) CN107894715A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109059931A (en) * 2018-09-05 2018-12-21 北京航空航天大学 A kind of paths planning method based on multiple agent intensified learning
CN109212975A (en) * 2018-11-13 2019-01-15 北方工业大学 A kind of perception action cognitive learning method with developmental mechanism
CN109696918A (en) * 2018-11-16 2019-04-30 华北理工大学 A kind of aircraft of tracking four-axis system implementation method and application this method based on color lump identification
CN114761183A (en) * 2019-12-03 2022-07-15 西门子股份公司 Computerized engineering tool and method for developing neurological skills for robotic systems

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105205533A (en) * 2015-09-29 2015-12-30 华北理工大学 Development automatic machine with brain cognition mechanism and learning method of development automatic machine


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ren Hongge et al.: "Establishment of a Cognitive Model of the Sensorimotor System Based on an Operant Conditioning Mechanism", Robot (《机器人》) *


Similar Documents

Publication Publication Date Title
CN107894715A (en) The cognitive development method of robot pose path targetpath optimization
Abreu et al. Learning low level skills from scratch for humanoid robot soccer using deep reinforcement learning
CN105205533B (en) Development automatic machine and its learning method with brain Mechanism of Cognition
CN105700526A (en) On-line sequence limit learning machine method possessing autonomous learning capability
Cox et al. Neuromodulation as a robot controller
Cangelosi et al. Cognitive robotics
Tian et al. Learning to drive like human beings: A method based on deep reinforcement learning
Azimirad et al. Experimental study of reinforcement learning in mobile robots through spiking architecture of thalamo-cortico-thalamic circuitry of mammalian brain
CN109227550A (en) A kind of Mechanical arm control method based on RBF neural
Tutuko et al. Route optimization of non-holonomic leader-follower control using dynamic particle swarm optimization
Yan et al. Path Planning for Mobile Robot's Continuous Action Space Based on Deep Reinforcement Learning
Jiang et al. A Neural Network Controller for Trajectory Control of Industrial Robot Manipulators.
Perez-Pena et al. An approach to motor control for spike-based neuromorphic robotics
Singh et al. Neuron-based control mechanisms for a robotic arm and hand
Sarim et al. An artificial brain mechanism to develop a learning paradigm for robot navigation
Chen et al. A crash avoidance system based upon the cockroach escape response circuit
Azimirad et al. Optimizing the parameters of spiking neural networks for mobile robot implementation
Suro et al. A hierarchical representation of behaviour supporting open ended development and progressive learning for artificial agents
Saxena et al. Advancement of industrial automation in integration with robotics
Wang et al. A computational developmental model of perceptual learning for mobile robot
CN113967909A (en) Mechanical arm intelligent control method based on direction reward
Arie et al. Reinforcement learning of a continuous motor sequence with hidden states
Chai et al. A Possible Explanation for the Generation of Habit in Navigation: a Striatal Behavioral Learning Model
Wang et al. A biologically inspired behavior control for the unexpected uncertainty with motivated developmental network
Garza-Coello et al. AWS DeepRacer: A Way to Understand and Apply the Reinforcement Learning Methods

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20180410