CN106598058A - Intrinsically motivated extreme learning machine autonomous development system and operating method thereof - Google Patents


Info

Publication number
CN106598058A
CN106598058A (application CN201611182422.0A)
Authority
CN
China
Prior art keywords
function
action
internal motivation
learning
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611182422.0A
Other languages
Chinese (zh)
Inventor
史涛
任红格
尹瑞
李福进
刘伟民
张春磊
宫海洋
杜建
王玮
赵传松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China University of Science and Technology
Original Assignee
North China University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China University of Science and Technology
Priority to CN201611182422.0A
Publication of CN106598058A
Legal status: Pending


Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D 1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D 1/08 Control of attitude, i.e. control of roll, pitch, or yaw
    • G05D 1/0891 Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for land vehicles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Automation & Control Theory (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Remote Sensing (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Manipulator (AREA)

Abstract

The invention belongs to the technical field of intelligent robots, and specifically relates to an intrinsically motivated extreme learning machine autonomous development system and an operating method thereof. The autonomous development system comprises an internal state set, an action set, a state transition function, an intrinsic motivation orientation function, a reward signal, a reinforcement learning update iterative formula, an evaluation function, and an action selection probability. According to the invention, an intrinsic motivation signal is used to simulate the orienting cognitive mechanism of human interest in things, so that a robot completes relevant tasks voluntarily, solving the problem of poor robot self-learning. Furthermore, an extreme learning machine network is used to carry out training and learning and to store knowledge and experience, so that after a failed trial the robot can keep exploring with the stored knowledge and experience instead of learning from scratch. In this way the robot's learning speed is increased, and the low single-step learning efficiency of reinforcement learning is overcome.

Description

Intrinsically motivated extreme learning machine autonomous development system and operating method thereof
Technical field
The invention belongs to the technical field of intelligent robots, and in particular relates to an intrinsically motivated extreme learning machine autonomous development system and an operating method thereof.
Background technology
With the continuous development of intelligent technology in today's society, robotics plays an extremely important role in people's production and daily life. Robots can not only replace humans in completing relatively heavy tasks, but also improve work efficiency to a certain extent while saving a large amount of human resources.
Intrinsic motivation is an extremely important concept in developmental psychology and a vital mechanism in open-ended human cognitive development. It drives an agent to explore and manipulate its environment, cultivating its curiosity and its engagement in novel activities of interest. This motivation is influenced by many factors such as survival, curiosity, and orientation; intrinsic motivation is therefore regarded as a key mechanism in psychological development, sensorimotor learning, and cognitive development.
In 2006, Zhang Tao et al. combined the Q-learning algorithm with a BP neural network to realize model-free learning control of an inverted pendulum with non-discretized states, improving the learning speed. In 2010, Ren Hongge et al. adopted a recurrent neural network learning algorithm based on Skinner's operant conditioning theory as the learning mechanism of a robot to complete the control of a two-wheeled self-balancing robot, demonstrating the robustness of the algorithm. In 2013, Oudeyer et al., addressing the problem of autonomous exploration in biological systems and drawing on the idea of intrinsic motivation (IM), proposed a learning machine based on system state-transition error, realizing active exploratory learning of unknown environments by robots based on an intrinsic motivation model. In 2013, the incremental self-organizing network input/output mode proposed by Shen et al. avoided problems such as the curse of dimensionality caused by the look-up-table approach of traditional Q-learning, strengthening the agent's ability to accumulate experience. In 2014, Hu Qixiang et al., inspired by psychological intrinsic motivation, proposed an online autonomous learning method for mobile robots in unknown environments driven by intrinsic motivation, improving algorithm convergence, reducing system error, and significantly raising the level of intelligence.
Chinese invention patent CN201110255530.7 discloses a neural-network-based initialization method for robot reinforcement learning. The method can effectively improve the learning efficiency of the initial stage and accelerate the convergence rate; by initializing the Q-values, prior knowledge can be incorporated into the learning system and the initial stage of robot learning optimized, providing the robot with a better learning foundation and overcoming deficiencies in existing robot reinforcement learning research.
Chinese invention patent CN201510358313.9 discloses an intrinsically motivated autonomous cognitive system and control method for a motion-balancing robot. Aimed mainly at the autonomous cognition problem of robot motion balance, the patent establishes an intrinsically motivated autonomous cognitive system for a motion-balancing robot, providing methods and solutions for deeply understanding human intelligent learning behavior and for constructing more autonomous cognitive robots.
Chinese invention patent CN201510442275.5 discloses a scanned certificate image recognition method based on an extreme learning machine. The patent provides a fast processing method with strong generalization ability for certificate similarity retrieval, significantly improving the classification accuracy of certificate image retrieval.
At present, the poor initiative in the motion balance control of two-wheeled self-balancing robots and the low single-step learning efficiency of conventional reinforcement learning remain unsolved; an intrinsically motivated extreme learning machine autonomous development system and its control method are therefore urgently needed.
Content of the invention
The purpose of the present invention is precisely to overcome the shortcomings of existing two-wheeled self-balancing robots in the motion balance control problem. Provided is an intrinsically motivated extreme learning machine autonomous development system that takes reinforcement Q-learning as its framework, uses the intrinsic motivation signal as the intrinsic reward driving the robot's learning, and uses an extreme learning machine network as the storage space for accumulated knowledge. By imitating the learning model of the human brain, the robot, like a person, gradually forms and improves its balance control skill through self-learning and self-organization, thereby solving the poor initiative in the motion balance control of two-wheeled self-balancing robots and the low single-step learning efficiency of conventional reinforcement learning. To solve the above technical problems, the present invention is realized by the following technical solutions:
An intrinsically motivated extreme learning machine autonomous development system comprises an internal state set, an action set, a state transition function, an intrinsic motivation orientation function, a reward signal, a reinforcement learning update iterative formula, an evaluation function, and an action selection probability. The cognitive model of the system takes the reinforcement Q-learning algorithm combined with an extreme learning machine network as its framework, is driven by the intrinsic motivation mechanism, and is designed as an eight-tuple model expressed as follows:

(S, A, T, O, r, Q, V, P)

The meaning of each element is as follows (a data-structure sketch follows the list):

(1) S: the internal state set, S = {s_1, s_2, ..., s_n}, where s_i denotes the i-th state and n is the number of all states that may occur.
(2) A: the action set, A = {a_1, a_2, ..., a_m}, where a_j denotes the j-th action and m is the number of all actions.
(3) T: the state transition function, under which the external state s(t+1) at time t+1 is always jointly determined by the external state s(t) at time t and the external agent action a(t), preferably determined by the system model together with the environment.
(4) O: the orientation function of the intrinsic motivation, which is determined by the system's evaluation function.
(5) r: the reward signal, denoting the reward value obtained after the system, at time t and in state s(t), performs action a(t) and the system state transfers to s(t+1).
(6) Q: the reinforcement learning update iterative formula, denoting the value function after the system, whose external state at time t is s(t), performs the external agent action a(t) and transfers to state s(t+1).
(7) V: the evaluation function.
(8) P: the action selection probability, denoting the probability of selecting action a in state s.
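For illustration only, the eight-tuple can be carried as a plain data structure. The sketch below is a minimal, hypothetical Python rendering; the field names and type choices are assumptions of this sketch, not notation fixed by the patent.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class DevelopmentModel:
    """Eight-tuple (S, A, T, O, r, Q, V, P) of the autonomous development system."""
    S: Sequence[int]                      # internal state set {s_1, ..., s_n}
    A: Sequence[int]                      # action set {a_1, ..., a_m}
    T: Callable[[int, int], int]          # state transition s(t+1) = T(s(t), a(t))
    O: Callable[[int], float]             # intrinsic motivation orientation function
    r: Callable[[int, int, int], float]   # reward for (s(t), a(t), s(t+1))
    Q: dict[tuple[int, int], float]       # value table updated by the iterative formula
    V: Callable[[int], float]             # evaluation function
    P: Callable[[int, int], float]        # action selection probability P(a | s)
```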
Compared with the prior art, the above technical solution brings the following technical effects:
1) The present invention adopts an intrinsic motivation reward mechanism: through the cognitive mechanism by which intrinsic motivation orients towards interests, the evaluation value is determined from the judgement of the task, driving the agent to complete the assigned task spontaneously. Compared with traditional cognitive development methods, this effectively improves the agent's learning initiative.
2) The present invention adopts an extreme learning machine network in place of the traditional self-organizing neural network to complete training, learning, and the storage of knowledge and experience, so that after a failed trial the robot need not learn from scratch but continues exploring with the stored knowledge and experience. Compared with traditional neural network storage methods, this greatly increases the speed at which the two-wheeled robot adapts to its environment, enabling it to learn self-balancing in an unknown environment within a short time.
3) The present invention combines intrinsic motivation with the classical Q-learning (external motivation) algorithm; through the interaction between the internal evaluation value and the external adaptive value, the agent's learning initiative is enhanced while the robot's learning efficiency is also effectively improved.
The preferred solutions of the present invention are as follows:
In the state transition function, the state transition equation determined by the state transition unit is:

s(t+1) = T(s(t), a(t))

That is, the external state s(t+1) at time t+1 is always determined by the external state s(t) at time t and the external agent action a(t) at time t, and is independent of the external states and external agent actions before time t.
The orientation function of the intrinsic motivation is parameterized by σ: the smaller the value of σ, the smaller the corresponding action reward and the weaker the orientation of the intrinsic motivation in the system; conversely, the larger σ, the larger the corresponding action reward and the stronger the orientation of the intrinsic motivation in the system.

The intrinsic motivation describes, in psychological terms, degrees of novelty, curiosity, and boredom; it is the driving force that leads humans and other living beings to explore and learn. In the learning process this driving force is attributed to the orientation mechanism function introduced into the agent by intrinsic motivation, simulating the working mechanism of the human brain so as to endow the agent with autonomous learning ability and improve learning efficiency.
In the classical Q-learning algorithm, the reward signal is used to iteratively compute the action value function of the Markov decision process according to the temporal difference (TD) algorithm; the iterative formula is:

Q(s(t), a(t)) ← Q(s(t), a(t)) + α·[r(t) + γ·max_a' Q(s(t+1), a') − Q(s(t), a(t))]

where α is the learning factor.

The reward signal is then updated to the intrinsically motivated reward signal:

r(t) = w1·r_i(t) + w2·r_e(t)

where r_i is the intrinsic motivation function, r_e is the external motivation function, and w1 and w2 denote the weights of r_i and r_e respectively; the iterative formula becomes:

Q(s(t), a(t)) ← Q(s(t), a(t)) + α·[w1·r_i(t) + w2·r_e(t) + γ·max_a' Q(s(t+1), a') − Q(s(t), a(t))]

The reinforcement Q-learning algorithm is modeled by the Markov decision process and iterates towards the optimal solution:

Q*(s(t), a(t)) = r(t) + γ·max_a' Q*(s(t+1), a')

where γ is the discount factor, α is the learning factor, and 0 < α < 1, 0 < γ < 1.

In this algorithm, the intrinsic reward function is replaced by intrinsic motivation. Suppose the agent runs in an external environment with actual output y(t) and desired output y_d(t); the difference between the two, e(t) = y_d(t) − y(t), is defined as the internal reward function of the system. When the system selects action a(t) at time t, the state transfers from s(t) to s(t+1). If |e(t+1)| < |e(t)|, i.e. the error produced is smaller than that of the previous moment, the action chosen at time t brings the system closer to the desired target state than the previously chosen action did, and the orientation of the system at time t is large; conversely, if |e(t+1)| > |e(t)|, the orientation of the system at time t is small.
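Read as code, the formulas above amount to the following sketch (assumed names: alpha, gamma, w1, w2 stand for α, γ, w1, w2; the dictionary-based Q table is an illustrative choice, not the patent's implementation):

```python
def intrinsic_reward(y_desired: float, y_actual: float) -> float:
    """Internal reward e(t) = y_d(t) - y(t): the system's prediction error."""
    return y_desired - y_actual

def q_update(Q, s, a, s_next, r_int, r_ext, actions,
             alpha=0.1, gamma=0.9, w1=0.5, w2=0.5):
    """One intrinsically motivated Q iteration:
    Q(s,a) <- Q(s,a) + alpha*[w1*r_i + w2*r_e + gamma*max_a' Q(s',a') - Q(s,a)]."""
    r = w1 * r_int + w2 * r_ext                       # combined reward r(t)
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    td_target = r + gamma * best_next
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (td_target - Q.get((s, a), 0.0))
    return Q
```

A shrinking |e(t)| from one step to the next then signals, per the orientation mechanism above, that the chosen action moved the system towards the desired target state.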
Further, the flow of the reinforcement Q-learning algorithm is as follows (sketched in code after the list):
Step 1: Randomly initialize Q.
Step 2: Observe the current state s(t) and select an action decision a(t) to execute.
Step 3: Obtain the next state s(t+1) while obtaining the reward signal r(t).
Step 4: Update the Q-value according to the iterative formula above.
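A minimal tabular loop implementing Steps 1 to 4 might look as follows; the `env.reset()`/`env.step()` interface and the ε-greedy selection are assumptions of this sketch, not details given in the patent:

```python
import random

def q_learning(env, states, actions, episodes=100,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    # Step 1: randomly initialize Q
    Q = {(s, a): random.uniform(-0.01, 0.01) for s in states for a in actions}
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Step 2: observe the current state s(t) and select an action a(t)
            if random.random() < epsilon:
                a = random.choice(list(actions))
            else:
                a = max(actions, key=lambda a2: Q[(s, a2)])
            # Step 3: obtain the next state s(t+1) and the reward signal r(t)
            s_next, r, done = env.step(a)
            # Step 4: update the Q-value by the iterative formula
            best_next = max(Q[(s_next, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```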
Under the drive of the intrinsic motivation mechanism, the internal action evaluation function V(t) gradually approaches 0, enabling the two-wheeled robot to maintain the most suitable body balance. The evaluation function is defined as follows:

V(t) = Σ_{k=0..∞} γ^k · r(t+k)

where γ is the discount factor.

The evaluation function at time t can then be represented by the evaluation function at time t+1:

V(t) = r(t) + γ·V(t+1)

Taking V as the quantity tracked by an observer, the TD error formula is established as follows:

δ(t) = r(t) + γ·V(t+1) − V(t)
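As a concrete reading of the recursion, the TD error used to train the evaluation function can be computed as below (a minimal sketch; the names are assumptions):

```python
def td_error(r_t: float, v_t: float, v_next: float, gamma: float = 0.9) -> float:
    """delta(t) = r(t) + gamma*V(t+1) - V(t); it tends towards 0 as V converges."""
    return r_t + gamma * v_next - v_t

# Example: the training target for V(t) is r(t) + gamma*V(t+1)
delta = td_error(r_t=1.0, v_t=9.2, v_next=9.0)
```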
Taking reinforcement learning as the framework, combining the idea of intrinsic motivation driving the agent, using the intrinsic motivation orientation function as the reward mechanism of the method, and training with the extreme learning machine network, the robot's autonomous learning ability is strengthened while its learning speed is also greatly increased.
The action selection probability is a random probability.
The present invention also provides an operating method of the intrinsically motivated extreme learning machine autonomous development system, comprising the following steps (sketched in code after the list):
Step 1: Initialize the current system state; choose the discount factor γ and the learning factor α; choose suitable weights w1 and w2 for the intrinsic motivation function and the external motivation function.
Step 2: Calculate the Q-values of all actions that may be taken in reinforcement learning.
Step 3: Select a suitable action according to the Q-values.
Step 4: Execute the current action and make a decision for the next stage of learning.
Step 5: Calculate the intrinsic motivation function while calculating the optimal action decision according to reinforcement learning, a*(t) = argmax_a Q(s(t), a).
Step 6: Update the Q-value according to the iterative formula above.
Step 7: Update the current time t ← t+1 and the current state s(t) ← s(t+1).
Step 8: Repeat Step 2 to Step 7 until training is finished.
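Steps 1 to 8 form the outer loop sketched below; the environment interface returning (s_next, r_ext, y, y_d) and the helper `q_update` from the earlier sketch are assumptions used for illustration:

```python
def run_development_system(env, states, actions, steps=15000,
                           alpha=0.1, gamma=0.9, w1=0.5, w2=0.5):
    Q = {(s, a): 0.0 for s in states for a in actions}   # Step 1: initialize
    s = env.reset()
    for t in range(steps):
        q_row = {a: Q[(s, a)] for a in actions}          # Step 2: Q-values of candidate actions
        a = max(q_row, key=q_row.get)                    # Step 3: a*(t) = argmax_a Q(s(t), a)
        s_next, r_ext, y, y_d = env.step(a)              # Step 4: execute and observe
        r_int = y_d - y                                  # Step 5: intrinsic motivation e(t)
        Q = q_update(Q, s, a, s_next, r_int, r_ext,      # Step 6: update the Q-value
                     actions, alpha, gamma, w1, w2)
        s = s_next                                       # Step 7: t <- t+1, s(t) <- s(t+1)
    return Q                                             # Step 8: repeated until training ends
```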
Description of the drawings
Fig. 1 is the system training flow chart of the present invention.
Fig. 2 is the structure diagram of the two-wheeled robot system.
Fig. 3 is the simplified structure diagram of the two-wheeled robot.
Fig. 4 is the network model of the extreme learning machine.
Fig. 5 is the structural framework of the intrinsic-motivation-based autonomous development system.
Fig. 6 is the structural framework of the intrinsically motivated extreme learning machine autonomous development system.
Fig. 7 shows the state-quantity response curves.
Fig. 8 shows the evaluation function and error curves.
Fig. 9 shows the robot force curve.
Fig. 10 is the evaluation function simulation comparison.
Fig. 11 is the system error simulation comparison.
Specific embodiment
The present invention is further elaborated below with reference to Fig. 1 to Fig. 11 and the given embodiments, but the embodiments do not constitute any limitation to the present invention.
The structural framework of the intrinsically motivated extreme learning machine autonomous development system of the present invention is shown in Fig. 6, and training and learning are carried out according to the flow shown in Fig. 1.
Fig. 2 gives the structural model of the two-wheeled robot system, which is in essence a simulated inverted pendulum model. Fig. 3 gives the simplified structural model of the two-wheeled robot and its parameters; the meanings of the specific parameters are given in the table below.
Fig. 4 shows the network structure of the extreme learning machine, a simple single-hidden-layer feedforward neural network. Fig. 5 shows the structural framework of the intrinsic-motivation-based autonomous development model, whose training and storage network is a traditional self-organizing neural network.
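An extreme learning machine trains such a single-hidden-layer network by fixing random input weights and solving the output weights in closed form with a pseudo-inverse. The sketch below is a generic NumPy rendering of that standard procedure, not code from the patent:

```python
import numpy as np

class ELM:
    """Single-hidden-layer feedforward network trained in the ELM fashion."""
    def __init__(self, n_inputs: int, n_hidden: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(n_inputs, n_hidden))  # random input weights, never trained
        self.b = rng.normal(size=n_hidden)              # random hidden biases
        self.beta = None                                # output weights, solved in one shot

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        return np.tanh(X @ self.W + self.b)             # hidden-layer activations H

    def fit(self, X: np.ndarray, T: np.ndarray) -> "ELM":
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ T               # least-squares beta = pinv(H) @ T
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self._hidden(X) @ self.beta
```

With the four balance states of the experiment below as inputs, an action network of this kind would be fitted as, e.g., ELM(4, 50).fit(states, controls); the hidden-layer size 50 is an arbitrary illustrative choice.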
Before accomplishing its various tasks, the two-wheeled robot must first ensure that it can stand upright, i.e. keep its balance. To verify the effectiveness and autonomy of the intrinsically motivated extreme learning machine autonomous development model proposed by the invention, the mathematical model of the two-wheeled robot is taken as the research object, and the robot's self-balancing skill in an unknown environment is studied.
1. experimental design
The control target of the experiment is for the two-wheeled self-balancing robot to achieve autonomous balance in an unknown environment under the intrinsically motivated extreme learning machine autonomous development mechanism. The action network is chosen as an extreme learning machine (ELM) network; its four state inputs are the robot's inclination angle, body angular velocity, displacement, and body velocity, and its output is the control quantity of the two-wheeled robot system. The evaluation network is chosen with the four robot states and the system control quantity output by the action ELM network as its inputs, and the evaluation function V and the change of the body force as its outputs. The evaluation function V is expressed in terms of the reward return value r: while the robot's inclination angle stays within the allowed range, the system obtains a reward value, and otherwise a penalty. An appropriate discount factor, learning factor, and sampling time were chosen. In the training process, an experiment in which the number of exploratory trials exceeded 200 before the robot could stay upright for 15,000 steps was regarded as failed, terminated, and restarted; if the robot keeps from falling for 15,000 steps within a single trial, it has autonomously completed balance control in the unknown environment. After each failed exploration, the initial state and the weight thresholds were reset to random numbers within a given range and training was carried out again. Summarizing the results of 60 experiments shows that after an average of 65 failures the robot achieves self-balancing control, exhibiting strong self-learning and adaptive ability. The simulation results are shown in Fig. 7 (the trial protocol is sketched in code below).
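The trial protocol just described can be sketched as follows; the inclination threshold and the reward/penalty values stand in for numbers the source does not give and are purely hypothetical:

```python
def balance_reward(theta: float, theta_max: float = 0.2) -> float:
    """Assumed reward shaping: a small reward while the inclination angle stays
    within the allowed range, a penalty once it leaves it (values hypothetical)."""
    return 0.05 if abs(theta) <= theta_max else -1.0

def run_experiment(train_one_trial, max_trials: int = 200,
                   success_steps: int = 15000) -> int:
    """Repeat trials until one keeps balance for 15,000 steps; exceeding
    max_trials explorations counts as a failed experiment."""
    for trial in range(1, max_trials + 1):
        if train_one_trial() >= success_steps:  # steps survived in this trial
            return trial                        # autonomous balance achieved
    return -1                                   # experiment failed; reset and retry
```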
2. Analysis of results
In order to verify the effectiveness and convergence of the present invention, simulation experiments on the motion balance performance of the two-wheeled robot system were carried out and the experimental results analyzed.
Fig. 7 shows the curves over time of the four state quantities of the two-wheeled self-balancing robot after training by the intrinsically motivated extreme learning machine autonomous development system proposed by the present invention. As can be seen from the figure, the robot completes self-balancing control after 3 s, i.e. 300 steps, embodying the invention's fast self-learning and adaptive ability.
Fig. 8 shows the evaluation function curve and the error curve of the system state quantities over 3000 steps of robot training.
Fig. 9 shows the robot force change curve.
Fig. 10 compares the simulated evaluation functions of the intrinsically motivated extreme learning machine autonomous development method (IM-Q-ELM) proposed by the invention and a traditional reinforcement learning algorithm (RL).
Fig. 11 compares the simulated errors of the above two algorithms. It can be seen that the robustness of the proposed IM-Q-ELM method is much stronger than that of the latter. The above experiments thus show that intrinsically motivated reinforcement learning, trained through the extreme learning machine network, achieves better performance and faster learning speed, and likewise demonstrates the strong adaptive and control ability of the two-wheeled self-balancing robot.
The present invention proposes an intrinsically motivated extreme learning machine autonomous development system and applies it to the balance control of a two-wheeled self-balancing robot. The reward mechanism of traditional reinforcement learning is replaced by the intrinsic motivation mechanism, which determines the evaluation value; the extreme learning machine network replaces the traditional self-organizing neural network to complete training, learning, and the storage of knowledge and experience, greatly increasing the speed at which the two-wheeled robot adapts to its environment and enabling it to learn self-balancing in an unknown environment within a short time.
Those skilled in the art can implement the present invention in various ways without departing from its essence and spirit. The above is only a preferred feasible embodiment of the present invention and does not thereby limit its scope of rights; all equivalent structural changes made using the contents of the description and drawings of the present invention are included within the scope of rights of the present invention.

Claims (6)

1. An intrinsically motivated extreme learning machine autonomous development system, the system comprising an internal state set, an action set, a state transition function, an intrinsic motivation orientation function, a reward signal, a reinforcement learning update iterative formula, an evaluation function, and an action selection probability; the cognitive model of the system takes the reinforcement Q-learning algorithm combined with an extreme learning machine network as its framework, is driven by the intrinsic motivation mechanism, and is designed as an eight-tuple model expressed as follows:

(S, A, T, O, r, Q, V, P)

wherein the meaning of each element is as follows:
(1) S: the internal state set, S = {s_1, s_2, ..., s_n}, where s_i denotes the i-th state and n is the number of all states that may occur;
(2) A: the action set, A = {a_1, a_2, ..., a_m}, where a_j denotes the j-th action and m is the number of all actions;
(3) T: the state transition function, under which the external state s(t+1) at time t+1 is always jointly determined by the external state s(t) at time t and the external agent action a(t), preferably determined by the system model together with the environment;
(4) O: the orientation function of the intrinsic motivation, which is determined by the system's evaluation function;
(5) r: the reward signal, denoting the reward value obtained after the system, at time t and in state s(t), performs action a(t) and the system state transfers to s(t+1);
(6) Q: the reinforcement learning update iterative formula, denoting the value function after the system, whose external state at time t is s(t), performs the external agent action a(t) and transfers to state s(t+1);
(7) V: the evaluation function;
(8) P: the action selection probability, denoting the probability of selecting action a in state s.
2. The intrinsically motivated extreme learning machine autonomous development system according to claim 1, characterized in that in said state transition function, the state transition equation determined by the state transition unit is:

s(t+1) = T(s(t), a(t))

that is, the external state s(t+1) at time t+1 is always determined by the external state s(t) at time t and the external agent action a(t) at time t, and is independent of the external states and external agent actions before time t.
3. The intrinsically motivated extreme learning machine autonomous development system according to claim 1, characterized in that the orientation function of said intrinsic motivation is parameterized by σ: the smaller the value of σ, the smaller the corresponding action reward and the weaker the orientation of the intrinsic motivation in the system; conversely, the larger σ, the larger the corresponding action reward and the stronger the orientation of the intrinsic motivation in the system.
4. The intrinsically motivated extreme learning machine autonomous development system according to claim 1, characterized in that said reward signal is used in the classical Q-learning algorithm to iteratively compute the action value function of the Markov decision process according to the temporal difference (TD) algorithm, the iterative formula being:

Q(s(t), a(t)) ← Q(s(t), a(t)) + α·[r(t) + γ·max_a' Q(s(t+1), a') − Q(s(t), a(t))]

where α is the learning factor;

the reward signal is updated to the intrinsically motivated reward signal:

r(t) = w1·r_i(t) + w2·r_e(t)

where r_i is the intrinsic motivation function, r_e is the external motivation function, and w1 and w2 denote the weights of r_i and r_e respectively, so that the iterative formula becomes:

Q(s(t), a(t)) ← Q(s(t), a(t)) + α·[w1·r_i(t) + w2·r_e(t) + γ·max_a' Q(s(t+1), a') − Q(s(t), a(t))]
The intrinsically motivated extreme learning machine autonomous development system according to claim 1, characterized in that the reinforcement Q-learning algorithm is modeled by the Markov decision process and iterates towards the optimal solution:

Q*(s(t), a(t)) = r(t) + γ·max_a' Q*(s(t+1), a')

where γ is the discount factor, α is the learning factor, and 0 < α < 1, 0 < γ < 1.
The intrinsically motivated extreme learning machine autonomous development system according to claim 5, characterized in that the flow of the reinforcement Q-learning algorithm is as follows:
Step 1: Randomly initialize Q;
Step 2: Observe the current state s(t) and select an action decision a(t) to execute;
Step 3: Obtain the next state s(t+1) while obtaining the reward signal r(t);
Step 4: Update the Q-value according to the iterative formula above.
5. The intrinsically motivated extreme learning machine autonomous development system according to claim 1, characterized in that said evaluation function is defined as follows:

V(t) = Σ_{k=0..∞} γ^k · r(t+k)

where γ is the discount factor;

the evaluation function at time t is:

V(t) = r(t) + γ·V(t+1)

and, taking V as the quantity tracked by an observer, the TD error formula is established as follows:

δ(t) = r(t) + γ·V(t+1) − V(t)
The intrinsically motivated extreme learning machine autonomous development system according to claim 1, characterized in that said action selection probability is a random probability.
6. An operating method of the intrinsically motivated extreme learning machine autonomous development system according to any one of claims 1~8, comprising the following steps:
Step 1: Initialize the current system state; choose the discount factor γ and the learning factor α; choose suitable weights w1 and w2 for the intrinsic motivation function and the external motivation function;
Step 2: Calculate the Q-values of all actions that may be taken in reinforcement learning;
Step 3: Select a suitable action according to the Q-values;
Step 4: Execute the current action and make a decision for the next stage of learning;
Step 5: Calculate the intrinsic motivation function while calculating the optimal action decision according to reinforcement learning, a*(t) = argmax_a Q(s(t), a);
Step 6: Update the Q-value according to the iterative formula above;
Step 7: Update the current time t ← t+1 and the current state s(t) ← s(t+1);
Step 8: Repeat Step 2 to Step 7 until training is finished.
CN201611182422.0A 2016-12-20 2016-12-20 Intrinsically motivated extreme learning machine autonomous development system and operating method thereof Pending CN106598058A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611182422.0A CN106598058A (en) 2016-12-20 2016-12-20 Intrinsically motivated extreme learning machine autonomous development system and operating method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611182422.0A CN106598058A (en) 2016-12-20 2016-12-20 Intrinsically motivated extreme learning machine autonomous development system and operating method thereof

Publications (1)

Publication Number Publication Date
CN106598058A true CN106598058A (en) 2017-04-26

Family

ID=58599742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611182422.0A Pending CN106598058A (en) 2016-12-20 2016-12-20 Intrinsically motivated extreme learning machine autonomous development system and operating method thereof

Country Status (1)

Country Link
CN (1) CN106598058A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220540A (en) * 2017-04-19 2017-09-29 南京邮电大学 Intrusion detection method based on intensified learning
WO2018205778A1 (en) * 2017-05-11 2018-11-15 苏州大学张家港工业技术研究院 Large-range monitoring method based on deep weighted double-q learning and monitoring robot
CN109195207A (en) * 2018-07-19 2019-01-11 浙江工业大学 A kind of energy-collecting type wireless relay network througput maximization approach based on deeply study
CN109212975A (en) * 2018-11-13 2019-01-15 北方工业大学 A kind of perception action cognitive learning method with developmental mechanism
CN109243021A (en) * 2018-08-28 2019-01-18 余利 Deeply learning type intelligent door lock system and device based on user experience analysis
CN110070185A (en) * 2019-04-09 2019-07-30 中国海洋大学 A method of feedback, which is assessed, from demonstration and the mankind interacts intensified learning
CN110244561A (en) * 2019-06-11 2019-09-17 湘潭大学 A kind of double inverted pendulum adaptive sliding-mode observer method based on interference observer
CN110658785A (en) * 2018-06-28 2020-01-07 发那科株式会社 Output device, control device, and method for outputting evaluation function value
CN110687802A (en) * 2018-07-06 2020-01-14 珠海格力电器股份有限公司 Intelligent household electrical appliance control method and intelligent household electrical appliance control device
CN114065137A (en) * 2021-12-17 2022-02-18 桂林电子科技大学 Unmanned bicycle mass load eccentricity automatic identification method based on cognitive learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104914870A (en) * 2015-07-08 2015-09-16 中南大学 Ridge-regression-extreme-learning-machine-based local path planning method for outdoor robot
CN104992059A (en) * 2015-06-24 2015-10-21 天津职业技术师范大学 Intrinsic motivation based self-cognition system for motion balance robot and control method
CN105205533A (en) * 2015-09-29 2015-12-30 华北理工大学 Development automatic machine with brain cognition mechanism and learning method of development automatic machine
US20160086087A1 (en) * 2014-09-19 2016-03-24 King Fahd University Of Petroleum And Minerals Method for fast prediction of gas composition
CN105700526A (en) * 2016-01-13 2016-06-22 华北理工大学 On-line sequence limit learning machine method possessing autonomous learning capability

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160086087A1 (en) * 2014-09-19 2016-03-24 King Fahd University Of Petroleum And Minerals Method for fast prediction of gas composition
CN104992059A (en) * 2015-06-24 2015-10-21 天津职业技术师范大学 Intrinsic motivation based self-cognition system for motion balance robot and control method
CN104914870A (en) * 2015-07-08 2015-09-16 中南大学 Ridge-regression-extreme-learning-machine-based local path planning method for outdoor robot
CN105205533A (en) * 2015-09-29 2015-12-30 华北理工大学 Development automatic machine with brain cognition mechanism and learning method of development automatic machine
CN105700526A (en) * 2016-01-13 2016-06-22 华北理工大学 On-line sequence limit learning machine method possessing autonomous learning capability

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HONGGE REN et al.: "Research on Q-ELM Algorithm in Robot Path Planning", 2016 Chinese Control and Decision Conference *
HONGGE REN et al.: "Research on Two-wheeled Self-balance Robot Based on IM-Q-ELM Algorithm", ICIC Express Letters, Part B: Applications *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220540A (en) * 2017-04-19 2017-09-29 南京邮电大学 Intrusion detection method based on intensified learning
WO2018205778A1 (en) * 2017-05-11 2018-11-15 苏州大学张家港工业技术研究院 Large-range monitoring method based on deep weighted double-q learning and monitoring robot
US11224970B2 (en) 2017-05-11 2022-01-18 Soochow University Large area surveillance method and surveillance robot based on weighted double deep Q-learning
CN110658785B (en) * 2018-06-28 2024-03-08 发那科株式会社 Output device, control device, and method for outputting evaluation function value
CN110658785A (en) * 2018-06-28 2020-01-07 发那科株式会社 Output device, control device, and method for outputting evaluation function value
CN110687802A (en) * 2018-07-06 2020-01-14 珠海格力电器股份有限公司 Intelligent household electrical appliance control method and intelligent household electrical appliance control device
CN109195207B (en) * 2018-07-19 2021-05-18 浙江工业大学 Energy-collecting wireless relay network throughput maximization method based on deep reinforcement learning
CN109195207A (en) * 2018-07-19 2019-01-11 浙江工业大学 A kind of energy-collecting type wireless relay network througput maximization approach based on deeply study
CN109243021A (en) * 2018-08-28 2019-01-18 余利 Deeply learning type intelligent door lock system and device based on user experience analysis
CN109212975A (en) * 2018-11-13 2019-01-15 北方工业大学 A kind of perception action cognitive learning method with developmental mechanism
CN110070185A (en) * 2019-04-09 2019-07-30 中国海洋大学 A method of feedback, which is assessed, from demonstration and the mankind interacts intensified learning
CN110244561A (en) * 2019-06-11 2019-09-17 湘潭大学 A kind of double inverted pendulum adaptive sliding-mode observer method based on interference observer
CN110244561B (en) * 2019-06-11 2022-11-08 湘潭大学 Secondary inverted pendulum self-adaptive sliding mode control method based on disturbance observer
CN114065137A (en) * 2021-12-17 2022-02-18 桂林电子科技大学 Unmanned bicycle mass load eccentricity automatic identification method based on cognitive learning
CN114065137B (en) * 2021-12-17 2024-03-29 桂林电子科技大学 Automatic recognition method for mass load eccentricity of unmanned bicycle based on cognitive learning

Similar Documents

Publication Publication Date Title
CN106598058A (en) Intrinsically motivated extreme learning machine autonomous development system and operating method thereof
Eysenbach et al. Diversity is all you need: Learning skills without a reward function
Doncieux et al. Beyond black-box optimization: a review of selective pressures for evolutionary robotics
Such et al. Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning
Nelson et al. Fitness functions in evolutionary robotics: A survey and analysis
CN105700526A (en) On-line sequence limit learning machine method possessing autonomous learning capability
CN105205533A (en) Development automatic machine with brain cognition mechanism and learning method of development automatic machine
Showalter et al. Neuromodulated multiobjective evolutionary neurocontrollers without speciation
Yan et al. Path Planning for Mobile Robot's Continuous Action Space Based on Deep Reinforcement Learning
Cahill Catastrophic forgetting in reinforcement-learning environments
CN116306947A (en) Multi-agent decision method based on Monte Carlo tree exploration
CN116841303A (en) Intelligent preferential high-order iterative self-learning control method for underwater robot
Patle Intelligent navigational strategies for multiple wheeled mobile robots using artificial hybrid methodologies
Showalter et al. Lamarckian inheritance in neuromodulated multiobjective evolutionary neurocontrollers
Cheng et al. An autonomous inter-task mapping learning method via artificial neural network for transfer learning
Lehman et al. Investigating biological assumptions through radical reimplementation
Tang et al. Reinforcement learning for robots path planning with rule-based shallow-trial
Menon et al. An Efficient Application of Neuroevolution for Competitive Multiagent Learning
Zhao et al. Variational Diversity Maximization for Hierarchical Skill Discovery
Kumar et al. A Novel Algorithm for Optimal Trajectory Generation Using Q Learning
Oudeyer Interactive learning gives the tempo to an intrinsically motivated robot learner
Kovalský et al. Evaluating the performance of a neuroevolution algorithm against a reinforcement learning algorithm on a self-driving car
Pagliuca Efficient Evolution of Neural Networks
Showalter Evolution of Multiobjective Neuromodulated Neurocontrollers for Multi-Robot Systems
Ni et al. A Novel Heuristic Exploration Method Based on Action Effectiveness Constraints to Relieve Loop Enhancement Effect in Reinforcement Learning with Sparse Rewards

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170426