CN106598058A - Intrinsically motivated extreme learning machine autonomous development system and operating method thereof - Google Patents
- Publication number
- CN106598058A (application CN201611182422.0A)
- Authority
- CN
- China
- Prior art keywords
- function
- action
- internal motivation
- learning
- state
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
- G05D1/08—Control of attitude, i.e. control of roll, pitch, or yaw
- G05D1/0891—Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for land vehicles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
Abstract
The invention belongs to the technical field of intelligent robots, and specifically relates to an intrinsically motivated extreme learning machine autonomous development system and an operating method thereof. The autonomous development system comprises an internal state set, an action set, a state transition function, an intrinsic-motivation orientation function, a reward signal, a reinforcement-learning update formula, an evaluation function and an action-selection probability. An intrinsic-motivation signal is used to simulate the orienting cognitive mechanism behind human interest in things, so that a robot completes relevant tasks of its own accord; this addresses the robot's poor capacity for self-directed learning. Furthermore, an extreme learning machine network is used to carry out training and to store knowledge and experience, so that after a failed trial the robot can keep exploring with the stored knowledge instead of learning from scratch. This increases the robot's learning speed and addresses the low efficiency of single-step reinforcement learning.
Description
Technical field
The invention belongs to the technical field of intelligent robots, and in particular relates to an intrinsically motivated extreme learning machine autonomous development system and an operating method thereof.
Background technology
With the continuous development of intelligent technology in today's society, robotics plays an extremely important role in people's productive life: robots can replace humans in relatively heavy tasks, improve work efficiency to a certain extent, and save a great deal of human resources.
Intrinsic motivation is an extremely important concept in developmental psychology and a vital mechanism of open-ended human cognitive development. It drives an agent to explore and manipulate its environment, cultivating curiosity and engagement in interesting new activities. Such motivation is affected by many factors, including survival, curiosity and orientation; intrinsic motivation is therefore regarded as a key mechanism in psychological development, during both sensorimotor and cognitive development.
In 2006, Zhang Tao et al. combined the Q-learning algorithm with a BP neural network to realize model-free learning control of a non-discretized inverted pendulum, improving learning speed. In 2010, Ren Hongge et al. adopted a recurrent neural network learning algorithm based on Skinner's operant conditioning theory as the learning mechanism of a robot, completed the control of a two-wheeled self-balancing robot, and demonstrated the robustness of the algorithm. In 2013, Oudeyer et al., addressing the problem of autonomous mental exploration in biology and drawing on the idea of intrinsic motivation (IM), proposed a systematic state-transfer error learning machine and realized active exploratory learning of unknown environments by a robot based on an intrinsic-motivation model. In 2013, the incremental self-organizing network input/output scheme proposed by Shen et al. avoided problems such as the curse of dimensionality caused by the look-up-table representation of traditional Q-learning, strengthening the agent's ability to accumulate experience. In 2014, Hu Qixiang et al., inspired by intrinsic motivation in psychology, proposed an intrinsically motivated online autonomous learning method for mobile robots in unknown environments, improving algorithm convergence, reducing system error and significantly raising the degree of intelligence.
Chinese invention patent CN201110255530.7 discloses a neural-network-based initialization method for robot reinforcement learning. The method effectively improves learning efficiency in the initial stage and accelerates convergence; through Q-value initialization, prior knowledge can be incorporated into the learning system and the robot's early-stage learning optimized, providing the robot with a better learning foundation and overcoming shortcomings of existing research on robot reinforcement learning.
Chinese invention patent CN201510358313.9 discloses an intrinsically motivated autonomous cognitive system and control method for a motion-balancing robot. Aimed at the autonomous cognition problem of motion balancing, the patent establishes an intrinsically motivated autonomous cognitive system for a balancing robot, offering a method and approach for understanding human intelligent learning behavior in depth and for building more autonomous cognitive robots.
Chinese invention patent CN201510442275.5 discloses a scanned-certificate image recognition method based on an extreme learning machine. The patent provides a fast processing method with strong generalization ability for the similarity retrieval of certificates, significantly improving the classification accuracy of certificate image retrieval.
At present, the poor initiative in the motion-control balancing problem of two-wheeled self-balancing robots, and the low efficiency of conventional single-step reinforcement learning, remain unsolved; an intrinsically motivated extreme learning machine autonomous development system and its control method are urgently needed.
Summary of the invention
The purpose of the present invention is precisely to overcome the shortcomings of the existing motion-control balancing of two-wheeled self-balancing robots. An intrinsically motivated extreme learning machine autonomous development system is provided, which takes reinforcement Q-learning as its framework, uses the intrinsic-motivation signal as the internal reward driving the robot's learning, and uses an extreme learning machine network as the storage for accumulated knowledge. By imitating the learning model of the human brain, the robot can, like a person, learn and progressively improve its balance-control skill through self-learning and self-organization, thereby solving both the poor initiative in the balancing problem of two-wheeled self-balancing robots and the low efficiency of past single-step reinforcement learning. To solve the above technical problems, the present invention adopts the following technical scheme:
An intrinsically motivated extreme learning machine autonomous development system is provided, the system comprising an internal state set, an action set, a state transition function, an intrinsic-motivation orientation function, a reward signal, a reinforcement-learning update formula, an evaluation function and an action-selection probability. The cognitive model of the system takes reinforcement Q-learning combined with an extreme learning machine network as its framework, is driven by an intrinsic-motivation mechanism, and is designed as an eight-tuple model:

⟨S, A, T, O, R, Q, V, P⟩

The meaning of each element is as follows:

(1) S is the internal state set, S = {s1, s2, …, sn}, where si denotes the i-th state and n is the number of all states that can arise.
(2) A is the action set, A = {a1, a2, …, am}, where aj denotes the j-th action and m is the number of all actions.
(3) T is the state transition function: the external state s(t+1) at time t+1 is always determined jointly by the external state s(t) at time t and the agent action a(t), preferably by the system model together with the environment.
(4) O is the orientation function of the intrinsic motivation, determined by the system's evaluation function.
(5) R is the reward signal r(t): the reward obtained after the system, in state s(t) at time t, performs action a(t) and the system state transfers to s(t+1).
(6) Q is the reinforcement-learning update formula: the value function obtained after the system, in external state s(t) at time t, performs agent action a(t) and transfers to state s(t+1).
(7) V is the evaluation function.
(8) P is the action-selection probability P(a|s), the probability of selecting action a in state s.
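As a minimal illustration of the eight-tuple model (the names, types and the tabular Q representation are ours, not the patent's), the system can be sketched as a container that holds each element:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class DevelopmentModel:
    """Sketch of the eight-tuple <S, A, T, O, R, Q, V, P>."""
    states: List[int]                              # S: internal state set
    actions: List[int]                             # A: action set
    transition: Callable[[int, int], int]          # T: s(t+1) = T(s(t), a(t))
    orientation: Callable[[float, float], float]   # O: intrinsic-motivation orientation
    reward: Callable[[int, int, int], float]       # R: r(s, a, s')
    q_table: Dict[Tuple[int, int], float] = field(default_factory=dict)  # Q: value function
    value: Dict[int, float] = field(default_factory=dict)                # V: evaluation function
    # P, the action-selection probability, is derived from the Q-values
    # (e.g. by a softmax or epsilon-greedy rule), so it is not stored.
```

A concrete instance would plug in the robot's discretized states, its control actions, and the transition/reward functions of the environment.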
Compared with the prior art, the above technical scheme brings the following technical effects:
1) The present invention uses an intrinsic-motivation reward mechanism. Through the cognitive mechanism by which intrinsic motivation orients interest, the evaluation value is determined from the judgement of the task, driving the agent to complete the appointed task spontaneously. Compared with traditional cognitive-development methods, this effectively improves the agent's learning initiative.
2) The present invention uses an extreme learning machine network in place of the conventional self-organizing neural network to carry out training and to store knowledge and experience, so that after a failed trial the robot does not learn from scratch but keeps exploring with the stored knowledge. Compared with traditional neural-network storage, this greatly increases the speed at which the two-wheeled robot adapts to its environment, allowing it to learn self-balancing in an unknown environment within a short time.
3) The present invention combines intrinsic motivation with the classical (externally motivated) Q-learning algorithm; through the interaction of the internal evaluation value and the external adaptation value, the agent's learning initiative is enhanced, and the robot's learning efficiency is also effectively raised.
Preferred schemes of the present invention are as follows:
In the state transition function, the state transition equation determined by the state-transfer unit is:

s(t+1) = T(s(t), a(t))

that is, the external state s(t+1) at time t+1 is always determined by the external state s(t) and the agent action a(t) at time t, and is independent of the external states and agent actions before time t.
The orientation function of the intrinsic motivation depends on an orientation parameter: the smaller its value, the smaller the corresponding action reward and the weaker the intrinsic-motivation orientation in the system; conversely, the larger its value, the larger the corresponding action reward and the stronger the intrinsic-motivation orientation.
The intrinsic motivation describes, in psychological terms, degrees of novelty, curiosity and boredom; it is the driving force behind the exploration and learning of humans and other organisms. In the learning process this driving force is realized by an orientation mechanism function introduced through intrinsic motivation, which simulates the working mechanism of the human brain so that the agent possesses autonomous learning ability and learns more efficiently.
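The patent's exact orientation formula is not reproduced in this text; as a hedged illustration of the behavior it describes (orientation large when the chosen action reduces the error toward the target, small when the error grows), a sigmoid of the error reduction can serve as a stand-in:

```python
import math

def orientation(prev_error: float, curr_error: float, eta: float = 1.0) -> float:
    """Illustrative orientation signal, NOT the patent's formula.

    Returns a value in (0, 1): above 0.5 when the current error is smaller
    than the previous one (the action moved the system toward the target),
    below 0.5 when the error grew. `eta` plays the role of the orientation
    parameter: a larger eta gives a stronger orientation response for the
    same error reduction."""
    return 1.0 / (1.0 + math.exp(-eta * (prev_error - curr_error)))
```

Any monotone function of the error reduction would exhibit the qualitative behavior described above; the sigmoid is chosen only because it is bounded and smooth.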
In the classical Q-learning algorithm, the reward signal iterates the action-value function of the Markov decision process according to the temporal-difference (TD) algorithm; the iterative formula is:

Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_t + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]

where α is the learning factor.
The reward signal is then replaced by the intrinsically driven reward signal:

r_t = λ1 r_i(t) + λ2 r_e(t)

where r_i is the intrinsic-motivation function, r_e is the external-motivation function, and λ1 and λ2 are their respective weights. The iterative formula becomes:

Q(s_t, a_t) ← Q(s_t, a_t) + α [ λ1 r_i(t) + λ2 r_e(t) + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]

The reinforcement Q-learning algorithm is modeled as a Markov decision process and iterates toward the optimal solution:

Q*(s, a) = E[ r + γ max_{a'} Q*(s', a') ]

where γ is the discount factor, α is the learning factor, and 0 < α, γ < 1.
In this algorithm the internal reward function is supplied by the intrinsic motivation. Suppose the agent runs in an external environment with actual output y(t) and desired output y_d(t); the difference e(t) = y_d(t) − y(t) is defined as the internal reward of the system. When the system selects action a(t) at time t, the state transfers from s(t) to s(t+1). If |e(t)| < |e(t−1)|, i.e. the error is smaller than at the previous moment, the action chosen at time t brings the system closer to the target state than the action chosen at time t−1 did, and the orientation at time t is large; conversely, if |e(t)| > |e(t−1)|, the orientation at time t is small.
Further, the flow of the reinforcement Q-learning algorithm is as follows:
Step 1: Randomly initialize the Q-values;
Step 2: Observe the current state s(t) and select an action decision a(t) to execute;
Step 3: Obtain the next state s(t+1) and, at the same time, the reward signal r(t);
Step 4: Update the Q-value according to the iterative formula above.
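The four steps above, with the intrinsically driven reward r = λ1·r_i + λ2·r_e in place of the plain external reward, can be sketched as a single update step (variable names and the tabular Q representation are ours, assuming a small discrete state/action space):

```python
def q_update(Q, s, a, s_next, r_internal, r_external,
             lam1=0.5, lam2=0.5, alpha=0.1, gamma=0.9, actions=(0, 1)):
    """One step of intrinsically driven reinforcement Q-learning.

    Combines the intrinsic and external rewards with weights lam1, lam2,
    then applies the standard TD update
        Q(s,a) <- Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a)).
    Unvisited (state, action) pairs default to 0."""
    r = lam1 * r_internal + lam2 * r_external
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))
    return Q[(s, a)]
```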
Driven by the intrinsic-motivation mechanism, the internal action evaluation function V(t) gradually approaches 0, so that the two-wheeled robot can keep the most suitable static balance. The evaluation function is defined as the discounted return:

V(t) = Σ_{k≥0} γ^k r(t+k)

where γ is the discount factor. The evaluation function at time t can therefore be expressed through the evaluation function at time t+1:

V(t) = r(t) + γ V(t+1)

Treating V as an observer, the TD error formula is established as:

δ(t) = r(t) + γ V(t+1) − V(t)
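The discounted return and the TD error just defined can be computed directly (a small sketch; the reward sequence is illustrative):

```python
def evaluation(rewards, gamma=0.9):
    """Discounted evaluation function V(t) = sum_k gamma^k * r(t+k),
    computed backwards via the recursion V(t) = r(t) + gamma*V(t+1)."""
    v = 0.0
    for r in reversed(rewards):
        v = r + gamma * v
    return v

def td_error(r_t, v_t, v_next, gamma=0.9):
    """TD error delta(t) = r(t) + gamma*V(t+1) - V(t); driving it toward 0
    corresponds to the evaluation function settling near its target."""
    return r_t + gamma * v_next - v_t
```

Note that when V already satisfies the recursion exactly, the TD error is zero, which is the fixed point the training is driving toward.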
With reinforcement learning as the framework, the idea of driving the agent through intrinsic motivation is combined with the intrinsic-motivation orientation function as the method's reward mechanism, and an extreme learning machine network is used for training; the robot's autonomous learning ability is thereby strengthened, and its learning speed is also greatly increased.
The action-selection probability is a random probability.
The present invention also provides an operating method of the intrinsically motivated extreme learning machine autonomous development system, comprising the following steps:
Step 1: Initialize the current system state, choose the discount factor γ and learning factor α, and choose suitable weights λ1 and λ2 for the intrinsic-motivation and external-motivation functions.
Step 2: Calculate the Q-values of all actions that may be taken in reinforcement learning.
Step 3: Select a suitable action according to the Q-values.
Step 4: Execute the current action and make a decision for the next stage of learning.
Step 5: Calculate the intrinsic-motivation function while computing the optimal action decision according to reinforcement learning.
Step 6: Update the Q-value according to the iterative formula.
Step 7: Update the current time t and the current state s(t).
Step 8: Repeat Steps 2 to 7 until training is finished.
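Steps 1 to 8 above amount to an episodic training loop. A minimal sketch (the `env_step` callback, epsilon-greedy selection and all parameter values are our assumptions standing in for the two-wheeled robot dynamics):

```python
import random

def train(env_step, actions, episodes=50,
          alpha=0.1, gamma=0.9, lam1=0.5, lam2=0.5, epsilon=0.1):
    """Sketch of the operating method (Steps 1-8).

    env_step(s, a) -> (s_next, r_intrinsic, r_external, done) is a
    placeholder for the environment; here Q is a plain table, whereas the
    patent stores the learned values in an ELM network."""
    Q = {}                                                  # Step 1: initialize
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if random.random() < epsilon:                   # Step 3: explore...
                a = random.choice(actions)
            else:                                           # ...or act greedily on Q (Step 2)
                a = max(actions, key=lambda a2: Q.get((s, a2), 0.0))
            s_next, r_int, r_ext, done = env_step(s, a)     # Step 4: execute
            r = lam1 * r_int + lam2 * r_ext                 # Step 5: combine motivations
            best = max(Q.get((s_next, a2), 0.0) for a2 in actions)
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (
                r + gamma * best - Q.get((s, a), 0.0))      # Step 6: update Q
            s = s_next                                      # Step 7: advance state
    return Q                                                # Step 8: loop until finished
```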
Description of the drawings
Fig. 1 is the system training flow chart of the present invention.
Fig. 2 is the structural diagram of the two-wheeled robot system.
Fig. 3 is the simplified structural diagram of the two-wheeled robot.
Fig. 4 is the network model of the extreme learning machine.
Fig. 5 is the structural framework of the autonomous development system based on intrinsic motivation.
Fig. 6 is the structural framework of the intrinsically motivated extreme learning machine autonomous development system.
Fig. 7 shows the state-quantity change curves.
Fig. 8 shows the evaluation function and error curves.
Fig. 9 shows the curves of the forces on the robot.
Fig. 10 is the evaluation-function simulation comparison.
Fig. 11 is the system-error simulation comparison.
Specific embodiment
The present invention is further elaborated through the embodiment given with reference to Figs. 1 to 11, but the embodiment does not constitute any restriction on the invention.
The structural framework of the intrinsically motivated extreme learning machine autonomous development system of the present invention is shown in Fig. 6, and training is carried out according to the flow shown in Fig. 1.
Fig. 2 gives the structural model of the two-wheeled robot system, which is in essence a simulated inverted-pendulum model. Fig. 3 gives the simplified structural model of the two-wheeled robot and its parameters; the meanings of the specific parameters are listed in the table below.
Fig. 4 shows the network structure of the extreme learning machine, a simple single-hidden-layer feed-forward neural network. Fig. 5 shows the structural framework of the intrinsic-motivation-based autonomous development model, in which the training/storage network is a traditional self-organizing neural network.
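What makes the single-hidden-layer network of Fig. 4 an *extreme* learning machine is that only the output weights are trained, in one shot. A minimal sketch of the standard ELM recipe (layer sizes and the sigmoid activation are illustrative, not the patent's exact configuration):

```python
import numpy as np

def elm_train(X, T, n_hidden=20, seed=0):
    """Minimal ELM: random fixed input weights and biases, sigmoid hidden
    layer, output weights solved in closed form via the Moore-Penrose
    pseudoinverse (least-squares fit of H @ beta = T)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # random input weights (never trained)
    b = rng.standard_normal(n_hidden)                 # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))            # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                      # output weights, one-shot solve
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Because fitting is a single pseudoinverse rather than iterative backpropagation, retraining on accumulated experience is cheap, which is the property the patent relies on for fast knowledge storage.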
Before carrying out any task, a two-wheeled robot must first ensure that it can stand upright, i.e. keep its balance. To verify the effectiveness and autonomy of the proposed intrinsically motivated extreme learning machine autonomous development model, the mathematical model of a two-wheeled robot is taken as the object of study, and its self-balancing skill in an unknown environment is investigated.
1. Experimental design
The control target of the experiment is for the two-wheeled self-balancing robot to achieve autonomous balance in an unknown environment under the intrinsically motivated extreme-learning-machine developmental mechanism. The action extreme learning machine (ELM) network takes four state inputs, namely the robot's inclination angle, body angular velocity, displacement and body velocity, and outputs the control quantity of the two-wheeled robot system. The evaluation network takes as inputs the four robot states together with the control quantity output by the action ELM network, and outputs the evaluation function V and the change in body force. The evaluation function is expressed through the reward return value: when the robot's inclination angle remains within the specified range, the system obtains a reward value, and otherwise a penalty. Appropriate discount and learning factors were chosen, along with the sampling time. In the training process, a run whose number of exploratory trials exceeds 200 before the robot sustains 15000 steps is regarded as a failed experiment; the test is then terminated and restarted. If within one trial the robot can keep from falling for 15000 steps, it is considered to have autonomously completed balance control in the unknown environment. After each failed exploration, the initial state and weight thresholds are reset to random numbers within a given range and training resumes. The results of 60 experiments show that, after an average of 65 failures, the robot achieves self-balancing control, exhibiting strong self-learning and adaptive ability. The simulation results are shown in Fig. 7.
2. Interpretation of results
To verify the effectiveness and convergence of the invention, simulation experiments on the motion-balance performance of the two-wheeled robot system were carried out and the experimental results analyzed.
Fig. 7 shows the curves over time of the four state quantities of the two-wheeled self-balancing robot after training by the proposed intrinsically motivated extreme learning machine autonomous development system; as seen from the figure, the robot completes self-balancing control after 3 s, i.e. 300 steps, demonstrating the invention's fast self-learning and adaptive ability.
Fig. 8 shows the evaluation-function and error curves of the system state quantities over 3000 steps of training, and Fig. 9 the curves of the force changes on the robot.
Fig. 10 compares, in simulation, the evaluation function of the proposed intrinsically motivated extreme learning machine autonomous development method (IM-Q-ELM) with that of a traditional reinforcement-learning algorithm (RL).
Fig. 11 compares the errors of the two algorithms. It can be seen that the proposed IM-Q-ELM method is far more robust than the latter. The experiments thus show that intrinsically motivated reinforcement learning, after training through the extreme learning machine network, attains better performance and faster learning and training speed, and likewise demonstrates the stronger adaptive and control ability of the two-wheeled self-balancing robot.
The present invention proposes an intrinsically motivated extreme learning machine autonomous development system and applies it to the balance control of a two-wheeled self-balancing robot. The reward mechanism of traditional reinforcement learning is replaced by an intrinsic-motivation mechanism, which determines the evaluation value; the extreme learning machine network replaces the conventional self-organizing neural network for training and for storing knowledge and experience, greatly increasing the speed at which the two-wheeled robot adapts to its environment and allowing it to learn self-balancing in an unknown environment within a short time.
Those skilled in the art can realize the present invention through various schemes without departing from its essence and spirit. The above describes only preferred feasible embodiments of the present invention and does not thereby limit its scope of rights; all equivalent structural changes made according to the description and drawings are contained within the scope of the present invention.
Claims (6)
1. An intrinsically motivated extreme learning machine autonomous development system, the system comprising an internal state set, an action set, a state transition function, an intrinsic-motivation orientation function, a reward signal, a reinforcement-learning update formula, an evaluation function and an action-selection probability; the cognitive model of the system taking reinforcement Q-learning combined with an extreme learning machine network as its framework, being driven by an intrinsic-motivation mechanism, and being designed as an eight-tuple model:

⟨S, A, T, O, R, Q, V, P⟩

wherein:
(1) S is the internal state set, S = {s1, s2, …, sn}, si denoting the i-th state and n the number of all states;
(2) A is the action set, A = {a1, a2, …, am}, aj denoting the j-th action and m the number of all actions;
(3) T is the state transition function: the external state s(t+1) at time t+1 is always determined jointly by the external state s(t) at time t and the agent action a(t), preferably by the system model together with the environment;
(4) O is the orientation function of the intrinsic motivation, determined by the system's evaluation function;
(5) R is the reward signal r(t), the reward obtained after the system, in state s(t) at time t, performs action a(t) and the system state transfers to s(t+1);
(6) Q is the reinforcement-learning update formula, i.e. the value function obtained after the system, in external state s(t) at time t, performs agent action a(t) and transfers to state s(t+1);
(7) V is the evaluation function;
(8) P is the action-selection probability P(a|s), the probability of selecting action a in state s.
2. The intrinsically motivated extreme learning machine autonomous development system according to claim 1, characterized in that, in the state transition function, the state transition equation determined by the state-transfer unit is:

s(t+1) = T(s(t), a(t))

the external state s(t+1) at time t+1 being always determined by the external state s(t) and the agent action a(t) at time t, and being independent of the external states and agent actions before time t.
3. The intrinsically motivated extreme learning machine autonomous development system according to claim 1, characterized in that the orientation function of the intrinsic motivation depends on an orientation parameter: the smaller its value, the smaller the corresponding action reward and the weaker the intrinsic-motivation orientation in the system; conversely, the larger its value, the larger the corresponding action reward and the stronger the intrinsic-motivation orientation.
4. The intrinsically motivated extreme learning machine autonomous development system according to claim 1, characterized in that the reward signal iterates the action-value function of the Markov decision process according to the temporal-difference (TD) algorithm of classical Q-learning, the iterative formula being:

Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_t + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]

where α is the learning factor;
the reward signal being replaced by the intrinsically driven reward signal:

r_t = λ1 r_i(t) + λ2 r_e(t)

where r_i is the intrinsic-motivation function, r_e is the external-motivation function, and λ1 and λ2 are their respective weights, the iterative formula becoming:

Q(s_t, a_t) ← Q(s_t, a_t) + α [ λ1 r_i(t) + λ2 r_e(t) + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]

the reinforcement Q-learning algorithm being modeled as a Markov decision process and iterating toward the optimal solution:

Q*(s, a) = E[ r + γ max_{a'} Q*(s', a') ]

where γ is the discount factor, α is the learning factor, and 0 < α, γ < 1;
the flow of the reinforcement Q-learning algorithm being as follows:
Step 1: randomly initialize the Q-values;
Step 2: observe the current state s(t) and select an action decision a(t) to execute;
Step 3: obtain the next state s(t+1) and, at the same time, the reward signal r(t);
Step 4: update the Q-value according to the iterative formula above.
5. The intrinsically motivated extreme learning machine autonomous development system according to claim 1, characterized in that the evaluation function is defined as:

V(t) = Σ_{k≥0} γ^k r(t+k)

where γ is the discount factor, the evaluation function at time t satisfying:

V(t) = r(t) + γ V(t+1)

and, treating V as an observer, the TD error formula being established as:

δ(t) = r(t) + γ V(t+1) − V(t)

the action-selection probability being a random probability.
6. An operating method of the intrinsically motivated extreme learning machine autonomous development system according to any one of claims 1 to 5, comprising the following steps:
Step 1: initialize the current system state, choose the discount factor γ and learning factor α, and choose suitable weights λ1 and λ2 for the intrinsic-motivation and external-motivation functions;
Step 2: calculate the Q-values of all actions that may be taken in reinforcement learning;
Step 3: select a suitable action according to the Q-values;
Step 4: execute the current action and make a decision for the next stage of learning;
Step 5: calculate the intrinsic-motivation function while computing the optimal action decision according to reinforcement learning;
Step 6: update the Q-value according to the iterative formula;
Step 7: update the current time t and the current state s(t);
Step 8: repeat Steps 2 to 7 until training is finished.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611182422.0A CN106598058A (en) | 2016-12-20 | 2016-12-20 | Intrinsically motivated extreme learning machine autonomous development system and operating method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106598058A true CN106598058A (en) | 2017-04-26 |
Family
ID=58599742
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611182422.0A Pending CN106598058A (en) | 2016-12-20 | 2016-12-20 | Intrinsically motivated extreme learning machine autonomous development system and operating method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106598058A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107220540A (en) * | 2017-04-19 | 2017-09-29 | 南京邮电大学 | Intrusion detection method based on intensified learning |
WO2018205778A1 (en) * | 2017-05-11 | 2018-11-15 | 苏州大学张家港工业技术研究院 | Large-range monitoring method based on deep weighted double-q learning and monitoring robot |
CN109195207A (en) * | 2018-07-19 | 2019-01-11 | 浙江工业大学 | A kind of energy-collecting type wireless relay network througput maximization approach based on deeply study |
CN109212975A (en) * | 2018-11-13 | 2019-01-15 | 北方工业大学 | A kind of perception action cognitive learning method with developmental mechanism |
CN109243021A (en) * | 2018-08-28 | 2019-01-18 | 余利 | Deeply learning type intelligent door lock system and device based on user experience analysis |
CN110070185A (en) * | 2019-04-09 | 2019-07-30 | 中国海洋大学 | A method of feedback, which is assessed, from demonstration and the mankind interacts intensified learning |
CN110244561A (en) * | 2019-06-11 | 2019-09-17 | 湘潭大学 | A kind of double inverted pendulum adaptive sliding-mode observer method based on interference observer |
CN110658785A (en) * | 2018-06-28 | 2020-01-07 | 发那科株式会社 | Output device, control device, and method for outputting evaluation function value |
CN110687802A (en) * | 2018-07-06 | 2020-01-14 | 珠海格力电器股份有限公司 | Intelligent household electrical appliance control method and intelligent household electrical appliance control device |
CN114065137A (en) * | 2021-12-17 | 2022-02-18 | 桂林电子科技大学 | Unmanned bicycle mass load eccentricity automatic identification method based on cognitive learning |
2016-12-20 | CN CN201611182422.0A patent CN106598058A (en) | active, Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160086087A1 (en) * | 2014-09-19 | 2016-03-24 | King Fahd University Of Petroleum And Minerals | Method for fast prediction of gas composition |
CN104992059A (en) * | 2015-06-24 | 2015-10-21 | 天津职业技术师范大学 | Intrinsic motivation based self-cognition system for motion balance robot and control method |
CN104914870A (en) * | 2015-07-08 | 2015-09-16 | 中南大学 | Ridge-regression-extreme-learning-machine-based local path planning method for outdoor robot |
CN105205533A (en) * | 2015-09-29 | 2015-12-30 | 华北理工大学 | Developmental automaton with a brain cognition mechanism and its learning method |
CN105700526A (en) * | 2016-01-13 | 2016-06-22 | 华北理工大学 | Online sequential extreme learning machine method with autonomous learning capability |
Non-Patent Citations (2)
Title |
---|
HONGGE REN et al.: "Research on Q-ELM Algorithm in Robot Path Planning", 2016 Chinese Control and Decision Conference * |
HONGGE REN et al.: "Research on Two-wheeled Self-balance Robot Based on IM-Q-ELM Algorithm", ICIC Express Letters, Part B: Applications * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107220540A (en) * | 2017-04-19 | 2017-09-29 | 南京邮电大学 | Intrusion detection method based on reinforcement learning |
WO2018205778A1 (en) * | 2017-05-11 | 2018-11-15 | 苏州大学张家港工业技术研究院 | Large-range monitoring method based on deep weighted double-q learning and monitoring robot |
US11224970B2 (en) | 2017-05-11 | 2022-01-18 | Soochow University | Large area surveillance method and surveillance robot based on weighted double deep Q-learning |
CN110658785B (en) * | 2018-06-28 | 2024-03-08 | 发那科株式会社 | Output device, control device, and method for outputting evaluation function value |
CN110658785A (en) * | 2018-06-28 | 2020-01-07 | 发那科株式会社 | Output device, control device, and method for outputting evaluation function value |
CN110687802A (en) * | 2018-07-06 | 2020-01-14 | 珠海格力电器股份有限公司 | Intelligent household electrical appliance control method and intelligent household electrical appliance control device |
CN109195207B (en) * | 2018-07-19 | 2021-05-18 | 浙江工业大学 | Energy-collecting wireless relay network throughput maximization method based on deep reinforcement learning |
CN109195207A (en) * | 2018-07-19 | 2019-01-11 | 浙江工业大学 | Energy-harvesting wireless relay network throughput maximization method based on deep reinforcement learning |
CN109243021A (en) * | 2018-08-28 | 2019-01-18 | 余利 | Deep reinforcement learning intelligent door lock system and device based on user experience analysis |
CN109212975A (en) * | 2018-11-13 | 2019-01-15 | 北方工业大学 | Perception-action cognitive learning method with a developmental mechanism |
CN110070185A (en) * | 2019-04-09 | 2019-07-30 | 中国海洋大学 | Interactive reinforcement learning method based on demonstrations and human evaluative feedback |
CN110244561A (en) * | 2019-06-11 | 2019-09-17 | 湘潭大学 | Double inverted pendulum adaptive sliding-mode control method based on disturbance observer |
CN110244561B (en) * | 2019-06-11 | 2022-11-08 | 湘潭大学 | Secondary inverted pendulum self-adaptive sliding mode control method based on disturbance observer |
CN114065137A (en) * | 2021-12-17 | 2022-02-18 | 桂林电子科技大学 | Unmanned bicycle mass load eccentricity automatic identification method based on cognitive learning |
CN114065137B (en) * | 2021-12-17 | 2024-03-29 | 桂林电子科技大学 | Automatic recognition method for mass load eccentricity of unmanned bicycle based on cognitive learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106598058A (en) | Intrinsically motivated extreme learning machine autonomous development system and operating method thereof | |
Eysenbach et al. | Diversity is all you need: Learning skills without a reward function | |
Doncieux et al. | Beyond black-box optimization: a review of selective pressures for evolutionary robotics | |
Such et al. | Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning | |
Nelson et al. | Fitness functions in evolutionary robotics: A survey and analysis | |
CN105700526A (en) | Online sequential extreme learning machine method with autonomous learning capability | |
CN105205533A (en) | Developmental automaton with a brain cognition mechanism and its learning method | |
Showalter et al. | Neuromodulated multiobjective evolutionary neurocontrollers without speciation | |
Yan et al. | Path Planning for Mobile Robot's Continuous Action Space Based on Deep Reinforcement Learning | |
Cahill | Catastrophic forgetting in reinforcement-learning environments | |
CN116306947A (en) | Multi-agent decision method based on Monte Carlo tree exploration | |
CN116841303A (en) | Intelligent preferential high-order iterative self-learning control method for underwater robot | |
Patle | Intelligent navigational strategies for multiple wheeled mobile robots using artificial hybrid methodologies | |
Showalter et al. | Lamarckian inheritance in neuromodulated multiobjective evolutionary neurocontrollers | |
Cheng et al. | An autonomous inter-task mapping learning method via artificial neural network for transfer learning | |
Lehman et al. | Investigating biological assumptions through radical reimplementation | |
Tang et al. | Reinforcement learning for robots path planning with rule-based shallow-trial | |
Menon et al. | An Efficient Application of Neuroevolution for Competitive Multiagent Learning | |
Zhao et al. | Variational Diversity Maximization for Hierarchical Skill Discovery | |
Kumar et al. | A Novel Algorithm for Optimal Trajectory Generation Using Q Learning | |
Oudeyer | Interactive learning gives the tempo to an intrinsically motivated robot learner | |
Kovalský et al. | Evaluating the performance of a neuroevolution algorithm against a reinforcement learning algorithm on a self-driving car | |
Pagliuca | Efficient Evolution of Neural Networks | |
Showalter | Evolution of Multiobjective Neuromodulated Neurocontrollers for Multi-Robot Systems | |
Ni et al. | A Novel Heuristic Exploration Method Based on Action Effectiveness Constraints to Relieve Loop Enhancement Effect in Reinforcement Learning with Sparse Rewards |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 2017-04-26 |