CN102402712A - Robot reinforced learning initialization method based on neural network - Google Patents
Abstract
The invention provides a neural-network-based initialization method for robot reinforcement learning. The neural network has the same topology as the robot's workspace, and each neuron corresponds to one discrete state of the state space. The method comprises the following steps: evolve the neural network according to the known partial environment information until it reaches an equilibrium state, at which point each neuron's output value represents the maximum cumulative reward obtainable from the corresponding state by following the optimal policy; define the initial value of the Q function as the immediate reward of the current state plus the maximum discounted cumulative reward obtained by the successor state following the optimal policy; the neural network thus maps the known environment information into the initial values of the Q function. In this way prior knowledge is incorporated into the robot's learning system, and the robot's learning ability in the initial stage of reinforcement learning is improved; compared with the conventional Q-learning algorithm, the method effectively improves the learning efficiency of the initial stage and accelerates the convergence of the algorithm.
Description
Technical field
The present invention relates to incorporating prior knowledge into a mobile robot's learning system, specifically to a method for initializing the Q values in robot reinforcement learning, and belongs to the field of machine learning.
Background art
As the application domains of robots continue to expand, the tasks robots face grow increasingly complex. Although researchers can pre-program the repetitive behaviors a robot may perform in many situations, designing behaviors that realize the complete expected behavior becomes more and more difficult, and the designer usually cannot reasonably predict all of the robot's behaviors in advance. Therefore, an autonomous robot that can perceive its environment must acquire new behaviors through online learning from interaction with the environment, so that it can select the optimal actions for reaching the goal of a specific task.
Reinforcement learning finds the optimal behavior policy using a trial-and-error method similar to human thinking, and currently shows good learning performance in robot behavior learning. The Q-learning algorithm is a reinforcement learning method for solving Markov decision problems with incomplete information; according to the environment state and the immediate reward obtained in the previous learning step, it modifies the mapping policy from states to actions so as to maximize the cumulative reward the behavior obtains from the environment, thereby acquiring the optimal behavior policy. The standard Q-learning algorithm generally initializes the Q values to zero or to random numbers; the robot has no prior knowledge of the environment and can only select actions randomly in the initial stage of learning, so the algorithm converges slowly in complex environments. To speed up convergence, researchers have proposed many improvements to Q-learning that raise learning efficiency and performance.
In general, methods for accelerating the convergence of Q-learning fall into two categories: designing a suitable reward function, and reasonably initializing the Q function. Researchers have proposed many improved Q-learning algorithms that let the robot obtain rewards more effectively during reinforcement learning, mainly including relational Q-learning, lazy Q-learning, and Bayesian Q-learning. Their main purpose is to incorporate implicit information valuable to the robot into the reward function and thereby accelerate convergence. Relational Q-learning compares the current immediate reward with past immediate rewards and selects the action with the larger reward; this relational reward method can improve the learning ability of the system and reduce the number of iterations needed to reach the optimal value. Lazy Q-learning aims to provide a method for predicting the immediate reward of a state: it exploits the information-delay principle of the learning process, makes predictions for new targets when necessary, and then an action comparator examines the expected reward of each situation and selects the action with the maximum expected reward. Bayesian Q-learning uses probability distributions to describe the uncertainty of the robot's state-action Q value estimates; the learning process considers the distribution of the Q values at the previous moment, updates this prior distribution using the robot's learning experience, and represents the maximum cumulative reward of the current state with Bayesian variables. The Bayesian approach improves the exploration strategy of Q-learning and thus improves its performance in essence.
Because the reinforcement signal in standard reinforcement learning is a scalar computed from the state value function, human knowledge and behavior patterns cannot be incorporated into the learning system. Yet in robot learning, people often have experience and knowledge of the relevant domain; feeding human cognition back to the robot in the form of reinforcement signals during learning can reduce the dimensionality of the state space and accelerate convergence. Addressing the problems of standard reinforcement learning in human-robot interaction, Thomaz et al. let a human provide an external reinforcement signal in real time during robot reinforcement learning, with the human adjusting the training behavior according to experience and guiding the robot to perform forward-looking exploration. Arsenio proposed a learning strategy that labels training data online and automatically, obtaining training data through specific triggering events during human-robot interaction and thereby embedding the teacher into the feedback loop of reinforcement learning. Mirza et al. proposed an architecture based on interaction histories, in which the robot can perform reinforcement learning using the historical experience of social interaction with people, gradually acquiring appropriate behaviors in simple games played with a human.
Another way to improve the performance of the Q-learning algorithm is to incorporate prior knowledge into the learning system by initializing the Q values. Current approaches to Q value initialization mainly include approximate-function methods, fuzzy-rule methods, and potential-function methods. Approximate-function methods use intelligent systems such as neural networks to approximate the optimal value function and convert prior knowledge into reward values, letting the robot learn on a subset of the whole state space and thus accelerating convergence. Fuzzy-rule methods build a fuzzy rule base from the initial environment information and then initialize the Q values using fuzzy logic; however, the fuzzy rules built this way are all set manually from environment information and often cannot reflect the robot's environment objectively, making the algorithm unstable. Potential-function methods define a state potential function over the whole state space, where the potential value at each point corresponds to a discrete state value, and then use the state potential function to initialize the Q values; the Q values of the learning system can be expressed as the initial values plus the increments of each iteration. Among its various behaviors, a robot must observe a set of behavior rules; the robot develops corresponding behaviors and intelligence through cognition and interaction, and initializing the Q values of robot reinforcement learning is precisely a matter of converting prior knowledge into corresponding robot behaviors. Therefore, how to obtain a regularized representation of prior knowledge, and in particular how to realize machine inference over domain experts' experience and common knowledge so that human cognition and intelligence are converted into machine computation and reasoning that integrates human and machine intelligence, is an urgent problem for robot behavior learning.
Summary of the invention
In view of the current state and shortcomings of existing robot reinforcement learning techniques, the present invention proposes a neural-network-based initialization method for robot reinforcement learning that effectively improves the learning efficiency of the initial stage and accelerates convergence. Through Q value initialization, the method incorporates prior knowledge into the learning system and optimizes the robot's learning in the initial stage, thereby providing the robot with a good starting point for learning.
In the neural-network-based initialization method for robot reinforcement learning of the present invention, the neural network has the same topology as the robot's workspace, and each neuron corresponds to one discrete state of the state space. First, the neural network is evolved according to the known partial environment information until it reaches an equilibrium state; at that point each neuron's output value represents the maximum cumulative reward obtainable from its corresponding state. Then, by adding the immediate reward of the current state to the maximum discounted cumulative reward obtained by the successor state following the optimal policy (the maximum cumulative reward multiplied by the discount factor), a reasonable initial value can be set for Q(s, a) for every state-action pair. Through this Q value initialization, prior knowledge is incorporated into the learning system and the robot's learning in the initial stage is optimized, providing the robot with a good starting point for learning. The method specifically comprises the following steps:
(1) Establish the neural network model
The neural network has the same topology as the robot's workspace. Each neuron is connected only to the neurons in its local neighborhood; the connection pattern is identical for all neurons, all connection weights are equal, and information propagates bidirectionally between neurons. The network has a highly parallel architecture, each neuron corresponds to one discrete state of the robot's workspace, and the whole network forms a two-dimensional topology of N × N neurons. During evolution, the network updates the states in each discrete state's neighborhood from its inputs until it reaches an equilibrium state; at equilibrium, the neuron output values form a single-peaked surface, and the value at each point of the surface represents the maximum cumulative reward obtainable from the corresponding state;
(2) Design the reward function
During learning, the robot can move in 4 directions, selecting among 4 actions (up, down, left, right) in any state; the robot selects an action according to the current state. If the action brings the robot to the target, the immediate reward obtained is 1; if the robot collides with an obstacle or another robot, the immediate reward is -0.2; if the robot moves through free space, the immediate reward is -0.1;
(3) Compute the initial values of the maximum cumulative reward
When the neural network reaches its equilibrium state, the maximum cumulative reward V*_init(s_i) of the state corresponding to each neuron is defined to equal that neuron's output value x_i:

V*_init(s_i) = x_i

where x_i is the output value of the i-th neuron when the network reaches equilibrium, and V*_init(s_i) is the maximum cumulative reward obtainable by starting from state s_i and following the optimal policy;
(4) Initialize the Q values
The initial value of Q(s_i, a) is defined as the immediate reward r obtained by selecting action a in state s_i plus the maximum discounted cumulative reward of the successor state:

Q_init(s_i, a) = r + γ V*_init(s_j)

where s_j is the successor state produced by selecting action a in state s_i, Q_init(s_i, a) is the initial Q value of the state-action pair (s_i, a), and γ is the discount factor, chosen as γ = 0.95;
(5) Robot reinforcement learning steps based on the neural network
(a) Evolve the neural network according to the initial environment information until it reaches equilibrium;
(b) Define the initial value of the maximum cumulative reward obtainable in state s_i as the neuron output value x_i:

V*_init(s_i) = x_i

(c) Initialize the Q values according to the rule:

Q_init(s_i, a) = r + γ V*_init(s_j)

(d) Observe the current state s_t;
(e) Continue exploring the complex environment: in the current state s_t, select an action a_t and execute it; the environment is updated to the new state s'_t and the immediate reward r_t is received;
(f) Observe the new state s'_t;
(g) Update the table entry Q(s_t, a_t) according to:

Q_t(s_t, a_t) = (1 - α_t) Q_{t-1}(s_t, a_t) + α_t (r_t + γ max_{a'_t} Q_{t-1}(s'_t, a'_t))

where α_t is the learning rate, with range (0, 1), usually set to 0.5 and decayed as learning proceeds; Q_{t-1}(s_t, a_t) and Q_{t-1}(s'_t, a'_t) are the values of the state-action pairs (s_t, a_t) and (s'_t, a'_t) at time t-1, and the maximum is taken over the actions a'_t available in the new state s'_t;
(h) Check whether the robot has reached the target or the learning system has reached the set maximum number of learning episodes (the maximum should be set so that the learning system converges within it); if either condition holds, learning ends; otherwise return to step (d).
The present invention maps the known environment information into the initial values of the Q function through the neural network, thereby incorporating prior knowledge into the robot's learning system and improving the robot's learning ability in the initial stage of reinforcement learning. Compared with the traditional Q-learning algorithm, it effectively improves the learning efficiency of the initial stage and accelerates the convergence of the algorithm.
Description of drawings
Fig. 1 is a schematic diagram of the neighborhood structure of neuron i.
Fig. 2 is a schematic diagram of the neuron output values in the neighborhood of the robot's target point.
Fig. 3 is a schematic diagram of the initial maximum cumulative reward V*_init(s, a).
Fig. 4 is a schematic diagram of the neuron output values when the neural network reaches equilibrium.
Fig. 5 is a schematic diagram of the robot path planned by existing Q-learning.
Fig. 6 is a schematic diagram of the convergence process of the existing Q-learning algorithm.
Fig. 7 is a schematic diagram of the robot path planned by the present invention.
Fig. 8 is a schematic diagram of the convergence process of the learning method of the present invention.
Embodiment
The present invention initializes robot reinforcement learning based on a neural network. The neural network has the same topology as the robot's workspace; when it reaches an equilibrium state, the neuron output values represent the maximum cumulative rewards of the corresponding states, and the initial value of the Q function is obtained from the immediate reward of the current state and the maximum discounted cumulative reward of the successor state. Through this Q value initialization, prior knowledge is incorporated into the learning system and the robot's learning in the initial stage is optimized, providing the robot with a good starting point for learning. The method specifically comprises the following steps:
1 Neural network model
The neural network has the same topology as the robot's workspace, and each neuron corresponds to one discrete state of the workspace. Every neuron is connected only to the neurons in its local neighborhood, with an identical connection pattern, and the whole network forms a two-dimensional topology of N × N neurons. The network has a highly parallel architecture, all connection weights are equal, and information propagates bidirectionally between neurons. During evolution, the network updates the states in each discrete state's neighborhood from its inputs, and the whole network can be regarded as a discrete-time dynamical system.
During evolution, the external inputs of the neural network are produced by mapping the positions of the target point and the obstacles into the network topology: the neurons corresponding to obstacle regions receive negative external inputs, and the target neuron receives a positive external input. The network evolves according to these external inputs, and the positive output value at the target position gradually propagates, with decay, through the local neuron connections to the whole state space until equilibrium is reached. The S-type activation function guarantees that the neuron at the target position has the globally maximal positive output value, while the outputs of the neurons in obstacle regions are suppressed to zero. After the network reaches equilibrium, the neuron output values form a single-peaked surface, and the value at each point of the surface represents the maximum cumulative reward obtainable from the corresponding state.
Suppose the robot's workspace consists of 20 × 20 grid cells. The neural network has the same topology, also comprising 20 × 20 neurons, each corresponding to one discrete state of the workspace. Each neuron is connected only to the neurons in its local neighborhood; the connection pattern between neuron i and the neurons in its neighborhood is shown in Fig. 1. The whole network forms a two-dimensional topology of 20 × 20 neurons with a highly parallel architecture, and all connection weights are equal. During evolution, every neuron is both an input neuron and an output neuron, information propagates bidirectionally between neurons, and the whole network can be regarded as a discrete-time dynamical system.
Neuron i of the network corresponds to discrete state i of the state space, and its discrete-time dynamics are:

x_i(t+1) = f( Σ_{j∈N(i)} w_ij x_j(t) + I_i(t) )

where x_i(t) is the output value of neuron i at time t, N(i) is the set of the N neurons in the neighborhood of neuron i, I_i(t) is the external input of neuron i at time t, f is the activation function, and w_ij is the connection weight from neuron j to neuron i. The weight w_ij is determined by the distance between the neurons, where |i - j| is the Euclidean distance between x_i and x_j in the state space: since each neuron is connected only to the neurons in its local neighborhood, the neighborhood radius r is 1, which guarantees that the neuron outputs form a single-peaked surface when the network reaches equilibrium, and all connection weights within the neighborhood are equal, w_ij = η with η in the range (1, 2), while w_ij = 0 outside the neighborhood. Clearly w_ij = w_ji, i.e. the weights are symmetric.
The neuron activation function is an S-type function whose linear segment has slope k, with k in the range (0, 1); one form consistent with this description is

f(u) = min(1, max(0, k u))

f guarantees that the positive output value at the target position propagates gradually, with decay, to the whole state space, that the target point has the globally maximal positive output value, and that the outputs of obstacle-region neurons are suppressed to zero.
The external input of neuron i is produced by mapping the positions of the target point and the obstacles into the network topology:

I_i(t) = V if i = i* (the index of the target neuron), -V if neuron i lies in an obstacle region, and 0 otherwise

where V is a large constant that guarantees that the target neuron has the globally maximal output value and that the obstacle-region neurons have the globally minimal output values; V should exceed the sum of a neuron's inputs, and its range is the real numbers greater than 4.
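The network evolution described above can be sketched numerically. This is a minimal sketch, assuming a uniform neighborhood weight η (the text states all connection weights are equal) and the piecewise-linear S-type activation f(u) = clip(ku, 0, 1); the concrete values η = 1.5, k = 0.08 and V = 20 are illustrative choices within the stated ranges, not values fixed by the patent.

```python
import numpy as np

def evolve_network(n=20, target=(17, 17), obstacles=(), eta=1.5,
                   k=0.08, V=20.0, tol=1e-6, max_iters=5000):
    """Evolve an n x n grid of neurons to (approximate) equilibrium.

    Implements x_i(t+1) = f(sum_j w_ij x_j(t) + I_i(t)) with
    w_ij = eta for the 8 neighbours (r = 1) and 0 otherwise, and the
    assumed activation f(u) = clip(k*u, 0, 1)."""
    I = np.zeros((n, n))
    I[target] = V                      # target neuron: positive external input
    for obs in obstacles:
        I[obs] = -V                    # obstacle neurons: negative external input

    x = np.zeros((n, n))
    for _ in range(max_iters):
        xp = np.pad(x, 1)              # zero outputs outside the workspace
        s = np.zeros((n, n))
        for di in (0, 1, 2):           # sum the 8 neighbour outputs
            for dj in (0, 1, 2):
                if di == 1 and dj == 1:
                    continue
                s += eta * xp[di:di + n, dj:dj + n]
        x_new = np.clip(k * (s + I), 0.0, 1.0)   # S-type activation
        converged = np.max(np.abs(x_new - x)) < tol
        x = x_new
        if converged:
            break
    return x
```

Starting from zero output everywhere, the target neuron saturates at 1, obstacle neurons stay at 0, and the remaining outputs form a surface that decays with distance from the target, which is the single-peaked equilibrium the text describes.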
2 Reward function design
During learning, the robot can move in 4 directions, selecting among 4 actions (up, down, left, right) in any state; the robot selects an action according to the current state. If the action brings the robot to the target, the immediate reward obtained is 1; if the robot collides with an obstacle or another robot, the immediate reward is -0.2; if the robot moves through free space, the immediate reward is -0.1.
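The reward design above is simple enough to write down directly; the boolean-flag interface is an illustrative choice.

```python
def reward(reached_goal, collided):
    """Immediate reward as specified in the reward design:
    +1 for reaching the target, -0.2 for colliding with an obstacle
    or another robot, -0.1 for a move through free space."""
    if reached_goal:
        return 1.0
    if collided:
        return -0.2
    return -0.1
```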
3 Computing the initial values of the maximum cumulative reward
The external inputs of the neural network are produced by mapping the positions of the target point and the obstacles into the network topology: the neurons corresponding to obstacle regions receive negative external inputs, and the target neuron receives a positive external input. The network evolves according to these external inputs, and the positive output value at the target position propagates gradually, with decay, through the local neuron connections to the whole state space until equilibrium is reached. The S-type activation function guarantees that the neuron at the target position has the globally maximal positive output value, while the outputs of obstacle-region neurons are suppressed to zero. After the network reaches equilibrium, the neuron output values form a single-peaked surface, as shown in Fig. 2, and the value at each point of the surface represents the maximum cumulative reward obtainable from the corresponding state.
The cumulative reward obtained by the robot starting from an arbitrary initial state s_t is defined as:

V^π(s_t) = Σ_{i=0}^{∞} γ^i r_{t+i}

where π is the control policy, r_{t+i} is the sequence of immediate rewards obtained, and γ is the discount factor, with range (0, 1); here γ = 0.95 is chosen. The maximum cumulative reward V*(s) the robot obtains from state s by following the optimal policy is then computed as:

V*(s) = max_π V^π(s)

When the neural network reaches its equilibrium state, the maximum cumulative reward V*_init(s_i) of the state corresponding to each neuron is defined to equal that neuron's output value x_i:

V*_init(s_i) = x_i

where x_i is the output value of the i-th neuron when the network reaches equilibrium, and V*_init(s_i) is the maximum cumulative reward obtainable by starting from state s_i and following the optimal policy.
4 Robot reinforcement learning based on the neural network
4.1 The traditional Q-learning algorithm
In a Markov decision process, the robot perceives its surroundings through sensors to determine the current state and selects the action to execute; the environment responds to this action, provides an immediate reward, and produces the successor state. The task of robot reinforcement learning is to obtain an optimal policy that maximizes the discounted cumulative reward the robot obtains from the current state. The cumulative reward the robot obtains by following any policy π from an arbitrary initial state is defined as:

V^π(s_t) = Σ_{i=0}^{∞} γ^i r_{t+i}

where r_t is the immediate reward at time t and γ is the discount factor, with range (0, 1); here γ = 0.95 is chosen.
The optimal policy π*, by which the robot obtains the maximum cumulative reward from state s, is defined as:

π* = argmax_π V^π(s)

The maximum cumulative reward the robot can obtain from state s by following π* is denoted V*(s). The value of the Q function is then the immediate reward of the current state plus the maximum discounted cumulative reward of the successor state, computed as:

Q(s, a) ← (1 - α_t) Q(s, a) + α_t (r(s, a) + γ V*(s'))

where α_t is the learning rate, with range (0, 1); its initial value is usually chosen as 0.5 and decays with the number of learning episodes. V*(s') and Q(s', a') are related by:

V*(s') = max_{a'} Q(s', a')

Q(s_t, a_t) is therefore updated according to the rule:

Q_t(s_t, a_t) = (1 - α_t) Q_{t-1}(s_t, a_t) + α_t (r_t + γ max_{a'_t} Q_{t-1}(s'_t, a'_t))

where Q_{t-1}(s_t, a_t) and Q_{t-1}(s'_t, a'_t) are the values of the state-action pairs (s_t, a_t) and (s'_t, a'_t) at time t-1, and the maximum is taken over the actions a'_t available in the new state s'_t.
4.2 Q value initialization
The neural network is evolved according to the known environment information until it reaches equilibrium; at that point the maximum cumulative reward obtainable in each discrete state is defined to equal the output value of its corresponding neuron. Then, by adding the immediate reward obtained by executing the selected action in the current state to the maximum discounted cumulative reward obtained by the successor state following the optimal policy, a reasonable initial value can be set for Q(s_i, a) for every state-action pair. The initial value of Q(s_i, a) is computed as:

Q_init(s_i, a) = r + γ V*_init(s_j)

where r is the immediate reward obtained by selecting action a in state s_i; γ is the discount factor, with range (0, 1), here chosen as γ = 0.95; s_j is the successor state produced by selecting action a in state s_i; and Q_init(s_i, a) is the initial Q value of the state-action pair (s_i, a).
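Given the equilibrium outputs of the network, the initialization rule above can be sketched as follows. Reading the immediate reward off the successor's output value (goal cell saturated at exactly 1, obstacle cells suppressed to exactly 0) is an illustrative convention layered on the reward design, not something the patent specifies.

```python
def init_q(x, gamma=0.95):
    """Initialize Q(s, a) = r + gamma * V*_init(s_j), where V*_init
    equals the equilibrium neuron output x[s_j] and s_j is the
    successor of s under action a. Actions: 0=up, 1=down, 2=left,
    3=right. x is an n x n array of equilibrium outputs."""
    n = x.shape[0]
    moves = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}
    Q = {}
    for i in range(n):
        for j in range(n):
            for a, (di, dj) in moves.items():
                ni, nj = i + di, j + dj
                if not (0 <= ni < n and 0 <= nj < n):
                    continue            # moves off the grid are not initialized
                if x[ni, nj] == 1.0:    # successor is the goal
                    r = 1.0
                elif x[ni, nj] == 0.0:  # successor is an obstacle: collision
                    r = -0.2
                else:                   # move through free space
                    r = -0.1
                Q[((i, j), a)] = r + gamma * x[ni, nj]
    return Q
```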
4.3 The Q-learning algorithm of the invention based on the neural network
(1) Evolve the neural network according to the initial environment information until it reaches equilibrium.
(2) Use the neuron output value x_i to initialize the maximum cumulative reward obtainable in state s_i:

V*_init(s_i) = x_i

(3) Initialize the Q values according to the rule:

Q_init(s_i, a) = r + γ V*_init(s_j)

(4) Observe the current state s_t.
(5) Continue exploring the complex environment: in the current state s_t, select an action a_t and execute it; the environment is updated to the new state s'_t and the immediate reward r_t is received.
(6) Observe the new state s'_t.
(7) Update the table entry Q(s_t, a_t) according to:

Q_t(s_t, a_t) = (1 - α_t) Q_{t-1}(s_t, a_t) + α_t (r_t + γ max_{a'_t} Q_{t-1}(s'_t, a'_t))

(8) Check whether the robot has reached the target or the learning system has reached the set maximum number of learning episodes (the maximum is set so that the learning system converges within it; in the experimental environment of the present invention it is set to 300); if either condition holds, learning ends; otherwise return to step (4).
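Steps (4) to (8) can be sketched as one training loop over a grid world. The ε-greedy action selection, the specific learning-rate schedule, and the convention that a colliding robot stays in place are illustrative assumptions; the patent specifies only the reward values, the discount factor, and that the learning rate starts at 0.5 and decays.

```python
import random

def train(Q, start, goal, obstacles, n=20, episodes=300, gamma=0.95,
          alpha0=0.5, epsilon=0.1, max_steps=400):
    """Run steps (4)-(8): Q arrives pre-initialized (steps (1)-(3));
    each episode explores from `start` until the goal is reached or
    max_steps expires. Actions: 0=up, 1=down, 2=left, 3=right."""
    moves = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}
    for ep in range(episodes):
        alpha = alpha0 / (1 + ep / 50)   # decaying learning rate (assumed schedule)
        s = start
        for _ in range(max_steps):
            if random.random() < epsilon:            # explore
                a = random.randrange(4)
            else:                                    # exploit current Q
                a = max(range(4), key=lambda b: Q.get((s, b), 0.0))
            di, dj = moves[a]
            nxt = (s[0] + di, s[1] + dj)
            if not (0 <= nxt[0] < n and 0 <= nxt[1] < n) or nxt in obstacles:
                r, nxt = -0.2, s                     # collision: stay in place
            elif nxt == goal:
                r = 1.0
            else:
                r = -0.1                             # free-space move
            v_next = max(Q.get((nxt, b), 0.0) for b in range(4))
            Q[(s, a)] = (1 - alpha) * Q.get((s, a), 0.0) \
                        + alpha * (r + gamma * v_next)
            s = nxt
            if s == goal:                            # target reached: episode ends
                break
    return Q
```

Passing in a Q table pre-filled by the initialization of section 4.2 reproduces the method of the invention; passing an empty table reproduces the baseline with zero initialization.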
To illustrate the Q value initialization process for robot reinforcement learning, the neighborhood of the robot's target point is selected for demonstration. When the neural network reaches equilibrium, the neuron output values in the neighborhood are shown as the numbers in the nodes of Fig. 3; each node corresponds to a discrete state, and the maximum cumulative reward of each state equals the output value of that state's neuron. The red node represents the goal state and the grey nodes represent obstacles. Each arrow represents an action: if the robot reaches the goal state G, the immediate reward obtained is 1; if it collides with an obstacle or another robot, the immediate reward is -0.2; if it moves through free space, the immediate reward is -0.1. With the discount factor γ = 0.95, the initial values of the Q function are obtained from the Q value initialization formula; the initial Q value of each state-action pair is shown as the number next to the corresponding arrow in Fig. 4. After initialization, the robot can select an appropriate action in any initial state; when facing a relatively complex environment, the robot acts with a degree of purpose in the initial stage of learning rather than selecting actions completely at random, which accelerates the convergence of the algorithm.
Simulation experiments were carried out on the mobile-robot environment-modeling and exploration software platform built in our laboratory. Fig. 5 shows the robot path planned by the existing robot reinforcement learning method, and Fig. 6 shows the convergence process of the existing algorithm. The learning algorithm begins to converge after 145 learning episodes, and in the initial stage of learning (e.g. the first 20 episodes) the robot essentially cannot reach the target point within the maximum number of iterations. This is because the Q values are initialized to 0: the robot has no prior knowledge and can only select actions randomly, so the learning efficiency of the initial stage is low and the algorithm converges slowly.
Fig. 7 shows the robot path planned by the present invention, and Fig. 8 shows its convergence process. The learning algorithm begins to converge after 76 learning episodes, and even in the initial stage of learning the robot can essentially reach the target point within the maximum number of iterations. The present invention effectively improves the learning efficiency of the robot's initial stage and clearly accelerates the convergence of the learning process.
Claims (1)
1. robot intensified learning initial method based on neural network; Neural network has identical topological structure with the robot working space, and each neuron at first develops to neural network according to known component environment information corresponding to a discrete state in the state space; Up to reaching equilibrium state; At this moment each neuron output value is just represented the cumulative maximum repayment that its corresponding states can obtain, and then the repayment immediately of current state is added that follow-up state follows the maximum conversion accumulation repayment that optimal strategy obtains, Q (s that can be right to all state-actions; A) set rational initial value; Can priori be dissolved in the learning system through Q value initialization, the study in robot initial stage is optimized, thereby a learning foundation preferably is provided for robot; Specifically may further comprise the steps:
(1) sets up neural network model
Neutral net has identical topological structure with the structure space of robot work, each neuron only with its local neighborhood in neuron be connected, type of attachment is all identical; Connecting power all equates; The propagation of information is two-way between the neuron, and neutral net has the architecture of highly-parallel, and each neuron is corresponding to one of the robot working space discrete state; Whole neutral net is formed two dimensional topology by N * N neuron; The state in its neighborhood is upgraded in neutral net input based on each discrete state in evolutionary process, reaches poised state up to neutral net, when reaching poised state; Neuron output value just forms a single-peaked curved surface in the neutral net, and the value of every bit is just represented the cumulative maximum repayment that institute's corresponding states can obtain on the curved surface;
(2) Design the reward function
During learning, the robot can move in four directions, selecting one of four actions (up, down, left, right) in any state. The robot selects an action based on its current state: if the action brings the robot to the target, the immediate reward obtained is 1; if the robot collides with an obstacle or with another robot, the immediate reward obtained is -0.2; and if the robot moves in free space, the immediate reward obtained is -0.1;
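The three-valued reward function above translates directly into code; this sketch assumes states are grid coordinates and that the target, obstacle, and other-robot positions are passed in as sets (names are illustrative).

```python
def immediate_reward(next_state, target, obstacles, other_robots=frozenset()):
    """Immediate reward as specified in step (2):
    1 for reaching the target, -0.2 for colliding with an obstacle or
    another robot, -0.1 for an ordinary move in free space."""
    if next_state == target:
        return 1.0
    if next_state in obstacles or next_state in other_robots:
        return -0.2
    return -0.1
```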
(3) Calculate the initial values of the maximum cumulative return
When the neural network reaches its equilibrium state, the maximum cumulative return V*_init(s_i) of the state corresponding to each neuron is defined to equal that neuron's output value x_i:

V*_init(s_i) = x_i

where x_i is the output value of the i-th neuron when the network reaches equilibrium, and V*_init(s_i) is the maximum cumulative return obtainable by starting from state s_i and following the optimal policy;
(4) Q value initialization
The initial value of Q(s_i, a) is defined as the immediate reward r obtained by selecting action a in state s_i, plus the maximum discounted cumulative return of the successor state:

Q_init(s_i, a) = r + γ · V*_init(s_j)

where s_j is the successor state produced by selecting action a in state s_i, Q_init(s_i, a) is the initial Q value of the state-action pair (s_i, a), and γ is the discount factor, chosen here as γ = 0.95;
(5) Steps of the neural-network-based robot reinforcement learning
(a) the neural network evolves according to the initial environment information until it reaches the equilibrium state;
(b) the initial value of the maximum cumulative return obtainable from state s_i is defined as the neuron output value x_i, that is, V*_init(s_i) = x_i;
(c) initialize the Q values according to the rule of step (4), Q_init(s_i, a) = r + γ · V*_init(s_j);
(d) observe the current state s_t;
(e) continue exploring in the complex environment: in the current state s_t, select an action a_t and execute it; the environment is updated to the new state s'_t, and an immediate reward r_t is received;
(f) observe the new state s'_t;
(g) update the table entry Q(s_t, a_t) according to the formula:

Q_t(s_t, a_t) = (1 − α_t) · Q_{t−1}(s_t, a_t) + α_t · [r_t + γ · Q_{t−1}(s'_t, a'_t)]

where α_t is the learning rate, taking values in (0, 1), usually 0.5, and decaying as learning proceeds; Q_{t−1}(s_t, a_t) and Q_{t−1}(s'_t, a'_t) are the values of the state-action pairs (s_t, a_t) and (s'_t, a'_t) at time t−1, and a'_t is the action selected in the new state s'_t;
(h) judge whether the robot has reached the target or the learning system has reached the set maximum number of learning episodes; the maximum should be set large enough to guarantee that the learning system converges within it. If either condition is satisfied, learning ends; otherwise return to step (d) and continue learning.
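The update of step (g) can be sketched as a single tabular operation. Taking the maximum over actions in the successor state corresponds to selecting a'_t greedily, which matches the standard Q-learning target; that reading of the patent's formula is an assumption, as is the use of dictionary keys of the form (state, action).

```python
def q_update(Q, s_t, a_t, r_t, s_next, actions, alpha_t, gamma=0.95):
    """One tabular update, step (g):
    Q_t(s_t, a_t) = (1 - alpha_t) * Q_{t-1}(s_t, a_t)
                    + alpha_t * (r_t + gamma * max_a' Q_{t-1}(s'_t, a')).
    The max over a' assumes a'_t is chosen greedily in the new state,
    consistent with standard Q-learning."""
    best_next = max(Q.get((s_next, a), 0.0) for a in actions)
    Q[(s_t, a_t)] = ((1 - alpha_t) * Q.get((s_t, a_t), 0.0)
                     + alpha_t * (r_t + gamma * best_next))
    return Q[(s_t, a_t)]
```

In the full loop of step (5), this update runs once per step between observing s'_t and the termination check, with alpha_t decayed over episodes.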
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110255530.7A CN102402712B (en) | 2011-08-31 | 2011-08-31 | Robot reinforced learning initialization method based on neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102402712A true CN102402712A (en) | 2012-04-04 |
CN102402712B CN102402712B (en) | 2014-03-05 |
Family
ID=45884895
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201110255530.7A Expired - Fee Related CN102402712B (en) | 2011-08-31 | 2011-08-31 | Robot reinforced learning initialization method based on neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102402712B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5402521A (en) * | 1990-02-28 | 1995-03-28 | Chiyoda Corporation | Method for recognition of abnormal conditions using neural networks |
WO2007135723A1 (en) * | 2006-05-22 | 2007-11-29 | Fujitsu Limited | Neural network learning device, method, and program |
CN101320251A (en) * | 2008-07-15 | 2008-12-10 | 华南理工大学 | Robot ambulation control method based on confirmation learning theory |
CN102063640A (en) * | 2010-11-29 | 2011-05-18 | 北京航空航天大学 | Robot behavior learning model based on utility differential network |
Non-Patent Citations (1)
Title |
---|
Song Yong et al.: "Mobile robot path planning method based on neural network", Systems Engineering and Electronics * |
Cited By (82)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102799179A (en) * | 2012-07-06 | 2012-11-28 | 山东大学 | Mobile robot path planning algorithm based on single-chain sequential backtracking Q-learning |
CN102819264A (en) * | 2012-07-30 | 2012-12-12 | 山东大学 | Path planning Q-learning initial method of mobile robot |
CN102819264B (en) * | 2012-07-30 | 2015-01-21 | 山东大学 | Path planning Q-learning initial method of mobile robot |
CN103218655B (en) * | 2013-03-07 | 2016-02-24 | 西安理工大学 | Based on the nitrification enhancement of Mechanism of immunotolerance |
CN103218655A (en) * | 2013-03-07 | 2013-07-24 | 西安理工大学 | Reinforced learning algorithm based on immunologic tolerance mechanism |
CN105637540A (en) * | 2013-10-08 | 2016-06-01 | 谷歌公司 | Methods and apparatus for reinforcement learning |
CN104317297A (en) * | 2014-10-30 | 2015-01-28 | 沈阳化工大学 | Robot obstacle avoidance method under unknown environment |
CN106056213A (en) * | 2015-04-06 | 2016-10-26 | 谷歌公司 | Selecting reinforcement learning actions using goals and observations |
CN106056213B (en) * | 2015-04-06 | 2022-03-29 | 渊慧科技有限公司 | Selecting reinforcement learning actions using targets and observations |
CN104932264A (en) * | 2015-06-03 | 2015-09-23 | 华南理工大学 | Humanoid robot stable control method of RBF-Q learning frame |
CN104932264B (en) * | 2015-06-03 | 2018-07-20 | 华南理工大学 | The apery robot stabilized control method of Q learning frameworks based on RBF networks |
CN104932267A (en) * | 2015-06-04 | 2015-09-23 | 曲阜师范大学 | Neural network learning control method adopting eligibility trace |
CN104932267B (en) * | 2015-06-04 | 2017-10-03 | 曲阜师范大学 | A kind of neural network lea rning control method of use eligibility trace |
CN104932847A (en) * | 2015-06-08 | 2015-09-23 | 三维泰柯(厦门)电子科技有限公司 | Spatial network 3D printing algorithm |
CN104932847B (en) * | 2015-06-08 | 2018-01-19 | 三维泰柯(厦门)电子科技有限公司 | A kind of spatial network 3D printing algorithm |
CN105700526A (en) * | 2016-01-13 | 2016-06-22 | 华北理工大学 | On-line sequence limit learning machine method possessing autonomous learning capability |
CN105700526B (en) * | 2016-01-13 | 2018-07-27 | 华北理工大学 | Online limit of sequence learning machine method with independent learning ability |
CN105740644A (en) * | 2016-03-24 | 2016-07-06 | 苏州大学 | Cleaning robot optimal target path planning method based on model learning |
CN105740644B (en) * | 2016-03-24 | 2018-04-13 | 苏州大学 | Cleaning robot optimal target path planning method based on model learning |
CN105955921A (en) * | 2016-04-18 | 2016-09-21 | 苏州大学 | Robot hierarchical reinforcement learning initialization method based on automatic discovery of abstract action |
CN105955921B (en) * | 2016-04-18 | 2019-03-26 | 苏州大学 | Robot Hierarchical reinforcement learning initial method based on automatic discovery abstract action |
CN107305370A (en) * | 2016-04-25 | 2017-10-31 | 发那科株式会社 | The production system of the decision content of the setting variable related to the exception of product |
CN107305370B (en) * | 2016-04-25 | 2020-09-25 | 发那科株式会社 | Production system for setting determination value of variable related to abnormality of product |
US10782664B2 (en) | 2016-04-25 | 2020-09-22 | Fanuc Corporation | Production system that sets determination value of variable relating to abnormality of product |
CN106295637B (en) * | 2016-07-29 | 2019-05-03 | 电子科技大学 | A kind of vehicle identification method based on deep learning and intensified learning |
CN106295637A (en) * | 2016-07-29 | 2017-01-04 | 电子科技大学 | A kind of vehicle identification method based on degree of depth study with intensified learning |
US11400587B2 (en) | 2016-09-15 | 2022-08-02 | Google Llc | Deep reinforcement learning for robotic manipulation |
CN109906132B (en) * | 2016-09-15 | 2022-08-09 | 谷歌有限责任公司 | Robotic deep reinforcement learning |
CN109906132A (en) * | 2016-09-15 | 2019-06-18 | 谷歌有限责任公司 | The deeply of Robotic Manipulator learns |
US11897133B2 (en) | 2016-09-15 | 2024-02-13 | Google Llc | Deep reinforcement learning for robotic manipulation |
CN107914124A (en) * | 2016-10-07 | 2018-04-17 | 发那科株式会社 | Operation auxiliary system with rote learning portion |
CN110023965B (en) * | 2016-10-10 | 2023-07-11 | 渊慧科技有限公司 | System, method, and storage medium for selecting a neural network of actions |
US11534911B2 (en) | 2016-10-10 | 2022-12-27 | Deepmind Technologies Limited | Neural networks for selecting actions to be performed by a robotic agent |
CN110023965A (en) * | 2016-10-10 | 2019-07-16 | 渊慧科技有限公司 | For selecting the neural network of the movement executed by intelligent robot body |
US11030523B2 (en) | 2016-10-28 | 2021-06-08 | Google Llc | Neural architecture search |
US11829874B2 (en) | 2016-10-28 | 2023-11-28 | Google Llc | Neural architecture search |
CN108229640B (en) * | 2016-12-22 | 2021-08-20 | 山西翼天下智能科技有限公司 | Emotion expression method and device and robot |
CN108229640A (en) * | 2016-12-22 | 2018-06-29 | 深圳光启合众科技有限公司 | The method, apparatus and robot of emotion expression service |
WO2018113260A1 (en) * | 2016-12-22 | 2018-06-28 | 深圳光启合众科技有限公司 | Emotional expression method and device, and robot |
CN108693851A (en) * | 2017-03-31 | 2018-10-23 | 发那科株式会社 | Behavioural information learning device, robot control system and behavioural information learning method |
US10730182B2 (en) | 2017-03-31 | 2020-08-04 | Fanuc Corporation | Action information learning device, robot control system and action information learning method |
CN108693851B (en) * | 2017-03-31 | 2020-05-26 | 发那科株式会社 | Behavior information learning device, robot control system, and behavior information learning method |
CN110582784B (en) * | 2017-05-26 | 2023-11-14 | 渊慧科技有限公司 | Training action selection neural networks using look-ahead searching |
CN107030704A (en) * | 2017-06-14 | 2017-08-11 | 郝允志 | Educational robot control design case based on neuroid |
CN107102644B (en) * | 2017-06-22 | 2019-12-10 | 华南师范大学 | Underwater robot track control method and control system based on deep reinforcement learning |
CN107102644A (en) * | 2017-06-22 | 2017-08-29 | 华南师范大学 | The underwater robot method for controlling trajectory and control system learnt based on deeply |
CN107516112A (en) * | 2017-08-24 | 2017-12-26 | 北京小米移动软件有限公司 | Object type recognition methods, device, equipment and storage medium |
CN107688851A (en) * | 2017-08-26 | 2018-02-13 | 胡明建 | A kind of no aixs cylinder transmits the design method of artificial neuron entirely |
CN107562053A (en) * | 2017-08-30 | 2018-01-09 | 南京大学 | A kind of Hexapod Robot barrier-avoiding method based on fuzzy Q-learning |
CN107729953B (en) * | 2017-09-18 | 2019-09-27 | 清华大学 | Robot plume method for tracing based on continuous state behavior domain intensified learning |
CN107729953A (en) * | 2017-09-18 | 2018-02-23 | 清华大学 | Robot plume method for tracing based on continuous state behavior domain intensified learning |
CN111542836B (en) * | 2017-10-04 | 2024-05-17 | 华为技术有限公司 | Method for selecting action by using neural network as object |
CN111542836A (en) * | 2017-10-04 | 2020-08-14 | 华为技术有限公司 | Method for selecting action for object by using neural network |
CN108051999A (en) * | 2017-10-31 | 2018-05-18 | 中国科学技术大学 | Accelerator beam path control method and system based on deeply study |
US11164077B2 (en) | 2017-11-02 | 2021-11-02 | Siemens Aktiengesellschaft | Randomized reinforcement learning for control of complex systems |
CN111279276B (en) * | 2017-11-02 | 2024-05-31 | 西门子股份公司 | Randomization reinforcement learning for controlling complex systems |
CN111279276A (en) * | 2017-11-02 | 2020-06-12 | 西门子股份公司 | Randomized reinforcement learning for controlling complex systems |
CN107860389A (en) * | 2017-11-07 | 2018-03-30 | 金陵科技学院 | Robot chamber expert walks intensified learning path navigation algorithm |
CN110196587A (en) * | 2018-02-27 | 2019-09-03 | 中国科学院深圳先进技术研究院 | Vehicular automatic driving control strategy model generating method, device, equipment and medium |
CN108594803B (en) * | 2018-03-06 | 2020-06-12 | 吉林大学 | Path planning method based on Q-learning algorithm |
CN108594803A (en) * | 2018-03-06 | 2018-09-28 | 吉林大学 | Paths planning method based on Q- learning algorithms |
CN108427283A (en) * | 2018-04-04 | 2018-08-21 | 浙江工贸职业技术学院 | A kind of control method that the compartment intellect service robot based on neural network is advanced |
CN108563971A (en) * | 2018-04-26 | 2018-09-21 | 广西大学 | The more reader anti-collision algorithms of RFID based on depth Q networks |
CN109032168A (en) * | 2018-05-07 | 2018-12-18 | 西安电子科技大学 | A kind of Route planner of the multiple no-manned plane Cooperative Area monitoring based on DQN |
CN109032168B (en) * | 2018-05-07 | 2021-06-08 | 西安电子科技大学 | DQN-based multi-unmanned aerial vehicle collaborative area monitoring airway planning method |
CN110782000B (en) * | 2018-07-30 | 2023-11-24 | 国际商业机器公司 | Imitation learning by action shaping with contrast reinforcement learning |
CN110782000A (en) * | 2018-07-30 | 2020-02-11 | 国际商业机器公司 | Mimic learning by action shaping with antagonistic reinforcement learning |
US11734575B2 (en) | 2018-07-30 | 2023-08-22 | International Business Machines Corporation | Sequential learning of constraints for hierarchical reinforcement learning |
CN109663359B (en) * | 2018-12-06 | 2022-03-25 | 广州多益网络股份有限公司 | Game intelligent agent training optimization method and device, terminal device and storage medium |
CN109663359A (en) * | 2018-12-06 | 2019-04-23 | 广州多益网络股份有限公司 | Optimization method, device, terminal device and the storage medium of game intelligence body training |
CN110070188B (en) * | 2019-04-30 | 2021-03-30 | 山东大学 | Incremental cognitive development system and method integrating interactive reinforcement learning |
CN110070188A (en) * | 2019-04-30 | 2019-07-30 | 山东大学 | A kind of increment type cognitive development system and method merging interactive intensified learning |
CN110307848A (en) * | 2019-07-04 | 2019-10-08 | 南京大学 | A kind of Mobile Robotics Navigation method |
CN110333739A (en) * | 2019-08-21 | 2019-10-15 | 哈尔滨工程大学 | A kind of AUV conduct programming and method of controlling operation based on intensified learning |
CN110333739B (en) * | 2019-08-21 | 2020-07-31 | 哈尔滨工程大学 | AUV (autonomous Underwater vehicle) behavior planning and action control method based on reinforcement learning |
CN110703792B (en) * | 2019-11-07 | 2022-12-30 | 江苏科技大学 | Underwater robot attitude control method based on reinforcement learning |
CN110703792A (en) * | 2019-11-07 | 2020-01-17 | 江苏科技大学 | Underwater robot attitude control method based on reinforcement learning |
CN111552183B (en) * | 2020-05-17 | 2021-04-23 | 南京大学 | Six-legged robot obstacle avoidance method based on adaptive weight reinforcement learning |
CN111552183A (en) * | 2020-05-17 | 2020-08-18 | 南京大学 | Six-legged robot obstacle avoidance method based on adaptive weight reinforcement learning |
CN112297005A (en) * | 2020-10-10 | 2021-02-02 | 杭州电子科技大学 | Robot autonomous control method based on graph neural network reinforcement learning |
WO2023082949A1 (en) * | 2021-11-10 | 2023-05-19 | 达闼科技(北京)有限公司 | Agent control method and apparatus, electronic device, program, and storage medium |
CN114310870A (en) * | 2021-11-10 | 2022-04-12 | 达闼科技(北京)有限公司 | Intelligent agent control method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN102402712B (en) | 2014-03-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102402712B (en) | Robot reinforced learning initialization method based on neural network | |
CN102819264B (en) | Path planning Q-learning initial method of mobile robot | |
Qiang et al. | Reinforcement learning model, algorithms and its application | |
Mohanty et al. | Controlling the motion of an autonomous mobile robot using various techniques: a review | |
Parhi et al. | IWO-based adaptive neuro-fuzzy controller for mobile robot navigation in cluttered environments | |
CN110014428B (en) | Sequential logic task planning method based on reinforcement learning | |
Ma et al. | Wasserstein generative learning with kinematic constraints for probabilistic interactive driving behavior prediction | |
Ma et al. | State-chain sequential feedback reinforcement learning for path planning of autonomous mobile robots | |
CN110716575A (en) | UUV real-time collision avoidance planning method based on deep double-Q network reinforcement learning | |
Wang et al. | Adaptive environment modeling based reinforcement learning for collision avoidance in complex scenes | |
Sun et al. | A Fuzzy-Based Bio-Inspired Neural Network Approach for Target Search by Multiple Autonomous Underwater Vehicles in Underwater Environments. | |
Sun et al. | Event-triggered reconfigurable reinforcement learning motion-planning approach for mobile robot in unknown dynamic environments | |
Batti et al. | Mobile robot obstacle avoidance in labyrinth environment using fuzzy logic approach | |
Liu et al. | Autonomous exploration for mobile robot using Q-learning | |
Boufera et al. | Fuzzy inference system optimization by evolutionary approach for mobile robot navigation | |
Chen et al. | Survey of multi-agent strategy based on reinforcement learning | |
Song et al. | Towards efficient exploration in unknown spaces: A novel hierarchical approach based on intrinsic rewards | |
Obe et al. | Fuzzy control of autonomous mobile robot | |
Martovytskyi et al. | Approach to building a global mobile agent way based on Q-learning | |
Tang et al. | Reinforcement learning for robots path planning with rule-based shallow-trial | |
Senthilkumar et al. | Hybrid genetic-fuzzy approach to autonomous mobile robot | |
Ji et al. | Research on Path Planning of Mobile Robot Based on Reinforcement Learning | |
Li et al. | Research on obstacle avoidance strategy of grid workspace based on deep reinforcement learning | |
Ren et al. | Research on Q-ELM algorithm in robot path planning | |
Song et al. | Research on Local Path Planning for the Mobile Robot Based on QL-anfis Algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20140305; Termination date: 20140831 ||
EXPY | Termination of patent right or utility model ||